Performance and Monitoring

Performance Tuning Methodology

Measure — baseline latency, ops/sec, memory, hit ratio
Identify bottleneck — CPU, memory, network, slow commands, connection count
Change one thing — isolate impact
Verify — compare before/after under realistic load
Monitor continuously — performance regressions appear after deploys

Never tune randomly — every change should trace to measured evidence.
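The measure and verify steps can be sketched as a small timing harness. This is a minimal illustration, not a benchmarking tool: the helper names (measure_latencies_ms, summarize, compare) are hypothetical, and the operation passed in stands for whatever call is being tuned.

```python
import statistics
import time


def measure_latencies_ms(operation, iterations=1000):
    """Call `operation` repeatedly and return per-call latencies in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms):
    """Return p50, p99, and max for a list of latency samples."""
    # quantiles(n=100) yields 99 cut points; index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98], "max": max(samples_ms)}


def compare(before, after):
    """Relative change per metric; a negative value means the metric went down."""
    return {name: (after[name] - before[name]) / before[name] for name in before}
```

With a redis-py client r, a baseline could be taken with something like summarize(measure_latencies_ms(lambda: r.get("user:1001"))), repeated after the change, and the two summaries passed to compare.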

Memory Management

  INFO memory
MEMORY USAGE user:1001
MEMORY STATS
MEMORY DOCTOR

Key memory fields:

Field	Meaning
`used_memory`	Total bytes allocated by Redis
`used_memory_rss`	OS-reported physical memory
`used_memory_peak`	High water mark
`maxmemory`	Configured limit
`mem_fragmentation_ratio`	RSS / used_memory — > 1.5 may indicate fragmentation
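As a rough sketch, the ratios behind these fields can be derived from a dict of INFO memory values. The function name memory_report is hypothetical, and the input is assumed to hold the fields above as integers in bytes (the shape a client library typically returns after parsing INFO).

```python
def memory_report(info):
    """Derive fragmentation and utilization ratios from INFO memory fields (bytes)."""
    used = info["used_memory"]
    rss = info["used_memory_rss"]
    limit = info.get("maxmemory", 0)
    return {
        # RSS / used_memory, the same definition as mem_fragmentation_ratio
        "fragmentation_ratio": rss / used if used else None,
        # maxmemory 0 means no limit is configured, so utilization is undefined
        "utilization": used / limit if limit else None,
    }


GIB = 1024 ** 3
sample = {"used_memory": 2 * GIB, "used_memory_rss": 3 * GIB, "maxmemory": 4 * GIB}
report = memory_report(sample)
```

For the sample values (2 GiB used, 3 GiB RSS, 4 GiB limit) the fragmentation ratio sits exactly at the 1.5 mark and utilization is 50%.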

  maxmemory 4gb
maxmemory-policy allkeys-lru
maxmemory-samples 10

Eviction Policies

Policy	Behavior
`noeviction`	Return errors on writes when full; use for sessions/queues
`allkeys-lru`	Evict any key — LRU approximation
`volatile-lru`	Evict keys with TTL only
`allkeys-lfu`	Evict least frequently used (Redis 4+)
`volatile-lfu`	LFU among keys with TTL
`allkeys-random`	Random eviction
`volatile-ttl`	Evict keys with shortest TTL

Cache workloads: allkeys-lru or allkeys-lfu.
Mixed cache + sessions: volatile-lru with TTLs on cache keys only, or move sessions to a dedicated instance running noeviction.
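The "LRU approximation" and the maxmemory-samples setting can be pictured with a toy model: sample a few keys and evict the one idle the longest. This is a deliberately simplified sketch, not Redis's actual eviction code, which is more involved; the function name and the last_access mapping are hypothetical.

```python
import random


def pick_eviction_candidate(last_access, samples=10, rng=None):
    """Toy sampled LRU: sample up to `samples` keys, return the least recently used.

    `last_access` maps key -> last access time (any comparable number).
    Returns None when there are no keys.
    """
    if not last_access:
        return None
    rng = rng or random.Random()
    sampled = rng.sample(list(last_access), min(samples, len(last_access)))
    # Only the sampled keys compete; the oldest access time among them loses.
    return min(sampled, key=lambda key: last_access[key])
```

A larger sample size makes the choice closer to true LRU at the cost of more work per eviction, which is the trade-off maxmemory-samples controls.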

Latency Monitoring

  CONFIG SET latency-monitor-threshold 10
LATENCY LATEST
LATENCY HISTORY command
LATENCY DOCTOR
LATENCY GRAPH command

Built-in latency doctor summarizes issues:

  LATENCY DOCTOR
# Analyzes spikes, suggests causes (fork, AOF, slow commands)
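A small sketch of filtering LATENCY LATEST output in application code. It assumes each row starts with the four documented fields (event name, Unix timestamp of the latest spike, latest latency in ms, all-time maximum in ms); the function name is hypothetical.

```python
def events_over_threshold(latest_rows, threshold_ms):
    """Return events whose all-time max latency meets the threshold, worst first."""
    flagged = []
    for row in latest_rows:
        # Read only the first four fields so extra trailing fields are ignored.
        event, timestamp, latest_ms, max_ms = row[:4]
        if isinstance(event, bytes):
            event = event.decode()
        if max_ms >= threshold_ms:
            flagged.append(
                {"event": event, "timestamp": timestamp,
                 "latest_ms": latest_ms, "max_ms": max_ms}
            )
    return sorted(flagged, key=lambda item: item["max_ms"], reverse=True)


rows = [[b"command", 1700000000, 12, 250], ["fork", 1700000100, 3, 8]]
flagged = events_over_threshold(rows, threshold_ms=10)
```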

Slowlog

Commands exceeding slowlog-log-slower-than (default 10,000 microseconds = 10ms):

  CONFIG GET slowlog-log-slower-than
SLOWLOG GET 20
SLOWLOG LEN
SLOWLOG RESET

Common slow command culprits: KEYS *, SMEMBERS on huge sets, LRANGE on long lists, SORT, large HGETALL.
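To see which culprit dominates, slowlog entries can be grouped by command name. The sketch below assumes each entry is a dict with a "command" string (or bytes) and a "duration" in microseconds, roughly the shape client libraries return for SLOWLOG GET; summarize_slowlog is a hypothetical helper.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slowlog entries by command name, sorted by total time descending."""
    totals = defaultdict(lambda: {"count": 0, "total_us": 0, "max_us": 0})
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        # The first token is the command name, e.g. "HGETALL session:1" -> "HGETALL".
        name = command.split(" ", 1)[0].upper()
        bucket = totals[name]
        bucket["count"] += 1
        bucket["total_us"] += entry["duration"]
        bucket["max_us"] = max(bucket["max_us"], entry["duration"])
    return sorted(totals.items(), key=lambda item: item[1]["total_us"], reverse=True)
```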

Avoid Expensive Commands

  # Bad on large datasets
KEYS *
SMEMBERS huge_set
HGETALL massive_hash
FLUSHALL

# Good alternatives
SCAN 0 MATCH user:* COUNT 100
SSCAN huge_set 0 COUNT 100
HSCAN massive_hash 0 COUNT 100

KEYS is O(N) and blocks the single event loop — never use in production.
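The SCAN cursor loop looks roughly like this in application code. The sketch assumes a client object whose scan(cursor, match, count) method returns a (next_cursor, keys) pair, as redis-py's does; iter_keys is a hypothetical name, and redis-py already ships an equivalent scan_iter helper.

```python
def iter_keys(client, pattern, count=100):
    """Yield keys matching `pattern` by following the SCAN cursor until it returns 0."""
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        # A returned cursor of 0 means the full iteration has completed.
        if cursor == 0:
            break
```

Each call does a bounded amount of work, so other clients are served between calls. SCAN may return the same key more than once, so callers that need uniqueness should deduplicate.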

Pipelining and Batching

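import redis

r = redis.Redis(host="localhost", port=6379)
pipe = r.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
for i in range(10000):
    pipe.set(f"key:{i}", f"value:{i}")  # queued client-side, not sent yet
pipe.execute()  # sends all queued commands together
# One round trip vs 10,000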

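Pipelining removes the per-command network round trip, so it can improve throughput 10–100× for bulk operations; the actual gain depends on round-trip latency and command cost.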
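For very large loads, one option is to flush the pipeline in fixed-size batches so neither side buffers every reply at once. This is a sketch under assumptions: chunked and bulk_set are hypothetical helpers, the batch size of 1000 is arbitrary, and client is any object with a redis-py style pipeline(transaction=False) method.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_set(client, pairs, batch_size=1000):
    """SET every (key, value) pair, one pipeline flush per batch.

    Returns the number of pipeline flushes (round trips) performed.
    """
    round_trips = 0
    for batch in chunked(pairs, batch_size):
        pipe = client.pipeline(transaction=False)
        for key, value in batch:
            pipe.set(key, value)
        pipe.execute()
        round_trips += 1
    return round_trips
```

With 10,000 pairs and a batch size of 1000 this performs 10 round trips instead of 10,000, while capping how many replies are held in memory per flush.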

Connection Pooling

import redis

pool = redis.ConnectionPool(
    max_connections=50,
    host="localhost",
    port=6379,
    decode_responses=True
)
r = redis.Redis(connection_pool=pool)

One TCP connection per command wastes resources. Size pools to expected concurrent requests per process.

  INFO clients
# connected_clients, blocked_clients
INFO stats
# rejected_connections
CONFIG GET maxclients
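Pool sizes add up across processes, so it helps to check the worst case against maxclients. The arithmetic below is a sketch: pool_budget is a hypothetical helper, and the reserved allowance for replicas, monitoring, and admin connections is an assumed figure, not a Redis setting.

```python
def pool_budget(app_processes, max_connections_per_pool, maxclients, reserved=50):
    """Check whether every pool filling up at once would still fit under maxclients."""
    worst_case = app_processes * max_connections_per_pool
    available = maxclients - reserved
    return {
        "worst_case_connections": worst_case,
        "available": available,
        "fits": worst_case <= available,
    }


# 100 processes, each with the 50-connection pool shown above, against maxclients 10000
budget = pool_budget(app_processes=100, max_connections_per_pool=50, maxclients=10000)
```

If the check fails, either shrink max_connections per pool or raise maxclients; otherwise a traffic spike shows up as rejected_connections.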

Key Metrics to Watch

  INFO stats
INFO replication
INFO cpu
INFO commandstats

Metric	Healthy Signal	Warning
`instantaneous_ops_per_sec`	Stable under load	Sudden drop = issue
`keyspace_hits` / `keyspace_misses`	Hit ratio > 90%	Low hit ratio = wrong cache design
`rejected_connections`	0	Pool or maxclients exhausted
`used_memory` vs `maxmemory`	< 80%	Evictions or OOM imminent
`latest_fork_usec`	< 10,000 (10 ms)	Slow fork (RDB save or AOF rewrite) causing latency

Hit Ratio Calculation

  redis-cli INFO stats | grep keyspace
# hit_ratio = hits / (hits + misses)
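The formula translates directly to code. Because keyspace_hits and keyspace_misses are cumulative counters, a ratio over the interval between two snapshots is often more telling than the lifetime ratio; both are sketched here with hypothetical function names.

```python
def hit_ratio(keyspace_hits, keyspace_misses):
    """hits / (hits + misses), or None when there have been no lookups."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else None


def interval_hit_ratio(previous, current):
    """Hit ratio over the interval between two INFO stats snapshots (dicts)."""
    hits = current["keyspace_hits"] - previous["keyspace_hits"]
    misses = current["keyspace_misses"] - previous["keyspace_misses"]
    return hit_ratio(hits, misses)
```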

Command Statistics

  INFO commandstats
# usec_per_call, calls per command
CONFIG RESETSTAT

Identify hot commands consuming disproportionate CPU time.
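One way to spot hot commands is to rank them by total time rather than by per-call cost. The sketch assumes a dict keyed by cmdstat_<name> with calls, usec, and usec_per_call values, mirroring the INFO commandstats fields; hottest_commands is a hypothetical helper.

```python
def hottest_commands(commandstats, top=5):
    """Rank commands by total microseconds consumed, with each one's share of the total."""
    rows = []
    for name, stats in commandstats.items():
        command = name[len("cmdstat_"):] if name.startswith("cmdstat_") else name
        rows.append({
            "command": command,
            "calls": stats["calls"],
            "usec": stats["usec"],
            "usec_per_call": stats["usec_per_call"],
        })
    total_usec = sum(row["usec"] for row in rows)
    for row in rows:
        row["share"] = row["usec"] / total_usec if total_usec else 0.0
    rows.sort(key=lambda row: row["usec"], reverse=True)
    return rows[:top]
```

A command with few calls but a large share (for example a rarely used HGETALL on big hashes) is usually a better optimization target than a cheap command called constantly.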

Monitoring Stack

Tool	Purpose
redis_exporter	Prometheus metrics
RedisInsight	GUI exploration, profiler
Grafana	Dashboards for ops/sec, memory, latency
Datadog / New Relic	APM integration
redis-cli INFO	Quick manual checks

Example Prometheus alerts:

  # Memory > 85% maxmemory
# hit_ratio < 80% for 15 minutes
# rejected_connections > 0
# replication lag > 10s
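The same conditions can be expressed as a plain function over one metrics snapshot, which is handy for testing thresholds before writing real alert rules. This is a sketch, not Prometheus rule syntax: the snapshot keys and alert names are hypothetical, the "for 15 minutes" duration is not modeled, and rejected connections are taken as a per-interval delta because the underlying counter is cumulative.

```python
def firing_alerts(snapshot):
    """Return the names of alert conditions that hold for a single snapshot."""
    alerts = []
    maxmemory = snapshot.get("maxmemory", 0)
    # Skip the memory check when no limit is configured (maxmemory 0).
    if maxmemory and snapshot["used_memory"] / maxmemory > 0.85:
        alerts.append("memory_above_85_percent")
    hit_ratio = snapshot.get("hit_ratio")
    if hit_ratio is not None and hit_ratio < 0.80:
        alerts.append("hit_ratio_below_80_percent")
    if snapshot.get("rejected_connections_delta", 0) > 0:
        alerts.append("connections_rejected")
    if snapshot.get("replication_lag_seconds", 0) > 10:
        alerts.append("replication_lag_above_10s")
    return alerts
```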

Best Practices

Set maxmemory and eviction policy before production traffic
Use SCAN family, never KEYS
Pipeline bulk operations
Pool connections in every application process
Monitor slowlog weekly
Separate instances for cache vs sessions vs queues

Common Mistakes

Mistake	Impact
No maxmemory limit	Host OOM kill
KEYS in production script	Latency outage
One connection per request	Connection exhaustion
Ignoring mem_fragmentation_ratio	Wasted RAM, need restart
Tuning without baseline metrics	Cannot verify improvement

Troubleshooting

Latency spikes every N minutes:

  LATENCY DOCTOR
INFO persistence
# RDB BGSAVE or AOF rewrite fork — schedule off-peak

Ops/sec ceiling:

  INFO cpu
redis-benchmark -q -n 100000 -c 50
# Single-threaded — scale via sharding or more instances

High rejected_connections:

  CONFIG GET maxclients
INFO clients
# Increase maxclients AND fix connection pooling in apps

Performance Tips

Disable Transparent Huge Pages (THP) on Linux hosts running Redis
Use UNLINK instead of DEL for large keys (async reclaim, Redis 4+)
Prefer many small values over few huge values for even latency
Use CLIENT KILL to drop idle connections during incidents
Run MEMORY PURGE (Redis 4+, jemalloc allocator only) if fragmentation ratio > 1.5

Production Scenario

An ad-tech platform serving 500K ops/sec monitored Redis via Prometheus + Grafana. Alerts fired when p99 latency exceeded 5ms (normal: 1.2ms). Slowlog revealed that a deployment had introduced HGETALL on 50KB session hashes. Fix: switch to HMGET for only the required fields, after which p99 dropped to 1.4ms. Separately, a memory alert at 85% triggered the proactive addition of a node to the Cluster before evictions could impact the hit ratio.

Profile before optimizing — measure latency, memory, and command distribution, then fix the highest-impact issues first.
