Caching Fundamentals

Caching stores frequently accessed data in fast memory to reduce load on slower backing stores such as databases. Redis is one of the most widely used application-level caches because of its speed, per-key TTL support, and rich data structures.

Key metrics to track:

| Metric | Target |
|---|---|
| Hit ratio | > 90% for read-heavy workloads |
| p99 cache latency | < 2 ms |
| Eviction rate | Stable, not spiking |
| Memory usage | < 80% of maxmemory |
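Redis reports the raw counters in INFO stats; the hit ratio itself has to be computed from them:

```
INFO stats
# keyspace_hits, keyspace_misses (cumulative since startup or the last CONFIG RESETSTAT)
# hit_ratio = keyspace_hits / (keyspace_hits + keyspace_misses)
```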
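The same arithmetic as a small Python helper, sketched with redis-py in mind. `hit_ratio` and `hit_ratio_delta` are hypothetical names, and the input is assumed to be a mapping like the one `r.info("stats")` returns. Because the counters are cumulative, diffing two snapshots gives the ratio for a recent window rather than the lifetime average.

```python
from typing import Optional


def hit_ratio(stats: dict) -> Optional[float]:
    """Compute the hit ratio from an INFO stats mapping.

    Expects integer keyspace_hits / keyspace_misses entries, e.g. the dict
    returned by redis-py's r.info("stats"). Returns None if no lookups yet.
    """
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    if total == 0:
        return None
    return hits / total


def hit_ratio_delta(before: dict, after: dict) -> Optional[float]:
    """Hit ratio over the interval between two INFO stats snapshots."""
    return hit_ratio({
        "keyspace_hits": int(after["keyspace_hits"]) - int(before["keyspace_hits"]),
        "keyspace_misses": int(after["keyspace_misses"]) - int(before["keyspace_misses"]),
    })
```

Sampling `r.info("stats")` once a minute and alerting on `hit_ratio_delta` catches a sudden drop that the lifetime ratio would smooth over.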
  

Cache-Aside (Lazy Loading)

The application checks Redis first; on miss, reads from the database and populates the cache.

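```python
import json
import redis

r = redis.Redis(decode_responses=True)

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:  # hit: deserialize and return
        return json.loads(cached)

    # Miss: `db` is a placeholder for your data-access layer;
    # assumes it returns a JSON-serializable dict (or None)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    if user:
        r.setex(key, 3600, json.dumps(user))  # populate with a 1-hour TTL
    return user
```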
  

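Pros: Simple; caches only what is actually requested; can survive a cache outage by falling back to the database, provided cache errors are caught (the code above lets them propagate). Cons: Every miss pays for a cache round trip plus a database query; data may be stale until the TTL expires.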
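Cache-aside only degrades gracefully to the database if cache errors are caught. A minimal sketch, assuming a client with redis-py style `get`/`setex` (such as `redis.Redis(decode_responses=True)`); `cache_aside_get` and `load_fn` are illustrative names, not part of any library.

```python
import json

try:
    from redis.exceptions import RedisError
except ImportError:  # redis-py not installed: stand-in so the sketch runs on its own
    class RedisError(Exception):
        pass


def cache_aside_get(cache, key, load_fn, ttl=3600):
    """Cache-aside read that falls back to `load_fn` when the cache is down.

    `load_fn` is a hypothetical zero-argument loader that queries the
    database and returns a JSON-serializable value or None.
    """
    try:
        cached = cache.get(key)
    except RedisError:
        cached = None  # cache unavailable: treat as a miss
    if cached is not None:
        return json.loads(cached)

    value = load_fn()
    if value is not None:
        try:
            cache.setex(key, ttl, json.dumps(value))
        except RedisError:
            pass  # serving the value matters more than caching it
    return value
```

With redis-py, `RedisError` is the base class for its connection and timeout errors. Setting a short `socket_timeout` on the client keeps a hung Redis from stalling every request before the fallback kicks in.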

Read-Through

The cache layer itself loads data on a miss, typically via a caching library or a sidecar. The application always talks to the cache, never directly to the database.

```
App → Cache → (miss) → Database → populate cache → return
```

Less common in custom apps; more typical in dedicated cache proxies.
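Redis does not load data on its own, so read-through has to live in a library or proxy in front of it. The in-process sketch below shows the shape: callers only call `get`, and the cache invokes a loader on a miss or after the TTL. `ReadThroughCache` and `loader` are illustrative names, and the class is not thread-safe.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """Callers only call get(); the cache itself calls `loader` on a miss
    or when the stored entry is older than `ttl` seconds."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # stands in for the database query
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = self._loader(key)  # miss or expired: the cache loads, not the caller
        self._entries[key] = (now + self._ttl, value)
        return value
```

The injectable `clock` makes expiry easy to test; the default `time.monotonic` is unaffected by wall-clock adjustments.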

Write-Through

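Every write updates the database and then the cache in the same code path, so the cache normally reflects the database state. The two writes are not atomic, though: a failed cache write or two concurrent writers can leave the cache out of step until the TTL expires.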

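```python
def update_user(user_id, data):
    # Database first: it is the source of truth (`db` is a placeholder helper)
    db.update("users", user_id, data)
    # Then refresh the cache. Assumes `data` is the complete user record;
    # for partial updates, re-read the row and cache that instead.
    r.setex(f"user:{user_id}", 3600, json.dumps(data))
```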

Pros: Reads see updated data immediately instead of after a TTL. Cons: Write latency includes the cache update; keys that are written but never read still occupy memory.

Write-Behind (Write-Back)

Write to cache immediately; asynchronously flush to database.

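```python
def update_user_async(user_id, data):
    # Acknowledge the write as soon as the cache holds it
    r.setex(f"user:{user_id}", 3600, json.dumps(data))
    # `queue` is a placeholder job queue; a worker persists to the DB later
    queue.enqueue("persist_user", user_id, data)
```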

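Pros: Fast writes; repeated writes to the same key can be batched into one database write. Cons: Acknowledged writes are lost if the cache or queue fails before the database write lands, and the database lags behind the cache; use only where that is acceptable.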
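Enqueueing is only half of write-behind; something must still persist the pending writes. One possible shape, sketched here as an in-process buffer rather than an external job queue, coalesces repeated writes to the same key and flushes them in batches. `WriteBehindBuffer` and the `persist_batch` callback are hypothetical; a background thread or timer would call `flush()` periodically.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class WriteBehindBuffer:
    """Buffers writes in memory and persists them in batches.

    A later write to the same key replaces the earlier one (coalescing),
    so a record written many times between flushes costs one DB write.
    Anything still buffered is lost if the process dies.
    """

    def __init__(self, persist_batch: Callable[[Dict[Hashable, Any]], None]):
        self._persist_batch = persist_batch  # hypothetical: writes {key: value} to the DB
        self._pending: Dict[Hashable, Any] = {}
        self._lock = threading.Lock()

    def write(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._pending[key] = value  # keep only the latest value per key

    def flush(self) -> int:
        # Swap the buffer under the lock; do the slow I/O outside it
        with self._lock:
            batch, self._pending = self._pending, {}
        if not batch:
            return 0
        try:
            self._persist_batch(batch)
        except Exception:
            # Re-queue the batch without overwriting newer writes, then re-raise
            with self._lock:
                for key, value in batch.items():
                    self._pending.setdefault(key, value)
            raise
        return len(batch)
```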

Cache Invalidation

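“There are only two hard things in Computer Science: cache invalidation and naming things.” (attributed to Phil Karlton)

When a record changes, delete every cache entry derived from it, including lists and aggregates, after the database write succeeds: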

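```python
def delete_user(user_id):
    db.delete("users", user_id)  # source of truth first (`db` is a placeholder)
    # Then invalidate every derived entry in one round trip:
    # the record itself, the list cache, and the aggregate cache
    r.delete(f"user:{user_id}", "users:list", "users:count")
```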

Invalidation Strategies

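| Strategy | When to use |
|---|---|
| TTL only | Stale data is acceptable for the TTL window (product catalog) |
| Delete on write | Reads must reflect updates promptly (user profile) |
| Versioned keys | Invalidating a large group of keys at once without scanning and deleting them |
| Pub/Sub broadcast | Invalidating in-process caches across multiple application instances |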
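```python
# Versioned keys: invalidate a whole group without deleting anything
def user_key(user_id):
    # Read the current version on every lookup (one extra round trip);
    # "0" until the first invalidation
    version = r.get("users:version") or "0"
    return f"user:{version}:{user_id}"

def invalidate_all_users():
    # INCR creates the counter at 1 if it is missing, so readers move from
    # version 0 to 1; old keys are never read again and expire via their TTL
    r.incr("users:version")
```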
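For the Pub/Sub strategy, a minimal sketch using redis-py's `publish` and `pubsub()` API: each application instance keeps a local in-memory cache and drops an entry when any instance broadcasts its key. The channel name, `LocalCache`, and the helper functions are hypothetical. Redis Pub/Sub is fire-and-forget, so an instance that is disconnected when a message is sent never sees it; a production version should also expire local entries with a TTL as a backstop, which this sketch omits.

```python
import threading
from typing import Any, Dict

INVALIDATION_CHANNEL = "cache:invalidate"  # hypothetical channel name


class LocalCache:
    """Per-instance in-memory cache whose entries are dropped on broadcast."""

    def __init__(self):
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, key: str):
        with self._lock:
            return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value

    def handle_message(self, message: dict) -> None:
        # redis-py delivers dicts; only type "message" carries a published payload
        if message.get("type") == "message":
            with self._lock:
                self._data.pop(message["data"], None)


def publish_invalidation(r, key: str) -> None:
    """Called by whichever instance changed the underlying data."""
    r.publish(INVALIDATION_CHANNEL, key)


def listen_for_invalidations(r, cache: LocalCache) -> None:
    """Blocking loop; run it in a background thread on every instance.
    Assumes the client was created with decode_responses=True."""
    pubsub = r.pubsub()
    pubsub.subscribe(INVALIDATION_CHANNEL)
    for message in pubsub.listen():
        cache.handle_message(message)
```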
  

TTL Strategies

```
# Short TTL for frequently changing data
SETEX stock:sku:42 60 "150"

# Long TTL for static reference data
SETEX config:feature_flags 86400 "{...}"
```

```python
import random

# Jitter: add 0 to 300 s at random so keys written together do not expire together
ttl = 3600 + random.randint(0, 300)
r.setex(key, ttl, value)
```
| Data type | Typical TTL |
|---|---|
| Session | 30 min – 24 hr |
| API response | 1 – 5 min |
| User profile | 15 – 60 min |
| Static config | Hours to days |
| Stock/price | 10 – 60 sec |
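One way to apply these ranges consistently is a small TTL policy with proportional jitter. The base values below are assumptions picked from inside each range, and `ttl_for` is a hypothetical helper.

```python
import random
from typing import Dict, Optional

# Base TTLs in seconds, chosen from inside the typical ranges above.
# The exact numbers are illustrative assumptions; tune them per workload.
BASE_TTL_SECONDS: Dict[str, int] = {
    "session": 30 * 60,
    "api_response": 2 * 60,
    "user_profile": 30 * 60,
    "static_config": 6 * 60 * 60,
    "stock_price": 30,
}


def ttl_for(data_type: str, jitter: float = 0.1,
            rng: Optional[random.Random] = None) -> int:
    """Base TTL for `data_type` plus a random extra of up to `jitter` (10%).

    Proportional jitter scales with the TTL: 10% of 30 s is 3 s while
    10% of 6 h is 36 min, so both short and long TTLs get spread out.
    """
    base = BASE_TTL_SECONDS[data_type]
    source = rng if rng is not None else random
    return base + source.randint(0, int(base * jitter))
```

Usage: `r.setex(key, ttl_for("user_profile"), value)`.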

Cache Stampede Prevention

When a hot key expires, every concurrent request for it misses at once, and all of them may query the database simultaneously (a cache stampede).

Lock-Based Recomputation

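```python
import json
import time
import uuid

# Release the lock only if we still own it (compare-and-delete in one step),
# so a caller whose lock already expired cannot delete someone else's lock
RELEASE_LOCK = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
end
return 0
"""

def get_popular_article(article_id, max_wait_checks=50):
    key = f"article:{article_id}"
    lock_key = f"{key}:lock"
    for _ in range(max_wait_checks):
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)

        token = str(uuid.uuid4())
        # NX: only one caller wins; EX: the lock expires if its holder crashes
        if r.set(lock_key, token, nx=True, ex=10):
            try:
                data = fetch_from_db(article_id)  # placeholder loader
                r.setex(key, 300, json.dumps(data))
                return data
            finally:
                r.eval(RELEASE_LOCK, 1, lock_key, token)
        time.sleep(0.05)  # someone else is rebuilding: wait, then re-check
    # Waited ~2.5 s without a value: load directly instead of recursing forever
    return fetch_from_db(article_id)
```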

Probabilistic Early Expiration

Refresh a hot key shortly before its TTL runs out, so that usually one request rebuilds it while the rest keep reading the cached value:

```python
import json
import random

def get_with_early_refresh(key, fetch_fn, base_ttl=300):
    cached = r.get(key)
    if cached is not None:
        ttl = r.ttl(key)  # seconds left; -1 = no expiry, -2 = key missing
        # In the last 60 s, each request has a 10% chance to refresh early
        if 0 <= ttl < 60 and random.random() < 0.1:
            fresh = fetch_fn()
            r.setex(key, base_ttl, json.dumps(fresh))
            return fresh
        return json.loads(cached)

    data = fetch_fn()
    r.setex(key, base_ttl, json.dumps(data))
    return data
```
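A fixed 60-second window with a 10% chance is easy to reason about, but under heavy load several requests can still refresh at once. An alternative is the XFetch rule from Vattani, Chierichetti and Lowenstein ("Optimal Probabilistic Cache Stampede Prevention", VLDB 2015): refresh when `now - delta * beta * ln(rand)` reaches the expiry time, where `delta` is how long the last recomputation took. The sketch assumes you store `delta` and the absolute expiry alongside the cached value; `should_refresh_early` is a hypothetical name.

```python
import math
import random
import time
from typing import Optional


def should_refresh_early(expiry: float, delta: float, beta: float = 1.0,
                         now: Optional[float] = None,
                         rand: Optional[float] = None) -> bool:
    """XFetch-style probabilistic early expiration.

    expiry: absolute time (seconds) at which the cached value expires
    delta:  duration of the last recomputation, in seconds
    beta:   > 1 refreshes earlier, < 1 later; 1.0 is the usual default
    """
    now = time.time() if now is None else now
    # 1 - random() lies in (0, 1], so log() never receives zero
    u = (1.0 - random.random()) if rand is None else rand
    # -log(u) >= 0 is an exponentially distributed head start; slow
    # recomputations (large delta) begin refreshing further ahead of expiry
    return now - delta * beta * math.log(u) >= expiry
```

Because the head start grows with `delta`, expensive values start refreshing earlier, and the probability of a refresh rises smoothly as expiry approaches instead of jumping at a fixed threshold.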
  

Negative Caching

Cache “not found” results to protect the database from repeated lookups for missing keys:

```python
def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached == "__NOT_FOUND__":   # cached miss: skip the database
        return None
    if cached is not None:
        return json.loads(cached)

    user = db.query(...)
    if user:
        r.setex(key, 3600, json.dumps(user))
    else:
        # Short TTL for misses, so a newly created user appears within a minute
        r.setex(key, 60, "__NOT_FOUND__")
    return user
```

Key Design

Use namespaced, predictable keys:

```
{service}:{entity}:{id}:{attribute}
app:users:1001:profile
app:products:42:details
app:cache:homepage:v3
```
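A hypothetical helper that builds keys in this shape keeps namespaces consistent across a codebase, which is what makes pattern scans such as `SCAN 0 MATCH app:users:*` reliable for bulk invalidation. Rejecting segments that contain the separator is a design assumption here, chosen so two different inputs can never produce the same key.

```python
def cache_key(service: str, entity: str, entity_id, attribute: str = "") -> str:
    """Build a key following {service}:{entity}:{id}:{attribute}.

    Colons separate segments, so a segment containing ':' would make keys
    ambiguous; reject it instead of silently producing a colliding key.
    """
    parts = [service, entity, str(entity_id)]
    if attribute:
        parts.append(attribute)
    for part in parts:
        if not part or ":" in part:
            raise ValueError(f"invalid key segment: {part!r}")
    return ":".join(parts)
```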
  

Best Practices

  1. Design cache keys with namespaces for safe bulk invalidation
  2. Add TTL jitter to prevent synchronized expiry
  3. Monitor hit ratio; for read-heavy workloads, a sustained ratio below 80% usually points to poorly chosen keys or TTLs
  4. Cache aggregates carefully — invalidation complexity grows fast
  5. Document which pattern (cache-aside, write-through) each entity uses

Common Mistakes

| Mistake | Impact |
|---|---|
| No TTL on keys | Memory exhaustion; stale data that never expires |
| Caching everything | Low hit ratio; memory wasted on rarely read keys |
| Same TTL for all keys | Stampede when keys written together expire together |
| Ignoring the cache on write paths | Stale reads after updates |
| Caching errors/exceptions | A transient failure is served to every user until the entry expires |

Troubleshooting

Hit ratio suddenly drops:

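```
INFO stats
# Compare keyspace_hits, keyspace_misses, evicted_keys and expired_keys against a baseline,
# then check for a recent deployment (cache flush?), TTL changes, or a traffic pattern shift

# Dev only: MONITOR streams every command and can sharply reduce throughput
MONITOR
```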

Database load unchanged after adding cache:

Verify that the cache is actually being hit by logging misses in the application. Also check whether keys contain per-request values (timestamps, request IDs), which make every key unique and bypass the cache.

Memory growing despite TTL:

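```
INFO keyspace
# Each db line reports keys= and expires=; if expires is well below keys,
# some keys have no TTL
```

```bash
# List keys matching a pattern that have no TTL (TTL returns -1).
# Runs one redis-cli call per key, so narrow the pattern on large keyspaces.
redis-cli --scan --pattern 'user:*' | while read -r key; do
  [ "$(redis-cli TTL "$key")" = "-1" ] && echo "$key"
done
```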

Performance Tips

  • Use pipelining for bulk cache warming after deploy
  • Prefer hashes for object caches with field-level invalidation needs
  • Set maxmemory-policy allkeys-lru for pure cache workloads
  • Warm critical caches before traffic spikes (flash sales, launches)
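For the pipelining tip, a minimal warming sketch assuming a redis-py client: `pipeline(transaction=False)` batches SETEX commands into fewer round trips without wrapping them in MULTI/EXEC, which bulk warming does not need. `warm_cache`, the batch size, and the JSON encoding are illustrative choices.

```python
import json
from typing import Any, Iterable, Tuple


def warm_cache(r, items: Iterable[Tuple[str, Any]], ttl: int = 3600,
               batch_size: int = 500) -> int:
    """Write (key, value) pairs with SETEX in pipelined batches.

    Returns the number of keys written. Batching bounds client memory
    and keeps any single round trip from blocking Redis for too long.
    """
    written = 0
    pipe = r.pipeline(transaction=False)
    for key, value in items:
        pipe.setex(key, ttl, json.dumps(value))
        written += 1
        if written % batch_size == 0:
            pipe.execute()  # one round trip per batch_size commands
    pipe.execute()  # flush the remainder (a no-op if nothing is queued)
    return written
```

In practice the TTL passed here would also carry jitter, so a freshly warmed cache does not expire all at once.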

Production Scenario

An e-commerce site cached product detail pages with cache-aside and 5-minute TTL plus 10% jitter. A flash sale on one SKU caused a stampede when TTL expired — database p99 spiked to 2 seconds. Adding lock-based recomputation and probabilistic early refresh reduced p99 to 45ms. Negative caching for discontinued products eliminated 30K daily DB queries for invalid SKUs.

Effective caching is measured, not assumed — track hit ratio, design invalidation explicitly, and plan for stampedes before they happen in production.