Why Redis Cluster?

Single-instance Redis is limited by one machine’s RAM and CPU. Cluster provides:

  • Horizontal sharding across 16,384 hash slots
  • Automatic failover when masters fail
  • Linear scale-out — add nodes to increase capacity
  • No single point of failure (with proper replica count)

Use Cluster when the dataset exceeds one node’s RAM, or when write throughput exceeds what a single CPU core can handle (Redis executes commands on one main thread).

Architecture

  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
  │  Master 1   │   │  Master 2   │   │  Master 3   │
  │    slots    │   │    slots    │   │    slots    │
  │   0-5460    │   │ 5461-10922  │   │ 10923-16383 │
  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
         │                 │                 │
  ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
  │   Replica   │   │   Replica   │   │   Replica   │
  └─────────────┘   └─────────────┘   └─────────────┘
  
  • 16384 hash slots distributed across master nodes
  • Each master should have ≥1 replica for failover
  • Minimum production setup: 3 masters + 3 replicas (6 nodes)
  • Minimum cluster: 3 masters (no replicas — not production-ready)
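
The slot ranges in the diagram follow from splitting the 16384 slots as evenly as possible across the masters. A minimal Python sketch of that arithmetic is shown here; the function name slot_ranges is hypothetical, and the rounding was chosen to reproduce the diagram rather than copied from redis-cli.

```python
def slot_ranges(masters: int, total_slots: int = 16384):
    """Split the slot space into contiguous, near-equal ranges.

    Illustrative only: returns a list of (first_slot, last_slot) tuples.
    """
    per_node = total_slots / masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(masters):
        last = round(cursor + per_node - 1)
        # The final master always absorbs whatever remains.
        if i == masters - 1 or last > total_slots - 1:
            last = total_slots - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


if __name__ == "__main__":
    for number, (first, last) in enumerate(slot_ranges(3), start=1):
        print(f"Master {number}: slots {first}-{last}")
```

For three masters this yields 0-5460, 5461-10922, and 10923-16383.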

Create a Cluster (Development)

  # Start 6 instances (ports 7000-7005)
for port in 7000 7001 7002 7003 7004 7005; do
  mkdir -p /tmp/redis-cluster/$port
  cat > /tmp/redis-cluster/$port/redis.conf <<EOF
port $port
cluster-enabled yes
cluster-config-file nodes-$port.conf
cluster-node-timeout 5000
appendonly yes
dir /tmp/redis-cluster/$port
daemonize yes
EOF
  redis-server /tmp/redis-cluster/$port/redis.conf
done

# Form cluster — 3 masters, 1 replica each
redis-cli --cluster create \
  127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1
  

Cluster Commands

  redis-cli -c -p 7000    # -c enables cluster mode (follows MOVED/ASK redirects)
CLUSTER INFO
CLUSTER NODES
CLUSTER SLOTS             # deprecated since Redis 7.0; CLUSTER SHARDS is the replacement
CLUSTER KEYSLOT mykey
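
CLUSTER NODES prints one line per node: node ID, address, flags, master ID, ping-sent, pong-received, config epoch, link state, and then the slot ranges the node owns. The Python sketch below parses one such line. NodeInfo and parse_node_line are illustrative names, and the sample line uses a made-up node ID.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NodeInfo:
    node_id: str
    address: str
    flags: List[str]
    master_id: Optional[str]          # None for masters ("-" in the output)
    link_state: str
    slots: List[Tuple[int, int]] = field(default_factory=list)


def parse_node_line(line: str) -> NodeInfo:
    parts = line.split()
    slots = []
    for token in parts[8:]:
        if token.startswith("["):     # migrating/importing marker, skipped here
            continue
        first, _, last = token.partition("-")
        slots.append((int(first), int(last or first)))
    return NodeInfo(
        node_id=parts[0],
        address=parts[1],
        flags=parts[2].split(","),
        master_id=None if parts[3] == "-" else parts[3],
        link_state=parts[7],
        slots=slots,
    )


# Hypothetical sample line; real node IDs are 40 hex characters.
SAMPLE = ("0123456789abcdef0123456789abcdef01234567 127.0.0.1:7000@17000 "
          "myself,master - 0 0 1 connected 0-5460")

if __name__ == "__main__":
    print(parse_node_line(SAMPLE))
```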
  

Key Hashing and Hash Tags

Each key maps to a slot as CRC16(key) mod 16384:

  CLUSTER KEYSLOT user:1001:name
# (integer) 9189

SET user:1001:name "Alice"    # routed to appropriate node
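
The mapping can be reproduced outside Redis. Redis Cluster uses the CRC16 variant known as XMODEM (polynomial 0x1021, initial value 0) and takes the result modulo 16384. A minimal Python sketch follows; the function names are illustrative, and hash tags are deliberately ignored here.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def slot_for(key: str) -> int:
    """Slot for a key that contains no hash tag."""
    return crc16_xmodem(key.encode("utf-8")) % HASH_SLOTS


if __name__ == "__main__":
    print(hex(crc16_xmodem(b"123456789")))
    print(slot_for("user:1001:name"))
```

The standard check value for this CRC variant is 0x31C3 for the input 123456789, which makes a convenient sanity test.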
  

Hash tags force related keys to the same slot — required for multi-key operations:

  SET user:{1001}:name "Alice"
SET user:{1001}:email "alice@example.com"
SET user:{1001}:cart "[...]"
# All keys with {1001} hash to same slot
  

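Only the text between the first { and the first } that follows it is hashed. If the braces are empty or there is no closing brace, the whole key is hashed as usual.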
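
The hash tag rule can be sketched as a small Python function. The name hashed_part is hypothetical; the behavior follows the hash tag rules of the Redis Cluster specification: the first { and the first } after it delimit the tag, and an empty tag means the whole key is hashed.

```python
def hashed_part(key: str) -> str:
    """Return the portion of the key that is fed into CRC16."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Require a closing brace and at least one character between the braces.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


if __name__ == "__main__":
    for key in ("user:{1001}:name", "foo{}{bar}", "foo{{bar}}zap", "plain"):
        print(key, "->", hashed_part(key))
```

Note the edge cases: foo{}{bar} is hashed as a whole because the first tag is empty, and foo{{bar}}zap hashes the substring {bar.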

Multi-Key Operations

These require all keys in the same slot:

  • MGET, MSET, DEL (multiple keys)
  • SUNION, SINTER, SDIFF
  • RENAME (source and destination)
  • Lua scripts touching multiple keys
  • Transactions (MULTI/EXEC)

Design key names with hash tags when atomic multi-key ops are needed.
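
When keys cannot all share a hash tag, a client can split one logical multi-key operation into per-slot batches. The sketch below assumes that Python's binascii.crc_hqx with an initial value of 0 matches the CRC16 variant Redis Cluster uses; key_slot and group_by_slot are illustrative names, not library APIs.

```python
from binascii import crc_hqx
from collections import defaultdict
from typing import Dict, Iterable, List


def key_slot(key: str) -> int:
    """Slot for a key, honoring a non-empty {hash tag} if present."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc_hqx(key.encode("utf-8"), 0) % 16384


def group_by_slot(keys: Iterable[str]) -> Dict[int, List[str]]:
    """Bucket keys so each bucket is safe for one multi-key command."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    keys = ["user:{1001}:name", "user:{1001}:email", "session:abc"]
    for slot, batch in group_by_slot(keys).items():
        print(slot, batch)
```

Each batch can then be sent as its own MGET or DEL. The batches are not atomic with respect to each other, so this suits bulk reads and deletes rather than transactions.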

Failover

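When a master becomes unreachable, one of its replicas is promoted automatically. A failover can also be triggered manually by running CLUSTER FAILOVER on the replica that should take over:

  CLUSTER FAILOVER              # manual graceful failover (run on a replica)
CLUSTER FAILOVER TAKEOVER     # promote without cluster agreement (dangerous)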

redis-cli --cluster check 127.0.0.1:7000
redis-cli --cluster info 127.0.0.1:7000
  

Quorum: a majority of masters must agree that a node is down before one of its replicas can be promoted, and the replica must then win votes from a majority of masters.
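
The majority rule explains why three masters is the practical minimum. A small Python sketch of the arithmetic, with illustrative function names:

```python
def majority(total_masters: int) -> int:
    """Smallest number of masters that forms a majority."""
    return total_masters // 2 + 1


def can_promote(total_masters: int, reachable_masters: int) -> bool:
    """True if enough masters remain to agree on a failure and elect a replica."""
    return reachable_masters >= majority(total_masters)


if __name__ == "__main__":
    for total in (2, 3, 5):
        survivors = total - 1
        print(f"{total} masters, 1 down: promotion possible = "
              f"{can_promote(total, survivors)}")
```

With 3 masters, losing one leaves 2 voters, which still meets the required majority of 2. With 2 masters, losing one leaves 1 voter against a required 2, so no replica can be promoted.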

Adding and Removing Nodes

  # Add new master
redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000

# Reshard slots to new node
redis-cli --cluster reshard 127.0.0.1:7000
# Follow prompts: source nodes, destination, slot count

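# Remove a node: first reshard all of its slots to other masters,
# then delete it by node ID (del-node refuses a master that still owns slots)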
redis-cli --cluster reshard 127.0.0.1:7000
redis-cli --cluster del-node 127.0.0.1:7006 <node-id>
  

Resharding migrates slots while the cluster stays online, but it adds load. For large datasets, schedule it in a maintenance window.
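
The reshard prompt asks how many slots to move. One way to pick that number when adding a single empty master is to aim for an equal share per node. A Python sketch under that assumption (slots_to_move is a hypothetical helper, not part of redis-cli):

```python
from typing import Dict


def slots_to_move(current: Dict[str, int], total_slots: int = 16384) -> Dict[str, int]:
    """Slots each existing master should give up to one new, empty master.

    current maps an existing master's name to the number of slots it owns.
    """
    target = total_slots // (len(current) + 1)
    return {name: max(owned - target, 0) for name, owned in current.items()}


if __name__ == "__main__":
    plan = slots_to_move({"master1": 5461, "master2": 5462, "master3": 5461})
    print(plan, "total:", sum(plan.values()))
```

For the three-master layout with 5461, 5462, and 5461 slots, this suggests moving 1365, 1366, and 1365 slots, 4096 in total.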

Client Requirements

Clients must support cluster mode — handle MOVED and ASK redirects:

# Python (redis-py)
from redis.cluster import RedisCluster

rc = RedisCluster(
    host="127.0.0.1",
    port=7000,
    decode_responses=True
)
rc.set("key", "value")
rc.get("key")
  
// Node.js (ioredis)
const Redis = require('ioredis');
const cluster = new Redis.Cluster([
  { host: '127.0.0.1', port: 7000 },
  { host: '127.0.0.1', port: 7001 },
]);
  

A non-cluster client does not follow redirects, so any command for a key held by another node fails with a MOVED error.
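
A redirect arrives as an error reply of the form MOVED <slot> <host>:<port> or ASK <slot> <host>:<port>. MOVED means the slot now lives on the named node and the client should update its slot map. ASK applies to the next request only, which the client sends to the named node after an ASKING command. The Python sketch below shows the parsing step a cluster-aware client performs; Redirect and parse_redirect are illustrative names.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str   # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Return a Redirect for MOVED/ASK errors, or None for anything else."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    # rpartition keeps hosts that themselves contain colons intact.
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


if __name__ == "__main__":
    print(parse_redirect("MOVED 3999 127.0.0.1:6381"))
    print(parse_redirect("ERR unknown command"))
```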

Limitations

Limitation                        Workaround
Multi-key ops need same slot      Hash tags {id}
No cross-slot transactions        Redesign keys or use Lua with same slot
Pub/Sub broadcasts to all nodes   Sharded Pub/Sub (Redis 7+)
SELECT database not supported     Use key prefixes instead
KEYS across cluster               Scan each node individually

Cluster vs Sentinel

Feature            Cluster         Sentinel
Sharding           Yes             No
HA failover        Yes             Yes
Complexity         Higher          Lower
Min nodes (prod)   6               3 Sentinel + 1 master + replicas
Max RAM            Cluster total   Single node

Choose Cluster for scale + HA. Choose Sentinel for HA on a single large instance that fits in RAM.

Best Practices

  1. Run 3 masters + 3 replicas minimum in production
  2. Use hash tags deliberately for related multi-key data
  3. Deploy nodes across availability zones
  4. Monitor slot coverage — all 16384 slots must be assigned
  5. Test failover quarterly
  6. Use cluster-aware clients — verify library cluster support

Common Mistakes

Mistake                           Impact
3 nodes without replicas          No failover; node death loses slot range
Multi-key ops without hash tags   CROSSSLOT errors
Non-cluster client library        MOVED errors, app failures
Uneven slot distribution          Hot nodes, imbalanced load
Resharding during peak traffic    Latency spikes

Troubleshooting

CLUSTERDOWN errors:

  CLUSTER INFO
# cluster_state:fail — not all slots covered
CLUSTER NODES
# Find failed nodes, restore or failover
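
The CLUSTER INFO reply is a list of field:value lines, so the coverage check can be automated. A Python sketch follows; parse_cluster_info and coverage_problems are hypothetical helper names, and the sample reply is illustrative rather than captured from a real cluster.

```python
def parse_cluster_info(raw: str) -> dict:
    """Turn the CLUSTER INFO text reply into a dict of strings."""
    info = {}
    for line in raw.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info


def coverage_problems(info: dict, total_slots: int = 16384) -> list:
    """List human-readable reasons the cluster may be refusing requests."""
    problems = []
    if info.get("cluster_state") != "ok":
        problems.append("cluster_state is " + info.get("cluster_state", "missing"))
    assigned = int(info.get("cluster_slots_assigned", 0))
    if assigned < total_slots:
        problems.append(f"{total_slots - assigned} slots unassigned")
    for name in ("cluster_slots_pfail", "cluster_slots_fail"):
        if int(info.get(name, 0)) > 0:
            problems.append(f"{name} = {info[name]}")
    return problems


# Illustrative reply for a cluster that lost one of three slot ranges.
SAMPLE = ("cluster_state:fail\r\ncluster_slots_assigned:10923\r\n"
          "cluster_slots_ok:10923\r\ncluster_slots_pfail:0\r\n"
          "cluster_slots_fail:0\r\n")

if __name__ == "__main__":
    for problem in coverage_problems(parse_cluster_info(SAMPLE)):
        print(problem)
```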
  

CROSSSLOT errors:

  # Keys in different slots — add hash tags or split operations
CLUSTER KEYSLOT key1
CLUSTER KEYSLOT key2
  

Uneven memory across nodes:

  redis-cli --cluster check 127.0.0.1:7000
# Reshard slots from overloaded nodes
  

Performance Tips

  • Batch operations to same slot via hash tags and pipelining
  • Keep values similar size across slots for even memory use
  • Set cluster-node-timeout appropriately (default 15000 ms): lower values give faster failover but more false positives
  • Use replicas for read scaling with READONLY + replica reads (client support required)

Production Scenario

A social media feed service sharded 180 GB of timeline data across 6 Redis Cluster nodes (3 masters, 3 replicas) in three AZs. The hash tag {user_id} colocated each user's profile, feed, and follower-count keys. Over six months of growth, resharding added two masters without downtime. In a failover test, killing a master promoted its replica in 6 seconds, and client retry logic handled the brief errors. Monitoring tracked per-node memory, slot distribution, and cluster_stats_messages_sent to spot imbalance.

Redis Cluster scales memory and throughput horizontally — design key patterns with hash tags from day one to avoid painful migrations later.