Redis Sentinel High Availability
What Sentinel Provides
Redis Sentinel monitors master-replica topologies and performs automatic failover when the master is unreachable. Unlike Cluster, Sentinel does not shard data — it provides HA for a single master with replicas.
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Sentinel 1  │     │ Sentinel 2  │     │ Sentinel 3  │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │ monitor           │                   │
       └──────────┬────────┴───────────────────┘
                  ▼
           ┌─────────────┐         ┌─────────────┐
           │   Master    │ ──────► │   Replica   │
           │  (writes)   │  repl   │   (reads)   │
           └─────────────┘         └─────────────┘
                  │
                  ▼
           ┌─────────────┐
           │   Replica   │
           └─────────────┘
Use Sentinel when:
- Data fits on one node
- You need automatic failover
- Sharding (Cluster) is not yet required
Sentinel Responsibilities
| Function | Description |
|---|---|
| Monitoring | Check master/replica health via PING |
| Notification | Alert admins or scripts on state changes |
| Automatic failover | Promote best replica to master |
| Configuration provider | Clients ask Sentinel for current master address |
Minimum Production Topology
| Component | Count | Notes |
|---|---|---|
| Sentinel | 3+ (odd) | A failover must be authorized by a majority of Sentinels |
| Master | 1 | Accepts writes |
| Replicas | ≥2 | Failover candidates, read scaling |
Deploy Sentinels on different machines/AZs from the Redis nodes they monitor.
Sentinel Configuration
/etc/redis/sentinel.conf:
port 26379
sentinel monitor mymaster 10.0.1.10 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
sentinel auth-pass mymaster YourRedisPassword
# Notification script (optional)
sentinel notification-script mymaster /var/redis/notify.sh
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
Key parameters:
| Parameter | Meaning |
|---|---|
| monitor mymaster <ip> <port> <quorum> | Master to watch; quorum = min Sentinels to agree on failure |
| down-after-milliseconds | Subjective downtime before marking SDOWN |
| failover-timeout | Max time for failover steps |
| parallel-syncs | Replicas syncing from new master simultaneously |
Quorum = 2 with 3 Sentinels means 2 must agree the master is down.
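The quorum only governs failure detection. To actually start a failover, a Sentinel must also be elected leader by a majority of all known Sentinels. The following is a minimal sketch of that arithmetic; the function name `sentinel_tolerance` and its return keys are illustrative and not part of Redis.

```python
def sentinel_tolerance(total_sentinels: int, quorum: int) -> dict:
    """Summarize how many Sentinel failures a deployment can absorb."""
    if not 1 <= quorum <= total_sentinels:
        raise ValueError("quorum must be between 1 and total_sentinels")
    # Majority of all known Sentinels, needed to elect a failover leader.
    majority = total_sentinels // 2 + 1
    # A failover needs ODOWN (quorum) and an elected leader (majority).
    needed = max(quorum, majority)
    return {
        "majority": majority,
        "needed_for_failover": needed,
        "sentinel_failures_tolerated": total_sentinels - needed,
    }


print(sentinel_tolerance(3, 2))
# {'majority': 2, 'needed_for_failover': 2, 'sentinel_failures_tolerated': 1}
print(sentinel_tolerance(2, 2))
# {'majority': 2, 'needed_for_failover': 2, 'sentinel_failures_tolerated': 0}
```

With two Sentinels the tolerated failure count is zero, which is why three is the practical minimum.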
Start Sentinel
redis-sentinel /etc/redis/sentinel.conf
# Or: redis-server /etc/redis/sentinel.conf --sentinel
# Verify
redis-cli -p 26379 SENTINEL master mymaster
redis-cli -p 26379 SENTINEL replicas mymaster
redis-cli -p 26379 SENTINEL sentinels mymaster
Master-Replica Redis Setup
Master redis.conf:
bind 0.0.0.0
requirepass YourRedisPassword
masterauth YourRedisPassword
appendonly yes
Replica redis.conf:
replicaof 10.0.1.10 6379
requirepass YourRedisPassword
masterauth YourRedisPassword
replica-read-only yes
appendonly yes
Verify replication:
redis-cli -a YourRedisPassword INFO replication
# role:master / role:slave
# connected_slaves:2
Failover Process
- Master fails PING checks → marked subjectively down (SDOWN) by one Sentinel
- The configured quorum of Sentinels agrees → objectively down (ODOWN)
- A Sentinel leader is elected by a majority of Sentinels to run the failover
- Best replica selected (priority, offset, run ID)
- Replica promoted to master (REPLICAOF NO ONE)
- Other replicas reconfigured to new master
- Old master reconfigured as replica when it returns
# Manual failover (graceful maintenance)
redis-cli -p 26379 SENTINEL failover mymaster
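The selection step in the list above (priority, offset, run ID) can be sketched as a sort key. This is an illustration of the ordering only, assuming the replicas have already passed Sentinel's health checks; `ReplicaInfo` and `pick_promotion_candidate` are hypothetical names, not a Redis or redis-py API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReplicaInfo:
    name: str
    priority: int     # replica-priority: lower is preferred, 0 = never promote
    repl_offset: int  # higher means more data received from the old master
    run_id: str       # final tie-breaker, lexicographically smallest wins


def pick_promotion_candidate(replicas: Iterable[ReplicaInfo]) -> Optional[ReplicaInfo]:
    """Return the replica that would be preferred, or None if none is eligible."""
    eligible = [r for r in replicas if r.priority != 0]
    if not eligible:
        return None
    # Lowest priority first, then highest offset, then smallest run ID.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


candidates = [
    ReplicaInfo("replica-a", priority=100, repl_offset=500, run_id="b1"),
    ReplicaInfo("replica-b", priority=10, repl_offset=400, run_id="c2"),
    ReplicaInfo("replica-c", priority=0, repl_offset=900, run_id="a0"),
]
print(pick_promotion_candidate(candidates).name)  # replica-b
```

Note that replica-b wins on priority even though it holds less data than replica-a, so a low priority value on a lagging node can cost more writes during failover.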
Client Integration
Applications must use Sentinel-aware clients — not hardcoded master IP.
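# Python (redis-py)
from redis.sentinel import Sentinel

# List every Sentinel so discovery still works if one is down
sentinel = Sentinel([
    ('10.0.1.21', 26379),
    ('10.0.1.22', 26379),
    ('10.0.1.23', 26379),
], socket_timeout=0.5)

# Both clients ask Sentinel for the current address of 'mymaster'
master = sentinel.master_for('mymaster', password='YourRedisPassword')
replica = sentinel.slave_for('mymaster', password='YourRedisPassword')

master.set('key', 'value')  # writes go to the current master
replica.get('key')          # reads from a replica may lag behind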
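// Node.js (ioredis)
const Redis = require('ioredis');

// ioredis queries the listed Sentinels and connects to the current master
const redis = new Redis({
  sentinels: [
    { host: '10.0.1.21', port: 26379 },
    { host: '10.0.1.22', port: 26379 },
    { host: '10.0.1.23', port: 26379 },
  ],
  name: 'mymaster',               // master name from sentinel.conf
  password: 'YourRedisPassword',  // password for the Redis nodes
});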
Clients subscribe to Sentinel pub/sub channel +switch-master for topology updates.
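A +switch-master message carries the master name followed by the old and new address: `<master-name> <old-ip> <old-port> <new-ip> <new-port>`. The sketch below parses that payload into a typed record; `SwitchMaster` and `parse_switch_master` are illustrative names, and 10.0.1.11 is an assumed replica address used only for the example.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_host: str
    old_port: int
    new_host: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse the space-separated payload of a +switch-master event."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"unexpected +switch-master payload: {payload!r}")
    name, old_host, old_port, new_host, new_port = parts
    return SwitchMaster(name, old_host, int(old_port), new_host, int(new_port))


event = parse_switch_master("mymaster 10.0.1.10 6379 10.0.1.11 6379")
print(f"{event.master_name}: now at {event.new_host}:{event.new_port}")
```

A Sentinel-aware client handles the reconnect itself, so a parser like this is mainly useful for alerting or audit hooks.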
Read Scaling with Replicas
Route read-only queries to replicas:
user = replica.get(f"user:{user_id}") # eventual consistency
master.set(f"user:{user_id}", data) # writes always to master
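Account for replication lag: replication is asynchronous, so a replica read issued immediately after a write may return the old value. Send reads that must see the latest write to the master.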
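One way to handle this is to pin reads of recently written keys to the master for a short window. The following is a rough sketch under stated assumptions: `LagAwareRouter` is a hypothetical helper, the one-second window is an arbitrary placeholder, and `master` and `replica` are any objects with `get` and `set` methods (such as the clients returned by `master_for` and `slave_for`). It tracks writes per process only, so it does not help when another process wrote the key.

```python
import time


class LagAwareRouter:
    """Route reads to the replica unless the key was written very recently."""

    def __init__(self, master, replica, lag_window_s=1.0, clock=time.monotonic):
        self._master = master
        self._replica = replica
        self._lag_window_s = lag_window_s
        self._clock = clock
        self._last_write = {}  # key -> time of the most recent local write

    def set(self, key, value):
        # Writes always go to the master; remember when this key changed.
        self._last_write[key] = self._clock()
        return self._master.set(key, value)

    def get(self, key):
        written_at = self._last_write.get(key)
        recently_written = (
            written_at is not None
            and self._clock() - written_at < self._lag_window_s
        )
        # Within the window the replica may not have the write yet.
        source = self._master if recently_written else self._replica
        return source.get(key)
```

The `_last_write` map grows with the number of distinct keys written, so a long-running service would want to expire old entries.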
Split-Brain Prevention
Never allow two writable masters accepting the same data:
- Sentinel ensures only one master at a time
- Use min-replicas-to-write on master:
min-replicas-to-write 1
min-replicas-max-lag 10
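The master stops accepting writes unless at least 1 replica is connected and its last acknowledgement is no more than 10 seconds old. This limits how long an isolated old master can keep taking writes that would be lost after failover.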
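The rule behind these two settings can be modeled in a few lines. This is a simplified illustration, not Redis source code; `master_accepts_writes` is a hypothetical name, and each lag value stands for the seconds since a replica last acknowledged the master.

```python
from typing import Iterable


def master_accepts_writes(
    replica_lags_s: Iterable[float],
    min_replicas: int = 1,
    max_lag_s: float = 10,
) -> bool:
    """Model of min-replicas-to-write / min-replicas-max-lag."""
    # Only replicas that acknowledged recently enough count as healthy.
    healthy = sum(1 for lag in replica_lags_s if lag <= max_lag_s)
    return healthy >= min_replicas


print(master_accepts_writes([0, 3]))    # True: two healthy replicas
print(master_accepts_writes([11, 25]))  # False: both replicas lag too far
print(master_accepts_writes([]))        # False: no replicas connected
```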
Best Practices
- Run 3+ Sentinels on independent failure domains
- Use odd Sentinel count for clear quorum
- Set down-after-milliseconds based on network tolerance (5–30s)
- Test failover quarterly and measure client recovery time
- Monitor Sentinel logs and +switch-master events
- Use replica-priority to prefer specific nodes for promotion
# On preferred failover replica (lower value is promoted first)
replica-priority 10
# On weaker replica (promoted only if no lower value is available; 0 = never promote)
replica-priority 100
Common Mistakes
| Mistake | Impact |
|---|---|
| 2 Sentinels only | No majority if one Sentinel fails, so failover cannot be authorized |
| Hardcoded master IP in app | Failover breaks application |
| All Sentinels on same host | Single point of failure |
| No min-replicas-to-write | Writes accepted with zero replicas; data loss on crash |
| Ignoring replication lag on replica reads | Stale data bugs |
Troubleshooting
Failover not triggering:
redis-cli -p 26379 SENTINEL master mymaster
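# Check flags (s_down / o_down), num-other-sentinels, and quorum
redis-cli -p 26379 SENTINEL ckquorum mymaster
# Internal Sentinel-to-Sentinel command: also takes <current-epoch> and <runid> ('*' = query state only)
redis-cli -p 26379 SENTINEL is-master-down-by-addr 10.0.1.10 6379 0 '*'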
Split brain after network partition:
# Verify only one master
redis-cli -h <each-node> INFO replication
# Manually: SENTINEL reset mymaster (careful — last resort)
Client connection errors during failover:
# Expected briefly (5–30s) — ensure client retry/backoff configured
# Check sentinel client-reconfig-script logs
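A retry wrapper with exponential backoff and jitter is one way to ride out the brief error window. The helper below is a generic sketch: `call_with_backoff` and its defaults are assumptions chosen for illustration, and the delays should be tuned so the total retry time exceeds your measured failover duration.

```python
import random
import time


def call_with_backoff(
    operation,
    retry_on=(ConnectionError, TimeoutError),
    attempts=6,
    base_delay_s=0.2,
    max_delay_s=5.0,
    sleep=time.sleep,
):
    """Call operation(), retrying listed exceptions with jittered backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Exponential growth, capped, with full jitter to avoid
            # every client reconnecting at the same instant.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

With redis-py, pass `retry_on=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)`, because those classes do not derive from the Python built-ins used as defaults here. A call might then look like `call_with_backoff(lambda: master.set('key', 'value'), retry_on=...)`.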
Performance Tips
- Keep Sentinel overhead minimal — Sentinels are lightweight
- Use parallel-syncs 1 during failover to avoid replica sync overwhelming network
- Place replicas in same region as master for low replication lag
- Monitor master_repl_offset vs slave_repl_offset gap
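The offset gap can be computed from the master's own INFO replication output, which lists each replica's acknowledged offset on its `slaveN:` line. This sketch uses that per-replica `offset=` field rather than querying `slave_repl_offset` on every replica; `replica_offset_gaps` is a hypothetical helper and the sample addresses and offsets are made up for illustration.

```python
SAMPLE_INFO = """
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.1.11,port=6379,state=online,offset=1000,lag=0
slave1:ip=10.0.1.12,port=6379,state=online,offset=940,lag=1
master_repl_offset:1000
"""


def replica_offset_gaps(info_text: str) -> dict:
    """Return {replica address: bytes behind master} from INFO replication."""
    master_offset = None
    replica_offsets = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and "offset=" in value:
            # slaveN lines hold comma-separated key=value pairs.
            fields = dict(pair.split("=", 1) for pair in value.split(","))
            address = f"{fields['ip']}:{fields['port']}"
            replica_offsets[address] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found in INFO output")
    return {addr: master_offset - off for addr, off in replica_offsets.items()}


print(replica_offset_gaps(SAMPLE_INFO))
# {'10.0.1.11:6379': 0, '10.0.1.12:6379': 60}
```

The gap is measured in bytes of replication stream, not seconds, so alert thresholds depend on your write volume.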
Production Scenario
A payment gateway ran Redis master + 2 replicas with 3 Sentinels across 3 AZs. Quarterly failover drill: SENTINEL failover mymaster completed in 12 seconds. Application Sentinel client reconnected automatically. min-replicas-to-write 1 prevented writes during a replica outage. Read-heavy fraud checks routed to replicas with 50ms lag tolerance. PagerDuty alert on +switch-master pub/sub hook triggered runbook verification within 2 minutes.
Sentinel delivers production HA without Cluster complexity — deploy 3 Sentinels, configure clients correctly, and test failover before you need it at 3 AM.