Persistence (RDB and AOF)
Persistence Overview
Redis stores data in memory for speed. RDB and AOF persist data to disk for recovery after restarts, crashes, or planned maintenance. Without persistence, a restart yields an empty database.
| Method | How It Works | Trade-off |
|---|---|---|
| RDB | Point-in-time snapshots | Fast recovery, may lose recent writes |
| AOF | Append every write to log | More durable, larger files |
| Both | RDB + AOF combined | Best balance for production |
| None | Pure in-memory | Fastest, rebuild from primary DB |
Choose based on acceptable data loss window (RPO — Recovery Point Objective).
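As a rough illustration of an RPO-driven choice, the table above can be encoded as a small decision helper. The function name `choose_persistence` and the 1-second and 300-second cut-offs are assumptions made for this sketch, not Redis defaults or an official rule.

```python
def choose_persistence(rpo_seconds: float, rebuildable: bool) -> str:
    """Suggest a persistence mode from the trade-off table.

    rpo_seconds: acceptable data-loss window in seconds (assumed input).
    rebuildable: True if all data can be rebuilt from a primary database.
    The cut-offs below are illustrative assumptions, not Redis settings.
    """
    if rpo_seconds < 0:
        raise ValueError("rpo_seconds must be non-negative")
    if rebuildable:
        return "none"        # pure cache: nothing has to survive a restart
    if rpo_seconds < 1:
        return "aof-always"  # fsync on every write
    if rpo_seconds < 300:
        return "both"        # AOF bounds the loss, RDB speeds up restarts
    return "rdb"             # snapshot interval fits inside the loss window
```

For example, `choose_persistence(5, rebuildable=False)` suggests `"both"`, which matches the production recommendation in the table.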
RDB (Redis Database Backup)
RDB creates compact binary snapshots at configured intervals or on demand.
# redis.conf (keep comments on their own lines; Redis rejects trailing comments after a directive)
# save if ≥1 change in 900 seconds
save 900 1
# save if ≥10 changes in 300 seconds
save 300 10
# save if ≥10000 changes in 60 seconds
save 60 10000
dbfilename dump.rdb
dir /var/lib/redis
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
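The save points combine a time window with a change count, and a snapshot is due as soon as any one rule is satisfied. A minimal sketch of that rule follows; `snapshot_due` and its parameters are hypothetical names, and the comparison is a simplification of whatever Redis does internally.

```python
from typing import Iterable, Tuple


def snapshot_due(
    save_points: Iterable[Tuple[int, int]],
    seconds_since_save: int,
    changes_since_save: int,
) -> bool:
    """Return True if any (seconds, min_changes) save point is satisfied."""
    return any(
        seconds_since_save >= seconds and changes_since_save >= min_changes
        for seconds, min_changes in save_points
    )


# The three save points from the configuration above.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]
```

With these rules, a burst of 10000 writes triggers a snapshot after about a minute, while a single write waits up to 15 minutes.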
Manual RDB Operations
SAVE # blocking snapshot — avoid in production
BGSAVE # fork child process, non-blocking
LASTSAVE # timestamp of last successful save
INFO persistence
BGSAVE forks the process — on large datasets, fork latency can cause momentary pauses (see latest_fork_usec in INFO).
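To watch fork latency, the `latest_fork_usec` field can be read from INFO output (it is reported in the stats section). The sketch below parses raw INFO text; the helper names and the 100 ms threshold are assumptions chosen for illustration, not recommended values.

```python
def parse_info(text: str) -> dict:
    """Parse `key:value` lines from INFO output, skipping blanks and `# Section` headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def fork_is_slow(info_text: str, threshold_usec: int = 100_000) -> bool:
    """Return True if the last fork took longer than threshold_usec (assumed threshold)."""
    fields = parse_info(info_text)
    return int(fields.get("latest_fork_usec", 0)) > threshold_usec
```

The input would typically be the text printed by `redis-cli INFO stats`.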
AOF (Append Only File)
AOF logs every write command. On restart, Redis replays the log to reconstruct state.
appendonly yes
appendfilename "appendonly.aof"
# fsync once per second (recommended default); keep comments on their own line
appendfsync everysec
# appendfsync always # fsync every write — safest, slowest
# appendfsync no # OS decides — fastest, riskiest
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-use-rdb-preamble yes
fsync Policy Comparison
| appendfsync | Durability | Performance |
|---|---|---|
| always | Lose at most 1 write if crash during fsync | Slowest writes |
| everysec | Lose up to ~1 second of writes | Recommended |
| no | OS buffer may lose seconds of data | Fastest, risky |
Use everysec for most production workloads. Use always only when every write must survive an immediate crash.
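The difference between the three policies comes down to when `fsync` is called after a write. The toy append-only log below illustrates this. It is not Redis's implementation: the class name `AppendLog` is hypothetical, and the everysec check runs inline on each append, whereas Redis performs it in the background.

```python
import os
import time


class AppendLog:
    """Toy append-only log that mimics the three appendfsync policies."""

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fsync_count = 0
        self._last_fsync = time.monotonic()
        self._file = open(path, "ab")

    def append(self, command: bytes) -> None:
        self._file.write(command + b"\n")
        self._file.flush()  # hand the bytes to the OS page cache
        now = time.monotonic()
        due = self.policy == "always" or (
            self.policy == "everysec" and now - self._last_fsync >= 1.0
        )
        if due:
            os.fsync(self._file.fileno())  # force the data to stable storage
            self.fsync_count += 1
            self._last_fsync = now
        # policy "no": never fsync here; the OS flushes on its own schedule

    def close(self) -> None:
        self._file.close()
```

With `always`, every append pays for one `fsync`. With `everysec`, a crash can lose whatever was written since the last sync. With `no`, the window depends entirely on the operating system.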
AOF Rewrite
The AOF grows without bound as commands accumulate. A rewrite compacts the log to the minimum set of commands needed to rebuild the current state.
BGREWRITEAOF # manual rewrite
INFO persistence
# aof_rewrite_in_progress, aof_last_rewrite_time_sec
An automatic rewrite triggers once the AOF has grown by the configured percentage since the last rewrite (100 means it has doubled) and is larger than the configured minimum size:
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-use-rdb-preamble yes embeds an RDB snapshot at the start of the AOF file for faster restarts.
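The interaction between `auto-aof-rewrite-percentage` and `auto-aof-rewrite-min-size` can be sketched as a single predicate. The function name and argument names are hypothetical, and `base_size` stands for the AOF size right after the previous rewrite.

```python
MB = 1024 * 1024


def rewrite_due(
    current_size: int,
    base_size: int,
    percentage: int = 100,
    min_size: int = 64 * MB,
) -> bool:
    """Return True if an automatic AOF rewrite should start.

    percentage == 0 disables automatic rewrites; files at or below
    min_size are never rewritten automatically, however fast they grow.
    """
    if percentage == 0:
        return False
    if current_size <= min_size:
        return False
    base = base_size or 1  # avoid division by zero before the first rewrite
    growth = current_size * 100 // base - 100
    return growth >= percentage
```

With the defaults above, a 100 MB AOF is rewritten when it reaches 200 MB, while a 10 MB file that grows to 60 MB is left alone because it is still under the 64 MB floor.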
Hybrid Persistence (Recommended)
save 900 1
save 300 10
appendonly yes
appendfsync everysec
aof-use-rdb-preamble yes
RDB provides fast cold starts; AOF limits data loss between snapshots.
Recovery Procedures
Normal Restart
Redis automatically loads persistence files from dir on startup:
1. If AOF is enabled (appendonly yes) and an AOF exists → load AOF (with RDB preamble if present)
2. Else if RDB exists → load RDB
3. Else → empty database
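The load order above can be sketched as a lookup over the data directory. This is a simplified model rather than Redis's startup code: `startup_source` is a hypothetical helper, the file names are the ones used in this document, and the `aof_enabled` flag is an assumption standing in for the `appendonly` setting.

```python
from pathlib import Path


def startup_source(
    data_dir: str,
    aof_enabled: bool = True,
    aof_name: str = "appendonly.aof",
    rdb_name: str = "dump.rdb",
) -> str:
    """Return which file a restart would load: "aof", "rdb", or "empty"."""
    base = Path(data_dir)
    if aof_enabled and (base / aof_name).exists():
        return "aof"  # the AOF is preferred because it is usually more complete
    if (base / rdb_name).exists():
        return "rdb"
    return "empty"
```

Calling `startup_source("/var/lib/redis")` before a planned restart gives a quick indication of which file the recovery will depend on.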
Corrupted AOF
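redis-check-aof --fix appendonly.aof
# On Redis 7+, the AOF is split into several files under appendonlydir/; pass the manifest file there instead
# Review truncated content before restarting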
redis-server /etc/redis/redis.conf
Corrupted RDB
redis-check-rdb dump.rdb
# If unrecoverable — restore from backup
Point-in-Time Recovery
Maintain off-site RDB copies:
# Cron job — copy after BGSAVE
cp /var/lib/redis/dump.rdb /backups/dump-$(date +%Y%m%d-%H%M).rdb
For finer granularity, ship AOF to object storage with replication.
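The cron copy above can be extended with a retention limit so that old snapshots do not pile up. The sketch below is one possible approach under stated assumptions: `backup_rdb` and the `keep` parameter are hypothetical, and the destination is a local directory standing in for whatever off-site storage is used.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_rdb(src: str, dest_dir: str, now: datetime, keep: int = 24) -> Path:
    """Copy an RDB file to a timestamped name and keep only the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"dump-{now:%Y%m%d-%H%M}.rdb"
    tmp = target.with_name(target.name + ".tmp")
    shutil.copy2(src, tmp)  # copy under a temporary name first
    tmp.replace(target)     # then rename, so readers never see a partial file
    backups = sorted(dest.glob("dump-*.rdb"))  # timestamped names sort chronologically
    for old in backups[:-keep]:
        old.unlink()
    return target
```

A call such as `backup_rdb("/var/lib/redis/dump.rdb", "/backups", datetime.now())` produces the same file name pattern as the `cp` command above.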
Disable Persistence (Pure Cache)
Valid when Redis is a cache layer and all data can be rebuilt from the primary database:
save ""
appendonly no
Document this decision — teams sometimes forget and expect durability.
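One way to make the decision visible is to check configuration files automatically. The sketch below classifies a redis.conf text as `none`, `rdb`, `aof`, or `both`. The function name is hypothetical and the parser is deliberately simplified: it looks only at `save` and `appendonly`, and assumes that a file with no `save` directive falls back to Redis's built-in save points.

```python
def persistence_mode(conf_text: str) -> str:
    """Classify a redis.conf text as "none", "rdb", "aof", or "both"."""
    rdb = True   # assumed: with no save directive, built-in save points apply
    aof = False  # appendonly defaults to no
    for raw in conf_text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and full-line comments
        name, args = parts[0].lower(), parts[1:]
        if name == "save":
            # `save ""` disables snapshots; any other save line enables them
            rdb = args not in (['""'], ["''"])
        elif name == "appendonly" and args:
            aof = args[0].lower() == "yes"
    if rdb and aof:
        return "both"
    if aof:
        return "aof"
    if rdb:
        return "rdb"
    return "none"
```

Running this over every instance's configuration in CI would flag a session store that was accidentally deployed with the pure-cache settings.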
Replication as Durability Layer
Even with local persistence, replicate to at least one replica so that a copy of the data exists off-node:
Primary (AOF) → Replica (AOF) → Backup from replica
Backups from replicas avoid impacting primary write latency.
Best Practices
- Use both RDB and AOF for production data stores
- Set appendfsync everysec unless strict durability is required
- Test recovery quarterly; untested backups fail in incidents
- Monitor rdb_last_bgsave_status and aof_last_write_status
- Store backups off-node (S3, GCS) with a retention policy
- Run backups from replicas, not primary
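The monitoring item can be sketched as a check over raw `INFO persistence` text. The function name `persistence_alerts` is hypothetical, and how the alerts are delivered (pager, dashboard) is left out.

```python
from typing import List

WATCHED_FIELDS = ("rdb_last_bgsave_status", "aof_last_write_status")


def persistence_alerts(info_text: str) -> List[str]:
    """Return one entry per watched status field whose value is not "ok"."""
    alerts = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in WATCHED_FIELDS and value != "ok":
            alerts.append(f"{key}={value}")
    return alerts
```

An empty list means both the last background save and the last AOF write succeeded.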
Common Mistakes
| Mistake | Impact |
|---|---|
| No persistence on session store | Mass logout after restart |
| SAVE in production cron | Blocks all clients |
| AOF on network filesystem (NFS) | fsync failures, corruption |
| Never testing restore | Discover corrupt backups during outage |
| stop-writes-on-bgsave-error no | Silent data loss when disk full |
Troubleshooting
BGSAVE failing:
INFO persistence
# rdb_last_bgsave_status:err
tail /var/log/redis/redis-server.log
# Common: disk full, permission denied, fork failed (overcommit)
sysctl vm.overcommit_memory
# Redis recommends vm.overcommit_memory = 1 so that fork does not fail under memory pressure
AOF rewrite blocking writes:
INFO persistence
# aof_rewrite_in_progress:1
# Normal during rewrite — monitor duration
CONFIG GET auto-aof-rewrite-min-size
Slow restart after crash:
# Large AOF — ensure aof-use-rdb-preamble yes
# Or: redis-check-aof --fix, then BGREWRITEAOF on clean start
ls -lh /var/lib/redis/
Performance Tips
- Schedule RDB snapshots during low-traffic windows if fork latency matters
- Use SSD for AOF/RDB directory — disk I/O affects fsync latency
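- Set no-appendfsync-on-rewrite yes only if skipping fsync while a rewrite or BGSAVE is running is acceptable; a crash in that window can lose more than the usual one second of everysec writes
- Size auto-aof-rewrite-min-size to avoid too-frequent rewrites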
Production Scenario
A fintech app stored payment session tokens in Redis with AOF everysec and hourly RDB backups to S3. During an AZ outage the primary failed, and Sentinel promoted a replica that was missing 0.8 seconds of unreplicated writes (within RPO). Each quarter the team runs a recovery drill: restore an RDB backup to a fresh instance, verify key count and sample data, and document the recovery time (target: < 15 minutes). Pure cache instances run with persistence disabled and are rebuilt automatically on cold start.
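The drill's checks (key count, sample data, recovery time) can be captured in a small verification helper. Everything here is a hypothetical sketch: the function name, its arguments, and the plain dictionaries standing in for values read from the restored instance.

```python
from typing import Dict, List


def verify_restore(
    expected_count: int,
    restored_count: int,
    samples: Dict[str, str],
    restored: Dict[str, str],
    elapsed_seconds: float,
    target_seconds: float = 15 * 60,
) -> List[str]:
    """Return a list of problems found in a restore drill; empty means it passed."""
    problems = []
    if restored_count != expected_count:
        problems.append(f"key count {restored_count} != expected {expected_count}")
    for key, value in samples.items():
        if restored.get(key) != value:
            problems.append(f"sample mismatch for {key}")
    if elapsed_seconds >= target_seconds:
        problems.append(f"recovery took {elapsed_seconds:.0f}s, target is under {target_seconds:.0f}s")
    return problems
```

Recording the returned list alongside the drill date gives an audit trail for the documented recovery time.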
Test recovery regularly — a persistence file you cannot restore is worthless. Design RPO/RTO explicitly and match persistence config to those requirements.