Persistence Overview

Redis stores data in memory for speed. RDB and AOF persist data to disk for recovery after restarts, crashes, or planned maintenance. Without persistence, a restart yields an empty database.

Method | How It Works              | Trade-off
RDB    | Point-in-time snapshots   | Fast recovery, may lose recent writes
AOF    | Append every write to log | More durable, larger files
Both   | RDB + AOF combined        | Best balance for production
None   | Pure in-memory            | Fastest, rebuild from primary DB

Choose based on acceptable data loss window (RPO — Recovery Point Objective).

RDB (Redis Database Backup)

RDB creates compact binary snapshots at configured intervals or on demand.

  # redis.conf
save 900 1       # save if ≥1 key changed in 900 seconds
save 300 10      # save if ≥10 keys changed in 300 seconds
save 60 10000    # save if ≥10000 keys changed in 60 seconds

dbfilename dump.rdb
dir /var/lib/redis
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
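
To make the save rules concrete, here is a minimal Python sketch of how the three save points above combine: a snapshot is due as soon as any one rule is satisfied. The function name `snapshot_due` and its parameters are hypothetical, and this is a toy model of the rule, not Redis source code.

```python
# Toy model of RDB save points; names are illustrative, not Redis internals.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]  # (seconds, changes) from the config above


def snapshot_due(seconds_since_last_save, changes_since_last_save, save_points=SAVE_POINTS):
    """Return True if at least one save point is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


print(snapshot_due(120, 5))      # False: no rule has both conditions met
print(snapshot_due(310, 12))     # True: the "300 10" rule fires
print(snapshot_due(61, 10000))   # True: the "60 10000" rule fires
```

The rules are OR-ed together, so a busy instance snapshots often while a quiet one still snapshots at least every 900 seconds if anything changed.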
  

Manual RDB Operations

  SAVE          # blocking snapshot — avoid in production
BGSAVE        # fork child process, non-blocking
LASTSAVE      # timestamp of last successful save
INFO persistence
  

BGSAVE forks the process — on large datasets, fork latency can cause momentary pauses (see latest_fork_usec in INFO).
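
As a rough illustration of watching fork latency, the sketch below parses the `key:value` text that INFO returns and flags a slow fork. The helper names, the sample text, and the 100 ms threshold are assumptions chosen for the example, not Redis defaults.

```python
def parse_info(text):
    """Parse 'key:value' lines from INFO output, skipping blanks and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def fork_too_slow(fields, threshold_usec=100_000):
    """Compare latest_fork_usec to a threshold (illustrative cutoff, not a Redis default)."""
    return int(fields.get("latest_fork_usec", 0)) > threshold_usec


# Made-up sample in the shape of INFO output.
SAMPLE = "# Stats\r\nlatest_fork_usec:250000\r\n# Persistence\r\nrdb_last_bgsave_status:ok\r\n"
info = parse_info(SAMPLE)
print(info["latest_fork_usec"], fork_too_slow(info))  # 250000 True
```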

AOF (Append Only File)

AOF logs every write command. On restart, Redis replays the log to reconstruct state.
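
The log-and-replay idea can be sketched in a few lines of Python. This is a toy model under loud assumptions: the file format is one space-separated command per line (not Redis's actual encoding), only SET and DEL are handled, and `append_command` and `replay` are hypothetical names.

```python
import os
import tempfile


def append_command(path, *args):
    """Append one write command as a space-separated line (toy format, not Redis's encoding)."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(" ".join(args) + "\n")


def replay(path):
    """Rebuild a dict by re-applying every logged command in order."""
    state = {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "SET":
                state[parts[1]] = parts[2]
            elif parts[0] == "DEL":
                state.pop(parts[1], None)
    return state


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "toy.aof")
    append_command(log_path, "SET", "user:1", "alice")
    append_command(log_path, "SET", "user:2", "bob")
    append_command(log_path, "DEL", "user:1")
    recovered = replay(log_path)

print(recovered)  # {'user:2': 'bob'}
```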

  appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec     # recommended default
# appendfsync always     # fsync every write — safest, slowest
# appendfsync no         # OS decides — fastest, riskiest

no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-use-rdb-preamble yes
  

fsync Policy Comparison

appendfsync | Durability                                 | Performance
always      | Lose at most 1 write if crash during fsync | Slowest writes
everysec    | Lose up to ~1 second of writes             | Recommended
no          | OS buffer may lose seconds of data         | Fastest, risky

Use everysec for most production workloads. Use always only when every write must survive an immediate crash.
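
One way to read this guidance is as a rule keyed on the acceptable data-loss window. The sketch below is only an illustration: `suggest_persistence`, its parameters, and the thresholds are assumptions derived from the comparison above, not an official sizing rule.

```python
def suggest_persistence(rpo_seconds, snapshot_interval_seconds=300, rebuildable=False):
    """Map an acceptable data-loss window (RPO) to a persistence setting.

    Thresholds are illustrative assumptions based on the comparison above.
    """
    if rebuildable:
        return "none"                  # pure cache: data can be rebuilt elsewhere
    if rpo_seconds < 1:
        return "appendfsync always"    # cannot tolerate losing about a second of writes
    if rpo_seconds < snapshot_interval_seconds:
        return "appendfsync everysec"  # snapshots alone could lose more than the RPO
    return "rdb snapshots"             # losing one snapshot interval is acceptable


print(suggest_persistence(0.5))   # appendfsync always
print(suggest_persistence(5))     # appendfsync everysec
print(suggest_persistence(600))   # rdb snapshots
```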

AOF Rewrite

The AOF keeps growing as commands accumulate. A rewrite compacts the log to the minimum set of commands needed to rebuild the current state.
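
The effect of compaction can be shown with a toy Python model. Note the assumptions: `compact` is a hypothetical helper that replays a small command list (SET, DEL, INCR only) and emits one SET per surviving key. Redis itself produces the new file from the in-memory dataset rather than by reading the old log, so this only illustrates the outcome.

```python
def compact(commands):
    """Return a SET-only command list that rebuilds the same final state."""
    state = {}
    for command in commands:
        op, key = command[0], command[1]
        if op == "SET":
            state[key] = command[2]
        elif op == "DEL":
            state.pop(key, None)
        elif op == "INCR":
            state[key] = str(int(state.get(key, "0")) + 1)
    return [("SET", key, value) for key, value in state.items()]


log = [
    ("SET", "counter", "0"),
    ("INCR", "counter"),
    ("INCR", "counter"),
    ("SET", "temp", "x"),
    ("DEL", "temp"),
]
print(compact(log))  # [('SET', 'counter', '2')]
```

Five logged commands collapse to one, which is why a rewritten AOF is smaller and faster to replay.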

  BGREWRITEAOF    # manual rewrite
INFO persistence
# aof_rewrite_in_progress, aof_last_rewrite_time_sec
  

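Automatic rewrite triggers once the AOF is larger than auto-aof-rewrite-min-size and has grown by auto-aof-rewrite-percentage since the last rewrite (with the values below: over 64 MB and at least double the post-rewrite size):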

  auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
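
The trigger condition can be approximated in Python as follows. `parse_size` and `rewrite_due` are hypothetical helpers, and the logic is a sketch of the documented behavior (minimum size plus percentage growth), not a copy of the Redis implementation.

```python
def parse_size(text):
    """Convert a size such as '64mb' to bytes (kb/mb/gb suffixes, 1024-based)."""
    units = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
    text = text.strip().lower()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


def rewrite_due(current_size, size_after_last_rewrite, percentage=100, min_size="64mb"):
    """Return True when the AOF is past min_size and has grown by `percentage` percent."""
    if percentage == 0:                      # a percentage of 0 disables automatic rewrites
        return False
    if current_size <= parse_size(min_size):
        return False
    base = size_after_last_rewrite or 1      # avoid dividing by zero on a fresh AOF
    growth = (current_size - base) * 100 / base
    return growth >= percentage


MB = 1024 ** 2
print(rewrite_due(130 * MB, 64 * MB))  # True: about 103% growth
print(rewrite_due(100 * MB, 64 * MB))  # False: about 56% growth
print(rewrite_due(40 * MB, 10 * MB))   # False: still under 64mb
```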
  

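aof-use-rdb-preamble yes makes each rewrite start the AOF with an RDB-format snapshot, followed by the commands logged since then, which speeds up restarts.

To use both methods together, enable RDB save points alongside AOF: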

  save 900 1
save 300 10
appendonly yes
appendfsync everysec
aof-use-rdb-preamble yes
  

RDB provides fast cold starts; AOF limits data loss between snapshots.

Recovery Procedures

Normal Restart

Redis automatically loads persistence files from dir on startup:

  1. If appendonly is yes and an AOF exists → load AOF (with RDB preamble if present)
2. Else if RDB exists → load RDB
3. Else → empty database
  

Corrupted AOF

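  cp appendonly.aof appendonly.aof.bak    # keep a copy first; --fix truncates the file
redis-check-aof --fix appendonly.aof
# Redis 7+ stores the AOF as several files under appendonlydir/; pass the .manifest file there instead
# Review truncated content before restarting
redis-server /etc/redis/redis.conf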
  

Corrupted RDB

  redis-check-rdb dump.rdb
# If unrecoverable — restore from backup
  

Point-in-Time Recovery

Maintain off-site RDB copies:

  # Cron job: copy after BGSAVE completes (inside a crontab entry, escape each % as \%)
cp /var/lib/redis/dump.rdb /backups/dump-$(date +%Y%m%d-%H%M).rdb
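
The same copy step, extended with a simple retention rule, might look like this in Python. It is a sketch under assumptions: `backup_rdb`, `prune_backups`, and the `keep` count are hypothetical, the demo runs against a temporary directory with placeholder bytes instead of a real dump.rdb, and uploading the copy off-site is left out.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_rdb(source, backup_dir, now=None):
    """Copy the snapshot to backup_dir/dump-YYYYmmdd-HHMM.rdb and return the new path."""
    now = now or datetime.now()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"dump-{now:%Y%m%d-%H%M}.rdb"
    shutil.copy2(source, target)
    return target


def prune_backups(backup_dir, keep=24):
    """Delete all but the newest `keep` backups; the timestamped names sort chronologically."""
    backups = sorted(Path(backup_dir).glob("dump-*.rdb"))
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return stale


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "dump.rdb"
    source.write_bytes(b"placeholder")  # stand-in bytes, not a real RDB file
    for hour in (1, 2, 3):
        backup_rdb(source, Path(tmp) / "backups", datetime(2024, 1, 1, hour, 0))
    removed = [p.name for p in prune_backups(Path(tmp) / "backups", keep=2)]
    remaining = sorted(p.name for p in (Path(tmp) / "backups").glob("dump-*.rdb"))

print(removed)    # ['dump-20240101-0100.rdb']
print(remaining)  # ['dump-20240101-0200.rdb', 'dump-20240101-0300.rdb']
```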
  

For finer granularity, ship AOF to object storage with replication.

Disable Persistence (Pure Cache)

Valid when Redis is a cache layer and all data can be rebuilt from the primary database:

  save ""
appendonly no
  

Document this decision — teams sometimes forget and expect durability.

Replication as Durability Layer

Even with local persistence, replicate to at least one replica so a copy of the data exists off-node:

  Primary (AOF) → Replica (AOF) → Backup from replica
  

Backups from replicas avoid impacting primary write latency.

Best Practices

  1. Use both RDB and AOF for production data stores
  2. Set appendfsync everysec unless strict durability required
  3. Test recovery quarterly — untested backups fail in incidents
  4. Monitor rdb_last_bgsave_status and aof_last_write_status
  5. Store backups off-node (S3, GCS) with retention policy
  6. Run backups from replicas, not primary
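
For item 4, a monitoring check could be as small as the sketch below. It assumes the INFO persistence fields are already available as a dict of strings (for example, as returned by a client library); `persistence_alerts` is a hypothetical name, and treating a missing field as an alert is a design choice of this example.

```python
def persistence_alerts(fields):
    """Return alert strings for persistence status fields that are missing or not 'ok'."""
    alerts = []
    for name in ("rdb_last_bgsave_status", "aof_last_write_status"):
        status = fields.get(name)
        if status is None:
            alerts.append(f"{name} missing")
        elif status != "ok":
            alerts.append(f"{name}={status}")
    return alerts


healthy = {"rdb_last_bgsave_status": "ok", "aof_last_write_status": "ok"}
failing = {"rdb_last_bgsave_status": "err", "aof_last_write_status": "ok"}
print(persistence_alerts(healthy))  # []
print(persistence_alerts(failing))  # ['rdb_last_bgsave_status=err']
```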

Common Mistakes

Mistake                         | Impact
No persistence on session store | Mass logout after restart
SAVE in production cron         | Blocks all clients
AOF on network filesystem (NFS) | fsync failures, corruption
Never testing restore           | Discover corrupt backups during outage
stop-writes-on-bgsave-error no  | Silent data loss when disk full

Troubleshooting

BGSAVE failing:

  INFO persistence
# rdb_last_bgsave_status:err
tail /var/log/redis/redis-server.log
# Common: disk full, permission denied, fork failed (overcommit)
sysctl vm.overcommit_memory
  

AOF rewrite blocking writes:

  INFO persistence
# aof_rewrite_in_progress:1
# Normal during rewrite — monitor duration
CONFIG GET auto-aof-rewrite-min-size
  

Slow restart after crash:

  # Large AOF — ensure aof-use-rdb-preamble yes
# Or: redis-check-aof --fix, then BGREWRITEAOF on clean start
ls -lh /var/lib/redis/
  

Performance Tips

  • Schedule RDB snapshots during low-traffic windows if fork latency matters
  • Use SSD for AOF/RDB directory — disk I/O affects fsync latency
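  • Set no-appendfsync-on-rewrite yes only if weaker durability during a rewrite is acceptable: no fsync is issued while the rewrite runs, so a crash in that window can lose more than one second of writes (up to about 30 seconds with default Linux settings)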
  • Size auto-aof-rewrite-min-size to avoid too-frequent rewrites

Production Scenario

A fintech app stored payment session tokens in Redis with AOF everysec and hourly RDB backups to S3. During an AZ outage, the primary failed and Sentinel promoted a replica that was missing the last 0.8 seconds of writes (within RPO). The team runs a recovery drill each quarter: restore the RDB to a fresh instance, verify key count and sample data, and document recovery time (target: < 15 minutes). Pure cache instances run with persistence disabled and are rebuilt automatically on cold start.

Test recovery regularly — a persistence file you cannot restore is worthless. Design RPO/RTO explicitly and match persistence config to those requirements.