Backup and Recovery
Backup Strategy Framework
Every production MySQL deployment needs:
- Regular full backups — daily or weekly baseline
- Continuous binlog archiving — point-in-time recovery (PITR)
- Offsite/copy storage — separate region or object storage
- Restore testing — automated monthly verification
- Documented RTO/RPO — recovery time and data loss targets
RPO (Recovery Point Objective): max acceptable data loss — binlog frequency determines this.
RTO (Recovery Time Objective): max acceptable downtime — backup method and size determine this.
Logical Backup — mysqldump
# Single database
mysqldump -u backup_user -p \
--single-transaction \
--routines \
--triggers \
--set-gtid-purged=OFF \
myapp > myapp_$(date +%Y%m%d).sql
# All databases
mysqldump -u backup_user -p --all-databases | gzip > full_$(date +%Y%m%d).sql.gz
# Specific tables
mysqldump -u backup_user -p myapp users orders > partial.sql
| Flag | Purpose |
|---|---|
--single-transaction |
Consistent InnoDB snapshot without locking |
--routines |
Include stored procedures/functions |
--triggers |
Include triggers |
--no-tablespaces |
Avoid PROCESS privilege requirement |
--set-gtid-purged=OFF |
Avoid GTID issues when restoring to different server |
mysqldump Limitations
- Slow on large databases (hours for 100GB+)
- Restores are single-threaded
- Table locks on MyISAM without
--single-transaction - Not suitable for sub-minute RPO alone — combine with binlogs
Physical Backup — Percona XtraBackup
# Full backup
xtrabackup --backup --target-dir=/backup/full_$(date +%Y%m%d) \
--user=backup_user --password='...'
# Prepare (apply redo logs)
xtrabackup --prepare --target-dir=/backup/full_20240601
# Restore
systemctl stop mysql
rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backup/full_20240601
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql
XtraBackup supports hot backups — minimal impact on running production.
Incremental Backups
# Incremental based on last full
xtrabackup --backup --target-dir=/backup/inc1 \
--incremental-basedir=/backup/full_20240601
# Prepare incremental chain
xtrabackup --prepare --apply-log-only --target-dir=/backup/full_20240601
xtrabackup --prepare --apply-log-only --target-dir=/backup/full_20240601 \
--incremental-dir=/backup/inc1
xtrabackup --prepare --target-dir=/backup/full_20240601 \
--incremental-dir=/backup/inc1
Binary Log Archiving (PITR)
[mysqld]
log_bin = mysql-bin
binlog_format = ROW
binlog_expire_logs_seconds = 604800
sync_binlog = 1
Archive binlogs to S3/object storage before expiration:
mysqlbinlog mysql-bin.000042 | gzip > mysql-bin.000042.sql.gz
aws s3 cp mysql-bin.000042.sql.gz s3://backups/mysql/binlogs/
Point-in-Time Recovery
# 1. Restore latest full backup
mysql -u root -p < full_backup_20240601.sql
# 2. Apply binlogs up to incident time
mysqlbinlog \
--start-datetime="2024-06-01 08:00:00" \
--stop-datetime="2024-06-01 10:30:00" \
/backup/binlogs/mysql-bin.000042 \
/backup/binlogs/mysql-bin.000043 | mysql -u root -p myapp
For GTID environments, use --include-gtids and --exclude-gtids carefully.
Restore Verification
#!/bin/bash
# Automated restore test
mysql -e "DROP DATABASE IF EXISTS restore_test;"
mysql -e "CREATE DATABASE restore_test;"
gunzip -c latest_backup.sql.gz | mysql restore_test
ROW_COUNT=$(mysql -N -e "SELECT COUNT(*) FROM restore_test.users;")
if [ "$ROW_COUNT" -gt 0 ]; then
echo "Restore test PASSED: $ROW_COUNT users"
else
echo "Restore test FAILED" && exit 1
fi
Run weekly against latest backup. Alert on failure.
Cloud Managed Backups
| Service | Feature |
|---|---|
| AWS RDS | Automated snapshots, PITR to 5-minute granularity |
| Google Cloud SQL | Automated daily + transaction logs |
| Azure Database | Geo-redundant backup storage |
Understand retention limits and test cross-region restore.
Best Practices
- Dedicated backup user with minimal privileges (
SELECT,RELOAD,LOCK TABLES,REPLICATION CLIENT) - Encrypt backups at rest (S3 SSE, gpg)
- Store backups in different availability zone/region from primary
- Never rely on replication as backup — accidental DROP propagates to replicas
- Document and practice restore procedures before incidents
Common Mistakes
| Mistake | Consequence |
|---|---|
| Untested backups | Discover corruption during real disaster |
| Backups on same disk as data | Lost in disk/server failure |
| No binlog archiving | Can only restore to last full backup time |
--single-transaction on mixed MyISAM/InnoDB |
Inconsistent MyISAM data |
Troubleshooting
mysqldump hangs:
SHOW PROCESSLIST;
-- Look for waiting on metadata lock — long transaction blocking dump
Restore fails on DEFINER:
sed 's/DEFINER=`root`@`localhost`/DEFINER=`app`@`localhost`/g' backup.sql | mysql myapp
XtraBackup prepare fails: ensure incremental chain is complete and applied in order.
Production Scenario
A fintech platform requires RPO < 1 minute and RTO < 30 minutes:
- XtraBackup full backup nightly to S3
- Incremental backups every hour
- Binlogs streamed continuously to S3 via
mysqlbinlogpipe - Monthly automated restore to isolated VPC validates backup integrity
- Runbook documents exact commands for ops team with on-call rotation
Backups are insurance — the premium is storage cost; the payout is your company’s data surviving human error, hardware failure, and ransomware.