Backup Strategy Framework

Every production MySQL deployment needs:

  1. Regular full backups — daily or weekly baseline
  2. Continuous binlog archiving — point-in-time recovery (PITR)
  3. Offsite/copy storage — separate region or object storage
  4. Restore testing — automated monthly verification
  5. Documented RTO/RPO — recovery time and data loss targets

RPO (Recovery Point Objective): max acceptable data loss — binlog frequency determines this.
RTO (Recovery Time Objective): max acceptable downtime — backup method and size determine this.

Logical Backup — mysqldump

  # Single database
mysqldump -u backup_user -p \
  --single-transaction \
  --routines \
  --triggers \
  --set-gtid-purged=OFF \
  myapp > myapp_$(date +%Y%m%d).sql

# All databases
mysqldump -u backup_user -p --all-databases | gzip > full_$(date +%Y%m%d).sql.gz

# Specific tables
mysqldump -u backup_user -p myapp users orders > partial.sql
  
Flag Purpose
--single-transaction Consistent InnoDB snapshot without locking
--routines Include stored procedures/functions
--triggers Include triggers
--no-tablespaces Avoid PROCESS privilege requirement
--set-gtid-purged=OFF Avoid GTID issues when restoring to different server

mysqldump Limitations

  • Slow on large databases (hours for 100GB+)
  • Restores are single-threaded
  • Table locks on MyISAM without --single-transaction
  • Not suitable for sub-minute RPO alone — combine with binlogs

Physical Backup — Percona XtraBackup

  # Full backup
xtrabackup --backup --target-dir=/backup/full_$(date +%Y%m%d) \
  --user=backup_user --password='...'

# Prepare (apply redo logs)
xtrabackup --prepare --target-dir=/backup/full_20240601

# Restore
systemctl stop mysql
rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backup/full_20240601
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql
  

XtraBackup supports hot backups — minimal impact on running production.

Incremental Backups

  # Incremental based on last full
xtrabackup --backup --target-dir=/backup/inc1 \
  --incremental-basedir=/backup/full_20240601

# Prepare incremental chain
xtrabackup --prepare --apply-log-only --target-dir=/backup/full_20240601
xtrabackup --prepare --apply-log-only --target-dir=/backup/full_20240601 \
  --incremental-dir=/backup/inc1
xtrabackup --prepare --target-dir=/backup/full_20240601 \
  --incremental-dir=/backup/inc1
  

Binary Log Archiving (PITR)

  [mysqld]
log_bin = mysql-bin
binlog_format = ROW
binlog_expire_logs_seconds = 604800
sync_binlog = 1
  

Archive binlogs to S3/object storage before expiration:

  mysqlbinlog mysql-bin.000042 | gzip > mysql-bin.000042.sql.gz
aws s3 cp mysql-bin.000042.sql.gz s3://backups/mysql/binlogs/
  

Point-in-Time Recovery

  # 1. Restore latest full backup
mysql -u root -p < full_backup_20240601.sql

# 2. Apply binlogs up to incident time
mysqlbinlog \
  --start-datetime="2024-06-01 08:00:00" \
  --stop-datetime="2024-06-01 10:30:00" \
  /backup/binlogs/mysql-bin.000042 \
  /backup/binlogs/mysql-bin.000043 | mysql -u root -p myapp
  

For GTID environments, use --include-gtids and --exclude-gtids carefully.

Restore Verification

  #!/bin/bash
# Automated restore test
mysql -e "DROP DATABASE IF EXISTS restore_test;"
mysql -e "CREATE DATABASE restore_test;"
gunzip -c latest_backup.sql.gz | mysql restore_test
ROW_COUNT=$(mysql -N -e "SELECT COUNT(*) FROM restore_test.users;")
if [ "$ROW_COUNT" -gt 0 ]; then
  echo "Restore test PASSED: $ROW_COUNT users"
else
  echo "Restore test FAILED" && exit 1
fi
  

Run weekly against latest backup. Alert on failure.

Cloud Managed Backups

Service Feature
AWS RDS Automated snapshots, PITR to 5-minute granularity
Google Cloud SQL Automated daily + transaction logs
Azure Database Geo-redundant backup storage

Understand retention limits and test cross-region restore.

Best Practices

  1. Dedicated backup user with minimal privileges (SELECT, RELOAD, LOCK TABLES, REPLICATION CLIENT)
  2. Encrypt backups at rest (S3 SSE, gpg)
  3. Store backups in different availability zone/region from primary
  4. Never rely on replication as backup — accidental DROP propagates to replicas
  5. Document and practice restore procedures before incidents

Common Mistakes

Mistake Consequence
Untested backups Discover corruption during real disaster
Backups on same disk as data Lost in disk/server failure
No binlog archiving Can only restore to last full backup time
--single-transaction on mixed MyISAM/InnoDB Inconsistent MyISAM data

Troubleshooting

mysqldump hangs:

  SHOW PROCESSLIST;
-- Look for waiting on metadata lock — long transaction blocking dump
  

Restore fails on DEFINER:

  sed 's/DEFINER=`root`@`localhost`/DEFINER=`app`@`localhost`/g' backup.sql | mysql myapp
  

XtraBackup prepare fails: ensure incremental chain is complete and applied in order.

Production Scenario

A fintech platform requires RPO < 1 minute and RTO < 30 minutes:

  • XtraBackup full backup nightly to S3
  • Incremental backups every hour
  • Binlogs streamed continuously to S3 via mysqlbinlog pipe
  • Monthly automated restore to isolated VPC validates backup integrity
  • Runbook documents exact commands for ops team with on-call rotation

Backups are insurance — the premium is storage cost; the payout is your company’s data surviving human error, hardware failure, and ransomware.