Always On Deep Dive
Always On Availability Groups (AG) are SQL Server’s primary enterprise HA/DR mechanism. This deep dive covers architecture decisions, edge cases, and operational patterns beyond basic AG setup — for DBAs running multi-replica production clusters.
Architecture Layers
Application
↓ (AG Listener + MultiSubnetFailover)
WSFC / Pacemaker (quorum)
↓
Availability Group (replicas)
↓
Database (synchronized / synchronizing)
Each layer must be healthy — a working AG with broken quorum still blocks production failover.
Quorum Deep Dive
Voting Configuration
| Node | Votes | Typical Setup |
|---|---|---|
| SQL-PRIMARY | 1 | Production site |
| SQL-SECONDARY | 1 | Same site sync replica |
| File share / cloud witness | 1 | Breaks 2-2 ties |
Two-node clusters require a witness — without it, losing one node may remove quorum entirely.
# Add cloud witness (Windows)
Set-ClusterQuorum -CloudWitness -AccountName "StorageAccount" -AccessKey "key"
Dynamic Quorum
Modern WSFC adjusts votes when nodes go offline — reduces accidental quorum loss. Still document expected vote count after planned maintenance.
Replica Modes in Production
| Mode | RPO | RTO | Use Case |
|---|---|---|---|
| Synchronous + automatic | ~0 | Seconds | Same-rack HA |
| Asynchronous + manual | Minutes | Minutes–hours | Cross-region DR |
| Sync + manual | ~0 | Manual approval | Avoid auto split-brain across WAN |
Never use synchronous commit across high-latency WAN — transactions wait for remote log harden, destroying throughput.
Distributed Availability Groups
Chain AGs across geographic sites without merging WSFC clusters:
Site A: AG1 (Primary + Local Secondary)
↓ async forward
Site B: AG2 (Primary + Local Secondary)
CREATE AVAILABILITY GROUP [GlobalAG]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
N'AG_SiteA' WITH (LISTENER_URL = N'TCP://ag1-listener:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT),
N'AG_SiteB' WITH (LISTENER_URL = N'TCP://ag2-listener:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
Use for multi-datacenter DR without stretching a single WSFC across regions.
Seeding vs Manual Initial Sync
Automatic Seeding
-- Requires MATCHING SERVICE ACCOUNTS or certificates between replicas
ALTER AVAILABILITY GROUP [AppAG] GRANT CREATE ANY DATABASE;
-- Secondary receives streamed backup
Monitor seeding progress:
SELECT local_database_name, role_desc, internal_state_desc, transferred_size_bytes, database_size_bytes
FROM sys.dm_hadr_physical_seeding_stats;
Manual Backup/Restore
When seeding fails (firewall, size, version mismatch):
-- Primary
BACKUP DATABASE MyApp TO DISK = N'\\share\MyApp_full.bak' WITH COPY_ONLY;
-- Secondary
RESTORE DATABASE MyApp FROM DISK = N'\\share\MyApp_full.bak' WITH NORECOVERY;
ALTER DATABASE MyApp SET HADR = ON;
Listener and Multi-Subnet Failover
CREATE AVAILABILITY GROUP LISTENER [AppAG-Listener]
WITH (
IP ((N'10.0.1.50', N'255.255.255.0'), (N'10.0.2.50', N'255.255.255.0')),
PORT = 1433,
DNS = N'appag-sql.contoso.com'
);
Connection string for .NET:
Server=tcp:appag-sql.contoso.com,1433;Database=MyApp;MultiSubnetFailover=True;Application Name=MyApp;
MultiSubnetFailover=True — parallel connection attempts on failover (critical for multi-subnet AG).
Read-Scale Routing Advanced
Define routing priorities per replica:
ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQL-REPORT1' WITH (
SECONDARY_ROLE(READ_ONLY_ROUTING_URL = N'TCP://sql-report1:1433', READ_ONLY_ROUTING_PRIORITY = 50)
);
ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQL-REPORT2' WITH (
SECONDARY_ROLE(READ_ONLY_ROUTING_URL = N'TCP://sql-report2:1433', READ_ONLY_ROUTING_PRIORITY = 100)
);
Higher priority receives read traffic first. Requires ApplicationIntent=ReadOnly.
Secondary Lag and redo_queue
SELECT
ar.replica_server_name,
drs.database_name,
drs.synchronization_state_desc,
drs.log_send_queue_size AS send_kb,
drs.redo_queue_size AS redo_kb,
drs.last_redone_time,
drs.last_commit_time
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id;
| Symptom | Cause | Action |
|---|---|---|
| Growing send queue | Network bandwidth, primary log generation rate | Check network, index maintenance on primary |
| Growing redo queue | Secondary CPU/disk slower than primary | Match hardware, limit read workload |
| NOT SYNCHRONIZING | Endpoint down, login failure | Verify 5022, service accounts |
Failover Scenarios
Planned Maintenance (Zero Data Loss)
-- Verify sync healthy
SELECT synchronization_health_desc FROM sys.dm_hadr_database_replica_states;
ALTER AVAILABILITY GROUP [AppAG] FAILOVER;
-- Patch former primary, verify SYNCHRONIZED, fail back if desired
Disaster — Primary Site Lost
- Confirm primary truly unavailable (avoid split-brain)
- If async DR replica:
FORCE_FAILOVER_ALLOW_DATA_LOSS - Repoint applications to DR listener
- Rebuild former site as new secondary when restored
Document decision tree with executive approval for forced failover.
Login and Job Synchronization
SQL logins with SIDs don’t auto-sync to secondaries:
-- Compare SID mismatch (orphan detection after failover)
SELECT name FROM sys.server_principals WHERE sid NOT IN (
SELECT sid FROM sys.sql_logins
);
-- Transfer login script via sp_help_revlogin (deprecated) or dbatools Copy-DbaLogin
Agent jobs must exist on new primary — automate with multi-server administration or IaC.
Extended Events for AG
CREATE EVENT SESSION [AG_Health] ON SERVER
ADD EVENT sqlserver.always_on_ddl_executed,
ADD EVENT sqlserver.hadr_ag_wsfc_cluster_quorum_state,
ADD EVENT sqlserver.hadr_db_manager_state_change
ADD TARGET package0.event_file(SET filename = N'D:\XEvents\AG_Health.xel');
ALTER EVENT SESSION [AG_Health] ON SERVER STATE = START;
Common Expert Mistakes
| Mistake | Outcome |
|---|---|
| AG as backup replacement | Still need off-site backups — AG replicates corruption too |
| Readable secondary without RCSI awareness | Blocking on primary during redo conflicts |
| Same AG listener DNS TTL too high | Slow client failover |
Forgetting DB_FAILOVER = ON |
Manual DB join after instance failover |
| Mixing SQL versions in AG | Unsupported — blocks upgrade path |
Performance Tuning for AG
- Compression on backups — reduces seeding time
- Instant File Initialization on all replicas
- Matching trace flags — inconsistent flags cause hard-to-debug sync issues
- Limit concurrent readable queries —
max read-only connectionsworkload governance - Use delayed durability never on sync replicas — breaks durability assumptions
Production Architecture Example
[Cloud Witness]
|
+-----------------+-----------------+
| |
[Site A WSFC] [Site B WSFC]
SQL-PROD (sync) SQL-DR (async)
SQL-PROD2 (sync) SQL-DR2 (async)
| |
Listener: prod-sql Listener: dr-sql
RPO: 0 (site A) RPO: ~5 min (manual failover)
Quarterly: failover drill to DR, restore app connectivity, measure RTO, fail back.
Troubleshooting Commands
-- Endpoint state
SELECT name, state_desc FROM sys.database_mirroring_endpoints;
-- AG cluster state
SELECT cluster_name, quorum_type_desc, quorum_state_desc
FROM sys.dm_hadr_cluster;
-- Last failover reason
SELECT ag.name, ar.replica_server_name, rh.failover_reason, rh.failover_timestamp
FROM sys.availability_replicas ar
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id
CROSS APPLY (
SELECT TOP 1 failover_reason, failover_timestamp
FROM sys.dm_hadr_automatic_seeding_history
) rh; -- Use error logs + XEvents for failover detail
Always On rewards meticulous quorum, listener, and runbook design — treat AG as a system spanning Windows/Linux clustering, networking, and SQL Server—not just a wizard click.