Always On Availability Groups (AG) are SQL Server’s primary enterprise HA/DR mechanism. This deep dive covers architecture decisions, edge cases, and operational patterns beyond basic AG setup — for DBAs running multi-replica production clusters.

Architecture Layers

  Application
    ↓ (AG Listener + MultiSubnetFailover)
WSFC / Pacemaker (quorum)
    ↓
Availability Group (replicas)
    ↓
Database (synchronized / synchronizing)
  

Each layer must be healthy — a working AG with broken quorum still blocks production failover.

Quorum Deep Dive

Voting Configuration

Node Votes Typical Setup
SQL-PRIMARY 1 Production site
SQL-SECONDARY 1 Same site sync replica
File share / cloud witness 1 Breaks 2-2 ties

Two-node clusters require a witness — without it, losing one node may remove quorum entirely.

  # Add cloud witness (Windows)
Set-ClusterQuorum -CloudWitness -AccountName "StorageAccount" -AccessKey "key"
  

Dynamic Quorum

Modern WSFC adjusts votes when nodes go offline — reduces accidental quorum loss. Still document expected vote count after planned maintenance.

Replica Modes in Production

Mode RPO RTO Use Case
Synchronous + automatic ~0 Seconds Same-rack HA
Asynchronous + manual Minutes Minutes–hours Cross-region DR
Sync + manual ~0 Manual approval Avoid auto split-brain across WAN

Never use synchronous commit across high-latency WAN — transactions wait for remote log harden, destroying throughput.

Distributed Availability Groups

Chain AGs across geographic sites without merging WSFC clusters:

  Site A: AG1 (Primary + Local Secondary)
           ↓ async forward
Site B: AG2 (Primary + Local Secondary)
  
  CREATE AVAILABILITY GROUP [GlobalAG]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
    N'AG_SiteA' WITH (LISTENER_URL = N'TCP://ag1-listener:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT),
    N'AG_SiteB' WITH (LISTENER_URL = N'TCP://ag2-listener:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
  

Use for multi-datacenter DR without stretching a single WSFC across regions.

Seeding vs Manual Initial Sync

Automatic Seeding

  -- Requires MATCHING SERVICE ACCOUNTS or certificates between replicas
ALTER AVAILABILITY GROUP [AppAG] GRANT CREATE ANY DATABASE;
-- Secondary receives streamed backup
  

Monitor seeding progress:

  SELECT local_database_name, role_desc, internal_state_desc, transferred_size_bytes, database_size_bytes
FROM sys.dm_hadr_physical_seeding_stats;
  

Manual Backup/Restore

When seeding fails (firewall, size, version mismatch):

  -- Primary
BACKUP DATABASE MyApp TO DISK = N'\\share\MyApp_full.bak' WITH COPY_ONLY;

-- Secondary
RESTORE DATABASE MyApp FROM DISK = N'\\share\MyApp_full.bak' WITH NORECOVERY;
ALTER DATABASE MyApp SET HADR = ON;
  

Listener and Multi-Subnet Failover

  CREATE AVAILABILITY GROUP LISTENER [AppAG-Listener]
WITH (
    IP ((N'10.0.1.50', N'255.255.255.0'), (N'10.0.2.50', N'255.255.255.0')),
    PORT = 1433,
    DNS = N'appag-sql.contoso.com'
);
  

Connection string for .NET:

  Server=tcp:appag-sql.contoso.com,1433;Database=MyApp;MultiSubnetFailover=True;Application Name=MyApp;
  

MultiSubnetFailover=True — parallel connection attempts on failover (critical for multi-subnet AG).

Read-Scale Routing Advanced

Define routing priorities per replica:

  ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQL-REPORT1' WITH (
    SECONDARY_ROLE(READ_ONLY_ROUTING_URL = N'TCP://sql-report1:1433', READ_ONLY_ROUTING_PRIORITY = 50)
);
ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQL-REPORT2' WITH (
    SECONDARY_ROLE(READ_ONLY_ROUTING_URL = N'TCP://sql-report2:1433', READ_ONLY_ROUTING_PRIORITY = 100)
);
  

Higher priority receives read traffic first. Requires ApplicationIntent=ReadOnly.

Secondary Lag and redo_queue

  SELECT
    ar.replica_server_name,
    drs.database_name,
    drs.synchronization_state_desc,
    drs.log_send_queue_size AS send_kb,
    drs.redo_queue_size AS redo_kb,
    drs.last_redone_time,
    drs.last_commit_time
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id;
  
Symptom Cause Action
Growing send queue Network bandwidth, primary log generation rate Check network, index maintenance on primary
Growing redo queue Secondary CPU/disk slower than primary Match hardware, limit read workload
NOT SYNCHRONIZING Endpoint down, login failure Verify 5022, service accounts

Failover Scenarios

Planned Maintenance (Zero Data Loss)

  -- Verify sync healthy
SELECT synchronization_health_desc FROM sys.dm_hadr_database_replica_states;

ALTER AVAILABILITY GROUP [AppAG] FAILOVER;
-- Patch former primary, verify SYNCHRONIZED, fail back if desired
  

Disaster — Primary Site Lost

  1. Confirm primary truly unavailable (avoid split-brain)
  2. If async DR replica: FORCE_FAILOVER_ALLOW_DATA_LOSS
  3. Repoint applications to DR listener
  4. Rebuild former site as new secondary when restored

Document decision tree with executive approval for forced failover.

Login and Job Synchronization

SQL logins with SIDs don’t auto-sync to secondaries:

  -- Compare SID mismatch (orphan detection after failover)
SELECT name FROM sys.server_principals WHERE sid NOT IN (
    SELECT sid FROM sys.sql_logins
);

-- Transfer login script via sp_help_revlogin (deprecated) or dbatools Copy-DbaLogin
  

Agent jobs must exist on new primary — automate with multi-server administration or IaC.

Extended Events for AG

  CREATE EVENT SESSION [AG_Health] ON SERVER
ADD EVENT sqlserver.always_on_ddl_executed,
ADD EVENT sqlserver.hadr_ag_wsfc_cluster_quorum_state,
ADD EVENT sqlserver.hadr_db_manager_state_change
ADD TARGET package0.event_file(SET filename = N'D:\XEvents\AG_Health.xel');
ALTER EVENT SESSION [AG_Health] ON SERVER STATE = START;
  

Common Expert Mistakes

Mistake Outcome
AG as backup replacement Still need off-site backups — AG replicates corruption too
Readable secondary without RCSI awareness Blocking on primary during redo conflicts
Same AG listener DNS TTL too high Slow client failover
Forgetting DB_FAILOVER = ON Manual DB join after instance failover
Mixing SQL versions in AG Unsupported — blocks upgrade path

Performance Tuning for AG

  • Compression on backups — reduces seeding time
  • Instant File Initialization on all replicas
  • Matching trace flags — inconsistent flags cause hard-to-debug sync issues
  • Limit concurrent readable queries — max read-only connections workload governance
  • Use delayed durability never on sync replicas — breaks durability assumptions

Production Architecture Example

                      [Cloud Witness]
                          |
        +-----------------+-----------------+
        |                                   |
   [Site A WSFC]                      [Site B WSFC]
   SQL-PROD (sync)                     SQL-DR (async)
   SQL-PROD2 (sync)                   SQL-DR2 (async)
        |                                   |
   Listener: prod-sql                   Listener: dr-sql
   RPO: 0 (site A)                      RPO: ~5 min (manual failover)
  

Quarterly: failover drill to DR, restore app connectivity, measure RTO, fail back.

Troubleshooting Commands

  -- Endpoint state
SELECT name, state_desc FROM sys.database_mirroring_endpoints;

-- AG cluster state
SELECT cluster_name, quorum_type_desc, quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Last failover reason
SELECT ag.name, ar.replica_server_name, rh.failover_reason, rh.failover_timestamp
FROM sys.availability_replicas ar
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id
CROSS APPLY (
    SELECT TOP 1 failover_reason, failover_timestamp
    FROM sys.dm_hadr_automatic_seeding_history
) rh;  -- Use error logs + XEvents for failover detail
  

Always On rewards meticulous quorum, listener, and runbook design — treat AG as a system spanning Windows/Linux clustering, networking, and SQL Server—not just a wizard click.