High availability (HA) keeps databases online through hardware failures, patching, and disasters. SQL Server’s flagship HA feature is Always On Availability Groups (AG) — multi-database replication with optional automatic failover and readable secondaries.

HA Options Compared

Feature Always On AG Failover Cluster (FCI) Log Shipping Replication
Failover Automatic (AG) Automatic (shared storage) Manual Manual
Read scale-out Yes (secondaries) No No (standby) Yes (subscribers)
RPO Zero with sync Storage dependent Minutes Minutes
Edition Enterprise (full); Standard (basic) Standard+ All All
Complexity High Medium Low Medium

Always On Prerequisites

  • Windows Server Failover Cluster (WSFC) on Windows, or Pacemaker on Linux
  • Matching SQL Server version/build on replicas
  • Enterprise Edition for multi-DB AG, readable secondaries, basic features beyond limits
  • Standard Edition — one database per AG, no readable secondaries (verify current limits)
  • Network latency low enough for sync mode (typically same datacenter)
  • Open port 5022 for database mirroring endpoint

WSFC Quorum

Quorum determines cluster viability when nodes fail:

Quorum Mode When to Use
Node majority Odd number of voting nodes
Node and disk majority Shared disk witness
Cloud witness Azure blob — recommended for 2-node clusters
No quorum (manual) Maintenance only

Split-brain occurs without quorum — neither side accepts writes. Test failover quarterly.

Create Availability Group

  -- On primary replica
CREATE AVAILABILITY GROUP [AppAG]
WITH (
    AUTOMATED_BACKUP_PREFERENCE = SECONDARY,
    DB_FAILOVER = ON,
    DTC_SUPPORT = NONE
)
FOR DATABASE MyApp
REPLICA ON
    N'SQL-PRIMARY' WITH (
        ENDPOINT_URL = N'TCP://sql-primary.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY)
    ),
    N'SQL-SECONDARY' WITH (
        ENDPOINT_URL = N'TCP://sql-secondary.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY)
    );
GO
  

Join Secondary Replica

  -- On secondary
ALTER AVAILABILITY GROUP [AppAG] JOIN;
ALTER DATABASE MyApp SET HADR = ON;
  

Automatic seeding streams backup over network — firewall must allow replica-to-replica traffic.

Async Replica for DR

  -- Disaster recovery site — async accepts data loss on failover
N'SQL-DR' WITH (
    ENDPOINT_URL = N'TCP://sql-dr.contoso.com:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
)
  

Use async when latency exceeds ~10 ms or cross-region DR is required.

Failover

  -- Planned manual failover (no data loss with sync)
ALTER AVAILABILITY GROUP [AppAG] FAILOVER;

-- Force failover with data loss (async DR emergency only)
ALTER AVAILABILITY GROUP [AppAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
  

Application connection strings must target AG listener, not individual node names:

  Server=tcp:AppAG-Listener.contoso.com,1433;Database=MyApp;MultiSubnetFailover=True;
  

Read-Only Routing

Offload reporting to secondaries:

  ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQL-SECONDARY' WITH (
    SECONDARY_ROLE(READ_ONLY_ROUTING_URL = N'TCP://sql-secondary.contoso.com:1433')
);

ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQL-PRIMARY' WITH (
    PRIMARY_ROLE(READ_ONLY_ROUTING_LIST = (N'SQL-SECONDARY', N'SQL-PRIMARY'))
);
  

Connection string: ApplicationIntent=ReadOnly routes to readable secondary.

Monitoring AG Health

  SELECT
    ag.name AS ag_name,
    ar.replica_server_name,
    ars.role_desc,
    drs.synchronization_state_desc,
    drs.synchronization_health_desc,
    drs.log_send_queue_size,
    drs.redo_queue_size,
    drs.last_commit_time
FROM sys.availability_groups AS ag
INNER JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id
INNER JOIN sys.dm_hadr_availability_replica_states AS ars ON ar.replica_id = ars.replica_id
INNER JOIN sys.dm_hadr_database_replica_states AS drs ON ar.replica_id = drs.replica_id;
  

Alert when:

  • synchronization_health_desc ≠ HEALTHY
  • log_send_queue_size grows continuously (network or secondary load)
  • redo_queue_size high (secondary CPU/disk bottleneck)

Log Shipping Alternative

Simpler DR without AG licensing complexity:

  -- Primary: backup log every 15 minutes
BACKUP LOG MyApp TO DISK = N'\\share\logs\MyApp_log.trn' WITH COMPRESSION;

-- Secondary: restore with NORECOVERY
RESTORE LOG MyApp FROM DISK = N'\\share\logs\MyApp_log.trn' WITH NORECOVERY;
  

Manual failover — restore final log WITH RECOVERY. Higher RPO/RTO than sync AG.

Common Mistakes

Mistake Impact
Connecting to node name App breaks on failover
Sync AG across regions Performance collapse, timeouts
No quorum witness Cluster goes offline on one node loss
Skipping failover drills Discover broken logins during outage
Backups only on primary Miss secondary preference benefits

Performance Tips

  • Run backups on secondary (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
  • Match hardware on sync replicas — redo lag causes backlog
  • Use dedicated network for replication traffic
  • Limit readable secondary workload — competes with redo thread

Production Scenario: Patch Cycle

  1. Failover to secondary (planned)
  2. Patch former primary (now secondary)
  3. Verify sync health
  4. Fail back during maintenance window
  5. Patch other node

Document runbook with listener name, credentials, and validation queries.

Troubleshooting

Issue Resolution
AG not synchronizing Check endpoint, firewall 5022, service accounts
Secondary not readable ALLOW_CONNECTIONS = READ_ONLY, database in SYNCHRONIZING
Failover failed Quorum lost — restore witness
Login failures after failover Sync SQL logins (jobs/contained users)
  -- Verify listener
SELECT dns_name, port FROM sys.availability_group_listeners;
  

Always On delivers enterprise RPO/RTO — invest in quorum design, listener connection strings, and regular failover testing before production dependency.