to navigate

to select

to close

On this page

High Availability (Always On)

High availability (HA) keeps databases online through hardware failures, patching, and disasters. SQL Server’s flagship HA feature is Always On Availability Groups (AG) — multi-database replication with optional automatic failover and readable secondaries.

HA Options Compared

Feature	Always On AG	Failover Cluster (FCI)	Log Shipping	Replication
Failover	Automatic (AG)	Automatic (shared storage)	Manual	Manual
Read scale-out	Yes (secondaries)	No	No (standby)	Yes (subscribers)
RPO	Zero with sync	Storage dependent	Minutes	Minutes
Edition	Enterprise (full); Standard (basic)	Standard+	All	All
Complexity	High	Medium	Low	Medium

Always On Prerequisites

Windows Server Failover Cluster (WSFC) on Windows, or Pacemaker on Linux
Matching SQL Server version/build on replicas
Enterprise Edition for multi-DB AG, readable secondaries, basic features beyond limits
Standard Edition — one database per AG, no readable secondaries (verify current limits)
Network latency low enough for sync mode (typically same datacenter)
Open port 5022 for database mirroring endpoint

WSFC Quorum

Quorum determines cluster viability when nodes fail:

Quorum Mode	When to Use
Node majority	Odd number of voting nodes
Node and disk majority	Shared disk witness
Cloud witness	Azure blob — recommended for 2-node clusters
No quorum (manual)	Maintenance only

Split-brain occurs without quorum — neither side accepts writes. Test failover quarterly.

Create Availability Group

  -- On primary replica
CREATE AVAILABILITY GROUP [AppAG]
WITH (
    AUTOMATED_BACKUP_PREFERENCE = SECONDARY,
    DB_FAILOVER = ON,
    DTC_SUPPORT = NONE
)
FOR DATABASE MyApp
REPLICA ON
    N'SQL-PRIMARY' WITH (
        ENDPOINT_URL = N'TCP://sql-primary.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY)
    ),
    N'SQL-SECONDARY' WITH (
        ENDPOINT_URL = N'TCP://sql-secondary.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY)
    );
GO

Join Secondary Replica

  -- On secondary
ALTER AVAILABILITY GROUP [AppAG] JOIN;
ALTER DATABASE MyApp SET HADR = ON;

Automatic seeding streams backup over network — firewall must allow replica-to-replica traffic.

Async Replica for DR

  -- Disaster recovery site — async accepts data loss on failover
N'SQL-DR' WITH (
    ENDPOINT_URL = N'TCP://sql-dr.contoso.com:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
)

Use async when latency exceeds ~10 ms or cross-region DR is required.

Failover

  -- Planned manual failover (no data loss with sync)
ALTER AVAILABILITY GROUP [AppAG] FAILOVER;

-- Force failover with data loss (async DR emergency only)
ALTER AVAILABILITY GROUP [AppAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;

Application connection strings must target AG listener, not individual node names:

  Server=tcp:AppAG-Listener.contoso.com,1433;Database=MyApp;MultiSubnetFailover=True;

Read-Only Routing

Offload reporting to secondaries:

  ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQL-SECONDARY' WITH (
    SECONDARY_ROLE(READ_ONLY_ROUTING_URL = N'TCP://sql-secondary.contoso.com:1433')
);

ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQL-PRIMARY' WITH (
    PRIMARY_ROLE(READ_ONLY_ROUTING_LIST = (N'SQL-SECONDARY', N'SQL-PRIMARY'))
);

Connection string: ApplicationIntent=ReadOnly routes to readable secondary.

Monitoring AG Health

  SELECT
    ag.name AS ag_name,
    ar.replica_server_name,
    ars.role_desc,
    drs.synchronization_state_desc,
    drs.synchronization_health_desc,
    drs.log_send_queue_size,
    drs.redo_queue_size,
    drs.last_commit_time
FROM sys.availability_groups AS ag
INNER JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id
INNER JOIN sys.dm_hadr_availability_replica_states AS ars ON ar.replica_id = ars.replica_id
INNER JOIN sys.dm_hadr_database_replica_states AS drs ON ar.replica_id = drs.replica_id;

Alert when:

synchronization_health_desc ≠ HEALTHY
log_send_queue_size grows continuously (network or secondary load)
redo_queue_size high (secondary CPU/disk bottleneck)

Log Shipping Alternative

Simpler DR without AG licensing complexity:

  -- Primary: backup log every 15 minutes
BACKUP LOG MyApp TO DISK = N'\\share\logs\MyApp_log.trn' WITH COMPRESSION;

-- Secondary: restore with NORECOVERY
RESTORE LOG MyApp FROM DISK = N'\\share\logs\MyApp_log.trn' WITH NORECOVERY;

Manual failover — restore final log WITH RECOVERY. Higher RPO/RTO than sync AG.

Common Mistakes

Mistake	Impact
Connecting to node name	App breaks on failover
Sync AG across regions	Performance collapse, timeouts
No quorum witness	Cluster goes offline on one node loss
Skipping failover drills	Discover broken logins during outage
Backups only on primary	Miss secondary preference benefits

Performance Tips

Run backups on secondary (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
Match hardware on sync replicas — redo lag causes backlog
Use dedicated network for replication traffic
Limit readable secondary workload — competes with redo thread

Production Scenario: Patch Cycle

Failover to secondary (planned)
Patch former primary (now secondary)
Verify sync health
Fail back during maintenance window
Patch other node

Document runbook with listener name, credentials, and validation queries.

Troubleshooting

Issue	Resolution
AG not synchronizing	Check endpoint, firewall 5022, service accounts
Secondary not readable	`ALLOW_CONNECTIONS = READ_ONLY`, database in SYNCHRONIZING
Failover failed	Quorum lost — restore witness
Login failures after failover	Sync SQL logins (jobs/contained users)

  -- Verify listener
SELECT dns_name, port FROM sys.availability_group_listeners;

Always On delivers enterprise RPO/RTO — invest in quorum design, listener connection strings, and regular failover testing before production dependency.

Transactions and Locking

MongoDB Atlas

High Availability (Always On)

HA Options Compared link

Always On Prerequisites link

WSFC Quorum link

Create Availability Group link

Join Secondary Replica link

Async Replica for DR link

Failover link

Read-Only Routing link

Monitoring AG Health link

Log Shipping Alternative link

Common Mistakes link

Performance Tips link

Production Scenario: Patch Cycle link

Troubleshooting link