Distributed Availability Groups
Jennifer Brocato
About Me
- Lead DBA for the brown truck company
- 25+ years of experience with enterprise development
- Worked with SQL Server since 2000
- My article on SQLServerCentral.com on SQLDAG
What we will cover
- Availability group refresher
- The HA/DR solution
- Distributed availability groups: what they are
- Configuring a SQLDAG solution
- Failing over a SQLDAG
- Monitoring and health
Availability Groups Refresher
Availability Group Requirements
- Requires a Windows Server Failover Clustering (WSFC) cluster
- An instance of SQL Server must reside on a WSFC node
- Each availability replica of a given availability group must reside on a different node of the same WSFC cluster
- Relies on WSFC to monitor and manage the current roles of the availability replicas
- Overall health is based on all nodes in the cluster
- Quorum configuration: node and disk majority
- DMVs: sys.dm_hadr_cluster, sys.dm_hadr_cluster_members
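The two DMVs named on this slide can be queried directly to see how SQL Server views the cluster. The following is a minimal sketch, not the presenter's script; it only reads documented columns and changes nothing.

```sql
-- Cluster name and quorum model as seen by this SQL Server instance.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Each cluster member (nodes and any witness), its state, and its quorum vote.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members
ORDER BY member_name;
```

The number_of_quorum_votes column is a quick way to confirm that node votes match the intended quorum design.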
Example Configuration (GEO Cluster)
Configuring Quorum
Availability Group Basics
- Contains one or more databases
- Databases must use the full recovery model
- One unit of failover
- SQL Server Agent jobs, logins, and linked servers do not fail over
- Create the availability group, then add databases
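Before adding a database to an availability group, the recovery model requirement can be checked with a short query. This is a sketch only; D000DB1 is the example database name that appears elsewhere in this deck, and the filter on database_id assumes the usual four system databases.

```sql
-- List user databases that are not yet in the full recovery model.
SELECT name,
       recovery_model_desc
FROM sys.databases
WHERE database_id > 4                     -- skip master, tempdb, model, msdb
  AND recovery_model_desc <> N'FULL';

-- Switch one database to the full recovery model.
ALTER DATABASE [D000DB1] SET RECOVERY FULL;
```

A full backup taken after the switch starts the log chain that the secondaries will need.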
Creating the HA/DR Solution
- Make sure Failover Clustering is installed on the nodes: Administrative Tools, Failover Cluster Manager
- Perform a cluster validation
- Configure quorum node votes (PowerShell script)
  - Primary data center nodes have their vote set to 1
  - Secondary data center nodes have their vote set to 0
- Install a stand-alone or failover cluster instance
- Enable AlwaysOn Availability Groups for each SQL Server service
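Once the feature has been enabled and the service restarted, each instance can be checked from T-SQL. A minimal sketch using SERVERPROPERTY; the column aliases are illustrative.

```sql
-- 1 = AlwaysOn Availability Groups is enabled on this instance, 0 = not enabled.
-- IsClustered reports whether the instance is a failover cluster instance (FCI).
SELECT SERVERPROPERTY('ServerName')    AS server_name,
       SERVERPROPERTY('IsHadrEnabled') AS is_hadr_enabled,
       SERVERPROPERTY('IsClustered')   AS is_fci;
```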
Creating the HA/DR Solution
- Add login and grant connect to SQL
- Create endpoint
- Create database or restore to primary instance
- Create full/log backup
- Create availability group
- Restore full/log backup to each secondary
- Join availability group
    USE [master]
    GO
    CREATE LOGIN [US\DataCtrSQLSvc] FROM WINDOWS WITH DEFAULT_DATABASE = [master]
    GO
    GRANT CONNECT SQL TO [US\DataCtrSQLSvc]
    GO
    --Use AES encryption, RC4 is deprecated
    CREATE ENDPOINT [hadr_endpoint]
        STATE = STARTED
        AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
        FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE, ENCRYPTION = REQUIRED ALGORITHM AES)
    GO
    GRANT CONNECT ON ENDPOINT::[hadr_endpoint] TO [US\DataCtrSQLSvc]
    GO
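The backup, create, restore, and join steps listed on this slide might look roughly like the sketch below. The server names (NODE1, NODE2), endpoint URLs, and backup share are hypothetical placeholders, not values from the presentation; AG_1 and D000DB1 are the names used elsewhere in the deck.

```sql
-- On the primary instance: take full and log backups (share path is hypothetical).
BACKUP DATABASE [D000DB1] TO DISK = N'\\backupshare\sql\D000DB1.bak';
BACKUP LOG [D000DB1] TO DISK = N'\\backupshare\sql\D000DB1.trn';
GO

-- On the primary instance: create the availability group with two replicas.
CREATE AVAILABILITY GROUP [AG_1]
FOR DATABASE [D000DB1]
REPLICA ON
    N'NODE1' WITH (
        ENDPOINT_URL = N'TCP://node1.example.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'NODE2' WITH (
        ENDPOINT_URL = N'TCP://node2.example.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC);
GO

-- On each secondary instance: restore without recovery so log records can be applied.
RESTORE DATABASE [D000DB1] FROM DISK = N'\\backupshare\sql\D000DB1.bak' WITH NORECOVERY;
RESTORE LOG [D000DB1] FROM DISK = N'\\backupshare\sql\D000DB1.trn' WITH NORECOVERY;
GO

-- On each secondary instance: join the replica, then join the database.
ALTER AVAILABILITY GROUP [AG_1] JOIN;
GO
ALTER DATABASE [D000DB1] SET HADR AVAILABILITY GROUP = [AG_1];
GO
```

Each batch runs on a different instance, as the comments indicate, so this is not meant to be executed as one script.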
Primary Replica with 4 Secondaries
Distributed Availability Group (SQLDAG)
One of the new features in SQL Server 2016 is the ability to distribute availability groups across clusters. This lets a high availability and disaster recovery solution span geographically dispersed sites. A distributed availability group associates availability groups on two different Windows Server Failover Clusters.
Distributed Availability Group
- Two or more clusters
- Mix of standalone instances and FCIs
- The secondary cluster only knows that it is a secondary; it does not know which cluster is primary (a DMV is coming for visibility into this)
- No GUI
- No alerts
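With no GUI support, the catalog views are the place to see which availability groups on an instance are distributed. A minimal sketch using documented columns; for a distributed availability group, replica_server_name is expected to hold the name of each member availability group rather than a server name.

```sql
-- Show every availability group known to this instance and flag distributed ones.
SELECT ag.name               AS availability_group,
       ag.is_distributed,
       ar.replica_server_name,
       ar.endpoint_url,
       ar.availability_mode_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
ORDER BY ag.is_distributed, ag.name;
```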
Creating the HA/DR Solution using SQLDAG (…continued)
- Add login and grant connect to SQL
- Create endpoint
- Create database or restore to primary instance
- Create full/log backup
- Restore full/log backup to each secondary, including the secondary cluster(s)
- Create availability group
- Join availability group on primary cluster (may contain just one replica)
- Create availability group on the primary instance of the secondary cluster
- For automatic seeding: ALTER AVAILABILITY GROUP AG_1 GRANT CREATE ANY DATABASE
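The availability group on the secondary cluster could be created along the lines of the sketch below, assuming automatic seeding is used instead of restoring backups. NODE3, NODE4, and the endpoint URLs are hypothetical; AG_2 is the name used for the second availability group elsewhere in the deck.

```sql
-- On the primary instance of the secondary cluster: create an empty availability group.
-- No database is listed because the distributed availability group will seed it.
CREATE AVAILABILITY GROUP [AG_2]
FOR
REPLICA ON
    N'NODE3' WITH (
        ENDPOINT_URL = N'TCP://node3.example.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SEEDING_MODE = AUTOMATIC),
    N'NODE4' WITH (
        ENDPOINT_URL = N'TCP://node4.example.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SEEDING_MODE = AUTOMATIC);
GO

-- Allow the availability group to create databases during automatic seeding.
ALTER AVAILABILITY GROUP [AG_2] GRANT CREATE ANY DATABASE;
GO

-- On the other replica in the secondary cluster (NODE4 in this sketch):
ALTER AVAILABILITY GROUP [AG_2] JOIN;
GO
ALTER AVAILABILITY GROUP [AG_2] GRANT CREATE ANY DATABASE;
GO
```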
Now for the FUN PART! WITH (DISTRIBUTED)
Go to the primary instance of the primary cluster and create the SQLDAG:
    CREATE AVAILABILITY GROUP [distributedag]
    WITH (DISTRIBUTED)
    AVAILABILITY GROUP ON
        'AG_1' WITH (
            LISTENER_URL = 'tcp://<virtualname>:5022', --Use listener name when there is a standalone
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL
        ),
        'AG_2' WITH (
            LISTENER_URL = 'tcp://<virtualname>:5022', --Use SQLVNN not listener name when there is an FCI
            --check port to make sure it is 5022/5023/5024 etc
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL
        );
    --Add SEEDING_MODE = AUTOMATIC or MANUAL to each WITH clause to match how the secondary cluster is seeded
    GO
Now for the FUN PART…again!
Go to the primary instance of the secondary cluster and join the SQLDAG. The join uses ALTER AVAILABILITY GROUP, not CREATE:
    ALTER AVAILABILITY GROUP [distributedag]
    JOIN
    AVAILABILITY GROUP ON
        'AG_1' WITH (
            LISTENER_URL = 'tcp://<virtualname>:5022', --Use listener name when there is a standalone
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL
        ),
        'AG_2' WITH (
            LISTENER_URL = 'tcp://<virtualname>:5022', --Use SQLVNN not listener name when there is an FCI
            --check port to make sure it is 5022/5023/5024 etc
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL
        );
    --Add SEEDING_MODE = AUTOMATIC or MANUAL to each WITH clause to match how the secondary cluster is seeded
    GO
Now Let’s Get Synchronized
    ALTER DATABASE D000DB1 SET HADR AVAILABILITY GROUP = xxxx
Then WAIT….
Check the health of the distributed availability group to see if all replicas are synchronizing:
    select r.replica_server_name,
           r.endpoint_url,
           rs.connected_state_desc,
           rs.last_connect_error_description,
           rs.last_connect_error_number,
           rs.last_connect_error_timestamp
    from sys.dm_hadr_availability_replica_states rs
    join sys.availability_replicas r on rs.replica_id = r.replica_id
    where rs.is_local = 1
There is no GUI for SQLDAG, so you will need to rely on DMVs.
Gathering Information About Availability Groups
- sys.dm_hadr_automatic_seeding
- sys.dm_hadr_physical_seeding_stats
- sys.dm_hadr_availability_replica_cluster_nodes
- sys.dm_hadr_availability_group_states
- sys.dm_hadr_availability_replica_states
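As an illustration of how the seeding DMVs in this list can be used, the sketch below shows seeding history and the progress of any seeding currently in flight. It selects only a subset of the documented columns and is not taken from the presentation.

```sql
-- History of automatic seeding attempts, newest first.
SELECT start_time,
       completion_time,
       current_state,
       failure_state_desc,
       number_of_attempts
FROM sys.dm_hadr_automatic_seeding
ORDER BY start_time DESC;

-- Progress of seeding operations that are running right now.
SELECT local_database_name,
       role_desc,
       internal_state_desc,
       transferred_size_bytes,
       database_size_bytes,
       start_time_utc,
       estimate_time_complete_utc
FROM sys.dm_hadr_physical_seeding_stats;
```

The second query returns rows only while a seeding operation is active.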
Manual Failover of SQLDAG
1. ALTER AVAILABILITY GROUP on the SQLDAG (secondary cluster) to set SYNCHRONOUS_COMMIT
2. Verify from sys.dm_hadr_database_replica_states that the status is SYNCHRONIZED and that end_of_log_lsn matches on the primaries in both clusters
3. On the instance that hosts the primary SQLDAG, set the SQLDAG role to secondary: ALTER AVAILABILITY GROUP xxx SET (ROLE = SECONDARY)
4. At this point the SQLDAG is unavailable
5. Verify readiness to fail over again
6. Issue the failover: ALTER AVAILABILITY GROUP xxx FORCE_FAILOVER_ALLOW_DATA_LOSS
7. After this step the SQLDAG is available again
8. Set the SQLDAG back to ASYNCHRONOUS_COMMIT
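The failover steps above can be sketched in T-SQL as follows. This is an outline under assumptions rather than a tested runbook: the names distributedag, AG_1, and AG_2 come from the earlier slides, each batch runs on the instance named in its comment, and here both member availability groups are switched to synchronous commit.

```sql
-- Step 1: on the primary instance of each cluster, switch to synchronous commit.
ALTER AVAILABILITY GROUP [distributedag]
MODIFY AVAILABILITY GROUP ON
    'AG_1' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT),
    'AG_2' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
GO

-- Step 2: on the primary instance of each cluster, compare state and LSNs.
-- Proceed only when the state is SYNCHRONIZED and the LSNs match on both sides.
SELECT ag.name                  AS availability_group,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.end_of_log_lsn,
       drs.last_hardened_lsn
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups AS ag
    ON ag.group_id = drs.group_id;
GO

-- Step 3: on the instance that hosts the primary SQLDAG, demote it.
-- The SQLDAG is unavailable from here until the failover completes.
ALTER AVAILABILITY GROUP [distributedag] SET (ROLE = SECONDARY);
GO

-- Step 4: after re-running the step 2 check, fail over on the
-- primary instance of the secondary cluster.
ALTER AVAILABILITY GROUP [distributedag] FORCE_FAILOVER_ALLOW_DATA_LOSS;
GO

-- Step 5: return to asynchronous commit.
ALTER AVAILABILITY GROUP [distributedag]
MODIFY AVAILABILITY GROUP ON
    'AG_1' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT),
    'AG_2' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
GO
```

Despite its name, FORCE_FAILOVER_ALLOW_DATA_LOSS is the command used for a planned SQLDAG failover; the synchronization check in step 2 is what guards against data loss.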
Configuration Example
- 2 sets of clustered VMs
- Availability group in each VM cluster: automatic failover, synchronous
- Distributed availability group between each cluster set
- ESX host cluster A in data center 1
- ESX host cluster B in data center 1
- ESX host cluster C in data center 2
- ESX host cluster D in data center 2
- VM1 on ESX host cluster A
- VM2 on ESX host cluster B
- VM3 on ESX host cluster C
- VM4 on ESX host cluster D
- Cluster VM1 and VM2: Cluster X
- Cluster VM3 and VM4: Cluster Y
- VM1 and VM2 set up with an AG with automatic failover and synchronous commit
- VM3 and VM4 set up with an AG with automatic failover and synchronous commit
- SQLDAG between cluster X and cluster Y
Monitor SQLDAG Replication Health State
Outline of the query:
    SELECT *
    FROM (SELECT … FROM sys.availability_replicas r
          LEFT JOIN sys.availability_groups ds ON r.group_id = ds.group_id
          WHERE ds.name = '<AGName>'
         ) a
    INNER JOIN
         (SELECT … FROM sys.availability_groups ag
          JOIN sys.dm_hadr_availability_group_states AS ags ON ag.group_id = ags.group_id
          LEFT JOIN sys.dm_hadr_database_replica_states ds ON ds.group_id = ag.group_id
         ) b ON b.availability_group = a.replica_server_name
Full query:
    select *
    from (
        select CAST(r.replica_id as VARCHAR(36)) as replica_id,
               CAST(r.group_id as VARCHAR(36)) as group_id,
               replica_server_name,
               endpoint_url,
               ISNULL(ds.is_distributed,0) as is_distributed,
               ds.name
        from sys.availability_replicas r
        left join sys.availability_groups ds on r.group_id = ds.group_id
        where ds.name = '<AGName>'
    ) a
    inner join (
        select availability_group=cast(ag.name as varchar(30)),
               primary_replica=cast(ags.primary_replica as varchar(30)),
               primary_recovery_health_desc=cast(ags.primary_recovery_health_desc as varchar(30)),
               synchronization_health_desc=cast(ags.synchronization_health_desc as varchar(30)),
               CAST(ag.group_id as VARCHAR(36)) as group_id,
               CAST(ag.resource_id as VARCHAR(36)) as resource_id,
               CAST(ag.resource_group_id as VARCHAR(36)) as resource_group_id,
               ag.failure_condition_level,
               ag.health_check_timeout,
               ag.is_distributed,
               ds.recovery_lsn,
               ds.truncation_lsn,
               CAST(ISNULL(ds.last_sent_lsn,0) AS VARCHAR) last_sent_lsn,
               ISNULL(ds.last_sent_time,0) as last_sent_time,
               CAST(ISNULL(ds.last_received_lsn,0) AS VARCHAR) as last_received_lsn,
               ISNULL(ds.last_received_time,0) as last_received_time,
               CAST(ISNULL(ds.last_hardened_lsn,0) AS VARCHAR) as last_hardened_lsn,
               ISNULL(ds.last_hardened_time,0) as last_hardened_time,
               ds.last_redone_lsn,
               ds.last_redone_time,
               ds.log_send_queue_size,
               CAST(ISNULL(ds.log_send_rate,0) AS VARCHAR) as log_send_rate,
               ds.redo_queue_size,
               CAST(ISNULL(ds.redo_rate,0) AS VARCHAR) as redo_rate,
               CAST(ds.end_of_log_lsn AS VARCHAR) as end_of_log_lsn,
               CAST(ds.last_commit_lsn AS VARCHAR) as last_commit_lsn,
               ds.last_commit_time,
               automated_backup_preference_desc=cast(ag.automated_backup_preference_desc as varchar(10))
        from sys.availability_groups ag
        join sys.dm_hadr_availability_group_states ags on ag.group_id = ags.group_id
        left join sys.dm_hadr_database_replica_states ds on ds.group_id = ag.group_id
    ) b on b.availability_group = a.replica_server_name
Health State
Metric collection timestamp, LSN information, and health state
Health State
LSN hardened, redo rate, LSN commit
Alert on Unhealthy SQLDAG State
- Record the health state with the query at a set interval using a job (5 minutes is a suggestion)
- Use a job to compare the latest health state with the prior health state
- Repeated unhealthy SQLDAG replication triggers the alerting mechanism (such as email)
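One possible shape for the record-and-compare approach is sketched below. Everything named here is an assumption for illustration: the dbo.SqldagHealthLog table, the rule that two consecutive unhealthy samples count as "repeated", the Database Mail profile name, and the recipient address. It also records only synchronization_health_desc rather than the full set of health columns.

```sql
-- Hypothetical history table for periodic health samples.
IF OBJECT_ID(N'dbo.SqldagHealthLog', N'U') IS NULL
    CREATE TABLE dbo.SqldagHealthLog (
        collected_at                datetime2(0) NOT NULL DEFAULT SYSDATETIME(),
        availability_group          sysname      NOT NULL,
        synchronization_health_desc nvarchar(60) NULL
    );
GO

-- Job step 1 (for example every 5 minutes): record the current health state.
INSERT INTO dbo.SqldagHealthLog (availability_group, synchronization_health_desc)
SELECT ag.name, ags.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.dm_hadr_availability_group_states AS ags
    ON ags.group_id = ag.group_id
WHERE ag.is_distributed = 1;
GO

-- Job step 2: alert when the two most recent samples for a group are both unhealthy.
IF EXISTS (
    SELECT recent.availability_group
    FROM (
        SELECT availability_group,
               synchronization_health_desc,
               ROW_NUMBER() OVER (PARTITION BY availability_group
                                  ORDER BY collected_at DESC) AS rn
        FROM dbo.SqldagHealthLog
    ) AS recent
    WHERE recent.rn <= 2
    GROUP BY recent.availability_group
    HAVING COUNT(*) = 2
       AND SUM(CASE WHEN recent.synchronization_health_desc = N'HEALTHY'
                    THEN 0 ELSE 1 END) = 2
)
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = N'DBA Mail',              -- hypothetical Database Mail profile
        @recipients   = N'dba-team@example.com',  -- hypothetical recipient
        @subject      = N'SQLDAG replication unhealthy',
        @body         = N'The last two health samples for a distributed availability group were not HEALTHY.';
END
GO
```

Requiring two consecutive unhealthy samples avoids alerting on a single transient blip; the threshold can be raised if the environment is noisy.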
Worth Mentioning
- SQL Server 2017 read-scale availability groups: not for HA or DR, just for synchronization; no WSFC required
- The official abbreviation for distributed availability groups is not DAG. DAG is used for Exchange Database Availability Groups.
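For comparison, a read-scale availability group in SQL Server 2017 is created with CLUSTER_TYPE = NONE. The sketch below uses hypothetical server names, endpoint URLs, and group name; it is included only to show how the clusterless syntax differs.

```sql
-- On the primary instance: no WSFC is involved, so failover mode must be MANUAL.
CREATE AVAILABILITY GROUP [ReadScaleAG]
WITH (CLUSTER_TYPE = NONE)
FOR DATABASE [D000DB1]
REPLICA ON
    N'NODE1' WITH (
        ENDPOINT_URL = N'TCP://node1.example.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)),
    N'NODE2' WITH (
        ENDPOINT_URL = N'TCP://node2.example.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL));
GO

-- On the secondary instance: join and allow automatic seeding.
ALTER AVAILABILITY GROUP [ReadScaleAG] JOIN WITH (CLUSTER_TYPE = NONE);
GO
ALTER AVAILABILITY GROUP [ReadScaleAG] GRANT CREATE ANY DATABASE;
GO
```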
Questions
Thank You