Published by Kellie Collins. Modified over 8 years ago.
1
SQL Server 2012: AlwaysOn HA and DR Design Patterns, and Lessons Learned from Early Customer Deployments
Sanjay Mishra, SQLCAT
2
Who and Why
Who am I?
Why am I speaking about this?
3
Setting the Stage
AlwaysOn ≠ Availability Groups
AlwaysOn = { SQL Server Failover Cluster Instances, Availability Groups }
Availability Groups ≠ Database Mirroring
4
Key Learnings from Early Customer Deployments
Windows Cluster (Windows Server Failover Cluster, WSFC) is the foundation for HA and DR in SQL Server 2012 AlwaysOn
  AlwaysOn inherits all “characteristics” of Windows Cluster
Windows Cluster
  Every single AlwaysOn deployment is a Windows Cluster deployment
  You must understand Windows Cluster if you want to deploy, operate, monitor, troubleshoot, or administer AlwaysOn
    Key areas are: quorum model, cluster network communication, DR procedures, cluster.exe, PowerShell
  Windows Cluster ≠ SQL Cluster (SQL Server Failover Cluster Instance); a Windows Cluster, therefore, is NOT necessarily a shared-storage cluster
  Many key enhancements have been made to Windows Cluster specifically for SQL Server 2012 AlwaysOn:
    Asymmetric Disk
    Node Votes
    Asymmetric Disk as Quorum resource
These are the things we repeatedly had to remind customers of, to keep them on track for successful deployments.
5
Key Learnings from Early Customer Deployments
Organizational structure
  Typically, teams and skills are organized into separate groups: SQL Server DBA team and Windows Server Admin team
  AlwaysOn reaches out beyond the SQL Server DBA
  DBAs need to work closely with Windows / Network Administration teams
    Not just for initial deployment, but for troubleshooting and disaster recovery as well
Historical experience
  You need to unlearn and relearn a few things if you are already experienced with Windows Cluster, but new to AlwaysOn
  For example, if you haven’t read the Windows Cluster documentation in the last few months, it is worth a re-read now
New/Different Tools for administration and troubleshooting
  Windows cluster log
  Failover Cluster Manager
  PowerShell
  cluster.exe
  Knowledge of PowerShell and cluster.exe command lines will come in very handy
6
SQL Server 2012 AlwaysOn HA+DR Design Patterns
SQL Server 2012 AlwaysOn HA+DR solutions, with their characteristics and the corresponding pre-Denali solution:
1. Multi-site Failover Cluster Instance (FCI) for HA and DR
   Solution characteristics:
     Shared Storage solution *
     Instance Level HA (automatic)
     Instance Level DR (automatic *)
     Uses storage replication
     Doesn’t require database to be in FULL recovery model
   Corresponding pre-Denali solution: Multi-site FCI using stretch VLAN
2. Availability Group for HA and DR
   Solution characteristics:
     Non-Shared Storage solution
     (Group of) Database Level HA (automatic)
     (Group of) Database Level DR (manual)
     DR replica can be Active Secondary
     Requires database to be in FULL recovery model
   Corresponding pre-Denali solution: Database Mirroring for Local HA and Log Shipping for DR
3. Failover Cluster Instance for local HA + Availability Group for DR
   Solution characteristics:
     Combined Shared Storage and Non-Shared Storage
     Asymmetric storage is the key to this solution
   Corresponding pre-Denali solution: Failover Cluster Instance for Local HA and Database Mirroring for DR
Basic knowledge of:
  AlwaysOn Failover Cluster Instances (FCI)
  AlwaysOn Availability Groups (AG)
Definitions, for the purpose of this presentation:
  High Availability (Local HA): Availability within a data center
  Disaster Recovery (DR): Availability across data centers
Key concept behind FCI+AG architecture:
  Windows Server Failover Clustering capability introduced in:
    Windows Server 2008 R2 SP1
    Windows Server 2008 with QFE
  Symmetric storage = a cluster disk that is shared between all the WSFC nodes
  Asymmetric storage = a cluster disk that is shared between a subset of nodes
7
SQL Server 2012: Multi-site FCI for HA and DR
[Diagram: one Windows Server Failover Cluster spanning the Primary Site and the DR Site; Node 1 to Node 4 host a single SQL-FCI (one Active node, three Passive nodes); Storage Replication between the sites]
Typically used quorum model: Node+Fileshare Majority
8
SQL Server 2012: AG for HA and DR
[Diagram: Primary Data Center and Disaster Recovery Data Center in one Windows Server Failover Cluster; an Availability Group with Primary and Secondary replicas; a Fileshare Witness; links labeled Synchronous and Synchronous / Asynchronous]
Possible Quorum Models:
  Node Majority
  Node and Fileshare Majority (shown here)
9
SQL Server 2012: FCI for HA, AG for DR
[Diagram: one Windows Server Failover Cluster with Node 1 to Node 4; SQL-FCI-1 (Primary) at the Primary Site and SQL-FCI-2 (Secondary) at the DR Site, joined by an Availability Group]
All 4 quorum modes are possible.
Possible Quorum Models:
  Node Majority
  Node and Fileshare Majority
  Node and (Asymmetric) Disk Majority
  (Asymmetric) Disk Only
Asymmetric Disk as Quorum Resource
  Prior to Asymmetric Disk capability, for a disk to be a cluster resource (and a quorum resource) it was required to be visible from all the nodes.
  With Asymmetric Disk capability, a cluster disk can be visible to a subset of nodes.
  Asymmetric Disk can be used as a quorum resource:
    Not through Failover Cluster Manager GUI, or PowerShell
    But through cluster.exe command line
  Asymmetric Disk as quorum resource enables quorum models:
    Node + Asymmetric Disk Majority
    Asymmetric Disk Only
10
Lessons Learned from Customer Deployments
We will discuss two Tier-1 Mission-Critical deployments to relate lessons learned with specific scenarios:
  ServiceU
  Microsoft IT SAP ERP Deployment
11
Customer Deployment Example: ServiceU Corporation
Discuss the flow of the presentation here (Challenges, Solutions – some easy, some complex), etc.
12
Database Mirroring Asynchronous
FCI + DBM (prior to SQL Server 2012)
(FCIs for local high availability, DBM for Disaster Recovery)
[Diagram: PRIMARY at Site 1 and SECONDARY at Site 2, connected by asynchronous Database Mirroring]
We wanted last man standing: if there is a single node up, it should be serving requests with no user intervention.
Two independent Windows clusters, one at each site
Each site has a 3-node FCI
Using Disk-Only Quorum
13
Availability Group Asynchronous
FCI + AG (SQL Server 2012)
(FCIs for local high availability, AG for Disaster Recovery)
[Diagram: PRIMARY at Site 1 and SECONDARY at Site 2, connected by an asynchronous Availability Group]
We wanted last man standing: if there is a single node up, it should be serving requests with no user intervention.
There are two obstacles to last man standing functionality: connectivity between the sites and, even if we could solve that (and we can), node majority.
With SQL Server 2012 and AGs, it is a single Windows cluster spanning the two sites, instead of a different Windows cluster at each site.
NOTE: Asymmetric Storage requires Windows Server 2008 R2 with SP1, or Windows Server 2008 with QFEs
14
Lesson 1: Choose Appropriate Quorum Model
For mission-critical infrastructure, ServiceU has extra passive nodes at each site (3-node FCIs with only 1 active SQL Server instance), so that in the event of a hardware failure there is still an extra node for local HA
ServiceU SLA demands that SQL Server be online, with no user intervention, as long as one node (out of 3) is up in the primary site => Last Man Standing
Node Majority or Node+Fileshare Majority can’t provide Last Man Standing
In an FCI+AG configuration across two sites, the storage is asymmetric storage.
Can an asymmetric disk resource be used as quorum?
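The Last Man Standing requirement can be checked with simple vote arithmetic. The sketch below is a simplified model, not Windows Cluster's actual algorithm. It assumes six nodes (three per site, with the hypothetical names P1 to P3 and D1 to D3), one vote per node, and a quorum disk that is online and visible only to the primary-site nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    up: bool
    sees_quorum_disk: bool  # asymmetric disk: visible to a subset of nodes only


def node_majority(nodes, witness_votes=0, witness_up=False):
    """Quorum needs a strict majority of all configured votes."""
    total = len(nodes) + witness_votes
    present = sum(1 for n in nodes if n.up)
    if witness_up:
        present += witness_votes
    return 2 * present > total


def disk_only(nodes):
    """The disk holds the only vote: one running node that can reach it suffices."""
    return any(n.up and n.sees_quorum_disk for n in nodes)


def cluster(up_names):
    """Hypothetical 6-node layout: P1-P3 at the primary site, D1-D3 at the DR site."""
    names = ["P1", "P2", "P3", "D1", "D2", "D3"]
    return [Node(n, n in up_names, n.startswith("P")) for n in names]


last_man = cluster({"P3"})  # only one primary-site node survives
print("Node Majority:          ", node_majority(last_man))
print("Node+Fileshare Majority:", node_majority(last_man, witness_votes=1, witness_up=True))
print("Asymmetric Disk Only:   ", disk_only(last_man))
```

In this model only the disk-only quorum keeps the cluster up with a single surviving node. The trade-off is that the quorum disk becomes a single point of dependency.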
15
Availability Group Asynchronous Asymmetric Disk-Only Quorum
FCI + AG (SQL Server 2012)
(FCIs for local high availability, AG for Disaster Recovery)
[Diagram: PRIMARY at Site 1 and SECONDARY at Site 2, connected by an asynchronous Availability Group, with Asymmetric Disk-Only Quorum]
Solution: Windows Server 2008 now supports Asymmetric Disk-Only Quorum
  Must be configured with cluster.exe; not supported in GUI
  Requires testing and thorough knowledge of Windows clustering
  Allows “Last Man Standing”
16
Lesson 2: Understand Operational Procedures for DR
Scenario: Disaster = entire primary site is unavailable
Considerations:
#1: Must force quorum on the DR site to get Windows cluster back online (any quorum model)
  Impact: change operational procedures
  /FQ switch seems to work faster than the GUI
#2: Rethink quorum model after disaster (any quorum model)
  Re-establishing quorum is a Windows cluster admin job, not a DBA job. Until this is done you cannot bring SQL online.
  Who needs to be involved?
#3: Changing quorum model is a multi-step process for ServiceU
  Convert from disk-only to node majority
  Then, convert from node majority to disk-only (with a separate new disk resource at the DR site)
Ensure the process is documented and understood by all players.
17
Lesson 3: Instance Name & Path
Plan the file paths and instance names before starting the first installation
Default installation paths on shared disk:
  ..\MSSQL11.InstanceName\MSSQL\DATA
  ..\MSSQL11.InstanceName\MSSQL\Log
Requirement for an AG between two FCIs: DIFFERENT instance names between FCIs
Recommendation for all AG deployments: SAME file path across all FCIs
Solution: Be sure to use a path for Data and Logs that is always the same for all instances participating in an AG.
The file path recommendation is the same as in DBM, but it is more significant now: because it is one Windows cluster, the two FCI instance names can’t be the same, as they could in DBM between two FCIs. The default paths embed the instance name, so they will differ between the FCIs.
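Both rules (different instance names, same file paths) can be checked before the first installation. The following is a minimal sketch, assuming the planned layout is written down as tuples. The instance names INST1 and INST2 and the `D:\SQLData` and `L:\SQLLog` paths are hypothetical. The default paths keep the elided `..` prefix from the slide.

```python
import ntpath
from collections import Counter


def normalize(path):
    """Compare Windows paths case-insensitively, with separators normalized."""
    return ntpath.normcase(ntpath.normpath(path))


def check_ag_layout(instances):
    """instances: list of (instance_name, data_path, log_path), one per FCI."""
    problems = []
    names = Counter(name.upper() for name, _, _ in instances)
    for name, count in names.items():
        if count > 1:
            problems.append(f"instance name {name} is used by {count} FCIs")
    if len({normalize(data) for _, data, _ in instances}) > 1:
        problems.append("data paths differ between FCIs")
    if len({normalize(log) for _, _, log in instances}) > 1:
        problems.append("log paths differ between FCIs")
    return problems


# Default paths embed the instance name, so they can never match across FCIs.
default_layout = [
    ("INST1", r"..\MSSQL11.INST1\MSSQL\DATA", r"..\MSSQL11.INST1\MSSQL\Log"),
    ("INST2", r"..\MSSQL11.INST2\MSSQL\DATA", r"..\MSSQL11.INST2\MSSQL\Log"),
]
# Hypothetical planned layout: same custom paths on every FCI.
planned_layout = [
    ("INST1", r"D:\SQLData", r"L:\SQLLog"),
    ("INST2", r"D:\SQLData", r"L:\SQLLog"),
]

print(check_ag_layout(default_layout))
print(check_ag_layout(planned_layout))
```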
18
Lesson 4: Concurrent FCI setup on multiple nodes will set possible owners incorrectly
Initial setup: Node1 and Node2 added during the initial install. Both are possible owners of the FCI
What happened during a concurrent install on Node3 and Node4:
  Node3 setup started (possible owners=Node1 & Node2)
  Node4 setup started a few seconds later (possible owners=Node1 & Node2)
  Node3 setup finished: possible owners set to previous owners (Node1 & Node2) PLUS this node (Node3)
  Node4 setup finished: possible owners set to previous owners (Node1 & Node2) PLUS this node (Node4)
End result: Possible owners=Node1, Node2, Node4
Correct result: Possible owners=Node1, Node2, Node3, Node4
Solution: DO NOT use concurrent installation. It is unsupported and NOT recommended.
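The sequence above is a classic lost update: each setup reads the owner list when it starts and writes back its own copy when it finishes. The sketch below replays that sequence deterministically. The function names are hypothetical and model only the behavior described on the slide, not SQL Server setup itself.

```python
def start_setup(possible_owners):
    """Setup snapshots the current possible owners when it starts."""
    return list(possible_owners)


def finish_setup(snapshot, this_node):
    """On completion, setup writes back its snapshot plus the node it ran on."""
    return snapshot + [this_node]


def concurrent_install(owners):
    snap3 = start_setup(owners)            # Node3 sees Node1, Node2
    snap4 = start_setup(owners)            # Node4 also sees Node1, Node2
    owners = finish_setup(snap3, "Node3")  # Node1, Node2, Node3
    owners = finish_setup(snap4, "Node4")  # overwrites the list: Node3 is lost
    return owners


def sequential_install(owners):
    owners = finish_setup(start_setup(owners), "Node3")
    owners = finish_setup(start_setup(owners), "Node4")  # snapshot includes Node3
    return owners


initial = ["Node1", "Node2"]
print(concurrent_install(initial))
print(sequential_install(initial))
```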
19
Lesson 5: After adding a node, possible owners = every node in the Windows Cluster
Example: After the DR nodes were added to the Windows Cluster, the DR nodes show up as possible owners of the primary site FCI.
Solution: Manually remove invalid owners from every resource (every time you add a node). Not necessary on disks.
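Because the cleanup has to be repeated every time a node is added, it helps to compute the removal list instead of eyeballing it. This is a minimal sketch, assuming the possible owners of each resource have already been exported into plain dictionaries. The resource and node names are hypothetical, and the code does not talk to the cluster.

```python
def owners_to_remove(resources, intended_owners):
    """Return {resource name: [nodes that should not be possible owners]}."""
    plan = {}
    for res in resources:
        if res["type"] == "Physical Disk":
            continue  # per the slide, this cleanup is not necessary on disks
        extra = [n for n in res["possible_owners"] if n not in intended_owners]
        if extra:
            plan[res["name"]] = extra
    return plan


# Hypothetical state of the primary-site FCI after DR nodes Node3 and Node4 joined.
all_nodes = ["Node1", "Node2", "Node3", "Node4"]
primary_fci_resources = [
    {"name": "SQL Server (INST1)", "type": "SQL Server", "possible_owners": all_nodes},
    {"name": "SQL Network Name (SQLFCI1)", "type": "Network Name", "possible_owners": all_nodes},
    {"name": "Cluster Disk 1", "type": "Physical Disk", "possible_owners": all_nodes},
]

plan = owners_to_remove(primary_fci_resources, intended_owners={"Node1", "Node2"})
for name, nodes in plan.items():
    print(f"{name}: remove {', '.join(nodes)}")
```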
20
Lesson 6: Failover Cluster Manager shows as if an FCI can be moved to a node that is not a possible owner
Issue #3: Move operations allow selection of nodes that are not possible owners
21
Customer Deployment Example: MSIT SAP ERP Deployment
22
DBM + LS (Prior to SQL Server 2012) Database Mirroring for local HA, Log Shipping for DR
[Diagram: Production, Test, and Disaster Recovery systems; Synchronous DBM with a Witness; Log Shipping to Disaster Recovery; Test is the SAP Volume Test and Integration System, an image of production]
23
Deployment Architecture: SQL Server 2012 Availability Group for HA and DR
[Diagram labels]
Production: Production Availability Group on production DBMS cluster, with Sync and Async replicas across the Main Site and the DR Site
File share for Cluster Quorum: SAP production CI cluster containing File Share quorum for DBMS cluster
Log Shipping
Test: Test Availability Group on test DBMS cluster (Sync); SAP Volume Test and Integration System, image of production
SAP test CI cluster containing File Share quorum for test DBMS cluster
24
Lesson 1: Working across organizational boundaries
Working with the Windows Team in IT Organization:
  Separate team from the DBA team
  DBA team underestimated the role of Windows Cluster
Challenge: Concept of a Windows Cluster with “no” (?) shared storage
25
Lesson 2: Choose Appropriate Quorum Model
Choice of Quorum Model
  What do we choose: Node Majority vs Node+Fileshare Majority
  The Windows IT team didn’t know anything other than a shared storage quorum
  Picked Node+Fileshare Majority
Node votes
  Don’t want issues at the DR data center, or with the connectivity between the data centers, to affect HA in the primary data center
  Assigned zero vote to DR server
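The effect of the zero vote shows up once a DR problem coincides with a local failure. The sketch below uses a simplified weighted-majority model and assumes a hypothetical topology of two replicas and a fileshare witness at the main site plus one DR replica. The member names are illustrative.

```python
def has_quorum(votes, alive):
    """Strict majority of configured vote weight must be reachable."""
    total = sum(votes.values())
    present = sum(weight for member, weight in votes.items() if member in alive)
    return 2 * present > total


default_votes = {"primary": 1, "secondary": 1, "dr": 1, "fileshare": 1}
zero_dr_votes = {**default_votes, "dr": 0}  # DR server assigned zero vote

# The DR site is unreachable AND one main-site replica is down.
survivors = {"primary", "fileshare"}

print("DR has a vote:  ", has_quorum(default_votes, survivors))  # 2 of 4 votes
print("DR has no vote: ", has_quorum(zero_dr_votes, survivors))  # 2 of 3 votes
```

With the DR vote removed, the main site only has to keep two of its own three votes, regardless of what happens at the DR data center or on the link between the sites.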
26
Lesson 3: Monitoring and Alerting AG
Alerting compared to DBM
  Challenge: No DBM Monitor
  Developed custom scripts for monitoring and alerting on key measures
  Alternative: System Center Operations Manager
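The slides do not say which measures the custom scripts tracked. As one possible shape for such a script, the sketch below assumes rows already fetched from the `sys.dm_hadr_database_replica_states` DMV (with replica and database names joined in by the caller) and flags unhealthy replicas or large queues. The thresholds and sample values are hypothetical.

```python
def find_alerts(rows, max_send_queue_kb=10240, max_redo_queue_kb=10240):
    """rows: dicts using DMV column names, plus 'replica' and 'database' labels."""
    alerts = []
    for row in rows:
        who = f"{row['replica']}/{row['database']}"
        health = row["synchronization_health_desc"]
        if health != "HEALTHY":
            alerts.append(f"{who}: synchronization health is {health}")
        # Queue sizes are reported in KB and may be NULL (None) for some rows.
        if (row["log_send_queue_size"] or 0) > max_send_queue_kb:
            alerts.append(f"{who}: log send queue is {row['log_send_queue_size']} KB")
        if (row["redo_queue_size"] or 0) > max_redo_queue_kb:
            alerts.append(f"{who}: redo queue is {row['redo_queue_size']} KB")
    return alerts


# Hypothetical sample: a healthy local secondary and a lagging DR secondary.
sample_rows = [
    {"replica": "MAIN2", "database": "ERP", "synchronization_health_desc": "HEALTHY",
     "log_send_queue_size": 12, "redo_queue_size": 40},
    {"replica": "DR1", "database": "ERP", "synchronization_health_desc": "HEALTHY",
     "log_send_queue_size": 51200, "redo_queue_size": None},
]

for alert in find_alerts(sample_rows):
    print(alert)
```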
27
Lesson 4: DO NOT use Windows Failover Cluster Manager to perform AG failover
DO NOT change the preferred owners and possible owners settings for the AG.
DO NOT change the preferred owners and possible owners settings for the AG listener.
DO NOT move the AG from one node to another using the FCM.
Solution: Use SSMS or T-SQL
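For the T-SQL route, a planned manual failover is issued with `ALTER AVAILABILITY GROUP ... FAILOVER`, executed on the secondary replica that is to become the new primary. The helper below is a small sketch that only builds the statement text. The group name `SAP_AG` is hypothetical, and running the statement is left to whatever SQL client is in use.

```python
def quote_name(identifier):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def planned_failover_sql(ag_name):
    """T-SQL for a planned manual failover; run it on the target secondary replica."""
    return f"ALTER AVAILABILITY GROUP {quote_name(ag_name)} FAILOVER;"


print(planned_failover_sql("SAP_AG"))
```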
28
Questions? sanjaymi@microsoft.com
29
Thank You!