Failover Clustering & Hyper-V: Multisite Disaster Recovery

Presentation transcript:

1 Failover Clustering & Hyper-V: Multisite Disaster Recovery
Prakash Gopinadham, Support Escalation Engineer, Microsoft Corporation

2 Multi-Site Clustering Content
Available content: a design guide, a deployment guide/checklist, and customer case studies using multi-site clustering.
Setting up a multi-site cluster is easy; the design guide and deployment checklist are available to help.

3 Multi-Site Clustering
Agenda: Introduction, Networking, Storage, Quorum

4 Defining High-Availability
High availability allows applications to maintain service availability by moving them between nodes in a cluster.
But what if there is a catastrophic event and you lose the entire datacenter (Site A)?

5 Defining Disaster Recovery
Disaster Recovery (DR) allows applications to maintain service availability by moving them to a cluster node in a different physical location.
The node is located at a physically separate site (Site B), connected to Site A over the SAN.

6 Benefits of a Multi-Site Cluster
Protects against loss of an entire location: power outages, fires, hurricanes, floods, earthquakes, terrorism.
Automates failover: reduced downtime and a lower-complexity disaster recovery plan.
What is the primary reason why DR solutions fail? Dependence on people.

7 Multi-Site Clustering
Agenda: Introduction, Networking, Storage, Quorum

8 Stretching the Network
Longer distance traditionally means greater network latency.
Missed inter-node health checks can cause false failover.
Cluster heartbeating is fully configurable:
SameSubnetDelay (default = 1 second): how often heartbeats are sent.
SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down.
CrossSubnetDelay (default = 1 second): how often heartbeats are sent to nodes on dissimilar subnets.
CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface to nodes on dissimilar subnets is considered down.
Command line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
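
As a rough illustration of how the delay and threshold settings interact, the sketch below multiplies the two to approximate how long a silent interface goes undetected. The function name and the "tuned" values are hypothetical and are not recommendations from the presentation; only the defaults (1 second, 5 heartbeats) come from the slide.

```python
def detection_time_seconds(delay_seconds: float, threshold: int) -> float:
    """Approximate time before an interface is considered down.

    Simplified model: one heartbeat every `delay_seconds`, and the
    interface is declared down after `threshold` consecutive misses.
    """
    return delay_seconds * threshold


# Defaults from the slide: 1 second delay, 5 missed heartbeats.
default_cross_subnet = detection_time_seconds(1.0, 5)

# Hypothetical, more tolerant WAN setting (illustrative values only).
tuned_cross_subnet = detection_time_seconds(2.0, 10)

print(default_cross_subnet)  # 5.0
print(tuned_cross_subnet)    # 20.0
```

Raising either value makes the cluster more tolerant of WAN latency spikes, at the cost of slower detection of a real failure.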

9 Security over the WAN
Encrypt inter-node communication.
Trade-off: security versus performance.
Cluster SecurityLevel property values: 0 = clear text, 1 = signed (default), 2 = encrypted.

10 Network Considerations
Network deployment options:
Stretch VLANs across sites.
Cluster nodes can reside in different subnets.
(Diagram: Site A and Site B connected by a public network and a redundant network.)

11 DNS Considerations
Nodes in dissimilar subnets:
The VM obtains a new IP address.
Clients need that new IP address from DNS to reconnect.
(Diagram: DNS Server 1 at Site A and DNS Server 2 at Site B, linked by DNS replication; the record is created, updated on both servers, and obtained by clients.)

12 Faster Failover for Multi-Subnet Clusters
RegisterAllProvidersIP (default = 0 for FALSE): determines whether all IP addresses for a Network Name are registered in DNS.
TRUE (1): IP addresses are registered whether they are online or offline.
Ensure the application is set to try all IP addresses, so clients can reconnect more quickly.
HostRecordTTL (default = 1200 seconds): controls how long the DNS record for a cluster network name lives in the client cache.
Shorter TTL: DNS records on clients are updated sooner.
Exchange Server 2007 recommends a value of five minutes (300 seconds).
PowerShell (R2), view parameters: Get-ClusterResource "Resource Name" | Get-ClusterParameter
PowerShell (R2), set parameter: Get-ClusterResource "Resource Name" | Set-ClusterParameter RegisterAllProvidersIP 1
Command line, view parameters: Cluster.exe res "Resource Name" /priv
Command line, set parameter: Cluster.exe res "Resource Name" /priv RegisterAllProvidersIP=1
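
To see why a shorter HostRecordTTL helps, here is a minimal sketch of the worst-case window during which a client may still hold the old IP address. It assumes a simplified model in which a client can query its DNS server just before the updated record arrives and then caches the stale answer for a full TTL. The function name and the 900-second replication delay are hypothetical; the TTL values (1200 and 300 seconds) are the ones quoted on the slide.

```python
def worst_case_stale_seconds(host_record_ttl: int, dns_replication_seconds: int) -> int:
    """Upper bound on how long a client may keep using the old IP address.

    Simplified model: the client re-queries just before the updated record
    reaches its DNS server, then caches the stale answer for a full TTL.
    """
    return dns_replication_seconds + host_record_ttl


REPLICATION_DELAY = 900  # hypothetical cross-site DNS replication delay, in seconds

print(worst_case_stale_seconds(1200, REPLICATION_DELAY))  # 2100 (default TTL)
print(worst_case_stale_seconds(300, REPLICATION_DELAY))   # 1200 (five-minute TTL)
```

In this model the TTL only shortens the client-side part of the window; the DNS replication delay between sites remains and has to be addressed separately.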

13 Solution #1: Local Failover First
Configure local failover first for high availability:
No change in IP addresses.
No DNS replication issues.
No data going over the WAN.
Use cross-site failover for disaster recovery.

14 Solution #2: Stretch VLANs
Deploying a VLAN minimizes client reconnection times.
The IP address of the VM never changes.
(Diagram: a VLAN stretched across Site A and Site B, with DNS Server 1 and DNS Server 2.)

15 Solution #3: Abstraction in Networking Device
The networking device uses an independent third IP address.
The third IP address is registered in DNS and used by clients.
(Diagram: Site A and Site B, with DNS Server 1 and DNS Server 2.)

16 Multi-Site Clustering
Agenda: Introduction, Networking, Storage, Quorum

17 Storage in Multi-Site Clusters
Different from local clusters:
Multiple storage arrays, independent per site.
Nodes commonly access their own site's storage.
No "true" shared disk is visible to all nodes.

18 Storage Considerations
Changes are made on Site A and replicated to Site B.
DR requires a data replication mechanism between sites.
(Diagram: Site A storage and its replica at Site B, connected over the SAN.)

19 Replication Partners
Hardware storage-based replication: block-level replication.
Software host-based replication: file-level replication.
Appliance replication.

20 Synchronous Replication
The host receives the "write complete" response from the storage only after the data has been successfully written on both storage devices.
(Diagram: write request to primary storage at Site A, replication to secondary storage at Site B, acknowledgement back to the primary, then write complete to the host.)

21 Asynchronous Replication
The host receives the "write complete" response from the storage as soon as the data has been successfully written to the primary storage device; replication to the secondary storage happens afterwards.
(Diagram: write request to primary storage at Site A, write complete to the host, then replication to secondary storage at Site B.)

22 Synchronous versus Asynchronous
Synchronous: no data loss; requires a high-bandwidth, low-latency connection; stretches over shorter distances; write latencies impact application performance.
Asynchronous: potential data loss on hard failures; needs enough bandwidth to keep up with data replication; stretches over longer distances; no significant impact on application performance.
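
The latency trade-off can be sketched with a simple model of when the host sees "write complete". This is an illustration under stated assumptions, not a description of any vendor's replication engine: the synchronous path is modelled as strictly sequential (local write, round trip to the remote site, remote write), and all function names and millisecond figures are hypothetical.

```python
def write_ack_latency_ms(local_write_ms: int, remote_write_ms: int,
                         round_trip_ms: int, synchronous: bool) -> int:
    """Time until the host receives "write complete" (simplified model).

    Synchronous: the host waits for the local write, the cross-site round
    trip, and the remote write. Asynchronous: the host waits only for the
    local write; replication happens afterwards.
    """
    if synchronous:
        return local_write_ms + round_trip_ms + remote_write_ms
    return local_write_ms


# Hypothetical numbers: 1 ms per array write, 10 ms WAN round trip.
print(write_ack_latency_ms(1, 1, 10, synchronous=True))   # 12
print(write_ack_latency_ms(1, 1, 10, synchronous=False))  # 1
```

Because the round trip is paid on every synchronous write, distance directly limits how far a synchronous configuration can stretch before application performance suffers.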

23 Cluster Validation with Replicated Storage
Multi-site clusters are not required to pass the Storage tests to be supported.
Validation Guide and Policy: ?LinkID=119949

24 Challenges of Block Storage Replication
Storage block-level replication is typically uni-directional (per LUN).
Changed blocks flow from source to remote.
It is possible to have different LUNs replicating in different directions.
Storage cannot enforce block-level collision resolution.
The application must determine the resolution, or be coordinated in some way.
Applications today implement a shared-nothing model.
Surfacing storage as R/W at multiple sites is only useful if applications can handle a distributed access device.
Few applications implement the necessary support.
The obvious exception is Cluster Shared Volumes for Hyper-V.

25 Multi-Site Clustering
Agenda: Introduction, Networking, Storage, Quorum

26 Quorum Overview
Majority is greater than 50%.
Possible voters: nodes (1 vote each) + 1 witness (disk or file share).
4 quorum types:
Disk only (not recommended)
Node and Disk Majority
Node Majority
Node and File Share Majority
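
The "greater than 50%" rule can be written as a small calculation. This is a minimal sketch with hypothetical function names; it applies to the majority-based quorum types, not to the Disk only type.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is strictly greater than 50%."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if the reachable voters form a majority."""
    return reachable_votes >= votes_needed(total_votes)


print(votes_needed(5))       # 3  (5 nodes, Node Majority)
print(votes_needed(4))       # 3  (4 nodes, no witness)
print(votes_needed(4 + 1))   # 3  (4 nodes plus 1 witness vote)
print(has_quorum(2, 4))      # False: an even split is not a majority
```

The last line shows why an even number of voters is awkward: a clean half-and-half split leaves neither side with a majority, which is where a witness vote comes in.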

27 Replicated Disk Witness
A witness is a tie breaker when nodes lose network connectivity.
The witness disk must be a single decision maker, or problems can occur.
Do not use a Disk Witness in multi-site clusters unless directed by the vendor.
(Diagram: three voting nodes and a witness on replicated storage whose vote is in doubt.)

28 Node Majority
5-node cluster: majority = 3.
Cross-site network connectivity is broken!
Each node asks: can I communicate with a majority of the nodes in the cluster?
Yes: stay up.
No: drop out of cluster membership.
The majority is in the primary site.

29 Node Majority
5-node cluster: majority = 3.
Disaster at Site 1, the primary site that holds the majority of nodes: we are down!
Each surviving node asks: can I communicate with a majority of the nodes in the cluster?
No: drop out of cluster membership.
Quorum must be forced manually on the surviving site.
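
The two Node Majority scenarios above can be sketched as a check of which network partitions keep a majority. The 3/2 split between sites is an assumption (the slides only say the majority is in the primary site), and the function and variable names are hypothetical.

```python
from typing import Dict, List


def partitions_with_quorum(partitions: Dict[str, int], total_nodes: int) -> List[str]:
    """Return the partitions whose node count is a majority of the cluster."""
    majority = total_nodes // 2 + 1
    return [name for name, nodes in partitions.items() if nodes >= majority]


TOTAL_NODES = 5  # assumed layout: 3 nodes in Site A (primary), 2 in Site B

# Scenario 1: the cross-site link breaks; both sites are still running.
wan_break = partitions_with_quorum({"Site A": 3, "Site B": 2}, TOTAL_NODES)

# Scenario 2: the primary site is lost; only Site B's nodes remain.
primary_lost = partitions_with_quorum({"Site B": 2}, TOTAL_NODES)

print(wan_break)     # ['Site A']
print(primary_lost)  # []
```

An empty result in the second scenario corresponds to the "we are down" case: no surviving partition has a majority, so an administrator has to force quorum.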

30 Forcing Quorum
Forcing quorum is a way to manually override and start a node even if the cluster does not have quorum.
Important: understand why quorum was lost.
The cluster starts in a special "forced" state.
Once a majority is achieved, it drops out of the "forced" state.
Command line: net start clussvc /fixquorum (or /fq)
PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq)

31 Multi-Site with File Share Witness
The file share witness (\\Foo\Share) is hosted at Site C (branch office), reachable from Site A and Site B over the WAN.
Complete resiliency and automatic recovery from the loss of any 1 site.

32 Multi-Site with File Share Witness
The file share witness (\\Foo\Share) is hosted at Site C (branch office), reachable from Site A and Site B over the WAN.
Complete resiliency and automatic recovery from the loss of connection between sites.
Each node asks: can I communicate with a majority of the nodes (+FSW) in the cluster?
Yes: stay up.
No (lock failed): drop out of cluster membership.
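
A small vote count shows why a witness at a third site gives automatic recovery. The layout below (2 nodes per site plus 1 witness vote at Site C) is an assumed example, and all names are hypothetical. The link-break case is modelled by letting exactly one partition hold the witness lock.

```python
from typing import Dict

# Assumed layout: 2 nodes at each main site, 1 file share witness at Site C.
SITE_VOTES: Dict[str, int] = {"Site A": 2, "Site B": 2, "Site C (witness)": 1}


def majority(total_votes: int) -> int:
    """Smallest vote count strictly greater than 50%."""
    return total_votes // 2 + 1


def survives_site_loss(lost_site: str, site_votes: Dict[str, int] = SITE_VOTES) -> bool:
    """True if the remaining votes still form a majority after one site is lost."""
    total = sum(site_votes.values())
    return total - site_votes[lost_site] >= majority(total)


def partition_has_quorum(node_votes: int, holds_witness_lock: bool,
                         total_votes: int = 5) -> bool:
    """After a link break, only the partition that locks the witness gets its vote."""
    votes = node_votes + (1 if holds_witness_lock else 0)
    return votes >= majority(total_votes)


for site in SITE_VOTES:
    print(site, survives_site_loss(site))  # True for every site

print(partition_has_quorum(2, holds_witness_lock=True))   # True: stays up
print(partition_has_quorum(2, holds_witness_lock=False))  # False: drops out
```

With 5 votes in total, losing any single site leaves at least 3 votes, and after a link break exactly one side can reach 3, so the cluster never ends up with two active partitions.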

33 File Share Witness (FSW) Considerations
A simple Windows file server.
A single file server can serve as a witness for multiple clusters.
Each cluster requires its own share.
Can be made highly available on a separate cluster.
Recommended to be at a third, separate site for DR.
The FSW cannot be on a node in the same cluster.
The FSW should not be in a VM running on the same cluster.

34 Quorum Model Recap
Node and File Share Majority: even number of nodes; the highest-availability solution has the FSW in a third site.
Node Majority: odd number of nodes; more nodes in the primary site.
Node and Disk Majority: use as directed by the vendor.
No Majority (Disk Only): not recommended.
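
The recap can be condensed into a tiny helper that maps node count to the quorum model suggested on this slide. It is a sketch with a hypothetical function name, and it deliberately ignores the vendor-directed Node and Disk Majority case and the not-recommended Disk Only model.

```python
def suggested_quorum_model(node_count: int) -> str:
    """Map a node count to the quorum model suggested in the recap."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 0:
        # Even node count: add a file share witness, ideally in a third site.
        return "Node and File Share Majority"
    # Odd node count: nodes alone can form a majority.
    return "Node Majority"


print(suggested_quorum_model(4))  # Node and File Share Majority
print(suggested_quorum_model(5))  # Node Majority
```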

35 Session Summary
Multi-site failover clusters have many benefits.
You can achieve high availability and disaster recovery in a single solution using Windows Server Failover Clustering.
Multi-site clusters have additional considerations:
Determine the network topology across sites.
Choose a storage replication solution.
Plan the quorum model and nodes.

36 Failover Clustering Resources
Design for a Clustered Service or Application in a Multi-Site Failover Cluster
Checklist: Setting Up a Clustered Service or Application in a Multi-Site Failover Cluster
Cluster Information Portal
Clustering Technical Resources
Clustering Forum (2008)
R2 Cluster Features

37 Resources
Software application developers: msdnindia, @msdnindia
Infrastructure professionals: technetindia, @technetindia

38 © 2011 Microsoft Corporation. All rights reserved
Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

