Failover Clustering & Hyper-V: Multi-Site Disaster Recovery Symon Perriman Technical Evangelist Microsoft


1 Failover Clustering & Hyper-V: Multi-Site Disaster Recovery
Symon Perriman, Technical Evangelist, Microsoft
Twitter: @SymonPerriman

2 Multi-Site Clustering
Introduction, Networking, Storage, Quorum

3 Defining High-Availability
High-Availability (HA) with Failover Clustering allows applications or VMs to maintain service availability by moving them between nodes in a cluster
But what if there is a catastrophic event? Fire, flood, earthquake…
[Diagram: Site A]

4 Multi-Site Clusters for Disaster Recovery
Extends a cluster from being a High-Availability solution to also being a Disaster Recovery solution
A node is located at a physically separate site
VMs are failed over to a separate physical location
[Diagram: Site A, Site B]

5 Benefits of a Multi-Site Cluster
Protects against loss of an entire location
Automates failover
  Reduced downtime
  Lower-complexity disaster recovery plan
Reduces administrative overhead
  Automatically synchronizes application and cluster changes
  Easier to keep consistent than standalone servers
Top 3 reasons disaster recovery plans fail:
  3. Failure detection failed: no failover
  2. Poor testing: something did not work as expected
  1. No automation: a dependence on people during a disaster

6 Multi-Site Clustering
Introduction, Networking, Storage, Quorum

7 Network Considerations
Network deployment options:
  1. Stretch VLANs across sites
  2. Cluster nodes can reside in different subnets
[Diagram: Site A and Site B, public network and redundant network; example addresses 10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1]

8 Stretching the Network
Longer distance traditionally means greater network latency
Missed inter-node health checks can cause false failover
Cluster inter-node heartbeating is fully configurable:
  SameSubnetDelay (default = 1 second): how frequently heartbeats are sent
  SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down
  CrossSubnetDelay (default = 1 second): how frequently heartbeats are sent to nodes on dissimilar subnets
  CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface to nodes on dissimilar subnets is considered down
PowerShell (R2): Get-Cluster | fl *
Command line: Cluster.exe /prop
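As a rough illustration of how a delay and a threshold combine, the Python sketch below estimates how long a node can stay silent before an interface is considered down: roughly delay multiplied by threshold. This is a simplified model, not the cluster service's actual algorithm. The function name and the relaxed 2-second / 10-heartbeat values are hypothetical; only the defaults (1 second, 5 heartbeats) come from the slide.

```python
def seconds_until_considered_down(delay_seconds: float, threshold: int) -> float:
    """Approximate silence tolerated before an interface is considered down.

    Simplified model (an assumption): one heartbeat is sent every
    `delay_seconds`, and the interface is declared down after `threshold`
    consecutive heartbeats are missed.
    """
    if delay_seconds <= 0 or threshold <= 0:
        raise ValueError("delay and threshold must be positive")
    return delay_seconds * threshold


# Defaults from the slide: 1 second between heartbeats, 5 missed heartbeats.
default_window = seconds_until_considered_down(1.0, 5)

# Hypothetical relaxed cross-subnet values for a high-latency WAN link.
relaxed_window = seconds_until_considered_down(2.0, 10)

print(default_window, relaxed_window)
```

A longer window makes false failover over a slow WAN less likely, at the cost of slower detection of a real failure.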

9 Security over the WAN
Encrypt inter-node communication
  0 = clear text
  1 = signed (default)
  2 = encrypted
[Diagram: Site A and Site B; example addresses 10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1]

10 Updating a VM's IP on Subnet Failover
On cross-subnet failover, if the guest is using…
  DHCP: IP updated automatically
  Static IP: admin needs to configure the new IP; can be scripted
Best to use DHCP in the guest OS for cross-subnet failover

11 Client Reconnect Considerations
Nodes in dissimilar subnets
VM obtains a new IP address
Clients need that new IP address from DNS to reconnect
[Diagram: Site A (10.10.10.111), Site B (20.20.20.222), DNS Server 1, DNS Server 2, DNS replication. Steps shown: record created (VM = 10.10.10.111), record updated (VM = 20.20.20.222), record updated, record obtained]

12 Solution #1: Local Failover First
Scale up for local failover for higher availability
  No change in IP addresses for HA
  Means not going over the WAN and is still usually preferred
Cross-site failover for disaster recovery
[Diagram: Site A, VM = 10.10.10.111; Site B, 20.20.20.222]

13 Solution #2: Stretch VLANs
Deploying a VLAN minimizes client reconnection times
IP of the VM never changes
[Diagram: Site A and Site B on one VLAN, 10.10.10.111; DNS Server 1, DNS Server 2; FS = 10.10.10.111]

14 Solution #3: Network Device Abstraction
Network device uses a 3rd IP
The 3rd IP is the one registered in DNS and used by the client
[Diagram: Site A, Site B; 10.10.10.111, 20.20.20.222, 30.30.30.30; DNS Server 1, DNS Server 2; VM = 30.30.30.30]

15 Faster Failover for Multi-Subnet Clusters
RegisterAllProvidersIP (default = 0 for FALSE)
  Determines if all IP addresses for a Network Name will be registered in DNS
  TRUE (1): IP addresses can be online or offline and will still be registered
  Ensure the application is set to try all IP addresses, so clients can come online quicker
HostRecordTTL (default = 1200 seconds)
  Controls how long the DNS record for a cluster network name lives on the client
  Shorter TTL: DNS records for clients are updated sooner
  Exchange Server 2007+ recommends a value of five minutes (300 seconds)
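To make "try all IP addresses" concrete, here is a minimal Python sketch of a client that walks every address registered for a name and uses the first one that answers. It is an illustration only: the function name and the `try_connect` callback are hypothetical stand-ins for whatever connection logic a real application has, and the addresses are the example values from the earlier slides.

```python
from typing import Callable, Iterable, Optional


def connect_to_first_reachable(
    addresses: Iterable[str],
    try_connect: Callable[[str], bool],
) -> Optional[str]:
    """Return the first address that accepts a connection, or None."""
    for address in addresses:
        if try_connect(address):
            return address
    return None


# Hypothetical scenario: both addresses are registered in DNS, but the VM
# has failed over to Site B, so only the Site B address is online.
registered = ["10.10.10.111", "20.20.20.222"]
online = {"20.20.20.222"}

chosen = connect_to_first_reachable(registered, lambda ip: ip in online)
print(chosen)
```

A client that stops at the first registered address would instead have to wait for the DNS record to change and its cached copy to expire, which is where HostRecordTTL comes in.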

16 Live Migrating Across Sites
Live migration moves a VM to another host
  TCP reconnects make the move unnoticeable to clients
Use VLANs to achieve live migrations between sites
  The IP the client is connected to will not change
Plan appropriate bandwidth between sites
  Live migration may require significant network bandwidth, based on the amount of memory allocated to the VM
  Migration times will naturally be longer over higher-latency or lower-bandwidth WAN connections
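For bandwidth planning, a back-of-the-envelope estimate can help. The Python sketch below computes a lower bound on the time to copy a VM's memory once across a link. It is a simplification under stated assumptions: it ignores latency, protocol overhead, and the re-copying of memory pages that change during migration, so real migrations take longer. The function name and the example sizes are hypothetical.

```python
def estimate_memory_copy_seconds(memory_gb: float, bandwidth_gbps: float) -> float:
    """Lower-bound time to send `memory_gb` gigabytes over a link.

    memory_gb is in decimal gigabytes; bandwidth_gbps is in gigabits per
    second. One byte is 8 bits, hence the factor of 8.
    """
    if memory_gb < 0 or bandwidth_gbps <= 0:
        raise ValueError("memory must be >= 0 and bandwidth must be > 0")
    return memory_gb * 8 / bandwidth_gbps


# Hypothetical VM with 8 GB of memory over two different inter-site links.
over_1_gbps = estimate_memory_copy_seconds(8, 1)
over_10_gbps = estimate_memory_copy_seconds(8, 10)

print(over_1_gbps, over_10_gbps)
```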

17 CSV Networking Considerations
Cluster Shared Volumes (CSV) does not support having nodes in dissimilar subnets
Use VLANs if you want to use CSV with multi-site clusters
[Diagram: Site A, Site B, VLAN, CSV network]

18 Multi-Subnet vs. VLAN Recap
Choosing the right network model for you depends on your business requirements

19 Multi-Site Clustering
Introduction, Networking, Storage, Quorum

20 Storage in Multi-Site Clusters
Different from local clusters:
  Multiple storage arrays, independent per site
  Nodes commonly access their own site's storage
  No ‘true’ shared disk visible to all nodes
[Diagram: Site A, Site B]

21 Storage Considerations
Changes are made on Site A and replicated to Site B
Requires a data replication mechanism between sites
[Diagram: Site A, Site B, replica]

22 Hardware Replication Partners
Hardware storage-based replication

23 Software Replication Partners
Software host-based replication
  Double-Take Availability (Vision Solutions)
  SteelEye DataKeeper Cluster Edition (SIOS Technology Corp.)
  Symantec Storage Foundation for Windows
  Sanbolic Melio 2010

24 Appliance Replication Partners
Appliance replication

25 Synchronous Replication
Host receives “write complete” response from the storage after the data is successfully written on both storage devices
[Diagram: primary storage, secondary storage; write request, replication, acknowledgement, write complete]

26 Asynchronous Replication
Host receives “write complete” response from the storage after the data is successfully written to only the primary storage device, then replicates later
[Diagram: primary storage, secondary storage; write request, write complete, replication]
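The difference between the two modes is only the point at which "write complete" is reported. The toy Python model below sketches that ordering and what it implies if the primary site is lost. The class and method names are hypothetical and nothing here corresponds to a vendor's replication product.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary array replicating writes to a secondary array."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self.pending = deque()  # written locally, not yet replicated

    def write(self, block):
        """Apply one write and return what the host is told."""
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: replicate first, acknowledge afterwards.
            self.secondary.append(block)
        else:
            # Asynchronous: acknowledge now, replicate later.
            self.pending.append(block)
        return "write complete"

    def replicate_pending(self):
        """Drain queued writes to the secondary (asynchronous mode)."""
        while self.pending:
            self.secondary.append(self.pending.popleft())

    def writes_lost_if_primary_fails(self):
        """Acknowledged writes that exist only on the primary array."""
        return len(self.primary) - len(self.secondary)


sync_volume = ReplicatedVolume(synchronous=True)
async_volume = ReplicatedVolume(synchronous=False)
for block in ["a", "b", "c"]:
    sync_volume.write(block)
    async_volume.write(block)

print(sync_volume.writes_lost_if_primary_fails())
print(async_volume.writes_lost_if_primary_fails())
```

The model leaves out the cost side of the trade-off: in synchronous mode every write waits for the round trip to the secondary site.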

27 Synchronous vs. Asynchronous

28 Validation with Replicated Storage
Multi-Site clusters are not required to pass the Storage tests to be supported
Validation Guide and Policy: http://go.microsoft.com/fwlink/?LinkID=119949

29 What about DFS-Replication?
Using the file server DFS-R feature to replicate VM data on a multi-site Failover Cluster is not supported
DFS-R performs replication on file close
  Works well for Office documents like .docx, .pptx, and .xlsx
  Not designed for application workloads where the file is held open, like VHD

30 CSV with Replicated Storage
Regular cluster disks: one node accesses the disk
CSV disks: all nodes can access a disk
Which CSV disk is accessed when it appears in multiple sites?
Talk to your storage vendor for their support story
[Diagram: Site A, Site B; VHD; read-only and read/write copies; VM attempts to access replica]

31 Storage Virtualization Abstraction
Some replication solutions provide complete abstraction in the storage array
Servers are unaware of accessible disk location
Fully compatible with Cluster Shared Volumes (CSV)
[Diagram: Site A, Site B; virtualized storage presents a logical LUN; servers abstracted from storage]

32 Choosing a Stretched Storage Model
Choosing the right model for you depends on your business requirements
Consult vendor

33 Multi-Site Clustering
Introduction, Networking, Storage, Quorum

34 Quorum Overview
Majority is greater than 50% of the votes
Possible voters: nodes (1 vote each) + 1 witness (disk or file share)
4 quorum types:
  Node Majority
  Node and Disk Majority
  Node and File Share Majority
  Disk Only (not recommended)
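The "greater than 50%" rule is simple arithmetic, sketched below in Python. The function names are hypothetical; the rule itself (one vote per node, plus at most one witness vote, majority strictly above half) is taken from the slide.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is strictly more than half."""
    if total_votes <= 0:
        raise ValueError("a cluster needs at least one vote")
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True when the reachable voters form a majority."""
    return reachable_votes >= votes_needed(total_votes)


# 5 nodes, no witness: 3 votes are needed.
print(votes_needed(5))

# 4 nodes, no witness: an even 2/2 split leaves neither side with a majority.
print(has_quorum(2, 4))

# 4 nodes plus 1 witness: 5 votes in total, so 2 nodes + witness is enough.
print(has_quorum(3, 5))
```

This is why an even number of nodes is usually paired with a witness: the extra vote breaks the tie.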

35 Replicated Disk Witness
The witness will decide which partition of nodes stays running when the nodes lose network connectivity
Witness disk should be a single decision maker
Do not use in multi-site clusters unless directed by vendor
[Diagram: replicated storage, vote, "?"]

36 Node Majority
5-node cluster: majority = 3
Majority in primary site
Cross-site network connectivity broken!
  Nodes that can reach a majority: Can I communicate with a majority of the nodes in the cluster? Yes, then stay up
  Nodes that cannot: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership
[Diagram: Site A, Site B]

37 Node Majority
5-node cluster: majority = 3
Majority in primary site
Disaster at the primary site (Site A: "We are down!")
  Surviving nodes at Site B: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership
Need to force quorum manually

38 Forcing Quorum
Forcing quorum is a way to manually override and start a node even though it has not achieved quorum
Always understand why quorum was lost
Used to bring the cluster online without quorum
Cluster starts in a special “forced” state
Once majority is achieved, the cluster drops out of the “forced” state
PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq)
Command line: net start clussvc /fixquorum (or /fq)

39 Multi-Site with File Share Witness
Complete resiliency and automatic recovery from the loss of any 1 site
[Diagram: Site A, Site B, and Site C (branch office) connected over the WAN; file share witness \\Foo\Share]

40 Multi-Site with File Share Witness
Complete resiliency and automatic recovery from the loss of connection between sites!
  Nodes at one site: Can I communicate with a majority of the nodes in the cluster? No lock on the FSW, drop out of cluster membership
  Nodes at the other site: Can we communicate with a majority of the voters in the cluster? Yes, including the lock with the FSW, so we stay up
[Diagram: Site A, Site B, and Site C (branch office) connected over the WAN; file share witness \\Foo\Share]
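The node majority and file share witness scenarios on the preceding slides can be compared with a small Python sketch that tallies votes per site after the inter-site link breaks or a site is lost. It is a simplified model: the function name, the site labels, and the `witness_locked_by` parameter are hypothetical, and the witness is assumed to grant its single vote to whichever site obtains its lock.

```python
from typing import Dict, Optional


def sites_that_stay_up(
    nodes_per_site: Dict[str, int],
    surviving_sites: Dict[str, int],
    witness_locked_by: Optional[str] = None,
) -> Dict[str, bool]:
    """For each isolated site, report whether it holds a vote majority.

    nodes_per_site: configured nodes per site (defines the total votes).
    surviving_sites: nodes still running per site, each site isolated.
    witness_locked_by: site holding the witness lock, or None if the
    cluster has no witness.
    """
    total = sum(nodes_per_site.values())
    if witness_locked_by is not None:
        total += 1  # the witness contributes one vote
    result = {}
    for site, nodes in surviving_sites.items():
        votes = nodes + (1 if site == witness_locked_by else 0)
        result[site] = votes * 2 > total  # strictly more than 50%
    return result


# Node Majority, 3 + 2 nodes, link broken: only the 3-node site stays up.
print(sites_that_stay_up({"A": 3, "B": 2}, {"A": 3, "B": 2}))

# Node Majority, 3 + 2 nodes, Site A destroyed: Site B has 2 of 5 votes.
print(sites_that_stay_up({"A": 3, "B": 2}, {"B": 2}))

# 2 + 2 nodes plus a witness at a third site, Site A destroyed:
# Site B takes the witness lock and holds 3 of 5 votes.
print(sites_that_stay_up({"A": 2, "B": 2}, {"B": 2}, witness_locked_by="B"))
```

In the second case the surviving site cannot continue on its own, which is the situation where quorum has to be forced manually. In the third case recovery is automatic.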

41 File Share Witness (FSW) Considerations
Simple Windows file server
Single file server can serve as a witness for multiple clusters
  Each cluster requires its own share
  FSW can be made highly available on a separate cluster
Recommended to be at a 3rd separate site to enable automatic site failover
FSW cannot be on a node in the same cluster
FSW should not be in a VM running on the same cluster

42 Recent Changes
Asymmetrical Storage (2008 R2 Service Pack 1)
  Optimized to allow storage only visible to a subset of nodes
  Improves multi-site cluster experience
Node Vote Weight (post-SP1 hotfix)
  Granular control of which nodes have votes in determining quorum
  Flexibility for multi-site clusters
[Diagram: primary, secondary]
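One way to picture node vote weight is as a per-node multiplier of 0 or 1 applied before the majority is counted. The Python sketch below is a simplified model of that idea, not the cluster service's implementation. The function name and node names are hypothetical.

```python
from typing import Dict, Set


def has_weighted_quorum(node_weights: Dict[str, int], reachable: Set[str]) -> bool:
    """True when reachable nodes hold more than half of the weighted votes.

    node_weights maps each node name to its vote weight (0 or 1).
    """
    total = sum(node_weights.values())
    held = sum(w for name, w in node_weights.items() if name in reachable)
    return held * 2 > total


# Hypothetical 2 + 2 cluster with equal votes: the primary site alone
# holds only 2 of 4 votes and cannot keep quorum.
equal = {"A1": 1, "A2": 1, "B1": 1, "B2": 1}
print(has_weighted_quorum(equal, {"A1", "A2"}))

# Same cluster with the secondary site's votes removed: the primary site
# holds 2 of 2 votes and stays up when the secondary site is unreachable.
primary_only = {"A1": 1, "A2": 1, "B1": 0, "B2": 0}
print(has_weighted_quorum(primary_only, {"A1", "A2"}))
```

The trade-off in this model is that the secondary site can never form quorum by itself, so bringing it up after losing the primary site remains a manual step.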

43 Quorum Model Recap
Node and File Share Majority: even number of nodes; best availability solution, with the FSW in a 3rd site
Node Majority: odd number of nodes; more nodes in primary site
Node and Disk Majority: use as directed by vendor
No Majority: Disk Only: not recommended; use as directed by vendor

44 Session Summary
Multi-site Failover Clusters have many benefits
You can achieve HA and DR in a single solution
Multi-site clusters have additional considerations:
  1. Determine network topology across sites
  2. Choose a replication solution
  3. Plan quorum model & nodes

45 Multi-Site Clustering Content
Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx
Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx

46 Additional Information
Hyper-V Business Continuity portal: http://www.microsoft.com/virtualization/en/us/solution-continuity.aspx
Microsoft Cross-Site Disaster Recovery Solutions whitepaper: http://download.microsoft.com/download/3/6/1/36117F2E-499F-42D7-9ADD-A838E9E0C197/SiteRecoveryWhitepaper_final_120309.pdf

47 Passion for High-Availability? Become a Cluster MVP! Contact: ClusMVP@Microsoft.com


49 Stay up to date with TechNet Belux
Register for our newsletters and stay up to date: http://www.technet-newsletters.be
  Technical updates
  Event announcements and registration
  Top downloads
Join us on Facebook: http://www.facebook.com/technetbe, http://www.facebook.com/technetbelux
LinkedIn: http://linkd.in/technetbelux/
Twitter: @technetbelux
Download MSDN/TechNet Desktop Gadget: http://bit.ly/msdntngadget

50 TechDays 2011 On-Demand
Watch this session on-demand via TechNet Edge: http://technet.microsoft.com/fr-be/edge/, http://technet.microsoft.com/nl-be/edge/
Download to your favorite MP3 or video player
Get access to slides and recommended resources by the speakers

51 THANK YOU

