Published by Avis Thornton. Modified over 9 years ago.
1
Failover Clustering & Hyper-V: Multi-Site Disaster Recovery
Symon Perriman, Technical Evangelist, Microsoft (Twitter: @SymonPerriman)
2
Multi-Site Clustering: Introduction, Networking, Storage, Quorum
3
Defining High-Availability
- High-Availability (HA) with Failover Clustering allows applications or VMs to maintain service availability by moving them between nodes in a cluster
- But what if there is a catastrophic event? Fire, flood, earthquake…
(Diagram: Site A)
4
Multi-Site Clusters for Disaster Recovery
- Extends a cluster from being a High-Availability solution to also being a Disaster Recovery solution
- A node is located at a physically separate site
- VMs are failed over to a separate physical location
(Diagram: Site A, Site B)
5
Benefits of a Multi-Site Cluster
- Protects against loss of an entire location
  - Automates failover
  - Reduced downtime
  - Lower complexity disaster recovery plan
- Reduces administrative overhead
  - Automatically synchronize application and cluster changes
  - Easier to keep consistent than standalone servers
Top 3 reasons disaster recovery plans fail
3. Failure detection failed: no failover
2. Poor testing: something did not work as expected
1. No automation: a dependence on people during a disaster
6
Multi-Site Clustering: Introduction, Networking, Storage, Quorum
7
Network Considerations
Network Deployment Options:
1. Stretch VLANs across sites
2. Cluster nodes can reside in different subnets
(Diagram: Site A and Site B connected by a public network and a redundant network; addresses shown: 10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1)
8
Stretching the Network
- Longer distance traditionally means greater network latency
- Missed inter-node health checks can cause false failover
- Cluster inter-node heartbeating is fully configurable
  - SameSubnetDelay (default = 1 second): how often heartbeats are sent
  - SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down
  - CrossSubnetDelay (default = 1 second): how often heartbeats are sent to nodes on dissimilar subnets
  - CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface to nodes on dissimilar subnets is considered down
- PowerShell (R2): Get-Cluster | fl *
- Command Line: Cluster.exe /prop
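The delay and threshold settings above combine into a rough detection time. The following is a minimal sketch, assuming an interface is declared down after roughly delay × threshold seconds of silence; the function name is hypothetical and this is not a cluster API.

```python
def detection_time_seconds(delay_seconds: float, threshold: int) -> float:
    """Approximate time before a silent interface is considered down.

    Assumption: one heartbeat is expected every `delay_seconds`, and the
    interface is marked down once `threshold` consecutive heartbeats are missed.
    """
    if delay_seconds <= 0 or threshold <= 0:
        raise ValueError("delay and threshold must be positive")
    return delay_seconds * threshold


# Defaults from the slide: 1 second delay, 5 missed heartbeats.
default_detection = detection_time_seconds(1, 5)

# A more tolerant cross-subnet setting for a high-latency WAN (illustrative values).
relaxed_detection = detection_time_seconds(2, 10)
```

Raising either value makes false failovers over a slow link less likely, at the cost of reacting more slowly to a real failure.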
9
Security over the WAN
- Encrypt inter-node communication
  - 0 = clear text
  - 1 = signed (default)
  - 2 = encrypted
(Diagram: Site A and Site B; addresses shown: 10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1)
10
Updating VM's IP on Subnet Failover
- Best to use DHCP in guest OS for cross-subnet failover
- On cross-subnet failover, if guest is…
  - DHCP: IP updated automatically
  - Static IP: admin needs to configure new IP (can be scripted)
11
Client Reconnect Considerations
- Nodes in dissimilar subnets
- VM obtains new IP address
- Clients need that new IP address from DNS to reconnect
(Diagram: DNS Server 1 at Site A and DNS Server 2 at Site B, with DNS replication between them. Record created: VM = 10.10.10.111. Record updated: VM = 20.20.20.222. The client obtains the updated record.)
12
Solution #1: Local Failover First
- Scale up for local failover for higher availability
  - No change in IP addresses for HA
  - Means not going over the WAN and is still usually preferred
- Cross-site failover for disaster recovery
(Diagram: VM = 10.10.10.111 at Site A; 20.20.20.222 at Site B)
13
Solution #2: Stretch VLANs
- Deploying a VLAN minimizes client reconnection times
- IP of the VM never changes
(Diagram: a VLAN spanning Site A and Site B; DNS Server 1 and DNS Server 2 both hold FS = 10.10.10.111)
14
Solution #3: Network Device Abstraction
- Network device uses 3rd IP
- 3rd IP is the one registered in DNS & used by client
(Diagram: 10.10.10.111 at Site A and 20.20.20.222 at Site B; DNS Server 1 and DNS Server 2 hold VM = 30.30.30.30)
15
Faster Failover for Multi-Subnet Clusters
- RegisterAllProvidersIP (default = 0 for FALSE)
  - Determines if all IP addresses for a Network Name will be registered in DNS
  - TRUE (1): IP addresses can be online or offline and will still be registered
  - Ensure application is set to try all IP addresses, so clients can come online quicker
- HostRecordTTL (default = 1200 seconds)
  - Controls time the DNS record lives on client for a cluster network name
  - Shorter TTL: DNS records for clients updated sooner
  - Exchange Server 2007+ recommends a value of five minutes (300 seconds)
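With RegisterAllProvidersIP set to 1, DNS can return both the online and the offline address, so the client has to try each one. The following is a minimal sketch of that client-side behavior; the function name and the `can_connect` callback are hypothetical stand-ins for whatever connection attempt the application makes.

```python
from typing import Callable, Iterable, Optional


def first_reachable(addresses: Iterable[str],
                    can_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first address that accepts a connection, or None.

    `can_connect` is an assumed callback that attempts a connection
    (with a short timeout) and reports success or failure.
    """
    for address in addresses:
        if can_connect(address):
            return address
    return None


# Illustration using the addresses from the slides: after a cross-subnet
# failover only the Site B address answers.
registered = ["10.10.10.111", "20.20.20.222"]
online_now = {"20.20.20.222"}
chosen = first_reachable(registered, lambda ip: ip in online_now)
```

A client that only tries the first address would instead wait for the DNS record to change, which is where HostRecordTTL comes in.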
16
Live Migrating Across Sites
- Live migration moves a VM to another host
  - TCP reconnects make the move unnoticeable to clients
- Use VLANs to achieve live migrations between sites
  - IP client is connected to will not change
- Plan appropriate bandwidth between sites
  - Live migration may require significant network bandwidth based on amount of memory allocated to VM
  - Migration times will naturally be longer with higher latency or lower bandwidth WAN connections
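The bandwidth point above can be made concrete with a back-of-the-envelope lower bound. This is a sketch under stated assumptions, not a sizing tool: it only divides VM memory by link speed and ignores re-copying of memory pages that change during the migration, protocol overhead, and latency, so real migrations take longer. The function name is hypothetical.

```python
def minimum_copy_seconds(memory_bytes: int, link_bits_per_second: float) -> float:
    """Lower bound on the time to copy a VM's memory once over a link."""
    if memory_bytes < 0 or link_bits_per_second <= 0:
        raise ValueError("memory must be non-negative and link speed positive")
    return memory_bytes * 8 / link_bits_per_second


# Illustrative values: a VM with 4 GB of memory.
vm_memory = 4_000_000_000

lan_seconds = minimum_copy_seconds(vm_memory, 1_000_000_000)   # 1 Gbit/s link
wan_seconds = minimum_copy_seconds(vm_memory, 100_000_000)     # 100 Mbit/s WAN
```

Under these assumptions the same VM needs at least ten times longer over the slower WAN link, which is why bandwidth between sites should be planned before relying on cross-site live migration.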
17
CSV Networking Considerations
- Cluster Shared Volumes does not support having nodes in dissimilar subnets
- Use VLANs if you want to use CSV with multi-site clusters
(Diagram: Site A and Site B joined by a VLAN carrying the CSV network)
18
Multi-Subnet vs. VLAN Recap Choosing the right network model for you depends on your business requirements
19
Multi-Site Clustering: Introduction, Networking, Storage, Quorum
20
Storage in Multi-Site Clusters
Different than local clusters:
- Multiple storage arrays, independent per site
- Nodes commonly access own site storage
- No 'true' shared disk visible to all nodes
(Diagram: Site A, Site B)
21
Storage Considerations
- Requires data replication mechanism between sites
- Changes are made on Site A and replicated to Site B
(Diagram: Site A storage replicating to a replica at Site B)
22
Hardware Replication Partners Hardware storage-based replication
23
Software Replication Partners
Software host-based replication:
- Double-Take Availability (Vision Solutions)
- SteelEye DataKeeper Cluster Edition (SIOS Technology Corp.)
- Symantec Storage Foundation for Windows
- Sanbolic Melio 2010
24
Appliance Replication Partners Appliance Replication
25
Synchronous Replication
- Host receives "write complete" response from the storage after the data is successfully written on both storage devices
(Diagram: Write Request to Primary Storage, Replication to Secondary Storage, Acknowledgement, then Write Complete)
26
Asynchronous Replication
- Host receives "write complete" response from the storage after the data is successfully written to only the primary storage device, then replicates later
(Diagram: Write Request to Primary Storage, Write Complete, then Replication to Secondary Storage)
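The difference between the two replication modes described above comes down to when the host is told "write complete". The following toy model is a sketch for illustration only; the class and method names are hypothetical and do not correspond to any vendor's replication product.

```python
class ReplicatedVolume:
    """Toy model of a primary/secondary storage pair."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: list[str] = []
        self.secondary: list[str] = []
        self.pending: list[str] = []  # written locally, not yet replicated

    def write(self, block: str) -> str:
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: the secondary is written before acknowledging.
            self.secondary.append(block)
        else:
            # Asynchronous: acknowledge now, replicate later.
            self.pending.append(block)
        return "write complete"

    def replicate_pending(self) -> None:
        """Ship queued blocks to the secondary (asynchronous mode only)."""
        self.secondary.extend(self.pending)
        self.pending.clear()


sync_volume = ReplicatedVolume(synchronous=True)
sync_volume.write("block-1")

async_volume = ReplicatedVolume(synchronous=False)
async_volume.write("block-1")
at_risk = list(async_volume.pending)  # lost if the primary site fails now
```

In this model the asynchronous volume acknowledges sooner, but any blocks still in `pending` exist only at the primary site until the next replication pass.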
27
Synchronous vs. Asynchronous
28
Validation with Replicated Storage
- Multi-Site clusters are not required to pass the Storage tests to be supported
- Validation Guide and Policy: http://go.microsoft.com/fwlink/?LinkID=119949
29
What about DFS-Replication?
- Using the file server DFS-R feature to replicate VM data on a multi-site Failover Cluster is not supported
- DFS-R performs replication on file close
  - Works well for Office documents like .docx, .pptx, and .xlsx
  - Not designed for application workloads where the file is held open, like VHD
30
CSV with Replicated Storage
- Regular cluster disks: one node accesses the disk
- CSV disks: all nodes can access a disk
- Which CSV disk is accessed when it appears in multiple sites?
- Talk to your storage vendor for their support story
(Diagram: Site A and Site B; one copy of the VHD is read/write and the replica is read-only; a VM attempts to access the replica)
31
Storage Virtualization Abstraction
- Some replication solutions provide complete abstraction in storage array
- Servers are unaware of accessible disk location
- Fully compatible with Cluster Shared Volumes (CSV)
(Diagram: Site A and Site B; virtualized storage presents a logical LUN; servers abstracted from storage)
32
Choosing a Stretched Storage Model Choosing the right model for you depends on your business requirements Consult Vendor
33
Multi-Site Clustering: Introduction, Networking, Storage, Quorum
34
Quorum Overview
- Majority is greater than 50% of the votes
- Possible voters: nodes (1 each) + 1 witness (disk or file share)
- 4 quorum types:
  - Disk only (not recommended)
  - Node and Disk majority
  - Node majority
  - Node and File Share majority
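The majority rule above is a strict "more than half" test. A minimal sketch, with a hypothetical function name:

```python
def has_quorum(votes_reachable: int, total_votes: int) -> bool:
    """True when the reachable votes are strictly more than 50% of all votes."""
    if total_votes <= 0 or not 0 <= votes_reachable <= total_votes:
        raise ValueError("invalid vote counts")
    # Integer comparison avoids floating-point rounding at exactly 50%.
    return 2 * votes_reachable > total_votes


five_node_majority = has_quorum(3, 5)   # 3 of 5 votes
five_node_minority = has_quorum(2, 5)   # 2 of 5 votes
even_split = has_quorum(2, 4)           # exactly half is not a majority
```

The even-split case is why a cluster with an even number of nodes benefits from one extra witness vote.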
35
Replicated Disk Witness
- The witness will decide which partition of nodes stays running when the nodes lose network connectivity
- Witness disk should be a single decision maker
- Do not use in multi-site clusters unless directed by vendor
(Diagram: replicated storage, with a question mark over the witness vote)
36
Node Majority
Cross-site network connectivity broken!
- 5-node cluster: majority = 3, with the majority in the primary site
- Nodes in the primary site: Can I communicate with majority of the nodes in the cluster? Yes, then stay up
- Nodes in the other site: Can I communicate with majority of the nodes in the cluster? No, drop out of cluster membership
(Diagram: Site A, Site B)
37
Node Majority
Disaster at Site 1
- 5-node cluster: majority = 3, with the majority in the primary site
- Surviving nodes: Can I communicate with majority of the nodes in the cluster? No, drop out of cluster membership
- Need to force quorum manually
(Diagram: Site A: "We are down!"; Site B)
38
Forcing Quorum
- Forcing quorum is a way to manually override and start a node even though it has not achieved quorum
- Always understand why quorum was lost
- Used to bring cluster online without quorum
- Cluster starts in a special "forced" state
- Once majority achieved, drops out of "forced" state
- PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq)
- Command Line: net start clussvc /fixquorum (or /fq)
39
Multi-Site with File Share Witness
- Complete resiliency and automatic recovery from the loss of any 1 site
(Diagram: Site A and Site B connected over the WAN to a File Share Witness, \\Foo\Share, at Site C (branch office))
40
Multi-Site with File Share Witness
- Complete resiliency and automatic recovery from the loss of connection between sites!
- Nodes in the site that holds the lock: Can we communicate with majority of the voters in the cluster? Yes, including the lock with the FSW, so we stay up
- Nodes in the other site: Can I communicate with majority of the nodes in the cluster? No lock on FSW, drop out of Cluster Membership
(Diagram: Site A, Site B, and the File Share Witness \\Foo\Share at Site C (branch office), connected over the WAN)
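The file share witness scenarios above can be checked with simple vote arithmetic. This sketch assumes a 4-node cluster with 2 nodes at each of Site A and Site B plus one witness vote at Site C; the slides do not state node counts, so those numbers and the function name are illustrative assumptions.

```python
def partition_has_quorum(nodes_in_partition: int,
                         total_nodes: int,
                         holds_witness: bool) -> bool:
    """Node and File Share Majority: nodes vote once each, the witness once."""
    total_votes = total_nodes + 1  # all nodes plus the file share witness
    votes = nodes_in_partition + (1 if holds_witness else 0)
    return 2 * votes > total_votes


TOTAL_NODES = 4  # assumed: 2 nodes at Site A, 2 nodes at Site B

# Link between Site A and Site B breaks; Site A obtains the lock on the FSW.
site_a_stays_up = partition_has_quorum(2, TOTAL_NODES, holds_witness=True)
site_b_stays_up = partition_has_quorum(2, TOTAL_NODES, holds_witness=False)

# Site A is lost entirely; Site B can still reach the witness at Site C.
site_b_after_site_a_loss = partition_has_quorum(2, TOTAL_NODES, holds_witness=True)

# Site C (the witness) is lost; all four nodes still see each other.
nodes_after_witness_loss = partition_has_quorum(4, TOTAL_NODES, holds_witness=False)
```

In every single-site failure the remaining partition still holds 3 or more of the 5 votes, which matches the claim of automatic recovery from the loss of any 1 site.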
41
File Share Witness (FSW) Considerations
- Simple Windows File Server
- Single file server can serve as a witness for multiple clusters
- Each cluster requires its own share
- FSW can be made highly available on a separate cluster
- Recommended to be at 3rd separate site to enable automatic site failover
- FSW cannot be on a node in the same cluster
- FSW should not be in a VM running on the same cluster
42
Recent Changes
- Asymmetrical Storage (2008 R2 Service Pack 1)
  - Optimized to allow storage only visible to a subset of nodes
  - Improves multi-site cluster experience
- Node Vote Weight (post-SP1 hotfix)
  - Granular control of which nodes have votes in determining quorum
  - Flexibility for multi-site clusters
(Diagram: Primary, Secondary)
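Node vote weight changes the quorum arithmetic by letting some nodes count for zero votes. The following is a minimal sketch, assuming each node has a weight of 0 or 1; the node names and function name are hypothetical.

```python
def weighted_quorum(node_weights: dict[str, int], reachable: set[str]) -> bool:
    """True when reachable nodes hold strictly more than half of the total weight."""
    total = sum(node_weights.values())
    if total <= 0:
        raise ValueError("at least one node must have a vote")
    present = sum(weight for node, weight in node_weights.items()
                  if node in reachable)
    return 2 * present > total


# Assumed layout: two primary-site nodes keep their votes,
# two secondary-site nodes have their votes removed.
weights = {"primary-1": 1, "primary-2": 1, "secondary-1": 0, "secondary-2": 0}

# Secondary site goes dark: the primary site keeps quorum on its own.
primary_alone = weighted_quorum(weights, {"primary-1", "primary-2"})

# Primary site goes dark: the secondary site holds no votes and cannot form quorum.
secondary_alone = weighted_quorum(weights, {"secondary-1", "secondary-2"})
```

With this weighting a failure of the secondary site can never take the primary site down, while recovery at the secondary site after losing the primary still requires forcing quorum.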
43
Quorum Model Recap
- Node and File Share Majority: even number of nodes; best availability solution with the FSW in a 3rd site
- Node Majority: odd number of nodes; more nodes in primary site
- Node and Disk Majority: use as directed by vendor
- No Majority: Disk Only: not recommended; use as directed by vendor
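The first two rows of the recap reduce to a parity check on the node count. The sketch below encodes only that part of the guidance; the disk-based models are left out because the recap says to use them only as directed by the storage vendor. The function name is hypothetical.

```python
def recommended_quorum_model(node_count: int) -> str:
    """Map node count to the quorum model named in the recap."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 0:
        # Even node count: add a file share witness (ideally in a 3rd site).
        return "Node and File Share Majority"
    # Odd node count: place the extra node in the primary site.
    return "Node Majority"


four_node_model = recommended_quorum_model(4)
five_node_model = recommended_quorum_model(5)
```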
44
Session Summary
- Multi-site Failover Clusters have many benefits
  - You can achieve HA and DR in a single solution
- Multi-site clusters have additional considerations
  1. Determine network topology across sites
  2. Choose a replication solution
  3. Plan quorum model & nodes
45
Multi-Site Clustering Content
- Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx
- Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx
46
Additional Information
- Hyper-V Business Continuity portal: http://www.microsoft.com/virtualization/en/us/solution-continuity.aspx
- Microsoft Cross-Site Disaster Recovery Solutions whitepaper: http://download.microsoft.com/download/3/6/1/36117F2E-499F-42D7-9ADD-A838E9E0C197/SiteRecoveryWhitepaper_final_120309.pdf
47
Passion for High-Availability? Become a Cluster MVP! Contact: ClusMVP@Microsoft.com
49
Stay up to date with TechNet Belux Register for our newsletters and stay up to date: http://www.technet-newsletters.be Technical updates Event announcements and registration Top downloads Join us on Facebook http://www.facebook.com/technetbe http://www.facebook.com/technetbelux LinkedIn: http://linkd.in/technetbelux/ Twitter: @technetbelux Download MSDN/TechNet Desktop Gadget http://bit.ly/msdntngadget
50
TechDays 2011 On-Demand Watch this session on-demand via TechNet Edge http://technet.microsoft.com/fr-be/edge/ http://technet.microsoft.com/nl-be/edge/ Download to your favorite MP3 or video player Get access to slides and recommended resources by the speakers
51
THANK YOU