Elden Christensen Senior Program Manager Lead Microsoft Session Code: SVR319
Session Objectives And Takeaways
Session Objectives:
Understand the need for and benefits of multi-site clusters
What to consider as you plan, design, and deploy your first multi-site cluster
Takeaway: Windows Server Failover Clustering is a great solution not only for high availability, but also for disaster recovery
Agenda: Introduction | Networking | Storage | Quorum | Workloads
Is my Cluster Resilient to Site Failures?
All nodes are in the same physical location (Site A)
But what if there is a catastrophic event? Fire, flood, earthquake …
Multi-Site Clusters for DR
Extends a cluster from being a High Availability solution to also being a Disaster Recovery solution
A node is placed at a physically separate site (Site B)
Applications are failed over to the separate physical location
Benefits of a Multi-Site Cluster
Protects against loss of an entire datacenter
Automates failover: reduced downtime, lower-complexity disaster recovery plan
Reduces administrative overhead: application and cluster changes are synchronized automatically, easier to keep consistent than standalone servers
The primary reason DR solutions fail is dependence on people
Multi-Site Clustering: Networking
Network Considerations
Network Options:
1. Stretch VLANs across sites
2. Cluster nodes can reside in different subnets
(Site A on the public network; Site B on a separate network)
Stretching the Network
Longer distance traditionally means greater network latency; too many missed health checks can cause false failover
Heartbeating is fully configurable:
SameSubnetDelay (default = 1 second): frequency at which heartbeats are sent
SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down
CrossSubnetDelay (default = 1 second): frequency at which heartbeats are sent to nodes on different subnets
CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down, for nodes on different subnets
Command line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
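As a sketch of how these properties might be inspected and loosened for a higher-latency WAN link (R2 Failover Clusters PowerShell module; the cluster name and the specific values are illustrative, not recommendations; delays are in milliseconds):

```powershell
# Windows Server 2008 R2: load the Failover Clusters module
Import-Module FailoverClusters

# Inspect the current heartbeat settings
Get-Cluster | Format-List *SubnetDelay, *SubnetThreshold

# Loosen cross-subnet health checking for a high-latency WAN link
$cluster = Get-Cluster
$cluster.CrossSubnetDelay     = 2000   # send heartbeats every 2 seconds
$cluster.CrossSubnetThreshold = 10     # tolerate 10 missed heartbeats
```

Raising these values makes the cluster slower to declare a remote node down, trading failover speed for resilience against transient WAN latency spikes.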
Security over the WAN
Encrypt intra-node traffic with the cluster SecurityLevel property:
0 = clear text
1 = signed (default)
2 = encrypted
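A minimal sketch of setting this cluster-wide property from R2 PowerShell (run against the local cluster):

```powershell
Import-Module FailoverClusters

# 0 = clear text, 1 = signed (default), 2 = encrypted
(Get-Cluster).SecurityLevel = 2   # encrypt intra-node traffic over the WAN
```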
Enhanced Dependencies – OR
A Network Name resource stays up if either IP Address Resource A OR IP Address Resource B is up
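An OR dependency like this can be expressed with the R2 `Set-ClusterResourceDependency` cmdlet; the resource names below are illustrative:

```powershell
Import-Module FailoverClusters

# Network Name comes online if either IP address resource is online
Set-ClusterResourceDependency -Resource "FS Network Name" `
    -Dependency "[IP Address A] or [IP Address B]"
```

This is what allows a single Network Name to span two subnets: whichever site's IP address resource comes online satisfies the dependency.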
Client Reconnect Considerations
With nodes in different subnets, failover changes the resource's IP address
Clients need that new IP address from DNS to reconnect
(The DNS record is updated on DNS Server 1 in Site A, replicated to DNS Server 2 in Site B, and then obtained by clients)
Solution #1: Configure Network Name Settings
RegisterAllProvidersIP (default = 0, FALSE): determines whether all IP addresses for a Network Name are registered in DNS
TRUE (1): IP addresses are registered whether online or offline
Ensure the application is set to try all IP addresses, so clients can connect more quickly
HostRecordTTL (default = 1200 seconds): controls how long the DNS record for a cluster network name lives on the client
Shorter TTL: client DNS records are updated sooner
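These are private properties of the Network Name resource, so they are set per resource rather than cluster-wide. A sketch (the resource name and TTL value are illustrative):

```powershell
Import-Module FailoverClusters

$nn = Get-ClusterResource "FS Network Name"

# Register every provider IP in DNS, whether online or offline
$nn | Set-ClusterParameter RegisterAllProvidersIP 1

# Shorten the DNS TTL so clients pick up a new address sooner
$nn | Set-ClusterParameter HostRecordTTL 300   # seconds (default 1200)
```

The Network Name resource must be taken offline and brought back online for the new parameters to take effect.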
Solution #2: Prefer Local Failover
Local failover for higher availability: no change in IP address
Cross-site failover for disaster recovery
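Preferring local failover can be sketched by ordering the group's preferred owners so that same-site nodes are tried first (node and group names here are hypothetical):

```powershell
Import-Module FailoverClusters

# Try the Site A nodes first; cross over to Site B only as a last resort
Set-ClusterOwnerNode -Group "FS" -Owners NodeA1, NodeA2, NodeB1, NodeB2

# Verify the resulting preferred-owner order
Get-ClusterOwnerNode -Group "FS"
```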
Solution #3: Stretch VLANs
Deploying a VLAN across sites minimizes client reconnection times
Solution #4: Abstraction in a Network Device
A network device uses a 3rd IP address
The 3rd IP is the one registered in DNS and used by clients
Example: extmsftw2k8vistacisco.pdf
This is generic guidance… If you have other creative ideas, that’s ok!
Multi-Site Clustering: Storage
Storage in Multi-Site Clusters
Different from local clusters:
Multiple storage arrays, independent per site
Nodes commonly access their own site's storage
No "true" shared disk visible to all nodes
Storage Considerations
Need a data replication mechanism between sites
Changes are made on Site A and replicated to Site B
Replication Options
Replication levels:
Hardware storage-based replication
Software host-based replication
Application-based replication
Synchronous Replication
The host receives a "write complete" response from the storage only after the data is successfully written on both storage devices
(Write request → primary storage → replication → secondary storage → acknowledgement → write complete)
Asynchronous Replication
The host receives a "write complete" response from the storage as soon as the data is successfully written to the primary storage device; replication to the secondary device happens afterward
Synchronous vs. Asynchronous
Synchronous:
- No data loss
- Requires a high-bandwidth / low-latency connection
- Stretches over shorter distances
- Write latencies impact application performance
Asynchronous:
- Potential data loss on hard failures
- Needs enough bandwidth to keep up with data replication
- Stretches over longer distances
- No significant impact on application performance
Storage Resource Dependencies
Dependencies establish start-order timing; the group determines the smallest unit of failover
A custom (vendor) resource beneath the Disk resource ensures the node is communicating with local storage and checks array state
Making the workload resource (for example, a File Server) depend on the disk ensures the application comes online only after replication is complete
(Typical group: Workload resource → Disk resource → Custom resource, plus IP Address resources → Network Name resource)
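One way to sketch this dependency chain with R2 PowerShell; the resource names, including the vendor's replication-control resource, are hypothetical:

```powershell
Import-Module FailoverClusters

# Disk waits on the vendor's replication resource, so the array is
# confirmed writable at this site before the disk comes online
Add-ClusterResourceDependency -Resource "Physical Disk" -Provider "Replication Control"

# The File Server workload in turn waits on the disk
Add-ClusterResourceDependency -Resource "File Server" -Provider "Physical Disk"
```

Because the whole chain lives in one resource group, the group as a whole remains the smallest unit of failover.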
Cluster Validation and Replication
Multi-site clusters are not required to pass the Storage tests to be supported
Validation Guide and Policy: link/?LinkID=119949
HP's Multi-Site Implementation & Demo
Matthias Popp, Architect, HP
HP's Multi-Site Implementation: CLX for Windows
All Physical Disk resources of one resource group (VM) depend on a CLX resource
(Virtual Machine → VM Config File → Physical Disk → HP CLX)
Very smooth integration
HP Cluster Extension – What's New?
Support for Hyper-V Live Migration across disk arrays
Support for Windows Server 2008 R2
Support for Microsoft Hyper-V Server 2008 R2
TT337AAE – HP StorageWorks Cluster Extension EVA for Windows e-LTU
No change to current CLX product pricing
XP Cluster Extension does not yet support Live Migration – planned for 2010
Live Migration with Storage Failover
(Host 1 → Host 2, storage-based remote replication on HP EVA Storage)
1. Initiate Live Migration
2. Check the disk array for replication link and disk pair states
3. Create the VM on the target node
4. Copy memory pages from the source server to the target server via Ethernet
5. Final state transfer: pause the virtual machine
6. Move storage connectivity from the source server to the target server
7. Change the storage replication direction
8. Run the new VM on the target server; delete the VM on the source server
HP Storage for Virtualization: Hyper-V Live Migration between Replicated Disk Arrays
End-user-transparent application migration across data centers, servers, and storage
Zero-downtime array load balancing (IOPS, cache utilization, response times, power consumption, etc.)
Zero-downtime maintenance: firmware/HBA/server updates without user interruption; plan maintenance without the need to check for downtimes
Follow-the-sun/moon data center access model: move the app/VM closest to the users or closest to the cheapest power source
Failover, failback, Quick and Live Migration using the same management software: no need to learn x different tools and their limitations
EVA CLX with Exchange 2010 Live Migration
Hyper-V Geo Cluster with Exchange
Hyper-V Geo Cluster with Exchange
Automatically re-directs storage replication during Live Migration
Additional HP Resources
HP website for Hyper-V
HP and Microsoft Frontline Partnership website
HP website for Windows Server 2008 R2
HP website for management tools
HP OS Support Matrix
Information on HP ProLiant Network Adapter Teaming for Hyper-V
Technical overview of HP ProLiant Network Adapter Teaming
Whitepaper: Disaster Tolerant Virtualization Architecture with HP StorageWorks Cluster Extension and Microsoft Hyper-V
Multi-Site Clustering: Quorum
Quorum Overview
4 Quorum Types:
Disk only (not recommended)
Node and Disk Majority
Node Majority
Node and File Share Majority
Vote: a majority is greater than 50%
Possible voters: nodes (1 vote each) + 1 witness (disk or file share)
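The "greater than 50%" rule can be sketched as a one-line calculation (illustrative PowerShell, not a cluster cmdlet):

```powershell
# Votes needed for majority = floor(voters / 2) + 1
function Get-MajorityCount([int]$Voters) {
    [math]::Floor($Voters / 2) + 1
}

Get-MajorityCount 5   # 5 voters (e.g. 5 nodes) need 3 votes
Get-MajorityCount 6   # 6 voters (e.g. 5 nodes + witness) need 4 votes
```

This is why adding a witness to an even number of nodes matters: it makes the voter count odd, so one partition can always hold a clear majority.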
Replicated Disk Witness
A witness is the decision maker when nodes lose network connectivity
A replicated witness is not a single decision maker, and that causes problems: each site could claim the witness vote
Do not use a replicated disk witness in multi-site clusters unless directed by your vendor
Node Majority: Cross-Site Network Connectivity Broken
5-node cluster: majority = 3, with the majority of nodes in the primary site (Site A)
Each node asks: can I communicate with a majority of the nodes in the cluster?
Site A nodes: yes → stay up
Site B nodes: no → drop out of cluster membership
Node Majority: Disaster at Site A
5-node cluster: majority = 3, with the majority of nodes in the primary site
The surviving Site B nodes cannot communicate with a majority of the nodes, so they drop out of cluster membership
Need to force quorum manually
Forcing Quorum
Used to bring the cluster online without quorum; always understand why quorum was lost first
The cluster starts in a special "forced" state; once majority is achieved, the "forced" state ends
Command line: net start clussvc /fixquorum (or /fq)
PowerShell (R2): Start-ClusterNode –FixQuorum (or –fq)
Multi-Site with File Share Witness
A File Share Witness (\\Foo\Cluster1) in a third site (Site C), reached over the WAN, gives complete resiliency and automatic recovery from the loss of any one site
Multi-Site with File Share Witness: Connection Between Sites Lost
Complete resiliency and automatic recovery from the loss of the connection between sites
Each site asks: can I communicate with a majority of the nodes (+ FSW)?
The site that obtains the FSW lock achieves majority and stays up; the other site's lock attempt fails, so its nodes drop out of cluster membership
FSW Considerations
A simple Windows file server; a single file server can serve as a witness for multiple clusters
Each cluster requires its own share
The file server can itself be clustered, in a second cluster
Recommended to be at a 3rd, separate site so that there is no single point of failure
The FSW cannot be on a node of the same cluster
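Switching a cluster to use a file share witness can be sketched with the R2 `Set-ClusterQuorum` cmdlet; the share path follows the slide's \\Foo\Cluster1 example:

```powershell
Import-Module FailoverClusters

# Switch the cluster to Node and File Share Majority
Set-ClusterQuorum -NodeAndFileShareMajority \\Foo\Cluster1

# Verify the new quorum configuration
Get-ClusterQuorum
```

The share must already exist and the cluster's computer account needs permission to create and lock files on it.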
Quorum Model Summary
No Majority (Disk Only): not recommended; use only as directed by your vendor
Node and Disk Majority: use as directed by your vendor
Node Majority: odd number of nodes, with more nodes in the primary site
Node and File Share Majority: even number of nodes; best availability solution, with the FSW in a 3rd site
Multi-Site Clustering: Workloads
Hyper-V in a Multi-Site Cluster
Network: on cross-subnet failover, a DHCP guest gets its IP updated automatically; a guest with a statically configured IP needs an admin to configure the new IP; a stretched VLAN is preferred with live migration between sites
Storage: 3rd-party replication solution required; configuration with CSV (explained next)
Quorum: no special considerations
CSV in a Multi-Site Cluster
Architectural assumptions collide:
Replication solutions assume only one array is accessed at a time
CSV assumes all nodes can concurrently access the LUN, so a VM on a DR-site node would attempt to access the read-only replica
(Nodes in the primary site: read/write; nodes in the disaster recovery site: read-only)
CSV is not required for Live Migration
CSV requires stretched VLANs
Talk to your storage vendor for their support story
SQL in a Multi-Site Cluster
Network: SQL does not support the OR dependency; need to stretch a VLAN between sites
Storage: no special considerations; 3rd-party replication solution required
Quorum: no special considerations
Exchange in a Multi-Site Cluster
Network: no VLAN needed; change HostRecordTTL from 20 minutes to 5 minutes; CCR supports 2 nodes, one per site
Storage: Exchange CCR provides application-based replication
Quorum: file share witness on the Hub Transport server in the primary site
Session Summary
Multi-Site Failover Clustering has many benefits
Redundancy is needed everywhere
Understand your replication needs
Compare VLANs with multiple subnets
Plan the quorum model and node placement before deployment
Follow the checklist and best practices
Resources
Sessions On-Demand & Community
Resources for IT Professionals
Resources for Developers
Microsoft Certification & Training Resources
Related Content
Breakout Sessions:
SVR208 Gaining Higher Availability with Windows Server 2008 R2 Failover Clustering
SVR319 Multi-Site Clustering with Windows Server 2008 R2
DAT312 All You Needed to Know about Microsoft SQL Server 2008 Failover Clustering
UNC307 Microsoft Exchange Server 2010 High Availability
SVR211 The Challenges of Building and Managing a Scalable and Highly Available Windows Server 2008 R2 Virtualisation Solution
SVR314 From Zero to Live Migration. How to Set Up a Live Migration
Demo Sessions:
SVR01-DEMO Free Live Migration and High Availability with Microsoft Hyper-V Server 2008 R2
Hands-on Labs:
UNC12-HOL Microsoft Exchange Server 2010 High Availability and Storage Scenarios
Multi-Site Clustering Content Design guide: Deployment guide/checklist:
Complete an evaluation on CommNet and enter to win an Xbox 360 Elite!
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.