Symon Perriman, Program Manager II, Clustering & High-Availability, Microsoft Corporation
Session code: VIR303
Introduction Networking Storage Quorum
But what if there is a catastrophic event and you lose the entire datacenter? (Diagram: Site A)
Node is located at a physically separate site. (Diagram: Site A, Site B)
Dependence on People
Introduction Networking Storage Quorum
(Diagram: Site A and Site B connected by a public network and a redundant network)
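(Diagram: DNS Server 1 and DNS Server 2 across Site A and Site B. Labels: record created, record updated, DNS replication, record updated, record obtained. The values of the VM record are not preserved.)
(Diagram: DNS Server 1; VM record; Site A, Site B)
(Diagram: DNS Server 1 and DNS Server 2; FS record; Site A, Site B, VLAN)
(Diagram: DNS Server 1 and DNS Server 2; VM record; Site A, Site B)
(Diagram: Site A, Site B, VLAN, CSV network)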
DHCP: IP updated automatically.
Static IP: Admin needs to configure new IP. Can be scripted.
Multi-Subnet vs. VLAN, compared on: Live Migration (seamless), Quick Migration, fast failover, Cluster Shared Volumes, static IPs in guest, flexibility, complexity. (Table cell values are not preserved.)
Introduction Networking Storage Quorum
DR requires a data replication mechanism between sites.
Changes are made on Site A and replicated to Site B. (Diagram: Site A, Site B, replica)
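Synchronous replication: write request to primary storage, replication to secondary storage, acknowledgement from secondary storage, then write complete.
Asynchronous replication: write request to primary storage, write complete, then replication to secondary storage.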
Synchronous vs. Asynchronous
Synchronous: No data loss. Requires high bandwidth/low latency connection. Stretches over shorter distances. Write latencies impact application performance.
Asynchronous: Potential data loss on hard failures. Enough bandwidth to keep up with data replication. Stretches over longer distances. No significant impact on application performance.
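The difference between the two modes can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Storage` and `ReplicatedVolume` classes are hypothetical and do not correspond to any vendor's replication API. It shows that a synchronous write completes only after the secondary acknowledges it, while an asynchronous write completes first and can be lost if the primary fails before replication catches up.

```python
from collections import deque


class Storage:
    """Hypothetical in-memory stand-in for a storage array."""

    def __init__(self):
        self.blocks = []

    def write(self, data):
        self.blocks.append(data)
        return True  # acknowledgement


class ReplicatedVolume:
    """Toy model of a primary volume replicated to a secondary site."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = Storage()
        self.secondary = Storage()
        self.pending = deque()  # writes not yet replicated (async only)

    def write(self, data):
        """Return once the write is reported complete to the application."""
        self.primary.write(data)
        if self.synchronous:
            # Complete only after the secondary acknowledges the replica.
            return self.secondary.write(data)
        # Asynchronous: report completion now, replicate later.
        self.pending.append(data)
        return True

    def replicate_pending(self):
        """Drain the asynchronous replication queue to the secondary."""
        while self.pending:
            self.secondary.write(self.pending.popleft())

    def lost_on_primary_failure(self):
        """Writes that exist only on the primary if it fails right now."""
        return list(self.pending)


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write("block-1")

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write("block-1")

print(sync_vol.lost_on_primary_failure())   # []
print(async_vol.lost_on_primary_failure())  # ['block-1']
```

In the asynchronous case the exposure lasts until `replicate_pending()` runs, which is why the link needs enough bandwidth to keep up with the rate of change.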
Concurrent access to a single file system. (Diagram: Disk5, single volume, VHD)
VM attempts to access replica. (Diagram: Site A, Site B; VHD; Read/Write, Read-Only)
Virtualized storage presents a logical LUN. Servers are abstracted from storage. (Diagram: Site A, Site B)
Replication type compared across Traditional Cluster Storage, Cluster Shared Volumes, and Live Migration (most table cell values are not preserved):
Hardware Replication: Consult vendor
Software Replication
Appliance Replication: Consult vendor
Concurrent Replication, Cascaded Replication, Heterogeneous Replication
(Diagram: hosts NewYork-01 to NewYork-04 and NewJersey-01 to NewJersey-04; VPLEX Cluster-1 and VPLEX Cluster-2; CSV Volume1 to Volume4 with OS VHDs; CSV Volume1 to Volume4 with SQL VHDs)
Introduction Networking Storage Quorum
Vote
Replicated Storage ? Vote
Cross-site network connectivity broken! (Diagram: Site A, Site B)
5-node cluster: majority = 3. Majority in primary site.
Nodes in the primary site: Can I communicate with a majority of the nodes in the cluster? Yes, then stay up.
Nodes in the other site: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership.
Disaster at Site 1. We are down! (Diagram: Site A, Site B)
5-node cluster: majority = 3. Majority in primary site.
Surviving nodes: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership.
Need to force quorum manually.
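The node majority check in these two scenarios comes down to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 3/2 split of the five nodes between sites is an assumption consistent with "majority in primary site".

```python
def majority(total_votes):
    """Smallest number of votes that is more than half of the total."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes, total_votes):
    """True if a partition can see a majority of the cluster's votes."""
    return reachable_votes >= majority(total_votes)


TOTAL_NODES = 5
PRIMARY_NODES = 3    # assumed split: majority of nodes in the primary site
SECONDARY_NODES = 2  # assumed: remaining nodes in the other site

print(majority(TOTAL_NODES))                     # 3

# Cross-site link broken: each site sees only its own nodes.
print(has_quorum(PRIMARY_NODES, TOTAL_NODES))    # True
print(has_quorum(SECONDARY_NODES, TOTAL_NODES))  # False

# Primary site lost: the survivors still see only 2 of 5 votes,
# so they drop out until quorum is forced manually.
survivors_have_quorum = has_quorum(SECONDARY_NODES, TOTAL_NODES)
print(survivors_have_quorum)                     # False
```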
File Share Witness: \\Foo\Share at Site C (branch office), reached over the WAN.
Complete resiliency and automatic recovery from the loss of any 1 site. (Diagram: Site A, Site B, Site C)
Complete resiliency and automatic recovery from the loss of connection between sites. (Diagram: Site A, Site B, Site C (branch office) hosting \\Foo\Share over the WAN)
Nodes in the site that obtains the witness lock: Can I communicate with a majority of the nodes (+FSW) in the cluster? Yes, then stay up.
Nodes in the other site: Can I communicate with a majority of the nodes in the cluster? No (lock failed), drop out of cluster membership.
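The effect of the file share witness can be sketched as one extra vote that goes to whichever partition obtains the lock on the share. This is a simplified illustration, not the cluster service's actual algorithm: the function name is hypothetical, and two nodes per site is an assumption, since the slides do not give node counts.

```python
def has_quorum_with_witness(reachable_nodes, total_nodes, holds_witness_lock):
    """Majority check where the file share witness adds one vote."""
    total_votes = total_nodes + 1  # every node plus the witness
    votes = reachable_nodes + (1 if holds_witness_lock else 0)
    return votes > total_votes // 2


TOTAL_NODES = 4      # assumed: 2 nodes in Site A, 2 nodes in Site B
NODES_PER_SITE = 2

# Link between Site A and Site B broken; only one site can lock the share.
site_with_lock = has_quorum_with_witness(NODES_PER_SITE, TOTAL_NODES, True)
site_without_lock = has_quorum_with_witness(NODES_PER_SITE, TOTAL_NODES, False)
print(site_with_lock)     # True
print(site_without_lock)  # False

# Site C (the witness) lost: all four nodes still see each other.
witness_site_lost = has_quorum_with_witness(TOTAL_NODES, TOTAL_NODES, False)
print(witness_site_lost)  # True
```

Because the surviving site can still reach the witness when the other site is lost entirely, it keeps 3 of 5 votes and stays up without manual intervention.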
Node and File Share Majority: Even number of nodes. Highest availability solution has FSW in 3rd site.
Node Majority: Odd number of nodes. More nodes in primary site.
Node and Disk Majority: Use as directed by vendor.
No Majority: Disk Only: Not recommended. Use as directed by vendor.
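The node-count guidance above can be written as a small helper. This is a sketch only: `recommend_quorum_model` is a hypothetical function, not a cluster API, and it covers just the two models the guidance ties to node count. The disk-based models are left to the storage vendor's direction.

```python
def recommend_quorum_model(node_count):
    """Pick a quorum model for a multi-site cluster from its node count."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 0:
        # Even node count: add a witness vote, ideally in a third site.
        return "Node and File Share Majority"
    # Odd node count: place more nodes in the primary site.
    return "Node Majority"


print(recommend_quorum_model(4))  # Node and File Share Majority
print(recommend_quorum_model(5))  # Node Majority
```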
Passion for High Availability? Become a Cluster MVP!
Breakout Sessions
WSV313 | Failover Clustering Deployment Success
WSV314 | Failover Clustering Pro Troubleshooting with Windows Server 2008 R2
VIR303 | Disaster Recovery by Stretching Hyper-V Clusters across Sites
ARC308 | High Availability: A Contrarian View
DAT207 | SQL Server High Availability: Overview, Considerations, and Solution Guidance
DAT303 | Architecting and Using Microsoft SQL Server Availability Technologies in a Virtualized World
DAT305 | See the Largest Mission Critical Deployment of Microsoft SQL Server around the World
DAT401 | High Availability and Disaster Recovery: Best Practices for Customer Deployments
DAT407 | Windows Server 2008 R2 and Microsoft SQL Server 2008: Failover Clustering Implementations
UNC304 | Microsoft Exchange Server 2010: High Availability Deep Dive
UNC305 | Microsoft Exchange Server 2010 High Availability Design Considerations
Visit the Cluster Team in the TLC Failover Clustering Booth WSV-7