PROTECTING THE VIRTUALIZED DATACENTER – HIGH AVAILABILITY Gorazd Šemrov Microsoft Consulting Services

Presentation transcript:

1 PROTECTING THE VIRTUALIZED DATACENTER – HIGH AVAILABILITY Gorazd Šemrov Microsoft Consulting Services gorazd.semrov@microsoft.com

2 DATA PROTECTION PLANNING CONSIDERATIONS What needs protection? Local resources (physical & virtual), remote sites. What are your recovery goals? Prioritize by tier; organizational expectations. Do you have a disaster recovery plan? Downtime, RPO/RTO, testing. How much bandwidth do you have to manage protection? Is your time better spent on other priorities? What are your budgetary realities?

3 HOST CLUSTERING The cluster service runs on the (physical) host and manages VMs. VMs move between cluster nodes: Live Migration – no downtime; Quick Migration – session state saved to disk. (Diagram: cluster nodes attached to a SAN.)
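
As a concrete illustration of host clustering on Windows Server 2008 R2, the sketch below makes an existing Hyper-V VM highly available and then moves it between nodes. The VM and node names are placeholders, and it assumes the clustered role takes the VM's name; treat it as a minimal example, not the deck's deployment procedure.

Import-Module FailoverClusters
# Make an existing VM highly available (creates a clustered VM group)
Add-ClusterVirtualMachineRole -VMName "VM1"
# Move it to another node with no downtime (Live Migration) ...
Move-ClusterVirtualMachineRole -Name "VM1" -Node "HOST2" -MigrationType Live
# ... or with a brief pause while session state is saved to disk (Quick Migration)
Move-ClusterVirtualMachineRole -Name "VM1" -Node "HOST2" -MigrationType Quick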

4 GUEST CLUSTERING The cluster service runs inside a VM. Apps and services inside the VM are managed by the cluster, and apps move between clustered VMs. (Diagram: guest cluster using iSCSI shared storage.)

5 GUEST VS. HOST: HEALTH DETECTION Faults compared across the two models: host hardware failure, parent partition failure, VM failure, guest OS failure, application failure. Broadly, host clustering detects the host-level faults (hardware, parent partition, VM), while guest clustering detects failures inside the VM (guest OS and application).

6 HOST + GUEST CLUSTERING The optimal solution: offers the most flexibility and protection. VM high availability & mobility between physical nodes; application & service high availability & mobility between VMs. Increases complexity. (Diagram: guest cluster over iSCSI layered on top of host clusters attached to SANs.)

7 SETTINGS: ANTIAFFINITYCLASSNAMES AntiAffinityClassNames: groups with the same AACN try to avoid moving to the same node. http://msdn.microsoft.com/en-us/library/aa369651(VS.85).aspx Enables VM distribution across host nodes and better utilization of host OS resources. Failover behavior on large clusters: KB 299631. A configuration sketch follows below.
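
As a sketch of how this might be configured on 2008 R2: AntiAffinityClassNames is a string-collection property on each cluster group, so the same class name is assigned to every VM group that should be kept apart. The group names and the class-name string here are placeholders, not values from the deck.

Import-Module FailoverClusters
# Hypothetical example: keep two SQL guest-cluster VMs on different hosts
$aacn = New-Object System.Collections.Specialized.StringCollection
$aacn.Add("SQL Guest Cluster") | Out-Null
(Get-ClusterGroup "SQL-VM1").AntiAffinityClassNames = $aacn
(Get-ClusterGroup "SQL-VM2").AntiAffinityClassNames = $aacn
# Verify the assignment
Get-ClusterGroup | Select-Object Name, AntiAffinityClassNames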

8 SETTINGS: AUTO-START Mark less important groups as lower priority so the most important VMs start first. This is a group property, enabled by default. VMs with auto-start disabled need a manual restart to recover after a crash.

9 SETTINGS: PERSISTENT MODE A highly available service or application will return to its original owner, giving better VM distribution after a cold start. Enabled by default for VM groups, disabled by default for other groups.

10 MULTI-SITE CLUSTERING CONSIDERATIONS Network Compute Quorum Storage

11 MULTI-SITE CLUSTERS FOR DISASTER RECOVERY What are multi-site clusters? A single cluster solution stretched over metropolitan or greater distances to protect against datacenter failures. Nodes are located at physically separate sites (Site A, Site B).

12 MULTI-SITE CLUSTERING CONSIDERATIONS Network Compute Quorum Storage

13 STORAGE DEPLOYMENT OPTIONS Traditional cluster storage: a shared-nothing storage model; the unit of failover is at the LUN/disk level; ideal for Hyper-V Quick Migration scenarios.

14 STORAGE DEPLOYMENT OPTIONS Cluster Shared Volumes (CSV): multiple nodes concurrently access the same disk; the unit of failover is at the VM level; ideal for Hyper-V Quick and Live Migration; check vendor support.
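
For reference, adding a clustered disk to Cluster Shared Volumes on 2008 R2 is a one-liner; the disk resource name below is a placeholder.

Import-Module FailoverClusters
# Convert an existing clustered disk into a Cluster Shared Volume
Add-ClusterSharedVolume -Name "Cluster Disk 1"
# List CSVs (they surface under C:\ClusterStorage on every node)
Get-ClusterSharedVolume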

15 REPLICATION METHOD: SYNCHRONOUS The host receives the “write complete” response from the storage only after the data is successfully written on both storage devices. Flow between primary and secondary storage: 1. Write request → 2. Replication → 3. Acknowledgement → 4. Write complete.

16 REPLICATION METHOD: ASYNCHRONOUS The host receives the “write complete” response from the storage as soon as the data is successfully written to the primary storage device; replication to the secondary happens afterwards. Flow: 1. Write request → 2. Write complete → 3. Replication.

17 COMPARING DATA REPLICATION METHODS
Recovery Point Objective (RPO): Synchronous – high business impact, critical applications (RPO = 0); Asynchronous – medium/low business impact applications (RPO > 0).
Application I/O performance: Synchronous – for applications not sensitive to high I/O latency; Asynchronous – for applications sensitive to high I/O latency.
Distance between sites: Synchronous – 50 km to 300 km; Asynchronous – >200 km.
Bandwidth cost: Synchronous – high; Asynchronous – mid/low.

18 CLUSTER VALIDATION WITH REPLICATED STORAGE Multi-Site clusters are not required to pass the Storage tests to be supported Validation Guide and Policy http://go.microsoft.com/fwlink/?LinkID=119949
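
A hedged example of running validation on a multi-site cluster while skipping the storage tests; the node names are placeholders, and the validation guide linked above remains the authoritative procedure.

Import-Module FailoverClusters
# Validate a stretched cluster but skip the Storage test category,
# since replicated storage is not expected to pass the shared-storage tests
Test-Cluster -Node SiteA-N1, SiteA-N2, SiteB-N1, SiteB-N2 -Ignore "Storage"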

19 ASYMMETRICAL STORAGE SUPPORT IN SP1 Improves the multi-site cluster experience when storage is only visible to a subset of nodes. The storage topology is used for smart placement: workloads are placed based on their underlying storage connectivity. Example: disk set #1 is visible on N1 & N2 and disk set #2 on N3 & N4, so SQL and non-SQL workloads stay separated.

20 CHOOSING A STRETCHED STORAGE MODEL The choice compares support for traditional cluster storage, Cluster Shared Volumes, and Live Migration across hardware replication, software replication, and appliance replication; for hardware and appliance replication, consult your vendor.

21 MULTI-SITE CLUSTERING CONSIDERATIONS Network Compute Quorum Storage

22 NETWORK DEPLOYMENT OPTIONS Stretched VLANs Site B Public Network 10.10.10.* 20.20.20.* Redundant Network Site A

23 NETWORK DEPLOYMENT OPTIONS Site B Different Subnets Public Network 10.10.10.* 20.20.20.* 30.30.30.* 40.40.40.* Redundant Network Site A

24 CHALLENGES WITH STRETCHED NETWORKS

25 STRETCHING THE NETWORK Clustering has no distance limitations (although 3rd-party plug-ins may). Longer distance traditionally means greater network latency, and missed inter-node health checks can cause false failover. Cluster heartbeating is fully configurable:
SameSubnetDelay (default = 1 second) – how frequently heartbeats are sent.
SameSubnetThreshold (default = 5 heartbeats) – missed heartbeats before an interface is considered down.
CrossSubnetDelay (default = 1 second) – how frequently heartbeats are sent to nodes on different subnets.
CrossSubnetThreshold (default = 5 heartbeats) – missed heartbeats before an interface is considered down for nodes on different subnets.
Command line: cluster.exe /prop. PowerShell (R2): Get-Cluster | fl *
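
For instance, a hedged sketch of relaxing heartbeat sensitivity for a high-latency WAN link; the values are illustrative rather than recommendations, and the delay properties are stored in milliseconds.

Import-Module FailoverClusters
# Inspect the current heartbeat settings
Get-Cluster | Format-List *SubnetDelay, *SubnetThreshold
# Illustrative change: tolerate more latency between sites
(Get-Cluster).CrossSubnetDelay = 2000       # send cross-subnet heartbeats every 2 s
(Get-Cluster).CrossSubnetThreshold = 10     # allow 10 missed heartbeats before marking an interface down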

26 SECURITY OVER THE WAN Encrypt inter-node communication – a trade-off between security and performance. SecurityLevel (default = signed communication): 0 = clear text, 1 = signed (default), 2 = encrypted. (Diagram: nodes 10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1 across Site A and Site B.)
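
Setting the property on 2008 R2 might look like the following sketch; whether encryption is worth the extra CPU cost over the WAN depends on your environment.

Import-Module FailoverClusters
# 0 = clear text, 1 = signed (default), 2 = encrypted
(Get-Cluster).SecurityLevel = 2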

27 UPDATING VM IP ADDRESSES ON FAILOVER Not needed if the VM stays on the same subnet. On cross-subnet failover: if the guest uses DHCP, the IP address is updated automatically; if the guest uses a static IP, a new IP address must be configured after failover (this can be scripted – see the sketch below). If using multiple subnets, it is easier to use DHCP in the guest OS.
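
One way such a script could look inside the guest, assuming the classic netsh syntax and a connection named "Local Area Connection"; the adapter name and all addresses are placeholders.

# Run inside the guest OS after a cross-subnet failover (illustrative values)
netsh interface ip set address "Local Area Connection" static 20.20.20.222 255.255.255.0 20.20.20.1 1
netsh interface ip set dns "Local Area Connection" static 20.20.20.53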

28 DNS CONSIDERATIONS With nodes in different subnets, the VM obtains a new IP address on failover, and clients need that new IP address from DNS to reconnect. (Diagram: the record VM = 10.10.10.111 created on DNS Server 1 in Site A is updated to VM = 20.20.20.222 after failover, replicated to DNS Server 2 in Site B, and then obtained by clients.)

29 SOLUTION #1: LOCAL FAILOVER FIRST Configure local failover first for high availability: no change in IP addresses, no DNS replication issues, no data going over the WAN. Cross-site failover is reserved for disaster recovery. (Diagram: Site A, Site B; DNS Server 1 holds VM = 10.10.10.111; cross-site address 20.20.20.222.)

30 SOLUTION #2: STRETCH VLANS Deploying a VLAN minimizes client reconnection times because the IP address of the VM never changes. (Diagram: Site A, Site B; DNS Server 1 and DNS Server 2 both hold FS = 10.10.10.111 on the stretched VLAN.)

31 SOLUTION #3: NETWORKING DEVICE ABSTRACTION A networking device uses an independent third IP address; that third IP address is registered in DNS and used by clients. (Diagram: Site A node 10.10.10.111, Site B node 20.20.20.222; DNS Server 1 and DNS Server 2 hold VM = 30.30.30.30.)

32 CSV NETWORKING CONSIDERATIONS Cluster Shared Volumes require the nodes to be in the same subnet, so use a VLAN on your CSV network; other networks can still span multiple subnets. (Diagram: a VLAN carrying the CSV network between Site A and Site B.)
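
On 2008 R2 the cluster chooses its internal/CSV network by metric (lowest wins), so one way to pin CSV traffic to the stretched VLAN is the sketch below; the network name is a placeholder.

Import-Module FailoverClusters
# Show which networks the cluster prefers for internal/CSV traffic
Get-ClusterNetwork | Format-Table Name, Metric, AutoMetric
# Give the dedicated CSV network the lowest metric so it is preferred
(Get-ClusterNetwork "CSV Network").Metric = 900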

33 LIVE MIGRATING ACROSS SITES Live migration moves a running VM between cluster nodes, and TCP reconnects make the move unnoticeable to clients. Use VLANs to achieve live migrations between sites, so the VM IP address connecting the client to the VM does not change. Network bandwidth planning: live migration may require significant network bandwidth depending on the amount of memory allocated to the VM, and live migration times will be longer over high-latency or low-bandwidth WAN connections. Remember that CSV and live migration are independent but complementary technologies.

34 MULTI-SUBNET VS. VLAN RECAP

35 MULTI-SITE CLUSTERING CONSIDERATIONS Network Compute Quorum Storage

36 QUORUM DEPLOYMENT OPTIONS 1. Disk only 2. Node Majority 3. Node & Disk Majority 4. File Share Witness
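
If you need to switch models, the R2 FailoverClusters module exposes one switch per quorum option; the disk resource name is a placeholder and the share path is the example used later on slide 41.

Import-Module FailoverClusters
# Inspect the current quorum configuration
Get-ClusterQuorum
# Examples of each model (pick one):
Set-ClusterQuorum -NodeMajority
Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 1"
Set-ClusterQuorum -NodeAndFileShareMajority "\\Foo\Share"
Set-ClusterQuorum -DiskOnly "Cluster Disk 1"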

37 REPLICATED DISK WITNESS A witness is a tie breaker when nodes lose network connectivity. When the witness is replicated it is no longer a single decision maker, and problems occur. Do not use a replicated disk witness in multi-site clusters unless directed by your storage vendor.

38 NODE MAJORITY Cross-site network connectivity broken! Each node asks: can I communicate with a majority of the nodes in the cluster? Nodes in Site A (the primary site, holding the majority) answer yes and stay up; nodes in Site B answer no and drop out of cluster membership. 5-node cluster: majority = 3, majority in the primary site.

39 NODE MAJORITY Disaster at Site A – the primary site is down! The surviving nodes in Site B ask: can I communicate with a majority of the nodes in the cluster? No, so they drop out of cluster membership. 5-node cluster: majority = 3 and the majority was in the primary site, so quorum is lost and an administrator needs to force quorum manually.

40 FORCING QUORUM Forcing quorum is a way to manually override and start a node even though it has not achieved quorum. Always understand why quorum was lost before using it. It brings the cluster online without quorum; the cluster starts in a special “forced” state and drops out of that state once majority is achieved. Command line: net start clussvc /fixquorum (or /fq). PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq).

41 MULTI-SITE WITH FILE SHARE WITNESS Placing a file share witness (\\Foo\Share) in a third site (e.g. a branch office) connected over the WAN gives complete resiliency and automatic recovery from the loss of any one site.

42 CHANGES IN SERVICE PACK 1 Node vote weight: granular control over which nodes have votes in determining quorum, giving flexibility for multi-site clusters – prefer the primary site during a network split, so a complete failure of the backup site will not bring down the cluster. Command line: cluster.exe . node <NodeName> /prop NodeWeight=0. PowerShell (R2 SP1): (Get-ClusterNode "NodeName").NodeWeight = 0.
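
For example, removing the votes from the backup-site nodes of slide 43 (B3 and B4 are the hypothetical node names used there) could look like this sketch.

Import-Module FailoverClusters
# Remove the quorum vote from the backup-site nodes (requires 2008 R2 SP1)
(Get-ClusterNode "B3").NodeWeight = 0
(Get-ClusterNode "B4").NodeWeight = 0
# Confirm which nodes still carry a vote
Get-ClusterNode | Format-Table Name, NodeWeight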

43 CHANGES IN SERVICE PACK 1 Prevent Quorum: the administrator started the backup site with the /ForceQuorum option. If the primary site is later restarted normally, N1 & N2 overwrite the authoritative cluster configuration and the changes made by B3 & B4 are lost. If the primary site is instead started with the Prevent Quorum option (recommended), the quorum override is avoided, the changes made by B3 & B4 are maintained, and N1 & N2 gracefully join the existing membership.
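
As a hedged sketch of that recovery sequence using the node names from the slide: /fq is the force-quorum switch shown on slide 40, and /pq is assumed here as the Prevent Quorum counterpart introduced with SP1 – verify the exact switch names for your build.

# On a backup-site node (B3), bring the cluster up without quorum after the disaster
net start clussvc /fq
# Later, on the restarted primary-site nodes (N1, N2), join without overriding
# the now-authoritative configuration held by the backup site
net start clussvc /pq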

44 QUORUM MODEL RECAP
Node and File Share Majority: even number of nodes; the best availability solution, with the file share witness in a 3rd site.
Node Majority: odd number of nodes; place more nodes in the primary site.
Node and Disk Majority: use as directed by vendor.
No Majority (Disk Only): not recommended; use only as directed by vendor.

45 MULTI-SITE CLUSTERING CONTENT Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx

46 QUESTIONS? After the session, please fill in the questionnaire. The questionnaires will be sent to your e-mail address and will also be available through your profile on the conference web portal, www.ntk.si. By filling it in, you help improve the conference. Thank you!

