6/23/2018 10:29 PM © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Availability Strategies for a Resilient Private Cloud
Elden Christensen, Principal Program Manager Lead, Microsoft
WS-B302
Sources of Downtime
Data corruption, component failure, application failure, human error, maintenance, and site outage.
Agenda
Understand the technologies in Windows Server 2012 that can reduce downtime for a private cloud deployment, covering planned downtime, unplanned downtime, and disaster recovery.
Hyper-V Availability Suite
Planned downtime:
Live Migration moves a running VM from one host to another with zero perceived downtime.
Storage Migration moves a VHD from one storage location to another with zero perceived downtime.
Unplanned downtime:
Failover Clustering monitors the health of servers and VMs and, in the event of a failure, restarts and recovers them on the same server or on another one.
Disaster recovery:
Hyper-V Replica replicates VMs to another server in another location, protecting against the loss of a site.
Multi-site clusters stretch a cluster across sites using hardware or software replication.
Core OS Resiliency Features
Data corruption:
Storage Spaces provides software fault tolerance that makes storage resilient to disk failures.
Chkdsk repairs data corruption when it occurs, with vastly improved performance in Windows Server 2012.
Component failure:
NIC Teaming provides resiliency to a network card failure.
Storage Multipath I/O (MPIO) provides resiliency to an HBA failure.
Application failure:
Guest Clustering provides application health monitoring and mobility.
VM Monitoring provides application health monitoring for applications that are not cluster-aware.
Planned Downtime
Live Migration
Moves a running VM between hosts with no user-perceived downtime. The client is not aware that the VM moved to another server: open TCP connections to the guest OS are maintained, so clients stay connected. This makes it possible to drain a server for planned maintenance with zero downtime.
Note: ping is a poor tool for evaluating a live migration. ICMP works at the IP layer and does not retransmit, so an echo lost during the brief switchover looks like an outage. TCP retransmits, and it is TCP that makes a live migration seamless.
Complete VM Mobility Across the Datacenter
Live migrate a VM and its storage between clusters, to a cluster, or to a stand-alone server. You can move a VM anywhere in your datacenter with zero downtime!
Live Migration - Initiate Migration
While a client is accessing the VM, the IT admin initiates a live migration to move the VM from one host to another physical machine.
Live Migration - Memory Setup Copy
The VM is pre-staged on the new server and its memory content is copied over. The first, initial copy includes all in-memory content.
Live Migration - Brownout: Copy Dirty Pages
The client continues to access the VM, which modifies (dirties) memory pages.
Live Migration - Brownout: Incremental Copy
Hyper-V tracks changed data and recopies the incremental changes. Each subsequent pass is faster because the set of changes is smaller.
Live Migration - Blackout
The VM is paused and the partition state is copied. This window is very small and falls within the TCP connection timeout, so open connections survive it.
Live Migration - Post-Transition: Cleanup
The client is directed to the new host: an ARP is issued so that routing devices update their tables. Because session state is maintained, no reconnection is necessary. The old VM is deleted once the migration is verified as successful.
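The brownout and blackout phases above can be sketched as a toy model. This is an illustrative assumption, not Hyper-V's implementation: it assumes constant bandwidth, a uniform page-dirtying rate, and hypothetical `max_passes` and `blackout_threshold_mb` stop conditions. It shows why each pass shrinks when bandwidth exceeds the dirty rate, and why pre-copy fails to converge when it does not.

```python
def simulate_precopy(memory_mb, bandwidth_mb_s, dirty_rate_mb_s,
                     max_passes=10, blackout_threshold_mb=64):
    """Toy iterative pre-copy model; returns (passes, total_seconds, blackout_mb)."""
    to_copy = float(memory_mb)  # pass 1 copies all in-memory content
    total_s = 0.0
    passes = 0
    # Brownout: keep recopying dirtied pages while the dirty set is still large.
    while passes < max_passes and to_copy > blackout_threshold_mb:
        pass_s = to_copy / bandwidth_mb_s
        total_s += pass_s
        passes += 1
        # Pages dirtied while this pass ran must be sent again (never more than all of RAM).
        to_copy = min(float(memory_mb), dirty_rate_mb_s * pass_s)
    # Blackout: the VM is paused and the remaining dirty pages are copied.
    total_s += to_copy / bandwidth_mb_s
    return passes, total_s, to_copy


# 4 GB VM, 1000 MB/s link, 100 MB/s dirty rate: converges in 2 passes,
# leaving about 41 MB to copy during the blackout.
passes, seconds, blackout_mb = simulate_precopy(4096, 1000, 100)
```

If the dirty rate exceeds the bandwidth (for example `simulate_precopy(4096, 100, 200)`), the dirty set never shrinks, the loop stops only at `max_passes`, and the whole of memory is left for the blackout.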
Simultaneous Live Migrations
Windows Server 2012 supports performing multiple live migrations in parallel, with no fixed limit on how many. The default configuration allows 2 simultaneous live migrations per host. Wield this power wisely: an excessive number of simultaneous migrations may actually take longer overall than performing them serially, as the next slide shows.
Parallel Live Migrations
[Chart: Parallel Live Migrations, time in minutes versus number of simultaneous live migrations. Disclaimer: results will vary based on hardware.]
Key takeaway: excessively increasing the number of simultaneous live migrations may do more harm than good. The recommended value with modern hardware is 4.
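A toy model can suggest why more parallelism is not always faster. The sketch reuses the same illustrative pre-copy model and adds two assumptions of its own: concurrent migrations split one link's bandwidth evenly, and a single migration cannot exceed a hypothetical per-migration throughput cap. All numbers are made up. With them, total time bottoms out at 4 simultaneous migrations, and 16 in parallel is slower than running serially. Real results depend on hardware.

```python
import math


def migration_seconds(memory_mb, bandwidth_mb_s, dirty_rate_mb_s,
                      max_passes=10, blackout_threshold_mb=64):
    # Toy pre-copy: full copy, recopy dirtied pages until small, then blackout.
    to_copy, total_s, passes = float(memory_mb), 0.0, 0
    while passes < max_passes and to_copy > blackout_threshold_mb:
        pass_s = to_copy / bandwidth_mb_s
        total_s += pass_s
        passes += 1
        to_copy = min(float(memory_mb), dirty_rate_mb_s * pass_s)
    return total_s + to_copy / bandwidth_mb_s


def batch_total_seconds(num_vms, simultaneous, link_mb_s, per_migration_cap_mb_s,
                        memory_mb, dirty_rate_mb_s):
    # Active migrations share the link evenly, but one migration alone
    # cannot exceed its (assumed) throughput cap.
    per_vm_bw = min(per_migration_cap_mb_s, link_mb_s / simultaneous)
    # VMs are migrated in waves of `simultaneous` at a time.
    waves = math.ceil(num_vms / simultaneous)
    return waves * migration_seconds(memory_mb, per_vm_bw, dirty_rate_mb_s)


# 16 VMs of 4 GB each, 1000 MB/s link, 300 MB/s per-migration cap, 50 MB/s dirty rate.
times = {k: batch_total_seconds(16, k, 1000, 300, 4096, 50) for k in (1, 2, 4, 8, 16)}
```

The effect comes from the dirty rate: as each migration's share of bandwidth approaches the rate at which its VM dirties memory, more pre-copy passes are needed and more data is resent.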
Dynamic Optimization
A feature in SCVMM 2012 that rebalances VMs across hosts using live migration. It keeps the cluster balanced, avoids VM downtime, and supports heterogeneous clusters.
Managed resources: considers CPU, memory, disk I/O, and network I/O; optimizes when a host is above a resource threshold; considers the entire cluster.
Options: manual or automatic; user-controlled frequency; configurable aggressiveness.
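The rebalancing idea can be sketched as a greedy loop. This is a hypothetical illustration, not SCVMM's algorithm. It uses a single load metric (the real feature weighs CPU, memory, disk I/O, and network I/O), and the `threshold` parameter stands in loosely for the configurable aggressiveness.

```python
def rebalance(hosts, threshold):
    """Greedy toy rebalancer; returns a list of (vm, source_host, target_host) moves.

    `hosts` maps host name -> {vm name: load}, where load is a percentage of
    one host's capacity. The dict is updated in place to reflect the moves.
    """
    load = {h: sum(vms.values()) for h, vms in hosts.items()}
    moves = []
    while True:
        src = max(load, key=load.get)  # most loaded host
        if load[src] <= threshold:
            break  # every host is at or under the threshold
        dst = min(load, key=load.get)  # least loaded host
        # Only VMs that fit on the target without pushing it over the threshold.
        fits = [(vm_load, vm) for vm, vm_load in hosts[src].items()
                if vm_load > 0 and load[dst] + vm_load <= threshold]
        if src == dst or not fits:
            break  # nothing safe to move; a real tool might try other hosts
        vm_load, vm = min(fits)  # smallest VM = cheapest live migration
        hosts[dst][vm] = hosts[src].pop(vm)  # simulate the live migration
        load[src] -= vm_load
        load[dst] += vm_load
        moves.append((vm, src, dst))
    return moves


hosts = {"H1": {"vm1": 40, "vm2": 30, "vm3": 20}, "H2": {"vm4": 10}, "H3": {}}
moves = rebalance(hosts, threshold=70)  # moves vm3 from H1 to H3
```

Moving the smallest VM that fits keeps each migration cheap, at the cost of possibly needing more moves; a lower threshold makes the loop more aggressive.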
Power Optimization
A feature in SCVMM 2012 that, when used with Dynamic Optimization, rebalances the workload and turns off machines to conserve energy in the datacenter. It keeps the cluster balanced and avoids VM downtime or latency caused by a lack of resources. It uses out-of-band power management and follows a user-defined schedule.
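Deciding which hosts can be turned off is essentially a bin-packing problem. The sketch below uses first-fit decreasing, a standard heuristic that is not necessarily what SCVMM uses. `plan_power_off` and `capacity` are hypothetical; `capacity` is a per-host load ceiling that leaves headroom so consolidation does not starve VMs of resources.

```python
def plan_power_off(vm_loads, hosts, capacity):
    """First-fit decreasing consolidation; returns (placement, hosts_to_power_off)."""
    placement = {h: [] for h in hosts}
    used = {h: 0 for h in hosts}
    # Place the largest VMs first; each goes on the first host with room.
    for vm, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        for h in hosts:
            if used[h] + load <= capacity:
                placement[h].append(vm)
                used[h] += load
                break
        else:
            raise ValueError(f"{vm} does not fit on any host")
    # Hosts left empty are candidates for out-of-band power-off.
    idle = [h for h in hosts if not placement[h]]
    return placement, idle


placement, idle = plan_power_off(
    {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10}, ["H1", "H2", "H3", "H4"], capacity=80)
```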
Zero Downtime Automated Patching
Cluster-Aware Updating (CAU) streamlines "Patch Tuesday" with zero-downtime patching. The admin initiates Cluster-Aware Updating, and an Update Coordinator updates the nodes in the cluster by coordinating with the Windows Update Agent (WUA). Updates are applied in a rolling fashion, one node at a time, stepping serially through all nodes. The coordinator can itself be clustered, for Self-Updating mode.
Workflow:
1. Scan nodes to identify the appropriate updates needed.
2. Identify the node with the fewest workloads.
3. Drain the node.
4. Call WUA to patch the node (which leverages WSUS or Windows Update).
5. Verify that the update succeeded.
6. Repeat steps 2 through 5 on the next node, and then on all remaining nodes.
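The workflow above can be sketched as a rolling loop. `Node`, `needs_update`, and `install_updates` are hypothetical stand-ins (the real coordinator calls the Windows Update Agent), and draining is simulated by handing VM names round-robin to the other nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    vms: list = field(default_factory=list)
    patched: bool = False


def rolling_update(nodes, needs_update, install_updates):
    """Patch nodes one at a time; returns node names in the order they were patched."""
    if len(nodes) < 2:
        raise ValueError("draining requires at least two nodes")
    remaining = [n for n in nodes if needs_update(n)]  # step 1: scan
    order = []
    while remaining:
        node = min(remaining, key=lambda n: len(n.vms))  # step 2: fewest workloads
        others = [n for n in nodes if n is not node]
        for i, vm in enumerate(node.vms):  # step 3: drain (simulated live migration)
            others[i % len(others)].vms.append(vm)
        node.vms.clear()
        if not install_updates(node):  # step 4: patch
            raise RuntimeError(f"update failed on {node.name}; stopping the run")
        node.patched = True  # step 5: verified
        order.append(node.name)
        remaining = [n for n in remaining if n is not node]  # step 6: next node
    return order


nodes = [Node("N1", ["a", "b", "c"]), Node("N2", ["d"]), Node("N3", ["e", "f"])]
order = rolling_update(nodes, lambda n: True, lambda n: True)
```

Stopping on the first failed update keeps at most one node out of service at a time, which is the point of updating serially.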
Storage Migration
Moves a VHD or VHDX from one storage location to another with zero downtime. Storage migration between hosts without shared storage is performed over the SMB protocol. Storage migration is accelerated by arrays that support Offloaded Data Transfer (ODX). It enables draining a storage array for planned maintenance.
Storage Migration – Initiate Migration
While a client is accessing the VM, a storage migration moves the VM's VHDX to another disk. The VM stays running and continues servicing clients.
Storage Migration – Create Destination VHDX
A new VHDX is created on the destination storage.
Storage Migration – Mirror Writes
Reads come from the source VHDX. Writes go to the source VHDX and are also mirrored synchronously to the new destination VHDX.
Storage Migration – Copy Data
Data is copied from the source VHDX to the destination VHDX. Only unchanged blocks are copied over. ODX accelerates the copy, and SMB is used if the destination storage is not accessible to this server.
Storage Migration – Post-Transition Cleanup
Once all data is synchronized, the VM is switched to the new VHDX, and reads and writes transition to it. The source VHDX is removed only after the VM is verified to be running on the destination VHDX, which enables rollback.
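The mirror-then-copy sequence can be modeled with in-memory block lists. `ToyStorageMigration` is a hypothetical single-threaded illustration, not how Hyper-V implements storage migration. A real implementation must also synchronize the background copy against concurrent writes to the same block.

```python
class ToyStorageMigration:
    """Blocks are list elements; source and destination stand in for the two VHDX files."""

    def __init__(self, source_blocks):
        self.source = list(source_blocks)
        self.dest = [None] * len(self.source)  # new VHDX on the destination storage
        self.mirrored = set()  # blocks already written to dest by mirrored writes
        self.cursor = 0  # position of the background copy
        self.on_dest = False

    def read(self, i):
        # Reads are served from the source until the switch-over.
        return self.dest[i] if self.on_dest else self.source[i]

    def write(self, i, data):
        if self.on_dest:
            self.dest[i] = data
            return
        self.source[i] = data
        self.dest[i] = data  # mirrored synchronously to the destination
        self.mirrored.add(i)

    def copy_next(self):
        """Copy the next block not already mirrored; return False when nothing is left."""
        while self.cursor < len(self.source):
            i = self.cursor
            self.cursor += 1
            if i not in self.mirrored:
                self.dest[i] = self.source[i]
                return True
        return False

    def switch_over(self):
        # Switch only when fully synchronized; the source is kept for rollback.
        if self.dest != self.source:
            raise RuntimeError("destination not synchronized")
        self.on_dest = True

    def cleanup(self):
        # The source is removed only after the VM is running on the destination.
        if not self.on_dest:
            raise RuntimeError("VM is not running on the destination yet")
        self.source = None


m = ToyStorageMigration(["a", "b", "c", "d"])
m.copy_next()      # background copy of block 0
m.write(2, "C")    # mirrored write; block 2 will be skipped by the copy
m.write(0, "A")    # write to an already-copied block is mirrored too
while m.copy_next():
    pass
m.switch_over()
m.cleanup()
```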
Unplanned Downtime
Failover Clustering
Failover Clustering is a distributed system that monitors the health of servers and takes recovery action. It protects against unplanned downtime caused by failures of the hardware, the host OS, the VM, the guest OS, and applications in VMs. Unplanned downtime results in the VM being restarted on another server, and session state is lost.
Failover Cluster Health Monitoring
Extensive health monitoring up and down the stack.
[Diagram: the stack monitored on each node (Node 1 and Node 2), from top to bottom: guest OS, VDev, VMMS, RHS hosting vmclusres.dll, ClusSvc (user mode), and NetFT (kernel mode).]
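Node-level failure detection rests on heartbeats. The sketch below is a simplified, hypothetical model: a node is declared down after a configurable number of heartbeat intervals pass without a heartbeat, and its VMs are restarted on the healthy nodes. This is a restart, not a live migration, so session state is lost. The real cluster exposes its own heartbeat tunables, such as SameSubnetDelay and SameSubnetThreshold; the default values in this sketch are only illustrative.

```python
def failed_nodes(last_heartbeat_s, now_s, delay_s=1.0, threshold=5):
    """Nodes whose last heartbeat is older than `threshold` heartbeat intervals."""
    timeout_s = delay_s * threshold
    return sorted(node for node, seen in last_heartbeat_s.items()
                  if now_s - seen > timeout_s)


def restart_orphaned_vms(vm_owner, failed, healthy):
    """Reassign VMs owned by failed nodes to healthy nodes, round-robin.

    Models a restart: the VM boots again on its new node.
    """
    if not healthy:
        raise RuntimeError("no healthy node available")
    new_owner = dict(vm_owner)
    orphans = sorted(vm for vm, node in vm_owner.items() if node in failed)
    for i, vm in enumerate(orphans):
        new_owner[vm] = healthy[i % len(healthy)]
    return new_owner


down = failed_nodes({"Node1": 100.0, "Node2": 94.0}, now_s=100.5)
owners = restart_orphaned_vms({"vm1": "Node2", "vm2": "Node1", "vm3": "Node2"},
                              set(down), ["Node1"])
```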
Resiliency Delivered by Host Clustering
Host clustering avoids a single point of failure when consolidating workloads:
Survive host crashes: VMs are restarted on another node.
Restart VM crashes: the VM OS is restarted on the same node.
Recover VM hangs.
Zero-downtime maintenance and patching: live migrate VMs to other hosts.
Mobility and load distribution: live migrate VMs to different servers to balance load.
Flexible Storage Choices for Building Clusters
Shared storage: FC, iSCSI, FCoE, SAS RBOD, RAID HBA, Storage Spaces on SAS JBOD, and SMB.
Data replication: hardware replication; software replication, such as a 3rd-party software replication solution or Hyper-V Replica; and application replication, for example Exchange or SQL AlwaysOn.
Demo: Configuring a highly available VM
Cluster Shared Volumes (CSV)
CSV is a clustered file system in Windows Server 2012 that enables all servers in a failover cluster to access a common NTFS volume. It provides a layer of abstraction above NTFS, so applications are completely insulated from which node actually owns a LUN. Applications can fail over without requiring drive ownership changes, and volumes are not dismounted and remounted. The result is faster failover (that is, less downtime) and increased resiliency and availability.
CSV I/O Fault Tolerance
When Node 2 suffers a SAN connectivity failure, its I/O is redirected over the network through the coordination node, so the VM running on Node 2 is unaffected. The VMs can then be live migrated to another node with zero client downtime.
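The redirection decision can be sketched as a routing function. This is a hypothetical simplification: it models only data I/O (ignoring metadata operations, which CSV routes through the coordination node), and `san_connected` is an assumed map of per-node SAN health.

```python
def csv_io_route(node, coordinator, san_connected):
    """Return the hops a data I/O from `node` takes to reach storage."""
    if san_connected.get(node, False):
        return [node, "SAN"]  # direct I/O over the node's own SAN path
    if node != coordinator and san_connected.get(coordinator, False):
        return [node, "network", coordinator, "SAN"]  # redirected I/O
    raise RuntimeError(f"no path to storage from {node}")


health = {"Node1": True, "Node2": False}  # Node2 has lost SAN connectivity
route = csv_io_route("Node2", "Node1", health)
```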
Application Availability with Guest Clustering
Guest clustering means creating a failover cluster inside the virtual machines and failing applications over across VMs. It delivers:
Application health monitoring: if an application within a VM crashes, it is automatically restarted or failed over.
Application mobility: if the guest OS needs patching or the VM needs maintenance, the application is moved to another node of the cluster.
Combining Host & Guest Clustering
Best of both worlds for flexibility and protection: VM high availability and mobility between physical nodes, plus application and service high availability and mobility between VMs. A cluster on a cluster does increase complexity. Mixing physical and virtual nodes is supported, but the guest cluster must pass Validate.
[Diagram: a guest cluster running on top of a host cluster, with storage on an iSCSI or FC SAN.]
VM Monitoring
The host identifies and recovers from service failures in the guest, escalating through three levels:
Application-level recovery: triggered by the Service Control Manager (SCM) or by an event.
Guest-level HA recovery: Failover Clustering gracefully reboots the VM.
Host-level HA recovery: Failover Clustering fails the VM over to another node.
It provides generic health monitoring for any application, either by monitoring services through the Service Control Manager or by watching for the generation of specific event IDs.
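The three recovery levels form an escalation ladder. The sketch maps a running failure count to the next action. The thresholds `service_restarts_allowed` and `reboots_allowed` are hypothetical; in practice, the service's SCM recovery settings and the cluster's own policies determine when each level applies.

```python
from enum import Enum


class Recovery(Enum):
    RESTART_SERVICE = "application level: the Service Control Manager restarts the service"
    REBOOT_VM = "guest level: Failover Clustering gracefully reboots the VM"
    FAIL_OVER_VM = "host level: Failover Clustering fails the VM over to another node"


def next_recovery(failure_count, service_restarts_allowed=2, reboots_allowed=1):
    """Pick the recovery action for the Nth failure of a monitored service (N >= 1)."""
    if failure_count <= service_restarts_allowed:
        return Recovery.RESTART_SERVICE
    if failure_count <= service_restarts_allowed + reboots_allowed:
        return Recovery.REBOOT_VM
    return Recovery.FAIL_OVER_VM


actions = [next_recovery(n) for n in range(1, 5)]
```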
Disaster Recovery
Hyper-V Replica
Hypervisor-level, point-in-time replication of a VM to a remote server, with an RPO of 5 minutes.
[Diagram: a VM's VHDX replicated from Site A to Site B.]
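Point-in-time replication bounds data loss by the replication interval. The sketch assumes that replication cycles start at time 0 and complete instantly, and that writes timestamped exactly at a cycle boundary are included. Real transfers take time, so actual data loss can slightly exceed the interval. The function and parameter names are hypothetical.

```python
def replica_after_site_loss(writes, failure_time_s, interval_s=300):
    """Replay writes into the replica up to the last completed cycle.

    `writes` is a list of (timestamp_s, block, data). Returns (replica, lost_writes).
    """
    last_cycle_s = (failure_time_s // interval_s) * interval_s
    replica, lost = {}, []
    for t, block, data in sorted(writes):
        if t <= last_cycle_s:
            replica[block] = data  # shipped in a completed replication cycle
        else:
            lost.append((t, block, data))  # still pending when Site A was lost
    return replica, lost


writes = [(10, "b1", "x"), (350, "b2", "y"), (620, "b1", "z"), (910, "b3", "w")]
replica, lost = replica_after_site_loss(writes, failure_time_s=1000)
```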
Multi-Site Clustering
A single cluster stretched across Site A and Site B, with automatic and manual failover for disaster recovery. Supports 3rd-party hardware- and software-based replication.
Choosing a Disaster Recovery Solution
Base the choice on the service level agreement and business requirements.

              Hyper-V Replica               Multi-Site Clustering
RPO           5 minutes                     0 minutes*
RTO           Manual (longer)               Automatic (fast)
Cost          In-box in Windows Server      High
Complexity    Low                           High

*Depending on the 3rd-party replication solution.
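The comparison can be encoded as a simple selection rule. `DrOption`, `relative_cost`, and `choose_dr_solution` are hypothetical. The RPO values come from the slide above; the 0-minute RPO for Multi-Site Clustering still depends on the 3rd-party replication solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    name: str
    rpo_minutes: float
    automatic_failover: bool
    relative_cost: int  # illustrative: 0 = in-box, higher = more expensive


OPTIONS = [
    DrOption("Hyper-V Replica", 5, False, 0),
    DrOption("Multi-Site Clustering", 0, True, 2),
]


def choose_dr_solution(max_rpo_minutes, need_automatic_failover):
    """Cheapest option that meets the RPO and failover requirements, or None."""
    fits = [o for o in OPTIONS
            if o.rpo_minutes <= max_rpo_minutes
            and (o.automatic_failover or not need_automatic_failover)]
    return min(fits, key=lambda o: o.relative_cost).name if fits else None
```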
Fault Tolerant Solutions
Fault-tolerant solutions can deliver zero downtime for any unplanned hardware failure. They require special hardware and give higher levels of availability for a broader set of unplanned downtime scenarios. Example: a partnership with Stratus for mission-critical Hyper-V, with systems running Windows in lockstep.
Summary
Windows Server delivers a breadth of features that can increase the resiliency of your private cloud, and Windows Server 2012 delivers many new availability features. Delivering a resilient cloud includes planning for planned downtime, unplanned downtime, and disaster recovery.
We want to hear from you! Evaluation Complete your session evaluations today and enter to win prizes daily. Provide your feedback at a CommNet kiosk or log on at www.2013mms.com. Upon submission you will receive instant notification if you have won a prize. Prize pickup is at the Information Desk located in Attendee Services in the Mandalay Bay Foyer. Entry details can be found on the MMS website.
Resources
Access MMS Online to view session recordings after the event: http://channel9.msdn.com/Events