Andreas Tsangaris, Chief Technical Officer, PERFORMANCE: Business Continuity and Disaster Recovery
Disclaimer This session may contain product features that are currently under development. This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined. “These features are representative of feature areas under development. Feature commitments are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery.”
Agenda Business Continuity Requirements Minimizing Downtime in the Datacenter Providing Effective Disaster Recovery Summary and Next Steps
Sources of Downtime
Solutions to reduce downtime need to address both planned and unplanned downtime. Eliminating planned downtime can increase system availability by a full order of magnitude.
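The order-of-magnitude claim is easiest to see as arithmetic on annual downtime. The sketch below uses hypothetical downtime figures (80 hours planned, 8 hours unplanned per year), not measured data, and reads "an order of magnitude" as cutting unavailability (the fraction of time the system is down) roughly tenfold.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def availability(downtime_hours: float) -> float:
    """Fraction of a year a system is up, given its annual downtime in hours."""
    return 1.0 - downtime_hours / HOURS_PER_YEAR


# Hypothetical annual downtime figures, for illustration only (not measured data).
planned_hours = 80.0    # maintenance windows, hardware upgrades, patching
unplanned_hours = 8.0   # hardware and OS failures

before = availability(planned_hours + unplanned_hours)
after = availability(unplanned_hours)  # planned downtime eliminated

print(f"availability before: {before:.3%}, after: {after:.3%}")
print(f"unavailability reduced {(1 - before) / (1 - after):.1f}x")
```

With these assumed numbers, removing planned downtime moves the system from roughly two nines to roughly three nines of availability.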
Data Loss and Time to Recover
[Figure: timeline from the last backup (the recovery point, the last point where data is in a usable state) to the moment disaster strikes, and on to the point where systems are recovered (the recovery time). Key questions: how far back? how long to recover?]
Common challenges:
– Data loss of more than 24 hours?
– Recovery time greater than 4 hours?
Goal: flexible recovery point and minimal recovery time
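The two intervals on the timeline can be computed directly for any incident and compared with the 24-hour and 4-hour thresholds above. This is a minimal sketch; the incident timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Thresholds taken from the common challenges listed above.
MAX_DATA_LOSS = timedelta(hours=24)
MAX_RECOVERY = timedelta(hours=4)


def recovery_metrics(last_backup: datetime, disaster: datetime, recovered: datetime):
    """Return (data-loss window, recovery time) for one incident.

    Data loss spans the last usable recovery point up to the disaster ("how far back");
    recovery time spans the disaster until systems are recovered ("how long to recover").
    """
    return disaster - last_backup, recovered - disaster


# Hypothetical incident: nightly backup at 01:00, failure at 15:00 the next day,
# systems back in service at 21:00.
loss, outage = recovery_metrics(
    last_backup=datetime(2009, 3, 1, 1, 0),
    disaster=datetime(2009, 3, 2, 15, 0),
    recovered=datetime(2009, 3, 2, 21, 0),
)
print("data loss:", loss, "exceeds target" if loss > MAX_DATA_LOSS else "within target")
print("recovery time:", outage, "exceeds target" if outage > MAX_RECOVERY else "within target")
```

In this made-up case a once-a-day backup already breaks both thresholds, which is the situation the slide describes.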
Requirements for Business Continuity Solutions
– Ensure minimum interruption time
– Protection across operating systems and applications
– Independent of physical infrastructure
– Protection against a broad spectrum of downtime causes
ESX and ESXi: Serious Availability
Proven by customers
– Over 100,000 customers
– Over seven years of maturation
– Over 85% of customers using it for production workloads
– Years of continuous uptime at customer sites
Reliable by design
– ESXi: 32 MB on disk
– Less code means fewer bugs, fewer patches, etc.
– No dependence on an OS or arbitrary drivers
2008 Editor's Choice Awards, Most Reliable category: 1. VMware ESX, 2. IBM mainframe
Agenda Business Continuity Requirements Minimizing Downtime in the Datacenter – Protection against failures – Eliminating planned downtime Providing Effective Disaster Recovery Summary and Next Steps
Transforming Availability Service Levels
[Figure: hardware failure tolerance plotted against application coverage (0% to 100%), moving from unprotected, to automated restart with VMware HA, to continuous with VMware FT]
Fast Recovery from Hardware and Software Failures…
[Diagram: VMware Infrastructure pooling interconnect, CPU, memory, and storage resources beneath application server, Exchange, and file/print workloads]
– Storage VMotion
– HA (High Availability)
– VMotion, DRS
– Update Manager
High Availability vServices: Recover from Unplanned Downtime
VMware High Availability protects all servers and applications against both component failures and complete system failures.
– Only one click to configure!
VMware HA Enhancements
Automatic restart of virtual machines in case of physical server failures
– 32-node clusters
– Additional isolation addresses
– Configurable failure detection time
– VMs are now restarted on the hosts with the most available resources
– Proactive cluster configuration checks
– VM failure monitoring (experimental): monitors virtual machines for guest OS failures and automatically restarts a VM after a specified interval
Simple, cost-effective availability for any workload; minimizes unplanned downtime due to hardware and OS failures
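The restart behaviour described above ("restarted on hosts with most resources") can be illustrated with a simple greedy placement. This is a hypothetical model, not VMware HA's actual algorithm: the `Host`/`VM` types, the priority values, and ranking hosts by free memory and then free CPU are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical restart priorities; "disabled" means HA does not restart the VM.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Host:
    name: str
    free_cpu_mhz: int
    free_mem_mb: int
    vms: List[str] = field(default_factory=list)


@dataclass
class VM:
    name: str
    cpu_mhz: int
    mem_mb: int
    restart_priority: str = "medium"  # "high", "medium", "low" or "disabled"


def restart_failed_vms(failed_vms: List[VM], surviving_hosts: List[Host]) -> List[VM]:
    """Restart VMs from a failed host, highest priority first, each on the surviving
    host with the most free resources that can still fit it. Returns unplaced VMs."""
    unplaced = []
    candidates = [vm for vm in failed_vms if vm.restart_priority != "disabled"]
    for vm in sorted(candidates, key=lambda v: PRIORITY_ORDER[v.restart_priority]):
        fits = [h for h in surviving_hosts
                if h.free_cpu_mhz >= vm.cpu_mhz and h.free_mem_mb >= vm.mem_mb]
        if not fits:
            unplaced.append(vm)
            continue
        # Assumed notion of "most resources": free memory first, then free CPU.
        target = max(fits, key=lambda h: (h.free_mem_mb, h.free_cpu_mhz))
        target.free_cpu_mhz -= vm.cpu_mhz
        target.free_mem_mb -= vm.mem_mb
        target.vms.append(vm.name)
    return unplaced


hosts = [Host("esx01", 4000, 8192), Host("esx02", 6000, 16384)]
failed = [VM("mail", 2000, 8192, "high"), VM("web", 1000, 4096),
          VM("test", 500, 1024, "disabled")]
unplaced = restart_failed_vms(failed, hosts)
```

Here the high-priority `mail` VM lands on the emptier `esx02`, `web` then goes to `esx01`, and `test` is skipped because its restart is disabled.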
High Availability vServices: Virtual Machine Monitoring
– Set at the cluster level and applies to all VMs in the cluster
– Can be disabled for an individual VM using its "Restart priority" setting
– Uses the VMware Tools heartbeat
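Heartbeat-based monitoring comes down to tracking when each VM was last heard from and flagging the ones that have been silent for longer than the configured failure interval. The sketch below is hypothetical: `HeartbeatMonitor`, its methods, and the injectable clock (used so the example is deterministic) are illustrative names, not part of VMware Tools or the HA API.

```python
import time
from typing import Callable, Dict, List, Set


class HeartbeatMonitor:
    """Flags VMs whose guest heartbeat has been silent for longer than failure_interval seconds."""

    def __init__(self, failure_interval: float, clock: Callable[[], float] = time.monotonic):
        self.failure_interval = failure_interval
        self.clock = clock
        self.last_seen: Dict[str, float] = {}
        self.excluded: Set[str] = set()  # per-VM opt-out, modelled on the per-VM override

    def heartbeat(self, vm: str) -> None:
        self.last_seen[vm] = self.clock()

    def exclude(self, vm: str) -> None:
        self.excluded.add(vm)

    def failed_vms(self) -> List[str]:
        now = self.clock()
        return sorted(vm for vm, seen in self.last_seen.items()
                      if vm not in self.excluded and now - seen > self.failure_interval)


# Deterministic example using a fake clock instead of real time.
fake_now = [0.0]
monitor = HeartbeatMonitor(failure_interval=30.0, clock=lambda: fake_now[0])
for vm in ("db01", "web01", "lab01"):
    monitor.heartbeat(vm)
monitor.exclude("lab01")
fake_now[0] = 20.0
monitor.heartbeat("web01")
fake_now[0] = 45.0
print(monitor.failed_vms())  # ['db01']
```

In a real cluster, a VM flagged this way would then be restarted. The sketch stops at detection.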
Proactively Avoid Planned Downtime: VMotion
Eliminating Downtime for Storage Changes
Examples
– Redistributing load
– Optimizing storage configuration
– Storage refresh (e.g. moving LUNs A1 and A2 on off-lease Array A to LUNs B1 and B2 on new Array B)
Storage VMotion
– Online migration of virtual machine disks to a new datastore
– Zero downtime for applications and users
Summary of Availability Functionality
Planned: Update Manager, maintenance mode, VMotion, Storage VMotion, snapshots
Unplanned: network port trunking, HA, Site Recovery Manager, VCB / snapshots
New Solutions for Reduced Downtime
[Diagram: applications and operating systems running in VMs on ESX Server, backed by shared storage]
– Fault Tolerance: zero downtime, zero data loss continuous availability
– Data Recovery: integrated backup and recovery appliance
vCenter Data Recovery
Quick, simple and complete data protection for your VMs
– Agent-less, disk-based backup and recovery of your VMs
– VM-level or file-level restore
– Incremental backups and data de-duplication to save disk space
– Centralized management through VirtualCenter
– Cost-effective storage management
Backup: 1. Schedule backups via VirtualCenter; 2. Snapshots are taken; 3. Data is de-duplicated and stored on de-duplicated storage.
Restore: 1. A VM goes down; 2. Select the VM images or files to recover; 3. Restore, and the VM is running in seconds.
Copyright © 2005 VMware, Inc. All rights reserved.
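Block-level de-duplication is what makes repeated incremental backups cheap: each block is stored once, keyed by a hash of its contents, and every backup keeps only a list of block hashes. The following is a generic sketch of that technique under an assumed fixed 4 KB block size. It does not describe how vCenter Data Recovery actually lays out its de-duplicated store.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Content-addressed block store: identical blocks are kept only once."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}         # SHA-256 digest -> block contents
        self.backups: Dict[str, List[str]] = {}    # backup name -> ordered block digests

    def backup(self, name: str, data: bytes) -> int:
        """Store a backup image and return how many new bytes had to be written."""
        recipe, new_bytes = [], 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_bytes += len(block)
            recipe.append(digest)
        self.backups[name] = recipe
        return new_bytes

    def restore(self, name: str) -> bytes:
        """Rebuild a backup image from its block recipe."""
        return b"".join(self.blocks[d] for d in self.backups[name])


store = DedupStore()
day1 = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
day2 = day1[:-BLOCK_SIZE] + b"C" * BLOCK_SIZE  # only the last block changed
first = store.backup("vm1-day1", day1)   # repeated "A" blocks are stored once
second = store.backup("vm1-day2", day2)  # only the changed block is new
```

The second backup writes a single block even though it describes a full image, and either image can still be restored in full.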
Futures: VMware Fault Tolerance
Application protection against hardware failures with no downtime, independent of the application and operating system.
Agenda Business Continuity Requirements Minimizing Downtime in the Datacenter Providing Effective Disaster Recovery Summary and Next Steps
Virtual Datacenter OS from VMware
VMware Infrastructure evolves into the Virtual Datacenter OS, supporting Windows, Linux, J2EE, .NET, grid, SaaS, and Web 2.0 applications
– Application vServices: availability, security, scalability
– Infrastructure vServices: vCompute, vStorage, vNetwork
– Cloud vServices
– vCenter application management and infrastructure management: Site Recovery Manager, Lifecycle Manager, ConfigControl, Orchestrator, Capacity IQ, Chargeback
Failure to Meet Recovery Requirements
– Complex recovery processes and infrastructure
– Dependent on perfect training, documentation, and execution
– Recovery takes days to weeks
– Recovery tests often fail
– Significant IT time and resources consumed
Key Features of Virtualization for DR
– Hardware independence
– Encapsulation
– Partitioning and consolidation
– Resource pooling
Automate the Failover of an Entire Datacenter
Site Recovery Manager transforms disaster recovery for VMware Infrastructure, from the production site to the recovery site.
Site Recovery Manager Simplifies and Automates DR
Ensure that disaster recovery is rapid, reliable, and manageable
Setup
– Allocates recovery resources
– Integrates with replication
– Helps build recovery plans
Testing
– Creates an isolated test environment
– Automates tests of recovery plans
– Cleans up after tests are completed
Failover
– Allocates resources for recovery
– Prepares storage for recovery
– Automates the recovery process
Site Recovery Manager Use Cases
Target scenarios
– Restart of tens or hundreds of VMs in another datacenter
– Restart can be unplanned (disaster) or planned (migration)
– Can tolerate an RTO of minutes to hours
Requirements
– Second site running VirtualCenter and ESX
– Replicated Fibre Channel or iSCSI LUNs from supported storage vendors
SRM is not
– A replication product
– Geo-clustering for applications in VMs
So what does it look like?
[Diagram: a protected site and a recovery site, each running VirtualCenter and Site Recovery Manager, with array replication copying datastore groups from the protected site to the recovery site. When the protected site goes offline and its VMs become unavailable, the protected VMs are powered on at the recovery site.]
Supports bi-directional site protection
Disaster Recovery Setup
Integrate with replication
– Identify which virtual machines are protected by the replication configuration
Map recovery resources
– Network resources, server resources, management objects
Create recovery plans
– For virtual machines, applications, business units
– Convert manual runbook to pre-programmed response
– Customizable with scripting and callouts
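Turning a manual runbook into a pre-programmed response means encoding the start-up order and custom steps as data that can be executed, and re-executed, in exactly the same way every time. The `RecoveryPlan` class below is a hypothetical model of that idea (priority groups plus scripted callouts). It is not the SRM API, and the VM names and callout are invented.

```python
from typing import Callable, Dict, List


class RecoveryPlan:
    """Ordered, repeatable replacement for a manual DR runbook (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.priority_groups: Dict[int, List[str]] = {}
        self.callouts: Dict[int, List[Callable[[], str]]] = {}  # run after a group is started

    def add_vms(self, priority: int, *vms: str) -> None:
        self.priority_groups.setdefault(priority, []).extend(vms)

    def add_callout(self, after_priority: int, script: Callable[[], str]) -> None:
        self.callouts.setdefault(after_priority, []).append(script)

    def run(self, power_on: Callable[[str], None]) -> List[str]:
        """Power on VMs group by group (lowest number first), running callouts in between."""
        log = []
        for priority in sorted(set(self.priority_groups) | set(self.callouts)):
            for vm in self.priority_groups.get(priority, []):
                power_on(vm)
                log.append(f"P{priority}: powered on {vm}")
            for script in self.callouts.get(priority, []):
                log.append(f"P{priority}: callout -> {script()}")
        return log


plan = RecoveryPlan("Finance applications")
plan.add_vms(1, "dc01", "sql01")
plan.add_vms(2, "app01")
plan.add_callout(1, lambda: "verify database is online")  # hypothetical custom script
powered: List[str] = []
log = plan.run(power_on=powered.append)
```

Keeping the plan as data is what lets the same sequence serve for both test runs and real failovers.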
Failover Automation
Detect site failures
– Raise an alert when the heartbeat is lost
Initiate failover
– User confirmation of the outage
– Granular failover initiation
Manage replication failover
– Break replication
– Make the replica visible to recovery hosts
Execute the recovery process
– Use a pre-programmed plan
– Provide visibility into progress
Manage networking
– Put VMs on the right VLAN
– Change IP addresses
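The sequence above, and especially the networking step, can be sketched as follows. The site-specific VLAN and subnet mappings, the VM record format, and the `fail_over` helper are hypothetical. The re-addressing keeps each VM's host part and moves it into an equally sized recovery-site subnet using Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Dict, List


def remap_ip(ip: str, subnet_map: Dict[str, str]) -> str:
    """Move an address into the matching recovery-site subnet, keeping its host part."""
    addr = ipaddress.ip_address(ip)
    for prod_net, dr_net in subnet_map.items():
        prod = ipaddress.ip_network(prod_net)
        if addr in prod:
            dr = ipaddress.ip_network(dr_net)
            if dr.prefixlen != prod.prefixlen:
                raise ValueError(f"{prod_net} and {dr_net} differ in size")
            offset = int(addr) - int(prod.network_address)
            return str(dr.network_address + offset)
    raise LookupError(f"no recovery mapping for {ip}")


def fail_over(vms: List[dict], vlan_map: Dict[int, int],
              subnet_map: Dict[str, str], confirmed: bool) -> List[str]:
    """Return the ordered failover steps; refuses to run without user confirmation."""
    if not confirmed:
        raise RuntimeError("failover requires user confirmation of the outage")
    steps = ["break replication", "present replica LUNs to recovery hosts"]
    for vm in vms:
        new_vlan = vlan_map[vm["vlan"]]
        new_ip = remap_ip(vm["ip"], subnet_map)
        steps.append(f"power on {vm['name']} on VLAN {new_vlan} with IP {new_ip}")
    return steps


subnets = {"10.1.0.0/16": "10.2.0.0/16"}  # hypothetical production -> recovery subnets
vlans = {101: 201}                         # hypothetical production -> recovery VLANs
vms = [{"name": "sql01", "vlan": 101, "ip": "10.1.4.20"}]
steps = fail_over(vms, vlans, subnets, confirmed=True)
```

The explicit confirmation flag reflects the slide's point that failover is initiated by a person confirming the outage, not triggered automatically when the heartbeat is lost.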
Testing
Replication management
– Snapshot replicated LUNs before the test
– Delete snapshots of replicated LUNs after the test
Network management
– Move all virtual machines to a test port group before powering them on
Customization/extensibility
– Same breakpoints and callouts as the failover sequence
– Extra breakpoints and callouts around the test bubble
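The key property of a test bubble is that cleanup always happens, even if the test fails partway through. A context manager captures this well. In the sketch below, `FakeArray` stands in for a storage array, and `isolated_test_run` and the port-group name `srm-test-bubble` are hypothetical names. None of this is SRM's API.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Set


class FakeArray:
    """Stand-in for a storage array that records snapshot operations (hypothetical)."""

    def __init__(self) -> None:
        self.snapshots: Set[str] = set()

    def snapshot(self, lun: str) -> str:
        snap = f"{lun}-testsnap"
        self.snapshots.add(snap)
        return snap

    def delete(self, snap: str) -> None:
        self.snapshots.discard(snap)


@contextmanager
def isolated_test_run(array: FakeArray, luns: List[str], vms: List[str],
                      test_port_group: str = "srm-test-bubble") -> Iterator[Dict[str, str]]:
    # Test against snapshots so replication of the real LUNs keeps running.
    snaps = [array.snapshot(lun) for lun in luns]
    # Attach every VM to an isolated port group before it is powered on.
    network = {vm: test_port_group for vm in vms}
    try:
        yield network
    finally:
        # Clean up even if the test fails partway through.
        for snap in snaps:
            array.delete(snap)


array = FakeArray()
with isolated_test_run(array, ["lun1", "lun2"], ["sql01", "app01"]) as net:
    during = set(array.snapshots)
    assignments = dict(net)
after = set(array.snapshots)
```

Because cleanup sits in `finally`, a failed test still removes its snapshots, so the test can be run repeatedly.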
Failback
Set up DR protection from the DR site back to the primary site
– Failover leaves the VMs running at the DR site
– Provide the failed-over VMs with protection
– Same setup as was done for the initial protection
Work with storage to reverse replication
Test failback
– Test repeatedly, using the same mechanism as test failover
– Only set the failback date after the plan is perfect
Fail back to the primary site
– Just hit the failover button: failback is failover in the reverse direction
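"Failback is failover in the reverse direction" can be expressed as a role swap: reprotection reverses which site is protected and which is the recovery site, and the same failover routine then brings the VMs home. This is a minimal, hypothetical model. The site names and the `SitePair`/`fail_over` names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SitePair:
    """Direction of protection: replication flows from the protected site to the recovery site."""
    protected: str
    recovery: str

    def reverse(self) -> "SitePair":
        # Reprotect: swap roles so replication runs back toward the original site.
        return SitePair(protected=self.recovery, recovery=self.protected)


def fail_over(pair: SitePair) -> str:
    """Run the recovery plan for this pair; returns the site where the VMs now run."""
    return pair.recovery


original = SitePair(protected="primary-site", recovery="dr-site")
running_at = fail_over(original)      # disaster: VMs come up at the DR site
reprotected = original.reverse()      # protect the failed-over VMs
back_home = fail_over(reprotected)    # failback reuses the same failover routine
```

Because failback reuses the failover path, it can be tested the same way as a failover before the failback date is set.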
SRM Benefit Summary
1. Accelerate recovery
2. Ensure reliable recovery
3. Simplify planning and recovery
4. Expand disaster recovery protection
5. Reduce cost
6. Enable compliance
Agenda Business Continuity Requirements Minimizing Downtime in the Datacenter Providing Effective Disaster Recovery Summary and Next Steps
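VMware Infrastructure: The Safest Place To Run Applications
Columns: prevent planned outages; minimize downtime from unplanned outages; prevent unplanned outages
– Component: NIC teaming, multipathing
– Server: DRS, maintenance mode, VMotion (prevent planned outages); HA (minimize downtime from unplanned outages); Fault Tolerance (prevent unplanned outages)
– Storage: Storage VMotion (prevent planned outages); VCB + backup ISV products, Data Recovery (minimize downtime from unplanned outages)
– Data: N/A (prevent planned outages); VCB + backup ISV products, Data Recovery (minimize downtime from unplanned outages)
– Site: Site Recovery Manager (minimize downtime from unplanned outages)
All available across physical hardware, operating systems, and applications
Virtualisation enables new and easier ways of BC/DR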
Next Steps
Learn more
– Read more about VMware Business Continuity Solutions at
– Find more business continuity customer case studies at
Start your evaluation
– VMware and partners can help you evaluate VMware software