1
EMC PROTECTION FOR MICROSOFT EXCHANGE SERVER 2010
EMC VNX5700, EMC RecoverPoint, EMC Replication Manager, EMC Data Protection Advisor, VMware vSphere, and VMware vCenter Site Recovery Manager
Fully automated recovery for Microsoft Exchange Server 2010
Unlimited snapshots to protect from logical database corruption
In-depth monitoring and reporting
EMC Solutions Group, Strategic Solutions Engineering
2
Solution Offering and Options
Full end-to-end protection for an Exchange 2010 environment in a virtualized infrastructure
- Local application protection with Exchange deployed in a DAG
- Local continuous data protection (CDP) with RecoverPoint (unlimited snapshots)
- Remote continuous replication (CRR) with RecoverPoint for DR
- Automated failover and failback with vCenter SRM 5
- Database protection and instant restore with Replication Manager
- In-depth monitoring and reporting with DPA
- Flexible deployment options

White paper for this solution is published here:
3
Exchange 2010 DR Options

Replication method: Cross-site (stretch) DAG (Exchange Continuous Replication)
Description: Built in to Exchange 2010 for HA and DR
Reference: Co-branded white paper, Business Continuity for Microsoft Exchange 2010 Enabled by EMC Unified Storage, Cisco UCS and Microsoft Hyper-V

Replication method: Database portability (RecoverPoint)
Description: Only the Exchange data is replicated; the users are re-homed at the second site to a different server
Reference: White paper, Exchange 2010 Disaster Recovery Options with Cross-Site Exchange DAG and EMC RecoverPoint

Replication method: Replication Enabler for Exchange (REE)
Description: Synchronous solution today; integrated with the Exchange third-party replication API; replaces native DAG replication
Reference: Co-branded white paper, Zero Data Loss Disaster Recovery for Microsoft Exchange 2010 Enabled by EMC Unified Storage, EMC Replication Enabler for Exchange 2010 and Microsoft Hyper-V

Replication method: Server/site move (SRDF, RecoverPoint)
Description: Both the OS and Exchange data are replicated; failover includes bringing the servers up, changing IP addresses, and updating DNS (automated with SRM 5)
Reference: White papers, SRDF Solution: Virtualized Exchange 2010 DR Enabled by EMC Symmetrix VMAX and VMware vCenter SRM, and RecoverPoint Solution: EMC Protection for Microsoft Exchange Server 2010

EMC offers many options to protect Microsoft Exchange. We typically see four approaches used. Each method has its benefits and unique considerations. This solution enables an entire site move by using RecoverPoint replication with automation provided by VMware SRM.
4
Solution Architecture
- Two active-active sites
- 10,000 Exchange 2010 users per site (20,000 total)
- 1 DAG per site with 2 local copies
- 4 Mailbox server VMs per DAG (each VM configured to run 5,000 users during local switchovers: 2,500 active/2,500 passive)
- Exchange user profile: 2 GB mailbox per user, 0.15 IOPS per user

This solution virtualizes Exchange 2010 on VMware vSphere 5 and leverages EMC, Microsoft, and VMware best practices and guidelines for deploying Exchange in virtual configurations. This diagram illustrates the Exchange Server 2010 environment deployed for this solution. The environment represents two active sites with four vSphere hosts at each site. Each site provides enough performance and capacity to support up to 20,000 active users (10,000 users during normal operations and another 10,000 users during a disaster recovery event).

For this solution, the Active Directory and DNS services at each site are provided outside of the virtual infrastructure. The Active Directory is configured to meet all of the requirements for this Exchange deployment. Redundant Active Directory servers, as well as DNS servers, are provisioned at both sites. Data protection for both sites is provided by RecoverPoint, with Replication Manager initiating application-consistent (VSS) copies for Exchange databases and logs. In the event of a site-specific disaster, or if you want to move the servers or test disaster recovery, vCenter SRM is used to coordinate an entire Exchange server failover from one site to another site and back again.

In an active/active site configuration, such as the one this solution represents, both sites (or all sites, if there are more than two) are active production sites that are locally protected and also globally protected by every other site. In other words, each site can be referred to as both a protected site and a recovery site.
With this in mind, the subsequent sections refer to a pair of protected/recovery sites, but each site can reverse roles with the other, depending on where the disaster event occurs. Further, in a configuration with more than two active sites, every site is both a protected site and a potential recovery site for any peer site affected by a disaster. Note: Even though this solution demonstrates the protection of an active/active site configuration, the solution can also be used to protect active/passive site configurations.
5
Site Protection with RecoverPoint and Automated Failover with SRM
[Diagram: Site A and Site B connected by a WAN link; RecoverPoint replicates data between the sites, with failover automated by SRM.]
6
Parameters Used to Test and Validate the Solution
Local DAG with RecoverPoint CRR environment

Number of sites: Two, active-active
Number of users per site: 10,000 (20,000 total during site failover)
Exchange 2010 user profile: 150 messages sent/received/day (0.15 IOPS), 2 GB mailbox
Database read/write ratio: 3:2 in Mailbox Resiliency configuration
Test environment: 4 Mailbox servers per DAG (local site); 3 copies (2 local with DAG and 1 remote with RecoverPoint)
Connectivity between sites: 1 Gbps Ethernet
Storage connectivity to physical host: FC
Storage design:
- 1 VNX5700 storage array per site
- 2 TB NL-SAS (7.2k rpm) disks for Exchange databases and logs (RDM/P)
- 2 TB NL-SAS (7.2k rpm) disks for virtual machine OS (VMFS)
- 600 GB SAS (10k rpm) disks for RecoverPoint journals
vSphere host design:
- 4 Intel-based servers with 4 six-core Xeon X GHz (24 CPUs) and 128 GB RAM; 2 servers per site, each supporting 4 Exchange Mailbox role virtual machines
- 4 Intel-based servers with 4 six-core Xeon X GHz (16 CPUs) and 64 GB RAM; 2 servers per site, each supporting 4 Exchange Client Access and Hub Transport role virtual machines
Virtualization platform: vSphere 5.0
Data protection:
- Daily full backups (hardware VSS)
- 24 application-consistent snapshots per day (24 hours)
- Unlimited crash-consistent snapshots per day (24 hours)
7
ESXi Host & VM Design

Exchange 2010 role, vCPU, and memory:
- Mailbox Server: 6 vCPUs, 32 GB
- Client Access and Hub Transport combo: 4 vCPUs, 16 GB

To satisfy the requirement to continue to support all 20,000 Exchange users in the event of a disaster at either site, four vSphere servers are deployed at each site. Two vSphere servers with 32 CPUs and 128 GB of RAM support eight Mailbox Server role virtual machines (four virtual machines per vSphere server). Another two vSphere servers with 16 CPUs and 128 GB of RAM support eight Exchange Client Access and Hub Transport role virtual machines (again, four virtual machines per vSphere server).

NOTE: CPU and memory requirements are based on Microsoft guidelines. See for additional details.
8
Mailbox Server Role Design
Each Mailbox server is configured with 10 databases, five active and five passive. All databases are balanced and distributed between Mailbox servers within the DAG and between vSphere hosts to eliminate a single point of failure. This design eliminates downtime during vSphere host or Mailbox server virtual machine maintenance. This figure presents a screenshot from the Exchange Management Console showing the database configuration for one Mailbox server. It shows that ten databases are deployed on Mailbox server MBX1, where five of these databases are active and the other five are passive.
9
Exchange 2010 Mailbox Server VM Design
[Diagram: ESX Server 1 hosts MBX1 and MBX2, ESX Server 2 hosts MBX3 and MBX4; databases DB1-DB20 are distributed across the four Mailbox servers, with active and passive copies physically isolated.]

Design attributes:
- 5,000 users per server during local switchovers (2,500 active/2,500 passive)
- 10 databases per server
- 500 users per database
- 1.6 TB database LUNs
- 100 GB log LUNs
- VNX pools
- Physical isolation of DAG copies

The solution design eliminates downtime during vSphere host or Mailbox server virtual machine maintenance. Failure domains protected against: Mailbox server, hypervisor host, storage, and network.
10
Mailbox Server Building Block Design
Per Exchange Mailbox server virtual machine:
- Users: 5,000
- Disks: 20 (16 + 4): 16 x 2 TB NL-SAS disks in RAID 1/0 for databases, and 4 x 2 TB NL-SAS disks in RAID 1/0 for logs
- vCPUs: 6
- Memory: 32 GB

For this solution, the Exchange Mailbox server virtual machine design is based on a building block for 5,000 users. The storage, CPU, and memory resources for this building block are derived from the customer requirements specified for this solution.
11
Storage Design Details for a 5,000-User Mailbox Server Building Block
Number of users: 5,000 (in a switchover condition)
User profile: 150 messages sent/received (0.15 IOPS), 2 GB mailbox
Disk type: 2 TB NL-SAS, 7.2k rpm
Storage configuration/RAID type: 20 disks per building block for Exchange data: 16 disks in a RAID 1/0 storage pool for databases; 4 disks in a RAID 1/0 RAID group for logs
Databases per server/users per database: 10 databases per server, 500 users per database
LUNs/LUN sizes: 21 LUNs per Mailbox server virtual machine: 10 database LUNs (1.6 TB each, RDM/P); 10 log LUNs (100 GB each, RDM/P); 1 LUN for virtual machine OS (200 GB, VMFS)
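The building-block numbers above can be checked with a quick back-of-envelope sketch. The figures (users, IOPS profile, mailbox limit, LUN sizes) come from the table; the calculation itself is illustrative and not part of the validated solution.

```python
# Back-of-envelope check of the 5,000-user mailbox building block.
# All input figures come from the table above.

USERS = 5000
IOPS_PER_USER = 0.15          # "very heavy" 150 messages/day profile
MAILBOX_GB = 2

# IOPS generated by one building block's mailbox workload
host_iops = USERS * IOPS_PER_USER            # 750

# Databases: 10 per server, 500 users each
dbs_per_server = 10
users_per_db = USERS // dbs_per_server       # 500

# Raw capacity needed for mailboxes alone (ignoring DB overhead and index)
mailbox_capacity_tb = USERS * MAILBOX_GB / 1024   # ~9.8 TB

# Database LUN layout: 10 x 1.6 TB LUNs per building block
db_lun_capacity_tb = dbs_per_server * 1.6    # 16 TB provisioned

print(host_iops, users_per_db, round(mailbox_capacity_tb, 1))
```

The 16 TB of database LUN capacity comfortably exceeds the roughly 10 TB of raw mailbox data, leaving room for content index and database overhead.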
12
ESX/VMs Storage Design
For this solution, one homogeneous RAID 1/0 storage pool with 40 NL-SAS disks is deployed for each vSphere host hosting Exchange Mailbox Server role virtual machines. This configuration supports databases for two Mailbox Server role virtual machines on each vSphere host. It provides just over TB of usable storage and meets the IOPS and total mailbox capacity requirements for two building blocks, each with 5,000 very heavy users and a 2 GB mailbox limit.

To support log capacity requirements, four 2 TB NL-SAS drives are deployed in a RAID 1/0 (2+2) RAID group configuration for each vSphere host with two Exchange Mailbox Server role virtual machines.

From each database storage pool, 20 x 1.6 TB database LUNs are configured to accommodate a 1 TB database and index. The LUNs are distributed among the Exchange Mailbox server virtual machines on the same vSphere host and are configured as RDM/P volumes. The same approach is used for log LUNs: 20 x 100 GB LUNs are created from each RAID group, distributed among the Mailbox server virtual machines on the same vSphere host, and configured as RDM/P volumes.

Note: Logs and databases in this solution are separated on different disks to satisfy the protection requirements.

At each site, 200 GB virtual machine operating system (OS) volumes are configured on the same VNX5700 storage array with 2 TB NL-SAS drives configured as RAID 1 (2+2) RAID groups. Each OS LUN is formatted as a VMFS volume on the vSphere host.
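The per-host pool capacity can be sanity-checked with a short sketch. Halving raw capacity for RAID 1/0 is the standard mirroring assumption; VNX pool metadata overhead is ignored here, so treat the result as approximate.

```python
# Sanity check of the per-host database pool described above.
# 40 x 2 TB NL-SAS disks in RAID 1/0: usable capacity is roughly half
# of raw (mirroring), before VNX pool overhead.

disks = 40
disk_tb = 2.0
raw_tb = disks * disk_tb            # 80 TB raw
usable_tb = raw_tb / 2              # ~40 TB usable in RAID 1/0

# Two 5,000-user building blocks share the pool: 20 x 1.6 TB database LUNs
db_luns = 20
lun_tb = 1.6
provisioned_tb = db_luns * lun_tb   # 32 TB provisioned

assert provisioned_tb < usable_tb   # the layout fits with headroom
print(usable_tb, provisioned_tb)
```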
13
RecoverPoint Design

In this solution, EMC RecoverPoint provides protection from Exchange logical database corruption and DR protection in the case of a site failure. Native Exchange replication does not protect from logical corruption: corruption can potentially replicate to other Exchange database copies within the DAG, both locally and remotely. RecoverPoint also provides near-instantaneous point-in-time recovery with minimal data loss.

In this solution, RecoverPoint is configured as follows:
- The replication method is Concurrent Local and Remote Replication (CLR), which combines both CDP and CRR technologies and provides point-in-time recovery from either CDP or CRR copies.
- Each Mailbox server virtual machine is placed into its own consistency group (CG) with all replicated database and log LUNs, including the LUN containing the guest OS. This configuration ensures full consistency across all volumes used by the Mailbox Server role virtual machines.
- Three journals are set up for each consistency group. Two journals reside at the protected site to support the protected volumes and their local replicas, and one journal resides at the recovery site to support the remote replica.

Note: RecoverPoint CRR alone can provide protection from logical database corruption, but CRR can replicate only a single DAG copy.
With CRR, replicated Exchange databases can be mounted at the recovery site on any Exchange DAG member server, but the databases then require seeding to provide local native high availability for Exchange at the recovery site.
14
Consistency Group (CG) Design for Active/Active Data Centers
[Diagram: Site A hosts MBX1-MBX4 and Site B hosts MBX5-MBX8; each Mailbox server has its own consistency group (CG) with a local CDP copy and a remote CRR copy at the peer site.]

- Ensures consistency across production and targets: one Exchange Mailbox server per CG
- Application recovery can be managed independently, with per-group SLAs
- Enables independent replication of various applications
- Utilizes consistency groups to organize data

RecoverPoint has a concept of consistency groups (CGs). RecoverPoint consistency groups enable the user to associate the LUNs or volumes in their replication sets together so that all operations work consistently across all the replicated LUNs or volumes. For example, for Exchange CGs, if the production volume fails, the user can recover the latest point-in-time (PIT) image from the remote CRR copy and use that to rebuild the production volume.

Another use for consistency groups is applying different service levels to different applications. One application might be the most critical, so the policy for its consistency group might be set to minimize lag and compression and to prioritize its bandwidth usage above the other groups. The user can ensure that applications are protected independently by creating a consistency group for each application and then managing each consistency group independently of other groups, such as performing local and/or remote recovery or starting and stopping individual consistency groups. In addition, the user can set a specific SLA or policy optimizations, such as application importance, usage of RecoverPoint resources, or recommended recovery point objectives, on a per-consistency-group basis.
15
RecoverPoint Consistency Group Configuration
One consistency group per Mailbox server VM
21 replication sets per consistency group

For this solution, a RecoverPoint consistency group is created for each replicated Mailbox server virtual machine. Separating replicated virtual machines into multiple consistency groups makes it possible to plan for different disaster recovery scenarios. For example, each Exchange Mailbox server virtual machine can be failed over separately from the other virtual machines, or the entire protected environment can be failed over in a single process.
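The layout above (one CG per Mailbox server VM, 21 replication sets each) can be sketched as a small, purely illustrative data model. The `mailbox_cg` helper and the LUN names are hypothetical and are not RecoverPoint API calls.

```python
# Illustrative model of the consistency-group layout described above:
# one CG per Mailbox server VM, each with 21 replication sets
# (10 database LUNs + 10 log LUNs + 1 guest-OS LUN).

def mailbox_cg(vm_name, db_count=10):
    """Build the replication-set list for one Mailbox server VM's CG."""
    sets = [f"{vm_name}-DB{i}" for i in range(1, db_count + 1)]
    sets += [f"{vm_name}-LOG{i}" for i in range(1, db_count + 1)]
    sets.append(f"{vm_name}-OS")  # the guest OS LUN replicates with the CG
    return {"name": f"{vm_name} CG", "replication_sets": sets}

# Four Mailbox servers per site, one CG each; the four CGs are later
# combined into a single group set for parallel bookmarks (15-minute RPO).
site_a_cgs = [mailbox_cg(f"MBX{i}") for i in range(1, 5)]

assert all(len(cg["replication_sets"]) == 21 for cg in site_a_cgs)
print([cg["name"] for cg in site_a_cgs])
```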
16
RecoverPoint Group Sets
A group set is a set of consistency groups to which the system applies parallel bookmarks at a user-defined frequency.
- Allows for automatic, periodic bookmarking
- Provides the same RPO for multiple consistency groups distributed across multiple RPAs
- Useful for consistency groups that depend on each other or must work together as a single unit

The four Mailbox servers in a DAG at each site are combined into one group set to meet the 15-minute RPO requirement.
17
RecoverPoint CG Policy Settings for SRM
When RecoverPoint is deployed with VMware vCenter SRM, RecoverPoint must be configured to enable automatic failover capability. This configuration is accomplished by using the “Group is managed by SRM, RecoverPoint can only monitor” policy setting in the RecoverPoint Management Console.
18
RecoverPoint Journal Sizing
RecoverPoint journal volumes store consistent point-in-time snapshots (or bookmarks) to enable point-in-time recovery of Exchange data. These bookmarks allow DVR-like rollbacks of the data to different points in time. The journal size dictates how much protection you have (how many snapshots you can keep).

For each consistency group in the RecoverPoint CLR configuration, three journals are set up: two at the local site to support the protected volumes and their local replicas, and one at the remote site to support the remote replica. The production journals, one for each consistency group, contain system delta-marking information; production journals do not contain snapshots for point-in-time recovery.

The two most important considerations when sizing storage for journal volumes are performance and the protection window. For optimal journal performance, ensure that you choose the appropriate RAID and drive types. The solution, as validated, uses 600 GB 10k rpm SAS drives deployed in RAID 1/0 RAID groups for journal volumes. The protection window depends on the change rate of the application and on RPO requirements (how far back Exchange administrators want to go for point-in-time recovery). Both of these factors determine the required journal size.
19
RecoverPoint Journal Sizing (cont.)
Use the following formula to calculate journal volume size based on the expected change rate and required rollback time period:

Journal size = change rate x rollback period (in seconds) / 0.8 x 1.05

For example, to support a 3-day (72-hour) rollback requirement (259,200 seconds), with 5 Mbps of new data writes to the replication volumes in a consistency group, the calculation is as follows:

Journal size = 5 Mbps x 259,200 seconds / 0.8 x 1.05 = 1,701,000 Mb (~213 GB)

In this calculation, 259,200 seconds represents the 72-hour rollback window, and dividing by 0.8 accounts for the 20 percent of reserved journal space. As a general rule, when change rates are not available, EMC recommends making the journal size at least 20 percent of the size of the data being replicated.
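The sizing rule above can be captured as a small helper; `journal_size_mb` is an illustrative name, the 0.8 divisor is the 20 percent journal reserve from the example, and the 1.05 factor follows the worked example as a 5 percent overhead allowance.

```python
# Journal sizing sketch following the example above:
#   journal size = change rate x rollback window / 0.8 x 1.05
# Dividing by 0.8 reserves 20% of journal space; 1.05 adds 5% overhead.

def journal_size_mb(change_rate_mbps, rollback_hours):
    """Return the required journal size in megabits (Mb)."""
    seconds = rollback_hours * 3600
    return change_rate_mbps * seconds / 0.8 * 1.05

size_mb = journal_size_mb(5, 72)       # 5 Mbps change rate, 3-day window
size_gigabytes = size_mb / 8 / 1000    # megabits -> megabytes -> gigabytes

print(size_mb, round(size_gigabytes))  # prints: 1701000.0 213
```

This reproduces the ~213 GB figure from the example for a 5 Mbps change rate and a 72-hour protection window.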
20
RecoverPoint Journal Sizing (cont.)
Journals deployed in this solution
21
RecoverPoint Storage Design Summary
Volume type, allocation at Sites A and B, and comments:
- Repository: 3 GB. One volume per storage array/RPA cluster per site.
- Journal: 40 x 100 GB and 32 x 5 GB. Journal LUNs configured from RAID 1/0 groups. 500 GB journal per consistency group: 5 x 100 GB volumes for each local and remote copy. 20 GB journal per consistency group: 4 x 5 GB volumes for each production copy.
- Replication: 80 x 1.6 TB database volumes and 80 x 100 GB log volumes, from RAID 1/0 pools. Replication volume sizes must be equal to or larger than production volumes.

This table summarizes the RecoverPoint storage requirements for this solution. For this RecoverPoint CLR configuration, 500 GB of journal space is allocated for each local and remote copy of each consistency group. (In this solution, each consistency group contains all LUNs for one Mailbox server.) The 500 GB journal is composed of five 100 GB journal volumes; deploying smaller volumes enables journal space to be increased seamlessly in smaller increments. For best performance, it is a best practice to create and grow journals by using volumes that are identical in size, because RecoverPoint re-stripes the volumes during expansion. Production journals, which do not contain snapshots for point-in-time recovery, are allocated 20 GB of storage, composed of four 5 GB journal volumes.
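The journal allocations in the table can be cross-checked with simple arithmetic. This sketch only verifies the multiplication; how individual journals map to sites and copies is described in the text above.

```python
# Cross-check of the journal storage summary above.
# The table allocates 500 GB (5 x 100 GB) per replica journal and
# 20 GB (4 x 5 GB) per production journal.

replica_journal_gb = 5 * 100       # 500 GB per local/remote copy journal
production_journal_gb = 4 * 5      # 20 GB per production journal

total_replica_gb = 40 * 100        # 4000 GB provisioned (40 x 100 GB)
total_production_gb = 32 * 5       # 160 GB provisioned (32 x 5 GB)

# Consistent with eight replica journals and eight production journals
assert total_replica_gb // replica_journal_gb == 8
assert total_production_gb // production_journal_gb == 8
print(total_replica_gb, total_production_gb)
```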
22
RecoverPoint Integration with Replication Manager
RM leverages RecoverPoint replication technology to automate and schedule VSS copies of Exchange, both locally and remotely. RM automates Exchange 2010 tasks such as:
- Creating Exchange database application sets
- Creating and managing multiple-site, application-consistent replicas of Exchange databases
- Running a full Exchange job to create a VSS bookmark, run the Microsoft Exchange utility eseutil, and then truncate the Exchange logs
- Running an Exchange copy job that creates a VSS bookmark
- Starting an on-demand mount and dismount of an Exchange database
- Restoring Exchange databases from either local or remote bookmarks to the production Exchange server in seconds

Replication Manager's built-in application intelligence and automation enables you to leverage RecoverPoint's replication technology to schedule and create VSS copies of Exchange, both locally and remotely. Replication Manager freezes the production Exchange database, requests a VSS copy, flushes Exchange Server's cache to disk, and creates a VSS bookmark for RecoverPoint and Replication Manager. Exchange is then thawed, all within the required VSS window of time. Once thawed, production resumes. Meanwhile, the RecoverPoint VSS bookmark is captured in the local CDP journal as well as the remote CRR journal.

Microsoft Exchange Server supports the VSS API that enables RecoverPoint to integrate with Exchange. RecoverPoint provides a utility called KVSS that invokes the Microsoft VSS framework to prepare a VSS image. For full details on using the KVSS utility to protect Exchange, refer to Replicating Microsoft Exchange Server with EMC RecoverPoint – Technical Notes.

In this solution, Replication Manager provides an additional level of automation by coordinating application-consistent snapshots. Replication Manager's GUI makes it easy to automate the restore of Exchange to an application-consistent point in time.
Replication Manager handles the restore locally by using the RecoverPoint CDP replica or, if necessary, RM can restore back to production by using the RecoverPoint CRR Exchange replica.
23
How this solution uses RM
RM automates RecoverPoint application-consistent replicas and performs Exchange log truncation:
- 24 application-consistent (VSS) snapshots in a 24-hour cycle
- In addition to the hourly application-consistent snapshots initiated by RM, the RecoverPoint journal maintains many more crash-consistent snapshots from which Exchange databases can easily be recovered
- An RM application set is created for each Exchange Mailbox server virtual machine and includes all ten databases configured on the server
- This design aligns with the RecoverPoint CG design, where each consistency group contains all of the Exchange databases and logs for a specific Mailbox server

The number of crash-consistent snapshots in the journal depends on the size of the journal and the application change rate. The Validation section of the white paper provides information about RecoverPoint journal usage under Exchange load generated with Loadgen.
24
Replication Manager Design
RM application set configuration:
- Created for each Mailbox VM
- Includes 10 DBs per Mailbox server
- Design aligned with RecoverPoint

For this solution, an RM application set is created for each Exchange Mailbox server virtual machine. The application set includes all ten databases configured on the server. This design aligns with the RecoverPoint consistency group design, where each consistency group contains all of the Exchange databases and logs for a specific Mailbox server.
25
Special Environment Variable for RM
In this solution, each RecoverPoint consistency group is configured with an additional replication set for the Mailbox server OS LUN that resides on a VMware VMFS datastore volume. This configuration creates a potential conflict between Replication Manager and RecoverPoint, because the Microsoft VSS framework does not support VMFS datastores.

To resolve this conflict, the following environment variable is created on each Exchange Mailbox server virtual machine with the Replication Manager agent installed:

ERM_RP_EXTRA_DEVS_ALLOWED=1

With this environment setting in place, RecoverPoint continues to provide crash-consistent replicas of all LUNs attached to the Exchange Mailbox server virtual machine (including the OS VMFS volume), while Replication Manager continues to provide application-consistent replicas of all Exchange RDM volumes.
26
RM Replication options
In this solution, RM jobs are configured with two Replica Type options to protect the Exchange Server 2010 databases: Online Full and Online Copy. Replication Manager uses the VSS framework to initiate both of these replica types.
- Online Full jobs are set up for daily backups.
- Online Copy jobs are set up to run at hourly intervals during business hours to provide application-consistent recovery (24 snapshots per 24 hours).

This picture shows the Replication Manager Online Full job configuration settings to perform a daily backup of Mailbox server MBX1. Advanced replication settings enable Replication Manager to verify database states before performing this job.
27
RecoverPoint + RM = Integrated Local/Remote Protection
EMC Replication Manager Technical Review, September 2009

[Diagram: Replication Manager server, Exchange production server, and Exchange mount host on a SAN; RecoverPoint replicates the production LUN to a local CDP copy and, over the WAN, to a remote CRR copy, with the VSS bookmark captured in both the local CDP journal and the remote CRR journal.]

Note to presenter: Show this slide in presentation mode and step through the animation.

This slide shows Replication Manager's integration with RecoverPoint. With Replication Manager's built-in application intelligence, you can schedule a VSS copy of Exchange (or SQL Server) both locally and remotely using RecoverPoint Concurrent Local and Remote (CLR) replication technology. Replication Manager freezes the production Exchange database, requests a VSS copy, flushes the cache to disk, and creates the VSS bookmark. Exchange is then thawed, all within the required VSS window of time (less than 10 seconds). Once thawed, production resumes. Meanwhile, the VSS bookmark is captured in the local CDP journal as well as the remote CRR journal.

With Replication Manager's easy-to-use point-and-click interface, you can restore Exchange to an application-consistent point in time locally with the RecoverPoint CDP replica or, if necessary, restore back to production with the RecoverPoint CRR Exchange replica.

Animation steps: the Replication Manager server freezes the application, requests a VSS-compliant bookmark, and then thaws the application.
28
Site Recovery Manager Core Capabilities
Centralized management for DR
- Create, test, update, and execute recovery plans from a single point of management
- Seamless integration with Virtual Center

Disaster recovery automation
- Build the recovery process in advance
- Automate testing of recovery plans
- Automate execution of the recovery process

Simplified setup and integration
- Allocate and manage recovery resources
- Easy integration with leading vendors' storage replication systems

[Diagram: Site Recovery Manager and Virtual Center managing virtual machines across ESX servers, physical servers, and storage.]
29
Key SRM Components

[Diagram: Protected site and recovery site; each site has a vCenter server with its database (VC 1 DB, VC 2 DB), an SRM server with its database (SRM 1 DB, SRM 2 DB), and a Storage Replication Adapter in front of its array (Array 1, Array 2); RecoverPoint replication links the two arrays.]
30
vCenter SRM Requirements
SRM requirements for the vSphere configuration at each site:
- Each site must include a vCenter server containing at least one vSphere data center.
- The recovery site must support array-based replication with the protected site, and it must have hardware and network resources that can support the same virtual machines and workloads as the protected site.
- At least one virtual machine must be located on a datastore that is replicated by RecoverPoint at the protected site.
- The protected and recovery sites must be connected by a reliable IP network. Storage arrays may have additional network requirements.
- The recovery site must have access to the same public and private networks as the protected site, although it does not necessarily require access to the same range of network addresses.
- VMware Tools must be installed on all virtual machines.

When designing a disaster recovery solution, it is essential that all software components are configured and working correctly. Careful planning and coordination of the IT personnel responsible for these activities is also essential for successful recovery plan execution during a planned or unplanned failover. This ensures the shortest RTO and compliance with SLAs.
31
SRM Configuration Steps
Installing and configuring VMware vCenter Site Recovery Manager (SRM) for automated site recovery using RecoverPoint involves these steps:
1. Install and configure SRM with the RecoverPoint SRA adapter.
2. Configure the connection between the protected and recovery sites.
3. Configure RecoverPoint array managers.
4. Configure resource inventory mappings.
5. Configure protection groups.
6. Create the recovery plan.
7. Customize virtual machine recovery options, including IP customization.

Most of these steps are executed from the SRM interface in vCenter, where intelligent wizards support quick and easy configuration.
32
SRM Protection Groups
An SRM protection group is a collection of virtual machines that all use the same datastore group (the same set of replicated LUNs) and that all fail over together. To create a protection group, use the Create Protection Group wizard: select the datastore groups to protect, and specify a datastore at the recovery site where SRM can create placeholders for members of the protection group. The wizard automatically detects the datastores currently protected by RecoverPoint and lets you select the ones to include in the protection group. Eight protection groups were created for this solution, one for each Exchange Mailbox server virtual machine.
33
SRM Protection Groups (cont.)
After a protection group is created, shadow virtual machines are automatically created in the recovery site vCenter inventory. They act as placeholders for the virtual machines that can be failed over with SRM. The shadow virtual machines cannot be started independently of SRM and are removed if their protection group is deleted.
34
SRM Configuration — Recovery Plans
When designing a disaster recovery solution, it is essential that all software components are configured and working correctly. Careful planning and coordination of the IT personnel responsible for these activities is also essential for successful recovery plan execution during a planned or unplanned failover. This ensures the shortest RTO and compliance with SLAs. As shown in the figure, ten recovery plans were created for this solution: two to fail over the entire environment from each site, and the other eight to fail over individual Exchange Mailbox server virtual machines independently. The figure also shows the SRM option for creating a recovery plan that contains individual protection groups from one of the sites.
35
SRM 5 Features for Multi-subnet Data Centers
Recovery and reprotect options; IP customization; power-on priority and dependencies. The recovery options for individual virtual machines can be customized to satisfy specific requirements. The available customization options constitute a powerful tool for ensuring that complex recovery plans can be implemented with ease. For example, if there are multiple virtual machines in a protection group, their startup sequence can be customized by assigning them different recovery priorities. Options include the ability to change virtual machine IP addresses so that they point to specific subnets at the recovery site. Options can also be set to run specific pre- or post-processing scripts before or after the execution of a recovery plan. To ensure that the entire Exchange DAG at the protected site successfully fails over to the recovery site, the IP address of each Mailbox server virtual machine at the recovery site is specified. The original IP addresses of the protected virtual machines are also specified to ensure successful and seamless failback. The figure shows the VM customization wizard for one Exchange Mailbox server virtual machine, where the IP addresses and DNS information are specified.
36
Changing the DAG IP address at the Recovery Site
Set-DatabaseAvailabilityGroup -Identity <DAG Name> -DatabaseAvailabilityGroupIPAddresses <new DAG IP address at recovery site>
It is essential to ensure that the Exchange DAG IP address is updated in DNS before the first Mailbox server member of the DAG is powered on at the recovery site. If DNS is not updated, the underlying Windows Failover Cluster DAG resource name is not brought online. The DAG node that owns the "Cluster Group" acts as the Primary Active Manager, the DAG node responsible for choosing which databases are activated in a failover. For more information on the Primary Active Manager, refer to Microsoft TechNet. To change the DAG IP address, run the Exchange PowerShell command above on the SRM server, where <DAG Name> is the name of the Exchange DAG from Site A (in this solution it is <DAG-Site-A>), and the IP address is one from the recovery site subnet, which is different from the protected site subnet. Note: Exchange Management Tools must be installed on the SRM server in order to run Exchange Management Shell cmdlets. To simplify the DAG IP address change process during SRM failover, an additional step that calls an Exchange PowerShell script can be added to a recovery plan. The figure shows a step that issues a PowerShell command to call a batch file that changes the DAG IP address.
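The recovery-plan step described above can be sketched as a small script. This is a minimal sketch, not the solution's actual script: the file name (ChangeDagIp.ps1), the DAG name, and the IP address are placeholders, and it assumes the Exchange 2010 Management Tools are installed on the SRM server.

```powershell
# ChangeDagIp.ps1 -- hypothetical helper invoked from an SRM recovery plan step.
# Loads the Exchange 2010 management snap-in, then re-points the DAG IP address
# to the recovery-site subnet before the first DAG member is powered on.
Add-PSSnapin Microsoft.Exchange.Management.PowerShell.E2010

$dagName    = "DAG-Site-A"   # DAG name from the protected site (placeholder)
$recoveryIp = "10.20.30.40"  # recovery-site subnet address (placeholder)

Set-DatabaseAvailabilityGroup -Identity $dagName `
    -DatabaseAvailabilityGroupIPAddresses $recoveryIp
```

A recovery plan step would then call this script (for example, via a batch file that launches powershell.exe with the script path), as described above.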
37
Testing an SRM Recovery Plan
Recovery plans can be tested at any time without interrupting replication or the business's RPO. Accurate time measurement of recovery plan testing enables administrators to predict how long recovery of the virtual environment is likely to take. Administrators can quickly access a replica of the entire protected environment for testing purposes, and automatically generated summary reports can be used to demonstrate compliance with regulatory requirements. Having a disaster recovery plan in place is the first step toward ensuring that you are prepared for the worst-case scenario. However, the only way to know whether your disaster recovery plan works is to test it; the ability of SRM to execute recovery plans in test mode allows you to do this.
SRM recovery plan               Time elapsed (minutes)   RTO met?
Single Mailbox server           17                       Yes
Four Mailbox servers in a DAG   25
38
Testing an SRM Recovery Plan (cont.)
39
SRM Site Failover Test Results
Failover of all Mailbox servers from Site A to Site B. Results include the DNS name change and IP changes.
SRM recovery plan               Time elapsed (minutes)   RTO met?
Four Mailbox servers in a DAG   38                       Yes
40
Site Failback with SRM
SRM 5 provides automatic failback functionality, and RecoverPoint SRA version 2.0 integrates with SRM 5. This functionality enables you to perform an automated failback operation after a planned or unplanned failover. A "reprotect" process automatically reverses replication and synchronizes any changes from the recovery site back to the protected site. The figure shows a screenshot of the SRM Management Console, from which you can initiate a failback operation. vCenter SRM, integrated with the RecoverPoint SRA plug-in, automatically initiates storage replication. In the case of an unplanned failover following an actual disaster, if all storage was lost at the stricken site (the site required a total rebuild), it is likely that a very large amount of data will need to be replicated from the recovery site back to the newly rebuilt, original protected site. This process is time consuming and must be planned carefully. Although the SRM automatic reprotect process can initiate resynchronization of the data and configuration settings, you can instead exercise greater control over the resynchronization by initiating replication from the RecoverPoint or Unisphere management console. Note: For the purposes of validating this solution, the protected data was not destroyed following the failover, so the failback and reprotect operation required RecoverPoint to replicate only minimal changes to the data. The SRM failback process for the entire Exchange DAG with two copies took just over 30 minutes.
41
SRM Failback Process
1. Verify all network communications between the recovery site and the original protected site.
2. Verify connectivity between the vCenter servers.
3. Ensure all resource mappings are in place.
4. Ensure no changes need to be made to SRM protection groups or protected virtual machines.
5. Modify the recovery plan to update any custom scripts. For example, update Exchange DAG IP addresses that have changed, and update CAS array IP addresses if the CAS array was previously replicated.
6. Perform the failback either incrementally or automatically. To perform failback incrementally, specifying which data to replicate and in what sequence, use the RecoverPoint or Unisphere management console to replicate data and resynchronize configuration settings from the recovery site back to the originally protected site. Alternatively, to have SRM perform the failback automatically, initiate "Reprotect" from the SRM management interface.
7. When the failback process completes, ensure that all Microsoft Exchange services are started and the databases are mounted. Verify mail flow and user access.
Note: For the purposes of validating this solution, Exchange CAS/HUB servers were not replicated. Instead, the CAS/HUB servers deployed at each site provided enough resources to support users from the other site following a failover event.
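The verification in the final step can be scripted. A minimal sketch using standard Exchange 2010 Management Shell cmdlets, run on a recovered Mailbox server, might look like the following (this is illustrative, not part of the validated solution):

```powershell
# Post-failback health check sketch -- run from the Exchange Management Shell
# on a recovered Mailbox server.

# Flag any Exchange services that are not running.
Get-Service MSExchange* | Where-Object { $_.Status -ne "Running" }

# List databases with their mount status and active server.
Get-MailboxDatabase -Status | Format-Table Name, Mounted, Server -AutoSize

# Verify DAG replication health for this server.
Test-ReplicationHealth

# Basic MAPI connectivity test for every active database on this server.
Test-MapiConnectivity -Server $env:COMPUTERNAME
```

Empty output from the first command and Mounted = True for every database indicate a healthy failback; user access and mail flow should still be spot-checked manually.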
42
Recovery with Replication Manager
43
RecoverPoint Journal Usage With Exchange Server 2010 Under Load
The numbers presented in this table are intended to serve only as an approximate guideline for sizing RecoverPoint journals for Exchange. It is recommended to size journals based on actual change rates in your production environment. The results presented in the table are based on the following:
- LoadGen workload for 5,000 users per server with 150 sent/received messages per user
- Hourly application-consistent (VSS) bookmarks for a RecoverPoint consistency group containing all of the LUNs for a single Mailbox server with 10 databases
- Automatic crash-consistent bookmarks for the same consistency group
- Nightly scheduled backup job initiated by Replication Manager
Test timeline        Journal usage (GB)   Journal usage per user (MB)   Journal usage (%)*
Start                0.20                 ----                          0.05
4 hrs (daytime)      113                  23                            30.1
8 hrs (daytime)      174                  36                            46.4
12 hrs (nighttime)   184                  38                            49.0
16 hrs (nighttime)   228                  47                            60.8
20 hrs (daytime)     279                  57                            74.4
24 hrs (daytime)     345                  71                            92.0
*Percentages are based on 375 GB of usable journal space (500 GB minus the 25 percent reserved by RecoverPoint) provisioned for each consistency group and 13.6 TB of replicated Exchange data (80 percent of 17 TB provisioned storage).
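The utilization percentages in the table follow from simple arithmetic on the figures quoted above; a quick sketch using this solution's values:

```powershell
# Journal sizing arithmetic from the table above (values from this solution).
$provisionedGB = 500
$usableGB = $provisionedGB * (1 - 0.25)   # RecoverPoint reserves 25% -> 375 GB

# 24-hour observed journal usage for one consistency group:
$usage24hGB = 345
$utilization = $usage24hGB / $usableGB    # 345 / 375 = 0.92, the 92.0% in the table

# Approximate per-user daily change for 5,000 users:
$perUserMB = $usage24hGB * 1024 / 5000    # ~71 MB per user per day
```

The same calculation against a measured daily change rate in your own environment gives a first-pass journal size for a desired protection window.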
44
DPA Reporting — RP Journal Utilization
45
Summary
VMware vCenter Site Recovery Manager and EMC RecoverPoint make Exchange 2010 disaster recovery:
Rapid — automates the disaster recovery process and eliminates the complexities of traditional recovery
Reliable — ensures proper execution of recovery plans and enables easier, more frequent testing
Manageable — recovery plans are centrally managed and can be kept dynamic to match the environment
Affordable — utilizes recovery site infrastructure and reduces management costs
The white paper for this solution is published here: servr2010.pdf