Published by Marcia Goodwin. Modified over 9 years ago.
2
Scott Schnoll, Principal Technical Writer, Microsoft Corporation
Session Code: UNC307
3
Agenda
- Exchange 2010 High Availability Vision/Goals
- Exchange 2010 High Availability Features
- Exchange 2010 High Availability Deep Dive
- Deploying Exchange 2010 High Availability Features
- Transitioning to Exchange 2010 High Availability
- High Availability Design Examples
5
Exchange 2010 High Availability Vision and Goals
Vision: Deliver a fast, easy-to-deploy and operate, economical solution that can provide messaging service continuity for all customers
Goals:
- Deliver a native solution for high availability/site resilience
- Enable less expensive and less complex storage
- Simplify administration and reduce support costs
- Increase end-to-end availability
- Support Exchange Server 2010 Online
- Support large mailboxes at low cost
6
Exchange Server 2003
[Diagram labels: Outlook; OWA, ActiveSync, or Outlook Anywhere; Front End Server; San Jose cluster with NodeA (active) and NodeB (passive); Dallas standby cluster; databases DB1 through DB6]
- Clustered Mailbox Server had to be created manually
- Failover at Mailbox server level
- Clustering knowledge required
- Third-party data replication needed for site resilience
- Complex site resilience and recovery
7
Exchange Server 2007
[Diagram labels: Outlook; OWA, ActiveSync, or Outlook Anywhere; Client Access Server; San Jose CCR cluster with NodeA (active) and NodeB (passive); SCR to a Dallas standby cluster; databases DB1 through DB6]
- Clustered Mailbox Server can't co-exist with other roles
- Failover at Mailbox server level
- Clustering knowledge required
- No GUI to manage SCR
- Complex activation for remote server / datacenter
8
Exchange Server 2010
[Diagram labels: Client Access Server; Mailbox Servers 1 through 6 across San Jose and Dallas; copies of databases DB1 through DB5]
- All clients connect via CAS servers
- Database level failover
- Failover managed by/with Exchange
- Easy to extend across sites
10
Exchange 2010 High Availability Terminology
- High Availability: Solution must provide data availability, service availability, and automatic recovery from failures
- Disaster Recovery: Process used to manually recover from a failure
- Site Resilience: Disaster recovery solution used for recovery from site failure
- *over: Short for switchover/failover; a switchover is a manual activation of one or more databases; a failover is an automatic activation of one or more databases after a failure
11
Exchange 2010 High Availability Feature Names
- Mailbox Resiliency: Name of Unified High Availability and Site Resilience Solution
- Database Mobility: The ability of a single mailbox database to be replicated to and mounted on other mailbox servers
- Incremental Deployment: The ability to deploy high availability/site resilience after Exchange is installed
- Exchange Third Party Replication API: An Exchange-provided API that enables use of third-party replication for a DAG in lieu of continuous replication
12
Exchange 2010 High Availability Feature Names (continued)
- Database Availability Group: A group of up to 16 Mailbox servers that host a set of replicated databases
- Mailbox Database Copy: A mailbox database (.edb file and logs) that is either active or passive
- RPC Client Access service: A Client Access server feature that provides a MAPI endpoint for Outlook clients
- Shadow Redundancy: A transport feature that provides redundancy for messages for the entire time they are in transit
13
Exchange 2010 *overs
- Within a datacenter
  - Database or server *overs
  - Datacenter level: switchover
- Between datacenters
  - Database or server *overs
- Assumptions:
  - Each datacenter is a separate Active Directory site
  - Each datacenter has live, active messaging services
  - Standby datacenter must be active to support single database *over
14
Exchange 2007 Concepts Brought Forward
- Extensible Storage Engine (ESE)
- Databases and log files
- Continuous Replication
  - Log shipping and replay
  - Database seeding
- Store service/Replication service
- Database health and status monitoring
- Divergence
- Automatic database mount behavior
- Concepts of quorum and witness
- Concepts of *overs
15
Exchange 2010 Cut Concepts
- Storage Groups
- Databases identified by the server on which they live
- Server names as part of database names
- Clustered Mailbox Servers
  - Pre-installing a Windows Failover Cluster
  - Running Setup in Clustered Mode
  - Moving a CMS network identity between servers
- Shared Storage
- Two HA Copy Limits
- Requirement of Two Networks
- Concepts of public, private and mixed networks
16
HA/Backup Strategy Changes
Scenarios addressed (grouped on the slide under Fast Recovery and Data Retention):
- HW/SW Failures
- Data Center Failures
- Accidentally Deleted Items
- Administrator Error
- Mailbox Corruption
- Long Term Data Retention
Features and what they provide:
- Mailbox Resiliency: fast recovery, data redundancy
- Single Item Recovery: guaranteed item retention
- Lagged Copy: past point-in-time DB copy
- Personal Archive + Retention Policies: alternate mailbox for older data
18
Exchange 2010 HA Fundamentals
- Database Availability Group (DAG)
- Server
- Database
- Database Copy
- Active Manager
- RPC Client Access
19
Database Availability Group (DAG)
- Base component of high availability and site resilience
- A group of up to 16 servers that host a set of replicated databases
- "Wraps" a Windows Failover Cluster
  - Manages membership (DAG member = node)
  - Provides heartbeat of DAG member servers
  - Active Manager stores data in cluster database
- Defines a boundary for:
  - Mailbox database replication
  - Database and server *overs
  - Active Manager
20
DAG Requirements
- Windows Server 2008 SP2 Enterprise Edition or Windows Server 2008 R2 Enterprise Edition
- Exchange Server 2010 Standard Edition or Exchange Server 2010 Enterprise Edition
  - Standard supports up to 5 databases per server
  - Enterprise supports up to 100 databases per server
- At least one network card per DAG member
21
Active Manager
- Exchange component that manages *overs
- Runs on every server in the DAG
- Selects best available copy on failovers
- Is the definitive source of information on where a database is active
  - Stores this information in cluster database
  - Provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport)
- Two Active Manager roles: PAM and SAM
- Active Manager client runs on CAS and Hub
22
Active Manager
- Primary Active Manager (PAM)
  - Runs on the node that owns the cluster group
  - Gets topology change notifications
  - Reacts to server failures
  - Selects the best database copy on *overs
- Standby Active Manager (SAM)
  - Runs on every other node in the DAG
  - Responds to queries about which server hosts the active copy of the mailbox database
- Both roles are necessary for automatic recovery
  - If Replication service is stopped, automatic recovery will not happen
23
Active Manager Selection of Active Database Copy
Active Manager selects the "best" copy to become active when the existing active copy fails:
1. Ignores servers that are unreachable or on which activation is temporarily or regularly blocked
2. Sorts copies by currency to minimize data loss
3. Breaks ties during the sort based on Activation Preference
4. Selects from the sorted list based on the copy status of each copy
24
Active Manager Selection of Active Database Copy
Active Manager selects the "best" copy to become active when the existing active copy fails. Criteria sets are tried in order, 1 through 10. In every set the copy status must be Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource.

Set  Catalog    CopyQueueLength  ReplayQueueLength
1    Healthy    < 10             < 50
2    Crawling   < 10             < 50
3    Healthy    (not checked)    < 50
4    Crawling   (not checked)    < 50
5    (any)      (not checked)    < 50
6    Healthy    < 10             (not checked)
7    Crawling   < 10             (not checked)
8    Healthy    (not checked)    (not checked)
9    Crawling   (not checked)    (not checked)
10   (any)      (not checked)    (not checked)
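The selection steps and the ten criteria sets can be sketched in a few lines of Python. This is an illustrative model only, not Exchange code: the `DatabaseCopy` class, its field names, and the use of copy queue length as the measure of "currency" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Copy statuses that every criteria set accepts (from the slide).
ACCEPTABLE_STATUS = {
    "Healthy",
    "DisconnectedAndHealthy",
    "DisconnectedAndResynchronizing",
    "SeedingSource",
}

# (required catalog state, copy queue limit, replay queue limit); None = not checked.
CRITERIA_SETS: List[Tuple[Optional[str], Optional[int], Optional[int]]] = [
    ("Healthy", 10, 50),
    ("Crawling", 10, 50),
    ("Healthy", None, 50),
    ("Crawling", None, 50),
    (None, None, 50),
    ("Healthy", 10, None),
    ("Crawling", 10, None),
    ("Healthy", None, None),
    ("Crawling", None, None),
    (None, None, None),
]


@dataclass
class DatabaseCopy:  # hypothetical model of a passive copy
    server: str
    status: str
    catalog: str
    copy_queue: int
    replay_queue: int
    activation_preference: int
    reachable: bool = True
    activation_blocked: bool = False


def meets(copy: DatabaseCopy, criteria) -> bool:
    """Return True if the copy satisfies one criteria set."""
    catalog, max_copy_queue, max_replay_queue = criteria
    if copy.status not in ACCEPTABLE_STATUS:
        return False
    if catalog is not None and copy.catalog != catalog:
        return False
    if max_copy_queue is not None and copy.copy_queue >= max_copy_queue:
        return False
    if max_replay_queue is not None and copy.replay_queue >= max_replay_queue:
        return False
    return True


def select_best_copy(copies: List[DatabaseCopy]) -> Optional[DatabaseCopy]:
    # Step 1: ignore unreachable servers and servers with activation blocked.
    candidates = [c for c in copies if c.reachable and not c.activation_blocked]
    # Steps 2 and 3: sort by currency (assumed here to be copy queue length),
    # breaking ties by activation preference (lower number preferred).
    candidates.sort(key=lambda c: (c.copy_queue, c.activation_preference))
    # Step 4: walk the criteria sets in order; the first match wins.
    for criteria in CRITERIA_SETS:
        for copy in candidates:
            if meets(copy, criteria):
                return copy
    return None
```

In this model a copy with a slightly longer copy queue but a healthy catalog beats a fully current copy whose catalog is still crawling, because criteria set 1 is exhausted across all candidates before set 2 is tried.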
25
Automatic Recovery Process
When a failure occurs that affects a database:
- Active Manager determines the best copy to activate
- The Replication service on the target server attempts to copy missing log files from the source (ACLL)
  - If successful, then the database will mount with zero data loss
  - If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial setting
- The mounted database will generate new log files (using the same log generation sequence)
- Transport Dumpster requests will be initiated for the mounted database to recover lost messages
- When original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed
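The mount decision after a lossy failure can be illustrated with a small Python sketch. The slide names the AutoDatabaseMountDial setting but not its values; the three values and thresholds below (0, 6, and 12 missing log files) are taken from the Exchange 2010 documentation as best recalled and should be treated as assumptions to verify, and the function name is hypothetical.

```python
# Maximum number of missing log files tolerated for an automatic mount.
# Values are assumptions based on Exchange 2010 documentation; verify before relying on them.
MOUNT_DIAL_MAX_LOST_LOGS = {
    "Lossless": 0,
    "GoodAvailability": 6,
    "BestAvailability": 12,
}


def can_auto_mount(missing_logs_after_acll: int, dial: str) -> bool:
    """Return True if the copy may mount automatically.

    missing_logs_after_acll is the number of log files still missing
    after the attempt to copy the last logs from the source.
    """
    if missing_logs_after_acll < 0:
        raise ValueError("missing log count cannot be negative")
    return missing_logs_after_acll <= MOUNT_DIAL_MAX_LOST_LOGS[dial]
```

With zero missing logs the database mounts under any setting, which corresponds to the "zero data loss" branch above.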
26
Example: Database Failover
1. Database failure occurs
2. Failure item is raised
3. Active Manager moves active database
4. Database copy is restored
Similar flow within and across datacenters
[Diagram: a DAG of Mailbox Servers 1 through 5 hosting copies of DB1 through DB5]
27
Example: Server Failover
1. Server failure occurs
2. Cluster notification of node down
3. Active Manager moves active databases
4. Server is restored
5. Cluster notification of node up
6. Database copies resynchronize with active databases
Similar flow within and across datacenters
[Diagram: a DAG of Mailbox Servers 1 through 5 hosting copies of DB1 through DB5]
28
Example: RCA service and AM
[Diagram labels: Outlook1, Outlook2, Outlook3; CAS Array of RPC Client Access Servers, each with an Active Manager Client; DAG members, each labeled MAPI RPC, Store, and Active Manager; failure markers "Disk Fails" and "CAS Fails"]
Callouts:
- RPC Client Access Server asks: where's the DB mounted?
- Active Manager returns Mailbox Server1
- Outlook tries to reconnect
- Outlook's reconnect triggers new AM request
- If failover is in progress, AM returns old server and connect fails
- Outlook tries again
- DB failover is complete and AM returns new server
29
DAG Lifecycle
- DAG is created initially as an empty object in Active Directory
  - Continuous replication, or third-party replication using Third Party Replication mode
  - DAG is given a name and one or more IP addresses (or configured to use DHCP)
- When the first Mailbox server is added to a DAG:
  - A Windows failover cluster is formed with a Node Majority quorum using the name of the DAG
  - The server is added to the DAG object in Active Directory
  - A cluster name object (CNO) for the DAG is created in the built-in Computers container
  - The name and IP address of the DAG are registered in DNS
  - The cluster database for the DAG is updated with info on configured databases, including if they are locally active (which they should be)
30
DAG Lifecycle
- When the second and subsequent Mailbox servers are added to a DAG:
  - The server is joined to the cluster for the DAG
  - The quorum model is automatically adjusted
    - Node Majority: DAGs with odd number of members
    - Node and File Share Majority: DAGs with even number of members
    - File share witness cluster resource, directory, and share are automatically created by Exchange when needed
  - The server is added to the DAG object in Active Directory
  - The cluster database for the DAG is updated with info on configured databases, including if they are locally active (which they should be)
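The automatic quorum adjustment described above follows a simple parity rule, sketched here in Python. The function name is hypothetical; only the odd/even rule itself comes from the slide.

```python
def quorum_model(member_count: int) -> str:
    """Return the quorum model a DAG uses for a given number of members."""
    if member_count < 1:
        raise ValueError("a DAG with a cluster has at least one member")
    # Odd membership: the nodes alone can always form a strict majority.
    if member_count % 2 == 1:
        return "Node Majority"
    # Even membership: a file share witness supplies the tie-breaking vote.
    return "Node and File Share Majority"


# Adding servers one at a time flips the model back and forth.
for members in range(1, 5):
    print(members, quorum_model(members))
```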
31
DAG Lifecycle
After servers have been added to a DAG:
- Configure the DAG
  - Network Encryption
  - Network Compression
- Configure DAG networks
  - Network subnets
  - Enable/disable MAPI traffic/replication
- Create mailbox database copies
  - Seeding is performed automatically
- Monitor health and status of database copies
- Perform switchovers as needed
32
DAG Lifecycle
- Before you can remove a server from a DAG, you must first remove all replicated databases from the server
- When a server is removed from a DAG:
  - The server is evicted from the cluster
  - The cluster quorum is adjusted as needed
  - The server is removed from the DAG object in Active Directory
- Before you can remove a DAG, you must first remove all servers from the DAG
34
Deploying Exchange 2010 HA Features

Legacy Deployment Steps (CCR/SCC)
1. Prepare hardware, install proper OS, and update
   Extra for SCC: configure storage
2. Build Windows Failover Cluster
   Extra for SCC: configure storage
3. Configure cluster quorum, file share witness, and public and private networks
4. Run Setup in Custom mode and install clustered mailbox server
5. Configure clustered mailbox server
   Extra for SCC: configure disk resource dependencies
6. Test *overs

Exchange 2010 Incremental Deployment
1. Prepare hardware, install proper OS, and update
2. Run Setup and install Mailbox role
3. Create a DAG and replicate databases
4. Test *overs
35
Exchange 2010 Incremental Deployment

Create a DAG:
    New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer EXHUB1 -WitnessDirectory C:\DAG1FSW -DatabaseAvailabilityGroupIpAddresses 10.0.0.8
    New-DatabaseAvailabilityGroup -Name DAG2 -DatabaseAvailabilityGroupIpAddresses 10.0.0.8,192.168.0.8

Add first Mailbox Server to DAG:
    Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX1

Add second and subsequent Mailbox Server:
    Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX2

Add a Mailbox Database Copy:
    Add-MailboxDatabaseCopy -Identity MBXDB1 -MailboxServer EXMBX3

Extend as needed
37
Transition Steps
1. Verify that you meet requirements for Exchange 2010
2. Deploy Exchange 2010
3. Use Exchange 2010 mailbox move features to migrate

Unsupported Transitions
- In-place upgrade to Exchange 2010 from any previous version of Exchange
- Using database portability between Exchange 2010 and non-Exchange 2010 databases
- Backup and restore of earlier versions of Exchange databases on Exchange 2010
- Using continuous replication between Exchange 2010 and Exchange 2007
39
High Availability Design Example: Branch/Small Office Design
[Diagram: multi-role servers, each running the Client Access, Hub Transport, and Mailbox roles]
- Member servers of DAG can host other server roles
- 2-server DAGs should use RAID
- 8 processor cores recommended with a maximum of 64GB RAM
- UM role not recommended for co-location
40
High Availability Design Example: Double Resilience (Maintenance + DB Failure)
[Diagram: single site, one Database Availability Group with 3 nodes (Mailbox Servers 1 through 3) and 3 HA copies; two servers marked as failed]
- JBOD -> 3 physical copies
- 2 servers out -> manual activation of server 3
- In a 3-server DAG, quorum is lost
- DAGs with more servers sustain more failures (greater resiliency)
41
High Availability Design Example: Double Node/Disk Failure Resilience
[Diagram: one Database Availability Group (DAG) with Mailbox Servers 1 through 4; two members marked as failed]
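Why a three-member DAG loses quorum when two servers are down, while a four-member DAG can survive a double failure, comes down to vote counting. The Python sketch below uses a simplified static majority model (one vote per member, plus one file share witness vote when membership is even); the function names are hypothetical and the model ignores any dynamic behavior of the underlying cluster.

```python
def total_votes(members: int) -> int:
    """One vote per member, plus a witness vote when membership is even."""
    return members + (1 if members % 2 == 0 else 0)


def has_quorum(members: int, failed: int, witness_available: bool = True) -> bool:
    """Return True if the surviving voters form a strict majority."""
    if not 0 <= failed <= members:
        raise ValueError("failed must be between 0 and members")
    live_votes = members - failed
    # The witness only votes in the even-membership (file share majority) model.
    if members % 2 == 0 and witness_available:
        live_votes += 1
    return live_votes > total_votes(members) // 2


print(has_quorum(3, 2))  # 1 of 3 votes left
print(has_quorum(4, 2))  # 2 nodes + witness = 3 of 5 votes
```

Under this model the four-member design keeps quorum through two member failures only while the witness remains reachable.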
42
High Availability on JBOD: 6 Servers, 3 Racks, 3 Copy DAG
[Diagram: a Database Availability Group (DAG) of six mailbox servers, each shown as a grid of numbered database disks (labels up to DB90) with a legend for active copy, passive copy, and spare disk; MAPI network and Replication network]
Design parameters:
- 24,000 mailboxes, 2GB mailbox size, 0.1 IOPS/mailbox
- Heavy profile: 100 messages/day
- Per server: 8 cores, 48 GB RAM, battery backed caching array controller
- JBOD: 48 disks/node, 1TB 7.2k SATA disks, online spares (3)
- 288 disks total, 30 TB of db space
- 6 servers, 3 copies = double server failure resiliency
- 4,000 active mailboxes per server; 1st failure: ~5,000 active; 2nd failure: 6,000 active
- Soft active limit: 24
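The per-server load figures on this slide follow from dividing the mailbox population evenly across the surviving servers. A short Python check, with hypothetical function and variable names, reproduces them; the even redistribution is an assumption, which is why the slide rounds the first-failure figure to roughly 5,000.

```python
import math


def active_mailboxes_per_server(total_mailboxes: int, servers: int, failed: int = 0) -> int:
    """Active mailboxes each surviving server carries, assuming even redistribution."""
    surviving = servers - failed
    if surviving <= 0:
        raise ValueError("no surviving servers")
    return math.ceil(total_mailboxes / surviving)


TOTAL_MAILBOXES = 24_000
SERVERS = 6
DISKS_PER_NODE = 48

for failed in range(3):
    print(failed, active_mailboxes_per_server(TOTAL_MAILBOXES, SERVERS, failed))

print("disks total:", SERVERS * DISKS_PER_NODE)
```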
43
Key Takeaways
- Greater end-to-end availability with Mailbox Resiliency
  - Unified framework for high availability and site resilience
- Faster and easier to deploy with Incremental Deployment
- Reduced TCO with core ESE architecture changes and more storage options
- Supports large mailboxes for less money
45
Resources
- www.microsoft.com/teched: Sessions On-Demand & Community
- http://microsoft.com/technet: Resources for IT Professionals
- http://microsoft.com/msdn: Resources for Developers
- www.microsoft.com/learning: Microsoft Certification & Training Resources
48
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.