Scott Schnoll Exchange Server 2013 High Availability.

1 Scott Schnoll Exchange Server 2013 High Availability

2 Agenda
- DAG Architecture
- HA Changes in Exchange 2013
- Monitoring and Server Maintenance
- Best Copy and Server Selection

3 DAG Architecture

4 DAG Replication Service
- Introduced in Exchange 2007 RTM
- Microsoft Exchange Replication service | MSExchangeRepl | MSExchangeRepl.exe
- Runs on all Mailbox servers (not just DAG members)
- Communicates with Active Directory and other DAG members
- Includes 16 components:
  - Active Directory lookup
  - Replay RPC server wrapper
  - TPR API manager
  - Copy status lookup
  - Remote data provider wrapper
  - Support API manager
  - Replay core manager
  - VssWriter
  - Server locator manager
  - Seed manager
  - Active Manager
  - Health state tracker
  - Autoreseed manager
  - Active Manager RPC server wrapper
  - Disk reclaimer manager
  - Failure item manager

5 DAG Management Service
TechReady 17 5/3/2018
- Introduced in RTM CU2
- Microsoft Exchange DAG Management service | MSExchangeDagMgmt | MSExchangeDagMgmt.exe
- Runs on all Mailbox servers (not just DAG members)
- Communicates with Active Directory and other DAG members
- Includes 4 components:
  - Active Directory lookup
  - Copy status lookup
  - Monitoring
  - Tracer instance
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

6 DAG Management Service
- Writes events to same place as Replication service
  - Application event log (source of MSExchangeRepl)
  - HighAvailability crimson channel
- Created for two primary reasons:
  - so the Replication service can have more focused functionality
  - so Managed Availability actions can kill lower-priority activities
- Other functions will move to this service
  - AutoReseed
  - Disk Reclaimer
  - Future AutoDAG copy layout and mobility features

7 Cluster Service
- Introduced in NT Server Enterprise Edition (1997)
- Cluster Service | ClusSvc | Clussvc.exe
- Exchange DAGs use several Cluster components:
  - Quorum
  - Membership and Node Management
  - Networks and Heartbeating
  - Cluster Registry

8 Cluster Service
- Quorum is required in order to mount databases
- Quorum is based on votes, not membership
  - Voting can be rigged
  - Votes can be taken away manually or dynamically
- Exchange manages quorum model, not quorum
  - Exchange management of quorum model based on nodes, not votes
  - Removing votes requires manual configuration of quorum model
  - Exchange will make incorrect quorum model management decisions if votes are manually removed at the cluster level
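
The vote-based rule on this slide can be illustrated with a small sketch. It assumes the usual majority rule (quorum holds while more than half of the configured votes are reachable). The function name and the example vote counts are hypothetical, and dynamic vote adjustment is ignored.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Return True when a strict majority of the configured votes is online.

    Assumption: plain majority rule. Dynamic quorum and witness behaviour
    are not modelled; a witness is simply counted as one more vote.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    return votes_online > total_votes // 2


# Hypothetical example: four DAG members plus a witness gives five votes.
print(has_quorum(total_votes=5, votes_online=3))
# With the witness vote removed, two of four members are no longer enough.
print(has_quorum(total_votes=4, votes_online=2))
```

The second call shows why removing a vote changes the outcome even though the set of running members is the same: the count that matters is votes, not membership.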

9 Cluster Registry
- Active Manager stores database / server information in the cluster registry for DAG members
- Registry changes are replicated immediately to all DAG members
- Stored information is used as part of Best Copy and Server Selection (BCSS)

10 Cluster Registry
IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*LastMountedTime? T22:29:39*MountStatus?Mounted*IsAdminDismounted?False*IsAutomaticActionsAllowed?True*
- ActiveServer: Name of the server where the database is currently mounted or is expected to be mounted when mount operations complete
- LastMountedServer: The name of the server where the database was last successfully mounted
- LastMountedTime: The date and time stamp of the last time the database was mounted

11 Cluster Registry
IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*LastMountedTime? T22:29:39*MountStatus?Mounted*IsAdminDismounted?False*IsAutomaticActionsAllowed?True*
- MountStatus: The current mount status for the database
  - Possible values are mounted / dismounted
- IsAdminDismounted: Designates whether the current dismounted status of the database is the result of administrator action
  - Possible values are True / False
- IsAutomaticActionsAllowed: Designates whether the database can be automatically activated by AM
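
The sample value on these two slides uses `?` between a name and its value and `*` after each pair. A minimal parsing sketch, assuming only that layout (the function name is hypothetical, and the date part of LastMountedTime is missing in this transcript, so the sample keeps it that way):

```python
def parse_db_state(raw: str) -> dict[str, str]:
    """Split a 'Name?Value*Name?Value*' string into a dictionary."""
    entries: dict[str, str] = {}
    for pair in raw.split("*"):
        if not pair:
            continue  # the trailing '*' leaves an empty piece
        name, sep, value = pair.partition("?")
        if not sep:
            raise ValueError(f"malformed pair: {pair!r}")
        entries[name.strip()] = value.strip()
    return entries


# Sample copied from the slide; the date portion of the timestamp is absent.
SAMPLE = (
    "IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*"
    "LastMountedTime?T22:29:39*MountStatus?Mounted*"
    "IsAdminDismounted?False*IsAutomaticActionsAllowed?True*"
)

state = parse_db_state(SAMPLE)
print(state["ActiveServer"], state["MountStatus"])
```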

12 Cluster Registry
- Last Log
  - Entry for each database copy in the DAG (named by the database GUID)
  - Stores the sequence number of the last generated log (in decimal)

13 Crimson Channel
- Applications and Services logs
  - Area of Windows Server event log used by applications for logging and internal communication
  - These logs store events from a single application or component rather than events that might have system-wide impact
  - This is referred to as an application's crimson channel
- Exchange 2013 has multiple channels:
  - ActiveMonitoring
  - HighAvailability
  - MailboxDatabaseFailureItems
  - ManagedAvailability
  - PushNotifications
  - Troubleshooters

14 Crimson Channel

15 HA Changes in Exchange 2013

16 HA Changes in Exchange Server 2013
- Exchange can automatically recover from:
  - Disk Failures
  - Network Failures
  - Server Failures
  - Datacenter Failures
- Failover time decreased by 50% over Exchange 2010
- 58% faster reseeds when using multiple databases per volume

17 MEC 2014 5/3/2018 6:14 PM © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

18 DAGs without Cluster Admin Access Points
- Easier deployment and management
- Fewer things that can fail

19 Lagged Copy Management
- Failed backups are worse than no backups
- Lagged database copies will play forward beyond their configured value when:
  - A database has a bad page and needs a patch
  - There isn’t enough space to keep all the logs
  - There is a risk of losing all available copies of a database

20 AutoReseed overview
Restoring redundancy, so you don’t have to
- Configured by setting mount points for volumes
[Diagram labels: In-Use Storage, X, Spares]

21 AutoReseed – why?
Forget about replacing disks as they fail
- Probability you’ll need to replace more than monthly:
  =(1-BINOM.DIST(spares + 1, disks per server, AFR/12, TRUE))*servers
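
The spreadsheet expression on this slide can be evaluated outside Excel. The sketch below mirrors it term by term, assuming AFR is the annualized disk failure rate expressed as a fraction and that BINOM.DIST with TRUE is the cumulative binomial distribution. The function names and the example inputs are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), like BINOM.DIST(k, n, p, TRUE)."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def replacement_estimate(spares: int, disks_per_server: int,
                         afr: float, servers: int) -> float:
    """Evaluate the slide's expression as written."""
    monthly_failure_rate = afr / 12
    tail = 1 - binom_cdf(spares + 1, disks_per_server, monthly_failure_rate)
    return tail * servers


# Hypothetical inputs: 2 spares, 12 disks per server, 5% AFR, 16 servers.
print(replacement_estimate(spares=2, disks_per_server=12, afr=0.05, servers=16))
```

Raising the number of spares moves more of the distribution below the cutoff, so the result shrinks quickly, which is the point the slide makes about not replacing disks as they fail.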

22 Recovering from storage failures
- Exchange Server 2010
  - ESE database hung IO (4 min)
  - Crimson channel heartbeat (30s)
  - System disk heartbeat (2 min)
- Exchange Server 2013
  - System bad state (5 min)
  - Long I/O times (.6 min)
  - MSExchangeRepl.exe memory threshold (4 GB)
  - Replication service won’t restart (65 min)
  - Store timeout (1 min)
- Exchange Server 2013 SP1
  - Cluster service repeated crashes (60 min)

23 Monitoring and Server Maintenance

24 Managed Availability
Primary purpose:
- Detect service degradation affecting users
- Attempt to recover from failures
- If recovery fails, escalate to Exchange administrators

25 Managed Availability
- Probe engine: data collection and notifications mechanism, feeding into…
- Monitor engine: contains business logic to evaluate health of customer-impacting features
- Responder engine: set of recovery actions that can be taken to recover degraded state of the monitored resource
[Diagram labels: XYZ_Probe\Resource1, XYZ_Probe\Resource2, XYZ_Probe\Resource3; XYZ_Monitor; XYZ_ResetAppPool, XYZ_Restart, XYZ_Failover, XYZ_Reboot, XYZ_Escalate]
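
The probe, monitor and responder split can be pictured with a toy model. This is only an illustration of the flow in the diagram, not how Managed Availability is implemented: the rule that any failed probe makes the monitor unhealthy, the immediate responder chain, and all names are assumptions.

```python
from typing import Callable


def evaluate_monitor(probe_results: dict[str, bool]) -> bool:
    """Monitor: healthy only if every probe result passed (assumed rule)."""
    return all(probe_results.values())


def run_responders(responders: list[Callable[[], bool]]) -> list[str]:
    """Responder chain: try each action in order until one reports success."""
    attempted = []
    for responder in responders:
        attempted.append(responder.__name__)
        if responder():
            break
    return attempted


# Hypothetical recovery actions; each returns True if recovery worked.
def xyz_restart() -> bool:
    return False


def xyz_failover() -> bool:
    return True


def xyz_escalate() -> bool:
    return True


results = {
    "XYZ_Probe\\Resource1": True,
    "XYZ_Probe\\Resource2": False,
    "XYZ_Probe\\Resource3": True,
}
attempted: list[str] = []
if not evaluate_monitor(results):
    attempted = run_responders([xyz_restart, xyz_failover, xyz_escalate])
print(attempted)
```

Escalation sits last in the chain, matching the primary purpose stated earlier: recovery is attempted first and administrators are involved only when it fails.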

26 Managed Availability
- Get-ServerHealth provides status of all monitors tracking a particular server
- Get-HealthReport provides a rollup of health sets for a server or for a group of servers
- Complete set of monitors, probes and responders can be found in the Windows crimson log channel

27 HA Managed Availability
- HA uses MA to monitor data redundancy, cluster health, physical storage health and database logical corruption
- HA Probes, Monitors and Responders are grouped into DataProtection and Clustering Health Sets

28 ServerOneCopyMonitor
- ServerOneCopyMonitor: HA’s most important redundancy protection
- Once a minute each database on a server is checked:
  Copy is (Healthy || Mounted)
  && ServerComponentState is NOT Offline
  && Copy is NOT Activation Blocked
  && Server is NOT exceeding MaxActive
  && Copy Queue Length < MountDial
  && Server is NOT Activation Disabled
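
The once-a-minute check reads as a single boolean expression, which the sketch below writes out. The `CopyState` type and its field names are hypothetical, and treating the mount dial as a numeric log-count threshold is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CopyState:
    status: str                        # e.g. "Healthy", "Mounted", "Failed"
    server_component_offline: bool
    copy_activation_blocked: bool
    server_exceeding_max_active: bool
    copy_queue_length: int
    mount_dial: int                    # assumed numeric threshold
    server_activation_disabled: bool


def copy_passes_check(c: CopyState) -> bool:
    """Mirror of the slide's condition, one clause per line."""
    return (
        c.status in ("Healthy", "Mounted")
        and not c.server_component_offline
        and not c.copy_activation_blocked
        and not c.server_exceeding_max_active
        and c.copy_queue_length < c.mount_dial
        and not c.server_activation_disabled
    )


good = CopyState("Healthy", False, False, False, 2, 6, False)
behind = CopyState("Healthy", False, False, False, 6, 6, False)
print(copy_passes_check(good), copy_passes_check(behind))
```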

29 HA Monitors – ServerOneCopyMonitor
- 30 consecutive failures are considered an Escalating condition
- Immediately after that OneCopyMonitor is notified and becomes Unhealthy
[Diagram labels: consecutive failures 1, 2, 3 ... 30; OneCopy notification; OneCopyMonitor Healthy, OneCopyMonitor UNHEALTHY; OneCopyEscalate]
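
The escalation rule is a consecutive-failure counter. A minimal sketch, assuming a single passing check resets the count (the slide does not say so explicitly) and using a hypothetical class name:

```python
class OneCopyTracker:
    """Counts consecutive failed checks for one database."""

    ESCALATION_THRESHOLD = 30  # value taken from the slide

    def __init__(self) -> None:
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> str:
        """Record one check result and return the resulting monitor state."""
        if check_passed:
            self.consecutive_failures = 0  # assumption: any pass resets
            return "Healthy"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.ESCALATION_THRESHOLD:
            return "Unhealthy"
        return "Healthy"


tracker = OneCopyTracker()
states = [tracker.record(False) for _ in range(30)]
print(states[28], states[29])
```

With one check per minute, the threshold corresponds to roughly half an hour of continuous failure before the monitor turns Unhealthy.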

30 ServiceHealthMSExchangeReplEndpointMonitor
- Three probes and five responders
- Probes: ReplEndpointProbe\RPC, ReplEndpointProbe\TCP, ReplEndpointProbe\ServerLocator
- Monitor: ReplEndpointMonitor
- Responders: RestartResponder, RestartResponder2, FailoverResponder, RebootResponder, EscalateResponder

31 DAG Member Server Maintenance
- In Exchange 2013, the story is a little bit more complicated than in Exchange 2010
- Mailbox Server has multiple roles installed
- To prevent outages, we need to make sure the server is not serving any client protocols

32 Exchange 2013 server maintenance
- Put server into maintenance mode
  - Set Transport and UM to drain their queues
  - Set messaging redirection to (preferably) another server in the DAG
  - Suspend cluster node
  - Set server to be Activation Disabled
  - Set server to be Activation Blocked
  - Set all ServerComponentStates Offline
- Confirm
  - All ServerComponentStates are offline
  - Server is activation blocked and activation disabled
  - Cluster node is “Paused”
  - Transport queues are empty
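
The "Confirm" half of this slide is a checklist, so it can be expressed as a function that reports what is still outstanding. The sketch works on a hand-built snapshot rather than querying Exchange; the `ServerSnapshot` type, its fields and the component names in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerSnapshot:
    component_offline: dict[str, bool]  # component name -> True if offline
    activation_blocked: bool
    activation_disabled: bool
    cluster_node_state: str             # "Paused" is the value on the slide
    queued_messages: int


def unmet_maintenance_checks(s: ServerSnapshot) -> list[str]:
    """Return the confirmation items from the slide that are not yet true."""
    unmet = []
    online = sorted(n for n, off in s.component_offline.items() if not off)
    if online:
        unmet.append("components still online: " + ", ".join(online))
    if not s.activation_blocked:
        unmet.append("server is not activation blocked")
    if not s.activation_disabled:
        unmet.append("server is not activation disabled")
    if s.cluster_node_state != "Paused":
        unmet.append("cluster node is not Paused")
    if s.queued_messages > 0:
        unmet.append("transport queues are not empty")
    return unmet


ready = ServerSnapshot({"HubTransport": True, "UMCallRouter": True},
                       True, True, "Paused", 0)
draining = ServerSnapshot({"HubTransport": False, "UMCallRouter": True},
                          True, True, "Paused", 3)
print(unmet_maintenance_checks(ready))
print(unmet_maintenance_checks(draining))
```

An empty list means every confirmation item holds and maintenance can start.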

33 Best Copy and Server Selection

34 Best Copy and Server Selection
Tech Ready 15 5/3/2018
- What’s the same as Exchange 2010?
  - Still an Active Manager algorithm
  - Performed at *over (failover or switchover) time
  - Uses extracted system health
  - Same replication criteria and phases
- What’s different in Exchange 2013?
  - Cap replay queue to limit mount time
  - New max actives soft limit
  - BCS criteria includes protocol stack health
  - Protocol health prioritized to control impact
  - Tuned replication health criteria thresholds
  - MA failover responder targets not worse server
© 2012 Microsoft Corporation. All rights reserved.

35 Activation controls
- Load management limits
  - Controls server max load
- Server-level activation controls
  - Controls server usage
- Database-level activation control
  - Prevent copy activation – questionable database copy?

36 Load management limits
- MaximumActiveDatabases
  - Hard limit for activation, i.e. worst case
  - Enforced by BCS
  - Dismount databases over limit
  - Control “exceptional failure” load
  - Set to most databases you want per server
  - Follow role requirements calculator guidance
- MaximumPreferredActiveDatabases
  - Soft limit for activation, added in SP1
  - Copies deprioritized in BCS
  - Catalog and copy queue health
  - Failovers can exceed limit
  - Load balancing optimizes to this limit
- Move-ActiveMailboxDatabase -SkipMaximumActiveDatabasesChecks skips both
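
The difference between the hard and soft limit can be shown with a small classifier. This is a sketch of the idea only, not the BCS algorithm: it assumes a server already at a limit is treated as over it for the purpose of one more activation, that `None` means the limit is not configured, and the function and tier names are hypothetical.

```python
from typing import Optional


def activation_tier(active_count: int,
                    max_active: Optional[int],
                    max_preferred_active: Optional[int]) -> str:
    """Classify a server as a target for activating one more database."""
    if max_active is not None and active_count >= max_active:
        return "excluded"        # hard limit: not used as a target
    if (max_preferred_active is not None
            and active_count >= max_preferred_active):
        return "deprioritized"   # soft limit: still usable, ranked lower
    return "preferred"


# Hypothetical limits: hard limit 20, soft limit 10.
for count in (5, 10, 20):
    print(count, activation_tier(count, max_active=20, max_preferred_active=10))
```

A deprioritized server can still receive databases during a failover, which is how failovers are able to exceed the soft limit while the hard limit continues to hold.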

37 Best Copy and Protocol Health
- Normal *over behavior
  - All health sets healthy
  - All medium priority health sets and above are healthy
  - All health sets on target are better than source
  - All health sets on target are the same as source
  - Server health not considered
- MA failover behavior
  - Skip target if not better than source
  - All health sets healthy
  - All medium priority and above are healthy
  - All health sets better than source server

38 Questions?

