Exchange high availability

Microsoft Corporation

Email is business critical
Email is mission critical for most organizations. IT administrators face the challenge of keeping the service highly available, and with high integrity, for their businesses and users. For many businesses, if email doesn’t work, the business stops.

Information explosion
A clear challenge is that mailboxes have only gotten bigger over time. In the next decade, data will increase by 44x, and over the next year we’ll have generated more data than in all of mankind’s history. However, the number of IT professionals will have grown by only 1.4x. Most of this business-critical data lies in users’ inboxes. IT administrators must keep a greater volume of data highly available, backed up, and easily restored if an error or disaster occurs, yet they are under more pressure than ever to reduce costs and administrative overhead. And as an IT administrator, you don’t really care about monitoring, backups, or disaster recovery for their own sake: you care about keeping your service up for your users.

Evolved architecture with Exchange building block model
- Help simplify deployments
- Integrate availability throughout the system
Exchange building block model: The Exchange building block model simplifies Exchange deployments at all scales, standardizes high availability and client load balancing, and improves cross-version interoperability.
As an IT administrator, your focus is not necessarily backups, monitoring, or disaster recovery. Your focus is keeping your service available for your business. We have created a system that:
- Is flexible and efficient, allowing deployment on a wide range of hardware
- Enables large, low-cost mailboxes
- Provides a single solution for high availability, business continuity, data protection, and backups
- Helps isolate failures with built-in monitoring and availability management
- Helps you reduce risk and focus on your business

Architecture overview
Exchange building blocks
- Client Access Server (CAS): comprises client protocols and SMTP
- Mailbox Server (MBX): hosts all components that process, render, and store data
[Diagram labels: Internet, enterprise network, layer 4 load balancer, Edge, CAS, MBX, remote clients and devices, local clients, PBX]
In the new version of Exchange we envision two basic building blocks: the Client Access Server (CAS) and the Mailbox Server (MBX).
CAS comprises two components: client protocols and SMTP. A CAS array is a series of thin servers that are stateless from a protocol session perspective. Because they are stateless, they do not require session affinity or layer 7 load balancing. They are designed to work with TCP affinity or layer 4 load balancing, which is protocol unaware. This matters because it gives you flexibility and choice in load balancing and high availability. It also frees up load balancer capacity, because the load balancer no longer has to do SSL processing, session cookie processing, and so on, which reduces complexity and cost. CAS has the logic to route all protocol requests to the correct back-end mailbox server, including servers running older versions of Exchange. It is domain joined, meaning it is not an edge or gateway server.
From a functionality perspective, we want to avoid dependencies between CAS and MBX so that the two can be upgraded independently and can interact across versions, which is critical to making upgrade and coexistence simple and flexible for customers. In terms of deployment flexibility, this also means there is no expectation that CAS must be in the same location as MBX. Many customers will have them in the same sites, but some large organizations may want the flexibility to consolidate CAS or consolidate MBX.
Meanwhile, the mailbox servers host all components that process, render, and store data (RPC CA, OWA, RPC proxy, transport, UM, etc.). Clients do not connect directly to MBX servers; connectivity is through CAS. MBX servers are the evolution of what we provided in Exchange 2010 with a DAG; a collection of these servers forms an HA unit.
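To make the routing idea concrete, here is a minimal Python sketch; it is not Exchange code. The server names, the two lookup tables, and the functions `layer4_pick` and `route` are all hypothetical. It illustrates why a protocol-unaware load balancer is sufficient when the CAS tier keeps no session state: whichever CAS receives a connection, a request for a given mailbox resolves to the same MBX server.

```python
# Hypothetical lookup data; a real deployment would resolve this from the directory.
MAILBOX_TO_DB = {"alice": "DB1", "bob": "DB2"}
ACTIVE_COPY = {"DB1": "MBX01", "DB2": "MBX02"}   # database -> server with the active copy
CAS_ARRAY = ["CAS01", "CAS02", "CAS03"]


def layer4_pick(cas_servers, connection_id):
    """Protocol-unaware choice: uses only a connection number, never the payload."""
    return cas_servers[connection_id % len(cas_servers)]


def route(mailbox):
    """What any CAS does: find the MBX server hosting the mailbox's active database."""
    return ACTIVE_COPY[MAILBOX_TO_DB[mailbox]]


# Three connections land on three different CAS servers, yet all reach the same MBX.
for connection_id in range(3):
    cas = layer4_pick(CAS_ARRAY, connection_id)
    print(f"connection {connection_id}: {cas} -> {route('alice')}")
```

Because `route` depends only on the mailbox and never on which CAS ran it, the load balancer needs no cookies or session affinity to keep a user working.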

Simplified high availability, coordinated recovery management, managed availability
Simplified high availability: We’ve simplified the Exchange architecture and made it more flexible and scalable. More importantly, from an HA perspective, we’ve made all of the core roles use the same high availability model. This makes it easier to set up cross-site high availability not only for the core mailbox data that resides in the mailbox store but for transport as well. Every DAG represents a transport HA boundary and owns its HA implementation; if you stretch a DAG across sites, you also have transport site resilience.
Coordinated recovery management: Thanks to the new flexible architecture, high availability is integrated throughout the product; you do not have to set up separate HA solutions for different aspects of the server.
Managed availability: We have a built-in monitoring and availability management solution that is tied together and can be used to decide whether to perform a database failover. In addition, choosing the best database copy takes into account the health of the entire protocol stack. This should reduce failover times, complexity, and non-actionable alerts, and help maintain the service until administrator intervention is required.

High availability and business continuity
[Diagram labels: Chicago, San Jose; DB1, DB2, DB3, DB4 shown twice]
- Evolution of the Exchange 2010 DAG
- Collection of servers that form an HA unit
- Databases are replicated between servers in a given DAG
- Servers can be in different locations, for site resiliency
- All core Exchange functionality for a given mailbox is served by the MBX server where that mailbox’s database is currently activated
- Mailbox access fails over when a database fails over
Email is mission critical, and IT staff are tasked with keeping data highly available. This has been a challenging task: in the past, customers needed to deploy expensive and extensive shared-storage clustering, rely on third-party data replication products, or simply make do with traditional backups and recovery.
Database availability group (DAG): Evolved from Exchange Server 2010, a DAG is a collection of mailbox servers that form a high availability unit. These servers use continuous replication to update database copies, communicate to manage failures, and can provide automatic failover to recover from a variety of issues affecting individual components, databases, servers, and datacenters. Databases are replicated between servers in a given DAG, and this can extend to servers in different locations and datacenters. All core Exchange functionality rendered for a given end user mailbox is served by the Exchange server where that mailbox’s database is currently activated. For an end user, mailbox access fails over when a database fails over.
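The failover behavior described above can be pictured with a small Python model. This is a sketch under stated assumptions, not how Exchange selects a copy: the class names, the server names, and the rule "activate the first healthy copy in preference order" are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class DatabaseCopy:
    server: str
    healthy: bool = True


@dataclass
class Database:
    name: str
    copies: list            # DatabaseCopy objects, in (assumed) activation preference order
    active: str = ""

    def __post_init__(self):
        if not self.active:
            self.active = self.copies[0].server


def fail_server(databases, server):
    """Mark every copy on `server` unhealthy and move any database active there."""
    for db in databases:
        for copy in db.copies:
            if copy.server == server:
                copy.healthy = False
        if db.active == server:
            candidates = [c.server for c in db.copies if c.healthy]
            if not candidates:
                raise RuntimeError(f"no healthy copy left for {db.name}")
            db.active = candidates[0]   # mailbox access follows the database


# Two databases replicated across two sites (hypothetical server names).
dag = [
    Database("DB1", [DatabaseCopy("CHI-MBX1"), DatabaseCopy("SJC-MBX1")]),
    Database("DB2", [DatabaseCopy("SJC-MBX1"), DatabaseCopy("CHI-MBX1")]),
]
fail_server(dag, "CHI-MBX1")
for db in dag:
    print(db.name, "is active on", db.active)
```

After the Chicago server fails, DB1 becomes active in San Jose while DB2, which was already active there, is untouched.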

Coordinated recovery management
Database availability groups help keep more than just data available. Because all core Exchange functionality rendered for a given end user mailbox is served by the Exchange server where that mailbox’s database is currently activated, recovery across functionality is coordinated for the mailbox. For an end user, mailbox access fails over when a database fails over. This means that client access, hub transport, and mailbox data move together during database failovers or switchovers. Protocols shift to the server that is hosting the active database copy.
Transport resilience:
- Every message is redundantly persisted before its receipt is acknowledged to the sender
- Delivered messages are kept redundant in transport, similar to active messages
- Resubmits due to transport DB loss or MDB failover are fully automatic and do not require any manual involvement
Every message is redundantly persisted before its receipt is acknowledged to the sender (where the sender can be an external MTA, a user mailbox, an Exchange server of a different version, and so on). The primary goal of transport HA is to prevent message loss due to outages. Making messages redundant before their receipt is acknowledged eliminates the Exchange 2010 Shadow Redundancy dependency on multiple hops and protocol extensions; it also eliminates the need for Delayed ACK for HA-unaware senders. This works by redundantly persisting a copy of the message elsewhere in the DAG prior to issuing a response to the DATA verb.
Every DAG represents a transport HA boundary and owns its HA implementation. This tenet limits the number of servers that can possibly hold a redundant copy of a given message and makes resubmits more manageable. It also provides a simple versioning story in which future versions can have a different HA implementation without any back-port requirements. The fact that the mailbox store and transport end up with the same HA boundary is a big plus.
Delivered messages (Dumpster/Safety Net) are kept redundant in transport, similar to active messages. Hub transport functionality may be co-located with the mailbox store it delivers a given message to, so transport has to maintain redundant copies of dumpster messages in case the entire server goes down. Safety Net keeps messages over a long period of time, and having only a single copy would make it likely that some data is missing due to a prior outage when a Safety Net resubmit is requested. Resubmits due to transport DB loss or MDB failover are fully automatic and do not require any manual involvement.
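The ordering "persist a redundant copy, then acknowledge" can be sketched in a few lines of Python. This is an illustrative model only: the `TransportServer` class and both functions are hypothetical, the reply strings merely borrow standard SMTP reply codes, and refusing the message when no peer is reachable is an assumption of this sketch rather than documented Exchange behavior.

```python
class TransportServer:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.queue = {}    # message id -> body, messages this server is responsible for
        self.shadow = {}   # message id -> body, redundant copies held for a peer


def accept_message(primary, peers, msg_id, body):
    """Reply to the end of DATA only after a second copy exists inside the DAG."""
    primary.queue[msg_id] = body
    for peer in peers:
        if peer.online:
            peer.shadow[msg_id] = body      # redundant copy persisted first
            return "250 OK"                 # only now is receipt acknowledged
    del primary.queue[msg_id]               # assumption: no redundancy, so refuse
    return "451 try again later"


def resubmit_from_shadow(lost_server, peers):
    """Automatic resubmit: recover messages whose primary copy was lost."""
    recovered = {}
    for peer in peers:
        if peer.online:
            recovered.update(peer.shadow)
    lost_server.queue = dict(recovered)
    return sorted(recovered)


mbx1, mbx2 = TransportServer("MBX01"), TransportServer("MBX02")
print(accept_message(mbx1, [mbx2], "m1", "hello"))
mbx1.queue.clear()                          # simulate loss of the transport database
print(resubmit_from_shadow(mbx1, [mbx2]))
```

Scoping the peers to a single DAG is what keeps the search for a redundant copy small during a resubmit.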

High availability moves
- Users remain online while their mailboxes are moved
- Administrators can perform maintenance during regular hours
IT administrators commonly move mailboxes between servers and databases as part of maintenance, when introducing new servers, when moving users to new Exchange versions, or when migrating to Exchange Online. With legacy versions such as Exchange 2003, moving a mailbox often takes the user offline, leaving them with no access for the duration of the move. Because mailboxes are larger than ever, moves take longer, and administrators have needed to perform them in off-hours to minimize disruption.
Starting in Exchange 2010 and continuing in the latest version of Exchange, users can remain online while their mailboxes are moved between servers or to the cloud, and can continue to:
- Send messages
- Receive messages
- Access the entire mailbox
Because users stay connected, administrators can perform migration and maintenance during regular hours and no longer need to sacrifice evenings or weekends for maintenance and upgrades. Exchange 2007 SP3 also allows for online moves.

Managed availability
Exchange integrates monitoring with recovery-oriented, high availability features.
Managed availability: With managed availability, internal monitoring and recovery-oriented features are tightly integrated to help prevent failures, proactively restore services, initiate server failovers automatically, or alert administrators to take action. The focus is on monitoring and managing the end user experience rather than just server and component uptime to help keep the service continuously available.

Manage availability, not just uptime
- High availability
- User experience focused
- Lessons learned from the cloud
Exchange introduces a new concept: managed availability. This stems from the fact that IT pros don’t necessarily want to focus on managing components or features such as monitoring; they want to focus on keeping the service available for their end users.
Managed availability helps keep the service continuously available and provides:
- Recovery-oriented features that are self-healing to help prevent failures
- Layered monitoring that proactively restores services or drives human intervention. No one wants an alert unless it is actionable.
Managed availability is user-experience focused, which means:
- Monitoring is based on the end user’s experience, not just server uptime
- Synthetic transactions factor in service degradation and errors in addition to availability or uptime
Exchange has improved upon the high availability features introduced in previous versions of Exchange, based on lessons learned with Exchange Online. The Exchange engineering team helps run Exchange Online, and has brought what it learned from running the service at scale, with millions of mailboxes, to the on-premises product. Because it was built from cloud lessons, Exchange is optimized for scale, simple deployment, and high availability. Exchange provides features that allow IT pros to focus on monitoring and managing the end user experience rather than just server and component uptime.

Focus on high availability
Exchange health manager
- Monitors the state of health with probes and synthetic transactions
- Performs system checks to measure traffic and failure thresholds
- Takes action to restore services, prevent failures, or send an actionable alert
[Diagram labels: Probe, Check, Notify, Monitor, Recover, Escalate]
Local health manager: A managed availability component on an Exchange server that monitors the server’s health by probing to measure the user experience through synthetic transactions, performing system checks to measure traffic and failure thresholds through performance counters, and taking action to restore services or prevent failures. Alerts are sent using System Center; SCOM is the portal for notifications. In other words, “Stuff breaks, but the experience doesn’t have to.”
The infrastructure includes four key components:
- Probes: synthetic transactions that perform tasks and look at performance counters, events, etc.
- Monitors: similar to a monitor in SCOM in that a monitor initiates an action if certain criteria are met. The action may be to recover, or to escalate to an administrator by raising an alert
- Notifications: a means by which the system or an administrator can override the probe and trigger an immediate response
- Recover service: the process by which recovery or repair is performed to restore service or prevent failure (e.g., restart a service or application pool, perform a failover, bugcheck the OS)
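The probe, monitor, recover, and escalate flow can be illustrated with a toy Python state machine. Everything here is an assumption made for illustration: the `Monitor` class, the threshold of three consecutive failed probes, and the ordering of recovery actions from mildest to strongest are not taken from the product.

```python
from collections import deque


class Monitor:
    """Watches one probe; after repeated failures tries recovery, then escalates."""

    def __init__(self, probe, responders, failures_to_act=3):
        self.probe = probe
        self.responders = list(responders)      # ordered, mildest action first
        self.failures_to_act = failures_to_act
        self.results = deque(maxlen=failures_to_act)
        self.next_responder = 0

    def tick(self):
        ok = self.probe()
        self.results.append(ok)
        if ok:
            self.next_responder = 0             # healthy again: reset the ladder
            return "healthy"
        if len(self.results) < self.failures_to_act or any(self.results):
            return "watching"                   # not enough consecutive failures yet
        if self.next_responder < len(self.responders):
            action = self.responders[self.next_responder]
            self.next_responder += 1
            self.results.clear()                # give the recovery action time to work
            return action()
        return "escalate: alert administrator"  # automatic recovery is exhausted


# A synthetic transaction that always fails, to walk the whole ladder.
monitor = Monitor(
    probe=lambda: False,
    responders=[lambda: "restart service", lambda: "fail over database"],
)
outcomes = [monitor.tick() for _ in range(9)]
print(outcomes)
```

The administrator is alerted only on the ninth tick, after both automatic actions have been tried, which is the sense in which an alert is actionable rather than noise.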

Focus on the end user experience
What does the user see?
- Availability measures whether users can access the service
- Performance issues such as latency negatively impact the user experience
- Errors get in the way of users accomplishing tasks
[Diagram labels: Check, Probe, Notify]
Those checks and probes help the Exchange health manager view the system from an end user perspective. Server uptime or individual component uptime does not necessarily mean that the system is healthy or performing consistently. The system has layers of monitoring that perform self-checks measuring:
- Availability: is the service accessible? Can users get in the door?
- Latency or performance: how is that experience? Can users get in, but the system is slow?
- Errors: are users able to accomplish what they want, or are errors being thrown?
Probes: The key goal is to measure the customer’s perception of the service. These are typically synthetic end-to-end customer transactions.
Checks: The key goal is to measure actual customer traffic and become aware when customers are experiencing issues. These are typically implemented as performance counters where thresholds can be set to detect spikes in customer failures.
Notify: The key goal is to take action immediately based on a critical event. These are typically exceptions or conditions that can be detected without a large sample set.
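A check of the kind described above, thresholds applied to measurements of real traffic, might look like the following Python sketch. The function name, the sample format, and every threshold value are hypothetical; they only show how errors and latency can be judged separately from plain uptime.

```python
def evaluate_check(samples, max_failure_rate=0.05, max_p95_latency_ms=2000,
                   min_samples=20):
    """Classify recent traffic. `samples` is a list of (succeeded, latency_ms)."""
    if len(samples) < min_samples:
        return "insufficient data"      # checks need a reasonably large sample set
    failures = sum(1 for ok, _ in samples if not ok)
    latencies = sorted(ms for _, ms in samples)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]   # simple nearest-rank estimate
    if failures / len(samples) > max_failure_rate:
        return "errors"                 # users get in but cannot finish their tasks
    if p95 > max_p95_latency_ms:
        return "slow"                   # users get in but the experience is poor
    return "healthy"


fast_ok = [(True, 100.0)] * 40
failing = [(True, 100.0)] * 30 + [(False, 100.0)] * 10
sluggish = [(True, 100.0)] * 30 + [(True, 5000.0)] * 10

for label, data in [("fast_ok", fast_ok), ("failing", failing), ("sluggish", sluggish)]:
    print(label, "->", evaluate_check(data))
```

In all three sample sets the server is "up", yet only the first would count as a good experience for users.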

Monitoring layers
[Diagram labels: managed availability checkpoints; CAS array, database availability group; intervals of 20 sec, 5 min, 20 min; proxy self check, protocol self check, mailbox self check, experience level check; system checks; redundancy groups and corresponding group]
These tests determine the viability of various components on the back-end server:
- Database connectivity and replication
- Protocol services (Outlook, OWA, EAS, IMAP, POP)
They recommend HA actions when a service-impacting condition is found:
- Database failover
- Restart service
- Restart computer
They escalate when automatic recovery is unsuccessful and service is not restored. Integration with System Center raises awareness of service-impacting conditions that cannot be automatically resolved.
Tests occur at different intervals.
System level checks:
- Proxy Self Test (PrST): CAS or CAS array redundancy group, e.g. OWA PrST; detection in 20 seconds
- Protocol Self Test (PST): CAS and mailbox, e.g. OWA PST; detection in 20 seconds
- Mailbox Self Test (MST): mailbox or DAG redundancy group, e.g. OWA MST; detection in 5 minutes
End user experience level checks:
- Customer Touch Point (CTP): e.g. OWA CTP; detection in 20 minutes
Redundancy groups: The core principle is that the system can at any time sideline a single server within a grouping without impacting the user experience. System health is therefore managed and measured at the group level. The health of the group is determined by a “worst of” evaluation of the servers in the group. A server can be in one of four states: functional, degraded, failed, or sidelined (in repair). Health is rolled up into two grouping types:
- DAGs: collections of MBX servers
- Arrays: collections of CAS servers
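A "worst of" rollup over a redundancy group is easy to express in Python. The sketch below is illustrative only: the severity ordering of the four states (in particular, where "sidelined" ranks) and the rule used in `can_sideline` are assumptions, and the server names are made up.

```python
# Assumed ordering, least to most severe. The source names the four states
# but does not rank them, so this ranking is a guess for illustration.
SEVERITY = {"functional": 0, "sidelined": 1, "degraded": 2, "failed": 3}


def group_health(server_states):
    """Group health is the worst state found among the servers in the group."""
    return max(server_states.values(), key=SEVERITY.__getitem__)


def can_sideline(server_states, server):
    """Assumed rule: sideline one server at a time, and only if another
    functional server stays in rotation to carry the load."""
    others = [state for name, state in server_states.items() if name != server]
    return "sidelined" not in others and "functional" in others


dag = {"MBX01": "functional", "MBX02": "degraded", "MBX03": "functional"}
print(group_health(dag))              # the degraded server sets the group's health
print(can_sideline(dag, "MBX02"))     # two functional servers remain

dag["MBX02"] = "sidelined"            # MBX02 goes into repair
print(group_health(dag))
print(can_sideline(dag, "MBX01"))     # refused: one server is already sidelined
```

Evaluating at the group level is what lets a single server be pulled for repair without the rollup treating that as an outage for users.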

Server health index
Rollup of health of server components
Get-HealthSet -Server EXCHMBX01 -HealthSet All
    HealthGroup  HealthSet   State       LastTransitionTime
    OWA          CTP         Functional  9:53 AM 8/15/2011
    Outlook      CTP         Functional  9:53 AM 8/15/2011
    Mobile       CTP         Functional  9:53 AM 8/15/2011
    UM           CTP         Functional  9:53 AM 8/15/2011
    Provision    CTP         Functional  9:53 AM 8/15/2011
    MRS          SvcHealth   Functional  9:53 AM 8/15/2011
    OAB          SvcHealth   Functional  9:53 AM 8/15/2011
    DiskSpace    SrvHealth   Functional  9:53 AM 8/15/2011
    IOPs         SrvHealth   Functional  9:53 AM 8/15/2011
    Memory       SrvHealth   Functional  9:53 AM 8/15/2011
    CPU          SrvHealth   Functional  9:53 AM 8/15/2011
    AD           Dependency  Functional  9:53 AM 8/15/2011
    DNS          Dependency  Functional  9:53 AM 8/15/2011
- Components can be: functional, degraded, failed, or sidelined
- Health set categories: customer touch points, service components, server components, dependency availability
The server health index is a rollup of the health of the server components. A server component can be in one of four states: functional, degraded, failed, or sidelined. Components are organized into four health set categories:
- Customer touch points: components that affect real-time, customer-facing interactions (OWA, OLK, Mobile, UM, etc.)
- Service components: components without direct, real-time customer interactions (MRS, OABGen)
- Server components: physical resources of the physical server (disk space, memory, network)
- Dependency availability: the server’s ability to call out to dependencies (AD, DNS, etc.)
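To show what a rollup of this listing could compute, the Python sketch below groups the sample rows by category and takes the worst state in each. It is a model, not a parser for real cmdlet output: the rows are transcribed from the sample above as (component, category, state), the severity ordering of the four states is an assumption, and the degraded DiskSpace row in the second call is a hypothetical change made only to exercise the rollup.

```python
# Assumed ordering, least to most severe; the source does not rank the states.
SEVERITY = {"Functional": 0, "Sidelined": 1, "Degraded": 2, "Failed": 3}

ROWS = [
    ("OWA", "CTP", "Functional"), ("Outlook", "CTP", "Functional"),
    ("Mobile", "CTP", "Functional"), ("UM", "CTP", "Functional"),
    ("Provision", "CTP", "Functional"),
    ("MRS", "SvcHealth", "Functional"), ("OAB", "SvcHealth", "Functional"),
    ("DiskSpace", "SrvHealth", "Functional"), ("IOPs", "SrvHealth", "Functional"),
    ("Memory", "SrvHealth", "Functional"), ("CPU", "SrvHealth", "Functional"),
    ("AD", "Dependency", "Functional"), ("DNS", "Dependency", "Functional"),
]


def worst(a, b):
    """Return whichever of two states is more severe."""
    return max(a, b, key=SEVERITY.__getitem__)


def server_health_index(rows):
    """Roll components up to their category, then categories up to the server."""
    by_category = {}
    for _component, category, state in rows:
        by_category[category] = worst(by_category.get(category, "Functional"), state)
    overall = "Functional"
    for state in by_category.values():
        overall = worst(overall, state)
    return by_category, overall


print(server_health_index(ROWS))

# Hypothetical variation: pretend disk space has become degraded.
changed = [(c, g, "Degraded" if c == "DiskSpace" else s) for c, g, s in ROWS]
print(server_health_index(changed))
```

A single degraded server component is enough to pull down both its category and the server's overall index, which is the point of a rollup.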

Summary
- Simplified architecture helps make deploying high availability better than ever
- Coordinated recovery management integrated throughout Exchange
- Managed availability ties in monitoring and focuses on the experience, not just uptime
Simplified, coordinated recovery, managed availability, cloud-tested.
Simplified high availability: In summary, we’ve simplified the Exchange architecture so that it is flexible and helps you scale. This architecture makes deploying an HA solution and cross-site business continuity simpler than before, without multiple solutions or separate configuration across components.
Coordinated recovery management: Thanks to the new flexible architecture, high availability is integrated throughout Exchange; you do not have to set up separate HA solutions for different aspects of the server. When a database fails over, client access, hub transport, and mailbox data move together during database failovers or switchovers. Protocols shift to the server that is hosting the active database copy.
Managed availability: Monitoring and availability management are tightly integrated and focus not just on uptime, but on performance and the end user experience. This helps reduce management overhead and makes alerts more focused and actionable. Even better, the solution was built directly on lessons learned from Exchange Online, so it is cloud tested.

9/10/2018 © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
