Copyright© Microsoft Corporation
DAG Architecture
Components shown in the diagram:
Active Directory lookup
Replay RPC server wrapper
TPR API manager
Copy status lookup
Remote data provider wrapper
Support API manager
Replay core manager
VssWriter
Server locator manager
Seed manager
Active Manager
Health state tracker
Autoreseed manager
Active Manager RPC server wrapper
Disk reclaimer manager
Failure item manager
Witness Server Placement
Deployment Scenario: Single DAG deployed in a single datacenter
Recommendation: Locate witness server in the same datacenter as DAG members

Deployment Scenario: Single DAG deployed across two datacenters; no additional locations available
Recommendation: Locate witness server in primary datacenter

Deployment Scenario: Multiple DAGs deployed in a single datacenter
Recommendation: Locate witness server in the same datacenter as DAG members. Additional options include:
  - Using the same witness server for multiple DAGs
  - Using a DAG member to act as a witness server for a different DAG

Deployment Scenario: Multiple DAGs deployed across two datacenters
Recommendation: Locate witness server in the same datacenter as DAG members. Additional options include:
  - Using the same witness server for multiple DAGs
  - Using a DAG member to act as a witness server for a different DAG

Deployment Scenario: Single or multiple DAGs deployed across more than two datacenters
Recommendation: Locate the witness server in the datacenter where you want the majority of quorum votes to exist
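The recommendations above can be read as a simple lookup. The following is a minimal sketch in Python that only restates them; the function name `recommend_witness_location` and its two simplified inputs (number of DAGs, number of datacenters) are hypothetical and are not part of any Exchange API.

```python
def recommend_witness_location(dag_count: int, datacenter_count: int) -> str:
    """Return the witness server placement recommended for a deployment.

    Hypothetical helper for illustration; it encodes only the scenarios
    listed on this slide.
    """
    if dag_count < 1 or datacenter_count < 1:
        raise ValueError("dag_count and datacenter_count must be at least 1")
    # Single or multiple DAGs across more than two datacenters.
    if datacenter_count > 2:
        return "datacenter where you want the majority of quorum votes to exist"
    # Single DAG across two datacenters, no additional locations available.
    if dag_count == 1 and datacenter_count == 2:
        return "primary datacenter"
    # All remaining scenarios on the slide.
    return "same datacenter as DAG members"


if __name__ == "__main__":
    print(recommend_witness_location(dag_count=1, datacenter_count=2))
```

The additional options for multiple DAGs (sharing a witness server, or using a member of one DAG as the witness for another) are not modeled here.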
Dynamic Quorum
Name  DynamicWeight  NodeWeight  State
EX1   1              1           Up
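A simplified model of how dynamic weights affect the quorum calculation, written in Python. It assumes that a node which leaves the cluster while quorum still holds has its DynamicWeight set to 0, so that the majority is then calculated over the votes that remain. The class and function names are hypothetical, and the sketch ignores the witness server, tie-breaking between the last two nodes, and simultaneous failures, all of which real failover clustering handles.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    node_weight: int = 1      # configured vote (NodeWeight)
    dynamic_weight: int = 1   # vote currently counted (DynamicWeight)
    state: str = "Up"


def votes_required(nodes):
    """Majority of the votes that currently count toward quorum."""
    total = sum(n.dynamic_weight for n in nodes)
    return total // 2 + 1


def has_quorum(nodes):
    """True if the nodes that are Up hold a majority of the counted votes."""
    up_votes = sum(n.dynamic_weight for n in nodes if n.state == "Up")
    return up_votes >= votes_required(nodes)


def fail_node(nodes, name):
    """Mark one node Down; if quorum survives, stop counting its vote."""
    for n in nodes:
        if n.name == name:
            n.state = "Down"
    if has_quorum(nodes):
        for n in nodes:
            if n.state == "Down":
                n.dynamic_weight = 0


if __name__ == "__main__":
    cluster = [Node(f"EX{i}") for i in range(1, 6)]
    for failed in ("EX5", "EX4", "EX3"):
        fail_node(cluster, failed)
        print(failed, "failed; quorum:", has_quorum(cluster))
```

In this model, a five-node cluster keeps quorum through three sequential failures because the required majority shrinks after each one; with fixed weights, the two surviving nodes would hold only two of five votes.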
DAG Member Maintenance
Managed Availability
Bringing the learnings from the service to the enterprise
Monitoring based on the end user’s experience
Protect the user’s experience through recovery oriented computing
If you can’t measure it, you cannot manage it
Customer Touch Points
Availability: Can I access the service?
Latency: How is my experience?
Errors: Am I able to accomplish what I want?
“stuff breaks and the Experience does not”

1. OWA send
2. OWA failure
3. OWA fast recovery
4. OWA verified as healthy

1. OWA send
2. OWA failure
3. OWA fast recovery
4. Failover server’s databases
5. OWA verified as healthy
6. Server becomes “good” failover target (again)

[Diagram labels: LB, CAS-1, CAS-2, DAG, MBX-1, MBX-2, MBX-3, OWA, DB1, DB2]
System Level Checks
1. Mailbox Self Test (e.g. OWA MST) [detection 5m]
2. Protocol Self Test (e.g. OWA PST) [detection 20 secs]
3. Proxy Self Test (e.g. OWA PrST) [detection 20 secs]
End User Experience Level Checks
4. Customer Touch Point – CTP (e.g. OWA CTP) [detection 20m]
PROBES
The key goal is to measure the customer’s perception of the service
These are typically synthetic end-to-end customer transactions

CHECKS
The key goal is to measure actual customer traffic and become aware when customers are experiencing issues
These are typically implemented as performance counters where thresholds can be set to detect spikes in customer failures

NOTIFY
The key goal is to take action immediately based on a critical event
These are typically exceptions or conditions that can be detected without a large sample set
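The CHECKS idea above, a counter with a threshold that detects spikes in customer failures, can be sketched in Python as follows. This is an illustration only: the class name, the sliding window of recent requests, and the threshold values are assumptions, not values or interfaces taken from Managed Availability.

```python
from collections import deque


class FailureRateCheck:
    """Tracks recent request outcomes and flags a spike in failures."""

    def __init__(self, window_size: int = 100, threshold: float = 0.10):
        # Only the most recent `window_size` outcomes are kept.
        self.samples = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, succeeded: bool) -> None:
        """Record the outcome of one real customer request."""
        self.samples.append(succeeded)

    def is_spiking(self) -> bool:
        """True when the failure rate in the window exceeds the threshold."""
        if not self.samples:
            return False
        failures = sum(1 for ok in self.samples if not ok)
        return failures / len(self.samples) > self.threshold


if __name__ == "__main__":
    check = FailureRateCheck(window_size=10, threshold=0.2)
    for outcome in [True] * 7 + [False] * 3:
        check.record(outcome)
    print("spike detected:", check.is_spiking())
```

Unlike a probe, this check never generates traffic of its own; it only observes outcomes that are reported to it.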
Monitors query the data collected by the probes and determine if an action needs to occur based on a rule set
Depending on the rule, a monitor can escalate or initiate a responder
Monitors can be Healthy, Degraded, Unhealthy, Repairing, Disabled, or Unavailable
Each monitor defines the time from failure at which a responder is executed
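One way to picture how a monitor turns probe results into a state is the Python sketch below. It is not the Managed Availability implementation: the function name, the transition table, and the choice of Degraded at 0 seconds and Unhealthy at 30 seconds after the first failed probe are all assumptions made for illustration, and the Repairing, Disabled, and Unavailable states are left out.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical transition table: once this much time has passed since the
# first failure, the monitor enters the named state.
TRANSITIONS = [
    (timedelta(seconds=0), "Degraded"),
    (timedelta(seconds=30), "Unhealthy"),
]


def monitor_state(time_since_failure: Optional[timedelta]) -> str:
    """Return the monitor state for a given time since the first failure.

    `None` means the probes are currently reporting no failure.
    """
    if time_since_failure is None:
        return "Healthy"
    state = "Healthy"
    for threshold, name in TRANSITIONS:
        if time_since_failure >= threshold:
            state = name
    return state


if __name__ == "__main__":
    for seconds in (None, 5, 45):
        elapsed = None if seconds is None else timedelta(seconds=seconds)
        print(seconds, monitor_state(elapsed))
```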
A responder is a “plug-in” that executes a response to an alert generated by a monitor
There are several types of responders:
  Restart Responder – Terminates and restarts service
  Reset AppPool Responder – Cycles IIS application pool
  Failover Responder – Takes a MBX server out of service
  Bugcheck Responder – Initiates a bugcheck of the server
  Offline Responder – Takes a protocol on a machine out of service
  Online Responder – Places a machine back into service
  Escalate Responder – Escalates an issue
  Specialized Component Responders
Built-in sequencing mechanism to control recovery actions
[Diagram: Managed Availability pipeline]
Stages: Sampling, Detection, Recovery
Probe: Probe Definition, Probe Results (Samples)
Monitor: Monitor Definition, Monitor Results (Alerts)
Responder: Responder Definition, Responder Results (Responses)
Notification Item
Monitor States: Healthy, T1 (00:00:00), T2 (00:00:10), T3 (00:00:30)
Sequenced HA Responder Pipeline Example (Named Times): Restart Responder, Reset AppPool Responder, Failover Responder, Bugcheck Responder, Offline Responder, Escalate Responder
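The sequenced responder pipeline can be sketched in Python as a list of named times. The times T1 to T3 (0, 10, and 30 seconds) come from the slide; which responder is attached to which named time is an assumption here, as are the function and variable names.

```python
from datetime import timedelta

# (named time, time since failure, responder). The responder assigned to
# each named time is an assumption made for this example.
PIPELINE = [
    ("T1", timedelta(seconds=0), "Restart Responder"),
    ("T2", timedelta(seconds=10), "Failover Responder"),
    ("T3", timedelta(seconds=30), "Bugcheck Responder"),
]


def responders_due(time_since_failure: timedelta, recovered: bool = False):
    """List the responders whose named time has been reached, in order.

    Once the monitor reports recovery, no further responders run.
    """
    if recovered:
        return []
    return [name for _, at, name in PIPELINE if time_since_failure >= at]


if __name__ == "__main__":
    print(responders_due(timedelta(seconds=15)))
```

Ordering the responders by time means the least disruptive action is tried first, and a more drastic one runs only if the failure persists to the next named time.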
Scott Schnoll Microsoft Corporation