Published by Anna Lawson. Modified over 9 years ago.
1
High Availability and Failover Clusters in Exchange Server 2007
2
What We Will Cover
In this session you will learn how Exchange Server 2007 enables in-the-box high-availability solutions with fast, one-click recovery. This session covers advances to availability management on clustered and non-clustered Exchange servers. Learn about the improvements made to high availability in Exchange Server 2007, including Local Continuous Replication, Cluster Continuous Replication, and Single Copy Clusters.
3
Exchange Server 2007 High Availability Goals
Deliver data and service availability solutions
Decrease deployment and operational costs
Enable options for more Exchange customers
Improve solution behavior
Enable large, low-cost mailboxes (> 1 GB)
4
Introducing Continuous Replication (aka Log Shipping)
A data outage is expensive to recover from
  Restoring from backup takes a long time
  There may be significant data loss
Keep a copy of the data
  It has to be up-to-date
Two configurations
  A copy of the data on the same machine (LCR)
  A copy of the data on a different machine (CCR)
5
[Diagram: LCR on a standalone Exchange server (Store, DB, and Copy on one machine) and CCR on a cluster (Store and DB on the active node, Copy on the passive node)]
6
Continuous Replication Theory
Make a copy of the data
As the original data is modified, make the same modifications to the copy
  Less expensive than re-copying all the data
This provides an up-to-date copy of the data
7
ESE Logging
A logfile is a list of database modifications
  Physical changes are logged
Used for recovery after a crash
Basic technology is industry standard
  Lots of subtleties
8
Logging a Database Update
E00.log
  Update: Page 25: Mark message 8 read
  Insert: Page 1376: Message 101
  Delete: Page 1376: Delete message 101
  Insert: Page 1030: Message 103
  Insert: Page 1029: Message 102
  Insert: Page 1030: Message 104
  Insert: Page 367: Message 105
  …
priv1.edb
  Page 367: Message 105 (To: AUser@microsof.. Subject: Update. What is the latest status on LLR Perf?)
9
Implementing Replication
Make a copy of a database
As log records are created, apply the changes in them to the copy
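The idea on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ESE: the `LogRecord` type, the dictionary of pages, and `apply_record` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One physical change: set the contents of a page (hypothetical format)."""
    page: int
    contents: str


def apply_record(db: dict, record: LogRecord) -> None:
    # Applying a record means writing the logged contents to the page.
    db[record.page] = record.contents


# The original database, and a copy seeded from it.
original = {25: "message 8 (unread)"}
db_copy = dict(original)

# Changes to the original are logged as they happen.
log = [
    LogRecord(25, "message 8 (read)"),
    LogRecord(367, "message 105"),
]
for record in log:
    apply_record(original, record)

# Replaying the same records against the copy brings it up to date.
for record in log:
    apply_record(db_copy, record)

print(db_copy == original)  # prints: True
```

The copy never sees the original's pages directly; it only needs the log records, which is why shipping logs is cheaper than re-copying the database.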
10
[Diagram: the logfile record "Insert: Page 367: Message 105" is applied to both the Original and the Copy, so page 367 holds the same Message 105 (To: AUser@microsof.. Subject: Update. What is the latest status on LLR Perf?) in each]
11
LCR/CCR Basic Architecture
Exchange store runs normally
Replication service keeps a copy of the database up-to-date
  Copies and replays log files
Cluster service provides failover
  Move network identity (client transparency)
LCR failover is manual
  Restore-StorageGroupCopy task
12
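[Diagram: LCR on a standalone Exchange server (Store, DB, Replication Service, and Copy on one machine) and CCR on a cluster (Store and DB on the active node; Replication Service and Copy on the passive node); log records flow from the store to the Replication Service in both]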
13
Basic Replication Pipeline
[Diagram: the Store writes the Source DB and the Source Log Directory; the Replication Service moves log files from the Source Log Directory to the Inspector Directory, then to the Target Log Directory, and replays them into the DB Copy]
14
ESE Logfiles
Each storage group has a number (00 for the first one)
Each log has a generation number, starting with 1
Exx.log (e.g. E00.log) is the current log file
A full Exx.log is renamed using its generation number
Sample log stream
  E0000000001.log
  E0000000002.log
  E00.log (current log file)
When Exx.log is filled
  E0000000001.log
  E0000000002.log
  E00.log renamed to E0000000003.log
  New E00.log created
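The renaming scheme above can be illustrated with a short Python sketch. The helper names `closed_log_name` and `roll_over` are hypothetical, and the eight-digit hexadecimal formatting of the generation number is an assumption that agrees with the sample names for small generations.

```python
def closed_log_name(prefix: str, generation: int) -> str:
    # Assumption: the generation is written as eight hexadecimal digits
    # after the storage group prefix (e.g. "E00").
    return f"{prefix}{generation:08X}.log"


def roll_over(prefix: str, closed: list, current_generation: int) -> int:
    """Rename the full current log and return the new current generation."""
    closed.append(closed_log_name(prefix, current_generation))
    return current_generation + 1


# Sample log stream from the slide: two closed logs, E00.log is generation 3.
closed_logs = ["E0000000001.log", "E0000000002.log"]
current = roll_over("E00", closed_logs, 3)

print(closed_logs[-1])  # prints: E0000000003.log
print(current)          # prints: 4
```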
15
Logging Details
Logged changes are physical, not logical
  Delivering one message is actually many low-level physical operations
Logging is write-ahead
  The database page is modified in memory
  The log is written to disk
  The database page is written to disk
Changes are cumulative
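The write-ahead ordering on this slide can be sketched as a toy Python class. It is a minimal illustration, not ESE: `WriteAheadStore` and its methods are hypothetical names, and the in-memory lists stand in for the log and database files.

```python
class WriteAheadStore:
    """Toy store that enforces log-before-page ordering."""

    def __init__(self):
        self.cache = {}        # pages modified in memory
        self.pending = []      # log records not yet flushed
        self.log_on_disk = []  # log records flushed to disk
        self.db_on_disk = {}   # pages flushed to the database file

    def modify(self, page, contents):
        # Step 1: change the page in memory and queue a log record.
        self.cache[page] = contents
        self.pending.append((page, contents))

    def flush_log(self):
        # Step 2: the log reaches disk first.
        self.log_on_disk.extend(self.pending)
        self.pending.clear()

    def flush_page(self, page):
        # Step 3: a page may be written only after its log records are.
        if any(p == page for p, _ in self.pending):
            raise RuntimeError("log record not yet on disk")
        self.db_on_disk[page] = self.cache[page]


store = WriteAheadStore()
store.modify(367, "message 105")
store.flush_log()
store.flush_page(367)
print(store.db_on_disk)  # prints: {367: 'message 105'}
```

Calling `flush_page` before `flush_log` raises an error in this sketch, which is the property that makes crash recovery from the log possible.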
16
Log Copying
A ‘pull’ model
Exchange server creates logfiles normally
Logfiles are copied by Replication service
  Exxnnnnnnnn.log files copied as they appear
Exx.log is copied for handoff/failover
  If it can’t be copied, the loss setting is consulted
17
Log Verification
Log files are copied to the Inspector directory
Checksum and signature are verified
  Checksum failures cause a log file to be recopied
  If a log file can’t be copied, a re-seed is required
Log file is moved to the log directory after successful inspection
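A rough Python sketch of the inspect-then-move step follows. It rests on loud assumptions: SHA-256 stands in for the ESE checksum (which it is not), the expected digest is simply passed in, and `inspect_log` is a hypothetical helper rather than anything the Replication service exposes.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def inspect_log(log_file: Path, expected_digest: str, log_dir: Path) -> bool:
    """Verify a copied log; move it to the log directory if it passes."""
    digest = hashlib.sha256(log_file.read_bytes()).hexdigest()
    if digest != expected_digest:
        log_file.unlink()  # discard the bad copy so it can be recopied
        return False
    shutil.move(str(log_file), str(log_dir / log_file.name))
    return True


with tempfile.TemporaryDirectory() as root:
    inspector = Path(root, "inspector")
    log_dir = Path(root, "logs")
    inspector.mkdir()
    log_dir.mkdir()

    # A log file arrives in the inspector directory.
    copied = inspector / "E0000000001.log"
    copied.write_bytes(b"log records")
    good = hashlib.sha256(b"log records").hexdigest()

    passed = inspect_log(copied, good, log_dir)
    moved = (log_dir / "E0000000001.log").exists()
    print(passed, moved)  # prints: True True
```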
18
Log Replay
Apply changes in log files to database copy
A special recovery mode
  Different from ‘eseutil /r’
  Undo phase is skipped
If possible, log files are replayed in batches
  Improves performance
19
Get-StorageGroupCopyStatus
LastLogCopyNotified
  Last generation seen in the source directory
LastLogCopied
  Last generation copied by Replication service
  Copied to inspector directory
LastLogInspected
  Last generation inspected
  Moved to log file directory
LastLogReplayed
  Last generation replayed into the database copy
Available through performance monitor
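One way to read these four counters is as positions along the pipeline, so the gaps between them show where log files are waiting. The Python sketch below illustrates that arithmetic only; the `CopyStatus` class, its method names, and the sample numbers are hypothetical and are not output of the cmdlet.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    last_log_copy_notified: int
    last_log_copied: int
    last_log_inspected: int
    last_log_replayed: int

    def copy_backlog(self) -> int:
        # Logs seen in the source directory but not yet copied.
        return self.last_log_copy_notified - self.last_log_copied

    def inspect_backlog(self) -> int:
        # Logs copied to the inspector directory but not yet inspected.
        return self.last_log_copied - self.last_log_inspected

    def replay_backlog(self) -> int:
        # Logs inspected but not yet replayed into the database copy.
        return self.last_log_inspected - self.last_log_replayed


status = CopyStatus(120, 118, 118, 110)
print(status.copy_backlog(), status.inspect_backlog(), status.replay_backlog())
# prints: 2 0 8
```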
20
Replication Pipeline
[Diagram: Store, Source DB, Source Log Directory, Inspector Directory, Copy Log Directory, and DB Copy, with the Replication Service between stages; the stages are annotated with LastLogCopyNotified, LastLogCopied, LastLogInspected, and LastLogReplayed]
21
CCR Failover
Cluster service monitors the resources
  Failure detection is not instantaneous
IP Address or Network Name resource failures cause failover
  A machine, or network access to it, has failed completely
Exchange service failure or timeout doesn’t cause failover
  The service is restarted on the same node
Database failure doesn’t cause failover
  Don’t want to move 49 databases because 1 failed
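The rules on this slide amount to a small decision table, sketched here in Python. The enum names and the `react` function are hypothetical; the real decisions are made by the cluster service, and the "stay" outcome for a database failure is this sketch's reading of "doesn't cause failover".

```python
from enum import Enum


class Failure(Enum):
    IP_ADDRESS = "IP Address resource failure"
    NETWORK_NAME = "Network Name resource failure"
    EXCHANGE_SERVICE = "Exchange service failure or timeout"
    DATABASE = "database failure"


class Action(Enum):
    FAIL_OVER = "move to the passive node"
    RESTART_SERVICE = "restart the service on the same node"
    STAY = "stay on the active node"


def react(failure: Failure) -> Action:
    # Only a lost machine or lost network access moves the server.
    if failure in (Failure.IP_ADDRESS, Failure.NETWORK_NAME):
        return Action.FAIL_OVER
    if failure is Failure.EXCHANGE_SERVICE:
        return Action.RESTART_SERVICE
    # One failed database does not justify moving all the others.
    return Action.STAY


for failure in Failure:
    print(f"{failure.value}: {react(failure).value}")
```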
22
CCR File Share
Replication service runs remotely but needs access to log files
Share created on the active node
  Readable by ‘Exchange Servers’ group
  Machine accounts of all Exchange servers
  Run as LocalSystem to access the share
‘Exchange Servers’ group granted R/O access to files
  CCR servers only
23
CCR File Share Permissions
This is normal! (Permissions are very restrictive)
24
Move-ClusteredMailboxServer
Passive node copies log files
  Exx.log is in use
On move, Exx.log is copied
Designations are now reversed
[Diagram: log streams on Node 1 and Node 2 (closed logs E00 0000 0001 onward plus the current E00.log, generations 2 through 6) as the Active designation moves between the nodes]
25
Lossy Failover
Failover without copying all log files
  Passive DB is not completely up-to-date
Log generation numbers are reused
  Log files have different content!
  Databases are different!
[Diagram: log streams on Node 1 and Node 2 after a lossy failover; both nodes hold logs for generation 5 with different content]
26
Divergence
When the copy has information not in the original, it has diverged
Divergence may be in database or log files
Lossy failover will produce a divergence
‘Split-brain’ operation on a cluster also causes divergence
  Even if clients can’t connect, background maintenance still modifies the database
Administrator error can cause divergence!
  e.g. running eseutil /r
27
Detecting Divergence
Replication service detects divergence
Comparing databases would be too slow
Compare the last log file on the copy to the same log file on the active
  Runs when the store creates a new log file
Log file headers contain creation times which chain them together
Only replace Exx.log with a log file which is a superset
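The header comparison can be sketched in Python as follows. The `LogHeader` fields are a hypothetical simplification of a real ESE log header, the timestamps are made up, and treating a missing generation on the active as divergence is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogHeader:
    generation: int
    created: str           # creation time of this log file
    previous_created: str  # creation time of the preceding log file


def is_diverged(copy_last: LogHeader, active_logs: dict) -> bool:
    """Compare the copy's last log with the same generation on the active."""
    active = active_logs.get(copy_last.generation)
    # Assumption: if the active has no such generation, treat it as diverged.
    # The same generation number with a different header means the two
    # files were created separately and hold different content.
    return active is None or active != copy_last


# Generation 5 was reused on the new active after a lossy failover.
copy_gen5 = LogHeader(5, "10:04", "10:03")
active = {5: LogHeader(5, "10:09", "10:03")}
print(is_diverged(copy_gen5, active))  # prints: True

# With identical headers, the streams still agree.
print(is_diverged(copy_gen5, {5: copy_gen5}))  # prints: False
```

Comparing one header is cheap, which is the point the slide makes against comparing whole databases.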
28
Divergence Detection
Node 1 compares its last full log file with Node 2
[Diagram: log streams on Node 1 and Node 2, with Node 2 active; the compared log files are marked as not equal (≠)]
29
Replacing Exx.log
The two E00.logs are different generations
E00.log on the passive is a subset of E00…5.log
  Delete E00.log
  Copy E00...5.log
[Diagram: log streams on Node 1 and Node 2; the passive node's E00.log is marked as a subset (≤) of the active node's closed log file]
30
Correcting Divergence
Reseed will always work
  Expensive for large databases
Look at the common case
  Lossy failover
  Only a few log files are lost
Solution
  Decrease log file size to reduce data loss
  Use Lost Log Resilience (LLR)
31
Lost Log Resilience
Remember a log record is written before the modified database page
  But the database page can be written as soon as that happens
Delay writing the page to the database until 1 or more log generations are created
A new ESE feature in Exchange 2007
32
Log Stream Landmarks
Checkpoint/Minimum Log Required
  Recovery has to start from this point
Waypoint/Maximum Log Required
  The last log file actually required for recovery
  No modifications after the waypoint have been written to the database
Committed/Log Committed
  Last log file generated
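A small Python sketch shows how the landmarks partition a log stream. The function names and the generation numbers are hypothetical; the only rule taken from the slide is that nothing after the waypoint has been written to the database.

```python
def classify_logs(generations, checkpoint, waypoint):
    """Split a log stream into logs required for recovery and later logs."""
    required = [g for g in generations if checkpoint <= g <= waypoint]
    beyond_waypoint = [g for g in generations if g > waypoint]
    return required, beyond_waypoint


def database_unaffected(lost_generations, waypoint):
    # Logs past the waypoint have not been written to the database file,
    # so losing only those leaves the database without any of their changes.
    return all(g > waypoint for g in lost_generations)


# Checkpoint at generation 3, waypoint at 5, last committed log is 7.
required, beyond = classify_logs(range(1, 8), checkpoint=3, waypoint=5)
print(required)  # prints: [3, 4, 5]
print(beyond)    # prints: [6, 7]

print(database_unaffected([6, 7], waypoint=5))  # prints: True
print(database_unaffected([5, 6], waypoint=5))  # prints: False
```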
33
Standalone Server
Data Availability Problems:
  Data outages expensive to recover
  Significant data loss (hours?)
  Today replication requires integration of partner products
LCR
Key Characteristics
  One machine
  Enabled per storage group
  Two copies, Replay
  One datacenter
  Easy configuration
[Diagram: Logs and DBs on a standalone server; Logs and DBs under LCR]
34
Local Continuous Replication
Other requirements and behaviors
  Manual activation per storage group
  Resource costs
  Range of configurations
  Variety of backup options
  Reduced Backup TCO
  Configuration limitations
Benefits
  Enables recovery in minutes
  Enables recovery without loss
  Decreases backup costs
  Enables large mailbox
  Enables I/O offload
[Diagram: two sets of Logs and DBs]
35
Exchange Server Clusters
Exchange Server 2003
  Requires shared storage
  Single copy of mailbox data
  Transport, OWA, and Mailbox cluster aware
  Up to 8 node active/passive
  2 Node active/active
Exchange Server 2007 (Single Copy Cluster)
  Requires shared storage
  Single copy of mailbox data
  Mailbox Only
  Simple redundancy for Edge, Hub Transport, Client Access, and UM
  Up to 8 node active/passive
  Active/active cut
Improvements in: Install, Management, Behavior
[Diagram: cluster nodes with shared quorum (Q), DB, and Logs; roles labeled SMTP, MB, and OWA for one cluster and MB for the other]
36
SCC Resource Model
[Diagram: cluster resource models labeled Beta 2 and RTM (expected), showing the resources IP Address, Network Name, Physical Disk, System Attendant, Information Store, HTTP (DAV), Mailbox, and Database 3]
37
Single Copy Cluster
Lacks full redundancy
  Quorum and Exchange levels
Deployment and operational complexity
Cost
Recovery time after corruption or data failure varies based on backup technology
Two datacenter solution requires integration of partner technology
Cluster continuous replication was created to address these limitations
[Diagram: cluster with shared quorum (Q), DB, and Logs; Mailbox (MB) role]
38
Cluster Continuous Replication
Two node cluster
  Witness on Hub Transport
  Configurable heartbeat retries
Two copies
Clustered
Automatic recovery
Server HCL
Full redundancy
Replay
1 or 2 datacenters
[Diagram: two nodes, each with its own quorum copy (Q), DB, and Logs, plus a File Share witness (KB 921181)]
39
Cluster Continuous Replication Example
Online seed
Copy and verify logs
Advance DB by playing logs
Updated DB
Incremental Reseed
[Diagram: Active and Passive nodes exchanging log files through the shares \\node1\GUID and \\node2\GUID; closed logs E0000000011.log through E0000000014.log plus the current E00.log, with the roles reversed in the second half of the sequence]
40
Cluster Continuous Replication
Other requirements and behaviors
Outage Management
  Easy-to-use scheduled outage support
  Automatic recovery of an unscheduled outage
  Automatic database mount dial
  Transport dumpster
Symmetric failover
Resource requirements (no penalty)
Variety of backup options
  Reduced backup TCO
Configuration limitations
[Diagram: two nodes, each with its own quorum copy (Q), DB, and Logs, plus a File Share witness (KB 921181)]
41
Benefits of CCR
Fast recovery from data problems on the active node
No single point of failure
More flexibility in hardware selection
  Direct Attached Storage
  No cluster validation
Simplified storage requirements
Exchange-provided database replication solution
Enables a single Mailbox server failover to a second data center
Simplified deployment
Improved management experience
Ability to offload workload
[Diagram: two nodes, each with its own quorum copy (Q), DB, and Logs, plus a File Share witness (KB 921181)]
42
Exchange Server 2007 High Availability Takeaways
Delivers standalone and clustered solutions
Decreases deployment and operational costs
Enables HA options for more Exchange customers
Improves solution behavior
Enables large, low-cost mailboxes (1 GB+)
43
See LCR and CCR in Action!
44
Blogcast: LCR
http://msexchangeteam.com/archive/2006/05/24/427788.aspx
45
Blogcast: CCR
http://msexchangeteam.com/archive/2006/08/09/428642.aspx
47
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.