2 Exchange Design Concepts and Best Practices
BRK3131 Exchange Design Concepts and Best Practices Boris Lokhvitsky MCM | Exchange Principal Consultant / Delivery Architect Microsoft Consulting Services

3 Agenda Architecture Concepts and Design Principles
4 Architecture Concepts and Design Principles
5 Hardware vs. Software powered solution
Exchange 2003: shared infrastructure with redundant hardware components
Exchange 2010/2013/2016: commodity building blocks with software-controlled redundancy
New architecture and design principles
How much disk performance does an Exchange database need?
[Figure: I/O meters]

6 Exchange Design Principles
In the modern Exchange world, software, not hardware, powers and controls the solution
Availability: reduce complexity, simplify the solution
- Decrease the number of system dependencies to improve availability and lower the risks
- Use native capabilities where possible, as this makes the design simpler
- Deploy redundant solution components to increase availability and protect the solution
- Avoid failure domains: do not group redundant solution components into blocks that could be impacted by a single failure
Functionality / Productivity: enable and enhance user experience
- Provide functionality and access that is required or expected by the end users
- Provide large low-cost mailboxes
- Use Exchange as a single data repository
- Increase value with Lync and SharePoint integration
- Build a bridge to the cloud: ensure feature-rich cloud integration and co-existence
Operations: optimize People and Process, not just Technology
- Decrease complexity of team collaboration by leveraging solution / workload focused teams
- Simplify / optimize the administration / monitoring / troubleshooting process
Cost: reduce / minimize total cost of ownership (TCO) for the solution
- Use commodity hardware and leverage native product capabilities
- Implement a storage solution that minimizes cost, complexity, and administrative overhead

7 Availability and Reliability
Failures *do* happen!
Critical system dependencies decrease availability
- Deploy multi-role servers
- Avoid intermediate and extra components (e.g. SAN; network teaming; archiving servers)
- Simpler design is always better: KISS
Redundant components increase availability
- Multiple database copies
- Multiple balanced servers
Failure domains combining redundant components decrease availability
- Examples: SAN; blade chassis; virtualization hosts
Software, not hardware, is driving the solution
- Exchange-powered replication and Managed Availability
- Redundant transport and Safety Net
- Load balancing and proxying to the destination
Availability principles: DAG beyond the “A”
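The contrast between critical dependencies and redundant components can be illustrated with standard availability arithmetic. This is a minimal sketch, not material from the session: it assumes component failures are independent, and the 99% figures are made-up inputs chosen only for illustration.

```python
from math import prod


def series_availability(components):
    """Availability when every component is required (critical dependencies)."""
    return prod(components)


def parallel_availability(components):
    """Availability when any one of the redundant components is sufficient."""
    return 1 - prod(1 - a for a in components)


if __name__ == "__main__":
    # Hypothetical inputs: each component is available 99% of the time.
    chain = series_availability([0.99, 0.99, 0.99])      # three critical dependencies
    copies = parallel_availability([0.99, 0.99, 0.99])   # three redundant copies
    print(f"three critical dependencies: {chain:.6f}")
    print(f"three redundant components:  {copies:.6f}")
```

Under these assumptions, every added dependency lowers the result while every added redundant component raises it. A shared failure domain breaks the independence assumption, which is why grouping redundant components behind one SAN or chassis loses the benefit.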

8 Shared Infrastructure and Failure Domains
Classical shared infrastructure design introduces numerous critical dependency components Relying on hardware requires expensive redundant components Failure domains reduce availability and introduce significant extra complexity

9 Building Block Architecture
Inexpensive commodity servers and storage
- Scale the solution out, not up; more servers mean better availability
- Nearline SAS storage: provide large mailboxes by using large low-cost drives
- Exchange I/O reduced by 93% since Exchange 2003
- Exchange 2013 database needs ~10 IOPS; a single Nearline SAS disk provides ~60 IOPS; a single 2.5” 15K rpm SAS disk provides ~230 IOPS
Redundancy and availability provided and controlled by Exchange, not by infrastructure
- 3+ redundant database copies eliminate the need for RAID and backups
- Redundant servers eliminate the need for redundant server components (e.g. NIC teaming or MPIO)
DAG is the ultimate building block, allowing you to scale the solution
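The IOPS figures above suggest a quick back-of-the-envelope check of how many databases a single disk could serve on performance grounds alone. The sketch below uses only the approximate numbers from the slide. It deliberately ignores capacity, peak load, and reseed headroom, so treat it as an illustration rather than a sizing method.

```python
def databases_per_disk_by_iops(disk_iops, database_iops):
    """Whole number of databases one disk can serve, judged by IOPS only."""
    return disk_iops // database_iops


if __name__ == "__main__":
    DATABASE_IOPS = 10   # ~10 IOPS per Exchange 2013 database (from the slide)
    NL_SAS_IOPS = 60     # ~60 IOPS per Nearline SAS disk (from the slide)
    SAS_15K_IOPS = 230   # ~230 IOPS per 2.5" 15K rpm SAS disk (from the slide)

    print("NL-SAS disk: ", databases_per_disk_by_iops(NL_SAS_IOPS, DATABASE_IOPS))
    print("15K SAS disk:", databases_per_disk_by_iops(SAS_15K_IOPS, DATABASE_IOPS))
```

By this rough measure even a slow, large Nearline SAS disk has several times the performance one database needs, which is the argument for choosing capacity over speed.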

10 Modern Server: Commodity Hardware
Google, Microsoft, Amazon, and Yahoo! have used commodity hardware for 10+ years already
- Not only for messaging but for other technologies as well (it started with search, actually)
- Inexpensive commodity server and storage as a building block
- Easily replaceable, highly scalable, extremely cost efficient
- Software, not hardware, is the brain of the solution
Photo Credit: Stephen Shankland/CNET

11 People and Process (not just Technology) Solution focused teams
Decrease complexity of team collaboration
Simplify administration / monitoring / troubleshooting
Solution focused teams
- A traditional application team owns only a small piece of the solution / workflow
- Multiple teams must be engaged to implement the design or troubleshoot an issue
- Team organization based on the solution, not on a specific infrastructure or technology, simplifies administration / troubleshooting and reduces operational costs
Own your solution! Or… if you can’t do it right, reduce your OpEx by moving to O365 ;)

12 Exchange Product Line Architecture
Exchange PLA: a special, tightly scoped reference architecture offering from Microsoft Consulting Services
- Based on deployment best practices and collective customer experience
- Structured design based on detailed rule sets to avoid common mistakes and misconceptions
Based on cornerstone design principles:
- 4 database copies across 2 sites
- Unbound Service Site model (single namespace)
- Witness in the 3rd site
- Multi-role servers
- DAS storage with NL SAS or SATA JBOD configuration
- L7 load balancing (no session affinity)
- Large low-cost mailboxes (25/50 GB standard mailbox size)
- Enable access for all internal / external clients
- System Center for monitoring
- Exchange Online Protection for messaging hygiene
[Figure: deployment options labeled Supported, Recommended, Structured, Standardized: Exchange On-Premises Custom Design; Exchange On-Premises recommended best practices; Exchange On-Premises PLA Design; Exchange Online Public Cloud (Office 365); callout “You should be here!”]

13 How Architecture Drives Design Decisions
14 Consolidated vs. Distributed Design
Consolidated design is usually the preferred option
- Maximum reduction of server footprint, deployment costs, and administrative overhead
- Simplifies use of a single unified namespace
- Still need 2-3 datacenters for proper implementation of site resilience
Distributed design places servers closer to end users
- Optimizes client-to-server traffic (except, perhaps, in DR scenarios)
- Might be driven by regulatory compliance requirements (keep data on home soil)
- Presents challenges for site resilience / DR design / firewalls / administration
Choice between client-server and server-server traffic

15 Bound vs. Unbound Service Site Model
Unbound Service model (recommended)
- Users don’t have a preferred datacenter
- Allows use of a single unified namespace
- Uses Active/Active DAG design
- Building block: one active/active DAG
- PLA recommended for Exchange 2013
Bound Service model
- Binds user mailboxes to a preferred datacenter
- Uses site-specific namespaces
- Uses Active/Passive DAG design
- Building block: two active/passive DAGs
- PLA recommended for Exchange 2010

16 DAG Sizing: Small, Medium, Large?
DAG size is important because the DAG is the building block
General guidance is to prefer a larger DAG size
- Larger DAGs provide better availability and load balancing
- If 1 server with X active mailboxes fails in an N-node DAG, the active mailbox count on each remaining server increases only by X/(N-1)
- Proper symmetric database copy layout is important to achieve good mailbox load balancing
Larger DAGs, however, have disadvantages too
- Large DAGs are more vulnerable to network issues, as the number of network connections in an N-server DAG is N*(N-1)/2 (a 16-node DAG needs 120 P2P connections!)
- Intermittent network issues can leave databases misbalanced and cause DAG nodes to be evicted
- More impact on cluster writes and an increased failure zone
Scalability planning due to growth also impacts the decision
- Adding just a few servers to an existing DAG is hard, as it requires database copy layout changes
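Both formulas on this slide are easy to evaluate for candidate DAG sizes. The sketch below simply codes them as stated. The mailbox count is a hypothetical input, and an even redistribution across surviving nodes is assumed (which in practice depends on a symmetric copy layout).

```python
def extra_active_per_survivor(x_active, n_nodes):
    """Extra active mailboxes each surviving node takes on: X / (N - 1)."""
    return x_active / (n_nodes - 1)


def p2p_connections(n_nodes):
    """Number of node-to-node connections in an N-node DAG: N * (N - 1) / 2."""
    return n_nodes * (n_nodes - 1) // 2


if __name__ == "__main__":
    X = 3000  # hypothetical number of active mailboxes on the failed server
    for n in (4, 8, 16):
        print(f"{n:2d} nodes: +{extra_active_per_survivor(X, n):.0f} mailboxes per survivor, "
              f"{p2p_connections(n)} connections")
```

The two columns move in opposite directions as N grows: the failure impact per survivor shrinks roughly linearly, while the connection count grows quadratically.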

17 Database Copy Layout Principles
Goal: provide a symmetric database copy layout to ensure even load distribution
[Figure panels: Server3 Failure; Server6 Failure]
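One way to see what "symmetric" means is to build a toy layout and check how load moves after a failure. The sketch below is an illustrative assumption, not the layout method used by Microsoft tooling: it creates one database for every ordered selection of servers (the order is the activation preference), which guarantees evenness but produces far more databases than a real design would use. Server names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def symmetric_layout(servers, copies):
    """Map database name -> tuple of servers in activation preference order."""
    return {
        f"DB{i:02d}": order
        for i, order in enumerate(permutations(servers, copies), start=1)
    }


def active_counts(layout, failed=()):
    """Count active databases per server, activating the first surviving copy."""
    counts = Counter()
    for order in layout.values():
        survivor = next((s for s in order if s not in failed), None)
        if survivor is not None:
            counts[survivor] += 1
    return counts


if __name__ == "__main__":
    layout = symmetric_layout(["S1", "S2", "S3", "S4"], copies=2)
    print("healthy:  ", dict(active_counts(layout)))
    print("S3 failed:", dict(active_counts(layout, failed={"S3"})))
```

In this toy layout each server starts with the same number of active databases, and when any single server fails its load is spread equally over the survivors rather than landing on one neighbor.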

18 Homeless DAG
New capability in Exchange 2013: DAG without a Cluster Administrative Access Point (a.k.a. IP-less DAG)
Recommended and preferred model, default in Exchange 2016
Advantages: reduced complexity
- No need to deal with the cluster name object (CNO) computer account and permissions
- No need to reserve and manage DAG IP addresses
- Single cluster resource left (File Share Witness)
Disadvantages:
- Cannot use Failover Cluster Manager; must use PowerShell cluster commands
- Might present issues to some 3rd party applications that use the cluster name (e.g. backups); move away from those
Useful PowerShell cmdlets:
  Get-Cluster -Name DAG01 | select *
  Get-ClusterNode -Cluster DAG01 [-Name SVR01] | select *
  Get-ClusterNetwork -Cluster DAG01 [-Name DAGNetwork01] | select *
  Get-ClusterQuorum -Cluster DAG01 | fl
  Get-ClusterGroup -Cluster DAG01
  Move-ClusterGroup -Cluster DAG01 -Name "Cluster Group" -Node SVR01
  Get-ClusterLog -Cluster DAG01

19 Network: HA/SR and Replication Network
High Availability (HA) is redundancy of solution components within a datacenter
Site Resilience (SR) is redundancy across datacenters, providing a DR solution
Both HA and SR are based on native Exchange data replication
- Each database exists in multiple copies, one of them active
- Data is shipped to passive copies via transaction log replication over the network
- It is possible to use a dedicated isolated network for Exchange data replication
Network requirements for replication
- Each active → passive database replication stream generates X bandwidth
- The more database copies, the more bandwidth is required
- Exchange natively encrypts and compresses replication traffic
Pros and cons of a dedicated replication network => not recommended
- A replication network can help isolate client traffic from replication traffic
- A replication network must be truly isolated along the entire data transfer path: having separate NICs but sharing the network path after the first switch is meaningless
- A replication network requires configuring static routes and eliminating cross talk; this leads to extra complexity and increases the risk of human error
- If server NICs are 10 Gbps capable, it’s easier to have a single network for everything
- No need for network teaming: think of a NIC as JBOD
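The bandwidth statement above can be turned into a rough estimate: each active database feeds one stream to each of its passive copies. In the sketch below the per-stream rate stands in for the slide's unspecified "X", and all the numbers are hypothetical. Compression and burstiness are ignored.

```python
def replication_bandwidth(active_databases, copies_per_database, stream_mbps):
    """Outbound replication bandwidth for a server, in the units of stream_mbps.

    Each active database ships logs to (copies_per_database - 1) passive copies.
    """
    passive_copies = copies_per_database - 1
    return active_databases * passive_copies * stream_mbps


if __name__ == "__main__":
    # Hypothetical inputs: 10 active databases, 2 Mbps per stream ("X").
    for copies in (2, 3, 4):
        total = replication_bandwidth(10, copies, 2.0)
        print(f"{copies} copies per database -> {total:.0f} Mbps outbound")
```

The estimate grows linearly with the number of passive copies, which matches the slide's point that more database copies require more bandwidth.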

20 Virtualization vs. Role Consolidation
Virtualization introduces an additional critical solution component and associated performance and maintenance overhead
- Reduces availability and introduces extra complexity
- Could make sense for small deployments by helping consolidate workloads, but this introduces shared infrastructure
- Consolidated roles have been the guidance since Exchange 2010, and now there is only a single role in Exchange 2016!
- Deploying multiple Exchange servers on the same host would create a failure domain
- Hypervisor-powered high availability is not needed with proper Exchange DAG designs
- No real benefits from virtualization, as Exchange provides equivalent benefits natively at the application level

21 Storage Design Options / Challenges
Many designs are supported; there are three storage design dimensions
SAN → DAS
- SAN is NOT faster than DAS
- Reduce complexity
- No need for expensive, redundant, high-performing intermediate SAN components
- The SAN concept follows the shared infrastructure model, not the building block model
FC → SAS → SATA
- Need large disks to provide large mailboxes
- In Ex2013, IOPS requirements are reduced ~93% from Ex2003!
- A typical Ex2013 database requires ~10 IOPS
- A 7200 rpm LFF (3.5”) SATA/NL-SAS disk provides ~60 IOPS
- A 15K rpm SFF (2.5”) SAS/FC disk provides ~230 IOPS
- No need for fast but small and expensive high-performing disks
RAID → JBOD (RBOD)
- No need for disk redundancy: data redundancy is moved to the application level
- Think of Ex2013 servers as software RAID
- RAID is supported but doubles disk count (assuming RAID-10) and cost
- Enable controller caching: 75/25 write/read

22 RAID vs. JBOD with Native Replication
Conceptually similar replication: the goal is to introduce a redundant copy of the data
- Software, not hardware powered => application-aware replication
- Enables each server and its associated storage as an independent, isolated building block
- Exchange 2013 is capable of automatic reseed using a hot spare (no manual actions besides replacing the failed disk!)
- Finally, the cost factor: RAID1/0 requires 2x storage (you still want 4 database copies for Exchange availability)!
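The cost point can be made concrete by counting data disks for the same set of database copies with and without RAID-10. This is a simplified sketch: it assumes equal-size disks, a fixed number of database copies per disk, and that RAID-10 exactly doubles the disk count. The input values are illustrative.

```python
from math import ceil


def data_disks(copies_on_server, copies_per_disk, raid10=False):
    """Data disks needed on one server; RAID-10 mirrors every disk."""
    disks = ceil(copies_on_server / copies_per_disk)
    return disks * 2 if raid10 else disks


if __name__ == "__main__":
    # Illustrative input: 36 database copies hosted on a server, 4 per disk.
    print("JBOD:   ", data_disks(36, 4))
    print("RAID-10:", data_disks(36, 4, raid10=True))
```

Because the four database copies are kept in either design for availability, the mirrored disks add cost without replacing any of the application-level redundancy.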

23 Thick Risks of Thin Provisioning
Exchange mailboxes will grow, but they don’t consume that much on Day 1
- The desire not to pay for full storage capacity upfront is understood
- However, the inability to provision more storage and extend capacity quickly when needed is a big risk
- Successful thin provisioning requires significant operational maturity and process excellence unseen in the wild
Microsoft guidance and best practice is to use thick provisioning with low-cost storage
Incremental provisioning model can be considered a reasonable compromise
[Figure panels: Thick Provisioning; Thin Provisioning; Incremental Provisioning]

24 Native vs. Low Level Data Replication
Exchange continuous replication is native transactional replication (based on transaction data shipping)
- The database itself is not replicated (transaction logs are played back to the target database copy)
- Each transaction is checked for consistency and integrity before replay (hence physical corruption cannot propagate)
- Page patching is automatically activated for corrupted pages
- The replication data stream can be natively encrypted and compressed (both settings configurable; default is cross-site only)
- In case of data loss, Exchange automatically reseeds or resynchronizes the database (depending on the type of loss)
- If a hot spare disk is configured, Exchange automatically uses it for reseeding (like a RAID rebuild does)

25 Typical Exchange Disk Layout
- Two mirrored (RAID1) disks for the system partition (OS; Exchange binaries, transport queues, logs)
- One hot spare disk
- Nine or more RBOD disks (single-disk RAID-0) for Exchange databases with collocated transaction logs
- Four database copies collocated per disk, not to exceed 2 TB database size

26 Need more? Layout for more storage
There are servers that can house more than 12 LFF disks (up to 16 with a rear bay)
There are already DAS enclosures available that provide 720 TB capacity in a single 4U unit (90 x 8 TB drives)!
Scalability limit: still 100 database copies per server
- This means no more than 25 disks at 4 databases per disk, or 50 disks at 2 databases per disk
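The disk ceiling follows directly from the per-server copy limit. The short sketch below reproduces that arithmetic, taking the limit of 100 database copies per server from the slide. The function name is illustrative.

```python
MAX_COPIES_PER_SERVER = 100  # scalability limit quoted on the slide


def max_database_disks(databases_per_disk):
    """Largest number of database disks usable before hitting the copy limit."""
    return MAX_COPIES_PER_SERVER // databases_per_disk


if __name__ == "__main__":
    for per_disk in (4, 2):
        print(f"{per_disk} databases per disk -> up to {max_database_disks(per_disk)} disks")
```

A 90-drive enclosure therefore cannot be fully populated with database disks for a single server at either density; the copy limit, not the chassis, becomes the constraint.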

27 Backup and Recovery Challenge 1: Backup or Extra copy?
Exchange Native Data Protection:
- 3+ highly available database copies
- Lagged copy to protect from unlikely scenarios (logical corruption, admin error)
- Replay Lag Manager with automatic copy conversion
- What will you do with your tapes?
Challenge 2: How do I recover data without backup?
- Self Service Recovery to restore items from Recoverable Items (Deletions)
- Single Item Recovery to protect items in Recoverable Items and restore them administratively via mailbox search
- Large mailboxes can accommodate a large “dumpster”: no need for an archive mailbox or 3rd party archiving

28 Archiving: Retention vs. Compliance
Where is my archive?
- 10+ years ago, archiving enabled offloading Exchange data to lower-cost storage
- With large mailboxes on commodity storage, it does not make sense anymore
- A single data repository is better than multiple solutions and lots of PST files
What do you need archiving for?
- There’s still the in-place archive, a.k.a. online archive, which is just a second mailbox available in online mode
- Only needed based on client performance considerations or when you cannot extend mailbox capacity
- Outlook has the magic slider = no impact from a large OST file
- 1K → 20K → 100K → 1M items in critical path folders (the client has impact too); are you stubbing?
Retention
- Retention is to help users delete their data when it is no longer needed
- Retention tags and retention policies
Compliance
- Compliance is to prevent users from deleting sensitive data that might be requested by legal
- Litigation Hold, In-Place Hold to protect and preserve data at rest (in the mailbox)
- Data Loss Prevention (DLP) to protect data in transit
- Unified Compliance Center to combine Exchange/Lync and SharePoint together for eDiscovery

29 Client Breakdown
In today’s world, users access Exchange mailboxes from many clients and devices
- Cumulative client concurrency can be over 100%
Penalty factors are measured in units of load created by a single Outlook client
- Some clients generate more server load than a baseline Outlook client
- The penalty factor should be calculated as a weighted average across all types of clients
Sample client breakdown calculation (for illustration only)
Caveats of this model:
- Individual penalty factors are provided for illustration purposes only and should be adjusted based on internal test results, client configurations, vendor guidance, and other relevant factors
- The penalty factor for BES 5 is based on the performance benchmarking guide published by BlackBerry
- The penalty factor for Good is based on data published on the Good Portal
- Continue to monitor system utilization closely and adjust the sizing model as necessary as you scale out
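The sample calculation from the slide did not survive in this transcript, so the sketch below shows one plausible form of a weighted-average penalty factor. Every number and client label in it is a hypothetical placeholder, not a value from the session or from any vendor. Concurrency shares are allowed to sum to more than 100%, as noted above.

```python
def weighted_penalty_factor(clients):
    """Weighted average penalty factor; weights are client concurrency shares.

    clients: iterable of (concurrency_share, penalty_factor) pairs.
    """
    clients = list(clients)
    total_share = sum(share for share, _ in clients)
    return sum(share * factor for share, factor in clients) / total_share


def load_per_mailbox(clients):
    """Total load per mailbox, in units of one baseline Outlook client."""
    return sum(share * factor for share, factor in clients)


if __name__ == "__main__":
    # Hypothetical breakdown: (share of users connected, penalty factor).
    breakdown = [
        (0.80, 1.0),  # baseline Outlook client
        (0.50, 0.5),  # hypothetical mobile client
        (0.20, 1.5),  # hypothetical web client
    ]
    print(f"weighted penalty factor: {weighted_penalty_factor(breakdown):.2f}")
    print(f"load per mailbox:        {load_per_mailbox(breakdown):.2f}")
```

With shares summing to 150%, the load per mailbox exceeds the weighted factor, which is why cumulative concurrency matters as much as the individual factors when sizing.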

30 Exchange 2013 PLA Conceptual Design
- Four or more physical servers (collocated roles) in each DAG, split symmetrically between two datacenter sites
- Four database copies (one lagged, with Replay Lag Manager) for HA and SR on DAS storage with JBOD; minimized failure domains
- Unbound Service site model with a single unified load-balanced namespace and the Witness in the 3rd datacenter

31 Main takeaways
- Keep your design simple!
- Follow building block architecture principles
- Ensure sufficient availability
- Do your Exchange design right or go to Office 365!

