Gopal Ashok Program Manager Microsoft Corporation DAT306
Agenda Introduction to High Availability and Disaster Recovery SQL Server Always On Technologies Developing Your Availability Solution Conclusion
Introduction to High Availability and Disaster Recovery Definitions Introduce key terms and concepts Business Continuity Planning Overview of the BCP process SQL Server High Availability Planning How does BCP apply to SQL Server availability?
High Availability and Disaster Recovery: Definition High Availability High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period Availability defined in terms of service level agreements (SLA) Recovery Time Data loss during unplanned downtime A highly available application should be accessible by users x% of the time Disaster Recovery Processes and procedures designed to restore business operations due to a natural or human-induced disaster Typically involves providing redundancy spanning multiple sites or across geographic regions
Defining x and SLA Recovery Time Objective (RTO) guided by availability requirements How much downtime can you tolerate? Recovery Point Objective (RPO) guided by criticality of application data How much data can you lose? Availability Class Acceptable Downtime (hrs/yr) OR RTO Acceptable Data Loss (time of last copy) OR RPO Tier 1 >99.99% (1 hr or less) 5 min or less Tier % % ( hrs) 5 mins to 8.5 hrs Tier 3 (<99.9%) (Hours to days) Hours to days Tier1 RTO RPO Tier2 Tier3 Availability Requirements
Protection Levels Protection against resource failures Machine Database Corruption Disk Location Redundancy Building < 10 miles Local HA Regional DR Geographic DR Protection against Network Outages Site Failures Location Redundancy – City, County – < miles Protection against Natural Disasters Location Redundancy – State, Country – > miles
Business Continuity Planning Analysis Solution Design Implementati on TestingMaintenance Impact Analysis Critical Functions Threat Identification Recovery Objectives Solution Design Achieve recovery objectives for relevant threats within specified constraints like budget, human resources etc Cost\Benefit analysis of solutions Implementation Deploy the recommended solution Testing Test to see if the solution meets the recovery requirements Maintenance Yearly testing and review of procedures
SQL Server High Availability Planning Analysis Application tiers serviced by the databases Causes of database downtime Protection levels: Local HA, Regional DR, Geographic DR Solution Design Need to understand what solutions exists? What are the characteristics and cost of the solution? Implementation What are the deployment steps and best practices? Testing How do I test my implementation? Maintenance How do I monitor and maintain the solution? AnalysisSolution DesignImplementationTestingMaintenance
Database Downtime Drivers Database Downtime Unplanned Downtime Failure Protection User Errors Planned Downtime Online Administration Predictable Resourcing Analysis
Solution Design Solution ArchitectureHA CapabilitiesLimitations and CaveatsCost Vector Understand the solutions and choices before making a decision Solution Design
SQL Server Always On Technologies Solution Design
Always On Technologies Provides a full range of options to minimize downtime and maintain appropriate levels of application availability Solution Design Backup and Restore Log Shipping Database Mirroring Failover Clustering Peer-Peer Replication Increases Availability Online Index Operations Table Partitioning Enhanced Locking Resource Governor Database Snapshot Dedicated Admin Connection Dynamic Configuration Decreased Downtime
Always On Technology Overview Architecture Overview How does it work? Solution Characteristics Data Loss Guarantees Failover Characteristics Redundancy Levels and Utilization Cost Limitations and Caveats Solution Design Backup and Restore Log Shipping Database Mirroring Failover Clustering Peer-Peer Replication Increases Availability
Backup and Restore Base availability technology for any solution Protects against failures and recovery from errors Provides Local HA and Site DR Need to ensure the backups are accessible if site goes down High RTO due to restore time RPO=0 can never be guaranteed Types: Full, Differential, and Transaction Log File-group backup/restore for large databases Backup Compression provides faster and smaller backups in SQL Server 2008 Solution Design
Log Shipping Automated transaction log backup and restore provides redundancy at the database level SQLLogship.exe provides the underlying framework for doing automated backup, copy and restore Backup on primary instance Restore on secondary instance(s) Scheduling is done through SQL Server Agent jobs SQL Server 2008 provides sub-minute scheduling interval providing the ability to do quick backup and restores No automatic failover capabilities Solution Design
Database Mirroring A database level high availability solution that provides complete protection against data loss and fast recovery through automatic failover Maintains a redundant database by shipping log blocks when the transactions are committed on the principal Synchronous and Asynchronous modes provide the spectrum of options to choose between availability and performance Automatic failover when using witness server Solution Design
Failover Clustering Instance level protection built on Windows Failover Clustering shared disk model Cluster nodes typically co-located within the same site to provide local HA Regional DR possible using VLAN and stretch storage level replication No built in data redundancy like database mirroring and log shipping Data protection has to be provided at the storage level or by combining with other solutions Solution Design
Transactional Replication A high performance data replication solution that provides granular table level replication Logical data movement provides flexibility and better hardware utilization Key scenarios: Customized application-specific DR Real-time reporting on secondary server that be used for Site DR Scale out application queries with ability to use any one database copy for Site DR Two types relevant for HA and DR Transactional and Peer-to-Peer Solution Design
Always On Solution Characteristics No Data Loss (RPO=0) Failover UnitAuto Failover (RTO) InstDBTab + ** ReadMult- iple Write * * * Solutions Log Shipping DBMSync Async Cluster Transactional Replication Peer-Peer Replication RPOFailover Redundancy and Utilization Hard- ware App Perf Impact Manag- eability Low HighLow High***Low *** Low High Low High Cost Solution Design * Database Mirroring and Log Shipping can provide point in time read capability using STANDBY or database snapshots respectively ** Database Mirroring provides fastest failover to hot secondary *** Depends on SAN technology
What’s New in SQL Server 2008 New Features Resource Governor Manage SQL Server workloads and resources by specifying limits on resource consumption Backup Compression Reduce backup and restore time Feature Enhancements Database Mirroring Automatic recovery from page corruption Log stream compression Faster recovery on failover Log Shipping Sub-Minute Log Shipping Backup compression Failover Clustering 16 nodes Rolling upgrade Peer-Peer Replication Hot add new nodes
Backup Compression Common questions: “How much compression will I see?” “Will it be comparable to, say, SQL Litespeed?” One simple answer: “It depends!” All data compresses differently – the compression ratio achieved depends on: The type of data in the database Whether the data in the database is already compressed Whether the data/database is encrypted “We saw an 85 percent reduction in file size using SQL Server 2008 Backup Compression,” says Colin Neller, Senior Software Engineer at ServiceU and part of the company’s SQL Server 2008 implementation team. “A backup file that was previously over 300 GB is now only 40 GB, and the job runs in about half the time.”
Backup Compression: Backup Performance Backup of a 322 MB Adventureworks database A LOT more CPU used (avg 25%) BUT runtime = 21.6s (45% improvement) and backup stored in 76.7MB (4.2x compression ratio) Hardly any CPU used (avg 5%), runtime = 39.5s, compression ratio of 0. Uncompressed Compressed
DEMO: Increasing Availability Using Always On Technologies
Developing Your Availability Solution using SQL Server Always On Technologies Solution Design
Recap Application availability requirements or SLA drive primary solution choices RPO and RTO are the key metrics used to define the SLA Need mitigation against planned and unplanned downtimes Multiple solution choices that provides varying cost\benefits Other requirements apart from application SLA factor into the choice Understand constraints and tradeoffs you can make Database Mirroring Clustering Log Shipping Peer-Peer Replication Application Availability Unplanned downtime Planned Downtime Solution Design
Always On Solution Characteristics No Data Loss (RPO=0) Failover UnitAuto Failover (RTO) InstDBTab + ** ReadMult- iple Write * * * Solutions Log Shipping DBMSync Async Cluster Transactional Replication Peer-Peer Replication RPOFailover Redundancy and Utilization Hard- ware App Perf Impact Manag- eability Low HighLow High***Low *** Low High Low High Cost * Database Mirroring and Log Shipping can provide point in time read capability using STANDBY or database snapshots respectively ** Database Mirroring provides fastest failover to hot secondary *** Depends on SAN technology Solution Design
AdventureWorks Inc Scenario Adventureworks Inc is a manufacturing company that manufactures and sells bicycles across the world. There are a number of applications, some that are mission critical that run on multiple SQL Server Instances The DBA team is run by Darren who is responsible for deploying and managing the application databases. One of his core responsibilities is to ensure availability of all application databases in order to meet the application SLA One datacenter located in Omaha Three applications Manufacturing – Tier 1 Finance – Tier 2 Scheduling – Tier 3 Manufacturing application runs on a dedicated SQL Server 2008 Instance All other applications run on a second instance Availability of manufacturing application is critical Implement a solution at the lowest possible cost Solution Design
Application Requirements Manufacturing application has strict SLA’s Finance application requires readability on the secondary The reports are run every 4 hours and need to be fresh as of the last one hour. To offload the reporting load from the main system they would like to utilize the mirror Data Loss RPO=0 RTO in secs Failover UnitAuto Failover InstDBTab ReadMultiple Sites Read Write Applications Manufacturing Finance Scheduling Solution Design
Solution Choice for Manufacturing Application Clustering can provide a zero data loss solution that can also provide fast instance level failover Use RAID configuration to provide data redundancy on the SAN If a redundant copy is required that can provide instance failover with zero data loss use SAN replication High Cost Solution Use synchronous database mirroring if instance failover is not needed SolutionsData Loss RPO=0 Fast RTO Failover UnitAuto Failover Read> 1 Sites\ Copy Read Write InstDBTab Cluster SAN Replication DBM - Sync Log Shipping Transactional Replication Peer-Peer Replication Clustering with RAID DBM - Async Solution Design
For database level redundancy with acceptable data loss with minimal perf impact, asynchronous database mirroring is an optimal choice Use database snapshots at periodic intervals to provide a readable snapshot of the data for reporting Low cost solution DBM - Async Cluster Solution Choice for Finance Application SolutionsData Loss RPO=0 Fast RTO Failover UnitAuto Failover Read> 1 Sites\ Copy Read Write InstDBTab SAN Replication Log Shipping Transactional Replication Peer-Peer Replication Async Database Mirroring Omaha Datacenter Finance Db Snapshot every hour Reports Scheduling DBM - Sync Solution Design
Adding a Regional Datacenter Into the Mix Regulatory and compliance requirements drive the need for having a additional datacenter within a 10 mile radius to provide redundancy against site level failure. It is now required that all applications have the ability to failover to the regional datacenter across the river in Council Bluff The SLA need to be maintained for tier 1 applications even in the case of site failures Solution Design
Regional Site Solution Choices Async Database Mirroring Omaha Datacenter Finance Db Snapshot every hour Reports Scheduling Sync Mirroring no witness Log Shipping CB Datacenter Cluster with SAN Manufacturing Solution Design
A Complete Topology Considering the potential of floods and tornadoes destroying the regional data centers, Adventureworks Inc wants to maintain a disaster recovery site in San Antonio, TX The disaster recovery site has lower SLA requirements for all applications The manufacturing application can have an RPO of 1 hour The RTO is set at 4 hours Solution Design
Topology Diagram Sync Mirroring No witness Cluster with SAN Log Shipping Manufacturing Solution Design
Scale Out and Availability Scenario Adventureworks is building a new web based order management system that allows customers from all over the world access the system and place orders The core group of customers are in Western Europe, South East Asia and North America Requirements – Geo Redundancy – Data Locality – High Availability – Local Read-Scale Workload Characteristics – Mainly reads – Few writes Application Characteristics – Each user logging in connects to a particular server Partitioned based on user-id and region Writes from a user always happen on one server regardless of the region the user log in from – All reads redirected to the closest geo- location Reasonable tolerance for latency (5-10 minutes) Solution Design
Replication Topology Peer Nodes Read-Only Servers Asia1Asia2 Solution Design
Implementing and maintaining a HA solution
Licensing Facts Passive servers are mirror, log shipped secondary and clustering passive node No license required on passive if it is truly passive A passive server does not need a license if the number of processors in the passive server is equal to or less than the number of processors in the active server. The passive server can take the duties of the active server for 30 days. Afterwards, it must be licensed accordingly.
HA Features Edition Support FeatureExpressWorkgroupStandardEnterpriseComments Database Mirroring 1 Advanced high availability solution that includes fast failover and automatic client redirection Failover Clustering 2 Backup Log-shipping Data backup and recovery solution Online System Changes Includes Hot Add Memory, dedicated administrative connection, and other online operations Online Indexing Online Restore Fast Recovery Database available when undo operations begin ₁Single thread redo ₂ Limited to 2 node cluster
Summary There is no “one size fits all” solution Consider the cost\benefits\constraints and compare that to availability requirements of the organization to determine the best solution Use the charts to understand cost, benefit and constraints of the various SQL Server High Availability solutions TEST the solution to ensure it can meet the availability requirements and meet SLA’s
SQL Server Word of the Day DATA COMPRESSION Wednesday, May 13 *Game cards may be picked up at the SQL Server booths in the TLC
Additional Resources Team Forum: US/sqldisasterrecovery/threads Other: Team Forum: US/sqldisasterrecovery/threads Other: External Resources gh-availability.aspxURL SQL Server 2008 Business Value Calculator:
Sessions On-Demand & Community Resources for IT Professionals Resources for Developers Microsoft Certification and Training Resources Microsoft Certification & Training Resources Resources Required Slide Speakers, TechEd 2009 is not producing a DVD. Please announce that attendees can access session recordings at TechEd Online. Required Slide Speakers, TechEd 2009 is not producing a DVD. Please announce that attendees can access session recordings at TechEd Online.
Related Content Breakout Sessions DAT302 All You Need to Know about Microsoft SQL Server 2008 Failover Clusters DAT318 Microsoft SQL Server 2008 Virtualization Considerations and Best Practices DAT322 Tips and Tricks for Successful Database Mirroring Deployments with Microsoft SQL Server Required Slide Speakers, please list the Breakout Sessions, TLC Interactive Theaters and Labs that are related to your session. Required Slide Speakers, please list the Breakout Sessions, TLC Interactive Theaters and Labs that are related to your session. Hands On Lab DAT07-HOL Microsoft SQL Server 2008 Peer-to-Peer Replication DAT08-HOL Microsoft SQL Server 2008 Table Index and Partitioning DAT12-HOL Microsoft SQL Server 2008 Database Mirroring, Part 1 DAT13-HOL Microsoft SQL Server 2008 Database Mirroring, Part 2 DAT16-HOL Using Microsoft SQL Server 2008 Resource Governor for Predictable Performance
Complete an evaluation on CommNet and enter to win! Required Slide
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. Required Slide