Oracle DBAs Deploying Highly Available SQL Server Systems Joe Yong Chief Architect Scalability Experts Inc. jyong@scalabilityexperts.com
About This Session Goals Non-goals Pre-requisites Overview of SQL Server 2005 High Availability features Drilldown on HA implementation strategies Non-goals Deep dive into SQL Server Chest thumping Make you a HA expert Pre-requisites Experience as an Oracle DBA, Architect or Developer DBA Basic experience in designing, deployment and managing database systems that require medium to high levels of availability
Agenda What is High Availability SQL Server 2005 HA overview Solutions to common scenarios Case study Summary
What is High Availability Uninterrupted usability A running server is not necessarily available Is a factor of technology, people and processes Often measured as a percentage in “uptime” over 1 year Eg. 99.999% uptime = 5.25 minutes downtime a year Should includes both planned and unplanned downtime but many only measure unplanned You may not own every part of the equation but if you have to specify your SLA Example Online ordering system requires orders to be confirmed in 30 seconds Availability is impacted by application scalability, network and database Don’t forget security impact on HA
Agenda What is High Availability SQL Server 2005 HA technologies Hot Standby Failover Clustering Database Mirroring Warm Standby Database Mirroring – High Protection / Performance mode Replication Log Shipping Cold Standby Backup/restore Online operations Solutions to common scenarios Case study Summary
Microsoft Failover Clustering Overview * Inst1 Microsoft Failover Clustering Overview Hot Standby – Automatic failover Protects against local, limited disasters Built on Microsoft Server Clusters (MSCS) Multiple nodes provide availability, transparent to client Supports 2, 4, or 8 nodes depending on OS edition Automatic detection and failover Requires cluster certified hardware; see Windows Catalog: Clustered Supports many scenarios: Multiple Active Instances, N+1, N+I Up to 25 SQL Server instances per cluster NOT a load balancing solution Multiple Active Instances N+1: N Active, 1 Inactive Instances Inst2 * * Inst1 N+I: N Active, I Inactive Instances * Inst1 Inst3 * Inst2 *
Microsoft Failover Clustering Detail
Geographically Dispersed Clusters Same functionality and behavior as “standard” failover cluster Protects against local, total and extended disasters Requires specially certified cluster hardware from qualified vendors Requires guaranteed 500ms maximum round trip latency between nodes SQL Server does not differentiate between standard and geo-cluster Mirror Network N1 N4 N3 N2 D3 D1 Storage Controller S1 D2 D4 Storage Controller S2 Site 1 Site 2
High Availability Toolbox Disaster Recovery Features Failover Clustering Local limited disasters Local disasters With geo-clusters Extended disasters Limited Rapid failover Failure detection Automatic switchover Specialized hardware Secondary workload on standby (reporting) Meta data support Performance impact None Protect against storage failure Locations
Agenda What is High Availability SQL Server 2005 HA technologies Hot Standby Failover Clustering Database Mirroring – High Availability mode Warm Standby Database Mirroring – High Protection / Performance mode Replication Log Shipping Cold Standby Backup/restore Online operations Solutions to common scenarios Case study Summary
Database Mirroring Overview Hot Standby Provides a fault-tolerant database Building block for complex topologies Database Failover Very fast failover Less than five seconds in most cases Zero data loss Automatic or manual failover Automatic re-sync after failover Automatic, transparent client redirect Works with standard certified servers, storage and networks No location limitations No shared components; two separate copies of data
Database Mirroring How does it work? 4/25/2017 6:08 PM Database Mirroring How does it work? Mirror is always redoing – it remains current Application Witness (optional) Commit 1 5 Principal Mirror 2 SQL Server SQL Server 4 2 >2 3 >3 Log Data Log Data
Database Mirroring Witness Only required for automatic failover; Just another instance of SQL Server 2005 Can serve multiple sessions Prevents “split brain” scenario If partners do not see each other, is it due to network failure or server failure? To become Principal automatically, a server must talk to at least one other server Witness ONLY answers the question “Who do you see?”, does not promote a server to be Principal
Database Mirroring High Availability mode Safety Full; synchronous operation Commit when logged on Mirror Allows automatic failover No data loss Database available whenever quorum exists Formed by any two servers from the three; Principal, Mirror, Witness Witness is present – automatic Failover
Database Mirroring Transparent Client Redirect SQLConnection object that targets a mirrored database No application code change required Client automatically redirected if session is dropped Client library is aware of Principal and Mirror servers Upon initial connect to Principal, library caches Mirror name When client attempts to reconnect If Principal is available, connects If not, client library automatically redirects connection to Mirror If Principal is down upon first connect attempt, connection fails Workaround via explicit coding or NLB type solution Supports .NET and SNAC providers
High Availability Toolbox Disaster Recovery Features Failover Clustering Database Mirroring – HA mode Local limited disasters Local disasters With geo-clusters Extended disasters Limited Rapid failover Failure detection Automatic switchover Specialized hardware Secondary workload on standby (reporting) With DB snapshot Meta data support User DB only Performance impact None Minimal ~ low Protect against storage failure Locations Unrestricted
Agenda What is High Availability SQL Server 2005 HA technologies Hot Standby Failover Clustering Database Mirroring – High Availability mode Warm Standby Database Mirroring – High Protection / Performance mode Replication Log Shipping Cold Standby Backup/restore Online operations Solutions to common scenarios Case study Summary
Database Mirroring High Protection mode Safety Full; synchronous operation Commit when logged on Mirror No automatic failover; manual failover only Database quorum formed by Principal and Mirror If Principal loses quorum, it stops servicing the database Ensures high protection; database is never in ‘exposed’ state No Witness present – no automatic failover
Database Mirroring High Performance mode Safety Off; asynchronous operation Commit when logged on Principal No automatic failover; manual failover only Possible data loss If Mirror becomes unavailable; Principal continues working If Principal becomes unavailable; Mirror can assume workload Manual failover to Mirror is required No Witness present; no automatic failover
High Availability Toolbox Disaster Recovery Features Failover Clustering Database Mirroring – HA mode Database Mirroring – H P/P mode Local limited disasters Local disasters With geo-clusters Extended disasters Limited Rapid failover Failure detection Automatic switchover Specialized hardware Secondary workload on standby (reporting) With DB snapshot Meta data support User DB only Performance impact None Minimal ~ low Minimal Protect against storage failure Locations Unrestricted
Agenda What is High Availability SQL Server 2005 HA technologies Hot Standby Failover Clustering Database Mirroring – High Availability mode Warm Standby Database Mirroring – High Protection / Performance mode Replication Log Shipping Cold Standby Backup/restore Online operations Solutions to common scenarios Case study Summary
Transactional Replication Requires consideration at design time; cannot just “flip the switch” High performance – latency measured in seconds Some (minimal) load on the server Can be implemented at database or table level Failover possible; custom designed solution Two types Standard transactional replication Easy to design, setup & manage Subscriber (standby) can be used for reporting Peer-to-peer transactional replication Multi-master model; schema is identical on all sites Supports distributed applications with data partitioning; enables load balancing Does not handle conflicts; design to avoid/prevent conflicts
Peer-To-Peer Transactional Replication How does it work? 4/25/2017 6:08 PM Peer-To-Peer Transactional Replication How does it work? “West” “East” Distribution Agent Dist DB Logreader Distribution Agent Dist DB Logreader “South” Distribution Agent Dist DB Logreader
High Availability Toolbox Disaster Recovery Features Failover Clustering Database Mirroring – HA mode Database Mirroring – H P/P mode Transactional Replication Local limited disasters Local disasters With geo-clusters Extended disasters Limited Rapid failover Failure detection Can automate in application Automatic switchover Specialized hardware Secondary workload on standby (reporting) With DB snapshot Meta data support User DB only Some Performance impact None Minimal ~ low Minimal Low Protect against storage failure Locations Unrestricted
Agenda What is High Availability SQL Server 2005 HA technologies Hot Standby Failover Clustering Database Mirroring – High Availability mode Warm Standby Database Mirroring – High Protection / Performance mode Replication Log Shipping Cold Standby Backup/restore Online operations Solutions to common scenarios Case study Summary
Log Shipping Backup transaction log, copy to secondary server, restore transaction log backup Failover is manual Meta data management may be necessary Read operations on secondary is permitted Users are disconnected when log restore occurs Can maintain multiple secondary servers Optional Monitor server Records history and status of backup/restore jobs May be setup to raise alerts when jobs fail
High Availability Toolbox Disaster Recovery Features Failover Clustering Database Mirroring – HA mode Database Mirroring – H P/P mode Transactional Replication Log Shipping Local limited disasters Local disasters With geo-clusters Extended disasters Limited Rapid failover Failure detection Can automate in application Automatic switchover Specialized hardware Secondary workload on standby (reporting) With DB snapshot With limitations Meta data support User DB only Some Performance impact None Minimal ~ low Minimal Low Protect against storage failure Locations Unrestricted
Agenda What is High Availability SQL Server 2005 HA technologies Hot Standby Failover Clustering Database Mirroring – High Availability mode Warm Standby Database Mirroring – High Protection / Performance mode Replication Log Shipping Cold Standby Backup/restore Online operations Solutions to common scenarios Case study Summary
Backup and Restore Slowest recovery (but also simplest) Recommended as secondary or tertiary protection layer Manual failure detection and switchover Data loss possible Recommend maintaining active backups on disk; duplicate, archive and offsite backups on tape Various levels Database – full, differential, partial, differential partial, copy-only File & filegroups – “full”, differential Transaction log
Backup and Restore RESTORE VERIFY ONLY Backup media mirroring Backup and database page checksums Fine grained online repair Online restore Piecemeal restore Page-level restore Database backup does not block Log backup Backup/restore includes FullText data Copy-only – via T-SQL only
High Availability Toolbox Disaster Recovery Features Failover Clustering Database Mirroring – HA mode Database Mirroring – H P/P mode Transactional Replication Log Shipping Backup / Restore Local limited disasters Local disasters With geo-clusters Extended disasters Limited Rapid failover Failure detection Can automate in application Automatic switchover Specialized hardware Secondary workload on standby (reporting) With DB snapshot With limitations Meta data support User DB only Some Performance impact None Minimal ~ low Minimal Low Protect against storage failure Locations Unrestricted
Agenda What is High Availability SQL Server 2005 HA technologies Hot Standby Failover Clustering Database Mirroring – High Availability mode Warm Standby Database Mirroring – High Protection / Performance mode Replication Log Shipping Database Snapshot Cold Standby Backup/restore Online operations Solutions to common scenarios Case study Summary
Online Operations Backup/restore Indexing Memory allocations Full online backup Online piecemeal restore; undamaged data remains available Indexing Allows create, drop and alter while users continue to access data LOB datatype indexes not supported for online Memory allocations CPU affinity settings Database snapshots
Database Snapshot Not originally designed as a specific HA solution but works great in some situations Turning a Database Mirroring mirror into a reporting server Isolated historical data for report generation Protection in case of administrative, developer or user error; classic “Oops!” scenario Uses copy-on-write technique to reduce disk space consumption
Database Snapshot How does it work? 4/25/2017 6:08 PM Database Snapshot How does it work? CREATE DATABASE mydbSnap AS SNAPSHOT OF mydb USE mydb UPDATE (pages 4, 9, 10) mydb – Database Page 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 2 3 4 4 5 6 7 8 9 9 10 10 11 12 13 14 15 16 mydbSnap – Read-Only Database Snapshot USE mydbSnap SELECT (pages 4, 6, 9, 10, 14)
Agenda What is High Availability SQL Server 2005 HA technologies Solutions to common scenarios Case study Summary
Rolling Upgrades In three steps 4/25/2017 6:08 PM Rolling Upgrades In three steps Perform upgrades on the mirror, secondary, or subscriber Switch roles Database Mirroring Failover to the mirror Log Shipping Replication Redirect clients to subscriber Perform upgrades on the original database server Optional: Switch roles again Backup principal log with no-recovery Recover secondary Re-direct clients to secondary
Site Disaster Protection 4/25/2017 6:08 PM Site Disaster Protection Example Scenarios Earthquake, fire, or flood causes datacenter outage Solutions Database Mirroring to a secondary site Optimized solution - Allows very fast failover times to the secondary site Optionally add log shipping for additional site protection Log Shipping to one or more secondary sites Basic solution – requires additional effort for failover Third-party geo-clustering solutions for data center storage level redundancy Find SQL Server Always On reviewed solutions at the Microsoft Always On website: www.microsoft.com/SQL/AlwaysOn
Database Mirroring Configuration In three steps 4/25/2017 6:08 PM Database Mirroring Configuration In three steps Step 1: Restore database copy to mirror site with no-recovery option Step 2: Configure communication endpoints Step 3: Set the data protection level and Start Mirroring
Database Query Workload Scale Out With Redundancy 4/25/2017 6:08 PM Database Query Workload Scale Out With Redundancy Scenario Need for near real time reporting on a second server that can also be used for disaster recovery Need for a tier of identical databases for scaling out application queries with ability to use any one of the database copies for disaster recovery Solutions Transactional Replication Peer-to-Peer Replication
Putting It All Together 4/25/2017 6:08 PM Putting It All Together Failover Clustering Local server redundancy Full server/instance protection Database Mirroring Primary disaster site for databases Reporting with Snapshot Log Shipping Additional disaster sites for databases Logical recovery (with delay) Replication Database reporting and read scale out with redundancy Database Scale Out For Queries Replication Hot Standby Database Mirroring Production Database Failover Clustering Log Shipping Warm Standby Log Shipping With Restore Delay Logical Recovery Standby
Agenda What is High Availability SQL Server 2005 HA technologies Solutions to common scenarios Summary
Summary <<WiP>>
Resources www.microsoft.com/sql/ msdn.microsoft.com/sqlserver/ www.microsoft.com/technet/ www.scalabilityexperts.com www.sqldev.net www.sqlservercentral.com/
High Availability Toolbox Disaster Recovery Features Failover Clustering Database Mirroring – HA mode Database Mirroring – H P/P mode Transactional Replication Log Shipping Backup / Restore Local limited disasters Local disasters With geo-clusters Extended disasters Limited Rapid failover Failure detection Can automate in application Automatic switchover Specialized hardware Secondary workload on standby (reporting) With DB snapshot With limitations Meta data support User DB only Some Performance impact None Minimal ~ low Minimal Low Protect against storage failure Locations Unrestricted