Business Continuity for Virtual SQL Servers David Klee Founder – Heraflux Technologies #ITDevConnections
About David Klee
Founder & Chief Architect
@kleegeek | davidklee.net | heraflux.com | linkedin.com/in/davidaklee
Specialties / Focus Areas / Passions:
- Performance Tuning
- Virtualization & Cloud
- Business Continuity
- Health & Efficiency
- Capacity Management
“I fight with my infrastructure teams on my SQL Server availability strategy”
…should become…
“How can I work with my infrastructure team to avoid outages?”
Session Agenda
- Availability & Virtualization
- Backups
- High Availability
- Disaster Recovery
- Disaster Avoidance
Virtualization backups: point of contention
VM Backups
[Diagram: VM backup reads and writes across the OS, Data, Log, and TempDB disks and memory; a full copy goes to the backup, after which only changed blocks (OS', Data', Log', TempDB') are copied]
“App Aware” Backup Impact
VM Backup Impact
- VM backups should not hurt availability
- Might hurt performance
- App-aware backups
- Work with VM admins to tune
Tran Log Management
- Most VM backups do not manage transaction logs
- DON’T let them say you don’t need it!
- SQL Server transaction log management is still required
SLAs
- Point-in-time recovery?
- DB-level recovery?
- Restoration complexity?
- Document RPO & RTO & MTTR
- Validate!
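One way to make the "Validate!" step concrete is to compare the backup history you actually have against the documented RPO. The sketch below is only an illustration: it assumes you can export a list of backup completion times (for example, transaction log backups), and the function names `largest_backup_gap` and `meets_rpo` are hypothetical helpers, not part of any SQL Server or hypervisor tooling.

```python
from datetime import datetime, timedelta


def largest_backup_gap(backup_times):
    """Return the longest interval between consecutive backups.

    That interval is the worst-case window of data loss if a failure
    had hit just before the next backup completed.
    """
    ordered = sorted(backup_times)
    if len(ordered) < 2:
        raise ValueError("need at least two backups to measure a gap")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def meets_rpo(backup_times, rpo):
    """True if no gap between backups exceeded the documented RPO."""
    return largest_backup_gap(backup_times) <= rpo


# Hypothetical history: one log backup ran late.
history = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 15),
    datetime(2024, 1, 1, 0, 30),
    datetime(2024, 1, 1, 1, 10),
]
print(largest_backup_gap(history))
print(meets_rpo(history, timedelta(minutes=15)))
```

The same comparison can be repeated for RTO and MTTR by timing real restore tests instead of backup gaps.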
Virtualization availability: complementary, not competing
SQL Server Availability: Traditional Options
- Failover Clustering
- “Always On” Availability Groups
- Mirroring
- Replication
Virtualization HA (Average failover time: 2m 45s)
Virtualization HA Is…
- Unplanned outage last resort
- OS restarts
- SQL Server starts up
- No protection for planned outages
- Complementary to SQL Server HA
Placement: Two-Node WSFC
(Anti-)Affinity Rules: Two-Node WSFC
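The intent of an anti-affinity rule for a two-node WSFC is that both cluster nodes never land on the same physical host. A rough way to audit that from an inventory export is sketched below. The VM and host names are invented for illustration, and `anti_affinity_violations` is a hypothetical helper; the hypervisor's own rules remain the actual enforcement mechanism.

```python
from collections import defaultdict


def anti_affinity_violations(placement, cluster_nodes):
    """Return {host: [vms]} for every host running more than one cluster node.

    placement     -- mapping of VM name to the physical host it runs on
    cluster_nodes -- VM names that belong to the same WSFC
    """
    by_host = defaultdict(list)
    for vm in cluster_nodes:
        by_host[placement[vm]].append(vm)
    return {host: vms for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical inventory: both WSFC nodes ended up on the same host.
bad = {"SQL-A": "host1", "SQL-B": "host1", "WEB-1": "host2"}
good = {"SQL-A": "host1", "SQL-B": "host2", "WEB-1": "host2"}

print(anti_affinity_violations(bad, ["SQL-A", "SQL-B"]))
print(anti_affinity_violations(good, ["SQL-A", "SQL-B"]))
```

An empty result means the nodes are separated at the host level; the same check could be repeated with rack or storage array names to cover wider failure domains.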
Failure Domains
Architectural Considerations: SQL Server & VM HA
Documented Expectations
- Unplanned outage
- Planned outages
- Patch management
- Upgrades
- Written vs. business expectations
Time Sync
- “Time drift”
- Sync with AD
- Do not sync with host
- Verify 15 min (or less) sync period
- HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient\SpecialPollInterval (value in seconds)
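Because SpecialPollInterval is stored in seconds, the 15-minute ceiling works out to 900. The sketch below shows one possible check, assuming it runs inside the Windows guest with Python available. `read_special_poll_interval` uses the standard-library `winreg` module and is Windows-only; `sync_period_ok` is a hypothetical helper name.

```python
MAX_SYNC_SECONDS = 15 * 60  # 15 minutes, expressed in seconds

NTP_CLIENT_KEY = r"SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient"


def sync_period_ok(special_poll_interval_seconds):
    """True if the poll interval is positive and at most 15 minutes."""
    return 0 < special_poll_interval_seconds <= MAX_SYNC_SECONDS


def read_special_poll_interval():
    """Read SpecialPollInterval from the registry (Windows only)."""
    import winreg  # imported here so the module still loads on other platforms

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NTP_CLIENT_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "SpecialPollInterval")
    return value


print(sync_period_ok(900))   # exactly 15 minutes
print(sync_period_ok(3600))  # one hour
```

On a Windows guest, `sync_period_ok(read_special_poll_interval())` combines the two steps.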
WSFC Heartbeat
Reduce sensitivity:
  (Get-Cluster).SameSubnetDelay = 2000
  (Get-Cluster).SameSubnetThreshold = 20
  (Get-Cluster).CrossSubnetDelay = 2000
  (Get-Cluster).CrossSubnetThreshold = 20
  (Get-Cluster).RouteHistoryLength = 20
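To see what these settings buy you, it can help to turn them into a tolerance window. The arithmetic below assumes the usual reading of the two properties: the delay is the interval between heartbeats in milliseconds, and the threshold is the number of consecutive missed heartbeats tolerated before the cluster acts. `heartbeat_tolerance_seconds` is a hypothetical helper for illustration.

```python
def heartbeat_tolerance_seconds(delay_ms, threshold):
    """Seconds of lost heartbeats tolerated before the cluster reacts."""
    return delay_ms * threshold / 1000


# Values from the settings above.
same_subnet = heartbeat_tolerance_seconds(delay_ms=2000, threshold=20)
cross_subnet = heartbeat_tolerance_seconds(delay_ms=2000, threshold=20)

print(same_subnet)   # 40.0
print(cross_subnet)  # 40.0
```

With these values a node can miss heartbeats for roughly 40 seconds, which is the kind of headroom that helps a cluster ride out brief stalls in a virtualized environment instead of triggering an unnecessary failover.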
Isolate Traffic Streams
- Additional network adapters
- Dedicated VLANs?
- WSFC heartbeat
- AG replication
- Backups
Validate Network Perf
- Shared-everything environment
- Synchronous replication drags if the network is slow
- Use iperf to test network throughput (iperf.fr)
- How-To Guide: hfxte.ch/iperf
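If the throughput test is run repeatedly, a small wrapper can turn the result into a pass/fail check. The sketch below assumes iperf3 specifically, an iperf3 server already listening on the far end, and a TCP test; it relies on iperf3's `-J` JSON output and its `end.sum_received.bits_per_second` field. The host name and the 1 Gbit/s floor are placeholders, and both function names are hypothetical.

```python
import json
import subprocess


def received_gbps(iperf3_result):
    """Extract receiver-side throughput in Gbit/s from iperf3 JSON output."""
    return iperf3_result["end"]["sum_received"]["bits_per_second"] / 1e9


def run_iperf3(server):
    """Run an iperf3 client test against `server` and return the parsed JSON."""
    completed = subprocess.run(
        ["iperf3", "-c", server, "-J"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# Trimmed, made-up sample of the JSON shape, so the parsing can be tried offline.
sample = {"end": {"sum_received": {"bits_per_second": 9.4e9}}}
MIN_GBPS = 1.0  # placeholder floor; choose one that matches your replication needs

throughput = received_gbps(sample)
print(f"{throughput:.1f} Gbit/s, ok={throughput >= MIN_GBPS}")
```

For a live run, replace `sample` with `run_iperf3("sql-vm-b.example.com")`, substituting the real replica host.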
Best Practices
- Complementary technologies
- Do what you always do!
- Work with VM admins
- Get access into hypervisor
- Validate everything
Disaster recovery: competing, or complementary?
SQL Server Options: Traditional Options
- Failover Geo-Clustering
- “Always On” Availability Groups
- Mirroring (Deprecated 2012)
- Replication
- Log shipping
VM Options
- Replicate VM-level backups
- VM block-level replication
- Replicate DB backups
- SAN LUN-level replication
Strategy
- What are your SLAs?
- Current DR architecture
- RPO
- RTO
- Databases
- Rest of infrastructure
Pick Best Tool for the Job: VM or SQL Server DR?
[Diagram: VM block-level replication from the Primary Site to the DR Site over the WAN. SQL Server VM A replicates every 15 minutes; SQL Server VM B replicates every 60 minutes. Both sites run on virtualization.]
[Diagram: SQL Server FCI (FCI VM A and FCI VM B) at the Primary Site, placed in a consistency group with asynchronous LUN-level replication to the DR Site. Both sites run on virtualization.]
[Diagram: transaction log replication from the Primary Site to the DR Site over the WAN. SQL Server VM A replicates to SQL Server VM C every 5 minutes; SQL Server VM B replicates to SQL Server VM D every 60 minutes. Both sites run on virtualization.]
[Diagram: a WSFC spanning the Primary Site and the DR Site, hosting availability group AG01. SQL Server VMs A and B are at the Primary Site and SQL Server VMs C and D at the DR Site, connected by synchronous and asynchronous DB replication over the WAN. Both sites run on virtualization.]
Fail-Back
- Fail-over is only half the challenge
- How to handle fail-back?
Testing
- Test frequently
- Full failover & failback
- Best DR strategy ever…
VM-level DR might replace SQL Server DR
Disaster Avoidance: where we all should be heading…
Disaster Avoidance
- Multiple DCs
- No disruption in service
- “Active-Active”
Topology
- Speed & latency between DCs
- Data loss expectations vs. reality
- Speed of failover
- Compute capacity
[Diagram: a WSFC spanning the Primary Site and the Secondary Site, hosting availability groups AG01 and AG02. SQL Server VMs A and B are at the Primary Site and SQL Server VMs C and D at the Secondary Site, connected by synchronous and asynchronous DB replication over the WAN. Both sites run on virtualization.]
Conclusions
- VM HA is complementary
- VM DR might replace SQL Server DR
- Test & validate strategy
- Failover & failback
Questions?
@kleegeek | davidklee.net | heraflux.com | linkedin.com/in/davidaklee