Luis Vargas Senior Program Manager Lead Microsoft
Meet Luis Vargas Senior Program Manager Lead in SQL Server –AlwaysOn (High Availability) –SQL Server on Windows Azure VMs –Hybrid scenarios (E.g. Backup/Replicas in Azure) Over 10 years of industry experience –Program Manager at Microsoft –Developer in Financial Services
Unified solution Easy to configure, manage, and monitor Reuse existing investments SAN/DAS environments Allow using HA hardware resources Fast seamless failover Detect failures reliably Able to stand multiple failures
Pre-existent Server failover Useful in consolidation scenarios Shared storage (SAN / SMB) Depends on storage redundancy Failover takes minutes Server restart Multi-node instance Passive secondary nodes Failover Cluster Instances (for servers) New Multi-database Failover DBs that app depends on Direct attached storage Log synchronization Failover takes seconds Secondary replicas are online Multiple Secondary Replicas Active Secondary Replicas Availability Groups (for groups of databases)
Multi-Site Clusters Clusters across subnets without stretch vLAN TempDB on local disk Improved performance, better SAN utilization Indirect Checkpoints More predictable DB recovery (failover) times Flexible Failover Policy Sensitivity to failures for automatic failover Failover Cluster Instances Enhancements in SQL Server 2012
Multi-database Failover Multiple secondaries (4) Sync (max 2) / Async Compression & Encryption Manual/Automatic Failover Flexible Failover Policy Automatic Page Repair Seamless App Connectivity Configuration Wizard Monitoring Dashboard Diagnostics infrastructure System Center integration Full cross-feature support Contained Databases, FileStream, FileTable, Service Broker Active Secondaries Read workloads Backups PowerShell Automation Fast Failover Integrated Efficient Availability Groups Introduced in SQL Server 2012
Sync Log Synchronization Async Log Synchronization
Box Strategy Continue momentum in the Enterprise with deliberate box innovations and new hybrid scenarios Availability Scenario Mission Provide the best in class and cost-effective HA/DR solutions for on- premise, cloud, and hybrid
Pre-existent Server failover Useful in consolidation scenarios Shared storage (SAN / SMB) Depends on storage redundancy Failover takes minutes Server restart Multi-node instance Passive secondary nodes Availability Groups (for groups of databases) New Multi-database Failover DBs that app depends on Direct attached storage Log synchronization Failover takes seconds Secondary replicas are online Multiple Secondary Replicas Active Secondary Replicas Failover Cluster Instances (for servers) Increased Number of Secondaries Increased Availability of Readable Secondaries Add Azure Replica Wizard Support for Windows Cluster Shared Volumes E n h a n c e d D i a g n o s t i c s
SQL Server 2012: Customers using (max 4) readable secondaries to offload read workloads Single technology to configure / manage Higher throughput (~7x) than Replication Customers asking for more replicas Reduce query latency in geo-distributed environments (e.g. Bwin) Scale-out read workloads (e.g. Baltika) SQL Server 2014: Max 8 secondaries Max 2 sync secondaries for high availability Secondary delay depends on network latency and I/O ~1s within data center~5s between data centers Availability Groups Increased Number of Secondaries
Minimal performance impact Commits don’t wait for async secondaries Log sender threads share log pool Added transaction latency of 8 async secondaries: <1% Scoped-out: Load balancing via connection string Read_Only connections still routed to first available readable secondary Load balancing possible via DNS round-robin or specialized load balancers (e.g. NLB) Availability Groups Increased Number of Secondaries
SQL Server 2012: Read workloads killed during network failures Geo-distributed environments (e.g. failure/upgrade of network equipment, ISP failures) Hybrid (on-premise to Azure) deployments SQL Server 2014: read workloads not impacted during network failures.. or primary down.. or cluster quorum loss.. Readable secondaries remain available during “Resolving” state Requires direct connections to readable secondaries (Read-only routing not supported yet) Replica state and last commit time available in DMV/Dashboard Availability Groups Increased Readable Secondaries Availability
Availability Groups Increased Readable Secondaries Availability Sync Log Synchronization Async Log Synchronization
“The increased readable secondaries availability means our users can still find answers online and the world keeps spinning - StackOverflow Availability Groups Increased Readable Secondaries Availability
StackOverflow can now: Offload more critical read workloads to readable secondary in main data center Network glitches even within the same DC can happen Use readable secondary in DR site while data center is down (70% reads) Simpler to change DNS than force failover and failback Doesn’t result in data loss Availability Groups Increased Readable Secondaries Availability
Many customers can’t afford a DR site Site rent + maintenance, hardware, Ops SQL Server 2012: Started supporting replicas on Windows Azure VMs this year Offload read workloads Offload backups (policy compliance) Disaster recovery At best region West US, East US, East Asia, Southeast Asia, North Europe, West Europe Latency / political considerations Availability Groups Add Azure Replica Wizard
Availability Groups Add Azure Replica Wizard Sync Log Synchronization Async Log Synchronization
Low TCO VM and storage Free ingress traffic Case studies Lufthansa, Thomson Reuters, Buffalo Hospital Supply SQL Server 2014: “Add Replica Wizard” supports Windows Azure E2E: From provisioning VM to starting log synchronization Validates environment Handles failures Does cleanup Availability Groups Add Azure Replica Wizard
24 Enhancements on Dashboard, Error Messages, DMVs, XEvents Simplify troubleshooting & prevent issues Based on feedback from customers & CSS Availability Groups & Failover Cluster Instances Enhanced Diagnostics
Availability Groups & Failover Cluster Instances Enhanced Diagnostics TitleComponent Show timestamps in XEL output in UTC (not adjusted to client SSMS computer)XEvents Viewer Warning about log synchronization behavior when primary replica is asyncDashboard System function IsPrimaryReplica(database_name)System function Add AG name (and replica name and DB name if relevant) to many more XEvents to allow better data correlation between the logs XEvents Report major HADRON Manager transitions to AlwaysOn XEvent sessionXEvents Add Replica name context to connection established error log entryError Log Dump relevant output from sys.dm_hadr_database_replica_states to SQL error log when replicas change to resolving state XEvents Add new error message to detect AG startup failure when quorum is forcedError Log Separate error msg (replica can't become primary) - raised for two importantly different reasons Error Log AlwaysOn Functions/DMVs should also support FCIs where applicableDMVs Improve the CREATE AG error message “AG already exists”, to say “It’s possible that a previous DROP AG operation, executed during cluster quorum loss, didn’t delete the AG from the cluster. If so, please retry the DROP operation” Error Message Remove FCI setup dependency on cluster.exe (deprecated) – Use PowershellError Log Show timestamps in XEL output in UTC (not adjusted to client SSMS computer) Warning about log synchronization behavior when primary replica is async System function IsPrimaryReplica(database_name) Separate error msg (replica can't become primary) - raised for two importantly different reasons Dump relevant output from sys.dm_hadr_database_replica_states to SQL error log when replicas change to resolving state and 13 more..
Cluster Shared Volume (CSV) Shared disk accessible to all nodes (over SMB) One or more per physical drive Failover Cluster Instances on CSV Improves SAN utilization Removes limitation of 24 drives Increases I/O resiliency Retry read/write via other nodes Increases failover resiliency Disks don’t need to be unmounted/mounted Failover Cluster Instances Support for Windows Cluster Shared Volumes (Windows Server 2012 & 2012 R2)
Failover Cluster Instances Support for Windows Cluster Shared Volumes
AlwaysOn and Windows Server Windows Cluster Enhancements Version 2012 Dynamic Quorum Removes votes from unavailable nodes Enables “last man standing” Increased network resiliency Handle more exceptions Avoid node evictions Version 2012 R2 Network names without Active Directory Avoid Listeners issues: permissions, collisions
JOIN US in San Jose, May 7-9 for our second annual event to get the best learning for analyzing, managing, and sharing business information and insights through the Microsoft Data Platform of technologies. $150 off with discount Code: XXXXXX