SQL Server on VMware
Jonathan Kehayias (MCTS, MCITP)
SQL Database Administrator
Tampa, FL
Agenda
- SQL Initiative Overview
- Performance Metrics: I/O Metrics, Processor, Performance Counters
- High Availability Configuration
- Manageability Gains
SQL Initiative
Primary objective: Disaster Recovery
Success criteria:
- Immediate failover to the remote data center with minimal data loss
- Performance
Performance: VMware ESX Server 3.0.2, build 63195
Performance: SQL Virtual Machine Configuration
Performance: SQLIO Benchmarks
SQLIO is a tool provided by Microsoft that can be used to determine the I/O capacity of a given configuration. SQL Server stores data in 8 KB pages, allocated in groups of eight as 64 KB extents, so typical SQL Server I/O operations involve random reads of extents from disk. SQLIO benchmarks on this SQL Server for 64 KB random read I/O, using 2 threads to simulate the recommended setting of one file per processor core, were equivalent to or better than common physical hardware.
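The relationship between the IOs/sec and MBs/sec columns in the SQLIO tables, and the extent size described above, can be sketched in a few lines of Python. This is a minimal illustration assuming binary megabytes (1 MB = 1024 KB); the function names and the 1,600 IOs/sec figure are hypothetical and are not measurements from this deck.

```python
# Minimal sketch of the arithmetic behind SQLIO result columns (illustrative only).
PAGE_KB = 8                              # SQL Server data page size
PAGES_PER_EXTENT = 8                     # pages allocated together as one extent
EXTENT_KB = PAGE_KB * PAGES_PER_EXTENT   # 64 KB


def throughput_mb_per_sec(iops: float, block_kb: int) -> float:
    """Convert IOs/sec at a given block size to MB/sec (assumes 1 MB = 1024 KB)."""
    return iops * block_kb / 1024


def iops_from_throughput(mb_per_sec: float, block_kb: int) -> float:
    """Inverse conversion: MB/sec back to IOs/sec at a given block size."""
    return mb_per_sec * 1024 / block_kb


if __name__ == "__main__":
    print(f"Extent size: {EXTENT_KB} KB")
    # Hypothetical IOs/sec value, not a result from the benchmark slides.
    print(f"64 KB random read at 1600 IOs/sec = {throughput_mb_per_sec(1600, EXTENT_KB):.1f} MB/sec")
```

The same conversion explains why larger block sizes in the tables show fewer IOs/sec but more MBs/sec.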
Performance (VMware): Random Read I/O by the Numbers
Columns: Drive, Format, Test, IOs/sec, MBs/sec
READ (Default format): 8KB, 64KB, 128KB, and 256KB random; 8KB, 64KB, 128KB, and 256KB sequential
WRITE (Default format): 8KB, 64KB, 128KB, and 256KB random; 8KB, 64KB, 128KB, and 256KB sequential
Dell PowerEdge x 2.7 GHz, 2GB RAM, 4x 36GB RAID 10 array
Columns: Drive, Format, Test, IOs/sec, MBs/sec
READ (Default format): 8KB, 64KB, 128KB, and 256KB random; 8KB, 64KB, 128KB, and 256KB sequential
WRITE (Default format): 8KB, 64KB, 128KB, and 256KB random; 8KB, 64KB, 128KB, and 256KB sequential
Physical SQLIO outputs were obtained from the SQL Server Performance website forums:
Dual 3.0 GHz Xeon, dual 2 Gb FC HBA, IBM DS4300 SAN, RAID 10 (10 x 74GB 15K RPM)
Columns: Drive, Format, Test, IOs/sec, MBs/sec
SQLIO 8k sector: Read - Random, Read - Sequential, Write - Random, Write - Sequential
SQLIO 32k sector: Read - Random, Read - Sequential, Write - Random, Write - Sequential
SQLIO 64k sector: Read - Random, Read - Sequential, Write - Random, Write - Sequential
Physical SQLIO outputs were obtained from the SQL Server Performance website forums:
Dell 6850, quad Xeon 3.4 GHz with 4GB RAM, EMC CLARiiON CX-700, RAID 10
Columns: Drive, Format, Test, IOs/sec, MBs/sec
Tests: random write at four block sizes (the first 8k), followed by sequential write, random read, and sequential read at four block sizes each
Physical SQLIO outputs were obtained from the SQL Server Performance website forums:
Performance: Performance Counter Monitoring
To understand how well SQL Server is performing, both SQL Server and Windows subsystem performance counters need to be monitored. Key counters to monitor include:
- Processor\% Processor Time: should remain below 80%
- Processor\% Privileged Time: should remain below 20%
- SQL Server General Statistics\User Connections
- Batches/sec
- Page Life Expectancy
- Pages/sec
- Memory Grants Pending
- Lazy Writes/sec
For further information, take a look at the following screencast series by Kevin Kline (SQL Server MVP and President of the Professional Association for SQL Server (PASS)):
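A minimal sketch of checking the two processor thresholds above against an average of collected samples, assuming the samples have already been gathered (for example with PerfMon or typeperf) into Python lists. The function name and the sample values are hypothetical.

```python
from statistics import mean

# Average-percent thresholds from the slides; counter paths use PerfMon naming.
PROCESSOR_THRESHOLDS = {
    r"\Processor(_Total)\% Processor Time": 80.0,
    r"\Processor(_Total)\% Privileged Time": 20.0,
}


def check_processor_counters(samples: dict[str, list[float]]) -> dict[str, bool]:
    """Map each counter to True when its average stays below its threshold."""
    return {
        counter: mean(samples[counter]) < limit
        for counter, limit in PROCESSOR_THRESHOLDS.items()
    }


if __name__ == "__main__":
    # Hypothetical samples, not measured values from this server.
    collected = {
        r"\Processor(_Total)\% Processor Time": [65, 72, 90, 70],   # average 74.25
        r"\Processor(_Total)\% Privileged Time": [12, 25, 30, 15],  # average 20.5
    }
    for counter, ok in check_processor_counters(collected).items():
        print(f"{counter}: {'OK' if ok else 'investigate'}")
```

Averaging matches the "average" wording on the processor slides; a single 90% spike does not fail the check on its own.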
Performance: Performance Counters (Processor\% Processor Time)
Should remain below 80% on average.
Performance: Performance Counters (Processor\% Privileged Time)
Should remain below 20% on average.
Performance: Performance Counters (User Connections)
This is a reference counter, to be used together with other counters to view system activity.
Performance: Performance Counters (Batches/sec)
A measure of how much work the server is performing. Use it alongside other counters, such as Page Splits/sec, to determine whether there are problems.
Performance: Performance Counters (Buffer Cache Hit Ratio)
Should remain as close to 100% as possible. Consistent drops below 90-95% signal memory pressure.
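SQL Server exposes the buffer cache hit ratio as a pair of raw values, Buffer cache hit ratio and Buffer cache hit ratio base (for example in sys.dm_os_performance_counters), and the percentage is their quotient. The sketch below assumes both raw values have already been read; treating 90-95% as a "watch" band is our reading of the slide, and the function names are hypothetical.

```python
def buffer_cache_hit_ratio(ratio_value: int, ratio_base: int) -> float:
    """Percentage from the raw 'Buffer cache hit ratio' counter and its base counter."""
    if ratio_base <= 0:
        raise ValueError("base counter must be positive (no page lookups sampled)")
    return 100.0 * ratio_value / ratio_base


def classify_hit_ratio(percent: float) -> str:
    """Near 100% is healthy; consistent drops below 90-95% signal memory pressure."""
    if percent >= 95.0:
        return "ok"
    if percent >= 90.0:
        return "watch"  # assumption: the 90-95% band is treated as an early warning
    return "memory pressure"


if __name__ == "__main__":
    # Hypothetical raw counter values.
    pct = buffer_cache_hit_ratio(930, 1000)
    print(f"{pct:.1f}% -> {classify_hit_ratio(pct)}")
```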
Performance: Performance Counters (Page Life Expectancy)
The time a page stays in the buffer cache should ideally remain above 300 seconds. Consistent drops below this value should be investigated, as they can signal memory pressure on the server.
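Because the guidance is about consistent drops rather than a single dip, a check might look for several consecutive low samples. A minimal sketch, assuming Page Life Expectancy is sampled at a regular interval; the run length of three samples and the function name are hypothetical choices.

```python
def sustained_ple_drop(
    samples: list[float],
    threshold: float = 300.0,   # seconds, from the slide
    min_consecutive: int = 3,   # hypothetical: how many low samples in a row count as "consistent"
) -> bool:
    """True when at least min_consecutive consecutive samples fall below threshold."""
    run = 0
    for value in samples:
        # Extend the current low run, or reset it when PLE recovers.
        run = run + 1 if value < threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    print(sustained_ple_drop([1200, 250, 900, 800]))       # single dip
    print(sustained_ple_drop([1200, 280, 210, 150, 400]))  # sustained drop
```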
Performance: Performance Counters (Memory\Pages/sec)
The rate at which pages are read from or written to disk. Values above 100 on a slow disk subsystem, or above 600 on a fast disk subsystem, should be investigated.
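The Memory\Pages/sec threshold depends on how fast the disk subsystem is. A minimal sketch, assuming the subsystem has already been labeled "slow" or "fast"; that labeling and the function name are hypothetical.

```python
# Thresholds from the slide, keyed by a hypothetical disk-speed label.
PAGES_PER_SEC_LIMITS = {"slow": 100.0, "fast": 600.0}


def pages_per_sec_needs_investigation(pages_per_sec: float, disk_speed: str) -> bool:
    """True when Memory\\Pages/sec exceeds the limit for the given disk subsystem."""
    if disk_speed not in PAGES_PER_SEC_LIMITS:
        raise ValueError(f"unknown disk speed label: {disk_speed!r}")
    return pages_per_sec > PAGES_PER_SEC_LIMITS[disk_speed]


if __name__ == "__main__":
    # The same hypothetical reading is a concern on slow disks but not on fast ones.
    print(pages_per_sec_needs_investigation(250, "slow"))
    print(pages_per_sec_needs_investigation(250, "fast"))
```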
Performance: Conclusions
1: Compared with various physical disk configurations using local storage, we measured roughly 10x the disk subsystem I/O performance.
2: Critical performance counters show that SQL Server maintains industry-acceptable performance.
3: Multiple VMs can be consolidated alongside SQL Server at roughly 5:1 to 7:1, saving costs on physical hardware, rack space, power, and cooling, and integrating into the DR plan.
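The consolidation ratio translates directly into host counts. A minimal sketch, assuming each VM would otherwise need its own physical server; the 35-VM estate and the function name are hypothetical.

```python
import math


def hosts_needed(vm_count: int, vms_per_host: int) -> int:
    """Physical hosts required to run vm_count VMs at a given consolidation ratio."""
    return math.ceil(vm_count / vms_per_host)


if __name__ == "__main__":
    vms = 35  # hypothetical number of servers to virtualize
    for ratio in (5, 7):
        hosts = hosts_needed(vms, ratio)
        # Assumes one physical server per VM without virtualization.
        print(f"{ratio}:1 -> {hosts} hosts, {vms - hosts} physical servers avoided")
```

Rounding up matters: 36 VMs at 7:1 still needs a sixth host.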
Consolidation Architecture
- Single SQL Server per ESX host: multiple SQL Servers never reside on the same ESX host.
- Logical placement of VMs to eliminate resource contention.
- Web and app servers never communicate with a SQL Server on the same host.
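The placement rules on the consolidation slide can be expressed as a simple validation pass over a VM-to-host map. A minimal sketch; the data model (role labels, and the list of which web/app VMs talk to which SQL VM) and all VM and host names are hypothetical.

```python
from collections import Counter


def placement_violations(
    placement: dict[str, str],        # VM name -> ESX host name
    roles: dict[str, str],            # VM name -> "sql", "web", "app", ...
    talks_to: list[tuple[str, str]],  # (web/app VM, SQL VM) pairs
) -> list[str]:
    """List violations of the consolidation placement rules."""
    violations = []

    # Rule 1: never more than one SQL Server VM per ESX host.
    sql_per_host = Counter(placement[vm] for vm, role in roles.items() if role == "sql")
    for host, count in sorted(sql_per_host.items()):
        if count > 1:
            violations.append(f"{host}: {count} SQL Server VMs on one host")

    # Rule 2: web/app servers never talk to a SQL Server on their own host.
    for client, sql_vm in talks_to:
        if roles[client] in ("web", "app") and placement[client] == placement[sql_vm]:
            violations.append(f"{client} talks to {sql_vm} on the same host {placement[client]}")

    return violations


if __name__ == "__main__":
    placement = {"sql01": "esx1", "sql02": "esx2", "web01": "esx2", "app01": "esx1"}
    roles = {"sql01": "sql", "sql02": "sql", "web01": "web", "app01": "app"}
    talks_to = [("web01", "sql01"), ("app01", "sql01")]
    print(placement_violations(placement, roles, talks_to))
```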
High Availability
SAN storage is mirrored to the Chicago data center for disaster recovery. Quarterly failover tests of key SQL Servers run on the Chicago network with zero data loss at failover.
High Availability: VMotion
VMotion eliminates the need for mirroring solutions to provide hardware redundancy. When a host shows signs of hardware failure, its virtual machines can be hot-migrated to another host while the failing host is still running.
VMotion also allows live migration of servers during high-load operations, shifting load to more powerful hosts as needed.
Manageability
- Snapshot technology: provides a point-in-time recovery point for risky operations such as upgrading the server OS and/or SQL Server.
- Hot-add disk arrays: allow zero-downtime addition of new storage LUNs as databases grow.
- HBA load balancing: allows disk I/O load balancing across redundant paths to the SAN storage.
Manageability: Rapid Scale-Up
Adding vCPUs and memory is only a reboot away, provided the host has available resources. A server can easily be upsized for end-of-month processing, when it needs the most power, while keeping its footprint small under minimal load.
Manageability: VMware Infrastructure Client
Immediate, shared console-level access to the SQL Server enables remote administration and rapid response to critical server outages. A BSOD no longer requires physical access to the server, and there is no risk of being locked out by a stuck iLO interface. Integrated performance monitoring covers processor, memory, disk, and networking counters.
Manageability: VMware Infrastructure Client (cont'd)
A running history of events, including server resets, migrations, and reconfigurations, is available through the client. Integrated alarms can be configured for out-of-tolerance counter values.
Lessons Learned
1: Don't lock pages in memory if you plan to use VMotion.
2: Be cautious when implementing Microsoft-recommended best practices that affect system configuration, and test them in development before deploying to production; some may not be compatible with VMotion.
3: Don't accept a vendor's statement that VMware is the problem; look deeper, and you can generally disprove it.
Conclusions
Performance Metrics
- I/O metrics: 10x performance
- Processor and performance counters
High Availability Configuration
- VMotion load balancing during end-of-month reconciliations keeps us ahead of the business
- Reduced downtime with hot-add disk
Manageability Gains
- SOX compliance for disaster recovery
- VirtualCenter console for team collaboration
Cost Avoidance
- Reduces the need for mirroring hardware and software
- Consolidation of hardware, both primary and DR