Proving Hardware Bottlenecks & Architecting For Performance
Matt Henderson: Principal Solutions Architect
Agenda
- I/O and databases
- What to look for
- Monitoring tools & techniques
- What could be…
RAM versus Storage
SQL Buffer
- MRU – LRU chain
- Temp tables & 2nd reads
PLE: Page Life Expectancy
- How long a page lives in the chain before being cycled out
- MSFT recommends over 300 seconds (5 minutes)
Working Set
- How much data do you want/NEED in RAM?
- Database size isn’t relevant
- What’s being worked on now, and what will be in the near future?
Workload profile
- What data are the users hitting?
- Is there any way to predict the future data page hits?
CPU Schedulers & Time Slices
- One SQL scheduler per logical core
- SQL scheduler time-slices between users
Waits & Queues
- Running: currently executing process
- Waiting: waiting for a resource (I/O, network, locks, latches, etc.)
- Runnable: resource ready, waiting to get on CPU
Wait Statistics: Guided Tuning
Waits: What is keeping the SQL engine from continuing?
- Application (SQL)
- Hardware
- Architecture
- Tuning
sys.dm_os_wait_stats

WaitType                  Wait_S  Resource_S  Signal_S  WaitCount  Percentage  AvgWait_S  AvgRes_S  AvgSig_S
BROKER_RECEIVE_WAITFOR    661.36  -           -         4          44.6        165.3388   -         -
LCK_M_IS                  139.46  139.35      0.11      489        9.4         0.2852     0.285     0.0002
LCK_M_X                   96.86   96.54       0.32      373        6.53        0.2597     0.2588    0.0009
LCK_M_U                   83.93   83.91       0.02      32         5.66        2.6227     2.6221    0.0006
PAGEIOLATCH_SH            83.92   83.84       0.08      9835       -           0.0085     -         -
LCK_M_S                   82.44   82.1        0.33      419        5.56        0.1967     0.1959    0.0008
ASYNC_NETWORK_IO          54.4    53.61       0.79      33146      3.67        0.0016     -         -
ASYNC_IO_COMPLETION       43.1    -           -         37         2.91        1.1649     -         -
BACKUPIO                  42.22   42.19       0.03      12607      2.85        0.0033     -         -
BACKUPBUFFER              36.64   36.48       0.15      2175       2.47        0.0168     -         0.0001
LCK_M_IX                  30.88   30.85       -         130        2.08        0.2376     0.2373    0.0003
IO_COMPLETION             28.12   28.11       0.01      2611       1.9         0.0108     -         -
CXPACKET                  23.27   21.6        1.67      3542       1.57        0.0066     0.0061    0.0005
PREEMPTIVE_OS_CREATEFILE  18.84   -           -         247        1.27        0.0763     -         -
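The average columns in the wait-statistics output can be derived from the cumulative ones. Below is a minimal Python sketch, assuming the rows have already been read from sys.dm_os_wait_stats and converted to seconds. The `WaitRow` type and `summarize` helper are hypothetical names introduced for illustration; the sample figures are copied from the LCK_M_IS and CXPACKET rows on the slide.

```python
from dataclasses import dataclass


@dataclass
class WaitRow:
    """One wait type with cumulative times in seconds (hypothetical container)."""
    wait_type: str
    wait_s: float      # total wait time
    resource_s: float  # time spent waiting on the resource itself
    signal_s: float    # time spent runnable, waiting to get on a CPU
    wait_count: int


def summarize(row: WaitRow) -> dict:
    """Derive per-wait averages and the signal share of total wait time."""
    return {
        "avg_wait_s": row.wait_s / row.wait_count,
        "avg_res_s": row.resource_s / row.wait_count,
        "avg_sig_s": row.signal_s / row.wait_count,
        # A high signal share suggests CPU pressure rather than slow I/O or locks.
        "signal_share": row.signal_s / row.wait_s,
    }


# Sample figures copied from the slide.
rows = [
    WaitRow("LCK_M_IS", 139.46, 139.35, 0.11, 489),
    WaitRow("CXPACKET", 23.27, 21.60, 1.67, 3542),
]

for row in rows:
    stats = summarize(row)
    print(f"{row.wait_type:<10} avg_wait={stats['avg_wait_s']:.4f}s "
          f"signal_share={stats['signal_share']:.1%}")
```

Sorting by total wait time shows where the engine spends its time; the signal share then hints at whether a given wait is bound by the resource or by CPU scheduling.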
CPU Utilization & Storage
- CPUs are faster than hard drives
- Adding cores does not correct the issue
- 8 cores @ 10% = 1 core @ 80%
- 40% utilization = 60% overpaying
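The utilization arithmetic on this slide can be checked with a short Python sketch. The function names are hypothetical, and the model assumes work spreads evenly across cores, which real workloads only approximate.

```python
import math


def cores_needed(cores: int, utilization_pct: float, target_pct: float) -> int:
    """Cores required to carry the same work at a higher target utilization."""
    # cores * utilization_pct is the total work in "core-percent" units.
    return math.ceil(cores * utilization_pct / target_pct)


def idle_pct(utilization_pct: float) -> float:
    """Share of paid-for CPU capacity that sits idle."""
    return 100 - utilization_pct


# 8 cores at 10% carry the same work as 1 core at 80%.
print(cores_needed(8, 10, 80))
# At 40% utilization, 60% of the licensed capacity is unused.
print(idle_pct(40))
```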
Reduce I/O Wait: Maximize CPU Utilization
- < 0.2 ms
- Maximize CPU utilization
Accelerate & Consolidate
Accelerate
- Same CPUs
- Same work
- Less time
Consolidate
- Fewer cores = fewer licenses
I/O Patterns: MAXDOP & OLTP
BI: MAXDOP
- Dozens of parallel workers
- Large batch reads (256k)
- Sequential per worker
- Bandwidth limited
- Latency spikes affect parallel completion
- Flushes buffer = low PLE
OLTP
- 8k page reads, 4k tran log writes
- 70r/30w common I/O mix
- 8k read latency affects performance more than 4k log write latency
- Random I/O pattern
- Clustered index matters
- PKs, FKs, multi-table & multi-step xacts
SQL & Windows Performance Monitoring
PerfMon (Performance Monitor)
- Live & recorded stats
- Good for system, not SQL
typeperf
- Command line tool; live stats
- typeperf "\LogicalDisk(*)\Avg. Disk sec/Read"
Resource Monitor
- Live file stats, not recorded
- Good for processes and files
SQL DMVs
- File: sys.dm_io_virtual_file_stats
- Waits: sys.dm_os_wait_stats
- I/O: sys.dm_io_pending_io_requests
- Objects: sys.dm_os_waiting_tasks
- Queries: sys.dm_exec_query_stats
- Indexes: sys.dm_db_index_usage_stats
Architecting for I/O
Rule of Many: At every layer, utilize many of each component to reduce bottlenecks
- Virtual disks: 2-4x the existing LUNs
- Database files: many per LUN, copy files in parallel
- TempDB files: 1 per physical core
- MPIO paths: minimum 4 paths per port
- MPIO policy: Least Queue Depth
- Physical ports: 825MB/s per FC16 port
- LUNs: 2GB/s per LUN
Parallelization: Use many objects and many processes to increase parallel workloads
- Spread transactions over pages
- Use a switch (path multiplier)
- Increase MAXDOP (and test)
I/O latency is sacred: Don’t add anything to the I/O path that doesn’t need to be there
- LVM (Logical Volume Manager)
- Virtualization
- Compression
- De-dup
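The per-port and per-LUN figures above lend themselves to a rough sizing calculation. The Python sketch below is illustrative only: the function names are hypothetical, 1 GB is assumed to be 1000 MB, and the model ignores protocol overhead, MPIO behavior, and array limits.

```python
import math

FC16_PORT_MB_S = 825  # per-port figure quoted on the slide
LUN_MB_S = 2000       # 2 GB/s per LUN, assuming 1 GB = 1000 MB


def components_needed(target_mb_s: float) -> dict:
    """Minimum port and LUN counts for a target bandwidth (rough estimate)."""
    return {
        "fc16_ports": math.ceil(target_mb_s / FC16_PORT_MB_S),
        "luns": math.ceil(target_mb_s / LUN_MB_S),
    }


def path_ceiling_mb_s(ports: int, luns: int) -> int:
    """The slower layer sets the ceiling for the whole I/O path."""
    return min(ports * FC16_PORT_MB_S, luns * LUN_MB_S)


# Hypothetical target of 5000 MB/s.
print(components_needed(5000))
# With 4 ports and 3 LUNs, the ports are the bottleneck.
print(path_ceiling_mb_s(4, 3))
```

Taking the minimum across layers reflects the Rule of Many: adding LUNs does not help if the ports are already saturated, and vice versa.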
Data Warehouse Fast Track: CPU Efficiency & Performance
Broadwell, 2 CPU, 12-core, 3.0GHz, 8 x 16Gb FC

Vendor             Rating  CPU Cores  Efficiency
XtremIO            20TB    32         0.63
Lenovo (NVMe DAS)  22TB    40         0.55
Pure               23TB    -          0.58
Lenovo (TMS)       60TB    56         1.07
Vexata             70TB    48         1.46

AVG throughput: Over 8 GB/s
PEAK throughput: Over 13 GB/s
DEMO