Wait Stats and You
Bio – Greg Celentano
  - SQL DBA – Manufacturing ERP and MES Systems
  - Reporting (SSRS, PowerBI, etc.)
  - Wife, 2yo daughter, and grey hair
  - Active CrossFit’er
  - RI SQL User Group
https://safetyismygoal.files.wordpress.com/2013/02/another-traffic-pic.jpg
http://c8.alamy.com/comp/EYGAWE/a-view-of-traffic-moving-along-clifton-boulevard-in-nottingham-nottinghamshire-EYGAWE.jpg
http://i.kinja-img.com/gawker-media/image/upload/s--p6pmuHVq--/jseuwpjcap6lnvyoyizx.jpg
What are they?
  - Well... it's what SQL is waiting on
  - An insight into the stress, or lack thereof, in the system
  - They tell the best story over a span of time
  - Ad hoc wait stats queries are only as good as your last restart or manual clear, because the counters accumulate from that point
  - To clear them out: DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
  - To see wait stat counters: SELECT * FROM sys.dm_os_wait_stats;
  - To see when SQL was last started: SELECT sqlserver_start_time FROM sys.dm_os_sys_info;
MSFT BOL Definition – Wait Stats
  - Resource waits occur when a worker requests access to a resource that is not available because the resource is being used by some other worker or is not yet available. Examples of resource waits are locks, latches, network and disk I/O waits. Lock and latch waits are waits on synchronization objects.
  - Queue waits occur when a worker is idle, waiting for work to be assigned. Queue waits are most typically seen with system background tasks such as the deadlock monitor and deleted record cleanup tasks. These tasks will wait for work requests to be placed into a work queue. Queue waits may also periodically become active even if no new packets have been put on the queue.
  - External waits occur when a SQL Server worker is waiting for an external event, such as an extended stored procedure call or a linked server query, to finish. When you diagnose blocking issues, remember that external waits do not always imply that the worker is idle, because the worker may actively be running some external code.
How to track them
  - sys.dm_os_wait_stats (comes with some overhead out of the box)
  - Active transaction queries – Adam Machanic’s sp_WhoIsActive
  - 3rd party monitoring tools
  - Paul Randal's Tell Me Where It Hurts script – good overall assessment
    https://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/
What to do with wait stat data
  - Grab top x wait stats in intervals
    - Every hour, every 10 minutes, etc.
  - Write that data to a table
  - Create reports off that table to trend over time
    - Use SSRS, PowerBI, or Excel to make it look pretty
    - Managers and executives love pretty graphs
  - Trend with 3rd party utilities
    - Red Gate, Spotlight, Database Health Monitor (FREE)
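Because the counters in sys.dm_os_wait_stats are cumulative, trending over intervals comes down to subtracting one snapshot from the next. Below is a minimal sketch of that arithmetic, written in Python purely for illustration. The function name, the snapshot shape (wait_type mapped to cumulative wait_time_ms), and the sample numbers are all assumptions, not something taken from the deck.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: wait_type -> cumulative wait_time_ms,
# as it would be read from sys.dm_os_wait_stats at one point in time.
Snapshot = Dict[str, int]


def top_wait_deltas(earlier: Snapshot, later: Snapshot,
                    top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the top_n wait types by wait time accrued between two snapshots."""
    deltas: Dict[str, int] = {}
    for wait_type, later_ms in later.items():
        delta = later_ms - earlier.get(wait_type, 0)
        # A negative delta means the counters were reset between snapshots
        # (restart or manual clear), so the later value is all we can trust.
        if delta < 0:
            delta = later_ms
        if delta > 0:
            deltas[wait_type] = delta
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up sample data for two captures taken an interval apart.
snap_1 = {"CXPACKET": 1000, "WRITELOG": 500, "ASYNC_NETWORK_IO": 2000}
snap_2 = {"CXPACKET": 1600, "WRITELOG": 500, "ASYNC_NETWORK_IO": 5000,
          "LCK_M_X": 300}

print(top_wait_deltas(snap_1, snap_2, top_n=2))
```

In practice the same subtraction would more likely live in a scheduled job that writes each interval's result to a table, which is what the reports then trend on.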
https://www.brentozar.com/first-aid/first-responder-kit-power-bi-dashboard/
Wait Stat, Extra Weight
  - You don’t need all the wait stats
  - Some wait stats are system related:
    SQLTRACE_INCREMENTAL_FLUSH_SLEEP; SQLTRACE_BUFFER_FLUSH; LAZYWRITER_SLEEP;
    XE_TIMER_EVENT; XE_DISPATCHER_WAIT; FT_IFTS_SCHEDULER_IDLE_WAIT;
    LOGMGR_QUEUE; CHECKPOINT_QUEUE; BROKER_TO_FLUSH; BROKER_TASK_STOP;
    BROKER_EVENTHANDLER; SLEEP_TASK; WAITFOR; DBMIRROR_DBM_MUTEX;
    DBMIRROR_EVENTS_QUEUE; DBMIRRORING_CMD; DISPATCHER_QUEUE_SEMAPHORE;
    BROKER_RECEIVE_WAITFOR; CLR_AUTO_EVENT; DIRTY_PAGE_POLL;
    HADR_FILESTREAM_IOMGR_IOCOMPLETION; ONDEMAND_TASK_QUEUE; FT_IFTSHC_MUTEX;
    CLR_MANUAL_EVENT; SP_SERVER_DIAGNOSTICS_SLEEP; REQUEST_FOR_DEADLOCK_SEARCH
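Dropping the system-related waits before ranking is a simple set-membership filter. Here is a rough sketch in Python, for illustration only. The set holds the wait types named on this slide, while the function name and the sample input are hypothetical.

```python
from typing import Dict

# Wait types listed on the slide as system related (safe to ignore when ranking).
BENIGN_WAITS = frozenset({
    "SQLTRACE_INCREMENTAL_FLUSH_SLEEP", "SQLTRACE_BUFFER_FLUSH",
    "LAZYWRITER_SLEEP", "XE_TIMER_EVENT", "XE_DISPATCHER_WAIT",
    "FT_IFTS_SCHEDULER_IDLE_WAIT", "LOGMGR_QUEUE", "CHECKPOINT_QUEUE",
    "BROKER_TO_FLUSH", "BROKER_TASK_STOP", "BROKER_EVENTHANDLER",
    "SLEEP_TASK", "WAITFOR", "DBMIRROR_DBM_MUTEX", "DBMIRROR_EVENTS_QUEUE",
    "DBMIRRORING_CMD", "DISPATCHER_QUEUE_SEMAPHORE", "BROKER_RECEIVE_WAITFOR",
    "CLR_AUTO_EVENT", "DIRTY_PAGE_POLL", "HADR_FILESTREAM_IOMGR_IOCOMPLETION",
    "ONDEMAND_TASK_QUEUE", "FT_IFTSHC_MUTEX", "CLR_MANUAL_EVENT",
    "SP_SERVER_DIAGNOSTICS_SLEEP", "REQUEST_FOR_DEADLOCK_SEARCH",
})


def drop_benign(waits: Dict[str, int]) -> Dict[str, int]:
    """Keep only the wait types that are not in the benign list."""
    return {wait_type: ms for wait_type, ms in waits.items()
            if wait_type.upper() not in BENIGN_WAITS}


# Made-up sample: wait_type -> wait_time_ms.
sample = {"SLEEP_TASK": 9000, "WRITELOG": 120, "LAZYWRITER_SLEEP": 8000,
          "ASYNC_NETWORK_IO": 450}
print(drop_benign(sample))
```

The list is not exhaustive and changes between SQL Server versions, so treat it as a starting point and not a definitive exclusion list.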
Top Waits to watch for
  - CXPACKET and SOS_SCHEDULER_YIELD – CPU
  - BACKUPIO – backup thread waiting on I/O to complete
  - ASYNC_NETWORK_IO – application/network related wait
  - LCK_M_xxx – Lock waits
  - PAGELATCH_xxx / PAGEIOLATCH_xxx – Latch waits
  - OLEDB – Remote calls to return data, linked servers, bulk inserts, etc.
  - WRITELOG – Logged transactions
  - Any PREEMPTIVE_xxx – calls out to the OS
  https://www.brentozar.com/sql/wait-stats/
  https://www.spotlightessentials.com/waitopedia
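When reporting, it can help to roll individual wait types up into the families on this slide. Below is a small sketch in Python for illustration. The function name and the family labels are assumptions, and the mapping only covers the waits named above.

```python
def classify_wait(wait_type: str) -> str:
    """Map a wait type name to one of the broad families from the slide."""
    name = wait_type.upper()
    if name in ("CXPACKET", "SOS_SCHEDULER_YIELD"):
        return "CPU"
    if name.startswith("LCK_M_"):
        return "Lock"
    # Check the IO latch prefix separately from the buffer latch prefix.
    if name.startswith("PAGEIOLATCH_"):
        return "Latch (IO)"
    if name.startswith("PAGELATCH_"):
        return "Latch (buffer)"
    if name.startswith("PREEMPTIVE_"):
        return "OS"
    if name == "BACKUPIO":
        return "Backup"
    if name == "ASYNC_NETWORK_IO":
        return "Application/network"
    if name == "OLEDB":
        return "Remote call"
    if name == "WRITELOG":
        return "Transaction log"
    return "Other"


for wait in ("LCK_M_IX", "PAGEIOLATCH_SH", "PREEMPTIVE_OS_WRITEFILE", "WRITELOG"):
    print(wait, "->", classify_wait(wait))
```

Anything the function does not recognize falls into "Other", which is a cue to look the wait type up rather than a verdict that it is harmless.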
Lock Waits
  - Popular lock types: LCK_M_(type)
    - Shared (S)
    - Update (U)
    - Exclusive (X)
    - Intent (I)
    - Schema (Sch)
  http://www.sqlteam.com/article/introduction-to-locking-in-sql-server
  https://msdn.microsoft.com/En-US/library/ms186396.aspx
Latch Waits
  - Often confused with locks
  - A latch can be defined as an object that ensures data integrity on other objects in SQL Server memory, particularly pages.
  - Buffer class waits: PAGELATCH_(type)
  - IO class waits: PAGEIOLATCH_(type)
  - Popular types include SHARED (SH), UPDATE (UP), EXCLUSIVE (EX)
  - sys.dm_os_latch_stats to monitor latches
    - Another nice DMV to collect over time
  https://www.mssqltips.com/sqlservertip/3088/explanation-of-sql-server-io-and-latches/
Popular Async and Backup Wait Types
  - ASYNC_NETWORK_IO – Waiting on the application/network
  - ASYNC_IO_COMPLETION – Waiting on disk
  - BACKUPIO/BACKUPBUFFER – The backup is waiting on data or on a buffer to write backup data
    - 3rd party utilities (tape drives, disk staging, etc.)
  - BACKUPTHREAD – Waiting on a backup to finish
  - Check out https://www.spotlightessentials.com/waitopedia
Wait Stat Case Study
  - ASYNC_NETWORK_IO was always at the top
  - CPU and other resources were not maxed out
  - All best practices from the vendor were followed to the letter
  - User environment was a mix of VDI virtual clients and physical PCs or laptops
    - VDI clients saw the slowest performance (majority of the users)
    - Physical machines saw the best performance
  - Production servers were on the worst OS – Win2012 (non R2)
Case Study – continued
  - TCPING times were high: 5 – 7ms on average from the application servers to SQL Server
    - Normal should be under 1 – 2ms
  - Used a utility supplied by the software vendor to test latency
    - SQL test had a round trip of 5 secs for the test operation performed
  - No glaring issues on the SQL side
    - Some re-indexing was done
    - Statistics were looked at
  - Again, all CPU, IO, and memory looked fine, with no signs of stress
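A few milliseconds of extra latency looks harmless until it is multiplied by the number of round trips an operation makes. The arithmetic can be sketched as below, in Python for illustration. The round-trip count is purely hypothetical and is not a measurement from the case study; only the latency figures echo the slide.

```python
def network_time_seconds(round_trips: int, latency_ms: float) -> float:
    """Total time spent on the wire for a given number of round trips."""
    return round_trips * latency_ms / 1000.0


# Hypothetical: an operation that makes 1,000 round trips to SQL Server.
ROUND_TRIPS = 1000

# 1 ms is roughly the expected latency on the slide; 6 ms sits in the observed 5 to 7 ms range.
for latency_ms in (1.0, 6.0):
    total = network_time_seconds(ROUND_TRIPS, latency_ms)
    print(f"{latency_ms} ms per round trip -> {total} s on the network")
```

Under those assumed numbers the same operation spends six times longer waiting on the network, with no change in CPU, IO, or memory on the SQL Server side.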
What was done to fix this
  - After many, many consulting hours we figured it out internally
  - A best practice was broken
    - Replaced the VMware VMXNET3 NIC with the Intel NIC
    - Tuned the NIC settings
  - Trial and error
What does breaking the rules do to wait stats?
Database Health Monitor Tool – Steve Stedman
Training the team
  - Make your development group aware of how you (the DBA) monitor traffic
  - Have a single pane of glass to monitor different scenarios
    - Firefighting, in-the-moment troubleshooting
    - Long term trends
  - Response kits
  - Work towards better solutions to reduce problems
Resources Used Tonight
  - Steve Stedman Database Health Monitor (Free)
  - Bing – Paul Randal Wait Stats
  - Bing – Brent Ozar Waits
  - Waitopedia
SOS Scheduler Waits on VM
  https://www.sqlskills.com/blogs/paul/increased-sos_scheduler_yield-waits-on-virtual-machines/
Capturing Baselines w/ Erin
  http://www.sqlservercentral.com/articles/baselines/96270/
Locks and Latches
  https://blog.sqlauthority.com/2014/03/16/sql-server-what-is-the-difference-between-latches-and-locks/