Diagnosing Storage IO Latency

1 Diagnosing Storage IO Latency
From SQL Server Through Windows

2 What we will discuss today:
How to identify within SQL Server if IO latency is a likely cause of slow performance.
How to use tools such as performance counters and storport tracing to confirm an IO latency issue and isolate the source.
How to use WPR to track down IO latency which has been isolated to layers within the scope of the Windows OS.

3 OK, so SQL Server is slow?
Start by finding out the predominant waits. Query both the cumulative wait information for the instance (sys.dm_os_wait_stats) and the per-query wait information (sys.dm_exec_requests, sys.dm_os_waiting_tasks, or the Extended Events wait_completed or wait_info).
Do you see these?
PAGEIOLATCH_*: When SQL Server is going to read a page from disk into a buffer, it acquires an exclusive latch (_EX) on the buffer. This is necessary because the contents of that page are about to be written to the buffer. This is immediately followed by an attempt to acquire a shared latch (_SH), which is blocked by the EX latch. That blocking is what shows up in the wait stats as PAGEIOLATCH_SH.
IO_COMPLETION: Non-data page IO operations.
ASYNC_IO_COMPLETION: File growth.
BACKUPIO: Waiting for a backup-specific IO operation.
BACKUPBUFFER: Occurs when all of the buffers allocated to hold data waiting to be flushed to a backup are full. This is often accompanied by BACKUPIO, which would indicate that the buffers may be full because of an insufficient flush rate caused by IO latency.
WRITELOG: Writing to the transaction log.

-- Cumulative waits for the instance, as a share of total wait time.
-- The variable name and the percentage expression did not survive in the
-- transcript; @TotalWaitMs and the CAST expression below are reconstructions.
DECLARE @TotalWaitMs BIGINT;
SELECT @TotalWaitMs = SUM(wait_time_ms)
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE '%IDLE%'
  AND wait_type NOT LIKE '%QUEUE%'
  AND wait_type NOT LIKE '%SLEEP%'
  AND wait_type NOT LIKE '%BROKER%';

SELECT wait_type
  , waiting_tasks_count
  , wait_time_ms
  , max_wait_time_ms
  , (wait_time_ms / (waiting_tasks_count + 1)) [WaitTimePerTask]
  , CAST(wait_time_ms * 100.0 / NULLIF(@TotalWaitMs, 0) AS numeric(18,2)) [PercentOfTotalWaiting]
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE '%IDLE%'
  AND wait_type NOT LIKE '%QUEUE%'
  AND wait_type NOT LIKE '%SLEEP%'
  AND wait_type NOT LIKE '%BROKER%'
ORDER BY PercentOfTotalWaiting DESC;

-- Per-request waits for one session (98 is the example session_id).
WITH Threads AS (
    SELECT ta.session_id
      , COUNT(1) [ThreadsUsed]
    FROM sys.dm_os_threads th
    JOIN sys.dm_os_tasks ta ON th.worker_address = ta.worker_address
    WHERE ta.session_id IS NOT NULL
    GROUP BY ta.session_id
)
SELECT t.session_id
  , t.ThreadsUsed
  , er.wait_type
  , wt.exec_context_id
  -- Both inputs are numeric, so no 'N/A' fallback here (it would not convert to a number).
  , COALESCE(wt.wait_duration_ms, er.wait_time) [wait_ms]
  , COALESCE(wt.resource_description, er.wait_resource, 'N/A') [wait_resource]
  , COALESCE(st.text, 'N/A') [Statement]
  , COALESCE(qp.query_plan, 'N/A') [QueryPlan]
FROM sys.dm_exec_requests er
LEFT JOIN sys.dm_os_waiting_tasks wt ON wt.session_id = er.session_id
LEFT JOIN Threads t ON t.session_id = er.session_id
OUTER APPLY sys.dm_exec_sql_text(er.plan_handle) st
OUTER APPLY sys.dm_exec_query_plan(er.plan_handle) qp
WHERE er.session_id = 98
ORDER BY er.wait_time DESC;

-- Extended Events session that captures waits and builds a histogram by wait type.
-- If a particular session is of interest, add a predicate to the wait_info event,
-- for example: WHERE ([sqlserver].[session_id]=(98))
CREATE EVENT SESSION [FindWaits] ON SERVER
ADD EVENT sqlos.wait_info(
    ACTION(sqlserver.plan_handle, sqlserver.sql_text)),
ADD EVENT sqlserver.query_post_execution_showplan(
    ACTION(sqlserver.plan_handle))
ADD TARGET package0.event_file(SET filename=N'IO_Issues', max_file_size=(128), max_rollover_files=(32)),
ADD TARGET package0.histogram(SET filtering_event_name=N'sqlos.wait_info', slots=(20), source=N'wait_type', source_type=(0))
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)
GO

-- Read the histogram target and translate wait type keys to names.
-- The second column expression was truncated in the transcript; reading the
-- Slot element's count attribute is a reconstruction.
WITH HistogramData AS (
    SELECT hd.slot_data.value('(value)[1]', 'varchar(256)') [WaitType]
      , hd.slot_data.value('(@count)[1]', 'varchar(256)') [Occurrences]
    FROM (
        SELECT CAST(xt.target_data AS xml) AS target_data
        FROM sys.dm_xe_session_targets AS xt
        JOIN sys.dm_xe_sessions AS xs ON (xs.address = xt.event_session_address)
        WHERE xs.name = 'FindWaits' AND target_name = 'histogram'
    ) AS t
    CROSS APPLY t.target_data.nodes('//HistogramTarget/Slot') AS hd (slot_data)
)
SELECT mv.map_value [Wait_Type], hd.Occurrences
FROM HistogramData hd
JOIN sys.dm_xe_map_values mv ON mv.map_key = hd.WaitType AND mv.name = 'wait_types'
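As a rough illustration of the "do you see these?" check, the sketch below takes rows already exported from sys.dm_os_wait_stats and reports what share of total wait time comes from the IO-related wait types listed on this slide. The function names and the input shape (pairs of wait type and wait_time_ms) are assumptions made for the example, not part of the original deck.

```python
# Wait types named on the slide. The prefix match covers PAGEIOLATCH_SH,
# PAGEIOLATCH_EX, and the other PAGEIOLATCH_* variants.
IO_WAIT_PREFIXES = ("PAGEIOLATCH_",)
IO_WAIT_NAMES = {
    "IO_COMPLETION",
    "ASYNC_IO_COMPLETION",
    "BACKUPIO",
    "BACKUPBUFFER",
    "WRITELOG",
}


def is_io_wait(wait_type):
    """Return True if wait_type is one of the IO-related waits from the slide."""
    return wait_type in IO_WAIT_NAMES or wait_type.startswith(IO_WAIT_PREFIXES)


def io_wait_share(rows):
    """rows: iterable of (wait_type, wait_time_ms) pairs (hypothetical shape).

    Returns the percentage of total wait time attributed to IO-related waits,
    rounded to two decimals, or 0.0 when there is no wait time at all.
    """
    total = 0
    io_total = 0
    for wait_type, wait_time_ms in rows:
        total += wait_time_ms
        if is_io_wait(wait_type):
            io_total += wait_time_ms
    if total == 0:
        return 0.0
    return round(100.0 * io_total / total, 2)
```

A high share only suggests that IO latency is worth investigating; the later slides cover how to confirm it and find the layer responsible.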

4 So it looks like it might be an IO latency issue…
Let’s drill down. All of the information so far has been seen from the perspective of the database engine. There are many layers below that which could affect IO latency.
sys.dm_io_virtual_file_stats: A way to view IO statistics aggregated per file, from the perspective of SQL Server.
Performance Counters (Logman/PerfMon): A way to view IO statistics aggregated per disk/volume, from the perspective of the Windows drivers which implement the disk/volume abstractions in Windows.
Storport tracing: A way to view limited IO statistics aggregated per LUN, from the perspective of the storport driver, the lowest-level storage IO driver in the Windows OS.

-- Per-file IO statistics; +0.001 avoids division by zero for files with no reads or writes.
SELECT db.name [Database]
  , af.filename [File]
  , fs.num_of_reads [Reads]
  , fs.num_of_writes [Writes]
  , fs.num_of_bytes_read [ReadBytes]
  , fs.num_of_bytes_written [WrittenBytes]
  , fs.io_stall
  , CAST((fs.io_stall_read_ms/(fs.num_of_reads+0.001)) AS decimal(18,2)) [AvgReadLatency_ms]
  , CAST((fs.io_stall_write_ms/(fs.num_of_writes+0.001)) AS decimal(18,2)) [AvgWriteLatency_ms]
  , fs.io_stall_queued_read_ms
  , fs.io_stall_queued_write_ms
FROM sys.databases db
JOIN sys.sysaltfiles af ON af.dbid = db.database_id
CROSS APPLY sys.dm_io_virtual_file_stats(db.database_id, af.fileid) fs
ORDER BY io_stall_read_ms DESC, io_stall_write_ms DESC
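The counters in sys.dm_io_virtual_file_stats are cumulative, so an average over the whole uptime can hide a recent spike. One way to handle that, sketched below, is to take two snapshots and compute latency from the difference. The FileStats class and the function names are hypothetical; only the four field names mirror columns from the DMV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """One snapshot of a single file's counters (field names mirror the DMV columns)."""
    num_of_reads: int
    io_stall_read_ms: int
    num_of_writes: int
    io_stall_write_ms: int


def avg_latency_ms(stall_ms, io_count):
    """Average stall per IO in milliseconds, or None when no IO occurred."""
    if io_count <= 0:
        return None
    return round(stall_ms / io_count, 2)


def interval_latency(before, after):
    """Average read/write latency for the IO issued between two snapshots."""
    return {
        "read_ms": avg_latency_ms(
            after.io_stall_read_ms - before.io_stall_read_ms,
            after.num_of_reads - before.num_of_reads,
        ),
        "write_ms": avg_latency_ms(
            after.io_stall_write_ms - before.io_stall_write_ms,
            after.num_of_writes - before.num_of_writes,
        ),
    }


# Example: the cumulative read average at the second snapshot is 20 ms,
# but the reads issued during the interval averaged 50 ms.
before = FileStats(num_of_reads=1000, io_stall_read_ms=5000,
                   num_of_writes=200, io_stall_write_ms=400)
after = FileStats(num_of_reads=1500, io_stall_read_ms=30000,
                  num_of_writes=200, io_stall_write_ms=400)
result = interval_latency(before, after)
```

Returning None for an idle file is a design choice for this sketch; the query above instead adds 0.001 to the denominator, which yields a value of zero for files with no IO.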

5 The Road to Disk Has Many Tunnels
Just because SQL Server shows signs of IO latency doesn’t necessarily mean that there’s a problem with the disks or storage appliance.
We can capture high-level IO performance information with PerfMon at two layers: logical and physical disk counters.
At the lowest level, we can capture IO latency information with a storport trace. If this reports latency, the issue is somewhere between the miniport driver and the disks.

Perfmon Commands:
Logman.exe create counter IOPerf-Data -o "c:\perflogs\IOPerf-Data.blg" -f bincirc -v mmddhhmm -max 300 -c "\LogicalDisk(*)\*" "\Memory\*" "\PhysicalDisk(*)\*" "\Processor(*)\*" "\Process(*)\*" "\System\*" -si 00:00:02
Logman.exe start IOPerf-Data
Logman.exe stop IOPerf-Data

Storport Commands:
Logman.exe create trace "storport" -ow -o c:\perflogs\storport.etl -p "Microsoft-Windows-StorPort" 0xffffffffffffffff 0xff -nb <min> <max> -bs <size> -mode Circular -f bincirc -max 4096 -ets
logman stop "storport" -ets
(The buffer count and buffer size values for -nb and -bs are missing from this transcript; supply your own.)

6

7 Why not skip directly to storport tracing (without PerfMon)?
What does it mean when storport tracing shows little latency but the PerfMon counters show terrible latency? How can that be narrowed down?
FLTMC filters -> heuristic approach
WPR tracing -> investigative approach
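The narrowing-down logic from the last few slides can be sketched as a simple comparison of the latency seen at each layer, starting from the lowest. This is only an illustration: the function name, the returned labels, and the 20 ms default threshold are assumptions, not values from the presentation.

```python
# Labels for the result; the names are hypothetical and chosen for this sketch.
BELOW_STORPORT = "between the miniport driver and the disks"
WINDOWS_STACK = "between the Windows disk/volume drivers and storport"
ABOVE_PERFMON = "above the layers measured by PerfMon"
NOT_OBSERVED = "no layer exceeds the threshold"


def locate_latency(sql_ms, perfmon_ms, storport_ms, threshold_ms=20.0):
    """Compare average latency (ms) at three vantage points, lowest layer first.

    sql_ms      - per-file latency as seen by SQL Server
    perfmon_ms  - logical/physical disk latency from PerfMon counters
    storport_ms - latency reported by a storport trace
    threshold_ms is an assumed cut-off for "bad" latency; tune it per system.
    """
    if storport_ms >= threshold_ms:
        # Storport already sees the delay, so it originates below Windows.
        return BELOW_STORPORT
    if perfmon_ms >= threshold_ms:
        # PerfMon is bad but storport is good: look inside the Windows stack,
        # which is where FLTMC and WPR tracing come in.
        return WINDOWS_STACK
    if sql_ms >= threshold_ms:
        return ABOVE_PERFMON
    return NOT_OBSERVED
```

Checking the lowest layer first reflects the idea that latency reported by storport is inherited by every layer above it, so higher layers are only suspected once the lower ones look healthy.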

8 Analyzing Storport Using Storport PacMan
The real benefit is that it does much of the work for you!

9 Mapping the LUN:

10 Ok, so PerfMon numbers are bad, but Storport numbers are good
wpr -start diskio -start cpu -filemode
wpr -stop e:\wpr_trace.etl
Don’t forget to Trace->Load Symbols
Graph Explorer, typically two areas of interest:
Storage->Disk Usage->Utilization by Process, Path Name, Stack
Computation->CPU Usage (Sampled)->Utilization by Process, Thread, Stack
Obviously there are many other interesting areas! Check out wpr -? and wpr -profiles for more fun!

11 WPR
So what happens when your security team absolutely refuses to remove the antivirus filter driver until more evidence is provided that it’s causing a problem? That’s where WPR tracing comes in.

12 Additional Resources
Registered Filter driver Lookup:
SQL Server Antivirus Exclusions: computers-that-run-sql-server
Deeper Dive into Storport/WPR Analysis:

13