Diagnosing Storage IO Latency


Diagnosing Storage IO Latency From SQL Server Through Windows

What we will discuss today:
- How to identify within SQL Server whether IO latency is a likely cause of slow performance.
- How to use tools such as performance counters and storport tracing to confirm an IO latency issue and isolate the source.
- How to use WPR to track down IO latency that has been isolated to layers within the scope of the Windows OS.

OK, so SQL Server is slow?

Start by finding the predominant waits. Query both the cumulative wait information for the instance (sys.dm_os_wait_stats) and the per-query wait information (sys.dm_exec_requests, sys.dm_os_waiting_tasks, or the Extended Events wait_completed / wait_info). Do you see any of these?

- PAGEIOLATCH_*: When SQL Server is about to read a page from disk into a buffer, it acquires an exclusive latch (_EX) on the buffer, because the contents of that page are about to be written into it. This is immediately followed by an attempt to acquire a shared latch (_SH), which is blocked by the EX latch until the read completes. That blocking is what shows up in sys.dm_os_wait_stats as PAGEIOLATCH_SH.
- IO_COMPLETION: Non-data-page IO operations.
- ASYNC_IO_COMPLETION: File growth.
- BACKUPIO: Waiting for a backup-specific IO operation.
- BACKUPBUFFER: Occurs when all of the buffers allocated to hold data waiting to be flushed to a backup are full. This is often accompanied by BACKUPIO, which would indicate that the buffers may be full because IO latency is keeping the flush rate too low.
- WRITELOG: Writing to the transaction log.

Cumulative waits for the instance, as a percentage of total wait time:

    DECLARE @TotalWaiting BIGINT;

    -- Total wait time, excluding benign idle/queue/sleep/broker waits
    SELECT @TotalWaiting = SUM(wait_time_ms)
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT LIKE '%IDLE%'
      AND wait_type NOT LIKE '%QUEUE%'
      AND wait_type NOT LIKE '%SLEEP%'
      AND wait_type NOT LIKE '%BROKER%';

    SELECT wait_type
         , waiting_tasks_count
         , wait_time_ms
         , max_wait_time_ms
         , (wait_time_ms / (waiting_tasks_count + 1)) AS WaitTimePerTask
         , CAST((((wait_time_ms + 0.0001) / @TotalWaiting) * 100) AS numeric(18,2)) AS [PercentOfTotalWaiting]
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT LIKE '%IDLE%'
      AND wait_type NOT LIKE '%QUEUE%'
      AND wait_type NOT LIKE '%SLEEP%'
      AND wait_type NOT LIKE '%BROKER%'
    ORDER BY PercentOfTotalWaiting DESC;

Current waits for one session (98 is a placeholder for the session of interest):

    WITH Threads AS (
        SELECT ta.session_id
             , COUNT(1) AS [ThreadsUsed]
        FROM sys.dm_os_threads th
        JOIN sys.dm_os_tasks ta ON th.worker_address = ta.worker_address
        WHERE ta.session_id IS NOT NULL
        GROUP BY ta.session_id
    )
    SELECT t.session_id
         , t.ThreadsUsed
         , er.wait_type
         , wt.exec_context_id
         -- numeric columns: no 'N/A' fallback, which would force a failing string-to-number conversion
         , COALESCE(wt.wait_duration_ms, er.wait_time) AS [wait_ms]
         , COALESCE(wt.resource_description, er.wait_resource, 'N/A') AS [wait_resource]
         , COALESCE(st.text, 'N/A') AS [Statement]
         , qp.query_plan AS [QueryPlan]
    FROM sys.dm_exec_requests er
    LEFT JOIN sys.dm_os_waiting_tasks wt ON wt.session_id = er.session_id
    LEFT JOIN Threads t ON t.session_id = er.session_id
    OUTER APPLY sys.dm_exec_sql_text(er.plan_handle) st
    OUTER APPLY sys.dm_exec_query_plan(er.plan_handle) qp
    WHERE er.session_id = 98
    ORDER BY er.wait_time DESC;

Extended Events session that captures waits into a file and a histogram:

    CREATE EVENT SESSION [FindWaits] ON SERVER
    ADD EVENT sqlos.wait_info(
        ACTION(sqlserver.plan_handle, sqlserver.sql_text)
        -- If there's a particular session of interest, add a filter:
        -- WHERE ([sqlserver].[session_id]=(98))
    ),
    ADD EVENT sqlserver.query_post_execution_showplan(
        ACTION(sqlserver.plan_handle)
    )
    ADD TARGET package0.event_file(SET filename=N'IO_Issues', max_file_size=(128), max_rollover_files=(32)),
    ADD TARGET package0.histogram(SET filtering_event_name=N'sqlos.wait_info', slots=(20), source=N'wait_type', source_type=(0))
    WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS,
          MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)
    GO

    -- STARTUP_STATE=OFF, so the session has to be started before it collects anything
    ALTER EVENT SESSION [FindWaits] ON SERVER STATE = START;
    GO

Reading the histogram target, with wait type keys mapped to names:

    WITH HistogramData AS (
        SELECT hd.slot_data.value('(value)[1]', 'varchar(256)') AS [WaitType]
             , hd.slot_data.value('(@count)[1]', 'varchar(256)') AS [Occurrences]
        FROM (
            SELECT CAST(xt.target_data AS xml) AS target_data
            FROM sys.dm_xe_session_targets AS xt
            JOIN sys.dm_xe_sessions AS xs ON (xs.address = xt.event_session_address)
            WHERE xs.name = 'FindWaits' AND xt.target_name = 'histogram'
        ) AS t
        CROSS APPLY t.target_data.nodes('//HistogramTarget/Slot') AS hd (slot_data)
    )
    SELECT mv.map_value AS [Wait_Type]
         , hd.Occurrences
    FROM HistogramData hd
    JOIN sys.dm_xe_map_values mv ON mv.map_key = hd.WaitType AND mv.name = 'wait_types';
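
Because sys.dm_os_wait_stats is cumulative since the instance started (or since the counters were last cleared), one way to see what the server is waiting on right now is to take two snapshots and compare them. The following is a minimal Python sketch of that idea, assuming the two snapshots have already been fetched into dictionaries of wait_type to wait_time_ms. The function names and sample numbers are hypothetical and are not part of the presentation.

```python
# Wait types from the list above that point toward storage IO.
IO_WAIT_PREFIXES = (
    "PAGEIOLATCH_",
    "IO_COMPLETION",
    "ASYNC_IO_COMPLETION",
    "BACKUPIO",
    "BACKUPBUFFER",
    "WRITELOG",
)


def wait_deltas(before, after):
    """Return {wait_type: wait_time_ms accumulated between the two snapshots}.

    Wait types that did not accumulate any time are dropped.
    """
    deltas = {}
    for wait_type, total_ms in after.items():
        delta = total_ms - before.get(wait_type, 0)
        if delta > 0:
            deltas[wait_type] = delta
    return deltas


def io_wait_share(deltas):
    """Return the percentage of the interval's wait time spent on IO-related waits."""
    total = sum(deltas.values())
    if total == 0:
        return 0.0
    io_ms = sum(ms for wt, ms in deltas.items() if wt.startswith(IO_WAIT_PREFIXES))
    return 100.0 * io_ms / total


# Hypothetical snapshots (wait_type -> cumulative wait_time_ms).
before = {"PAGEIOLATCH_SH": 1000, "WRITELOG": 500, "CXPACKET": 2000}
after = {"PAGEIOLATCH_SH": 4000, "WRITELOG": 1500, "CXPACKET": 3000,
         "SOS_SCHEDULER_YIELD": 0}

deltas = wait_deltas(before, after)
print(deltas)
print(f"IO-related share of waits: {io_wait_share(deltas):.1f}%")
```

A high IO share over the interval is only a hint that storage latency is worth investigating; it does not say which layer is responsible.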

So it looks like it might be an IO latency issue… Let's drill down.

All of the information so far has been seen from the perspective of the database engine. There are many layers below it that could affect IO latency.

- sys.dm_io_virtual_file_stats: IO statistics aggregated per file, from the perspective of SQL Server.
- Performance counters (Logman/PerfMon): IO statistics aggregated per disk/volume, from the perspective of the Windows drivers that implement the disk/volume abstractions in Windows.
- Storport tracing: limited IO statistics aggregated per LUN, from the perspective of the storport driver, the lowest-level storage IO driver in the Windows OS.

Per-file IO statistics and average latency as seen by SQL Server:

    SELECT db.name AS [Database]
         , af.filename AS [File]
         , fs.num_of_reads AS [Reads]
         , fs.num_of_writes AS [Writes]
         , fs.num_of_bytes_read AS [ReadBytes]
         , fs.num_of_bytes_written AS [WrittenBytes]
         , fs.io_stall
         -- +0.001 avoids a divide-by-zero on files with no reads or writes
         , CAST((fs.io_stall_read_ms / (fs.num_of_reads + 0.001)) AS decimal(18,2)) AS [AvgReadLatency_ms]
         , CAST((fs.io_stall_write_ms / (fs.num_of_writes + 0.001)) AS decimal(18,2)) AS [AvgWriteLatency_ms]
         , fs.io_stall_queued_read_ms
         , fs.io_stall_queued_write_ms
    FROM sys.databases db
    JOIN sys.sysaltfiles af ON af.dbid = db.database_id
    CROSS APPLY sys.dm_io_virtual_file_stats(db.database_id, af.fileid) fs
    ORDER BY fs.io_stall_read_ms DESC, fs.io_stall_write_ms DESC;
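
The per-file numbers from sys.dm_io_virtual_file_stats are also cumulative since the instance started, so an average over the whole uptime can hide a recent problem. A minimal Python sketch of computing average latency over an interval from two snapshots follows. The FileStats class, the function name, and the sample values are hypothetical; only the column names come from the DMV.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """One row of sys.dm_io_virtual_file_stats for a single file (subset of columns)."""
    num_of_reads: int
    io_stall_read_ms: int
    num_of_writes: int
    io_stall_write_ms: int


def avg_latency_ms(before, after):
    """Return (avg_read_ms, avg_write_ms) for IO issued between the two snapshots."""
    reads = after.num_of_reads - before.num_of_reads
    writes = after.num_of_writes - before.num_of_writes
    read_stall = after.io_stall_read_ms - before.io_stall_read_ms
    write_stall = after.io_stall_write_ms - before.io_stall_write_ms
    # Guard against intervals with no IO instead of padding the divisor.
    avg_read = read_stall / reads if reads > 0 else 0.0
    avg_write = write_stall / writes if writes > 0 else 0.0
    return avg_read, avg_write


# Hypothetical snapshots of the same data file taken a few minutes apart.
snap1 = FileStats(num_of_reads=1000, io_stall_read_ms=5000,
                  num_of_writes=200, io_stall_write_ms=400)
snap2 = FileStats(num_of_reads=1500, io_stall_read_ms=30000,
                  num_of_writes=300, io_stall_write_ms=900)

read_ms, write_ms = avg_latency_ms(snap1, snap2)
print(f"avg read latency:  {read_ms:.1f} ms")
print(f"avg write latency: {write_ms:.1f} ms")
```

In this made-up example the lifetime average read latency at the first snapshot is 5 ms, while the interval average is 50 ms, which is the kind of difference the snapshot approach is meant to expose.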

The Road to Disk Has Many Tunnels

Just because SQL Server shows signs of IO latency doesn't mean that there's necessarily a problem with the disks or storage appliance. We can capture high-level IO performance information with PerfMon at two layers: logical and physical disk counters. At the lowest level, we can capture IO latency information with a storport trace. If this reports latency, the issue is somewhere between the miniport driver and the disks.

Perfmon Commands:

    Logman.exe create counter IOPerf-Data -o "c:\perflogs\IOPerf-Data.blg" -f bincirc -v mmddhhmm -max 300 -c "\LogicalDisk(*)\*" "\Memory\*" "\PhysicalDisk(*)\*" "\Processor(*)\*" "\Process(*)\*" "\System\*" -si 00:00:02
    Logman.exe start IOPerf-Data
    Logman.exe stop IOPerf-Data

Storport Commands:

    Logman.exe create trace "storport" -ow -o c:\perflogs\storport.etl -p "Microsoft-Windows-StorPort" 0xffffffffffffffff 0xff -nb 16 16 -bs 1024 -mode Circular -f bincirc -max 4096 -ets
    logman stop "storport" -ets

Why not skip directly to storport tracing (without PerfMon)?

What does it mean when storport tracing shows little latency but the PerfMon counters show terrible latency? How can that be narrowed down?
- FLTMC filters -> heuristic approach
- WPR tracing -> investigative approach
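
The comparison between the two measurement points can be written down as a small decision rule. This is a rough Python sketch of that reasoning, not a tool from the presentation: the function name, the returned labels, and the 20 ms threshold are all assumptions chosen for illustration, and a sensible threshold depends on the storage and workload.

```python
def isolate_latency_layer(perfmon_ms, storport_ms, threshold_ms=20.0):
    """Suggest where to look next, given average IO latency at two layers.

    perfmon_ms   -- latency from the logical/physical disk counters
    storport_ms  -- latency reported by the storport trace
    threshold_ms -- hypothetical cut-off for "bad" latency (an assumption)
    """
    if storport_ms >= threshold_ms:
        # Storport itself sees slow IO: the problem is below Windows.
        return "miniport_to_disk"
    if perfmon_ms >= threshold_ms:
        # Storport is fast but the disk counters are slow: the time is being
        # spent inside the Windows storage stack (check fltmc, trace with WPR).
        return "windows_stack"
    # Neither layer shows latency above the threshold.
    return "not_confirmed"


print(isolate_latency_layer(perfmon_ms=85.0, storport_ms=90.0))
print(isolate_latency_layer(perfmon_ms=85.0, storport_ms=3.0))
print(isolate_latency_layer(perfmon_ms=4.0, storport_ms=3.0))
```

This also suggests why PerfMon is worth capturing alongside the storport trace: with only the storport number, the second case cannot be told apart from the third.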

Analyzing Storport Using Storport PacMan

The real benefit is that it does much of the analysis work for you!
http://codebox/storportpacman
https://blogs.technet.microsoft.com/askcore/2014/08/19/deciphering-storport-traces-101/

Mapping the LUN:

OK, so PerfMon numbers are bad, but Storport numbers are good

    wpr -start diskio -start cpu -filemode
    wpr -stop e:\wpr_trace.etl

When opening the trace for analysis, don't forget to Trace->Load Symbols.

Graph Explorer: typically two areas of interest:
- Storage->Disk Usage->Utilization by Process, Path Name, Stack
- Computation->CPU Usage (Sampled)->Utilization by Process, Thread, Stack

There are many other interesting areas! Check out wpr -? and wpr -profiles for more.

WPR

So what happens when your security team absolutely refuses to remove the antivirus filter driver until more evidence is provided that it's causing a problem? That's where WPR tracing comes in.

Additional Resources

- Registered Filter Driver Lookup: https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/allocated-altitudes
- SQL Server Antivirus Exclusions: https://support.microsoft.com/en-us/help/309422/choosing-antivirus-software-for-computers-that-run-sql-server
- Deeper Dive into Storport/WPR Analysis: https://docs.microsoft.com/en-us/windows-hardware/test/wpt/cpu-analysis