Become a SQL Server Performance Detective
Danette Dineen Riviello
Magellan Health
June 6, 2015
Goal
To learn ways to collect and interpret the data available in SQL Server 2008 and above to identify the culprit in chronic or emergent performance issues.
Game Plan
- What triggers an investigation?
- Emergent performance issues
- Chronic performance problems
- Solving the case
Open a Case
- Increase in user complaints
- Application timeouts
- Long-running queries
- Open transactions
- Chain of blocking
Look for Clues
Look at all running processes:
    EXEC sp_who2 'active';
Look for one login or one database (replace '%login%' and 'DBName' with the login and database under investigation):
    SELECT spid, [status], loginame [Login], hostname, blocked BlkBy,
           DB_NAME(dbid) DBName, cmd Command, cpu CPUTime,
           physical_io DiskIO, last_batch LastBatch, [program_name] ProgramName
    FROM master.dbo.sysprocesses
    WHERE [status] NOT IN ('sleeping')
      AND loginame LIKE '%login%'
      AND DB_NAME(dbid) = 'DBName'
    ORDER BY DBName;
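A possible DMV-based variant of the query above, sketched for SQL Server 2008 and later: sys.sysprocesses is kept only for backward compatibility, while sys.dm_exec_requests and sys.dm_exec_sessions expose the same information. The variables @LoginPattern and @DBName are placeholders to set for your investigation.

```sql
-- Sketch: currently executing requests for one login and one database.
-- @LoginPattern and @DBName are placeholders; set them for your investigation.
DECLARE @LoginPattern nvarchar(128) = N'%login%';
DECLARE @DBName       sysname       = N'DBName';

SELECT  s.session_id,
        r.status,
        s.login_name,
        s.host_name,
        r.blocking_session_id  AS BlkBy,
        DB_NAME(r.database_id) AS DBName,
        r.command,
        r.cpu_time             AS CPUTimeMs,   -- milliseconds
        r.reads,
        r.writes,
        r.start_time,
        s.program_name
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = r.session_id
WHERE s.is_user_process = 1              -- skip system sessions
  AND s.login_name LIKE @LoginPattern
  AND r.database_id = DB_ID(@DBName)
ORDER BY s.session_id;
```

Because sys.dm_exec_requests only lists sessions that are running something right now, there is no need to filter out sleeping sessions.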
Apprehend the Culprit
Look for the lead of a blocking chain (a session that blocks others but is not itself blocked):
    SELECT sp.spid,
           sp.[status],
           loginame = SUBSTRING(sp.loginame, 1, 12),
           hostname = SUBSTRING(sp.hostname, 1, 12),
           blk      = CONVERT(CHAR(3), sp.blocked),
           sp.open_tran,
           dbname   = SUBSTRING(DB_NAME(sp.dbid), 1, 10),
           sp.cmd, sp.waittype, sp.[program_name], sp.waittime, sp.last_batch,
           -- offsets are 0-based byte positions; SUBSTRING is 1-based and counts characters
           SQLStatement = SUBSTRING(qt.text,
                                    er.statement_start_offset / 2 + 1,
                                    (CASE WHEN er.statement_end_offset = -1
                                          THEN DATALENGTH(qt.text)
                                          ELSE er.statement_end_offset
                                     END - er.statement_start_offset) / 2 + 1)
    FROM master.dbo.sysprocesses sp
    LEFT JOIN sys.dm_exec_requests er ON er.session_id = sp.spid
    OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) AS qt
    WHERE sp.spid IN (SELECT blocked FROM master.dbo.sysprocesses)
      AND sp.blocked = 0;
A lead blocker that is idle with an open transaction has no current request, so its SQLStatement is NULL.
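To see every waiting request and the session it is waiting on, not only the head of the chain, a sketch that reads blocking_session_id from sys.dm_exec_requests:

```sql
-- Sketch: every request that is currently blocked, longest wait first.
-- Follow blocking_session_id from row to row to walk the chain back to its head.
SELECT  r.session_id,
        r.blocking_session_id,
        r.wait_type,
        r.wait_time            AS wait_time_ms,
        r.wait_resource,
        DB_NAME(r.database_id) AS DBName,
        t.text                 AS batch_text
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;
```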
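Gather Evidence - Query
Look at object locks (run this in the database under investigation, because OBJECT_NAME and sys.objects resolve names in the current database):
    SELECT sl.resource_type,
           DB_NAME(sl.resource_database_id) AS DatabaseName,
           OBJECT_NAME(sl.resource_associated_entity_id) AS ObjectName,
           sl.request_status, sl.request_mode, sl.request_session_id,
           sl.resource_description
    FROM sys.dm_tran_locks sl
    JOIN sys.objects so ON so.object_id = sl.resource_associated_entity_id
    WHERE sl.resource_type = 'OBJECT'
      AND sl.resource_database_id = DB_ID();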
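When many locks are held, a summary per session can be easier to read than the raw list. A sketch, again run in the database under investigation:

```sql
-- Sketch: lock counts per session, resource type, mode and status for the
-- current database. request_status = 'WAIT' marks lock requests that are blocked.
SELECT  request_session_id,
        resource_type,
        request_mode,
        request_status,
        COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
GROUP BY request_session_id, resource_type, request_mode, request_status
ORDER BY lock_count DESC;
```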
Gather Evidence - Results
Decipher Lock Modes
- Sch-S: Schema stability
- Sch-M: Schema modification
- S: Shared
- U: Update
- X: Exclusive
- IU: Intent Update
- IX: Intent Exclusive
- IS: Intent Shared
- SIU: Shared Intent Update
- SIX: Shared Intent Exclusive
- UIX: Update Intent Exclusive
- BU: Bulk Update
Collect Further Clues
Look for open transactions:
    SELECT spid, [status], loginame [Login], hostname, blocked BlkBy,
           DB_NAME(dbid) DBName, cmd Command, cpu CPUTime,
           physical_io DiskIO, last_batch LastBatch, [program_name] ProgramName
    FROM master.dbo.sysprocesses
    WHERE open_tran > 0
    ORDER BY spid;
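A sketch that adds when each open transaction began and how much log it is holding, using the transaction DMVs (SQL Server 2008 and later). A transaction that touches several databases appears once per database.

```sql
-- Sketch: open transactions per session, oldest first, with the
-- transaction log space each one is using in its database.
SELECT  tst.session_id,
        s.login_name,
        s.host_name,
        s.program_name,
        tat.transaction_begin_time,
        DB_NAME(tdt.database_id)               AS DBName,
        tdt.database_transaction_log_bytes_used
FROM sys.dm_tran_session_transactions   AS tst
JOIN sys.dm_tran_active_transactions    AS tat ON tat.transaction_id = tst.transaction_id
JOIN sys.dm_tran_database_transactions  AS tdt ON tdt.transaction_id = tst.transaction_id
JOIN sys.dm_exec_sessions               AS s   ON s.session_id       = tst.session_id
ORDER BY tat.transaction_begin_time;
```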
Investigate Further
    DBCC OPENTRAN;
Sample output:
    Oldest active transaction:
        SPID (server process ID): 770
        UID (user ID) : -1
        Name          : user_transaction
        LSN           : (1035423:31630:1)
        Start time    : May 22 2015 11:55:01:713AM
        SID           : 0x7edf25dd64e37049b598df28cd124355
    Replicated Transaction Information:
        Oldest distributed LSN     : (1035392:5570:1)
        Oldest non-distributed LSN : (1035323:31630:1)
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
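DBCC OPENTRAN can also target a named database and return its findings as a result set. A sketch, where 'DBName' is a placeholder and 770 is the spid from the sample output above (substitute your own), followed by a look at the last statement that spid sent:

```sql
-- Sketch: oldest active transaction for one database, as a result set.
-- 'DBName' is a placeholder for the database under investigation.
DBCC OPENTRAN ('DBName') WITH TABLERESULTS, NO_INFOMSGS;

-- The last statement sent by the reported spid (770 in the sample output).
SELECT  c.session_id,
        c.connect_time,
        c.client_net_address,
        t.text AS most_recent_sql
FROM sys.dm_exec_connections AS c
OUTER APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS t
WHERE c.session_id = 770;
```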
Search for the “Smoking Gun”
When a stored procedure is performing poorly, run the following query to find which statement it is currently executing:
    DECLARE @SPID int = 53;  -- hypothetical value: set to the spid in question
    SELECT [Spid]             = er.session_id,
           sp.ecid,
           [Database]         = DB_NAME(sp.dbid),
           [User]             = sp.nt_username,
           [Status]           = er.status,
           [Wait]             = er.wait_type,
           [Individual Query] = SUBSTRING(qt.text,
                                          er.statement_start_offset / 2 + 1,
                                          (CASE WHEN er.statement_end_offset = -1
                                                THEN DATALENGTH(qt.text)
                                                ELSE er.statement_end_offset
                                           END - er.statement_start_offset) / 2 + 1),
           [Parent Query]     = qt.text,
           Program            = sp.[program_name],
           sp.hostname,
           sp.nt_domain,
           er.start_time
    FROM sys.dm_exec_requests er
    INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid
    CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) AS qt
    WHERE er.session_id = @SPID
    ORDER BY 1, 2;
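Once the statement is known, its cached plan can be pulled for just that statement with sys.dm_exec_text_query_plan. A sketch; @SPID is a hypothetical placeholder for the session in question:

```sql
-- Sketch: the cached plan for the statement the session is running right now.
-- @SPID is a hypothetical placeholder; set it to the spid in question.
DECLARE @SPID int = 53;

SELECT  r.session_id,
        r.start_time,
        r.statement_start_offset,
        r.statement_end_offset,
        p.query_plan            -- showplan XML returned as nvarchar(max)
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_text_query_plan(r.plan_handle,
                                        r.statement_start_offset,
                                        r.statement_end_offset) AS p
WHERE r.session_id = @SPID;
```

Saving the query_plan text to a file with a .sqlplan extension lets SQL Server Management Studio open it as a graphical plan.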
Investigate the Environment
What has changed? Look at the default system trace:
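One way to read the default trace, sketched for SQL Server 2008 and later: sys.traces gives the path of the current default trace file, and sys.fn_trace_gettable reads it. Event classes 46, 47 and 164 (Object:Created, Object:Deleted, Object:Altered) are one assumed choice of "what changed" events; widen the list as needed.

```sql
-- Sketch: recent object creates, drops and alters from the default trace.
-- If the default trace is disabled, @path stays NULL and no rows come back.
DECLARE @path nvarchar(260);

SELECT @path = [path]
FROM sys.traces
WHERE is_default = 1;

SELECT  t.StartTime,
        te.name        AS EventName,
        t.DatabaseName,
        t.ObjectName,
        t.LoginName,
        t.HostName,
        t.ApplicationName
FROM sys.fn_trace_gettable(@path, DEFAULT) AS t
JOIN sys.trace_events AS te
  ON te.trace_event_id = t.EventClass
WHERE t.EventClass IN (46, 47, 164)   -- Object:Created, Object:Deleted, Object:Altered
ORDER BY t.StartTime DESC;
```

The default trace rolls over to a new file periodically and @path points at the current one; to search further back, pass the path of an older log_*.trc file from the same Log directory instead.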
Inspect System Trace Data
- Look for recent changes
- Look in the Log directory for prior trace files
Identify Chronic Offenders
To find the most expensive stored procedures (by total CPU time) since their plans were cached (times are in microseconds):
    SELECT TOP 100
           d.object_id,
           d.database_id,
           OBJECT_NAME(d.object_id, d.database_id) AS [proc name],
           d.cached_time,
           d.last_execution_time,
           d.total_worker_time,
           d.total_elapsed_time,
           d.total_elapsed_time / d.execution_count AS [avg_elapsed_time],
           d.last_elapsed_time,
           d.execution_count
    FROM sys.dm_exec_procedure_stats AS d
    ORDER BY d.total_worker_time DESC;
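To see which statements inside those procedures carry the cost, a similar sketch over sys.dm_exec_query_stats, which keeps statistics per statement in each cached plan:

```sql
-- Sketch: top statements by total CPU across all cached plans (microseconds).
-- Statement offsets are 0-based byte positions, hence the / 2 + 1.
SELECT TOP 25
        DB_NAME(t.dbid)                              AS DBName,
        OBJECT_NAME(t.objectid, t.dbid)              AS ObjectName,
        qs.execution_count,
        qs.total_worker_time,
        qs.total_worker_time / qs.execution_count    AS avg_worker_time,
        qs.total_logical_reads / qs.execution_count  AS avg_logical_reads,
        SUBSTRING(t.text,
                  qs.statement_start_offset / 2 + 1,
                  (CASE WHEN qs.statement_end_offset = -1
                        THEN DATALENGTH(t.text)
                        ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
ORDER BY qs.total_worker_time DESC;
```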
Prioritize Worst Offenders
Most expensive stored procedure runs
Find Costliest Procedures
- Time the procedure plan was cached
- Number of days between cache time and most recent execution: about 15 to 16 days
- Number of times executed during that period
- Average elapsed time multiplied by number of executions (total elapsed time)
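The figures listed above can be computed directly. A sketch, assuming that "days in cache" means the span between cached_time and last_execution_time:

```sql
-- Sketch: cost of each cached procedure relative to how long it has been cached.
SELECT TOP 25
        DB_NAME(d.database_id)                              AS DBName,
        OBJECT_NAME(d.object_id, d.database_id)             AS ProcName,
        d.cached_time,
        d.last_execution_time,
        DATEDIFF(DAY, d.cached_time, d.last_execution_time) AS days_in_cache,
        d.execution_count,
        d.execution_count
          / NULLIF(DATEDIFF(DAY, d.cached_time, d.last_execution_time), 0)
                                                            AS executions_per_day,
        d.total_elapsed_time / d.execution_count            AS avg_elapsed_time_us,
        d.total_elapsed_time                                AS total_elapsed_time_us  -- roughly avg x executions
FROM sys.dm_exec_procedure_stats AS d
ORDER BY d.total_elapsed_time DESC;
```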
Set up a Profiler Trace Job - Why
- Less impact than an interactive trace
- Can load trace data on an alternate server
- Can load trace data at a different time of day
- Captures the specific parameters passed
- Lets you compare the same time of day on different days
Set up a Profiler trace Job - How
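A server-side trace can be scripted with the sp_trace_* system procedures and run from a SQL Server Agent job step. The sketch below is one possible setup, not the exact job from these slides: the d:\trace_yyyymmdd_hour file name pattern, 100 MB file size, one-hour stop time, 'DBName' filter and 1-second duration threshold are all assumptions to adjust.

```sql
-- Sketch of a server-side trace for an Agent job step. File path, size, stop time,
-- database filter and duration threshold are assumptions; adjust them.
DECLARE @TraceID     int;
DECLARE @rc          int;
DECLARE @On          bit      = 1;
DECLARE @MaxFileMB   bigint   = 100;
DECLARE @StopTime    datetime = DATEADD(HOUR, 1, GETDATE());
DECLARE @MinDuration bigint   = 1000000;   -- 1 second, in microseconds
-- Hypothetical name such as d:\trace_20150201_8 (date + hour); the .trc extension
-- is appended automatically and the file must not already exist.
DECLARE @TraceFile   nvarchar(245) =
        N'd:\trace_' + CONVERT(nvarchar(8), GETDATE(), 112)
        + N'_' + CONVERT(nvarchar(2), DATEPART(HOUR, GETDATE()));

-- Option 2 = TRACE_FILE_ROLLOVER
EXEC @rc = sp_trace_create @TraceID OUTPUT, 2, @TraceFile, @MaxFileMB, @StopTime;
IF @rc <> 0
BEGIN
    RAISERROR('sp_trace_create failed with code %d', 16, 1, @rc);
    RETURN;
END

-- Events 10 (RPC:Completed) and 12 (SQL:BatchCompleted) with columns
-- 1 TextData, 11 LoginName, 12 SPID, 13 Duration, 14 StartTime,
-- 15 EndTime, 16 Reads, 18 CPU, 35 DatabaseName.
DECLARE @EventID int, @ColumnID int;
DECLARE trace_cols CURSOR LOCAL FAST_FORWARD FOR
    SELECT e.id, c.id
    FROM (VALUES (10), (12)) AS e(id)
    CROSS JOIN (VALUES (1), (11), (12), (13), (14), (15), (16), (18), (35)) AS c(id);
OPEN trace_cols;
FETCH NEXT FROM trace_cols INTO @EventID, @ColumnID;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_trace_setevent @TraceID, @EventID, @ColumnID, @On;
    FETCH NEXT FROM trace_cols INTO @EventID, @ColumnID;
END
CLOSE trace_cols;
DEALLOCATE trace_cols;

-- Filters: logical operator 0 = AND; comparison 0 = equal, 4 = greater than or equal.
EXEC sp_trace_setfilter @TraceID, 35, 0, 0, N'DBName';   -- placeholder database
EXEC sp_trace_setfilter @TraceID, 13, 0, 4, @MinDuration;

-- Status 1 starts the trace; it stops and closes itself at @StopTime.
EXEC sp_trace_setstatus @TraceID, 1;
SELECT @TraceID AS TraceID;
```

The variables are typed explicitly (bit for @On, bigint for the file size and duration filter) because these system procedures reject parameters of the wrong type.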
Inspect the Profiler Trace File
Load the trace file into a table on another server (the second argument, 1, reads only this file; DEFAULT would also read its rollover files):
    SELECT *
    INTO dbo.tmp_loadtraceFile_ServerA_20150201_8
    FROM ::fn_trace_gettable('d:\trace_20150201_8.trc', 1);
Query the trace table to find the commands that call the suspected stored procedure (duration is stored in microseconds in trace files):
    SELECT TOP 25 textdata, loginname, spid, duration, starttime, endtime, reads, cpu
    FROM dbo.tmp_loadtraceFile_ServerA_20150201_8
    WHERE textdata LIKE '%offendingproc%'
    ORDER BY duration DESC;
Inspect Profiler Trace Data
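To compare runs by time of day, the loaded trace table can be summarized. A sketch against the dbo.tmp_loadtraceFile_ServerA_20150201_8 table loaded on the previous slide, with the same '%offendingproc%' placeholder pattern:

```sql
-- Sketch: calls to the suspected procedure summarized by hour of day.
-- duration is in microseconds in saved trace files; cpu is in milliseconds.
SELECT  DATEPART(HOUR, starttime)  AS hour_of_day,
        COUNT(*)                   AS calls,
        AVG(duration) / 1000       AS avg_duration_ms,
        MAX(duration) / 1000       AS max_duration_ms,
        SUM(reads)                 AS total_reads,
        SUM(cpu)                   AS total_cpu_ms
FROM dbo.tmp_loadtraceFile_ServerA_20150201_8
WHERE textdata LIKE '%offendingproc%'
GROUP BY DATEPART(HOUR, starttime)
ORDER BY hour_of_day;
```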
Retrieve the Query Plan
Query to get query plans from the DMVs:
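A sketch of one such query, joining procedure statistics to sys.dm_exec_query_plan; 'offendingproc' is the placeholder procedure name used in the trace queries:

```sql
-- Sketch: cached plan(s) for one procedure.
-- 'offendingproc' is a placeholder procedure name.
SELECT  DB_NAME(ps.database_id)                    AS DBName,
        OBJECT_NAME(ps.object_id, ps.database_id)  AS ProcName,
        ps.cached_time,
        ps.execution_count,
        ps.total_elapsed_time / ps.execution_count AS avg_elapsed_time_us,
        qp.query_plan                              -- click in SSMS to open the graphical plan
FROM sys.dm_exec_procedure_stats AS ps
CROSS APPLY sys.dm_exec_query_plan(ps.plan_handle) AS qp
WHERE OBJECT_NAME(ps.object_id, ps.database_id) = N'offendingproc';
```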
Examine Long-Running Procedures
- Look at the query plan: is an index missing, or was the wrong index chosen?
- Look at the parameters sent in
- Check for other runs that perform better
- Could it be a parameter sniffing issue?
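One way to compare a fast run with a slow one is to replay both with I/O and timing statistics turned on. A sketch; dbo.offendingproc and @Param1 are hypothetical stand-ins for the real procedure and the parameter values captured in the trace:

```sql
-- Sketch: compare a fast and a slow parameter value side by side.
-- dbo.offendingproc and @Param1 are hypothetical; use the procedure and the
-- parameter values captured in the trace.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

EXEC dbo.offendingproc @Param1 = 'value from a fast run';
EXEC dbo.offendingproc @Param1 = 'value from a slow run';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab then shows logical reads per table and CPU and elapsed time for each call; turning on "Include Actual Execution Plan" in SSMS (Ctrl+M) also shows the plan each call used.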
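Parameter Sniffing - Defined
The query plan is built from the parameter values passed when the procedure is compiled (typically its first execution) and is then reused for later calls.
- Pros: saves time, since only one compile is needed
- Cons: the cached plan can be the wrong plan for other parameter values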
Parameter Sniffing - Detection
- Look at the query plans
- Does the procedure perform well for some parameter values and poorly for others?
- Do the index choices make sense?
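A cached plan records the parameter values it was compiled with, which can then be compared with the values sent by slow runs. A sketch that reads them from the showplan XML; 'offendingproc' is a placeholder procedure name:

```sql
-- Sketch: the parameter values a cached procedure plan was compiled with.
-- 'offendingproc' is a placeholder procedure name.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT DISTINCT
        OBJECT_NAME(ps.object_id, ps.database_id)              AS ProcName,
        ps.cached_time,
        p.n.value('@Column', 'nvarchar(128)')                  AS ParameterName,
        p.n.value('@ParameterCompiledValue', 'nvarchar(4000)') AS CompiledValue
FROM sys.dm_exec_procedure_stats AS ps
CROSS APPLY sys.dm_exec_query_plan(ps.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS p(n)
WHERE OBJECT_NAME(ps.object_id, ps.database_id) = N'offendingproc';
```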
Parameter Sniffing - Example
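A small, self-contained demonstration, assuming a scratch database where you may create objects; dbo.SniffDemo and dbo.GetByStatus are hypothetical names. The data is deliberately skewed so that one value is rare and another is common.

```sql
-- Sketch: reproduce parameter sniffing with skewed data.
-- dbo.SniffDemo and dbo.GetByStatus are hypothetical demo objects.
CREATE TABLE dbo.SniffDemo (
    Id         int IDENTITY(1, 1) PRIMARY KEY,
    StatusCode char(1)   NOT NULL,
    Filler     char(200) NOT NULL DEFAULT ('x')
);

-- 100,000 common 'A' rows and 10 rare 'B' rows
INSERT dbo.SniffDemo (StatusCode)
SELECT TOP (100000) 'A' FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
INSERT dbo.SniffDemo (StatusCode)
SELECT TOP (10) 'B' FROM sys.all_objects;

CREATE INDEX IX_SniffDemo_StatusCode ON dbo.SniffDemo (StatusCode);
GO

CREATE PROCEDURE dbo.GetByStatus @StatusCode char(1)
AS
SELECT Id, Filler
FROM dbo.SniffDemo
WHERE StatusCode = @StatusCode;
GO

SET STATISTICS IO ON;
-- First call compiles the plan for the rare value (typically an index seek plus key lookups)...
EXEC dbo.GetByStatus @StatusCode = 'B';
-- ...and this call reuses that plan, doing a key lookup for each of 100,000 rows.
EXEC dbo.GetByStatus @StatusCode = 'A';
-- A fresh plan compiled for 'A' (typically a scan) shows the difference in logical reads.
EXEC dbo.GetByStatus @StatusCode = 'A' WITH RECOMPILE;
```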
Parameter Sniffing - Solutions
- Do nothing
- Force a recompile on each run (expensive!)
- Query hints (OPTIMIZE FOR)
- Break down stored procedures to handle specific cases
- Educate users on the best parameter choices
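The hint-based options can be sketched as follows, assuming a hypothetical dbo.Orders table whose Status column has a skewed distribution; the procedure names and the value 'A' are illustrative only. OPTIMIZE FOR UNKNOWN requires SQL Server 2008 or later.

```sql
-- Sketch: three ways to limit parameter sniffing on one statement.
-- dbo.Orders, its columns and the procedure names are hypothetical.

-- Recompile this statement on every run: a plan fitted to each value, at the cost of a compile each time.
CREATE PROCEDURE dbo.GetOrdersByStatus_Recompile @Status char(1)
AS
SELECT OrderId, CustomerId
FROM dbo.Orders
WHERE Status = @Status
OPTION (RECOMPILE);
GO

-- Always build the plan for a chosen representative value, whatever value is passed.
CREATE PROCEDURE dbo.GetOrdersByStatus_ForValue @Status char(1)
AS
SELECT OrderId, CustomerId
FROM dbo.Orders
WHERE Status = @Status
OPTION (OPTIMIZE FOR (@Status = 'A'));
GO

-- Build the plan from average column density instead of the sniffed value.
CREATE PROCEDURE dbo.GetOrdersByStatus_Unknown @Status char(1)
AS
SELECT OrderId, CustomerId
FROM dbo.Orders
WHERE Status = @Status
OPTION (OPTIMIZE FOR UNKNOWN);
GO
```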
Rule Out Other Culprits
- Check for table scans caused by:
  - a missing index
  - a broad WHERE clause
- Check for improper (many-to-many) joins
- Check for too many tables in one join
- Check for use of a function in a large query result set
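To find where table scans are happening, the index usage counters are a quick starting point. A sketch, run in the database under investigation:

```sql
-- Sketch: how often each index in the current database has been scanned versus
-- sought. Counters reset when SQL Server restarts.
SELECT  OBJECT_NAME(s.object_id) AS TableName,
        i.name                   AS IndexName,   -- NULL for a heap
        i.type_desc,
        s.user_seeks,
        s.user_scans,
        s.user_lookups,
        s.last_user_scan
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY s.user_scans DESC;
```

A large table whose clustered index or heap shows many user_scans points at queries worth examining for a missing index or a broad WHERE clause.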
Search for Missing Indexes
    SELECT migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
             * (migs.user_seeks + migs.user_scans) AS improvement_measure,
           'CREATE INDEX [missing_index_' + CONVERT(varchar, mig.index_group_handle)
             + '_' + CONVERT(varchar, mid.index_handle)
             + '_' + LEFT(PARSENAME(mid.statement, 1), 32) + ']'
             + ' ON ' + mid.statement
             + ' (' + ISNULL(mid.equality_columns, '')
             + CASE WHEN mid.equality_columns IS NOT NULL
                     AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
             + ISNULL(mid.inequality_columns, '')
             + ')'
             + ISNULL(' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
           migs.*,
           mid.database_id,
           mid.[object_id]
    FROM sys.dm_db_missing_index_groups mig
    INNER JOIN sys.dm_db_missing_index_group_stats migs
            ON migs.group_handle = mig.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details mid
            ON mig.index_handle = mid.index_handle
    WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
            * (migs.user_seeks + migs.user_scans) > 10
    ORDER BY migs.avg_total_user_cost * migs.avg_user_impact
             * (migs.user_seeks + migs.user_scans) DESC;
Solve the Case!
The solution may change over time:
- Tables grow
- Statistics go out of date
- Parameter sniffing
Some problems result from multiple issues.
Make the least disruptive changes first:
- Add an index
- Close open connections
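Because out-of-date statistics are one reason a solution stops working, it can help to check their age. A sketch for the current database; dbo.Orders in the UPDATE STATISTICS line is a hypothetical table name:

```sql
-- Sketch: when statistics on user tables in the current database were last updated
-- (NULL sorts first: statistics that have never been built).
SELECT  OBJECT_NAME(s.object_id)            AS TableName,
        s.name                              AS StatName,
        STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY LastUpdated;

-- Refresh statistics on one table (dbo.Orders is hypothetical);
-- FULLSCAN reads every row, so schedule it off-peak on large tables.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```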
Questions? Thank you for attending! Further questions: