Monitor SQL Server Efficiently

Presentation transcript:

Monitor SQL Server Efficiently Remus Rusanu #sqlsaturday565 24 Sept 2016, Bucharest 24 Sept 2017 SQLSaturday 565 Bucharest

whoami
  Worked in the SQL Server team at Microsoft since 2001
  Now founder of DBHistory.com
  Spent many hours investigating performance
  On-call performance engineer for Azure SQL DB
  Troubleshooting based on telemetry alone

Why do we monitor?
Troubleshooting
  We need to investigate incidents as they occur, so we need to measure right now
  On demand, high resolution, short duration, discardable
  Requires access to the system being monitored
Post mortem analysis
  After an incident we need to look back to understand why it occurred
  Contiguous, medium resolution, discardable after a grace period
  Missing data feedback loop
Trending
  Understand long term direction, estimate capacity needs, justify spending requests
  Contiguous, low resolution, long retention
Baselining
  Periodically collect the same data we would collect in troubleshooting, so we can compare an incident with normal activity
  On schedule, high resolution, medium duration, long retention
Alerting
  Detect incidents, notify the on-call team and trigger investigation
  Alerting coupled with automation: mitigation bots for known issues

What do we monitor?
Activity
  What is running
Capacity
  Free disk, space used
  CPU utilization
  IO utilization
  Memory use, paging
  Network use, bandwidth, latency
Availability
  Uptime, SLA
  Errors
Recoverability
  Backups
  Availability Groups
Specific features
  Replication

How do we monitor?
Performance Counters
  The gold standard when it comes to measurement
  Easy to collect, low impact, rich toolset, cheap to store
DMVs
  Difficult to collect, many require snapshot-store-and-compare
  Some have significant impact
XEvents
  Abundant information
  Easy to filter at source
  Difficult to collect
ETW Logs
Event Notifications
Query Store

USE methodology
http://www.brendangregg.com/usemethod.html
Utilization, Saturation and Errors
  Identify resources in the system
  For each resource, identify metrics that represent utilization (in use vs. idle) and saturation (queueing, blocking, waiting)
  Identify error indicators (events, logs etc.)
When investigating, iterate through resources
  Look at error indicators
  Look for saturation indicators
  Look for high utilization percentage
Generic methodology; the trick is identifying resources and collecting or finding the associated metrics
Can be applied at host level (CPU, IO, network) but also at SQL Server internals level
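The investigation loop described on this slide can be sketched in a few lines of Python. This is only an illustration of the checking order (errors, then saturation, then utilization); the resource names, sample values and the 0.8 utilization threshold are assumptions, not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class ResourceMetrics:
    name: str
    errors: int          # error events observed in the interval
    saturation: float    # queued or waiting work (0 means nothing is queued)
    utilization: float   # fraction of capacity in use, 0.0 to 1.0


def use_check(resources, utilization_threshold=0.8):
    """Walk every resource and report findings in USE order:
    errors first, then saturation, then high utilization."""
    findings = []
    for r in resources:
        if r.errors > 0:
            findings.append((r.name, "errors", r.errors))
        if r.saturation > 0:
            findings.append((r.name, "saturation", r.saturation))
        if r.utilization >= utilization_threshold:
            findings.append((r.name, "utilization", r.utilization))
    return findings


# Hypothetical sample values, for illustration only.
sample = [
    ResourceMetrics("cpu", errors=0, saturation=4.0, utilization=0.95),
    ResourceMetrics("disk", errors=2, saturation=0.0, utilization=0.30),
    ResourceMetrics("network", errors=0, saturation=0.0, utilization=0.10),
]
findings = use_check(sample)
```

The hard part, as the slide says, is not the loop but deciding which counters stand for utilization and saturation of each resource.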

SQL Server Query Execution http://rusanu

Performance Counters
Extremely cheap for a process to produce performance counters
  Increment a memory location in a shared memory area
Extremely cheap for monitoring to read a value
  Read the value via shared memory
Low impact
Rich toolset
  SDK: .NET, PDH native
  OS service for collecting them (Data Collector Sets)
  perfmon.exe, logman.exe, typeperf.exe
  Can write directly to SQL (and this is supported by Data Collector Sets)
PowerShell supports direct counter querying
  Get-Counter
  Data Collector Sets must go through the COM object Pla.DataCollectorSet
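As an illustration of how little processing counter output needs, here is a sketch that parses the PDH-CSV text that typeperf.exe prints (for example from `typeperf "\Processor(_Total)\% Processor Time" -si 1 -sc 2`). The sample text, the host name HOST and the function name are made up for the example; blank values are skipped defensively.

```python
import csv
import io

# Made-up sample in the PDH-CSV layout: first column is the timestamp,
# every other column is one counter path.
SAMPLE = r'''"(PDH-CSV 4.0)","\\HOST\Processor(_Total)\% Processor Time"
"09/24/2016 10:00:01.000","12.5"
"09/24/2016 10:00:02.000","37.5"
'''


def parse_pdh_csv(text):
    """Return {counter path: [float samples]} from PDH-CSV text."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header, data = rows[0], rows[1:]
    counters = header[1:]
    samples = {name: [] for name in counters}
    for row in data:
        for name, value in zip(counters, row[1:]):
            if value.strip():  # skip blank values
                samples[name].append(float(value))
    return samples


samples = parse_pdh_csv(SAMPLE)
cpu = samples[r"\\HOST\Processor(_Total)\% Processor Time"]
average_cpu = sum(cpu) / len(cpu)
```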

Performance Counters tools
perfmon.exe
  Interactive GUI for Data Collector Sets management
  Interactive GUI for on-demand counter collection
  Graphic visualization for both real-time and historic logs
logman.exe
  CLI for Data Collector Sets (counters, ETW, alerts)
Countless 3rd party tools
DMV sys.dm_os_performance_counters
  I advise against its use: expensive to query, difficult to read correctly
  Has the advantage of being available over TDS

Performance Counters SQL logging option
Read and write counter values into an ODBC-defined destination
  Define a 64-bit ODBC System DSN, specify the database name explicitly
A feature of PDH, available for all PDH consumers
  SDK: PdhOpenLog (…, PDH_LOG_TYPE_SQL, …)
  perfmon, logman, typeperf
On first connect it will deploy the SQL tables
  Schema is documented at https://msdn.microsoft.com/en-us/library/windows/desktop/aa373198(v=vs.85).aspx
CounterData table is a perfect columnstore candidate
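To show how the logged data is typically read back, the sketch below joins the sample table (CounterData) to the counter names (CounterDetails) on CounterID. It is only an approximation: it uses an in-memory SQLite database as a stand-in for SQL Server, keeps a small subset of the documented columns, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column subset of the PDH SQL log tables; SQLite stands in for SQL Server.
conn.executescript("""
CREATE TABLE CounterDetails (
    CounterID INTEGER PRIMARY KEY,
    MachineName TEXT, ObjectName TEXT, CounterName TEXT, InstanceName TEXT);
CREATE TABLE CounterData (
    GUID TEXT, CounterID INTEGER, RecordIndex INTEGER,
    CounterDateTime TEXT, CounterValue REAL);
""")
# Invented rows, for illustration only.
conn.executemany("INSERT INTO CounterDetails VALUES (?,?,?,?,?)", [
    (1, r"\\HOST", "Processor", "% Processor Time", "_Total"),
    (2, r"\\HOST", "Memory", "Available Bytes", None),
])
conn.executemany("INSERT INTO CounterData VALUES (?,?,?,?,?)", [
    ("run-1", 1, 1, "2016-09-24 10:00:00.000", 10.0),
    ("run-1", 1, 2, "2016-09-24 10:00:15.000", 30.0),
    ("run-1", 2, 1, "2016-09-24 10:00:00.000", 4.0e9),
])

# Resolve counter names and aggregate the samples per counter.
rows = conn.execute("""
SELECT d.ObjectName, d.CounterName, d.InstanceName,
       AVG(c.CounterValue) AS avg_value, COUNT(*) AS samples
FROM CounterData AS c
JOIN CounterDetails AS d ON d.CounterID = c.CounterID
GROUP BY d.ObjectName, d.CounterName, d.InstanceName
ORDER BY d.ObjectName, d.CounterName
""").fetchall()
```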

SQL Logging table schema

Interpreting the SQL logged data

Reading SQL logged counters using Perfmon

What to collect: CPU
Processor(_Total)\% Processor Time
  Processor(*) seldom justifies the overhead
Processor(_Total)\% Privileged Time
Process(*)\% Processor Time
  Can help track down when other processes starve the CPU
  Expensive to collect, each process is a separate instance
  Collecting Process(sqlservr)\% Processor Time and Process(_Total)\% Processor Time can at least shift the blame, but not pinpoint the culprit
Buffer Manager\Page lookups/sec
  Indicative of scans, it helps explain high CPU

What to collect: memory (OS)
Memory\Page Reads/sec
  These are hard page faults. Page Faults/sec also includes soft faults.
Memory\% Committed Bytes In Use
Memory\Available Bytes
Memory\Commit Limit
Process(sqlservr)\Private Bytes

What to collect: memory (SQL)
Memory Manager\*
  Really: collect every counter if you can afford it
  No magic formula for ‘good’ vs. ‘bad’ values, but they can be compared with a baseline
  Memory Grants Outstanding
  Memory Grants Pending
Buffer Manager\Buffer cache hit ratio
Buffer Node(*)\Page life expectancy

What to collect: IO (OS)
Process(sqlservr)\
  IO Read Operations/sec
  IO Read Bytes/sec
  IO Write Operations/sec
  IO Write Bytes/sec
  The counters measure all IO (disk, network, devices)
PhysicalDisk\
  What to capture and how to interpret it is a black art
  Windows Performance Monitor Disk Counters Explained: https://blogs.technet.microsoft.com/askcore/2012/03/16/windows-performance-monitor-disk-counters-explained/
  ‘Good’ vs. ‘bad’ values are highly dependent on hardware
Memory\Pages/sec

What to collect: IO (SQL)
Buffer Manager\
  Page reads/sec
  Readahead pages/sec
  Page writes/sec
  Background writer pages/sec
  Checkpoint pages/sec
  Extension page reads/sec
  Extension page writes/sec
  Lazy writes/sec
Databases(*)\
  Log Bytes Flushed/sec
  Backup/Restore Throughput/sec

Understanding how SQL Server executes a query
http://rusanu
TL;DR: [timeline figure: a request alternates between CPU and Wait segments along the Time axis]

What to collect: blocking (the aspiration slide)
sys.dm_os_wait_stats
sys.dm_os_latch_stats
sys.dm_os_spinlock_stats
TODO: fill up when decent monitoring possible
Ever-increasing values
  Require snapshot-store-and-compare
  Can be reset, which is difficult to detect in processing logic
No differentiation between idle wait and busy wait, resulting in tribal knowledge of ‘benign waits’
However, still important to collect… somehow
sys.dm_exec_session_wait_stats
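A minimal sketch of the snapshot-store-and-compare step for cumulative wait statistics, including a heuristic for the reset problem mentioned on this slide. The snapshots are assumed to be dictionaries built from the wait_type, waiting_tasks_count and wait_time_ms columns of sys.dm_os_wait_stats; the function name and sample numbers are hypothetical.

```python
def diff_wait_stats(previous, current):
    """Return per-wait-type deltas between two snapshots, or None when the
    counters appear to have been reset in between.
    Snapshots map wait_type -> (waiting_tasks_count, wait_time_ms)."""
    # Cumulative values never decrease on their own, so a decrease is taken
    # as evidence of a reset. A reset followed by heavy waiting can slip past.
    for wait_type, (count, ms) in current.items():
        if wait_type in previous:
            prev_count, prev_ms = previous[wait_type]
            if count < prev_count or ms < prev_ms:
                return None
    deltas = {}
    for wait_type, (count, ms) in current.items():
        prev_count, prev_ms = previous.get(wait_type, (0, 0))
        d_count, d_ms = count - prev_count, ms - prev_ms
        if d_count or d_ms:  # keep only wait types that changed
            deltas[wait_type] = (d_count, d_ms)
    return deltas


# Invented snapshots, for illustration only.
snap1 = {"PAGEIOLATCH_SH": (100, 5000), "WRITELOG": (50, 200)}
snap2 = {"PAGEIOLATCH_SH": (130, 6500), "WRITELOG": (50, 200)}
snap3 = {"PAGEIOLATCH_SH": (5, 40), "WRITELOG": (1, 3)}

interval = diff_wait_stats(snap1, snap2)
after_reset = diff_wait_stats(snap2, snap3)
```

When a reset is detected, the sketch discards the interval rather than guessing; a collector could instead treat the current snapshot as the delta since the reset.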

What to collect: blocking (the pragmatic slide)
Wait Statistics(*)\Average wait time (ms)
  A performance counter, easy to collect and analyze
  There is an instance per wait type, but only a few selected wait types are represented
Latches\Average Latch Wait Time (ms)
Latches\Total Latch Wait Time (ms)
Locks\Average Wait Time (ms)
Locks\Number of Deadlocks/sec
General Statistics\Processes blocked

What to collect: workload
SQL Statistics\Batch Requests/sec
SQL Statistics\SQL Attention rate
  ‘Attention’ is the TDS jargon for client command timeout
Transactions\Transactions
General Statistics\User Connections
SQL Errors(_Total)\Errors/sec

What to collect: miscellaneous
LogicalDisk(*)\% Free Space
Databases(*)\Data File(s) Size (KB)
Databases(*)\Log Growths
Plan Cache(*)\Cache Objects Count
General Statistics\Temp tables creation rate
Access Methods\
  Full scans/sec, Probe Scans/sec, Range Scans/sec, Index Searches/sec
  Page Splits/sec
  Skipped Ghosted Records/sec, Forwarded Records/sec

Measure workload response distribution
Batch Resp Statistics(*)\*
“One latency distribution plot is worth a thousand throughput measurements”
Typical SQL Server has a heterogeneous workload
  X batches complete in 0.01s, Y in 1s and Z in 100s
  Is it one query over a latency distribution? Or is it 3 very different queries?
Latency for specific queries is better measured in the app
  It is trivial to expose new counters from apps
  http://rusanu.com/2009/04/11/using-xslt-to-generate-performance-counters-code/
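A small sketch of why a distribution says more than an average. It buckets batch durations into a histogram, using the slide's example of batches completing in 0.01s, 1s and 100s; the bucket bounds and the 90/9/1 split are assumptions chosen for illustration.

```python
from bisect import bisect_right

# Bucket boundaries in milliseconds (illustrative). Bucket i holds durations d
# with BOUNDS_MS[i-1] <= d < BOUNDS_MS[i]; the last bucket is open-ended.
BOUNDS_MS = [1, 10, 100, 1000, 10000]


def latency_histogram(durations_ms, bounds=BOUNDS_MS):
    """Count how many durations fall into each bucket."""
    counts = [0] * (len(bounds) + 1)
    for d in durations_ms:
        counts[bisect_right(bounds, d)] += 1
    return counts


# 90 batches at 10 ms, 9 at 1 s, 1 at 100 s.
durations = [10] * 90 + [1000] * 9 + [100000] * 1
histogram = latency_histogram(durations)
mean_ms = sum(durations) / len(durations)
```

Here the mean is about 1.1 seconds, a latency that almost no batch actually experienced, while the histogram shows three separate populations.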

Collecting via querying
Collecting data by periodically querying catalog views/DMVs
  Expensive, some DMVs have serious performance overhead
  Snapshot-store-and-compare
  Many DMVs arbitrarily reset internally, unreliable for monitoring
Some information is still hard to discover any other way
  sys.dm_io_virtual_file_stats
  sys.dm_db_index_usage_stats
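As an example of what the compare step yields, the sketch below derives average read latency per file from two snapshots of sys.dm_io_virtual_file_stats, using its cumulative num_of_reads and io_stall_read_ms columns. The snapshot layout (a dictionary keyed by database_id and file_id), the function name and the numbers are assumptions.

```python
def read_latency_ms(previous, current):
    """Average read latency per file over the interval between snapshots.
    Snapshots map (database_id, file_id) -> (num_of_reads, io_stall_read_ms)."""
    latency = {}
    for key, (reads, stall_ms) in current.items():
        if key not in previous:
            continue  # file appeared between snapshots, no baseline yet
        prev_reads, prev_stall = previous[key]
        d_reads, d_stall = reads - prev_reads, stall_ms - prev_stall
        # Skip idle files and intervals where the counters went backwards.
        if d_reads > 0 and d_stall >= 0:
            latency[key] = d_stall / d_reads
    return latency


# Invented snapshots, for illustration only.
before = {(5, 1): (1000, 8000), (5, 2): (10, 50)}
after = {(5, 1): (1500, 14000), (5, 2): (10, 50)}
latency = read_latency_ms(before, after)
```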

Collecting query execution stats
Don’t. Use Query Store instead.
But if you must:
  sys.dm_exec_query_stats
  sys.dm_exec_procedure_stats
Avoid the join with sys.dm_exec_sql_text and sys.dm_exec_query_plan
  Very expensive
Honor query_hash and query_plan_hash
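One way to honor the hashes is to aggregate collected rows by query_hash and track the distinct plan hashes seen for each query, so that textually different but equivalent statements roll up together. The sketch assumes rows already fetched from sys.dm_exec_query_stats as dictionaries (total_worker_time is in microseconds); the hash values and function name are invented.

```python
from collections import defaultdict


def aggregate_by_query_hash(rows):
    """Sum executions and CPU time per query_hash, tracking distinct plans."""
    totals = defaultdict(
        lambda: {"executions": 0, "worker_time_us": 0, "plans": set()}
    )
    for row in rows:
        entry = totals[row["query_hash"]]
        entry["executions"] += row["execution_count"]
        entry["worker_time_us"] += row["total_worker_time"]
        entry["plans"].add(row["query_plan_hash"])
    return dict(totals)


# Invented rows, for illustration only.
rows = [
    {"query_hash": "0xA1", "query_plan_hash": "0x11",
     "execution_count": 10, "total_worker_time": 5000},
    {"query_hash": "0xA1", "query_plan_hash": "0x12",
     "execution_count": 5, "total_worker_time": 20000},
    {"query_hash": "0xB2", "query_plan_hash": "0x13",
     "execution_count": 1, "total_worker_time": 100},
]
summary = aggregate_by_query_hash(rows)
```

A query whose summary shows more than one plan hash is a candidate for plan-regression investigation.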

Using Event Notifications
-- queue that will hold the notification messages
create queue notifications;
go
-- service bound to the queue, using the event notification contract
create service notifications
    on queue notifications
    ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);
-- deliver every DDL event on the server to that service
create event notification [sqlsaturday]
    on server
    for ddl_events
    to service N'notifications', N'current database';
-- block until a message arrives, then return its XML payload
waitfor(receive cast(message_body as xml) from notifications);

DDL_EVENTS
Captures any DDL, in any database, for any type
  CREATE/ALTER/DROP
  GRANT/DENY/REVOKE
  sp_rename, sp_tableoptions etc.
Captures any configuration change
  sp_configure
  Database scoped options
It can be made more granular if desired
  Capture only a specific database
  Capture only specific events: select * from sys.server_events

Event Notifications for Profiler events
Bridges Profiler events as Event Notification messages
  select * from sys.trace_events
It can do everything an administrative trace can do, plus:
  Deliver the message remotely
  Trigger an activated procedure

Collect all warnings? File Growth? Deadlocks? Logins?
create event notification [trace_events]
    on server
    for Hash_Warning, Execution_Warnings, Sort_Warnings, Bitmap_Warning,
        Log_File_Auto_Grow, Deadlock_graph, Audit_Login_Failed, Audit_Login
    to service …

Collecting Event Notification
Events can be delivered remotely
All events look the same, an XML payload
  Must shred the XML and discover object types, object names and event types
  Some events apply to multiple objects, e.g. CREATE INDEX
Coupled with Activation can trigger warnings or mitigation
Reliable delivery, can survive disconnects
  Also means that events always refer to the past; automated mitigation must check current state before proceeding
No toolset whatsoever
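A sketch of the shredding step, using Python's standard XML parser on an EVENT_INSTANCE payload. The sample document is hand-written for illustration (server, login and object names are made up) and shows a CREATE INDEX event, which carries both the index and its target table; the function and dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample payload; real messages carry more elements.
SAMPLE_EVENT = """<EVENT_INSTANCE>
  <EventType>CREATE_INDEX</EventType>
  <PostTime>2016-09-24T10:15:00.000</PostTime>
  <SPID>55</SPID>
  <ServerName>HOST</ServerName>
  <LoginName>demo_login</LoginName>
  <DatabaseName>demo_db</DatabaseName>
  <SchemaName>dbo</SchemaName>
  <ObjectName>ix_demo</ObjectName>
  <ObjectType>INDEX</ObjectType>
  <TargetObjectName>demo_table</TargetObjectName>
  <TargetObjectType>TABLE</TargetObjectType>
  <TSQLCommand>
    <CommandText>create index ix_demo on dbo.demo_table (id)</CommandText>
  </TSQLCommand>
</EVENT_INSTANCE>"""


def shred_event(xml_text):
    """Extract the fields a collector usually needs from one event payload.
    Elements that are absent for a given event type come back as None."""
    root = ET.fromstring(xml_text)
    return {
        "event_type": root.findtext("EventType"),
        "database": root.findtext("DatabaseName"),
        "object": (root.findtext("ObjectType"), root.findtext("ObjectName")),
        "target": (root.findtext("TargetObjectType"),
                   root.findtext("TargetObjectName")),
        "command": (root.findtext("TSQLCommand/CommandText") or "").strip(),
    }


event = shred_event(SAMPLE_EVENT)
```

Because different event types carry different elements, the sketch tolerates missing ones instead of assuming a fixed schema.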

Query Data Store
Absolutely the best option for monitoring query performance
  Optimized, hooked deep into execution
  Cannot be simulated with DMV snapshotting
Excellent troubleshooting tool
  Runtime stats, compilation stats

Query Store missing features
Collection of wait info
Centralized aggregation of multiple sources
Collect info from readable secondaries

XEvents
Overwhelmingly rich information
Cheap to produce
Built-in analytical capabilities
  Counter, Histogram, Pair Matching
Difficult to collect
  Require ETL through a SQL Server
Poor toolset

ETW, Windows Performance Analyzer
Very powerful for analyzing the entire stack
A must when the issue is not SQL Server
Quick Start Guide: WPA Basics https://msdn.microsoft.com/en-us/library/ff190975.aspx
Bruce Dawson blog: https://randomascii.wordpress.com/

Sampling Profiling
Requires the Visual Studio toolset
  vsperf.exe
Captures execution stacks on all threads, about 10k/sec
Only as an on-demand troubleshooting option
  Difficult to set up
  Can have impact
Incredibly rich insight into what the server is doing
  Provided you manage to resolve the symbols…
Requires code understanding
  Educated guess of execution role from function names

Sqlservr sample profile

Thanks, Q&A and please review http://speakerscore.com/ZFJ9