Automating Performance … Joe Chang SolidQ



About Joe
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL execution plan operations (2003?)
Database with distribution statistics only, no data (2004?)
Decoding statblob/stats_stream – writing your own statistics
Disk IO cost structure
Tools for system monitoring, execution plan analysis etc

Overview
Why is performance still important today
Performance Tuning Elements
Automating Performance data collection & analysis
What can be automated
What still needs to be done by you!
SQL Server Engine
What every Developer/DBA needs to know

Performance – Past, Present and ?
Past – some day, servers will be so powerful that we don’t have to worry about performance (and that annoying consultant)
Today we have powerful servers – X overkill* cores, each 10X over Pentium II 400MHz
1TB memory (64 x 16GB DIMMs, $400 each)
Essentially unlimited IOPS, bandwidth 10+GB/s (Unless the SAN vendor configured your storage system)
What can go wrong?
* Except for VM

Ex 1 Parameter – column type mismatch
nvarchar(25) = N'Customer# '
SELECT * FROM CUSTOMER WHERE C_NAME
SELECT * FROM CUSTOMER WHERE C_NAME =
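A minimal sketch of this kind of mismatch, assuming the TPC-H CUSTOMER table stores C_NAME as varchar(25). The variable name @name and the customer value are hypothetical and not taken from the slide. Because nvarchar has higher data type precedence than varchar, the comparison converts the column side, which can turn an index seek into a scan or a less efficient range seek, depending on the column collation.

```sql
-- Assumption: C_NAME is varchar(25); @name and its value are illustrative only.
DECLARE @name nvarchar(25) = N'Customer#000000001';

-- Mismatch: the varchar column is implicitly converted to nvarchar.
SELECT * FROM CUSTOMER WHERE C_NAME = @name;

-- Match: convert the parameter once so the column is compared in its own type.
SELECT * FROM CUSTOMER WHERE C_NAME = CONVERT(varchar(25), @name);
```

Comparing the two execution plans (look for CONVERT_IMPLICIT on the column) shows whether the mismatch matters for a given collation.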

Example 2 – Multi-optional SARG
int = 1
SELECT * FROM LINEITEM
WHERE IS NULL OR L_ORDERKEY
AND IS NULL OR L_PARTKEY
AND IS NOT NULL IS NOT NULL)
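One common way around a multi-optional search argument is to branch so that each statement carries only the predicates that apply. The sketch below assumes two hypothetical optional filters, @OrderKey and @PartKey; the names are not from the slide.

```sql
-- Assumption: @OrderKey and @PartKey are hypothetical optional filters.
DECLARE @OrderKey int = 1, @PartKey int = NULL;

IF @OrderKey IS NOT NULL AND @PartKey IS NOT NULL
    SELECT * FROM LINEITEM
    WHERE L_ORDERKEY = @OrderKey AND L_PARTKEY = @PartKey;
ELSE IF @OrderKey IS NOT NULL
    SELECT * FROM LINEITEM WHERE L_ORDERKEY = @OrderKey;
ELSE IF @PartKey IS NOT NULL
    SELECT * FROM LINEITEM WHERE L_PARTKEY = @PartKey;
-- When both filters are NULL no query runs, so an unfiltered scan is never issued.
```

Each branch gets its own plan with a plain equality predicate. Keeping the single catch-all statement and adding OPTION (RECOMPILE) is another possibility, at the price of a compile on every execution.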

Example 3 – Function on column, SARG
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN ' ' AND ' '

int = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE AND
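A sketch of a parameterized form that keeps the column bare, so an index on L_SHIPDATE can be used for a range seek. It assumes L_SHIPDATE is a date column; the variable names @Year, @Month and @Start are hypothetical.

```sql
-- Assumption: L_SHIPDATE is of type date; variable names are illustrative only.
DECLARE @Year int = 1995, @Month int = 1;

-- First day of the requested month, computed once from the parameters.
DECLARE @Start date =
    DATEADD(MONTH, (@Year - 1900) * 12 + (@Month - 1), CONVERT(date, '19000101'));

SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE L_SHIPDATE >= @Start
  AND L_SHIPDATE <  DATEADD(MONTH, 1, @Start);  -- half-open range, no function on the column
```

The functions are applied to the variables only, so the predicate on L_SHIPDATE stays a plain range comparison.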

Example 4 – Parameter sniffing
-- first call, procedure compiles with these parameters
exec = = ' '
-- subsequent calls, procedure executes with original plan
exec = = ' '
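A sketch of the sniffing pattern and one mitigation. The procedure name dbo.p_ShipSummary, its parameters and the date values are all hypothetical; the slide's own procedure and values did not survive extraction.

```sql
-- Assumption: hypothetical procedure over the TPC-H LINEITEM table.
CREATE PROCEDURE dbo.p_ShipSummary
    @StartDate date,
    @EndDate   date
AS
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE L_SHIPDATE >= @StartDate AND L_SHIPDATE < @EndDate
OPTION (RECOMPILE);  -- compile for the current values on every execution
-- Alternative: OPTION (OPTIMIZE FOR UNKNOWN) plans for average density instead.
GO

-- Narrow range: without the hint, this first call would fix the cached plan.
EXEC dbo.p_ShipSummary @StartDate = '19950101', @EndDate = '19950102';

-- Wide range: without the hint, this call would reuse the plan built for one day.
EXEC dbo.p_ShipSummary @StartDate = '19920101', @EndDate = '19990101';
```

RECOMPILE trades compile CPU for a plan that fits each call, so it suits infrequent, expensive queries better than high-volume ones.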

Summary of serious problems
Parameter mismatch – parameter type over column
SQL search argument cannot be identified/optimized
Search argument: function (column)
Compile parameter & parameter range etc
Impact is easily X or more

Performance Data
Query Execution Statistics
Index Usage Statistics (Op stats, missing indexes)
Execution plans including compile parameters

Performance DMVs and DMFs
From SQL Server 2005 on
dm_exec_query_stats & related
dm_exec_sql_text, dm_exec_text_query_plan & related (XML output)
dm_db_index_usage_stats & related

Query Execution Statistics
dm_exec_query_stats
Execution count, CPU, duration, Phy reads, Log Wr, Min/Max
Potentially 1M+ rows
Sorting can be expensive
Far fewer entries with total_worker_time > 1000 micro-sec
Find top SQL
Get execution plan, then work on it
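A sketch of a top-SQL query over these DMVs. Filtering on total_worker_time before sorting follows the point above about keeping the sort cheap; the TOP (20) cutoff and the column aliases are arbitrary choices for illustration.

```sql
-- Top statements by CPU, with statement text and plan (plan is XML returned as text).
SELECT TOP (20)
       qs.execution_count,
       qs.total_worker_time,      -- CPU, microseconds
       qs.total_elapsed_time,     -- duration, microseconds
       qs.total_physical_reads,
       qs.total_logical_writes,
       qs.min_worker_time,
       qs.max_worker_time,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle,
                                        qs.statement_start_offset,
                                        qs.statement_end_offset) AS qp
WHERE qs.total_worker_time > 1000   -- drop trivial entries before the sort
ORDER BY qs.total_worker_time DESC;
```

The counters cover only plans still in cache, so totals reset when a plan is evicted or the instance restarts.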

Index DMVs
Index Usage Stats
  Index level, usage stats but no waits
Index Operational Stats
  Index & Partition level + wait stats
Index Physical Stats
  Useful? But full index rebuilds can be quicker
Missing Index
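A sketch of reading index usage for the current database. The LEFT JOIN keeps indexes that have no usage row at all, which are the candidates for the "unused" discussion; the ordering is an arbitrary choice.

```sql
-- Usage counters per index in the current database; NULLs mean no recorded use.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, i.index_id;
```

These counters are cleared on instance restart, so an index that looks unused may simply not have been needed since the last restart.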

Execution Plans - XML
Compile cost – cpu, time, memory
Indexes used, tables scanned
Seek predicates
Predicates
Compile parameter values
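A sketch of pulling compile parameter values out of cached plan XML with XQuery. It relies on the ParameterList/ColumnReference elements of the showplan schema; the output column aliases are illustrative.

```sql
-- Compile-time parameter values for every cached plan that has a parameter list.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT cp.plan_handle,
       cp.usecounts,
       pr.n.value('@Column', 'nvarchar(128)')                  AS parameter_name,
       pr.n.value('@ParameterCompiledValue', 'nvarchar(4000)') AS compiled_value
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS pr(n);
```

On a large plan cache this shreds a lot of XML, so it is better run against a filtered set of plan handles than against the whole cache.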

Full Execution Plan Analysis
Analyze execution plans for (almost) entire query stats
Or all stored procedures
Index used by SQL
What is implication of changing cluster key
Consolidate infrequently used indexes

Other Performance Data options
Generate estimated execution plans for all stored procedures
Functions
Triggers?
Maintain a list of SQL to be executed with actual execution plans
Actual versus estimated row count, number of executions
Actual CPU & duration
Parallelism – distribution of rows
Triggers etc

Simple Performance Tuning
Find top SQL
  Profiler/Trace
  Query Execution Stats – sys.dm_exec_query_stats
  Currently running SQL – sys.dm_exec_requests etc
Get SQL & Execution plan (DMF)
Rewrite SQL or re-index
Index usage statistics
  Consolidate indexes with same leading keys
  Drop unused indexes?
Index and Statistics maintenance
No automation required
Blindly applying indexes from missing IX DMV not recommended
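A sketch of finding consolidation candidates: pairs of indexes on the same table that share the same leading key column. It only lists candidates; whether two indexes can really be merged still depends on the remaining keys, included columns and the queries that use them.

```sql
-- Pairs of indexes on one table whose first key column is the same.
SELECT OBJECT_NAME(i1.object_id) AS table_name,
       c.name                    AS leading_key_column,
       i1.name                   AS index_1,
       i2.name                   AS index_2
FROM sys.indexes AS i1
JOIN sys.index_columns AS ic1
     ON ic1.object_id = i1.object_id
    AND ic1.index_id  = i1.index_id
    AND ic1.key_ordinal = 1
JOIN sys.indexes AS i2
     ON i2.object_id = i1.object_id
    AND i2.index_id  > i1.index_id       -- report each pair once
JOIN sys.index_columns AS ic2
     ON ic2.object_id = i2.object_id
    AND ic2.index_id  = i2.index_id
    AND ic2.key_ordinal = 1
    AND ic2.column_id   = ic1.column_id
JOIN sys.columns AS c
     ON c.object_id = ic1.object_id
    AND c.column_id = ic1.column_id
WHERE OBJECTPROPERTY(i1.object_id, 'IsUserTable') = 1
ORDER BY table_name, leading_key_column;
```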

Advanced Performance
What is minimum set of good indexes?
Can 2 Indexes with keys 1) ColA, ColB and 2) ColB, ColA be consolidated?
Infrequently used indexes – is it just for off-hours query?
What procedures/SQL uses each index?
What

Performance Problem Classification
Always bad
Performance slowly degrades over time
  Probably related to fragmentation or unreclaimed space
  Best test is if index rebuild significantly reduces space
  Could be execution plan with scan, and size is growing
Sudden change: good to bad, bad to good
  Probably compile parameter values or statistics

Maintaining Performance
Compile parameters
Data distribution statistics
  update periodicity
  Sample size
Indexes
  Dead space bloat
  Fragmentation less important?
Natural changes in data size & distribution

Performance Information
Query Execution Stats
Index Usage Stats
Execution Plans

What else can go wrong in a big way
Statistics – sampling percentage, update policy
ETL may need statistics updated at key steps
AND/OR combinations
EXISTS/NOT EXISTS combinations
Complex SQL, sub-expressions
Row count estimation propagation errors

Statistics
Range-high key, equal rows, Range rows, Avg RR
Sampling – random pages, all rows
Sampling percentage for reasonable accuracy based on true random row sample
Correlation between value and page?
Updates triggered at 6, 500, and every 20% modified
Range and boundary
What if compile parameter is outside boundary when stats were updated?
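A sketch of inspecting and refreshing statistics by hand. The table dbo.LINEITEM is assumed to exist as in the earlier examples, and the index name IX_LINEITEM_SHIPDATE is hypothetical.

```sql
-- Assumption: IX_LINEITEM_SHIPDATE is a hypothetical index on dbo.LINEITEM.

-- When was each statistics object on the table last updated?
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.LINEITEM');

-- Histogram: RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS.
-- The last RANGE_HI_KEY is the upper boundary known at the time of the update.
DBCC SHOW_STATISTICS ('dbo.LINEITEM', IX_LINEITEM_SHIPDATE) WITH HISTOGRAM;

-- Rebuild the statistics from every row instead of a page sample.
UPDATE STATISTICS dbo.LINEITEM IX_LINEITEM_SHIPDATE WITH FULLSCAN;
```

Comparing the highest RANGE_HI_KEY with the values queries actually pass shows whether compile parameters are falling outside the histogram boundary.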

Consider custom strategy for ETL, etc Seriously bad execution plan

OR condition on different tables
SELECT O_CUSTKEY, O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = OR O_CUSTKEY =

OR versus UNION
SELECT O_CUSTKEY, O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY =
UNION -- ALL
SELECT O_CUSTKEY, O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE O_CUSTKEY =
Above UNION SQL requires sort operation – cheap for few rows or narrow columns

Complex SQL with sub-expressions
Compile cost – number of indexes, join types, join orders etc
Propagating row estimation errors
Splitting with temp table
  Overhead of create table, insert
  Reduced compile cost
  Statistics recomputed for temp tables at 6 and 500 rows, and 20%
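A sketch of splitting a query with a temp table, using the TPC-H tables from the earlier examples. The variable @PartKey, its value and the temp table name #li are hypothetical.

```sql
-- Assumption: @PartKey and #li are illustrative names only.
DECLARE @PartKey int = 1;

-- Step 1: materialize the selective part; the row count is now known, not estimated.
SELECT L_ORDERKEY, L_SHIPDATE, L_QUANTITY, L_PARTKEY
INTO #li
FROM LINEITEM
WHERE L_PARTKEY = @PartKey;

-- Step 2: the join compiles against the temp table and its own statistics.
SELECT O_CUSTKEY, O_ORDERDATE, O_ORDERKEY, t.L_SHIPDATE, t.L_QUANTITY, t.L_PARTKEY
FROM #li AS t
INNER JOIN ORDERS ON O_ORDERKEY = t.L_ORDERKEY;

DROP TABLE #li;
```

The split costs a table create and an insert, so it pays off mainly when the estimate error in the single-statement form leads to a clearly wrong plan.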

Parallel Execution Strategy
sys.configurations (sp_configure) defaults
  Cost threshold for parallelism 5
  Max degree of parallelism 0 (unlimited)
Problem – overhead for starting threads not considered
4 sockets, 10 cores each + HT => DOP 80 is possible
Option
  Cost Threshold to MaxDOP to 4 (for default queries)
  Explicit OPTION (MAXDOP n) for known big queries
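A sketch of applying such a strategy with sp_configure. The MaxDOP value of 4 comes from the slide; the cost threshold value did not survive extraction, so the 50 below is only a placeholder, and the MAXDOP 16 hint is likewise an arbitrary illustration.

```sql
-- Both settings are advanced options.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Placeholder value: the slide's intended threshold is not known.
EXEC sp_configure 'cost threshold for parallelism', 50;

-- Server-wide default for ordinary queries, as suggested on the slide.
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

-- A known big query can ask for more than the server default (16 is illustrative).
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
OPTION (MAXDOP 16);
```

The current values can be checked in sys.configurations before and after the change.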

Summary
Performance is still important
Automating performance data collection is easy
Why an execution plan may change with serious consequences
Available tools cannot automate diagnosis of performance problems
This could be done?
Full SQL – index usage cross-reference
Optimized index set