Comprehensive Indexing via Automated Execution Plan Analysis (ExecStats) Joe Chang yahoo Slide deck here.

Presentation transcript:

Comprehensive Indexing via Automated Execution Plan Analysis (ExecStats) Joe Chang yahoo Slide deck here

About Joe
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL plan operations (2003?)
Database with distribution statistics only, no data (2004)
Decoding statblob-stats_stream: writing your own statistics
Disk IO cost structure
Tools for system monitoring, execution plan analysis
See ExecStats

What is not in this session
A list of rules to be followed blindly, without consideration for the underlying reason or whether the rule actually applies in the current circumstance
DBA skill: cause and effect analysis & assessment
– Not being an enthusiastic, prolific, indiscriminate collector of rules

Why this topic on Indexing?
Ideal – Good indexes, but no more than necessary
Reality – Too many nonclustered indexes
How did this happen?
– Poor choice of cluster key (my opinion)
– Indexes added one at a time for a specific query, instead of modifying a similar existing index
– DBA not inclined to remove indexes
Afraid to touch the cluster key
Indexes can be created or dropped for specific operations

Going from Current to Good Indexes
Determine the better choice of cluster key
– Can eliminate many nonclustered indexes by itself
Not hard to determine, but downtime required
Drop indexes not used (over a long period)
Consolidate indexes with leading keys in the same order
– Determine if indexes with the same keys in a different order can be consolidated
Need to find the SQL that uses each index? Simple
Does this involve work/risk? I don't like work/risk!
Execution plan links SQL to index usage
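The consolidation step can be sketched in a few lines of Python. This is only an illustration of the idea, not how ExecStats does it: the index names and key lists are hypothetical, and a real check would also compare include lists, filters and uniqueness before anything is dropped.

```python
from itertools import combinations

# Hypothetical nonclustered indexes on one table: name -> ordered key columns.
indexes = {
    "IX_A": ["CustomerId"],
    "IX_B": ["CustomerId", "OrderDate"],
    "IX_C": ["OrderDate", "CustomerId"],
    "IX_D": ["Status"],
}

def consolidation_candidates(indexes):
    """Return (prefix_pairs, reorder_pairs) of index names.

    prefix_pairs: (narrow, wide) where the narrow key list is a leading
    prefix of the wide one, so the wide index can usually serve both.
    reorder_pairs: same key columns in a different order; the execution
    plans that reference them must be checked before consolidating.
    """
    prefix_pairs, reorder_pairs = [], []
    for (n1, k1), (n2, k2) in combinations(sorted(indexes.items()), 2):
        short, wide = sorted([(n1, k1), (n2, k2)], key=lambda p: len(p[1]))
        if len(short[1]) < len(wide[1]) and wide[1][:len(short[1])] == short[1]:
            prefix_pairs.append((short[0], wide[0]))
        elif k1 != k2 and sorted(k1) == sorted(k2):
            reorder_pairs.append((n1, n2))
    return prefix_pairs, reorder_pairs

prefix_pairs, reorder_pairs = consolidation_candidates(indexes)
print("leading keys in same order:", prefix_pairs)
print("same keys, different order:", reorder_pairs)
```

The second list is the risky one: whether two such indexes can be merged depends on which SQL uses each, which is why the plan-to-index cross reference matters.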

Notes
Complexities & depth
SQL performance – Cause and Effect
Focus on the execution plan
– Inefficient plans
– Missing indexes
– Very large estimate/actual row discrepancies
Comprehensive Index Strategy
– Few good indexes, but no more than necessary

Preliminary: Correct Results
Normalization – Data stored once, avoid anomalies
Unique Keys – Avoid duplicate rows
Foreign Keys – Avoid orphaned rows
Incorrect architecture requires use of SELECT DISTINCT etc. to correct architecture deficiencies, which may cause performance problems as well
Correct action is to address the architecture mistakes before the performance issue.

Performance Big Picture
Natural keys with unique indexes, not SQL
The Execution Plan links all the elements of performance
Index tuning alone has limited value
Over indexing can cause problems as well
Index and Statistics maintenance policy
1 Logic may need more than one execution plan?
Compile cost versus execution cost?
Tables and SQL combined implement business logic
Plan cache bloat?
Diagram labels: SQL; Tables; keys & const; Indexes; Execution Plan; Statistics; Sampling & Re-compute; Compile parameters & variables; Storage Engine; Hardware; DOP; Memory; Parallel plans; Recompile temp table / table variable; Query Optimizer; Index & Stats Maintenance; API Server Cursors: open, prepare, execute, close?; SET NOCOUNT; Information messages; Row estimate propagation errors

Indexing Principles
Good cluster key choice
– Grouping + unique, not too wide
Good nonclustered indexes
– For key queries, not necessarily every query
– Covered indexes where practical
– Create and drop custom indexes for maintenance ops/special circumstances
No more indexes than necessary
– Update overhead
– Compile overhead
– May tolerate occasional scans to avoid update maintenance
Note emphasis on good, not perfect

Using DMVs – Execution Plan
dm_exec_query_stats
dm_exec_sql_text
dm_exec_query_plan
dm_exec_text_query_plan
dm_db_index_usage_stats
dm_db_index_operational_stats
dm_db_index_physical_stats
DBCC SHOW_STATISTICS
STATS_DATE (object_id, stats_id)
dm_db_stats_properties
Execution Plan: indexes, joins, compile parameters
System views: indexes, key columns, include list, filter, XML, columnstore etc.
sys.dm_db_stats_properties is available in SQL Server 2012 starting with Service Pack 1 and in SQL Server 2008 R2 starting with SP2. It returns last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter
dm_exec_query_profiles (2014): real time query progress?

Execution Plan maps SQL to Indexes
SQL: dm_exec_query_stats (sql_handle & plan_handle), dm_exec_sql_text
Execution Plan: dm_exec_query_plan, dm_exec_text_query_plan
Indexes: dm_db_index_usage_stats
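To make the mapping concrete, here is a minimal Python sketch that pulls index references out of showplan XML, the format returned by dm_exec_query_plan. The plan fragment is hand-written and heavily trimmed, and the table and index names are hypothetical; a real plan nests the operators under batch and statement elements and carries many more attributes. Only index read operators are handled here.

```python
import xml.etree.ElementTree as ET

# Namespace used by SQL Server showplan XML.
NS = "{http://schemas.microsoft.com/sqlserver/2004/07/showplan}"

# Trimmed, hand-written plan fragment (hypothetical table and indexes).
SAMPLE_PLAN = """
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp NodeId="0" PhysicalOp="Nested Loops" LogicalOp="Inner Join">
    <NestedLoops>
      <RelOp NodeId="1" PhysicalOp="Index Seek" LogicalOp="Index Seek">
        <IndexScan Ordered="true">
          <Object Schema="[dbo]" Table="[Orders]" Index="[IX_Orders_Date]" IndexKind="NonClustered" />
        </IndexScan>
      </RelOp>
      <RelOp NodeId="2" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek">
        <IndexScan Lookup="true" Ordered="true">
          <Object Schema="[dbo]" Table="[Orders]" Index="[PK_Orders]" IndexKind="Clustered" />
        </IndexScan>
      </RelOp>
    </NestedLoops>
  </RelOp>
</ShowPlanXML>
"""

def index_references(plan_xml):
    """List (table, index, access) for each index read operator in a plan."""
    refs = []
    root = ET.fromstring(plan_xml)
    for relop in root.iter(NS + "RelOp"):
        # findall() looks at direct children only, so nested operators
        # are attributed to their own RelOp, not to the parent join.
        for scan in relop.findall(NS + "IndexScan"):
            is_lookup = scan.get("Lookup", "false").lower() in ("true", "1")
            access = "Lookup" if is_lookup else relop.get("PhysicalOp")
            for obj in scan.findall(NS + "Object"):
                refs.append((obj.get("Table"), obj.get("Index"), access))
    return refs

refs = index_references(SAMPLE_PLAN)
for ref in refs:
    print(ref)
```

Running the same extraction over every cached plan, keyed by plan_handle, gives the SQL-to-index cross reference that the usage counters alone cannot provide.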

Performance Oriented Approach
Getting Top SQL from dm_exec_query_stats
– Manually examining top execution plans
Index Reduction – dm_db_index_usage_stats
– Drop unused indexes (based on long period)
– Consolidating indexes with similar keys
– Infrequently used indexes?
Must hunt down the SQL, possibly a low item in query stats
Can it use another index?
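A rough sketch of the index reduction triage, in Python. The column names user_seeks, user_scans, user_lookups and user_updates are those of sys.dm_db_index_usage_stats; the sample rows, index names and the threshold of 100 reads are made up for illustration. The counters reset when the SQL Server service restarts, which is why the assessment should be based on a long period.

```python
# Hypothetical rows shaped like sys.dm_db_index_usage_stats joined to index names.
usage_rows = [
    {"index": "IX_Orders_Date",   "user_seeks": 120_000, "user_scans": 4,
     "user_lookups": 0, "user_updates": 90_000},
    {"index": "IX_Orders_Status", "user_seeks": 12, "user_scans": 0,
     "user_lookups": 0, "user_updates": 90_000},
    {"index": "IX_Orders_Legacy", "user_seeks": 0, "user_scans": 0,
     "user_lookups": 0, "user_updates": 90_000},
]

def classify(row, infrequent_below=100):
    """Bucket an index by how often it was read (threshold is arbitrary)."""
    reads = row["user_seeks"] + row["user_scans"] + row["user_lookups"]
    if reads == 0:
        return "unused"          # candidate to drop: update cost only
    if reads < infrequent_below:
        return "infrequent"      # hunt down the SQL that uses it first
    return "used"

report = {row["index"]: classify(row) for row in usage_rows}
for name, bucket in report.items():
    print(f"{name}: {bucket}")
```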

Systematic Approach
Get full list of:
– Stored procedures: schema + name
– Scalar functions (FN): schema + name
– Inline & table valued functions (IF, TF)
Need parameter list
– Triggers (should be obsolete, my opinion)
Generate execution plan – match to indexes
– Alternative: maintain a list of SQL

Real World Example
July B rows, 3.4TB, 1.5TB data, 1.8TB indexes
Key tables have 21, 8, 14, 4 and 14 nonclustered indexes

Index Reduction
Key tables have 6, 4, 3, 2 and 4 nonclustered indexes (some nonclustered indexes compressed)
Dec B rows (30%), 2.3TB, 1.7TB data, 0.5TB indexes

Compression – base tables
May B rows, 0.99TB, 0.62TB data, 0.36TB indexes
Dec B rows, 2.3TB, 1.7TB data, 0.5TB indexes

Table – 21 nonclustered Indexes Note: All indexes are full rows, not filtered

Table with 21 NC Indexes Jul frequently used NC indexes of 21

Dec – 6 Nonclustered Indexes
Infrequently used indexes could probably be removed by re-working the query
Jul frequently used NC indexes of 21
Dec frequently used, 2 Filtered IX, note lead column

May 2015 – 2 NC used
May frequently used NC, 1 disabled pending removal
Jul frequently used NC indexes of 21
Dec frequently used, 2 Filtered IX, note lead column

Note on Filtered Index Strategy
Query is
SELECT xx FROM NARSplit WHERE IsActive = 1 AND CessionId IN (list)
Then the index strategy used is
CREATE INDEX IX ON table (CessionId, xx, IsActive) WHERE IsActive = 1

Balancing Select vs Update
In the above table, the nonclustered index with several included columns nearly eliminates key lookups
One column in the include list, IsActive, was frequently updated
Removing that column reduces the need for update maintenance
Since a key lookup is needed anyway, may as well remove all include columns

Compression Notes
Very high compression was achieved
– Because all keys were 16-byte GUIDs
– Even on dimensions, when a natural key would have been 1, 2 or 4 bytes!
Core data + indexes
– 648GB data, 230GB indexes w/o compression
– 174GB data, 185GB indexes w/compression
Reduction in I/O, even with SSD storage, far outweighs compression overhead!
System memory: 256GB (220)
Storage: Violin (NAND Flash)
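The sizes quoted above work out as follows. This is plain arithmetic on the slide's four figures; nothing else is assumed.

```python
def reduction_pct(before_gb, after_gb):
    """Percentage size reduction, rounded to one decimal place."""
    return round(100 * (1 - after_gb / before_gb), 1)

data_before, index_before = 648, 230   # GB without compression
data_after, index_after = 174, 185     # GB with compression

data_reduction = reduction_pct(data_before, data_after)
index_reduction = reduction_pct(index_before, index_after)
total_reduction = reduction_pct(data_before + index_before,
                                data_after + index_after)

print(f"data:    {data_reduction}% smaller")
print(f"indexes: {index_reduction}% smaller")
print(f"total:   {data_before + index_before}GB -> "
      f"{data_after + index_after}GB ({total_reduction}% smaller)")
```

Most of the gain is in the base tables; the indexes shrink far less.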

Violin Compression & statistics Index reduction HDD Storage GDC

Index Theory - Locality
Example Database
– 10B rows, 80 bytes per row, 800GB, 100M pages (100M x 8KB/page = 800GB, 100 rows per page)
Suppose 1% of rows are active, i.e. 100M rows
– There could be 1 active row in each page (100M pages)
Possible if each table were clustered on a row GUID
– Possible for active rows to be in only 1M pages
All rows in each of these pages happen to be active
Build the cluster key to tend toward the second option
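The two extremes can be checked with a short calculation. The third figure is an addition, not from the slide: it estimates the pages touched if active rows were scattered independently at random, which is roughly what a GUID cluster key produces.

```python
rows = 10_000_000_000      # 10B rows
rows_per_page = 100        # 80-byte rows in an ~8KB page
active_percent = 1         # 1% of rows are active

pages = rows // rows_per_page                  # 100M pages
active_rows = rows * active_percent // 100     # 100M active rows

# Worst case: every active row sits in a different page.
worst_case_pages = min(active_rows, pages)

# Best case: active rows are packed together (ceiling division).
best_case_pages = -(-active_rows // rows_per_page)

# Assumption: rows are active independently with probability p, so a
# page holds at least one active row with probability 1 - (1 - p)^100.
p = active_percent / 100
expected_random_pages = pages * (1 - (1 - p) ** rows_per_page)

print(f"pages in table:      {pages:,}")
print(f"worst case (spread): {worst_case_pages:,}")
print(f"best case (packed):  {best_case_pages:,}")
print(f"random placement:    about {expected_random_pages / 1e6:.0f}M pages")
```

Random placement lands at roughly 63M pages, much closer to the worst case than to the best, so the locality has to come from the cluster key.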

Index Key SARG + Group/Order
SELECT TransactionType, SUM(Amount) FROM Table WHERE ReportDate = 'value' AND other SARGS GROUP BY TransactionType
Index should lead with the key SARG, then Group or Order; less selective SARGs can be in the Include List
Query annotations: Selective SARG, Grouping

Index Key SARG + Order By
SELECT TransactionType, SUM(Amount) FROM Table WHERE ReportDate = 'value' AND other SARGS GROUP BY TransactionType
Index should lead with the key SARG, then Group or Order; less selective SARGs can be in the Include List
Query annotations: Selective SARG, Grouping
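The column-ordering rule can be written down as a small helper. This is a sketch of the rule as stated on the slides, not a general index advisor: the function name, the table name Transactions and the extra SARG column Region are hypothetical, and the caller decides which SARG counts as the selective one.

```python
def suggest_index(table, key_sargs, group_or_order, other_sargs=(), other_columns=()):
    """Build a CREATE INDEX statement following the slide's rule:
    key = selective SARG column(s), then GROUP BY / ORDER BY columns;
    include list = less selective SARGs plus other referenced columns."""
    key = list(key_sargs) + [c for c in group_or_order if c not in key_sargs]
    include = [c for c in list(other_sargs) + list(other_columns) if c not in key]
    include = list(dict.fromkeys(include))      # drop duplicates, keep order
    sql = f"CREATE INDEX IX_{table}_{'_'.join(key)} ON {table} ({', '.join(key)})"
    if include:
        sql += f" INCLUDE ({', '.join(include)})"
    return sql

statement = suggest_index(
    "Transactions",                       # hypothetical table name
    key_sargs=["ReportDate"],             # the selective SARG
    group_or_order=["TransactionType"],   # GROUP BY column
    other_sargs=["Region"],               # hypothetical less selective SARG
    other_columns=["Amount"],             # aggregated column, to cover the query
)
print(statement)
```

With ReportDate leading and TransactionType second, rows for one report date arrive already grouped, and the include list covers the query without widening the key.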

Index Example w & w/o Partitioning
Without partitioning
Required indexes leading with:
1) ReportDate - for grouping
2) RowId - for single row access
With partitioning
Partition as
CREATE UNIQUE CLUSTERED INDEX UCX ON Table (RowId, ReportDate) ON psdate (ReportDate)
Partition key is not the lead key of the cluster index
Search on RowId must check each partition
Oracle Skip-Scan would be nice

ExecStats

Database view

File IO view

Table view columns

Indexes - continued
Number of execution plans that reference the index in Seeks, Scans, Lookups, Insert/Updates and Deletes
Literal identifying the execution plans that reference the index in Seeks, Scans, Lookups, Insert/Updates and Deletes
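These two columns amount to a rollup of plan references per index and access type. A minimal Python sketch of such a rollup follows; the plan identifiers, index names and the comma-separated literal format are assumptions for illustration, not the actual ExecStats output.

```python
from collections import defaultdict

# Hypothetical (plan_id, index, access) references extracted from plans.
references = [
    ("P1", "IX_Orders_Date", "Seek"),
    ("P2", "IX_Orders_Date", "Seek"),
    ("P2", "PK_Orders", "Lookup"),
    ("P3", "PK_Orders", "Scan"),
    ("P3", "PK_Orders", "Scan"),   # same plan twice still counts as one plan
]

def rollup(references):
    """Map (index, access) -> (number of plans, literal listing those plans)."""
    plans = defaultdict(set)
    for plan_id, index, access in references:
        plans[(index, access)].add(plan_id)
    return {key: (len(ids), ",".join(sorted(ids))) for key, ids in plans.items()}

summary = rollup(references)
for (index, access), (count, literal) in sorted(summary.items()):
    print(f"{index:16} {access:7} plans={count} [{literal}]")
```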

Query Execution Stats - 1

Query Execution Stats - more

Dataspace – Partition Scheme view Partition View

Procedure and Functions Columns
Dbid, schema, object, object_id, type, create date, modify date
Number of references (NumRef)
(literal) plan reference (from QExec Stats)
Caller reference (Functions only)

Volumes

Slides not used

Performance Strategy
Tables – support business logic
– Normalization, uniqueness etc.
SQL – clear SARG, Query optimizer interpretable
– 1 Logic maps to X Execution plans
Indexes – good cluster key choice
– Good nonclustered indexes, no more than necessary
Statistics – sample strategy & update frequency
Compile parameter strategy
Temp table / Table variable strategy: Recompile & Row est. prop. error
Parallel execution plans: DOP and CTOP strategy
Identity key / alternative: large & small customers

Identify (weight) important SQL statements
– Stored procedure: parameter values & code path
Recompile impact for temp tables
Execution plan cross references SQL & indexes
– Actual plan is better than estimated plan
– Compile parameters & skewed statistics
Temp tables - Recompile impact
Automate Execution Plan analysis to fully cross-reference SQL to index usage

SQL & Execution Plan Sources
Estimated Execution Plan
– dm_exec_query_stats
Contents of plan cache + execution statistics
– List of stored procedures
SELECT name FROM sys.procedures
Any SQL list
– Plans not in cache, to be generated
– Can also execute SQL for actual plans

sys.dm_exec_query_stats
sql_handle – token for batch or stored procedure
statement_start_offset – sql_handle + offset = SQL statement
plan_handle – SQL (batch) can have multiple plans on recompile
query_hash – identify queries with similar logic, differing only by literal values
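How the offsets select one statement from the batch text can be shown with a short Python sketch. It assumes the documented behavior that statement_start_offset and statement_end_offset are byte positions into the UTF-16 batch text, and that an end offset of -1 means "to the end of the batch". The sample batch is hypothetical.

```python
def statement_text(batch_text, start_offset, end_offset):
    """Cut one statement out of a batch using byte offsets into UTF-16 text.

    end_offset points at the first byte of the last character, so the
    slice must extend two bytes past it; -1 means end of batch.
    """
    raw = batch_text.encode("utf-16-le")
    end = len(raw) if end_offset == -1 else end_offset + 2
    return raw[start_offset:end].decode("utf-16-le")

# Hypothetical two-statement batch as returned by dm_exec_sql_text.
batch = "SELECT 1; SELECT name FROM sys.objects;"
start = batch.index("SELECT name") * 2      # 2 bytes per character
end = (len(batch) - 1) * 2

second = statement_text(batch, start, end)
to_end = statement_text(batch, start, -1)
print(second)
```

Working on the encoded bytes instead of halving the offsets keeps the slice correct even when the text contains characters outside the Basic Multilingual Plane.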

sys.procedures
Get list of stored procedures in database
– Functions are called from procedure?
Generate estimated execution plan for each
– Default parameters
Full map of index usage to stored procedure
No trigger details in estimated plan

SQL List
Configuration file has SQL to retrieve SQL list
– Can be explicit SQL or stored procedures with parameters
– Same procedure, multiple parameter sets
To expose different code paths (actual plan)
EXEC proc RECOMPILE (estimated plan)

About ExecStats
General information
Execution plan sources
1. dm_exec_query_stats
2. List of all stored procedures (estimated)
3. List of SQL in table (estimated or actual plan)
4. Trace file
Correlates execution plans to index usage
Procedures, functions and triggers
Rollup file IO stats by DB, filegroup, disk/vol, data/log
Distribution Statistics
Output to Excel, sqlplan file, (sql in txt file)

ExecStats Output Files
Txt – runtime info
Log – abbreviated SQL error logs
Excel – Missing Indexes DMV
SQL plan directory
This can be sent to someone who can identify and fix your problem

Important Items
Query cost – plan efficiency? Recompiles?
– Compile parameters
– Skewed statistics
CPU versus Duration (worker – elapsed time)
– Disk IO, network transmission, parallel plan?
Execution count – network roundtrip?
Plan cost – Parallelism
– High volume of quick queries is bad, so is excessive DOP
Index – current rows, rows at time stats generated, sample rows & date
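The CPU versus duration comparison can be sketched as below. total_worker_time, total_elapsed_time and execution_count are columns of sys.dm_exec_query_stats, with both times reported in microseconds. The 1.2x and 2x thresholds and the three labels are arbitrary choices for illustration.

```python
def classify_query(total_worker_time, total_elapsed_time, execution_count):
    """Compare average CPU (worker) time with average duration (elapsed)."""
    cpu_us = total_worker_time / execution_count
    elapsed_us = total_elapsed_time / execution_count
    if cpu_us > 1.2 * elapsed_us:
        return "parallel"    # more CPU than wall clock: several threads ran
    if elapsed_us > 2.0 * cpu_us:
        return "waiting"     # disk IO, network transmission or blocking
    return "cpu_bound"       # serial plan spending most of its time on CPU

# Hypothetical totals for three cached queries.
print(classify_query(8_000_000, 2_500_000, 10))   # parallel
print(classify_query(40_000, 900_000, 100))       # waiting
print(classify_query(500_000, 520_000, 50))       # cpu_bound
```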

Execution Plans: estimate - actual
Actual: estimated cost, actual rows, DOP
– Compile parameters
– Actual rows/executions versus estimated
Execute stored procedure once for each possible code path
– With appropriate parameters

Execution Plans Analysis
Predicate
– Index key columns do not match full SARG
– SQL has function on SARG, data type mismatch
– Compile parameters & statistics
– Actual and Estimated rows/execution mismatch
– Large table scans: how many rows output?
– Rebinds and Rewinds
– Key lookup
– Parallelism
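The actual versus estimated rows check reduces to a ratio per operator. A minimal sketch follows, assuming the estimate is per execution while the actual row count is a total across executions, so the actual figure is divided by the execution count first. The factor of 10 is an arbitrary threshold, and the function names are made up.

```python
def row_estimate_ratio(estimated_rows_per_exec, actual_rows_total, executions):
    """Actual/estimated rows per execution; above 1 means an underestimate."""
    actual_per_exec = actual_rows_total / max(executions, 1)
    # Floor both sides at one row so empty results do not divide by zero.
    return max(actual_per_exec, 1.0) / max(estimated_rows_per_exec, 1.0)

def is_mismatch(ratio, factor=10.0):
    """Flag operators that are off by at least `factor` in either direction."""
    return ratio >= factor or ratio <= 1.0 / factor

# Hypothetical operators: (estimated rows/exec, actual rows total, executions)
operators = [(1, 250_000, 1), (100, 1_200, 10), (50_000, 40, 1)]
flags = [is_mismatch(row_estimate_ratio(*op)) for op in operators]
print(flags)
```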

Execution Plans
Pay attention to:
– Compile parameters
– Large table scans: how many rows output?
– Predicate search condition without suitable index
– Rebinds and Rewinds
– Key lookup
– Parallelism

Index Usage – missing IX, excess IX?
Index usage – seek, scan, lookup & update
– Unused indexes (infrequent code?) can be dropped
– Infrequent usage: check plan references
– Similar indexes (leading keys)
Same keys, different order
Check plan reference – consolidate if possible
Scans to large tables or even nonclustered IX
– Is it real? (SELECT TOP 1 may not be a real scan)
Lookups – can these be reduced?

SQL Server Skills & Roles
Developers: SQL code
Architect: table structure, unique keys
Data Architect: normalization
DBA: index + statistics maintenance
Hardware & Storage
Performance

SQL Server Performance History
Before DMVs (SQL Server 2000)
– Profiler/Trace to get top SQL
– Execution plans – not really exportable
– Which indexes are actually used?
Today
– Trace/Extended Events sometimes not necessary
If the dm_exec_query_stats content is good
– Execution plans are exportable
– Index Usage Stats

How much can be automated?
Data collection: all, of course
– Top resource consumers, etc.
Assessment: sometimes
– Is there a problem?
– Can it be fixed or improved?
Fix/Change: sometimes
– Indexes
– SQL – sometimes
– Table structure, architecture: no
If problems could be solved by pushing a button, what would be the skill requirements to be a DBA?
Great accomplishments – 99% perspiration, 1% inspiration

Performance Approaches
Check against list of “Best Practices”
Manual DMV scripts approach
– Find Top 5 or 10 SQL
– Fix it if/when there is a problem
All Indexes and procedures/SQL
– Examine the complete set of stored procedures
– Or the full list of SQL statements
– Good indexes for all SQL, no more indexes than

Why bother when there are no problems?
No problems for over 1 year
– Never bothered to collect performance baseline
Problem Today
– Find it with DMV, fix it
– The problem was xxx
– But why did it occur today & not before?
Probably statistics or compile parameters, but prove it?
Why ExecStats
– SQL scripts? Too much manual work
– Third party tools? Only find problem

Rigorous Optimization
Table structure, SQL, Client-side
Cluster Key
Good (nonclustered) Indexes
– All indexes are actually used
No more indexes than necessary
– Consolidate similar indexes
same keys, same order, or reverse order?
– What SQL is impacted?
Statistics update
Index maintenance
Must consider the full set of SQL/procedures in removing indexes?

SQL versus programming languages
SQL – great for data access
– Not good for everything else
– When SQL becomes horribly complicated
– What would the code look like in VB/Java/Cxx?
Client-side program C#

Performance Information
Server, Storage
OS & SQL Server Settings
SQL Server
– SQL, query execution statistics, execution plan
– Compile parameters
– Indexes and index usage statistics
– Statistics sampling – when? percentage? skew?