

Indexing
Joe Chang, Jchang6@yahoo.com, qdpma.com

About Joe
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL plan operations (2003?)
Database with distribution statistics only, no data (2004)
Decoding statblob/stats_stream: write your own stats
Disk IO cost structure
Tools for system monitoring, execution plan analysis

Freelance consultant since 1999, specializing in SQL Server performance. Reverse engineered the SQL Server query optimizer cost formulas (2001). Built a database with no data but with the data distribution statistics from a production system. Automated index and execution plan cross-reference analysis on www.qdpma.com (ExecStats).

Indexing is one of the foundations of databases and is taught in beginner-level books. Unfortunately, much of what is taught is not entirely correct or on a solid basis, so it is important to learn what is true. What is usually taught is that selectivity is most important. In fact, grouping is as important.

See http://www.qdpma.com/
Download: http://www.qdpma.com/ExecStatsZip.html
Blog: http://sqlblog.com/blogs/joe_chang/default.aspx

Indexing
Fundamental topic covered in most introductory SQL material
The usual rule: the index key must be highly selective, or it won't be used
But that is not entirely correct

TPC-C database schema
Examples are based on TPC-C tables
Warehouse: w_id
District: d_w_id, d_id (Warehouse to District 1:10)
Customer: c_w_id, c_d_id, c_id (District to Customer 1:3000)
Orders: o_w_id, o_d_id, o_c_id, o_id
History: h_c_w_id, h_date, h_c_d_id, h_c_id, h_amount
Order_line: ol_w_id, ol_d_id, ol_c_id, ol_o_id, ol_id (Orders to Order_line 1:10)

Nonclustered index details
CREATE CLUSTERED INDEX (Col1, Col2)
CREATE INDEX IX ON Table (Col3, Col4) INCLUDE (C5)
Explicit keys: Col3, Col4
Implicit keys: Col1, Col2
Full key: Col3, Col4, Col1, Col2
If one or more clustered index key columns are part of the explicit nonclustered index key, then the other cluster key columns are implicit.
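The key composition on this slide can be sketched as a small Python function. This is only a model of the rule as stated here (explicit nonclustered key columns first, followed by any cluster key columns not already present); `full_nonclustered_key` is a hypothetical helper for illustration, not a SQL Server API.

```python
def full_nonclustered_key(cluster_key, explicit_key):
    """Return (implicit_key, full_key) for a nonclustered index.

    Cluster key columns that are not already in the explicit key
    are appended, in cluster key order, as implicit key columns.
    """
    implicit = [c for c in cluster_key if c not in explicit_key]
    return implicit, list(explicit_key) + implicit


# The slide's example: cluster key (Col1, Col2), index key (Col3, Col4)
implicit, full = full_nonclustered_key(["Col1", "Col2"], ["Col3", "Col4"])
print(implicit)  # ['Col1', 'Col2']
print(full)      # ['Col3', 'Col4', 'Col1', 'Col2']

# When a cluster key column is already explicit, only the others are implicit
print(full_nonclustered_key(["Col1", "Col2"], ["Col2", "Col3"]))
# (['Col1'], ['Col2', 'Col3', 'Col1'])
```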

Index Seek Examples
Clustered index seek
Nonclustered index seek, no key lookup
Nonclustered index seek + key lookup for columns not in the nonclustered index
Table scan: when there is no suitable index (or forced with a hint)

Index Selectivity: Why?
Key lookup, 3000 rows: plan cost 9.767
Key lookup, 30000 rows: plan cost 91.17
Table scan, 136366 pages (1090MB): plan cost 102.7 (IO: 101)
Plan cost (IO portion) is approximately 1/320 per key lookup row (random) and 1/1350 per page in a table scan. (See Execution Plan Cost Formulas slide deck)
Ratio of key lookup row to table scan page is 4.2:1; with the CPU portion, 3.5:1

Plan Cost
Key Lookup: IO portion approximately 1/320 per row (random)
Table scan: 1/1350 per page
Ratio of key lookup row to table scan page is 4.2:1
IO + CPU portion: 3.5:1
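The arithmetic behind these ratios can be checked with a short Python sketch. It models only the IO portion using the two per-unit costs quoted on the slides; the function names are hypothetical, and real plan costs (such as 9.767 and 91.17) also include CPU and optimizer adjustments, so they will not match these raw figures exactly.

```python
KEY_LOOKUP_IO_PER_ROW = 1 / 320    # random IO, per key lookup row
TABLE_SCAN_IO_PER_PAGE = 1 / 1350  # sequential IO, per page scanned


def key_lookup_io_cost(rows):
    """IO portion of the plan cost for `rows` key lookups."""
    return rows * KEY_LOOKUP_IO_PER_ROW


def table_scan_io_cost(pages):
    """IO portion of the plan cost for scanning `pages` pages."""
    return pages * TABLE_SCAN_IO_PER_PAGE


def break_even_rows(pages):
    """Row count at which key lookup IO cost equals the table scan IO cost."""
    return table_scan_io_cost(pages) / KEY_LOOKUP_IO_PER_ROW


pages = 136366  # table size from the slide
print(round(KEY_LOOKUP_IO_PER_ROW / TABLE_SCAN_IO_PER_PAGE, 2))  # 4.22
print(round(table_scan_io_cost(pages), 1))                       # 101.0
print(round(key_lookup_io_cost(3000), 3))                        # 9.375
print(round(break_even_rows(pages)))                             # 32324
```

Under this model, a query expected to need more than roughly 32,000 key lookups against the 136366-page table costs more in IO than scanning the whole table, which is why selectivity matters to the optimizer's choice.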

Loop Joins: similar to key lookup
Customer2 clustered on c_id only (customers clustered on identity(-ish)): plan cost 91.10, 30000 rows
Customer3 clustered on warehouse, same as orders2: plan cost 13.53, 30000 rows

Index Important Points
Selectivity is important, but so is locality (grouping rows into common pages).
Locality is applicable when multiple tables have common grouping column(s).
It impacts the choice of primary key and/or cluster key.
Key Lookup (IO portion) costs are roughly 1/320 per row (with adjustments) in a large table, unless the query optimizer knows the rows are in a limited number of pages.

Big Picture
The Execution Plan links all the elements of performance.
Diagram elements: SQL, Tables, natural keys, Indexes, Execution Plan, Statistics & compile parameters, Compile, Row estimate propagation errors, Storage Engine, Hardware, DOP, Memory, Parallel plans, Recompile, temp table / table variable, Query Optimizer, Index & Stats Maintenance, API Server Cursors (open, prepare, execute, close?), SET NOCOUNT, Information messages.
Tables and SQL combined implement business logic.
Natural keys with unique indexes, not SQL.
Index and Statistics maintenance policy.
One logic may need more than one execution plan? Compile cost versus execution cost? Plan cache bloat?
Index tuning alone has limited value.
Over-indexing can cause problems as well.
The client app is also important.

Indexing Objectives
No such thing as perfect
Indexing is trade-offs: what is more important?
  Insert/Update/Delete performance
  Select performance (& compile overhead)
  Maintenance?
Also need to consider statistics update, compile parameters

Topics
Primary Key, Cluster Key
Nonclustered indexes
  Included columns
  Filtered index
Columnstore: not covered here, see slide decks by Jimmy May
Special, also not covered here: XML, Spatial, Hash (memory optimized tables)
Related: Partitioning
  Partition to distribute or concentrate

Identity, Primary Key, Cluster Key
These are three different things.
Primary Key: uniquely identifies a row/record
Identity/Row GUID: mechanism for generating a key
  Identity is useful, but should not always be used
  GUID: only use when there are absolutely no alternatives
  Consider a natural key for dimension tables
Cluster Key: physical organization of the table
  Nonclustered indexes implicitly incorporate the cluster key columns

Clustered Index
Identity or other sequentially increasing value
  Always inserted to the last page
  In theory, no fragmentation in the clustered index (or in a nonclustered index having such a key)
  B-tree will become unbalanced
Grouping
  Good for multi-row SELECT queries
  Gets fragmented with inserts

Common Grouping Option
Table A: a_id
Table B: cluster key a_id, b_id
Table C: a_id, b_id, c_id; cluster key a_id, b_id, c_id; unique nonclustered index on c_id
Table D: d_id (unique); cluster key c_id, d_id
If the cluster/primary key is the parent table key + a local key, does the local key need to be an identity?
Example: Orders and LineItem. The LineItem table key is OrderId + LineItem sequence.

Nonclustered Index
Key columns
Optional
  Include columns
  Filter condition
  WITH options
WITH options
  Row/page compression
  Fill factor
Wish list, would be nice if we could:
  Specify different fill factors for leaf and upper levels
  Rebuild only upper levels, or only leaf level

Index Write Overhead
Insert: write overhead always
Update: write overhead only when a modified column is part of the index
  Index row moves if a key column is updated
Delete: always
Takeaway: pay attention to insert/update/delete (IUD) frequency
  If updates are frequent, which columns?
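The update rule above can be sketched in Python as a rough classifier. This is an illustrative model only: the `Index` class, the index name `IX_history_date`, and the labels "row moves" and "in-place write" are assumptions made for the example, and the model ignores the implicit cluster key columns.

```python
from dataclasses import dataclass


@dataclass
class Index:
    name: str
    key: tuple
    include: tuple = ()


def update_impact(index, updated_columns):
    """Classify how an UPDATE of the given columns affects one index."""
    updated = set(updated_columns)
    if updated & set(index.key):
        return "row moves"       # a key column changed: the index row relocates
    if updated & set(index.include):
        return "in-place write"  # only an included column changed
    return "none"                # no modified column is part of this index


# Hypothetical index on the TPC-C history table
ix = Index("IX_history_date", key=("h_date",), include=("h_amount",))
print(update_impact(ix, ["h_date"]))    # row moves
print(update_impact(ix, ["h_amount"]))  # in-place write
print(update_impact(ix, ["h_c_id"]))    # none
```

Running every frequently updated column through a check like this for each index gives a quick picture of which indexes pay for which updates.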

Nonclustered Index Key Strategy
Query pattern:
  SELECT xxx
  FROM
  WHERE selective search arguments
  AND (not so) or non-selective SARGs
  (GROUP BY) xxx
  (ORDER BY) xxx
The index key should have the important selective SARGs and possibly either the GROUP BY or the ORDER BY columns.
Less important SARGs can be in the INCLUDE list.

Include List
Including all (selected) columns negates the need for a key lookup, a major cost in the execution plan for multi-row queries.
Considerations
  Fat include list -> almost another copy of the table?
  Update implications? Leave frequently updated columns out of the include list?
  More work when the updated column is in the key, less when it is in the include list
Options
  If a smaller include list can minimize the need for key lookups, this is good enough

Indexing Scenario
The query has a moderately selective equality SARG and several additional WHERE clause conditions that are not amenable to an index seek but cumulatively reduce rows.
Sometimes, row reduction occurs after a join.
Many columns are needed (impractical to include all).
Option
  Index key on the important equality SARG
  Other arguments in the INCLUDE list
  Rely on Key Lookup for the remaining columns

B-tree
[Diagram: root page, then IL 2 pages, then IL 3 pages]
Index depth: INDEXPROPERTY or sys.dm_db_index_physical_stats

Temporary Indexing
Permanent indexes for common operations
For maintenance or upgrade operations:
  Drop/disable indexes -> operation -> recreate
  Or create index -> operation -> drop

Partitioning
Can be used to concentrate active rows
  Example: date (year, month, day, etc.)
Can be used to distribute active rows over all partitions
  Example: guid, hash, etc.
Partitioning trick: partition key is not the clustered index lead key
  Example: cluster key OrderId, DateKey (partition on date)
  Query with OrderId only: index seek on all partitions
  Query on date only: scan a single partition
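The partitioning trick can be summarized with a deliberately simplified Python model. It assumes equality predicates and a single date value per query; `access_plan` is a hypothetical helper that mirrors the two cases on the slide, not a description of how SQL Server chooses plans.

```python
def access_plan(predicate_columns, num_partitions,
                cluster_key=("OrderId", "DateKey"),
                partition_column="DateKey"):
    """Return (method, partitions_touched) for an equality-predicate query.

    A predicate on the partition column limits the query to one partition.
    A predicate on the lead cluster key column allows an index seek;
    otherwise the partition(s) must be scanned.
    """
    cols = set(predicate_columns)
    partitions = 1 if partition_column in cols else num_partitions
    method = "seek" if cluster_key[0] in cols else "scan"
    return method, partitions


print(access_plan(["OrderId"], 36))             # ('seek', 36)
print(access_plan(["DateKey"], 36))             # ('scan', 1)
print(access_plan(["OrderId", "DateKey"], 36))  # ('seek', 1)
```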

Summary
Both selectivity and grouping/locality are important
  Affects Key Lookup -> alternative is table scan
Indexing is trade-offs, no one rule for all cases
  Consider insert/update/read & maintenance
Missing Indexes DMV is not intelligent advice!
Extreme high performance requires verification

Related
Statistics are recomputed at the first 6 rows modified, the first 500 rows, then every 20%.
Newer versions of SQL Server auto-recompute at a lower threshold (than 20%) for very large tables.
Default statistics sample is problematic with grouping.
What are the compile parameter values on the first execute after a statistics recompute?
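The thresholds can be compared numerically with a short Python sketch. The 500-row and 500 + 20% steps follow the slide. The square-root formula for newer versions is an assumption taken from Microsoft's documentation of SQL Server 2016 and later (database compatibility level 130), not from this deck, and the 6-row step is left out for simplicity. Both function names are hypothetical.

```python
import math


def legacy_threshold(rows):
    """Modifications needed to trigger an auto-update of statistics
    under the older rule: 500 for small tables, then 500 + 20% of rows."""
    if rows <= 500:
        return 500
    return 500 + 0.20 * rows


def dynamic_threshold(rows):
    """Lower threshold for large tables (assumed from Microsoft docs):
    min(500 + 20% * n, sqrt(1000 * n))."""
    if rows <= 500:
        return 500
    return min(legacy_threshold(rows), math.sqrt(1000 * rows))


for n in (1_000, 25_000, 1_000_000, 100_000_000):
    print(n, round(legacy_threshold(n)), round(dynamic_threshold(n)))
```

For a table of 100 million rows the older rule waits for about 20 million modifications, while the square-root rule triggers after roughly 316 thousand, which is the gap the slide refers to.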