Statistics That Need Special Attention Joe Chang yahoo

About Joe
- SQL Server consultant since 1999
- Query Optimizer execution plan cost formulas (2002)
- True cost structure of SQL plan operations (2003?)
- Database with distribution statistics only, no data (2004)
- Decoding statblob/stats_stream – writing your own statistics
- Disk IO cost structure
- Tools for system monitoring and execution plan analysis
See Download: Blog:

Statistics – Special Attention
- What works automatically? – no need for special attention
- What does not work by default – or with simple adjustments, trace flags, etc.
- What could cause spectacular failures
- Ok, now what do we do? – scheduled jobs, triggered jobs, in-procedure, optimize for "fake data"
- Other?

Topics
- Quick statistics overview – details available elsewhere
- Weak points of SQL Server statistics
  - Recompute set points
  - Sampling
  - Compile parameter, etc.
- Options – trace flags 2371, 4136
- Workarounds

SQL Performance
- Natural keys with unique indexes; tables and SQL combined implement the business logic
- The execution plan links all the elements of performance
- Index tuning alone has limited value; over-indexing can cause problems as well
- Index and statistics maintenance policy
- One logic path may need more than one execution plan? Compile cost versus execution cost?
- Plan cache bloat?
(Diagram elements: SQL, Tables, natural keys, Indexes, Execution Plan, Statistics & compile parameters, Compile, row estimate propagation errors, Storage Engine, Hardware, DOP, memory, parallel plans, recompile, temp table / table variable, Query Optimizer, index & stats maintenance, API server cursors: open, prepare, execute, close?, SET NOCOUNT, information messages)

Factors to Consider
SQL, Tables, Indexes, Query Optimizer, Statistics, Compile Parameters, Storage Engine, Hardware, DOP, memory

STATISTICS

SQL Server Statistics
Principles: it should just work
- Statistics are automatically created and updated, for indexes and columns
Reference: Statistics Used by the Query Optimizer in Microsoft SQL Server, Eric N. Hanson and Yavor Angelov, contributor Lubor Kollar

DBCC SHOW_STATISTICS
DBCC SHOW_STATISTICS('LINEITEM', L_SHIPDATE_CLUIDX)
Returns three result sets: header, density vector, histogram
Options: WITH STAT_HEADER, DENSITY_VECTOR, HISTOGRAM, STATS_STREAM (binary)
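As a sketch, each result set can also be requested individually (table and index names are the TPC-H objects used throughout this deck):

```sql
-- Header only: rows, rows sampled, steps, density, last updated
DBCC SHOW_STATISTICS ('LINEITEM', L_SHIPDATE_CLUIDX) WITH STAT_HEADER;

-- Density vector: all-density for each leading-column prefix of the key
DBCC SHOW_STATISTICS ('LINEITEM', L_SHIPDATE_CLUIDX) WITH DENSITY_VECTOR;

-- Histogram: up to 200 steps on the leading key column
-- (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS)
DBCC SHOW_STATISTICS ('LINEITEM', L_SHIPDATE_CLUIDX) WITH HISTOGRAM;
```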

Density Vector
- The statistics binary storage structure allows for 30 rows in the density vector
- The full nonclustered index key consists of the explicitly declared columns plus the cluster key columns (those not already in the NC key)
- Limit of 15 columns in a clustered index key, and 15 columns in a nonclustered index key

Histogram
- Up to 200 steps
- Captures low and high bounds
- Attempts to capture skewed distributions – frequently occurring values get their own step (equal rows); the remaining values between steps are aggregated (range rows)

UPDATE STATISTICS – STATS_STREAM
UPDATE STATISTICS LINEITEM (L_SHIPDATE_CLUIDX)
WITH STATS_STREAM = 0x A A128038F3..., ROWCOUNT = , PAGECOUNT =
DBCC SHOW_STATISTICS('LINEITEM', L_SHIPDATE_CLUIDX) WITH STATS_STREAM

Statistics Structure
- Stored (mostly) in a binary field
- Scalar values
- Density vector – limit 30 rows: half for the NC key columns, half for the cluster key
- Histogram – up to 200 steps
Consider not blindly using IDENTITY on critical tables. Example: large customers get low ID values, small customers get high ID values.

Statistics
- Auto re-compute at 20% (thresholds at the first 6 rows, then 500 rows)
- Sampling strategy
  - How much to sample – theory?
  - Random pages versus random rows
  - Histogram equal and range rows
  - Out of bounds, value does not exist, etc.
References:
Statistics Used by the Query Optimizer in SQL Server 2008, Eric N. Hanson and Yavor Angelov, contributor: Lubor Kollar
Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator, Joseph Sack
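A minimal sketch of checking which statistics have crossed the classic recompute threshold (500 modifications plus 20% of table cardinality); assumes SQL Server 2008 R2 SP2 or later for sys.dm_db_stats_properties:

```sql
-- Statistics whose modification counter exceeds the classic
-- (pre-2371, pre-2016) auto-update threshold
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       p.rows, p.modification_counter,
       500 + 0.20 * p.rows AS classic_threshold
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) p
WHERE p.modification_counter > 500 + 0.20 * p.rows
ORDER BY p.modification_counter DESC;
```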

System tables, views, functions ;WITH k AS ( SELECT k.object_id, k.stats_id, k.stats_column_id, c.column_id, c.name FROM sys.stats_columns k INNER JOIN sys.columns c ON c.object_id = k.object_id AND c.column_id = k.column_id ) SELECT s.name, o.name, d.name, d.stats_id, d.auto_created, ISNULL(STUFF(( SELECT ', ' + name FROM k WHERE k.object_id = d.object_id AND k.stats_id = d.stats_id ORDER BY k.stats_column_id FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'),1,1,''),'') as Keys, p.rows, p.rows_sampled, p.steps, p.modification_counter, p.last_updated FROM sys.objects o JOIN sys.schemas s ON s.schema_id = o.schema_id JOIN sys.stats d ON d.object_id = o.object_id OUTER APPLY sys.dm_db_stats_properties(d.object_id, d.stats_id) p WHERE o.is_ms_shipped = 0 ORDER BY s.name, o.name, d.stats_id sys.stats sys.stats_columns sys.dm_db_stats_properties (2008R2 sp2, 2012 sp1, 2014 RTM)

;WITH b AS (
  SELECT d.object_id, d.index_id, part = COUNT(*),
    reserved = 8*SUM(d.reserved_page_count), used = 8*SUM(d.used_page_count),
    in_row_data = 8*SUM(d.in_row_data_page_count), lob_used = 8*SUM(d.lob_used_page_count),
    overflow = 8*SUM(d.row_overflow_used_page_count), row_count = SUM(row_count),
    notcompressed = SUM(CASE data_compression WHEN 0 THEN 1 ELSE 0 END),
    compressed = SUM(CASE data_compression WHEN 0 THEN 0 ELSE 1 END) -- change to 0 for SQL Server 2005
  FROM sys.dm_db_partition_stats d WITH(NOLOCK)
  INNER JOIN sys.partitions r WITH(NOLOCK) ON r.partition_id = d.partition_id
  GROUP BY d.object_id, d.index_id
), j AS (
  SELECT j.object_id, j.index_id, j.key_ordinal, c.column_id, c.name, is_descending_key
  FROM sys.index_columns j
  INNER JOIN sys.columns c ON c.object_id = j.object_id AND c.column_id = j.column_id
)
SELECT t.name, o.name, ISNULL(i.name, '') [index],
  ISNULL(STUFF(( SELECT ', ' + name + CASE is_descending_key WHEN 1 THEN '-' ELSE '' END
    FROM j WHERE j.object_id = i.object_id AND j.index_id = i.index_id AND j.key_ordinal > 0
    ORDER BY j.key_ordinal
    FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'), 1, 1, ''), '') AS Keys,
  ISNULL(STUFF(( SELECT ', ' + name
    FROM j WHERE j.object_id = i.object_id AND j.index_id = i.index_id AND j.key_ordinal = 0
    ORDER BY j.column_id
    FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'), 1, 1, ''), '') AS Incl,
  i.index_id,
  CASE WHEN i.is_primary_key = 1 THEN 'PK' WHEN i.is_unique_constraint = 1 THEN 'UC'
    WHEN i.is_unique = 1 THEN 'U' WHEN i.type = 0 THEN 'heap' WHEN i.type = 3 THEN 'X'
    WHEN i.type = 4 THEN 'S' ELSE CONVERT(char, i.type) END typ,
  i.data_space_id dsi, b.row_count, b.in_row_data in_row, b.overflow ovf, b.lob_used lob,
  b.reserved - b.in_row_data - b.overflow - b.lob_used unu,
  'ABR' = CASE row_count WHEN 0 THEN 0 ELSE 1024*used/row_count END,
  y.user_seeks, y.user_scans u_scan, y.user_lookups u_look, y.user_updates u_upd,
  b.notcompressed ncm, b.compressed cmp, rw_delta = b.row_count - s.rows,
  s.rows_sampled, -- s.unfiltered_rows,
  s.modification_counter mod_ctr, s.steps, CONVERT(varchar, s.last_updated, 120) updated,
  i.is_disabled dis, i.is_hypothetical hyp, ISNULL(i.filter_definition, '') filt
FROM sys.objects o
JOIN sys.schemas t ON t.schema_id = o.schema_id
JOIN sys.indexes i ON i.object_id = o.object_id
LEFT JOIN b ON b.object_id = i.object_id AND b.index_id = i.index_id
LEFT JOIN sys.dm_db_index_usage_stats y ON y.object_id = i.object_id AND y.index_id = i.index_id AND y.database_id = DB_ID()
OUTER APPLY sys.dm_db_stats_properties(i.object_id, i.index_id) s
WHERE o.is_ms_shipped = 0

Statistics Auto/Re-Compute
- Automatically generated on query compile
- Recompute at 6 rows, 500 rows, then every 20% – has this changed?
- 2008 R2: trace flag 2371 – lower threshold for auto recompute on large tables
Reference: Understanding When Statistics Will Automatically Update
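A sketch of enabling the lower dynamic threshold instance-wide (on SQL Server 2016 and later this behavior is the default under compatibility level 130, so the flag is mainly for 2008 R2 SP1 through 2014):

```sql
-- Enable trace flag 2371 globally; also add -T2371 as a startup
-- parameter so the setting survives a service restart
DBCC TRACEON (2371, -1);

-- Verify the flag is active
DBCC TRACESTATUS (2371);
```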

Statistics Sampling
- Sampling theory
  - True random sample
  - Sample error scales with the square root of N; relative error as 1/√N
- SQL Server sampling
  - Random pages (but always the first and last page???)
  - All rows in the selected pages
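The square-root relationship above can be stated more precisely; this is standard sampling theory (not from the slides), assuming a true random sample of n rows from a table where a fraction p of rows match a value:

```latex
% Estimate of the matching fraction from n sampled rows
\hat{p} = \frac{k}{n}, \qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}, \qquad
\frac{\mathrm{SE}(\hat{p})}{p} = \sqrt{\frac{1-p}{n\,p}} \;\approx\; \frac{1}{\sqrt{n p}}
```

So doubling the sample size only reduces the relative error by a factor of √2, and rare values (small p) need proportionally larger samples; page-level sampling makes the effective error worse than this ideal when values cluster within pages.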

Query Optimizer Assumptions
- Statistics have captured the lower and upper bounds
- If there are no range rows between range hi keys, then it knows there are no rows in between
- The page sampling method compensates for the lack of a true random row sample

Hypothesis
If we insert or update rows, then dm_db_stats_properties modification_counter > 0.
Shouldn't we then assume that there is new data?

Row Estimate Problems (at the source)
- Skewed data distribution
  - Does the query compile with a low, medium, or high skew parameter value?
  - What about subsequent executes?
- Errors due to the random page sample
- Out of bounds
- Value does not exist
Row estimate errors at the source are classified under the statistics topic.

Problems in Statistics
Scenario: update all statistics with default sampling, then insert rows with new (int/identity) values that are higher than the upper bound at the time of the stats update.
- If the plan compiles with a previously existing value – then fine
- If the plan compiles with a new value – then the optimizer "knows" the value does not exist, but the plan will show an estimate of 1 row
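A sketch reproducing the out-of-bounds case (table, column, and values are hypothetical):

```sql
-- Statistics last updated when the identity column topped out at 1000000
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- New rows arrive above the histogram's upper bound
INSERT dbo.Orders (CustomerID) VALUES (42);  -- identity assigns OrderID = 1000001

-- Compiling with the new value: the histogram says it does not exist,
-- so the plan estimates 1 row regardless of how many rows actually match
SELECT * FROM dbo.Orders WHERE OrderID = 1000001;
```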

SELECT * FROM PART WHERE P_PARTKEY =
SELECT * FROM PART WHERE P_PARTKEY =

SELECT * FROM LINEITEM WHERE L_PARTKEY =

SELECT * FROM PART JOIN LINEITEM ON L_PARTKEY = P_PARTKEY WHERE P_PARTKEY =
SELECT * FROM PART JOIN LINEITEM ON L_PARTKEY = P_PARTKEY WHERE P_PARTKEY =

Loop Join - Table Scan on Inner Source
- Estimated output from the first 2 tables (at right) is zero or 1 rows
- The most efficient join to the third table (without an index on the join column) is a loop join with a scan
- If the row count is 2 or more, then a full scan is performed for each row from the outer source
Default statistics rules may lead to serious ETL issues – consider a custom strategy.

Limited Distinct Values
- Column has fewer than 200 distinct values
  - The histogram knows the exact values – i.e., it also knows which values do not exist
- Now insert rows with different values
  - If >= 20% of rows changed, a statistics update is triggered, and the next query with an explicit SARG on the new value is fine
  - Ex: WHERE bIsProcessed = 0
  - If too few rows changed to trigger the stats update, then bad news
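One workaround sketch for the flag-column case above: update the relevant statistic explicitly right after the batch that introduces the new value, instead of waiting for the 20% auto-update threshold (object names here are hypothetical):

```sql
-- Batch job marks rows for processing, introducing bIsProcessed = 0 values
-- that the histogram currently says do not exist
UPDATE dbo.WorkQueue SET bIsProcessed = 0 WHERE BatchId = 1234;

-- Too few modified rows to trip the auto-update, so refresh the
-- statistic on the flag column's index directly
UPDATE STATISTICS dbo.WorkQueue (IX_WorkQueue_bIsProcessed) WITH FULLSCAN;
```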

Compile Parameter Does Not Exist
- The main procedure has a cursor around view_Servers
- The first server in view_Servers is 'CAESIUM'
- The cursor executes a sub-procedure for each server
- SQL: SELECT MAX(ID) FROM TReplWS WHERE Hostname
- But CAESIUM does not exist in TReplWS!

Good and Bad Plan?

SqlPlan Compile Parameters

<StmtSimple StatementText="... varchar(50) ... = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname ..." StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost=" " StatementEstRows="1" StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound"> <StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" /> <RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar" EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007" AvgRowSize="15" EstimatedTotalSubtreeCost=" " Parallel="0" EstimateRebinds="0" EstimateRewinds="0">
Compile parameter values appear at the bottom of the sqlplan file.

Microsoft Responds (Empire Strikes Back)
Changes to automatic update statistics in SQL Server – trace flag 2371 (...automatic-update-statistics-in-sql-server-traceflag-2371.aspx)
SQL Server Trace Flags for Dynamics AX (...flags-for-dynamics-ax.aspx)
Trace flag 4136 – disables use of the histogram

2371

2014
The New and Improved Cardinality Estimator in SQL Server (...e-new-and-improved-cardinality-estimator-in-sql-server-2014.aspx)
Filtered Stats and CE Model Variation, Dima Puligin

Upper Bound Problem
- Workaround: insert a dummy row with the maximum value
  - SET IDENTITY_INSERT table ON
  - int 2,147,483,647; datetime Dec 31, 9999
- OPTION (OPTIMIZE FOR UNKNOWN)
- SQL Server 2014 – not a problem
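A sketch of the dummy-row workaround (table and column names hypothetical); the sentinel row keeps new ascending key values inside the histogram's range so out-of-bounds compiles cannot occur:

```sql
-- Insert a sentinel row at the maximum key value so the histogram's
-- upper bound covers any value the application will ever insert
SET IDENTITY_INSERT dbo.Orders ON;
INSERT dbo.Orders (OrderID, OrderDate)
VALUES (2147483647, '99991231');  -- int max, datetime max (Dec 31, 9999)
SET IDENTITY_INSERT dbo.Orders OFF;

-- Refresh statistics so the new upper bound lands in the histogram
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```

The sentinel row must be excluded from application queries (e.g. by a WHERE clause or filtered view), which is why this is a last-resort workaround rather than a general fix.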

Temp Table and Table Variable
Forget what most of what other people have said.
- Temp tables – subject to statistics and auto/re-compile
- Table variables – no statistics; the optimizer assumes 1 row
Question: in each specific case, do the statistics and recompile help or not?
- Yes: temp table
- No: table variable
Is this still true?
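A minimal sketch of the contrast (object names are hypothetical; the 1-row assumption holds through SQL Server 2017 – 2019's table variable deferred compilation changes it):

```sql
-- Temp table: gets statistics, and row modifications can trigger a
-- recompile, so the join is costed with a realistic row count
CREATE TABLE #t (id int PRIMARY KEY);
INSERT #t SELECT TOP (100000) n FROM dbo.Numbers;   -- dbo.Numbers: hypothetical tally table
SELECT o.* FROM dbo.Orders o JOIN #t t ON t.id = o.OrderID;

-- Table variable: no statistics; the optimizer assumes 1 row at compile
-- time, which can push it toward a nested loops plan even for 100,000 rows
DECLARE @t TABLE (id int PRIMARY KEY);
INSERT @t SELECT TOP (100000) n FROM dbo.Numbers;
SELECT o.* FROM dbo.Orders o JOIN @t t ON t.id = o.OrderID;
```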

Multiple Joins to One Table
SELECT xxx
FROM Cession.RetroTransaction rt
INNER JOIN Common.Code c1 ON c1.CodeId = rt.TransactionTypeId
INNER JOIN Cession.NARSplit ns ON ns.NARSplitId = rt.NARSplitId AND ns.CessionId = rt.CessionId
INNER JOIN CRM.Company comp ON comp.CompanyId = ns.CompanyId
INNER JOIN Cession.RetroTransactionPeriod rtp ON rt.RetroTransactionPeriodId = rtp.RetroTransactionPeriodId
LEFT OUTER JOIN Cession.RetroTransactionAllocation BasePremium WITH ( NOLOCK )
  ON BasePremium.RetroTransactionId = rt.RetroTransactionId
  AND BasePremium.AllocationTypeId = '02C48EDA-57ED CD-6122AE372B3D'
  AND BasePremium.AllocationGrossNetTypeId = '09E95F B9-B4FB-A068CDBDBFDF'
LEFT OUTER JOIN Cession.RetroTransactionAllocation BaseAllowance WITH ( NOLOCK )
  ON BaseAllowance.RetroTransactionId = rt.RetroTransactionId
  AND BaseAllowance.AllocationTypeId = 'CA5047A3-43CD-496B-8F71-3E95E209806E'
LEFT OUTER JOIN Cession.RetroTransactionAllocation PolicyFee WITH ( NOLOCK )
  ON PolicyFee.RetroTransactionId = rt.RetroTransactionId
  AND PolicyFee.AllocationTypeId = 'FCD F-4BCA-99DC-F55BD46E46AF'
  AND PolicyFee.AllocationGrossNetTypeId = '09E95F B9-B4FB-A068CDBDBFDF'
LEFT OUTER JOIN Cession.RetroTransactionAllocation PolicyFeeAllowance
  ON PolicyFeeAllowance.RetroTransactionId = rt.RetroTransactionId
  AND PolicyFeeAllowance.AllocationTypeId = 'E570DA71-6B6C-4742-AB8B-DDFDB '
LEFT OUTER JOIN Cession.RetroTransactionAllocation TempFlatExtra WITH ( NOLOCK )
  ON TempFlatExtra.RetroTransactionId = rt.RetroTransactionId
  AND TempFlatExtra.AllocationTypeId = '82CB2D45-904F-4B50-8C36-A39CD2F33D9D'
  AND TempFlatExtra.AllocationGrossNetTypeId = '09E95F B9-B4FB-A068CDBDBFDF'
LEFT OUTER JOIN Cession.RetroTransactionAllocation TempFlatExtraAllowance
  ON TempFlatExtraAllowance.RetroTransactionId = rt.RetroTransactionId
  AND TempFlatExtraAllowance.AllocationTypeId = '43E AC3-4A1F-A13F-19A1BA7D3AFA'
LEFT OUTER JOIN Cession.RetroTransactionAllocation PermFlatExtra WITH ( NOLOCK )
  ON PermFlatExtra.RetroTransactionId = rt.RetroTransactionId
  AND PermFlatExtra.AllocationTypeId = '889639AB-620C B9E-2970A4500B57'
  AND PermFlatExtra.AllocationGrossNetTypeId = '09E95F B9-B4FB-A068CDBDBFDF'
LEFT OUTER JOIN Cession.RetroTransactionAllocation PermFlatExtraAllowance
  ON PermFlatExtraAllowance.RetroTransactionId = rt.RetroTransactionId
  AND PermFlatExtraAllowance.AllocationTypeId = '43DFA681-44FB-4025-B75A-1E12498A679B'
LEFT OUTER JOIN Cession.RetroTransactionAllocation CashSurrender WITH ( NOLOCK )
  ON CashSurrender.RetroTransactionId = rt.RetroTransactionId
  AND CashSurrender.AllocationTypeId = 'BCBDF86C-DCC5-49DA-BD2E-CD42CE402E99'
LEFT OUTER JOIN Common.Code splittype ON splittype.CodeId = ns.SplitTypeId
WHERE rt.CessionId = N'86037aac-465a-4e47-8bf1-6f af5'

<ShowPlanXML xmlns:xsi="..." xmlns:xsd="..." Version="1.1" Build="..." xmlns="...">
<StmtSimple StatementCompId="1" StatementEstRows=" " StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="TimeOut" StatementSubTreeCost="44029" StatementText="SELECT rt.RetroTransactionId, ... FROM Cession.vwRetroTransaction AS rt WHERE rt.CessionId = N'86037aac-465a-4e47-8bf1-6f af5'" StatementType="SELECT" QueryHash="0x19274C C69" QueryPlanHash="0x80E6A4E14DF1DC0A">
<RelOp AvgRowSize="365" EstimateCPU=" " EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows=" " LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="44029" >

<ShowPlanXML xmlns:xsi="..." xmlns:xsd="..." Version="1.1" Build="..." xmlns="...">
<StmtSimple StatementCompId="1" StatementEstRows=" " StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="TimeOut" StatementSubTreeCost=" " StatementText="SELECT rt.RetroTransactionId FROM Cession.vwRetroTransaction rt WHERE ..." StatementType="SELECT" QueryHash="0x19274C C69" QueryPlanHash="0xD5FF47FBFB2AD6AE">

plan