1
Now where does THAT estimate come from?
The nuts and bolts of cardinality estimation by Hugo Kornelis
2
WHAT estimate????
3
WHAT estimate????
4
About me: Hugo Kornelis
- Independent database consultant
- Community addict: speaker, blogger, author, technical editor, Pluralsight author, etc.
- MVP (SQL Server/Data Platform)
- Blog:
6
Which version?
- “Old” (legacy) cardinality estimator: introduced in SQL Server 7.0, unchanged until SQL Server 2012
- “New” cardinality estimator: introduced in SQL Server 2014, small changes in later versions. Better?
- Selected by the database compatibility level (120 and higher uses the new estimator)
- Trace flags 2312 (force new) / 9481 (force old), usually applied with the OPTION (QUERYTRACEON nnnn) query hint
- SQL 2016+: ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = { ON | OFF };
7
Overview
- Cardinality estimation: What? Why? How?
- The usual suspects
- Statistics
- Single-table queries (simple filters, complex filters)
- Multiple tables (single equijoin condition, single non-equijoin condition, multiple join conditions)
8
Cardinality estimation: What is it?
Prediction of the number of rows that an operator will return
9
Cardinality estimation: Why is it important?
- Determines plan choice; bad estimates can lead to:
  - a bad join strategy
  - a non-optimal index choice
  - the wrong choice between a serial and a parallel plan
10
Cardinality estimation: Why is it important?
- Determines plan choice
- Determines memory grant
  - Operators that store data in memory: Sort, Hash Match (Join), Hash Match (Aggregate)
  - Required total memory is computed from the cardinality estimates
  - Query execution waits until the memory is available
  - Insufficient memory: spill to tempdb
11
The usual suspects: Table variables
- No statistics
- Estimated table cardinality: always 1 row. Except …
- Fix: temporary tables
12
The usual suspects
- Table variables
- Multi-statement table-valued functions
  - Implemented using a table variable, so no statistics
  - Estimated cardinality: always 1 row; changed to 100 rows in SQL Server 2014
  - Fixes: use an inline table-valued function, or copy the results to a temporary table
  - SQL Server 2017: Interleaved Execution; the rest of the plan is recompiled after the table-valued function is evaluated (restrictions apply)
13
DEMO The usual suspects Table variables
Multi-statement table-valued functions
14
The usual suspects
- Stale and outdated statistics
  - AUTO_UPDATE_STATISTICS, AUTO_UPDATE_STATISTICS_ASYNC
  - Trace flag 2371 (lowers the auto-update threshold for large tables): SQL Server 2008 R2 and up; on by default as of SQL Server 2016 (requires compatibility level 130)
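To make the effect of trace flag 2371 concrete, here is a small Python sketch of the auto-update thresholds Microsoft documents for permanent tables with more than 500 rows: the classic rule (500 + 20% of the rows) and the dynamic rule used with trace flag 2371 or compatibility level 130+ (the lower of the classic value and SQRT(1000 × rows)). The function names are made up for illustration; tables of 500 rows or fewer and temporary tables follow other rules that this sketch does not model.

```python
import math


def classic_threshold(rows: int) -> float:
    """Row modifications before auto-update statistics triggers (classic rule, > 500 rows)."""
    return 500 + 0.20 * rows


def dynamic_threshold(rows: int) -> float:
    """Dynamic rule (trace flag 2371, or default at compatibility level 130+)."""
    return min(classic_threshold(rows), math.sqrt(1000 * rows))


for rows in (1_000, 100_000, 10_000_000):
    print(f"{rows:>12,} rows: classic {classic_threshold(rows):>12,.0f}, "
          f"dynamic {dynamic_threshold(rows):>10,.0f}")
```

For a 10-million-row table the classic rule waits for about two million modified rows, the dynamic rule for about 100,000.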
15
The usual suspects
- Unrepresentative statistics
  - Use a higher sampling rate, or use FULLSCAN
  - Drawbacks: slower, uses more resources
  - Cannot be configured for automatic statistics updates (until PERSIST_SAMPLE_PERCENT in SQL Server 2016 SP1 CU4 and SQL Server 2017 CU1)
  - Not guaranteed to work
16
The usual suspects
- Parameter sniffing
  - Estimates are based on the parameter values of the first execution, when the plan is compiled
  - The plan is reused for later executions, even when the parameter values change
17
The usual suspects
- OPTIMIZE FOR hint
  - Estimates are based on the hard-coded value
  - Even when the actual value is different
18
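Statistics: header of DBCC SHOW_STATISTICS
- Rows Sampled: number of rows sampled
- Rows: total number of rows in the table when the statistics were built
- Updated: last update of this statistics object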
19
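Statistics: density vector
- All density (LastName) = 1.0 / COUNT(DISTINCT LastName)
- All density (LastName, FirstName, MiddleName) = 1.0 / number of distinct (LastName, FirstName, MiddleName) combinations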
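A sketch of where the "All density" numbers come from, computed in Python over a small hypothetical list of rows. Real statistics may be built from a sample, which this ignores; the point is only that each density is 1 divided by the number of distinct values of the leading column(s) of the statistics key.

```python
def all_density(rows, column_count):
    """1.0 / number of distinct values of the leading `column_count` columns."""
    distinct_prefixes = {row[:column_count] for row in rows}
    return 1.0 / len(distinct_prefixes)


# Hypothetical sample rows: (LastName, FirstName, MiddleName)
people = [
    ("Brown", "Ann", "J"),
    ("Brown", "Ann", "K"),
    ("Brown", "Bob", None),
    ("Smith", "Ann", "J"),
]

print(all_density(people, 1))  # LastName only: 2 distinct values -> 0.5
print(all_density(people, 3))  # all three columns: 4 distinct combinations -> 0.25
```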
20
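Statistics: histogram step with RANGE_HI_KEY ‘Brown27’
- EQ_ROWS: 7 rows have LastName = ‘Brown27’
- RANGE_ROWS: 146 rows have LastName > ‘Brook6’ and LastName < ‘Brown27’ (‘Brook6’ is the previous step)
- DISTINCT_RANGE_ROWS: 23 distinct LastName values in those rows
- AVG_RANGE_ROWS: so on average ~6.35 (146 / 23) rows for each of those values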
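One histogram step can be modeled as a small record; this Python sketch (the class and field names are illustrative, not an official API) derives AVG_RANGE_ROWS from the other columns, using the ‘Brown27’ step. Returning 0.0 for an empty range is a choice of this sketch, not necessarily what SQL Server reports.

```python
from dataclasses import dataclass


@dataclass
class HistogramStep:
    """One histogram step as shown by DBCC SHOW_STATISTICS (simplified)."""
    range_hi_key: str
    range_rows: float           # rows strictly between the previous step and range_hi_key
    eq_rows: float              # rows equal to range_hi_key
    distinct_range_rows: float  # distinct values strictly inside that range

    @property
    def avg_range_rows(self) -> float:
        # Average rows per distinct value inside the range (0.0 for an empty range here)
        if self.distinct_range_rows == 0:
            return 0.0
        return self.range_rows / self.distinct_range_rows


step = HistogramStep("Brown27", range_rows=146, eq_rows=7, distinct_range_rows=23)
print(round(step.avg_range_rows, 2))  # 6.35
```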
21
Statistics DEMO DBCC SHOW_STATISTICS
22
Statistics: assumptions made when using statistics
- Independence: predicates are not correlated
- Uniformity: values are evenly spread
- Containment / Inclusion: values searched for will exist
23
Single-table queries: comparison with a variable
- Equality: estimate = All density × table cardinality = …E-05 × 29750
  - All density: from DBCC SHOW_STATISTICS
  - Table cardinality (29750 rows): from sys.dm_db_partition_stats
26
Single-table queries: comparison with a variable
- Equality: All density × table cardinality
- Inequality: fixed selectivity assumption of 30%: 0.3 × 29750 = 8925 (row count from sys.dm_db_partition_stats)
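Both "unknown value" rules fit in a few lines of Python. This is a sketch of the arithmetic only; the function names are hypothetical, and because the slide's LastName density did not survive, the density used below is a made-up value.

```python
def estimate_equality_with_variable(all_density: float, table_rows: float) -> float:
    # Value unknown at compile time: assume an "average" value from the density vector
    return all_density * table_rows


def estimate_inequality_with_variable(table_rows: float) -> float:
    # Value unknown and no density for ranges: fixed 30% selectivity guess
    return 0.3 * table_rows


table_rows = 29750  # row count from sys.dm_db_partition_stats, as on the slides
print(estimate_inequality_with_variable(table_rows))  # 8925.0

hypothetical_density = 0.001  # made-up value; the slide's density did not survive
print(estimate_equality_with_variable(hypothetical_density, table_rows))
```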
27
DEMO Single-table queries Comparison with a variable Equality
Inequality
28
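Single-table queries: comparison with a constant, sniffed parameter, or sniffed variable
- Equality, value is a histogram step: EQ_ROWS × (current rows / rows when the statistics were built) = 7 × 29750 / 29750 = 7
  - Current rows: sys.dm_db_partition_stats; EQ_ROWS and rows in the statistics: DBCC SHOW_STATISTICS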
29
Single-table queries: comparison with a constant, sniffed parameter, or sniffed variable
- Equality, value inside a histogram step: AVG_RANGE_ROWS × (29750 / 29750)
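The lookup for a known value can be sketched in Python as a search through the histogram: use EQ_ROWS when the value is a step, AVG_RANGE_ROWS when it falls inside a step, then scale by current rows over the rows recorded in the statistics. The function name is hypothetical, the numbers for the ‘Brook6’ step are invented, and Python's ordinal string ordering stands in for the column's collation.

```python
from bisect import bisect_left


def estimate_equality_with_value(steps, value, current_rows, stats_rows):
    """Equality estimate for a known value.

    steps: (range_hi_key, range_rows, eq_rows, distinct_range_rows) tuples sorted by key.
    """
    keys = [step[0] for step in steps]
    i = bisect_left(keys, value)
    if i == len(steps):
        raw = 1.0  # above the highest step: the legacy estimator guesses 1 row
    else:
        hi_key, range_rows, eq_rows, distinct_range_rows = steps[i]
        if hi_key == value:
            raw = eq_rows                           # value is a step: use EQ_ROWS
        elif distinct_range_rows > 0:
            raw = range_rows / distinct_range_rows  # inside a step: AVG_RANGE_ROWS
        else:
            raw = 1.0  # empty range; assumption of this sketch
    # Scale for rows added or removed since the statistics were built
    return raw * current_rows / stats_rows


steps = [
    ("Brook6", 20, 5, 10),    # hypothetical values for the previous step
    ("Brown27", 146, 7, 23),
]
print(estimate_equality_with_value(steps, "Brown27", 29750, 29750))  # 7.0
print(round(estimate_equality_with_value(steps, "Brown1", 29750, 29750), 2))  # 6.35
```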
30
DEMO Single-table queries
Comparison with a constant, sniffed parameter, or sniffed variable Equality
31
Single-table queries: comparison with a constant, sniffed parameter, or sniffed variable
- Inequality: … = 37; × (current rows / rows in the statistics) = 37
32
Single-table queries: comparison with a constant, sniffed parameter, or sniffed variable
- Inequality: … = 37 (36); × (current rows / rows in the statistics) = 37 (36)
33
Single-table queries: comparison with a constant, sniffed parameter, or sniffed variable
- Inequality: … + (~67% × 34) = ~24.78; × (current rows / rows in the statistics) = ~24.78 (actual estimate: …)
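The shape of the inequality calculation (all steps completely below the value, plus an interpolated fraction of the one partially covered range) can be sketched in Python. The slides do not show how SQL Server interpolates string keys, so this sketch assumes plain linear interpolation on numeric keys, and the histogram is made up.

```python
def estimate_less_than(steps, value):
    """Rows with key < value for numeric keys.

    steps: (range_hi_key, range_rows, eq_rows) tuples sorted by key.
    """
    total = 0.0
    prev_key = None
    for hi_key, range_rows, eq_rows in steps:
        if hi_key < value:
            total += range_rows + eq_rows  # whole step lies below the value
        else:
            if prev_key is not None and value > prev_key:
                # Linear interpolation inside the partially covered range
                fraction = (value - prev_key) / (hi_key - prev_key)
                total += fraction * range_rows
            break
        prev_key = hi_key
    return total


steps = [(10, 0, 2), (40, 34, 3)]  # hypothetical histogram
print(round(estimate_less_than(steps, 30), 2))  # 2 + (2/3 * 34) = 24.67
```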
35
DEMO Single-table queries
Comparison with a constant, sniffed parameter, or sniffed variable Equality Inequality
36
Single-table queries: comparison with a constant, sniffed parameter, or sniffed variable
- Changes in SQL Server 2014:
  - Different interpolation
  - The difference between < and <= (or > and >=) is taken into account: > ‘Zwilling4’ versus >= ‘Zwilling4’
37
DEMO Ascending Key problem Not just for keys!
(And not just ascending either)
38
Ascending Key problem: workaround since SQL Server 2005 SP1
- Trace flag 2389 (“known ascending columns”), trace flag 2390 (“all other columns”)
- Value out of range and column indexed? Find MIN/MAX from the table
- If within the actual range, use average density
- Example: statistics: rowcount 10,000; highest value 250; density 0.004. Actual: rowcount 11,500; highest value 300
  - WHERE value = 300: density × rowcount = 0.004 × 10,000 = 40
  - WHERE value = 310: out of actual range, so the estimate is still 1
  - WHERE value = 290: in the new range; the estimate should be 40, but is 1
39
Ascending Key problem: workaround since SQL Server 2005 SP1
- Trace flag 2389 (“known ascending columns”), trace flag 2390 (“all other columns”)
- Value out of range and column indexed? Find MIN/MAX from the table
- If within the actual range, use average density
- Example: statistics: rowcount 10,000; highest value 250; density 0.004. Actual: rowcount 11,500; highest value 300
  - WHERE value > 290: interpolation based on the 1,500 rows in the new range: 332
  - WHERE value > 310: out of actual range, so the estimate is still 1
40
Ascending Key problem: change in SQL Server 2014
- Value out of range? (Regardless of “known ascending” and index!) Estimate as if a variable was used, based on the number of rows added
- Example: statistics: rowcount 10,000; highest value 250; density 0.004. Actual: rowcount 11,500 (1,500 rows added); highest value 300
  - WHERE value = 300: density × rowcount = 0.004 × 10,000 = 40
  - WHERE value = 310: density × rowcount = 0.004 × 10,000 = 40
  - WHERE value = 290: density × rowcount = 0.004 × 10,000 = 40
41
Ascending Key problem: change in SQL Server 2014
- Value out of range? (Regardless of “known ascending” and index!) Estimate as if a variable was used, based on the number of rows added
- Example: statistics: rowcount 10,000; highest value 250; density 0.004. Actual: rowcount 11,500; highest value 300
  - WHERE value > 290: inequality assumption × new rows = 0.3 × 1,500 = 450
  - WHERE value > 310: inequality assumption × new rows = 0.3 × 1,500 = 450
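A Python sketch of the SQL Server 2014+ behavior shown in these examples, for values above the highest histogram step: equality uses density × rowcount, and "greater than" uses the 30% guess on the rows added since the last update. It uses density 0.004, which is what the 40-row estimate implies; the function name is hypothetical and only the two operators from the slides are covered.

```python
def out_of_histogram_estimate(op, value, stats_max, density, stats_rows, rows_added):
    """Estimate for a predicate above the histogram's highest step (SQL Server 2014+, per the slides)."""
    if value <= stats_max:
        raise ValueError("value is covered by the histogram; use the histogram instead")
    if op == "=":
        return density * stats_rows   # as if compared with a variable
    if op == ">":
        return 0.3 * rows_added       # 30% guess, applied to the rows added since the last update
    raise ValueError(f"operator {op!r} is not covered by this sketch")


example = dict(stats_max=250, density=0.004, stats_rows=10_000, rows_added=1_500)
for op, value in (("=", 300), ("=", 310), ("=", 290), (">", 290), (">", 310)):
    print(f"WHERE value {op} {value}: {out_of_histogram_estimate(op, value, **example):.0f}")
```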
42
Single-table queries: multiple predicates
SQL 7 – SQL 2012: assume independence
- Estimate the selectivity of each predicate (estimate / table cardinality)
- Find the combined selectivity (Selectivity 1 × Selectivity 2 × Selectivity 3 × …)
- Find the row count (table cardinality × combined selectivity)
43
Single-table queries: multiple predicates
Example: table has 100,000 rows
- WHERE Col1 = ‘A’: 4,500 rows; selectivity = 4,500 / 100,000 = 0.045
- WHERE Col2 (equality with a variable): density = 0.025; selectivity = 0.025
- WHERE Col3 (inequality with a variable): fixed selectivity; selectivity = 0.3
44
Single-table queries: multiple predicates
Example: table has 100,000 rows; WHERE Col1 = ‘A’ AND (Col2 predicate) AND (Col3 predicate)
- Selectivity = 0.045 × 0.025 × 0.3 = 0.0003375
- Estimate = 100,000 × 0.0003375 = 33.75
45
Single-table queries: multiple predicates
SQL 2014: predicates assumed to be partially dependent
- Estimate the selectivity of each predicate
- Sort in order of ascending selectivity (the most selective predicate comes first)
- Find the combined selectivity (“exponential backoff”): Selectivity 1 × Selectivity 2^(1/2) × Selectivity 3^(1/4) × …
- Find the row count (table cardinality × combined selectivity)
46
Single-table queries: multiple predicates
Example: table has 100,000 rows
- Selectivity values: 0.045, 0.025, 0.3; ascending order: 0.025, 0.045, 0.3
- Selectivity = 0.025 × 0.045^(1/2) × 0.3^(1/4) ≈ 0.025 × 0.2121 × 0.7401 ≈ 0.003925
- Estimate = 100,000 × 0.003925 ≈ 392.5
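Both combination rules side by side in Python, applied to the three selectivities from the example. The function names are hypothetical. The cap at the four most selective predicates comes from Microsoft's published description of the SQL Server 2014 estimator, not from these slides; it makes no difference for three predicates.

```python
import math


def independent_selectivity(selectivities):
    """SQL Server 7.0 - 2012: assume independence and multiply everything."""
    return math.prod(selectivities)


def backoff_selectivity(selectivities):
    """SQL Server 2014: exponential backoff, most selective predicate first."""
    ordered = sorted(selectivities)[:4]  # only the four most selective predicates count
    return math.prod(s ** (1 / 2 ** i) for i, s in enumerate(ordered))


table_rows = 100_000
selectivities = [0.045, 0.025, 0.3]  # Col1 = 'A', Col2 (density), Col3 (30% guess)
print(round(table_rows * independent_selectivity(selectivities), 2))  # 33.75
print(round(table_rows * backoff_selectivity(selectivities), 1))      # 392.5
```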
47
Single-table queries: multiple predicates. Fix (for SQL 7 – SQL 2012)
- Create multi-column statistics (automatic statistics creation never builds these; they only come from manual creation or a multi-column index)
- Will use the average density for the column combination
- Not used on SQL 2014 RTM; fixed in SQL 2016
48
Single-table queries DEMO Multiple predicates
49
Multiple tables DEMO Simple joins One equality predicate
50
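One equality predicate: aligning histograms
Step values with EQ_ROWS in parentheses; RANGE_ROWS/DISTINCT_RANGE_ROWS for the range between two steps:
- First histogram: Abercrombie0 (2) | 98/49 | Abolrous0 (1) | 149/149 | Ackerman0 (2)
- Second histogram: Abercrombie0 (2) | 84/42 | Abercrombie48 (2) | 61/55 | Abolrous9 (1) | 100/100 | Ackerman0 (2)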
51
One equality predicate
- Step values in both histograms: Abercrombie0: 2 × 2 = 4; Ackerman0: 2 × 2 = 4
- Subtotal: 4 + 4 = 8
52
One equality predicate
- Step values that fall inside a range of the other histogram:
  - Abercrombie48 (2) in range 98/49: (98/49) × 2 = 4
  - Abolrous0 (1) in range 61/55: 1 × (61/55) ≈ 1.1
  - Abolrous9 (1) in range 149/149: (149/149) × 1 = 1
53
One equality predicate
- Range Abercrombie0 to Abercrombie48: ~20% × 48 = 9.6 distinct values
- 9.6 × (98/49) × (84/42) = 38.4
54
One equality predicate
- Range Abercrombie48 to Abolrous0: ~80% × 48 = 38.4 distinct values on one side, ~80% × 54 = 43.2 on the other
- The smaller count is used: 38.4 × (98/49) × (61/55) ≈ 85.2
55
One equality predicate
- Range Abolrous0 to Abolrous9: ~10% × 148 = 14.8 distinct values on one side, ~20% × 54 = 10.8 on the other
- 10.8 × (149/149) × (61/55) ≈ 12.0
56
One equality predicate
- Range Abolrous9 to Ackerman0: ~90% × 148 = 133.2 distinct values on one side, 100 on the other
- 100 × (149/149) × (100/100) = 100
57
One equality predicate
- Total: 4 + 4 + 4 + 1.1 + 1 + 38.4 + 85.2 + 12.0 + 100 = 249.7
- × (current rows / rows in the statistics) = 249.7 (actual estimate: …)
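The whole alignment can be replayed in Python as a sum of the nine pieces above. This sketch hard-codes the already-aligned pieces rather than implementing the alignment itself; the ~20%/~80%/~10%/~90% fractions are the approximate ones from the slides, and for the first range the other side's count is taken as the full 42 distinct values of range 84/42 (an assumption, since the slide only shows 9.6). The helper name is hypothetical.

```python
def range_piece(distinct_a, avg_a, distinct_b, avg_b):
    # Containment: each distinct value on the side with fewer values finds a match
    return min(distinct_a, distinct_b) * avg_a * avg_b


pieces = {
    "Abercrombie0 = Abercrombie0": 2 * 2,
    "Ackerman0 = Ackerman0": 2 * 2,
    "Abercrombie48 (2) in 98/49": (98 / 49) * 2,
    "Abolrous0 (1) in 61/55": 1 * (61 / 55),
    "Abolrous9 (1) in 149/149": (149 / 149) * 1,
    "Abercrombie0 .. Abercrombie48": range_piece(0.2 * 48, 98 / 49, 42, 84 / 42),
    "Abercrombie48 .. Abolrous0": range_piece(0.8 * 48, 98 / 49, 0.8 * 54, 61 / 55),
    "Abolrous0 .. Abolrous9": range_piece(0.1 * 148, 149 / 149, 0.2 * 54, 61 / 55),
    "Abolrous9 .. Ackerman0": range_piece(0.9 * 148, 149 / 149, 100, 100 / 100),
}
for name, rows in pieces.items():
    print(f"{name:32} {rows:8.2f}")
print(f"{'Total':32} {sum(pieces.values()):8.1f}")  # 249.7
```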
58
One equality predicate
- SQL Server 2014: much simpler!
59
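One equality predicate
- SQL Server 2014: much simpler! Only the first and last steps remain; everything in between becomes one range (98 + 1 + 149 = 84 + 2 + 61 + 1 + 100 = 248 rows; 49 + 1 + 149 = 42 + 1 + 55 + 1 + 100 = 199 distinct values)
  - Both histograms: Abercrombie0 (2) | 248/199 | Ackerman0 (2)
- Matching steps: 2 × 2 = 4 and 2 × 2 = 4, so 4 + 4 = 8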
60
One equality predicate
- SQL Server 2014: 199 × (248/199) × (248/199) ≈ 309.1
- Plus 4 + 4 for the matching steps: ≈ 317.1 (actual estimate: 316.5)
61
One equality predicate
- SQL Server 2014, actual calculation: the range counts as 250 rows / 200 distinct values on each side (248/199 plus the Ackerman0 step), and only Abercrombie0 is matched as a step
  - 200 × (250/200) × (250/200) = 312.5
  - 2 × 2 = 4
  - Total: 312.5 + 4 = 316.5 (actual estimate: 316.5)
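The simplified SQL Server 2014 alignment reduces to one formula: matched boundary steps plus one merged range per side. This Python sketch (hypothetical function name) reproduces both the 317.1 that the 248/199 ranges would give and the 316.5 that SQL Server actually produced with 250/200 ranges.

```python
def coarse_join_estimate(boundary_rows, range_rows_a, distinct_a, range_rows_b, distinct_b):
    """Single merged range per side, plus matched boundary steps (SQL Server 2014+)."""
    avg_a = range_rows_a / distinct_a
    avg_b = range_rows_b / distinct_b
    return boundary_rows + min(distinct_a, distinct_b) * avg_a * avg_b


# First expectation: two matched steps (4 + 4) and a 248/199 range on each side
print(round(coarse_join_estimate(2 * 2 + 2 * 2, 248, 199, 248, 199), 1))  # 317.1
# Actual calculation: one matched step and a 250/200 range on each side
print(coarse_join_estimate(2 * 2, 250, 200, 250, 200))  # 316.5
```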
62
Multiple tables: simple joins, one inequality predicate
- Nothing documented, so … let’s speculate!
- SQL Server 7.0 – 2012: variation on the equality algorithm
63
One inequality predicate
- Abercrombie0 (2): 250 rows in the other histogram have a higher value, so 2 × 250 = 500
64
One inequality predicate
- Range Abercrombie0 to Abercrombie48: ~20% × 48 = 9.6 distinct values
- Rows in the other histogram above this range: 166; rows of the overlapping range 84/42, counted at 50%: 84 × 0.5
- 9.6 × (98/49) × (166 + (84 × 0.5)) = 3993.6
65
One inequality predicate
- Abercrombie48 (2) inside range 98/49: (98/49) × 164 = 328, where 164 = rows in the other histogram above Abercrombie48
66
One inequality predicate
- Range Abercrombie48 to Abolrous0: 38.4 × (98/49) × (… + ((43.2 × 61/55) × 0.5)) = …, where the higher rows of the other histogram include (11.8 × 61/55)
67
One inequality predicate
- Abolrous0 (1): 1 × … = …, where the higher rows of the other histogram include (10.8 × 61/55)
68
One inequality predicate
- Range Abolrous0 to Abolrous9: 14.8 × (149/149) × (… + ((10.8 × 61/55) × 0.5)) = …
69
One inequality predicate
- Abolrous9 (1) inside range 149/149: (149/149) × 102 = 102, where 102 = rows in the other histogram above Abolrous9 (100 + 2)
70
One inequality predicate
- Range Abolrous9 to Ackerman0: 133.2 × (149/149) × (2 + (100 × 0.5)) = 6926.4
  - 2: the Ackerman0 step; 100 × 0.5: half of the overlapping range 100/100
71
One inequality predicate
- Total of all parts (actual estimate: …)
72
Multiple tables: simple joins, one inequality predicate
- Nothing documented, so … let’s speculate!
- SQL Server 7.0 – 2012: variation on the equality algorithm
- SQL Server 2014: same, with the simplified histogram
73
One inequality predicate (SQL Server 2014; both histograms simplified to Abercrombie0 (2) | 248/199 | Ackerman0 (2))
- Abercrombie0 (2): 2 × 250 = 500, where 250 = rows in the other histogram above Abercrombie0 (248 + 2)
74
One inequality predicate
- Range: 250 rows on each side, with half of the other side's rows counted as higher: 250 × (250 × 0.5) = 31250
75
One inequality predicate
- Total: 500 + 31250 = 31750 (actual estimate: 31750)
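Because this algorithm is undocumented, the following Python is only the slides' speculative reconstruction written out as code: the first step pairs with every higher row on the other side, and the overlapping ranges count half of the other side's rows as higher. The function name and parameters are hypothetical.

```python
def coarse_inequality_estimate(boundary_eq_rows, rows_above_boundary, range_rows_a, range_rows_b):
    """Speculative reconstruction of the SQL Server 2014 estimate for one inequality join predicate."""
    # Each row of the first step pairs with every higher row on the other side
    boundary = boundary_eq_rows * rows_above_boundary
    # Overlapping ranges: on average, half of the other side's range rows are higher
    overlap = range_rows_a * (range_rows_b * 0.5)
    return boundary + overlap


print(coarse_inequality_estimate(2, 250, 250, 250))  # 31750.0
```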
76
Multiple tables: complex joins, two equality predicates
SQL Server 7.0 – 2012
- Compute the selectivity of each predicate
- Multiply (assume independence)
- Cartesian product (estimate): 2975 × 1975 = 5,875,625
- Join on LastName only (estimate): selectivity = … / … = 4.242E-04
- Join on FirstName only (estimate): selectivity = … / … = 5.256E-03
- Combined selectivity = 4.242E-04 × 5.256E-03 = 2.230E-06
- Estimate = 2.230E-06 × 5,875,625 ≈ 13.10 (actual estimate: 13.1007)
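The independence calculation in Python, using the inputs from the example (the function name is hypothetical). With the rounded selectivities shown it lands at about 13.100, close to the 13.1007 SQL Server reported from unrounded values.

```python
import math


def independent_join_estimate(rows_a, rows_b, *selectivities):
    """Legacy CE: Cartesian product times the product of the per-predicate selectivities."""
    return rows_a * rows_b * math.prod(selectivities)


estimate = independent_join_estimate(2975, 1975, 4.242e-04, 5.256e-03)
print(round(estimate, 2))  # 13.1
```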
77
Multiple tables: complex joins, two equality predicates
SQL Server 7.0 – 2012
- Compute the selectivity of each predicate and multiply (assume independence)
- Or use the density for the column combination, from multi-column statistics or the statistics for a multi-column index
- DISTINCT p.FirstName, p.LastName (estimate): 27550
- DISTINCT p2.FirstName, p2.LastName (estimate): 19200
- Distinct matching combinations: 19200
- Rows per combination in p: rows × density = 1.08
- Rows per combination in p2: rows × density = 1.03
- Estimate: 19200 × 1.08 × 1.03 ≈ 21,358 with these rounded factors (actual estimate: 21327.1)
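The density-based alternative as a Python sketch (hypothetical function name): take the smaller number of distinct combinations, then multiply by the rows per combination on each side. With the rounded 1.08 and 1.03 from the slide this gives 21358.08; the 21327.1 SQL Server reported comes from the unrounded factors.

```python
def combination_join_estimate(distinct_a, distinct_b, rows_per_combo_a, rows_per_combo_b):
    """Join on a column combination using multi-column densities."""
    matching = min(distinct_a, distinct_b)  # containment: the smaller set of combinations all match
    return matching * rows_per_combo_a * rows_per_combo_b


print(round(combination_join_estimate(27550, 19200, 1.08, 1.03), 2))  # 21358.08
```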
78
Multiple tables: complex joins, two equality predicates, SQL Server 2014
- Estimate the number of distinct combinations on each side
- Multiply the smaller number by the estimated rows per combination (rows × density) on each side
- Same as SQL Server 7.0 – 2012 with multi-column statistics (even when there are no multi-column statistics)
79
Multiple tables: complex joins, equality and inequality
SQL Server 7.0 – 2012
- Compute the selectivity of each predicate and multiply (assume independence)
- Same as for two equality predicates, but multi-column statistics are never used
80
Multiple tables: complex joins, equality and inequality, SQL Server 2014
- Estimate the cardinality of each input
- Assume each row from the larger input matches one row from the smaller input, so the estimate equals the cardinality of the larger input
81
Multiple tables: complex joins, join with extra filter predicates on other columns
SQL Server 7.0 – 2012
- Assume filters are correlated
- Determine filter selectivity
- Scale down the histograms
- Align the scaled-down histograms
82
Multiple tables: complex joins, join with extra filter predicates on other columns
SQL Server 2014
- Assumes no correlation between filters
- Align the original (simplified) histograms
- Determine filter selectivity
- Reduce the estimate after histogram alignment
83
THE END
Questions?
Download deck and code: Click “Sessions”–“Schedule” … “Download”