1
TPC-H Studies Joe Chang jchang6@yahoo.com www.qdpma.com
2
About Joe Chang SQL Server Execution Plan Cost Model True cost structure by system architecture Decoding statblob (distribution statistics) SQL Clone – statistics-only database Tools ExecStats – cross-reference index use by SQL execution plan Performance Monitoring, Profiler/Trace aggregation
3
TPC-H
4
TPC-H DSS – 22 queries, geometric mean. 60X range in plan cost, comparable actual range. Power – single stream; tests ability to scale parallel execution plans. Throughput – multiple streams. Scale Factor 1 – Line Item data is 1GB (875MB with DATE instead of DATETIME). Only single-column indexes allowed; ad hoc.
5
SF 10, test studies Not valid for publication Auto-Statistics enabled, Excludes compile time Big Queries – Line Item Scan Super Scaling – Mission Impossible Small Queries & High Parallelism Other queries, negative scaling Did not apply T2301, or disallow page locks
7
Big Q: Plan Cost vs Actual. Plan Cost reduction from DOP 1 to 16/32: Q1 28%, Q9 44%, Q18 70%, Q21 20%. Plan Cost says scaling is poor except for Q18; memory affects Hash IO onset. Charts: Plan Cost @ 10GB; Actual Query time in seconds. Plan Cost is a poor indicator of true parallelism scaling. Q18 & Q21 > 3X Q1, Q9
8
Big Query: Speed Up and CPU Q13 has slightly better than perfect scaling? In general, excellent scaling to DOP 8-24, weak afterwards Holy Grail CPU time In seconds Speed up relative to DOP 1
9
Super Scaling. Suppose at DOP 1 a query runs for 100 seconds, with one CPU fully pegged: CPU time = 100 sec, elapsed time = 100 sec. What is the best case for DOP 2? Assuming nearly zero Repartition Threads cost: CPU time = 100 sec, elapsed time = 50? Super Scaling: CPU time decreases going from Non-Parallel to Parallel plan! No, I have not started drinking, yet
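The arithmetic on this slide can be checked with a short sketch. Only the 100-second DOP 1 baseline and the best-case DOP 2 figures come from the slide; the function names and the second DOP 2 measurement (60 CPU-seconds, 30 seconds elapsed) are made up for illustration and are not test results.

```python
def scaling_metrics(cpu_dop1, elapsed_dop1, cpu_dopn, elapsed_dopn):
    """Return (speedup, cpu_ratio) of a parallel run relative to DOP 1."""
    speedup = elapsed_dop1 / elapsed_dopn   # > 1 means the query got faster
    cpu_ratio = cpu_dopn / cpu_dop1         # 1.0 means no extra CPU was spent
    return speedup, cpu_ratio


def is_super_scaling(cpu_ratio):
    """Super scaling: total CPU time drops below the DOP 1 CPU time."""
    return cpu_ratio < 1.0


# Best case from the slide: same CPU work, split evenly across 2 threads.
best_case = scaling_metrics(100.0, 100.0, 100.0, 50.0)

# Hypothetical super-scaling run (illustrative numbers only).
hypothetical = scaling_metrics(100.0, 100.0, 60.0, 30.0)

print(best_case, is_super_scaling(best_case[1]))
print(hypothetical, is_super_scaling(hypothetical[1]))
```

With ordinary scaling the speedup at DOP 2 cannot exceed 2X; a speedup above 2X is only possible when the parallel plan does less total CPU work than the non-parallel plan.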
10
Super Scaling CPU-sec goes down from DOP 1 to 2 and higher (typically 8) CPU normalized to DOP 1 Speed up relative to DOP 1 3.5X speedup from DOP 1 to 2 (Normalized to DOP 1)
11
CPU and Query time in seconds CPU time Query time
12
Super Scaling Summary Most probable cause Bitmap Operator in Parallel Plan Bitmap Filters are great, Question for Microsoft: Can I use Bitmap Filters in OLTP systems with non-parallel plans?
13
Small Queries – Plan Cost vs Act Query 3 and 16 have lower plan cost than Q17, but not included Q4,6,17 great scaling to DOP 4, then weak Negative scaling also occurs Query time Plan Cost
14
Small Queries CPU & Speedup. What did I get for all that extra CPU? Interpretation: a sharp jump in CPU means poor scaling; a disproportionate jump means negative scaling. Query 2 is negative at DOP 2, Q4 is good, Q6 gets speedup but at a CPU premium, Q17 and Q20 are negative after DOP 8. CPU time Speed up
15
High Parallelism – Small Queries Why? Almost No value TPC-H geometric mean scoring Small queries have as much impact as large Linear sum of weights large queries OLTP with 32, 64+ cores Parallelism good if super-scaling Default max degree of parallelism 0 Seriously bad news, especially for small Q Increase cost threshold for parallelism? Sometimes you do get lucky
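A small sketch of why geometric-mean scoring gives small queries as much weight as large ones, while a linear sum is dominated by the large queries. The two query times (100 s and 1 s) are invented for illustration, and the helper name is an assumption; this is not the full TPC-H metric calculation.

```python
import math


def geometric_mean(times):
    """Geometric mean of positive query times."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical workload: one large query (100 s) and one small query (1 s).
baseline = [100.0, 1.0]
halve_small = [100.0, 0.5]   # small query made 2X faster
halve_large = [50.0, 1.0]    # large query made 2X faster

for label, times in [("baseline", baseline),
                     ("halve small", halve_small),
                     ("halve large", halve_large)]:
    print(label, "geomean:", geometric_mean(times), "sum:", sum(times))
```

Halving either query improves the geometric mean by the same factor, whereas the sum barely moves when only the small query improves.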
16
Q that go Negative Query time “Speedup”
17
CPU
18
Other Queries – CPU & Speedup Q3 has problems beyond DOP 2 CPU time Speedup
19
Other - Query Time seconds Query time
20
Scaling Summary Some queries show excellent scaling Super-scaling, better than 2X Sharp CPU jump on last DOP doubling Need strategy to cap DOP To limit negative scaling Especially for some smaller queries? Other anomalies
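One possible way to think about capping DOP is sketched below: walk up the DOP doublings and stop at the last one that still delivered a worthwhile gain. This is an illustrative heuristic and not something the slides prescribe; the function name, the 1.2X gain threshold, and the elapsed times are all assumptions.

```python
def pick_dop_cap(elapsed_by_dop, min_gain=1.2):
    """Return the highest DOP reached before a doubling gains less than min_gain.

    elapsed_by_dop maps DOP -> measured elapsed seconds (hypothetical data).
    """
    dops = sorted(elapsed_by_dop)
    cap = dops[0]
    for prev, cur in zip(dops, dops[1:]):
        gain = elapsed_by_dop[prev] / elapsed_by_dop[cur]
        if gain < min_gain:
            break            # negligible or negative gain: stop here
        cap = cur
    return cap


# Made-up measurements: good scaling to DOP 8, weak at 16, negative at 32.
big_query = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0, 16: 14.0, 32: 18.0}
# Made-up small query that goes negative immediately at DOP 2.
small_query = {1: 10.0, 2: 12.0, 4: 13.0}

print(pick_dop_cap(big_query))
print(pick_dop_cap(small_query))
```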
22
Compression PAGE
23
Compression Overhead - Overall 40% overhead for compression at low DOP, 10% overhead at max DOP??? Query time compressed relative to uncompressed CPU time compressed relative to uncompressed
24
Query time compressed relative to uncompressed CPU time compressed relative to uncompressed
25
Compressed Table LINEITEM – real data may be more compressible Uncompressed: 8,749,760KB, Average Bytes per row: 149 Compressed: 4,819,592KB, Average Bytes per row: 82
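The LINEITEM sizes above imply the following compression ratio and space saving. The sketch only does the arithmetic on the figures from the slide; the variable names are illustrative.

```python
# Figures from the slide (LINEITEM at SF 10).
uncompressed_kb = 8_749_760
compressed_kb = 4_819_592
bytes_per_row_uncompressed = 149
bytes_per_row_compressed = 82

compression_ratio = uncompressed_kb / compressed_kb
space_saved = 1 - compressed_kb / uncompressed_kb
row_size_ratio = bytes_per_row_compressed / bytes_per_row_uncompressed

print(f"compression ratio: {compression_ratio:.2f}X")
print(f"space saved: {space_saved:.1%}")
print(f"compressed row size vs uncompressed: {row_size_ratio:.1%}")
```

The table-level saving and the per-row figures agree: the compressed table is roughly 55% of the uncompressed size either way.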
26
Partitioning Orders and Line Item on Order Key
27
Partitioning Impact - Overall Query time partitioned relative to not partitioned CPU time partitioned relative to not partitioned
28
Query time partitioned relative to not partitioned CPU time partitioned relative to not partitioned
29
Plan for Partitioned Tables
31
Scaling DW Summary Massive IO bandwidth Parallel options for data load, updates etc Investigate Parallel Execution Plans Scaling from DOP 1, 2, 4, 8, 16, 32 etc Scaling with and w/o HT Strategy for limiting DOP with multiple users
32
Fixes from Microsoft Needed Contention issues in parallel execution Table scan, Nested Loops Better plan cost model for scaling Back-off on parallelism if gain is negligible Fix throughput degradation with multiple users running big DW queries Sybase and Oracle, Throughput is close to Power or better
33
Query Plans
34
Big Queries
35
Q1 Pricing Summary Report
36
Q1 Plan Non-Parallel Parallel Parallel plan 28% lower than scalar, IO is 70%, no parallel plan cost reduction
38
Q9 Product Type Profit Measure IO from 4 tables contribute 58% of plan cost, parallel plan is 39% lower Non-Parallel Parallel
39
Q9 Non-Parallel Plan Table/Index Scans comprise 64%, IO from 4 tables contribute 58% of plan cost Join sequence: Supplier, (Part, PartSupp), Line Item, Orders
40
Q9 Parallel Plan Non-Parallel: (Supplier), (Part, PartSupp), Line Item, Orders Parallel: Nation, Supplier, (Part, Line Item), Orders, PartSupp
41
Q9 Non-Parallel Plan details Table Scans comprise 64%, IO from 4 tables contribute 58% of plan cost
42
Q9 Parallel reg vs Partitioned
44
Q13 Why does Q13 have perfect scaling?
46
Q18 Large Volume Customer Non-Parallel Parallel
47
Q18 Graphical Plan Non-Parallel Plan: 66% of cost in Hash Match, reduced to 5% in Parallel Plan
48
Q18 Plan Details Non-Parallel Parallel Non-Parallel Plan Hash Match cost is 1245 IO, 494.6 CPU DOP 16/32: size is below IO threshold, CPU reduced by >10X
50
Q21 Suppliers Who Kept Orders Waiting Note 3 references to Line Item Non-Parallel Parallel
51
Q21 Non-Parallel Plan H1 H2 H3 H2 H3
52
Q21 Parallel
53
Q21 3 full Line Item clustered index scans Plan cost is approx 3X Q1, single “scan”
54
Super Scaling
55
Q7 Volume Shipping Non-Parallel Parallel
56
Q7 Non-Parallel Plan Join sequence: Nation, Customer, Orders, Line Item
57
Q7 Parallel Plan Join sequence: Nation, Customer, Orders, Line Item
59
Q8 National Market Share Non-Parallel Parallel
60
Q8 Non-Parallel Plan Join sequence: Part, Line Item, Orders, Customer
61
Q8 Parallel Plan Join sequence: Part, Line Item, Orders, Customer
63
Q11 Important Stock Identification Non-Parallel Parallel
64
Q11 Join sequence: A) Nation, Supplier, PartSupp, B) Nation, Supplier, PartSupp
65
Q11
66
Small Queries
67
Query 2 Minimum Cost Supplier Wordy, but only touches the small tables, second lowest plan cost (Q15)
68
Q2 Clustered Index Scan on Part and PartSupp have highest cost (48%+42%)
69
Q2 PartSupp is now Index Scan + Key Lookup
71
Q6 Forecasting Revenue Change Not sure why this blows CPU Scalar values are pre-computed, pre-converted
73
Q20? This query may get a poor execution plan. Date functions are usually written as because Line Item date columns are “date” type. CAST helps the DOP 1 plan, but gets a bad plan for parallel
74
Q20
75
Q20
76
Q20 alternate - parallel Statistics estimation error here Penalty for mistake applied here
77
Other Queries
78
Q3
79
Q3
81
Q12 Random IO? Will this generate random IO?
82
Query 12 Plans Non-Parallel Parallel
83
Queries that go Negative
84
Q17 Small Quantity Order Revenue
85
Q17 Table Spool is concern
86
Q17 the usual suspects
88
Q19
89
Q19
90
Q22
91
Q22
92
Speedup from DOP 1 query time CPU relative to DOP 1