TPC-H Studies Joe Chang
About Joe Chang: SQL Server Execution Plan Cost Model; True cost structure by system architecture; Decoding statblob (distribution statistics); SQL Clone (statistics-only database). Tools: ExecStats (cross-reference index use by SQL execution plan); Performance Monitoring; Profiler/Trace aggregation.
TPC-H
TPC-H: DSS benchmark, 22 queries, geometric mean scoring. 60X range in plan cost, comparable actual range. Power: single stream; tests ability to scale parallel execution plans. Throughput: multiple streams. Scale Factor 1: Line Item data is 1GB (875MB with DATE instead of DATETIME). Only single-column indexes allowed; queries are ad hoc.
SF 10 test studies: Not valid for publication. Auto-statistics enabled; compile time excluded. Big Queries: Line Item scan. Super Scaling: Mission Impossible. Small Queries & High Parallelism. Other queries, negative scaling. Did not apply T2301 or disallow page locks.
Big Q: Plan Cost vs Actual. Plan cost reduction from DOP 1 to 16/32: Q1 28%, Q9 44%, Q18 70%, Q21 20%. Plan cost says scaling is poor except for Q18; memory affects Hash IO onset. Plan cost is a poor indicator of true parallelism scaling. Q18 & Q21 > 3X Q1, Q9. Charts: plan cost (10GB); actual query time in seconds.
Big Query: Speed Up and CPU. Q13 has slightly better than perfect scaling? In general, excellent scaling to DOP 8-24, weak afterwards. Chart annotation: Holy Grail. Charts: CPU time in seconds; speedup relative to DOP 1.
Super Scaling: Suppose at DOP 1 a query runs for 100 seconds with one CPU fully pegged: CPU time = 100 sec, elapsed time = 100 sec. What is the best case for DOP 2? Assuming nearly zero Repartition Threads cost: CPU time = 100 sec, elapsed time = 50? Super Scaling: CPU time decreases going from a non-parallel to a parallel plan! No, I have not started drinking, yet.
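The arithmetic on this slide can be sketched in a few lines of Python. The helper names `scaling_metrics` and `classify`, and the sample DOP 2 figures of 80 CPU-seconds and 40 elapsed seconds, are illustrative assumptions rather than measurements from these tests.

```python
def scaling_metrics(cpu_dop1, elapsed_dop1, cpu_dopn, elapsed_dopn):
    """Return (speedup, cpu_ratio) of a parallel run relative to DOP 1."""
    speedup = elapsed_dop1 / elapsed_dopn   # > 1 means the query got faster
    cpu_ratio = cpu_dopn / cpu_dop1         # < 1 means total CPU went down
    return speedup, cpu_ratio


def classify(speedup, cpu_ratio):
    """Label a run using the terms from the slides (hypothetical thresholds)."""
    if speedup < 1.0:
        return "negative scaling"
    if cpu_ratio < 1.0:
        return "super scaling"
    return "normal scaling"


# Best case from the slide: same 100 CPU-sec, elapsed time halves at DOP 2.
print(scaling_metrics(100, 100, 100, 50))             # (2.0, 1.0)
# Assumed super-scaling run: CPU time itself drops at DOP 2.
print(classify(*scaling_metrics(100, 100, 80, 40)))   # super scaling
```

Separating speedup from the CPU ratio mirrors the paired charts in this deck: speedup answers "did it get faster?", while the CPU ratio answers "what did the extra parallelism cost?".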
Super Scaling: CPU-sec goes down from DOP 1 to 2 and higher (typically 8). 3.5X speedup from DOP 1 to 2. Charts: CPU normalized to DOP 1; speedup relative to DOP 1.
CPU and Query time in seconds. Charts: CPU time; query time.
Super Scaling Summary: Most probable cause is the Bitmap operator in the parallel plan. Bitmap filters are great. Question for Microsoft: can I use Bitmap filters in OLTP systems with non-parallel plans?
Small Queries: Plan Cost vs Actual. Queries 3 and 16 have lower plan cost than Q17, but are not included. Q4, Q6 and Q17 show great scaling to DOP 4, then weak. Negative scaling also occurs. Charts: query time; plan cost.
Small Queries: CPU & Speedup. What did I get for all that extra CPU? Interpretation: a sharp jump in CPU means poor scaling; a disproportionate jump means negative scaling. Query 2 is negative at DOP 2; Q4 is good; Q6 gets speedup, but at a CPU premium; Q17 and Q20 are negative after DOP 8. Charts: CPU time; speedup.
High Parallelism – Small Queries: Why? Almost no value. TPC-H geometric mean scoring: small queries have as much impact as large ones, whereas a linear sum weights large queries. OLTP with 32, 64+ cores: parallelism is good if super-scaling. Default max degree of parallelism is 0: seriously bad news, especially for small queries. Increase cost threshold for parallelism? Sometimes you do get lucky.
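To see why small queries carry as much weight as large ones under geometric-mean scoring, consider this sketch. The query times are made-up numbers and `geometric_mean` is a plain helper, not the official TPC-H metric formula; only the comparison between a geometric mean and a linear sum is the point.

```python
import math


def geometric_mean(times):
    """Geometric mean of positive query times, computed via logs."""
    times = list(times)
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical elapsed times in seconds: one large query, one small query.
runs = {
    "baseline": {"big": 100.0, "small": 1.0},
    "big 2X faster": {"big": 50.0, "small": 1.0},
    "small 2X slower": {"big": 100.0, "small": 2.0},
    "both changes": {"big": 50.0, "small": 2.0},
}

for name, run in runs.items():
    gm = geometric_mean(run.values())
    total = sum(run.values())
    print(f"{name:16s} geometric mean {gm:7.3f}  sum {total:6.1f}")
```

With these assumed numbers, a 2X gain on the 100-second query is cancelled by a 2X loss on the 1-second query in the geometric mean, while the linear sum still improves from 101 to 52 seconds. That is why negative scaling on small queries matters under this kind of scoring.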
Queries that go Negative. Charts: query time; “speedup”.
CPU
Other Queries: CPU & Speedup. Q3 has problems beyond DOP 2. Charts: CPU time; speedup.
Other Queries: Query Time in seconds. Chart: query time.
Scaling Summary: Some queries show excellent scaling. Super-scaling: better than 2X. Sharp CPU jump on last DOP doubling. Need a strategy to cap DOP to limit negative scaling, especially for some smaller queries? Other anomalies.
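One possible way to act on the "cap DOP" point is to pick, per query, the largest DOP at which the last doubling still paid for itself. The sketch below is a hypothetical heuristic, not something SQL Server does; the function name `pick_dop_cap`, the 1.3X minimum gain per doubling, and the sample timings are all assumptions.

```python
def pick_dop_cap(elapsed_by_dop, min_gain=1.3):
    """Return the highest DOP reached while each step gave >= min_gain speedup.

    elapsed_by_dop maps DOP (1, 2, 4, ...) to measured elapsed seconds.
    """
    dops = sorted(elapsed_by_dop)
    cap = dops[0]
    for prev, cur in zip(dops, dops[1:]):
        gain = elapsed_by_dop[prev] / elapsed_by_dop[cur]
        if gain < min_gain:
            break          # this step was not worth it (or went negative)
        cap = cur
    return cap


# Assumed small query: good to DOP 4, nearly flat at 8, negative at 16.
small_q = {1: 8.0, 2: 4.5, 4: 2.6, 8: 2.4, 16: 3.1}
print(pick_dop_cap(small_q))   # 4
```

Stopping at the first weak step is a deliberately conservative choice: it avoids the sharp CPU jump on the last doubling even if a later DOP happens to recover some elapsed time.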
Compression PAGE
Compression Overhead - Overall: 40% overhead for compression at low DOP, 10% overhead at max DOP??? Charts: query time and CPU time, compressed relative to uncompressed.
Charts: query time and CPU time, compressed relative to uncompressed.
Compressed Table: LINEITEM (real data may be more compressible). Uncompressed: 8,749,760KB, average bytes per row: 149. Compressed: 4,819,592KB, average bytes per row: 82.
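As a quick consistency check on the figures in this slide, the size ratio and the bytes-per-row ratio should agree. The sketch below uses only the four numbers quoted above; the implied row count is a rough estimate, since the per-row averages are rounded and 1KB is assumed to be 1024 bytes.

```python
uncompressed_kb = 8_749_760
compressed_kb = 4_819_592

size_ratio = compressed_kb / uncompressed_kb   # fraction of the original size
row_ratio = 82 / 149                           # same ratio from bytes per row
implied_rows = uncompressed_kb * 1024 / 149    # rough row count (1KB = 1024 bytes assumed)

print(f"size ratio {size_ratio:.3f}, bytes-per-row ratio {row_ratio:.3f}")
print(f"space saved: {(1 - size_ratio) * 100:.0f}%")
print(f"implied rows: {implied_rows / 1e6:.1f} million")
```

Both ratios come out near 0.55, so PAGE compression saves roughly 45% here, and the implied row count of about 60 million is in line with Line Item at SF 10.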
Partitioning Orders and Line Item on Order Key
Partitioning Impact - Overall. Charts: query time and CPU time, partitioned relative to not partitioned.
Charts: query time and CPU time, partitioned relative to not partitioned.
Plan for Partitioned Tables
Scaling DW Summary: Massive IO bandwidth. Parallel options for data load, updates, etc. Investigate parallel execution plans: scaling from DOP 1, 2, 4, 8, 16, 32, etc.; scaling with and without HT. Strategy for limiting DOP with multiple users.
Fixes from Microsoft Needed: Contention issues in parallel execution (Table Scan, Nested Loops). Better plan cost model for scaling. Back off on parallelism if gain is negligible. Fix throughput degradation with multiple users running big DW queries. On Sybase and Oracle, Throughput is close to Power or better.
Query Plans
Big Queries
Q1 Pricing Summary Report
Q1 Plan (Non-Parallel and Parallel): Parallel plan cost is 28% lower than scalar. IO is 70% of plan cost, with no parallel plan cost reduction.
Q9 Product Type Profit Measure (Non-Parallel and Parallel plans): IO from 4 tables contributes 58% of plan cost; parallel plan is 39% lower.
Q9 Non-Parallel Plan: Table/Index Scans comprise 64%; IO from 4 tables contributes 58% of plan cost. Join sequence: Supplier, (Part, PartSupp), Line Item, Orders.
Q9 Parallel Plan: Non-Parallel join sequence: (Supplier), (Part, PartSupp), Line Item, Orders. Parallel join sequence: Nation, Supplier, (Part, Line Item), Orders, PartSupp.
Q9 Non-Parallel Plan details: Table Scans comprise 64%; IO from 4 tables contributes 58% of plan cost.
Q9 Parallel: regular vs Partitioned
Q13 Why does Q13 have perfect scaling?
Q18 Large Volume Customer (Non-Parallel and Parallel plans)
Q18 Graphical Plan: In the Non-Parallel plan, 66% of cost is in Hash Match, reduced to 5% in the Parallel plan.
Q18 Plan Details (Non-Parallel and Parallel): Non-Parallel plan Hash Match cost is 1245 IO, CPU. DOP 16/32: size is below IO threshold, CPU reduced by >10X.
Q21 Suppliers Who Kept Orders Waiting (Non-Parallel and Parallel plans): Note 3 references to Line Item.
Q21 Non-Parallel Plan. Diagram labels: H1, H2, H3, H2, H3.
Q21 Parallel
Q21: 3 full Line Item clustered index scans. Plan cost is approx 3X that of Q1, which has a single “scan”.
Super Scaling
Q7 Volume Shipping (Non-Parallel and Parallel plans)
Q7 Non-Parallel Plan Join sequence: Nation, Customer, Orders, Line Item
Q7 Parallel Plan Join sequence: Nation, Customer, Orders, Line Item
Q8 National Market Share (Non-Parallel and Parallel plans)
Q8 Non-Parallel Plan Join sequence: Part, Line Item, Orders, Customer
Q8 Parallel Plan Join sequence: Part, Line Item, Orders, Customer
Q11 Important Stock Identification (Non-Parallel and Parallel plans)
Q11 Join sequence: A) Nation, Supplier, PartSupp, B) Nation, Supplier, PartSupp
Q11
Small Queries
Query 2 Minimum Cost Supplier: Wordy, but only touches the small tables; second lowest plan cost (Q15).
Q2: Clustered Index Scans on Part and PartSupp have the highest cost (48% + 42%).
Q2 PartSupp is now Index Scan + Key Lookup
Q6 Forecasting Revenue Change: Not sure why this blows CPU. Scalar values are pre-computed, pre-converted.
Q20? This query may get a poor execution plan. Date functions are usually written as because Line Item date columns are “date” type. CAST helps the DOP 1 plan, but gets a bad plan for parallel.
Q20
Q20
Q20 alternate - parallel. Plan annotations: statistics estimation error here; penalty for mistake applied here.
Other Queries
Q3
Q3
Q12 Random IO? Will this generate random IO?
Query 12 Plans (Non-Parallel and Parallel)
Queries that go Negative
Q17 Small Quantity Order Revenue
Q17: Table Spool is a concern
Q17 the usual suspects
Q19
Q19
Q22
Q22
Charts: speedup from DOP 1 query time; CPU relative to DOP 1.