TPC-H Studies Joe Chang


1 TPC-H Studies. Joe Chang, jchang6@yahoo.com, www.qdpma.com

2 About Joe Chang
- SQL Server execution plan cost model
- True cost structure by system architecture
- Decoding statblob (distribution statistics)
- SQL Clone: statistics-only database
- Tools: ExecStats (cross-references index use with SQL execution plans), performance monitoring, Profiler/Trace aggregation

3 TPC-H

4 TPC-H
- DSS benchmark: 22 queries, scored by geometric mean; 60X range in plan cost, with a comparable range in actual time
- Power: single stream; tests the ability to scale parallel execution plans
- Throughput: multiple streams
- Scale Factor 1: Line Item data is 1GB (875MB with DATE instead of DATETIME)
- Only single-column indexes allowed; queries are ad hoc
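The geometric-mean scoring mentioned above is easy to sketch; the timings below are made up for illustration, not measured results:

```python
from math import prod

def geometric_mean(times):
    """Geometric mean of the query times: every query carries
    equal weight, however long it runs."""
    return prod(times) ** (1.0 / len(times))

# Illustrative (made-up) times in seconds: one 60X outlier barely
# moves the geometric mean, while it dominates the arithmetic mean.
times = [1.0] * 21 + [60.0]
print(round(geometric_mean(times), 2))    # 1.2
print(round(sum(times) / len(times), 2))  # 3.68
```

This equal weighting is why small-query behavior matters as much as the big Line Item scans in the composite score.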

5 SF 10 Test Studies (not valid for publication)
- Auto-statistics enabled; excludes compile time
- Big queries: Line Item scan
- Super scaling: mission impossible
- Small queries and high parallelism
- Other queries: negative scaling
- Did not apply trace flag 2301 or disallow page locks

7 Big Queries: Plan Cost vs Actual
Plan cost reduction from DOP 1 to 16/32: Q1 28%, Q9 44%, Q18 70%, Q21 20%. Plan cost says scaling is poor except for Q18, where memory affects the hash IO onset. Plan cost is a poor indicator of true parallelism scaling: Q18 and Q21 are more than 3X Q1 and Q9. (Charts: plan cost at 10GB; actual query time in seconds)

8 Big Queries: Speedup and CPU
Q13 has slightly better than perfect scaling? (the holy grail) In general, excellent scaling to DOP 8-24, weak afterwards. (Charts: CPU time in seconds; speedup relative to DOP 1)
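The speedup and efficiency behind charts like these reduce to simple ratios; a minimal sketch with hypothetical timings (not the measured SF 10 numbers):

```python
def speedup(t_dop1, t_dopn):
    """Speedup of a parallel run relative to the DOP 1 elapsed time."""
    return t_dop1 / t_dopn

def efficiency(t_dop1, t_dopn, dop):
    """Speedup divided by DOP: 1.0 is perfect scaling; above 1.0 is
    the 'better than perfect' (super-scaling) case."""
    return speedup(t_dop1, t_dopn) / dop

# Hypothetical timings in seconds, not measured values:
print(round(speedup(120.0, 7.0), 2))         # 17.14
print(round(efficiency(120.0, 7.0, 16), 2))  # 1.07
```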

9 Super Scaling
Suppose at DOP 1 a query runs for 100 seconds with one CPU fully pegged: CPU time = 100 sec, elapsed time = 100 sec. What is the best case for DOP 2? Assuming nearly zero Repartition Streams cost: CPU time = 100 sec, elapsed time = 50 sec? Super scaling: CPU time decreases going from the non-parallel to the parallel plan! No, I have not started drinking, yet.
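The thought experiment above can be stated in a few lines; the figures are the slide's 100-second example, not measurements:

```python
def best_case_elapsed(cpu_sec, dop):
    """Classic best case: the work is fixed and divides evenly, so
    elapsed time = CPU time / DOP while total CPU stays constant."""
    return cpu_sec / dop

def is_super_scaling(cpu_dop1, cpu_dopn):
    """Super scaling: total CPU-seconds actually *decrease* in the
    parallel plan, beating the fixed-work best case."""
    return cpu_dopn < cpu_dop1

print(best_case_elapsed(100.0, 2))  # 50.0
```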

10 Super Scaling
CPU-seconds go down from DOP 1 to 2 and higher (typically through DOP 8); 3.5X speedup from DOP 1 to 2. (Charts: CPU normalized to DOP 1; speedup relative to DOP 1)

11 CPU and Query Time in Seconds (charts: CPU time; query time)

12 Super Scaling Summary
Most probable cause: the Bitmap operator in the parallel plan. Bitmap filters are great. Question for Microsoft: can I use bitmap filters in OLTP systems with non-parallel plans?

13 Small Queries: Plan Cost vs Actual
Queries 3 and 16 have lower plan cost than Q17, but are not included. Q4, Q6, and Q17 scale well to DOP 4, then weakly. Negative scaling also occurs. (Charts: query time; plan cost)

14 Small Queries: CPU and Speedup
What did I get for all that extra CPU? Interpretation: a sharp jump in CPU means poor scaling; a disproportionate jump means negative scaling. Query 2 goes negative at DOP 2; Q4 is good; Q6 gets a speedup, but at a CPU premium; Q17 and Q20 go negative after DOP 8. (Charts: CPU time; speedup)

15 High Parallelism, Small Queries: Why?
Almost no value: with TPC-H geometric mean scoring, small queries have as much impact as large ones (a linear sum would weight the large queries). OLTP with 32, 64+ cores: parallelism is good only if super scaling. The default max degree of parallelism is 0 (unlimited), which is seriously bad news, especially for small queries. Increase the cost threshold for parallelism? Sometimes you do get lucky.

16 Queries That Go Negative (charts: query time; "speedup")

17 CPU (chart)

18 Other Queries: CPU and Speedup
Q3 has problems beyond DOP 2. (Charts: CPU time; speedup)

19 Other Queries: Query Time (chart: query time in seconds)

20 Scaling Summary
- Some queries show excellent scaling; super scaling is better than 2X
- Sharp CPU jump on the last DOP doubling
- Need a strategy to cap DOP, to limit negative scaling, especially for some smaller queries
- Other anomalies
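One way to implement the DOP-capping strategy above is to stop doubling DOP once the marginal speedup falls below a threshold; a sketch using hypothetical timings and a made-up gain threshold (`min_gain` is an assumption, not from the talk):

```python
def cap_dop(times_by_dop, min_gain=1.2):
    """Pick the largest DOP whose doubling still delivered at least
    `min_gain` speedup over the previous step.  `times_by_dop` maps
    DOP -> elapsed seconds (hypothetical measurements)."""
    dops = sorted(times_by_dop)
    best = dops[0]
    for prev, cur in zip(dops, dops[1:]):
        if times_by_dop[prev] / times_by_dop[cur] >= min_gain:
            best = cur
        else:
            break
    return best

# Hypothetical query: scales well to DOP 8, flat or negative after.
times = {1: 100, 2: 52, 4: 27, 8: 15, 16: 14, 32: 15}
print(cap_dop(times))  # 8
```

The same measurement-driven cap could feed a MAXDOP hint or the server-wide setting, rather than leaving the default unbounded parallelism in place.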

22 Compression: PAGE

23 Compression Overhead, Overall
40% overhead for compression at low DOP, but only 10% overhead at max DOP??? (Charts: query time and CPU time, compressed relative to uncompressed)

24 (Charts: query time and CPU time, compressed relative to uncompressed)

25 Compressed Table
LINEITEM (real data may be more compressible):
- Uncompressed: 8,749,760 KB, average 149 bytes per row
- Compressed: 4,819,592 KB, average 82 bytes per row
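The compression ratio follows directly from the slide's LINEITEM figures; a quick arithmetic check:

```python
def compression_ratio(uncompressed_kb, compressed_kb):
    """How many times smaller the compressed table is."""
    return uncompressed_kb / compressed_kb

# LINEITEM figures from the slide (TPC-H data is synthetic and
# fairly uniform; real data may compress differently).
print(round(compression_ratio(8_749_760, 4_819_592), 2))  # 1.82
print(round(149 / 82, 2))  # bytes/row gives the same ratio: 1.82
```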

26 Partitioning Orders and Line Item on Order Key

27 Partitioning Impact, Overall (charts: query time and CPU time, partitioned relative to not partitioned)

28 (Charts: query time and CPU time, partitioned relative to not partitioned)

29 Plan for Partitioned Tables

31 Scaling DW Summary
- Massive IO bandwidth
- Parallel options for data load, updates, etc.
- Investigate parallel execution plans: scaling from DOP 1, 2, 4, 8, 16, 32, etc., with and without HT
- Strategy for limiting DOP with multiple users

32 Fixes Needed from Microsoft
- Contention issues in parallel execution (table scan, nested loops)
- Better plan cost model for scaling; back off on parallelism if the gain is negligible
- Fix throughput degradation with multiple users running big DW queries: on Sybase and Oracle, Throughput is close to Power or better

33 Query Plans

34 Big Queries

35 Q1 Pricing Summary Report

36 Q1 Plan (non-parallel and parallel)
The parallel plan cost is 28% lower than the scalar plan; IO is 70% of the cost and gets no parallel cost reduction.

38 Q9 Product Type Profit Measure (non-parallel and parallel plans)
IO from 4 tables contributes 58% of plan cost; the parallel plan is 39% lower.

39 Q9 Non-Parallel Plan
Table/index scans comprise 64% of plan cost; IO from 4 tables contributes 58%. Join sequence: Supplier, (Part, PartSupp), Line Item, Orders.

40 Q9 Parallel Plan
Non-parallel join sequence: (Supplier), (Part, PartSupp), Line Item, Orders. Parallel: Nation, Supplier, (Part, Line Item), Orders, PartSupp.

41 Q9 Non-Parallel Plan Details
Table scans comprise 64% of plan cost; IO from 4 tables contributes 58%.

42 Q9 Parallel: Regular vs Partitioned

44 Q13 Why does Q13 have perfect scaling?

46 Q18 Large Volume Customer (non-parallel and parallel plans)

47 Q18 Graphical Plan
Non-parallel plan: 66% of the cost is in the Hash Match, reduced to 5% in the parallel plan.

48 Q18 Plan Details (non-parallel and parallel)
The non-parallel plan Hash Match cost is 1245 IO, 494.6 CPU. At DOP 16/32 the hash size is below the IO threshold, and CPU is reduced by more than 10X.

50 Q21 Suppliers Who Kept Orders Waiting (non-parallel and parallel plans)
Note the 3 references to Line Item.

51 Q21 Non-Parallel Plan (hash joins H1, H2, H3)

52 Q21 Parallel

53 Q21
3 full Line Item clustered index scans; plan cost is approximately 3X that of Q1, which has a single "scan".

54 Super Scaling

55 Q7 Volume Shipping (non-parallel and parallel plans)

56 Q7 Non-Parallel Plan Join sequence: Nation, Customer, Orders, Line Item

57 Q7 Parallel Plan Join sequence: Nation, Customer, Orders, Line Item

59 Q8 National Market Share (non-parallel and parallel plans)

60 Q8 Non-Parallel Plan Join sequence: Part, Line Item, Orders, Customer

61 Q8 Parallel Plan
Join sequence: Part, Line Item, Orders, Customer

63 Q11 Important Stock Identification (non-parallel and parallel plans)

64 Q11 Join sequence: A) Nation, Supplier, PartSupp; B) Nation, Supplier, PartSupp

65 Q11

66 Small Queries

67 Query 2 Minimum Cost Supplier
Wordy, but touches only the small tables; second-lowest plan cost (Q15).

68 Q2: clustered index scans on Part and PartSupp have the highest cost (48% + 42%)

69 Q2 PartSupp is now Index Scan + Key Lookup

71 Q6 Forecasting Revenue Change
Not sure why this blows CPU; scalar values are pre-computed and pre-converted.

73 Q20?
This query may get a poor execution plan. Date functions are usually written as ... because Line Item date columns are "date" type. CAST helps the DOP 1 plan, but gets a bad plan for parallel.

74 Q20

75 Q20

76 Q20 Alternate, Parallel
Statistics estimation error here; penalty for the mistake applied here.

77 Other Queries

78 Q3

79 Q3

81 Q12 Random IO? Will this generate random IO?

82 Query 12 Plans (non-parallel and parallel)

83 Queries that go Negative

84 Q17 Small Quantity Order Revenue

85 Q17: the Table Spool is a concern

86 Q17 the usual suspects

88 Q19

89 Q19

90 Q22

91 Q22

92 (Charts: speedup from DOP 1 in query time; CPU relative to DOP 1)

