TPC-H Studies Joe Chang


1 TPC-H Studies. Joe Chang, jchang6@yahoo.com, www.qdpma.com

2 About Joe Chang
- SQL Server execution plan cost model
- True cost structure by system architecture
- Decoding statblob (distribution statistics)
- SQL Clone: statistics-only database
- Tools: ExecStats (cross-references index use with SQL execution plans), performance monitoring, Profiler/Trace aggregation

3 TPC-H

4 TPC-H
- DSS benchmark: 22 queries, scored by geometric mean; 60X range in plan cost, with a comparable range in actual time
- Power: single stream; tests the ability to scale parallel execution plans
- Throughput: multiple streams
- Scale Factor 1: Line Item data is 1GB (875MB with DATE instead of DATETIME)
- Only single-column indexes allowed; queries are ad hoc
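The geometric-mean scoring mentioned above is easy to sketch; the timings below are made up for illustration, not measured results:

```python
from math import prod

def geometric_mean(times):
    """Geometric mean of the query times: every query carries
    equal weight, however long it runs."""
    return prod(times) ** (1.0 / len(times))

# Illustrative (made-up) times in seconds: one 60X outlier barely
# moves the geometric mean, while it dominates the arithmetic mean.
times = [1.0] * 21 + [60.0]
print(round(geometric_mean(times), 2))    # 1.2
print(round(sum(times) / len(times), 2))  # 3.68
```

This equal weighting is why small-query behavior matters as much as the big Line Item scans in the composite score.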

5 SF 10 Test Studies (not valid for publication)
- Auto-statistics enabled; excludes compile time
- Big queries: Line Item scan
- Super scaling: mission impossible
- Small queries and high parallelism
- Other queries: negative scaling
- Did not apply trace flag 2301 or disallow page locks

7 Big Queries: Plan Cost vs Actual
Plan cost reduction from DOP 1 to 16/32: Q1 28%, Q9 44%, Q18 70%, Q21 20%. Plan cost says scaling is poor except for Q18, where memory affects the hash IO onset. Plan cost is a poor indicator of true parallelism scaling: Q18 and Q21 are more than 3X Q1 and Q9. (Charts: plan cost at 10GB; actual query time in seconds)

8 Big Queries: Speedup and CPU
Q13 has slightly better than perfect scaling? (the holy grail) In general, excellent scaling to DOP 8-24, weak afterwards. (Charts: CPU time in seconds; speedup relative to DOP 1)
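The speedup and efficiency behind charts like these reduce to simple ratios; a minimal sketch with hypothetical timings (not the measured SF 10 numbers):

```python
def speedup(t_dop1, t_dopn):
    """Speedup of a parallel run relative to the DOP 1 elapsed time."""
    return t_dop1 / t_dopn

def efficiency(t_dop1, t_dopn, dop):
    """Speedup divided by DOP: 1.0 is perfect scaling; above 1.0 is
    the 'better than perfect' (super-scaling) case."""
    return speedup(t_dop1, t_dopn) / dop

# Hypothetical timings in seconds, not measured values:
print(round(speedup(120.0, 7.0), 2))         # 17.14
print(round(efficiency(120.0, 7.0, 16), 2))  # 1.07
```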

9 Super Scaling
Suppose at DOP 1 a query runs for 100 seconds with one CPU fully pegged: CPU time = 100 sec, elapsed time = 100 sec. What is the best case for DOP 2? Assuming nearly zero Repartition Streams cost: CPU time = 100 sec, elapsed time = 50 sec? Super scaling: CPU time decreases going from the non-parallel to the parallel plan! No, I have not started drinking, yet.
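The thought experiment above can be stated in a few lines; the figures are the slide's 100-second example, not measurements:

```python
def best_case_elapsed(cpu_sec, dop):
    """Classic best case: the work is fixed and divides evenly, so
    elapsed time = CPU time / DOP while total CPU stays constant."""
    return cpu_sec / dop

def is_super_scaling(cpu_dop1, cpu_dopn):
    """Super scaling: total CPU-seconds actually *decrease* in the
    parallel plan, beating the fixed-work best case."""
    return cpu_dopn < cpu_dop1

print(best_case_elapsed(100.0, 2))  # 50.0
```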

10 Super Scaling
CPU-seconds go down from DOP 1 to 2 and higher (typically through DOP 8); 3.5X speedup from DOP 1 to 2. (Charts: CPU normalized to DOP 1; speedup relative to DOP 1)

11 CPU and Query Time in Seconds (charts: CPU time; query time)

12 Super Scaling Summary
Most probable cause: the Bitmap operator in the parallel plan. Bitmap filters are great. Question for Microsoft: can I use bitmap filters in OLTP systems with non-parallel plans?

13 Small Queries: Plan Cost vs Actual
Queries 3 and 16 have lower plan cost than Q17, but are not included. Q4, Q6, and Q17 scale well to DOP 4, then weakly. Negative scaling also occurs. (Charts: query time; plan cost)

14 Small Queries: CPU and Speedup
What did I get for all that extra CPU? Interpretation: a sharp jump in CPU means poor scaling; a disproportionate jump means negative scaling. Query 2 goes negative at DOP 2; Q4 is good; Q6 gets a speedup, but at a CPU premium; Q17 and Q20 go negative after DOP 8. (Charts: CPU time; speedup)

15 High Parallelism, Small Queries: Why?
Almost no value: with TPC-H geometric mean scoring, small queries have as much impact as large ones (a linear sum would weight the large queries). OLTP with 32, 64+ cores: parallelism is good only if super scaling. The default max degree of parallelism is 0 (unlimited), which is seriously bad news, especially for small queries. Increase the cost threshold for parallelism? Sometimes you do get lucky.

16 Queries That Go Negative (charts: query time; "speedup")

17 CPU (chart)

18 Other Queries: CPU and Speedup
Q3 has problems beyond DOP 2. (Charts: CPU time; speedup)

19 Other Queries: Query Time (chart: query time in seconds)

20 Scaling Summary
- Some queries show excellent scaling; super scaling is better than 2X
- Sharp CPU jump on the last DOP doubling
- Need a strategy to cap DOP, to limit negative scaling, especially for some smaller queries
- Other anomalies
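One way to implement the DOP-capping strategy above is to stop doubling DOP once the marginal speedup falls below a threshold; a sketch using hypothetical timings and a made-up gain threshold (`min_gain` is an assumption, not from the talk):

```python
def cap_dop(times_by_dop, min_gain=1.2):
    """Pick the largest DOP whose doubling still delivered at least
    `min_gain` speedup over the previous step.  `times_by_dop` maps
    DOP -> elapsed seconds (hypothetical measurements)."""
    dops = sorted(times_by_dop)
    best = dops[0]
    for prev, cur in zip(dops, dops[1:]):
        if times_by_dop[prev] / times_by_dop[cur] >= min_gain:
            best = cur
        else:
            break
    return best

# Hypothetical query: scales well to DOP 8, flat or negative after.
times = {1: 100, 2: 52, 4: 27, 8: 15, 16: 14, 32: 15}
print(cap_dop(times))  # 8
```

The same measurement-driven cap could feed a MAXDOP hint or the server-wide setting, rather than leaving the default unbounded parallelism in place.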

22 Compression: PAGE

23 Compression Overhead, Overall
40% overhead for compression at low DOP, but only 10% overhead at max DOP??? (Charts: query time and CPU time, compressed relative to uncompressed)

24 (Charts: query time and CPU time, compressed relative to uncompressed)

25 Compressed Table
LINEITEM (real data may be more compressible):
- Uncompressed: 8,749,760 KB, average 149 bytes per row
- Compressed: 4,819,592 KB, average 82 bytes per row
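The compression ratio follows directly from the slide's LINEITEM figures; a quick arithmetic check:

```python
def compression_ratio(uncompressed_kb, compressed_kb):
    """How many times smaller the compressed table is."""
    return uncompressed_kb / compressed_kb

# LINEITEM figures from the slide (TPC-H data is synthetic and
# fairly uniform; real data may compress differently).
print(round(compression_ratio(8_749_760, 4_819_592), 2))  # 1.82
print(round(149 / 82, 2))  # bytes/row gives the same ratio: 1.82
```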

26 Partitioning Orders and Line Item on Order Key

27 Partitioning Impact, Overall (charts: query time and CPU time, partitioned relative to not partitioned)

28 (Charts: query time and CPU time, partitioned relative to not partitioned)

29 Plan for Partitioned Tables

31 Scaling DW Summary
- Massive IO bandwidth
- Parallel options for data load, updates, etc.
- Investigate parallel execution plans: scaling from DOP 1, 2, 4, 8, 16, 32, etc., with and without HT
- Strategy for limiting DOP with multiple users

32 Fixes Needed from Microsoft
- Contention issues in parallel execution (table scan, nested loops)
- Better plan cost model for scaling; back off on parallelism if the gain is negligible
- Fix throughput degradation with multiple users running big DW queries: on Sybase and Oracle, Throughput is close to Power or better

33 Query Plans

34 Big Queries

35 Q1 Pricing Summary Report

36 Q1 Plan (non-parallel and parallel)
The parallel plan cost is 28% lower than the scalar plan; IO is 70% of the cost and gets no parallel cost reduction.

38 Q9 Product Type Profit Measure (non-parallel and parallel plans)
IO from 4 tables contributes 58% of plan cost; the parallel plan is 39% lower.

39 Q9 Non-Parallel Plan
Table/index scans comprise 64% of plan cost; IO from 4 tables contributes 58%. Join sequence: Supplier, (Part, PartSupp), Line Item, Orders.

40 Q9 Parallel Plan
Non-parallel join sequence: (Supplier), (Part, PartSupp), Line Item, Orders. Parallel: Nation, Supplier, (Part, Line Item), Orders, PartSupp.

41 Q9 Non-Parallel Plan Details
Table scans comprise 64% of plan cost; IO from 4 tables contributes 58%.

42 Q9 Parallel: Regular vs Partitioned

44 Q13 Why does Q13 have perfect scaling?

46 Q18 Large Volume Customer (non-parallel and parallel plans)

47 Q18 Graphical Plan
Non-parallel plan: 66% of the cost is in the Hash Match, reduced to 5% in the parallel plan.

48 Q18 Plan Details (non-parallel and parallel)
The non-parallel plan Hash Match cost is 1245 IO, 494.6 CPU. At DOP 16/32 the hash size is below the IO threshold, and CPU is reduced by more than 10X.

50 Q21 Suppliers Who Kept Orders Waiting (non-parallel and parallel plans)
Note the 3 references to Line Item.

51 Q21 Non-Parallel Plan (hash joins H1, H2, H3)

52 Q21 Parallel

53 Q21
3 full Line Item clustered index scans; plan cost is approximately 3X that of Q1, which has a single "scan".

54 Super Scaling

55 Q7 Volume Shipping (non-parallel and parallel plans)

56 Q7 Non-Parallel Plan Join sequence: Nation, Customer, Orders, Line Item

57 Q7 Parallel Plan Join sequence: Nation, Customer, Orders, Line Item

59 Q8 National Market Share (non-parallel and parallel plans)

60 Q8 Non-Parallel Plan Join sequence: Part, Line Item, Orders, Customer

61 Q8 Parallel Plan
Join sequence: Part, Line Item, Orders, Customer

63 Q11 Important Stock Identification (non-parallel and parallel plans)

64 Q11 Join sequence: A) Nation, Supplier, PartSupp; B) Nation, Supplier, PartSupp

65 Q11

66 Small Queries

67 Query 2 Minimum Cost Supplier
Wordy, but touches only the small tables; second-lowest plan cost (Q15).

68 Q2: clustered index scans on Part and PartSupp have the highest cost (48% + 42%)

69 Q2 PartSupp is now Index Scan + Key Lookup

71 Q6 Forecasting Revenue Change
Not sure why this blows CPU; scalar values are pre-computed and pre-converted.

73 Q20?
This query may get a poor execution plan. Date functions are usually written as ... because Line Item date columns are "date" type. CAST helps the DOP 1 plan, but gets a bad plan for parallel.

74 Q20

75 Q20

76 Q20 Alternate, Parallel
Statistics estimation error here; penalty for the mistake applied here.

77 Other Queries

78 Q3

79 Q3

81 Q12 Random IO? Will this generate random IO?

82 Query 12 Plans (non-parallel and parallel)

83 Queries that go Negative

84 Q17 Small Quantity Order Revenue

85 Q17: the Table Spool is a concern

86 Q17 the usual suspects

88 Q19

89 Q19

90 Q22

91 Q22

92 (Charts: speedup from DOP 1 in query time; CPU relative to DOP 1)

