Parallel Execution Plans Joe Chang


1 Parallel Execution Plans Joe Chang jchang6@yahoo.com www.qdpma.com

2 About Joe Chang. SQL Server execution plan cost model; true cost structure by system architecture; decoding statblob (distribution statistics); SQL Clone – a statistics-only database. Tools: ExecStats – cross-references index use by SQL and execution plan; performance monitoring, Profiler/Trace aggregation.

3 So you bought a 64+ core box. Learn all about parallel execution: all guns (cores) blazing, negative scaling, super-scaling, high degree of parallelism with small SQL, anomalies and execution plan changes, compression, partitioning. No, I have not been smoking pot. Yes, this can happen; how will you know? How much CPU do I pay for this? Great management tool; what else?

4 Parallel Execution Plans This should be a separate slide deck

5

6 Execution Plan Quickie. Cost is duration in seconds on some reference platform. IO cost for a scan: 1 = 10,800KB/s, so a cost of 810 implies 8,748,000KB of IO. IO in a Nested Loops Join: 1 = 320 IO/s, i.e., a multiple of 0.003125 per IO. Press F4 to see the I/O and CPU cost components of an operator in the Estimated Execution Plan.
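One way to see these cost components as a rowset without executing the query (a minimal sketch; the table and predicate are placeholders):

-- Return the estimated plan as rows instead of executing the query;
-- the EstimateIO, EstimateCPU and TotalSubtreeCost columns show the
-- cost components for each operator.
SET SHOWPLAN_ALL ON;
GO
SELECT * FROM dbo.LINEITEM WHERE L_ORDERKEY < 1000000;
GO
SET SHOWPLAN_ALL OFF;
GO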

7 Index + Key Lookup vs. Scan. (926.67 - 323655 * 0.0001581) / 0.003125 = 280,160 (86.6%). Actual CPU time (data in memory): LU 19191919, Scan 87368727. Scan IO: 1,093,729 pages / 1350 = 810.17 (8,748MB). True cross-over is approximately 1,400,000 rows at 1 row per page.
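One reading of the arithmetic on this slide: the scan plan costs 926.67; subtracting the CPU component for all 323,655 rows (323,655 * 0.0001581 ≈ 51.2) and dividing the remainder by the per-lookup IO cost of 0.003125 gives 280,160, i.e., the cost model places the cross-over at about 86.6% of this row count, while measured CPU puts the true cross-over near 1,400,000 rows when each row sits on its own page.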

8 Index + Key Lookup vs. Scan. Scan IO: 8,748,000KB / 8 / 1350 = 810. (817 - 280326 * 0.0001581) / 0.003125 = 247,259 (88%). Actual CPU time: LU 2138321, Scan 18622658.

9 Actual Execution Plan. Note the Actual Number of Rows, Rebinds, and Rewinds, and compare the Actual values against the Estimated values.

10 Row Count and Executions. For the inner source of a Loop Join and for a Key Lookup, Actual Number of Rows = Number of Executions × Number of Rows per execution, where the number of executions is driven by the outer input.
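For example, if the outer input returns 280,326 rows, the Key Lookup on the inner side executes 280,326 times at 1 row per execution, so its Actual Number of Rows is 280,326 even though the per-execution estimate is 1.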

11

12 Parallel Plans

13 Parallelism Operations. Distribute Streams: non-parallel source, parallel destination. Repartition Streams: parallel source and destination. Gather Streams: parallel source, non-parallel destination.
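A minimal sketch of a query that, on a large enough table with parallelism allowed, typically ends with a Gather Streams operator at the root (TPC-H-style table and column names are assumed here):

-- Aggregate over a large table; once the plan cost crosses the parallelism
-- threshold, the plan usually finishes with Parallelism (Gather Streams).
SELECT L_SHIPMODE, COUNT_BIG(*) AS row_count, SUM(L_EXTENDEDPRICE) AS total
FROM dbo.LINEITEM
GROUP BY L_SHIPMODE
OPTION (MAXDOP 8);   -- cap the degree of parallelism for this query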

14 Parallel Execution Plans. Note the gold circles with double arrows and the parallelism operations.

15 Parallel Scan (and Index Seek) at DOP 1, 2, 4, 8. IO cost stays the same; CPU cost is reduced by the degree of parallelism (2X, 4X, 8X), except there is no additional reduction at DOP 16. IO contributes most of the cost!

16 Parallel Scan 2: DOP 16.

17 Hash Match Aggregate: CPU cost only reduces by 2X.

18 Parallel Scan. IO cost is the same; CPU cost is reduced in proportion to the degree of parallelism, with the last 2X excluded(?). On a weak storage system a single thread can saturate the IO channel, so additional threads will not increase IO throughput (or reduce IO duration). A very powerful storage system can provide IO proportional to the number of threads; it might be nice if this were an optimizer option. The IO component can be a very large portion of the overall plan cost, so not reducing the IO cost in a parallel plan may inhibit generating the favorable plan, i.e., the saving is not sufficient to offset the contribution from the Parallelism operations. A parallel execution plan is more likely on larger systems (-P to fake it?).
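A rough illustration with the numbers from the earlier scan example, assuming the CPU component scales with DOP as described: if the scan plan costs 926.67 with an IO component of 810.17, the CPU component is about 116.5. At DOP 4 only the CPU part shrinks (to roughly 29), so the plan cost falls only to about 839 even though elapsed time may drop much further.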

19 Actual Execution Plan - Parallel

20 More Parallel Plan Details

21 Parallel Plan - Actual

22 Parallelism – Hash Joins

23 Hash Join Cost at DOP 1, 2, 4, 8. Search: Understanding Hash Joins, for the in-memory, Grace, and recursive variants.

24 Hash Join Cost. CPU cost is linear with the number of rows on the outer and inner sources. See BOL on Hash Joins for the in-memory, Grace, and recursive variants. IO cost is zero for small intermediate data sizes; beyond a set point, proportional to server memory(?), IO is proportional to the data in excess of the in-memory limit. In a parallel plan, memory allocation is per thread! Summary: the Hash Join plan cost depends on memory if the IO component is not zero, in which case it is disproportionately lower with parallel plans. Does this reflect the real cost?
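A minimal sketch of a join that forces the hash algorithm so its cost and memory grant can be compared at different DOP settings (TPC-H-style names are assumed; the hints are for illustration, not a recommendation):

-- Force hash joins and a fixed DOP; repeat at MAXDOP 1, 2, 4, 8 and compare
-- the estimated cost and the memory grant (allocated per thread).
SELECT o.O_ORDERPRIORITY, COUNT_BIG(*) AS order_count
FROM dbo.ORDERS AS o
JOIN dbo.LINEITEM AS l
  ON l.L_ORDERKEY = o.O_ORDERKEY
WHERE o.O_ORDERDATE >= '1995-01-01'
GROUP BY o.O_ORDERPRIORITY
OPTION (HASH JOIN, MAXDOP 4);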

25 Parallelism Repartition Streams at DOP 2, DOP 4, DOP 8.

26 Bitmap. BOL: Optimizing Data Warehouse Query Performance Through Bitmap Filtering. A bitmap filter uses a compact representation of a set of values from a table in one part of the operator tree to filter rows from a second table in another part of the tree. Essentially, the filter performs a semi-join reduction; that is, only the rows in the second table that qualify for the join to the first table are processed. SQL Server uses the Bitmap operator to implement bitmap filtering in parallel query plans. Bitmap filtering speeds up query execution by eliminating rows with key values that cannot produce any join records before passing rows through another operator such as the Parallelism operator. By removing unnecessary rows early in the query, subsequent operators have fewer rows to work with, and the overall performance of the query improves. The optimizer determines when a bitmap is selective enough to be useful and in which operators to apply the filter. For more information, see Optimizing Data Warehouse Query Performance Through Bitmap Filtering.
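A minimal sketch of the kind of query where a Bitmap operator may show up in a parallel plan (TPC-H-style names are assumed; whether the optimizer actually adds the bitmap depends on selectivity and the plan shape):

-- Selective filter on the build side of a parallel hash join; the optimizer
-- may build a Bitmap on the join key and apply it to LINEITEM rows before
-- they reach the Parallelism and Hash Match operators.
SELECT o.O_ORDERKEY, SUM(l.L_EXTENDEDPRICE) AS revenue
FROM dbo.ORDERS AS o
JOIN dbo.LINEITEM AS l
  ON l.L_ORDERKEY = o.O_ORDERKEY
WHERE o.O_ORDERDATE BETWEEN '1995-01-01' AND '1995-03-31'
GROUP BY o.O_ORDERKEY
OPTION (MAXDOP 8);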

27

28 What Should Scale? Trivially parallelizable: 1) split a large chunk of work among threads, 2) each thread works independently, 3) a small amount of coordination to consolidate the threads' results.

29 More difficult to parallelize: 1) split a large chunk of work among threads, 2) each thread works on the first stage, 3) a large coordination effort between threads, 4) more work … then consolidate.

30 Parallel Execution Plan Summary. Queries with high IO cost may show little plan cost reduction from parallel execution. Plans where hash or sort operations are a large portion of the cost show a large parallel plan cost reduction. Parallel plans may be inhibited by a high row count in Parallelism Repartition Streams. Watch out for (parallel) Merge Joins!

31 Test Systems

32 2-way quad-core Xeon 5430 2.66GHz: Windows Server 2008 R2, SQL Server 2008 R2. 8-way dual-core Opteron 2.8GHz: Windows Server 2008 SP1, SQL Server 2008 SP1. 8-way quad-core Opteron 2.7GHz (Barcelona): Windows Server 2008 R2, SQL Server 2008 SP1. The 8-way systems were configured for AD – not good! Build 2789.

33 Test Methodology. Boot with all processors. Run queries at MAXDOP 1, 2, 4, 8, etc. This is not the same as running on a 1-way, 2-way, or 4-way server, so interpret the results with caution.
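A minimal sketch of the approach (the query and table are placeholders):

-- Run the same query at increasing degrees of parallelism and compare
-- the CPU time and elapsed time reported by STATISTICS TIME.
SET STATISTICS TIME ON;
SELECT COUNT_BIG(*) FROM dbo.LINEITEM OPTION (MAXDOP 1);
SELECT COUNT_BIG(*) FROM dbo.LINEITEM OPTION (MAXDOP 2);
SELECT COUNT_BIG(*) FROM dbo.LINEITEM OPTION (MAXDOP 4);
SELECT COUNT_BIG(*) FROM dbo.LINEITEM OPTION (MAXDOP 8);
SET STATISTICS TIME OFF;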

34 TPC-H

35 Continuing Development

36 Suppose I need to ALTER TABLE to ADD new columns? Of course, I then need to UPDATE the table to set the default values.
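A minimal sketch of the scenario (the column L_NEWFLAG is a hypothetical example, not from the deck):

-- Add the new column, then populate it; the update's write portion runs
-- serially even when the select side of the plan is parallel.
ALTER TABLE dbo.LINEITEM ADD L_NEWFLAG tinyint NULL;

UPDATE dbo.LINEITEM
SET L_NEWFLAG = 0;   -- set the default value for all existing rows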

37 Write Operations. The Insert, Update and Delete (IUD) component operations are not parallelizable. The select portion of the query may be parallelized, but select parallelization may be inhibited if the row count is high.

38 Mass Update. The same constraints apply: the Insert, Update and Delete (IUD) operations themselves run serially, the select portion of the query may be parallelized, and a high row count may inhibit that parallelization; see the batching sketch below.
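One common way to keep each statement's row count modest is to update in batches; this is a generic pattern, not something prescribed by the deck, and it reuses the hypothetical L_NEWFLAG column from the previous sketch:

-- Repeat a TOP (N) update until no rows qualify, bounding the row count
-- (and log usage) of each individual statement.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (100000) dbo.LINEITEM
    SET L_NEWFLAG = 0
    WHERE L_NEWFLAG IS NULL;   -- only rows not yet updated
    SET @rows = @@ROWCOUNT;
END;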

39

40 Compressed Table: LINEITEM (real data may be more compressible). Uncompressed: 8,749,760KB, average 149 bytes per row. Compressed: 4,819,592KB, average 82 bytes per row.
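A minimal sketch of how a table like this can be compressed and re-measured (the deck does not say whether row or page compression was used; page compression is assumed here):

-- Rebuild the table with compression, then compare the space used.
ALTER TABLE dbo.LINEITEM REBUILD WITH (DATA_COMPRESSION = PAGE);
EXEC sp_spaceused 'dbo.LINEITEM';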

41

