Parallel Execution Plans Joe Chang

Presentation transcript:

Parallel Execution Plans Joe Chang

About Joe Chang
SQL Server Execution Plan Cost Model
True cost structure by system architecture
Decoding statblob (distribution statistics)
SQL Clone – statistics-only database
Tools
  ExecStats – cross-reference index use by SQL-execution plan
  Performance Monitoring, Profiler/Trace aggregation

So you bought a 64+ core box
Learn all about Parallel Execution
  All guns (cores) blazing
  Negative scaling
  Super-scaling
  High degree of parallelism & small SQL
  Anomalies, execution plan changes etc
  Compression
  Partitioning
[Slide callouts: Now. No, I have not been smoking pot. Yes, this can happen; how will you know? How much in CPU do I pay for this? Great management tool, what else?]

Parallel Execution Plans Reference: Adam Machanic PASS

Execution Plan Quickie
Estimated Execution Plan
I/O and CPU Cost components
Cost is duration in seconds on some reference platform
IO Cost for scan: 1 = 10,800KB/s, 810 implies 8,748,000KB
IO in Nested Loops Join: 1 = 320/s, multiple of F4
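The scan figure on this slide can be checked with simple arithmetic. The following is a minimal sketch that assumes only the conversions stated on the slide (one unit of scan IO cost corresponds to 10,800KB, one unit of Nested Loops IO cost to 320 IOs) plus the 8KB SQL Server page size. The function and constant names are illustrative and are not part of SQL Server.

```python
# Illustrative arithmetic only. Constants come from the slide; names are hypothetical.
SCAN_KB_PER_COST_UNIT = 10_800   # slide: scan IO cost of 1 = 10,800KB/s
PAGE_KB = 8                      # SQL Server data page size in KB
LOOP_IOS_PER_COST_UNIT = 320     # slide: Nested Loops IO cost of 1 = 320/s


def scan_io_cost(size_kb: float) -> float:
    """Estimated scan IO cost (seconds on the reference platform) for size_kb of data."""
    return size_kb / SCAN_KB_PER_COST_UNIT


def scan_kb_from_cost(io_cost: float) -> float:
    """Inverse of scan_io_cost: data volume in KB implied by a scan IO cost."""
    return io_cost * SCAN_KB_PER_COST_UNIT


def scan_pages_per_cost_unit() -> float:
    """Pages scanned per unit of IO cost, i.e. pages per second on the reference platform."""
    return SCAN_KB_PER_COST_UNIT / PAGE_KB


def loop_io_cost(num_ios: int) -> float:
    """Estimated IO cost of num_ios individual IOs at the Nested Loops rate."""
    return num_ios / LOOP_IOS_PER_COST_UNIT


if __name__ == "__main__":
    print(scan_kb_from_cost(810))        # 8748000
    print(scan_pages_per_cost_unit())    # 1350.0
    print(loop_io_cost(320))             # 1.0
```

An IO cost of 810 therefore corresponds to 8,748,000KB, and the 1350 divisor that appears on the following slides is simply 10,800KB/s expressed in 8KB pages per second.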

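Index + Key Lookup - Scan
[Figure: plan cost of index seek + key lookup (LU) versus scan, with actual CPU time for data in memory. Most numeric values were lost in extraction. Surviving values: a ratio of 86.6%; a scan page count ending in ,093,729 pages / 1350 (8,748MB); true cross-over at approximately 1,400,000 rows; 1 row : page.]

Index + Key Lookup - Scan
[Figure: second cost comparison with actual CPU time for lookup (LU) and scan. Surviving values: KB/8/1350 = 810; a ratio of 88%.]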

Actual Execution Plan
Note Actual Number of Rows, Rebinds, Rewinds
[Figure labels: Actual, Estimated]

Row Count and Executions
For Loop Join inner source and Key Lookup, Actual Num Rows = Num of Exec × Num of Rows
[Figure labels: Outer, Inner Source]
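The row-count relation on this slide is a single multiplication, sketched here so it can be applied when comparing plan properties. The function name and the sample numbers are hypothetical; the sketch assumes the row estimate is reported per execution and the actual row count is the total over all executions, which is how the slide's formula reads.

```python
def expected_actual_rows(num_executions: int, rows_per_execution: float) -> float:
    """Total rows expected from a loop join inner source or key lookup.

    Mirrors the slide: Actual Num Rows = Num of Exec x Num of Rows.
    """
    return num_executions * rows_per_execution


if __name__ == "__main__":
    # Hypothetical plan: the inner source runs once per outer row (1,000 outer rows)
    # and each execution is estimated to return 3 rows.
    print(expected_actual_rows(1_000, 3))  # 3000
```

A large gap between this product and the reported actual row count is a hint that the per-execution estimate or the estimated number of executions is off.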

Parallel Plans

Parallelism Operations
Distribute Streams: non-parallel source, parallel destination
Repartition Streams: parallel source and destination
Gather Streams: destination is non-parallel
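The three Parallelism operations differ only in whether the source and the destination run in parallel. A minimal sketch of that classification follows. The function is a hypothetical helper for reading plans, not an API of SQL Server, and it assumes Gather Streams has a parallel source, which the slide implies but does not state.

```python
def parallelism_operator(source_parallel: bool, destination_parallel: bool) -> str:
    """Name the Parallelism operation that connects a source to a destination."""
    if not source_parallel and destination_parallel:
        return "Distribute Streams"    # one stream fanned out to many threads
    if source_parallel and destination_parallel:
        return "Repartition Streams"   # many streams redistributed among many threads
    if source_parallel and not destination_parallel:
        return "Gather Streams"        # many streams merged into one
    return "none"                      # serial to serial needs no exchange


if __name__ == "__main__":
    print(parallelism_operator(False, True))   # Distribute Streams
    print(parallelism_operator(True, True))    # Repartition Streams
    print(parallelism_operator(True, False))   # Gather Streams
```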

Parallel Execution Plans Note: gold circle with double arrow, and parallelism operations

Parallel Scan (and Index Seek)
[Figure panels: DOP 1, DOP 2, DOP 4, DOP 8; CPU cost reduced 2X, 4X, 8X]
IO cost is the same
CPU cost is reduced by degree of parallelism, except no reduction for DOP 16
IO contributes most of the cost!

Parallel Scan 2 DOP 16

Hash Match Aggregate CPU cost only reduces By 2X,

Parallel Scan
IO cost is the same
CPU cost reduced in proportion to degree of parallelism, last 2X excluded?
On a weak storage system, a single thread can saturate the IO channel; additional threads will not increase IO (reduce IO duration).
A very powerful storage system can provide IO proportional to the number of threads. It might be nice if this were an optimizer option?
The IO component can be a very large portion of the overall plan cost.
Not reducing IO cost in a parallel plan may inhibit generating a favorable plan, i.e., the CPU reduction is not sufficient to offset the contribution from the Parallelism operations.
A parallel execution plan is more likely on larger systems (-P to fake it?)
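The effect described on this slide can be illustrated with a toy model of the scan plan cost: the IO cost stays fixed and only the CPU cost is divided by the degree of parallelism. This is a sketch under stated assumptions, not the optimizer's actual formula. The `cpu_divisor_cap` parameter is a hypothetical way to model the "last 2X excluded" observation, and the CPU cost of 66 is an invented number chosen only to make the arithmetic visible.

```python
from typing import Optional


def parallel_scan_plan_cost(io_cost: float, cpu_cost: float, dop: int,
                            cpu_divisor_cap: Optional[int] = None) -> float:
    """Toy plan cost: IO cost unchanged, CPU cost divided by DOP (optionally capped)."""
    divisor = dop if cpu_divisor_cap is None else min(dop, cpu_divisor_cap)
    return io_cost + cpu_cost / divisor


if __name__ == "__main__":
    io_cost, cpu_cost = 810.0, 66.0   # IO cost from the slides, CPU cost hypothetical
    serial = parallel_scan_plan_cost(io_cost, cpu_cost, dop=1)
    for dop in (1, 2, 4, 8, 16):
        cost = parallel_scan_plan_cost(io_cost, cpu_cost, dop, cpu_divisor_cap=8)
        print(dop, cost, f"{(1 - cost / serial):.1%} lower than DOP 1")
```

With these numbers the plan cost drops from 876 at DOP 1 to 818.25 at DOP 8, less than 7%, because the IO component dominates. A reduction that small may not cover the cost of the added Parallelism operations.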

Actual Execution Plan - Parallel

More Parallel Plan Details

Parallel Plan - Actual

Parallelism – Hash Joins

Hash Join Cost
[Figure panels: DOP 1, DOP 2, DOP 4, DOP 8]
Search: Understanding Hash Joins, for In-memory, Grace, Recursive

Hash Join Cost
CPU cost is linear with number of rows, outer and inner source
  See BOL on Hash Joins for In-Memory, Grace, Recursive
IO cost is zero for small intermediate data size
  Beyond a set point proportional to server memory(?), IO is proportional to excess data (beyond in-memory limit)
Parallel Plan: memory allocation is per thread!
Summary: Hash Join plan cost depends on memory if the IO component is not zero, in which case it is disproportionately lower with parallel plans. Does not reflect real cost?

Parallelism Repartition Streams
[Figure panels: DOP 2, DOP 4, DOP 8]

Bitmap
BOL: Optimizing Data Warehouse Query Performance Through Bitmap Filtering
A bitmap filter uses a compact representation of a set of values from a table in one part of the operator tree to filter rows from a second table in another part of the tree. Essentially, the filter performs a semi-join reduction; that is, only the rows in the second table that qualify for the join to the first table are processed.
SQL Server uses the Bitmap operator to implement bitmap filtering in parallel query plans. Bitmap filtering speeds up query execution by eliminating rows with key values that cannot produce any join records before passing rows through another operator such as the Parallelism operator. By removing unnecessary rows early in the query, subsequent operators have fewer rows to work with, and the overall performance of the query improves. The optimizer determines when a bitmap is selective enough to be useful and in which operators to apply the filter.
For more information, see Optimizing Data Warehouse Query Performance Through Bitmap Filtering.
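The semi-join reduction described in the BOL text can be sketched in a few lines. The example below is a generic illustration of bitmap filtering, not SQL Server's implementation: the class, the modulo hash, and the sample tables are all assumptions made for the sketch. It builds a bitmap from the join keys of the small table, discards rows of the big table whose keys cannot match, and then joins only the survivors.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


class BitmapFilter:
    """Compact, approximate set of integer keys: no false negatives, possible false positives."""

    def __init__(self, num_bits: int = 1024) -> None:
        self.num_bits = num_bits
        self.bits = 0  # a Python int used as a bit array

    def add(self, key: int) -> None:
        self.bits |= 1 << (key % self.num_bits)

    def might_contain(self, key: int) -> bool:
        return bool((self.bits >> (key % self.num_bits)) & 1)


def bitmap_filtered_join(small: List[Row], big: List[Row], num_bits: int = 1024):
    """Return (rows of big that pass the bitmap, joined rows)."""
    bitmap = BitmapFilter(num_bits)
    lookup: Dict[int, List[str]] = {}
    for key, payload in small:               # build side: bitmap plus hash table
        bitmap.add(key)
        lookup.setdefault(key, []).append(payload)
    # Probe side: drop rows early, before they reach later operators.
    survivors = [(k, p) for k, p in big if bitmap.might_contain(k)]
    joined = [(k, sp, bp) for k, bp in survivors for sp in lookup.get(k, [])]
    return survivors, joined


if __name__ == "__main__":
    small = [(1, "a"), (5, "b")]
    big = [(i, f"row{i}") for i in range(10)]
    survivors, joined = bitmap_filtered_join(small, big, num_bits=8)
    print(survivors)  # keys 1, 5 and the false positive 9 (9 % 8 == 1)
    print(joined)     # [(1, 'a', 'row1'), (5, 'b', 'row5')]
```

The deliberately small 8-bit bitmap shows the trade-off: key 9 passes the filter as a false positive and is rejected only by the join itself, so the filter reduces work without changing the result.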

Parallel Execution Plan Summary
Queries with high IO cost may show little plan cost reduction on parallel execution
Plans with a high portion of hash or sort cost show large parallel plan cost reduction
Parallel plans may be inhibited by high row count in Parallelism Repartition Streams
Watch out for (Parallel) Merge Joins!

Scaling Theory

Parallel Execution Strategy
Partition work into little pieces
  Ensures each thread has same amount
  High overhead to coordinate
Partition into big pieces
  May have uneven distribution between threads
Small table join to big table
  Thread for each row from small table
Partitioned table options

What Should Scale?
Trivially parallelizable:
1) Split large chunk of work among threads
2) Each thread works independently
3) Small amount of coordination to consolidate threads
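The three steps of the trivially parallelizable case map directly onto a split, an independent per-thread computation, and a small consolidation step. The sketch below sums a list that way. It is an illustration of the structure only: the function names are hypothetical, and because of CPython's global interpreter lock, threads here show the shape of the work rather than a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(values: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Step 1: split the work into roughly equal contiguous chunks."""
    if not values:
        return []
    size = -(-len(values) // parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values: Sequence[int], dop: int) -> int:
    chunks = split_into_chunks(values, dop)
    if not chunks:
        return 0
    with ThreadPoolExecutor(max_workers=dop) as pool:
        # Step 2: each thread sums its own chunk independently.
        partial_sums = list(pool.map(sum, chunks))
    # Step 3: a small amount of coordination to consolidate the partial results.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101)), dop=4))  # 5050
```

The consolidation step handles one partial result per thread, so its cost does not grow with the data volume. That is the property that makes this pattern scale.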

More Difficult?
Parallelizable:
1) Split large chunk of work among threads
2) Each thread works on first stage
3) Large coordination effort between threads
4) More work
…
Consolidate

Partitioned Tables
[Figure panels: Regular Table, Partitioned Tables]
No Repartition Streams operations!

Scaling Reality
8-way Quad-Core Opteron
Windows Server 2008 R2
SQL Server 2008 SP1 + HF 27

Test Queries
TPC-H SF 10 database
  Standard, Compressed, Partitioned (30)
Line Item Table SUM, 59M rows, 8.75GB
Orders Table 15M rows

CPU-sec to SUM 1 or 2 columns in Line Item
[Figure panels: Standard, Compressed; axis: CPU-sec]

Speed Up
[Figure panels: Standard, Compressed]

Line Item sum 1 column
[Figure panels: CPU-sec; Speed up relative to DOP 1]

Line Item Sum w/Group By
[Figure panels: CPU-sec; Speedup]

Hash Join
[Figure panels: CPU-sec; Speedup]

Key Lookup and Table Scan, 1.4M rows
[Figure panels: CPU-sec; Speedup]
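The charts on these slides report two quantities per degree of parallelism: CPU-sec and speedup relative to DOP 1. The chart data did not survive in the transcript, so the sketch below only shows how such figures are typically derived from raw measurements. The function names are illustrative and every number in the example is hypothetical, not a result from the presentation.

```python
from typing import Dict


def speedup_relative_to_dop1(elapsed_sec: Dict[int, float]) -> Dict[int, float]:
    """Speedup at each DOP = elapsed time at DOP 1 / elapsed time at that DOP."""
    base = elapsed_sec[1]
    return {dop: base / t for dop, t in sorted(elapsed_sec.items())}


def parallel_efficiency(elapsed_sec: Dict[int, float]) -> Dict[int, float]:
    """Efficiency = speedup / DOP; 1.0 means perfect linear scaling."""
    return {dop: s / dop for dop, s in speedup_relative_to_dop1(elapsed_sec).items()}


def cpu_overhead_relative_to_dop1(cpu_sec: Dict[int, float]) -> Dict[int, float]:
    """Ratio of total CPU-sec at each DOP to CPU-sec at DOP 1; above 1.0 is extra CPU."""
    base = cpu_sec[1]
    return {dop: c / base for dop, c in sorted(cpu_sec.items())}


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    elapsed = {1: 40.0, 2: 20.0, 4: 12.5, 8: 10.0}
    cpu = {1: 40.0, 2: 40.0, 4: 50.0, 8: 80.0}
    print(speedup_relative_to_dop1(elapsed))
    print(parallel_efficiency(elapsed))
    print(cpu_overhead_relative_to_dop1(cpu))
```

Reading both together matters: a query can keep getting faster in elapsed time while its total CPU-sec climbs, which is the signature of contention at higher DOP.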

Parallel Execution Summary
Contention in queries w/low cost per page
  Simple scan
High Cost per Page – improves scaling!
  Multiple Aggregates, Hash Join, Compression
Table Partitioning – alternative query plans
Loop Joins – broken at high DOP
Merge Join – seriously broken (parallel)

Scaling DW Summary
Massive IO bandwidth
Parallel options for data load, updates etc
Investigate Parallel Execution Plans
  Scaling from DOP 1, 2, 4, 8, 16, 32 etc
  Scaling with and w/o HT
Strategy for limiting DOP with multiple users

Fixes from Microsoft Needed
Contention issues in parallel execution
  Table scan, Nested Loops
Better plan cost model for scaling
  Back-off on parallelism if gain is negligible
Fix throughput degradation with multiple users running big DW queries
  Sybase and Oracle: Throughput is close to Power or better

Test Systems

2-way quad-core Xeon GHz: Windows Server 2008 R2, SQL 2008 R2
8-way dual-core Opteron 2.8GHz: Windows Server 2008 SP1, SQL 2008 SP1
8-way quad-core Opteron 2.7GHz Barcelona: Windows Server 2008 R2, SQL 2008 SP1
8-way systems were configured for AD - not good!
Build 2789

Test Methodology
Boot with all processors
Run queries at MAXDOP 1, 2, 4, 8, etc
Not the same as running on 1-way, 2-way, 4-way server
Interpret results with caution

References Search Adam Machanic PASS