Parallel Execution Plans Joe Chang
Parallel Execution Plans
- Allows a single query to use multiple processors
- The query should run faster but may consume more total resources
- Example:
  1 thread: 10 sec run time, 10 CPU-sec
  2 threads: 6 sec run time, 12 CPU-sec
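One way to observe this trade-off is to run the same query serially and in parallel while SET STATISTICS TIME reports CPU time and elapsed time. The following is a minimal sketch, assuming the M3A_20 test table defined later in this deck; the aggregate query is an illustrative choice and not one taken from the slides.

```sql
-- Report CPU time and elapsed time for each statement
SET STATISTICS TIME ON;
GO

-- Serial plan: one thread, so CPU time should not exceed elapsed time by much
SELECT COUNT(*), SUM(rMoney)
FROM M3A_20
WHERE GroupID = 1
OPTION (MAXDOP 1);
GO

-- Allow up to 2 threads: elapsed time may drop while total CPU time rises
SELECT COUNT(*), SUM(rMoney)
FROM M3A_20
WHERE GroupID = 1
OPTION (MAXDOP 2);
GO

SET STATISTICS TIME OFF;
GO
```

The optimizer may still pick a serial plan for the second statement if the estimated cost is below the cost threshold for parallelism, so the two timings can turn out identical on small tables.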
Parallel Execution Configuration
Cost Threshold For Parallelism
- Minimum estimated plan cost at which a query is considered for parallel execution
- Default 5; consider increasing it for new systems
Max Degree of Parallelism
- Default 0: can use all available processors
- SQL Server determines the actual level based on available memory and recent CPU usage
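Both settings are server-wide advanced options changed through sp_configure. The sketch below shows the mechanics only: the threshold value of 20 is a placeholder assumption (the slide's recommended value did not survive), and the MAXDOP value of 4 follows the General Recommendations slide.

```sql
-- Both options are "advanced", so they must be made visible first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO

-- Placeholder value: pick a threshold suited to your own workload
EXEC sp_configure 'cost threshold for parallelism', 20;

-- Cap the number of processors a single query may use
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
GO

-- Show the configured and running values
EXEC sp_configure 'cost threshold for parallelism';
EXEC sp_configure 'max degree of parallelism';
GO
```

A server-wide cap can be overridden for an individual statement with the OPTION (MAXDOP n) query hint.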
Parallel Plan Operators

Distribute Streams: consumes a single input stream of records and produces multiple output streams. The record contents and format are not changed. Each record from the input stream appears in one of the output streams. This operator automatically preserves the relative order of the input records in the output streams. Usually, hashing is used to decide to which output stream a particular input record belongs.

Repartition Streams: consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. Each record from an input stream is placed into one output stream. If this operator is order-preserving, then all input streams must be ordered and merged into several ordered output streams.

Gather Streams: consumes several input streams and produces a single output stream of records by combining the input streams. The record contents and format are not changed. If this operator is order-preserving, then all input streams must be ordered.
Execution Plan Cost Formulas
Table Scan or Index Scan
  I/O: per page
  CPU: per row
Index Seek – Plan Formula
  I/O Cost = per additional page (≤1GB)
           = per additional page (>1GB)
  CPU Cost = per additional row
Bookmark Lookup (may have changed?)
  I/O Cost = multiple of (≤1GB)
           = multiple of (>1GB)
  CPU Cost = per row
Table Scan or Index Scan
  IUD I/O Cost ~ – (>100 rows)
  IUD CPU Cost = per row
Cost Interpretation
Time in seconds? CPU time?
  sec -> 160/sec
  -> 1350/sec (8KB) -> 169/sec (64KB) -> 10.8MB/sec
SQL Server 2000 BOL: Administering SQL Server, Managing Servers, Setting Configuration Options: cost threshold for parallelism Option
  "Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration."
Too fast for random I/O on a 7200RPM disk. About right for a 1997 sequential disk transfer rate?
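The chain of rates above can be checked with simple arithmetic: 1350 pages per second at 8 KB per page is the same transfer rate as roughly 169 extents per second at 64 KB each. A small sketch, assuming 1 MB = 1000 KB to match the slide's rounding; the column aliases are illustrative.

```sql
SELECT
    1350 * 8 / 1000.0 AS MBPerSec_8KB_Pages,     -- 1350 pages/sec x 8 KB, about 10.8
    1350 / 8.0        AS ExtentsPerSec,          -- 8 pages per 64 KB extent, about 169
    169 * 64 / 1000.0 AS MBPerSec_64KB_Extents;  -- 169 extents/sec x 64 KB, about 10.8
```

A sustained rate near 10.8 MB/sec is far beyond what random 8 KB reads could deliver on a single disk of that era, which is why the slide reads the per-page cost as a sequential transfer rate.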
Test Table

CREATE TABLE M3A_20 (
    GroupID  int NOT NULL,
    ID       int NOT NULL,
    ID2      int NOT NULL,
    ID3      int NOT NULL,
    ID4      int NOT NULL,
    sID      smallint NOT NULL,
    bID1     bigint NOT NULL,
    bID2     bigint NOT NULL,
    bID3     bigint NOT NULL,
    rMoney   money NOT NULL,
    rDate    datetime NOT NULL,
    rReal    real NOT NULL,
    rDecimal decimal (9,4) NOT NULL,
    CONSTRAINT [PK_M3A_20] PRIMARY KEY CLUSTERED ( [GroupID], [ID] )
        WITH FILLFACTOR = 100
)
GO
Data Population Script 1

SET NOCOUNT ON
SELECT
SELECT
BEGIN
    BEGIN TRANSACTION
    =
    BEGIN
        INSERT M3A_20 (GroupID, ID, ID2, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal)
        VALUES ( *rand(), 10000*rand(), 10000*rand() )
        IF > 0 BEGIN GOTO B END
    END
    COMMIT TRANSACTION
    CHECKPOINT
    PRINT CONVERT(varchar,GETDATE(),121) + ', row ' +
END
B: IF > 0 COMMIT TRANSACTION
PRINT '01 Complete ' + CONVERT(varchar,GETDATE(),121) + ', row ' + + ', Trancount ' +
Data Population Script 1 Notes
- Uses a double WHILE loop
- Each Insert/Update/Delete statement is an implicit transaction and gets a separate transaction log entry
- An explicit transaction generates a single transaction log write (max 64KB per I/O)
- A single transaction for the entire loop requires an excessively large log file
- Inserts are therefore grouped into intermediate-size batches
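The batching pattern described in these notes can be sketched as follows. This is not the original population script: the temporary table #BatchDemo, the variable names, and the batch and row counts are all hypothetical, chosen only to show an outer loop that commits one batch per pass and an inner loop of single-row inserts.

```sql
SET NOCOUNT ON;

-- Hypothetical demo table, not the M3A_20 table from the slides
CREATE TABLE #BatchDemo (ID int NOT NULL PRIMARY KEY, Val int NOT NULL);

DECLARE @ID int, @BatchEnd int, @BatchSize int, @TotalRows int;
SELECT @ID = 1, @BatchSize = 1000, @TotalRows = 10000;  -- assumed sizes

WHILE @ID <= @TotalRows                 -- outer loop: one pass per batch
BEGIN
    SET @BatchEnd = @ID + @BatchSize - 1;

    BEGIN TRANSACTION;                  -- one explicit transaction per batch
    WHILE @ID <= @BatchEnd AND @ID <= @TotalRows   -- inner loop: single-row inserts
    BEGIN
        INSERT #BatchDemo (ID, Val) VALUES (@ID, 10000 * RAND());
        SET @ID = @ID + 1;
    END
    COMMIT TRANSACTION;                 -- log activity is flushed once per batch

    CHECKPOINT;
END

SELECT COUNT(*) AS RowsInserted FROM #BatchDemo;
DROP TABLE #BatchDemo;
```

The batch size trades log-file growth against per-transaction overhead; error handling (the @@ERROR check and GOTO in the original script) is omitted here for brevity.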
Data Population Scripts 2

int
= 1
<= 3
BEGIN
    INSERT M3A_11 (GroupID,ID,ID2,ID3,ID4,sID,bID1,bID2,bID3,rMoney,rDate,rReal, rDecimal)
    SELECT TOP GroupID, ID, ID, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal
    FROM M3A_20 WHERE GroupID = 1 AND ID BETWEEN + 1
    CHECKPOINT
    PRINT '11 Step ' + + ', ' + CONVERT(varchar,GETDATE(),121)
END
UPDATE STATISTICS M3A_01 (PK_M3A_01) WITH FULLSCAN
CREATE STATISTICS ST_01 ON M3A_01 (ID) WITH FULLSCAN, NORECOMPUTE

- Primary table populated using single-row inserts in a WHILE loop
- Additional tables populated with INSERT / SELECT statements
- Single-row inserts: ~20-30K rows/sec
- INSERT / SELECT statement: ~100K+ rows/sec
Index Seek Plans Many rows returned, Non-parallel plan Parallel Execution disabled Cost: 9.34 Cost: 9.82 Cost: 4.94 Parallel Plan
Index Seek Details Non-parallel plan Parallel plan
Index Seek – Non-parallel
- Cost assigned to SELECT
- Index Seek, 1M rows in 11,115 pages (81 bytes/row, 90% fill)
- I/O cost is:
- CPU cost is:
- Cost and sub-tree cost are correct; I/O and CPU are ½ of the correct value
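The page count quoted on this slide is consistent with the row size and fill level. A rough check, assuming 8096 usable bytes per 8 KB page (8192 minus the 96-byte page header) and assuming the 81 bytes/row figure already includes per-row overhead; the column aliases are illustrative.

```sql
SELECT
    -- Estimated pages: total row bytes / bytes used per page at 90% fill
    1000000 * 81 / (8096 * 0.90)      AS EstimatedPages,   -- roughly 11,100
    -- Average fill implied by the 11,115 pages reported on the slide
    1000000 * 81 / (11115 * 8096.0)   AS ImpliedFill;      -- roughly 0.90
```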
Index Seek – Parallel Plan
- No cost assigned to SELECT
- Index Seek I/O and CPU costs are ½ of the non-parallel plan
Index Seek with Aggregate
Index Seek Aggregate Parallel Plan Details
Table Scan Cost: 9.01 Cost: 8.26
Table Scan Details Non-parallel plan Parallel plan I/O cost same CPU cost ½ of non parallel plan
Table Scan Details Non-parallel plan Parallel plan No cost on Select No cost I/O cost same CPU cost ½ of non parallel plan
Parallel Plan Cost Formulas
Patterns
- CPU costs are ½ of the non-parallel plan
- Index Seek I/O costs are also ½
- Scan I/O cost is the same as in the non-parallel plan
- Parallel plan costs are based on 2 processors
- Actual number of processors is determined at runtime
Overhead operations
- Distribute, Repartition and Gather Streams
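These patterns can be inspected directly by comparing the estimated costs of a serial and a parallel plan for the same statement. A minimal sketch, assuming the M3A_20 test table from this deck; the query is an illustrative choice, and the second statement produces a parallel plan only if its estimated cost exceeds the cost threshold and more than one processor is available.

```sql
-- Return estimated plans instead of executing the statements
SET SHOWPLAN_ALL ON;
GO

-- Serial plan
SELECT COUNT(*), SUM(rMoney)
FROM M3A_20
WHERE GroupID = 1
OPTION (MAXDOP 1);
GO

-- Plan chosen under the server's parallelism settings
SELECT COUNT(*), SUM(rMoney)
FROM M3A_20
WHERE GroupID = 1;
GO

SET SHOWPLAN_ALL OFF;
GO
```

In the output, compare the EstimateIO, EstimateCPU and TotalSubtreeCost columns operator by operator; the Parallel column marks operators that run in parallel, and the Parallelism rows show the stream operators that add overhead.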
Hash Join Cost: 6.50 Cost: ,000 rows 15 byte OS row size
Hash Join Details Non-parallel plan Parallel plan
Hash Join Details Non-parallel plan Parallel plan
Hash Join – Non-parallel plan
Hash Join – Parallel Plan
Hash Join with I/O Cost 900,000 rows MAXDOP 1 Cost 74.1 Cost 85.1
Hash Join – Join I/O Cost 730,000 rows 740,000 rows
Hash Join - Bitmap
Hash Join Cost Formula
Index Seek – Plan Formula
  I/O Cost = per additional page (≤1GB)
           = per additional page (>1GB)
  CPU Cost = per additional row
Hash Join
  CPU Cost = base (2-30 rows) (100 rows)
             per row (parallel)
             per row per 4 bytes in OS
             per additional row in IS
  I/O Cost = per row over 64MB (Row Size+8)
             per 4 byte over 15B
Parallel Cost Formula
Base Cost
Repartition Streams
  Cost per row = Base (15 Bytes)
                 per additional 4 Bytes
Gather Streams
  Cost per row = Base (15 Bytes)
                 per additional 4 Bytes
Dispatch
Loop Join
Loop Join Details Non-parallel plan Outer Source Parallel plan Outer Source
Loop Join Details Inner Source cost identical for both non-parallel and parallel plans
Loop Join Details Non-parallel plan Parallel plan
Merge Join
Merge Join Details Non-parallel plan Parallel plan
Merge Join Details Non-parallel plan Parallel plan
Merge Join Details Non-parallel plan Parallel plan
Index Seek + Aggregate Test, Opteron 2.2GHz/1M and Xeon 2.4GHz/512K
Index Seek + Aggregate Test, Itanium 2 Itanium 2 1.5GHz/6M
Index Seek + Aggregate Test, SUM(INT) Itanium 2 1.5GHz/6M
Index Seek + Aggregate Test, NULL Itanium 2 1.5GHz/6M
Loop Join, COUNT(*) Itanium 2 1.5GHz/6M
Hash Join, COUNT(*) Itanium 2 1.5GHz/6M
Merge Join, COUNT(*) Itanium 2 1.5GHz/6M
General Recommendations
- Parallel plans are useful in DW, ETL, and maintenance activities
- Use judgment for transaction processing: is throughput more important, or faster execution of expensive queries?
- Increase Cost Threshold from its default of 5
- Limit MAXDOP to 4
- Verify or limit parallelism on Xeon systems with Hyper-Threading enabled
Additional Information
- SQL Server Quantitative Performance Analysis
- Server System Architecture
- Processor Performance
- Direct Connect Gigabit Networking
- Parallel Execution Plans
- Large Data Operations
- Transferring Statistics
- SQL Server Backup Performance with Imceda LiteSpeed