Large Data Operations Joe Chang

jchang6@yahoo.com, www.sql-server-performance.com/joe_chang.asp

Large Data Operations Overview
Updates & Deletes
  Modifying large row counts can be very slow?
  Dropping indexes improves performance?
Inserts: see SQLDev.Net; covered in various presentations by Gert Drapers

Execution Plan with Indexes
1. Multiple rows are inserted into the table (clustered index).
2. The inserted rows are spooled.
3. The nonclustered indexes are modified from the spooled data.
Operations with indexes in place should be faster than dropping and re-creating the indexes.
Exception: large inserts where the bulk log requirements are met.

Execution Plan Cost Formula Review
Table Scan or Index Scan
  I/O: 0.0375785 + 0.000740741 per page
  CPU: 0.0000785 + 0.0000011 per row
Index Seek (plan formula)
  I/O: 0.006328500 + 0.000740741 per additional page (≤1GB)
       0.003203425 + 0.000740741 per additional page (>1GB)
  CPU: 0.000079600 + 0.000001100 per additional row
Bookmark Lookup
  I/O: multiple of 0.006250000 (≤1GB)
       multiple of 0.003124925 (>1GB)
  CPU: 0.0000011 per row
Insert, Update & Delete (IUD)
  I/O: ~0.01002 – 0.01010 (>100 rows)
  CPU: 0.000001 per row
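The formulas above can be written out as a small calculator. This is a sketch only. The function names are hypothetical, and the `over_1gb` flag simply selects the slide's ">1GB" constants because the slide does not say what the 1GB threshold measures. The IUD I/O term is left out because the slide gives only an approximate range for it.

```python
# Sketch of the plan cost formulas listed above (SQL Server 2000 era constants).
# Function names are illustrative; `over_1gb` picks the slide's ">1GB" variants.

def scan_cost(pages: int, rows: int) -> tuple[float, float]:
    """Table scan or index scan: (I/O cost, CPU cost)."""
    return 0.0375785 + 0.000740741 * pages, 0.0000785 + 0.0000011 * rows


def seek_cost(extra_pages: int, extra_rows: int, over_1gb: bool = False) -> tuple[float, float]:
    """Index seek: base cost for the first page/row plus per-additional costs."""
    base_io = 0.003203425 if over_1gb else 0.006328500
    return base_io + 0.000740741 * extra_pages, 0.0000796 + 0.0000011 * extra_rows


def bookmark_lookup_cost(lookups: float, rows: int, over_1gb: bool = False) -> tuple[float, float]:
    """Bookmark lookup: I/O is a multiple of a fixed per-lookup cost."""
    per_lookup = 0.003124925 if over_1gb else 0.00625
    return per_lookup * lookups, 0.0000011 * rows


def iud_cpu_cost(rows: int) -> float:
    """Insert/update/delete CPU cost only; the ~0.0100 I/O term is not modeled."""
    return 0.000001 * rows


if __name__ == "__main__":
    # Hypothetical query: seek over 1 + 9 pages and 1,000 rows, then 1,000 lookups
    seek_io, seek_cpu = seek_cost(9, 999)
    bl_io, bl_cpu = bookmark_lookup_cost(1000, 1000)
    print(f"seek {seek_io + seek_cpu:.4f}, lookups {bl_io + bl_cpu:.4f}")
```

Even in this small hypothetical case, the per-lookup constant dominates. That is the same effect the 100K-row bookmark lookup shows later in the deck.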

Plan Cost – Unit of Measure
Time in seconds? CPU time?
  0.0062500 sec (random I/O, bookmark lookup) -> 160/sec
  0.000740741 sec (per page) -> 1,350/sec (8KB) -> 169/sec (64K) -> 10.8MB/sec
S2K BOL: Administering SQL Server, Managing Servers, Setting Configuration Options, cost threshold for parallelism Option:
  "Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration."
160 random I/Os per sec is too fast for a 7200RPM disk.
10.8MB/sec is about right for a 1997 disk's sequential transfer rate?
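The rates on this slide follow from treating one cost unit as one second, as the BOL quote states. The sketch below redoes that conversion. It assumes 8KB pages, 64KB reads for the 169/sec figure, and 1MB = 1000KB, which is the convention that makes 1,350 x 8KB come out to 10.8MB.

```python
# Convert plan-cost constants into implied rates, assuming 1 cost unit = 1 second.
PAGE_KB = 8    # SQL Server data page size
READ_KB = 64   # read size behind the "169/sec (64K)" figure


def per_second(cost_seconds: float) -> float:
    """Operations per second if one operation costs `cost_seconds`."""
    return 1.0 / cost_seconds


random_ios_per_sec = per_second(0.00625)        # bookmark lookup I/O constant
pages_per_sec = per_second(0.000740741)         # per-page scan I/O constant
reads_64k_per_sec = pages_per_sec * PAGE_KB / READ_KB
mb_per_sec = pages_per_sec * PAGE_KB / 1000     # 1MB taken as 1000KB, matching 10.8MB/sec

print(f"{random_ios_per_sec:.0f} random I/O/sec")
print(f"{pages_per_sec:.0f} pages/sec = {reads_64k_per_sec:.0f} 64K reads/sec = {mb_per_sec:.1f} MB/sec")
```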

Test Table
CREATE TABLE M3C_00 (
  ID int NOT NULL,
  ID2 int NOT NULL,
  ID3 int NOT NULL,
  ID4 int NOT NULL,
  ID5 int NOT NULL,
  ID6 int NOT NULL,
  SeqID int NOT NULL,
  DistID int NOT NULL,
  Value char(10) NOT NULL,
  rDecimal decimal (9,4) NOT NULL,
  rMoney money NOT NULL,
  rDate datetime NOT NULL,
  sDate datetime NOT NULL
)
CREATE CLUSTERED INDEX IX_M3C_00 ON M3C_00 (ID) WITH SORT_IN_TEMPDB
10M rows in table, 99 rows per page, 101,012 pages, 808MB
100K rows for each distinct value of SeqID and DistID
Common SeqID values are in adjacent rows
Common DistID values are in separate 8KB pages (100 rows apart)

Data Population Script
DECLARE @BatchStart int, @BatchEnd int, @BatchTotal int, @BatchSize int,
        @BatchRow int, @RowTotal int, @I int, @p int, @sc1 int, @dv1 int
SELECT @BatchStart = 1, @BatchEnd = 1000, @BatchTotal = 1000, @BatchSize = 10000
SELECT @RowTotal = @BatchTotal*@BatchSize, @p = 100, @sc1 = 100000
SELECT @I = (@BatchStart-1)*@BatchSize+1, @dv1 = @RowTotal/@sc1
WHILE @BatchStart <= @BatchEnd
BEGIN
  BEGIN TRANSACTION    -- one explicit transaction per batch of @BatchSize rows
  SELECT @BatchRow = @BatchStart*@BatchSize
  WHILE @I <= @BatchRow
  BEGIN
    INSERT M3C_00 (ID,ID2,ID3,ID4,ID5,ID6,SeqID,DistID,Value,rDecimal,rMoney,rDate,sDate)
    VALUES (
      @I,
      @I,
      1+(@I-1)*@p/@RowTotal+((@I-1)*@p)%@RowTotal,
      (@I-1)%(@sc1)+1,
      (@I-1)/2+1,
      (@I-1)%320+1,
      (@I-1)/@sc1+1,      -- SeqID: same value for @sc1 consecutive rows
      (@I-1)%(@dv1)+1,    -- DistID: cycles 1..@dv1 from row to row
      CHAR(65+26*rand())+CHAR(65+26*rand())+CHAR(65+26*rand())
        +CONVERT(char(6),CONVERT(int,100000*(9.0*rand()+1.0)))+CHAR(65+26*rand()),
      10000*rand(),
      10000*rand(),
      DATEADD(hour,100000*rand(),'1990-01-01'),
      DATEADD(hour,@I/5,'1990-01-01')
    )
    SET @I = @I+1
  END
  COMMIT TRANSACTION
  CHECKPOINT
  PRINT CONVERT(char,GETDATE(),121)+' row '+CONVERT(char,@BatchRow)+' Complete'
  SET @BatchStart = @BatchStart+1
END
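The script's deterministic key columns can be mirrored in a few lines to confirm the data layout described on the Test Table slide. This sketch is illustrative only. `key_columns` is a hypothetical helper, and ID3 and the rand()-based columns are left out. For these non-negative values, Python's `//` and `%` give the same results as T-SQL integer `/` and `%`.

```python
# Mirror of the script's non-random key columns (ID3 and rand() columns omitted).
ROW_TOTAL = 1000 * 10000   # @BatchTotal * @BatchSize
SC1 = 100000               # @sc1: rows per SeqID value
DV1 = ROW_TOTAL // SC1     # @dv1: number of distinct DistID values


def key_columns(i: int) -> dict:
    """Column values the script assigns to row @I = i."""
    return {
        "ID": i,
        "ID4": (i - 1) % SC1 + 1,
        "ID5": (i - 1) // 2 + 1,
        "ID6": (i - 1) % 320 + 1,
        "SeqID": (i - 1) // SC1 + 1,   # constant for 100,000 consecutive rows
        "DistID": (i - 1) % DV1 + 1,   # cycles 1..100, so equal values are 100 rows apart
    }


if __name__ == "__main__":
    for i in (1, 2, 100, 101, 100_000, 100_001, 10_000_000):
        k = key_columns(i)
        print(i, k["SeqID"], k["DistID"])
```

With 100 SeqID values and 100 DistID values over 10M rows, each value occurs on 100K rows. This matches the slide.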

Data Population Script Notes
Double WHILE loop.
Outside an explicit transaction, each INSERT/UPDATE/DELETE statement runs as its own (autocommit) transaction and gets a separate transaction log write.
An explicit transaction lets the log records for many statements be written together (max 64KB per I/O).
A single transaction for the entire loop would require an excessively large log file.
So the inserts are grouped into intermediate-size batches.
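The batching in the double loop can be traced in a short sketch. The outer loop corresponds to one BEGIN/COMMIT TRANSACTION per batch, and the inner loop inserts rows @I through @BatchRow. `batch_ranges` is a hypothetical name. Its parameters mirror @BatchStart, @BatchEnd and @BatchSize.

```python
# Trace of the script's double loop: one explicit transaction per yielded range.
def batch_ranges(batch_start: int = 1, batch_end: int = 1000, batch_size: int = 10000):
    """Yield (first_row, last_row) for each BEGIN/COMMIT TRANSACTION batch."""
    i = (batch_start - 1) * batch_size + 1          # SELECT @I = ...
    for batch in range(batch_start, batch_end + 1):
        batch_row = batch * batch_size              # SELECT @BatchRow = ...
        yield i, batch_row                          # inner loop inserts rows i..batch_row
        i = batch_row + 1


batches = list(batch_ranges())
rows = sum(last - first + 1 for first, last in batches)
print(f"{len(batches)} commits for {rows:,} rows")  # versus one commit per row in autocommit mode
```

Using batches of 10,000 rows gives 1,000 commits instead of 10M. Each transaction stays small, so the log file does not have to hold the whole load.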

Indexes
CREATE INDEX IX_M3C_01_Seq ON M3C_01 (SeqID) WITH SORT_IN_TEMPDB
CHECKPOINT
CREATE INDEX IX_M3C_01_Dist ON M3C_01 (DistID) WITH SORT_IN_TEMPDB
CHECKPOINT
UPDATE STATISTICS M3C_01 (IX_M3C_01_Seq) WITH FULLSCAN
UPDATE STATISTICS M3C_01 (IX_M3C_01_Dist) WITH FULLSCAN
Common SeqID values are in adjacent rows
Common DistID values are in separate 8KB pages (100 rows apart)

Test Queries
-- Sequential rows, table scan
SELECT AVG(rMoney) FROM M3C_01 WHERE SeqID = 91
-- Sequential rows, index seek and bookmark lookup
SELECT AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Seq)) WHERE SeqID = 91
-- Distributed rows, table scan
SELECT AVG(rMoney) FROM M3C_01 WHERE DistID = 91
-- Distributed rows, index seek and bookmark lookup
SELECT AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Dist)) WHERE DistID = 91
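The difference between "sequential" and "distributed" rows is the number of distinct data pages the 100K qualifying rows occupy. The sketch below counts those pages. It assumes exactly 99 rows per page, stored in ID (clustered key) order with no page splits, and uses the population script's SeqID and DistID formulas. The helper names are hypothetical.

```python
# Count distinct 8KB data pages holding the rows for SeqID = 91 and DistID = 91.
ROWS_PER_PAGE = 99
SC1, DV1 = 100_000, 100


def page_of(row_id: int) -> int:
    """Zero-based page number of a row, assuming full pages in ID order."""
    return (row_id - 1) // ROWS_PER_PAGE


def rows_with_seq_id(value: int) -> range:
    # SeqID = (ID-1)/100000 + 1: one contiguous block of IDs per value
    return range((value - 1) * SC1 + 1, value * SC1 + 1)


def rows_with_dist_id(value: int, row_total: int = 10_000_000) -> range:
    # DistID = (ID-1)%100 + 1: the value recurs on every 100th ID
    return range(value, row_total + 1, DV1)


seq_pages = {page_of(r) for r in rows_with_seq_id(91)}
dist_pages = {page_of(r) for r in rows_with_dist_id(91)}
print(len(seq_pages), len(dist_pages))
```

Under these assumptions, the SeqID rows fit in about 1,011 pages. The DistID rows land on 100,000 different pages, because each row is 100 rows from the next and a page holds only 99. That is why the distributed index-plus-bookmark-lookup query touches nearly the whole table.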

Execution Plans - Select
The table scan involves 101,012 pages.
The bookmark lookup involves 100,000 rows.
One bookmark lookup (BL) is ~3.6X more expensive than one page in the table scan.

Table Scan Cost Detail
Table Scan Formula
  I/O: 0.0375785 + 0.000740741 x 101,012 = 74.86
  CPU: 0.0000785 + 0.0000011 x 10M = 11.0
The displayed I/O and CPU costs occasionally show ½ the expected value, but the combined cost shows the expected value.
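The table scan arithmetic for the test table is shown below. It uses only the scan formula constants and the slide's 101,012 pages and 10M rows.

```python
# Table scan cost for the 10M-row test table (scan formula constants).
PAGES, ROWS = 101_012, 10_000_000

io_cost = 0.0375785 + 0.000740741 * PAGES
cpu_cost = 0.0000785 + 0.0000011 * ROWS
print(f"I/O {io_cost:.2f}, CPU {cpu_cost:.2f}, total {io_cost + cpu_cost:.2f}")
# I/O 74.86, CPU 11.00, total 85.86
```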

Index and Bookmark Details
Bookmark Lookup
  I/O: 0.003124925 x 100K x 0.998 = 311.87
  CPU: 0.0000011 x 100K = 0.11
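The bookmark lookup figures can be reproduced as follows. The same sketch also attempts to recover the "~3.6X" ratio from the Execution Plans slide. The basis of that ratio is an assumption: the sketch compares the total (I/O + CPU) cost per bookmark lookup with the total cost per table-scan page, and it ignores the index seek. On that basis the ratio is about 3.7, close to the slide's figure.

```python
# Bookmark lookup cost for 100,000 rows, and cost per lookup vs per scanned page.
# Assumed basis for the ratio: total (I/O + CPU) cost; index seek cost ignored.
ROWS, PAGES = 100_000, 101_012

bl_io = 0.003124925 * ROWS * 0.998      # >1GB per-lookup constant, slide's 0.998 factor
bl_cpu = 0.0000011 * ROWS
scan_total = (0.0375785 + 0.000740741 * PAGES) + (0.0000785 + 0.0000011 * 10_000_000)

per_lookup = (bl_io + bl_cpu) / ROWS
per_scan_page = scan_total / PAGES
ratio = per_lookup / per_scan_page
print(f"BL I/O {bl_io:.2f}, CPU {bl_cpu:.2f}; one lookup ~{ratio:.1f}x one scanned page")
```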

Measured Query Times
SELECT query, 100K rows      Sequential rows              Distributed rows
                             Index + BL    Table scan     Index + BL    Table scan
256MB server memory
  Query time (sec)           0.3           10.5           167           10.5
  Rows or pages/sec          333,333 (R)   9,620 (P)      599 (R)       9,620 (P)
  Disk I/O/sec               Low           ~1,200         ~600          ~1,200
  Avg. bytes/read            N/A           64K            8K            64K
1154MB server memory
  Query time (sec)           0.266         1.076          0.373         1.090
  Rows or pages/sec          376,000 (R)   93,877 (P)     268,000 (R)   92,672 (P)
(R) = rows, (P) = pages
Test system: 2 x 2.4GHz Xeon, data on two 15K disk drives
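The throughput row of this table follows from the query times. Index + BL plans process 100,000 rows and table scans read 101,012 pages. The sketch below recomputes it and converts the scan rates to MB/sec, assuming 8KB pages and 1MB = 1000KB. The dictionary layout is only for illustration.

```python
# Recompute rows/sec or pages/sec from the measured SELECT query times.
ROWS, PAGES, PAGE_KB = 100_000, 101_012, 8

times = {  # seconds, from the measured query table
    ("256MB", "index+BL", "sequential"): 0.3,
    ("256MB", "scan", "sequential"): 10.5,
    ("256MB", "index+BL", "distributed"): 167,
    ("256MB", "scan", "distributed"): 10.5,
    ("1154MB", "index+BL", "sequential"): 0.266,
    ("1154MB", "scan", "sequential"): 1.076,
    ("1154MB", "index+BL", "distributed"): 0.373,
    ("1154MB", "scan", "distributed"): 1.090,
}

for (mem, plan, layout), secs in times.items():
    work = ROWS if plan == "index+BL" else PAGES
    unit = "rows" if plan == "index+BL" else "pages"
    line = f"{mem:>6} {plan:<8} {layout:<11} {work / secs:>9,.0f} {unit}/sec"
    if plan == "scan":
        line += f" ({work * PAGE_KB / 1000 / secs:.0f} MB/sec)"
    print(line)
```

At 256MB, the 9,620 pages/sec scan rate works out to about 77MB/sec. That is consistent with ~1,200 reads/sec of 64K each.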

Disk Bound Select Query Cost
Performance is limited by disk capability:
  Random: ~300 I/O/sec per disk (small portion of an 18GB drive, high queue depth)
  Sequential: 38MB/sec (Seagate ST318451, first-generation 15K drive)
Disk drive random I/O has improved ~2X since the mid-1990s; sequential I/O ~5X.
The cost formulas therefore underestimate current-generation disk drive sequential performance relative to random.
However, the SQL Server cost formulas do not reflect in-memory costs.
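The claim that the cost formulas underestimate sequential performance can be checked by comparing two ratios: the random-to-sequential ratio built into the plan cost constants, and the ratio implied by this slide's disk figures. The sketch assumes 8KB pages, 1MB = 1000KB, that the 38MB/sec figure is per drive, and two data drives.

```python
# Random vs sequential page cost: plan cost model vs this slide's disk figures.
PAGE_KB, DISKS = 8, 2

# Cost model: seconds per random page (bookmark lookup) vs per sequential page
model_ratio = 0.00625 / 0.000740741

# Disk: 300 random I/Os/sec and 38MB/sec sequential, per drive (assumed)
random_pages_per_sec = 300
seq_pages_per_sec = 38 * 1000 / PAGE_KB
measured_ratio = seq_pages_per_sec / random_pages_per_sec

print(f"model: sequential page ~{model_ratio:.1f}x cheaper than random")
print(f"disk:  sequential page ~{measured_ratio:.1f}x cheaper than random")
print(f"{DISKS}-disk totals: {DISKS * random_pages_per_sec} random I/O/sec, {DISKS * 38} MB/sec")
```

The model treats a sequential page as roughly 8X cheaper than a random one, while these drives deliver roughly 16X. The two-drive totals (~600 random I/O/sec, ~76MB/sec) line up with the measured SELECT figures.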

Update Operation

Update Details

Actual Cost - Update
UPDATE query, 100K rows      Sequential rows           Distributed rows
                             Index      Table scan     Index      Table scan
256MB server memory
  Query time (sec)           1.3        12.6           476.6      28
  Checkpoint time (sec)      0.4        0.6            14.5       8
  Rows/sec                   57,471     7,576          203        2,778
1154MB server memory
  Query time (sec)           0.8        1.3            0.9        1.5
  Checkpoint time (sec)      0.2        0.1            23         23
  Rows/sec                   100,000    71,429         4,184      4,082
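The Rows/sec line appears to count checkpoint time as well as query time. The sketch tests that reading: it divides 100,000 rows by (query time + checkpoint time) and compares the result with the reported figure. One value is assumed: the transcript lists the 1154MB distributed checkpoint time (23 sec) only once, so the sketch applies it to both distributed columns.

```python
# Test the reading: Rows/sec = 100,000 / (query time + checkpoint time).
ROWS = 100_000
update_runs = {  # (query sec, checkpoint sec, reported rows/sec)
    "256MB seq index": (1.3, 0.4, 57_471),
    "256MB seq scan": (12.6, 0.6, 7_576),
    "256MB dist index": (476.6, 14.5, 203),
    "256MB dist scan": (28, 8, 2_778),
    "1154MB seq index": (0.8, 0.2, 100_000),
    "1154MB seq scan": (1.3, 0.1, 71_429),
    "1154MB dist index": (0.9, 23, 4_184),   # 23 sec checkpoint assumed for both
    "1154MB dist scan": (1.5, 23, 4_082),    # distributed columns
}

for name, (query, checkpoint, reported) in update_runs.items():
    computed = ROWS / (query + checkpoint)
    print(f"{name:<18} computed {computed:>9,.0f}  reported {reported:>9,}")
```

All eight cases agree within about 3%, and most agree exactly. The small gaps are consistent with rounded times.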

Update Variation
The default plan is now a table scan.
The column value is not in the index, so a bookmark lookup is required.
However, the data page must be loaded into the buffer cache before it can be modified, regardless of the plan!

Delete Operation

Delete Details

Delete Details (2)

Delete - Actual Costs
Delete query, 100K rows      Sequential rows           Distributed rows
                             Index      Table scan     Index      Table scan
256MB server memory
  Query time (sec)           4.8        88.5           228        241
  Checkpoint time (sec)      8.4        4.5            28.4       14
  Rows/sec                   7,576      1,075          340        1,800
1154MB server memory
  Query time (sec)           4.1        6.4            4.2        5.3
  Checkpoint time (sec)      3.7        3.9            28.6       28.6
  Rows/sec                   12,821     9,708          3,048      2,949

Delete – No Indexes
Delete query, no index, 100K rows   Sequential rows   Distributed rows
                                    Table scan        Table scan
256MB server memory
  Query time (sec)                  11.5              26
  Checkpoint time (sec)             0.1               4
  Rows/sec                          8,621             3,300
1154MB server memory
  Query time (sec)                  1.9               1.5
  Checkpoint time (sec)             0.2               22
  Rows/sec                          47,619            4,255
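The effect of the nonclustered indexes on DELETE throughput can be read off two tables: the Delete - Actual Costs table (index plan, 1154MB server memory) and the no-index table. The sketch below compares them. These are the reported Rows/sec figures, which include checkpoint time. The comparison itself is an illustration, not a result stated in the deck.

```python
# DELETE rows/sec at 1154MB: with nonclustered indexes (index plan) vs no indexes.
with_indexes = {"sequential": 12_821, "distributed": 3_048}
no_indexes = {"sequential": 47_619, "distributed": 4_255}

for layout in with_indexes:
    speedup = no_indexes[layout] / with_indexes[layout]
    print(f"{layout:<11} {with_indexes[layout]:>7,} -> {no_indexes[layout]:>7,} rows/sec ({speedup:.1f}x)")
```

On these figures, the gain is about 3.7X for sequential rows. For distributed rows it is only about 1.4X, where checkpoint time dominates either way.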

Delete with Foreign Keys

Summary
When large updates and deletes are slow:
  Examine the execution plan.
  Look for nonclustered index seeks on modified tables with a high row count.
  Use an index hint to force a table scan.

Additional Information
www.sql-server-performance.com/joe_chang.asp
  SQL Server Quantitative Performance Analysis
  Server System Architecture
  Processor Performance
  Direct Connect Gigabit Networking
  Parallel Execution Plans
  Large Data Operations
  Transferring Statistics
  SQL Server Backup Performance with Imceda LiteSpeed
jchang6@yahoo.com
