Large Data Operations Joe Chang
Large Data Operations Overview
  Updates & Deletes
    Modifying large row counts can be very slow?
    Dropping indexes improves performance?
  Inserts: see SQLDev.Net
    Covered in various presentations by Gert Drapers
Execution Plan with Indexes
  1. Insert multiple rows into table with clustered index
  2. Rows are spooled
  3. Nonclustered indexes are modified from the spooled data
  Operations with indexes in place should be faster
  Exception: large inserts where bulk log requirements are met
Execution Plan Cost Formula Review
  Table Scan or Index Scan
    I/O: per page
    CPU: per row
  Index Seek – Plan Formula
    I/O Cost = per additional page (≤1GB)
             = per additional page (>1GB)
    CPU Cost = per additional row
  Bookmark Lookup
    I/O Cost = multiple of (≤1GB)
             = multiple of (>1GB)
    CPU Cost = per row
  Insert, Update & Delete
    IUD I/O Cost ~ – (>100 rows)
    IUD CPU Cost = per row
Plan Cost – Unit of Measure
  Time in seconds? CPU time?
  sec -> 160/sec
  >1350/sec (8KB) -> 169/sec (64K) -> 10.8MB/sec
  S2K BOL: Administering SQL Server, Managing Servers, Setting Configuration Options: cost threshold for parallelism Option
    "Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration."
  Too fast for 7200RPM disk random I/Os.
  About right for 1997 sequential disk transfer rate?
Test Table
  CREATE TABLE M3C_00 (
    ID int NOT NULL,
    ID2 int NOT NULL,
    ID3 int NOT NULL,
    ID4 int NOT NULL,
    ID5 int NOT NULL,
    ID6 int NOT NULL,
    SeqID int NOT NULL,
    DistID int NOT NULL,
    Value char(10) NOT NULL,
    rDecimal decimal (9,4) NOT NULL,
    rMoney money NOT NULL,
    rDate datetime NOT NULL,
    sDate datetime NOT NULL )
  CREATE CLUSTERED INDEX IX_M3C_00 ON M3C_00 (ID) WITH SORT_IN_TEMPDB

  10M rows in table, 99 rows per page, 101,012 pages, 808MB
  100K rows for each distinct value of SeqID and DistID
  Common SeqID values are in adjacent rows
  Common DistID values are in separate 8KB pages (100 rows apart)
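The size figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes 8KB pages and 1MB = 1,000KB, which is the convention that reproduces the slide's 808MB; it is a sanity check, not part of the original test script.

```sql
-- Cross-check of the Test Table size figures.
-- Assumptions: 8KB data pages, 1MB counted as 1,000KB.
SELECT
    10000000.0 / 101012   AS RowsPerPage,   -- about 99 rows per page
    101012 * 8 / 1000.0   AS TableSizeMB    -- about 808 MB

-- On a live system the actual row and page counts can be read with:
EXEC sp_spaceused 'M3C_00'
```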
Data Population Script
  int = = = = = = =
  BEGIN
    BEGIN TRANSACTION
    =
    BEGIN
      INSERT M3C_00 (ID,ID2,ID3,ID4,ID5,ID6,SeqID,DistID,Value,rDecimal,rMoney,rDate,sDate)
      CHAR(65+26*rand())+CHAR(65+26*rand())+CHAR(65+26*rand())
        +CONVERT(char(6),CONVERT(int,100000*(9.0*rand()+1.0)))+CHAR(65+26*rand()),
      10000*rand(),
      10000*rand(),
      DATEADD(hour,100000*rand(),' '),
      )
    END
    COMMIT TRANSACTION
    CHECKPOINT
    PRINT CONVERT(char,GETDATE(),121)+' row ' + Complete'
  END
Data Population Script Notes
  Double While Loop
  Each Insert/Update/Delete statement run on its own is an implicit transaction
    Gets a separate transaction log entry
  Explicit transaction: generates a single transaction log write (max 64KB per IO)
  A single TRAN for the entire loop requires an excessively large log file
  Inserts are therefore grouped into intermediate-size batches
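The double-loop batching pattern described in these notes can be sketched as follows. This is an illustration only, not the original population script: the table BatchDemo, the variables @BatchID, @RowID, @BatchCount and @BatchSize, and the batch sizes are all hypothetical.

```sql
-- Hypothetical demo table; stands in for the real test table.
CREATE TABLE BatchDemo (ID int NOT NULL, Value char(10) NOT NULL)

-- Hypothetical loop counters and batch sizes.
DECLARE @BatchID int, @RowID int, @BatchCount int, @BatchSize int
SELECT @BatchID = 0, @RowID = 0, @BatchCount = 10, @BatchSize = 1000

WHILE @BatchID < @BatchCount                       -- outer loop: one transaction per batch
BEGIN
    BEGIN TRANSACTION
    WHILE @RowID < (@BatchID + 1) * @BatchSize     -- inner loop: rows within the batch
    BEGIN
        INSERT BatchDemo (ID, Value) VALUES (@RowID, 'row')
        SET @RowID = @RowID + 1
    END
    COMMIT TRANSACTION                             -- the whole batch commits together
    CHECKPOINT
    SET @BatchID = @BatchID + 1
    PRINT CONVERT(char(23), GETDATE(), 121) + ' batch '
        + CONVERT(varchar(10), @BatchID) + ' complete'
END
```

The batch size is the tuning knob: one row per transaction maximizes log writes, while a single transaction for all rows maximizes log file growth, so an intermediate value is a compromise between the two.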
Indexes
  CREATE INDEX IX_M3C_01_Seq ON M3C_01 (SeqID) WITH SORT_IN_TEMPDB
  CHECKPOINT
  CREATE INDEX IX_M3C_01_Dist ON M3C_01 (DistID) WITH SORT_IN_TEMPDB
  CHECKPOINT
  UPDATE STATISTICS M3C_01 (IX_M3C_01_Seq) WITH FULLSCAN
  UPDATE STATISTICS M3C_01 (IX_M3C_01_Dist) WITH FULLSCAN

  Common SeqID values are in adjacent rows
  Common DistID values are in separate 8KB pages (100 rows apart)
Test Queries
  -- Sequential rows, table scan
  SELECT AVG(rMoney) FROM M3C_01 WHERE SeqID =
  -- Sequential rows, index seek and bookmark lookup
  SELECT AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Seq)) WHERE SeqID =
  -- Distributed rows, table scan
  SELECT AVG(rMoney) FROM M3C_01 WHERE DistID =
  -- Distributed rows, index seek and bookmark lookup
  SELECT AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Dist)) WHERE DistID = 91
Execution Plans - Select
  Table scan involves 101,012 pages
  Bookmark Lookup involves 100,000 rows
  1 BL ~3.6X more expensive than 1 page in Table Scan
Table Scan Cost Detail
  Table Scan Formula
    I/O: x 101,012 = 74.8
    CPU: x 10M = 11.0
  I/O and CPU cost occasionally show ½ the expected value, but combined cost shows the expected value
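The totals on this slide imply the per-page and per-row unit costs by simple division. The sketch below derives them from the slide's own figures (74.8 over 101,012 pages, 11.0 over 10M rows); the results are back-calculated and rounded, not quoted from the optimizer.

```sql
-- Unit costs implied by the table scan totals above.
SELECT
    74.8 / 101012     AS IoCostPerPage,   -- roughly 0.00074 per page
    11.0 / 10000000   AS CpuCostPerRow    -- 0.0000011 per row
```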
Index and Bookmark Details
  Bookmark Lookup
    I/O: x 100K x 0.998 =
    CPU: x 100K = 0.11
Measured Query Times
SELECT query, 100K rows

                         Sequential rows            Distributed rows
                         Index + BL   Table Scan    Index + BL   Table Scan
256M server mem
  Query time (sec)
  Rows or Pages/sec      333,333 (R)  9,620 (P)     599 (R)      9,620 (P)
  Disk IO/sec            Low          ~1,200        ~600         ~1,200
  Avg. Byte/Read         N/A          64K           8K           64K
1154MB server mem
  Query time (sec)
  Rows or Pages/sec      376,000      93,877        268,000      92,672

Test System: 2x2.4GHz Xeon, data on 2 15K disk drives
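The 256M throughput figures imply approximate elapsed times by division: rows over rows/sec for the index plans, pages over pages/sec for the table scan. The sketch below is a derived estimate from the rates on this slide, assuming 100,000 rows and 101,012 pages per query; the values are not measured timings.

```sql
-- Elapsed times implied by the 256M-memory throughput figures (derived, not measured).
SELECT
    100000 / 333333.0   AS SeqIndexBLSec,    -- about 0.3 sec
    101012 / 9620.0     AS TableScanSec,     -- about 10.5 sec
    100000 / 599.0      AS DistIndexBLSec    -- about 167 sec
```

On these numbers the distributed index seek plus bookmark lookup is roughly 16 times slower than simply scanning the whole table when the data is not in memory.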
Disk Bound Select Query Cost
  Performance limited by disk capability
    Random: 300/disk (small portion of 18GB drive & high queue depth)
    Sequential: 38MB/sec (Seagate ST318451, first generation 15K drive)
  Disk drive random I/O: ~2X gain since mid-1990s
  Sequential I/O: ~5X
  Cost formulas underestimate current generation disk drive sequential performance relative to random
  However, SQL Server cost formulas do not reflect in-memory costs
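The per-disk figures on this slide line up with the measured 256M-memory rates (9,620 pages/sec for the scan, 599 rows/sec for the distributed lookup) when divided across the two data drives. A rough check, assuming 8KB pages, 1MB = 1,000KB, and one random I/O per distributed row:

```sql
-- Per-disk rates implied by the measured throughput on a 2-disk system.
SELECT
    9620 * 8 / 1000.0       AS ScanMBPerSec,          -- about 77 MB/sec in total
    9620 * 8 / 1000.0 / 2   AS ScanMBPerSecPerDisk,   -- about 38 MB/sec per disk
    599 / 2.0               AS RandomIOPerSecPerDisk  -- about 300 per disk
```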
Update Operation
Update Details
Actual Cost - Update
UPDATE query, 100K rows

                           Sequential rows         Distributed rows
                           Index      Table Scan   Index      Table Scan
256M server mem
  Query time (sec)
  Checkpoint time (sec)
  Rows/sec                 57,471     7,
MB server mem
  Query time (sec)
  Checkpoint time (sec)
  Rows/sec                 100,000    71,429       4,184      4,082
Update Variation
  Default plan is now a table scan
  Column value is not in the index, so a bookmark lookup is required
  However, the data page must be loaded into the buffer cache before it can be modified regardless
Delete Operation
Delete Details
Delete Details (2)
Delete - Actual Costs
DELETE query, 100K rows

                           Sequential rows         Distributed rows
                           Index      Table Scan   Index      Table Scan
256M server mem
  Query time (sec)
  Checkpoint time (sec)
  Rows/sec                 7,576      1,
MB server mem
  Query time (sec)
  Checkpoint time (sec)
  Rows/sec                 12,821     9,708        3,048      2,949
Delete – no indexes
DELETE query, no index, 100K rows (table scan); values listed as sequential rows, then distributed rows

256M server mem
  Query time (sec):
  Checkpoint time (sec): 0.14
  Rows/sec: 8,621, 3,
MB server mem
  Query time (sec):
  Checkpoint time (sec): 0.222
  Rows/sec: 47,619, 4,255
Delete with Foreign Keys
Summary
  When large updates and deletes are slow:
    Examine the execution plan
    Look for nonclustered index seeks on modified tables with high row count
    Use an index hint to force a table scan
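A minimal sketch of the index-hint approach, using the test table and the DistID = 91 predicate from the test queries. The SET expression is a hypothetical modification chosen only for illustration. INDEX(0) asks for a scan of the base table: a table scan on a heap, or a clustered index scan if the table has a clustered index. Whether this beats the default plan depends on row count and available memory, so compare both plans before adopting the hint.

```sql
-- Force a scan of the base table instead of a nonclustered index seek.
UPDATE M3C_01
SET rMoney = rMoney + 1            -- hypothetical change, for illustration only
FROM M3C_01 WITH (INDEX(0))
WHERE DistID = 91

DELETE M3C_01
FROM M3C_01 WITH (INDEX(0))
WHERE DistID = 91
```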
Additional Information
  SQL Server Quantitative Performance Analysis
  Server System Architecture
  Processor Performance
  Direct Connect Gigabit Networking
  Parallel Execution Plans
  Large Data Operations
  Transferring Statistics
  SQL Server Backup Performance with Imceda LiteSpeed