Proactive Index Design Using QUBE Lauri Pietarinen Courtesy of Tapio Lahdenmäki November 2010 IDUG 2010.

QUBE: Quick Upper-Bound Estimate
A simple formula for estimating CPU and elapsed time for queries, created by Tapio Lahdenmäki and others at IBM-Finland.

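Index Basics
Query:
  SELECT INO, CNAME
  FROM INVOICE
  WHERE CNO = :CNO AND IDATE > :IDATE
Table INVOICE: 1,000,000 invoices. Per customer: max 10,000 invoices, max 300 recent invoices.
Alternatives numbered 1 to 4 in the figure: the INVOICE table itself and the indexes (CNO), (CNO, IDATE), and (CNO, IDATE, INO, CNAME).
Touch counts shown in the figure: 1,000,000 T, 10,000 T, 300 T (T = Touch).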

B-Tree Index
[Figure: B-tree index on COL with leaf pages and non-leaf pages pointing into TABLE; example predicates WHERE COL = 12 and WHERE COL BETWEEN 2 AND 10.]
Non-leaf levels continue until the last level is a single page.
Normally 3 levels if the table has 1,000,000 rows.
The number of non-leaf pages is much lower than the number of leaf pages.
Non-leaf pages tend to stay in the pool with current hardware.
Reasonable (2010): ignore the cost of non-leaf page processing.

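Recommended Mental Image
[Figure: index on COL shown as a sorted list (1, 3, 7, 8, 12, 20, 21) next to TABLE, with touches marked T on the index rows and table rows.]
COL is an M column (matching column).
The predicate COL BETWEEN 2 AND 10 defines an index slice (matching predicate).
T = Touch
TR = Number of random touches
TS = Number of sequential touches
Exercise for the pictured slice: TR = ?  TS = ?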

Request Tracking
Insurance company.
[Figure: screen listing requests for BO = 15 with columns Req, DEADLINE, STATUS, RPK, CNO, ...; sample deadlines 3.6.2005 and 4.6.2005; 20 rows per screen. Other figure annotations: 10s; DL = 31.12.2099; BO = Latest BO; 99%; Customers.]
1,000 requests per day. Average: 5 STATUS changes per request.
Primary key of REQUEST = RPK; foreign keys CNO and BO.
BO = Branch office (100 branch offices; the largest one covers 3% of requests).
STATUS: 1...9 (9 = Closed).
DL = Deadline.

Common Transaction
Table REQUEST: 1,000,000 rows, average length 300 bytes. Indexes: RPK, CNO, BO.
  SELECT DL, STATUS, RPK, CNO, C1, C2
  FROM REQUEST
  WHERE STATUS < 9      (FF = 1%)
    AND BO = :BO        (FF = 0...3%)
  ORDER BY DL
FF = Filter Factor: 0...100%

Which One Faster?
Table REQUEST: 1,000,000 rows. Indexes: RPK, CNO, BO.
Alternative 1: read 1,000,000 table rows; pick rows that satisfy both predicates; sort 300 result rows.
Alternative 2: read the index slice on BO (3%); read 30,000 table rows; pick rows that satisfy STATUS < 9; sort 300 result rows.

Sequential Read in 2010
[Figure: Processor, CPU cache, RAM, buffer pool, read cache.]
I/O time:
  DBMS and the disk subsystem read ahead: lots of pages with one rotation.
  Not all pages at once, just trying to stay ahead: when the program needs a page it should be in the buffer pool.
  If sequential read speed is 40 MB/s, I/O time per 4K page is 0.1 ms; if 10 rows per page, I/O time per row = 10 us (microseconds).
CPU time:
  Rule of thumb: CPU time per examined row = 5 us with sequential read.
  FETCH (move qualifying row to application pgm) may take 50 us of CPU time.
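The per-page and per-row I/O figures on this slide follow from simple division. A minimal sketch of that arithmetic, assuming the slide's round numbers (40 MB/s, a "4K" page taken as 4,000 bytes, 10 rows per page); the function and constant names are illustrative, not part of the presentation:

```python
# Assumed inputs, taken from the slide's example figures.
SEQ_READ_BYTES_PER_S = 40_000_000  # 40 MB/s sequential read speed
PAGE_BYTES = 4_000                 # "4K" page, rounded to 4,000 bytes
ROWS_PER_PAGE = 10


def sequential_io_ms_per_page(page_bytes=PAGE_BYTES, speed=SEQ_READ_BYTES_PER_S):
    """I/O time in milliseconds to read one page sequentially."""
    return page_bytes / speed * 1000.0


def sequential_io_us_per_row(rows_per_page=ROWS_PER_PAGE):
    """I/O time in microseconds attributed to one row on a sequentially read page."""
    return sequential_io_ms_per_page() / rows_per_page * 1000.0


print(f"per page: {sequential_io_ms_per_page():.2f} ms")
print(f"per row:  {sequential_io_us_per_row():.0f} us")
```

With these inputs the results are about 0.1 ms per page and about 10 us per row, which is where the TS x 0.01 ms term of the elapsed-time estimate comes from.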

Random Read in 2010
[Figure: Processor, CPU cache, RAM, buffer pool, read cache; figure label: Serious.]
Disk I/O time:
  If the needed page is not in the pool: disk read.
  If the needed page is in the read cache: I/O time may be 1 ms.
  Random read from disk drive may take 10 ms.
CPU time:
  Retrieving a row and evaluating it may take 50 us of CPU time (random read).
  FETCH of one row may take 50 us of CPU time, as with sequential read.

Random Read from Disk Drive
  Queuing (Q)       3 ms   (depends on drive busy)
  Seek              4 ms
  Half a rotation   2 ms
  Transfer          1 ms
  Total I/O time   10 ms
Q = (u / (1 - u)) x S
  Q = Average queuing time
  u = Average drive busy
  S = Average service time
One random read keeps a drive busy for 6 ms (S = 6 ms).
Example: 50 random reads a second
  u = 50 read/s x 0.006 s/read = 0.3
  Q = (0.3 / (1 - 0.3)) x 6 ms = 3 ms
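The queuing formula is easy to evaluate for other drive loads. A minimal sketch, assuming the slide's service time of 6 ms per random read; the function names are illustrative:

```python
def drive_busy(reads_per_s, service_ms):
    """Average drive utilization u for a given random-read rate."""
    return reads_per_s * service_ms / 1000.0


def queuing_time_ms(u, service_ms):
    """Average queuing time Q = (u / (1 - u)) x S."""
    if not 0.0 <= u < 1.0:
        raise ValueError("drive busy must be in [0, 1)")
    return u / (1.0 - u) * service_ms


SERVICE_MS = 6.0  # one random read keeps the drive busy for 6 ms (from the slide)

u = drive_busy(50, SERVICE_MS)
q = queuing_time_ms(u, SERVICE_MS)
total_ms = q + 4.0 + 2.0 + 1.0  # queuing + seek + half a rotation + transfer

print(f"u = {u:.2f}, Q = {q:.1f} ms, total random read = {total_ms:.1f} ms")
```

For 50 reads a second this gives Q of about 2.6 ms, which the slide rounds to 3 ms, and a total close to the 10 ms rule of thumb. Because of the 1 - u denominator, Q grows quickly as the drive gets busier.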

Disk Drives: the Bottleneck
  Year   Drive capacity
  1992   3 GB
  2002   72 GB
  2003   145 GB
  2005   300 GB
  2007   1 TB
  2009   2 TB
Storage density grows dramatically.
Sequential I/O is getting faster.
Random I/O remains slow (and may even become slower):
  u             30% to 60%
  Q             3 ms to 9 ms
  Random read   10 ms to 16 ms
Q = (u / (1 - u)) x S, where Q = Average queuing time, u = Average drive busy, S = Average service time.

Quick Upper-Bound Estimate (QUBE)
  ET  = TR x 10 ms + TS x 0.01 ms + F x 0.1 ms
  CPU = TR x 50 us + TS x 5 us + F x 50 us
ET = Elapsed time (SQL)
CPU = CPU time (SQL)
TR = Number of random touches
TS = Number of sequential touches
F = Number of rows returned to program (Fetches)
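The two QUBE formulas translate directly into a small helper. This is a sketch only; the function name, the `Qube` tuple and the choice of milliseconds as the unit are assumptions made for illustration:

```python
from typing import NamedTuple


class Qube(NamedTuple):
    et_ms: float   # estimated elapsed time in milliseconds
    cpu_ms: float  # estimated CPU time in milliseconds


def qube(tr, ts, f):
    """Quick Upper-Bound Estimate from touch and fetch counts.

    tr: number of random touches
    ts: number of sequential touches
    f:  number of rows returned to the program (fetches)
    """
    et_ms = tr * 10.0 + ts * 0.01 + f * 0.1
    cpu_us = tr * 50.0 + ts * 5.0 + f * 50.0
    return Qube(et_ms, cpu_us / 1000.0)


one_random = qube(tr=1, ts=0, f=0)
thousand_sequential = qube(tr=0, ts=1000, f=0)
print(one_random, thousand_sequential)
```

The comparison at the end shows the key ratio built into the formula: in elapsed time, one random touch costs about as much as 1,000 sequential touches.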

Alternative 1
REQUEST: 1,000,000 rows. Worst input: F = 300. 1 M = 1,000,000.
           TR    TS
  Index     -     -
  Table     1    1 M
ET  = (TR + TS/1000 + F/100) x 10 ms = (1 + 1,000 + 3) x 10 ms = 10 s
CPU = (TR + TS/10 + F) ms / 20 = (1 + 100,000 + 300) ms / 20 = 5 s

Alternative 2A
REQUEST: 1,000,000 rows. Indexes: RPK, CNO, BO. Index slice on BO: 3%. Worst input: F = 300.
           TR       TS
  Index    1        30,000
  Table    30,000   -
ET  = (TR + TS/1000 + F/100) x 10 ms = (30,001 + 30 + 3) x 10 ms = 300 s
CPU = (TR + TS/10 + F) ms / 20 = (30,001 + 3,000 + 300) ms / 20 = 2 s

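Alternative 2B
REQUEST: 1,000,000 rows. Indexes: RPK, CNO, BO (C = clustering index). Index slice on BO: 3%; table slice: 3%. Worst input: F = 300.
           TR    TS
  Index    1     30,000
  Table    1     30,000
ET  = (TR + TS/1000 + F/100) x 10 ms = (2 + 60 + 3) x 10 ms = 1 s
CPU = (TR + TS/10 + F) ms / 20 = (2 + 6,000 + 300) ms / 20 = 0.3 s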
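The three worksheets so far can be checked side by side. A minimal sketch, assuming the touch counts from the slides (worst input, F = 300); the labels in the dictionary and the helper name are illustrative:

```python
def qube(tr, ts, f):
    """Return (elapsed_s, cpu_s) using the QUBE coefficients from the slides."""
    et_ms = tr * 10.0 + ts * 0.01 + f * 0.1
    cpu_us = tr * 50.0 + ts * 5.0 + f * 50.0
    return et_ms / 1000.0, cpu_us / 1_000_000.0


F = 300  # worst input: 300 result rows

# (total TR, total TS) per alternative, index touches plus table touches.
ALTERNATIVES = {
    "1: read the whole table": (1, 1_000_000),
    "2A: index BO, table rows scattered": (1 + 30_000, 30_000),
    "2B: index BO is the clustering index": (1 + 1, 30_000 + 30_000),
}

RESULTS = {name: qube(tr, ts, F) for name, (tr, ts) in ALTERNATIVES.items()}

for name, (et_s, cpu_s) in RESULTS.items():
    print(f"{name:38s} ET = {et_s:7.2f} s   CPU = {cpu_s:5.2f} s")
```

The unrounded estimates are roughly 10 s, 300 s and 0.65 s of elapsed time. The non-clustered index on BO is therefore about 30 times slower than reading the whole table, because every one of its 30,000 table touches is random.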

Basic Question
All predicate columns in one index? If yes, the index is semi-fat: the table is touched only when the WHERE clause is true.
  WHERE STATUS < 9 AND BO = :BO
[Figure: index (STATUS, BO) on REQUEST. STATUS is the matching column, BO the screening column; table touches (T) only for qualifying index rows.]
STATUS < 9 defines the index slice.
BO = :BO is evaluated in the index.

Semi-Fat Index
[Figure: index (BO, STATUS) on REQUEST; both columns are matching columns.]
STATUS < 9 AND BO = 2 defines the index slice.
The index slice contains only qualifying index rows.

QUBE for Semi-Fat Index: Your Turn!
  SELECT DL, STATUS, RPK, CNO, C1, C2
  FROM REQUEST
  WHERE STATUS < 9      (FF = 1%, M)
    AND BO = :BO        (FF = 3%, M)
  ORDER BY DL
Index: BO, STATUS (both columns matching; combined FF = 0.03%).
MC = 2, SC = 0, IXONLY = N, SORT = Y. F = 300.
           TR    TS
  Index
  Table
ET  = (TR + TS/1000 + F/100) x 10 ms =
CPU = (TR + TS/10 + F) ms / 20 =

QUBE for Semi-Fat Index: Solution
  SELECT DL, STATUS, RPK, CNO, C1, C2
  FROM REQUEST
  WHERE STATUS < 9      (FF = 1%, M)
    AND BO = :BO        (FF = 3%, M)
  ORDER BY DL
Index: BO, STATUS (both columns matching; combined FF = 0.03%).
MC = 2, SC = 0, IXONLY = N, SORT = Y. F = 300.
           TR    TS
  Index    1     300
  Table    300   -
ET  = (TR + TS/1000 + F/100) x 10 ms = (301 + 0 + 3) x 10 ms = 3 s
CPU = (TR + TS/10 + F) ms / 20 = (301 + 30 + 300) ms / 20 = 30 ms
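Written out without the slide's rounding, the solution for the semi-fat index (TR = 301, TS = 300, F = 300) looks as follows. This is only a restatement of the worksheet arithmetic in LaTeX:

```latex
\begin{align*}
ET  &= \left(TR + \frac{TS}{1000} + \frac{F}{100}\right) \times 10\ \text{ms}
     = (301 + 0.3 + 3) \times 10\ \text{ms}
     = 3043\ \text{ms} \approx 3\ \text{s} \\
CPU &= \left(TR + \frac{TS}{10} + F\right) \times \frac{1\ \text{ms}}{20}
     = (301 + 30 + 300) \times 0.05\ \text{ms}
     = 31.55\ \text{ms} \approx 30\ \text{ms}
\end{align*}
```

Almost all of the elapsed time comes from the TR term, that is, from the 300 random table touches.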

Still Too Long: What Next?
The problem: 300 random table touches. 300 x 10 ms = 3 s.
Options:
  Fat index: no table touches.
  20 FETCHes: 20 table touches?

When Do Touches Take Place?
  DECLARE CURSOR...
  OPEN CURSOR
  FETCH CURSOR   (while found)
  CLOSE CURSOR
Either at FETCH (one result row at a time) or at OPEN CURSOR (all result rows):
  Access path without sort: ?
  Access path with sort: ?
Sort is very fast today (say, 10 us CPU per row) but...

No Sort, 20 FETCHes
  SELECT DL, STATUS, RPK, CNO, C1, C2
  FROM REQUEST
  WHERE STATUS < 9      (FF = 1%)
    AND BO = :BO        (FF = 3%)
  ORDER BY DL, RPK
  FETCH FIRST 20 ROWS ONLY
Index: BO, DL, RPK, STATUS.
MC = 1, SC = 1, IXONLY = N, SORT = N. F = 20 (first screen).
           TR    TS
  Index    1     19
  Table    20    -
ET  = (TR + TS/1000 + F/100) x 10 ms = (21 + 0 + 0) x 10 ms = 210 ms
CPU = (TR + TS/10 + F) ms / 20 = (21 + 2 + 20) ms / 20 = 2 ms

Worst-Input Estimates
  Index                  Elapsed time (ET)   CPU time   Note
  No index               10 s                5 s
  BO (non-C)             300 s               2 s
  BO (C)                 1 s                 0.3 s
  BO, STATUS             3 s                 0.03 s     Semi-fat
  BO, DL, RPK, STATUS    0.2 s               0.002 s    No sort; first screen; modify pgm

Fat Index with Sort
  SELECT DL, STATUS, RPK, CNO, C1, C2
  FROM REQUEST
  WHERE STATUS < 9      (FF = 1%)
    AND BO = :BO        (FF = 3%)
  ORDER BY DL
Index: BO, STATUS, RPK, DL, CNO, C1, C2.
MC = 2, SC = 0, IXONLY = Y, SORT = Y. F = 300.
           TR    TS
  Index    1     300
  Table    -     -
ET  = (TR + TS/1000 + F/100) x 10 ms = (1 + 0 + 3) x 10 ms = 40 ms
CPU = (TR + TS/10 + F) ms / 20 = (1 + 30 + 300) ms / 20 = 20 ms
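The effect of making the index fat can be isolated by toggling index-only access for the same 300-row slice. A sketch under the worksheet's counting convention (index: 1 random touch plus one sequential touch per slice row; table: one random touch per slice row unless the access is index-only). The function name and parameters are assumptions for illustration:

```python
def thin_slice_qube(slice_rows, fetched, index_only):
    """Return (et_ms, cpu_ms) for reading one index slice.

    slice_rows: index rows in the slice, all of which qualify
    fetched:    rows returned to the program
    index_only: True if every selected column is in the index (fat index)
    """
    index_tr, index_ts = 1, slice_rows
    table_tr = 0 if index_only else slice_rows  # random table touches
    tr = index_tr + table_tr
    ts = index_ts
    et_ms = tr * 10.0 + ts * 0.01 + fetched * 0.1
    cpu_ms = (tr * 50.0 + ts * 5.0 + fetched * 50.0) / 1000.0
    return et_ms, cpu_ms


semi_fat = thin_slice_qube(300, 300, index_only=False)  # index BO, STATUS
fat = thin_slice_qube(300, 300, index_only=True)        # all seven columns in index

print(f"semi-fat: ET = {semi_fat[0]:.0f} ms, CPU = {semi_fat[1]:.1f} ms")
print(f"fat:      ET = {fat[0]:.0f} ms, CPU = {fat[1]:.1f} ms")
```

Removing the 300 random table touches brings the estimate from about 3 s down to roughly 40 ms; the sort of 300 rows is not counted in either figure.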

Worst-Input Estimates
  Index                  Elapsed time (ET)   CPU time   Note
  No index               10 s                5 s
  BO (non-C)             300 s               2 s
  BO (C)                 1 s                 0.3 s
  BO, STATUS             3 s                 0.03 s     Semi-fat
  BO, DL, RPK, STATUS    0.2 s               0.002 s    No sort; first screen; modify pgm
  BO, STATUS, RPK...     0.04 s              0.02 s     Fat

Too Expensive?
Replace index BO with BO, STATUS, RPK, DL, CNO, C1, C2?
Possible costs:
  Disk space
  RAM (non-leaf pages)
  INSERT, UPDATE, DELETE
  Index reorg (rebuild)
  Something else?
Volumes: 1,000 new rows per day; 5,000 STATUS updates per day; 1,000,000 rows; 80 bytes per row.

The Cost of Adding an Index
Assumptions: upper index levels in DB cache; leaf page not in DB cache.
  INSERT: roughly 10 ms per added row. Add 10 ms if split.
  DELETE: roughly 10 ms per removed row.
  UPDATE: roughly 10 ms per added row when columns of the new index are updated. Add 10 ms if move. Add 10 ms if split.
Writes are asynchronous, but drive busy goes up.

The Cost of Adding an Index Column
  INSERT: none if adequate distributed free space (!)
  DELETE: none (!)
  UPDATE: roughly 10 ms when only the new column is updated. Add 10 ms if move. Add 10 ms if split.
I/O moves the whole page, not a row.

So, Too Expensive?
No. The cost is low compared to the dramatic reduction in response time and cost of SELECT.
Index BO becomes BO, STATUS, RPK, DL, CNO, C1, C2.
Costs considered: disk space; RAM (non-leaf pages); INSERT, UPDATE, DELETE; index reorg (rebuild); something else?
Volumes: 1,000 new rows per day; 5,000 STATUS updates per day; 1,000,000 rows; 80 bytes per row.
The only issue: index slice read time increases if the index reorg interval is too long.
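The volumes on this slide can be turned into rough daily figures. A back-of-the-envelope sketch, assuming that 80 bytes is the length of one fat index row, that each STATUS update costs 10 ms of index I/O (20 ms if the index row moves), and ignoring free space and page overhead; all names are illustrative:

```python
ROWS = 1_000_000
INDEX_ROW_BYTES = 80             # assumed: length of one fat index row
STATUS_UPDATES_PER_DAY = 5_000
UPDATE_MS = 10.0                 # rule of thumb: one random touch per update
MOVE_EXTRA_MS = 10.0             # additional touch if the index row moves
SECONDS_PER_DAY = 24 * 60 * 60


def raw_index_size_mb(rows, row_bytes):
    """Index size in MB, ignoring free space and page overhead."""
    return rows * row_bytes / 1_000_000


def daily_io_seconds(events_per_day, ms_per_event):
    """Total index maintenance I/O time per day, in seconds."""
    return events_per_day * ms_per_event / 1000.0


size_mb = raw_index_size_mb(ROWS, INDEX_ROW_BYTES)
low_s = daily_io_seconds(STATUS_UPDATES_PER_DAY, UPDATE_MS)
high_s = daily_io_seconds(STATUS_UPDATES_PER_DAY, UPDATE_MS + MOVE_EXTRA_MS)
updates_per_s = STATUS_UPDATES_PER_DAY / SECONDS_PER_DAY

print(f"index size: about {size_mb:.0f} MB")
print(f"update I/O: {low_s:.0f} to {high_s:.0f} s per day")
print(f"average update rate: {updates_per_s:.3f} per second")
```

Under these assumptions the fat index takes on the order of 80 MB and adds one to two minutes of index I/O spread over a whole day, which supports the slide's answer.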

Obsolete rules:
  Do not index volatile columns (STATUS)
  Max N indexes per table
  Max 5 columns per index
  etc.
Instead, ask:
  INSERT and DELETE fast enough after the index is added? TR = 1 per added or removed index row.
  UPDATE fast enough after index columns are added? TR = 1 or 2 per updated index column.
  Index reorg requirement OK? Watch for long index rows (more than 5% of leaf page) and hot spots (except at the end of the index).
  Drive load caused by index maintenance: an issue if there are dozens of random index row inserts or deletes a second.
  Index storage cost (disk and RAM): diminishing year after year.
Underindexing: a common mistake.
[Figure labels: TR, RAM, e/GB/m.]

Index BO Was Not Adequate For This SELECT
Who should have seen this? When?
  SELECT DL, STATUS, RPK, CNO, C1, C2
  FROM REQUEST
  WHERE STATUS < 9 AND BO = :BO
  ORDER BY DL

Summary
QUBE is a way of thinking about indexes.
It can be used to prevent performance problems.
It can be used in conjunction with other tools.
It can be used to understand and analyze performance problems.
