Query Processing and Optimizing on SSDs
Flash Group
Qingling Cao
Outline: Introduction, Page Layout on SSD, Scan Approaches, Join Algorithms, Conclusion
Introduction
Database query processing engines traditionally emphasize sequential I/O.
On SSDs: adapt the page layout and data structures, and leverage fast random reads to speed up selection, projection, and join operations.
Page Layout on SSD
Row layout
Column layout: the attributes of one column are stored in contiguous pages.
Page Layout on SSD: PAX Layout
The PAX layout is efficient on SSDs but not on disks. Why?
Page Layout on SSD
Disk: the sequential read speed is 100 MB/s and a skip takes 3-4 ms, so a mini-page should be [...] KB. The full page size is then [...] MB.
IDE flash drive: the sequential read bandwidth is 28 MB/s and the seek time is 0.25 ms, so a mini-page should be 7 KB. The full page size can then be [...] KB.
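The flash figures above (28 MB/s, 0.25 ms, 7 KB) are consistent with a simple break-even rule: a mini-page should be about as large as the amount of data that can be read sequentially in the time one skip takes. The sketch below assumes that rule; it is inferred from the numbers, not stated on the slide. The function name and the decimal unit convention (1 MB = 1000 KB) are illustrative assumptions, and the disk values it yields are derived from the rule rather than quoted from the slide.

```python
def break_even_minipage_kb(bandwidth_mb_per_s: float, skip_ms: float) -> float:
    """KB that can be read sequentially in the time of one skip (assumed rule)."""
    # MB/s -> KB/ms with decimal units: *1000 for KB, /1000 for ms.
    kb_per_ms = bandwidth_mb_per_s * 1000 / 1000
    return kb_per_ms * skip_ms


# IDE flash drive from the slide: 28 MB/s, 0.25 ms seek.
flash_kb = break_even_minipage_kb(28, 0.25)   # 7.0 KB, matching the slide

# Disk from the slide: 100 MB/s, skip of 3 to 4 ms (derived, not quoted).
disk_low_kb = break_even_minipage_kb(100, 3)   # 300.0 KB
disk_high_kb = break_even_minipage_kb(100, 4)  # 400.0 KB

print(flash_kb, disk_low_kb, disk_high_kb)
```

Under this rule a disk mini-page comes out roughly 40 to 60 times larger than the flash one, which is the reason a PAX page with many columns becomes impractically large on disk.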
Scan Approaches
NSMScan: always reads the whole relation.
FlashScan: reads only the relevant columns.
Example: select S from R where J
Scan Approaches
FlashScanOPT(U): reads only the mini-pages that contain the needed tuples. Example: select S from R where J
FlashScanOPT(S): the attributes are sorted, so each mini-page is read at most once.
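The difference between the three scans can be illustrated by counting mini-page reads for the query "select S from R where J". This is a minimal sketch, not the implementation behind the slides: a PAX page is modelled as a dictionary from column name to mini-page, the function names are hypothetical, and the optimized variant only captures the idea of skipping the S mini-page when no tuple on the page satisfies the predicate on J.

```python
from typing import Callable, Dict, List, Tuple

Page = Dict[str, List[int]]       # one PAX page: column name -> mini-page
Result = Tuple[List[int], int]    # (projected values, mini-pages read)
Pred = Callable[[int], bool]


def nsm_scan(pages: List[Page], sel: str, proj: str, pred: Pred) -> Result:
    """Read every mini-page of every page, i.e. the whole relation."""
    out, reads = [], 0
    for page in pages:
        reads += len(page)
        out += [s for j, s in zip(page[sel], page[proj]) if pred(j)]
    return out, reads


def flash_scan(pages: List[Page], sel: str, proj: str, pred: Pred) -> Result:
    """Read only the mini-pages of the columns the query mentions."""
    out, reads = [], 0
    for page in pages:
        reads += len({sel, proj})
        out += [s for j, s in zip(page[sel], page[proj]) if pred(j)]
    return out, reads


def flash_scan_opt(pages: List[Page], sel: str, proj: str, pred: Pred) -> Result:
    """Read the selection mini-page; read the projection one only on a hit."""
    out, reads = [], 0
    for page in pages:
        reads += 1
        hits = [i for i, j in enumerate(page[sel]) if pred(j)]
        if hits:
            if proj != sel:
                reads += 1
            out += [page[proj][i] for i in hits]
    return out, reads


# Toy relation R with columns J, S and an unused column X (made-up data).
pages = [
    {"J": [1, 2], "S": [10, 20], "X": [0, 0]},
    {"J": [11, 3], "S": [30, 40], "X": [0, 0]},
    {"J": [4, 5], "S": [50, 60], "X": [0, 0]},
]
for scan in (nsm_scan, flash_scan, flash_scan_opt):
    print(scan.__name__, scan(pages, "J", "S", lambda j: j > 10))
```

On this toy input all three scans return the same result while reading 9, 6, and 4 mini-pages respectively; the gap widens as the relation gains columns or the predicate becomes more selective.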
Table: 70M tuples, 11 columns, 10 GB
System: Intel Core 2 Duo at 2.33 GHz, 4 GB of RAM, Mtron 32 GB SSD
Join Algorithms – past lessons
Block Nested Loops Join
Sort-Merge Join
Grace Hash Join
Hybrid Hash Join
Join Algorithms – past lessons
☆ Algorithms that stress random reads and avoid random writes as much as possible see bigger improvements on flash.
Customer: 4.5 million tuples, 730 MB
Order: 45 million tuples, 5 GB
HDD: 5400 RPM, 320 GB
SSD: OCZ Core series, 60 GB, SATA II
Join Algorithms – RARE-join
Select Name, Team from Player, Game where Player.Team = Game.Team
J1 (Player): Blue, P:4; Green, P:3; Red, P:2 → Red, P:5; Orange, P:1 → Orange, P:6
J2 (Game): Blue, G:4; Red, G:1; Orange, G:2 → Orange, G:3
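The example above can be used to show how a join index is computed from the join columns alone, without touching the other columns of either table. The sketch below is an illustration under assumptions: entries such as "P:4" are read as row id 4 in Player, the function name is hypothetical, and a plain in-memory hash join stands in for whatever join method the original algorithm uses on the join columns.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Join columns from the slide, as (Team value, row id) pairs.
j1 = [("Blue", 4), ("Green", 3), ("Red", 2), ("Red", 5),
      ("Orange", 1), ("Orange", 6)]                          # Player
j2 = [("Blue", 4), ("Red", 1), ("Orange", 2), ("Orange", 3)]  # Game


def build_join_index(left: List[Tuple[str, int]],
                     right: List[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Return (left row id, right row id) pairs with equal join values."""
    by_value: Dict[str, List[int]] = defaultdict(list)
    for value, rid in right:
        by_value[value].append(rid)
    # .get avoids inserting empty lists for values with no partner.
    return [(lrid, rrid)
            for value, lrid in left
            for rrid in by_value.get(value, [])]


join_index = build_join_index(j1, j2)
print(join_index)
# [(4, 4), (2, 1), (5, 1), (1, 2), (1, 3), (6, 2), (6, 3)]
```

Green has no partner in Game, so Player row 3 never appears in the index and its Name value never needs to be fetched.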
Join Algorithms – RARE-join
[Figure: join index and join result]
Total I/O cost: |J1| + σ1|V1| + |J2| + σ2|V2|
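The cost expression can be evaluated directly once its symbols are given a meaning. The sketch below assumes, since the slide does not define them, that |J1| and |J2| are the page counts of the two join columns, |V1| and |V2| the page counts of the remaining columns the query needs, and σ1 and σ2 the fractions of those pages that actually have to be read. The function name and the sample numbers are made up for illustration.

```python
def rare_join_io_cost(j1_pages: float, v1_pages: float, sigma1: float,
                      j2_pages: float, v2_pages: float, sigma2: float) -> float:
    """Total I/O cost |J1| + s1*|V1| + |J2| + s2*|V2| (symbol meanings assumed)."""
    return j1_pages + sigma1 * v1_pages + j2_pages + sigma2 * v2_pages


# Hypothetical sizes: the join columns are always read in full, while only a
# fraction of the value-column pages is fetched.
cost = rare_join_io_cost(j1_pages=100, v1_pages=1000, sigma1=0.25,
                         j2_pages=50, v2_pages=200, sigma2=0.5)
print(cost)  # 500.0
```

With both fractions equal to 1 the cost degenerates to reading every needed column in full, so the benefit comes entirely from small σ values.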
Join Algorithms – FlashJoin
[Figure: FlashJoin plan. Labels: Read(A); Read(D); hashA, id1; hashD, id2; id1,id2; hashG, id1,id2; hashK, id3; id1,id2,id3]
Join Algorithms – Fetch Kernel
[Figure: join index]
Each page is read no more than once.
Join Algorithms – FlashJoin
R: 70M tuples, 10 GB
S: 7M tuples, 1 GB
System: Intel Core 2 Duo at 2.33 GHz, 4 GB of RAM, Mtron 32 GB SSD
Join Algorithms – DigestJoin
Row-based
{JI, id_x, id_y}
Minimize the I/O needed to fetch the join result
Join Algorithms – Page Fetching (1)
Sort-merge join
Join results are clustered
Memory is sufficient
Fetch the pages of the tuples as soon as they are produced
Join Algorithms – Page Fetching (2)
Join index: (x1, A:1, C:1), (x2, B:1, D:1), (x3, A:2, C:2), (x4, B:2, D:2)
Fetching instruction tables: ft1 = {A:1, B:1, A:2, B:2}, ft2 = {C:1, D:1, C:2, D:2}
Join candidate tables: jct1 = {x1, x2, x3, x4}, jct2 = {y1, y2, y3, y4}
After sorting: ft1 = {A:1, A:2, B:1, B:2}, ft2 = {C:1, C:2, D:1, D:2}
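The fetching instruction tables in this example can be reproduced with a few lines of code. The sketch assumes that an address such as "A:1" means page A, slot 1, and that sorting by page and then slot is what turns the first form of ft1 and ft2 into the second; the function names are hypothetical.

```python
from typing import List, Tuple

# Join index from the slide: (join value, address in table 1, address in table 2).
join_index = [("x1", "A:1", "C:1"), ("x2", "B:1", "D:1"),
              ("x3", "A:2", "C:2"), ("x4", "B:2", "D:2")]


def parse(addr: str) -> Tuple[str, int]:
    """Split 'A:1' into (page 'A', slot 1); the address format is assumed."""
    page, slot = addr.split(":")
    return page, int(slot)


def fetching_tables(index: List[Tuple[str, str, str]]) -> Tuple[List[str], List[str]]:
    """Build both fetching instruction tables, sorted by page and then slot."""
    ft1 = sorted((entry[1] for entry in index), key=parse)
    ft2 = sorted((entry[2] for entry in index), key=parse)
    return ft1, ft2


def pages_in_fetch_order(ft: List[str]) -> List[str]:
    """Distinct pages in the order they are fetched, each listed once."""
    return list(dict.fromkeys(parse(addr)[0] for addr in ft))


ft1, ft2 = fetching_tables(join_index)
print(ft1)                        # ['A:1', 'A:2', 'B:1', 'B:2']
print(ft2)                        # ['C:1', 'C:2', 'D:1', 'D:2']
print(pages_in_fetch_order(ft1))  # ['A', 'B']
```

Because all slots of one page are adjacent after sorting, a single read per page serves every tuple on it; in the unsorted order a one-page buffer would read page A twice.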
Join Algorithms – Page Fetching (3)
Join graph: G = (V1 ∪ V2, E)
Segment, e.g. {1, a, b, c}, {a, 1, 2}
[Figure: required storage size (RSS) and required cache size (RCS)]
Conclusion
PAX:
The scan algorithm has little room for improvement.
RARE-Join, FlashJoin: no writes; the join index is sorted many times.
The mini-page size is not fixed.
Conclusion
Row: DigestJoin. Its I/O is much higher than that of the other join algorithms.
Column: none. Storage is more flexible; tuple reconstruction techniques can be used.