Fractal Prefetching B+-Trees: Optimizing Both Cache and Disk Performance

Shimin Chen (School of Computer Science, Carnegie Mellon University)
Phillip B. Gibbons (Bell Laboratories; current affiliation: Intel Research Pittsburgh)
Todd C. Mowry (School of Computer Science, Carnegie Mellon University)
Gary Valentin (DB2 UDB Development Team, IBM Toronto Lab)

B+-Tree Operations: Review
Search:
- binary search in every node on the path
Insertion/Deletion:
- search followed by data movement
Range Scan:
- locate a collection of tuples in a range
- traverse the linked list of leaf nodes
- different from search-like operations
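
For concreteness, here is a minimal sketch of the search operation reviewed above: one binary search per node on the root-to-leaf path. The node layout and field names are our own illustration, not the structures used in the paper.

```c
#include <stdbool.h>

/* Hypothetical in-memory node layout, for illustration only. */
typedef struct Node {
    bool         is_leaf;
    int          n_keys;
    int          keys[64];       /* sorted search keys            */
    struct Node *children[65];   /* child pointers (nonleaf only) */
    struct Node *next_leaf;      /* leaf-level linked list        */
} Node;

/* Binary search inside one node: smallest i with keys[i] >= k. */
static int node_search(const Node *n, int k) {
    int lo = 0, hi = n->n_keys;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (n->keys[mid] < k) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Search: one binary search per node on the root-to-leaf path. */
Node *btree_search(Node *root, int k) {
    Node *n = root;
    while (!n->is_leaf)
        n = n->children[node_search(n, k)];
    return n;   /* caller probes the leaf for k */
}
```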

Disk-optimized B+-Trees
[Figure: memory hierarchy (CPU, L1, L2/L3 cache, main memory, disks)]
Traditional focus: I/O performance
- minimize # of disk accesses
- optimal tree nodes are disk pages, typically 4KB-64KB

Cache-optimized B+-Trees
[Figure: memory hierarchy, with tree nodes sized to the cache]
Recent studies: cache performance
- e.g. [Rao & Ross, SIGMOD'00], [Bohannon, McIlroy, Rastogi, SIGMOD'01], [Chen, Gibbons, Mowry, SIGMOD'01]
- cache lines are small (e.g., 64B)
- optimal tree nodes are only a few cache lines

Large Difference in Node Sizes
[Figure: memory hierarchy, contrasting small cache-optimized nodes with large disk pages]

Cache-optimized B+-Trees: Poor I/O Performance
[Figure: memory hierarchy]
- may fetch a distinct disk page for every node on the path of a search
- similar penalty for range scan

Disk-optimized B+-Trees: Poor Cache Performance
[Figure: memory hierarchy]
- binary search in a large node suffers an excessive number of cache misses (explained later in the talk)

Optimizing for Both Cache and Disk Performance?
[Figure: memory hierarchy]

Our Approach
[Figure: memory hierarchy]
Fractal Prefetching B+-Trees (fpB+-Trees)
- embedding cache-optimized trees inside disk-optimized trees

Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion

Page Structure of Disk-optimized B+-Trees
- We focus on fixed-size keys
- (please see our full paper for a discussion of variable-size keys)
[Figure: page layout: a header followed by a huge array of index entries]
An index entry is a <key, pageID> pair (nonleaf pages) or a <key, tupleID> pair (leaf pages).
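
A minimal C sketch of this page layout, using the 8KB example from the next slide; the field names and the 4B+4B entry split are assumptions for illustration, not the paper's exact structures.

```c
#include <stdint.h>

#define PAGE_SIZE   8192   /* 8KB disk page, as in the next slide's example */
#define HEADER_SIZE 8      /* 8B page header                                */

/* Hypothetical fixed-size index entry: 4B key + 4B pointer = 8B.  */
/* The pointer is a pageID in nonleaf pages, a tupleID in leaves.  */
typedef struct {
    uint32_t key;
    uint32_t ptr;
} IndexEntry;

/* A disk-optimized page: a small header followed by one huge      */
/* sorted array of entries. (8192 - 8) / 8 = 1023 entries at 8KB.  */
typedef struct {
    uint16_t   n_entries;   /* used slots                 */
    uint16_t   flags;       /* e.g., leaf vs. nonleaf     */
    uint32_t   reserved;    /* pads the header to 8 bytes */
    IndexEntry entries[(PAGE_SIZE - HEADER_SIZE) / sizeof(IndexEntry)];
} DiskPage;
```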

Binary Search in a B+-Tree Page
Search for entry #71.
Suppose an index entry array has 1023 index entries, numbered 1-1023, with 8 index entries per cache line; the array then occupies 128 cache lines (e.g., an 8KB page with an 8B header, 8B entries, and 64B cache lines).
[Figure: the entry array laid out across cache lines (1st, 2nd, 3rd, 4th, ..., 128th)]
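
To make the miss behavior concrete, this small self-contained program (our illustration, using the numbers from this slide) counts how many distinct cache lines one binary search touches: 10 probes hit 7 different lines, since only the last few probes fall within the same line.

```c
#include <stdio.h>

#define N_ENTRIES        1023
#define ENTRIES_PER_LINE 8      /* 64B line / 8B entry */

int main(void) {
    int touched[N_ENTRIES / ENTRIES_PER_LINE + 1] = {0};
    int target = 71;            /* entry keys are 1..1023 */
    int lo = 0, hi = N_ENTRIES, probes = 0, lines = 0;
    while (lo < hi) {
        int mid  = lo + (hi - lo) / 2;
        int line = mid / ENTRIES_PER_LINE;
        if (!touched[line]) { touched[line] = 1; lines++; }
        probes++;
        if (mid + 1 < target) lo = mid + 1; else hi = mid;
    }
    /* prints: 10 probes touched 7 distinct cache lines */
    printf("%d probes touched %d distinct cache lines\n", probes, lines);
    return 0;
}
```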

Binary Search in a B+-Tree Page (cont.)
Search for entry #71: each probe halves the active range, and the early probes each land in a different cache line.
[Figure: the active range shrinking across probes]
Poor cache performance because of poor spatial locality.

Fractal Prefetching B+-Trees (fpB+-Trees)
Embedding cache-optimized trees inside disk pages:
good search cache performance
- binary search in cache-optimized nodes: much better locality
- use cache prefetching
good search disk performance
- nodes are embedded into disk pages
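
A sketch of what an in-page search could look like; `__builtin_prefetch` is the GCC/Clang prefetch intrinsic, and the node sizes and in-page offset scheme are illustrative assumptions, not the paper's exact design.

```c
#include <stddef.h>

#define LINE 64

/* Hypothetical cache-optimized node spanning a few 64B lines;     */
/* children are stored as byte offsets within the same disk page.  */
typedef struct CacheNode {
    char  is_leaf;
    int   n_keys;
    int   keys[30];
    short child[31];
} CacheNode;

/* Issue prefetches for every line of a node, so its cache misses  */
/* are serviced in parallel rather than one after another.         */
static void prefetch_node(const CacheNode *n) {
    for (size_t off = 0; off < sizeof(*n); off += LINE)
        __builtin_prefetch((const char *)n + off, 0 /* read */, 3);
}

/* Search the small tree embedded in one disk page: each node's    */
/* binary search now has good spatial locality.                    */
int page_search(const char *page, int root_off, int key) {
    const CacheNode *n = (const CacheNode *)(page + root_off);
    for (;;) {
        prefetch_node(n);
        int lo = 0, hi = n->n_keys;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (n->keys[mid] < key) lo = mid + 1; else hi = mid;
        }
        if (n->is_leaf) return lo;   /* slot within the leaf node */
        n = (const CacheNode *)(page + n->child[lo]);
    }
}
```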

Node Size Mismatch Problem
- Disk page size and cache-optimized node size are determined by hardware parameters and key sizes
- Ideally, cache-optimized trees fit nicely in disk pages
- But usually this is not true!
[Figure: one 2-level tree overflows the page; another 2-level tree underflows, leaving unused space, but adding one more level overflows]
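
A worked example of the mismatch, with illustrative numbers (a 16KB page and 512B nodes of fan-out 32; these are not the paper's figures):

```latex
% Illustrative numbers: page size P = 16KB, node size s = 512B, node fan-out f = 32.
\[
  \text{nodes in an $h$-level tree} = \sum_{i=0}^{h-1} f^{\,i}
\]
\[
  h = 2:\quad (1 + 32)\times 512\,\mathrm{B} = 16.5\,\mathrm{KB} > 16\,\mathrm{KB}
  \qquad\text{(overflows the page)}
\]
\[
  h = 1:\quad 1 \times 512\,\mathrm{B} = 0.5\,\mathrm{KB} \ll 16\,\mathrm{KB}
  \qquad\text{(most of the page is unused)}
\]
```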

Two Solutions
Solution 1: use different sizes for in-page leaf and nonleaf nodes
- e.g. a smaller root when the tree overflows, a larger root when it underflows
Solution 2: overflowing nodes become roots of new pages
[Figure: both solutions, showing how the unused space is reclaimed]

The Two Solutions from Another Point of View
- Conceptually, we apply disk and cache optimizations in different orders
- Solution 1: disk-first
  - first build the disk-optimized pages
  - then fit smaller trees into disk pages by allowing different node sizes
- Solution 2: cache-first
  - first build the cache-optimized trees
  - then group nodes together and place them into disk pages

Insertion and Deletion Cache Performance
- In disk-optimized B+-Trees, data movement is very expensive
  - the huge array structure in disk pages
  - on average, we need to move half the array
- In our fpB+-Trees, the cost of data movement is much smaller
  - small cache-optimized nodes
- We show that fpB+-Trees have much better insertion/deletion performance than disk-optimized B+-Trees with fixed-size keys
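
The contrast is visible directly in the insert paths; this sketch (ours, reusing the hypothetical `DiskPage` and `CacheNode` layouts from the earlier sketches) shows why one insert into a disk-optimized page moves roughly 4KB on average while an fpB+-Tree insert moves at most a node's worth of entries.

```c
#include <string.h>

/* Disk-optimized page: the whole array must stay sorted, so an    */
/* insert at slot i shifts (n_entries - i) entries; on average     */
/* that is half of ~1023 entries, roughly 4KB of memmove per       */
/* insert at 8KB pages.                                            */
void page_insert(DiskPage *p, int i, IndexEntry e) {
    memmove(&p->entries[i + 1], &p->entries[i],
            (p->n_entries - i) * sizeof(IndexEntry));
    p->entries[i] = e;
    p->n_entries++;
}

/* fpB+-Tree: only the small cache-optimized node is kept sorted,  */
/* so an insert shifts at most a few cache lines' worth of data.   */
void node_insert(CacheNode *n, int i, int key, short child) {
    memmove(&n->keys[i + 1], &n->keys[i],
            (n->n_keys - i) * sizeof(int));
    memmove(&n->child[i + 2], &n->child[i + 1],
            (n->n_keys - i) * sizeof(short));
    n->keys[i] = key;
    n->child[i + 1] = child;
    n->n_keys++;
}
```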

Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion

Jump-pointer Array Prefetching for Range Scan
- Recall that range scans essentially traverse the linked list of leaf nodes
- Previous proposal for range scan cache performance (SIGMOD'01):
  - build data structures to hold leaf node addresses
  - prefetch leaf nodes during range scans
[Figure: an internal jump-pointer array]
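
A sketch of the cache-level scheme, reusing `prefetch_node` and `CacheNode` from the earlier sketch: the jump-pointer array gives the scan the leaf addresses it will need, so it can stay a fixed distance ahead. `PF_DIST` and `consume_leaf` are illustrative assumptions.

```c
#define PF_DIST 8   /* how many leaves to stay ahead; illustrative */

void consume_leaf(const CacheNode *leaf);   /* hypothetical consumer */

/* Scan leaves [first, last] using a jump-pointer array of leaf    */
/* addresses: prefetch PF_DIST leaves ahead of the one being read. */
void range_scan(CacheNode *const *jump, int first, int last) {
    for (int j = first; j < first + PF_DIST && j <= last; j++)
        prefetch_node(jump[j]);             /* warm up the pipeline */
    for (int i = first; i <= last; i++) {
        if (i + PF_DIST <= last)
            prefetch_node(jump[i + PF_DIST]);
        consume_leaf(jump[i]);
    }
}
```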

New Proposal: I/O Prefetching
- Employ jump-pointer array prefetching in I/O, linking leaf-parent pages together
  - jump-pointer arrays contain leaf page IDs
  - prefetch leaf pages to improve range scan I/O performance
- Very useful when leaf pages are not sequential on disk
  - e.g. a non-clustered index under frequent updates
  - (when sequential prefetching is not applicable)
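
The same pattern works one level down. In this sketch, `bufpool_prefetch`, `bufpool_fix`, and `scan_leaf_page` are hypothetical buffer-manager calls we invented for illustration, not an actual DB2 or paper API.

```c
typedef unsigned PageID;

void  bufpool_prefetch(PageID id);   /* hypothetical: async read into pool */
char *bufpool_fix(PageID id);        /* hypothetical: blocking read + pin  */
void  scan_leaf_page(char *page);    /* hypothetical consumer              */

/* Leaf page IDs from the jump-pointer array are handed to the      */
/* buffer manager early, so disk reads overlap with the scan        */
/* instead of being paid one at a time.                             */
void range_scan_io(const PageID *leaf_ids, int n, int io_dist) {
    for (int j = 0; j < io_dist && j < n; j++)
        bufpool_prefetch(leaf_ids[j]);
    for (int i = 0; i < n; i++) {
        if (i + io_dist < n)
            bufpool_prefetch(leaf_ids[i + io_dist]);
        char *page = bufpool_fix(leaf_ids[i]);   /* a hit if prefetch won */
        scan_leaf_page(page);
    }
}
```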

Both Cache and I/O Prefetching in fpB+-Trees
Two jump-pointer arrays in fpB+-Trees:
- one for range scan cache performance
  - containing leaf node addresses for cache prefetching
- one for range scan disk performance
  - containing leaf page IDs for I/O prefetching
[Figure: memory hierarchy]

More Details in Our Paper
- Computation of optimal node sizes
- Data structures
- Algorithms
  - Bulkload
  - Search
  - Insertion
  - Deletion
  - Range scan

Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion

Implementation
We implemented a buffer manager and three index structures on top of it:
- disk-optimized B+-Trees (the baseline)
- disk-first fpB+-Trees
- cache-first fpB+-Trees

Experiments and Methodology
Experiments:
- Search: (1) cache performance; (2) disk performance
  - improving cache performance while preserving good disk performance
- Update: (3) cache performance
  - solving the data movement problem
- Range scan: (4) cache performance; (5) disk performance
  - jump-pointer array prefetching
Methodology:
- cache performance: detailed cycle-by-cycle simulations
  - memory system parameters of the near future; better prefetching support
- range scan I/O performance: execution times on real machines
- search I/O performance: counting the number of I/Os
  - (I/O operations in search do not overlap)

Search Cache Performance
Setup: 2000 random searches after bulkload; 100% full except root; 16KB pages.
[Graph: execution time (M cycles) vs. total # of entries in all leaf pages, for disk-optimized B+-Trees, disk-first fpB+-Trees, and cache-first fpB+-Trees]
- fpB+-Trees perform significantly better than disk-optimized B+-Trees
  - achieving speedups at all sizes, over 1.25 when trees contain at least 1M entries
- the performance of the two fpB+-Trees is similar

Search I/O Performance
Setup: 2000 random searches after bulkloading 10M index entries; 100% full except root.
[Graph: # of I/O reads (x1000) vs. page size (4KB, 8KB, 16KB, 32KB)]
- Disk-first fpB+-Trees access < 3% more pages
  - very small I/O performance impact
- Cache-first fpB+-Trees may access up to 25% more pages in our results

Insertion Cache Performance
Setup: 2000 random insertions after bulkloading 3M keys, 70% full.
[Graph: execution time (M cycles) vs. page size (4KB-32KB)]
- fpB+-Trees are significantly faster than disk-optimized B+-Trees
  - achieving up to 35-fold speedups
- data movement costs dominate disk-optimized B+-Tree performance

Range Scan Cache Performance
Setup: 100 scans starting at random locations in an index bulkloaded with 3M keys, 100% full; each range contains 1M keys; 16KB pages.
[Graph: execution time (M cycles)]
- disk-first and cache-first fpB+-Trees achieve speedups of 4.2 and 3.5 over disk-optimized B+-Trees
- jump-pointer array cache prefetching is effective

Range Scan I/O Performance
Setup: IBM DB2 Universal Database on an 8-processor machine (RS/6000 line) with 2GB memory and 80 SSA disks; mature index on a 12.8GB table.
[Graph: normalized execution time for no prefetch, with prefetch, and in memory]
- jump-pointer array I/O prefetching achieves significant speedups for disk-optimized B+-Trees

Other Experiments
- We find similar benefits in deletion cache performance
  - up to 20-fold speedups
- We performed many cache performance experiments and got similar results for
  - varying tree sizes, bulkload factors, and page sizes
  - mature trees
  - varying key sizes (e.g. 20B keys)
- We performed range scan I/O experiments on our own index implementations and saw up to 6.9-fold speedups

Related Work
- Micro-indexing (discussed briefly by Lomet, SIGMOD Record, Sep. 2001)
[Figure: a contiguous array of index entries fronted by a micro-index]
- We are the first to quantitatively analyze the performance of micro-indexing:
  - it improves search cache performance
  - but it suffers from the data movement problem on updates because of the contiguous array structure
  - fpB+-Trees have much better update performance

Fractal Prefetching B+-Trees: Conclusion
Search: combine cache-optimized and disk-optimized node sizes
- better cache performance
  - speedup over disk-optimized B+-Trees
- good disk performance for disk-first fpB+-Trees
  - disk-first fpB+-Trees visit < 3% more disk pages
  - we only recommend cache-first fpB+-Trees with very large memory
Update: solve the data movement problem by using smaller nodes
- better cache performance
  - up to a 20-fold speedup over disk-optimized B+-Trees
Range Scan: employ jump-pointer array prefetching
- better cache performance
- better disk performance
  - speedup on IBM DB2

Backup Slides

Previous Work: Prefetching B+-Trees (SIGMOD 2001)
- Studied B+-Trees in a main-memory environment
- For search: prefetching wider tree nodes
  - increase node size to multiple cache lines wide
  - use prefetching to read all cache lines of a node in parallel
[Figure: a B+-Tree with one-line nodes vs. a prefetching B+-Tree with four-line nodes]

Prefetching B+-Trees (cont'd)
- For range scan: jump-pointer array prefetching
  - build jump-pointer arrays to hold leaf node addresses
  - prefetch leaf nodes with the jump-pointer array
  - two implementations, sketched below: an external jump-pointer array and an internal jump-pointer array
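
A structural sketch of the two variants (hypothetical types, reusing `CacheNode` from above): the external array is a separate chunked list of leaf addresses, while the internal variant simply links the leaf-parent nodes, which already hold pointers to all the leaves.

```c
/* External jump-pointer array: a chunked linked list of leaf      */
/* addresses maintained beside the tree.                           */
typedef struct JPChunk {
    int             n;
    CacheNode      *leaves[256];   /* addresses of consecutive leaves */
    struct JPChunk *next;
} JPChunk;

/* Internal jump-pointer array: leaf-parent nodes already hold the */
/* leaf addresses as their child pointers, so linking the parents  */
/* together provides the same prefetch feed with no separate       */
/* structure to keep consistent on splits and merges.              */
typedef struct LeafParent {
    int                n_keys;
    int                keys[30];
    CacheNode         *leaves[31];   /* children are the leaves      */
    struct LeafParent *next;         /* sibling link between parents */
} LeafParent;
```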

Optimization in the Disk-first Approach
- Two conflicting goals:
  1) optimize search cache performance
  2) maximize page fan-out to preserve good I/O performance
- Optimality criterion: maximize page fan-out while keeping the analytical search cost within 10% of optimal
- Details in the paper
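
To show the shape of this computation, here is a small sketch that enumerates candidate node widths and applies the 10% rule. The cost model inside is a crude stand-in built from the simulation parameters in the backup slides (150-cycle memory latency, roughly one line per 10 cycles of bandwidth), not the paper's actual analytical model, and node width is used as a proxy for page fan-out.

```c
#include <math.h>
#include <stdio.h>

#define LINE_BYTES  64
#define ENTRY_BYTES 8
#define MAX_WIDTH   256   /* 16KB page / 64B line */

/* Crude stand-in cost model (NOT the paper's): a node of w lines   */
/* is prefetched in parallel, costing one full miss plus ~10 cycles */
/* of bandwidth per extra line (numbers from the backup slides).    */
static double search_cost(int w, double n_keys) {
    int    fanout = w * LINE_BYTES / ENTRY_BYTES;
    double levels = ceil(log(n_keys) / log((double)fanout));
    return levels * (150.0 + (w - 1) * 10.0);
}

int main(void) {
    double n_keys = 1e7, best = 1e30;
    for (int w = 1; w <= MAX_WIDTH; w++) {
        double c = search_cost(w, n_keys);
        if (c < best) best = c;
    }
    /* Selection rule from the slide: take the widest node (proxy   */
    /* for the largest page fan-out) whose analytical search cost   */
    /* stays within 10% of the optimum.                             */
    for (int w = MAX_WIDTH; w >= 1; w--) {
        if (search_cost(w, n_keys) <= 1.1 * best) {
            printf("chosen node width: %d cache lines\n", w);
            break;
        }
    }
    return 0;   /* compile with -lm */
}
```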

Cache-first fpB+-Trees Structure
- Group sibling leaf nodes into the same page for range scan
- Group a parent and its children into the same page for search
- Leaf-parent nodes may be put into overflow pages
[Figure: overflow pages]

Simulation Parameters
Pipeline parameters:
- Clock rate: 1 GHz
- Issue width: 4 insts/cycle
- Functional units: 2 Int, 2 FP, 2 Mem, 1 Branch
- Reorder buffer size: 64 insts
- Integer multiply/divide: 12/76 cycles
- All other integer: 1 cycle
- FP divide/square root: 15/20 cycles
- All other FP: 2 cycles
- Branch prediction scheme: gshare
Memory parameters:
- Line size: 64 bytes
- Primary data cache: 64 KB, 2-way set-assoc.
- Primary instruction cache: 64 KB, 2-way set-assoc.
- Miss handlers: 32 for data, 2 for inst
- Unified secondary cache: 2 MB, direct-mapped
- Primary-to-secondary miss latency: 15 cycles (plus contention)
- Primary-to-memory miss latency: 150 cycles (plus contention)
- Main memory bandwidth: 1 access per 10 cycles
The simulator models all the gory details, including memory system contention.

Optimal Node Size Computation (key = 4B)
Disk-first fpB+-Trees (page size: nonleaf node / leaf node):
- 4KB: 64B / 384B
- 8KB: 192B / 256B
- 16KB: 192B / 512B
- 32KB: 256B / 832B
Cache-first fpB+-Trees (page size: node size):
- 4KB: 576B
- 8KB: 576B
- 16KB: 704B
- 32KB: 640B
- Optimality criterion: maximize page fan-out while keeping the analytical search cost within 10% of optimal
- We used these optimal values in our experiments

Search Cache Performance
Setup: 2000 random searches after bulkload; 100% full except root; 16KB pages.
[Graph: execution time (M cycles) vs. # of entries in leaf pages, for disk-optimized B+-trees, micro-indexing, disk-first fpB+-trees, and cache-first fpB+-trees]
- cache-sensitive schemes (fpB+-Trees and micro-indexing) all perform significantly better than disk-optimized B+-Trees
- the performance of the cache-sensitive schemes is similar

Search Cache Performance (Varying Page Sizes)
Same experiments but with different page sizes (4KB, 8KB, 32KB).
[Graphs: execution time (M cycles) vs. # of entries in leaf pages, one per page size]
- we see the same trends: cache-sensitive schemes are better
- they achieve speedups at all sizes, especially when trees contain at least 1M entries

Optimal Width Selection
(16KB pages, 4B keys)
[Graphs: disk-first fpB+-Trees and cache-first fpB+-Trees]
- our selected trees perform within 2% (disk-first) and 5% (cache-first) of the best

Search I/O Performance
Setup: 2000 random searches, 4B keys.
[Graphs: # of I/O reads (x1000) vs. page size (4KB-32KB), after bulkload (100% full) and on mature trees]
- disk-first fpB+-Trees access < 3% more pages
  - very small I/O performance impact
- cache-first fpB+-Trees may access up to 25% more pages in our results

Insertion Cache Performance
Setup: 2000 random insertions after bulkloading 3M keys, 70% full.
[Graph: execution time (M cycles) vs. page size (4KB-32KB), for disk-optimized B+-trees, micro-indexing, disk-first fpB+-trees, and cache-first fpB+-trees]

Insertion Cache Performance II
Setup: 2000 random insertions after bulkloading 3M keys; 16KB pages.
[Graph: execution time (M cycles) vs. bulkload factor]
- fpB+-Trees are significantly faster than both disk-optimized B+-Trees and micro-indexing
- fpB+-Trees achieve up to 35-fold speedups over disk-optimized B+-Trees across all page sizes

Insertion Cache Performance II (cont.)
[Graph: same experiment as the previous slide]
- two major costs: data movement and page splits
- micro-indexing still suffers from data movement costs
- fpB+-Trees avoid this problem with smaller nodes

Space Utilization
(4B keys)
[Graphs: space overhead (%) vs. page size (4KB-32KB), after bulkload (100% full) and on mature trees]
- disk-first fpB+-Trees incur < 9% space overhead
- cache-first fpB+-Trees may use up to 36% more pages in our results

Range Scan Cache Performance
Setup: 100 scans starting at random locations in an index bulkloaded with 3M keys; each range spans 1M keys; 16KB pages.
[Graph: execution time (M cycles) vs. bulkload factor]
- disk-first and cache-first fpB+-Trees achieve substantial speedups over disk-optimized B+-Trees across bulkload factors (4.2 and 3.5 at 100% full)

Range Scan I/O Performance
Setup: SGI Origin 200 with four 180MHz R10000 processors, 128MB memory, and 12 SCSI disks (10 used in the experiments); range scans on mature trees; 10M entries in the range.
[Graph: execution time (s) vs. # of disks used, for plain range scan and jump-pointer array prefetching]
- jump-pointer array prefetching achieves up to a 6.9-fold speedup

Jump-pointer Array Prefetching on IBM DB2
Setup: 8-processor machine (RS/6000 line), 2GB memory, 80 SSA disks; mature index on a 12.8GB table; query: SELECT COUNT(*) FROM data
[Graphs: normalized execution time vs. # of I/O processes and vs. SMP degree (# of parallel processes), for no prefetch, with prefetch, and in memory]
- jump-pointer array prefetching achieves significant speedups