B+-trees. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B = n pages I/O complexity:

Slides:



Advertisements
Similar presentations
B+-Trees and Hashing Techniques for Storage and Index Structures
Advertisements

External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
Advanced Data Structures NTUA Spring 2007 B+-trees and External memory Hashing.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
1 Tree-Structured Indexes Module 4, Lecture 4. 2 Introduction As for any index, 3 alternatives for data entries k* : 1. Data record with key value k 2.
Multilevel Indexing and B+ Trees
ICS 421 Spring 2010 Indexing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 02/18/20101Lipyeow Lim.
CS4432: Database Systems II
B+-trees and Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: page = b bytes or B records (or block) If r is the size of a.
CS CS4432: Database Systems II Basic indexing.
B+-tree and Hashing.
Tree-Structured Indexes Lecture 5 R & G Chapter 9 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.
Tree-Structured Indexes. Introduction v As for any index, 3 alternatives for data entries k* : À Data record with key value k Á Â v Choice is orthogonal.
1 Tree-Structured Indexes Yanlei Diao UMass Amherst Feb 20, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
Indexing (cont.). Insertion in a B+ Tree Another B+ Tree
Tree-Structured Indexes Lecture 5 R & G Chapter 10 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.
1 B+ Trees. 2 Tree-Structured Indices v Tree-structured indexing techniques support both range searches and equality searches. v ISAM : static structure;
CS4432: Database Systems II
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
Storage and Indexing February 26 th, 2003 Lecture 19.
Introduction to Database Systems1 B+-Trees Storage Technology: Topic 5.
 B+ Tree Definition  B+ Tree Properties  B+ Tree Searching  B+ Tree Insertion  B+ Tree Deletion.
1.1 CS220 Database Systems Tree Based Indexing: B+-tree Slides courtesy G. Kollios Boston University via UC Berkeley.
Tree-Structured Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree- and Hash-Structured Indexes Selected Sections of Chapters 10 & 11.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:
B+ Tree Index tuning--. overview B + -Tree Scalability Typical order: 100. Typical fill-factor: 67%. –average fanout = 133 Typical capacities (root at.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
Storage and Indexing. How do we store efficiently large amounts of data? The appropriate storage depends on what kind of accesses we expect to have to.
Data on External Storage – File Organization and Indexing – Cluster Indexes - Primary and Secondary Indexes – Index data Structures – Hash Based Indexing.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Content based on Chapter 10 Database Management Systems, (3 rd.
I/O Cost Model, Tree Indexes CS634 Lecture 5, Feb 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Tree-Structured Indexes Chapter 10
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.
Tree-Structured Indexes R & G Chapter 10 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.
Database Applications (15-415) DBMS Internals- Part III Lecture 13, March 06, 2016 Mohammad Hammoud.
Tree-Structured Indexes. Introduction As for any index, 3 alternatives for data entries k*: – Data record with key value k –  Choice is orthogonal to.
Multilevel Indexing and B+ Trees
Multilevel Indexing and B+ Trees
Tree-Structured Indexes: Introduction
Tree-Structured Indexes
Tree-Structured Indexes
COP Introduction to Database Structures
External Memory Hashing
B+-Trees and Static Hashing
External Memory Hashing
Tree-Structured Indexes
CS222/CS122C: Principles of Data Management Notes #07 B+ Trees
Tree-Structured Indexes
Tree-Structured Indexes
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Tree-Structured Indexes
Tree-Structured Indexes
Indexing 1.
Storage and Indexing.
General External Merge Sort
Tree-Structured Indexes
Indexing February 28th, 2003 Lecture 20.
Tree-Structured Indexes
Tree-Structured Indexes
Tree-Structured Indexes
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #06 B+ trees Instructor: Chen Li.
CS222P: Principles of Data Management UCI, Fall Notes #06 B+ trees
Presentation transcript:

B+-trees

Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B = n pages I/O complexity: in number of pages CPUMemory Disk

I/O complexity An ideal index has space O(N/B), update overhead O(1) or O(log B (N/B)) and search complexity O(a/B) or O(log B (N/B) + a/B) where a is the number of records in the answer But, sometimes CPU performance is also important… minimize cache misses -> don’t waste CPU cycles

B+-tree Records must be ordered over an attribute, SSN, Name, etc. Queries: exact match and range queries over the indexed attribute: “find the name of the student with ID= ” or “find all students with gpa between 3.00 and 3.5”

B+-tree:properties Insert/delete at log F (N/B) cost; keep tree height-balanced. (F = fanout) Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries/pointers. Two types of nodes: index nodes and data nodes; each node is 1 page (disk based method)

Example Root

to keys to keysto keys to keys < 5757  k<8181  k<95 95  Index node

Data node To record with key 57 To record with key 81 To record with key 85 From non-leaf node to next leaf in sequence Struct { Key real; Pointr *long; } entry; Node entry[B];

Insertion Find correct leaf L. Put data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2) Redistribute entries evenly, copy up middle key. Insert index entry pointing to L2 into parent of L. This can happen recursively To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.) Splits “grow” tree; root split increases height. Tree growth: gets wider or one level taller at top.

Deletion Start at root, find leaf L where entry belongs. Remove the entry. If L is at least half-full, done! If L has only d-1 entries, Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). If re-distribution fails, merge L and sibling. If merge occurred, must delete entry (pointing to L or sibling) from parent of L. Merge could propagate to root, decreasing height.

Characteristics Optimal method for 1-d range queries: Space: O(N/B), Updates: O(log B (N/B)), Query:O(log B (N/B) + a/B) Space utilization: 67% for random input Original B-tree: index nodes store also pointers to records

Example Root Range[32, 160]

Other issues Internal node architecture [Lomet01]: Reduce the overhead of tree traversal. Prefix compression: In index nodes store only the prefix that differentiate consecutive sub-trees. Fanout is increased. Cache sensitive B+-tree Place keys in a way that reduces the cache faults during the binary search in each node. Eliminate pointers so a cache line contains more keys for comparison.

References [Lomet01]: David B. Lomet: The Evolution of Effective B-tree: Page Organization and Techniques: A Personal Account. SIGMOD Record 30(3): (2001)SIGMOD Record 30