Lecture 28: Index 3 B+ Trees

Presentation transcript:

Lecture 28: Index 3 B+ Trees
Professor Xiannong Meng, Spring 2018
Lecture and activity contents are based on what Prof. Chris Ré of Stanford used in his CS 145 in the fall 2016 term, with permission.

B+ Trees in Practice
- Typical order: d = 100. Typical fill-factor: 67%.
  - Average fanout = 133.
  - The fill-factor is the fraction of available slots in the B+ Tree that are filled; it is usually < 1 to leave slack for (quicker) insertions.
- Typical capacities:
  - Height 4: $133^4$ = 312,900,721 records
  - Height 3: $133^3$ = 2,352,637 records
- Top levels of the tree sit in the buffer pool:
  - Level 1 = 1 page = 8 Kbytes
  - Level 2 = 133 pages ≈ 1 Mbyte
  - Level 3 = 17,689 pages ≈ 138 MBytes
  - Typically, we only pay for one IO!
- What is the relationship between fill factor F and fanout f? $(2d+1) \cdot F = f$ (with d = 100 and F ≈ 2/3 this gives f ≈ 134, close to the 133 quoted above).
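The capacity and buffer-pool figures follow directly from the fanout. The following is a minimal sketch, assuming a constant fanout of 133 and 8 KB pages as on the slide; the function names are illustrative and not part of the lecture.

```python
PAGE_SIZE_KB = 8  # page size quoted on the slide


def records_at_height(fanout: int, height: int) -> int:
    """Number of records reachable by a tree of the given height."""
    return fanout ** height


def pages_at_level(fanout: int, level: int) -> int:
    """Number of pages on a level, counting the root as level 1."""
    return fanout ** (level - 1)


def level_size_mb(fanout: int, level: int, page_kb: int = PAGE_SIZE_KB) -> float:
    """Memory needed to hold one whole level, in MB (1 MB = 1024 KB)."""
    return pages_at_level(fanout, level) * page_kb / 1024


if __name__ == "__main__":
    for height in (3, 4):
        print(height, records_at_height(133, height))
    for level in (1, 2, 3):
        print(level, pages_at_level(133, level), round(level_size_mb(133, level), 1))
```

Evaluating records_at_height(133, 4) gives 312,900,721 exactly, and the third level (17,689 pages of 8 KB) comes to roughly 138 MB.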

Simple Cost Model for Search
- Let:
  - f = fanout, which is in [d+1, 2d+1] (we'll assume it's constant for our cost model)
  - N = the total number of pages we need to index
  - F = fill-factor (usually ≈ 2/3)
- Our B+ Tree needs to have room to index N / F pages! We have the fill factor in order to leave some open slots for faster insertions.
- What height (h) does our B+ Tree need to be?
  - h = 1 → just the root node: room to index $f$ pages
  - h = 2 → $f$ leaf nodes: room to index $f^2$ pages
  - h = 3 → $f^2$ leaf nodes: room to index $f^3$ pages
  - ...
  - h → $f^{h-1}$ leaf nodes: room to index $f^h$ pages!
- We need a B+ Tree of height $h = \log_f \frac{N}{F}$
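The height formula can be checked numerically. This is a small sketch under the slide's assumptions (constant fanout f, fill-factor F). Since a real tree has a whole number of levels, the helper below returns the smallest integer h with f^h ≥ N / F; that rounding, and the name btree_height, are assumptions of this sketch rather than something stated on the slide.

```python
def btree_height(num_pages: int, fanout: int, fill_factor: float) -> int:
    """Smallest integer h such that fanout**h >= num_pages / fill_factor."""
    needed = num_pages / fill_factor  # room the tree must provide: N / F
    height = 1                        # h = 1 is just the root node
    while fanout ** height < needed:
        height += 1
    return height


if __name__ == "__main__":
    # Hypothetical workload: one million pages, f = 133, F = 2/3.
    print(btree_height(1_000_000, 133, 2 / 3))
```

An integer loop is used instead of math.log so that floating-point error near exact powers of f cannot produce a result that is off by one.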

Simple Cost Model for Search
- Note that if we have B available buffer (memory) pages, by the same logic we can store $L_B$ levels of the B+ Tree in memory, where $L_B$ is the number of levels such that the nodes of all those levels together fit in the buffer:
  $B \ge 1 + f + \dots + f^{L_B - 1} = \sum_{l=0}^{L_B - 1} f^l$
- In summary, to do exact search:
  - We read in one page per level of the tree.
  - However, levels that we can fit in the buffer are free! ($L_B$ is the height of the part of the B+ Tree already in memory.)
  - Finally, we read in the actual record.
- IO Cost: $\log_f \frac{N}{F} - L_B + 1$, where $B \ge \sum_{l=0}^{L_B - 1} f^l$
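The exact-search cost can be worked through as code. This sketch computes L_B as the largest number of top levels whose pages fit in B buffer pages, then applies the cost formula. The function names, the integer rounding of the height, and the cap of L_B at the tree height are assumptions of this sketch.

```python
def btree_height(num_pages: int, fanout: int, fill_factor: float) -> int:
    """Smallest integer h such that fanout**h >= num_pages / fill_factor."""
    height = 1
    while fanout ** height < num_pages / fill_factor:
        height += 1
    return height


def levels_in_buffer(buffer_pages: int, fanout: int) -> int:
    """Largest L_B with 1 + f + ... + f**(L_B - 1) <= buffer_pages."""
    levels, used, level_pages = 0, 0, 1
    while used + level_pages <= buffer_pages:
        used += level_pages    # this whole level fits in the buffer
        level_pages *= fanout  # the next level down is f times larger
        levels += 1
    return levels


def exact_search_io(num_pages: int, fanout: int, fill_factor: float,
                    buffer_pages: int) -> int:
    """IO cost: height - L_B + 1 (the final +1 reads the actual record)."""
    height = btree_height(num_pages, fanout, fill_factor)
    cached = min(levels_in_buffer(buffer_pages, fanout), height)
    return height - cached + 1


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 pages, f = 133, F = 2/3, B = 134.
    print(exact_search_io(1_000_000, 133, 2 / 3, 134))
```

With 134 buffer pages the root and the second level (1 + 133 pages) are cached, so a height-3 tree costs one index page read plus one read for the record.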

Simple Cost Model for Search
- To do range search, we just follow the horizontal pointers.
- The IO cost is that of loading the additional leaf nodes we need to access plus the IO cost of loading each page of the results; we phrase this as "Cost(OUT)".
- IO Cost: $\log_f \frac{N}{F} - L_B + \mathrm{Cost}(OUT)$, where $B \ge \sum_{l=0}^{L_B - 1} f^l$
- Cost(OUT) has one subtle but important twist... let's watch again.

B+ Tree Range Search Animation
- K in [30,85]?
- How many IOs did our friend do? Depends on how the data are arranged.
[Animation: the search descends from the root (30 < 80), through an internal node with keys 20 and 60 (30 in [20,60)), to the leaf containing 30 (30 in [30,40)), then follows the leaf pointers to the right and goes to the data. Not all nodes pictured.]
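Following the horizontal leaf pointers during a range search can be sketched as below. The leaf layout is a hypothetical one chosen for illustration. The leaf chain is modelled as a Python list (the next list element plays the role of the sibling pointer), and the descent from the root is replaced by a binary search over the first key of each leaf.

```python
from bisect import bisect_right


def range_scan(leaves, lo, hi):
    """Return (matching keys, number of leaf pages read) for keys in [lo, hi]."""
    first_keys = [leaf[0] for leaf in leaves]
    # Stand-in for the root-to-leaf descent: last leaf whose first key is <= lo.
    i = max(bisect_right(first_keys, lo) - 1, 0)
    results, leaf_reads = [], 0
    while i < len(leaves):
        leaf_reads += 1                # one IO per leaf page visited
        for key in leaves[i]:
            if key > hi:               # passed the end of the range: stop
                return results, leaf_reads
            if key >= lo:
                results.append(key)
        i += 1                         # follow the pointer to the right sibling
    return results, leaf_reads


if __name__ == "__main__":
    # Hypothetical leaf level, in key order.
    leaves = [[10, 15, 18], [20, 30, 40, 50], [60, 65], [80, 85, 90]]
    print(range_scan(leaves, 30, 85))
```

The count returned here covers only leaf pages. Fetching the matching records themselves is the Cost(OUT) part, which depends on how the data are laid out.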

Clustered Indexes
An index is clustered if the underlying data is ordered in the same way as the index's data entries.

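Clustered vs. Unclustered Index
[Figure: index entries pointing to data records, shown side by side for a clustered and an unclustered index. In the clustered case the data records are stored in key order (19 22 27 28 30 33 35 37); in the unclustered case they are not (19 33 27 22 37 28 35 30).]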

Clustered vs. Unclustered Index
- Recall that for a disk with block access, sequential IO is much faster than random IO.
- For exact search, there is no difference between clustered and unclustered.
- For range search over R values, the difference is between 1 random IO + R sequential IOs (clustered) and R random IOs (unclustered):
  - A random IO costs ~10 ms (sequential is much, much faster).
  - For R = 100,000 records, that is the difference between ~10 ms and ~17 min!
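The ~10 ms versus ~17 min comparison is plain arithmetic, sketched below. The 10 ms random IO cost comes from the slide. The sequential IO cost is left as a parameter that defaults to 0, an assumption that mirrors the slide's treatment of sequential reads as negligible. The function names are illustrative.

```python
RANDOM_IO_MS = 10.0  # cost of one random IO, as quoted on the slide


def clustered_range_ms(num_records: int, sequential_io_ms: float = 0.0) -> float:
    """1 random IO to reach the range, then sequential reads (assumed ~free)."""
    return RANDOM_IO_MS + num_records * sequential_io_ms


def unclustered_range_ms(num_records: int) -> float:
    """One random IO per matching record."""
    return num_records * RANDOM_IO_MS


if __name__ == "__main__":
    r = 100_000
    print(clustered_range_ms(r), "ms")
    print(unclustered_range_ms(r) / 60_000, "min")
```

For R = 100,000 the unclustered case comes to 1,000,000 ms, which is 1,000 s or about 16.7 minutes.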

Fast Insertions & Self-Balancing
- We won't go into the specifics of the B+ Tree insertion algorithm, but it has several attractive qualities:
  - ~ Same cost as exact search
  - Self-balancing: the B+ Tree remains balanced (with respect to height) even after an insert
- B+ Trees are also (relatively) fast for single insertions!
  - However, insertion can become a bottleneck if there are many insertions (if the fill-factor slack is used up...)

Bulk Loading
[Figure: bulk loading. Input: a sorted file (11 15 21 22 27 28 30 33 35 37). Output: a B+ Tree built over it.]
Message: Bulk Loading is faster!
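One common way to bulk load is to build the tree bottom-up from the sorted input instead of inserting keys one at a time. The sketch below illustrates that idea under simplifying assumptions and is not necessarily the exact procedure the lecture has in mind. Nodes hold keys only, child pointers are implied by position, leaves are packed completely full, and the names bulk_load, leaf_capacity, and fanout are hypothetical.

```python
def bulk_load(sorted_keys, leaf_capacity, fanout):
    """Build a B+ Tree bottom-up; returns the levels from leaves up to the root."""
    # Step 1: pack the sorted keys into leaf pages, left to right.
    leaves = [sorted_keys[i:i + leaf_capacity]
              for i in range(0, len(sorted_keys), leaf_capacity)]
    levels = [leaves]
    # mins[j] is the smallest key stored under node j of the current level.
    mins = [leaf[0] for leaf in leaves]
    # Step 2: group up to `fanout` children under one parent until one node is left.
    while len(levels[-1]) > 1:
        groups = [mins[i:i + fanout] for i in range(0, len(mins), fanout)]
        # A parent with k children keeps k - 1 separator keys: the smallest
        # key of every child except the first. (A trailing group with a single
        # child yields an empty, underfull node; a real loader would rebalance.)
        levels.append([group[1:] for group in groups])
        mins = [group[0] for group in groups]
    return levels


if __name__ == "__main__":
    keys = [11, 15, 21, 22, 27, 28, 30, 33, 35, 37]
    for level in reversed(bulk_load(keys, leaf_capacity=2, fanout=3)):
        print(level)
```

Each input key is written once and each index level is built in a single left-to-right pass, with no root-to-leaf traversal per key. That is the intuition for why loading from a sorted file beats repeated single insertions.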

Summary
- We create indexes over tables in order to support fast (exact and range) search and insertion over multiple search keys.
- B+ Trees are one index data structure which supports very fast exact and range search & insertion via high fanout.
  - Clustered vs. unclustered makes a big difference for range queries too.