CS522 Advanced database Systems

Slides:

Advertisements

Similar presentations

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.

Advertisements

1 Tree-Structured Indexes Module 4, Lecture 4. 2 Introduction As for any index, 3 alternatives for data entries k* : 1. Data record with key value k 2.

ICS 421 Spring 2010 Indexing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 02/18/20101Lipyeow Lim.

Tree-Structured Indexes Lecture 5 R & G Chapter 9 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.

Tree-Structured Indexes Lecture 17 R & G Chapter 9 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.

Tree-Structured Indexes. Introduction v As for any index, 3 alternatives for data entries k* : À Data record with key value k Á Â v Choice is orthogonal.

1 Tree-Structured Indexes Yanlei Diao UMass Amherst Feb 20, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.

1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Range Searches  `` Find all students with gpa > 3.0 ’’  If data is in sorted file, do.

Tree-Structured Indexes Lecture 5 R & G Chapter 10 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.

Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,

Introduction to Database Systems1 B+-Trees Storage Technology: Topic 5.

Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.

Tree-Structured Indexes Lecture 5 R & G Chapter 10 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.

1.1 CS220 Database Systems Tree Based Indexing: B+-tree Slides courtesy G. Kollios Boston University via UC Berkeley.

Tree-Structured Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree- and Hash-Structured Indexes Selected Sections of Chapters 10 & 11.

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.

B+ Tree Index tuning--. overview B + -Tree Scalability Typical order: 100. Typical fill-factor: 67%. –average fanout = 133 Typical capacities (root at.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.

1 Database Systems ( 資料庫系統 ) November 1, 2004 Lecture #8 By Hao-hua Chu ( 朱浩華 )

CPSC 404, Laks V.S. Lakshmanan1 Tree-Structured Indexes (Brass Tacks) Chapter 10 Ramakrishnan and Gehrke (Sections )

1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Content based on Chapter 10 Database Management Systems, (3 rd.

I/O Cost Model, Tree Indexes CS634 Lecture 5, Feb 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.

Tree-Structured Indexes Chapter 10

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.

Database Applications (15-415) DBMS Internals- Part IV Lecture 15, March 13, 2016 Mohammad Hammoud.

Tree-Structured Indexes. Introduction As for any index, 3 alternatives for data entries k*: – Data record with key value k –  Choice is orthogonal to.

Database Systems (資料庫系統)

CS522 Advanced database Systems

CS522 Advanced database Systems

Tree-Structured Indexes

Tree-Structured Indexes

Tree-Structured Indexes (Brass Tacks)

Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.

CS522 Advanced database Systems

Tree Indices Chapter 11.

CS522 Advanced database Systems

Database System Implementation CSE 507

Extra: B+ Trees CS1: Java Programming Colorado State University

Database Management Systems (CS 564)

Chapter 11: Indexing and Hashing

Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.

CS222P: Principles of Data Management Notes #6 Index Overview and ISAM Tree Index Instructor: Chen Li.

Tree-Structured Indexes

CS222/CS122C: Principles of Data Management Notes #07 B+ Trees

Introduction to Database Systems Tree Based Indexing: B+-tree

Tree- and Hash-Structured Indexes

Tree-Structured Indexes

Tree-Structured Indexes

Tree-Structured Indexes

Tree-Structured Indexes

CS222/CS122C: Principles of Data Management Notes #6 Index Overview and ISAM Tree Index Instructor: Chen Li.

Database Systems (資料庫系統)

CPS216: Advanced Database Systems

Storage and Indexing.

Database Systems (資料庫系統)

Tree-Structured Indexes

Tree-Structured Indexes

Tree-Structured Indexes

Tree-Structured Indexes

Tree- and Hash-Structured Indexes

CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #05 Index Overview and ISAM Tree Index Instructor: Chen Li.

CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #06 B+ trees Instructor: Chen Li.

CS222P: Principles of Data Management UCI, Fall Notes #06 B+ trees

Presentation transcript:

CS522 Advanced database Systems 9/14/2018 CS522 Advanced database Systems 7. B+ tree 2 Huiping Guo Department of Computer Science California State University, Los Angeles

Outline Handle duplicates Key compression Bulk tree loading algorithm 9/14/2018 Outline Handle duplicates Key compression Bulk tree loading algorithm 7. B+ Tree 2 CS522_S16

Handle duplicates Duplicate keys Several data entries have the same key value Duplicate keys are ignored in the previous search, insertion and deletion How to handle duplicates? Method 1: Use overflow page (like ISAM) Method2: treat duplicates as regular data entries We need to change the definition of B+ tree The left pointer of a node points to all data entries with keys less than OR EQUAL TO the index key The right pointer of a node points to all data entries with keys greater than or equal to the index key 7. B+ Tree 2 CS522_S16

Treat duplicates as regular data entries Example use Alternative 2 Problem: how to find the leftmost data entry 2* 5* 14* 22* 27* 29* 33* 34* 38* 39* Root 30 5 17 5* We need to modify the basic search algorithm to find the leftmost data entry 7. B+ Tree 2 CS522_S16

Find the left-most data entry Index entries may have duplicates Use left-most entries on an index page Find the leaf page Scan left Scan right 7. B+ Tree 2 CS522_S16

Example of handling duplicates 7. B+ Tree 2 CS522_S16

Overflow page approach (d=2) Index on age 11 12 18 19 Primary page 18 18 18 18 19 19 19 19 18 18 18 7. B+ Tree 2 CS522_S16

Non overflow page approach Index on age 18 18 19 11 12 18 18 18 18 18 18 18 18 19 19 19 19 19 search for students with age >= 19? 7. B+ Tree 2 CS522_S16

Non overflow page approach (cont.) Index on gpa 3.4 3.4 3.8 1.8 2.2 3.2 3.4 3.4 3.4 3.4 3.4 3.4 3.4 3.5 3.8 3.8 3.8 3.8 What if we search for students with gpa=3.8? 7. B+ Tree 2 CS522_S16

Non overflow page approach (cont.) Problems with this method If a record is deleted, we need to find the corresponding data entry to delete The process is not efficient because we may have to check several duplicate entries with the same key value Solutions Treat rid value in the data entry as a part of the search key This solution turns the index into unique index (no duplicates) 7. B+ Tree 2 CS522_S16

Key Compression Important to increase fan-out. (Why?) 9/14/2018 Key Compression Important to increase fan-out. (Why?) The height of a tree is determined by: The number of data entries The number of index entries ( fan out) Height=logfan_out(# of data entries) Given the same # of data entries, the larger fan_out, the less height The number of page I/Os will be decreased if the height of the tree is decreased 7. B+ Tree 2 CS522_S16

Key Compression What determine fan_out? 9/14/2018 Key Compression What determine fan_out? The size of the an index entry The larger an index entry, the fewer index entries An index entry contains: A Key A page pointer Key values in index entries only `direct traffic’ can often compress them 7. B+ Tree 2 CS522_S16

key compression example Two adjacent index entries in a node The search key values are David Smith Devarakonda To discriminate the two values, it’s sufficient to store the abbreviated forms “Da” and “De” 7. B+ Tree 2 CS522_S16

Prefix key compression To compress an index entry we must examine the largest key value the subtree to the left of the key and the smallest smallest key value the the subtree to the right of the key Daniel Lee David Smith Devarakonda Dante Wu Darius Rex Davey Jones Compress “David Smith” to “Dav” or “Davi”? 7. B+ Tree 2 CS522_S16

Bulk Loading of a B+ Tree 9/14/2018 Bulk Loading of a B+ Tree How to create a B+ tree on existing collection of data records? Method 1: Start with an empty tree Repeatedly insert records using standard insertion algorithm slow Method 2: Bulk Loading can be done much more efficiently. 7. B+ Tree 2 CS522_S16 20

Initialization Sort all data entries Allocate an empty page to serve as root and insert a pointer to the first page Add one entry (<low key value on page, pointer to page>)to the root page for each page of the sorted data entries. Root Sorted pages of data entries; not yet in B+ tree 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 7. B+ Tree 2 CS522_S16

Insert node Insert two index entries to in a new (root) page. 6 9 10 11 Insert two index entries to in a new (root) page. Root 6 10 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 7. B+ Tree 2 CS522_S16

Insert node 12 13 Root 10 6 12 7. B+ Tree 2 CS522_S16 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 7. B+ Tree 2 CS522_S16

Insert node 20 22 Root 10 6 12 20 7. B+ Tree 2 CS522_S16 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 7. B+ Tree 2 CS522_S16

Insert node 23 31 Root 10 20 6 12 23 7. B+ Tree 2 CS522_S16 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 7. B+ Tree 2 CS522_S16

Insert node 35 36 Root 10 20 6 12 23 35 7. B+ Tree 2 CS522_S16 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 7. B+ Tree 2 CS522_S16

Insert node 38 41 Root 7. B+ Tree 2 CS522_S16 20 10 35 6 12 23 38 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 7. B+ Tree 2 CS522_S16

Insert node 44 Root 7. B+ Tree 2 CS522_S16 20 10 35 6 12 23 38 44 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* 7. B+ Tree 2 CS522_S16

Summary of Bulk Loading 9/14/2018 Summary of Bulk Loading Index entries for leaf pages always entered into right-most index page just above leaf level. When this fills up, it splits. Split may go up right-most path to the root. Much faster than repeated inserts 7. B+ Tree 2 CS522_S16 10