Data Organization - B-trees. 11.2Database System Concepts A simple index Brighton A-217 700 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perry.

Presentation transcript:

Data Organization - B-trees

11.2 Database System Concepts: A simple index
[Figure: data file with records for Brighton, Downtown, Downtown, Mianus, Perry, ...; index file with index records A-101, A-102, A-110, A-215, ...]
Index of depositors on acct_no.
To answer a query for “acct_no=A-110” we:
1. Do a binary search on the index file, searching for A-110
2. “Chase” the pointer of the index record
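The two-step lookup above can be sketched in Python. This is only an illustration under assumptions: the record layout `(branch, acct_no, balance)`, the names `data_file`, `index_file` and `lookup`, and the use of list positions as record pointers are hypothetical, and the sample rows should be read as illustrative values.

```python
from bisect import bisect_left

# Data file in physical order. The tuple layout (branch, acct_no, balance)
# and the sample rows are assumptions made for this sketch.
data_file = [
    ("Brighton", "A-217", 700),
    ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600),
    ("Mianus", "A-215", 700),
]

# Index file: (search key, pointer) pairs sorted by key.
# A "pointer" here is simply the record's position in data_file.
index_file = sorted((rec[1], pos) for pos, rec in enumerate(data_file))
index_keys = [key for key, _ in index_file]

def lookup(acct_no):
    """Step 1: binary search on the index; step 2: chase the pointer."""
    i = bisect_left(index_keys, acct_no)
    if i < len(index_keys) and index_keys[i] == acct_no:
        _, pointer = index_file[i]
        return data_file[pointer]
    return None  # no index record for this key
```

Calling `lookup("A-110")` finds the index record by binary search and then follows its pointer into the data file, even though the data file itself is not sorted on `acct_no`.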

11.3 Database System Concepts: Index Choices
1. Primary: index search key = physical order search key, vs. Secondary: all other indexes
   Q: how many primary indices per relation?
2. Dense: index entry for every search key value, vs. Sparse: some search key values not in the index
3. Single level vs. Multilevel (index on the indices)

11.4 Database System Concepts: Measuring ‘goodness’
On what basis do we compare different indices?
1. Access type: what type of queries can be answered:
   - selection queries (ssn = 123)?
   - range queries (100 <= ssn <= 200)?
2. Access time: what is the cost of evaluating queries
   - measured in # of block accesses
3. Maintenance overhead: cost of insertion / deletion? (also in block accesses)
4. Space overhead: in # of blocks needed to store the index

11.5 Database System Concepts: Indexing
Primary (or clustering) index on SSN

11.6 Database System Concepts: Indexing
Secondary (or non-clustering) index: duplicates may exist (e.g., an address index).
Can have many secondary indices, but only one primary index.

11.7 Database System Concepts: Indexing
Secondary index: typically, with ‘postings lists’

11.8 Database System Concepts: Indexing
Primary/sparse index on ssn (primary key); index entries: >=123, >=456

11.9 Database System Concepts: Indexing
Secondary / dense index
Secondary on a candidate key: no duplicates, no need for posting lists

11.10 Database System Concepts: Summary
             Dense    Sparse
Primary      rare     usual
Secondary    usual
- All combinations are possible
- at most one sparse/clustering index
- as many as desired dense indices
- usually: one primary index (probably sparse) and a few secondary indices (non-clustering)

11.11 Database System Concepts: ISAM
What if index is too large to search in memory?
2nd level sparse index on the values of the 1st level
[Figure: index entries >=123, >=456, each pointing to a block]
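A sparse index keeps one entry per block rather than one per record, so a lookup first picks the block and then scans inside it. The sketch below assumes a file sorted on ssn; the block contents, the names `blocks`, `first_keys` and `sparse_lookup`, and the sample values are all made up for illustration.

```python
from bisect import bisect_right

# Data blocks of a file sorted on ssn (sample values are invented).
blocks = [
    [(3, "smith"), (50, "jones")],
    [(123, "tompson"), (234, "johnson")],
    [(456, "stevens"), (567, "adams")],
]

# Sparse index: one entry per block, holding the smallest ssn of that block.
first_keys = [block[0][0] for block in blocks]  # [3, 123, 456]

def sparse_lookup(ssn):
    """Pick the last index entry with key <= ssn, then scan that block."""
    b = bisect_right(first_keys, ssn) - 1
    if b < 0:
        return None  # ssn is smaller than every key in the file
    for key, name in blocks[b]:
        if key == ssn:
            return name
    return None
```

A second-level index in the ISAM sense would apply the same `bisect_right` step once more, this time over the first keys of the first-level index blocks.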

11.12-11.15 Database System Concepts: ISAM - observations
What about insertions/deletions?
- e.g., insert the record “124; peterson; fifth ave.”: it does not fit in its block and goes to an overflow block (overflows)
- Problems? overflow chains may become very long - what to do?
- thus: shut-down & reorganize
- start with ~80% utilization

11.16 Database System Concepts: ISAM - observations
- if index is too large, store it on disk and keep index on the index (in memory)
- usually two levels of indices, one first-level entry per disk block
- typically, blocks: 80% full initially (what are potential problems / inefficiencies?)

11.17 Database System Concepts: So far
- … indices (like ISAM) suffer in the presence of frequent updates
- alternative indexing structure: B-trees

11.18 Database System Concepts: Overview
- primary / secondary indices
- multilevel (ISAM)
- B-trees, B+-trees
- hashing
  - static hashing
  - dynamic hashing

11.19 Database System Concepts: B-trees
- the most successful family of index schemes (B-trees, B+-trees, B*-trees)
- can be used for primary/secondary, clustering/non-clustering index
- balanced “n-way” search trees

11.20 Database System Concepts: B-trees
E.g., B-tree of order 3:
[Figure: B-tree whose root holds keys 6 and 9, with subtrees for <6, for >6 and <9, and for >9]

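11.21 Database System Concepts: B-tree Nodes
Node layout: p1, v1, p2, v2, …, v(n-1), pn
- Key values are ordered
- p1 leads to values v < v1; the pointer between v1 and v2 leads to v1 <= v < v2; pn leads to v(n-1) <= v
- MAXIMUM: n pointer values
- MINIMUM: ⌈n/2⌉ pointer values (Exception: root’s minimum = 2)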
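The occupancy rules above can be written down as a small check. This is a minimal sketch; the function names `pointer_bounds` and `is_valid_node` are hypothetical, and a node is represented only by its key list and its pointer count.

```python
from math import ceil

def pointer_bounds(n, is_root=False):
    """Return (minimum, maximum) number of pointers for a B-tree of order n."""
    minimum = 2 if is_root else ceil(n / 2)
    return minimum, n

def is_valid_node(keys, num_pointers, n, is_root=False):
    """Keys strictly ordered, one more pointer than keys, pointer count in bounds."""
    lo, hi = pointer_bounds(n, is_root)
    ordered = all(a < b for a, b in zip(keys, keys[1:]))
    return ordered and num_pointers == len(keys) + 1 and lo <= num_pointers <= hi
```

For order 3 the bounds are 2 to 3 pointers (1 to 2 keys) for every node; for order 5 a non-root node needs at least 3 pointers, while the root may have as few as 2.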

11.22 Database System Concepts: Properties
- “block aware” nodes: each node -> disk page
- O(log_B(N)) for everything! (ins/del/search)
- typically, if m = …, then … levels
- utilization >= 50%, guaranteed; on average 69%
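To get a feel for the logarithmic bound, the sketch below counts how many levels are needed before a tree with a given fanout can reach a given number of keys. It is a rough estimate that assumes every node has exactly `fanout` children; the function name `levels_needed` and the sample numbers in the usage note are hypothetical and are not the values from the slide.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels L such that fanout**L >= num_keys."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, reachable = 0, 1
    while reachable < num_keys:
        reachable *= fanout  # one more level multiplies the reach by the fanout
        levels += 1
    return levels
```

With a hypothetical fanout of 100, a million keys fit within `levels_needed(1_000_000, 100)` levels, which is why each search costs only a handful of block accesses.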

11.23-11.28 Database System Concepts: Queries
- Algorithm for exact match query? (e.g., ssn=8?)
[Figure: B-tree of order 3 with root keys 6 and 9; the search path is traced from the root down to a leaf]
- H steps (= disk accesses), where H is the height of the tree
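An exact-match search can be sketched as a root-to-leaf descent that counts one step per node visited. The `Node` class, the function `search`, and the concrete contents of T0 (root keys 6 and 9 over leaves [1, 3], [7] and [13]) are assumptions; the leaf contents are chosen to be consistent with the insertion and deletion examples in these slides, since the figure itself is not in the transcript.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted key values
        self.children = children or []    # empty list means leaf

def search(root, key):
    """Return (found, steps); steps = nodes visited = disk accesses."""
    node, steps = root, 0
    while True:
        steps += 1
        i = bisect_left(node.keys, key)   # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, steps
        if not node.children:             # reached a leaf without a match
            return False, steps
        node = node.children[i]           # follow the pointer for this key range

# Assumed shape of the example tree T0.
t0 = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
```

Searching for ssn=8 visits the root and then the middle leaf, so it ends after two steps without a match; the step count never exceeds the height of the tree.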

11.29-11.31 Database System Concepts: Queries
- what about range queries? (e.g., 5<salary<8)
- Proximity / nearest neighbor searches? (e.g., salary ~ 8)
[Figure: B-tree of order 3 with root keys 6 and 9]
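One way to answer a range query on a B-tree is an in-order traversal that skips subtrees which cannot contain qualifying keys. The sketch below is an assumption-laden illustration: the `Node` class, the function `range_query`, and the leaf contents of the example tree are hypothetical.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted key values
        self.children = children or []    # empty list means leaf

def range_query(node, lo, hi, out=None):
    """Collect, in sorted order, all keys with lo < key < hi."""
    if out is None:
        out = []
    for i, k in enumerate(node.keys):
        # The subtree left of k holds keys smaller than k: visit it only if
        # some of them can exceed lo.
        if node.children and lo < k:
            range_query(node.children[i], lo, hi, out)
        if lo < k < hi:
            out.append(k)
        if k >= hi:                       # everything further right is too large
            return out
    if node.children:                     # rightmost subtree
        range_query(node.children[-1], lo, hi, out)
    return out

# Assumed example tree: root keys 6 and 9 over leaves [1, 3], [7], [13].
t0 = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
```

Note that the traversal has to move up and down between levels to produce the keys in order, which is the back-tracking that B+-trees later avoid.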

11.32 Database System Concepts: B-trees: Insertion
- Insert in leaf; on overflow, push middle up (recursively)
- split: preserves B-tree properties

11.33-11.34 Database System Concepts: B-trees
Easy case: Tree T0; insert ‘8’
[Figure: T0 with root keys 6 and 9; 8 is added to a leaf]

11.35-11.38 Database System Concepts: B-trees
Hardest case: Tree T0; insert ‘2’
- the leaf overflows: push middle up
- Ovf (the parent overflows too); push middle
- Final state

11.39 Database System Concepts: B-trees - insertion
Q: What if there are two middles? (e.g., order 4)
A: either one is fine

11.40 Database System Concepts: B-trees: Insertion
- Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’)
- split: preserves all B-tree properties (!!)
- notice how it grows: height increases when root overflows & splits
- Automatic, incremental re-organization (contrast with ISAM!)

11.41 Database System Concepts: Pseudo-code
INSERTION OF KEY ’K’
find the correct leaf node ’L’;
if ( ’L’ overflows ) {
    split ’L’, by pushing the middle key upstairs to parent node ’P’;
    if ( ’P’ overflows ) {
        repeat the split recursively;
    }
} else {
    add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */
}
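The pseudo-code above can be turned into a short runnable sketch. It is one possible reading, not the slides' own implementation: it adds the key to the leaf first and then splits if the node holds more than order-1 keys, it ignores duplicate keys, and the names `Node`, `insert`, `_insert` and `make_t0` as well as the leaf contents of T0 are assumptions.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted key values
        self.children = children or []    # empty list means leaf

def insert(root, key, order=3):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, order)
    if split is None:
        return root
    middle, right = split                 # root overflowed: the tree grows a level
    return Node([middle], [root, right])

def _insert(node, key, order):
    """Insert below node; return (middle key, new right node) if node split."""
    if not node.children:
        insort(node.keys, key)            # leaf: add the key, keeping key order
    else:
        i = bisect_left(node.keys, key)
        split = _insert(node.children[i], key, order)
        if split is not None:             # child split: absorb the pushed-up key
            middle, right = split
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) <= order - 1:
        return None                       # no overflow
    mid = len(node.keys) // 2             # overflow: push the middle key upstairs
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right

def make_t0():
    # Assumed shape of T0: root keys 6 and 9 over leaves [1, 3], [7], [13].
    return Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
```

Inserting 8 into T0 only touches one leaf, while inserting 2 overflows the leaf and then the root, so the tree gains a level with 6 as the new root key.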

11.42 Database System Concepts: Overview
- primary / secondary indices
- multilevel (ISAM)
- B-trees
  - Dfn, Search, insertion, deletion
- B+-trees
- hashing

11.43 Database System Concepts: Deletion
Rough outline of algo:
- Delete key;
- on underflow, may need to merge
In practice, some implementers just allow underflows to happen…

11.44-11.45 Database System Concepts: B-trees – Deletion
Easiest case: Tree T0; delete ‘3’

11.46 Database System Concepts: B-trees – Deletion
- Case1: delete a key at a leaf – no underflow
- Case2: delete non-leaf key – no underflow
- Case3: delete leaf-key; underflow, and ‘rich sibling’
- Case4: delete leaf-key; underflow, and ‘poor sibling’

11.47 Database System Concepts: B-trees – Deletion
- Case1: delete a key at a leaf – no underflow (delete 3 from T0)

11.48-11.52 Database System Concepts: B-trees – Deletion
- Case2: delete a key at a non-leaf – no underflow (e.g., delete 6 from T0)
- Delete & promote, i.e., key 3 takes the place of 6
- FINAL TREE: root keys 3 and 9
- Q: How to promote?
- A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree)
- Observation: every deletion eventually becomes a deletion of a leaf key
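The "delete & promote" step can be sketched as follows. The sketch covers only Case2, so it assumes the leaf that gives up its largest key does not underflow; the names `Node` and `delete_internal_key` and the leaf contents of T0 are assumptions.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted key values
        self.children = children or []    # empty list means leaf

def delete_internal_key(node, i):
    """Delete node.keys[i] by promoting the largest key of its left sub-tree.

    Assumes the leaf that gives up the key does not underflow (Case2).
    """
    leaf = node.children[i]
    while leaf.children:                  # rightmost path of the left sub-tree
        leaf = leaf.children[-1]
    node.keys[i] = leaf.keys.pop()        # the deletion moves down to the leaf

# Assumed shape of T0: root keys 6 and 9 over leaves [1, 3], [7], [13].
t0 = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
delete_internal_key(t0, 0)                # delete 6 from T0
```

After the call the root holds 3 and 9 and the left leaf holds only 1, which illustrates the observation that the deletion ends up removing a key from a leaf.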

11.54-11.60 Database System Concepts: B-trees – Deletion
- Case3: underflow & ‘rich sibling’ (e.g., delete 7 from T0)
- ‘rich’ = can give a key, without underflowing
- Delete & borrow from the rich sibling
- ‘borrowing’ a key: THROUGH the PARENT! (moving the sibling’s key directly into the underflowing leaf: NO!!)
- i.e., the parent key 6 moves down into the underflowing leaf, and the sibling’s key 3 moves up into the parent
- FINAL TREE: root keys 3 and 9

11.62-11.68 Database System Concepts: B-trees – Deletion
- Case4: underflow & ‘poor sibling’ (e.g., delete 13 from T0)
- A: merge w/ ‘poor’ sibling
- Merge, by pulling a key from the parent
- exact reversal from insertion: ‘split and push up’, vs. ‘merge and pull down’
- i.e., key 9 is pulled down from the parent and merged with the sibling’s keys
- FINAL TREE: root key 6
- -> ‘pull key from parent, and merge’
- Q: What if the parent underflows?
- A: repeat recursively

11.69-11.70 Database System Concepts: B-tree deletion - pseudocode
DELETION OF KEY ’K’
locate key ’K’, in node ’N’;
if ( ’N’ is a non-leaf node ) {
    delete ’K’ from ’N’;
    find the immediately larger key ’K1’; /* which is guaranteed to be on a leaf node ’L’ */
    copy ’K1’ in the old position of ’K’;
    invoke this DELETION routine on ’K1’ from the leaf node ’L’;
} else { /* ’N’ is a leaf node */
    delete ’K’ from ’N’;
    if ( ’N’ underflows ) {
        let ’N1’ be the sibling of ’N’;
        if ( ’N1’ is "rich" ) { /* i.e., N1 can lend us a key */
            borrow a key from ’N1’ THROUGH the parent node;
        } else { /* N1 is 1 key away from underflowing */
            MERGE: pull the key from the parent ’P’, and merge it with the keys of ’N’ and ’N1’ into a new node;
            if ( ’P’ underflows ) { repeat recursively }
        }
    }
}
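The leaf branch of the pseudocode (borrow through the parent, or merge and pull down) can be sketched for a single level. This is a simplified illustration: it handles only a parent whose children are leaves, prefers the left sibling, and does not repeat the repair if the parent itself underflows. The names `Node`, `min_keys`, `delete_from_leaf` and `make_t0`, and the leaf contents of T0, are assumptions.

```python
from math import ceil

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted key values
        self.children = children or []    # empty list means leaf

def min_keys(order):
    return ceil(order / 2) - 1            # minimum pointers minus one

def delete_from_leaf(parent, i, key, order=3):
    """Delete key from leaf child i of parent; repair an underflow if needed."""
    child = parent.children[i]
    child.keys.remove(key)
    if len(child.keys) >= min_keys(order):
        return                            # Case1: no underflow
    j = i - 1 if i > 0 else i + 1         # sibling index (left one preferred)
    sibling = parent.children[j]
    sep = min(i, j)                       # parent key separating the two nodes
    if len(sibling.keys) > min_keys(order):
        # Case3, rich sibling: borrow THROUGH the parent.
        if j < i:
            child.keys.insert(0, parent.keys[sep])
            parent.keys[sep] = sibling.keys.pop()
        else:
            child.keys.append(parent.keys[sep])
            parent.keys[sep] = sibling.keys.pop(0)
    else:
        # Case4, poor sibling: pull the separating key down and merge.
        left, right = (sibling, child) if j < i else (child, sibling)
        left.keys += [parent.keys.pop(sep)] + right.keys
        del parent.children[sep + 1]      # the right node is absorbed

def make_t0():
    # Assumed shape of T0: root keys 6 and 9 over leaves [1, 3], [7], [13].
    return Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
```

On the assumed T0, deleting 7 borrows through the parent (6 moves down, 3 moves up), and deleting 13 merges with the poor sibling by pulling 9 down, which leaves 6 as the only root key.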

11.71-11.73 Database System Concepts: B-trees in practice
In practice:
- no empty leaves;
- ptrs to records
[Figure: the order-3 B-tree as drawn in theory vs. in practice, where each key (Ssn) carries a pointer to its record]

11.74 Database System Concepts: B-trees in practice
In practice, the formats are:
- leaf nodes: (v1, rp1, v2, rp2, …, vn, rpn)
- non-leaf nodes: (p1, v1, rp1, p2, v2, rp2, …)

11.75 Database System Concepts: Overview
- primary / secondary indices
- multilevel (ISAM)
- B-trees
- B+-trees
- hashing

11.76-11.77 Database System Concepts: B+ trees - Motivation
- B-tree – print keys in sorted order:
- B-tree needs back-tracking – how to avoid it?
[Figure: B-tree of order 3 with root keys 6 and 9]

11.78 Database System Concepts: Solution: B+-trees
- facilitate sequential ops
- They string all leaf nodes together
- AND
- replicate keys from non-leaf nodes, to make sure every key appears at the leaf level
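Because the leaves are strung together and hold every key, a range scan needs one descent followed by a walk along the leaf chain. The sketch below is illustrative only: the classes `Leaf` and `Internal`, the functions `find_leaf` and `range_scan`, and the leaf contents [1, 3], [6, 7], [9, 13] under root keys 6 and 9 are assumptions.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None                  # pointer to the next leaf in key order

class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def find_leaf(node, key):
    """Descend to the leaf that would hold key (keys >= separator go right)."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Return all keys with lo <= key <= hi by walking the leaf chain."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next                  # no back-tracking through upper levels
    return out

# Assumed example: order-3 B+-tree with root keys 6 and 9.
l1, l2, l3 = Leaf([1, 3]), Leaf([6, 7]), Leaf([9, 13])
l1.next, l2.next = l2, l3
root = Internal([6, 9], [l1, l2, l3])
```

Here 6 and 9 appear both in the root and at the leaf level, so the scan never has to return to an internal node once it has reached the first leaf.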

11.79 Database System Concepts: B+-trees
E.g., B+-tree of order 3:
[Figure: root (internal node) with keys 6 and 9 and branches <6, >=6 and <9, >=9; leaf nodes point to records in the Data File, e.g. (3, Joe, 23), (3, Bob, 23), (4, John, 23), …]

11.80-11.81 Database System Concepts: B+ tree insertion
INSERTION OF KEY ’K’
find the correct leaf node ’L’;
insert search-key value to ’L’ such that the keys are in order;
if ( ’L’ overflows ) {
    split ’L’;
    insert (i.e., COPY) smallest search-key value of new node to parent node ’P’;
    if ( ’P’ overflows ) {
        repeat the B-tree split procedure recursively; /* Notice: the B-TREE split; NOT the B+-tree */
    }
}
/* ATTENTION: a split at the LEAF level is handled by COPYING the middle key upstairs; a split at a higher level is handled by PUSHING the middle key upstairs */
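The copy-versus-push distinction can be made concrete with a small sketch. It is one possible reading of the pseudocode, not the slides' implementation: the `Node` class, the functions `insert` and `_insert`, the helper `make_tree`, and the leaf contents [1, 3], [6, 7], [9, 13] of the example tree are all assumptions, and duplicate keys are ignored.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []    # empty list means leaf
        self.next = None                  # leaf chain; unused on internal nodes

def insert(root, key, order=3):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, order)
    if split is None:
        return root
    up, right = split
    return Node([up], [root, right])

def _insert(node, key, order):
    if not node.children:                 # leaf level
        insort(node.keys, key)
        if len(node.keys) <= order - 1:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])     # the middle key stays in the new leaf
        node.keys = node.keys[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right       # COPY the smallest key of the new node
    i = bisect_right(node.keys, key)      # keys >= separator go right
    split = _insert(node.children[i], key, order)
    if split is None:
        return None
    up, right = split
    node.keys.insert(i, up)
    node.children.insert(i + 1, right)
    if len(node.keys) <= order - 1:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                   # PUSH: the middle key leaves this level
    new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up, new_right

def make_tree():
    # Assumed example: root keys 6 and 9 over leaves [1, 3], [6, 7], [9, 13].
    a, b, c = Node([1, 3]), Node([6, 7]), Node([9, 13])
    a.next, b.next = b, c
    return Node([6, 9], [a, b, c])
```

Inserting 8 overflows the leaf [6, 7], so 7 is copied up; the root then overflows and 7 is pushed, which leaves 7 as the single key of the new root while 7 still appears at the leaf level.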

11.82-11.87 Database System Concepts: B+ trees - insertion
E.g., insert ‘8’
[Figure: B+-tree of order 3 with root keys 6 and 9 and branches <6, >=6 and <9, >=9]
- 8 goes into the leaf that holds 7; the leaf overflows
- COPY middle upstairs: 7 is copied to the parent
- Non-leaf overflow – just PUSH the middle
- FINAL TREE: root key 7 (<7, >=7), with one child holding 6 (<6, >=6) and the other holding 9 (<9, >=9)