Download presentation
Presentation is loading. Please wait.
Published byRonnie McGinnis Modified over 9 years ago
1
CMU SCS 15-826: Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU www.cs.cmu.edu/~christos
2
CMU SCS Reading Material [Ramakrishnan & Gehrke, 3rd ed, ch. 10] 15-826Copyright: C. Faloutsos (2012)2
3
CMU SCS 15-826Copyright: C. Faloutsos (2012)3 Problem Given a large collection of (multimedia) records, find similar/interesting things, ie: Allow fast, approximate queries, and Find rules/patterns
4
CMU SCS 15-826Copyright: C. Faloutsos (2012)4 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining
5
CMU SCS 15-826Copyright: C. Faloutsos (2012)5 Indexing - Detailed outline primary key indexing –B-trees and variants –(static) hashing –extendible hashing secondary key indexing spatial access methods text...
6
CMU SCS 15-826Copyright: C. Faloutsos (2012)6 In even more detail: B – trees B+ - trees, B*-trees hashing
7
CMU SCS 15-826Copyright: C. Faloutsos (2012)7 Primary key indexing find employee with ssn=123
8
CMU SCS 15-826Copyright: C. Faloutsos (2012)8 B-trees the most successful family of index schemes (B-trees, B +- trees, B * -trees) Can be used for primary/secondary, clustering/non-clustering index. balanced “n-way” search trees
9
CMU SCS 15-826Copyright: C. Faloutsos (2012)9 Citation Rudolf Bayer and Edward M. McCreight, Organization and Maintenance of Large Ordered Indices, Acta Informatica, 1:173-189, 1972. Received the 2001 SIGMOD innovations award among the most cited db publications www.informatik.uni-trier.de/~ley/db/about/top.html
10
CMU SCS 15-826Copyright: C. Faloutsos (2012)10 B-trees Eg., B-tree of order 3: 1 3 6 7 9 13 <6 >6 <9 >9
11
CMU SCS 15-826Copyright: C. Faloutsos (2012)11 B - tree properties: each node, in a B-tree of order n: –Key order –at most n pointers –at least n/2 pointers (except root) –all leaves at the same level –if number of pointers is k, then node has exactly k-1 keys –(leaves are empty) v1v2 …v n-1 p1 pn
12
CMU SCS 15-826Copyright: C. Faloutsos (2012)12 Properties “block aware” nodes: each node -> disk page O(log (N)) for everything! (ins/del/search) typically, if n = 50 - 100, then 2 - 3 levels utilization >= 50%, guaranteed; on average 69%
13
CMU SCS 15-826Copyright: C. Faloutsos (2012)13 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9
14
CMU SCS 15-826Copyright: C. Faloutsos (2012)14 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9
15
CMU SCS 15-826Copyright: C. Faloutsos (2012)15 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9
16
CMU SCS 15-826Copyright: C. Faloutsos (2012)16 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9
17
CMU SCS 15-826Copyright: C. Faloutsos (2012)17 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9 H steps (= disk accesses)
18
CMU SCS 15-826Copyright: C. Faloutsos (2012)18 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 )
19
CMU SCS 15-826Copyright: C. Faloutsos (2012)19 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) 1 3 6 7 9 13 <6 >6 <9 >9
20
CMU SCS 15-826Copyright: C. Faloutsos (2012)20 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) 1 3 6 7 9 13 <6 >6 <9 >9
21
CMU SCS 15-826Copyright: C. Faloutsos (2012)21 B-trees: Insertion Insert in leaf; on overflow, push middle up (recursively) split: preserves B - tree properties
22
CMU SCS 15-826Copyright: C. Faloutsos (2012)22 B-trees Easy case: Tree T0; insert ‘8’ 1 3 6 7 9 13 <6 >6 <9 >9
23
CMU SCS 15-826Copyright: C. Faloutsos (2012)23 B-trees Tree T0; insert ‘8’ 1 3 6 7 9 13 <6 >6 <9 >9 8
24
CMU SCS 15-826Copyright: C. Faloutsos (2012)24 B-trees Hardest case: Tree T0; insert ‘2’ 1 3 6 7 9 13 <6 >6 <9 >9 2
25
CMU SCS 15-826Copyright: C. Faloutsos (2012)25 B-trees Hardest case: Tree T0; insert ‘2’ 1 2 6 7 9 13 3 push middle up
26
CMU SCS 15-826Copyright: C. Faloutsos (2012)26 B-trees Hardest case: Tree T0; insert ‘2’ 6 7 9 1313 2 2 Ovf; push middle
27
CMU SCS 15-826Copyright: C. Faloutsos (2012)27 B-trees Hardest case: Tree T0; insert ‘2’ 7 9 1313 2 6 Final state
28
CMU SCS 15-826Copyright: C. Faloutsos (2012)28 B-trees: Insertion Q: What if there are two middles? (eg, order 4) A: either one is fine
29
CMU SCS 15-826Copyright: C. Faloutsos (2012)29 B-trees: Insertion Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) split: preserves all B - tree properties (!!) notice how it grows: height increases when root overflows & splits Automatic, incremental re-organization
30
CMU SCS 15-826Copyright: C. Faloutsos (2012)30 Overview B – trees –Dfn, Search, insertion, deletion B+ - trees hashing
31
CMU SCS 15-826Copyright: C. Faloutsos (2012)31 Deletion Rough outline of algo: Delete key; on underflow, may need to merge In practice, some implementors just allow underflows to happen…
32
CMU SCS 15-826Copyright: C. Faloutsos (2012)32 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ 1 3 6 7 9 13 <6 >6 <9 >9
33
CMU SCS 15-826Copyright: C. Faloutsos (2012)33 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ 1 6 7 9 13 <6 >6 <9 >9
34
CMU SCS 15-826Copyright: C. Faloutsos (2012)34 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ 1 6 7 9 13 <6 >6 <9 >9
35
CMU SCS 15-826Copyright: C. Faloutsos (2012)35 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’
36
CMU SCS 15-826Copyright: C. Faloutsos (2012)36 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 1 3 6 7 9 13 <6 >6 <9 >9 Delete & promote, ie:
37
CMU SCS 15-826Copyright: C. Faloutsos (2012)37 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 1 3 7 9 13 <6 >6 <9 >9 Delete & promote, ie:
38
CMU SCS 15-826Copyright: C. Faloutsos (2012)38 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 17 9 13 <6 >6 <9 >9 Delete & promote, ie: 3
39
CMU SCS 15-826Copyright: C. Faloutsos (2012)39 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 17 9 13 <3 >3 <9 >9 3 FINAL TREE
40
CMU SCS 15-826Copyright: C. Faloutsos (2012)40 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) Q: How to promote? A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree) Observation: every deletion eventually becomes a deletion of a leaf key
41
CMU SCS 15-826Copyright: C. Faloutsos (2012)41 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’
42
CMU SCS 15-826Copyright: C. Faloutsos (2012)42 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 6 7 9 13 <6 >6 <9 >9 Delete & borrow, ie:
43
CMU SCS 15-826Copyright: C. Faloutsos (2012)43 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie: Rich sibling
44
CMU SCS 15-826Copyright: C. Faloutsos (2012)44 B-trees – Deletion Case3: underflow & ‘rich sibling’ ‘rich’ = can give a key, without underflowing ‘borrowing’ a key: THROUGH the PARENT!
45
CMU SCS 15-826Copyright: C. Faloutsos (2012)45 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie: Rich sibling NO!!
46
CMU SCS 15-826Copyright: C. Faloutsos (2012)46 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie:
47
CMU SCS 15-826Copyright: C. Faloutsos (2012)47 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 9 13 <6 >6 <9 >9 Delete & borrow, ie: 6
48
CMU SCS 15-826Copyright: C. Faloutsos (2012)48 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 39 13 <6 >6 <9 >9 Delete & borrow, ie: 6
49
CMU SCS 15-826Copyright: C. Faloutsos (2012)49 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 39 13 <3 >3 <9 >9 Delete & borrow, through the parent 6 FINAL TREE
50
CMU SCS 15-826Copyright: C. Faloutsos (2012)50 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’
51
CMU SCS 15-826Copyright: C. Faloutsos (2012)51 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 13 <6 >6 <9 >9
52
CMU SCS 15-826Copyright: C. Faloutsos (2012)52 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 <6 >6 <9 >9
53
CMU SCS 15-826Copyright: C. Faloutsos (2012)53 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 <6 >6 <9 >9 A: merge w/ ‘poor’ sibling
54
CMU SCS 15-826Copyright: C. Faloutsos (2012)54 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) Merge, by pulling a key from the parent exact reversal from insertion: ‘split and push up’, vs. ‘merge and pull down’ Ie.:
55
CMU SCS 15-826Copyright: C. Faloutsos (2012)55 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 <6 >6 A: merge w/ ‘poor’ sibling 9
56
CMU SCS 15-826Copyright: C. Faloutsos (2012)56 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 <6 >6 9 FINAL TREE
57
CMU SCS 15-826Copyright: C. Faloutsos (2012)57 B-trees – Deletion Case4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’ Q: What if the parent underflows?
58
CMU SCS 15-826Copyright: C. Faloutsos (2012)58 B-trees – Deletion Case4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’ Q: What if the parent underflows? A: repeat recursively
59
CMU SCS 15-826Copyright: C. Faloutsos (2012)59 Overview B – trees B+ - trees, B*-trees hashing
60
CMU SCS 15-826Copyright: C. Faloutsos (2012)60 B+ trees - Motivation if we want to store the whole record with the key –> problems (what?) 1 3 6 7 9 13 <6 >6 <9 >9
61
CMU SCS 15-826Copyright: C. Faloutsos (2012)61 Solution: B + - trees They string all leaf nodes together AND replicate keys from non-leaf nodes, to make sure every key appears at the leaf level
62
CMU SCS 15-826Copyright: C. Faloutsos (2012)62 B+ trees 1 3 6 6 9 9 <6 >=6>=6 <9 >=9>=9 713
63
CMU SCS 15-826Copyright: C. Faloutsos (2012)63 B+ trees - insertion 1 3 6 6 9 9 <6 >=6>=6 <9 >=9>=9 713 Eg., insert ‘8’
64
CMU SCS 15-826Copyright: C. Faloutsos (2012)64 Overview B – trees B+ - trees, B*-trees hashing
65
CMU SCS 15-826Copyright: C. Faloutsos (2012)65 B*-trees splits drop util. to 50%, and maybe increase height How to avoid them?
66
CMU SCS 15-826Copyright: C. Faloutsos (2012)66 B*-trees: deferred split! Instead of splitting, LEND keys to sibling! (through PARENT, of course!) 1 3 6 7 9 13 <6 >6 <9 >9 2
67
CMU SCS 15-826Copyright: C. Faloutsos (2012)67 B*-trees: deferred split! Instead of splitting, LEND keys to sibling! (through PARENT, of course!) 1 2 3 6 9 13 <3 >3 <9 >9 2 7 FINAL TREE
68
CMU SCS 15-826Copyright: C. Faloutsos (2012)68 B*-trees: deferred split! Notice: shorter, more packed, faster tree It’s a rare case, where space utilization and speed improve together BUT: What if the sibling has no room for our ‘lending’?
69
CMU SCS 15-826Copyright: C. Faloutsos (2012)69 B*-trees: deferred split! BUT: What if the sibling has no room for our ‘lending’? A: 2-to-3 split: get the keys from the sibling, pool them with ours (and a key from the parent), and split in 3. Details: too messy (and even worse for deletion)
70
CMU SCS 15-826Copyright: C. Faloutsos (2012)70 Conclusions Main ideas: recursive; block-aware; on overflow -> split; defer splits All B-tree variants have excellent, O(logN) worst-case performance for ins/del/search B+ tree is the prevailing indexing method More details: [Knuth vol 3.] or [Ramakrishnan & Gehrke, 3rd ed, ch. 10]
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.