Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Organization - B-trees. 11.2Database System Concepts A simple index Brighton A-217 700 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perry.

Similar presentations


Presentation on theme: "Data Organization - B-trees. 11.2Database System Concepts A simple index Brighton A-217 700 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perry."— Presentation transcript:

1 Data Organization - B-trees

2 11.2Database System Concepts A simple index Brighton A-217 700 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perry A-102 400...... A-101 A-102 A-110 A-215 A-217...... Index of depositors on acct_no Index records: To answer a query for “acct_no=A-110” we: 1. Do a binary search on index file, searching for A-110 2. “Chase” pointer of index record Index file

3 11.3Database System Concepts Index Choices 1. Primary: index search key = physical order search key vs Secondary: all other indexes Q: how many primary indices per relation? 2. Dense: index entry for every search key value vs Sparse: some search key values not in the index 3. Single level vs Multilevel (index on the indices)

4 11.4Database System Concepts Measuring ‘goodness’ On what basis do we compare different indices? 1. Access type: what type of queries can be answered:  selection queries (ssn = 123)?  range queries ( 100 <= ssn <= 200)? 2. Access time: what is the cost of evaluating queries  Measured in # of block accesses 3. Maintenance overhead: cost of insertion / deletion? (also BA’s) 4. Space overhead : in # of blocks needed to store the index

5 11.5Database System Concepts Indexing Primary (or clustering) index on SSN

6 11.6Database System Concepts Indexing Secondary (or non-clustering) index: duplicates may exist Address-index Can have many secondary indices but only one primary index

7 11.7Database System Concepts Indexing secondary index: typically, with ‘postings lists’ Postings lists

8 11.8Database System Concepts Indexing Primary/sparse index on ssn (primary key) >=123 >=456

9 11.9Database System Concepts Indexing Secondary / dense index Secondary on a candidate key: No duplicates, no need for posting lists

10 11.10Database System Concepts Summary DenseSparse Primaryrareusual secondaryusual All combinations are possible at most one sparse/clustering index as many as desired dense indices usually: one primary index (probably sparse) and a few secondary indices (non-clustering)

11 11.11Database System Concepts ISAM >=123 >=456 block 2 nd level sparse index on the values of the 1 st level What if index is too large to search in memory?

12 11.12Database System Concepts ISAM - observations What about insertions/deletions? >=123 >=456 124; peterson; fifth ave.

13 11.13Database System Concepts ISAM - observations What about insertions/deletions? 124; peterson; fifth ave. overflows Problems?

14 11.14Database System Concepts ISAM - observations  What about insertions/deletions? 124; peterson; fifth ave. overflows overflow chains may become very long - what to do?

15 11.15Database System Concepts ISAM - observations  What about insertions/deletions? 124; peterson; fifth ave. overflows overflow chains may become very long - thus: shut-down & reorganize start with ~80% utilization

16 11.16Database System Concepts ISAM - observations  if index is too large, store it on disk and keep index on the index (in memory)  usually two levels of indices, one first- level entry per disk block  typically, blocks: 80% full initially (what are potential problems / inefficiencies?)

17 11.17Database System Concepts So far  … indices (like ISAM) suffer in the presence of frequent updates  alternative indexing structure: B - trees

18 11.18Database System Concepts Overview  primary / secondary indices  multilevel (ISAM)  B - trees, B+ - trees  hashing  static hashing  dynamic hashing

19 11.19Database System Concepts B-trees  the most successful family of index schemes (B-trees, B +- trees, B * - trees)  Can be used for primary/secondary, clustering/non-clustering index.  balanced “n-way” search trees

20 11.20Database System Concepts B-trees Eg., B-tree of order 3: 1 3 6 7 9 13 <6 >6 <9 >9

21 11.21Database System Concepts B-tree Nodes v1v2 …v n-1 p1 pn v<v1 v1 <= v < v2 Vn-1 <= v Key values are ordered MAXIMUM: n pointer values MINIMUM:  n/2  pointer values (Exception: root’s minimum = 2)

22 11.22Database System Concepts Properties  “block aware” nodes: each node -> disk page  O(log B (N)) for everything! (ins/del/search)  typically, if m = 50 - 100, then 2 - 3 levels  utilization >= 50%, guaranteed; on average 69%

23 11.23Database System Concepts Queries  Algorithm for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

24 11.24Database System Concepts Queries  Algorithm for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

25 11.25Database System Concepts Queries  Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

26 11.26Database System Concepts Queries  Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

27 11.27Database System Concepts Queries  Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9 H steps (= disk accesses)

28 11.28Database System Concepts Queries  Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

29 11.29Database System Concepts Queries  what about range queries? (eg., 5<salary<8)  Proximity/ nearest neighbor searches? (eg., salary ~ 8 )

30 11.30Database System Concepts Queries  what about range queries? (eg., 5<salary<8)  Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) 1 3 6 7 9 13 <6 >6 <9 >9

31 11.31Database System Concepts Queries  what about range queries? (eg., 5<salary<8)  Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) 1 3 6 7 9 13 <6 >6 <9 >9

32 11.32Database System Concepts B-trees: Insertion  Insert in leaf; on overflow, push middle up (recursively)  split: preserves B - tree properties

33 11.33Database System Concepts B-trees Easy case: Tree T0; insert ‘8’ 1 3 6 7 9 13 <6 >6 <9 >9

34 11.34Database System Concepts B-trees Tree T0; insert ‘8’ 1 3 6 7 9 13 <6 >6 <9 >9 8

35 11.35Database System Concepts B-trees Hardest case: Tree T0; insert ‘2’ 1 3 6 7 9 13 <6 >6 <9 >9 2

36 11.36Database System Concepts B-trees Hardest case: Tree T0; insert ‘2’ 1 2 6 7 9 13 3 push middle up

37 11.37Database System Concepts B-trees Hardest case: Tree T0; insert ‘2’ 6 7 9 1313 2 2 Ovf; push middle

38 11.38Database System Concepts B-trees Hardest case: Tree T0; insert ‘2’ 7 9 1313 2 6 Final state

39 11.39Database System Concepts B-trees - insertion  Q: What if there are two middles? (eg, order 4)  A: either one is fine

40 11.40Database System Concepts B-trees: Insertion  Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’)  split: preserves all B - tree properties (!!)  notice how it grows: height increases when root overflows & splits  Automatic, incremental re-organization (contrast with ISAM!)

41 11.41Database System Concepts INSERTION OF KEY ’K’ find the correct leaf node ’L’; if ( ’L’ overflows ){ split ’L’, by pushing the middle key upstairs to parent node ’P’; if (’P’ overflows){ repeat the split recursively; } else{ add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */ } Pseudo-code

42 11.42Database System Concepts Overview  primary / secondary indices  multilevel (ISAM)  B – trees  Dfn, Search, insertion, deletion  B+ - trees  hashing

43 11.43Database System Concepts Deletion Rough outline of algo:  Delete key;  on underflow, may need to merge In practice, some implementers just allow underflows to happen…

44 11.44Database System Concepts B-trees – Deletion Easiest case: Tree T0; delete ‘3’ 1 3 6 7 9 13 <6 >6 <9 >9

45 11.45Database System Concepts B-trees – Deletion Easiest case: Tree T0; delete ‘3’ 1 6 7 9 13 <6 >6 <9 >9

46 11.46Database System Concepts B-trees – Deletion  Case1: delete a key at a leaf – no underflow  Case2: delete non-leaf key – no underflow  Case3: delete leaf-key; underflow, and ‘rich sibling’  Case4: delete leaf-key; underflow, and ‘poor sibling’

47 11.47Database System Concepts B-trees – Deletion  Case1: delete a key at a leaf – no underflow (delete 3 from T0) 1 3 6 7 9 13 <6 >6 <9 >9

48 11.48Database System Concepts B-trees – Deletion  Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 1 3 6 7 9 13 <6 >6 <9 >9 Delete & promote, ie:

49 11.49Database System Concepts B-trees – Deletion  Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 1 3 7 9 13 <6 >6 <9 >9 Delete & promote, ie:

50 11.50Database System Concepts B-trees – Deletion  Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 17 9 13 <6 >6 <9 >9 Delete & promote, ie: 3

51 11.51Database System Concepts B-trees – Deletion  Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 17 9 13 <3 >3 <9 >9 3 FINAL TREE

52 11.52Database System Concepts B-trees – Deletion  Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0)  Q: How to promote?  A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree)  Observation: every deletion eventually becomes a deletion of a leaf key

53 11.53Database System Concepts B-trees – Deletion  Case1: delete a key at a leaf – no underflow  Case2: delete non-leaf key – no underflow  Case3: delete leaf-key; underflow, and ‘rich sibling’  Case4: delete leaf-key; underflow, and ‘poor sibling’

54 11.54Database System Concepts B-trees – Deletion  Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 6 7 9 13 <6 >6 <9 >9 Delete & borrow, ie:

55 11.55Database System Concepts B-trees – Deletion  Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie: Rich sibling

56 11.56Database System Concepts B-trees – Deletion  Case3: underflow & ‘rich sibling’  ‘rich’ = can give a key, without underflowing  ‘borrowing’ a key: THROUGH the PARENT!

57 11.57Database System Concepts B-trees – Deletion  Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie: Rich sibling NO!!

58 11.58Database System Concepts B-trees – Deletion  Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie:

59 11.59Database System Concepts B-trees – Deletion  Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 39 13 <6 >6 <9 >9 Delete & borrow, ie: 6

60 11.60Database System Concepts B-trees – Deletion  Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 39 13 <3 >3 <9 >9 Delete & borrow, through the parent 6 FINAL TREE

61 11.61Database System Concepts B-trees – Deletion  Case1: delete a key at a leaf – no underflow  Case2: delete non-leaf key – no underflow  Case3: delete leaf-key; underflow, and ‘rich sibling’  Case4: delete leaf-key; underflow, and ‘poor sibling’

62 11.62Database System Concepts B-trees – Deletion  Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 13 <6 >6 <9 >9

63 11.63Database System Concepts B-trees – Deletion  Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 <6 >6 <9 >9

64 11.64Database System Concepts B-trees – Deletion  Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 <6 >6 <9 >9 A: merge w/ ‘poor’ sibling

65 11.65Database System Concepts B-trees – Deletion  Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0)  Merge, by pulling a key from the parent  exact reversal from insertion: ‘split and push up’, vs. ‘merge and pull down’  Ie.:

66 11.66Database System Concepts B-trees – Deletion  Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 <6 >6 A: merge w/ ‘poor’ sibling 9

67 11.67Database System Concepts B-trees – Deletion  Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 <6 >6 9 FINAL TREE

68 11.68Database System Concepts B-trees – Deletion  Case4: underflow & ‘poor sibling’  -> ‘pull key from parent, and merge’  Q: What if the parent underflows?  A: repeat recursively

69 11.69Database System Concepts B-tree deletion - pseudocode DELETION OF KEY ’K’ locate key ’K’, in node ’N’ if( ’N’ is a non-leaf node) { delete ’K’ from ’N’; find the immediately largest key ’K1’; /* which is guaranteed to be on a leaf node ’L’ */ copy ’K1’ in the old position of ’K’; invoke this DELETION routine on ’K1’ from the leaf node ’L’; else { /* ’N’ is a leaf node */... (next slide..)

70 11.70Database System Concepts B-tree deletion - pseudocode /* ’N’ is a leaf node */ if( ’N’ underflows ){ let ’N1’ be the sibling of ’N’; if( ’N1’ is "rich"){ /* ie., N1 can lend us a key */ borrow a key from ’N1’ THROUGH the parent node; }else{ /* N1 is 1 key away from underflowing */ MERGE: pull the key from the parent ’P’, and merge it with the keys of ’N’ and ’N1’ into a new node; if( ’P’ underflows){ repeat recursively } }

71 11.71Database System Concepts B-trees in practice In practice:  no empty leaves;  ptrs to records 1 3 6 7 9 13 <6 >6 <9 >9 theory

72 11.72Database System Concepts B-trees in practice In practice:  no empty leaves;  ptrs to records 1 3 6 7 9 13 <6 >6 <9 >9 practice

73 11.73Database System Concepts B-trees in practice In practice: 13 6 7 9 13 <6 >6 <9 >9 Ssn…… 3 7 6 9 1

74 11.74Database System Concepts B-trees in practice In practice, the formats are: -leaf nodes: (v1, rp1, v2, rp2, … vn, rpn) -Non-leaf nodes: (p1, v1, rp1, p2, v2, rp2, …) 13 6 7 9 13 <6 >6 <9 >9

75 11.75Database System Concepts Overview  primary / secondary indices  multilevel (ISAM)  B – trees  B+ - trees  hashing

76 11.76Database System Concepts B+ trees - Motivation B-tree – print keys in sorted order: 1 3 6 7 9 13 <6 >6 <9 >9

77 11.77Database System Concepts B+ trees - Motivation B-tree needs back-tracking – how to avoid it? 1 3 6 7 9 13 <6 >6 <9 >9

78 11.78Database System Concepts Solution: B + - trees  facilitate sequential ops  They string all leaf nodes together  AND  replicate keys from non-leaf nodes, to make sure every key appears at the leaf level

79 11.79Database System Concepts B+-trees Eg., B+-tree of order 3: 3 4 69 9 <6 =>6 <9 =>9 6713 (3, Joe, 23) (3, Bob, 23) (4, John, 23) ………… root: internal node leaf node Data File

80 11.80Database System Concepts B+ tree insertion INSERTION OF KEY ’K’ insert search-key value to ’L’ such that the keys are in order; if ( ’L’ overflows) { split ’L’ ; insert (ie., COPY) smallest search-key value of new node to parent node ’P’; if (’P’ overflows) { repeat the B-tree split procedure recursively; /* Notice: the B-TREE split; NOT the B+ -tree */ }

81 11.81Database System Concepts B+-tree insertion – cont’d /* ATTENTION: a split at the LEAF level is handled by COPYING the middle key upstairs; A split at a higher level is handled by PUSHING the middle key upstairs */

82 11.82Database System Concepts B+ trees - insertion 1 3 6 6 9 9 <6 >=6>=6 <9 >=9>=9 713 Eg., insert ‘8’

83 11.83Database System Concepts B+ trees - insertion 1 3 6 6 9 9 <6 >=6 <9 >=9 713 Eg., insert ‘8’ 8

84 11.84Database System Concepts B+ trees - insertion 1 3 6 6 9 9 <6 >=6 <9 >=9 713 Eg., insert ‘8’ 8 COPY middle upstairs

85 11.85Database System Concepts B+ trees - insertion 1 3 6 6 9 <6 >=6 <9 >=9 9 13 Eg., insert ‘8’ COPY middle upstairs 7 8 7

86 11.86Database System Concepts B+ trees - insertion 1 3 6 6 9 <6 >=6 <9 >=9 9 13 Eg., insert ‘8’ COPY middle upstairs 7 8 7 Non-leaf overflow – just PUSH the middle

87 11.87Database System Concepts B+ trees – insertion 1 3 6 6 <6 >=6 >=9 9 13 Eg., insert ‘8’ 7 8 7 9 <7>=7 <9 FINAL TREE


Download ppt "Data Organization - B-trees. 11.2Database System Concepts A simple index Brighton A-217 700 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perry."

Similar presentations


Ads by Google