(Slides by Hector Garcia-Molina,

1 Chapter 13: Indexing
(Slides by Hector Garcia-Molina,

2 Indexing & Hashing
[Figure: value -> ? -> record]

3 Topics
- Conventional indexes
- B-trees
- Hashing schemes (self-study)

4 Sequential File
[Figure: sequential file holding records with keys 10, 20, ..., 100 in key order]

5 Sequential File, Dense Index
[Figure: dense index with one entry per key (10, 20, 30, 40, ...) over the sequential file with keys 10 to 100]

6 Sequential File, Sparse Index
[Figure: sparse index with one entry per file block (10, 30, 50, ...) over the sequential file with keys 10 to 100]

7 Sequential File, Sparse 2nd level
[Figure: second-level sparse index (10, 90, 170, 250, 330, 410, 490, 570) over the first-level sparse index (10, 30, 50, ...) over the sequential file with keys 10 to 100]

8 Question: Can we build a dense, 2nd level index for a dense index?

9 Notes on pointers:
(1) Block pointer (sparse index) can be smaller than record pointer
[Figure: a block pointer (BP) next to a record pointer (RP)]

10 Notes on pointers:
(2) If file is contiguous, then we can omit pointers (i.e., compute them)

11 [Figure: index keys K1, K2, K3, K4 and the contiguous file blocks holding records R1, R2, R3, R4]
Say: 1024 B per block
If we want K3's block: get it at offset (3-1) x 1024 = 2048 bytes
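
The offset computation on slide 11 can be sketched as below. This is a minimal illustration, assuming fixed-size blocks stored contiguously and 1-based block positions as in the slide; the name `block_offset` is hypothetical.

```python
BLOCK_SIZE = 1024  # bytes per block, the value used on the slide


def block_offset(position, block_size=BLOCK_SIZE):
    """Byte offset of the block at a 1-based position in a contiguous file."""
    if position < 1:
        raise ValueError("position is 1-based")
    return (position - 1) * block_size


# The block for K3 is the third block, so no stored pointer is needed.
print(block_offset(3))  # 2048
```

Because the offset is computed, the index only has to store keys, which is the space saving the slide points at.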

12 Sparse vs. Dense Tradeoff
Sparse: Less index space per record, can keep more of index in memory
Dense: Can tell if any record exists without accessing file
(Also: sparse better for insertions; dense needed for secondary indexes)
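
The tradeoff can be made concrete with a small sketch. It assumes an in-memory model in which the file is a list of blocks with two records each and unique keys; the data and the names `dense_exists` and `sparse_lookup` are illustrative, not from the slides.

```python
from bisect import bisect_left, bisect_right

# Sequential file: blocks of keys in sorted order (hypothetical data).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number) entry per record.
dense = [(k, b) for b, blk in enumerate(blocks) for k in blk]
# Sparse index: one entry per block, holding the first key of the block.
sparse = [blk[0] for blk in blocks]


def dense_exists(key):
    """Existence check answered from the index alone, without reading the file."""
    keys = [k for k, _ in dense]
    i = bisect_left(keys, key)
    return i < len(keys) and keys[i] == key


def sparse_lookup(key):
    """Find the candidate block in the index, then read that block to check."""
    b = bisect_right(sparse, key) - 1
    if b < 0:
        return None
    return b if key in blocks[b] else None
```

In this model the sparse index has 5 entries against 10 for the dense one, but `sparse_lookup` has to touch a file block even to report that a key is absent.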

13 Terms
- Index sequential file
- Search key (≠ primary key)
- Primary index (on sequencing field)
- Secondary index
- Dense index (all search key values in)
- Sparse index
- Multi-level index

14 Next:
- Duplicate keys
- Deletion/Insertion
- Secondary indexes

15 Duplicate keys
[Figure: sequential file in which the keys 10, 20, and 30 each occur more than once, followed by 40 and 45]

16 Duplicate keys
Dense index, one way to implement?
[Figure: dense index with one entry per record, so duplicate keys are repeated in the index]

17 Duplicate keys
Dense index, better way?
[Figure: dense index with one entry per distinct key (10, 20, 30, 40, 45)]

18 Duplicate keys
Sparse index, one way?
[Figure: sparse index holding the first key of each block (10, 10, 20, 30, ...)]
Careful if looking for 20 or 30!

19 Duplicate keys
Sparse index, another way?
Place first new key from block
[Figure: sparse index with entries 10, 20, 30, 30; one entry is annotated "should this be 40?"]

20 Summary: Duplicate values, primary index
Index may point to first instance of each value only
[Figure: file with records a, a, a, ..., b; the index entry for each value points to its first record]
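
The summary on slide 20 can be sketched as follows: the index keeps one entry per distinct value, pointing at its first record, and a lookup scans forward from there. This is a minimal sketch assuming the file is a sorted in-memory list of keys; the function names are hypothetical.

```python
def build_first_instance_index(keys):
    """keys: record keys in file order (sorted, duplicates allowed).

    Returns (key, position of first record with that key) pairs.
    """
    index = []
    for pos, key in enumerate(keys):
        if not index or index[-1][0] != key:
            index.append((key, pos))
    return index


def find_all(keys, index, key):
    """Positions of every record with the given key."""
    start = dict(index).get(key)
    if start is None:
        return []
    end = start
    # Duplicates are adjacent in a sequential file, so scan until the key changes.
    while end < len(keys) and keys[end] == key:
        end += 1
    return list(range(start, end))


file_keys = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]
index = build_first_instance_index(file_keys)
print(index)  # [(10, 0), (20, 3), (30, 5), (40, 8), (45, 9)]
```

The forward scan only works because the file is sorted on the search key, which is why this scheme is tied to the primary index.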

21 Deletion from sparse index
[Figure: sparse index (10, 30, 50, 70, 90, 110, 130, 150) over a sequential file with keys 10 to 80, two records per block]

22 Deletion from sparse index
delete record 40
[Figure: record 40 is removed from its block; the index is unchanged]

23 Deletion from sparse index
delete record 30
[Figure: record 30 is removed, record 40 moves up in its block, and the index entry 30 becomes 40]

24 Deletion from sparse index
delete records 30 & 40
[Figure: the block that held 30 and 40 becomes empty; its index entry is dropped and the entries 50, 70 move up]
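
The three deletion cases on slides 22 to 24 can be sketched in a few lines. This is a simplified model, assuming unique keys, an in-memory list of blocks, and a sparse index that is a parallel list of first keys; `delete_record` is a hypothetical name.

```python
from bisect import bisect_right


def delete_record(blocks, index, key):
    """Delete key from the file and keep the sparse index consistent.

    blocks: list of sorted key lists, one per block.
    index:  first key of each block, parallel to blocks.
    Returns True if the key was found and deleted.
    """
    i = bisect_right(index, key) - 1
    if i < 0 or key not in blocks[i]:
        return False
    blocks[i].remove(key)
    if blocks[i]:
        # The first key may have changed (deleting 30 turns the entry into 40).
        index[i] = blocks[i][0]
    else:
        # The block is empty: drop it together with its index entry.
        del blocks[i]
        del index[i]
    return True


blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [10, 30, 50, 70]
delete_record(blocks, index, 40)
print(index)  # [10, 30, 50, 70]
delete_record(blocks, index, 30)
print(index)  # [10, 50, 70]
```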

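25 Deletion from dense index
[Figure: dense index (10, 20, 30, 40, 50, 60, 70, 80) over a sequential file with keys 10 to 80, two records per block]

26 Deletion from dense index
delete record 30
[Figure: record 30 is removed and record 40 moves up in its block; the index entry 30 is removed and entry 40 moves up]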

27 Insertion, sparse index case
[Figure: sparse index (10, 30, 40, 60) over file blocks holding 10, 20 / 30 / 40, 50 / 60]

28 Insertion, sparse index case
insert record 34
[Figure: 34 goes into the free slot in the block that holds 30]
Our lucky day! We have free space where we need it!

29 Insertion, sparse index case
insert record 15
[Figure: 15 goes into the first block after 10; 20 moves into the next block ahead of 30, and the index entry 30 becomes 20]
Illustrated: Immediate reorganization
Variation:
- insert new block (chained file)
- update index

30 Insertion, sparse index case
insert record 25
[Figure: 25 is placed in an overflow block]
Overflow blocks (reorganize later...)

31 Insertion, dense index case
- Similar
- Often more expensive . . .

32 Secondary indexes
[Figure: file ordered on its sequence field; the secondary-key values 30, 50, 20, 70, 80, 40, 100, 10, 90, 60 appear in no particular order]

33 Secondary indexes
Sparse index
[Figure: sparse index with entries 30, 20, 80, 100, 90, ... over the unordered secondary keys]
Does not make sense!

34 Secondary indexes
Dense index
[Figure: dense index listing every key in sorted order (10, 20, 30, 40, 50, 60, 70, ...) with a sparse high level (10, 50, 90, ...) on top]

35 With secondary indexes:
- Lowest level is dense
- Other levels are sparse
Also: Pointers are record pointers (not block pointers; not computed)
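
A secondary index along these lines can be sketched as follows. The secondary-key values are the ones shown on slides 32 to 34; the sequence-field values, the choice of four entries per index block, and the names `dense`, `upper`, and `lookup` are assumptions made for illustration.

```python
from bisect import bisect_right

# Records as (sequence_field, secondary_key), stored in sequence-field order.
# The sequence-field values 1..10 are hypothetical.
records = [(1, 30), (2, 50), (3, 20), (4, 70), (5, 80),
           (6, 40), (7, 100), (8, 10), (9, 90), (10, 60)]

ENTRIES_PER_INDEX_BLOCK = 4  # assumed index block capacity

# Lowest level is dense: one (key, record pointer) entry per record, sorted by key.
dense = sorted((key, pos) for pos, (_, key) in enumerate(records))
# Higher level is sparse: the first key of each dense index block.
upper = [dense[i][0] for i in range(0, len(dense), ENTRIES_PER_INDEX_BLOCK)]


def lookup(key):
    """Return the record with the given secondary key, or None."""
    blk = bisect_right(upper, key) - 1
    if blk < 0:
        return None
    start = blk * ENTRIES_PER_INDEX_BLOCK
    for k, pos in dense[start:start + ENTRIES_PER_INDEX_BLOCK]:
        if k == key:
            return records[pos]  # follow the record pointer
    return None


print(upper)       # [10, 50, 90]
print(lookup(70))  # (4, 70)
```

The record pointers are stored explicitly because the file is not ordered on the secondary key, so neither block pointers nor computed offsets would identify the record.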

36 Summary so far
- Conventional index
  Basic Ideas: sparse, dense, multi-level…
- Duplicate Keys
- Deletion/Insertion
- Secondary indexes

37 Conventional indexes
Advantage:
- Simple
- Index is sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

38 Outline:
- Conventional indexes
- B-Trees (NEXT)
- Hashing schemes (self-study)

39 NEXT: Another type of index
- Give up on sequentiality of index
- Try to get “balance”

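40 B+Tree Example, n=3
[Figure: root with key 100; non-leaf nodes with keys 30 and 120, 150, 180; leaves 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200]

41 Sample non-leaf
[Figure: node with keys 57, 81, 95 and four pointers]
The pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95

42 Sample leaf node:
[Figure: leaf with keys 57, 81, 95, reached from a non-leaf node]
The pointers lead to the record with key 57, the record with key 81, and the record with key 95; the last pointer leads to the next leaf in sequence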
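
The routing rule of the sample non-leaf node on slide 41 amounts to counting how many node keys are less than or equal to the search key. A minimal sketch, assuming the keys of a node are held in a sorted Python list; `child_index` is a hypothetical name.

```python
from bisect import bisect_right


def child_index(node_keys, k):
    """Index of the pointer to follow for search key k in a non-leaf node.

    Pointer 0 covers k < keys[0], pointer i covers keys[i-1] <= k < keys[i],
    and the last pointer covers k >= keys[-1].
    """
    return bisect_right(node_keys, k)


keys = [57, 81, 95]
print(child_index(keys, 56))  # 0
print(child_index(keys, 57))  # 1
print(child_index(keys, 94))  # 2
print(child_index(keys, 95))  # 3
```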

43 Size of nodes:
- n+1 pointers
- n keys
(fixed)

44 Don’t want nodes to be too empty
Use at least
- Non-leaf: ⌈(n+1)/2⌉ pointers
- Leaf: ⌊(n+1)/2⌋ pointers to data

45 n=3
[Figure: full node vs. min. node. Non-leaf: full 120 150 180, min. 30. Leaf: full 3 5 11, min. 30 35. Annotation: a pointer counts even if null]

46 B+tree rules: tree of order n
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for “sequence pointer”

47 (3) Number of pointers/keys for B+tree

                     Max ptrs  Max keys  Min ptrs->data  Min keys
Non-leaf (non-root)  n+1       n         ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)      n+1       n         ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                 n+1       n         1               1
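
The occupancy table can be evaluated for a concrete order n with integer arithmetic. This sketch assumes the ceiling for non-leaf nodes and the floor for leaves, as in the table above; the function name and the dictionary layout are illustrative.

```python
def bplus_limits(n):
    """(max ptrs, max keys, min ptrs, min keys) per node type for order n."""
    ceil_half = (n + 2) // 2   # ceil((n+1)/2)
    floor_half = (n + 1) // 2  # floor((n+1)/2)
    return {
        "non_leaf": (n + 1, n, ceil_half, ceil_half - 1),
        "leaf": (n + 1, n, floor_half, floor_half),
        "root": (n + 1, n, 1, 1),
    }


print(bplus_limits(3)["non_leaf"])  # (4, 3, 2, 1)
print(bplus_limits(3)["leaf"])      # (4, 3, 2, 2)
```

For n=3 this agrees with slide 45: the minimum non-leaf node holds one key (30) and the minimum leaf holds two (30, 35).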

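48 Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

49 (a) Insert key = 32, n=3
[Figure: root key 100, non-leaf key 30, leaves 3 5 11 and 30 31; 32 is added to the leaf 30 31]

50 (b) Insert key = 7, n=3
[Figure: the full leaf 3 5 11 splits in two, and key 7 is added to the parent next to 30]

51 (c) Insert key = 160, n=3
[Figure: the leaf 150 156 179 splits into 150 156 and 160 179; the parent 120 150 180 is already full, so it splits as well and 160 moves up next to 100]

52 (d) New root, insert 45, n=3
[Figure: root 10 20 30 over leaves 1 2 3 | 10 12 | 20 25 | 30 32 40; the last leaf splits into 30 32 and 40 45, the old root overflows, and a new root with key 30 is created]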
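
Cases (a) and (b) at the leaf level can be sketched as below. This is only the leaf step, assuming keys in a sorted Python list and a split that leaves the larger half on the left; propagating the separator into the parent (cases (c) and (d)) is not shown, and `insert_into_leaf` is a hypothetical name.

```python
from bisect import insort


def insert_into_leaf(leaf, key, n):
    """Insert key into a leaf of a B+tree of order n.

    Returns (left, right, separator). When the leaf has room, right and
    separator are None. On overflow the n+1 keys are split in two and the
    first key of the right leaf is the separator to add to the parent.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None          # (a) simple case
    mid = (len(keys) + 1) // 2           # left keeps ceil((n+1)/2) keys
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]         # (b) leaf overflow


print(insert_into_leaf([30, 31], 32, 3))         # ([30, 31, 32], None, None)
print(insert_into_leaf([3, 5, 11], 7, 3))        # ([3, 5], [7, 11], 7)
print(insert_into_leaf([150, 156, 179], 160, 3)) # ([150, 156], [160, 179], 160)
```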

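53 Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

54 (b) Coalesce with sibling, n=4
Delete 50
[Figure: parent keys 10 40 100; leaves 10 20 30 and 40 50; after the deletion, 40 joins the leaf 10 20 30 and key 40 is removed from the parent]

55 (c) Redistribute keys, n=4
Delete 50
[Figure: parent keys 10 40 100; leaves 10 20 30 35 and 40 50; after the deletion, 35 moves to the right leaf and the parent key 40 becomes 35]

56 (d) Non-leaf coalesce, n=4
Delete 37
[Figure: root 25 over non-leaf nodes 10 20 and 30 40; leaves 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45; after the deletion the leaf holding 30 coalesces with a sibling, the two non-leaf nodes merge, and the merged node becomes the new root]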
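
The choice between cases (b) and (c) at the leaf level can be sketched as follows. It assumes the under-full leaf is the right one of two adjacent leaves and that a leaf must keep at least ⌊(n+1)/2⌋ keys; the function name and the return convention are illustrative, and the non-leaf case (d) is not covered.

```python
def fix_leaf_underflow(left, right, n):
    """Repair an under-full right leaf using its left sibling.

    left, right: sorted key lists of two adjacent leaves.
    Returns (leaves, separator): one leaf and None after coalescing,
    or two leaves and the new parent key after redistribution.
    """
    min_keys = (n + 1) // 2
    if len(right) >= min_keys:
        return [left, right], right[0]       # nothing to fix
    if len(left) + len(right) <= n:
        return [left + right], None          # (b) coalesce with sibling
    while len(right) < min_keys:             # (c) redistribute keys
        right = [left[-1]] + right
        left = left[:-1]
    return [left, right], right[0]


# Slide 54: after deleting 50, the leaves are 10 20 30 and 40.
print(fix_leaf_underflow([10, 20, 30], [40], 4))      # ([[10, 20, 30, 40]], None)
# Slide 55: after deleting 50, the leaves are 10 20 30 35 and 40.
print(fix_leaf_underflow([10, 20, 30, 35], [40], 4))  # ([[10, 20, 30], [35, 40]], 35)
```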

57 B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!

58 Variation on B+tree: B-tree (no +)
- Idea: Avoid duplicate keys
- Have record pointers in non-leaf nodes

59 [Figure: B-tree non-leaf node K1 P1 K2 P2 K3 P3; each Pi points to the record with key Ki; the child pointers lead to keys x < K1, K1 < x < K2, K2 < x < K3, and x > K3]

60 B-tree example, n=2
[Figure: root 65 125; non-leaf nodes 25 45 | 85 105 | 145 165; leaves 10 20 | 30 40 | 50 60 | 70 80 | 90 100 | 110 120 | 130 140 | 150 160 | 170 180]
Sequence pointers not useful now! (but keep space for simplicity)

61 Tradeoffs:
- B-trees have faster lookup than B+trees
- In B-tree, non-leaf & leaf different sizes
- In B-tree, deletion more complicated
=> B+trees preferred!

62 But note:
- If blocks are fixed size (due to disk and buffering restrictions)
- Then lookup for B+tree is actually better!!

63 So...
[Figure: 2-level B-tree with 8 records in the root plus the records in its leaves, Total = 116; 2-level B+tree with 156 records]
Conclusion: For fixed block size, B+ tree is better because it is bushier
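
The totals on slide 63 can be reproduced with a back-of-the-envelope count. The sizes used here (100-byte blocks, 4-byte keys, 4-byte pointers) are assumptions chosen for illustration and do not appear in this transcript; with them, a full 2-level tree comes out at 116 records for the B-tree and 156 for the B+tree.

```python
def two_level_capacity(block=100, key=4, ptr=4):
    """Maximum records in a full 2-level tree; sizes in bytes (assumed values)."""
    # Leaf (both trees): k keys + k record pointers + 1 sequence pointer slot.
    leaf_keys = (block - ptr) // (key + ptr)
    # B+tree root: k keys + (k + 1) child pointers.
    bplus_root_keys = (block - ptr) // (key + ptr)
    bplus_total = (bplus_root_keys + 1) * leaf_keys
    # B-tree root: k keys + k record pointers + (k + 1) child pointers.
    btree_root_keys = (block - ptr) // (key + 2 * ptr)
    btree_total = btree_root_keys + (btree_root_keys + 1) * leaf_keys
    return {"B-tree": btree_total, "B+tree": bplus_total}


print(two_level_capacity())  # {'B-tree': 116, 'B+tree': 156}
```

The record pointers in the B-tree root take space away from child pointers, so the root fans out to fewer children; that is the sense in which the B+tree is bushier.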

64 Outline/summary
- Conventional Indexes
  - Sparse vs. dense
  - Primary vs. secondary
- B trees
  - B+trees vs. B-trees
  - B+trees vs. indexed sequential
- Hashing schemes (self-study)

