1
Chapter 13: Indexing (Slides by Hector Garcia-Molina)
2
Indexing & Hashing
[Figure: given a value, how do we find the record(s) with that value?]
3
Topics
- Conventional indexes
- B-trees
- Hashing schemes (self-study)
4
Sequential File
[Figure: file blocks holding records in key order: 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100]
5
Sequential File, Dense Index
[Figure: dense index with one entry per record (10, 20, 30, 40, ..., 120), each pointing to its record in the sequential file 10, 20 | 30, 40 | ... | 90, 100]
6
Sequential File, Sparse Index
[Figure: sparse index with one entry per file block (10, 30, 50, 70, 90, 110, ..., 230), each pointing to the block that starts with that key]
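A lookup through a sparse index like the one pictured can be sketched as follows. This is a minimal illustration, not part of the original slides: the in-memory lists `blocks` and `sparse_index` and the function `sparse_lookup` are hypothetical stand-ins for disk blocks and index entries, and two records per block is assumed.

```python
from bisect import bisect_right

# Data file: each inner list stands for one block of records sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: the first key of each block; entry i refers to blocks[i].
sparse_index = [block[0] for block in blocks]

def sparse_lookup(key):
    """Return (block_no, slot) for key, or None if the key is absent."""
    # Find the last index entry whose key is <= the search key.
    i = bisect_right(sparse_index, key) - 1
    if i < 0:
        return None              # key is smaller than every indexed key
    block = blocks[i]            # the only data block we need to read
    if key in block:
        return i, block.index(key)
    return None
```

Note that a miss is only detected after the data block has been read, which is the tradeoff against a dense index.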
7
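Sequential File, Sparse 2nd Level
[Figure: second-level sparse index (10, 90, 170, 250, 330, 410, 490, 570) pointing to first-level index blocks 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230 | ..., which in turn point to file blocks 10, 20 | 30, 40 | ... | 90, 100]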
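The two-level arrangement can be sketched the same way. The block sizes (two records per data block, four entries per index block), the key range, and all names below are assumptions made for illustration only.

```python
from bisect import bisect_right

def chunk(seq, size):
    """Split seq into consecutive blocks of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

keys = list(range(10, 330, 10))                    # 10, 20, ..., 320
data_blocks = chunk(keys, 2)                       # 2 records per block
level1 = chunk([b[0] for b in data_blocks], 4)     # 4 entries per index block
level2 = [b[0] for b in level1]                    # one entry per level-1 block

def two_level_lookup(key):
    """Return the data block number holding key, or None if absent."""
    i2 = bisect_right(level2, key) - 1             # pick the level-1 block
    if i2 < 0:
        return None
    i1 = bisect_right(level1[i2], key) - 1         # pick the data block
    block_no = i2 * 4 + i1
    return block_no if key in data_blocks[block_no] else None
```

With these assumptions `level2` is `[10, 90, 170, 250]`, matching the start of the second-level index in the figure.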
8
Question: Can we build a dense, 2nd level index for a dense index?
9
Notes on pointers: (1) A block pointer (BP, as used in a sparse index) can be smaller than a record pointer (RP).
10
Notes on pointers: (2) If the file is contiguous, then we can omit pointers (i.e., compute them).
11
[Figure: index holding keys K1, K2, K3, K4 with no stored pointers; file blocks R1, R2, R3, R4 laid out contiguously]
Say blocks are 1024 B each. If we want the block for K3, get it at offset (3-1) × 1024 = 2048 bytes.
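The offset arithmetic above, written out as a small sketch. The function name `block_offset` is hypothetical; the 1024-byte block size and 1-based numbering follow the slide.

```python
BLOCK_SIZE = 1024  # bytes per block, as on the slide

def block_offset(i, block_size=BLOCK_SIZE):
    """Byte offset of the i-th block (1-based) in a contiguous file."""
    return (i - 1) * block_size
```

For the third key this gives `block_offset(3) == 2048`, so no pointer needs to be stored in the index entry.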
12
Sparse vs. Dense Tradeoff
Sparse: less index space per record, so more of the index can be kept in memory.
Dense: can tell whether a record exists without accessing the file.
(Also: sparse is better for insertions; dense is needed for secondary indexes.)
13
Terms
- Index sequential file
- Search key (≠ primary key)
- Primary index (on sequencing field)
- Secondary index
- Dense index (all search key values in)
- Sparse index
- Multi-level index
14
Next:
- Duplicate keys
- Deletion/Insertion
- Secondary indexes
15
Duplicate keys
[Figure: sequential file in which the key values 10, 20 and 30 repeat across blocks, followed by 40 and 45]
16
Duplicate keys: dense index, one way to implement?
[Figure: dense index with one entry per record, so duplicate key values (10, 20, 30) are repeated in the index]
17
Duplicate keys: dense index, better way?
[Figure: dense index with one entry per distinct key value, each pointing to the first record with that value]
18
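Duplicate keys: sparse index, one way?
[Figure: sparse index holding the first key of each block, so a repeated key can appear in more than one entry]
Careful if looking for 20 or 30! Records with the key may also sit in the block before the one whose index entry matches.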
19
Duplicate keys: sparse index, another way?
Place the first new key from each block in the index.
[Figure: sparse index built from the first new key in each block; an annotation on one entry asks "should this be 40?"]
20
Duplicate values, primary index: summary
The index may point to the first instance of each value only.
[Figure: index entries a, b, ... each pointing to the first record in the file with that value]
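One way to picture "point to the first instance only" is the sketch below. The sample file contents and the names `first_occurrence_index` and `find_all` are assumptions for illustration; record positions stand in for record pointers.

```python
def first_occurrence_index(records):
    """Map each distinct key to the position of its first record."""
    index = {}
    for pos, key in enumerate(records):
        index.setdefault(key, pos)   # keep only the first position seen
    return index

def find_all(records, index, key):
    """Follow the index to the first match, then scan forward in the file."""
    pos = index.get(key)
    if pos is None:
        return []
    matches = []
    while pos < len(records) and records[pos] == key:
        matches.append(pos)
        pos += 1
    return matches

# Assumed sample file, sorted on the key, with duplicates.
records = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]
index = first_occurrence_index(records)
```

Because the file is sorted on the key, all duplicates follow the first instance, so the forward scan finds them without extra index entries.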
21
Deletion from sparse index
20 10 10 30 50 40 30 70 60 50 90 110 130 80 70 150 Chapter 13
22
Deletion from sparse index: delete record 40
[Figure: 40 is removed from block 30, 40; the index (10, 30, 50, 70, ...) is unchanged]
23
Deletion from sparse index: delete record 30
[Figure: 30 is removed and 40 becomes the first key of its block; the index entry 30 is changed to 40]
24
Deletion from sparse index: delete records 30 & 40
[Figure: the block that held 30 and 40 becomes empty; its index entry is dropped and the following entries (50, 70, ...) move up]
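The three deletion cases can be sketched in one routine. This is an illustrative in-memory model, not the slides' implementation: `blocks`, `index`, and `delete` are hypothetical names, and empty blocks are simply dropped from the list.

```python
from bisect import bisect_right

def delete(blocks, index, key):
    """Delete key from a sparse-indexed file; return True if it was found."""
    i = bisect_right(index, key) - 1
    if i < 0 or key not in blocks[i]:
        return False
    blocks[i].remove(key)
    if blocks[i]:
        # Block still has records: its index entry must show the new first key
        # (a no-op when the deleted key was not the first one).
        index[i] = blocks[i][0]
    else:
        # Block is empty: drop it and its index entry; later entries move up.
        del blocks[i]
        del index[i]
    return True
```

Deleting 40 leaves the index untouched, deleting 30 rewrites one entry, and deleting both removes the entry altogether.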
25
Deletion from dense index
[Figure: dense index (10, 20, 30, 40, 50, 60, 70, 80) over file blocks 10, 20 | 30, 40 | 50, 60 | 70, 80]
26
Deletion from dense index: delete record 30
[Figure: 30 is removed from both its file block and the index; 40 moves up to take its place in each]
27
Insertion, sparse index case
[Figure: sparse index (10, 30, 40, 60) over file blocks 10, 20 | 30, (free) | 40, 50 | 60, (free)]
28
Insertion, sparse index case: insert record 34
[Figure: 34 goes into the free slot after 30]
Our lucky day! We have free space where we need it!
29
Insertion, sparse index case: insert record 15
[Figure: 15 is placed after 10; 20 moves into the next block ahead of 30, and that block's index entry changes from 30 to 20]
Illustrated: immediate reorganization.
Variation: insert a new block (chained file) and update the index.
30
Insertion, sparse index case: insert record 25
[Figure: block 10, 20 is full, so 25 goes into an overflow block chained to it]
Overflow blocks (reorganize later...)
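The "free space" and "overflow block" cases can be sketched as below. This is a simplified model under stated assumptions: two records per block, a hypothetical `Block` class whose `overflow` list stands in for a chained overflow block, and no immediate reorganization.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per primary block (assumed)

class Block:
    def __init__(self, keys):
        self.keys = list(keys)   # sorted records in the primary block
        self.overflow = []       # records parked in the chained overflow block

def insert(blocks, index, key):
    """Insert key; return 'in place' or 'overflow'."""
    i = max(bisect_right(index, key) - 1, 0)   # block that should hold key
    block = blocks[i]
    if len(block.keys) < CAPACITY:
        insort(block.keys, key)                # lucky case: free slot available
        index[i] = block.keys[0]               # first key may have changed
        return "in place"
    block.overflow.append(key)                 # full: defer reorganization
    return "overflow"

blocks = [Block([10, 20]), Block([30]), Block([40, 50]), Block([60])]
index = [10, 30, 40, 60]
```

A lookup in this model would have to check `block.overflow` as well as `block.keys`, which is the cost that accumulates until the file is reorganized.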
31
Insertion, dense index case
- Similar
- Often more expensive...
32
Secondary indexes
[Figure: file ordered on its sequence field, so the secondary key values appear unordered: 30, 50 | 20, 70 | 80, 40 | 100, 10 | 90, 60]
33
Secondary indexes: sparse index
[Figure: sparse index (30, 20, 80, 100, 90, ...) taken from the first key of each block of the unordered file 30, 50 | 20, 70 | 80, 40 | 100, 10 | 90, 60]
Sparse index does not make sense!
34
Secondary indexes: dense index
[Figure: dense index (10, 20, 30, 40, 50, 60, 70, ...) with one pointer per record into the unordered file 30, 50 | 20, 70 | 80, 40 | 100, 10 | 90, 60, plus a sparse high level (10, 50, 90, ...) on top of the dense index]
35
With secondary indexes:
- Lowest level is dense
- Other levels are sparse
Also: pointers are record pointers (not block pointers; not computed).
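A dense secondary index with record pointers can be sketched as follows. The key order matches the file shown on the earlier slide; list positions stand in for record pointers, and the names `dense` and `secondary_lookup` are hypothetical.

```python
from bisect import bisect_left

# File in sequence-field order; the secondary key values are unordered.
records = [30, 50, 20, 70, 80, 40, 100, 10, 90, 60]

# Dense lowest level: one (key, record_pointer) entry per record, sorted by key.
dense = sorted((key, pos) for pos, key in enumerate(records))

def secondary_lookup(key):
    """Return the record position for key, or None if absent."""
    i = bisect_left(dense, (key, -1))   # (key, -1) sorts before any real entry
    if i < len(dense) and dense[i][0] == key:
        return dense[i][1]
    return None
```

Since the dense level itself is sorted, a sparse level can be built on top of it exactly as for a sequential file.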
36
Summary so far: conventional index
- Basic ideas: sparse, dense, multi-level...
- Duplicate keys
- Deletion/Insertion
- Secondary indexes
37
Conventional indexes
Advantages:
- Simple
- Index is a sequential file, good for scans
Disadvantages:
- Inserts expensive, and/or
- Lose sequentiality & balance
38
Outline:
- Conventional indexes
- B-Trees (NEXT)
- Hashing schemes (self-study)
39
NEXT: Another type of index
- Give up on sequentiality of index
- Try to get "balance"
40
B+Tree Example (n=3)
[Figure: root (100); non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]
41
Sample non-leaf
[Figure: node with keys 57, 81, 95 and four pointers: to keys k < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, to keys k ≥ 95]
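Choosing which pointer to follow in such a non-leaf node amounts to counting the keys that are less than or equal to the search key. A minimal sketch, with the hypothetical helper `child_slot` and the node keys 57, 81, 95 from the slide:

```python
from bisect import bisect_right

def child_slot(node_keys, k):
    """Index of the child pointer to follow for search key k.

    Slot 0 covers k < node_keys[0]; slot i covers
    node_keys[i-1] <= k < node_keys[i]; the last slot covers k >= node_keys[-1].
    """
    return bisect_right(node_keys, k)

node = [57, 81, 95]
```

For example, `child_slot(node, 57)` is 1 (the range 57 ≤ k < 81) and `child_slot(node, 95)` is 3 (the range k ≥ 95).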
42
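Sample leaf node
[Figure: leaf with keys 57, 81, 95, reached from a non-leaf node; pointers to the record with key 57, to the record with key 81, and to the record with key 95; the last pointer leads to the next leaf in sequence]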
43
Size of nodes (fixed): n+1 pointers, n keys
44
Don't want nodes to be too empty. Use at least:
- Non-leaf: ⌈(n+1)/2⌉ pointers
- Leaf: ⌊(n+1)/2⌋ pointers to data
45
n=3: full node vs. minimum node
[Figure: non-leaf, full (120, 150, 180) and minimum (30); leaf, full (3, 5, 11) and minimum (30, 35). Annotation: a pointer counts even if null.]
46
B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the "sequence pointer"
47
(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs to data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉          ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋          ⌊(n+1)/2⌋
Root                  n+1        n          1                  1
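The occupancy limits can be tabulated for any order n with a few lines of integer arithmetic. This sketch assumes the usual rounding for these rules (round up for non-leaf nodes, round down for leaves), which agrees with the n=3 and n=4 examples in these slides; the function name `bplus_limits` is hypothetical.

```python
def bplus_limits(n):
    """Max/min pointer and key counts for a B+tree of order n."""
    half_up = (n + 2) // 2     # ceil((n+1)/2), assumed for non-leaf nodes
    half_down = (n + 1) // 2   # floor((n+1)/2), assumed for leaves
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_up, "min_keys": half_up - 1},
        "leaf":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_down, "min_keys": half_down},
        "root":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": 1, "min_keys": 1},
    }
```

For n=3 this gives a minimum of 2 pointers and 1 key in a non-leaf node and 2 keys in a leaf; for n=4 the minimums are 3 pointers and 2 keys in a non-leaf node and 2 keys in a leaf.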
48
Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
49
(a) Insert key = 32 (n=3)
[Figure: root (100), non-leaf (30), leaves (3, 5, 11) and (30, 31); the leaf (30, 31) has space, so it becomes (30, 31, 32)]
50
(b) Insert key = 7 (n=3)
[Figure: leaf (3, 5, 11) is full; it splits into (3, 5) and (7, 11), and key 7 is added to the parent next to 30]
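The leaf-level step of an insert, covering both the simple case and the leaf overflow, can be sketched as below. The function `insert_into_leaf` is a hypothetical helper that handles one leaf only; updating the parent (and splitting it if needed) is left out. The split point assumes the left leaf keeps ⌈(n+1)/2⌉ keys.

```python
def insert_into_leaf(keys, key, n):
    """Insert key into a leaf holding `keys` (at most n keys allowed).

    Returns (left, right, separator). When no split is needed, right and
    separator are None. On overflow the n+1 keys are split in two and the
    first key of the right leaf is the separator to add to the parent.
    """
    merged = sorted(keys + [key])
    if len(merged) <= n:
        return merged, None, None          # simple case: space available
    mid = (n + 2) // 2                     # left keeps ceil((n+1)/2) keys
    left, right = merged[:mid], merged[mid:]
    return left, right, right[0]
```

With n=3, inserting 7 into (3, 5, 11) yields the leaves (3, 5) and (7, 11) with separator 7, while inserting 32 into (30, 31) needs no split.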
51
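(c) Insert key = 160 (n=3)
[Figure: leaf (150, 156, 179) splits into (150, 156) and (160, 179); the parent (120, 150, 180) overflows and splits into (120, 150) and (180), and 160 moves up next to 100]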
52
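(d) New root, insert 45 (n=3)
[Figure: root (10, 20, 30) over leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40); the last leaf splits into (30, 32) and (40, 45); the old root overflows and splits into (10, 20) and (40), with a new root holding 30]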
53
Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
54
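(b) Coalesce with sibling: delete 50 (n=4)
[Figure: parent (10, 40, 100) over leaves (10, 20, 30) and (40, 50); after the delete, the remaining key 40 is merged into the sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent]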
55
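(c) Redistribute keys: delete 50 (n=4)
[Figure: parent (10, 40, 100) over leaves (10, 20, 30, 35) and (40, 50); after the delete, 35 moves over to give (10, 20, 30) and (35, 40), and the parent key 40 becomes 35]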
56
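(d) Non-leaf coalesce: delete 37 (n=4)
[Figure: root (25) over non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45); the delete leaves a leaf under-full, merging leaves then leaves a non-leaf node under-full, and the non-leaf nodes coalesce with the old root key 25 into a new root]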
57
B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!
58
Variation on B+tree: B-tree (no +)
Idea:
- Avoid duplicate keys
- Have record pointers in non-leaf nodes
59
[Figure: B-tree non-leaf node K1 P1 K2 P2 K3 P3, where each Pi points to the record with key Ki; the child pointers lead to keys x < K1, K1 < x < K2, K2 < x < K3, and x > K3]
60
B-tree example (n=2)
[Figure: root (65, 125); non-leaf nodes (25, 45), (85, 105), (145, 165); leaves (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), (110, 120), (130, 140), (150, 160), (170, 180)]
Sequence pointers are not useful now! (But keep the space for simplicity.)
61
Tradeoffs:
- B-trees have faster lookup than B+trees
- In a B-tree, non-leaf and leaf nodes have different sizes
- In a B-tree, deletion is more complicated
B+trees preferred!
62
But note: if blocks are fixed size (due to disk and buffering restrictions), then lookup for a B+tree is actually better!!
63
So...
[Figure: two-level B+tree and B-tree in fixed-size blocks. B+tree: 156 records reachable from the leaves. B-tree: 8 records at the root plus the records at the leaves, total = 116.]
Conclusion: for fixed block size, the B+tree is better because it is bushier.
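The totals above can be reproduced with a short capacity calculation. The byte sizes are assumptions that this transcript does not state: 4-byte keys, 4-byte pointers, and 100-byte blocks are chosen here because they give 156 and 116 for a full two-level tree. The function names are hypothetical.

```python
def bplus_two_level_records(block=100, key=4, ptr=4):
    """Records reachable in a full two-level B+tree (assumed sizes)."""
    # Each node holds n keys and n+1 pointers: n*key + (n+1)*ptr <= block.
    n = (block - ptr) // (key + ptr)
    return (n + 1) * n                   # n+1 leaves, n record pointers each

def btree_two_level_records(block=100, key=4, ptr=4):
    """Records reachable in a full two-level B-tree (assumed sizes)."""
    # Non-leaf: n keys, n record pointers, n+1 child pointers.
    n_root = (block - ptr) // (key + 2 * ptr)
    # Leaf: keys plus record pointers only.
    n_leaf = block // (key + ptr)
    return n_root + (n_root + 1) * n_leaf
```

Under these assumptions the B+tree root has 13 children against 9 for the B-tree root, which is the sense in which the B+tree is bushier.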
64
Outline/summary
- Conventional indexes
  - Sparse vs. dense
  - Primary vs. secondary
- B trees
  - B+trees vs. B-trees
  - B+trees vs. indexed sequential
- Hashing schemes (self-study)