Chapter 13: Indexing (Slides by Hector Garcia-Molina)

Indexing & Hashing
value -> ? -> record with that value

Topics
- Conventional indexes
- B-trees
- Hashing schemes (self-study)

Sequential File
[Figure: data file with records sorted by key (10, 20, ..., 100), two records per block]

Sequential File, Dense Index
[Figure: the sequential file (keys 10 to 100) with a dense index: one index entry per record (10, 20, 30, 40, ..., 110, 120), four entries per index block]

Sequential File, Sparse Index
[Figure: the sequential file with a sparse index: one index entry per data block (10, 30, 50, 70, 90, 110, ..., 230), each pointing to the block that starts with that key]
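A lookup through a sparse index can be sketched as follows. This is a minimal illustration, not code from the slides: the list-of-lists file layout, the name `sparse_lookup` and the use of Python's `bisect` module are assumptions made for the example.

```python
from bisect import bisect_right

# Data file: records sorted by key, two per block (layout assumed from the figure).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: the first key of each block; entry i refers to block i.
sparse_index = [block[0] for block in blocks]

def sparse_lookup(key):
    """Return (block_number, found); reads at most one data block."""
    # Rightmost index entry whose key is <= the search key.
    i = bisect_right(sparse_index, key) - 1
    if i < 0:
        return None, False  # key is smaller than every key in the file
    return i, key in blocks[i]
```

Note that the index alone cannot say whether key 45 exists; the candidate block has to be read to find out.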

Sequential File, Sparse 2nd level
[Figure: a second-level sparse index (10, 90, 170, 250, 330, 410, 490, 570) with one entry per block of the first-level sparse index (10, 30, 50, 70, 90, 110, ...), which in turn points into the sequential file]
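The two-level structure can be sketched like this. It is a hedged illustration: the four entries per index block are read off the figure, while the key range (10 to 630 in steps of 20) and the name `find_data_block` are assumptions chosen so that the second level comes out as on the slide.

```python
from bisect import bisect_right

ENTRIES_PER_BLOCK = 4  # index entries per index block, as drawn on the slide

# First-level sparse index: one key per data block (assumed keys 10, 30, ..., 630).
first_level = list(range(10, 650, 20))

# Cut the first level into index blocks.
index_blocks = [first_level[i:i + ENTRIES_PER_BLOCK]
                for i in range(0, len(first_level), ENTRIES_PER_BLOCK)]

# Second-level sparse index: the first key of each first-level index block.
second_level = [blk[0] for blk in index_blocks]

def find_data_block(key):
    """Return the data block number that may hold key, or None."""
    j = bisect_right(second_level, key) - 1      # which first-level block
    if j < 0:
        return None
    i = bisect_right(index_blocks[j], key) - 1   # which entry inside it
    return j * ENTRIES_PER_BLOCK + i
```

Only the small second level needs to stay in memory; each lookup then touches one first-level index block and one data block.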

Question: Can we build a dense, 2nd level index for a dense index?

Notes on pointers:
(1) Block pointer (sparse index) can be smaller than record pointer
[Figure: a block pointer (BP) drawn shorter than a record pointer (RP)]

Notes on pointers:
(2) If file is contiguous, then we can omit pointers (i.e., compute them)

[Figure: index keys K1 to K4 stored without pointers; the data blocks holding R1 to R4 are contiguous]
Say: 1024 B per block. If we want the block for K3, get it at offset (3-1) * 1024 = 2048 bytes.
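The offset computation is simple enough to write down directly. A minimal sketch, assuming 1-based entry numbers as on the slide; the name `block_offset` is made up for the example.

```python
BLOCK_SIZE = 1024  # bytes per block, as on the slide

def block_offset(k):
    """Byte offset of the block for the k-th index entry (k is 1-based)."""
    return (k - 1) * BLOCK_SIZE
```

For K3 this gives `block_offset(3) == 2048`, so the index needs to store no pointer at all.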

Sparse vs. Dense Tradeoff
- Sparse: less index space per record, so we can keep more of the index in memory
- Dense: can tell whether a record exists without accessing the file
(Also: sparse is better for insertions; dense is needed for secondary indexes)
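The tradeoff can be made concrete with a small sketch. The ten-record file and the two records per block are assumptions taken from the earlier figures; `exists` is a hypothetical helper name.

```python
# Sorted data file; two records per block is an assumption.
records = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
RECORDS_PER_BLOCK = 2

dense_index = list(records)                  # one entry per record
sparse_index = records[::RECORDS_PER_BLOCK]  # one entry per block

def exists(key):
    """Dense index only: answer an existence query without touching the file."""
    return key in dense_index
```

Here the sparse index has half as many entries as the dense one, but only the dense index can rule out key 45 without reading a data block.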

Terms
- Index sequential file
- Search key (≠ primary key)
- Primary index (on sequencing field)
- Secondary index
- Dense index (all search key values in)
- Sparse index
- Multi-level index

Next:
- Duplicate keys
- Deletion/Insertion
- Secondary indexes

Duplicate keys
[Figure: sequential file in which the key values 10, 20 and 30 each occur more than once, followed by 40 and 45]

Duplicate keys
Dense index, one way to implement?
[Figure: dense index with one entry per record, so the duplicate key values (10, 20, 30) are repeated in the index]

Duplicate keys
Dense index, better way?
[Figure: dense index with one entry per distinct key value (10, 20, 30, 40, ...), each pointing to the first record with that value]

Duplicate keys
Sparse index, one way?
[Figure: sparse index holding the first key of each block (10, 10, 20, 30), so duplicate values appear in the index]
Careful if looking for 20 or 30!

Duplicate keys
Sparse index, another way?
- Place first new key from block
[Figure: sparse index entries 10, 20, 30, 30, with the annotation "should this be 40?"]

Duplicate values, primary index: Summary
- Index may point to first instance of each value only
[Figure: index entry for value a pointing to the first of several a records in the file, followed by b]
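Pointing the index at the first instance only still finds every duplicate, because the file is sorted. A minimal sketch under assumptions: the key values are illustrative, record positions stand in for pointers, and `find_all` is a made-up name.

```python
# Sorted file with duplicate search-key values (values are illustrative).
file_keys = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

# Index with one entry per distinct value: value -> position of its first record.
first_pos = {}
for pos, key in enumerate(file_keys):
    first_pos.setdefault(key, pos)

def find_all(key):
    """Follow the index to the first instance, then scan forward in the file."""
    pos = first_pos.get(key)
    result = []
    while pos is not None and pos < len(file_keys) and file_keys[pos] == key:
        result.append(pos)
        pos += 1
    return result
```

The forward scan stops at the first record with a different key, so the cost is proportional to the number of duplicates.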

Deletion from sparse index
[Figure: sparse index (10, 30, 50, 70, 90, 110, 130, 150) over data blocks (10, 20), (30, 40), (50, 60), (70, 80)]

delete record 40
[Figure: 40 is removed from its block; the index is unchanged because 40 is not an index key]

delete record 30
[Figure: 30 is removed and 40 moves to the front of its block; the index entry 30 is changed to 40]

delete records 30 & 40
[Figure: the block that held 30 and 40 becomes empty; its index entry is removed and the entries 50 and 70 move up]
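The three deletion cases for a sparse index can be sketched together. This is an illustration under assumptions, not the slides' own procedure: blocks are Python lists, `index[i]` holds the first key of `blocks[i]`, and `delete_record` is a hypothetical name.

```python
def delete_record(blocks, index, key):
    """Delete key from the file and keep the sparse index consistent."""
    for i, block in enumerate(blocks):
        if key in block:
            block.remove(key)
            if not block:
                # Block is now empty: drop it together with its index entry.
                del blocks[i]
                del index[i]
            elif index[i] != block[0]:
                # The first key of the block changed: update the index entry.
                index[i] = block[0]
            return True
    return False  # key not present
```

Deleting 40 leaves the index untouched, deleting 30 changes the entry 30 to 40, and deleting both removes the entry so that 50 and 70 move up.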

Deletion from dense index
[Figure: dense index (10, 20, 30, 40, 50, 60, 70, 80) over data blocks (10, 20), (30, 40), (50, 60), (70, 80)]

Deletion from dense index
delete record 30
[Figure: 30 is removed from the file and 40 moves up in its block; the index entry 30 is removed and the entry 40 moves up]

Insertion, sparse index case
[Figure: sparse index (10, 30, 40, 60) over data blocks (10, 20), (30, free), (40, 50), (60, free)]

insert record 34
[Figure: 34 goes into the free slot after 30]
Our lucky day! We have free space where we need it!

insert record 15
[Figure: 15 goes after 10; 20 moves to the next block beside 30, and the index entry 30 becomes 20]
- Illustrated: immediate reorganization
- Variation: insert new block (chained file), update index

insert record 25
[Figure: 25 is placed in an overflow block chained to a full data block; the index is unchanged]
- Overflow blocks (reorganize later...)
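Two of the insertion cases above, free space and overflow blocks, can be sketched as follows. This is a simplified illustration: the block capacity of two, the dictionary of overflow chains and the name `insert_record` are assumptions, and the immediate-reorganization variant is not shown.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per block (assumption matching the two-slot blocks drawn)

def insert_record(blocks, overflow, index, key):
    """Insert key into its block if there is room, else into an overflow block."""
    # Target block: rightmost index entry <= key (block 0 if key is below all entries).
    i = max(bisect_right(index, key) - 1, 0)
    if len(blocks[i]) < CAPACITY:
        insort(blocks[i], key)   # free space where we need it
        index[i] = blocks[i][0]  # the first key of the block may have changed
        return "in place"
    # Block is full: chain the record to it and reorganize later.
    overflow.setdefault(i, []).append(key)
    return "overflow"
```

With the file from the figure, inserting 34 lands in the free slot next to 30, while inserting 25 finds its target block full and goes to an overflow block.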

Insertion, dense index case
- Similar
- Often more expensive . . .

Secondary indexes
[Figure: file ordered on its sequence field; the secondary key values appear in no particular order: 30, 50, 20, 70, 80, 40, 100, 10, 90, 60]

Secondary indexes
Sparse index
[Figure: sparse index (30, 20, 80, 100, 90, ...) built from the first key of each block of the unordered file]
Does not make sense!

Secondary indexes
Dense index
[Figure: dense index with entries in sorted order (10, 20, 30, 40, 50, 60, 70, ...), each pointing to its record in the unordered file; a sparse high level (10, 50, 90, ...) sits on top of the dense index]

With secondary indexes:
- Lowest level is dense
- Other levels are sparse
Also: Pointers are record pointers (not block pointers; not computed)
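A secondary index with a dense lowest level and a sparse level above it can be sketched as follows. The secondary key values come from the figure; the four entries per index block, the use of list positions as record pointers and the name `find_record` are assumptions for the example.

```python
from bisect import bisect_right

ENTRIES_PER_BLOCK = 4  # index entries per index block (assumption)

# Secondary key of each record, in file order (the file is sorted on another field).
secondary_values = [30, 50, 20, 70, 80, 40, 100, 10, 90, 60]

# Dense lowest level: sorted (key, record pointer) pairs, one per record.
dense = sorted((value, rid) for rid, value in enumerate(secondary_values))
dense_blocks = [dense[i:i + ENTRIES_PER_BLOCK]
                for i in range(0, len(dense), ENTRIES_PER_BLOCK)]

# Sparse higher level: the first key of each dense index block.
high_level = [blk[0][0] for blk in dense_blocks]

def find_record(value):
    """Return the record pointer for value, or None if no such record."""
    j = bisect_right(high_level, value) - 1
    if j < 0:
        return None
    for key, rid in dense_blocks[j]:
        if key == value:
            return rid
    return None
```

The pointers stored in the dense level are record pointers: since the file is not ordered on this key, a block pointer or a computed offset would not identify the record.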

Summary so far
- Conventional index
  Basic ideas: sparse, dense, multi-level…
- Duplicate keys
- Deletion/Insertion
- Secondary indexes

Conventional indexes
Advantage:
- Simple
- Index is sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

Outline:
- Conventional indexes
- B-Trees (NEXT)
- Hashing schemes (self-study)

NEXT: Another type of index
- Give up on sequentiality of index
- Try to get “balance”

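B+Tree Example, n=3
[Figure: root with key 100. Its left child holds key 30 over the leaves (3, 5, 11) and (30, 35). Its right child holds keys 120, 150, 180 over the leaves (100, 101, 110), (120, 130), (150, 156, 179) and (180, 200).]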

Sample non-leaf
[Figure: node with keys 57, 81, 95 and four pointers: to keys k < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, and to keys k ≥ 95]

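Sample leaf node:
[Figure: leaf with keys 57, 81, 95, reached by a pointer from a non-leaf node. Its pointers lead to the record with key 57, the record with key 81 and the record with key 95, plus one pointer to the next leaf in sequence.]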
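Choosing which pointer to follow in a non-leaf node such as (57, 81, 95) amounts to counting the keys that are less than or equal to the search key. A minimal sketch using Python's `bisect_right`; the function name `child_index` is made up for the example.

```python
from bisect import bisect_right

def child_index(keys, k):
    """Index of the pointer to follow for search key k in a non-leaf node.

    Pointer 0 covers k < keys[0]; pointer i covers keys[i-1] <= k < keys[i];
    the last pointer covers k >= keys[-1].
    """
    return bisect_right(keys, k)
```

For the sample node, key 80 follows pointer 1 (57 ≤ k < 81) and key 95 follows pointer 3 (k ≥ 95).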

Size of nodes (fixed):
- n+1 pointers
- n keys

Don’t want nodes to be too empty
Use at least:
- Non-leaf: ⌈(n+1)/2⌉ pointers
- Leaf: ⌊(n+1)/2⌋ pointers to data

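n=3
[Figure: full vs. minimum nodes. Non-leaf: the full node has keys 120, 150, 180; the min. node has key 30. Leaf: the full node has keys 3, 5, 11; the min. node has keys 30, 35. One pointer in the figure is annotated "counts even if null".]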

B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”

(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs->data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉        ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋        ⌊(n+1)/2⌋
Root                  n+1        n          1                1
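The occupancy limits in rule (3) can be computed for any order n. A small sketch, assuming the minimums are ⌈(n+1)/2⌉ for non-leaf nodes and ⌊(n+1)/2⌋ for leaves; the function name and dictionary keys are made up for the example.

```python
def occupancy(n):
    """Pointer/key limits for a B+tree of order n, by node type."""
    ceil_half = (n + 2) // 2   # integer form of ceil((n+1)/2)
    floor_half = (n + 1) // 2  # integer form of floor((n+1)/2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs_to_data": floor_half, "min_keys": floor_half},
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 1, "min_keys": 1},
    }
```

For n=3 this gives a minimum non-leaf node of 2 pointers and 1 key and a minimum leaf of 2 keys. For n=4 the two minimums differ: 3 pointers for a non-leaf node, 2 keys for a leaf.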

Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32, n=3
[Figure: root key 100; parent key 30 over the leaves (3, 5, 11) and (30, 31). 32 fits in the second leaf, giving (30, 31, 32).]

(b) Insert key = 7, n=3
[Figure: the leaf (3, 5, 11) overflows and splits into (3, 5) and (7, 11); key 7 is added to the parent next to 30.]
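Cases (a) and (b) can be sketched for a single leaf. This is an illustration under assumptions: the leaf is a sorted list of keys, the left half keeps ⌈(n+1)/2⌉ keys on a split, and `insert_into_leaf` is a hypothetical name. Updating the parent, cases (c) and (d), is not shown.

```python
from bisect import insort

def insert_into_leaf(leaf, key, n):
    """Insert key into a leaf of a B+tree of order n.

    Returns (left, right, separator). right and separator are None when the
    key fits; otherwise the leaf is split and separator is the key to add
    to the parent.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None        # (a) simple case: space available
    mid = (len(keys) + 1) // 2         # (b) overflow: split n+1 keys in two
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]       # first key of the new leaf goes up
```

With n=3, inserting 32 into (30, 31) simply yields (30, 31, 32), while inserting 7 into (3, 5, 11) splits it into (3, 5) and (7, 11) with 7 passed to the parent.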

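(c) Insert key = 160, n=3
[Figure: the leaf (150, 156, 179) splits into (150, 156) and (160, 179). The parent (120, 150, 180) then overflows and splits into (120, 150) and (180), and 160 moves up to the node holding 100.]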

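(d) New root, insert 45, n=3
[Figure: root (10, 20, 30) over the leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). The last leaf splits into (30, 32) and (40, 45); the old root overflows and splits into (10, 20) and (40), and 30 becomes the key of a new root.]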

Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

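(b) Coalesce with sibling, n=4
Delete 50
[Figure: parent keys 10, 40, 100; leaves (10, 20, 30) and (40, 50). After deleting 50, the remaining 40 is merged into the sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent.]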

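(c) Redistribute keys, n=4
Delete 50
[Figure: parent keys 10, 40, 100; leaves (10, 20, 30, 35) and (40, 50). After deleting 50, key 35 moves from the sibling to join 40, and the parent key 40 becomes 35.]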
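The choice between cases (b) and (c) at the leaf level can be sketched as follows. This is a simplified illustration, not the slides' algorithm in full: it handles only a leaf and its left sibling, assumes the leaf minimum is ⌊(n+1)/2⌋ keys, and uses the made-up name `delete_from_leaf`. Parent updates are left to the caller.

```python
def delete_from_leaf(left, leaf, key, n):
    """Delete key from leaf, whose left sibling is left, in a B+tree of order n.

    Returns (action, new_left, new_leaf). After "redistribute", the new
    separator for the parent is new_leaf[0]; after "coalesce", new_leaf is
    empty and the parent loses one key and one pointer.
    """
    min_keys = (n + 1) // 2
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_keys:
        return "ok", list(left), leaf              # (a) simple case
    if len(left) + len(leaf) <= n:
        return "coalesce", list(left) + leaf, []   # (b) everything fits in one node
    left = list(left)
    while len(leaf) < min_keys:                    # (c) borrow largest keys from sibling
        leaf.insert(0, left.pop())
    return "redistribute", left, leaf
```

With n=4, deleting 50 from (40, 50) coalesces when the sibling is (10, 20, 30) and redistributes when the sibling is (10, 20, 30, 35), in which case the new separator is 35.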

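(d) Non-leaf coalesce, n=4
Delete 37
[Figure: root 25 over the non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 leaves a leaf that is too empty; merging it with a sibling makes its parent too empty as well, so the parent is coalesced with its sibling and the old root key 25 into a single node, which becomes the new root.]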

B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!

Variation on B+tree: B-tree (no +)
- Idea: avoid duplicate keys
- Have record pointers in non-leaf nodes

[Figure: B-tree non-leaf node K1 P1 K2 P2 K3 P3. P1, P2, P3 point to the records with keys K1, K2, K3. The child pointers lead to keys x < K1, K1 < x < K2, K2 < x < K3 and x > K3.]

B-tree example, n=2
[Figure: root (65, 125); second level (25, 45), (85, 105), (145, 165); leaves (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), (110, 120), (130, 140), (150, 160), (170, 180)]
Sequence pointers not useful now! (but keep space for simplicity)

Tradeoffs:
- Advantage: B-trees have faster lookup than B+trees
- Disadvantage: in B-tree, non-leaf & leaf nodes have different sizes
- Disadvantage: in B-tree, deletion is more complicated
So: B+trees preferred!

But note:
- If blocks are fixed size (due to disk and buffering restrictions)
- Then lookup for B+tree is actually better!!

So...
[Figure: two-level trees built from blocks of the same size. B+tree: 156 records. B-tree: 8 records in the root plus the records in its leaves, Total = 116.]
Conclusion: For fixed block size, B+tree is better because it is bushier
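The totals can be reproduced with a short calculation. The sizes used here are assumptions, not values stated in this transcript: 4-byte keys, 4-byte pointers and 100-byte blocks, chosen because they give exactly 156 and 116 records for a full two-level tree. The function names are made up for the example.

```python
def bplus_capacity(block, key, ptr):
    """Records reachable in a full two-level B+tree."""
    n = (block - ptr) // (key + ptr)   # n keys + (n+1) pointers fit in a block
    return (n + 1) * n                 # n+1 leaves, n records each

def btree_capacity(block, key, ptr):
    """Records reachable in a full two-level B-tree."""
    # Non-leaf: n keys + n record pointers + (n+1) child pointers per block.
    n_root = (block - ptr) // (key + 2 * ptr)
    # Leaf: keys and record pointers only.
    n_leaf = block // (key + ptr)
    return n_root + (n_root + 1) * n_leaf

# Assumed sizes in bytes: block=100, key=4, pointer=4.
bplus_total = bplus_capacity(100, 4, 4)
btree_total = btree_capacity(100, 4, 4)
```

Under these assumptions the B-tree root holds only 8 keys and has 9 children, while the B+tree root holds 12 keys and has 13 children, which is what "bushier" means here.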

Outline/summary
- Conventional indexes
  Sparse vs. dense
  Primary vs. secondary
- B trees
  B+trees vs. B-trees
  B+trees vs. indexed sequential
- Hashing schemes (self-study)
