1
CPSC-608 Database Systems
Fall 2018
Instructor: Jianer Chen
Office: HRBB 315C
Phone:
Notes #26 Notes #7
2
Another Index Structure: Hash Tables
A hash function h maps a search key x to a bucket h(x).
A bucket is typically a disk block (probably with overflow blocks).
h(x), 0 ≤ h(x) ≤ b-1, where b is the number of buckets, gives an easy way to compute the bucket address (direct: the address is computed from h(x); indirect: h(x) is the index in a directory).
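The two addressing schemes can be sketched as follows. This is only an illustration: the bucket count `B`, the block size, the base offset, the use of `zlib.crc32` as the hash function, and the directory contents are all assumptions, not taken from the slides.

```python
import zlib

B = 8               # number of buckets b (assumed)
BLOCK_SIZE = 4096   # bytes per disk block (assumed)
BASE = 0            # hypothetical byte offset of bucket 0 in the file


def h(key: str) -> int:
    """Hash a search key to a bucket number in 0 .. B-1."""
    return zlib.crc32(key.encode("utf-8")) % B


def direct_address(key: str) -> int:
    """Direct scheme: the block address is computed from h(x)."""
    return BASE + h(key) * BLOCK_SIZE


# Indirect scheme: h(x) is an index in a directory of block addresses,
# so buckets can be placed anywhere (these addresses are made up).
directory = [40960, 0, 8192, 12288, 4096, 20480, 16384, 24576]


def indirect_address(key: str) -> int:
    """Indirect scheme: look the block address up in the directory."""
    return directory[h(key)]
```

The indirect scheme costs one extra lookup, but it lets the directory change without moving the buckets, which is what extensible hashing relies on.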
3
How do we cope with growth?
Overflows and reorganizations
Dynamic hashing
    Extensible
    Linear
4
Extensible Hashing: General framework
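The directory is indexed by the first i bits of h(x), so it has 2^i entries, 00…00 through 11…11; i is the number of bits used by the directory (the hash index).
Each bucket records the number of bits j it uses (its block index; j1, j2 in the figure), with j ≤ i. When j < i, several directory entries point to the same bucket.
Figure: x → h → h(x) → first i bits of h(x) → directory entry → bucket.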
5
Extensible hashing: Searching
input: a search key x
\\ h is the hash function, H is the directory, i is the current hash index.
m = the first i bits of h(x);
read in the disk block B with the address H[m];
search B for the records with search key x.
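A minimal sketch of the search procedure, assuming 4-bit hash values, "first bits" meaning the most significant bits, and in-memory lists standing in for disk blocks. The names `first_bits` and `search` and the record payloads are hypothetical; the hash values are the ones used in the insertion example of these notes.

```python
def first_bits(hv: int, i: int, nbits: int) -> int:
    """Return the first (most significant) i bits of an nbits-wide hash value."""
    return hv >> (nbits - i)


def search(H, i, h, x, nbits=4):
    """Return the records with search key x found in the block H[m]."""
    m = first_bits(h(x), i, nbits)
    block = H[m]   # stands in for reading the disk block at address H[m]
    return [rec for rec in block if rec[0] == x]


# Directory with hash index i = 1: entry 0 holds b, entry 1 holds c and k.
hashes = {"b": 0b0001, "c": 0b1001, "k": 0b1100}
H = [[("b", "rec-b")], [("c", "rec-c"), ("k", "rec-k")]]
print(search(H, 1, hashes.get, "k"))   # [('k', 'rec-k')]
```

Only one block is read per lookup, provided the directory is in memory and the block has no overflow blocks.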
6
Extensible hashing: Insertion
input: a tuple t with search key x
\\ h is the hash function, H is the directory, i is the current hash index.
m = the first i bits of h(x);
let the block with the address H[m] be B;
IF B has room
THEN add t to B
ELSE \\ B has no room for t
    let j be the block index of B; \\ j ≤ i
    IF i = j
    THEN { double the size of H to 2^(i+1);
           let the pointers in the new H[2k] and H[2k+1] both be equal to that in the old H[k], 0 ≤ k ≤ 2^i - 1;
           i = i + 1 }
    split B ∪ {t} into B0 and B1 (with block index j+1) using the (j+1)st bit of the hash value;
    let H[mj0**] point to B0 and H[mj1**] point to B1,
    where mj is the first j bits of h(x) and ** stands for any setting of the remaining i-j-1 bits.
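The insertion procedure can be sketched in Python as below. This is an illustration under assumptions, not the course's implementation: the class names `Block` and `ExtensibleHash`, the block capacity of 2, the 4-bit hash width and the in-memory "blocks" are all hypothetical. It also deviates from the pseudocode in one respect: instead of splitting B ∪ {t} in one step, it splits B and then retries the insertion, so that a split which sends every record to the same side is followed by another split. When no hash bits are left it raises an error where a real system would add an overflow block.

```python
class Block:
    def __init__(self, j):
        self.j = j        # block index: number of hash bits this block uses
        self.items = []   # (key, hash value) pairs


class ExtensibleHash:
    def __init__(self, h, nbits=4, capacity=2):
        self.h, self.nbits, self.capacity = h, nbits, capacity
        self.i = 0              # hash index
        self.H = [Block(0)]     # directory with 2**i entries

    def insert(self, key):
        hv = self.h(key)
        B = self.H[hv >> (self.nbits - self.i)]   # m = first i bits of h(x)
        if len(B.items) < self.capacity:          # B has room
            B.items.append((key, hv))
            return
        j = B.j
        if j == self.nbits:
            raise OverflowError("no hash bits left; an overflow block is needed")
        if self.i == j:
            # Double the directory: new H[2k] and H[2k+1] copy old H[k].
            self.H = [blk for blk in self.H for _ in (0, 1)]
            self.i += 1
        # Split B on the (j+1)st bit of the hash value.
        B0, B1 = Block(j + 1), Block(j + 1)
        for k, v in B.items:
            bit = (v >> (self.nbits - j - 1)) & 1
            (B1 if bit else B0).items.append((k, v))
        # Re-point every directory entry that pointed to B.
        for m in range(len(self.H)):
            if self.H[m] is B:
                bit = (m >> (self.i - j - 1)) & 1
                self.H[m] = B1 if bit else B0
        self.insert(key)   # retry; may split again


hashes = {"b": 0b0001, "c": 0b1001, "k": 0b1100, "g": 0b1010}
t = ExtensibleHash(hashes.get)
for key in "bckg":
    t.insert(key)
print(t.i, [[k for k, _ in blk.items] for blk in t.H])
# 2 [['b'], ['b'], ['c', 'g'], ['k']]
```

Directory entries 00 and 01 still share the block holding b, because that block has block index 1 while the hash index is 2.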
15
Insertion in Extensible Hashing
Example: each block holds at most 2 records, and the hash values are 4 bits long.
h(b) = 0001, h(c) = 1001, h(d) = 0111, h(e) = 0000, h(g) = 1010, h(k) = 1100
Initial state: hash index i = 1.
    entry 0 -> {b0001}, block index 1
    entry 1 -> {c1001, k1100}, block index 1
Insert g1010: entry 1 points to a full block. Split the block, and increase the block index. Since the block index (1) is equal to the hash index, first double the directory size: i = 2, entries 00 01 10 11.
    entries 00, 01 -> {b0001}, block index 1
    entry 10 -> {c1001, g1010}, block index 2
    entry 11 -> {k1100}, block index 2
Insert d0111: entry 01 points to {b0001}, which has room.
    entries 00, 01 -> {b0001, d0111}, block index 1
Insert e0000: entry 00 points to a full block. Split the block, and increase the block index. The block index (1) is smaller than the hash index (2), so the directory is not doubled.
    entry 00 -> {b0001, e0000}, block index 2
    entry 01 -> {d0111}, block index 2
Insert a1000 (h(a) = 1000): entry 10 points to a full block. Split the block, and increase the block index. Since the block index (2) is equal to the hash index, first double the directory size: i = 3, entries 000 001 010 011 100 101 110 111.
Final state:
    entries 000, 001 -> {b0001, e0000}, block index 2
    entries 010, 011 -> {d0111}, block index 2
    entry 100 -> {a1000, c1001}, block index 3
    entry 101 -> {g1010}, block index 3
    entries 110, 111 -> {k1100}, block index 2
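The final state of this example (hash index 3, five blocks) can be checked mechanically. The sketch below writes that state down as plain data and verifies two properties that follow from the insertion procedure: a block with block index j is pointed to by 2^(i-j) directory entries, and every record in it shares the same first j hash bits. The block labels A to E and the function name `check` are hypothetical.

```python
# Final state of the example: label -> (block index j, records in the block).
i = 3
blocks = {
    "A": (2, ["b0001", "e0000"]),
    "B": (2, ["d0111"]),
    "C": (3, ["a1000", "c1001"]),
    "D": (3, ["g1010"]),
    "E": (2, ["k1100"]),
}
directory = ["A", "A", "B", "B", "C", "D", "E", "E"]   # entries 000 .. 111


def check(i, directory, blocks):
    assert len(directory) == 2 ** i
    for name, (j, records) in blocks.items():
        entries = [m for m, b in enumerate(directory) if b == name]
        # A block with block index j is shared by 2**(i-j) directory entries.
        assert len(entries) == 2 ** (i - j)
        prefix = format(entries[0], f"0{i}b")[:j]
        for rec in records:
            bits = rec[1:]                       # hash bits follow the key letter
            assert int(bits[:i], 2) in entries   # a search for rec reaches this block
            assert bits[:j] == prefix            # records share the first j bits
    return True


print(check(i, directory, blocks))   # True
```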
41
Extensible hashing: deletion
No merging of blocks
or
Merge blocks and cut the directory if possible (reverse of the insert procedure)
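The slides give no algorithm for the second option, so the following is only one possible reading of "reverse insert procedure". It assumes a block may be merged with its buddy (the block whose directory index differs in the last bit the block uses) when both have the same block index and their records fit in one block, and that the directory may be halved when every pair H[2k], H[2k+1] points to the same block. The function names and the dictionary layout of a block are hypothetical.

```python
def try_merge(H, i, m, capacity):
    """Merge the block at H[m] with its buddy if both fit in one block."""
    B = H[m]
    j = B["j"]
    if j == 0:
        return False
    buddy = H[m ^ (1 << (i - j))]   # flip the j-th bit of the directory index
    if buddy is B or buddy["j"] != j:
        return False
    if len(B["items"]) + len(buddy["items"]) > capacity:
        return False
    merged = {"j": j - 1, "items": B["items"] + buddy["items"]}
    for k in range(len(H)):
        if H[k] is B or H[k] is buddy:
            H[k] = merged
    return True


def can_halve(H):
    """True if every pair H[2k], H[2k+1] points to the same block."""
    return len(H) > 1 and all(H[2 * k] is H[2 * k + 1] for k in range(len(H) // 2))


def halve(H):
    """Reverse of directory doubling: keep one pointer from each pair."""
    return H[::2]


# State with hash index 3 and blocks of capacity 2; then delete a.
A = {"j": 2, "items": ["b", "e"]}
Bk = {"j": 2, "items": ["d"]}
C = {"j": 3, "items": ["a", "c"]}
D = {"j": 3, "items": ["g"]}
E = {"j": 2, "items": ["k"]}
H, i = [A, A, Bk, Bk, C, D, E, E], 3

C["items"].remove("a")
if try_merge(H, i, 4, capacity=2) and can_halve(H):
    H, i = halve(H), i - 1
print(i, [blk["items"] for blk in H])
# 2 [['b', 'e'], ['d'], ['c', 'g'], ['k']]
```

Merging right after a deletion can cause repeated splitting and merging if insertions and deletions alternate near a full block, which is one reason the "no merging" option is attractive.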
42
Note: Still need overflow chains
Example: many records with duplicate keys. Each block holds at most 2 records.
insert 1100 into a full block (block index 1) that holds 1101 and 1100.
If we split: the block for 10** (block index 2) stays empty, and the block for 11** (block index 2) would have to hold 1101, 1100 and 1100, which still does not fit.
Even further split: the block for 110* (block index 3) would again have to hold 1101, 1100 and 1100, and the block for 111* (block index 3) stays empty.
Records with the same key have the same hash value, so once there are more duplicates than fit in a block, no number of splits can separate them.
45
Solution: overflow chains
insert 1100
add overflow block: the block (block index 1) keeps 1101 and 1100, and the new 1100 goes into an overflow block chained to it.
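A small sketch of both points, assuming 4-bit hash values and a block capacity of 2. The helper `bits_needed` and the class `ChainedBlock` are hypothetical names. The first part computes how many leading bits would be needed before no block overflows, and the second shows the overflow chain used instead of splitting.

```python
def bits_needed(hash_values, capacity, nbits=4):
    """Smallest number of leading bits such that no block receives more than
    `capacity` of the given hash values, or None if no prefix length works."""
    for j in range(nbits + 1):
        counts = {}
        for hv in hash_values:
            prefix = hv >> (nbits - j)
            counts[prefix] = counts.get(prefix, 0) + 1
        if max(counts.values()) <= capacity:
            return j
    return None


print(bits_needed([0b1101, 0b1100, 0b1100], capacity=2))   # 4
print(bits_needed([0b1100, 0b1100, 0b1100], capacity=2))   # None


class ChainedBlock:
    """A block that chains to an overflow block when it is full."""

    def __init__(self, capacity=2):
        self.capacity, self.items, self.overflow = capacity, [], None

    def add(self, rec):
        if len(self.items) < self.capacity:
            self.items.append(rec)
        else:
            if self.overflow is None:
                self.overflow = ChainedBlock(self.capacity)
            self.overflow.add(rec)


blk = ChainedBlock()
for hv in [0b1101, 0b1100, 0b1100]:
    blk.add(hv)
print(blk.items, blk.overflow.items)   # [13, 12] [12]
```

In the first call the records separate only when all 4 bits are used, which would mean growing the directory to 16 entries for three records; in the second call no prefix length helps at all.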
46
Extensible hashing: Summary
+ Can handle growing files: with less wasted space, with no full reorganizations
- Indirection (not bad if the directory is in memory)
- Directory doubles in size (now it fits in memory, now it does not)