1
CPSC-608 Database Systems, Fall 2011. Instructor: Jianer Chen. Office: HRBB 315C. Phone: 845-4259. Email: chen@cse.tamu.edu. Notes #11
2
[Figure: DBMS architecture (graduate database). Components shown: database administrator with the DDL language and DDL compiler; database programmer with the DML (query) language and DML compiler; query execution engine; index/file manager; buffer manager with main memory buffers; file manager; secondary storage (disks) holding the data in tables (relations); transaction manager with concurrency control, lock table, and logging & recovery.]
3
The Main Purpose of Index Structures: speed up the search process. For a query such as σ_{a=6}(R), the index lets us quickly figure out which disk blocks contain the desired tuples; otherwise we have to scan the entire R. Example: B+ trees
4
Another Index Structure: Hash Tables. A hash function h maps a search key k to h(k), which selects one of the buckets. A bucket is typically a disk block (possibly with overflow blocks). h(k), with 0 ≤ h(k) ≤ b-1, gives an easy way to compute the bucket address (direct: the address comes from h(k); indirect: h(k) is the index in a directory).
5
Example hash function: the key k = 'x_1 x_2 … x_n' is an n-byte character string and we have b buckets. Hash function h(k): h(k) = (x_1 + x_2 + … + x_n) mod b
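A minimal Python sketch of this byte-sum hash function. The function name `byte_sum_hash` and the choice of UTF-8 to obtain the bytes x_1 … x_n are assumptions for illustration, not part of the slides.

```python
def byte_sum_hash(key: str, b: int) -> int:
    """h(k) = (x_1 + x_2 + ... + x_n) mod b over the bytes of the key.

    Assumption: the key is encoded as UTF-8 to get its bytes.
    """
    if b <= 0:
        raise ValueError("number of buckets b must be positive")
    return sum(key.encode("utf-8")) % b


# 'abc' has byte values 97, 98, 99, which sum to 294, and 294 mod 4 = 2.
print(byte_sum_hash("abc", 4))
```

Because addition is commutative, any two keys that are permutations of each other ('abc' and 'cba') land in the same bucket, which is one reason this may not be the best function.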
6
This may not be the best function… Read Knuth Vol. 3 if you really need to select a good function. Good hash function: the expected number of keys/bucket is roughly the same for all buckets
7
Within a bucket: do we keep keys sorted? Yes, if CPU time is critical and inserts/deletes are not too frequent
8
Next: example to illustrate inserts, overflows, deletes, using h(key)
9
EXAMPLE (two records/bucket). INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0. Buckets 0 to 3 after the inserts: bucket 0: d; bucket 1: a, c; bucket 2: b; bucket 3: empty
10
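EXAMPLE (two records/bucket). INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, giving bucket 0: d; bucket 1: a, c; bucket 2: b; bucket 3: empty. Then h(e) = 1: bucket 1 is already full, so e goes into an overflow block attached to bucket 1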
11
EXAMPLE: deletion. Buckets 0 to 3 hold the records a, b, c, e, d, f, g, with g stored below f. DELETE: e, f
12
EXAMPLE: deletion. Buckets 0 to 3 hold the records a, b, c, e, d, f, g, with g stored below f. DELETE: e, f. After deleting f, maybe move "g" up
13
EXAMPLE: deletion. Buckets 0 to 3 hold the records a, b, c, e, d, f, g, with g stored below f. DELETE: e, f, c. After deleting f, maybe move "g" up
14
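EXAMPLE: deletion. Buckets 0 to 3 hold the records a, b, c, e, d, f, g, with g stored below f. DELETE: e, f, c. After deleting f, maybe move "g" up ("d" is also marked in the figure at this step)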
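The insert, overflow, and delete behaviour illustrated in these examples can be sketched in Python as follows. This is only a sketch under stated assumptions: the class name `StaticHashTable`, the representation of a bucket as a chain of in-memory lists (primary block first, overflow blocks after it), and the policy of always moving later records up after a delete are illustrative choices; the slides only say "maybe" move records up.

```python
BLOCK_CAPACITY = 2  # two records per block, as in the slide example


class StaticHashTable:
    """Fixed number of buckets; each bucket is a chain of blocks."""

    def __init__(self, num_buckets, hash_fn):
        self.hash_fn = hash_fn
        # One chain per bucket; every chain starts with an empty primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if len(block) < BLOCK_CAPACITY:
                block.append(key)
                return
        chain.append([key])  # every block is full: add an overflow block

    def delete(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if key in block:
                block.remove(key)
                break
        else:
            return False  # key not found in this bucket
        # Assumed policy: move later records up and drop empty overflow blocks.
        records = [r for block in chain for r in block]
        chain[:] = [records[n:n + BLOCK_CAPACITY]
                    for n in range(0, len(records), BLOCK_CAPACITY)] or [[]]
        return True


# Hash values taken from the insert example: h(a)=1, h(b)=2, h(c)=1, h(d)=0, h(e)=1.
example_hash = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
table = StaticHashTable(4, example_hash.__getitem__)
for key in "abcde":
    table.insert(key)
print(table.buckets)  # bucket 1 now has an overflow block holding 'e'
```

Deleting "c" from this table would pull "e" up from the overflow block into the primary block of bucket 1 and release the overflow block.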
15
Rule of thumb: try to keep space utilization between 50% and 80%. Utilization = (# keys used) / (total # keys that fit)
16
Rule of thumb: try to keep space utilization between 50% and 80%. Utilization = (# keys used) / (total # keys that fit). If < 50%, we are wasting space. If > 80%, overflows become significant; how significant depends on how good the hash function is and on # keys/bucket
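As a small worked illustration of the utilization formula, here is a Python sketch. The function names and the assumption that "total # keys that fit" means buckets times keys per bucket (primary blocks only, ignoring overflow blocks) are illustrative, not taken from the slides.

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Utilization = (# keys used) / (total # keys that fit).

    Assumption: capacity counts primary blocks only.
    """
    return keys_used / (num_buckets * keys_per_bucket)


def in_target_range(u: float, low: float = 0.5, high: float = 0.8) -> bool:
    """Rule of thumb from the slide: keep utilization between 50% and 80%."""
    return low <= u <= high


# Five keys stored in 4 buckets of 2 keys each: 5 / 8 = 0.625.
u = utilization(5, 4, 2)
print(u, in_target_range(u))
```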
17
How do we cope with growth? (1) Overflows and reorganizations. (2) Dynamic hashing: extensible hashing, linear hashing
19
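Extensible hashing: two ideas. (a) Use i of the b bits output by the hash function h(k), for example 00110101; use the first i bits, where i grows over time. (b) Use a directory: the first i bits of h(k), written h(k)[i], index a directory entry that points to the bucket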
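Idea (a) amounts to taking the i most significant bits of a b-bit hash value. A minimal Python sketch, assuming the hash value is held as a non-negative integer; the function name `first_bits` is hypothetical:

```python
def first_bits(hash_value: int, i: int, b: int) -> int:
    """Return the i most significant of the b bits of hash_value."""
    if not 0 <= i <= b:
        raise ValueError("i must satisfy 0 <= i <= b")
    return hash_value >> (b - i)


# With h(k) = 00110101 (b = 8), the first 4 bits are 0011.
print(format(first_bits(0b00110101, 4, 8), "04b"))
```

As i grows by one, each old prefix m becomes either 2m or 2m+1, which is what makes doubling the directory cheap.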
20
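Extensible Hashing: General framework. [Figure: a key k is hashed by h to h(k); its first i bits, h(k)[i], index the directory, whose entries run from 00…00, 00…01 up to 11…11. i = # bits used by the directory. Each directory entry points to a bucket; each bucket carries its own number j1, j2, … = # bits used by the bucket.]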
21
Example: h(k) is 4 bits; 2 keys/bucket. i = 1. Directory entry 0 points to a bucket (j = 1) holding 0001; entry 1 points to a bucket (j = 1) holding 1001, 1100. Insert 1010
22
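Example: h(k) is 4 bits; 2 keys/bucket. i = 1. Directory entry 0 points to a bucket (j = 1) holding 0001; entry 1 points to a bucket (j = 1) holding 1001, 1100. Insert 1010: the bucket for entry 1 is full, so it is split and a new block is added to take part of the keys 1001, 1100, 1010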
23
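Example: h(k) is 4 bits; 2 keys/bucket. i = 1. Directory entry 0 points to a bucket (j = 1) holding 0001; entry 1 points to a bucket (j = 1) holding 1001, 1100. Insert 1010: the bucket for entry 1 is full and is split. New directory: i = 2, with entries 00, 01, 10, 11; the two blocks produced by the split get j = 2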
24
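Example continued. i = 2; directory entries 00, 01, 10, 11. Buckets: (j = 1) 0001, shared by entries 00 and 01; (j = 2) 1001, 1010 for entry 10; (j = 2) 1100 for entry 11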
25
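Example continued. i = 2; directory entries 00, 01, 10, 11. Buckets: (j = 1) 0001, shared by entries 00 and 01; (j = 2) 1001, 1010; (j = 2) 1100. Insert: 0111. It goes into the bucket holding 0001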
26
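Example continued. i = 2; directory entries 00, 01, 10, 11. Buckets: (j = 1) 0001, 0111, shared by entries 00 and 01; (j = 2) 1001, 1010; (j = 2) 1100. Insert: 0000. Its bucket (0001, 0111) is already full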
27
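Example continued. i = 2; directory entries 00, 01, 10, 11. Insert: 0000 into the full bucket (j = 1) holding 0001, 0111. The bucket is split: 0000, 0001 go to one block and 0111 to the other. The buckets (j = 2) 1001, 1010 and (j = 2) 1100 are unchanged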
28
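Example continued. i = 2; directory entries 00, 01, 10, 11. Insert: 0000 into the full bucket (j = 1) holding 0001, 0111. The bucket is split: 0000, 0001 go to one block and 0111 to the other, both now with j = 2. Because the old j = 1 was smaller than i = 2, the directory does not grow. The buckets (j = 2) 1001, 1010 and (j = 2) 1100 are unchanged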
29
Example continued. i = 2; directory entries 00, 01, 10, 11. Buckets: entry 00: (j = 2) 0000, 0001; entry 01: (j = 2) 0111; entry 10: (j = 2) 1001, 1010; entry 11: (j = 2) 1100
30
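Example continued. i = 2; directory entries 00, 01, 10, 11. Buckets: entry 00: (j = 2) 0000, 0001; entry 01: (j = 2) 0111; entry 10: (j = 2) 1001, 1010; entry 11: (j = 2) 1100. Insert: 1000. The bucket for entry 10 is full, so it is split into 1000, 1001 and 1010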
31
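Example continued. i = 2; directory entries 00, 01, 10, 11. Buckets: entry 00: (j = 2) 0000, 0001; entry 01: (j = 2) 0111; entry 10: (j = 2) 1001, 1010; entry 11: (j = 2) 1100. Insert: 1000. The bucket for entry 10 is full and has j = i = 2, so the directory doubles: i = 3, with entries 000, 001, 010, 011, 100, 101, 110, 111. The bucket is split into 1000, 1001 and 1010, both blocks with j = 3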
32
Extensible hashing: deletion. Two options: (1) no merging of blocks; (2) merge blocks and cut the directory if possible (the reverse of the insert procedure)
33
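Note: still need overflow chains. Example: many records with duplicate keys. A bucket (j = 1) holds 1101, 1100; insert 1100. If we split (both blocks with j = 2): 1100, 1100 in one block and 1101 in the other? Splitting can never separate records whose keys are identical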
34
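Solution: overflow chains. A bucket (j = 1) holds 1101, 1100; insert 1100: instead of splitting, add an overflow block to the bucket (the slide shows 1101 in the overflow block) and leave j = 1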
35
Summary A. Extensible hashing: Searching
input: a search key k
\\ h is the hash function, D is the directory, i is the current bit number.
1. m = the first i bits of h(k);
2. read in the disk block B with the address D[m].
36
Summary B. Extensible hashing: Insertion
input: a tuple t with search key k
\\ h is the hash function, D is the directory, i is the current bit number.
1. m = the first i bits of h(k);
2. read in the disk block B with address D[m];
3. IF B has room THEN add t in B
4. ELSE let j be the bit number of B
   IF i = j THEN {double the size of D; let the pointers in the new D[2h] and D[2h+1] both equal that in the old D[h], 0 ≤ h ≤ 2^i - 1 (here h is a directory index, not the hash function); i = i + 1;}
   split B + t into B1 and B2, both with block bit number j+1;
   let the two corresponding pointers in D go to B1 and B2, resp.
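The search and insertion procedures can be sketched in Python as below. This is a sketch under stated assumptions, not the instructor's implementation: the class names `Bucket` and `ExtensibleHash` are hypothetical, blocks are in-memory lists rather than disk blocks, and the table stores hash values directly as integers. Two deliberate deviations are labeled in the comments: the insert retries after a split (in case all keys fall on one side), and it raises an error where a real system would add an overflow block.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # j: number of hash bits this block uses
        self.keys = []


class ExtensibleHash:
    def __init__(self, hash_bits=4, capacity=2):
        self.b = hash_bits        # b: total bits produced by the hash function
        self.capacity = capacity  # keys per block
        self.i = 1                # i: bits used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, h):
        return h >> (self.b - self.i)  # m = first i bits of h(k)

    def search(self, h):
        return h in self.directory[self._index(h)].keys

    def insert(self, h):
        while True:  # assumption: retry after each split until there is room
            bucket = self.directory[self._index(h)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            if bucket.depth == self.b:
                # Assumption: a real system would add an overflow block here.
                raise OverflowError("all hash bits used; overflow block needed")
            if bucket.depth == self.i:
                # Double D: new D[2m] and D[2m+1] both point to old D[m].
                self.directory = [blk for blk in self.directory for _ in range(2)]
                self.i += 1
            self._split(bucket)

    def _split(self, bucket):
        j = bucket.depth + 1
        low, high = Bucket(j), Bucket(j)
        for key in bucket.keys:  # redistribute on the j-th bit of each key
            bit = (key >> (self.b - j)) & 1
            (high if bit else low).keys.append(key)
        for m, blk in enumerate(self.directory):
            if blk is bucket:  # repoint every entry that referenced the old block
                bit = (m >> (self.i - j)) & 1
                self.directory[m] = high if bit else low


# Replays the slide example: h(k) is 4 bits, 2 keys/bucket.
table = ExtensibleHash(hash_bits=4, capacity=2)
for h in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1000):
    table.insert(h)
print(table.i, [sorted(blk.keys) for blk in table.directory])
```

Replaying the inserts from the example ends with i = 3 and an 8-entry directory in which entries 000 and 001 still share the block holding 0000 and 0001.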
37
Summary C. Extensible hashing. (+) Can handle growing files: with less wasted space, and with no full reorganizations. (-) Indirection (not bad if the directory is in memory). (-) Directory doubles in size (now it fits, now it does not)
38
How do we cope with growth? (1) Overflows and reorganizations. (2) Dynamic hashing: extensible hashing, linear hashing