CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11
[Figure: DBMS architecture ("graduate database"). Components shown: database administrator, DDL language, DDL compiler; database programmer, DML (query) language, DML compiler, query execution engine; index/file manager, buffer manager, main memory buffers, file manager, secondary storage (disks) holding tables (relations); transaction manager, concurrency control, lock table, logging & recovery.]
The Main Purpose of Index Structures
Speed up the search process.
Example query: σ_{a=6}(R). With an index, we can quickly figure out which disk blocks contain the desired tuples; otherwise we have to scan the entire R.
Example: B+ trees
Another Index Structure: Hash Tables
A hash function h maps a search key k to a bucket number h(k).
A bucket is typically a disk block (possibly with overflow blocks).
h(k), with 0 ≤ h(k) ≤ b-1 for b buckets, gives an easy way to compute the bucket address (direct: the address comes from h(k); indirect: h(k) is the index into a directory).
Example hash function
Key k = 'x_1 x_2 … x_n', an n-byte character string; we have b buckets.
Hash function: h(k) = (x_1 + x_2 + … + x_n) mod b
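The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `h` with an explicit bucket count `b` as second argument, and the choice of UTF-8 bytes for the characters x_1 … x_n, are assumptions made for the sketch.

```python
def h(key: str, b: int) -> int:
    """Sum the byte values x_1 + ... + x_n of the key and reduce mod b."""
    if b <= 0:
        raise ValueError("the number of buckets b must be positive")
    # Assumption: the n bytes of the key are its UTF-8 encoding.
    return sum(key.encode("utf-8")) % b


if __name__ == "__main__":
    # 'a' + 'b' + 'c' = 97 + 98 + 99 = 294, and 294 mod 4 = 2
    print(h("abc", 4))
```

The result is always in the range 0 … b-1, so it can be used directly as a bucket number.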
This may not be the best function… Read Knuth Vol. 3 if you really need to select a good function.
Good hash function: the expected number of keys/bucket is roughly the same for all buckets.
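One way to see why the byte-sum function may not be the best choice: it ignores the order of the characters, so all permutations of a string land in the same bucket. The sketch below counts keys per bucket; the helper names `sum_hash` and `bucket_loads` and the sample keys are made up for illustration.

```python
from collections import Counter


def sum_hash(key: str, b: int) -> int:
    """Byte-sum hash: (x_1 + ... + x_n) mod b."""
    return sum(key.encode("utf-8")) % b


def bucket_loads(keys, b, hash_fn):
    """Return a list whose entry i is the number of keys hashed to bucket i."""
    counts = Counter(hash_fn(k, b) for k in keys)
    return [counts.get(i, 0) for i in range(b)]


if __name__ == "__main__":
    anagrams = ["abc", "acb", "bac", "bca", "cab", "cba"]
    # Every permutation has the same byte sum, so one bucket gets all six keys.
    print(bucket_loads(anagrams, 4, sum_hash))  # [0, 0, 6, 0]
```

A good hash function would spread such keys so that the per-bucket counts are roughly equal.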
Within a bucket: Do we keep keys sorted?
Yes, if CPU time is critical & inserts/deletes are not too frequent.
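A minimal sketch of the trade-off, assuming a bucket is held in memory as a Python list (the helper names are hypothetical). Keeping the list sorted makes each insert shift elements, but lookups can then use binary search.

```python
import bisect


def insert_sorted(bucket: list, key) -> None:
    """Insert key while keeping the bucket sorted (costs element shifting)."""
    bisect.insort(bucket, key)


def contains(bucket: list, key) -> bool:
    """Binary search in a sorted bucket: O(log n) comparisons."""
    pos = bisect.bisect_left(bucket, key)
    return pos < len(bucket) and bucket[pos] == key


if __name__ == "__main__":
    bucket = []
    for k in ["m", "c", "x"]:
        insert_sorted(bucket, k)
    print(bucket, contains(bucket, "m"), contains(bucket, "d"))
```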
Next: example to illustrate inserts, overflows, deletes
EXAMPLE (two records/bucket)
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
Result: bucket 0 holds d; bucket 1 holds a, c; bucket 2 holds b.
INSERT: h(e) = 1
Bucket 1 is already full, so e goes into an overflow block chained to bucket 1.
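The insert behaviour in this example can be sketched as follows. The class name, the representation of a bucket as a chain of blocks, and the choice of four buckets are assumptions for illustration; the hash values for a, b, c and e are the ones given on the slide.

```python
BLOCK_CAPACITY = 2  # two records per block, as in the example


class StaticHashTable:
    def __init__(self, num_buckets, hash_fn):
        self.hash_fn = hash_fn
        # Each bucket is a chain of blocks; block 0 is the primary block,
        # any further blocks are overflow blocks.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if len(block) < BLOCK_CAPACITY:
                block.append(key)
                return
        chain.append([key])  # every block is full: add an overflow block


if __name__ == "__main__":
    slide_hash = {"a": 1, "b": 2, "c": 1, "e": 1}  # values from the slide
    table = StaticHashTable(4, slide_hash.__getitem__)
    for key in ["a", "b", "c", "e"]:
        table.insert(key)
    print(table.buckets[1])  # [['a', 'c'], ['e']]
```

Bucket 1 ends up with a full primary block and one overflow block holding e.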
EXAMPLE: deletion
[Figure: hash buckets holding the keys a, b, c, d, e, f, g; the exact layout is not recoverable from this transcript.]
DELETE: e, f, c
When a deletion frees a slot, maybe move a key such as "g" up from the overflow block.
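A minimal sketch of deletion with the optional "move up" step, assuming a bucket is a chain of blocks (primary block first, then overflow blocks). The function name and the sample chain holding d, f, g are hypothetical; they are not the layout of the slide's figure.

```python
BLOCK_CAPACITY = 2  # two records per block


def delete(chain, key):
    """Delete key from a chain of blocks; return True if it was found."""
    for block in chain:
        if key in block:
            block.remove(key)
            break
    else:
        return False
    # "Move up": repack the remaining keys so freed slots in earlier
    # blocks are filled from later (overflow) blocks.
    keys = [k for block in chain for k in block]
    packed = [keys[i:i + BLOCK_CAPACITY]
              for i in range(0, len(keys), BLOCK_CAPACITY)]
    chain[:] = packed or [[]]  # always keep the (possibly empty) primary block
    return True


if __name__ == "__main__":
    chain = [["d", "f"], ["g"]]  # primary block plus one overflow block
    delete(chain, "f")
    print(chain)  # [['d', 'g']]
```

Moving keys up frees the overflow block but costs extra block writes, which is why the slide says "maybe".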
Rule of thumb: Try to keep space utilization between 50% and 80%.
Utilization = (# keys used) / (total # keys that fit)
If < 50%, wasting space.
If > 80%, overflows become significant; how significant depends on how good the hash function is & on # keys/bucket.
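The utilization formula and the 50%/80% thresholds can be written down directly. The function names and the sample numbers below are made up for illustration.

```python
def utilization(keys_used: int, num_blocks: int, keys_per_block: int) -> float:
    """Utilization = (# keys used) / (total # keys that fit)."""
    capacity = num_blocks * keys_per_block
    if capacity <= 0:
        raise ValueError("the table must be able to hold at least one key")
    return keys_used / capacity


def rule_of_thumb(u: float) -> str:
    """Classify a utilization value against the 50%-80% rule of thumb."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows likely significant"
    return "within the rule of thumb"


if __name__ == "__main__":
    u = utilization(keys_used=5, num_blocks=4, keys_per_block=2)
    print(u, rule_of_thumb(u))  # 0.625 within the rule of thumb
```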
How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear
Extensible hashing: two ideas
(a) Use only i of the b bits output by the hash function h(k); i grows over time.
(b) Use a directory: the i bits h(k)[i] select a directory entry, which points to a bucket.
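Idea (a) amounts to a bit shift. The sketch below assumes, as the later summary slides do, that the i bits used are the first (most significant) i bits of a b-bit hash value; the function name is hypothetical.

```python
def first_bits(hash_value: int, i: int, b: int) -> int:
    """Return the first (most significant) i bits of a b-bit hash value."""
    if not 0 <= i <= b:
        raise ValueError("i must be between 0 and b")
    if not 0 <= hash_value < (1 << b):
        raise ValueError("hash_value must fit in b bits")
    return hash_value >> (b - i)


if __name__ == "__main__":
    # With b = 4 and h(k) = 1010, the first 2 bits are 10, i.e. entry 2.
    print(first_bits(0b1010, 2, 4))
```

With i bits in use the directory has 2^i entries, indexed 0 … 2^i - 1.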
Extensible Hashing: General framework
[Figure: key k is hashed to h(k); the first i bits, h(k)[i], index a directory with entries 00…00, 00…01, …, 11…11, each pointing to a bucket. i = # bits used by the directory; j1, j2 = # bits used by the buckets.]
Example: h(k) is 4 bits; 2 keys/bucket
[Figure sequence: a series of inserts that split buckets and, when needed, build a new directory with a larger i. The values of i and most of the inserted keys are not recoverable from this transcript; one of the later inserts is 1000.]
Extensible hashing: deletion
Two options:
- No merging of blocks
- Merge blocks and cut the directory if possible (reverse of the insert procedure)
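Note: Still need overflow chains
Example: many records with duplicate keys.
[Figure: inserting one more duplicate into a full block. If we split: 1101 ? The duplicates all have the same hash value, so a split cannot separate them.]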
Solution: overflow chains
Insert 1100: add an overflow block instead of splitting.
Extensible hashing: Searching (Summary A)
Input: a search key k
\\ h is the hash function, D is the directory, i is the current bit number.
1. m = the first i bits of h(k);
2. read in the disk block B with the address D[m];
3. search B (and any overflow blocks) for records with key k.
Extensible hashing: Insertion (Summary B)
Input: a tuple t with search key k
\\ h is the hash function, D is the directory, i is the current bit number.
1. m = the first i bits of h(k);
2. read in the disk block B with address D[m];
3. IF B has room THEN add t to B
4. ELSE let j be the bit number of B;
   IF i = j THEN { double the size of D; let the pointers in the new D[2m'] and D[2m'+1] both equal that in the old D[m'], 0 ≤ m' ≤ 2^i - 1; i = i + 1; }
   split B + t into B_1 and B_2, both with block bit number j+1;
   let the corresponding pointers in D go to B_1 and B_2, resp.
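The search and insertion procedures can be sketched in memory as follows. This is an illustration under stated assumptions, not the course's reference implementation: the class and method names are hypothetical, keys are taken to be b-bit integers that serve as their own hash values, blocks are Python objects rather than disk blocks, and overflow chains are left out (a block that cannot be split further raises an error).

```python
class Block:
    def __init__(self, bits):
        self.bits = bits  # j: number of hash bits this block depends on
        self.keys = []


class ExtensibleHash:
    def __init__(self, hash_bits=4, capacity=2, hash_fn=lambda k: k):
        self.b = hash_bits         # hash_fn must return a value below 2**b
        self.capacity = capacity   # keys per block
        self.hash_fn = hash_fn     # assumption: identity hash by default
        self.i = 0                 # bits currently used by the directory
        self.directory = [Block(0)]

    def _index(self, key):
        # m = the first i bits of h(k)
        return self.hash_fn(key) >> (self.b - self.i)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            block = self.directory[self._index(key)]
            if len(block.keys) < self.capacity:
                block.keys.append(key)
                return
            if block.bits == self.b:
                raise OverflowError("no hash bits left; overflow chain needed")
            if block.bits == self.i:
                # Double the directory: new D[2m] and D[2m+1] = old D[m].
                self.directory = [blk for blk in self.directory for _ in range(2)]
                self.i += 1
            self._split(block)  # then retry the insert

    def _split(self, block):
        j = block.bits + 1
        halves = (Block(j), Block(j))
        for k in block.keys:
            halves[(self.hash_fn(k) >> (self.b - j)) & 1].keys.append(k)
        for m, blk in enumerate(self.directory):
            if blk is block:
                # Bit j (counted from the left) of the i-bit index m.
                self.directory[m] = halves[(m >> (self.i - j)) & 1]


if __name__ == "__main__":
    table = ExtensibleHash()
    for key in (0b0001, 0b1001, 0b1100, 0b1010):
        table.insert(key)
    print(table.i, [blk.keys for blk in table.directory])
```

The loop in `insert` retries after each split, so a split that leaves all keys on one side simply triggers another split; with many duplicate keys this ends in the error case, which is where an overflow chain would be needed.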
Extensible hashing (Summary C)
Advantages: can handle growing files
- with less wasted space
- with no full reorganizations
Disadvantages:
- Indirection (not bad if the directory is in memory)
- Directory doubles in size (now it fits, now it does not)
How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear