CS4432: Database Systems II Lecture #12 Professor Elke A. Rundensteiner CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing key h(key) <key> Buckets (typically 1 disk block) . CS 4432 lecture #11 - indexing & hashing
One example hash function Key = ‘x1 x2 … xn’ n-byte character string Have b buckets Example hash function : h: add (x1 + x2 + ….. Xn) modulo b CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing This may not be best function … Read Knuth Vol. 3 if you really need to select a good function. Good hash Expected number of function: keys/bucket is the same for all buckets CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Within a bucket: Do we keep keys sorted? Yes, but only if CPU time critical & Inserts/Deletes not too frequent CS 4432 lecture #11 - indexing & hashing
Next: example to illustrate inserts, overflows, deletes h(K) CS 4432 lecture #11 - indexing & hashing
EXAMPLE 2 records/bucket 1 2 3 INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 d a c b e h(e) = 1 CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing EXAMPLE: deletion Delete: e f 1 2 3 a d b d c c e maybe move “g” up f g CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket CS 4432 lecture #11 - indexing & hashing
How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible hashing Others … CS 4432 lecture #11 - indexing & hashing
Extensible hashing : idea 1 (a) Use i of b bits output by hash function b h(K) use i grows over time…. Note: enables future doubling of space ! 00110101 CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Extensible hashing : idea 2 (b) Hash to directory of pointers to buckets (instead of buckets directly) h(K)[i ] to bucket (c) Assume directory of pointers is contiguous Note : Double space by doubling the directory ! . . CS 4432 lecture #11 - indexing & hashing
Example: h(k) is 4 bits; 2 keys/bucket New directory 2 00 01 10 11 i = 1 i = 0001 1 1 1 1001 1 1100 1010 1100 Insert 1010 CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Example continued 2 0000 0111 0001 i = 2 00 01 10 11 1 0001 0111 2 1001 1010 Insert: 0111 0000 2 1100 CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Example continued 000 001 010 011 100 101 110 111 3 i = 0000 2 i = 0001 2 00 01 10 11 0111 2 1001 1010 2 1001 1010 Insert: 1001 2 1100 CS 4432 lecture #11 - indexing & hashing
Extensible hashing: deletion Merge blocks and cut directory if possible (Reverse insert procedure) CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Extensible hashing Summary If directory fits into main memory, then access cost is 1 IO, otherwise 2 IOs Can handle growing files - with less wasted space - with no full reorganizations + + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Use what when : Indexing : Tree-Structures vs Hashing CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Indexing vs Hashing Hashing good for equality probes given key: e.g., SELECT … FROM R WHERE R.A = 5 CS 4432 lecture #11 - indexing & hashing
lecture #11 - indexing & hashing Indexing vs Hashing Tree indexing (Including B Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 CS 4432 lecture #11 - indexing & hashing