Hashing
Jeff Chastine

Hashing
- Many applications require INSERT, SEARCH and DELETE operations
- Hashing can do all of these in O(1) time on average
- Based on keys
- Falls under two general categories:
  - Direct-address tables
  - Hash tables

Direct-Addressing
- Good when the universe U of keys is small
- U = {0, 1, …, m – 1}, where m is not large
- All elements have unique keys
- Table T[0..m – 1], where each slot corresponds to a key
- All operations take only O(1) time

Direct Implementation
[Figure: direct-address table T. Each key in K (actual keys), drawn from U (universe of keys), selects the slot that holds that key and its satellite data.]

Direct-Addressing Operations

DIRECT-ADDRESS-SEARCH (T, k)
    return T[k]

DIRECT-ADDRESS-INSERT (T, x)
    T[key[x]] ← x

DIRECT-ADDRESS-DELETE (T, x)
    T[key[x]] ← NIL

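The three operations above can be sketched in Python. This is only an illustration, not code from the slides: the class name `DirectAddressTable`, the use of `None` for NIL, and storing each element as a `(key, data)` tuple are assumptions.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in 0..m-1 (illustrative sketch)."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * m

    def search(self, k):
        # DIRECT-ADDRESS-SEARCH: return whatever is stored in slot k (or None).
        return self.slots[k]

    def insert(self, key, data):
        # DIRECT-ADDRESS-INSERT: the element goes into the slot named by its key.
        self.slots[key] = (key, data)

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: reset the slot to NIL.
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "three")
table.insert(8, "eight")
found = table.search(3)    # the element stored under key 3
table.delete(3)
missing = table.search(3)  # slot 3 is empty again
```

Each operation is a single array access, which is why all three run in O(1) time in the worst case.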
Hash Tables
- What are potential problems with direct addressing?
  - |U| may be impractically large
  - The set of actual keys may be small relative to U
  - Example: SSNs
- Here, hash tables require much less storage than a direct-address table
- Only catch: O(1) is the average time instead of the worst-case time!

How it works
- With direct addressing, an element with key k goes into slot k
- With hashing, it goes into slot h(k), where h is a hash function
- Hash functions try to “randomize”
- A hash function maps U to the slots of T[0..m – 1]
  - h : U → {0, 1, …, m – 1}
- Instead of |U| values, we need only m values

Hash Implementation
[Figure: hash function h maps the keys k1, …, k5 in K (actual keys), drawn from U (universe of keys), to slots h(k1), h(k4), h(k2) = h(k5) and h(k3) of table T[0..m – 1]; keys k2 and k5 collide.]

Collisions
- A collision occurs when two keys hash to the same slot
- Because |U| > m, the pigeonhole principle applies
  - Therefore, collisions must exist
- We often talk of the load factor α = n/m (n stored elements, m slots)
- Pick a good hash function
  - Near random, yet deterministic
- Can chain collisions together
  - This is where the worst case comes from
- Can use open addressing

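A small sketch of the pigeonhole argument: hashing m + 1 distinct keys into m slots forces at least one collision. The helper `find_collision` and the choice h(k) = k mod 7 are hypothetical, picked only for illustration.

```python
def find_collision(keys, h):
    """Return the first pair of distinct keys that hash to the same slot, or None."""
    seen = {}  # slot -> first key that landed there
    for k in keys:
        slot = h(k)
        if slot in seen and seen[slot] != k:
            return seen[slot], k
        seen.setdefault(slot, k)
    return None


m = 7


def h(k):
    # Assumed hash function for this example: remainder modulo m.
    return k % m


# m + 1 distinct keys but only m slots, so two keys must share a slot.
collision = find_collision(range(m + 1), h)  # keys 0 and 7 both land in slot 0
```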
Chaining
[Figure: chaining. Keys from K (actual keys), drawn from U (universe of keys), that hash to the same slot of T are linked together in a list stored at that slot.]

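A minimal sketch of chaining, under a few assumptions that are not in the slides: keys are non-negative integers, the hash function is k mod m, and each chain is a Python list rather than a linked list. The class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, m):
        self.m = m                               # number of slots
        self.n = 0                               # number of stored elements
        self.slots = [[] for _ in range(m)]      # one chain per slot

    def _h(self, key):
        # Assumed hash function: remainder modulo m.
        return key % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                         # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # Walk the chain for this slot; cost grows with the chain length.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        # alpha = n / m
        return self.n / self.m


t = ChainedHashTable(4)
t.insert(1, "one")
t.insert(5, "five")   # 5 mod 4 == 1, so this collides with key 1
t.insert(2, "two")
alpha = t.load_factor()  # 3 elements in 4 slots
```

If every key lands in the same slot, a search has to walk a single chain of length n, which is the worst case mentioned on the Collisions slide.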
Hash Functions
- What makes a good hash function?
  - Each key is equally likely to hash to any of the m slots
- If keys are random real numbers in [0, 1), then take h(k) = ⌊km⌋
- Strings? Convert them to numbers via their ASCII codes, then hash
- Most hash functions involve mod

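The two ideas on this slide can be sketched as follows. The function names are hypothetical, and reading a string as a radix-128 integer is one common way to use ASCII codes, assumed here rather than taken from the slides.

```python
import math


def hash_uniform(k, m):
    """For keys assumed uniformly distributed in [0, 1): slot is floor(k * m)."""
    return math.floor(k * m)


def string_to_int(s):
    """Interpret an ASCII string as an integer written in radix 128."""
    value = 0
    for ch in s:
        value = value * 128 + ord(ch)
    return value


def hash_string(s, m):
    """Hash a string by converting it to an integer, then taking it mod m."""
    return string_to_int(s) % m


slot_a = hash_uniform(0.5, 10)   # 5
code = string_to_int("pt")       # 112 * 128 + 116 = 14452
slot_b = hash_string("pt", 701)  # 14452 mod 701 = 432
```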
Hash Functions
- Division method:
  - h(k) = k mod m
- Multiplication method:
  - Let 0 < A < 1
  - h(k) = ⌊m (kA mod 1)⌋, where kA mod 1 is the fractional part of kA

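Both methods are short enough to write out directly. In this sketch the function names are illustrative, and the default A = (√5 − 1)/2 ≈ 0.618 is an assumption (a commonly suggested constant); the slide only requires 0 < A < 1.

```python
import math


def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    frac = (k * A) % 1.0          # fractional part of k*A
    return math.floor(m * frac)


d = hash_division(100, 12)                 # 100 mod 12 = 4
mult = hash_multiplication(123456, 16384)  # k*A is about 76300.0041, so floor(16384 * 0.0041...) = 67
```

Because the multiplication method relies on floating-point arithmetic here, very large keys may lose precision; an integer-only formulation would avoid that.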
Open Addressing
- Systematically examine, or probe, slots until the item is found
- No lists and no elements stored outside the table; thus α ≤ 1
- Instead of following pointers, we compute the sequence of slots to probe
- The probe order is not fixed; it depends on the key

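A minimal sketch of insertion and search with open addressing. It assumes the simplest probe sequence (start at key mod m and try consecutive slots, wrapping around), non-negative integer keys, and no deletion; the class and method names are illustrative.

```python
class OpenAddressTable:
    """Open-addressing hash table storing keys directly in the table (sketch)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m   # None marks an empty slot

    def _probe(self, key, i):
        # Assumed probe sequence: i-th probe is i slots past key mod m.
        return (key % self.m + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j          # slot where the key now lives
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                return None       # hit an empty slot: key is not in the table
            if self.slots[j] == key:
                return j
        return None               # probed every slot without finding the key


t = OpenAddressTable(5)
first = t.insert(3)    # slot 3
second = t.insert(8)   # 8 mod 5 == 3 is taken, so slot 4
third = t.insert(13)   # slots 3 and 4 are taken, wraps around to slot 0
```

Since every element lives in the table itself, at most m keys fit, which matches α ≤ 1 on the slide.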
Kinds of Open Addressing
- Linear probing
  - h(k, i) = (h′(k) + i) mod m
- Quadratic probing
  - h(k, i) = (h′(k) + c₁i + c₂i²) mod m
- Double hashing
  - h(k, i) = (h₁(k) + i h₂(k)) mod m

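The three probe sequences can be compared side by side. The auxiliary functions are assumptions made for this sketch: h′(k) = h₁(k) = k mod m, h₂(k) = 1 + (k mod (m − 1)), and c₁ = 1, c₂ = 3. The slides leave all of these unspecified.

```python
def linear_probe(k, i, m):
    """h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m assumed."""
    return (k % m + i) % m


def quadratic_probe(k, i, m, c1=1, c2=3):
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m, with assumed constants c1, c2."""
    return (k % m + c1 * i + c2 * i * i) % m


def double_hash_probe(k, i, m):
    """h(k, i) = (h1(k) + i*h2(k)) mod m, with assumed h1 and h2."""
    h1 = k % m
    h2 = 1 + (k % (m - 1))   # never zero, so successive probes always move
    return (h1 + i * h2) % m


def probe_sequence(probe, k, m):
    """List the slots probed for key k, for i = 0, 1, ..., m - 1."""
    return [probe(k, i, m) for i in range(m)]


m = 13
linear = probe_sequence(linear_probe, 14, m)        # starts 1, 2, 3, ...
quadratic = probe_sequence(quadratic_probe, 14, m)  # starts 1, 5, 2, ...
double = probe_sequence(double_hash_probe, 14, m)   # starts 1, 4, 7, ...
```

With m = 13 prime, the double-hashing sequence for this key visits every slot exactly once. The quadratic sequence with these particular constants revisits some slots, so c₁, c₂ and m have to be chosen carefully if the whole table must be reachable.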