Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6
Hashing
  Data items stored in an array of some fixed size
    Hash table
  Search performed using some part of the data item
    Key
  Used for performing insertions, deletions, and finds in constant average time
  Operations requiring ordering information not supported efficiently
    Such as findMin, findMax
Hash Table Example
Hash Table Applications
  Comparing search efficiency of different data structures:
    Vector, list: O(N)
    AVL search tree: O(log(N))
    Hash table: O(1) expected time
  Compilers to keep track of declared variables
    Symbol tables
  Mapping from name to id
  On-line spelling checkers
Hash Functions
  Map keys to integers (which represent table indices)
    Hash(Key) = Integer
  Evenly distributed index values
    Even if the input data is not evenly distributed
  What happens if multiple keys are mapped to the same integer (same position)?
    Collision management (discussed in detail later)
    Collisions are likely to be reduced if keys are evenly distributed over the hash table
Simple Hash Functions
  Assumptions:
    K: an unsigned 32-bit integer
    M: the number of buckets (the number of entries in a hash table)
  Goal:
    If a bit is changed in K, all bits are equally likely to change for Hash(K)
    So that items are evenly distributed in the hash table
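To make the goal concrete, the sketch below counts how many bits of Hash(K) change when a single bit of K is flipped. The names `modHash` and `changedBits` and the choice M = 16 are illustrative assumptions, not part of the slides. With M a power of two, K % M keeps only the low 4 bits of K, so flipping any higher bit leaves the hash unchanged, which is the opposite of the stated goal.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hash(K) = K % M, the simplest candidate hash function.
std::uint32_t modHash(std::uint32_t k, std::uint32_t m) {
    assert(m > 0);  // a table must have at least one bucket
    return k % m;
}

// Number of bits of Hash(K) that differ after flipping bit `bit` of K.
int changedBits(std::uint32_t k, int bit, std::uint32_t m) {
    assert(bit >= 0 && bit < 32);
    std::uint32_t flipped = k ^ (std::uint32_t{1} << bit);
    std::uint32_t diff = modHash(k, m) ^ modHash(flipped, m);
    return static_cast<int>(std::bitset<32>(diff).count());
}
```

For example, `changedBits(1234, 10, 16)` is 0 because bit 10 never reaches the low 4 bits, while `changedBits(1234, 0, 16)` is 1.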
A Simple Function
  What if Hash(K) = K % M, where M is any integer value?
  What is wrong?
    Values of K may not be evenly distributed
    But Hash(K) needs to be evenly distributed
  Suppose M = 10, K = 10, 20, 30, 40
    Then K % M = 0, 0, 0, 0
Another Simple Function
  If Hash(K) = K % P, P = prime number
  Suppose P = 11
    K = 10, 20, 30, 40
    K % P = 10, 9, 8, 7
  More uniform distribution…
  So hash tables often have a prime number of entries
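The comparison between a table size of 10 and a prime table size of 11 can be checked with a short sketch. The helper name `bucketCounts` is an assumption made for illustration; it simply tallies how many keys fall into each bucket under Hash(K) = K % m.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Count how many keys land in each of m buckets under Hash(K) = K % m.
std::vector<int> bucketCounts(const std::vector<std::uint32_t>& keys,
                              std::uint32_t m) {
    assert(m > 0);  // a table must have at least one bucket
    std::vector<int> counts(m, 0);
    for (std::uint32_t k : keys) {
        ++counts[k % m];
    }
    return counts;
}
```

For the keys 10, 20, 30, 40, `bucketCounts(keys, 10)` puts all four keys in bucket 0, while `bucketCounts(keys, 11)` puts one key each in buckets 10, 9, 8, and 7.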
A Simple Hash for Strings
  unsigned int Hash(const string& Key) {
      unsigned int hash = 0;
      for (size_t j = 0; j != Key.size(); ++j) {
          hash += Key[j];
      }
      return hash;
  }
  Problem: Small sized keys may not use a large fraction of a large hash table
    Keys of at most 8 ASCII characters sum to at most 8 × 127 = 1016
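The weakness of the additive hash can be seen in a small sketch. The names `additiveHash` and `bucketOf`, and the table size used in the note below, are assumptions for illustration. Because the hash is a plain sum of character codes, short keys produce small values, and any two keys made of the same letters collide.

```cpp
#include <cassert>
#include <string>

// Additive hash as on the slide: the sum of the character codes.
unsigned int additiveHash(const std::string& key) {
    unsigned int hash = 0;
    for (unsigned char c : key) {
        hash += c;
    }
    return hash;
}

// Bucket index for a table with tableSize entries.
unsigned int bucketOf(const std::string& key, unsigned int tableSize) {
    assert(tableSize > 0);  // a table must have at least one bucket
    return additiveHash(key) % tableSize;
}
```

With a table of 10007 entries, `bucketOf("zzzzzzzz", 10007)` is only 976, so short lowercase keys never reach most of the table. The anagrams "abc" and "cab" both hash to 294.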
Another Simple Hash Function
  // Assumes Key has at least three characters
  unsigned int Hash(const string& Key) {
      return Key[0] + 27*Key[1] + 729*Key[2];
  }
  Problem: English does not use random strings, so the hash values are not uniformly distributed
  Using more characters of the key can improve the hash function
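A brief sketch shows why looking at only three characters distributes English words poorly. The name `prefixHash` is an assumption for illustration, and the function assumes keys of at least three characters. Every word that shares its first three letters gets the same hash value, no matter how the rest of the word differs.

```cpp
#include <cassert>
#include <string>

// Three-character hash as on the slide.
// Assumes the key has at least three characters.
unsigned int prefixHash(const std::string& key) {
    assert(key.size() >= 3);
    return key[0] + 27 * key[1] + 729 * key[2];
}
```

For example, "string", "street", and "strong" all begin with "str" and therefore collide under `prefixHash`.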
A Better Hash Function
  unsigned int Hash(const string& Key) {
      unsigned int hash = 0;
      for (size_t j = 0; j != Key.size(); ++j)
          hash = 37*hash + (Key[j]-'a'+1);
      return hash % TableSize;
  }
  The for loop computes $\sum_{i=0}^{n} a_i 37^{n-i}$ (for a key of $n+1$ characters) using Horner's rule, where $a_i$ has the value 1 for 'a', 2 for 'b', etc.
    $a_3 + 37a_2 + 37^2 a_1 + 37^3 a_0 = 37(37(37a_0 + a_1) + a_2) + a_3$
  The for loop implicitly performs arithmetic modulo $2^k$, where $k$ is the number of bits in an unsigned int
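The Horner's rule evaluation can be written as a self-contained sketch. Here the table size is passed as a parameter named `tableSize` rather than read from a global, and the function is named `hornerHash`; both are assumptions made so the example compiles on its own. As on the slide, keys are assumed to be lowercase letters.

```cpp
#include <cassert>
#include <string>

// Polynomial hash evaluated with Horner's rule.
// Assumes lowercase keys, so 'a' maps to 1, 'b' to 2, and so on.
unsigned int hornerHash(const std::string& key, unsigned int tableSize) {
    assert(tableSize > 0);  // a table must have at least one bucket
    unsigned int hash = 0;  // unsigned overflow wraps modulo 2^k
    for (char c : key) {
        hash = 37 * hash + static_cast<unsigned int>(c - 'a' + 1);
    }
    return hash % tableSize;
}
```

For the key "abcd" the loop computes 37(37(37·1 + 2) + 3) + 4 = 53506, so `hornerHash("abcd", 10007)` is 3471. Unlike the additive hash, the result depends on character order: "abc" and "cab" land in different buckets of a 101-entry table.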
STL Hash Tables
  STL extensions
    hash_set
    hash_map
  The key type, hash function, and equality operator may need to be provided
  Available in the new standard as unordered_set and unordered_map
    <tr1/unordered_map> or <unordered_map>
    <tr1/unordered_set> or <unordered_set>
  Example: Lec24/hashmapex.cpp
  Reference
    www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
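As a minimal sketch of supplying the key type, hash function, and equality operator, the example below uses `std::unordered_map` from `<unordered_map>`. It is not the course's Lec24/hashmapex.cpp. The `Symbol` key type, the `SymbolHash` and `SymbolEq` functors, and the `declare` helper are all hypothetical names chosen to echo the symbol-table application.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type: a variable name plus its scope depth.
struct Symbol {
    std::string name;
    int scope;
};

// Equality predicate: two symbols match when both fields match.
struct SymbolEq {
    bool operator()(const Symbol& a, const Symbol& b) const {
        return a.name == b.name && a.scope == b.scope;
    }
};

// Hash functor: combine the standard hashes of the two fields.
struct SymbolHash {
    std::size_t operator()(const Symbol& s) const {
        return 37 * std::hash<std::string>{}(s.name) + std::hash<int>{}(s.scope);
    }
};

using SymbolTable = std::unordered_map<Symbol, int, SymbolHash, SymbolEq>;

// Insert or overwrite the id for a symbol; returns the table size afterwards.
std::size_t declare(SymbolTable& table, const Symbol& sym, int id) {
    table[sym] = id;
    return table.size();
}
```

Declaring `Symbol{"x", 0}` and `Symbol{"x", 1}` creates two distinct entries, and a later `table.find(Symbol{"y", 0})` returns `table.end()` because no such key was inserted.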