Published by Everett Johns. Modified over 9 years ago.
1
Hash Functions
Andy Wang
Data Structures, Algorithms, and Generic Programming
2
Introduction
Hash function
– Maps keys to integers (buckets): Hash(Key) = Integer
– Ideally in a random-like manner
– Bucket values should be evenly distributed, even if the input data is not
3
An Example
ID Number Generation
– Key = your name
– Hash(Key) = a number
Not a great hash function…
– Two people with the same name will have the same number…
4
Simple Hash Functions
Assumptions:
– K: an unsigned 32-bit integer
– M: the number of buckets (the number of entries in a hash table)
Goal:
– If one bit of K changes, every bit of Hash(K) is equally likely to change
5
A Simple Hash Function…
What if K = M?
Hash(K) = K
What is wrong?
Your student ID = SSN
– I can’t use your SSN to post your grades…
6
Another Simple Function
If K > M
Hash(K) = K % M
What is wrong?
Suppose M = 4, K = 2, 4, 6, 8
K % M = 2, 0, 2, 0
– These even keys only ever reach buckets 0 and 2; buckets 1 and 3 stay empty
7
Yet Another Simple Function
If K > P, P = prime number
Hash(K) = K % P
Suppose P = 3, K = 2, 4, 6, 8
K % P = 2, 1, 0, 2
More uniform distribution…but still problematic for other cases
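To see the difference between M = 4 and P = 3 on the same keys, the bucket occupancy can be tallied directly. This is a minimal sketch; the helper name bucket_counts and its signature are assumptions made for illustration, not part of the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many keys land in each of m buckets under Hash(K) = K % m.
   counts must have room for m entries; it is zeroed here. */
void bucket_counts(const uint32_t *keys, size_t n, uint32_t m, size_t *counts)
{
    for (uint32_t b = 0; b < m; b++) {
        counts[b] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        counts[keys[i] % m]++;
    }
}
```

For the keys 2, 4, 6, 8, calling it with m = 4 fills only buckets 0 and 2, while m = 3 touches all three buckets (8 % 3 is 2, so bucket 2 is hit twice).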
8
More on Prime Numbers
K > P1 > P2, where P1 and P2 are prime numbers
Hash(K) = (K % P1) % P2
Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10
(K % 5) = 2, 4, 1, 3, 0
(K % 5) % 3 = 2, 1, 1, 0, 0
Still uniform distribution
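The two-prime scheme is a one-liner in C. The sketch below uses a hypothetical function name, hash_two_primes, and simply reproduces the arithmetic from the slide.

```c
#include <stdint.h>

/* Hash(K) = (K % P1) % P2, with P1 > P2 both prime (caller's responsibility). */
uint32_t hash_two_primes(uint32_t k, uint32_t p1, uint32_t p2)
{
    return (k % p1) % p2;
}
```

With P1 = 5 and P2 = 3, the five possible residues 0 to 4 map to buckets 0, 1, 2, 0, 1, so buckets 0 and 1 each receive two residues and bucket 2 receives one. The distribution is close to uniform but not exactly so.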
9
Polynomial Functions
If K > P, P = prime number
Hash(K) = K(K + 3) % P
Slightly better than pure modulo functions
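One practical detail when coding this formula: for a 32-bit K, the product K(K + 3) can exceed 32 bits, and for the largest keys even 64 bits. A minimal sketch that avoids the overflow by reducing modulo P before multiplying is shown here; the name hash_poly is an assumption, and the result equals K(K + 3) % P computed in exact arithmetic.

```c
#include <stdint.h>

/* Hash(K) = K(K + 3) % P without overflow.
   Both factors are reduced mod p first, so the product fits in 64 bits
   for any 32-bit p greater than 0. */
uint32_t hash_poly(uint32_t k, uint32_t p)
{
    uint64_t a = k % p;           /* K mod P       */
    uint64_t b = (a + 3) % p;     /* (K + 3) mod P */
    return (uint32_t)((a * b) % p);
}
```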
10
How About…
Hash(K) = rand()
What is wrong?
Not repeatable: the same key would hash to a different value on each call
11
How About…
K > P, P = prime number
Hash(K) = rand(K) % P
Better randomness
Can be expensive to compute random numbers
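One possible reading of rand(K) is "seed the generator with K and take its first output". The sketch below follows that reading using the standard srand and rand; the function name hash_seeded is hypothetical. Reseeding on every call is the expense the slide mentions, and it also overwrites the program's global random state.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hash(K) = rand(K) % P, interpreting rand(K) as the first value produced
   after seeding with K. Repeatable, because the same seed always yields the
   same sequence. Note: this clobbers the global rand() state. */
uint32_t hash_seeded(uint32_t k, uint32_t p)
{
    srand((unsigned int)k);
    return (uint32_t)rand() % p;
}
```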
12
Pre-generated Randomness
Two prime numbers: P1 and P2
K > P1 and K > P2
A table R[P1], with R[i] pre-initialized to rand(i) % P2
Hash(K) = R[K % P1]
Slight problem: possible duplicate mapping
13
To Avoid Duplicate Mapping…
Two prime numbers: P1 and P2
K > P1 and K > P2
A table R[P1], with R[i] pre-initialized to unique random numbers
Hash(K) = R[K % P1]
14
An Example
K = 0…2^32, P1 = 3, P2 = 5
R[3] = {0, 4, 1}
Hash(K) = R[K % 3]
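A table of unique random values such as R[3] = {0, 4, 1} can be generated once at start-up. The following is a sketch under stated assumptions: the names build_unique_table and table_hash are made up for illustration, values are drawn from the range 0 to P2 - 1 by simple rejection sampling with the standard rand, and the primes are assumed small enough that this quadratic approach is acceptable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill r[0..p1-1] with distinct pseudo-random values in [0, p2).
   Requires p1 <= p2, otherwise distinct values cannot exist.
   Returns 0 on success, -1 on bad arguments. */
int build_unique_table(uint32_t *r, size_t p1, uint32_t p2, unsigned int seed)
{
    if (p2 == 0 || p1 > p2) {
        return -1;
    }
    srand(seed);
    for (size_t i = 0; i < p1; i++) {
        uint32_t candidate;
        int seen;
        do {
            /* Draw a value and reject it if an earlier slot already holds it. */
            candidate = (uint32_t)rand() % p2;
            seen = 0;
            for (size_t j = 0; j < i; j++) {
                if (r[j] == candidate) {
                    seen = 1;
                    break;
                }
            }
        } while (seen);
        r[i] = candidate;
    }
    return 0;
}

/* Hash(K) = R[K % P1] */
uint32_t table_hash(const uint32_t *r, size_t p1, uint32_t k)
{
    return r[k % p1];
}
```

With the slide's table {0, 4, 1}, the key 7 maps to R[7 % 3] = R[1] = 4.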
15
Hashing a Sequence of Keys
K = {K1, K2, …, Kn}
E.g., Hash(“test”) = 98157
Design Principles
– Use the entire key
– Use the ordering information
– Use pre-generated randomness
16
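Use the Entire Key
unsigned int Hash(const char *Key) {
    unsigned int hash = 0;
    /* XOR every character of the NUL-terminated key into the hash */
    for (unsigned int j = 0; Key[j] != '\0'; j++) {
        hash = hash ^ (unsigned char)Key[j];
    }
    return hash;
}
Problem: XOR ignores order, so Hash("ab") == Hash("ba")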
17
Use the Ordering Information
unsigned int Hash(const char *Key) {
    unsigned int hash = 0;
    for (unsigned int j = 0; Key[j] != '\0'; j++) {
        hash = hash ^ (unsigned char)Key[j];
        hash = /* hash with some shiftings */;
    }
    return hash;
}
Problem: for short keys, Hash will not perturb all 32 bits (clustering)
18
Use Pre-generated Randomness
unsigned int Hash(const char *Key) {
    unsigned int hash = 0;
    for (unsigned int j = 0; Key[j] != '\0'; j++) {
        /* R holds one pre-generated random number per byte value */
        hash = hash ^ R[(unsigned char)Key[j]];
        hash = /* hash with some shiftings */;
    }
    return hash;
}
19
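CRC Variant
Do 5-bit circular shift of hash
XOR hash and K[j]
…
for (…) {
    highorder = hash & 0xf8000000;    /* save the top 5 bits */
    hash = hash << 5;                 /* shift left by 5 */
    hash = hash ^ (highorder >> 27);  /* wrap the saved bits to the low end */
    hash = hash ^ K[j];               /* mix in the next key element */
}
…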
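The loop fragment above can be completed into a whole function. This is a sketch that assumes the key is a NUL-terminated string and that the hash starts at 0; the name crc_variant_hash is chosen for illustration only.

```c
#include <stdint.h>

/* CRC-variant string hash: rotate the 32-bit hash left by 5 bits,
   then XOR in the next byte of the key. */
uint32_t crc_variant_hash(const char *key)
{
    uint32_t hash = 0;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        uint32_t highorder = hash & 0xf8000000u;  /* top 5 bits */
        hash <<= 5;
        hash ^= highorder >> 27;                  /* wrap them around */
        hash ^= *p;
    }
    return hash;
}
```

Because of the rotation, order now matters: in ASCII, "ab" hashes to 3138 and "ba" to 3105. Both values are small, which illustrates the short-key weakness: two characters reach only the low 12 bits.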
20
CRC Variant
+ For long keys, all 32 bits are exercised
+ More randomness toward lower bits
- Not all bits are changed for short keys
21
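BUZ Hash
Set up an array R to store precomputed random numbers
…
for (…) {
    highorder = hash & 0x80000000;    /* save the top bit */
    hash = hash << 1;                 /* shift left by 1 */
    hash = hash ^ (highorder >> 31);  /* wrap the saved bit to the low end */
    hash = hash ^ R[K[j]];            /* mix in a random number for this element */
}
…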
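A complete version of the BUZ loop needs the table R to be filled before hashing. The sketch below assumes byte-sized key elements (so R has 256 entries) and fills the table with a xorshift32 generator; the generator choice and the names buz_table, buz_init, and buz_hash are assumptions, since the slides do not say how R is produced.

```c
#include <stdint.h>

static uint32_t buz_table[256];  /* plays the role of R in the slides */

/* Fill the table with pseudo-random 32-bit values from xorshift32.
   The generator is an illustrative choice; a zero seed is replaced by 1
   because xorshift32 cannot leave the all-zero state. */
void buz_init(uint32_t seed)
{
    uint32_t x = seed ? seed : 1u;
    for (int i = 0; i < 256; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        buz_table[i] = x;
    }
}

/* BUZ hash: rotate the hash left by 1 bit, then XOR in the table entry
   for the next byte. Call buz_init once before using this. */
uint32_t buz_hash(const char *key)
{
    uint32_t hash = 0;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        uint32_t highorder = hash & 0x80000000u;  /* top bit */
        hash <<= 1;
        hash ^= highorder >> 31;                  /* wrap it around */
        hash ^= buz_table[*p];
    }
    return hash;
}
```

Since every byte contributes a full 32-bit random word, even a one-character key can affect all 32 bits of the result, which addresses the short-key weakness of the CRC variant.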
22
References
Aho, Sethi, and Ullman. Compilers: Principles, Techniques, and Tools, 1986.
Cormen, Leiserson, and Rivest. Introduction to Algorithms, 1990.
Knuth. The Art of Computer Programming, 1973.
Kuenning. Hash Functions, 2003.