Hash Functions Andy Wang Data Structures, Algorithms, and Generic Programming.


1 Hash Functions Andy Wang Data Structures, Algorithms, and Generic Programming

2 Introduction Hash function – Maps keys to integers (buckets) Hash(Key) = Integer – Ideally in a random-like manner Evenly distributed bucket values Even if the input data is not evenly distributed

3 An Example ID Number Generation – Key = your name – Hash(Key) = a number Not a great hash function… – Two people with the same name will have the same number…

4 Simple Hash Functions Assumptions: – K: an unsigned 32-bit integer – M: the number of buckets (the number of entries in a hash table) Goal: – If a bit is changed in K, all bits are equally likely to change for Hash(K)
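The goal on this slide can be checked empirically: flip one input bit and count how many output bits differ. The sketch below is only an illustration; `identity_hash`, `count_bits`, and `bits_changed` are hypothetical names, not part of the slides. A hash that meets the goal should change roughly half of the 32 output bits on average, whereas the identity function always changes exactly one.

```c
#include <stdint.h>

/* Hypothetical stand-in hash for the illustration: Hash(K) = K. */
uint32_t identity_hash(uint32_t k) {
    return k;
}

/* Count the set bits in x by clearing the lowest set bit until none remain. */
int count_bits(uint32_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Flip bit number `bit` (0..31) of k and report how many bits of the
   hash value change as a result. */
int bits_changed(uint32_t (*hash)(uint32_t), uint32_t k, int bit) {
    uint32_t flipped = k ^ ((uint32_t)1 << bit);
    return count_bits(hash(k) ^ hash(flipped));
}
```

Averaging `bits_changed` over many keys and all 32 bit positions gives a rough quality score for any candidate function.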

5 A Simple Hash Function… What if K = M? Hash(K) = K What is wrong? Your student ID = SSN – I can’t use your SSN to post your grades…

6 Another Simple Function If K > M Hash(K) = K % M What is wrong? Suppose M = 4, K = 2, 4, 6, 8 K % M = 2, 0, 2, 0
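A minimal sketch of the modulo hash, assuming K is an unsigned 32-bit integer as stated earlier (the name `hash_mod` is made up for the example). With M = 4 the even keys 2, 4, 6, 8 only ever reach buckets 0 and 2, so half of the table stays empty.

```c
#include <stdint.h>

/* Hash(K) = K % M, where m is the number of buckets (must be non-zero). */
uint32_t hash_mod(uint32_t k, uint32_t m) {
    return k % m;
}
```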

7 Yet Another Simple Function If K > P, P = prime number Hash(K) = K % P Suppose P = 3, K = 2, 4, 6, 8 K % P = 2, 1, 0, 2 More uniform distribution…but still problematic for other cases
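One way to compare a composite and a prime modulus is to count how many distinct buckets a regular key pattern occupies. The helper below is a sketch under assumptions: `distinct_buckets` is a hypothetical name, the keys form an arithmetic sequence, and the bucket count is limited to 64 so that a single bitmask can record which buckets were hit.

```c
#include <stdint.h>

/* Count how many distinct buckets the keys first, first+step, ...
   (count keys in total) occupy under Hash(K) = K % m.
   Sketch only: supports 1 <= m <= 64. */
int distinct_buckets(uint32_t first, uint32_t step, int count, uint32_t m) {
    uint64_t seen = 0;  /* bit b is set once bucket b has been hit */
    int used = 0;
    for (int i = 0; i < count; i++) {
        uint32_t bucket = (first + (uint32_t)i * step) % m;
        uint64_t mask = (uint64_t)1 << bucket;
        if ((seen & mask) == 0) {
            seen |= mask;
            used++;
        }
    }
    return used;
}
```

For the keys 2, 4, 6, 8 this reports 2 of 4 buckets used with m = 4, and all 3 buckets used with m = 3.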

8 More on Prime Numbers K > P1 > P2, where P1 and P2 are prime numbers Hash(K) = (K % P1) % P2 Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10 (K % 5) = 2, 4, 1, 3, 0 (K % 5) % 3 = 2, 1, 1, 0, 0 Still uniform distribution
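The two-prime version is a one-liner; the sketch below (function name `hash_double_mod` is an assumption) reproduces the values on the slide for P1 = 5 and P2 = 3.

```c
#include <stdint.h>

/* Hash(K) = (K % P1) % P2, with p1 > p2 both prime and non-zero. */
uint32_t hash_double_mod(uint32_t k, uint32_t p1, uint32_t p2) {
    return (k % p1) % p2;
}
```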

9 Polynomial Functions If K > P, P = prime number Hash(K) = K(K + 3) % P Slightly better than pure modulo functions
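A sketch of the polynomial hash, with one practical caveat that the slide does not mention: for a 32-bit K the product K(K + 3) can overflow even 64-bit arithmetic, so this version reduces both factors modulo P first. The result is the same value of K(K + 3) % P. The name `hash_poly` is hypothetical.

```c
#include <stdint.h>

/* Hash(K) = K(K + 3) % P, computed without overflow.
   Both factors are reduced mod p (p non-zero, p < 2^32), so their
   product always fits in 64 bits. */
uint32_t hash_poly(uint32_t k, uint32_t p) {
    uint64_t a = k % p;
    uint64_t b = ((uint64_t)k + 3) % p;
    return (uint32_t)((a * b) % p);
}
```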

10 How About… Hash(K) = rand() What is wrong? Not repeatable

11 How About… K > P, P = prime number Hash(K) = rand(K) % P Better randomness Can be expensive to compute random numbers
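The slide's rand(K) reads as "a random number generator seeded with K". The standard C `rand()` keeps global state, so the sketch below substitutes a small stateless scrambler (an xorshift-style mix chosen for the illustration, not something the slides specify). The point is only that the same key always yields the same bucket, unlike a bare rand().

```c
#include <stdint.h>

/* Hypothetical stand-in for rand(K): scramble the seed with a fixed
   sequence of shifts and XORs. Same seed in, same value out. */
uint32_t seeded_random(uint32_t seed) {
    uint32_t x = seed + 0x9E3779B9u;  /* offset so small seeds do not map to small values */
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Hash(K) = rand(K) % P, with p non-zero. */
uint32_t hash_rand(uint32_t k, uint32_t p) {
    return seeded_random(k) % p;
}
```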

12 Pre-generated Randomness Two prime numbers: P1 and P2 K > P1 and K > P2 A table R[P1], with R[i] pre-initialized to rand(i) % P2 Hash(K) = R[K % P1] Slight Problem: Possible duplicate mapping

13 To Avoid Duplicate Mapping… Two prime numbers: P1 and P2 K > P1 and K > P2 A table R[P1], with R[i] pre-initialized to unique random numbers Hash(K) = R[K % P1]
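One possible way to fill the table with unique random numbers is a partial shuffle of the values 0 to P2 - 1, keeping the first P1 of them. This is an assumption about how the table could be built, not the method from the slides; it requires P2 >= P1, and `init_table` and `hash_lookup` are hypothetical names. The small sizes are chosen only to keep the example readable.

```c
#include <stdlib.h>

#define P1 3  /* table size (prime) */
#define P2 5  /* bucket values range over 0..P2-1 (prime, must be >= P1) */

unsigned R[P1];

/* Fill R with P1 distinct values from 0..P2-1 using a partial
   Fisher-Yates shuffle. rand() % n is slightly biased, which is
   acceptable for a sketch. */
void init_table(unsigned seed) {
    unsigned pool[P2];
    for (unsigned i = 0; i < P2; i++) {
        pool[i] = i;
    }
    srand(seed);
    for (unsigned i = 0; i < P1; i++) {
        unsigned j = i + (unsigned)rand() % (P2 - i);  /* pick from the unused tail */
        unsigned tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        R[i] = pool[i];
    }
}

/* Hash(K) = R[K % P1] */
unsigned hash_lookup(unsigned k) {
    return R[k % P1];
}
```

Because every table entry is distinct, two keys collide only when they already share the same K % P1.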

14 An Example K = 0…2^32, P1 = 3, P2 = 5 R[3] = {0, 4, 1} Hash(K) = R[K % 3]
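The example table can be written out directly. In this sketch the identifier `hash_example` is made up; the table contents and the modulus come from the slide.

```c
#include <stdint.h>

/* Pre-generated table from the example: three unique values below 5. */
static const uint32_t R[3] = {0, 4, 1};

/* Hash(K) = R[K % 3] */
uint32_t hash_example(uint32_t k) {
    return R[k % 3];
}
```

Keys 0, 1, 2, 3 map to 0, 4, 1, 0, so the output cycles through the table as K increases.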

15 Hashing a Sequence of Keys K = {K1, K2, …, Kn} E.g., Hash(“test”) = 98157 Design Principles – Use the entire key – Use the ordering information – Use pre-generated randomness

16 Use the Entire Key
    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        /* XOR every character of the NUL-terminated key into the hash */
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ (unsigned char)Key[j];
        }
        return hash;
    }
Problem: Hash(“ab”) == Hash(“ba”)
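A compilable sketch of the XOR-only hash, assuming the key is a NUL-terminated string (the name `hash_xor` is hypothetical). Because XOR is commutative, any reordering of the same characters produces the same value, which is the problem the slide points out.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR all characters of a NUL-terminated key together. */
uint32_t hash_xor(const char *key) {
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        hash ^= (unsigned char)key[j];
    }
    return hash;
}
```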

17 Use the Ordering Information
    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ (unsigned char)Key[j];
            hash = /* hash with some shiftings */;
        }
        return hash;
    }
Problem: H(short keys) will not perturb all 32 bits (clustering)
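The slide leaves the shifting step open. As one illustrative choice (an assumption, not the author's), the sketch below rotates the hash left by 5 bits after each XOR. That is enough to make the result depend on character order, and it also shows the clustering problem: a two-character key never reaches the upper bits. `rotate_left` and `hash_xor_rotate` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate x left by n bits, for n in 1..31. */
uint32_t rotate_left(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* XOR in each character, then rotate so that position matters. */
uint32_t hash_xor_rotate(const char *key) {
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        hash ^= (unsigned char)key[j];
        hash = rotate_left(hash, 5);
    }
    return hash;
}
```

With this choice, "ab" hashes to 100416 and "ba" to 99360; both values fit in 17 bits, so the top 15 bits are untouched.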

18 Use Pre-generated Randomness
    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            /* R maps each character value to a pre-generated random number */
            hash = hash ^ R[(unsigned char)Key[j]];
            hash = /* hash with some shiftings */;
        }
        return hash;
    }

19 CRC Variant Do 5-bit circular shift of hash XOR hash and K[j]
    …
    for (…) {
        highorder = hash & 0xf8000000;    /* save the top 5 bits */
        hash = hash << 5;
        hash = hash ^ (highorder >> 27);  /* wrap them around to the bottom */
        hash = hash ^ K[j];
    }
    …
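The loop on this slide can be completed into a full function as follows. The loop body is taken from the slide; the surrounding function, the NUL-terminated string key, and the name `crc_variant_hash` are assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC variant: 5-bit circular left shift of the hash, then XOR in the
   next character of the key. */
uint32_t crc_variant_hash(const char *key) {
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        uint32_t highorder = hash & 0xf8000000u;  /* top 5 bits */
        hash <<= 5;                               /* shift them out */
        hash ^= highorder >> 27;                  /* and back in at the bottom */
        hash ^= (unsigned char)key[j];
    }
    return hash;
}
```

For example, "ab" hashes to 3138 and "ba" to 3105, so ordering is captured, but both values are tiny compared with the 32-bit range.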

20 CRC Variant
+ For long keys, all 32 bits are exercised
+ More randomness toward lower bits
- Not all bits are changed for short keys

21 BUZ Hash Set up an array R to store precomputed random numbers
    …
    for (…) {
        highorder = hash & 0x80000000;    /* save the top bit */
        hash = hash << 1;
        hash = hash ^ (highorder >> 31);  /* wrap it around to the bottom */
        hash = hash ^ R[K[j]];
    }
    …
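A fuller sketch of the BUZ hash, under stated assumptions: keys are NUL-terminated strings, R has one entry per byte value, and the table is filled by a simple xorshift generator picked for the example (the slide does not say how R is generated). `buz_init` and `buz_hash` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

uint32_t R[256];  /* one precomputed random number per byte value */

/* Fill R from a deterministic xorshift32 sequence so that the hash is
   repeatable across runs. A zero seed is replaced by 1 because
   xorshift would otherwise produce only zeros. */
void buz_init(uint32_t seed) {
    uint32_t x = seed ? seed : 1u;
    for (int i = 0; i < 256; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        R[i] = x;
    }
}

/* BUZ hash: 1-bit circular left shift, then XOR in the table entry for
   the next character. Call buz_init() once before hashing. */
uint32_t buz_hash(const char *key) {
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        uint32_t highorder = hash & 0x80000000u;
        hash <<= 1;
        hash ^= highorder >> 31;
        hash ^= R[(unsigned char)key[j]];
    }
    return hash;
}
```

Since each character contributes a full 32-bit table entry, even a one-character key can affect every bit of the result, which addresses the short-key weakness of the CRC variant.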

22 References Aho, Sethi, and Ullman. Compilers: Principles, Techniques, and Tools, 1986. Cormen, Leiserson, and Rivest. Introduction to Algorithms, 1990. Knuth. The Art of Computer Programming, 1973. Kuenning. Hash Functions, 2003.

