Hash Functions Andy Wang Data Structures, Algorithms, and Generic Programming.

Introduction
– A hash function maps keys to integers (buckets): Hash(Key) = Integer
– Ideally it does so in a random-like manner, producing evenly distributed bucket values even if the input data is not evenly distributed

An Example
– ID number generation: Key = your name, Hash(Key) = a number
– Not a great hash function: two people with the same name will get the same number…

Simple Hash Functions
– Assumptions:
  – K: an unsigned 32-bit integer
  – M: the number of buckets (the number of entries in a hash table)
– Goal: if any single bit of K changes, every bit of Hash(K) should be equally likely to change (ideally, each output bit flips with probability 1/2)

A Simple Hash Function…
– What if the keys range over exactly the M buckets (0 ≤ K < M)? Then Hash(K) = K
– What is wrong? The hash gives the key away: if your student ID were your SSN, I could not use it to post your grades…

Another Simple Function
– If K > M: Hash(K) = K % M
– What is wrong? Suppose M = 4 and K = 2, 4, 6, 8: K % M = 2, 0, 2, 0
– Only buckets 0 and 2 are ever used, because every key shares the factor 2 with M

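Yet Another Simple Function
– If K > P, where P is a prime number: Hash(K) = K % P
– Suppose P = 3 and K = 2, 4, 6, 8: K % P = 2, 1, 0, 2
– More uniform distribution…but still problematic for other cases (e.g., keys that are all multiples of P, such as 3, 6, 9, 12, all land in bucket 0)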

More on Prime Numbers
– K > P1 > P2, where P1 and P2 are prime numbers: Hash(K) = (K % P1) % P2
– Suppose P1 = 5, P2 = 3, and K = 2, 4, 6, 8, 10:
  (K % 5) = 2, 4, 1, 3, 0
  (K % 5) % 3 = 2, 1, 1, 0, 0
– Still a fairly uniform distribution

Polynomial Functions
– If K > P, where P is a prime number: Hash(K) = K(K + 3) % P
– Slightly better than pure modulo functions

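How About…
– Hash(K) = rand()
– What is wrong? Not repeatable: the same key would land in a different bucket each time, so a stored item could never be found again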

How About…
– K > P, where P is a prime number: Hash(K) = rand(K) % P, where rand(K) is a pseudo-random number generated from seed K
– Better randomness
– But generating random numbers can be expensive

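Pre-generated Randomness
– Two prime numbers P1 and P2, with K > P1 and K > P2
– A table R[P1], with each R[i] pre-initialized to rand(i) % P2
– Hash(K) = R[K % P1]
– Slight problem: possible duplicate mapping (two entries R[i] and R[j] may hold the same value, so keys with different K % P1 still collide)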

To Avoid Duplicate Mapping…
– Two prime numbers P1 and P2, with K > P1 and K > P2
– A table R[P1], with R[i] pre-initialized to unique random numbers
– Hash(K) = R[K % P1]

An Example
– K = 0 … 2^32 − 1, P1 = 3, P2 = 5
– R[3] = {0, 4, 1}
– Hash(K) = R[K % 3]

Hashing a Sequence of Keys
– K = {K1, K2, …, Kn}
– E.g., Hash(“test”) = 98157
– Design principles:
  – Use the entire key
  – Use the ordering information
  – Use pre-generated randomness

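Use the Entire Key
    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        /* XOR every character of the key into hash */
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ (unsigned char)Key[j];
        }
        return hash;
    }
Problem: Hash(“ab”) == Hash(“ba”), because XOR ignores the order of the characters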

Use the Ordering Information
    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ (unsigned char)Key[j];
            /* hash = hash with some shiftings, so each position contributes differently */
        }
        return hash;
    }
Problem: for short keys, Hash() will not perturb all 32 bits (clustering)

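Use Pre-generated Randomness
    extern unsigned int R[256];   /* pre-generated random numbers, one per character value */

    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ R[(unsigned char)Key[j]];
            /* hash = hash with some shiftings */
        }
        return hash;
    }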

CRC Variant
– Do a 5-bit circular shift of hash, then XOR hash with K[j]
    …
    for (…) {
        highorder = hash & 0xf8000000;    /* save the top 5 bits */
        hash = hash << 5;
        hash = hash ^ (highorder >> 27);  /* rotate them into the bottom 5 bits */
        hash = hash ^ K[j];
    }
    …

CRC Variant
+ For long keys, all 32 bits are exercised
+ More randomness toward the lower bits
- Not all bits are changed for short keys

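BUZ Hash
– Set up an array R to store precomputed random numbers
    …
    for (…) {
        highorder = hash & 0x80000000;    /* save the top bit */
        hash = hash << 1;
        hash = hash ^ (highorder >> 31);  /* 1-bit circular shift */
        hash = hash ^ R[K[j]];            /* XOR in the random number for this character */
    }
    …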
