Lecture 10 Hashing
Motivation: Set and Map
Goal: An array whose index can be any object.
Example: Dictionary. Dictionary[“hash”] = “a dish of diced or chopped meat and often vegetables…”
Properties:
1. Efficient lookup: we hope lookup takes O(1) time.
2. Space: the space used is within a constant factor of that of a list.
Naïve implementation of a set
Method 1: Maintain a linked list. Problem: Lookup takes O(n) time.
Method 2: Use a large array with a[i] = 1 if i is in the set. Problem: Needs a huge amount of memory (one entry for every possible value of i).
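The two naïve methods can be sketched as follows. This is only an illustration: the class names `LinkedListSet` and `DirectAddressSet` and the parameter `universe_size` are made up here and do not come from the lecture.

```python
class LinkedListSet:
    """Method 1: unsorted singly linked list. Lookup scans the nodes, O(n)."""

    def __init__(self):
        self.head = None  # each node is a (value, next_node) tuple

    def add(self, x):
        if not self.contains(x):
            self.head = (x, self.head)

    def contains(self, x):
        node = self.head
        while node is not None:
            value, next_node = node
            if value == x:
                return True
            node = next_node
        return False


class DirectAddressSet:
    """Method 2: one array entry per possible key in range(universe_size)."""

    def __init__(self, universe_size):
        # a[i] == 1 if and only if i is in the set.
        # Memory grows with the largest possible key, not with the set size.
        self.a = bytearray(universe_size)

    def add(self, i):
        self.a[i] = 1

    def contains(self, i):
        return self.a[i] == 1
```

Storing just the four numbers {3, 10, 3424, 643523} with Method 2 already needs an array with at least 643524 entries, which is the memory problem noted above.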
Hashing
Idea: assign each number a random location in an array.
Example: {3, 10, 3424, 643523}
Store number i in a[f(i)], where f is the hash function.
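Collisions
Problem: we want to add 123, but f(123) = 4 = f(3424). (Collisions are unavoidable by the pigeonhole principle, since there are more possible numbers than locations.)
Solution: 123 and 3424 share this location.
[Figure: array with entries null, 10, 3, 3424, 643523; 123 is stored at the same location as 3424.]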
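A minimal sketch of a table where colliding numbers share a location, assuming each location holds a small list (chaining). The class name `ChainedHashSet`, the table size `m = 5`, and the stand-in hash function `i % m` are illustrative assumptions; the lecture does not specify f, so the locations below need not match the slide.

```python
class ChainedHashSet:
    """Set of integers; numbers with the same hash value share one location."""

    def __init__(self, num_slots, f):
        self.f = f  # hash function mapping a key to a slot index
        self.slots = [[] for _ in range(num_slots)]

    def add(self, x):
        chain = self.slots[self.f(x)]
        if x not in chain:  # scan only the chain at this location
            chain.append(x)

    def contains(self, x):
        return x in self.slots[self.f(x)]


m = 5  # assumed table size, for illustration only
table = ChainedHashSet(m, lambda i: i % m)  # stand-in for f
for key in [3, 10, 3424, 643523, 123]:
    table.add(key)
```

With this stand-in function, 3, 643523 and 123 all land in the same location, and lookups for any of them scan only that one chain.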
Fixed Hash Function
If the hash function is fixed, the hash table can be very slow on some bad inputs.
Example: We can find n numbers $x_1, x_2, \ldots, x_n$ such that $f(x_i) = y$ for some fixed y (always possible by the pigeonhole principle when there are enough possible numbers). Then the hash table degenerates into a linked list.
Solution: Use a family of random hash functions.
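To see the degenerate case concretely, here is a small sketch that builds a bad input for one fixed function. The function `f(i) = i % m` and the sizes `m = 8` and `n = 1000` are assumptions chosen for illustration, not the lecture's hash function.

```python
m = 8     # assumed number of locations
n = 1000  # number of adversarial keys


def f(i):
    """A fixed hash function known to the adversary (illustrative choice)."""
    return i % m


# All multiples of m hash to location 0, so every key collides.
bad_keys = [k * m for k in range(n)]

slots = [[] for _ in range(m)]
for x in bad_keys:
    slots[f(x)].append(x)

longest_chain = max(len(chain) for chain in slots)
```

Here every key ends up in one chain of length n, so a lookup costs as much as scanning a linked list.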
Universal Hash Function
The hash function should be as “random” as possible.
Ideally: choose a random function out of all functions! However, we cannot store a totally random function.
We can use modular arithmetic to construct good hash functions!
Goal: Construct a family of hash functions F such that for any x ≠ y, we have $\Pr_{f \sim F}[f(x) = f(y)] = \frac{1}{n}$, where n is the number of locations in the table.
Recap: Modular Arithmetic
For a prime number p, only consider the numbers {0, 1, 2, 3, …, p-1}.
We can do addition, subtraction, and multiplication the usual way (take mod p at the end).
Inverse: For any integer 0 < x < p, there is an integer 0 < y < p such that $xy \equiv 1 \pmod{p}$.
Example: p = 7, x = 2, then y = 4, since 2 · 4 = 8 ≡ 1 (mod 7).
We write $y = x^{-1}$. The inverse can be computed efficiently.
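One standard way to compute the inverse efficiently is the extended Euclidean algorithm. The lecture does not say which method it has in mind, so the sketch below is one possible choice, and the function name `mod_inverse` is made up for illustration.

```python
def mod_inverse(x, p):
    """Return y with 0 < y < p and x * y ≡ 1 (mod p), for prime p and 0 < x < p."""
    # Invariant: old_r ≡ old_s * x (mod p) and r ≡ s * x (mod p).
    old_r, r = x, p
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    # Now old_r = gcd(x, p) = 1, so old_s * x ≡ 1 (mod p).
    return old_s % p


y = mod_inverse(2, 7)  # the example above: p = 7, x = 2
```

The loop runs O(log p) times, which is why the inverse is cheap even for large primes.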
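Designing the Hash Function
Pick a prime number p and construct a hash family with $p^2$ functions.
For every a, b in {0, 1, 2, …, p-1}, define $f_{a,b}(x) = ax + b \pmod{p}$.
Claim: For every x, y in {0, 1, 2, …, p-1} with x ≠ y and any two numbers u, v in {0, 1, 2, …, p-1}, we have $\Pr_{a,b}[f_{a,b}(x) = u,\ f_{a,b}(y) = v] = \frac{1}{p^2}$, where a and b are chosen uniformly at random.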
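For a small prime, the claim can be checked by brute force: enumerate all $p^2$ pairs (a, b) and count how many send x to u and y to v. This is a sanity check rather than a proof, and the helper names `f`, `count_pairs` and `count_collisions` are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def f(a, b, x, p):
    """The hash function f_{a,b}(x) = a*x + b (mod p)."""
    return (a * x + b) % p


def count_pairs(p, x, y, u, v):
    """Number of (a, b) in {0..p-1}^2 with f_{a,b}(x) = u and f_{a,b}(y) = v."""
    return sum(
        1
        for a, b in product(range(p), repeat=2)
        if f(a, b, x, p) == u and f(a, b, y, p) == v
    )


def count_collisions(p, x, y):
    """Number of (a, b) in {0..p-1}^2 with f_{a,b}(x) = f_{a,b}(y)."""
    return sum(
        1
        for a, b in product(range(p), repeat=2)
        if f(a, b, x, p) == f(a, b, y, p)
    )


p = 7
# Exactly one (a, b) out of p^2 works for the sample x, y, u, v below.
prob = Fraction(count_pairs(p, x=2, y=5, u=3, v=6), p * p)
# Collisions between two distinct inputs happen for p of the p^2 functions.
collision_prob = Fraction(count_collisions(p, x=2, y=5), p * p)
```

The count is exactly one because the two equations ax + b ≡ u and ay + b ≡ v (mod p) determine a and b uniquely when x ≠ y, which uses the inverse of x − y modulo p. Summing over the p choices with u = v gives a collision probability of 1/p.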