Amortized Analysis of Rehashing
What is rehashing?
- Problem: when the hash table is too full, we spend a lot of time looking in buckets.
- Solution: rehash.
  - Make a hash table twice the size.
  - For each item in the original hash table, hash it to its location in the bigger table.
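The two rehash steps can be sketched in a few lines of Python. This is only an illustration under assumptions not in the slides: the table is a list of buckets (chaining), slots are chosen with Python's built-in `hash`, and the function name `rehash` is hypothetical.

```python
def rehash(buckets):
    """Return a new bucket list twice as large, with every item re-hashed into it."""
    bigger = [[] for _ in range(2 * len(buckets))]
    for bucket in buckets:
        for key in bucket:
            # The slot depends on the table size, so each key must be re-hashed.
            bigger[hash(key) % len(bigger)].append(key)
    return bigger


# Usage: a 2-slot table holding three integer keys.
small = [[] for _ in range(2)]
for key in (3, 8, 13):
    small[hash(key) % len(small)].append(key)

big = rehash(small)
```

Keys 3 and 13 share a bucket in the 2-slot table but land in different buckets after the rehash, which is the point of growing the table.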
Assumptions for Thursday, Feb. 10, 2000
- Rehash whenever the table is 50% full or more.
- The operations are just a sequence of inserts (the analysis can be generalized to other operations).
- We never get a collision (once we deal with collisions, we are in average-case analysis territory).
- The hash table starts at size 2.
Observations
- How expensive is an insert?
- How expensive is a rehash?
- When will we need to rehash?
Observations
- How expensive is an insert? O(1), assuming no collisions. Say it’s 1.
- How expensive is a rehash? O(N), where N is the number of items currently in the table. Say it’s N.
- When will we need to rehash? Whenever the number of items in the table reaches a power of 2 (1, 2, 4, 8, 16, …).
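A small simulation can confirm when the rehashes happen under these assumptions. It is a sketch of the cost model only (insert costs 1, rehash costs N, table starts at size 2, rehash at 50% full); the name `insert_costs` is made up for illustration and no real hash table is built.

```python
def insert_costs(num_inserts, initial_size=2):
    """Yield (insert_number, cost, table_size_after) under the slides' cost model."""
    size, filled = initial_size, 0
    for i in range(1, num_inserts + 1):
        filled += 1
        cost = 1                      # the insert itself
        if 2 * filled >= size:        # table is 50% full or more
            cost += filled            # rehash every item currently stored
            size *= 2
        yield i, cost, size


# Inserts whose cost exceeds 1 are the ones that triggered a rehash.
rehash_points = [i for i, cost, _ in insert_costs(20) if cost > 1]
```

For 20 inserts, `rehash_points` contains exactly the powers of 2 up to 16.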
Amortized analysis
Strategy 1: add up operations
- Note: I’ve made assumptions about the constants, but the analysis could be done for any constants.
- Note 2: This is sometimes called the “Aggregate Method”.
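One way to carry out the sum, using the constants assumed earlier (an insert costs 1, and a rehash costs N when the item count N reaches a power of 2). Here T(n) is a symbol introduced for this sketch, denoting the total cost of n inserts:

```latex
\begin{align*}
T(n) &= \underbrace{n}_{\text{inserts}}
      + \underbrace{\sum_{j=0}^{\lfloor \log_2 n \rfloor} 2^{j}}_{\text{rehashes}} \\
     &= n + 2^{\lfloor \log_2 n \rfloor + 1} - 1 \\
     &\le n + 2n - 1 \;<\; 3n .
\end{align*}
```

For example, n = 4 gives costs 2 + 3 + 1 + 5 = 11 < 12, so the average cost per insert stays below 3.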
Amortized analysis
Strategy 2: Accounting Method
- Charge the cost of some operations to other operations.
- Each operation gives us some “tokens”, which we can spend on future operations.
Accounting Method analysis
Each insert gives us 3 tokens:
- 1 token for that insert
- 1 token for rehashing this item the first time
- 1 token for rehashing another item that has already been rehashed once or more (since the table doubles at each rehash, the number of items never rehashed equals the number of items already rehashed once or more)
What happens    Tokens added    Tokens used
Insert 1        1A, 1B, 1C      1A
Rehash to 4                     1B (rehash item 1)
Insert 2        2A, 2B, 2C      2A
Rehash to 8                     2B (rehash item 2), 2C (rehash item 1)
Insert 3        3A, 3B, 3C      3A
Insert 4        4A, 4B, 4C      4A
Rehash to 16                    3B (rehash item 3), 4B (rehash item 4), 3C (rehash item 1), 4C (rehash item 2)
Insert 5        5A, 5B, 5C      5A
Insert 6        6A, 6B, 6C      6A
Insert 7        7A, 7B, 7C      7A
Insert 8        8A, 8B, 8C      8A
Rehash to 32                    5B (rehash item 5), 6B (rehash item 6), 7B (rehash item 7), 8B (rehash item 8), 5C (rehash item 1), 6C (rehash item 2), 7C (rehash item 3), 8C (rehash item 4)
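The token bookkeeping can also be checked mechanically. The sketch below tracks only the size of the token bank, not which labelled token pays for which item, and the name `token_balance` is hypothetical. It assumes the same cost model as the slides: 1 token per insert, 1 token per item moved in a rehash.

```python
def token_balance(num_inserts, tokens_per_insert=3, initial_size=2):
    """Return the token bank balance after each insert (and any rehash it triggers)."""
    size, filled, bank = initial_size, 0, 0
    balances = []
    for _ in range(num_inserts):
        bank += tokens_per_insert     # tokens A, B, C for the new item
        bank -= 1                     # token A pays for the insert
        filled += 1
        if 2 * filled >= size:        # 50% full: rehash into a table twice the size
            bank -= filled            # one token per item moved
            size *= 2
        balances.append(bank)
    return balances
```

With 3 tokens per insert the bank never goes negative; it drops back to 1 after every rehash, which corresponds to token 1C, the one token the trace never spends. With only 2 tokens per insert the bank goes negative at the second rehash.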
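Potential function
- A potential function is a more sophisticated version of tokens.
- The potential is a function of the current state of the data structure.
- When an operation costs less than its amortized charge, the potential goes up.
- When an operation costs more, the difference is taken from the potential.
- The potential must never go negative; if it did, the amortized bound would undercount the actual work (we would be taking too long).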
Tokens as a potential function
- Potential function at operation #i: P(i) = 1 + 2F - S/2
  - F = # of filled (non-empty) hash table slots
  - S = total # of hash table slots
- Actual cost of operation #i: C(i)
- Amortized cost of operation #(i+1): CA(i+1) = C(i+1) + ( P(i+1) - P(i) )
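The Math
- Beginning: F=0, S=2, so P(0) = 1 + 0 - 1 = 0.
- For an insert with no rehashing:
  - C(i+1) = 1
  - P(i+1) - P(i) = 2 (one more non-empty slot, so 2F grows by 2)
- For an insert that triggers a rehash of N items:
  - C(i+1) = 1 + N
  - P(i+1) - P(i) = 2 - N (the insert makes F=N with S=2N, which adds 2; the rehash then grows S to 4N, so S/2 grows by N)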
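Both cases can be verified numerically. The sketch below evaluates P = 1 + 2F - S/2 before and after each insert and adds the difference to the actual cost; the function names are illustrative, and the cost model (1 per insert, N per rehash of N items) is the one assumed throughout.

```python
def potential(filled, size):
    """P = 1 + 2F - S/2; the starting state (F=0, S=2) gives P = 0."""
    return 1 + 2 * filled - size // 2   # size is always even, so // is exact


def amortized_costs(num_inserts, initial_size=2):
    """Return actual cost plus change in potential for each insert."""
    size, filled = initial_size, 0
    result = []
    for _ in range(num_inserts):
        before = potential(filled, size)
        filled += 1
        cost = 1                        # the insert itself
        if 2 * filled >= size:          # 50% full: rehash all items, double the table
            cost += filled
            size *= 2
        result.append(cost + potential(filled, size) - before)
    return result
```

Every entry of `amortized_costs(n)` comes out as 3, whether or not that insert triggered a rehash.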
Potential method analysis
- Is P(i) never negative? Yes. We rehash when F ≥ S/2, and right after the rehash S = 4F, so P = 1 + 2F - 2F = 1. Etc…
- The amortized cost is constant: CA(i+1) = C(i+1) + P(i+1) - P(i) = 3 in both cases (previous slide).
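One possible way to finish the argument, written out as a sketch rather than as the slide's own derivation. It uses only the definitions above, with C_A standing for CA:

```latex
\begin{align*}
\text{right after a rehash:}\quad
  & S = 4F \;\Rightarrow\; P = 1 + 2F - \tfrac{4F}{2} = 1, \\
\text{each later insert without a rehash:}\quad
  & P \text{ increases by } 2, \\
\text{hence:}\quad
  & P(0) = 0 \quad\text{and}\quad P(i) \ge 1 \text{ for all } i \ge 1, \\
\sum_{i=1}^{n} C(i)
  &= \sum_{i=1}^{n} C_A(i) - \bigl(P(n) - P(0)\bigr)
   \;\le\; \sum_{i=1}^{n} C_A(i) \;=\; 3n .
\end{align*}
```

The last line is the telescoping sum of the amortized-cost definition: because the potential ends no lower than it started, the total actual cost of n inserts is at most 3n.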