COMP 103
Hashing (II), and exam tips
2014-T2 Lecture 33
Marcus Frean
School of Engineering and Computer Science, Victoria University of Wellington
Marcus Frean, Lindsay Groves, Peter Andreae and Thomas Kuehne, VUW
2 RECAP-TODAY
RECAP
- Hashing with “buckets”
TODAY
- Hashing by “probing”
- the exam
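3 Collisions: chaining / buckets
- Store a Set in each cell: the hash value determines which set a value goes into.
[Figure: a table whose cells each hold a set of words: ant, fox, hen, dog, bee, kea, cow, elk, owl, pig, sow, tui, ape, bat, bug, cat, eel, gnu, jay, nit, ray, yak, cod, roe]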
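The bucket idea can be sketched in a few lines of Java. This is a minimal illustration, not the course's implementation: the class name `BucketHashSet`, the fixed number of buckets, and the use of a plain list as each bucket are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: a set of Strings that resolves collisions with buckets (chaining).
class BucketHashSet {
    private final List<List<String>> buckets = new ArrayList<>();

    BucketHashSet(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) buckets.add(new ArrayList<>());
    }

    // The hash value picks the bucket; floorMod keeps negative hash codes in range.
    private List<String> bucketFor(String value) {
        return buckets.get(Math.floorMod(value.hashCode(), buckets.size()));
    }

    // Returns false if the value was already present.
    boolean add(String value) {
        List<String> bucket = bucketFor(value);
        if (bucket.contains(value)) return false;
        bucket.add(value);
        return true;
    }

    boolean contains(String value) { return bucketFor(value).contains(value); }

    boolean remove(String value) { return bucketFor(value).remove(value); }
}
```

Every operation touches only one bucket, so the cost stays low as long as the buckets stay short.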
4 Dealing with Collisions
Two approaches:
- Use a collection at each place (“buckets” or “chaining”)
- Look for an empty place in the hashtable (“probing” or “open addressing”)
[Figure: “2001 – A Space Odyssey” and “Gravity” are each passed through HASH to an index in the table]
5 Linear Probing
- The hash value tells us where to start looking: if value.hashCode() → p, start at index p.
- If the cell is used, try p+1, p+2, p+3, …
- Wrap round to 0 at the end of the array.
[Figure: example using hash = (name[0]+name[1])% with the names Sam, Steve, Stig, Stu, Sven, Sun; hash values shown: (3) (2) (5) (4) (2)]
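The probing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `LinearProbeSet` is hypothetical, the array has a fixed size, and `String.hashCode()` stands in for the slide's example hash function.

```java
// Hypothetical class: add/contains with linear probing (no remove, no resizing).
class LinearProbeSet {
    private final String[] data;
    private int count = 0;

    LinearProbeSet(int capacity) { data = new String[capacity]; }

    // Returns false if the value was already present.
    boolean add(String value) {
        int p = Math.floorMod(value.hashCode(), data.length); // where to start looking
        for (int i = 0; i < data.length; i++) {
            int idx = (p + i) % data.length;                  // p, p+1, p+2, ... wrapping to 0
            if (data[idx] == null) {
                data[idx] = value;
                count++;
                return true;
            }
            if (data[idx].equals(value)) return false;
        }
        throw new IllegalStateException("table is full");
    }

    boolean contains(String value) {
        int p = Math.floorMod(value.hashCode(), data.length);
        for (int i = 0; i < data.length; i++) {
            int idx = (p + i) % data.length;
            if (data[idx] == null) return false;              // an empty cell ends the search
            if (data[idx].equals(value)) return true;
        }
        return false;                                         // probed every cell
    }

    int size() { return count; }
}
```

Note that `contains` relies on an empty cell to stop the search, which is exactly what makes removal tricky later on.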
6 Hash Tables and Load Factor
When is the hash table “full”?
- When the number of items is close to the array size: we may have to probe a large number of cells to find an empty cell ⇒ performance becomes very slow.
- Linear probing is particularly bad!
- Should not let the table get more than 70%-80% full (the maximum “load factor”).
- With a low load factor, the cost is O(1); with a high load factor, it approaches O(N).
[Figure: table containing “eel”, “pig”, “cat”, “bee”, “fox”, “dog”, “owl”, “hen”, “ant”, and “kea”]
7 ensureCapacity
If it is full, double and copy. How do you copy?
- An item's index depends on its hashCode and the array length (division method)!
- It also depends on previous collisions...
⇒ Have to rehash everything!
[Figure: three views of a table containing “eel”, “kea”, “ant”, “cat”, “bee”, “fox”, “dog”]
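A sketch of "double and rehash" might look like the following. The class name `GrowingProbeSet`, the initial capacity of 8, and the 0.7 threshold (taken from the low end of the 70%-80% guideline) are assumptions for illustration only.

```java
// Hypothetical class: a linear-probing set that doubles and rehashes when it gets too full.
class GrowingProbeSet {
    private static final double MAX_LOAD = 0.7; // assumed maximum load factor
    private String[] data = new String[8];      // assumed initial capacity
    private int count = 0;

    boolean add(String value) {
        ensureCapacity();
        if (!insert(data, value)) return false;
        count++;
        return true;
    }

    boolean contains(String value) {
        int p = Math.floorMod(value.hashCode(), data.length);
        for (int i = 0; i < data.length; i++) {
            String cell = data[(p + i) % data.length];
            if (cell == null) return false;
            if (cell.equals(value)) return true;
        }
        return false;
    }

    // Linear-probe insert into the given array; returns false if already present.
    private static boolean insert(String[] table, String value) {
        int p = Math.floorMod(value.hashCode(), table.length);
        for (int i = 0; i < table.length; i++) {
            int idx = (p + i) % table.length;
            if (table[idx] == null) { table[idx] = value; return true; }
            if (table[idx].equals(value)) return false;
        }
        throw new IllegalStateException("table is full");
    }

    // If one more item would exceed the load factor, double the array and rehash.
    private void ensureCapacity() {
        if (count + 1 <= MAX_LOAD * data.length) return;
        String[] bigger = new String[data.length * 2];
        for (String value : data) {
            // Copying cell-by-cell would be wrong: each index depends on the array length.
            if (value != null) insert(bigger, value);
        }
        data = bigger;
    }

    int size() { return count; }
    int capacity() { return data.length; }
}
```

Each value is re-inserted from scratch because its position in the new array depends on the new length and on whatever collisions happen there.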
8 Linear Probing: Runs and Clustering
Linear probing is particularly bad:
- Repeated collisions at one index create runs
- Runs → linear performance
- With linear probing, runs join up ⇒ they grow fast: the bigger the run, the faster it grows
- This is called “clustering”
Does it help to increase the step size (p, p+d, p+2d, …)?
[Figure: table containing “eel”, “kea”, “ant”, “cat”, “bee”, “fox”, “dog”; further values: hen, owl, pig, gnu, emu, rat, tui]
9 Quadratic Probing
- Make the sequence of probes have increasing steps, so that runs don’t join up so fast:
  h, h+1, h+4, h+9, h+16, …
  p=h, p+=1, p+=3, p+=5, p+=7, p+=9, …
- In general, quadratic probing uses a quadratic formula: $\text{probe}_i = \text{hash} + a\,i + b\,i^2$ ($b \neq 0$)
- E.g. with a=b=½, the step sizes become 1, 2, 3, … instead of 1, 3, 5, …
[Figure: table containing “eel”, “kea”, “ant”, “cat”, “fox”, “dog”, “hen”, “bee”, “owl”]
10 Quadratic Probing
Another problem, perhaps?
- The sequence might wrap back on itself before checking each cell.
- If we choose a = b = ½, and the length is a power of 2... ⇒ guaranteed not to wrap until it has checked every cell!
- $\text{probe}_i = \text{hash} + \tfrac{1}{2}(i + i^2)$
  ⇒ probes are hash, hash+1, hash+3, hash+6, hash+10, hash+15, …
  ⇒ step sizes are 1, 2, 3, 4, 5, …
[Figure: table containing “eel”, “dog”, “hen”]
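The "checks every cell" claim is easy to test by brute force. The sketch below uses hypothetical names (`TriangularProbe`, `probe`, `distinctCells`) and simply counts how many different cells the first `length` probes reach.

```java
// Hypothetical helper: quadratic probing with a = b = 1/2 (triangular-number offsets).
class TriangularProbe {
    // Index of the i-th probe: hash + (i + i*i)/2, wrapped to the table length.
    // (i + i*i) is always even, so the integer division is exact.
    static int probe(int hash, int i, int length) {
        return Math.floorMod(hash + (i + i * i) / 2, length);
    }

    // Counts how many distinct cells the probes i = 0 .. length-1 visit.
    static int distinctCells(int hash, int length) {
        boolean[] seen = new boolean[length];
        int distinct = 0;
        for (int i = 0; i < length; i++) {
            int idx = probe(hash, i, length);
            if (!seen[idx]) {
                seen[idx] = true;
                distinct++;
            }
        }
        return distinct;
    }
}
```

For a table of length 8 the offsets are 0, 1, 3, 6, 10, 15, 21, 28, which wrap to 0, 1, 3, 6, 2, 7, 5, 4: all eight cells. With length 7 the same offsets wrap to 0, 1, 3, 6, 3, 1, 0, so only four cells are ever reached.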
11 Hash Table with Probing: remove
- Inserted: Stu (2), Sven (5), Sam (4), Steve (2), Sun (4)
- Now remove: Sam (4)
- What’s the problem? contains(Sun) will return false, because the search for Sun starts at index 4 and stops at the now-empty cell!
- To remove, need to leave a marker (not null, not a value!): insert a "tombstone" key instead
public void remove() { throw new UnsupportedOperationException(); }
[Figure: table cells Sam, Steve, Stig, Stu, Sven, Sun]
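One way the tombstone idea could be realised is sketched below. The class name `TombstoneSet`, the sentinel object, and the choice to reuse tombstone cells on `add` are assumptions made for this example, not the course's implementation.

```java
// Hypothetical class: linear probing where remove leaves a tombstone marker.
class TombstoneSet {
    // Sentinel for a removed cell; compared by identity (==), so it can never equal a real value.
    private static final String TOMBSTONE = new String("<removed>");
    private final String[] data;

    TombstoneSet(int capacity) { data = new String[capacity]; }

    private int start(String value) { return Math.floorMod(value.hashCode(), data.length); }

    // Returns the index holding value, or -1. Tombstones are skipped; only null stops the search.
    private int find(String value) {
        int p = start(value);
        for (int i = 0; i < data.length; i++) {
            int idx = (p + i) % data.length;
            String cell = data[idx];
            if (cell == null) return -1;
            if (cell != TOMBSTONE && cell.equals(value)) return idx;
        }
        return -1;
    }

    boolean contains(String value) { return find(value) >= 0; }

    boolean add(String value) {
        int p = start(value);
        int firstFree = -1; // first tombstone or empty cell seen along the probe sequence
        for (int i = 0; i < data.length; i++) {
            int idx = (p + i) % data.length;
            String cell = data[idx];
            if (cell == null) {
                if (firstFree < 0) firstFree = idx;
                break;
            }
            if (cell == TOMBSTONE) {
                if (firstFree < 0) firstFree = idx;
            } else if (cell.equals(value)) {
                return false; // already present
            }
        }
        if (firstFree < 0) throw new IllegalStateException("table is full");
        data[firstFree] = value;
        return true;
    }

    boolean remove(String value) {
        int idx = find(value);
        if (idx < 0) return false;
        data[idx] = TOMBSTONE; // leave a marker so later items in the run stay reachable
        return true;
    }
}
```

As a usage example, "Aa" and "BB" have the same `String.hashCode()` in Java, so they collide in any table; after adding both and removing "Aa", `contains("BB")` still returns true because the search steps over the tombstone.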
12 Iterator?
Iterating through a hash table is not so simple!
- There will be nulls to skip over.
- The order in which items are returned appears random (and may change when the array is doubled!).
- At each call to next(), the Iterator must advance the index to the next non-null cell. Could be slow!...
[Figure: table containing “eel”, “kea”, “ant”, “cat”, “bee”, “fox”, “dog”]
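The null-skipping behaviour can be sketched as below. The class name `TableIterator` is hypothetical, and the sketch assumes a table without tombstones; `remove()` is left to the default in `java.util.Iterator`, which throws `UnsupportedOperationException`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator over the backing array of a probing hash table.
class TableIterator implements Iterator<String> {
    private final String[] data;
    private int index = 0; // always rests on a non-null cell, or at data.length when done

    TableIterator(String[] data) {
        this.data = data;
        advance();
    }

    // Move index forward to the next non-null cell (or past the end).
    private void advance() {
        while (index < data.length && data[index] == null) index++;
    }

    @Override
    public boolean hasNext() { return index < data.length; }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String value = data[index++];
        advance();
        return value;
    }
}
```

The `advance` loop is where the cost hides: in a large, nearly empty table, a single call to `next()` may step over many empty cells.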
13 hashing summary
- hashing gives add/find that is crazily quick
- two ideas: buckets and probing
- with the probing method, removing requires “tombstones”
- when a hashtable is too full, you need to increase its size: this requires rehashing everything
- iterating over a HashSet can be a slow process
14 the COMP103 final exam
- The 4th of November is a Tuesday
- Exam is at 2:30pm, and lasts TWO hours
- You will be distributed over 5 different rooms:
  ABUBAKR - BHIKHU: MYLT101
  BHULA - DEIGHTON: HMLT104
  DEL ROSARIO - LATEGAN: KKLT303
  LAWRENCE - PEREZ: HMLT205
  PHEASE - ZHU: MCLT103
15 preparing for the exam
- the 103 homepage has a link to the “Assessment archive”
  1. Do your best without the answers
  2. Then check against the answers
- Next week: tutor-run help sessions (Jeffrey Wu)
  Monday 20th, 12:30-3pm, in Cotton
  Wednesday 22nd, 12:30-3pm, but in AM
- ALSO, VUW Science Society runs a “cram session” for ECS: Friday 24th, 10am-3pm, in the Memorial Theatre Foyer
- checklist – on the 103 homepage
- friends... assignments... textbook... notes... videos...
17 The Exam
- answer all questions
- manage your time
- Dumb calculators & non-electronic dictionaries are OK
18 doing your best on the day
- Read the question carefully and make sure you know what is being asked.
- Write your answer clearly
- Use extra pages for rough work or for answers
  Cross out what you don’t want marked
  Say where your answer is if not on the same page
- For coding questions:
  There’s more than one way to skin a cat
  If it’s complicated, start with the pseudocode
19 best wishes!