COMP 103 Hashing (II), and exam tips 2014-T2 Lecture 33 Marcus Frean School of Engineering and Computer Science, Victoria University of Wellington


1 COMP 103: Hashing (II), and exam tips
2014-T2, Lecture 33
Marcus Frean, School of Engineering and Computer Science, Victoria University of Wellington
Marcus Frean, Lindsay Groves, Peter Andreae and Thomas Kuehne, VUW

2 RECAP-TODAY
RECAP
• Hashing with “buckets”
TODAY
• Hashing by “probing”
• the exam

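3 Collisions: chaining / buckets
• Store a Set in each cell: hash value → which set
[Figure: a hash table in which each cell holds a set of three-letter animal names: ant, fox, hen, dog, bee, kea, cow, elk, owl, pig, sow, tui, ape, bat, bug, cat, eel, gnu, jay, nit, ray, yak, cod, roe]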
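A minimal sketch of the bucket idea, assuming string keys and a fixed number of cells. The class name BucketHashSet and its methods are hypothetical and are not the lecture's code; each cell holds a small list that is used as a set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hashing with buckets (chaining); not the lecture's own code.
class BucketHashSet {
    private final List<List<String>> buckets = new ArrayList<>();

    BucketHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // The hash value picks the bucket; floorMod keeps the index non-negative.
    private List<String> bucketFor(String item) {
        return buckets.get(Math.floorMod(item.hashCode(), buckets.size()));
    }

    // Adds the item to its bucket unless it is already there.
    boolean add(String item) {
        List<String> bucket = bucketFor(item);
        if (bucket.contains(item)) return false;
        bucket.add(item);
        return true;
    }

    // Only the one bucket chosen by the hash value is searched.
    boolean contains(String item) {
        return bucketFor(item).contains(item);
    }

    public static void main(String[] args) {
        BucketHashSet set = new BucketHashSet(8);
        for (String name : new String[] {"ant", "fox", "hen", "dog", "bee", "kea"}) {
            set.add(name);
        }
        System.out.println(set.contains("kea"));
        System.out.println(set.contains("yak"));
    }
}
```

Colliding items simply share a bucket, so a lookup costs one hash computation plus a search of one (hopefully short) list.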

4 Dealing with Collisions
• Two approaches
  • Use a collection at each place (“buckets” or “chaining”)
  • Look for an empty place in the hashtable (“probing” or “open addressing”)
[Figure: the titles “2001 – A Space Odyssey” and “Gravity” are each passed through HASH to pick a cell in an array indexed from 0 to N]

5 Linear Probing
Hash value tells us where to start looking.
• if value.hashCode() → p
  • start at index p
  • if cell is used, try p+1, p+2, p+3 …
  • wrap round to 0 at the end of the array.
Example: hash = (name[0]+name[1])%7
[Figure: a 7-cell array (indices 0 to 6) holding Sam, Steve, Stig, Stu, Sven and Sun, with hash values shown in brackets]
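The probing rule above can be sketched as follows. This is an illustration under assumptions, not the lecture's code: the class LinearProbingSet is hypothetical, names are assumed to have at least two characters, and the toy hash uses Java character codes, so the indices it produces need not match the numbers drawn on the slide.

```java
// Hypothetical sketch of linear probing with the slide's toy hash function.
class LinearProbingSet {
    private final String[] cells;

    LinearProbingSet(int capacity) {
        cells = new String[capacity];
    }

    // Sum of the first two characters, modulo the array length.
    int hash(String name) {
        return (name.charAt(0) + name.charAt(1)) % cells.length;
    }

    // Returns the index where the name ends up, or -1 if every cell is used.
    int add(String name) {
        int start = hash(name);
        for (int i = 0; i < cells.length; i++) {
            int p = (start + i) % cells.length;      // wrap round to 0 at the end
            if (cells[p] == null) {
                cells[p] = name;
                return p;
            }
            if (cells[p].equals(name)) return p;     // already present
        }
        return -1;
    }

    boolean contains(String name) {
        int start = hash(name);
        for (int i = 0; i < cells.length; i++) {
            int p = (start + i) % cells.length;
            if (cells[p] == null) return false;      // an empty cell ends the search
            if (cells[p].equals(name)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(7);
        for (String name : new String[] {"Sam", "Steve", "Stig", "Stu", "Sven", "Sun"}) {
            System.out.println(name + ": hash " + set.hash(name) + ", stored at " + set.add(name));
        }
    }
}
```

Note that contains stops at the first empty cell, which is exactly why removing an item by setting its cell to null breaks later searches.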

6 Hash Tables and Load Factor
• When is the hashTable “full”?
• When number of items is close to array size: may have to probe a large number of cells to find an empty cell ⇒ performance becomes very slow. Linear probing is particularly bad!
• Should not let table get more than 70% - 80% full (maximum “load factor”)
• With a low load factor, cost is O(1)
• With a high load factor, cost is O(N)
[Figure: a hash table holding the keys eel, pig, cat, bee, fox, dog, owl, hen, ant and kea]

7 ensureCapacity
If it is full, double and copy:
• how do you copy? Index depends on…
  • hashCode and length (division method)!
  • and it depends on previous collisions...
⇒ Have to rehash everything!
[Figure: three array diagrams, each holding the keys eel, kea, ant, cat, bee, fox, dog]
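A sketch of "double and rehash", assuming linear probing, String keys, an initial array of 8 cells and a maximum load factor of 0.7 (taken from the 70% - 80% guideline). The class ResizingProbingSet and its helper names are hypothetical. The point it illustrates: items cannot be copied cell by cell, because each index depends on the array length, so every item is re-inserted into the bigger array.

```java
// Hypothetical sketch: a linear-probing set that doubles and rehashes when too full.
class ResizingProbingSet {
    private static final double MAX_LOAD = 0.7;   // assumed maximum load factor
    private String[] cells = new String[8];       // assumed initial capacity
    private int count = 0;

    int capacity() { return cells.length; }
    int size() { return count; }

    // Division method: the index depends on both the hash code and the array length.
    private static int indexFor(String item, int length) {
        return Math.floorMod(item.hashCode(), length);
    }

    // Linear-probe insert; assumes the table has at least one empty cell.
    private static boolean insert(String[] table, String item) {
        int p = indexFor(item, table.length);
        while (table[p] != null) {
            if (table[p].equals(item)) return false;
            p = (p + 1) % table.length;
        }
        table[p] = item;
        return true;
    }

    boolean add(String item) {
        ensureCapacity();
        boolean added = insert(cells, item);
        if (added) count++;
        return added;
    }

    boolean contains(String item) {
        int p = indexFor(item, cells.length);
        while (cells[p] != null) {
            if (cells[p].equals(item)) return true;
            p = (p + 1) % cells.length;
        }
        return false;
    }

    // If one more item would exceed the load factor, double the array and
    // re-insert (rehash) every item, since each index depends on the length.
    private void ensureCapacity() {
        if (count + 1 <= MAX_LOAD * cells.length) return;
        String[] bigger = new String[cells.length * 2];
        for (String item : cells) {
            if (item != null) insert(bigger, item);
        }
        cells = bigger;
    }
}
```

Because the load factor stays below 1, there is always an empty cell, so the probing loops in insert and contains are guaranteed to stop.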

8 Linear Probing: Runs and Clustering
• Linear probing is particularly bad:
  • Repeated collisions at one index create runs
  • Runs → linear performance
  • With linear probing, runs join up ⇒ they grow fast: the bigger the run, the faster it grows
This is called “clustering”
Does it help to increase step size (p, p+d, p+2d, …)?
[Figure: a table holding eel, kea, ant, cat, bee, fox, dog, with further keys hen, owl, pig, gnu, emu, rat, tui]

9 Quadratic Probing
• Make the sequence of probes have increasing steps:
  • runs don’t join up so fast
  • h, h+1, h+4, h+9, h+16, …
  • p=h, p+=1, p+=3, p+=5, p+=7, p+=9, …
• In general, quadratic probing uses a quadratic formula:
  probe_i = hash + a·i + b·i²  (b ≠ 0)
• E.g. with a=b=½, the step sizes become 1,2,3… instead of 1,3,5…
[Figure: a table holding eel, kea, ant, cat, fox, dog, hen, bee, owl]

10 Quadratic Probing
Another problem, perhaps?
• sequence might wrap back on itself before checking each cell
• If we choose a = b = ½, and length is a power of 2...
  ⇒ guaranteed not to wrap until it has checked every cell!
probe_i = hash + ½(i + i²)
⇒ probes are hash, hash+1, hash+3, hash+6, hash+10, hash+15, ...
⇒ step sizes are 1, 2, 3, 4, 5, …
[Figure: a table holding eel, dog, hen]
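The claim about a = b = ½ can be checked by brute force for small tables. The sketch below is an empirical check, not a proof; the class TriangularProbeDemo and its method names are hypothetical, and the hash value is assumed to be non-negative.

```java
// Hypothetical demo: which cells does probe_i = hash + (i + i*i)/2 visit?
class TriangularProbeDemo {
    // Index of the i-th probe, wrapped to the table length.
    static int probe(int hash, int i, int length) {
        return (hash + (i + i * i) / 2) % length;
    }

    // Number of distinct cells visited by the first `length` probes.
    static int distinctCells(int hash, int length) {
        boolean[] seen = new boolean[length];
        int distinct = 0;
        for (int i = 0; i < length; i++) {
            int p = probe(hash, i, length);
            if (!seen[p]) {
                seen[p] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        for (int length : new int[] {7, 8, 16}) {
            System.out.println("length " + length + ": "
                    + distinctCells(3, length) + " distinct cells");
        }
    }
}
```

For a table of length 8 the first 8 probes reach all 8 cells, whereas for length 7 the same sequence reaches only 4 cells before repeating.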

11 Hash Table with Probing: remove
• Inserted: Stu (2), Sven (5), Sam (4), Steve (2), Sun (4)
• Now remove: Sam (4)
• What’s the problem?
  • contains(Sun) will return false! (The search for Sun starts at index 4, finds the emptied cell, and stops.)
• To remove, need to leave a marker (not null, not a value!)
  • insert a "tombstone" key instead
Code on slide: public void remove() { throw new UnsupportedOperationException(); }
[Figure: a 7-cell array (indices 0 to 6) holding Sam, Steve, Stig, Stu, Sven and Sun]
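A sketch of removal with a tombstone, assuming linear probing in a 7-cell table. The class TombstoneProbingSet is hypothetical, and slideHash simply hard-codes the hash values listed on the slide (Stu 2, Sven 5, Sam 4, Steve 2, Sun 4) so that the scenario can be replayed; it is not a real hash function.

```java
// Hypothetical sketch: linear probing where remove leaves a tombstone marker.
class TombstoneProbingSet {
    // Marker for a removed cell: not null, and never equal to a real key.
    private static final Object TOMBSTONE = new Object();
    private final Object[] cells = new Object[7];

    // Hash values copied from the slide; other names fall back to hashCode.
    static int slideHash(String name) {
        switch (name) {
            case "Stu": case "Steve": return 2;
            case "Sam": case "Sun": return 4;
            case "Sven": return 5;
            default: return Math.floorMod(name.hashCode(), 7);
        }
    }

    // Index of the name, or -1. Tombstones are skipped; only null ends the search.
    private int find(String name) {
        int start = slideHash(name);
        for (int i = 0; i < cells.length; i++) {
            int p = (start + i) % cells.length;
            if (cells[p] == null) return -1;
            if (cells[p] != TOMBSTONE && cells[p].equals(name)) return p;
        }
        return -1;
    }

    boolean contains(String name) { return find(name) >= 0; }

    // Adds the name, reusing the first tombstone or empty cell on its probe path.
    boolean add(String name) {
        int firstFree = -1;
        int start = slideHash(name);
        for (int i = 0; i < cells.length; i++) {
            int p = (start + i) % cells.length;
            if (cells[p] == null) {
                if (firstFree < 0) firstFree = p;
                break;
            }
            if (cells[p] == TOMBSTONE) {
                if (firstFree < 0) firstFree = p;
            } else if (cells[p].equals(name)) {
                return false;                      // already present
            }
        }
        if (firstFree < 0) return false;           // table full; a real table would resize
        cells[firstFree] = name;
        return true;
    }

    // Replaces the item with a tombstone so later searches keep probing past it.
    boolean remove(String name) {
        int p = find(name);
        if (p < 0) return false;
        cells[p] = TOMBSTONE;
        return true;
    }
}
```

After inserting Stu, Sven, Sam, Steve and Sun in that order and then removing Sam, contains("Sun") still succeeds, because the search passes over the tombstone at index 4 instead of stopping there.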

12 Iterator?
• Iterating through hash table is not so simple!
  • there will be nulls to skip over
  • the order that items are returned appears random (and may change when the array is doubled!)
• At each call to next(), Iterator must advance the index to the next non-null cell. Could be slow!...
[Figure: a table holding eel, kea, ant, cat, bee, fox, dog]
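A sketch of such an iterator, assuming the table is a plain String array in which unused cells are null (tombstones are ignored here). The class name CellIterator is hypothetical. It does not override remove(), so Java's default Iterator.remove() applies, which throws UnsupportedOperationException.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: iterate over the non-null cells of a hash table's array.
class CellIterator implements Iterator<String> {
    private final String[] cells;
    private int index = 0;

    CellIterator(String[] cells) {
        this.cells = cells;
        advance();                       // position on the first non-null cell
    }

    // Move index forward to the next non-null cell, or to the end of the array.
    private void advance() {
        while (index < cells.length && cells[index] == null) {
            index++;
        }
    }

    @Override
    public boolean hasNext() {
        return index < cells.length;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String item = cells[index++];
        advance();                       // skipping nulls here is the potentially slow part
        return item;
    }

    public static void main(String[] args) {
        String[] cells = {null, "eel", null, null, "kea", "ant", null};
        Iterator<String> it = new CellIterator(cells);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Items come out in array order, which depends on hash values and collisions rather than on insertion order.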

13 hashing summary
• hashing gives add/find that is crazily quick
• two ideas: buckets and probing
• with the probing method, removing requires “tombstones”
• when a hashtable is too full, you need to increase its size: this requires rehashing everything
• iterating over a HashSet can be a slow process

14 the COMP103 final exam
The 4th of November is a Tuesday
Exam is at 2:30pm, and lasts TWO hours
You will be distributed over 5 different rooms:
• ABUBAKR - BHIKHU: MYLT101
• BHULA - DEIGHTON: HMLT104
• DEL ROSARIO - LATEGAN: KKLT303
• LAWRENCE - PEREZ: HMLT205
• PHEASE - ZHU: MCLT103

15 preparing for the exam
• the 103 homepage has link to “Assessment archive”: http://ecs.victoria.ac.nz/Main/ExamArchiveCOMP103
  1. Do your best without the answers
  2. Then check against the answers
• Next week: tutor-run help sessions (Jeffrey Wu)
  1. Monday 20th, 12:30-3pm, in Cotton 228.
  2. Wednesday 22nd, 12:30-3pm, but in AM101.
  3. ALSO, VUW Science Society runs “cram session” for ECS: Friday 24th, 10am-3pm, in the Memorial Theatre Foyer
• checklist: on the 103 homepage
• friends... assignments... textbook... notes... videos...


17 The Exam
• answer all questions
• manage your time
• Dumb calculators & non-electronic dictionaries are OK

18 doing your best on the day
• Read the question carefully and make sure you know what is being asked.
• Write your answer clearly
• Use extra pages for rough work or for answers
  • Cross out what you don’t want marked
  • Say where your answer is if not on same page
• For coding questions:
  • There’s more than one way to skin a cat
  • If it’s complicated, start with the pseudocode

19 best wishes!

