Hashing CS 105. Hashing Slide 2 Hashing - Introduction In a dictionary, if it can be arranged such that the key is also the index to the array that stores.

1 Hashing CS 105

2 Hashing - Introduction. In a dictionary, if the key can also serve as the index into the array that stores the entries, searching for and inserting items is very fast. Example: Empdata[1000], where index = employee ID number. To search for the employee with emp. number = 500, return Empdata[500]. Running time: O(1).
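The Empdata idea can be sketched in Java as follows. This is a minimal illustration, not code from the deck: the class name DirectAddressDemo, the method names, and the choice to store only a name per employee are all assumptions.

```java
// Hypothetical sketch of direct addressing: the employee ID is the array index.
class DirectAddressDemo {
    // One slot per possible ID (0..999), mirroring Empdata[1000] on the slide.
    static final String[] empData = new String[1000];

    // Insert is a single array write: O(1).
    static void insert(int id, String name) {
        empData[id] = name;
    }

    // Find is a single array read: O(1). A null result means no such employee.
    static String find(int id) {
        return empData[id];
    }

    public static void main(String[] args) {
        insert(500, "Ada");
        System.out.println(find(500));
    }
}
```

The sketch only works because every ID fits inside the array bounds, which is exactly the limitation the later slides address.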

3 Hash table. A hash table is a data structure, implemented as an array of objects, in which the search keys correspond to array indexes. Insert and find operations involve straightforward array accesses: O(1) time complexity.

4 About hash tables. The first example was relatively easy because the employee number is an integer that can be used directly as an index. Problem 1: the possible integer key values might be too large, so creating an array that covers all of them might be impractical. We need to map large integer values to smaller array indexes. Problem 2: what if the key is a word made up of letters of the English alphabet (e.g. a last name)? We need to map names to integers (indexes).

5 Large numbers -> small numbers. A hash function converts a number from a large range into a number from a smaller range (the range of array indices). Size of the array, rule of thumb: the array size should be about twice the size of the data set (2s for a data set of s items); for 50,000 words, use an array of 100,000 elements.

6 Hash function and modulo. The simplest hash function uses the modulo operation (%), which returns the remainder of a division; for example, 33 % 10 = 3. General formula: LargeNumber % Smallrange.
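A minimal Java sketch of the modulo hash function follows. The class and method names are hypothetical. One assumption goes beyond the slide: Java's % operator can return a negative result when the key is negative, so the sketch uses Math.floorMod, which always yields a value in 0..tableSize-1 for a positive table size.

```java
// Hypothetical sketch: map a large integer key to an index in 0..tableSize-1.
class ModuloHash {
    static int hash(int key, int tableSize) {
        // Same as key % tableSize for non-negative keys;
        // floorMod also keeps the result non-negative for negative keys.
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(33, 10)); // the slide's example, 33 % 10
    }
}
```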

7 Hash functions for names. Sum of Digits Method: map the letters A-Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.) and add up the values of the letters in the word. For example, "cats" (c=3, a=1, t=20, s=19) gives 3+1+20+19 = 43, so "cats" will be stored at index 43. Use the modulo operation (%) if you need to map to a smaller array.
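The letter-sum method might look like this in Java. The names LetterSumHash, letterSum, and hash are illustrative, and the decisions to treat upper and lower case alike and to skip any character outside a-z are assumptions the slide does not specify.

```java
// Hypothetical sketch of the letter-sum hash: a=1, b=2, ..., z=26.
class LetterSumHash {
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1; // 'a' -> 1, 'b' -> 2, ..., 'z' -> 26
            }
            // Assumption: characters outside a-z contribute nothing.
        }
        return sum;
    }

    // Reduce the sum to a valid index when the array is smaller than the sum.
    static int hash(String word, int tableSize) {
        return letterSum(word) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("cats")); // 3 + 1 + 20 + 19
    }
}
```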

8 Collisions. Problem: too many words map to the same index. "was", "tin", "give", "tend", "moan", "tick" and several other words also add up to 43. These are called collisions (cases where two different search keys hash to the same index value). Collisions can occur even when dealing with integers. Suppose the size of the hash table is 100: keys 158 and 358 hash to the same value (58) when using the modulo hash function.
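To see collisions concretely, the sketch below groups a list of words by their letter-sum index. It is self-contained and purely illustrative: CollisionDemo, letterSum, and groupByIndex are hypothetical names, and the sketch assumes lowercase a-z input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: find which words collide under the letter-sum hash.
class CollisionDemo {
    // Assumes lowercase a-z input: a=1, ..., z=26.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    // Maps each index to the list of words that hash to it.
    static Map<Integer, List<String>> groupByIndex(String[] words) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String w : words) {
            buckets.computeIfAbsent(letterSum(w), k -> new ArrayList<>()).add(w);
        }
        return buckets;
    }

    public static void main(String[] args) {
        String[] words = {"cats", "was", "tin", "give", "tend", "moan", "tick"};
        // Every word listed on the slide lands in the same bucket.
        System.out.println(groupByIndex(words));
    }
}
```

Any bucket holding more than one word is a collision that the table must resolve somehow.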

9 Collision resolution policy. We need to know what to do when a collision occurs; i.e., during an insert operation, what if the array slot is already occupied? Most common policy: go to the next available slot, "wrapping around" to the start of the array if necessary. Consequence: when searching, apply the hash function, but first check whether the element in that slot is the one you are looking for. If not, try the next slots. How do you know when the element is not in the array?

10 Probe sequence. The probe sequence is the sequence of indexes of the array slots that a key value may map to. The first index in the probe sequence is the home position, the value of the hash function. The next indexes are the alternative slots. Example: suppose the array size is 10 and the hash function is h(K) = K % 10. The probe sequence for K = 25 is: 5, 6, 7, 8, 9, 0, 1, 2, 3, 4. Here, we assume the most common collision resolution policy of going to the next slot: p(K,i) = i. Goal: the probe sequence should exhaust the array slots.
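A short Java sketch that generates the probe sequence for the next-slot policy, p(K,i) = i, is given below. The class and method names are hypothetical, and Math.floorMod is used for the home position only so that negative keys would also work, which the slide does not discuss.

```java
import java.util.Arrays;

// Hypothetical sketch: list every slot visited for a key, in probe order.
class ProbeSequence {
    static int[] linearProbeSequence(int key, int m) {
        int home = Math.floorMod(key, m); // home position h(K) = K % M
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) {
            seq[i] = (home + i) % m;      // p(K, i) = i, wrapping around
        }
        return seq;
    }

    public static void main(String[] args) {
        // Slide example: array size 10, key 25.
        System.out.println(Arrays.toString(linearProbeSequence(25, 10)));
    }
}
```

Because i runs from 0 to M-1, the sequence visits each slot exactly once, which meets the stated goal of exhausting the array.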

11 Recap: hash table operations. M is the array size.
Insert object Obj with key value K:
  home <- h(K)
  for i <- 0 to M-1 do
    pos <- (home + p(K,i)) % M
    if HT[pos] is null then
      HT[pos] <- Obj
      return
    else if HT[pos].getKey() = K then
      throw exception "error: duplicate record" // alternative: overwrite
  throw exception "error: table is full"
Finding an object with key value K:
  home <- h(K)
  for i <- 0 to M-1 do
    pos <- (home + p(K,i)) % M
    if HT[pos] is null then
      throw exception "not found"
    else if HT[pos].getKey() = K then
      return HT[pos]
  throw exception "not found"
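The recap can be turned into a small runnable Java class. This is a sketch under stated assumptions rather than the deck's implementation: LinearProbingTable is a hypothetical name, keys are ints, values are Strings, h(K) = K mod M, p(K,i) = i, and the exception types are choices made here.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch of insert and find with next-slot (linear) probing.
class LinearProbingTable {
    private final Integer[] keys;  // null marks an empty slot
    private final String[] values;
    private final int m;           // array size M

    LinearProbingTable(int m) {
        this.m = m;
        this.keys = new Integer[m];
        this.values = new String[m];
    }

    void insert(int key, String value) {
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {       // free slot: store and stop
                keys[pos] = key;
                values[pos] = value;
                return;
            }
            if (keys[pos] == key) {        // same key already stored
                throw new IllegalArgumentException("duplicate record: " + key);
            }
        }
        throw new IllegalStateException("table is full");
    }

    String find(int key) {
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {       // empty slot ends the search
                break;
            }
            if (keys[pos] == key) {
                return values[pos];
            }
        }
        throw new NoSuchElementException("not found: " + key);
    }
}
```

For example, in a table of size 10, inserting keys 25 and then 35 places them in slots 5 and 6, and find(35) reaches slot 6 on its second probe. The null check comes before the key comparison so that an empty slot is never dereferenced.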

12 Hash table operations. Note: although insert and find run in O(1) time under typical conditions, the worst-case time complexity is O(n). Something to think about: characterize the worst-case scenarios for insert and find.

13 Removing elements. Removing an element from a hash table during a delete operation poses a problem. If we set the corresponding hash table entry to null, then subsequent find operations might not work properly. Recall that in the find algorithm, seeing a null means the target element is not found, but in fact the element might be in a later slot. Solution: a tombstone. Arrange it so that deleted entries seem null when inserting but do not seem null when searching. This requires a simple flag on the objects stored.
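One way to sketch tombstones in Java is shown below. Several details are assumptions made for the illustration: the class name TombstoneTable, int keys with String values, a separate per-slot state array in place of a flag on the stored objects, and a find that returns null when the key is absent.

```java
import java.util.Arrays;

// Hypothetical sketch: linear probing with tombstones for deleted slots.
class TombstoneTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final State[] state;
    private final int[] keys;
    private final String[] values;
    private final int m;

    TombstoneTable(int m) {
        this.m = m;
        this.state = new State[m];
        Arrays.fill(state, State.EMPTY);
        this.keys = new int[m];
        this.values = new String[m];
    }

    // Returns the slot holding key, or -1. Tombstones do NOT stop the search.
    private int slotOf(int key) {
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (state[pos] == State.EMPTY) return -1; // never-used slot: stop
            if (state[pos] == State.OCCUPIED && keys[pos] == key) return pos;
        }
        return -1;
    }

    void insert(int key, String value) {
        if (slotOf(key) >= 0) throw new IllegalArgumentException("duplicate record: " + key);
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (state[pos] != State.OCCUPIED) { // EMPTY or DELETED: reusable
                state[pos] = State.OCCUPIED;
                keys[pos] = key;
                values[pos] = value;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    String find(int key) {
        int pos = slotOf(key);
        return pos < 0 ? null : values[pos];
    }

    boolean remove(int key) {
        int pos = slotOf(key);
        if (pos < 0) return false;
        state[pos] = State.DELETED; // leave a tombstone instead of emptying
        values[pos] = null;
        return true;
    }
}
```

With a table of size 10 holding keys 25 and 35 (slots 5 and 6), removing 25 leaves a tombstone in slot 5, so find(35) still probes past it to slot 6, and a later insert of 45 can reuse slot 5.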

14 Hash tables in Java. Class: java.util.Hashtable. Important methods of the Hashtable class: put(Object key, Object entry), Object get(Object key), remove(Object key), boolean containsKey(Object key).
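A brief usage sketch of java.util.Hashtable with the four methods listed follows. The slide shows Object-typed signatures; the sketch uses the generic form Hashtable<K,V>. The class name HashtableDemo, the helper sampleTable, and the sample data are made up for illustration.

```java
import java.util.Hashtable;

// Hypothetical usage sketch of put, get, remove, and containsKey.
class HashtableDemo {
    static Hashtable<Integer, String> sampleTable() {
        Hashtable<Integer, String> employees = new Hashtable<>();
        employees.put(500, "Ada");   // key = employee number, entry = name
        employees.put(158, "Grace");
        return employees;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> employees = sampleTable();
        System.out.println(employees.get(500));         // entry stored under 500
        System.out.println(employees.containsKey(358)); // false: never inserted
        employees.remove(158);
        System.out.println(employees.containsKey(158)); // false after removal
    }
}
```

Note that get returns null for a missing key, and Hashtable rejects null keys and null values.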

15 Summary. Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations under typical conditions. Caveat: O(n) in the worst case because of the possibility of collisions. A hash table requires a hash function (which maps keys to array indices) and a collision resolution policy. A probe sequence is the sequence of array slots that an object may occupy, given its key. In Java: use the Hashtable class.

