Published by Alexandra Butler. Modified over 9 years ago.
1
Hashing CS 110: Data Structures and Algorithms First Semester, 2010-2011
2
Hashing ► In a dictionary, if it can be arranged such that the key is also the index to the array that stores the entries, searching and inserting items would be very fast ► Example: empdata[1000] index = employee ID number ► Search for employee with ID number 500 ► return empdata[500] ► Running Time: O(1)
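The employee example above can be sketched in Java as follows. This is a minimal illustration, assuming IDs fall in the range 0 to 999 and that each entry is just a name; the class and method names are hypothetical and not from the slides.

```java
// Direct addressing: the employee ID is used directly as the array index.
// Assumes valid IDs are 0..999; entries are plain names for illustration.
class DirectAddressTable {
    private final String[] empdata = new String[1000];

    // O(1): one array write.
    void insert(int id, String name) {
        empdata[id] = name;
    }

    // O(1): one array read. Returns null if no entry was stored for this ID.
    String find(int id) {
        return empdata[id];
    }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable();
        table.insert(500, "Reyes");
        System.out.println(table.find(500));
    }
}
```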
3
Hash Table ► A data structure implemented as an array of objects, where the search keys correspond to the array indices ► Insert and find operations involve straightforward array accesses: O(1) time complexity
4
About Hash Tables ► In the first example shown, it was relatively easy since employee number is an integer ► A few problems may arise in different situations
5
About Hash Tables ► Problem 1: possible integer key values might be too large; creating an array big enough to hold every possible key might be impractical ► Need to map large integer values to smaller array indices ► Problem 2: What if the key is a word written in the English alphabet (e.g. last names)? ► Need to map names to integers (indices)
6
Large Values to Small Values ► Hash function: converts a number from a large range into a number from a smaller range (the range of array indices) ► Size of the array ► Rule of thumb: the array size should be about twice the size of the data set ► For 50,000 words, use an array of 100,000 elements
7
Hash Function and Modulo ► Simplest Hash Function: achieved by using the modulo function (returns the remainder) ► For example, 33 % 10 = 3 ► General Formula: LargeNumber % SmallRange
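The modulo formula above could look like this in Java. This is a sketch; the class name is hypothetical. One detail not covered in the slides: Java's `%` operator returns a negative remainder for negative operands, so this version uses `Math.floorMod` to keep the result inside the range of array indices.

```java
// Simplest hash function: LargeNumber % SmallRange.
class ModuloHash {
    // Maps any long key to an index in 0..tableSize-1.
    // Math.floorMod is used instead of % so negative keys also
    // produce a valid (non-negative) index.
    static int hash(long key, int tableSize) {
        return (int) Math.floorMod(key, (long) tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(33, 10)); // prints 3
    }
}
```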
8
Hash Functions for Names ► Sum of Digits Method ► Map the letters A to Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.) ► Add up the values of the letters in the word ► For example, “cats” ► c=3, a=1, t=20, s=19, 3+1+20+19=43 ► “cats” will be stored using index 43 ► Use modulo to map to a smaller array
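A possible Java version of the letter-sum method is sketched below. The names are hypothetical, and ignoring characters outside a to z is an assumption made here, since the slides only discuss alphabetic words.

```java
// Letter-sum hash: a=1, b=2, ..., z=26, summed over the word.
class LetterSumHash {
    // Returns the sum of letter values; characters other than
    // a..z / A..Z are skipped (an assumption, not from the slides).
    static int letterSum(String word) {
        int sum = 0;
        for (int i = 0; i < word.length(); i++) {
            char c = Character.toLowerCase(word.charAt(i));
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;
            }
        }
        return sum;
    }

    // Uses modulo to map the sum into a smaller array of size tableSize.
    static int hash(String word, int tableSize) {
        return letterSum(word) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("cats")); // prints 43
    }
}
```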
9
Collisions ► Problem ► Too many words with the same index ► “was”, “tin”, “give”, “tend”, “moan”, “tick” and several other words add to 43 ► These are called collisions: case where two different search keys hash to the same index value
10
Collisions ► Can occur even when dealing with integers ► Suppose the size of the hash table is 100 ► Keys 158 and 358 hash to the same value when using the modulo hash function
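To see the integer collision concretely, the sketch below groups keys by the slot they hash to under h(K) = K % tableSize. The class and method names are hypothetical, the key 42 is an extra sample value, and keys are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Groups keys by home slot so colliding keys end up in the same list.
class CollisionDemo {
    // Assumes non-negative keys, so k % tableSize is a valid index.
    static Map<Integer, List<Integer>> bucketsBySlot(int[] keys, int tableSize) {
        Map<Integer, List<Integer>> slots = new TreeMap<>();
        for (int k : keys) {
            slots.computeIfAbsent(k % tableSize, s -> new ArrayList<>()).add(k);
        }
        return slots;
    }

    public static void main(String[] args) {
        // 158 and 358 both land in slot 58 when the table size is 100.
        System.out.println(bucketsBySlot(new int[] {158, 358, 42}, 100));
        // prints {42=[42], 58=[158, 358]}
    }
}
```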
11
Collision Resolution Policy ► Need to know what to do when a collision occurs; i.e. during an insert operation; What if the array slot is already occupied? ► Most common policy: go to the next available slot ► “wrap around” the array if necessary
12
Collision Resolution Policy ► Consequence: when searching, use the hash function, first check whether the element is the one you are looking for ► If not, try the next slots ► How do you know if the element is not in the array?
13
Probe Sequence ► The sequence of array indices (slots) that are tried, in order, for a given key value ► The first index in the probe sequence is the home position, i.e. the value of the hash function ► The next indices are the alternative slots
14
Probe Sequence ► Suppose the array size is 10, and the hash function is h(K) = K % 10. ► The probe sequence for K=25 is: ► 5,6,7,8,9,0,1,2,3,4 ► Here, we assume the most common collision resolution policy of going to the next slot: p(K,i) = i ► Goal: exhaust array slots
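The probe sequence in this example can be generated with a few lines of Java. This is a sketch assuming h(K) = K % m, p(K,i) = i, and a non-negative key; the names are hypothetical.

```java
// Linear probing: slot i of the sequence is (home + i) % m.
class ProbeSequence {
    // Returns all m slots in the order they would be probed for this key.
    // Assumes key >= 0 and m > 0.
    static int[] linearProbe(int key, int m) {
        int home = key % m;          // home position h(K)
        int[] sequence = new int[m];
        for (int i = 0; i < m; i++) {
            sequence[i] = (home + i) % m;  // p(K,i) = i, wrapping around
        }
        return sequence;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(linearProbe(25, 10)));
        // prints [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
    }
}
```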
15
Hash Table Operations ► Insert object Obj with key value K (M is the array size)
home ← h(K)
for i ← 0 to M-1 do
    pos ← (home + p(K,i)) % M
    if HT[pos] is null then
        HT[pos] ← Obj
        return
    else if HT[pos].getKey() = K then
        throw exception “error” // or overwrite it
throw exception “table full” // every slot in the probe sequence is occupied
16
Hash Table Operations ► Finding an object with key value K (M is the array size)
home ← h(K)
for i ← 0 to M-1 do
    pos ← (home + p(K,i)) % M
    if HT[pos] is null then
        throw exception “not found”
    else if HT[pos].getKey() = K then
        return HT[pos]
throw exception “not found” // all M slots probed without a match
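The insert and find operations can be put together as a small Java class. This is one possible sketch, not the course's reference implementation: it assumes integer keys, String values, h(K) = K mod M, and linear probing (p(K,i) = i). Unlike the pseudocode, find returns null instead of throwing when the key is absent; that is a design choice made here to keep the example short.

```java
// Open-addressing hash table with linear probing (no removal).
class LinearProbingTable {
    private final Integer[] keys;   // null means the slot is empty
    private final String[] values;
    private final int m;            // array size M

    LinearProbingTable(int capacity) {
        m = capacity;
        keys = new Integer[m];
        values = new String[m];
    }

    private int home(int key) {
        return Math.floorMod(key, m);  // h(K), safe for negative keys
    }

    void insert(int key, String value) {
        int home = home(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {           // free slot: store here
                keys[pos] = key;
                values[pos] = value;
                return;
            }
            if (keys[pos] == key) {            // key already present
                throw new IllegalArgumentException("duplicate key: " + key);
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the stored value, or null if the key is not in the table.
    String find(int key) {
        int home = home(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {
                return null;                   // empty slot ends the search
            }
            if (keys[pos] == key) {
                return values[pos];
            }
        }
        return null;                           // probed every slot
    }
}
```

With a table of size 10, inserting keys 158, 358 and 58 places them in slots 8, 9 and 0 respectively, which shows both the collision handling and the wrap-around.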
17
Hash Table Operations ► Although insert and find run in O(1) time under typical conditions, the time complexity in the worst case is O(n) ► Something to think about: characterize the worst-case scenarios for insert and find
18
Removing Elements ► Removing an element from a hash table during a delete operation poses a problem ► If we set the corresponding hash table entry to null, then succeeding find operations might not work properly ► Recall that for the find algorithm, seeing a null means a target element is not found but in fact the element might be in a next slot
19
Removing Elements ► Solution: tombstone ► Arrange it so that deleted entries seem null when inserting, but don’t seem null when searching ► Requires a simple flag on the objects stored
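The tombstone idea might be implemented along these lines. This sketch assumes integer keys stored without values, linear probing, and a separate boolean array as the "simple flag"; all names are hypothetical.

```java
// Linear-probing set of int keys with tombstones for deleted entries.
class TombstoneTable {
    private final Integer[] keys;     // null = slot never used
    private final boolean[] deleted;  // true = tombstone (entry was removed)
    private final int m;

    TombstoneTable(int capacity) {
        m = capacity;
        keys = new Integer[m];
        deleted = new boolean[m];
    }

    // Returns true if the key was added, false if it was already present.
    boolean insert(int key) {
        int home = Math.floorMod(key, m);
        int firstFree = -1;
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {            // truly empty: stop probing
                if (firstFree < 0) firstFree = pos;
                break;
            }
            if (deleted[pos]) {                 // tombstone: reusable slot
                if (firstFree < 0) firstFree = pos;
            } else if (keys[pos] == key) {
                return false;
            }
        }
        if (firstFree < 0) throw new IllegalStateException("table is full");
        keys[firstFree] = key;
        deleted[firstFree] = false;
        return true;
    }

    // Searching skips over tombstones instead of stopping at them.
    boolean contains(int key) {
        return indexOf(key) >= 0;
    }

    // Marks the entry as deleted rather than setting the slot to null.
    boolean remove(int key) {
        int pos = indexOf(key);
        if (pos < 0) return false;
        deleted[pos] = true;
        return true;
    }

    private int indexOf(int key) {
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) return -1;   // only a never-used slot ends the search
            if (!deleted[pos] && keys[pos] == key) return pos;
        }
        return -1;
    }
}
```

With a table of size 10 holding 158 and 358, removing 158 leaves a tombstone in slot 8, so a later search for 358 still continues to slot 9 and finds it.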
20
Hash Tables in Java ► java.util.Hashtable ► Important methods for Hashtable class ► put(Object key, Object entry) ► Object get(Object key) ► remove(Object key) ► boolean containsKey(Object key)
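A short usage sketch of these four methods is shown below. It uses the generic form `Hashtable<Integer, String>`, and the employee IDs and names are made-up sample data. In current Java code `java.util.HashMap` is usually preferred when thread safety is not needed, but the method names used here are the same.

```java
import java.util.Hashtable;

// Demonstrates put, get, containsKey and remove on java.util.Hashtable.
class HashtableDemo {
    // Builds a small table keyed by employee ID (sample data only).
    static Hashtable<Integer, String> sampleTable() {
        Hashtable<Integer, String> employees = new Hashtable<>();
        employees.put(500, "Cruz");
        employees.put(158, "Santos");
        return employees;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> employees = sampleTable();
        System.out.println(employees.get(500));          // prints Cruz
        System.out.println(employees.containsKey(358));  // prints false
        employees.remove(500);
        System.out.println(employees.containsKey(500));  // prints false
    }
}
```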
21
Summary ► Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations ► Caveat: O(n) in the worst-case because of the possibility of collisions
22
Summary ► Requires a hash function (maps keys to array indices) and a collision resolution policy ► Probe sequence depicts a sequence of array slots that an object would occupy, given its key ► In Java: use the Hashtable class