1 What is it? A side order for your eggs? A form of narcotic intake? A combination of the two?
2 Searching: O(n) | O(lg n) | O(n) | O(lg n) balanced, O(n) skewed
3 Can we do better?
- What if we have students in a class that have been assigned ID numbers in the range 0-50?
- Just store each student in his/her student number position in the array:
  [0] Jane  [1] Steve  [2] Paul  [3] Mary  ...
- What is the order of insert, remove, find for such a list?
- Would this be a reasonable way to store student records at a university?
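The array-by-ID idea above can be sketched in a few lines of Python. This is only an illustration: the names `students`, `insert`, `find`, and `remove` are made up here, and IDs are assumed to be integers in 0..50.

```python
# Direct-address table: the ID number itself is the array index.
# Assumption: IDs are integers in the range 0..50, as on the slide.
MAX_ID = 50
students = [None] * (MAX_ID + 1)

def insert(student_id, name):
    students[student_id] = name      # one array write

def find(student_id):
    return students[student_id]      # one array read; None if absent

def remove(student_id):
    students[student_id] = None      # one array write

insert(0, "Jane")
insert(1, "Steve")
insert(2, "Paul")
insert(3, "Mary")
```

Each operation touches exactly one slot, regardless of how many students are stored.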
4
- Can we use the previous method if we want to use SSN as the key field?
- How many positions would our array need?
- How many of those would we actually use?
- What we need is a way to map the range of possible keys (10^9 for SSNs) onto the available array positions.
- For SSNs we could just use the last 4 digits of the SSN to determine where to store the element.
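A minimal sketch of the last-four-digits mapping, assuming SSNs are handled as 9-digit integers. The function name `ssn_to_index` and the sample SSN are hypothetical.

```python
KEY_SPACE = 10 ** 9        # number of possible 9-digit SSNs
TABLE_SIZE = 10_000        # one slot per possible last-4-digit value

def ssn_to_index(ssn):
    """Map an SSN (as an int) to an array position using its last 4 digits."""
    return ssn % TABLE_SIZE

records = [None] * TABLE_SIZE
records[ssn_to_index(123_45_6789)] = "Jane"   # made-up SSN, stored at slot 6789
```

The array needs 10,000 positions instead of 10^9, at the cost that different SSNs can map to the same position.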
5
- This process of mapping a range of keys onto an address space is known as hashing.
- The function (usually more complicated than just key mod 10,000) used to find the hash value for a key is called the hashing function.
- What is the order of insertion and searching using hashing?
- What is the order of remove?
- Is there a problem with this technique?
6 Collisions
- When two keys have the same hash value (they map to the same location) we have what is called a collision.
- When using a hash table we must have some collision resolution method.
- Some collision resolution methods include: linear probing, quadratic probing, separate chaining, double hashing
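To make the definition concrete, here is a small sketch of a collision under the key mod 10,000 function from the earlier slide. Both SSNs are made up for illustration.

```python
def h(ssn):
    return ssn % 10_000          # hash value = last four digits

# Two different (made-up) SSNs that end in the same four digits.
first, second = 123_45_6789, 987_65_6789

# A collision: distinct keys, same hash value.
collision = first != second and h(first) == h(second)
```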
7 Linear Probing
- With linear probing, when a collision occurs, the element is placed in the next available position.
  Hash(89) = 9
  Hash(18) = 8
  Hash(49) = 9
  Hash(58) = 8
  Hash(9) =
- Problem: primary clustering
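A sketch of linear probing on the keys above. It assumes a table of size 10 and Hash(K) = K mod 10, which matches the hash values listed on the slide but is not stated there; `insert_linear` is a name chosen for this example.

```python
M = 10                      # table size (assumed from the slide's hash values)
table = [None] * M

def h(key):
    return key % M

def insert_linear(key):
    """Insert key, stepping forward one slot at a time on collision."""
    if all(slot is not None for slot in table):
        raise RuntimeError("table is full")
    probe = h(key)
    while table[probe] is not None:
        probe = (probe + 1) % M   # next position, wrapping around the end
    table[probe] = key
    return probe

for k in (89, 18, 49, 58, 9):
    insert_linear(k)

# table is now [49, 58, 9, None, None, None, None, None, 18, 89]
```

The five keys end up in one contiguous run (slots 8, 9, 0, 1, 2), which is the primary clustering the slide warns about.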
8 Quadratic Probing
- If the element hashes to location H, the next location we try is H+1^2, then H+2^2, H+3^2, …, H+i^2
  Hash(89) = 9
  Hash(18) = 8
  Hash(49) = 9
  Hash(58) = 8
  Hash(9) =
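The same keys under quadratic probing, as a rough sketch. As before, the table size of 10 and Hash(K) = K mod 10 are assumptions inferred from the listed hash values, and `insert_quadratic` is a hypothetical name.

```python
def insert_quadratic(table, key):
    """Try H, H+1^2, H+2^2, ... (mod table size) until a free slot is found."""
    m = len(table)
    home = key % m
    for i in range(m):
        probe = (home + i * i) % m
        if table[probe] is None:
            table[probe] = key
            return probe
    # Quadratic probing can miss free slots, so this is not the same as "full".
    raise RuntimeError("no free slot found in m probes")

table = [None] * 10
for k in (89, 18, 49, 58, 9):
    insert_quadratic(table, k)

# table is now [49, None, 58, 9, None, None, None, None, 18, 89]
```

Compared with linear probing, 58 and 9 land in slots 2 and 3 rather than extending the run that starts at slot 8.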
9 Double Hashing
Use two hash functions
If M is prime, probing will eventually examine every position in the table

double_hash_insert(K)
    if (table is full) error
    probe = h1(K)
    offset = h2(K)
    while (table[probe] occupied)
        probe = (probe + offset) mod M
    table[probe] = K

Many of the same (dis)advantages as linear probing
Distributes keys more uniformly than linear probing
10 Double Hashing Example
h1(K) = K mod 13
h2(K) = 8 - K mod 8
We want h2 to be an offset to add, so it must never be 0; 8 - K mod 8 is always in the range 1..8.
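The double_hash_insert pseudo-code can be turned into runnable Python using the two functions from this example. The keys inserted below are chosen for illustration only; the slide's own example keys are not available.

```python
M = 13                          # prime table size, from h1(K) = K mod 13

def h1(key):
    return key % M

def h2(key):
    return 8 - key % 8          # always in 1..8, never 0

def double_hash_insert(table, key):
    if all(slot is not None for slot in table):
        raise RuntimeError("table is full")
    probe = h1(key)
    offset = h2(key)
    while table[probe] is not None:
        probe = (probe + offset) % M    # step by the key's own offset
    table[probe] = key
    return probe

table = [None] * M
# Illustrative keys (an assumption, not from the slide).
slots = [double_hash_insert(table, k) for k in (18, 41, 22, 44, 31)]
# slots == [5, 2, 9, 0, 6]
```

Keys 18, 44, and 31 all have h1 = 5, but their offsets differ (4 for 44, 1 for 31), so they follow different probe sequences after the first collision.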
11 Separate Chaining
- Maintain an array of linked lists. The hash function tells us which list to insert an element X into, and which list to search to find X.
  Hash(89) = 9
  Hash(18) = 8
  Hash(49) = 9
  Hash(58) = 8
  Hash(9) =
12 Pseudo-code (chaining implementation)
Has 3 basic methods, plus the constructor, which creates the table of M lists.

insert(K)
    index = h(K)
    insert into table[index]

find(K)
    index = h(K)
    walk down list at table[index], looking for a match
    return what was found (or error)

remove(K)
    index = h(K)
    walk down list at table[index], looking for a match
    remove what was found (or error)
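One possible Python rendering of the chaining pseudo-code. Several things here are assumptions: Python lists stand in for linked lists, h(K) = K mod M, new keys go at the end of the chain, and "error" is modelled as a `KeyError`. The class name is made up.

```python
class ChainedHashTable:
    def __init__(self, m=10):
        self.m = m
        # Constructor creates the table of M (initially empty) chains.
        self.table = [[] for _ in range(m)]

    def _h(self, key):
        return key % self.m

    def insert(self, key):
        self.table[self._h(key)].append(key)

    def find(self, key):
        # Walk down the chain at table[index], looking for a match.
        for item in self.table[self._h(key)]:
            if item == key:
                return item
        raise KeyError(key)

    def remove(self, key):
        chain = self.table[self._h(key)]
        for i, item in enumerate(chain):
            if item == key:
                del chain[i]
                return item
        raise KeyError(key)

t = ChainedHashTable(10)
for k in (89, 18, 49, 58, 9):
    t.insert(k)
# t.table[8] == [18, 58] and t.table[9] == [89, 49, 9]
```

Colliding keys simply share a chain, so no key is ever displaced into another key's slot.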
13 Hash Functions
Need to choose a good hash function
– quick to compute
– distributes keys uniformly throughout the table
How to deal with hashing non-integer keys:
– find some way of turning the keys into integers
– for example, if the keys are phone numbers, remove the hyphen to get an integer
– for a String, add up the ASCII values of the characters of your string
– then use a standard hash function on the integers
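The two conversions mentioned above might look like this in Python. The function names and the sample phone number are hypothetical, and the final step assumes the remainder method as the "standard hash function".

```python
def phone_to_int(phone):
    """Drop the hyphen(s) and read what is left as an integer."""
    return int(phone.replace("-", ""))

def string_to_int(s):
    """Add up the character codes (ASCII values for ASCII text)."""
    return sum(ord(ch) for ch in s)

def h(key_int, m):
    """Assumed standard hash on the resulting integer: remainder mod m."""
    return key_int % m

phone_key = phone_to_int("555-1234")   # made-up number -> 5551234
word_key = string_to_int("abc")        # 97 + 98 + 99 = 294
```

One weakness of summing character values: strings with the same letters in a different order, such as "listen" and "silent", always get the same integer.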
14 Hash Functions
Use the remainder
– h(K) = K mod M
– K is the key, M the size of the table
Need to choose M
M = b^e (bad)
– if M is a power of two, h(K) gives the e least significant bits of K
– all keys with the same ending go to the same place
M prime (good)
– helps ensure uniform distribution
– take a number theory class to understand why
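A quick check of the power-of-two problem, using keys picked for illustration (multiples of 16) and two table sizes, M = 16 = 2^4 and the prime M = 13.

```python
keys = [16, 32, 48, 64]          # illustrative keys sharing the same low 4 bits

# M = 16 = 2^4: K mod 16 is just the 4 least significant bits of K.
power_of_two = [k % 16 for k in keys]        # [0, 0, 0, 0]
low_bits = [k & 0b1111 for k in keys]        # same values via a bit mask

# M = 13 (prime): the same keys spread out.
prime = [k % 13 for k in keys]               # [3, 6, 9, 12]
```

With M = 16 every one of these keys lands in slot 0; with M = 13 they land in four different slots.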
15 Hash Functions (cont.)
Mid-Square
– h(K) = middle digits of K^2
– e.g., table size is a power of 10: h( ) = , h( ) = , h( ) =
– e.g., table size is a power of 2: h(1001) = , h(1011) = , h(1101) =
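A rough sketch of mid-square hashing for a table whose size is a power of 10. The convention used to pick the "middle" digits and the sample key 4567 are assumptions made for this example; other conventions exist.

```python
def mid_square(key, r):
    """Return the middle r decimal digits of key squared.

    Suits a table of size 10**r. Assumes key*key has at least r digits.
    """
    squared = str(key * key)
    start = (len(squared) - r) // 2      # assumed convention for "middle"
    return int(squared[start:start + r])

# 4567^2 = 20857489; the middle two digits are "57".
index = mid_square(4567, 2)
```

For a power-of-two table size the same idea applies to the binary representation of K^2, taking the middle bits instead of the middle decimal digits.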
16 Theoretical Results
Load factor: the average number of keys per array index (number of stored keys divided by the table size M)
Analysis is probabilistic, rather than worst-case
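As a small illustration, the load factor of a chained table can be computed directly from its definition. The table below reuses the keys from the chaining slide with an assumed size of 10 and K mod 10 as the hash; `load_factor` is a hypothetical helper.

```python
def load_factor(table):
    """Average number of keys per array index for a chained table."""
    return sum(len(chain) for chain in table) / len(table)

table = [[] for _ in range(10)]
for k in (89, 18, 49, 58, 9):
    table[k % 10].append(k)

lf = load_factor(table)     # 5 keys / 10 slots = 0.5
```

Note that the average is 0.5 even though the longest chain holds three keys, which is why the analysis is stated in terms of averages rather than the worst case.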
18 Sources: Some information in these slides was taken from the slides that accompany Data Structures and Algorithms in Java by Goodrich & Tamassia, Wiley 2014.