Chapter 5: Hashing. Hash Tables. Mark Allen Weiss: Data Structures and Algorithm Analysis in Java. Lydia Sinapova, Simpson College
Hashing What is Hashing? Direct Access Tables Hash Tables
Hashing - basic idea A mapping between the search keys and indices - efficient searching into an array. Each element is found with one operation only.
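Hashing example Example: 1000 students. If each student has an identification number between 0 and 999, an array of 1000 elements can be indexed directly by that number. If the key is instead the SSN of each student, a 9-digit number, direct indexing would need an array with far more elements than there are students: a great waste of space.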
The Approach Directly referencing records in a table using arithmetic operations on keys to map them onto table addresses. Hash function: function that transforms the search key into a table address.
Direct-Address Tables The most elementary form of hashing. Assumption: a direct one-to-one correspondence between the keys and the numbers 0, 1, …, m-1, where m is not very large. Array A[m]: each position (slot) in the array contains either a reference to a record, or NULL. Cost: the size of the array is determined by the largest key. Not very useful if there are only a few keys.
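A direct-address table can be sketched in a few lines of Java. This is only an illustration under assumptions: the class name DirectAddressTable, the String record type, and the method names are hypothetical and do not come from the slides.

```java
// Sketch of a direct-address table: the key itself is the array index.
class DirectAddressTable {
    private final String[] slots;   // slots[k] holds the record for key k, or null

    DirectAddressTable(int m) {
        slots = new String[m];      // Java initializes every slot to null
    }

    // Store the record in the slot named by the key (key must be in 0 .. m-1).
    void insert(int key, String record) {
        slots[key] = record;
    }

    // Return the record for the key, or null if the slot is empty.
    String search(int key) {
        return slots[key];
    }

    public static void main(String[] args) {
        DirectAddressTable students = new DirectAddressTable(1000);
        students.insert(42, "Ada");
        System.out.println(students.search(42));  // record stored under key 42
        System.out.println(students.search(7));   // slot 7 was never filled
    }
}
```

The array has one slot per possible key, so its size depends on the key range and not on how many records are actually stored.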
Hash Functions Transform the keys into numbers within a predetermined interval to be used as indices in an array (table, hash table) to store the records
Hash Functions – Numerical Keys Keys – numbers If M is the size of the array, then h(key) = key % M. This will map all the keys into numbers within the interval [0 .. (M-1)].
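A minimal sketch of the modular hash function for numerical keys follows. The class name ModHash is hypothetical. One detail is an assumption beyond the slide: Java's % operator can return a negative result for a negative key, so the sketch uses Math.floorMod, which agrees with key % M for non-negative keys and stays within [0 .. (M-1)] otherwise.

```java
// Sketch: h(key) = key mod M for integer keys.
class ModHash {
    // For key >= 0 this equals key % m; floorMod also keeps negative keys in 0 .. m-1.
    static int hash(int key, int m) {
        return Math.floorMod(key, m);
    }

    public static void main(String[] args) {
        int m = 100;                       // table size chosen only for this demo
        int[] keys = {1234, 99, 100, -7};
        for (int key : keys) {
            System.out.println(key + " -> " + hash(key, m));
        }
    }
}
```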
Hash Functions – Character Keys Keys – strings of characters Treat the binary representation of a key as a number, and then apply the hash function
How keys are treated as numbers If each character is represented with m bits, then the string can be treated as a base-$2^m$ number. With 5 bits per character, for example, the string is a base-32 number.
Example A K E Y : 00001 01011 00101 11001 = $1 \cdot 32^3 + 11 \cdot 32^2 + 5 \cdot 32^1 + 25 \cdot 32^0 = 44217$. Each letter is represented by its position in the alphabet. For example, K is the 11th letter, and its representation is 01011 (11 in decimal).
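The encoding can be tried out with a short Java sketch. The names Base32Key, code, and toNumber are hypothetical. The sketch assumes keys made only of the uppercase letters A to Z and at most 12 characters long, so that the value fits in a long.

```java
// Sketch: treat a string of uppercase letters as a base-32 number.
class Base32Key {
    // Letter 'A'..'Z' -> 1..26, its position in the alphabet (5 bits are enough).
    static int code(char c) {
        return c - 'A' + 1;
    }

    // Sum of code * 32^position, where the last character has position 0.
    static long toNumber(String key) {
        long value = 0;
        long weight = 1;                           // 32^0 for the last character
        for (int i = key.length() - 1; i >= 0; i--) {
            value += code(key.charAt(i)) * weight;
            weight *= 32;                          // next character to the left
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println("code('K') = " + code('K'));
        System.out.println("AKEY as a base-32 number = " + toNumber("AKEY"));
    }
}
```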
Long Keys If the keys are very long, the number they represent may overflow. A solution is to apply Horner's method when computing the hash function.
Horner's Method
$a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x^1 + a_0 x^0 = x(x(\dots x(x(a_n x + a_{n-1}) + a_{n-2}) + \dots) + a_1) + a_0$
Example: $4x^5 + 2x^4 + 3x^3 + x^2 + 7x^1 + 9x^0 = x(x(x(x(4x + 2) + 3) + 1) + 7) + 9$
The polynomial can be computed by alternating the multiplication and addition operations.
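The alternation of multiplication and addition can be sketched as a single loop. The class and method names here (Horner, evaluate) are hypothetical, and the coefficient array is assumed to be ordered from the highest degree down to the constant term.

```java
// Sketch: evaluate a polynomial with Horner's method.
class Horner {
    // coeffs = {a_n, a_(n-1), ..., a_1, a_0}
    static long evaluate(long[] coeffs, long x) {
        long result = 0;
        for (long a : coeffs) {
            result = result * x + a;   // one multiplication, then one addition
        }
        return result;
    }

    public static void main(String[] args) {
        long[] coeffs = {4, 2, 3, 1, 7, 9};   // 4x^5 + 2x^4 + 3x^3 + x^2 + 7x + 9
        System.out.println("value at x = 2: " + evaluate(coeffs, 2));
    }
}
```

A polynomial of degree n needs only n multiplications this way, and no power of x is ever computed explicitly.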
Example
V E R Y L O N G K E Y
10110 00101 10010 11001 01100 01111 01110 00111 01011 00101 11001
22 5 18 25 12 15 14 7 11 5 25
$22 \cdot 32^{10} + 5 \cdot 32^9 + 18 \cdot 32^8 + 25 \cdot 32^7 + 12 \cdot 32^6 + 15 \cdot 32^5 + 14 \cdot 32^4 + 7 \cdot 32^3 + 11 \cdot 32^2 + 5 \cdot 32^1 + 25 \cdot 32^0$
Example (continued) V E R Y L O N G K E Y
$22 \cdot 32^{10} + 5 \cdot 32^9 + 18 \cdot 32^8 + 25 \cdot 32^7 + 12 \cdot 32^6 + 15 \cdot 32^5 + 14 \cdot 32^4 + 7 \cdot 32^3 + 11 \cdot 32^2 + 5 \cdot 32^1 + 25 \cdot 32^0$
$= (((((((((22 \cdot 32 + 5) \cdot 32 + 18) \cdot 32 + 25) \cdot 32 + 12) \cdot 32 + 15) \cdot 32 + 14) \cdot 32 + 7) \cdot 32 + 11) \cdot 32 + 5) \cdot 32 + 25$
Compute the hash function by applying the mod operation at each step, thus avoiding overflow.
h0 = (22 · 32 + 5) % M
h1 = (32 · h0 + 18) % M
h2 = (32 · h1 + 25) % M
. . . .
Code
int hash32(char[] name, int tbl_size) {
    int key_length = name.length;
    int h = 0;
    // Horner's method, reduced modulo tbl_size at every step
    for (int i = 0; i < key_length; i++)
        h = (32 * h + name[i]) % tbl_size;
    return h;
}
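One way to check that reducing at every step gives the same result as reducing the full number once is to compare it against an arbitrary-precision computation. The sketch below is an illustration only: the names StepwiseModCheck, hashStepwise, and hashExact are hypothetical, letters are mapped to their alphabet positions 1..26 as in the worked example, and the table size 101 is an arbitrary choice for the demo.

```java
import java.math.BigInteger;

// Sketch: step-by-step mod versus one mod of the full base-32 value.
class StepwiseModCheck {
    // Letter 'A'..'Z' -> 1..26
    static int code(char c) {
        return c - 'A' + 1;
    }

    // Horner's method with a reduction modulo m at every step; h never exceeds m-1.
    static int hashStepwise(String key, int m) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = (32 * h + code(key.charAt(i))) % m;
        }
        return h;
    }

    // Full base-32 value built with BigInteger (no overflow), reduced once at the end.
    static int hashExact(String key, int m) {
        BigInteger value = BigInteger.ZERO;
        BigInteger base = BigInteger.valueOf(32);
        for (int i = 0; i < key.length(); i++) {
            value = value.multiply(base).add(BigInteger.valueOf(code(key.charAt(i))));
        }
        return value.mod(BigInteger.valueOf(m)).intValue();
    }

    public static void main(String[] args) {
        String key = "VERYLONGKEY";
        int m = 101;   // demo table size, an assumption
        System.out.println("step-by-step: " + hashStepwise(key, m));
        System.out.println("exact:        " + hashExact(key, m));
    }
}
```

The two methods agree because (a · b + c) mod M depends only on a mod M, so discarding multiples of M early does not change the final remainder.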
Hash Tables Index - integer between 0 and M-1, generated by a hash function. Initially - blank slots, marked by a sentinel value or a special field in each slot.
Hash Tables Insert - hash function to generate an address Search for a key in the table - the same hash function is used.
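The roles of the sentinel value and the shared hash function can be sketched as follows. This is a deliberately incomplete illustration: the class name SimpleHashTable is hypothetical, null plays the role of the sentinel for a blank slot, and two different keys that hash to the same index are not resolved here; insert simply reports failure in that case.

```java
// Sketch: insert and search both call the same hash function.
class SimpleHashTable {
    private final String[] slots;   // null marks a blank slot (the sentinel value)

    SimpleHashTable(int m) {
        slots = new String[m];      // all slots start out blank
    }

    // Index between 0 and M-1, computed the same way as hash32.
    private int hash(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = (32 * h + key.charAt(i)) % slots.length;
        }
        return h;
    }

    // Returns false if the slot already holds a different key (not handled in this sketch).
    boolean insert(String key) {
        int index = hash(key);
        if (slots[index] != null && !slots[index].equals(key)) {
            return false;
        }
        slots[index] = key;
        return true;
    }

    // Looks only at the slot the hash function points to.
    boolean search(String key) {
        return key.equals(slots[hash(key)]);
    }

    public static void main(String[] args) {
        SimpleHashTable table = new SimpleHashTable(101);
        System.out.println(table.insert("AKEY"));
        System.out.println(table.search("AKEY"));
        System.out.println(table.search("VERYLONGKEY"));
    }
}
```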
Size of the Table Table size M - different from the number of records N. Load factor: N/M. M should be prime to help distribute the keys evenly.
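The effect of the table size can be seen with the multiplier 32 used by hash32. If M is 32, then (32 * h + c) % 32 equals c % 32, so only the last character of the key influences the index. The sketch below compares that with a prime table size; the class name TableSizeDemo, the helper loadFactor, and the sizes 32 and 101 are assumptions chosen for the demonstration.

```java
// Sketch: load factor, and how the choice of M affects hash32.
class TableSizeDemo {
    // Same computation as hash32: Horner's method with mod at each step.
    static int hash32(char[] name, int tblSize) {
        int h = 0;
        for (int i = 0; i < name.length; i++) {
            h = (32 * h + name[i]) % tblSize;
        }
        return h;
    }

    // Load factor N/M: number of records divided by table size.
    static double loadFactor(int n, int m) {
        return (double) n / m;
    }

    public static void main(String[] args) {
        String[] keys = {"AKEY", "VERYLONGKEY"};   // both end in 'Y'
        for (int m : new int[] {32, 101}) {
            for (String key : keys) {
                System.out.println("M = " + m + ", " + key + " -> " + hash32(key.toCharArray(), m));
            }
        }
        System.out.println("load factor for N = 50, M = 101: " + loadFactor(50, 101));
    }
}
```

With M = 32 every key ending in the same character lands in the same slot, while a prime M lets all characters of the key contribute to the index.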