Hashing
Suppose we want to search for a data item in a huge table of data records. How long will it take?
– It depends on the data structure:
– Unsorted linked list: O(N)
– Balanced binary search tree (e.g., AVL tree): O(log₂ N)
– Unsorted array: O(N)
– Sorted array with binary search: O(log₂ N)
Can we do better than O(log₂ N), e.g., O(1)?
– Yes, on average, by using “hash” functions.
Hashing: Basic Idea
Use a hash function to map keys into positions in a hash table.
Ideally:
– If element e has key k and h is the hash function, then e is stored in position h(k) of the table.
– To search for e, compute h(k) to locate the position. If there is no element at that position, the table does not contain e.
Hashing (cont’d)
The general idea behind hashing is to map each data item directly into a position in a table (e.g., an array) using some function h.
– Complex data items must have a “key” in order to be hashed. Then, position = h(key).
– Example: a record with fields Name: Mike, Height: .., Address: … has key “Mike”; since h(“Mike”) = 1, it is stored at position 1 of a table with positions 0 to T-1.
– With perfect hashing, every key is mapped to a different position. What is the implication of this in terms of T, the size of the array?
Some Hash Function Examples
– Division
– Folding
Hash Function: Division
If the keys are integers, then key % T, where T is the table size, is generally a good hash function.
Example
The hash function is h(key) = key % 10. Inserting entries 20, 25, 34, 76, 59, and …
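As a minimal sketch of the division method, the following Java snippet computes key % T for the five entries visible in the example. The class name DivisionHash and the use of Math.floorMod (so that negative keys would also land in 0 to T-1) are assumptions made for illustration, not part of the slides.

```java
class DivisionHash {
    // Division-method hash: maps an integer key to a position in 0..T-1.
    // Math.floorMod is used instead of % so that negative keys stay in range
    // (an assumption; the slides only consider key % T).
    static int hash(int key, int T) {
        return Math.floorMod(key, T);
    }

    public static void main(String[] args) {
        int T = 10;
        int[] keys = {20, 25, 34, 76, 59};
        for (int key : keys) {
            System.out.println(key + " -> position " + hash(key, T));
        }
    }
}
```

With T = 10 the position is simply the last decimal digit of the key, so 20, 25, 34, 76, and 59 go to positions 0, 5, 4, 6, and 9.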
Hash Function: Division (cont’d)
What happens if more than one key hashes to the same position?
– In general, to reduce situations like this, the table size T should be a prime number.
– Methods for handling collisions will be discussed later.
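To see why the choice of T matters, the sketch below counts how many distinct positions a set of keys occupies under h(key) = key % T. The class name TableSizeDemo, the helper distinctPositions, and the sample keys (multiples of 10) are hypothetical choices for illustration.

```java
class TableSizeDemo {
    // Counts how many distinct table positions the keys map to under key % T.
    static int distinctPositions(int[] keys, int T) {
        boolean[] used = new boolean[T];
        int count = 0;
        for (int key : keys) {
            int pos = Math.floorMod(key, T);
            if (!used[pos]) {
                used[pos] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60}; // sample keys sharing the factor 10
        System.out.println("T = 10: " + distinctPositions(keys, 10) + " distinct position(s)");
        System.out.println("T = 11: " + distinctPositions(keys, 11) + " distinct position(s)");
    }
}
```

With T = 10 all six keys collide at position 0, while with the prime T = 11 they spread over six different positions. A prime T does not rule out collisions; it only makes such systematic patterns less likely.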
Hash Function: Folding
The key is divided into several parts, and the parts are combined using a simple operation (e.g., addition).
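One possible way to fold an integer key, sketched here under assumptions, is to split its decimal representation into three-digit groups, add the groups, and reduce the sum modulo T. The class name DigitFolding, the group size of three digits, and the restriction to non-negative keys are illustrative choices, not taken from the slides.

```java
class DigitFolding {
    // Folds a non-negative key: splits it into 3-digit groups (from the right),
    // adds the groups, and reduces the sum modulo the table size T.
    static int hash(long key, int T) {
        long sum = 0;
        while (key > 0) {
            sum += key % 1000; // take the lowest three digits as one part
            key /= 1000;       // drop them and continue with the rest
        }
        return (int) (sum % T);
    }

    public static void main(String[] args) {
        // 123456789 is folded as 789 + 456 + 123 = 1368
        System.out.println(hash(123456789L, 101));
    }
}
```

For the key 123456789 and T = 101, the parts sum to 1368 and the resulting position is 1368 % 101 = 55.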
Hash Function: Folding
If the keys are strings, the hash function is some function of the characters in the string.
– One possibility is to simply add the ASCII values of the characters: h(str) = (sum of str(i) over all i) % T
– Another possibility is to treat the string as a number in some arbitrary base b (b is often chosen to be a prime number).
– Example: h(“ABC”) = (65·b^0 + 66·b^1 + 67·b^2) % T
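The base-b formula can be sketched in Java as follows. The class name PolynomialHash, the running power variable, and the sample values b = 31 and T = 101 are assumptions for illustration; taking the remainder at every step is a common way to keep intermediate values from overflowing and does not change the final result.

```java
class PolynomialHash {
    // Computes (s[0]*b^0 + s[1]*b^1 + s[2]*b^2 + ...) % T.
    // The sum and the current power of b are reduced modulo T at every step
    // so that intermediate values stay small.
    static int hash(String s, int b, int T) {
        long sum = 0;
        long power = 1; // holds b^i % T for the current index i
        for (int i = 0; i < s.length(); i++) {
            sum = (sum + s.charAt(i) * power) % T;
            power = (power * b) % T;
        }
        return (int) sum;
    }

    public static void main(String[] args) {
        System.out.println(hash("ABC", 31, 101));
        System.out.println(hash("CBA", 31, 101));
    }
}
```

Unlike a plain sum of character codes, this function depends on the order of the characters: with b = 31 and T = 101, “ABC” maps to 40 and “CBA” maps to 39.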
Folding Example
    int h(String x, int T) {
        int sum = 0;
        for (int i = 0; i < x.length(); i++) {
            sum += (int) x.charAt(i); // add the character code of each letter
        }
        return sum % T;
    }
– Sums the ASCII values of the letters in the string.
– The ASCII value of “A” is 65, so the sum is in the range 650 to 900 for 10 upper-case letters; good when T is around 100, for example.
– The order of the characters in the string has no effect, so strings made of the same letters in a different order collide.
Collision
What happens if more than one key hashes to the same position? This problem is called a “collision”.
– Solution #1: “open addressing”: try alternative positions in the table until an empty one is found.
– Solution #2: “separate chaining”: all data items that hash to the same position are kept in a linked list, so to find the item we are looking for we have to search that list.
Hash functions should try to achieve uniform coverage of the hash table while minimizing collisions.
Handling Collision: Open Addressing
“Open addressing” resolves collisions by trying alternative cells in the hash table until an empty cell is found. In general, we try cells in the following order:
H_i(key) = (h(key) + f(i)) % T, for i = 0, 1, 2, …, with f(0) = 0
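Example: Handling Collision with Open Addressing
T = 10, after 0, 1, 4, 19, 16, 25, 36, 59, 64, and 81 have been inserted using open addressing. The hash function is h(key) = key % 10 and f(i) = i.
Resulting table (position: key): 0: 0, 1: 1, 2: 59, 3: 81, 4: 4, 5: 25, 6: 16, 7: 36, 8: 64, 9: 19
– For example, 59 hashes to position 9, which already holds 19, so positions 0 and 1 are tried next (both occupied) and 59 ends up in position 2.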
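A minimal sketch of open addressing with f(i) = i (linear probing), using the keys from this example, might look as follows. The class name LinearProbingTable, the use of null for an empty cell, the return value -1 for a full table, and the snapshot helper are assumptions for illustration; deletion and duplicate keys are not handled.

```java
import java.util.Arrays;

class LinearProbingTable {
    private final Integer[] slots; // null marks an empty cell

    LinearProbingTable(int size) {
        slots = new Integer[size];
    }

    // Tries cells (h(key) + i) % T for i = 0, 1, 2, ... until an empty one is found.
    // Returns the position used, or -1 if every cell is occupied.
    int insert(int key) {
        int T = slots.length;
        int home = Math.floorMod(key, T);
        for (int i = 0; i < T; i++) {
            int pos = (home + i) % T;
            if (slots[pos] == null) {
                slots[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty cell means the key is absent.
    boolean contains(int key) {
        int T = slots.length;
        int home = Math.floorMod(key, T);
        for (int i = 0; i < T; i++) {
            Integer stored = slots[(home + i) % T];
            if (stored == null) return false;
            if (stored == key) return true;
        }
        return false;
    }

    // Returns a copy of the table contents for inspection.
    Integer[] snapshot() {
        return slots.clone();
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(10);
        for (int key : new int[] {0, 1, 4, 19, 16, 25, 36, 59, 64, 81}) {
            table.insert(key);
        }
        System.out.println(Arrays.toString(table.snapshot()));
    }
}
```

Inserting the ten keys in the order given fills the table as [0, 1, 59, 81, 4, 25, 16, 36, 64, 19]. Keys such as 59, 64, and 81 land several cells away from their home position, which illustrates how occupied cells tend to cluster under linear probing.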
Handling Collision: Chaining
With “separate chaining”, all data items that hash to the same location are kept in a linked list at that location. So, each entry in the hash table can be a linked list of arbitrary length.
– To find a piece of data, we use the hash function to find the correct list, and then we search the list.
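Chaining Example
T = 10, after 0, 1, 4, 19, 16, 25, 36, 59, 64, and 81 have been inserted, with h(key) = key % 10.
– Positions 1, 4, 6, and 9 each hold a list of two keys: 1 and 81; 4 and 64; 16 and 36; 19 and 59.
– Positions 0 and 5 each hold a single key: 0 and 25.
– Positions 2, 3, 7, and 8 are empty.
– The order of the keys within a list depends on whether new items are added at the head or at the tail of the list.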
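Separate chaining for the same keys can be sketched as below. The class name ChainedHashTable, the choice to append new keys at the tail of a list, the duplicate check on insert, and the chainAt helper are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTable {
    private final List<LinkedList<Integer>> chains = new ArrayList<>();

    ChainedHashTable(int size) {
        for (int i = 0; i < size; i++) {
            chains.add(new LinkedList<>());
        }
    }

    // Selects the list stored at position h(key) = key % T.
    private LinkedList<Integer> chainFor(int key) {
        return chains.get(Math.floorMod(key, chains.size()));
    }

    // Appends the key to its list unless it is already present.
    // Adding at the tail is an assumption; adding at the head works equally well.
    void insert(int key) {
        LinkedList<Integer> chain = chainFor(key);
        if (!chain.contains(Integer.valueOf(key))) {
            chain.addLast(key);
        }
    }

    // Uses the hash function to find the list, then searches the list.
    boolean contains(int key) {
        return chainFor(key).contains(Integer.valueOf(key));
    }

    // Returns a copy of the list at the given position for inspection.
    List<Integer> chainAt(int position) {
        return new ArrayList<>(chains.get(position));
    }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(10);
        for (int key : new int[] {0, 1, 4, 19, 16, 25, 36, 59, 64, 81}) {
            table.insert(key);
        }
        for (int pos = 0; pos < 10; pos++) {
            System.out.println(pos + ": " + table.chainAt(pos));
        }
    }
}
```

A search only walks the list at one position, so its cost depends on the length of that list rather than on the total number of keys in the table.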
Store colliding elements in the same position of the table, so that each position acts as a bucket holding a limited number of elements. When a bucket is full, open addressing can be used.