1 Hashing Alan, Tam Siu Lung
2 Prerequisites List ADT –Linked List Table ADT –Array Mathematics –Modular Arithmetic Computer Organization –ASCII Algorithm –Order Analysis
3 Basic Data Types
Pascal Type | Storage | Operations
Word | A non-negative integer | +, -, *, div, mod
Double | A real number | +, -, *, /, int, frac
Array[1..12] of Boolean | A sequence of 12 bits | y := a[i] (get), a[i] := y (set)
4 Abstract Data Types (ADT) Stack –Can add and remove in LIFO order Queue –Can add and remove in FIFO order Priority Queue –Can add; can remove in largest-first order, so each value v must be comparable.
5 Data Structure An ADT implemented using a data type, e.g. –ArrayList: an array used to implement a List ADT –ArrayHeap: an array used to implement a Heap (which may in turn implement a priority queue)
6 Dictionary ADT Add(k, v) –Add a key-value pair Remove(k) –Remove a key-value pair given the key Search(k) : v –Search for the value given the key A Table ADT differs only in that the key is an integer in a fixed range.
7 Direct Addressing Use the Table ADT The key is the location Efficient: O(1) for all operations Infeasible if the key range is very large or if the key is not numeric Example table: slot 0 = Ant, slot 5 = Boy, slot 99 = Car
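8 Time Complexity (Average Case)
Structure | Add | Remove | Search
Array | O(1) | O(n) | O(n)
Sorted Array | O(n) | O(n) | O(lg n)
Linked List | O(1) | O(n) | O(n)
BST | O(lg n) | O(lg n) | O(lg n)
Hash Table | ~O(1) | ~O(1) | ~O(1)
Note: For sorted array and BST, keys have to be ordered.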
9 Hash Function Hash function: h_m(k) Maps all keys into an integer domain, e.g. 0 to m - 1 E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32) –Alan: –Max: –Man: –On: Note: We won't use such a big m in our programs!
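The CRC32 example can be tried with Python's standard `zlib.crc32`. This is only a sketch: the helper names `crc32_hash` and `bucket` and the table size m = 100 are assumptions made for illustration, not part of the slides.

```python
import zlib


def crc32_hash(key: str) -> int:
    """Hash a string to an unsigned 32-bit integer, i.e. m = 2^32."""
    return zlib.crc32(key.encode("ascii"))


def bucket(key: str, m: int = 100) -> int:
    """Reduce the 32-bit hash to a small table index in 0..m-1."""
    return crc32_hash(key) % m


for name in ["Alan", "Max", "Man", "On"]:
    print(name, crc32_hash(name), bucket(name))
```

Reducing the 32-bit value modulo a small m is one simple way to follow the note about not using such a big m.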
10 Hash Table Use a Table ADT of size m Use h(k) as the table key All operations can then be done as with a Table Two problems remain –Collision: what to do if two different keys k have the same h(k) –How to find a suitable hash function
11 Hash Functions If k is an integer, use h(k) = k mod m More advanced: h(k) = floor(m * frac(k * A)) for some 0 < A < 1 If k is a string, convert it to an integer, e.g. h('Alan') = [ASC('A') * 256^3 + ASC('l') * 256^2 + ASC('a') * 256 + ASC('n')] mod m If k is of another data type, try to combine all features of the type
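The three hash functions on this slide could be sketched as follows. The function names are hypothetical, and the default A = 0.6180339887 (roughly (sqrt(5) - 1) / 2) is an assumed, commonly cited choice; the slide only requires 0 < A < 1.

```python
import math


def hash_div(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def hash_mul(k: int, m: int, a: float = 0.6180339887) -> int:
    """Multiplication method: h(k) = floor(m * frac(k * A))."""
    frac = (k * a) % 1.0          # fractional part of k * A
    return math.floor(m * frac)


def string_to_int(s: str) -> int:
    """Read the string as a base-256 number built from its character codes."""
    n = 0
    for ch in s:
        n = n * 256 + ord(ch)     # ord plays the role of ASC
    return n


def hash_str(s: str, m: int) -> int:
    """Convert the string to an integer, then apply the division method."""
    return string_to_int(s) % m


print(hash_div(123, 100), hash_mul(123, 100), hash_str("Alan", 100))
```

In Pascal the base-256 integer overflows quickly for longer strings, so one would typically take `mod m` inside the loop; Python integers do not overflow, which keeps this sketch short.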
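12 Chaining (a.k.a. Open Hashing) Use a Table of lists instead of a Table of single entries When there are multiple keys k with the same h(k), add each to the list at slot h(k) (usually a linked list) When searching, scan that list; when removing, remove the entry from that list Order: O(length of the list at h(k))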
13 Chaining Samples Table slots 0 to 99; h('Alan') = h('Man') = h('On') = 0, h('Max') = 5 Operations: Add, Search for Max, Remove Man State: slot 0: (Alan, D)
14 Chaining Samples State after adding Max: slot 0: (Alan, D); slot 5: (Max, Z)
15 Chaining Samples State after adding Man: slot 0: (Man, X) -> (Alan, D); slot 5: (Max, Z)
16 Chaining Samples State after adding On: slot 0: (On, Y) -> (Man, X) -> (Alan, D); slot 5: (Max, Z)
17-19 Chaining Samples Search for Max, then Remove Man; the state shown is unchanged: slot 0: (On, Y) -> (Man, X) -> (Alan, D); slot 5: (Max, Z)
20 Chaining Samples State after removing Man: slot 0: (On, Y) -> (Alan, D); slot 5: (Max, Z)
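The chaining sample can be replayed with a minimal sketch. The class name `ChainedHashTable` is hypothetical, Python lists stand in for linked lists, and the hash values are simply looked up from the ones given on the slide. New pairs go to the head of the list, which is the order the sample appears to show.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m, h):
        self.m = m
        self.h = h                              # hash function: key -> 0..m-1
        self.slots = [[] for _ in range(m)]

    def add(self, key, value):
        chain = self.slots[self.h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                        # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.insert(0, (key, value))           # otherwise add at the head

    def search(self, key):
        for k, v in self.slots[self.h(key)]:    # scan only the list at h(key)
            if k == key:
                return v
        return None

    def remove(self, key):
        chain = self.slots[self.h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


# Hash values taken from the sample: Alan, Man and On collide at slot 0.
SAMPLE_HASH = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}

table = ChainedHashTable(100, SAMPLE_HASH.__getitem__)
for key, value in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    table.add(key, value)

print(table.slots[0])       # chain at slot 0, newest first
print(table.search("Max"))
table.remove("Man")
print(table.slots[0])
```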
21 Chaining (Optional) Note that the Table can be a Table of Containers, for any Container supporting Add, Remove and Search. Why not consider other things, say another hash table? A BST?
22 Open Addressing (a.k.a. Closed Hashing) During a collision, find another slot for the entry E.g. if slot h(k) is not empty, try h(k)+1, h(k)+2, etc. Define the probe sequence to be the sequence of slots to try (it should be a permutation of 0, 1, ..., m - 1) Then both add and search try the same sequence, so a search must find the pair before an empty slot is reached How about delete? Search and mark it empty? That would cut the probe sequence short for entries stored after it, so mark the slot as deleted instead Order: O(length of probe sequence)
23 Open Addressing Samples Start: slot 0: (Alan, D); slots 1 to 99: Nil Add Max: slot 0: (Alan, D); slot 5: (Max, Z); others Nil Add Man: slot 0: (Alan, D); slot 1: (Man, X); slot 5: (Max, Z); others Nil
24 Open Addressing Samples Operations shown: Search for Max, Add Man Table (the same in all three panels): slot 0: (Alan, D); slot 1: (Man, X); slot 2: (On, Y); slot 5: (Max, Z); others Nil
25 Open Addressing Samples Operations shown: Search for Max, Delete Man Before: slot 0: (Alan, D); slot 1: (Man, X); slot 2: (On, Y); slot 5: (Max, Z); others Nil After Delete Man: slot 1 is marked Del; all other slots are unchanged
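The open addressing samples can be sketched with linear probing and a "deleted" marker. The class name and the `DELETED` sentinel are hypothetical, and the hash values (Alan, Man and On at 0, Max at 5) are assumed to carry over from the chaining sample.

```python
EMPTY = None
DELETED = object()          # marker for a slot whose entry was removed


class LinearProbingTable:
    """Open addressing with the probe sequence h(k), h(k)+1, ... (mod m)."""

    def __init__(self, m, h):
        self.m, self.h = m, h
        self.slots = [EMPTY] * m

    def _probe(self, key):
        start = self.h(key)
        for i in range(self.m):
            yield (start + i) % self.m

    def add(self, key, value):
        first_free = None
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is EMPTY:
                if first_free is None:
                    first_free = j
                break                           # key cannot be further along
            if entry is DELETED:
                if first_free is None:
                    first_free = j              # reusable, but keep looking
            elif entry[0] == key:
                self.slots[j] = (key, value)    # key already present
                return j
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)
        return first_free

    def find(self, key):
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is EMPTY:
                return None                     # empty slot ends the search
            if entry is not DELETED and entry[0] == key:
                return j
        return None

    def search(self, key):
        j = self.find(key)
        return None if j is None else self.slots[j][1]

    def remove(self, key):
        j = self.find(key)
        if j is None:
            return False
        self.slots[j] = DELETED                 # do not mark it empty
        return True


SAMPLE_HASH = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}
table = LinearProbingTable(100, SAMPLE_HASH.__getitem__)
for key, value in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    print(key, "stored in slot", table.add(key, value))
table.remove("Man")
print(table.search("On"))   # still reachable past the deleted slot
```

If `remove` set the slot back to `EMPTY`, the search for On would stop at slot 1 and miss the entry in slot 2.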
26 Collision Resolution The method outlined above is called linear probing –In general, h(k, i) = (h(k) + c i) mod m –Suffers from primary clustering There is also quadratic probing –In general, h(k, i) = (h(k) + c_1 i^2 + c_2 i) mod m –Still suffers from secondary clustering
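To see what the two formulas produce, the following sketch lists the first m probes for a given initial slot, reducing each probe modulo m so it stays inside the table. The function names and the default constants (c = 1, c_1 = 1, c_2 = 0) are assumptions chosen for illustration.

```python
def linear_probe(h0: int, m: int, c: int = 1) -> list:
    """Slots tried by linear probing: (h0 + c*i) mod m for i = 0..m-1."""
    return [(h0 + c * i) % m for i in range(m)]


def quadratic_probe(h0: int, m: int, c1: int = 1, c2: int = 0) -> list:
    """Slots tried by quadratic probing: (h0 + c1*i^2 + c2*i) mod m."""
    return [(h0 + c1 * i * i + c2 * i) % m for i in range(m)]


print(linear_probe(5, 8))       # [5, 6, 7, 0, 1, 2, 3, 4]
print(quadratic_probe(0, 8))    # [0, 1, 4, 1, 0, 1, 4, 1]
```

With these constants and m = 8 the quadratic sequence visits only slots 0, 1 and 4, so it is not a permutation of all slots; the constants and m have to be chosen with care. In both schemes, two keys with the same h(k) follow exactly the same sequence, which is the source of secondary clustering.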
27 Double Hashing (Optional) h(k, i) = (h(k) + i h'(k)) mod m Note: h'(k) cannot be 0 A meaningful h'(k) should be in [1, m) E.g. h'(k) = 1 + (k mod (m - 1))
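A small sketch of the double hashing probe sequence follows. It assumes integer keys, h(k) = k mod m, and the second hash 1 + (k mod (m - 1)), which is one common choice that always lands in [1, m - 1]; the function name is hypothetical.

```python
def double_hash_probe(k: int, m: int) -> list:
    """Slots tried for key k: (h(k) + i * h'(k)) mod m for i = 0..m-1."""
    h1 = k % m                  # primary hash h(k)
    h2 = 1 + (k % (m - 1))      # step h'(k), never 0
    return [(h1 + i * h2) % m for i in range(m)]


# Keys 3 and 10 collide on the first probe (slot 3) but then diverge.
print(double_hash_probe(3, 7))    # [3, 0, 4, 1, 5, 2, 6]
print(double_hash_probe(10, 7))   # [3, 1, 6, 4, 2, 0, 5]
```

Because m = 7 is prime, every step size in [1, m - 1] is coprime with m, so each sequence is a permutation of all slots.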
28 How good is Hashing? Nearly constant time if the lists are very short (chaining) or the probe sequences are very short (open addressing) So we need –A uniform hash function (your job) –A larger hash table (trade it off against the memory limit)
29 Size too small? (Optional) Create a new hash table and rehash all entries (not useful for OI use) If you use open addressing, you need to rehash to remove the deleted items anyway
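Rehashing an open addressing table might look like the sketch below. It assumes linear probing, integer keys with h(k) = k, `None` for an empty slot and a hypothetical `DELETED` sentinel for removed entries; none of these details come from the slides.

```python
DELETED = object()          # marker left behind by deletions


def rehash(old_slots, new_m, h):
    """Copy live entries into a fresh table of size new_m, dropping markers."""
    live = [e for e in old_slots if e is not None and e is not DELETED]
    if len(live) > new_m:
        raise ValueError("new table is too small")
    new_slots = [None] * new_m
    for key, value in live:
        j = h(key) % new_m
        while new_slots[j] is not None:     # linear probing in the new table
            j = (j + 1) % new_m
        new_slots[j] = (key, value)
    return new_slots


old = [(8, "a"), DELETED, (2, "b"), (10, "c")]      # table of size 4
new = rehash(old, 8, lambda k: k)
print(new)
```

Every entry is re-inserted with the new modulus, so entries may land in different slots, and the deleted markers disappear because they are never copied.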
30 Extensible Hashing (Optional) Use a Table of Ptr (Ptr is like the list in chaining) The table size is m = 2^k Given any uniform hash function h(x), g(x) = last k bits of h(x) Each Ptr points to an array of size r, each element storing an entry The problem: what to do when the array is full
31 Extensible Hashing (Optional) Figure: entries Alan, Man, On; Ben; Max h('Alan') = 0, h('Man') = 4, h('On') = 12, h('Ben') = 5, h('Max') = 5
32 Extensible Hashing (Optional) Figure: entries Alan, Man, On; Ben, Si; Max Add Si where h('Si') = 9, i.e. g('Si') = 01
33 Extensible Hashing (Optional) Figure: entries Alan, On; Ben, Si; Max; Man, Unu Add Unu where h('Unu') = 4, i.e. g('Unu') = 100 The first array will be split according to the h(k) of its entries Still need to chain?
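The rule g = last k bits of h can be checked against the hash values given in the example. This sketch only groups names by their bit suffix; it is not a full extensible hashing implementation, and the helper names `g` and `group` are hypothetical.

```python
# Hash values as given on the slides.
HASHES = {"Alan": 0, "Man": 4, "On": 12, "Ben": 5, "Max": 5, "Si": 9, "Unu": 4}


def g(h: int, k: int) -> int:
    """Last k bits of the hash value h."""
    return h & ((1 << k) - 1)


def group(names, k):
    """Group names by the last k bits of their hash, keyed by a bit string."""
    buckets = {}
    for name in names:
        suffix = format(g(HASHES[name], k), "0{}b".format(k))
        buckets.setdefault(suffix, []).append(name)
    return buckets


print(group(["Alan", "Man", "On", "Unu"], 2))   # all four share suffix 00
print(group(["Alan", "Man", "On", "Unu"], 3))   # split using a third bit
```

Man and Unu both hash to 4, so no number of extra bits can separate them, which is one way to read the closing question about whether chaining is still needed.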