Hashing: Alan, Tam Siu Lung 96397999 99967891.


1 Hashing
Alan, Tam Siu Lung
96397999 Tam@SiuLung.com 99967891

2 Prerequisites
List ADT
– Linked List
Table ADT
– Array
Mathematics
– Modular Arithmetic
Computer Organization
– ASCII
Algorithm
– Order Analysis

3 Basic Data Types
Pascal Type              Storage                  Operations
Word                     A non-negative integer   +, -, *, div, mod
Double                   A real number            +, -, *, /, int, frac
Array[1..12] of Boolean  A sequence of 12 bits    y := a[i] (get), a[i] := y (set)

4 Abstract Data Types (ADT)
Stack
– Can add and remove in LIFO order
Queue
– Can add and remove in FIFO order
Priority Queue
– Can add. Can remove in largest-first order. The values v must be comparable.

5 Data Structure
An ADT, implemented by a Data Type
E.g.
– ArrayList, using an array to implement a List ADT
– ArrayHeap, using an array to implement a Heap (which may in turn implement a PQ)

6 Dictionary ADT
Add(k, v)
– Add a key-value pair
Remove(k)
– Remove a key-value pair given the key
Search(k) : v
– Search for the value given the key
A Table ADT differs only in that the key is an integer in a fixed range.

7 Direct Addressing
Use the Table ADT
The key is the location
Efficient: O(1) for all operations
Infeasible: if the key can range from 1 to 20000000000, or if the key is not numeric...
Example table: 0 Ant, 5 Boy, 99 Car
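As a minimal sketch of direct addressing, the table below stores each value at the index given by its key. The class name `DirectAddressTable` and its method names are illustrative assumptions, not taken from the slides; the sample entries are the ones shown on the slide.

```python
class DirectAddressTable:
    """Table ADT sketch: the integer key is used directly as the array index."""

    def __init__(self, size):
        # One slot per possible key; None marks an unused slot.
        self.slots = [None] * size

    def add(self, key, value):
        self.slots[key] = value      # O(1)

    def remove(self, key):
        self.slots[key] = None       # O(1)

    def search(self, key):
        return self.slots[key]       # O(1)


# The slide's example: keys 0, 5 and 99 in a table with slots 0..99.
table = DirectAddressTable(100)
table.add(0, "Ant")
table.add(5, "Boy")
table.add(99, "Car")
```

The memory cost is one slot per possible key, which is why a key range of 1 to 20000000000 makes this approach infeasible.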

8 Time Complexity
Average Case   Add        Remove     Search
Array          O(1)       O(n)       O(n)
Sorted Array   O(n)       O(n)       O(lg n)
Linked List    O(1)       O(n)       O(n)
BST            O(lg n)    O(lg n)    O(lg n)
Hash Table     ~O(1)      ~O(1)      ~O(1)
Note: For sorted array and BST, keys have to be ordered.

9 Hash Function
Hash Function: h_m(k)
Map all keys into an integer domain, e.g. 0 to m - 1
E.g. CRC32 hashes strings into a 32-bit integer (i.e. m = 2^32)
– Alan: 1598313570
– Max: 3452409927
– Man: 943766770
– On: 2246271074
Note: We won't use such a big m in our programs!
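A small sketch of using CRC32 as a hash function, assuming Python's standard `zlib.crc32` and ASCII-encoded keys. The helper name `crc32_hash` and the table size 100 are illustrative assumptions. The printed CRC values may differ from the slide's if the slides used a different CRC variant or encoding.

```python
import zlib


def crc32_hash(key, m):
    """Hash a string to 0..m-1 by reducing its 32-bit CRC modulo m."""
    return zlib.crc32(key.encode("ascii")) % m


# Print the full 32-bit CRC and the reduced value for a small table.
for name in ["Alan", "Max", "Man", "On"]:
    print(name, zlib.crc32(name.encode("ascii")), crc32_hash(name, 100))
```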

10 Hash Table
Use a Table ADT of size m
Use h(k) as the key
All operations can then be done as with a Table
Solved, except for
– Collision: what to do if two different k have the same h(k)
– How to find a suitable hash function

11 Hash Functions
If k is an integer, use h(k) = k mod m
More advanced: h(k) = floor(m * frac(k * A)) for some 0 < A < 1
If k is a string, convert it to an integer, e.g.
h('Alan') = [ASC('A') * 256^3 + ASC('l') * 256^2 + ASC('a') * 256 + ASC('n')] mod m
If k is another data type, try to combine all features of the type
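The three hash functions on this slide can be sketched as follows. The function names are illustrative, and the default A = 0.6180339887 (approximately (sqrt(5) - 1) / 2) is an assumed, commonly cited choice; the slide only requires 0 < A < 1.

```python
import math


def hash_division(k, m):
    """h(k) = k mod m."""
    return k % m


def hash_multiplication(k, m, A=0.6180339887):
    """h(k) = floor(m * frac(k * A)); A is an assumed constant in (0, 1)."""
    frac = (k * A) % 1.0          # fractional part of k * A
    return math.floor(m * frac)


def string_to_int(s):
    """Read the string as a base-256 number built from its character codes."""
    value = 0
    for ch in s:
        value = value * 256 + ord(ch)
    return value


def hash_string(s, m):
    """Convert the string to an integer, then apply the division method."""
    return string_to_int(s) % m
```

For 'Alan' the integer is 65 * 256^3 + 108 * 256^2 + 97 * 256 + 110 = 1097621870, so with m = 100 the hash is 70.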

12 Chaining (a.k.a. Open Hashing)
Use a Table whose entries are lists of key-value pairs instead
When there are multiple k's with the same h(k), add each to the list (usually a linked list)
When searching or removing, look for the key in the list at h(k)
Order: O(length of the list searched)

13-20 Chaining Samples
Table slots: 0 to 99
h('Alan') = h('Man') = h('On') = 0, h('Max') = 5
Operations: Add (Alan, D), (Max, Z), (Man, X), (On, Y); Search for Max; Remove Man
– Add (Alan, D): slot 0 holds Alan D
– Add (Max, Z): slot 5 holds Max Z
– Add (Man, X): slot 0 holds Man X, Alan D
– Add (On, Y): slot 0 holds On Y, Man X, Alan D
– Search for Max: go to slot 5 and find Max Z
– Remove Man: slot 0 holds On Y, Alan D; slot 5 still holds Max Z
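The chaining samples can be replayed with a short sketch. It uses Python lists in place of linked lists, inserts new pairs at the front of the chain as the slides appear to do, and does not check for duplicate keys. The class name `ChainedHashTable` and the lookup table `SLIDE_HASH` (which hard-codes the slide's hash values) are assumptions for illustration.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m, hash_fn):
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(m)]

    def add(self, key, value):
        # Insert at the front of the chain for h(key).
        self.buckets[self.hash_fn(key)].insert(0, (key, value))

    def search(self, key):
        # Scan only the chain that the key hashes to.
        for k, v in self.buckets[self.hash_fn(key)]:
            if k == key:
                return v
        return None

    def remove(self, key):
        bucket = self.buckets[self.hash_fn(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


# Hash values taken from the slides; a real program would compute them.
SLIDE_HASH = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}

table = ChainedHashTable(100, SLIDE_HASH.__getitem__)
for key, value in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    table.add(key, value)
```

After the four adds, slot 0 holds On, Man, Alan in that order; calling `table.remove("Man")` leaves On and Alan.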

21 Chaining (Optional)
Note that each Table entry can be any Container supporting Add, Remove and Search.
Why not consider other things, say another hash table? A BST?

22 Open Addressing (a.k.a. Closed Hashing)
During a collision, find another slot for the entry
E.g. if slot h(k) is not empty, try h(k)+1, h(k)+2, etc.
Define the probe sequence to be the sequence of slots to try (it should be a permutation of 0, 1, ..., m - 1)
Then both add and search will try the same sequence, so a search must find the pair before an empty slot is reached
How about delete? Search and mark it empty?
Order: O(length of probe sequence)

23 Open Addressing Samples
Start: slot 0 holds Alan D; all other slots (1 to 99) are Nil
– Add Max: slot 5 holds Max Z
– Add Man: slot 0 is taken, so slot 1 holds Man X

24 Open Addressing Samples
Table: 0 Alan D, 1 Man X, 2 On Y, 3 Nil, 4 Nil, 5 Max Z, ..., 99 Nil
– Search for Max: found in slot 5; the table is unchanged
– Add Man: the table is unchanged

25 Open Addressing Samples
Table: 0 Alan D, 1 Man X, 2 On Y, 3 Nil, 4 Nil, 5 Max Z, ..., 99 Nil
– Search for Max: found in slot 5; the table is unchanged
– Delete Man: slot 1 is marked Del instead of Nil, giving 0 Alan D, 1 Del X, 2 On Y, 3 Nil, 4 Nil, 5 Max Z, ..., 99 Nil
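The open addressing samples can be sketched with linear probing and a "deleted" marker. The class name `LinearProbingTable`, the sentinels `EMPTY` and `DELETED`, and the lookup table `SLIDE_HASH` (hard-coding the hash values from the chaining slides) are assumptions for illustration. Adding an existing key overwrites its value, which is one possible reading of the "Add Man" step that leaves the table unchanged.

```python
EMPTY = object()     # slot never used (Nil on the slides)
DELETED = object()   # slot whose entry was removed (Del on the slides)


class LinearProbingTable:
    """Open addressing with the probe sequence h(k), h(k)+1, ... (mod m)."""

    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.slots = [EMPTY] * m

    def _probe(self, key):
        start = self.hash_fn(key)
        for i in range(self.m):
            yield (start + i) % self.m

    def add(self, key, value):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                           # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = idx            # reusable, but keep looking
            elif slot[0] == key:
                self.slots[idx] = (key, value)  # key exists: overwrite
                return
        if first_free is None:
            raise RuntimeError("hash table is full")
        self.slots[first_free] = (key, value)

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None                     # an empty slot ends the search
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def remove(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot[0] == key:
                self.slots[idx] = DELETED       # mark, do not empty
                return True
        return False


SLIDE_HASH = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}
probing = LinearProbingTable(100, SLIDE_HASH.__getitem__)
for key, value in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    probing.add(key, value)
```

Marking slot 1 as deleted rather than empty matters: if `remove("Man")` emptied the slot, a later search for On would stop at slot 1 and miss the entry in slot 2.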

26 Collision Resolution
The method outlined above is called linear probing
– In general, h(k, i) = h(k) + c i
– Forms Primary Clustering
There is also quadratic probing
– In general, h(k, i) = h(k) + c_1 i^2 + c_2 i
– Still forms Secondary Clustering
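The two probe formulas can be compared by listing the slots they visit. This sketch reduces each result modulo m, which the slide's formulas leave implicit; the function names and default constants (c = 1, c_1 = 1, c_2 = 0) are assumptions for illustration.

```python
def linear_probe(home, m, c=1):
    """Slots visited by h(k, i) = (h(k) + c * i) mod m for i = 0..m-1."""
    return [(home + c * i) % m for i in range(m)]


def quadratic_probe(home, m, c1=1, c2=0):
    """Slots visited by h(k, i) = (h(k) + c1 * i^2 + c2 * i) mod m."""
    return [(home + c1 * i * i + c2 * i) % m for i in range(m)]


print(linear_probe(5, 8))      # [5, 6, 7, 0, 1, 2, 3, 4]
print(quadratic_probe(0, 8))   # [0, 1, 4, 1, 0, 1, 4, 1]
```

With these constants and m = 8, the quadratic sequence only ever reaches slots 0, 1 and 4, so it is not a permutation of all slots; the constants and m have to be chosen with care.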

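27 Double Hashing (Optional)
h(k, i) = ( h(k) + i h'(k) ) mod m
Note: h'(k) cannot be 0
A meaningful h'(k) should be in [1, m), and relatively prime to m so that the probe sequence visits every slot
E.g. h'(k) = (m - 1) - (k mod (m - 1)), which lies in [1, m - 1]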
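A sketch of the double hashing probe sequence for integer keys. The choices h(k) = k mod m and h'(k) = 1 + (k mod (m - 1)) are assumptions picked for illustration (the second is a common textbook form that always lies in [1, m - 1]); the function name is hypothetical.

```python
def double_hash_probe(k, m):
    """Slots visited by h(k, i) = (h(k) + i * h'(k)) mod m for i = 0..m-1."""
    h1 = k % m                 # assumed primary hash
    h2 = 1 + (k % (m - 1))     # assumed secondary hash, never 0
    return [(h1 + i * h2) % m for i in range(m)]


# Keys 10 and 17 share the home slot 3 when m = 7, but probe differently.
print(double_hash_probe(10, 7))   # [3, 1, 6, 4, 2, 0, 5]
print(double_hash_probe(17, 7))   # [3, 2, 1, 0, 6, 5, 4]
```

Because m = 7 is prime, every step size in [1, 6] is relatively prime to m, so each sequence is a permutation of all seven slots.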

28 How good is Hashing?
Nearly constant time if the lists are very short or the probing rate is very low
So we need
– A uniform hash function (your job)
– A larger hash table (trade it off against the memory limit)

29 Size too small? (Optional)
Create a new hash table and re-hash all entries (not useful for OI use)
If using open addressing, you need to re-hash to remove the deleted items anyway
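Re-hashing an open addressing table can be sketched as below. The sketch assumes integer keys, h(k) = k mod m, linear probing, `None` for empty slots and a `DELETED` sentinel for removed entries; all of these, and the function name `rehash`, are illustrative assumptions. It also assumes `new_m` is at least the number of live entries, otherwise the inner loop would not terminate.

```python
DELETED = object()   # marker left behind by a removal


def rehash(old_slots, new_m):
    """Build a new table of size new_m, dropping empty and deleted slots."""
    new_slots = [None] * new_m
    for entry in old_slots:
        if entry is None or entry is DELETED:
            continue                      # nothing to carry over
        key, _value = entry
        idx = key % new_m                 # assumed hash: k mod m
        while new_slots[idx] is not None:
            idx = (idx + 1) % new_m       # linear probing in the new table
        new_slots[idx] = entry
    return new_slots


# A size-4 table with one deleted marker, grown to size 8.
old = [(8, "a"), DELETED, (5, "b"), (3, "c")]
new = rehash(old, 8)
```

In the new table the entries sit at slots 0, 5 and 3, and the deleted marker is gone.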

30 Extensible Hashing (Optional)
Use a Table of Ptr (Ptr is like the list in chaining)
The size m = 2^k
Given any uniform hash function h(k), g(k) = last k bits of h(k)
Each Ptr points to an array of size r, each slot storing an entry
The problem: what to do when the array is full

31 Extensible Hashing (Optional)
h('Alan') = 0, h('Man') = 4, h('On') = 12, h('Ben') = 5, h('Max') = 5
Table entries: 00, 01, 10, 11
– 00: Alan, Man, On
– 01: Ben, Max

32 Extensible Hashing (Optional)
Add Si where h('Si') = 9, i.e. g('Si') = 01
Table entries: 00, 01, 10, 11
– 00: Alan, Man, On
– 01: Ben, Si, Max

33 Extensible Hashing (Optional)
Add Unu where h('Unu') = 4, i.e. g('Unu') = 100
Table entries: 000, 001, 010, 011, 100, 101, 110, 111
The first array will be split according to the h(k) of its entries
Arrays shown after the split: Alan, On; Ben, Si, Max; Man, Unu
Still need to chain?
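The function g(k), the last bits of h(k), can be sketched with a bit mask. The helper name `last_bits` and the dictionary layout are assumptions for illustration; the hash values are the ones given on the slides. The sketch only groups keys by table entry and does not implement splitting.

```python
def last_bits(h, bits):
    """Return the lowest `bits` bits of the hash value h."""
    return h & ((1 << bits) - 1)


# Hash values from the slides.
hashes = {"Alan": 0, "Man": 4, "On": 12, "Ben": 5, "Max": 5, "Si": 9}

# Group the keys by their 2-bit table entry.
directory = {}
for name, h in hashes.items():
    entry = format(last_bits(h, 2), "02b")
    directory.setdefault(entry, []).append(name)

print(directory)
print(format(last_bits(4, 3), "03b"))   # table entry for Unu with 3 bits
```

With 2 bits, Alan, Man and On share entry 00 while Ben, Max and Si share entry 01. With 3 bits, h('Unu') = 4 maps to entry 100, as on the last slide.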

