© Love Ekenberg Hashing Love Ekenberg

© Love Ekenberg In General These slides provide an overview of different hashing techniques that are used to store data efficiently. The use of hash tables and hash functions is shown. The three hashing techniques treated here are: separate chaining, linear probing, and double hashing.

© Love Ekenberg Storage The representation of arrays requires too much space. Suppose we want to represent all words in Swedish that have at most 10 letters. Because the Swedish alphabet contains 29 letters, we would have to store up to 29^10 + 29^9 + … + 29 words, i.e., all 10-letter words and all 9-letter words, etc. Each and every word would then be placed in an array. There are about 250,000 words in Swedish, so only a fraction would be meaningful.

© Love Ekenberg Projection 100 million words can be projected into a number of boxes. Problem: There would be some difficulty in addressing the elements. This would become almost as demanding memory-wise as storing the words. Uneven storage may arise in the boxes. Solution: List all the meaningful words in each box. The name of the box then becomes a heading for the list. Distribute the words as randomly as possible in the boxes in order to achieve even storage.

© Love Ekenberg Hash Tables Hash tables attempt to find an appropriate path between memory and efficiency requirements. Headings are set for addressing the boxes. Then each box can be searched sequentially. In the example below, there are M boxes, where M is an appropriate number. The number (heading) for an element x is generated from the element x via a hash function h(x).

Heading 0
Heading 1
…
Heading h(x): element 1 -> element 2 -> … -> element n
…
Heading M-1

© Love Ekenberg Hash Functions A hash function takes an argument x and generates a value between 0 and M-1, where M is the number of boxes (headings) in the table. The value h(x) is where element x is put. The idea is to combine direct access with searching in a list, but where the list (in the best case) only has 1/M times as many elements as the original set and where the elements are more or less evenly distributed amongst the boxes. Given 100 words evenly sorted into 10 boxes, 10 (100/10) elements are in each box.

© Love Ekenberg Example Let ORD be a function that yields a letter's position in the Swedish alphabet. If the elements to be stored are words in Swedish, the element x can be written a1 a2 … ak, where each ai is a letter. Then f(x) can be defined as ORD(a1) + ORD(a2) + … + ORD(ak). Lastly, let h(x) be the hash function f(x) MOD M. (x MOD M yields the remainder of x divided by M. For example, 75 MOD 10 = 5 since 75 = 7 · 10 + 5.)
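The definition above can be sketched in Python. Assumptions in this sketch: alphabet positions are taken from the English a–z (the slides use the 29-letter Swedish alphabet) and M = 5 is an arbitrary choice.

```python
# Sketch of the slides' hash function, assuming English alphabet
# positions (a=1, ..., z=26) and an arbitrary table size M = 5.
M = 5

def ORD(letter):
    """Position of a lowercase letter in the alphabet: a=1, ..., z=26."""
    return ord(letter) - ord('a') + 1

def f(x):
    """f(x) = ORD(a1) + ORD(a2) + ... + ORD(ak) for the word x = a1 a2 ... ak."""
    return sum(ORD(a) for a in x)

def h(x):
    """The hash function: the bucket (heading) for word x is f(x) MOD M."""
    return f(x) % M

# f("abc") = 1 + 2 + 3 = 6, and h("abc") = 6 MOD 5 = 1
```

Any word is thereby mapped to one of the M headings 0 … M-1.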

© Love Ekenberg Example (cont.) Store the words of the following string: anyone lived in a pretty how town

word = array [1..10] of characters

function h(x: word): integer
  sum := 0
  for i := 1 to 10 do
    sum := sum + ORD(x[i])
  h := sum MOD M

© Love Ekenberg Example (cont.)

Word     Sum  Bucket
anyone   778  3
lived    692  2
in       471  1
a        385  0
pretty   808  3
how      558  3
town     648  3

Here the choice of hash function is important. The example displays a certain unevenness because too many elements come under heading 3.

© Love Ekenberg Operations on Hash Tables We can now operate on hash tables in various ways. Common such operations are:
– inserting elements
– deleting elements
– checking elements (lookup)
The algorithm for performing one of these operations is:
1. Calculate h(x).
2. Use the array of pointers to find the list h(x) of elements.
3. Carry out the operation.

© Love Ekenberg Example

Proc bucketInsert(x: Word; L: List)   { inserts element x }
  if L = NIL then          { NIL is the end of the list }
    new(L)                 { a new cell is created in list L }
    L.element := x         { the element is x }
    L.next := NIL          { nothing follows x, so the list ends here }
  else if L.element <> x then
    bucketInsert(x, L.next)   { if x differs from the current element,
                                the procedure is called again with the
                                next element in the list }

Suppose we want to delete 'pretty'. Calculate h(pretty), which is 3. The second cell in list 3 contains pretty and is deleted.
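The three-step algorithm and bucketInsert above can be sketched as a separate-chaining table in Python. Assumptions: Python lists stand in for the linked cells, and the hash function is the English-alphabet variant of the slides' ORD-based function with M = 5.

```python
# Separate-chaining hash table sketch: M buckets, each a Python list.
M = 5

def h(x):
    # Assumed hash function: sum of English alphabet positions MOD M,
    # standing in for the slides' Swedish ORD-based function.
    return sum(ord(c) - ord('a') + 1 for c in x) % M

table = [[] for _ in range(M)]

def insert(x):
    bucket = table[h(x)]      # steps 1-2: calculate h(x), find its list
    if x not in bucket:       # step 3: carry out the operation
        bucket.append(x)

def lookup(x):
    return x in table[h(x)]

def delete(x):
    bucket = table[h(x)]
    if x in bucket:
        bucket.remove(x)

for word in "anyone lived in a pretty how town".split():
    insert(word)

delete("pretty")   # only the list h("pretty") is searched
```

Each operation touches a single bucket, which is the source of the O(N/M) behaviour discussed next.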

© Love Ekenberg Complexity of Operations on Hash Tables Finding the hash number is O(1). Naturally, this assumes the hash function is not too complicated. Furthermore, O(N/M) is required, where N is the number of elements stored. This holds since, for example, bucketInsert requires time proportional to the number of elements in the list, which on average is the total number of elements N divided by the number of boxes M.

© Love Ekenberg Separate Chaining The technique described here is called separate chaining (in these slides also "separate linking"). Separate chaining distributes a number of elements over boxes; within each box the elements are sequentially linked to each other. It should now be easy to accept the following theorem. Theorem: Separate chaining reduces the number of comparisons for a sequential search by a factor of M (on average).

© Love Ekenberg Some Observations Let N be the total number of elements and M be the number of headings. If N and M are close, then the result is about O(1). If M > N, then O(1) still holds if at most one element is sorted under each heading; it is therefore pointless to extend the table further. If N is much larger than M, then a larger M can (and should) be chosen and all the elements moved to the new table. This takes time O(N), but that is no longer than the time it took to insert the elements into the original table, N · O(1) = O(N).
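The move to a larger table can be sketched as follows. Assumptions: the growth from 3 to 6 buckets is arbitrary, and the hash function is again the English-alphabet stand-in.

```python
def h(x, M):
    # Assumed hash function: English alphabet-position sum MOD M.
    return sum(ord(c) - ord('a') + 1 for c in x) % M

def rehash(table, new_M):
    """Move all N elements into a table with new_M buckets: O(N)."""
    new_table = [[] for _ in range(new_M)]
    for bucket in table:
        for x in bucket:
            new_table[h(x, new_M)].append(x)   # each element re-hashed once
    return new_table

# A small table that has become overloaded (N = 7 words, M = 3 boxes).
old = [[] for _ in range(3)]
for word in "anyone lived in a pretty how town".split():
    old[h(word, 3)].append(word)

new = rehash(old, 6)   # larger M chosen, all elements moved in O(N)
```

Every element is hashed exactly once during the move, which is why the O(N) cost matches the cost of the original insertions.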

© Love Ekenberg Linear Probing If the number of elements in the table can be assessed in advance, then M > N can be chosen and so-called 'open addressing' methods used. This means that we know there is room for an element in each box and therefore do not need linked lists. The advantage of this is direct access to the elements, never requiring a search through linked lists. A suitable technique in this case is linear probing. If a collision occurs, then the next box is tried:
– If there is free space: insert (or delete, or check) the element and finish.
– Otherwise, continue with the next box.
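The probing loop can be sketched as follows. Assumptions: the table stores single uppercase letters, M = 19 as in the coming example, and ORD is the English alphabet position (which coincides with the Swedish position for these letters).

```python
# Linear probing sketch: one flat array of M boxes, no linked lists.
M = 19
table = [None] * M

def h(x):
    # Assumed hash function for a single uppercase letter: ORD(x) MOD M.
    return (ord(x) - ord('A') + 1) % M

def insert(x):
    i = h(x)
    while table[i] is not None:   # collision: the box is occupied
        i = (i + 1) % M           # try the next box
    table[i] = x                  # free space found: insert and finish

insert('A')   # h('A') = 1, box 1 is free
insert('S')   # h('S') = 19 MOD 19 = 0, box 0 is free
insert('A')   # h('A') = 1 is taken, so the second A lands in box 2
```

The wrap-around `(i + 1) % M` ensures the probe continues from box 0 after box M-1.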

© Love Ekenberg Example Let M = 19. Sort the string ASEARCHINGEXAMPLE using the hash function ORD(x) MOD 19 as below.

A S E A  R C H I  N G E X A  M  P  L E
1 0 5 1 18 3 8 9 14 7 5 5 1 13 16 12 5

Clearly several elements come under the same heading, which should be avoided. When such collisions occur, a simple trick is to move the element to the next available space, i.e., test the next box. If there is an element there, then test the next box, etc. Continue in this way until an empty box is found.

© Love Ekenberg Example (cont.) The first collision occurs when trying to place the second A, i.e., upon reaching ASEA. The hash function prescribes sorting it under heading 1 (the table so far: 0:S 1:A 5:E). However, heading 1 is taken, and since there are no elements under heading 2, the A can be put there: 0:S 1:A 2:A 5:E.

© Love Ekenberg Example (cont.) Continuing like this will gradually yield the following table: 0:S 1:A 2:A 3:C 5:E 7:G 8:H 9:I 14:N 18:R. The next element is a new E. Heading 5 is taken, but heading 6 is free, so E can be put under heading 6: 0:S 1:A 2:A 3:C 5:E 6:E 7:G 8:H 9:I 14:N 18:R. The next element is X. The hash function ORD(x) MOD 19 projects X onto 5, which is taken, so the algorithm tries 6, which is also taken, and then 7, which is also taken. Continuing in this way, finally 10 is found to be free and X is placed there: 0:S 1:A 2:A 3:C 5:E 6:E 7:G 8:H 9:I 10:X 14:N 18:R.
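The whole example can be replayed in code. Assumption: ORD is the English alphabet position, which coincides with the Swedish position for the letters that occur here.

```python
# Replay of the ASEARCHINGEXAMPLE example with linear probing, M = 19.
M = 19
table = [None] * M

def h(x):
    return (ord(x) - ord('A') + 1) % M   # ORD(x) MOD 19

for letter in "ASEARCHINGEXAMPLE":
    i = h(letter)
    while table[i] is not None:   # collision: move to the next box
        i = (i + 1) % M
    table[i] = letter

# X hashes to 5 but, as in the slides, is pushed along to box 10.
```

Running this reproduces the long probe sequence for X (5, 6, 7, 8, 9, 10), a first taste of the clustering problem discussed below.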

© Love Ekenberg Theorem The following holds but need not be proved. Linear probing uses 1/2 + 1/(2(1 - N/M)^2) operations in the worst case and 1/2 + 1/(2(1 - N/M)) on average.

© Love Ekenberg Double Hashing As can be seen from the example above, linear probing is inefficient when nearby boxes begin to fill up. This is termed clustering. An alternative is double hashing. Double hashing is used to avoid clustering, and uses a second function h2(v) to shunt elements along. Instead of moving one step ((h1(x) + 1) MOD M) as in linear probing, the probe moves to (h1(x) + h2(h1(x))) MOD M, where h1(x) is the first hash function. A good function is h2(h1(x)) = M - ((h1(x)) MOD (M-2)). Another is h2(h1(x)) = 8 - ((h1(x)) MOD 8). (See the example below.)
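The probe step can be sketched with the slides' second function h2(k) = 8 - (k MOD 8). Assumptions as before: M = 19 and English alphabet positions for ORD.

```python
# Double hashing sketch: the step size depends on the key, not a fixed +1.
M = 19
table = [None] * M

def h1(x):
    return (ord(x) - ord('A') + 1) % M   # first hash function: ORD(x) MOD 19

def h2(k):
    return 8 - (k % 8)                   # the slides' second hash function

def insert(x):
    i = h1(x)
    step = h2(h1(x))          # shunt distance used on every collision
    while table[i] is not None:
        i = (i + step) % M
    table[i] = x

for letter in "ASEA":
    insert(letter)
# The second A collides at 1 and jumps h2(1) = 7 boxes, landing in box 8.
```

Because the step size varies with the key, colliding elements follow different probe sequences instead of piling into adjacent boxes.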

© Love Ekenberg Example The table below shows the projections of the functions h1 and h2.

         A S E A  R C H I  N G E X A  M  P  L E
h1:      1 0 5 1 18 3 8 9 14 7 5 5 1 13 16 12 5
h2(h1):  7 8 3 7  6 5 8 7  2 1 3 3 7  3  8  4 3

When a collision occurs, the first square to be examined is the one at position (h1(x) + h2(h1(x))) MOD 19, where h2(h1(x)) = 8 - ((h1(x)) MOD 8). For example: h2(h1(A)) = h2(1) = 8 - (1 MOD 8) = 8 - 1 = 7. h2(h1(P)) = h2(16) = 8 - (16 MOD 8) = 8.

© Love Ekenberg Example (cont.) The first collision occurs upon trying to insert the second A, i.e., upon arriving at ASEA. The hash function prescribes placing it under heading 1 (the table so far: 0:S 1:A 5:E). Now h2(h1(A)) = h2(1) = 8 - (1 MOD 8) = 7, so the next square examined is 1 + 7 = 8, and since there are no elements under heading 8, the new A is put there: 0:S 1:A 5:E 8:A. In this way the elements can be spread out using both functions.

© Love Ekenberg The Choice of Function Naturally h2(x) should be chosen wisely. In particular, the step size h2(x) should not share a common divisor with M. Example: Let M be 10 and h2(x) = x MOD 6. Now try to sort the string EEEE. ORD(E) = 5 and h2(5) = 5. The first E is put under heading 5. Because ORD(E) = 5 and h2(5) = 5, the second E comes under heading 5 + 5 = 10, and since 10 MOD 10 = 0, it is put under heading 0. The third E first tries heading 5. This heading is taken, so E is sent on to heading (5 + 5) MOD 10 = 0. With this heading also taken, E is sent to 5 again. The algorithm loops without h2 being able to find a free space, in spite of several available spaces remaining: 0:E 5:E. Similar behaviour occurs whenever the step size and M share a divisor.
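The failure can be demonstrated by bounding the number of probes: with M = 10 and step 5 the probe sequence cycles between boxes 5 and 0 and never reaches the 8 empty boxes. The probe cap of M attempts is an assumption added here to expose the cycle.

```python
# Demonstration of a bad h2 choice: M = 10, step h2(5) = 5 MOD 6 = 5.
M = 10
table = [None] * M

def insert(x, start, step, max_probes=M):
    """Try to place x; give up after max_probes probes (to expose the cycle)."""
    i = start
    for _ in range(max_probes):
        if table[i] is None:
            table[i] = x
            return i          # placed in box i
        i = (i + step) % M    # probe sequence: 5, 0, 5, 0, ...
    return None               # no free box found, despite empty boxes

# ORD(E) = 5 and h2(5) = 5:
insert('E', 5, 5)   # first E lands in box 5
insert('E', 5, 5)   # 5 taken -> box 0
insert('E', 5, 5)   # cycles 5 -> 0 -> 5 -> ... and fails
```

Since gcd(5, 10) = 5, only 10/5 = 2 of the 10 boxes are ever probed; a step size relatively prime to M would visit all of them.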

© Love Ekenberg Theorem The following holds but need not be proved. Double hashing uses 1/(1 - N/M) operations in the worst case and ln(1/(1 - N/M))/(N/M) on average.