1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

David Luebke 1 6/7/2014 ITCS 6114 Skip Lists Hashing.
Hash Tables.
Hash Tables CIS 606 Spring 2010.
Hashing.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 253: Algorithms Chapter 11 Hashing Credit: Dr. George Bebis.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
Hashing CS 3358 Data Structures.
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
Hash Tables How well do hash tables support dynamic set operations? Implementations –Direct address –Hash functions Collision resolution methods –Universal.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Design and Analysis of Algorithms - Chapter 71 Hashing b A very efficient method for implementing a dictionary, i.e., a set with the operations: – insert.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 10: Search Structures and Hashing
Hashing General idea: Get a large array
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
1 Symbol Tables The symbol table contains information about –variables –functions –class names –type names –temporary variables –etc.
Data Structures Hash Tables. Hashing Tables l Motivation: symbol tables n A compiler uses a symbol table to relate symbols to associated data u Symbols:
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Search  We’ve got all the students here at this university and we want to find information about one of the students.  How do we do it?  Linked List?
Hashing Hashing is another method for sorting and searching data.
Hashing Amihood Amir Bar Ilan University Direct Addressing In old days: LD 1,1 LD 2,2 AD 1,2 ST 1,3 Today: C
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Hashing Jeff Chastine.
Hash table CSC317 We have elements with key and satellite data
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Introduction to Algorithms 6.046J/18.401J
Resolving collisions: Open addressing
Hash Tables – 2 Comp 122, Spring 2004.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Hash Tables – 2 1.
Presentation transcript:

1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Examples : Examples :  a symbol table created by a compiler  a phone book  an actual dictionary Hash table = a data structure good at implementing dictionaries Hash table = a data structure good at implementing dictionaries

2 Hashing - Introduction Why not just use an array with direct addressing (where each array cell corresponds to a key)? Why not just use an array with direct addressing (where each array cell corresponds to a key)?  Direct-addressing guarantees O(1) worst-case time for Insert/Delete/Search.  BUT sometimes, the number K of keys actually stored is very small compared to the number N of possible keys. Using an array of size N would waste space.  We’d like to use a structure that takes up  (K) space and O(1) average-case time for Insert/Delete/ Search

3 Hashing Hashing = Hashing =  use a table (array/vector) of size m to store elements from a set of much larger size  given a key k, use a function h to compute the slot h(k) for that key. Terminology: Terminology:  h is a hash function  k hashes to slot h(k)  the hash value of k is h(k)  collision : when two keys have the same hash value

4 Hashing What makes a good hash function? What makes a good hash function?  It is easy to compute  It satisfies uniform hashing hash = to chop into small pieces (Merriam- Webster) = to chop any patterns in the keys so that the results are uniformly distributed (cs311) hash = to chop into small pieces (Merriam- Webster) = to chop any patterns in the keys so that the results are uniformly distributed (cs311)

5 Hashing What if the key is not a natural number? What if the key is not a natural number? We must find a way to represent it as a natural number. We must find a way to represent it as a natural number. Examples: Examples:  key i  Use its ascii decimal value, 105  key inx  Combine the individual ascii values in some way, for example, 105* * =

6 Hashing - hash functionsTruncation Ignore part of the key and use the remaining part directly as the index. Ignore part of the key and use the remaining part directly as the index. Example: if the keys are 8-digit numbers and the hash table has 1000 entries, then the first, fourth and eighth digit could make the hash function. Example: if the keys are 8-digit numbers and the hash table has 1000 entries, then the first, fourth and eighth digit could make the hash function. Not a very good method : does not distribute keys uniformly Not a very good method : does not distribute keys uniformly

7 HashingFolding Break up the key in parts and combine them in some way. Break up the key in parts and combine them in some way. Example : if the keys are 8 digit numbers and the hash table has 1000 entries, break up a key into three, three and two digits, add them up and, if necessary, truncate them. Example : if the keys are 8 digit numbers and the hash table has 1000 entries, break up a key into three, three and two digits, add them up and, if necessary, truncate them. Better than truncation. Better than truncation.

8 HashingDivision If the hash table has m slots, define h(k)=k mod m If the hash table has m slots, define h(k)=k mod m Fast Fast Not all values of m are suitable for this. For example powers of 2 should be avoided. Not all values of m are suitable for this. For example powers of 2 should be avoided. Good values for m are prime numbers that are not very close to powers of 2. Good values for m are prime numbers that are not very close to powers of 2.

9 HashingMultiplication h(k)=  m  (k  c-  k  c  ) , 0<c<1 h(k)=  m  (k  c-  k  c  ) , 0<c<1 In English : In English :  Multiply the key k by a constant c, 0<c<1  Take the fractional part of k  c  Multiply that by m  Take the floor of the result The value of m does not make a difference The value of m does not make a difference Some values of c work better than others Some values of c work better than others A good value is A good value is

10 HashingMultiplication Example: Example: Suppose the size of the table, m, is For k=1234, h(k)=850 For k=1235, h(k)=353 For k=1236, h(k)=115 For k=1237, h(k)=660 For k=1238, h(k)=164 For k=1239, h(k)=968 For k=1240, h(k)=471 pattern broken distribution fairly uniform

11 Hashing Universal Hashing Worst-case scenario: The chosen keys all hash to the same slot. This can be avoided if the hash function is not fixed: Worst-case scenario: The chosen keys all hash to the same slot. This can be avoided if the hash function is not fixed: Start with a collection of hash functions Start with a collection of hash functions Select one in random and use that. Select one in random and use that. Good performance on average: the probability that the randomly chosen hash function exhibits the worst-case behavior is very low. Good performance on average: the probability that the randomly chosen hash function exhibits the worst-case behavior is very low.

12 Hashing Universal Hashing Let H be a collection of hash functions that map a given universe U of keys into the range {0, 1,..., m-1}. Let H be a collection of hash functions that map a given universe U of keys into the range {0, 1,..., m-1}. If for each pair of distinct keys k, l  U the number of hash functions h  H for which h(k)==h(l) is  H  / m, then H is called universal. If for each pair of distinct keys k, l  U the number of hash functions h  H for which h(k)==h(l) is  H  / m, then H is called universal.

13 Hashing Given a hash table with m slots and n elements stored in it, we define the load factor of the table as =n/m Given a hash table with m slots and n elements stored in it, we define the load factor of the table as =n/m The load factor gives us an indication of how full the table is. The load factor gives us an indication of how full the table is. The possible values of the load factor depend on the method we use for resolving collisions. The possible values of the load factor depend on the method we use for resolving collisions.

14 Hashing - resolving collisions Chaining a.k.a closed addressing Idea : put all elements that hash to the same slot in a linked list (chain). The slot contains a pointer to the head of the list. Idea : put all elements that hash to the same slot in a linked list (chain). The slot contains a pointer to the head of the list. The load factor indicates the average number of elements stored in a chain. It could be less than, equal to, or larger than 1. The load factor indicates the average number of elements stored in a chain. It could be less than, equal to, or larger than 1.

15 Hashing - resolving collisionsChaining Insert : O(1) Insert : O(1)  worst case Delete : O(1) Delete : O(1)  worst case  assuming doubly-linked list  it’s O(1) after the element has been found Search : ? Search : ?  depends on length of chain.

16 Hashing - resolving collisionsChaining Assumption: simple uniform hashing Assumption: simple uniform hashing  any given key is equally likely to hash into any of the m slots Unsuccessful search: Unsuccessful search:  average time to search unsuccessfully for key k = the average time to search to the end of a chain.  The average length of a chain is.  Total (average) time required :  (1+ )

17 Hashing - resolving collisionsChaining Successful search: Successful search:  expected number e of elements examined during a successful search for key k =1 more than the expected number of elements examined when k was inserted.  it makes no difference whether we insert at the beginning or the end of the list.  Take the average, over the n items in the table, of 1 plus the expected length of the chain to which the ith element was added:

18 Hashing - resolving collisionsChaining – Total time :  (1+ )

19 Hashing - resolving collisionsChaining Both types of search take  (1+ ) time on average. Both types of search take  (1+ ) time on average. If n=O(m), then =O(1) and the total time for Search is O(1) on average If n=O(m), then =O(1) and the total time for Search is O(1) on average Insert : O(1) on the worst case Insert : O(1) on the worst case Delete : O(1) on the worst case Delete : O(1) on the worst case Another idea: Link all unused slots into a free list Another idea: Link all unused slots into a free list

20 Hashing - resolving collisions Open addressing Idea: Idea:  Store all elements in the hash table itself.  If a collision occurs, find another slot. (How?)  When searching for an element examine slots until the element is found or it is clear that it is not in the table.  The sequence of slots to be examined (probed) is computed in a systematic way. It is possible to fill up the table so that you can’t insert any more elements. It is possible to fill up the table so that you can’t insert any more elements.  idea: extendible hash tables?

21 Hashing - resolving collisions Open addressing Probing must be done in a systematic way (why?) Probing must be done in a systematic way (why?) There are several ways to determine a probe sequence: There are several ways to determine a probe sequence:  linear probing  quadratic probing  double hashing  random probing