CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.

Presentation transcript:

CMSC 341 Hashing Readings: Chapter 5

Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2

Project due Nov 5 Midterm II review posted on Tuesday CMSC 341 Hashing 3

4 Motivations We have lots of data to store. We desire efficient – O( 1 ) – performance for insertion, deletion and searching. Too much (wasted) memory is required if we use an array indexed by the data’s key. The solution is a “hash table”.

CMSC 341 Hashing 5 Hash Table Basic Idea
The hash table is an array of size m.
The storage index for an item is determined by a hash function h(k): U → {0, 1, …, m-1}.
[Figure: array with cells 0, 1, 2, …, m-1]

CMSC 341 Hashing 6 Exercise: A Simple Example
Example: insert 89, 18, 49, 58, 69 into a table of size 10.
Hash function: h(k) = k mod m, where m is the table size.

    // h(k) = k mod m for integer keys
    public static int hash(int key, int tableSize) {
        int hashVal = key;
        hashVal %= tableSize;
        return hashVal;
    }

What is the problem here? How can we resolve it?
Hints: (1) How should we choose m? (2) How should we pick a hashing function?
Possible answers: use a better hash function; keep a linked list in each table cell instead of a single item; pick a better table size (a prime number).
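
The problem the exercise points at can be made visible by grouping the five keys by the cell that k mod 10 assigns them. The following is a minimal sketch; the class name CollisionDemo and the helper cells are illustrative names, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: CollisionDemo and cells() are hypothetical names.
final class CollisionDemo {
    // Division-method hash from the slide: h(k) = k mod m.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // Groups the keys by the table cell they hash to.
    static Map<Integer, List<Integer>> cells(int[] keys, int tableSize) {
        Map<Integer, List<Integer>> byCell = new TreeMap<>();
        for (int k : keys) {
            byCell.computeIfAbsent(hash(k, tableSize), c -> new ArrayList<>()).add(k);
        }
        return byCell;
    }

    public static void main(String[] args) {
        int[] keys = {89, 18, 49, 58, 69};
        // Only two of the ten cells are used; every other key collides.
        System.out.println(cells(keys, 10)); // {8=[18, 58], 9=[89, 49, 69]}
    }
}
```

All five keys end in 8 or 9, so a table size of 10 sends them to just two cells.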

CMSC 341 Hashing 7 Example: h’(k) = k mod 10 in a table of size 10 (not prime, but easy to calculate)
U = {89, 18, 49, 58, 69}, probing function f(i) = i
89 hashes to 9.
18 hashes to 8.
49 hashes to 9, collides with 89. h(49,1) = (49 % 10 + 1) % 10 = 0.
58 hashes to 8, collides with 18. h(58,1) = (58 % 10 + 1) % 10 = 9, collides with 89. h(58,2) = (58 % 10 + 2) % 10 = 0, collides with 49. h(58,3) = (58 % 10 + 3) % 10 = 1.
69 hashes to 9, collides with 89. h(69,1) = (h’(69) + f(1)) mod 10 = 0, collides with 49. h(69,2) = (h’(69) + f(2)) mod 10 = 1, collides with 58. h(69,3) = (h’(69) + f(3)) mod 10 = 2.

CMSC 341 Hashing 8 Hash Table Basic Idea
The hash table is an array of size m.
The storage index for an item is determined by a hash function h(k): U → {0, 1, …, m-1}.
Desired properties of h(k): easy to compute; uniform distribution of keys over {0, 1, …, m-1}.
When h(k1) = h(k2) for distinct keys k1, k2 ∈ U, we have a collision.
[Figure: array with cells 0, 1, 2, …, m-1]

CMSC 341 Hashing 9 Division Method
The hash function: h(k) = k mod m, where m is the table size.
m must be chosen to spread keys evenly.
Poor choice: m = a power of 10.
Poor choice: m = 2^b, b > 1.
A good choice of m is a prime number.
The table should be no more than 80% full: choose m as the smallest prime number greater than m_min, where m_min = (expected number of entries)/0.8.
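
The sizing rule on this slide can be sketched as follows. The class and method names are illustrative assumptions, and m_min is computed with integer arithmetic (entries × 5 / 4, rounded down) to avoid floating-point rounding; since primes are integers, the smallest prime above the rounded-down value is also the smallest prime above m_min.

```java
// Illustrative sketch: TableSize and its methods are hypothetical names.
final class TableSize {
    // Trial division; adequate for table sizes that fit in an int.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime strictly greater than n (assumes one exists below Integer.MAX_VALUE).
    static int nextPrimeAfter(int n) {
        int candidate = Math.max(n + 1, 2);
        while (!isPrime(candidate)) candidate++;
        return candidate;
    }

    // m = smallest prime greater than m_min = expectedEntries / 0.8.
    static int chooseTableSize(int expectedEntries) {
        int mMin = (int) ((long) expectedEntries * 5 / 4); // floor(expectedEntries / 0.8)
        return nextPrimeAfter(mMin);
    }

    public static void main(String[] args) {
        System.out.println(chooseTableSize(1000)); // 1259
    }
}
```

For 1000 expected entries, m_min is 1250 and the next prime above it is 1259, which keeps the table at most about 79% full.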

CMSC 341 Hashing 10 Handle Non-integer Keys
To use a non-integer key, first convert it to a positive integer: h(k) = g(f(k)) with f: U → integer and g: I → {0, …, m-1}.
Suppose the keys are strings. How can we convert a string (or characters) into an integer value?

CMSC 341 Hashing 11 Horner’s Rule

    static int hash(String key, int tableSize) {
        int hashVal = 0;

        // Horner's rule: treat the characters as coefficients of a polynomial in 37
        for (int i = 0; i < key.length(); i++)
            hashVal = 37 * hashVal + key.charAt(i);

        hashVal %= tableSize;
        if (hashVal < 0)          // int overflow can make the value negative
            hashVal += tableSize;

        return hashVal;
    }

CMSC 341 Hashing 12 Exercise: Hash Function
Which hash function is better when tableSize = 10,007?

Method 1:

    public static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal += key.charAt(i);
        return hashVal % tableSize;
    } // not good: wastes a lot of memory

Method 2 (assuming three letters):

    public static int hash(String key, int tableSize) {
        // 27 * 27 is written out because ^ is XOR in Java, not exponentiation
        return (key.charAt(0) + 27 * key.charAt(1) + 27 * 27 * key.charAt(2)) % tableSize;
    }

Method 3:

    public static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal = 37 * hashVal + key.charAt(i);
        hashVal %= tableSize;
        if (hashVal < 0)
            hashVal += tableSize;
        return hashVal;
    }
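
One way to see why Method 1 uses a large table poorly is to run it next to Method 3 on a few keys. This is a small sketch under stated assumptions: the class name HashComparison and the method names sumHash and hornerHash are illustrative, and the keys are assumed to be ASCII strings of at most eight characters.

```java
// Illustrative sketch: HashComparison, sumHash and hornerHash are hypothetical names.
final class HashComparison {
    // Method 1: sum of the character codes.
    static int sumHash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) hashVal += key.charAt(i);
        return hashVal % tableSize;
    }

    // Method 3: Horner's rule with multiplier 37.
    static int hornerHash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) hashVal = 37 * hashVal + key.charAt(i);
        hashVal %= tableSize;
        if (hashVal < 0) hashVal += tableSize;
        return hashVal;
    }

    public static void main(String[] args) {
        int tableSize = 10007;
        // Reordering the characters never changes the sum, so Method 1 collides.
        System.out.println(sumHash("ab", tableSize) + " " + sumHash("ba", tableSize));       // 195 195
        System.out.println(hornerHash("ab", tableSize) + " " + hornerHash("ba", tableSize)); // 3687 3723
        // Largest value Method 1 can reach for an 8-character ASCII key: 8 * 127.
        System.out.println(sumHash("\u007f".repeat(8), tableSize)); // 1016
    }
}
```

With keys of at most eight ASCII characters, Method 1 can only produce indices 0 through 1016, so roughly 90% of a 10,007-cell table is never used.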

CMSC 341 Hashing 13 HashTable Class

    // Skeleton only; requires java.util.List. Bodies are filled in later.
    public class SeparateChainingHashTable<AnyType> {
        public SeparateChainingHashTable( ) { /* Later */ }
        public SeparateChainingHashTable( int size ) { /* Later */ }

        public void insert( AnyType x ) { /* Later */ }
        public void remove( AnyType x ) { /* Later */ }
        public boolean contains( AnyType x ) { /* Later */ }
        public void makeEmpty( ) { /* Later */ }

        private static final int DEFAULT_TABLE_SIZE = 101;

        private List<AnyType> [ ] theLists;   // the array of lists
        private int currentSize;

        private void rehash( ) { /* Later */ }
        private int myhash( AnyType x ) { /* Later */ }
        private static int nextPrime( int n ) { /* Later */ }
        private static boolean isPrime( int n ) { /* Later */ }
    }

CMSC 341 Hashing 14 HashTable Ops
boolean contains( AnyType x ): returns true if x is present in the table.
void insert( AnyType x ): if x is already in the table, do nothing. Otherwise, insert it, using the appropriate hash function.
void remove( AnyType x ): remove the instance of x, if x is present. Otherwise, do nothing.
void makeEmpty( )

CMSC 341 Hashing 15 Hash Methods

    private int myhash( AnyType x ) {
        int hashVal = x.hashCode( );

        hashVal %= theLists.length;
        if( hashVal < 0 )               // hashCode( ) may be negative
            hashVal += theLists.length;

        return hashVal;
    }

CMSC 341 Hashing 16 Handling Collisions
Collisions are inevitable. How to handle them?
Separate chaining hash tables: store colliding items in a list. If m is large enough, list lengths are small.
Insertion of key k: hash( k ) to find the proper list. If k is in that list, do nothing; otherwise insert k on that list.
Asymptotic performance: if always inserted at head of list, and no duplicates, insert = O(1) for best, worst and average cases.

CMSC 341 Hashing 17 Hash Class for Separate Chaining
To implement separate chaining, the private data of the hash table is an array of Lists. The hash table operations are written using List operations.

    private List<AnyType> [ ] theLists;
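
A compact sketch of separate chaining along these lines is given below. It is not the course's SeparateChainingHashTable: the class name ChainingSketch, the fixed table size, and the size( ) accessor are assumptions made for illustration, and rehashing is left out.

```java
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch: ChainingSketch is a hypothetical name; no rehashing is done.
final class ChainingSketch<T> {
    private final List<T>[] theLists; // one list per table cell
    private int currentSize;

    @SuppressWarnings("unchecked")
    ChainingSketch(int tableSize) {
        theLists = new List[tableSize];
        for (int i = 0; i < theLists.length; i++) theLists[i] = new LinkedList<>();
    }

    // Maps hashCode() into {0, ..., tableSize - 1}.
    private int myhash(T x) {
        int hashVal = x.hashCode() % theLists.length;
        if (hashVal < 0) hashVal += theLists.length;
        return hashVal;
    }

    boolean contains(T x) {
        return theLists[myhash(x)].contains(x);
    }

    // Does nothing if x is already present; otherwise inserts at the head of its list.
    void insert(T x) {
        List<T> whichList = theLists[myhash(x)];
        if (!whichList.contains(x)) {
            whichList.add(0, x);
            currentSize++;
        }
    }

    // Does nothing if x is absent.
    void remove(T x) {
        if (theLists[myhash(x)].remove(x)) currentSize--;
    }

    int size() {
        return currentSize;
    }
}
```

Every operation hashes once and then delegates to the list for that cell, so its cost is the cost of the corresponding list operation on one (ideally short) list.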

CMSC 341 Hashing 18 Performance of contains( )
contains: hash k to find the proper list, then call contains( ) on that list, which returns a boolean.
Performance:
Best: the selected list is empty or the key is first -> O(1).
Worst: let N be the number of elements in the hash table. All N elements are in one list (all have the same hash value) and the key is not there -> O(N).
Average: suppose there are M buckets and N elements in the table. Then the expected list length = N/M -> O(N/M), which is O(N) if M is small and O(1) if M is large.
Here λ = N/M is called the load factor of the table. It is important to keep the load factor from getting too large. The average case is O(1) when N/M is constant.

CMSC 341 Hashing 19 Performance of remove( )
Remove k from table: hash k to find the proper list, then remove k from that list.
Performance:
Best: k is the first element on the list, or the list is empty: O(1).
Worst: all elements on one list: O(N).
Average: O(N/M) -> O(1) for λ <= 1.
So what is the big deal? Best-case and worst-case performance are the same for a hash table and a list, but average performance for a well-designed hash table is much better: O(1).

CMSC 341 Hashing 20 Handling Collisions Revisited
Probing hash tables: all elements are stored in the table itself (so the table should be large; rule of thumb: m >= 2N). Upon collision, the item is hashed to a new (open) slot.
Hash function h: U x {0, 1, 2, …} → {0, 1, …, m-1}
h( k, i ) = ( h’( k ) + f( i ) ) mod m for some h’: U → {0, 1, …, m-1} and some f( i ) such that f(0) = 0.
Each attempt to find an open slot (i.e. calculating h( k, i )) is called a probe.

CMSC 341 Hashing 21 HashEntry Class for Probing Hash Tables
In this case, the hash table is just an array.

    private static class HashEntry<AnyType> {
        public AnyType element;   // the element
        public boolean isActive;  // false if deleted

        public HashEntry( AnyType e ) {
            this( e, true );
        }

        public HashEntry( AnyType e, boolean active ) {
            element = e;
            isActive = active;
        }
    }

    // The array of elements
    private HashEntry<AnyType> [ ] array;
    // The number of occupied cells
    private int currentSize;

CMSC 341 Hashing 22 Linear Probing
Use a linear function for f( i ): f( i ) = c * i.
Example: h’( k ) = k mod 10 in a table of size 10, f( i ) = i, so that h( k, i ) = (k mod 10 + i ) mod 10.
Insert the values U = {89, 18, 49, 58, 69} into the hash table.
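
The insertion sequence for this example can be traced with a few lines of code. This is a sketch only: LinearProbingDemo and insertAll are illustrative names, keys are assumed to be non-negative and distinct, and the table is assumed to have room for all of them.

```java
import java.util.Arrays;

// Illustrative sketch: LinearProbingDemo and insertAll are hypothetical names.
final class LinearProbingDemo {
    // Inserts each key using h(k, i) = (k mod m + i) mod m; empty cells stay null.
    // Assumes non-negative, distinct keys and at most m of them.
    static Integer[] insertAll(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            int i = 0;
            while (table[(k % m + i) % m] != null) i++; // probe until an open slot is found
            table[(k % m + i) % m] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(new int[]{89, 18, 49, 58, 69}, 10);
        System.out.println(Arrays.toString(table));
        // [49, 58, 69, null, null, null, null, null, 18, 89]
    }
}
```

The three colliding keys wrap around and fill cells 0, 1 and 2 in a row, which is the start of a cluster.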

CMSC 341 Hashing 23 Linear Probing (cont.)
Problem: clustering. When the table starts to fill up, performance → O(N).
Asymptotic performance: for insertion and unsuccessful find, the average number of probes ≈ (1/2)(1 + 1/(1 - λ)^2), where λ is the “load factor” (what fraction of the table is used).
If λ → 1, the denominator goes to zero and the number of probes goes to infinity.

CMSC 341 Hashing 24 Linear Probing (cont.)
Remove: can’t just use the hash function(s) to find the object and remove it, because objects that were inserted after X were hashed based on X’s presence.
Can just mark the cell as deleted so it won’t be found anymore. Other elements are still in the right cells, but the table can fill with lots of deleted junk.
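
The "mark as deleted" idea can be sketched with a parallel array of flags. The class below is an illustration rather than the course implementation: the name LazyDeletionDemo is hypothetical, keys are plain ints, and the table is assumed never to become completely full.

```java
// Illustrative sketch: LazyDeletionDemo is a hypothetical name.
final class LazyDeletionDemo {
    private final Integer[] keys;   // null means the cell was never used
    private final boolean[] active; // false means the key was lazily deleted

    LazyDeletionDemo(int m) {
        keys = new Integer[m];
        active = new boolean[m];
    }

    // Linear probing: returns the cell holding k, or the first never-used cell.
    // Deleted cells are probed past, not stopped at. Assumes the table is never full.
    private int findPos(int k) {
        int pos = Math.floorMod(k, keys.length);
        while (keys[pos] != null && keys[pos] != k) pos = (pos + 1) % keys.length;
        return pos;
    }

    void insert(int k) {
        int pos = findPos(k);
        keys[pos] = k;
        active[pos] = true;
    }

    boolean contains(int k) {
        int pos = findPos(k);
        return keys[pos] != null && active[pos];
    }

    // Marks the cell as deleted; the key stays in place so later probes still pass through.
    void remove(int k) {
        int pos = findPos(k);
        if (keys[pos] != null) active[pos] = false;
    }
}
```

For example, after inserting 89 and 49 into a table of size 10 and removing 89, contains(49) still succeeds because the probe from cell 9 continues past the deleted entry to cell 0. Deleted cells are never reclaimed in this sketch, which is the "deleted junk" the slide mentions.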

CMSC 341 Hashing 25 Quadratic Probing
Use a quadratic function for f( i ): f( i ) = c2 * i^2 + c1 * i + c0.
The simplest quadratic function is f( i ) = i^2.
Example: let f( i ) = i^2 and m = 10, and let h’( k ) = k mod 10, so that h( k, i ) = (k mod 10 + i^2 ) mod 10.
Insert the values U = {89, 18, 49, 58, 69} into an initially empty hash table.
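
The example can be traced with the sketch below. QuadraticProbingDemo and insertAll are illustrative names; keys are assumed non-negative and distinct, and the code gives up after m probes, since i^2 mod m repeats with period m.

```java
import java.util.Arrays;

// Illustrative sketch: QuadraticProbingDemo and insertAll are hypothetical names.
final class QuadraticProbingDemo {
    // Inserts each key using h(k, i) = (k mod m + i^2) mod m; empty cells stay null.
    static Integer[] insertAll(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            boolean placed = false;
            for (int i = 0; i < m && !placed; i++) {
                int pos = (k % m + i * i) % m;
                if (table[pos] == null) {
                    table[pos] = k;
                    placed = true;
                }
            }
            // Quadratic probing is not guaranteed to reach every cell.
            if (!placed) throw new IllegalStateException("no empty slot found for " + k);
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(new int[]{89, 18, 49, 58, 69}, 10);
        System.out.println(Arrays.toString(table));
        // [49, null, 58, 69, null, null, null, null, 18, 89]
    }
}
```

Here 58 lands in cell 2 after probing 8, 9 and 2, and 69 lands in cell 3 after probing 9, 0 and 3, so the colliding keys spread out more than under linear probing.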

CMSC 341 Hashing 26 Quadratic Probing (cont.)
Advantage: reduced clustering problem.
Disadvantages: reduced number of sequences. No guarantee that an empty slot will be found if λ ≥ 0.5, even if m is prime. If m is not prime, may not find an empty slot even if λ < 0.5.

CMSC 341 Hashing 27 Double Hashing
Let f( i ) use another hash function: f( i ) = i * h2( k ).
Then h( k, i ) = ( h’( k ) + i * h2( k ) ) mod m, and probes are performed at distances of h2( k ), 2 * h2( k ), 3 * h2( k ), 4 * h2( k ), etc.
Choosing h2( k ): don’t allow h2( k ) = 0 for any k. A good choice: h2( k ) = R - ( k mod R ) with R a prime smaller than m.
Characteristics: no clustering problem; requires a second hash function.
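
As a worked illustration, the sketch below applies double hashing to the same five keys used in the earlier probing examples, with m = 10 and R = 7. These parameter values, the class name DoubleHashingDemo, and the method names are assumptions chosen for the example, not taken from this slide.

```java
import java.util.Arrays;

// Illustrative sketch: DoubleHashingDemo and its methods are hypothetical names.
final class DoubleHashingDemo {
    // Second hash function: h2(k) = R - (k mod R), never 0 for non-negative k.
    static int h2(int k, int r) {
        return r - (k % r);
    }

    // Inserts each key using h(k, i) = (k mod m + i * h2(k)) mod m; empty cells stay null.
    static Integer[] insertAll(int[] keys, int m, int r) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            boolean placed = false;
            for (int i = 0; i < m && !placed; i++) {
                int pos = (k % m + i * h2(k, r)) % m;
                if (table[pos] == null) {
                    table[pos] = k;
                    placed = true;
                }
            }
            if (!placed) throw new IllegalStateException("no empty slot found for " + k);
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(new int[]{89, 18, 49, 58, 69}, 10, 7);
        System.out.println(Arrays.toString(table));
        // [69, null, null, 58, null, null, 49, null, 18, 89]
    }
}
```

The step sizes here are h2(49) = 7, h2(58) = 5 and h2(69) = 1, so each colliding key follows its own probe sequence. With a non-prime m such as 10, a step that shares a factor with m (like 5) reaches only some of the cells, which is one reason a prime table size is preferred.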

CMSC 341 Hashing 28 Rehashing
If the table gets too full, the running time of the basic operations starts to degrade.
For hash tables with separate chaining, “too full” means more than one element per list (on average).
For probing hash tables, “too full” is determined as an arbitrary value of the load factor.
To rehash, make a copy of the hash table, double the table size, and insert all elements (from the copy) of the old table into the new table.
Rehashing is expensive, but occurs very infrequently.
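
A minimal sketch of rehashing for a separate-chaining table of ints follows. The class name RehashDemo, the initial size of 3, and the tableSize( ) accessor are assumptions for illustration; the new size is exactly double the old one, as described above, with no adjustment to a prime.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: RehashDemo is a hypothetical name; the initial size 3 is arbitrary.
final class RehashDemo {
    private List<List<Integer>> buckets = newBuckets(3);
    private int size;

    private static List<List<Integer>> newBuckets(int m) {
        List<List<Integer>> b = new ArrayList<>();
        for (int i = 0; i < m; i++) b.add(new ArrayList<>());
        return b;
    }

    void insert(int k) {
        List<Integer> bucket = buckets.get(Math.floorMod(k, buckets.size()));
        if (bucket.contains(k)) return;
        bucket.add(k);
        size++;
        // "Too full" for chaining: more than one element per list on average.
        if (size > buckets.size()) rehash();
    }

    // Keeps a reference to the old table, doubles the size, and reinserts every element.
    private void rehash() {
        List<List<Integer>> old = buckets;
        buckets = newBuckets(2 * old.size());
        for (List<Integer> bucket : old) {
            for (int k : bucket) {
                buckets.get(Math.floorMod(k, buckets.size())).add(k);
            }
        }
    }

    boolean contains(int k) {
        return buckets.get(Math.floorMod(k, buckets.size())).contains(k);
    }

    int tableSize() {
        return buckets.size();
    }
}
```

Each element must be rehashed because its index depends on the table size; copying lists over unchanged would leave keys in the wrong cells.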

CMSC 341 Hashing 29 Multiplication Method
The hash function: h( k ) = ⌊m( kA - ⌊kA⌋ )⌋, where A is some real positive constant.
A very good choice of A is the inverse of the “golden ratio.”
Given two positive numbers x and y, the ratio x/y is the “golden ratio” if φ = x/y = (x+y)/x.
The golden ratio: x^2 - xy - y^2 = 0 ⇒ φ^2 - φ - 1 = 0 ⇒ φ = (1 + sqrt(5))/2 = 1.618… ≈ Fib_i / Fib_(i-1)

CMSC 341 Hashing 30 Multiplication Method (cont.)
Because of the relationship of the golden ratio to Fibonacci numbers, this particular value of A in the multiplication method is called “Fibonacci hashing.”
Some values of h( k ) = ⌊m( kφ^-1 - ⌊kφ^-1⌋ )⌋, where φ^-1 = 1/1.618… = 0.618…:
= 0 for k = 0
= 0.618m for k = 1
= 0.236m for k = 2
= 0.854m for k = 3
= 0.472m for k = 4
= 0.090m for k = 5
= 0.708m for k = 6
= 0.326m for k = 7
= …
= 0.777m for k = 32
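
The listed values can be reproduced with a short sketch of the multiplication method. FibonacciHashing is an illustrative class name, keys are assumed non-negative, and double-precision arithmetic is assumed to be accurate enough for small keys such as these.

```java
// Illustrative sketch: FibonacciHashing is a hypothetical name.
final class FibonacciHashing {
    // A = 1/φ = (sqrt(5) - 1) / 2 = 0.618...
    static final double A = (Math.sqrt(5.0) - 1.0) / 2.0;

    // h(k) = floor(m * (kA - floor(kA))), for non-negative k.
    static int hash(int k, int m) {
        double product = k * A;
        double fraction = product - Math.floor(product);
        return (int) Math.floor(m * fraction);
    }

    public static void main(String[] args) {
        int m = 1000;
        for (int k = 0; k <= 7; k++) {
            System.out.println("h(" + k + ") = " + hash(k, m));
        }
        System.out.println("h(32) = " + hash(32, m)); // h(32) = 777
    }
}
```

With m = 1000 the results for k = 1 through 7 are 618, 236, 854, 472, 90, 708 and 326, matching the fractions in the list above. Consecutive keys land far apart in the table, which is the appeal of this choice of A.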
