DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.



2 REVIEW
We have investigated the following ADTs:
  LISTS: Array, Linked List
  STACKS
  QUEUES
  TREES: Binary Trees, Binary Search Trees, AVL Trees
What about their running times?

3 Running times of important operations (insertion, deletion, find)
  Array: O(n)
  Linked list: O(1) or O(n), depending on the operation
  Tree: O(log n)
Can we decrease the running times further?

4 ROAD MAP
HASHING
  General Idea
  Hash Function
  Separate Chaining
  Open Addressing
  Rehashing

5 Hashing
Hashing: the implementation of hash tables.
Hash table: an array of elements with a fixed size, TableSize.
  Search is performed on a part of the item: the key.
  Each key is mapped into a number in the range 0 to TableSize-1, which is used as the array index.
  The mapping is done by a hash function, which should
    be simple to compute
    ensure that any two distinct keys get different cells
How can insert, delete and find be performed in O(1) time?

6 An ideal hash table
Each key is mapped to a different index!
  Not always possible: there are many keys but only finitely many indexes.
  Aim for an even distribution of the keys instead.
Considerations:
  Choose a hash function.
  Decide what to do when two keys hash to the same value.
  Decide on the table size.

7 Hash function
If keys are integers, the hash function returns Key mod TableSize.
Ex: TableSize = 10, Keys = 120, 330, 1000. All three keys hash to cell 0.
TableSize should be prime.
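The effect of the table size on integer keys can be checked with a few lines of C++. This is only a sketch: the names hashInt and hashAll and the prime table size 13 are assumptions made for illustration, not part of the slides.

```cpp
#include <vector>

// Hypothetical helper: the slides only give the formula Key mod TableSize.
int hashInt(int key, int tableSize) {
    int h = key % tableSize;       // in C++ the result is negative for negative keys
    if (h < 0) h += tableSize;     // fold it back into 0 .. tableSize-1
    return h;
}

// Hash every key in `keys` and return the resulting cell indexes.
std::vector<int> hashAll(const std::vector<int>& keys, int tableSize) {
    std::vector<int> cells;
    for (int k : keys) cells.push_back(hashInt(k, tableSize));
    return cells;
}
```

Calling hashAll({120, 330, 1000}, 10) sends every key to cell 0, while the same keys with the (assumed) prime size 13 land in cells 3, 5 and 12.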

8 Hash function
If keys are strings: add the ASCII values of the characters.
Problem if TableSize is large and the number of characters is small:
  with 8 characters in a key, the sum is at most 127 * 8 = 1016, so cells above 1016 are never used.

    int hash( const string & key, int tableSize )
    {
        int hashVal = 0;

        // Sum the character codes of the key.
        for( int i = 0; i < key.length( ); i++ )
            hashVal += key[ i ];

        return hashVal % tableSize;
    }

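9 Hash function
If keys are strings: use all characters
  $\sum_{i=0}^{KeySize-1} Key[KeySize-i-1] \cdot 32^{i}$
  Problem: for long keys the computation overflows and the early characters do not count.
Alternatives:
  Use only some of the characters.
  Use only the characters in the odd positions.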

10 Hash function
If keys are strings: use the first three characters
  729*key[2] + 27*key[1] + key[0]
If the keys are not random, some part of the table is not used.

    int hash( const string & key, int tableSize )
    {
        // 27 = 26 letters plus the blank; 729 = 27 * 27
        return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
    }

11 A good hash function

    int hash( const string & key, int tableSize )
    {
        int hashVal = 0;

        // Horner's rule with multiplier 37
        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key[ i ];

        hashVal %= tableSize;
        if( hashVal < 0 )        // overflow may have made the value negative
            hashVal += tableSize;

        return hashVal;
    }
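The routine above relies on int overflow wrapping around, which is undefined behavior for signed integers in C++. A hedged variant is sketched below: it keeps the multiplier 37 but accumulates in an unsigned int, where wrap-around is well defined, so the negative-value check is no longer needed. The name hashString is an assumption for illustration.

```cpp
#include <string>

// Horner's rule with multiplier 37, as on the slide, but with an unsigned
// accumulator so that overflow wraps modulo 2^32 in a well-defined way.
// hashString is a hypothetical name, not the author's.
unsigned int hashString(const std::string& key, unsigned int tableSize) {
    unsigned int hashVal = 0;
    for (unsigned char ch : key)
        hashVal = 37 * hashVal + ch;
    return hashVal % tableSize;   // always in 0 .. tableSize-1
}
```

For example, with tableSize 101 the key "a" hashes to 97 and "ab" hashes to (37 * 97 + 98) mod 101 = 51.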

12 Collision
The main programming detail is collision resolution.
If, when an element is inserted, it hashes to the same value as an already inserted element, there is a collision.
There are several methods to deal with this problem:
  Separate chaining
  Open addressing

13 Separate Chaining Hash Table
Keep a list of all elements that hash to the same value.
TableSize = 10 is not a good choice: it is not prime.

14 Type declaration for separate chaining hash table

    template <class HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( const HashedObj & notFound, int size = 101 );
        HashTable( const HashTable & rhs )
          : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), theLists( rhs.theLists ) { }

        const HashedObj & find( const HashedObj & x ) const;

        void makeEmpty( );
        void insert( const HashedObj & x );
        void remove( const HashedObj & x );

        const HashTable & operator=( const HashTable & rhs );

      private:
        vector<List<HashedObj> > theLists;   // The array of Lists
        const HashedObj ITEM_NOT_FOUND;
    };

    int hash( const string & key, int tableSize );
    int hash( int key, int tableSize );

15
    /* Construct the hash table. */
    template <class HashedObj>
    HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
      : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) )
    { }

    /* Make the hash table logically empty. */
    template <class HashedObj>
    void HashTable<HashedObj>::makeEmpty( )
    {
        for( int i = 0; i < theLists.size( ); i++ )
            theLists[ i ].makeEmpty( );
    }

    /* Deep copy. */
    template <class HashedObj>
    const HashTable<HashedObj> &
    HashTable<HashedObj>::operator=( const HashTable<HashedObj> & rhs )
    {
        if( this != &rhs )
            theLists = rhs.theLists;
        return *this;
    }

16
    /* Remove item x from the hash table. */
    template <class HashedObj>
    void HashTable<HashedObj>::remove( const HashedObj & x )
    {
        theLists[ hash( x, theLists.size( ) ) ].remove( x );
    }

    /* Find item x in the hash table. */
    template <class HashedObj>
    const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
    {
        ListItr<HashedObj> itr;
        itr = theLists[ hash( x, theLists.size( ) ) ].find( x );
        if( itr.isPastEnd( ) )
            return ITEM_NOT_FOUND;
        else
            return itr.retrieve( );
    }

17
    /* Insert item x into the hash table. Duplicates are not inserted. */
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        List<HashedObj> & whichList = theLists[ hash( x, theLists.size( ) ) ];
        ListItr<HashedObj> itr = whichList.find( x );

        if( itr.isPastEnd( ) )
            whichList.insert( x, whichList.zeroth( ) );   // insert at the front of the list
    }
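The List and ListItr classes used above come from the course library. As a rough, self-contained sketch of the same idea, the table below uses std::vector and std::list from the standard library instead. It is restricted to int keys, and ChainedTable, contains and chainLength are hypothetical names chosen for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Simplified separate chaining table for int keys (not the slides' template class).
class ChainedTable {
public:
    explicit ChainedTable(int size = 101) : lists(size) {}

    // True if x is stored in the chain of its cell.
    bool contains(int x) const {
        const std::list<int>& chain = lists[cell(x)];
        return std::find(chain.begin(), chain.end(), x) != chain.end();
    }

    // Insert x at the front of its chain; duplicates are ignored.
    void insert(int x) {
        if (!contains(x)) lists[cell(x)].push_front(x);
    }

    // Remove x from its chain if present.
    void remove(int x) { lists[cell(x)].remove(x); }

    // Number of keys currently stored in one cell.
    std::size_t chainLength(int index) const { return lists[index].size(); }

private:
    std::vector<std::list<int>> lists;   // one chain per cell

    // Key mod TableSize, folded into the non-negative range.
    int cell(int x) const {
        int n = static_cast<int>(lists.size());
        int h = x % n;
        return h < 0 ? h + n : h;
    }
};
```

With a table of size 7, inserting 13, 6 and 20 puts all three keys into the chain of cell 6, which illustrates that a collision only lengthens one list.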

18 Analysis
Let λ be the load factor of a hash table: λ = number of elements / TableSize
  λ is the average length of a list.
Successful find: about 1 + λ/2 comparisons (the matching node plus, on average, half of the other nodes in the list) + time to evaluate the hash function
Unsuccessful find and insert: λ comparisons + time to evaluate the hash function
Good choice: λ ≈ 1
The disadvantage of separate chaining is the time spent allocating and deallocating memory for list nodes!

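19 Open Addressing
On a collision, try alternate cells h_0(x), h_1(x), h_2(x), … until an empty cell is found.
  h_i(x) = (hash(x) + F(i)) mod TableSize, with F(0) = 0
  λ must be < 1 because all elements are stored in the table itself; a good choice is λ < 0.5.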

20 Linear Probing
F is a linear function of i: F(i) = i
Insert keys {89, 18, 49, 58, 69}:
  When 49 is inserted, a collision occurs; it is put into the next available spot, cell 0.
  58 collides with 18, 89 and 49 before an empty cell is found.
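The insertion sequence can be replayed with a short sketch. It assumes TableSize = 10 and hash(x) = x mod 10, which is what the collisions described on the slide imply; insertLinear and the EMPTY_CELL marker are hypothetical names, and keys are assumed to be non-negative.

```cpp
#include <vector>

const int EMPTY_CELL = -1;   // hypothetical marker for an unused cell

// Insert key using linear probing, F(i) = i.
// Returns the index where the key was stored, or -1 if the table is full.
int insertLinear(std::vector<int>& table, int key) {
    const int size = static_cast<int>(table.size());
    for (int i = 0; i < size; ++i) {
        int pos = (key % size + i) % size;   // h_i(x) = (hash(x) + i) mod TableSize
        if (table[pos] == EMPTY_CELL) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Starting from std::vector<int> table(10, EMPTY_CELL), the keys 89, 18, 49, 58, 69 end up in cells 9, 8, 0, 1 and 2 respectively. The run of occupied cells 8, 9, 0, 1, 2 is a small instance of primary clustering.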

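21 Linear Probing
Problem: it is not easy to delete an element.
  The element may have caused a collision before, so removing it would cut the probe sequence of other elements.
  Solution: mark the element as deleted instead of removing it.
Problem: primary clustering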

22 Linear Probing Analysis Problem: Primary Clustering

23 Quadratic Probing
F(i) is a quadratic function. Ex: F(i) = i^2

24 Quadratic Probing
When 49 collides with 89, the next position attempted is one cell away.
58 collides at position 8. The cell one away is tried, and another collision occurs. It is inserted into the cell 2^2 = 4 away.
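The same kind of sketch works for quadratic probing. It again assumes TableSize = 10, hash(x) = x mod 10 and non-negative keys; insertQuadratic and EMPTY_CELL are hypothetical names. The loop gives up after TableSize probes, since quadratic probing is not guaranteed to reach every cell.

```cpp
#include <vector>

const int EMPTY_CELL = -1;   // hypothetical marker for an unused cell

// Insert key using quadratic probing, F(i) = i^2.
// Returns the index where the key was stored, or -1 if no empty cell was
// found within table.size() probes.
int insertQuadratic(std::vector<int>& table, int key) {
    const int size = static_cast<int>(table.size());
    for (int i = 0; i < size; ++i) {
        int pos = (key % size + i * i) % size;   // h_i(x) = (hash(x) + i^2) mod TableSize
        if (table[pos] == EMPTY_CELL) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

For the keys 89, 18, 49, 58, 69 this places 49 in cell 0 (one away from 9) and 58 in cell 2 (four away from 8), as described above. Under the same assumptions 69 lands in cell 3.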

25 Quadratic Probing
Solves the primary clustering problem.
Not all empty cells may be reachable:
  The probe sequence may loop around full cells.
  The hash table is not full, but no empty cell is found.
Theorem: if the table size is prime and λ < 0.5, a new element can always be inserted.
Problem: secondary clustering!

26 Type declaration for open addressing hash table

    template <class HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( const HashedObj & notFound, int size = 101 );
        HashTable( const HashTable & rhs )
          : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ),
            array( rhs.array ), currentSize( rhs.currentSize ) { }

        const HashedObj & find( const HashedObj & x ) const;

        void makeEmpty( );
        void insert( const HashedObj & x );
        void remove( const HashedObj & x );

        const HashTable & operator=( const HashTable & rhs );

        enum EntryType { ACTIVE, EMPTY, DELETED };

27 Type declaration for open addressing hash table (continued)

      private:
        struct HashEntry
        {
            HashedObj element;
            EntryType info;

            HashEntry( const HashedObj & e = HashedObj( ), EntryType i = EMPTY )
              : element( e ), info( i ) { }
        };

        vector<HashEntry> array;
        int currentSize;
        const HashedObj ITEM_NOT_FOUND;

        bool isActive( int currentPos ) const;
        int findPos( const HashedObj & x ) const;
        void rehash( );
    };

28
    /* Construct the hash table. */
    template <class HashedObj>
    HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
      : ITEM_NOT_FOUND( notFound ), array( nextPrime( size ) )
    {
        makeEmpty( );
    }

    /* Make the hash table logically empty. */
    template <class HashedObj>
    void HashTable<HashedObj>::makeEmpty( )
    {
        currentSize = 0;
        for( int i = 0; i < array.size( ); i++ )
            array[ i ].info = EMPTY;
    }

29
    /* Find item x in the hash table. */
    template <class HashedObj>
    const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return array[ currentPos ].element;
        else
            return ITEM_NOT_FOUND;
    }

    /* Method that performs quadratic probing resolution. */
    template <class HashedObj>
    int HashTable<HashedObj>::findPos( const HashedObj & x ) const
    {
        int collisionNum = 0;
        int currentPos = hash( x, array.size( ) );

        while( array[ currentPos ].info != EMPTY &&
               array[ currentPos ].element != x )
        {
            currentPos += 2 * ++collisionNum - 1;   // i^2 - (i-1)^2 = 2i - 1
            if( currentPos >= array.size( ) )
                currentPos -= array.size( );
        }

        return currentPos;
    }
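findPos never computes i^2 directly. It adds 2i - 1 on the i-th collision, which works because i^2 - (i-1)^2 = 2i - 1. The sketch below compares this incremental form with the direct formula. The function names are hypothetical, and the single subtraction is only sufficient while 2i - 1 is smaller than the table size, which holds for the probe counts that occur when the table is kept at most half full.

```cpp
// Position of the i-th quadratic probe, computed incrementally as in findPos.
// Assumes 0 <= home < tableSize and 2*i - 1 < tableSize.
int probeIncremental(int home, int i, int tableSize) {
    int pos = home;
    int collisionNum = 0;
    for (int k = 0; k < i; ++k) {
        pos += 2 * ++collisionNum - 1;     // add 1, 3, 5, ... so the total offset is i^2
        if (pos >= tableSize) pos -= tableSize;
    }
    return pos;
}

// Position of the i-th quadratic probe, computed directly from F(i) = i^2.
int probeDirect(int home, int i, int tableSize) {
    return (home + i * i) % tableSize;
}
```

For a table of size 11 and home cell 8, both functions give the probe sequence 8, 9, 1, 6, 2, 0 for i = 0 .. 5.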

30
    /* Return true if currentPos exists and is active. */
    template <class HashedObj>
    bool HashTable<HashedObj>::isActive( int currentPos ) const
    {
        return array[ currentPos ].info == ACTIVE;
    }

    /* Remove item x from the hash table (lazy deletion). */
    template <class HashedObj>
    void HashTable<HashedObj>::remove( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            array[ currentPos ].info = DELETED;
    }

    /* Insert routine with quadratic probing. */
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;                       // x is already in the table
        array[ currentPos ] = HashEntry( x, ACTIVE );
    }

31
    /* Deep copy. */
    template <class HashedObj>
    const HashTable<HashedObj> &
    HashTable<HashedObj>::operator=( const HashTable<HashedObj> & rhs )
    {
        if( this != &rhs )
        {
            array = rhs.array;
            currentSize = rhs.currentSize;
        }
        return *this;
    }

32 Double Hashing
Use a second hash function: F(i) = i * hash2(x)
Poor example: hash1(x) = X mod 10, hash2(x) = X mod 9, TableSize = 10
  If X = 99, what happens? hash2(99) = 0, so F(i) = 0 for every i and the probe never leaves cell 9.
Requirement: hash2(x) ≠ 0 for any X

33 Double Hashing
Good choice: hash2(x) = R - (X mod R), where R is a prime and R < TableSize

34 Double Hashing
hash2(x) = 7 - (X mod 7)
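A minimal sketch of double hashing with this second hash function follows. It assumes TableSize = 10, hash1(x) = x mod 10, non-negative keys, and reuses the keys 89, 18, 49, 58, 69 from the linear probing slide; insertDouble and EMPTY_CELL are hypothetical names.

```cpp
#include <vector>

const int EMPTY_CELL = -1;   // hypothetical marker for an unused cell

// Second hash function from the slide: R - (X mod R). Never returns 0.
int hash2(int key, int r) {
    return r - key % r;
}

// Insert key using double hashing, F(i) = i * hash2(x).
// Returns the index where the key was stored, or -1 if no empty cell was
// found within table.size() probes.
int insertDouble(std::vector<int>& table, int key, int r) {
    const int size = static_cast<int>(table.size());
    const int step = hash2(key, r);
    for (int i = 0; i < size; ++i) {
        int pos = (key % size + i * step) % size;
        if (table[pos] == EMPTY_CELL) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

With R = 7, key 49 has step 7 and moves from cell 9 to cell 6, key 58 has step 5 and moves from cell 8 to cell 3, and key 69 has step 1 and moves from cell 9 to cell 0. Because 10 is not prime, some step sizes cannot reach every cell, which is one more reason to prefer a prime TableSize.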

35 Analysis
Random collision resolution
  Probes are independent.
  No clustering problem.
Unsuccessful search and insert
  Number of probes until an empty cell is found
  (1 - λ) = fraction of cells that are empty
  1 / (1 - λ) = expected number of probes
Successful search
  P(X) = number of probes needed when the element X was inserted
  Average cost = (1/N) ∑ P(X), approximately (1/λ) ln(1 / (1 - λ))
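The estimate for a successful search can be motivated as follows, under the random-probing assumption used on this slide. An element inserted when the load factor was x needed about 1/(1 - x) probes, and the load factor grew from 0 to λ while the N elements were inserted, so the average over all elements is approximately an average of 1/(1 - x) over that range:

```latex
\frac{1}{N}\sum_{X} P(X)
  \;\approx\; \frac{1}{\lambda}\int_{0}^{\lambda}\frac{1}{1-x}\,dx
  \;=\; \frac{1}{\lambda}\,\ln\frac{1}{1-\lambda}
```

For λ = 0.5 this gives 2 ln 2 ≈ 1.39 probes for a successful search, compared with 1 / (1 - 0.5) = 2 probes for an unsuccessful one.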

36 Rehashing
If λ gets large, the number of probes increases.
  Operations start taking too long and insertions might fail.
Solution: rehash into a table with a larger TableSize (usually about twice as large).
When to rehash:
  if λ > 0.5
  if an insertion fails

37 Rehashing Example
Elements 13, 15, 24 and 6 are inserted into an open addressing hash table of size 7.
H(X) = X mod 7
Linear probing is used to resolve collisions.

38 Rehashing Example
If 23 is inserted, the table is over 70 percent full.
A new table is created. 17 is the first prime that is more than twice as large as the old table size, so H_new(X) = X mod 17.
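The example can be replayed with a small sketch. It uses linear probing on int keys, takes the new size 17 as a parameter instead of computing it with nextPrime (which the slides use but do not define), and assumes non-negative keys. insertLinear, rehashInto and EMPTY_CELL are hypothetical names.

```cpp
#include <vector>

const int EMPTY_CELL = -1;   // hypothetical marker for an unused cell

// Insert key with linear probing; does nothing if the table is full.
void insertLinear(std::vector<int>& table, int key) {
    const int size = static_cast<int>(table.size());
    for (int i = 0; i < size; ++i) {
        int pos = (key % size + i) % size;
        if (table[pos] == EMPTY_CELL) {
            table[pos] = key;
            return;
        }
    }
}

// Build a new table of newSize cells and reinsert every stored key,
// scanning the old table from cell 0 upwards.
std::vector<int> rehashInto(const std::vector<int>& oldTable, int newSize) {
    std::vector<int> newTable(newSize, EMPTY_CELL);
    for (int key : oldTable)
        if (key != EMPTY_CELL)
            insertLinear(newTable, key);
    return newTable;
}
```

Inserting 13, 15, 24, 6 and 23 into a table of size 7 fills cells 6, 1, 3, 0 and 2. After rehashInto(table, 17), the keys sit in cells 6 (key 6), 7 (key 23), 8 (key 24), 13 (key 13) and 15 (key 15).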

39 Rehashing
Rehashing is an expensive operation: its running time is O(N).
Rehashing frees the programmer from worrying about the table size.
Amortized analysis: averaged over N operations, each operation takes O(1) time.

40
    /* Insert routine with quadratic probing. Rehashes when the table is more than half full. */
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;                       // x is already in the table
        array[ currentPos ] = HashEntry( x, ACTIVE );

        if( ++currentSize > array.size( ) / 2 )
            rehash( );
    }

    /* Expand the hash table. */
    template <class HashedObj>
    void HashTable<HashedObj>::rehash( )
    {
        vector<HashEntry> oldArray = array;

        // Create a new, empty table about twice as large.
        array.resize( nextPrime( 2 * oldArray.size( ) ) );
        for( int j = 0; j < array.size( ); j++ )
            array[ j ].info = EMPTY;

        // Copy the active elements over.
        currentSize = 0;
        for( int i = 0; i < oldArray.size( ); i++ )
            if( oldArray[ i ].info == ACTIVE )
                insert( oldArray[ i ].element );
    }