Hashing
Vishnu Kotrajaras, PhD



What do we want to do?
– Insert, delete, and find in constant time.
– No sorting.
– No findMin or findMax.

Hash table
We have a key and a value. The key is the argument of our hash function, and the result of the hash function is the index at which we store the value. Therefore a hash function should:
– Be easy to calculate.
– Give different indices for different keys. This is difficult to achieve; when it fails, two keys share an index (a collision, discussed later).

Hash function
We use it to try to distribute values evenly throughout our table. We may use:
– key % tableSize for numeric keys. But if tableSize is 10, 20, 30, … and many keys end in 0, the keys pile up in a few slots, so this function is a poor choice. A prime tableSize avoids the problem.
– What if keys are Strings? Let's see some examples.
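The effect of the table size on key % tableSize can be checked with a small experiment. This is a minimal sketch; the class name, the method name slotCounts, and the sample keys are made up for illustration.

```java
import java.util.Arrays;

class ModuloHashDemo {
    // Count how many keys land in each slot when the index is key % tableSize.
    static int[] slotCounts(int[] keys, int tableSize) {
        int[] counts = new int[tableSize];
        for (int key : keys)
            counts[key % tableSize]++;
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60};   // hypothetical keys that all end in 0
        // tableSize 10: every key lands in slot 0
        System.out.println(Arrays.toString(slotCounts(keys, 10)));
        // tableSize 11 (prime): every key lands in its own slot
        System.out.println(Arrays.toString(slotCounts(keys, 11)));
    }
}
```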

Hash function (1st example)
Sum the ASCII values of all characters.
public static int hash(String key, int tableSize) {
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++)
        hashVal += key.charAt(i);   // add the character code
    return hashVal % tableSize;
}

The method on the previous slide is not good if the table is large:
– What if each key is short (e.g. 8 characters)?
– An ASCII character has a maximum value of 127. Therefore the sum of all 8 characters will not exceed 127*8 = 1,016.
– If the table is big, data will not be distributed evenly. In a table with 10,000 slots, no index will exceed 1,016, so the indices concentrate at the front of the table.
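The bound can be confirmed by running the additive hash on 8-character keys. This is a sketch only; the class name and the sample keys are assumptions for illustration, and the table size 10007 is just an example of a large table.

```java
import java.util.Arrays;

class SumHashDemo {
    // The additive hash from the first example.
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal += key.charAt(i);
        return hashVal % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 10007;
        char[] maxChars = new char[8];
        Arrays.fill(maxChars, (char) 127);                 // 127 is the largest ASCII code
        String worstCase = new String(maxChars);
        System.out.println(hash("zzzzzzzz", tableSize));   // 8 * 122 = 976
        System.out.println(hash(worstCase, tableSize));    // 8 * 127 = 1016
    }
}
```

Even the largest possible 8-character key stays within the first 1,017 slots of the table.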

Hash function (2nd example)
Assume we have a big table, and each key is made from at least 3 random letters. We look at the first 3 characters only.
public static int hash(String key, int tableSize) {
    // 27 = number of letters, including space; 729 = 27 * 27
    return (key.charAt(0) + 27 * key.charAt(1) + 729 * key.charAt(2)) % tableSize;
}
This distributes well in a table of size 10,007 (10,007 is the first prime after 10,000; we will use this number, and you will see why).

Wait, actual keys are never random like this:
– Many keys share the same first 3 characters, so there will be a lot of repetition.

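Hash function (3rd example)
We evaluate a polynomial in 37, using Horner's rule. We can calculate $k_0 + 37 k_1 + 37^2 k_2$ as $[(k_2 \cdot 37) + k_1] \cdot 37 + k_0$. Horner's rule repeats this step for all n characters. In fact, it is a calculation of $\sum_{i=0}^{n-1} k_i \cdot 37^i$, where n is the key length and $k_i$ is the character at position n-1-i of the key, so the first character receives the highest power.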

public static int hash(String key, int tableSize) {
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++)
        hashVal = 37 * hashVal + key.charAt(i);   // possible overflow
    hashVal %= tableSize;
    if (hashVal < 0)                              // overflow can leave a negative remainder
        hashVal += tableSize;
    return hashVal;
}
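The overflow handling can also be written with Math.floorMod, which returns a non-negative result for a positive divisor. This is an alternative sketch, not the version used in the slides; the class name is hypothetical.

```java
class HornerHashDemo {
    // Horner's rule hash; the sign correction is done by Math.floorMod.
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal = 37 * hashVal + key.charAt(i);   // int arithmetic wraps around on overflow
        return Math.floorMod(hashVal, tableSize);     // always in 0 .. tableSize-1
    }

    public static void main(String[] args) {
        // (97 * 37 + 98) * 37 + 99 = 136518, and 136518 % 10007 = 6427
        System.out.println(hash("abc", 10007));
        // A long key overflows int many times, but the index stays in range.
        System.out.println(hash("a much longer key that overflows the int range", 10007));
    }
}
```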

This may not distribute very well, but it is easy to calculate. If a key is long, however, the calculation takes some time.
– We solve this by not using every character.
– We may choose characters from the important parts of the key.
In any case, a hash function cannot guarantee that every item gets a unique index. When 2 or more values fall in the same slot, we say there is a collision. How do we fix a collision?

Fixing collisions: separate chaining
Store the elements that hash to the same index in a linked list. To search for an element, apply the hash function, then search the list at that index. To insert an element:
– Use the hash function to find the list to put the element in.
– Check the list to see whether it already contains the element. If it does not, insert the element at the front.
– The front is used because, statistically, a newly inserted element is often accessed again soon after the insertion.

Code for an object that has a hash function.
public interface Hashable
{
    /**
     * Compute a hash function for this object.
     * @param tableSize the hash table size.
     * @return (deterministically) a number between
     *         0 and tableSize-1, distributed equitably.
     */
    int hash( int tableSize );
}

How we use a Hashable object.
public class Student implements Hashable {
    private String name;
    private double number;
    private int year;

    public int hash(int tableSize) {
        // static method from our HashTable class
        return SeparateChainingHashTable.hash(name, tableSize);
    }

    public boolean equals(Object rhs) {
        // two students are equal when their names match
        return rhs instanceof Student && name.equals(((Student) rhs).name);
    }
}

public class SeparateChainingHashTable
{
    /**
     * Construct the hash table.
     */
    public SeparateChainingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    /**
     * Construct the hash table.
     * @param size approximate table size.
     */
    public SeparateChainingHashTable( int size )
    {
        // LinkedList and LinkedListItr are the list classes that accompany this code
        // (they provide find, zeroth, insert); they are not java.util.LinkedList.
        theLists = new LinkedList[ nextPrime( size ) ];
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ] = new LinkedList( );
    }

    /**
     * Insert into the hash table. If the item is
     * already present, then do nothing.
     * @param x the item to insert (for example, a Student).
     */
    public void insert( Hashable x )
    {
        LinkedList whichList = theLists[ x.hash( theLists.length ) ];
        LinkedListItr itr = whichList.find( x );
        if( itr.isPastEnd( ) )
            whichList.insert( x, whichList.zeroth( ) );   // insert at the front
    }

    /**
     * Remove from the hash table.
     * @param x the item to remove.
     */
    public void remove( Hashable x )
    {
        theLists[ x.hash( theLists.length ) ].remove( x );
    }

    /**
     * Find an item in the hash table.
     * @param x the item to search for.
     * @return the matching item, or null if not found.
     */
    public Hashable find( Hashable x )
    {
        return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
    }

    /**
     * Make the hash table logically empty.
     */
    public void makeEmpty( )
    {
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ].makeEmpty( );
    }

    /**
     * A hash routine for String objects.
     * @param key the String to hash.
     * @param tableSize the size of the hash table.
     * @return the hash value.
     */
    public static int hash( String key, int tableSize )
    {
        int hashVal = 0;
        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key.charAt( i );
        hashVal %= tableSize;
        if( hashVal < 0 )
            hashVal += tableSize;
        return hashVal;
    }

    private static final int DEFAULT_TABLE_SIZE = 101;

    /** The array of Lists. */
    private LinkedList [ ] theLists;

    /**
     * Internal method to find a prime number at least as large as n.
     * @param n the starting number (must be positive).
     * @return a prime number larger than or equal to n.
     */
    private static int nextPrime( int n )
    {
        if( n % 2 == 0 )
            n++;
        for( ; !isPrime( n ); n += 2 )
            ;
        return n;
    }

    /**
     * Internal method to test if a number is prime.
     * Not an efficient algorithm.
     * @param n the number to test.
     * @return the result of the test.
     */
    private static boolean isPrime( int n )
    {
        if( n == 2 || n == 3 )
            return true;
        if( n == 1 || n % 2 == 0 )
            return false;
        for( int i = 3; i * i <= n; i += 2 )
            if( n % i == 0 )
                return false;
        return true;
    }

    // Simple main
    public static void main( String [ ] args )
    {
        SeparateChainingHashTable H = new SeparateChainingHashTable( );
        final int NUMS = 4000;
        final int GAP = 37;
        System.out.println( "Checking... (no more output means success)" );
        for( int i = GAP; i != 0; i = ( i + GAP ) % NUMS )
            H.insert( new MyInteger( i ) );
        for( int i = 1; i < NUMS; i += 2 )
            H.remove( new MyInteger( i ) );
        for( int i = 2; i < NUMS; i += 2 )
            if( ((MyInteger)(H.find( new MyInteger( i ) ))).intValue( ) != i )
                System.out.println( "Find fails " + i );
        for( int i = 1; i < NUMS; i += 2 )
        {
            if( H.find( new MyInteger( i ) ) != null )
                System.out.println( "OOPS!!! " + i );
        }
    }
}
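The class above depends on list classes that are not shown in these slides. For readers who want something that compiles on its own, here is a much smaller sketch of the same idea built on java.util.LinkedList. It stores String keys only, and the class name ChainingSketch and its methods are assumptions made for this illustration, not part of the original code.

```java
import java.util.LinkedList;

class ChainingSketch {
    private final LinkedList<String>[] lists;

    @SuppressWarnings("unchecked")
    ChainingSketch(int tableSize) {
        lists = new LinkedList[tableSize];
        for (int i = 0; i < lists.length; i++)
            lists[i] = new LinkedList<>();
    }

    // Horner's rule hash with multiplier 37, as in the slides.
    private int hash(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++)
            h = 37 * h + key.charAt(i);
        return Math.floorMod(h, lists.length);
    }

    // Insert at the front of the chain unless the key is already present.
    void insert(String key) {
        LinkedList<String> list = lists[hash(key)];
        if (!list.contains(key))
            list.addFirst(key);
    }

    void remove(String key) {
        lists[hash(key)].remove(key);
    }

    boolean contains(String key) {
        return lists[hash(key)].contains(key);
    }

    public static void main(String[] args) {
        ChainingSketch table = new ChainingSketch(101);
        table.insert("ann");
        table.insert("bob");
        table.insert("ann");                        // duplicate, ignored
        System.out.println(table.contains("ann"));  // true
        table.remove("ann");
        System.out.println(table.contains("ann"));  // false
    }
}
```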

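Definition: load factor
The load factor $\lambda$ is the average length of a linked list: $\lambda = N/M$ for N elements stored in M lists.
Search time = time to do hashing + time to search the list = constant + time to search the list.
Unsuccessful search: the whole list is examined, so search time = average list length = load factor.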

Successful search
– The list we search contains one node holding the object we want, plus 0 or more other nodes.
– With N members distributed into M lists, N-1 nodes do not hold what we want. If these nodes are spread evenly among the lists, each list has $(N-1)/M = \lambda - 1/M \approx \lambda$ of them, because M is large.
On average, half of these other nodes are searched before we find what we want, that is, $\lambda/2$ steps. Therefore the average time to find the required element is $1 + \lambda/2$ steps. The tableSize itself is not important; what really matters is the load factor.
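The two cost estimates are simple enough to compute directly. The sketch below only restates the formulas; the class and method names are hypothetical.

```java
class LoadFactorDemo {
    // lambda = N / M for N elements in M lists.
    static double loadFactor(int n, int m) {
        return (double) n / m;
    }

    // An unsuccessful search scans a whole list: about lambda nodes.
    static double unsuccessfulCost(int n, int m) {
        return loadFactor(n, m);
    }

    // A successful search finds its node after about half of the other nodes: 1 + lambda / 2.
    static double successfulCost(int n, int m) {
        return 1 + loadFactor(n, m) / 2;
    }

    public static void main(String[] args) {
        // 2000 elements in 1000 lists: lambda = 2.0
        System.out.println(unsuccessfulCost(2000, 1000));  // 2.0
        System.out.println(successfulCost(2000, 1000));    // 2.0
        // Ten times the elements in ten times the lists gives the same costs.
        System.out.println(successfulCost(20000, 10000));  // 2.0
    }
}
```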

Fixing collisions by using open addressing
No lists. If there is a collision, keep calculating a new index until an empty slot is found.
– The candidate indices are $h_0(x), h_1(x), \dots$
– $h_i(x) = [\text{hash}(x) + f(i)] \bmod \text{tableSize}$, with $f(0) = 0$
All data must be put into the table itself. Therefore the table must be large enough to spread the data out.
– Load factor <= 0.5

Open addressing: linear probing
f is a linear function of i. Normally we have f(i) = i. It is “looking ahead one slot at a time.”
– This may take time. Consecutive filled slots build up; this is called primary clustering. Once a cluster forms, any new collision inside it takes some time before an empty slot is found.
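A short simulation makes the cluster visible. This is a sketch under simple assumptions: keys are non-negative ints, hash(x) = x % tableSize, and -1 marks an empty slot. The class name and the sample keys are chosen only for illustration.

```java
import java.util.Arrays;

class LinearProbingDemo {
    static final int EMPTY = -1;

    // Insert key with h_i(x) = (hash(x) + i) % tableSize and return the slot used.
    static int insert(int[] table, int key) {
        int home = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i) % table.length;
            if (table[pos] == EMPTY) {
                table[pos] = key;
                return pos;
            }
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        int[] table = new int[10];
        Arrays.fill(table, EMPTY);
        for (int key : new int[] {89, 18, 49, 58, 69})
            System.out.println(key + " -> slot " + insert(table, key));
        // 89 -> 9, 18 -> 8, 49 -> 0, 58 -> 1, 69 -> 2:
        // slots 8, 9, 0, 1, 2 now form one cluster that later keys must walk through.
    }
}
```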

Open addressing: quadratic probing
There is no primary clustering with this method. We usually have $f(i) = i^2$, so $h_i(x) = [\text{hash}(x) + i^2] \bmod \text{tableSize}$.
Suppose a is already in the table. If b collides with a, we add $1^2$ to find a new empty slot. If c also collides with a, adding $1^2$ finds b, so we need to go further by adding $2^2$ instead.

However, if our table is more than half full or the tableSize is not prime, this method does not guarantee finding an empty slot. But if the table is not yet half full and the tableSize is prime, it is proven that we can always find an empty slot for a new value.
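The a, b, c scenario can be replayed with three keys that share a home slot. This is a minimal sketch assuming hash(x) = x % tableSize, non-negative int keys, and -1 for an empty slot; the class name, the prime table size 7, and the keys 10, 17, 24 are all chosen only for the example.

```java
import java.util.Arrays;

class QuadraticProbingDemo {
    static final int EMPTY = -1;

    // Insert key with h_i(x) = (hash(x) + i*i) % tableSize and return the slot used.
    static int insert(int[] table, int key) {
        int home = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i * i) % table.length;
            if (table[pos] == EMPTY) {
                table[pos] = key;
                return pos;
            }
        }
        throw new IllegalStateException("no empty slot found on the probe sequence");
    }

    public static void main(String[] args) {
        int[] table = new int[7];
        Arrays.fill(table, EMPTY);
        System.out.println(insert(table, 10));  // home slot 3 is free: 3
        System.out.println(insert(table, 17));  // 3 is taken, 3 + 1 = 4
        System.out.println(insert(table, 24));  // 3 and 4 are taken, (3 + 4) % 7 = 0
    }
}
```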

Proof
Let the tableSize be a prime number greater than 3. Let $(h(x) + i^2) \bmod \text{tableSize}$ and $(h(x) + j^2) \bmod \text{tableSize}$ be 2 probe positions, with $0 \le i, j \le \lfloor \text{tableSize}/2 \rfloor$. Prove by contradiction:
– Assume both positions are the same and $i \ne j$.

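If the two positions are the same, then $h(x) + i^2 \equiv h(x) + j^2 \pmod{\text{tableSize}}$, so $i^2 - j^2 \equiv 0$, that is, $(i - j)(i + j) \equiv 0 \pmod{\text{tableSize}}$. Because the tableSize is prime, it must divide $i - j$ or $i + j$. $i - j = 0$ is impossible because we assumed they are not equal. $i + j \equiv 0$ is also impossible, because $0 < i + j < \text{tableSize}$ when i and j are different and each is at most $\lfloor \text{tableSize}/2 \rfloor$. Therefore our assumption that the two positions are the same is wrong. Thus the two positions are always different. So there is always a slot for a new value, if the table is not yet half full and the tableSize is prime.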

Why prime?
If the tableSize is not prime, the number of slots that probing can reach is greatly reduced. Example: tableSize == 16, and assume the normal hash gives index == 0. Quadratic probing then visits indices 0, 1, 4, 9, 0, 9, 4, 1, … You can see that the probes keep falling in the same 4 positions.
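The difference between a prime and a non-prime table size can be counted directly. The sketch below collects every slot that quadratic probing can reach from a given home index; the class and method names are hypothetical.

```java
import java.util.TreeSet;

class ProbeCoverageDemo {
    // All slots visited by (home + i*i) % tableSize for i = 0 .. tableSize-1.
    static TreeSet<Integer> probedSlots(int home, int tableSize) {
        TreeSet<Integer> slots = new TreeSet<>();
        for (int i = 0; i < tableSize; i++)
            slots.add((home + i * i) % tableSize);
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(probedSlots(0, 16));         // [0, 1, 4, 9]
        System.out.println(probedSlots(0, 17).size());  // 9, more than half of the 17 slots
    }
}
```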

We cannot use ordinary deletion. If we empty a slot and later try to find another value, the search may hit that empty slot and conclude the value is absent (in fact the value is in the table, but reaching it requires probing past the slot where it collided). Use lazy deletion: mark the slot as deleted without actually removing its element.

Open addressing implementation
class HashEntry {
    Hashable element;   // the element
    boolean isActive;   // false means the entry is deleted

    public HashEntry( Hashable e ) {
        this( e, true );
    }

    public HashEntry( Hashable e, boolean i ) {
        element = e;
        isActive = i;
    }
}

public class QuadraticProbingHashTable {
    private static final int DEFAULT_TABLE_SIZE = 11;

    /** The array of elements. Each cell is null, active, or non-active (lazily deleted). */
    private HashEntry [ ] array;
    private int currentSize;   // The number of occupied cells

    /**
     * Construct the hash table.
     */
    public QuadraticProbingHashTable( ) {
        this( DEFAULT_TABLE_SIZE );
    }

    /**
     * Construct the hash table.
     * @param size the approximate initial size.
     */
    public QuadraticProbingHashTable( int size ) {
        allocateArray( size );
        makeEmpty( );
    }

    /**
     * Internal method to allocate array.
     * @param arraySize the size of the array.
     */
    private void allocateArray( int arraySize ) {
        array = new HashEntry[ arraySize ];
    }

    /**
     * Make the hash table logically empty.
     */
    public void makeEmpty( ) {
        currentSize = 0;
        for( int i = 0; i < array.length; i++ )
            array[ i ] = null;
    }

    /**
     * Return true if currentPos exists and is active.
     * @param currentPos the result of a call to findPos.
     * @return true if currentPos is active.
     */
    private boolean isActive( int currentPos ) {
        return array[ currentPos ] != null && array[ currentPos ].isActive;
    }

    /**
     * Method that performs quadratic probing resolution.
     * @param x the item to search for.
     * @return the position where the search terminates.
     */
    private int findPos( Hashable x ) {
        int collisionNum = 0;
        int currentPos = x.hash( array.length );
        while( array[ currentPos ] != null &&
               !array[ currentPos ].element.equals( x ) ) {
            // f(i) = i^2 = f(i-1) + 2i - 1, so adding 2i - 1 gives the ith probe
            currentPos += 2 * ++collisionNum - 1;
            if( currentPos >= array.length )   // Implement the mod
                currentPos -= array.length;
        }
        return currentPos;
    }

    /**
     * Find an item in the hash table.
     * @param x the item to search for.
     * @return the matching item, or null if not found.
     */
    public Hashable find( Hashable x ) {
        int currentPos = findPos( x );
        return isActive( currentPos ) ? array[ currentPos ].element : null;
    }

    /**
     * Insert into the hash table. If the item is
     * already present, do nothing.
     * @param x the item to insert.
     */
    public void insert( Hashable x )
    {
        // Insert x as active
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;   // x is already inside, so do nothing
        array[ currentPos ] = new HashEntry( x, true );
        // Rehash once the table is more than half full
        if( ++currentSize > array.length / 2 )
            rehash( );
    }

    /**
     * Expand the hash table.
     * O(N) because there are N members to be rehashed. This is not done
     * often because the table has to be half filled first.
     */
    private void rehash( )
    {
        HashEntry [ ] oldArray = array;
        // Create a new double-sized, empty table
        allocateArray( nextPrime( 2 * oldArray.length ) );
        currentSize = 0;
        // Copy table over; each index is recalculated because this is a new array
        for( int i = 0; i < oldArray.length; i++ )
            if( oldArray[ i ] != null && oldArray[ i ].isActive )
                insert( oldArray[ i ].element );
        return;
    }

Rehashing
Rehashing can be triggered in 3 situations.
– Do it immediately when the table is half full.
– Do it when an insert starts to fail.
– Do it when the load factor reaches some chosen value (it does not have to be 0.5).
Do not forget that the higher the load factor, the more difficult it is to insert.

    // hash, nextPrime, isPrime are the same as before.

    /**
     * Remove from the hash table.
     * @param x the item to remove.
     */
    public void remove( Hashable x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            array[ currentPos ].isActive = false;   // lazy deletion
    }
}

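Downside of quadratic probing
Secondary clustering: keys that hash to the same index follow the same probe sequence. Fixed by double hashing:
– $f(i) = i \cdot \text{hash}_2(x)$
– We probe at offsets $\text{hash}_2(x)$, $2 \cdot \text{hash}_2(x)$, … and so on.
We must be careful when choosing the second function.
– If our array has 9 slots and $\text{hash}_2(x) = x \bmod 9$, then inserting 99 always gives a step of 0, so every probe lands on the same slot.
– $\text{hash}_2(x)$ must never give 0.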

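Example of $\text{hash}_2$
Assume $\text{hash}(x) = x \bmod \text{tableSize}$ and $\text{hash}_2(x) = R - (x \bmod R)$, where R is prime and R < tableSize. Let our tableSize be 16 and R be 13. We insert 9, 25, 26, 41, 42, 58 in that order.
– 9 goes to slot 9.
– 25 collides (slot 9), so we add 13-(25%13)=1 and it goes to slot 10.
– 26 collides (slot 10), so we add 13-(26%13)=13 and it goes to slot (10+13)%16=7.
– 41 collides (slot 9), so we add 13-(41%13)=11 and it goes to slot (9+11)%16=4.
– 42 collides (slot 10), so we add 13-(42%13)=10, but 42 still collides (slot 4), so we add 2*10 from its original index and it goes to slot (10+20)%16=14.
– 58 collides (slot 10), so we add 13-(58%13)=7 and it goes to slot (10+7)%16=1.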
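The worked example can be replayed in code. This is a sketch that assumes hash(x) = x % tableSize, R = 13, non-negative int keys, and -1 for an empty slot; the class and method names are made up for the illustration.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    static final int EMPTY = -1;
    static final int R = 13;   // a prime smaller than the table size

    // Second hash function: never returns 0.
    static int hash2(int x) {
        return R - (x % R);
    }

    // Insert key with h_i(x) = (hash(x) + i * hash2(x)) % tableSize and return the slot used.
    static int insert(int[] table, int key) {
        int home = key % table.length;
        int step = hash2(key);
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i * step) % table.length;
            if (table[pos] == EMPTY) {
                table[pos] = key;
                return pos;
            }
        }
        throw new IllegalStateException("no empty slot found on the probe sequence");
    }

    public static void main(String[] args) {
        int[] table = new int[16];
        Arrays.fill(table, EMPTY);
        for (int key : new int[] {9, 25, 26, 41, 42, 58})
            System.out.println(key + " -> slot " + insert(table, key));
        // Slots used, in order: 9, 10, 7, 4, 14, 1
    }
}
```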