Hashing1 Maps and hashing. Dr.Alagoz Hashing2 Maps A map models a searchable collection of key- value entries Typically a key is a string with an associated.

Hashing1 Maps and hashing

Dr.Alagoz Hashing2 Maps. A map models a searchable collection of key-value entries. Typically, a key is a string with an associated value (e.g., salary information). The main operations of a map are searching, inserting, and deleting items. Multiple entries with the same key are not allowed. Applications: address books, student-record databases, salary information, etc.

Dr.Alagoz Hashing3 The Map ADT. Map ADT methods:
get(k): if the map M has an entry with key k, return its associated value; else, return null.
put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, replace the old value with v and return the old value associated with k.
remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null.
size(), isEmpty()
keys(): return an iterator of the keys in M.
values(): return an iterator of the values in M.

Dr.Alagoz Hashing4 Example
Operation   Output  Map                       Note
isEmpty()   true    Ø
put(5,A)    null    (5,A)                     null because key 5 was not in M
put(7,B)    null    (5,A),(7,B)
put(2,C)    null    (5,A),(7,B),(2,C)
put(8,D)    null    (5,A),(7,B),(2,C),(8,D)
put(2,E)    C       (5,A),(7,B),(2,E),(8,D)   old value C is replaced and returned
get(7)      B       (5,A),(7,B),(2,E),(8,D)
get(4)      null    (5,A),(7,B),(2,E),(8,D)   key 4 is not in M
get(2)      E       (5,A),(7,B),(2,E),(8,D)
size()      4       (5,A),(7,B),(2,E),(8,D)
remove(5)   A       (7,B),(2,E),(8,D)         removes key 5 and returns A
remove(2)   E       (7,B),(8,D)
get(2)      null    (7,B),(8,D)               key 2 is not in M
isEmpty()   false   (7,B),(8,D)               M has 2 entries

Dr.Alagoz Hashing5 Comparison to java.util.Map
Map ADT methods    java.util.Map methods
size()             size()
isEmpty()          isEmpty()
get(k)             get(k)
put(k,v)           put(k,v)
remove(k)          remove(k)
keys()             keySet().iterator()
values()           values().iterator()
All methods are the same except keys() and values().
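As a quick check of the correspondence, here is a minimal sketch (the class name MapTrace is hypothetical) that replays the trace from the Example slide with java.util.HashMap. The return values of put, get and remove match the Output column; the map contents are never printed because HashMap's iteration order is unspecified.

```java
import java.util.HashMap;
import java.util.Map;

// Replays the Example-slide trace on java.util.HashMap and collects each output.
final class MapTrace {
    static String run() {
        Map<Integer, String> m = new HashMap<>();
        StringBuilder out = new StringBuilder();
        out.append(m.isEmpty()).append(' ');   // true
        out.append(m.put(5, "A")).append(' '); // null: key 5 was absent
        out.append(m.put(7, "B")).append(' '); // null
        out.append(m.put(2, "C")).append(' '); // null
        out.append(m.put(8, "D")).append(' '); // null
        out.append(m.put(2, "E")).append(' '); // C: the old value for key 2
        out.append(m.get(7)).append(' ');      // B
        out.append(m.get(4)).append(' ');      // null: key 4 is absent
        out.append(m.get(2)).append(' ');      // E
        out.append(m.size()).append(' ');      // 4
        out.append(m.remove(5)).append(' ');   // A
        out.append(m.remove(2)).append(' ');   // E
        out.append(m.get(2)).append(' ');      // null: key 2 was removed
        out.append(m.isEmpty());               // false: (7,B) and (8,D) remain
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```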

Dr.Alagoz Hashing6 Performance of a List-Based Map. put takes O(1) time if the new item is simply inserted at the beginning or at the end of the sequence; however, because a map does not allow duplicate keys, put must first search for k (as in the put(k,v) algorithm), which costs O(n) in the worst case. get and remove take O(n) time, since in the worst case (the item is not found) we traverse the entire sequence looking for an item with the given key. The unsorted list implementation is effective only for maps of small size, or for maps in which puts are the most common operation while searches and removals are rarely performed (e.g., a historical record of logins to a workstation).


Dr.Alagoz Hashing8 Content. Idea: the hash table ADT supports only a subset of the operations of binary search trees (insert, find, delete), but performs them much faster on average. Hashing: the implementation of hash tables for insert and find operations is called hashing. Collision: two keys hash to the same value. Collision resolution techniques, e.g., using linked lists. Rehashing: when a hash table gets too full, operations take longer, so we build a new hash table of about double the size. (HWLA: why double the size?) Extendible hashing: for data too large to fit in main memory.

Dr.Alagoz Hashing9 Hashing as a Data Structure. Performs insert, delete, and find in O(1) average time. Is not suitable for findMin, findMax, or sorting (outputting the elements in sorted order).

Dr.Alagoz Hashing10 General Idea. Use an array of fixed size (TableSize). Search is performed on some part of the item (the key). Each key is mapped into some number between 0 and TableSize - 1; this mapping is called a hash function. Ideally, two distinct keys should get different cells. Problem: since there are a finite number of cells and a virtually inexhaustible supply of keys, this is impossible in general, so we need a hash function that distributes the keys evenly among the cells.

Dr.Alagoz Hashing11 Hash Functions and Hash Tables. A hash function $h$ maps keys of a given type to integers in a fixed interval $[0, N-1]$. Example: $h(x) = x \bmod N$ is a hash function for integer keys. The integer $h(x)$ is called the hash value of key $x$. A hash table for a given key type consists of a hash function $h$ and an array (called the table) of size $N$. When implementing a map with a hash table, the goal is to store item $(k, o)$ at index $i = h(k)$.

Dr.Alagoz Hashing12 Example. We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer. Our hash table uses an array of size $N = 10{,}000$ and the hash function $h(x)$ = last four digits of $x$ (i.e., $h(x) = x \bmod 10{,}000$). (Figure: table cells 0 to 9999; 025-612-0001 is stored in cell 1, 981-101-0002 in cell 2, 451-229-0004 in cell 4, and 200-751-9998 in cell 9998.)

Dr.Alagoz Hashing13 Hash Functions. A hash function should be easy to compute. If the key is an integer, a reasonable strategy is to return Key mod TableSize. If the key is a string (the most common case in practice), the hash function needs to be chosen carefully; e.g., one option is adding up the ASCII values of the characters in the string. A proper selection of the hash function is required.

Dr.Alagoz Hashing14 Hash Function (Integer). Simply return Key % TableSize, but choose TableSize carefully: if TableSize is 10 and all keys end in zero, every key hashes to cell 0. To avoid such pitfalls, choose TableSize to be a prime number.
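A minimal sketch of the pitfall, with illustrative keys (all multiples of 10) that are not from the slides; the class TableSizeDemo is hypothetical. It counts how many distinct cells Key % TableSize actually uses.

```java
// Counts the distinct cells used when integer keys are hashed with key mod tableSize.
final class TableSizeDemo {
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize); // stays in [0, tableSize - 1] even for negative keys
    }

    static int cellsUsed(int[] keys, int tableSize) {
        boolean[] used = new boolean[tableSize];
        int count = 0;
        for (int k : keys) {
            int h = hash(k, tableSize);
            if (!used[h]) {
                used[h] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(cellsUsed(keys, 10)); // 1: every key lands in cell 0
        System.out.println(cellsUsed(keys, 11)); // 10: the prime size spreads them out
    }
}
```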

Dr.Alagoz Hashing15 Hash Function I (String). Adds up the ASCII values of the characters in the string. Advantage: simple to implement and computes quickly. Disadvantage: if TableSize is large (see Fig. 5.2 in your book), the function does not distribute the keys well. Example: keys are at most 8 characters long. Since ASCII codes are at most 127, the sum is at most 8 × 127 = 1016 (at most 8 × 255 = 2040 even with 8-bit character codes), but TableSize is 10,007, so only about 10% of the cells (about 20% with 8-bit codes) could ever be filled.
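A minimal sketch of Hash Function I (class and method names are hypothetical). It shows both weaknesses: the largest possible 8-character sum is far below TableSize = 10,007, and anagrams always collide.

```java
import java.util.Arrays;

// Hash Function I: add up the character codes, then reduce modulo tableSize.
final class AsciiSumHash {
    static int hash(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i); // the char is widened to its integer code
        }
        return sum % tableSize;   // sum is non-negative for short keys, so % is safe
    }

    // Key of the given length made only of the largest 7-bit ASCII code (127).
    static String worstCaseKey(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, (char) 127);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(hash(worstCaseKey(8), 10007)); // 1016: higher cells are never used
        System.out.println(hash("abc", 10007));           // 294
        System.out.println(hash("cba", 10007));           // 294 as well: anagrams collide
    }
}
```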

Dr.Alagoz Hashing16 Hash Function II (String). Assumption: the key has at least 3 characters. Hash function (26 letters of the alphabet plus the blank give 27 possible characters): $\text{key}[0] + 27\,\text{key}[1] + 27^2\,\text{key}[2]$, reduced mod TableSize. Advantage: distributes better than Hash Function I and is easy to compute. Disadvantage: there are $26^3 = 17{,}576$ possible combinations of 3 characters, but a dictionary check shows that English uses only 2,851 different combinations. HWLA: explain why. Read p.157 for the answer. As with Hash Function I, this function is not appropriate if the hash table is reasonably large.
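A minimal sketch of Hash Function II (the class ThreeCharHash is hypothetical). Because only the first three characters are read, every key sharing a 3-character prefix lands in the same cell.

```java
// Hash Function II: key[0] + 27*key[1] + 27^2*key[2], reduced mod tableSize.
final class ThreeCharHash {
    static int hash(String key, int tableSize) {
        if (key.length() < 3) {
            throw new IllegalArgumentException("key needs at least 3 characters");
        }
        int h = key.charAt(0) + 27 * key.charAt(1) + 729 * key.charAt(2); // 729 = 27^2
        return h % tableSize; // h is small and non-negative, so % is safe
    }

    public static void main(String[] args) {
        System.out.println(hash("abc", 10007)); // (97 + 27*98 + 729*99) mod 10007 = 4865
        System.out.println(hash("abcdef", 10007) == hash("abcxyz", 10007)); // true: same prefix
    }
}
```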

Dr.Alagoz Hashing17 Hash Function III (String). Idea: compute a polynomial function of the key's characters. For a key with n+1 characters:
$P(\text{Key}) = \text{Key}[0] + 37\,\text{Key}[1] + 37^2\,\text{Key}[2] + \dots + 37^n\,\text{Key}[n]$
Computing each power $37^i$ separately by repeated multiplication and then summing takes $O(n^2)$ time; using Horner's rule, the complexity drops to $O(n)$:
$((\text{Key}[n] \cdot 37 + \text{Key}[n-1]) \cdot 37 + \dots + \text{Key}[1]) \cdot 37 + \text{Key}[0]$
This is a very simple and reasonably fast method, but it becomes costly if the keys are very long. A common practice is not to use all the characters of a long key. E.g., if the keys are complete street addresses, the hash function might use a couple of characters from the street address and perhaps a couple of characters from the city name or zip code. Think of other options. Quiz: more recently, 31 has been proposed instead of 37. Why not 19?

Dr.Alagoz Hashing18 Hash Function III (String)
public static int hash( String key, int tableSize ) {
    int hashVal = 0;
    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );   // Horner's rule
    hashVal %= tableSize;
    if( hashVal < 0 )              // overflow may have made hashVal negative
        hashVal += tableSize;
    return hashVal;
}

Dr.Alagoz Hashing19 Collision. Collisions occur when different elements are mapped to the same cell. (Figure: cells 0 to 4; 025-612-0001 is in cell 1, while 451-229-0004 and 981-101-0004 both map to cell 4.)

Dr.Alagoz Hashing20 Collision. When an inserted element hashes to the same value as an already inserted element, we have a collision. Example: with the hash function Key % 10, 564 and 824 collide at cell 4.

Dr.Alagoz Hashing21 Solving Collisions. Separate chaining: keep a list of all elements that hash to the same value, and traverse the corresponding list to find an element. Hint: the table should be large, and its size should be a prime number, to ensure a good distribution. (Drawbacks: the lists need extra space, and a linked-list implementation is required.) Open addressing: on a collision, try alternative cells until an empty cell is found. Variants: linear probing, quadratic probing, double hashing.

Dr.Alagoz Hashing22 Solving Collisions. Definition: the load factor is λ = (number of elements) / TableSize. HWLA: search the Internet for other techniques (is there any algorithm that does not use a linked list?), e.g., (A) a binary search tree, or (B) another hash table. Explain why we do not use A or B. Solution: if the table size is large and a proper hash function is used, the lists should be short, so it is not worth using anything more complicated.

Dr.Alagoz Hashing23 Separate Chaining. Let each cell in the table point to a linked list of the entries that map there, i.e., keep a list of all elements that hash to the same value. Each element of the hash table is a linked list. Separate chaining is simple, but requires additional memory outside the table. Exercise: insert keys [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] into the hash table using $h(x) = x^2 \bmod \text{TableSize}$.
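A minimal sketch of this exercise, assuming TableSize = 10 (the slide leaves TableSize open) and inserting each new key at the front of its chain; the class ChainingSquaresDemo is hypothetical.

```java
import java.util.LinkedList;

// Separate chaining with h(x) = x^2 mod tableSize; new keys go to the front of their list.
final class ChainingSquaresDemo {
    static LinkedList<Integer>[] build(int[] keys, int tableSize) {
        @SuppressWarnings("unchecked")
        LinkedList<Integer>[] table = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) {
            table[i] = new LinkedList<>();
        }
        for (int x : keys) {
            table[(x * x) % tableSize].addFirst(x); // the chain for cell x^2 mod tableSize
        }
        return table;
    }

    public static void main(String[] args) {
        LinkedList<Integer>[] table = build(new int[]{9, 8, 7, 6, 5, 4, 3, 2, 1, 0}, 10);
        for (int i = 0; i < table.length; i++) {
            System.out.println(i + ": " + table[i]); // e.g. "1: [1, 9]" and "9: [3, 7]"
        }
    }
}
```

Under this assumption, cells 2, 3, 7 and 8 stay empty, because no square ends in 2, 3, 7 or 8.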

Dr.Alagoz Hashing24 Load Factor (λ): the number of elements in the hash table divided by TableSize (e.g., λ = 1 for the table shown). The time to perform a search is the constant time required to evaluate the hash function plus the time to traverse the list. Unsuccessful search: λ nodes are examined, on average. Successful search: about 1 + λ/2 links are traversed. Note: the list being searched contains the matching node plus some other nodes; in a table with N elements and M lists, the average number of other nodes is $(N-1)/M = N/M - 1/M = \lambda - 1/M \approx \lambda$ for large M. That is, the table size itself is not important, but the load factor is. So, in separate chaining, λ should be kept near 1, i.e., make the hash table about as large as the number of elements expected. Remember also that the table size should be prime to ensure a good distribution.

Dr.Alagoz Hashing25 Separate Chaining
/**
 * Construct the hash table.
 */
public SeparateChainingHashTable( ) {
    this( DEFAULT_TABLE_SIZE );
}
/**
 * Construct the hash table.
 * @param size approximate table size.
 */
public SeparateChainingHashTable( int size ) {
    theLists = new LinkedList[ nextPrime( size ) ];   // prime number of lists
    for( int i = 0; i < theLists.length; i++ )
        theLists[ i ] = new LinkedList( );
}

Dr.Alagoz Hashing26 Separate Chaining: Find. Use the hash function to determine which list to traverse, then traverse that list to find the element.
public Hashable find( Hashable x ) {
    return (Hashable) theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
}

Dr.Alagoz Hashing27 Separate Chaining: Insert. Use the hash function to determine which list to insert into; if the element is not already in that list, insert it at the front (right after the header) of the list.
public void insert( Hashable x ) {
    LinkedList whichList = theLists[ x.hash( theLists.length ) ];
    LinkedListItr itr = whichList.find( x );
    if( itr.isPastEnd( ) )                          // not found: insert after the header
        whichList.insert( x, whichList.zeroth( ) );
}

Dr.Alagoz Hashing28 Separate Chaining: Delete. Use the hash function to determine which list to delete from, then search for the element in that list and delete it.
public void remove( Hashable x ) {
    theLists[ x.hash( theLists.length ) ].remove( x );
}
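The constructor, find, insert and remove above use the book's own Hashable and LinkedListItr classes. A self-contained sketch of the same four steps, assuming java.util.LinkedList and Object.hashCode() instead (the class ChainedHashSet is hypothetical and does no rehashing), could look like this:

```java
import java.util.LinkedList;

// Minimal separate-chaining hash set built on java.util.LinkedList.
final class ChainedHashSet<T> {
    private final LinkedList<T>[] lists;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int size) {
        lists = new LinkedList[nextPrime(size)]; // prime table size, as the slides recommend
        for (int i = 0; i < lists.length; i++) {
            lists[i] = new LinkedList<>();
        }
    }

    // Hash function: hashCode() compressed into [0, lists.length - 1].
    private LinkedList<T> listFor(T x) {
        return lists[Math.floorMod(x.hashCode(), lists.length)];
    }

    // find: traverse only the list that x hashes to.
    boolean contains(T x) {
        return listFor(x).contains(x);
    }

    // insert: add x at the front of its list unless it is already there.
    void insert(T x) {
        LinkedList<T> list = listFor(x);
        if (!list.contains(x)) {
            list.addFirst(x);
        }
    }

    // remove: search x's list and delete x if present.
    void remove(T x) {
        listFor(x).remove(x);
    }

    int tableSize() {
        return lists.length;
    }

    static int nextPrime(int n) {
        if (n < 2) return 2;
        while (!isPrime(n)) n++;
        return n;
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        ChainedHashSet<Integer> set = new ChainedHashSet<>(10); // rounded up to 11 lists
        set.insert(3);
        set.insert(14); // 14 mod 11 = 3: same list as 3
        set.remove(3);
        System.out.println(set.contains(3) + " " + set.contains(14)); // false true
    }
}
```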

Dr.Alagoz Hashing29 Separate Chaining. Advantages: resolves collisions completely (any number of elements can share a cell), and an element can be inserted anywhere in its list. Disadvantages: requires linked lists, and all lists must be short to get O(1) time; otherwise the operations take too long.

Dr.Alagoz Hashing30 Separate chaining needs extra space. Alternatives to using linked lists: binary trees, or hash tables. However, if TableSize is large and a good hash function is used, all the lists are expected to be short already, so there is no need to complicate things. Instead of these alternatives, we use open addressing.

Dr.Alagoz Hashing31 Open Addressing. Open addressing resolves collisions without any other data structure such as a linked list (needing linked lists is a major drawback, especially in some other programming languages). Idea: if a collision occurs, alternative cells are tried until an empty cell is found; cells $h_0(x), h_1(x), \dots$ are tried in succession, where $h_i(x) = (\text{hash}(x) + f(i)) \bmod \text{TableSize}$. The function f, with f(0) = 0, is the collision resolution strategy. Since all data go inside the table, open addressing requires a bigger table than separate chaining: λ should be below 0.5 (compared with about 1 for separate chaining).

Dr.Alagoz Hashing32 Open Addressing. Depending on the collision resolution strategy f, we have: linear probing: $f(i) = i$; quadratic probing: $f(i) = i^2$; double hashing: $f(i) = i \cdot \text{hash}_2(x)$.

Dr.Alagoz Hashing33 Linear Probing. With f(i) = i, cells are tried sequentially (wrapping around) in search of an empty cell. Advantages: easy to compute. Disadvantages: the table must be big enough to contain a free cell; the time to find a free cell may be quite large; primary clustering: any key that hashes into a cluster will require several attempts to resolve the collision.

Dr.Alagoz Hashing34 Example: Linear Probing. Open addressing: the colliding item is placed in a different cell of the table. Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell. Each table cell inspected is referred to as a “probe”. Colliding items lump together, so future collisions cause longer sequences of probes. Example: $h(x) = x \bmod 13$; insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order. Resulting table:
cell: 0   1   2   3   4   5   6   7   8   9   10  11  12
key:  -   -   41  -   -   18  44  59  32  22  31  73  -
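A minimal sketch of linear-probing insertion that reproduces the table above, assuming integer keys, $h(x) = x \bmod N$, and no deletions or rehashing (the class LinearProbingDemo is hypothetical).

```java
import java.util.Arrays;

// Linear probing insert: try cells h(x), h(x)+1, h(x)+2, ... (mod N).
final class LinearProbingDemo {
    static boolean insert(Integer[] table, int key) {
        int n = table.length;
        int home = Math.floorMod(key, n);        // h(x) = x mod N
        for (int i = 0; i < n; i++) {            // at most N probes
            int pos = (home + i) % n;            // f(i) = i, wrapping around
            if (table[pos] == null) {
                table[pos] = key;
                return true;
            }
            if (table[pos] == key) return false; // key already present
        }
        return false;                            // table is full
    }

    static Integer[] build(int[] keys, int n) {
        Integer[] table = new Integer[n];
        for (int k : keys) insert(table, k);
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build(new int[]{18, 41, 22, 44, 59, 32, 31, 73}, 13)));
    }
}
```

The same build method with keys 89, 18, 49, 58, 69 and N = 10 also reproduces the book example: 49, 58 and 69 end up in cells 0, 1 and 2.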

Dr.Alagoz Hashing35 Example in the book: Linear Probing. Insert keys [89, 18, 49, 58, 69] into a hash table with TableSize = 10, using $h_i(x) = (\text{hash}(x) + i) \bmod \text{TableSize}$.

Dr.Alagoz Hashing36 The first collision occurs when 49 is inserted; it is put in the next available cell, i.e., cell 0. 58 collides with 18, 89, and then 49 before an empty cell is found three cells away. The collision for 69 is handled in the same way. Note: insertions and unsuccessful searches require the same number of probes. Primary clustering: as long as the table is big enough, a free cell will always be found (even if it takes a long time). But even when the table is relatively empty (low λ), blocks of occupied cells start forming; a key that hashes into such a block may require several attempts to resolve the collision, and then it adds to the cluster.

Dr.Alagoz Hashing37 Quadratic Probing. Eliminates the primary clustering problem. Theorem: if quadratic probing is used and the table size is prime, then a new element can always be inserted if the table is at least half empty. Secondary clustering: elements that hash to the same position will probe the same alternative cells.

Dr.Alagoz Hashing38 Quadratic Probing. Insert keys [89, 18, 49, 58, 69] into a hash table with TableSize = 10, using $h_i(x) = (\text{hash}(x) + i^2) \bmod \text{TableSize}$.
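A minimal sketch of quadratic-probing insertion for this exercise, assuming hash(x) = x mod TableSize and no deletions or rehashing (the class QuadraticProbingDemo is hypothetical). 49 lands in cell 0, 58 in cell 2 (offsets 1 and 4 from cell 8), and 69 in cell 3 (offsets 1 and 4 from cell 9).

```java
import java.util.Arrays;

// Quadratic probing insert: h_i(x) = (hash(x) + i^2) mod TableSize.
final class QuadraticProbingDemo {
    // Returns the cell where key ends up, or -1 if TableSize probes found no empty cell.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        int home = Math.floorMod(key, n);                // hash(x) = x mod TableSize
        for (int i = 0; i < n; i++) {
            int pos = (int) ((home + (long) i * i) % n); // f(i) = i^2
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
            if (table[pos] == key) return pos;           // key already present
        }
        return -1;
    }

    static Integer[] build(int[] keys, int n) {
        Integer[] table = new Integer[n];
        for (int k : keys) insert(table, k);
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build(new int[]{89, 18, 49, 58, 69}, 10)));
    }
}
```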

Dr.Alagoz Hashing39 Quadratic Probing
/**
 * Construct the hash table.
 */
public QuadraticProbingHashTable( ) {
    this( DEFAULT_TABLE_SIZE );
}
/**
 * Construct the hash table.
 * @param size the approximate initial size.
 */
public QuadraticProbingHashTable( int size ) {
    allocateArray( size );
    makeEmpty( );
}

Dr.Alagoz Hashing40 Quadratic Probing
/**
 * Method that performs quadratic probing resolution.
 * @param x the item to search for.
 * @return the position where the search terminates.
 */
private int findPos( Hashable x ) {
    int collisionNum = 0;
    int currentPos = x.hash( array.length );
    while( array[ currentPos ] != null &&
           !array[ currentPos ].element.equals( x ) ) {
        currentPos += 2 * ++collisionNum - 1;   // move from offset (i-1)^2 to offset i^2
        if( currentPos >= array.length )        // wrap around
            currentPos -= array.length;
    }
    return currentPos;
}
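Why does the line `currentPos += 2 * ++collisionNum - 1` implement $f(i) = i^2$ without any multiplication? A short derivation, assuming (as in the quadratic probing theorem) that the table size is prime and the table is kept at least half empty:

```latex
\begin{aligned}
f(i) - f(i-1) &= i^2 - (i-1)^2 = i^2 - (i^2 - 2i + 1) = 2i - 1,\\
h_i(x) &= \bigl(h_{i-1}(x) + 2i - 1\bigr) \bmod \text{TableSize}.
\end{aligned}
```

Under that assumption an empty cell is reached within roughly TableSize/2 probes, so $2i - 1 \le \text{TableSize}$; since currentPos is below TableSize before the step, the new value stays below $2 \cdot \text{TableSize}$, and a single subtraction of array.length brings it back into range.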

Dr.Alagoz Hashing41 Double Hashing. Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series $(i + j\,d(k)) \bmod N$ for $j = 0, 1, \dots, N-1$, where $i = h(k)$. The secondary hash function d(k) cannot have zero values. The table size N must be a prime to allow probing of all the cells. Common choice of compression function for the secondary hash function: $d_2(k) = q - (k \bmod q)$, where $q < N$ and q is a prime. The possible values for $d_2(k)$ are $1, 2, \dots, q$.

Dr.Alagoz Hashing42 Double Hashing. Popular choice: $f(i) = i \cdot \text{hash}_2(x)$, i.e., apply a second hash function to x and probe at distances $\text{hash}_2(x), 2\,\text{hash}_2(x), 3\,\text{hash}_2(x), \dots$, with $\text{hash}_2(x) = R - (x \bmod R)$. A poor choice of $\text{hash}_2(x)$ could be disastrous. Observe: $\text{hash}_2(x) = x \bmod 9$ would not work if 99 were inserted into the input in the previous example, since $\text{hash}_2(99) = 0$ and the probe would never move. R should also be a prime number smaller than TableSize. If double hashing is correctly implemented, simulations imply that the expected number of probes is almost the same as for a random collision resolution strategy.

Dr.Alagoz Hashing43 Example of Double Hashing. Consider a hash table storing integer keys that handles collisions with double hashing: N = 13, $h(k) = k \bmod 13$, $d(k) = 7 - (k \bmod 7)$. Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order. Resulting table:
cell: 0   1   2   3   4   5   6   7   8   9   10  11  12
key:  31  -   41  -   -   18  32  59  73  22  44  -   -
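A minimal sketch of double-hashing insertion with $h(k) = k \bmod N$ and $d(k) = q - (k \bmod q)$, assuming no deletions or rehashing (the class DoubleHashingDemo is hypothetical). For example, 44 collides at cell 5 and steps by d(44) = 5 to cell 10; 31 collides at cell 5, steps by d(31) = 4 to cell 9 (occupied) and then to cell 0.

```java
import java.util.Arrays;

// Double hashing insert: probe (h(k) + j*d(k)) mod N for j = 0, 1, ..., N-1.
final class DoubleHashingDemo {
    static int insert(Integer[] table, int key, int q) {
        int n = table.length;
        int h = Math.floorMod(key, n);                 // h(k) = k mod N
        int d = q - Math.floorMod(key, q);             // d(k) = q - (k mod q), in 1..q, never 0
        for (int j = 0; j < n; j++) {
            int pos = (int) ((h + (long) j * d) % n);
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
            if (table[pos] == key) return pos;         // key already present
        }
        return -1;                                     // no empty cell reached
    }

    static Integer[] build(int[] keys, int n, int q) {
        Integer[] table = new Integer[n];
        for (int k : keys) insert(table, k, q);
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build(new int[]{18, 41, 22, 44, 59, 32, 31, 73}, 13, 7)));
        System.out.println(Arrays.toString(build(new int[]{89, 18, 49, 58, 69}, 10, 7))); // book example
    }
}
```

The second call uses the book's R = 7 and N = 10; note that N = 10 is not prime, so for some keys the probe sequence cannot reach every cell.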

Dr.Alagoz Hashing44 Example in the book: Double Hashing. $h_i(x) = (x + i\,(R - (x \bmod R))) \bmod N$ with R = 7 and N = 10, inserting 89, 18, 49, 58, 69. The first collision occurs when 49 is inserted: $\text{hash}_2(49) = 7 - 0 = 7$, so 49 is inserted in position 6. $\text{hash}_2(58) = 7 - 2 = 5$, so 58 is inserted at location 3. Finally, 69 collides and is inserted at a distance $\text{hash}_2(69) = 7 - 6 = 1$ away (position 0). Observe a bad scenario: what happens if the input includes 60? First, 60 collides with 69 in position 0. Since $\text{hash}_2(60) = 7 - 4 = 3$, we would then try positions 3, 6, 9, and then 2 until an empty cell is found.

Dr.Alagoz Hashing45 Rehashing. If the hash table gets too full, the operations start taking too long, and insertions might fail for open addressing with quadratic probing. Solution: rehashing, i.e., build another table that is about twice as big. Rehashing is needed especially when many removals are intermixed with insertions.

Dr.Alagoz Hashing46 Rehashing. Build another table that is about twice as big, with a prime size (e.g., if N = 11, then N' = 23), and associate a new hash function with it. Scan down the entire original hash table, compute the new hash value for each nondeleted element, and insert it in the new table.

Dr.Alagoz Hashing47 Rehashing. A very expensive operation: O(N). The good news is that rehashing occurs very infrequently. If the data structure is part of a program, the effect is not noticeable. If hashing is performed as part of an interactive system, the unfortunate user whose insertion caused a rehash could observe a slowdown.

Dr.Alagoz Hashing48 Rehashing. When should rehashing be applied? Strategy 1: as soon as the table is half full. Strategy 2: only when an insertion fails. Strategy 3: when the table reaches a certain load factor. There is no fixed rule for the best strategy. Since the load factor is directly related to the performance of the system, the third strategy may work better. Then what is the threshold?

Dr.Alagoz Hashing49 Example: Rehashing. Suppose 13, 15, 23, 24, and 6 are inserted into a hash table of size 7 with h(x) = x mod 7, using linear probing; we get the table on the left. The table is now 5/7 full, so rehashing is required. The new table size is 17, the first prime number greater than 2 × 7, and the new hash function is h(x) = x mod 17. Scanning the old table from top to bottom, we insert the elements 6, 15, 23, 24, 13 into the new table (as shown in the table on the right).
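A minimal sketch of this example (the class RehashDemo and its helpers are hypothetical): linear probing into a table of size 7, then a rehash into a table of size nextPrime(2 × 7) = 17 that scans the old table top to bottom and reinserts every element.

```java
import java.util.Arrays;

// Linear probing plus rehashing into a table of size nextPrime(2 * oldSize).
final class RehashDemo {
    // Linear probing insert with h(x) = x mod table.length; assumes a free cell exists.
    static void insert(Integer[] table, int key) {
        int n = table.length;
        int pos = Math.floorMod(key, n);
        while (table[pos] != null) {
            pos = (pos + 1) % n; // f(i) = i: try the next cell, wrapping around
        }
        table[pos] = key;
    }

    // Scan the old table and reinsert every element using the new table size.
    static Integer[] rehash(Integer[] old) {
        Integer[] bigger = new Integer[nextPrime(2 * old.length)];
        for (Integer x : old) {
            if (x != null) insert(bigger, x);
        }
        return bigger;
    }

    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];
        for (int key : new int[]{13, 15, 23, 24, 6}) insert(table, key);
        System.out.println(Arrays.toString(table));         // old table, size 7
        System.out.println(Arrays.toString(rehash(table))); // new table, size 17
    }
}
```

In the new table, 23 and 24 both collide with 6 (cell 6) and move to cells 7 and 8.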

Dr.Alagoz Hashing50 Rehashing
private void allocateArray( int arraySize ) {
    array = new HashEntry[ arraySize ];
}
private void rehash( ) {
    HashEntry [ ] oldArray = array;
    // Create a new double-sized, empty table
    allocateArray( nextPrime( 2 * oldArray.length ) );
    currentSize = 0;
    // Copy table over
    for( int i = 0; i < oldArray.length; i++ )
        if( oldArray[ i ] != null && oldArray[ i ].isActive )
            insert( oldArray[ i ].element );
}

Dr.Alagoz Hashing51 Extendible Hashing (Why?). The amount of data is too large to fit in main memory, so the main consideration is the number of disk accesses required to retrieve data. Assume we have N records to store, and at most M records fit in one disk block (assume M = 4).

Dr.Alagoz Hashing52 Extendible Hashing (Why?). If open addressing or separate chaining is used, collisions could cause several disk blocks to be examined during a find, even for a well-distributed hash table. When the table gets too full, rehashing requires O(N) disk accesses (very expensive). Instead, we use extendible hashing, which allows a find with only two disk accesses; insertions also require only a few accesses.

Dr.Alagoz Hashing53 Extendible Hashing. Reuse the idea of B-trees, which have depth $O(\log_{M/2} N)$. If M were chosen so large that the B-tree had depth 1, a find would need only one disk access, assuming the root node could be stored in main memory. However, there is a problem: the branching factor would be so high that it would take too much time to determine which leaf the data is in. This strategy works in practice only if the time to perform this step is reduced, and that is exactly what the extendible hashing strategy does.

Dr.Alagoz Hashing54 Example: Extendible Hashing. Assume our source data consists of 6-bit integers. The root of the “tree” contains four links determined by the leading two bits of the data. Each leaf has at most M = 4 elements, based on the earlier assumption. D (here D = 2) denotes the number of bits used by the root, which is known as the directory; the directory therefore has $2^D$ entries. $d_L$ is the number of leading bits that all the elements of some leaf L have in common, with $d_L \le D$.
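A minimal sketch of the directory lookup step, assuming 6-bit keys as in this example (the class DirectoryIndexDemo and its methods are hypothetical). The directory entry for a key is simply its leading D bits, and the directory has $2^D$ entries.

```java
// Extendible hashing directory lookup: the leading D bits of a key select the entry.
final class DirectoryIndexDemo {
    static final int KEY_BITS = 6; // the example's keys are 6-bit integers

    // Keep the leading d bits of a KEY_BITS-bit key (0 <= d <= KEY_BITS).
    static int directoryIndex(int key, int d) {
        return key >>> (KEY_BITS - d);
    }

    // The directory has 2^D entries.
    static int directorySize(int d) {
        return 1 << d;
    }

    public static void main(String[] args) {
        int key = 0b100100;
        System.out.println(directorySize(2) + " " + directoryIndex(key, 2)); // 4 2: bits "10"
        System.out.println(directorySize(3) + " " + directoryIndex(key, 3)); // 8 4: bits "100"
    }
}
```

With D = 2, key 100100 maps to entry 2 (0-based), i.e., the third directory entry; after the directory grows to D = 3 it maps to entry 4.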

Dr.Alagoz Hashing55 Extendible Hashing. Suppose we want to insert 100100. Since its leading two bits are 10, it would go to the 3rd leaf. But the 3rd leaf is already full (M = 4), so we split this leaf into two leaves, which are now distinguished by the first three bits. This requires increasing the directory size (D = 3). Note that although the entire directory is rewritten, none of the other leaves (1, 2, 4) is actually accessed.

Dr.Alagoz Hashing56 Extendible Hashing. Suppose we now want to insert 000000. Since its leading two bits are 00, it goes to the 1st leaf, which is full, so that leaf splits. The only change in the directory is updating the entries 000 and 001. Therefore, this is a very good and fast strategy for insert and find operations on large databases. However, READ pages 175-176 for scenarios in which this algorithm does not work and how to avoid possible problems.

Dr.Alagoz Hashing57 HWLAs. Problem 2 in the book: when rehashing, we choose a table size that is roughly twice as large and prime. In our case, the appropriate new table size is 19, with hash function h(x) = x mod 19.
(a) Scanning down the separate chaining hash table, the new locations are 4371 in list 1, 1323 in list 12, 6173 in list 17, 4344 in list 12, 4199 in list 0, 9679 in list 8, and 1989 in list 13.
(b) The new locations are 9679 in bucket 8, 4371 in bucket 1, 1989 in bucket 13, 1323 in bucket 12, 6173 in bucket 17, 4344 in bucket 14 because both 12 and 13 are already occupied, and 4199 in bucket 0.
(c) The new locations are 9679 in bucket 8, 4371 in bucket 1, 1989 in bucket 13, 1323 in bucket 12, 6173 in bucket 17, 4344 in bucket 16 because both 12 and 13 are already occupied, and 4199 in bucket 0.
(d) The new locations are 9679 in bucket 8, 4371 in bucket 1, 1989 in bucket 13, 1323 in bucket 12, 6173 in bucket 17, 4344 in bucket 15 because 12 is already occupied, and 4199 in bucket 0.
Problems in CHP5: 1, 4, 5, 11, 16. Improved Merkle Cryptosystem

Dr.Alagoz Hashing58 Java Example: hash table with linear probing (*)
/** A hash table with linear probing and the MAD hash function.
 *  Map, Entry, EqualityTester, InvalidKeyException, List and NodeList come from the
 *  textbook's support code; they are not java.util types. */
public class HashTable implements Map {
  protected static class HashEntry implements Entry {
    Object key, value;
    HashEntry() { /* default constructor */ }
    HashEntry(Object k, Object v) { key = k; value = v; }
    public Object key() { return key; }
    public Object value() { return value; }
    protected Object setValue(Object v) { // set a new value, returning the old one
      Object temp = value;
      value = v;
      return temp; // return old value
    }
  } // end of HashEntry
  /** Nested class for a default equality tester */
  protected static class DefaultEqualityTester implements EqualityTester {
    DefaultEqualityTester() { /* default constructor */ }
    /** Returns whether the two objects are equal. */
    public boolean isEqualTo(Object a, Object b) { return a.equals(b); }
  }
  protected static Entry AVAILABLE = new HashEntry(null, null); // marker for deactivated slots
  protected int n = 0;          // number of entries in the dictionary
  protected int N;              // capacity of the bucket array
  protected Entry[] A;          // bucket array
  protected EqualityTester T;   // the equality tester
  protected int scale, shift;   // the shift and scaling factors
  /** Creates a hash table with initial capacity 1023. */
  public HashTable() {
    N = 1023; // default capacity
    A = new Entry[N];
    T = new DefaultEqualityTester(); // use the default equality tester
    java.util.Random rand = new java.util.Random();
    scale = rand.nextInt(N-1) + 1;
    shift = rand.nextInt(N);
  }
  /** Creates a hash table with the given capacity and equality tester. */
  public HashTable(int bN, EqualityTester tester) {
    N = bN;
    A = new Entry[N];
    T = tester;
    java.util.Random rand = new java.util.Random();
    scale = rand.nextInt(N-1) + 1;
    shift = rand.nextInt(N);
  }

Dr.Alagoz Hashing59 Java Example (cont. *)
  /** Determines whether a key is valid. */
  protected void checkKey(Object k) {
    if (k == null) throw new InvalidKeyException("Invalid key: null.");
  }
  /** Hash function applying the MAD method to the default hash code. */
  public int hashValue(Object key) {
    // floorMod keeps the index in [0, N-1]; Math.abs(...) % N breaks when the sum is Integer.MIN_VALUE
    return Math.floorMod(key.hashCode() * scale + shift, N);
  }
  /** Returns the number of entries in the hash table. */
  public int size() { return n; }
  /** Returns whether or not the table is empty. */
  public boolean isEmpty() { return (n == 0); }
  /** Helper search method: returns the index of the found key, or -a-1,
   *  where a is the index of an empty or available slot. */
  protected int findEntry(Object key) throws InvalidKeyException {
    int avail = -1;                          // first AVAILABLE slot seen, if any
    checkKey(key);
    int i = hashValue(key);
    int j = i;
    do {
      if (A[i] == null)                      // empty slot: the key is not in the table
        return -(avail >= 0 ? avail : i) - 1;  // prefer reusing an AVAILABLE slot
      if (A[i] == AVAILABLE) {               // bucket is deactivated
        if (avail < 0) avail = i;            // remember the first available slot
      } else if (T.isEqualTo(key, A[i].key()))
        return i;                            // we have found our entry
      i = (i + 1) % N;                       // keep looking (linear probing)
    } while (i != j);
    return -avail - 1;  // not found; put's load-factor check guarantees avail >= 0 here
  }
  /** Returns the value associated with a key. */
  public Object get(Object key) throws InvalidKeyException {
    int i = findEntry(key);        // helper method for finding a key
    if (i < 0) return null;        // there is no value for this key
    return A[i].value();           // return the found value in this case
  }
  /** Put a key-value pair in the map, replacing the previous one if it exists. */
  public Object put(Object key, Object value) throws InvalidKeyException {
    if (n >= N/2) rehash();        // rehash to keep the load factor <= 0.5
    int i = findEntry(key);        // find the appropriate spot for this entry
    if (i < 0) {                   // this key does not already have a value
      A[-i-1] = new HashEntry(key, value); // convert to the proper index
      n++;
      return null;                 // there was no previous value
    }
    else                           // this key has a previous value
      return ((HashEntry) A[i]).setValue(value); // set new value & return old
  }
  /** Doubles the size of the hash table and rehashes all the entries. */
  protected void rehash() {
    N = 2*N;
    Entry[] B = A;
    A = new Entry[N];              // allocate a new version of A twice as big as before
    java.util.Random rand = new java.util.Random();
    scale = rand.nextInt(N-1) + 1; // new hash scaling factor
    shift = rand.nextInt(N);       // new hash shifting factor
    for (int i = 0; i < B.length; i++)
      if ((B[i] != null) && (B[i] != AVAILABLE)) { // if we have a valid entry
        int j = findEntry(B[i].key());             // find the appropriate spot
        A[-j-1] = B[i];                            // copy into the new array
      }
  }
  /** Removes the key-value pair with a specified key. */
  public Object remove(Object key) throws InvalidKeyException {
    int i = findEntry(key);        // find this key first
    if (i < 0) return null;        // nothing to remove
    Object toReturn = A[i].value();
    A[i] = AVAILABLE;              // mark this slot as deactivated
    n--;
    return toReturn;
  }
  /** Returns an iterator of keys. */
  public java.util.Iterator keys() {
    List keys = new NodeList();
    for (int i = 0; i < N; i++)
      if ((A[i] != null) && (A[i] != AVAILABLE))
        keys.insertLast(A[i].key());
    return keys.elements();
  }
  // ... values() is similar to keys() and is omitted here ...
}

Dr.Alagoz Hashing60 Hash Functions (*). A hash function is usually specified as the composition of two functions: hash code $h_1$: keys → integers, and compression function $h_2$: integers → $[0, N-1]$. The hash code is applied first, and the compression function is applied next on the result, i.e., $h(x) = h_2(h_1(x))$. The goal of the hash function is to “disperse” the keys in an apparently random way.

Dr.Alagoz Hashing61 Performance of Hashing (*). In the worst case, searches, insertions and removals on a hash table take O(n) time. The worst case occurs when all the keys inserted into the map collide. The load factor $\alpha = n/N$ affects the performance of a hash table. Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is $1/(1-\alpha)$. The expected running time of all the dictionary ADT operations in a hash table is O(1). In practice, hashing is very fast provided the load factor is not close to 100%. Applications of hash tables: small databases, compilers, browser caches.
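Plugging a few load factors into the $1/(1-\alpha)$ estimate shows concretely why the load factor should stay well below 100%:

```latex
\alpha = 0.5:\ \tfrac{1}{1-0.5} = 2, \qquad
\alpha = 0.75:\ \tfrac{1}{1-0.75} = 4, \qquad
\alpha = 0.9:\ \tfrac{1}{1-0.9} = 10, \qquad
\alpha = 0.99:\ \tfrac{1}{1-0.99} = 100
```

The expected number of probes per insertion is small at α = 0.5 but grows without bound as α approaches 1.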

Dr.Alagoz Hashing62 Hash Codes (*) Memory address: We reinterpret the memory address of the key object as an integer (default hash code of all Java objects) Good in general, except for numeric and string keys Integer cast: We reinterpret the bits of the key as an integer Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java) Component sum: We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows) Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)

Dr.Alagoz Hashing63 Hash Codes (* cont.). Polynomial accumulation: we partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) $a_0, a_1, \dots, a_{n-1}$. We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_{n-1} z^{n-1}$ at a fixed value z, ignoring overflows. Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words). The polynomial p(z) can be evaluated in O(n) time using Horner's rule: the following polynomials are successively computed, each from the previous one in O(1) time: $p_0(z) = a_{n-1}$, $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z)$ for $i = 1, 2, \dots, n-1$. We have $p(z) = p_{n-1}(z)$.
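A minimal sketch of polynomial accumulation evaluated with the Horner recurrence above, assuming the components $a_i$ are the string's characters and letting int overflow wrap (the class PolyHash is hypothetical).

```java
// Polynomial hash code via Horner's rule: p_0 = a_{n-1}, p_i = a_{n-i-1} + z * p_{i-1}.
final class PolyHash {
    static int polyHash(String s, int z) {
        int n = s.length();
        if (n == 0) return 0;                     // empty key: treated as p(z) = 0 here
        int p = s.charAt(n - 1);                  // p_0(z) = a_{n-1}
        for (int i = 1; i < n; i++) {
            p = s.charAt(n - i - 1) + z * p;      // p_i(z) = a_{n-i-1} + z * p_{i-1}(z)
        }
        return p;                                 // p_{n-1}(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1}
    }

    public static void main(String[] args) {
        System.out.println(polyHash("ab", 33));   // 97 + 98*33 = 3331
        String word = "hashing";
        String reversed = new StringBuilder(word).reverse().toString();
        System.out.println(polyHash(reversed, 31) == word.hashCode()); // true
    }
}
```

The last line illustrates a design point: Java's String.hashCode() uses the same accumulation with z = 31, but gives the first character the highest power of z, so it equals this p(z) evaluated on the reversed string.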

Dr.Alagoz Hashing64 Compression Functions (*). Division: $h_2(y) = y \bmod N$. The size N of the hash table is usually chosen to be a prime; the reason has to do with number theory and is beyond the scope of this course. Multiply, Add and Divide (MAD): $h_2(y) = (a y + b) \bmod N$, where a and b are nonnegative integers such that $a \bmod N \ne 0$; otherwise, every integer would map to the same value b.
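A minimal sketch of the two compression functions applied to an int hash code y (the class Compression and its method names are hypothetical). Math.floorMod keeps results in [0, N-1] even for negative hash codes, and the MAD product is computed in long so that a·y cannot overflow.

```java
// Division and MAD compression functions for an int hash code y.
final class Compression {
    // Division: h2(y) = y mod N.
    static int division(int y, int n) {
        return Math.floorMod(y, n);
    }

    // MAD: h2(y) = (a*y + b) mod N, with a mod N != 0.
    static int mad(int y, int a, int b, int n) {
        if (Math.floorMod(a, n) == 0) {
            throw new IllegalArgumentException("a mod N must be nonzero"); // else every y maps to b mod N
        }
        return (int) Math.floorMod((long) a * y + b, (long) n);
    }

    public static void main(String[] args) {
        int n = 13;                                // prime table size
        System.out.println(division(-7, n));       // 6
        System.out.println(mad(10, 3, 5, n));      // (30 + 5) mod 13 = 9
    }
}
```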

Dr.Alagoz Hashing65 Map Methods with Separate Chaining used for Collisions (*). Delegate operations to a list-based map at each cell:
Algorithm get(k):
  Output: the value associated with the key k in the map, or null if there is no entry with key equal to k in the map
  return A[h(k)].get(k)   {delegate the get to the list-based map at A[h(k)]}
Algorithm put(k,v):
  Output: if there is an existing entry in our map with key equal to k, then we return its value (replacing it with v); otherwise, we return null
  t = A[h(k)].put(k,v)   {delegate the put to the list-based map at A[h(k)]}
  if t = null then   {k is a new key}
    n = n + 1
  return t
Algorithm remove(k):
  Output: the (removed) value associated with key k in the map, or null if there is no entry with key equal to k in the map
  t = A[h(k)].remove(k)   {delegate the remove to the list-based map at A[h(k)]}
  if t ≠ null then   {k was found}
    n = n - 1
  return t

Dr.Alagoz Hashing66 Search with Linear Probing (*). Consider a hash table A that uses linear probing. get(k): we start at cell h(k) and probe consecutive locations until one of the following occurs: an item with key k is found; an empty cell is found; or N cells have been unsuccessfully probed.
Algorithm get(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return null
    else if c.key() = k
      return c.element()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null

Dr.Alagoz Hashing67 Updates with Linear Probing (*). To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements.
remove(k): we search for an entry with key k. If such an entry (k, o) is found, we replace it with the special item AVAILABLE and return element o; else, we return null.
put(k, o): we throw an exception if the table is full. We start at cell h(k) and probe consecutive cells until one of the following occurs: a cell i is found that is either empty or stores AVAILABLE, or N cells have been unsuccessfully probed. We store entry (k, o) in cell i.

Dr.Alagoz Hashing68 A Simple List-Based Map (*). We can efficiently implement a map using an unsorted list: we store the items of the map in a list S (based on a doubly-linked list), in arbitrary order. (Figure: list S between header and trailer sentinels; its nodes/positions hold the entries (9,c), (6,c), (5,c), (8,c).)

Dr.Alagoz Hashing69 The get(k) Algorithm
Algorithm get(k):
  B = S.positions()   {B is an iterator of the positions in S}
  while B.hasNext() do
    p = B.next()   {the next position in B}
    if p.element().key() = k then
      return p.element().value()
  return null   {there is no entry with key equal to k}

Dr.Alagoz Hashing70 The put(k,v) Algorithm
Algorithm put(k,v):
  B = S.positions()
  while B.hasNext() do
    p = B.next()
    if p.element().key() = k then
      t = p.element().value()
      S.replace(p, (k,v))   {replace the entry stored at position p}
      return t   {return the old value}
  S.insertLast((k,v))
  n = n + 1   {increment variable storing number of entries}
  return null   {there was no previous entry with key equal to k}

Dr.Alagoz Hashing71 The remove(k) Algorithm
Algorithm remove(k):
  B = S.positions()
  while B.hasNext() do
    p = B.next()
    if p.element().key() = k then
      t = p.element().value()
      S.remove(p)
      n = n - 1   {decrement number of entries}
      return t   {return the removed value}
  return null   {there is no entry with key equal to k}
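A minimal Java sketch of the three list-based algorithms above. It assumes java.util.LinkedList plays the role of S and that its size() replaces the counter n; the class ListMap is hypothetical. Each operation scans the list, so get, put and remove are O(n) in the worst case.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Objects;

// Unsorted list-based map following the get/put/remove algorithms.
final class ListMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>> s = new LinkedList<>();

    // get(k): scan the entries for key k.
    V get(K k) {
        for (Entry<K, V> e : s) {
            if (Objects.equals(e.key, k)) return e.value;
        }
        return null; // there is no entry with key equal to k
    }

    // put(k, v): replace and return the old value if k exists, else append (k, v).
    V put(K k, V v) {
        for (Entry<K, V> e : s) {
            if (Objects.equals(e.key, k)) {
                V old = e.value;
                e.value = v;
                return old;
            }
        }
        s.addLast(new Entry<>(k, v));
        return null; // there was no previous entry with key equal to k
    }

    // remove(k): unlink the matching entry through the iterator.
    V remove(K k) {
        Iterator<Entry<K, V>> it = s.iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (Objects.equals(e.key, k)) {
                it.remove();
                return e.value;
            }
        }
        return null; // there is no entry with key equal to k
    }

    int size() { return s.size(); }

    public static void main(String[] args) {
        ListMap<Integer, String> m = new ListMap<>();
        m.put(5, "A");
        m.put(2, "C");
        System.out.println(m.put(2, "E")); // C
        System.out.println(m.remove(5));   // A
        System.out.println(m.size());      // 1
    }
}
```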
