
1 Hashing Chapters 19-20

2 2 What is Hashing?
Hashing is a technique that determines an index or location for storage of an item in a data structure. The hash function receives the search key and returns the index of an element in an array called the hash table. The index is known as the hash index. A perfect hash function maps each search key into a different integer suitable as an index to the hash table.

3 3 What is Hashing? A hash function indexes its hash table.

4 4 What is Hashing?
Two steps of the hash function: (1) convert the search key into an integer called the hash code; (2) compress the hash code into the range of indices for the hash table. Typical hash functions are not perfect: they can allow more than one search key to map into a single index. This is known as a collision.

5 5 What is Hashing? A collision caused by the hash function h (for a table of size 101) h(555–1163)

6 6 Hash Functions
General characteristics of a good hash function: minimize collisions; distribute entries uniformly throughout the hash table; be fast to compute.

7 7 Computing Hash Codes
We will override the hashCode method of Object. Guidelines:
If a class overrides the method equals, it should override hashCode.
If the method equals considers two objects equal, hashCode must return the same value for both objects.
If an object invokes hashCode more than once during the execution of a program on the same data, it must return the same hash code each time.
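
The guidelines above can be sketched with a small key class. This is only an illustration: the class PhoneNumber and its fields are hypothetical and do not come from the slides. The point is that hashCode is computed from exactly the fields that equals compares.

```java
import java.util.Objects;

// Hypothetical key class, used only to illustrate the equals/hashCode guidelines.
final class PhoneNumber {
    private final int exchange;
    private final int line;

    PhoneNumber(int exchange, int line) {
        this.exchange = exchange;
        this.line = line;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PhoneNumber)) return false;
        PhoneNumber that = (PhoneNumber) other;
        return exchange == that.exchange && line == that.line;
    }

    // Built from the same fields that equals compares, so two equal
    // objects always report the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(exchange, line);
    }
}
```

Because the fields are final, repeated calls to hashCode on the same object return the same value, which is the third guideline.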

8 8 Computing Hash Codes
Hash code for a primitive type:
Use the primitive-typed key itself when it fits in an int; types byte, short, and char can be cast to int.
For wider types, manipulate the internal binary representation (e.g., use folding). For a long, casting to int would lose the first (high-order) 32 bits, but the key can be divided into two 32-bit halves (by shifting), which are then added or combined with XOR (^):
   int hashCode = (int)(key ^ (key >> 32));
For a search key of type double:
   long bits = Double.doubleToLongBits(key);
   int hashCode = (int)(bits ^ (bits >> 32));
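
As a quick check of the folding idea, the two expressions can be wrapped in methods and compared with the standard library. The class and method names below are hypothetical; Long.hashCode and Double.hashCode are the static methods available since Java 8.

```java
// A minimal sketch of folding 64-bit keys into 32-bit hash codes.
final class PrimitiveHashCodes {
    // XOR the high-order and low-order 32-bit halves of a long.
    static int hashLong(long key) {
        return (int) (key ^ (key >> 32));
    }

    // Reinterpret the double as 64 raw bits, then fold as for a long.
    static int hashDouble(double key) {
        long bits = Double.doubleToLongBits(key);
        return (int) (bits ^ (bits >> 32));
    }

    public static void main(String[] args) {
        // Only the high half is nonzero here, and it still contributes.
        System.out.println(hashLong(1L << 32)); // prints 1
        System.out.println(hashLong(-1L) == Long.hashCode(-1L));
        System.out.println(hashDouble(1.5) == Double.hashCode(1.5));
    }
}
```

After the cast to int only the low 32 bits survive, so the signed shift used on the slide gives the same result as an unsigned shift (>>>) would.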

9 9 Computing Hash Codes
The hash code for a string s:
   int hash = 0;
   int n = s.length();
   for (int i = 0; i < n; i++)
      hash = g * hash + s.charAt(i); // g is a positive constant
For a string s of n characters, where $u_i$ is the Unicode value of the character at position i (i.e., $u_0 u_1 u_2 \ldots u_{n-1}$) and g is a positive constant (e.g., 31 in Java’s String class), the hash code could be
   $u_0 g^{n-1} + u_1 g^{n-2} + \cdots + u_{n-2} g + u_{n-1}$
or, by Horner’s method,
   $(\ldots((u_0 g + u_1) g + u_2) g + \cdots + u_{n-2}) g + u_{n-1}$
Note: hash could be negative due to overflow, e.g., in public int hashCode() of the Java class String.
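
The loop on this slide can be tried directly. In the sketch below the class and method names are hypothetical; with g = 31 the result should agree with String.hashCode, whose documented formula is the polynomial above.

```java
// A minimal sketch of the string hash code computed by Horner's method.
final class StringHash {
    static int hornerHash(String s, int g) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // int arithmetic wraps on overflow, so the result may be negative
            hash = g * hash + s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        // 'a' is 97 and 'b' is 98, so the code is 97 * 31 + 98.
        System.out.println(hornerHash("ab", 31)); // prints 3105
        System.out.println(hornerHash("hashing", 31) == "hashing".hashCode());
    }
}
```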

12 12 Compressing a Hash Code
The hash code c must be compressed so that it fits into the index range of the hash table. The typical method is to compute c modulo n, where n, the size of the table, is a prime number. The index will then be between 0 and n – 1.
   private int getHashIndex(Object key)
   {
      int hashIndex = key.hashCode() % hashTable.length;
      if (hashIndex < 0)
         hashIndex = hashIndex + hashTable.length;
      return hashIndex;
   } // end getHashIndex
Note: in Java, if c is non-negative, 0 <= c % n <= n-1; if c is negative, -(n-1) <= c % n <= 0. When c % n is negative, adding n gives a result with 1 <= result <= n-1.
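
The sign adjustment is easy to check in isolation. The sketch below takes the hash code and table size as plain int parameters, which is an assumption made so that it runs without a surrounding dictionary class; Math.floorMod from the standard library performs the same adjustment in one call.

```java
// A minimal sketch of compressing a hash code into the range 0 .. tableSize - 1.
final class HashIndex {
    static int getHashIndex(int hashCode, int tableSize) {
        int hashIndex = hashCode % tableSize; // may be negative in Java
        if (hashIndex < 0)
            hashIndex = hashIndex + tableSize;
        return hashIndex;
    }

    public static void main(String[] args) {
        System.out.println(getHashIndex(7, 5));   // prints 2
        System.out.println(getHashIndex(-7, 5));  // -7 % 5 is -2, so prints 3
        System.out.println(getHashIndex(-10, 5)); // -10 % 5 is 0, so prints 0
    }
}
```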

13 13 Resolving Collisions
Options when the hash function returns a location that is already used in the table: use another location in the table (“open addressing”), or change the structure of the hash table so that each array location can represent multiple values.

14 14 Open Addressing with Linear Probing
An open addressing scheme locates an alternate location. The new location must be open, i.e., available. Linear probing: if a collision occurs at hashTable[k], look successively at locations k + 1, k + 2, …

15 15 Open Addressing with Linear Probing The effect of linear probing after adding four entries whose search keys hash to the same index. h(555–1163)

16 16 Open Addressing with Linear Probing A revision of the hash table in the previous figure when linear probing resolves collisions; each entry contains a search key and its associated value h(555–1163) 555–1163

17 17 Removals A hash table if remove used null to remove entries. 555–1163 h(555–1163 )

18 18 Removals
We need to distinguish among three kinds of locations in the hash table:
1. Occupied: the location references an entry in the dictionary.
2. Empty: the location contains null and always did.
3. Available: the location's entry was removed from the dictionary.

19 19 Open Addressing with Linear Probing A linear probe sequence (a) after adding an entry; (b) after removing two entries;

20 20 Open Addressing with Linear Probing A linear probe sequence (c) after a search; (d) during the search while adding an entry; (e) after an addition to a formerly occupied location.

21 21 Searches that Dictionary Operations Require
To retrieve an entry: search the probe sequence for the key; examine entries that are present and ignore locations in the available state; stop the search when the key is found or null is reached.
To remove an entry: search the probe sequence in the same way as for retrieval; if the key is found, mark the location as available.
To add an entry: search the probe sequence in the same way as for retrieval, noting the first available slot; use that slot if the key is not found.
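
The three searches can be sketched as a small linear-probing table. This is not the textbook's implementation: the class name, the parallel key and value arrays, and the AVAILABLE sentinel object are all assumptions chosen to keep the example short. The table has a fixed size and does no rehashing.

```java
// A minimal sketch of open addressing with linear probing and removals.
final class LinearProbingTable {
    private static final Object AVAILABLE = new Object(); // marks a removed entry
    private final Object[] keys;   // null = empty, AVAILABLE = removed
    private final Object[] values;

    LinearProbingTable(int size) {
        keys = new Object[size];
        values = new Object[size];
    }

    private int hashIndex(Object key) {
        int index = key.hashCode() % keys.length;
        return index < 0 ? index + keys.length : index;
    }

    // Returns the index holding key, or -1 if an empty slot ends the search.
    private int locate(Object key) {
        int index = hashIndex(key);
        for (int probes = 0; probes < keys.length; probes++) {
            Object k = keys[index];
            if (k == null) return -1;                         // empty: stop
            if (k != AVAILABLE && k.equals(key)) return index;
            index = (index + 1) % keys.length;                // skip and continue
        }
        return -1;
    }

    Object getValue(Object key) {
        int index = locate(key);
        return index < 0 ? null : values[index];
    }

    Object remove(Object key) {
        int index = locate(key);
        if (index < 0) return null;
        Object old = values[index];
        keys[index] = AVAILABLE; // not null, so later searches keep probing
        values[index] = null;
        return old;
    }

    // Returns the replaced value if key was already present, otherwise null.
    Object add(Object key, Object value) {
        int index = hashIndex(key);
        int firstAvailable = -1;
        for (int probes = 0; probes < keys.length; probes++) {
            Object k = keys[index];
            if (k == null) {
                if (firstAvailable < 0) firstAvailable = index;
                break;
            }
            if (k == AVAILABLE) {
                if (firstAvailable < 0) firstAvailable = index; // note first reusable slot
            } else if (k.equals(key)) {
                Object old = values[index];
                values[index] = value;
                return old;
            }
            index = (index + 1) % keys.length;
        }
        if (firstAvailable < 0) throw new IllegalStateException("table is full");
        keys[firstAvailable] = key;
        values[firstAvailable] = value;
        return null;
    }
}
```

Marking a removed slot with a sentinel rather than null is what lets a later search continue past it to entries that were added after a collision.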

22 22 Open Addressing, Quadratic Probing
Change the probe sequence: given a search key whose hash index is k, probe to k + 1, k + 2^2, k + 3^2, …, k + j^2, … (each computed modulo the table size).
An addition is guaranteed to find an open location if the table size is a prime number and the hash table is at most half full.
Quadratic probing avoids primary clustering, but can lead to secondary clustering.

23 23 Open Addressing, Quadratic Probing
A probe sequence of length 5 using quadratic probing. Note: for hash index k and table size n, we can improve efficiency by using the recurrence relation $k_{i+1} = (k_i + 2i + 1) \bmod n$ for $i \ge 0$, with $k_0 = k$.
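
The recurrence works because (i + 1)^2 - i^2 = 2i + 1, so each probe is the previous one plus the next odd number. A small sketch, with hypothetical class and method names and assuming a non-negative hash index k, generates the sequence both ways so the two can be compared.

```java
// A minimal sketch comparing two ways to generate a quadratic probe sequence.
final class QuadraticProbe {
    // Direct form: probe i is (k + i * i) mod n.
    static int[] direct(int k, int n, int count) {
        int[] sequence = new int[count];
        for (int i = 0; i < count; i++)
            sequence[i] = (k + i * i) % n;
        return sequence;
    }

    // Recurrence form: next = (current + 2i + 1) mod n, starting from k.
    static int[] recurrence(int k, int n, int count) {
        int[] sequence = new int[count];
        int current = k % n;
        for (int i = 0; i < count; i++) {
            sequence[i] = current;
            current = (current + 2 * i + 1) % n; // no multiplication of i by itself
        }
        return sequence;
    }

    public static void main(String[] args) {
        // For k = 3 and n = 11 both forms give 3, 4, 7, 1, 8.
        System.out.println(java.util.Arrays.toString(direct(3, 11, 5)));
        System.out.println(java.util.Arrays.toString(recurrence(3, 11, 5)));
    }
}
```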

24 24 Open Addressing with Double Hashing
Resolves a collision by examining locations at the original hash index plus an increment determined by a second hash function.
The second hash function is different from the first, depends on the search key, and returns a nonzero value.
Double hashing reaches every location in the hash table if the table size is prime, and avoids both primary and secondary clustering.

25 25 Open Addressing with Double Hashing
The first three locations in a probe sequence generated by double hashing for the search key. Note: the sum of the two hash functions must be computed modulo the table size, e.g., $h_1(key) = key \bmod 7$ and $h_2(key) = 5 - (key \bmod 5)$.
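
Using the two example functions from this slide, a probe sequence can be generated as follows. The class name, the method names, and the choice of search key 16 are assumptions made for illustration, and the sketch assumes non-negative integer keys.

```java
// A minimal sketch of a double-hashing probe sequence for a table of size 7.
final class DoubleHashing {
    static int h1(int key) { return key % 7; }      // original hash index
    static int h2(int key) { return 5 - key % 5; }  // increment, always 1..5

    static int[] probeSequence(int key, int tableSize, int count) {
        int[] sequence = new int[count];
        int index = h1(key);
        int increment = h2(key);
        for (int i = 0; i < count; i++) {
            sequence[i] = index;
            index = (index + increment) % tableSize; // sum taken modulo table size
        }
        return sequence;
    }

    public static void main(String[] args) {
        // h1(16) = 2 and h2(16) = 4, so the probes are 2, 6, 3, 0, 4, 1, 5.
        System.out.println(java.util.Arrays.toString(probeSequence(16, 7, 7)));
    }
}
```

Because 7 is prime, the seven probes visit every location of the table exactly once before the sequence repeats.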

26 26 Separate Chaining
Alter the structure of the hash table so that each location can represent multiple values. Each location is called a bucket. A bucket can be a list, a sorted list, a chain of linked nodes, an array, or a vector.

27 27 Separate Chaining A hash table for use with separate chaining; each bucket is a chain of linked nodes.

28 28 Separate Chaining Where new entry is inserted into linked bucket when integer search keys are (a) duplicate and unsorted;

29 29 Separate Chaining Where new entry is inserted into linked bucket when integer search keys are (b) distinct and unsorted;

30 30 Separate Chaining Where new entry is inserted into linked bucket when integer search keys are (c) distinct and sorted

31 31 Pseudo-code for Chaining Algorithms (for distinct search keys and unsorted chains)
   //Algorithm add(key, value)
   index = getHashIndex(key)
   if (hashTable[index] == null)
   {
      hashTable[index] = new Node(key, value)
      currentSize++
      return null
   }
   else
   {
      Search chain that begins at hashTable[index] for a node that contains key
      if (key is found)
      {  // assume currentNode references the node that contains key
         oldValue = currentNode.getValue()
         currentNode.setValue(value)
         return oldValue
      }
      else
      {  // add new node to end of chain
         // assume nodeBefore references the last node
         newNode = new Node(key, value)
         nodeBefore.setNextNode(newNode)
         currentSize++
         return null
      }
   }

32 32 Pseudo-code for Chaining Algorithms (continued)
   //Algorithm remove(key)
   index = getHashIndex(key)
   Search chain that begins at hashTable[index] for node that contains key
   if (key is found)
   {
      Remove node that contains key from chain
      currentSize--
      return value in removed node
   }
   else
      return null

   //Algorithm getValue(key)
   index = getHashIndex(key)
   Search chain that begins at hashTable[index] for node that contains key
   if (key is found)
      return value in found node
   else
      return null
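
The chaining pseudo-code can be turned into a compact Java sketch. The class name ChainedTable, the generic parameters, and the nested Node class with public fields instead of getters and setters are assumptions for illustration, not the textbook's code. Keys are assumed distinct and non-null, and chains are unsorted.

```java
// A minimal sketch of a hash table that resolves collisions by separate chaining.
final class ChainedTable<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Node<K, V>[] hashTable;
    private int currentSize;

    @SuppressWarnings("unchecked")
    ChainedTable(int tableSize) {
        hashTable = (Node<K, V>[]) new Node[tableSize];
    }

    private int getHashIndex(K key) {
        int index = key.hashCode() % hashTable.length;
        return index < 0 ? index + hashTable.length : index;
    }

    // Replaces and returns the old value if key is present; otherwise
    // appends a new node to the end of the chain and returns null.
    V add(K key, V value) {
        int index = getHashIndex(key);
        Node<K, V> nodeBefore = null;
        for (Node<K, V> node = hashTable[index]; node != null; node = node.next) {
            if (node.key.equals(key)) {
                V oldValue = node.value;
                node.value = value;
                return oldValue;
            }
            nodeBefore = node;
        }
        Node<K, V> newNode = new Node<>(key, value);
        if (nodeBefore == null) hashTable[index] = newNode; // chain was empty
        else nodeBefore.next = newNode;
        currentSize++;
        return null;
    }

    V remove(K key) {
        int index = getHashIndex(key);
        Node<K, V> nodeBefore = null;
        for (Node<K, V> node = hashTable[index]; node != null; node = node.next) {
            if (node.key.equals(key)) {
                if (nodeBefore == null) hashTable[index] = node.next;
                else nodeBefore.next = node.next;
                currentSize--;
                return node.value;
            }
            nodeBefore = node;
        }
        return null;
    }

    V getValue(K key) {
        for (Node<K, V> node = hashTable[getHashIndex(key)]; node != null; node = node.next) {
            if (node.key.equals(key)) return node.value;
        }
        return null;
    }

    int size() { return currentSize; }
}
```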

33 33 Efficiency Observations
Successful retrieval or removal: same efficiency as a successful search.
Unsuccessful retrieval or removal: same efficiency as an unsuccessful search.
Successful addition: same efficiency as an unsuccessful search.
Unsuccessful addition: same efficiency as a successful search.

34 34 Load Factor
A perfect hash function is not always possible or practical, so collisions are likely to occur. As the hash table fills, collisions occur more often. The load factor λ is a measure of how full the table is; for open addressing, λ = (number of entries in the dictionary) / (number of locations in the hash table). Note: the maximum value of the load factor depends on the type of collision resolution used; for separate chaining, there is no maximum value.

35 35 Cost of Open Addressing
The average number of comparisons required by a search of the hash table for given values of the load factor λ when using linear probing:
   $\frac{1}{2}\left[1 + \frac{1}{(1-\lambda)^2}\right]$ for an unsuccessful search
   $\frac{1}{2}\left[1 + \frac{1}{1-\lambda}\right]$ for a successful search

36 36 Cost of Open Addressing
The average number of comparisons required by a search of the hash table for given values of the load factor λ when using either quadratic probing or double hashing:
   $\frac{1}{1-\lambda}$ for an unsuccessful search
   $\frac{1}{\lambda}\log\frac{1}{1-\lambda}$ for a successful search (log is the natural logarithm)
Note: for quadratic probing or double hashing, we should have λ < 0.5.
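
To get a feel for these formulas, they can be evaluated at a given load factor. The class and method names in this sketch are hypothetical; the expressions are the ones stated on the two Cost of Open Addressing slides, with the logarithm taken as natural.

```java
// A minimal sketch that evaluates the average-comparison formulas for open addressing.
final class ProbeCosts {
    static double linearUnsuccessful(double lambda) {
        return 0.5 * (1 + 1 / ((1 - lambda) * (1 - lambda)));
    }

    static double linearSuccessful(double lambda) {
        return 0.5 * (1 + 1 / (1 - lambda));
    }

    // The next two apply to quadratic probing or double hashing.
    static double doubleUnsuccessful(double lambda) {
        return 1 / (1 - lambda);
    }

    static double doubleSuccessful(double lambda) {
        return (1 / lambda) * Math.log(1 / (1 - lambda));
    }

    public static void main(String[] args) {
        double lambda = 0.5;
        System.out.println(linearUnsuccessful(lambda)); // prints 2.5
        System.out.println(linearSuccessful(lambda));   // prints 1.5
        System.out.println(doubleUnsuccessful(lambda)); // prints 2.0
        System.out.println(doubleSuccessful(lambda));   // about 1.386
    }
}
```

At λ = 0.9 the unsuccessful-search cost for linear probing rises to about 50.5 comparisons, against about 10 for double hashing, which shows how sharply linear probing degrades as the table fills.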

37 37 Cost of Separate Chaining
Average number of comparisons required by a search of the hash table for given values of the load factor λ when using separate chaining. Note that the load factor here is the number of dictionary entries divided by the number of chains (i.e., the average number of dictionary entries per chain). Note: reasonable efficiency requires only λ < 1.

38 38 Rehashing
When the load factor becomes too large, expand the hash table: double its present size, then increase the result to the next prime number. Place the current entries into the new hash table (i.e., recompute the index of each entry for the new table size).
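
The size computation for rehashing can be sketched as follows. The class and method names are hypothetical, and trial division is used only because it is simple and adequate for table sizes.

```java
// A minimal sketch of choosing the new table size when rehashing.
final class Rehash {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Double the present size, then move up to the next prime number.
    static int nextTableSize(int currentSize) {
        int candidate = 2 * currentSize;
        while (!isPrime(candidate)) candidate++;
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(nextTableSize(101)); // 202 is not prime; prints 211
    }
}
```

Each existing entry must then be re-added with its index recomputed as hash code modulo the new size, since an index computed for the old size is generally wrong for the new one.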

39 39 Comparing Schemes for Collision Resolution
Average number of comparisons required by a search of the hash table versus the load factor λ for four techniques when the search is (a) successful; (b) unsuccessful.

40 40 A Dictionary Implementation That Uses Hashing A hash table and one of its entry objects

41 41 A Dictionary Implementation That Uses Hashing
Beginning of the private class TableEntry, made internal to the dictionary class:
   private class TableEntry implements java.io.Serializable
   {
      private Object entryKey;
      private Object entryValue;
      private boolean inTable; // true if entry is in hash table

      private TableEntry(Object key, Object value)
      {
         entryKey = key;
         entryValue = value;
         inTable = true;
      } // end constructor
      ...

42 42 A Dictionary Implementation That Uses Hashing A hash table containing dictionary entries, removed entries, and null values.

43 43 Java Class Library: The Class HashMap
Assumes that search-key objects belong to a class that overrides the methods hashCode and equals. The hash table is a collection of buckets.
Constructors:
   public HashMap()
   public HashMap(int initialSize)
   public HashMap(int initialSize, float maxLoadFactor)
   public HashMap(Map table)
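
A short usage sketch of java.util.HashMap follows. The class name HashMapDemo and the phone-number keys and owner values are made up for illustration; the constructors and the put, get, and size methods are part of the standard class.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of creating and using a HashMap.
final class HashMapDemo {
    static Map<String, String> build() {
        // Initial capacity 16 and maximum load factor 0.75.
        Map<String, String> phoneBook = new HashMap<>(16, 0.75f);
        phoneBook.put("555-1163", "first owner");
        phoneBook.put("555-1163", "second owner"); // same key: value is replaced
        phoneBook.put("555-8132", "another owner");
        // The constructor that takes a Map copies its entries.
        return new HashMap<>(phoneBook);
    }

    public static void main(String[] args) {
        Map<String, String> copy = build();
        System.out.println(copy.size());          // prints 2
        System.out.println(copy.get("555-1163")); // prints second owner
    }
}
```

String already overrides both hashCode and equals, which is why it can be used as a search key without further work.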

