CSE 373, Copyright S. Tanimoto, 2002 Hashing -


1 Hashing

2 Motivation
Many applications need to store "associations." Rapid retrieval is sometimes more important than storage efficiency. Hashing is a flexible family of techniques for implementing associations between keys and values.

3 Mathematical Description
Suppose we have a function mapping keys to values. Its domain is a set KEYS. For example, KEYS = {0, 1, ..., m-1}, or KEYS = the set of all possible ASCII strings. Its range is a set of possible values. For example, the values may themselves be ASCII strings. How can we represent such a function?

4 A Dictionary Abstract Data Type
A function expressed as a finite set of (key, value) pairs.
A: KEYS → VALUES
A = { (key_1, value_1), ..., (key_n, value_n) }
Methods:
PUT: DICTIONARIES × KEYS × VALUES → DICTIONARIES
GET: DICTIONARIES × KEYS → VALUES
GETALLKEYS: DICTIONARIES → 2^KEYS
For any set S, the set of all possible subsets of S is written 2^S, and is called the power set of S.
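The three signatures above can be read as pure functions that take a dictionary and return a result. The following is a minimal sketch under that reading, not the course's implementation: the representation (a `frozenset` of pairs) and the names `put`, `get`, and `get_all_keys` are assumptions chosen to mirror PUT, GET, and GETALLKEYS.

```python
def put(dictionary, key, value):
    """PUT: return a NEW dictionary in which key maps to value.

    The dictionary is a frozenset of (key, value) pairs, so keys and
    values must be hashable in this sketch.
    """
    kept = {(k, v) for (k, v) in dictionary if k != key}
    return frozenset(kept | {(key, value)})


def get(dictionary, key):
    """GET: return the value paired with key, or raise KeyError."""
    for k, v in dictionary:
        if k == key:
            return v
    raise KeyError(key)


def get_all_keys(dictionary):
    """GETALLKEYS: return the set of keys, an element of the power set of KEYS."""
    return frozenset(k for k, _ in dictionary)


empty = frozenset()
d = put(put(empty, 1978, "VAX-11/780"), 1982, "IBM-PC")
```

Because `put` returns a new set instead of modifying its argument, `empty` is still empty after the calls, which matches the signature PUT: DICTIONARIES × KEYS × VALUES → DICTIONARIES.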

5 One Implementation of a Dictionary: An Association List
( (key_1, value_1), ..., (key_n, value_n) )
[Diagram: a linked list of cells (key_1, value_1) → (key_2, value_2) → ... → (key_n, value_n).]
Worst case time for GET is Θ(n) cell examinations. Expected case time for a successful GET is about n/2 cell examinations, also Θ(n).
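To make the cell-examination counts concrete, here is a small sketch of an association list whose GET also reports how many cells it looked at. The class name `AssociationList` and the counting return value are illustrative assumptions, and a Python list stands in for the linked cells.

```python
class AssociationList:
    """Dictionary stored as an unordered sequence of (key, value) cells."""

    def __init__(self):
        self.cells = []

    def put(self, key, value):
        # Overwrite the value if the key is present, otherwise add a cell.
        for i, (k, _) in enumerate(self.cells):
            if k == key:
                self.cells[i] = (key, value)
                return
        self.cells.append((key, value))

    def get(self, key):
        """Return (value, cells_examined); value is None if key is absent."""
        examined = 0
        for k, v in self.cells:
            examined += 1
            if k == key:
                return v, examined
        return None, examined


alist = AssociationList()
for i in range(1, 11):
    alist.put(i, f"value{i}")

# Average number of cells examined over all successful lookups.
average = sum(alist.get(i)[1] for i in range(1, 11)) / 10
```

With n = 10 cells, the last key costs 10 examinations and the average over all successful lookups is 5.5, that is (n + 1)/2, which is the "about n/2" figure.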

6 Hashing: Practical Implementations of the Dictionary ADT.
A hash table is a 1-dimensional array in which each cell stores zero or more associations of a dictionary. The array index for (key_i, value_i) is determined by applying a "hash function" to the key: h(key_i), and then possibly taking additional steps, depending on the particular hashing method and whether there are any "collisions".

7 Hashing with Chains
[Diagram: key_i is hashed to table index h(key_i). That table entry heads the chain (key_i1, value_i1), (key_i2, value_i2), ..., (key_in, value_in); another entry heads the chain (key_j1, value_j1), (key_j2, value_j2).]
Each table entry is the head of a linked list of elements all of which share the same hash value h(key_i). This is sometimes called "open" hashing.
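A minimal sketch of hashing with chains follows. It is an illustration under stated assumptions, not the course's code: the class name `ChainedHashTable` and the `hash_function` parameter are hypothetical, and each chain is a Python list standing in for a linked list.

```python
class ChainedHashTable:
    """Hash table in which every entry holds a chain of (key, value) pairs."""

    def __init__(self, size=10, hash_function=hash):
        self.size = size
        self.hash_function = hash_function
        # One initially empty chain per table entry.
        self.table = [[] for _ in range(size)]

    def _chain(self, key):
        # All keys with the same index share one chain.
        return self.table[self.hash_function(key) % self.size]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: overwrite
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        raise KeyError(key)
```

A GET examines only the cells in one chain, so its cost depends on the chain length and not on the total number of associations.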

8 Closed Hashing
[Diagram: key_i is hashed to table index h(key_i); the cell at that index holds key_i and value_i directly.]
The associations are all stored within the hash table, not on linked lists.

9 Hashing Example
Keys: 4-digit numbers. Values: names. Let h(d1d2d3d4) = (d1 + d2 + d3 + d4) mod 10. E.g., h(1978) = (1 + 9 + 7 + 8) mod 10 = 5. Data: (1978, VAX-11/780), (1982, IBM-PC), (1984, Macintosh)
0: IBM-PC
1:
2: Macintosh
3:
4:
5: VAX-11/780
6:
7:
8:
9:
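The digit-sum hash function can be checked with a few lines of code. This sketch assumes the modulus is 10, which matches the ten-cell table and h(1978) = 5; the function name `h` follows the slide, while `table_size` is a hypothetical parameter.

```python
def h(key, table_size=10):
    """Sum the decimal digits of key and reduce modulo table_size."""
    return sum(int(digit) for digit in str(key)) % table_size


data = [(1978, "VAX-11/780"), (1982, "IBM-PC"), (1984, "Macintosh")]

# Place each name at the index given by the hash of its key.
# No two of these three keys collide, so plain assignment is enough here.
table = [None] * 10
for year, name in data:
    table[h(year)] = name
```

After the loop, cells 0, 2, and 5 hold IBM-PC, Macintosh, and VAX-11/780, as in the table above.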

10 Hashing Example (continued)
Let h(d1d2d3d4) = (d1 + d2 + d3 + d4) mod 10. E.g., h(1978) = 5. Put (1993, Java). h(1993) = (1 + 9 + 9 + 3) mod 10 = 2. Collision! Cell 2 already holds Macintosh.
0: IBM-PC
1:
2: Macintosh
3:
4:
5: VAX-11/780
6:
7:
8:
9:

11 Hashing Example (continued)
Linear Probing: Try (h(1993) + 1) mod 10 = 3. If we keep getting collisions, the formula is (h(d1d2d3d4) + k) mod 10 on the kth attempt. If all 10 positions are full, a new hash table must be created and all the old elements placed in the new table.
0: IBM-PC
1:
2: Macintosh
3: Java
4:
5: VAX-11/780
6:
7:
8:
9:
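The probing steps in this example can be sketched as follows. The function names `probe_insert` and `probe_get` are hypothetical, the digit-sum hash is taken modulo 10 as in the example, and the first attempt is counted as k = 0.

```python
def h(key, n=10):
    """Digit-sum hash from the example, reduced modulo the table size n."""
    return sum(int(digit) for digit in str(key)) % n


def probe_insert(table, key, value):
    """Insert with linear probing and return the index that was used."""
    n = len(table)
    for k in range(n):
        index = (h(key, n) + k) % n
        if table[index] is None or table[index][0] == key:
            table[index] = (key, value)
            return index
    raise OverflowError("all positions are full; a new table is needed")


def probe_get(table, key):
    """Follow the same probe sequence until the key or an empty cell is found."""
    n = len(table)
    for k in range(n):
        index = (h(key, n) + k) % n
        if table[index] is None:
            break
        if table[index][0] == key:
            return table[index][1]
    raise KeyError(key)


table = [None] * 10
for year, name in [(1978, "VAX-11/780"), (1982, "IBM-PC"),
                   (1984, "Macintosh"), (1993, "Java")]:
    probe_insert(table, year, name)
```

Here 1993 hashes to 2, finds Macintosh there, and lands in cell 3 on the next attempt. Each cell stores the key along with the value so that GET can tell which association it has reached.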

12 Collision Resolution Methods
In each formula, k is the attempt number.
Linear Probing: (h(key) + ck) mod n
Quadratic Probing: (h(key) + ck^2) mod n
Double hashing: (h(key) + k × h2(key)) mod n, where h2 is a second hash function
Rehashing: Create a larger hash table and try again. (Only perform when the load factor is at least 0.5.)
Note: n should normally be a prime number to help avoid collisions. The constant c is normally small, and typically is 1.
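The probe sequences these methods generate can be compared side by side. This is a sketch under assumptions: the function names are hypothetical, `home` stands for h(key), `step` stands for h2(key), and double hashing is written in its usual textbook form (home + k × step) mod n because the formula in the transcript is garbled.

```python
def linear_probe(home, k, n, c=1):
    """Index tried on attempt k with linear probing."""
    return (home + c * k) % n


def quadratic_probe(home, k, n, c=1):
    """Index tried on attempt k with quadratic probing."""
    return (home + c * k * k) % n


def double_hash_probe(home, k, n, step):
    """Index tried on attempt k when the step size comes from a second hash."""
    return (home + k * step) % n


n = 11      # a prime table size, as the note recommends
home = 2    # assumed value of h(key)
linear = [linear_probe(home, k, n) for k in range(5)]
quadratic = [quadratic_probe(home, k, n) for k in range(5)]
double = [double_hash_probe(home, k, n, step=3) for k in range(5)]
# linear    -> [2, 3, 4, 5, 6]
# quadratic -> [2, 3, 6, 0, 7]
# double    -> [2, 5, 8, 0, 3]
```

Linear probing visits neighbouring cells, so colliding keys tend to pile up in one region of the table; the other two methods spread later attempts further apart.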

13 Deletion in Closed Hash Tables
Simply removing an association can break "chains" (probe sequences) formed after collisions, so that GET stops at the emptied cell and fails to find associations that collided with the now-deleted association. Therefore, it is common to use a "delete bit" to mark an association as deleted. When the table gets too full of associations and deleted associations, rehashing is necessary. The rehashing may use a table of the same size as or smaller than before (if many of the entries are deleted entries), or it may use a larger table.
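A minimal sketch of deletion with a marker follows, assuming linear probing and integer keys. The names `DELETED`, `put`, `get`, `delete`, and `rehash`, and the hash `key % n`, are hypothetical; the `DELETED` sentinel plays the role of the "delete bit".

```python
EMPTY = None
DELETED = object()  # marker for a cell whose association was deleted


def _probe(table, key):
    """Yield the linear-probing indices for key (hypothetical hash: key % n)."""
    n = len(table)
    for k in range(n):
        yield (key % n + k) % n


def put(table, key, value):
    first_free = None
    for i in _probe(table, key):
        cell = table[i]
        if cell is EMPTY or cell is DELETED:
            if first_free is None:
                first_free = i      # remember the first reusable cell
            if cell is EMPTY:
                break               # key cannot appear beyond an empty cell
        elif cell[0] == key:
            table[i] = (key, value)
            return
    if first_free is None:
        raise OverflowError("table is full; rehash into a larger table")
    table[first_free] = (key, value)


def get(table, key):
    for i in _probe(table, key):
        cell = table[i]
        if cell is EMPTY:
            break                   # a truly empty cell ends the search
        if cell is not DELETED and cell[0] == key:
            return cell[1]          # deleted cells are skipped, not stops
    raise KeyError(key)


def delete(table, key):
    for i in _probe(table, key):
        cell = table[i]
        if cell is EMPTY:
            break
        if cell is not DELETED and cell[0] == key:
            table[i] = DELETED      # mark the cell instead of emptying it
            return
    raise KeyError(key)


def rehash(table, new_size):
    """Copy live associations into a fresh table, dropping deleted markers."""
    new_table = [EMPTY] * new_size
    for cell in table:
        if cell is not EMPTY and cell is not DELETED:
            put(new_table, cell[0], cell[1])
    return new_table
```

For example, if keys 2 and 12 both hash to cell 2 and key 2 is then deleted, `get` still reaches 12 in cell 3 because it steps over the marker in cell 2 instead of stopping there.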

