Chapter 11 Hash Anshuman Razdan Div of Computing Studies

razdan@asu.edu http://dcst2.east.asu.edu/~razdan/cst230/

CST 230 - Razdan et al.

Searching
Searching for a specific value among a collection of values is a common operation. What is the complexity of search (find) using:
– an array
– a linked list
– an ordered list
– a binary tree
– a BST

Linear Search
Search an array A of n elements for a specified element target:

    i = 0;
    found = false;
    while( (i < n) && !found )
        if( A[i] == target )    // use A[i].equals(target) for objects
            found = true;
        else
            i++;
    if( found )
        target is at position i
    else
        target is not in array
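The pseudocode above translates directly into Java. This is a minimal sketch, assuming an int array and the convention (not stated on the slide) that -1 means "not found"; the class name LinearSearchDemo is hypothetical.

```java
// Linear search: scan the array left to right until target is found.
class LinearSearchDemo {
    // Returns the index of target in a, or -1 if it is absent.
    static int linearSearch(int[] a, int target) {
        int i = 0;
        boolean found = false;
        while (i < a.length && !found) {
            if (a[i] == target)   // for object arrays, compare with a[i].equals(target)
                found = true;
            else
                i++;
        }
        return found ? i : -1;
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 4};
        System.out.println(linearSearch(a, 9)); // prints 2
        System.out.println(linearSearch(a, 5)); // prints -1
    }
}
```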

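Complexity of Linear Search
Count the number of comparisons that must be done.
– Worst case: n comparisons (the target is in the last position or is not in the array), so O(n).
– Average case: about (n + 1)/2 comparisons for a successful search, if the target is equally likely to be at any position; this is still O(n).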

Binary Search
Search a sorted array A for a specified element target, looking only at the n elements starting at index first:

    public static int BinarySearch( int[] A, int first, int n, int target ){
        int middle;
        int found;                          // index of target, or -1 if absent
        if( n <= 0 )
            found = -1;                     // empty range: target is not present
        else{
            middle = first + n/2;
            if( target == A[middle] )
                found = middle;
            else if( target < A[middle] )   // search the n/2 elements left of middle
                found = BinarySearch( A, first, n/2, target );
            else                            // search the (n-1)/2 elements right of middle
                found = BinarySearch( A, middle+1, (n-1)/2, target );
        }
        return found;
    }

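Complexity of BinarySearch
The body of BinarySearch takes constant time, so we need to count the number of calls made to BinarySearch. To do this, find the depth of recursive calls: the length of the longest chain of recursive calls in the execution of the algorithm. Each recursive call searches at most half of the current n elements, so the depth is at most $\lfloor \log_2 n \rfloor + 2$ and the worst-case time is O(log n).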
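To see the logarithmic depth concretely, this sketch mirrors the slide's BinarySearch with a hypothetical static call counter added. Searching for a value smaller than every element always recurses into the left half (n/2 elements), which produces a longest chain of calls; the class and helper names are illustrative only.

```java
// Counts the calls made by a recursive binary search to show the depth grows like log2(n).
class BinarySearchDepth {
    static int calls = 0; // hypothetical counter, not part of the slide's method

    static int binarySearch(int[] a, int first, int n, int target) {
        calls++;
        if (n <= 0) return -1;
        int middle = first + n / 2;
        if (target == a[middle]) return middle;
        else if (target < a[middle]) return binarySearch(a, first, n / 2, target);
        else return binarySearch(a, middle + 1, (n - 1) / 2, target);
    }

    // Number of calls when the target is smaller than every element (always go left).
    static int callsWhenTargetTooSmall(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 2 * i; // sorted: 0, 2, 4, ...
        calls = 0;
        binarySearch(a, 0, n, -1);               // -1 is below every element, never found
        return calls;
    }

    public static void main(String[] args) {
        for (int n : new int[]{1, 7, 1000, 1_000_000})
            System.out.println(n + " elements -> " + callsWhenTargetTooSmall(n) + " calls");
    }
}
```

For n = 1, 7, 1000 and 1,000,000 this reports 2, 4, 11 and 21 calls, which matches $\lfloor \log_2 n \rfloor + 2$ in each case.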

Motivation: Direct Access is Fast
Suppose we have a large number of products to store and that each product has a unique product ID. If n products have IDs in the range 0..n-1, we can store each product in an array at index prodID.
– Time to find a product? Constant, O(1): the product is read directly at index prodID, with no searching.
If the number of IDs actually in use is much smaller than the range of possible IDs, storing each product at index prodID is VERY space inefficient.
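A direct-access table is just an array indexed by the key. This sketch assumes products are represented as Strings and that IDs lie in 0..n-1; DirectAccessTable, store and find are hypothetical names.

```java
// Direct-access table: product IDs 0..n-1 are used directly as array indices.
class DirectAccessTable {
    private final String[] products; // products[id] holds the product with that ID

    DirectAccessTable(int n) { products = new String[n]; }

    // O(1): write straight into the slot for this ID
    void store(int prodID, String product) { products[prodID] = product; }

    // O(1): no search needed; returns null if no product has this ID
    String find(int prodID) { return products[prodID]; }

    public static void main(String[] args) {
        DirectAccessTable t = new DirectAccessTable(5);
        t.store(3, "widget");
        System.out.println(t.find(3)); // prints widget
    }
}
```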

Hashing
Each element has a unique key that identifies the element.
– We have: a large range of keys.
– We want: the indices of the elements to be 0..numElem-1.
[Figure: a hash function maps keys key1, key2, ..., keyn to array indices 0, 1, ..., n-1.]

Common hashing function: Mod
The mod function is a natural choice for hashing because, for a non-negative integer x, x mod n always results in a number in the range 0..n-1. (In Java, x % n is negative when x is negative, so negative keys need adjusting first.)
E.g., insert the following numbers into a hash table of size 10: 432, 321, 17, 65, 9388, 200, 83, 564
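The mod example can be checked with a few lines of Java. This sketch assumes non-negative int keys; ModHashDemo and buildTable are hypothetical names.

```java
import java.util.Arrays;

class ModHashDemo {
    // hash(key) = key mod size; valid as an index only for non-negative keys
    static int hash(int key, int size) { return key % size; }

    // Places each key at index hash(key). A colliding key would overwrite the
    // earlier one, but the slide's key set produces no collisions.
    static Integer[] buildTable(int[] keys, int size) {
        Integer[] table = new Integer[size];
        for (int k : keys) table[hash(k, size)] = k;
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {432, 321, 17, 65, 9388, 200, 83, 564};
        for (int k : keys) System.out.println(k + " mod 10 = " + hash(k, 10));
        System.out.println(Arrays.toString(buildTable(keys, 10)));
    }
}
```

Every key lands at a distinct index here, so the table comes out as [200, 321, 432, 83, 564, 65, null, 17, 9388, null], with slots 6 and 9 left empty.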

Collisions
A perfect hashing function will produce a different index for every key. Unfortunately, mod is NOT perfect:
– 20 mod 10 = 0
– 520 mod 10 = 0
– 1030 mod 10 = 0
– etc.
When two (or more) distinct keys hash to the same index, we have a collision. There are various methods used to deal with collisions.

Open-address Hashing
One method to deal with collisions is open addressing:
– compute hash(key);
– if data[hash(key)] is not occupied, insert key there;
– else search forward starting at index hash(key) + 1 until a vacant position is found, and insert key there. (Note: the array is circular, so after the last index of the array is tried, index 0 is tried next.)
This method is also called “linear probing”.

Example
Insert keys 89, 18, 49, 58, and 9 into a hash table of size 10.
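Here is a sketch of the open-addressing insertion rule, run on the example keys. It assumes non-negative int keys and hash(key) = key % table.length; the class and method names are hypothetical.

```java
import java.util.Arrays;

class LinearProbingDemo {
    // Linear probing insert; the array is treated as circular.
    static void insert(Integer[] table, int key) {
        int i = key % table.length;            // hash(key) for non-negative int keys
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) { table[i] = key; return; }
            i = (i + 1) % table.length;        // after the last index, try index 0
        }
        throw new IllegalStateException("Table is full");
    }

    static Integer[] build(int size, int... keys) {
        Integer[] table = new Integer[size];
        for (int k : keys) insert(table, k);
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build(10, 89, 18, 49, 58, 9)));
        // prints [49, 58, 9, null, null, null, null, null, 18, 89]
    }
}
```

89 and 18 land at their home slots 9 and 8. 49 collides at 9 and wraps to 0. 58 collides at 8, tries 9 and 0, and lands at 1. Finally, 9 tries 9, 0 and 1 before landing at 2.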

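Hashing non-integer keys
Many applications require collections of objects with non-integer keys (often Strings).
– An encoding function converts the key to an integer, and the hash function is applied to the encoding.
– All Java classes (objects) include a method called hashCode.
Note: keys must be unique, and equal keys must always produce the same encoding. Ideally the encoding of distinct keys is unique as well, since keys with the same encoding always collide; this is very important when designing an encoding scheme. Java's hashCode does not guarantee unique encodings (e.g., "Aa" and "BB" have the same hashCode), so the table must still compare keys with equals.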
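Java's String.hashCode is documented as the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in int arithmetic. This sketch re-implements that encoding with Horner's rule to show that it is not unique. The names encode and hash are hypothetical helpers, not library methods.

```java
class StringKeyDemo {
    // Polynomial encoding with base 31 (int arithmetic, so large values wrap around).
    static int encode(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++)
            h = 31 * h + s.charAt(i);   // Horner's rule
        return h;
    }

    // Reduce the encoding to a table index in 0..size-1 (abs after % is never negative).
    static int hash(String key, int size) {
        return Math.abs(encode(key) % size);
    }

    public static void main(String[] args) {
        System.out.println(encode("Aa") + " " + "Aa".hashCode()); // prints 2112 2112
        System.out.println(encode("BB") + " " + "BB".hashCode()); // prints 2112 2112
        System.out.println(hash("Razdan", 10));
    }
}
```

Because "Aa" and "BB" encode to the same integer, they always hash to the same index, whatever the table size.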

Hashtable methods
Common Hashtable methods are:
– put: put a new object into the table
– containsKey: search for an object with a specified key (returns boolean)
– get: retrieve the object for a specified key
– remove: remove the object with a specified key
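The same four operations exist on the library class java.util.Hashtable. This small sketch records what each call returns; the class HashtableApiDemo is hypothetical.

```java
import java.util.Hashtable;

class HashtableApiDemo {
    // Runs the four operations and records each return value.
    static String demo() {
        Hashtable<Integer, String> table = new Hashtable<>();
        StringBuilder log = new StringBuilder();
        log.append(table.put(29, "Barb")).append(' ');    // null: key was new
        log.append(table.put(29, "Barbara")).append(' '); // Barb: previous element returned
        log.append(table.containsKey(29)).append(' ');    // true
        log.append(table.get(29)).append(' ');            // Barbara
        log.append(table.remove(29)).append(' ');         // Barbara: removed element returned
        log.append(table.get(29));                        // null: key is gone
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints null Barb true Barbara Barbara null
    }
}
```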

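Example Implementation

    public class Hashtable{
        private int manyItems;          // number of keys currently in the table
        private Object[] keys;
        private Object[] data;
        private boolean[] hasBeenUsed;  // true once a slot has ever held a key

        private int hash(Object key){
            // abs is taken after %, because Math.abs(Integer.MIN_VALUE) is still negative
            return Math.abs(key.hashCode() % data.length);
        }

        private int nextIndex(int i){
            return (i+1) % data.length;   // wrap around to index 0
        }
        ...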

Constructor

    public Hashtable( int capacity ){
        if( capacity <= 0 )
            throw new IllegalArgumentException("Capacity must be positive.");
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

findIndex

    private int findIndex( Object key ){
        // Probe from hash(key) until key is found, a never-used slot is reached,
        // or every slot has been examined. Returns -1 if key is absent.
        int count = 0;
        int i = hash(key);
        int retVal = -1;
        while( (count < data.length) && (hasBeenUsed[i]) && (retVal == -1) ){
            if( key.equals(keys[i]) )
                retVal = i;
            count++;
            i = nextIndex(i);
        }
        return retVal;
    }

put

    public Object put( Object key, Object element ){
        int index = findIndex( key );
        Object answer = null;
        if( index != -1 ){                // key already present: replace its element
            answer = data[index];
            data[index] = element;
        }
        else if( manyItems < data.length ){
            index = hash(key);
            while( keys[index] != null )  // linear probe to the first empty slot
                index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            manyItems++;
        }
        else
            throw new IllegalStateException("Table is full");
        return answer;
    }

remove

    public Object remove( Object key ){
        int index = findIndex( key );
        Object answer = null;
        if( index != -1 ){
            answer = data[index];
            keys[index] = null;   // hasBeenUsed[index] stays true so later searches probe past it
            data[index] = null;
            manyItems--;
        }
        return answer;
    }

get

    public Object get( Object key ){
        int index = findIndex( key );
        Object answer = null;
        if( index != -1 ){
            answer = data[index];
        }
        return answer;
    }

containsKey

    public boolean containsKey( Object key ){
        return findIndex( key ) != -1;
    }

Example
Show the state of the Hashtable after the following are performed (assume the hashCode of an integer is the integer itself):
– construct a Hashtable with capacity 10
– put( new Integer(29), "Barb" )
– put( new Integer(19), "Mateo" )
– put( new Integer(9), "Eddie" )
– remove( new Integer(19) )
– containsKey( new Integer(9) )
– put( new Integer(30), "Jerry" )
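One way to check the answer is to run it. This sketch assembles the slide methods into a single compilable class, lightly restructured and renamed ProbingHashtable (a hypothetical name, chosen to avoid confusion with java.util.Hashtable). It relies on Integer.hashCode returning the integer itself, which matches the example's assumption. The dump helper is also hypothetical.

```java
// The slides' open-address table assembled into one class. hash() applies
// Math.abs after % so the index is never negative.
class ProbingHashtable {
    private int manyItems;               // number of keys currently stored
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed; // true once a slot has ever held a key

    ProbingHashtable(int capacity) {
        if (capacity <= 0)
            throw new IllegalArgumentException("Capacity must be positive.");
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    private int hash(Object key) { return Math.abs(key.hashCode() % data.length); }

    private int nextIndex(int i) { return (i + 1) % data.length; }

    // Probe from hash(key) until key is found, a never-used slot is reached,
    // or every slot has been examined; returns -1 if key is absent.
    private int findIndex(Object key) {
        int i = hash(key);
        for (int count = 0; count < data.length && hasBeenUsed[i]; count++) {
            if (key.equals(keys[i])) return i;
            i = nextIndex(i);
        }
        return -1;
    }

    Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {                 // key present: replace its element
            Object answer = data[index];
            data[index] = element;
            return answer;
        }
        if (manyItems == data.length) throw new IllegalStateException("Table is full");
        index = hash(key);
        while (keys[index] != null) index = nextIndex(index); // first empty slot
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }

    Object remove(Object key) {
        int index = findIndex(key);
        if (index == -1) return null;
        Object answer = data[index];
        keys[index] = null;                // hasBeenUsed[index] stays true
        data[index] = null;
        manyItems--;
        return answer;
    }

    Object get(Object key) {
        int index = findIndex(key);
        return index == -1 ? null : data[index];
    }

    boolean containsKey(Object key) { return findIndex(key) != -1; }

    // Lists every slot that has ever been used, as "i: key=element".
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++)
            if (hasBeenUsed[i])
                sb.append(i).append(": ").append(keys[i]).append('=').append(data[i]).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        ProbingHashtable t = new ProbingHashtable(10);
        t.put(29, "Barb");    // home slot 9
        t.put(19, "Mateo");   // 9 taken, wraps to slot 0
        t.put(9, "Eddie");    // 9 and 0 taken, slot 1
        t.remove(19);         // slot 0 emptied but still marked used
        System.out.println(t.containsKey(9)); // prints true
        t.put(30, "Jerry");   // home slot 0 is empty again, so it is reused
        System.out.print(t.dump());
    }
}
```

The final state is slot 0: 30/Jerry, slot 1: 9/Eddie, slot 9: 29/Barb. Because remove leaves hasBeenUsed[0] set, containsKey(9) keeps probing past the emptied slot 0 instead of stopping there.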

Linear probing and clustering
In linear probing, when several keys hash to the same index, a “cluster” of values forms around that index.
– Elements take longer to find or add because we must move linearly through the entire cluster.
– Elements are placed farther and farther from their desired index.
– We need other methods that avoid clustering.
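A small experiment makes the clustering effect visible. This sketch (hypothetical names, non-negative int keys, hash = key % table.length) counts how many slots each linear-probing insertion examines.

```java
class ClusteringDemo {
    // Linear-probing insert that returns the number of slots examined.
    static int insertCountingProbes(Integer[] table, int key) {
        int i = key % table.length;
        for (int probes = 1; probes <= table.length; probes++) {
            if (table[i] == null) { table[i] = key; return probes; }
            i = (i + 1) % table.length;
        }
        throw new IllegalStateException("Table is full");
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int k : new int[]{0, 10, 20, 30}) // all hash to 0: cluster fills slots 0..3
            System.out.println(k + " examined " + insertCountingProbes(table, k) + " slot(s)");
        // key 1 hashes to slot 1, yet must walk past the cluster to reach slot 4
        System.out.println("1 examined " + insertCountingProbes(table, 1) + " slot(s)");
    }
}
```

Keys 0, 10, 20 and 30 examine 1, 2, 3 and 4 slots. Key 1 also examines 4 slots even though its own home slot is different, because it lands inside the cluster.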

Double Hashing
A common technique to avoid clustering is double hashing:
– use hash function hash1 to determine the desired index of the element;
– if a collision occurs, use hash function hash2 to determine the next index to search for an open spot.
In particular, if index i is occupied, the next index to examine is:
(i + hash2(key)) % data.length
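Here is a sketch of the double-hashing insertion rule. As assumptions not on the slide, it uses a table of size 7 with hash1(key) = key % 7 and hash2(key) = 1 + key % 5 for non-negative int keys. Because 7 is prime, every step value 1..5 is relatively prime to the table size. The class and method names are hypothetical.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    // Assumes size >= 3 and prime, and non-negative keys.
    static int hash1(int key, int size) { return key % size; }           // desired index
    static int hash2(int key, int size) { return 1 + key % (size - 2); } // step in 1..size-2

    // If index i is occupied, try (i + hash2(key)) % size next.
    static void insert(Integer[] table, int key) {
        int i = hash1(key, table.length);
        int step = hash2(key, table.length);
        for (int tries = 0; tries < table.length; tries++) {
            if (table[i] == null) { table[i] = key; return; }
            i = (i + step) % table.length;
        }
        throw new IllegalStateException("No open slot found");
    }

    static Integer[] build(int size, int... keys) {
        Integer[] table = new Integer[size];
        for (int k : keys) insert(table, k);
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build(7, 89, 18, 49, 58, 9)));
        // prints [49, null, 58, 9, 18, 89, null]
    }
}
```

Using the keys from the linear-probing example, 9 collides with 58 at slot 2. It then steps by hash2(9) = 5 through occupied slots 0 and 5 before landing at slot 3.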

Choosing hash2
As we step through the array, we must ensure that every array position is examined.
– We must choose hash2 so that the probe sequence does not return to the original hash index before visiting the entire array.
– The array capacity and the hash2 value should be relatively prime.
One way to accomplish this:
– choose data.length as a prime number and have hash2 return values in the range 1..data.length-1.
Donald Knuth's suggestion:
– both data.length and data.length-2 should be prime numbers (called twin primes), e.g., 1231 and 1229
– hash1(key) = Math.abs(key.hashCode()) % data.length
– hash2(key) = 1 + (Math.abs(key.hashCode()) % (data.length - 2))
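The twin-prime rule can be checked empirically by following one key's probe sequence and counting the distinct slots it reaches. The sketch uses the hash1 and hash2 formulas from the slide, except that it applies Math.abs after % (which avoids a negative result when hashCode() is Integer.MIN_VALUE). The helper slotsVisited is hypothetical.

```java
class TwinPrimeProbeCheck {
    // Slide formulas, with Math.abs applied after % so the result is never negative.
    static int hash1(Object key, int size) { return Math.abs(key.hashCode() % size); }
    static int hash2(Object key, int size) { return 1 + Math.abs(key.hashCode() % (size - 2)); }

    // Follow key's double-hashing probe sequence until it returns to its starting
    // index, counting how many distinct slots it visits.
    static int slotsVisited(Object key, int size) {
        boolean[] seen = new boolean[size];
        int start = hash1(key, size);
        int step = hash2(key, size);
        int i = start;
        int count = 0;
        do {
            if (!seen[i]) { seen[i] = true; count++; }
            i = (i + step) % size;
        } while (i != start);
        return count;
    }

    public static void main(String[] args) {
        System.out.println(slotsVisited("hello", 1231)); // prints 1231: 1231 and 1229 are twin primes
        System.out.println(slotsVisited(3, 10));         // prints 5: step 4 shares factor 2 with 10
    }
}
```

With size 1231, every key visits all 1231 slots. With a non-prime size of 10, the Integer key 3 gets step 4, which shares the factor 2 with 10. As a result its probe sequence 3, 7, 1, 5, 9 cycles through only 5 slots.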

Chained Hashing
In chaining, we allow collisions to occur and store more than one element at a given array index. How can we store more than one element at one index? Each array slot can hold a:
– list
– ordered list
– BST
If the hash function distributes keys evenly over the array, the chain at each index should be relatively short.
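This is a minimal chained table, assuming each slot holds a java.util.LinkedList of key/value entries (the first storage option listed). ChainedHashtable, Entry and chainLength are hypothetical names, and removal is omitted for brevity.

```java
import java.util.LinkedList;

class ChainedHashtable<K, V> {
    // One key/element pair stored in a chain.
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] table; // table[i] is the chain for index i

    @SuppressWarnings("unchecked")
    ChainedHashtable(int capacity) {
        table = (LinkedList<Entry<K, V>>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) table[i] = new LinkedList<>();
    }

    int hash(K key) { return Math.abs(key.hashCode() % table.length); }

    // Replace the element if key is already in its chain; otherwise extend the chain.
    V put(K key, V value) {
        LinkedList<Entry<K, V>> chain = table[hash(key)];
        for (Entry<K, V> e : chain)
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        chain.add(new Entry<>(key, value)); // a collision simply makes the chain longer
        return null;
    }

    V get(K key) {
        for (Entry<K, V> e : table[hash(key)])
            if (e.key.equals(key)) return e.value;
        return null;
    }

    int chainLength(int index) { return table[index].size(); }

    public static void main(String[] args) {
        ChainedHashtable<Integer, String> t = new ChainedHashtable<>(10);
        t.put(20, "a");
        t.put(520, "b");
        t.put(1030, "c");
        System.out.println(t.chainLength(0)); // prints 3
        System.out.println(t.get(520));       // prints b
    }
}
```

The keys 20, 520 and 1030 from the collisions slide all hash to index 0, so they share one chain of length 3 instead of competing for separate slots.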

Time Analysis
– Worst case for hashing: all keys hash to the same index (linear time).
– Best case for hashing: all keys hash to different indices (constant time).
Average-case analysis gives a better picture of what happens in practice.

Load Factor
The load factor $\alpha$ of a hash table is defined as:

$$\alpha = \frac{\text{number of elements in the table}}{\text{size of the table's array}}$$

For open-address hashing, $\alpha \le 1$. For chaining, $\alpha$ could be larger than 1.

Average Time (Linear Probing)
In open-address hashing with linear probing, a nonfull hash table, and no removals, the average number of table elements examined in a successful search is about

$$\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$$

For example, suppose we have 800 items in a table of capacity 1000. How many entries will we examine on average? Here $\alpha = 800/1000 = 0.8$, so about $\frac{1}{2}\left(1 + \frac{1}{0.2}\right) = 3$ entries.

Average Time (Double Hashing)
In open-address hashing with double hashing, a nonfull hash table, and no removals, the average number of table elements examined in a successful search is about

$$\frac{-\ln(1-\alpha)}{\alpha}$$

How many comparisons for the previous example? With $\alpha = 0.8$: $-\ln(0.2)/0.8 \approx 2.01$.

Average Time (Chaining)
With chained hashing, the average number of table elements examined in a successful search is about

$$1 + \frac{\alpha}{2}$$

How many for the previous example? With $\alpha = 0.8$: $1 + 0.8/2 = 1.4$.
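The three estimates can be evaluated side by side: ½(1 + 1/(1-α)) for linear probing, -ln(1-α)/α for double hashing, and 1 + α/2 for chaining, all for successful searches. This is a sketch with hypothetical method names.

```java
class AverageProbes {
    static double linearProbing(double alpha) { return 0.5 * (1 + 1 / (1 - alpha)); }
    static double doubleHashing(double alpha) { return -Math.log(1 - alpha) / alpha; }
    static double chaining(double alpha)      { return 1 + alpha / 2; }

    public static void main(String[] args) {
        double alpha = 800.0 / 1000; // 800 items in a table of capacity 1000
        System.out.printf("linear probing: %.2f%n", linearProbing(alpha));
        System.out.printf("double hashing: %.2f%n", doubleHashing(alpha));
        System.out.printf("chaining:       %.2f%n", chaining(alpha));
    }
}
```

For α = 0.8 this prints 3.00, 2.01 and 1.40. Even at this fairly high load factor, double hashing examines about a third fewer entries than linear probing.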

Java Data Structures
The java.util package includes the following classes (see http://java.sun.com/j2se/1.4.2/docs/api/):
– HashMap
– Hashtable
– LinkedList
as well as the interfaces:
– Iterator
– ListIterator
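Here is a short tour of these classes and interfaces, offered as a sketch. It walks a HashMap's entries with an Iterator and updates a LinkedList in place with a ListIterator. The names sumValues and doubleAll are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Map;

class JavaUtilDemo {
    // Sums the values in a HashMap by walking its entries with an Iterator.
    static int sumValues(HashMap<String, Integer> map) {
        int sum = 0;
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) sum += it.next().getValue();
        return sum;
    }

    // Doubles every element of a LinkedList in place using ListIterator.set.
    static void doubleAll(LinkedList<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) it.set(it.next() * 2); // set replaces the element next() returned
    }

    public static void main(String[] args) {
        HashMap<String, Integer> stock = new HashMap<>();
        stock.put("widget", 3);
        stock.put("gadget", 4);
        System.out.println(sumValues(stock)); // prints 7

        LinkedList<Integer> list = new LinkedList<>(Arrays.asList(1, 2, 3));
        doubleAll(list);
        System.out.println(list);             // prints [2, 4, 6]
    }
}
```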
