1 Implementing a queue in an array Hashing We show a neat idea that allows a queue to be stored in an array but takes constant time for both adding and.

Slides:



Advertisements
Similar presentations
Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Advertisements

Hash Tables.
Hash Tables and Sets Lecture 3. Sets A set is simply a collection of elements Unlike lists, elements are not ordered Very abstract, general concept with.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Lecture 11 oct 6 Goals: hashing hash functions chaining closed hashing application of hashing.
Skip List & Hashing CSE, POSTECH.
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Using arrays – Example 2: names as keys How do we map strings to integers? One way is to convert each letter to a number, either by mapping them to 0-25.
Hashing. Searching Consider the problem of searching an array for a given value –If the array is not sorted, the search requires O(n) time If the value.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
Lecture 11 oct 7 Goals: hashing hash functions chaining closed hashing application of hashing.
Hashing General idea: Get a large array
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashing. Hashing as a Data Structure Performs operations in O(c) –Insert –Delete –Find Is not suitable for –FindMin –FindMax –Sort or output as sorted.
Hashing The Magic Container. Interface Main methods: –Void Put(Object) –Object Get(Object) … returns null if not i –… Remove(Object) Goal: methods are.
CS2110 Recitation 07. Interfaces Iterator and Iterable. Nested, Inner, and static classes We work often with a class C (say) that implements a bag: unordered.
CS2110 Recitation Week 8. Hashing Hashing: An implementation of a set. It provides O(1) expected time for set operations Set operations Make the set empty.
Hashing 1. Def. Hash Table an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item).
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
CS201: Data Structures and Discrete Mathematics I Hash Table.
CSC 427: Data Structures and Algorithm Analysis
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
“Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.” – Margaret Meade Thought.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Building Java Programs Bonus Slides Hashing. 2 Recall: ADTs (11.1) abstract data type (ADT): A specification of a collection of data and the operations.
CS261 Data Structures Hash Tables Open Address Hashing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CPSC 252 Hashing Page 1 Hashing We have already seen that we can search for a key item in an array using either linear or binary search. It would be better.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
1 CSCD 326 Data Structures I Hashing. 2 Hashing Background Goal: provide a constant time complexity method of searching for stored data The best traditional.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
Building Java Programs Generics, hashing reading: 18.1.
Sets and Maps Chapter 9.
Sections 10.5 – 10.6 Hashing.
Hash table CSC317 We have elements with key and satellite data
Hashing Problem: store and retrieving an item using its key (for example, ID number, name) Linked List takes O(N) time Binary Search Tree take O(logN)
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Recitation 7 Hashing.
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
CSE 373 Data Structures and Algorithms
Hashing Alexandra Stefan.
CS202 - Fundamental Structures of Computer Science II
Sets and Maps Chapter 9.
Collision Handling Collisions occur when different elements are mapped to the same cell.
Data Structures and Algorithm Analysis Hashing
Presentation transcript:

1 Implementing a queue in an array Hashing We show a neat idea that allows a queue to be stored in an array but takes constant time for both adding and removing an element. We then discuss hashing. Implementing a queue in an array A first attempt at implementing a queue in an array usually uses the following idea: At any point, the queue contains k elements. The elements are stored in a[0], a[1],.. a[k-1] in the order in which they were placed in the queue, so a[0] is at the front. In this situation, adding an element to the queue takes constant time: place it in a[k] and increase k. But deleting an element takes O(k) time. The items in a[1..k-1] have to be moved down to a[0..k-2] and k has to be decreased. Instead let’s use two variables, f and b, to mark the front and the back of the queue: Adding an element is done as before: a[b]= element; b= b+1; Now, removing an element takes O(1) time: f= f+1; But where do we put the next element when the array looks like this? The answer is: in a[0] --we allow wraparound!. Thus, we can describe the general picture, the invariant for this class, using two pictures. The queue is defined by: front back a 0 k a.length front back a 0 f b a.length front back a 0 f b (= a.length) or We also maintain: (1) 0 <= f < a.length and 0 < b < a.length. (2) the queue elements are in a[f], a[(f+1)%a.length], a[(f+2)%a.length], …, a[(b-1)%a.length] (3) the queue is empty when b = f (4) the queue is full when (b+1)%a.length = f. To understand point (4), suppose the array has only one unoccupied element: If we try adding another element using a[b]= element; b= b+1; we end up with b=f. But b=f is supposed to describe an empty queue, not a full queue. So we have to let the above picture represent a full queue. At least one array element will always be empty. An alternative is to introduce a fresh variable, size, that will contain the number of elements in the queue. A good exercise is to write a class QueueArray, similar to class QueueVector but using an array to implement the queue, as just described. If you do this, be sure you give comments on the fields of the class that describe how the queue is implemented in the array!!!! Points (0)-(4) above should be given as comments. front back a 0 f b a.length back front a 0 b f a.length back front a 0 b f a.length

2 Hashing Hashing is a technique for maintaining a set of elements in an array. You should also read Weiss, chapter 20, which goes into more detail (but is harder to read). A set is just a collection of distinct (different) elements on which the following operations can be performed: Make the set empty Add an element to the set Remove an element from the set Get the size of the set (number of elements in it) Tell whether a value is in the set Tell whether the set is empty. Obvious first implementation: Keep the elements in an array b. The elements are in b[0..n-1], where variable n contains the size of the array. No duplicates are allowed. Problems: Adding an item take time O(n) --it shouldn’t be inserted if it is already in the set, so b[0..n-1] has first to be searched for it. Removing an item also takes time O(n) in the worst case. We would like an implementation in which the expected time for these operations is constant: O(1). Solution: Use hashing. We illustrate hashing assuming that the elements of the set are Strings. Basic idea: Rather than keep the Strings in b[0..n-1], we allow them to be anywhere in the b. We use an array whose elements are of the following nested class type: // An instance is an entry in array b private static class HashEntry { public String element; // the element public boolean isInSet; // = “element is in set” // Constructor: an entry that is in the set iff b public HashEntry( String e, boolean b ) { element = e; isInSet= b; } Each element of our array b is either null or the name of a HashEntry, and that entry indicates whether it is in the set or not. So, to remove an element of the set, just set its isInSet field to false. Hashing with linear probing. Here’s the basic idea. Suppose we want to insert the String “bc” into the set. We compute an index k of the array, using what’s called a hash function, int k= hashCode(“bc”); and try to store the element at position b[k]. If that entry is already filled with some other element, we try to store it in b[(k+1)%b.length] --note that we use wraparound, just as in implementing a queue in an array. If that position is filled, we keep trying successive elements in the same way. Each test of an array element to see whether it is the String is called a probe. The hash function just picks some index, depending on its argument. We’ll show a hash function later. Checking to see whether a String “xxx” is in the set is similar; compute k= hashCode(“xxx”) and look in successive elements of b[k..] until a null element is reached or until “xxx” is found. If it is found, it is in the set iff the position in which it is found has its isInSet field true. You might think that this is a weird way to implement the set, that it couldn’t possibly work. But it does, provided the set doesn’t fill up too much, and provided we later make some adjustments. Here’s a basic fact: Suppose String s is in the set and hashCode(s) = k. Let b[j] be the first nonnull element after b[k] (we include wraparound here). Then s is one of the elements b[k], b[k+1], …, b[j-1] (with wraparound). Then, because of the basic fact, we can write method add as follows, assuming that array b is never full: b 0 k b.length try to insert element at b[k], b[k+1], etc...

3 Hashing // Add s to this set public void add(String s) { int k= hashCode(s); while (b[k] != null && !b[k].element.equals(s)) { k= (k+1)%b.length(); } if (b[k] != null && b.isInSet) return; // s is not in the set; store it in b[k]. b[k]= new HashEntry(s, true); size= size+1; } Removing an element is just as easy. Note that removing a value from the set leaves it in the array. // Remove s from this set (if it is in it) public void remove(String s) { int k= hashCode(s); while (b[k] != null && !b[k].element.equals(s)) { k= (k+1)%b.length(); } if (b[k] == null || !b[k].isInSet) return; // s is in the set; remove it. b[k].isInSet= false; size= size-1; } Hashing functions We need a function that turns a String s into an int that is in the range of array b. It doesn’t matter what this function is as long as it distributes Strings to integers in a fairly even manner. Here is the function that Weiss uses, assuming that s has 4 characters. s[0]* s[1]* s[2]* s[3]*37 0 i.e. ((s[0]*37 + s[1])*37 + s[2])*37 + s[3] The result is then reduced modulo the size of array b to produce an int in the range of b. Some of the above calculations may overflow, but that’s okay. The overflow produces an integer in the range of int that satisfies our needs. See page 686 of Weiss for an example of this hash function as a Java method. What about the load factor? The load factor, lf, is the value of lf = (size of elements of b in use) / (size of array b) The load factor is an estimate of how full the array is. If lf is close to 0, the array is relatively empty, and hashing will be quick. If lf is close to 1, then adding and removing elements will tend to take time linear in the size of b, which is bad. Here’s what someone proved: Under certain independence assumptions, the average number of array elements examined in adding an element is 1/(1-lf). So, if the array is half full, we can expect an addition to look at 1/(1-1/2) = 2 array elements. That’s pretty good! If the set contains 1,000 elements and the array size is over 2,000, only 2 probes are needed! So, we will keep the array no more than half full. Whenever insertion of an element will increase the number of used elements to more than 1/2 the size of the array, we will “rehash”. A new array will be created and the elements that are in the set will be copied over to it. Of course, this takes time, but it is worth it. Here’s the method: /** Rehash array b */ private void rehash( ) { HashEntry[] oldb= b; // copy of array b // Create a new, empty array b= new HashEntry[nextPrime(4*size())]; size= 0; // Copy active elements from oldb to b for (int i= 0; i != oldb.length; i= i+1) if (oldb[i] !== null && oldb[i].isInSet) add(oldb[i].element); } The size of the new array is the smallest prime number that is at least 4*b.size(). The reason for choosing a prime number is explained on the next page.

4 Hashing Quadratic probing. Linear probing looks for a String in the following entries, given that the String hashed to k (we implicitly assume that wraparound is being used): b[k], b[k+1], b[k+1], b[k+1], … This tends to produce clustering --long sequences of nonnull elements. This is because two Strings that hash to k and k+1 use almost the same probe sequence. A better idea is to probe the following entries: b[k],(for obvious reasons, b[k ] this is called b[k ] “quadratic probing”) b[k ]... This has been shown to remove the “primary clustering” that happens with linear probing. However, Strings that hash to the same value k still use the same sequence of probes. There are ways to eliminate this “secondary clustering”, but we won’t go into them here. We just want to present the basic ideas. Quadratic probing has been shown to be feasible if the size of array b is a prime and if the table is always at least 1/2 empty. In this case, it has been proven that: A new element can always be added, and its probe sequence never probes the same array elements twice. Calculating the next element to probe The calculation of k+i 2 is expensive. We show how to make it more efficient. Let H i = k+i 2, for i = 0, 1, 2, 3 For i>0, we calculate: H i - H i-1 = <definition of H i and H i-1 k+i*i - (k+(i-1)*(i-1)) = 2*i - 1 Therefore, we can calculate H i from H i-1 using the formula H i = H i-1 + 2*i - 1. An implementation The CS211 course website contains a file HashSet.java --look under “recitations”. An instance of class HashSet implements a set as a hash table, using the material discussed in this handout. File Main.java contains a method main that is used to test HashSet (at least partially). When you look at HashSet, think of the following: Class HashSet contains a nested class, HashEntry. This class can be static because it does not refer to any fields or methods of class HashSet. It is nested because there is no need for the user to know anything about it. One such good use of nested classes is information hiding, as we do here. Class HashSet contains an inner class, HashSet- Enumeration. It can’t be a nested class because it DOES make use of fields of class HashSet. This is a good use of inner classes for information hiding. Enumerating the elements of the set does NOT produce them in ascending order. We do not use the String hash function described in this handout. Instead, we make use of a function hashCode that is supplied in many Java API classes. Method hashCode is first defined in class Object, the superest class of them all. Method hashCode in class String actually computes the hash code using the equivalent of: ((s[0]*31 + s[1])*31 + … )*31 + s[s.length()-1]