Java Methods A & AB Object-Oriented Programming and Data Structures Maria Litvin ● Gary Litvin Copyright © 2006 by Maria Litvin, Gary Litvin, and Skylight Publishing. All rights reserved. Lookup Tables and Hashing
24-2 Objectives: Learn about lookup tables Learn about hashing Review java.util.HashSet and java.util.HashMap
24-3 Lookup Tables A lookup table is a one-dimensional array that helps to find data very quickly. The array stores references to data records (or some values). A data record is identified by some key. The value of a key is directly translated into an array index using a simple formula.
24-4 Lookup Tables (cont’d) Only one key can be mapped onto a particular index (no collisions). The index that corresponds to a key must fall into the valid range (from 0 to array.length - 1). Access to data is “instantaneous” (O(1)).
24-5 Lookup Tables: Example 1 Zip codes Corresponding locales Some table entries remain unused
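A minimal sketch of the zip-code table, assuming the zip code itself is used as the array index. The class name ZipLookup and its put/get methods are hypothetical and do not come from the slides.

```java
class ZipLookup {
    // One slot for every possible 5-digit zip code (00000-99999).
    // Most slots stay null: these are the unused table entries.
    private final String[] locales = new String[100_000];

    // The key (zip code) is the index, so no searching is involved.
    void put(int zip, String locale) {
        locales[zip] = locale;
    }

    // Returns the locale for this zip code, or null if the slot is unused.
    String get(int zip) {
        return locales[zip];
    }

    public static void main(String[] args) {
        ZipLookup table = new ZipLookup();
        table.put(2138, "Cambridge, MA");      // zip 02138; a leading 0 would make the literal octal
        table.put(90210, "Beverly Hills, CA");
        System.out.println(table.get(90210));
        System.out.println(table.get(12345));  // unused slot
    }
}
```

The table trades memory for speed: 100,000 slots are allocated even if only a handful of zip codes are stored.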
24-6 Lookup Tables: Example 2
private static final int [ ] n_thPowerOf3 = { 1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683 };
...
// precondition: 0 <= n < 10
public int powOf3 (int n) { return n_thPowerOf3 [ n ]; }
24-7 Lookup Tables: Example 3 A color palette lists the colors used in a particular image; each palette entry corresponds to a triplet of RGB values.
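A color palette can be sketched as a lookup table indexed by the small number stored in each pixel. The 4-entry palette below and the name PaletteDemo are assumptions made for illustration; a real image format defines its own palette size and layout.

```java
class PaletteDemo {
    // Hypothetical palette: each entry packs R, G, B into one int as 0xRRGGBB.
    static final int[] PALETTE = { 0x000000, 0xFF0000, 0x00FF00, 0x0000FF };

    // Translates a palette index stored in a pixel into its {r, g, b} triplet.
    static int[] rgb(int paletteIndex) {
        int packed = PALETTE[paletteIndex];
        return new int[] { (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF };
    }

    public static void main(String[] args) {
        int[] color = rgb(1);
        System.out.println(color[0] + ", " + color[1] + ", " + color[2]);
    }
}
```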
24-8 Applications of Lookup Tables Data retrieval Data compression and encryption Tabulating functions Color mapping
24-9 Hash Tables A hash table is similar to a lookup table. The value of a key is translated into an array index using a hash function. The index computed for a key must fall into the valid range. The hash function can map different keys onto the same array index — this situation is called a collision.
24-10 Hash Tables (cont’d) The hash function should map the keys onto the array indices randomly and uniformly. A well-designed hash table and hash function minimize the number of collisions. There are two common techniques for resolving collisions: chaining and probing.
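To make collisions concrete, here is a sketch of a deliberately weak hash function: it adds up the character codes of a string and wraps the sum into the index range. The function and the name CharSumHash are assumptions for illustration only. Any two anagrams collide, which shows why a hash function should spread keys more uniformly.

```java
class CharSumHash {
    // Weak hash function: sum of character codes, wrapped into 0..tableSize-1.
    static int index(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum % tableSize;  // sum is non-negative for ordinary strings
    }

    public static void main(String[] args) {
        int tableSize = 101;
        // "listen" and "silent" contain the same letters, so they map to the same index.
        System.out.println(index("listen", tableSize));
        System.out.println(index("silent", tableSize));
    }
}
```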
24-11 Chaining Buckets Each element in the array is itself a collection, called a bucket (a list or a BST), which is searched for the desired key
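Chaining can be sketched as an array of small lists. This is a minimal illustration that assumes String keys and LinkedList buckets; the class ChainedSet is hypothetical and is not how java.util.HashSet is actually written.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedSet {
    // Each table element is a bucket: a list of the keys that hash to that index.
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedSet(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Picks the bucket for a key; floorMod keeps the index non-negative.
    private LinkedList<String> bucketFor(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Adds the key unless it is already present; returns true if it was added.
    boolean add(String key) {
        LinkedList<String> bucket = bucketFor(key);
        if (bucket.contains(key)) {
            return false;
        }
        bucket.add(key);
        return true;
    }

    // Only one bucket is searched, not the whole table.
    boolean contains(String key) {
        return bucketFor(key).contains(key);
    }

    public static void main(String[] args) {
        ChainedSet set = new ChainedSet(8);
        set.add("Aa");
        set.add("BB");  // same hashCode as "Aa", so it lands in the same bucket
        System.out.println(set.contains("Aa") && set.contains("BB"));
    }
}
```

"Aa" and "BB" have equal hash codes in Java, so they always share a bucket; the bucket search then tells them apart with equals.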
24-12 Probing The probing function recalculates the index. If the place where we want to store a key is occupied by a different key, we store the new key in another location in the same array, computed using a certain probing formula.
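One simple probing formula is linear probing: try the next index, wrapping around at the end of the array. The sketch below assumes String keys, a fixed-size table, and no removal; the class ProbingSet is hypothetical, and other probing formulas are possible.

```java
class ProbingSet {
    private final String[] slots;
    private int size;

    ProbingSet(int capacity) {
        slots = new String[capacity];
    }

    // The index where the search for a key starts.
    private int home(String key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    // Follows the probe sequence until the key or an empty slot is found.
    boolean contains(String key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i].equals(key)) {
                return true;
            }
            i = (i + 1) % slots.length;  // linear probing: move to the next slot
        }
        return false;
    }

    // Stores the key in the first free slot of its probe sequence.
    boolean add(String key) {
        if (contains(key)) {
            return false;
        }
        if (size == slots.length) {
            throw new IllegalStateException("table is full");
        }
        int i = home(key);
        while (slots[i] != null) {
            i = (i + 1) % slots.length;
        }
        slots[i] = key;
        size++;
        return true;
    }

    public static void main(String[] args) {
        ProbingSet set = new ProbingSet(4);
        set.add("Aa");
        set.add("BB");  // collides with "Aa" and is stored in the next slot
        System.out.println(set.contains("BB"));
    }
}
```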
24-13 java.util.HashSet and java.util.HashMap Classes These classes implement the Set and Map interfaces, respectively, using hash tables (with chaining). This implementation may be more efficient than TreeSet and TreeMap: add, contains, and get take O(1) time on average, compared with O(log n) for the tree-based classes.
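A short usage sketch of the two classes, assuming a word list as input. The class HashCollectionsDemo and its two helper methods are made up for this example; the library calls are standard java.util methods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HashCollectionsDemo {
    // A HashSet keeps one copy of each distinct word.
    static int countDistinct(String[] words) {
        Set<String> seen = new HashSet<>(Arrays.asList(words));
        return seen.size();
    }

    // A HashMap associates each word (key) with the number of times it occurs (value).
    static Map<String, Integer> wordCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);  // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = { "to", "be", "or", "not", "to", "be" };
        System.out.println(countDistinct(words));
        System.out.println(wordCounts(words).get("to"));
    }
}
```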
24-14 HashSet and HashMap (cont’d) Collisions are resolved through chaining. The sizes of buckets must remain relatively small. Load factor: $\text{Load factor} = \dfrac{\text{Total number of items}}{\text{Number of buckets}}$
24-15 HashSet and HashMap (cont’d) Fine tuning:
Load factor too large → lots of collisions
Load factor too small → wasted space and slow iterations over the whole set
If the load factor exceeds the specified limit, the table is automatically rehashed into a larger table; if possible, this should be avoided.
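One way to avoid rehashing is to size the table up front. The sketch below assumes the number of items is known in advance; the helpers loadFactor and capacityFor are hypothetical, while the two-argument HashMap constructor (initial capacity, load factor) is part of java.util.

```java
import java.util.HashMap;
import java.util.Map;

class LoadFactorDemo {
    // Load factor = total number of items / number of buckets.
    static double loadFactor(int items, int buckets) {
        return (double) items / buckets;
    }

    // Smallest number of buckets that keeps expectedItems at or below the limit.
    static int capacityFor(int expectedItems, double loadFactorLimit) {
        return (int) Math.ceil(expectedItems / loadFactorLimit);
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(12, 16));

        // Ask for enough buckets that 1000 entries should fit without a rehash.
        int capacity = capacityFor(1000, 0.75);
        Map<String, Integer> map = new HashMap<>(capacity, 0.75f);
        for (int i = 0; i < 1000; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size());
    }
}
```

Choosing a larger initial capacity costs memory, so this is worthwhile mainly when the expected size is known and large.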
24-16 HashSet and HashMap (cont’d) Objects in a HashSet or keys in a HashMap must have a reasonable int hashCode method that overrides Object’s hashCode and helps calculate the hashing function. The hashCode method returns an int from the entire int range; it is later mapped on the range of indices in a particular hash table. String, Integer, Double each have a reasonable hashCode defined.
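Because hashCode may return any int, including a negative one, it has to be reduced to a valid index. The sketch below uses Math.floorMod for that step; this is an assumption chosen for clarity, and the actual computation inside java.util.HashMap is different. The name IndexFromHashCode is hypothetical.

```java
class IndexFromHashCode {
    // Maps any int hash code onto 0..tableLength-1.
    // The % operator alone could return a negative number for a negative hash code.
    static int index(Object key, int tableLength) {
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        Integer key = -5;  // an Integer's hashCode is its own value
        System.out.println(key.hashCode() % 16);  // negative, not a valid index
        System.out.println(index(key, 16));       // always within 0..15
    }
}
```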
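24-17 hashCode Examples For String: $\text{hashCode} = s_0 \cdot 31^{n-1} + s_1 \cdot 31^{n-2} + \dots + s_{n-1}$ (where $s_i$ is Unicode for the i-th character in the string and $n$ is the string's length) For Person: public int hashCode ( ) { return getFirstName ().hashCode () + getLastName ().hashCode (); }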
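The String formula can be checked with a few lines of code. This sketch evaluates the base-31 polynomial with Horner's rule and relies on ordinary int overflow, as documented for String.hashCode; the names StringHash and polyHash are made up for the example.

```java
class StringHash {
    // Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using Horner's rule.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);  // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        String word = "hello";
        // The hand-computed value matches the library's hashCode.
        System.out.println(polyHash(word) == word.hashCode());
    }
}
```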
24-18 Consistency HashSet / HashMap first use hashCode, then equals. TreeSet / TreeMap use only compareTo (or a comparator). For consistent performance, these methods should agree with each other:
x.equals (y) ⇔ x.compareTo (y) == 0
x.equals (y) ⇒ x.hashCode() == y.hashCode()
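A sketch of a class whose equals, hashCode, and compareTo agree with each other. The fields and the last-name-first ordering are assumptions made for this example; only the hashCode body follows the Person example from the earlier slide.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class Person implements Comparable<Person> {
    private final String firstName;
    private final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }

    // Two persons are equal when both names match.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Person)) {
            return false;
        }
        Person p = (Person) other;
        return firstName.equals(p.firstName) && lastName.equals(p.lastName);
    }

    // Equal persons have equal names, so they get equal hash codes.
    @Override
    public int hashCode() {
        return getFirstName().hashCode() + getLastName().hashCode();
    }

    // Returns 0 exactly when equals returns true.
    @Override
    public int compareTo(Person p) {
        int c = lastName.compareTo(p.lastName);
        return c != 0 ? c : firstName.compareTo(p.firstName);
    }
}

class ConsistencyDemo {
    public static void main(String[] args) {
        Set<Person> hashed = new HashSet<>();
        Set<Person> sorted = new TreeSet<>();
        hashed.add(new Person("Ada", "Lovelace"));
        sorted.add(new Person("Ada", "Lovelace"));
        // A different but equal object is found in both kinds of set.
        Person same = new Person("Ada", "Lovelace");
        System.out.println(hashed.contains(same) && sorted.contains(same));
    }
}
```

If hashCode were left as inherited from Object, the HashSet lookup would normally fail even though equals returns true, because the search would start in the wrong bucket.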
24-19 HashSet Constructors Never mind...
24-20 HashMap Constructors
24-21 Review: What is the main difference between a lookup table and a hash table? What is a collision? Name two common techniques for resolving collisions. What is a bucket? What is a load factor?
24-22 Review (cont’d): How is hash table performance affected when the load factor is too high? Too low? What happens to a HashSet or a HashMap when the load factor exceeds the specified limit? HashSet’s no-args constructor sets the initial capacity to 16 and the load factor limit to 0.75. How many values can be stored in this table before it is rehashed?
24-23 Review (cont’d): What is the sequence of values returned by an iterator for a HashSet? What is the range of values for the hashCode method? Which method(s) of an object are used to find it in a TreeSet? A HashSet?