Sets and Maps Chapter 7 CS 225.

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Data Structures Using C++ 2E
Hashing as a Dictionary Implementation
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Lecture Objectives To learn about hash coding and its use to facilitate efficient search and retrieval To study two forms of hash tables—open addressing.
Sets and Maps ITEC200 – Week Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn.
Fall 2007CS 225 Sets and Maps Chapter 9. Fall 2007CS 225 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
Hash Tables1 Part E Hash Tables  
Sets and Maps (and Hashing)
Hash Tables1 Part E Hash Tables  
Hashing General idea: Get a large array
Fall 2007CS 225 Sets and Maps Chapter 7. Fall 2007CS 225 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Dictionaries 4/17/2017 3:23 PM Hash Tables  
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
CS2110 Recitation Week 8. Hashing Hashing: An implementation of a set. It provides O(1) expected time for set operations Set operations Make the set empty.
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
CS212: DATA STRUCTURES Lecture 10:Hashing 1. Outline 2  Map Abstract Data type  Map Abstract Data type methods  What is hash  Hash tables  Bucket.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 5. Abstract Data Structures & Algorithms 5.2 Static Data Structures.
Hashing Hashing is another method for sorting and searching data.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing as a Dictionary Implementation Chapter 19.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Building Java Programs Bonus Slides Hashing. 2 Recall: ADTs (11.1) abstract data type (ADT): A specification of a collection of data and the operations.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
Lecture Objectives  To learn about hash coding and its use to facilitate efficient search and retrieval  To study two forms of hash tables—open addressing.
Building Java Programs Generics, hashing reading: 18.1.
Sets and Maps Chapter 9.
Hash Tables 1/28/2018 Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and M.
Sections 10.5 – 10.6 Hashing.
Data Structures Using C++ 2E
Data Abstraction & Problem Solving with C++
Slides by Steve Armstrong LeTourneau University Longview, TX
Hashing Alexandra Stefan.
Week 8 - Wednesday CS221.
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Efficiency add remove find unsorted array O(1) O(n) sorted array
Hash functions Open addressing
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
Hashing CS2110 Spring 2018.
Building Java Programs
Hash Tables.
Hashing CS2110.
Sets and Maps Chapter 7.
Dictionaries and Their Implementations
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CSE 373: Data Structures and Algorithms
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Advanced Implementation of Tables
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Sets and Maps Chapter 9.
Dictionaries 4/5/2019 1:49 AM Hash Tables  
slides created by Marty Stepp
Collision Handling Collisions occur when different elements are mapped to the same cell.
Hashing based on slides by Marty Stepp
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Chapter 13 Hashing © 2011 Pearson Addison-Wesley. All rights reserved.
Presentation transcript:

Sets and Maps Chapter 7 CS 225

The Set Abstraction A set is a collection that contains no duplicate elements Operations on sets include: Testing for membership Adding elements Removing elements Union Intersection Difference Subset CS 225

Sets and the Set Interface The part of the Collection hierarchy that relates to sets includes three interfaces, two abstract classes, and two actual classes CS 225

Methods of the Set Interface Has required methods for testing set membership, testing for an empty set, determining set size, and creating an iterator over the set Two optional methods for adding an element and removing an element Constructors enforce the no duplicate members criterion Add method does not allow duplicate items to be inserted CS 225

The Set Interface (Set<E>) cardinality int size() set intersection, optional boolean retainAll(Collection<E>) set difference, optional boolean removeAll(Collection<E>) optional boolean remove( Object) Iterator<E> iterator() boolean isEmpty() subset test boolean containsAll(Collection<E>) set membership boolean contains( E) set union, optional boolean addAll( Collection<E>) false if o already present, optional boolean add( E o) Behavior Method CS 225

Maps and the Map Interface The Map is related to the Set Mathematically, a Map is a set of ordered pairs whose elements are known as the key and the value Keys are required to be unique, but values are not necessarily unique You can think of each key as a “mapping” to a particular value A map can be used to enable efficient storage and retrieval of information in a table CS 225

The Map Interface (Map<K, V>) int size() returns value associated with key or null V remove( Object key) returns previous value or null if not present V put( K key, V value) boolean isEmpty() returns object with given key or null if not present V get( Object key) Behavior Method CS 225

Hash Tables The goal behind the hash table is to be able to access an entry based on its key value, not its location We want to be able to access an element directly through its key value, rather than having to determine its location first by searching for the key value in the array Using a hash table enables us to retrieve an item in constant time (on average) with a linear (worst case) behavior CS 225

Hash Codes and Index Calculation The basis of hashing is to transform the item’s key value into an integer value which will then be transformed into a table index CS 225

Example Getting letter frequencies to generate Huffman codes For text-only documents, use the ascii code of a character as its key value table size of 128 isn't too big If you need the unicode value, the table would get too large. (65,536) Not all characters will occur Use smaller table and use mod operation to compute the index from the unicode value CS 225

Methods for Generating Hash Codes In most applications, the keys will consist of strings of letters or digits rather than a single character The number of possible key values is much larger than the table size Generating good hash codes is somewhat of an experimental process Besides a random distribution of its values, the hash function should be relatively simple and efficient to compute CS 225

Hash Codes for Strings For strings, simply summing the int values of all characters will return the same hash code for sign and sing The Java API algorithm accounts for position of the characters as well CS 225

String hashCode Method The String.hashCode() returns the integer calculated by the formula: s0 x 31(n-1) + s1 x 31(n-2) + … + sn-1 where si is the ith character of the string, and n is the length of the string “Cat” will have a hash code of: ‘C’ x 312 + ‘a’ x 31 + ‘t’ 31 is a prime number that generates relatively few collisons CS 225

Handling Collisions A collision happens when two distinct entries in the hash table have the same hash value Consider two ways to handle collisions Open addressing each element of the hash table has a single entry Chaining each element of the hash table points to a list (possibly empty) of entries CS 225

Open Addressing Linear probing can be used to access an item in a hash table If that element contains an item with a different key, increment the index by one Keep incrementing until you find the key or a null entry CS 225

Algorithm for Open Addressing Compute index using hashCode % table.length if table[index] is null add item at position index else if table[index] equals item being added item is already in the table else while table[index] not null increment index add item at new value of index CS 225

Search Termination As you increment the table index, your table should wrap around as in a circular array This leads to the possibility of an infinite loop How do you know when to stop searching if the table is full and you have not found the correct value? Stop when the index value for the next probe is the same as the hash code value for the object Ensure that the table is never full by increasing its size after an insertion if its occupancy rate exceeds a specified threshold CS 225

Example Data Name hashCode() hashCode()%5 hashCode()%11 "Tom" 84274 4 3 "Dick" 2129869 5 "Harry" 69496488 10 "Sam" 82789 "Pete" 2484038 7 CS 225

Traversing a hash table You can visit all elements in a hash table by getting each element of the array in turn The sequence of values that results is arbitrary For previous example size = 5 "Dick", "Sam", "Pete", "Harry", "Tom" size = 11 "Tom", "Dick", "Sam", "Pete", "Harry" CS 225

Deleting from a Hash Table When an item is deleted, you cannot just set its table entry to null Any element with the same hash code that was inserted into the table later will not be found Store a dummy value instead Deleted items waste storage space and reduce search efficiency CS 225

Hash Table Considerations Using a prime number for the size of the table tends to reduce collisions Load factor is ratio of filled elements to total elements load factor = actual elements / size A larger load factor will result in increased collisions Don't allow the table get too full Specify a maximum load factor If table gets too full, rehash Allocate a new table with twice the capacits Get each element from the old table, compute its index in new table and insert Don't insert deleted items CS 225

Clustering Example Consider a hash table of size 11 5(1) 6(1) 5(2) 6(2) 7(1) Consider a hash table of size 11 Add elements with hash codes 5, 6, 5, 6, 7 in that order CS 225

Reducing Collisions Using Quadratic Probing Linear probing tends to form clusters of keys in the table, causing longer search chains Quadratic probing can reduce the effect of clustering Increments form a quadratic series Disadvantage is that the next index calculation is time consuming as it involves multiplication, addition, and modulo division Not all table elements are examined when looking for an insertion index CS 225

Clustering Example Consider a hash table of size 11 Add elements with hash codes 5, 6, 5, 6, 7 in that order using quadratic probing There is less overlap in the search chains 5 -> 6 -> 9 -> 14 6 -> 7 -> 10 -> 15  5(1) 6(1) 6(2) 7(1) 5(2) CS 225

Hashing with Chaining Chaining is an alternative to open addressing Each table element references a linked list that contains all of the items that hash to the same table index The linked list is often called a bucket The approach sometimes called bucket hashing Only items that have the same value for their hash codes will be examined when looking for an object CS 225

Chaining Illustrated CS 225

Performance of Hash Tables Load factor is the number of filled cells divided by the table size Load factor has the greatest effect on hash table performance The lower the load factor, the better the performance as there is a lesser chance of collision when a table is sparsely populated CS 225

Performance of Hash Tables CS 225

Implementing a Hash Table Use interface KWHashMap and implement the interface using both open hashing and chaining. Method Behavior Object get( Object key) returns value associated with key or null boolean isEmpty() Object put( Object key, Object value) Inserts or replaces object associated with key Object remove( Object key) Removes mapping for key; returns value or null int size() CS 225

Entry class Create an inner class to store key-value pairs Data fields private Object key private Object value Constructor public Entry( Object key, Object value) Methods public Object getKey() public Object getValue(); public Object setValue(); CS 225

Open Hash Table Data Fields Private Methods private Entry table[] private static final int START_CAPACITY private double LOAD_THRESHOLD private int numKeys private int numDeletes private static final Entry DELETED Private Methods private int find( Object key) private void rehash() CS 225

find Algorithm for Open Hash Table set index to key.hashCode() % table.length if index < 0 add table.length while table[index] not empty and != key increment index if index >= table.length index = 0 return index CS 225

get Algorithm for Open Hash Table find table element for key if table element contains key return value at this element else return null CS 225

put Algorithm for Open Hash Table find table element for key if table element is empty insert new item increment numKeys rehash if needed return null else save old value of element replace value with new value return old value CS 225

Removing from Open Hash Table find table element for key if element is empty return null else save value of element replace element with DELETED increment numDeletes decrement numKeys return saved value CS 225

Rehashing Alocate a new table with twice as many elements set numKeys to 0 set numDeletes to 0 add each undeleted element of original table to new table CS 225

Chained Hash Table Data Fields Rehash when numKeys=3*CAPACITY private LinkedList table[] private static final int CAPACITY private static final int LOAD_THRESHOLD Rehash when numKeys=3*CAPACITY CS 225

get Algorithm for Chained Hash Table set index to key.hashCode % table.length if index < 0 add table.length if table[index] is empty return null for each element in list at table[index] if element key = search key return element's value return null (key not found) CS 225

put Algorithm for Chained Hash Table set index to key.hashCode % table.length if index < 0 add table.length if table[index] is empty create a new linked list at table[index] Search list at table[index] for key if successful replace value and return old value else insert new key-value pair into list increment numKeys rehash if needed return null CS 225

Removing from Chained Hash Table set index to key.hashCode % table.length if index < 0 add table.length if table[index] is empty return null Search list at table[index] for key if successful remove value and decrement numKeys if list at table[index] is empty table[index] -> null return value associated with key CS 225

Implementation Considerations for Maps and Sets Class Object implements methods hashCode and equals, so every class can access these methods unless it overrides them Object.equals compares two objects based on their addresses, not their contents Object.hashCode calculates an object’s hash code based on its address, not its contents Java recommends that if you override the equals method, then you should also override the hashCode method CS 225

Implementing HashSetOpen HashSetOpen is similar to a hash table Methods have slightly different signatures HashSetOpen is an adapter class with an internal HashTableOpen Map Method Set Method Object get( Object key) boolean contains( Object key) Object put(Object key, Object value) boolean add( Object key) Object remove( Object key) boolean remove( Object key) CS 225

Implementing the Java Map and Set Interfaces The Java API uses a hash table to implement both the Map and Set interfaces The task of implementing the two interfaces is simplified by the inclusion of abstract classes AbstractMap and AbstractSet in the Collection hierarchy CS 225

Nested Interface Map.Entry One requirement on the key-value pairs for a Map object is that they implement the interface Map.Entry<K, V>, which is an inner interface of interface Map An implementer of the Map interface must contain an inner class that provides code for the methods Object getKey() Object getValue() Object setValue( Object value) CS 225

Additional Applications of Maps Can implement the phone directory using a map PhoneDirectory Interface Map Interface addOrChangeEntry put lookUpEntry get removeEntry remove loadData none save CS 225

Additional Applications of Maps Huffman Coding Problem Use a map for creating an array of elements and replacing each input character by its bit string code in the output file Frequency table The key will be the input character The value is the character code string CS 225