9.2 Maps and Multimaps 9.3 Hash Tables 9.2, pgs. 521-530.



9.2 Maps and Multimaps
The map Functions
Defining the Compare Function
The multimap
9.2, pgs. 521-530

Maps (Associative Containers)
A map is effectively a set of key-value pairs whose values are accessed via the key (not the key-value pair).
STL map keys are ordered (sorted): iterators return entries in key-sorted order.
STL unordered_map entries are not in any particular order with respect to either their keys or their values.
The map is a template class that takes the following template parameters:
  Key_Type: The type of the keys contained in the key set
  Value_Type: The type of the values in the value set
  Compare: A function class that determines the ordering of the keys; by default this is std::less<Key_Type>
  Allocator: The memory allocator for the stored key-value pairs; we will use the library-supplied default
Example entries: (J, Jane), (B, Bill), (S, Sam), (B1, Bob), (B2, Bill)

Maps Example 01

    #include <iostream>
    #include <map>
    #include <string>
    using namespace std;

    int main()
    {
        map<string, string> myMap;
        myMap["Charlie Brown"] = "Manager";
        myMap["Snoopy"] = "Catcher";
        myMap["Lucy"] = "Right Field";
        myMap["Linus"] = "2nd Base";

        // Iterators visit the entries in key-sorted order.
        map<string, string>::iterator iter = myMap.begin();
        while (iter != myMap.end())
        {
            cout << iter->first << " => " << iter->second << endl;
            ++iter;
        }
        return 0;
    }

Output:

    Charlie Brown => Manager
    Linus => 2nd Base
    Lucy => Right Field
    Snoopy => Catcher

The subscript operator is overloaded for the map class, so you can have statements of the form v = a_map[k]; and a_map[k] = v1;

9.3 Hash Tables
Hash Codes and Index Calculation
Functions for Generating Hash Codes
Open Addressing
Traversing a Hash Table
Deleting an Item Using Open Addressing
Reducing Collisions by Expanding the Table Size
Reducing Collisions Using Quadratic Probing
Problems with Quadratic Probing
Chaining
Performance of Hash Tables
9.3, pgs. 530-541

Ordered / Unordered Maps
The C++ standard library uses a special type of binary search tree, called a balanced binary search tree, to implement the ordered set and map classes. This provides access to items in O(log n) time.
Sets and maps can also be implemented using a data structure known as a hash table, which has some advantages over balanced search trees.
The goal of a hash table is to be able to access an entry based on its key value, not its location. We want to be able to access an entry directly through its key value, rather than by having to determine its location first by searching for the key value in an array.
In a hash table, the order of elements may change when the container is modified (upon insertion or deletion).
Using a hash table enables us to retrieve an entry in constant time (on average, O(1)).
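The difference between the two container families can be seen by iterating over the same entries in each. The following is a minimal sketch, not code from the slides; the helper names keys_in_iteration_order and make_roster are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collect the keys of a map-like container in the order its iterators visit them.
template<typename Map_Type>
std::vector<std::string> keys_in_iteration_order(const Map_Type& a_map) {
    std::vector<std::string> keys;
    for (const auto& entry : a_map) {
        keys.push_back(entry.first);
    }
    return keys;
}

// Store the same three entries in whichever container type is requested.
template<typename Map_Type>
Map_Type make_roster() {
    Map_Type roster;
    roster["Snoopy"] = "Catcher";
    roster["Lucy"] = "Right Field";
    roster["Linus"] = "2nd Base";
    assert(roster.size() == 3);
    return roster;
}
```

With std::map the keys come back sorted (Linus, Lucy, Snoopy). With std::unordered_map the same three keys come back in an order that depends on the library's hash function and bucket count, so code should not rely on it.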

Hash Codes and Index Calculation
The basis of hashing is to transform the item’s key value into an integer value (its hash code), which is then transformed into a table index.

Hash Codes and Index Calculation
If a text contains only ASCII values, which are the first 128 Unicode values, we could use a table of size 128 and let each character's Unicode value be its location in the table:
    int index = ascii_char;
Part of such a table (index, then the character and the value stored for it):
    . . .
    65  A, 8
    66  B, 2
    67  C, 3
    68  D, 4
    69  E, 12
    70  F, 2
    71  G, 2
    72  H, 6
    73  I, 7
    74  J, 1
    75  K, 2
    . . .
However, what if all 65,536 characters of 16-bit Unicode were allowed? If you assume that on average 100 characters were used, you could use a table of 200 entries and compute the index by:
    int index = uni_char % 200;

Hash Codes and Index Calculation
If a text contains this snippet:
    . . . mañana (tomorrow), I'll finish my program. . .
Given the following Unicode values:
    Hexadecimal   Decimal   Name                          Character
    0x0029        41        right parenthesis             )
    0x00F1        241       small letter n with tilde     ñ
The indices for the characters 'ñ' and ')' are both 41:
    41 % 200 = 41 and 241 % 200 = 41
This is called a collision; we will discuss how to deal with collisions shortly.
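The collision above can be checked with a few lines of code. This is a small sketch under the slide's assumptions (a table of 200 entries indexed by remainder); the function names table_index and collides are illustrative, not from the text.

```cpp
#include <cassert>

// Map a Unicode value to a table index by taking the remainder.
int table_index(int uni_char, int table_size) {
    assert(table_size > 0);
    return uni_char % table_size;
}

// Two different characters collide when they map to the same index.
bool collides(int first_char, int second_char, int table_size) {
    return first_char != second_char
        && table_index(first_char, table_size) == table_index(second_char, table_size);
}
```

Here table_index(0x0029, 200) and table_index(0x00F1, 200) both evaluate to 41, so collides(0x0029, 0x00F1, 200) is true.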

Generating Hash Codes
For strings, simply summing the char values of all characters returns the same hash code for "sign" and "sing".
One algorithm that has shown good results uses the following formula:
    $s_0 \times 31^{n-1} + s_1 \times 31^{n-2} + \dots + s_{n-1}$
where $s_i$ is the ith character of the string of length n.
"Cat" has a hash code of:
    $'C' \times 31^2 + 'a' \times 31 + 't' = 67 \times 961 + 97 \times 31 + 116 = 67{,}510$
31 is a prime number, and prime numbers tend to generate relatively few collisions.
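One way to evaluate the base-31 formula is Horner's rule, which multiplies the running total by 31 before adding each character. The sketch below assumes unsigned arithmetic so that long strings wrap around instead of overflowing; the names hash_code and table_index are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Computes s0*31^(n-1) + s1*31^(n-2) + ... + s(n-1) with Horner's rule.
// Unsigned arithmetic wraps around for long strings (an assumption of this sketch).
unsigned int hash_code(const std::string& key) {
    unsigned int result = 0;
    for (unsigned char ch : key) {
        result = result * 31u + ch;
    }
    return result;
}

// Reduce the hash code to an index in a table of the given size.
std::size_t table_index(const std::string& key, std::size_t table_size) {
    assert(table_size > 0);
    return hash_code(key) % table_size;
}
```

For example, hash_code("Cat") is 67510, matching the hand calculation, and "sign" and "sing" receive different codes because the position of each character affects the result.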

Generating Hash Codes
In most applications, a key will consist of strings of letters or digits (such as a Social Security Number, an email address, or a partial ID) rather than a single character.
The number of possible key values is much larger than the table size.
Generating good hash codes is typically an experimental process. The goal is a random distribution of values; simple algorithms sometimes generate lots of collisions.
Although the hash function result appears to be random and gives a random distribution of keys, keep in mind that the calculation is deterministic: you always get the same hash code for a particular key.
If the object is not already present in the table, the probability that the table slot with this index is empty depends on how full the table is: the fuller the table, the less likely the slot is empty.
A good hash function should be relatively simple and efficient to compute. It doesn't make sense to use an O(n) hash function to avoid doing an O(n) search.

Open Addressing
We now consider two ways to organize hash tables: open addressing and chaining.
In open addressing, linear probing can be used to access an item (type Entry_Type*) in a hash table:
If the element at the index calculated for an item's key is occupied by an item with that key, we have found the item.
If that element contains an item with a different key, increment the index by one.
Keep incrementing until you find the key or a NULL entry (assuming the table is not full).

Search Termination
As you increment the table index, your table should wrap around as in a circular array. This enables you to search the part of the table before the hash code value in addition to the part of the table after the hash code value.
But this could lead to an infinite loop. How do you know when to stop searching if the table is full and you have not found the correct value?
Stop when the index value for the next probe is the same as the index where the search started (the hash code value modulo the table size).
Or ensure that the table is never full by increasing its size after an insertion when its load factor exceeds a specified threshold.
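The probing and termination rules can be combined into one search routine. The following is a sketch only: Entry, hash_code, and find_slot are hypothetical names, the base-31 string hash is assumed, and a null pointer marks an empty slot as on the open addressing slide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry type for this sketch.
struct Entry {
    std::string key;
    std::string value;
};

// Base-31 string hash (assumed here; any hash function would do).
unsigned int hash_code(const std::string& key) {
    unsigned int result = 0;
    for (unsigned char ch : key) {
        result = result * 31u + ch;
    }
    return result;
}

// Linear probing search. Returns the index of the slot holding the key, or of
// the first empty (nullptr) slot reached. Returns table.size() when the probe
// wraps back to its starting index, which means the table is full and the key
// is absent.
std::size_t find_slot(const std::vector<const Entry*>& table, const std::string& key) {
    assert(!table.empty());
    const std::size_t start = hash_code(key) % table.size();
    std::size_t index = start;
    do {
        if (table[index] == nullptr || table[index]->key == key) {
            return index;
        }
        index = (index + 1) % table.size();  // wrap around like a circular array
    } while (index != start);                // back at the start: stop searching
    return table.size();
}
```

The do-while loop guarantees at least one probe and at most table.size() probes, so a full table cannot cause an infinite loop.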

Hash Code Insertion Example
Insert the names below, in order, into a table of size 5 using linear probing.

    Name      hash_fcn()   hash_fcn() % 5
    "Tom"     84274        4
    "Dick"    2129869      4
    "Harry"   69496448     3
    "Sam"     82879        4
    "Pete"    2484038      3

Steps:
    "Tom" hashes to 4; [4] is empty, so Tom is stored at [4].
    "Dick" hashes to 4; [4] is occupied, so the probe wraps around and Dick is stored at [0].
    "Harry" hashes to 3; [3] is empty, so Harry is stored at [3].
    "Sam" hashes to 4; [4] and [0] are occupied, so Sam is stored at [1].
    "Pete" hashes to 3; [3], [4], [0], and [1] are occupied, so Pete is stored at [2].

Final table:
    [0] Dick
    [1] Sam
    [2] Pete
    [3] Harry
    [4] Tom

Retrieval of "Tom" or "Harry" takes one step, O(1). Because of collisions, retrieval of the others requires a linear search.

Hash Code Insertion Example (table size 11)

    Name      hash_fcn()   hash_fcn() % 11
    "Tom"     84274        3
    "Dick"    2129869      5
    "Harry"   69496448     10
    "Sam"     82879        5
    "Pete"    2484038      7

Resulting table: [3] Tom, [5] Dick, [6] Sam, [7] Pete, [10] Harry; all other slots are empty.
Only one collision occurred ("Sam" collides with "Dick" at [5] and is stored at [6]).
The best way to reduce the possibility of collision (and reduce linear search retrieval time because of collisions) is to increase the table size.
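The insertion example can be replayed in code for both table sizes. This sketch assumes that hash_fcn() is the base-31 string hash (it reproduces the hash values listed for the five names) and uses an empty string to mark an unused slot; build_table is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Base-31 string hash; reproduces the hash_fcn() column for the five names.
unsigned int hash_code(const std::string& key) {
    unsigned int result = 0;
    for (unsigned char ch : key) {
        result = result * 31u + ch;
    }
    return result;
}

// Insert each name with linear probing. An empty string marks an unused slot,
// so this sketch assumes the names are non-empty, distinct, and that there are
// no more names than slots.
std::vector<std::string> build_table(const std::vector<std::string>& names,
                                     std::size_t table_size) {
    assert(names.size() <= table_size);
    std::vector<std::string> table(table_size);
    for (const std::string& name : names) {
        std::size_t index = hash_code(name) % table_size;
        while (!table[index].empty()) {
            index = (index + 1) % table_size;  // occupied: try the next slot
        }
        table[index] = name;
    }
    return table;
}
```

Calling build_table with Tom, Dick, Harry, Sam, Pete and a size of 5 yields Dick, Sam, Pete, Harry, Tom in slots 0 through 4. With a size of 11 the names land at indices 3, 5, 10, 6, and 7 respectively, with Sam displaced by one slot.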

Traversing a Hash Table
You cannot traverse a hash table in a meaningful way since the sequence of stored values is arbitrary.
Traversing the table of size 5 visits: Dick, Sam, Pete, Harry, Tom
Traversing the table of size 11 visits: Tom, Dick, Sam, Pete, Harry

Deleting an Item with Open Addressing
When an item is deleted, you cannot simply set its table entry to null. If we search for an item that may have collided with the deleted item, we may conclude incorrectly that it is not in the table.
Instead, store a dummy value or mark the location as available, but previously occupied.
Deleted items reduce search efficiency, which is partially mitigated if they are marked as available.
You cannot replace a deleted item with a new item until you verify that the new item is not in the table.
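The "available, but previously occupied" marker can be sketched with a three-state slot. Everything named here (Slot_State, Slot, find_key, insert_key, erase_key) is hypothetical, and the base-31 string hash is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Slot_State { EMPTY, OCCUPIED, DELETED };

struct Slot {
    Slot_State state = Slot_State::EMPTY;
    std::string key;
};

unsigned int hash_code(const std::string& key) {
    unsigned int result = 0;
    for (unsigned char ch : key) {
        result = result * 31u + ch;
    }
    return result;
}

// Returns the index of the key, or table.size() if it is absent.
// DELETED slots are skipped; only an EMPTY slot ends the search chain.
std::size_t find_key(const std::vector<Slot>& table, const std::string& key) {
    assert(!table.empty());
    const std::size_t start = hash_code(key) % table.size();
    std::size_t index = start;
    do {
        const Slot& slot = table[index];
        if (slot.state == Slot_State::EMPTY) return table.size();
        if (slot.state == Slot_State::OCCUPIED && slot.key == key) return index;
        index = (index + 1) % table.size();
    } while (index != start);
    return table.size();
}

// Inserts the key unless it is already present or the table is full.
// A DELETED slot is reused only after find_key confirms the key is absent.
bool insert_key(std::vector<Slot>& table, const std::string& key) {
    if (find_key(table, key) != table.size()) return false;
    const std::size_t start = hash_code(key) % table.size();
    std::size_t index = start;
    do {
        if (table[index].state != Slot_State::OCCUPIED) {
            table[index].state = Slot_State::OCCUPIED;
            table[index].key = key;
            return true;
        }
        index = (index + 1) % table.size();
    } while (index != start);
    return false;
}

// Marks the slot as DELETED instead of EMPTY so later searches keep probing.
bool erase_key(std::vector<Slot>& table, const std::string& key) {
    const std::size_t index = find_key(table, key);
    if (index == table.size()) return false;
    table[index].state = Slot_State::DELETED;
    table[index].key.clear();
    return true;
}
```

In a table of size 5 holding Tom, Dick, and Sam (all of which hash to index 4), erasing Dick leaves a DELETED marker at index 0, so a later search for Sam still probes past it and finds Sam at index 1.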

Rehashing
To reduce collisions, use a prime number for the size of the table.
A fuller table results in more collisions, so when a hash table becomes sufficiently full, a larger table should be allocated and the entries reinserted.
You must reinsert (rehash) values into the new table; do not simply transfer values, as some search chains which were wrapped may break.
Deleted items are not reinserted, which saves space and reduces the length of some search chains.
Algorithm for Rehashing
1. Allocate a new hash table with twice the capacity of the original.
2. Reinsert each old table entry that has not been deleted into the new hash table.
3. Reference the new table instead of the original.
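The three rehashing steps might look like the following sketch. It assumes a three-state slot (empty, occupied, deleted) and the base-31 string hash; Slot, rehash, and live_count are illustrative names. It doubles the capacity exactly as the algorithm states, so the new size is not prime.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Slot_State { EMPTY, OCCUPIED, DELETED };

struct Slot {
    Slot_State state = Slot_State::EMPTY;
    std::string key;
};

unsigned int hash_code(const std::string& key) {
    unsigned int result = 0;
    for (unsigned char ch : key) {
        result = result * 31u + ch;
    }
    return result;
}

// Step 1: allocate a table with twice the capacity.
// Step 2: reinsert every entry that is still live, recomputing its index.
// Step 3: the caller replaces the old table with the returned one.
std::vector<Slot> rehash(const std::vector<Slot>& old_table) {
    assert(!old_table.empty());
    std::vector<Slot> new_table(2 * old_table.size());
    for (const Slot& slot : old_table) {
        if (slot.state != Slot_State::OCCUPIED) continue;  // skip empty and deleted
        std::size_t index = hash_code(slot.key) % new_table.size();
        while (new_table[index].state == Slot_State::OCCUPIED) {
            index = (index + 1) % new_table.size();
        }
        new_table[index] = slot;
    }
    return new_table;
}

// Number of live entries, used to check that nothing was lost.
std::size_t live_count(const std::vector<Slot>& table) {
    std::size_t count = 0;
    for (const Slot& slot : table) {
        if (slot.state == Slot_State::OCCUPIED) ++count;
    }
    return count;
}
```

A caller would typically write table = rehash(table); once the load factor passes its threshold. Because each index is recomputed against the new size, entries that wrapped around in the old table end up wherever their chains lead in the new one, and deleted markers disappear.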

9.3 Hash Tables
Reducing Collisions by Expanding the Table Size
Reducing Collisions Using Quadratic Probing
9.3, pgs. 530-541

Reducing Collisions
Linear probing tends to form clusters of keys in the hash table, causing longer search chains.
Quadratic probing can reduce the effect of clustering.
Increments form a quadratic series: $(H + 1^2, H + 2^2, H + 3^2, H + 4^2, \dots, H + k^2)$
If an item has a hash code of 5, successive values of index will be 6 (5 + 1), 9 (5 + 4), 14 (5 + 9), 21 (5 + 16), . . . (each taken modulo the table size).
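The quadratic series can be generated directly. In this sketch the function name quadratic_probes and its parameters are illustrative; each probe is reduced modulo the table size so that it stays inside the table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices probed after the home slot: (home + k*k) % table_size for k = 1..count.
std::vector<std::size_t> quadratic_probes(std::size_t home,
                                          std::size_t table_size,
                                          std::size_t count) {
    assert(table_size > 0);
    std::vector<std::size_t> probes;
    for (std::size_t k = 1; k <= count; ++k) {
        probes.push_back((home + k * k) % table_size);
    }
    return probes;
}
```

With a home index of 5 and a table of 101 slots the first four probes are 6, 9, 14, and 21, as on the slide. In a table of 11 slots the same series wraps around to 6, 9, 3, and 10.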

Linear Probing vs. Quadratic Probing
Given the hash function values below, fill in the hashed table values (table size 11) using a) linear probing, and b) quadratic probing.

a) Linear probing
    h(28) = 28 % 11 = 6
    h(47) = 47 % 11 = 3
    h(20) = 20 % 11 = 9
    h(36) = 36 % 11 = 3 -> 4
    h(43) = 43 % 11 = 10
    h(23) = 23 % 11 = 1
    h(25) = 25 % 11 = 3 -> 4 -> 5
    h(54) = 54 % 11 = 10 -> 0

    [0] 54   [1] 23   [2]      [3] 47   [4] 36   [5] 25   [6] 28   [7]      [8]      [9] 20   [10] 43

b) Quadratic probing
    h(28) = 28 % 11 = 6
    h(47) = 47 % 11 = 3
    h(20) = 20 % 11 = 9
    h(36) = 36 % 11 = 3 -> 4
    h(43) = 43 % 11 = 10
    h(23) = 23 % 11 = 1
    h(25) = 25 % 11 = 3 -> 4 -> 7
    h(54) = 54 % 11 = 10 -> 0

    [0] 54   [1] 23   [2]      [3] 47   [4] 36   [5]      [6] 28   [7] 25   [8]      [9] 20   [10] 43
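The exercise can be checked mechanically. The sketch below is an illustration, not the textbook's code: fill_table and EMPTY_SLOT are made-up names, keys are assumed to be non-negative integers with h(key) = key, and a key that finds no free slot within table_size probes is silently dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY_SLOT = -1;  // marks an unused slot in this sketch

// Insert the keys in order. The k-th probe for a key looks at
// (home + k) % size for linear probing or (home + k*k) % size for quadratic.
std::vector<int> fill_table(const std::vector<int>& keys,
                            std::size_t table_size,
                            bool quadratic) {
    assert(table_size > 0);
    std::vector<int> table(table_size, EMPTY_SLOT);
    for (int key : keys) {
        assert(key >= 0);
        const std::size_t home = static_cast<std::size_t>(key) % table_size;
        for (std::size_t k = 0; k < table_size; ++k) {
            const std::size_t step = quadratic ? k * k : k;
            const std::size_t index = (home + step) % table_size;
            if (table[index] == EMPTY_SLOT) {
                table[index] = key;
                break;
            }
        }
    }
    return table;
}
```

For the keys 28, 47, 20, 36, 43, 23, 25, 54 and a table of size 11, the two strategies differ only in where 25 lands: index 5 with linear probing and index 7 with quadratic probing.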

Trees

Standard Library Class pair
The C++ standard library defines the class pair in the header <utility>. This class is a simple grouping of two values, typically of different types. Since all of its members are public, it is declared as a struct.

    template<typename T1, typename T2>
    struct pair
    {
        T1 first;
        T2 second;

        // Construct a pair from two values.
        pair(const T1& x, const T2& y) : first(x), second(y) {}

        // Construct a default pair.
        pair() : first(T1()), second(T2()) {}

        // Construct an assignable pair.
        template<typename Other_T1, typename Other_T2>
        pair(const pair<Other_T1, Other_T2>& other)
        {
            first = other.first;
            second = other.second;
        }
    };

Standard Library Class pair
Pairs are used as the return type from functions that need to return two values, and as the element type for maps.

    /** Function to create a pair object */
    template<typename T1, typename T2>
    pair<T1, T2> make_pair(const T1& firstValue, const T2& secondValue)
    {
        return pair<T1, T2>(firstValue, secondValue);
    }

Comparing pair objects is needed when they are stored in ordered containers such as maps.

    /** Less-than operator */
    template<typename T1, typename T2>
    bool operator<(const pair<T1, T2>& left, const pair<T1, T2>& right)
    {
        // Order by first; fall back to second only when the firsts are equivalent.
        return (left.first < right.first)
            || (!(right.first < left.first) && (left.second < right.second));
    }

Compare Function
The default value for the Compare template parameter is the Key_Type’s less-than operator.
If Key_Type is a primitive type or the string class, the < operator is defined.
However, let us assume that we wanted to use the class Person, with data fields family_name and given_name, to be the key in a map.
The family_name should determine the ordering of Person objects.
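A Compare function class for Person might look like the sketch below. The Person layout follows the two data fields named above, while Compare_Person and first_family_name are illustrative names. As an added assumption, given_name is used as a tie-breaker so that two people sharing a family name remain distinct keys.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Person {
    std::string family_name;
    std::string given_name;
};

// Function class passed as the Compare template parameter.
// family_name determines the order; given_name only breaks ties (an assumption).
struct Compare_Person {
    bool operator()(const Person& lhs, const Person& rhs) const {
        if (lhs.family_name != rhs.family_name) {
            return lhs.family_name < rhs.family_name;
        }
        return lhs.given_name < rhs.given_name;
    }
};

// Family name of the smallest key under Compare_Person.
std::string first_family_name(const std::map<Person, std::string, Compare_Person>& a_map) {
    assert(!a_map.empty());
    return a_map.begin()->first.family_name;
}
```

A map declared as std::map<Person, std::string, Compare_Person> then iterates in family-name order. Without the tie-breaker, the map would treat any two Person objects with the same family_name as the same key.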

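Example: a set ordered by a user-defined compare function class

    #include <iostream>
    #include <string>
    #include <set>
    using namespace std;

    // Function class that orders items with the less-than operator.
    template<typename T>
    class compare
    {
    public:
        bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
    };

    // Write the items of a set, separated by spaces.
    template<typename T>
    ostream& operator<<(ostream& out, const set<T, compare<T>>& a_set)
    {
        for (typename set<T, compare<T>>::const_iterator iter = a_set.begin();
             iter != a_set.end(); ++iter)
        {
            out << " " << *iter;
        }
        return out;
    }

    int main()
    {
        set<string, compare<string>> mySet;
        string data1[] = { "Apples", "Oranges", "Pineapples", "Peaches",
                           "Apples", "Grapes", "Plums" };
        // Insert the whole array; the element count comes from sizeof,
        // not from the length of the first string.
        mySet.insert(data1, data1 + sizeof(data1) / sizeof(data1[0]));
        cout << "mySet is" << mySet << endl;
        return 0;
    }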

Maps
The map is related to the set. Mathematically, a map is a set of ordered pairs whose elements are known as the key and the value.
Keys are unique and ordered. Values need not be unique.
You can think of each key as “mapping” to a particular value.
A map provides efficient storage and retrieval of information in a table.
A map can have a many-to-one mapping: (B, Bill), (B2, Bill).
Example entries: (J, Jane), (B, Bill), (S, Sam), (B1, Bob), (B2, Bill)

Maps
In comparing maps to indexed collections, you can think of the keys as selecting the elements of a map, just as indexes select elements in a vector object.
The keys for a map, however, can have arbitrary values (not restricted to 0, 1, 2, and so on, as for indexes).
The subscript operator is overloaded for the map class, so you can have statements of the form:
    v = a_map[k];   // Assign to v the value for key k
    a_map[k] = v1;  // Set the value for key k to v1
where k is of the key type, v and v1 are of the value type, and a_map is a map.
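One property of the subscript operator is worth illustrating: for std::map, reading a_map[k] with a key that is not present inserts that key with a default-constructed value. The sketch below uses find for a lookup that leaves the map unchanged; the helper name value_or is made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Look up a key without modifying the map; return fallback if it is absent.
std::string value_or(const std::map<std::string, std::string>& a_map,
                     const std::string& k,
                     const std::string& fallback) {
    std::map<std::string, std::string>::const_iterator it = a_map.find(k);
    return it == a_map.end() ? fallback : it->second;
}
```

After a_map["Lucy"] = "Right Field";, calling value_or(a_map, "Schroeder", "?") returns "?" and the map still has one entry, whereas evaluating a_map["Schroeder"] returns an empty string and grows the map to two entries.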


Maps Example 02

    #include <iostream>
    #include <map>
    #include <string>
    using namespace std;

    int main()
    {
        map<int, string> myMap;
        myMap[1] = "Item 1";
        myMap[2] = "Item 2";
        myMap[3] = "Item 3";
        myMap[4] = "Item 4";
        cout << "Size = " << myMap.size() << endl;
        cout << myMap[3] << endl;

        myMap.insert(pair<int, string>{5, "Item 5"});

        // find returns an iterator to the entry with key 3.
        map<int, string>::iterator it = myMap.find(3);
        cout << "Key found = " << it->second << endl;
        myMap.erase(it);

        for (map<int, string>::iterator iter = myMap.begin(); iter != myMap.end(); iter++)
        {
            cout << iter->first << " => " << iter->second << endl;
        }
        return 0;
    }

Output:

    Size = 4
    Item 3
    Key found = Item 3
    1 => Item 1
    2 => Item 2
    4 => Item 4
    5 => Item 5

Maps
The key is a unique identification value associated with each item in the map.
The “value” in the key-value pair might be an object, or a pointer to an object, with objects in the class distinguished by some attribute associated with the key that is then mapped to the value.
Examples:

    Type of item          | Key               | Value
    University student    | Student ID number | Student name, address, major, grade point average
    Online store customer | E-mail address    | Customer name, address, credit card information, shopping cart
    Inventory item        | Part ID           | Description, quantity, manufacturer, cost, price