Homework will be announced soon Midterm exam date announced

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree.
Dictionaries Again Collection of pairs.  (key, element)  Pairs have different keys. Operations.  Search(theKey)  Delete(theKey)  Insert(theKey, theElement)
Skip List & Hashing CSE, POSTECH.
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Dictionaries Collection of pairs.  (key, element)  Pairs have different keys. Operations.  get(theKey)  put(theKey, theElement)  remove(theKey) 5/2/20151.
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Overflow Handling An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows by:  Search the hash table in.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
Hash Tables1 Part E Hash Tables  
Dictionaries Again Collection of pairs.  (key, element)  Pairs have different keys. Operations.  Get(theKey)  Delete(theKey)  Insert(theKey, theElement)
Hash Tables1 Part E Hash Tables  
Hashing General idea: Get a large array
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Dictionaries Collection of pairs.  (key, element)  Pairs have different keys. Operations.  get(theKey)  put(theKey, theElement)  remove(theKey)
Dictionaries Collection of pairs.  (key, element)  Pairs have different keys  E.g. a pair contains two components, student id and name (1234, Nan)
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
Comp 335 File Structures Hashing.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 8-1 Chapter 8 Hashing Introduction to Data Structure CHAPTER 8 HASHING 8.1 Symbol Table Abstract Data.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
1 Chapter 7 Skip Lists and Hashing Part 2: Hashing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Dictionaries Collection of pairs.  (key, element)  Pairs have different keys. Operations.  find(theKey)  erase(theKey)  insert(theKey, theElement)
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
Appendix I Hashing.
Dictionaries Collection of items. Each item is a pair. (key, element)
Sets and Maps Chapter 9.
Hashing (part 2) CSE 2011 Winter March 2018.
Data Structures Using C++ 2E
Hashing CSE 2011 Winter July 2018.
Slides by Steve Armstrong LeTourneau University Longview, TX
Hashing Alexandra Stefan.
EEE2108: Programming for Engineers Chapter 8. Hashing
Hashing Alexandra Stefan.
Data Structures & Algorithms Hash Tables
Data Structures Using C++ 2E
Dictionaries 9/14/ :35 AM Hash Tables   4
Hash functions Open addressing
Advanced Associative Structures
Hash Table.
Hashing CS2110.
Chapter 21 Hashing: Implementing Dictionaries and Sets
Dictionaries Collection of unordered pairs. Operations. (key, element)
Dictionaries Collection of pairs. Operations. (key, element)
Hashing.
CSCE 3110 Data Structures & Algorithm Analysis
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Overflow Handling An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows by: Search the hash table in some.
Data Structures & Algorithms Hash Tables
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Dictionaries Collection of pairs. Operations. (key, element)
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Sets and Maps Chapter 9.
Overflow Handling An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows by: Search the hash table in some.
Overflow Handling An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows by: Search the hash table in some.
Overflow Handling An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows by: Search the hash table in some.
Podcast Ch21a Title: Hash Functions
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
What we learn with pleasure we never forget. Alfred Mercier
Lecture-Hashing.
Presentation transcript:

Homework will be announced soon Midterm exam date announced

Dictionaries Collection of unordered pairs. Operations. (key, element) Pairs have different keys. Stores ‘mapping’ from key to element Operations. find(theKey) erase(theKey) insert(theKey, theElement) A linear list is an ordered sequence of elements. A dictionary is just a collection of pairs.

Application Collection of student records in this class. (key, element) = (student name, linear list of assignment and exam scores) All keys are distinct. Get the element whose key is John Adams. Update the element whose key is Diana Ross. insert() implemented as update when there is already a pair with the given key. erase() followed by insert().

Dictionary With Duplicates In this case, keys are not required to be distinct. Word dictionary. Pairs are of the form (word, meaning). May have two or more entries for the same word. (bolt, a threaded pin) (bolt, a crash of thunder) (bolt, to shoot forth suddenly) (bolt, a gulp) (bolt, a standard roll of cloth) etc. Sometimes called ‘multimap’

Represent As A Linear List L = (e0, e1, e2, e3, …, en-1) Each ei is a pair (key, element). 5-pair dictionary D = (a, b, c, d, e). a = (aKey, aElement), b = (bKey, bElement), etc. Array or linked representation. Even though a dictionary is not an ordered sequence, it may be represented as a linear list.

Array Representation find(theKey) O(size) time b c d e find(theKey) O(size) time insert(theKey, theElement) O(size) time to verify duplicate, O(1) to add at right end. erase(theKey) O(size) time.

Reminder: Binary Search Sorted Array Worst-case time complexity: O(logn) Source: https://en.wikipedia.org/wiki/File:Binary_search_into_array.png

Reminder: Binary Search Searching for 37: Source: https://blog.penjee.com/binary-vs-linear-search-animated-gifs/

Reminder: Binary Search public static int indexOf(int[] a, int key) { int lo = 0; int hi = a.length - 1; while (lo <= hi) { // Key is in a[lo..hi] or not present. int mid = lo + (hi - lo) / 2; if (key < a[mid]) hi = mid - 1; else if (key > a[mid]) lo = mid + 1; else return mid; } return -1; source: http://algs4.cs.princeton.edu/11model/BinarySearch.java.html

Reminder: Growth rates Image credit: http://math.hws.edu/javanotes/c8/rates-of-growth.png

Sorted Array elements are in ascending order of key. find(theKey) B C D E elements are in ascending order of key. find(theKey) O(log size) time insert(theKey, theElement) O(log size) time to verify duplicate, O(size) to add. erase(theKey) O(size) time. O(log (size)) for get() results from the use of a binary search.

Unsorted Chain findt(theKey) O(size) time insert(theKey, theElement) b c d e NULL firstNode findt(theKey) O(size) time insert(theKey, theElement) O(size) time to verify duplicate, O(1) to add at left end. erase(theKey) O(size) time.

Sorted Chain Elements are in ascending order of Key. find(theKey) B C D E NULL firstNode Elements are in ascending order of Key. find(theKey) O(size) time insert(theKey, theElement) O(size) time to verify duplicate, O(1) to put at proper place. Binary search cannot be done in O(log size) time on a chain, because it takes O(size) time to locate the middle node. Worst-case performance of a sorted chain is no better than that of an unsorted chain.

Sorted Chain Elements are in ascending order of Key. erase(theKey) B C D E NULL firstNode Elements are in ascending order of Key. erase(theKey) O(size) time.

Skip Lists Worst-case time for find, insert, and erase is O(size). Expected time is O(log size). We’ll skip skip lists.

Hash Tables Worst-case time for find, insert, and erase is O(size). Expected time is O(1). Can use animation at: https://www.cs.usfca.edu/~galles/visualization/OpenHash.html

Ideal Hashing Uses a 1D array (or table) table[0:b-1]. Each position of this array is a bucket. A bucket can normally hold only one dictionary pair. Uses a hash function f that converts each key k into an index in the range [0, b-1]. f(k) is the home bucket for key k. Every dictionary pair (key, element) is stored in its home bucket table[f[key]]. Bucket size equal to size of L1 cache line or of a disk track or sector may yield better performance in the case of non-ideal hashing.

Ideal Hashing Example Pairs are: (22,a), (33,c), (3,d), (73,e), (85,f). Hash table is table[0:7], b = 8. Hash function is key/11. Pairs are stored in table as below: (3,d) (22,a) (33,c) (73,e) (85,f) [0] [1] [2] [3] [4] [5] [6] [7] get, put, and remove take O(1) time.

What Can Go Wrong? (3,d) (22,a) (33,c) (73,e) (85,f) [0] [1] [2] [3] [4] [5] [6] [7] Where does (26,g) go? Keys that have the same home bucket are synonyms. 22 and 26 are synonyms with respect to the hash function that is in use. The home bucket for (26,g) is already occupied. Handle Collision and overflow Where does (100,h) go? Not mapped to a valid bucket  Choose a better hash function Where does (100,h) go? There is no bucket with index 100/11 = 9. There is a problem with the hash function, because it is supposed to map all valid keys into a valid bucket.

What Can Go Wrong? (85,f) (22,a) (33,c) (3,d) (73,e) A collision occurs when the home bucket for a new pair is occupied by a pair with a different key. An overflow occurs when there is no space in the home bucket for the new pair. When a bucket can hold only one pair, collisions and overflows occur together. Need a method to handle overflows.

Hash Table Issues Choice of hash function. Overflow handling method. Size (number of buckets) of hash table.

Hash Collision resolution Credits: By Jorge Stolfi (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons (Retrieved from https://commons.wikimedia.org/wiki/File%3AHash_table_5_0_1_1_1_1_1_LL.svg on Feb 8, 2016) Hash collision resolved by separate chaining. There are several other methods.

Hash Collision/overflow resolution Separate chaining Open addressing Linear probing Quadratic probing Double hashing Cuckoo hashing … We’ll discuss a few of these in the next lecture! Lec 17 will discuss some of these in more detail.

Hash Functions Two parts: 2. Map an integer into a home bucket. 1. Convert key into a nonnegative integer in case the key is not an integer. Done by the function hash(). 2. Map an integer into a home bucket. f(k) is an integer in the range [0, b-1], where b is the number of buckets in the table.

String To Integer Each character is 1 byte long. An int is 4 bytes. A 2 character string s may be converted into a unique 4 byte non-negative int using the code: int answer = s.at(0); answer = (answer << 8) + s.at(1); Strings that are longer than 3 characters do not have a unique non-negative int representation.

1. String To Nonnegative Integer template<> class hash<string> { public: size_t operator()(const string theKey) const {// Convert theKey to a nonnegative integer. unsigned long hashValue = 0; int length = (int) theKey.length(); for (int i = 0; i < length; i++) hashValue = 5 * hashValue + theKey.at(i); return size_t(hashValue); } }; We want the int produced somehow depend on all the characters in the string. Representations are not unique though, because the space of key can be bigger than ints.

2. Map Into A Home Bucket Most common method is by division. (85,f) [0] [1] [2] [3] [4] [5] [6] [7] Most common method is by division. homeBucket = hash(theKey) % divisor; divisor equals number of buckets b. 0 <= homeBucket < divisor = b Dynamic resizing reduces chance of ‘%’ collision. Resizing is accompanied by a full or incremental table rehash whereby existing items are mapped to new bucket locations.* Note: hash maps into a nonnegative integer. % works properly only with nonnegative integers. * Source: https://en.wikipedia.org/wiki/Hash_table#Dynamic_resizing

Uniform Hash Function Let keySpace be the set of all possible keys. (3,d) (22,a) (33,c) (73,e) (85,f) [0] [1] [2] [3] [4] [5] [6] [7] Let keySpace be the set of all possible keys. A uniform hash function maps the keys in keySpace into buckets such that approximately the same number of keys get mapped into each bucket.

Uniform Hash Function (3,d) (22,a) (33,c) (73,e) (85,f) [0] [1] [2] [3] [4] [5] [6] [7] Equivalently, the probability that a randomly selected key has bucket i as its home bucket is 1/b, 0 <= i < b. A uniform hash function minimizes the likelihood of an overflow when keys are selected at random.

Hashing By Division keySpace = all ints. For every b, the number of ints that get mapped (hashed) into bucket i is approximately 232/b. Therefore, the division method results in a uniform hash function when keySpace = all ints. In practice, keys tend to be correlated. So, the choice of the divisor b affects the distribution of home buckets.

Selecting The Divisor Because of this correlation, applications tend to have a bias towards keys that map into odd integers (or into even ones). When the divisor is an even number, odd integers hash into odd home buckets and even integers into even home buckets. 20%14 = 6, 30%14 = 2, 8%14 = 8 15%14 = 1, 3%14 = 3, 23%14 = 9 The bias in the keys results in a bias toward either the odd or even home buckets.

Selecting The Divisor When the divisor is an odd number, odd (even) integers may hash into any home. 20%15 = 5, 30%15 = 0, 8%15 = 8 15%15 = 0, 3%15 = 3, 23%15 = 8 The bias in the keys does not result in a bias toward either the odd or even home buckets. Better chance of uniformly distributed home buckets. So do not use an even divisor.

Selecting The Divisor Similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of prime numbers such as 3, 5, 7, … The effect of each prime divisor p of b decreases as p gets larger. Ideally, choose b so that it is a prime number. Alternatively, choose b so that it has no prime factor smaller than 20. Hard to generate large prime numbers (necessary due to dynamic resizing)

STL hash_map Simply uses a divisor that is an odd number. This simplifies implementation because we must be able to resize the hash table as more pairs are put into the dictionary. Array doubling, for example, requires you to go from a 1D array table whose length is b (which is odd) to an array whose length is 2b+1 (which is also odd). C++11 has unordered_map which is similar