Hashing II CS2110 Spring 2018.

Presentation transcript:

Hashing II CS2110 Spring 2018

Hash Functions
Requirements:
- deterministic
- return a number in [0..n]
Properties of a good hash:
- fast
- collision-resistant
- evenly distributed
- hard to invert
If the values are equal then… If the hashes are equal then…
For non-integer keys, encode as int (e.g., ASCII). For a string s of length n: $\sum_{i=0}^{n-1} s[i] \cdot 31^{n-1-i}$
Finding good hash functions is hard: SHA-3, 2006-2015.
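The polynomial string hash on this slide can be sketched in a few lines of Java. The class and method names (`PolyHash`, `hash`, `index`) are made up for illustration; the formula is the same one Java documents for `String.hashCode()`, evaluated here with Horner's rule.

```java
class PolyHash {
    // Computes sum_{i=0}^{n-1} s[i] * 31^(n-1-i) using Horner's rule.
    // int arithmetic wraps around on overflow, which is acceptable for a hash.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Maps a (possibly negative) hash code to a bucket index in [0, tableLength).
    static int index(String s, int tableLength) {
        return Math.floorMod(hash(s), tableLength);
    }

    public static void main(String[] args) {
        System.out.println(hash("CA"));                   // 67 * 31 + 65 = 2142
        System.out.println(hash("CA") == "CA".hashCode()); // true
    }
}
```

`Math.floorMod` is used instead of `%` because hash codes can be negative and an array index cannot.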

Hash Table
Example: add("CA") applies the hash function to "CA" and takes the result mod 6; here CA lands at index 5 of an array that already holds NY and MA.
Two ways of handling collisions:
1. Chaining
2. Open Addressing

HashSet and HashMap
Set<V> { boolean add(V value); boolean contains(V value); boolean remove(V value); }
Map<K,V> { V put(K key, V value); V get(K key); V remove(K key); }
Note: also called Map or associative array; used to implement database indexes, caches, …
Possible implementations:
- Array: put $O(1)$ if space, update $O(n)$, get $O(n)$, remove $O(n)$
- LinkedList: put $O(1)$, update $O(n)$, get $O(n)$, remove $O(n)$
- Tree: put $O(1)$/$O(\log n)$/$O(n)$, update $O(\log n)$/$O(n)$, get $O(\log n)$/$O(n)$, remove $O(\log n)$/$O(n)$
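For reference, the operations listed above correspond to methods on `java.util.HashSet` and `java.util.HashMap`. The short demo below (the class name `SetMapDemo` and the sample keys are illustrative) shows what each call returns. Note that the real `remove` methods take an `Object` rather than a `V` or `K`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SetMapDemo {
    public static void main(String[] args) {
        Set<String> states = new HashSet<>();
        System.out.println(states.add("CA"));       // true: newly added
        System.out.println(states.add("CA"));       // false: already present
        System.out.println(states.contains("NY"));  // false
        System.out.println(states.remove("CA"));    // true: it was present

        Map<String, Integer> counts = new HashMap<>();
        System.out.println(counts.put("CA", 1));    // null: no previous value
        System.out.println(counts.put("CA", 2));    // 1: the value that was replaced
        System.out.println(counts.get("CA"));       // 2
        System.out.println(counts.remove("CA"));    // 2
        System.out.println(counts.get("CA"));       // null: key is gone
    }
}
```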

Remove
Operation sequence: put('a'), put('b'), put('c'), put('d'), get('d'), remove('c'), put('e')
[Figure: the resulting table under Chaining and under Open Addressing, holding the elements a, b, c, d, e]
Exercise: analyse the running time of add/update/lookup/delete (naïve version).
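Removal is the subtle case for open addressing: simply emptying a slot can cut the probe sequence of elements stored after it. One common remedy, which the slide does not name explicitly, is to leave a marker ("tombstone") in the removed slot. The sketch below assumes linear probing, a fixed capacity, and non-null elements; `LinearProbingSet` and `TOMBSTONE` are illustrative names.

```java
class LinearProbingSet {
    private static final Object TOMBSTONE = new Object(); // marks a removed slot
    private final Object[] slots;
    private int size;

    LinearProbingSet(int capacity) {
        slots = new Object[capacity];
    }

    private int home(Object v) {
        return Math.floorMod(v.hashCode(), slots.length);
    }

    boolean contains(Object v) {
        int i = home(v);
        for (int probes = 0; probes < slots.length; probes++) {
            Object s = slots[i];
            if (s == null) return false;                   // empty slot ends the probe sequence
            if (s != TOMBSTONE && s.equals(v)) return true;
            i = (i + 1) % slots.length;                    // tombstones are skipped, not treated as empty
        }
        return false;
    }

    boolean add(Object v) {
        if (contains(v)) return false;
        int i = home(v);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null || slots[i] == TOMBSTONE) { // tombstones can be reused
                slots[i] = v;
                size++;
                return true;
            }
            i = (i + 1) % slots.length;
        }
        throw new IllegalStateException("table is full");
    }

    boolean remove(Object v) {
        int i = home(v);
        for (int probes = 0; probes < slots.length; probes++) {
            Object s = slots[i];
            if (s == null) return false;
            if (s != TOMBSTONE && s.equals(v)) {
                slots[i] = TOMBSTONE;  // keep the probe chain intact for later lookups
                size--;
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    int size() { return size; }
}
```

With chaining no such marker is needed, because removing a node from one bucket's list does not affect any other element.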

Time Complexity (no resizing)
Chaining: put(v) $O(1)$, get(v) $O(n)$, remove(v) $O(n)$
Open Addressing: put(v) $O(n)$, get(v) $O(n)$, remove(v) $O(n)$

Load Factor
Load factor: $\lambda$ = (number of entries) / (length of array)
When the array becomes too full:
- with open addressing, it becomes impossible to insert
- with chaining, inserting is still possible but the operations slow down
Solution: same concept as ArrayLists. When the array gets too full, copy to a larger array.
- typically double the size of the array
- sometimes decrease the size if the load becomes too small, but remove is a much less common operation and the array will likely have to grow again soon anyway
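A minimal sketch of the grow-by-doubling idea for a chained hash set follows. The threshold `MAX_LOAD = 0.75` and all names (`ChainingSet`, `resize`, `capacity`, `loadFactor`) are assumptions chosen for illustration, not values from the slides, and the sketch never shrinks.

```java
import java.util.LinkedList;

class ChainingSet<V> {
    private static final double MAX_LOAD = 0.75; // assumed threshold, not from the slides
    private LinkedList<V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainingSet(int capacity) {
        buckets = new LinkedList[capacity];
    }

    private static int index(Object v, int length) {
        return Math.floorMod(v.hashCode(), length);
    }

    boolean contains(V v) {
        LinkedList<V> chain = buckets[index(v, buckets.length)];
        return chain != null && chain.contains(v);
    }

    boolean add(V v) {
        if (contains(v)) return false;
        // Grow before inserting if the new element would push the load factor too high.
        if ((double) (size + 1) / buckets.length > MAX_LOAD) resize();
        insert(buckets, v);
        size++;
        return true;
    }

    boolean remove(V v) {
        LinkedList<V> chain = buckets[index(v, buckets.length)];
        if (chain == null || !chain.remove(v)) return false;
        size--;
        return true;
    }

    private void insert(LinkedList<V>[] table, V v) {
        int i = index(v, table.length);
        if (table[i] == null) table[i] = new LinkedList<>();
        table[i].addFirst(v);
    }

    // Doubles the array and rehashes every element, since indices depend on the length.
    @SuppressWarnings("unchecked")
    private void resize() {
        LinkedList<V>[] bigger = new LinkedList[2 * buckets.length];
        for (LinkedList<V> chain : buckets) {
            if (chain == null) continue;
            for (V v : chain) insert(bigger, v);
        }
        buckets = bigger;
    }

    int capacity() { return buckets.length; }
    double loadFactor() { return (double) size / buckets.length; }
}
```

Every element has to be rehashed on a resize because its bucket index is computed modulo the array length, which has just changed.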

Expected Chain Length
For each bucket, the probability that a single object is hashed to that bucket is 1/(length of array).
There are $n$ objects in the hash table.
Expected length of a chain is $n$/(length of array) $= \lambda$.

Expected Time Complexity (no resizing)
Chaining: put(v) $O(1)$, get(v) $O(1+\lambda)$, remove(v) $O(1+\lambda)$
Open Addressing: to be determined from the expected number of probes

Expected Number of Probes
We always have to probe $H(v)$.
With probability $\lambda$, the first location is full and we have to probe again.
With probability $\lambda \cdot \lambda$, the second location is also full and we have to probe yet again.
…
Expected #probes $= 1 + \lambda + \lambda^2 + \dots = \frac{1}{1-\lambda}$
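Written out, the sum above is a geometric series. The derivation below relies on the same simplifying assumption as the slide, namely that each probed location is full independently with probability $\lambda$, and it needs $\lambda < 1$ to converge.

```latex
\mathbb{E}[\#\text{probes}]
  = \sum_{k=0}^{\infty} \lambda^{k}
  = \frac{1}{1-\lambda},
  \qquad 0 \le \lambda < 1 .
% Examples:
%   \lambda = 1/2  gives  1 / (1 - 1/2)  = 2  expected probes
%   \lambda = 9/10 gives  1 / (1 - 9/10) = 10 expected probes
```

The examples show why a full table is so costly for open addressing: the expected probe count grows without bound as $\lambda$ approaches 1.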

Expected Time Complexity (no resizing)
Chaining: put(v) $O(1)$, get(v) $O(1+\lambda)$, remove(v) $O(1+\lambda)$
Open Addressing: put(v), get(v), remove(v) each $O\left(\frac{1}{1-\lambda}\right)$
Assuming constant load factor:
Chaining: put(v), get(v), remove(v) each $O(1)$
Open Addressing: put(v), get(v), remove(v) each $O(1)$
We need to dynamically resize!

Amortized Analysis
In an amortized analysis, the time required to perform a sequence of operations is averaged over all the operations.
Can be used to calculate average cost of operation vs.

Amortized Analysis of put
Assume dynamic resizing with load factor $\lambda = \frac{1}{2}$:
- Most put operations take (expected) time $O(1)$.
- If $i = 2^j$, the $i$-th put triggers a resize and takes time $O(i)$.
Total time to perform $n$ put operations is $n \cdot O(1) + O(2^0 + 2^1 + 2^2 + \dots + 2^j)$.
Average time to perform 1 put operation is $O(1) + O\left(\frac{1}{2^j} + \frac{1}{2^{j-1}} + \dots + \frac{1}{4} + \frac{1}{2} + 1\right) = O(1)$.
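The bound can also be checked by counting. The sketch below simulates only the bookkeeping of a table that doubles whenever the load factor would exceed 1/2, charging one unit per put and one unit per element copied during a resize. The names (`DoublingCost`, `totalWork`) and the cost model are assumptions made for illustration.

```java
class DoublingCost {
    // Counts unit-cost steps for n puts into a table that starts at the given
    // capacity and doubles whenever the load factor would exceed 1/2.
    static long totalWork(int n, int initialCapacity) {
        long work = 0;
        int capacity = initialCapacity;
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (2 * (size + 1) > capacity) {
                work += size;      // a resize rehashes every existing element
                capacity *= 2;
            }
            work += 1;             // the put itself
            size++;
        }
        return work;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1_000, 1_000_000}) {
            long work = totalWork(n, 2);
            System.out.println(n + " puts: " + work + " steps, " + (double) work / n + " per put");
        }
    }
}
```

Starting from capacity 2, the copied sizes are 1, 2, 4, …, all below $n$, so they sum to less than $2n$ and the total stays below $3n$: a constant number of steps per put on average, even though individual puts cost $O(n)$.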

Expected Time Complexity (with dynamic resizing)
Chaining: put(v), get(v), remove(v) each $O(1)$
Open Addressing: put(v), get(v), remove(v) each $O(1)$

Cuckoo Hashing

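Cuckoo Hashing
Alternative solution to collisions.
Assume you have two hash functions H1 and H2.
[Table: H1 and H2 values for the elements a, b, c, d, e. Extracted values: H1 = 9, 17, 11, 5; H2 = 2, 10, 3, 13]
[Figure: array showing where a, b, c, d, e are placed as they are inserted]
What if there are loops?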
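One way to make the insertion procedure and the loop problem concrete is the sketch below. It uses a single array of `int` keys, two made-up hash functions (`h1`, `h2`), and a kick limit (`MAX_KICKS`) to detect loops. None of these choices come from the slides. A fuller implementation would typically rehash with new hash functions on failure; this sketch only undoes the evictions and reports the problem.

```java
class CuckooSet {
    private static final int MAX_KICKS = 32; // assumed bound used to detect eviction loops
    private final Integer[] slots;

    CuckooSet(int capacity) {
        slots = new Integer[capacity];
    }

    // Two hypothetical hash functions, chosen only to keep the example traceable.
    private int h1(int v) { return Math.floorMod(v, slots.length); }
    private int h2(int v) { return Math.floorMod(v / slots.length, slots.length); }

    // An element can only ever live at h1(v) or h2(v), so lookup is two probes.
    boolean contains(int v) {
        Integer key = v;
        return key.equals(slots[h1(v)]) || key.equals(slots[h2(v)]);
    }

    boolean remove(int v) {
        for (int i : new int[] {h1(v), h2(v)}) {
            if (slots[i] != null && slots[i] == v) {
                slots[i] = null;
                return true;
            }
        }
        return false;
    }

    // Returns false if v is already present; throws if the evictions loop.
    boolean add(int v) {
        if (contains(v)) return false;
        int[] path = new int[MAX_KICKS];
        int cur = v;
        int pos = h1(cur);
        for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
            if (slots[pos] == null) {
                slots[pos] = cur;
                return true;
            }
            path[kicks] = pos;
            int evicted = slots[pos];   // kick out the occupant ...
            slots[pos] = cur;
            cur = evicted;              // ... and move it to its other location
            pos = (pos == h1(cur)) ? h2(cur) : h1(cur);
        }
        // Undo the swaps in reverse order so that no stored element is lost.
        for (int k = MAX_KICKS - 1; k >= 0; k--) {
            int t = slots[path[k]];
            slots[path[k]] = cur;
            cur = t;
        }
        throw new IllegalStateException("eviction loop: a rehash would be needed");
    }
}
```

With capacity 8, the keys 1, 65, and 129 all hash to the same two slots under these functions, so inserting the third one is a small instance of the loop the slide asks about.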

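Complexity of Cuckoo Hashing
Worst Case:
Chaining: put(v) $O(1)$, get(v) $O(n)$, remove(v) $O(n)$
Open Addressing: put(v) $O(n)$, get(v) $O(n)$, remove(v) $O(n)$
Cuckoo Hashing: put(v) $\infty$, get(v) $O(1)$, remove(v) $O(1)$
Expected Case:
Chaining: put(v), get(v), remove(v) each $O(1)$
Open Addressing: put(v), get(v), remove(v) each $O(1)$
Cuckoo Hashing: put(v), get(v), remove(v) each $O(1)$
Discussion: what about more than two hash functions?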

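Bloom Filters
Assume we only want to implement a set.
What if you had stored the value at "all" hash locations (instead of one)?
[Table: H1 and H2 values for the elements a, b, c, d, e. Extracted values: H1 = 9, 17, 11, 5; H2 = 2, 10, 3, 13]
[Figure: array in which the cells at every hash location of each inserted element are marked]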
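A minimal sketch of this idea, assuming `int` keys, a bit array of `m` cells, and two made-up hash functions (`h1`, `h2`), could look as follows. The names `BloomFilter`, `put`, and `mightContain` are illustrative. The sketch leaves out removal, because clearing a bit could also erase evidence of other elements that share it.

```java
import java.util.BitSet;

class BloomFilter {
    private final BitSet bits;
    private final int m;

    BloomFilter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    // Two hypothetical hash functions, chosen only to keep the example traceable.
    private int h1(int v) { return Math.floorMod(v, m); }
    private int h2(int v) { return Math.floorMod(v / m, m); }

    // Marks every hash location of v instead of storing v in one of them.
    void put(int v) {
        bits.set(h1(v));
        bits.set(h2(v));
    }

    // false means "definitely absent"; true only means "possibly present".
    boolean mightContain(int v) {
        return bits.get(h1(v)) && bits.get(h2(v));
    }

    public static void main(String[] args) {
        BloomFilter f = new BloomFilter(10);
        f.put(12);                                 // sets bits 2 and 1
        System.out.println(f.mightContain(12));    // true
        System.out.println(f.mightContain(21));    // true: a false positive (bits 1 and 2)
        System.out.println(f.mightContain(34));    // false: bit 4 is not set
    }
}
```

The key 21 was never inserted, but both of its locations were marked by 12, which is exactly the false-positive behaviour to expect from this structure.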

Features of Bloom Filters
- Worst-case $O(1)$ put, get, and remove
- Works well with higher load factors
- But: false positives