Hashing CS2110 Spring 2018.

Presentation transcript:

Hashing CS2110 Spring 2018

Hash Functions
Requirements: deterministic; return a number in [0..n].
Properties of a good hash: fast, collision-resistant, evenly distributed, hard to invert.
If the values are equal then… If the hashes are equal then…
For non-integer values, encode as an int (e.g., ASCII): $\sum_{i=0}^{n-1} s[i] \cdot 31^{n-1-i}$
Finding good hash functions is hard (SHA-3: 2006-2015).

Example: SHA-256

Example: hashCode()
Method defined in java.lang.Object.
Default implementation: typically derived from the memory address of the object.
If you override equals, you must override hashCode!
String overrides hashCode: s.hashCode() := $s[0] \cdot 31^{n-1} + s[1] \cdot 31^{n-2} + \dots + s[n-1]$
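The String formula and the equals/hashCode rule can be sketched as follows. This is a minimal illustration, not the JDK source: `polyHash` and the `Point` class are hypothetical names introduced here, and `polyHash` simply evaluates the polynomial with Horner's rule so that it can be compared against `String.hashCode()`.

```java
import java.util.Objects;

class HashCodeDemo {
    // Horner's-rule evaluation of s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    // int overflow wraps around, as it does in String.hashCode().
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Hypothetical value class: equals and hashCode are built from the same fields,
    // so equal objects always produce equal hash codes.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        System.out.println(polyHash("CA") == "CA".hashCode());
        System.out.println(new Point(1, 2).equals(new Point(1, 2)));
    }
}
```

If `Point` overrode only `equals`, two equal points could land in different buckets of a hash-based collection and a lookup could miss an element that is present.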

Application: Error Detection
Hash functions are used for error detection.
E.g., the hash of an uploaded file should be the same as the hash of the original file (if they differ, the file was corrupted).
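A minimal sketch of this check, assuming SHA-256 as the hash and in-memory byte arrays standing in for the files. `ChecksumDemo`, `sha256Hex`, and `sameContent` are hypothetical names; `MessageDigest` is the standard JDK API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChecksumDemo {
    // Returns the SHA-256 digest of the data as a lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Compares the hash of the received bytes with the hash of the original.
    static boolean sameContent(byte[] original, byte[] received) {
        return sha256Hex(original).equals(sha256Hex(received));
    }

    public static void main(String[] args) {
        byte[] original = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] corrupted = "hellp".getBytes(StandardCharsets.UTF_8);
        System.out.println(sameContent(original, original));
        System.out.println(sameContent(original, corrupted));
    }
}
```

This detects accidental corruption only; it gives no protection against someone who can replace both the file and its published hash.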

Application: Integrity
Hash functions are used to "sign" messages.
This provides integrity guarantees in the presence of an active adversary.
The principals share some secret sk.
The sender sends (m, h(m, sk)).
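One way to realize h(m, sk) is sketched below. It swaps in HMAC-SHA256 from the JDK (`javax.crypto.Mac`) as the keyed hash, which is an assumption of this example and not something the slide specifies. `MacDemo`, `tag`, and `verify` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class MacDemo {
    // Computes the tag h(m, sk), here instantiated with HMAC-SHA256.
    static byte[] tag(byte[] message, byte[] sk) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(sk, "HmacSHA256"));
            return mac.doFinal(message);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The receiver recomputes the tag with the shared secret and compares.
    static boolean verify(byte[] message, byte[] receivedTag, byte[] sk) {
        return MessageDigest.isEqual(tag(message, sk), receivedTag);
    }

    public static void main(String[] args) {
        byte[] sk = "shared secret".getBytes(StandardCharsets.UTF_8);
        byte[] m = "pay Bob 10".getBytes(StandardCharsets.UTF_8);
        byte[] t = tag(m, sk);
        System.out.println(verify(m, t, sk));
        System.out.println(verify("pay Bob 99".getBytes(StandardCharsets.UTF_8), t, sk));
    }
}
```

An adversary who modifies m in transit cannot produce a matching tag without knowing sk, so the receiver rejects the altered message.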

Application: Password Storage
Hash functions are used to store passwords.
Could store plaintext passwords. Problem: password files get stolen.
Could store (username, h(password)). Problem: password reuse (identical passwords produce identical hashes).
Instead, store (username, s, h(password, s)), where s is a salt (a random value stored with each user).
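The salted scheme can be sketched as below. The slide only says h(password, s); this example assumes PBKDF2 with HMAC-SHA256 from the JDK as h, a 16-byte random salt, and an illustrative iteration count, none of which come from the slides. `PasswordDemo` and its methods are hypothetical names.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class PasswordDemo {
    // Illustrative value only; real deployments tune this to their hardware.
    static final int ITERATIONS = 10_000;

    // Generates a fresh random salt s for one user.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Computes h(password, s); the result is what gets stored next to the salt.
    static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Login check: recompute the hash with the stored salt and compare.
    static boolean check(char[] attempt, byte[] salt, byte[] stored) {
        return MessageDigest.isEqual(hash(attempt, salt), stored);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("hunter2".toCharArray(), salt);
        System.out.println(check("hunter2".toCharArray(), salt, stored));
        System.out.println(check("hunter3".toCharArray(), salt, stored));
    }
}
```

Because each user gets a different salt, two users with the same password end up with different stored hashes.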

Application: Hash Set
Data Structure   add(val x)   lookup(int i)   find(val x)
ArrayList        O(n)         O(1)            O(n)
LinkedList       O(1)         O(n)            O(n)
TreeSet          O(log n)     -               O(log n)
HashSet          O(1)         -               O(1)
HashSet times are expected time. Worst-case: O(n).

Hash Tables
Idea: finding an element in an array takes constant time when you know which index it is stored in.
In the figure, add("CA") applies the hash function, takes the result mod 6, and stores "CA" at index 5 of a six-slot array (indices 0 to 5).
Discuss add/update/lookup/delete and analyse their running time (naïve version).
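A naïve version of this idea might look like the sketch below. It assumes `String.hashCode()` reduced mod the array length as the hash function; the slide does not say which function its figure uses, so the indices need not match the figure. `NaiveHashTable` is a hypothetical name, and collisions are deliberately not handled.

```java
class NaiveHashTable {
    private final String[] slots;

    NaiveHashTable(int capacity) {
        slots = new String[capacity];
    }

    // Maps a key to an array index; floorMod keeps the result non-negative.
    int indexOf(String key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    // Naive add: a colliding key silently overwrites the previous one.
    void add(String key) {
        slots[indexOf(key)] = key;
    }

    // Constant-time lookup: compute the index and look only there.
    boolean contains(String key) {
        return key.equals(slots[indexOf(key)]);
    }

    public static void main(String[] args) {
        NaiveHashTable table = new NaiveHashTable(6);
        table.add("Aa");
        System.out.println(table.contains("Aa"));
        table.add("BB"); // "Aa" and "BB" have the same String.hashCode()
        System.out.println(table.contains("Aa"));
    }
}
```

Every operation is a single index computation plus one array access, but the second `add` in `main` evicts the first key because both map to the same slot.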

So what goes wrong?
Two different keys k1 and k2 can produce the same hashIndex (a collision).

Can we have perfect hash functions?
Perfect hash functions map each value to a different index in the hash table.
Impossible in practice:
we don't know the size of the array;
the number of possible values far exceeds the array size;
there is no point in a perfect hash function if it takes too much time to compute.

Collision Resolution
Two ways of handling collisions:
1. Chaining
2. Open Addressing

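Chaining
Each array index holds a bucket/chain (a linked list); keys with the same hashIndex are stored in the same chain.
Example operations: add("NY"), add("CA"), lookup("CA"). In the figure, the hashIndex is 3.
Run through the lookup method: compute the hashIndex, then search the chain at that index (requires linear search).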
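Chaining can be sketched as an array of linked lists. This is a minimal illustration assuming `String.hashCode()` mod the array length as the hash function and no resizing; `ChainedHashSet` is a hypothetical name.

```java
import java.util.LinkedList;

class ChainedHashSet {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int hashIndex(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Appends the key to its chain unless it is already there.
    boolean add(String key) {
        LinkedList<String> chain = buckets[hashIndex(key)];
        if (chain.contains(key)) return false;
        chain.add(key);
        return true;
    }

    // Linear search, but only through the one chain the key can be in.
    boolean lookup(String key) {
        return buckets[hashIndex(key)].contains(key);
    }

    boolean remove(String key) {
        return buckets[hashIndex(key)].remove(key);
    }

    public static void main(String[] args) {
        ChainedHashSet set = new ChainedHashSet(6);
        set.add("NY");
        set.add("CA");
        System.out.println(set.lookup("CA"));
        System.out.println(set.lookup("VA"));
    }
}
```

The cost of `lookup` is proportional to the length of one chain, which stays short as long as the hash function spreads keys evenly.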

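Open Addressing
Probing: find another available space.
In the figure, add("CA") computes hashIndex 3; when that slot is occupied, "CA" is placed in another available slot.
Lookup: start at the hashIndex and search until the key or null is found.
Time for lookup/insert: the higher the load factor $\lambda$, the larger the expected number of probes needed to find the value:
$1 + \sum_{i \ge 1} \Pr[\text{at least } i \text{ probed slots are occupied}] \le 1 + \lambda + \lambda^2 + \dots = \frac{1}{1-\lambda}$
Delete: discuss how to delete (replace the removed entry with a dummy element so that later lookups do not stop early).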
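Lookup until null and deletion with dummy elements can be sketched as follows. The example assumes linear probing, `String.hashCode()` mod the array length as the hash function, and no resizing; `ProbingHashSet` and the `DELETED` marker are hypothetical names.

```java
class ProbingHashSet {
    // Dummy element left behind by remove; compared by identity, never by equals.
    private static final String DELETED = new String("<deleted>");
    private final String[] slots;

    ProbingHashSet(int capacity) {
        slots = new String[capacity];
    }

    private int hashIndex(String key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    // Returns the slot holding the key, or -1 if a null slot is reached first.
    private int find(String key) {
        int i = hashIndex(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) return -1;
            if (slots[i] != DELETED && slots[i].equals(key)) return i;
            i = (i + 1) % slots.length;
        }
        return -1;
    }

    boolean lookup(String key) {
        return find(key) >= 0;
    }

    // Places the key in the first empty or dummy slot of its probe sequence.
    boolean add(String key) {
        if (lookup(key)) return false;
        int i = hashIndex(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null || slots[i] == DELETED) {
                slots[i] = key;
                return true;
            }
            i = (i + 1) % slots.length;
        }
        throw new IllegalStateException("table is full");
    }

    // Marks the slot with the dummy element instead of setting it to null.
    boolean remove(String key) {
        int i = find(key);
        if (i < 0) return false;
        slots[i] = DELETED;
        return true;
    }
}
```

If `remove` set the slot back to null, a later lookup for a key stored further along the same probe sequence would stop at the gap and wrongly report the key as absent.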

Different probing strategies
When a collision occurs, how do we search for an empty space?
Linear probing: search the array in order: i, i+1, i+2, i+3, ...
Quadratic probing: search the array in a nonlinear sequence: i, i+1^2, i+2^2, i+3^2, ...
Clustering: the problem where nearby hashes have very similar probe sequences, so we get more collisions. (Discuss caching vs. clustering.)
Clustering is more problematic with linear probing. Quadratic probing attempts to solve this.
Why clustering is particularly bad: with chaining, lots of keys hashing to the same index decreases performance, but keys can hash to nearby indices without affecting each other. With probing, two keys that hash to indices nearby in the probe sequence affect the performance of both.
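The two probe sequences can be generated directly, as in this small sketch. `ProbeSequences` and its parameters are hypothetical names; indices wrap around with mod, which the slide leaves implicit.

```java
import java.util.Arrays;

class ProbeSequences {
    // Linear probing: start, start+1, start+2, ... (mod table size).
    static int[] linear(int start, int tableSize, int count) {
        int[] seq = new int[count];
        for (int k = 0; k < count; k++) {
            seq[k] = (start + k) % tableSize;
        }
        return seq;
    }

    // Quadratic probing: start, start+1^2, start+2^2, ... (mod table size).
    static int[] quadratic(int start, int tableSize, int count) {
        int[] seq = new int[count];
        for (int k = 0; k < count; k++) {
            seq[k] = (start + k * k) % tableSize;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linear(3, 11, 4)));    // [3, 4, 5, 6]
        System.out.println(Arrays.toString(quadratic(3, 11, 4))); // [3, 4, 7, 1]
    }
}
```

With linear probing, a key that hashes to 4 follows the same slots as the key that hashed to 3, one step behind; with quadratic probing the two sequences diverge after the first step.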

Load Factor
Load factor: the number of stored entries divided by the size of the array.
What happens when the array becomes too full, i.e. the load factor gets a lot bigger than ½? Operations no longer take expected constant time.
When the array becomes too full: with open addressing it becomes impossible to insert; with chaining it is still possible, but the operations slow down.
Solution: the same concept as ArrayLists: when it gets too full, copy to a larger array, typically doubling the size of the array.
Sometimes the size is decreased if the table gets too sparse, but remove is a much less common operation and the table will likely have to grow again soon anyway.
[Figure: load factor axis up to 1; too low is a waste of memory, too high is too slow, best range in between.]

Resizing
Solution: dynamic resizing. Double the size, then reinsert/rehash all elements into the new array.
Why not simply copy into the first half?
Example of why you cannot just copy blindly to the larger array: object e has hash code 5. In a table of length 4 this hashes to index 1 (5 mod 4). When the table is doubled to length 8, it hashes to index 5 (5 mod 8).
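The rehash step can be sketched for a table of Integer elements, using the fact that `Integer.hashCode()` returns the value itself. The sketch assumes linear probing for placement in the new array; `ResizeDemo` and `rehash` are hypothetical names.

```java
import java.util.Arrays;

class ResizeDemo {
    // Builds an array twice as long and reinserts every element at the index
    // given by its hash code mod the NEW length (linear probing on collision).
    static Integer[] rehash(Integer[] old) {
        Integer[] bigger = new Integer[old.length * 2];
        for (Integer e : old) {
            if (e == null) continue;
            int i = Math.floorMod(e.hashCode(), bigger.length);
            while (bigger[i] != null) {
                i = (i + 1) % bigger.length;
            }
            bigger[i] = e;
        }
        return bigger;
    }

    public static void main(String[] args) {
        // Element 5 sits at index 1 in a table of length 4 (5 mod 4 = 1).
        Integer[] table = {null, 5, null, null};
        // After doubling, it must sit at index 5 (5 mod 8 = 5).
        System.out.println(Arrays.toString(rehash(table)));
    }
}
```

Had the element simply been copied to index 1 of the new array, a lookup for 5 would start probing at index 5, hit null, and report the element as missing.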

Let's try it
Insert the following elements (in order) into an array of size 6:
element    a   b   c   d   e
hashCode   0   9   17  11  19
Result as shown (matches chaining):
index      0   1   2   3   4   5
contents   a   e       b       c, d

Let's try it
Insert the following elements (in order) into an array of size 6:
element    a   b   c   d   e
hashCode   0   9   17  11  19
Result:
index      0   1   2   3   4   5
contents   a   d   e   b       c
Note: Using linear probing, no resizing
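The exercise can be checked with a short simulation. It assumes the hash codes are 0, 9, 17, 11, 19 for a through e; the first value is reconstructed here (the extracted slide text lists only four numbers), chosen because it reproduces the layout shown on the slide. `LinearProbingExercise` and `insertAll` are hypothetical names.

```java
import java.util.Arrays;

class LinearProbingExercise {
    // Inserts each element at hashCode mod tableSize, moving to the next slot
    // (wrapping around) while the slot is occupied. Assumes the table has room.
    static String[] insertAll(String[] elements, int[] hashCodes, int tableSize) {
        String[] table = new String[tableSize];
        for (int k = 0; k < elements.length; k++) {
            int i = Math.floorMod(hashCodes[k], tableSize);
            while (table[i] != null) {
                i = (i + 1) % tableSize;
            }
            table[i] = elements[k];
        }
        return table;
    }

    public static void main(String[] args) {
        String[] elements = {"a", "b", "c", "d", "e"};
        int[] hashCodes = {0, 9, 17, 11, 19}; // the leading 0 is an assumption
        System.out.println(Arrays.toString(insertAll(elements, hashCodes, 6)));
        // prints [a, d, e, b, null, c]
    }
}
```

Step by step: a goes to 0, b to 3 (9 mod 6), c to 5 (17 mod 6); d also hashes to 5, wraps past the occupied slot 0, and lands in 1; e hashes to 1, which is now taken, and lands in 2.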

Collision Resolution Summary
Chaining: store entries in separate chains (linked lists); can have a higher load factor and degrades gracefully as the load factor increases.
Open Addressing: store all entries in the table; use linear or quadratic probing to place items; uses less memory; clustering can be a problem, so you need to be more careful with the choice of hash function.

Application: Hash Map
Map<K,V> { void put(K key, V value); void update(K key, V value); V get(K key); V remove(K key); }
Note: also called a map or associative array; used to implement database indexes, caches, ...
Possible implementations:
Array: put O(1) if space, update O(n), get O(n), remove O(n)
LinkedList: put O(1), update O(n), get O(n), remove O(n)
Tree: put O(1)/O(log n)/O(n), update O(log n)/O(n), get O(log n)/O(n), remove O(log n)/O(n)

Application: Hash Map
Idea: finding an element in an array takes constant time when you know which index it is stored in.
In the figure, put("California", "CA") applies the hash function to the key "California", takes the result mod 6, and stores the value "CA" at index 5; get("California") recomputes the same index.
Discuss add/update/lookup/delete and analyse their running time (naïve version).

HashMap in Java
Computes the hash using key.hashCode().
No duplicate keys.
Uses chaining to handle collisions.
Default load factor is .75.
Java 8 attempts to mitigate worst-case performance by switching to BST-based chaining!
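A short usage sketch of `java.util.HashMap` illustrating these points. The keys and values are made up for illustration, and `HashMapDemo` is a hypothetical name; the constructor arguments spell out the documented defaults (initial capacity 16, load factor 0.75).

```java
import java.util.HashMap;
import java.util.Map;

class HashMapDemo {
    static Map<String, String> build() {
        // Same as new HashMap<>(): initial capacity 16, load factor 0.75.
        Map<String, String> states = new HashMap<>(16, 0.75f);
        states.put("California", "CA");
        states.put("New York", "NY");
        // No duplicate keys: putting an existing key replaces its value.
        states.put("California", "Calif.");
        return states;
    }

    public static void main(String[] args) {
        Map<String, String> states = build();
        System.out.println(states.size());            // 2
        System.out.println(states.get("California")); // Calif.
        System.out.println(states.get("Virginia"));   // null
    }
}
```

Keys used in a `HashMap` need consistent `equals` and `hashCode` implementations, since the map uses `hashCode()` to pick the bucket and `equals` to find the key within it.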

Hash Maps in the Real World
Network switches
Distributed storage
Database indexing
Index lookup (e.g., Dijkstra's shortest-path algorithm)
Useful in lots of applications…