Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Presentation transcript:


Hashing
Approach
- Transform the key into a number (the hash value)
- Use the hash value to index the object in the hash table
- A hash function converts the key to a number

Hashing
Hash table
- Array indexed using hash values
- Hash table A with size N
- Indices of A range from 0 to N-1
- Store in A[hashValue % N]

Beware of %
- The % operator is integer remainder: x % y == x - y * (x / y)
- It doesn't work the way mathematicians would think

   x     x / 2    x % 2
   3       1        1
   2       1        0
   1       0        1
   0       0        0
  -1       0       -1
  -2      -1        0
  -3      -1       -1
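The table above can be checked directly. The sketch below prints the same quotients and remainders and, as an alternative that is not part of the slides, shows Math.floorMod (available since Java 8), which returns a result in 0..n-1 for a positive divisor. The class and method names are made up for illustration.

```java
class RemainderDemo {
    // Java's % takes the sign of the dividend, so this can be negative.
    static int remainderIndex(int hashValue, int n) {
        return hashValue % n;
    }

    // Math.floorMod returns a value in 0..n-1 whenever n is positive.
    static int floorModIndex(int hashValue, int n) {
        return Math.floorMod(hashValue, n);
    }

    public static void main(String[] args) {
        for (int x = 3; x >= -3; x--) {
            System.out.println(x + " / 2 = " + (x / 2)
                    + ",  " + x + " % 2 = " + (x % 2)
                    + ",  floorMod(" + x + ", 2) = " + Math.floorMod(x, 2));
        }
    }
}
```

A negative remainder is exactly what makes A[hashValue % N] unsafe on its own: a negative hash value would produce a negative array index.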

Scattering hash values
- hashCode is a 32-bit signed int
- Have to reduce it to 0..N-1
- Could use Math.abs(key.hashCode() % N)
  - Might not distribute values well, particularly if N is a power of 2
- Multiplicative congruency method
  - Produces good hash values
  - Hash value = Math.abs((a * key.hashCode()) % N)
  - N is the table size; a and N are large primes
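A minimal sketch of the multiplicative formula above. The multiplier 1000003 is a prime picked only for illustration, and the class and method names are hypothetical; the slides do not prescribe specific values.

```java
class MultiplicativeHash {
    // Hypothetical multiplier: a large prime chosen only for this example.
    static final int A = 1_000_003;

    // Maps any key to a bucket in 0..n-1, where n is the table size.
    static int bucket(Object key, int n) {
        // The product may overflow and wrap around; that is harmless here.
        // The remainder lies strictly between -n and n, so Math.abs is safe.
        return Math.abs((A * key.hashCode()) % n);
    }

    public static void main(String[] args) {
        int n = 101; // a prime table size, also an assumption
        for (String key : new String[] {"apple", "kiwi", "mango"}) {
            System.out.println(key + " -> " + bucket(key, n));
        }
    }
}
```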

Be careful with Math.abs
- We have to use Math.abs(x % N) rather than Math.abs(x) % N
- Why?

A scary fact about ints
- Integer.MIN_VALUE == -2^31
- Integer.MIN_VALUE == -Integer.MIN_VALUE
- Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE
- An int value can represent any integer from -2^31 to 2^31 - 1
- An int cannot represent 2^31
- (2^31 - 1) + 1 == -2^31 in int arithmetic
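These facts explain why the order of Math.abs and % matters. The sketch below compares the two orderings on Integer.MIN_VALUE; the class and method names are hypothetical.

```java
class AbsPitfall {
    // Unsafe ordering: Math.abs(Integer.MIN_VALUE) is still negative,
    // so the result can be a negative index.
    static int absThenMod(int x, int n) {
        return Math.abs(x) % n;
    }

    // Safe ordering: x % n lies strictly between -n and n,
    // so Math.abs never sees Integer.MIN_VALUE (for n > 0).
    static int modThenAbs(int x, int n) {
        return Math.abs(x % n);
    }

    public static void main(String[] args) {
        int x = Integer.MIN_VALUE;
        System.out.println(Math.abs(x) == x);   // true
        System.out.println(absThenMod(x, 10));  // -8
        System.out.println(modThenAbs(x, 10));  // 8
    }
}
```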

Art and magic of hashCodes
- There is no "right" hashCode function
- There is some art and magic to finding a good hashCode function, and to finding a function that maps a hashCode to a hash bucket
- From java.util.HashMap:

    static int hashBucket(Object x, int N) {
        int h = x.hashCode();
        // Mix the bits so that nearby hash codes spread across buckets
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return Math.abs(h % N);
    }

Hash Function Example
- hashCode("apple") = 5
- hashCode("watermelon") = 3
- hashCode("grapes") = 8
- hashCode("kiwi") = 0
- hashCode("strawberry") = 9
- hashCode("mango") = 6
- hashCode("banana") = 2
- Perfect hash function: unique values for each key

  Index   Entry
  0       kiwi
  1
  2       banana
  3       watermelon
  4
  5       apple
  6       mango
  7
  8       grapes
  9       strawberry

Hash Function
Suppose now
- hashCode("apple") = 5
- hashCode("watermelon") = 3
- hashCode("grapes") = 8
- hashCode("kiwi") = 0
- hashCode("strawberry") = 9
- hashCode("mango") = 6
- hashCode("banana") = 2
- hashCode("orange") = 3
- Collision: same hash value for multiple keys ("watermelon" and "orange" both map to 3)

  Index   Entry
  0       kiwi
  1
  2       banana
  3       watermelon
  4
  5       apple
  6       mango
  7
  8       grapes
  9       strawberry
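The hash values in the fruit example are illustrative, but collisions also occur with real hash codes. As a small sketch, the strings "Aa" and "BB" have the same String.hashCode() in Java, so they land in the same bucket for every table size. The class and method names below are hypothetical.

```java
class CollisionDemo {
    // Reduce a hash code to a table index in 0..n-1.
    static int bucket(Object key, int n) {
        return Math.abs(key.hashCode() % n);
    }

    public static void main(String[] args) {
        // Two different keys, one hash value.
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println(bucket("Aa", 10) == bucket("BB", 10)); // true
    }
}
```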

Types of Hash Tables
- Open addressing: store objects in each table entry
- Chaining (bucket hashing): store lists of objects in each table entry

Open Addressing Hashing
Approach
- Hash table contains objects
- Probe: examine a table entry
- Collision
  - Move K entries past the current location
  - Wrap around the table if necessary
- Find location for X
  - Examine the entry at A[bucket(X)]
  - If entry = X, found
  - If entry = empty, X is not in the hash table
  - Else increment the location by K and repeat

Open Addressing Hashing
Approach
- Linear probing
  - K = 1
  - May form clusters of contiguous entries
- Deletions
  - Find location for X
  - If X is inside a cluster, leave a non-empty marker
- Insertion
  - Find location for X
  - Insert if X is not in the hash table
  - Can insert X at the first unoccupied location

Open Addressing Example
- Hash codes: H(A) = 6, H(C) = 6, H(B) = 7, H(D) = 7
- Hash table: size = 8 elements
  - _ = empty entry
  - * = non-empty marker
- Linear probing: on a collision, move 1 entry past the current location

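Open Addressing Example
Operations: Insert A, Insert B, Insert C, Insert D
Table after each operation, entries listed from first to last (_ = empty entry):

  Insert A    _ _ _ _ _ A _ _
  Insert B    _ _ _ _ _ A B _
  Insert C    _ _ _ _ _ A B C
  Insert D    D _ _ _ _ A B C

C collides with A and B and moves to the next free entry; D collides with B and C and wraps around to the start of the table.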

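Open Addressing Example
Operations: Find A, Find B, Find C, Find D
The table is unchanged by each find (_ = empty entry):

  D _ _ _ _ A B C

- Find A: the entry at H(A) holds A, found
- Find B: the entry at H(B) holds B, found
- Find C: probe past A and B, then find C
- Find D: probe past B and C, wrap around, then find D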

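Open Addressing Example
Operations: Delete A, Delete C, Find D, Insert C
Table after each operation, entries listed from first to last (_ = empty entry, * = non-empty marker):

  Delete A    D _ _ _ _ * B C
  Delete C    D _ _ _ _ * B *
  Find D      D _ _ _ _ * B *
  Insert C    D _ _ _ _ C B *

Find D still succeeds because the markers keep the probe sequence going past the deleted entries. Insert C reuses the first marker it reaches.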
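The find, delete, and insert rules for linear probing can be sketched as follows. This is a simplified illustration, not the slides' implementation: the class name is hypothetical, the table never resizes, and a deletion always leaves a marker rather than checking whether the entry sits inside a cluster. Java arrays are indexed from 0, so the table uses indices 0..N-1.

```java
class LinearProbingSet {
    // Non-empty marker left behind by deletions.
    private static final Object DELETED = new Object();
    private final Object[] table;

    LinearProbingSet(int capacity) {
        table = new Object[capacity];
    }

    private int bucket(Object key) {
        return Math.abs(key.hashCode() % table.length);
    }

    // Returns the index holding key, or -1 if key is not in the table.
    int indexOf(Object key) {
        int i = bucket(key);
        for (int probes = 0; probes < table.length; probes++) {
            Object entry = table[i];
            if (entry == null) return -1;                 // empty entry: not present
            if (entry != DELETED && entry.equals(key)) return i;
            i = (i + 1) % table.length;                   // K = 1, wrap around
        }
        return -1;                                        // probed every entry
    }

    boolean contains(Object key) {
        return indexOf(key) >= 0;
    }

    // Inserts key at the first unoccupied entry (empty or marker).
    boolean insert(Object key) {
        if (contains(key)) return false;
        int i = bucket(key);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null || table[i] == DELETED) {
                table[i] = key;
                return true;
            }
            i = (i + 1) % table.length;
        }
        throw new IllegalStateException("table is full");
    }

    // Replaces key with a marker so later probes do not stop early.
    boolean delete(Object key) {
        int i = indexOf(key);
        if (i < 0) return false;
        table[i] = DELETED;
        return true;
    }
}
```

With a table of size 8 and Integer keys (whose hash code is the value itself), inserting 2, 10, and 3 places them at indices 2, 3, and 4, since 10 collides with 2 and 3 is pushed along by 10. After deleting 2, a lookup for 10 still succeeds because the marker keeps the probe going.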

Efficiency of Open Addressing
- Load factor = entries / table size
- Hashing is efficient for load factor < 90%

Chaining (Bucket Hashing)
Approach
- Hash table contains lists of objects
- Find location for X
  - Find the hash code key for X
  - Examine the list at table entry A[key]
- Collision: multiple entries in the list for a table entry

Chaining Example
- Hash codes: H(A) = 6, H(C) = 6, H(B) = 7, H(D) = 7
- Hash table: size = 8 elements
  - _ = empty entry

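Chaining Example
Operations: Insert A, Insert B, Insert C
- After Insert A: entry 6 holds a list containing A
- After Insert B: entry 6 holds A, entry 7 holds a list containing B
- After Insert C: entry 6 holds a list containing A and C, entry 7 holds B
- All other entries are empty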

Chaining Example
Operations: Find B, Find A
- Find B: examine the list at entry 7, which contains B
- Find A: examine the list at entry 6, which contains A and C
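A minimal sketch of chaining, assuming one java.util.LinkedList per table entry and a fixed table size. The class and method names are hypothetical, and the position of a new key within its list is an implementation choice; this sketch appends.

```java
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet<T> {
    private final List<T>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int capacity) {
        buckets = new List[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // The list stored at table entry A[key].
    private List<T> chainFor(Object key) {
        return buckets[Math.abs(key.hashCode() % buckets.length)];
    }

    boolean insert(T key) {
        List<T> chain = chainFor(key);
        if (chain.contains(key)) return false;   // already present
        chain.add(key);                          // collision: the list just grows
        size++;
        return true;
    }

    boolean contains(Object key) {
        return chainFor(key).contains(key);
    }

    boolean delete(Object key) {
        boolean removed = chainFor(key).remove(key);
        if (removed) size--;
        return removed;
    }

    // Load factor = entries / table size.
    double loadFactor() {
        return (double) size / buckets.length;
    }
}
```

Unlike open addressing, deletion needs no marker: removing a key from its list cannot break the search for any other key.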

Efficiency of Chaining
- Load factor = entries / table size
- Average case
  - Evenly scattered entries
  - Operations = O(load factor)
- Worst case
  - Entries mostly have the same hash value
  - Operations = O(entries)

Hashing in Java
- Collections
  - HashMap and HashSet implement hashing
- Objects
  - Built-in support for hashing
    - boolean equals(Object o)
    - int hashCode()
  - Can override with own definitions
  - Must be careful to support the Java contract

Java Contract
- hashCode()
  - Must return the same value each time it is called on an object during an execution, provided no information used in equals comparisons on the object is modified
- equals()
  - If a.equals(b), then a.hashCode() must be the same as b.hashCode()
  - If a.hashCode() != b.hashCode(), then !a.equals(b)
- a.hashCode() == b.hashCode()
  - Does not imply a.equals(b)
  - Though Java libraries are more efficient when unequal objects have different hash codes
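A small sketch of a class that follows this contract by deriving equals() and hashCode() from the same fields. GridPoint is a hypothetical class invented for the example; java.util.Objects.hash is a standard library helper.

```java
import java.util.Objects;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Uses exactly the fields compared in equals(), so equal points
    // always produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

Because the fields are final, the hash code cannot change while a GridPoint is stored in a HashSet or used as a HashMap key. If hashCode() were left at the default while equals() is overridden, two equal points would usually hash differently and a HashSet lookup for an equal point would usually fail.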