Hashing & HashMaps CS-2851 Dr. Mark L. Hornick.

Presentation transcript:

Let’s review the worst-case performance characteristics of previously covered data structures
- ArrayList (JCF class)
  - get(): O(1); constant-time access
  - add(): O(n); due to shifting
  - contains(): O(n); sequential search
- SortedArrayList (uses binary searching)
  - add(): O(n + log n) -> O(n); due to the need to figure out first where to add, then the need to shift elements to the right
  - contains(): O(log n); search based on the Splitting Rule (binary search)
- LinkedList (JCF class)
  - get(): O(n); sequential access
  - add(): O(1) once where to add has been determined; really O(n), because that’s how long it takes to find the location to insert
- BinaryTree
  - get(): not supported due to lack of indexing (but do we always need it?)
  - add(): O(log n); due to sorting built into the tree structure (assuming the tree is kept balanced)
  - contains(): O(log n); due to sorting built into the tree structure (assuming the tree is kept balanced)
- What about memory usage?

Is there anything faster at everything?

Map definition
- A map is a collection in which each Entry element has two parts:
  - a unique key part
  - a value part (which need not be unique)
- Each unique key “maps” to a corresponding value
- Example: Morse code map, in which each character maps to a (unique) sequence of dots and dashes
- Example: a map of Students, in which each key is the (unique) student ID, and each (non-unique?) value is a reference to the Student object itself
- Example: a phonebook, where each number (each key) maps to a person
[Diagram: an Entry with a key part and a value part]
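
The Morse code example can be sketched with the JCF Map interface. This is only an illustration, not code from the course: the class name MorseMapDemo, its encode() helper, and the choice of three letters are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class (not from the slides): a map from characters to Morse code.
class MorseMapDemo {
    // Each unique key (a character) maps to one value (its dot-dash sequence).
    static final Map<Character, String> MORSE = new HashMap<>();
    static {
        MORSE.put('S', "...");
        MORSE.put('O', "---");
        MORSE.put('E', ".");
    }

    // Looks up the code for each character; characters without an entry become "?".
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(MORSE.getOrDefault(c, "?"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("SOS")); // prints "... --- ..."
        // Putting an existing key again replaces its value, so keys stay unique.
        MORSE.put('E', ".");
        System.out.println(MORSE.size()); // prints 3
    }
}
```

Note that put() on a key that is already present does not add a second Entry; it replaces the value, which is what keeps keys unique.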

What is a Key?
- A key is just something that uniquely identifies a particular instance of a value/object
- A key can be a number, a string, or an object, so long as it is unique
- If two values/objects have the same key, then they are (theoretically) equal
  - Only one ID per MSOE student, so if the IDs match, it must (by definition) be the same student
- If the equals() method comparing two keys returns true, then the objects are equal too, by definition

What if an object doesn’t possess a specific unique attribute?
- Scenario: pretend MSOE IDs didn’t exist
- Can any of the attributes of a student, taken together, be unique?
  - ...even though any individual attribute may not exhibit this uniqueness?
- Exercise

A key can be generated from a unique combination of non-unique attributes
- All of an object’s attributes can be used to generate the key
  - That is, the object itself is the key
- Or the key can be generated from just a subset of an object’s attributes
  - Provided that subset is unique
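
One way to picture a key built from a subset of attributes is a class whose equals() and hashCode() look only at that subset. The sketch below is hypothetical: the Student fields, and the assumption that first name, last name, and birth date together are unique, are made up for illustration and do not come from the slides.

```java
import java.util.Objects;

// Hypothetical example: a Student with no ID, identified by a combination of attributes.
class StudentKeyDemo {
    static final class Student {
        final String firstName;
        final String lastName;
        final String birthDate; // kept as a String (e.g. "1990-05-17") for brevity
        final String major;     // deliberately NOT part of the key

        Student(String firstName, String lastName, String birthDate, String major) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthDate = birthDate;
            this.major = major;
        }

        // Two Students are "the same" if the key attributes match.
        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Student)) {
                return false;
            }
            Student s = (Student) other;
            return firstName.equals(s.firstName)
                    && lastName.equals(s.lastName)
                    && birthDate.equals(s.birthDate);
        }

        // The hashcode is computed from exactly the attributes that equals() compares.
        @Override
        public int hashCode() {
            return Objects.hash(firstName, lastName, birthDate);
        }
    }
}
```

Using the same attributes in both methods matters: objects that are equal must produce the same hashcode, otherwise a hash-based collection cannot find them again.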

OK, so what role do keys play in making a faster data structure?
- What if each unique key corresponded to a unique index within an array of Entries?
[Diagram: a key maps to an index; the Entry (key, value) is stored at that index]

Hash definition
- A hash is a transformation of a key into a numeric value that maps to the index of an array (or table)
- This is done in two steps:
  1. Generate a numeric hashcode from the key (which is not necessarily numeric)
     - If the key is already numeric and unique (like an ID), then the key can be used as the hashcode
  2. Transform the hashcode into an array index
[Diagram: Key -> hashcode -> index]
Winter 2005
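
The two steps can be written as two small methods. This is a sketch under assumptions: the names HashStepsDemo, hashcodeOf, indexOf, and indexFor are invented here, and Math.floorMod is one reasonable way to do step 2 (it keeps the index non-negative), not necessarily the method the course uses.

```java
// Hypothetical helper class illustrating the two hashing steps.
class HashStepsDemo {
    // Step 1: generate a numeric hashcode from the key (the key need not be numeric).
    static int hashcodeOf(Object key) {
        return key.hashCode();
    }

    // Step 2: transform the hashcode into an index in the range 0..tableLength-1.
    // floorMod never returns a negative result for a positive table length.
    static int indexOf(int hashcode, int tableLength) {
        return Math.floorMod(hashcode, tableLength);
    }

    // Both steps together: key -> hashcode -> index.
    static int indexFor(Object key, int tableLength) {
        return indexOf(hashcodeOf(key), tableLength);
    }

    public static void main(String[] args) {
        // An Integer key is already numeric, so its hashcode is the value itself.
        System.out.println(indexFor(2851, 1024));  // prints 803
        // A String key is first turned into a number by String.hashCode().
        System.out.println(indexFor("abc", 1024)); // prints 98
    }
}
```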

HashMap definition
- A HashMap<E> is an array-based collection of Entry<E> elements, each with
  - a value part (which could be anything)
  - a unique key part (somehow derived from the value)
- Each Entry is at a specific index in the array, where the index is determined from the hashcode of the key
- Example: a map of Students, in which each key is the (unique) student ID, and each (non-unique?) value is a reference to the Student object itself
- Note: the JCF class itself is declared as HashMap<K,V>, with separate type parameters for the key and the value
[Diagram: an Entry<E> holding a key and a value of type E]
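
To make "each Entry is at a specific index in the array" concrete, here is a toy array-backed table. It is a sketch only, not the JCF HashMap and not the course's implementation: the class name ToyHashTable is hypothetical, and it deliberately refuses to handle the case where two different keys land on the same index.

```java
// Toy array-backed map (hypothetical, for illustration): no collision handling.
class ToyHashTable<K, V> {
    // One Entry per occupied slot: a key part and a value part.
    private static final class Entry<K, V> {
        final K key;
        final V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Entry<K, V>[] table;

    @SuppressWarnings("unchecked")
    ToyHashTable(int length) {
        // Generic arrays cannot be created directly, hence the cast.
        table = (Entry<K, V>[]) new Entry[length];
    }

    // The index is determined from the hashcode of the key.
    private int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Stores the entry at its index; gives up if a different key already lives there.
    void put(K key, V value) {
        int i = indexFor(key);
        if (table[i] != null && !table[i].key.equals(key)) {
            throw new IllegalStateException("slot " + i + " is already taken by another key");
        }
        table[i] = new Entry<>(key, value);
    }

    // Goes straight to the computed index; no searching is involved.
    V get(K key) {
        Entry<K, V> e = table[indexFor(key)];
        return (e != null && e.key.equals(key)) ? e.value : null;
    }
}
```

Both put() and get() touch exactly one array slot, which is where the speed comes from. The exception in put() marks the situation this toy version does not deal with.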

How do you generate a hashcode?
- In Java, all classes have a built-in hashCode() method defined in the Object class
[Diagram: Key -> hashcode]

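Classes that don’t override hashCode() inherit the Object class’s hashCode() method
- Which returns a value tied to the object’s identity (traditionally described as derived from its memory address), not to its contents
- Is this a repeatable hashcode? No! The same object keeps its hashcode while the program runs, but a second object with identical contents generally gets a different one, and the values can change from one run to the next
[Diagram: Mem addr -> Object hashcode]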
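
The practical consequence shows up as soon as such a class is used as a key. In the sketch below, PlainId is a hypothetical class that overrides neither equals() nor hashCode(); the names are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: a key class that relies on Object's equals() and hashCode().
class IdentityHashDemo {
    static final class PlainId {
        final int number;

        PlainId(int number) {
            this.number = number;
        }
        // No equals() or hashCode() override: identity is used instead of contents.
    }

    // Stores a value under one PlainId(42), then looks it up with another PlainId(42).
    static String lookupWithNewKey() {
        Map<PlainId, String> students = new HashMap<>();
        students.put(new PlainId(42), "Anne");
        // Same contents, different object: Object.equals() says the keys differ.
        return students.get(new PlainId(42));
    }

    // The hashcode of one particular object does not change while it is alive.
    static boolean sameObjectIsStable() {
        PlainId id = new PlainId(42);
        return id.hashCode() == id.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(lookupWithNewKey());   // prints null
        System.out.println(sameObjectIsStable()); // prints true
    }
}
```

The lookup fails because the second key object is not equal to the first, even though both hold 42. Overriding equals() and hashCode() together, based on the contents, fixes this.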

A given key should always generate the same hashcode
- So that the hashcode computation can be repeated at any time, and always result in the same value
  - ...and therefore, the same index
- Q: If keys are unique, does this guarantee that the hashcodes generated from the keys are also unique?
[Diagram: Key -> hashcode -> index]

Exercise
- Generate a hashcode from a String of characters
- What approach should you use?

How do you generate a hashcode?
- In Java, many classes override Object’s hashCode() method in order to generate hashcodes from their contents that are as unique as possible
  - Integer class: Integer’s hashCode() method simply returns the underlying int value
  - String class: look at the javadoc for String.hashCode()
[Diagram: Key -> hashcode]
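
The javadoc for String.hashCode() documents a polynomial in powers of 31: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic. The sketch below re-implements that formula by hand so the approach is visible; the class and method names are made up for the example.

```java
// Hypothetical demo: a hand-written version of the formula documented for String.hashCode().
class StringHashDemo {
    // Horner's rule: multiply the running hash by 31 and add the next character.
    // int overflow simply wraps around, which is fine for a hashcode.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("abc"));                 // prints 96354
        System.out.println("abc".hashCode());                // prints 96354
        System.out.println(Integer.valueOf(2851).hashCode()); // prints 2851
    }
}
```

Because every character and its position contribute to the result, strings with the same letters in a different order usually get different hashcodes.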

Writing your own hashCode()
- A key should uniquely identify an object
- Hashcodes generated from keys should be as unique as possible to avoid collisions
- Depending on the hashcode algorithm, different keys can generate the same hashcode
[Diagram: Key -> hashcode -> index]
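
Two small experiments show different keys producing the same hashcode. The weakHash() method is a deliberately poor algorithm invented for this sketch (it just adds up the characters); the second case uses the real String.hashCode().

```java
// Hypothetical demo: different keys, same hashcode.
class SameHashcodeDemo {
    // A weak algorithm (assumed for illustration): the sum of the character codes.
    // Character order is ignored, so every anagram gets the same hashcode.
    static int weakHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Different keys, identical weak hashcodes.
        System.out.println(weakHash("listen") == weakHash("silent")); // prints true

        // Even String.hashCode() cannot be unique for every possible string.
        System.out.println("Aa".hashCode()); // prints 2112
        System.out.println("BB".hashCode()); // prints 2112
    }
}
```

A better algorithm makes such coincidences rarer, but a 32-bit hashcode cannot distinguish every possible key.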

How do you transform a hashcode into an array index?
- Assume you have an array with length = 1024
- An array index in the range 0…1023 can be computed as follows using modulo arithmetic:
    int index = hashCode(123456789) % 1024;
- The resulting index = 933
  - Here hashCode() stands for whatever hashcode function is in use, which is not shown; the result is not simply 123456789 % 1024, which would be 277
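
One detail worth checking in Java: hashcodes are ints and can be negative, and the % operator keeps the sign of its left operand. The sketch below compares plain % with Math.floorMod; the class and method names are assumptions made for the example.

```java
// Hypothetical demo: turning a hashcode into an index for a table of length 1024.
class IndexFromHashcodeDemo {
    static final int TABLE_LENGTH = 1024;

    // Plain remainder: correct for non-negative hashcodes, negative otherwise.
    static int remainderIndex(int hashcode) {
        return hashcode % TABLE_LENGTH;
    }

    // floorMod always lands in 0..TABLE_LENGTH-1 for a positive table length.
    static int safeIndex(int hashcode) {
        return Math.floorMod(hashcode, TABLE_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(remainderIndex(5000)); // prints 904
        System.out.println(safeIndex(5000));      // prints 904
        System.out.println(remainderIndex(-7));   // prints -7, not a valid array index
        System.out.println(safeIndex(-7));        // prints 1017
    }
}
```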

More hashing examples (for a table 1024 in length)
- 123456789 indexes to 933
- 428671256 indexes to 500
- 884739816 indexes to 234

Exercise
[Figure: a hash table with indices running up to 1023; most slots shown are null; Anne is stored at index xxx, Susan at index yyy, and Ed at index zzz]
- What are the index values xxx, yyy, and zzz?

Hashing can result in Collisions
- 123456789 indexes to 933
- 428671256 indexes to 500
- 884739816 indexes to 234
- 403578063 also indexes to 933
- When two different keys yield the same index (even from different hashcodes), that is called a collision
- Keys that yield the same index are called synonyms
- Special handling is required
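
Synonyms are easy to find by grouping keys by their index. The sketch below assumes int keys whose hashcode is the value itself (as with Integer) and a table of length 1024, so its indices differ from the ones quoted on the slide; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo: group keys by table index to reveal synonyms.
class SynonymDemo {
    // Returns index -> keys that hash to that index; a list longer than 1 is a set of synonyms.
    static Map<Integer, List<Integer>> groupByIndex(int[] keys, int tableLength) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int key : keys) {
            // Integer.hashCode(int) returns the value itself.
            int index = Math.floorMod(Integer.hashCode(key), tableLength);
            groups.computeIfAbsent(index, i -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        int[] keys = {5, 1029, 234, 2053, 500};
        // 5, 1029, and 2053 differ by multiples of 1024, so all three index to 5.
        System.out.println(groupByIndex(keys, 1024));
        // prints {5=[5, 1029, 2053], 234=[234], 500=[500]}
    }
}
```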

Hashing is inefficient when there are a lot of collisions
- Ideally, we want the hashing algorithm to generate indices “sprinkled” randomly throughout the underlying table
- The Uniform Hashing Assumption: each key is equally likely to hash to any one of the table addresses, independently of where the other keys have hashed

Even if this assumption is true, collisions still occur
- This is due to the finite set of indices in a table
- The bigger the table, the less likely a collision is to occur
- But tables cannot be made infinitely large
  - An infinite number of keys cannot be mapped into a finite set of indices without collisions
- So collision handlers have to be implemented
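
How likely is a collision under the Uniform Hashing Assumption? If k keys are hashed independently into a table of m slots, the chance that no two share an index is (m/m) * ((m-1)/m) * ... * ((m-k+1)/m). The sketch below computes one minus that product; the class and method names are assumptions, and the calculation is a standard probability argument rather than something taken from the slides.

```java
// Hypothetical demo: probability of at least one collision under uniform hashing.
class CollisionOddsDemo {
    static double collisionProbability(int keys, int tableSize) {
        // More keys than slots: a collision is certain (pigeonhole principle).
        if (keys > tableSize) {
            return 1.0;
        }
        // Probability that every key so far landed on a fresh index.
        double noCollision = 1.0;
        for (int i = 0; i < keys; i++) {
            noCollision *= (double) (tableSize - i) / tableSize;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        // A bigger table lowers the odds for the same number of keys...
        System.out.println(collisionProbability(30, 1024));
        System.out.println(collisionProbability(30, 2048));
        // ...but once the keys outnumber the slots, a collision cannot be avoided.
        System.out.println(collisionProbability(1025, 1024)); // prints 1.0
    }
}
```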