Hashing & HashMaps CS-2851 Dr. Mark L. Hornick.

1 Hashing & HashMaps CS-2851 Dr. Mark L. Hornick

2 Let’s review the worst-case performance characteristics of previously covered data structures
ArrayList (JCF class): get(), add(), contains()
SortedArrayList (uses binary searching)
LinkedList (JCF class)
BinaryTree

6 Let’s review the worst-case performance characteristics of previously covered data structures
ArrayList
  get() – O(k) for a constant k, i.e. O(1); constant-time access
  add() – O(n); due to shifting
  contains() – O(n); sequential search
SortedArrayList
  add() – O(n + log n) -> O(n); a binary search first finds where to add, then the elements must be shifted to the right
  contains() – O(log n); search based on the Splitting Rule (binary search)
LinkedList
  get() – O(n); sequential access
  add() – O(k) once the insertion point has been determined, but really O(n), because that is how long it takes to find the location to insert
BinaryTree
  get() – not supported due to lack of indexing (but do we always need it?)
  add() – O(log n); due to sorting built into the tree structure (this assumes the tree stays balanced; a degenerate tree degrades to O(n))
  contains() – O(log n); due to sorting built into the tree structure (same assumption)
What about memory usage?

7 Is there anything faster at everything?

8 Map definition
A map is a collection in which each Entry element has two parts:
  a unique key part
  a value part (which may not be unique)
Each unique key “maps” to a corresponding value.
Example: Morse code map – each character maps to a (unique) sequence of dots and dashes
Example: a map of Students, in which each key is the (unique) student ID, and each (non-unique?) value is a reference to the Student object itself
Example: a phonebook, where each number (each key) maps to a person
(Diagram: an Entry holding a key and a value.)
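The Morse code example can be sketched with the JCF Map interface. This is only an illustration: the class name MorseMapDemo and the three entries chosen are assumptions, not part of the slides.

```java
import java.util.HashMap;
import java.util.Map;

class MorseMapDemo {
    // Builds a small map from characters (unique keys) to Morse sequences (values).
    static Map<Character, String> buildMorseMap() {
        Map<Character, String> morse = new HashMap<>();
        morse.put('S', "...");
        morse.put('O', "---");
        morse.put('E', ".");
        return morse;
    }

    public static void main(String[] args) {
        Map<Character, String> morse = buildMorseMap();
        System.out.println(morse.get('S'));         // value stored under key 'S'
        System.out.println(morse.containsKey('X')); // 'X' was never added
        // Putting an existing key replaces its value, so keys stay unique.
        morse.put('E', ".");
        System.out.println(morse.size());
    }
}
```

Note that put() on an existing key overwrites the old value rather than adding a second Entry, which is what keeps the key part unique.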

9 What is a Key?
A key is just something that uniquely identifies a particular instance of a value/object.
A key can be a number, a string, or an object, so long as it is unique.
If two values/objects have the same key, then they are (theoretically) equal.
  Only one ID per MSOE student, so if the IDs match, it must (by definition) be the same student.
  If the equals() method comparing two keys returns true, then the objects are equal too, by definition.

10 What if an object doesn’t possess a specific unique attribute?
Scenario: pretend MSOE IDs didn’t exist.
Can any of the attributes of a student, taken together, be unique?
…even though any individual attribute may not exhibit this uniqueness?
Exercise

11 A key can be generated from a unique combination of non-unique attributes
All of an object’s attributes can be used to generate the key
  That is, the object itself is the key
Or the key can be generated from just a subset of an object’s attributes
  Provided that subset is unique
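A key built from a subset of attributes might look like the sketch below. The class StudentKey and its three fields are hypothetical, and the sketch simply assumes that name plus birth date is unique, which real data may not guarantee.

```java
import java.util.Objects;

// Hypothetical key built from attributes that are not unique individually.
final class StudentKey {
    private final String lastName;
    private final String firstName;
    private final String birthDate; // kept as text, e.g. "1986-04-12", for brevity

    StudentKey(String lastName, String firstName, String birthDate) {
        this.lastName = lastName;
        this.firstName = firstName;
        this.birthDate = birthDate;
    }

    // Two keys are equal only when every attribute in the subset matches.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StudentKey)) return false;
        StudentKey k = (StudentKey) other;
        return lastName.equals(k.lastName)
                && firstName.equals(k.firstName)
                && birthDate.equals(k.birthDate);
    }

    // Equal keys must produce equal hashcodes, so hash the same attributes.
    @Override
    public int hashCode() {
        return Objects.hash(lastName, firstName, birthDate);
    }

    public static void main(String[] args) {
        StudentKey a = new StudentKey("Smith", "Anne", "1986-04-12");
        StudentKey b = new StudentKey("Smith", "Anne", "1986-04-12");
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() == b.hashCode());
    }
}
```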

12 OK, so what role do keys play in making a faster data structure?
What if each unique key corresponded to a unique index within an array of Entries?
(Diagram: a key maps to an index; the Entry at that index holds the key and the value.)

13 Hash definition
A hash is a transformation of a key into a numeric value that maps to the index of an array (or table).
This is done in two steps:
  1. Generate a numeric hashcode from the key (which is not necessarily numeric).
     If the key is already numeric and unique (like an ID), then the key can be used as the hashcode.
  2. Transform the hashcode into an array index.
(Diagram: Key -> hashcode -> index.)
Winter 2005 CS-2851 Dr. Mark L. Hornick
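The two steps can be written as two small functions. The names HashSteps, toHashcode, and toIndex are made up for this sketch, and Math.floorMod is one reasonable choice for the second step, not necessarily the one used in the course.

```java
class HashSteps {
    // Step 1: derive a numeric hashcode from the key (the key need not be numeric).
    static int toHashcode(Object key) {
        return key.hashCode();
    }

    // Step 2: transform the hashcode into an index in the range 0..tableLength-1.
    static int toIndex(int hashcode, int tableLength) {
        return Math.floorMod(hashcode, tableLength);
    }

    // The complete hash: key -> hashcode -> index.
    static int hash(Object key, int tableLength) {
        return toIndex(toHashcode(key), tableLength);
    }

    public static void main(String[] args) {
        // An Integer key is already numeric, so its hashcode is the value itself.
        System.out.println(toHashcode(12345));
        System.out.println(hash(12345, 1024));
    }
}
```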

14 HashMap definition
A HashMap<E> is an array-based collection of Entry<E> elements, each with:
  a value part (which could be anything)
  a unique key part (somehow derived from value)
Each Entry is at a specific index in the array, where the index is determined from the hashcode of the key.
Example: a map of Students, in which each key is the (unique) student ID, and each (non-unique?) value is a reference to the Student object itself
(Diagram: an Entry<E> holding a key and an E value.)
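A deliberately tiny array-backed map shows the idea. TinyHashMap is a hypothetical teaching sketch, not the JCF HashMap: it takes separate key and value types for clarity, and it refuses to store two different keys at the same index instead of handling the collision.

```java
class TinyHashMap<K, V> {
    // One slot of the table: a key together with its value.
    private static final class Entry<K, V> {
        final K key;
        final V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Entry<K, V>[] table;

    @SuppressWarnings("unchecked")
    TinyHashMap(int capacity) {
        table = (Entry<K, V>[]) new Entry[capacity];
    }

    // The index is determined from the hashcode of the key.
    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    void put(K key, V value) {
        int i = indexFor(key);
        if (table[i] != null && !table[i].key.equals(key)) {
            // Assumption of this sketch: no collision handler is implemented.
            throw new IllegalStateException("collision at index " + i);
        }
        table[i] = new Entry<>(key, value);
    }

    V get(K key) {
        Entry<K, V> e = table[indexFor(key)];
        return (e != null && e.key.equals(key)) ? e.value : null;
    }

    public static void main(String[] args) {
        TinyHashMap<Integer, String> students = new TinyHashMap<>(1024);
        students.put(933, "Anne");
        System.out.println(students.get(933));
        System.out.println(students.get(500));
    }
}
```

Both put() and get() go straight to one array slot without searching, which is where the speed comes from.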

15 How do you generate a hashcode?
In Java, all classes have a built-in hashCode() method, defined in the Object class.
(Diagram: Key -> hashcode.)

16 Classes that don’t override hashCode() inherit the Object class’s hashCode() method
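Which returns a value tied to the object’s identity (typically described as its memory address), not to its contents.
Is this a repeatable hashcode? No! The value stays the same for one object during one run, but it can change from run to run, and two objects with identical contents will generally get different hashcodes.
(Diagram: memory address of the Object -> hashcode.)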
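A short experiment makes the problem concrete. PlainKey is a hypothetical class that deliberately overrides neither equals() nor hashCode(), so a second key object with the same contents cannot find the stored Entry.

```java
import java.util.HashMap;
import java.util.Map;

class DefaultHashDemo {
    // Hypothetical key class with no equals()/hashCode() overrides:
    // Object's identity-based versions apply.
    static final class PlainKey {
        final String id;

        PlainKey(String id) {
            this.id = id;
        }
    }

    // Stores a value under one key object, then looks it up with a
    // different object that has the same contents.
    static String lookupWithFreshKey() {
        Map<PlainKey, String> students = new HashMap<>();
        students.put(new PlainKey("A123"), "Anne");
        return students.get(new PlainKey("A123"));
    }

    public static void main(String[] args) {
        PlainKey k = new PlainKey("A123");
        // Stable for one object within one run of the program.
        System.out.println(k.hashCode() == k.hashCode());
        // The fresh key is not equal to the stored one, so the lookup finds nothing.
        System.out.println(lookupWithFreshKey());
    }
}
```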

17 A given key should always generate the same hashcode
So that the hashcode computation can be repeated at any time, and always result in the same value
…and therefore, the same index
Q: If keys are unique, does this guarantee that the hashcodes generated from the keys are also unique?
(Diagram: Key -> hashcode -> index.)
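One way to explore the question is to try two different String keys. The pair "Aa" and "BB" is a commonly cited example; the class and method names here are made up for the sketch.

```java
class SameHashDemo {
    // True when two different keys produce the same hashcode.
    static boolean differentKeysSameHash(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 65 * 31 + 97 = 2112
        System.out.println("BB".hashCode()); // 66 * 31 + 66 = 2112
        System.out.println(differentKeysSameHash("Aa", "BB"));
    }
}
```

So unique keys do not guarantee unique hashcodes: there are far more possible Strings than int values.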

18 Exercise Generate a hashcode from a String of characters
What approach should you use?

19 How do you generate a hashcode?
In Java, many classes override Object’s hashCode() method in order to generate repeatable hashcodes based on the object’s contents (as unique as practical, though not guaranteed unique).
Integer class
  Integer’s hashCode() method simply returns the underlying int value
String class
  Look at the javadoc for String.hashCode
(Diagram: Key -> hashcode.)
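The javadoc for String.hashCode describes the sum s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]. A hand-written version of that formula, under the hypothetical name StringHashSketch, could look like this:

```java
class StringHashSketch {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] one character at a time.
    // int arithmetic wraps around on overflow, which is fine for a hashcode.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("Anne") == "Anne".hashCode());
        System.out.println(hash("a")); // a single character hashes to its char code, 97
    }
}
```

Because every character and its position feed into the result, rearranging the characters usually changes the hashcode, unlike a plain sum of character codes.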

20 Writing your own hashCode()
A key should uniquely identify an object.
Hashcodes generated from keys should be as unique as possible to avoid collisions.
Depending on the hashcode algorithm, different keys can generate the same hashcode.
(Diagram: Key -> hashcode -> index.)

21 How do you transform a hashcode into an array index?
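Assume you have an array with length = 1024.
An array index in the range 0…1023 can be computed as follows using modulo arithmetic:
  int index = hashCode() % 1024;
In the slide’s example, the resulting index = 933.
Caution: hashCode() may return a negative int, and in Java the % operator then gives a negative result. Math.floorMod(hashCode(), 1024) always stays in the range 0…1023.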
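The difference between the plain remainder and a sign-safe index can be checked directly. The hashcode 1957 is an assumed value picked because it lands on index 933; the key used on the slide is not in the transcript.

```java
class IndexFromHashcode {
    static final int TABLE_LENGTH = 1024;

    // Plain remainder, as written on the slide: valid only for non-negative hashcodes.
    static int remainderIndex(int hashcode) {
        return hashcode % TABLE_LENGTH;
    }

    // Math.floorMod always returns a value in 0..TABLE_LENGTH-1.
    static int safeIndex(int hashcode) {
        return Math.floorMod(hashcode, TABLE_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(remainderIndex(1957)); // 1957 - 1024 = 933
        System.out.println(remainderIndex(-7));   // -7, not a legal array index
        System.out.println(safeIndex(-7));        // 1024 - 7 = 1017
    }
}
```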

22 More hashing examples (for a table 1024 in length)
(The example keys were shown as images and are not in the transcript.)
The first key indexes to 933.
The second key indexes to 500.
The third key indexes to 234.

23 Exercise
(Figure: a hash table whose indexes run up to 1023. Most slots are null; the three entries Anne, Susan, and Ed sit at indexes xxx, yyy, and zzz.)
What are the index values xxx, yyy, and zzz?

24 Hashing can result in Collisions
(The example keys were shown as images and are not in the transcript.)
The first key indexes to 933.
The second key indexes to 500.
The third key indexes to 234.
A fourth key also indexes to 933.
When two different keys yield the same index (even from different hashcodes), that is called a collision.
Keys that yield the same index are called synonyms.
Special handling is required.

25 Hashing is inefficient when there are a lot of collisions
Ideally, we want the hashing algorithm to generate indices “sprinkled” randomly throughout the underlying table.
The Uniform Hashing Assumption: each key is equally likely to hash to any one of the table addresses, independently of where the other keys have hashed.

26 Even if this assumption is true, collisions still occur
This is due to the finite set of indices in a table.
The bigger the table, the less likely a collision is to occur.
But tables cannot be made infinitely large.
An unlimited number of keys cannot be mapped into a finite set of indices without some keys sharing an index.
So collision handlers have to be implemented.
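A small counting routine illustrates the point. CollisionCount and countCollisions are hypothetical names; the routine only counts collisions and does not handle them.

```java
class CollisionCount {
    // Counts the keys that land on an index already taken by an earlier key.
    static int countCollisions(int[] hashcodes, int tableLength) {
        boolean[] used = new boolean[tableLength];
        int collisions = 0;
        for (int h : hashcodes) {
            int i = Math.floorMod(h, tableLength);
            if (used[i]) {
                collisions++;
            } else {
                used[i] = true;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // 1025 distinct hashcodes cannot fit into 1024 slots without sharing one.
        int[] codes = new int[1025];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = i;
        }
        System.out.println(countCollisions(codes, 1024));
        // Hashcodes 933 and 1957 differ, yet both index to 933 in a table of 1024.
        System.out.println(countCollisions(new int[] {933, 500, 234, 1957}, 1024));
    }
}
```

Even a perfectly uniform hashcode cannot avoid the first case: once there are more keys than slots, at least one index must be shared.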

