1 Hashing Starring: HashSet Co-Starring: HashMap.

Presentation transcript:

1 Hashing Starring: HashSet Co-Starring: HashMap

2 Purpose: In this lecture we will discuss another data structure, the Hash Table. We will also learn how to use Java’s Map and Set implementations in the HashSet and HashMap classes

3 Resources: Barrons Chapter 11 p.378 – 379 & p. 383 – 385 Chapter 12 p.422 Lambert Fundamentals Comprehensive Lesson 17 p.567 C++ Notes Chapter 26 Java Essentials Study Guide Chapter 17 p.303 & Chapter 20 p.370 Java Methods Chapter 6 p.151 Litvin Be Prepared Chapter 5 p.137

4 Handouts: YOU MUST BRING YOUR BARRONS TEXT TO EACH CLASS !!! 1.Map-Key_value.java 2.Hashing --- Illustration.doc

5 Intro: We have discussed various data structures like the List implementations ArrayList and LinkedList. We have also discussed Stacks and Queues and will soon learn about Binary trees.

6 Intro: With these structures we can iterate over the entire structure and determine if a specific value is in the set.

7 As an example, we can maintain a structure of domain names and determine if a given name has already been assigned. However, we do not know anything about the user who owns it.

8 In another example, we can have a structure of dictionary words. We can determine if a given word is spelled correctly, but if we also wanted to get the meaning, pronunciation or derivation of the word these current structures would come up short.

9 These requirements lead us to a more elaborate structure, such as a Map. A Map allows us to associate a Key with an object. Example:
Key / Index (Lot #): A140
Ultimately links to a HomeOwnerInfo object: Smith, Joe; 120 East End Avenue; value $; Property Taxes $11,000; Family Income $210,000; 3 Children

10 Databases are based on this principle as we can perform searches on the existence of specific objects by searching against an INDEX (key) that provides a LINK to the actual data (object)

11 In this example, the Index or KEY is stored SEPARATE from the data to which it will ultimately point. This structure allows us to maintain the physical data in a separate storage location. The Index or Key provides a link to the data

12 We can have multiple / separate INDICES that work against a single set of objects For example, we can store objects that maintain information on homeowners We can keep their name, address, lot number, home value, tax base, income, number of children, etc

13 We might wish to access this information in different ways Maybe we want to search by phone number or Lot number

14 Key / Index (Phone Number)
Ultimately links to a HomeOwnerInfo object: A140; Smith, Joe; 120 East End Avenue; value $; Property Taxes $11,000; Family Income $210,000; 3 Children
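
As a rough illustration of several indices working against a single set of objects, the sketch below keeps two HashMaps that point at the same record. The reduced HomeOwnerInfo class, the TwoIndexDemo name and the phone number 555-0100 are assumptions made for this example; they are not from the lecture.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record type; only a few of the slide's fields are modeled.
class HomeOwnerInfo {
    final String lot;
    final String name;
    final String phone;

    HomeOwnerInfo(String lot, String name, String phone) {
        this.lot = lot;
        this.name = name;
        this.phone = phone;
    }
}

class TwoIndexDemo {
    // Two separate indices that both link to the same HomeOwnerInfo objects.
    static final Map<String, HomeOwnerInfo> byLot = new HashMap<>();
    static final Map<String, HomeOwnerInfo> byPhone = new HashMap<>();

    static void register(HomeOwnerInfo info) {
        byLot.put(info.lot, info);
        byPhone.put(info.phone, info);
    }

    public static void main(String[] args) {
        register(new HomeOwnerInfo("A140", "Smith, Joe", "555-0100"));
        // Both lookups reach the same object through different keys.
        System.out.println(byLot.get("A140").name);
        System.out.println(byPhone.get("555-0100").name);
    }
}
```

The data is stored once; each Map only stores a link to it under a different Key.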

15 Maybe we want to get information on all homes worth over $500,000 If we were to attempt to store this information in a linked list or an array we would have difficulty implementing efficient search (or sort) processes that could perform searches based on different pieces of data

16 If we sort this data, it can be sorted on only one piece of information (for example, Lot Number), and further changes to elements require re-sorting. This is where a Map implementation is best used. This lecture will focus on this type of implementation, including Hash Tables, HashSet and HashMap

17 Hashing: A System of mapping from KEYS to integer indices in a table The goal is to Map all possible KEY values into a smaller set of indices & to cover that range uniformly

18 The hash algorithm will convert a KEY (SSN, UPC, Account Number) into a representation of a specific location to store or find that information (converts a KEY into a location in the hash table). This tells us where to look for a specific item or where to insert an item. It always returns an integer

19 The “perfect hash function” is one where it yields a 1 to 1 mapping from the index elements to the integers starting at 0 and ending at the last element in the set (array, list)

20 However, there is no general systematic process for generating a perfect hash function when the set of Keys is arbitrary and not known in advance. Therefore we will have to account for and resolve Collisions, which occur when several different Keys map to the same position in the Hash Table

21 Example: Using our Homeowner Database for Example, we can write our own “hashing algorithm” that converts a given Key, Lot Number for example, into an integer value that corresponds to an index in an Array or ArrayList

22 We MUST make certain assumptions: we MUST understand our data so we can estimate its load. In this example, let’s assume that our universe of LOTS in Millburn is approximately 1,000

23 So, let’s count on an array (to hold the Key and related HomeOwnerInfo) that can hold about 1,500 indices. This will allow us to spread out our data so that we can minimize situations where our Keys “hash” to the same index on the array (a Collision)

24 Our “Hashing Algorithm” is simple: we take the numeric value of the Lot and add the ASCII value of the letter. Given this, A140 will “hash” to the integer value 205 (140 + 65):
A140 → 205
A151 → 216
B140 → 206
C150 → 217

25 So, the HomeOwnerInfo along with the Key will be inserted into the array, known as our “Hash Table” as follows:

26 Index # : HomeOwnerInfo object with a Key of
205 : A140
206 : B140
216 : A151
217 : C150

27 So if we were looking for HomeOwner Information for lot Number C150 All we need to do is “Hash” the Lot number which will result in the integer 217 We can then access the Homeowner information as follows MyHomeownerInfoArray[ hashedInteger]
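
The lot-number hashing described above can be sketched as follows. The class name LotHash and the final reduction modulo the table size are assumptions added here so the index always stays inside the 1,500-element array; the slides only describe adding the letter's ASCII value to the number.

```java
class LotHash {
    static final int TABLE_SIZE = 1500; // array size suggested in the lecture

    // Adds the ASCII value of the leading letter to the numeric part of the lot.
    // The "% TABLE_SIZE" step is an added safeguard, not part of the slide.
    static int hash(String lot) {
        int letter = lot.charAt(0);
        int number = Integer.parseInt(lot.substring(1));
        return (letter + number) % TABLE_SIZE;
    }

    public static void main(String[] args) {
        String[] table = new String[TABLE_SIZE];
        table[hash("C150")] = "owner record for C150";
        // Hash the key again to find the record.
        System.out.println(hash("C150") + " -> " + table[hash("C150")]);
    }
}
```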

28 Hash Tables: Typically a fixed-size array in which each item is stored at the index given by an integer representation of its KEY

29 A well balanced Hash Table hinges upon the proper handling of two major issues: Deciding on a solid Hash Function Building an Algorithm for dealing with Collisions

30 The KEY can be SSN’s, last names, UPC Codes When we retrieve an element we need to verify that its KEY matches the target so the KEY must be explicitly stored in the table along with the rest of the record

31 Hash Functions: Converts a KEY into an integer (hashed) where the integer ranges from 0 to one less than the size of the table Properties of a good hash function: Easy and fast to compute

32 Hash Functions: Scatter the data evenly throughout the hash table (uniform) Select a data structure that has more space than actually required Develop a function to compute the hash address (value) Minimize collisions

33 For example, if our Key is a String we could slice the String into parts and add them (using their ASCII values)

34 For example, a String containing an SSN can be broken down into three parts: mod the first part by 100 (133 % 100 = 33), reverse the second part (56 becomes 65), and int divide the third part by 100 (giving 78). The hashed value is 176 (33 + 65 + 78)

35 How good a hash function this is will depend on how evenly it scatters the data over the array and how well it minimizes any collisions The result MUST be an integer that does not exceed the range of the Hash Table This method of manipulating the key is given the term “hashing”

36 Common hash functions are: Numeric / Division: MOD the KEY by an integer equal to the size of the array: KEY % (number of elements). Example: with an ArraySize of 1500, Hash Value = UPC % Size (501 for the UPC in this example)

37 Alpha: Hash the sum of the ASCII values of the KEY’s characters
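
A minimal sketch of the Alpha approach, assuming a hypothetical AlphaHash class and a table size passed in by the caller:

```java
class AlphaHash {
    // Sums the character codes of the key, then reduces the sum to a table index.
    static int hash(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash("cat", 1500));
        System.out.println(hash("act", 1500)); // same letters, same sum
    }
}
```

Because addition ignores order, keys made of the same letters always collide, which is one reason this simple function scatters data less evenly than one might hope.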

38 MidSquare: Square the KEY and maintain the KEY’s middle digits for the Hashed value Works better with smaller values (less than 10,000) Example:number ^2 = becomes the hashed value
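
One possible way to code MidSquare is shown below. The key 4321, the choice of keeping four middle digits, and the MidSquareHash name are assumptions for illustration only.

```java
class MidSquareHash {
    // Squares the key and keeps `digits` characters from the middle of the result.
    static int hash(int key, int digits) {
        String squared = Long.toString((long) key * key);
        if (squared.length() <= digits) {
            return Integer.parseInt(squared); // too short to trim
        }
        int start = (squared.length() - digits) / 2;
        return Integer.parseInt(squared.substring(start, start + digits));
    }

    public static void main(String[] args) {
        // 4321 squared is 18671041; the middle four digits are 6710.
        System.out.println(hash(4321, 4));
    }
}
```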

39 Folding: Divide the KEY into several parts, which are then combined to provide the hashed value

40 Example: Social Security Number : hash as sum of three integers: = 1950 The data stored with the KEY is everything you need for a given structure or record (price, item name, etc.)
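
Folding can be sketched as below. The Social Security Number 123-45-6789 is a made-up value (it is not the number from the slide), and the FoldHash class and indexFor helper are hypothetical names.

```java
class FoldHash {
    // Splits the key at the dashes and adds the parts together.
    static int hash(String ssn) {
        int sum = 0;
        for (String part : ssn.split("-")) {
            sum += Integer.parseInt(part);
        }
        return sum;
    }

    // Reduces the folded value to a valid index for a table of the given size.
    static int indexFor(String ssn, int tableSize) {
        return hash(ssn) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash("123-45-6789"));           // 123 + 45 + 6789
        System.out.println(indexFor("123-45-6789", 1500));
    }
}
```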

41 Example: Bar coding of items in a supermarket. A 10-digit UPC code allows for up to 10 billion items. The average store has approximately 10,000 items

42 If the program that scans these items had to search through all 10 billion possibilities, it would be very inefficient. We can store the UPC codes specific to that store in an array called the HASH TABLE. We typically size the hash table with more elements than the initial universe of elements (KEYS)

43 We could size our array at 15,000 elements The HASH Function will tell us where a specific item is stored in the 15,000 element Array

44 UPCHash Value

45 So, if we were to add in information on Products Keyed by UPC code into a hash table, we could do so as follows: MyHashTable[myProduct.getUPC( ) % 15000] = myProduct;

46 To retrieve product price for a given product you can: priceOfProduct = MyHashTable[ % 15000].getPrice( );
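
The store and retrieve steps above might look like this in full. The Product fields, the ProductTable class and the UPC 7066210506 are assumptions for the sketch, and collisions are deliberately ignored here (a later insert simply overwrites an earlier one).

```java
// Hypothetical product record keyed by its UPC.
class Product {
    private final long upc;
    private final String name;
    private final double price;

    Product(long upc, String name, double price) {
        this.upc = upc;
        this.name = name;
        this.price = price;
    }

    long getUPC() { return upc; }
    String getName() { return name; }
    double getPrice() { return price; }
}

class ProductTable {
    static final int SIZE = 15000;
    private final Product[] table = new Product[SIZE];

    // Division method: UPC % table size (assumes a non-negative UPC).
    static int indexFor(long upc) {
        return (int) (upc % SIZE);
    }

    void put(Product p) {
        table[indexFor(p.getUPC())] = p; // no collision handling in this sketch
    }

    // The stored KEY is checked so a different UPC with the same index is not returned.
    Product get(long upc) {
        Product p = table[indexFor(upc)];
        return (p != null && p.getUPC() == upc) ? p : null;
    }

    public static void main(String[] args) {
        ProductTable t = new ProductTable();
        t.put(new Product(7066210506L, "Cereal", 3.99));
        System.out.println(t.get(7066210506L).getPrice());
    }
}
```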

47 Using our HomeOwnerInfo Example: So, if we were to add in information on HomeOwners Keyed by Lot Number into a hash table, we could do so as follows: aString = myHomeOwnerInfo.getLot( ); index = // break up the string and calculate the // hash value; MyHashTable[index] = myHomeOwnerInfo; To retrieve Lot value for a given home you can: lotValue = MyHashTable[index].getValue( );

48 Collisions: Problems occur when 2 different keys map to the same hash value, that is, the same element (location) in the table. This occurs when we try to insert a new element into the table and its location is already occupied

49 Example, if we used a hashing function that combines Folding with Division: UPC Group into pairs: Multiply the first three pairs together 70 X 66 X 21 = Add this number to the last two pairs: = Find the remainder of mod division by (15000 – 3) % = 7049

50 What happens when we have an item with the bar code and we use the same hash function to code it: X 70 X 21 = = % = 7049

51 This is the same address as the previous bar code. When this event occurs, two values need to be stored in the same hash address. This is called a collision (or hash clash). One reason our table is sized larger than the number of items is to help avoid collisions: the smaller the number of possible addresses, the higher the probability of a collision.
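
The Folding-with-Division function and the resulting collision can be reproduced with the sketch below. The class name FoldDivideHash is hypothetical, and the last four digits of both UPC codes (0506) are invented so that the arithmetic lands on 7049; the original bar codes are not recoverable from the slides.

```java
class FoldDivideHash {
    static final int MODULUS = 15000 - 3;

    // Expects a 10-digit UPC: multiply the first three pairs of digits,
    // add the last two pairs, then take the remainder.
    static int hash(String upc) {
        int[] pairs = new int[5];
        for (int i = 0; i < 5; i++) {
            pairs[i] = Integer.parseInt(upc.substring(2 * i, 2 * i + 2));
        }
        long product = (long) pairs[0] * pairs[1] * pairs[2];
        return (int) ((product + pairs[3] + pairs[4]) % MODULUS);
    }

    public static void main(String[] args) {
        // 70 x 66 x 21 = 97020; 97020 + 5 + 6 = 97031; 97031 % 14997 = 7049
        System.out.println(hash("7066210506"));
        // Swapping the first two pairs gives the same product, hence a collision.
        System.out.println(hash("6670210506"));
    }
}
```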

52 For a hash table to work efficiently, it is important that the programmer knows the approximate number of items that will be stored in the table in advance. There are several ways to resolve a Collision:

53 Chaining Probing Review Example on Hash Coding in Barrons P.422 to 424
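
As a sketch of Chaining only (Probing is not shown), each bucket below holds a small list of the keys that hashed to it. ChainedTable and its method names are assumptions for this example, not the implementation used by Java's own classes.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTable {
    private final List<List<String>> buckets = new ArrayList<>();
    private int size;

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    // Colliding keys simply share a bucket's chain.
    boolean add(String key) {
        List<String> chain = buckets.get(indexFor(key));
        if (chain.contains(key)) {
            return false; // already present
        }
        chain.add(key);
        size++;
        return true;
    }

    boolean contains(String key) {
        return buckets.get(indexFor(key)).contains(key);
    }

    // Ratio of stored entries to buckets.
    double loadFactor() {
        return (double) size / buckets.size();
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(4);
        for (String lot : new String[] {"A140", "A151", "B140", "C150", "D10", "E20"}) {
            t.add(lot);
        }
        System.out.println(t.contains("C150") + " " + t.loadFactor());
    }
}
```

With six keys in four buckets, at least two keys must share a chain, yet every key can still be found.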

54 Load Factor: Many collisions degrade a Hash Table’s performance. If the hash table resolves collisions via Chaining, then the ratio of entries in the table to the total number of “buckets” is called the Hash Table’s Load Factor

55 The Load Factor limit determines how full the table may get BEFORE the Map’s capacity is increased. A small Load Factor means that there is significant wasted space in the Hash Table

56 A high Load Factor means that the advantages of the Hash Table are minimized. Reasonable Load Factors range from 0.5 to 2.0. Java’s HashSet and HashMap accept a maximum Load Factor in the constructor but have a default Load Factor of 0.75

57 HashSet: Remember that the Set interface extends the Collection interface. Definition: a collection that contains NO DUPLICATES of an Object. For example, the input of: 1, 3, 5, 6, 7, 7, 8, 2, 9 has a set of: 1, 2, 3, 5, 6, 7, 8, 9. class java.util.HashSet implements java.util.Set

58 This class is implemented with a Hash table. Each entry in the HashSet is a single Object that can be hashed, rather than a Key / Value pair. With a HashSet (unlike the HashMap), you do not select a “key” to hash by; the object is hashed based on its implementation of the hashCode method

59 The HashSet implements the Set behaviors: boolean add(Object x) adds element if unique otherwise leaves set unchanged boolean contains(Object x) determines if a given object is an element of the set boolean remove(Object x) removes the element from the set or leaves set unchanged

60 The HashSet implements the Set behaviors: int size( ) number of elements in the set Iterator iterator( ) allows for set traversal Object [] toArray( ); Returns elements in the set as an array
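
The Set behaviors listed above can be tried out with a short sketch like this one, which uses the lecture's input of 1, 3, 5, 6, 7, 7, 8, 2, 9. The HashSetBasics class and fromInput helper are hypothetical names, and generics are used here although the slides show the older raw-type style.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetBasics {
    // Builds a set from the given values; duplicates are silently dropped.
    static Set<Integer> fromInput(int... values) {
        Set<Integer> set = new HashSet<>();
        for (int v : values) {
            set.add(v);
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> set = fromInput(1, 3, 5, 6, 7, 7, 8, 2, 9);
        System.out.println(set.size());      // the duplicate 7 is stored once
        System.out.println(set.contains(2));
        System.out.println(set.add(7));      // false: 7 is already present
        System.out.println(set.remove(9));   // true: 9 was present
    }
}
```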

61 HashSet has a default constructor that creates an empty Hash Table with a default capacity and Load Factor You may set the initial capacity by using the overloaded constructor HashSet myHash = new HashSet(200);

62 To avoid unnecessary reallocation and rehashing of the table when it runs out of space, set the initial capacity (the number of buckets to be used in the table) to roughly 2 times the expected number of elements to be stored. Another overloaded constructor also lets you set the Load Factor limit, which must be a float: HashSet myHash = new HashSet(200, 1.5f);

63 Objects stored in the HashSet DO NOT need to implement the Comparable interface An Iterator for the HashSet produces the set’s values in NO particular order When ordering is not important HashSet is a better choice than the TreeSet (discussed in next lecture)

64 When iterating over a HashSet, do NOT modify the Set by any means other than the iterator’s own iter.remove( ), as an error (a ConcurrentModificationException) will typically be produced. Invoking the HashSet’s add or contains method invokes the hashCode method of the OBJECT (value) being stored
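
A small sketch of safe removal during iteration, assuming a hypothetical SafeRemoval helper that drops names shorter than a given length:

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class SafeRemoval {
    // Removes elements through the iterator itself, the only safe way
    // to modify the set while it is being traversed.
    static void removeShorterThan(Set<String> names, int minLength) {
        for (Iterator<String> iter = names.iterator(); iter.hasNext(); ) {
            if (iter.next().length() < minLength) {
                iter.remove();
            }
        }
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        names.add("Julie");
        names.add("Eve");
        names.add("Al");
        removeShorterThan(names, 4);
        System.out.println(names);
    }
}
```

Calling names.remove(...) inside the same loop instead of iter.remove( ) is the kind of modification the warning above is about.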

65 For example, if we were storing a String as the value, the String’s hashCode is executed. The String class returns a hashCode value as an int for the String. Sets DO NOT allow duplicates

66 A duplicate exists when the equals method applied against two objects resolves to true. Therefore, if you use a user defined class in a HashSet, make sure the equals AND hashCode methods are defined (overridden from the Object superclass version). Otherwise unwanted duplicates may result
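
For a user defined class, overriding both methods might look like the following. The Lot class, its two fields and the LotSetDemo class are assumptions for this sketch; java.util.Objects.hash is one convenient way to combine fields, not the only one.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical key class: a lot such as A140 split into a letter and a number.
final class Lot {
    private final char section;
    private final int number;

    Lot(char section, int number) {
        this.section = section;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Lot)) return false;
        Lot that = (Lot) other;
        return section == that.section && number == that.number;
    }

    // Must agree with equals: equal lots produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(section, number);
    }
}

class LotSetDemo {
    public static void main(String[] args) {
        Set<Lot> lots = new HashSet<>();
        lots.add(new Lot('A', 140));
        lots.add(new Lot('A', 140)); // recognized as a duplicate
        System.out.println(lots.size());
    }
}
```

Without the two overrides, the two separately created A140 objects would be treated as different elements.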

67 Review Examples on HashSet Coding in Barrons P. 379 to 381

68 NOTE: in example 2, remember that ArrayList IS A Collection and HashSet has a constructor that takes in a Collection. Therefore, passing the ArrayList to the HashSet constructor will automatically remove any duplicates
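
The Collection constructor mentioned in this note can be sketched as follows; the Dedup class and unique method are hypothetical names, and this is not the Barrons example itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Dedup {
    // HashSet's Collection constructor adds every element, dropping duplicates.
    static Set<String> unique(List<String> items) {
        return new HashSet<>(items);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("Julie");
        names.add("Eve");
        names.add("Julie");
        System.out.println(unique(names).size());
    }
}
```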

69 OPEN and Review HashSet on Java Docs

70 Another Example: Lets Review the HashSet Example in the Handout

71 The add method of HashSet, names.add("Julie"); calls the hashCode of the Object being added (a String in this example). String has a hashCode method that resolves the “state” of the String into a hash value (integer), which determines the place in the HashSet’s hash table where this object will be stored

72 In the same manner the call to the HashSet’s remove method names.remove(“Eve”); invokes the String’s hashCode to determine where in the Hash Table this object resides

73 This is why it is CRITICAL to understand that Objects used in a HashSet MUST have the equals and hashCode methods defined !!! In your own classes, you would need to have the hashCode and equals methods defined

74 HashMap: class java.util.HashMap implements java.util.Map The HashMap implements the Map behaviors:

75 Object put(Object key, Object value) Associates a Value with a Key and places this pair into the Map. REPLACES a prior value if the Key is already Mapped to a value. Returns the value PREVIOUSLY associated with the Key, or NULL if no prior mapping exists. Object get(Object key) Returns the value associated with a Key, OR NULL if no mapping exists or the Key maps to NULL

76 Object remove (Object key) Removes the mapping for this Key and returns its associated value, OR returns NULL if no mapping existed or the mapping was to NULL. boolean containsKey(Object key) True if there is a key / value mapping, otherwise false. int size( ) Returns the number of key / value mappings. Set keySet( ) Returns the Set of keys in the map
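
The Map behaviors above can be exercised with a sketch such as this. The HashMapBasics class, the buildOwners helper and the owner names other than "Smith, Joe" are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapBasics {
    // Maps a lot number (Key) to an owner name (Value).
    static Map<String, String> buildOwners() {
        Map<String, String> owners = new HashMap<>();
        owners.put("A140", "Smith, Joe");
        owners.put("B140", "Doe, Jane"); // hypothetical owner
        return owners;
    }

    public static void main(String[] args) {
        Map<String, String> owners = buildOwners();
        // put on an existing Key replaces the value and returns the old one.
        String previous = owners.put("A140", "Smith, Joan");
        System.out.println(previous);
        System.out.println(owners.get("A140"));
        System.out.println(owners.get("Z999"));        // no mapping: null
        System.out.println(owners.remove("B140"));     // returns the removed value
        System.out.println(owners.containsKey("B140"));
        System.out.println(owners.keySet());
    }
}
```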

77 Default constructor creates an empty Map. Keys (Objects) stored in the HashMap DO NOT need to implement the Comparable interface. Invoking the HashMap’s put or containsKey method invokes the hashCode method of the OBJECT (Key) being stored

78 For example, if an Integer is the Key, the Integer’s hashCode is executed. The Integer class returns a hashCode value as an int for the Integer (Key). You are not required to Iterate over a HashMap

79 However, you will be expected to write code that iterates over the Set of Keys in a Map:

80 HashMap m = new HashMap( ); // add key / value pairs to the map for (Iterator i = m.keySet( ).iterator( ) ; i.hasNext( ) ; ) System.out.println( i.next( ) ); The Keys will appear in an unpredictable order. If i.remove( ) is executed during this iteration over the Key Set, then the associated

81 Key / Value pair will be removed from the HashMap
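
A sketch of removing entries while iterating over the Key Set, assuming a hypothetical KeySetRemoval helper and lot-number keys:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class KeySetRemoval {
    // Removing a key through the key-set iterator also removes its value from the map.
    static void removeKeysStartingWith(Map<String, Integer> map, String prefix) {
        for (Iterator<String> i = map.keySet().iterator(); i.hasNext(); ) {
            if (i.next().startsWith(prefix)) {
                i.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> indexByLot = new HashMap<>();
        indexByLot.put("A140", 205);
        indexByLot.put("A151", 216);
        indexByLot.put("B140", 206);
        removeKeysStartingWith(indexByLot, "A");
        System.out.println(indexByLot);
    }
}
```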

82 Review Examples on HashMap Coding in Barrons P OPEN and Review HashMap on Java Docs

83 Another Example: Lets Review the HashMap Example in the Handout

84 Misc: Java’s String, Double and Integer classes have their own hashCode methods built in. When designing your own class for use in a HashSet or HashMap, you need to override the Object’s hashCode method with a method that is appropriate for your specific class

85 The Object hashCode typically operates on the Object’s memory location (identity) and NOT on the attributes of the class. Regardless of who designs the class, you MUST supply a hashCode (together with a matching equals) if you plan on using your objects in a HashSet or as Keys in a HashMap

86 The hashCode method returns an integer, which the HashSet and HashMap further map onto the range of valid table indices for a particular table
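
One simple way to map a hashCode onto valid table indices is a non-negative remainder, sketched below. This is an illustration only and is not claimed to be how java.util.HashMap computes its indices internally; BucketIndex and indexFor are hypothetical names.

```java
class BucketIndex {
    // hashCode() may be negative, so floorMod is used to stay within 0..tableSize-1.
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(indexFor("A140", 1500));
        System.out.println(indexFor(-7, 10)); // Integer's hashCode is its own value
    }
}
```

A plain % would return a negative result for a negative hashCode, which is not a valid array index.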

87 Big-O: HashSet has an expected Big-O of O(1) for add, remove and contains. HashMap has an expected Big-O of O(1) for get and put, but could be O(n) in the worst case if many collisions occur. A Hash Table provides a structure where insert and search are carried out in constant time on average

88 AP AB Subset Requirements: Students should be able to understand: Hash tables as well as understand how to use the Java classes HashSet and HashMap Understand and be able to utilize the three HashSet constructors

89 AP AB Subset Requirements: Know the concept of hashing and how collisions are created and resolved Explain how best to construct a Hash Table to minimize collisions Understand the goal of a good hash function

90 AP AB Subset Requirements: Understand chaining, probing and load factor Determine when to use the HashSet and HashMap and know the Big-O of their behaviors Write code that creates, adds, removes and iterates over Sets using HashSet

91 AP AB Subset Requirements: Write code that creates, puts, gets, removes and returns the Set of Keys for a HashMap

92 Tips for the AP Exam: Do not change objects in a Set Sets do not contain duplicates Sets are not ordered

93 Tips for the AP Exam: Use an Iterator to list all of the elements of a Set. Iterating through a HashSet does not proceed in any specific order. You cannot add an element to a set at an iterator position

94 Tips for the AP Exam: In a HashMap only the Keys are hashed. HashSet’s and HashMap’s add, remove and contains operations run in O(1) expected time but O(n) in the worst case. User Defined Classes that will be used in a HashSet or HashMap should have overridden equals and hashCode methods

95 Project:MyMap POE

96 TEST FOLLOWS LABS !!!