1 Chapter 9 Maps and Dictionaries

2 A basic problem
We have to store some records and perform the following operations:
- add a new record
- delete a record
- search for a record by key
Find a way to do these efficiently!

3 Map (Dictionary)
A map is a collection of pairs of the form (k, e), where
- k is a key
- e is the element associated with the key
Operations allowed on a map:
- get the element associated with a particular key
- insert an element with a specific key
- delete the element with a specific key
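For concreteness, here is how these three operations look with Java's standard java.util.Map interface, using a HashMap. This is only an illustration; the class name MapDemo is hypothetical, and the names and scores are sample data borrowed from the later slides.

```java
import java.util.HashMap;
import java.util.Map;

// Demo of the three map operations using the standard java.util.Map
// interface; the names and scores are sample data.
class MapDemo {
    // Inserts two entries, looks one up, deletes the other, and returns
    // the resulting map.
    static Map<String, Integer> run() {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("tom", 73);             // insert an element with key "tom"
        scores.put("bill", 49);            // insert an element with key "bill"
        Integer tom = scores.get("tom");   // get the element for "tom" (73)
        System.out.println("tom -> " + tom);
        scores.remove("bill");             // delete the element with key "bill"
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(run());         // prints {tom=73}
    }
}
```

Note that get returns null when no pair with the given key exists, which is how a lookup reports "not found".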

4 Array as table
[Figure: a table of student records with columns studid, name and score; names include tom, mary, peter, david, andy, betty and bill.]
Consider this problem: we want to store 1000 student records and search them by their social security number.

5 Array as table
[Figure: a very large array of student records (studid, name, score) in which only a few entries, such as andy, bill, david and betty, are occupied.]
One approach would be to store the records in an array large enough to be indexed directly by student id, i.e. the record of the student with id k is stored at A[k] (for example, A[12345]).

6 Array as table
Store the records in an array where the index corresponds to the key:
- add: very fast, O(1)
- delete: very fast, O(1)
- search: very fast, O(1)
Problems?
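The array-as-table idea can be sketched as a direct-address table in which the key is the array index. This is a minimal sketch assuming integer keys in the range 0 to capacity - 1; the class and method names are hypothetical. Each operation is a single array access, which is why all three are O(1); the catch is that the array must be as large as the whole key range.

```java
// A direct-address table: the key itself is the array index.
// Assumption: keys are ints in the range [0, capacity).
class DirectAddressTable<V> {
    private final Object[] slots;

    DirectAddressTable(int capacity) {
        slots = new Object[capacity];
    }

    // add: O(1), store the value at index key
    void add(int key, V value) {
        slots[key] = value;
    }

    // search: O(1), null means "no record with this key"
    @SuppressWarnings("unchecked")
    V search(int key) {
        return (V) slots[key];
    }

    // delete: O(1), just clear the slot
    void delete(int key) {
        slots[key] = null;
    }

    public static void main(String[] args) {
        // Memory grows with the key range, not with the number of records:
        // 100,000 slots are allocated here for a single student.
        DirectAddressTable<String> table = new DirectAddressTable<>(100_000);
        table.add(12345, "andy");
        System.out.println(table.search(12345)); // andy
        table.delete(12345);
        System.out.println(table.search(12345)); // null
    }
}
```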

7 Typical solution
Normally it is not possible to make a table large enough to hold the values associated with all possible keys. A table used to look up student names might only contain a few thousand entries, while the range of possible keys (e.g. all possible social security numbers) is far larger. This means the table length will be less than the key range.

8 Hashing
Two components:
- a table (which can be accessed randomly by position)
- a hash function h(key), which maps keys into positions in the table
Generally speaking, the domain of h is much larger than its range. In other words, h maps lots of possible keys to just a (relatively) few possible outputs.
Basic idea: the element with key k is stored in position h(k) of the table.

9 Hash function
int Hash(key)
Imagine that we have a magic function that we'll call "Hash". It maps the keys (student ssnums) of the 1000 records into a range of integers one-to-one: no two different keys map to the same number.
H(‘…’) = 134, H(‘…’) = 67, H(‘…’) = 764, …, H(‘…’) = 3

10 Hash table
[Figure: an array in which the records of betty, bill, andy and david occupy the slots given by their hash values.]
To store a record, we compute Hash(ssnum) for the record and store it at location Hash(ssnum) of the array. To search for a student, we only need to peek at location Hash(ssnum).
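Storing and searching with a perfect hash might be sketched as follows. The hash function is passed in, and the sketch simply assumes it is perfect for the keys actually stored; if that assumption is violated it throws an exception rather than resolving the collision. PerfectHashTable, its methods and the sample SSNs are hypothetical.

```java
import java.util.function.ToIntFunction;

// Sketch of a hash table that relies on a perfect hash: every key that is
// stored maps to its own slot, so there is no collision handling at all.
// The caller supplies the hash function and is responsible for it being
// perfect on the stored keys (an assumption of this sketch).
class PerfectHashTable<V> {
    private final String[] keys;
    private final Object[] values;
    private final ToIntFunction<String> hash;

    PerfectHashTable(int size, ToIntFunction<String> hash) {
        this.keys = new String[size];
        this.values = new Object[size];
        this.hash = hash;
    }

    // To store a record, compute Hash(key) and store it at that location.
    void add(String key, V value) {
        int i = hash.applyAsInt(key);
        if (keys[i] != null && !keys[i].equals(key)) {
            // Two different keys share a slot: the hash is not perfect.
            throw new IllegalStateException(key + " collides with " + keys[i]);
        }
        keys[i] = key;
        values[i] = value;
    }

    // To search, only peek at location Hash(key).
    @SuppressWarnings("unchecked")
    V search(String key) {
        int i = hash.applyAsInt(key);
        return key.equals(keys[i]) ? (V) values[i] : null;
    }

    // To delete, clear location Hash(key) if it holds this key.
    void delete(String key) {
        int i = hash.applyAsInt(key);
        if (key.equals(keys[i])) {
            keys[i] = null;
            values[i] = null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical SSNs whose last three digits differ, so "last three
        // digits" happens to be a perfect hash for this particular set.
        PerfectHashTable<String> t = new PerfectHashTable<>(1000,
                ssn -> Integer.parseInt(ssn.substring(ssn.length() - 3)));
        t.add("123456134", "andy");
        t.add("987654067", "bill");
        System.out.println(t.search("123456134")); // andy
        t.delete("123456134");
        System.out.println(t.search("123456134")); // null
    }
}
```

"Last three digits" is perfect here only because the two sample keys end in different digits (134 and 067); adding a third key ending in 134 would make add throw.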

11 Hash table with perfect hash
Such a magic function is called a perfect hash. With it:
- add: very fast, O(1)
- delete: very fast, O(1)
- search: very fast, O(1)
But it is generally difficult to design a perfect hash (for example, when the potential key space is large). A hash function should try to mix the information in the key and convert it to an index within the range of the hash table (the hash address).

12 Hash function
A hash function maps a key to an index within a particular range. A good hash function:
- ideally provides a one-to-one mapping between table locations and keys (in practice, collisions usually cannot be avoided);
- is easy and quick to compute;
- achieves an even distribution of the keys that actually occur over the locations in the table.
It is often difficult to come up with a good hash function; it may require a mathematician or a statistical analysis of the expected keys.
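One way to check the third property is to count how many of the keys that actually occur land in each table location. In this sketch the keys are hypothetical 7-digit numbers that all share the leading digits 426 (much as phone numbers from one exchange would); a hash that looks only at the leading digits sends every key to the same slot, while key % 101 spreads them out. DistributionCheck and its methods are illustrative names.

```java
import java.util.function.IntUnaryOperator;

// Sketch: measure how evenly a hash function spreads the keys that actually
// occur, by counting how many keys land in each table location.
class DistributionCheck {
    // counts[j] = number of keys whose hash is j (h must return 0..tableSize-1)
    static int[] bucketCounts(int[] keys, int tableSize, IntUnaryOperator h) {
        int[] counts = new int[tableSize];
        for (int k : keys) {
            counts[h.applyAsInt(k)]++;
        }
        return counts;
    }

    // The largest number of keys sharing one location (1 = no collisions).
    static int maxLoad(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical 7-digit keys that all start with 426.
        int[] keys = new int[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 4_260_000 + i * 37;
        }
        int size = 101;
        // Poor: looks only at the leading digits, which every key shares.
        int poor = maxLoad(bucketCounts(keys, size, k -> (k / 10_000) % size));
        // Better: division by a prime uses the whole key.
        int good = maxLoad(bucketCounts(keys, size, k -> k % size));
        System.out.println(poor + " vs " + good); // 100 vs 1
    }
}
```

With these keys one slot receives all 100 keys under the leading-digits hash, while the division hash places every key in a different slot.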

13 Phone numbers
Consider using a hash function to look up the name of a faculty member given their BSU phone number. How would you do it?

14 Common hash functions
Division (very common): H(key) = key % hashTable.length
- Using a prime number for the modulus usually has the effect of spreading the keys quite uniformly.
Truncation: ignore part of the key.
- A good example is using the last 4 digits of an SSN.
Folding: partition the key into several parts and combine the parts in some way.
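The three families above can be sketched for non-negative integer keys as follows. The 4-digit truncation follows the SSN example; the 3-digit parts used for folding, the sample key 123456789 and the class name are illustrative assumptions.

```java
// Sketches of the three hash-function families for non-negative integer
// keys. The 3-digit part size used for folding is an illustrative choice.
class CommonHashes {
    // Division: H(key) = key % tableLength
    static int division(long key, int tableLength) {
        return (int) (key % tableLength);
    }

    // Truncation: ignore part of the key and keep only the last 4 digits.
    static int truncateLast4(long ssn) {
        return (int) (ssn % 10_000);
    }

    // Folding: split the key into 3-digit parts, add the parts together,
    // then reduce the sum to a table index.
    static int fold(long key, int tableLength) {
        long sum = 0;
        while (key > 0) {
            sum += key % 1000; // lowest 3-digit part
            key /= 1000;       // drop that part
        }
        return (int) (sum % tableLength);
    }

    public static void main(String[] args) {
        long ssn = 123_456_789L;
        System.out.println(division(ssn, 101));   // 45
        System.out.println(truncateLast4(ssn));   // 6789
        System.out.println(fold(ssn, 1000));      // 123 + 456 + 789 = 1368 -> 368

        // Multiples of 10 all land in slot 0 when the table length is 10,
        // but in different slots when the length is the prime 11.
        for (long k = 10; k <= 50; k += 10) {
            System.out.println(k + ": " + division(k, 10) + " vs " + division(k, 11));
        }
    }
}
```

The loop in main illustrates the point about prime moduli: with length 10, every multiple of 10 shares slot 0, while with length 11 the keys 10 to 50 go to slots 10, 9, 8, 7 and 6.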

15 Collisions
Generally speaking, we cannot avoid collisions. Collision resolution: what do we do when two different keys map to the same index?
H(‘…’) = 134, H(‘…’) = 67, H(‘…’) = 764, …, H(‘…’) = 3, H(‘…’) = 3

16 Linear probing
When a collision occurs, just look for the next available slot.
To find an element in the table:
- hash to its position;
- if it is not there, look at successive slots (wrapping around at the end of the table) until
  - you find it,
  - you hit an empty slot, or
  - you return to the original slot.
Removing an element is tricky: simply emptying its slot can cut the probe sequence of an element that was stored further along, so a later search would stop at the hole and miss it.
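A minimal sketch of the procedure above for non-negative int keys, assuming h(key) = key % capacity; the class and method names are hypothetical. For removal it uses tombstones, one common technique among several: a deleted slot is marked rather than emptied, so a later search walks past it instead of stopping early.

```java
// Sketch of linear probing for non-negative int keys, assuming
// h(key) = key % capacity. Removal leaves a tombstone (see comments).
class LinearProbingTable {
    private final Integer[] slots;   // null = currently unoccupied
    private final boolean[] deleted; // true = tombstone: occupied once, then removed

    LinearProbingTable(int capacity) {
        slots = new Integer[capacity];
        deleted = new boolean[capacity];
    }

    private int hash(int key) {
        return key % slots.length;
    }

    // Hash to the key's position, then look at successive slots until we
    // find it (return its index), hit a never-used slot, or come back to the
    // original slot (return -1 in both cases). Tombstones are skipped over.
    int find(int key) {
        int start = hash(key);
        for (int i = 0; i < slots.length; i++) {
            int j = (start + i) % slots.length;
            if (slots[j] == null && !deleted[j]) return -1;
            if (slots[j] != null && slots[j] == key) return j;
        }
        return -1;
    }

    // Place the key in the first unoccupied slot (empty or tombstone) on its
    // probe sequence. Returns false only if the table is full.
    boolean add(int key) {
        if (find(key) >= 0) return true; // already present
        int start = hash(key);
        for (int i = 0; i < slots.length; i++) {
            int j = (start + i) % slots.length;
            if (slots[j] == null) {
                slots[j] = key;
                deleted[j] = false;
                return true;
            }
        }
        return false;
    }

    // Mark the slot as a tombstone instead of emptying it, so searches for
    // keys stored further along the probe sequence do not stop here.
    boolean remove(int key) {
        int j = find(key);
        if (j < 0) return false;
        slots[j] = null;
        deleted[j] = true;
        return true;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        t.add(3);
        t.add(13);                       // collides with 3, goes to slot 4
        t.add(23);                       // collides twice, goes to slot 5
        t.remove(13);
        System.out.println(t.find(23));  // 5: the tombstone at slot 4 is skipped
    }
}
```

Tombstones keep searches correct, but they are only reclaimed when a later insertion reuses them.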

17 Linear Probing

18 Linear Probing add( 10 )

19 Linear Probing add( 23 )

20 Linear Probing COLLISION!! add( 3 )

21 Linear Probing add( 56 )

22 Linear Probing COLLISION!! add( 44 )

23 Linear Probing COLLISION!! add( 14 )

24 Linear Probing COLLISION!! add( 93 )

25 Linear Probing COLLISION!! add( 81 )

26 Linear Probing COLLISION!! add( 72 )

27 Linear Probing COLLISION!! add( 100 )
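The slide images for this sequence are not preserved, so the following replay assumes a table length of 10 and h(key) = key % 10. It inserts the same ten keys with linear probing and records how many slots each insertion examined; the class name is hypothetical.

```java
import java.util.Arrays;

// Replays the slides' linear-probing sequence, assuming a table length of 10
// and h(key) = key % 10 (the slide images themselves are not preserved).
class LinearProbingReplay {
    // Inserts keys in order and returns the final table; probes[i] receives
    // the number of slots examined while inserting keys[i].
    static Integer[] replay(int[] keys, int length, int[] probes) {
        Integer[] table = new Integer[length];
        for (int i = 0; i < keys.length; i++) {
            int j = keys[i] % length;
            int examined = 1;
            while (table[j] != null) {          // occupied: try the next slot
                if (examined == length) {
                    throw new IllegalStateException("table is full");
                }
                j = (j + 1) % length;
                examined++;
            }
            table[j] = keys[i];
            probes[i] = examined;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {10, 23, 3, 56, 44, 14, 93, 81, 72, 100};
        int[] probes = new int[keys.length];
        Integer[] table = replay(keys, 10, probes);
        System.out.println(Arrays.toString(table));  // [10, 81, 72, 23, 3, 44, 56, 14, 93, 100]
        System.out.println(Arrays.toString(probes)); // [1, 1, 2, 1, 2, 4, 6, 1, 1, 10]
    }
}
```

Under these assumptions, keys that hash into the run of occupied slots 3 to 8 need more and more probes (14 needs 4, 93 needs 6), and 100 has to walk the entire table.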

28 Analysis
The major drawback of linear probing is clustering:
- values tend to clump up in the table;
- thus the sequential searches required to find an element become longer and longer.
Possible solutions:
- pseudo-random probing;
- quadratic probing (h+1, h+4, h+9, …): probe at (h + i²) mod table.length;
- key-dependent increments, for example using the first digit of the key as the increment.
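Quadratic probing can be sketched by changing only the step between probes. This sketch assumes h = key % table.length, and the class name is hypothetical. Unlike linear probing, the quadratic sequence need not visit every slot, so an insertion can fail while free slots remain; if the table length is prime and at least half of the slots are empty, an empty slot is guaranteed to be found.

```java
import java.util.Arrays;

// Sketch of quadratic probing: the i-th probe for a key is
// (h + i*i) % table.length, assuming h = key % table.length.
class QuadraticProbing {
    // Stores key in the first free slot of its probe sequence and returns
    // that index, or -1 if table.length probes find no free slot (possible
    // even when free slots remain, since the sequence can skip slots).
    static int insert(Integer[] table, int key) {
        int h = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int j = (h + i * i) % table.length;
            if (table[j] == null) {
                table[j] = key;
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int k : new int[] {10, 23, 3, 56, 44, 14, 93, 81, 72, 100}) {
            insert(table, k);
        }
        System.out.println(Arrays.toString(table)); // [10, 81, 72, 23, 3, 44, 56, 93, 14, 100]
    }
}
```

With the same ten keys in a length-10 table, 14 probes slots 4, 5 and then 8, skipping over 6 and 7, so it no longer simply extends the cluster one slot at a time.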

29 Chained hash table
[Figure: an array of list pointers with indices up to HASHMAX; unused entries are nil, and one list holds a record with key …, name: tom, score: 73.]
One way to handle collisions is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.
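A chained hash table along these lines might look as follows. It is a sketch, not a reference implementation: the class and its methods are hypothetical, keys are Strings, and the slot index is taken from String.hashCode() with Math.floorMod so that it is never negative. New records go at the head of their list, matching the "insert at the head" analysis of slide 42.

```java
import java.util.LinkedList;

// Sketch of a chained hash table: each array entry refers to a linked list
// of the records that hashed there; an unused entry stays null ("nil").
class ChainedHashTable<V> {
    private static final class Entry<T> {
        final String key;
        T value;
        Entry(String key, T value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        buckets = (LinkedList<Entry<V>>[]) new LinkedList[size];
    }

    // floorMod keeps the index non-negative even if hashCode() is negative.
    private int index(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, V value) {
        int i = index(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Entry<V> e : buckets[i]) {
            if (e.key.equals(key)) { e.value = value; return; } // replace value
        }
        buckets[i].addFirst(new Entry<>(key, value));           // insert at head
    }

    V get(String key) {
        LinkedList<Entry<V>> list = buckets[index(key)];
        if (list == null) return null;
        for (Entry<V> e : list) {                                // sequential search
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    boolean remove(String key) {
        LinkedList<Entry<V>> list = buckets[index(key)];
        return list != null && list.removeIf(e -> e.key.equals(key));
    }

    public static void main(String[] args) {
        ChainedHashTable<Integer> scores = new ChainedHashTable<>(7);
        scores.put("tom", 73);
        scores.put("bill", 49);
        System.out.println(scores.get("tom"));   // 73
        scores.remove("tom");
        System.out.println(scores.get("tom"));   // null
    }
}
```

Because put first scans the list for an existing key, a replacing put costs as much as a search; a table that never stores duplicate keys could skip that scan.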

30 Chaining
[Figure: a chained hash table with 10 buckets, all initially NULL.]

31 Chaining add(10)

32 Chaining add(23)

33 Chaining add(3)

34 Chaining add(56)

35 Chaining add(44)

36 Chaining add(14)

37 Chaining add(93)

38 Chaining add(81)

39 Chaining add(72)

40 Chaining add(100)
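The chaining sequence can be replayed in the same way. This sketch assumes 10 buckets (the first chaining slide shows ten empty buckets) and h(key) = key % 10, with each new key inserted at the head of its chain; ChainingReplay is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Replays the slides' chaining sequence, assuming 10 buckets and
// h(key) = key % 10; new keys are inserted at the head of their chain.
class ChainingReplay {
    static List<LinkedList<Integer>> replay(int[] keys, int buckets) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < buckets; i++) table.add(null); // null = empty bucket
        for (int k : keys) {
            int b = k % buckets;
            if (table.get(b) == null) table.set(b, new LinkedList<>());
            table.get(b).addFirst(k);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {10, 23, 3, 56, 44, 14, 93, 81, 72, 100};
        List<LinkedList<Integer>> table = replay(keys, 10);
        for (int b = 0; b < table.size(); b++) {
            System.out.println(b + ": " + (table.get(b) == null ? "NULL" : table.get(b)));
        }
    }
}
```

Under these assumptions bucket 0 ends up holding [100, 10], bucket 3 [93, 3, 23] and bucket 4 [14, 44], while buckets 5, 7, 8 and 9 stay NULL.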

41 Analysis
Advantages of chaining:
- space savings if items are large;
- simple and efficient collision handling;
- deleting items is very easy.
Disadvantages of chaining:
- links take up space;
- as chains increase in length, searches take longer.

42 Chained hash table
A hash table in which collided records are stored in linked lists.
- With a good hash function and an appropriate table size there are few collisions, and add, delete and search are all very fast, O(1) on average.
- Otherwise, some hash value ends up with a long list of collided records:
  - add: just insert at the head, still fast, O(1);
  - delete a target: delete from an unsorted linked list, slow, O(n);
  - search: sequential search, slow, O(n).
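What "good hash function, appropriate hash size" buys can be made a little more precise with the standard textbook analysis of chaining. Assuming simple uniform hashing (each key is equally likely to land in any of the m slots, independently of the other keys), with n stored keys:

```latex
\begin{aligned}
\alpha &= \frac{n}{m} && \text{(load factor)}\\
\mathbb{E}[\text{length of a chain}] &= \alpha\\
\mathbb{E}[\text{cost of a search}] &= O(1 + \alpha)\\
\text{cost of a search, worst case} &= O(n) && \text{(all keys in one chain)}
\end{aligned}
```

So keeping m proportional to n keeps α bounded and the expected cost O(1); for the ten keys in ten buckets of the chaining slides, α = 1. The O(n) bounds above are the case where a poor hash function sends most keys to the same list.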

48 hashCode()
All Java objects have a method named hashCode() (defined in class Object). By default, hashCode() returns a value derived from the object's identity; this has historically often been based on the address in memory where the object is stored, though the language does not require it.
General rules for implementing hashCode():
- When invoked more than once on the same object during one execution of a program, it must return the same value each time (provided no information used in equals comparisons on the object has changed).
- If o1.equals(o2), then o1.hashCode() must equal o2.hashCode().
- Note that it is not required that o1.hashCode() != o2.hashCode() when o1.equals(o2) is false.
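A class that follows these rules might be sketched as below. Student and its fields are hypothetical; the point is that equals() and hashCode() are computed from the same field (ssn), and that field is final, so both rules hold. Objects.hash and Objects.requireNonNull are standard java.util.Objects helpers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical record class whose equals() and hashCode() both depend only
// on ssn, so equal objects always produce equal hash codes.
final class Student {
    private final String ssn;  // final: the hash code can never change
    private final String name; // deliberately not part of equals()/hashCode()

    Student(String ssn, String name) {
        this.ssn = Objects.requireNonNull(ssn);
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Student)) return false;
        return ssn.equals(((Student) o).ssn);
    }

    @Override
    public int hashCode() {
        return Objects.hash(ssn);  // uses the same field as equals()
    }

    public static void main(String[] args) {
        Map<Student, Integer> scores = new HashMap<>();
        scores.put(new Student("123456789", "tom"), 73);
        // A different but equal object finds the same entry.
        System.out.println(scores.get(new Student("123456789", "tom"))); // 73
    }
}
```

If hashCode() were not overridden, the lookup with a second, equal Student would normally return null, because the default hash codes of two distinct objects usually differ and HashMap would look in the wrong bucket.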
