ICOM 4035 – Data Structures Lecture 12 – Hashtables & Map ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico,

Slides:



Advertisements
Similar presentations
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Advertisements

Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree.
CSCE 3400 Data Structures & Algorithm Analysis
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 11 – Hash-based Indexing.
Skip List & Hashing CSE, POSTECH.
Hashing as a Dictionary Implementation
ICOM 4035 – Data Structures Lecture 8 – Doubly Linked Lists Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez.
© 2004 Goodrich, Tamassia Hash Tables1  
Dictionaries and Their Implementations Chapter 18 Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013.
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Hashing Techniques.
Maps, Dictionaries, Hashtables
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
CSE 373 Data Structures and Algorithms Lecture 18: Hashing III.
Dictionaries 4/17/2017 3:23 PM Hash Tables  
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
CS2110 Recitation Week 8. Hashing Hashing: An implementation of a set. It provides O(1) expected time for set operations Set operations Make the set empty.
COSC 2007 Data Structures II
ICOM 4035 – Data Structures Lecture 2 – ADT and Interfaces Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
ICOM 4035 – Data Structures Lecture 9 – Stack ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez ©Manuel.
ICOM 4035 – Data Structures Lecture 5 – List ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez ©Manuel.
ICOM 4035 – Data Structures Lecture 13 – Trees ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez ©Manuel.
ICOM 4035 – Data Structures Lecture 14 – Binary Search Tree ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico,
ICOM 4035 – Data Structures Lecture 11 – Map ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez ©Manuel.
ICOM 4035 – Data Structures Lecture 1 - Introduction Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez.
1 HashTable. 2 Dictionary A collection of data that is accessed by “key” values –The keys may be ordered or unordered –Multiple key values may/may-not.
Hash Tables1   © 2010 Goodrich, Tamassia.
ICOM 4035 – Data Structures Lecture 3 – Bag ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez ©Manuel.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing Hashing is another method for sorting and searching data.
ICOM 4035 – Data Structures Lecture 7 – Linked Lists Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez.
Hashing as a Dictionary Implementation Chapter 19.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
Hash Tables CSIT 402 Data Structures II. Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions.
ICOM 4035 – Data Structures Lecture 10 – Queue ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez ©Manuel.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
CS261 Data Structures Hash Tables Open Address Hashing.
ICOM 4035 – Data Structures Lecture 4 – Set ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez ©Manuel.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis.
Copyright © Curt Hill Hashing A quick lookup strategy.
CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
1 the hash table. hash table A hash table consists of two major components …
Dictionaries and Their Implementations Chapter 18 Data Structures and Problem Solving with C++: Walls and Mirrors, Frank Carrano, © 2012.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
1 Resolving Collision Although collisions should be avoided as much as possible, they are inevitable Need a strategy for resolving collisions. We look.
CSC 212 – Data Structures Lecture 28: More Hash and Dictionaries.
Hashing (part 2) CSE 2011 Winter March 2018.
Hashing Alexandra Stefan.
Week 8 - Wednesday CS221.
Hash Table.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hash Tables Computer Science and Engineering
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Chapter 13 Hashing © 2011 Pearson Addison-Wesley. All rights reserved.
Presentation transcript:

ICOM 4035 – Data Structures Lecture 12 – Hashtables & Map ADT Manuel Rodriguez Martinez Electrical and Computer Engineering University of Puerto Rico, Mayagüez ©Manuel Rodriguez – All rights reserved

Lecture Organization Part I – Introduction to the hash table and its use for implementing a Map Part II – Design and implementation of a hash table using separate chaining Part III – Design and implementation of a hash table using open addressing M. Rodriguez-MartinezICOM

Objectives Introduce the concept of a hash table – Mechanism to implement the Map ADT Discuss the uses of the hash table Understand the design and implementation of a hash tables using – Separate chaining – Open addressing Provide motivating examples ICOM 4035M. Rodriguez-Martinez 3

Companion videos Lecture12 videos – Contains the coding process associated with this lecture – Shows how to build the interfaces, concrete classes, and factory classes mentioned here M. Rodriguez-MartinezICOM

Part I Introduction to the hash table and its use for implementing a Map M. Rodriguez-MartinezICOM

Map ADT Map : – collection of items that can be retrieved through a key Stores key-value pairs : (key, value) Key represents a unique identifier for each item Each item has its key – repetitions are not allowed – Similar to mathematical functions M. Rodriguez-MartinezICOM Folder with label Books with id tags Car with license plate

Map: Keys and Values Keys: – Attributes that uniquely identify values stored Values: – Data to be stored in the map Simple values Complex objects M. Rodriguez-MartinezICOM Jil 24 NY Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map

Accessing data in Map M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be fetched based on key Apu 41 NY key M.get(Apu)

Adding data to Map M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be added and marked with key Moe 32 SJ key put (Moe, {Moe, 32, SJ})

Implementing the Map Linked List implementation – Viable but very inefficient Most operations are O(n), n = map size – Same thing happens to ArrayList (think about it) Can we do better? – Answer: Hash table M. Rodriguez-MartinezICOM Jil 24 NY Bob 21 LA Li 19 SF Apu 41 NY Amy 18 SF

Key idea: Hashing Consider get operation : M.get(“Apu”) In List implementation – We must inspect the list to find it (this is O(n)) Suppose that “someone” can tell us the exact position of a value V with given key K – Element with key Apu is at position 2 With linked list, operation is O(n), n = map size But with ArrayList is O(1)! Hashing: method to map a key to position! M. Rodriguez-MartinezICOM

Hashing: Mapping key to array entries Hash function h(k) maps keys to array entries – Entries are called “buckets” – h(k) is a function from set of keys to range [0, N-1], where N is length of array (typically a big prime) M. Rodriguez-MartinezICOM Set of Keys Table (array) h(key)

Hash function: Keys & Hash Code Most frequently the key will be – Integer – String The hash function maps the key to an integer – Called the hash code Not necessarily in the range of array – From hash code, we go to position in bucket array Usually: hashCode % N, where N is array length Hash function: h(key) = hashCode(key) % N M. Rodriguez-MartinezICOM

Computing hash codes Integer i : (i >> 32) + i – Take number i, shift it 32 bits right and add i Strings s : (e.g., “Apu”) – Simple: int hashCode = 0; for (int i=0; i < s.length();++i){ hashCode += s.charAt(i); } return hashCode; – Add all characters in the string M. Rodriguez-MartinezICOM

Computing hash codes (2) String s: – Polynomial : int hashCode = 0; int k = 37; // a prime! int p = s.length() – 1; for (int i=0; i < s.length(); ++i){ hashCode += s.charAt(p-i) * Math.pow(k,i); } return hashCode; String s: – Cyclic: int hashCode = 0; for (int i=0; < s.length(); ++i){ hashCode = (hashCode > 27); hashCode += (int) s.charAt(i); } return hashCode; M. Rodriguez-MartinezICOM

Map Operations with hash table: get M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be fetched based on key Apu 41 NY key get(Apu)

Map Operations with hash table: get M. Rodriguez-MartinezICOM Apu 41 NY Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF h(“Apu”) = 2 Element is found at bucket 2 Cost is O(1)

Collisions M. Rodriguez-MartinezICOM Moe 23 SF Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF h(“Moe”) = 2 Collision happens when two different keys k1 and k2 map to the same bucket Different implementations handle collisions differently Using big prime as N helps prevent Clustering – many values hash to same bucket

Load Factor Load factor λ : – n = size of table – N = number of buckets – percentage of space already occupied When, λ = 1, collisions are for sure – As λ gets close to 1 number of collisions increases Reallocation and rehashing – From experience, when λ gets close to 70% table is expanded and all values are rehashed Prevent collisions M. Rodriguez-MartinezICOM

Complexity of operations On average, operations on hash table are O(1) Worst case behavior is O(n), but the most frequent case is O(1) As collisions increases, O(n) cost becomes more frequent Big effort is to prevent collisions M. Rodriguez-MartinezICOM

Part II Design and implementation of a hash table using separate chaining M. Rodriguez-MartinezICOM

Separate Chaining Scheme Each bucket is a linked List Collision are managed by adding stuff to list Key to the scheme: – List length should be n/N, where n = table size, N = # of buckets M. Rodriguez-MartinezICOM Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF Moe 23 SF

Complexity of Operations Each bucket has average size of n/N – Hash function must spread keys uniformly through out the table Operations involve: – Hashing key (O(1)) – Searching/inserting/deleting in bucket O(n/N) Cost on average: O(1 + n/N) – If N is big compared to n, then n/N tends to 0 and cost is O(1) M. Rodriguez-MartinezICOM

Operations in the Map size() – number of elements isEmpty() – determine if the map is empty get(key) – returns the value associated with a key put(key, value) – adds a new value to the map with a given key (overwrites old one) remove(key) – removes the value with the given key from map makeEmpty() – removes all values from map containsKey(key) – determines if there is a value with the given key getKeys() – returns a list with all keys in the map getValue() – returns a list with all values in the map M. Rodriguez-MartinezICOM

Map Operations: get M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be fetched based on key Apu 41 NY key get(Apu)

Map Operations: get(2) M.get(“Jil”) Use hash function to hash Jil bucket 1 Search for Jil in that bucket Complexity: – O(1) on average – O(n) worst case M. Rodriguez-MartinezICOM Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF

Map operations: put M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be added and marked with key Moe 32 SJ key M.put (Moe, {Moe, 32, SJ})

Map Operations: put(2) M.put(“Moe”, {Moe, 23, SF}) Use hash function to hash Moe to bucket 2 Insert the record in that bucket Complexity: – O(1) on average M. Rodriguez-MartinezICOM Jil 24 NY Moe 23 SF Amy 18 SF Bob 21 LA Li 19 SF Apu 41 NY

Map Operations: remove M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be removed based on key Apu 41 NY key remove (Apu)

Map Operations: remove(2) M.remove(“Jil”) Use hash function to hash Jil bucket 1 Search for Jil in that bucket and erase it Complexity: – O(1) on average – O(n) worst case M. Rodriguez-MartinezICOM Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF

Easy Operations containsKey(key) – return this.get(key) != null – Complexity: same as get(key) getKeys() – Create a new list and add the key of every value – Complexity: O(n), n = M.size() (see all elements) getValues() – Create a new list and add every value to it – Complexity: O(n), n = M.size() (see all elements) M. Rodriguez-MartinezICOM

Easy operations (2) size() – Return the size of list – Complexity: O(1) isEmpty() – Return size() == 0 – Complexity: O(1) makeEmpty() – Call clear() on the list – Complexity: O(n), n = M.size() M. Rodriguez-MartinezICOM

Part III Design and implementation of a hash table using open addressing M. Rodriguez-MartinezICOM

Open Addressing Scheme Each bucket stores a value and in-use Boolean flag Collisions are managed by finding another empty bucket – Called Probing Key to the scheme: – Probing should find alternative quickly – Table should be big vs. expected number of elements M. Rodriguez-MartinezICOM Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF

Probing schemes Hash function : h(key) = hashCode(key)%N = i Linear probing: – Try positions: i, (i + 1) % N, (i+2) % N, (i+3) % N, … – Tends to cluster values nearby Quadratic probing – Try positions: i, (i ) % N, (i+2 2 ) % N, (i+3 2 ) % N, … – Spreads the values around table Double hashing – Use a second hash function to break the collision H 2 (key) = q – (hashCode(key) %q), q is a prime, q < N – Spreads the values around table M. Rodriguez-MartinezICOM

Complexity of Operations Each bucket has room for 1 element – Hash function must spread keys uniformly through out the table Probing: worst case is O(n) Operations involve: – Hashing key (O(1)) – Reading/inserting/deleting in bucket Average O(1) (no collision) – use big table + do not use linear probing Worst case O(n) (collision) M. Rodriguez-MartinezICOM

Map Operations: get M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be fetched based on key Apu 41 NY key get(Apu)

Map Operations: get(2) M.get(“Jil”) Use hash function to hash Jil bucket 1 Search for Jil in the that bucket Complexity: – O(1) on average – O(n) worst case If collision forces search with probing M. Rodriguez-MartinezICOM Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF

Map operations: put M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be added and marked with key Moe 32 SJ key M.put (Moe, {Moe, 32, SJ})

Map Operations: put(2) M.put(“Jil”, {Jil, 24, NY}) Use hash function to hash Jil to bucket 1 Store value in buck and marked in use Complexity: – O(1) on average – O(n) worst case If collision happens M. Rodriguez-MartinezICOM Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF

Map Operations: put(3) M.put(“Moe”, {Moe, 23, SF}) Use hash function to hash Moe to bucket 2 If collision happens probe until empty bucket is found Complexity: – O(1) on average – O(n) worst case M. Rodriguez-MartinezICOM Moe 32 SJ Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF

Map Operations: remove M. Rodriguez-MartinezICOM Jil 24 NY Name Age City Student records Bob 21 LA Apu 41 NY Amy 18 SF Li 19 SF Student Map Records can be removed based on key Apu 41 NY key remove (Apu)

Map Operations: remove(2) M.remove(“Apu”) Use hash function to hash Apu to bucket 2 Mark bucket as not in- use Complexity: – O(1) on average – O(n) worst case If collision happens M. Rodriguez-MartinezICOM Jil 24 NY Apu 41 NY Amy 18 SF Bob 21 LA Li 19 SF

Easy Operations containsKey(key) – return this.get(key) != null – Complexity: same as get(key) getKeys() – Create a new list and add the key of every value – Complexity: O(n), n = M.size() (see all elements) getValues() – Create a new list and add every value to it – Complexity: O(n), n = M.size() (see all elements) M. Rodriguez-MartinezICOM

Easy operations (2) size() – Return the size of list – Complexity: O(1) isEmpty() – Return size() == 0 – Complexity: O(1) makeEmpty() – Make all buckets free – Complexity: O(n), n = M.size() M. Rodriguez-MartinezICOM

Summary Introduced the concept of a hash table Discussed the concepts of – Collision – Load factor – Probing Presented implementations of hash tables that handle collisions differently: – Separate chaining – Linear hashing M. Rodriguez-MartinezICOM

Questions? – ICOM 4035M. Rodriguez-Martinez 47