Programming Abstractions


Programming Abstractions CS106X Cynthia Lee

Topics Overview
Recently:
  Priority Queue implementations: linked list, heap
  Map interface implementation: Binary Search Tree (BST)
  While we’re doing trees, another kind of tree: Huffman coding trees (used for compression of files)
Today:
  Hashing: another Map interface implementation
    Background: lookup tables
    Details: hashing

Hashing Implementing the Map interface with Hashing/Hash Tables PART 1: Intuition behind the invention of the hash table

Imagine you want to look up your neighbors’ names, based on their house number
House numbers: 10565 through 90600 (roughly 1000 houses; there are varying gaps in house numbers between houses)
All the houses are on the same street, so we only need to look up by house number
Names: string containing the name(s) of the people living there
We will consider two data structure options: linked list, and array of strings
Image dedicated to public domain under Creative Commons license: http://commons.wikimedia.org/wiki/File:Salsbury_Row_House.jpg

Option #1: Linked list
head → (10565, “Kyung Suk and Yong Han Lee”) → (10567, “Isaiah White”) → … → (90600, “Josie Spencer and Solange Clark”)
Linked list: struct has 3 fields: next pointer, int data (house number), and string data (name)
Sort them by house number
Add/remove: O(n)
Find: O(n)
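To make the O(n) find concrete, here is a minimal sketch of a lookup over such a list. The names HouseNode and findNames are hypothetical (they are not from the slides), and the early exit assumes the list is kept sorted by house number as described.

```cpp
#include <string>

// Hypothetical node type mirroring the three fields named on the slide.
struct HouseNode {
    int houseNumber;
    std::string names;
    HouseNode* next;
};

// Walks the list from head until the house number matches: O(n) in the
// number of houses. Returns "" when the house is not in the list.
std::string findNames(const HouseNode* head, int houseNumber) {
    for (const HouseNode* cur = head; cur != nullptr; cur = cur->next) {
        if (cur->houseNumber == houseNumber) {
            return cur->names;
        }
        if (cur->houseNumber > houseNumber) {
            break;  // list is sorted, so we have passed where it would be
        }
    }
    return "";
}
```

Even with the early exit, a lookup near the end of the street still visits almost every node, which is why find stays O(n).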

Option #2: Array of strings
Array of strings: index is house number, string is name

Index (house number)    String value (name)
0                       “”
1
…
10565                   “Yong Han and Kyung Suk Lee”
10566
10567                   “Isaiah White”
…
90598
90599
90600                   “Josie Spencer and Solange Clark”

Add/remove:
Find:


Array of Strings
Array of strings: index is house number, string is name(s) (same table as in Option #2)
Add/remove: ____ Ex.: if somebody moves into the vacant house at 90598, how long would it take to update?
Find: ____ Ex.: you want to find the name of the resident at 12475, if any
(A) O(1), O(1)
(B) O(log n), O(log n)
(C) O(n), O(n)
(D) Other/none/combination
Answer: O(1), O(1)

Array of Strings
Array of strings: index is house number, string is name(s) (same table as in Option #2)
Wow, excellent performance on both!! The only way to do better than O(1) is a time machine that can go back in time and make it take zero/negative time!
Everything is awesome (?)
Discuss: Can you identify 1-2 specific areas of waste in this approach?
Bonus: Can you think of a simple fix for at least one of the areas of waste?
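A minimal sketch of the array-of-strings option, using std::vector from the standard library rather than any course library. The names kMaxHouseNumber, makeStreet, moveIn and whoLivesAt are hypothetical; an empty string stands for "nobody lives here".

```cpp
#include <string>
#include <vector>

// Assumption: the highest house number on the street, from the example.
const int kMaxHouseNumber = 90600;

// One slot per possible index 0..90600, all initially "".
std::vector<std::string> makeStreet() {
    return std::vector<std::string>(kMaxHouseNumber + 1);
}

// Add/update is a single indexed write: O(1).
void moveIn(std::vector<std::string>& street, int houseNumber,
            const std::string& names) {
    street[houseNumber] = names;
}

// Find is a single indexed read: O(1).
const std::string& whoLivesAt(const std::vector<std::string>& street,
                              int houseNumber) {
    return street[houseNumber];
}
```

Both operations go straight to one slot without searching. Note that this sketch allocates 90,601 slots to describe roughly 1000 houses.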

One quick fix:
/* When accessing the array, use array[hash(houseNum)]
 * rather than array[houseNum] */
int hash(int houseNumber) {
    return houseNumber - 10565;
}
This solves the problem of the enormous gap from 0 to 10565
So our array size could be ~80,000 entries instead of ~90,600
Doesn’t solve the problem of gaps between houses
How could we do that? A tricky problem…
This approach only works for keys of type int
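Here is how the offset trick might look when every array access goes through the hash function. This is only a sketch: the function is renamed houseHash here to avoid clashing with std::hash, and kFirstHouse, kLastHouse, makeCompactStreet, moveIn and whoLivesAt are hypothetical names.

```cpp
#include <string>
#include <vector>

const int kFirstHouse = 10565;  // lowest house number in the example
const int kLastHouse = 90600;   // highest house number in the example

// Same idea as the slide's hash(): shift house numbers so the first
// house lands at index 0.
int houseHash(int houseNumber) {
    return houseNumber - kFirstHouse;
}

// The array now needs only houseHash(kLastHouse) + 1 slots.
std::vector<std::string> makeCompactStreet() {
    return std::vector<std::string>(houseHash(kLastHouse) + 1);
}

void moveIn(std::vector<std::string>& street, int houseNumber,
            const std::string& names) {
    street[houseHash(houseNumber)] = names;
}

const std::string& whoLivesAt(const std::vector<std::string>& street,
                              int houseNumber) {
    return street[houseHash(houseNumber)];
}
```

The array shrinks to 80,036 slots (90600 - 10565 + 1), which removes the unused range below 10565 but still leaves a slot for every missing house number in between.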

Hashing Implementing the Map interface with Hashing/Hash Tables PART 2: Getting the MAGICAL performance of our simple house numbers example on any key type, and with less waste

Hash Table is just a modified, more flexible array
Keys don’t have to be integers in the range [0-(size-1)]
They don’t even have to be integers at all!
(Ideally) avoids big gaps like we had with house numbers array
Replicates the MAGICAL performance of our array of strings on ANY key/value!!
THANK YOU, HASH FUNCTION!! ♥ ♥ ♥
Diagram: key (not necessarily int) → hash() → int in range [0-(size-1)]

hash() function
This is where the MAGIC happens!
These are typically mathematically sophisticated functions
They do their best to ensure a nice uniform distribution of elements across the available array (hash table)
They use tricks like modulus (remainder) and prime numbers to do this
A lot of art & science, beyond the scope of this class
Fun times!
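To give a flavor of the modulus and prime number tricks, here is one simple, well-known style of string hash (multiply by a small prime and add each character, then take the remainder). It is an illustration only, not the hash function used by any particular library; simpleStringHash and the multiplier 31 are choices made for this sketch, and numBuckets is assumed to be positive.

```cpp
#include <string>

// Illustrative string hash: fold each character into a running total using
// a small prime multiplier, then use modulus to land inside the table.
// Assumes numBuckets > 0.
int simpleStringHash(const std::string& key, int numBuckets) {
    unsigned long long h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned arithmetic wraps on overflow, which is fine here
    }
    return static_cast<int>(h % static_cast<unsigned long long>(numBuckets));
}
```

Unlike a hash based only on length, this one depends on every character and on character order, so keys such as "Annie" and "Maria" are much less likely to share an index.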

Hashing Implementing the Map interface with Hashing/Hash Tables

Hash table inserts
Let’s pretend we have a profoundly not-mathematically-sophisticated hash function:
int hash(string key) { return key.length(); }
Where does key="Annie", value=3 go?
HashMap<string,int> mymap;
mymap["Annie"] = 3;
See choices in table, or: (E) Some other place

Array index    Hashed data
0
1
2              (A) "Annie", 3
3              (B) "Annie", 3
4              (C) "Annie", 3
5              (D) "Annie", 3
6
7
8

Hash table inserts
Let’s pretend we have a profoundly not-mathematically-sophisticated hash function:
int hash(string key) { return key.length(); }
Where does key="Michael", value=5 go?
mymap["Michael"] = 5;
"Michael" has length 7, so it goes at index 7:

Array index    Hashed data
0
1
2
3
4
5              "Annie", 3
6
7              "Michael", 5
8

Hash table inserts
Let’s pretend we have a profoundly not-mathematically-sophisticated hash function:
int hash(string key) { return key.length(); }
Now insert key="Annie", value=7
mymap["Annie"] = 7;
See choices in table, or: (C) Index 5 should store both "Annie",3 and "Annie",7

Array index    Hashed data
0
1
2
3
4
5              (A) "Annie", 7 (replacing "Annie", 3)
6
7              "Michael", 5    (B) "Annie", 7
8

Hash table inserts
Let’s pretend we have a profoundly not-mathematically-sophisticated hash function:
int hash(string key) { return key.length(); }
Now insert key="Maria", value=8
mymap["Maria"] = 8;
See choices in table, or: (D) Index 5 should store both "Annie",7 and "Maria",8

Array index    Hashed data
0
1
2
3
4
5              "Annie", 7    (A) "Maria", 8
6              (B) "Maria", 8
7              "Michael", 5
8              (C) "Maria", 8

Uh-oh! Hash collisions
We can NOT overwrite the value the way we would if it really were the same key
Can you imagine how you would feel if you used Stanford library HashMap like this and it printed 8?!
mymap["Annie"] = 3;
mymap["Annie"] = 7;
cout << mymap["Annie"] << endl; // expect 7, not 3
mymap["Maria"] = 8;
cout << mymap["Annie"] << endl; // expect 7, not 8!!!

Uh-oh! Hash collisions
We may need to worry about hash collisions
A hash collision: two keys a, b, a≠b, have the same hash code index (i.e. hash(a) == hash(b))
Need a way of storing multiple values in a given “place” in the hash table, so all of the user’s data is preserved
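The collision condition can be written directly as code. This sketch reuses the length-based hash from the insert slides, renamed lengthHash here; collide is a hypothetical helper name.

```cpp
#include <string>

// The deliberately unsophisticated hash from the insert slides.
int lengthHash(const std::string& key) {
    return static_cast<int>(key.length());
}

// Two keys collide when they are different keys (a != b) that map to the
// same index (hash(a) == hash(b)).
bool collide(const std::string& a, const std::string& b) {
    return a != b && lengthHash(a) == lengthHash(b);
}
```

With this hash, "Annie" and "Maria" collide (both have length 5), while inserting "Annie" twice is not a collision: it is the same key, so overwriting its value is the right behavior.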

Uh-oh! Hash collisions
There are two main strategies for resolving this:
1. Put the item in the next bin (as in the (B) choice from our earlier slide, where "Maria", 8 goes at index 6): this is called “open addressing”
2. Make each bin be the head of a linked list, and elements can chain off each other as long as needed: this is called “closed addressing” (also known as separate chaining)
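A minimal sketch of the first strategy, in the "next bin" form known as linear probing, again with the length-based hash so the result matches the (B) choice. Slot, probeInsert and probeFind are hypothetical names; the sketch assumes a fixed-size table and does not handle removal or resizing.

```cpp
#include <string>
#include <vector>

// One bin of the table; used == false means the bin is empty.
struct Slot {
    bool used = false;
    std::string key;
    int value = 0;
};

// Length-based hash from the slides, wrapped into the table's index range.
int lengthHash(const std::string& key, int numSlots) {
    return static_cast<int>(key.length() % static_cast<std::size_t>(numSlots));
}

// Open addressing: start at the hashed bin and step to the next bin until we
// find the same key (overwrite) or an empty bin. Returns false if full.
bool probeInsert(std::vector<Slot>& table, const std::string& key, int value) {
    int n = static_cast<int>(table.size());
    if (n == 0) return false;
    int start = lengthHash(key, n);
    for (int step = 0; step < n; ++step) {
        Slot& s = table[(start + step) % n];
        if (!s.used || s.key == key) {
            s.used = true;
            s.key = key;
            s.value = value;
            return true;
        }
    }
    return false;
}

// Follows the same probe sequence; returns the bin index, or -1 if absent.
int probeFind(const std::vector<Slot>& table, const std::string& key) {
    int n = static_cast<int>(table.size());
    if (n == 0) return -1;
    int start = lengthHash(key, n);
    for (int step = 0; step < n; ++step) {
        int i = (start + step) % n;
        if (!table[i].used) return -1;  // hit an empty bin: key is not here
        if (table[i].key == key) return i;
    }
    return -1;
}
```

With a 9-bin table, inserting "Annie", "Michael", then "Maria" leaves "Annie" at index 5 and pushes "Maria" to index 6, and find has to repeat the same walk to locate her.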

Map Interface: hash-map.h
…
private:
    struct node {
        Key key;
        Value value;
        node *next;
    };
    node **buckets;
    int numBuckets;
    int count;
    int hash(const Key& key) const;

HashMap
…
private:
    struct node {
        Key key;
        Value value;
        node *next;
    };
    node **buckets;
    int numBuckets;
    int count;
    int hash(const Key& key) const;
// Q: Can you draw the HashMap object in this memory diagram,
// including filling in values for all fields?
(Memory diagram not reproduced: a buckets array with numbered slots and NULL entries)
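One way the private fields above could be used for chaining is sketched below. This is not the actual hash-map.h: it is a simplified, non-template version fixed to string keys and int values, the class name ChainedMap and the methods put, get and size are hypothetical, and it borrows the length-based hash from the earlier slides. It assumes a positive bucket count and never resizes.

```cpp
#include <string>

class ChainedMap {
    // Same layout as the slide: each bucket is the head of a linked list.
    struct node {
        std::string key;
        int value;
        node* next;
    };
    node** buckets;
    int numBuckets;
    int count;

    int hash(const std::string& key) const {
        return static_cast<int>(key.length() % static_cast<std::size_t>(numBuckets));
    }

public:
    // new node*[n]() value-initializes every bucket to nullptr.
    explicit ChainedMap(int n = 8)
        : buckets(new node*[n]()), numBuckets(n), count(0) {}

    ~ChainedMap() {
        for (int i = 0; i < numBuckets; ++i) {
            node* cur = buckets[i];
            while (cur != nullptr) {
                node* next = cur->next;
                delete cur;
                cur = next;
            }
        }
        delete[] buckets;
    }
    ChainedMap(const ChainedMap&) = delete;
    ChainedMap& operator=(const ChainedMap&) = delete;

    // Overwrites the value if the key is already in the chain; otherwise
    // pushes a new node onto the front of the bucket's list.
    void put(const std::string& key, int value) {
        int b = hash(key);
        for (node* cur = buckets[b]; cur != nullptr; cur = cur->next) {
            if (cur->key == key) {
                cur->value = value;
                return;
            }
        }
        buckets[b] = new node{key, value, buckets[b]};
        ++count;
    }

    // Returns true and fills valueOut if the key is present.
    bool get(const std::string& key, int& valueOut) const {
        for (node* cur = buckets[hash(key)]; cur != nullptr; cur = cur->next) {
            if (cur->key == key) {
                valueOut = cur->value;
                return true;
            }
        }
        return false;
    }

    int size() const { return count; }
};
```

Here "Annie" and "Maria" both land in bucket 5 and simply share that bucket's list, so neither overwrites the other.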

Hash key collisions & Big-O of HashMap
If there are no collisions, find/add/remove are all O(1): just compute the hash and go!
Two factors for ruining this magical land of instantaneous lookup:
1. Too-small table (worst case: table size = 1)
2. Hash function doesn’t produce a good spread
   int awfulHashFunction(string input) { return 4; } // h/t http://xkcd.com/221/
Find/add/remove all O(n) worst case
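A small sketch that makes the O(n) worst case visible by counting how many keys each bucket receives. longestChain is a hypothetical helper; it assumes the keys are distinct, numBuckets is positive, and the hash function returns a non-negative int.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// The constant hash from the slide: every key lands in bucket 4.
int awfulHashFunction(const std::string&) {
    return 4;
}

// Counts how many keys fall into each bucket and returns the largest count,
// i.e. the longest chain a lookup might have to walk.
int longestChain(const std::vector<std::string>& keys, int numBuckets,
                 const std::function<int(const std::string&)>& hashFn) {
    std::vector<int> chainLength(numBuckets, 0);
    for (const std::string& key : keys) {
        ++chainLength[hashFn(key) % numBuckets];
    }
    return *std::max_element(chainLength.begin(), chainLength.end());
}
```

With awfulHashFunction the longest chain equals the number of keys, so every operation degrades into a linked-list walk. A table of size 1 has the same effect no matter how good the hash function is.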

The Birthday Problem
Commonly mentioned fact of probability theory:
In a group of n people, what are the odds that at least 2 people have the same birthday? (assume birthdays are uniformly distributed across 365 days, which is wrong in a few ways)
For n ≥ 23, it is more likely than not
For n ≥ 57, >99% chance
Moral of the story: hash tables almost certainly have at least one collision
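The two thresholds can be checked numerically. Under the uniform assumption, the chance that all n birthdays are distinct is (365/365) × (364/365) × … × ((365 - n + 1)/365), and the chance of at least one shared birthday is 1 minus that. The function name collisionProbability is hypothetical.

```cpp
// Probability that at least two of n people share a birthday, assuming
// birthdays are uniform over `days` equally likely days.
double collisionProbability(int n, int days = 365) {
    if (n > days) {
        return 1.0;  // pigeonhole: more people than days forces a match
    }
    double allDistinct = 1.0;
    for (int i = 0; i < n; ++i) {
        allDistinct *= static_cast<double>(days - i) / days;
    }
    return 1.0 - allDistinct;
}
```

Read "people" as keys and "days" as buckets: even a perfectly uniform hash function produces collisions long before the table is anywhere near full.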