Programming Abstractions

Slides:



Advertisements
Similar presentations
CSE 1302 Lecture 23 Hashing and Hash Tables Richard Gesick.
Advertisements

Hashing as a Dictionary Implementation
CS 171: Introduction to Computer Science II Hashing and Priority Queues.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
CS2420: Lecture 33 Vladimir Kulyukin Computer Science Department Utah State University.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
COSC 2007 Data Structures II
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
CS106X – Programming Abstractions in C++ Cynthia Bailey Lee CS2 in C++ Peer Instruction Materials by Cynthia Bailey Lee is licensed under a Creative Commons.
Hash Table March COP 3502, UCF.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
CS106X – Programming Abstractions in C++ Cynthia Bailey Lee CS2 in C++ Peer Instruction Materials by Cynthia Bailey Lee is licensed under a Creative Commons.
Big Java Chapter 16.
CS106X – Programming Abstractions in C++ Cynthia Bailey Lee CS2 in C++ Peer Instruction Materials by Cynthia Bailey Lee is licensed under a Creative Commons.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
1 Hash table. 2 A basic problem We have to store some records and perform the following:  add new record  delete record  search a record by key Find.
CSE 501N Fall ‘09 11: Data Structures: Stacks, Queues, and Maps Nick Leidenfrost October 6, 2009.
CSE 12 – Basic Data Structures Cynthia Bailey Lee Some slides and figures adapted from Paul Kube’s CSE 12 CS2 in Java Peer Instruction Materials by Cynthia.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing as a Dictionary Implementation Chapter 19.
1 Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Programming Abstractions Cynthia Lee CS106X. Topics:  Finish up heap data structure implementation › Enqueue (“bubble up”) › Dequeue (“trickle down”)
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Java Methods A & AB Object-Oriented Programming and Data Structures Maria Litvin ● Gary Litvin Copyright © 2006 by Maria Litvin, Gary Litvin, and Skylight.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
CS203 Lecture 14. Hashing An object may contain an arbitrary amount of data, and searching a data structure that contains many large objects is expensive.
Sets and Maps Chapter 9.
Chapter 27 Hashing Jung Soo (Sue) Lim Cal State LA.
Programming Abstractions
Programming Abstractions
COMP 53 – Week Eleven Hashtables.
Hash table CSC317 We have elements with key and satellite data
CSCI 210 Data Structures and Algorithms
Programming Abstractions
Hashing CSE 2011 Winter July 2018.
Data Abstraction & Problem Solving with C++
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Programming Abstractions
Programming Abstractions
Programming Abstractions
Hashing Exercises.
Introduction to Hashing - Hash Functions
Efficiency add remove find unsorted array O(1) O(n) sorted array
Searching.
Hash Tables.
Hash Tables Part II: Using Buckets
Stack Lesson xx   This module shows you the basic elements of a type of linked list called a stack.
Hash table another data structure for implementing a map or a set
Hash Table.
CMSC 341 Hashing (Continued)
Chapter 28 Hashing.
Chapter 21 Hashing: Implementing Dictionaries and Sets
CSE 373: Data Structures and Algorithms
Programming Abstractions
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
CS202 - Fundamental Structures of Computer Science II
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Hashing.
Sets and Maps Chapter 9.
CSE 12 – Basic Data Structures
Hash Tables Buckets/Chaining
File Organization.
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Presentation transcript:

Programming Abstractions CS106B Cynthia Lee

Topics Overview Recently: Priority Queue implementations: Linked list (sorted, unsorted) Heap Map interface implementation: Binary Search Tree (BST) Another kind of tree: Huffman coding trees (used for compression of files) Today: Hashing An alternative Map implementation Has pros and cons relative to our other Map implementation, BST

Hashing Implementing the Map interface (or Stanford HashMap class) with Hashing/Hash Tables PART 1: Intuition behind the invention of the hash table

Imagine you want to look up your neighbors’ names, based on their house number House numbers: 10565 through 90600 (roughly 1000 houses—there are varying gaps in house numbers between houses) All the houses are on the same street, so we only need to lookup by house number Names: string containing the name(s) living there We will consider two data structure options: linked list, and array of strings Image dedicated to public domain under Creative Commons license: http://commons.wikimedia.org/wiki/File:Salsbury_Row_House.jpg

Option #1: Linked list … Linked list: 10565, “Kyung Suk and Yong Han Lee” 10567, “Isaiah White” 90600, “Josie Spencer and Solange Clark” head … Linked list: Struct has 3 fields: next pointer, int house number, and string name(s) Sort them by house number? (compared sorted/unsorted) Add/remove: O(n) Find: O(n)

Option #2: Array of strings Index (house number) String value (name) “” 1 … 10565 “Yong Han and Kyung Suk Lee” 10566 10567 “Isaiah White” 90598 90599 90600 “Josie Spencer and Solange Clark” Array of strings: string* addressBook = new string[90601]; Index is house number, string is name The first part of the array will be empty since addresses start at 10565 Empty string for any number that is not currently a valid address

Array of Strings Array of strings: Array of strings: Index (house number) String value (name) “” 1 … 10565 “Yong Han and Kyung Suk Lee” 10566 10567 “Isaiah White” 90598 90599 90600 “Josie Spencer and Solange Clark” Array of strings:

Array of Strings Array of strings: Array of strings: O(1), O(1) Index (house number) String value (name) “” 1 … 10565 “Yong Han and Kyung Suk Lee” 10566 10567 “Isaiah White” 90598 90599 90600 “Josie Spencer and Solange Clark” Array of strings: Add/remove: ____ Ex.: if somebody moves into the vacant house at 90598, how long would it take to update? Find: ____ Ex.: you want to find the name of the resident at 12475, if any O(1), O(1) O(logn), O(logn) O(n), O(n) Other/none/combination

Array of Strings Array of strings: Index (house number) String value (name) “” 1 … 10565 “Yong Han and Kyung Suk Lee” 10566 10567 “Isaiah White” 90598 90599 90600 “Josie Spencer and Solange Clark” Wow, excellent performance on both!! Only way to do better than O(1) is a time machine that can go back in time and make it take zero/negative time! Everything is awesome (?) Discuss: Can you identify 1-2 specific areas of waste in this approach? Bonus: can you think of a simple fix for at least one of the areas of waste?

One quick fix: Array of strings: Index (house number) String value (name) “” 1 … 10565 “Yong Han and Kyung Suk Lee” 10566 10567 “Isaiah White” 90598 90599 90600 “Josie Spencer and Solange Clark” /* When accessing the array, use array[hash(houseNum)] * rather than array[houseNum]. The function hash is * just a way to adjust houseNum for efficiency. */ int hash(int houseNumber){ return houseNumber-10565; } This solves the problem of the enormous gap from 0 to 10565 So our array size could be ~80,000 entries instead of 90,600 Doesn’t solve the problem of gaps between houses How could we do that? A tricky problem… Also, this approach only works for keys of type int

Hashing Implementing the Map interface (Stanford HashMap class) with Hashing/Hash Tables PART 2: Getting the MAGICAL performance of our simple house numbers example on any key type, and with less waste

Hash Table is just a modified, more flexible array Keys don’t have to be integers in the range [0-(size-1)] They don’t even have to be integers at all! (Ideally) avoids big gaps like we had with house numbers array Replicates the MAGICAL performance of our array of strings on ANY key/value!! THANK YOU, HASH FUNCTION!! ♥ ♥ ♥ Not necessarily int int in range [0-(size-1)]

hash() function This is where the MAGIC happens! These are typically mathematically sophisticated functions They do their best to ensure a nice uniform distribution of elements across the available array (hash table) They use tricks like modulus (remainder) and prime numbers to do this A lot of art & science, beyond the scope of this class Fun times!

Hashing Implementing the Map interface (Stanford HashMap class) with Hashing/Hash Tables

Hash table inserts Array index Hashed data 1 2 (A) "Annie", 3 3 (B) "Annie", 3 4 (C) "Annie", 3 5 (D) "Annie", 3 6 7 8 Let’s pretend we have a profoundly not-mathematically-sophisticated hash function: int hash(string key) { return key.length(); } Where does key="Annie" value=3 go? HashMap<string,int> mymap; mymap["Annie"] = 3; See choices in table at right, or: (E) Some other place

Hash table inserts Array index Hashed data 1 2 3 4 5 "Annie", 3 6 7 8 Let’s pretend we have a profoundly not-mathematically-sophisticated hash function: int hash(string key) { return key.length(); } Where does key="Michael", value=5 go? mymap["Michael"] = 5;

Hash table inserts Array index Hashed data 1 2 3 4 5 "Annie", 3 6 7 "Michael", 5 8 Let’s pretend we have a profoundly not-mathematically-sophisticated hash function: int hash(string key) { return key.length(); } Where does key="Michael", value=5 go? mymap["Michael"] = 5;

Hash table inserts Array index Hashed data 1 2 3 4 5 (A) "Annie", 3 7 6 7 (B) "Michael", 5 "Annie", 7 8 Let’s pretend we have a profoundly not-mathematically-sophisticated hash function: int hash(string key) { return key.length(); } Now insert key="Annie", value=7 mymap["Annie"] = 7; See choices in table at right, or: (C) Index 5 should store both "Annie",3 and "Annie",7

Hash table inserts Array index Hashed data 1 2 3 4 5 (A) "Annie", 7 "Maria", 8 6 (B) "Maria", 8 7 "Michael", 5 8 (C) "Maria", 8 Let’s pretend we have a profoundly not-mathematically-sophisticated hash function: int hash(string key) { return key.length(); } Now insert key="Maria", value=8 mymap["Maria"] = 8; See choices in table at right, or: (D) Index 5 should store both "Annie",7 and "Maria",8

Uh-oh! Hash collisions We can NOT overwrite the value the way we would if it really were the same key Can you imagine how you would feel if you used Stanford library HashMap like this and it printed 8?! mymap["Annie"] = 3; mymap["Annie"] = 7; cout << mymap["Annie"] << endl; //expect 7, not 3 mymap["Maria"] = 8; cout << mymap["Annie"] << endl; //expect 7, not 8!!!

Uh-oh! Hash collisions We may need to worry about hash collisions Two keys a, b, a≠b, have the same hash code index (i.e. hash(a) == hash(b)) Need a way of storing multiple values in a given “place” in the hash table, so all user’s data is preserved

Uh-oh! Hash collisions Array index Hashed data 1 2 3 4 5 (A) "Annie", 7 "Maria", 8 6 (B) "Maria", 8 7 "Michael", 5 8 (C) "Maria", 8 There are two main strategies for resolving this: Put the item in the next bin (as in the (B) choice from our previous slide)—this is called “open addressing” Make each bin be the head of a linked list, and elements can chain off each other as long as needed—this is called “closed addressing”

Map Interface: hash-map.h … private: struct Cell { KeyType key; ValueType value; Cell* next; }; /* Instance variables */ Vector<Cell*> buckets; int nBuckets; int numEntries; int hash(const Key& key) const;

HashMap NULL … private: struct Cell { KeyType key; ValueType value; Cell* next; }; /* Instance variables */ Vector<Cell*> buckets; int nBuckets; int numEntries; int hash(const Key& key) const; // Q: Can you draw the HashMap // object in this memory diagram, // including filling in values for // all fields? 1 2 3 4 5 6 7

Hash key collisions & Big-O of HashMap If there are no collisions, find/add/remove are all O(1)—just compute the key and go! Two factors for ruining this magical land of instantaneous lookup: Too-small table (worst case = 1) Hash function doesn’t produce a good spread int awfulHashFunction(string input) { return 4; } Find/add/remove all O(n) worst case // h/t http://xkcd.com/221/