SETS, HASH TABLES, AND DICTIONARIES
CS16: Introduction to Data Structures & Algorithms
Tuesday February 10, 2015



Outline
1. Set ADT
2. Dictionary ADT
3. Hash Tables
4. Example: JUMBLE

Set
A set is a collection of distinct elements (no repeats).
Unlike a list or an array, a set doesn't maintain any particular order of its elements.

Set ADT
add(obj): adds an element to the set, if it is not there already
remove(obj): removes an element from the set, if it is there
boolean contains(obj): checks whether an object is in the set
int size(): returns the number of elements in the set
boolean isEmpty(): checks whether the set is empty
list enumerate(): returns a list of all the elements in some arbitrary order

Simple Set Implementation
We could use an (expandable) array:
add: add to end of array, O(1)
contains: step through array, O(n)
remove: find, then compress, O(n)
Can we do any better? Hold that thought…
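To make the array-based set concrete, here is a minimal Python sketch. The class name `ArraySet` and the method name `elements` (standing in for `enumerate()`) are hypothetical choices, not from the slides. Note one assumption that differs from the slide: this `add` checks for duplicates first, which costs O(n); the slide's O(1) figure holds only if we append without checking.

```python
class ArraySet:
    """Set backed by a Python list, which acts as an expandable array."""

    def __init__(self):
        self._items = []

    def contains(self, obj):
        # Step through the array: O(n).
        for item in self._items:
            if item == obj:
                return True
        return False

    def add(self, obj):
        # The append itself is O(1) amortized, but the duplicate
        # check (an assumption of this sketch) makes add O(n).
        if not self.contains(obj):
            self._items.append(obj)

    def remove(self, obj):
        # Find the element, then close the gap it leaves: O(n).
        for i, item in enumerate(self._items):
            if item == obj:
                del self._items[i]
                return

    def size(self):
        return len(self._items)

    def is_empty(self):
        return len(self._items) == 0

    def elements(self):
        # Corresponds to enumerate() in the Set ADT; returns a copy.
        return list(self._items)
```

Every operation that has to locate an element pays O(n), which is the cost hashing is meant to avoid.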

Dictionary
A dictionary is used to store (key, value) pairs, where the key is used to look up its corresponding value.
Also known as a map.
Applications:
address book (name → address)
…a dictionary (word → definition)

Dictionary ADT
add(key, val): adds a (key, value) pair to the dictionary
V get(key): returns the value mapped to by the key
remove(key): removes the key and its corresponding value from the dictionary
int size(): returns the number of (key, value) pairs in the dictionary
boolean isEmpty(): checks whether the dictionary is empty
GOAL: Implement a dictionary so that all of these methods run in O(1) time!
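As a quick illustration of the ADT, the operations map directly onto Python's built-in `dict` (which CPython implements as a hash table). The names and addresses below are made up for the example.

```python
# Hypothetical address-book entries, for illustration only.
address_book = {}

address_book["Ada"] = "12 Example Street"     # add(key, val)
address_book["Grace"] = "34 Sample Avenue"    # add(key, val)

street = address_book["Ada"]                  # get(key)
del address_book["Grace"]                     # remove(key)

count = len(address_book)                     # size()
empty = len(address_book) == 0                # isEmpty()
```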

Hash Tables
A hash table is an implementation of a dictionary.
Hash tables are built using an array.
h(key) is a "hash function" that takes in a key and returns an index into the array, where the key's corresponding value will be stored.
However, it's possible multiple keys will "hash" to the same index. How can we store multiple values at a single index?
Let's make the array an array of "buckets", where each bucket is a list of the values whose keys hash to that index.
In fact, we'll store the (key, value) pair itself in the bucket, not just the value. Think about why this may be.
Note: it is important that h(key) runs in constant time!

Hash Tables (2)
    table = array of some size
    h = some hash function

    function add(key, val):      // O(1), as long as h() is constant
        index = h(key)
        table[index].append((key, val))

    function get(key):           // depends on the size of the bucket!
        index = h(key)
        for (k, v) in table[index]:
            if k == key:
                return v
        error("key not found")
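The pseudocode above can be sketched in runnable Python roughly as follows. The class name `ChainedHashTable`, the default size of 7, and the use of Python's built-in `hash()` as the hash function are all assumptions for illustration. This sketch also replaces the value when a key is added twice, and adds a `remove`, neither of which the slide's pseudocode spells out.

```python
class ChainedHashTable:
    """Dictionary built from an array of buckets (lists of pairs)."""

    def __init__(self, size=7):
        self._table = [[] for _ in range(size)]

    def _h(self, key):
        # Placeholder hash function (an assumption of this sketch).
        return hash(key) % len(self._table)

    def add(self, key, val):
        bucket = self._table[self._h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, val)   # key already present: replace
                return
        bucket.append((key, val))

    def get(self, key):
        # Cost depends on the size of the bucket.
        for k, v in self._table[self._h(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def remove(self, key):
        bucket = self._table[self._h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(key)
```

Storing the whole (key, value) pair in the bucket is what lets `get` tell apart two keys that landed at the same index.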

Hash Table Illustrated
[Figure: an array of "buckets" holding (key, val) pairs. Keys are Banner ID numbers; values are names (David Laidlaw, Leah Steinberg, Patrick Maiden, Sarah Parker, Marley Rafson, Luke Fiorante, Surbhi Madan). Hash function: h(key) = key % 7.]

Hash Functions
In the example on the last slide, the hash table had size 7, and the hash function used was: h(key) = key % 7
If we expect ~150 students to be stored in our hash table, then we're bound to have lots of collisions.
If we're lucky, the IDs will distribute themselves uniformly, so each bucket will contain about 150/7 ≈ 21 students.
But we'd still have to look through a list of length n/7 to find the right one, which is O(n).
How can we do better?
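A small simulation can make the bucket sizes tangible. This is only a sketch: the IDs are randomly generated 8-digit numbers (not real Banner IDs), and the seed is arbitrary, chosen just so the run is repeatable.

```python
import random

random.seed(16)  # arbitrary seed, only for repeatability

# 150 made-up, distinct 8-digit IDs.
ids = random.sample(range(10_000_000, 100_000_000), 150)

# Seven buckets, filled using h(key) = key % 7.
buckets = [[] for _ in range(7)]
for banner_id in ids:
    buckets[banner_id % 7].append(banner_id)

sizes = [len(bucket) for bucket in buckets]
print(sizes)  # seven bucket sizes that sum to 150
```

However evenly the IDs spread, at least one bucket must hold 22 or more entries, so lookups still scan a list whose length grows with n.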

Hash Functions (2)
Solution: bigger table!
We know Banner IDs have 8 digits. That means the largest possible ID is 99,999,999.
Let's make an array of size 100,000,000 and use the hash function: h(key) = key
Since every ID gets its own index in the array, we're guaranteed to have no collisions. All functions run in O(1)!
But if we only need to keep track of 150 students, then …% of our array goes to waste.
Besides, we might not even have enough memory for these kinds of shenanigans!

Hash Functions (3)
Solution: smaller bigger table!
Since we only expect to store ~150 students, let's only allocate the space we need.
Make an array of size 150, and use the hash function: h(key) = key % 150
This would be great if we were guaranteed that the IDs were randomly distributed.
But what if next year the registrar assigned new Banner IDs in multiples of 150? Now we're screwed: every ID would hash to the same bucket!
Since we can't count on our keys to be random, we'll just have to make our hash function random!
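The worst case described above is easy to reproduce. In this sketch the IDs are hypothetical: 150 consecutive multiples of 150, starting at 10,000,050 so that they have 8 digits.

```python
# Hypothetical IDs, all multiples of 150 (10_000_050 == 150 * 66_667).
ids = [10_000_050 + 150 * k for k in range(150)]

# 150 buckets, filled using h(key) = key % 150.
buckets = [[] for _ in range(150)]
for banner_id in ids:
    buckets[banner_id % 150].append(banner_id)

used = sum(1 for bucket in buckets if bucket)
print(len(buckets[0]), used)  # every ID lands in bucket 0
```

With all 150 entries in one bucket, the table degenerates into a single list and lookups are O(n) again.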

Universal Hashing
Magical universal hash function:
1. Pick a prime number greater than your expected capacity: 151. This is your array size.
2. Fix 4 random numbers between 0 and 150: $a_1, a_2, a_3, a_4$. These stay constant for the life of the hash table.
3. Break keys (Banner IDs) into 4 chunks: $x_1, x_2, x_3, x_4$ (e.g. B… → 00, 23, 89, …)
4. $h(\text{key}) = (a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4) \bmod 151$
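The four steps above can be sketched in Python as follows. The function name `make_universal_hash`, the `rng` parameter, and the choice to represent a Banner ID as an integer split into four two-digit chunks are assumptions for illustration; the example ID in the usage note is made up.

```python
import random

PRIME = 151  # table size: a prime above the expected capacity


def make_universal_hash(rng=random):
    """Pick the random coefficients once and return the hash function."""
    # Four coefficients, each drawn from 0..150.
    coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def h(banner_id):
        # Zero-pad to 8 digits, then split into four 2-digit chunks.
        digits = f"{banner_id:08d}"
        chunks = [int(digits[i:i + 2]) for i in range(0, 8, 2)]
        return sum(a * x for a, x in zip(coeffs, chunks)) % PRIME

    return h
```

Usage: `h = make_universal_hash()` fixes the coefficients for the life of the table, and `h(12345678)` then always returns the same index in the range 0 to 150.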


Universal Hashing Proof: Background
Remember fractions and their inverses? The inverse of 3/4 is 4/3, because $(3/4) \cdot (4/3) = 1$. Sometimes we write it like this: $(3/4)^{-1} = 4/3$.
Normally, integers don't have (multiplicative) inverses, because you can't multiply an integer i by another integer to get 1. (Unless i = ±1 … duh.)
But as soon as we enter modulo world, suddenly integers can have inverses too! Take the integers mod 7:
The inverse of 2 is 4, because $2 \cdot 4 = 8 \equiv 1 \pmod 7$
The inverse of 5 is 3, because $5 \cdot 3 = 15 \equiv 1 \pmod 7$
But does every integer always have an inverse under any modulus? What about the integers mod 4? Does 2 have an inverse?
$2 \cdot 0 = 0 \equiv 0 \pmod 4$
$2 \cdot 1 = 2 \equiv 2 \pmod 4$
$2 \cdot 2 = 4 \equiv 0 \pmod 4$
$2 \cdot 3 = 6 \equiv 2 \pmod 4$
Crap! No inverse!
Turns out, an integer i (mod n) has an inverse only if i and n are relatively prime, which means the only positive integer that evenly divides both of them is 1. In that case it definitely has an inverse, and that inverse is unique. Take Abstract Algebra to find out why! Woo!!
So if we're talking about the integers mod n, where n is a prime number, then every integer has an inverse, because they're all relatively prime to n! Oh, except for 0. Because 0 × anything is still 0.
Wow, we just talked about modular stuff AND prime numbers. Sounds like some serious foreshadowing!!!!!!
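The examples above can be checked with a brute-force search. This is a minimal sketch; the function name `mod_inverse` is a hypothetical helper, and it simply tries every candidate rather than using a faster method.

```python
def mod_inverse(i, n):
    """Return the inverse of i modulo n, or None if there is none."""
    for candidate in range(1, n):
        if (i * candidate) % n == 1:
            return candidate
    return None


print(mod_inverse(2, 7))  # 4, since 2 * 4 = 8 leaves remainder 1
print(mod_inverse(5, 7))  # 3, since 5 * 3 = 15 leaves remainder 1
print(mod_inverse(2, 4))  # None: 2 and 4 share the factor 2
```

In Python 3.8 and later, the built-in `pow(i, -1, n)` computes the same inverse and raises `ValueError` when none exists.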

Universal Hashing Proof
Now for the actual proof:
Let n be the prime size of our array.
Choose any 2 distinct Banner IDs, broken into their 4 chunks: $(x_1, x_2, x_3, x_4)$ and $(y_1, y_2, y_3, y_4)$.
Because the IDs are distinct, we know that they must differ in at least 1 chunk. Without loss of generality, we can assume that they differ in the last one. That is, $x_4 \neq y_4$.
Fix 4 random numbers for our hash function, h: $a_1, a_2, a_3, a_4$.
The probability that these 2 IDs will hash to the same bucket is the probability that: $h(x_1, x_2, x_3, x_4) = h(y_1, y_2, y_3, y_4)$

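Universal Hashing Proof (2)
Now let's simplify that last expression:
$a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 \equiv a_1 y_1 + a_2 y_2 + a_3 y_3 + a_4 y_4 \pmod n$
Subtract stuff from both sides:
$a_4 (x_4 - y_4) \equiv a_1 (y_1 - x_1) + a_2 (y_2 - x_2) + a_3 (y_3 - x_3) \pmod n$
The right-hand side is just some number, c:
$a_4 (x_4 - y_4) \equiv c \pmod n$
Multiply both sides by $(x_4 - y_4)^{-1}$:
$a_4 \equiv c (x_4 - y_4)^{-1} \pmod n$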

Universal Hashing Proof (3)
…Therefore, the probability that 2 distinct IDs will collide is the probability that $a_4 \equiv c (x_4 - y_4)^{-1} \pmod n$.
Because $x_4 \neq y_4$, we know that $(x_4 - y_4) \neq 0$.
And since we chose n to be prime, $(x_4 - y_4)$ is guaranteed to have a unique inverse mod n.
Therefore, once $a_1, a_2, a_3$ are fixed, there is only one possible value that $c (x_4 - y_4)^{-1}$ could take, and only one value of $a_4$ that would satisfy this congruence.
Since $a_4$ is randomly selected from n possible values, the probability that $a_4$ was chosen "right" is 1/n.
Therefore, the probability that a particular ID, x, will collide with another given ID is 1/n = 1/151.
This means the expected number of collisions between x and the other 149 IDs is 149/151 ≈ 1.
So the expected size of x's bucket is ≈ 2.
OMG WE DID IT.
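The key step of the proof, that exactly one value of $a_4$ causes a collision once the other coefficients are fixed, can be checked by exhaustive enumeration. This is a sketch under stated assumptions: the two chunked IDs and the three fixed coefficients are arbitrary made-up values, chosen only so that the last chunks differ.

```python
PRIME = 151


def h(coeffs, chunks):
    """Universal hash: dot product of coefficients and chunks, mod 151."""
    return sum(a * x for a, x in zip(coeffs, chunks)) % PRIME


# Two made-up IDs, already split into chunks, with different last chunks.
x = (0, 23, 89, 47)
y = (1, 40, 89, 12)

# Arbitrary fixed values for the first three coefficients.
a1, a2, a3 = 5, 77, 130

# Try every possible a4 and record the ones that make x and y collide.
colliding = [
    a4 for a4 in range(PRIME)
    if h((a1, a2, a3, a4), x) == h((a1, a2, a3, a4), y)
]
print(len(colliding))  # 1: a single a4 out of 151 causes a collision
```

One colliding choice out of 151 equally likely values of $a_4$ is exactly the 1/n probability from the proof.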

Back to Sets
We can also use hashing to implement a set!
There are no key-value pairs, just elements.
Also called a Hash Set.

    function add(obj):
        if not contains(obj):    // a set holds no repeats
            index = h(obj)
            table[index].append(obj)

    function contains(obj):
        index = h(obj)
        for elt in table[index]:
            if elt == obj:
                return true
        return false
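A runnable version of the hash set might look like the sketch below. The class name `HashSet`, the default table size of 151, and the use of Python's built-in `hash()` are assumptions for illustration; the `remove` method is added here to round out the Set ADT.

```python
class HashSet:
    """Set built from an array of buckets, each a list of elements."""

    def __init__(self, size=151):
        self._table = [[] for _ in range(size)]

    def _bucket(self, obj):
        # Placeholder hash function (an assumption of this sketch).
        return self._table[hash(obj) % len(self._table)]

    def add(self, obj):
        bucket = self._bucket(obj)
        if obj not in bucket:      # keep elements distinct
            bucket.append(obj)

    def contains(self, obj):
        # Only the one bucket is scanned, not the whole table.
        return obj in self._bucket(obj)

    def remove(self, obj):
        bucket = self._bucket(obj)
        if obj in bucket:
            bucket.remove(obj)
```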

HashMap vs HashSet
Hash Map: maps keys to values. There is no ordering.
Hash Set: no keys, just values. That is, it is like a HashMap where the key and value are the same. There is no ordering.

Example: JUMBLE
Leah is making a Jumble for the Daily Herald. There should only be one solution for a set of jumbled letters. How can she find all 5-letter words for which there is no other valid permutation?
Input: list of all 5-letter words in English (each word represented as an array of 5 characters)
Output: all words for which no other permutation is a word

Example: JUMBLE Plan
Naive solution: for every valid word, find ALL of its permutations, and check if each permutation is an English word. Keep track of a list for each word and return the words that have only a single valid permutation.
The problem with this: generating every permutation for every word is very expensive! For a 5-letter word, there are as many as 5! = 120 permutations we would have to check.
The better solution: sort the letters of each valid word alphabetically. Use the sorted letter combination as the key in the hashmap. Any two words that are permutations of each other will then have the same key, so they'll be mapped to the same "value", a list of the words that share that letter combination.
We use only the valid English words to generate this, so we're never touching the tons and tons of non-valid letter combinations.

Example: JUMBLE Solution
    function jumble(words):
        // Input: list of words
        // Output: list of all words for which no other
        //         permutation is a word
        output = []
        permutations = dictionary()
        for each word in words:
            sortedKey = sort the letters of word alphabetically
            permlist = permutations.get(sortedKey) or []   // [] if empty
            permlist.append(word)
            permutations.add(sortedKey, permlist)
        for each word in words:
            sortedKey = sort the letters of word alphabetically
            if permutations.get(sortedKey).length == 1:
                output.append(word)
        return output
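The pseudocode above translates fairly directly to Python. In this sketch, words are plain strings rather than arrays of characters, the input list is assumed to contain no duplicate words, and Python's built-in `dict` stands in for the dictionary; the sample word list in the usage line is made up.

```python
def jumble(words):
    """Return the words for which no other permutation is in the list."""
    # Map each sorted-letter key to the words that share those letters.
    permutations = {}
    for word in words:
        sorted_key = "".join(sorted(word))
        permutations.setdefault(sorted_key, []).append(word)

    # Keep only the words that are alone under their key.
    output = []
    for word in words:
        sorted_key = "".join(sorted(word))
        if len(permutations[sorted_key]) == 1:
            output.append(word)
    return output


print(jumble(["least", "slate", "stale", "quick"]))  # ['quick']
```

Sorting a 5-letter word is constant work, so the whole pass costs time proportional to the number of words, with no permutations generated at all.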

Readings
Dasgupta section 1.5 covers universal hashing.
Dasgupta "Randomized algorithms: a virtual chapter" on page 39 motivates algorithms like hashing.