Lecture 20: Hashing, Amortized Analysis


QuickSelection
Goal: Given an array of numbers, find the k-th smallest number.
Example: a[] = {4, 2, 8, 6, 3, 1, 7, 5}, k = 3. Output = 3.
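
A runnable Java sketch of randomized QuickSelect as the slide describes it: pick a random pivot, partition, and recurse on the side that contains the k-th smallest element. The partition scheme and all names here are my own choices, not taken from the lecture.

    import java.util.concurrent.ThreadLocalRandom;

    class QuickSelect {
        // Returns the k-th smallest element (1-based) of a[lo..hi].
        static int select(int[] a, int lo, int hi, int k) {
            if (lo == hi) return a[lo];
            int p = partition(a, lo, hi);   // pivot ends up at index p
            int rank = p - lo + 1;          // pivot's rank within a[lo..hi]
            if (k == rank) return a[p];
            if (k < rank)  return select(a, lo, p - 1, k);   // recurse on the left part
            return select(a, p + 1, hi, k - rank);           // recurse on the right part
        }

        // Lomuto-style partition around a uniformly random pivot.
        private static int partition(int[] a, int lo, int hi) {
            int r = ThreadLocalRandom.current().nextInt(lo, hi + 1);
            swap(a, r, hi);                 // move the random pivot to the end
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++)
                if (a[j] < pivot) swap(a, i++, j);
            swap(a, i, hi);
            return i;
        }

        private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }

For the example above, select(new int[]{4, 2, 8, 6, 3, 1, 7, 5}, 0, 7, 3) returns 3.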

Recursion
Consider the possible choices for the first pivot. Let $X_n$ be a random variable representing the running time of QuickSelect on n numbers. Conditioning on the rank i of the pivot (we are looking for the k-th smallest):
\[
\mathbb{E}[X_n]
= \sum_{i=1}^{n} \Pr[\text{pivot}=i]\,\mathbb{E}[X_n \mid \text{pivot}=i]
= \underbrace{\frac{1}{n}\sum_{i=1}^{k-1}\mathbb{E}[X_{n-i}]}_{\text{right part}}
+ \underbrace{\frac{1}{n}\sum_{i=k+1}^{n}\mathbb{E}[X_{i-1}]}_{\text{left part}}
+ \underbrace{A\,n}_{\text{split cost}}
\]

Motivation: Set and Map
Goal: An array whose index can be any object.
Example: Dictionary. Dictionary["hash"] = "a dish of diced or chopped meat and often vegetables…"
Properties:
1. Efficient lookup: we hope lookup is O(1).
2. Space: within a constant factor of a plain list.
This lecture: maintain a set of numbers from 0 to N-1, where N is very large (think N = 2^32 or 2^64).

Naïve implementations of a set
Method 1: Maintain a linked list. Problem: lookup takes O(n) time.
Method 2: Use a large array with a[i] = 1 if i is in the set. Problem: needs a huge amount of memory (N entries).

Hashing
Idea: for each number, assign a random location. Example: {3, 10, 3424, 643523}.
Store number i in a[f(i)], where f(i) is the hash function.

Collisions
Problem: we want to add 123, but f(123) = 4 = f(3424). (Collisions will always happen, by the pigeonhole principle.)
Solution: 123 and 3424 will share this location.
(Diagram on the slide: the array slots hold null, 10, 3, 3424, 643523, with 123 chained into the same slot as 3424.)
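
A minimal Java sketch of this chaining idea, assuming some hash function f into m buckets. The placeholder f below and all class names are illustrative (the lecture's f will be a random hash function): keys that collide simply share a bucket by living in the same linked list.

    import java.util.LinkedList;

    class ChainedHashSet {
        private final LinkedList<Integer>[] buckets;

        @SuppressWarnings("unchecked")
        ChainedHashSet(int m) {
            buckets = new LinkedList[m];
            for (int i = 0; i < m; i++) buckets[i] = new LinkedList<>();
        }

        // Placeholder hash function; the lecture replaces this with a random f.
        private int f(int key) {
            return Math.floorMod(key, buckets.length);
        }

        void add(int key) {
            LinkedList<Integer> bucket = buckets[f(key)];
            if (!bucket.contains(key)) bucket.add(key);   // colliding keys share the slot
        }

        boolean contains(int key) {
            return buckets[f(key)].contains(key);
        }
    }

Lookup cost is proportional to the length of the bucket's list, which is why the choice of f matters so much in the next slides.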

Fixed Hash Function
If the hash function is fixed, it can be very slow on some bad examples.
Example: an adversary can find n numbers x_1, x_2, …, x_n such that f(x_i) = y for some fixed y (always possible by the pigeonhole principle). Then the hash table degenerates into a linked list.
Solution: use a family of random hash functions.

When do we "randomly select" the hash function?
Idea 1: Choose a new hash function every time we make a query. This does not work: we may store 123 at position 4 because f(123) = 4, but after we choose a new hash function f', f'(123) may not equal 4.
Idea 2: Choose a random hash function when creating the hash table. This ensures we can access the numbers consistently; we need to account for this randomness in the analysis.

Universal Hash Function
The hash function should be as "random" as possible. Ideally we would choose a random function out of all functions; however, we cannot store a totally random function.
Definition: A family F of hash functions (mapping keys into a table of size m) is called pairwise independent if for any x ≠ y,
\[
\Pr_{f \sim F}\big[f(x) = f(y)\big] = \frac{1}{m}.
\]
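
The lecture does not fix a concrete family at this point. One common construction with this collision guarantee (up to small rounding effects from the double mod) is the Carter-Wegman family h(x) = ((a·x + b) mod p) mod m, with a prime p larger than every key and a, b drawn at random when the table is created. A Java sketch, where the class name and the choice of p are my own assumptions:

    import java.util.concurrent.ThreadLocalRandom;

    class UniversalHashFunction {
        private static final long P = 2_147_483_647L;  // prime (2^31 - 1); keys assumed < P
        private final long a, b;                        // drawn once, at construction time
        private final int m;                            // table size

        UniversalHashFunction(int m) {
            this.m = m;
            this.a = 1 + ThreadLocalRandom.current().nextLong(P - 1); // a in [1, P-1]
            this.b = ThreadLocalRandom.current().nextLong(P);         // b in [0, P-1]
        }

        int hash(long x) {
            return (int) (((a * x + b) % P) % m);  // two distinct keys collide with probability about 1/m
        }
    }

Storing a function from this family only requires remembering a and b, which is exactly what makes it preferable to a "totally random" function.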

Amortized Analysis

"Amortized"
verb (used with object), amortized, amortizing.
1. Finance. to liquidate or extinguish (a mortgage, debt, or other obligation), especially by periodic payments to the creditor or to a sinking fund.
2. to write off a cost of (an asset) gradually.
Definition from Dictionary.com

Amortized Analysis in Algorithms Scenario: Operation A is repeated many times in an algorithm. In some cases, Operation A is very fast. In some other cases, Operation A can be very slow. Idea: If the bad cases don’t happen very often, then the average cost of Operation A can still be small.

Amortized Analysis in disguise: MergeSort
For each iteration of the outer loop, steps 4-5 can take a different amount of time. Worst case: O(n) per iteration → O(n^2)? No: the total amount of time steps 4-5 can take over the whole merge is O(n), so the "amortized cost" per iteration is O(1).

   Merge(b[], c[])
1      a[] = empty
2      i = 1
3      FOR j = 1 to length(c[])
4          WHILE i <= length(b[]) and b[i] < c[j]
5              a.append(b[i]); i = i + 1
6          a.append(c[j])
7      WHILE i <= length(b[])        // append whatever remains in b
8          a.append(b[i]); i = i + 1
9      RETURN a[]

Amortized Analysis in disguise: DFS
For each vertex, the number of incident edges can be different. For example, a graph may have m = 5n edges while one vertex is connected to n/2 other vertices. Worst case for a single vertex: O(n) → O(n^2) overall? No: the total amount of work is proportional to the number of edges. "Amortized cost" = O(m/n + 1) per vertex.

Dynamic Array problem
Design a data structure to store an array. Items can be added to the end of the array. At any time, the amount of memory used should be proportional to the length of the array. Examples: ArrayList in Java, vector in C++.
Goal: design the data structure so that adding an item takes O(1) amortized time.

Why the naïve approach does not work
[1 2 3 4 5 6 7]   a.add(8)   →   [1 2 3 4 5 6 7 8]
Need to allocate a new piece of memory, copy the first 7 elements, and add 8.
a.add(9)   →   [1 2 3 4 5 6 7 8 9]
Need to allocate a new piece of memory, copy the first 8 elements, and add 9.
Running time for n add operations = O(n^2)! Amortized cost = O(n^2)/n = O(n).
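
A minimal Java sketch of this naïve grow-by-one strategy (class and method names are illustrative): every add allocates a new array and copies everything over, so n adds perform 0 + 1 + … + (n-1) = Θ(n^2) element copies in total.

    class NaiveDynamicArray {
        private int[] data = new int[0];

        void add(int x) {
            int[] bigger = new int[data.length + 1];            // allocate exactly one extra slot
            System.arraycopy(data, 0, bigger, 0, data.length);  // copy all existing elements: O(n)
            bigger[data.length] = x;
            data = bigger;
        }
    }

The standard fix is to grow the capacity geometrically (e.g. doubling it when full) so that expensive copies are rare, which is how the O(1) amortized goal from the previous slide can be met.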