Advanced Algorithms Piyush Kumar (Lecture 12: Online Algorithms) Welcome to COT5405

On Bounds
–Worst Case.
–Average Case: running time over some distribution of inputs. (Quicksort)
–Amortized Analysis: worst-case bound on a sequence of operations. (Bit increments, Union-Find)
–Competitive Analysis: compare the cost of an on-line algorithm with an optimal prescient algorithm on any sequence of requests. (Today)

Problem 1
The online dating game.
–You get to date a fixed number of partners.
–You either choose to pick the current partner or try your luck again.
–You cannot go back in time.
–What strategy would you use to pick?

Problem 2
You like to ski.
–When weather AND mood permit, you go skiing.
–If you own the equipment, you take it with you; otherwise you rent.
–You can buy the equipment whenever you decide, but not while skiing.

Costs
–1 unit to rent, M units to buy.
–If you go skiing I times, what is OPT? OPT = min(I, M).
–What algorithm should you use to decide whether you should buy the equipment?

Algorithms
An algorithm ALG is called ρ-competitive if there exists some constant b such that for every sequence of inputs σ: Cost_ALG(σ) ≤ ρ · Cost_OPT(σ) + b.
Algorithm 1: Buy equipment after the first day.
–Worst case: you ski exactly once, so Cost_OPT(σ) = min(I, M) = 1 while Cost_ALG(σ) = M.
–Hence ρ ≥ M: buying immediately is not competitive.

Algorithms
Algorithm 2: Rent for (M−1) days and buy on the Mth day. Let L be the total number of days you ski.
–L < M: Cost_ALG(σ) = Cost_OPT(σ) = L.
–L ≥ M: Cost_ALG(σ) = (M−1) + M = 2M − 1, while Cost_OPT(σ) = M.
–Competitive ratio = 2 − 1/M.

Ski Rental
Algorithm 3: Rent for k days and buy on the (k+1)th day.
–If you ski more than k days: Cost_ALG(σ) = k + M.
–Cost_OPT(σ) = min(M, k).
–Competitive ratio = 2? (Choosing k = M − 1 recovers Algorithm 2 and its ratio 2 − 1/M; see the sketch below.)
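To make the trade-off concrete, here is a minimal simulation sketch in Java (my own illustration, not the lecture's code; rent = 1 unit, buy = M units as on the slides):

    // Ski rental: rent for k days, buy on day k+1 (Algorithm 3 above;
    // k = M-1 recovers Algorithm 2). `days` = how often you end up skiing.
    class SkiRental {
        static long algCost(long days, long M, long k) {
            if (days <= k) return days;   // never bought: one unit per ski day
            return k + M;                 // k rentals, then bought on day k+1
        }

        static long optCost(long days, long M) {
            return Math.min(days, M);     // OPT knows `days` in advance
        }

        public static void main(String[] args) {
            long M = 10, k = M - 1;
            for (long days : new long[]{1, 5, 9, 10, 11, 100})
                System.out.printf("days=%d ALG=%d OPT=%d%n",
                        days, algCost(days, M, k), optCost(days, M));
            // With k = M-1 the ratio ALG/OPT never exceeds 2 - 1/M.
        }
    }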

Problem 3 (1D): Monkey looking for food
The food is hidden somewhere on a line, at an unknown distance in an unknown direction. What is the best competitive algorithm you can come up with? What is its competitive ratio?

Problem 3 (3D): Monkey looking for food
The same question, with the food hidden somewhere in three-dimensional space.
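For the 1-D problem, the classic answer is the doubling ("zig-zag") strategy: sweep distance 1 one way, 2 the other, 4 back, and so on, returning to the origin between sweeps; it is 9-competitive, which is optimal for deterministic algorithms. A simulation sketch (my own illustration; the food sits at integer position target ≠ 0):

    // Doubling search on a line: in phase i, walk to distance 2^i from the
    // origin (alternating direction each phase), then walk back to 0.
    // Returns the total distance walked before reaching the food.
    static long doublingSearch(long target) {
        long walked = 0;
        for (int i = 0; ; i++) {
            long reach = 1L << i;                 // this phase sweeps 2^i
            boolean right = (i % 2 == 0);         // alternate directions
            if ((right && target > 0 && target <= reach)
                    || (!right && target < 0 && -target <= reach))
                return walked + Math.abs(target); // stop at the food
            walked += 2 * reach;                  // to the edge and back to 0
        }
    }
    // Worst case: food at distance d = 2^i + 1, just past a sweep. The wasted
    // walking is 2(1 + 2 + ... + 2^(i+1)) < 8d, so the total is < 9d.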

On-line Algorithms
–Work without full knowledge of the future.
–Deal with a sequence of events.
–Future events are unknown to the algorithm.
–The algorithm has to deal with one event at a time; the next event happens only after the algorithm is done dealing with the previous event.

On-line versus Off-line
–We compare the behavior of the on-line algorithm to an optimal off-line algorithm "OPT" which is familiar with the whole sequence.
–The off-line algorithm knows the exact properties of all the events in the sequence.

We measure the performance of an on-line algorithm by the competitive ratio: the ratio between what the on-line algorithm "pays" and what the optimal off-line algorithm "pays". (Absolute competitive ratio, for minimization problems.)

Formally: let Cost_ALG(σ) be the cost of the on-line algorithm on sequence σ, and let Cost_OPT(σ) be the optimal off-line cost on σ. Then the competitive ratio is
    ρ = sup_σ Cost_ALG(σ) / Cost_OPT(σ).
Calculus: the supremum is similar to a maximum but may only be achieved in the limit.

Problem 4: Caching
–k-competitive caching in a two-level memory model.
–If a page is not in the cache, a page fault occurs.
–A paging algorithm specifies which page to evict on a fault.
–Paging algorithms are on-line algorithms for cache replacement.

Online Paging Algorithms
–Assumption: the cache can hold k pages.
–The CPU accesses memory through the cache.
–Each request specifies a page in the memory system.
–We want to minimize the number of page faults.

A Lower Bound
Theorem: Let A be a deterministic on-line paging algorithm. If A is ρ-competitive, then ρ ≥ k.
Pf: Let S = {p_1, p_2, …, p_{k+1}} be a set of k+1 arbitrary memory pages. Assume w.l.o.g. that A and OPT initially have p_1, …, p_k in their cache. The adversary always requests the one page of S currently missing from A's cache, so in the worst case A has a page fault on every request σ_t. OPT, on the other hand, can always evict a page whose next request is at least k requests away, so it faults at most once every k requests. Hence ρ ≥ k. ▪

Online Algorithm and Competitive Analysis
Theorem. LRU is k-competitive. (LRU, Least Recently Used, evicts the page whose most recent access was earliest; MIN denotes the optimal off-line algorithm.)
Proof: Let σ' be a subsequence of σ on which LRU faults exactly k times. Let p denote the page requested just before σ'.
–Case 1: LRU faults on p in σ'. Then σ' requests at least k+1 different pages ⇒ MIN faults at least once.
–Case 2: LRU faults on some page, say q, at least twice in σ'. Then σ' requests at least k+1 different pages ⇒ MIN faults at least once.

Theorem. LRU is k-competitive. (Proof, continued; σ' and p as on the previous slide.)
–Case 3: LRU faults neither on p nor on any page more than once. Then k different pages are accessed and faulted on, none of which is p; since p is in MIN's cache at the start of σ' ⇒ MIN faults at least once.
Partition σ = σ_0 σ_1 σ_2 … so that LRU faults at most k times on σ_0 and exactly k times on each σ_i, i ≥ 1. By the case analysis, MIN faults at least once on each σ_i while LRU faults at most k times, so LRU is k-competitive. ▪
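A compact fault-counting LRU sketch, using Java's LinkedHashMap in access-order mode (an implementation convenience assumed here, not the lecture's code):

    import java.util.LinkedHashMap;
    import java.util.Map;

    class LruCache {
        final Map<Integer, Integer> cache;
        int faults = 0;

        LruCache(final int k) {
            // access-order map; removeEldestEntry evicts the LRU page
            cache = new LinkedHashMap<Integer, Integer>(2 * k, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Integer> e) {
                    return size() > k;
                }
            };
        }

        void request(int page) {
            if (cache.containsKey(page)) {
                cache.get(page);           // hit: refresh recency
            } else {
                faults++;                  // page fault
                cache.put(page, page);     // bring page in, possibly evicting
            }
        }
    }

Cycling through k+1 distinct pages makes this cache fault on every request, matching the lower bound from the previous slide.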

Universal Hashing

Dictionary Data Type
Dictionary. Given a universe U of possible elements, maintain a subset S ⊆ U so that inserting, deleting, and searching in S is efficient.
Dictionary interface (see the Java sketch below):
–Create(): initialize a dictionary with S = ∅.
–Insert(u): add element u ∈ U to S.
–Delete(u): delete u from S, if u is currently in S.
–Lookup(u): determine whether u is in S.
Challenge. The universe U can be extremely large, so defining an array of size |U| is infeasible.
Applications. File systems, databases, Google, compilers, checksums, P2P networks, associative arrays, cryptography, web caching, etc.
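The slide's interface, written out in Java (Create() corresponds to the constructor of an implementing class):

    // Dictionary ADT over a universe U of elements.
    interface Dictionary<U> {
        void insert(U u);        // Insert(u): add u to S
        void delete(U u);        // Delete(u): remove u from S, if present
        boolean lookup(U u);     // Lookup(u): is u in S?
    }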

Hashing
Hash function. h : U → {0, 1, …, n−1}.
Hashing. Create an array H of size n. When processing element u, access array element H[h(u)].
Collision. When h(u) = h(v) but u ≠ v.
–A collision is expected after Θ(√n) random insertions. This phenomenon is known as the "birthday paradox."
–Separate chaining: H[i] stores a linked list of the elements u with h(u) = i, as sketched below.
[Figure: slots H[1], H[2], H[3], …, H[n], each pointing to a null-terminated chain of example words ("jocularly", "seriously", "browsing", "suburban", "untravelled", …) that hash to that slot.]
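A minimal separate-chaining sketch matching the picture above (class and method names are my own):

    import java.util.LinkedList;
    import java.util.List;

    class ChainedHashTable {
        private final List<String>[] table;   // H[0..n-1], one chain per slot
        private final int n;

        @SuppressWarnings("unchecked")
        ChainedHashTable(int n) {
            this.n = n;
            table = new List[n];
            for (int i = 0; i < n; i++) table[i] = new LinkedList<>();
        }

        private int h(String u) {              // stand-in hash function
            return Math.floorMod(u.hashCode(), n);
        }

        void insert(String u)    { if (!lookup(u)) table[h(u)].add(u); }
        boolean lookup(String u) { return table[h(u)].contains(u); }
        void delete(String u)    { table[h(u)].remove(u); }
    }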

Ad Hoc Hash Function
Ad hoc hash function: deterministic hashing, à la the Java string library:

    int h(String s, int n) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++)
            hash = (31 * hash) + s.charAt(i);  // 31·hash + next character
        return Math.floorMod(hash, n);         // hash can overflow to a negative int
    }

If |U| ≥ n², then for any fixed hash function h there is a subset S ⊆ U of n elements that all hash to the same slot. Thus Θ(n) time per search in the worst case.
Q. But isn't an ad hoc hash function good enough in practice?

Algorithmic Complexity Attacks
When can't we live with an ad hoc hash function?
–Obvious situations: aircraft control, nuclear reactors.
–Surprising situations: denial-of-service attacks. A malicious adversary learns your ad hoc hash function (e.g., by reading the Java API) and causes a big pile-up in a single slot that grinds performance to a halt.
Real-world exploits. [Crosby-Wallach 2003]
–Bro server: send carefully chosen packets to DoS the server, using less bandwidth than a dial-up modem.
–Perl 5.8.0: insert carefully chosen strings into an associative array.
–Linux kernel: save files with carefully chosen names.
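To see how cheap such an attack is against the Java-style string hash above: the blocks "Aa" and "BB" collide (65·31 + 97 = 66·31 + 66 = 2112), and concatenations of equal-length colliding blocks collide too, so 2^L strings of L blocks all share one hash code. A demonstration sketch:

    import java.util.ArrayList;
    import java.util.List;

    class CollisionAttack {
        public static void main(String[] args) {
            List<String> bad = new ArrayList<>();
            bad.add("");
            for (int block = 0; block < 16; block++) {   // 2^16 = 65536 strings
                List<String> next = new ArrayList<>();
                for (String s : bad) { next.add(s + "Aa"); next.add(s + "BB"); }
                bad = next;
            }
            // Every generated string has the same hashCode, so inserting them
            // into one hash table piles them all into a single chain.
            System.out.println(bad.get(0).hashCode() == bad.get(12345).hashCode());
        }
    }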

Hashing Performance
Idealistic hash function: maps m elements uniformly at random to n hash slots.
–Running time depends on the length of the chains.
–Average length of a chain: α = m / n.
–Choose n ≈ m ⇒ on average O(1) per insert, lookup, or delete.
Challenge. Achieve the idealized randomized guarantees, but with a hash function where you can easily find items where you put them.
Approach. Use randomization in the choice of h: the adversary knows the randomized algorithm you're using, but doesn't know the random choices the algorithm makes.

Universal Hashing
Universal class of hash functions. [Carter-Wegman 1980s]
–For any pair of distinct elements u, v ∈ U: Pr_{h ∈ H} [h(u) = h(v)] ≤ 1/n, where h is chosen uniformly at random from H.
–Can select a random h efficiently.
–Can compute h(u) efficiently.
Ex. U = { a, b, c, d, e, f }, n = 2. [Table of the values of each h_i on a, …, f omitted.]
–H = { h_1, h_2 }: Pr_{h ∈ H} [h(a) = h(b)] = 1/2, Pr_{h ∈ H} [h(a) = h(c)] = 1, Pr_{h ∈ H} [h(a) = h(d)] = 0, … Not universal: Pr[h(a) = h(c)] = 1 > 1/n.
–H = { h_1, h_2, h_3, h_4 }: Pr_{h ∈ H} [h(a) = h(b)] = Pr_{h ∈ H} [h(a) = h(c)] = Pr_{h ∈ H} [h(a) = h(d)] = Pr_{h ∈ H} [h(a) = h(e)] = 1/2, and likewise every other pair collides with probability ≤ 1/2. Universal.

Universal Hashing
Universal hashing property. Let H be a universal class of hash functions; let h ∈ H be chosen uniformly at random from H; and let u ∈ U. For any subset S ⊆ U of size at most n, the expected number of items in S that collide with u is at most 1.
Pf. For any element s ∈ S, define the indicator random variable X_s = 1 if h(s) = h(u) and 0 otherwise. Let X = Σ_{s ∈ S} X_s count the total number of collisions with u. Then (assuming u ∉ S):
    E[X] = Σ_{s ∈ S} E[X_s]             (linearity of expectation)
         = Σ_{s ∈ S} Pr[h(s) = h(u)]    (X_s is a 0-1 random variable)
         ≤ Σ_{s ∈ S} 1/n                (universality)
         = |S| / n ≤ 1. ▪

Designing a Universal Family of Hash Functions
Theorem. [Chebyshev 1850] There exists a prime between n and 2n.
Modulus. Choose a prime number p ≈ n (no need for randomness here).
Integer encoding. Identify each element u ∈ U with a base-p integer of r digits: x = (x_1, x_2, …, x_r).
Hash function. Let A = the set of all r-digit, base-p integers. For each a = (a_1, a_2, …, a_r) where 0 ≤ a_i < p, define
    h_a(x) = (Σ_{i=1}^{r} a_i x_i) mod p.
Hash function family. H = { h_a : a ∈ A }.
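A direct sketch of this family in Java (class name and RNG plumbing are my own; the table size here is the prime p itself):

    import java.util.Random;

    class UniversalHash {
        private final int p;     // prime modulus, between n and 2n
        private final int[] a;   // random key a = (a_1, ..., a_r), 0 <= a_i < p

        UniversalHash(int p, int r, Random rng) {
            this.p = p;
            this.a = new int[r];
            for (int i = 0; i < r; i++) a[i] = rng.nextInt(p);
        }

        // h_a(x) = (a_1 x_1 + ... + a_r x_r) mod p for a base-p, r-digit key x
        int hash(int[] x) {
            long sum = 0;
            for (int i = 0; i < a.length; i++)
                sum = (sum + (long) a[i] * x[i]) % p;
            return (int) sum;
        }
    }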

Designing a Universal Class of Hash Functions
Theorem. H = { h_a : a ∈ A } is a universal class of hash functions.
Pf. Let x = (x_1, x_2, …, x_r) and y = (y_1, y_2, …, y_r) be two distinct elements of U. We need to show that Pr[h_a(x) = h_a(y)] ≤ 1/n.
–Since x ≠ y, there exists an index j such that x_j ≠ y_j.
–We have h_a(x) = h_a(y) iff a_j (y_j − x_j) = Σ_{i ≠ j} a_i (x_i − y_i) mod p.
–We can assume a was chosen uniformly at random by first selecting all coordinates a_i with i ≠ j, then selecting a_j at random. Thus we can assume a_i is fixed for all coordinates i ≠ j, so the right-hand side is some fixed value m mod p.
–Since p is prime and z = y_j − x_j ≢ 0 mod p, the equation a_j z = m mod p has at most one solution among the p possibilities (see the lemma on the next slide).
–Thus Pr[h_a(x) = h_a(y)] ≤ 1/p ≤ 1/n. ▪

Number Theory Facts
Fact. Let p be prime, and let z ≢ 0 mod p. Then αz = m mod p has at most one solution 0 ≤ α < p.
Pf.
–Suppose α and β are two different solutions.
–Then (α − β)z = 0 mod p; hence (α − β)z is divisible by p.
–Since z ≢ 0 mod p, we know that z is not divisible by p; it follows that (α − β) is divisible by p.
–Since 0 ≤ α, β < p, this implies α = β. ▪
Bonus fact. Can replace "at most one" with "exactly one" in the above fact.
Pf idea. Euclid's algorithm (see the sketch below).
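A sketch of the Euclid-based idea (my own illustration): since gcd(z, p) = 1 whenever z ≢ 0 mod p, the extended Euclidean algorithm yields z^(-1) mod p, and α = m · z^(-1) mod p is then the unique solution of αz = m mod p.

    // Extended Euclid: returns the inverse of z modulo a prime p
    // (z not divisible by p). Invariant: oldS * z = oldR (mod p).
    static long modInverse(long z, long p) {
        long oldR = Math.floorMod(z, p), r = p;
        long oldS = 1, s = 0;
        while (r != 0) {
            long q = oldR / r;
            long tmpR = oldR - q * r; oldR = r; r = tmpR;
            long tmpS = oldS - q * s; oldS = s; s = tmpS;
        }
        return Math.floorMod(oldS, p);   // here oldR == gcd(z, p) == 1
    }
    // The unique solution of alpha * z = m (mod p):
    //   long alpha = Math.floorMod(m % p * modInverse(z, p), p);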