COMS E6998-9 F15 Lecture 2: Median trick + Chernoff, Distinct Count, Impossibility Results

Administrivia, Plan
Website moved: sublinear.wikischolars.columbia.edu/main
Piazza: sign-up!
Plan: Median trick, Chernoff bound (from Tue); Distinct Elements Count; Impossibility Results.

Last Lecture
Counting frequency (e.g., how many times an IP address such as 160.39.142.2 appears in a stream).
Morris Algorithm:
Initialize $X = 0$.
On increment, set $X = X + 1$ with probability $1/2^X$.
Estimator: $2^X - 1$.
Morris: $\mathrm{Var} = O(n^2)$; failure probability 0.1.
Morris+: average of $k = O(1/\epsilon^2)$ independent copies, so $\mathrm{Var} = O(n^2/k)$; use Chebyshev for a $1+\epsilon$ approximation.
Morris++: median of $m = O(\log \frac{1}{\delta})$ copies of Morris+; use Chernoff; failure probability $\delta$.
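
For concreteness, here is a minimal Python sketch of the Morris counter recapped above; the class and method names are my own, not from the lecture.

```python
import random

class MorrisCounter:
    """Morris approximate counter: X tracks roughly log2 of the true count."""
    def __init__(self):
        self.x = 0
    def increment(self):
        # X = X + 1 with probability 1/2^X
        if random.random() < 1.0 / (2 ** self.x):
            self.x += 1
    def estimate(self):
        # Estimator 2^X - 1 is unbiased for the number of increments
        return 2 ** self.x - 1

c = MorrisCounter()
for _ in range(10000):
    c.increment()
print(c.estimate())  # roughly 10000, but with variance O(n^2)
```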

“Median trick”
Chernoff/Hoeffding bound: let $X_1, X_2, \ldots, X_m$ be independent r.v.'s in $\{0,1\}$, let $\mu = E[\sum_i X_i]$, and let $\epsilon \in [0,1]$. Then
$$\Pr\left[\left|\sum_i X_i - \mu\right| > \epsilon\mu\right] \le 2 e^{-\epsilon^2 \mu / 3}.$$
Suppose algorithm $A$ outputs a value in the correct range with 90% probability, and we want an algorithm $A^*$ whose output is in the correct range with probability $1-\delta$.
Median trick: repeat $A$ independently $m = O(\log \frac{1}{\delta})$ times and take the median of the answers. (A generic sketch follows.)
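
A minimal sketch of the median trick as a generic wrapper, assuming `run_A` is any randomized estimator that is correct with 90% probability; the function name and the constant 48 are illustrative choices, not from the slides.

```python
import math
import statistics

def median_trick(run_A, delta, C=48):
    """Boost a 90%-correct randomized estimator run_A to correctness 1 - delta
    by taking the median of m = O(log(1/delta)) independent runs."""
    m = max(1, int(C * math.log(1.0 / delta))) | 1  # force m odd: the median is one of the answers
    answers = [run_A() for _ in range(m)]
    return statistics.median(answers)
```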

Using Chernoff for the Median trick
Chernoff: $\Pr[|\sum_i X_i - \mu| > \epsilon\mu] \le 2 e^{-\epsilon^2 \mu / 3}$.
Define $X_i = 1$ iff the $i$-th copy of $A$ is correct. Then $E[X_i] = 0.9$ ($A$ is correct with 90% probability), so $\mu = 0.9m$.
The new algorithm $A^*$ is correct whenever a majority of the copies are correct, i.e., whenever $\sum_i X_i > 0.5m$.
Use Chernoff to bound the failure probability:
$$\Pr\left[\left|\sum_i X_i - \mu\right| > 0.4m\right] = \Pr\left[\left|\sum_i X_i - \mu\right| > \tfrac{0.4}{0.9}\mu\right] \le 2e^{-c \cdot 0.9 m} < \delta$$
for $m = O(\log \frac{1}{\delta})$, where $c = (0.4/0.9)^2/3$ is the constant from the Chernoff bound.
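
A quick Monte Carlo sanity check of this calculation (my own illustration, not from the slides): each copy succeeds independently with probability 0.9, and the median fails only when at most half the copies are correct.

```python
import random

def median_failure_prob(m, trials=100_000, p=0.9):
    """Empirical probability that at most half of m independent
    p-correct copies are correct, i.e., that the median trick fails."""
    fails = sum(
        sum(random.random() < p for _ in range(m)) <= m / 2
        for _ in range(trials)
    )
    return fails / trials

for m in [1, 3, 7, 15, 31]:
    print(m, median_failure_prob(m))  # decays exponentially in m, as Chernoff predicts
```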

Problem: Distinct Elements
Stream of elements from $[n]$; length of stream $= m$.
Goal: approximate the number of elements with non-zero frequency.
Space required by the trivial solutions: $O(n)$ bits (one bit per element of $[n]$), or $O(m \cdot \log n)$ bits (store the whole stream).

Algorithm for approximating DE
Main tool: a hash function $h: [n] \to [0,1]$ with each $h(i)$ random in $[0,1]$. (Where do such hash functions come from? We will return to this later.)
Algorithm [Flajolet-Martin 1985]:
Init $z = 1$.
When we see element $i$: $z = \min\{z, h(i)\}$.
Estimator: $\frac{1}{z} - 1$.
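
A minimal sketch of the Flajolet-Martin algorithm above. As a stand-in for the idealized hash $h: [n] \to [0,1]$, it seeds a pseudorandom generator with the element; that is an illustration only, not how one would implement $h$ in practice.

```python
import random

def h(i, salt=0):
    """Stand-in for an idealized random hash h: [n] -> [0,1]."""
    return random.Random(hash((salt, i))).random()

def fm_estimate(stream, salt=0):
    """Flajolet-Martin: keep the minimum hash value z; estimate 1/z - 1."""
    z = 1.0
    for i in stream:
        z = min(z, h(i, salt))
    return 1.0 / z - 1.0

print(fm_estimate([1, 5, 7, 5, 2, 1, 7]))  # noisy estimate of d = 4
```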

Analysis
Let $d$ be the count of distinct elements.
Claim 1: $E[z] = \frac{1}{d+1}$.
Proof: $z$ is the minimum of $d$ random numbers in $[0,1]$ (e.g., for stream elements 5, 7, 2 these are $h(5), h(7), h(2)$). Pick another random number $a \in [0,1]$ and ask: what is the probability that $a < z$? 1) Conditioned on $z$ it is exactly $z$, so overall it equals $E[z]$. 2) It is the probability that $a$ is the smallest among $d+1$ uniform reals, which by symmetry is $\frac{1}{d+1}$. Hence $E[z] = \frac{1}{d+1}$.

Analysis 2
We need the variance too: one can prove $\mathrm{var}[z] \le 2/d^2$.
How do we get a $1+\epsilon$ approximation, though? We can take the average $z = \frac{1}{k}(z_1 + z_2 + \ldots + z_k)$ for independent copies $z_1, \ldots, z_k$, which reduces the variance by a factor of $k$, as in Morris+. (A sketch follows.)
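
A sketch of that averaging step, reusing the hypothetical `h(i, salt)` from the previous snippet with $k$ different salts standing in for $k$ independent hash functions; the constant 10 inside $k = O(1/\epsilon^2)$ is an arbitrary illustrative choice.

```python
def fm_average(stream, eps):
    """Average z over k = O(1/eps^2) independent copies, then estimate 1/z - 1."""
    k = max(1, int(10 / eps ** 2))
    zs = [1.0] * k
    for i in stream:
        for j in range(k):
            zs[j] = min(zs[j], h(i, salt=j))  # h from the previous sketch
    z = sum(zs) / k
    return 1.0 / z - 1.0

print(fm_average(list(range(100)) * 3, eps=0.5))  # estimate of d = 100
```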

Alternative: Bottom-k
Bottom-k algorithm [BJKS'02]:
Init $(z_1, z_2, \ldots, z_k) = (1, \ldots, 1)$.
Keep the $k$ smallest hashes seen, $z_1 \le z_2 \le \ldots \le z_k$.
Estimator: $\hat{d} = \frac{k}{z_k}$.
We will prove: the probability that $\hat{d} > (1+\epsilon)d$ is at most 0.05, and the probability that $\hat{d} < (1-\epsilon)d$ is at most 0.05; overall, $\hat{d}$ falls outside the correct range with probability at most 0.1. (A sketch follows.)
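
A minimal sketch of bottom-k, again reusing the hypothetical `h` from above; a max-heap holds the $k$ smallest distinct hash values seen so far, so the state is $O(k)$ numbers.

```python
import heapq

def bottom_k_estimate(stream, k, salt=0):
    """Keep the k smallest distinct hash values z_1 <= ... <= z_k; estimate k / z_k.
    (Meaningful only when the stream has more than k distinct elements.)"""
    heap, inheap = [], set()  # max-heap of negated values + membership set
    for i in stream:
        v = h(i, salt)        # h from the earlier sketch
        if v in inheap:
            continue          # duplicate element: same hash, already counted
        if len(heap) < k:
            heapq.heappush(heap, -v)
            inheap.add(v)
        elif v < -heap[0]:    # smaller than the current k-th smallest
            evicted = -heapq.heapreplace(heap, -v)
            inheap.discard(evicted)
            inheap.add(v)
    z_k = -heap[0]            # the k-th smallest hash
    return k / z_k

print(bottom_k_estimate(range(1000), k=100))  # estimate of d = 1000
```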

Analysis for Bottom-k
Compute $\Pr[\hat{d} > (1+\epsilon)d]$. Suppose the distinct elements are $\{1, \ldots, d\}$.
Define $X_i = 1$ iff $h(i) < \frac{k}{(1+\epsilon)d}$.
Then $\hat{d} > (1+\epsilon)d$ iff $z_k < \frac{k}{(1+\epsilon)d}$ iff $\sum_i X_i > k$ (this requires $d > k$).
We have $E[X_i] = \frac{k}{(1+\epsilon)d}$, hence $E[\sum_i X_i] = d \cdot E[X_i] = \frac{k}{1+\epsilon}$, and $\mathrm{var}[\sum_i X_i] = d \cdot \mathrm{var}[X_i] \le d \cdot E[X_1^2] \le \frac{k}{1+\epsilon} \le k$.
By Chebyshev: $\Pr[|\sum_i X_i - \frac{k}{1+\epsilon}| > \sqrt{20k}] \le \frac{k}{20k} = 0.05$, i.e., $\Pr[\sum_i X_i > \frac{k}{1+\epsilon} + \sqrt{20k}] \le 0.05$. For $k = \Omega(1/\epsilon^2)$ we have $k \ge \frac{k}{1+\epsilon} + \sqrt{20k}$, so the event $\sum_i X_i > k$ implies the event above, and hence $\Pr[\hat{d} > (1+\epsilon)d] \le 0.05$.

Hash functions in Streaming
We used $h: [n] \to [0,1]$.
Issue 1: real numbers? Issue 2: how do we store $h$?
Issue 1: it is OK to use the discretization $h: [n] \to \{0, \frac{1}{M}, \frac{2}{M}, \frac{3}{M}, \ldots, 1\}$ for $M \gg n^3$: the probability that any of the $d \le n$ random hash values collide is at most $1/n$.

Issue 2: bounded randomness
Pairwise independent hash functions.
Definition: $h: [n] \to \{1, 2, \ldots, M\}$ such that for all $i \ne j$ and $a, b \in [M]$: $\Pr[h(i) = a \wedge h(j) = b] = 1/M^2$ (i.e., $h$ looks fully random on pairs).
Such a hash function is enough here: the variance cares only about pairs! We defined $X_i = 1$ iff $h(i) < \ldots$, and computed
$$\mathrm{var}\left[\sum_i X_i\right] = E\left[\left(\sum_i X_i\right)^2\right] - E\left[\sum_i X_i\right]^2 = E[X_1 X_1 + X_1 X_2 + \ldots] - E\left[\sum_i X_i\right]^2,$$
which involves only pairwise products $X_i X_j$ and is therefore the same for a fully random $h$ and a pairwise independent $h$.

Pairwise-Independent: example
Definition: $h: [n] \to \{0, 1, \ldots, M-1\}$ such that for all $i \ne j$ and $a, b \in \{0, 1, \ldots, M-1\}$: $\Pr[h(i) = a \wedge h(j) = b] = 1/M^2$.
A construction: suppose $M$ is prime. Pick $p, q \in \{0, 1, \ldots, M-1\}$ uniformly at random and set $h(i) = p \cdot i + q \pmod{M}$.
Space: only $O(\log M) = O(\log n)$ bits.
Proof of correctness: the conditions $h(i) = a$ and $h(j) = b$ form a system of 2 linear equations in the 2 unknowns $(p, q)$ over the integers modulo $M$; exactly one pair $(p, q)$ satisfies it, and the probability that this pair is chosen is exactly $1/M^2$. (A sketch follows.)
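
A minimal sketch of this construction, with $M = 2^{61}-1$ (a Mersenne prime) as an arbitrary choice comfortably above $n^3$ for typical $n$; dividing by $M$ lands in the discretized $[0,1]$ grid from the previous slide.

```python
import random

class PairwiseHash:
    """h(i) = (p*i + q) mod M for random p, q in {0, ..., M-1}; M must be prime.
    Pairwise independent, and only O(log M) bits of state (p and q)."""
    def __init__(self, M=2**61 - 1):
        self.M = M
        self.p = random.randrange(M)
        self.q = random.randrange(M)
    def __call__(self, i):
        return (self.p * i + self.q) % self.M
    def to_unit(self, i):
        # Map into the grid {0, 1/M, 2/M, ..., 1} used by the DE algorithm.
        return self(i) / self.M

g = PairwiseHash()
print(g(5), g(7), g.to_unit(5))  # pairwise-independent hash values
```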

Impossibility Results
Relaxations we used: approximation and randomization.
We need both to get space $\ll \min\{n, m\}$.

Deterministic Exact Won’t Work
Suppose a deterministic exact algorithm $A$ with estimator $R$ uses space $s \ll n, m$.
We build the following stream: take a vector $x \in \{0,1\}^n$ and put $i$ in the stream iff $x_i = 1$ (e.g., $x = 1010111111$ yields the stream $1, 3, 5, 6, 7, 8, 9, 10$). Run $A$ on this stream and let $\sigma$ be the memory content afterwards.
Recovery: starting from $\sigma$, feed element 1 to $A$; if the count $d$ didn't change, then $x_1 = 1$. Restart from $\sigma$ and feed element 2; if $d$ increased, then $x_2 = 0$. And so on for every coordinate.

Deterministic Exact Won’t Work (continued)
Using $\sigma$, we can recover the entire $x$: "$\sigma$ is an encoding of a string $x$ of length $n$." But $\sigma$ has only $s \ll n$ bits!
We can think of $A$ as a map $A: \{0,1\}^n \to \{0,1\}^s$. This map must be injective: otherwise, suppose $A(x) = A(x') = \sigma$; running the recovery procedure on $\sigma$ would then imply $x = x'$. Hence $2^s \ge 2^n$, i.e., $s \ge n$. (A toy sketch of the recovery follows.)
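
A toy demonstration of the recovery argument (my own illustration): the "algorithm" below is an exact distinct counter whose state is deliberately space-inefficient; the point is only that its final memory content $\sigma$ lets us decode every bit of $x$.

```python
import copy

class ExactDE:
    """Toy deterministic exact distinct-elements algorithm; its state plays sigma."""
    def __init__(self):
        self.state = set()
    def process(self, i):
        self.state.add(i)
    def estimate(self):  # the estimator R
        return len(self.state)

def recover(A_final, n):
    """Decode x from sigma: feed element i; if the count is unchanged, x_i = 1."""
    x = []
    for i in range(1, n + 1):
        trial = copy.deepcopy(A_final)  # restart from sigma each time
        before = trial.estimate()
        trial.process(i)
        x.append(1 if trial.estimate() == before else 0)
    return x

x = [1, 0, 1, 0, 1, 1, 1, 1, 1, 1]  # the stream contains i iff x_i = 1
A = ExactDE()
for i, bit in enumerate(x, start=1):
    if bit:
        A.process(i)
print(recover(A, len(x)) == x)  # True: sigma encodes all of x
```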

Deterministic Approximation Won’t Work Either
Similar idea: use $A$ to compress an $x$ drawn from a code.
Code: a set $T \subset \{0,1\}^n$ such that $|y \setminus x| \ge n/6$ for all distinct $x, y \in T$ (viewing strings as subsets of $[n]$), with $|T| \ge 2^{\Omega(n)}$.
Use $A$ to encode an input $x \in T$ into $\sigma = A(x)$, with estimate $\hat{d} = R(\sigma)$.
For each $y \in T$ we can check whether $x = y$: starting from $\sigma$, append the stream of $y$, giving $\sigma' = A(x + y)$ and $\hat{d}' = R(\sigma')$. If $\hat{d}' > 1.01\hat{d}$, then $x \ne y$ (appending $y = x$ adds no new distinct elements, while appending $y \ne x$ adds at least $n/6$).
By injectivity of $A$ on $T$: $2^s \ge |T|$, i.e., $s = \Omega(n)$.

Concluding Remarks
Median trick + Chernoff.
Distinct Elements: one can also store the hashes $h(i)$ approximately (store only the number of leading zeros), using $O(\log\log n)$ bits per hash value; with other bells and whistles, this leads to HyperLogLog.
Impossibility results: one can also prove that randomized exact algorithms won't work.