Estimating Lp Norms Piotr Indyk MIT

Recap/Today

Recap: two algorithms for estimating the L2 norm of a stream.
- A stream of updates (i, 1), each interpreted as x_i = x_i + 1 (fractional and negative updates are also OK)
- The algorithm maintains a linear sketch Rx, where R is a k×m random matrix
- Use ||Rx||_2^2 to estimate ||x||_2^2
- Polylogarithmic space

Today:
- Yet another algorithm for L2 estimation! It generalizes to any Lp, p ∈ (0, 2]; in particular, to L1.
- An algorithm for Lk estimation, k ≥ 2: works only for positive updates, uses sampling rather than sketches, and takes space O(k·m^{1-1/k}/ε^2) for a (1±ε)-approximation with constant probability.
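As a quick reminder, a minimal numpy sketch of the recap estimator (the sizes and the vector below are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Recap estimator: Y = ||Rx||_2^2 / k estimates ||x||_2^2
# when R has i.i.d. N(0,1) entries.
rng = np.random.default_rng(1)
m, k = 1000, 200                      # made-up dimensions
x = rng.normal(size=m)                # stands in for the stream's vector
R = rng.standard_normal((k, m))       # the linear sketch matrix
Y = np.sum((R @ x) ** 2) / k
print(Y, np.sum(x ** 2))              # Y ≈ ||x||_2^2
```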

L2 estimation, version 3.0

Median Estimator

Again we use a linear sketch Rx = [Z_1, …, Z_k], where each entry of R has distribution N(0,1) and k = O(1/ε^2).
- Therefore each Z_i has a normal distribution with variance ∑_i x_i^2 = ||x||_2^2
- Alternatively: Z_i = ||x||_2 · G_i, where G_i is drawn from N(0,1)

How to estimate ||x||_2 from Z_1, …, Z_k?
- In Algorithms I and II we used Y = [Z_1^2 + … + Z_k^2]/k to estimate ||x||_2^2
- But there are many other estimators out there. E.g., we could instead use
    Y = median*[ |Z_1|, …, |Z_k| ] / median**[ |G| ]
  to estimate ||x||_2 (G drawn from N(0,1))

The rationale:
- median[ |Z_1|, …, |Z_k| ] = ||x||_2 · median[ |G_1|, …, |G_k| ]
- For "large enough" k, median[ |G_1|, …, |G_k| ] is "close to" median[ |G| ] (next two slides)

* The median of an array A of numbers is the usual number in the middle of sorted A.
** M is the median of a random variable U if Pr[U ≤ M] = 1/2.
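To make this concrete, here is a minimal numpy sketch of the median estimator (the dimensions, the stream, and the constant are illustrative; median(|G|) = Φ^{-1}(3/4) ≈ 0.6745):

```python
import numpy as np

# Median estimator for ||x||_2 from a Gaussian linear sketch.
rng = np.random.default_rng(0)
m, k = 10_000, 400                     # k = O(1/eps^2), made-up sizes

R = rng.standard_normal((k, m))        # sketch matrix, entries ~ N(0,1)
z = np.zeros(k)                        # the sketch z = Rx

# The sketch is linear, so an update (i, delta) just adds delta * R[:, i].
for i, delta in [(3, 1.0), (7, 2.0), (3, 1.0), (42, -1.5)]:
    z += delta * R[:, i]

MEDIAN_ABS_G = 0.6745                  # median of |G|, G ~ N(0,1)
estimate = np.median(np.abs(z)) / MEDIAN_ABS_G
print(estimate, np.sqrt(2**2 + 2**2 + 1.5**2))   # ≈ ||x||_2 ≈ 3.20
```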

Closeness in probability

Lemma 1: Let U_1, …, U_k be i.i.d. real random variables chosen from any distribution having continuous c.d.f. F and median M, i.e., F(t) = Pr[U_i < t] and F(M) = 1/2. Define U = median[U_1, …, U_k]. Then, for some absolute constant C > 0,
  Pr[ F(U) ∈ (1/2 − ε, 1/2 + ε) ] ≥ 1 − e^{−C ε^2 k}.

Proof:
- Assume k is odd (so that the median is well defined)
- Consider the events E_i: F(U_i) < 1/2 − ε; since F(U_i) is uniform on [0,1], we have p = Pr[E_i] = 1/2 − ε
- F(U) < 1/2 − ε iff at least k/2 of these events hold
- By the Chernoff bound, the probability that at least k/2 of the events hold is at most e^{−C ε^2 k}
- Therefore Pr[F(U) < 1/2 − ε] ≤ e^{−C ε^2 k}; the other case is handled analogously. ∎

[Figure: the c.d.f. F, with the levels 1/2 − ε, 1/2, 1/2 + ε marked.]
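A quick Monte Carlo check of Lemma 1 (purely illustrative; any continuous distribution works, here exponential):

```python
import numpy as np

# Check: the median of k i.i.d. samples falls in the (1/2 - eps, 1/2 + eps)
# quantile band with probability about 1 - exp(-C eps^2 k).
rng = np.random.default_rng(2)
k, eps, trials = 401, 0.1, 2000        # k odd, so the median is well defined

U = rng.exponential(size=(trials, k))
F_med = 1 - np.exp(-np.median(U, axis=1))   # exponential c.d.f. F(t) = 1 - e^{-t}
inside = np.mean((F_med > 0.5 - eps) & (F_med < 0.5 + eps))
print(inside)                          # close to 1 when eps^2 * k is large
```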

Closeness in value

Lemma 2: Let F be the c.d.f. of the random variable |G|, G drawn from N(0,1). There exists a C' > 0 such that if for some z we have F(z) ∈ (1/2 − ε, 1/2 + ε), then z = median(|G|) ± C'·ε.

Proof: The density of |G| is bounded away from 0 near the median, so F^{-1} is Lipschitz there; use calculus (or Matlab) to verify.

[Figure: the c.d.f. F(z) of |G|, crossing 1/2 at the median.]
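The constants in Lemma 2 are easy to compute numerically; a minimal check (assuming scipy is available):

```python
from scipy.stats import norm

# For |G| with G ~ N(0,1): c.d.f. F(z) = 2*Phi(z) - 1, density F'(z) = 2*phi(z).
median_absG = norm.ppf(0.75)        # F(z) = 1/2  <=>  Phi(z) = 3/4, z ≈ 0.6745
slope = 2 * norm.pdf(median_absG)   # F'(z) at the median ≈ 0.636 > 0
C_prime = 1 / slope                 # so |z - median| ≤ C'*eps with C' ≈ 1.57
print(median_absG, slope, C_prime)
```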

Altogether

Theorem: If we use the median estimator Y = median[ |Z_1|, …, |Z_k| ] / median[ |G| ] (where Z_j = ∑_i r_{ij} x_i, with the r_{ij} chosen i.i.d. from N(0,1)), then
  Y = ||x||_2 · [ median(|G|) ± C'·ε ] / median[ |G| ] = ||x||_2 · (1 ± C''·ε)
with probability 1 − e^{−C ε^2 k}.

How to extend this to ||x||_p?

Other norms

Key property of the normal distribution: if U_1, …, U_m, U are i.i.d. normal, then x_1 U_1 + … + x_m U_m is distributed as (|x_1|^p + … + |x_m|^p)^{1/p} · U, with p = 2.

Such distributions are called "p-stable" and exist for any p ∈ (0, 2]. For example, for p = 1 we have the Cauchy distribution:
- Density function: f(x) = 1/[π(1 + x^2)]
- C.d.f.: F(z) = arctan(z)/π + 1/2
- 1-stability: x_1 U_1 + … + x_m U_m is distributed as (|x_1| + … + |x_m|) · U

Cauchy (from Wiki)

[Figure: Cauchy density functions and c.d.f.'s.]

- The median estimator arguments go through
- Can generate a random Cauchy variable by choosing a random u ∈ [0,1] and computing F^{-1}(u) = tan(π(u − 1/2))
- So L1 estimation goes through (see the sketch below)
- A similar story holds for L_{1/2} (the Lévy distribution)
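A minimal sketch of the resulting L1 estimator (the stream and sizes are made up; note median(|C|) = 1 for a standard Cauchy C, since F(1) = 3/4):

```python
import numpy as np

# L1 estimation with a 1-stable (Cauchy) linear sketch + median estimator.
rng = np.random.default_rng(3)
m, k = 10_000, 400

# Cauchy entries via the inverse c.d.f.: F^{-1}(u) = tan(pi*(u - 1/2)).
R = np.tan(np.pi * (rng.random((k, m)) - 0.5))
z = np.zeros(k)
for i, delta in [(3, 1.0), (7, 2.0), (3, 1.0), (42, -1.5)]:
    z += delta * R[:, i]

# median(|C|) = 1, so the estimate needs no normalization.
estimate = np.median(np.abs(z))
print(estimate, 2.0 + 2.0 + 1.5)       # ≈ ||x||_1 = 5.5
```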

p-stability for p ≠ 1, 2, 1/2

Basically, it is a mess:
- No closed-form formula for the density/c.d.f.
- Not clear where the median is
- Not clear what the derivative of the c.d.f. around the median is

Nevertheless:
- Can generate random variables (see the sketch below)
- Moments are known (more or less)
- Given samples of a·|G|, for G p-stable, one can estimate a up to 1 ± ε [Indyk, JACM'06; Ping Li, SODA'08] (using various hacks and/or moments)

For more info on p-stable distributions, see: V.V. Uchaikin, V.M. Zolotarev, Chance and Stability: Stable Distributions and their Applications. http://staff.ulsu.ru/uchaikin/uchzol.pdf
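For generating the samples, one standard recipe (not on the slides) is the Chambers–Mallows–Stuck method; a sketch for the symmetric case, under the assumption that the formula is stated correctly here:

```python
import numpy as np

def sym_p_stable(p, size, rng):
    """Symmetric p-stable samples via the Chambers-Mallows-Stuck method."""
    theta = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                    # Exp(1) variable
    return (np.sin(p * theta) / np.cos(theta) ** (1 / p)
            * (np.cos((1 - p) * theta) / w) ** ((1 - p) / p))

rng = np.random.default_rng(4)
samples = sym_p_stable(0.7, 100_000, rng)  # p = 0.7: heavy-tailed
# Sanity check: p = 1 reduces to tan(theta), i.e., a standard Cauchy.
```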

Summary

Maintaining the Lp norm of x under updates: polylogarithmic space for p ≤ 2.

Issues ignored:
- Randomness
- Discretization (but everything can be done using O(log(m+n))-bit numbers)

See [Kane-Nelson-Woodruff'10] for how to fix these issues (and get optimal bounds).

Lk norm, k≥2

Lk norm

An algorithm for estimating the Lk norm of a stream:
- The stream is a sequence of elements i_1 … i_n; each element i is interpreted as the update x_i = x_i + 1
- Note: this algorithm works only for such (positive, unit) updates
- Space: O(m^{1-1/k}/ε^2) for a (1±ε)-approximation with constant probability
- Uses sampling, not sketching

Lk Norm Estimation: AMS'96

Useful notion: F_k = ∑_{i=1}^m x_i^k = ||x||_k^k (the k-th frequency moment of the stream i_1 … i_n).

Algorithm A (two passes):
- Pass 1: pick a stream element i = i_j uniformly at random
- Pass 2: compute x_i
- Return Y = n · x_i^{k-1}

Alternative view: a little birdy that samples i and returns x_i (as in the Sublinear-Time Algorithms class).
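A minimal two-pass sketch of Algorithm A (illustrative stream and names):

```python
import random
from collections import Counter   # only used to check the answer

def ams_two_pass(stream, k, rng):
    """Two-pass AMS estimator for F_k (positive unit updates only)."""
    n = len(stream)
    j = rng.randrange(n)                      # Pass 1: uniform position
    i = stream[j]
    x_i = sum(1 for e in stream if e == i)    # Pass 2: count x_i
    return n * x_i ** (k - 1)                 # unbiased: E[Y] = F_k

rng = random.Random(5)
stream = [1, 2, 1, 3, 1, 2, 1]                # x = (4, 2, 1)
est = sum(ams_two_pass(stream, 3, rng) for _ in range(5000)) / 5000
print(est, sum(v**3 for v in Counter(stream).values()))   # F_3 = 73
```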

Analysis

Estimator: Y = n · x_i^{k-1}.

Expectation:
  E[Y] = ∑_i (x_i/n) · n · x_i^{k-1} = ∑_i x_i^k = F_k

Second moment (≥ variance):
  E[Y^2] = ∑_i (x_i/n) · n^2 · x_i^{2k-2} = n ∑_i x_i^{2k-1} = n · F_{2k-1}

Claim (next slide): n · F_{2k-1} ≤ m^{1-1/k} (F_k)^2. Therefore, averaging over O(m^{1-1/k}/ε^2) samples + Chebyshev does the job (Lecture 2).

Claim

Claim: n · F_{2k-1} ≤ m^{1-1/k} (F_k)^2.

Proof: Since all updates are positive, n = ∑_i x_i = ||x||_1. Thus
  n · F_{2k-1} = ||x||_1 · ||x||_{2k-1}^{2k-1}
              ≤ ||x||_1 · ||x||_k^{2k-1}          (since ||x||_{2k-1} ≤ ||x||_k for 2k-1 ≥ k)
              ≤ m^{1-1/k} ||x||_k · ||x||_k^{2k-1}  (since ||x||_1 ≤ m^{1-1/k} ||x||_k, by Hölder)
              = m^{1-1/k} ||x||_k^{2k}
              = m^{1-1/k} F_k^2. ∎

One Pass

In one pass we cannot compute x_i exactly. Instead:
- Pick i = i_j uniformly at random from the stream
- Compute r = #occurrences of i in the suffix i_j … i_n
- Use r instead of x_i in the estimator

Clearly r ≤ x_i, but E[r] = (x_i + 1)/2, so intuitively things should work out up to a constant factor. Even better idea: use the estimator Y' = n (r^k − (r−1)^k); see the sketch below.
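A minimal one-pass sketch combining reservoir sampling with the estimator Y' (illustrative stream and names):

```python
import random

def ams_one_pass(stream, k, rng):
    """One pass: reservoir-sample a position, count its suffix occurrences r,
    return Y' = n * (r^k - (r-1)^k)."""
    n, sample, r = 0, None, 0
    for e in stream:
        n += 1
        if rng.randrange(n) == 0:    # reservoir sampling: replace w.p. 1/n
            sample, r = e, 0         # restart the suffix count here
        if e == sample:
            r += 1                   # occurrences of sample from its position on
    return n * (r**k - (r - 1)**k)

rng = random.Random(6)
stream = [1, 2, 1, 3, 1, 2, 1]       # x = (4, 2, 1), F_3 = 73
est = sum(ams_one_pass(stream, 3, rng) for _ in range(20000)) / 20000
print(est)                           # ≈ F_3 = 73
```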

Analysis

Expectation:
  E[Y'] = n · E[r^k − (r−1)^k] = n · (1/n) ∑_i ∑_{j=1}^{x_i} [j^k − (j−1)^k] = ∑_i x_i^k
(the inner sum telescopes to x_i^k).

Second moment: observe that Y' = n (r^k − (r−1)^k) ≤ n·k·r^{k-1} ≤ k·Y. Therefore
  E[Y'^2] ≤ k^2 E[Y^2] ≤ k^2 m^{1-1/k} F_k^2
(this can be improved to k·m^{1-1/k} F_k^2 for integer k).

Altogether, this proves: a one-pass algorithm for F_k (positive updates), with space O(k^2 m^{1-1/k}/ε^2) for a (1±ε)-approximation.

Notes

- The analysis in AMS'96, as is, works only for integer k (but is easy to adapt to any k > 1)
- The analysis* in these notes is somewhat simpler (but yields k^2 m^{1-1/k} space)
- m^{1-1/k} is not optimal: one can achieve m^{1-2/k} log^{O(1)}(mn) [Indyk-Woodruff'05, …, Braverman-Katzman-Seidell-Vorsanger'14]
- Sampling is quite general: e.g., it gives empirical entropy, −∑_i (x_i/n) log(x_i/n), in polylog(n) space [Chakrabarti-Cormode-McGregor'07]

* Contributed by David Woodruff