Estimating L2 Norm MIT Piotr Indyk.


2 Basic Data Stream Model
- Single pass over the data: $i_1, i_2, \dots, i_n$ from $0 \dots m$
- Typically, we assume $n$ and $m$ are known
- Bounded storage (often $\log^{O(1)} n$)
- Units of storage: bits, words, or "elements" (e.g., points, nodes/edges)
- Randomness and approximation are OK (in fact, almost always necessary)
- Last lecture: estimating the number of distinct elements, up to $1 \pm \varepsilon$, with probability 2/3, using space $O(\log(n) + 1/\varepsilon^2)$
[Figure: example stream with about 8 distinct elements]

3 Generalization
[Figure: vector x with coordinates indexed 0, 1, ..., m]
- A stream can be viewed as a sequence of updates $(i, a)$, each meaning $x_i = x_i + a$ (initially $x = 0$)
- The basic streaming model corresponds to updates $(i, 1)$
- In general, $a$ could be negative
- The number of distinct elements corresponds to the number of non-zero coordinates in $x$, denoted by $\|x\|_0$
- Algorithms similar to those in the previous lecture work for $\|x\|_0$ as well
- Today: two methods for estimating $\|x\|_2$
  - Alon-Matias-Szegedy (AMS)
  - Johnson-Lindenstrauss
- Really cute and simple
- Really powerful, needed in future lectures
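The update model above can be made concrete with a small reference implementation. This is only a sketch for checking answers: it stores the whole vector x, which is exactly what a streaming algorithm must avoid. The names `apply_updates`, `l0_norm`, `l2_norm_squared`, the parameter `dim` (number of coordinates), and the sample updates are illustrative assumptions, not part of the lecture.

```python
def apply_updates(dim, updates):
    """Apply a stream of (i, a) updates to a vector x of length dim (initially all zeros)."""
    x = [0] * dim
    for i, a in updates:
        x[i] += a  # a may be negative (a decrement)
    return x


def l0_norm(x):
    """Number of non-zero coordinates of x."""
    return sum(1 for v in x if v != 0)


def l2_norm_squared(x):
    """Sum of squared coordinates of x."""
    return sum(v * v for v in x)


# Hypothetical stream: coordinate 5 is incremented and later decremented back to 0.
updates = [(2, 1), (5, 1), (2, 1), (5, -1), (0, 3)]
x = apply_updates(6, updates)
print(x, l0_norm(x), l2_norm_squared(x))
```

Here the final vector has two non-zero coordinates even though three distinct indices appeared in the stream, which is why the generalized quantity is defined on x rather than on the raw stream.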

4 Why estimate L2 norm?
- Database join (on A): all triples (Rel1.A, Rel1.B, Rel2.B) s.t. Rel1.A = Rel2.A
- Self-join: if Rel1 = Rel2
- Size of self-join: $\sum_{\text{val of } A} \mathrm{Rows}(\text{val})^2$
- Updates to the relation increment/decrement Rows(val)
[Figure: example tables Rel1 and Rel2 with columns A and B (rows pair a lecture such as Lec2, Lec3 with a topic such as distinct elements, L2 norm), and their join with columns A, Rel1.B, Rel2.B]
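The self-join size formula can be checked against a brute-force count on a toy relation. This is a minimal sketch: the function names and the sample rows are hypothetical, and rows are assumed to be distinct (a, b) pairs.

```python
from collections import Counter


def self_join_size(rows):
    """Size of the self-join on A: sum over values of A of Rows(val)^2."""
    counts = Counter(a for a, _ in rows)
    return sum(c * c for c in counts.values())


def self_join_size_bruteforce(rows):
    """Count pairs of rows that agree on A (one join triple per pair)."""
    return sum(1 for a1, _ in rows for a2, _ in rows if a1 == a2)


# Hypothetical relation with columns (A, B).
rows = [("u", 1), ("v", 2), ("v", 3), ("w", 4), ("v", 5)]
assert self_join_size(rows) == self_join_size_bruteforce(rows)
print(self_join_size(rows))
```

If x is the vector with x_val = Rows(val), the printed quantity is the squared L2 norm of x, and inserting or deleting a row is an update (val, +1) or (val, -1).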

5 Algorithm I: AMS

6 Alon-Matias-Szegedy’96
- Choose $r_1, \dots, r_m$ to be i.i.d. random variables with $\Pr[r_i = 1] = \Pr[r_i = -1] = 1/2$
- Maintain $Z = \sum_i r_i x_i$ under increments/decrements to $x_i$
- Return $Y = Z^2$ as an estimate for $\|x\|_2^2$
- Analysis:
  - Compute the expectation of $Y$
  - Bound the variance of $Y$ (or the second moment)
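The three steps above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a space-efficient implementation: the signs are fully independent and stored explicitly in a list, and the class name `AMSSketch`, the parameter `dim` (number of coordinates, indexed from 0), and the `seed` argument are hypothetical.

```python
import random


class AMSSketch:
    """Single-counter AMS estimator for the squared L2 norm of x."""

    def __init__(self, dim, seed=None):
        rng = random.Random(seed)
        # Assumption: fully independent random signs, one per coordinate, stored explicitly.
        self.r = [rng.choice((-1, 1)) for _ in range(dim)]
        self.z = 0  # Z = sum_i r_i * x_i

    def update(self, i, a):
        """Process the stream update x_i = x_i + a."""
        self.z += self.r[i] * a

    def estimate(self):
        """Return Y = Z^2."""
        return self.z * self.z


sketch = AMSSketch(dim=10, seed=0)
sketch.update(3, 7)   # x = 7 * e_3, so Z = +7 or -7
print(sketch.estimate())
sketch.update(3, -7)  # the decrement cancels the increment
print(sketch.estimate())
```

With a single non-zero coordinate the estimate is exact; with several, a single Y can be far from the true value, which is what the variance analysis quantifies.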

7 Expectation
- The expectation of $Z^2 = (\sum_i r_i x_i)^2$ is equal to
  $E[Z^2] = E[\sum_{i,j} r_i x_i r_j x_j] = \sum_{i,j} x_i x_j E[r_i r_j]$
- We have:
  - For $i \ne j$, $E[r_i r_j] = E[r_i] E[r_j] = 0$, so the term disappears
  - For $i = j$, $E[r_i r_j] = E[r_i^2] = E[1] = 1$
- Therefore $E[Z^2] = \sum_i x_i^2 = \|x\|_2^2$ (unbiased estimator)
- Now we just need to bound the variance
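As a worked instance of this calculation, take a vector with only two coordinates, so that $Z = r_1 x_1 + r_2 x_2$ (the two-coordinate case and the numbers below are chosen purely for illustration):

```latex
\begin{align*}
Z^2 &= (r_1 x_1 + r_2 x_2)^2
     = r_1^2 x_1^2 + r_2^2 x_2^2 + 2 r_1 r_2 x_1 x_2
     = x_1^2 + x_2^2 + 2 r_1 r_2 x_1 x_2, \\
E[Z^2] &= x_1^2 + x_2^2 + 2 x_1 x_2 \, E[r_1 r_2]
        = x_1^2 + x_2^2 .
\end{align*}
```

For example, with $x = (3, 4)$ the value of $Z^2$ is $49$ when the signs agree and $1$ when they differ, each with probability 1/2, so the average is $25 = \|x\|_2^2$ even though no single outcome equals 25.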

8 Bounding the second moment
- The second moment of $Z^2 = (\sum_i r_i x_i)^2$ is equal to the expectation of
  $Z^4 = (\sum_i r_i x_i)(\sum_i r_i x_i)(\sum_i r_i x_i)(\sum_i r_i x_i)$
- This can be decomposed into a sum of:
  - $\sum_i (r_i x_i)^4$, with expectation $\sum_i x_i^4$
  - $6 \sum_{i<j} (r_i r_j x_i x_j)^2$, with expectation $6 \sum_{i<j} x_i^2 x_j^2$
  - Terms in which some $r_i x_i$ appears as a single multiplier (e.g., $r_1 x_1 r_2 x_2 r_3 x_3 r_4 x_4$), with expectation 0
- Total: $\sum_i x_i^4 + 6 \sum_{i<j} x_i^2 x_j^2 \le 3 (\sum_i x_i^2)^2$
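The moment formulas can be verified exactly for a small vector by averaging over all sign patterns. The sketch below does this by brute force; the helper name `exact_moment` and the sample vector are hypothetical, and the enumeration costs 2^len(x) steps, so it is only meant for tiny inputs.

```python
from fractions import Fraction
from itertools import combinations, product


def exact_moment(x, power):
    """E[Z^power] for Z = sum_i r_i x_i, averaged over all 2^len(x) sign vectors."""
    total = 0
    for signs in product((-1, 1), repeat=len(x)):
        z = sum(r * v for r, v in zip(signs, x))
        total += z ** power
    return Fraction(total, 2 ** len(x))


x = [3, -1, 4, 2]  # hypothetical example vector
sum_sq = sum(v * v for v in x)
sum_fourth = sum(v ** 4 for v in x)
cross = sum((a * b) ** 2 for a, b in combinations(x, 2))  # sum over i < j of x_i^2 x_j^2

assert exact_moment(x, 2) == sum_sq                   # E[Z^2] = ||x||_2^2
assert exact_moment(x, 4) == sum_fourth + 6 * cross   # E[Z^4] decomposition
assert exact_moment(x, 4) <= 3 * sum_sq ** 2          # second-moment bound
```

The factor 6 counts the ways to pick which two of the four factors of Z contribute index i (the other two contribute j).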

9 Analysis, ctd.
- We have an estimator $Y = Z^2$ with
  - $E[Y] = \sum_i x_i^2$
  - $\sigma^2 = \mathrm{Var}[Y] \le 3 (\sum_i x_i^2)^2$
- Chebyshev inequality: $\Pr[\,|E[Y] - Y| \ge c\sigma\,] \le 1/c^2$
- Algorithm AMS+: maintain independent $Z_1, \dots, Z_k$ (and thus $Y_1, \dots, Y_k$), define $Y' = \sum_j Y_j / k$
  - $E[Y'] = k \sum_i x_i^2 / k = \sum_i x_i^2$
  - $\sigma'^2 = \mathrm{Var}[Y'] \le 3k (\sum_i x_i^2)^2 / k^2 = 3 (\sum_i x_i^2)^2 / k$
- Guarantee: $\Pr[\,|Y' - \sum_i x_i^2| \ge c (3/k)^{1/2} \sum_i x_i^2\,] \le 1/c^2$
- Setting $c$ to a constant and $k = O(1/\varepsilon^2)$ gives a $(1 \pm \varepsilon)$-approximation with constant probability
- Total space: $O(\log(nm) / \varepsilon^2)$ bits (not counting the $r_i$'s)
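The averaging step of AMS+ can be sketched as follows. As an assumption made for simplicity, each of the k counters uses its own fully independent sign vector stored in memory; the class name `AMSPlusSketch`, the parameter `dim` (number of coordinates), and the `seed` argument are hypothetical.

```python
import random


class AMSPlusSketch:
    """Average of k independent AMS estimators for the squared L2 norm."""

    def __init__(self, dim, k, seed=None):
        rng = random.Random(seed)
        # Assumption: k independent sign vectors, stored explicitly.
        self.signs = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(k)]
        self.z = [0] * k  # Z_1 ... Z_k

    def update(self, i, a):
        """Process the stream update x_i = x_i + a in all k counters."""
        for j, row in enumerate(self.signs):
            self.z[j] += row[i] * a

    def estimate(self):
        """Return Y' = (Y_1 + ... + Y_k) / k, where Y_j = Z_j^2."""
        return sum(v * v for v in self.z) / len(self.z)


sketch = AMSPlusSketch(dim=8, k=500, seed=1)
x = [1, 0, -2, 0, 5, 0, 0, 3]  # hypothetical input vector
for i, v in enumerate(x):
    if v != 0:
        sketch.update(i, v)
print(sketch.estimate(), sum(v * v for v in x))
```

Reading the guarantee with $c = \sqrt{3}$ and $k = 9/\varepsilon^2$ gives $c\,(3/k)^{1/2} = \varepsilon$, so the relative error exceeds $\varepsilon$ with probability at most 1/3.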

10 Comments
- Only needed 4-wise independence of $r_1, \dots, r_m$
  - Can generate such variables from $O(\log m)$ random bits (previous lecture)
- What we did:
  - Maintain a "linear sketch" vector $Z = [Z_1, \dots, Z_k] = Rx$
  - Estimator for $\|x\|_2^2$: $(Z_1^2 + \dots + Z_k^2)/k = \|Rx\|_2^2 / k$
  - "Dimensionality reduction": $x \to Rx$
  - ... but the tail is somewhat "heavy"
  - Reason: only used the second moment of the estimator
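Because the sketch is the linear map Z = Rx, the sketch of a sum of vectors equals the sum of their sketches. The example below checks this on a small random sign matrix; the helper names, the matrix size, and the vectors are hypothetical, and the matrix is stored explicitly for simplicity.

```python
import random


def make_sign_matrix(k, dim, seed):
    """k x dim matrix R with independent entries from {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(k)]


def sketch(R, x):
    """Linear sketch Z = R x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]


def estimate_sq_norm(z):
    """Estimator ||Rx||_2^2 / k for ||x||_2^2."""
    return sum(v * v for v in z) / len(z)


R = make_sign_matrix(k=64, dim=8, seed=1)
x = [1, 0, -2, 0, 5, 0, 0, 3]
y = [0, 4, 2, 0, -5, 0, 1, 0]
zx, zy = sketch(R, x), sketch(R, y)
z_sum = sketch(R, [a + b for a, b in zip(x, y)])

# Linearity: R(x + y) = Rx + Ry, coordinate by coordinate.
assert z_sum == [a + b for a, b in zip(zx, zy)]
print(estimate_sq_norm(z_sum))
```

Linearity is what makes increments and decrements cheap to process: an update (i, a) simply adds a times column i of R to Z.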

11 Algorithm II: Dim. Reduction (JL)

12 Interlude: Normal Distribution
- Normal distribution $N(0,1)$:
  - Range: $(-\infty, \infty)$
  - Density: $f(x) = e^{-x^2/2} / (2\pi)^{1/2}$
  - Mean = 0, Variance = 1
- Basic facts:
  - If $X$ and $Y$ are independent r.v. with normal distributions, then $X + Y$ has a normal distribution
  - $\mathrm{Var}(cX) = c^2 \mathrm{Var}(X)$
  - If $X, Y$ are independent, then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
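These facts can be combined for a sum of the form $Z = \sum_i r_i x_i$, assuming the $r_i$ are i.i.d. $N(0,1)$ and $x$ is a fixed vector (the symbols $Z$, $r_i$, $x_i$ are introduced here only for this illustration):

```latex
\begin{align*}
r_i x_i &\sim N(0, x_i^2), \\
\mathrm{Var}(Z) &= \sum_i \mathrm{Var}(r_i x_i)
                 = \sum_i x_i^2 \, \mathrm{Var}(r_i)
                 = \sum_i x_i^2 = \|x\|_2^2, \\
Z &\sim N(0, \|x\|_2^2).
\end{align*}
```

In other words, such a $Z$ is distributed as $\|x\|_2$ times a single standard normal variable, whatever the direction of $x$.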

13 A different linear sketch
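- Instead of $\pm 1$, let $r_i$ be i.i.d. random variables from $N(0,1)$
- Consider $Z = \sum_i r_i x_i$
- We still have that $E[Z^2] = \sum_i x_i^2 = \|x\|_2^2$, since:
  - For $i \ne j$, $E[r_i r_j] = E[r_i] E[r_j] = 0$
  - $E[r_i^2]$ = variance of $r_i$, i.e., 1
- As before, we maintain $Z = [Z_1, \dots, Z_k]$ and define $Y = \|Z\|_2^2 = \sum_j Z_j^2$ (so that $E[Y] = k \|x\|_2^2$)
- We show that there exists $C > 0$ s.t. for small enough $\varepsilon > 0$:
  $\Pr[\,|Y - k\|x\|_2^2| > \varepsilon k \|x\|_2^2\,] \le \exp(-C \varepsilon^2 k)$
- Set $k = O(\frac{1}{\varepsilon^2} \log(1/\delta))$ to get a $1 \pm \varepsilon$ approximation with probability $1 - \delta$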
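A Gaussian version of the sketch can be written in the same style as the sign-based one. This is a minimal illustration: the Gaussian entries are drawn with Python's `random.gauss` and stored explicitly, and the class name `GaussianSketch`, the parameter `dim` (number of coordinates), and the `seed` argument are hypothetical.

```python
import random


class GaussianSketch:
    """Linear sketch Z = R x with i.i.d. N(0,1) entries in R."""

    def __init__(self, dim, k, seed=None):
        rng = random.Random(seed)
        # Assumption: the k x dim Gaussian matrix is stored explicitly.
        self.rows = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        self.z = [0.0] * k

    def update(self, i, a):
        """Process the stream update x_i = x_i + a."""
        for j, row in enumerate(self.rows):
            self.z[j] += row[i] * a

    def estimate(self):
        """Y = ||Z||_2^2 has expectation k * ||x||_2^2, so return Y / k."""
        return sum(v * v for v in self.z) / len(self.z)


sketch = GaussianSketch(dim=8, k=400, seed=1)
x = [1, 0, -2, 0, 5, 0, 0, 3]  # hypothetical input vector
for i, v in enumerate(x):
    if v != 0:
        sketch.update(i, v)
print(sketch.estimate(), sum(v * v for v in x))
```

Each update touches all k counters, so processing one stream update costs O(k) time in this sketch.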

14 Proof
See the attached notes by Ben Rossman and Michel Goemans.

15 JL - comments
- Can use k-wise independence to generate the $r_i$'s, but this is much messier than for AMS
- Time to compute the sketch vector $Z$ from $x$ is $O(dk)$, where $d$ is the dimension of $x$
  - Good if $k$ is small, not so great if $k$ is large
  - Fast JL, Sparse JL to the rescue (a few lectures from now)

