1 Streaming Algorithms Piotr Indyk MIT

2 Data Streams
A data stream is a sequence of data that is too large to be stored in available memory.
Examples:
–Network traffic
–Sensor networks
–Approximate query optimization and answering in large databases
–Scientific data streams
–…and this talk

3 Outline
Data stream model, motivations, connections
Streaming problems and techniques
–Estimating the number of distinct elements
–Other quantities: norms, moments, heavy hitters…
–Sparse approximations and compressed sensing

4 Basic Data Stream Model
Single pass over the data: i_1, i_2, …, i_n
–Typically, we assume n is known
Bounded storage (typically n^α or log^c n)
–Units of storage: bits, words, coordinates or "elements" (e.g., points, nodes/edges)
Fast processing time per element
–Randomness OK (in fact, almost always necessary)
Example stream: 8 2 1 9 1 9 2 4 6 3 9 4 2 3 4 2 3 8 5 2 5 6 …

5 Connections
External memory algorithms
–Linear scan works best
–Streaming algorithms yield efficient external memory algorithms
Communication complexity/sketching
–Low-space algorithms yield low-communication protocols
Metric embeddings, sub-linear time algorithms, pseudo-randomness, compressive sensing

6 Counting Distinct Elements
Stream elements: numbers from {1…m}
Goal: estimate the number of distinct elements DE in the stream
–Up to a factor of 1±ε
–With probability 1-P
Simpler goal: for a given T > 0, provide an algorithm which, with probability 1-P:
–Answers YES if DE > (1+ε)T
–Answers NO if DE < (1-ε)T
Run, in parallel, the algorithm with T = 1, (1+ε), (1+ε)^2, …, n (a sketch of this reduction follows)
–Total space multiplied by log_{1+ε} n ≈ log(n)/ε
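
A minimal sketch of this reduction in Python, assuming a decision procedure decide(T, stream) with the YES/NO guarantee above (one implementation is sketched after slide 8); the names and the sequential loop are illustrative — on a real stream all the decision instances are updated in the same single pass.

```python
def estimate_distinct(stream, n, eps, decide):
    # Run the decision procedure for thresholds T = 1, (1+eps), (1+eps)^2, ..., n
    # and return (roughly) the largest threshold that still answers YES.
    thresholds = []
    T = 1.0
    while T <= n:
        thresholds.append(T)
        T *= 1.0 + eps
    best = 1.0
    for T in thresholds:               # conceptually run "in parallel"
        if decide(T, stream):          # YES: DE is likely at least (1-eps)T
            best = T
    return best * (1.0 + eps)          # estimate of DE up to a 1±O(eps) factor
```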

7 Vector Interpretation
Initially, x = 0
Insertion of i is interpreted as x_i = x_i + 1
Want to estimate DE(x) = the number of non-zero coordinates of x
Example stream: 8 2 1 9 1 9 2 4 4 9 4 2 5 4 2 5 8 5 2 5
(Figure: the vector x, indexed by coordinates 1…9, with x_i equal to the number of occurrences of i in the stream)

8 Estimating DE(x) (based on [Flajolet-Martin'85])
Choose a random set S of coordinates
–For each i, we have Pr[i ∈ S] = 1/T
Maintain Sum_S(x) = Σ_{i ∈ S} x_i
Estimation algorithm I:
–YES, if Sum_S(x) > 0
–NO, if Sum_S(x) = 0
Analysis:
–Pr = Pr[Sum_S(x) = 0] = (1-1/T)^DE
–For T large enough: (1-1/T)^DE ≈ e^{-DE/T}
–Using calculus, for ε small enough:
  If DE > (1+ε)T, then Pr ≈ e^{-(1+ε)} < 1/e - ε/3
  If DE < (1-ε)T, then Pr ≈ e^{-(1-ε)} > 1/e + ε/3
(Figure: vector x over coordinates 1…9 with the sampled set S marked, T = 4; plot of Pr as a function of DE)
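
A sketch of algorithm I in Python (the function name is mine); for clarity the set S is stored explicitly, which of course costs Θ(m) space — slide 11 explains how S is implemented with a hash function instead.

```python
import random

def decide_distinct(stream, m, T, rng=random):
    # Sample each coordinate into S with probability 1/T, maintain
    # Sum_S(x) = sum of x_i over i in S in a single pass, and answer
    # YES iff the sum is non-zero.
    S = {i for i in range(1, m + 1) if rng.random() < 1.0 / T}
    sum_S = 0
    for i in stream:        # each arrival of i means x_i += 1
        if i in S:
            sum_S += 1
    return sum_S > 0        # YES = True, NO = False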

9 Estimating DE(x), ctd.
We have an algorithm:
–If DE > (1+ε)T, then Pr < 1/e - ε/3
–If DE < (1-ε)T, then Pr > 1/e + ε/3
Need to amplify the probability of correctness:
–Select sets S_1 … S_k, k = O(log(1/P)/ε^2)
–Let Z = number of Sum_{S_j}(x) that are equal to 0
–By the Chernoff bound, with probability > 1-P:
  If DE > (1+ε)T, then Z < k/e
  If DE < (1-ε)T, then Z > k/e
Total space:
–Decision version: O(log(1/P)/ε^2) numbers in range 0…n
–Estimation: O(log(n)/ε · log(1/P)/ε^2) numbers in range 0…n (the probability of error is O(P log(n)/ε))
Better algorithms known:
–Theory: O(1/ε^2 + log n) bits [Kane-Nelson-Woodruff'10]
–Practice: 128 bytes suffice for all works of Shakespeare, ε ≈ 10% [Durand-Flajolet'03]
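
The amplified decision test might look as follows — a sketch under the same caveat about storing the sets explicitly; the exact constant hidden in k = O(log(1/P)/ε^2) is not specified on the slide, so the ceiling below is an arbitrary choice.

```python
import math
import random

def decide_distinct_amplified(stream, m, T, eps, P, rng=random):
    # k independent sets S_1..S_k; count how many of the k sums are zero
    # and compare that count Z to the threshold k/e.
    k = max(1, math.ceil(math.log(1.0 / P) / eps ** 2))
    sets = [{i for i in range(1, m + 1) if rng.random() < 1.0 / T}
            for _ in range(k)]
    sums = [0] * k
    for i in stream:
        for j, S in enumerate(sets):
            if i in S:
                sums[j] += 1
    Z = sum(1 for s in sums if s == 0)
    return Z < k / math.e      # YES iff few sums are zero, i.e. DE looks > T
```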

10 Chernoff bound
Let X_1 … X_n be a sequence of i.i.d. 0-1 random variables such that Pr[X_i = 1] = p. Let X = Σ_i X_i.
Then for any ε < 1 we have Pr[ |X - pn| > ε·pn ] < 2e^{-ε^2 pn / 3}

11 Comments
Implementing S:
–Choose a uniform hash function h: {1..m} -> {1..T}
–Define S = {i: h(i) = 1}
–We have i ∈ S iff h(i) = 1
Implementing h:
–In practice, to compute h(i): seed the random number generator with i, then return random()
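
A hedged sketch of that idea (the helper names and the salt parameter are mine): h(i) is computed on demand by seeding a PRNG with i, so the only state kept across the stream is the single counter.

```python
import random

def h(i, T, salt=0):
    # Deterministic pseudo-random hash h: {1..m} -> {1..T}, computed by
    # seeding the generator with i ("Seed = i; return random()" on the slide).
    # The salt just lets us derive several independent-looking hash functions.
    return random.Random(f"{salt}:{i}").randint(1, T)

def decide_distinct_hashed(stream, T, salt=0):
    # Same YES/NO test as before, but membership "i in S" is now the
    # condition h(i) == 1, so no explicit set is stored.
    sum_S = 0
    for i in stream:
        if h(i, T, salt) == 1:
            sum_S += 1
    return sum_S > 0
```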

12 More comments
The algorithm uses "linear sketches" Sum_{S_j}(x) = Σ_{i ∈ S_j} x_i
Can implement decrements x_i = x_i - 1
–I.e., the stream can contain deletions of elements (as long as x ≥ 0)
–Other names: dynamic model, turnstile model
(Figure: the vector x over coordinates 1…9)
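
Because each Sum_{S_j}(x) is linear in x, a deletion is just an update with delta = -1; a minimal turnstile-style sketch (class and method names assumed):

```python
import random

class DistinctSketch:
    def __init__(self, k, T, m, rng=random):
        # One counter per set S_j; the sets are stored explicitly here only
        # for readability (slide 11's hash trick applies verbatim).
        self.sets = [{i for i in range(1, m + 1) if rng.random() < 1.0 / T}
                     for _ in range(k)]
        self.sums = [0] * k

    def update(self, i, delta):
        # delta = +1 for an insertion of i, -1 for a deletion (keeping x >= 0)
        for j, S in enumerate(self.sets):
            if i in S:
                self.sums[j] += delta
```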

13 More General Problem
What other functions of a vector x can we maintain in small space?
L_p norms: ||x||_p = (∑_i |x_i|^p)^{1/p}
–We also have ||x||_∞ = max_i |x_i|
–…and we can define ||x||_0 = DE(x), since ||x||_p^p = ∑_i |x_i|^p → DE(x) as p → 0
Alternatively: frequency moments F_p = p-th power of the L_p norm (exception: F_0 = L_0)
How much space do you need to estimate ||x||_p (for const. ε)?
Theorem:
–For p ∈ [0,2]: polylog n space suffices
–For p > 2: n^{1-2/p} polylog n space suffices and is necessary
[Alon-Matias-Szegedy'96, Feigenbaum-Kannan-Strauss-Viswanathan'99, Indyk'00, Coppersmith-Kumar'04, Ganguly'04, Bar-Yossef-Jayram-Kumar-Sivakumar'02'03, Saks-Sun'03, Indyk-Woodruff'05]
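
To pin down the notation, here is a direct (offline) computation of ||x||_p and F_p on a tiny example vector; the streaming algorithms that follow estimate these quantities without ever storing x.

```python
def lp_norm(x, p):
    # ||x||_0 is the number of non-zero coordinates (= DE(x) = F_0);
    # for p > 0, ||x||_p = (sum_i |x_i|^p)^(1/p) and F_p = ||x||_p^p.
    if p == 0:
        return sum(1 for v in x if v != 0)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3, 0, 1, 0, 2]
print(lp_norm(x, 0))          # 3   (distinct / non-zero coordinates)
print(lp_norm(x, 1))          # 6.0 (total count, F_1)
print(lp_norm(x, 2) ** 2)     # 14.0 = F_2
```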

14 Estimating the L_2 norm: AMS

15 Alon-Matias-Szegedy'96
Choose r_1 … r_m to be i.i.d. random variables with Pr[r_i = 1] = Pr[r_i = -1] = 1/2
Maintain Z = ∑_i r_i x_i under increments/decrements to x_i
Algorithm I: Y = Z^2
"Claim": Y "approximates" ||x||_2^2 with "good" probability
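
A minimal Python sketch of Algorithm I (the function name is mine); it keeps the full sign vector r in memory for clarity, whereas slide 19 notes that 4-wise independent signs generated from O(log m) bits suffice.

```python
import random

def ams_single_estimate(stream, m, rng=random):
    # Random signs r_1..r_m; maintain Z = sum_i r_i * x_i over the stream of
    # increments, and return Y = Z^2 as the unbiased (but high-variance)
    # estimate of ||x||_2^2.
    r = {i: rng.choice((-1, 1)) for i in range(1, m + 1)}
    Z = 0
    for i in stream:          # each arrival of i means x_i += 1
        Z += r[i]
    return Z * Z
```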

16 Analysis
We will use the Chebyshev inequality
–Need expectation, variance
The expectation of Z^2 = (∑_i r_i x_i)^2 is equal to E[Z^2] = E[∑_{i,j} r_i x_i r_j x_j] = ∑_{i,j} x_i x_j E[r_i r_j]
We have
–For i ≠ j, E[r_i r_j] = E[r_i] E[r_j] = 0, so the term disappears
–For i = j, E[r_i r_j] = 1
Therefore E[Z^2] = ∑_i x_i^2 = ||x||_2^2 (unbiased estimator)

17 Analysis, ctd.
The second moment of Z^2 = (∑_i r_i x_i)^2 is equal to the expectation of Z^4 = (∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)
This can be decomposed into a sum of
–∑_i (r_i x_i)^4 → expectation = ∑_i x_i^4
–6 ∑_{i<j} (r_i r_j x_i x_j)^2 → expectation = 6 ∑_{i<j} x_i^2 x_j^2
–Terms involving a single multiplier r_i x_i (e.g., r_1 x_1 r_2 x_2 r_3 x_3 r_4 x_4) → expectation = 0
Total: ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2
The variance of Z^2 is equal to
E[Z^4] - E^2[Z^2] = ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2 - (∑_i x_i^2)^2
= ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2 - ∑_i x_i^4 - 2 ∑_{i<j} x_i^2 x_j^2
= 4 ∑_{i<j} x_i^2 x_j^2 ≤ 2 (∑_i x_i^2)^2
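
A quick numerical sanity check of the two facts above (not from the slides): estimate E[Z^2] and Var[Z^2] by simulation and compare them with ||x||_2^2 and 2||x||_2^4.

```python
import random

def check_ams_moments(x, trials=100_000, rng=random):
    m = len(x)
    samples = []
    for _ in range(trials):
        r = [rng.choice((-1, 1)) for _ in range(m)]
        z = sum(ri * xi for ri, xi in zip(r, x))
        samples.append(z * z)
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    l2sq = sum(v * v for v in x)
    print(mean, "vs", l2sq)            # E[Z^2] should match ||x||_2^2
    print(var, "<=", 2 * l2sq ** 2)    # Var[Z^2] should respect the bound

check_ams_moments([3, 0, 1, 0, 2])
```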

18 Analysis, ctd.
We have an estimator Y = Z^2
–E[Y] = ∑_i x_i^2
–σ^2 = Var[Y] ≤ 2 (∑_i x_i^2)^2
Chebyshev inequality: Pr[ |E[Y] - Y| ≥ cσ ] ≤ 1/c^2
Algorithm II:
–Maintain Z_1 … Z_k (and thus Y_1 … Y_k), define Y' = ∑_i Y_i / k
–E[Y'] = k ∑_i x_i^2 / k = ∑_i x_i^2
–σ'^2 = Var[Y'] ≤ 2k (∑_i x_i^2)^2 / k^2 = 2 (∑_i x_i^2)^2 / k
Guarantee: Pr[ |Y' - ∑_i x_i^2| ≥ c (2/k)^{1/2} ∑_i x_i^2 ] ≤ 1/c^2
Setting c to a constant and k = O(1/ε^2) gives a (1±ε)-approximation with constant probability
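
Putting it together, Algorithm II as a sketch; the constant 8 inside k is an arbitrary stand-in for the O(1/ε^2) bound, and the sign vectors are again stored explicitly for readability.

```python
import math
import random

def ams_estimate(stream, m, eps, rng=random):
    # k independent counters Z_1..Z_k, each with its own sign vector;
    # average the Y_j = Z_j^2 to get Y' ≈ ||x||_2^2 within (1±eps)
    # with constant probability.
    k = math.ceil(8.0 / eps ** 2)
    r = [{i: rng.choice((-1, 1)) for i in range(1, m + 1)} for _ in range(k)]
    Z = [0] * k
    for i in stream:                  # each arrival of i means x_i += 1
        for j in range(k):
            Z[j] += r[j][i]
    return sum(z * z for z in Z) / k
```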

19 Comments
Only needed 4-wise independence of r_1 … r_m
–I.e., for any subset S of {1…m}, |S| = 4, and any b ∈ {-1,1}^4, we have Pr[r_S = b] = 1/2^4
–Can generate such variables from O(log m) random bits
What we did:
–Maintain a "linear sketch" vector Z = [Z_1 … Z_k] = Rx
–Estimator for ||x||_2^2: (Z_1^2 + … + Z_k^2)/k = ||Rx||_2^2 / k
–"Dimensionality reduction": x → Rx … but the tail is somewhat "heavy"
–Reason: we only used the second moment of the estimator
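
One standard way to generate 4-wise independent values from O(log m) random bits is to evaluate a random degree-3 polynomial over a prime field; the sketch below uses that construction (it is not necessarily the one the slide has in mind, and the parity map to ±1 introduces a small O(1/p) bias that is ignored here).

```python
import random

def fourwise_signs(p=2_147_483_647, rng=random):
    # Four random coefficients (O(log m) bits of state) define
    # h(i) = a3*i^3 + a2*i^2 + a1*i + a0 mod p, a 4-wise independent family;
    # each value is then mapped to a sign via its parity.
    a = [rng.randrange(p) for _ in range(4)]
    def r(i):
        v = (((a[3] * i + a[2]) * i + a[1]) * i + a[0]) % p
        return 1 if v % 2 == 0 else -1
    return r

r = fourwise_signs()
print([r(i) for i in range(1, 11)])   # e.g. the signs r_1 .. r_10
```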

