1 Optimal Space Lower Bounds for All Frequency Moments
David Woodruff
Based on SODA '04 paper

2 The Streaming Model [AMS96]
Stream of elements a_1, …, a_q, each in {1, …, m} (e.g., 0 1 1 3 7 3 4 …)
Want to compute statistics on the stream
Elements arrive in adversarial order
Algorithm is given one pass over the stream
Goal: minimum-space algorithm

3 Frequency Moments
Notation: q = stream size, m = universe size, f_i = # occurrences of item i
k-th frequency moment: F_k = Σ_{i=1}^{m} f_i^k
Why are frequency moments important?
F_0 = # of distinct elements
F_1 = q (the stream size)
F_2 = repeat rate
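As a concrete check of the definitions (an illustrative example, not from the slides): on the stream 1, 1, 2, 3, 1 we have f_1 = 3, f_2 = 1, f_3 = 1, so F_0 = 3 (distinct elements), F_1 = 3 + 1 + 1 = 5 (the stream size), and F_2 = 3² + 1² + 1² = 11 (the repeat rate).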

4 Applications
Estimating # distinct elements with low space:
Estimate the selectivity of database queries without an expensive sort
Routers gather the # of distinct destinations with limited memory
Estimating F_2 estimates the size of self-joins
[The slide works a small self-join example: two tables of (name, value) pairs with names Bob and Alice, joined on the name attribute]

5 The Best Deterministic Algorithm
Trivial algorithm for F_k: store/update f_i for each item i, sum f_i^k at the end (see the sketch below)
Space = O(m log q): m items i, log q bits to count each f_i
Negative results [AMS96]:
Computing F_k exactly requires Ω(m) space
Any deterministic algorithm outputting x with |F_k − x| < εF_k must use Ω(m) space
What about randomized algorithms?
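A minimal sketch of this trivial algorithm (illustrative Python; the talk does not prescribe an implementation):

```python
from collections import Counter

def exact_fk(stream, k):
    """Exact F_k: store/update f_i for each item i, sum f_i^k at the end.
    Space is O(m log q): one log(q)-bit counter per distinct item."""
    counts = Counter(stream)
    return sum(f ** k for f in counts.values())

# exact_fk([1, 1, 2, 3, 1], 2) == 3**2 + 1**2 + 1**2 == 11
```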

6 Randomized Approximation Algorithms for F_k
A randomized algorithm ε-approximates F_k if it outputs x s.t. Pr[|F_k − x| ≤ εF_k] > 2/3
Can ε-approximate F_0 [BJKST02], F_2 [AMS96], and F_k for k > 2 [CK04] in low space
(big-Oh notation suppresses polylog(1/ε, m, q) factors)
Ideas: hashing (O(1)-wise independence), sampling

7 Example: F_0 [BJKST02]
Idea: for a random function h: [m] -> [0,1] and distinct elements b_1, b_2, …, b_{F_0}, expect min_i h(b_i) ≈ 1/F_0
Algorithm:
Choose a 2-wise independent hash function h: [m] -> [m^3]
Maintain the t = Θ(1/ε²) distinct smallest values h(b_i)
Let v be the t-th smallest value
Output t·m^3/v as the estimate for F_0
To boost the success probability to 1 − δ, take the median of O(log 1/δ) copies
Space: O((log 1/δ)/ε²)
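A minimal single-copy sketch of this estimator (assumptions: a fixed Mersenne prime P stands in for a prime near m^3, the linear polynomial below is one standard 2-wise independent construction, and the median-of-O(log 1/δ) amplification is omitted):

```python
import random

P = (1 << 61) - 1  # fixed prime; plays the role of the hash range m^3

def make_pairwise_hash():
    """2-wise independent h: [m] -> [P], a random degree-1 polynomial mod P."""
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda x: (a * x + b) % P

def f0_estimate(stream, eps):
    t = max(1, round(1 / eps ** 2))   # t = Theta(1/eps^2)
    h = make_pairwise_hash()
    smallest = set()                  # the t smallest distinct hash values so far
    for item in stream:
        v = h(item)
        if v in smallest or (len(smallest) == t and v > max(smallest)):
            continue
        smallest.add(v)
        if len(smallest) > t:
            smallest.remove(max(smallest))
    if len(smallest) < t:             # fewer than t distinct hash values seen:
        return len(smallest)          # the count is (almost surely) exact
    return t * P // max(smallest)     # v = t-th smallest value; estimate t*P/v
```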

8 Example: F_2 [AMS99]
Algorithm:
Choose a 4-wise independent hash function h: [m] -> {-1, +1}
Maintain Z = Σ_{i in [m]} f_i · h(i)
Output Y = Z² as the estimate for F_2
Correctness: E[Z²] = F_2, and Chebyshev's inequality => O(1/ε²) space
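A minimal single-sketch version (illustrative assumptions: the sign hash below — a random degree-3 polynomial over a fixed prime with its low bit mapped to ±1 — is one standard way to get 4-wise independence up to negligible bias; the O(1/ε²)-copy averaging that Chebyshev's inequality calls for is noted but not shown):

```python
import random

P = (1 << 61) - 1  # fixed prime for the hash family

def make_sign_hash():
    """~4-wise independent h: [m] -> {-1, +1}: evaluate a random degree-3
    polynomial mod P and map its low bit to a sign (bias is negligible)."""
    coeffs = [random.randrange(P) for _ in range(4)]
    def h(x):
        v = 0
        for c in coeffs:            # Horner evaluation of the polynomial
            v = (v * x + c) % P
        return 1 if v & 1 else -1
    return h

def f2_sketch(stream):
    """One AMS sketch: maintain Z = sum_i f_i * h(i), return Y = Z^2.
    E[Y] = F_2; averaging O(1/eps^2) independent copies (and taking a
    median of averages for confidence) gives an eps-approximation."""
    h = make_sign_hash()
    z = sum(h(item) for item in stream)
    return z * z
```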

9 Previous Lower Bounds
[AMS96]: ∀ k, ε-approximating F_k => Ω(log m) space
[Bar-Yossef]: ε-approximating F_0 => Ω(1/ε) space
[IW03]: ε-approximating F_0 => Ω(ε^{-2}) space, provided ε is not too small
Questions: Does the Ω(ε^{-2}) bound hold for k ≠ 0? Does it hold for F_0 for smaller ε?

10 Our First Result
Optimal lower bound: ∀ k ≠ 1 and any ε = Ω(m^{-1/2}), ε-approximating F_k => Ω(ε^{-2}) bits of space
F_1 = q is trivial in log q space
F_k is trivial in O(m log q) space, so we need ε = Ω(m^{-1/2}) (for smaller ε, an Ω(ε^{-2}) bound would exceed the trivial algorithm's space)
Technique: reduction from a 2-party protocol for computing the Hamming distance Δ(x, y), using tools from communication complexity

11 Lower Bound Idea
Alice has x ∈ {0,1}^m, Bob has y ∈ {0,1}^m
Alice runs a (1 ± ε) F_k algorithm A on stream s(x) and sends the internal state S of A to Bob
Bob continues running A on stream s(y), computing (1 ± ε) F_k(s(x) ∘ s(y)) w.p. > 2/3
Idea: if Bob can thereby decide f(x, y) w.p. > 2/3, the space used by A is at least the randomized 1-way communication complexity of f (see the sketch below)
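A runnable toy version of this reduction (all names are illustrative; the exact-F_0 "algorithm" below is just a stand-in whose state happens to be large, whereas the argument applies to any (1 ± ε) F_k algorithm with a small state):

```python
def stream_of(x):
    """s(x): a stream listing each universe element i with x[i] == 1 once."""
    return [i for i, bit in enumerate(x) if bit]

class DistinctCounter:
    """Stand-in streaming algorithm (exact F_0, so its state is large; a
    real (1 +- eps) F_k algorithm would carry only a small state)."""
    def init(self):
        return set()
    def update(self, state, item):
        state.add(item)
        return state
    def output(self, state):
        return len(state)

def one_way_protocol(A, x, y):
    state = A.init()
    for item in stream_of(x):      # Alice runs A on s(x)
        state = A.update(state, item)
    # ... the state is the only message sent to Bob: |state| = S bits ...
    for item in stream_of(y):      # Bob resumes A on s(y)
        state = A.update(state, item)
    return A.output(state)         # estimate of F_k(s(x) . s(y)), from
                                   # which Bob decides f(x, y)

# one_way_protocol(DistinctCounter(), x, y) returns F_0(s(x) . s(y))
```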

12 Randomized 1-Way Communication Complexity
Boolean function f: X × Y → {0,1}
Alice has x ∈ X, Bob has y ∈ Y; Bob wants f(x, y)
Only one message m is sent, and it must go from Alice to Bob
Communication cost = max_{x,y} E_coins[|m|]
The δ-error randomized 1-way communication complexity R_δ(f) is the cost of the optimal protocol computing f with probability ≥ 1 − δ
OK, but how do we lower bound R_δ(f)?

13 Shatter Coefficients [KNR]
F = {f : X → {0,1}} a function family; each f ∈ F is a length-|X| bitstring
For S ⊆ X, the shatter coefficient SC(F_S) of S is |{f|_S : f ∈ F}| = # distinct bitstrings when F is restricted to S
SC(F, p) = max_{S ⊆ X, |S| = p} SC(F_S); if SC(F_S) = 2^{|S|}, S is shattered
Treat f: X × Y → {0,1} as the function family f_X = { f_x(y) : Y → {0,1} | x ∈ X }, where f_x(y) = f(x, y)
Theorem [BJKS]: for every f: X × Y → {0,1} and every integer p, R_{1/3}(f) = Ω(log SC(f_X, p))

14 Warmup: Ω(1/ε) Lower Bound [Bar-Yossef]
Alice's input: x ∈_R {0,1}^m with wt(x) = m/2
Bob's input: y ∈_R {0,1}^m with wt(y) = εm
s(x), s(y) are any streams with characteristic vectors x, y
PROMISE: either (1) wt(x ∧ y) = 0, so f(x,y) = 0 and F_0(s(x) ∘ s(y)) = m/2 + εm, OR (2) wt(x ∧ y) = εm, so f(x,y) = 1 and F_0(s(x) ∘ s(y)) = m/2
R_{1/3}(f) = Ω(1/ε) [Bar-Yossef] (uses shatter coefficients)
Since (1 + γ)·m/2 < (1 − γ)·(m/2 + εm) for γ = Θ(ε), a γ-approximation to F_0 separates the two cases
Hence deciding f reduces to approximating F_0, so any F_0 algorithm uses Ω(1/ε) space
Too easy! The F_0 algorithm can be replaced with a sampler!
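A quick consistency check of the two promise cases (illustrative arithmetic): F_0(s(x) ∘ s(y)) = wt(x) + wt(y) − wt(x ∧ y), so case (1) gives m/2 + εm − 0 = m/2 + εm and case (2) gives m/2 + εm − εm = m/2.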

15 Our Reduction: Hamming Distance Decision Problem (HDDP)
Lower bound R_{1/3}(f) via SC(f_X, t), but we need a lemma
Set t = Θ(1/ε²); Alice gets x ∈ {0,1}^t, Bob gets y ∈ {0,1}^t
Promise problem: either Δ(x, y) ≤ t/2 − Θ(t^{1/2}), in which case f(x,y) = 0, OR Δ(x, y) > t/2, in which case f(x,y) = 1

16 Main Lemma
∃ S ⊆ {0,1}^n with |S| = n such that there exist 2^{Ω(n)} good sets T ⊆ S, where T is good if ∃ y ∈ {0,1}^n such that:
1. ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2} for some constant c > 0
2. ∀ t ∈ S − T, Δ(y, t) > n/2

17 Lemma Resolves HDDP Complexity
Theorem: R_{1/3}(f) = Ω(t) = Ω(ε^{-2})
Proof: Alice gets y_T for a random good set T, applying the main lemma with n = t; Bob gets a random s ∈ S
Let f: {y_T}_T × S → {0,1}. Main Lemma => SC(f) = 2^{Ω(t)}; [BJKS] => R_{1/3}(f) = Ω(t) = Ω(ε^{-2})
Corollary: Ω(1/ε²) space for any randomized 2-party protocol to approximate Δ(x, y) between the inputs
First known lower bound in terms of ε!

18 Back to Frequency Moments
Use an ε-approximator for F_k to solve HDDP
Alice has y ∈ {0,1}^t, Bob has s ∈ S ⊆ {0,1}^t; Alice runs the F_k algorithm on stream a_y and sends its state, Bob continues it on stream a_s
The i-th universe element is included exactly once in stream a_y iff y_i = 1 (a_s likewise)

19 Solving HDDP with F_k
Alice/Bob compute an ε-approximation to F_k(a_y ∘ a_s)
F_k(a_y ∘ a_s) = 2^k · wt(y ∧ s) + 1^k · Δ(y, s)
Since wt(y ∧ s) = (wt(y) + wt(s) − Δ(y, s))/2, this equals 2^{k−1}(wt(y) + wt(s)) + (1 − 2^{k−1})·Δ(y, s); for k ≠ 1 the coefficient of Δ(y, s) is nonzero, so an approximation to F_k yields an approximation to Δ(y, s)
Conclusion: ε-approximating F_k(a_y ∘ a_s) decides HDDP, so the space for F_k is Ω(t) = Ω(ε^{-2})
Alice also transmits wt(y), in log m space
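The identity is easy to sanity-check (an illustrative check, not from the paper):

```python
from collections import Counter

def fk_concat(y, s, k):
    """F_k of a_y . a_s, where a_y contains element i exactly once iff y[i] == 1."""
    stream = [i for i, b in enumerate(y) if b] + [i for i, b in enumerate(s) if b]
    return sum(f ** k for f in Counter(stream).values())

def fk_formula(y, s, k):
    """2^k * wt(y AND s) + 1^k * Delta(y, s)."""
    wt_and = sum(a & b for a, b in zip(y, s))
    delta = sum(a ^ b for a, b in zip(y, s))
    return 2 ** k * wt_and + delta

# e.g. y = [1,0,1,1], s = [1,1,0,1], k = 2: both give 2**2 * 2 + 2 == 10
```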

20 Back to the Main Lemma
Recall: show ∃ S ⊆ {0,1}^n with |S| = n s.t. there are 2^{Ω(n)} good sets T ⊆ S, i.e., ∃ y ∈ {0,1}^n s.t.:
1. ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2} for some constant c > 0
2. ∀ t ∈ S − T, Δ(y, t) > n/2
Probabilistic method:
Choose n random elements of {0,1}^n for S
Show an arbitrary T ⊆ S of size n/2 is good with probability > 2^{-zn} for a constant z < 1
Expected # of good T is then 2^{Ω(n)}
So there exists an S with 2^{Ω(n)} good T

21 Proving the Main Lemma
Let T = {t_1, …, t_{n/2}} ⊆ S be arbitrary, and let y be the majority codeword of T
What is the probability p that both:
1. ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2} for some constant c > 0
2. ∀ t ∈ S − T, Δ(y, t) > n/2
Put x = Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2}]
Put y = Pr[∀ t ∈ S − T, Δ(y, t) > n/2] = 2^{-n/2}
Independence => p = xy = x · 2^{-n/2}

22 The Matrix Problem
WLOG assume y = 1^n (recall y is the majority word)
Want a lower bound on Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2}]
Equivalent matrix problem (stack t_1, …, t_{n/2} as the rows of a random binary matrix):
For a random n/2 × n binary matrix M, conditioned on each column having majority 1, what is the probability that each row has ≥ n/2 + cn^{1/2} ones?
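For small n, one can get a feel for this conditional probability by Monte Carlo (an illustrative experiment, not the proof; the constant c is a free parameter here). Sampling each column independently, conditioned on having majority 1, is valid because the columns are independent:

```python
import math, random

def majority_one_column(rows):
    """A uniform 0/1 column of length `rows`, conditioned on majority 1
    (per-column rejection sampling)."""
    while True:
        col = [random.randint(0, 1) for _ in range(rows)]
        if sum(col) > rows / 2:
            return col

def row_prob(n, c, trials=10000):
    """Estimate Pr[every row has >= n/2 + c*sqrt(n) ones | every column
    has majority 1] for a random (n/2) x n binary matrix."""
    rows, thresh = n // 2, n / 2 + c * math.sqrt(n)
    hits = 0
    for _ in range(trials):
        cols = [majority_one_column(rows) for _ in range(n)]
        hits += all(sum(col[r] for col in cols) >= thresh for r in range(rows))
    return hits / trials
```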

23 A First Attempt
A set family A ⊆ 2^{{0,1}^n} is monotone increasing if S_1 ∈ A and S_1 ⊆ S_2 => S_2 ∈ A
[Kleitman]: for the uniform distribution on S ⊆ {0,1}^n and monotone increasing families A, B: Pr[A ∩ B] ≥ Pr[A] · Pr[B]
First try: let R be the event that M has ≥ n/2 + cn^{1/2} ones in each row, and C the event that M has majority 1 in each column
Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2}] = Pr[R | C] = Pr[R ∩ C]/Pr[C]
M is the characteristic vector of a subset of [n²/2] => R, C monotone increasing => Pr[R ∩ C]/Pr[C] ≥ Pr[R]·Pr[C]/Pr[C] = Pr[R] < 2^{-n/2}
But we need > 2^{-zn/2} for a constant z < 1, so this fails…

24 A Second Attempt
Second try:
R_1: M has ≥ n/2 + cn^{1/2} ones in each of the first m rows
R_2: M has ≥ n/2 + cn^{1/2} ones in each of the remaining n/2 − m rows
C: M has majority 1 in each column
Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2}] = Pr[R_1 ∩ R_2 | C] = Pr[R_1 ∩ R_2 ∩ C]/Pr[C]
R_1, R_2, C monotone increasing => Pr[R_1 ∩ R_2 ∩ C]/Pr[C] ≥ Pr[R_1 ∩ C]·Pr[R_2]/Pr[C] = Pr[R_1 | C]·Pr[R_2]
Want this at least 2^{-zn/2} for z < 1
For a single row, Pr[# ones > n/2 + cn^{1/2}] > 1/2 − c(2/π)^{1/2} [Stirling]
Independence => Pr[R_2] > (1/2 − c(2/π)^{1/2})^{n/2 − m}
Remains to show Pr[R_1 | C] is large

25 Computing Pr[R_1 | C]
Pr[R_1 | C] = Pr[M has ≥ n/2 + cn^{1/2} ones in the first m rows | C]
Show Pr[R_1 | C] > 2^{-zm} for a certain constant z < 1
Ingredients:
Conditioned on C, expect n/2 + Θ(n^{1/2}) ones in each of the first m rows
Use negative correlation of the entries in a given row => a given row has n/2 + Θ(n^{1/2}) ones with good probability, for small enough c
A simple worst-case conditioning argument on these first m rows shows they all have ≥ n/2 + cn^{1/2} ones with the claimed probability

26 Completing the Proof
Recall: what is the probability p = xy, where
1. x = Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2}]
2. y = Pr[∀ t ∈ S − T, Δ(y, t) > n/2] = 2^{-n/2}
3. R_1: M has ≥ n/2 + cn^{1/2} ones in the first m rows
4. R_2: M has ≥ n/2 + cn^{1/2} ones in the remaining n/2 − m rows
5. C: M has majority 1 in each column
x ≥ Pr[R_1 | C]·Pr[R_2] ≥ 2^{-zm}·(1/2 − c(2/π)^{1/2})^{n/2 − m}
Analysis shows z is small enough that this is ≥ 2^{-z′n/2} for a constant z′ < 1
Hence p = xy ≥ 2^{-(z′+1)n/2}
Hence the expected # of good sets is 2^{n − O(log n)}·p = 2^{Ω(n)}
So there exists an S with 2^{Ω(n)} good T

27 Bipartite Graphs
The matrix problem is a bipartite graph counting problem: how many bipartite graphs on n/2 + n vertices have every left vertex of degree > n/2 + cn^{1/2} and every right vertex of degree > n/2?

28 Our Result on # of Bipartite Graphs
Bipartite graph count: the argument shows there are at least 2^{n²/2 − zn/2 − n} such bipartite graphs for a constant z < 1
The main lemma shows the # of bipartite graphs on n + n vertices with each vertex of degree > n/2 is > 2^{n² − zn − n} (the > in the degree condition can also be replaced with <)
Previous known count: 2^{n² − 2n} [MW, personal communication], which follows easily from the Kleitman inequality
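For very small n, this count can be checked by brute force (illustrative only; the 2^{n² − zn − n} bound is asymptotic, so tiny n says nothing about the constant z):

```python
from itertools import product

def count_dense_bipartite(n):
    """Count bipartite graphs on n + n vertices, given as n x n 0/1
    adjacency matrices, in which every vertex has degree > n/2.
    Brute force over all 2^(n^2) matrices, so only feasible for tiny n."""
    count = 0
    for bits in product((0, 1), repeat=n * n):
        M = [bits[i * n:(i + 1) * n] for i in range(n)]
        rows_ok = all(sum(row) > n / 2 for row in M)
        cols_ok = all(sum(M[i][j] for i in range(n)) > n / 2 for j in range(n))
        count += rows_ok and cols_ok
    return count

# count_dense_bipartite(2) == 1: every degree must exceed 1, forcing the
# all-ones 2 x 2 adjacency matrix.
```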

29 Summary
Results:
Optimal F_k lower bound: ∀ k ≠ 1 and any ε = Ω(m^{-1/2}), any ε-approximator for F_k must use Ω(ε^{-2}) bits of space
Communication lower bound of Ω(ε^{-2}) on the one-way communication complexity of (ε, δ)-approximating Δ(x, y)
Bipartite graph count: the # of bipartite graphs on n + n vertices with each vertex of degree > n/2 is at least 2^{n² − zn − n} for a constant z < 1

