Published by Jesse Chase. Modified over 11 years ago.
1
Tight Lower Bounds for the Distinct Elements Problem
David Woodruff, MIT (dpwood@mit.edu)
Joint work with Piotr Indyk
2
The Problem
Stream of elements $a_1, \ldots, a_n$, each in $\{1, \ldots, m\}$
Want $F_0$ = number of distinct elements
Elements arrive in adversarial order
Algorithms are given one pass over the stream
Goal: minimum-space algorithm
Example stream: 0 1 1 3 7 3 4 …
3
A Trivial Algorithm
Example stream: … 0 1 1 3 7 3 4
Keep the $m$-bit characteristic vector $v$ of the stream: $j$ in stream $\Leftrightarrow v_j = 1$
$v$ starts as 00000000 and ends as 10011011
$F_0 = \mathrm{wt}(10011011) = 5$
Space = $m$ bits
Can we do better?
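The trivial algorithm can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `distinct_elements_exact` is made up here, and the universe is taken to be `{0, ..., m-1}` so that the slide's example stream (which contains a 0) fits in an 8-bit vector.

```python
def distinct_elements_exact(stream, m):
    """Count distinct elements of a stream over {0, ..., m-1} using an m-bit vector."""
    v = [0] * m          # characteristic vector: one bit per universe element
    for a in stream:     # single pass over the stream
        v[a] = 1         # j in stream <=> v[j] = 1
    return sum(v)        # F_0 = wt(v)


# The slide's example stream 0 1 1 3 7 3 4 with m = 8.
f0 = distinct_elements_exact([0, 1, 1, 3, 7, 3, 4], m=8)
print(f0)  # 5
```

The space is exactly `m` bits regardless of the stream length, which is the baseline the rest of the talk tries to beat.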
4
Negative Results
Any algorithm computing $F_0$ exactly must use $\Omega(m)$ space [AMS96]
Any deterministic algorithm that outputs $x$ with $|F_0 - x| < \epsilon F_0$ must use $\Omega(m)$ space [AMS96]
What about randomized approximation algorithms?
5
Rand. Approx. Algorithms for $F_0$
$O(\log\log m/\epsilon^2 + \log m \log 1/\epsilon)$-space algorithm outputs $x$ with $\Pr[|F_0 - x| < \epsilon F_0] > 3/4$ [BJKST02]
Lots of hashing tricks
Is this optimal?
Previous lower bounds: $\Omega(\log m)$ [AMS96], $\Omega(1/\epsilon)$ [Bar-Yossef]
Open Problem of [BJKST02]: GAP: $1/\epsilon \ll 1/\epsilon^2$
6
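Idea Behind Lower Bounds
Alice has $x \in \{0,1\}^m$ and forms stream $s(x)$; Bob has $y \in \{0,1\}^m$ and forms stream $s(y)$
Alice runs a $(1 \pm \epsilon)$ $F_0$ algorithm $A$ on $s(x)$ and sends the internal state $S$ of $A$ to Bob
Bob continues running $A$ from state $S$ on $s(y)$ to compute a $(1 \pm \epsilon)$-approximation of $F_0(s(x) \circ s(y))$ w.p. > 3/4
Idea: if this lets Bob decide $f(x,y)$ w.p. > 3/4, the space used by $A$ is at least $f$'s randomized 1-way communication complexity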
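The state-passing argument can be made concrete with a toy simulation. Everything named below (`ExactF0`, `alice`, `bob`) is hypothetical and chosen for illustration; an exact set-based counter stands in for the approximation algorithm $A$, since the only point is that Alice's entire message is the algorithm's serialized internal state.

```python
import json


class ExactF0:
    """Stand-in one-pass algorithm (hypothetical): exact F_0 via a set of seen elements."""

    def __init__(self, seen=()):
        self.seen = set(seen)

    def update(self, a):
        self.seen.add(a)

    def estimate(self):
        return len(self.seen)

    def state(self):
        # Serialize the internal state; its length is the communication cost.
        return json.dumps(sorted(self.seen)).encode()

    @classmethod
    def from_state(cls, blob):
        return cls(json.loads(blob.decode()))


def alice(stream_x):
    alg = ExactF0()
    for a in stream_x:
        alg.update(a)
    return alg.state()                  # the single message from Alice to Bob


def bob(message, stream_y):
    alg = ExactF0.from_state(message)   # resume from Alice's state
    for a in stream_y:
        alg.update(a)
    return alg.estimate()               # F_0 of the concatenated stream


message = alice([0, 1, 1, 3])
answer = bob(message, [7, 3, 4])
```

Any lower bound on the message length needed to decide $f$ therefore transfers to the size of the state, that is, to the space of the streaming algorithm.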
7
Randomized 1-way comm. complexity
Boolean function $f: X \times Y \to \{0,1\}$
Alice has $x \in X$, Bob has $y \in Y$. Bob wants $f(x,y)$
Only 1 message is sent: it must be from Alice to Bob
Comm. cost of a protocol = expected length of the longest message sent over all inputs
The $\delta$-error randomized 1-way comm. complexity of $f$, $R_\delta(f)$, is the comm. cost of the optimal protocol computing $f$ w.p. $\ge 1 - \delta$
How do we lower bound $R_\delta(f)$?
8
The VC Dimension [KNR]
$F = \{f : X \to \{0,1\}\}$: family of Boolean functions
Each $f \in F$ is a length-$|X|$ bit string
For $S \subseteq X$, the shatter coefficient $SC(f_S)$ of $S$ is $|\{f|_S : f \in F\}|$ = number of distinct bit strings when $F$ is restricted to $S$
$SC(F, p) = \max_{S \subseteq X, |S| = p} SC(f_S)$
If $SC(f_S) = 2^{|S|}$, $S$ is shattered by $F$
VC Dimension of $F$, $\mathrm{VCD}(F)$, = size of the largest $S$ shattered by $F$
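For small families these definitions can be checked by brute force. The sketch below is illustrative only: the function names are made up here, each $f$ is represented as a tuple of bits over $X = \{0, \ldots, |X|-1\}$, and the threshold family is just a convenient example.

```python
from itertools import combinations


def shatter_coefficient(family, S):
    """Number of distinct restrictions f|_S over all f in the family."""
    return len({tuple(f[i] for i in S) for f in family})


def sc(family, p):
    """SC(F, p): maximum shatter coefficient over all subsets S of size p."""
    n = len(family[0])
    return max(shatter_coefficient(family, S) for S in combinations(range(n), p))


def vc_dimension(family):
    """Size of the largest shattered set (brute force, exponential in |X|)."""
    n = len(family[0])
    best = 0
    for p in range(1, n + 1):
        if sc(family, p) == 2 ** p:   # some S of size p is shattered
            best = p
    return best


# Threshold functions on 4 points: f_k(i) = 1 iff i >= k.
thresholds = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
```

Thresholds shatter any single point but no pair, because the pattern (1, 0) never appears on two ordered points.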
9
Shatter Coefficient Theorem
Notation: for $f: X \times Y \to \{0,1\}$, define $f_X = \{f_x : Y \to \{0,1\} \mid x \in X\}$, where $f_x(y) = f(x,y)$
Theorem [BJKS]: for every $f: X \times Y \to \{0,1\}$ and every $p \ge \mathrm{VCD}(f_X)$, $R_{1/4}(f) = \Omega(\log(SC(f_X, p)))$
10
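The $\Omega(1/\epsilon)$ Lower Bound [Bar-Yossef]
Alice has $x \in_R \{0,1\}^m$, $\mathrm{wt}(x) = m/2$
Bob has $y \in_R \{0,1\}^m$, $\mathrm{wt}(y) = \epsilon m$, and:
either $\mathrm{wt}(x \wedge y) = 0$, so $f(x,y) = 0$, OR $\mathrm{wt}(x \wedge y) = \epsilon m$, so $f(x,y) = 1$
$R_{1/4}(f) = \Omega(\mathrm{VCD}(f_X)) = \Omega(1/\epsilon)$ [Bar-Yossef]
$s(x)$, $s(y)$: any streams with characteristic vectors $x$, $y$
$f(x,y) = 1 \to F_0(s(x) \circ s(y)) = m/2$
$f(x,y) = 0 \to F_0(s(x) \circ s(y)) = m/2 + \epsilon m$
$(1+\epsilon')m/2 < (1-\epsilon')(m/2 + \epsilon m)$ for $\epsilon' = \Theta(\epsilon)$
Hence a $(1 \pm \epsilon')$ $F_0$ algorithm can decide $f$, so it uses $\Omega(1/\epsilon)$ space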
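The separation condition on this slide is easy to check numerically. In the sketch below, `eps` is the relative weight of Bob's vector and `eps_prime` is the approximation parameter; both names, and the function itself, are assumptions made here for illustration. Rearranging the inequality gives the exact threshold `eps_prime < eps / (1 + eps)`.

```python
def gap_is_detectable(m, eps, eps_prime):
    """True if a (1 +/- eps_prime) approximation separates the two F_0 values."""
    low = m / 2                  # F_0 when f(x,y) = 1 (y's support lies inside x's)
    high = m / 2 + eps * m       # F_0 when f(x,y) = 0 (supports are disjoint)
    # The largest estimate of `low` must stay below the smallest estimate of `high`.
    return (1 + eps_prime) * low < (1 - eps_prime) * high


ok = gap_is_detectable(m=1000, eps=0.1, eps_prime=0.05)      # 525 < 570
too_coarse = gap_is_detectable(m=1000, eps=0.1, eps_prime=0.1)  # 550 < 540 fails
```

So an approximation parameter within a constant factor of `eps` suffices, which is where the $\Omega(1/\epsilon)$ space bound comes from.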
11
Our Results
Remainder of talk: $\Omega(1/\epsilon^2)$ lower bound for $\epsilon = \Omega(m^{-1/(9+k)})$ for any $k > 0$
$\Rightarrow$ the $O(\log\log m/\epsilon^2 + \log m \log 1/\epsilon)$ upper bound is almost optimal
IDEA: reduce from a protocol for computing the dot product
12
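The Promise Problem
$t = \Theta(1/\epsilon^2)$, $Y$ = basis of unit vectors of $\mathbb{R}^t$
Alice has $x \in [0,1]^t$ with $\|x\| = 1$; Bob has $y \in Y$
Promise Problem $\Pi$: $\langle x,y \rangle = 0$, so $f(x,y) = 0$, OR $\langle x,y \rangle = 2/t^{1/2}$, so $f(x,y) = 1$
$X = \{x \in [0,1]^t : \|x\| = 1 \text{ and } \exists y \in Y \text{ s.t. } (x,y) \in \Pi\}$
We lower bound $R_{1/4}(f)$ via $SC(f_X, t)$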
13
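Bounding $SC(f_X, t)$
Theorem: $SC(f_X, t/4) = 2^{\Omega(t)}$
Proof:
1. $\forall T \subset Y$ s.t. $|T| = t/4$, put $x_T = (2/t^{1/2}) \cdot \sum_{e \in T} e$
2. Define $X_1 \subset X$ as $X_1 = \{x_T \mid T \subset Y, |T| = t/4\}$
3. Claim: $\forall s \in \{0,1\}^t$ with $\mathrm{wt}(s) = t/4$, $s$ is in the truth table of $f_{X_1}$
4. Proof:
   1. Let $s \in \{0,1\}^t$ with 1s in positions $i_1, \ldots, i_{t/4}$
   2. Put $T = \{e_{i_1}, \ldots, e_{i_{t/4}}\}$. $\forall e \in T$, $\langle e, x_T \rangle = 2/t^{1/2} = 2\epsilon$
   3. $\forall e \in Y - T$, $\langle e, x_T \rangle = 0$
5. There are $2^{\Omega(t)}$ such $s$.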
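The construction in the proof can be sanity-checked for a small $t$. This is an illustrative sketch: the helper `x_T` and the choice $t = 16$ are assumptions made here, and basis vectors are identified with their indices, so $\langle e_i, x_T \rangle$ is just coordinate $i$ of $x_T$.

```python
from math import comb, isclose, sqrt


def x_T(T, t):
    """The vector (2 / sqrt(t)) * (sum of standard basis vectors e_i for i in T)."""
    return [2 / sqrt(t) if i in T else 0.0 for i in range(t)]


t = 16
T = {0, 3, 5, 10}                          # a subset of the basis with |T| = t/4
x = x_T(T, t)

norm = sqrt(sum(c * c for c in x))         # should be 1, so x_T is a valid input for Alice
truth_row = [1 if x[i] > 0 else 0 for i in range(t)]   # f(x_T, e_i) for every basis vector
num_rows = comb(t, t // 4)                 # number of weight-t/4 strings s, one per choice of T
```

Each choice of `T` yields a different row of the truth table, and the number of choices, $\binom{t}{t/4}$, grows as $2^{\Omega(t)}$.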
14
Bounding $R_{1/4}(f)$
Corollary: $R_{1/4}(f) = \Omega(\log SC(f_X, t/4)) = \Omega(t) = \Omega(1/\epsilon^2)$
Reduction: we need a protocol computing $f$ with communication = space used by any $(1 \pm \epsilon)$ $F_0$ approximation algorithm
15
Reduction
Recall: $\langle x,y \rangle = 0$ if $f(x,y) = 0$; $\langle x,y \rangle = 2/t^{1/2}$ if $f(x,y) = 1$
Goal: reduce the separation of $\langle x,y \rangle$ to a separation of $F_0(s(x) \circ s(y))$ for streams $s(x)$, $s(y)$ that Alice and Bob can derive from $x$, $y$
Use the relation: $\|y-x\|^2 = \|y\|^2 + \|x\|^2 - 2\langle x, y \rangle$
$f(x,y) = 0 \to \|y-x\| = 2^{1/2}$
$f(x,y) = 1 \to \|y-x\| < 2^{1/2}(1 - 1/t^{1/2}) = 2^{1/2}(1 - \Theta(\epsilon))$
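For completeness, the two distances can be derived step by step. The only ingredients are $\|x\| = \|y\| = 1$ and the elementary inequality $(1-a)^{1/2} < 1 - a/2$ for $0 < a \le 1$, applied with $a = 2/t^{1/2}$ (so $t \ge 4$ is assumed here).

```latex
\begin{align*}
\|y - x\|^2 &= \|y\|^2 + \|x\|^2 - 2\langle x, y\rangle = 2 - 2\langle x, y\rangle \\
f(x,y) = 0:\quad \|y - x\|^2 &= 2
  \;\Rightarrow\; \|y - x\| = 2^{1/2} \\
f(x,y) = 1:\quad \|y - x\|^2 &= 2 - \frac{4}{t^{1/2}} = 2\Bigl(1 - \frac{2}{t^{1/2}}\Bigr) \\
  \Rightarrow\; \|y - x\| &= 2^{1/2}\Bigl(1 - \frac{2}{t^{1/2}}\Bigr)^{1/2}
  < 2^{1/2}\Bigl(1 - \frac{1}{t^{1/2}}\Bigr)
\end{align*}
```

With $t = \Theta(1/\epsilon^2)$, the term $1/t^{1/2}$ is $\Theta(\epsilon)$, which gives the stated $2^{1/2}(1 - \Theta(\epsilon))$.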
16
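Overview of Reduction
Alice has $x \in [0,1]^t$ with $\|x\| = 1$; Bob has $y \in E$
1. Low-distortion embedding $\phi: l_2^t \to l_1^{\mathrm{poly}(t)}$
2. Rational approximation $\tilde\phi(x)$, $\tilde\phi(y)$ of $\phi(x)$, $\phi(y)$
3. Scale rationals to integers via $s$
4. Convert integer coordinates to unary to get $\{0,1\}$ vectors $x$, $y$
Alice runs the $F_0$ algorithm on $s(x)$ and sends its state to Bob, who continues it on $s(y)$
$F_0(s(x) \circ s(y))$ can decide $f(x,y)$ w.p. $\ge 3/4$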
17
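Embedding $l_2^t$ into $l_1^{\mathrm{poly}(t)}$
A $(1+\gamma)$-distortion embedding $\phi: l_2^t \to l_1^d$ is a mapping s.t. $\forall p,q \in l_2^t$, $\|p-q\|_2 \le \|\phi(p)-\phi(q)\|_1 \le (1+\gamma)\|p-q\|_2$
Theorem [FLM77]: $\forall \gamma > 0$ $\exists$ a $(1+\gamma)$-distortion embedding $\phi: l_2^t \to l_1^d$ with $d = O(t \cdot (\log 1/\gamma)/\gamma^2)$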
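To get a feel for why $l_2$ distances can be approximately preserved in $l_1$, here is a sketch using a random Gaussian projection. This is a stand-in technique, not the [FLM77] construction: it only preserves the distance of a fixed pair with high probability rather than all pairs simultaneously, and the function name and parameters are assumptions made for this illustration. It relies on the fact that for a standard Gaussian vector $g$, $E|\langle g, v \rangle| = (2/\pi)^{1/2}\|v\|_2$.

```python
import math
import random


def gaussian_l2_to_l1(vectors, d, seed=0):
    """Map each vector through the same random d x t Gaussian matrix, scaled so
    that the l_1 norm of the image approximates the l_2 norm of the input."""
    rng = random.Random(seed)
    t = len(vectors[0])
    G = [[rng.gauss(0.0, 1.0) for _ in range(t)] for _ in range(d)]
    scale = math.sqrt(math.pi / 2) / d     # undoes E|<g, v>| = sqrt(2/pi) * ||v||_2
    return [[scale * sum(g * c for g, c in zip(row, v)) for row in G]
            for v in vectors]


def l1(u):
    return sum(abs(c) for c in u)


def l2(u):
    return math.sqrt(sum(c * c for c in u))


p = [0.5, 0.5, 0.5, 0.5]
q = [1.0, 0.0, 0.0, 0.0]
phi_p, phi_q = gaussian_l2_to_l1([p, q], d=2000)

# Ratio of the embedded l_1 distance to the original l_2 distance; close to 1.
ratio = l1([a - b for a, b in zip(phi_p, phi_q)]) / l2([a - b for a, b in zip(p, q)])
```

Increasing `d` tightens the concentration of `ratio` around 1, which mirrors the dependence of $d$ on the distortion in the theorem.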
18
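Embedding $l_2^t$ into $l_1^d$
Alice has $x \in [0,1]^t$ with $\|x\| = 1$; Bob has $y \in E$
Low-distortion embedding $\phi: l_2^t \to l_1^d$ yields $\phi(x)$, $\phi(y)$
Using Theorem [FLM77], Alice and Bob get $\phi(x), \phi(y) \in \mathbb{R}^d$ with $d = O(t \cdot (\log 1/\gamma)/\gamma^2)$; $\gamma$ is specified later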
19
Rational Approximation
$z = z(t): \mathbb{N} \to \mathbb{N}$; assume $z \ge d$
Approximate each coordinate of the embedding's output by an integer multiple of $1/z$
20
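Scaling
Alice (resp. Bob) multiplies each coordinate of $\tilde\phi(x)$ (resp. $\tilde\phi(y)$) by $z$
Obtains $s(\tilde\phi(x))$ (resp. $s(\tilde\phi(y))$)
Claim: the coordinates are integers in the range $[-2z, 2z]$
Proof:
1. $|\tilde\phi(\cdot)| \le |\phi(\cdot)| + d/z \le 2$
2. $|s(\tilde\phi(\cdot))| = z|\tilde\phi(\cdot)|$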
21
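Converting to Unary
For $i = 1$ to $d$:
  $j \leftarrow s(\tilde\phi(x))_i$
  Replace $s(\tilde\phi(x))_i$ with $1^{2z+j} 0^{2z-j}$
Bob does the same for $s(\tilde\phi(y))$
$x$, $y$ now denote the new length-$4dz$ bit strings
$\mathrm{wt}(x) = |s(\tilde\phi(x))|$, $\mathrm{wt}(y) = |s(\tilde\phi(y))|$
$\Delta(x,y) = |s(\tilde\phi(x)) - s(\tilde\phi(y))|$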
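The rounding, scaling, and unary steps can be combined into one short routine. The sketch below is an illustration under assumptions: the names `to_unary` and `hamming` are made up here, rounding to the nearest multiple of $1/z$ is one possible choice of rational approximation, and the inputs are small hand-picked vectors rather than real embedding outputs.

```python
def to_unary(coords, z):
    """Round each coordinate to a multiple of 1/z, scale by z, and write the
    resulting integer j as the block 1^(2z+j) 0^(2z-j)."""
    bits = []
    for c in coords:
        j = round(c * z)                   # integer numerator of the nearest multiple of 1/z
        assert -2 * z <= j <= 2 * z        # the claimed coordinate range
        bits.extend([1] * (2 * z + j) + [0] * (2 * z - j))
    return bits


def hamming(u, v):
    """Number of positions where two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(u, v))


z = 4
u = to_unary([0.5, -0.25], z)   # scaled integers: 2, -1
v = to_unary([-0.5, 0.75], z)   # scaled integers: -2, 3
dist = hamming(u, v)            # |2 - (-2)| + |-1 - 3| = 8
```

Because each block is a run of ones followed by zeros, two blocks differ in exactly as many positions as their integers differ, so Hamming distance equals the $l_1$ distance of the scaled vectors.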
22
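Reducing $\Delta(x,y)$ to $F_0$
Alice (Bob) chooses a stream $a_x$ ($a_y$) with characteristic vector $x$ ($y$).
Lemma: if $\kappa_1 < \mathrm{wt}(x), \mathrm{wt}(y) < \kappa_2$, then:
$\kappa_1 + \Delta(x,y)/2 < F_0(a_x \circ a_y) < \kappa_2 + \Delta(x,y)/2$
Follows from the fact: $F_0(a_x \circ a_y) = \mathrm{wt}(x \vee y) = (\mathrm{wt}(x) + \mathrm{wt}(y) + \Delta(x,y))/2$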
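The identity behind the lemma can be verified on a small example. The function name and the two bit strings below are hypothetical, picked only to illustrate that the number of distinct elements of the concatenated stream equals the weight of the bitwise OR.

```python
def f0_of_concatenation(x_bits, y_bits):
    """F_0 of a_x followed by a_y, where a_x (a_y) lists the positions of the 1s of x (y)."""
    a_x = [i for i, b in enumerate(x_bits) if b]   # a stream with characteristic vector x
    a_y = [i for i, b in enumerate(y_bits) if b]   # a stream with characteristic vector y
    return len(set(a_x + a_y))


x_bits = [1, 1, 0, 0, 1, 0]
y_bits = [0, 1, 1, 0, 0, 0]

f0 = f0_of_concatenation(x_bits, y_bits)
wt_or = sum(a | b for a, b in zip(x_bits, y_bits))       # wt(x OR y)
delta = sum(a != b for a, b in zip(x_bits, y_bits))      # Hamming distance
half_sum = (sum(x_bits) + sum(y_bits) + delta) / 2       # (wt(x) + wt(y) + Delta) / 2
```

Since both weights lie strictly between the two bounds of the lemma, so does their average, and adding $\Delta(x,y)/2$ gives the two-sided bound on $F_0$.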
23
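Reducing $\Delta(x,y)$ to $F_0$
Use the lemma to bound $F_0(a_x \circ a_y)$ in the two cases $f(x,y) = 0$ and $f(x,y) = 1$
Set $\gamma = \Theta(\epsilon)$, $z = \Theta(1/\epsilon^5 \log 1/\epsilon)$ so that the two cases are distinguished by a $(1 \pm \Theta(\epsilon))$ $F_0$ algorithm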
24
Conclusions
$a_x$, $a_y$ must be in a universe of size $\ge 4zd = \Theta(\log(1/\epsilon)/\epsilon^9)$
The reduction is only valid if $4zd \le m$
$\Omega(1/\epsilon^2)$ bound for $\epsilon = \Omega(m^{-1/(9+k)})$, $\forall k > 0$
Recently the lower bound was improved to $\Omega(1/\epsilon^2)$ for $\epsilon \ge m^{-1/2}$, which is optimal
Find the set of vectors directly in Hamming space via an involved probabilistic method argument