Lower Bounds for Read/Write Streams Paul Beame Joint work with Trinh Huynh (Dang-Trinh Huynh-Ngoc) University of Washington.


1 Lower Bounds for Read/Write Streams Paul Beame Joint work with Trinh Huynh (Dang-Trinh Huynh-Ngoc) University of Washington

2 Data stream algorithms
Many huge successes
– No need to remind people at this workshop!
Some problems provably hard
– E.g. frequency moments $F_k$, $k > 2$, require space $\Omega(n^{1-2/k})$ [Bar-Yossef-Jayram-Kumar-Sivakumar 02], [Chakrabarti-Khot-Sun 03]

3 Beyond Data Streams
Disk storage can be huge
– Can stream data to/from disks in real time
Sequential access hides latency
– Motivates multipass streams
– Analyzed by similar methods to single pass
Why stop at a single copy?
– Working with more than one copy at once may make computations easier
Why stream the data onto disks exactly as read?
– Can make modifications to data while writing

4 Read/write streams model
Disks → read/write streams
– Key parameters: space, #passes = reversals
– Assume #streams is constant
Introduced by [Grohe-Schweikardt 05]
[Figure: bit streams on disks, read and written through a small memory]

5 Read/write streams model
Much more powerful than data-stream model
– Sort with O(log n) passes, O(log n) space, 3 streams (MergeSort)
– Exactly compute any frequency moment
  Data-stream requires passes × space = Ω(n)
– Θ(log n) passes, O(1) space gives all of LOGSPACE [Hernich-Schweikardt 08]
What can be computed in o(log n) passes + small space?

6 Previous lower bounds for R/W streams
In o(log n) passes need $\Omega(n^{1-\varepsilon})$ space to
– Sort n numbers [Grohe-Schweikardt 05]
– Test set-equality A=B, multiset equality, XQuery, XPath [Grohe-Hernich-Schweikardt 06]
Same lower bounds apply for randomized algorithms with one-sided error [Grohe-Hernich-Schweikardt 06]

7 Previous lower bounds for R/W streams
Lower bounds for general randomness and two-sided error:
– In o(log n / log log n) passes, need $\Omega(n^{1-\varepsilon})$ space to:
  Approximate $F_\infty^*$ within factor 2
  Find Empty-Join, XQuery/XPath-Filtering etc. [B-Jayram-Rudra 07]
What about approximating frequency moments $F_k$ for $k > 2$?

8 Our Main Result
Theorem: Any randomized R/W-stream algorithm using o(log n) passes needs $\Omega(n^{1-4/k-\varepsilon})$ space to 2-approximate $F_k$
– Implies polynomial space for k > 4
– Compare with: $\Theta(n^{1-2/k})$ on data streams
R/W streams with o(log n) passes don’t help much for approximating frequency moments.

9 Methods

10 [Alon-Matias-Szegedy 96] approach to lower bounding $F_k$ in data streams
1. Reduce testing t-party set-disjointness to $F_k$. Easy!
2. Simulate any data-stream algorithm by a multi-party number-in-hand communication game. Trivial!
3. Apply Ω(n/t) communication lower bound on t-party set-disjointness [AMS 96, Saks-Sun 02, Bar-Yossef-Jayram-Kumar-Sivakumar 02, Chakrabarti-Khot-Sun 03, Grönemeier 09] (tight!)
Fails for R/W streams! Solved easily by R/W streams! Cannot be applied to R/W streams!

11 Promise Set-Disjointness (DISJ)
$\mathrm{DISJ}_{n,t}(x_1,\ldots,x_t)$ =
– 0 if $x_1,\ldots,x_t$ are pairwise disjoint
– 1 if $\exists a$ s.t. $a \in x_i$ for every i
– Undefined otherwise
$x_1$ = 010100101000100
$x_2$ = 100010001000001
$x_3$ = 001000011010000
$x_4$ = 000001001001000
$x_5$ = 000000001000010
t-party NIH communication: Ω(n/t)
Approximating $F_k$ ⇒ testing $\mathrm{DISJ}_{n,t}$ for $t \approx n^{1/k}$
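The definition can be checked directly on the five example strings from this slide. The sketch below is illustrative only: the function names `disj`, `to_stream` and `frequency_moment` are hypothetical, and the link to $F_k$ follows the standard reduction in the style of [Alon-Matias-Szegedy 96] (feed every element of every set into one stream), which the slide states but does not spell out.

```python
from collections import Counter
from itertools import combinations


def disj(sets):
    """Promise set-disjointness on 0/1 strings of equal length.

    Returns 0 if the sets are pairwise disjoint, 1 if some element lies in
    every set, and None when the promise is violated (undefined).
    """
    members = [{i for i, bit in enumerate(x) if bit == "1"} for x in sets]
    if all(a.isdisjoint(b) for a, b in combinations(members, 2)):
        return 0
    if set.intersection(*members):
        return 1
    return None


def to_stream(sets):
    """Assumed reduction: list the elements of x_1, then x_2, and so on."""
    return [i for x in sets for i, bit in enumerate(x) if bit == "1"]


def frequency_moment(stream, k):
    """F_k = sum over distinct items of (number of occurrences) ** k."""
    return sum(count ** k for count in Counter(stream).values())


xs = [
    "010100101000100",
    "100010001000001",
    "001000011010000",
    "000001001001000",
    "000000001000010",
]
answer = disj(xs)                               # common element at position 9
moment = frequency_moment(to_stream(xs), 3)     # one item occurs t = 5 times
```

With pairwise disjoint sets every item occurs once, so $F_k \le n$; a common element contributes $t^k$, which is why t is taken near $n^{1/k}$.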

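12 R/W streams easily solve $\mathrm{DISJ}_{n,t}$
Testing $\mathrm{DISJ}_{n,t}$ with 2 streams, 3 passes, O(log n) space
Input: $x_1,x_2,\ldots,x_t \in \{0,1\}^n$
[Figure: the inputs laid out in the order $x_1, x_2, \ldots, x_{t-1}, x_t$ and in the reverse order $x_t, x_{t-1}, \ldots, x_2, x_1$]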

13 How to prove lower bounds in R/W streams?
Lower bounds [GS05], [GHS05], [BJR07] for R/W streams don’t use [AMS96] outline
– Introduce permuted 2-party versions of problems
– Employ ad-hoc combinatorial arguments
We take a more general approach related to [AMS96], directly using NIH comm. complexity

14 Our approach to lower bound $F_k$
R/W streams algorithm for t-party-permuted-DISJ on input size n
→ Number-in-hand communication protocol for t-party-DISJ on input size ≈ $n/t^2$

15 [Alon, Matias, Szegedy 96]’s approach to lower bound $F_k$ in data streams
1. Reduce testing t-party set-disjointness to $F_k$. Easy!
2. Simulate data-stream algorithms by multi-party number-in-hand communication game
3. Apply communication lower bound on t-party set-disjointness [AMS96, SS02, B-YJKS02, CKS03, G09] (tight!)
Our approach to lower bound $F_k$ in R/W streams
1. Reduce testing permuted t-party DISJ to $F_k$
2. Simulate R/W streams for permuted DISJ by NIH comm. for DISJ on slightly smaller input size. Apply our simulation
3. Apply the same communication lower bound on t-party set-disjointness

16 Ideas from the proof

17 Segmenting $\mathrm{DISJ}_{n,t}$
Input: $x_1,x_2,\ldots,x_t \in \{0,1\}^n$
View $\mathrm{DISJ}_{n,t}$ as an OR of m subproblems $\mathrm{DISJ}_{n/m,t}$
[Figure: each of $x_1, x_2, \ldots, x_{t-1}, x_t$ split into segments 1, 2, …, m of length n/m]

18 Permuted DISJ
Fix $\pi_1, \pi_2, \ldots, \pi_t$ permutations on [m]
Permuted-$\mathrm{DISJ}_{n,m,t}$: view it as an OR of m subproblems $\mathrm{DISJ}_{n/m,t}$
[Figure: inputs $\pi_1(x_1), \pi_2(x_2), \ldots, \pi_t(x_t)$, each made of m segments of length n/m; the segments of input i appear in the order $\pi_i(1), \pi_i(2), \ldots, \pi_i(m)$, and segments with the same label 1, 2, …, m form one $\mathrm{DISJ}_{n/m,t}$ subproblem]
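A minimal sketch of the segmented and permuted views may help here. It assumes one particular convention that the slide's figure suggests but does not pin down: position k of stream i holds segment number `perms[i][k]` of the original input (0-based). All function names are hypothetical.

```python
def split_segments(x, m):
    """Split a 0/1 string into m equal-length segments."""
    size = len(x) // m
    return [x[j * size:(j + 1) * size] for j in range(m)]


def permute_input(x, perm):
    """Stream form of x: position k holds segment perm[k] (assumed convention)."""
    segs = split_segments(x, len(perm))
    return "".join(segs[j] for j in perm)


def has_common_element(strings):
    """True if some coordinate is 1 in every string."""
    return any(all(c == "1" for c in column) for column in zip(*strings))


def permuted_disj(streams, perms):
    """OR over the m subproblems; assumes the DISJ promise holds."""
    m = len(perms[0])
    shuffled = [split_segments(s, m) for s in streams]
    for j in range(m):
        # Segment j of input i sits at position perms[i].index(j) of stream i.
        subproblem = [segs[perm.index(j)] for segs, perm in zip(shuffled, perms)]
        if has_common_element(subproblem):
            return 1
    return 0


# Three players, m = 2 segments of length 3, common element in segment 1.
inputs = ["100001", "010001", "001001"]
perms = [[0, 1], [1, 0], [1, 0]]
streams = [permute_input(x, p) for x, p in zip(inputs, perms)]
result = permuted_disj(streams, perms)
```

The function answers by regrouping segments with random access, which is exactly what a stream algorithm cannot do cheaply once the segment orders differ.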

19 Why is permuted-DISJ hard?
Intuitively, to solve a subproblem (e.g. blue), we need to compare at least two blue segments
Need to compare at least two segments of every color
If segments are shuffled, many passes are needed
[Figure: colored segments of $\pi_i(x_i)$, $\pi_j(x_j)$, $\pi_l(x_l)$, with segments of one color forming a $\mathrm{DISJ}_{n/m,t}$ subproblem]

20 Permuted DISJ
Good subproblem: computation always depends only on at most one of its t segments (and the memory/state)
If segments are randomly shuffled: with o(log m) passes and $t = o(m^{1/2})$ parties, 99% of the m subproblems are good
Reduction idea: try to embed an ordinary $\mathrm{DISJ}_{n/m,t}$ in one of the good subproblems
Catch: which subproblems are good depends on the input

21 Simulation
s-space R/W streams algo A for permuted-$\mathrm{DISJ}_{n,m,t}$ → NIH comm. protocol for $\mathrm{DISJ}_{n/m,t}$
t players on input $y_1,y_2,\ldots,y_t$:
1. Generate m-1 $\mathrm{DISJ}_{n/m,t}$’s that look like* $y_1,y_2,\ldots,y_t$
2. Shuffle with $\pi_1,\pi_2,\ldots,\pi_t$: $(y_1,y_2,\ldots,y_t)$ is good w.h.p.
3. Run A on $\pi_1(x_1),\ldots,\pi_t(x_t)$
*same sizes but don’t intersect

22 Generating the extended input
Given $y_1,y_2,\ldots,y_t$, players
– Exchange the sizes of each of the sets: O(t log n) bits
– Choose a random consistent reordering of the indices of each of $y_1,y_2,\ldots,y_t$
– Generate m-1 random inputs to $\mathrm{DISJ}_{n/m,t}$ with the same set sizes as $y_1,y_2,\ldots,y_t$ but that are disjoint
– Place $y_1,y_2,\ldots,y_t$ in a random position and then shuffle
Key observation: if $y_1,y_2,\ldots,y_t$ are disjoint then this resolves the catch
– After shuffling, all the subproblems look the same, so the probability that the subproblem where $y_1,y_2,\ldots,y_t$ lands is good does not depend on the input

23 Simulating R/W stream algorithm A using NIH communication
As A executes on input $v = \pi_1(x_1),\ldots,\pi_t(x_t)$, players know all inputs except $y_1,\ldots,y_t$
– Each player builds up a copy of a dependency graph σ(v) for the elements of each stream so far
Using σ(v), at each step all players either
– know the next move, or
– know which one player knows the next block of moves; that player communicates, or
– know that they need two players’ info: simulation “fails”
If subproblem $y_1,\ldots,y_t$ is good for v then the simulation does not fail
If players detect failure they output “not disjoint”
– If the input was disjoint then there is only a 1% chance of this

24 Dependency Graph
Vertices: elements of each stream in each pass
Edges: from an element to the elements in the previous pass that contained heads at the same time it did
[Figure: dependency graph over pass 0, pass 1, …, pass j-1, pass j, pass j+1; labels: Stream L to R, Stream R to L]

25 Why most subproblems are good
Simple case: algorithm just makes copies of the input stream and compares them
– # of subproblems with > 1 segment read at the same time on a single pass through the streams (L-to-R or R-to-L on each stream) ≤ # segments appearing in the same (or reversed) order
– Almost surely, for random permutations $\pi_1,\pi_2,\ldots,\pi_t$, no pair has a common subsequence or inverted subsequence longer than $2e\,m^{1/2}$
– When t is $o(m^{1/2})$ the total is o(m).
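The claim about random permutations is easy to probe empirically. The sketch below is only an experiment, not part of the proof: it uses the standard fact that the longest common subsequence of two permutations equals a longest increasing subsequence after relabeling, and treats an "inverted" subsequence as one common to p and the reversal of q. The function names and the parameters m = 400, t = 5 are arbitrary choices for illustration.

```python
import bisect
import itertools
import math
import random


def lcs_length(p, q):
    """LCS length of two permutations of the same set, via patience sorting."""
    pos = {v: i for i, v in enumerate(p)}
    tails = []  # tails[k] = smallest end position of an increasing run of length k+1
    for v in q:
        k = bisect.bisect_left(tails, pos[v])
        if k == len(tails):
            tails.append(pos[v])
        else:
            tails[k] = pos[v]
    return len(tails)


def common_or_inverted(p, q):
    """Longest subsequence shared in the same order or in reversed order."""
    return max(lcs_length(p, q), lcs_length(p, q[::-1]))


def worst_pair(m, t, seed=0):
    """Largest common-or-inverted length over all pairs of t random permutations."""
    rng = random.Random(seed)
    perms = [rng.sample(range(m), m) for _ in range(t)]
    return max(common_or_inverted(p, q)
               for p, q in itertools.combinations(perms, 2))


m, t = 400, 5
observed = worst_pair(m, t)
bound = 2 * math.e * math.sqrt(m)
print(observed, "<=", round(bound, 1))
```

By the Erdős–Szekeres theorem the observed value can never drop below $m^{1/2}$, so the slide's bound is within a constant factor of the best possible for a single pair.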

26 Why most subproblems are good
General case: may combine information about all streams onto a single stream in a single pass
– What is combined may depend on the input values
– Each element depends on the segments that it can reach in the input stream via the dependency graph

27 Why most subproblems are good
For each fixed v, after p = o(log m) passes:
– Each element can depend on only $2^{O(p)}$ different input segments
– For any one stream, the sequence of its elements’ dependencies on input segments is the interleaving of $2^{O(p)}$ monotone subsequences from $\pi_1,\pi_2,\ldots,\pi_t$
⇒ Only $2^{O(p)}\, t\, m^{1/2} = m \cdot o(1)$ bad subproblems on input v

28 Communication Cost of Simulation
For each fixed v, after p = o(log m) passes:
– Only $2^{O(p)}\, t$ elements depend on a segment and have a neighbor that does not depend on it
Players only need to communicate when segment dependencies change
– Only happens $2^{O(p)}\, t$ times, at a cost of O(ps) bits per time

29 Limitations and Future Work

30 Limitation of using permuted-DISJ
R/W streams algo for permuted-$\mathrm{DISJ}_{n,m,t}$ → NIH CC protocol for $\mathrm{DISJ}_{n/m,t}$
Gap from data stream due to loss in input size
Most of this loss is necessary
– Need $n/m = \Omega(t^2)$ to use the Ω(n/t) CC lower bound for $\mathrm{DISJ}_{n/m,t}$
– Efficient R/W algo for permuted-$\mathrm{DISJ}_{n,m,t}$ unless $m \ge t^{3/2}$
– Implies that n is $\Omega(m t^2)$, which is $\Omega(t^{3.5})$
⇒ Since we need $t \approx n^{1/k}$, the lower bound Ω(n/t) is trivial for $k \le 3.5$
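The exponent arithmetic behind this limitation can be written out step by step. This is a sketch that takes the two requirements on this slide as given and ignores constants and lower-order factors:

```latex
\begin{align*}
\frac{n}{m} = \Omega(t^2),\quad m \ge t^{3/2}
  &\;\Longrightarrow\; n = \Omega(m\,t^2) = \Omega(t^{7/2}) \\
  &\;\Longrightarrow\; t = O(n^{2/7}) \\
t \approx n^{1/k}
  &\;\Longrightarrow\; \tfrac{1}{k} \le \tfrac{2}{7}
  \;\Longrightarrow\; k \ge 3.5
\end{align*}
```

So a reduction through permuted-DISJ can only reach values of k of at least 3.5, whatever the quality of the simulation.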

31 A longest-common-subsequence problem on permutations
Algorithm for permuted-$\mathrm{DISJ}_{n,m,t}$ follows from the following theorem:
Theorem: In any 3 permutations on [m] there is a pair with longest common subsequence length ≥ $m^{1/3}$.
Tight even for 4 permutations
Proof: For each $i \in [m]$ define a triple $t_i$ of integers: for each of the 3 pairs of permutations, put the length of the longest common subsequence for that pair that ends with value i. Can show that all m triples are different. So some triple must contain a coordinate ≥ $m^{1/3}$.
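The counting argument in the proof can be replayed on concrete permutations. The sketch below (hypothetical function names, a simple quadratic-time dynamic program) builds the triple for every value and lets one check that the triples are distinct. They must be: for any two values, two of the three permutations order them the same way, so that pair's coordinate strictly increases from the earlier value to the later one.

```python
import random
from itertools import combinations


def lcs_ending_at(p, q):
    """For each value v: length of the longest common subsequence of p and q ending with v."""
    pos_q = {v: i for i, v in enumerate(q)}
    best = {}
    for i, v in enumerate(p):
        # Extend the best common subsequence ending at a value that
        # precedes v in both permutations.
        best[v] = 1 + max(
            (best[u] for u in p[:i] if pos_q[u] < pos_q[v]), default=0
        )
    return best


def triples(p1, p2, p3):
    """Map each value to its triple of pairwise LCS-ending-here lengths."""
    tables = [lcs_ending_at(a, b) for a, b in combinations((p1, p2, p3), 2)]
    return {v: tuple(table[v] for table in tables) for v in p1}


rng = random.Random(1)
m = 64
perms = [rng.sample(range(m), m) for _ in range(3)]
result = triples(*perms)
largest = max(max(triple) for triple in result.values())
```

Since m distinct triples cannot all fit in a cube of side below $m^{1/3}$, `largest` is at least 4 for m = 64, matching the theorem.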

32 R/W stream algorithm for permuted-$\mathrm{DISJ}_{n,m,t}$ for large t
$t \ge m^{2/3}$, any $\pi$: testing permuted-$\mathrm{DISJ}_{n,m,t}$ with 2 streams, 3 passes, O(log nmt) space
In any three permutations on [m] there is a pair with longest common subsequence length ≥ $m^{1/3}$.
Compare $m^{1/3}$ blocks each time
[Figure: two copies of the sequence $\pi_1(x_1), \pi_2(x_2), \pi_3(x_3), \pi_4(x_4), \pi_5(x_5), \pi_6(x_6)$]

33 Open problems
Is the $\Omega(n^{1-4/k-\varepsilon})$ lower bound for R/W streams tight?
– Gap from the $O(n^{1-2/k})$ upper bound in data streams
  Can’t use permuted-$\mathrm{DISJ}_{n,m,t}$ to close it
– Polynomial space to compute $F_k$ for 2 < k ≤ 4?
Other problems on R/W streams?
L(m,k) = maximum LCS length that can be guaranteed between some pair in any set of k permutations on [m].
– We show $L(m,3) = L(m,4) = m^{1/3}$
– What is L(m,k) for other values of k?
– [B-Blais-Huynh 08] $L(m,k) = m^{1/3+o(1)}$ for $k \le m^{O(1)}$

