Published by Raymundo Cottrell. Modified over 9 years ago.
1
Trading off space for passes in graph streaming problems
Camil Demetrescu, Irene Finocchi, Andrea Ribichini
University of Rome “La Sapienza”
Dagstuhl Seminar 05361
2
Processing massive data streams
- Large body of work in recent years
- Practically motivated; raises interesting theoretical questions
- Areas: databases, sensors, networking, hardware, programming languages
- Core problems: algorithms, complexity, statistics, probability, approximation theory
3
Classical streaming
[Figure: the input stream is read sequentially in a 1st pass, a 2nd pass, and so on, each time using a working memory M.]
- $p$ = number of passes
- $s$ = size of working memory M (space in bits)
- $n$ = size of input stream (number of items)
4
Classical streaming
- Seminal work by Munro and Paterson (1980): pass-efficient selection and sorting
- In the 1990s, several problems were shown to be solvable with polylog(n) space and passes (e.g., approximating frequency moments)
- Classical streaming is very restrictive: for many fundamental problems (e.g., on graphs), it is provably impossible to achieve polylog(n) space and passes
5
Graph streaming problems
- For many basic graph problems (e.g., connectivity, shortest paths): passes = $\Omega(N/\text{space})$, where $N$ = number of vertices
- Recent interest in graph problems in "semi-streaming" models [Feigenbaum et al., ICALP 2004], where space = $O(N \cdot \mathrm{polylog}(N))$ and passes = $O(\mathrm{polylog}(N))$
- $O(N \cdot \mathrm{polylog}(N))$ space is the "sweet spot" for graph streaming problems [Muthukrishnan, 2001]
6
Graph algorithms in classical streaming
- Approximate triangle counting [Bar-Yossef et al., SODA 2002]
- Matching, bipartiteness, connectivity, MST, t-spanners, … [Feigenbaum et al., ICALP 2004, SODA 2005]
- All of them make one or very few passes, but require $\Omega(N)$ space
7
Trading off space for passes
- Natural question: can we reduce space if we make more passes? [Munro and Paterson ‘80, Henzinger et al. ‘99]
- Example: processing a 50 GB graph (4 billion vertices, 6 billion edges) on a PC with 1 GB of RAM
  - $s = \Theta(N/p)$ algorithm: ~16 passes (a few hours)
  - $s = \Theta(N)$ algorithm: out of memory (16 GB of RAM would be required)
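The example's figures agree with each other under one assumption of ours: vertex identifiers are stored as 4-byte (32-bit) integers. That width is enough for 4 billion vertices, because $2^{32} \approx 4.29 \times 10^9$. The check below uses decimal gigabytes and also assumes that the linear-space algorithm keeps one identifier-sized word per vertex.

```python
import math

VERTICES = 4_000_000_000
EDGES = 6_000_000_000
BYTES_PER_ID = 4             # assumption: 32-bit vertex identifiers
RAM_BYTES = 1_000_000_000    # 1 GB of main memory (decimal units)

stream_bytes = EDGES * 2 * BYTES_PER_ID        # each edge = two endpoint ids
linear_space = VERTICES * BYTES_PER_ID         # assumed: one id-sized word per vertex
passes = math.ceil(linear_space / RAM_BYTES)   # s = N/p  =>  p = N/s (constants ignored)

print(stream_bytes / 1e9, linear_space / 1e9, passes)  # 48.0 16.0 16
```

A 48 GB edge stream is roughly the "50 GB graph". Fitting 16 GB of per-vertex state into 1 GB of RAM gives the 16 passes quoted on the slide.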
8
Some facts on modern commodity I/O
- A RAID disk controller can deliver a 100 MB/s access rate
- On a 1+ GHz Pentium PC, random access to 2 GB of main memory in 32-byte chunks yields an 80 MB/s effective access rate
- Sequential disk access rates are therefore comparable to (or even faster than) random access rates in main memory
- Sequential access uses caches optimally (this makes algorithms cache-oblivious) [Ruhl ‘03, Rajagopalan ‘02]
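The 100 MB/s figure, applied to the 50 GB example from the previous slide, gives a rough running time. This rests on a simplification of ours: each pass only reads the whole graph sequentially at full controller speed, and writes of intermediate streams are ignored.

```python
GRAPH_BYTES = 50_000_000_000   # "50 GB graph" from the previous slide
RATE = 100_000_000             # 100 MB/s sequential RAID access rate
PASSES = 16                    # passes of the s = Theta(N/p) algorithm

seconds_per_pass = GRAPH_BYTES / RATE
total_hours = PASSES * seconds_per_pass / 3600
print(seconds_per_pass, round(total_hours, 2))  # 500.0 2.22
```

About 2.2 hours of pure reading fits the "a few hours" estimate. Writing an intermediate stream in every pass would add to this, but it would not change the order of magnitude.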
9
Some facts on modern commodity I/O
- Is classical read-only streaming perhaps overly pessimistic?
- Why not exploit temporary storage?
- The facts above imply that both reading and writing sequentially can improve performance
- External memory storage is cheap (less than a dollar per gigabyte) and readily available
10
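The StreamSort model [Aggarwal et al.’04]
[Figure: the 1st pass reads the input stream with working memory M and writes an intermediate stream; the 2nd pass reads the intermediate stream and writes the output stream.]
A sorting primitive can be used to reorder the stream between passes.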
11
How much power does sorting yield?
- Open problem: no clue how to get polylog(N) bounds for shortest paths (even BFS) in StreamSort
- Good news: undirected connectivity can be solved in polylog(N) space and passes in StreamSort [Aggarwal et al., FOCS 2004]
12
Dish of the day
We show that StreamSort can yield interesting results even without using sorting at all (we call this more restrictive model W-Stream: it allows intermediate streams, but no sorting).
In this model, we show effective space/passes tradeoffs for natural graph streaming problems. We address:
- Connectivity
- Single-source shortest paths
13
Graph connectivity
UCON: $G=(V,E)$ is an undirected graph with $N$ vertices, given as a stream of edges in arbitrary order. Find out whether $G$ is connected.
We now show the following:
- Lower bound: UCON in W-Stream requires $p = \Omega(N/s)$
- Upper bound: UCON in W-Stream can be solved with $p = O(N \log N / s)$
14
Graph connectivity: algorithm
[Figure: a pass reads the input stream G, builds a forest F in working memory, and writes the output stream G'; illustrated on an example graph.]
Generic pass: two phases, a red phase and a blue phase.
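The slides do not spell out the red and blue phases. What follows is only a hypothetical in-memory sketch of a contraction scheme that respects the invariant on the analysis slide.

Each round works in two scans:
- **First scan.** A union-find forest F over at most `cap` vertices is grown from the stream. An edge is accepted only if its endpoints still fit, so every tree has at least two vertices.
- **Second scan.** The stream is rewritten with every vertex of F replaced by its tree's representative, and self-loops are dropped.

For simplicity, the sketch makes these simplifications:
- It keeps the stream in a Python list and scans it twice per round.
- It treats `cap` as a vertex count rather than a bit budget.

The names `contract_round`, `is_connected` and `cap` are ours.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


class UnionFind:
    """Union-find over the bounded vertex set of the in-memory forest F."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def add(self, x: int) -> None:
        self.parent.setdefault(x, x)

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def contract_round(stream: List[Edge], cap: int) -> Tuple[List[Edge], int]:
    """Hypothetical round: grow F on <= cap vertices, then emit the contracted stream."""
    uf = UnionFind()
    # First scan: accept an edge only if its endpoints still fit in F,
    # so every tree of F is induced by edges and has >= 2 vertices.
    for u, v in stream:
        if u == v:
            continue
        new = [x for x in (u, v) if x not in uf.parent]
        if len(uf.parent) + len(new) <= cap:
            for x in new:
                uf.add(x)
            uf.union(u, v)
    # Second scan: rename F-vertices to their representative, drop self-loops.
    out: List[Edge] = []
    for u, v in stream:
        ru = uf.find(u) if u in uf.parent else u
        rv = uf.find(v) if v in uf.parent else v
        if ru != rv:
            out.append((ru, rv))
    trees = len({uf.find(x) for x in uf.parent})
    return out, len(uf.parent) - trees  # non-representatives disappear


def is_connected(n: int, edges: List[Edge], cap: int) -> Tuple[bool, int]:
    """Return (connected?, rounds) for vertices 0..n-1 given as an edge stream."""
    assert cap >= 2, "F must be able to hold at least one edge"
    remaining, stream, rounds = n, list(edges), 0
    while remaining > 1 and stream:
        stream, removed = contract_round(stream, cap)
        remaining -= removed
        rounds += 1
    return remaining <= 1, rounds


print(is_connected(5, [(0, 1), (1, 2), (2, 3), (3, 4)], cap=2))  # (True, 4)
print(is_connected(4, [(0, 1), (2, 3)], cap=4))                  # (False, 1)
```

Isolated vertices never appear in the stream. They are still accounted for, because the sketch tracks how many vertices remain instead of inspecting the stream.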
15
Graph connectivity: analysis
How many passes?
- Invariant: F is induced by a set of edges, so each tree in F contains at least two vertices
- All vertices of F that are not component representatives disappear from the output graph
- Hence at each pass we lose at least $|V(F)|/2 = \Theta(s/\log N)$ vertices, so $p = O(N \log N / s)$
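The pass bound can be spelled out as below. We write $\mathcal{T}(F)$ for the set of trees of $F$ (our notation). The derivation assumes that each vertex identifier takes $\Theta(\log N)$ bits, so $F$ holds $\Theta(s/\log N)$ vertices. It also assumes, as the slide's analysis does, that $F$ reaches that size in every pass but the last.

```latex
\begin{aligned}
|\mathcal{T}(F)| &\le \tfrac{1}{2}\,|V(F)|
  && \text{(every tree of } F \text{ has at least 2 vertices)}\\
\text{vertices removed per pass} &= |V(F)| - |\mathcal{T}(F)|
  \ge \tfrac{1}{2}\,|V(F)| = \Theta\!\left(\tfrac{s}{\log N}\right)\\
p &\le \frac{N}{\Theta(s/\log N)} = O\!\left(\frac{N \log N}{s}\right)
\end{aligned}
```

Each tree keeps exactly one representative. The number of vanishing vertices is therefore the number of vertices of $F$ minus the number of trees.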
16
Single-source shortest paths
SSSP: $G=(V,E,w)$ is a weighted directed graph with $N$ vertices, given as an arbitrary stream of edges. Find the distances from a given source $t$ to all other vertices.
- Lower bound 1: BFS in W-Stream requires $p = \Omega(N/s)$
- Lower bound 2: finding the vertices up to constant distance $d$: if $p \le d$, then $s = \Omega(N^{1+1/(2d)})$ [Feigenbaum et al., SODA 2005]
- Hence space-efficient algorithms for SSSP always require multiple passes
17
Single-source shortest paths
- Hard even using sorting as a primitive
- No sublinear-space streaming algorithm for SSSP was previously known
- Previous results on distances in streaming: approximate (spanners), in undirected graphs only
- We make a first step, showing that we can solve SSSP in W-Stream in sublinear space and passes simultaneously, in directed graphs with small integer edge weights
18
Single-source shortest paths: bound
Thm: For any space restriction $s$, there is a randomized one-sided error algorithm for directed SSSP in W-Stream with edge weights in $\{1, 2, \ldots, C\}$ such that
$p = O\left(\frac{C \cdot N \cdot \log^{3/2} N}{\sqrt{s}}\right)$
For $C = O(s^{1/2-\epsilon})$ and polynomially sublinear space, we also get sublinear $p$.
In this talk we focus on $C = 1$ (BFS): $p = \tilde{O}(N/\sqrt{s})$, versus the lower bound $p = \Omega(N/s)$.
19
Single-source shortest paths: approach
Overall approach: first build many short paths "in parallel", then stitch them together to form long paths.
For a given space restriction, this helps us reduce the number of passes needed to find long paths.
20
Single-source shortest paths: step 1/5
Pick a set $K$ of $(s/\log N)^{1/2}$ random vertices, including the source $t$.
[Figure: example on a chain of vertices starting at t.]
21
Single-source shortest paths: step 2/5
Find distances up to $(N \log N)/|K|$ from each vertex in $K$ (short distances).
The more memory we have, the larger $|K|$, and thus the smaller the number of passes.
[Figure: the chain example, annotated with short distances from each vertex of K.]
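A back-of-the-envelope reading connects the choice of $|K|$ in step 1 to the theorem's bound for $C = 1$. It rests on two assumptions of ours:
- $G'$ is stored as a $|K| \times |K|$ table of distances of $O(\log N)$ bits each.
- Step 2 dominates the pass count, and each pass advances all $|K|$ searches by one BFS level.

```latex
\begin{aligned}
|K|^2 \cdot \log N &= \frac{s}{\log N} \cdot \log N = s
  && \text{(memory for the distance table of } G'\text{)}\\
p_{\text{step 2}} &\approx \frac{N \log N}{|K|}
  = \frac{N \log N}{(s/\log N)^{1/2}}
  = \frac{N \log^{3/2} N}{\sqrt{s}}
\end{aligned}
```

This is only a consistency check against the stated bound, not the proof from the talk.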
22
Single-source shortest paths: step 3/5
Build a graph $G' = (K, E')$, where $(x,y) \in E' \iff \mathrm{dist}(x,y) \le (N \log N)/|K|$ in $G$; each edge $(x,y)$ of $G'$ is weighted by $\mathrm{dist}(x,y)$.
[Figure: the chain example and the resulting weighted graph G'.]
23
Single-source shortest paths: step 4/5
Find in $G'$ the distances from $t$ to all other vertices of $K$.
[Figure: distances from t in G' on the chain example.]
24
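Single-source shortest paths: step 5/5
For each $v$, let $\mathrm{dist}(t,v) = \min_{c \in K} \{\mathrm{dist}(t,c) + \mathrm{dist}(c,v)\}$ (final distances), where $\mathrm{dist}(t,c)$ comes from step 4 and $\mathrm{dist}(c,v)$ is a short distance from step 2.
[Figure: final distances on the chain example.]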
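The five steps can be simulated in memory for the unweighted case ($C = 1$, BFS). This is a hypothetical sketch, not the streaming implementation from the talk. It differs in four ways:
- It keeps all short distances in Python dictionaries, so it is not space-efficient.
- It takes the sample size `k` directly instead of deriving it from $s$.
- It uses a base-2 logarithm for the depth bound.
- It runs Dijkstra's algorithm on the weighted $G'$ for step 4, which is our choice.

It does mirror the pass structure of step 2: each scan of the edge list advances all searches from $K$ by one BFS level. The function names are ours.

```python
import heapq
import math
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # directed edge u -> v with weight 1


def short_distances(edges: List[Edge], sources: List[int], depth: int) -> Dict[int, Dict[int, int]]:
    """Step 2: distances <= depth from every source; one edge scan per BFS level."""
    dist = {c: {c: 0} for c in sources}
    for level in range(depth):
        grew = False
        for u, v in edges:  # one scan over the edge stream
            for c in sources:
                if dist[c].get(u) == level and v not in dist[c]:
                    dist[c][v] = level + 1
                    grew = True
        if not grew:
            break
    return dist


def sssp_bfs_sketch(n: int, edges: List[Edge], t: int, k: int, seed: int = 0) -> List[float]:
    """Hypothetical in-memory simulation of steps 1-5 for vertices 0..n-1."""
    assert k >= 1
    rng = random.Random(seed)
    # Step 1: random sample K of k vertices that always contains the source t.
    others = [v for v in range(n) if v != t]
    K = [t] + rng.sample(others, min(k - 1, len(others)))
    depth = math.ceil(n * math.log2(max(n, 2)) / len(K))
    short = short_distances(edges, K, depth)
    # Step 3: G' on K, edge x -> y weighted by the short distance dist(x, y).
    gp = {x: [(y, short[x][y]) for y in K if y != x and y in short[x]] for x in K}
    # Step 4: distances from t inside G' (Dijkstra, since G' is weighted).
    dk = {t: 0}
    heap = [(0, t)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dk[x]:
            continue
        for y, w in gp[x]:
            if d + w < dk.get(y, math.inf):
                dk[y] = d + w
                heapq.heappush(heap, (d + w, y))
    # Step 5: dist(t, v) = min over c in K of dist(t, c) + dist(c, v).
    return [
        min((dk[c] + short[c][v] for c in K if c in dk and v in short[c]), default=math.inf)
        for v in range(n)
    ]


chain = [(i, i + 1) for i in range(9)]  # directed chain 0 -> 1 -> ... -> 9
print(sssp_bfs_sketch(10, chain, t=0, k=4))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Every value the sketch returns is the length of an actual walk. It can therefore overestimate a distance but never underestimate it, which matches the one-sided error in the theorem.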
25
Results are correct with high probability.
Sampling theorem [Greene & Knuth, ’80]: Let $K$ be a set of vertices chosen uniformly at random. Then the probability that a simple path with more than $(c \cdot N \log N)/|K|$ vertices intersects $K$ is at least $1 - 1/N^{c}$, for any $c > 0$.
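For a single fixed path, the bound follows in one line. We make three simplifying assumptions:
- $K$ consists of $|K|$ independent uniform draws from the $N$ vertices.
- $\log$ is the natural logarithm. A base-2 logarithm only raises the threshold, so it makes the bound stronger.
- The path has $\ell > cN\ln N/|K|$ vertices.

The last step uses $1 - x \le e^{-x}$.

```latex
\Pr[\text{path} \cap K = \emptyset]
  = \left(1 - \frac{\ell}{N}\right)^{|K|}
  \le e^{-\ell\,|K|/N}
  < e^{-c \ln N}
  = N^{-c}
```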
26
Conclusions and further work
We have shown effective space/passes tradeoffs for problems that seem hard in classical streaming (graph connectivity and shortest paths).
- Can we close the gap between the upper and lower bounds for BFS in W-Stream?
- Can we do the same in the classical read-only streaming model?
- Can we prove stronger lower bounds in classical streaming?
- Space/passes tradeoffs for other problems?
27
Thank you