Trading off space for passes in graph streaming problems Camil Demetrescu Irene Finocchi Andrea Ribichini University of Rome “La Sapienza” Dagstuhl Seminar.


Trading off space for passes in graph streaming problems Camil Demetrescu Irene Finocchi Andrea Ribichini University of Rome “La Sapienza” Dagstuhl Seminar 05361

Processing massive data streams Large body of work in recent years Practically motivated, raises interesting theoretical questions Areas: Databases, Sensors, Networking, Hardware, Programming lang. Core problems: Algorithms, Complexity, Statistics, Probability, Approximation theory

Classical streaming [Figure: working memory M scans the input stream in a 1st pass, then again in a 2nd pass] p = number of passes; s = size of working memory M (space in bits); n = size of input stream (# of items)

Classical streaming Seminal work by Munro and Paterson (1980): pass-efficient selection and sorting Several problems shown to be solvable with polylog(n) space and passes in the 90’s (e.g., approximating frequency moments) Classical streaming is very restrictive: for many fundamental problems (e.g., on graphs) provably impossible to achieve polylog(n) space and passes

Graph streaming problems For many basic graph problems (e.g., connectivity, shortest paths): passes = Ω (N/space) ( N = number of vertices ) Recent interest in graph problems in “semi-streaming” models, where: space = O( N · polylog(N) ) passes = O( polylog(N) ) [Feigenbaum et al., ICALP 2004] O(N · polylog(N)) space “sweet spot” for graph streaming problems [Muthukrishnan, 2001]

Graph algorithms in classical streaming Approximate triangle counting [Bar-Yossef et al., SODA 2002] Matching, bipartiteness, connectivity, MST, t-spanners, … [Feigenbaum et al., ICALP 2004, SODA 2005] All of them make one, or very few passes, but require Ω(N) space

Trading off space for passes Natural question: Can we reduce space if we do more passes? [Munro and Paterson ’80, Henzinger et al. ’99] Example: Processing a 50 GB graph (4 billion vertices, 6 billion edges) on a PC with 1 GB of RAM. An s = Θ(N/p) algorithm: ~16 passes (a few hours). An s = Θ(N) algorithm: out of memory (16 GB of RAM would be required)
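The arithmetic behind this example can be checked with a short sketch. The slide does not say how much state is kept per vertex; the 4 bytes per vertex used here is an assumption, chosen because it reproduces the 16 GB figure quoted above.

```python
import math

GB = 10**9
num_vertices = 4 * 10**9   # from the slide
ram_bytes = 1 * GB         # from the slide
bytes_per_vertex = 4       # ASSUMPTION: not stated in the source

# Memory needed by an algorithm that keeps per-vertex state for all N vertices.
linear_space = num_vertices * bytes_per_vertex

# If each pass handles one RAM-sized slice of that state, this many passes are needed.
passes = math.ceil(linear_space / ram_bytes)

print(linear_space // GB, "GB needed;", passes, "passes with 1 GB of RAM")
```

Under that assumption the script reports 16 GB and 16 passes, matching the slide.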

Some facts on modern commodity I/O A RAID disk controller can deliver 100 MB/s access rate On a 1+ GHz Pentium PC, random access to 2GB of main memory in 32 byte chunks: 80 MB/s effective access rate Sequential access rates are comparable to (or even faster than) random access rates in main memory: Sequential access uses caches optimally (this makes algorithms cache-oblivious) [Ruhl ‘03 - Rajagopalan ‘02]

Some facts on modern commodity I/O Is classical read-only streaming perhaps overly pessimistic? Why not exploit temporary storage? The above facts imply that both reading and writing sequentially can improve performance. External memory storage is cheap (less than a dollar per gigabyte) and readily available

The StreamSort model [Aggarwal et al. ’04] [Figure: in the 1st pass, memory M reads the input stream and writes an intermediate stream; in the 2nd pass, M reads the intermediate stream and writes the output stream] Use a sorting primitive to reorder the stream

How much power does sorting yield? Open problem: No clue on how to get polylog(N) bounds for Shortest Paths (even BFS) in StreamSort Good news: Undirected connectivity can be solved in polylog(N) space and passes in StreamSort [Aggarwal et al., FOCS 2004]

Dish of the day We show that StreamSort can yield interesting results even without using sorting at all (call this more restrictive model W-Stream: allows intermediate streams, but no sorting). In this model, we show effective space/passes tradeoffs for natural graph streaming problems. We address: - Connectivity - Single-source shortest paths

Graph connectivity UCON: G=(V,E) undirected graph with N vertices given as stream of edges in arbitrary order. Find out if G is connected. We now show the following: Lower bound: UCON in W-Stream p = Ω(N/s). Upper bound: UCON in W-Stream p = O(N · log N / s)

Graph connectivity: algorithm Generic pass: two phases (red phase, blue phase) [Figure: a pass reads graph G from the input stream, maintains F, and writes graph G’ to the output stream]

Graph connectivity: analysis How many passes? Invariant: F is induced by a set of edges ⇒ each tree in F contains at least two vertices. All vertices of F that are not component representatives disappear from the output graph. At each pass we lose at least |V(F)| / 2 = Ω(s / log N) vertices ⇒ p = O(N · log N / s)
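The pass structure can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the names `connectivity_pass`, `is_connected` and `capacity` are hypothetical, Python lists stand in for the streams, and `capacity` (the number of vertices the forest F may hold, assumed to be at least 2) stands in for s / log N. The first phase loads edges into F until memory is full; the second phase relabels the endpoints of the remaining edges with their component representatives and writes them out.

```python
def connectivity_pass(stream, capacity):
    """One pass: return (output edge stream, number of vertices contracted away)."""
    parent = {}  # forest F, stored as a union-find structure

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    out = []
    loading = True  # first phase: F still has room
    for u, v in stream:
        if loading:
            new = {x for x in (u, v) if x not in parent}
            if len(parent) + len(new) <= capacity:
                for x in new:
                    parent[x] = x
                parent[find(u)] = find(v)  # contract the edge into F
                continue
            loading = False  # memory full: F is fixed from now on
        # Second phase: relabel endpoints by representative, drop self-loops.
        a = find(u) if u in parent else u
        b = find(v) if v in parent else v
        if a != b:
            out.append((a, b))
    trees = sum(1 for v in parent if find(v) == v)
    return out, len(parent) - trees


def is_connected(num_vertices, edges, capacity):
    """Repeat passes until the edge stream is empty; return (connected, passes)."""
    stream, remaining, passes = list(edges), num_vertices, 0
    while stream:
        stream, removed = connectivity_pass(stream, capacity)
        remaining -= removed
        passes += 1
    return remaining == 1, passes


path = [(i, i + 1) for i in range(5)]   # a path on 6 vertices
print(is_connected(6, path, capacity=2))
print(is_connected(6, path, capacity=6))
```

With room for only two vertices the path needs five passes, while with room for all six a single pass suffices, which is the space/passes tradeoff in miniature.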

Single-source shortest paths SSSP: G=(V,E,w) weighted directed graph with N vertices given as arbitrary stream of edges. Find distances from a given source t to all other vertices. Lower bound 1: BFS in W-Stream: p = Ω(N / s). Lower bound 2: finding vertices up to constant distance d: p ≤ d ⇒ s = Ω(N^{1+1/(2d)}) [Feigenbaum et al., SODA 2005]. Space-efficient algorithms for SSSP always require multiple passes

Single-source shortest paths Hard even using sorting as a primitive No sublinear-space streaming algorithm for SSSP previously known. We make a first step, showing that we can solve SSSP in W-Stream in sublinear space and passes simultaneously in directed graphs with small integer edge weights Previous results on distances in streaming: approximate (spanners) in undirected graphs only

Single-source shortest paths: bound Thm: For any space restriction s, there is a randomized one-sided error algorithm for directed SSSP in W-Stream with edge weights in {1,2,…,C} s.t.: p = O( C · N · log^{3/2} N / √s ). For C = O(s^{1/2-ε}) and polynomial sublinear space, we also get sublinear p. In this talk we focus on C=1 (BFS): upper bound p = Õ(N / √s) versus lower bound p = Ω(N / s)
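To get a feel for the gap between the two BFS bounds, the sketch below evaluates both expressions for C = 1 with all hidden constants dropped. The function name `bfs_pass_bounds` is hypothetical, and the base-2 logarithm is an assumption (the slide does not fix a base), so the numbers are only indicative of growth rates.

```python
import math

def bfs_pass_bounds(n, s):
    """Return (upper, lower) pass bounds for BFS (C = 1), constants dropped."""
    upper = n * math.log2(n) ** 1.5 / math.sqrt(s)  # N * log^{3/2} N / sqrt(s)
    lower = n / s                                   # N / s
    return upper, lower

n = 2**20
for s in (2**12, 2**13, 2**16, 2**20):
    upper, lower = bfs_pass_bounds(n, s)
    print(f"s={s:>8}  upper~{upper:>12,.0f}  lower~{lower:>6,.0f}  sublinear={upper < n}")
```

In this constant-free form the upper bound drops below N exactly when s exceeds log^3 N, which for N = 2^20 happens between s = 2^12 and s = 2^13.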

Single-source shortest paths: approach For a given space restriction, this helps us reduce the number of passes to find long paths Overall approach: First build many short paths “in parallel”, then stitch them together to form long paths.

Single-source shortest paths: step 1/5 Pick a set K of (s / log N)^{1/2} random vertices including source t [Figure: example on a chain with source t]

Single-source shortest paths: step 2/5 Find distances up to (N log N) / |K| from each vertex in K (short distances). The more memory we have, the larger |K|, and thus the smaller the # of passes [Figure: example on a chain; each vertex of K explores a stretch of length (N log N) / |K|]

Single-source shortest paths: step 3/5 Build a graph G’ = (K, E’), where: (x,y) ∈ E’ ⇔ dist(x,y) ≤ (N log N) / |K| in G [Figure: example on a chain and the resulting graph G’ on the vertices of K, with edge weights 3, 3, 2]

Single-source shortest paths: step 4/5 Find in G’ distances from t to all other vertices of K [Figure: example on a chain; distances from t in G’ are 0, 3, 6, 8]

Single-source shortest paths: step 5/5 For each v, let: dist(t,v) = min_{c ∈ K} { dist(t,c) + dist(c,v) } (final distances) [Figure: example on a chain]
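The five steps can be sketched for the unweighted case (C = 1) as follows. This is an in-memory illustration of the sample-and-stitch idea only: it does not model streams, passes, or the space bound, and the names `bounded_bfs` and `sampled_bfs`, the adjacency-dict input, and the use of the natural logarithm are all assumptions rather than details from the talk.

```python
import heapq
import math
import random
from collections import deque

def bounded_bfs(adj, src, limit):
    """Distances from src to every vertex within `limit` hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def sampled_bfs(n, adj, t, k, rng=random):
    """Vertices are 0..n-1 (n >= 2); k is |K| with 1 <= k <= n."""
    # Step 1: random sample K that includes the source t.
    others = [v for v in range(n) if v != t]
    K = [t] + rng.sample(others, k - 1)
    limit = math.ceil(n * math.log(n) / len(K))
    # Step 2: short distances from every sampled vertex.
    short = {c: bounded_bfs(adj, c, limit) for c in K}
    # Steps 3 and 4: Dijkstra on G' = (K, E'), whose edges are short distances.
    dist_k = {t: 0}
    heap = [(0, t)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist_k[x]:
            continue
        for y in K:
            w = short[x].get(y)
            if w is not None and d + w < dist_k.get(y, math.inf):
                dist_k[y] = d + w
                heapq.heappush(heap, (d + w, y))
    # Step 5: stitch, dist(t,v) = min over c in K of dist(t,c) + dist(c,v).
    final = {}
    for c, dc in dist_k.items():
        for v, dv in short[c].items():
            if dc + dv < final.get(v, math.inf):
                final[v] = dc + dv
    return final

chain = {i: [i + 1] for i in range(49)}   # directed chain 0 -> 1 -> ... -> 49
result = sampled_bfs(50, chain, t=0, k=10, rng=random.Random(1))
print(len(result), "vertices labelled; all exact:", all(result[v] == v for v in result))
```

Every value returned is the length of a real path, so it can never be smaller than the true distance; an unlucky sample can only leave some distances too large or missing, which is consistent with the one-sided error in the theorem.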

Results are correct with high prob. Sampling thm. [Greene & Knuth, ’80] Let K be a set of vertices chosen uniformly at random. Then the probability that a simple path with more than (c · N · log N) / |K| vertices intersects K is at least 1 - 1/N^c for any c > 0
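The sampling bound can be sanity-checked numerically. The sketch below assumes K is a uniformly random set of k distinct vertices and that the logarithm is natural; under those assumptions a fixed path on ℓ vertices is missed with probability C(N-ℓ, k) / C(N, k), which the code compares against 1/N^c. The function name `hit_probability` is hypothetical.

```python
import math

def hit_probability(n, k, path_len):
    """Exact probability that a random k-subset of n vertices meets a fixed path."""
    if path_len > n:
        raise ValueError("path cannot have more vertices than the graph")
    miss = math.comb(n - path_len, k) / math.comb(n, k)
    return 1 - miss

n, k = 1000, 50
for c in (1, 2):
    # Smallest integer length strictly above the theorem's threshold.
    path_len = math.floor(c * n * math.log(n) / k) + 1
    p = hit_probability(n, k, path_len)
    print(f"c={c}  path_len={path_len}  hit>=bound: {p >= 1 - n ** -c}")
```

For these parameters the exact hit probability clears the 1 - 1/N^c bound in both cases.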

Conclusions and further work We have shown effective space/passes tradeoffs for problems that seem hard in classical streaming (graph connectivity & shortest paths) Can we close the gap between upper and lower bound for BFS in W-Stream? Can we do the same in the classical read-only streaming model? Can we prove stronger lower bounds in classical streaming? Space/passes tradeoffs for other problems?

Thank you