Algorithm Frameworks Using Adaptive Sampling Richard Peng Georgia Tech.



OUTLINE: Sampling and its algorithmic incorporation; adaptive sampling for approximating maxflow; adaptive sampling for solving linear systems.

RANDOM SAMPLING: pick a small subset of a collection of many objects. Goals: estimate quantities, reduce sizes, speed up algorithms.

ALGORITHMIC USE OF SAMPLING. Framework: compute on the sample, then bring the answer back to the original. Examples: quicksort/quickselect, geometric cuttings, the Nystrom method.
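
As an illustration (not from the talk), quickselect is perhaps the simplest instance of this framework: a random pivot is the sample, we recurse on one side, and the answer carries back to the original unchanged. A minimal Python sketch:

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (0-indexed) of items.

    The random pivot is the 'sample'; recursing on one side is
    'computing on the sample', and the answer needs no correction
    when brought back to the original problem.
    """
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    if k < len(smaller):
        return quickselect(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    return quickselect(larger, k - len(smaller) - len(equal))
```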

THIS TALK: adaptive-sampling-based algorithms for finding combinatorial and scientific flows. Current best, O(m log^c n)-time guarantees for: approximate maxflow / balanced cuts; computation of random walks / eigenvectors. Runtimes are measured in big-O notation on worst-case instances, including ill-conditioned ones.

ADAPTIVE SAMPLING. Iterative schemes: a chain of samples, Problem 1 → Problem 2 → … → Problem d, each constructed from the previous, preserving `just enough' for the answer on the sample to be useful.

PRESERVING GRAPH STRUCTURES. This talk: undirected graphs with n vertices and m < n^2 edges. Are n^2 edges (dense) sometimes necessary? For connectivity: < n edges always suffice.

PRESERVING MORE. [BK `96] (Andras Benczur, David Karger): for ANY graph G, can get H with O(n log n) edges s.t. G ≈ H on all cuts. How: keep edge e with probability p_e; if kept, rescale its weight to w_e / p_e to maintain expectation.
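
A minimal sketch of the keep-and-rescale step (illustration only, with hypothetical dict-based edge and probability maps; choosing the p_e well is the subject of the next slides). The rescaling makes the expected weight of every edge, and hence of every cut, equal its original value:

```python
import random

def sample_graph(edges, probs, seed=0):
    """Keep edge e with probability p_e; rescale kept weights by 1/p_e
    so that every edge weight (hence every cut) is preserved in
    expectation.  edges: {e: weight}, probs: {e: p_e}."""
    rng = random.Random(seed)
    return {e: w / probs[e] for e, w in edges.items()
            if rng.random() < probs[e]}
```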

HOW TO PICK PROBABILITIES. Widely used: uniform sampling, which works well when the data is uniform, e.g. the complete graph. Problem: on a long path, removing any edge changes connectivity (a single graph can contain both structures). Fix: pick probabilities adaptively, based on the graph.

THE `RIGHT' PROBABILITIES. Path + clique: probabilities 1 vs. 1/n. [RV `07], [Tropp `12], [SS `08] (Dan Spielman, Nikhil Srivastava): it suffices to have p_e ≥ O(log n) × w_e × (effective resistance of e). Gives spectral sparsifiers, which suffice for all the results in this talk, e.g. cuts are preserved.
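
The effective resistances can be read off the Laplacian pseudoinverse. A small numpy sketch (an illustration, assuming dense linear algebra is affordable; fast algorithms for this are the subject of a later slide). A useful sanity check: on a connected graph these scores sum to n - 1:

```python
import numpy as np

def leverage_scores(n, edges):
    """edges: list of (u, v, w).  Returns w_e * effective_resistance(e)
    for each edge, computed via the Laplacian pseudoinverse."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lp = np.linalg.pinv(L)
    scores = []
    for u, v, w in edges:
        chi = np.zeros(n); chi[u], chi[v] = 1.0, -1.0
        r_e = chi @ Lp @ chi          # effective resistance of edge e
        scores.append(w * r_e)
    return scores
```

On a unit-weight triangle each edge sees resistance 2/3 (a direct edge in parallel with a 2-edge path), and the three scores sum to 2 = n - 1.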

SAMPLING MORE. [Talagrand `90] (Michel Talagrand), "Embedding subspaces of L_1 into l_1^N": a non-linear matrix analog, O(d log d) objects for d dimensions. Mathematical view of the probabilities: 2-norm ‖x‖_2: leverage scores; 1-norm ‖x‖_1: Lewis weights [CP `15].

COMPUTING SAMPLING PROBABILITIES. [ST `04][OV `11]: spectral partitioning. [SS `08][KLP `12]: projections + solves. [Koutis `14]: spanners / low-diameter partitions. [LMP `13][CLMMPS `15]: self-reduction / recursion. [DMMW `13][CW `13]: sketches. [BSS `09][LS `15]: potential functions. For an edge with Laplacian L_e: w_e × r_e = trace(L^+ L_e), where ^+ denotes the pseudo-inverse.

OUTLINE: Sampling and its algorithmic incorporation; adaptive sampling for approximating maxflow; adaptive sampling for solving linear systems.

PROBLEM: in an undirected graph with n vertices, m edges, and capacities assumed poly(n), find the maximum number of disjoint s-t paths. Applications: routing, scheduling. Dual: separate s and t by removing the fewest edges; applications: partitioning, clustering. Goal: a 1±ε approximation in O(m log^c n ε^{-2}) time.

`PROTOTYPE' FLOW ALGORITHM (FORD-FULKERSON `56; L. R. Ford Jr., D. R. Fulkerson): while an s-t path exists, route flow along it and adjust the graph. May also `fix' earlier flow via the residual graph. Faster: [EK `73]: fewer paths, O(m^2 n); [Dinic `73][HK `75][GN `80][ST `83]: shorter / simpler paths, O(nm log n).
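
A runnable sketch of the prototype (an illustration, not the talk's algorithm), picking the augmenting path by BFS, i.e. the Edmonds-Karp variant. It works on a directed residual graph; an undirected edge can be modeled as two opposite arcs:

```python
from collections import deque

def maxflow(n, cap, s, t):
    """Repeatedly find an s-t path in the residual graph, route flow
    along it, and adjust residual capacities.  cap: {(u, v): capacity}."""
    residual = {}
    for (u, v), c in cap.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)   # reverse arc: lets us 'fix' earlier flow
    adj = {i: [] for i in range(n)}
    for (u, v) in residual:
        adj[u].append(v)
    flow = 0
    while True:
        parent = {s: None}               # BFS for an augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                  # no augmenting path left
        path, v = [], t                  # walk back from t to s
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        aug = min(residual[e] for e in path)   # bottleneck capacity
        for (u, v) in path:
            residual[(u, v)] -= aug
            residual[(v, u)] += aug
        flow += aug
```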

EXTREME INSTANCES. Highly connected graphs need global steps (gradient steps, MCMC, the power method); long paths / trees need many steps (min-degree, dynamic trees, DFS). Each is easy on its own, but both must be handled simultaneously.

LIMIT OF THIS APPROACH? [HK `75] (John Hopcroft, Richard Karp): if we can still route f units of flow in a unit-capacity graph, the shortest augmenting path has length at most m / f. If we can find (approximate) shortest paths dynamically: Σ_{f=1}^{m} (m / f) = O(m log n).

WORK RELATED TO FLOWS LED TO: augmenting paths [FF `55] (1956); the notion of polynomial time [Edmonds `65]; blocking flows [EK `75, Dinic `73] (1970s); dynamic tree data structures [GN `80] (1980); dual / scaling algorithms [Gabow `83, GT `88, GR `99] (1986-1999); faster solvers for graph Laplacians [Vaidya `89] (2010s).

AVERAGE TOGETHER FLOWS NUMERICALLY. Numerical notions of `approximate' shortest paths interact well with hierarchical decompositions and divide-and-conquer on graphs.

NUMERICAL MAXFLOW ALGORITHMS (Jonah Sherman, Jon Kelner, Yin-Tat Lee, Lorenzo Orecchia, Aaron Sidford, Aleksander Madry). [Sherman `13][KLOS `14]: given an operator that α-approximates maxflow for ANY demand d, can compute a (1+ε)-approximate maxflow in O(α^2 log n ε^{-2}) calls. [Madry `10][KLOS `13]: build this operator with α = O(m^θ) in O(m^{1+θ}) time. Combining gives O(m^{1+θ}); will show next: O(m log^c n).

OPERATOR? On a tree: there is a unique s-t path, and the maxflow for the demand is set by its bottleneck edge. Multiple (exact) demands: the flow along each edge is determined by a linear mapping, computable in O(n) time.
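
On a tree the operator is trivial. A sketch for the single-demand case (illustration only), where the s-t maxflow is just the minimum capacity on the unique path:

```python
def tree_maxflow(n, tree_edges, s, t):
    """On a tree the s-t maxflow is the minimum capacity on the unique
    s-t path, found here by DFS.  tree_edges: list of (u, v, capacity)."""
    adj = {i: [] for i in range(n)}
    for u, v, c in tree_edges:
        adj[u].append((v, c)); adj[v].append((u, c))
    def dfs(u, parent, bottleneck):
        if u == t:
            return bottleneck
        for v, c in adj[u]:
            if v != parent:               # a tree has no other cycles
                got = dfs(v, u, min(bottleneck, c))
                if got is not None:
                    return got
        return None
    return dfs(s, -1, float('inf'))
```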

TREE FOR ANY GRAPH? [Racke `02] (Harald Racke): ANY undirected graph has a tree that is an O(log^c n)-approximator. [RST `14] (Racke, Chintan Shah, Hanjo Taubig): such a tree can be found by solving maxflows of total size O(m log^c n).

USING THESE DECOMPOSITIONS? A chicken-and-egg problem: [RST `14] builds the α = O(log^c n) operator via maxflows of total size O(m log^c n), while [Sherman `13][KLOS `14] compute a (1+ε)-approximate maxflow via O(α^2 log n ε^{-2}) calls to such an operator. Approximator → maxflow and maxflow → approximator: can both be had in O(m log^c n ε^{-2}) time?

RESOLUTION: RECURSION. Create a smaller approximation, build a cut/flow approximator on it recursively, then use the (fixed-size) approximator for the original. Adaptive sampling acts as the driver that controls the progress of the recursion.

SIZE REDUCTION. Ultra-sparsifier: for any k, can find H ≈_k G that is a tree plus O(m log^c n / k) extra edges, e.g. [Koutis-Miller-P `10] (Yiannis Koutis, Gary Miller): pick a good tree, sample off-tree edges by their `stretch'. Reducible to O(m log^c n / k) vertices/edges.

[P `16]: INCORPORATING REDUCTIONS. Construct the operator on an ultra-sparsifier with approximation factor k: new size O(m log^c n / k), giving α = O(k log^{2c} n) for the original graph. Use: maxflow from O(α^2 log n ε^{-2}) calls; construction: α = O(log^c n) via maxflows on graphs of total size O(m log^c n). Recurrence: T(m) = T(m log^{2c} n / k) + O(m k^2 log^{2c+1} n). Setting k = O(log^{2c} n): T(m) = T(m/2) + O(m log^{4c+1} n) = O(m log^{4c+1} n).

OUTLINE: Sampling and its algorithmic incorporation; adaptive sampling for approximating maxflow; adaptive sampling for solving linear systems.

GRAPH LAPLACIANS: matrices that correspond to undirected graphs. Entries ↔ vertices: n × n matrices; non-zeros ↔ edges: O(m) non-zeros. Problem: given a graph Laplacian L and a vector b, find x s.t. Lx = b.

THE LAPLACIAN PARADIGM (Dan Spielman, Shanghua Teng). Directly related: elliptic systems. Few iterations: eigenvectors, heat kernels. Many iterations / modified algorithms: graph problems, image processing.

USE MATLAB? TRILINOS? LAPACK? Optimization problems produce sequences of (adaptively) generated linear systems. From one solver's documentation: "…we suggest rerunning the program a few times and/or using a different solver. An alternate solver based on incomplete Cholesky factorization is provided…" (Kevin Deweese)

SIMPLIFICATION. Adjust/rescale so the diagonal equals I, and add to the diagonal to make the matrix full rank: L = I - A, where A is a random-walk matrix.
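
A numpy sketch of the rescaling step (an illustration; it omits the diagonal shift that ensures full rank). After symmetric normalization the diagonal of A is 0 and its spectrum lies in [-1, 1]:

```python
import numpy as np

def normalize(L):
    """Rescale a Laplacian so its diagonal is I:
    D^{-1/2} L D^{-1/2} = I - A, where A is the (symmetrically
    normalized) random-walk matrix.  Returns A."""
    d = np.sqrt(np.diag(L))                  # sqrt of weighted degrees
    return np.eye(len(L)) - L / np.outer(d, d)
```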

LOWER BOUND FOR ITERATIVE METHODS. Division via multiplication: L^{-1} = (I - A)^{-1} = I + A + A^2 + A^3 + … Graph-theoretic interpretation: each term corresponds to one more step of the walk: b, Ab, A^2 b, … Need Ω(diameter) steps.
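
A numpy sketch of the truncated series (illustration only), where each added term is one more step of the walk:

```python
import numpy as np

def neumann_solve(A, b, steps):
    """Approximate (I - A)^{-1} b by the truncated series
    b + A b + A^2 b + ...; each term is one more step of the walk."""
    x, term = b.copy(), b.copy()
    for _ in range(steps):
        term = A @ term          # one more walk step
        x += term
    return x
```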

REPEATED SQUARING. (I - A)^{-1} = I + A + A^2 + A^3 + … = (I + A)(I + A^2)(I + A^4)… A^16 = (((A^2)^2)^2)^2: 4 operations; O(log n) terms suffice. Similar to multi-level methods. Problem: the squares are dense matrices!
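
The finite form of the factorization is exact: the product of the first d factors equals the sum of the first 2^d powers, so O(d) matrix operations capture exponentially many walk steps. A numpy check (illustration):

```python
import numpy as np

def squaring_product(A, d):
    """Compute (I + A)(I + A^2)(I + A^4)...(I + A^(2^(d-1))), which
    equals I + A + ... + A^(2^d - 1): a truncated Neumann series with
    2^d terms from only d squarings."""
    n = len(A)
    prod, P = np.eye(n), A.copy()
    for _ in range(d):
        prod = prod @ (np.eye(n) + P)
        P = P @ P                # next power: A^(2^(i+1))
    return prod
```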

DENSE INTERMEDIATE OBJECTS: matrix powers, matrix inverses, transitive closures, LU factorizations. Cost-prohibitive to store / find, but we can access a sparse version.

GRAPH-THEORETIC VIEW. A: one step of a random walk; A^2: a 2-step random walk, still a graph! [PS `14]: can directly access a sparsifier of it in O(m log^c n) time.

HIGHER POWERS. A: random walk; A^k: a k-step random walk. [CCLPT `15] (Dehua Cheng, Yu Cheng, Yan Liu, Shanghua Teng): a sparse graph close to A^k can also be computed in nearly-linear time.

SPARSIFIED SQUARING. I - A_1 ≈_ε I - A_0^2, I - A_2 ≈_ε I - A_1^2, …, I - A_i ≈_ε I - A_{i-1}^2, with I - A_d ≈ I. Convergence: (approximately) the same as repeated squaring; d = O(log(mixing time)) suffices. Here ≈ is spectral (condition-number) approximation, which implies cut approximation.

[PS `14] CHAIN → SOLVER. x = Solve(I - A_0, …, I - A_d, b): 1. b_0 ← b; for i = 1 to d, b_i ← (I + A_{i-1}) b_{i-1}. 2. x_d ← b_d. 3. For i = d-1 downto 0, x_i ← ½[b_i + (I + A_i) x_{i+1}]. Runtime: O(m log^c n log^3(mixing time)).
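
A runnable sketch of this solver using EXACT squares A_i = A_{i-1}^2 and dense matrices, so it only illustrates the recurrence, not the nearly-linear algorithm (which sparsifies every I - A_i). It rests on the exact identity (I - A)^{-1} = ½[I + (I + A)(I - A^2)^{-1}(I + A)]:

```python
import numpy as np

def chain_solve(A, b, d):
    """Approximate (I - A)^{-1} b via the squaring chain
    A_0 = A, A_i = A_{i-1}^2, using
    (I - A_i)^{-1} = (1/2)[I + (I + A_i)(I - A_{i+1})^{-1}(I + A_i)]
    and the base case (I - A_d)^{-1} ≈ I."""
    n = len(A)
    I = np.eye(n)
    chain = [A]
    for _ in range(d):
        chain.append(chain[-1] @ chain[-1])    # A_i = A_{i-1}^2
    bs = [b]
    for i in range(1, d + 1):                  # forward pass
        bs.append((I + chain[i - 1]) @ bs[i - 1])
    x = bs[d]                                  # I - A_d ≈ I
    for i in range(d - 1, -1, -1):             # backward pass
        x = 0.5 * (bs[i] + (I + chain[i]) @ x)
    return x
```

With ‖A‖ bounded away from 1, the only error is the base case, of size ‖A‖^(2^d), so a handful of levels already gives machine-precision accuracy on well-conditioned inputs.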

SPARSE BLOCK CHOLESKY. [KLPRS `16] (Rasmus Kyng, Sushant Sachdeva, et al.): repeatedly eliminate some variables, sparsifying the intermediate matrices. O(m log n) time; extends to connection Laplacians, which can be viewed as having complex weights.

EVEN MORE ADAPTIVE. [KS `16]: per-entry pivoting, almost identical to incomplete LU / ichol. Running time bound: O(m log^3 n); OPEN: improve this.

OPEN QUESTIONS. Nearly-linear time algorithms for: wider classes of linear systems; directed maximum flow. Intermediate questions: squaring-based flow algorithms / oblivious routing schemes; what can we preserve while sparsifying directed graphs?

THANK YOU!