
1 Nearly-Linear Time Algorithms for Markov Chains and New Spectral Primitives for Directed Graphs
Richard Peng Georgia Tech

2 In collaboration with
Michael B. Cohen, Jon Kelner, Rasmus Kyng, John Peebles, Anup B. Rao, Aaron Sidford, Adrian Vladu

3 Outline
Graphs and Lx = b
G ≈ H and algorithms
Sparsifying directed graphs

4 Graphs and Matrices
Graph Laplacians: for the path graph a - b - c,

L = [  1  -1   0
      -1   2  -1
       0  -1   1 ]

[ST `04] Nearly-linear time Laplacian solvers
Input: n × n undirected Laplacian L with m non-zeros, vector b
Output: vector x that ε-approximates L^+ b (L^+: pseudo-inverse of L)
Runtime: O(m log^{O(1)} n log(1/ε))
Laplacians occur in: spectral graph theory, optimization, Markov chains
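A minimal numerical sketch of the objects above (not the nearly-linear time solver; the example graph and names are mine): build the path graph's Laplacian and apply its pseudo-inverse with dense linear algebra.

```python
# Toy sketch: Laplacian of the path a - b - c, solved via the dense
# pseudo-inverse L^+ (the [ST `04] solver does this in nearly-linear time).
import numpy as np

n = 3
edges = [(0, 1, 1.0), (1, 2, 1.0)]         # path a - b - c, unit weights
L = np.zeros((n, n))
for u, v, w in edges:                      # L = D - A
    L[u, u] += w; L[v, v] += w
    L[u, v] -= w; L[v, u] -= w

b = np.array([1.0, 0.0, -1.0])             # orthogonal to the all-ones vector
x = np.linalg.pinv(L) @ b                  # x = L^+ b
print(np.allclose(L @ x, b))               # True: b is in the range of L
```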

5 The Laplacian Paradigm
At the core: solving Lx = b
Directly related: elliptic systems
Few iterations: eigenvectors, heat kernels
Many iterations / modified algorithms: flows / matchings, image processing
More recent: Laplacians.jl

6 The Undirected Laplacian Paradigm?!?

Problem                 | Undirected              | Directed
Stationary distribution | ~ degree                | as hard as Lx = b
Linear systems          | m log^{1/2} n           | n^ω (previously)
Maximum flow            | approx: m log^{O(1)} n  | m^{7/4} (unit capacities)
Transshipment           | m^{1+o(1)}              | m n^{1/2}
Oblivious routing       | O(log n)-approx         | only single source
Rand. spanning trees    | min(m^{4/3}, n^{7/3})   | n^ω
Dynamic matching        | log^{O(1)} n            | m^{1/2}

7 What makes Directed Graphs hard?
Complete bipartite graph: the reachability matrix encodes n^2 bits
Distinction more murky with numerical algorithms:
GMRES (generalized minimal residual) converges
PageRank runs on directed graphs
Nearly-linear time algorithm wish-list: low-stretch trees, sparsifiers for all cuts, graph partitions

8 Our Results
Input: n × n directed Laplacian L with m non-zeros, vector b
Output: vector x that ε-approximates L^+ b
Runtime: O(m log^{O(1)} n log(1/ε))
Directed Laplacian:
Diagonal: out-degree
Row i, column j: -w(j → i)
Alternate definition: 0 column sums (L^T 1 = 0), non-positive off-diagonals
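A hedged sketch of this definition (the 3-cycle example is mine): putting out-degrees on the diagonal and -w(j → i) in row i, column j makes every column of L sum to zero.

```python
# Toy directed Laplacian for a -> b -> c -> a; checks L^T 1 = 0.
import numpy as np

n = 3
arcs = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]   # (tail, head, weight)
L = np.zeros((n, n))
for u, v, w in arcs:
    L[u, u] += w      # diagonal: out-degree of the tail
    L[v, u] -= w      # row "head", column "tail": -w(u -> v)

print(np.allclose(np.ones(n) @ L, 0))            # True: column sums are zero
```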

9 Directed Lx = b
Key new ideas:
Approximations of directed Laplacians
Decomposing directed graphs
(Theoretical) applications:
Computing stationary distributions
PageRank clustering
Hitting / mixing / covering times

10 What makes graphs hard?
`Easy' graph classes:
Highly connected: power method, gradients / CG
Long diameter: data structures, path/tree width, min-degree
`Worst case' graphs are a combination of both
Most Ω(nm) runtimes: Ω(n) steps × Ω(m) per step

11 Why graphs and matrices?
My view of the Laplacian paradigm: take apart graphs numerically
Ideal: `globalness'-sensitive cost per step, roughly ∑_{i=1}^{n} (m / i) ≈ m log n in total
Highly connected: need global steps
Long diameter: need many steps
Examples: geometric flows / matchings, Borůvka's MST algorithm, directed tree packing
Issue: hard to decompose graphs to isolate `eventful' parts

12 Outline
Graphs and Lx = b
G ≈ H and algorithms
Sparsifying directed graphs

13 Iterative Methods for solving Lx = b
Simplification: random walks, L = I - A
Key identity / approximation: L^+ = (I - A)^+ = I + A + A^2 + A^3 + …
If ║A║_2 ≤ ρ, then (1 - ρ)^{-1} terms approximate (I - A)^+ b well
Graph interpretation: each term Ab, A^2 b, …, A^{diameter} b is one more step of the walk
Need Ω(diameter) steps
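As a sketch (the matrix and constants are illustrative), the series can be summed by the iteration x ← b + Ax, which adds one more walk step per pass:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
A /= 2 * A.sum(axis=1, keepdims=True)        # scale so ||A|| <= 1/2 < 1
b = rng.random(4)

x = np.zeros_like(b)
for _ in range(100):                         # ~ (1 - rho)^{-1} terms suffice
    x = b + A @ x                            # partial sum of I + A + A^2 + ...
print(np.allclose((np.eye(4) - A) @ x, b))   # True
```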

14 (Preconditioned) Iterative Methods
Solve LGx = b by instead solving LH-1LGx = LH-1b LH: preconditioner of LG LH = LG: x = LG+b, 1 iteration LH = I: same as no preconditioner First requirement: LG and LH operate on the same space
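A dense toy version of this loop, under my own small example (a path graph preconditioned by a cycle; here every eigenvalue of L_H^+ L_G on the relevant subspace lies in [1/4, 1], so the residual shrinks geometrically):

```python
import numpy as np

def laplacian(edges, n):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    return L

n = 4
LG = laplacian([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)], n)                # path
LH = laplacian([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)], n)   # cycle
LHp = np.linalg.pinv(LH)                 # a real solver applies this implicitly

b = np.array([1.0, -1.0, 1.0, -1.0])     # orthogonal to the all-ones vector
x = np.zeros(n)
for _ in range(80):
    x = x - LHp @ (LG @ x - b)           # one preconditioned iteration
print(np.linalg.norm(LG @ x - b))        # tiny residual
```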

15 Known null space: Eulerian case
Eulerian Laplacians: in-degree = out-degree; null space: the all-1s vector
[CKPPSV `16]: suffices to only consider Eulerian Lx = b
Previous works on Eulerian graphs:
[Chung `05]: directed Cheeger inequality
[EMPS `16]: maxflow on balanced graphs

16 [CKPPSV `16]: Reduction
Simplified case: random walk, L = I - A
s: stationary, A^T s = s, Ls = 0
Rescale L to L diag(s): L diag(s) 1 = Ls = 0, Eulerian!
(Recall the definition of directed L: 0 column sums, non-positive off-diagonals)
To solve Lx = b: solve L diag(s) y = b, return x = diag(s) y
[CKPPSV `16], algorithmic version:
Gradually remove extra diagonal entries
Sequence of log(║L^+║_2) linear systems
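A numerical check of the rescaling, under one concrete convention (my choice: A column-stochastic, so that L = I - A has zero column sums and As = s):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((4, 4))
A /= A.sum(axis=0, keepdims=True)        # columns sum to 1

w, V = np.linalg.eig(A)                  # stationary s: the eigenvector with As = s
s = np.real(V[:, np.argmin(np.abs(w - 1))])
s /= s.sum()

L = np.eye(4) - A
E = L @ np.diag(s)                       # the rescaled Laplacian L diag(s)
print(np.allclose(E @ np.ones(4), 0),    # row sums:    L diag(s) 1 = Ls = 0
      np.allclose(np.ones(4) @ E, 0))    # column sums: 1^T L = 0 already
```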

17 Convergence of Iterative Methods
Solving L_H^+ L_G x = L_H^+ b iteratively:
Iteration: x ← x - L_H^+ (L_G x - b)
Effect on the residual r: r' ← (I - L_H^+ L_G) r
H ≈ G if all singular values of L_H^+ L_G are close to 1
Primary motivation for our notion of graph approximations

18 Graph Sparsification
[ST `04][SS `08][BSS `09]: any undirected graph can be approximated by one with O(n) edges
Approximation in undirected graphs: L_H^+ L_G has all eigenvalues close to 1; implies all cuts are similar
[CKPPRSV `17]: can always find H with n log^{O(1)} n edges so L_H^+ L_G has all singular values in the range [0.9, 1.1]

19 [CKPPRSV `17]: Sparsified Squaring
L^+ = (I - A)^+ = (I + A) (I + A^2) (I + A^4) …
A: one step of the random walk; A^2: 2-step random walk
Can efficiently sparsify I - A^2 without generating it
[CKKPPSV `17]: some further control of errors via recursion, total runtime O(m^{1+α}) for any α > 0
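A quick numerical check of this identity (dense, on an illustrative random contraction; the sparsified algorithm never forms these products explicitly):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((4, 4))
A /= 2 * A.sum(axis=1, keepdims=True)    # ||A|| <= 1/2, so the product converges

prod, Ak = np.eye(4), A.copy()
for _ in range(30):
    prod = prod @ (np.eye(4) + Ak)       # multiply in the (I + A^{2^k}) factor
    Ak = Ak @ Ak                         # squaring: the 2x-longer random walk
print(np.allclose(prod, np.linalg.inv(np.eye(4) - A)))   # True
```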

20 More Dense Intermediate Objects
Matrix powers, matrix inverses, LU factorizations:
Cost-prohibitive to store / find
Instead, directly access a sparsified version

21 Sparse Gaussian Elimination
[KS `16]: per-entry pivoting, akin to ichol in MATLAB (a toy sketch of the elimination step follows below)
[CKKPPRS `17]: some partial progress towards this for directed graphs, runtime about m log^{10} n
Issue: still needs to globally sparsify intermediate graphs
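Since the slide's pseudocode was an image, here is a hedged toy of the underlying elimination step (my example, not the [KS `16] code): eliminating a vertex creates a clique on its neighbors, which [KS `16] keeps sparse by sampling rather than forming.

```python
import numpy as np

def laplacian(edges, n):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    return L

n = 4
L = laplacian([(0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0)], n)  # star centered at 0

# One step of Gaussian elimination: Schur complement onto {1, 2, 3}.
S = L[1:, 1:] - np.outer(L[1:, 0], L[0, 1:]) / L[0, 0]
print(S)   # Laplacian of a clique with weights w_i * w_j / W; [KS `16]
           # samples ~deg of these edges by weight instead of all deg^2
```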

22 Outline
Graphs and Lx = b
G ≈ H and algorithms
Sparsifying directed graphs

23 Wish-List For Directed Approximations
Behaves like ≈: symmetric, triangle inequality, invertible
Candidate: ║L_H x║_2 ≈ ║L_G x║_2 ∀ x, i.e. L_G^T L_G ≈ L_H^T L_H
Unfriendly even to perturbations (figure: a unit-weight graph vs. one with some weights changed to 2)
Need: `divide away' one copy of G

24 Norm from symmetrization
b c 1 b a 2 L 1 b a c 1 c U = ½ (L + LT) is symmetric matrix, also norm UU norm: ║M║ UU = maxx ║Mx║U / ║x║U a b c 1.5 b a 0.5 U a b 0.5 c c
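A numeric sketch of this norm (my own computation via U^{1/2}, valid on the range of U; the 3-cycle example is illustrative):

```python
import numpy as np

# directed Laplacian of the 3-cycle a -> b -> c -> a, and its symmetrization
L = np.array([[ 1.,  0., -1.],
              [-1.,  1.,  0.],
              [ 0., -1.,  1.]])
U = (L + L.T) / 2                        # undirected 3-cycle with weights 1/2

w, Q = np.linalg.eigh(U)                 # U is PSD; form U^{1/2} and its +
half = Q @ np.diag(np.sqrt(np.maximum(w, 0))) @ Q.T
print(np.linalg.norm(half @ L @ np.linalg.pinv(half), 2))   # ||L||_{U -> U}
```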

25 Symmetrizing Approximations
L_G ε-approximates L_H if ║L_G - L_H║_{U→U} ≤ ε
Implies ║I - L_H^+ L_G║_{U→U} ≤ ε
Equivalent to L_G^T U_G^+ L_G ≈ L_H^T U_H^+ L_H
Properties:
Decomposable!
Symmetric, satisfies the triangle inequality
For undirected L, same as spectral approximation
Generalizes directed expanders from [Chung `05]
Preserves commute times

26 Easy Case
G is an expander with expansion Φ ≥ log^{-O(1)} n
Cheeger's inequality: the eigenvalues of U_G are well-bounded, so the U_G→U_G norm is close to the 2-norm
Hence ║L_G - L_H║_2 ≤ ε' is sufficient, after setting ε' ← ε log^{-O(1)} n
Matrix concentration (e.g. [Tropp `12]): sampling edges by weight gives ║L_G - L_H║_2 ≤ ε'
Problem: sampling can modify degrees / null space
Fix: patching degrees arbitrarily is OK!
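A toy version of the sampling step (undirected and with illustrative constants; the real algorithm samples arcs and uses [Tropp `12]-style bounds):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 0.5
edges = [(i, j, 1.0) for i in range(n) for j in range(i + 1, n)
         if rng.random() < 0.3]             # a dense random graph ~ expander

def laplacian(edges, n):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    return L

kept = [(u, v, w / p) for u, v, w in edges if rng.random() < p]  # reweight by 1/p
LG, LH = laplacian(edges, n), laplacian(kept, n)
print(np.linalg.norm(LG - LH, 2) / np.linalg.norm(LG, 2))  # shrinks as degrees grow
```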

27 Hard case of Directed Approximations
Hard case: the directed and undirected cycles are off by a factor of n^2
Return time of a walk: O(1) in the undirected cycle, n in the directed cycle

28 Fix: only work on expanders
More general issue: the U_G-norm can be much less than the 2-norm in some parts of the graph
Fix: only work on expanders

29 Sparsification Algorithm
Partition undirected U_G so most edges are contained in expanders: [ST `04] sparsification algorithm; all except O(n log n) edges of U are contained in some expander
For each expander (Φ = 1/log^2 n):
Sample to error ε / O(log^3 n)
Fix degrees
Existence of such a partition:
While there is a sparse cut, take it, recurse on both sides
Charge edges to the smaller side
Charge per edge: O(Φ log n)
Result: ε-approximations of size O(n log^{O(1)} n ε^{-2}) in O(m log^{O(1)} n) time, and parallelizable
(A high-level sketch in code follows below.)
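A high-level, hedged outline in code: expander_partition is a hypothetical stub standing in for the [ST `04]-style partition, and the degree-fixing step is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(5)

def expander_partition(arcs):
    # hypothetical stand-in: pretend the whole graph is one expander piece;
    # the real routine recurses on sparse cuts as described above
    return [arcs]

def sparsify(arcs, keep_prob):
    out = []
    for piece in expander_partition(arcs):
        for (u, v, w) in piece:
            if rng.random() < keep_prob:           # sample inside the expander
                out.append((u, v, w / keep_prob))  # reweight to stay unbiased
    # a real version would now fix in-/out-degrees to restore Eulerian-ness
    return out

arcs = [(i, (i + 1) % 8, 1.0) for i in range(8)]   # toy directed cycle
print(sparsify(arcs, 0.5))
```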

30 Ongoing / Future Work
Faster? Sparse Gaussian elimination based algorithms?
Directed sparsification without partitioning / matrix-concentration-based sparsification?
Extensions to other problems on directed graphs?

31 Accumulation of Errors
Z_i ≈ Z_j (I + A_{j-1}) (I + A_{j-2}) … (I + A_i)
(Hiding here: lazy random walks)
Can show, for U_i = ½ (L_i + L_i^T):
U_i is a 2-approximation of U_{i+1}
L_j^+ (I + A_{j-1}) is an ε-approximation of L_{j-1}^+ w.r.t. U_{j-1}
Error accumulation: if Z_j is an ε-approximate pseudo-inverse of L_j, then Z_i is an exp(O(j - i)) ε-approximate pseudo-inverse of L_i
Directly using Z_d = I to produce Z_0: need ε < 1/poly(nR)

32 Fix: Iterative Refinement
Only go up δ levels at a time
Reduce error to exp(-O(δ)) via iterative refinement: O(δ) branching factor every δ layers
Need ε = exp(-O(δ)) in the sparsifier
Overhead: 2^{O(δ)} · O(δ)^{d/δ} ≤ 2^{O(δ + (d log d)/δ)}
Optimized at δ = d^{1/2} = log^{1/2}(nR)
Total: O(m · 2^{√(log(nR))} · log(1/ε))
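A back-of-the-envelope check of the choice δ = d^{1/2} (my derivation, dropping the log d factor):

```latex
% minimize the exponent \delta + d/\delta over \delta:
\frac{\partial}{\partial\delta}\left(\delta + \frac{d}{\delta}\right)
  = 1 - \frac{d}{\delta^{2}} = 0
\;\Longrightarrow\; \delta = \sqrt{d} = \sqrt{\log (nR)},
\qquad 2^{O(\delta + d/\delta)} = 2^{O(\sqrt{\log (nR)})}.
```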

33 (Preconditioned) Iterative Methods
Solve L_G x = b by iterating x ← x - L_H^+ (b - L_G x)
Fixed point: b = L_G x
L_H: preconditioner of L_G
L_H = L_G: x = L_G^+ b, 1 iteration
L_H = I: same as no preconditioner
First requirement: L_G and L_H have the same null space

