Solving SDD Linear Systems in Nearly m log^{1/2} n Time. Richard Peng, M.I.T. A&C Seminar, Sep 12, 2014
OUTLINE The Problem Approach 1 Approach 2 Combining these
LARGE GRAPHS Images, meshes, roads, social networks. Algorithmic challenges: How to store? How to analyze? How to optimize?
LAPLACIAN PARADIGM Graph ↔ electrical network ↔ linear systems. (Figure: resistor network with 1Ω and 2Ω edges.)
GRAPH LAPLACIAN Electrical network / weighted graph. Properties: symmetric; row/column sums = 0; non-positive off-diagonals. Correspondence: row/column ↔ vertex; off-diagonal entry ↔ -weight; diagonal entry ↔ weighted degree. (Figure: the 1Ω/2Ω network and its Laplacian.)
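To make this concrete, here is a minimal sketch (an illustration, not code from the talk) of assembling a Laplacian in Python. Edge weights are conductances, so a 1Ω resistor becomes weight 1 and a 2Ω resistor weight 1/2:

```python
import numpy as np

def laplacian(n, edges):
    """Build the graph Laplacian; edges are (u, v, weight) with 0-indexed vertices."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w            # diagonal: weighted degree
        L[v, v] += w
        L[u, v] -= w            # off-diagonal: -weight
        L[v, u] -= w
    return L

# Toy triangle: a 1-ohm edge (conductance 1) and two 2-ohm edges (conductance 1/2).
L = laplacian(3, [(0, 1, 1.0), (0, 2, 0.5), (1, 2, 0.5)])
assert np.allclose(L.sum(axis=0), 0)    # row/column sums are 0
assert np.allclose(L, L.T)              # symmetric
```

All three defining properties above can be read off directly from the construction.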
CORE ROUTINE Input: graph Laplacian L, vector b. Output: vector x s.t. Lx ≈ b. To measure performance: n = dimension = # vertices; m = nnz = O(# of edges).
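As a baseline for what this routine computes (ordinary conjugate gradient, not the fast solver this talk builds), continuing the toy triangle above:

```python
import numpy as np
from scipy.sparse.linalg import cg

# Inject one unit of current at vertex 0 and extract it at vertex 2. L is
# singular (all-ones nullspace), so b must sum to zero for Lx = b to be
# consistent; CG then finds a solution.
b = np.array([1.0, 0.0, -1.0])
x, info = cg(L, b)                      # L from the previous sketch
assert info == 0
assert np.allclose(L @ x, b, atol=1e-4)
```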
APPLICATIONS Directly related: SDD and M-matrices, elliptic PDEs. Few iterations: eigenvectors, heat kernels, clustering, inference. Many iterations: combinatorial graph problems, image processing, random trees, flow / matching.
GENERAL LINEAR SYSTEM SOLVES Oldest studied algorithmic problem: [Liu 179] … [Newton `1710]. [Gauss 1810]: O(n^3). [HS `52]: conjugate gradient, O(nm)*. [Strassen `69]: O(n^{2.8}). [Coppersmith-Winograd `90]: O(n^{2.3755}). [Stothers `10]: O(n^{2.3737}). [Vassilevska Williams `11]: O(n^{2.3727}).
COMBINATORIAL PRECONDITIONING Use the connection to graphs to design numerical solvers. [Vaidya `91]: O(m^{7/4}). [Boman-Hendrickson `01]: O(mn). [Spielman-Teng `03]: O(m^{1.31}). [Spielman-Teng `04]: O(m log^c n). Subsequent improvements: [Elkin-Emek-Spielman-Teng `05], [Andersen-Chung-Lang `06], [Koutis-Miller `07], [Spielman-Srivastava `08], [Abraham-Bartal-Neiman `08], [Andersen-Perez `09], [Batson-Spielman-Srivastava `09], [Kolla-Makarychev-Saberi-Teng `10], [Koutis-Miller-P `10, `11], [Orecchia-Vishnoi `11], [Abraham-Neiman `12], [OveisGharan-Trevisan `12], …
ZENO'S DICHOTOMY PARADOX Runtime O(m log^c n); the exponent c over time: 2004: 70; 2006: 32; 2009: 15; 2010: 6; 2010: 2; 2011: 1; 2014: 1/2. OPT: 0? [Miller]: instance of a speedup theorem? Fundamental theorem of Laplacian solvers: improvements decrease c by a factor between [2, 3].
OUTLINE The Problem Approach 1 Approach 2 Combining these
WHAT IS A SOLVE? What is b = Lx? Example entry: 3x_3 - x_2 - 2x_1 = (x_3 - x_2) + 2(x_3 - x_1). Ohm's law: current = voltage × conductance. x: voltages at vertices; flows live on edges; b: residue of electrical current. (Figure: the 1Ω/2Ω network with voltages x_2, x_3.)
WHAT IS A SOLVE? Find voltages x whose flow meets demand b. Intermediate object: flows on edges, f. [Kirchhoff 1845, Doyle-Snell `84]: the electrical flow f minimizes energy dissipation, Energy(f) = Σ_e r(e) f(e)^2.
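Continuing the toy example, a numerical check of this principle (a sketch; `L` and `b` are from the earlier snippets): the electrical flow recovered from the voltages has energy b^T x, and pushing extra flow around a cycle only increases it:

```python
import numpy as np

edges = [(0, 1, 1.0), (0, 2, 0.5), (1, 2, 0.5)]        # (u, v, conductance)
x = np.linalg.pinv(L) @ b                               # voltages
f = {(u, v): w * (x[u] - x[v]) for u, v, w in edges}    # Ohm's law on each edge
energy = sum(f[(u, v)] ** 2 / w for u, v, w in edges)   # sum_e r(e) f(e)^2, r = 1/w
assert np.isclose(energy, b @ x)

# Any other flow meeting demand b differs by circulations; pushing eps
# around the cycle 0 -> 1 -> 2 -> 0 keeps b fixed but raises the energy.
eps = 0.1
g = {e: f[e] + eps * s for e, s in [((0, 1), 1), ((1, 2), 1), ((0, 2), -1)]}
assert sum(g[(u, v)] ** 2 / w for u, v, w in edges) > energy
```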
WHAT MAKES SOLVES HARD Densely connected graphs need: sampling, error reduction. Long paths need: divide-and-conquer, data structures. Solvers need to combine both.
KELNER-ORECCHIA-SIDFORD-ZHU `13: ELECTRICAL FLOW SOLVER Start with some flow f meeting demand b; push flow along cycles in the graph to reduce energy. [KOSZ `13]: the energy of f approaches the optimum. Algorithmic challenges: How many cycles? Ω(m). How big is each cycle? Ω(n).
HOW TO FIND CYCLES? Cycle = tree + edge: pick a tree, then sample off-tree edges.
HOW TO SAMPLE EDGES Stretch = length of the tree path between an edge's endpoints / length of the edge (unweighted graph: just the length of the tree path). Key quantity: total stretch, S. [KMP `10, `11, KOSZ `13]: sample edges w.p. proportional to stretch. [KOSZ `13]: O(m + S) cycle updates halve the error in expectation. (Figure: example edges with stretch 4 and stretch 3.)
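The following toy Python version of one such cycle update is a sketch under this slide's setup, not the authors' code: `tree_cycle` builds the fundamental cycle of an off-tree edge from parent pointers, and `toggle` cancels the net potential drop around it, which is exactly the energy-minimizing push. A full solver would repeat this, sampling the off-tree edge with probability proportional to stretch.

```python
# Flow on edge (a, b) with a < b is oriented a -> b; parent[root] is None;
# r maps each edge to its resistance.

def ancestors(parent, u):
    chain = [u]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain

def tree_cycle(parent, u, v):
    """The u -> v tree path as (edge, sign) pairs; sign +1 means the
    traversal agrees with the edge's canonical a < b orientation."""
    anc = ancestors(parent, u)
    index = {w: i for i, w in enumerate(anc)}
    down, w = [], v
    while w not in index:              # climb from v until hitting u's chain
        down.append(w)
        w = parent[w]
    lca = w
    cyc = [((min(a, parent[a]), max(a, parent[a])), 1 if a < parent[a] else -1)
           for a in anc[:index[lca]]]                  # u up to the LCA
    cyc += [((min(a, parent[a]), max(a, parent[a])), 1 if parent[a] < a else -1)
            for a in reversed(down)]                   # LCA down to v
    return cyc

def toggle(f, r, parent, e):
    """One cycle update for off-tree edge e = (u, v), u < v: push flow so
    the net drop sum sign * r * f around the fundamental cycle becomes 0."""
    u, v = e
    cycle = tree_cycle(parent, u, v) + [(e, -1)]       # close the cycle v -> u
    drop = sum(s * r[d] * f[d] for d, s in cycle)
    delta = drop / sum(r[d] for d, s in cycle)
    for d, s in cycle:
        f[d] -= s * delta

# Triangle, unit resistances: tree edges (0,1), (0,2); off-tree edge (1,2).
parent = {0: None, 1: 0, 2: 0}
r = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
f = {(0, 1): 0.0, (0, 2): 1.0, (1, 2): 0.0}   # unit demand 0 -> 2, routed in tree
toggle(f, r, parent, (1, 2))                  # one off-tree edge: one toggle is exact
assert abs(f[(0, 2)] - 2/3) < 1e-9            # current splits 2/3 direct, 1/3 via 1
```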
WHAT'S A LOW STRETCH TREE? Example: an n^{1/2}-by-n^{1/2} unit-weighted mesh. Candidate 1, 'haircomb' (shortest path tree / max weight spanning tree): most edges have stretch(e) = O(1), but edges between adjacent teeth have stretch(e) = Ω(n^{1/2}), giving total stretch = Ω(n^{3/2}). Candidate 2, recursive C 'fractal': the crossing edges have stretch n^{1/2}, but only n^{1/2} such edges exist, contributing O(n); O(log n) such layers due to the fractal give total = O(n log n).
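Total stretch is easy to compute exactly on small examples; here is a sketch for the unweighted case (parent pointers encode the tree; the helper names are made up for illustration):

```python
def depth(parent, u):
    d = 0
    while parent[u] is not None:
        u, d = parent[u], d + 1
    return d

def tree_path_len(parent, u, v):
    """Number of tree edges between u and v."""
    du, dv, steps = depth(parent, u), depth(parent, v), 0
    while du > dv:                       # lift the deeper endpoint
        u, du, steps = parent[u], du - 1, steps + 1
    while dv > du:
        v, dv, steps = parent[v], dv - 1, steps + 1
    while u != v:                        # climb in lockstep to the LCA
        u, v, steps = parent[u], parent[v], steps + 2
    return steps

def total_stretch(parent, off_tree_edges):
    return sum(tree_path_len(parent, u, v) for u, v in off_tree_edges)

# Path tree 0-1-2-3 with the single off-tree edge (0, 3): stretch 3.
parent = {0: None, 1: 0, 2: 1, 3: 2}
assert total_stretch(parent, [(0, 3)]) == 3
```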
LOW STRETCH TREES

|                 | Runtime        | Stretch                      |
| AKPW `91        | O(m log log n) | exp((log n log log n)^{1/2}) |
| Bartal `96, `98 | O(m log n)     | O(log n log log n)           |
| FRT `03         | O(m log^3 n)   | O(log n)                     |
| EEST `05        | O(m log^2 n)   | O(log^2 n log log n)         |
| ABN `08         | O(m log n)     | O(log n (log log n)^c)       |
| AN `12          | O(m log n)     | O(log n log log n)           |

Bartal / FRT trees have Steiner vertices; [KOSZ `13]: embeddable trees are OK! [CKMPX `14]: construction in O(m log log n) time. Will assume S = O(m log n) (actual bounds are more intricate).
[KOSZ `13] IN A NUTSHELL

|                        | [KOSZ `13]                                     |
| Converges on           | Flows                                          |
| Sampling distribution  | Stretch                                        |
| Sample size            | 1                                              |
| # updates              | O(S + m)                                       |
| Cost per update        | O(n); O(log n) using data structures           |
| Total (S = O(m log n)) | O(mn log n); O(m log^2 n) with data structures |
OUTLINE The Problem Approach 1 Approach 2 Combining these
NUMERICAL VIEW OF SOLVERS Iterative methods: given H similar to G, solve L_G x = b by solving several systems L_H y = r. Similar: L_G ≼ L_H ≼ k·L_G, where ≼ is the Loewner ordering (A ≼ B iff x^T A x ≤ x^T B x for all x). Chebyshev iteration: if L_G ≼ L_H ≼ k·L_G, can halve the error in L_G x = b by solving O(k^{1/2}) problems in H to O(1/poly(k)) accuracy.
NUMERICAL VIEWS OF SOLVERS Preconditioner construction view: given G, find H that's easier to solve such that L_G ≼ L_H ≼ k·L_G. Chebyshev iteration (ignoring errors): if L_G ≼ L_H ≼ k·L_G, can solve a problem in G by solving O(k^{1/2}) problems in H. Perturbation stability view: can adjust edge weights by factors in [1, k]; need to solve O(k^{1/2}) problems on the resulting graph.
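Here is a sketch of this outer loop in Python, using plain preconditioned Richardson rather than Chebyshev (Chebyshev adds a momentum term but has the same one-multiply-in-G, one-solve-in-H structure). The toy G is the triangle from before; H is its spanning tree scaled up by 1 + stretch = 2.5, so L_G ≼ L_H ≼ 2.5·L_G holds:

```python
import numpy as np

def precond_iterate(LG, solve_H, b, iters):
    """Preconditioned Richardson: each step costs one multiply by L_G plus
    one (possibly approximate) solve in the easier system H."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + solve_H(b - LG @ x)      # residual correction through H
    return x

LG = np.array([[ 1.5, -1.0, -0.5],
               [-1.0,  1.5, -0.5],
               [-0.5, -0.5,  1.0]])      # triangle from the earlier sketches
LT = np.array([[ 1.5, -1.0, -0.5],
               [-1.0,  1.0,  0.0],
               [-0.5,  0.0,  0.5]])      # spanning tree: edges (0,1), (0,2)
LH = 2.5 * LT                            # scaled so L_G ≼ L_H ≼ 2.5·L_G
b = np.array([1.0, 0.0, -1.0])

# With L_G ≼ L_H ≼ k·L_G the error contracts by (1 - 1/k) per step, so O(k)
# Richardson steps, or O(k^{1/2}) Chebyshev steps, suffice to halve it.
LH_pinv = np.linalg.pinv(LH)             # exact solve in H, for testing only
x = precond_iterate(LG, lambda r: LH_pinv @ r, b, iters=50)
assert np.allclose(LG @ x, b, atol=1e-6)
```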
GENERATING H [KOSZ `13]: sample 1 edge, reduce error by a factor of 1 - 1/(m + S); O(m + S) samples halve the error. Matrix Chernoff: O(S log n) samples give L_G ≼ 2L_H ≼ 3L_G, so we can halve the error in L_G x = b via O(1) solves in H.
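A sketch of this sampling step (hypothetical helper, illustration only; in the full algorithm the tree itself is also rescaled): keep the tree, draw q = O(S log n) off-tree edges with probability proportional to stretch, and reweight so the sampled part matches the off-tree part of G in expectation:

```python
import random

def sample_preconditioner(tree_edges, off_edges, stretch, q):
    """tree_edges / off_edges: lists of (u, v, w); stretch: one value per
    off-tree edge; q: number of samples, O(S log n) by matrix Chernoff."""
    S = sum(stretch)
    probs = [s / S for s in stretch]
    H = {(u, v): w for u, v, w in tree_edges}                # always keep the tree
    for i in random.choices(range(len(off_edges)), weights=probs, k=q):
        u, v, w = off_edges[i]
        H[(u, v)] = H.get((u, v), 0.0) + w / (q * probs[i])  # unbiased reweighting
    return H

# Triangle again: one off-tree edge of stretch 1.5 is sampled every draw,
# and the reweighting returns exactly its original weight.
H = sample_preconditioner([(0, 1, 1.0), (0, 2, 0.5)], [(1, 2, 0.5)], [1.5], q=4)
assert H[(1, 2)] == 0.5
```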
HOW DOES THIS HELP? Going from G to H gives m' = O(S log n) off-tree edges; with S = O(m log n), that's m' > m. [KMP `10]: take the perturbation stability view and scale up the tree by some factor k: factor-k distortion costs O(k^{1/2}) iterations, while S' = S/k, i.e. the total stretch decreases by a factor of k.
SIZE REDUCTION Scale up the tree by a factor of k so S' = S/k; sample m' = O(S' log n) = O(S log n / k) edges. Result: n - 1 tree edges + m' off-tree edges; repeatedly removing degree-1 and degree-2 vertices gives new size O(S log n / k). Recurrence: T(m, S) = O(k^{1/2}) · (m + T(S log n / k, S / k)). (Can show: errors don't accumulate in the recursion.)
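The degree-1/degree-2 elimination is ordinary series circuit reduction; a sketch in Python (assuming zero demand at the eliminated vertices; the adjacency structure is a dict of conductances):

```python
def reduce_graph(adj):
    """Repeatedly eliminate degree-1 and degree-2 vertices: a leaf can be
    cut, and a degree-2 vertex is a series circuit, so its conductances
    w1, w2 merge into w1*w2/(w1 + w2). adj: {u: {v: conductance}}."""
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            if u not in adj:
                continue                           # already eliminated this pass
            nbrs = adj[u]
            if len(nbrs) == 1:                     # degree 1: cut the leaf
                (v,) = nbrs
                del adj[v][u], adj[u]
                changed = True
            elif len(nbrs) == 2:                   # degree 2: series merge
                (a, wa), (b, wb) = nbrs.items()
                del adj[a][u], adj[b][u], adj[u]
                w = wa * wb / (wa + wb)
                adj[a][b] = adj[a].get(b, 0.0) + w # parallel edges add up
                adj[b][a] = adj[a][b]
                changed = True
    return adj

# K4 with edge (0,1) subdivided by vertex 4 (two series edges of conductance
# 2 equal one edge of conductance 1): the reduction splices 4 back out.
adj = {0: {2: 1.0, 3: 1.0, 4: 2.0}, 1: {2: 1.0, 3: 1.0, 4: 2.0},
       2: {0: 1.0, 1: 1.0, 3: 1.0}, 3: {0: 1.0, 1: 1.0, 2: 1.0},
       4: {0: 2.0, 1: 2.0}}
reduce_graph(adj)
assert adj[0][1] == 1.0 and 4 not in adj
```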
TWO TERM RECURRENCE? T(m, S) = O(k^{1/2}) · (m + T(S log n / k, S / k)). Key invariant from [KMP `11], 'spine-heavy': S = m/O(log n). These are really the same parameter! This yields a W-cycle algorithm.
TIME ON SPINE-HEAVY GRAPHS T(m) = O(k^{1/2}) · (m + T(m / k)) = O(m). Low stretch spanning tree: S_init = m log n; initial scaling by O(log^2 n) gives O(log n) overhead. The more important runtime parameter: how small must S be to get an O(m) runtime?
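Why the one-term recurrence solves to O(m): unrolling it gives a geometric sum (a quick check, writing c for the constant hidden in the O(k^{1/2}) factor):

```latex
T(m) = c\sqrt{k}\,m + \sqrt{k}\,T(m/k)
     = c\sqrt{k}\,m\,\bigl(1 + k^{-1/2} + k^{-1} + \cdots\bigr)
     \le \frac{c\sqrt{k}}{1 - k^{-1/2}}\,m = O(m) \quad \text{for any fixed } k > 1.
```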
NUMERICAL SOLVER

|                        | [KMP `10, `11]  |
| Converges on           | Vectors         |
| Sampling distribution  | Stretch         |
| Cost per update        | O(1) amortized* |
| O(m) 'steps' when      | S = O(m/log n)  |
| Increase when S → kS   | O(k^{1/2})      |
| Total (S = O(m log n)) | O(m log n)      |

*All updates are matrix-vector multiplications.
Byproduct of this view: solving to O(log^c n) error is 'easy'; expected convergence becomes w.h.p. via checking, with O((log log n)^c) overhead.
NUMERICAL VS. COMBINATORIAL

|                        | [KMP `10, `11] | [KOSZ `13]   |
| Converges on           | Vectors        | Flows        |
| Sampling distribution  | Stretch        | Stretch      |
| Cost per update        | O(1) amortized | O(log n)     |
| O(m) 'steps' when      | S = O(m/log n) | S = O(m)     |
| Increase when S → kS   | O(k^{1/2})     | O(k)         |
| Total (S = O(m log n)) | O(m log n)     | O(m log^2 n) |
COMMONALITY: RANDOMNESS

|                          | [KMP `10, `11]                                | [KOSZ `13]                                                 |
| Sampling distribution    | Stretch                                       | Stretch                                                    |
| Analysis method          | Matrix Chernoff + black-box numerical methods | Expected convergence, single potential function            |
| Sample count             | arbitrary, O(m)                               | 1                                                          |
| Overhead for convergence | O(log n) (union bound on n dimensions)        | O(1)                                                       |
| Analogs                  | Multigrid methods, preconditioning            | Stochastic methods, Kaczmarz iteration, randomized descent |

Reason for randomness: need to handle the complete graph, and the easiest expander constructions are randomized. Stretch is the 'right' parameter when we use trees, due to the Sherman-Morrison formula.
NUMERICAL VS. COMBINATORIAL

|                      | [KMP `10, `11] | [KOSZ `13] |
| Cost per update      | O(1) amortized | O(log n)   |
| O(m) 'steps' when    | S = O(m/log n) | S = O(m)   |
| Increase when S → kS | O(k^{1/2})     | O(k)       |

[KMP `10, `11]: + sublinear sample count; + recursive, O(1) per update; + overhead (S/m)^{1/2}; - L_G ≼ L_H ≼ k·L_G costs O(log n) overhead; - recursive error propagation.
[KOSZ `13]: + adaptive convergence; + simple error propagation; - Ω(m) samples; - data structure, O(log n) per update; - linear dependence on S.
COMBINE THESE?

|                        | This talk         |
| Converges on           | Vectors and flows |
| Sampling distribution  | Stretch           |
| Cost per update        | O(1) amortized    |
| O(m) 'steps' when      | S = O(m)          |
| Increase when S → kS   | O(k^{1/2})        |
| Total (S = O(m log n)) | O(m log^{1/2} n)  |

Can the adaptive convergence guarantees work with the recursive Chebyshev? Consequence: T(m, S) = O(k^{1/2}) · (m + T(S / k, S / k)).
OUTLINE The Problem Approach 1 Approach 2 Combining these
DIRECT MODIFICATION Use Chebyshev iteration outside of [KOSZ `13]?

|                        | [KOSZ `13]   | Thought experiment   |
| Converges on           | Flows        | Flows                |
| Sampling distribution  | Stretch      | Stretch              |
| Cost per update        | O(log n)     | O(log n)             |
| O(m) 'steps' when      | S = O(m)     | S = O(m)             |
| Increase when S → kS   | O(k)         | O(k^{1/2}) ???       |
| Total (S = O(m log n)) | O(m log^2 n) | O(m log^{3/2} n) ??? |

This is within (log log n)^c of [LS `13], which also uses Chebyshev-like methods. Our analyses do make this rigorous.
CHALLENGES

|                 | [KMP `10, `11] | [KOSZ `13] |
| Sample size     | m/k            | 1          |
| Converges on    | Vectors        | Flows      |
| Cost per update | O(1) amortized | O(log n)   |

1. Multiple updates, instead of a few
2. Flows vs. voltages
3. Remove data structure overhead
BATCHED CYCLE UPDATES Each cycle update lowers the flow energy; applying one after another is still a valid flow update when considering both, and updating together can only be better! But this only gives flows, while Chebyshev iteration works with voltages.
MULTI-EDGE SAMPLING The expected error reduction from [KOSZ `13] extends to multiple edges, in voltage space. Major modifications: voltages instead of flows; a sublinear number of samples, O(m + S) → O(S), with the O(m) term absorbed into the iterative method; prove this only for iterative refinement and reduce to black-box Chebyshev, a simpler analysis at the cost of (log log n)^c factors.
WHAT HAPPENED TO THE DATA STRUCTURE? Path updates/queries in a tree take O(log n) time; this cost is amortized into vertex elimination and recursion.
GETTING RID OF LOG Recursive Chebyshev: T(m) = k^{1/2} · (m + T(m / k)). The call structure is 'bottom-light': the total size of bottom-level calls is o(m), so the cost is dominated by the top-level size, O(m), instead of being the same at all O(log n) levels.
M LOG^{1/2} N

|                        | [KMP `10, `11] | [KOSZ `13]   | [CKMPPRX `14]                  |
| Converges on           | Vectors        | Flows        | Vectors/flows                  |
| Sampling distribution  | Stretch        | Stretch      | Stretch                        |
| Cost per update        | O(1) amortized | O(log n)     | O(1) amortized                 |
| O(m) 'steps' when      | S = O(m/log n) | S = O(m)     | S = O(m/(log log n)^c)         |
| Increase when S → kS   | O(k^{1/2})     | O(k)         | O(k^{1/2})                     |
| Total (S = O(m log n)) | O(m log n)     | O(m log^2 n) | O(m log^{1/2} n (log log n)^c) |

Expected convergence as shown in [KOSZ `13]: it is a general phenomenon, and it interacts well with numerical algorithms in recursive settings.
HIDDEN UNDER THE RUG Error propagation in the recursion; vector-based analysis of Chebyshev iteration (instead of operators); numerical stability analysis; construction of better trees by discounting higher stretch.
OPEN PROBLEMS log log n factors: how many are real? Extend this analysis? Numerical stability: is double precision OK? O(m) time algorithm (maybe with O(m log n) pre-processing)? Other notions of stretch? Beyond Laplacians? Code packages 2.0.
THANK YOU! Questions?