Matrix Martingales in Randomized Numerical Linear Algebra. Speaker: Rasmus Kyng, Harvard University. Sep 27, 2018
Concentration of Scalar Random Variables Random $X=\sum_i X_i$, where the $X_i\in\mathbb{R}$ are independent and $\mathbb{E}[X_i]=0$. Is $X\approx 0$ with high probability?
Concentration of Scalar Random Variables Bernstein's Inequality: random $X=\sum_i X_i$ with $X_i\in\mathbb{R}$; the $X_i$ are independent; $\mathbb{E}[X_i]=0$; $|X_i|\le r$; $\sum_i\mathbb{E}[X_i^2]\le\sigma^2$. This gives $\mathbb{P}[|X|>\varepsilon]\le 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)$. E.g. if $\varepsilon=0.5$ and $r,\sigma^2=0.1/\log(1/\tau)$, the bound is $\le\tau$.
Concentration of Scalar Martingales Random $X=\sum_i X_i$ with $X_i\in\mathbb{R}$. Drop independence: instead of "$X_i$ independent, $\mathbb{E}[X_i]=0$", require only $\mathbb{E}[X_i\mid X_1,\dots,X_{i-1}]=0$, i.e. $\mathbb{E}[X_i\mid\text{previous steps}]=0$.
Concentration of Scalar Martingales Freedman's Inequality generalizes Bernstein's: replace "$X_i$ independent, $\mathbb{E}[X_i]=0$" by $\mathbb{E}[X_i\mid\text{previous steps}]=0$, and replace "$\sum_i\mathbb{E}[X_i^2]\le\sigma^2$" by $\mathbb{P}\big[\sum_i\mathbb{E}[X_i^2\mid\text{previous steps}]>\sigma^2\big]\le\delta$; the tail bound picks up an extra $+\delta$.
Concentration of Scalar Martingales Freedman's Inequality: random $X=\sum_i X_i$ with $X_i\in\mathbb{R}$; $\mathbb{E}[X_i\mid\text{previous steps}]=0$; $|X_i|\le r$; $\mathbb{P}\big[\sum_i\mathbb{E}[X_i^2\mid\text{previous steps}]>\sigma^2\big]\le\delta$. This gives $\mathbb{P}[|X|>\varepsilon]\le 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)+\delta$.
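A minimal numeric sanity check of the scalar bound (my illustration, not from the talk): simulate a sum of bounded zero-mean increments satisfying the conditions above (here the increments are even independent, the simplest martingale case) and compare the empirical tail with the Bernstein/Freedman expression. The parameters k, r, eps are illustrative choices.

```python
# Empirical check of the scalar Bernstein/Freedman tail bound (illustrative).
import numpy as np

rng = np.random.default_rng(0)
k, trials = 200, 20000
r = 0.01                       # |X_i| <= r
sigma2 = k * r**2              # here sum_i E[X_i^2 | past] = k r^2 <= sigma^2
eps = 0.5

# Increments are +-r with equal probability. Independent zero-mean increments
# are a special case of a martingale difference sequence.
steps = rng.choice([-r, r], size=(trials, k))
X = steps.sum(axis=1)

empirical = np.mean(np.abs(X) > eps)
bound = 2 * np.exp(-(eps**2 / 2) / (r * eps + sigma2))
print(f"empirical tail {empirical:.5f} <= bound {bound:.5f}")
```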
Concentration of Matrix Random Variables Matrix Bernstein's Inequality (Tropp 11): random $X=\sum_i X_i$ with $X_i\in\mathbb{R}^{d\times d}$ symmetric; the $X_i$ are independent; $\mathbb{E}[X_i]=0$; $\|X_i\|\le r$; $\big\|\sum_i\mathbb{E}[X_i^2]\big\|\le\sigma^2$. This gives $\mathbb{P}[\|X\|>\varepsilon]\le d\cdot 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)$.
Concentration of Matrix Martingales Matrix Freedman's Inequality (Tropp 11): random $X=\sum_i X_i$ with $X_i\in\mathbb{R}^{d\times d}$ symmetric; $\mathbb{E}[X_i\mid\text{previous steps}]=0$; $\|X_i\|\le r$; $\mathbb{P}\big[\big\|\sum_i\mathbb{E}[X_i^2\mid\text{prev. steps}]\big\|>\sigma^2\big]\le\delta$. This gives $\mathbb{P}[\|X\|>\varepsilon]\le d\cdot 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)+\delta$. E.g. if $\varepsilon=0.5$ and $r,\sigma^2=0.1/\log(d/\tau)$, the bound is $\le\tau+\delta$.
Concentration of Matrix Martingales Matrix Freedman's Inequality (Tropp 11): the quantity $\sum_i\mathbb{E}[X_i^2\mid\text{prev. steps}]$ appearing in the condition $\mathbb{P}\big[\big\|\sum_i\mathbb{E}[X_i^2\mid\text{prev. steps}]\big\|>\sigma^2\big]\le\delta$ is the predictable quadratic variation.
Laplacian Matrices Graph $G=(V,E)$ with edge weights $w:E\to\mathbb{R}_+$; $n=|V|$, $m=|E|$. $L=\sum_{e\in E}L_e$, an $n\times n$ matrix. [figure: example graph on vertices $a$, $b$, $c$]
Laplacian Matrices Graph $G=(V,E)$ with edge weights $w:E\to\mathbb{R}_+$; $n=|V|$, $m=|E|$. $A$: weighted adjacency matrix of the graph, $A_{ij}=w_{ij}$. $D$: diagonal matrix of weighted degrees, $D_{ii}=\sum_j w_{ij}$. $L=D-A$.
Laplacian Matrices $L$ is a symmetric matrix, all off-diagonals are non-positive, and $L_{ii}=-\sum_{j\ne i}L_{ij}$, i.e. every row sums to zero.
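A small sketch of the definitions above (my own helper, not from the slides): build $L=D-A$ from a weighted edge list and confirm the row sums vanish.

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian L = D - A of a weighted undirected graph.
    edges: list of (u, v, w) with 0-indexed vertices and weight w > 0."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w          # degree contributions (D)
        L[v, v] += w
        L[u, v] -= w          # adjacency contributions (-A)
        L[v, u] -= w
    return L

# Example: a triangle with edge weights 1, 1, 2.
L = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)])
print(L)
print("row sums:", L.sum(axis=1))   # all zero, as required
```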
Laplacian of a Graph [figure: example graph with edge weights 1, 1, 2 and its Laplacian]
Approximation between graphs PSD order: $A\preceq B$ iff for all $x$, $x^\top A x\le x^\top B x$. Matrix approximation: $A\approx_\varepsilon B$ iff $A\preceq(1+\varepsilon)B$ and $B\preceq(1+\varepsilon)A$. Graphs: $G\approx_\varepsilon H$ iff $L_G\approx_\varepsilon L_H$. Implies multiplicative approximation to cuts.
Sparsification of Laplacians Every graph $G^{(0)}$ on $n$ vertices, for any $\varepsilon\in(0,1)$, has a sparsifier $G^{(1)}\approx_\varepsilon G^{(0)}$ with $O(n\varepsilon^{-2})$ edges [BSS'09, LS'17]. Sampling edges independently by leverage scores w.h.p. gives a sparsifier with $O(n\varepsilon^{-2}\log n)$ edges [Spielman-Srivastava'08]. Many applications!
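A hedged sketch in the spirit of the leverage-score sampling result quoted above: sample each edge independently with probability proportional to its leverage score and reweight by $1/p_e$. The oversampling constant and the dense pseudoinverse are illustrative shortcuts, not the algorithm of [Spielman-Srivastava'08].

```python
import numpy as np

def sparsify_by_leverage(n, edges, eps, rng):
    """Independent leverage-score sampling (illustrative, dense linear algebra)."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    Lpinv = np.linalg.pinv(L)
    C = 9.0 * np.log(n) / eps**2            # oversampling constant: a placeholder
    kept = []
    for u, v, w in edges:
        b = np.zeros(n); b[u], b[v] = 1.0, -1.0
        lev = w * (b @ Lpinv @ b)            # leverage score of edge (u, v)
        p = min(1.0, C * lev)
        if rng.random() < p:
            kept.append((u, v, w / p))       # keep and reweight by 1/p
    return kept
```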
Graph Semi-Streaming Sparsification $m$ edges of a graph arriving as a stream $(a_1,b_1,w_1),(a_2,b_2,w_2),\dots$ GOAL: maintain an $\varepsilon$-sparsifier, minimize working space (and time). [Ahn-Guha '09]: $O(n\varepsilon^{-2}\log n\log\tfrac{m}{n})$ space, a cut $\varepsilon$-sparsifier with $O(n\varepsilon^{-2}\log n\log\tfrac{m}{n})$ edges. Used scalar martingales.
Independent Edge Samples $L_e^{(1)}=\tfrac{1}{p_e}L_e^{(0)}$ with probability $p_e$, and $L_e^{(1)}=0$ otherwise. Expectation: $\mathbb{E}[L_e^{(1)}]=L_e^{(0)}$. Zero-mean: $X_e^{(1)}=L_e^{(1)}-L_e^{(0)}$, and $X^{(1)}=L^{(1)}-L^{(0)}=\sum_e X_e^{(1)}$.
Matrix Martingale? Zero mean? So what if we just sparsify again? It's a martingale! Every time we accumulate $20n\varepsilon^{-2}\log n$ edges, sparsify down to $10n\varepsilon^{-2}\log n$, and continue.
Repeated Edge Sampling $L_e^{(i)}=\tfrac{1}{p_e^{(i-1)}}L_e^{(i-1)}$ with probability $p_e^{(i-1)}$, and $L_e^{(i)}=0$ otherwise. Expectation: $\mathbb{E}[L_e^{(i)}]=L_e^{(i-1)}$. Zero-mean: $X_e^{(i)}=L_e^{(i)}-L_e^{(i-1)}$, and $X^{(i)}=L^{(i)}-L^{(i-1)}=\sum_e X_e^{(i)}$.
Repeated Edge Sampling This gives a martingale: $\mathbb{E}[X^{(i)}\mid\text{previous steps}]=0$.
Repeated Edge Sampling It works, and it couldn't be simpler. [K.PPS'17]: $O(n\varepsilon^{-2}\log n)$ space, an $\varepsilon$-sparsifier with $O(n\varepsilon^{-2}\log n)$ edges. Shaves a log off many algorithms that do repeated matrix sampling…
Approximation? $L_H\approx_{0.5}L_G$ means $L_H\preceq 1.5\,L_G$ and $L_G\preceq 1.5\,L_H$. Is $\|L_H-L_G\|\le 0.5$ the right condition? Work instead with the relative norm: $\|L_G^{-1/2}(L_H-L_G)L_G^{-1/2}\|\le 0.1$.
Isotropic position Define $\widetilde{Z}:=L_G^{-1/2}\,Z\,L_G^{-1/2}$. The goal is now $\|\widetilde{L}_H-\widetilde{L}_G\|\le 0.1$, i.e. $\|\widetilde{L}_H-I\|\le 0.1$.
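A sketch of how the error in isotropic position can be measured numerically (dense linear algebra, illustration only; assumes $G$ and $H$ are connected graphs on the same vertex set, so their Laplacians share the all-ones nullspace).

```python
import numpy as np

def isotropic_error(L_G, L_H):
    """Spectral norm of L_G^{+/2} (L_H - L_G) L_G^{+/2}, restricted to range(L_G)."""
    w, V = np.linalg.eigh(L_G)
    nz = w > 1e-9 * w.max()                  # drop the all-ones nullspace
    S = V[:, nz] / np.sqrt(w[nz])            # eigenvectors scaled by 1/sqrt(eigenvalue)
    M = S.T @ (L_H - L_G) @ S                # this is tilde(L_H) - I on range(L_G)
    return np.linalg.norm(M, 2)

# If isotropic_error(L_G, L_H) <= eps, then (1 - eps) L_G <= L_H <= (1 + eps) L_G.
```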
Concentration of Matrix Martingales Matrix Freedman's Inequality, applied: random $X=\sum_i\sum_e X_e^{(i)}$ with $X_e^{(i)}\in\mathbb{R}^{d\times d}$ symmetric; $\mathbb{E}[X_e^{(i)}\mid\text{previous steps}]=0$; $\|X_e^{(i)}\|\le r$; $\mathbb{P}\big[\big\|\sum_i\sum_e\mathbb{E}[(X_e^{(i)})^2\mid\text{prev. steps}]\big\|>\sigma^2\big]\le\delta$. This gives $\mathbb{P}[\|X\|>\varepsilon]\le d\cdot 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)+\delta$.
Norm Control Can there be many edges with large norm? In isotropic position, $\sum_e\|\widetilde{L}_e^{(0)}\|=\sum_e\operatorname{Tr}\widetilde{L}_e^{(0)}=\operatorname{Tr}\sum_e\widetilde{L}_e^{(0)}=\operatorname{Tr}\widetilde{L}^{(0)}=\operatorname{Tr}I=n-1$. Can afford $r=\tfrac{1}{\log n}$, so take $p_e=\|\widetilde{L}_e^{(0)}\|\log n$, giving $\sum_e p_e\le n\log n$. Recall $L_e^{(1)}=\tfrac{1}{p_e}L_e^{(0)}$ with probability $p_e$, and $0$ otherwise.
Norm Control Need $\|X_e^{(i)}\|\le r$. If we lose track of the original graph, we can't estimate the norm. Stopping time argument: stop the martingale before it enters the bad regions.
Variance Control $\mathbb{E}[(X_e^{(1)})^2]\preceq r\,L_e^{(0)}$, so $\sum_e\mathbb{E}[(X_e^{(1)})^2]\preceq\sum_e r\,L_e^{(0)}=rI$ (in isotropic position). But then edges get resampled: roughly a geometric sequence for the variance.
Resparsification Every time we accumulate $20n\varepsilon^{-2}\log n$ edges, sparsify down to $10n\varepsilon^{-2}\log n$, and continue (a code sketch of this loop follows below). [K.PPS'17]: $O(n\varepsilon^{-2}\log n)$ space, an $\varepsilon$-sparsifier with $O(n\varepsilon^{-2}\log n)$ edges. Shaves a log off many algorithms that do repeated matrix sampling…
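A hedged sketch of the resparsification loop described above: buffer incoming edges and, whenever the buffer exceeds roughly $20n\varepsilon^{-2}\log n$, sparsify the graph held so far and continue. The `sparsify` argument stands for any reweighting sampler (for instance the leverage-score sketch earlier); the threshold echoes the slide but is otherwise a placeholder, and the exact buffer-size control of [K.PPS'17] is not reproduced.

```python
import numpy as np

def streaming_sparsify(n, edge_stream, eps, sparsify, rng):
    """Resparsify-as-you-go over a stream of (u, v, w) edges (illustrative)."""
    high = int(20 * n * np.log(n) / eps**2)     # buffer threshold from the slide
    buf = []
    for e in edge_stream:
        buf.append(e)
        if len(buf) > high:
            buf = sparsify(n, buf, eps, rng)    # resample the accumulated graph
    return sparsify(n, buf, eps, rng)
```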
Laplacian Linear Equations [ST04]: solving Laplacian linear equations in $\widetilde{O}(m)$ time ⋮ [KS16]: a simple algorithm
Solving a Laplacian Linear Equation $Lx=b$ Gaussian Elimination: find $\mathfrak{U}$, an upper triangular matrix, s.t. $\mathfrak{U}^\top\mathfrak{U}=L$. Then $x=\mathfrak{U}^{-1}\mathfrak{U}^{-\top}b$. It is easy to apply $\mathfrak{U}^{-1}$ and $\mathfrak{U}^{-\top}$.
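A sketch of the exact version above using standard library routines (an illustration, not the talk's code): ground one vertex so the Laplacian becomes positive definite, factor it, and solve with two triangular solves. Assumes the graph is connected and the right-hand side sums to zero.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def solve_grounded_laplacian(L, b):
    """Solve L x = b for a connected-graph Laplacian L, with sum(b) = 0."""
    Lg, bg = L[:-1, :-1], b[:-1]              # ground the last vertex
    U = cholesky(Lg, lower=False)             # upper triangular, U^T U = Lg
    y = solve_triangular(U, bg, trans='T', lower=False)   # solve U^T y = b
    x = solve_triangular(U, y, lower=False)                # solve U x = y
    return np.append(x, 0.0)                  # grounded vertex gets potential 0
```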
Solving a Laplacian Linear Equation $Lx=b$ Approximate Gaussian Elimination: find $\mathfrak{U}$, an upper triangular matrix, s.t. $\mathfrak{U}^\top\mathfrak{U}\approx_{0.5}L$. $\mathfrak{U}$ is sparse. $O(\log\tfrac{1}{\varepsilon})$ iterations to get an $\varepsilon$-approximate solution $\tilde{x}$.
Approximate Gaussian Elimination Theorem [KS]: when $L$ is a Laplacian matrix with $m$ non-zeros, we can find, in $O(m\log^3 n)$ time, an upper triangular matrix $\mathfrak{U}$ with $O(m\log^3 n)$ non-zeros, s.t. w.h.p. $\mathfrak{U}^\top\mathfrak{U}\approx_{0.5}L$.
Additive View of Gaussian Elimination Find $\mathfrak{U}$, an upper triangular matrix, s.t. $\mathfrak{U}^\top\mathfrak{U}=M$. [figure: example matrix $M$]
Additive View of Gaussian Elimination Find the rank-1 matrix that agrees with 𝑴 on the first row and column.
Additive View of Gaussian Elimination Subtract the rank 1 matrix. We have eliminated the first variable.
Additive View of Gaussian Elimination The remaining matrix is PSD.
Additive View of Gaussian Elimination Find rank-1 matrix that agrees with our matrix on the next row and column.
Additive View of Gaussian Elimination Subtract the rank 1 matrix. We have eliminated the second variable.
Additive View of Gaussian Elimination Repeat until all of $M$ is written as rank-1 terms.
Additive View of Gaussian Elimination Repeat until all of $M$ is written as rank-1 terms: $M=\mathfrak{U}^\top\mathfrak{U}$.
Additive View of Gaussian Elimination What is special about Gaussian Elimination on Laplacians? The remaining matrix is always Laplacian.
Additive View of Gaussian Elimination What is special about Gaussian Elimination on Laplacians? The remaining matrix is always Laplacian. A new Laplacian!
Why is Gaussian Elimination Slow? Solving $Lx=b$ by Gaussian Elimination can take $\Omega(n^3)$ time. The main issue is fill.
Why is Gaussian Elimination Slow? Solving $Lx=b$ by Gaussian Elimination can take $\Omega(n^3)$ time. The main issue is fill: eliminating a vertex $v$ creates a clique on the neighbors of $v$ in the new Laplacian.
Why is Gaussian Elimination Slow? Solving $Lx=b$ by Gaussian Elimination can take $\Omega(n^3)$ time. The main issue is fill, but Laplacian cliques can be sparsified!
Gaussian Elimination Pick a vertex 𝑣 to eliminate Add the clique created by eliminating 𝑣 Repeat until done
Approximate Gaussian Elimination Pick a vertex 𝑣 to eliminate Add the clique created by eliminating 𝑣 Repeat until done
Approximate Gaussian Elimination Pick a random vertex 𝑣 to eliminate Add the clique created by eliminating 𝑣 Repeat until done
Approximate Gaussian Elimination Pick a random vertex 𝑣 to eliminate Sample the clique created by eliminating 𝑣 Repeat until done Resembles randomized Incomplete Cholesky
How to Sample a Clique For each edge $(1,v)$: pick an edge $(1,u)$ with probability $\sim w_u$, and insert the edge $(v,u)$ with weight $\tfrac{w_u w_v}{w_u+w_v}$. (See the code sketch below.)
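A literal transcription of the clique-sampling rule on the slide above, as a hedged sketch: how ties, self-pairings and the probability normalization are handled here are my assumptions, not [KS16] verbatim.

```python
import numpy as np

def sample_clique(nbrs, rng):
    """nbrs maps each neighbor v of the eliminated vertex to the weight w_v of
    its edge to that vertex; returns sampled clique edges (v, u, weight)."""
    verts = list(nbrs)
    w = np.array([nbrs[v] for v in verts], dtype=float)
    p = w / w.sum()
    new_edges = []
    for v in verts:                                  # for each edge (1, v) ...
        u = verts[rng.choice(len(verts), p=p)]       # ... pick (1, u) with prob ~ w_u
        if u != v:                                   # skip the degenerate self-pairing
            wv, wu = nbrs[v], nbrs[u]
            new_edges.append((v, u, wu * wv / (wu + wv)))
    return new_edges

# Example: eliminate a vertex with neighbors 2..5, all edge weights equal to 1.
rng = np.random.default_rng(1)
print(sample_clique({2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}, rng))
```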
Approximating Matrices by Sampling Goal: $\mathfrak{U}^\top\mathfrak{U}\approx L$. Approach: ensure $\mathbb{E}[\mathfrak{U}^\top\mathfrak{U}]=L$ and show $\mathfrak{U}^\top\mathfrak{U}$ is concentrated around its expectation. Gives $\mathfrak{U}^\top\mathfrak{U}\approx L$ w. high probability.
Approximating Matrices in Expectation Consider eliminating the first variable. Original Laplacian: $\mathbb{E}[L^{(0)}]=c_1c_1^\top+\mathbb{E}[\text{remaining graph}+\text{clique}]$, a rank-1 term plus the remaining graph and the clique.
Approximating Matrices in Expectation After one step of approximate elimination: $\mathbb{E}[L^{(1)}]=c_1c_1^\top+\mathbb{E}[\text{remaining graph}+\text{sparsified clique}]$, a rank-1 term plus the remaining graph and the sparsified clique.
Approximating Matrices in Expectation Suppose $\mathbb{E}[\text{remaining graph}+\text{sparsified clique}]=\mathbb{E}[\text{remaining graph}+\text{clique}]$.
Approximating Matrices in Expectation Then $\mathbb{E}[L^{(1)}]=c_1c_1^\top+\mathbb{E}[\text{remaining graph}+\text{clique}]=L^{(0)}$.
Approximating Matrices in Expectation Let $L^{(i)}$ be our approximation after $i$ eliminations. If we ensure at each step that $\mathbb{E}[\text{sparsified clique}\mid\text{previous steps}]=\text{clique}$, then $\mathbb{E}[L^{(i)}]=L^{(i-1)}$, i.e. $\mathbb{E}[L^{(i)}-L^{(i-1)}\mid\text{previous steps}]=0$.
Predictable Quadratic Variation Predictable quadratic variation: $W=\sum_i\sum_e\mathbb{E}[(X_e^{(i)})^2\mid\text{prev. steps}]$. Want to show $\mathbb{P}[\|W\|>\sigma^2]\le\delta$. Promise: $\mathbb{E}[X_e^2]\preceq r\,L_e$.
Sample Variance For the randomly chosen vertex $v$: $\mathbb{E}_v\big[\sum_{e\in\text{elim. clique of }v}L_e\big]\preceq\mathbb{E}_v\big[\sum_{e\ni v}L_e\big]$.
Sample Variance $\mathbb{E}_v\big[\sum_{e\in\text{elim. clique of }v}L_e\big]\preceq\mathbb{E}_v\big[\sum_{e\ni v}L_e\big]=\tfrac{2\cdot(\text{current Laplacian})}{\#\text{vertices}}\preceq\tfrac{4L}{\#\text{vertices}}=\tfrac{4I}{\#\text{vertices}}$, working in isotropic position where $L=I$. WARNING: only true w.h.p.
Sample Variance Recall $\mathbb{E}[X_e^2]\preceq r\,L_e$. Putting it together: $\mathbb{E}_v\big[\sum_{e\in\text{elim. clique of }v}\mathbb{E}[X_e^2]\big]\preceq r\cdot\tfrac{4I}{\#\text{vertices}}$, the variance in one round of elimination.
Sample Variance Summing the variance $r\cdot\tfrac{4I}{\#\text{vertices}}$ over the rounds of elimination gives total variance $\preceq 4r\log n\cdot I$, since the harmonic sum over the shrinking number of remaining vertices is $O(\log n)$.
Directed Laplacian of a Graph [figure: a directed graph with edge weights 1, 1, 2 and its directed Laplacian]
(Weighted) Eulerian Graph Weighted in-degree = weighted out-degree at every vertex. [figure: example with edge weights 1, 2, 1]
Eulerian Laplacian of a Graph [figure: an Eulerian directed graph with unit edge weights and its Laplacian]
Solving an Eul. Laplacian Linear Equation $Lx=b$ Gaussian Elimination: find $\mathfrak{L},\mathfrak{U}$, lower and upper triangular matrices, s.t. $\mathfrak{L}\mathfrak{U}=L$. Then $x=\mathfrak{U}^{-1}\mathfrak{L}^{-1}b$. It is easy to apply $\mathfrak{U}^{-1}$ and $\mathfrak{L}^{-1}$.
Solving an Eul. Laplacian Linear Equation $Lx=b$ Approximate Gaussian Elimination [CKKPPRS FOCS'18]: find $\mathfrak{L},\mathfrak{U}$, lower and upper triangular matrices, s.t. $\mathfrak{L}\mathfrak{U}\approx L$. $\mathfrak{L},\mathfrak{U}$ are sparse. $O(\log\tfrac{1}{\varepsilon})$ iterations to get an $\varepsilon$-approximate solution $\tilde{x}$.
Gaussian Elimination on Eulerian Laplacians [figures: eliminating a vertex $v$ of a small Eulerian example graph; the elimination introduces new directed edges with fractional weights such as 1/3 and 2/3]
Gaussian Elimination on Eulerian Laplacians Sparsify so that the result is correct in expectation and the output is always Eulerian.
Gaussian Elimination on Eulerian Laplacians While edges remain: pick a minimum weight edge, pair it with a random neighbor, and reduce the mass on both by $w_{\min}$. (A code sketch of this loop follows after these slides.)
Gaussian Elimination on Eulerian Laplacians Our sparsification is correct in expectation and the output is always Eulerian. Average over $\log^3(n)$ runs to reduce variance.
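A hedged sketch of the pairing loop from these slides, for eliminating a vertex $v$ of an Eulerian directed graph: repeatedly take a minimum-weight remaining edge at $v$, pair it with a random edge on the other side, route $w_{\min}$ of mass through the pair, and decrement both. Whether the random partner is chosen uniformly or proportionally to weight is my assumption here.

```python
import numpy as np

def pair_through(in_edges, out_edges, rng):
    """in_edges: {u: w(u->v)}, out_edges: {x: w(v->x)}. Returns new edges (u, x, w)
    that replace the eliminated vertex v while preserving weighted in/out degrees."""
    ins, outs = dict(in_edges), dict(out_edges)
    new_edges = []
    while ins and outs:
        # minimum-weight remaining edge at v, over both sides
        side, key = min([('in', u) for u in ins] + [('out', x) for x in outs],
                        key=lambda t: ins[t[1]] if t[0] == 'in' else outs[t[1]])
        if side == 'in':
            u, xs = key, list(outs)
            x = xs[rng.integers(len(xs))]            # random partner on the out side
        else:
            x, us = key, list(ins)
            u = us[rng.integers(len(us))]            # random partner on the in side
        w_min = min(ins[u], outs[x])
        new_edges.append((u, x, w_min))              # route u -> x through v
        ins[u] -= w_min; outs[x] -= w_min
        if ins[u] <= 1e-12: del ins[u]
        if outs[x] <= 1e-12: del outs[x]
    return new_edges
```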
Approximating Matrices by Sampling Goal: $\mathfrak{L}\mathfrak{U}\approx L$. Approach: ensure $\mathbb{E}[\mathfrak{L}\mathfrak{U}]=L$ and show $\mathfrak{L}\mathfrak{U}$ is concentrated around its expectation. Gives $\mathfrak{L}\mathfrak{U}\approx L$ w. high probability (?). But what does $\approx$ mean? $L$ is not PSD?!? Come to my talk and I'll tell you!
Difficulties with Approximation [figure: example graph in which one edge has weight 6 and the rest have weight 1] Variance manageable
Difficulties with Approximation [figure: example graph in which all edges have weight 1] Variance problematic!
Difficulties with Approximation Come to my talk and hear the rest of the story! [Cohen-Kelner-Kyng-Peebles-Peng-Rao-Sidford FOCS'18]
Spanning Trees of a Graph
Spanning Trees of a (Multi)-Graph Pick a random tree?
Random Spanning Trees Graph $G$, tree distribution. Marginal probability of edge $e$: $p_e=\|\widetilde{L}_e\|$, its leverage score. Similar to sparsification?!
Random Spanning Trees Graph $G$, tree distribution. Marginal probability of edge $e$: $p_e=\|\widetilde{L}_e\|$. $L_G=\sum_{e\in G}L_e$. Consider $L_T=\sum_{e\in T}\tfrac{1}{p_e}L_e$. Then $\mathbb{E}[L_T]=L_G$.
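A brute-force check of the facts above on a tiny graph (my own illustration): enumerate all spanning trees of a small weighted graph and verify that each edge's marginal probability under the weighted spanning-tree distribution equals its leverage score, which is exactly what makes $\mathbb{E}[L_T]=L_G$ for $L_T=\sum_{e\in T}\tfrac{1}{p_e}L_e$.

```python
import itertools
import numpy as np

def lap(n, edges):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    return L

n = 4
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]
L_G = lap(n, edges)
Lpinv = np.linalg.pinv(L_G)

# Weighted spanning-tree distribution: P(T) proportional to the product of edge weights.
trees = []
for subset in itertools.combinations(range(len(edges)), n - 1):
    T = [edges[i] for i in subset]
    if np.linalg.matrix_rank(lap(n, T)) == n - 1:    # n-1 edges + connected => spanning tree
        trees.append((subset, np.prod([w for _, _, w in T])))
Z = sum(wt for _, wt in trees)

for i, (u, v, w) in enumerate(edges):
    marginal = sum(wt for subset, wt in trees if i in subset) / Z
    b = np.zeros(n); b[u], b[v] = 1.0, -1.0
    lev = w * (b @ Lpinv @ b)                        # leverage score of edge i
    print(f"edge {i}: marginal {marginal:.4f}  leverage score {lev:.4f}")
```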
Random Spanning Trees Concentration? $L_T\approx_{0.5}L_G$? NO. $L_T\preceq(\log n)\,L_G$? YES, w.h.p. $\tfrac{1}{t}\sum_{i=1}^{t}L_{T_i}\approx_\varepsilon L_G$ for $t=\varepsilon^{-2}\log^2 n$ [Kyng-Song FOCS'18]. Come see my talk!
Sometimes You Get a Martingale for Free Random variables $Z_1,\dots,Z_k$. Prove concentration for $f(Z_1,\dots,Z_k)$ where $f$ is "stable" under small changes to $Z_1,\dots,Z_k$: is $f(Z_1,\dots,Z_k)\approx\mathbb{E}[f(Z_1,\dots,Z_k)]$?
Doob Martingales Random variables $Z_1,\dots,Z_k$, NOT independent. Prove concentration for $f(Z_1,\dots,Z_k)$ where $f$ is "stable" under small changes to $Z_1,\dots,Z_k$. Pick an outcome $z_1,\dots,z_k$ and reveal it one coordinate at a time: $\mathbb{E}[f(Z_1,\dots,Z_k)]\to\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1]\to\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1,Z_2=z_2]\to\dots\to f(z_1,\dots,z_k)$.
Doob Martingales By the tower rule, $\mathbb{E}[f(Z_1,\dots,Z_k)]=\mathbb{E}_{z_1}\big[\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1]\big]$. Define $Y_0=\mathbb{E}[f(Z_1,\dots,Z_k)]$, $Y_1=\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1]$, $Y_2=\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1,Z_2=z_2]$, and so on. Then $\mathbb{E}[Y_{i+1}-Y_i\mid\text{prev. steps}]=0$: a martingale! Despite $Z_1,\dots,Z_k$ being NOT independent.
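A tiny numeric illustration of a Doob martingale (my example, not from the talk): the $Z_i$ are the entries of a uniformly random permutation of fixed values, so they are NOT independent; $f$ is a weighted sum; and $Y_i=\mathbb{E}[f\mid Z_1,\dots,Z_i]$ has a closed form. The Monte Carlo check below only verifies the necessary condition $\mathbb{E}[Y_i]=Y_0$ for every $i$.

```python
import numpy as np

rng = np.random.default_rng(0)
vals = np.array([1.0, 2.0, 4.0, 8.0])      # the multiset of values being permuted
coef = np.array([1.0, -1.0, 0.5, 2.0])     # f(Z) = sum_i coef[i] * Z_i
k = len(vals)

def doob_path(z):
    """Y_0, ..., Y_k where Y_i = E[f(Z) | Z_1..Z_i = z_1..z_i] for a random permutation Z."""
    Y = []
    for i in range(k + 1):
        seen = z[:i]
        # each unrevealed position is equally likely to hold any unseen value
        rest_mean = (vals.sum() - seen.sum()) / (k - i) if i < k else 0.0
        Y.append(coef[:i] @ seen + coef[i:].sum() * rest_mean)
    return np.array(Y)

paths = np.array([doob_path(rng.permutation(vals)) for _ in range(20000)])
print("E[Y_i] for i = 0..k:", paths.mean(axis=0).round(3))   # all approximately Y_0
```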
Concentration of Random Spanning Trees Random variables $Z_1,\dots,Z_{n-1}$: the positions of the $n-1$ edges of a random spanning tree. $f(Z_1,\dots,Z_{n-1})=L_T$. Similar to [Peres-Pemantle '14].
Conditioning Changes Distributions? [figures: a graph, its spanning-tree distribution over all trees, and a conditional tree distribution]
Summary Matrix martingales: a natural fit for algorithmic analysis. Understanding the predictable quadratic variation is key. Some results using matrix martingales: Cohen, Musco, Pachocki '16: online row sampling. Kyng, Sachdeva '16: approximate Gaussian elimination. Kyng, Peng, Pachocki, Sachdeva '17: semi-streaming graph sparsification. Cohen, Kelner, Kyng, Peebles, Peng, Rao, Sidford '18: solving directed Laplacian equations. Kyng, Song '18: matrix Chernoff bound for negatively dependent random variables.
Thanks!
This is really the end