Matrix Martingales in Randomized Numerical Linear Algebra. Speaker: Rasmus Kyng, Harvard University. Sep 27, 2018
Concentration of Scalar Random Variables Random $X=\sum_i X_i$, where the $X_i\in\mathbb{R}$ are independent and $\mathbb{E}[X_i]=0$. Is $X\approx 0$ with high probability?
Concentration of Scalar Random Variables Bernstein's Inequality: random $X=\sum_i X_i$ with $X_i\in\mathbb{R}$; the $X_i$ are independent; $\mathbb{E}[X_i]=0$; $|X_i|\le r$; $\sum_i\mathbb{E}[X_i^2]\le\sigma^2$. This gives $\mathbb{P}[|X|>\varepsilon]\le 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)$. E.g. if $\varepsilon=0.5$ and $r,\sigma^2=0.1/\log(1/\tau)$, the bound is $\le\tau$.
Concentration of Scalar Martingales Random $X=\sum_i X_i$ with $X_i\in\mathbb{R}$. Drop independence: instead of "$X_i$ independent, $\mathbb{E}[X_i]=0$", require only $\mathbb{E}[X_i\mid X_1,\dots,X_{i-1}]=0$, i.e. $\mathbb{E}[X_i\mid\text{previous steps}]=0$.
Concentration of Scalar Martingales Freedman's Inequality generalizes Bernstein's: replace "$X_i$ independent, $\mathbb{E}[X_i]=0$" by $\mathbb{E}[X_i\mid\text{previous steps}]=0$, and replace "$\sum_i\mathbb{E}[X_i^2]\le\sigma^2$" by $\mathbb{P}\big[\sum_i\mathbb{E}[X_i^2\mid\text{previous steps}]>\sigma^2\big]\le\delta$; the tail bound picks up an extra $+\delta$.
Concentration of Scalar Martingales Freedman's Inequality: random $X=\sum_i X_i$ with $X_i\in\mathbb{R}$; $\mathbb{E}[X_i\mid\text{previous steps}]=0$; $|X_i|\le r$; $\mathbb{P}\big[\sum_i\mathbb{E}[X_i^2\mid\text{previous steps}]>\sigma^2\big]\le\delta$. This gives $\mathbb{P}[|X|>\varepsilon]\le 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)+\delta$.
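A minimal numeric sanity check of the scalar bound (my illustration, not from the talk): simulate a sum of bounded zero-mean increments satisfying the conditions above (here the increments are even independent, the simplest martingale case) and compare the empirical tail with the Bernstein/Freedman expression. The parameters k, r, eps are illustrative choices.

```python
# Empirical check of the scalar Bernstein/Freedman tail bound (illustrative).
import numpy as np

rng = np.random.default_rng(0)
k, trials = 200, 20000
r = 0.01                       # |X_i| <= r
sigma2 = k * r**2              # here sum_i E[X_i^2 | past] = k r^2 <= sigma^2
eps = 0.5

# Increments are +-r with equal probability. Independent zero-mean increments
# are a special case of a martingale difference sequence.
steps = rng.choice([-r, r], size=(trials, k))
X = steps.sum(axis=1)

empirical = np.mean(np.abs(X) > eps)
bound = 2 * np.exp(-(eps**2 / 2) / (r * eps + sigma2))
print(f"empirical tail {empirical:.5f} <= bound {bound:.5f}")
```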
Concentration of Matrix Random Variables Matrix Bernstein's Inequality (Tropp 11): random $X=\sum_i X_i$ with $X_i\in\mathbb{R}^{d\times d}$ symmetric; the $X_i$ are independent; $\mathbb{E}[X_i]=0$; $\|X_i\|\le r$; $\big\|\sum_i\mathbb{E}[X_i^2]\big\|\le\sigma^2$. This gives $\mathbb{P}[\|X\|>\varepsilon]\le d\cdot 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)$.
Concentration of Matrix Martingales Matrix Freedman's Inequality (Tropp 11): random $X=\sum_i X_i$ with $X_i\in\mathbb{R}^{d\times d}$ symmetric; $\mathbb{E}[X_i\mid\text{previous steps}]=0$; $\|X_i\|\le r$; $\mathbb{P}\big[\big\|\sum_i\mathbb{E}[X_i^2\mid\text{prev. steps}]\big\|>\sigma^2\big]\le\delta$. This gives $\mathbb{P}[\|X\|>\varepsilon]\le d\cdot 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)+\delta$. E.g. if $\varepsilon=0.5$ and $r,\sigma^2=0.1/\log(d/\tau)$, the bound is $\le\tau+\delta$.
Concentration of Matrix Martingales Matrix Freedman's Inequality (Tropp 11): the quantity $\sum_i\mathbb{E}[X_i^2\mid\text{prev. steps}]$ appearing in the condition $\mathbb{P}\big[\big\|\sum_i\mathbb{E}[X_i^2\mid\text{prev. steps}]\big\|>\sigma^2\big]\le\delta$ is the predictable quadratic variation.
Laplacian Matrices Graph $G=(V,E)$ with edge weights $w:E\to\mathbb{R}_+$; $n=|V|$, $m=|E|$. $L=\sum_{e\in E}L_e$, an $n\times n$ matrix. [figure: example graph on vertices $a$, $b$, $c$]
Laplacian Matrices Graph $G=(V,E)$ with edge weights $w:E\to\mathbb{R}_+$; $n=|V|$, $m=|E|$. $A$: weighted adjacency matrix of the graph, $A_{ij}=w_{ij}$. $D$: diagonal matrix of weighted degrees, $D_{ii}=\sum_j w_{ij}$. $L=D-A$.
Laplacian Matrices $L$ is a symmetric matrix, all off-diagonals are non-positive, and $L_{ii}=-\sum_{j\ne i}L_{ij}$, i.e. every row sums to zero.
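A small sketch of the definitions above (my own helper, not from the slides): build $L=D-A$ from a weighted edge list and confirm the row sums vanish.

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian L = D - A of a weighted undirected graph.
    edges: list of (u, v, w) with 0-indexed vertices and weight w > 0."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w          # degree contributions (D)
        L[v, v] += w
        L[u, v] -= w          # adjacency contributions (-A)
        L[v, u] -= w
    return L

# Example: a triangle with edge weights 1, 1, 2.
L = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)])
print(L)
print("row sums:", L.sum(axis=1))   # all zero, as required
```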
Laplacian of a Graph [figure: example graph with edge weights 1, 1, 2 and its Laplacian]
Approximation between graphs PSD order: $A\preceq B$ iff for all $x$, $x^\top A x\le x^\top B x$. Matrix approximation: $A\approx_\varepsilon B$ iff $A\preceq(1+\varepsilon)B$ and $B\preceq(1+\varepsilon)A$. Graphs: $G\approx_\varepsilon H$ iff $L_G\approx_\varepsilon L_H$. Implies multiplicative approximation to cuts.
Sparsification of Laplacians Every graph $G^{(0)}$ on $n$ vertices, for any $\varepsilon\in(0,1)$, has a sparsifier $G^{(1)}\approx_\varepsilon G^{(0)}$ with $O(n\varepsilon^{-2})$ edges [BSS'09, LS'17]. Sampling edges independently by leverage scores w.h.p. gives a sparsifier with $O(n\varepsilon^{-2}\log n)$ edges [Spielman-Srivastava'08]. Many applications!
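A hedged sketch in the spirit of the leverage-score sampling result quoted above: sample each edge independently with probability proportional to its leverage score and reweight by $1/p_e$. The oversampling constant and the dense pseudoinverse are illustrative shortcuts, not the algorithm of [Spielman-Srivastava'08].

```python
import numpy as np

def sparsify_by_leverage(n, edges, eps, rng):
    """Independent leverage-score sampling (illustrative, dense linear algebra)."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    Lpinv = np.linalg.pinv(L)
    C = 9.0 * np.log(n) / eps**2            # oversampling constant: a placeholder
    kept = []
    for u, v, w in edges:
        b = np.zeros(n); b[u], b[v] = 1.0, -1.0
        lev = w * (b @ Lpinv @ b)            # leverage score of edge (u, v)
        p = min(1.0, C * lev)
        if rng.random() < p:
            kept.append((u, v, w / p))       # keep and reweight by 1/p
    return kept
```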
Graph Semi-Streaming Sparsification $m$ edges of a graph arriving as a stream $(a_1,b_1,w_1),(a_2,b_2,w_2),\dots$ GOAL: maintain an $\varepsilon$-sparsifier, minimize working space (and time). [Ahn-Guha '09]: $O(n\varepsilon^{-2}\log n\log\tfrac{m}{n})$ space, a cut $\varepsilon$-sparsifier with $O(n\varepsilon^{-2}\log n\log\tfrac{m}{n})$ edges. Used scalar martingales.
Independent Edge Samples $L_e^{(1)}=\tfrac{1}{p_e}L_e^{(0)}$ with probability $p_e$, and $L_e^{(1)}=0$ otherwise. Expectation: $\mathbb{E}[L_e^{(1)}]=L_e^{(0)}$. Zero-mean: $X_e^{(1)}=L_e^{(1)}-L_e^{(0)}$, and $X^{(1)}=L^{(1)}-L^{(0)}=\sum_e X_e^{(1)}$.
Matrix Martingale? Zero mean? So what if we just sparsify again? It's a martingale! Every time we accumulate $20n\varepsilon^{-2}\log n$ edges, sparsify down to $10n\varepsilon^{-2}\log n$, and continue.
Repeated Edge Sampling $L_e^{(i)}=\tfrac{1}{p_e^{(i-1)}}L_e^{(i-1)}$ with probability $p_e^{(i-1)}$, and $L_e^{(i)}=0$ otherwise. Expectation: $\mathbb{E}[L_e^{(i)}]=L_e^{(i-1)}$. Zero-mean: $X_e^{(i)}=L_e^{(i)}-L_e^{(i-1)}$, and $X^{(i)}=L^{(i)}-L^{(i-1)}=\sum_e X_e^{(i)}$.
Repeated Edge Sampling This gives a martingale: $\mathbb{E}[X^{(i)}\mid\text{previous steps}]=0$.
Repeated Edge Sampling It works, and it couldn't be simpler. [K.PPS'17]: $O(n\varepsilon^{-2}\log n)$ space, an $\varepsilon$-sparsifier with $O(n\varepsilon^{-2}\log n)$ edges. Shaves a log off many algorithms that do repeated matrix sampling…
Approximation? $L_H\approx_{0.5}L_G$ means $L_H\preceq 1.5\,L_G$ and $L_G\preceq 1.5\,L_H$. Is $\|L_H-L_G\|\le 0.5$ the right condition? Work instead with the relative norm: $\|L_G^{-1/2}(L_H-L_G)L_G^{-1/2}\|\le 0.1$.
Isotropic position Define $\widetilde{Z}:=L_G^{-1/2}\,Z\,L_G^{-1/2}$. The goal is now $\|\widetilde{L}_H-\widetilde{L}_G\|\le 0.1$, i.e. $\|\widetilde{L}_H-I\|\le 0.1$.
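A sketch of how the error in isotropic position can be measured numerically (dense linear algebra, illustration only; assumes $G$ and $H$ are connected graphs on the same vertex set, so their Laplacians share the all-ones nullspace).

```python
import numpy as np

def isotropic_error(L_G, L_H):
    """Spectral norm of L_G^{+/2} (L_H - L_G) L_G^{+/2}, restricted to range(L_G)."""
    w, V = np.linalg.eigh(L_G)
    nz = w > 1e-9 * w.max()                  # drop the all-ones nullspace
    S = V[:, nz] / np.sqrt(w[nz])            # eigenvectors scaled by 1/sqrt(eigenvalue)
    M = S.T @ (L_H - L_G) @ S                # this is tilde(L_H) - I on range(L_G)
    return np.linalg.norm(M, 2)

# If isotropic_error(L_G, L_H) <= eps, then (1 - eps) L_G <= L_H <= (1 + eps) L_G.
```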
Concentration of Matrix Martingales Matrix Freedman's Inequality, applied: random $X=\sum_i\sum_e X_e^{(i)}$ with $X_e^{(i)}\in\mathbb{R}^{d\times d}$ symmetric; $\mathbb{E}[X_e^{(i)}\mid\text{previous steps}]=0$; $\|X_e^{(i)}\|\le r$; $\mathbb{P}\big[\big\|\sum_i\sum_e\mathbb{E}[(X_e^{(i)})^2\mid\text{prev. steps}]\big\|>\sigma^2\big]\le\delta$. This gives $\mathbb{P}[\|X\|>\varepsilon]\le d\cdot 2\exp\!\big(-\tfrac{\varepsilon^2/2}{r\varepsilon+\sigma^2}\big)+\delta$.
Norm Control Can there be many edges with large norm? In isotropic position, $\sum_e\|\widetilde{L}_e^{(0)}\|=\sum_e\operatorname{Tr}\widetilde{L}_e^{(0)}=\operatorname{Tr}\sum_e\widetilde{L}_e^{(0)}=\operatorname{Tr}\widetilde{L}^{(0)}=\operatorname{Tr}I=n-1$. Can afford $r=\tfrac{1}{\log n}$, so take $p_e=\|\widetilde{L}_e^{(0)}\|\log n$, giving $\sum_e p_e\le n\log n$. Recall $L_e^{(1)}=\tfrac{1}{p_e}L_e^{(0)}$ with probability $p_e$, and $0$ otherwise.
Norm Control Need $\|X_e^{(i)}\|\le r$. If we lose track of the original graph, we can't estimate the norm. Stopping time argument: stop the martingale before it enters the bad regions.
Variance Control $\mathbb{E}[(X_e^{(1)})^2]\preceq r\,L_e^{(0)}$, so $\sum_e\mathbb{E}[(X_e^{(1)})^2]\preceq\sum_e r\,L_e^{(0)}=rI$ (in isotropic position). But then edges get resampled: roughly a geometric sequence for the variance.
Resparsification Every time we accumulate $20n\varepsilon^{-2}\log n$ edges, sparsify down to $10n\varepsilon^{-2}\log n$, and continue (a code sketch of this loop follows below). [K.PPS'17]: $O(n\varepsilon^{-2}\log n)$ space, an $\varepsilon$-sparsifier with $O(n\varepsilon^{-2}\log n)$ edges. Shaves a log off many algorithms that do repeated matrix sampling…
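A hedged sketch of the resparsification loop described above: buffer incoming edges and, whenever the buffer exceeds roughly $20n\varepsilon^{-2}\log n$, sparsify the graph held so far and continue. The `sparsify` argument stands for any reweighting sampler (for instance the leverage-score sketch earlier); the threshold echoes the slide but is otherwise a placeholder, and the exact buffer-size control of [K.PPS'17] is not reproduced.

```python
import numpy as np

def streaming_sparsify(n, edge_stream, eps, sparsify, rng):
    """Resparsify-as-you-go over a stream of (u, v, w) edges (illustrative)."""
    high = int(20 * n * np.log(n) / eps**2)     # buffer threshold from the slide
    buf = []
    for e in edge_stream:
        buf.append(e)
        if len(buf) > high:
            buf = sparsify(n, buf, eps, rng)    # resample the accumulated graph
    return sparsify(n, buf, eps, rng)
```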
Laplacian Linear Equations [ST04]: solving Laplacian linear equations in $\widetilde{O}(m)$ time ⋮ [KS16]: a simple algorithm
Solving a Laplacian Linear Equation $Lx=b$ Gaussian Elimination: find $\mathfrak{U}$, an upper triangular matrix, s.t. $\mathfrak{U}^\top\mathfrak{U}=L$. Then $x=\mathfrak{U}^{-1}\mathfrak{U}^{-\top}b$. It is easy to apply $\mathfrak{U}^{-1}$ and $\mathfrak{U}^{-\top}$.
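A sketch of the exact version above using standard library routines (an illustration, not the talk's code): ground one vertex so the Laplacian becomes positive definite, factor it, and solve with two triangular solves. Assumes the graph is connected and the right-hand side sums to zero.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def solve_grounded_laplacian(L, b):
    """Solve L x = b for a connected-graph Laplacian L, with sum(b) = 0."""
    Lg, bg = L[:-1, :-1], b[:-1]              # ground the last vertex
    U = cholesky(Lg, lower=False)             # upper triangular, U^T U = Lg
    y = solve_triangular(U, bg, trans='T', lower=False)   # solve U^T y = b
    x = solve_triangular(U, y, lower=False)                # solve U x = y
    return np.append(x, 0.0)                  # grounded vertex gets potential 0
```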
Solving a Laplacian Linear Equation $Lx=b$ Approximate Gaussian Elimination: find $\mathfrak{U}$, an upper triangular matrix, s.t. $\mathfrak{U}^\top\mathfrak{U}\approx_{0.5}L$. $\mathfrak{U}$ is sparse. $O(\log\tfrac{1}{\varepsilon})$ iterations to get an $\varepsilon$-approximate solution $\tilde{x}$.
Approximate Gaussian Elimination Theorem [KS]: when $L$ is a Laplacian matrix with $m$ non-zeros, we can find, in $O(m\log^3 n)$ time, an upper triangular matrix $\mathfrak{U}$ with $O(m\log^3 n)$ non-zeros, s.t. w.h.p. $\mathfrak{U}^\top\mathfrak{U}\approx_{0.5}L$.
Additive View of Gaussian Elimination Find $\mathfrak{U}$, an upper triangular matrix, s.t. $\mathfrak{U}^\top\mathfrak{U}=M$. [figure: example matrix $M$]
Additive View of Gaussian Elimination Find the rank-1 matrix that agrees with 𝑴 on the first row and column.
Additive View of Gaussian Elimination Subtract the rank 1 matrix. We have eliminated the first variable.
Additive View of Gaussian Elimination The remaining matrix is PSD.
Additive View of Gaussian Elimination Find rank-1 matrix that agrees with our matrix on the next row and column.
Additive View of Gaussian Elimination Subtract the rank 1 matrix. We have eliminated the second variable.
Additive View of Gaussian Elimination Repeat until all of $M$ is written as rank-1 terms.
Additive View of Gaussian Elimination Repeat until all of $M$ is written as rank-1 terms: $M=\mathfrak{U}^\top\mathfrak{U}$.
Additive View of Gaussian Elimination What is special about Gaussian Elimination on Laplacians? The remaining matrix is always Laplacian.
Additive View of Gaussian Elimination What is special about Gaussian Elimination on Laplacians? The remaining matrix is always Laplacian. A new Laplacian!
Why is Gaussian Elimination Slow? Solving $Lx=b$ by Gaussian Elimination can take $\Omega(n^3)$ time. The main issue is fill.
Why is Gaussian Elimination Slow? Solving $Lx=b$ by Gaussian Elimination can take $\Omega(n^3)$ time. The main issue is fill: eliminating a vertex $v$ creates a clique on the neighbors of $v$ in the new Laplacian.
Why is Gaussian Elimination Slow? Solving $Lx=b$ by Gaussian Elimination can take $\Omega(n^3)$ time. The main issue is fill, but Laplacian cliques can be sparsified!
Gaussian Elimination Pick a vertex 𝑣 to eliminate Add the clique created by eliminating 𝑣 Repeat until done
Approximate Gaussian Elimination Pick a vertex 𝑣 to eliminate Add the clique created by eliminating 𝑣 Repeat until done
Approximate Gaussian Elimination Pick a random vertex 𝑣 to eliminate Add the clique created by eliminating 𝑣 Repeat until done
Approximate Gaussian Elimination Pick a random vertex 𝑣 to eliminate Sample the clique created by eliminating 𝑣 Repeat until done Resembles randomized Incomplete Cholesky
How to Sample a Clique For each edge $(1,v)$: pick an edge $(1,u)$ with probability $\sim w_u$, and insert the edge $(v,u)$ with weight $\tfrac{w_u w_v}{w_u+w_v}$. (See the code sketch below.)
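A literal transcription of the clique-sampling rule on the slide above, as a hedged sketch: how ties, self-pairings and the probability normalization are handled here are my assumptions, not [KS16] verbatim.

```python
import numpy as np

def sample_clique(nbrs, rng):
    """nbrs maps each neighbor v of the eliminated vertex to the weight w_v of
    its edge to that vertex; returns sampled clique edges (v, u, weight)."""
    verts = list(nbrs)
    w = np.array([nbrs[v] for v in verts], dtype=float)
    p = w / w.sum()
    new_edges = []
    for v in verts:                                  # for each edge (1, v) ...
        u = verts[rng.choice(len(verts), p=p)]       # ... pick (1, u) with prob ~ w_u
        if u != v:                                   # skip the degenerate self-pairing
            wv, wu = nbrs[v], nbrs[u]
            new_edges.append((v, u, wu * wv / (wu + wv)))
    return new_edges

# Example: eliminate a vertex with neighbors 2..5, all edge weights equal to 1.
rng = np.random.default_rng(1)
print(sample_clique({2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}, rng))
```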
Approximating Matrices by Sampling Goal: $\mathfrak{U}^\top\mathfrak{U}\approx L$. Approach: ensure $\mathbb{E}[\mathfrak{U}^\top\mathfrak{U}]=L$ and show $\mathfrak{U}^\top\mathfrak{U}$ is concentrated around its expectation. Gives $\mathfrak{U}^\top\mathfrak{U}\approx L$ w. high probability.
Approximating Matrices in Expectation Consider eliminating the first variable. Original Laplacian: $\mathbb{E}[L^{(0)}]=c_1c_1^\top+\mathbb{E}[\text{remaining graph}+\text{clique}]$, a rank-1 term plus the remaining graph and the clique.
Approximating Matrices in Expectation After one step of approximate elimination: $\mathbb{E}[L^{(1)}]=c_1c_1^\top+\mathbb{E}[\text{remaining graph}+\text{sparsified clique}]$, a rank-1 term plus the remaining graph and the sparsified clique.
Approximating Matrices in Expectation Suppose $\mathbb{E}[\text{remaining graph}+\text{sparsified clique}]=\mathbb{E}[\text{remaining graph}+\text{clique}]$.
Approximating Matrices in Expectation Then $\mathbb{E}[L^{(1)}]=c_1c_1^\top+\mathbb{E}[\text{remaining graph}+\text{clique}]=L^{(0)}$.
Approximating Matrices in Expectation Let $L^{(i)}$ be our approximation after $i$ eliminations. If we ensure at each step that $\mathbb{E}[\text{sparsified clique}\mid\text{previous steps}]=\text{clique}$, then $\mathbb{E}[L^{(i)}]=L^{(i-1)}$, i.e. $\mathbb{E}[L^{(i)}-L^{(i-1)}\mid\text{previous steps}]=0$.
Predictable Quadratic Variation Predictable quadratic variation: $W=\sum_i\sum_e\mathbb{E}[(X_e^{(i)})^2\mid\text{prev. steps}]$. Want to show $\mathbb{P}[\|W\|>\sigma^2]\le\delta$. Promise: $\mathbb{E}[X_e^2]\preceq r\,L_e$.
Sample Variance For the randomly chosen vertex $v$: $\mathbb{E}_v\big[\sum_{e\in\text{elim. clique of }v}L_e\big]\preceq\mathbb{E}_v\big[\sum_{e\ni v}L_e\big]$.
Sample Variance $\mathbb{E}_v\big[\sum_{e\in\text{elim. clique of }v}L_e\big]\preceq\mathbb{E}_v\big[\sum_{e\ni v}L_e\big]=\tfrac{2\cdot(\text{current Laplacian})}{\#\text{vertices}}\preceq\tfrac{4L}{\#\text{vertices}}=\tfrac{4I}{\#\text{vertices}}$, working in isotropic position where $L=I$. WARNING: only true w.h.p.
Sample Variance Recall $\mathbb{E}[X_e^2]\preceq r\,L_e$. Putting it together: $\mathbb{E}_v\big[\sum_{e\in\text{elim. clique of }v}\mathbb{E}[X_e^2]\big]\preceq r\cdot\tfrac{4I}{\#\text{vertices}}$, the variance in one round of elimination.
Sample Variance Summing the variance $r\cdot\tfrac{4I}{\#\text{vertices}}$ over the rounds of elimination gives total variance $\preceq 4r\log n\cdot I$, since the harmonic sum over the shrinking number of remaining vertices is $O(\log n)$.
Directed Laplacian of a Graph [figure: a directed graph with edge weights 1, 1, 2 and its directed Laplacian]
(Weighted) Eulerian Graph Weighted in-degree = weighted out-degree at every vertex. [figure: example with edge weights 1, 2, 1]
Eulerian Laplacian of a Graph [figure: an Eulerian directed graph with unit edge weights and its Laplacian]
Solving an Eul. Laplacian Linear Equation $Lx=b$ Gaussian Elimination: find $\mathfrak{L},\mathfrak{U}$, lower and upper triangular matrices, s.t. $\mathfrak{L}\mathfrak{U}=L$. Then $x=\mathfrak{U}^{-1}\mathfrak{L}^{-1}b$. It is easy to apply $\mathfrak{U}^{-1}$ and $\mathfrak{L}^{-1}$.
Solving an Eul. Laplacian Linear Equation $Lx=b$ Approximate Gaussian Elimination [CKKPPRS FOCS'18]: find $\mathfrak{L},\mathfrak{U}$, lower and upper triangular matrices, s.t. $\mathfrak{L}\mathfrak{U}\approx L$. $\mathfrak{L},\mathfrak{U}$ are sparse. $O(\log\tfrac{1}{\varepsilon})$ iterations to get an $\varepsilon$-approximate solution $\tilde{x}$.
Gaussian Elimination on Eulerian Laplacians [figures: eliminating a vertex $v$ of a small Eulerian example graph; the elimination introduces new directed edges with fractional weights such as 1/3 and 2/3]
Gaussian Elimination on Eulerian Laplacians Sparsify so that the result is correct in expectation and the output is always Eulerian.
Gaussian Elimination on Eulerian Laplacians While edges remain: pick a minimum weight edge, pair it with a random neighbor, and reduce the mass on both by $w_{\min}$. (A code sketch of this loop follows after these slides.)
Gaussian Elimination on Eulerian Laplacians Our sparsification is correct in expectation and the output is always Eulerian. Average over $\log^3(n)$ runs to reduce variance.
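A hedged sketch of the pairing loop from these slides, for eliminating a vertex $v$ of an Eulerian directed graph: repeatedly take a minimum-weight remaining edge at $v$, pair it with a random edge on the other side, route $w_{\min}$ of mass through the pair, and decrement both. Whether the random partner is chosen uniformly or proportionally to weight is my assumption here.

```python
import numpy as np

def pair_through(in_edges, out_edges, rng):
    """in_edges: {u: w(u->v)}, out_edges: {x: w(v->x)}. Returns new edges (u, x, w)
    that replace the eliminated vertex v while preserving weighted in/out degrees."""
    ins, outs = dict(in_edges), dict(out_edges)
    new_edges = []
    while ins and outs:
        # minimum-weight remaining edge at v, over both sides
        side, key = min([('in', u) for u in ins] + [('out', x) for x in outs],
                        key=lambda t: ins[t[1]] if t[0] == 'in' else outs[t[1]])
        if side == 'in':
            u, xs = key, list(outs)
            x = xs[rng.integers(len(xs))]            # random partner on the out side
        else:
            x, us = key, list(ins)
            u = us[rng.integers(len(us))]            # random partner on the in side
        w_min = min(ins[u], outs[x])
        new_edges.append((u, x, w_min))              # route u -> x through v
        ins[u] -= w_min; outs[x] -= w_min
        if ins[u] <= 1e-12: del ins[u]
        if outs[x] <= 1e-12: del outs[x]
    return new_edges
```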
Approximating Matrices by Sampling Goal: $\mathfrak{L}\mathfrak{U}\approx L$. Approach: ensure $\mathbb{E}[\mathfrak{L}\mathfrak{U}]=L$ and show $\mathfrak{L}\mathfrak{U}$ is concentrated around its expectation. Gives $\mathfrak{L}\mathfrak{U}\approx L$ w. high probability (?). But what does $\approx$ mean? $L$ is not PSD?!? Come to my talk and I'll tell you!
Difficulties with Approximation [figure: example graph in which one edge has weight 6 and the rest have weight 1] Variance manageable
Difficulties with Approximation [figure: example graph in which all edges have weight 1] Variance problematic!
Difficulties with Approximation Come to my talk and hear the rest of the story! [Cohen-Kelner-Kyng-Peebles-Peng-Rao-Sidford FOCS'18]
Spanning Trees of a Graph
Spanning Trees of a (Multi)-Graph Pick a random tree?
Random Spanning Trees Graph $G$, tree distribution. Marginal probability of edge $e$: $p_e=\|\widetilde{L}_e\|$, its leverage score. Similar to sparsification?!
Random Spanning Trees Graph $G$, tree distribution. Marginal probability of edge $e$: $p_e=\|\widetilde{L}_e\|$. $L_G=\sum_{e\in G}L_e$. Consider $L_T=\sum_{e\in T}\tfrac{1}{p_e}L_e$. Then $\mathbb{E}[L_T]=L_G$.
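A brute-force check of the facts above on a tiny graph (my own illustration): enumerate all spanning trees of a small weighted graph and verify that each edge's marginal probability under the weighted spanning-tree distribution equals its leverage score, which is exactly what makes $\mathbb{E}[L_T]=L_G$ for $L_T=\sum_{e\in T}\tfrac{1}{p_e}L_e$.

```python
import itertools
import numpy as np

def lap(n, edges):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    return L

n = 4
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]
L_G = lap(n, edges)
Lpinv = np.linalg.pinv(L_G)

# Weighted spanning-tree distribution: P(T) proportional to the product of edge weights.
trees = []
for subset in itertools.combinations(range(len(edges)), n - 1):
    T = [edges[i] for i in subset]
    if np.linalg.matrix_rank(lap(n, T)) == n - 1:    # n-1 edges + connected => spanning tree
        trees.append((subset, np.prod([w for _, _, w in T])))
Z = sum(wt for _, wt in trees)

for i, (u, v, w) in enumerate(edges):
    marginal = sum(wt for subset, wt in trees if i in subset) / Z
    b = np.zeros(n); b[u], b[v] = 1.0, -1.0
    lev = w * (b @ Lpinv @ b)                        # leverage score of edge i
    print(f"edge {i}: marginal {marginal:.4f}  leverage score {lev:.4f}")
```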
Random Spanning Trees Concentration? $L_T\approx_{0.5}L_G$? NO. $L_T\preceq(\log n)\,L_G$? YES, w.h.p. $\tfrac{1}{t}\sum_{i=1}^{t}L_{T_i}\approx_\varepsilon L_G$ for $t=\varepsilon^{-2}\log^2 n$ [Kyng-Song FOCS'18]. Come see my talk!
Sometimes You Get a Martingale for Free Random variables $Z_1,\dots,Z_k$. Prove concentration for $f(Z_1,\dots,Z_k)$ where $f$ is "stable" under small changes to $Z_1,\dots,Z_k$: is $f(Z_1,\dots,Z_k)\approx\mathbb{E}[f(Z_1,\dots,Z_k)]$?
Doob Martingales Random variables $Z_1,\dots,Z_k$, NOT independent. Prove concentration for $f(Z_1,\dots,Z_k)$ where $f$ is "stable" under small changes to $Z_1,\dots,Z_k$. Pick an outcome $z_1,\dots,z_k$ and reveal it one coordinate at a time: $\mathbb{E}[f(Z_1,\dots,Z_k)]\to\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1]\to\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1,Z_2=z_2]\to\dots\to f(z_1,\dots,z_k)$.
Doob Martingales By the tower rule, $\mathbb{E}[f(Z_1,\dots,Z_k)]=\mathbb{E}_{z_1}\big[\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1]\big]$. Define $Y_0=\mathbb{E}[f(Z_1,\dots,Z_k)]$, $Y_1=\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1]$, $Y_2=\mathbb{E}[f(Z_1,\dots,Z_k)\mid Z_1=z_1,Z_2=z_2]$, and so on. Then $\mathbb{E}[Y_{i+1}-Y_i\mid\text{prev. steps}]=0$: a martingale! Despite $Z_1,\dots,Z_k$ being NOT independent.
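A tiny numeric illustration of a Doob martingale (my example, not from the talk): the $Z_i$ are the entries of a uniformly random permutation of fixed values, so they are NOT independent; $f$ is a weighted sum; and $Y_i=\mathbb{E}[f\mid Z_1,\dots,Z_i]$ has a closed form. The Monte Carlo check below only verifies the necessary condition $\mathbb{E}[Y_i]=Y_0$ for every $i$.

```python
import numpy as np

rng = np.random.default_rng(0)
vals = np.array([1.0, 2.0, 4.0, 8.0])      # the multiset of values being permuted
coef = np.array([1.0, -1.0, 0.5, 2.0])     # f(Z) = sum_i coef[i] * Z_i
k = len(vals)

def doob_path(z):
    """Y_0, ..., Y_k where Y_i = E[f(Z) | Z_1..Z_i = z_1..z_i] for a random permutation Z."""
    Y = []
    for i in range(k + 1):
        seen = z[:i]
        # each unrevealed position is equally likely to hold any unseen value
        rest_mean = (vals.sum() - seen.sum()) / (k - i) if i < k else 0.0
        Y.append(coef[:i] @ seen + coef[i:].sum() * rest_mean)
    return np.array(Y)

paths = np.array([doob_path(rng.permutation(vals)) for _ in range(20000)])
print("E[Y_i] for i = 0..k:", paths.mean(axis=0).round(3))   # all approximately Y_0
```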
Concentration of Random Spanning Trees Random variables $Z_1,\dots,Z_{n-1}$: the positions of the $n-1$ edges of a random spanning tree. $f(Z_1,\dots,Z_{n-1})=L_T$. Similar to [Peres-Pemantle '14].
Conditioning Changes Distributions? [figures: a graph, its spanning-tree distribution over all trees, and a conditional tree distribution]
Summary Matrix martingales: a natural fit for algorithmic analysis. Understanding the predictable quadratic variation is key. Some results using matrix martingales: Cohen, Musco, Pachocki '16: online row sampling. Kyng, Sachdeva '16: approximate Gaussian elimination. Kyng, Peng, Pachocki, Sachdeva '17: semi-streaming graph sparsification. Cohen, Kelner, Kyng, Peebles, Peng, Rao, Sidford '18: solving directed Laplacian equations. Kyng, Song '18: matrix Chernoff bound for negatively dependent random variables.
Thanks!
This is really the end