A matrix Chernoff bound for strongly Rayleigh distributions and spectral sparsifiers from a few random spanning trees. Speaker: Rasmus Kyng, Harvard University.


1 A matrix Chernoff bound for strongly Rayleigh distributions and spectral sparsifiers from a few random spanning trees. Speaker: Rasmus Kyng, Harvard University. Joint work with Zhao Song, UT Austin. October 2018.

2 Concentration of scalar random variables
Independent random X_i ∈ ℝ, X = Σ_i X_i. Is X ≈ 𝔼[X] with high probability?

3 Concentration of scalar random variables
Chernoff inequality. Independent random X_i ∈ ℝ_{≥0}, X = Σ_i X_i, 𝔼[X] = μ, X_i ≤ r. Then ℙ[|X − μ| > εμ] ≤ 2 exp(−με² / (r(2 + ε))) ≤ τ, e.g. if ε = 0.5, r = 1 and μ = 10 log(1/τ).
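As a quick illustration (ours, not from the talk), the sketch below simulates sums of independent bounded variables and compares the empirical tail with the Chernoff expression above; the parameters n, p and the trial count are arbitrary choices.

```python
# Minimal sanity check of the scalar Chernoff bound (illustration only):
# X_i ~ Bernoulli(p) are independent with 0 <= X_i <= r = 1.
import numpy as np

rng = np.random.default_rng(0)
n, p, eps, r = 200, 0.1, 0.5, 1.0
mu = n * p                                  # E[X] = 20
trials = 100_000

X = rng.binomial(1, p, size=(trials, n)).sum(axis=1)
empirical = np.mean(np.abs(X - mu) > eps * mu)
bound = 2 * np.exp(-mu * eps**2 / (r * (2 + eps)))
print(f"empirical tail {empirical:.4f}  <=  Chernoff bound {bound:.4f}")
```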

4 Concentration of random matrices
Independent random X_i ∈ ℝ^{d×d}, X = Σ_i X_i. Is X ≈ 𝔼[X] with high probability?

5 Concentration of random matrices
Matrix Chernoff [Tropp ’11]. Independent random X_i ∈ ℝ^{d×d}, positive semi-definite, X = Σ_i X_i, ‖𝔼[X]‖ = μ, ‖X_i‖ ≤ r. Then ℙ[‖X − 𝔼[X]‖ > εμ] ≤ d · 2 exp(−με² / (r(2 + ε))) ≤ τ, e.g. if ε = 0.5, r = 1 and μ = 10 log(d/τ). [Rudelson ‘99, Ahlswede-Winter ’02]
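A small numerical sketch (ours, with arbitrary dimensions) of this setting: sum independent rank-one PSD matrices with ‖X_i‖ = 1 and observe that the spectral-norm deviation from the expectation is well below μ.

```python
# Illustration only: X_i = v_i v_i^T for random unit vectors v_i, so ||X_i|| = 1,
# X = sum_i X_i, and we measure ||X - E[X]|| in spectral norm.
import numpy as np

rng = np.random.default_rng(1)
d, n, trials = 10, 400, 200
EX = (n / d) * np.eye(d)                    # E[v v^T] = I/d for a uniform unit vector
devs = []
for _ in range(trials):
    V = rng.normal(size=(n, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    X = V.T @ V                             # X = sum_i v_i v_i^T
    devs.append(np.linalg.norm(X - EX, 2))
print("typical ||X - EX||:", round(float(np.median(devs)), 2), " vs  mu =", n / d)
```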

6 What if variables are not independent?
X ∈ {0,1} random variable. If Y = X, then X + Y is not concentrated: it is 0 or 2. If Z = 1 − X, then X + Z is very concentrated: always 1. Negative dependence: X makes Z less likely and vice versa.

7 What if variables are not independent?
ξ ∈ {0,1}^m random variable. Negative pairwise correlation: for all pairs i ≠ j, ξ(i) = 1 ⇒ lower probability of ξ(j) = 1. Formally, ℙ[ξ(j) = 1 | ξ(i) = 1] ≤ ℙ[ξ(j) = 1].

8 What if variables are not independent?
ξ ∈ {0,1}^m random variable. Negative correlation: for all S ⊆ [m], ℙ[∀i ∈ S: ξ(i) = 1] ≤ Π_{i∈S} ℙ[ξ(i) = 1]. Can we get a Chernoff bound? Yes. If ξ AND its bitwise negation ξ̄ are both negatively correlated, Chernoff-like concentration applies to Σ_i ξ(i) [Goyal-Rademacher-Vempala ‘09, Dubhashi-Ranjan ’98].
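For concreteness (our toy example, not from the talk): sampling k of m items without replacement is negatively correlated, and a quick simulation confirms ℙ[ξ(i) = ξ(j) = 1] ≤ ℙ[ξ(i) = 1] · ℙ[ξ(j) = 1].

```python
# Illustration only: uniform samples of k out of m items without replacement.
# Exact values: P[items 0 and 1 both picked] = k(k-1)/(m(m-1)) = 0.1333...,
# while P[0 picked] * P[1 picked] = (k/m)^2 = 0.16.
import numpy as np

rng = np.random.default_rng(2)
m, k, trials = 10, 4, 50_000
picks = np.zeros((trials, m), dtype=bool)
for t in range(trials):
    picks[t, rng.choice(m, size=k, replace=False)] = True
p_i, p_j = picks[:, 0].mean(), picks[:, 1].mean()
p_both = (picks[:, 0] & picks[:, 1]).mean()
print(f"{p_both:.3f} <= {p_i * p_j:.3f}")
```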

9 Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]. ξ ∈ {0,1}^m random variable. Many nice properties: negative pairwise correlation, negative association, closed under conditioning and marginalization.

10 Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]. ξ ∈ {0,1}^m random variable. Examples: uniformly sampling k items without replacement, random spanning trees, determinantal point processes and volume sampling, symmetric exclusion processes.

11 Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]. ξ ∈ {0,1}^m random variable. k-homogeneous strongly Rayleigh: |{i : ξ(i) = 1}| = k always.

12 Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed A_i ∈ ℝ^{d×d}, positive semi-definite; ξ ∈ {0,1}^m is k-homogeneous strongly Rayleigh; random X = Σ_i ξ(i) A_i; ‖𝔼[X]‖ = μ; ‖A_i‖ ≤ r. Then ℙ[‖X − 𝔼[X]‖ > με] ≤ d · 2 exp(−με² / (r(log k + ε))) ≤ τ, e.g. if ε = 0.5, r = 1 and μ = 10 log(d/τ) log(k). Scalar version: Peres-Pemantle ‘14.

13 Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed A_i ∈ ℝ^{d×d}, positive semi-definite; ξ ∈ {0,1}^m is k-homogeneous strongly Rayleigh; random X = Σ_i ξ(i) A_i; ‖𝔼[X]‖ = μ; ‖A_i‖ ≤ r. Then ℙ[‖X − 𝔼[X]‖ > με] ≤ d · 2 exp(−με² / (r(log k + ε))) ≤ τ, e.g. if ε = log(k), r = 1 and μ = 10 log(d/τ). Scalar version: Peres-Pemantle ‘14.
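A hedged empirical sketch of the theorem's setting (ours; parameters arbitrary): ξ is k-homogeneous strongly Rayleigh, here simply uniform sampling of k of m indices without replacement, and the A_i are fixed rank-one PSD matrices with ‖A_i‖ = 1.

```python
# Illustration only: X = sum_i xi(i) A_i where xi is a uniform k-subset of [m]
# (a k-homogeneous strongly Rayleigh distribution) and A_i = v_i v_i^T.
import numpy as np

rng = np.random.default_rng(3)
d, m, k = 8, 400, 200
V = rng.normal(size=(m, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
A = np.einsum('mi,mj->mij', V, V)                 # A_i = v_i v_i^T, ||A_i|| = 1
EX = (k / m) * A.sum(axis=0)                      # E[xi(i)] = k/m for every i

devs = []
for _ in range(200):
    S = rng.choice(m, size=k, replace=False)
    devs.append(np.linalg.norm(A[S].sum(axis=0) - EX, 2))
print("typical ||X - EX||:", round(float(np.median(devs)), 2),
      " vs  mu =", round(float(np.linalg.norm(EX, 2)), 2))
```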

14 An application: Graph approximation using random spanning trees

15 Spanning trees of a graph
Edge weights w : E → ℝ₊, n = |V|. Spanning trees of G.

16 Random spanning trees Graph 𝐺 Tree distribution Pick a random tree?

17 Random spanning trees Does the sum of a few random spanning trees resemble the graph? E.g. is the weight across each cut similar? Starter question: Are the edge weights similar in expectation?

18 Random spanning trees Graph 𝐺 Tree distribution Pick a random tree

19 Random spanning trees. Graph G, tree distribution; pick a random tree. p_e: probability that edge e is present; for the edge shown, p_e = 1/2.

20 Random spanning trees. Graph G, tree distribution; pick a random tree. p_e: probability that edge e is present; for the edge shown, p_e = 5/8.
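To make the marginals p_e concrete, here is a minimal sketch (ours, not the talk's figure) that samples uniform spanning trees with Wilson's loop-erased random walk algorithm and estimates how often each edge appears. On a 4-cycle with one diagonal the estimates come out near 5/8 for the cycle edges and 1/2 for the diagonal, which happens to match the values quoted on these slides; the talk's actual figure may use a different graph.

```python
# Illustration only: estimate edge marginals p_e of a uniform random spanning
# tree via Wilson's algorithm (loop-erased random walks).
import random
from collections import Counter

def wilson_ust(adj, root=0):
    """Edge set of a uniform random spanning tree of an unweighted graph."""
    n = len(adj)
    in_tree = [False] * n
    nxt = [None] * n
    in_tree[root] = True
    for start in range(n):
        u = start
        while not in_tree[u]:            # random walk, overwriting loops
            nxt[u] = random.choice(adj[u])
            u = nxt[u]
        u = start
        while not in_tree[u]:            # add the loop-erased path to the tree
            in_tree[u] = True
            u = nxt[u]
    return {frozenset((v, nxt[v])) for v in range(n) if v != root}

adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}   # 4-cycle + diagonal (0,2)
T, counts = 20_000, Counter()
for _ in range(T):
    counts.update(wilson_ust(adj))
for e, c in sorted(counts.items(), key=lambda kv: tuple(sorted(kv[0]))):
    print(tuple(sorted(e)), round(c / T, 3))   # ~0.625 for cycle edges, ~0.5 for (0, 2)
```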

21 Random spanning trees. Getting the expectation right: set w_T(e) = (1/p_e) · w_G(e) with probability p_e (edge in the tree) and 0 otherwise. Then 𝔼[w_T(e)] = p_e · (1/p_e) · w_G(e) = w_G(e).

22 Reweighted random spanning trees
Original weights vs. tree weights. [Figure: original weights (all 1) on the left; on the right, a sampled tree whose edges get weights 1/p_e, here 2 and 8/5.]

23 Reweighted random spanning trees
Original weights vs. tree weights. [Figure: original weights (all 1); a different sampled tree whose edges each get weight 1/p_e = 8/5.]

24 Reweighted random spanning trees
Original weights vs. tree weights. [Figure: averaging the reweighted tree weights (1/p_e with probability p_e) over the random tree gives back weight 1 on every edge.]

25 Reweighted random spanning trees
Original weights vs. tree weights. [Figure: averaging the reweighted tree weights over the random tree gives back weight 1 on every edge.] The average weight over trees equals the original weight. Does the tree "behave like" the original graph?

26 Preserving cuts? Given a cut S ⊆ V, w_G(S, S̄) = Σ_{(a,b) ∈ E ∩ (S × S̄)} w_ab. Want, for all S ⊆ V, w_T(S, S̄) ≈ w_G(S, S̄) with high probability? Too much to ask of one tree! How many edges are necessary?

27 Independent edge samples
𝐺 graph Flip a coin for each edge to decide if present 𝐻 random graph, independent edges

28 Independent edge samples
𝐺 graph Flip a coin for each edge to decide if present 𝐻 random graph, independent edges

29 Independent edge samples
Getting the expectation right: set w_H(e) = (1/p_e) · w_G(e) with probability p_e and 0 otherwise. Then 𝔼[w_H(e)] = p_e · (1/p_e) · w_G(e) = w_G(e).

30 Preserving cuts? Benczur-Karger ‘96. Sample edges independently with "well-chosen" coin probabilities p_e, s.t. H has on average O(ε⁻² n log n) edges. Then w.h.p. for all cuts S ⊆ V, (1 − ε) w_G(S, S̄) ≤ w_H(S, S̄) ≤ (1 + ε) w_G(S, S̄). Proof sketch: count the number of cuts of each size; apply a scalar Chernoff concentration bound per cut.

31 Reweighted random tree
Original weights vs. tree weights. [Figure: averaging the reweighted tree weights over the random tree gives back weight 1 on every edge.] The average weight over trees equals the original weight. Does the tree "behave like" the original graph?

32 Combining trees
Maybe it's better if we average a few trees? [Figure: averaging two reweighted trees; an edge present in both trees keeps its weight (e.g. 8/5), an edge present in exactly one gets half its tree weight (e.g. 4/5 or 1), and an edge in neither gets 0.]

33 Preserving cuts? Fung-Harvey & Hariharan-Panigrahi ‘10. Let H = (1/t) Σ_{i=1}^t T_i be the average of t = O(ε⁻² log² n) reweighted random spanning trees of G. Then w.h.p. for all cuts S ⊆ V, (1 − ε) w_G(S, S̄) ≤ w_H(S, S̄) ≤ (1 + ε) w_G(S, S̄). Proof sketch: Benczur-Karger cut counting; scalar Chernoff works for negatively correlated variables.

34 Preserving cuts? Goyal-Rademacher-Vempala ‘09. Given an unweighted bounded-degree graph G, let H = (1/t) Σ_{i=1}^t T_i be the average of t = O(1) unweighted random spanning trees of G. Then w.h.p. for all cuts S ⊆ V, Ω(1/log n) · w_G(S, S̄) ≤ w_H(S, S̄) ≤ w_G(S, S̄). Proof sketch: Benczur-Karger cut counting, plus the first tree gets the small cuts; scalar Chernoff works for negatively correlated variables.

35 Preserving more than cuts: Matrices and quadratic forms

36 Laplacians: It’s springs!
Weighted, undirected graph G = (V, E, w), w : E → ℝ₊. The Laplacian L is a |V| × |V| matrix describing G. On each edge (a, b), put a spring between the vertices. Nail down each vertex a at position x(a) along the real line. [Figure: positions x(a), x(b), x(c) on the line.]

37 Laplacians: It’s springs!
Length of the spring on (a, b): |x(a) − x(b)|. Energy = spring constant · length² = w_ab (x(a) − x(b))². Total energy: xᵀLx = Σ_{(a,b)∈E} w_ab (x(a) − x(b))².

38 Laplacians
xᵀLx = Σ_{(a,b)∈E} w_ab (x(a) − x(b))² = Σ_{(a,b)∈E} xᵀ L_(a,b) x, so L = Σ_{(a,b)∈E} L_(a,b): one "baby Laplacian" L_(a,b) per edge.
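A minimal sketch (ours) of this identity: build L as a sum of per-edge "baby Laplacians" and check xᵀLx against the weighted sum of squared differences; the toy triangle and its weights are arbitrary.

```python
# Illustration only: L = sum over edges of w_ab (e_a - e_b)(e_a - e_b)^T, and
# x^T L x = sum over edges of w_ab (x_a - x_b)^2.
import numpy as np

def edge_laplacian(n, a, b, w):
    chi = np.zeros(n)
    chi[a], chi[b] = 1.0, -1.0
    return w * np.outer(chi, chi)          # the "baby Laplacian" of edge (a, b)

edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]
n = 3
L = sum(edge_laplacian(n, a, b, w) for a, b, w in edges)

x = np.array([0.3, -1.2, 2.0])
print(np.isclose(x @ L @ x, sum(w * (x[a] - x[b]) ** 2 for a, b, w in edges)))  # True
```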

39 Laplacian of a graph. [Figure: the Laplacian of a small example graph with edge weights 1, 1, 2.]

40 Preserving matrices? Suppose H is a random weighted graph s.t. for every edge e, 𝔼[w_H(e)] = w_G(e). Then 𝔼[L_H] = L_G. Does L_H "behave like" L_G?

41 Preserving quadratic forms?
For all x ∈ ℝ^V, (1 − ε) xᵀL_G x ≤ xᵀL_H x ≤ (1 + ε) xᵀL_G x with high probability? Useful? Since 1_Sᵀ L_G 1_S = w_G(S, S̄), taking x = 1_S shows that cuts are preserved. The quadratic form is also crucial for solving linear equations.
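A quick check (ours, toy graph) of the fact used here, 1_Sᵀ L_G 1_S = w_G(S, S̄): the quadratic form on the 0/1 indicator of S counts exactly the weight crossing the cut.

```python
# Illustration only: for x = 1_S (0/1 indicator of S), x^T L x equals the cut weight.
import numpy as np

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]
n = 4
L = np.zeros((n, n))
for a, b, w in edges:
    L[a, a] += w; L[b, b] += w
    L[a, b] -= w; L[b, a] -= w

S = [0, 1]
ind = np.zeros(n); ind[S] = 1.0
cut = sum(w for a, b, w in edges if (a in S) != (b in S))
print(np.isclose(ind @ L @ ind, cut))   # True
```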

42 Preserving quadratic forms?
Spielman-Srivastava ’08 (à la Tropp). Sample edges independently with "well-chosen" coin probabilities p_e, s.t. H has on average O(ε⁻² n log n) edges. Then w.h.p. for all x ∈ ℝ^V, (1 − ε) xᵀL_G x ≤ xᵀL_H x ≤ (1 + ε) xᵀL_G x. Proof sketch: bound the spectral norm of the sampled edge "baby Laplacians"; matrix Chernoff concentration.

43 What sampling probabilities?
Spielman-Srivastava ’08: "well-chosen" coin probabilities p_e ∝ max_x (xᵀL_e x)/(xᵀLx). What is the marginal probability of an edge being present in a random spanning tree? Also proportional to max_x (xᵀL_e x)/(xᵀLx) (!). Random spanning trees similar to sparsification?
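To make this concrete (our sketch, not from the talk): for L_e = w_e b_e b_eᵀ, the quantity max_x (xᵀL_e x)/(xᵀLx) equals w_e · b_eᵀ L⁺ b_e, the leverage score of e, computable from the Laplacian pseudoinverse. On the unweighted 4-cycle-plus-diagonal toy graph these come out to 5/8 and 1/2, coinciding with the spanning-tree marginals p_e seen earlier, and they always sum to n − 1.

```python
# Illustration only: leverage scores w_e * b_e^T L^+ b_e via the pseudoinverse.
import numpy as np

edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]
n = 4
L = np.zeros((n, n))
for a, b, w in edges:
    L[a, a] += w; L[b, b] += w
    L[a, b] -= w; L[b, a] -= w
Lpinv = np.linalg.pinv(L)

total = 0.0
for a, b, w in edges:
    chi = np.zeros(n); chi[a], chi[b] = 1.0, -1.0
    lev = w * chi @ Lpinv @ chi            # = max_x (x^T L_e x) / (x^T L x)
    total += lev
    print((a, b), round(lev, 3))           # 0.625 for cycle edges, 0.5 for (0, 2)
print("sum of leverage scores:", round(total, 3))   # n - 1 = 3
```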

44 Preserving quadratic forms?
K.-Song ’18. Let H = (1/t) Σ_{i=1}^t T_i be the average of t = O(ε⁻² log² n) reweighted random spanning trees of G. Then w.h.p. for all x ∈ ℝ^V, (1 − ε) xᵀL_G x ≤ xᵀL_H x ≤ (1 + ε) xᵀL_G x. Proof sketch: bound the norms of the sampled matrices (immediate via SS’08); strongly Rayleigh matrix Chernoff concentration.

45 Random spanning trees. xᵀ((1/t) Σ_{i=1}^t L_{T_i})x ≈_ε xᵀL_G x for t = O(ε⁻² log² n). Lower bound (K.-Song ’18): t = Ω(ε⁻² log n) trees are needed for an ε-spectral sparsifier. Open question: right number of logs? Guess: O(ε⁻² log n) trees.

46 Random spanning trees. More results (K.-Song ’18): xᵀL_T x ≤ O(log n) · xᵀL_G x for all x w.h.p.; ⇒ in ε-spectrally connected graphs, a random tree is O(ε log n)-spectrally thin. Lower bounds: in some graphs, with probability ≥ 1 − e^{−0.4n}, there exists x s.t. xᵀL_T x ≰ (1/8)(log n / log log n) · xᵀL_G x, and for some y, yᵀL_G y ≰ yᵀL_T y. In a ring graph, there exist x, y s.t. xᵀL_T x ≰ xᵀL_G x and (1/(n−2)) · yᵀL_G y ≰ yᵀL_T y.

47 Proving the strongly Rayleigh matrix Chernoff bound

48 An illustrative case: xᵀL_T x ≤ O(log n) · xᵀL_G x for all x, w.h.p.

49 Loewner order. A ≼ B iff xᵀAx ≤ xᵀBx for all x. So "xᵀL_T x ≤ O(log n) · xᵀL_G x for all x" can be written L_T ≼ O(log n) · L_G.
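A tiny numerical restatement (ours) of the definition: A ≼ B exactly when B − A is positive semi-definite, i.e. its smallest eigenvalue is nonnegative.

```python
# Illustration only: check A ≼ B by testing that B - A has no negative eigenvalue.
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 0.5]])
B = np.array([[2.0, 0.3], [0.3, 1.0]])
print(np.linalg.eigvalsh(B - A).min() >= -1e-12)   # True, so A ≼ B
x = np.array([0.7, -1.3])
print(x @ A @ x <= x @ B @ x)                      # consistent with the definition
```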

50 Proof strategy? Convert the problem to a Doob martingale; matrix martingale concentration; control the effect of conditioning via a coupling; norm bound from the coupling; variance bound from coupling symmetry + shrinking marginals.

51 What is a martingale? A sequence of random variables Y_0, …, Y_k s.t. 𝔼[Y_i | Y_0, …, Y_{i−1}] = Y_{i−1}. Many concentration bounds for independent random variables can be generalized to the martingale case, to show Y_k ≈ Y_0 w.h.p.

52 Concentration of martingales
Why do martingales exhibit concentration? Each difference is zero mean conditional on previous outcomes: 𝔼[Y_i − Y_{i−1} | Y_0, …, Y_{i−1}] = 0. If each difference Y_i − Y_{i−1} is small, then Y_k − Y_0 = Σ_i (Y_i − Y_{i−1}) ≈ 0.

53 Doob martingales
Random variables γ_1, …, γ_k, NOT independent. Goal: prove concentration for f(γ_1, …, γ_k), where f is "stable" under small changes to γ_1, …, γ_k: is f(γ_1, …, γ_k) ≈ 𝔼[f(γ_1, …, γ_k)]? We also need γ_1, …, γ_k to be stable under conditioning.

54 Doob martingales. Pick a random outcome γ_1, …, γ_k from the distribution. Y_0 = 𝔼[f(γ_1, …, γ_k)], Y_1 = 𝔼[f(γ_1, …, γ_k) | γ_1], Y_2 = 𝔼[f(γ_1, …, γ_k) | γ_1, γ_2], …, Y_k = 𝔼[f(γ_1, …, γ_k) | γ_1, γ_2, …, γ_k] = f(γ_1, …, γ_k). Note 𝔼[Y_1] = 𝔼_{γ_1} 𝔼[f(γ_1, …, γ_k) | γ_1] = 𝔼[f(γ_1, …, γ_k)] = Y_0, and 𝔼[Y_i − Y_{i−1} | prev. steps] = 0: a martingale, despite γ_1, …, γ_k NOT being independent. Show Y_k ≈ Y_0, i.e. f(γ_1, …, γ_k) ≈ 𝔼[f(γ_1, …, γ_k)].
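A toy sketch (ours, not the tree martingale used in the talk): reveal, one at a time, the k items of a uniform sample without replacement and track Y_i = 𝔼[sum of the sample | first i items]; here the conditional expectation has a closed form, and the increments Y_i − Y_{i−1} average to zero even though the items are not independent.

```python
# Illustration only: Doob martingale for f = sum of a uniform k-subset drawn
# without replacement. Y_i = sum of revealed items + (k - i) * mean of the rest.
import numpy as np

rng = np.random.default_rng(4)
values = np.array([1.0, 2.0, 5.0, 7.0, 11.0, 13.0])
m, k, trials = len(values), 3, 50_000

increments = np.zeros((trials, k))
for t in range(trials):
    order = rng.permutation(m)[:k]          # the sample, in the order revealed
    Y_prev = k * values.mean()              # Y_0 = E[f]
    chosen = []
    for i, idx in enumerate(order, start=1):
        chosen.append(idx)
        rest = np.delete(values, chosen)
        Y_i = values[chosen].sum() + (k - i) * rest.mean()
        increments[t, i - 1] = Y_i - Y_prev
        Y_prev = Y_i
print("mean increment per step:", increments.mean(axis=0).round(2))   # ~ [0, 0, 0]
```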

55 Our Doob martingale. Reveal one edge of the tree at a time. Let γ_i denote the index of the i-th edge of the tree, and pick the random tree as T = {γ_1, γ_2, …, γ_{n−1}}, so L_T = f(γ_1, γ_2, …, γ_{n−1}). Y_0 = 𝔼[L_T], Y_1 = 𝔼[L_T | γ_1], …, Y_{n−1} = 𝔼[L_T | γ_1, γ_2, …, γ_{n−1}] = L_T. 𝔼[Y_i − Y_{i−1} | prev. steps] = 0.

56 Our Doob martingale. Want to show Y_{n−1} = L_T is close to Y_0 = 𝔼[L_T]; Y_{n−1} − Y_0 = Σ_i (Y_i − Y_{i−1}). Matrix martingale concentration? Matrix Freedman (Tropp ‘11): a norm bound ‖Y_i − Y_{i−1}‖ ≤ 1 and a variance bound Σ_i 𝔼[(Y_i − Y_{i−1})² | prev. steps] ≤ O(log n) imply L_T ≼ O(log n) · L_G w.h.p.

57 Our Doob martingale. Want to show Y_{n−1} = L_T is close to Y_0 = 𝔼[L_T]; Y_{n−1} − Y_0 = Σ_i (Y_i − Y_{i−1}). Matrix Freedman (Tropp ‘11): a norm bound ‖Y_i − Y_{i−1}‖ ≤ 1 and a variance bound Σ_i 𝔼[(Y_i − Y_{i−1})² | prev. steps] ≤ O(log n) imply L_T ≼ O(log n) · L_G w.h.p. The difficult part: how can we understand, say, Y_42 = 𝔼[L_T | γ_1, …, γ_42]?

58 How does conditioning change the distribution?
Graph Tree distribution All Conditional Pick a random tree, conditional on red edge present?

59 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Conditional

60 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Conditional

61 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′

62 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′

63 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′

64 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

65 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

66 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

67 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

68 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

69 Coupling table – difference ≤2

70 Good couplings. Stochastic covering property: for k-homogeneous strongly Rayleigh distributions, a coupling of T, T′ with difference ≤ 2 always exists. [Borcea-Branden-Liggett ’09] [Peres-Pemantle ‘14]

71 Coupling table has more structure
Alice, Bob, and Charlie want to form a tree by each selecting one edge: γ_1, γ_2, γ_3.

72 Coupling table has more structure
Alice picks γ_1. How much does this restrict Bob and Charlie?

73 Coupling table has more structure
Alice picks γ_1. How much does this restrict Bob and Charlie? After Bob and Charlie choose their edges, they enlist Anna to pick an extra "make-up edge" γ̃_1. Anna can choose that edge s.t. Anna, Bob, & Charlie obtain the original distribution.

74 Coupling table has more structure
Recover the original distribution by adding a "make-up edge" γ̃_1 to the conditional distribution.

75 Coupling table has more structure
Recover the original distribution by adding ≤ 1 "make-up edge" γ̃_1 to the conditional distribution.

76 Coupling table has more structure
Recover the original distribution by adding ≤ 1 "make-up edge" γ̃_1 to the conditional distribution.

77 Coupling table has more structure
Recover the original distribution by adding ≤ 1 "make-up edge" γ̃_1 to the conditional distribution.

78 Coupling table has more structure
Alice, Bob, Charlie: conditional distribution; Anna (the "make-up edge" γ̃_1), Bob, Charlie: original distribution (!). So Y_1 − Y_0 = 𝔼[L_T | γ_1] − 𝔼[L_T] = 𝔼[Alice + Bob + Charlie | Alice] − 𝔼[Anna + Bob + Charlie | Alice] = Alice − 𝔼[Anna | Alice] = L_{γ_1} − 𝔼[L_{γ̃_1} | γ_1].

79 Coupling table has more structure
Alice, Bob, Charlie: conditional distribution; Anna (the "make-up edge" γ̃_1), Bob, Charlie: original distribution (!). Y_1 − Y_0 = 𝔼[L_T | γ_1] − 𝔼[L_T] = L_{γ_1} − 𝔼[L_{γ̃_1} | γ_1]. Norm bound: ‖Y_1 − Y_0‖ ≤ max(‖L_{γ_1}‖, ‖𝔼[L_{γ̃_1} | γ_1]‖) ≤ max(‖L_{γ_1}‖, 𝔼[‖L_{γ̃_1}‖ | γ_1]) ≤ max_e ‖L_e‖ ≤ 1.

80 Coupling table has more structure
Alice, Bob, Charlie: conditional distribution; Anna (the "make-up edge" γ̃_1), Bob, Charlie: original distribution (!). Since 𝔼[Y_1 − Y_0] = 0, we get 𝔼[L_{γ_1} − 𝔼[L_{γ̃_1} | γ_1]] = 0, i.e. 𝔼[L_{γ_1}] = 𝔼[𝔼[L_{γ̃_1} | γ_1]]: Alice (γ_1) and Anna (the "make-up edge" γ̃_1) have the same marginal. An important symmetry.

81 Coupling table has more structure
Alice, Bob, Charlie: conditional distribution; Anna (the "make-up edge" γ̃_1), Bob, Charlie: original distribution (!). Variance of the first step: 𝔼[(Y_1 − Y_0)²] = 𝔼[(L_{γ_1} − 𝔼[L_{γ̃_1} | γ_1])²] ≼ 𝔼[L_{γ_1} + 𝔼[L_{γ̃_1} | γ_1]] ≼ 2 𝔼[L_{γ_1}], using the symmetry from the previous slide.

82 Coupling table has more structure
𝔼[(Y_1 − Y_0)²] ≼ 2 𝔼[L_{γ_1}], and 𝔼[L_{γ_1}] = (1/(n−1)) 𝔼[L_T] = (1/(n−1)) L_G, since each of the n−1 tree edges is equally likely to be revealed first. So 𝔼[(Y_1 − Y_0)²] ≼ (2/(n−1)) L_G.

83 Later steps. What about Y_t − Y_{t−1}? Boils down to bounding 𝔼[L_{γ_t} | γ_1, γ_2, …, γ_{t−1}].

84 How does conditioning change the distribution?
Graph Tree distribution All Conditional Pick a random tree, conditional on red edge present?

85 How does conditioning change the distribution?
Graph Tree distribution All Conditional “Shrinking Marginals Lemma” All other edges become less likely

86 How does conditioning change the distribution?
Graph, tree distribution: all vs. conditional. "Shrinking Marginals Lemma": all other edges become less likely. Example: an edge's marginal drops from 5/8 = 0.625 (unconditional) to 0.6 (conditional).

87 How does conditioning change the distribution?
Graph, tree distribution: all vs. conditional. "Shrinking Marginals Lemma": all other edges become less likely. Example: another edge's marginal drops from 4/8 = 0.5 (unconditional) to 0.4 (conditional).

88 Later steps
Shrinking marginals: 𝔼[sum of the remaining edges' Laplacians | γ_1, γ_2, …, γ_{i−1}] ≼ 𝔼[L_T] = L_G, so 𝔼[L_{γ_i} | γ_1, γ_2, …, γ_{i−1}] ≼ (1/(n−i)) L_G. Hence Σ_i 𝔼[(Y_i − Y_{i−1})² | prev. steps] ≼ Σ_i (2/(n−i)) L_G = 2(1/1 + 1/2 + ⋯ + 1/(n−1)) L_G = O(log n) L_G.
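The O(log n) comes from a harmonic sum; a quick arithmetic check (ours):

```python
# Illustration only: sum_{i=1}^{n-1} 2/(n-i) = 2 * (1 + 1/2 + ... + 1/(n-1)) ~ 2 ln n.
import math

n = 1_000_000
s = sum(2.0 / (n - i) for i in range(1, n))
print(round(s, 2), "vs 2 ln n =", round(2 * math.log(n), 2))   # close, up to 2 * Euler's constant
```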

89 Our Doob martingale. Want to show Y_{n−1} = L_T is close to Y_0 = 𝔼[L_T]; Y_{n−1} − Y_0 = Σ_i (Y_i − Y_{i−1}). Matrix Freedman (Tropp ‘11): the norm bound ‖Y_i − Y_{i−1}‖ ≤ 1 and the variance bound Σ_i 𝔼[(Y_i − Y_{i−1})² | prev. steps] ≤ O(log n) imply L_T ≼ O(log n) · L_G w.h.p.

90 Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed A_i ∈ ℝ^{d×d}, positive semi-definite; ξ ∈ {0,1}^m is k-homogeneous strongly Rayleigh; random X = Σ_i ξ(i) A_i; ‖𝔼[X]‖ = μ; ‖A_i‖ ≤ r. Then ℙ[‖X − 𝔼[X]‖ > με] ≤ d · 2 exp(−με² / (r(log k + ε))).

91 Open Questions. Our bound for k-homogeneous strongly Rayleigh: ℙ[‖X − 𝔼[X]‖ > με] ≤ d · 2 exp(−με² / (r(log k + ε))). Remove the log k? Remove the homogeneity condition? Find more applications. Show a log n sparsifier from O(1) spanning trees?

92 Thanks!

