1
A matrix Chernoff bound for strongly Rayleigh distributions and spectral sparsifiers from a few random spanning trees. Speaker: Rasmus Kyng, Harvard University. Joint work with Zhao Song, UT Austin. October 2018.
2
Concentration of scalar random variables
Independent random 𝑋_𝑖 ∈ ℝ, 𝑋 = ∑_𝑖 𝑋_𝑖. Is 𝑋 ≈ 𝔼𝑋 with high probability?
3
Concentration of scalar random variables
Chernoff inequality. Independent random 𝑋_𝑖 ∈ ℝ_{≥0}, 𝑋 = ∑_𝑖 𝑋_𝑖, 𝔼𝑋 = 𝜇, 𝑋_𝑖 ≤ 𝑟 gives ℙ[|𝑋 − 𝜇| > 𝜀𝜇] ≤ 2 exp(−𝜇𝜀² / (𝑟(2+𝜀))), which is ≤ 𝜏 e.g. if 𝜀 = 0.5, 𝑟 = 1 and 𝜇 = 10 log(1/𝜏).
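A quick empirical sanity check of this inequality; a minimal sketch in Python, where the variables, sample sizes, and parameters (uniform 𝑋_𝑖 on [0, 𝑟], 𝜀 = 0.3) are illustrative assumptions, not taken from the talk:

```python
import numpy as np

# Sum of bounded, nonnegative, independent variables vs. the Chernoff bound.
rng = np.random.default_rng(0)
n_vars, trials = 100, 50_000
r, eps = 1.0, 0.3

X = rng.uniform(0.0, r, size=(trials, n_vars)).sum(axis=1)  # each X_i lies in [0, r]
mu = n_vars * r / 2.0                                       # E[X]

empirical = np.mean(np.abs(X - mu) > eps * mu)
bound = 2.0 * np.exp(-mu * eps**2 / (r * (2.0 + eps)))
print(f"mu = {mu}, empirical tail = {empirical:.2e}, Chernoff bound = {bound:.2e}")
```

The empirical tail probability comes out far below the (deliberately loose) bound.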
4
Concentration of random matrices
Independent random 𝑿_𝑖 ∈ ℝ^{𝑑×𝑑}, 𝑿 = ∑_𝑖 𝑿_𝑖. Is 𝑿 ≈ 𝔼𝑿 with high probability?
5
Concentration of random matrices
Matrix Chernoff [Tropp ’11]. Independent random 𝑿_𝑖 ∈ ℝ^{𝑑×𝑑}, positive semi-definite, 𝑿 = ∑_𝑖 𝑿_𝑖. ‖𝔼𝑿‖ = 𝜇 and ‖𝑿_𝑖‖ ≤ 𝑟 gives ℙ[‖𝑿 − 𝔼𝑿‖ > 𝜀𝜇] ≤ 𝑑 · 2 exp(−𝜇𝜀² / (𝑟(2+𝜀))), which is ≤ 𝜏 e.g. if 𝜀 = 0.5, 𝑟 = 1 and 𝜇 = 10 log(𝑑/𝜏). [Rudelson ‘99, Ahlswede-Winter ’02]
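The same kind of empirical check works for matrices; a hedged sketch where the rank-one terms, dimension, and counts are illustrative assumptions (for a uniformly random unit vector 𝑣, 𝔼[𝑣𝑣^⊤] = 𝐼/𝑑, so 𝜇 = ‖𝔼𝑿‖ is the number of terms divided by 𝑑):

```python
import numpy as np

# Sum independent random PSD matrices and look at the spectral-norm deviation
# that matrix Chernoff controls.
rng = np.random.default_rng(1)
d, n_terms, trials = 20, 400, 200

deviations = []
for _ in range(trials):
    X = np.zeros((d, d))
    for _ in range(n_terms):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)          # ||v v^T|| = 1, so r = 1
        X += np.outer(v, v)             # random rank-one PSD term
    EX = (n_terms / d) * np.eye(d)      # E[v v^T] = I/d for a random unit vector
    deviations.append(np.linalg.norm(X - EX, 2) / np.linalg.norm(EX, 2))

print(f"mu = {n_terms / d}, median relative deviation = {np.median(deviations):.3f}")
```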
6
What if variables are not independent?
𝑋 ∈ {0,1} random variable. 𝑌 = 𝑋: 𝑋 + 𝑌 is not concentrated, it is 0 or 2. 𝑍 = 1 − 𝑋: 𝑋 + 𝑍 is very concentrated, always 1. Negative dependence: 𝑋 makes 𝑍 less likely and vice versa.
7
What if variables are not independent?
𝜉 ∈ {0,1}^𝑚 random variable. Negative pairwise correlation: for all pairs 𝑖 ≠ 𝑗, 𝜉(𝑖) = 1 ⇒ lower probability of 𝜉(𝑗) = 1. Formally, ℙ[𝜉(𝑗) = 1 | 𝜉(𝑖) = 1] ≤ ℙ[𝜉(𝑗) = 1].
8
What if variables are not independent?
𝜉 ∈ {0,1}^𝑚 random variable. Negative correlation: for all 𝑆 ⊆ [𝑚], ℙ[∀𝑖 ∈ 𝑆. 𝜉(𝑖) = 1] ≤ ∏_{𝑖∈𝑆} ℙ[𝜉(𝑖) = 1]. Can we get a Chernoff bound? Yes. If both 𝜉 and its bitwise negation 𝜉̄ are negatively correlated, Chernoff-like concentration applies to ∑_𝑖 𝜉(𝑖) [Goyal-Rademacher-Vempala ‘09, Dubhashi-Ranjan ’98].
9
Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]; 𝜉 ∈ {0,1}^𝑚 random variable. Many nice properties: negative pairwise correlation; negative association; closed under conditioning and marginalization.
10
Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]; 𝜉 ∈ {0,1}^𝑚 random variable. Examples: uniformly sampling 𝑘 items without replacement; random spanning trees; determinantal point processes, volume sampling; symmetric exclusion processes.
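One of these examples is easy to play with directly: for an unweighted connected graph, a uniformly random spanning tree can be sampled with the Aldous-Broder random walk. A minimal sketch (the tiny example graph is an illustrative assumption):

```python
import random

def aldous_broder(adj, root):
    """Sample a uniformly random spanning tree of an unweighted connected graph.
    adj maps each vertex to its list of neighbours; returns the set of tree edges."""
    visited, tree, u = {root}, set(), root
    while len(visited) < len(adj):
        v = random.choice(adj[u])
        if v not in visited:               # first-entrance edge joins the tree
            visited.add(v)
            tree.add(frozenset((u, v)))
        u = v
    return tree

# Tiny example: a 4-cycle with one diagonal.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
tree = aldous_broder(adj, root=0)
print(sorted(tuple(sorted(e)) for e in tree))  # always n - 1 = 3 edges, no cycle
```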
11
Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]; 𝜉 ∈ {0,1}^𝑚 random variable. 𝑘-homogeneous strongly Rayleigh: |{𝑖 : 𝜉(𝑖) = 1}| = 𝑘 always.
12
Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed 𝑨_𝑖 ∈ ℝ^{𝑑×𝑑}, positive semi-definite; 𝜉 ∈ {0,1}^𝑚 is 𝑘-homogeneous strongly Rayleigh; random 𝑿 = ∑_𝑖 𝜉(𝑖) 𝑨_𝑖. ‖𝔼𝑿‖ = 𝜇 and ‖𝑨_𝑖‖ ≤ 𝑟 gives ℙ[‖𝑿 − 𝔼𝑿‖ > 𝜇𝜀] ≤ 𝑑 · 2 exp(−𝜇𝜀² / (𝑟(log 𝑘 + 𝜀))), which is ≤ 𝜏 e.g. if 𝜀 = 0.5, 𝑟 = 1 and 𝜇 = 10 log(𝑑/𝜏) log(𝑘). Scalar version: Peres-Pemantle ‘14.
13
Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed 𝑨_𝑖 ∈ ℝ^{𝑑×𝑑}, positive semi-definite; 𝜉 ∈ {0,1}^𝑚 is 𝑘-homogeneous strongly Rayleigh; random 𝑿 = ∑_𝑖 𝜉(𝑖) 𝑨_𝑖. ‖𝔼𝑿‖ = 𝜇 and ‖𝑨_𝑖‖ ≤ 𝑟 gives ℙ[‖𝑿 − 𝔼𝑿‖ > 𝜇𝜀] ≤ 𝑑 · 2 exp(−𝜇𝜀² / (𝑟(log 𝑘 + 𝜀))), which is ≤ 𝜏 e.g. if 𝜀 = log(𝑘), 𝑟 = 1 and 𝜇 = 10 log(𝑑/𝜏). Scalar version: Peres-Pemantle ‘14.
14
An application: Graph approximation using random spanning trees
15
Spanning trees of a graph
Edge weights 𝑤 : 𝐸 → ℝ₊, 𝑛 = |𝑉|. Spanning trees of 𝐺.
16
Random spanning trees Graph 𝐺 Tree distribution Pick a random tree?
17
Random spanning trees Does the sum of a few random spanning trees resemble the graph? E.g. is the weight across each cut similar? Starter question: Are the edge weights similar in expectation?
18
Random spanning trees Graph 𝐺 Tree distribution Pick a random tree
19
Random spanning trees. Graph 𝐺, tree distribution: pick a random tree. 𝑝_𝑒: probability that edge 𝑒 is present. Here 𝑝_𝑒 = 1/2.
20
Random spanning trees. Graph 𝐺, tree distribution: pick a random tree. 𝑝_𝑒: probability that edge 𝑒 is present. Here 𝑝_𝑒 = 5/8.
21
Random spanning trees. Getting the expectation right: set 𝑤_𝑇(𝑒) = (1/𝑝_𝑒) 𝑤_𝐺(𝑒) with probability 𝑝_𝑒 (edge in the tree), and 0 otherwise. Then 𝔼[𝑤_𝑇(𝑒)] = 𝑝_𝑒 ⋅ (1/𝑝_𝑒) 𝑤_𝐺(𝑒) = 𝑤_𝐺(𝑒).
22
Reweighted random spanning trees
Figure: original edge weights vs. reweighted tree weights; sampled edges are reweighted to 8/5 = 1/𝑝_𝑒.
23
Reweighted random spanning trees
Figure: original edge weights vs. reweighted tree weights; sampled edges are reweighted to 8/5 = 1/𝑝_𝑒.
24
Reweighted random spanning trees
Figure: original weights vs. tree weights; each sampled tree edge gets weight 1/𝑝_𝑒, and the expectation over a random tree recovers the original weights (all 1).
25
Reweighted random spanning trees
Figure: original weights vs. tree weights; each sampled tree edge gets weight 1/𝑝_𝑒, and the expectation over a random tree recovers the original weights. The average weight over trees equals the original weight. Does the tree “behave like” the original graph?
26
Preserving cuts? Given a cut 𝑆 ⊆ 𝑉, 𝑤_𝐺(𝑆, 𝑉∖𝑆) = ∑_{(𝑎,𝑏) ∈ 𝐸 ∩ (𝑆 × (𝑉∖𝑆))} 𝑤_{𝑎𝑏}. Want, for all 𝑆 ⊆ 𝑉, 𝑤_𝑇(𝑆, 𝑉∖𝑆) ≈ 𝑤_𝐺(𝑆, 𝑉∖𝑆) with high probability? Too much to ask of one tree! How many edges are necessary?
27
Independent edge samples
𝐺 graph Flip a coin for each edge to decide if present 𝐻 random graph, independent edges
28
Independent edge samples
𝐺 graph Flip a coin for each edge to decide if present 𝐻 random graph, independent edges
29
Independent edge samples
Getting the expectation right: set 𝑤_𝐻(𝑒) = (1/𝑝_𝑒) 𝑤_𝐺(𝑒) with probability 𝑝_𝑒, and 0 otherwise. Then 𝔼[𝑤_𝐻(𝑒)] = 𝑝_𝑒 ⋅ (1/𝑝_𝑒) 𝑤_𝐺(𝑒) = 𝑤_𝐺(𝑒).
30
Preserving cuts? Benczur-Karger ‘96. Sample edges independently with “well-chosen” coin probabilities 𝑝_𝑒, s.t. 𝐻 has on average 𝑂(𝜀⁻² 𝑛 log 𝑛) edges; then w.h.p. for all cuts 𝑆 ⊆ 𝑉: (1−𝜀) 𝑤_𝐺(𝑆, 𝑉∖𝑆) ≤ 𝑤_𝐻(𝑆, 𝑉∖𝑆) ≤ (1+𝜀) 𝑤_𝐺(𝑆, 𝑉∖𝑆). Proof sketch: count the number of cuts of each size; Chernoff concentration bound per cut.
31
Reweighted random tree
Figure: original weights vs. tree weights; each sampled tree edge gets weight 1/𝑝_𝑒, and the expectation over a random tree recovers the original weights. The average weight over trees equals the original weight. Does the tree “behave like” the original graph?
32
Combining trees. Maybe it’s better if we average a few trees? Figure: averaging two reweighted spanning trees (edge weights such as 8/5 and 4/5); is the result ≈ the original graph?
33
Preserving cuts? Fung-Harvey & Hariharan-Panigrahi ‘10. Let 𝐻 = (1/𝑡) ∑_{𝑖=1}^{𝑡} 𝑇_𝑖 be the average of 𝑡 = 𝑂(𝜀⁻² log² 𝑛) reweighted random spanning trees of 𝐺; then w.h.p. for all cuts 𝑆 ⊆ 𝑉: (1−𝜀) 𝑤_𝐺(𝑆, 𝑉∖𝑆) ≤ 𝑤_𝐻(𝑆, 𝑉∖𝑆) ≤ (1+𝜀) 𝑤_𝐺(𝑆, 𝑉∖𝑆). Proof sketch: Benczur-Karger cut counting; scalar Chernoff works for negatively correlated variables.
34
Preserving cuts? Goyal-Rademacher-Vempala ‘09. Given an unweighted bounded-degree graph 𝐺, let 𝐻 = (1/𝑡) ∑_{𝑖=1}^{𝑡} 𝑇_𝑖 be the average of 𝑡 = 𝑂(1) unweighted random spanning trees of 𝐺; then w.h.p. for all cuts 𝑆 ⊆ 𝑉: Ω(1/log 𝑛) 𝑤_𝐺(𝑆, 𝑉∖𝑆) ≤ 𝑤_𝐻(𝑆, 𝑉∖𝑆) ≤ 𝑤_𝐺(𝑆, 𝑉∖𝑆). Proof sketch: Benczur-Karger cut counting + the first tree gets small cuts; scalar Chernoff works for negatively correlated variables.
35
Preserving more than cuts: Matrices and quadratic forms
36
Laplacians: It’s springs!
Weighted, undirected graph 𝐺 = (𝑉, 𝐸, 𝑤), 𝑤 : 𝐸 → ℝ₊. The Laplacian 𝑳 is a |𝑉|×|𝑉| matrix describing 𝐺. On each edge (𝑎,𝑏), put a spring between the vertices. Nail down each vertex 𝑎 at position 𝒙(𝑎) along the real line.
37
Laplacians: It’s springs!
Length of the spring on edge (𝑎,𝑏) = |𝒙(𝑎) − 𝒙(𝑏)|. Energy = spring const. ⋅ (length)² = 𝑤_{𝑎𝑏} (𝒙(𝑎) − 𝒙(𝑏))². Total energy: 𝒙^⊤𝑳𝒙 = ∑_{(𝑎,𝑏)∈𝐸} 𝑤_{𝑎𝑏} (𝒙(𝑎) − 𝒙(𝑏))².
38
Laplacians. 𝒙^⊤𝑳𝒙 = ∑_{(𝑎,𝑏)∈𝐸} 𝑤_{𝑎𝑏} (𝑥(𝑎) − 𝑥(𝑏))² = ∑_{(𝑎,𝑏)∈𝐸} 𝒙^⊤𝑳_{(𝑎,𝑏)}𝒙, i.e. 𝑳 = ∑_{(𝑎,𝑏)∈𝐸} 𝑳_{(𝑎,𝑏)} with a “baby Laplacian” 𝑳_{(𝑎,𝑏)} per edge.
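A small numerical sketch of this identity: build 𝑳 as a sum of per-edge “baby Laplacians” and check that 𝒙^⊤𝑳𝒙 equals the spring-energy sum. The example graph, weights, and test vector are illustrative assumptions; the last lines also check the cut identity used later.

```python
import numpy as np

def edge_laplacian(n, a, b, w):
    """'Baby Laplacian' w * (e_a - e_b)(e_a - e_b)^T of the single edge (a, b)."""
    L_ab = np.zeros((n, n))
    L_ab[a, a] = L_ab[b, b] = w
    L_ab[a, b] = L_ab[b, a] = -w
    return L_ab

n = 4
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 2.0), (3, 0, 1.0)]   # (a, b, weight)
L = sum(edge_laplacian(n, a, b, w) for a, b, w in edges)

x = np.random.default_rng(2).normal(size=n)
quad = x @ L @ x
spring = sum(w * (x[a] - x[b]) ** 2 for a, b, w in edges)
print(np.isclose(quad, spring))    # True: x^T L x is the spring-energy sum

# Cut identity used later: for S = {0, 1}, 1_S^T L 1_S is the cut weight.
ind = np.array([1.0, 1.0, 0.0, 0.0])
print(ind @ L @ ind)               # = w(1,2) + w(3,0) = 2.0
```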
39
Laplacian of a graph. Figure: example graph with edge weights 1, 1, 2 and its Laplacian matrix.
40
Preserving matrices? Suppose 𝐻 is a random weighted graph s.t. for every edge 𝑒, 𝔼[𝑤_𝐻(𝑒)] = 𝑤_𝐺(𝑒). Then 𝔼[𝑳_𝐻] = 𝑳_𝐺. Does 𝑳_𝐻 “behave like” 𝑳_𝐺?
41
Preserving quadratic forms?
For all 𝒙 ∈ ℝ^𝑉: (1−𝜀) 𝒙^⊤𝑳_𝐺𝒙 ≤ 𝒙^⊤𝑳_𝐻𝒙 ≤ (1+𝜀) 𝒙^⊤𝑳_𝐺𝒙 with high probability? Useful? Since 𝟏_𝑆^⊤𝑳_𝐺𝟏_𝑆 = 𝑤_𝐺(𝑆, 𝑉∖𝑆), cuts are preserved by letting 𝒙 = 𝟏_𝑆. The quadratic form is crucial for solving linear equations.
42
Preserving quadratic forms?
Spielman-Srivastava ’08 (a la Tropp). Sample edges independently with “well-chosen” coin probabilities 𝑝_𝑒, s.t. 𝐻 has on average 𝑂(𝜀⁻² 𝑛 log 𝑛) edges; then w.h.p. for all 𝒙 ∈ ℝ^𝑉: (1−𝜀) 𝒙^⊤𝑳_𝐺𝒙 ≤ 𝒙^⊤𝑳_𝐻𝒙 ≤ (1+𝜀) 𝒙^⊤𝑳_𝐺𝒙. Proof sketch: bound the spectral norm of the sampled edge “baby Laplacians”; matrix Chernoff concentration.
43
What sampling probabilities?
Spielman-Srivastava ’08: “well-chosen” coin probabilities 𝑝_𝑒 ∝ max_𝒙 (𝒙^⊤𝑳_𝑒𝒙)/(𝒙^⊤𝑳𝒙). What is the marginal probability of an edge being present in a random spanning tree? Also proportional to max_𝒙 (𝒙^⊤𝑳_𝑒𝒙)/(𝒙^⊤𝑳𝒙) (!). Random spanning trees similar to sparsification?
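A sketch of this coincidence in code: the quantity max_𝒙 (𝒙^⊤𝑳_𝑒𝒙)/(𝒙^⊤𝑳𝒙) is the leverage score 𝑤_𝑒 ⋅ 𝑅_eff(𝑒), computable from the pseudoinverse of 𝑳, and it equals the marginal probability that 𝑒 appears in a random spanning tree, so the scores sum to 𝑛 − 1. The example graph is an illustrative assumption:

```python
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for a, b, w in edges:
        L[a, a] += w; L[b, b] += w
        L[a, b] -= w; L[b, a] -= w
    return L

n = 4
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]  # 4-cycle + chord
L = laplacian(n, edges)
Lpinv = np.linalg.pinv(L)

levs = []
for a, b, w in edges:
    chi = np.zeros(n); chi[a], chi[b] = 1.0, -1.0
    levs.append(w * (chi @ Lpinv @ chi))   # w_e * effective resistance (leverage score)

print(np.round(levs, 3))           # marginal probability of each edge in a random spanning tree
print(round(float(sum(levs)), 3))  # sums to n - 1 = 3: every spanning tree has n - 1 edges
```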
44
Preserving quadratic forms?
K.-Song ’18. Let 𝐻 = (1/𝑡) ∑_{𝑖=1}^{𝑡} 𝑇_𝑖 be the average of 𝑡 = 𝑂(𝜀⁻² log² 𝑛) reweighted random spanning trees of 𝐺; then w.h.p. for all 𝒙 ∈ ℝ^𝑉: (1−𝜀) 𝒙^⊤𝑳_𝐺𝒙 ≤ 𝒙^⊤𝑳_𝐻𝒙 ≤ (1+𝜀) 𝒙^⊤𝑳_𝐺𝒙. Proof sketch: bound norms of the sampled matrices (immediate via SS’08); strongly Rayleigh matrix Chernoff concentration.
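A sketch of the whole pipeline on a small unweighted graph: sample uniform spanning trees with an Aldous-Broder walk, reweight each tree edge by 1/𝑝_𝑒 with 𝑝_𝑒 its leverage score, average 𝑡 trees, and compare quadratic forms against 𝑳_𝐺. The graph, 𝑡, and the test vector are illustrative assumptions, and the check uses far fewer trees than the theorem asks for:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
edges = [(i, i + 1) for i in range(n - 1)]                       # path keeps G connected
edges += [(i, j) for i in range(n) for j in range(i + 2, n) if rng.random() < 0.2]

def laplacian(edge_weights):
    L = np.zeros((n, n))
    for (a, b), w in edge_weights.items():
        L[a, a] += w; L[b, b] += w; L[a, b] -= w; L[b, a] -= w
    return L

L_G = laplacian({e: 1.0 for e in edges})
adj = {v: [] for v in range(n)}
for a, b in edges:
    adj[a].append(b); adj[b].append(a)

# Marginal probability p_e of edge e in a uniform random spanning tree = leverage score.
Lpinv = np.linalg.pinv(L_G)
p = {}
for a, b in edges:
    chi = np.zeros(n); chi[a], chi[b] = 1.0, -1.0
    p[(a, b)] = chi @ Lpinv @ chi

def random_tree():
    """Aldous-Broder walk: uniform spanning tree of the unweighted graph."""
    visited, tree, u = {0}, [], 0
    while len(visited) < n:
        v = adj[u][rng.integers(len(adj[u]))]
        if v not in visited:
            visited.add(v)
            tree.append((min(u, v), max(u, v)))
        u = v
    return tree

t = 200
L_H = sum(laplacian({e: 1.0 / p[e] for e in random_tree()}) for _ in range(t)) / t

x = rng.normal(size=n)
print("x^T L_H x / x^T L_G x =", round(float((x @ L_H @ x) / (x @ L_G @ x)), 3))  # close to 1
```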
45
Random spanning trees. 𝒙^⊤((1/𝑡) ∑_{𝑖=1}^{𝑡} 𝑳_{𝑇_𝑖})𝒙 ≈_𝜀 𝒙^⊤𝑳_𝐺𝒙 for 𝑡 = 𝑂(𝜀⁻² log² 𝑛). Lower bound (K.-Song ’18): 𝑡 = Ω(𝜀⁻² log 𝑛) trees are needed for an 𝜀-spectral sparsifier. Open question: right number of logs? Guess: 𝑂(𝜀⁻² log 𝑛) trees.
46
Random spanning trees: more results (K.-Song ’18). 𝒙^⊤𝑳_𝑇𝒙 ≤ 𝑂(log 𝑛) 𝒙^⊤𝑳_𝐺𝒙 for all 𝒙 w.h.p. ⇒ in 𝜀-spectrally connected graphs a random tree is 𝑂(𝜀 log 𝑛)-spectrally thin. Lower bounds: in some graphs, with probability ≥ 1 − 𝑒^{−0.4𝑛}, there exists 𝒙 s.t. 𝒙^⊤𝑳_𝑇𝒙 ≰ (1/8)(log 𝑛 / log log 𝑛) 𝒙^⊤𝑳_𝐺𝒙, and for some 𝒚, 𝒚^⊤𝑳_𝐺𝒚 ≰ 𝒚^⊤𝑳_𝑇𝒚. In a ring graph, there exist 𝒙, 𝒚 s.t. 𝒙^⊤𝑳_𝑇𝒙 ≰ 𝒙^⊤𝑳_𝐺𝒙 and (1/(𝑛−2)) 𝒚^⊤𝑳_𝐺𝒚 ≰ 𝒚^⊤𝑳_𝑇𝒚.
47
Proving the strongly Rayleigh matrix Chernoff bound
48
An illustrative case: 𝒙^⊤𝑳_𝑇𝒙 ≤ 𝑂(log 𝑛) 𝒙^⊤𝑳_𝐺𝒙 for all 𝒙 w.h.p.
49
Loewner order: 𝑨 ≼ 𝑩 iff for all 𝒙, 𝒙^⊤𝑨𝒙 ≤ 𝒙^⊤𝑩𝒙. So “𝒙^⊤𝑳_𝑇𝒙 ≤ 𝑂(log 𝑛) 𝒙^⊤𝑳_𝐺𝒙 for all 𝒙” is the same as 𝑳_𝑇 ≼ 𝑂(log 𝑛) 𝑳_𝐺.
50
Proof strategy? Convert the problem to Doob martingales; matrix martingale concentration; control the effect of conditioning via coupling; norm bound from the coupling; variance bound: coupling symmetry + shrinking marginals.
51
What is a martingale? A sequence of random variables 𝑌_0, …, 𝑌_𝑘 s.t. 𝔼[𝑌_𝑖 | 𝑌_0, …, 𝑌_{𝑖−1}] = 𝑌_{𝑖−1}. Many concentration bounds for independent random variables can be generalized to the martingale case, to show 𝑌_𝑘 ≈ 𝑌_0 w.h.p.
52
Concentration of martingales
Why do martingales exhibit concentration? Each difference is zero mean, conditional on previous outcomes: 𝔼[𝑌_𝑖 − 𝑌_{𝑖−1} | 𝑌_0, …, 𝑌_{𝑖−1}] = 0. If each difference 𝑌_𝑖 − 𝑌_{𝑖−1} is small, then 𝑌_𝑘 − 𝑌_0 = ∑_𝑖 (𝑌_𝑖 − 𝑌_{𝑖−1}) ≈ 0.
53
Doob martingales
Random variables 𝛾_1, …, 𝛾_𝑘 (NOT independent). Goal: prove concentration for 𝑓(𝛾_1, …, 𝛾_𝑘), where 𝑓 is “stable” under small changes to 𝛾_1, …, 𝛾_𝑘: 𝑓(𝛾_1, …, 𝛾_𝑘) ≈ 𝔼[𝑓(𝛾_1, …, 𝛾_𝑘)]? Also need 𝛾_1, …, 𝛾_𝑘 stable under conditioning.
54
Doob martingales. Pick a random outcome 𝛾_1, …, 𝛾_𝑘 from the distribution. 𝑌_0 = 𝔼[𝑓(𝛾_1, …, 𝛾_𝑘)], 𝑌_1 = 𝔼[𝑓(𝛾_1, …, 𝛾_𝑘) | 𝛾_1], 𝑌_2 = 𝔼[𝑓(𝛾_1, …, 𝛾_𝑘) | 𝛾_1, 𝛾_2], ⋮, 𝑌_𝑘 = 𝔼[𝑓(𝛾_1, …, 𝛾_𝑘) | 𝛾_1, 𝛾_2, …, 𝛾_𝑘] = 𝑓(𝛾_1, …, 𝛾_𝑘). Note 𝔼[𝑌_1] = 𝔼_{𝛾_1}[𝔼[𝑓(𝛾_1, …, 𝛾_𝑘) | 𝛾_1]] = 𝔼[𝑓(𝛾_1, …, 𝛾_𝑘)] = 𝑌_0, and 𝔼[𝑌_𝑖 − 𝑌_{𝑖−1} | prev. steps] = 0: a martingale, despite 𝛾_1, …, 𝛾_𝑘 NOT being independent. Show 𝑌_𝑘 ≈ 𝑌_0, i.e. 𝑓(𝛾_1, …, 𝛾_𝑘) ≈ 𝔼[𝑓(𝛾_1, …, 𝛾_𝑘)].
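A minimal sketch of the construction for one concrete dependent example: 𝛾_1, …, 𝛾_𝑘 are drawn without replacement (so they are not independent) and 𝑓 is the sum of the chosen weights; the conditional expectations below form the Doob martingale. The ground set, weights, and sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
weights = rng.uniform(0.0, 1.0, size=10)   # ground set of m = 10 items
k = 4                                      # choose k without replacement (k-homogeneous)

def doob_path(weights, k, rng):
    """Reveal gamma_1, ..., gamma_k one at a time and return Y_0, ..., Y_k,
    where Y_i = E[f | gamma_1, ..., gamma_i] and f is the sum of chosen weights."""
    m = len(weights)
    order = rng.permutation(m)[:k]          # the revealed items, in order
    Y = [k * weights.mean()]                # Y_0 = E[f]
    chosen, remaining = [], list(range(m))
    for i, g in enumerate(order, start=1):
        chosen.append(g)
        remaining.remove(g)
        Y.append(weights[chosen].sum() + (k - i) * weights[remaining].mean())
    return Y                                # Y_k = f(gamma_1, ..., gamma_k)

paths = np.array([doob_path(weights, k, rng) for _ in range(50_000)])
print("Y_0              :", round(float(paths[0, 0]), 4))        # same in every run
print("mean of Y_k      :", round(float(paths[:, -1].mean()), 4))  # ~ Y_0, since E[Y_k] = Y_0
print("std of Y_k - Y_0 :", round(float((paths[:, -1] - paths[:, 0]).std()), 4))
```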
55
Our Doob martingale. Reveal one edge of the tree at a time; let 𝛾_𝑖 denote the index of the 𝑖th edge of the tree. Pick the random tree as 𝑇 = {𝛾_1, 𝛾_2, …, 𝛾_{𝑛−1}}, so 𝑳_𝑇 = 𝑓(𝛾_1, 𝛾_2, …, 𝛾_{𝑛−1}). 𝒀_0 = 𝔼[𝑳_𝑇], 𝒀_1 = 𝔼[𝑳_𝑇 | 𝛾_1], ⋮, 𝒀_{𝑛−1} = 𝔼[𝑳_𝑇 | 𝛾_1, 𝛾_2, …, 𝛾_{𝑛−1}] = 𝑳_𝑇. 𝔼[𝒀_𝑖 − 𝒀_{𝑖−1} | prev. steps] = 𝟎.
56
Our Doob martingale. Want to show 𝒀_{𝑛−1} = 𝑳_𝑇 is close to 𝒀_0 = 𝔼[𝑳_𝑇]; 𝒀_{𝑛−1} − 𝒀_0 = ∑_𝑖 (𝒀_𝑖 − 𝒀_{𝑖−1}). Matrix martingale concentration? Matrix Freedman (Tropp ‘11): norm ‖𝒀_𝑖 − 𝒀_{𝑖−1}‖ ≤ 1 and variance ‖∑_𝑖 𝔼[(𝒀_𝑖 − 𝒀_{𝑖−1})² | prev. steps]‖ ≤ 𝑂(log 𝑛) implies w.h.p. 𝑳_𝑇 ≼ 𝑂(log 𝑛) 𝑳_𝐺.
57
Our Doob martingale. Want to show 𝒀_{𝑛−1} = 𝑳_𝑇 is close to 𝒀_0 = 𝔼[𝑳_𝑇]; 𝒀_{𝑛−1} − 𝒀_0 = ∑_𝑖 (𝒀_𝑖 − 𝒀_{𝑖−1}). Matrix martingale concentration? Matrix Freedman (Tropp ‘11): norm ‖𝒀_𝑖 − 𝒀_{𝑖−1}‖ ≤ 1 and variance ‖∑_𝑖 𝔼[(𝒀_𝑖 − 𝒀_{𝑖−1})² | prev. steps]‖ ≤ 𝑂(log 𝑛) implies w.h.p. 𝑳_𝑇 ≼ 𝑂(log 𝑛) 𝑳_𝐺. The difficult part: how can we understand 𝒀_42 = 𝔼[𝑳_𝑇 | 𝛾_1, …, 𝛾_42]?
58
How does conditioning change the distribution?
Graph and its tree distribution: all trees vs. conditional. Pick a random tree, conditional on the red edge being present?
59
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional.
60
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional.
61
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional. Coupling: pick a pair (𝑇, 𝑇′) with these marginals.
62
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional. Coupling: pick a pair (𝑇, 𝑇′) with these marginals.
63
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional. Coupling: pick a pair (𝑇, 𝑇′) with these marginals.
64
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional. Coupling: pick a pair (𝑇, 𝑇′) with these marginals. Figure: 𝑇, 𝑇′, and their difference 𝑇 − 𝑇′.
65
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional. Coupling: pick a pair (𝑇, 𝑇′) with these marginals. Figure: 𝑇, 𝑇′, and their difference 𝑇 − 𝑇′.
66
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional. Coupling: pick a pair (𝑇, 𝑇′) with these marginals. Figure: 𝑇, 𝑇′, and their difference 𝑇 − 𝑇′.
67
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional. Coupling: pick a pair (𝑇, 𝑇′) with these marginals. Figure: 𝑇, 𝑇′, and their difference 𝑇 − 𝑇′.
68
How does conditioning change the distribution?
How similar are the distributions? Tree distribution: all trees vs. conditional. Coupling: pick a pair (𝑇, 𝑇′) with these marginals. Figure: 𝑇, 𝑇′, and their difference 𝑇 − 𝑇′.
69
Coupling table – difference ≤2
70
Good couplings. Stochastic covering property: for 𝑘-homogeneous strongly Rayleigh distributions, a coupling of 𝑇 and 𝑇′ with difference ≤ 2 always exists. [Borcea-Branden-Liggett ’09] [Peres-Pemantle ‘14]
71
Coupling table has more structure
Alice, Bob, and Charlie want to form a tree by each selecting one edge: 𝛾_1, 𝛾_2, 𝛾_3.
72
Coupling table has more structure
Alice picks 𝛾_1. How much does this restrict Bob and Charlie?
73
Coupling table has more structure
Alice picks 𝛾_1. How much does this restrict Bob and Charlie? After Bob and Charlie choose their edges, they enlist Anna to pick an extra “make-up” edge 𝛾̃_1. Anna can choose that edge so that Anna, Bob, and Charlie obtain the original distribution.
74
Coupling table has more structure
Recover the original distribution by adding a “make-up edge” 𝛾̃_1 to the conditional distribution (given 𝛾_1).
75
Coupling table has more structure
Recover the original distribution by adding ≤ 1 “make-up edge” 𝛾̃_1 to the conditional distribution (given 𝛾_1).
76
Coupling table has more structure
Recover the original distribution by adding ≤ 1 “make-up edge” 𝛾̃_1 to the conditional distribution (given 𝛾_1).
77
Coupling table has more structure
Recover the original distribution by adding ≤ 1 “make-up edge” 𝛾̃_1 to the conditional distribution (given 𝛾_1).
78
Coupling table has more structure
Alice, Bob, Charlie ~ conditional distribution (given 𝛾_1); Anna, Bob, Charlie ~ original distribution, with 𝛾̃_1 Anna’s “make-up edge” (!). 𝒀_1 − 𝒀_0 = 𝔼[𝑳_𝑇 | 𝛾_1] − 𝔼[𝑳_𝑇] = 𝔼[Alice + Bob + Charlie | Alice] − 𝔼[Anna + Bob + Charlie | Alice] = Alice − 𝔼[Anna | Alice] = 𝑳_{𝛾_1} − 𝔼[𝑳_{𝛾̃_1} | 𝛾_1].
79
Coupling table has more structure
Alice, Bob, Charlie ~ conditional distribution (given 𝛾_1); Anna, Bob, Charlie ~ original distribution, with 𝛾̃_1 Anna’s “make-up edge” (!). 𝒀_1 − 𝒀_0 = 𝔼[𝑳_𝑇 | 𝛾_1] − 𝔼[𝑳_𝑇] = 𝑳_{𝛾_1} − 𝔼[𝑳_{𝛾̃_1} | 𝛾_1]. Hence ‖𝒀_1 − 𝒀_0‖ ≤ max(‖𝑳_{𝛾_1}‖, ‖𝔼[𝑳_{𝛾̃_1} | 𝛾_1]‖) ≤ max(‖𝑳_{𝛾_1}‖, 𝔼[‖𝑳_{𝛾̃_1}‖ | 𝛾_1]) ≤ max_𝑒 ‖𝑳_𝑒‖ ≤ 1.
80
Coupling table has more structure
Alice, Bob, Charlie ~ conditional distribution (given 𝛾_1); Anna, Bob, Charlie ~ original distribution, with 𝛾̃_1 Anna’s “make-up edge” (!). 𝔼[𝒀_1 − 𝒀_0] = 𝔼[𝑳_{𝛾_1} − 𝔼[𝑳_{𝛾̃_1} | 𝛾_1]] = 𝟎, so 𝔼[𝑳_{𝛾_1}] = 𝔼[𝔼[𝑳_{𝛾̃_1} | 𝛾_1]]: Alice’s edge 𝛾_1 and Anna’s make-up edge 𝛾̃_1 have the same expectation. Important symmetry.
81
Coupling table has more structure
Alice, Bob, Charlie ~ conditional distribution (given 𝛾_1); Anna, Bob, Charlie ~ original distribution, with 𝛾̃_1 Anna’s “make-up edge” (!). 𝔼[(𝒀_1 − 𝒀_0)²] = 𝔼[(𝑳_{𝛾_1} − 𝔼[𝑳_{𝛾̃_1} | 𝛾_1])²] ≼ 𝔼[𝑳_{𝛾_1}² + (𝔼[𝑳_{𝛾̃_1} | 𝛾_1])²] ≼ 𝔼[𝑳_{𝛾_1} + 𝔼[𝑳_{𝛾̃_1} | 𝛾_1]] ≼ 2𝔼[𝑳_{𝛾_1}].
82
Coupling table has more structure
𝔼[(𝒀_1 − 𝒀_0)²] ≼ 2𝔼[𝑳_{𝛾_1}], and 𝔼[𝑳_{𝛾_1}] = (1/(𝑛−1)) 𝔼[𝑳_𝑇] = (1/(𝑛−1)) 𝑳_𝐺. So 𝔼[(𝒀_1 − 𝒀_0)²] ≼ (2/(𝑛−1)) 𝑳_𝐺.
83
Later steps. What about 𝒀_𝑡 − 𝒀_{𝑡−1}? Boils down to bounding 𝔼[𝑳_{𝛾_𝑡} | 𝛾_1, 𝛾_2, …, 𝛾_{𝑡−1}].
84
How does conditioning change the distribution?
Graph and its tree distribution: all trees vs. conditional. Pick a random tree, conditional on the red edge being present?
85
How does conditioning change the distribution?
Graph and its tree distribution: all trees vs. conditional. “Shrinking Marginals Lemma”: all other edges become less likely.
86
How does conditioning change the distribution?
Graph and its tree distribution: all trees vs. conditional. “Shrinking Marginals Lemma”: all other edges become less likely. Example marginal: 5/8 = 0.625 unconditionally vs. 0.6 after conditioning.
87
How does conditioning change the distribution?
Graph and its tree distribution: all trees vs. conditional. “Shrinking Marginals Lemma”: all other edges become less likely. Example marginal: 4/8 = 0.5 unconditionally vs. 0.4 after conditioning.
88
Later steps: 𝔼[remaining edges | 𝛾_1, 𝛾_2, …, 𝛾_{𝑖−1}] ≼ 𝔼[𝑳_𝑇] = 𝑳_𝐺.
Shrinking marginals: 𝔼[𝑳_{𝛾_𝑖} | 𝛾_1, 𝛾_2, …, 𝛾_{𝑖−1}] ≼ (1/(𝑛−𝑖)) 𝑳_𝐺, so ∑_𝑖 𝔼[(𝒀_𝑖 − 𝒀_{𝑖−1})² | prev. steps] ≼ ∑_𝑖 (2/(𝑛−𝑖)) 𝑳_𝐺 = 𝑂(log 𝑛) 𝑳_𝐺.
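To see where the 𝑂(log 𝑛) comes from, the per-step bounds telescope into a harmonic sum; a short worked computation under the shrinking-marginals bound stated above:

```latex
\sum_{i=1}^{n-1} \mathbb{E}\!\left[(\boldsymbol{Y}_i - \boldsymbol{Y}_{i-1})^2 \mid \text{prev. steps}\right]
\preceq \sum_{i=1}^{n-1} \frac{2}{n-i}\,\boldsymbol{L}_G
= 2\Big(\sum_{j=1}^{n-1} \frac{1}{j}\Big)\boldsymbol{L}_G
= 2\,H_{n-1}\,\boldsymbol{L}_G
\preceq 2(\ln n + 1)\,\boldsymbol{L}_G
= O(\log n)\,\boldsymbol{L}_G .
```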
89
Our Doob martingale. Want to show 𝒀_{𝑛−1} = 𝑳_𝑇 is close to 𝒀_0 = 𝔼[𝑳_𝑇]; 𝒀_{𝑛−1} − 𝒀_0 = ∑_𝑖 (𝒀_𝑖 − 𝒀_{𝑖−1}). Matrix Freedman (Tropp ‘11): norm ‖𝒀_𝑖 − 𝒀_{𝑖−1}‖ ≤ 1 and variance ‖∑_𝑖 𝔼[(𝒀_𝑖 − 𝒀_{𝑖−1})² | prev. steps]‖ ≤ 𝑂(log 𝑛) implies 𝑳_𝑇 ≼ 𝑂(log 𝑛) 𝑳_𝐺 w.h.p.
90
Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed 𝑨_𝑖 ∈ ℝ^{𝑑×𝑑}, positive semi-definite; 𝜉 ∈ {0,1}^𝑚 is 𝑘-homogeneous strongly Rayleigh; random 𝑿 = ∑_𝑖 𝜉(𝑖) 𝑨_𝑖. ‖𝔼𝑿‖ = 𝜇 and ‖𝑨_𝑖‖ ≤ 𝑟 gives ℙ[‖𝑿 − 𝔼𝑿‖ > 𝜇𝜀] ≤ 𝑑 · 2 exp(−𝜇𝜀² / (𝑟(log 𝑘 + 𝜀))).
91
Open Questions. Our bound for 𝑘-homogeneous strongly Rayleigh: ℙ[‖𝑿 − 𝔼𝑿‖ > 𝜇𝜀] ≤ 𝑑 · 2 exp(−𝜇𝜀² / (𝑟(log 𝑘 + 𝜀))). Remove the log 𝑘? Remove the homogeneity condition? Find more applications. Show a log 𝑛 sparsifier from 𝑂(1) spanning trees?
92
Thanks!