A matrix Chernoff bound for strongly Rayleigh distributions and spectral sparsifiers from a few random spanning trees. Speaker: Rasmus Kyng, Harvard University.


1 A matrix Chernoff bound for strongly Rayleigh distributions and spectral sparsifiers from a few random spanning trees. Speaker: Rasmus Kyng, Harvard University. Joint work with Zhao Song, UT Austin. October 2018.

2 Concentration of scalar random variables
Independent random X_i ∈ ℝ, X = Σ_i X_i. Is X ≈ 𝔼[X] with high probability?

3 Concentration of scalar random variables
Chernoff inequality. Independent random X_i ∈ ℝ_{≥0}, X = Σ_i X_i, 𝔼[X] = μ, X_i ≤ r. Then ℙ[|X − μ| > εμ] ≤ 2 exp(−με² / (r(2 + ε))) ≤ τ, e.g. if ε = 0.5, r = 1 and μ = 10 log(1/τ).
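As a quick illustration (ours, not from the talk), the sketch below simulates sums of independent bounded variables and compares the empirical tail with the Chernoff expression above; the parameters n, p and the trial count are arbitrary choices.

```python
# Minimal sanity check of the scalar Chernoff bound (illustration only):
# X_i ~ Bernoulli(p) are independent with 0 <= X_i <= r = 1.
import numpy as np

rng = np.random.default_rng(0)
n, p, eps, r = 200, 0.1, 0.5, 1.0
mu = n * p                                  # E[X] = 20
trials = 100_000

X = rng.binomial(1, p, size=(trials, n)).sum(axis=1)
empirical = np.mean(np.abs(X - mu) > eps * mu)
bound = 2 * np.exp(-mu * eps**2 / (r * (2 + eps)))
print(f"empirical tail {empirical:.4f}  <=  Chernoff bound {bound:.4f}")
```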

4 Concentration of random matrices
Independent random X_i ∈ ℝ^{d×d}, X = Σ_i X_i. Is X ≈ 𝔼[X] with high probability?

5 Concentration of random matrices
Matrix Chernoff [Tropp ’11]. Independent random X_i ∈ ℝ^{d×d}, positive semi-definite, X = Σ_i X_i, ‖𝔼[X]‖ = μ, ‖X_i‖ ≤ r. Then ℙ[‖X − 𝔼[X]‖ > εμ] ≤ d · 2 exp(−με² / (r(2 + ε))) ≤ τ, e.g. if ε = 0.5, r = 1 and μ = 10 log(d/τ). [Rudelson ‘99, Ahlswede-Winter ’02]
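A small numerical sketch (ours, with arbitrary dimensions) of this setting: sum independent rank-one PSD matrices with ‖X_i‖ = 1 and observe that the spectral-norm deviation from the expectation is well below μ.

```python
# Illustration only: X_i = v_i v_i^T for random unit vectors v_i, so ||X_i|| = 1,
# X = sum_i X_i, and we measure ||X - E[X]|| in spectral norm.
import numpy as np

rng = np.random.default_rng(1)
d, n, trials = 10, 400, 200
EX = (n / d) * np.eye(d)                    # E[v v^T] = I/d for a uniform unit vector
devs = []
for _ in range(trials):
    V = rng.normal(size=(n, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    X = V.T @ V                             # X = sum_i v_i v_i^T
    devs.append(np.linalg.norm(X - EX, 2))
print("typical ||X - EX||:", round(float(np.median(devs)), 2), " vs  mu =", n / d)
```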

6 What if variables are not independent?
X ∈ {0,1} random variable. If Y = X, then X + Y is not concentrated: it is 0 or 2. If Z = 1 − X, then X + Z is very concentrated: always 1. Negative dependence: X makes Z less likely and vice versa.

7 What if variables are not independent?
ξ ∈ {0,1}^m random variable. Negative pairwise correlation: for all pairs i ≠ j, ξ(i) = 1 ⇒ lower probability of ξ(j) = 1. Formally, ℙ[ξ(j) = 1 | ξ(i) = 1] ≤ ℙ[ξ(j) = 1].

8 What if variables are not independent?
ξ ∈ {0,1}^m random variable. Negative correlation: for all S ⊆ [m], ℙ[∀i ∈ S: ξ(i) = 1] ≤ Π_{i∈S} ℙ[ξ(i) = 1]. Can we get a Chernoff bound? Yes. If ξ AND its bitwise negation ξ̄ are both negatively correlated, Chernoff-like concentration applies to Σ_i ξ(i) [Goyal-Rademacher-Vempala ‘09, Dubhashi-Ranjan ’98].
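For concreteness (our toy example, not from the talk): sampling k of m items without replacement is negatively correlated, and a quick simulation confirms ℙ[ξ(i) = ξ(j) = 1] ≤ ℙ[ξ(i) = 1] · ℙ[ξ(j) = 1].

```python
# Illustration only: uniform samples of k out of m items without replacement.
# Exact values: P[items 0 and 1 both picked] = k(k-1)/(m(m-1)) = 0.1333...,
# while P[0 picked] * P[1 picked] = (k/m)^2 = 0.16.
import numpy as np

rng = np.random.default_rng(2)
m, k, trials = 10, 4, 50_000
picks = np.zeros((trials, m), dtype=bool)
for t in range(trials):
    picks[t, rng.choice(m, size=k, replace=False)] = True
p_i, p_j = picks[:, 0].mean(), picks[:, 1].mean()
p_both = (picks[:, 0] & picks[:, 1]).mean()
print(f"{p_both:.3f} <= {p_i * p_j:.3f}")
```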

9 Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]. ξ ∈ {0,1}^m random variable. Many nice properties: negative pairwise correlation, negative association, closed under conditioning and marginalization.

10 Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]. ξ ∈ {0,1}^m random variable. Examples: uniformly sampling k items without replacement, random spanning trees, determinantal point processes and volume sampling, symmetric exclusion processes.

11 Strongly Rayleigh distributions
A class of negatively dependent distributions [Borcea-Branden-Liggett ’09]. ξ ∈ {0,1}^m random variable. k-homogeneous strongly Rayleigh: |{i : ξ(i) = 1}| = k always.

12 Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed A_i ∈ ℝ^{d×d}, positive semi-definite; ξ ∈ {0,1}^m is k-homogeneous strongly Rayleigh; random X = Σ_i ξ(i) A_i; ‖𝔼[X]‖ = μ; ‖A_i‖ ≤ r. Then ℙ[‖X − 𝔼[X]‖ > με] ≤ d · 2 exp(−με² / (r(log k + ε))) ≤ τ, e.g. if ε = 0.5, r = 1 and μ = 10 log(d/τ) log(k). Scalar version: Peres-Pemantle ‘14.

13 Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed A_i ∈ ℝ^{d×d}, positive semi-definite; ξ ∈ {0,1}^m is k-homogeneous strongly Rayleigh; random X = Σ_i ξ(i) A_i; ‖𝔼[X]‖ = μ; ‖A_i‖ ≤ r. Then ℙ[‖X − 𝔼[X]‖ > με] ≤ d · 2 exp(−με² / (r(log k + ε))) ≤ τ, e.g. if ε = log(k), r = 1 and μ = 10 log(d/τ). Scalar version: Peres-Pemantle ‘14.
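A hedged empirical sketch of the theorem's setting (ours; parameters arbitrary): ξ is k-homogeneous strongly Rayleigh, here simply uniform sampling of k of m indices without replacement, and the A_i are fixed rank-one PSD matrices with ‖A_i‖ = 1.

```python
# Illustration only: X = sum_i xi(i) A_i where xi is a uniform k-subset of [m]
# (a k-homogeneous strongly Rayleigh distribution) and A_i = v_i v_i^T.
import numpy as np

rng = np.random.default_rng(3)
d, m, k = 8, 400, 200
V = rng.normal(size=(m, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
A = np.einsum('mi,mj->mij', V, V)                 # A_i = v_i v_i^T, ||A_i|| = 1
EX = (k / m) * A.sum(axis=0)                      # E[xi(i)] = k/m for every i

devs = []
for _ in range(200):
    S = rng.choice(m, size=k, replace=False)
    devs.append(np.linalg.norm(A[S].sum(axis=0) - EX, 2))
print("typical ||X - EX||:", round(float(np.median(devs)), 2),
      " vs  mu =", round(float(np.linalg.norm(EX, 2)), 2))
```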

14 An application: Graph approximation using random spanning trees

15 Spanning trees of a graph
Edge weights w : E → ℝ₊, n = |V|. Spanning trees of G.

16 Random spanning trees Graph 𝐺 Tree distribution Pick a random tree?

17 Random spanning trees Does the sum of a few random spanning trees resemble the graph? E.g. is the weight across each cut similar? Starter question: Are the edge weights similar in expectation?

18 Random spanning trees Graph 𝐺 Tree distribution Pick a random tree

19 Random spanning trees. Graph G, tree distribution; pick a random tree. p_e: probability that edge e is present; for the edge shown, p_e = 1/2.

20 Random spanning trees. Graph G, tree distribution; pick a random tree. p_e: probability that edge e is present; for the edge shown, p_e = 5/8.
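To make the marginals p_e concrete, here is a minimal sketch (ours, not the talk's figure) that samples uniform spanning trees with Wilson's loop-erased random walk algorithm and estimates how often each edge appears. On a 4-cycle with one diagonal the estimates come out near 5/8 for the cycle edges and 1/2 for the diagonal, which happens to match the values quoted on these slides; the talk's actual figure may use a different graph.

```python
# Illustration only: estimate edge marginals p_e of a uniform random spanning
# tree via Wilson's algorithm (loop-erased random walks).
import random
from collections import Counter

def wilson_ust(adj, root=0):
    """Edge set of a uniform random spanning tree of an unweighted graph."""
    n = len(adj)
    in_tree = [False] * n
    nxt = [None] * n
    in_tree[root] = True
    for start in range(n):
        u = start
        while not in_tree[u]:            # random walk, overwriting loops
            nxt[u] = random.choice(adj[u])
            u = nxt[u]
        u = start
        while not in_tree[u]:            # add the loop-erased path to the tree
            in_tree[u] = True
            u = nxt[u]
    return {frozenset((v, nxt[v])) for v in range(n) if v != root}

adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}   # 4-cycle + diagonal (0,2)
T, counts = 20_000, Counter()
for _ in range(T):
    counts.update(wilson_ust(adj))
for e, c in sorted(counts.items(), key=lambda kv: tuple(sorted(kv[0]))):
    print(tuple(sorted(e)), round(c / T, 3))   # ~0.625 for cycle edges, ~0.5 for (0, 2)
```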

21 Random spanning trees. Getting the expectation right: set w_T(e) = (1/p_e) · w_G(e) with probability p_e (edge in the tree) and 0 otherwise. Then 𝔼[w_T(e)] = p_e · (1/p_e) · w_G(e) = w_G(e).

22 Reweighted random spanning trees
Original weights vs. tree weights. [Figure: original weights (all 1) on the left; on the right, a sampled tree whose edges get weights 1/p_e, here 2 and 8/5.]

23 Reweighted random spanning trees
Original weights vs. tree weights. [Figure: original weights (all 1); a different sampled tree whose edges each get weight 1/p_e = 8/5.]

24 Reweighted random spanning trees
Original weights vs. tree weights. [Figure: averaging the reweighted tree weights (1/p_e with probability p_e) over the random tree gives back weight 1 on every edge.]

25 Reweighted random spanning trees
Original weights vs. tree weights. [Figure: averaging the reweighted tree weights over the random tree gives back weight 1 on every edge.] The average weight over trees equals the original weight. Does the tree "behave like" the original graph?

26 Preserving cuts? Given a cut S ⊆ V, w_G(S, S̄) = Σ_{(a,b) ∈ E ∩ (S × S̄)} w_ab. Want, for all S ⊆ V, w_T(S, S̄) ≈ w_G(S, S̄) with high probability? Too much to ask of one tree! How many edges are necessary?

27 Independent edge samples
𝐺 graph Flip a coin for each edge to decide if present 𝐻 random graph, independent edges

28 Independent edge samples
𝐺 graph Flip a coin for each edge to decide if present 𝐻 random graph, independent edges

29 Independent edge samples
Getting the expectation right: set w_H(e) = (1/p_e) · w_G(e) with probability p_e and 0 otherwise. Then 𝔼[w_H(e)] = p_e · (1/p_e) · w_G(e) = w_G(e).

30 Preserving cuts? Benczur-Karger ‘96. Sample edges independently with "well-chosen" coin probabilities p_e, s.t. H has on average O(ε⁻² n log n) edges. Then w.h.p. for all cuts S ⊆ V, (1 − ε) w_G(S, S̄) ≤ w_H(S, S̄) ≤ (1 + ε) w_G(S, S̄). Proof sketch: count the number of cuts of each size; apply a scalar Chernoff concentration bound per cut.

31 Reweighted random tree
Original weights vs. tree weights. [Figure: averaging the reweighted tree weights over the random tree gives back weight 1 on every edge.] The average weight over trees equals the original weight. Does the tree "behave like" the original graph?

32 Combining trees
Maybe it's better if we average a few trees? [Figure: averaging two reweighted trees; an edge present in both trees keeps its weight (e.g. 8/5), an edge present in exactly one gets half its tree weight (e.g. 4/5 or 1), and an edge in neither gets 0.]

33 Preserving cuts? Fung-Harvey & Hariharan-Panigrahi ‘10. Let H = (1/t) Σ_{i=1}^t T_i be the average of t = O(ε⁻² log² n) reweighted random spanning trees of G. Then w.h.p. for all cuts S ⊆ V, (1 − ε) w_G(S, S̄) ≤ w_H(S, S̄) ≤ (1 + ε) w_G(S, S̄). Proof sketch: Benczur-Karger cut counting; scalar Chernoff works for negatively correlated variables.

34 Preserving cuts? Goyal-Rademacher-Vempala ‘09. Given an unweighted bounded-degree graph G, let H = (1/t) Σ_{i=1}^t T_i be the average of t = O(1) unweighted random spanning trees of G. Then w.h.p. for all cuts S ⊆ V, Ω(1/log n) · w_G(S, S̄) ≤ w_H(S, S̄) ≤ w_G(S, S̄). Proof sketch: Benczur-Karger cut counting, plus the first tree gets the small cuts; scalar Chernoff works for negatively correlated variables.

35 Preserving more than cuts: Matrices and quadratic forms

36 Laplacians: It’s springs!
Weighted, undirected graph G = (V, E, w), w : E → ℝ₊. The Laplacian L is a |V| × |V| matrix describing G. On each edge (a, b), put a spring between the vertices. Nail down each vertex a at position x(a) along the real line. [Figure: positions x(a), x(b), x(c) on the line.]

37 Laplacians: It’s springs!
Length of the spring on (a, b): |x(a) − x(b)|. Energy = spring constant · length² = w_ab (x(a) − x(b))². Total energy: xᵀLx = Σ_{(a,b)∈E} w_ab (x(a) − x(b))².

38 Laplacians
xᵀLx = Σ_{(a,b)∈E} w_ab (x(a) − x(b))² = Σ_{(a,b)∈E} xᵀ L_(a,b) x, so L = Σ_{(a,b)∈E} L_(a,b): one "baby Laplacian" L_(a,b) per edge.
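A minimal sketch (ours) of this identity: build L as a sum of per-edge "baby Laplacians" and check xᵀLx against the weighted sum of squared differences; the toy triangle and its weights are arbitrary.

```python
# Illustration only: L = sum over edges of w_ab (e_a - e_b)(e_a - e_b)^T, and
# x^T L x = sum over edges of w_ab (x_a - x_b)^2.
import numpy as np

def edge_laplacian(n, a, b, w):
    chi = np.zeros(n)
    chi[a], chi[b] = 1.0, -1.0
    return w * np.outer(chi, chi)          # the "baby Laplacian" of edge (a, b)

edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]
n = 3
L = sum(edge_laplacian(n, a, b, w) for a, b, w in edges)

x = np.array([0.3, -1.2, 2.0])
print(np.isclose(x @ L @ x, sum(w * (x[a] - x[b]) ** 2 for a, b, w in edges)))  # True
```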

39 Laplacian of a graph. [Figure: the Laplacian of a small example graph with edge weights 1, 1, 2.]

40 Preserving matrices? Suppose H is a random weighted graph s.t. for every edge e, 𝔼[w_H(e)] = w_G(e). Then 𝔼[L_H] = L_G. Does L_H "behave like" L_G?

41 Preserving quadratic forms?
For all x ∈ ℝ^V, (1 − ε) xᵀL_G x ≤ xᵀL_H x ≤ (1 + ε) xᵀL_G x with high probability? Useful? Since 1_Sᵀ L_G 1_S = w_G(S, S̄), taking x = 1_S shows that cuts are preserved. The quadratic form is also crucial for solving linear equations.
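A quick check (ours, toy graph) of the fact used here, 1_Sᵀ L_G 1_S = w_G(S, S̄): the quadratic form on the 0/1 indicator of S counts exactly the weight crossing the cut.

```python
# Illustration only: for x = 1_S (0/1 indicator of S), x^T L x equals the cut weight.
import numpy as np

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]
n = 4
L = np.zeros((n, n))
for a, b, w in edges:
    L[a, a] += w; L[b, b] += w
    L[a, b] -= w; L[b, a] -= w

S = [0, 1]
ind = np.zeros(n); ind[S] = 1.0
cut = sum(w for a, b, w in edges if (a in S) != (b in S))
print(np.isclose(ind @ L @ ind, cut))   # True
```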

42 Preserving quadratic forms?
Spielman-Srivastava ’08 (à la Tropp). Sample edges independently with "well-chosen" coin probabilities p_e, s.t. H has on average O(ε⁻² n log n) edges. Then w.h.p. for all x ∈ ℝ^V, (1 − ε) xᵀL_G x ≤ xᵀL_H x ≤ (1 + ε) xᵀL_G x. Proof sketch: bound the spectral norm of the sampled edge "baby Laplacians"; matrix Chernoff concentration.

43 What sampling probabilities?
Spielman-Srivastava ’08: "well-chosen" coin probabilities p_e ∝ max_x (xᵀL_e x)/(xᵀLx). What is the marginal probability of an edge being present in a random spanning tree? Also proportional to max_x (xᵀL_e x)/(xᵀLx) (!). Random spanning trees similar to sparsification?
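To make this concrete (our sketch, not from the talk): for L_e = w_e b_e b_eᵀ, the quantity max_x (xᵀL_e x)/(xᵀLx) equals w_e · b_eᵀ L⁺ b_e, the leverage score of e, computable from the Laplacian pseudoinverse. On the unweighted 4-cycle-plus-diagonal toy graph these come out to 5/8 and 1/2, coinciding with the spanning-tree marginals p_e seen earlier, and they always sum to n − 1.

```python
# Illustration only: leverage scores w_e * b_e^T L^+ b_e via the pseudoinverse.
import numpy as np

edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]
n = 4
L = np.zeros((n, n))
for a, b, w in edges:
    L[a, a] += w; L[b, b] += w
    L[a, b] -= w; L[b, a] -= w
Lpinv = np.linalg.pinv(L)

total = 0.0
for a, b, w in edges:
    chi = np.zeros(n); chi[a], chi[b] = 1.0, -1.0
    lev = w * chi @ Lpinv @ chi            # = max_x (x^T L_e x) / (x^T L x)
    total += lev
    print((a, b), round(lev, 3))           # 0.625 for cycle edges, 0.5 for (0, 2)
print("sum of leverage scores:", round(total, 3))   # n - 1 = 3
```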

44 Preserving quadratic forms?
K.-Song ’18. Let H = (1/t) Σ_{i=1}^t T_i be the average of t = O(ε⁻² log² n) reweighted random spanning trees of G. Then w.h.p. for all x ∈ ℝ^V, (1 − ε) xᵀL_G x ≤ xᵀL_H x ≤ (1 + ε) xᵀL_G x. Proof sketch: bound the norms of the sampled matrices (immediate via SS’08); strongly Rayleigh matrix Chernoff concentration.

45 Random spanning trees. xᵀ((1/t) Σ_{i=1}^t L_{T_i})x ≈_ε xᵀL_G x for t = O(ε⁻² log² n). Lower bound (K.-Song ’18): t = Ω(ε⁻² log n) trees are needed for an ε-spectral sparsifier. Open question: right number of logs? Guess: O(ε⁻² log n) trees.

46 Random spanning trees. More results (K.-Song ’18): xᵀL_T x ≤ O(log n) · xᵀL_G x for all x w.h.p.; ⇒ in ε-spectrally connected graphs, a random tree is O(ε log n)-spectrally thin. Lower bounds: in some graphs, with probability ≥ 1 − e^{−0.4n}, there exists x s.t. xᵀL_T x ≰ (1/8)(log n / log log n) · xᵀL_G x, and for some y, yᵀL_G y ≰ yᵀL_T y. In a ring graph, there exist x, y s.t. xᵀL_T x ≰ xᵀL_G x and (1/(n−2)) · yᵀL_G y ≰ yᵀL_T y.

47 Proving the strongly Rayleigh matrix Chernoff bound

48 An illustrative case: xᵀL_T x ≤ O(log n) · xᵀL_G x for all x, w.h.p.

49 Loewner order. A ≼ B iff xᵀAx ≤ xᵀBx for all x. So "xᵀL_T x ≤ O(log n) · xᵀL_G x for all x" can be written L_T ≼ O(log n) · L_G.
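A tiny numerical restatement (ours) of the definition: A ≼ B exactly when B − A is positive semi-definite, i.e. its smallest eigenvalue is nonnegative.

```python
# Illustration only: check A ≼ B by testing that B - A has no negative eigenvalue.
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 0.5]])
B = np.array([[2.0, 0.3], [0.3, 1.0]])
print(np.linalg.eigvalsh(B - A).min() >= -1e-12)   # True, so A ≼ B
x = np.array([0.7, -1.3])
print(x @ A @ x <= x @ B @ x)                      # consistent with the definition
```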

50 Proof strategy? Convert the problem to a Doob martingale; matrix martingale concentration; control the effect of conditioning via a coupling; norm bound from the coupling; variance bound from coupling symmetry + shrinking marginals.

51 What is a martingale? A sequence of random variables Y_0, …, Y_k s.t. 𝔼[Y_i | Y_0, …, Y_{i−1}] = Y_{i−1}. Many concentration bounds for independent random variables can be generalized to the martingale case, to show Y_k ≈ Y_0 w.h.p.

52 Concentration of martingales
Why do martingales exhibit concentration? Each difference is zero mean conditional on previous outcomes: 𝔼[Y_i − Y_{i−1} | Y_0, …, Y_{i−1}] = 0. If each difference Y_i − Y_{i−1} is small, then Y_k − Y_0 = Σ_i (Y_i − Y_{i−1}) ≈ 0.

53 Doob martingales
Random variables γ_1, …, γ_k, NOT independent. Goal: prove concentration for f(γ_1, …, γ_k), where f is "stable" under small changes to γ_1, …, γ_k: is f(γ_1, …, γ_k) ≈ 𝔼[f(γ_1, …, γ_k)]? We also need γ_1, …, γ_k to be stable under conditioning.

54 Doob martingales. Pick a random outcome γ_1, …, γ_k from the distribution. Y_0 = 𝔼[f(γ_1, …, γ_k)], Y_1 = 𝔼[f(γ_1, …, γ_k) | γ_1], Y_2 = 𝔼[f(γ_1, …, γ_k) | γ_1, γ_2], …, Y_k = 𝔼[f(γ_1, …, γ_k) | γ_1, γ_2, …, γ_k] = f(γ_1, …, γ_k). Note 𝔼[Y_1] = 𝔼_{γ_1} 𝔼[f(γ_1, …, γ_k) | γ_1] = 𝔼[f(γ_1, …, γ_k)] = Y_0, and 𝔼[Y_i − Y_{i−1} | prev. steps] = 0: a martingale, despite γ_1, …, γ_k NOT being independent. Show Y_k ≈ Y_0, i.e. f(γ_1, …, γ_k) ≈ 𝔼[f(γ_1, …, γ_k)].
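A toy sketch (ours, not the tree martingale used in the talk): reveal, one at a time, the k items of a uniform sample without replacement and track Y_i = 𝔼[sum of the sample | first i items]; here the conditional expectation has a closed form, and the increments Y_i − Y_{i−1} average to zero even though the items are not independent.

```python
# Illustration only: Doob martingale for f = sum of a uniform k-subset drawn
# without replacement. Y_i = sum of revealed items + (k - i) * mean of the rest.
import numpy as np

rng = np.random.default_rng(4)
values = np.array([1.0, 2.0, 5.0, 7.0, 11.0, 13.0])
m, k, trials = len(values), 3, 50_000

increments = np.zeros((trials, k))
for t in range(trials):
    order = rng.permutation(m)[:k]          # the sample, in the order revealed
    Y_prev = k * values.mean()              # Y_0 = E[f]
    chosen = []
    for i, idx in enumerate(order, start=1):
        chosen.append(idx)
        rest = np.delete(values, chosen)
        Y_i = values[chosen].sum() + (k - i) * rest.mean()
        increments[t, i - 1] = Y_i - Y_prev
        Y_prev = Y_i
print("mean increment per step:", increments.mean(axis=0).round(2))   # ~ [0, 0, 0]
```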

55 Our Doob martingale. Reveal one edge of the tree at a time. Let γ_i denote the index of the i-th edge of the tree, and pick the random tree as T = {γ_1, γ_2, …, γ_{n−1}}, so L_T = f(γ_1, γ_2, …, γ_{n−1}). Y_0 = 𝔼[L_T], Y_1 = 𝔼[L_T | γ_1], …, Y_{n−1} = 𝔼[L_T | γ_1, γ_2, …, γ_{n−1}] = L_T. 𝔼[Y_i − Y_{i−1} | prev. steps] = 0.

56 Our Doob martingale. Want to show Y_{n−1} = L_T is close to Y_0 = 𝔼[L_T]; Y_{n−1} − Y_0 = Σ_i (Y_i − Y_{i−1}). Matrix martingale concentration? Matrix Freedman (Tropp ‘11): a norm bound ‖Y_i − Y_{i−1}‖ ≤ 1 and a variance bound Σ_i 𝔼[(Y_i − Y_{i−1})² | prev. steps] ≤ O(log n) imply L_T ≼ O(log n) · L_G w.h.p.

57 Our Doob martingale. Want to show Y_{n−1} = L_T is close to Y_0 = 𝔼[L_T]; Y_{n−1} − Y_0 = Σ_i (Y_i − Y_{i−1}). Matrix Freedman (Tropp ‘11): a norm bound ‖Y_i − Y_{i−1}‖ ≤ 1 and a variance bound Σ_i 𝔼[(Y_i − Y_{i−1})² | prev. steps] ≤ O(log n) imply L_T ≼ O(log n) · L_G w.h.p. The difficult part: how can we understand, say, Y_42 = 𝔼[L_T | γ_1, …, γ_42]?

58 How does conditioning change the distribution?
Graph Tree distribution All Conditional Pick a random tree, conditional on red edge present?

59 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Conditional

60 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Conditional

61 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′

62 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′

63 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′

64 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

65 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

66 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

67 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

68 How does conditioning change the distribution?
How similar are the distributions? Tree distribution All Cond. Coupling Pick pair 𝑇, 𝑇 ′ with marginals as above 𝑇 𝑇′ =

69 Coupling table – difference ≤2

70 Good couplings. Stochastic covering property: for k-homogeneous strongly Rayleigh distributions, a coupling of T, T′ with difference ≤ 2 always exists. [Borcea-Branden-Liggett ’09] [Peres-Pemantle ‘14]

71 Coupling table has more structure
Alice, Bob, and Charlie want to form a tree by each selecting one edge: γ_1, γ_2, γ_3.

72 Coupling table has more structure
Alice picks γ_1. How much does this restrict Bob and Charlie?

73 Coupling table has more structure
Alice picks γ_1. How much does this restrict Bob and Charlie? After Bob and Charlie choose their edges, they enlist Anna to pick an extra "make-up edge" γ̃_1. Anna can choose that edge s.t. Anna, Bob, & Charlie obtain the original distribution.

74 Coupling table has more structure
Recover the original distribution by adding a "make-up edge" γ̃_1 to the conditional distribution.

75 Coupling table has more structure
Recover the original distribution by adding ≤ 1 "make-up edge" γ̃_1 to the conditional distribution.

76 Coupling table has more structure
Recover the original distribution by adding ≤ 1 "make-up edge" γ̃_1 to the conditional distribution.

77 Coupling table has more structure
Recover the original distribution by adding ≤ 1 "make-up edge" γ̃_1 to the conditional distribution.

78 Coupling table has more structure
Alice, Bob, Charlie: conditional distribution; Anna (the "make-up edge" γ̃_1), Bob, Charlie: original distribution (!). So Y_1 − Y_0 = 𝔼[L_T | γ_1] − 𝔼[L_T] = 𝔼[Alice + Bob + Charlie | Alice] − 𝔼[Anna + Bob + Charlie | Alice] = Alice − 𝔼[Anna | Alice] = L_{γ_1} − 𝔼[L_{γ̃_1} | γ_1].

79 Coupling table has more structure
Alice, Bob, Charlie: conditional distribution; Anna (the "make-up edge" γ̃_1), Bob, Charlie: original distribution (!). Y_1 − Y_0 = 𝔼[L_T | γ_1] − 𝔼[L_T] = L_{γ_1} − 𝔼[L_{γ̃_1} | γ_1]. Norm bound: ‖Y_1 − Y_0‖ ≤ max(‖L_{γ_1}‖, ‖𝔼[L_{γ̃_1} | γ_1]‖) ≤ max(‖L_{γ_1}‖, 𝔼[‖L_{γ̃_1}‖ | γ_1]) ≤ max_e ‖L_e‖ ≤ 1.

80 Coupling table has more structure
Alice, Bob, Charlie: conditional distribution; Anna (the "make-up edge" γ̃_1), Bob, Charlie: original distribution (!). Since 𝔼[Y_1 − Y_0] = 0, we get 𝔼[L_{γ_1} − 𝔼[L_{γ̃_1} | γ_1]] = 0, i.e. 𝔼[L_{γ_1}] = 𝔼[𝔼[L_{γ̃_1} | γ_1]]: Alice (γ_1) and Anna (the "make-up edge" γ̃_1) have the same marginal. An important symmetry.

81 Coupling table has more structure
Alice, Bob, Charlie: conditional distribution; Anna (the "make-up edge" γ̃_1), Bob, Charlie: original distribution (!). Variance of the first step: 𝔼[(Y_1 − Y_0)²] = 𝔼[(L_{γ_1} − 𝔼[L_{γ̃_1} | γ_1])²] ≼ 𝔼[L_{γ_1} + 𝔼[L_{γ̃_1} | γ_1]] ≼ 2 𝔼[L_{γ_1}], using the symmetry from the previous slide.

82 Coupling table has more structure
𝔼[(Y_1 − Y_0)²] ≼ 2 𝔼[L_{γ_1}], and 𝔼[L_{γ_1}] = (1/(n−1)) 𝔼[L_T] = (1/(n−1)) L_G, since each of the n−1 tree edges is equally likely to be revealed first. So 𝔼[(Y_1 − Y_0)²] ≼ (2/(n−1)) L_G.

83 Later steps. What about Y_t − Y_{t−1}? Boils down to bounding 𝔼[L_{γ_t} | γ_1, γ_2, …, γ_{t−1}].

84 How does conditioning change the distribution?
Graph Tree distribution All Conditional Pick a random tree, conditional on red edge present?

85 How does conditioning change the distribution?
Graph Tree distribution All Conditional “Shrinking Marginals Lemma” All other edges become less likely

86 How does conditioning change the distribution?
Graph, tree distribution: all vs. conditional. "Shrinking Marginals Lemma": all other edges become less likely. Example: an edge's marginal drops from 5/8 = 0.625 (unconditional) to 0.6 (conditional).

87 How does conditioning change the distribution?
Graph, tree distribution: all vs. conditional. "Shrinking Marginals Lemma": all other edges become less likely. Example: another edge's marginal drops from 4/8 = 0.5 (unconditional) to 0.4 (conditional).

88 Later steps
Shrinking marginals: 𝔼[sum of the remaining edges' Laplacians | γ_1, γ_2, …, γ_{i−1}] ≼ 𝔼[L_T] = L_G, so 𝔼[L_{γ_i} | γ_1, γ_2, …, γ_{i−1}] ≼ (1/(n−i)) L_G. Hence Σ_i 𝔼[(Y_i − Y_{i−1})² | prev. steps] ≼ Σ_i (2/(n−i)) L_G = 2(1/1 + 1/2 + ⋯ + 1/(n−1)) L_G = O(log n) L_G.
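The O(log n) comes from a harmonic sum; a quick arithmetic check (ours):

```python
# Illustration only: sum_{i=1}^{n-1} 2/(n-i) = 2 * (1 + 1/2 + ... + 1/(n-1)) ~ 2 ln n.
import math

n = 1_000_000
s = sum(2.0 / (n - i) for i in range(1, n))
print(round(s, 2), "vs 2 ln n =", round(2 * math.log(n), 2))   # close, up to 2 * Euler's constant
```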

89 Our Doob martingale. Want to show Y_{n−1} = L_T is close to Y_0 = 𝔼[L_T]; Y_{n−1} − Y_0 = Σ_i (Y_i − Y_{i−1}). Matrix Freedman (Tropp ‘11): the norm bound ‖Y_i − Y_{i−1}‖ ≤ 1 and the variance bound Σ_i 𝔼[(Y_i − Y_{i−1})² | prev. steps] ≤ O(log n) imply L_T ≼ O(log n) · L_G w.h.p.

90 Concentration of random matrices
Strongly Rayleigh matrix Chernoff [K. & Song ’18]. Fixed A_i ∈ ℝ^{d×d}, positive semi-definite; ξ ∈ {0,1}^m is k-homogeneous strongly Rayleigh; random X = Σ_i ξ(i) A_i; ‖𝔼[X]‖ = μ; ‖A_i‖ ≤ r. Then ℙ[‖X − 𝔼[X]‖ > με] ≤ d · 2 exp(−με² / (r(log k + ε))).

91 Open Questions. Our bound for k-homogeneous strongly Rayleigh: ℙ[‖X − 𝔼[X]‖ > με] ≤ d · 2 exp(−με² / (r(log k + ε))). Remove the log k? Remove the homogeneity condition? Find more applications. Show a log n sparsifier from O(1) spanning trees?

92 Thanks!

