Mixing: A tutorial on Markov chains
Dana Randall, Georgia Tech (Slides at: www.math.gatech.edu/~randall)


Outline
- Fundamentals for designing a Markov chain
- Bounding running times (convergence rates)
- Connections to statistical physics

Markov chains for sampling
Given: A large set (matchings, colorings, independent sets, …)
Main Q: What do typical elements look like?
Random sampling ("Markov chain Monte Carlo") can be used to:
- Determine properties of "typical" elements
- Evaluate thermodynamic properties (such as free energy, entropy, …)
- Estimate the cardinality of the set

Markov chains: Andrei Andreyevich Markov

Sampling using Markov chains
State space $\Omega$ ($|\Omega| \sim c^n$)

Sampling using Markov chains
State space $\Omega$ ($|\Omega| \sim c^n$)
Step 1. Connect the state space.
E.g., if $\Omega$ = indep. sets of a graph G, connect $I$ and $I'$ iff $|I \oplus I'| = 1$ (they differ in exactly one vertex).

Basics of Markov chains
Transitions P: random walk on H ($\Delta$ = max deg in H). Starting at x:
- Pick a neighbor y.
- Move to y with prob. $P(x,y) = 1/\Delta$.
- With all remaining prob. stay at x.
Def'n: A MC is ergodic if it is:
- irreducible: for all $x, y \in \Omega$, $\exists\, t$: $P^t(x,y) > 0$ (connected);
- aperiodic: $\gcd\{ t : P^t(x,y) > 0 \} = 1$ (not bipartite).
($P^t$ is the "t step" transition prob.)

The stationary distribution $\pi$
Thm: Any finite, ergodic MC converges to a unique stationary distribution $\pi$.
Thm: If $\pi$ satisfies the detailed balance condition
$\pi(x) P(x,y) = \pi(y) P(y,x)$ for all $x, y$,
then $\pi$ is the stationary distribution.
So, P symmetric $\Rightarrow$ $\pi$ is uniform.
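The claim that a symmetric P has a uniform stationary distribution can be checked exactly on a tiny instance. The sketch below is only an illustration: the 4-vertex path graph and the helper names `independent_sets` and `walk_matrix` are assumptions, not part of the slides. It builds the walk on independent sets described above and applies one step to the uniform distribution.

```python
from fractions import Fraction
from itertools import combinations

def independent_sets(n, edges):
    """All independent sets of the graph on vertices 0..n-1."""
    sets = []
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            s = frozenset(combo)
            if not any(u in s and v in s for u, v in edges):
                sets.append(s)
    return sets

def walk_matrix(states):
    """P(x,y) = 1/Delta for neighbors (|x ^ y| = 1); leftover mass stays at x."""
    nbrs = {x: [y for y in states if len(x ^ y) == 1] for x in states}
    delta = max(len(v) for v in nbrs.values())  # max degree in H
    P = {x: {y: Fraction(0) for y in states} for x in states}
    for x in states:
        for y in nbrs[x]:
            P[x][y] = Fraction(1, delta)
        P[x][x] = 1 - Fraction(len(nbrs[x]), delta)
    return P

# Hypothetical example graph: the path 0 - 1 - 2 - 3.
states = independent_sets(4, [(0, 1), (1, 2), (2, 3)])
P = walk_matrix(states)
uniform = {x: Fraction(1, len(states)) for x in states}

# One step of the chain applied to the uniform distribution.
stepped = {y: sum(uniform[x] * P[x][y] for x in states) for y in states}
is_symmetric = all(P[x][y] == P[y][x] for x in states for y in states)
```

Exact rational arithmetic is used so that `stepped == uniform` is a true equality rather than a floating-point approximation.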

Sampling from non-uniform distributions
Q: What if we want to sample from some other distribution?
E.g., for $\lambda > 0$, sample ind. set I w/ prob: $\pi(I) = \lambda^{|I|} / Z$, where $Z = \sum_J \lambda^{|J|}$.
Step 2. Carefully define the transition probabilities.

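The Metropolis Algorithm (MRRTT '53)
Propose a move from x to y as before, but accept with probability $\min(1, \pi(y)/\pi(x))$ (with remaining probability stay at x).
If $\pi(x) \ge \pi(y)$: $P(x,y) = \pi(y) / (\Delta\, \pi(x))$ and $P(y,x) = 1/\Delta$, so $\pi(x) P(x,y) = \pi(y) P(y,x)$.
For independent sets, with $x = I$ and $y = I \cup \{v\}$:
$\pi(y)/\pi(x) = (\lambda^{|I|+1}/Z) / (\lambda^{|I|}/Z) = \lambda$,
so the move $I \to I \cup \{v\}$ is accepted with prob. $\min(1, \lambda)$ and the reverse move with prob. $\min(1, \lambda^{-1})$.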
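A small exact check of the Metropolis rule may help here. This is a sketch under stated assumptions: the 4-cycle graph, the value λ = 3, and the helper names are all hypothetical choices for illustration. It builds the Metropolis transition matrix for weighted independent sets and tests detailed balance and stationarity with rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def independent_sets(n, edges):
    """All independent sets of the graph on vertices 0..n-1."""
    sets = []
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            s = frozenset(combo)
            if not any(u in s and v in s for u, v in edges):
                sets.append(s)
    return sets

def metropolis_matrix(states, weight):
    """Propose a neighbor w.p. 1/Delta, accept w.p. min(1, weight(y)/weight(x))."""
    nbrs = {x: [y for y in states if len(x ^ y) == 1] for x in states}
    delta = max(len(v) for v in nbrs.values())
    P = {x: {y: Fraction(0) for y in states} for x in states}
    for x in states:
        for y in nbrs[x]:
            accept = min(Fraction(1), weight(y) / weight(x))
            P[x][y] = Fraction(1, delta) * accept
        P[x][x] = 1 - sum(P[x][y] for y in nbrs[x])  # rejected or unused mass
    return P

lam = Fraction(3)  # hypothetical value of lambda
states = independent_sets(4, [(0, 1), (1, 2), (2, 3), (3, 0)])  # 4-cycle

def weight(I):
    return lam ** len(I)

Z = sum(weight(I) for I in states)
pi = {I: weight(I) / Z for I in states}
P = metropolis_matrix(states, weight)

balanced = all(pi[x] * P[x][y] == pi[y] * P[y][x]
               for x in states for y in states)
stationary = all(sum(pi[x] * P[x][y] for x in states) == pi[y]
                 for y in states)
```

Only the ratio `weight(y) / weight(x)` enters the transition probabilities, so the chain never needs the partition function Z; it is computed here only to verify the result.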

Basics continued…
Step 1. Connect the state space.
Step 2. Carefully define the transition probabilities.
Starting at any state $x_0$, take a random walk for some number of steps... and output the final state (from $\pi$?).
Q: But for how long do we walk?
Step 3. Bound the mixing time. This tells us the number of steps to take.

The mixing rate
Def'n: The total variation distance is
$\|P^t, \pi\| = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |P^t(x,y) - \pi(y)|$.
Def'n: Given $\epsilon$, the mixing time is
$\tau(\epsilon) = \min \{ t : \|P^{t'}, \pi\| < \epsilon, \ \forall\, t' \ge t \}$.
A Markov chain is rapidly mixing if $\tau(\epsilon)$ is poly$(n, \log(\epsilon^{-1}))$.
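For a chain small enough to write down, both definitions can be evaluated directly by powering the transition matrix. The sketch below is illustrative only: it uses the hypercube walk from later in the talk with n = 3, and the function names are assumptions. It relies on the fact that the total variation distance to stationarity never increases with t, so the first t that drops below ε is the mixing time.

```python
from itertools import product

def cube_matrix(n):
    """Walk on {0,1}^n: pick i in [n] and b in {0,1} uniformly, set v_i = b."""
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for i in range(n):
            for b in (0, 1):
                t = s[:i] + (b,) + s[i + 1:]
                P[index[s]][index[t]] += 1.0 / (2 * n)
    return P

def matmul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def tv_distance(Pt, pi):
    """max over start states x of (1/2) * sum_y |P^t(x,y) - pi(y)|."""
    return max(0.5 * sum(abs(p - q) for p, q in zip(row, pi)) for row in Pt)

def mixing_time(P, pi, eps, t_max=1000):
    """First t with distance < eps (the distance is non-increasing in t)."""
    Pt = P
    for t in range(1, t_max + 1):
        if tv_distance(Pt, pi) < eps:
            return t
        Pt = matmul(Pt, P)
    raise ValueError("did not mix within t_max steps")

P = cube_matrix(3)
pi = [1.0 / len(P)] * len(P)  # uniform, since P is symmetric
t_mix = mixing_time(P, pi, 0.01)
```

This brute-force approach costs time polynomial in |Ω|, which is exponential in n; it is useful only as a sanity check, which is why the talk turns to analytic bounds.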

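Spectral gap
Let $1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_{|\Omega|}$ be the eigenvalues of P.
Def'n: $\mathrm{Gap}(P) = 1 - |\lambda_2|$ is the spectral gap.
Thm: (Alon, Alon-Milman, Sinclair)
$\tau(\epsilon) \le \frac{1}{\mathrm{Gap}(P)} \log\left(\frac{1}{\pi_* \epsilon}\right)$,
$\tau(\epsilon) \ge \frac{|\lambda_2|}{2\,\mathrm{Gap}(P)} \log\left(\frac{1}{2\epsilon}\right)$,
where $\pi_* = \min_x \pi(x)$.
Mixing rate $\Leftrightarrow$ Spectral gap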
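As a rough illustration of how the two bounds are used, the sketch below estimates $|\lambda_2|$ for the hypercube walk with n = 3 by power iteration and plugs it into the theorem. Several things here are assumptions made for the example: the chain and ε = 0.01 are arbitrary choices, the function names are hypothetical, and projecting out constant vectors is valid only because this P is symmetric (so π is uniform).

```python
from itertools import product
from math import log, sqrt

def cube_matrix(n):
    """Walk on {0,1}^n: pick i in [n] and b in {0,1} uniformly, set v_i = b."""
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for i in range(n):
            for b in (0, 1):
                t = s[:i] + (b,) + s[i + 1:]
                P[index[s]][index[t]] += 1.0 / (2 * n)
    return P

def apply(P, v):
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

def second_eigenvalue(P, iters=200):
    """Power iteration on the complement of the constant vectors.

    Assumes P is symmetric, so the top eigenvector is constant.
    """
    size = len(P)
    v = [float(k * k) for k in range(size)]  # arbitrary starting vector
    for _ in range(iters):
        mean = sum(v) / size
        v = [x - mean for x in v]            # remove the eigenvalue-1 component
        w = apply(P, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Pv = apply(P, v)
    return abs(sum(a * b for a, b in zip(v, Pv)))  # Rayleigh quotient, |v| = 1

P = cube_matrix(3)
lam2 = second_eigenvalue(P)
gap = 1.0 - lam2
eps = 0.01
pi_star = 1.0 / len(P)
upper = (1.0 / gap) * log(1.0 / (pi_star * eps))
lower = (lam2 / (2.0 * gap)) * log(1.0 / (2.0 * eps))
```

For this chain the eigenvalues are $1 - k/n$ for $k = 0, \dots, n$, so the estimate should agree with a gap of 1/3.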

Outline for rest of talk
Techniques:
- Coupling
- Flows and paths
- Indirect methods
Problems:
- Walk on the hypercube
- Colorings
- Matchings
- Independent sets
Connections with statistical physics:
- problems
- algorithms
- physical insights

Coupling

Simulate 2 processes:
- Start at any $x_0$ and $y_0$.
- Couple moves, but each simulates the MC.
- Once they agree, they move in sync ($x_t = y_t \Rightarrow x_{t+1} = y_{t+1}$).

Coupling
Def'n: A coupling is a MC on $\Omega \times \Omega$:
1) Each process $\{X_t\}$, $\{Y_t\}$ is a faithful copy of the original MC,
2) If $X_t = Y_t$, then $X_{t+1} = Y_{t+1}$.
The coupling time T is: $T = \max_{x,y} E[T_{x,y}]$, where $T_{x,y} = \min \{ t : X_t = Y_t \mid X_0 = x, Y_0 = y \}$.
Thm: $\tau(\epsilon) \le T\, e \ln \epsilon^{-1}$. (Aldous '81)

Ex 1: Walk on the hypercube
MC_CUBE: Start at $v_0 = (0,0,\dots,0)$. Repeat:
- Pick $i \in [n]$, $b \in \{0,1\}$.
- Set $v_i = b$.
Symmetric, ergodic $\Rightarrow$ $\pi$ is uniform.
Mixing time? Use coupling: both copies use the same $(i, b)$ at every step (e.g., $i=2, b=0$; then $i=6, b=1$; then $i=1, b=1$; …), so $x_t = y_t$ once every coordinate has been picked.
So $T \approx n \ln n$ (coupon collecting) $\Rightarrow$ $\tau(\epsilon) = O(n \ln (n \epsilon^{-1}))$.
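The coupon-collector behavior of this coupling is easy to observe by simulation. The following is a minimal sketch, assuming the worst-case start (all zeros versus all ones), n = 20, and a fixed random seed; these choices and the function name are illustrative only.

```python
import random

def coupling_time(n, rng):
    """Run both copies with the same (i, b) until they agree; return the step count."""
    x = [0] * n
    y = [1] * n  # start the second copy as far away as possible
    t = 0
    while x != y:
        i = rng.randrange(n)  # shared coordinate choice
        b = rng.randrange(2)  # shared bit choice
        x[i] = b
        y[i] = b
        t += 1
    return t

rng = random.Random(0)
n = 20
times = [coupling_time(n, rng) for _ in range(500)]
average = sum(times) / len(times)
```

From this start the copies agree exactly when every coordinate has been chosen at least once, so `average` should be close to $n H_n \approx 72$ for n = 20, in line with the $n \ln n$ estimate.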

Outline
Techniques:
- Coupling (path coupling)
- Flows and paths
- Indirect methods
Problems:
- Walk on the hypercube
- Colorings
- Matchings
- Independent sets
Connections with statistical physics:
- problems
- algorithms
- physical insights

Ex 2: Colorings
Given: A graph G (max deg d), k > 1.
Goal: Find a random k-coloring of G.
MC_COL (single point replacement; the "lazy" chain): Starting at some k-coloring $C_0$, repeat:
- With prob 1/2 do nothing.
- Pick $v \in V$, $c \in [k]$;
- Recolor v with c, if possible.
Note: If $k \ge d + 1$, colorings exist. (Greedy)
If $k \ge d + 2$, then the state space is connected. (Therefore $\pi$ is uniform.)
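One step of the lazy single-site chain can be sketched as follows. The 5-cycle, the choice k = 4 (so that $k \ge d + 2$), the starting coloring, and the function names are assumptions for illustration.

```python
import random

def mc_col_step(coloring, adj, k, rng):
    """One step of the lazy chain: recolor a random vertex if the result stays proper."""
    if rng.random() < 0.5:
        return coloring                   # lazy step: do nothing
    v = rng.choice(sorted(adj))           # pick v in V
    c = rng.randrange(k)                  # pick c in [k]
    if all(coloring[u] != c for u in adj[v]):
        coloring = dict(coloring)
        coloring[v] = c                   # recolor v with c
    return coloring

def is_proper(coloring, adj):
    return all(coloring[u] != coloring[v] for u in adj for v in adj[u])

# Hypothetical example: the 5-cycle (d = 2) with k = 4 colors.
adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
coloring = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
rng = random.Random(1)
for _ in range(1000):
    coloring = mc_col_step(coloring, adj, 4, rng)
```

Because a move is rejected whenever it would create a monochromatic edge, every state visited is a proper coloring; how many steps are needed before the output is close to uniform is exactly the mixing-time question.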

Path Coupling [Bubley, Dyer, Greenhill '97-8]
Coupling: Show for all $x, y \in \Omega$, $E[\Delta(\mathrm{dist}(x,y))] < 0$.
Path coupling: Show for all u, v s.t. dist(u,v) = 1, that $E[\Delta(\mathrm{dist}(u,v))] \le 0$.
Consider a shortest path: $x = z_0, z_1, z_2, \dots, z_r = y$, with $\mathrm{dist}(z_i, z_{i+1}) = 1$ and $\mathrm{dist}(x,y) = r$.
$\Rightarrow E[\Delta(\mathrm{dist}(x,y))] \le \sum_i E[\Delta(\mathrm{dist}(z_i, z_{i+1}))] \le 0$.

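Path coupling for MC_COL
Thm: MC_COL is rapidly mixing if $k \ge 3d$. (Jerrum '95)
Pf: Use path coupling: dist(x,y) = 1, where x and y differ only in the color of one vertex w. Pick the same (v, c) in both copies. Cases:
- $v = w$, c not among the colors of N(w): $\Delta\mathrm{dist} = -1$;
- $v \in N(w)$, $c \in \{x(w), y(w)\}$: $\Delta\mathrm{dist} = +1$ (or 0);
- o.w.: $\Delta\mathrm{dist} = 0$.
Ignoring the factor 1/2 from laziness,
$E[\Delta\mathrm{dist}] \le \frac{1}{nk} \left( (k-d)(-1) + 2d(+1) \right) = \frac{1}{nk}(3d-k) \le 0$.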

Summary: Coupling
Pros: Can yield very easy proofs
Cons: Demands a lot from the chain
Extensions:
- Careful coupling ($k \ge 2d$) (Jerrum '95)
- Change the MC (Luby-R-Sinclair '95)
- "Macromoves": burn in (Dyer-Frieze '01, Molloy '02); non-Markovian couplings (Hayes-Vigoda '03)

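Conductance and flows (Jerrum-Sinclair '88)
For $S \subset \Omega$ with complement $S^C$:
$\Phi(S) = \frac{\sum_{s \in S,\, s' \in S^C} \pi(s) P(s,s')}{\sum_{s \in S} \pi(s)}$,
$\Phi = \min_{S \subset \Omega,\ \pi(S) \le 1/2} \Phi(S)$.
Thm: $\frac{\Phi^2}{2} \le \mathrm{Gap}(P) \le 2\Phi$.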
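On a very small chain the conductance can be found by brute force over all cuts. The sketch below does this for the hypercube walk with n = 3, using exact fractions. The chain, the function names, and the use of the known gap 1/n for this particular walk are assumptions brought in for the example, not something stated on the slide.

```python
from fractions import Fraction
from itertools import combinations, product

def cube_chain(n):
    """Walk on {0,1}^n: pick i in [n] and b in {0,1} uniformly, set v_i = b."""
    states = list(product((0, 1), repeat=n))
    P = {s: {t: Fraction(0) for t in states} for s in states}
    for s in states:
        for i in range(n):
            for b in (0, 1):
                t = s[:i] + (b,) + s[i + 1:]
                P[s][t] += Fraction(1, 2 * n)
    return states, P

def conductance(states, P, pi):
    """Minimum of Phi(S) over all nonempty S with pi(S) <= 1/2."""
    best = None
    for r in range(1, len(states)):
        for S in combinations(states, r):
            mass = sum(pi[s] for s in S)
            if mass > Fraction(1, 2):
                continue
            inside = set(S)
            flow = sum(pi[s] * P[s][t]
                       for s in S for t in states if t not in inside)
            ratio = flow / mass
            if best is None or ratio < best:
                best = ratio
    return best

n = 3
states, P = cube_chain(n)
pi = {s: Fraction(1, len(states)) for s in states}
phi = conductance(states, P, pi)

# Assumption: this walk has eigenvalues 1 - k/n, so its spectral gap is 1/n.
gap = Fraction(1, n)
cheeger_holds = phi * phi / 2 <= gap <= 2 * phi
```

The minimizing cut is a half cube (fix one coordinate), which gives $\Phi = 1/(2n)$; here the upper bound $\mathrm{Gap}(P) \le 2\Phi$ holds with equality.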

Min cut $\Leftrightarrow$ Max flow
Make $|\Omega|^2$ canonical paths (Sinclair '92):
$\Gamma = \{ \gamma_{xy} : \text{from } x \in \Omega \text{ to } y \in \Omega,\ x \ne y, \text{ carrying } \pi(x)\pi(y) \text{ units of flow} \}$.
Capacity of e = (u,v): $Q(e) = \pi(u) P(u,v) = \pi(v) P(v,u)$.
The congestion of these paths is:
$\rho(\Gamma) = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)$.
$\bar{\rho} = \min_\Gamma \rho(\Gamma)\, \ell(\Gamma)$ ($\ell(\Gamma)$ is the max path length).
Thm: $\tau(\epsilon) \le \bar{\rho}\, \log (\epsilon\, \pi(x))^{-1}$.

Ex 3: Back to the hypercube
- Define a canonical path from s to t.
- Bound the number of paths through $(u,v) \in E$.
- The complementary pair (u', v') determines (s,t), so $|\{ \gamma_{xy} \ni e \}| = 2^{n-1}$.
$\rho(\Gamma) = \max_e \frac{\sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)}{Q(e)} = \frac{2^{n-1} \cdot 2^{-2n}}{2^{-n} (1/2n)} = n$
and $\ell(\Gamma) = n$, so $\bar{\rho} \le n^2$, which the slide summarizes as an $\tilde{O}(n^2)$ bound.
[Figure: example bit strings for s, t, u, v, u', v'; the strings are lost in this transcript.]

Ex 4: Sampling matchings

MC_MATCH: Starting at $M_0$, repeat:
Pick $e = (u,v) \in E$
- If $e \in M$, remove e;
- If u and v unmatched in M, add e;
- If u matched (by e') and v unmatched (or vice versa), add e and remove e';
- Otherwise do nothing.
Thm: Coupling won't work! (Kumar-Ramesh '99)
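The four cases of MC_MATCH translate almost line by line into code. This is a minimal sketch: the 6-cycle, the empty starting matching, the seed, and the function names are illustrative assumptions.

```python
import random

def mc_match_step(M, edges, rng):
    """One step of the matching chain on the edge list `edges`."""
    e = rng.choice(edges)
    u, v = e
    M = set(M)
    if e in M:
        M.remove(e)                       # e in M: remove e
        return M
    at_u = [f for f in M if u in f]       # matching edge covering u, if any
    at_v = [f for f in M if v in f]       # matching edge covering v, if any
    if not at_u and not at_v:
        M.add(e)                          # both endpoints unmatched: add e
    elif len(at_u) + len(at_v) == 1:
        M.remove((at_u or at_v)[0])       # exactly one endpoint matched by e':
        M.add(e)                          # add e and remove e'
    return M                              # both matched: do nothing

def is_matching(M):
    endpoints = [x for f in M for x in f]
    return len(endpoints) == len(set(endpoints))

# Hypothetical example: the 6-cycle, starting from the empty matching.
edges = [(i, (i + 1) % 6) for i in range(6)]
rng = random.Random(2)
M = set()
for _ in range(1000):
    M = mc_match_step(M, edges, rng)
```

Each case preserves the matching property, so the walk stays inside the state space; the analysis of how fast it mixes is what requires the canonical-path argument.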

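Mixing time of MC_MATCH
[Figure: matchings s and t, their symmetric difference $s \oplus t$, and a transition (u,v) on the canonical path from s to t.]
Paths using (u,v) are determined by u'... as before.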

Ex 5: Independent Sets
Goal: Given $\lambda$, sample ind. set I with prob: $\pi(I) = \lambda^{|I|}/Z$, $Z = \sum_J \lambda^{|J|}$.
MC_IND: Starting at $I_0$, repeat:
- Pick $v \in V$ and $b \in \{0,1\}$;
- If $v \in I$, b=0, remove v w.p. $\min(1, \lambda^{-1})$;
- If $v \notin I$, b=1, add v w.p. $\min(1, \lambda)$ if possible;
- O.w. do nothing.

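Slow mixing of MC_IND (large $\lambda$)
[Figure: stationary weight plotted against #R/#B, where R and B count points on the even and odd sublattices; the axis is marked 0, 1, ∞, and the sets S and S^C are indicated.]
For large $\lambda$ there is a "bad cut," ... so MC_IND is slowly mixing.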

Summary: Flows
Pros: Offers a combinatorial approach to mixing; especially useful for proving slow mixing.
Cons: Requires global knowledge of the chain to spread out paths.
Extensions: Balanced flows (Morris-Sinclair '99)
MCMC, major highlights:
- The permanent (Jerrum-Sinclair-Vigoda '02)
- Volume of a convex polytope (Dyer-Frieze-Kannan '89, +…)

Outline
Techniques:
- Coupling
- Flows and paths
- Indirect methods (Comparison, Decomposition)
Problems:
- Walk on the hypercube
- Colorings
- Matchings
- Independent sets
Connections with statistical physics:
- problems
- algorithms
- physical insights

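Comparison (Diaconis, Saloff-Coste '93)
P: unknown chain; $\tilde{P}$: known chain.
For each edge $(x,y) \in \tilde{P}$, make a path $\gamma_{x,y}$ using edges in P.
Let $\Gamma(z,w)$ be the set of paths $\gamma_{x,y}$ using (z,w).
Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\, \mathrm{Gap}(\tilde{P})$, where
$A = \max_{e = (z,w)} \left\{ \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} |\gamma_{x,y}|\, \pi(x) \tilde{P}(x,y) \right\}$.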

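Comparison (cont.)
Each edge $(x,y) \in \tilde{P}$ (known) is routed along a path $\gamma_{x,y}$ in P (unknown); $\Gamma(z,w)$ is the set of paths $\gamma_{x,y}$ using (z,w).
Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\, \mathrm{Gap}(\tilde{P})$.
Intuition: $(S, S^C)$ cannot be a bad cut in P if it isn't in $\tilde{P}$.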

Comparison, aka... "The (Adjacency) Matrix Reloaded"

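Disjoint decomposition (Madras-R. '96, Martin-R. '00)
Partition $\Omega$ into disjoint pieces $A_1, \dots, A_6$.
Restrictions $P_i$: the chain restricted to $A_i$ (e.g., $P_3$ on $A_3$).
Projection $\bar{P}$ on states $a_1, \dots, a_6$:
$\bar{\pi}(a_i) = \pi(A_i)$,
$\bar{P}(a_i, a_j) = \frac{1}{\pi(A_i)} \sum_{x \in A_i,\, y \in A_j} \pi(x) P(x,y)$.
Thm: $\mathrm{Gap}(P) \ge \frac{1}{2}\, \mathrm{Gap}(\bar{P}) \left( \min_i \mathrm{Gap}(P_i) \right)$.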

Ex 6: MC_IND on small ind. sets
For G = (V,E): let $\Omega$ = {ind. sets of G}; $\Omega_k$ = {ind. sets of size k}.
Thm: MC_IND is rapidly mixing on $\bigcup_{k=0}^{K} \Omega_k$, where $K = |V| / (2(\Delta+1))$.
Consider first the "swap" chain MC_SWAP: Starting at $I_0$, repeat:
- Pick $(u,v,b) \in V \times V \times \{0,1,2\}$;
- If b=0 and $u \in I$, remove u w.p. $\min(1, \lambda^{-1})$;
- If b=1 and $u \notin I$, add u w.p. $\min(1, \lambda)$ if possible;
- If b=2, remove u and add v (if possible);
- O.w. do nothing.

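Ind. sets w/bounded size (cont.)
Thm: MC_IND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$, where $K = |V| / (2(\Delta+1))$.
Decompose into $\Omega_0, \Omega_1, \Omega_2, \dots, \Omega_{K-1}, \Omega_K$.
Restrictions: MC_SWAP on each $\Omega_k$. Rapidly mixing?
Projection: $\bar{P}$ on $a_0, a_1, a_2, \dots, a_{K-1}, a_K$. $|\Omega_k|$ is logconcave, ... so $\bar{P}$ is rapidly mixing.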

The Restrictions of MC_SWAP
Thm: MC_SWAP is rapidly mixing on $\Omega_k$, $k < K$. (Bubley-Dyer '97)
Thm: MC_SWAP is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Decomposition)
Cor: MC_IND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Comparison)

Summary: Indirect methods
Pros: Offer a top down approach; allow hybrid methods to be used.
Cons: Can increase the complexity.
Extensions:
- Comparison thm for log-Sobolev (Diaconis-Saloff-Coste '96)
- Comparison for Glauber dynamics (R.-Tetali '98)
- Decomposition for log-Sobolev (Jerrum-Son-Tetali-Vigoda '02)

Outline
Techniques:
- Coupling
- Flows and paths
- Hybrid methods
Problems:
- Walk on the hypercube
- Colorings
- Matchings
- Independent sets
Connections with statistical physics:
- problems
- algorithms
- physical insights

Why Statistical Physics?
- They have a need for sampling
- Use many interesting heuristics
- Great intuition
- Experts on "large data sets"
- Microscopic details $\Rightarrow$ Macroscopic behavior (i.e., phase transitions)

Models from statistical physics
- Potts model (3-colorings)
- Hardcore model (Independent sets)
- Dimer model (Matchings)
- Ising model (Min cut)

Models (cont.)
- Independent sets: $\pi(I) = \lambda^{|I|}/Z$
- Matchings: $\pi(M) = \lambda^{|M|}/Z$
- Ising model: $\pi(\sigma) = \lambda^{|E_=|}/Z$, where $E_= = \{ (u,v) \in E : \sigma(u) = \sigma(v) \}$ (and $E = E_= \cup E_{\ne}$)

Models: (The physics perspective)
Given: A physical system $\Omega = \{\sigma\}$
Define: A Gibbs measure as follows: $\pi(\sigma) = e^{-\beta H(\sigma)} / Z$,
where $H(\sigma)$ is the Hamiltonian, $\beta = 1/kT$ is the inverse temperature, and $Z = \sum_{\tau} e^{-\beta H(\tau)}$ is the normalizing constant or partition function.
- Independent sets: $H(\sigma) = -|I|$. If $\lambda = e^{\beta}$ then $\pi(\sigma) = \lambda^{|I|}/Z$.
- Ising model: $H(\sigma) = -\sum_{(u,v) \in E} \sigma_u \sigma_v$. If $\lambda = e^{2\beta}$ then $\pi(\sigma) = \lambda^{|E_=|}/Z$.
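The Ising identity follows because $-H(\sigma) = |E_=| - |E_{\ne}| = 2|E_=| - |E|$, so $e^{-\beta H(\sigma)} = \lambda^{|E_=|} e^{-\beta |E|}$ and the constant factor cancels in the normalization. The sketch below checks this numerically; the 4-cycle, the value β = 0.7, and the function names are assumptions chosen only for the example.

```python
from itertools import product
from math import exp, isclose

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical graph: the 4-cycle
beta = 0.7                                # hypothetical inverse temperature
lam = exp(2 * beta)

def hamiltonian(sigma):
    """H(sigma) = -sum over edges of sigma_u * sigma_v, with spins in {-1, +1}."""
    return -sum(sigma[u] * sigma[v] for u, v in edges)

def agreeing_edges(sigma):
    """|E_=|: the number of edges whose endpoints have equal spins."""
    return sum(1 for u, v in edges if sigma[u] == sigma[v])

configs = list(product((-1, 1), repeat=4))

# Gibbs form: pi(sigma) = exp(-beta * H(sigma)) / Z.
Z_gibbs = sum(exp(-beta * hamiltonian(s)) for s in configs)
gibbs = [exp(-beta * hamiltonian(s)) / Z_gibbs for s in configs]

# Combinatorial form: pi(sigma) = lam^{|E_=|} / Z'.
Z_lam = sum(lam ** agreeing_edges(s) for s in configs)
lam_form = [lam ** agreeing_edges(s) / Z_lam for s in configs]

same = all(isclose(p, q, rel_tol=1e-12) for p, q in zip(gibbs, lam_form))
```

The two partition functions differ by the factor $e^{-\beta |E|}$, but the normalized distributions coincide.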

Physics perspective (cont.)
Q: What about on the infinite lattice?
Use conditional probabilities. But there can be boundary effects!!!

Phase transitions: Ind. sets
On finite regions:
- High temperature ($T \to \infty$): boundary ($\partial$) effects die out
- Low temperature ($T \to 0$): long range effects
$T_c$ indicates a "phase transition."

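Slow mixing of MC_IND revisited
[Figure: stationary weight plotted against #R/#B, with the axis marked 0, 1, ∞ and the sets S and S^C indicated.]
$\pi(S_i) = \sum_{s \in S_i} \pi(s) = \sum_{s \in S_i} e^{-\beta H(s)} / Z$
The sum over $s \in S_i$ is the "entropy term"; the weight $e^{-\beta H(s)}$ is the "energy term."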

Group by # of "fault lines"
Fault lines are vacant paths of width 2 from top to bottom (or left to right).
[Figure: the state space split into $S_R$, $S_B$, $S_1$, $S_2$, $S_3$, …, with the cut between S and S^C.]

"Peierls Argument"
1. Identify horizontal or vertical fault line. ($\sigma \in S_1$)
2. Shift right of fault by 1 and flip colors.
3. Remove rt column; add points along fault line, if possible. (result $\in S_B$)
For fixed path length $\ell$: $S_1 \to S_B \times 2^{n/2} \times 3^{\ell}$.

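Peierls Argument cont.
$S_1 \to S_B$: $\le 2^{n/2}\, 3^{\ell}$ preimages ($\ge \ell - n/2$ more points).
$\pi(S_1) = \sum_{\sigma \in S_1} \pi(\sigma) \le \sum_{\ell} \sum_{\sigma \in S_B} \pi(\sigma)\, 2^{n/2}\, 3^{\ell}\, \lambda^{(n/2 - \ell)}$
$\le \pi(S_B)\, 2^{n/2}\, 3^{n}\, (\lambda^{n/2} / \lambda^{n})\, (\mathrm{poly}(n))$
$\le \pi(S_B)\, (18/\lambda)^{n/2}\, (\mathrm{poly}(n))$, if $\lambda >$ …
(and similarly for $S_2, S_3, \dots$)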

Conclusions
Techniques:
- Coupling: can be easy when it works
- Flows: requires global knowledge of chain; very useful for slow mixing
- Indirect methods: top down approach; often increases complexity
- Connection to physics: can offer tremendous insights
Open problems: ...

Conclusions
Open problems:
- Sampling 4,5,6-colorings on the grid.
- Sampling perfect matchings on non-bipartite graphs.
- Sampling acyclic orientations in a graph.
- Sampling configurations of the Potts model (a generalization of Ising, but with more colors).
- How can we further exploit phase transitions? Other physical intuition?