Mixing: A tutorial on Markov chains
Dana Randall, Georgia Tech
( Slides at: )
Outline Fundamentals for designing a Markov chain Bounding running times (convergence rates) Connections to statistical physics
Markov chains for sampling
Given: A large set (matchings, colorings, independent sets, …)
Main Q: What do typical elements look like?
Random sampling can be used to:
- Determine properties of "typical" elements
- Evaluate thermodynamic properties (such as free energy, entropy, …)
- Estimate the cardinality of the set
"Markov chain Monte Carlo"
Markov chains
Andrei Andreyevich Markov
[Figure: playing cards labeled A, K, 2]
Sampling using Markov chains
State space Ω ($|\Omega| \sim c^n$)
Step 1. Connect the state space.
E.g., if Ω = indep. sets of a graph G, connect I and I' iff $|I \oplus I'| = 1$.
Basics of Markov chains
Transitions P: Random walk on H (∆ = max deg in H)
Starting at x:
- Pick a neighbor y.
- Move to y with prob. P(x,y) = 1/∆.
- With all remaining prob. stay at x.
Def'n: A MC is ergodic if it is:
- irreducible: for all $x, y \in \Omega$, $\exists t$: $P^t(x,y) > 0$ (connected);
- aperiodic: $\gcd\{t : P^t(x,y) > 0\} = 1$ (not bipartite).
($P^t$ is the "t step" transition prob.)
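The walk and the ergodicity definition above can be checked on small graphs. The following is a minimal sketch, assuming H is given as adjacency lists; the names `walk_matrix`, `is_ergodic`, and the two example graphs are hypothetical. It uses the standard fact that a finite chain is ergodic exactly when some power of P is entrywise positive.

```python
from fractions import Fraction

def walk_matrix(adj):
    """Walk on H: move to each neighbor w.p. 1/Delta, stay with the rest."""
    n = len(adj)
    delta = max(len(nbrs) for nbrs in adj)
    P = [[Fraction(0)] * n for _ in range(n)]
    for x, nbrs in enumerate(adj):
        for y in nbrs:
            P[x][y] = Fraction(1, delta)
        P[x][x] = 1 - Fraction(len(nbrs), delta)  # remaining probability
    return P

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_ergodic(P):
    """True iff some power P^t, t <= (n-1)^2 + 1, is entrywise positive."""
    n = len(P)
    Pt = P
    for _ in range((n - 1) ** 2 + 1):  # Wielandt's bound on the needed power
        if all(p > 0 for row in Pt for p in row):
            return True
        Pt = mat_mul(Pt, P)
    return False

path4 = [[1], [0, 2], [1, 3], [2]]          # path: endpoints get self-loops
cycle4 = [[1, 3], [0, 2], [1, 3], [0, 2]]   # regular and bipartite: periodic
```

On the path, the endpoints have degree below ∆ and so keep a self-loop, which breaks periodicity; the 4-cycle has no such slack and fails the aperiodicity test.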
The stationary distribution
Thm: Any finite, ergodic MC converges to a unique stationary distribution π.
Thm: If π satisfies the detailed balance condition $\pi(x) P(x,y) = \pi(y) P(y,x)$ for all x, y, then π is the stationary distribution.
So, P symmetric ⇒ π is uniform.
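Both theorems can be illustrated on a toy chain. This is a sketch under stated assumptions: the 3-state matrix `P` is a hypothetical example (the ∆ = 2 walk on a path with 3 vertices), and `step` and `satisfies_detailed_balance` are illustrative helper names.

```python
from fractions import Fraction

def step(dist, P):
    """One step of the chain applied to a distribution (row vector times P)."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]

def satisfies_detailed_balance(pi, P):
    n = len(P)
    return all(pi[x] * P[x][y] == pi[y] * P[y][x]
               for x in range(n) for y in range(n))

# Hypothetical toy chain: walk on a 3-vertex path with Delta = 2; P is symmetric.
h = Fraction(1, 2)
P = [[h, h, 0],
     [h, 0, h],
     [0, h, h]]
uniform = [Fraction(1, 3)] * 3

# Start from a point mass at state 0 and iterate; the law approaches uniform.
dist = [Fraction(1), Fraction(0), Fraction(0)]
for _ in range(60):
    dist = step(dist, P)
max_err = max(abs(float(d) - 1 / 3) for d in dist)
```

Since P is symmetric, the uniform distribution satisfies detailed balance term by term, and the iterated law converges to it geometrically.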
Sampling from non-uniform distributions
Q: What if we want to sample from some other distribution?
E.g., for $\lambda > 0$, sample ind. set I w/ prob: $\pi(I) = \lambda^{|I|} / Z$, where $Z = \sum_J \lambda^{|J|}$.
Step 2. Carefully define the transition probabilities.
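The Metropolis Algorithm (MRRTT '53)
Propose a move from x to y as before, but accept with probability $\min(1, \pi(y)/\pi(x))$ (with remaining probability stay at x).
If $\pi(x) \ge \pi(y)$: $P(x,y) = \pi(y) / (\Delta\, \pi(x))$ and $P(y,x) = 1/\Delta$, so $\pi(x) P(x,y) = \pi(y) P(y,x)$.
For independent sets: move $I \to I \cup \{v\}$ w.p. $\min(1, \lambda)$ and $I \cup \{v\} \to I$ w.p. $\min(1, \lambda^{-1})$, since
$\pi(y)/\pi(x) = (\lambda^{|I|+1}/Z) / (\lambda^{|I|}/Z) = \lambda$.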
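The detailed balance claim for the Metropolis rule can be verified exactly on a small instance. The sketch below assumes the proposal "pick one of the n vertices uniformly and toggle it", which plays the role of the 1/∆ proposal with ∆ = n; `independent_sets` and `metropolis_matrix` are hypothetical names, and exact rational arithmetic is an illustration choice.

```python
from fractions import Fraction
from itertools import combinations

def independent_sets(n, edges):
    sets = []
    for r in range(n + 1):
        for c in combinations(range(n), r):
            s = frozenset(c)
            if not any(u in s and v in s for u, v in edges):
                sets.append(s)
    return sets

def metropolis_matrix(n, edges, lam):
    """Exact Metropolis transition matrix targeting pi(I) = lam^|I| / Z."""
    states = independent_sets(n, edges)
    index = {s: i for i, s in enumerate(states)}
    weight = [lam ** len(s) for s in states]       # unnormalized pi
    P = [[Fraction(0)] * len(states) for _ in states]
    for i, s in enumerate(states):
        for v in range(n):                         # propose toggling v w.p. 1/n
            t = s ^ {v}
            if t in index:                         # reject non-independent sets
                j = index[t]
                accept = min(Fraction(1), weight[j] / weight[i])
                P[i][j] += Fraction(1, n) * accept
        P[i][i] += 1 - sum(P[i])                   # stay with remaining prob.
    Z = sum(weight)
    pi = [w / Z for w in weight]
    return states, P, pi

# Path on 3 vertices with lam = 2 (a hypothetical test instance).
states, P, pi = metropolis_matrix(3, [(0, 1), (1, 2)], Fraction(2))
```

Because the toggle proposal is symmetric, the acceptance ratio alone has to supply the factor λ between a set and its one-vertex extension, which is what the check on `pi` and `P` confirms.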
Basics continued…
Step 1. Connect the state space.
Step 2. Carefully define the transition probabilities.
Starting at any state $x_0$, take a random walk for some number of steps... and output the final state (from π?).
Q: But for how long do we walk?
Step 3. Bound the mixing time. This tells us the number of steps to take.
The mixing rate
Def'n: The total variation distance is $\|P^t, \pi\| = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |P^t(x,y) - \pi(y)|$.
Def'n: Given ε, the mixing time is $\tau(\varepsilon) = \min\{t : \|P^{t'}, \pi\| < \varepsilon, \ \forall t' \ge t\}$.
A Markov chain is rapidly mixing if $\tau(\varepsilon)$ is poly$(n, \log(\varepsilon^{-1}))$.
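For a chain small enough to write down, both definitions can be evaluated by brute force. This is a minimal sketch; the 3-state matrix `P` is a hypothetical toy chain and `mixing_time` is an illustrative name. It relies on the fact that the worst-case total variation distance does not increase with t, so the first t below ε is the mixing time.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv_from_worst_start(Pt, pi):
    """max over start states x of (1/2) * sum_y |P^t(x,y) - pi(y)|."""
    return max(0.5 * sum(abs(p - q) for p, q in zip(row, pi)) for row in Pt)

def mixing_time(P, pi, eps, t_max=10_000):
    Pt = P
    for t in range(1, t_max + 1):
        if tv_from_worst_start(Pt, pi) < eps:
            return t
        Pt = mat_mul(Pt, P)
    raise ValueError("chain did not mix within t_max steps")

# Hypothetical toy chain: walk on a 3-vertex path, uniform stationary law.
P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
pi = [1 / 3] * 3
```

For this chain the distance is 1/3 after one step and 1/6 after two, so `mixing_time(P, pi, 0.25)` evaluates to 2.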
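Spectral gap
Let $1 = \lambda_1 > |\lambda_2| \ge \dots \ge |\lambda_{|\Omega|}|$ be the eigenvalues of P.
Def'n: Gap(P) $= 1 - |\lambda_2|$ is the spectral gap.
Mixing rate ⟺ Spectral gap
Thm (Alon, Alon-Milman, Sinclair):
$\tau(\varepsilon) \le \frac{1}{\mathrm{Gap}(P)} \log\left(\frac{1}{\pi_* \varepsilon}\right)$ and $\tau(\varepsilon) \ge \frac{|\lambda_2|}{2\,\mathrm{Gap}(P)} \log\left(\frac{1}{2\varepsilon}\right)$, where $\pi_* = \min_x \pi(x)$.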
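The two bounds can be evaluated numerically once $|\lambda_2|$ is known. The sketch below estimates $|\lambda_2|$ by power iteration on the complement of the constant vector, which is only valid for symmetric P and assumes the start vector is not orthogonal to the second eigenspace. The example chain is the lazy hypercube walk (pick a coordinate, set it to a random bit); all function names are hypothetical.

```python
import math

def cube_matrix(n):
    """Pick i in [n], b in {0,1}, set bit i to b: stay w.p. 1/2, flip bit i w.p. 1/(2n)."""
    N = 1 << n
    P = [[0.0] * N for _ in range(N)]
    for x in range(N):
        P[x][x] = 0.5
        for i in range(n):
            P[x][x ^ (1 << i)] = 1 / (2 * n)
    return P

def second_eigenvalue_modulus(P, iters=200):
    """Power iteration orthogonal to constants (assumes P is symmetric)."""
    N = len(P)
    v = [x - (N - 1) / 2 for x in range(N)]      # centered start vector
    ratio = 0.0
    for _ in range(iters):
        w = [sum(P[x][y] * v[y] for y in range(N)) for x in range(N)]
        mean = sum(w) / N
        w = [wi - mean for wi in w]              # re-project against round-off
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        norm_w = math.sqrt(sum(wi * wi for wi in w))
        ratio = norm_w / norm_v
        v = [wi / norm_w for wi in w]
    return ratio

def mixing_bounds(P, pi_min, eps):
    lam2 = second_eigenvalue_modulus(P)
    gap = 1 - lam2
    upper = math.log(1 / (pi_min * eps)) / gap
    lower = lam2 / (2 * gap) * math.log(1 / (2 * eps))
    return gap, lower, upper

gap, lower, upper = mixing_bounds(cube_matrix(3), 1 / 8, 0.01)
```

For this walk on $\{0,1\}^n$ the eigenvalues are $1 - k/n$, so the estimated gap for n = 3 should come out close to 1/3.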
Outline for rest of talk Techniques: Coupling Flows and paths Indirect methods Problems: Walk on the hypercube Colorings Matchings Independent sets Connections with statistical physics: - problems - algorithms - physical insights
Simulate 2 processes:
- Start at any $x_0$ and $y_0$
- Couple moves, but each simulates the MC
- Once they agree, they move in sync ($x_t = y_t \Rightarrow x_{t+1} = y_{t+1}$)
Coupling
Def'n: A coupling is a MC on Ω × Ω:
1) Each process $\{X_t\}$, $\{Y_t\}$ is a faithful copy of the original MC,
2) If $X_t = Y_t$, then $X_{t+1} = Y_{t+1}$.
The coupling time T is: $T = \max_{x,y} E[T^{x,y}]$, where $T^{x,y} = \min\{t : X_t = Y_t \mid X_0 = x, Y_0 = y\}$.
Thm: $\tau(\varepsilon) \le T e \ln \varepsilon^{-1}$. (Aldous '81)
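Ex 1: Walk on the hypercube
MC_CUBE: Start at $v_0 = (0,0,\dots,0)$. Repeat:
- Pick $i \in [n]$, $b \in \{0,1\}$.
- Set $v_i = b$.
Symmetric, ergodic ⇒ π is uniform.
Mixing time? Use coupling: both copies use the same (i, b) at each step.
[Figure: coupled runs $x_0, y_0 \to x_1, y_1 \to x_2, y_2 \to \dots \to x_t = y_t$ under the updates (i=2, b=0), (i=6, b=1), (i=1, b=1), ...]
So $T \approx n \log n$ (coupon collecting) and $\tau(\varepsilon) = O(n \ln(n \varepsilon^{-1}))$.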
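The coupon-collecting estimate can be checked by simulation. This is a sketch, assuming the two copies start at opposite corners (all zeros and all ones) so that every coordinate must be picked at least once; `coupling_time` and `mean_coupling_time` are hypothetical names.

```python
import random

def coupling_time(n, rng):
    """Steps until two coupled copies of MC_CUBE agree."""
    x = [0] * n
    y = [1] * n            # worst case: the copies disagree everywhere
    t = 0
    while x != y:
        i = rng.randrange(n)
        b = rng.randrange(2)
        x[i] = b           # both copies apply the same update (i, b)
        y[i] = b
        t += 1
    return t

def mean_coupling_time(n, trials, seed=0):
    rng = random.Random(seed)
    return sum(coupling_time(n, rng) for _ in range(trials)) / trials

def coupon_collector_mean(n):
    """Expected time to pick every coordinate at least once: n * H_n."""
    return n * sum(1 / k for k in range(1, n + 1))
```

From this start the coupling time is exactly the time to collect all n coordinates, so the empirical mean should sit near $n H_n \approx n \ln n$.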
Outline Techniques: Coupling - path coupling Flows and paths Indirect methods Problems: Walk on the hypercube Colorings Matchings Independent sets Connections with statistical physics: - problems - algorithms - physical insights
Ex 2: Colorings
Given: A graph G (max deg d), k > 1.
Goal: Find a random k-coloring of G.
MC_COL: (Single point replacement)
Starting at some k-coloring $C_0$, Repeat:
- With prob 1/2 do nothing. (The "lazy" chain)
- Pick $v \in V$, $c \in [k]$;
- Recolor v with c, if possible.
Note: k ≥ d + 1 ⇒ colorings exist. (Greedy)
If k ≥ d + 2, then the state space is connected. (Therefore π is uniform.)
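The chain MC_COL translates almost line by line into code. The following is a minimal sketch, assuming the graph is given as adjacency lists and the start coloring comes from the greedy procedure (which needs k ≥ d + 1); the function names are hypothetical.

```python
import random

def is_proper(adj, coloring):
    return all(coloring[u] != coloring[v] for u in range(len(adj)) for v in adj[u])

def greedy_coloring(adj, k):
    """Color vertices in order with the smallest free color (needs k >= d + 1)."""
    coloring = [None] * len(adj)
    for v in range(len(adj)):
        used = {coloring[u] for u in adj[v]}
        coloring[v] = next(c for c in range(k) if c not in used)
    return coloring

def mc_col(adj, k, steps, rng):
    coloring = greedy_coloring(adj, k)
    for _ in range(steps):
        if rng.random() < 0.5:
            continue                      # lazy step: do nothing
        v = rng.randrange(len(adj))
        c = rng.randrange(k)
        if all(coloring[u] != c for u in adj[v]):
            coloring[v] = c               # recolor only if still proper
    return coloring

# 5-cycle (d = 2) with k = 4 >= d + 2 colors, a hypothetical test instance.
cycle5 = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
sample = mc_col(cycle5, 4, 2000, random.Random(0))
```

The number of steps used here is arbitrary; choosing it is the subject of the mixing time bounds that follow.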
Path Coupling [Bubley, Dyer, Greenhill '97-8]
Coupling: Show for all $x, y \in \Omega$, $E[\Delta(\mathrm{dist}(x,y))] < 0$.
Path coupling: Show for all u, v s.t. dist(u,v) = 1, that $E[\Delta(\mathrm{dist}(u,v))] < 0$.
Consider a shortest path: $x = z_0, z_1, z_2, \dots, z_r = y$, with $\mathrm{dist}(z_i, z_{i+1}) = 1$ and $\mathrm{dist}(x,y) = r$.
$E[\Delta(\mathrm{dist}(x,y))] \le \sum_i E[\Delta(\mathrm{dist}(z_i, z_{i+1}))] \le 0$.
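Path coupling for MC_COL
Thm: MC_COL is rapidly mixing if k ≥ 3d. (Jerrum '95)
Pf: Use path coupling: dist(x,y) = 1, with x and y differing only at a vertex w.
Cases (chosen vertex v, color c):
- v = w, $c \in C \setminus \{\text{colors on } N(w)\}$: ∆dist = −1;
- $v \in N(w)$, $c \in \{x(w), y(w)\}$: ∆dist = +1 (or 0);
- o.w.: ∆dist = 0.
$E[\Delta \mathrm{dist}] \le \frac{1}{nk}\big((k-d)(-1) + 2d(+1)\big) = \frac{3d-k}{nk} \le 0$.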
Summary: Coupling Pros: Can yield very easy proofs Cons: Demands a lot from the chain Extensions: Careful coupling (k ≥ 2d) (Jerrum’95) Change the MC (Luby-R-Sinclair’95) “Macromoves” - burn in (Dyer-Frieze’01, Molloy’02) - non-Markovian couplings (Hayes-Vigoda’03)
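Conductance and flows (Jerrum-Sinclair '88)
$\Phi = \min_{S \subset \Omega,\ \pi(S) \le 1/2} \Phi(S)$, where $\Phi(S) = \frac{\sum_{s \in S,\, s' \in S^C} \pi(s) P(s,s')}{\sum_{s \in S} \pi(s)}$.
Thm: $\frac{\Phi^2}{2} \le \mathrm{Gap}(P) \le 2\Phi$.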
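On a tiny state space the conductance can be computed by enumerating every cut. This is a sketch under stated assumptions: the chain is the lazy walk on $\{0,1\}^3$ (pick a coordinate, set it to a random bit), whose spectral gap is 1/n because its eigenvalues are $1 - k/n$; `cube_matrix` and `conductance` are hypothetical names.

```python
from fractions import Fraction
from itertools import combinations

def cube_matrix(n):
    """Lazy hypercube walk: stay w.p. 1/2, flip bit i w.p. 1/(2n)."""
    N = 1 << n
    P = [[Fraction(0)] * N for _ in range(N)]
    for x in range(N):
        P[x][x] = Fraction(1, 2)
        for i in range(n):
            P[x][x ^ (1 << i)] = Fraction(1, 2 * n)
    return P

def conductance(P, pi):
    """Brute-force minimum of Phi(S) over all S with pi(S) <= 1/2."""
    N = len(P)
    best = None
    for r in range(1, N):
        for S in combinations(range(N), r):
            mass = sum(pi[s] for s in S)
            if mass > Fraction(1, 2):
                continue
            inside = set(S)
            flow = sum(pi[s] * P[s][t]
                       for s in S for t in range(N) if t not in inside)
            phi = flow / mass
            if best is None or phi < best:
                best = phi
    return best

n = 3
pi = [Fraction(1, 1 << n)] * (1 << n)
phi = conductance(cube_matrix(n), pi)
gap = Fraction(1, n)          # known spectral gap of this walk
```

The minimizing cut is a half-cube (fix one coordinate), and the result sits at the upper end of the theorem: Gap(P) = 2Φ.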
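Min cut ⟷ Max flow (Sinclair '92)
Make $|\Omega|^2$ canonical paths: $\Gamma = \{\gamma_{xy}$ : from $x \in \Omega$ to $y \in \Omega$, $x \ne y$, carrying $\pi(x)\pi(y)$ units of flow$\}$.
Capacity of e = (u,v): $Q(e) = \pi(u) P(u,v) = \pi(v) P(v,u)$.
The congestion of these paths is: $\rho(\Gamma) = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)$, and $\bar\rho = \min_\Gamma \rho(\Gamma)\, \ell(\Gamma)$ ($\ell$ is the max path length).
Thm: $\tau(\varepsilon) \le \bar\rho \, \log\big(\varepsilon\, \pi(x)\big)^{-1}$, where x is the starting state.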
Ex 3: Back to the hypercube
- Define a canonical path from s to t.
- Bound the number of paths through $(u,v) \in E$.
[Figure: bit strings for s, t, an edge (u,v) on the path from s to t, and the complementary pair (u',v')]
- The complementary pair (u',v') determines (s,t), so $|\{\gamma_{xy} \ni e\}| = 2^{n-1}$.
The congestion is $\rho(\Gamma) = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y) = \frac{2^{n-1} \cdot 2^{-2n}}{2^{-n}(1/2n)} = n$, and $\ell = n$ ⇒ $\bar\rho = \tilde{O}(n^2)$.
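The path count and the congestion value can be confirmed by brute force for small n. The sketch below assumes "bit-fixing" canonical paths (correct the differing bits of s in increasing index order), which is one natural choice and not necessarily the one drawn on the slide; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def bit_fixing_path(s, t, n):
    """Directed edges used when fixing the bits of s to match t, lowest bit first."""
    path, u = [], s
    for i in range(n):
        if (u ^ t) >> i & 1:
            v = u ^ (1 << i)
            path.append((u, v))
            u = v
    return path

def cube_congestion(n):
    N = 1 << n
    load = Counter()              # number of canonical paths through each edge
    longest = 0
    for s in range(N):
        for t in range(N):
            if s != t:
                path = bit_fixing_path(s, t, n)
                longest = max(longest, len(path))
                for e in path:
                    load[e] += 1
    max_paths = max(load.values())
    pi = Fraction(1, N)
    Q = pi * Fraction(1, 2 * n)   # capacity pi(u) P(u,v) with P(u,v) = 1/(2n)
    rho = max_paths * pi * pi / Q
    return max_paths, longest, rho

max_paths, longest, rho = cube_congestion(4)
```

For n = 4 every directed edge carries $2^{n-1} = 8$ paths, the longest path has n = 4 edges, and the congestion equals n.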
Ex 4: Sampling matchings
MC_MATCH: Starting at $M_0$, repeat:
Pick $e = (u,v) \in E$
- If $e \in M$, remove e;
- If u and v unmatched in M, add e;
- If u matched (by e') and v unmatched (or vice versa), add e and remove e';
- Otherwise do nothing.
[Figure: the three move types on an edge e = (u,v)]
Thm: Coupling won't work! (Kumar-Ramesh '99)
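The three move types of MC_MATCH can be sketched as a single update function. This is a minimal illustration, assuming the matching is stored as a set of `frozenset` edges and the graph as a list of vertex pairs; `mc_match_step`, `is_matching`, and the example graph are hypothetical.

```python
import random

def mc_match_step(matching, edges, rng):
    """One step of the chain; `matching` is a set of frozenset edges, updated in place."""
    u, v = rng.choice(edges)
    e = frozenset((u, v))
    if e in matching:
        matching.remove(e)                   # remove e
        return
    at_u = [f for f in matching if u in f]
    at_v = [f for f in matching if v in f]
    if not at_u and not at_v:
        matching.add(e)                      # both endpoints free: add e
    elif bool(at_u) != bool(at_v):
        matching.remove((at_u or at_v)[0])   # exactly one endpoint matched by e'
        matching.add(e)                      # slide: add e, remove e'
    # otherwise both endpoints are matched: do nothing

def is_matching(matching):
    seen = [x for f in matching for x in f]
    return len(seen) == len(set(seen))

# 4-cycle plus a chord, a hypothetical test graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
M = set()
rng = random.Random(0)
for _ in range(2000):
    mc_match_step(M, edges, rng)
```

Each move keeps M a matching, which is the invariant the usage above checks.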
Mixing time of MC_MATCH
[Figure: canonical path from s to t passing through the transition (u,v), with the complementary state u']
The paths using (u,v) are determined by u' ... as before.
Ex 5: Independent Sets
Goal: Given λ, sample ind. set I with prob: $\pi(I) = \lambda^{|I|} / Z$, $Z = \sum_J \lambda^{|J|}$.
MC_IND: Starting at $I_0$, Repeat:
- Pick $v \in V$ and $b \in \{0,1\}$;
- If $v \in I$, b=0, remove v w.p. $\min(1, \lambda^{-1})$;
- If $v \notin I$, b=1, add v w.p. $\min(1, \lambda)$ if possible;
- O.w. do nothing.
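A direct simulation of MC_IND shows the chain settling on the target weights. The sketch below assumes adjacency lists for G and uses a single-edge graph as a hypothetical test case, where the independent sets are ∅, {0}, {1} with weights 1, λ, λ; the function names are illustrative.

```python
import random

def mc_ind_step(ind_set, adj, lam, rng):
    """One step of MC_IND; `ind_set` is a Python set updated in place."""
    v = rng.randrange(len(adj))
    b = rng.randrange(2)
    if v in ind_set and b == 0:
        if rng.random() < min(1.0, 1.0 / lam):
            ind_set.remove(v)
    elif v not in ind_set and b == 1:
        blocked = any(u in ind_set for u in adj[v])   # "if possible"
        if not blocked and rng.random() < min(1.0, lam):
            ind_set.add(v)
    # otherwise do nothing

def empty_set_frequency(adj, lam, steps, seed=0):
    """Fraction of time the chain spends at the empty independent set."""
    rng = random.Random(seed)
    ind_set, hits = set(), 0
    for _ in range(steps):
        mc_ind_step(ind_set, adj, lam, rng)
        hits += not ind_set
    return hits / steps

single_edge = [[1], [0]]
freq = empty_set_frequency(single_edge, 2.0, 100_000)
```

With λ = 2 on a single edge, $\pi(\emptyset) = 1/(1 + 2 + 2) = 0.2$, so the long-run frequency should land close to that value.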
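Slow mixing of MC_IND (large λ)
[Figure: state space arranged by the ratio #R/#B from 0 to ∞, with labels (Even) and (Odd); a cut separates S from $S^C$ at #R/#B = 1]
For large λ there is a "bad cut," ... so MC_IND is slowly mixing.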
Summary: Flows Pros: Offers a combinatorial approach to mixing; especially useful for proving slow mixing. Cons: Requires global knowledge of the chain to spread out paths. Extensions: Balanced flows (Morris-Sinclair’99) MCMC -- Major highlights: - The permanent (Jerrum-Sinclair-Vigoda’02) - Volume of a convex polytope (Dyer-Frieze-Kannan’89, +… )
Techniques: Coupling Flows and paths Indirect methods - Comparison - Decomposition Problems: Walk on the hypercube Colorings Matchings Independent sets Connections with statistical physics: - problems - algorithms - physical insights Outline
Comparison (Diaconis, Saloff-Coste '93)
Unknown chain P; known chain $\bar P$.
For each edge $(x,y) \in \bar P$, make a path $\gamma_{xy}$ using edges in P.
Let $\Gamma(z,w)$ be the set of paths $\gamma_{xy}$ using (z,w).
Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\, \mathrm{Gap}(\bar P)$, where $A = \max_{e=(z,w)} \left\{ \frac{1}{Q(e)} \sum_{\gamma_{xy} \in \Gamma(e)} |\gamma_{xy}|\, \pi(x) \bar P(x,y) \right\}$.
Intuition: $(S, \bar S)$ cannot be a bad cut in P if it isn't in $\bar P$.
Comparison, aka... The Adjacency Matrix Reloaded
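Disjoint decomposition (Madras-R. '96, Martin-R. '00)
$\Omega = A_1 \cup \dots \cup A_6$ (disjoint pieces), with one projected state $a_i$ per piece $A_i$.
Restrictions $P_i$: the chain P restricted to $A_i$ (e.g. $P_3$ on $A_3$).
Projection $\bar P$: $\bar\pi(a_i) = \pi(A_i)$, $\bar P(a_i, a_j) = \frac{1}{\pi(A_i)} \sum_{x \in A_i,\, y \in A_j} \pi(x) P(x,y)$.
Thm: $\mathrm{Gap}(P) \ge \frac{1}{2}\, \mathrm{Gap}(\bar P) \left(\min_i \mathrm{Gap}(P_i)\right)$.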
Ex 6: MC_IND on small ind. sets
For G=(V,E): Let Ω = {ind. sets of G}; $\Omega_k$ = {ind. sets of size k}.
Thm: MC_IND is rapidly mixing on $\bigcup_{k=0}^{K} \Omega_k$, where $K = |V| / (2(\Delta+1))$.
Consider first the "swap" chain:
MC_SWAP: Starting at $I_0$, Repeat:
- Pick $(u,v,b) \in V \times V \times \{0,1,2\}$;
- If b=0 and $u \in I$, remove u w.p. $\min(1, \lambda^{-1})$;
- If b=1 and $u \notin I$, add u w.p. $\min(1, \lambda)$ if possible;
- If b=2, remove u and add v (if possible);
- O.w. do nothing.
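Ind. sets w/bounded size (cont.)
Thm: MC_IND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$, where $K = |V| / (2(\Delta+1))$.
[Figure: $\Omega_0, \Omega_1, \Omega_2, \dots, \Omega_{K-1}, \Omega_K$ (restrictions) over $a_0, a_1, a_2, \dots, a_{K-1}, a_K$ (projection)]
Projection: $|\Omega_k|$ is logconcave, ... so $\bar P$ is rapidly mixing.
Restrictions: MC_SWAP on each $\Omega_k$?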
The Restrictions of MC_SWAP
[Figure: $\Omega_0, \Omega_1, \Omega_2, \dots, \Omega_{K-1}, \Omega_K$ with projection and restrictions]
Thm: MC_SWAP is rapidly mixing on $\Omega_k$, k < K. (Bubley-Dyer '97)
Thm: MC_SWAP is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Decomposition)
Cor: MC_IND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Comparison)
Summary: Indirect methods
Pros: Offer a top down approach; allow hybrid methods to be used.
Cons: Can increase the complexity.
Extensions:
- Comparison thm for log-Sobolev (Diaconis-Saloff-Coste '96)
- Comparison for Glauber dynamics (R.-Tetali '98)
- Decomposition for log-Sobolev (Jerrum-Son-Tetali-Vigoda '02)
Techniques: Coupling Flows and paths Hybrid methods Problems: Walk on the hypercube Colorings Matchings Independent sets Connections with statistical physics: - problems - algorithms - physical insights Outline
Why Statistical Physics?
- They have a need for sampling
- Use many interesting heuristics
- Great intuition
- Experts on "large data sets"
- Microscopic details ⟷ Macroscopic behavior (i.e., phase transitions)
Models from statistical physics
- Potts model (3-colorings)
- Hardcore model (Independent sets)
- Dimer model (Matchings)
- Ising model (Min cut)
Models (cont.)
Independent sets: $\pi(I) = \lambda^{|I|} / Z$
Matchings: $\pi(M) = \lambda^{|M|} / Z$
Ising model: $\pi(\sigma) = \lambda^{|E_=|} / Z$, where $E_= = \{(u,v) \in E : \sigma(u) = \sigma(v)\}$ ($E = E_= \cup E_{\ne}$)
Models: (The physics perspective)
Given: A physical system Ω = {σ}
Define: A Gibbs measure as follows: $\pi(\sigma) = e^{-\beta H(\sigma)} / Z$,
where $H(\sigma)$ is the Hamiltonian, $\beta = 1/kT$ is the inverse temperature, and $Z = \sum_{\tau} e^{-\beta H(\tau)}$ is the normalizing constant or partition function.
Independent sets: $H(\sigma) = -|I|$. If $\lambda = e^{\beta}$ then $\pi(\sigma) = \lambda^{|I|} / Z$.
Ising model: $H(\sigma) = -\sum_{(u,v) \in E} \sigma_u \sigma_v$. If $\lambda = e^{2\beta}$ then $\pi(\sigma) = \lambda^{|E_=|} / Z$.
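The change of variables $\lambda = e^{2\beta}$ for the Ising model can be checked numerically on a small graph. This is a sketch, assuming spins in {−1, +1} and a triangle as a hypothetical test graph; `ising_gibbs` and `ising_lambda_form` are illustrative names. The identity behind it is $-\beta H(\sigma) = \beta(2|E_=| - |E|)$, so the two weights differ only by the constant factor $e^{-\beta |E|}$, which cancels in the normalization.

```python
import math
from itertools import product

def ising_gibbs(n, edges, beta):
    """pi(sigma) = exp(-beta * H(sigma)) / Z with H = -sum over edges of s_u * s_v."""
    configs = list(product((-1, 1), repeat=n))
    weights = [math.exp(beta * sum(s[u] * s[v] for u, v in edges)) for s in configs]
    Z = sum(weights)
    return {s: w / Z for s, w in zip(configs, weights)}

def ising_lambda_form(n, edges, lam):
    """pi(sigma) = lam^{|E_=|} / Z, where E_= is the set of edges with equal spins."""
    configs = list(product((-1, 1), repeat=n))
    weights = [lam ** sum(s[u] == s[v] for u, v in edges) for s in configs]
    Z = sum(weights)
    return {s: w / Z for s, w in zip(configs, weights)}

beta = 0.3
triangle = [(0, 1), (1, 2), (0, 2)]
gibbs = ising_gibbs(3, triangle, beta)
lam_form = ising_lambda_form(3, triangle, math.exp(2 * beta))
```

Both dictionaries assign the same probability to every configuration, up to floating-point error.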
Physics perspective (cont.)
Q: What about on the infinite lattice?
Use conditional probabilities: [figure]
But there can be boundary effects!!!
Phase transitions: Ind. sets
On finite regions …
- Low temperature: long range effects
- High temperature: boundary (∂) effects die out
[Figure: temperature axis from $T_0$ through $T_c$ to $T_\infty$]
$T_c$ indicates a "phase transition."
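Slow mixing of MC_IND revisited
[Figure: state space arranged by the ratio #R/#B from 0 to ∞, with the cut between S and $S^C$ at #R/#B = 1]
$\pi(S_i) = \sum_{s \in S_i} \pi(s) = \sum_{s \in S_i} e^{-\beta H(s)} / Z$
(the sum over $s \in S_i$ is the "entropy term"; $e^{-\beta H(s)}$ is the "energy term")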
Group by # of "fault lines"
Fault lines are vacant paths of width 2 from top to bottom (or left to right).
[Figure: the cut between S and $S^C$, with pieces $S_R$, $S_B$, $S_1$, $S_2$, $S_3$, ...]
"Peierls Argument"
1. Identify horizontal or vertical fault line. ($\sigma \in S_1$)
2. Shift right of fault by 1 and flip colors.
3. Remove rt column; add points along fault line, if possible. ($\sigma' \in S_B$)
For fixed path length $\ell$: $S_1 \hookrightarrow S_B \times 2^{n/2} \times 3^{\ell}$.
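Peierls Argument cont.
The map $S_1 \to S_B$ is at most $2^{n/2}\, 3^{\ell}$ to one, and the image has $\ge \ell - n/2$ more points.
$\pi(S_1) = \sum_{\sigma \in S_1} \pi(\sigma) \le \sum_{\ell} \sum_{\sigma' \in S_B} \pi(\sigma')\, 2^{n/2}\, 3^{\ell}\, \lambda^{(n/2 - \ell)}$
$\le \pi(S_B)\, 2^{n/2}\, 3^{n}\, \lambda^{n/2}\, \mathrm{poly}(n) / \lambda^{n}$
$\le \pi(S_B)\, (18/\lambda)^{n/2}\, \mathrm{poly}(n)$, if $\lambda > 18$
(and similarly for $S_2, S_3, \dots$)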
Conclusions
Techniques:
- Coupling: can be easy when it works
- Flows: requires global knowledge of chain; very useful for slow mixing
- Indirect methods: top down approach; often increases complexity
- Connection to physics: can offer tremendous insights
Open problems:
- Sampling 4,5,6-colorings on the grid.
- Sampling perfect matchings on non-bipartite graphs.
- Sampling acyclic orientations in a graph.
- Sampling configurations of the Potts model (a generalization of Ising, but with more colors).
- How can we further exploit phase transitions? Other physical intuition?