1
Mixing: A Tutorial on Markov Chains
Dana Randall, Georgia Tech
2
Outline
- Fundamentals for designing a Markov chain
- Bounding running times (convergence rates)
- Connections to statistical physics
3
Markov chains for sampling
Given: A large set Ω (matchings, colorings, independent sets, …).
Main Q: What do typical elements look like?
Random sampling ("Markov chain Monte Carlo") can be used to:
- Determine properties of "typical" elements
- Evaluate thermodynamic properties (such as free energy, entropy, …)
- Estimate the cardinality of the set
4
Markov chains (named for Andrei Andreyevich Markov)
5
Sampling using Markov chains
State space Ω (|Ω| ~ c^n)
6
Sampling using Markov chains
State space Ω (|Ω| ~ c^n).
Step 1. Connect the state space.
E.g., if Ω = independent sets of a graph G, connect I and I' iff |I ⊕ I'| = 1 (they differ in exactly one vertex).
7
Basics of Markov chains
Transitions P: random walk on H. Starting at x:
- Pick a neighbor y.
- Move to y with prob. P(x,y) = 1/∆ (∆ = max degree in H).
- With all remaining prob. stay at x.
Def'n: A MC is ergodic if it is:
- irreducible: for all x, y ∈ Ω, ∃ t: P^t(x,y) > 0 (connected);
- aperiodic: g.c.d. { t : P^t(x,y) > 0 } = 1 (not bipartite).
(P^t is the "t-step" transition probability.)
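To make the transition rule above concrete, here is a minimal Python sketch (not from the talk) of one step of this walk on a small toy graph H; the function name walk_step and the example graph are invented for illustration.

import random

def walk_step(x, neighbors, max_deg):
    # Each neighbor y of x is reached with probability 1/max_deg;
    # with the remaining probability 1 - deg(x)/max_deg we stay at x.
    i = random.randrange(max_deg)
    nbrs = neighbors[x]
    return nbrs[i] if i < len(nbrs) else x

# hypothetical graph H on 4 states
H = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
max_deg = max(len(v) for v in H.values())
x = 0
for _ in range(1000):
    x = walk_step(x, H, max_deg)
print(x)  # after many steps, x is approximately uniform over the 4 states

Because P(x,y) = 1/∆ is symmetric, the uniform distribution is stationary, which is the point of the detailed balance discussion that follows.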
8
The stationary distribution π
Thm: Any finite, ergodic MC converges to a unique stationary distribution π.
Thm: The stationary distribution π satisfies the detailed balance condition:
π(x) P(x,y) = π(y) P(y,x).
Here P(x,y) = P(y,x) = 1/∆, so P is symmetric and π is uniform.
9
Sampling from non-uniform distributions
Q: What if we want to sample from some other distribution?
E.g., for λ > 0, sample an independent set I with prob. π(I) = λ^{|I|} / Z, where Z = ∑_J λ^{|J|}.
Step 2. Carefully define the transition probabilities.
10
The Metropolis Algorithm
(MRRTT '53) Propose a move from x to y as before, but accept with probability min(1, π(y)/π(x)); with the remaining probability stay at x.
Detailed balance check (if π(x) ≥ π(y)):
π(x) P(x,y) = π(x) · (1/∆) · π(y)/π(x) = π(y)/∆ = π(y) P(y,x).
For independent sets: add v to I with prob. min(1, λ) and remove v with prob. min(1, λ^{-1}), since
π(I ∪ {v}) / π(I) = λ^{|I|+1} / λ^{|I|} = λ.
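As a concrete illustration of the Metropolis rule for λ-weighted independent sets, here is a minimal Python sketch (not from the talk); the proposal simply toggles a uniformly random vertex, and the graph and the function name are invented for illustration.

import random

def metropolis_step(I, G, lam):
    # I is a set of vertices, G maps each vertex to its set of neighbors.
    v = random.choice(list(G))                 # propose toggling a uniform vertex
    if v in I:
        # removing v changes the weight by a factor 1/lam
        if random.random() < min(1.0, 1.0 / lam):
            I = I - {v}
    elif all(u not in I for u in G[v]):        # adding v keeps the set independent
        # adding v changes the weight by a factor lam
        if random.random() < min(1.0, lam):
            I = I | {v}
    return I

# hypothetical 4-cycle
G = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
I = set()
for _ in range(10000):
    I = metropolis_step(I, G, lam=2.0)
print(I)   # a sample whose distribution is (approximately) proportional to lam**|I|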
11
Basics continued…
Step 1. Connect the state space.
Step 2. Carefully define the transition probabilities.
Starting at any state x_0, take a random walk for some number of steps and output the final state (distributed approximately according to π?).
Q: But for how long do we walk?
Step 3. Bound the mixing time. This tells us the number of steps to take.
12
The mixing rate
Def'n: The total variation distance is
||P^t, π|| = max_{x∈Ω} (1/2) ∑_{y∈Ω} |P^t(x,y) - π(y)|.
Def'n: Given ε, the mixing time is
τ(ε) = min { t : ||P^{t'}, π|| < ε for all t' ≥ t }.
A Markov chain is rapidly mixing if τ(ε) is poly(n, log(ε^{-1})).
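For a chain small enough to write down, the total variation distance can be computed exactly by powering the transition matrix, which makes the decay that τ(ε) measures visible; a Python/numpy sketch with an invented 3-state lazy chain (illustrative only):

import numpy as np

def tv_distance(P_t, pi):
    # max over starting states x of (1/2) * sum_y |P^t(x,y) - pi(y)|
    return 0.5 * np.max(np.abs(P_t - pi).sum(axis=1))

P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
pi = np.ones(3) / 3          # uniform, by symmetry of P
P_t = np.eye(3)
for t in range(1, 15):
    P_t = P_t @ P
    print(t, tv_distance(P_t, pi))   # the distance decays geometrically in t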
13
Spectral gap
Let 1 = λ_1 > |λ_2| ≥ … ≥ |λ_|Ω|| be the eigenvalues of P.
Def'n: Gap(P) = 1 - |λ_2| is the spectral gap.
Thm (Alon, Alon-Milman, Sinclair): with π* = min_x π(x),
τ(ε) ≤ (1/Gap(P)) log(1/(π* ε)),
τ(ε) ≥ (|λ_2|/(2 Gap(P))) log(1/(2ε)).
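On the same kind of toy example, Gap(P) can be computed numerically and compared with the observed decay of the total variation distance; a short numpy sketch (illustrative only, invented chain):

import numpy as np

def spectral_gap(P):
    # Gap(P) = 1 - |lambda_2|, with eigenvalues sorted by modulus
    eigs = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - eigs[1]

P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
print(spectral_gap(P))   # 0.75 for this toy chain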
14
Outline
- Fundamentals for designing a Markov chain
- Bounding running times (convergence rates)
- Connections to statistical physics
15
Outline for rest of talk
Techniques: coupling; flows and paths; indirect methods.
Problems: walk on the hypercube; colorings; matchings; independent sets.
Connections with statistical physics: problems; algorithms; physical insights.
16
Coupling
Simulate 2 processes, starting at any x_0 and y_0.
Couple the moves, but each process on its own simulates the MC.
Once they agree, they move in sync (x_t = y_t ⇒ x_{t+1} = y_{t+1}).
17
Coupling
Def'n: A coupling is a MC on Ω × Ω such that:
- Each process {X_t}, {Y_t} is a faithful copy of the original MC;
- If X_t = Y_t, then X_{t+1} = Y_{t+1}.
The coupling time T is
T = max_{x,y} E[T_{x,y}], where T_{x,y} = min { t : X_t = Y_t | X_0 = x, Y_0 = y }.
Thm (Aldous '81): τ(ε) ≤ T e ln ε^{-1}.
18
Ex 1: Walk on the hypercube
MCCUBE: Start at v_0 = (0,0,…,0). Repeat:
- Pick i ∈ [n], b ∈ {0,1};
- Set v_i = b.
Symmetric and ergodic, so π is uniform. Mixing time?
Use coupling: run both copies with the same (i, b) at each step, so coordinate i agrees forever once it has been picked.
Coupon collecting gives T = O(n log n), so τ(ε) = O(n ln(n ε^{-1})).
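The coupling argument can also be simulated directly: run two copies of MCCUBE from opposite corners, using the same random (i, b) in both copies at every step, and record when they meet; a Python sketch (illustrative, names invented):

import random

def coupled_cube_step(x, y, n):
    # both copies use the same (i, b), so coordinate i agrees from now on
    i = random.randrange(n)
    b = random.randrange(2)
    x, y = list(x), list(y)
    x[i] = b
    y[i] = b
    return tuple(x), tuple(y)

n = 10
x = tuple([0] * n)     # start the two copies at opposite corners
y = tuple([1] * n)
t = 0
while x != y:
    x, y = coupled_cube_step(x, y, n)
    t += 1
print(t)   # typically around n * ln n, as coupon collecting predicts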
19
Outline
Techniques: coupling (path coupling); flows and paths; indirect methods.
Problems: walk on the hypercube; colorings; matchings; independent sets.
Connections with statistical physics: problems; algorithms; physical insights.
20
Ex 2: Colorings
Given: A graph G (max degree d), k > 1.
Goal: Find a random k-coloring of G.
MCCOL (single-point replacement), the "lazy" chain: Starting at some k-coloring C_0, repeat:
- With prob. 1/2 do nothing.
- Pick v ∈ V, c ∈ [k];
- Recolor v with c, if possible.
Note: k ≥ d+1 → colorings exist (greedy). If k ≥ d+2, then the state space is connected (and therefore π is uniform).
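One step of this lazy single-site chain, as a Python sketch on a hypothetical 3-vertex path with k = 4 colors (function and variable names are invented for illustration):

import random

def mccol_step(coloring, G, k):
    # lazy single-point recoloring
    if random.random() < 0.5:                  # with prob 1/2 do nothing
        return coloring
    v = random.choice(list(G))
    c = random.randrange(k)
    if all(coloring[u] != c for u in G[v]):    # recolor v with c, if possible
        coloring = dict(coloring)
        coloring[v] = c
    return coloring

# hypothetical path on 3 vertices; k = 4 >= max degree + 2, so the chain is connected
G = {0: {1}, 1: {0, 2}, 2: {1}}
col = {0: 0, 1: 1, 2: 0}                       # any proper coloring to start
for _ in range(10000):
    col = mccol_step(col, G, k=4)
print(col)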
21
Path Coupling [Bubley-Dyer, Greenhill '97-8]
Coupling: Show for all x, y ∈ Ω, E[∆ dist(x,y)] ≤ 0.
Path coupling: Show for all u, v with dist(u,v) = 1 that E[∆ dist(u,v)] ≤ 0.
Consider a shortest path x = z_0, z_1, …, z_r = y with dist(z_i, z_{i+1}) = 1 and dist(x,y) = r.
Then E[∆ dist(x,y)] ≤ ∑_i E[∆ dist(z_i, z_{i+1})] ≤ 0.
22
Path coupling for MCCOL
Thm: MCCOL is rapidly mixing if k ≥ 3d. (Jerrum '95)
Pf: Use path coupling with dist(x,y) = 1, say x and y differ only at vertex w. For the chosen (v, c):
- v = w and c is not a color of any neighbor of w: ∆dist = -1 (at least k - d such colors);
- v ∈ N(w) and c is one of the two colors of w: ∆dist = +1 (or 0) (at most 2d such pairs);
- otherwise: ∆dist = 0.
E[∆dist] ≤ (1/(2nk)) ( (k-d)(-1) + 2d(+1) ) = (3d-k)/(2nk) ≤ 0.
23
Summary: Coupling
Pros: Can yield very easy proofs.
Cons: Demands a lot from the chain.
Extensions:
- Careful coupling (k ≥ 2d) (Jerrum '95)
- Change the MC (Luby-R.-Sinclair '95)
- "Macromoves": burn-in (Dyer-Frieze '01, Molloy '02); non-Markovian couplings (Hayes-Vigoda '03)
24
Outline
Techniques: coupling; flows and paths; indirect methods.
Problems: walk on the hypercube; colorings; matchings; independent sets.
Connections with statistical physics: problems; algorithms; physical insights.
25
Conductance and flows (Jerrum-Sinclair '88)
F(S) = ( ∑_{s∈S, s'∈S^c} π(s) P(s,s') ) / ( ∑_{s∈S} π(s) ),
F = min_{S⊆Ω, π(S)≤1/2} F(S).
Thm: F^2/2 ≤ Gap(P) ≤ 2F.
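For a tiny chain, the conductance F can be computed by brute force over all cuts S with π(S) ≤ 1/2, which also lets one check the theorem numerically; a Python sketch (illustrative only, with an invented 3-state chain):

from itertools import combinations
import numpy as np

def conductance(P, pi):
    # brute-force F = min over S with pi(S) <= 1/2 of F(S)
    n = len(pi)
    best = float("inf")
    for r in range(1, n):
        for S in combinations(range(n), r):
            S = list(S)
            if pi[S].sum() > 0.5:
                continue
            Sc = [i for i in range(n) if i not in S]
            flow = sum(pi[s] * P[s, t] for s in S for t in Sc)
            best = min(best, flow / pi[S].sum())
    return best

P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
pi = np.ones(3) / 3
F = conductance(P, pi)
print(F**2 / 2, 2 * F)   # Gap(P) = 0.75 indeed lies between these two bounds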
26
Min cut, Max flow (Sinclair '92)
Γ: Make |Ω|^2 canonical paths { γ_xy : from x ∈ Ω to y ∈ Ω, x ≠ y, carrying π(x)π(y) units of flow }.
Capacity of e = (u,v): Q(e) = π(u) P(u,v) = π(v) P(v,u).
The congestion of these paths is
ρ(Γ) = max_e (1/Q(e)) ∑_{γ_xy ∋ e} π(x) π(y),
and ρ̄ = min_Γ ρ(Γ) ℓ(Γ), where ℓ(Γ) is the max path length.
Thm: τ(ε) ≤ ρ̄ log(ε π(x))^{-1}.
27
Ex 3: Back to the hypercube
- Define a canonical path from s to t: flip the coordinates where s and t differ, one at a time, in a fixed order.
- Bound the number of paths through an edge (u,v) ∈ E: the "complementary pair" (u',v') determines (s,t), so |{γ_xy ∋ e}| = 2^{n-1}.
ρ(Γ) = max_e (1/Q(e)) ∑_{γ_xy ∋ e} π(x) π(y) = (2^{n-1} · 2^{-2n}) / (2^{-n} · (1/2n)) = n,
and ℓ(Γ) = n, so τ(ε) = Õ(n^2).
28
Outline
Techniques: coupling; flows and paths; indirect methods.
Problems: walk on the hypercube; colorings; matchings; independent sets.
Connections with statistical physics: problems; algorithms; physical insights.
29
Ex 4: Sampling matchings
MCMATCH: Starting at M_0, repeat: pick e = (u,v) ∈ E.
- If e ∈ M, remove e;
- If u and v are unmatched in M, add e;
- If u is matched (by e') and v is unmatched (or vice versa), add e and remove e';
- Otherwise do nothing.
Thm: Coupling won't work! (Kumar-Ramesh '99)
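A Python sketch of one MCMATCH update, assuming a matching is stored as a set of frozenset edges; the 4-cycle and the function name are invented for illustration:

import random

def mcmatch_step(M, edges):
    u, v = random.choice(edges)
    e = frozenset((u, v))
    matched = {x for f in M for x in f}
    if e in M:                                   # e is in M: remove it
        return M - {e}
    if u not in matched and v not in matched:    # both endpoints free: add e
        return M | {e}
    if (u in matched) != (v in matched):         # exactly one endpoint matched: slide
        w = u if u in matched else v
        e_prime = next(f for f in M if w in f)
        return (M - {e_prime}) | {e}
    return M                                     # otherwise do nothing

# hypothetical 4-cycle
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
M = set()
for _ in range(10000):
    M = mcmatch_step(M, edges)
print(M)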
30
Mixing time of MCMATCH
Canonical paths: to route from matching s to matching t, process the components of s ⊕ t (alternating paths and cycles) one at a time.
The paths using a transition (u,v) are determined by the complementary encoding u', . . . as before.
31
Outline
Techniques: coupling; flows and paths; indirect methods.
Problems: walk on the hypercube; colorings; matchings; independent sets.
Connections with statistical physics: problems; algorithms; physical insights.
32
Ex 5: Independent Sets
Goal: Given λ, sample an independent set I with prob. π(I) = λ^{|I|} / Z, where Z = ∑_J λ^{|J|}.
MCIND: Starting at I_0, repeat:
- Pick v ∈ V and b ∈ {0,1};
- If v ∈ I and b = 0, remove v w.p. min(1, λ^{-1});
- If v ∉ I and b = 1, add v w.p. min(1, λ), if possible;
- Otherwise do nothing.
33
Slow mixing of MCIND (large λ)
For λ large, the weight concentrates on the predominantly "even" and predominantly "odd" configurations (weight roughly λ^{n^2/2} each), while the configurations balanced between the two carry far less weight (roughly λ^{n^2/2 - n/2}).
λ large → there is a "bad cut" (S, S^c), . . . so MCIND is slowly mixing.
34
Summary: Flows
Pros: Offers a combinatorial approach to mixing; especially useful for proving slow mixing.
Cons: Requires global knowledge of the chain to spread out paths.
Extensions: Balanced flows (Morris-Sinclair '99).
MCMC major highlights:
- The permanent (Jerrum-Sinclair-Vigoda '02)
- Volume of a convex polytope (Dyer-Frieze-Kannan '89, + …)
35
Outline
Techniques: coupling; flows and paths; indirect methods (comparison, decomposition).
Problems: walk on the hypercube; colorings; matchings; independent sets.
Connections with statistical physics: problems; algorithms; physical insights.
36
Comparison (Diaconis, Saloff-Coste '93)
Two chains on Ω: P (unknown) and P̃ (known).
For each edge (x,y) ∈ P̃, make a path γ_xy using edges in P. Let Γ(z,w) be the set of paths γ_xy using (z,w).
A = max_{e=(z,w)} { (1/Q(e)) ∑_{γ_xy ∋ e} |γ_xy| π(x) P̃(x,y) }.
Thm: Gap(P) ≥ (1/A) Gap(P̃).
37
Comparison (cont.)
For each edge (x,y) ∈ P̃ (known), a path γ_xy using edges of P (unknown); Γ(z,w) is the set of paths γ_xy using (z,w).
Thm: Gap(P) ≥ (1/A) Gap(P̃).
Intuition: (S, S^c) cannot be a bad cut in P if it isn't a bad cut in P̃.
38
Comparison, aka . . . "The (Adjacency) Matrix Reloaded"
39
Disjoint decomposition
(Madras-R. '96, Martin-R. '00)
Partition Ω into disjoint pieces A_1, …, A_m.
Restrictions P_i: the chain P restricted to the piece A_i.
Projection P̄: a chain on the pieces {a_i} with
π̄(a_i) = π(A_i),   P̄(a_i, a_j) = (1/π(A_i)) ∑_{x∈A_i, y∈A_j} π(x) P(x,y).
Thm: Gap(P) ≥ (1/2) Gap(P̄) · (min_i Gap(P_i)).
40
Ex 6: MCIND on small ind. sets
For G = (V,E): Let Ω = {ind. sets of G}; Ω_k = {ind. sets of size k}.
Thm: MCIND is rapidly mixing on ∪_{k=0}^{K} Ω_k, where K = |V| / (2(∆+1)).
Consider first the "swap" chain MCSWAP: Starting at I_0, repeat:
- Pick (u,v,b) ∈ V × V × {0,1,2};
- If b = 0 and u ∈ I, remove u w.p. min(1, λ^{-1});
- If b = 1 and u ∉ I, add u w.p. min(1, λ), if possible;
- If b = 2, remove u and add v (if possible);
- Otherwise do nothing.
41
Ind. sets w/bounded size (cont.)
Thm: MCIND is rapidly mixing on ∪_{k=0}^{K} Ω_k, where K = |V| / (2(∆+1)).
Decompose MCSWAP on Ω_0 ∪ Ω_1 ∪ … ∪ Ω_K: the restrictions are the chains on each Ω_k; the projection is a chain on {a_0, a_1, …, a_K}.
|Ω_k| is log-concave in k, . . . so the projection chain P̄ is rapidly mixing.
42
The restrictions of MCSWAP
Thm: MCSWAP restricted to Ω_k is rapidly mixing for each k < K. (Bubley-Dyer '97)
Thm: MCSWAP is rapidly mixing on ∪_{k=0}^{K} Ω_k. (Decomposition)
Cor: MCIND is rapidly mixing on ∪_{k=0}^{K} Ω_k. (Comparison)
43
Summary: Indirect methods
Pros: Offer a top-down approach; allow hybrid methods to be used.
Cons: Can increase the complexity.
Extensions:
- Comparison theorem for log-Sobolev (Diaconis-Saloff-Coste '96)
- Comparison for Glauber dynamics (R.-Tetali '98)
- Decomposition for log-Sobolev (Jerrum-Son-Tetali-Vigoda '02)
44
Outline
Techniques: coupling; flows and paths; hybrid methods.
Problems: walk on the hypercube; colorings; matchings; independent sets.
Connections with statistical physics: problems; algorithms; physical insights.
45
Why Statistical Physics?
- They have a need for sampling
- Use many interesting heuristics
- Great intuition
- Experts on "large data sets"
Microscopic details → macroscopic behavior (i.e., phase transitions)
46
Models from statistical physics
- Potts model (3-colorings)
- Hardcore model (independent sets)
- Ising model (min cut)
- Dimer model (matchings)
47
Models (cont.)
Independent sets: π(I) = λ^{|I|} / Z.
Matchings: π(M) = μ^{|M|} / Z.
Ising model: π(σ) = ν^{|E_=|} / Z, where E_= = {(u,v) ∈ E : σ(u) = σ(v)} (and E = E_= ∪ E_≠).
48
Models: (The physics perspective)
Given: A physical system Ω = {σ}.
Define a Gibbs measure as follows: with H(σ) the Hamiltonian and β = 1/kT the inverse temperature,
π(σ) = e^{-βH(σ)} / Z, where Z = ∑_τ e^{-βH(τ)} is the normalizing constant or partition function.
Independent sets: H(σ) = -|I|. If λ = e^β then π(σ) = λ^{|I|} / Z.
Ising model: H(σ) = -∑_{(u,v)∈E} σ_u σ_v. If ν = e^{2β} then π(σ) = ν^{|E_=|} / Z.
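On a graph small enough to enumerate, the partition function for independent sets can be computed by brute force, which makes the λ = e^β correspondence concrete; a Python sketch (hypothetical 4-cycle, illustrative only):

from itertools import combinations
import math

def partition_function(G, beta):
    # Z = sum over independent sets I of exp(-beta * H(I)) with H(I) = -|I|,
    # i.e. Z = sum_I lam**|I| with lam = exp(beta)
    lam = math.exp(beta)
    V = list(G)
    Z = 0.0
    for r in range(len(V) + 1):
        for I in combinations(V, r):
            if all(u not in G[v] for u, v in combinations(I, 2)):   # independence check
                Z += lam ** r
    return Z

# hypothetical 4-cycle: its independent sets are {}, four singletons, two opposite pairs
G = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(partition_function(G, beta=0.0))   # lam = 1 just counts independent sets: 7.0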
49
Physics perspective (cont.)
Q: What about on the infinite lattice? Use conditional probabilities on finite regions. But there can be boundary effects!
50
Phase transitions: Ind. sets
On finite regions …
- Low temperature (near T = 0): long-range effects.
- High temperature (T → ∞): boundary effects die out.
T_c indicates a "phase transition."
51
Slow mixing of MCIND revisited
π(S_i) = ∑_{s∈S_i} π(s) = ∑_{s∈S_i} e^{-βH(s)} / Z.
The number of configurations in S_i contributes the "entropy term," and their individual weights e^{-βH(s)} contribute the "energy term."
52
Group by # of “fault lines”
Fault lines are vacant paths of width 2 from top to bottom (or left to right).
Group the states of the cut into classes S_R, S_B, S_1, S_2, S_3, . . . accordingly.
53
"Peierls Argument"
1. Identify a horizontal or vertical fault line (the configuration is in S_1).
2. Shift everything to the right of the fault by 1 and flip colors.
3. Remove the rightmost column; add points along the fault line, if possible (the image is in S_B).
For a fixed fault-line length l, |S_1| ≤ |S_B| × 2^{n/2} × 3^l.
54
Peierls Argument cont.
π(S_1) = ∑_{t∈S_1} π(t) ≤ ∑_l ∑_{s∈S_B} π(s) · 2^{n/2} · 3^l · λ^{n/2 - l}   (the image has ≥ l - n/2 more points)
≤ π(S_B) · 2^{n/2} · 3^n · λ^{n/2} · poly(n) / λ^n
≤ π(S_B) · (18/λ)^{n/2} · poly(n), if λ > 18.
(And similarly for S_2, S_3, …)
55
Conclusions
Techniques:
- Coupling: can be easy when it works.
- Flows: requires global knowledge of the chain; very useful for slow mixing.
- Indirect methods: top-down approach; often increases complexity.
- Connection to physics: can offer tremendous insights.
Open problems: . . .
56
Conclusions (cont.)
Open problems:
- Sampling 4-, 5-, 6-colorings on the grid.
- Sampling perfect matchings on non-bipartite graphs.
- Sampling acyclic orientations of a graph.
- Sampling configurations of the Potts model (a generalization of Ising, but with more colors).
- How can we further exploit phase transitions? Other physical intuition?