Mixing: A tutorial on Markov chains. Dana Randall, Georgia Tech (Slides at: www.math.gatech.edu/~randall)


3 Outline
- Fundamentals for designing a Markov chain
- Bounding running times (convergence rates)
- Connections to statistical physics

4 Markov chains for sampling
Given: A large set (matchings, colorings, independent sets, ...).
Main Q: What do typical elements look like?
Random sampling can be used to:
- Determine properties of “typical” elements
- Evaluate thermodynamic properties (such as free energy, entropy, ...)
- Estimate the cardinality of the set
(“Markov chain Monte Carlo”)

5 Markov chains: Andrei Andreyevich Markov, 1856-1922

6 Sampling using Markov chains: state space Ω ($|\Omega| \sim c^n$)

7 Sampling using Markov chains: state space Ω ($|\Omega| \sim c^n$)
Step 1. Connect the state space.
E.g., if Ω = indep. sets of a graph G, connect I and I′ iff $|I \,\triangle\, I'| = 1$.

8 Basics of Markov chains
Transitions P: random walk on H (∆ = max degree in H). Starting at x:
- Pick a neighbor y.
- Move to y with prob. $P(x,y) = 1/\Delta$.
- With all remaining prob. stay at x.
Def’n: A MC is ergodic if it is:
- irreducible (connected): for all $x, y \in \Omega$ there is a t with $P^t(x,y) > 0$;
- aperiodic (not bipartite): $\gcd\{t : P^t(x,y) > 0\} = 1$.
(Here $P^t(x,y)$ is the “t step” transition prob.)
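As an illustrative sketch (not from the talk), the transition rule above can be written out as a matrix. The input format (a dict of neighbour lists) and the helper name `random_walk_matrix` are assumptions; exact `Fraction`s keep every row sum exactly 1.

```python
from fractions import Fraction


def random_walk_matrix(adj):
    """Transition matrix of the walk on H: move to each neighbour y of x
    with probability 1/Delta (Delta = max degree of H), else stay at x.

    adj: dict mapping each state to a list of its neighbours in H
    (assumed to contain no self-loops).
    """
    states = sorted(adj)
    index = {s: i for i, s in enumerate(states)}
    delta = max(len(nbrs) for nbrs in adj.values())
    P = [[Fraction(0)] * len(states) for _ in states]
    for x in states:
        for y in adj[x]:
            P[index[x]][index[y]] = Fraction(1, delta)
        # All remaining probability: stay at x.
        P[index[x]][index[x]] = 1 - Fraction(len(adj[x]), delta)
    return states, P


# Example: the path a - b - c, so Delta = 2.
H = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
states, P = random_walk_matrix(H)
```

Since every edge gets $1/\Delta$ in both directions, P is symmetric; the self-loop at "a" also makes this particular walk aperiodic.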

9 The stationary distribution
Thm: Any finite, ergodic MC converges to a unique stationary distribution π.
Thm: If π satisfies the detailed balance condition $\pi(x) P(x,y) = \pi(y) P(y,x)$ for all x, y, then π is the stationary distribution.
So, P symmetric ⇒ π is uniform.
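A small numerical check, assuming numpy is available: power iteration approximates π for a symmetric, ergodic P, and the detailed balance condition can then be tested directly. The helper names `stationary_by_power_iteration` and `satisfies_detailed_balance` are hypothetical.

```python
import numpy as np


def stationary_by_power_iteration(P, steps=10_000, tol=1e-12):
    """Approximate pi by repeatedly applying P to a starting distribution."""
    P = np.asarray(P, dtype=float)
    pi = np.zeros(P.shape[0])
    pi[0] = 1.0                       # start deterministically at state 0
    for _ in range(steps):
        nxt = pi @ P                  # pi_{t+1}(y) = sum_x pi_t(x) P(x, y)
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi


def satisfies_detailed_balance(P, pi, tol=1e-9):
    """Check pi(x) P(x, y) == pi(y) P(y, x) for all pairs x, y."""
    flow = pi[:, None] * np.asarray(P, dtype=float)   # flow[x, y] = pi(x) P(x, y)
    return np.allclose(flow, flow.T, atol=tol)


# Symmetric, ergodic example: the walk on the path a - b - c with Delta = 2.
P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
pi = stationary_by_power_iteration(P)
```

Because this P is symmetric, the limit is the uniform vector $(1/3, 1/3, 1/3)$, as the theorem predicts.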

10 Sampling from non-uniform distributions
Q: What if we want to sample from some other distribution?
E.g., for λ > 0, sample an ind. set I w/ prob $\pi(I) = \lambda^{|I|}/Z$, where $Z = \sum_J \lambda^{|J|}$.
Step 2. Carefully define the transition probabilities.

11 The Metropolis Algorithm (MRRTT ’53)
Propose a move from x to y as before, but accept with probability $\min(1, \pi(y)/\pi(x))$ (with remaining probability stay at x).
If $\pi(x) \ge \pi(y)$, then $P(x,y) = \pi(y)/(\Delta\,\pi(x))$ and $P(y,x) = 1/\Delta$, so $\pi(x) P(x,y) = \pi(y) P(y,x)$.
For independent sets, the move from I to $I \cup \{v\}$ is accepted w.p. $\min\left(1, \frac{\lambda^{|I|+1}/Z}{\lambda^{|I|}/Z}\right) = \min(1, \lambda)$, and the reverse move w.p. $\min(1, \lambda^{-1})$.
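A sketch of the Metropolis rule as an exact transition matrix, under the assumption that the connected state space of Step 1 is given as a dict of neighbour lists and the target as unnormalized weights (only ratios $\pi(y)/\pi(x)$ are used, so Z is never needed). `metropolis_matrix` is a hypothetical name, and the toy target below is ours.

```python
from fractions import Fraction


def metropolis_matrix(adj, weight):
    """Metropolis chain for a target pi proportional to weight.

    adj: dict state -> list of neighbours (the connected state space).
    weight: dict state -> positive unnormalized weight.
    """
    states = sorted(adj)
    delta = max(len(v) for v in adj.values())
    P = {x: {y: Fraction(0) for y in states} for x in states}
    for x in states:
        for y in adj[x]:
            # propose y w.p. 1/Delta, accept w.p. min(1, pi(y)/pi(x))
            accept = min(Fraction(1), Fraction(weight[y]) / Fraction(weight[x]))
            P[x][y] = Fraction(1, delta) * accept
        # all remaining probability: stay at x
        P[x][x] = 1 - sum(P[x][y] for y in states if y != x)
    return P


# Toy target on a path of 4 states: pi proportional to 1, 2, 4, 8.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weight = {0: 1, 1: 2, 2: 4, 3: 8}
P = metropolis_matrix(adj, weight)
Z = sum(weight.values())
pi = {x: Fraction(w, Z) for x, w in weight.items()}
```

Exact arithmetic lets detailed balance be checked with `==` rather than a tolerance.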

12 Basics continued…
Step 1. Connect the state space.
Step 2. Carefully define the transition probabilities.
Step 3. Bound the mixing time.
Starting at any state $x_0$, take a random walk for some number of steps... and output the final state (drawn from ≈ π?).
Q: But for how long do we walk? The mixing time tells us the number of steps to take.

13 The mixing rate
Def’n: The total variation distance is $\|P^t, \pi\| = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |P^t(x,y) - \pi(y)|$.
Def’n: Given ε, the mixing time is $\tau(\varepsilon) = \min\{t : \|P^{t'}, \pi\| < \varepsilon \ \text{for all}\ t' \ge t\}$.
A Markov chain is rapidly mixing if $\tau(\varepsilon)$ is $\mathrm{poly}(n, \log \varepsilon^{-1})$.
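The two definitions translate directly into code for a small chain given as a matrix (numpy assumed; `tv_distance` and `mixing_time` are hypothetical names). For a finite chain $\|P^t, \pi\|$ is non-increasing in t, so the first t below ε already satisfies the "for all t′ ≥ t" clause.

```python
import numpy as np


def tv_distance(P, pi, t):
    """||P^t, pi|| = max over starting states x of (1/2) sum_y |P^t(x,y) - pi(y)|."""
    Pt = np.linalg.matrix_power(np.asarray(P, dtype=float), t)
    return 0.5 * np.abs(Pt - np.asarray(pi)[None, :]).sum(axis=1).max()


def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t with ||P^t', pi|| < eps for all t' >= t (uses monotonicity)."""
    for t in range(t_max + 1):
        if tv_distance(P, pi, t) < eps:
            return t
    raise ValueError("not mixed within t_max steps")


# Two-state example: switch state w.p. 1/4; pi = (1/2, 1/2).
P = [[0.75, 0.25],
     [0.25, 0.75]]
pi = [0.5, 0.5]
```

For this chain the second eigenvalue is 1/2 and $\|P^t, \pi\| = 2^{-(t+1)}$, so `mixing_time(P, pi, 0.01)` returns 6.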

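14 Spectral gap
Let $1 = \lambda_1 > |\lambda_2| \ge \dots \ge |\lambda_{|\Omega|}|$ be the eigenvalues of P.
Def’n: $\mathrm{Gap}(P) = 1 - |\lambda_2|$ is the spectral gap.
Thm (Alon, Alon-Milman, Sinclair): mixing rate vs. spectral gap, with $\pi_* = \min_x \pi(x)$:
$\tau(\varepsilon) \le \frac{1}{\mathrm{Gap}(P)} \log\left(\frac{1}{\pi_* \varepsilon}\right)$,
$\tau(\varepsilon) \ge \frac{|\lambda_2|}{2\,\mathrm{Gap}(P)} \log\left(\frac{1}{2\varepsilon}\right)$.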
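To see the gap on a concrete chain, here is a sketch (numpy assumed; helper names hypothetical) that builds the transition matrix of the hypercube walk MC_CUBE from Ex1 and computes $1 - |\lambda_2|$ with eigenvalues ordered by absolute value.

```python
import itertools

import numpy as np


def cube_chain(n):
    """Transition matrix of MC_CUBE on {0,1}^n: pick i in [n] and b in {0,1}
    uniformly, then set v_i = b (so half the time nothing changes)."""
    states = list(itertools.product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    P = np.zeros((2 ** n, 2 ** n))
    for s in states:
        for i in range(n):
            for b in (0, 1):
                t = s[:i] + (b,) + s[i + 1:]
                P[index[s], index[t]] += 1.0 / (2 * n)
    return P


def spectral_gap(P):
    """Gap(P) = 1 - |lambda_2|, eigenvalues ordered by absolute value.
    eigvalsh assumes P is symmetric, which holds for MC_CUBE."""
    moduli = np.sort(np.abs(np.linalg.eigvalsh(P)))[::-1]
    return 1.0 - moduli[1]


gap3 = spectral_gap(cube_chain(3))
```

For MC_CUBE the eigenvalues are $1 - |S|/n$ for $S \subseteq [n]$, so $\mathrm{Gap} = 1/n$; with $\pi_* = 2^{-n}$ the upper bound gives $\tau(\varepsilon) \le n(n \ln 2 + \ln \varepsilon^{-1})$.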

15 Outline
- Fundamentals for designing a Markov chain
- Bounding running times (convergence rates)
- Connections to statistical physics

16 Outline for rest of talk
Techniques: Coupling, Flows and paths, Indirect methods
Problems: Walk on the hypercube, Colorings, Matchings, Independent sets
Connections with statistical physics: problems, algorithms, physical insights

17 Coupling

18 Simulate 2 processes:
- Start at any $x_0$ and $y_0$.
- Couple moves, but each simulates the MC.
- Once they agree, they move in sync ($x_t = y_t \Rightarrow x_{t+1} = y_{t+1}$).

19 Coupling
Def’n: A coupling is a MC on Ω × Ω such that:
1) Each process $\{X_t\}$, $\{Y_t\}$ is a faithful copy of the original MC;
2) If $X_t = Y_t$, then $X_{t+1} = Y_{t+1}$.
The coupling time T is $T = \max_{x,y} E[T_{x,y}]$, where $T_{x,y} = \min\{t : X_t = Y_t \mid X_0 = x, Y_0 = y\}$.
Thm (Aldous ’81): $\tau(\varepsilon) \le T\, e \ln \varepsilon^{-1}$.
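A sketch of the standard argument behind this bound (rounding eT to an integer is glossed over): the coupling inequality bounds the distance to π by the chance that the copies have not met, Markov's inequality bounds that chance by 1/e per block of eT steps, and restarting the argument from whatever pair the copies occupy after each block multiplies the blocks.

```latex
\begin{align*}
\|P^t(x,\cdot) - \pi\| &\le \Pr[X_t \ne Y_t]
  && \text{(coupling inequality, } X_0 = x,\ Y_0 \sim \pi)\\
\Pr[T_{x,y} > eT] &\le \frac{E[T_{x,y}]}{eT} \le \frac{1}{e}
  && \text{(Markov's inequality)}\\
\Pr[T_{x,y} > k\,eT] &\le e^{-k}
  && \text{(restart after each block of } eT \text{ steps)}\\
\Pr[T_{x,y} > eT \ln \varepsilon^{-1}] &\le \varepsilon
  && (k = \ln \varepsilon^{-1}).
\end{align*}
```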

20 Ex1: Walk on the hypercube
MC_CUBE: Start at $v_0 = (0,0,\dots,0)$. Repeat:
- Pick $i \in [n]$, $b \in \{0,1\}$.
- Set $v_i = b$.
Symmetric, ergodic ⇒ π is uniform.
Mixing time? Use coupling (both copies use the same i and b):
x_0 = 0 1 1 0 0 1, y_0 = 1 1 1 0 0 0
i=2, b=0: x_1 = 0 0 1 0 0 1, y_1 = 1 0 1 0 0 0
i=6, b=1: x_2 = 0 0 1 0 0 1, y_2 = 1 0 1 0 0 1
...
i=1, b=1: x_t = 1 0 1 1 1 0, y_t = 1 0 1 1 1 0
The copies agree once every coordinate has been picked, so $T \approx n \ln n$ (coupon collecting) ⇒ $\tau(\varepsilon) = O(n \ln(n \varepsilon^{-1}))$.
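A quick simulation of this coupling, as a sketch: both copies share the random choice (i, b), so coordinate i agrees forever after it is picked. Starting from 0…0 and 1…1 every coordinate differs, so the coupling time is exactly a coupon-collector time with mean $n H_n \approx n \ln n$. The function name `coupling_time` and the sample sizes are our choices.

```python
import random


def coupling_time(x0, y0, rng):
    """Run MC_CUBE on two copies that share (i, b) at every step and
    return the first t with x_t == y_t."""
    x, y = list(x0), list(y0)
    n = len(x)
    t = 0
    while x != y:
        i = rng.randrange(n)   # shared coordinate i
        b = rng.randrange(2)   # shared bit b
        x[i] = b
        y[i] = b               # coordinate i now agrees from here on
        t += 1
    return t


rng = random.Random(0)
n = 10
samples = [coupling_time([0] * n, [1] * n, rng) for _ in range(2000)]
mean_T = sum(samples) / len(samples)
coupon = n * sum(1 / k for k in range(1, n + 1))  # n * H_n, roughly n ln n
```

With n = 10 the empirical mean should land close to $10 H_{10} \approx 29.3$.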

21 Outline
Techniques: Coupling (path coupling), Flows and paths, Indirect methods
Problems: Walk on the hypercube, Colorings, Matchings, Independent sets
Connections with statistical physics: problems, algorithms, physical insights

22 Ex 2: Colorings
Given: A graph G (max deg d), k > 1.
Goal: Find a random k-coloring of G.
MC_COL (single point replacement; the “lazy” chain): Starting at some k-coloring $C_0$, repeat:
- With prob 1/2 do nothing.
- Pick $v \in V$, $c \in [k]$;
- Recolor v with c, if possible.
Note: k ≥ d + 1 ⇒ colorings exist (greedy).
If k ≥ d + 2, then the state space is connected. (Therefore π is uniform.)
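One step of MC_COL, sketched in Python under the assumption that the graph is a dict of neighbour lists and a coloring is a dict vertex → color in `range(k)`; `mc_col_step` and `is_proper` are hypothetical names. Because a recoloring is only accepted when no neighbour already has color c, the chain never leaves the set of proper colorings.

```python
import random


def mc_col_step(coloring, adj, k, rng):
    """One step of the lazy chain MC_COL; modifies `coloring` in place."""
    if rng.random() < 0.5:                 # with prob 1/2 do nothing
        return coloring
    v = rng.choice(list(adj))              # pick v in V
    c = rng.randrange(k)                   # pick c in [k]
    if all(coloring[u] != c for u in adj[v]):
        coloring[v] = c                    # recolor v with c, if possible
    return coloring


def is_proper(coloring, adj):
    return all(coloring[u] != coloring[v] for u in adj for v in adj[u])


# Example: 5-cycle (max degree d = 2) with k = d + 2 = 4 colors.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
C = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}         # a proper starting coloring C_0
rng = random.Random(1)
for _ in range(1000):
    mc_col_step(C, cycle, 4, rng)
```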

23 Path Coupling [Bubley-Dyer-Greenhill ’97-8]
Coupling: Show for all $x, y \in \Omega$, $E[\Delta(\mathrm{dist}(x,y))] < 0$.
Path coupling: Show for all u, v s.t. dist(u,v) = 1 that $E[\Delta(\mathrm{dist}(u,v))] < 0$.
Consider a shortest path $x = z_0, z_1, z_2, \dots, z_r = y$ with $\mathrm{dist}(z_i, z_{i+1}) = 1$, so $\mathrm{dist}(x,y) = r$. Then
$E[\Delta(\mathrm{dist}(x,y))] \le \sum_i E[\Delta(\mathrm{dist}(z_i, z_{i+1}))] < 0$.

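24 Path coupling for MC_COL
Thm (Jerrum ’95): MC_COL is rapidly mixing if k ≥ 3d.
Pf: Use path coupling with dist(x,y) = 1: x and y differ only at one vertex w, colored α in x and β in y. Cases:
- v = w and c is not the color of any neighbor of w (at least k − d choices): ∆dist = −1;
- v ∈ N(w), c ∈ {α, β}: ∆dist = +1 (or 0);
- o.w.: ∆dist = 0.
Each pair (v, c) is chosen with prob $1/(2nk)$ (n = |V|), so
$E[\Delta\mathrm{dist}] \le \frac{1}{2nk}\big((k-d)(-1) + 2d(+1)\big) = \frac{3d-k}{2nk} \le 0$.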
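An exact check of this case analysis on a small example, as a sketch. We assume the identity coupling (both copies use the same v and c), which is the coupling the case count above describes; the graph, the colorings and the function name `expected_change` are our own choices.

```python
from fractions import Fraction


def expected_change(x, y, adj, k):
    """E[change in dist] after one step of lazy MC_COL run on colorings x and y
    under the identity coupling (both copies use the same v and c).
    dist = number of vertices where the two colorings differ."""
    def step(col, v, c):
        new = dict(col)
        if all(col[u] != c for u in adj[v]):   # recolor v with c, if possible
            new[v] = c
        return new

    def dist(a, b):
        return sum(a[v] != b[v] for v in adj)

    n, d0 = len(adj), dist(x, y)
    total = Fraction(0)
    for v in adj:
        for c in range(k):
            # (v, c) is used w.p. 1/(2nk); the lazy half-step changes nothing
            total += Fraction(dist(step(x, v, c), step(y, v, c)) - d0, 2 * n * k)
    return total


# 5-cycle (d = 2), k = 3d = 6; x and y differ only at w = 0.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
x = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
y = dict(x)
y[0] = 3
d, k, n = 2, 6, 5
change = expected_change(x, y, cycle, k)
bound = Fraction(3 * d - k, 2 * n * k)
```

Here the exact expectation is −1/60, below the bound $(3d-k)/(2nk) = 0$, because one neighbor of w cannot take one of its two "+1" colors.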

25 Summary: Coupling
Pros: Can yield very easy proofs.
Cons: Demands a lot from the chain.
Extensions:
- Careful coupling (k ≥ 2d) (Jerrum ’95)
- Change the MC (Luby-R-Sinclair ’95)
- “Macromoves”
- Burn in (Dyer-Frieze ’01, Molloy ’02)
- Non-Markovian couplings (Hayes-Vigoda ’03)

26 Outline
Techniques: Coupling, Flows and paths, Indirect methods
Problems: Walk on the hypercube, Colorings, Matchings, Independent sets
Connections with statistical physics: problems, algorithms, physical insights

27 Conductance and flows (Jerrum-Sinclair ’88)
$\Phi = \min_{S \subseteq \Omega,\ \pi(S) \le 1/2} \Phi(S)$, where $\Phi(S) = \frac{\sum_{s \in S,\ s' \in S^C} \pi(s) P(s,s')}{\sum_{s \in S} \pi(s)}$.
Thm: $\frac{\Phi^2}{2} \le \mathrm{Gap}(P) \le 2\Phi$.
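A brute-force sketch (numpy assumed; helper names hypothetical) that computes Φ for the hypercube walk MC_CUBE with n = 3 and checks both sides of the theorem. Enumerating all subsets is only feasible for tiny state spaces. Since MC_CUBE is lazy its eigenvalues are nonnegative, so here the gap is 1 minus the second-largest eigenvalue.

```python
import itertools

import numpy as np


def cube_chain(n):
    """MC_CUBE: pick i in [n], b in {0,1}, set v_i = b."""
    states = list(itertools.product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    P = np.zeros((2 ** n, 2 ** n))
    for s in states:
        for i in range(n):
            for b in (0, 1):
                P[index[s], index[s[:i] + (b,) + s[i + 1:]]] += 1.0 / (2 * n)
    return P


def conductance(P, pi):
    """Phi = min over S with pi(S) <= 1/2 of Q(S, S^C) / pi(S), by brute force."""
    m = len(pi)
    best = float("inf")
    for r in range(1, m):
        for S in itertools.combinations(range(m), r):
            pi_S = pi[list(S)].sum()
            if pi_S > 0.5 + 1e-12:
                continue
            S_c = [j for j in range(m) if j not in S]
            flow = sum(pi[s] * P[s, t] for s in S for t in S_c)
            best = min(best, flow / pi_S)
    return best


P = cube_chain(3)
pi = np.full(8, 1 / 8)
phi = conductance(P, pi)
gap = 1.0 - np.sort(np.linalg.eigvalsh(P))[-2]   # eigenvalues of MC_CUBE are >= 0
```

Here Φ = 1/6, attained by a half-cube such as $\{v : v_1 = 0\}$, and Gap(P) = 1/3, so the upper bound $\mathrm{Gap}(P) \le 2\Phi$ is tight for this chain.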

28 Min cut ↔ max flow (Sinclair ’92)
Γ: make $|\Omega|^2$ canonical paths $\{\gamma_{xy}$ from $x \in \Omega$ to $y \in \Omega$, $x \ne y$, carrying $\pi(x)\pi(y)$ units of flow$\}$.
Capacity of e = (u,v): $Q(e) = \pi(u) P(u,v) = \pi(v) P(v,u)$.
The congestion of these paths is $\rho(\Gamma) = \max_{e} \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)$.
Let $\bar\rho = \min_\Gamma \rho(\Gamma)\,\ell$, where $\ell$ is the max path length.
Thm (starting at x): $\tau(\varepsilon) \le \bar\rho \log\big((\varepsilon\,\pi(x))^{-1}\big)$.

29 Ex 3: Back to the hypercube
- Define a canonical path from s to t: fix the bits where they differ, from left to right:
s = 0 1 1 0 0 1 → 1 1 1 0 0 1 → 1 1 0 0 0 1 → 1 1 0 0 0 0 = t.
- Bound the number of paths through (u,v) ∈ E, e.g. u = 1 1 1 0 0 1, v = 1 1 0 0 0 1: the complementary pair (u′, v′) = (0 1 0 0 0 0, 0 1 1 0 0 0) determines (s,t), so $|\{\gamma_{xy} \ni e\}| = 2^{n-1}$, and $\ell = n$.
Then $\rho = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y) = \frac{2^{n-1}\, 2^{-2n}}{2^{-n}\,(1/2n)} = n$, so $\bar\rho = \rho\,\ell = n^2$.
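The count above can be checked by brute force for small n, as a sketch: route every ordered pair (s, t) along the left-to-right bit-fixing path, add $\pi(s)\pi(t)$ to every directed edge used, and divide by $Q(e) = 2^{-n}/(2n)$. The function names are hypothetical; exact `Fraction`s make the comparison with n exact.

```python
import itertools
from collections import defaultdict
from fractions import Fraction


def bit_fixing_path(s, t):
    """Canonical path from s to t: flip the bits where they differ, left to right."""
    path, cur = [s], list(s)
    for i in range(len(s)):
        if cur[i] != t[i]:
            cur[i] = t[i]
            path.append(tuple(cur))
    return path


def congestion(n):
    """Return (rho, l) for MC_CUBE with bit-fixing paths, where
    rho = max_e (1/Q(e)) * sum of pi(x) pi(y) over paths gamma_xy using e
    and l is the maximum path length."""
    states = list(itertools.product((0, 1), repeat=n))
    pi = Fraction(1, 2 ** n)              # uniform stationary distribution
    Q = pi * Fraction(1, 2 * n)           # Q(e) = pi(u) P(u, v), P(u, v) = 1/(2n)
    load = defaultdict(Fraction)          # directed edge -> flow through it
    longest = 0
    for s in states:
        for t in states:
            if s == t:
                continue
            path = bit_fixing_path(s, t)
            longest = max(longest, len(path) - 1)
            for u, v in zip(path, path[1:]):
                load[(u, v)] += pi * pi
    return max(f / Q for f in load.values()), longest
```

For n = 3 and n = 4 this returns (n, n), matching $\rho = n$ and $\ell = n$.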

30 Outline
Techniques: Coupling, Flows and paths, Indirect methods
Problems: Walk on the hypercube, Colorings, Matchings, Independent sets
Connections with statistical physics: problems, algorithms, physical insights

31 Ex 4: Sampling matchings

32 MC_MATCH: Starting at $M_0$, repeat:
- Pick $e = (u,v) \in E$;
- If $e \in M$, remove e;
- If u and v are unmatched in M, add e;
- If u is matched (by e′) and v is unmatched (or vice versa), add e and remove e′;
- Otherwise do nothing.
Thm: Coupling won’t work! (Kumar-Ramesh ’99)
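A sketch of one MC_MATCH step, assuming a matching is stored as a set of `frozenset({u, v})` edges and the graph as a list of edge tuples (our representation; `mc_match_step` and `is_matching` are hypothetical names). Each rule keeps M a matching: the "add e, remove e′" move frees the old partner of the matched endpoint.

```python
import random


def mc_match_step(M, edges, rng):
    """One step of MC_MATCH. M is a set of frozenset({u, v}) edges forming a
    matching; it is modified in place and returned."""
    u, v = rng.choice(edges)                  # pick e = (u, v) in E
    e = frozenset((u, v))
    matched = {x: f for f in M for x in f}    # vertex -> edge of M covering it
    if e in M:
        M.remove(e)                           # e in M: remove e
    elif u not in matched and v not in matched:
        M.add(e)                              # u, v unmatched: add e
    elif (u in matched) != (v in matched):
        # exactly one endpoint is matched, by e': add e and remove e'
        e_prime = matched[u] if u in matched else matched[v]
        M.remove(e_prime)
        M.add(e)
    # otherwise (both endpoints matched by other edges): do nothing
    return M


def is_matching(M):
    covered = set()
    for f in M:
        if covered & f:
            return False
        covered |= f
    return True


edges = [(i, (i + 1) % 6) for i in range(6)]   # a 6-cycle
rng = random.Random(2)
M = set()
for _ in range(1000):
    mc_match_step(M, edges, rng)
```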

33 Mixing time of MC_MATCH
[Figure: canonical path between matchings s and t.]
The paths using (u,v) are determined by u′... as before.

34 Outline
Techniques: Coupling, Flows and paths, Indirect methods
Problems: Walk on the hypercube, Colorings, Matchings, Independent sets
Connections with statistical physics: problems, algorithms, physical insights

35 Ex 5: Independent Sets
Goal: Given λ, sample an ind. set I with prob $\pi(I) = \lambda^{|I|}/Z$, $Z = \sum_J \lambda^{|J|}$.
MC_IND: Starting at $I_0$, repeat:
- Pick $v \in V$ and $b \in \{0,1\}$;
- If $v \in I$ and b = 0, remove v w.p. $\min(1, \lambda^{-1})$;
- If $v \notin I$ and b = 1, add v w.p. $\min(1, \lambda)$ if possible;
- O.w. do nothing.
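A sketch that writes out the exact transition probabilities of MC_IND on a tiny graph and confirms that $\pi(I) = \lambda^{|I|}/Z$ is stationary. The path graph, λ = 3 and the helper names are illustrative assumptions.

```python
import itertools
from fractions import Fraction


def independent_sets(V, E):
    """All independent sets of G = (V, E), as frozensets."""
    return [frozenset(S)
            for r in range(len(V) + 1)
            for S in itertools.combinations(V, r)
            if not any(u in S and w in S for u, w in E)]


def mc_ind_matrix(V, E, lam):
    """Exact transition probabilities of MC_IND. Each (v, b) has prob
    1/(2|V|); v in I with b = 0 is removed w.p. min(1, 1/lam), and v not in I
    with b = 1 is added w.p. min(1, lam) if no neighbour of v is in I."""
    lam = Fraction(lam)
    nbrs = {v: {w for e in E if v in e for w in e if w != v} for v in V}
    omega = independent_sets(V, E)
    P = {I: {J: Fraction(0) for J in omega} for I in omega}
    p = Fraction(1, 2 * len(V))
    for I in omega:
        for v in V:
            if v in I:
                P[I][I - {v}] += p * min(Fraction(1), 1 / lam)
            elif not nbrs[v] & I:
                P[I][I | {v}] += p * min(Fraction(1), lam)
        P[I][I] += 1 - sum(P[I].values())    # all other choices: stay at I
    return omega, P


# Path 0 - 1 - 2 with lambda = 3; the target is pi(I) = 3^{|I|} / Z.
omega, P = mc_ind_matrix([0, 1, 2], [(0, 1), (1, 2)], 3)
Z = sum(Fraction(3) ** len(I) for I in omega)
pi = {I: Fraction(3) ** len(I) / Z for I in omega}
```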

36 Slow mixing of MC_IND (large λ)
[Figure: the state space split into S and S^C by the ratio #R/#B of occupied even (R) to odd (B) sites.]
For large λ there is a “bad cut” (S, S^C), so MC_IND is slowly mixing.

37 Summary: Flows
Pros: Offers a combinatorial approach to mixing; especially useful for proving slow mixing.
Cons: Requires global knowledge of the chain to spread out paths.
Extensions: Balanced flows (Morris-Sinclair ’99)
MCMC: major highlights:
- The permanent (Jerrum-Sinclair-Vigoda ’02)
- Volume of a convex polytope (Dyer-Frieze-Kannan ’89, +…)

38 Outline
Techniques: Coupling, Flows and paths, Indirect methods (Comparison, Decomposition)
Problems: Walk on the hypercube, Colorings, Matchings, Independent sets
Connections with statistical physics: problems, algorithms, physical insights

39 Comparison (Diaconis, Saloff-Coste ’93)
Known chain $\bar P$, unknown chain P.
For each edge $(x,y) \in \bar P$, make a path $\gamma_{x,y}$ using edges in P. Let Γ(z,w) be the set of paths $\gamma_{x,y}$ using (z,w).
Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\,\mathrm{Gap}(\bar P)$, where $A = \max_{e \in P} \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} |\gamma_{xy}|\, \pi(x) \bar P(x,y)$.

40 Comparison
For each $(x,y) \in \bar P$, route $\gamma_{x,y}$ using edges of P; Γ(z,w) is the set of paths $\gamma_{x,y}$ using (z,w).
Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\,\mathrm{Gap}(\bar P)$.
Intuition: $(S, S^C)$ cannot be a bad cut in P if it isn’t one in $\bar P$.

41 Comparison, aka... The Adjacency Matrix Reloaded

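42 Disjoint decomposition (Madras-R. ’96, Martin-R. ’00)
Partition Ω into pieces $A_1, \dots, A_6$.
Restrictions $P_i$: the chain P restricted to $A_i$ (e.g., $P_3$ on $A_3$).
Projection $\bar P$ on states $a_1, \dots, a_6$: $\pi(a_i) = \pi(A_i)$, $\bar P(a_i, a_j) = \frac{1}{\pi(A_i)} \sum_{x \in A_i,\ y \in A_j} \pi(x) P(x,y)$.
Thm: $\mathrm{Gap}(P) \ge \frac{1}{2}\,\mathrm{Gap}(\bar P)\, \min_i \mathrm{Gap}(P_i)$.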
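A sketch of the projection and restriction chains for a chain given as a matrix (numpy assumed; helper names hypothetical). For the restriction we assume the common convention that moves leaving $A_i$ are rejected, so the walker stays put; the slide does not spell this out.

```python
import numpy as np


def projection(P, pi, blocks):
    """Projection chain for a partition of the states into blocks A_1..A_m:
    pi(a_i) = pi(A_i),  P_bar(a_i, a_j) = sum_{x in A_i, y in A_j} pi(x) P(x, y) / pi(A_i)."""
    P, pi = np.asarray(P, float), np.asarray(pi, float)
    m = len(blocks)
    pi_bar = np.array([pi[A].sum() for A in blocks])
    P_bar = np.zeros((m, m))
    for i, Ai in enumerate(blocks):
        for j, Aj in enumerate(blocks):
            P_bar[i, j] = (pi[Ai][:, None] * P[np.ix_(Ai, Aj)]).sum() / pi_bar[i]
    return pi_bar, P_bar


def restriction(P, A):
    """Restriction of P to A: moves that would leave A are rejected."""
    P = np.asarray(P, float)
    PA = P[np.ix_(A, A)].copy()
    np.fill_diagonal(PA, 0.0)
    np.fill_diagonal(PA, 1.0 - PA.sum(axis=1))
    return PA


# Lazy walk on the path 0 - 1 - 2 - 3, with pi = (1, 2, 2, 1) / 6.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.25, 0.5, 0.25, 0.0],
     [0.0, 0.25, 0.5, 0.25],
     [0.0, 0.0, 0.5, 0.5]]
pi = [1 / 6, 2 / 6, 2 / 6, 1 / 6]
pi_bar, P_bar = projection(P, pi, [[0, 1], [2, 3]])
```

The projection here is the two-state chain that switches blocks with probability 1/6, and $\pi(a_i) = \pi(A_i) = 1/2$ is stationary for it.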

43 Ex 6: MC_IND on small ind. sets
For G = (V,E), let Ω = {ind. sets of G} and $\Omega_k$ = {ind. sets of size k}.
Thm: MC_IND is rapidly mixing on $\bigcup_{k=0}^{K} \Omega_k$, where $K = |V|/(2(\Delta+1))$.
Consider first the “swap” chain. MC_SWAP: Starting at $I_0$, repeat:
- Pick $(u,v,b) \in V \times V \times \{0,1,2\}$;
- If b = 0 and $u \in I$, remove u w.p. $\min(1, \lambda^{-1})$;
- If b = 1 and $u \notin I$, add u w.p. $\min(1, \lambda)$ if possible;
- If b = 2, remove u and add v (if possible);
- O.w. do nothing.

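44 Ind. sets w/ bounded size (cont.)
Thm: MC_IND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$, where $K = |V|/(2(\Delta+1))$.
Decompose by size: $\Omega_0, \Omega_1, \Omega_2, \dots, \Omega_{K-1}, \Omega_K$.
- Projection on $a_0, a_1, a_2, \dots, a_{K-1}, a_K$: $|\Omega_k|$ is log-concave, ... so $\bar P$ is rapidly mixing.
- Restrictions: MC_SWAP on each $\Omega_k$: rapidly mixing?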

45 The restrictions of MC_SWAP
Thm (Bubley-Dyer ’97): MC_SWAP is rapidly mixing on $\Omega_k$, k < K.
Thm (Decomposition): MC_SWAP is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$.
Cor (Comparison): MC_IND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$.

46 Summary: Indirect methods
Pros: Offer a top down approach; allow hybrid methods to be used.
Cons: Can increase the complexity.
Extensions:
- Comparison thm for log-Sobolev (Diaconis-Saloff-Coste ’96)
- Comparison for Glauber dynamics (R.-Tetali ’98)
- Decomposition for log-Sobolev (Jerrum-Son-Tetali-Vigoda ’02)

47 Outline
Techniques: Coupling, Flows and paths, Hybrid methods
Problems: Walk on the hypercube, Colorings, Matchings, Independent sets
Connections with statistical physics: problems, algorithms, physical insights

48 Why Statistical Physics?
- They have a need for sampling
- Use many interesting heuristics
- Great intuition
- Experts on “large data sets”
- Microscopic details ⇒ macroscopic behavior (i.e., phase transitions)


50 Models from statistical physics
- Potts model (3-colorings)
- Hardcore model (independent sets)
- Dimer model (matchings)
- Ising model (min cut)

51 Models (cont.)
- Independent sets: $\pi(I) = \lambda^{|I|}/Z$
- Matchings: $\pi(M) = \lambda^{|M|}/Z$
- Ising model: $\pi(\sigma) = \lambda^{|E_=|}/Z$, where $E_= = \{uv \in E : \sigma(u) = \sigma(v)\}$ (so $E = E_= \cup E_{\ne}$)

52 Models: (The physics perspective)
Given: A physical system Ω = {σ}.
Define: A Gibbs measure $\pi(\sigma) = e^{-\beta H(\sigma)}/Z$, where H(σ) is the Hamiltonian, β = 1/kT is the inverse temperature, and $Z = \sum_\sigma e^{-\beta H(\sigma)}$ is the normalizing constant or partition function.
- Independent sets: $H(\sigma) = -|I|$. If $\lambda = e^\beta$, then $\pi(\sigma) = \lambda^{|I|}/Z$.
- Ising model: $H(\sigma) = -\sum_{(u,v) \in E} \sigma_u \sigma_v$. If $\lambda = e^{2\beta}$, then $\pi(\sigma) = \lambda^{|E_=|}/Z$.
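For the Ising case, assuming spins $\sigma_u \in \{+1, -1\}$ (the usual convention, not stated on the slide), the substitution $\lambda = e^{2\beta}$ can be checked directly: each edge contributes +1 to the sum if its endpoints agree and −1 otherwise.

```latex
\begin{align*}
-\beta H(\sigma) &= \beta \sum_{(u,v)\in E} \sigma_u \sigma_v
  = \beta\,\bigl(|E_=| - |E_{\ne}|\bigr) = \beta\,\bigl(2|E_=| - |E|\bigr),\\
e^{-\beta H(\sigma)} &= e^{-\beta |E|}\,\bigl(e^{2\beta}\bigr)^{|E_=|}
  = e^{-\beta |E|}\,\lambda^{|E_=|}.
\end{align*}
```

The factor $e^{-\beta|E|}$ does not depend on σ, so it cancels against Z and leaves $\pi(\sigma) = \lambda^{|E_=|}/Z$ with $Z = \sum_\sigma \lambda^{|E_=|}$. The hardcore case is the same computation with $e^{-\beta H} = e^{\beta |I|} = \lambda^{|I|}$.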

53 Physics perspective (cont.)
Q: What about on the infinite lattice?
Use conditional probabilities. But there can be boundary effects!!!

54 Phase transitions: Ind. sets
On finite regions:
- High temperature ($T \to \infty$): ∂ (boundary) effects die out.
- Low temperature ($T \to 0$): long range effects.
$T_C$ indicates a “phase transition.”

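55 Slow mixing of MC_IND revisited
[Figure: S and S^C, split by the ratio #R/#B.]
$\pi(S_i) = \sum_{s \in S_i} \pi(s) = \sum_{s \in S_i} e^{-\beta H(s)}/Z$: the number of terms in the sum is the “entropy term,” and $e^{-\beta H(s)}$ is the “energy term.”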

56 Group by # of “fault lines”
Fault lines are vacant paths of width 2 from top to bottom (or left to right).
[Figure: the state space split into $S_R, S_1, S_2, S_3, \dots, S_B$.]

57 “Peierls Argument”
1. Identify a horizontal or vertical fault line ($\sigma \in S_1$).
2. Shift the part to the right of the fault by 1 and flip colors.
3. Remove the rightmost column; add points along the fault line, if possible (the result is in $S_B$).
For fixed path length $\ell$, the map $S_1 \to S_B$ is at most $2^{n/2} \cdot 3^{\ell}$ to one.

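58 Peierls Argument cont.
The map $S_1 \to S_B$ is at most $2^{n/2} 3^{\ell}$ to one, and each image has ≥ $\ell - n/2$ more points. So
$\pi(S_1) = \sum_{\sigma \in S_1} \pi(\sigma) \le \sum_{\ell} \sum_{\sigma \in S_B} \pi(\sigma)\, 2^{n/2} 3^{\ell} \lambda^{(n/2 - \ell)} \le \pi(S_B)\, \frac{2^{n/2}\, 3^n\, \lambda^{n/2}}{\lambda^n}\, \mathrm{poly}(n) = \pi(S_B) \left(\frac{18}{\lambda}\right)^{n/2} \mathrm{poly}(n)$,
which is exponentially small if λ > 18. (And similarly for $S_2, S_3, \dots$)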
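The chain of inequalities, expanded step by step. This assumes, as the $3^n$ factor suggests, that every fault line has length $\ell \ge n$ and that λ > 3, so the terms $(3/\lambda)^{\ell}$ are largest at $\ell = n$; the poly(n) factor absorbs the number of possible lengths.

```latex
\begin{align*}
\pi(S_1) &\le \sum_{\ell} 2^{n/2}\, 3^{\ell}\, \lambda^{-(\ell - n/2)}\, \pi(S_B)
  = \pi(S_B)\, 2^{n/2} \lambda^{n/2} \sum_{\ell} \Bigl(\frac{3}{\lambda}\Bigr)^{\ell}\\
&\le \pi(S_B)\, 2^{n/2} \lambda^{n/2} \Bigl(\frac{3}{\lambda}\Bigr)^{n} \mathrm{poly}(n)
  = \pi(S_B) \Bigl(\frac{2 \cdot 3^2}{\lambda}\Bigr)^{n/2} \mathrm{poly}(n)
  = \pi(S_B) \Bigl(\frac{18}{\lambda}\Bigr)^{n/2} \mathrm{poly}(n).
\end{align*}
```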

59 Conclusions
Techniques:
- Coupling: can be easy when it works
- Flows: requires global knowledge of chain; very useful for slow mixing
- Indirect methods: top down approach; often increases complexity
Connection to physics: can offer tremendous insights
Open problems: ...

60 Conclusions
Open problems:
- Sampling 4,5,6-colorings on the grid.
- Sampling perfect matchings on non-bipartite graphs.
- Sampling acyclic orientations in a graph.
- Sampling configurations of the Potts model (a generalization of Ising, but with more colors).
- How can we further exploit phase transitions? Other physical intuition?
