Mathematical Foundations of Markov Chain Monte Carlo Algorithms Based on lectures given by Alistair Sinclair Computer Science Division U.C. Berkeley.


1 Mathematical Foundations of Markov Chain Monte Carlo Algorithms Based on lectures given by Alistair Sinclair Computer Science Division U.C. Berkeley

2 Based on lectures by Alistair Sinclair. Dana Moshkovitz
Overview
1. Random Sampling
2. The Markov Chain Monte Carlo Paradigm
3. Mixing Time
4. Techniques for Bounding the Mixing Time: Coupling, Flow, Geometry

3 Random Sampling
Ω - a "very large" sample set.
π - a probability distribution over Ω.
Goal: sample points x ∈ Ω at random from distribution π.

4 The Probability Distribution
Typically π(x) = w(x)/Z, where:
w: Ω → R+ is an easily computed weight function;
Z = Σ_x w(x) is an unknown normalization factor.

5 Application 1: Card Shuffling
Ω - all 52! permutations of a deck of cards.
π - the uniform distribution [∀x: w(x) = 1].
Goal: pick a permutation uniformly at random.

6 Application 2: Counting
How many ways can we tile a given pattern with dominoes?

7 Application 2: Counting (cont.)
Split the tilings into two types, so that N = N_1 + N_2.
Sample tilings uniformly at random; let P_1 = the proportion of the sample of type 1.
Compute an estimate N_1* of N_1 recursively, and output N* = N_1*/P_1.
Sample size O(n) per level, O(n) levels ⇒ O(n²) samples in total.

8 Application 3: Volume & Integration [Dyer/Frieze/Kannan]
Ω: a convex body in R^d (d large).
Problem: estimate vol(Ω).
Take a sequence of concentric balls B_0 ⊆ … ⊆ B_r and estimate each successive ratio vol(Ω ∩ B_i)/vol(Ω ∩ B_{i−1}) by sampling uniformly from Ω ∩ B_i.
Generalization: integration of a log-concave function over a cube A ⊆ R^d.

9 Application 4: Statistical Physics
Ω - set of configurations of a physical system.
π - the Gibbs distribution: π(x) = Pr[system in configuration x] = w(x)/Z,
where w(x) = e^(−H(x)/kT), H being the "energy" and T the "temperature".

10 The Ising Model
n atomic magnets; a configuration is x ∈ {−,+}^n.
H(x) = −(# aligned neighboring pairs).
[Figure: a grid of + and − spins.]

11 Why Sampling?
Statistics of "typical" configurations: mean energy E_π[H(x)], specific heat, …
Estimate of the "partition function" Z := Z(T) = Σ_{x∈Ω} w(x).

12 Estimating the Partition Function
Let λ = e^(1/kT), so Z := Z(λ) = Σ_{x∈Ω} λ^(−H(x)).
Define 1 = λ_0 < λ_1 < … < λ_r = λ.
Each ratio Z(λ_i)/Z(λ_{i−1}) can be estimated by random sampling from π_{λ_{i−1}}.
Taking λ_i = λ_{i−1}(1 + 1/n) ensures small variance ⇒ O(n) samples suffice for each ratio.
r ≈ n log λ ⇒ O(n²) samples in total.

13 Application 5: Optimization
Ω - set of feasible solutions to an optimization problem.
f(x) - value of solution x. Goal: maximize f(x).
Idea: sample solutions where w(x) = λ^f(x).

14 Application 5: Optimization (cont.)
Idea: sample solutions where w(x) = λ^f(x).
Large λ ⇒ concentration on good solutions (large values of f(x)).
Small λ ⇒ greater "mobility" (local optima are easier to escape).
Simulated annealing heuristic: slowly increase λ.

15 Application 6: Hypothesis Verification in Statistical Models
Ω - set of hypotheses θ; X - the observed data.
Let w(θ) = P(θ)·P(X|θ), where P(θ) is the prior and the likelihood P(X|θ) is "easy" to compute.

16 Application 6: Hypothesis Verification in Statistical Models (cont.)
Sampling from π(θ) = P(θ|X) gives:
1. Statistical estimates of the hypotheses θ.
2. Prediction.
3. Model comparison: the normalization factor Z = P(X) = Pr[the model generated X].

17 Markov Chains
Sample space Ω; random variables (r.v.) X_1, X_2, …, X_t, … over Ω.
"Memoryless": ∀t > 0, ∀x_1, …, x_{t+1}:
Pr[X_{t+1} = x_{t+1} | X_1 = x_1, …, X_t = x_t] = Pr[X_{t+1} = x_{t+1} | X_t = x_t].

18 Sampling Algorithm
Start at an arbitrary state X_0.
Simulate the MC for "sufficiently many" steps t.
Output X_t. Then ∀x ∈ Ω: Pr[X_t = x] ≈ π(x).
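The sampling algorithm above can be sketched in a few lines of Python; the chain used here (a lazy random walk on the cycle Z_5) is a hypothetical toy example, not one from the lectures.

```python
import random

def mcmc_sample(x0, step, t):
    """Simulate the Markov chain for t steps from x0 and return X_t."""
    x = x0
    for _ in range(t):
        x = step(x)
    return x

# Toy chain (an assumption): lazy random walk on the cycle Z_5.
def cycle_step(x, n=5):
    r = random.random()
    if r < 0.5:
        return x                      # lazy: stay put
    return (x + 1) % n if r < 0.75 else (x - 1) % n

random.seed(0)
sample = mcmc_sample(0, cycle_step, 1000)   # approximately uniform on {0,...,4}
```

Any chain can be plugged in via the `step` function; the "sufficiently many" steps t is exactly the mixing-time question the rest of the deck addresses.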

19 The Transition Matrix
P(x,y) = Pr[X_{t+1} = y | X_t = x].
P is non-negative and stochastic (∀x: Σ_y P(x,y) = 1).
Pr[X_t = y | X_0 = x] = P^t(x,y); writing p_x^t for the distribution of X_t started at x, p_x^t = p_x^0 · P^t.
Definition: π is a stationary distribution if πP = π.

20 Irreducibility
Definition: P is irreducible if for all x, y ∈ Ω there exists t such that P^t(x,y) > 0.

21 Aperiodicity
Definition: P is aperiodic if for all x ∈ Ω, gcd{ t : P^t(x,x) > 0 } = 1.

22 Note on Irreducibility and Aperiodicity
If P is irreducible, we can always make it aperiodic by adding "self-loops": P' = ½(P + I).
P' has the same stationary distribution as P. We call P' a "lazy" MC.
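The lazy-chain construction can be checked numerically; this is a minimal sketch using NumPy, and the 3-state path chain is an illustrative assumption (it is irreducible but has period 2).

```python
import numpy as np

# Toy transition matrix (an assumption): random walk on the path 0-1-2.
# Irreducible but periodic (period 2), so power iteration on P itself oscillates.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

P_lazy = 0.5 * (P + np.eye(3))   # the "lazy" chain P' = 1/2 (P + I)

# Power iteration converges for the lazy (aperiodic) chain.
pi = np.ones(3) / 3
for _ in range(1000):
    pi = pi @ P_lazy
pi /= pi.sum()

# pi is stationary for the original P as well: pi P = pi.
```

The key point is that πP' = ½(πP + π) = π exactly when πP = π, so laziness changes convergence behavior but not the target distribution.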

23 Fundamental Theorem
Theorem: If P is irreducible and aperiodic, then it is ergodic, i.e. ∀x, y: P^t(x,y) → π(y) as t → ∞,
where π is the (unique) stationary distribution of P, i.e. πP = π.

24 Main Idea (The MCMC Paradigm)
An ergodic MC provides an effective algorithm for sampling from π.

25 Examples
1. Random walks on graphs
2. Ehrenfest urn
3. Card shuffling
4. Colorings of a graph
5. The Ising model

26 1. Random Walk on Undirected Graphs
At each node, choose a neighbor u.a.r. and jump to it.

27 Random Walk on an Undirected Graph
G = (V, E); Ω = V; P(x,y) = 1/d(x) for (x,y) ∈ E, where d(x) is the degree of x.
Irreducible ⇔ G is connected.
Aperiodic ⇔ G is not bipartite.

28 Random Walk: The Stationary Distribution
Claim: If G is connected and not bipartite, then the probability distribution induced by a random walk on it converges to π(x) = d(x)/Σ_y d(y) = d(x)/2|E|.
(Proof omitted.)
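The claim is easy to check empirically; the 4-vertex graph below (a triangle with a pendant vertex, connected and non-bipartite) is an illustrative assumption.

```python
import random

# Toy graph (an assumption): a triangle {0,1,2} with pendant vertex 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
two_E = sum(len(nbrs) for nbrs in adj.values())   # sum of degrees = 2|E| = 8

random.seed(1)
counts = {v: 0 for v in adj}
x, T = 0, 200_000
for _ in range(T):
    x = random.choice(adj[x])   # jump to a uniformly random neighbor
    counts[x] += 1

freq = {v: counts[v] / T for v in adj}
# freq[v] should approach pi(v) = d(v)/2|E|, i.e. (0.25, 0.25, 0.375, 0.125).
```

Note how vertex 2 (degree 3) is visited three times as often as vertex 3 (degree 1), exactly as π(x) ∝ d(x) predicts.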

29 2. Ehrenfest Urn
Two urns hold n balls in total: j in the first, n − j in the second.
Pick a ball u.a.r. and move it to the other urn.

30 2. Ehrenfest Urn (cont.)
X_t = number of balls in the first urn.
The MC is a non-uniform random walk on Ω = {0, 1, …, n}: from state j, move to j − 1 w.p. j/n and to j + 1 w.p. 1 − j/n.
Irreducible; periodic.
Stationary distribution: π(j) = C(n,j)/2^n, the Binomial(n, ½) distribution.
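A small simulation of the urn illustrates the Binomial(n, ½) stationary distribution; since the chain as stated is periodic, the sketch below runs its lazy variant (an assumption, per the earlier lazy-chain note) and checks the stationary mean n/2.

```python
import random

n = 10
random.seed(2)
j = 0                                  # balls currently in the first urn
total, steps = 0, 100_000
for _ in range(steps):
    if random.random() < 0.5:          # lazy step (assumption: breaks periodicity)
        if random.random() < j / n:    # the picked ball sits in the first urn
            j -= 1
        else:
            j += 1
    total += j

mean = total / steps                   # should be close to n/2 = 5
```

The long-run average hugs n/2 even though the walk started from the extreme state j = 0, reflecting the strong pull of the binomial stationary distribution toward the middle.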

31 3. Card Shuffling
a) Top-in-at-random: take the top card and insert it at a position chosen u.a.r.
Irreducible.
Aperiodic.
P is doubly stochastic (∀y: Σ_x P(x,y) = 1) ⇒ π is uniform: ∀x, π(x) = 1/n!.

32 3. Card Shuffling
b) Random transpositions: pick two cards u.a.r. and swap them.
Irreducible.
Aperiodic.
P is symmetric (∀x, y: P(x,y) = P(y,x)) ⇒ π is uniform.

33 3. Card Shuffling
c) Riffle shuffle [Gilbert/Shannon/Reeds]: cut the deck into two parts and interleave them at random.

34 3. Card Shuffling (cont.)
c) Riffle shuffle [Gilbert/Shannon/Reeds]:
Irreducible.
Aperiodic.
P is doubly stochastic ⇒ π is uniform.

35 4. Colorings of a Graph
G = (V, E): connected, undirected.
q: number of colors.
Ω: set of proper q-colorings of G.
π: uniform.

36 Colorings Markov Chain
Pick v ∈ V and c ∈ {1, …, q} u.a.r.; recolor v with c if possible (i.e. if no neighbor of v has color c).
Irreducible if q ≥ Δ + 2, where Δ is G's max degree.
Aperiodic.
P is symmetric ⇒ π is uniform.

37 5. The Ising Model
n sites; Ω = {−,+}^n; w(x) = λ^(# aligned neighboring pairs in x).
Markov chain ("heat bath"): pick a site i u.a.r. and replace the spin x(i) by a random spin x'(i) drawn from the conditional distribution given the spins of i's neighbors.
Irreducible, aperiodic, reversible w.r.t. π ⇒ converges to π.
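A heat-bath step can be sketched as follows; the 1D chain with free boundaries, and the parameter `beta` standing in for 1/kT, are simplifying assumptions not fixed by the slide.

```python
import math
import random

def heat_bath_step(x, beta):
    """One heat-bath update on a 1D Ising chain with free boundaries.
    x: list of spins in {-1, +1}; beta plays the role of 1/kT (assumption)."""
    n = len(x)
    i = random.randrange(n)
    # Sum of the neighboring spins (0 or 1 neighbors at the boundary).
    m = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0)
    # New spin is + with the conditional (Gibbs) probability given the neighbors.
    p_plus = math.exp(beta * m) / (math.exp(beta * m) + math.exp(-beta * m))
    x[i] = 1 if random.random() < p_plus else -1

random.seed(3)
x = [random.choice([-1, 1]) for _ in range(20)]
for _ in range(5000):
    heat_bath_step(x, beta=0.5)   # x is now an (approximate) Gibbs sample
```

Because the new spin is drawn from the exact conditional distribution, each step satisfies detailed balance, which is the reversibility claimed on the slide.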

38 Designing Markov Chains
What do we want? Given π, an MC over Ω which converges to π.

39 The Metropolis Rule
Define any connected undirected graph on Ω (a "neighborhood structure", or set of "(local) moves").

40 The Metropolis Rule (cont.)
Transitions from state x ∈ Ω:
- pick a neighbor y of x w.p. α(x,y);
- move to y w.p. min{w(y)/w(x), 1} (else stay at x);
where α(x,y) = α(y,x) and α(x,x) = 1 − Σ_{y≠x} α(x,y).
Irreducible.
Aperiodic (make lazy if necessary).
Reversible w.r.t. w ⇒ converges to π.
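The Metropolis rule translates almost line for line into code. The target weights w(x) = x + 1 on {0,…,4} and the 5-cycle neighborhood structure below are a hypothetical toy example; since each neighbor is proposed w.p. ½, the proposal is symmetric as the rule requires.

```python
import random

def metropolis_step(x, w, neighbors):
    """One Metropolis move: propose a uniform neighbor y of x,
    accept with probability min(w(y)/w(x), 1), else stay at x."""
    y = random.choice(neighbors(x))
    return y if random.random() < min(w(y) / w(x), 1.0) else x

# Hypothetical toy target: w(x) = x + 1 on {0,...,4}; moves along the 5-cycle.
w = lambda x: x + 1
neighbors = lambda x: [(x - 1) % 5, (x + 1) % 5]

random.seed(4)
counts = [0] * 5
x, T = 0, 100_000
for _ in range(T):
    x = metropolis_step(x, w, neighbors)
    counts[x] += 1
# counts[i]/T should approach w(i)/Z with Z = 1+2+3+4+5 = 15.
```

Note that only the ratio w(y)/w(x) is ever evaluated, so the unknown normalization factor Z never needs to be computed; this is the whole appeal of the rule.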

41 The Mixing Time
Key question: how long until p_x^t looks like π?
We will use the variation distance: ||p − q|| = ½ Σ_x |p(x) − q(x)|.

42 The Mixing Time (cont.)
Define:
- Δ_x(t) = ||p_x^t − π||
- Δ(t) = max_x Δ_x(t)
The mixing time is τ_mix = min{ t : Δ(t) ≤ 1/2e }.
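The variation distance is straightforward to compute for distributions represented as dictionaries; a minimal sketch:

```python
def variation_distance(p, q):
    """Total variation distance ||p - q|| = 1/2 * sum_x |p(x) - q(x)|,
    for distributions given as dicts mapping points to probabilities."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

p = {'a': 0.5, 'b': 0.5}
q = {'a': 0.75, 'b': 0.25}
d = variation_distance(p, q)   # 0.5 * (0.25 + 0.25) = 0.25
```

The factor ½ normalizes the distance into [0, 1], which is why the 1/2e threshold in the definition of τ_mix is a meaningful constant.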

43 Toy Example: Top-In-At-Random
Let T = the time just after the card initially at the bottom reaches the top.
T is a strong stationary time, i.e. Pr[X_t = x | T = t] = π(x).
Claim: Δ(t) ≤ Pr[T > t].
Thus, it remains to estimate T.

44 The Coupon Collector Problem
Each pack contains one coupon, and the goal is to collect the whole series.
How many packs must we buy?

45 The Coupon Collector Problem (cont.)
N - total number of different coupons.
X_i - number of additional packs needed to get the i-th distinct coupon; E[X_i] = N/(N − i + 1).
So E[T] = Σ_i E[X_i] = N·H_N ≈ N ln N.

46 Toy Example: Top-In-At-Random (cont.)
By the coupon collector argument (the i-th coupon is a "ticket" to advance from the (n − i + 1)-th level to the next one):
Pr[T > n ln n + cn] ≤ e^(−c) ⇒ τ_mix ≤ n ln n + O(n).
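The coupon-collector estimate behind this bound is easy to check by simulation; the choice n = 52 (a card deck) and the trial count below are arbitrary assumptions.

```python
import random

def coupon_collector_time(n):
    """Number of packs bought until all n distinct coupons have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

random.seed(5)
n, trials = 52, 2000
avg = sum(coupon_collector_time(n) for _ in range(trials)) / trials
H_n = sum(1 / i for i in range(1, n + 1))
# E[T] = n * H_n, about n ln n; for n = 52 this is roughly 236 packs.
```

The empirical average lands close to n·H_n, matching the n ln n + O(n) scale of the mixing-time bound above.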

47 Example: Riffle Shuffle

48 Example: Riffle Shuffle (cont.)
Inverse shuffle (same mixing time): label each card with a 0/1 bit u.a.r., then sort the cards stably by label.

49 Inverse Shuffle
After t steps, each card is labeled with t bits, and the cards are sorted by their labels.
Cards with different labels are in random order; cards with the same label are in their original order.
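An inverse riffle step is short to implement; the 8-card deck and the number of rounds below are arbitrary choices for illustration.

```python
import random

def inverse_riffle(deck):
    """One inverse riffle step: label each card with a uniform 0/1 bit,
    then stably move the 0-labelled cards above the 1-labelled ones."""
    bits = [random.randrange(2) for _ in deck]
    return ([c for c, b in zip(deck, bits) if b == 0] +
            [c for c, b in zip(deck, bits) if b == 1])

random.seed(6)
deck = list(range(8))
for _ in range(10):       # O(log n) steps suffice; extra steps do no harm
    deck = inverse_riffle(deck)
# deck is now a (near-)uniformly random permutation of 0..7
```

Concatenating the two list comprehensions is exactly the stable sort by label: within each label group the original order is preserved, as the slide describes.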

50 Riffle Shuffle (cont.)
Let T = time until all cards have distinct labels.
T is a strong stationary time.
Again we need to estimate T.

51 Birthday Paradox
Given k people, what is the probability that two of them have the same birthday?

52 Birthday Paradox (cont.)
k people, n days (n > k > 1).
The probability that all birthdays are distinct: Π_{i=0}^{k−1} (1 − i/n) ≈ e^(−k²/2n)
(the exponents form an arithmetic sum).

53 Riffle Shuffle (cont.)
By the birthday paradox: each card (1..n) picks a random label; there are 2^t possible labels; we want all labels to be distinct.
This happens w.h.p. once 2^t ≫ n² ⇒ τ_mix = O(log n).

54 General Techniques for Bounding the Mixing Time
Probabilistic: "coupling".
Combinatorial: "flows".
Geometric: "conductance".

55 Coupling

56 Mixing Time via Coupling
Let P be an ergodic MC. A coupling for P is a pair process (X_t, Y_t) s.t.:
X_t and Y_t are each (marginally) copies of P;
X_t = Y_t ⇒ X_{t+1} = Y_{t+1}.
Define T_xy = min{ t : X_t = Y_t | X_0 = x, Y_0 = y }.

57 Coupling Theorem
Theorem [Aldous et al.]: Δ(t) ≤ max_{x,y} Pr[T_xy > t].
Strategy: design a coupling that brings X and Y together fast.

58 1. Random Walk on the Cube
Ω = {0,1}^n; π is uniform.
Markov chain:
- pick a coordinate i ∈_R {1, …, n};
- pick a value b ∈_R {0,1};
- set x(i) = b.

59 Coupling for the Random Walk
Pick the same (i, b) for both X and Y.
T_xy ≤ time to hit all n coordinates.
By coupon collecting, Pr[T_xy > n ln n + cn] < e^(−c) ⇒ τ_mix ≤ n ln n + cn.
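This coupling can be simulated directly; in the sketch below states are bit tuples and n = 10 is an arbitrary choice, with the two copies started at opposite corners of the cube.

```python
import random

def coupled_walk(x, y, n):
    """Run the coupled walk on {0,1}^n: both copies use the same (i, b).
    Returns the coupling time T_xy (states are tuples of n bits)."""
    t = 0
    while x != y:
        i = random.randrange(n)        # same coordinate for both copies
        b = random.randrange(2)        # same new value for both copies
        x = x[:i] + (b,) + x[i + 1:]
        y = y[:i] + (b,) + y[i + 1:]
        t += 1
    return t

random.seed(7)
n = 10
T = coupled_walk(tuple([0] * n), tuple([1] * n), n)
# Once every coordinate has been picked at least once the copies agree,
# so T is bounded by the coupon-collector time (~ n ln n).
```

Each step makes the chosen coordinate agree in both copies and it never disagrees again, which is why hitting all n coordinates forces X_t = Y_t.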

60 Flow
Capacity of an edge e = (z, z'): C(e) = π(z)P(z, z').
A flow f routes π(x)π(y) units from x to y, for every pair x, y; f(e) denotes the flow along e.
Cost of f: ρ(f) = max_e { f(e)/C(e) }.
l(f): the length of the longest flow path (the "diameter" of the flow).

61 Flow Theorem
Theorem [Diaconis/Stroock, Jerrum/Sinclair]: For a lazy ergodic MC and any flow f,
τ_x(ε) ≤ 2·ρ(f)·l(f)·[ ln π(x)^(−1) + 2 ln ε^(−1) ].

62 1. Random Walk on the Cube
Ω = {0,1}^n, |Ω| = 2^n =: N; ∀x, π(x) = 1/N.
Flow f: route the π(x)π(y) units between each pair x, y evenly along all shortest paths x ~ y.
⇒ τ_mix ≤ const · ρ(f) · l(f) · ln π_min^(−1) = O(n³).

63 Conductance
The "bottleneck" of the chain.

64 Conductance (cont.)
For S ⊆ Ω with π(S) ≤ ½, let Φ_S = [ Σ_{x∈S, y∉S} π(x)P(x,y) ] / π(S); the conductance is Φ = min_S Φ_S.

65 Conductance Theorem
Theorem [Jerrum/Sinclair, Lawler/Sokal, Alon, Cheeger, …]: For a lazy reversible MC,
τ_x(ε) ≤ 2Φ^(−2)·[ ln π(x)^(−1) + ln ε^(−1) ].

66 1. Random Walk on the Cube (cont.)
The sketched cut S is (essentially) the worst one.
⇒ τ_mix = O(Φ^(−2) · log π_min^(−1)) = O(n³).

