Presentation on theme: "A tutorial on Markov Chain Monte Carlo". Presentation transcript:

1 A tutorial on Markov Chain Monte Carlo

2 Problem (e.g. Bayesian inference): estimate I = ∫ g(x) π(x) dx. If {X_i} form a Markov chain with stationary probability π, then I ≈ (1/N) Σ_{i=1}^{N} g(X_i).
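The estimator above can be sketched in Python (the later slides use MATLAB; this is an illustrative stand-in). Here independent draws from a standard normal π stand in for draws from a Markov chain with stationary distribution π, and g(x) = x², so the true value is E[X²] = 1.

```python
import random

def mc_estimate(g, draws):
    """Approximate I = ∫ g(x) π(x) dx by (1/N) Σ g(X_i) for draws X_i ~ π."""
    return sum(g(x) for x in draws) / len(draws)

rng = random.Random(0)
# Stand-in for a Markov chain with standard-normal stationary distribution:
draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
est = mc_estimate(lambda x: x * x, draws)  # estimates E[X^2] = 1
```

The whole point of MCMC is to produce the `draws` when π can only be evaluated up to a constant, not sampled directly.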

3 MCMC is then the problem of designing a Markov chain X_1, X_2, ..., X_N with a pre-specified stationary distribution π, so that the integral I can be accurately approximated in a reasonable amount of time.

4 Metropolis et al. (ca. 1953): from the current state x, propose y from G(y|x) and accept the move with probability p = min(1, π(y)/π(x)); with probability 1 − p, stay at x.

5 Theorem: Metropolis works for any proposal distribution G such that G(y|x) = G(x|y), provided the Markov chain is irreducible and aperiodic. Proof: π(x) min(1, π(y)/π(x)) G(y|x) is symmetric in x and y, so detailed balance holds. Note: it also works for a general G provided we change p a bit (Hastings' trick). [Figure: a chain with period 2 and a reducible chain, the cases the theorem excludes.]
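A minimal Python sketch of the Metropolis step above (hypothetical helper names, not from the slides): the target π is an unnormalized standard normal, and the symmetric proposal G is a Gaussian random walk, so G(y|x) = G(x|y) as the theorem requires.

```python
import math
import random

def metropolis(log_pi, x0, n_steps, step, rng):
    """Random-walk Metropolis: symmetric proposal, accept w.p. min(1, pi(y)/pi(x))."""
    x, lp = x0, log_pi(x0)
    out = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)          # symmetric proposal G(y|x)
        lq = log_pi(y)
        if math.log(rng.random()) < lq - lp:  # accept with prob min(1, pi(y)/pi(x))
            x, lp = y, lq
        out.append(x)                         # on rejection the chain stays at x
    return out

rng = random.Random(1)
# Unnormalized target: pi(x) ∝ exp(-x^2 / 2)
chain = metropolis(lambda x: -0.5 * x * x, 0.0, 50_000, 1.0, rng)
```

Note that `log_pi` only needs the target up to an additive constant: the normalizing constant cancels in the acceptance ratio.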

6 Gibbs sampler. Let f(w,z) be the joint density with conditionals u(w|z), v(z|w) and marginals g(w), h(z):

∫ u(w|z) h(z) dz = g(w),  ∫ v(z|w) g(w) dw = h(z)

Take X = (W, Z) a vector. To sample X it is sufficient to sample cyclically from the conditionals (W|z), (Z|w): the transition operator satisfies T(g,h) = (g,h), a fixed point! Gibbs is in fact a special case of Metropolis: take the exact conditionals as proposals; the acceptance probability is then 1, i.e. every proposed move is accepted.
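A Gibbs sweep for a toy case, as a Python sketch (hypothetical example, not from the slides): a standard bivariate normal with correlation ρ, where both exact conditionals are Gaussian, W | Z = z ~ N(ρz, 1 − ρ²) and symmetrically for Z | W.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_sweeps, rng):
    """Cyclic sampling from the exact conditionals of a standard bivariate normal."""
    w, z = 0.0, 0.0
    s = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    draws = []
    for _ in range(n_sweeps):
        w = rng.gauss(rho * z, s)   # W | Z = z  ~  N(rho*z, 1 - rho^2)
        z = rng.gauss(rho * w, s)   # Z | W = w  ~  N(rho*w, 1 - rho^2)
        draws.append((w, z))
    return draws

rng = random.Random(2)
draws = gibbs_bivariate_normal(0.8, 50_000, rng)
```

Every proposed move is accepted, as the slide says; the price is that strongly correlated components (ρ near 1) make the sweep mix slowly.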

7 Example: entropic inference on Gaussians. Likelihood, entropic prior ∝ exp(−α I(θ; θ′)) dθ, and the resulting entropic posterior, written in terms of the sufficient statistics. [Detailed formulas lost in transcription.]

8 The conditionals: μ | v is Gaussian; v | μ is generalized inverse Gaussian.

9 Gibbs+Metropolis (MATLAB)

% init posterior log likelihood
LL = ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
LL1s(1:Nchains,1) = LL;
for t=1:burnin
  % Gibbs step: mu has a Gaussian full conditional
  mu = normrnd((v*t1+a1m)./(n*v+a1), 1./(n*v+a1));
  % Metropolis step: v has a generalized inverse Gaussian conditional
  v = do_metropolis(v,Nmet,n3,beta,a2);
  LL1s(1:Nchains,t+1) = ...
      ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
end

function y = do_metropolis(v,Nmet,n3,t3,a2)
[Nchains,one] = size(v);
x = v; accept = 0; reject = 0;
lx = log(x);
lfx = (n3-1)*lx - t3*x - a2./x;    % target log density (up to a constant)
for t=1:Nmet
  y = gamrnd(n3,t3,Nchains,1);     % gamma proposal, one per chain
  ly = log(y);
  lfy = (n3-1)*ly - t3*y - a2./y;
  for c=1:Nchains
    if (lfy(c) > lfx(c)) || (rand(1,1) < exp(lfy(c)-lfx(c)))
      x(c) = y(c); lx(c) = ly(c); lfx(c) = lfy(c);
      accept = accept + 1;
    else
      reject = reject + 1;
    end
  end
end
y = x;                             % return the updated chains
end

10 Convergence: are we there yet? Looks OK after the second point.

11 Mixing is Good. Segregation is Bad!

12

13

14

15 The art of simulation:
- Run several chains
- Start at over-dispersed points
- Monitor the log lik.
- Monitor the serial correlations
- Monitor acceptance ratios
- Re-parameterize (to get approx. indep.)
- Re-block (Gibbs)
- Collapse (int. over other pars.)
- Run with troubled pars. fixed at reasonable vals.
- Monitor R-hat
- Monitor the mean of the score functions
- Monitor coalescence; use connections; become EXACT!
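One of the monitors above, R-hat (the Gelman-Rubin potential scale reduction), can be sketched in Python; `rhat` is a hypothetical helper using the standard between-chain / within-chain variance form.

```python
def rhat(chains):
    """Gelman-Rubin R-hat: sqrt(var_hat / W) for m equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # B: between-chain variance; W: average within-chain variance
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n   # pooled estimate of the posterior variance
    return (var_hat / W) ** 0.5
```

Values near 1 suggest the chains agree; this is only informative if the chains were started at over-dispersed points, as the slide prescribes.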

16

17 Get Connected! Unnormalized posteriors: q(θ|w) ∝ π(θ|w) (e.g. w = w(x) = vector of suff. stats.). Connect q(θ|w(0)) at t = 0 to q(θ|w(1)) at t = 1 along a path w(t), and let ν(θ, w(t)) be the tangent direction (the t-derivative of log q) along the path. Then

log(Z_1/Z_0) ≈ (1/N) Σ_j ν(θ_j, w(t_j)),

where t_j is uniform on [0,1] and θ_j is drawn from π(θ|w(t_j)): the estimate is the average tangent direction along the path. Choice of path is equivalent to choice of prior on [0,1]. The best (min. var.) prior (path) is generalized Jeffreys! Information geodesics are the best paths on the manifold of unnormalized posteriors. Easy paths: geometric, mixture, scale. Exact rejection constants are known along the mixture path!
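The identity above can be checked on a toy case in Python (a hypothetical example, not from the slides): zero-mean Gaussians q_t(x) = exp(−x²/2σ_t²) with σ_t linear in t, a scale path where log(Z_1/Z_0) = log(σ_1/σ_0) is known in closed form. Here exact draws from p_t replace the MCMC draws the slide assumes.

```python
import random

def path_sampling_log_ratio(sigma0, sigma1, n, rng):
    """Estimate log(Z1/Z0) = ∫₀¹ E_t[ d/dt log q_t(X) ] dt by Monte Carlo:
    draw t uniform on [0,1], X from p_t, and average the tangent direction."""
    total = 0.0
    dsigma = sigma1 - sigma0
    for _ in range(n):
        t = rng.random()
        s = sigma0 + t * dsigma           # sigma along the (scale) path
        x = rng.gauss(0.0, s)             # exact draw from p_t (MCMC in general)
        total += x * x * dsigma / s ** 3  # d/dt log q_t(x) = x^2 sigma'_t / sigma_t^3
    return total / n

rng = random.Random(4)
est = path_sampling_log_ratio(1.0, 2.0, 200_000, rng)  # true value: log 2
```

The choice of σ_t as a function of t is exactly the "prior on [0,1]" the slide refers to; a different schedule changes the variance of the estimate, not its target.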

18 The Present is trying to be Perfectly Exact

19 New Exact Math. Most Markov chains are iterations of random functions: let {f_θ : S → S} be a family of functions, and choose points θ_1, θ_2, ..., θ_n in Θ independently with some probability measure μ defined on Θ.

Forward iter.: X_0 = x_0, X_1 = f_{θ_1}(x_0), ..., X_{n+1} = f_{θ_{n+1}}(X_n) = (f_{θ_{n+1}} ∘ ... ∘ f_{θ_2} ∘ f_{θ_1})(x_0)

Backward iter.: Y_0 = x_0, Y_1 = f_{θ_1}(x_0), ..., Y_{n+1} = (f_{θ_1} ∘ f_{θ_2} ∘ ... ∘ f_{θ_{n+1}})(x_0)

X_n equals Y_n in distribution for each n, but as processes {X_n} ≠ {Y_n}.

E.g. let a < 1. Take S (the space of states) to be the real line, Θ = {+, −}, μ(+) = μ(−) = 1/2, and f_+(x) = ax + 1, f_−(x) = ax − 1. Then X_n = aX_{n−1} + e_n moves all over S, while Y_n, which keeps e_1 outermost, settles to a constant on S. (Corresponding frames have the same distribution, but the MOVIES are different.)
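The a < 1 example above, sketched in Python (hypothetical helpers): the same maps f_±(x) = ax ± 1 applied in forward versus backward order. The backward composition freezes to a constant because the later noise terms are multiplied by a, a², ..., while the forward one keeps moving.

```python
import random

def forward_iter(a, es, x0=0.0):
    """X_n = f_{e_n}(X_{n-1}): newest map applied outermost."""
    x = x0
    for e in es:
        x = a * x + e
    return x

def backward_iter(a, es, x0=0.0):
    """Y_n = (f_{e_1} ∘ f_{e_2} ∘ ... ∘ f_{e_n})(x0): newest map applied innermost."""
    x = x0
    for e in reversed(es):
        x = a * x + e
    return x

rng = random.Random(5)
es = [rng.choice([+1.0, -1.0]) for _ in range(60)]
```

For each fixed n the two iterations have the same distribution (one is the other with the e's relabeled), but as n grows `backward_iter(a, es[:n])` converges to a single draw from the stationary distribution, which is what makes exact sampling possible.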

20 Dead Leaves Simulation: forward ("looking down") vs. backward ("looking up"). http://www.warwick.ac.uk/statsdept/Staff/WSK/

21 Convergence: backward iterations Y_n = (f_{θ_1} ∘ f_{θ_2} ∘ ... ∘ f_{θ_n})(x_0) converge to a constant when the functions f_θ are contracting on average, i.e. when the expected log contraction rate is negative.

22 Propp & Wilson (www.dbwilson.com): coupling from the past. Run Gibbs from time t = −M up to t = 0 with the same random numbers; when the coupling is monotone (s ≤ t ⇒ f_θ(s) ≤ f_θ(t)) it is enough to track the top and bottom start states. Backward iterations are needed: the state at the first forward coalescence time is not distributed as π. [Figures: a perfectly equilibrated 2D Ising state at the critical temperature; a small chain that always coalesces at 0 first, BUT π(0) = 2/3, π(1) = 1/3.]
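Coupling from the past for a toy two-state chain with π(0) = 2/3, π(1) = 1/3 can be sketched in Python (a hypothetical transition matrix chosen to have that stationary distribution; not the slide's example). Note how the random numbers are reused as the start time moves further into the past, exactly as the slide requires.

```python
import random

# Two-state chain with stationary distribution (2/3, 1/3):
P = [[0.9, 0.1],
     [0.2, 0.8]]

def f(s, u):
    """Random map: one uniform u drives the update of every state at once."""
    return 0 if u < P[s][0] else 1

def cftp(rng):
    """Propp-Wilson: run all start states from time -T to 0, doubling T and
    reusing the random numbers, until everything has coalesced by time 0."""
    us = []  # us[k] = u_{-(k+1)}; reused as T doubles
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        x = [0, 1]                       # all start states at time -T
        for k in range(T - 1, -1, -1):   # apply u_{-T} first, u_{-1} last
            x = [f(s, us[k]) for s in x]
        if x[0] == x[1]:
            return x[0]                  # an exact draw from pi
        T *= 2

rng = random.Random(6)
draws = [cftp(rng) for _ in range(30_000)]
```

Returning the common value at time 0 (not at the first coalescence time) is what makes the draw exact; reporting the state where the chains first meet would be biased, as the slide's two-state example shows.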

23

24 Not Exactly! Yet.

