1
A tutorial on Markov Chain Monte Carlo
2
Problem: estimate I = ∫ g(x) π(x) dx (e.g. Bayesian inference). If {X_i} form a Markov chain with stationary probability π, then I ≈ (1/N) Σ_{i=1}^{N} g(X_i).
3
MCMC is then the problem of designing a Markov chain X_1, X_2, ..., X_N with a pre-specified stationary distribution π so that the integral I can be accurately approximated in a reasonable amount of time.
4
Metropolis et al. (circa 1953). From the current state x, propose y from G(y|x); move to y with probability p = min(1, π(y)/π(x)), and stay at x with probability 1-p. The reverse move from y back to x is accepted with the analogous probability p'.
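As a minimal illustration of the algorithm, here is a random-walk Metropolis sketch in MATLAB; the target exp(-x^4/4), the unit-Gaussian proposal, and the parameter values are assumptions made for this example, not taken from the slides.

N = 10000;                               % number of iterations
X = zeros(N,1);                          % stored chain
logpi = @(x) -x.^4/4;                    % unnormalized log target (assumed for illustration)
x = 0;                                   % starting point
for i = 1:N
    y = x + randn;                       % symmetric proposal: G(y|x) = G(x|y)
    if log(rand) < logpi(y) - logpi(x)   % accept with probability min(1, pi(y)/pi(x))
        x = y;
    end
    X(i) = x;
end
% after a burn-in period, X behaves like draws from pi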
5
Theorem: Metropolis works for any proposal distribution G such that G(y|x) = G(x|y), provided the Markov chain is irreducible and aperiodic. Proof sketch: π(x) min(1, π(y)/π(x)) G(y|x) is symmetric in x and y, so detailed balance holds with π as the stationary distribution. Note: it also works for general G provided we change p a bit (Hastings' trick). (Figure: examples of a periodic chain (period = 2) and a reducible chain, the cases the theorem excludes.)
6
Gibbs sampler. Take X = (W, Z) a vector. Let f(w,z) be the joint density with conditionals u(w|z), v(z|w) and marginals g(w), h(z), so that ∫ u(w|z) h(z) dz = g(w) and ∫ v(z|w) g(w) dw = h(z). To sample X it is sufficient to sample cyclically from the conditionals (W|z) and (Z|w): the pair of marginals is a fixed point of this update, T(g,h) = (g,h). Gibbs is in fact a special case of Metropolis: take the proposals to be the exact conditionals; then the acceptance probability equals 1, i.e. every proposed move is accepted.
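A minimal Gibbs sketch in MATLAB for a case where both conditionals are available exactly; the bivariate normal target with correlation rho = 0.8 is an assumed toy example, not the model from the slides.

rho = 0.8; N = 5000;                     % correlation and number of sweeps (assumed values)
W = zeros(N,1); Z = zeros(N,1);
w = 0; z = 0;
for i = 1:N
    w = rho*z + sqrt(1-rho^2)*randn;     % draw from the conditional (W | Z = z)
    z = rho*w + sqrt(1-rho^2)*randn;     % draw from the conditional (Z | W = w)
    W(i) = w; Z(i) = z;
end
% the empirical marginals of W and Z approach g and h (here both standard normal)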
7
Example: entropic inference on Gaussians. The likelihood, an entropic prior proportional to exp(-α I(θ : θ')) dθ, and the resulting entropic posterior, written in terms of the sufficient statistics.
8
The conditionals: the conditional of μ given v is Gaussian; the conditional of v given μ is generalized inverse Gaussian.
9
Gibbs + Metropolis (MATLAB)

% initial posterior log likelihood, one value per chain
LL = ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
LL1s(1:Nchains,1) = LL;
for t=1:burnin
    % Gibbs step: draw mu from its Gaussian conditional
    % (normrnd takes a standard deviation, hence the sqrt)
    mu = normrnd((v*t1+a1m)./(n*v+a1), sqrt(1./(n*v+a1)));
    % Metropolis step: update v given mu (the caller passes beta for the t3 argument)
    v = do_metropolis(v,Nmet,n3,beta,a2);
    LL1s(1:Nchains,t+1) = ...
        ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
end

function y = do_metropolis(v,Nmet,n3,t3,a2)
% Metropolis update of v, one chain per row, with a Gamma(n3,t3) proposal
[Nchains,one] = size(v);
x = v; accept = 0; reject = 0;
lx = log(x);
lfx = (n3-1)*lx - t3*x - a2./x;          % log target density at x (up to a constant)
for t=1:Nmet
    y = gamrnd(n3,t3,Nchains,1);         % propose a new value for every chain
    ly = log(y);
    lfy = (n3-1)*ly - t3*y - a2./y;
    for c=1:Nchains
        if (lfy(c) > lfx(c)) | (rand(1,1) < exp(lfy(c)-lfx(c)))
            x(c) = y(c); lx(c) = ly(c); lfx(c) = lfy(c);
            accept = accept+1;
        else
            reject = reject+1;
        end
    end
end
y = x;                                   % return the updated v values
end
10
Convergence: Are we there yet? Looks OK after the second point.
11
Mixing is Good. Segregation is Bad!
15
The art of simulation:
Run several chains.
Start at over-dispersed points.
Monitor the log likelihood.
Monitor the serial correlations.
Monitor acceptance ratios.
Re-parameterize (to get approximately independent parameters).
Re-block (Gibbs).
Collapse (integrate over the other parameters).
Run with troubled parameters fixed at reasonable values.
Monitor R-hat (see the sketch below).
Monitor the mean of the score functions.
Monitor coalescence, use connections, become EXACT!
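As a sketch of the R-hat diagnostic mentioned in the list above, here is a basic Gelman-Rubin computation in MATLAB; it assumes the draws of one scalar quantity are stored as a chains-by-samples matrix (for instance the LL1s array built by the earlier code).

function Rhat = gelman_rubin(draws)
% draws: m chains x n samples of one scalar quantity (e.g. the log posterior)
[~, n] = size(draws);                    % n = samples per chain
chainMean = mean(draws, 2);              % per-chain means
W = mean(var(draws, 0, 2));              % average within-chain variance
B = n * var(chainMean);                  % between-chain variance
varHat = (n-1)/n * W + B/n;              % pooled estimate of the posterior variance
Rhat = sqrt(varHat / W);                 % close to 1 when the chains agree
end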
17
Get Connected! Unnormalized posteriors q(θ|w) ∝ π(θ|w), e.g. w = w(x) = vector of sufficient statistics. Connect q(θ|w(0)) at t = 0 to q(θ|w(1)) at t = 1 through a path of unnormalized posteriors q(θ|w(t)). Let v(θ, w(t)) = d log q(θ|w(t))/dt be the tangent direction along the path; then
log(Z_1/Z_0) ≈ (1/N) Σ_j v(θ_j, w(t_j)),
where t_j is uniform on [0,1] and θ_j is drawn from π(θ|w(t_j)), i.e. the estimate is the average tangent direction along the path. The choice of path is equivalent to the choice of a prior on [0,1]; the best (minimum-variance) prior (path) is generalized Jeffreys! Information geodesics are the best paths on the manifold of unnormalized posteriors. Easy paths: geometric, mixture, scale. Exact rejection constants are known along the mixture path!
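A toy illustration of the estimate above, using a geometric path between two unnormalized one-dimensional Gaussians (an assumed example, chosen so that pi_t can be sampled exactly and the answer log(Z1/Z0) = log(s) can be checked); it sketches the averaging idea only, not the entropic-posterior computation from the earlier slides.

m = 3; s = 2;                            % q0(x) = exp(-x^2/2), q1(x) = exp(-(x-m)^2/(2*s^2))
N = 1e5;                                 % number of (t_j, x_j) pairs
t = rand(N,1);                           % t_j uniform on [0,1]
lam = (1-t) + t./s^2;                    % precision of pi_t along the geometric path q0^(1-t)*q1^t
mu = (t.*m./s^2) ./ lam;                 % mean of pi_t
x = mu + randn(N,1)./sqrt(lam);          % exact draws from pi_t (in general this is where MCMC comes in)
v = x.^2./2 - (x-m).^2./(2*s^2);         % tangent v = d log q_t(x)/dt = log q1(x) - log q0(x)
logZratio = mean(v);                     % estimate of log(Z1/Z0); the exact value here is log(s)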
18
The Present is trying to be Perfectly Exact
19
New Exact Math. Most MCs are iterations of random functions. Let {f_θ : θ in Θ} be a family of functions on the state space S. Choose θ_1, ..., θ_n independently from some probability measure on Θ and write f_k for f_{θ_k}.
Forward iteration: X_0 = x_0, X_1 = f_1(x_0), ..., X_n = f_n(X_{n-1}) = (f_n ∘ ... ∘ f_1)(x_0).
Backward iteration: Y_0 = x_0, Y_1 = f_1(x_0), ..., Y_n = (f_1 ∘ f_2 ∘ ... ∘ f_n)(x_0), i.e. each new function is composed on the inside.
X_n and Y_n have the same distribution for every n, but as processes {X_n} and {Y_n} are different.
E.g. let a < 1. Take S (the space of states) to be the real line, Θ = {+, -} with probability 1/2 each, and f_+(x) = a x + 1, f_-(x) = a x - 1. Writing e_k for the chosen sign, the forward chain X_n = a X_{n-1} + e_n keeps moving all over S, while the backward sequence Y_n = e_1 + a e_2 + ... + a^{n-1} e_n settles down to a constant on S (corresponding frames have the same distribution, but the MOVIES are different). See the sketch below.
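A small MATLAB sketch of this example (a = 0.5 and n = 50 are assumed values) that runs both iterations from the same random signs: the forward sequence keeps fluctuating while the backward sequence settles down.

a = 0.5; n = 50;                         % contraction factor and number of steps (assumed values)
e = 2*(rand(n,1) > 0.5) - 1;             % random signs e_1, ..., e_n (choose f_+ or f_-)
Xf = zeros(n,1); Xb = zeros(n,1);
xf = 0;
for k = 1:n
    xf = a*xf + e(k);                    % forward: the newest function is applied on the outside
    Xf(k) = xf;
    xb = 0;                              % backward: rebuild (f_1 o f_2 o ... o f_k)(x_0)
    for j = k:-1:1
        xb = a*xb + e(j);                % the newest function f_k acts first (innermost)
    end
    Xb(k) = xb;                          % equals e_1 + a*e_2 + ... + a^(k-1)*e_k
end
% Xf keeps moving all over the line; Xb converges; frame by frame the two have the same distribution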
20
Dead Leaves Simulation: forward (looking down) and backward (looking up). http://www.warwick.ac.uk/statsdept/Staff/WSK/
21
Convergence: the backward iterations Y_n = (f_1 ∘ f_2 ∘ ... ∘ f_n)(x_0) converge when the functions f_θ are contracting on average.
22
Propp & Wilson www.dbwilson.com Perfectly equilibrated 2D Ising state at critical T = 529K t = 0 t = -M Gibbs with the same random numbers s t f (s) f (t) 0 1 1.5 Need backward iterations. First time to coalescence is not distributed as ere chain always coalesces at 0 first BUT (0) = 2/3, (1) = 1/3
24
Not Exactly! Yet.