A tutorial on Markov Chain Monte Carlo

Problem: compute

    I = ∫ g(x) dx    (e.g., Bayesian inference).

If {X_i} form a Markov chain with stationary probability π, then

    I ≈ (1/N) Σ_{i=1}^{N} g(X_i) / π(X_i).

MCMC is then the problem of designing a Markov chain X_1, X_2, ..., X_N with a pre-specified stationary distribution π, so that the integral I can be accurately approximated in a reasonable amount of time.
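A minimal sketch (not from the slides) of the estimator above, in MATLAB like the code later in this tutorial. It assumes we can already sample from π; here π = N(0,1) (iid draws are the trivial Markov chain with stationary law π) and g(x) = x^2 exp(-x^2/2), whose integral is sqrt(2π).

% Monte Carlo estimate of I = integral of g, using samples from pi
N = 1e5;
X = randn(N,1);                        % X_i ~ pi = N(0,1)
g = X.^2 .* exp(-X.^2/2);              % integrand evaluated at X_i
p = exp(-X.^2/2) / sqrt(2*pi);         % stationary density pi(X_i)
I = mean(g ./ p);                      % (1/N) * sum of g(X_i)/pi(X_i)
fprintf('I = %.4f (exact %.4f)\n', I, sqrt(2*pi));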

Metropolis et al. (circa 1953): from the current state x, propose y from G(y|x) and accept the move with probability

    p = min(1, π(y)/π(x));

otherwise (with probability 1-p) the chain stays at x. (Diagram: from x, the chain jumps to a proposed y with probability p, to another proposal y' with probability p', and stays put with the remaining probability.)

Theorem: Metropolis works for any proposal distribution G such that G(y|x) = G(x|y), provided the resulting Markov chain is irreducible and aperiodic.

Proof: π(x) G(y|x) min(1, π(y)/π(x)) is symmetric in x and y, so the chain satisfies detailed balance with respect to π.

Note: it also works for a general G provided we change p a bit (Hastings' trick):

    p = min(1, [π(y) G(x|y)] / [π(x) G(y|x)]).

(Diagrams: a two-state chain with period 2; a reducible chain.)
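A minimal random-walk Metropolis sketch (not from the slides, with an illustrative target): the proposal G(y|x) = N(x, s^2) is symmetric, so the simple acceptance probability p = min(1, π(y)/π(x)) applies.

% Random-walk Metropolis targeting an unnormalized N(3,4) density
logpi = @(x) -0.5*(x-3).^2/4;          % log target, up to a constant
N = 1e4; s = 2.0;                      % chain length and proposal scale
x = zeros(N,1);                        % chain, started at 0
for t = 2:N
    y = x(t-1) + s*randn;              % symmetric proposal y ~ N(x, s^2)
    if log(rand) < logpi(y) - logpi(x(t-1))
        x(t) = y;                      % accept with prob min(1, pi(y)/pi(x))
    else
        x(t) = x(t-1);                 % reject: stay at x
    end
end
% after burn-in, mean(x(1001:end)) should be near 3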

Gibbs sampler. Let f(w,z) be a joint density with conditionals u(w|z), v(z|w) and marginals g(w), h(z):

    ∫ u(w|z) h(z) dz = g(w)
    ∫ v(z|w) g(w) dw = h(z)

so T(g,h) = (g,h): the pair of marginals is a fixed point. Take X = (W,Z) a vector. To sample X it is sufficient to sample cyclically from the conditionals (W|z) and (Z|w). Gibbs is in fact a special case of Metropolis: take the proposals to be the exact conditionals; then the acceptance probability is identically 1, i.e., every proposed move is accepted.
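A minimal Gibbs sketch (not from the slides): a bivariate normal with correlation rho, whose full conditionals are the textbook Gaussians W|Z=z ~ N(ρz, 1-ρ^2) and Z|W=w ~ N(ρw, 1-ρ^2), sampled cyclically as described above.

% Gibbs sampler for a standard bivariate normal with correlation rho
rho = 0.8; N = 1e4;
W = zeros(N,1); Z = zeros(N,1);
for t = 2:N
    W(t) = rho*Z(t-1) + sqrt(1-rho^2)*randn;   % draw from (W | Z = z)
    Z(t) = rho*W(t)   + sqrt(1-rho^2)*randn;   % draw from (Z | W = w)
end
% after burn-in, corr(W(1001:end), Z(1001:end)) should be near rho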

Example: entropic inference on Gaussians. Gaussian likelihood; entropic prior proportional to exp(-α I(θ : θ')) dθ; the entropic posterior is then written in terms of the sufficient statistics.

The conditionals: (μ | v) is Gaussian; (v | μ) is generalized inverse Gaussian.

Gibbs+Metropolis (MATLAB):

% init posterior log-likelihood, one entry per chain
LL = ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
LL1s(1:Nchains,1) = LL;
for t = 1:burnin
    % Gibbs step: mu has a gaussian full conditional
    % (normrnd takes a standard deviation: 1/sqrt(precision))
    mu = normrnd((v*t1+a1m)./(n*v+a1), sqrt(1./(n*v+a1)));
    % Metropolis step for v (generalized inverse gaussian conditional)
    v = do_metropolis(v,Nmet,n3,beta,a2);
    % record the posterior log-likelihood of each chain
    LL1s(1:Nchains,t+1) = ...
        ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
end

function y = do_metropolis(v,Nmet,n3,t3,a2)
% Metropolis for v, run in parallel over the chains, with independent
% gamma proposals
[Nchains,one] = size(v);
x = v; accept = 0; reject = 0;
lx = log(x);
lfx = (n3-1)*lx - t3*x - a2./x;          % log target, up to a constant
for t = 1:Nmet
    y = gamrnd(n3,t3,Nchains,1);         % propose from a gamma
    ly = log(y);
    lfy = (n3-1)*ly - t3*y - a2./y;
    for c = 1:Nchains
        if (lfy(c) > lfx(c)) || (rand(1,1) < exp(lfy(c)-lfx(c)))
            x(c) = y(c); lx(c) = ly(c); lfx(c) = lfy(c);
            accept = accept + 1;
        else
            reject = reject + 1;
        end
    end
end
y = x;   % return the current (accepted) state of each chain

Convergence: are we there yet? (Trace plots of the per-chain log-likelihoods; they look OK after the second point.)

Mixing is good; segregation is bad! (Healthy chains wander over the whole support; chains stuck in separate regions have not converged.)

The practice of simulation:
- Run several chains
- Start at over-dispersed points
- Monitor the log-likelihood
- Monitor the serial correlations
- Monitor acceptance ratios
- Re-parameterize (to get approximate independence)
- Re-block (Gibbs)
- Collapse (integrate over other parameters)
- Run with troubled parameters fixed at reasonable values
- Monitor R-hat (a sketch follows this list)
- Monitor means of score functions
- Monitor coalescence, use connections, become EXACT!
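A minimal sketch (not from the slides) of the Gelman-Rubin R-hat mentioned above; the function name and interface are illustrative. It assumes a matrix of per-chain traces, such as LL1s (Nchains rows), with burn-in already discarded.

function Rhat = gelman_rubin(traces)
% Gelman-Rubin R-hat from an M-chains-by-N-iterations matrix of traces
[M, N] = size(traces);                 % M chains, N iterations each
means = mean(traces, 2);               % per-chain means
W = mean(var(traces, 0, 2));           % average within-chain variance
B = N * var(means);                    % between-chain variance
V = (N-1)/N * W + B/N;                 % pooled variance estimate
Rhat = sqrt(V / W);                    % near 1 when the chains agree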

Get Connected! Unnormalized posteriors: q(θ|w) ∝ π(θ|w) (e.g., w = w(x) = vector of sufficient statistics). Connect q(θ|w(0)) at t = 0 to q(θ|w(1)) at t = 1 by a path q(θ|w(t)). With tangent φ(θ, w(t)) = Σ_k v_k(t) φ_k(θ, w(t)),

    log(Z_1/Z_0) ≈ (1/N) Σ_j φ(θ_j, w(t_j)),

where the t_j are uniform on [0,1] and θ_j is drawn from π(θ|w(t_j)): φ is the average tangent direction along the path. The choice of path is equivalent to a choice of prior on [0,1]. The best (minimum-variance) prior (path) is generalized Jeffreys! Information geodesics are the best paths on the manifold of unnormalized posteriors. Easy paths: geometric, mixture, scale. Exact rejection constants are known along the mixture path!
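A minimal path-sampling sketch (not from the slides), under illustrative assumptions: the geometric path q_t = q_0^(1-t) q_1^t between the unnormalized Gaussian kernels q_0(θ) = exp(-θ^2/2) and q_1(θ) = exp(-(θ-m)^2/(2s^2)). Along this path the tangent is φ = d/dt log q_t = log(q_1/q_0), each q_t is itself Gaussian so the θ_j can be drawn exactly, and the true answer is log(Z_1/Z_0) = log(s).

% Path sampling of log(Z1/Z0) along the geometric path between two Gaussians
m = 2; s = 3; N = 1e5;
t   = rand(N,1);                       % t_j uniform on [0,1]
lam = (1-t) + t/s^2;                   % precision of the bridge q_t
mu  = (t*m/s^2) ./ lam;                % mean of the bridge q_t
th  = mu + randn(N,1)./sqrt(lam);      % theta_j ~ pi(theta | w(t_j))
phi = -(th-m).^2/(2*s^2) + th.^2/2;    % log q1 - log q0 = d/dt log q_t
fprintf('log(Z1/Z0) = %.4f (exact %.4f)\n', mean(phi), log(s));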

The Present is trying to be Perfectly Exact

New Exact Math. Most MCs are iterations of random functions. Let {f_θ : θ ∈ Θ} be a family of functions on the state space S, and choose θ_1, θ_2, ..., θ_n in Θ independently with some probability measure μ defined on Θ.

Forward iteration:  X_0 = x_0, X_1 = f_{θ_1}(x_0), ..., X_{n+1} = f_{θ_{n+1}}(X_n) = (f_{θ_{n+1}} ∘ ... ∘ f_{θ_2} ∘ f_{θ_1})(x_0)
Backward iteration: Y_0 = x_0, Y_1 = f_{θ_1}(x_0), ..., Y_{n+1} = (f_{θ_1} ∘ f_{θ_2} ∘ ... ∘ f_{θ_{n+1}})(x_0)

X_n = Y_n in distribution for each n, but as processes {X_n} ≠ {Y_n}.

E.g., let a < 1. Take S (the space of states) to be the real line, Θ = {+,-}, μ(+) = μ(-) = 1/2, and f_+(x) = a x + 1, f_-(x) = a x - 1. Forward, X_n = a X_{n-1} + e_n moves all over S; backward, Y_n = a Y_{n-1} + e_1 settles down to a constant on S. (Corresponding frames have the same distribution, but the MOVIES are different.)
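A minimal sketch (not from the slides) of the a x ± 1 example: the same random signs e_1, ..., e_n are composed forward and backward, so corresponding frames have the same distribution, yet the forward path keeps moving while the backward path freezes.

% Forward vs backward iteration of f_+(x) = a*x + 1, f_-(x) = a*x - 1
a = 0.9; n = 200; x0 = 0;
e = 2*(rand(n,1) > 0.5) - 1;           % e_k = +1 or -1, each with prob 1/2
X = zeros(n,1); Y = zeros(n,1);
x = x0;
for k = 1:n                            % forward: newest sign applied last
    x = a*x + e(k);
    X(k) = x;
end
for k = 1:n                            % backward: f_{e_1} o ... o f_{e_k}
    y = x0;
    for j = k:-1:1                     % apply e_k innermost, e_1 outermost
        y = a*y + e(j);
    end
    Y(k) = y;
end
plot(1:n, X, 1:n, Y);                  % X fluctuates; Y settles to a constant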

Dead leaves simulation. (Figures: forward iteration, looking down; backward iteration, looking up.)

Convergence: the backward iterates Y_{n+1} = (f_{θ_1} ∘ f_{θ_2} ∘ ... ∘ f_{θ_{n+1}})(x_0) converge (to a limit that does not depend on x_0) when the functions f_θ are contracting on average.

Propp & Wilson Perfectly equilibrated 2D Ising state at critical T = 529K t = 0 t = -M Gibbs with the same random numbers s   t  f  (s)  f  (t) Need backward iterations. First time to coalescence is not distributed as   ere chain always coalesces at 0 first BUT  (0) = 2/3,  (1) = 1/3

Not Exactly! Yet.