How many iterations in the Gibbs sampler? Adrian E. Raftery and Steven Lewis (September, 1991) Duke University Machine Learning Group Presented by Iulian Pruteanu.

How many iterations in the Gibbs sampler? Adrian E. Raftery and Steven Lewis (September, 1991) Duke University Machine Learning Group Presented by Iulian Pruteanu 11/17/2006

Outline
- Introduction
- How many iterations to estimate a posterior quantile?
- Extensions
- Examples

Introduction
- There is no guarantee, no matter how long the MCMC algorithm is run, that it will converge to the posterior distribution.
- Diagnostic statistics can identify problems with convergence, but cannot "prove" that convergence has occurred.
- One long run or many short runs?
- The Gibbs sampler can be extremely computationally demanding, even for relatively small-scale problems.

One long run or many short runs?
- Short runs: (a) choose a starting point; (b) run the Gibbs sampler for N iterations and store only the last iterate; (c) return to (a).
- One long run may well be more efficient: the starting point of each subsequence of length N is closer to a draw from the stationary distribution than in the short-runs case.
- It is still important to use several different starting points, since we do not know whether a single run has converged.
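As a concrete illustration (not from the paper), here is a minimal "one long run" Gibbs sampler for a bivariate normal with correlation rho, whose full conditionals are x | y ~ N(rho*y, 1-rho^2) and symmetrically for y. The value of rho, the chain length and the burn-in are all illustrative choices.

```python
import numpy as np

def gibbs_bivariate_normal(n_iter, rho=0.9, seed=0):
    """One long Gibbs run for a standard bivariate normal with correlation rho.
    Alternately draws x | y ~ N(rho*y, 1-rho^2) and y | x ~ N(rho*x, 1-rho^2)."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0                      # arbitrary starting point
    s = np.sqrt(1.0 - rho**2)            # conditional standard deviation
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rho * y + s * rng.standard_normal()
        y = rho * x + s * rng.standard_normal()
        draws[t] = x, y
    return draws

# One long run: discard an initial burn-in, keep all remaining iterates.
long_run = gibbs_bivariate_normal(5000)[500:]
```

With rho = 0.9 the chain mixes slowly, which is exactly the situation where the question "how many iterations?" becomes pressing.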

Introduction
The Raftery-Lewis test:
1. Specify a particular quantile q of the distribution of interest, an accuracy ±r for the quantile, and a probability s of attaining that accuracy.
2. The test breaks the chain (coming from the Gibbs sampler) into a (1,0) sequence and generates a two-state Markov chain.
3. The test uses this sequence to estimate the transition probabilities, and then the number of additional burn-in iterations and the total chain length required to achieve the preset level of accuracy.
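Step 2 can be sketched as follows. For illustration, the threshold is taken to be the empirical q-quantile of the chain output; the function name and the white-noise stand-in for Gibbs output are assumptions, not from the paper.

```python
import numpy as np

def binarize_chain(u, q=0.025):
    """Break the chain output into a (1,0) sequence:
    Z_t = I(U_t <= u_q), with u_q the empirical q-quantile of the output."""
    u = np.asarray(u, dtype=float)
    u_q = np.quantile(u, q)              # estimate of the quantile of interest
    return (u <= u_q).astype(int)

# Illustrative use on white noise standing in for Gibbs sampler output.
z = binarize_chain(np.random.default_rng(1).standard_normal(10_000))
```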

The Raftery-Lewis test
- We want to estimate P(U <= u | data) = q to within ±r with probability s, where P(U <= u | data) is the posterior quantile we are looking to estimate.
- We calculate U_t for each iteration t and then form Z_t = I(U_t <= u).
- The problem is to determine M (initial burn-in iterations), N (further iterations) and k (step size for thinning).
- {Z_t} is a binary 0-1 process derived from a Markov chain, but is not itself a Markov chain; we form a new process {Z_t^(k)} by keeping every k-th value, Z_t^(k) = Z_{1+(t-1)k}.
- Assuming that {Z_t^(k)} is indeed a Markov chain, we determine M, N and k for it to approach stationarity.
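The transition probabilities of the thinned two-state process, alpha = P(0 -> 1) and beta = P(1 -> 0), can be estimated by counting transitions. This sketch and the toy input sequence are illustrative; the paper itself chooses k by testing whether the thinned sequence behaves as a first-order Markov chain.

```python
import numpy as np

def transition_probs(z, k=1):
    """Estimate alpha = P(0 -> 1) and beta = P(1 -> 0) for the process
    Z^(k) obtained by keeping every k-th value of the 0-1 sequence Z."""
    zk = np.asarray(z)[::k]
    prev, curr = zk[:-1], zk[1:]
    n0 = np.sum(prev == 0)               # transitions starting in state 0
    n1 = np.sum(prev == 1)               # transitions starting in state 1
    alpha = np.sum((prev == 0) & (curr == 1)) / max(n0, 1)
    beta = np.sum((prev == 1) & (curr == 0)) / max(n1, 1)
    return float(alpha), float(beta)
```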

The Raftery-Lewis test
- Let P = ( 1-alpha, alpha ; beta, 1-beta ) be the transition matrix for {Z_t^(k)}.
- The equilibrium distribution is then pi = (pi_0, pi_1) = (beta, alpha) / (alpha + beta).
- The l-step transition matrix is
  P^l = [ ( beta, alpha ; beta, alpha ) + lambda^l ( alpha, -alpha ; -beta, beta ) ] / (alpha + beta), where lambda = 1 - alpha - beta.
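The closed form for P^l can be checked numerically against repeated matrix multiplication; the values of alpha, beta and l below are illustrative.

```python
import numpy as np

alpha, beta = 0.2, 0.5                   # illustrative transition probabilities
lam = 1 - alpha - beta                   # the second eigenvalue of P
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Equilibrium distribution pi = (beta, alpha) / (alpha + beta).
pi = np.array([beta, alpha]) / (alpha + beta)

# Closed form for the l-step transition matrix.
l = 7
P_l = (np.array([[beta, alpha], [beta, alpha]])
       + lam**l * np.array([[alpha, -alpha], [-beta, beta]])) / (alpha + beta)
```

The first term is the rank-one limit (each row equal to pi); the second decays geometrically at rate lambda, which is what drives the burn-in calculation on the next slide.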

The Raftery-Lewis test
- For the burn-in, we require that P(Z_m^(k) = i | Z_0^(k) = j) be within epsilon of pi_i.
- From the l-step transition matrix, this holds when lambda^(m*) <= epsilon (alpha + beta) / max(alpha, beta),
- or m* >= log( epsilon (alpha + beta) / max(alpha, beta) ) / log(lambda);
- thus the burn-in is M = m* k iterations.
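A sketch of the burn-in computation from these formulas; epsilon, k and the (alpha, beta) values are illustrative, and abs(lambda) guards against alpha + beta > 1, where lambda is negative.

```python
from math import ceil, log

def burn_in(alpha, beta, eps=0.001, k=1):
    """Smallest m* with |lambda|**m* <= eps*(alpha+beta)/max(alpha, beta);
    the burn-in is M = m* k iterations of the original chain."""
    lam = 1 - alpha - beta
    m_star = log(eps * (alpha + beta) / max(alpha, beta)) / log(abs(lam))
    return ceil(m_star) * k

M = burn_in(0.2, 0.5)
```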

The Raftery-Lewis test
- The sample mean of the process is Zbar_n = (1/n) sum_{t=1}^{n} Z_t^(k), which by the central limit theorem is approximately normal with mean q and variance (1/n) alpha beta (2 - alpha - beta) / (alpha + beta)^3.
- So P(q - r <= Zbar_n <= q + r) >= s will be satisfied if n >= n* = [ alpha beta (2 - alpha - beta) / (alpha + beta)^3 ] [ Phi^{-1}((s+1)/2) / r ]^2; thus N = n* k further iterations are required.
- The minimum number of iterations, assuming independent iterates, is N_min = q (1 - q) [ Phi^{-1}((s+1)/2) / r ]^2.
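The required run length follows directly from these formulas; the standard-library NormalDist supplies Phi^{-1}, and the (alpha, beta) inputs are illustrative. With the defaults q = 0.025, r = 0.005 and s = 0.95, N_min comes out at 3746.

```python
from math import ceil
from statistics import NormalDist

def required_iterations(alpha, beta, q=0.025, r=0.005, s=0.95, k=1):
    """n* from the normal approximation to the mean of Z^(k), and
    N_min, the requirement if the iterates were independent."""
    phi_inv = NormalDist().inv_cdf((s + 1) / 2)    # Phi^{-1}((s+1)/2)
    factor = (phi_inv / r) ** 2
    n_star = alpha * beta * (2 - alpha - beta) / (alpha + beta) ** 3 * factor
    n_min = q * (1 - q) * factor
    return ceil(n_star) * k, ceil(n_min)

N, N_min = required_iterations(0.2, 0.5)
```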

Extensions
- Several quantiles: run the Gibbs sampler, apply the method to each quantile, and then use the maximum values of M, N and k.
- Independent iterates: when it is much more expensive to analyze a Gibbs iterate than to simulate it, it is desirable to have approximately independent Gibbs iterates (by making k big enough).
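Taking the maxima over several quantiles is a one-liner; the per-quantile (M, N, k) triples below are made-up illustrative values, not results from the paper.

```python
# Diagnostics (M, N, k) computed separately for each quantile of interest.
per_quantile = [(40, 3746, 1), (120, 11000, 2), (60, 5200, 1)]

# Use the maximum of each requirement across the quantiles.
M, N, k = (max(values) for values in zip(*per_quantile))
```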

Examples The method was applied to both simulated and real examples. The results are given for q = 0.025, r = 0.005 and s = 0.95.

Discussion
- For 'nice' posterior distributions, the desired accuracy can be achieved by running the sampler for 5,000 iterations and using all the iterates.
- When the posterior is not 'nice', the required number can be much greater.
- The required number of iterations can be dramatically different across problems, and even for different quantities of interest within the same problem.
