Bayesian Reasoning: Tempering & Sampling A/Prof Geraint F. Lewis Rm 560:

Lecture 7 Why does it work? With the Metropolis-Hastings algorithm, the desired posterior distribution (the stationary distribution of the Markov chain) is recovered for a wide range of proposal distributions. For this, the chain must have three properties: Irreducibility: given any starting point, the chain must be able to (eventually) reach all states in the posterior distribution. Aperiodicity: the chain must not oscillate between different states with a regular periodic motion (i.e. it must not get stuck in a cycle forever).

Lecture 7 Why does it work? Positive recurrence: this basically means that the stationary (posterior) distribution exists, such that if an initial value X_0 samples π(X), then all subsequent iterations will also sample π(X). These properties can be shown for Metropolis-Hastings, e.g.

π(X_t) p(X_{t+1} | X_t) = π(X_{t+1}) p(X_t | X_{t+1})

This is the detailed balance equation.
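As an illustration (not from the original slides), here is a minimal Metropolis-Hastings sketch in Python; the target log_posterior, the starting point, and the Gaussian proposal width are hypothetical placeholders. For a symmetric proposal the acceptance ratio π(y)/π(x) is exactly what makes the chain satisfy detailed balance.

```python
import numpy as np

def metropolis_hastings(log_posterior, x0, n_steps, proposal_sigma=1.0, rng=None):
    """Minimal Metropolis-Hastings sampler with a symmetric Gaussian proposal."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    logp = log_posterior(x)
    chain = np.empty((n_steps, x.size))
    for t in range(n_steps):
        # Propose a move; the proposal is symmetric, so q(y|x) = q(x|y) cancels in the ratio
        y = x + proposal_sigma * rng.standard_normal(x.size)
        logp_y = log_posterior(y)
        # Accept with probability min(1, pi(y)/pi(x)); this enforces detailed balance
        if np.log(rng.uniform()) < logp_y - logp:
            x, logp = y, logp_y
        chain[t] = x
    return chain

# Hypothetical target: a one-dimensional standard normal posterior
chain = metropolis_hastings(lambda x: -0.5 * np.sum(x**2), x0=[0.0], n_steps=5000)
```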

Lecture 7 Where can we go wrong? Our posterior distribution may be multi-modal, with several significant peaks. Given enough time, our MCMC walk through the probability space will eventually cover the entire volume. However, the walk may stay on one peak for a significant period before moving to the next. If we have only a certain amount of time (e.g. a three-year PhD), how can we ensure that we have appropriately sampled the space and that the MCMC chain truly reflects the underlying posterior distribution? If it does not, any properties we derive from the sample will be biased.

Lecture 7 Simulated Tempering The problem is similar to ensuring you find a global minimum in optimization problems; one approach, simulated annealing, allows a solution to “cool” into the global minimum. We can take a similar approach with our MCMC, heating up the posterior distribution (to make it flatter) and then cooling it down. When hotter, the MCMC can hop out of local regions of significant probability and explore more of the volume, then cool down again into regions of interest. We start with Bayes’ theorem, such that

p(θ | D, I) ∝ p(θ | I) p(D | θ, I).

Lecture 7 Simulated Tempering We can construct a flatter distribution through

p(θ | D, β, I) ∝ p(θ | I) p(D | θ, I)^β,  with 0 < β ≤ 1.

Typically, a discrete set of tempering parameters, β, is used, with β = 1 (the “cold sampler”) being the target distribution. We can “random walk” through the temperature and consider only those steps taken when β = 1 to represent our target distribution. However, parallel tempering provides a similar, but more efficient, approach to exploring the posterior distribution.
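A sketch of the tempered target in code (my own illustration, assuming user-supplied log_prior and log_likelihood functions for some model): at β = 1 this is the full posterior, while smaller β flattens the likelihood contribution.

```python
def log_tempered_posterior(theta, beta, log_prior, log_likelihood):
    """Tempered target: log p(theta|I) + beta * log p(D|theta,I).

    beta = 1 recovers the target posterior; beta -> 0 approaches the prior,
    which is much flatter and easier for the chain to traverse.
    """
    return log_prior(theta) + beta * log_likelihood(theta)
```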

Lecture 7 Parallel Tempering Parallel tempering uses a series of MCMC explorations of the posterior distribution, each at a different tempering parameter, β_i; those at high temperature will hop all over the space, while those at colder temperatures will take a more sedate walk. Typically, the temperatures are distributed over a ladder {β_1 = 1, β_2, …, β_n}. The goal of parallel tempering is to take parallel chains and consider swapping them. Suppose we choose a swap to take place on average once every n_s steps in the chain; the proposal to make a swap can be undertaken by choosing a uniform random number and considering a swap if U(0,1) ≤ 1/n_s. If we choose to swap, two adjacent chains are chosen, one at β_i in state X_{t,i}, and the other at β_{i+1} in state X_{t,i+1}.

Lecture 7 Parallel Tempering We can then choose to swap the two states with probability

r = [ p(D | X_{t,i+1}, I)^{β_i} p(D | X_{t,i}, I)^{β_{i+1}} ] / [ p(D | X_{t,i}, I)^{β_i} p(D | X_{t,i+1}, I)^{β_{i+1}} ],

by again selecting a uniform random number between 0 and 1 and choosing to swap if U(0,1) ≤ r. The swaps move information between the parallel chains at different temperatures. As ever, the choice of the β_i depends on experimentation and experience.
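The following is a minimal sketch of this scheme (my own, not from the slides), assuming a log_prior and log_likelihood for some model; the β ladder, the swap rate 1/n_swap, and the proposal width are illustrative choices. Note that the log of the swap ratio above simplifies to (β_i − β_{i+1})(ln L_{i+1} − ln L_i).

```python
import numpy as np

def parallel_tempering(log_prior, log_likelihood, x0, betas, n_steps,
                       proposal_sigma=1.0, n_swap=50, rng=None):
    """Parallel tempering with within-chain Metropolis updates and pairwise swaps."""
    rng = np.random.default_rng() if rng is None else rng
    betas = np.asarray(betas, dtype=float)           # e.g. [0.01, ..., 1.0], cold sampler last
    x = np.array([np.asarray(x0, dtype=float).copy() for _ in betas])
    loglike = np.array([log_likelihood(xi) for xi in x])
    cold_chain = []                                  # samples from the beta = 1 chain

    for t in range(n_steps):
        # Ordinary Metropolis step within each tempered chain
        for i, beta in enumerate(betas):
            y = x[i] + proposal_sigma * rng.standard_normal(x[i].size)
            loglike_y = log_likelihood(y)
            dlogp = (log_prior(y) - log_prior(x[i])) + beta * (loglike_y - loglike[i])
            if np.log(rng.uniform()) < dlogp:
                x[i], loglike[i] = y, loglike_y

        # Propose a swap on average once every n_swap steps
        if rng.uniform() < 1.0 / n_swap:
            i = rng.integers(len(betas) - 1)         # pick an adjacent pair (i, i+1)
            # log of the swap acceptance ratio r
            log_r = (betas[i] - betas[i + 1]) * (loglike[i + 1] - loglike[i])
            if np.log(rng.uniform()) < log_r:
                x[[i, i + 1]] = x[[i + 1, i]]
                loglike[[i, i + 1]] = loglike[[i + 1, i]]

        cold_chain.append(x[-1].copy())              # beta = 1 is the last rung
    return np.array(cold_chain)
```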

Lecture 7 An example Earlier, we examined the comparison between two models for some spectral data. Here, we look at the results of a Metropolis-Hastings and a parallel tempering analysis of this problem. To match the earlier analysis: A Jeffreys prior was used for T between 0.1 mK and 100 mK. A uniform prior was used for ν between channel 1 and 44. The proposal for both parameters was Gaussian with σ = 1.
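As an illustration only (the spectral model itself is not reproduced here), the priors and proposal described above might be coded as follows; the bounds and σ = 1 are taken from the slide, while the parameter names and normalizations are my own choices.

```python
import numpy as np

T_MIN, T_MAX = 0.1, 100.0      # Jeffreys prior bounds on T (mK)
NU_MIN, NU_MAX = 1.0, 44.0     # uniform prior bounds on the channel number

def log_prior(theta):
    """Jeffreys (1/T) prior on T, uniform prior on nu, both within their bounds."""
    T, nu = theta
    if not (T_MIN < T < T_MAX and NU_MIN < nu < NU_MAX):
        return -np.inf
    # Jeffreys: p(T) = 1 / (T ln(T_MAX/T_MIN)); uniform: p(nu) = 1 / (NU_MAX - NU_MIN)
    return -np.log(T) - np.log(np.log(T_MAX / T_MIN)) - np.log(NU_MAX - NU_MIN)

def propose(theta, rng, sigma=1.0):
    """Gaussian proposal with sigma = 1 in both parameters, as on the slide."""
    return theta + sigma * rng.standard_normal(2)
```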

Lecture 7 An example After a distinct burn-in, the chain wanders through the parameter space, but it clearly prefers T ~ 1 and ν ~ 38, although significant departures are apparent.

Lecture 7 An example However, it is interesting to examine the marginalized distributions compared to the numerical integration results obtained earlier. While the M-H approach has nicely recovered the distribution in T, and has captured the strong peak in ν, the chain has clearly failed to characterize the structure in the posterior at low channel numbers, not spending enough time in regions with ν < 30.

Lecture 7 An example Here is the β = 1 chain for the parallel tempering run (with five evenly-spaced values of β between 0.01 and 1, and swaps considered, on average, every 50 steps).

Lecture 7 An example The difference is quite apparent in the marginalized distributions. Again, T and the strong peak in ν are well characterized, but the application of parallel tempering has also well sampled channel numbers with ν < 30, better recovering the underlying distribution.

Lecture 7 Model Comparison Remember, to compare models and to deduce which is more probable, we calculate the odds ratio

O_12 = p(M_1 | D, I) / p(M_2 | D, I) = [ p(M_1 | I) / p(M_2 | I) ] × [ p(D | M_1, I) / p(D | M_2, I) ],

where the final term, B_12 = p(D | M_1, I) / p(D | M_2, I), is the Bayes factor. Suppose we have the same two competing models for the spectral line data, one with no free parameters (whose global likelihood can be calculated analytically), and the other which we have analyzed with parallel tempering. How do we calculate the Bayes factor for the latter?

Lecture 7 Model Comparison What we want to calculate is the global likelihood,

p(D | M_1, I) = ∫ p(θ | M_1, I) p(D | θ, M_1, I) dθ.

We can combine the information in the parallel tempering chains through the relation (read Chap 12.7)

ln[ p(D | M_1, I) ] = ∫ from 0 to 1 of ⟨ ln p(D | θ, M_1, I) ⟩_β dβ,

where ⟨ ln p(D | θ, M_1, I) ⟩_β is the average of the log-likelihood over the (post burn-in) samples of the chain run at tempering parameter β.
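A minimal sketch of this thermodynamic-integration step (my own illustration): given the stored log-likelihood values from each tempered chain, average them per β, interpolate with a spline (the slides use a Matlab spline; scipy is assumed here), and integrate over β from 0 to 1.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def log_global_likelihood(betas, loglike_chains):
    """Estimate ln p(D|M,I) as the integral over beta of <ln L>_beta.

    betas          : 1D array of tempering parameters (ideally spanning ~0 to 1)
    loglike_chains : list of 1D arrays, the post burn-in log-likelihood values
                     recorded by the chain run at each beta
    """
    betas = np.asarray(betas, dtype=float)
    mean_loglike = np.array([np.mean(ll) for ll in loglike_chains])  # <ln L>_beta
    order = np.argsort(betas)
    spline = CubicSpline(betas[order], mean_loglike[order])
    # Integrate the fitted spline over the full range 0 <= beta <= 1
    return spline.integrate(0.0, 1.0)
```

The Bayes factor for two such models then follows as B_12 = exp( ln p(D | M_1, I) − ln p(D | M_2, I) ).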

Lecture 7 Model Comparison Here are the results for the analysis of the spectral line model. There are only five points in β, and so we need to interpolate between the points (this is a Matlab spline). Of course, we would prefer more samples in β. The result of the integral yields ln[ p(D | M_1, I) ] = …, with a resultant Bayes factor of B_12 = 1.04 (similar to the result obtained earlier from the analytic calculation).

Lecture 7 Nested Sampling There are other ways to analyze the posterior and the likelihood space (with more efficient and faster approaches). One of these, nested sampling, iteratively re-samples the space and slices it into regions of likelihood; Brendon will discuss this in more detail in his final lecture.