Computing the Posterior Probability

Computing the Posterior Probability
- The posterior probability distribution contains the complete information about the parameters, but we often need to integrate it.
- E.g., normalizing the posterior requires an n-dimensional integral for an n-parameter posterior.
- The posterior can rarely be integrated analytically.
- Numerical integration is also difficult when n is more than a few.
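
A quick illustration of why brute-force numerical integration breaks down: on a regular grid with m points per parameter, normalizing the posterior costs m^n evaluations. The sketch below is plain Python; the grid size is arbitrary and purely illustrative.

```python
# Cost of normalizing an n-parameter posterior by brute-force quadrature:
# a regular grid with m points per axis needs m**n posterior evaluations.
m = 100                      # grid points per parameter (illustrative)
for n in (1, 2, 4, 8, 12):
    print(f"n = {n:2d}: {m**n:.1e} posterior evaluations")
# grows from 1e2 at n = 1 to 1e24 at n = 12 -- hopeless beyond a few parameters
```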

Markov-Chain Monte Carlo
- Instead of integrating, sample from the posterior in a way that yields a set of parameter values distributed identically to the posterior.
- The histogram of chain values for a parameter is a visual representation of the (marginalized) probability distribution for that parameter.
- Confidence intervals can then be computed easily:
  1. Sum the histogram from the best-fit value (often the peak of the histogram) outward in both directions.
  2. Stop when x% of the values have been summed, for an x% confidence interval.
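
A minimal sketch of that histogram-summing recipe in Python/NumPy; the function name, bin count, and the toy Gaussian chain are illustrative, not part of any particular package.

```python
import numpy as np

def chain_interval(samples, best_fit, level=0.68, bins=200):
    """Confidence interval from an MCMC chain: sum the histogram outward
    from the bin containing the best-fit value until `level` is enclosed."""
    counts, edges = np.histogram(samples, bins=bins)
    lo = hi = np.searchsorted(edges, best_fit) - 1   # bin holding the best fit
    covered, total = counts[lo], counts.sum()
    while covered < level * total:
        # grow toward whichever neighboring bin adds more probability
        left = counts[lo - 1] if lo > 0 else -1
        right = counts[hi + 1] if hi < len(counts) - 1 else -1
        if left >= right:
            lo -= 1
            covered += counts[lo]
        else:
            hi += 1
            covered += counts[hi]
    return edges[lo], edges[hi + 1]

# toy chain that happens to be Gaussian: the 68% interval should be ~ +/- 0.3
chain = np.random.normal(loc=1.5, scale=0.3, size=50_000)
print(chain_interval(chain, best_fit=1.5))
```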

MCMC Algorithms
- MCMC is similar to a random-walk approach.
- The most common techniques are Metropolis-Hastings and Gibbs sampling (see the Wikipedia entries on these).
- Metropolis-Hastings is the simplest algorithm, since it only requires evaluations of the posterior for a given set of parameter values.

Gibbs Sampling
1. Derive the conditional probability of each parameter given the values of the other parameters.
2. Pick a parameter at random.
3. Draw from the conditional probability of that parameter given the most recent values of all the other parameters.
4. Repeat until the chain converges.
- A good approach is to form "hierarchical" models that result in simple conditional probabilities.
- Example: for a two-variable posterior p(x, y), each MCMC draw i is
  x_i ~ p(x | y = y_{i-1})
  y_i ~ p(y | x = x_i)
  where "~" means "distributed as".
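
As a concrete sketch of the two-variable example above, here is a toy Gibbs sampler (Python/NumPy) for a bivariate normal posterior with correlation ρ, where each conditional is a 1-D normal in closed form; the function name and numbers are illustrative only.

```python
import numpy as np

def gibbs_bivariate_normal(n_steps=20_000, rho=0.8, seed=0):
    """Gibbs sampler for a toy posterior p(x, y): a bivariate normal with
    zero means, unit variances, and correlation rho, whose conditionals
    p(x|y) and p(y|x) are 1-D normals known in closed form."""
    rng = np.random.default_rng(seed)
    x = y = 0.0                            # arbitrary starting point
    cond_sd = np.sqrt(1.0 - rho ** 2)      # st. dev. of each conditional
    chain = np.empty((n_steps, 2))
    for i in range(n_steps):
        x = rng.normal(rho * y, cond_sd)   # x_i ~ p(x | y = y_{i-1})
        y = rng.normal(rho * x, cond_sd)   # y_i ~ p(y | x = x_i)
        chain[i] = x, y
    return chain

chain = gibbs_bivariate_normal()
print(np.corrcoef(chain[5_000:].T))        # off-diagonal should be close to rho
```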

Metropolis-Hastings
- Can be visualized as similar to the rejection method of random number generation.
- Use a "proposal" distribution that is similar in shape (especially width) to the expected posterior distribution to generate new parameter values.
- Accept the new step when the probability of the new values is higher; occasionally accept the new step otherwise, so the chain can also move "uphill" in the fit statistic and avoid getting trapped in relative minima.

M-H with a Gaussian Proposal Distribution
1. For each parameter θ, estimate/determine a "step" size σ.
2. For each chain iteration, propose a new value θ' = θ_{i-1} + N(σ), where N(σ) is a normal deviate with standard deviation σ.
3. If p(θ') / p(θ_{i-1}) > 1, set θ_i = θ'.
4. If p(θ') / p(θ_{i-1}) < 1, accept θ_i = θ' with probability p(θ') / p(θ_{i-1}); otherwise set θ_i = θ_{i-1}.
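
Below is a minimal Python sketch of the algorithm above for a parameter vector with independent Gaussian proposals; it works with the log posterior for numerical stability. The toy two-parameter posterior and step sizes are made up for illustration, and this is not the XSPEC implementation.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, step_sizes, n_steps=50_000, seed=0):
    """Random-walk Metropolis-Hastings with an independent Gaussian proposal
    per parameter; log_post returns the log of the (unnormalized) posterior."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    n_accept = 0
    for i in range(n_steps):
        proposal = theta + rng.normal(0.0, step_sizes)   # theta' = theta_{i-1} + N(sigma)
        logp_new = log_post(proposal)
        # accept with probability min(1, p(theta')/p(theta_{i-1})), done in log space
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
            n_accept += 1
        chain[i] = theta
    return chain, n_accept / n_steps

# toy posterior: two independent Gaussians, means (1, -2), st. dev. (0.5, 2)
log_post = lambda t: -0.5 * (((t[0] - 1.0) / 0.5) ** 2 + ((t[1] + 2.0) / 2.0) ** 2)
chain, acc = metropolis_hastings(log_post, theta0=[0.0, 0.0], step_sizes=[0.5, 2.0])
print(acc, chain[10_000:].mean(axis=0))    # acceptance fraction and posterior means
```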

M-H Issues
- M-H can be very slow to converge, especially when there are correlated variables. Two remedies:
  - Use multivariate proposal distributions (done in the XSPEC approach).
  - Transform the correlated variables: if x and y are correlated, instead set y = ax + b, fit for x, a, and b, and compute y from the chain values afterwards (it may also be necessary to fix a or b with a tight prior).
- Convergence: run multiple chains and compute convergence statistics (see the sketch below).
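
One common multi-chain convergence statistic is the Gelman-Rubin R-hat; here is a small sketch under the assumption that each chain is stored as a row of a 2-D array (the function name is ours, not from any particular package).

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat for a single parameter, given an array of shape
    (n_chains, n_samples) holding chains started from dispersed points."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled posterior-variance estimate
    return np.sqrt(var_hat / W)                # approaches 1 once the chains have mixed

# e.g. run several M-H chains from dispersed starting points and require,
# say, R-hat < 1.1 for every parameter before trusting the results.
```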

MCMC Example
- In Ptak et al. (2007) we used MCMC to fit the X-ray luminosity functions of normal galaxies in the GOODS area (see poster).
- Tested the code first by fitting a published FIR luminosity function.
- Key advantages:
  - visualizing the full probability space of the parameters
  - the ability to derive quantities from the MCMC chain values (e.g., the luminosity density)

Sanity Check: Fitting the Local 60 μm LF
- Fit the Saunders et al. (1990) LF assuming Gaussian errors and ignoring upper limits.
- [Figure: Φ vs. log L/L⊙ for the published and MCMC fits]

  Param.    S1990         MCMC
  α         1.09 ± …      … ± 0.08
  σ         0.72 ± …      … ± 0.02
  Φ*        … ± …         … ± …
  log L*    8.47 ± …      … ± 0.15

(Ugly) Posterior Probabilities
- [Figure: joint posterior probability distributions for the z < 0.5 X-ray luminosity functions of early-type and late-type galaxies; red crosses show the 68% confidence interval]

Marginalized Posterior Probabilities
- [Figure: marginalized posteriors for log L*, log φ*, α, and σ; dashed curves show a Gaussian with the same mean and st. dev. as the posterior, dotted curves show the prior]
- Note: α and σ are tightly constrained by the (Gaussian) prior, rather than being "fixed".

MCMC in XSPEC
- XSPEC MCMC is based on the Metropolis-Hastings algorithm. The chain proposal command is used to set the proposal distribution; the basic options are multivariate Gaussian or Cauchy, although user-defined distributions can be entered. The covariance matrix for these distributions can be calculated from the best fit, entered from the command line, or calculated from previous MCMC chain(s).
- MCMC is integrated into other XSPEC commands. If chains are loaded, they are used to generate confidence regions on parameters, fluxes, and luminosities. This is more accurate than the current method for estimating errors on fluxes and luminosities.
- The tclout simpars command returns a set of parameter values drawn from the probability distribution defined by the currently loaded chain(s).
- Chains can be saved as either ASCII or FITS files.

XSPEC MCMC Output
- [Figure: histogram and probability density plot (2-D histogram) of spectral fit parameters from an XSPEC MCMC run, produced by fv]

Future
- Use "physical" priors: have the posterior from previous work be the prior for the current work. E.g., use the observed distribution of photon indices of nearby AGN when fitting for N_H in deep surveys.
- Incorporate calibration uncertainty into the fitting (Kashyap AISR project).
- XSPEC has a plug-in mechanism for user-defined proposal distributions; it would be good to also allow user-defined priors.
- A code repository/wiki for MCMC analysis in astronomy.