Efficient MCMC for cosmological parameters


Antony Lewis, Institute of Astronomy, Cambridge: http://cosmologist.info/
Collaborator: Sarah Bridle
CosmoMC: http://cosmologist.info/cosmomc
Lewis & Bridle: astro-ph/0205436
Notes: http://cosmologist.info/notes/cosmomc.ps.gz

Introduction
Cosmological parameters are constrained by combining many data sets: CMB, polarization, baryon oscillations, weak lensing, galaxy power spectrum, cluster gas fraction, Lyman alpha, etc., together with priors. [Figure: data sets + priors -> cosmological parameters; image: WMAP Science Team]

Bayesian parameter estimation
Can compute P({θ} | data) using e.g. the assumption of Gaussianity of the CMB field and priors on the parameters.
Often want marginalized constraints, e.g. P(θ1 | data) = ∫ P({θ} | data) dθ2 … dθn.
BUT: large n-dimensional integrals are very hard to compute!
If we instead sample from P({θ} | data) then it is easy: generate samples from P, and marginalization is just ignoring the other parameters.

Samples in 6D parameter space (old CMB data alone; colour = optical depth).

Markov Chain Monte Carlo sampling
Move in parameter space with transition probability T(θ, θ').
The chain distribution converges to P if all points can be reached (ergodic) and T leaves the distribution invariant: ∫ P(θ) T(θ, θ') dθ = P(θ') for all θ'.
Detailed balance is a sufficient condition: P(θ) T(θ, θ') = P(θ') T(θ', θ).
Metropolis-Hastings method: for a (nearly) arbitrary proposal function q(θ'; θ), accept the proposed θ' with probability α = min{ 1, [P(θ') q(θ; θ')] / [P(θ) q(θ'; θ)] }.
Simplest if q is chosen to be symmetric, so α = min{ 1, P(θ')/P(θ) }.
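As a toy illustration of the symmetric-proposal Metropolis rule above (a minimal sketch, not the CosmoMC implementation; the Gaussian proposal, the log_p target and all numbers are illustrative assumptions):

```python
import numpy as np

def metropolis(log_p, x0, prop_cov, n_steps, rng=None):
    """Minimal Metropolis sampler with a symmetric Gaussian proposal."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(prop_cov)          # factor of the proposal covariance
    x = np.array(x0, dtype=float)
    logp_x = log_p(x)
    chain = np.empty((n_steps, x.size))
    for i in range(n_steps):
        x_new = x + L @ rng.standard_normal(x.size)   # symmetric proposal q
        logp_new = log_p(x_new)
        # accept uphill moves always, downhill moves with probability P(x_new)/P(x)
        if np.log(rng.uniform()) < logp_new - logp_x:
            x, logp_x = x_new, logp_new
        chain[i] = x                                   # repeated point counts as a sample
    return chain

# toy target: a correlated 2D Gaussian "posterior"
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
icov = np.linalg.inv(cov)
samples = metropolis(lambda x: -0.5 * x @ icov @ x, np.zeros(2), 0.5 * cov, 20000)
```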

If the proposed position x2 has higher probability than the current point x1: always move. [Figure: P(x) with points x1, x2]

If the proposed position has lower probability: move to x2 with probability P(x2)/P(x1); otherwise take another sample at x1. [Figure: P(x) with P(x1), P(x2) marked]
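In symbols, for a symmetric proposal this is the usual Metropolis acceptance rule (restating the two cases above):
\[
\alpha = \min\!\left(1, \frac{P(x_2)}{P(x_1)}\right), \qquad
\text{move to } x_2 \text{ if } u < \alpha,\ u \sim U(0,1);\ \text{otherwise record } x_1 \text{ again.}
\]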

Procedure:
- Choose priors and parameterization
- Choose a proposal distribution
- Start the chain at some random position (or start several chains from different positions)
- Iterate a large number of chain steps, storing the chain positions (correlated samples)
- Remove 'some' samples from the beginning ('burn in')
- Optionally thin the samples
- Check for convergence
- Estimate the desired quantities from the samples (or just publish the list of samples)
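A hypothetical post-processing sketch of the burn-in/thinning/estimation steps above (the burn-in fraction, thinning interval and the AR(1) toy chain are arbitrary illustrative choices):

```python
import numpy as np

def process_chain(chain, burn_frac=0.3, thin=5):
    """Discard burn-in, optionally thin, and form simple marginalized estimates."""
    n = chain.shape[0]
    kept = chain[int(burn_frac * n)::thin]              # drop burn-in, then thin
    mean = kept.mean(axis=0)
    lo, hi = np.percentile(kept, [16, 84], axis=0)      # 68% marginalized intervals
    return kept, mean, lo, hi

# toy correlated chain standing in for real MCMC output
rng = np.random.default_rng(1)
chain = np.zeros((10000, 2))
for i in range(1, len(chain)):
    chain[i] = 0.95 * chain[i - 1] + rng.standard_normal(2)
kept, mean, lo, hi = process_chain(chain)
print(mean, lo, hi)
```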

Potential problems:
- Proposal width too small or directions badly chosen: slow random walk through the space (distance covered only grows as sqrt(N) steps)
- Proposal width too large: low probability of proposing a point of similar likelihood, so a low acceptance rate
- The proposal distribution must be able to reach all points in parameter space; otherwise the chain can never cross a gap!

Assessing performance and convergence
From a single chain:
- Autocorrelation length
- Raftery and Lewis diagnostic
- Variances between chain splits
From multiple chains (e.g. run under MPI), Gelman-Rubin on the last half of each chain:
- variance of the means
- diagonalized variance of the means
- diagonalized variance of the limits
All necessary but not sufficient!
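A minimal sketch of the between-chain Gelman-Rubin statistic (variance-of-means form only; the diagonalized variants are not shown, and the input chain array is hypothetical):

```python
import numpy as np

def gelman_rubin(chains):
    """R statistic per parameter; chains has shape (n_chains, n_steps, n_params).
    Usually applied to the last half of each chain only."""
    m, n, _ = chains.shape
    means = chains.mean(axis=1)                        # per-chain means
    W = chains.var(axis=1, ddof=1).mean(axis=0)        # mean within-chain variance
    B = n * means.var(axis=0, ddof=1)                  # between-chain variance
    var_est = (n - 1) / n * W + B / n                  # pooled variance estimate
    return np.sqrt(var_est / W)                        # values near 1 suggest convergence

# usage (hypothetical array of chains): R = gelman_rubin(chains[:, chains.shape[1] // 2:, :])
```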

Which proposal distribution? Possible approaches:

Use many different proposals and try to be as robust as possible ('black box'), e.g. BayeSys http://www.inference.phy.cam.ac.uk/bayesys/
- Unnecessarily slow for 'simple' systems
- Less likely to give wrong answers
- Versatile (but work to integrate with cosmology codes)

Use physical knowledge to make the distribution 'simple' and use a simple fast method, e.g. CosmoMC
- Fast as long as the assumptions are met
- More likely to give wrong answers; needs e.g. a close-to-unimodal distribution
- Less versatile
- Optimized to use multiple processors efficiently; integrated with CAMB

Either of the above, but also try to calculate the 'Evidence', e.g. BayeSys, nested sampling (CosmoMC plug-in), a fast method plus thermodynamic integration, etc.

Essentially no method is guaranteed to give correct answers in reasonable time!

Proposal distribution: use knowledge to change to physical parameters.
E.g. peak position ~ (sound horizon) / (angular diameter distance): use this as a variable.
If the transformation is non-linear, beware of the effect on the priors (Jacobian).
Kosowsky et al. astro-ph/0206014

Proposal distribution: changing variables, transform to a near-isotropic Gaussian.
A linear transformation (e.g. y = L^{-1} θ, where the parameter covariance is C = L L^T):
- no change in the priors
- equivalent to proposing along the eigenvector directions
- easily done automatically from an estimate of the covariance matrix
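A sketch of the covariance-based transformation used as a proposal step (the covariance estimate, parameter values and step scale are illustrative; 2.4/sqrt(n) is the usual rule of thumb for Gaussian posteriors):

```python
import numpy as np

def whitened_proposal(theta, cov_estimate, scale, rng):
    """Propose a step that is isotropic in the whitened (decorrelated) space."""
    L = np.linalg.cholesky(cov_estimate)     # C = L L^T
    # a unit isotropic Gaussian step in the whitened space maps back to a step
    # with the same correlations as the estimated posterior
    return theta + scale * (L @ rng.standard_normal(theta.size))

rng = np.random.default_rng(0)
cov_estimate = np.array([[0.04, 0.01], [0.01, 0.09]])          # e.g. from a short run
theta_new = whitened_proposal(np.array([0.3, 0.7]), cov_estimate, 2.4 / np.sqrt(2), rng)
```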

Proposal distribution: dynamically adjusting parameters. Beware of non-Markovian updates…

OK methods:
- Do a short run, estimate the covariance, change parameters, then do the final run
- Make periodic updates, but discard all samples taken before the final update
- Periodically update the covariance, but using a constant fraction of all previous samples; the covariance will converge, hence the chain is asymptotically Markovian

BAD methods (sample from the wrong distribution, e.g. astro-ph/0307219):
- Estimate the local covariance and use it for the proposal distribution
- Dynamically adjust the proposal width based on the local acceptance rate
- Propose a change to some parameters, then adjust the other parameters to find a good fit, then take the new point as the final proposal ??

The onus is on you to prove that any new method leaves the target distribution invariant!

Symmetric Metropolis proposal distributions
- Most common choice is an n-D Gaussian (n may be a sub-space dimension), e.g. in 2D
- Scale with n to get the same acceptance rate in any dimension
- In general this corresponds to choosing a random direction and moving a normalized distance r in that direction with probability Pn(r)
'Best' scaling is r → r / 2.4 if the posterior is Gaussian (Dunkley et al. astro-ph/0405462).
Q: Is this really best, or just the shortest autocorrelation length??
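For reference, the radial distribution Pn(r) referred to here and on the next slide is presumably that of the radius of an n-dimensional unit Gaussian:
\[
P_n(r) \propto r^{\,n-1} e^{-r^2/2}, \qquad r \ge 0 .
\]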

Is using a large-n Gaussian proposal a good idea? E.g. consider Pn(r) as a proposal in 1D, and be careful how you assess performance: higher n gives a shorter correlation length but a worse Raftery & Lewis diagnostic. Definitely not a good idea in 1D… maybe not in general.

Want the proposal to be robust to inaccurate width estimates: broaden the tail and make it non-zero at zero, e.g. use a mixture with an exponential: 1/3 exponential, shown for n = 2 and n = 5 (n = 2 is used by CosmoMC), plus cycling through directions.
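A sketch of drawing the proposal distance from such a mixture (the weights and scales here are illustrative, not necessarily the exact CosmoMC choices):

```python
import numpy as np

def mixture_radius(n, rng, exp_frac=1.0 / 3.0):
    """Proposal distance robust to inaccurate width estimates:
    exponential with probability exp_frac (non-zero at zero, broad tail),
    otherwise the chi-distributed radius of an n-D unit Gaussian."""
    if rng.uniform() < exp_frac:
        return rng.exponential()
    return np.sqrt(rng.chisquare(n))        # radius with density ~ r^(n-1) exp(-r^2/2)

rng = np.random.default_rng(2)
r_samples = np.array([mixture_radius(2, rng) for _ in range(10000)])
```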

Fast and slow parameters
- 'Slow' parameters: calculating the theoretical CMB and P(k) transfer functions is slow (~ a second)
- 'Fast' parameters: changing the initial power spectrum is fast (~ a millisecond)
- Other nuisance parameters may also be fast (calibrations etc.), though they can often be integrated out numerically
- 'Fast' is only as fast as the data likelihood code
How do we use the fast/slow distinction efficiently?

Usual choice of proposal directions vs. a fast-slow optimized choice. [Figure: proposal directions in the (slow, fast) parameter plane for each case]

'Directional gridding' (Radford Neal): the slow grid width is chosen randomly at the start of each gridding step; a full gridding step leaves the distribution invariant. [Figure: grid of proposed points in the (slow, fast) parameter plane]
Other methods, e.g. tempered transitions, may work for very fast parameters.
Variation of: Neal 1996, Statistics and Computing 6, 353; Neal 2005, math.ST/0502099. (Nested sampling??)

Choice of priors
E.g. parameterize the reionization history in a set of bins of xe. What is your prior?
Physically 0 < xe < 1. The 'simple' choice is an uncorrelated uniform (flat) prior on the optical depth τn from each bin. [Figure: flat prior P(τn) vs τn]
Lewis et al. 2006, in prep.

What does this mean for our prior on the total optical depth? The exact result is piecewise polynomial; for large N the total optical depth prior is approximately Gaussian (strongly peaked away from zero). A low optical depth is then a priori very unlikely. Is this really what you meant??
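A quick check of why the induced prior peaks away from zero, assuming each bin contributes an independent optical depth τn uniform on [0, τmax]:
\[
\tau = \sum_{n=1}^{N} \tau_n, \qquad
\langle\tau\rangle = \frac{N\,\tau_{\max}}{2}, \qquad
\mathrm{Var}(\tau) = \frac{N\,\tau_{\max}^2}{12},
\]
so by the central limit theorem the prior on the total τ tends to a Gaussian centred well away from zero (the exact density is the piecewise-polynomial Irwin-Hall distribution).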

If you don't have very good data it is worth thinking carefully about your prior!
Another example: reconstructing the primordial power spectrum in n bins (Bridle et al. astro-ph/0302306). Which parameterization, with flat uncorrelated priors on the bins? The choices differ by an effective prior on the optical depth!
Also, do we really mean to allow arbitrarily wiggly initial power spectra? One sensible solution: a Gaussian process prior (imposes smoothness); the result then converges for large n.

Conclusions
- MCMC is a powerful method, but it should be used with care
- A judicious choice of parameters and proposal distribution greatly improves performance
- Can exploit the difference between fast and slow parameters
- It is worth thinking carefully about the prior
- Do not run CosmoMC blindly with flat priors on arbitrary new parameters

Random directions? You don't really want to head back in the same direction…
- Choose a random orientation of the orthogonalized basis vectors
- Cycle through the directions
- Periodically choose a new random orientation
Helps in low dimensions; also good for slice sampling (sampling_method = 2, 3) (Neal: physics/0009028). [Figure: performance comparison; curve labelled 'mixture only']
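A sketch of cycling through a randomly oriented orthonormal basis (the QR construction is one standard way to get an approximately uniformly random orthogonal matrix; the function name is illustrative):

```python
import numpy as np

def random_direction_cycler(n, rng):
    """Yield proposal directions: cycle through a randomly oriented orthonormal
    basis, re-randomizing the orientation after each full cycle."""
    while True:
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
        for i in range(n):
            yield Q[:, i]                                   # one direction per step

rng = np.random.default_rng(3)
directions = random_direction_cycler(4, rng)
d1, d2 = next(directions), next(directions)
print(np.dot(d1, d2))        # ~0: successive directions within a cycle are orthogonal
```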

Using the fast-slow basis:
- Less efficient per step because the basis is not optimal
- Can compensate by doing more 'fast' transitions at near-zero cost
- Ideally halves the number of 'slow' calculations
Can be generalized to n dimensions:
- make several proposals in the fast subspace
- make proposals along the best slow eigendirections
- saves a lot of time if there are many fast parameters
This is the CosmoMC default.