Queensland University of Technology, CRICOS No. 00213J. Towards Likelihood Free Inference. Tony Pettitt, QUT, Brisbane. Joint work with Rob Reeves.

Outline: 1. Some problems with intractable likelihoods. 2. Monte Carlo methods and inference. 3. Normalizing constant/partition function. 4. Likelihood free Markov chain Monte Carlo. 5. Approximating hierarchical model. 6. Indirect inference and likelihood free MCMC. 7. Conclusions.

Stochastic models (Riley et al., 2003): a macroparasite within a host. A juvenile worm grows to adulthood in a cat, and the host fights back with immunity. The numbers of juveniles (J) and adults (A) and the amount of immunity (I), all integer valued, evolve through time according to a Markov process with unknown parameters, e.g. the juvenile-to-adult rate of maturation; immunity changes with time, and juveniles die due to immunity. Moment closure approximations for the distribution of (J, A, I) are limited to restricted parameter values.

Numerical computation of the likelihood is feasible only for small maximum values of J, A and I, but the process can be simulated easily. Data: J at t = 0 and A at time t (sacrifice of the cat), replicated over several cats. Source: Riley et al., 2003.
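
A minimal Gillespie-style simulator for a process of this kind is sketched below. The event types and rate constants (maturation, death_nat, death_imm, imm_gain, imm_loss) are illustrative assumptions, not the exact Riley et al. (2003) specification; the point is only that forward simulation of (J, A, I) is cheap even when the likelihood is not.

```python
import random

def simulate_jai(j0, t_end, maturation, death_nat, death_imm, imm_gain, imm_loss, seed=None):
    """Gillespie-style simulation of an illustrative (J, A, I) Markov process.

    The event types and rate forms below are assumptions for illustration only;
    they are not the exact Riley et al. (2003) specification.
    """
    rng = random.Random(seed)
    t, J, A, I = 0.0, j0, 0, 0
    while True:
        rates = [
            ("mature",   maturation * J),                     # juvenile matures to adult
            ("j_death",  death_nat * J + death_imm * J * I),  # juvenile dies, boosted by immunity
            ("imm_gain", imm_gain * J),                       # immunity stimulated by juveniles
            ("imm_loss", imm_loss * I),                       # immunity wanes
        ]
        total = sum(r for _, r in rates)
        if total == 0.0:
            break
        t += rng.expovariate(total)
        if t >= t_end:
            break
        u = rng.random() * total
        event = rates[-1][0]
        for name, r in rates:
            if u < r:
                event = name
                break
            u -= r
        if event == "mature":
            J -= 1
            A += 1
        elif event == "j_death":
            J -= 1
        elif event == "imm_gain":
            I += 1
        elif event == "imm_loss":
            I -= 1
    return J, A, I

# One simulated cat: initial juvenile burden 100, observed (sacrificed) at t = 10.
print(simulate_jai(j0=100, t_end=10.0, maturation=0.04, death_nat=0.01,
                   death_imm=0.001, imm_gain=0.05, imm_loss=0.02, seed=1))
```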

Other stochastic process models include: spatial stochastic expansion of species (Hamilton et al., 2005; Estoup et al., 2004); birth-death-mutation processes for estimating the transmission rate from TB genotyping (Tanaka et al., 2006); and population genetic models, e.g. coalescent models (Marjoram et al., 2003). Likelihood free Bayesian MCMC methods are often employed with quite precise priors.

Normalizing constant/partition function problem. The algebraic form of the distribution for y is known but it is not normalized, e.g. the Ising model
  π(y | θ) = exp{ θ Σ_{i~j} y_i y_j } / Z(θ),   y_i ∈ {-1, +1},
where i ~ j means i and j are neighbours (on a lattice, say). For n sites the normalizing constant Z(θ) involves in general a sum over 2^n terms. Write π(y | θ) = q_θ(y) / Z(θ), with q_θ(y) the unnormalized density.
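
The sketch below spells out the unnormalized Ising density q_θ(y) and the brute-force partition function for a tiny lattice, assuming spins y_i ∈ {-1, +1} and N-S/E-W neighbours. It makes concrete why Z(θ) is intractable for realistic lattice sizes: the sum has 2^n terms.

```python
import itertools
import numpy as np

def neighbour_pairs(rows, cols):
    """N-S and E-W neighbour pairs on a rows x cols lattice (sites indexed row-major)."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                pairs.append((i, i + 1))       # east neighbour
            if r + 1 < rows:
                pairs.append((i, i + cols))    # south neighbour
    return pairs

def q_unnorm(y, theta, pairs):
    """Unnormalized Ising density q_theta(y) = exp(theta * sum of neighbour products)."""
    return np.exp(theta * sum(y[i] * y[j] for i, j in pairs))

def partition_function(theta, rows, cols):
    """Brute-force Z(theta): a sum over all 2^(rows*cols) spin configurations."""
    pairs = neighbour_pairs(rows, cols)
    n = rows * cols
    return sum(q_unnorm(np.array(y), theta, pairs)
               for y in itertools.product([-1, 1], repeat=n))

# Feasible only for tiny lattices: even 4x4 already has 2^16 = 65,536 configurations.
print(partition_function(theta=0.3, rows=3, cols=3))
```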

[Figure: N-S and E-W neighbourhood structure on the lattice.]

Outline: 1. Some problems with intractable likelihoods. 2. Monte Carlo methods and inference. 3. Normalizing constant/partition function. 4. Likelihood free Markov chain Monte Carlo. 5. Approximating hierarchical model. 6. Indirect inference and likelihood free MCMC. 7. Conclusions.

Monte Carlo methods and inference. With an intractable likelihood, use easily simulated values of y instead. Simulated method of moments (McFadden, 1989): estimation by comparing theoretical moments or frequencies (approximated by simulation) with observed moments or frequencies. This can be implemented using a chi-squared goodness-of-fit statistic, e.g. Riley et al., 2003. Data: number of adult worms in each cat at sacrifice.

[Figure: plot of the goodness-of-fit statistic versus the parameter; source Riley et al., 2003.] Greedy Monte Carlo search over parameter values. What is the precision of the resulting estimate?
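
A sketch of this simulated method of moments estimator is given below: for each parameter value on a grid, simulate adult counts, bin them, and compute a chi-squared statistic against the observed frequencies. The bin edges, the number of simulations and the choice to vary only one rate are illustrative assumptions; simulate_adults can be a wrapper around the simulate_jai sketch above.

```python
import numpy as np

def chisq_fit(theta_grid, observed_adults, simulate_adults, n_sim=200,
              bins=(0, 5, 10, 20, 40, 80, np.inf)):
    """Simulated method of moments via a chi-squared goodness-of-fit statistic.

    `simulate_adults(theta)` should return one simulated adult count at sacrifice,
    e.g. a wrapper around simulate_jai above with all rates other than the one of
    interest held fixed (an assumption made for illustration).
    """
    obs_freq, _ = np.histogram(observed_adults, bins=bins)
    obs_freq = obs_freq / len(observed_adults)
    stats = []
    for theta in theta_grid:
        sims = [simulate_adults(theta) for _ in range(n_sim)]
        sim_freq, _ = np.histogram(sims, bins=bins)
        sim_freq = np.maximum(sim_freq / n_sim, 1e-6)    # expected frequencies, floored
        stats.append(np.sum((obs_freq - sim_freq) ** 2 / sim_freq))
    best = theta_grid[int(np.argmin(stats))]
    return best, stats

# e.g. theta_hat, gof = chisq_fit(np.linspace(0.01, 0.10, 10), observed_adults,
#          lambda th: simulate_jai(100, 10.0, th, 0.01, 0.001, 0.05, 0.02)[1])
```

Plotting the returned statistics against the grid reproduces the kind of goodness-of-fit curve shown on the previous slide, but gives no direct measure of the precision of the minimizing value.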

Outline: 1. Some problems with intractable likelihoods. 2. Monte Carlo methods and inference. 3. Normalizing constant/partition function. 4. Likelihood free Markov chain Monte Carlo. 5. Approximating hierarchical model. 6. Indirect inference and likelihood free MCMC. 7. Conclusions.

3. Normalizing constant/partition function and MCMC (half-way to likelihood free inference). Here we assume the likelihood has the form π(y | θ) = q_θ(y) / Z(θ), where q_θ(y) can be evaluated but Z(θ) cannot, and that we can simulate exactly from π(· | θ) (Møller, Pettitt, Reeves and Berthelsen, 2006). Key idea: an importance sampling estimate of the ratio Z(θ) / Z(θ0) is given by
  (1/n) Σ_{i=1..n} q_θ(x_i) / q_{θ0}(x_i),   with the x_i sampled from π(· | θ0).
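
A toy illustration of this importance sampling identity is sketched below, using a small discrete unnormalized density (an assumption made so the exact ratio can be checked); the same estimator applies with q_θ the unnormalized Ising density and the x_i perfect draws at θ0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unnormalized density q_theta(x) = exp(theta * x) on x in {0, ..., 20}.
# The support is tiny, so Z(theta) is exactly computable and the estimate can be checked.
support = np.arange(21)
q = lambda x, theta: np.exp(theta * x)
Z = lambda theta: q(support, theta).sum()

theta, theta0, n = 0.25, 0.20, 100_000
# Exact draws from pi(. | theta0), possible here because the support is enumerable.
probs0 = q(support, theta0) / Z(theta0)
x = rng.choice(support, size=n, p=probs0)

ratio_hat = np.mean(q(x, theta) / q(x, theta0))   # importance sampling estimate of Z(theta)/Z(theta0)
print(ratio_hat, Z(theta) / Z(theta0))            # estimate vs exact ratio
```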

This can be used off-line to estimate Z(θ) (up to a constant) over a grid of θ values, and then standard Metropolis-Hastings is carried out with interpolation over the grid (e.g. Green and Richardson, 2002, in a Potts model). Standard Metropolis-Hastings for simulating from the target distribution π(θ | y) ∝ p(θ) q_θ(y) / Z(θ): a move θ → θ' is accepted with probability min(1, H), where
  H = [ p(θ') q_{θ'}(y) / Z(θ') ] / [ p(θ) q_θ(y) / Z(θ) ] × q(θ | θ') / q(θ' | θ),
which involves the intractable ratio Z(θ) / Z(θ'). Key question: can this ratio be calculated on-line, or avoided altogether?

On-line algorithm: the single auxiliary variable method. Introduce an auxiliary variable x on the same space as y and extend the target distribution for the MCMC to π(θ, x | y) ∝ f(x | θ, y) p(θ) q_θ(y) / Z(θ). Key question: how can the distribution f(x | θ, y) of x be chosen so that Z(θ) is removed from the acceptance ratio? A new pair (θ', x') is now proposed and the acceptance ratio is formed for that pair. Assume the factorisation q(θ', x' | θ, x) = q(θ' | θ) q(x' | θ'), and choose the proposal for x' so that q(x' | θ') = q_{θ'}(x') / Z(θ'), i.e. an exact draw from the model at θ'. Then algebra gives cancellation of Z(θ) and Z(θ'), so that
  H = [ f(x' | θ', y) p(θ') q_{θ'}(y) q(θ | θ') q_θ(x) ] / [ f(x | θ, y) p(θ) q_θ(y) q(θ' | θ) q_{θ'}(x') ]
does not depend on the normalizing constant.

Note: perfect or exact simulation from π(· | θ') is needed for the proposal of x'. Key question: how should f(x | θ, y), the auxiliary variable distribution, be chosen? The best choice would be f(x | θ, y) = π(x | θ) = q_θ(x) / Z(θ), but this is unavailable, so approximations are used.

Choice (i): fix a parameter value θ̂ and take f(x | θ, y) = π(x | θ̂) = q_{θ̂}(x) / Z(θ̂).

Choice (ii): take f(x | θ, y) to be a tractable approximation to π(x | θ) whose normalizing constant is known.

Choice (i): with θ̂ fixed, say at a good estimate of θ, f does not depend on θ, and Z(θ̂) cancels in H. Choice (ii): e.g. a partially ordered Markov mesh model (POMM) fitted to the Ising data. Comment: both choices can suffer from getting stuck, because f(x | θ, y) can be very different from the ideal π(x | θ).
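
The sketch below puts the pieces together for the Ising model under choice (i), i.e. f(x | θ, y) = π(x | θ̂). Two caveats: the method requires a perfect draw of x' from π(· | θ'), and here a long Gibbs run is used only as a stand-in to keep the code short; and the uniform prior, random-walk proposal and toy lattice size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_ising(theta, shape, sweeps=100):
    """Approximate draw from the Ising model by a long Gibbs run.

    Møller et al. (2006) require a *perfect* (exact) draw here; the Gibbs run is
    only a stand-in so this sketch stays short and self-contained.
    """
    rows, cols = shape
    y = rng.choice([-1, 1], size=shape)
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                s = sum(y[r2, c2] for r2, c2 in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= r2 < rows and 0 <= c2 < cols)
                p_plus = 1.0 / (1.0 + np.exp(-2.0 * theta * s))   # full conditional of one spin
                y[r, c] = 1 if rng.random() < p_plus else -1
    return y

def log_q(y, theta):
    """Log unnormalized Ising density: theta * sum of N-S and E-W neighbour products."""
    return theta * (np.sum(y[:, :-1] * y[:, 1:]) + np.sum(y[:-1, :] * y[1:, :]))

def single_aux_mh(y_obs, theta_hat, n_iter=500, step=0.05, theta_max=1.0):
    """Single auxiliary variable M-H under choice (i): f(x | theta, y) = pi(x | theta_hat).

    Uniform(0, theta_max) prior and symmetric random-walk proposal are assumptions.
    """
    theta = theta_hat
    x = gibbs_ising(theta, y_obs.shape)              # current auxiliary variable
    draws = []
    for _ in range(n_iter):
        theta_p = theta + step * rng.standard_normal()
        if 0 < theta_p < theta_max:
            x_p = gibbs_ising(theta_p, y_obs.shape)  # "exact" draw from the model at theta_p
            # log of H above: prior and proposal ratios cancel, no Z(theta) appears.
            log_H = (log_q(x_p, theta_hat) + log_q(y_obs, theta_p) + log_q(x, theta)
                     - log_q(x, theta_hat) - log_q(y_obs, theta) - log_q(x_p, theta_p))
            if np.log(rng.random()) < log_H:
                theta, x = theta_p, x_p
        draws.append(theta)
    return np.array(draws)

# Example (toy sized, slow in pure Python):
# y_obs = gibbs_ising(theta=0.3, shape=(8, 8))
# draws = single_aux_mh(y_obs, theta_hat=0.3)
```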

Source: Møller et al., 2006. The single auxiliary variable method tends to get stuck; Murray et al. (2006) offer suggestions involving multiple auxiliary variables.

Outline: 1. Some problems with intractable likelihoods. 2. Monte Carlo methods and inference. 3. Normalizing constant/partition function. 4. Likelihood free Markov chain Monte Carlo. 5. Approximating hierarchical model. 6. Indirect inference and likelihood free MCMC. 7. Conclusions.

4. Likelihood free MCMC. The single auxiliary variable method as almost Approximate Bayesian Computation (ABC). We wish to eliminate q_θ(y) / Z(θ), or equivalently the likelihood, from the M-H algorithm. Solution: let the distribution of x given y and θ put all its probability on y, the observed data, i.e. f(x | θ, y) = 1(x = y). Then the likelihood terms cancel in H, which reduces to the prior and proposal ratios whenever the simulated x' equals y (and the move is rejected otherwise). This might work for discrete data with a small sample size, and if the proposal for x' were a very good approximation to π(x | θ'). If sufficient statistics s(y) exist, the requirement x = y can be relaxed to s(x) = s(y).

Likelihood free methods: ABC-MCMC. Change of notation: the observed data (call it y_obs) is fixed, and y is pseudo data, or auxiliary data, generated from the likelihood f(y | θ). Instead of requiring y = y_obs, we now require y to be close to y_obs in the sense of summary statistics s(·) and a distance ρ: ABC allows ρ(s(y), s(y_obs)) ≤ ε rather than equal to 0. The target distribution for the variables (θ, y) is
  π_ε(θ, y | y_obs) ∝ p(θ) f(y | θ) 1[ ρ(s(y), s(y_obs)) ≤ ε ],
and standard M-H with proposals q(θ' | θ) f(y' | θ') is used for acceptance of (θ', y') (Marjoram et al., 2003; ABC-MCMC). Ideally ε should be small, but a small ε leads to very small acceptance probabilities.
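
A compact sketch of ABC-MCMC in this form is given below. The algorithm structure follows the description above (accept with the prior and proposal ratios only when the simulated summaries fall within ε of the observed ones); the toy normal-mean model, uniform prior and symmetric random-walk proposal are assumptions made only so the sketch runs end to end.

```python
import numpy as np

rng = np.random.default_rng(2)

def abc_mcmc(y_obs, simulate, summaries, log_prior, propose, theta0, eps, n_iter=10_000):
    """ABC-MCMC in the style of Marjoram et al. (2003).

    `simulate`, `summaries`, `log_prior`, `propose` are user-supplied functions;
    the example below fills them in for a toy normal-mean problem (an assumption,
    not an example from the talk).
    """
    s_obs = summaries(y_obs)
    theta = theta0
    draws = []
    for _ in range(n_iter):
        theta_p = propose(theta)
        y_p = simulate(theta_p)                          # pseudo data from the likelihood
        close = np.linalg.norm(summaries(y_p) - s_obs) <= eps
        # Likelihood-free acceptance: prior ratio only (symmetric proposal assumed),
        # gated by the indicator that the summaries are within eps of the observed ones.
        log_a = log_prior(theta_p) - log_prior(theta)
        if close and np.log(rng.random()) < log_a:
            theta = theta_p
        draws.append(theta)
    return np.array(draws)

# Toy example: unknown normal mean, summary = sample mean.
y_obs = rng.normal(loc=1.5, scale=1.0, size=50)
post = abc_mcmc(
    y_obs,
    simulate=lambda th: rng.normal(loc=th, scale=1.0, size=50),
    summaries=lambda y: np.array([y.mean()]),
    log_prior=lambda th: 0.0 if -10 < th < 10 else -np.inf,   # uniform(-10, 10)
    propose=lambda th: th + 0.3 * rng.standard_normal(),
    theta0=0.0,
    eps=0.1,
)
print(post[2000:].mean())   # crude posterior mean after burn-in
```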

Issues in implementing Metropolis-Hastings ABC: (a) tune ε to get reasonable acceptance probabilities; (b) all pseudo data satisfying ρ(s(y), s(y_obs)) ≤ ε (a hard threshold) are accepted with equal probability, rather than being smoothly weighted by the distance (a soft threshold); (c) choose the summary statistics carefully if no sufficient statistics exist.

Tuning ε: a solution is to allow ε to vary as a parameter (Bortot et al., 2004). The target distribution is extended to include ε with its own pseudo-prior; run the chain and post-filter the output, keeping draws associated with small values of ε.
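
A minimal version of the post-filtering step, assuming the augmented chain has stored the θ draws alongside their ε values:

```python
import numpy as np

def post_filter(theta_draws, eps_draws, keep_frac=0.1):
    """Keep only the draws whose associated epsilon is among the smallest keep_frac,
    as a simple version of the post-filtering described above."""
    cutoff = np.quantile(eps_draws, keep_frac)
    keep = eps_draws <= cutoff
    return theta_draws[keep]
```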

Outline: 1. Some problems with intractable likelihoods. 2. Monte Carlo methods and inference. 3. Normalizing constant/partition function. 4. Likelihood free Markov chain Monte Carlo. 5. Approximating hierarchical model. 6. Indirect inference and likelihood free MCMC. 7. Conclusions.

Beaumont, Zhang and Balding (2002) use kernel smoothing in ABC Monte Carlo: simulated parameters are weighted by a kernel in the distance between simulated and observed summary statistics, combined with a local-linear regression adjustment.
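
A sketch of such a kernel-weighted, regression-adjusted ABC step is given below. The Epanechnikov kernel and local-linear adjustment follow the spirit of Beaumont et al. (2002); the toy normal-mean example, bandwidth and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def abc_regression_adjust(theta, s_sim, s_obs, delta):
    """Kernel-weighted local-linear regression adjustment in the spirit of
    Beaumont, Zhang and Balding (2002).

    theta: (n,) simulated parameter draws; s_sim: (n, d) their summaries;
    s_obs: (d,) observed summaries; delta: kernel bandwidth.
    Returns adjusted draws and their Epanechnikov weights.
    """
    dist = np.linalg.norm(s_sim - s_obs, axis=1)
    w = np.where(dist < delta, 1.0 - (dist / delta) ** 2, 0.0)       # Epanechnikov kernel
    keep = w > 0
    X = np.hstack([np.ones((keep.sum(), 1)), s_sim[keep] - s_obs])   # local-linear design
    W = np.diag(w[keep])
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ theta[keep])       # weighted least squares
    # Adjust each accepted draw towards what it "would have been" at s = s_obs.
    theta_adj = theta[keep] - (s_sim[keep] - s_obs) @ beta[1:]
    return theta_adj, w[keep]

# Toy use: normal mean with summary = sample mean (illustrative assumption).
n = 5_000
theta = rng.uniform(-5, 5, size=n)                                   # prior draws
s_sim = np.array([[rng.normal(t, 1.0, size=50).mean()] for t in theta])
s_obs = np.array([1.5])
theta_adj, w = abc_regression_adjust(theta, s_sim, s_obs, delta=0.5)
print(np.average(theta_adj, weights=w))                              # weighted posterior mean
```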

Approximating Hierarchical Model

Outline: 1. Some problems with intractable likelihoods. 2. Monte Carlo methods and inference. 3. Normalizing constant/partition function. 4. Likelihood free Markov chain Monte Carlo. 5. Approximating hierarchical model. 6. Indirect inference and likelihood free MCMC. 7. Conclusions.

Some points. How could the approximate posterior be made more precise?
- Use more parameters in the approximating likelihood, the POMM? (Gouriéroux et al. (1993) and Heggland and Frigessi (2004) discuss this in the frequentist setting.)
- More iterations for the side chain's "exact" calculation of the approximate posterior?
How should a good approximating likelihood be chosen? What is the relationship to the summary statistics approach?

Outline: 1. Some problems with intractable likelihoods. 2. Monte Carlo methods and inference. 3. Normalizing constant/partition function. 4. Likelihood free Markov chain Monte Carlo. 5. Approximating hierarchical model. 6. Indirect inference and likelihood free MCMC. 7. Conclusions.

Conclusions
1. For the normalizing constant problem we presented a single on-line M-H algorithm (the single auxiliary variable method).
2. We linked these ideas to ABC-MCMC and developed a hierarchical model (HM) to approximate the true posterior; this showed variance inflation.
3. We showed that the approximating HM could be tempered, with swaps made to improve mixing using parallel chains; the variance inflation effect is corrected by smoothing posterior summaries from the tempered chains.
4. We extended indirect inference to an HM to find a way of implementing the Metropolis-Hastings algorithm which is likelihood free.
5. We demonstrated the ideas with the Ising/autologistic model.
6. Application to specific examples is ongoing and requires refinement of the general approaches.

Acknowledgements. Support of the Australian Research Council. Co-authors: Rob Reeves, Jesper Møller, Kasper Berthelsen. Discussions with Malcolm Faddy, Gareth Ridall, Chris Glasbey, Grant Hamilton, …