A latent Gaussian model for compositional data with structural zeroes
Adam Butler & Chris Glasbey, Biomathematics & Statistics Scotland

1. Application to seabird diet
Kittiwake data from four islands on the east coast of Scotland. Previously analysed by Bull et al. (2004). How does the composition of seabird diet vary between colonies, years and seasons?

Relative proportions of D = 3 food types:
- SE0: juvenile sandeels
- SE1: adult sandeels
- Other: other species (aggregated)
543 individual birds:
- 251 have SE0 only
- 51 have SE1 only
- 80 have "other" only
- 158 have a mix

2. Compositional data
Compositional data record relative frequencies (proportions), and arise frequently in fields such as geology, economics and ecology. If $x$ denotes the proportions of $D$ components, then $x$ must lie on the unit simplex $S^D = \{x \in \mathbb{R}^D : x_i \ge 0,\; x^\top \mathbf{1} = 1\}$. Such data cannot be analysed using standard methods because of the sum constraint $x^\top \mathbf{1} = 1$.

A well-established approach (Aitchison, 1986) models the log-ratios of $x$ using a multivariate normal distribution. This works well if $x$ lies in the interior of the simplex, but it cannot be applied when some proportions in $x$ are zero. There is no general approach for the situation in which zero values of $x$ correspond to genuine absences of a component: "structural zeroes".
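To make the standard approach concrete, here is a minimal sketch of Aitchison's additive log-ratio (ALR) transform in Python; the function names are our own, and the slide does not say which log-ratio variant was used. The forward transform fails as soon as any component is exactly zero, which is precisely the problem the latent Gaussian model addresses.

```python
import numpy as np

def alr(x):
    """Additive log-ratio transform (Aitchison, 1986): maps a composition
    x in the interior of the simplex to R^(D-1) via log-ratios against
    the last component.  Undefined when any x_i == 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x[:-1] / x[-1])

def alr_inv(z):
    """Inverse ALR: maps a point in R^(D-1) back to the interior of the
    simplex, so zero proportions can never be recovered."""
    w = np.concatenate([np.exp(z), [1.0]])
    return w / w.sum()
```

A multivariate normal model is then fitted to the ALR-transformed data; since alr_inv always returns strictly positive proportions, structural zeroes are unreachable under this model.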

3. A latent Gaussian model

We assume that $x = g(y)$, where:
- $y$ has a $D$-dimensional multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$, where $\mu^\top \mathbf{1} = 1$ and $\Sigma \mathbf{1} = 0$;
- $g$ is the function which performs a Euclidean projection of $y$ onto the unit simplex $S^D$.
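Since $g$ is defined as a Euclidean projection, it can be restated as a constrained least-squares problem; this follows directly from the definition above:

$$ g(y) \;=\; \arg\min_{x \in S^D} \; \lVert x - y \rVert_2^2, \qquad S^D = \{\, x \in \mathbb{R}^D : x_i \ge 0,\; x^\top \mathbf{1} = 1 \,\}. $$

Components of $y$ that the projection pushes onto the boundary become exact zeroes in $x$, which is how the model generates structural zeroes.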

- Parsimonious: $(D-1)(D+2)/2$ parameters.
- Relatively flexible: can cope with a high proportion of zero values.
- There is no mathematical justification for our model, so it is important to check its fit to the data.
- Diagnostic: compare patterns of zero values in the data with those given by the model (sketched below).
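The zero-pattern diagnostic is straightforward to implement: tabulate which subsets of components are exactly zero in the observed data and in data simulated from the fitted model, then compare the two tables. A minimal self-contained sketch (the function name is ours):

```python
from collections import Counter

def zero_patterns(X):
    """Tally the pattern of zero components across compositions; e.g.
    the key (True, False, False) counts rows whose first component is
    exactly zero and whose other two are positive."""
    return Counter(tuple(v == 0.0 for v in row) for row in X)
```

Comparing zero_patterns(X_obs) against zero_patterns(X_sim), with X_sim simulated from the fitted model, shows directly whether the model reproduces the observed mix of structural zeroes.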

4. Inference
The log-likelihood function (equation not preserved in this transcript) is built from two ingredients: $\phi_D(x; \mu, \Sigma)$, the PDF of a $D$-dimensional multivariate normal distribution, and $h(x)$, the "inverse" of $g(y)$.

For general $D$ the likelihood cannot be evaluated analytically, because:
1) there are no explicit formulae for either $g(y)$ or $h(x)$;
2) even if we could evaluate $h(x)$, the likelihood would still contain intractable integrals.

But in order to simulate from the model we only need to compute the Euclidean projection of $y$ onto the unit simplex. We propose an iterative algorithm for doing this, which reaches the solution in at most $D-1$ steps.
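The slides do not spell the authors' iterative algorithm out, so as a stand-in here is the standard sort-based projection onto the unit simplex (as in Duchi et al., 2008), which computes the same Euclidean projection, together with a simulator for the latent Gaussian model; all function names are ours.

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection of y onto {x : x_i >= 0, sum(x) = 1} via the
    standard sort-based method.  Components pushed to zero by the
    projection become the structural zeroes of the model."""
    y = np.asarray(y, dtype=float)
    D = y.size
    u = np.sort(y)[::-1]                      # sorted descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, D + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)    # water-filling threshold
    return np.maximum(y - theta, 0.0)

def simulate(mu, Sigma, n, rng=None):
    """Draw n compositions x = g(y) with y ~ N_D(mu, Sigma).  Note that
    Sigma 1 = 0 makes Sigma singular; numpy's SVD-based sampler copes."""
    rng = np.random.default_rng(rng)
    Y = rng.multivariate_normal(mu, Sigma, size=n, check_valid="ignore")
    return np.apply_along_axis(project_to_simplex, 1, Y)
```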

5. Approximate Bayesian Computation
"ABC" is a methodology for drawing inferences by Monte Carlo simulation when the likelihood is intractable but the model is easy to simulate from. In standard MCMC we tend to accept parameter values that have relatively high values of the likelihood; in ABC we tend to accept parameter values that simulate data with summary statistics similar to those of the real data.

Elements of ABC:
- prior distribution $\pi(\theta)$
- summary statistics $S$
- distance measure $\rho$ and threshold $\epsilon$
- number of samples $N$

Basic ABC algorithm:
for (i = 1, …, N) {
  (1) Generate $\theta^*$ by simulating from the prior $\pi(\theta)$
  (2) Simulate $y^*$ from the model with parameters $\theta^*$
  (3) If $\rho(S(y^*), S(y)) < \epsilon$ then set $\theta^{(i)} = \theta^*$; else go to (1)
}
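A runnable version of the rejection sampler, using the distance from the slides (mean absolute difference of summaries); prior_draw, simulate_model and summaries are hypothetical callables to be supplied by the user.

```python
import numpy as np

def abc_rejection(n_samples, prior_draw, simulate_model, summaries,
                  s_obs, epsilon, rng=None):
    """Basic rejection ABC: repeat (1) draw theta* from the prior,
    (2) simulate data, (3) keep theta* if the simulated summaries fall
    within epsilon of the observed summaries s_obs."""
    rng = np.random.default_rng(rng)
    accepted = []
    while len(accepted) < n_samples:
        theta_star = prior_draw(rng)               # (1) prior draw
        y_star = simulate_model(theta_star, rng)   # (2) simulate data
        dist = np.mean(np.abs(summaries(y_star) - s_obs))
        if dist < epsilon:                         # (3) accept/reject
            accepted.append(theta_star)
    return np.array(accepted)
```

For small epsilon the acceptance rate can be very low, which is the inefficiency the sequential scheme below is designed to reduce.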

Sequential ABC algorithm (Sisson et al., 2006):
Generate values $\{\theta_0^{(1)}, \ldots, \theta_0^{(N)}\}$ by simulating from the prior $\pi(\theta)$ and applying the basic ABC algorithm with threshold $\epsilon_0$.
for (t = 1, …, T) {
  Generate values $\{\theta_t^{(1)}, \ldots, \theta_t^{(N)}\}$ by sampling from $\{\theta_{t-1}^{(1)}, \ldots, \theta_{t-1}^{(N)}\}$, proposing a move using $q$, and applying the basic ABC algorithm with threshold $\epsilon_t$
}
Take $\epsilon_T = \epsilon$; we need a proposal distribution $q$ and thresholds $\epsilon_0, \epsilon_1, \ldots, \epsilon_{T-1}$.
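A simplified sketch matching the description above, reusing abc_rejection from the previous slide; the Gaussian random-walk proposal q and its scale are assumptions, and the published algorithm of Sisson et al. additionally carries importance weights to correct for sampling from the previous population, which this stripped-down version omits.

```python
import numpy as np

def abc_sequential(n_samples, prior_draw, simulate_model, summaries,
                   s_obs, epsilons, proposal_scale=0.1, rng=None):
    """Sequential ABC: population 0 from basic rejection ABC at the
    loosest threshold; each later population resamples the previous one,
    perturbs via a Gaussian proposal q, and accepts at a tighter
    threshold epsilon_t (with epsilon_T the target epsilon)."""
    rng = np.random.default_rng(rng)
    pop = abc_rejection(n_samples, prior_draw, simulate_model,
                        summaries, s_obs, epsilons[0], rng)
    for eps in epsilons[1:]:
        new_pop = []
        while len(new_pop) < n_samples:
            theta = pop[rng.integers(len(pop))]     # resample previous population
            theta_star = theta + rng.normal(0.0, proposal_scale,
                                            size=np.shape(theta))  # move via q
            y_star = simulate_model(theta_star, rng)
            if np.mean(np.abs(summaries(y_star) - s_obs)) < eps:
                new_pop.append(theta_star)
        pop = np.array(new_pop)
    return pop
```

One caveat: without a check that theta_star stays inside the prior's support, proposals can drift outside the uniform prior used on the next slide; a full implementation would reject such moves.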

Elements of ABC – our choices:
- Prior distribution $\pi(\theta)$: uniform over a wide interval
- Summary statistics $S$ (see the sketch below): marginal means and marginal variances (×2); means of differences between components (÷2); proportions of zero and of one values for each component
- Distance measure $\rho$: mean of the absolute values of the elements of $S(y^*) - S(y)$
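A sketch of this choice of summaries; the slide's "(×2)" and "(÷2)" annotations look like weights on blocks of the statistics, but they are ambiguous from the slide alone, so no weighting is applied here.

```python
import itertools
import numpy as np

def summaries(X):
    """Summary statistics from the slides, for an (n, D) matrix of
    compositions: marginal means, marginal variances, means of pairwise
    differences between components, and the proportion of exact zero and
    exact one values in each component."""
    D = X.shape[1]
    pair_diffs = [np.mean(X[:, i] - X[:, j])
                  for i, j in itertools.combinations(range(D), 2)]
    return np.concatenate([X.mean(axis=0),
                           X.var(axis=0),
                           pair_diffs,
                           (X == 0.0).mean(axis=0),
                           (X == 1.0).mean(axis=0)])
```

The proportions of exact zeroes and ones are the statistics that carry information about structural zeroes, which is why they appear alongside the usual moments.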

6. Results – simulated data
Generate n = 200 observations from a symmetric model with D = 3 components and marginal SDs of 1. Compare ABC estimates (black) with analytic MLEs (red).
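For reference, one symmetric parameterisation satisfying the model's constraints $\mu^\top\mathbf{1} = 1$ and $\Sigma\mathbf{1} = 0$ with unit marginal SDs is the following (an assumption; the slide does not give the exact matrices used):

```python
import numpy as np

D, n = 3, 200
mu = np.full(D, 1.0 / D)                                    # mu^T 1 = 1
Sigma = D / (D - 1.0) * (np.eye(D) - np.ones((D, D)) / D)   # Sigma 1 = 0, diag = 1
X = simulate(mu, Sigma, n)   # simulate() from the sketch in section 3
```

Here diag(Σ) = (D/(D−1))(1 − 1/D) = 1, so each latent component has marginal SD 1 as described.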

6. Results – seabird data
We aim in future to apply the model to:
- individual groups
- more diet classes

7. Conclusions
- A parsimonious model for compositional data that contain structural zeroes.
- We developed an iterative algorithm to simulate from the model.
- The likelihood cannot be computed analytically, so we use ABC methods to draw inferences.
- The sequential ABC algorithm (Sisson et al., 2006) is much more efficient than other ABC algorithms.

Further information
Manuscript: