Model checking in mixture models via mixed predictive p-values
Alex Lewin and Sylvia Richardson, Centre for Biostatistics, Imperial College, London

Mixed predictive distribution
The hierarchical model has parameters for each individual g (at the 2nd and 3rd levels) and global parameters (at the 3rd and 4th levels). Mixed predictive data are generated in two steps:
(1) predict new 2nd-level parameters conditional on the 3rd-level parameters in the model;
(2) predict new data conditional on the new 2nd-level parameters.
Mixed predictive data for each individual depend less on that individual's observed data, because the new data are sampled conditional on the global hyperparameters (posterior predictive data are sampled conditional on the individual's own parameters). The mixed predictive p-values are therefore less conservative than posterior predictive p-values.

Calculation of the p-values is simple: the model is run with Markov chain Monte Carlo (MCMC). Predictive parameters and data are sampled from the distributions specified in the model, and the p-value is the proportion of samples in which the predicted test statistic is larger than the observed test statistic.

Mixed predictive checks have previously been used to check other aspects of 2nd-level distributions (Lewin et al. 2006).

Choice of parameters to predict:
- the main parameter, which corresponds to the test statistic, is predicted;
- for a further parameter, results are similar whether or not it is also predicted;
- it is important not to predict the mixture allocation, since we want to look at each mixture component separately.

Introduction
We are concerned with model checking for complex Bayesian hierarchical models using predictive distributions. A common choice is the posterior predictive distribution, but model checks based on it are conservative, because the predicted data are highly dependent on the observed data. We instead use the mixed predictive distribution (Gelman et al. 1996), which is less conservative (Marshall & Spiegelhalter 2003). We focus our checks on 2nd-level parameters, specifically parameters whose distribution is defined as a mixture.
It is at this level that sensitivity to model assumptions is most expected and hardest to check directly.

Reduced conservativeness
To investigate the behaviour of predictive p-values under the null, we simulate data from the model we fit: 1000 individuals (g = 1,…,1000), 8 repeats (i = 1,…,8). Mixed predictive p-values are much closer to Uniform than posterior predictive p-values.

Mixed predictive p-values for a mis-specified model
To investigate the behaviour of predictive p-values under a mis-specified model, we simulate data from a mixture of Uniforms (all other parameters as before).

References
Gelman, A., Meng, X.-L. and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6.
Lewin, A., Richardson, S., Marshall, C., Glazier, A. and Aitman, T. (2006). Bayesian modelling of differential gene expression. Biometrics, 62, 1-9.
Marshall, E. C. and Spiegelhalter, D. J. (2003). Approximate cross-validatory predictive checks in disease mapping models. Statistics in Medicine, 22.

Our approach to model checking
Aspects of the model:
- large numbers of individuals modelled in parallel, exchangeably
- assumptions made on the model structure (see the mixture model below)
- no strong prior information on model parameters
Model checks:
- aim to check each mixture component separately
- obtain a measure of fit for each individual
- compare predicted distributions with observed data using Bayesian p-values
- assess Uniformity of p-values using histograms and q-q plots
- use the mixed predictive distribution (see below)

Mixed predictive checks
Figure: graphical model comparing mixed prediction and posterior prediction. Nodes: observed x_gi, δ_g, allocation z_g, global parameters α, β, η, π; posterior predictive x_gi; predicted δ_g and mixed predictive x_gi. Red shows the model fitted. Green shows the posterior predictive quantities. Blue shows the mixed predictive quantities (new parameters are predicted within the model).

Mixture model
Figure: q-q plots of p-values for the 3 mixture components. Note the small numbers of individuals in the 2 outer components.
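The null-simulation comparison (1000 individuals, 8 repeats, data simulated from the fitted model) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model or code. It assumes a hypothetical model with x_gi ~ N(δ_g, σ²), a three-component normal mixture for δ_g with allocation z_g, and global parameters (`WEIGHTS`, `MEANS`, `SDS`, `SIGMA`) fixed at made-up values. Because of this simplification, posterior draws of (z_g, δ_g) can be taken exactly instead of by MCMC. The test statistic is assumed to be the individual's sample mean. The posterior predictive replicate reuses δ_g. The mixed predictive replicate keeps z_g but draws a new δ_g from its component, following the two-step mixed prediction described above.

```python
import numpy as np

# Hypothetical model (assumptions for illustration only):
#   x_gi ~ N(delta_g, SIGMA^2),  i = 1..N_REP
#   delta_g | z_g = k ~ N(MEANS[k], SDS[k]^2),  z_g ~ Categorical(WEIGHTS)
# Global parameters are held fixed, so the posterior of (z_g, delta_g) is exact.
G, N_REP = 1000, 8
SIGMA = 1.0
WEIGHTS = np.array([0.8, 0.1, 0.1])
MEANS = np.array([0.0, -3.0, 3.0])
SDS = np.array([0.5, 0.5, 0.5])
SE2 = SIGMA**2 / N_REP  # variance of the sample mean given delta_g


def simulate(rng):
    """Simulate data from the fitted model itself (the null case)."""
    z = rng.choice(len(WEIGHTS), size=G, p=WEIGHTS)
    delta = rng.normal(MEANS[z], SDS[z])
    return rng.normal(delta[:, None], SIGMA, size=(G, N_REP))


def posterior_draws(xbar, n_draws, rng):
    """Exact draws of (z_g, delta_g) given xbar_g, standing in for MCMC output."""
    marg_var = SDS**2 + SE2  # var(xbar | z), delta integrated out
    logw = (np.log(WEIGHTS) - 0.5 * np.log(marg_var)
            - 0.5 * (xbar[:, None] - MEANS) ** 2 / marg_var)
    prob = np.exp(logw - logw.max(axis=1, keepdims=True))
    cum = np.cumsum(prob / prob.sum(axis=1, keepdims=True), axis=1)
    u = rng.random((xbar.size, n_draws))
    # Inverse-CDF sampling of the allocation; np.minimum guards against rounding in cum
    z = np.minimum((u[:, :, None] > cum[:, None, :]).sum(axis=2), len(WEIGHTS) - 1)
    prec = 1.0 / SDS[z] ** 2 + 1.0 / SE2  # conjugate normal update for delta_g
    mean = (MEANS[z] / SDS[z] ** 2 + xbar[:, None] / SE2) / prec
    return z, rng.normal(mean, 1.0 / np.sqrt(prec))


def predictive_p_values(x, n_draws=1000, seed=1):
    rng = np.random.default_rng(seed)
    xbar = x.mean(axis=1)  # observed test statistic per individual
    z, delta = posterior_draws(xbar, n_draws, rng)
    se = np.sqrt(SE2)
    # The mean of N_REP new observations is drawn directly as N(delta, SE2).
    # Posterior predictive: new data given the individual's own delta_g.
    t_post = rng.normal(delta, se)
    # Mixed predictive: keep z_g, predict a new delta_g from component z_g, then new data.
    t_mixed = rng.normal(rng.normal(MEANS[z], SDS[z]), se)
    # p-value: proportion of draws where the predicted statistic exceeds the observed one
    p_post = (t_post > xbar[:, None]).mean(axis=1)
    p_mixed = (t_mixed > xbar[:, None]).mean(axis=1)
    return p_post, p_mixed


x = simulate(np.random.default_rng(0))
p_post, p_mixed = predictive_p_values(x)
# Uniform(0, 1) p-values have variance 1/12 (about 0.083)
print(f"var of posterior predictive p-values: {p_post.var():.3f}")
print(f"var of mixed predictive p-values:     {p_mixed.var():.3f}")
```

Uniform(0, 1) p-values have variance 1/12. Under these settings, the posterior predictive replicates are centred on a δ_g that is shrunk towards the observed mean, so their p-values cluster around 0.5 and have a much smaller variance. This is the conservativeness the poster describes. Histograms or q-q plots of `p_post` and `p_mixed` show the same contrast.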
Mixed predictive p-values for separate mixture components
We define p-values conditional on membership of a mixture component. These p-values are a mixture of Uniform (individuals assigned to the correct mixture component) and non-Uniform (individuals assigned to the wrong component).
Figure: p-values for genes with strong inference on the mixture component; the results are much more Uniform.

Discussion
- For real data, the true model does not exist, so a criterion is needed to judge acceptable departures from Uniformity.
- Model checks for mixtures should consider both marginal and conditional predictions.
- Mixed predictive checking is a sensitive tool for highlighting mis-specification.
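Component-conditional mixed predictive p-values can be sketched in closed form under a hypothetical model of the same kind: a three-component normal mixture for δ_g, known σ, and global parameters fixed at made-up values. Conditional on z_g = k, re-predicting δ_g from component k and then new data gives a predicted sample mean distributed as N(MEANS[k], SDS[k]² + σ²/n). The p-value is therefore 1 - Φ((x̄_g - MEANS[k]) / sd). The function names, the parameter values and the 0.95 threshold used for "strong inference" are all assumptions, not taken from the poster. In the MCMC setting, the same quantity would instead be averaged over posterior draws of the global parameters.

```python
import math

import numpy as np

# Hypothetical settings (assumptions, not the poster's values): delta_g follows a
# 3-component normal mixture; xbar_g is the mean of N_REP repeats with known SIGMA.
N_REP, SIGMA = 8, 1.0
WEIGHTS = np.array([0.8, 0.1, 0.1])
MEANS = np.array([0.0, -3.0, 3.0])
SDS = np.array([0.5, 0.5, 0.5])
# s.d. of xbar_g given z_g = k, with delta_g integrated out
MARG_SD = np.sqrt(SDS**2 + SIGMA**2 / N_REP)

# Standard normal CDF, vectorised over arrays (otypes lets it accept empty input)
_phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))), otypes=[float])


def conditional_mixed_p(xbar, k):
    """P(T_pred > xbar | z_g = k): delta_g re-predicted from component k, then new data."""
    return 1.0 - _phi((np.asarray(xbar) - MEANS[k]) / MARG_SD[k])


def allocation_probs(xbar):
    """Posterior P(z_g = k | xbar_g) for each individual, global parameters held fixed."""
    logw = (np.log(WEIGHTS) - np.log(MARG_SD)
            - 0.5 * ((xbar[:, None] - MEANS) / MARG_SD) ** 2)
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)


# Simulate 1000 individuals from the model itself
rng = np.random.default_rng(0)
G = 1000
z_true = rng.choice(3, size=G, p=WEIGHTS)
delta = rng.normal(MEANS[z_true], SDS[z_true])
# The mean of N_REP repeats is drawn directly: N(delta_g, SIGMA^2 / N_REP)
xbar = rng.normal(delta, SIGMA / math.sqrt(N_REP))

probs = allocation_probs(xbar)
z_hat = probs.argmax(axis=1)
strong = probs.max(axis=1) > 0.95  # hypothetical "strong inference" threshold

for k in range(3):
    sel = strong & (z_hat == k)
    p = conditional_mixed_p(xbar[sel], k)
    n_wrong = int((z_true[sel] != k).sum())
    print(f"component {k}: {sel.sum()} individuals, {n_wrong} wrongly allocated, "
          f"var(p) = {p.var():.3f} (Uniform: {1 / 12:.3f})")

# p-values for the central component among strongly allocated individuals
p_null = conditional_mixed_p(xbar[strong & (z_hat == 0)], 0)
```

With components this well separated, almost every strongly allocated individual is in its true component, so each conditional set of p-values is close to Uniform. Moving `MEANS` closer together increases the number of wrongly allocated individuals, whose conditional p-values form the non-Uniform part of the mixture.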