Lecture 9 Model Comparison using MCMC and further models.

Lecture Contents
- Model comparison
- DIC diagnostic
- Random slopes regression model
- Priors for variance matrices
- MLwiN RSR demonstration
- Other predictor variables
- DIC in WinBUGS

Bayesian Model Comparison
In Bayesian statistics model comparison is a thorny issue! In MLwiN we used to suggest running IGLS for model selection and then MCMC on your chosen model. Why is it a thorny issue? The posterior f(θ|Y) does not allow criticism of the model in light of the observed data, nor comparison among models. It is f(Y) that can be used to assess model performance: regardless of the model, f(Y) is a density over the space of observables, which can be compared with what was actually observed.

Bayes Factors
If we observe Y_obs and have two models M_1 and M_2, then the Bayes factor is

B_12 = f(Y_obs | M_1) / f(Y_obs | M_2), where f(Y | M_k) = ∫ f(Y | θ_k, M_k) f(θ_k | M_k) dθ_k.

This provides the relative weight of evidence for model M_1 compared to model M_2. A rough calibration of the Bayes factor has been proposed:

B_12      Evidence for M_1
< 1       -ve
1–3       weak
3–20      +ve
20–150    strong
> 150     v. strong
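When the marginal likelihood f(Y | M) is available in closed form, a Bayes factor can be computed directly. The sketch below (all data and priors invented for illustration) compares two Beta priors for a Binomial success probability:

```python
from math import comb, exp, log
from scipy.special import betaln  # log of the Beta function

def log_marginal(y, n, a, b):
    # log f(y | M) for y successes in n trials under a Beta(a, b) prior:
    # f(y | M) = C(n, y) * B(y + a, n - y + b) / B(a, b)
    return log(comb(n, y)) + betaln(y + a, n - y + b) - betaln(a, b)

# Hypothetical data: 7 successes in 10 trials
y, n = 7, 10
# M1: flat Beta(1, 1) prior; M2: Beta(10, 10) prior concentrated near 0.5
log_bf = log_marginal(y, n, 1, 1) - log_marginal(y, n, 10, 10)
bf = exp(log_bf)  # B_12 is a little below 1, i.e. the data slightly favour M2
```

On the calibration above, a value this close to 1 is at best weak evidence either way.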

Problems with Bayes Factors
1. When the prior is vague, f(θ) is improper. This implies that even though f(θ|Y) may be proper, f(Y) is improper, so Bayes factors cannot be used!
2. Computation of the Bayes factor itself requires high-dimensional integration.
3. Lindley's paradox: the data point to rejection, but the prior is diffuse, so the denominator of the Bayes factor is much smaller than the numerator and too much weight is given to parsimonious models.

Other related ideas
- Prior predictive distributions f(Y).
- Cross-validation predictive distributions f(y_r | Y_(r)).
- Posterior predictive distributions f(Y | Y_obs).
- Model uncertainty, where the model is itself a parameter to be estimated.
- Bayesian model averaging.
- Reversible jump MCMC.
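The posterior predictive idea can be sketched numerically: draw replicate datasets from f(Y | Y_obs) and compare a discrepancy statistic with its observed value. Everything below (the data, the crude posterior approximation, the choice of the sample maximum as the statistic) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data from a Normal model
y_obs = rng.normal(0.0, 1.0, size=50)

# Crude posterior sketch for mu (sigma held fixed for simplicity)
mu_draws = rng.normal(y_obs.mean(), y_obs.std() / np.sqrt(50), size=1000)
sigma = y_obs.std()

# Discrepancy statistic: the sample maximum
t_obs = y_obs.max()
t_rep = np.array([rng.normal(m, sigma, size=50).max() for m in mu_draws])

# Posterior predictive p-value: values near 0 or 1 flag model misfit
p_value = (t_rep >= t_obs).mean()
```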

Model Comparison for random effect models As we will typically use diffuse priors, Bayes factors are not an option here. The methods listed previously are possibilities but not built into software packages. The Deviance Information Criterion (DIC) is one possibility but is it a saviour for Bayesian model choice or a white elephant?

DIC – Spiegelhalter et al. (2002)
Plus points:
1. Discussion paper proposing it written by leading figures in Bayesian modelling.
2. Available in both MLwiN and WinBUGS for standard models.
Minus points:
1. The paper was given a very mixed reception at the RSS when it was discussed!

DIC
A natural way to compare models is to use a criterion based on a trade-off between the fit of the data to the model and the corresponding complexity of the model. DIC does this in a Bayesian way: DIC = goodness of fit + complexity. Fit is measured by the deviance. Complexity is measured by an estimate of the effective number of parameters, defined as

pD = Dbar - D(thetabar),

i.e. the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters.

DIC (continued)
The DIC is then defined analogously to AIC as

DIC = D(thetabar) + 2 pD = Dbar + pD.

Models with smaller DIC are better supported by the data. DIC can be monitored in WinBUGS from the Inference/DIC menu. DIC is available in MLwiN under the Model/MCMC menu.
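In practice pD and DIC are computed from the deviances stored at each MCMC iteration. A minimal sketch, using toy deviance values rather than output from a real model:

```python
import numpy as np

def dic(deviance_samples, deviance_at_theta_bar):
    """pD = Dbar - D(thetabar);  DIC = Dbar + pD = 2*Dbar - D(thetabar)."""
    dbar = float(np.mean(deviance_samples))
    pd = dbar - deviance_at_theta_bar
    return dbar + pd, pd

# Toy MCMC output: the deviance at each stored iteration, plus the
# deviance evaluated at the posterior mean of the parameters
dic_value, pd = dic([102.0, 98.0, 101.0, 99.0], 97.0)
# Dbar = 100, so pD = 3 and DIC = 103
```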

Education dataset
We can fit a simple (Bayesian) linear regression in MLwiN. The DIC output reports Dbar, D(thetabar), pD and DIC; for this model pD = 3.02. Note pD ≈ 3, the actual number of parameters.

Variance components model
Here we consider the random intercepts model from earlier practicals. This is the parallel lines model.

Change in DIC
Comparing the DIC output (Dbar, D(thetabar), pD, DIC) for the regression and VC models, we see the clear improvement from fitting random effects for school. Note that the effective number of parameters is ~60, compared with 68 actual parameters in the model, due to the school effects being random rather than fixed.

Random slopes model (crossing lines)
[Figure: predicted lines for school 1 and school 2 crossing as a function of x_1ij, with intercept residuals u_0,1, u_0,2 and slope residuals u_1,1, u_1,2 marked.]

Fitting an RSR in a Bayesian Framework
The basic random slopes regression model is as follows:

y_ij = beta_0 + beta_1 x_ij + u_0j + u_1j x_ij + e_ij,
(u_0j, u_1j)' ~ N(0, Omega_u),  e_ij ~ N(0, sigma_e^2).

To this model we need to add priors for beta_0, beta_1, Omega_u and sigma_e^2.

Wishart priors
For a (k×k) variance matrix parameter in a Normal likelihood, the conjugate prior is the inverse Wishart distribution with parameters ν and S:

p(Sigma) ∝ |Sigma|^(-(ν+k+1)/2) exp( -tr(S Sigma^-1) / 2 ).

This distribution looks complex but is simply a multivariate generalisation of the inverse Gamma distribution.
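Draws from such a prior can be inspected directly with scipy's invwishart; the degrees of freedom and scale matrix below are arbitrary illustrative choices, not the MLwiN defaults:

```python
import numpy as np
from scipy.stats import invwishart

# Illustrative hyperparameters: nu = 3 degrees of freedom, 2x2 scale matrix S
S = np.array([[1.0, 0.3],
              [0.3, 1.0]])
draws = invwishart(df=3, scale=S).rvs(size=1000, random_state=0)  # (1000, 2, 2)

# Every draw is a symmetric, positive-definite variance matrix
min_eig = min(np.linalg.eigvalsh(m).min() for m in draws)
```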

Wishart prior for Ω_u^-1
In MLwiN we use a Wishart prior for the precision matrix Ω_u^-1 (equivalently, an inverse Wishart prior on Ω_u). Note this is a (weakly informative) prior, as the first parameter represents the prior sample size and is set to the smallest feasible value. Browne and Draper have looked at alternative Wishart priors as well as a Uniform prior, and performed simulations.

Gibbs Sampling algorithm for RSR model
Repeat the following four steps:
1. Generate β from its (multivariate) Normal conditional distribution.
2. Generate each u_j from its (multivariate) Normal conditional distribution.
3. Generate Ω_u^-1 from its Wishart conditional distribution.
4. Generate 1/σ_e^2 from its Gamma conditional distribution.
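The same scheme can be sketched end-to-end for a stripped-down version of the model: a random-intercepts model with scalar variances, so that step 3's Wishart update reduces to a Gamma update. All data, initial values and the Gamma(0.001, 0.001) hyperparameters below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data for a random-intercepts model: y_ij = beta + u_j + e_ij
J, n_j = 30, 20                      # groups and observations per group
beta_true, sigma_u, sigma_e = 2.0, 1.0, 0.5
u_true = rng.normal(0.0, sigma_u, J)
y = beta_true + u_true[:, None] + rng.normal(0.0, sigma_e, (J, n_j))
N = J * n_j

# Initial values; flat prior on beta, vague Gamma priors on both precisions
beta, u = 0.0, np.zeros(J)
tau_e, tau_u = 1.0, 1.0              # 1/sigma_e^2 and 1/sigma_u^2
a, b = 0.001, 0.001

betas = []
for it in range(2000):
    # 1. beta | rest ~ Normal centred on the mean residual (flat prior)
    beta = rng.normal((y - u[:, None]).mean(), 1.0 / np.sqrt(tau_e * N))
    # 2. u_j | rest ~ Normal, shrunk towards zero by the prior precision
    prec = n_j * tau_e + tau_u
    u = rng.normal(n_j * tau_e * (y - beta).mean(axis=1) / prec,
                   1.0 / np.sqrt(prec))
    # 3. 1/sigma_u^2 | rest ~ Gamma (numpy uses shape/scale, not shape/rate)
    tau_u = rng.gamma(a + J / 2.0, 1.0 / (b + (u ** 2).sum() / 2.0))
    # 4. 1/sigma_e^2 | rest ~ Gamma
    e = y - beta - u[:, None]
    tau_e = rng.gamma(a + N / 2.0, 1.0 / (b + (e ** 2).sum() / 2.0))
    if it >= 500:                    # discard burn-in
        betas.append(beta)

beta_hat = float(np.mean(betas))     # should land near beta_true = 2.0
```

The full RSR sampler replaces step 3 with a Wishart draw for the 2×2 precision matrix of the intercept and slope residuals.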

Bayesian RSR Model for education dataset
Note that the IGLS estimates are used in the prior. The variance (posterior mean) estimates are bigger than the IGLS estimates.

DIC for RSR model
Comparing the DIC output (Dbar, D(thetabar), pD, DIC) for the RSR and VC models: as with the frequentist approach, the random slopes model is an improvement over the random intercepts model. The additional 65 random parameters add only 32 effective parameters.

Trajectories for the RSR model

MCMC Diagnostics for Ω_u00

Predictions for the RSR model with highlighted data
Here the top and bottom schools are highlighted:

Residuals for the RSR Individually: and pairwise:

Uniform Priors
Here the level 2 variance estimates increase, as in Browne and Draper (2000). Browne and Draper found that the Wishart priors were preferable, although the use of the IGLS estimate is not strictly Bayesian, as we are using the data twice!

Other predictors in the education dataset
This dataset has other predictors, such as gender and school gender, that can be considered in the practical. In the next slide we see the Equations window for a model with these added, which has a DIC over 25 lower than the earlier RSR model.

RSR + gender effects

WinBUGS RSR & gender model
Here we see the WinBUGS code for our last model. Notice how the MVN and Wishart distributions are specified in WinBUGS:

{
  # Level 1 definition
  for(i in 1:N) {
    normexam[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta[1] * cons[i]
      + beta[2] * standlrt[i]
      + beta[3] * girl[i]
      + beta[4] * boysch[i]
      + beta[5] * girlsch[i]
      + u2[school[i], 1] * cons[i]
      + u2[school[i], 2] * standlrt[i]
  }
  # Higher level definitions
  for (j in 1:n2) {
    u2[j, 1:2] ~ dmnorm(zero2[1:2], tau.u2[1:2, 1:2])
  }
  # Priors for fixed effects
  for (k in 1:5) { beta[k] ~ dflat() }
  # Priors for random terms
  tau ~ dgamma( , )
  sigma2 <- 1/tau
  for (i in 1:2) { zero2[i] <- 0 }
  tau.u2[1:2, 1:2] ~ dwish(R2[1:2, 1:2], 2)
  sigma2.u2[1:2, 1:2] <- inverse(tau.u2[,])
}

DIC in WinBUGS
In WinBUGS DIC is available from the Inference menu. The DIC is set after the burn-in, and then the DIC button is pressed after running. This gives a table with columns Dbar (posterior mean of -2logL), Dhat (-2logL at the posterior mean of the stochastic nodes), pD and DIC, with rows for normexam and the total.

Parameter estimates in WinBUGS
Note that WinBUGS gives similar estimates to MLwiN for this model. Note that for the fixed effects β, WinBUGS indexes from 1 while MLwiN indexes from 0.

node             mean  sd  (2.5%, 97.5%)
beta[1]          ( , )
beta[2]          (0.5145, )
beta[3]          (0.1019, )
beta[4]          ( , )
beta[5]          ( , )
sigma            (0.5272, 0.576)
sigma2.u2[1,1]   ( , )
sigma2.u2[1,2]   ( , 0.0372)
sigma2.u2[2,2]   ( , )

Next Practical
The next practical is free-ranging:
- You can follow the MLwiN chapter on RSR models that is given.
- You can try out RSR models in WinBUGS.
- You can try out fitting random effect models to the orthodont dataset using MCMC.
- You can try out DIC on other models.