Bayesian Model Selection and Averaging


Bayesian Model Selection and Averaging SPM for MEG/EEG course Peter Zeidman May 2019

Contents
DCM recap
Comparing models: Bayes rule for models, Bayes Factors
Rapidly evaluating models: Bayesian Model Reduction
Investigating the parameters: Bayesian Model Averaging
Multi-subject analysis: Parametric Empirical Bayes

Forward and Inverse Problems
Forward problem: given a model 𝑚 with parameters 𝜃, the likelihood 𝑝(𝑌|𝜃,𝑚) describes the data 𝑌 we would expect to measure.
Inverse problem: given data 𝑌 and priors 𝑝(𝜃|𝑚), infer the posterior over parameters 𝑝(𝜃|𝑌,𝑚) and the model evidence 𝑝(𝑌|𝑚).
Adapted from a slide by Rik Henson

DCM Recap
Priors determine the structure of the model.
[Figure: two regions R1 and R2 with a driving stimulus; plots of probability against connection strength (Hz) contrast a connection 'on' prior with a connection 'off' prior.]

DCM Recap
We have:
Measured data 𝑦
A model 𝑚 with prior beliefs about the parameters, 𝑝(𝜃|𝑚) ~ N(𝜇, Σ)
Model estimation (inversion) gives us:
1. A score for the model, which we can use to compare it against other models: the free energy 𝐹 ≅ log 𝑝(𝑦|𝑚) = accuracy − complexity
2. Estimated parameters, i.e. the posteriors 𝑝(𝜃|𝑚,𝑦) ~ N(𝜇, Σ)
𝜇: DCM.Ep – expected value of each parameter
Σ: DCM.Cp – covariance matrix

DCM Framework
We embody each of our hypotheses in a generative model. Each model differs in terms of which connections are present or absent (i.e. in its priors over parameters).
We perform model estimation (inversion).
We inspect the estimated parameters and / or compare models to see which best explains the data.

Contents
DCM recap
Comparing models: Bayes rule for models, Bayes Factors
Rapidly evaluating models: Bayesian Model Reduction
Investigating the parameters: Bayesian Model Averaging
Multi-subject analysis: Parametric Empirical Bayes

Bayes Rule for Models
Question: I've estimated 10 DCMs for a subject. What's the posterior probability that any given model is the best?
𝑝(𝑚|𝑦) = 𝑝(𝑦|𝑚) 𝑝(𝑚) / Σ_𝑚′ 𝑝(𝑦|𝑚′) 𝑝(𝑚′)
i.e. the probability of each model given the data is proportional to the model evidence 𝑝(𝑦|𝑚) multiplied by the prior on each model 𝑝(𝑚), normalised over the models considered.
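Concretely, under equal model priors the posterior model probabilities follow from the (approximate) log evidences, i.e. the free energies, via a softmax. A minimal Python sketch (not SPM code; the free energies below are made up):

    import numpy as np

    def model_posteriors(free_energies):
        """Posterior model probabilities from approximate log evidences
        (free energies), assuming equal prior probability for each model."""
        F = np.asarray(free_energies, dtype=float)
        w = np.exp(F - F.max())          # subtract the max for numerical stability
        return w / w.sum()

    # e.g. 10 estimated DCMs for one subject (illustrative values)
    print(model_posteriors([-310.2, -305.7, -309.9, -306.1, -312.0,
                            -308.4, -307.3, -311.5, -304.9, -306.8]))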

Bayes Factors
The Bayes factor is the ratio of model evidences: BF_ij = 𝑝(𝑦|𝑚_i) / 𝑝(𝑦|𝑚_j)
Interpretation, from Raftery (1995): BF 1–3 weak evidence, 3–20 positive evidence, 20–150 strong evidence, >150 very strong evidence.
Note: the free energy approximates the log of the model evidence, so the log Bayes factor is: ln BF_ij = ln 𝑝(𝑦|𝑚_i) − ln 𝑝(𝑦|𝑚_j) = 𝐹_i − 𝐹_j

Bayes Factors
Example: the free energy for model 𝑗 is 𝐹_j = 23 and the free energy for model 𝑖 is 𝐹_i = 20. So the log Bayes factor in favour of model 𝑗 is:
ln BF_j = ln 𝑝(𝑦|𝑚_j) − ln 𝑝(𝑦|𝑚_i) = 𝐹_j − 𝐹_i = 23 − 20 = 3
We remove the log using the exponential function:
BF_j = exp(3) ≈ 20
A difference in free energy of 3 means approximately 20 times stronger evidence for model 𝑗.
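The same arithmetic in a couple of lines of Python (a toy illustration using the free energies from the example above):

    import numpy as np

    F_j, F_i = 23.0, 20.0
    log_bf = F_j - F_i           # log Bayes factor in favour of model j
    bf = np.exp(log_bf)          # Bayes factor
    print(log_bf, bf)            # 3.0  ~20.1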

Bayes Factors cont.
Under equal priors, when two models are compared the posterior probability of model 𝑗 is:
𝑝(𝑚_j|𝑦) = BF_j / (BF_j + 1) = 1 / (1 + exp(−ln BF_j))
i.e. the posterior probability of a model is the sigmoid function of the log Bayes factor.
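A quick numerical check of that identity (illustrative only): the sigmoid of the log Bayes factor equals the softmax of the two free energies.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    F_j, F_i = 23.0, 20.0
    p_sigmoid = sigmoid(F_j - F_i)                          # sigmoid of the log Bayes factor
    p_softmax = np.exp(F_j) / (np.exp(F_j) + np.exp(F_i))   # normalised model evidence
    print(p_sigmoid, p_softmax)                             # both ~0.95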

[Figure: bar charts of the log Bayes factor of each model relative to the worst model, and the corresponding posterior model probabilities.]

Interim summary  

Contents
DCM recap
Comparing models: Bayes rule for models, Bayes Factors
Rapidly evaluating models: Bayesian Model Reduction
Investigating the parameters: Bayesian Model Averaging
Multi-subject analysis: Parametric Empirical Bayes

Bayesian model reduction (BMR)
The full model is inverted with variational Bayes (VB) under its priors, giving its free energy and posteriors.
A nested / reduced model differs from the full model only in its priors – e.g. some parameters are switched off.
Bayesian Model Reduction derives the free energy and posteriors of any such reduced model analytically from the full model's inversion, without re-estimating from the data.
Friston et al., Neuroimage, 2016
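Under the Gaussian (Laplace) assumptions used in DCM, this reduction has a closed form. The sketch below is an illustrative Python re-derivation of that identity; the function and variable names are mine, not SPM's API, and it assumes multivariate normal priors and posteriors throughout.

    import numpy as np

    def reduced_log_evidence(mu, Sigma, mu0, Sigma0, mu0_r, Sigma0_r):
        """Change in log evidence (reduced minus full) and the reduced posterior,
        assuming Gaussian priors and posteriors (Laplace approximation).
        mu, Sigma       : posterior mean / covariance of the full model
        mu0, Sigma0     : prior mean / covariance of the full model
        mu0_r, Sigma0_r : prior mean / covariance of the reduced model"""
        P, P0, P0r = (np.linalg.inv(S) for S in (Sigma, Sigma0, Sigma0_r))
        Pr = P + P0r - P0                          # reduced posterior precision
        b = P @ mu + P0r @ mu0_r - P0 @ mu0
        mu_r = np.linalg.solve(Pr, b)              # reduced posterior mean
        logdet = lambda A: np.linalg.slogdet(A)[1]
        dF = 0.5 * (logdet(P) + logdet(P0r) - logdet(P0) - logdet(Pr)) \
             - 0.5 * (mu @ P @ mu + mu0_r @ P0r @ mu0_r - mu0 @ P0 @ mu0 - b @ mu_r)
        return dF, mu_r, np.linalg.inv(Pr)

    # Toy example: 'switch off' the second of two parameters by shrinking its prior variance
    mu,  Sigma  = np.array([0.4, 0.1]), np.diag([0.05, 0.05])   # full posterior
    mu0, Sigma0 = np.zeros(2), np.eye(2)                        # full priors
    mu0_r, Sigma0_r = np.zeros(2), np.diag([1.0, 1e-8])         # reduced priors
    print(reduced_log_evidence(mu, Sigma, mu0, Sigma0, mu0_r, Sigma0_r)[0])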

Contents
DCM recap
Comparing models: Bayes rule for models, Bayes Factors
Rapidly evaluating models: Bayesian Model Reduction
Investigating the parameters: Bayesian Model Averaging
Multi-subject analysis: Parametric Empirical Bayes

Bayesian Model Averaging (BMA)
Having compared models, we can look at the parameters (connection strengths). We average over models, weighted by the posterior probability of each model:
𝑝(𝜃|𝑦) = Σ_𝑚 𝑝(𝜃|𝑦,𝑚) 𝑝(𝑚|𝑦)
This can be limited to models within the winning family. SPM does this using sampling.
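A toy sketch of sampling-based model averaging over Gaussian per-model posteriors (illustrative only, not SPM's implementation; all numbers are made up):

    import numpy as np

    rng = np.random.default_rng(0)

    def bma(post_means, post_covs, model_post, n_samples=10_000):
        """Average parameters over models, weighted by posterior model probability,
        by sampling from each model's Gaussian posterior."""
        p = np.asarray(model_post, dtype=float)
        p /= p.sum()
        picks = rng.choice(len(post_means), size=n_samples, p=p)   # one model per sample
        draws = np.array([rng.multivariate_normal(post_means[k], post_covs[k])
                          for k in picks])
        return draws.mean(axis=0), np.cov(draws, rowvar=False)

    # Two toy models of the same pair of connections
    means = [np.array([0.5, 0.2]), np.array([0.4, 0.0])]
    covs  = [np.diag([0.02, 0.02]), np.diag([0.02, 1e-6])]
    print(bma(means, covs, model_post=[0.7, 0.3])[0])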

Contents
DCM recap
Comparing models: Bayes rule for models, Bayes Factors
Rapidly evaluating models: Bayesian Model Reduction
Investigating the parameters: Bayesian Model Averaging
Multi-subject analysis: Parametric Empirical Bayes

Hierarchical model of parameters
What's the average connection strength 𝜃? Is there an effect of disease on this connection? Could we predict a new subject's disease status using our estimate of this connection? Could we get better estimates of connection strengths knowing what's typical for the group?
[Figure: a group level (mean and disease effect) above per-subject first-level DCMs, each with parameters 𝜃. Image credit: Wilson Joseph from Noun Project]

Hierarchical model of parameters: Parametric Empirical Bayes
Second level: priors on the second-level parameters, and a second-level (linear) model of the DCM parameters with between-subject error.
First level: a DCM for each subject i, with measurement noise.
Image credit: Wilson Joseph from Noun Project

GLM of connectivity parameters
𝜃^(1) = 𝑋𝜃^(2) + 𝜖^(2)
The first-level (per-subject) parameters 𝜃^(1) are modelled as a design matrix 𝑋 of between-subjects effects (covariates) multiplied by group-level parameters 𝜃^(2), plus unexplained between-subject variability 𝜖^(2). In the example, the rows of 𝑋 are subjects 1–6 and its columns encode the group average connection strength, the effect of group on the connection, and the effect of age on the connection.
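A small illustration of that second-level GLM with made-up numbers (design matrix columns: group mean, group membership, mean-centred age), just to make the shapes concrete:

    import numpy as np

    rng = np.random.default_rng(1)

    group = np.array([0, 0, 0, 1, 1, 1])           # e.g. 0 = control, 1 = patient
    age   = np.array([24, 31, 28, 45, 39, 52.0])

    # Design matrix X: group average, effect of group, effect of (mean-centred) age
    X = np.column_stack([np.ones(6), group, age - age.mean()])

    theta2 = np.array([0.5, -0.2, 0.01])           # group-level parameters (illustrative)
    eps2   = rng.normal(0, 0.05, 6)                # between-subject variability

    theta1 = X @ theta2 + eps2                     # per-subject connection strengths

    # Recover the group-level effects by ordinary least squares
    # (PEB does this Bayesianly, with priors and estimated between-subject covariance)
    print(np.linalg.lstsq(X, theta1, rcond=None)[0])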

PEB Estimation
[Figure: first-level DCMs for subjects 1 … N feed a second-level PEB estimation, yielding first-level free energies and parameters under empirical priors.]

spm_dcm_peb_review – SPM's interface for reviewing PEB results.

PEB Advantages / Applications
Properly conveys uncertainty about parameters from the subject level to the group level.
Can improve first-level parameter estimates.
Can be used to compare specific reduced PEB models (switching off combinations of group-level parameters), or to search over nested models (BMR).
Prediction (leave-one-out cross-validation).

Summary
We can score the quality of models based on their (approximate) log model evidence or free energy, 𝐹. We compute 𝐹 by performing model estimation.
If models differ only in their priors, we can compute 𝐹 rapidly using Bayesian Model Reduction (BMR).
Models are compared using Bayes rule for models. Under equal priors for each model, this simplifies to the log Bayes factor.
We can test hypotheses at the group level using the Parametric Empirical Bayes (PEB) framework.

Further reading
PEB tutorial: https://github.com/pzeidman/dcm-peb-example
Free energy: Penny, W.D., 2012. Comparing dynamic causal models using AIC, BIC and free energy. NeuroImage, 59(1), pp.319-330.
Parametric Empirical Bayes (PEB): Friston, K.J., Litvak, V., Oswal, A., Razi, A., Stephan, K.E., van Wijk, B.C., Ziegler, G. and Zeidman, P., 2015. Bayesian model reduction and empirical Bayes for group (DCM) studies. NeuroImage.
Thanks to Will Penny for his lecture notes: http://www.fil.ion.ucl.ac.uk/~wpenny/

extras

Fixed effects (FFX)
FFX summary of the log evidence: sum the log evidences over subjects, ln 𝑝(𝑦|𝑚) = Σ_n ln 𝑝(𝑦_n|𝑚)
Group Bayes Factor (GBF): GBF_ij = Π_n BF_ij^(n), the product of the subjects' Bayes factors (equivalently, the exponential of the summed log Bayes factors).
Stephan et al., Neuroimage, 2009
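A minimal sketch of this FFX summary with made-up per-subject free energies (two models, four subjects):

    import numpy as np

    # Illustrative free energies: rows = subjects, columns = models 1 and 2
    F = np.array([[-300.0, -302.5],
                  [-410.3, -411.9],
                  [-255.2, -257.0],
                  [-380.1, -378.8]])

    log_gbf = np.sum(F[:, 0] - F[:, 1])   # summed log Bayes factors (model 1 vs model 2)
    gbf = np.exp(log_gbf)                 # Group Bayes Factor
    print(log_gbf, gbf)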

Fixed effects (FFX) 11 out of 12 subjects favour model 1 GBF = 15 (in favour of model 2). So the FFX inference disagrees with most subjects. Stephan et al., Neuroimage, 2009

Random effects (RFX)
SPM estimates a hierarchical model in which each subject's data are generated by a model drawn at random, with the unknown model frequencies given a Dirichlet distribution. This is a model of models.
Outputs: the expected probability of each model (e.g. model 2) and its exceedance probability (the probability that it is more frequent in the population than the other models).
Stephan et al., Neuroimage, 2009
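A sketch of how expected and exceedance probabilities follow from a Dirichlet posterior over model frequencies (the alpha values below are invented; estimating them from the subjects' log evidences is the job of SPM's RFX routine):

    import numpy as np

    rng = np.random.default_rng(0)

    alpha = np.array([4.2, 9.8])                  # illustrative Dirichlet posterior (2 models)

    expected_p = alpha / alpha.sum()              # expected probability of each model

    # Exceedance probability: chance each model is the most frequent (Monte Carlo estimate)
    samples = rng.dirichlet(alpha, size=100_000)
    exceedance_p = np.mean(samples.argmax(axis=1) == np.arange(len(alpha))[:, None], axis=1)

    print(expected_p, exceedance_p)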

[Figure: bar charts of the expected probabilities and exceedance probabilities of each model.]

Variational Bayes
Approximates: the log model evidence ln 𝑝(𝑦|𝑚), and the posterior over parameters 𝑞(𝜃) ≈ 𝑝(𝜃|𝑦,𝑚).
The log model evidence is decomposed as: ln 𝑝(𝑦|𝑚) = 𝐹 + KL[𝑞(𝜃) ‖ 𝑝(𝜃|𝑦,𝑚)]
The second term is the difference between the true and approximate posterior; the first is the free energy (computed under the Laplace approximation), which can be written as accuracy − complexity.

The Free Energy
𝐹 = accuracy − complexity
The complexity term is the KL divergence between the posterior N(𝜇, Σ) and the prior N(𝜇₀, Σ₀):
Complexity = ½ (𝜇 − 𝜇₀)ᵀ Σ₀⁻¹ (𝜇 − 𝜇₀) + ½ ln(|Σ₀| / |Σ|) + ½ tr(Σ₀⁻¹Σ) − ½ n
It penalises the distance between prior and posterior means, weighted by the prior precisions, and includes an Occam's factor: the log ratio of the volume of the prior parameters to the volume of the posterior parameters. (Terms for hyperparameters not shown.)
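As a toy numerical illustration of that complexity term (values are made up, not DCM output):

    import numpy as np

    def complexity(mu, Sigma, mu0, Sigma0):
        """KL divergence between a Gaussian posterior N(mu, Sigma) and prior N(mu0, Sigma0)."""
        n = len(mu)
        P0 = np.linalg.inv(Sigma0)                       # prior precision
        d = mu - mu0
        return 0.5 * (d @ P0 @ d                         # distance between means
                      + np.linalg.slogdet(Sigma0)[1]     # log volume of the prior
                      - np.linalg.slogdet(Sigma)[1]      # minus log volume of the posterior
                      + np.trace(P0 @ Sigma) - n)

    mu0, Sigma0 = np.zeros(2), np.eye(2)                  # priors
    mu,  Sigma  = np.array([0.4, 0.1]), 0.05 * np.eye(2)  # posteriors (toy values)
    print(complexity(mu, Sigma, mu0, Sigma0))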

Bayes Factors cont.
If we don't have uniform priors, we can easily compare models 𝑖 and 𝑗 using odds ratios.
The Bayes factor is still: BF_ij = 𝑝(𝑦|𝑚_i) / 𝑝(𝑦|𝑚_j)
The prior odds are: 𝑝(𝑚_i) / 𝑝(𝑚_j)
The posterior odds are: 𝑝(𝑚_i|𝑦) / 𝑝(𝑚_j|𝑦)
So Bayes rule is: posterior odds = Bayes factor × prior odds
e.g. prior odds of 2 and a Bayes factor of 10 gives posterior odds of 20 – "20 to 1 ON" in bookmakers' terms.

Dilution of evidence
If we had eight different hypotheses about connectivity, we could embody each hypothesis as a DCM and compare the evidence.
Problem: 'dilution of evidence'. Similar models share the probability mass, making it hard for any one model to stand out.
[Figure: eight candidate DCMs; models 1 to 4 have 'top-down' connections, models 5 to 8 have 'bottom-up' connections.]

Family analysis
Grouping models into families can help. Now, one family = one hypothesis.
Family 1: four "top-down" DCMs. Family 2: four "bottom-up" DCMs.
Posterior family probability: the sum of the posterior probabilities of the models in the family, 𝑝(𝑓|𝑦) = Σ_{𝑚∈𝑓} 𝑝(𝑚|𝑦)
Comparing a small number of models or a small number of families helps avoid the dilution of evidence problem.
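A tiny sketch of family-level comparison, reusing the softmax of free energies from earlier (family assignments and free energies are made up):

    import numpy as np

    F = np.array([-310.2, -305.7, -309.9, -306.1,     # models 1-4: 'top-down' family
                  -312.0, -308.4, -307.3, -311.5])    # models 5-8: 'bottom-up' family
    family = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    w = np.exp(F - F.max())
    p_model = w / w.sum()                             # posterior model probabilities

    # Posterior family probability = sum of its members' posterior probabilities
    p_family = np.array([p_model[family == f].sum() for f in (0, 1)])
    print(p_family)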

Family analysis

Generative model (DCM)
A generative model 𝑚 answers the forward problem: what data would we expect to measure, given this model and a particular setting of the parameters? This is the likelihood 𝑝(𝑦|𝑚,𝜃), mapping from the timing of the stimulus and parameters 𝜃^(𝑖) (e.g. the strength of a connection) to predicted data (e.g. an ERP).
Inverse problem: given some data 𝑦 and prior beliefs 𝑝(𝜃), what posterior over the parameters 𝑝(𝜃|𝑦,𝑚) maximises the model evidence 𝑝(𝑦|𝑚)?
Image credit: Marcin Wichary, Flickr