Bayesian models for fMRI data

Presentation transcript:

Bayesian models for fMRI data Klaas Enno Stephan Translational Neuromodeling Unit (TNU) Institute for Biomedical Engineering, University of Zurich & ETH Zurich Laboratory for Social & Neural Systems Research (SNS), University of Zurich Wellcome Trust Centre for Neuroimaging, University College London With many thanks for slides & images to: FIL Methods group, particularly Guillaume Flandin and Jean Daunizeau The Reverend Thomas Bayes (1702-1761) SPM Course Zurich 13-15 February 2013

Bayes' Theorem Posterior = Likelihood × Prior / Evidence. Reverend Thomas Bayes (1702–1761). "Bayes' Theorem describes how an ideally rational person processes information." (Wikipedia)

Bayes' Theorem Given data y and parameters θ, the joint probability is: p(y,θ) = p(y|θ) p(θ) = p(θ|y) p(y). Eliminating p(y,θ) gives Bayes' rule: Posterior p(θ|y) = Likelihood p(y|θ) × Prior p(θ) / Evidence p(y).
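As a minimal numeric sketch of Bayes' rule above, with an invented two-state parameter and made-up numbers:

```python
# Bayes' rule for a discrete parameter: posterior = likelihood * prior / evidence.
# The states and all probabilities below are invented for illustration.
prior = {"active": 0.2, "inactive": 0.8}          # p(theta)
likelihood = {"active": 0.7, "inactive": 0.1}     # p(y | theta) for the observed y

evidence = sum(likelihood[s] * prior[s] for s in prior)              # p(y)
posterior = {s: likelihood[s] * prior[s] / evidence for s in prior}  # p(theta | y)
```

The evidence term is exactly what normalises the posterior so that it sums to one.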

Bayesian inference: an animation

Principles of Bayesian inference Formulation of a generative model: likelihood p(y|θ) and prior distribution p(θ). Observation of data y. Update of beliefs based upon observations, given a prior state of knowledge.

Posterior mean & variance of univariate Gaussians Likelihood p(y|μ) = N(y; μ, σe²) and prior p(μ) = N(μ; μp, σp²) give the posterior p(μ|y) = N(μ; m, v) with 1/v = 1/σe² + 1/σp² and m = v (μp/σp² + y/σe²). Posterior mean = variance-weighted combination of prior mean and data mean.

Same thing – but expressed as precision weighting With precisions λ = 1/σ², the posterior precision is λpost = λe + λp and the posterior mean is m = (λe y + λp μp) / λpost: a relative precision weighting of data and prior.

Same thing – but from an explicit hierarchical perspective: the prior enters as a second level above the data, and the posterior again combines the two levels by relative precision weighting.
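The precision weighting described in the last three slides can be sketched in a few lines (variable names are ours, not SPM's):

```python
# Posterior of a univariate Gaussian mean with a Gaussian prior (known
# variances), written as precision (= 1/variance) weighting.
def gaussian_posterior(mu_prior, var_prior, y, var_likelihood):
    prec_prior = 1.0 / var_prior
    prec_lik = 1.0 / var_likelihood
    prec_post = prec_prior + prec_lik            # precisions add
    mu_post = (prec_prior * mu_prior + prec_lik * y) / prec_post
    return mu_post, 1.0 / prec_post

# Precise data (var 1) vs. vague prior (var 4): the posterior mean
# lands closer to the data mean than to the prior mean.
mu, var = gaussian_posterior(mu_prior=0.0, var_prior=4.0, y=2.0, var_likelihood=1.0)
```

The posterior variance is always smaller than both the prior and the likelihood variance, since precisions add.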

Why should I know about Bayesian stats? Because Bayesian principles are fundamental for: statistical inference in general; sophisticated analyses of (neuronal) systems; contemporary theories of brain function.

Problems of classical (frequentist) statistics p-value: probability of observing data at least as extreme as the measured data, given that the effect is absent (the null hypothesis). Limitations: one can never accept the null hypothesis; given enough data, one can always demonstrate a significant effect; correction for multiple comparisons is necessary. Solution: infer the posterior probability of the effect, given the data.

Generative models: forward and inverse problems Forward problem: from the likelihood and prior to predicted data. Inverse problem: from observed data back to the posterior distribution over parameters.

Dynamic causal modeling (DCM) EEG, MEG fMRI Forward model: Predicting measured activity given a putative neuronal state Model inversion: Estimating neuronal mechanisms from brain activity measures Friston et al. (2003) NeuroImage

The Bayesian brain hypothesis & free-energy principle Prediction error = sensations – predictions. Action: change the sensory input; perception: change the predictions. Maximizing the evidence (of the brain's generative model) = minimizing the surprise about the data (sensory inputs). Friston et al. 2006, J Physiol Paris

Individual hierarchical Bayesian learning [figure: hierarchy from volatility → associations → events in the world → sensory stimuli] Mathys et al. 2011, Front. Hum. Neurosci.

Aberrant Bayesian message passing in schizophrenia: abnormal (precision-weighted) prediction errors → abnormal modulation of NMDAR-dependent synaptic plasticity at forward connections of cortical hierarchies. Backward & lateral input; forward & lateral prediction error. g: generative model; μ: expectation of the approximate recognition density; θ: parameters of the generative model (= connection strengths between levels); λ: hyperparameters (= parameters encoding the uncertainty of the approximate recognition model); ε: prediction error. Forward recognition effects; de-correlating lateral interactions; backward generation effects; lateral interactions mediating priors. Stephan et al. 2006, Biol. Psychiatry

Why should I know about Bayesian stats? Because SPM is getting more and more Bayesian: segmentation & spatial normalisation; posterior probability maps (PPMs) – 1st level: specific spatial priors, 2nd level: global spatial priors; Dynamic Causal Modelling (DCM); Bayesian Model Selection (BMS); EEG: source reconstruction.

[Overview figure: the SPM pipeline – image time-series → realignment → smoothing (kernel) → general linear model (design matrix, parameter estimates) → statistical inference (Gaussian field theory, p < 0.05) → statistical parametric map (SPM), with normalisation to a template – annotated with its Bayesian components: Bayesian segmentation, posterior probability and normalisation, spatial priors on activation extent, posterior probability maps (PPMs), Dynamic Causal Modelling.]

Spatial normalisation: Bayesian regularisation Deformations consist of a linear combination of smooth basis functions (3D DCT). Find maximum a posteriori (MAP) estimates of the deformation parameters by minimising the sum of: the "difference" between template and source image, and the squared distance between the parameters and their expected values (regularisation).

Spatial normalisation: overfitting Template image. Affine registration (χ² = 472.1). Non-linear registration without regularisation (χ² = 287.3). Non-linear registration using regularisation (χ² = 302.7).
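The MAP estimation with a quadratic regulariser described above has the same structure as ridge regression. A toy version, with an invented linearised model A standing in for the basis-function expansion:

```python
import numpy as np

# Toy MAP estimate with a quadratic (Gaussian) prior: minimise the squared
# data misfit plus the squared distance of the parameters from their
# expected values q0. All names and numbers are illustrative, not SPM's model.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))            # linearised model (basis functions)
q_true = np.array([1.0, -0.5, 0.0, 0.3, 0.0])
y = A @ q_true + 0.1 * rng.normal(size=20)

lam = 1.0                               # prior precision relative to noise
q0 = np.zeros(5)                        # expected parameter values
# MAP solution of ||y - A q||^2 + lam ||q - q0||^2:
q_map = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ y + lam * q0)
```

As in the slide's registration example, the regularised solution fits the data slightly worse than the unregularised one but stays closer to the expected parameter values.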

Bayesian segmentation with empirical priors Goal: for each voxel, compute the probability that it belongs to a particular tissue type, given its intensity. Likelihood: intensities are modelled by a mixture of Gaussian distributions representing different tissue classes (e.g. GM, WM, CSF). Priors: obtained from tissue probability maps (segmented images of 151 subjects). p(tissue | intensity) ∝ p(intensity | tissue) ∙ p(tissue). Ashburner & Friston 2005, NeuroImage
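A single-voxel sketch of this classification rule, with invented class means, standard deviations and priors (real priors come from the tissue probability maps):

```python
import math

# Bayesian tissue classification for one voxel: one Gaussian intensity
# model per tissue class, combined with a class prior via Bayes' rule.
def gauss_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

classes = {                 # (mean intensity, sd, prior p(tissue)) -- invented
    "GM":  (0.5, 0.1, 0.4),
    "WM":  (0.8, 0.1, 0.4),
    "CSF": (0.2, 0.1, 0.2),
}

intensity = 0.55            # observed voxel intensity
unnorm = {c: gauss_pdf(intensity, mu, sd) * prior
          for c, (mu, sd, prior) in classes.items()}
evidence = sum(unnorm.values())
posterior = {c: v / evidence for c, v in unnorm.items()}   # p(tissue | intensity)
```

The voxel is assigned to the class with the highest posterior probability; here the intensity 0.55 sits closest to the GM mean.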

Bayesian fMRI analyses General Linear Model: y = Xβ + ε with ε ~ N(0, Cε). What are the priors? In "classical" SPM: no priors (= "flat" priors). Full Bayes: priors are predefined. Empirical Bayes: priors are estimated from the data, assuming a hierarchical generative model – the parameters of one level form the priors for the distribution of the parameters at the lower level; parameters and hyperparameters at each level can be estimated using EM.

Posterior Probability Maps (PPMs) Posterior distribution: probability of the effect given the data – mean: size of effect; precision: variability. Posterior probability map: images of the probability that an activation exceeds some specified threshold, given the data y. Two thresholds: an activation threshold γ (e.g. a percentage of the whole-brain mean signal) and a probability α that voxels must exceed to be displayed (e.g. 95%).
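A toy sketch of PPM thresholding for a few voxels, assuming each voxel's posterior over the effect is Gaussian with the (invented) mean and sd given below:

```python
import math

# For each voxel, compute p(effect > gamma | y) from the voxel's Gaussian
# posterior, and display only voxels where that probability exceeds alpha.
def p_exceeds(mu, sd, gamma):
    # P(beta > gamma) for beta ~ N(mu, sd^2), via the Gaussian CDF
    return 0.5 * (1.0 - math.erf((gamma - mu) / (sd * math.sqrt(2.0))))

gamma, alpha = 0.5, 0.95
voxels = {"v1": (1.2, 0.3), "v2": (0.6, 0.4), "v3": (0.2, 0.1)}  # (mean, sd)

ppm = {v: p_exceeds(mu, sd, gamma) for v, (mu, sd) in voxels.items()}
displayed = [v for v, p in ppm.items() if p >= alpha]
```

Note how v2 has a posterior mean above γ but is not displayed: its posterior is too imprecise for the exceedance probability to reach α.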

2nd level PPMs with global priors 1st level (GLM): y = Xβ + ε(1). 2nd level (shrinkage prior): β = 0 + ε(2). Heuristically: use the variance of mean-corrected activity over voxels as the prior variance of β at any particular voxel. ε(2) reflects regionally specific effects → assume that it is zero on average over voxels → the variance of this prior is implicitly estimated by estimating Cε(2). In the absence of evidence to the contrary, parameters will shrink to zero.

2nd level PPMs with global priors 1st level (GLM): voxel-specific. 2nd level (shrinkage prior): global – pooled estimate over voxels. Compute Cε and Cβ via ReML/EM, and apply the usual rule for computing the posterior mean & covariance of Gaussians. Friston & Penny 2003, NeuroImage
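A toy version of the shrinkage posterior under a zero-mean global prior; the pooled prior variance here is a crude stand-in for the ReML/EM estimate:

```python
import numpy as np

# Shrinkage posterior: each voxel's estimate is pulled toward zero in
# proportion to its noise level relative to the prior variance pooled
# over voxels. All numbers are invented for illustration.
beta_ols = np.array([4.0, 0.5, -2.0, 0.1])   # 1st-level estimates per voxel
var_noise = 1.0                               # error variance (assumed known)
var_prior = np.var(beta_ols)                  # crude pooled prior variance

shrink = var_prior / (var_prior + var_noise)  # in (0, 1)
beta_post = shrink * beta_ols                 # posterior means shrink toward 0
var_post = shrink * var_noise                 # posterior variance per voxel
```

This is the Gaussian posterior rule from the univariate slides, specialised to a zero prior mean: in the absence of evidence to the contrary, estimates shrink to zero.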

PPMs vs. SPMs PPMs derive from the posterior (likelihood × prior); SPMs from the likelihood alone. Bayesian test: the posterior probability that the effect exceeds a threshold, given the data. Classical t-test: the probability of the data under the null hypothesis of no effect.

PPMs and multiple comparisons Friston & Penny (2003): no need to correct for multiple comparisons. Thresholding a PPM at 95% confidence: in every voxel, the posterior probability of an activation ≥ γ is ≥ 95%. At most, 5% of the voxels identified could have activations less than γ. Independent of the search volume, thresholding a PPM thus puts an upper bound on the false discovery rate. NB: this is being debated.

PPMs vs. SPMs PPMs: show activations greater than a given size. SPMs: show voxels with non-zero activations.

PPMs: pros and cons Advantages: one can infer that a cause did not elicit a response; inference is independent of search volume; PPMs do not conflate effect-size and effect-variability. Disadvantages: estimating priors over voxels is computationally demanding; practical benefits are yet to be established; thresholds other than zero require justification.

Model comparison and selection Given competing hypotheses on the structure & functional mechanisms of a system, which model is the best? Which model represents the best balance between model fit and model complexity? For which model m does p(y|m) become maximal? Pitt & Myung (2002) TICS

Bayesian model selection (BMS) Model evidence: p(y|m) = ∫ p(y|θ,m) p(θ|m) dθ (Ghahramani 2004) [figure: p(y|m) as a distribution over all possible datasets y]. The evidence accounts for both accuracy and complexity of the model and is a measure of generalizability. Various approximations, e.g.: negative free energy, AIC, BIC. MacKay 1992, Neural Comput.; Penny et al. 2004a, NeuroImage

Approximations to the model evidence The logarithm is a monotonic function, so maximizing the log model evidence = maximizing the model evidence. Log model evidence = balance between fit and complexity. In SPM2 & SPM5, the interface offers two approximations (p = number of parameters, N = number of data points): Akaike Information Criterion: AIC = log p(y|θ̂, m) − p. Bayesian Information Criterion: BIC = log p(y|θ̂, m) − (p/2) log N. Penny et al. 2004a, NeuroImage
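The two criteria can be written down directly. Note that this follows the log-evidence convention above (larger is better), not the classical −2·log-likelihood convention; the log likelihoods below are invented:

```python
import math

# Log-evidence approximations for model comparison:
# AIC = accuracy - p, BIC = accuracy - (p/2) * log(N), where accuracy is
# the maximised log likelihood, p the parameter count, N the data count.
def aic(log_lik, p):
    return log_lik - p

def bic(log_lik, p, n):
    return log_lik - 0.5 * p * math.log(n)

# Toy comparison: a richer model must improve the fit by more than its
# extra parameter cost to win.
m1 = aic(log_lik=-100.0, p=3)
m2 = aic(log_lik=-98.0, p=8)   # better fit, but 5 extra parameters
```

Here m1 wins under AIC: the 2-unit gain in fit does not offset the 5-parameter penalty.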

The (negative) free energy approximation Under Gaussian assumptions about the posterior (Laplace approximation): F = log p(y|m) − KL[q(θ), p(θ|y,m)] = accuracy − complexity = ⟨log p(y|θ,m)⟩q − KL[q(θ), p(θ|m)].

The complexity term in F In contrast to AIC & BIC, the complexity term of the negative free energy F accounts for parameter interdependencies. The complexity term of F is higher: the more independent the prior parameters (→ effective degrees of freedom); the more dependent the posterior parameters; the more the posterior mean deviates from the prior mean. NB: since SPM8, only F is used for model selection!
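The dependence of the complexity term on parameter interdependencies can be illustrated with the closed-form KL divergence between Gaussians (all means and covariances below are invented):

```python
import numpy as np

# Complexity as KL[q(theta), p(theta)] for multivariate Gaussians; unlike
# AIC/BIC's parameter count, it depends on the covariance structure.
def kl_gauss(mu_q, C_q, mu_p, C_p):
    k = len(mu_q)
    iCp = np.linalg.inv(C_p)
    d = mu_p - mu_q
    return 0.5 * (np.trace(iCp @ C_q) + d @ iCp @ d - k
                  + np.log(np.linalg.det(C_p) / np.linalg.det(C_q)))

mu_p, C_p = np.zeros(2), np.eye(2)             # independent prior
mu_q = np.array([0.5, -0.5])
C_ind = 0.5 * np.eye(2)                        # independent posterior
C_dep = np.array([[0.5, 0.4], [0.4, 0.5]])     # correlated posterior
# Same posterior variances, but the correlated posterior incurs a
# larger complexity term.
```

This mirrors the slide: with the prior fixed, making the posterior parameters more dependent increases the complexity term even though no parameter was added.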

Bayes factors To compare two models, we could just compare their log evidences. But: the log evidence is just some number – not very intuitive! A more intuitive interpretation of model comparisons is made possible by Bayes factors: B12 = p(y|m1) / p(y|m2), a positive value in [0; ∞[. Kass & Raftery classification (Kass & Raftery 1995, J. Am. Stat. Assoc.):
B12 1 to 3: p(m1|y) 50–75% – weak evidence
B12 3 to 20: p(m1|y) 75–95% – positive evidence
B12 20 to 150: p(m1|y) 95–99% – strong evidence
B12 ≥ 150: p(m1|y) ≥ 99% – very strong evidence
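Given two log evidences (e.g. the free-energy approximations SPM reports), the Bayes factor and its Kass & Raftery category follow directly (the log evidences below are invented):

```python
import math

# Bayes factor from two log evidences: B12 = exp(log p(y|m1) - log p(y|m2)).
def bayes_factor(logev1, logev2):
    return math.exp(logev1 - logev2)

# Kass & Raftery (1995) interpretation of B12 in favour of model 1.
def kass_raftery(bf):
    if bf >= 150: return "very strong"
    if bf >= 20:  return "strong"
    if bf >= 3:   return "positive"
    return "weak"

bf = bayes_factor(-100.0, -103.1)   # log-evidence difference of 3.1
```

A log-evidence difference of about 3 already corresponds to a Bayes factor of about 20, i.e. strong evidence.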

BMS in SPM8: an example Four models M1–M4 over regions V1, V5 and PPC with inputs stim and attention. M2 better than M1: BF ≈ 2966. M3 better than M2: BF ≈ 12 (ΔF = 2.450). M4 better than M3: BF ≈ 23 (ΔF = 3.144). The posterior model probability in the lower plot is a normalised probability: p(mi|y) = p(y|mi) / Σj p(y|mj). Note that under flat model priors, p(mi|y) ∝ p(y|mi).

Thank you