Multivariate models for fMRI data Klaas Enno Stephan (with 90% of slides kindly contributed by Kay H. Brodersen) Translational Neuromodeling Unit (TNU)

Slides:

Advertisements

Similar presentations

J. Daunizeau Institute of Empirical Research in Economics, Zurich, Switzerland Brain and Spine Institute, Paris, France Bayesian inference.

Advertisements

Hierarchical Models and

Bayesian models for fMRI data

Multivariate analyses and decoding Kay H. Brodersen Computational Neuroeconomics Group Institute of Empirical Research in Economics University of Zurich.

Group analyses of fMRI data Methods & models for fMRI data analysis in neuroeconomics November 2010 Klaas Enno Stephan Laboratory for Social and Neural.

Classical inference and design efficiency Zurich SPM Course 2014

The General Linear Model (GLM)

J. Daunizeau Wellcome Trust Centre for Neuroimaging, London, UK Institute of Empirical Research in Economics, Zurich, Switzerland Bayesian inference.

Group analyses of fMRI data Methods & models for fMRI data analysis 28 April 2009 Klaas Enno Stephan Laboratory for Social and Neural Systems Research.

Group analyses of fMRI data Methods & models for fMRI data analysis 26 November 2008 Klaas Enno Stephan Laboratory for Social and Neural Systems Research.

Modeling fMRI data generated by overlapping cognitive processes with unknown onsets using Hidden Process Models Rebecca A. Hutchinson (1) Tom M. Mitchell.

Learning to Identify Overlapping and Hidden Cognitive Processes from fMRI Data Rebecca Hutchinson, Tom Mitchell, Indra Rustandi Carnegie Mellon University.

Experimental Design Tali Sharot & Christian Kaul With slides taken from presentations by: Tor Wager Christian Ruff.

General Linear Model & Classical Inference

London, UK 21 May 2012 Janaina Mourao-Miranda, Machine Learning and Neuroimaging Lab, University College London, UK Pattern Recognition for Neuroimaging.

GUIDE to The… D U M M I E S’ DCM Velia Cardin. Functional Specialization is a question of Where? Where in the brain is a certain cognitive/perceptual.

General Linear Model & Classical Inference Guillaume Flandin Wellcome Trust Centre for Neuroimaging University College London SPM M/EEGCourse London, May.

Dynamic Causal Modelling THEORY SPM Course FIL, London October 2009 Hanneke den Ouden Donders Centre for Cognitive Neuroimaging Radboud University.

From Localization to Connectivity and... Lei Sheu 1/11/2011.

Dynamic Causal Modelling

How To Do Multivariate Pattern Analysis

DCM Advanced, Part II Will Penny (Klaas Stephan) Wellcome Trust Centre for Neuroimaging Institute of Neurology University College London SPM Course 2014.

With many thanks for slides & images to: FIL Methods group, Virginia Flanagin and Klaas Enno Stephan Dr. Frederike Petzschner Translational Neuromodeling.

Dynamic Causal Modelling (DCM): Theory Demis Hassabis & Hanneke den Ouden Thanks to Klaas Enno Stephan Functional Imaging Lab Wellcome Dept. of Imaging.

SPM Course Zurich, February 2015 Group Analyses Guillaume Flandin Wellcome Trust Centre for Neuroimaging University College London With many thanks to.

FMRI Methods Lecture7 – Review: analyses & statistics.

Current work at UCL & KCL. Project aim: find the network of regions associated with pleasant and unpleasant stimuli and use this information to classify.

Group analyses of fMRI data Methods & models for fMRI data analysis November 2012 With many thanks for slides & images to: FIL Methods group, particularly.

Contrasts & Inference - EEG & MEG Himn Sabir 1. Topics 1 st level analysis 2 nd level analysis Space-Time SPMs Time-frequency analysis Conclusion 2.

Bayesian Inference and Posterior Probability Maps Guillaume Flandin Wellcome Department of Imaging Neuroscience, University College London, UK SPM Course,

Contrasts & Statistical Inference

Dynamic Causal Modelling Advanced Topics SPM Course (fMRI), May 2015 Peter Zeidman Wellcome Trust Centre for Neuroimaging University College London.

Neural Modeling - Fall NEURAL TRANSFORMATION Strategy to discover the Brain Functionality Biomedical engineering Group School of Electrical Engineering.

MVPD – Multivariate pattern decoding Christian Kaul MATLAB for Cognitive Neuroscience.

C O R P O R A T E T E C H N O L O G Y Information & Communications Neural Computation Machine Learning Methods on functional MRI Data Siemens AG Corporate.

Ch. 5 Bayesian Treatment of Neuroimaging Data Will Penny and Karl Friston Ch. 5 Bayesian Treatment of Neuroimaging Data Will Penny and Karl Friston 18.

Methods for Dummies Second level Analysis (for fMRI) Chris Hardy, Alex Fellows Expert: Guillaume Flandin.

Statistical Analysis An Introduction to MRI Physics and Analysis Michael Jay Schillaci, PhD Monday, April 7 th, 2007.

1 st level analysis: Design matrix, contrasts, and inference Stephane De Brito & Fiona McNabe.

Accuracy, Reliability, and Validity of Freesurfer Measurements David H. Salat

The general linear model and Statistical Parametric Mapping II: GLM for fMRI Alexa Morcom and Stefan Kiebel, Rik Henson, Andrew Holmes & J-B Poline.

Bayesian Inference in fMRI Will Penny Bayesian Approaches in Neuroscience Karolinska Institutet, Stockholm February 2016.

The General Linear Model Guillaume Flandin Wellcome Trust Centre for Neuroimaging University College London SPM fMRI Course London, October 2012.

Bayesian Model Selection and Averaging SPM for MEG/EEG course Peter Zeidman 17 th May 2016, 16:15-17:00.

Multivariate analysis (Machine learning) Supervised True answer is known Classification Answer is categorical Regression Answer is continuous (ordered.

Group Analyses Guillaume Flandin SPM Course London, October 2016

Representational Similarity Analysis

Effective Connectivity

Dynamic Causal Modelling (DCM): Theory

and Stefan Kiebel, Rik Henson, Andrew Holmes & J-B Poline

The General Linear Model (GLM)

Contrasts & Statistical Inference

The General Linear Model

Dynamic Causal Modelling

SPM2: Modelling and Inference

Dynamic Causal Modelling

Bayesian Methods in Brain Imaging

The General Linear Model

CRIS Workshop: Computational Neuroscience and Bayesian Modelling

Hierarchical Models and

The General Linear Model (GLM)

Effective Connectivity

Bayesian inference J. Daunizeau

Contrasts & Statistical Inference

MfD 04/12/18 Alice Accorroni – Elena Amoruso

The General Linear Model

The General Linear Model

Will Penny Wellcome Trust Centre for Neuroimaging,

Contrasts & Statistical Inference

Group DCM analysis for cognitive & clinical studies

Presentation transcript:

Multivariate models for fMRI data Klaas Enno Stephan (with 90% of slides kindly contributed by Kay H. Brodersen) Translational Neuromodeling Unit (TNU) Institute for Biomedical Engineering University of Zurich & ETH Zurich

2 Univariate approaches are excellent for localizing activations in individual voxels. Why multivariate? v1v1 v2v2 v1v1 v2v2 rewardno reward * n.s.

3 Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels. Why multivariate? v1v1 v2v2 v1v1 v2v2 n.s. orange juiceapple juice v1v1 v2v2 n.s.

4 Multivariate approaches can utilize ‘hidden’ quantities such as coupling strengths. Why multivariate? observed BOLD signal hidden underlying neural activity and coupling strengths t t Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage

5 Overview 1Modelling principles 2Classification 3Multivariate Bayes 4Generative embedding

6 Overview 1Modelling principles 2Classification 3Multivariate Bayes 4Generative embedding

7 Encoding vs. decoding encoding model decoding model

8 Regression vs. classification Regression model independent variables (regressors) continuous dependent variable Classification model independent variables (features) categorical dependent variable (label) vs.

9 Univariate vs. multivariate models A univariate model considers a single voxel at a time. A multivariate model considers many voxels at once. Spatial dependencies between voxels are only introduced afterwards, through random field theory. Multivariate models enable inferences on distributed responses without requiring focal activations.

10 Prediction vs. inference The goal of prediction is to find a generalisable encoding or decoding function. The goal of inference is to decide between competing hypotheses. predicting a cognitive state using a brain-machine interface predicting a subject-specific diagnostic status comparing a model that links distributed neuronal activity to a cognitive state with a model that does not weighing the evidence for sparse vs. distributed coding

11 Goodness of fit vs. complexity Goodness of fit is the degree to which a model explains observed data. Complexity is the flexibility of a model (including, but not limited to, its number of parameters). 4 parameters9 parameters Bishop (2007) PRML 1 parameter We wish to find the model that optimally trades off goodness of fit and complexity. underfitting overfitting optimal truth data model

12 Summary of modelling terminology

13 Overview 1Modelling principles 2Classification 3Multivariate Bayes 4Generative embedding

14 A principled way of designing a classifier would be to adopt a probabilistic approach: Constructing a classifier In practice, classifiers differ in terms of how strictly they implement this principle.

15 Common types of fMRI classification studies Nandy & Cordes (2003) MRM Kriegeskorte et al. (2006) PNAS Mourao-Miranda et al. (2005) NeuroImage

16 Support vector machine (SVM) Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press Nonlinear SVM Linear SVM v1v1 v2v2

17 Stages in a classification analysis Feature extraction Classification using cross- validation Performance evaluation mixed effects

18 We can obtain trial-wise estimates of neural activity by filtering the data with a GLM. Feature extraction for trial-by-trial classification coefficients Boxcar regressor for trial 2 Estimate of this coefficient reflects activity on trial 2

19 The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation: Cross-validation examples ? ? training example test examples folds ? ? ? ? 1... ? ? ? ? ? ? 2 performance evaluation

20 A more commonly used variant is leave-one-out cross-validation. Cross-validation examples ? ? training example test example ? ? ? ? ? ? folds ? ? 1... ? ? 2 performance evaluation

21 Performance evaluation subject trial 1 …

22 Performance evaluation subject trial 1 … population subject 2subject 3subject 4 … …

23 Performance evaluation fixed effects random effects fixed effects mixed effects Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (2012) JMLR Brodersen, Daunizeau, Mathys, Chumbley, Buhmann, Stephan (2013) NeuroImage available for MATLAB and R

24 Research questions for classification Temporal evolution of discriminabilityModel-based classification accuracy 50 % 100 % within-trial time Accuracy rises above chance Participant indicates decision Overall classification accuracySpatial deployment of discriminative regions 80% 55% accuracy 50 % 100 % classification task Truth or lie? Left or right button? Healthy or ill? Pereira et al. (2009) NeuroImage, Brodersen et al. (2009) The New Collection { group 1, group 2 }

25  Multivariate classification studies conduct group tests on single-subject summary statistics that  discard the sign or direction of underlying effects  do not necessarily take into account confounding effects (e.g. correlation of task conditions with difficulty etc.)  Therefore, in some analyses confounds rather than distributed representations may have produced positive results. Potential problems Todd et al. 2013, NeuroImage

26  Simulation: Experiment condition (rule A vs. rule B) does not affect voxel activity, but difficulty does. Moreover, experiment condition and difficulty are confounded at the individual-subject level in random directions across 500 subjects. Potential problems Todd et al. 2013, NeuroImage

27  Empirical example: searchlight analysis: fMRI study on rule representations (flexible stimulus–response mappings  Standard MVPA: rule representations in prefrontal regions.  GLM: no significant results.  Controlling for a variable that is confounded with rule at the individual- subject level but not the group level (reaction time differences across rules) eliminates the MVPA results. Potential problems Todd et al. 2013, NeuroImage

28 Overview 1Modelling principles 2Classification 3Multivariate Bayes 4Generative embedding

29 Multivariate Bayes Mike West SPM brings multivariate analyses into the conventional inference framework of hierarchical Bayesian models and their inversion.

30 Multivariate Bayes decoding model Multivariate analyses in SPM rest on the central notion that inferences about how the brain represents things can be reduced to model comparison. vs. sparse coding in orbitofrontal cortex distributed coding in prefrontal cortex

31 From encoding to decoding Encoding model: GLMDecoding model: MVB

32 Multivariate Bayes Friston et al NeuroImage

33  Encoding model:  neuronal activity = linear mixture of causes: A=X  BOLD data matrix is a temporal convolution of underlying neuronal activity: =A++  Decoding model:  mental state is a linear mixture of voxel-wise activity: =A thus A=X -1  substitution into gives: Y =X ⇒ Y =X++ ⇒ X= Y--  Importantly, we can remove the influence of confounds by premultiplying with the appropriate residual forming matrix:  RTX= RY+ MVB: details

34  To make inversion tractable:  imposing priors on (i.e., assumption about how mental states are represented by voxel-wise activity)  Scientific inquiry:  model selection: comparing different priors to identify which neuronal representation of mental states is most plausible MVB: details

35 Specifying the prior for MVB To make the ill-posed regression problem tractable, MVB uses a prior on voxel weights. Different priors reflect different anatomical and/or coding hypotheses. For example: Voxel 3 is allowed to play a role, but only if its neighbours play similar roles. Voxel 2 is allowed to play a role. Friston et al NeuroImage

36 MVB can be illustrated using SPM’s attention- to-motion example dataset. This dataset is based on a simple block design. There are three experimental factors:  photic– display shows random dots  motion– dots are moving  attention– subjects asked to pay attention Example: decoding motion from visual cortex scans photicmotionattentionconst Büchel & Friston 1999 Cerebral Cortex Friston et al NeuroImage During these scans, for example, subjects were passively viewing moving dots.

37 Multivariate Bayes in SPM Step 1 After having specified and estimated a model, use the Results button. Step 2 Select the contrast to be decoded.

38 Multivariate Bayes in SPM Step 3 Pick a region of interest.

39 Multivariate Bayes in SPM Step 4 Multivariate Bayes can be invoked from within the Multivariate section. Step 5 Here, the region of interest is specified as a sphere around the cursor. The spatial prior implements a sparse coding hypothesis. anatomical hypothesis coding hypothesis

40 Multivariate Bayes in SPM Step 6 Results can be displayed using the BMS button.

41 Model evidence and voxel weights log BF = 3

42 Summary: research questions for MVB How does the brain represent things? Evaluating competing coding hypotheses Where does the brain represent things? Evaluating competing anatomical hypotheses

43 Overview 1Modelling principles 2Classification 3Multivariate Bayes 4Generative embedding

44 Model-based classification Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol step 2 — embedding step 1 — modelling measurements from an individual subject subject-specific generative model subject representation in model-based feature space A → B A → C B → B B → C A C B step 3 — classification A C B jointly discriminative connection strengths? step 5 — interpretation classification model 1 0 discriminability of groups? accuracy step 4 — evaluation

45 Model-based classification: model specification anatomical regions of interest y = –26 mm LR

46 Model-based classification patients controls activation- based correlation- based model- based acspmezoflr balanced accuracy 100% 50% 90% 80% 70% 60% n.s. *

47 Generative embedding generative embedding Parameter 3 Voxel 3 Parameter 1 Voxel 1 Voxel 2 Parameter 2 controls patients Voxel-based activity spaceModel-based parameter space classification accuracy 98 % classification accuracy 75 %

48 Model-based classification: interpretation MGB PT HG (A1) MGB PT HG (A1) stimulus input LR

49 Model-based classification: interpretation MGB PT HG (A1) MGB PT HG (A1) stimulus input LR highly discriminative somewhat discriminative not discriminative

50 Model-based clustering Deserno, Sterzer, Wüstenberg, Heinz, & Schlagenhauf (2012) J Neurosci PC dLPFC VC WM stimulus fMRI data acquired during working-memory task & modelled using a three-region DCM 42 patients diagnosed with schizophrenia 41 healthy controls

51 Model-based clustering model selectioninterpretationvalidation Brodersen et al. 2014, NeuroImage: Clinical

52 Summary Classification to assess whether a cognitive state is linked to patterns of activity to visualize the spatial deployment of discriminative activity Multivariate Bayes to evaluate competing anatomical hypotheses to evaluate competing coding hypotheses Generative embedding to assess whether groups differ in terms of model parameter estimates (connectivity) to generate mechanistic subgroup hypotheses

53 Thank you

54

55 Classification  Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: A tutorial overview. NeuroImage, 45(1, Supplement 1), S199-S209.  O'Toole, A. J., Jiang, F., Abdi, H., Penard, N., Dunlop, J. P., & Parent, M. A. (2007). Theoretical, Statistical, and Practical Perspectives on Pattern-based Classification Approaches to the Analysis of Functional Neuroimaging Data. Journal of Cognitive Neuroscience, 19(11),  Haynes, J., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7),  Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9),  Brodersen, K. H., Haiss, F., Ong, C., Jung, F., Tittgemeyer, M., Buhmann, J., Weber, B., et al. (2010). Model-based feature construction for multivariate decoding. NeuroImage (2010).  Brodersen, K. H., Ong, C., Stephan, K. E., Buhmann, J. (2010). The balanced accuracy and its posterior distribution. ICPR, Multivariate Bayes  Friston, K., Chu, C., Mourao-Miranda, J., Hulme, O., Rees, G., Penny, W., et al. (2008). Bayesian decoding of brain images. NeuroImage, 39(1), Further reading

56 Multivariate approaches can exploit a sampling bias in voxelized images to reveal interesting activity on a subvoxel scale. Why multivariate? Boynton (2005) Nature Neuroscience

57 The last 10 years have seen a notable increase in decoding analyses in neuroimaging. Why multivariate? Haxby et al. (2001) ScienceLautrup et al. (1994) Supercomputing in Brain Research PET prediction

58 Which brain regions are jointly informative of a cognitive state of interest? Spatial deployment of informative regions Nandy & Cordes (2003) MRM Kriegeskorte et al. (2006) PNAS Mourao-Miranda et al. (2005) NeuroImage

59  Classification induces constraints on the experimental design.  When estimating trial-wise Beta values, we need longer ITIs (typically 8 – 15 s).  At the same time, we need many trials (typically 100+).  Classes should be balanced. If they are imbalanced, we can resample the training set, constrain the classifier, or report the balanced accuracy.  Construction of examples  Estimation of Beta images is the preferred approach.  Covariates should be included in the trial-by-trial design matrix.  Temporal autocorrelation  In trial-by-trial classification, exclude trials around the test trial from the training set.  Avoiding double-dipping  Any feature selection and tuning of classifier settings should be carried out on the training set only.  Performance evaluation  Do random-effects or mixed-effects inference.  Correct for multiple tests. Issues to be aware of (as researcher or reviewer)

60  Evaluating the performance of a classification algorithm critically requires a measure of the degree to which unseen examples have been identified with their correct class labels.  The procedure of averaging across accuracies obtained on individual cross-validation folds is flawed in two ways. First, it does not allow for the derivation of a meaningful confidence interval. Second,it leads to an optimistic estimate when a biased classifier is tested on an imbalanced dataset.  Both problems can be overcome by replacing the conventional point estimate of accuracy by an estimate of the posterior distribution of the balanced accuracy. Performance evaluation Brodersen, Ong, Buhmann, Stephan (2010) ICPR

61 Pattern characterization Example – decoding the identity of the person speaking to the subject in the scanner Formisano et al. (2008) Science voxel 1 voxel 2... fingerprint plot (one plot per class)

62 Lessons from the Neyman-Pearson lemma Neyman & Person (1933) Phil Trans Roy Soc London

63 Lessons from the Neyman-Pearson lemma Neyman & Person (1933) Phil Trans Roy Soc London Kass & Raftery (1995) J Am Stat Assoc Friston et al. (2009) NeuroImage

64 Temporal evolution of discriminability Soon et al. (2008) Nature Neuroscience Example – decoding which button the subject pressed motor cortex frontopolar cortex classification accuracy decision response

65 Approach 1. estimation of an encoding model 2. nearest-neighbour classification or voting Identification / inferring a representational space Mitchell et al. (2008) Science

66 Approach 1. estimation of an encoding model 2. model inversion Reconstruction / optimal decoding Paninski et al. (2007) Progr Brain Res Pillow et al. (2008) Nature Miyawaki et al. (2009) Neuron

67 Recent MVB studies

68 Specifying the prior for MVB Voxel 3 is allowed to play a role, but only if its neighbours play weaker roles. Voxel 2 is allowed to play a role.

69 Inverting the model Partition #1 Partition #2 Partition #3 (optimal) Pattern 8 is allowed to make some contribution, independently of the contributions of other patterns. Pattern 3 makes an important positive contribution / is activated.

70 Observations vs. predictions (motion)

71 MVB may outperform conventional point classifiers when using a more appropriate coding hypothesis. Using MVB for point classification Support Vector Machine

72 Model-based analyses by data representation Model-based analyses How do patterns of hidden quantities (e.g., connectivity among brain regions) differ between groups? Structure-based analyses Which anatomical structures allow us to separate patients and healthy controls? Activation-based analyses Which functional differences allow us to separate groups?

73 Model-based clustering functional connectivity effective connectivity 78% 71% best model 71 % supervised learning: SVM classification unsupervised learning: GMM clustering (using effective connectivity) 62 % 78 % 55 % regional activity Brodersen, Deserno, Schlagenhauf, Penny, Lin, Gupta, Buhmann, Stephan (in preparation)

74 Generative embedding and DCM ? ? { patient, control }

75 Model-based classification using DCM structure-based classification activation-based classification model-based classification model selection inference on model parameters vs. ? { group 1, group 2 }