
1 Multivariate models for fMRI data. Klaas Enno Stephan (with 90% of slides kindly contributed by Kay H. Brodersen). Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & ETH Zurich

2 Why multivariate? Univariate approaches are excellent for localizing activations in individual voxels. [Figure: responses of two voxels (v1, v2) to reward vs. no reward; * significant, n.s. not significant.]

3 Why multivariate? Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels. [Figure: responses of two voxels (v1, v2) to orange juice vs. apple juice; each voxel on its own is n.s.]

4 Why multivariate? Multivariate approaches can utilize 'hidden' quantities such as coupling strengths. [Figure: observed BOLD signal vs. hidden underlying neural activity and coupling strengths, plotted over time.] Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage

5 Overview: 1 Modelling principles, 2 Classification, 3 Multivariate Bayes, 4 Generative embedding

6 Overview: 1 Modelling principles, 2 Classification, 3 Multivariate Bayes, 4 Generative embedding

7 Encoding vs. decoding. An encoding model maps from a cognitive state to brain activity; a decoding model maps from brain activity to a cognitive state.

8 Regression vs. classification. Regression model: independent variables (regressors), continuous dependent variable. Classification model: independent variables (features), categorical dependent variable (label).

9 Univariate vs. multivariate models. A univariate model considers a single voxel at a time; spatial dependencies between voxels are only introduced afterwards, through random field theory. A multivariate model considers many voxels at once and enables inferences on distributed responses without requiring focal activations.

10 Prediction vs. inference. The goal of prediction is to find a generalisable encoding or decoding function (e.g., predicting a cognitive state using a brain-machine interface, or predicting a subject-specific diagnostic status). The goal of inference is to decide between competing hypotheses (e.g., comparing a model that links distributed neuronal activity to a cognitive state with a model that does not, or weighing the evidence for sparse vs. distributed coding).

11 Goodness of fit vs. complexity. Goodness of fit is the degree to which a model explains observed data. Complexity is the flexibility of a model (including, but not limited to, its number of parameters). We wish to find the model that optimally trades off goodness of fit and complexity. [Figure: fits with 1, 4, and 9 parameters to the same data, illustrating underfitting, an optimal fit, and overfitting relative to the true function. Bishop (2007) PRML]

12 Summary of modelling terminology

13 Overview: 1 Modelling principles, 2 Classification, 3 Multivariate Bayes, 4 Generative embedding

14 Constructing a classifier. A principled way of designing a classifier would be to adopt a probabilistic approach, i.e., to model the posterior probability of the class label given the features. In practice, classifiers differ in terms of how strictly they implement this principle. A minimal sketch follows below.
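A minimal sketch of such a probabilistic classifier (not from the slides; toy data and names are assumed): Gaussian class-conditional densities combined with class priors via Bayes' rule to yield p(label | features).

```python
# Minimal sketch: a probabilistic classifier computing p(label | features) via
# Bayes' rule with Gaussian (diagonal-covariance) class-conditionals.
# X (trials x voxels) and y (labels) are assumed to exist.
import numpy as np

def fit_gaussian_classifier(X, y):
    """Estimate class priors, means and (diagonal) variances per class."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = (len(Xk) / len(X),        # prior p(y = k)
                     Xk.mean(axis=0),          # class mean
                     Xk.var(axis=0) + 1e-6)    # class variance (regularised)
    return params

def predict_proba(X, params):
    """Return p(y = k | x) for each class k, using Bayes' rule."""
    log_joint = []
    for k, (prior, mu, var) in params.items():
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        log_joint.append(np.log(prior) + ll)
    log_joint = np.vstack(log_joint).T                    # trials x classes
    log_joint -= log_joint.max(axis=1, keepdims=True)     # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)
```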

15 Common types of fMRI classification studies. Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS; Mourao-Miranda et al. (2005) NeuroImage

16 Support vector machine (SVM). [Figure: linear SVM vs. nonlinear SVM decision boundaries in a two-voxel (v1, v2) feature space.] Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press. A minimal usage sketch is given below.
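A minimal usage sketch (not from the slides; the toy data and labels are assumptions) contrasting a linear and a nonlinear (RBF) SVM on two-dimensional features:

```python
# Minimal sketch: linear vs. nonlinear SVM on toy trial-wise features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))          # 100 trials x 2 voxels (toy data)
y = (X[:, 0] * X[:, 1] > 0).astype(int)    # a label that is not linearly separable

linear_svm = SVC(kernel="linear", C=1.0)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale")

print("linear SVM accuracy:", cross_val_score(linear_svm, X, y, cv=5).mean())
print("RBF SVM accuracy:   ", cross_val_score(rbf_svm, X, y, cv=5).mean())
```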

17 Stages in a classification analysis: feature extraction; classification using cross-validation; performance evaluation (mixed effects).

18 Feature extraction for trial-by-trial classification. We can obtain trial-wise estimates of neural activity by fitting a GLM with one boxcar regressor per trial; the estimate of the coefficient for trial 2, for example, reflects activity on trial 2 (see the sketch below).
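A minimal sketch (assumed setup, not the slides' pipeline): one boxcar regressor per trial convolved with a canonical-like HRF, ordinary least squares per voxel, and the trial-wise coefficients used as classification features. Onsets, durations and data are hypothetical.

```python
# Minimal sketch: trial-wise GLM betas as classification features.
import numpy as np
from scipy.stats import gamma

n_scans, tr, n_trials = 200, 2.0, 20
onsets = np.arange(10, 10 + 9 * n_trials, 9)          # hypothetical trial onsets (in scans)

t = np.arange(0, 32, tr)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)        # canonical-like double-gamma HRF
hrf /= hrf.max()

X = np.zeros((n_scans, n_trials))
for j, onset in enumerate(onsets):
    box = np.zeros(n_scans)
    box[onset:onset + 2] = 1.0                          # 2-scan boxcar for this trial
    X[:, j] = np.convolve(box, hrf)[:n_scans]
X = np.column_stack([X, np.ones(n_scans)])              # add a constant regressor

Y = np.random.default_rng(1).standard_normal((n_scans, 50))  # toy data: scans x voxels
betas = np.linalg.pinv(X) @ Y                           # (n_trials + 1) x voxels
features = betas[:n_trials]                             # trial-wise features for a classifier
```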

19 Cross-validation. The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation: [Figure: 100 examples are split into two folds; in each fold, one half is used for training and the other half for testing, followed by performance evaluation.]

20 Cross-validation. A more commonly used variant is leave-one-out cross-validation: [Figure: with 100 examples there are 100 folds; in each fold a single example is held out for testing and the remaining 99 are used for training, followed by performance evaluation.] A usage sketch follows below.
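A minimal sketch of leave-one-out cross-validation (toy data and classifier are assumptions, not from the slides):

```python
# Minimal sketch: leave-one-out cross-validation of a linear SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 50))        # 40 trials x 50 voxel features (toy data)
y = np.repeat([0, 1], 20)                # two balanced classes

scores = cross_val_score(SVC(kernel="linear"), X, y, cv=LeaveOneOut())
print("LOO accuracy:", scores.mean())    # one 0/1 outcome per held-out trial
```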

21 Performance evaluation. [Figure: for a single subject, each trial yields a correct (+) or incorrect (-) classification outcome.]

22 Performance evaluation. [Figure: trial-wise classification outcomes (+/-) for subjects 1, 2, 3, 4, ..., sampled from a population.]

23 Performance evaluation: fixed-effects, random-effects, and mixed-effects approaches to group-level inference on classification performance. Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (2012) JMLR; Brodersen, Daunizeau, Mathys, Chumbley, Buhmann, Stephan (2013) NeuroImage. Toolbox available for MATLAB and R.

24 Research questions for classification: overall classification accuracy (e.g., truth or lie? left or right button? healthy or ill?); spatial deployment of discriminative regions; temporal evolution of discriminability (when, within a trial, accuracy rises above chance relative to when the participant indicates a decision); model-based classification of groups ({ group 1, group 2 }). Pereira et al. (2009) NeuroImage; Brodersen et al. (2009) The New Collection

25 Potential problems. Multivariate classification studies conduct group tests on single-subject summary statistics that (i) discard the sign or direction of the underlying effects and (ii) do not necessarily take confounding effects into account (e.g., correlation of task conditions with difficulty). Therefore, in some analyses confounds rather than distributed representations may have produced positive results. Todd et al. (2013) NeuroImage

26 Potential problems. Simulation: the experimental condition (rule A vs. rule B) does not affect voxel activity, but difficulty does. Moreover, experimental condition and difficulty are confounded at the individual-subject level, in random directions across 500 subjects. Todd et al. (2013) NeuroImage

27 Potential problems. Empirical example: searchlight analysis of an fMRI study on rule representations (flexible stimulus-response mappings). Standard MVPA: rule representations in prefrontal regions. GLM: no significant results. Controlling for a variable that is confounded with rule at the individual-subject level but not at the group level (reaction-time differences across rules) eliminates the MVPA results. Todd et al. (2013) NeuroImage

28 Overview: 1 Modelling principles, 2 Classification, 3 Multivariate Bayes, 4 Generative embedding

29 Multivariate Bayes. SPM brings multivariate analyses into the conventional inference framework of hierarchical Bayesian models and their inversion. [Image: Mike West]

30 Multivariate Bayes. Multivariate analyses in SPM rest on the central notion that inferences about how the brain represents things can be reduced to comparing decoding models, e.g., sparse coding in orbitofrontal cortex vs. distributed coding in prefrontal cortex.

31 From encoding to decoding. Encoding model: GLM. Decoding model: MVB.

32 Multivariate Bayes. Friston et al. (2008) NeuroImage

33 MVB: details. Encoding model: (i) neuronal activity is a linear mixture of causes, A = Xβ; (ii) the BOLD data matrix is a temporal convolution of the underlying neuronal activity, Y = TA + Gγ + ε. Decoding model: the mental state is a linear mixture of voxel-wise activity, X = Aη, thus A = Xη⁻¹; substitution into the encoding model gives Y = TXη⁻¹ + Gγ + ε ⇒ Yη = TX + Gγη + εη ⇒ TX = Yη − Gγη − εη. Importantly, we can remove the influence of the confounds by premultiplying with the appropriate residual-forming matrix R (for which RG = 0), leaving a decoding model of the form RTX = RYη + error. A numerical sketch of the confound-removal step is given below.
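A minimal numerical sketch of this confound-removal step (an assumption-laden illustration, not SPM's spm_mvb implementation): the residual-forming matrix projects the confounds out of both the target and the data, and a ridge penalty stands in for MVB's priors on the voxel weights η.

```python
# Minimal sketch: residual-forming matrix R and a ridge estimate of voxel weights.
import numpy as np

rng = np.random.default_rng(3)
n_scans, n_voxels = 200, 100
Y = rng.standard_normal((n_scans, n_voxels))     # data matrix (scans x voxels)
TX = rng.standard_normal(n_scans)                # target variable (already convolved with T)
G = np.column_stack([np.ones(n_scans),           # confounds: constant ...
                     np.linspace(-1, 1, n_scans)])   # ... and a linear drift

R = np.eye(n_scans) - G @ np.linalg.pinv(G)      # residual-forming matrix: RG = 0
RY, RTX = R @ Y, R @ TX                          # project confounds out of data and target

lam = 10.0                                       # ridge penalty standing in for a prior on eta
eta = np.linalg.solve(RY.T @ RY + lam * np.eye(n_voxels), RY.T @ RTX)
prediction = RY @ eta                            # decoded (confound-free) target
```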

34 MVB: details. To make inversion tractable, MVB imposes priors on the voxel weights η (i.e., assumptions about how mental states are represented by voxel-wise activity). Scientific inquiry then proceeds by model selection: comparing different priors to identify which neuronal representation of mental states is most plausible.

35 Specifying the prior for MVB. To make the ill-posed regression problem tractable, MVB uses a prior on the voxel weights. Different priors reflect different anatomical and/or coding hypotheses, for example: voxel 2 is allowed to play a role; voxel 3 is allowed to play a role, but only if its neighbours play similar roles. Friston et al. (2008) NeuroImage. A sketch of such a spatial prior is given below.
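An illustrative sketch (assumed coordinates and length scale, not SPM's implementation) of how a "smooth" coding hypothesis can be expressed as a prior covariance over voxel weights, compared with a diagonal covariance for independent contributions:

```python
# Minimal sketch: a smooth spatial prior covariance over voxel weights, where
# neighbouring voxels are encouraged to carry similar weights.
import numpy as np

coords = np.random.default_rng(4).uniform(0, 20, size=(50, 3))   # 50 voxel positions (mm)
sigma = 4.0                                                       # spatial smoothness (mm)

d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)     # pairwise squared distances
smooth_cov = np.exp(-0.5 * d2 / sigma ** 2)                       # smooth coding hypothesis

# An alternative (roughly 'sparse'/independent) hypothesis: diagonal covariance,
# letting each voxel contribute on its own.
independent_cov = np.eye(len(coords))
```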

36 Example: decoding motion from visual cortex. MVB can be illustrated using SPM's attention-to-motion example dataset, which is based on a simple block design. There are three experimental factors: photic (display shows random dots), motion (dots are moving), and attention (subjects asked to pay attention). [Figure: design matrix with columns photic, motion, attention, constant; during some scans, for example, subjects were passively viewing moving dots.] Büchel & Friston (1999) Cerebral Cortex; Friston et al. (2008) NeuroImage

37 Multivariate Bayes in SPM. Step 1: after having specified and estimated a model, use the Results button. Step 2: select the contrast to be decoded.

38 Multivariate Bayes in SPM. Step 3: pick a region of interest.

39 Multivariate Bayes in SPM. Step 4: Multivariate Bayes can be invoked from within the Multivariate section. Step 5: here, the region of interest is specified as a sphere around the cursor (the anatomical hypothesis), and the spatial prior implements a sparse coding hypothesis (the coding hypothesis).

40 Multivariate Bayes in SPM. Step 6: results can be displayed using the BMS button.

41 Model evidence and voxel weights. [Figure: model comparison with log Bayes factor = 3, conventionally regarded as strong evidence, together with the estimated voxel weights.]

42 Summary: research questions for MVB. How does the brain represent things? Evaluating competing coding hypotheses. Where does the brain represent things? Evaluating competing anatomical hypotheses.

43 Overview: 1 Modelling principles, 2 Classification, 3 Multivariate Bayes, 4 Generative embedding

44 Model-based classification. Step 1 (modelling): fit a subject-specific generative model to the measurements from an individual subject. Step 2 (embedding): represent each subject in a model-based feature space (e.g., connection strengths A→B, A→C, B→B, B→C). Step 3 (classification): train a classification model on these features. Step 4 (evaluation): assess the discriminability of the groups (accuracy). Step 5 (interpretation): identify jointly discriminative connection strengths. Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage; Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol. A pipeline sketch follows below.
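A minimal pipeline sketch (not the paper's implementation; the parameter estimates and labels are simulated stand-ins): subject-wise generative-model parameters, e.g., DCM coupling strengths, serve as features for a cross-validated classifier.

```python
# Minimal sketch: generative embedding followed by cross-validated classification.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(5)
n_subjects, n_params = 60, 6                          # e.g., 6 coupling strengths per subject
theta = rng.standard_normal((n_subjects, n_params))   # stand-in for DCM parameter estimates
labels = np.repeat([0, 1], n_subjects // 2)           # patients vs. controls (toy labels)
theta[labels == 1, 2] += 1.0                          # pretend one connection differs between groups

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
acc = cross_val_score(clf, theta, labels, cv=cv)
print("cross-validated accuracy:", acc.mean())
```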

45 Model-based classification: model specification. [Figure: anatomical regions of interest shown on a coronal slice at y = -26 mm, left/right labelled.]

46 Model-based classification. [Figure: balanced accuracy (50-100%) for discriminating patients from controls using activation-based, correlation-based, and model-based feature sets; some feature sets do not exceed chance (n.s.) while others are significantly above chance (*).]

47 Generative embedding. [Figure: controls and patients plotted in a voxel-based activity space (voxels 1-3; classification accuracy 75%) and, after generative embedding, in a model-based parameter space (parameters 1-3; classification accuracy 98%).]

48 Model-based classification: interpretation. [Figure: the network comprises medial geniculate body (MGB), Heschl's gyrus (HG/A1), and planum temporale (PT) in both hemispheres, with stimulus input.]

49 Model-based classification: interpretation. [Figure: the same MGB-HG(A1)-PT network, with each connection labelled as highly discriminative, somewhat discriminative, or not discriminative.]

50 Model-based clustering. fMRI data acquired during a working-memory task and modelled using a three-region DCM (visual cortex, parietal cortex, dorsolateral prefrontal cortex, with working-memory stimulus input); 42 patients diagnosed with schizophrenia, 41 healthy controls. Deserno, Sterzer, Wüstenberg, Heinz & Schlagenhauf (2012) J Neurosci

51 Model-based clustering: model selection, interpretation, validation. Brodersen et al. (2014) NeuroImage: Clinical

52 Summary. Classification: to assess whether a cognitive state is linked to patterns of activity; to visualize the spatial deployment of discriminative activity. Multivariate Bayes: to evaluate competing anatomical hypotheses; to evaluate competing coding hypotheses. Generative embedding: to assess whether groups differ in terms of model parameter estimates (connectivity); to generate mechanistic subgroup hypotheses.

53 Thank you


55 Further reading.
Classification:
- Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: a tutorial overview. NeuroImage, 45(1, Suppl. 1), S199-S209.
- O'Toole, A. J., Jiang, F., Abdi, H., Penard, N., Dunlop, J. P., & Parent, M. A. (2007). Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. Journal of Cognitive Neuroscience, 19(11), 1735-1752.
- Haynes, J., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523-534.
- Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424-430.
- Brodersen, K. H., Haiss, F., Ong, C., Jung, F., Tittgemeyer, M., Buhmann, J., Weber, B., et al. (2010). Model-based feature construction for multivariate decoding. NeuroImage.
- Brodersen, K. H., Ong, C., Stephan, K. E., & Buhmann, J. (2010). The balanced accuracy and its posterior distribution. ICPR, 3121-3124.
Multivariate Bayes:
- Friston, K., Chu, C., Mourao-Miranda, J., Hulme, O., Rees, G., Penny, W., et al. (2008). Bayesian decoding of brain images. NeuroImage, 39(1), 181-205.

56 Why multivariate? Multivariate approaches can exploit a sampling bias in voxelized images to reveal interesting activity on a subvoxel scale. Boynton (2005) Nature Neuroscience

57 Why multivariate? The last 10 years have seen a notable increase in decoding analyses in neuroimaging. Haxby et al. (2001) Science; Lautrup et al. (1994) Supercomputing in Brain Research (PET prediction)

58 Spatial deployment of informative regions. Which brain regions are jointly informative of a cognitive state of interest? Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS; Mourao-Miranda et al. (2005) NeuroImage

59 Issues to be aware of (as researcher or reviewer):
- Classification induces constraints on the experimental design: when estimating trial-wise beta values, we need longer ITIs (typically 8-15 s); at the same time, we need many trials (typically 100+). Classes should be balanced; if they are imbalanced, we can resample the training set, constrain the classifier, or report the balanced accuracy.
- Construction of examples: estimation of beta images is the preferred approach; covariates should be included in the trial-by-trial design matrix.
- Temporal autocorrelation: in trial-by-trial classification, exclude trials around the test trial from the training set (see the sketch after this list).
- Avoiding double-dipping: any feature selection and tuning of classifier settings should be carried out on the training set only.
- Performance evaluation: do random-effects or mixed-effects inference, and correct for multiple tests.
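A minimal sketch of the temporal-autocorrelation point (assumed toy data and gap size, not from the slides): leave-one-trial-out cross-validation that drops the trials immediately surrounding the test trial from the training set.

```python
# Minimal sketch: trial-by-trial CV with a temporal gap around the test trial.
import numpy as np
from sklearn.svm import SVC

def loo_with_gap(X, y, gap=2):
    """Leave-one-trial-out CV, dropping `gap` neighbouring trials on each side."""
    n = len(y)
    correct = []
    for test in range(n):
        train = [i for i in range(n) if abs(i - test) > gap]
        clf = SVC(kernel="linear").fit(X[train], y[train])
        correct.append(clf.predict(X[test:test + 1])[0] == y[test])
    return np.mean(correct)

rng = np.random.default_rng(6)
X = rng.standard_normal((60, 30))            # 60 trials x 30 voxel features (toy data)
y = rng.integers(0, 2, 60)                   # toy labels
print("accuracy with temporal gap:", loo_with_gap(X, y))
```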

60 Performance evaluation. Evaluating the performance of a classification algorithm critically requires a measure of the degree to which unseen examples have been identified with their correct class labels. The procedure of averaging across accuracies obtained on individual cross-validation folds is flawed in two ways: first, it does not allow for the derivation of a meaningful confidence interval; second, it leads to an optimistic estimate when a biased classifier is tested on an imbalanced dataset. Both problems can be overcome by replacing the conventional point estimate of accuracy with an estimate of the posterior distribution of the balanced accuracy (see the sketch below). Brodersen, Ong, Buhmann, Stephan (2010) ICPR
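A minimal sketch in the spirit of Brodersen et al. (2010), with hypothetical classification counts: Beta posteriors (under flat priors) on sensitivity and specificity, combined by Monte Carlo sampling to give the posterior of the balanced accuracy.

```python
# Minimal sketch: posterior distribution of the balanced accuracy.
import numpy as np
from scipy.stats import beta

# Hypothetical pooled cross-validation outcomes:
tp, fn = 38, 12          # positive examples: correctly / incorrectly classified
tn, fp = 45, 5           # negative examples: correctly / incorrectly classified

samples = 100000
sens = beta.rvs(tp + 1, fn + 1, size=samples)     # posterior of sensitivity (Beta(1,1) prior)
spec = beta.rvs(tn + 1, fp + 1, size=samples)     # posterior of specificity
bacc = 0.5 * (sens + spec)                        # posterior of the balanced accuracy

print("posterior mean balanced accuracy:", bacc.mean())
print("95% credible interval:", np.percentile(bacc, [2.5, 97.5]))
```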

61 Pattern characterization. Example: decoding the identity of the person speaking to the subject in the scanner. [Figure: fingerprint plot across voxels (voxel 1, voxel 2, ...), one plot per class.] Formisano et al. (2008) Science

62 Lessons from the Neyman-Pearson lemma. Neyman & Pearson (1933) Phil Trans Roy Soc London

63 Lessons from the Neyman-Pearson lemma. Neyman & Pearson (1933) Phil Trans Roy Soc London; Kass & Raftery (1995) J Am Stat Assoc; Friston et al. (2009) NeuroImage

64 Temporal evolution of discriminability. Example: decoding which button the subject pressed. [Figure: classification accuracy over time in motor cortex and frontopolar cortex, relative to the decision and the response.] Soon et al. (2008) Nature Neuroscience

65 Identification / inferring a representational space. Approach: 1. estimation of an encoding model; 2. nearest-neighbour classification or voting. Mitchell et al. (2008) Science. A sketch follows below.
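A minimal sketch of this identification approach (assumed toy features and data, not Mitchell et al.'s implementation): a ridge encoding model predicts voxel patterns from stimulus features, and a new stimulus is identified by matching the observed pattern to the nearest predicted pattern by correlation.

```python
# Minimal sketch: identification via an encoding model plus nearest-neighbour matching.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
n_train, n_test, n_feat, n_vox = 50, 10, 8, 100
S_train = rng.standard_normal((n_train, n_feat))       # stimulus features (training)
W = rng.standard_normal((n_feat, n_vox))               # toy 'true' encoding weights
Y_train = S_train @ W + 0.5 * rng.standard_normal((n_train, n_vox))

S_test = rng.standard_normal((n_test, n_feat))
Y_test = S_test @ W + 0.5 * rng.standard_normal((n_test, n_vox))

enc = Ridge(alpha=1.0).fit(S_train, Y_train)           # step 1: estimate the encoding model
Y_pred = enc.predict(S_test)                           # predicted pattern per candidate stimulus

# Step 2: identify each observed test pattern with the best-matching prediction.
corr = np.corrcoef(Y_test, Y_pred)[:n_test, n_test:]   # observed x predicted correlations
identified = corr.argmax(axis=1)
print("identification accuracy:", (identified == np.arange(n_test)).mean())
```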

66 Reconstruction / optimal decoding. Approach: 1. estimation of an encoding model; 2. model inversion. Paninski et al. (2007) Progr Brain Res; Pillow et al. (2008) Nature; Miyawaki et al. (2009) Neuron

67 Recent MVB studies

68 Specifying the prior for MVB. Voxel 3 is allowed to play a role, but only if its neighbours play weaker roles. Voxel 2 is allowed to play a role.

69 Inverting the model. [Figure: partition #1, partition #2, partition #3 (optimal).] Pattern 8 is allowed to make some contribution, independently of the contributions of other patterns. Pattern 3 makes an important positive contribution / is activated.

70 Observations vs. predictions (motion)

71 Using MVB for point classification. MVB may outperform conventional point classifiers (e.g., a support vector machine) when using a more appropriate coding hypothesis.

72 Model-based analyses by data representation. Structure-based analyses: which anatomical structures allow us to separate patients and healthy controls? Activation-based analyses: which functional differences allow us to separate groups? Model-based analyses: how do patterns of hidden quantities (e.g., connectivity among brain regions) differ between groups?

73 Model-based clustering. [Figure: supervised learning (SVM classification) and unsupervised learning (GMM clustering, using effective connectivity) compared across feature sets (regional activity, functional connectivity, effective connectivity as the best model); accuracies range roughly from 55% to 78%.] Brodersen, Deserno, Schlagenhauf, Penny, Lin, Gupta, Buhmann, Stephan (in preparation). A clustering sketch follows below.
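A minimal sketch of the unsupervised variant (simulated parameter estimates, not the study's data): a Gaussian mixture model fitted to subject-wise effective-connectivity parameters, with the number of subgroups chosen by BIC.

```python
# Minimal sketch: GMM clustering of subject-wise DCM parameter estimates.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
theta = np.vstack([rng.normal(0.0, 1.0, size=(40, 6)),   # putative subgroup 1 (simulated)
                   rng.normal(1.5, 1.0, size=(40, 6))])  # putative subgroup 2 (simulated)
theta = StandardScaler().fit_transform(theta)

# Choose the number of clusters by the Bayesian information criterion (BIC).
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(theta).bic(theta)
        for k in range(1, 5)}
best_k = min(bics, key=bics.get)
labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(theta)
print("selected number of subgroups:", best_k)
```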

74 Generative embedding and DCM. [Figure: schematic of predicting class membership { patient, control } from model-based features.]

75 Model-based classification using DCM. [Figure: structure-based classification vs. activation-based classification vs. model-based classification (with model selection and inference on model parameters) for separating { group 1, group 2 }.]

