fMRI Techniques to Investigate Neural Coding: Multivoxel Pattern Analysis (MVPA)
Jody Culham, Brain and Mind Institute, Department of Psychology, Western University
Last Update: November 28, 2016. Last Course: Psychology 9223, F2016

2 Limitations of Subtraction Logic
Example: We know that neurons in the brain can be tuned for individual faces, e.g., the “Jennifer Aniston” neuron in human medial temporal lobe (Quiroga et al., 2005, Nature).

3 fMRI spatial resolution: 1 voxel
[Figure: a single 3 x 3 x 3 mm voxel in the Fusiform Face Area (FFA), with a scale from low activity to high activity]
A voxel might contain millions of neurons, so the fMRI signal represents the population activity.

4 Limitations of Subtraction Logic
fMRI resolution is typically around 3 x 3 x 3 mm, so each sample comes from millions of neurons. Let’s consider just three neurons:
Neuron 1 “likes” Jennifer Aniston
Neuron 2 “likes” Julia Roberts
Neuron 3 “likes” Brad Pitt
Even though there are neurons tuned to each object, the population as a whole shows no preference.
[Figure: firing rate of each of the three neurons, and the resulting voxel activation]
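The three-neuron argument can be sketched numerically. This is a toy illustration only: the firing rates are invented for the example and the names `tuning` and `population` are hypothetical, not from the slides.

```python
# Hypothetical firing rates (spikes/s): each neuron prefers a different face.
tuning = {
    "neuron 1": {"Aniston": 30, "Roberts": 5, "Pitt": 5},
    "neuron 2": {"Aniston": 5, "Roberts": 30, "Pitt": 5},
    "neuron 3": {"Aniston": 5, "Roberts": 5, "Pitt": 30},
}
faces = ["Aniston", "Roberts", "Pitt"]

# A voxel-like signal: the average response of the whole population to each face.
population = {
    face: sum(rates[face] for rates in tuning.values()) / len(tuning)
    for face in faces
}

for face in faces:
    print(face, round(population[face], 2))
```

Every neuron is sharply tuned, yet the averaged signal is identical for all three faces, so a subtraction between any two conditions would be zero.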

5 Two Techniques with “Subvoxel Resolution”
“Subvoxel resolution” = the ability to investigate coding in neuronal populations smaller than the voxel size being sampled
Multi-Voxel Pattern Analysis (MVPA or decoding or “mind reading”)
fMR Adaptation (or repetition suppression or priming)

6 Multivoxel Pattern Analyses (or decoding or “mind reading”)

7 fMRI spatial resolution: 1 voxel
[Figure: a single 3 mm voxel, with a scale from low activity to high activity]

8 Region Of Interest (ROI): group of voxels
[Figure: a group of 3 mm voxels, with a scale from low activity to high activity]

9 Voxel Pattern Information
[Figure: patterns of activity across 3 mm voxels for Condition 1 and Condition 2 (L and R labelled)]

10 Spatial Smoothing
[Figure: the same data with no smoothing and with 4 mm, 7 mm and 10 mm FWHM smoothing]
Most conventional fMRI studies spatially smooth (blur) the data. Smoothing:
increases signal-to-noise
facilitates intersubject averaging
loses information about the patterns across voxels

11 Effect of Spatial Smoothing and Intersubject Averaging

12 Standard fMRI Analysis
[Figure: voxel patterns for FACES and HOUSES on trials 1-3, with the average and the summed activation for each category]

13 Perhaps voxels contain useful information
In traditional fMRI analyses, we average across the voxels within an area, but these voxels may contain valuable information.
In traditional fMRI analyses, we also assume that an area encodes a stimulus if it responds more, but perhaps encoding depends instead on the pattern of high and low activation across voxels.

14 Decoding for Dummies Kerri Smith, 2013, Nature, “Reading Minds”

15 Approaches to Multi-Voxel Pattern Analysis
MVPA classifier
MVPA correlation: Basic approach
MVPA correlation: Representational similarity analysis

16 Preparatory Steps

17 Initial Steps
Step 1: Select a region of interest (ROI)
e.g., a cube centred on an activation hotspot: (15 mm)^3 = 3,375 mm^3, i.e., (5 functional voxels)^3 = 125 functional voxels
DO NOT SPATIALLY SMOOTH THE DATA
Step 2: Extract a measure of brain activation from each of the functional voxels within the ROI
β weights (z-normalized or %-transformed)
% BOLD signal change (minus baseline)
t-values (β/error)
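The ROI arithmetic from Step 1 can be checked in a few lines. This sketch assumes isotropic 3 mm functional voxels (the resolution used elsewhere in these slides); the variable names are illustrative.

```python
side_mm = 15    # edge length of the cubic ROI
voxel_mm = 3    # assumed isotropic functional voxel size

voxels_per_side = side_mm // voxel_mm   # voxels along one edge of the cube
n_voxels = voxels_per_side ** 3         # functional voxels in the ROI
roi_volume_mm3 = side_mm ** 3           # ROI volume in cubic millimetres

print(voxels_per_side, n_voxels, roi_volume_mm3)
```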

18 MVPA Methods
Data: block or event-related
Resolution: works even with moderate resolution (e.g., 3 mm isovoxel); there is a tradeoff between resolution and coverage, SNR
Preprocessing: the usual steps apply (slice scan time correction, motion correction, low pass temporal filter) EXCEPT: No spatial smoothing!
Model single subjects, not combined group data (at least initially)

19 Classifier Approach

20 Classifier Approach
[Figure: voxel patterns for FACES and HOUSES trials, divided into training trials and test trials (not in the training set)]
Can an algorithm correctly “guess” trial identity better than chance (50%)?

21 [Figure: activity in Voxel 1 plotted against activity in Voxel 2 for Faces and Houses]
Each dot is one measurement (trial) from one condition (red circles) or the other (green circles).

22 [Figure: activity in Voxel 1 plotted against activity in Voxel 2 for Faces and Houses, split into a training set and a test set, with the classifier]

23 Can the classifier generalize to untrained data?
[Figure: test set plotted by activity in Voxel 1 and Voxel 2; Faces and Houses trials are marked as correct or incorrect relative to the classifier]
Classifier accuracy = 6/8 = 75% (6 correct out of 8 test trials)

24 Iterative testing (“folds”)
Example: Leave-one-pair-out
10 trials of faces + 10 trials of houses
There are 100 possible combinations of trial pairs: (F1, H1), (F1, H2), (F2, H1), (F2, H2), ..., (F10, H10)
We can train on 9/10 trials of each, with 1/10 excluded, for 100 iterations
Average the accuracy across the 100 iterations
Many options: e.g., leave one run out; classify the average of several trials left out
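The leave-one-pair-out scheme can be enumerated directly. The sketch below only builds the 100 train/test splits; the trial names (`F1`...`F10`, `H1`...`H10`) follow the slide, while the `folds` structure is a hypothetical layout chosen for the example.

```python
from itertools import product

faces = [f"F{i}" for i in range(1, 11)]    # 10 face trials
houses = [f"H{i}" for i in range(1, 11)]   # 10 house trials

folds = []
for test_face, test_house in product(faces, houses):
    # Hold out one face trial and one house trial; train on the remaining 9 + 9.
    train = [t for t in faces + houses if t not in (test_face, test_house)]
    folds.append({"test": (test_face, test_house), "train": train})

print(len(folds))            # number of iterations
print(folds[0]["test"])      # first held-out pair
print(len(folds[0]["train"]))
```

A classifier would be trained and tested once per fold, and the 100 accuracies averaged.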

25 9 voxels → 9 dimensions
Haynes & Rees, 2006, Nat Rev Neurosci
[Figure: simple 2D examples (two voxels) with decision boundaries]
Each dot is one measurement (trial) from one trial type (red circles) or the other (blue squares).
Classifier can act on single voxels. Conventional fMRI analysis would detect the difference.
Classifier cannot act on single voxels because distributions overlap. Classifier can act on a combination of voxels using a linear decision boundary.
Classifier would require a curved decision boundary.
White and black circles show examples of correct and erroneous classification in the test set.

26 Where to “Draw the Line”?
There are different approaches to determining what line/plane/hyperplane to use as the boundary between classes.
We want an approach with good generalization to untrained data.
The most common approach is the linear support vector machine (SVM).

27 Support Vector Machine (SVM)
SVM finds a linear decision boundary that discriminates between two sets of points, constrained to have the largest possible distance from the closest points on both sides. The response patterns closest to the decision boundary (yellow circles), which define the margins, are called “support vectors” (Mur et al., 2009).

28 Is decoding better than chance?
Two options:
Use intersubject variability to determine significance
Use permutation testing to determine significance

29 Average Accuracy vs. Chance
Jody’s rant: When the comparison between error bars and a reference value (e.g., chance, zero) is meaningful, confidence intervals are the best choice for error bars, not SEM (See ...)
[Figure: classification accuracy (%) for each subject and for the mean, +/- 95% CI, relative to chance]

30 Permutation Testing vs. Chance
randomize all the condition labels
run SVMs on the randomized data
repeat this many times (e.g., 1000X)
get a distribution of expected decoding accuracy
test the null hypothesis (H0) that the decoding accuracy you found came from this permuted distribution
can be done even in single subjects
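The permutation steps above can be sketched in plain Python. To keep the example dependency-free, a leave-one-trial-out nearest-centroid classifier stands in for the SVM; that substitution, the toy two-voxel data, and the function names `nearest_centroid_accuracy` and `permutation_p_value` are all assumptions made for illustration.

```python
import random

def nearest_centroid_accuracy(patterns, labels):
    """Leave-one-trial-out accuracy of a nearest-centroid classifier.
    Assumes every class keeps at least one trial after holding one out."""
    classes = sorted(set(labels))
    correct = 0
    for i, test_pattern in enumerate(patterns):
        centroids = {}
        for cls in classes:
            members = [p for j, p in enumerate(patterns) if j != i and labels[j] == cls]
            centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        # Assign the held-out trial to the closest centroid (squared Euclidean distance).
        guess = min(classes, key=lambda c: sum((a - b) ** 2
                                               for a, b in zip(test_pattern, centroids[c])))
        correct += guess == labels[i]
    return correct / len(labels)

def permutation_p_value(patterns, labels, n_permutations=1000, seed=0):
    """Compare the observed accuracy with accuracies obtained from shuffled labels."""
    rng = random.Random(seed)
    observed = nearest_centroid_accuracy(patterns, labels)
    null = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)                      # randomize the condition labels
        null.append(nearest_centroid_accuracy(patterns, shuffled))
    # Proportion of permuted accuracies at least as high as the observed one.
    p = (1 + sum(a >= observed for a in null)) / (1 + n_permutations)
    return observed, p

# Toy data: 4 "face" and 4 "house" trials measured in 2 voxels (invented values).
patterns = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0], [0.8, 0.1],
            [0.1, 1.0], [0.2, 0.9], [0.0, 1.1], [0.1, 0.8]]
labels = ["face"] * 4 + ["house"] * 4

observed, p_value = permutation_p_value(patterns, labels)
print(observed, p_value)
```

Because the whole train/test procedure is rerun on each shuffled labelling, the null distribution reflects the same cross-validation scheme as the real analysis.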

31 Is decoding better than chance?
Permutation Testing
[Figure: distribution of decoding accuracies from the permuted data, marking our data (→ reject H0), the upper bound of the 95% confidence limits, the upper quartile, the median (should be 33.3%) and the lower quartile of the permuted distribution]

32 Example of MVPA classifier approach: decoding future actions
Gallivan et al., 2013, eLife

33 Conditions


35 Hand and Tool Decoding +/- 1 SEM

36 Cross-decoding Logic Task-Across-Effector
Train: Grasp vs. Reach for one effector (e.g., Hand)
Test: Grasp vs. Reach for the other effector (e.g., Tool)
If (Accuracy > chance), then the area codes task regardless of effector
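The cross-decoding logic can be sketched as train-on-one-effector, test-on-the-other. This is a toy illustration: a nearest-centroid classifier is used in place of the SVM, the two-voxel patterns are invented, and the helper names (`train`, `classify`) are hypothetical.

```python
def centroid(patterns):
    """Mean pattern across trials."""
    return [sum(col) / len(patterns) for col in zip(*patterns)]

def train(patterns_by_class):
    """'Train' by storing one mean pattern per class."""
    return {cls: centroid(p) for cls, p in patterns_by_class.items()}

def classify(model, pattern):
    """Pick the class whose centroid is closest (squared Euclidean distance)."""
    return min(sorted(model),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(pattern, model[c])))

# Invented two-voxel patterns for each task, separately for each effector.
hand = {"grasp": [[1.0, 0.2], [0.9, 0.1]], "reach": [[0.1, 1.0], [0.2, 0.8]]}
tool = {"grasp": [[0.8, 0.3], [1.1, 0.2]], "reach": [[0.3, 0.9], [0.0, 1.1]]}

model = train(hand)                                    # train on Hand trials only
test_trials = [(p, task) for task, ps in tool.items() for p in ps]
accuracy = sum(classify(model, p) == task for p, task in test_trials) / len(test_trials)
print(accuracy)                                        # tested on Tool trials only
```

In practice the roles are usually swapped as well (train on Tool, test on Hand) and the two accuracies averaged.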

37 Hand and Tool Decoding
[Figure: % decoding accuracy in L SPOC, +/- 1 SEM]

38 Hand and Tool Decoding
[Figure: % decoding accuracy in L SMG, L M1, L aIPS, L PMd, L PMv and L SPOC, +/- 1 SEM]

39 Single TR Decoding
[Figure: % decoding accuracies over time (volumes), +/- 1 SEM]

40 Summary
[Figure: action plan decoding in PMd, PMv, M1, aIPS, IPS/SMG, MTG, EBA, pIPS and SPOC, grouped into the vision-for-action “how” stream, the tool network and the vision-for-perception “what” stream. Legend: hand actions only; tool actions only; separate hand and tool actions; common hand and tool actions]

41 Basic Correlation Approach

42 First Demonstration

43 MVPA correlation approach
[Figure: voxel patterns for Faces and Houses on trials 1-3, with the average and summed activation]

44 MVPA correlation approach
[Figure: voxel patterns for Faces and Houses on trials 1-3, with the average and summed activation]
The same category evokes similar patterns of activity across trials.

45 MVPA correlation approach
[Figure: voxel patterns for Faces and Houses on trials 1-3, highlighting similarity within the same category]

46 MVPA correlation approach
[Figure: voxel patterns for Faces and Houses on trials 1-3, highlighting similarity between different categories]

47 The brain area contains distinct information about faces and houses
If within-category similarity > between-category similarity, the brain area contains distinct information about faces and houses.
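The within- versus between-category comparison can be sketched with split-half correlations. The four-voxel patterns below are invented, and the odd/even split mirrors the Haxby-style design only loosely; `pearson_r` is a small helper written for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Invented activation patterns across 4 voxels, estimated separately in odd and even runs.
odd = {"faces": [2.0, 0.5, 1.5, 0.2], "houses": [0.3, 1.9, 0.4, 1.6]}
even = {"faces": [1.8, 0.6, 1.4, 0.3], "houses": [0.4, 1.7, 0.5, 1.8]}

categories = list(odd)
within_rs = [pearson_r(odd[c], even[c]) for c in categories]
between_rs = [pearson_r(odd[a], even[b]) for a in categories for b in categories if a != b]

within = sum(within_rs) / len(within_rs)
between = sum(between_rs) / len(between_rs)
print(round(within, 2), round(between, 2))
```

If `within` reliably exceeds `between` across participants, the region's patterns distinguish the two categories.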

48 Category-specificity of patterns of response in the ventral temporal cortex
Haxby et al., 2001, Science

49 Within-category similarity
Category-specificity of patterns of response in the ventral temporal cortex (Haxby et al., 2001, Science)
[Figure: similarity matrix of odd runs vs. even runs, on a scale from low to high similarity, with within-category similarity marked]

50 Between-category similarity
Category-specificity of patterns of response in the ventral temporal cortex (Haxby et al., 2001, Science)
[Figure: similarity matrix of odd runs vs. even runs, on a scale from low to high similarity, with between-category similarity marked]

51 Correlation Approach Using Representational Similarity Analysis

52 Representational similarity approach (RSA)
Unlike the basic MVPA correlation approach, RSA does not separate stimuli into a priori categories.
[Figure: similarity (correlation) matrices of odd runs vs. even runs, on a scale from low to high, for MVPA correlation and for RSA; Kriegeskorte et al. (2008)]

53 No class boundaries!
[Figure: similarity matrix (scale from low to high) for conditions C1 to C96; axes labelled CONDITIONS and TRIALS]
Fraser Smith & Jason Gallivan

54 Can compare theoretical models to data
[Figure: similarity matrices on a scale from low to high similarity; Kriegeskorte et al. (2008)]

55 Can compare theoretical models to data
Which prediction matrix is more similar to the real data?
[Figure: prediction matrices alongside the REAL DATA matrix, on a scale from low to high similarity; Kriegeskorte et al. (2008)]

56 “Metacorrelations”
Calculate the correlation between the model correlation matrix and the data correlation matrix.
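A metacorrelation can be sketched as the correlation between the off-diagonal cells of two matrices. The 4 x 4 matrices below (two faces, two houses) are invented; using only the upper triangle, to avoid the diagonal and duplicated cells, is a common convention assumed here rather than something stated on the slide.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def upper_triangle(matrix):
    """Cells above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

# Invented data correlation matrix: conditions are face 1, face 2, house 1, house 2.
data_matrix = [
    [1.00, 0.80, 0.10, 0.20],
    [0.80, 1.00, 0.15, 0.05],
    [0.10, 0.15, 1.00, 0.70],
    [0.20, 0.05, 0.70, 1.00],
]
# Category model: items from the same category are similar (1), otherwise not (0).
model_matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

meta_r = pearson_r(upper_triangle(data_matrix), upper_triangle(model_matrix))
print(round(meta_r, 2))
```

The same function can compare two data matrices (e.g., two brain areas) instead of a model and a data matrix.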

57 Can look at metacorrelations to determine best model or see similarity between areas
Right FFA pattern is similar to left FFA pattern.
Right FFA pattern is similar to the face-anim prototype theoretical model.
Right FFA pattern is not very similar to a low-level vision theoretical model.

58 Metacorrelation Matrix

59 Multidimensional Scaling (MDS)
Input = matrix of distances (km here)
             Winnipeg  Toronto  Montreal  Halifax  St. John's  Yellowknife  Whitehorse
Vancouver        1869     3366      3694     4439        5046         1566        1484
Winnipeg                  1518      1825     2581        3250         1753        2463
Toronto                              503     1266        2112         3078        4093
Montreal                                      792        1613         3194        4261
Halifax                                                   885         3768        4867
St. John's                                                            4127        5233
Yellowknife                                                                       1109

60 Multidimensional Scaling (MDS)
Output = representational space (2D here)
[Figure: 2D plot placing Halifax, Toronto, Montreal, St. John’s, Winnipeg, Vancouver, Yellowknife and Whitehorse]

61 [Figure: 2D plot placing Halifax, Toronto, Montreal, St. John’s, Winnipeg, Vancouver, Yellowknife and Whitehorse]

62 MDS on MVPA Data

63 Different Representational Spaces in Different Areas

64 Metacorrelation Matrix

65 MDS on Metacorrelations

66 Searchlights

67 Searchlight: 8 Voxel Example

68 Let’s zoom in on 8 voxels

69 Spherical Searchlight Cross-Section
Ideally we’d like to test a spherical volume, but the functional brain image is voxelized, so we end up with a Lego-like sphere.
Typical diameter = ~15 mm (e.g., 5 voxels at 3 mm isovoxel resolution)
Kriegeskorte, Goebel & Bandettini, 2006, PNAS

70 Moving the Searchlight
[Figure: eight voxels labelled in white with the values 55, 62, 73, 67, 60, 52, 48, 51]
Each value in white is the decoding accuracy for a sphere of 5-voxel diameter centred on a given voxel.

71 First- and Second-Level Analysis
The same 8 voxels (V1-V8) in stereotaxic space (e.g., Tal space)
First-level Analysis: SVM classifier decoding accuracies (%) for spheres centred at each of the eight voxels in each of the 15 Ss
S1: 55 62 73 67 60 52 48 51
S2: 46 52 65 69 60 59 53 48
S3: 48 55 62 70 58 52 50 49
...
S15: 52 55 59 57 56 43 42
Second-level Analysis
Average decoding accuracy: 51 59 69 67 59 55 50 50
t(14): 0.3 2.0 4.1 3.7 2.9 1.9 1.2 0.8
p: .81 .06 .001 .008 .012 .08 .25 .44
Do a univariate t-test (which is an RFX test based on intersubject variability) at each voxel to calculate the probability that the decoding accuracy is higher than chance.
Threshold at p < .05 (or use your favorite way of correcting for multiple comparisons).
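The second-level test at a single voxel can be sketched as a one-sample t-test against chance. This example assumes chance = 50% and, for illustration only, uses the third-voxel accuracies of the three complete subject rows (S1-S3), so it does not reproduce the slide's t(14) values; `one_sample_t` is a hypothetical helper, and the p-value lookup is omitted because it needs a t distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, reference):
    """t statistic and degrees of freedom for H0: mean(values) == reference."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)   # intersubject variability
    return (mean(values) - reference) / standard_error, n - 1

chance = 50.0
v3_accuracies = [73, 65, 62]    # S1, S2, S3 decoding accuracies (%) at the third voxel

t_value, df = one_sample_t(v3_accuracies, chance)
print(f"t({df}) = {t_value:.2f}")
```

Repeating this at every voxel (with all subjects) gives the t-map that is then thresholded.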

72 Thresholded t-map

73 Second-level Analysis
The principles of a second-level analysis are the same regardless of what dependent variable we are testing.
First-level analysis: one value per voxel (V1-V8) for each subject (S1, ...)
Second-level analysis:
UNIVARIATE VOXELWISE ANALYSIS
Beta weights (or differences in beta weights = contrasts): 0.1 0.7 1.2 1.5 1.1 0.5 0.3 0.2. Are they significantly different from zero?
MULTIVARIATE SEARCHLIGHT ANALYSIS
Decoding accuracies: 51 59 69 67 59 55 50 50. Are they significantly different from chance?
Correlations between model and MVPA data: .03 .22 .41 .50 .38 .19 -.01 .04. Are they significantly different from zero?

74 Regions vs. Brains
Univariate ROI analysis is to univariate voxelwise analysis as multivariate ROI analysis is to multivariate searchlight analysis.
There are no differences at the second-level analysis:
It’s a way to find things by searching the whole brain
Subjects’ brains must be aligned (Talairach, MNI or surface space)
The same problems and solutions for multiple comparisons arise
Degrees of freedom = #Ss - 1
There are differences at the first-level analysis:
Univariate voxelwise analyses are done one voxel at a time
Multivariate searchlight analyses are done one sphere at a time

75 Activation- vs. information-based analysis
Activation-based (standard fMRI analysis): regions more strongly active during face than house perception.
Information-based (searchlight MVPA analysis): regions whose activity pattern distinguished the two categories.
35% of voxels are marked only in the information-based map: category information is lost when data are smoothed.
Kriegeskorte, Goebel & Bandettini, 2006

76 Activation- vs. information-based analysis
Mur et al., 2009, Social Cognitive and Affective Neuroscience

77 What Is MVPA Picking Up On?

78 Limitations of MVPA
MVPA will use whatever information is available, including confounds (e.g., reaction time).
MVPA works best for attributes that are coded at certain spatial scales (e.g., topography: retinotopy, somatotopy, etc.).
A failure to find effects does not mean that neural representations do not differ:
information may be present at a finer scale
choice of classifier may not have been optimal (e.g., maybe nonlinear would work better)
Good classification indicates presence of information, not necessarily neuronal selectivity (Logothetis, 2008), e.g., successful face decoding in primary visual cortex.
Pattern-classifier analysis requires many decisions that affect the results (see Misaki et al., 2010).
Classifiers and correlations don’t always agree.

79 “Mind-Reading”: Reconstructing new stimuli from brain activity

80 Reconstruct new images
Miyawaki et al., 2008

81 Decoding Vision Gallant Lab, UC Berkeley

82 Lie detector
Non-linear classifier applied to fMRI data to discriminate spatial patterns of activity associated with lie and truth in 22 individual participants.
88% accuracy to detect lies in participants not included in the training (Davatzikos et al., 2005)

83 Lie detector
The real world is more complex!

84 Reconstruct dreams
Measure brain activity while 3 participants were asleep and ask them to describe their dream when awake.
Comparison between brain activity during sleep and vision of pictures of categories frequently dreamt.
Activity in higher order visual areas (e.g., FFA) could successfully (accuracy of 75-80%) decode the dream contents 9 seconds before waking the participant!
Abstract SfN 2012: Dreaming is a subjective experience during sleep, often accompanied by visual contents, whose neural basis remains unknown. Previous dream research attempted to link physiological states with dreaming, but did not demonstrate how the specific contents of visual experiences during dreaming are represented in the brain activity patterns. The recent advent of neural decoding has allowed for the decoding of various contents of visual experience from brain activity patterns. The technique can thus be used to examine the neural representation of dreams by testing whether neural decoders can predict dream contents from brain activity patterns. Here we performed decoding analyses on semantically labeled human fMRI signals measured from three male subjects during dreaming. To collect dream data efficiently, we measured fMRI signals and collected reports about subjective experiences during hypnagogic periods. We developed a multiple-awakening procedure, in which subjects were awakened when a specific EEG pattern was observed, were asked to freely describe their visual experiences just before awakening, and were then asked to sleep again. We repeated this procedure until we collected over 200 reports in a total of hours of experiment time for each subject. Multiple “synsets,” synonym sets defined in the English “WordNet” lexical database, that correspond to words describing reported visual contents were used to label averaged fMRI volumes during a 9 s period before each awakening.
We first performed pairwise classification analyses for all pairs of synsets using fMRI signals in the early (V1-V3) and the higher (around LOC, FFA, and PPA) visual cortices during dreaming (dream-trained decoder). The decoding performance showed a distribution that was significantly higher than chance level in the higher visual areas. We next examined whether “stimulus-trained decoders” that were trained with fMRI signals evoked by natural image viewing could decode the dream contents. Results showed that the stimulus-trained decoders successfully predicted the dream contents more accurately in the higher visual cortex than in the early visual cortex. These results demonstrate that fMRI signals in the visual cortex, especially in the higher visual areas, represent specific visual contents of dreams, allowing for the prediction of dream contents. Furthermore, it supports the hypothesis that dreaming and perception may share neural representations in the higher visual areas.
Kamitani Lab, ATR, Japan

85 Shared Semantic Space from brain activity during observation of movies
Similar colors for categories similarly represented in the brain (Huth et al., 2012)

86 Shared Semantic Space from brain activity during observation of movies
Similar colors for categories similarly represented in the brain.
People and communication verbs are represented similarly (Huth et al., 2012)

87 Continuous Semantic Space across the surface
Each voxel is colored according to the part of the semantic space for which it is selective.

88 Continuous Semantic Space across the surface
Click on each voxel to see which categories it represents FUSIFORM FACE AREA

89 MVPA Tutorial
Jody Culham, Brain and Mind Institute, Department of Psychology, Western University
Last Update: January 18, 2012. Last Course: Psychology 9223, W2010, University of Western Ontario
Last Update: March 10, 2013. Last Course: Psychology 9223, W2013, Western University

90 Test Data Set
Two runs: A and B (same protocol)
5 trials per condition for 3 conditions

91 Measures of Activity
β weights: z-normalized (βz) or %-transformed (β%)
t-values: β/error
% BOLD signal change: minus baseline
[Figure: activity maps for each measure, with scales from low to high βz, low to high β% and low to high t]

92 Step 1: Trial Estimation
Just as in the Basic GLM, we are running one GLM per voxel.
Now, however, each GLM is estimating activation not across a whole condition but for each instance (trial or block) of a condition.

93 Three Predictors Per Instance
Predictors: 2-gamma, constant, linear within trial
5 instances of motor imagery
5 instances of mental calculation
5 instances of mental singing

94 Step 1: Trial Estimation Dialog

95 Step 1: Trial Estimation Output
Now for each instance of each condition in each run, for each voxel we have an estimate of activation

96 Step 2: Support Vector Machine
SVMs are usually run in a subregion of the brain, e.g., a region of interest (= volume of interest).
Sample data: SMA ROI
Sample data: 3 Tasks ROI

97 Step 2: Support Vector Machine
Test data must be independent of training data:
leave-one-run-out
leave-one-trial-out
leave-one-trial-set-out
Often we will run a series of iterations to test multiple combinations of leave-X-out:
e.g., with two runs, we can run two iterations of leave-one-run-out
e.g., with 10 trials per condition and 3 conditions, we could run up to 10^3 = 1000 iterations of leave-one-trial-set-out

98 MVP file plots
98 functional voxels; intensity = activation
15 trials
Run A = training set; Run B = test set

99 SVM Output: Train Run A; Test Run B
[Figure: two tables of guessed condition vs. actual condition: 15/15 correct; 10/15 correct (chance = 5/15)]

100 SVM Output: Train Run B; Test Run A

101 Permutation Testing
randomize all the condition labels
run SVMs on the randomized data
repeat this many times (e.g., 1000X)
get a distribution of expected decoding accuracy
test the null hypothesis (H0) that the decoding accuracy you found came from this permuted distribution

102 Output from Permutation Testing
[Figure: distribution of decoding accuracies from the permuted data, marking our data (→ reject H0), the upper bound of the 95% confidence limits, the upper quartile, the median (should be 33.3%) and the lower quartile of the permuted distribution]

103 Voxel Weight Maps
Voxels with high weights contribute strongly to the classification of a trial to a given condition.

104 Review of RSA: Voxels → Correlations
DATA VOXEL MATRICES: each cell is an estimate of activation level (e.g., β for a trial or run or condition), on a scale from low to high activity.
DATA CORRELATION MATRIX: e.g., each cell is an r between the voxel patterns for two trials (e.g., trial 1 and trial 2) in one region (e.g., FFA), on a scale from low to high similarity.

105 Review of RSA: Correlations → Representations
DATA CORRELATION MATRIX: e.g., each cell is an r (scale from low to high similarity).
MULTIDIMENSIONAL SCALING PLOT: similar stimuli are close together; dissimilar stimuli are far apart.

106 Review of RSA: Correlations → Model Testing
MODEL CORRELATION MATRIX: Hypothesis: faces will be more like faces, and houses will be more like houses, than faces are like houses.
DATA CORRELATION MATRIX: e.g., each cell is an r (scale from low to high similarity).
“META-CORRELATION”: correlation between data and model.

107 Review of RSA: Combining Participants
[Figure: DATA VOXEL MATRICES (trial 1, trial 2) in FFA for Participant #1 and Participant #2]
We CANNOT combine participants’ data at the voxel level (even if their brains are in a common stereotaxic space) because the voxel activation patterns within an area (e.g., FFA) are not expected to match.

108 Review of RSA: Combining Participants
[Figure: DATA CORRELATION MATRIX in FFA for Participant #1 and Participant #2]
We CAN combine participants at the data correlation matrix level because their patterns of similarities and differences are expected to match.

109 Review of RSA: Model Testing
[Figure: “meta-correlation” (correlation between data and model) for MODEL1 and MODEL2, with variance across participants shown as +/- 95% CI; example model-data correlations: Participant 1 r = .53, Participant 2 r = .64, Participant n r = .59]
Both models significantly account for the data but MODEL1 matches more closely than MODEL2.

110 Details for aficionados
It can be helpful to compute a noise ceiling (or a range of expected values for it). This is a measure of how consistent the data matrices are across Ss and thus how well the best model could be expected to perform. In this cartoon example, Model 1 does a very good job considering the noise ceiling.
It may be better to use Fisher-transformed r values than raw r values because r’s are not normally distributed (especially for absolute r > ~.5).
[Figure: “meta-correlation” (correlation between data and model) for MODEL1 and MODEL2; example model-data correlations: Participant 1 r = .53, Participant 2 r = .64, Participant n r = .59]
Both models significantly account for the data but MODEL1 matches more closely than MODEL2.
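The Fisher transformation mentioned above is z = atanh(r). A minimal sketch, using the three example model-data correlations from the cartoon as input (treating them as one value per participant is an assumption made for illustration):

```python
from math import atanh, tanh

model_data_rs = [0.53, 0.64, 0.59]           # example r values, one per participant

fisher_zs = [atanh(r) for r in model_data_rs]   # Fisher z = atanh(r)
mean_z = sum(fisher_zs) / len(fisher_zs)        # average (and test) in z space
mean_r = tanh(mean_z)                           # back-transform for reporting

print([round(z, 3) for z in fisher_zs])
print(round(mean_r, 3))
```

Statistics such as t-tests and confidence intervals would be computed on `fisher_zs`, with results converted back to r only for display.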

111 Review of RSA: Models vs. MDS
Correlation between data and model: “META-CORRELATION”
MDS Plots
Good for visualizing and exploring data in a limited number of dimensions (typically 2D or 3D)
May allow you to generate new models based on data-driven analysis
Not good for statistical testing because they’ve oversimplified complex (n-dimensional) data and don’t give a measure of intersubject consistency
Model Testing
Good for statistical testing of data because it takes into account the full (n-dimensional) data and intersubject variability
Can only test the models that you, the experimenter, came up with

