fMRI Analysis with emphasis on the general linear model
Jody Culham, Department of Psychology, University of Western Ontario
Last update: November 29, 2008. fMRI Course, Louvain, Belgium.
What data do we start with?
12 slices × 64 × 64 voxels = 49,152 voxels per volume. Each voxel has 136 time points, so for each run we have about 6.7 million data points. We often have several runs for each experiment.
Why do we need stats?
We could, in principle, analyze data by "voxel surfing": move the cursor over different areas and see if any of the time courses look interesting.
- Slice 9, Voxel 0, 0: even where there's no brain, there's noise.
- Slice 9, Voxel 1, 0 and Slice 9, Voxel 22, 7: the signal is much higher where there is brain, but there's still noise.
- Slice 9, Voxel 14, 42 and Slice 9, Voxel 18, 36: here are a couple that sort of show the right pattern, but are they "real"?
- Slice 9, Voxel 9, 27: here's a voxel that responds well whenever there's visual stimulation.
- Slice 9, Voxel 13, 41: here's one that responds well whenever there are intact objects.
Why do we need stats?
Clearly, voxel surfing isn't a viable option. We'd have to do it 49,152 times in this data set, and it would require a lot of subjective decisions about whether activation was real. This is why we need statistics. Statistics:
- tell us where to look for activation that is related to our paradigm
- help us decide how likely it is that activation is "real"
The lies and damned lies come in when you write the manuscript.
Predicted Responses
fMRI is based on the Blood Oxygenation Level Dependent (BOLD) response. It takes about 5 sec for the blood to catch up with the brain. We can model the predicted activation in one of two ways:
- shift the boxcar by approximately 5 seconds (2 images × 2 seconds/image = 4 sec, close enough)
- convolve the boxcar with the hemodynamic response to model the shape of the true function as well as the delay (see the sketch below)
(Figure: predicted activation in the visual area and the object area, shown as a shifted boxcar and as a boxcar convolved with the HRF.)
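To make the second option concrete, here is a minimal Python sketch of the convolution step. This is my own illustration, not code from the course: the double-gamma HRF parameters, the 16-s block structure, and the numpy/scipy calls are all assumptions.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0                  # seconds per volume
n_vols = 136              # volumes per run, as in this example
t = np.arange(n_vols) * TR

# Boxcar: 8 volumes off, 8 volumes on, repeating (16-s blocks)
boxcar = (np.arange(n_vols) // 8) % 2

# Canonical double-gamma HRF (SPM-like parameters; an assumption here)
ht = np.arange(0, 32, TR)
hrf = gamma.pdf(ht, 6) - gamma.pdf(ht, 16) / 6.0
hrf /= hrf.sum()

# Convolve and trim to run length: this is the "convolved" predictor
predictor = np.convolve(boxcar, hrf)[:n_vols]
```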
Types of Errors

                          Is the region truly active?
                          Yes               No
Test says active?  Yes    HIT               Type I Error
                   No     Type II Error     Correct Rejection

p value: probability of a Type I error. E.g., p < .05: "There is less than a 5% probability that a voxel our stats have declared as 'active' is in reality NOT active." (Slide modified from Duke course.)
Statistical Approaches in a Nutshell
- t-tests: compare activation levels between two conditions; use a time-shift to account for hemodynamic lag
- correlations: model the activation and see whether any areas show a similar pattern
- Fourier analysis: see if there is energy at your paradigm frequency
(Fourier analysis images from Huettel, Song & McCarthy, 2004, Functional Magnetic Resonance Imaging.) A sketch of the correlation and Fourier checks follows below.
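As a rough illustration (my own sketch, not the slides' code; the variable names and the paradigm frequency are assumptions), here is how one might score a single voxel's time course against the convolved predictor from the earlier sketch, and check for energy at the paradigm frequency:

```python
import numpy as np

def voxel_stats(ts, predictor, TR=2.0, cycle_s=32.0):
    """Correlate a voxel time course with a predictor, and measure
    spectral power at the paradigm frequency (one on/off cycle)."""
    r = np.corrcoef(ts, predictor)[0, 1]            # goodness of fit

    power = np.abs(np.fft.rfft(ts - ts.mean())) ** 2
    freqs = np.fft.rfftfreq(len(ts), d=TR)
    paradigm_f = 1.0 / cycle_s                       # e.g., 1 cycle per 32 s
    power_at_paradigm = power[np.argmin(np.abs(freqs - paradigm_f))]
    return r, power_at_paradigm
```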
Effect of Thresholds
(Figure: the same data at different correlation thresholds: r = .40, which explains only 16% of the variance yet corresponds to p < .000001 at this run length, vs. the much stricter r = .80.)
Complications
Not only is it hard to determine what's real, but there are all sorts of statistical problems. Potential problems:
- data may be contaminated by artifacts (e.g., head motion, breathing artifacts)
- .05 × 49,152 ≈ 2,458 voxels will be "significant" by chance alone
- many assumptions of the statistics (that adjacent voxels are uncorrelated with each other; that adjacent time points are uncorrelated with one another) are false
(Figure: "What's wrong with these data?" r = .24, 6% of variance, p < .05.)
The General Linear Model
t-tests, correlations, and Fourier analysis work for simple designs and were common in the early days of imaging. The General Linear Model (GLM) is now available in many software packages and tends to be the analysis of choice. Why is the GLM so great?
- the GLM is an overarching tool that can do anything the simpler tests do
- you can examine any combination of contrasts (e.g., intact - scrambled, scrambled - baseline) with one GLM rather than multiple correlations
- the GLM allows much greater flexibility for combining data within subjects and between subjects
- it also makes it much easier to counterbalance orders and discard bad sections of data
- the GLM allows you to model things that may account for variability in the data even though they aren't interesting in and of themselves (e.g., head motion)
- as we will see later in the course, the GLM also allows you to use more complex designs (e.g., factorial designs)
A Simple Experiment
The Lateral Occipital Complex (LOC) responds when the subject views objects. Conditions alternate over time: Blank Screen, Intact Objects, Scrambled Objects. One volume (12 slices) is collected every 2 seconds for 272 seconds (4 minutes, 32 seconds), and the condition changes every 16 seconds (8 volumes).
What's real?
(Figure: four candidate time courses, labeled A through D.)
What's real?
data = signal + noise
I created each of those time courses by taking the predictor function (the signal) and adding a variable amount of random noise. A sketch of this simulation follows below.
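A minimal simulation of that recipe, reusing the `predictor` from the convolution sketch above (my own sketch; the noise levels and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_voxel(predictor, amplitude=1.0, noise_sd=0.5):
    """Simulated voxel: scaled predictor (signal) plus Gaussian noise."""
    return amplitude * predictor + rng.normal(0.0, noise_sd, len(predictor))

# More noise makes the same underlying signal harder to see
convincing = fake_voxel(predictor, noise_sd=0.2)
dubious    = fake_voxel(predictor, noise_sd=2.0)
```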
What’s real? Which of the data sets below is more convincing?
Formal Statistics
Formal statistics just do what your eyeball test of significance did: estimate how likely it is that the signal is real, given how noisy the data are.
- confidence: how likely is it that the results could occur purely due to chance? "p value" = probability value. If p = .03, there is a 3% chance that the results are bogus.
- by convention, if the probability that a result could be due to chance is less than 5% (p < .05), we say that the result is statistically significant
- significance depends on: signal (differences between conditions), noise (other variability), and sample size (more time points are more convincing)
Let’s create a time course for one LO voxel
We'll begin with activation: the response to Intact Objects is 4× greater than the response to Scrambled Objects.
Then we'll assume that our modelled activation is off because of a transient component.
Our modelled activation could be off for other reasons. All of the following could lead to inaccurate models:
- a different shape of the function
- a different width of the function
- a different latency of the function
Variability of the HRF (Aguirre, Zarahn & D'Esposito, 1998)
The HRF shows considerable variability between subjects (figure: different subjects). Within subjects, responses are more consistent, although there is still some variability between sessions (figure: same subject, same session vs. same subject, different session).
Now let's add some variability due to head motion.
…though really motion is more complex. Head motion can be quantified with the 6 parameters given by any motion correction algorithm:
- x, y, and z translation
- xy, xz, and yz rotation
For simplicity, I've only included the first parameter in our model. Head motion can also lead to other problems that are not predictable by these parameters.
Now let's throw in a pinch of linear drift. Linear drift could arise from magnet noise (e.g., parts warming up) or physiological noise (e.g., the subject's head sinking).
…and then we'll add a dash of low frequency noise. Low frequency noise can arise from magnet noise or physiological noise (e.g., the subject's cycles of alertness/drowsiness). It occurs over a range of frequencies, but for simplicity I've only included one frequency (1 cycle per run) here. Linear drift is really just very low frequency noise.
…and our last ingredient: some high frequency noise. High frequency noise can arise from magnet noise or physiological noise (e.g., the subject's breathing rate and heart rate).
When we add these all together, we get a realistic time course.
Now let's be the experimenter. First, we take our time course and normalize it using z scores: z = (x - mean) / SD. Normalization leads to data where the mean = 0 and the SD = 1. A sketch follows below.
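The normalization is one line in the same sketch style (numpy assumed):

```python
import numpy as np

def z_score(ts):
    """Normalize a time course to mean 0, SD 1."""
    ts = np.asarray(ts, dtype=float)
    return (ts - ts.mean()) / ts.std()
```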
We create a GLM with 2 predictors
fMRI Signal = Design Matrix × Betas + Residuals
"our data" = "what we CAN explain" × "how much of it we CAN explain" + "what we CANNOT explain"
Statistical significance is basically a ratio of explained to unexplained variance. A least-squares sketch follows below.
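In matrix form this is y = Xβ + ε, with β estimated by ordinary least squares. A minimal sketch of that estimation (my own illustration of the math; packages such as SPM and Brain Voyager do this internally, and the variable names are assumptions):

```python
import numpy as np

def fit_glm(y, X):
    """Ordinary least squares: y = X @ betas + residuals."""
    betas, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ betas
    return betas, residuals

# Design matrix: intact predictor, scrambled predictor, and a constant
# (the constant plays the role of SPM's mean-level column).
# intact_pred and scrambled_pred would be convolved boxcars as sketched earlier:
# X = np.column_stack([intact_pred, scrambled_pred, np.ones(n_vols)])
# betas, resid = fit_glm(voxel_ts, X)
```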
Implementation of the GLM in SPM
(Figure: Intact and Scrambled predictors as design-matrix columns. Many thanks to Øystein Bech Gadmar for creating this figure in SPM.)
- SPM represents time as going down
- SPM represents predictors within the design matrix as grayscale plots (where black = low, white = high) over time
- SPM includes a constant to take care of the average activation level throughout each run
Effect of Beta Weights
Adjustments to the beta weights have the effect of raising or lowering the height of the predictor while keeping the shape constant.
The beta weight is not a correlation
- correlations measure goodness of fit regardless of scale
- beta weights are a measure of scale
We create a GLM with 2 predictors
(Figure: the same decomposition as before, fMRI Signal = Design Matrix × Betas + Residuals, now shown with concrete values: β1 = 2 for one predictor and β2 = 0.5 for the other.)
"our data" = "what we CAN explain" × "how much of it we CAN explain" + "what we CANNOT explain"
Statistical significance is basically a ratio of explained to unexplained variance.
Correlated Predictors
Where possible, avoid predictors that are highly correlated with one another. This is why we NEVER include a baseline predictor: a baseline predictor is almost completely correlated with the sum of the existing predictors.
(Figure: the two stimulus predictors each correlate r = -.53 with the baseline predictor, while their sum correlates r = -.95 with it.)
You can check this directly, as in the sketch below.
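A toy demonstration (my own sketch with made-up block timings): build a baseline predictor that is "on" whenever neither stimulus is, and correlate it with the sum of the stimulus predictors.

```python
import numpy as np

n = 136
intact_box = np.zeros(n); intact_box[8:16] = 1        # toy on-blocks
scrambled_box = np.zeros(n); scrambled_box[24:32] = 1
baseline = 1 - intact_box - scrambled_box             # "on" when nothing else is

# The baseline column is an exact linear combination of the other columns,
# so the model cannot assign unique betas:
r = np.corrcoef(baseline, intact_box + scrambled_box)[0, 1]
print(r)  # -1.0: perfectly (anti)correlated with the sum
```

With convolved (rather than raw boxcar) predictors the correlation would not be exactly -1, but it remains high enough to make the betas unstable.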
Which model accounts for this data?
(Figure: two models fit the data equally well, e.g., one with betas of 1 and 0 on the stimulus predictors, and one that also gives the baseline predictor a beta of -1.) Because the predictors are highly correlated, you can't tell which model is best.
Contrasts in the GLM
We can examine whether a single predictor is significant (compared to the baseline). We can also examine whether a single predictor is significantly greater than another predictor. A sketch of a contrast test follows below.
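The usual machinery is a contrast vector c applied to the estimated betas, with t = c'β̂ / sqrt(σ̂² c'(X'X)⁻¹c). A sketch under the same assumptions as the earlier GLM code (my own illustration, not any package's internals):

```python
import numpy as np
from scipy import stats

def contrast_t(y, X, c):
    """t statistic for contrast c on the betas of the GLM y = X @ b + e."""
    betas, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ betas
    dof = len(y) - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / dof                       # residual variance
    var_c = sigma2 * c @ np.linalg.pinv(X.T @ X) @ c   # variance of c'b
    t = (c @ betas) / np.sqrt(var_c)
    p = 2 * stats.t.sf(abs(t), dof)                    # two-tailed p
    return t, p

# e.g., c = np.array([1, -1, 0]) tests Intact > Scrambled
```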
A Real Voxel
Here's the time course from a voxel that was significant in the +Intact -Scrambled comparison.
Maximizing Your Power
data = signal + noise
As we saw earlier, the GLM is basically comparing the amount of signal to the amount of noise. How can we improve our stats?
- increase the signal
- decrease the noise
- increase the sample size (keep the subject in longer)
How to Reduce Noise
If you can't get rid of an artifact, you can include it as a "predictor of no interest" to soak up variance. Example: some people include predictors from the output of motion correction algorithms. Corollary: never leave out predictors for conditions that will affect your data.
Reducing Residuals
What's this #*%&ing reviewer complaining about?!
Particularly if you do voxelwise stats, you have to be careful to follow the accepted standards of the field. In the past few years, the following approaches have been recommended by the stats mavens:
- correction for multiple comparisons
- random effects analyses
- correction for serial correlations
Correction for Multiple Comparisons
With conventional probability levels (e.g., p < .05) and a huge number of comparisons (e.g., 64 × 64 × 12 = 49,152), a lot of voxels will be significant purely by chance: .05 × 49,152 ≈ 2,458 voxels significant due to chance. How can we avoid this?
1) Bonferroni correction: divide the desired p value by the number of comparisons. Example: desired p value p < .05, number of voxels 50,000, required p value p < .05 / 50,000 = p < .000001. This is quite conservative. You can use less stringent values (e.g., Brain Voyager can use the number of voxels in the cortical surface), or a small volume correction: use more liberal thresholds in areas of the brain you expected to be active.
Correction for Multiple Comparisons
2) Gaussian field theory: fundamental to SPM. If the data are very smooth, then the chance of noise points passing threshold is reduced, so you can correct for the number of "resolvable elements" ("resels") rather than the number of voxels. (Slide modified from Duke course.)
Cluster correction and test-retest reliability
3) Cluster correction: falsely activated voxels should be randomly dispersed, so set the minimum cluster size large enough that a cluster of that size is unlikely to occur by chance. This assumes that data from adjacent voxels are uncorrelated (not true).
4) Test-retest reliability: perform statistical tests on each half of the data. The probability of a given voxel appearing in both halves purely by chance is the square of the p value used in each half, e.g., .001 × .001 = .000001. Alternatively, use the first half to select an ROI and evaluate your hypothesis in the second half.
5) Poor man's Bonferroni
Jack up the threshold till you get rid of the schmutz (especially in air, ventricles, white matter). If you have a comparison where one condition is expected to produce much more activity than the other, turn on both tails of the comparison. Jody's rule of thumb: "If ya can't trust the negatives, can ya trust the positives?" Example: MT localizer data, moving rings > stationary rings (orange) and stationary rings > moving rings (blue).
6) False discovery rate (Genovese et al., 2002, NeuroImage)
FDR also "controls the proportion of rejected hypotheses that are falsely rejected." A standard p value (e.g., p < .01) means that a certain proportion of ALL voxels (1%) will be significant by chance. FDR uses a q value (e.g., q < .01), meaning that a certain proportion of the "activated" (colored) voxels (1%) will be significant by chance. It works in theory, though in practice my lab hasn't been that satisfied. A sketch of the standard FDR procedure follows below.
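For concreteness, here is a minimal sketch of the Benjamini-Hochberg step-up procedure that FDR control is based on (my own illustration; Genovese et al. describe the fMRI-specific details):

```python
import numpy as np

def fdr_threshold(p_values, q=0.01):
    """Benjamini-Hochberg: find the largest p(k) with p(k) <= (k/m) * q.
    Voxels with p at or below the returned threshold are declared active."""
    p = np.sort(np.asarray(p_values))
    m = len(p)
    below = p <= (np.arange(1, m + 1) / m) * q
    return p[below].max() if below.any() else 0.0
```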
Correction for Temporal Correlations
Statistical methods assume that each of our time points is independent. In the case of fMRI, this assumption is false. Even in a screen saver scan, activation in a voxel at one time is correlated with its activation within ~6 sec. This fact can artificially inflate your statistical significance.
Collapsed Fixed Effects Models
- assume that the experimental manipulation has the same effect in each subject
- treat all data as one concatenated set with one beta per predictor (collapsed across all subjects), e.g., Intact = 2, Scrambled = .5
- a strong effect in one subject can lead to significance even when the others show weak or no effects
- you can say that the effect was significant in your group of subjects, but you cannot generalize to other subjects that you didn't test
Separate Subjects Models
- one beta per predictor per subject, e.g.:
  JC: Intact = 2.1, Scrambled = 0.2
  DQ: Intact = 1.5, Scrambled = 1.0
  KV: Intact = 1.2, Scrambled = 1.3
- weights each subject equally
- makes the data less susceptible to the effects of one rogue subject
Random Effects Analysis
Typical fMRI stats test whether the differences between conditions are significant in the sample of subjects we have tested. Often, we want to be able to generalize to the population as a whole, including all potential subjects, not just the ones we tested (underpaid graduate students in need of a few bucks!). Random effects analyses allow you to generalize to the population your subjects were sampled from.
- Brain Voyager recommends you don't even toy with random effects unless you've got 10 or more subjects (and 50+ is best)
- random effects analyses can really squash your data, especially if you don't have many subjects; sometimes we refer to the random effects button as the "make my activation go away" button
- reviewers are now requesting random effects analyses more frequently
- you don't have to worry about it if you're using the ROI approach, because (1) presumably the ROI has already been well established across multiple labs, and (2) post hoc analyses of results in an ROI approach allow you to generalize to the population (assuming you include individual variance)
Fixed vs. Random Effects GLM
(Table: two sample data sets, each with three subjects and columns for Subject, Intact beta, Scrambled beta, and Difference. In both data sets the Intact betas sum to 10 and the Scrambled betas sum to 7, but in Sample Data #1 all three subjects show a positive Intact - Scrambled difference, whereas in Sample Data #2 one subject's difference is -1.)
A Fixed Effects GLM cannot tell the difference between these data sets, because (Intact sum - Scrambled sum) is the same in both cases. In a Random Effects GLM, data set #1 would be more likely to be significant, because all 3 subjects show a trend in the same direction (intact > scrambled), whereas in data set #2 only 2 of 3 subjects show a difference in that direction. A sketch of this logic follows below.
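The random-effects logic reduces to a one-sample t-test on the per-subject differences, so only the variability across subjects counts. A sketch with invented numbers chosen to match the constraints above (the exact per-subject betas were lost from the slide, so these values are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# Per-subject (Intact - Scrambled) beta differences; both sets sum to 3.0
diffs_1 = np.array([1.2, 0.9, 0.9])    # consistent direction across subjects
diffs_2 = np.array([2.0, 2.0, -1.0])   # same sum, inconsistent direction

for d in (diffs_1, diffs_2):
    t, p = stats.ttest_1samp(d, 0.0)   # test mean difference against zero
    print(t, p)   # data set 1 yields a far larger t than data set 2
```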
Autocorrelation function
To calculate the magnitude of the problem, we can compute the autocorrelation function. For a voxel or ROI, correlate its time course with itself shifted in time, and plot these correlations by the degree of shift (original, shift by 1 volume, shift by 2 volumes, and so on). If there is no autocorrelation, the function should drop from 1 to 0 abruptly (the pink line in the figure). The points circled in yellow suggest there is some autocorrelation, especially at a shift of 1, called AR(1). A sketch follows below.
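A direct implementation of that recipe (my own sketch; numpy assumed):

```python
import numpy as np

def autocorrelation(ts, max_lag=10):
    """Correlation of a time course with itself shifted by 0..max_lag volumes."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts)
    return [np.corrcoef(ts[: n - lag], ts[lag:])[0, 1] if lag > 0 else 1.0
            for lag in range(max_lag + 1)]

# acf = autocorrelation(voxel_ts); acf[1] is the AR(1) term the slide refers to
```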
BV can correct for the autocorrelation to yield revised (usually lower) p values. (Figure: the autocorrelation function before and after correction.)
BV Preprocessing Options
Temporal Smoothing of Data
We have the option in our software to temporally smooth our data (i.e., remove high temporal frequencies). However, I recommend that you not use this option. Now do you understand why?
Clarification (WARNING: UNDER CONSTRUCTION, explanation needs clarifying)
Correction for temporal correlations is NOT necessary with random effects analyses, only for fixed effects and individual-subject analyses.
Approach #1: Voxelwise Statistics
- you don't necessarily need a priori hypotheses (though sometimes you can use less conservative stats if you have them)
- average all of your data together in Talairach space
- compare two (or more) conditions using precise statistical procedures within every voxel of the brain
- any area that passes a carefully determined threshold is considered real (determining the threshold is the tricky part!)
- make a list of these areas and publish it
Voxelwise Approach: Example (Malach et al., 1995, PNAS)
Question: are there areas of the human brain that are more responsive to objects than to scrambled objects? You will recognize this as what we now call an LO localizer, but Malach was the first to identify LO. LO (red) responds more to objects, abstract sculptures, and faces than to textures, unlike visual cortex (blue), which responds well to all stimuli. (Figure: LO activation shown in red, behind MT+ activation in green.)
The Danger of Voxelwise Approaches
This is one of two tables from a single paper (Decety et al., 1994, Nature). Some papers publish tables of activation two pages long. How can anyone make sense of so many areas?
To Localize or Not to Localize?
Neuroimagers can't even agree how to SPELL localiser/localizer!
Methodological Fundamentalism
The latest review I received…
Approach #2: Region of Interest (ROI) Analysis
- if you are looking at a well-established area (such as visual cortex, motor cortex, or the lateral occipital complex), it's fairly easy to activate and identify the area
- do the stats and play with the threshold till you get something believable in the right vicinity, based on anatomical location (e.g., sulcal landmarks) or functional location (e.g., Talairach coordinates from prior studies)
- once you have found the ROI, run independent experiments, extract the time course information, and determine whether activation differences between conditions are significant
- because the runs used to define the area are independent from those used to test the hypothesis, liberal statistics (p < .05) can be used
Example of ROI Approach (Culham et al., 2003, Experimental Brain Research)
Does the Lateral Occipital Complex compute object shape for grasping?
Step 1: localize LOC (Intact Objects vs. Scrambled Objects).
Example of ROI Approach (Culham et al., 2003, Experimental Brain Research)
Does the Lateral Occipital Complex compute object shape for grasping?
Step 2: extract LOC data from the experimental runs (Grasping vs. Reaching). (Figure: no significant differences, NS, p = .35 and p = .31.)
Example of ROI Approach: Very Simple Stats
Extract the average peak % BOLD signal change from each subject for each condition, then simply do a paired t-test to see whether the peaks are significantly different between conditions.

% BOLD Signal Change, Left Hem. LOC
Subject   Grasping   Reaching
1         0.02       0.03
2         0.19       0.08
3         0.04       0.01
4         0.10       0.32
5         1.01       -0.27
6         0.16       0.09
7         0.12

(Results: NS, p = .35 and p = .31.) Instead of using % BOLD signal change, you can use beta weights. You can also do a planned contrast in Brain Voyager using a module called the ROI GLM. A sketch of the t-test follows below.
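The paired t-test on those peaks is one call with scipy (a sketch; subject 7's Reaching value was cut off in the source, so only the six complete pairs are used here):

```python
import numpy as np
from scipy import stats

grasping = np.array([0.02, 0.19, 0.04, 0.10, 1.01, 0.16])
reaching = np.array([0.03, 0.08, 0.01, 0.32, -0.27, 0.09])

t, p = stats.ttest_rel(grasping, reaching)  # paired t-test across subjects
print(t, p)
```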
Utility of Doing Both Approaches
We also verified the result with a voxelwise approach: no LOC activation for grasping > reaching, even at a moderate threshold (p < .001, uncorrected).
Example: The Danger of ROI Approaches
Example 1: LOC may be a heterogeneous area with subdivisions; ROI analyses gloss over this.
Example 2: some experiments miss important areas (e.g., Kanwisher et al., 1997 identified one important face processing area, the fusiform face area (FFA), but did not report a second area that is a very important part of the face processing network, the occipital face area (OFA), because it was less robust and consistent than the FFA).
Comparing the Two Approaches: Voxelwise Analyses
- require no prior hypotheses about the areas involved
- include the entire brain
- often neglect individual differences
- can lose spatial resolution with intersubject averaging
- can produce meaningless "laundry lists of areas" that are difficult to interpret
- you have to be fairly stats-savvy and include all the appropriate statistical corrections to be certain your activation is really significant
- popular in Europe
Comparing the Two Approaches: Region of Interest (ROI) Analyses
- extracted ROI data can be subjected to simple stats (no need for multiple comparison, autocorrelation, or random effects corrections), giving you more statistical power (e.g., p < .05)
- hypothesis-driven; useful when hypotheses are motivated by other techniques (e.g., electrophysiology) in specific brain regions
- the ROI is not smeared by intersubject averaging, which is important for discriminating abutting areas (e.g., V1/V2)
- easy to analyze and interpret
- neglects other areas which may play a fundamental role
- if multiple ROIs need to be considered, you can spend a lot of scan time collecting localizer data (thus limiting the time available for experimental runs)
- works best for reliable and robust areas with unambiguous definitions
- popular in North America
A Proposed Resolution
There is no reason not to do BOTH ROI analyses and voxelwise analyses: ROI analyses for well-defined key regions, and voxelwise analyses to see if other regions are also involved. Ideally, the conclusions will not differ. If the conclusions do differ, there may be sensible reasons:
- effect in ROI but not voxelwise: perhaps the region is highly variable in stereotaxic location between subjects, or perhaps the voxelwise approach is not powerful enough
- effect in voxelwise but not ROI: perhaps the ROI is not homogeneous or is context-specific
Finding the Obvious: the Non-independence Error
(UNDER CONSTRUCTION: need to create animation of cards.)
The non-independence error occurs when the statistical tests performed are not independent from the means used to select the brain region. (Arguments from Vul & Kanwisher, book chapter in press.)
Non-independence Error
Egregious example: identify Area X with a contrast of A > B, do post hoc stats showing that A is statistically higher than B, and act surprised. More subtle example of selection bias: do post hoc stats showing that A is statistically higher than C and C is statistically greater than B. (UNDER CONSTRUCTION: improve examples and make graphs to illustrate. Arguments from Vul & Kanwisher, book chapter in press.)
Hypothesis- vs. Data-Driven Approaches
Hypothesis-driven (examples: t-tests, correlations, the general linear model (GLM)):
- an a priori model of the activation is suggested, and the data are checked to see how closely they match the components of the model
- the most commonly used approach
Data-driven (example: Independent Component Analysis, ICA):
- no prior hypotheses are necessary; multivariate techniques determine the patterns in the data that account for the most variance across all voxels
- can be used to validate a model (see if the math comes up with the components you would've predicted)
- can be inspected to see if there are things happening in your data that you didn't predict
- can be used to identify confounds (e.g., head motion)
Example of ICA
The hardest part is sorting the many components into meaningful ones vs. artifacts; component "fingerprints" may help. A minimal ICA sketch follows below.
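For flavor, here is a minimal ICA sketch using scikit-learn's FastICA. The library choice, toy data, and component count are my assumptions, not the tool used on the slide:

```python
import numpy as np
from sklearn.decomposition import FastICA

# data: 2-D array of shape (time points, voxels); toy stand-in for real data
rng = np.random.default_rng(0)
data = rng.normal(size=(136, 3072))

ica = FastICA(n_components=10, random_state=0)
time_courses = ica.fit_transform(data)   # one time course per component
spatial_maps = ica.components_           # one voxel map per component

# The analyst then labels each component as signal or artifact,
# e.g., by its time course shape and spatial "fingerprint".
```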