Our entry in the Functional Imaging Analysis Contest
Jonathan Taylor, Stanford; Keith Worsley, McGill
What is functional Magnetic Resonance Imaging (fMRI) data?
- Time series of ~200 "frames": 3D images of brain "activity", taken every ~2.5 s (~8 min total)
- Meanwhile, the subject receives a stimulus or external task (e.g. on/off every 10 s)
- Several (~4) time series ("runs") per session
- Several (~2) sessions per subject
- Several (~15) subjects
- Statistics problem: find the regions of the brain activated by the stimulus or task
Why a Functional Imaging Analysis Contest (FIAC)?
- Competing packages produce slightly different results; which is "correct"?
- Simulated data? Or real data, with the analyses compared?
- "Contest" session at the 2005 Human Brain Mapping conference; 9 entrants
- Results in a special issue of Human Brain Mapping, May 2006
The main participants
- SPM (Statistical Parametric Mapping, 1993), University College London: the "SAS" of the field (MATLAB)
- AFNI (1995), NIH: more display and manipulation, not much stats (C)
- FSL (2000), Oxford: the "upstart" (C)
- FMRISTAT (2001), McGill: stats only (MATLAB)
- BRAINSTAT (2005), Stanford/McGill: Python version of FMRISTAT
Effect of stimulus on brain response
- Alternating hot and warm stimuli separated by rest (9 seconds each)
- The stimulus is delayed and dispersed by ~6 s
- Modeled by convolving the stimulus with the "hemodynamic response function" (HRF)
- Hemodynamic response function: difference of two gamma densities
- Responses = stimuli * HRF, sampled every 3 seconds
[Figure: hot/warm stimulus timing, the HRF, and the resulting responses vs time in seconds]
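As a concrete sketch of this model, the snippet below builds a difference-of-two-gammas HRF and convolves it with an on/off hot-stimulus indicator sampled every 3 s. The specific gamma peak/FWHM parameters are an assumption (the slide only says "difference of two gamma densities"), chosen to peak near the ~6 s delay described above.

```python
import numpy as np
from scipy.stats import gamma

def hrf(t, peak1=5.4, fwhm1=5.2, peak2=10.8, fwhm2=7.35, dip=0.35):
    """Difference of two gamma densities. The peak/fwhm values are
    assumed defaults, not taken from the slide."""
    def g(t, peak, fwhm):
        # choose gamma shape/scale so the mode is `peak` and the
        # width is roughly `fwhm` (Gaussian-style FWHM = sqrt(8 ln 2) sd)
        a = 8 * np.log(2) * peak**2 / fwhm**2
        b = fwhm**2 / (8 * np.log(2) * peak)
        return gamma.pdf(t, a + 1, scale=b)
    h = g(t, peak1, fwhm1) - dip * g(t, peak2, fwhm2)
    return h / h.max()

# 9 s hot / 9 s rest / 9 s warm / 9 s rest, sampled every 3 s as on the slide
tr = 3.0
t = np.arange(0, 360, tr)
hot = ((t % 36) < 9).astype(float)              # hot-stimulus indicator
kernel = hrf(np.arange(0, 30, tr))
response = np.convolve(hot, kernel, mode="full")[:len(t)] * tr
```

The response is the expected BOLD signal for the hot condition: the square-wave stimulus, delayed and dispersed by the HRF.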
fMRI data, pain experiment, one slice
T = (hot effect − warm effect) / sd ~ t with 110 df if no effect
How fMRI differs from other repeated measures data
- Many reps (~200 time points), few subjects (~15)
- Df within subjects is high, so it is not worth pooling the sd across subjects
- Df between subjects is low, so use spatial smoothing to boost the df
- Data sets are huge (~4 GB), so it is not easy to use R directly
FMRISTAT / BRAINSTAT statistical analysis strategy
- Analyse each voxel separately; borrow strength from neighbours when needed
- Break up the analysis into stages:
  1st level: analyse each time series separately
  2nd level: combine 1st level results over runs
  3rd level: combine 2nd level results over subjects
- Cut corners: do a reasonable analysis in a reasonable time (or else no one will use it!)
- MATLAB / Python
1st level: linear model with AR(p) errors
Data: Y_t = fMRI data at time t; x_t = (responses, 1, t, t², t³, …)' to allow for drift
Model: Y_t = x_t'β + ε_t, with ε_t = a_1 ε_{t−1} + … + a_p ε_{t−p} + σ_F η_t, η_t ~ N(0,1) i.i.d.
Fit in 2 stages:
- 1st pass: fit by least squares, find the residuals, estimate the AR parameters a_1 … a_p
- 2nd pass: whiten the data, re-fit by least squares
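The two-pass strategy can be sketched in a few lines of numpy for one voxel. This is a minimal AR(1) version under stated simplifications: no bias correction of the autocorrelation and no spatial smoothing of the AR parameters (both of which the talk does add later).

```python
import numpy as np

def fit_ar1_model(Y, X):
    """Two-pass fit of Y = X b + e, with e_t = a1 * e_{t-1} + sigma * eta_t.
    A sketch of the slide's two-stage strategy (AR(1) case only)."""
    # 1st pass: ordinary least squares, residuals, lag-1 autocorrelation
    b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ b_ols
    a1 = (r[1:] @ r[:-1]) / (r @ r)
    # 2nd pass: whiten both sides (first row scaled so variances match), re-fit
    s = np.sqrt(1 - a1**2)
    Yw = np.concatenate([[Y[0] * s], Y[1:] - a1 * Y[:-1]])
    Xw = np.vstack([X[0] * s, X[1:] - a1 * X[:-1]])
    b, *_ = np.linalg.lstsq(Xw, Yw, rcond=None)
    return b, a1
```

In practice this runs voxel by voxel over the whole volume, with X containing the convolved responses plus the drift terms 1, t, t², t³.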
Higher levels: mixed effects model
Data: E_i = effect (contrast in β) from the previous level; S_i = sd of the effect from the previous level; z_i = (1, treatment, group, gender, …)'
Model: E_i = z_i'γ + S_i ε_i^F + σ_R ε_i^R (S_i has high df, so it is assumed fixed)
ε_i^F ~ N(0,1) i.i.d. fixed effects error; ε_i^R ~ N(0,1) i.i.d. random effects error
Fit by ReML; use EM for stability, 10 iterations
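A one-voxel sketch of the EM fit: the random effect u_i ~ N(0, σ_R²) is treated as missing data, with GLS for γ in between. For brevity this is plain maximum likelihood; the slide's ReML refinement (a small degrees-of-freedom correction) is omitted.

```python
import numpy as np

def mixed_effects_em(E, S, Z, n_iter=10):
    """Fit E_i = z_i' gamma + S_i * epsF_i + sigmaR * epsR_i by EM.
    E: effects, S: their (fixed) sds, Z: design matrix (n x p).
    A sketch of 'fit by ReML, use EM for stability, 10 iterations';
    the ReML correction is omitted here."""
    sigma2 = np.var(E)          # starting value for random effects variance
    for _ in range(n_iter):
        w = 1.0 / (S**2 + sigma2)                      # GLS weights
        gamma_hat = np.linalg.solve((Z.T * w) @ Z, (Z.T * w) @ E)
        r = E - Z @ gamma_hat
        # E-step: posterior mean and variance of each random effect u_i
        shrink = sigma2 / (sigma2 + S**2)
        m, v = shrink * r, shrink * S**2
        # M-step: update the random effects variance
        sigma2 = np.mean(m**2 + v)
    return gamma_hat, sigma2
```

With only ~4 runs or ~15 subjects per voxel, σ_R² is very noisy, which is exactly why the next slides smooth the sd ratio spatially.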
Where we use spatial information
- 1st level: smooth the AR parameters to lower variability and increase "df"
- Higher levels: smooth the random/fixed effects sd ratio to lower variability and increase "df"
- Final level: use random field theory to correct for multiple comparisons
1st level: autocorrelation
AR(1) model: ε_t = a_1 ε_{t−1} + σ_F η_t
- Fit the linear model by least squares; residuals ε̂_t = Y_t − Ŷ_t, and â_1 = Correlation(ε̂_t, ε̂_{t−1})
- Estimating the errors changes their correlation structure slightly, so â_1 is slightly biased; after bias correction, â_1 ≈ 0 where there is no correlation
[Figure: raw autocorrelation, bias corrected â_1, and smoothed (12.4 mm) â_1]
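The voxelwise â_1 map and its spatial smoothing can be sketched as below. The 12.4 mm FWHM comes from this slide; the 3 mm voxel size is an illustrative assumption, and the small-sample bias correction of â_1 is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_lag1_acor(resid, fwhm_mm=12.4, voxel_mm=3.0):
    """Voxelwise lag-1 autocorrelation of the residuals (time axis last),
    then Gaussian spatial smoothing. voxel_mm is an assumed voxel size;
    the slide's bias correction of a1_hat is omitted."""
    num = (resid[..., 1:] * resid[..., :-1]).sum(axis=-1)
    den = (resid**2).sum(axis=-1)
    a1 = num / den
    # convert FWHM in mm to the Gaussian sd in voxels
    sigma_vox = fwhm_mm / (np.sqrt(8 * np.log(2)) * voxel_mm)
    return gaussian_filter(a1, sigma_vox)
```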
How much smoothing?
Variability in the estimated autocorrelation (acor) lowers the df, and the df depends on the contrast:
1/df_eff = 2 acor(contrast of data)² / df_acor + 1/df_residual
Smoothing the acor brings the df back up:
df_acor = df_residual (2 (FWHM_acor / FWHM_data)² + 1)^{3/2}
With residual df = 110, FWHM_data = 8.79 mm, and a target of 100 df:
- Hot stimulus: contrast of data acor = 0.79, FWHM_acor = 12.4 mm
- Hot − warm stimulus: contrast of data acor = 0.61, FWHM_acor = 10.3 mm
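Reading the df formulas on this slide as below (a reconstruction, since the slide text is garbled), one can invert them to find the smoothing FWHM needed to reach the target df:

```python
import numpy as np

def fwhm_acor_for_target(acor, df_target=100, df_resid=110, fwhm_data=8.79):
    """Invert (this reading of) the slide's formulas:
        1/df_eff = 2*acor^2/df_acor + 1/df_resid
        df_acor  = df_resid * (2*(FWHM_acor/FWHM_data)^2 + 1)^(3/2)
    to get the smoothing FWHM (mm) that hits df_target."""
    df_acor = 2 * acor**2 / (1 / df_target - 1 / df_resid)
    ratio = (df_acor / df_resid) ** (2 / 3)
    return fwhm_data * np.sqrt((ratio - 1) / 2)

# hot - warm contrast, acor = 0.61: about 10.4 mm here,
# close to the slide's 10.3 mm
print(round(fwhm_acor_for_target(0.61), 1))
```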
Higher order AR model?
- Trying AR(3) has little effect on the T statistics: AR(1) seems to be adequate
- Assuming no correlation at all biases T up ~12% → more false positives
[Figure: estimated a_1, a_2, a_3 maps; T statistics for AR(1) (df = 100), AR(2), and AR(3) (df = 98)]
2nd level: 4 runs, 3 df for random effects sd
[Figure: effect E_i, sd S_i, and T stat E_i/S_i for runs 1–4 and the 2nd level]
- … very noisy sd …
- … and T > 15.96 needed for P < 0.05 (corrected) …
- … so no response is detected …
Solution: spatial smoothing of the sd ratio
- Basic idea: increase the df by spatial smoothing (local pooling) of the sd
- Can't smooth the random effects sd directly: too much anatomical structure
- Instead, divide the random effects sd by the fixed effects sd, which removes the anatomical structure before smoothing:
  mixed effects sd = smooth(random effects sd / fixed effects sd) × fixed effects sd
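In code, the ratio-smoothing trick is a one-liner (a sketch; the 19 mm FWHM anticipates the later slide, and the 3 mm voxel size is an illustrative assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mixed_sd(random_sd, fixed_sd, fwhm_mm=19.0, voxel_mm=3.0):
    """Smooth the random/fixed sd *ratio* (flat, little anatomy),
    then multiply the fixed effects sd back in."""
    sigma = fwhm_mm / (np.sqrt(8 * np.log(2)) * voxel_mm)
    return gaussian_filter(random_sd / fixed_sd, sigma) * fixed_sd
```

Note that where the ratio is spatially constant, the smoothing changes nothing and the mixed sd equals the random effects sd; the gain comes where the ratio is noisy.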
[Figure: random effects sd (3 df) divided by the fixed effects sd (440 df) gives the sd ratio (~1.3 in the random effect); smoothing the ratio and multiplying back by the fixed effects sd gives the mixed effects sd (~100 df). Average S_i shown for comparison]
How much smoothing?
1/df_eff = 1/df_ratio + 1/df_fixed
df_ratio = df_random (2 (FWHM_ratio / FWHM_data)² + 1)^{3/2}
With df_random = 3, df_fixed = 4 × 110 = 440, FWHM_data = 8 mm:
- FWHM_ratio = 0 gives a random effects analysis, df_eff = 3
- FWHM_ratio = ∞ gives a fixed effects analysis, df_eff = 440
- Target = 100 df → FWHM_ratio = 19 mm
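Solving these two equations for FWHM_ratio reproduces the 19 mm figure:

```python
import numpy as np

def fwhm_ratio_for_target(df_random=3, df_fixed=440, fwhm_data=8.0,
                          df_target=100):
    """Invert the slide's equations:
        1/df_eff = 1/df_ratio + 1/df_fixed
        df_ratio = df_random * (2*(FWHM_ratio/FWHM_data)^2 + 1)^(3/2)
    to get the ratio-smoothing FWHM (mm) that hits df_target."""
    df_ratio = 1 / (1 / df_target - 1 / df_fixed)
    ratio = (df_ratio / df_random) ** (2 / 3)
    return fwhm_data * np.sqrt((ratio - 1) / 2)

print(round(fwhm_ratio_for_target(), 1))   # about 19 mm, matching the slide
```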
Final result: 19 mm smoothing, 100 df
[Figure: effect E_i, sd S_i, and T stat E_i/S_i for runs 1–4 and the 2nd level]
- … less noisy sd …
- … and T > 4.93 for P < 0.05 (corrected) …
- … and now we can detect a response!
Final level: multiple comparisons correction
- High FWHM: use Random Field Theory (RFT)
- Low FWHM: use Bonferroni
- In between: use Discrete Local Maxima (DLM)
[Figure: Gaussianized threshold vs FWHM of smoothing kernel (voxels) for Gaussian, T 20 df, and T 10 df fields: True, Bonferroni, Random Field Theory, Discrete Local Maxima (DLM)]
- High FWHM: use Random Field Theory
- Low FWHM: use Bonferroni
- In between: use Discrete Local Maxima (DLM)
- DLM can halve the P-value when FWHM ~3 voxels
[Figure: P-value vs FWHM of smoothing kernel (voxels) for Gaussian, T 20 df, and T 10 df fields: True, Bonferroni, Random Field Theory, Discrete Local Maxima]
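The two extremes can be compared numerically. The sketch below sets the Bonferroni threshold against the standard leading (3D) term of the Gaussian-field expected Euler characteristic, with resels = voxels/FWHM³ (a stationary-field assumption); DLM is omitted since its formula is more involved.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bonferroni_threshold(n_voxels, alpha=0.05):
    """Threshold so that n_voxels * P(Z > t) = alpha."""
    return norm.isf(alpha / n_voxels)

def rft_threshold(n_voxels, fwhm_vox, alpha=0.05):
    """Gaussian random field threshold from the leading (3D) term of
    the expected Euler characteristic; resels = voxels / FWHM^3."""
    resels = n_voxels / fwhm_vox**3
    def expected_ec(t):
        return (resels * (4 * np.log(2))**1.5 / (2 * np.pi)**2
                * (t**2 - 1) * np.exp(-t**2 / 2))
    return brentq(lambda t: expected_ec(t) - alpha, 1.5, 10)

# heavy smoothing: RFT gives the lower (better) threshold;
# light smoothing: Bonferroni wins, as on the slide
```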
FIAC paradigm
- 16 subjects; 4 runs per subject (2 runs event design, 2 runs block design)
- 4 conditions per run: same sentence, same speaker; same sentence, different speaker; different sentence, same speaker; different sentence, different speaker
- 3T, 191 frames, TR = 2.5 s
[Figure: modeled responses for the event and block designs, plus the beginning-of-block/run regressor]
Design matrix for the block experiment; B1, B2 are basis functions for magnitude and delay.
1st level analysis
- Motion and slice-time correction (using FSL)
- 5 conditions: beginning of block/run; same sentence, same speaker; same sentence, different speaker; different sentence, same speaker; different sentence, different speaker
- Smoothing of the temporal autocorrelation to control the effective df
- 3 contrasts: Sentence (different − same sentence), Speaker (different − same speaker), and their Interaction
Efficiency
- Sd of the contrasts (lower is better) for a single run, assuming additivity of responses
- For the magnitudes, event and block designs have similar efficiency
- For the delays, the event design is much better
2nd and 3rd level analysis
- Analyse events and blocks separately
- Register contrasts to Talairach space (using FSL); bad registration on 2 subjects, so they were dropped
- 2nd level: combine the 2 runs using fixed effects
- 3rd level: combine the remaining 14 subjects using random effects
- 3 contrasts × event/block × magnitude/delay = 12 analyses
- Threshold using the best of Bonferroni, random field theory, and discrete local maxima (new!)
Part of slice z = -2 mm
[Figure: magnitude and delay maps for the event and block designs]
Events vs blocks for delays in different − same sentence
- Events: 0.14 ± 0.04 s; blocks: 1.19 ± 0.23 s; both significant, P < 0.05 (corrected) (!?!)
- Answer: take a look at the blocks: different sentence (sustained interest) vs same sentence (lose interest)
- The best fitting block has greater magnitude and greater delay
[Figure: SPM vs BRAINSTAT results]
Magnitude increase for Sentence, Event Sentence, Block Sentence, Combined Speaker, Combined at (-54,-14,-2)
Magnitude decrease for Sentence, Block Sentence, Combined at (-54,-54,40)
Delay increase for Sentence, Event at (58,-18,2) inside the region where all conditions are activated
Conclusions
- Greater %BOLD response for different − same sentences (1.08 ± 0.16%) and different − same speakers (0.47 ± 0.08%)
- Greater latency for different − same sentences (0.148 ± 0.035 s)
The main effects of sentence repetition (in red) and of speaker repetition (in blue), slices z = −12, z = 2, z = 5.
1: Meriaux et al., Madic; 2: Goebel et al., Brain Voyager; 3: Beckman et al., FSL; 4: Dehaene-Lambertz et al., SPM2.
BRAINSTAT: combined block and event, thresholded at T > 5.67, P < 0.05.
Estimating the delay of the response
- Delay, or latency to the peak of the HRF, is approximated by a linear combination of two optimally chosen basis functions:
  HRF(t + shift) ≈ basis_1(t) w_1(shift) + basis_2(t) w_2(shift)
- Convolve the bases with the stimulus, then add them to the linear model
[Figure: the HRF shifted by the delay, and the two basis functions, vs t (seconds)]
- Fit the linear model, estimate w_1 and w_2
- Equate w_2 / w_1 to its estimate, then solve for the shift (Henson et al., 2002)
- To reduce bias when the magnitude is small, use shift / (1 + 1/T²), where T = w_1 / sd(w_1) is the T statistic for the magnitude
- This shrinks the shift to 0 where there is little evidence for a response
[Figure: w_1, w_2, and w_2 / w_1 vs shift (seconds)]
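A sketch of the shrinkage step. The map from the ratio w_2/w_1 back to seconds depends on the chosen basis functions, so it is passed in as a hypothetical `inv_slope` function rather than hard-coded:

```python
import numpy as np

def delay_estimate(w1, sd_w1, w2, inv_slope):
    """Solve w2/w1 for the shift via the (assumed) inverse map
    `inv_slope`, then shrink toward 0 when the magnitude T statistic
    is small, as on the slide: shift / (1 + 1/T^2)."""
    shift = inv_slope(w2 / w1)
    T = w1 / sd_w1
    return shift / (1 + 1 / T**2)
```

With a strong response (large T) the shrinkage factor is near 1; with T ≈ 1 the estimated shift is halved, suppressing noisy delay estimates outside activated regions.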
[Figures: subject-by-subject and mixed effects maps (effect Ef, sd, T, df) for each of the analyses below. Contour: average anatomy > 2000; random/fixed effects sd ratio smoothed (FWHM in mm shown); P = 0.05 threshold for peaks is ±…; axes x, y in mm]
- Event experiment: magnitude (%BOLD), stimulus average
- Block experiment: magnitude (%BOLD), stimulus average
- Event experiment: magnitude (%BOLD), different − same speaker
- Block experiment: magnitude (%BOLD), different − same speaker
- Event experiment: magnitude (%BOLD), interaction
- Block experiment: magnitude (%BOLD), interaction
STAT_SUMMARY example: single run, hot − warm
[Figure: regions detected by DLM but not by BON or RFT, and regions detected by BON and DLM but not by RFT]