Introduction to Permutation for Neuroimaging HST583 2017 Douglas N. Greve Martinos Center
Outline Two examples Implementation Why permutation works Limitations
fMRI Analysis Overview [Figure: pipeline diagram. For each of Subjects 1-4: Raw Data, Preprocessing (MC, STC, B0), Smoothing, Normalization, First Level GLM Analysis (X, C). First-level results feed a Higher Level GLM (X, C), followed by Correction for Multiple Comparisons: Cluster p<.05?]
Two examples of parametric statistics and their permutation counterparts
Parametric: GLM and t-Test Convert t to a p-value using Student’s t-Test
Parametric GLM: Two Group GLM Analysis
  Model: y = X*b, with b = [bG1 bG2]
  Does Group 1 differ from Group 2?
  C = [1 -1], Contrast = C*b = bG1 - bG2
  Compute T from t-test
  t-test assumes: Gaussian (skew=0), independent, homoscedastic
  If not, then p-values are not accurate
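As a minimal sketch of the two-group GLM on this slide, the snippet below builds a design matrix with one indicator column per group, estimates b by least squares, and forms the t statistic for C = [1 -1]. The function name `two_group_glm_t` and the sample values are made up for illustration; converting t to a p-value (Student's t with the returned degrees of freedom) is left out.

```python
import numpy as np

def two_group_glm_t(y1, y2):
    """Fit y = X*b with one indicator column per group; test C = [1, -1]."""
    y = np.concatenate([y1, y2]).astype(float)
    n1, n2 = len(y1), len(y2)
    # Design matrix: column 0 marks Group 1, column 1 marks Group 2.
    X = np.zeros((n1 + n2, 2))
    X[:n1, 0] = 1.0
    X[n1:, 1] = 1.0
    C = np.array([1.0, -1.0])

    b, *_ = np.linalg.lstsq(X, y, rcond=None)    # b = [bG1, bG2], the group means
    resid = y - X @ b
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                  # pooled residual variance
    var_contrast = sigma2 * (C @ np.linalg.inv(X.T @ X) @ C)
    t = (C @ b) / np.sqrt(var_contrast)           # contrast divided by its standard error
    return b, t, dof

# Hypothetical data, for illustration only.
y1 = np.array([5.1, 4.8, 6.0, 5.5, 5.9])
y2 = np.array([4.2, 4.9, 3.8, 4.4, 4.6])
b, t, dof = two_group_glm_t(y1, y2)
print("b =", b, " t =", t, " dof =", dof)
```

With indicator columns the GLM reduces to the classical pooled-variance two-sample t-test, which is why the same Gaussian, independence, and homoscedasticity assumptions apply.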
Two Group GLM using Permutation
  1. Shuffle y (permutation)
  2. Run analysis
  3. Compute simulation test statistic Ts
  4. Go back to step 1
  Repeat a large number (~10k) of times to get 10k values of Ts
  Analyze your true data
  Compute observed test statistic To
  p value = how often Ts >= To
  Justification: under the NULL, labelings in design matrix are irrelevant
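The loop on this slide can be sketched in a few lines of standard-library Python. This is an illustration under simplifying assumptions, not the implementation of any particular package: the test statistic is the absolute difference in group means rather than a t value, the function name `perm_test_two_group` and the data are hypothetical, and the p-value is the fraction of shuffles whose statistic is at least as large as the observed one.

```python
import random
from statistics import mean

def perm_test_two_group(g1, g2, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    y = list(g1) + list(g2)
    n1 = len(g1)
    t_obs = abs(mean(g1) - mean(g2))               # observed statistic To
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y)                             # shuffle y; group labels stay fixed
        t_sim = abs(mean(y[:n1]) - mean(y[n1:]))   # simulated statistic Ts
        if t_sim >= t_obs:
            count += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (count + 1) / (n_perm + 1)

# Hypothetical data, for illustration only.
g1 = [5.1, 4.8, 6.0, 5.5, 5.9]
g2 = [4.2, 4.9, 3.8, 4.4, 4.6]
p = perm_test_two_group(g1, g2, n_perm=2000)
print(f"p = {p:.4f}")
```

No distributional form is assumed anywhere in the loop; the only requirement is that shuffling leaves the joint distribution unchanged under the NULL.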
Cluster Correction for Multiple Comparisons Sig Map pVox < .001 Cluster Map
Cluster Table
  Cluster  Size (mm^3)
  1        51368
  2        41184
  3        3784
  4        1768
[Figure: cluster map with R/L orientation labels]
Parametric: Gaussian Random Field Theory
  pcluster = f(αVox, N, FWHM, ClusterSize)
    αVox – Voxel-wise, cluster-forming threshold
    N – Search space
    FWHM – Smoothing level
    ClusterSize – Size of cluster to be tested
    pcluster – Cluster p-value
  Assumptions:
    Voxel-wise threshold sufficiently stringent (eg, .001, not .01)
    Smoothing level sufficiently high (eg, 2 voxels)
    Spatial smoothness is Gaussian (not to be confused with Gaussian noise assumptions)
Cluster Table
  Cluster  Size (mm^3)  p-value
  1        51368        .00015
  2        41184        .00250
  3        3784         .02600
  4        1768         .04900
[Figure: cluster map with R/L orientation labels]
Eklund Significance Statement
  “Here, we used resting-state fMRI data from 499 healthy controls to conduct 3 million task group analyses. Using this null data with different experimental designs, we estimate the incidence of significant results. In theory, we should find 5% false positives (for a significance threshold of 5%), but instead we found that the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.”
  Eklund, Nichols, Knutsson, 2016. Cluster Failure: Why fMRI inferences for spatial extent have inflated false-positive rates. PNAS.
  Problem extends to at least thickness and VBM analysis
Cluster Analysis using Permutation
  1. Shuffle y (permutation)
  2. Run analysis at every voxel
  3. Threshold, find max cluster size Ts
  4. Go back to step 1
  Repeat a large number (~10k) of times to get 10k values of Ts
  Analyze your true data
  Compute observed cluster size To
  Cluster p value = how often Ts >= To
  Permutation fixed problems found by Eklund, et al, 2016 as well as in thickness.
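The max-cluster-size procedure can be illustrated on a toy one-dimensional "image". Everything here is an assumption made for the sketch: voxels lie on a line (so a cluster is a run of adjacent suprathreshold voxels), the cluster-forming threshold is a fixed t value of 2.5, only positive t values are considered, the data are simulated without spatial smoothing, and the function names are hypothetical.

```python
import random
from statistics import mean, variance

def t_map(group1, group2):
    """Pooled-variance two-sample t at each voxel (inputs: subjects x voxels)."""
    n1, n2 = len(group1), len(group2)
    ts = []
    for v in range(len(group1[0])):
        a = [s[v] for s in group1]
        b = [s[v] for s in group2]
        sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
        ts.append((mean(a) - mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5)
    return ts

def max_cluster_size(ts, thresh):
    """Length of the longest run of adjacent voxels with t > thresh."""
    best = run = 0
    for t in ts:
        run = run + 1 if t > thresh else 0
        best = max(best, run)
    return best

def cluster_perm_p(group1, group2, thresh=2.5, n_perm=1000, seed=0):
    """Observed max cluster size To and its permutation p-value."""
    rng = random.Random(seed)
    subjects = list(group1) + list(group2)
    n1 = len(group1)
    obs = max_cluster_size(t_map(group1, group2), thresh)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(subjects)     # shuffle whole subjects; voxels stay together
        sim = max_cluster_size(t_map(subjects[:n1], subjects[n1:]), thresh)
        if sim >= obs:            # simulated max cluster size Ts vs observed To
            count += 1
    return obs, (count + 1) / (n_perm + 1)

# Simulated data: 10 subjects per group, 40 voxels, effect in voxels 10-19.
rng = random.Random(1)
n_vox = 40
g1 = [[rng.gauss(0, 1) + (1.5 if 10 <= v < 20 else 0.0) for v in range(n_vox)]
      for _ in range(10)]
g2 = [[rng.gauss(0, 1) for _ in range(n_vox)] for _ in range(10)]
obs, p = cluster_perm_p(g1, g2, n_perm=200)
print("max cluster size:", obs, " cluster p:", p)
```

Taking the maximum over the whole map in each permutation is what provides the correction for multiple comparisons: every observed cluster is compared against the largest cluster expected anywhere under the NULL.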
How permutation is implemented
First, what is the “best” way to compute a p-value?
  True Experiment under Non-Null Conditions: observed statistic So
  1000 Experiments under Null Conditions: statistics Si
  Distribution of Si is the NULL distribution
  p value = #(Si >= So)/1000
  No/few assumptions, but very expensive and time consuming
  vs Parametric: compute Null distribution from data; done.
Permutation
  True Experiment under Non-Null Conditions: observed statistic So
  1000 Permutations: statistics Si
  Distribution of Si is the NULL distribution
  p value = #(Si >= So)/1000
Permutation: Shuffling and Sign Flipping Can also do both Which method you use depends on the nature of the data One-sample group mean has to be done with sign flipping
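For the one-sample group mean case, a sign-flipping test can be sketched as below. This is an illustrative sketch only: the statistic is the absolute sample mean, each input is flipped independently with probability 0.5, and the function name `sign_flip_test` and the data are hypothetical.

```python
import random
from statistics import mean

def sign_flip_test(y, n_perm=10000, seed=0):
    """Two-sided sign-flipping p-value for H0: the mean of y is zero."""
    rng = random.Random(seed)
    t_obs = abs(mean(y))                        # observed statistic
    count = 0
    for _ in range(n_perm):
        # Flip the sign of each input independently with probability 0.5.
        flipped = [v if rng.random() < 0.5 else -v for v in y]
        if abs(mean(flipped)) >= t_obs:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical data, for illustration only.
y = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0]
p = sign_flip_test(y, n_perm=5000)
print(f"p = {p:.4f}")
```

Shuffling cannot be used here because reordering the inputs leaves the sample mean unchanged, so every shuffle would return the same statistic.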
Implementation usually done by changing X: swapping inputs 2 and 3 of y is equivalent to swapping rows 2 and 3 of X
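A small numerical check of this equivalence, using a hypothetical four-observation design (two per group) and made-up data: fitting the shuffled y against the original X gives the same estimates as fitting the original y against X with the same rows swapped.

```python
import numpy as np

# Hypothetical design: observations 1-2 in Group 1, observations 3-4 in Group 2.
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
y = np.array([1.0, 2.0, 4.0, 8.0])

perm = np.array([0, 2, 1, 3])    # swap observations 2 and 3 (1-based numbering)

b_shuffled_y, *_ = np.linalg.lstsq(X, y[perm], rcond=None)   # shuffle the inputs
b_shuffled_X, *_ = np.linalg.lstsq(X[perm], y, rcond=None)   # shuffle rows of X

print(b_shuffled_y, b_shuffled_X)
```

A single swap is its own inverse, so the same index array works on either side. For a general permutation, shuffling y by `perm` corresponds to reordering the rows of X by the inverse permutation (`np.argsort(perm)`).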
Number of Permutations
  N observations: N! possible shufflings, 2^N possible sign flips
  May be a huge number of possibilities
  N=100: ~10^157 shufflings
  Use a random subset of 1,000-10,000
  Maybe smaller if using restricted exchanges
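These counts are easy to verify with exact integer arithmetic. The helper names below are hypothetical.

```python
import math

def n_shufflings(n):
    """Number of distinct orderings of n observations: n!"""
    return math.factorial(n)

def n_sign_flips(n):
    """Number of distinct sign patterns for n observations: 2^n"""
    return 2 ** n

n = 100
digits_shuffle = len(str(n_shufflings(n)))   # number of decimal digits in 100!
digits_flip = len(str(n_sign_flips(n)))      # number of decimal digits in 2^100
print(f"{n}! has {digits_shuffle} digits; 2^{n} has {digits_flip} digits")
```

A number with 158 digits is on the order of 10^157, which is why a random subset of permutations is drawn instead of enumerating them all.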
Why permutation works
Why Parametric Fails
Covariance Structure: “Joint distribution”, an Nobservations x Nobservations matrix
Covariance Structure: Nobservations x Nobservations; “iid” – independent, identically distributed
Reminder…
  True Experiment under Non-Null Conditions; 1000 Permutations
  Must have distribution of Si be that of the NULL. When does that happen?
Limitations on permutation
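What happens to Σn (the covariance matrix) under permutation
  Shuffle inputs 1 and 2: rows and columns 1 and 2 are swapped
  Flip sign of input 2: off-diagonals in row and column 2 change sign. Diagonal not affected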
“Exchangeability” – Joint distribution cannot change
  Shuffle: all subjects have same var; cov between subjects same. Exchangeable Errors (EE); Compound Symmetry
  Sign Flip: subjects can have diff var; cov must = 0. Independent, Symmetric Errors (ISE)
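These two conditions can be checked numerically. If P is a permutation matrix, the covariance of the shuffled inputs is P Σ Pᵀ; if S is a diagonal matrix of ±1, the covariance of the sign-flipped inputs is S Σ S. The sketch below uses made-up covariance matrices and hypothetical helper names to show which structures survive which operation.

```python
import numpy as np

def permute_cov(sigma, perm):
    """Covariance of the inputs after reordering them by `perm`."""
    P = np.eye(len(perm))[perm]      # row i of P selects input perm[i]
    return P @ sigma @ P.T

def sign_flip_cov(sigma, signs):
    """Covariance of the inputs after multiplying them by +/-1 signs."""
    S = np.diag(signs)
    return S @ sigma @ S

# Compound symmetry: equal variances, equal covariances.
cs = 2.0 * (0.7 * np.eye(4) + 0.3 * np.ones((4, 4)))
# Independent inputs with different variances.
het = np.diag([1.0, 4.0, 9.0])
# Two correlated inputs.
corr = np.array([[1.0, 0.5], [0.5, 1.0]])

print("shuffle keeps compound symmetry:  ", np.allclose(permute_cov(cs, [2, 0, 3, 1]), cs))
print("shuffle keeps unequal variances:  ", np.allclose(permute_cov(het, [1, 0, 2]), het))
print("sign flip keeps unequal variances:", np.allclose(sign_flip_cov(het, [1, -1, 1]), het))
print("sign flip keeps correlated inputs:", np.allclose(sign_flip_cov(corr, [1, -1]), corr))
```

The checks cover only second moments; sign flipping additionally requires each error distribution to be symmetric.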
Exchangeability and Off-Diagonals Caused by systematic variation in the residual, eg, if one input is above the mean, then another also tends to be above the mean In time series, correlation across time (temporal correlation) In cross-subject analysis, can be caused by continuous nuisance effects like age present under the NULL. Longitudinal Analysis, siblings, … Affects both shuffling (off-diagonals not equal) and sign flipping (off-diagonals not zero)
Restricted Exchangeability Covariance matrix may have block-like structure Eg, repeated (two) measures Two subjects, A and B Matrix not exchangeable!
Exchangeability and Skew Skew (3rd moment) indicates an asymmetric distribution Rayleigh, Poisson, Exponential, Gamma Eg, Income distribution Invalidates sign flipping, one sample group mean (see Eklund 2016) Statistical tests for skew Not a problem when computing a difference (eg, Group1 vs Group2)
Restricted Exchangeability
  Covariance matrix may have block-like structure
  Eg, repeated (two) measures; two subjects, A and B
  Matrix not exchangeable!
  Block Exchange: Shuffle A and B blocks without changing order inside A or B
  Can become quite elaborate
  Reduces total number of possible permutations
  Winkler, et al, 2014 & 2015, Neuroimage.
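A sketch of whole-block exchange for the two-subject, two-measure example, compared with unrestricted shuffling. The labels `A1`, `A2`, `B1`, `B2` and the function name are hypothetical, and only the simplest scheme (moving blocks as units) is shown.

```python
from itertools import permutations

def whole_block_permutations(blocks):
    """All orderings that move blocks as units, keeping the order inside each block."""
    out = []
    for order in permutations(range(len(blocks))):
        out.append([obs for b in order for obs in blocks[b]])
    return out

# Two subjects (A, B), two repeated measures each.
blocks = [["A1", "A2"], ["B1", "B2"]]

restricted = whole_block_permutations(blocks)
unrestricted = list(permutations([obs for block in blocks for obs in block]))

print("restricted:  ", restricted)
print("unrestricted:", len(unrestricted), "orderings")
```

With only two blocks there are 2 allowed orderings instead of 24, which illustrates how restricted exchange reduces the number of possible permutations.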
Permutation with Nuisance Variables
  Design matrix partitioning: X – effects of interest, Z – nuisance effects (eg, age)
  Many methods, all approximate
    ter Braak – compute residuals from full model and permute residuals
    Freedman-Lane – recommended by Winkler 2014
  Note: if age is a variable of interest, then you can just permute
  Winkler, et al, 2014, NI, Permutation Inference for the General Linear Model.
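A minimal sketch of the Freedman-Lane idea, assuming a single regressor of interest and a t statistic: fit the nuisance-only model y = Z*g, permute its residuals, add the nuisance fit back, and refit the full model [X Z] on each permuted data set. The function names, the simulated data, and the choice of statistic are assumptions made for illustration, not taken from the slides or from any package.

```python
import numpy as np

def fit_t(M, y, col):
    """t statistic for column `col` of design M in the model y = M*b."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    resid = y - M @ b
    dof = len(y) - M.shape[1]
    cov_b = (resid @ resid / dof) * np.linalg.inv(M.T @ M)
    return b[col] / np.sqrt(cov_b[col, col])

def freedman_lane(y, X, Z, n_perm=2000, seed=0):
    """X: regressor of interest (length n); Z: nuisance design (n x k, with intercept)."""
    rng = np.random.default_rng(seed)
    M = np.column_stack([X, Z])                    # full model; column 0 is X
    t_obs = fit_t(M, y, 0)
    g, *_ = np.linalg.lstsq(Z, y, rcond=None)      # nuisance-only (reduced) fit
    fitted_Z = Z @ g
    resid_Z = y - fitted_Z
    count = 0
    for _ in range(n_perm):
        # Permute the reduced-model residuals, then add the nuisance fit back.
        y_star = fitted_Z + rng.permutation(resid_Z)
        if abs(fit_t(M, y_star, 0)) >= abs(t_obs):
            count += 1
    return t_obs, (count + 1) / (n_perm + 1)

# Simulated data: a group effect of interest plus an age nuisance effect.
rng = np.random.default_rng(1)
n = 40
group = np.repeat([0.0, 1.0], n // 2)
age = rng.uniform(20, 60, size=n)
y = 2.0 * group + 0.05 * age + rng.normal(size=n)
Z = np.column_stack([np.ones(n), age])

t_obs, p = freedman_lane(y, group, Z, n_perm=1000)
print(f"t = {t_obs:.2f}, p = {p:.4f}")
```

Permuting residuals rather than y itself keeps the age effect attached to the correct subjects, so only the part of the data unexplained by the nuisance variables is exchanged.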
Computational Considerations
  Need several thousand iterations
  People used to getting parametric results instantly
  May take a few hours
  May take a few days
  Parallelize (CPUs, GPUs)
  Early termination
  Tail approximations
  Winkler, et al, 2016, NI. Faster permutation in brain imaging
Parametric vs Permutation
  Parametric
    Assumptions
      t-Test – Gaussian, independent, homoscedastic
      GRFT Cluster – Gaussian smoothness, smoothness level, cluster-forming threshold
    Parametric is Fast and Easy
  Permutation
    Makes many fewer assumptions
    Flexible – almost any quantity can be used, don’t need a closed-form solution of distribution
    Can be computationally intense
    Exchangeability is complicated
    May be a little less powerful than parametric when parametric assumptions are met
Summary Cluster Failure is driving interest in permutation Shuffling and sign flipping (or both) Exchangeability - joint distribution cannot change Restricted exchange Nuisance variables Skew