Published by Marian Anderson. Modified over 9 years ago.
1
Massively Univariate Inference for fMRI
Thomas Nichols, Ph.D., Assistant Professor, Department of Biostatistics, University of Michigan
http://www.sph.umich.edu/~nichols
ISBI, April 15, 2004
2
Introduction & Overview
Inference in fMRI
– Where’s the blob!?
Overview
I. Statistics Background
II. Assessing Statistic Images
III. Thresholding Statistic Images
3
I. Statistics Background
Hypothesis Testing
Fixed vs. Random Effects
Nonparametric/resampling inference
4
Hypothesis Testing
Null Hypothesis $H_0$
Test statistic $T$
– $t$: observed realization of $T$
$\alpha$ level
– Acceptable false positive rate
– $P(T > u_\alpha \mid H_0) = \alpha$
P-value
– Assessment of $t$ assuming $H_0$
– $P(T > t \mid H_0)$: probability of obtaining a statistic as large or larger in a new experiment
– P(Data|Null), not P(Null|Data)
[Figures: null distribution of $T$ with threshold $u_\alpha$ and tail area $\alpha$; null distribution of $T$ with observed $t$ and tail area P-val]
5
Random Effects Models
GLM has only one source of randomness
– Residual error
But people are another source of error
– Everyone activates somewhat differently…
6
Fixed vs. Random Effects
Fixed Effects
– Intra-subject variation suggests all these subjects are different from zero
Random Effects
– Inter-subject variation suggests the population is not very different from zero
[Figure: sampling distributions of each subject’s effect (Subj. 1 to Subj. 6) and the population distribution of the effect, plotted relative to 0]
7
Random Effects for fMRI
Summary Statistic Approach
– Easy
  – Create contrast images for each subject
  – Analyze contrast images with one-sample t
– Limited
  – Only allows one scan per subject
  – Assumes balanced designs and homogeneous measurement error
Full Mixed Effects Analysis
– Hard
  – Requires iterative fitting
  – REML to estimate inter- and intra-subject variance
– SPM2 & FSL implement this, very differently
– Very flexible
8
Nonparametric Inference
Parametric methods
– Assume distribution of statistic under null hypothesis
– Needed to find P-values, $u_\alpha$
Nonparametric methods
– Use data to find distribution of statistic under null hypothesis
– Any statistic!
[Figures: parametric null distribution with 5% tail; nonparametric null distribution with 5% tail]
9
Nonparametric Inference: Permutation Test
Assumptions
– Null Hypothesis Exchangeability
Method
– Compute statistic $t$
– Resample data (without replacement), compute $t^*$
– $\{t^*\}$: permutation distribution of test statistic
– P-value = $\#\{t^* \ge t\} / \#\{t^*\}$ (the observed labeling counts as one of the relabelings)
Theory
– Given data and $H_0$, each $t^*$ has equal probability
– Still can assume data randomly drawn from population
10
Permutation Test Toy Example
Data from V1 voxel in visual stim. experiment
– A: Active, flashing checkerboard
– B: Baseline, fixation
– 6 blocks, ABABAB
– Just consider block averages...
Null hypothesis $H_0$
– No experimental effect, A & B labels arbitrary
Statistic
– Mean difference

Block:    A       B      A      B      A      B
Average:  103.00  90.48  99.93  87.83  99.76  96.06
11
Permutation Test Toy Example
Under $H_0$
– Consider all equivalent relabelings
– Compute all possible statistic values
– Find 95%ile of permutation distribution

AAABBB   4.82    ABABAB   9.45    BAAABB  -1.48    BABBAA  -6.86
AABABB  -3.25    ABABBA   6.97    BAABAB   1.10    BBAAAB   3.15
AABBAB  -0.67    ABBAAB   1.38    BAABBA  -1.38    BBAABA   0.67
AABBBA  -3.15    ABBABA  -1.10    BABAAB  -6.97    BBABAA   3.25
ABAABB   6.86    ABBBAA   1.48    BABABA  -9.45    BBBAAA  -4.82
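The enumeration in this toy example can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: the function names are hypothetical, and the statistics are computed from the rounded block averages shown on the slide, so individual values may differ slightly from the tabulated ones.

```python
from itertools import combinations

# Block averages from the slide, in acquisition order (labels ABABAB).
data = [103.00, 90.48, 99.93, 87.83, 99.76, 96.06]
observed_labels = "ABABAB"


def mean_difference(values, labels):
    """Mean of the A blocks minus mean of the B blocks."""
    a = [v for v, lab in zip(values, labels) if lab == "A"]
    b = [v for v, lab in zip(values, labels) if lab == "B"]
    return sum(a) / len(a) - sum(b) / len(b)


def all_relabelings(n_blocks, n_active):
    """Every way to label n_active of n_blocks blocks as A (the rest B)."""
    for active in combinations(range(n_blocks), n_active):
        yield "".join("A" if i in active else "B" for i in range(n_blocks))


t_obs = mean_difference(data, observed_labels)
perm_dist = {lab: mean_difference(data, lab) for lab in all_relabelings(6, 3)}

# Permutation P-value: share of relabelings at least as large as observed.
# The small tolerance guards against floating-point ties.
n_ge = sum(1 for t in perm_dist.values() if t >= t_obs - 1e-12)
p_value = n_ge / len(perm_dist)

print(f"observed t = {t_obs:.2f}")
print(f"{n_ge} of {len(perm_dist)} relabelings >= observed, P = {p_value:.2f}")
```

With six blocks and three labeled A there are 20 relabelings, so the smallest attainable P-value is 1/20.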
12
Permutation Test Toy Example
Under $H_0$
– Consider all equivalent relabelings
– Compute all possible statistic values
– Find 95%ile of permutation distribution
[Figure: histogram of the permutation distribution; horizontal axis from -8 to 8]
13
Permutation Test Strengths
Requires only assumption of exchangeability
– Under $H_0$, distribution unperturbed by permutation
– Allows us to build permutation distribution
Subjects are exchangeable
– Under $H_0$, each subject’s A/B labels can be flipped
fMRI scans are not exchangeable under $H_0$
– Due to temporal autocorrelation
14
Permutation Test Limitations
Computational Intensity
– Analysis repeated for each relabeling
– Not so bad on modern hardware
  – No analysis discussed below took more than 3 hours
Implementation Generality
– Each experimental design type needs unique code to generate permutations
  – Not so bad for population inference with t-tests
15
Nonparametric Inference: The Bootstrap
Theoretical differences
– Independence instead of exchangeability
– Asymptotically valid
  – For each design, finite sample size properties should be evaluated
Practical differences
– Resample residuals with replacement
– Residuals, not data, resampled
  (1) Estimate model parameter $b$ or statistic $t$
  (2) Create residuals
  (3) Resample residuals $e^*$
  (4) Add model fit to $e^*$ to constitute Bootstrap dataset $y^*$
  (5) Estimate $b^*$ or $t^*$ using $y^*$
  (6) Repeat (3)-(5) to build Bootstrap distribution of $b^*$ or $t^*$
Many important details
– See books: Efron & Tibshirani (1993), Davison & Hinkley (1997)
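Steps (1)-(6) can be sketched for the simplest possible model. The example below assumes a single-regressor linear model fitted by ordinary least squares and synthetic data; the function names, the model, and the data are illustrative assumptions, not part of the talk, and the sketch ignores the "many important details" (such as temporal autocorrelation) that the slide warns about.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_bootstrap(x, y, n_boot=1000, seed=0):
    """Bootstrap distribution of the slope by resampling residuals."""
    rng = random.Random(seed)
    intercept, slope = fit_line(x, y)                          # (1) estimate b
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]             # (2) residuals
    boot_slopes = []
    for _ in range(n_boot):                                    # (6) repeat
        e_star = rng.choices(resid, k=len(resid))              # (3) resample, with replacement
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]   # (4) fit + e*
        boot_slopes.append(fit_line(x, y_star)[1])             # (5) estimate b*
    return slope, boot_slopes


# Synthetic data (assumed): true slope 0.5, independent unit-variance noise.
gen = random.Random(42)
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + gen.gauss(0.0, 1.0) for xi in x]

slope, boot_slopes = residual_bootstrap(x, y)
ordered = sorted(boot_slopes)
ci_low, ci_high = ordered[24], ordered[974]  # rough 95% percentile interval
print(f"slope = {slope:.3f}, bootstrap interval = ({ci_low:.3f}, {ci_high:.3f})")
```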
16
II. Assessing Statistic Images
Where’s the signal?
– High Threshold (t > 5.5): good specificity, poor power (risk of false negatives)
– Med. Threshold (t > 3.5)
– Low Threshold (t > 0.5): poor specificity (risk of false positives), good power
...but why threshold?!
17
Blue-sky inference: What we’d like
Don’t threshold, model!
– Signal location?
  – Estimates and CI’s on (x,y,z) location
– Signal intensity?
  – CI’s on % change
– Spatial extent?
  – Estimates and CI’s on activation volume
  – Robust to choice of cluster definition
...but this requires an explicit spatial model
18
Blue-sky inference...a spatial model?
No routine spatial modeling methods exist
– High-dimensional mixture modeling problem
– Activations don’t look like Gaussian blobs
– Need realistic shapes, sparse representation
One initial attempt:
– Hartvig, N. and Jensen, J. (2000). HBM, 11(4):233-248.
19
Blue-sky inference: What we get
Signal location
– Local maximum (no inference)
– Center-of-mass (no inference)
  – Sensitive to blob-defining threshold
Signal intensity
– Local maximum intensity (P-value, CI’s)
Spatial extent
– Volume above blob-defining threshold (P-value, no CI’s)
  – Sensitive to blob-defining threshold
20
Voxel-wise Tests
Retain voxels above $\alpha$-level threshold $u_\alpha$
Gives best spatial specificity
– The null hyp. at a single voxel can be rejected
[Figure: statistic profile over space against threshold $u_\alpha$; one panel with significant voxels, one with no significant voxels]
21
Cluster-wise tests
Two-step process
– Define clusters by arbitrary threshold $u_{clus}$
– Retain clusters larger than $\alpha$-level threshold $k_\alpha$
[Figure: statistic profile over space against $u_{clus}$; a cluster smaller than $k_\alpha$ is not significant, a cluster larger than $k_\alpha$ is significant]
22
Inference on Images: Cluster-wise tests
Typically better sensitivity
Worse spatial specificity
– The null hyp. of the entire cluster is rejected
– Only means that one or more of the voxels in the cluster is active
[Figure: statistic profile over space against $u_{clus}$; a cluster smaller than $k_\alpha$ is not significant, a cluster larger than $k_\alpha$ is significant]
23
Voxel-Cluster Inference
Joint Inference
– Combine both voxel height $T$ & cluster size $C$
– Poline et al used bivariate probability, JCBFM (1994) 14:639-642.
  – $P\big(\max_{x \in \mathrm{Cluster}} T(x) > t,\; C > c\big)$
  – Only for Gaussian images, Gaussian ACF
– Bullmore et al use “cluster mass”, IEEE-TMI (1999) 18(1):32-42.
  – Test statistic $= \int_{x \in \mathrm{Cluster}} \big(T(x) - u_{clus}\big)\,dx$
  – No RFT result, uses permutation test
24
Voxel-Cluster Inference
Not a new “level” of inference
– cf. Friston et al (1996), Detecting Activations in PET and fMRI: Levels of Inference and Power. NI 4:223-235
Joint null is intersection of nulls
– Joint null = {Voxel null} $\cap$ {Cluster null}
– If joint null rejected, only know one or other is false
– Can’t point to any voxel as active
– Joint voxel-cluster inference is really just “intensity-informed cluster-wise inference”
25
Multiple Comparisons Problem
Which of 100,000 voxels are sig.?
– $\alpha = 0.05$ $\Rightarrow$ 5,000 false positive voxels
Which of (random number, say) 100 clusters are significant?
– $\alpha = 0.05$ $\Rightarrow$ 5 false positive clusters
[Figure: statistic image thresholded at t > 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
26
MCP Solutions: Measuring False Positives
Familywise Error Rate (FWER)
– Familywise Error: existence of one or more false positives
– FWER is probability of familywise error
False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
  – Realized false discovery rate: V/R
27
Measuring False Positives: FWER vs FDR
[Figure: simulated images, panels labeled Noise and Signal+Noise]
28
[Figure: simulated realizations thresholded under three kinds of error-rate control]
Control of Per Comparison Rate at 10%
– Percentage of null pixels that are false positives: 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5%
Control of Familywise Error Rate at 10%
– Occurrence of familywise error (realizations marked FWE)
Control of False Discovery Rate at 10%
– Percentage of activated pixels that are false positives: 6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% 8.7%
29
MCP Solutions: Measuring False Positives
Familywise Error Rate (FWER)
– Familywise Error: existence of one or more false positives
– FWER is probability of familywise error
False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
  – Realized false discovery rate: V/R
30
III. Thresholding Statistic Images
FWER MCP Solutions
Bonferroni
Maximum Distribution Methods
– Random Field Theory
– Permutation
31
FWER MCP Solutions: Bonferroni
For a statistic image $T$...
– $T_i$: $i$th voxel of statistic image $T$
...use $\alpha = \alpha_0 / V$
– $\alpha_0$: FWER level (e.g. 0.05)
– $V$: number of voxels
– $u_\alpha$: $\alpha$-level statistic threshold, $P(T_i \ge u_\alpha) = \alpha$
By Bonferroni inequality...
$$\mathrm{FWER} = P(\mathrm{FWE}) = P\Big(\bigcup_i \{T_i \ge u_\alpha\} \,\Big|\, H_0\Big) \le \sum_i P(T_i \ge u_\alpha \mid H_0) = \sum_i \alpha = \sum_i \alpha_0 / V = \alpha_0$$
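As a small numerical illustration of the Bonferroni rule, the sketch below computes the per-voxel level and the matching threshold. It assumes a unit-normal null distribution at each voxel (a simplifying assumption made here so that only the Python standard library is needed; the talk's real-data example uses a t statistic), and the function name is hypothetical.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha0, n_voxels):
    """Per-voxel level alpha = alpha0 / V and the z threshold u_alpha.

    Assumes each voxel's null distribution is standard normal.
    """
    alpha = alpha0 / n_voxels
    u_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return alpha, u_alpha


# 100,000 voxels at a familywise level of 0.05.
alpha, u_alpha = bonferroni_threshold(0.05, 100_000)
print(f"per-voxel alpha = {alpha:.1e}, threshold z = {u_alpha:.2f}")
```

For a t image, the same per-voxel level would be passed to the inverse CDF of the appropriate t distribution instead.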
32
FWER MCP Solutions
Bonferroni
Maximum Distribution Methods
– Random Field Theory
– Permutation
33
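FWER MCP Solutions: Controlling FWER w/ Max
FWER & distribution of maximum
$$\mathrm{FWER} = P(\mathrm{FWE}) = P\Big(\bigcup_i \{T_i \ge u\} \,\Big|\, H_0\Big) = P\big(\max_i T_i \ge u \mid H_0\big)$$
100(1-$\alpha$)%ile of max distribution controls FWER
$$\mathrm{FWER} = P\big(\max_i T_i \ge u_\alpha \mid H_0\big) = \alpha$$
– where $u_\alpha = F_{\max}^{-1}(1 - \alpha)$
[Figure: distribution of the maximum with threshold $u_\alpha$ and tail area $\alpha$]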
34
FWER MCP Solutions: Random Field Theory
Euler Characteristic $\chi_u$
– Topological measure: #blobs - #holes
– At high thresholds, just counts blobs
– $\mathrm{FWER} = P(\text{Max voxel} \ge u \mid H_0) = P(\text{One or more blobs} \mid H_0) \approx P(\chi_u \ge 1 \mid H_0) \approx E(\chi_u \mid H_0)$
  – First approximation: no holes
  – Second approximation: never more than 1 blob
[Figure: random field, threshold, and suprathreshold sets]
35
RFT Details: Expected Euler Characteristic
$$E(\chi_u) \approx \lambda(\Omega)\,|\Lambda|^{1/2}\,(u^2 - 1)\,\exp(-u^2/2) \,/\, (2\pi)^2$$
– $\Omega$: search region, $\Omega \subset \mathbb{R}^3$
– $\lambda(\Omega)$: volume
– $|\Lambda|^{1/2}$: roughness
Assumptions
– Multivariate Normal
– Stationary*
– ACF twice differentiable at 0
*Stationarity
– Results valid without stationarity
– More accurate when stationarity holds
Only very upper tail approximates $1 - F_{\max}(u)$
36
Random Field Theory: Smoothness Parameterization
$E(\chi_u)$ depends on $|\Lambda|^{1/2}$
– $\Lambda$: roughness matrix
Smoothness parameterized as Full Width at Half Maximum
– FWHM of Gaussian kernel needed to smooth a white noise random field to roughness $\Lambda$
[Figure: autocorrelation function with FWHM marked]
37
Random Field Theory: Smoothness Parameterization
RESELS: Resolution Elements
– 1 RESEL = $\mathrm{FWHM}_x \times \mathrm{FWHM}_y \times \mathrm{FWHM}_z$
– RESEL Count $R$
  – $R = \lambda(\Omega) \,/\, (\mathrm{FWHM}_x\,\mathrm{FWHM}_y\,\mathrm{FWHM}_z) = \lambda(\Omega)\,|\Lambda|^{1/2} \,/\, (4 \ln 2)^{3/2}$
  – Volume of search region in units of smoothness
  – E.g.: 10 voxels, 2.5 voxel FWHM, 4 RESELS
Wrong RESEL interpretation
– “Number of independent ‘things’ in the image”
  – See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.
38
Random Field Intuition
Corrected P-value for voxel value $t$
$$P_c = P(\max T > t) \approx E(\chi_t) \propto \lambda(\Omega)\,|\Lambda|^{1/2}\,t^2 \exp(-t^2/2)$$
Statistic value $t$ increases
– $P_c$ decreases (but only for large $t$)
Search volume increases
– $P_c$ increases (more severe MCP)
Roughness increases (Smoothness decreases)
– $P_c$ increases (more severe MCP)
39
RFT Details: Super General Formula
General form for expected Euler characteristic
– $\chi^2$, $F$, & $t$ fields
– Restricted search regions
– $D$ dimensions
$$E[\chi_u(\Omega)] = \sum_{d=0}^{D} R_d(\Omega)\,\rho_d(u)$$
$R_d(\Omega)$: $d$-dimensional Minkowski functional of $\Omega$
– Function of dimension, space $\Omega$ and smoothness:
  – $R_0(\Omega) = \chi(\Omega)$, Euler characteristic of $\Omega$
  – $R_1(\Omega)$ = resel diameter
  – $R_2(\Omega)$ = resel surface area
  – $R_3(\Omega)$ = resel volume
$\rho_d(u)$: $d$-dimensional EC density of $Z(x)$
– Function of dimension and threshold, specific for RF type
– E.g. Gaussian RF:
  – $\rho_0(u) = 1 - \Phi(u)$
  – $\rho_1(u) = (4 \ln 2)^{1/2} \exp(-u^2/2) \,/\, (2\pi)$
  – $\rho_2(u) = (4 \ln 2)\, u \exp(-u^2/2) \,/\, (2\pi)^{3/2}$
  – $\rho_3(u) = (4 \ln 2)^{3/2} (u^2 - 1) \exp(-u^2/2) \,/\, (2\pi)^2$
  – $\rho_4(u) = (4 \ln 2)^2 (u^3 - 3u) \exp(-u^2/2) \,/\, (2\pi)^{5/2}$
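The general formula can be evaluated directly for a Gaussian field. The sketch below uses the standard Gaussian EC densities for dimensions 0 to 3 and finds the threshold at which the expected Euler characteristic equals a target level by bisection. The function names and the resel counts in the usage example are hypothetical, and the sketch does not cover t fields such as the one in the talk's real-data example.

```python
import math
from statistics import NormalDist


def gaussian_ec_densities(u):
    """EC densities rho_0..rho_3 of a unit-variance Gaussian field at threshold u."""
    c = 4.0 * math.log(2.0)
    g = math.exp(-u * u / 2.0)
    return [
        1.0 - NormalDist().cdf(u),
        c ** 0.5 * g / (2.0 * math.pi),
        c * u * g / (2.0 * math.pi) ** 1.5,
        c ** 1.5 * (u * u - 1.0) * g / (2.0 * math.pi) ** 2,
    ]


def expected_ec(u, resels):
    """Sum over d of R_d * rho_d(u); resels = [R_0, R_1, R_2, R_3]."""
    return sum(r * rho for r, rho in zip(resels, gaussian_ec_densities(u)))


def rft_threshold(resels, alpha=0.05, lo=2.0, hi=10.0):
    """Bisection for the u with expected EC equal to alpha.

    Relies on the expected EC decreasing in u on [lo, hi], which holds for
    non-negative resel counts once u exceeds sqrt(3).
    """
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected_ec(mid, resels) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical search region: Euler characteristic 1 and 500 resels of volume.
resels = [1.0, 0.0, 0.0, 500.0]
u_rft = rft_threshold(resels)
print(f"u = {u_rft:.2f}, expected EC = {expected_ec(u_rft, resels):.4f}")
```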
40
Random Field Theory: Cluster Size Tests
Expected Cluster Size
– $E(S) = E(N) / E(L)$
– $S$: cluster size
– $N$: suprathreshold volume, $\lambda(\{T > u_{clus}\})$
– $L$: number of clusters
$E(N) = \lambda(\Omega)\, P(T > u_{clus})$
$E(L) \approx E(\chi_u)$
– Assuming no holes
[Figure: random fields at 5mm, 10mm and 15mm FWHM]
41
Random Field Theory: Cluster Size Distribution
Gaussian Random Fields (Nosko, 1969)
– $D$: dimension of RF
t Random Fields (Cao, 1999)
– $B$: Beta distribution
– $U$’s: $\chi^2$’s
– $c$ chosen s.t. $E(S) = E(N) / E(L)$
42
Random Field Theory: Cluster Size Corrected P-Values
Previous results give uncorrected P-value
Corrected P-value
– Bonferroni
  – Correct for expected number of clusters
  – Corrected $P_c = E(L)\, P_{uncorr}$
– Poisson Clumping Heuristic (Adler, 1980)
  – Corrected $P_c = 1 - \exp(-E(L)\, P_{uncorr})$
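The two corrections are easy to compare numerically. The following is a minimal sketch with hypothetical function names and made-up inputs; capping the Bonferroni value at 1 is an addition made here so the result remains a valid probability.

```python
import math


def corrected_p_bonferroni(expected_clusters, p_uncorr):
    """Bonferroni: E(L) * P_uncorr, capped at 1 (the cap is an assumption added here)."""
    return min(1.0, expected_clusters * p_uncorr)


def corrected_p_poisson(expected_clusters, p_uncorr):
    """Poisson clumping heuristic: 1 - exp(-E(L) * P_uncorr)."""
    return -math.expm1(-expected_clusters * p_uncorr)


# Made-up inputs: 5 expected clusters, uncorrected cluster P-value of 0.001.
p_bonf = corrected_p_bonferroni(5.0, 0.001)
p_pois = corrected_p_poisson(5.0, 0.001)
print(f"Bonferroni: {p_bonf:.6f}, Poisson clumping: {p_pois:.6f}")
```

Since $1 - e^{-x} \le x$, the Poisson value never exceeds the Bonferroni value, and the two nearly coincide when $E(L)\,P_{uncorr}$ is small.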
43
Random Field Theory Limitations
Sufficient smoothness
– FWHM smoothness 3-4 times voxel size
– More like ~10 times for low-df data
Smoothness estimation
– Estimate is biased when images not sufficiently smooth
Multivariate normality
– Virtually impossible to check
Several layers of approximations
[Figure: lattice image data approximated by a continuous random field]
44
Real Data
fMRI Study of Working Memory
– 12 subjects, block design
– Marshuetz et al (2000)
– Item Recognition
  – Active: View five letters, 2s pause, view probe letter, respond
  – Baseline: View XXXXX, 2s pause, view Y or N, respond
Second Level RFX
– Difference image, A-B, constructed for each subject
– One-sample t test
[Figure: example trials. Active: UBKDA ... D ... yes. Baseline: XXXXX ... N ... no]
45
Real Data: RFT Result
Threshold
– S = 110,776
– 2 × 2 × 2 voxels, 5.1 × 5.8 × 6.9 mm FWHM
– u = 9.870
Result
– 5 voxels above the threshold
– 0.0063 minimum FWE-corrected p-value
[Figure: map of $-\log_{10}$ p-value]
46
FWER-controlling MCP Solutions
Bonferroni
Maximum Distribution Methods
– Random Field Theory
– Permutation
47
Controlling FWER: Permutation Test
Parametric methods
– Assume distribution of max statistic under null hypothesis
Nonparametric methods
– Use data to find distribution of max statistic under null hypothesis
– Any max statistic!
[Figures: parametric null max distribution with 5% tail; nonparametric null max distribution with 5% tail]
48
Permutation Test: Other Statistics
Nonparametric allows arbitrary statistics
– Standard ones are usually best, but...
Consider smoothed variance t statistic
– To regularize low-df variance estimate
49
Permutation Test: Smoothed Variance t
Nonparametric allows arbitrary statistics
– Standard ones are usually best, but...
Consider smoothed variance t statistic
[Figure: t-statistic image formed from the mean difference image and the variance image]
50
Permutation Test: Smoothed Variance t
Nonparametric allows arbitrary statistics
– Standard ones are usually best, but...
Consider smoothed variance t statistic
[Figure: smoothed variance t-statistic image formed from the mean difference image and the smoothed variance image]
51
Real Data: Permutation Result
Permute!
– $2^{12} = 4{,}096$ ways to flip 12 A/B labels
– For each, note maximum of t image
[Figures: permutation distribution of maximum t; orthogonal slice overlay of thresholded t]
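The label-flipping scheme can be sketched as a sign-flipping permutation test on per-subject difference images, recording the maximum t over the image for each of the $2^{12}$ flips. This is an illustrative sketch on small synthetic data (12 subjects, 8 voxels, an effect planted at voxel 0), not the analysis behind the slides; all names and data are assumptions.

```python
import math
import random
from itertools import product


def one_sample_t(values):
    """One-sample t statistic for H0: mean = 0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)


def t_image(diffs, signs):
    """t statistic at every voxel after applying one sign per subject."""
    n_subj, n_vox = len(diffs), len(diffs[0])
    return [
        one_sample_t([signs[s] * diffs[s][v] for s in range(n_subj)])
        for v in range(n_vox)
    ]


def max_t_permutation(diffs, alpha=0.05):
    """Max-statistic permutation test; diffs[s][v] is subject s's A-B difference at voxel v."""
    n_subj = len(diffs)
    t_obs = t_image(diffs, (1,) * n_subj)
    # Maximum t over the image for each of the 2**n_subj sign flips.
    max_dist = sorted(
        (max(t_image(diffs, signs)) for signs in product((1, -1), repeat=n_subj)),
        reverse=True,
    )
    # Critical value: the (c+1)-th largest maximum, with c = floor(alpha * N).
    c = math.floor(alpha * len(max_dist))
    u_perm = max_dist[c]
    # FWER-corrected P-value per voxel: share of maxima >= observed t.
    p_corr = [sum(m >= t - 1e-12 for m in max_dist) / len(max_dist) for t in t_obs]
    return t_obs, u_perm, p_corr, max_dist


# Synthetic difference images: unit-variance noise, effect of 4 at voxel 0.
rng = random.Random(0)
diffs = [[rng.gauss(0.0, 1.0) + (4.0 if v == 0 else 0.0) for v in range(8)]
         for _ in range(12)]

t_obs, u_perm, p_corr, max_dist = max_t_permutation(diffs)
significant = [v for v, t in enumerate(t_obs) if t > u_perm]
print(f"{len(max_dist)} relabelings, u_perm = {u_perm:.2f}, significant voxels: {significant}")
```

Because the observed labeling is one of the relabelings, no corrected P-value can fall below 1/4,096.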
52
$t_{11}$ statistic, RF & Bonf. threshold: $u_{RF} = 9.87$, $u_{Bonf} = 9.80$, 5 sig. vox.
$t_{11}$ statistic, nonparametric threshold: $u_{Perm} = 7.67$, 58 sig. vox.
Smoothed variance t statistic, nonparametric threshold: 378 sig. vox.
[Figure: test level vs. $t_{11}$ threshold]
53
Does this Generalize? RFT vs Bonf. vs Perm.
54
RFT vs Bonf. vs Perm.
55
MCP Solutions: Measuring False Positives
Familywise Error Rate (FWER)
– Familywise Error: existence of one or more false positives
– FWER is probability of familywise error
False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
  – Realized false discovery rate: V/R
56
Controlling FDR: Benjamini & Hochberg
Select desired limit $q$ on FDR
Order p-values, $p_{(1)} \le p_{(2)} \le \dots \le p_{(V)}$
Let $r$ be largest $i$ such that $p_{(i)} \le (i/V) \times q$
Reject all hypotheses corresponding to $p_{(1)}, \dots, p_{(r)}$
[Figure: ordered p-values $p_{(i)}$ (0 to 1) plotted against $i/V$ (0 to 1), with the line $(i/V) \times q$]
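The step-up rule can be written out directly. The sketch below is a plain-Python illustration with a hypothetical function name and made-up p-values, chosen so that one p-value misses its own line yet is still rejected because a larger p-value falls under the line.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (r, rejected): number of rejections and a per-test flag.

    r is the largest rank i with p_(i) <= (i / V) * q; the r smallest
    p-values are rejected, whether or not each one is under its own line.
    """
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    r = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n_tests * q:
            r = rank
    rejected = [False] * n_tests
    for idx in order[:r]:
        rejected[idx] = True
    return r, rejected


# Made-up p-values; the line (i/V) * q is 0.0125, 0.025, 0.0375, 0.05.
p_values = [0.2, 0.001, 0.035, 0.03]
r, rejected = benjamini_hochberg(p_values, q=0.05)
print(r, rejected)
```

Here 0.03 exceeds its own cutoff of 0.025, but 0.035 is below 0.0375, so the three smallest p-values are all rejected.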
57
Benjamini & Hochberg: Key Properties
FDR is controlled
– $E(\mathrm{obsFDR}) \le q\, m_0 / V$
– Conservative, if large fraction of nulls false
Adaptive
– Threshold depends on amount of signal
  – More signal, more small p-values, more $p_{(i)}$ less than $(i/V) \times q / c(V)$
58
Adaptiveness of Benjamini & Hochberg FDR
– P-value threshold when no signal: $\alpha / V$
– P-value threshold when all signal: $\alpha$
[Figure: ordered p-values $p_{(i)}$ against fractional index $i/V$]
59
Benjamini & Hochberg Procedure
$c(V) = 1$
– Positive Regression Dependency on Subsets
  – $P(X_1 \ge c_1, X_2 \ge c_2, \dots, X_k \ge c_k \mid X_i = x_i)$ is non-decreasing in $x_i$
  – Only required of test statistics for which null true
  – Special cases include
    – Independence
    – Multivariate Normal with all positive correlations
    – Same, but studentized with common std. err.
$c(V) = \sum_{i=1,\dots,V} 1/i \approx \log(V) + 0.5772$
– Arbitrary covariance structure
Benjamini & Yekutieli (2001). Ann. Stat. 29:1165-1188
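The size of the penalty for arbitrary covariance is easy to check numerically. The sketch below compares the exact harmonic sum with the logarithmic approximation, using the voxel count quoted for the real-data example (110,776); the function name is hypothetical.

```python
import math


def c_arbitrary(n_tests):
    """Exact harmonic sum c(V) = sum_{i=1..V} 1/i."""
    return sum(1.0 / i for i in range(1, n_tests + 1))


n_voxels = 110_776
c_exact = c_arbitrary(n_voxels)
c_approx = math.log(n_voxels) + 0.5772
print(f"c(V) exact = {c_exact:.4f}, log(V) + 0.5772 = {c_approx:.4f}")
```

Dividing $q$ by a $c(V)$ of about 12 makes the procedure roughly twelve times stricter at this image size, which is why the arbitrary-covariance threshold is so much higher than the independence one.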
60
Benjamini & Hochberg: Dependence Intuition
Illustrating BH under dependence
– Extreme example of positive dependence
[Figure: ordered p-values $p_{(i)}$ (0 to 1) against $i/V$ (0 to 1) with the line $(i/V) \times q / c(V)$, for an 8 voxel image and a 32 voxel image (interpolated from the 8 voxel image)]
61
Other FDR Methods
John Storey, JRSS-B (2002) 64:479-498
– pFDR, “Positive FDR”
  – FDR conditional on one or more rejections
  – Critical threshold is fixed, not estimated
  – pFDR and Bayes: pFDR(t) = P( H=0 | T > t )
– Asymptotically valid under “clumpy” dependence
James Troendle, JSPI (2000) 84:139-158
– Normal theory FDR
  – More powerful than BH FDR
  – Requires numerical integration to obtain thresholds
– Exactly valid if whole correlation matrix known
62
Real Data: FDR Example
Threshold
– Indep/PosDep u = 3.83
– Arb Cov u = 13.15
Result
– 3,073 voxels above Indep/PosDep u
– <0.0001 minimum FDR-corrected p-value
[Figures: FWER Perm. Thresh. = 7.67, 58 voxels; FDR Threshold = 3.83, 3,073 voxels]
63
Conclusions
Must account for multiplicity
– Otherwise have a fishing expedition
FWER
– Very specific, not very sensitive
FDR
– Less specific, more sensitive
– Sociological calibration still underway
Intuition is not everything
– Lots of details not covered here
– See references, and the references’ references
64
References
Most of this talk is covered in these papers:
– TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5):419-446, 2003.
– TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001.
– CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.