Sparse inference and large-scale multiple comparisons


Sparse inference and large-scale multiple comparisons
Maxima of discretely sampled random fields, with an application to 'bubbles'
Keith Worsley, McGill; Nicholas Chamandy, McGill and Google; Jonathan Taylor, Stanford and Université de Montréal; Frédéric Gosselin, Université de Montréal; Philippe Schyns and Fraser Smith, Glasgow

What is ‘bubbles’?

Nature (2005)

The subject is shown one of 40 faces chosen at random, with one of four expressions: Happy, Sad, Fearful, Neutral.

… but the face is only revealed through random 'bubbles'. First trial: a "Sad" expression. 75 random bubble centres are smoothed by a Gaussian 'bubble' to form a mask, and what the subject sees is the face through this mask. The subject is asked the expression and responds "Neutral": incorrect.

Your turn … Trial 2 Subject response: “Fearful” CORRECT

Your turn … Trial 3 Subject response: “Happy” INCORRECT (Fearful)

Your turn … Trial 4 Subject response: “Happy” CORRECT

Your turn … Trial 5 Subject response: “Fearful” CORRECT

Your turn … Trial 6 Subject response: “Sad” CORRECT

Your turn … Trial 7 Subject response: “Happy” CORRECT

Your turn … Trial 8 Subject response: “Neutral” CORRECT

Your turn … Trial 9 Subject response: “Happy” CORRECT

Your turn … Trial 3000 Subject response: “Happy” INCORRECT (Fearful)

Bubbles analysis
E.g. Fearful (3000/4 = 750 trials): sum the bubble masks over trials 1 + 2 + 3 + … + 750, and separately over the correct trials. Proportion of correct bubbles = (sum of bubbles over correct trials) / (sum over all trials). Threshold at proportion of correct trials = 0.68, scale to [0, 1], and use this as a bubble mask.

Results: the bubble mask applied to the average face, for each expression (Happy, Sad, Fearful, Neutral). But are these features real or just noise? We need statistics …

Statistical analysis
Correlate bubbles with response (correct = 1, incorrect = 0), separately for each expression. This is equivalent to a 2-sample Z ~ N(0,1) statistic for correct vs. incorrect bubbles (e.g. Fearful: 750 trials with responses 0, 1, 1, 0, 1, 1, 1, …, 1), and is very similar to the proportion of correct bubbles.
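Concretely, the pixelwise two-sample Z map can be sketched as below; a minimal Python illustration with simulated data (the function name and array layout are mine, not from the deck):

```python
import numpy as np

def bubble_zmap(bubbles, correct):
    """Two-sample Z map: compare the mean bubble value at each pixel
    between correct and incorrect trials.

    bubbles : (n_trials, H, W) array of smoothed bubble masks
    correct : (n_trials,) array of 0/1 responses
    """
    correct = np.asarray(correct, dtype=bool)
    a, b = bubbles[correct], bubbles[~correct]   # correct / incorrect trials
    na, nb = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    # pooled variance across the two groups, pixel by pixel
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    se = np.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return diff / se
```

Pixels whose bubble values genuinely predict a correct response get large Z; pixels unrelated to the response hover around N(0,1).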

Results: Z ~ N(0,1) statistic over the average face, thresholded at Z = 1.64 (P = 0.05, uncorrected), for each expression (Happy, Sad, Fearful, Neutral). Sparse inference and large-scale multiple comparisons: do we need a correction?

Three methods so far The set-up: S is a subset of a D-dimensional lattice (e.g. pixels); Z(s) ~ N(0,1) at most points s in S; Z(s) ~ N(μ(s),1), μ(s)>0 at a sparse set of points; Z(s1), Z(s2) are spatially correlated. To control the false positive rate to ≤α we want a good approximation to α = P(maxS Z(s) ≥ t): Bonferroni (1936) Random field theory (1970’s) Discrete local maxima (2005, 2007)
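Bonferroni is the simplest of the three: threshold each of the N lattice points at level α/N. A quick sketch using only the Python standard library (the pixel count 240 × 380 = 91200 is taken from the modeling slide later in this deck):

```python
from statistics import NormalDist

def bonferroni_threshold(n_tests, alpha=0.05):
    """Threshold t with P(Z >= t) = alpha / n_tests for one N(0,1) test,
    so that P(max over n_tests points >= t) <= alpha, regardless of the
    spatial correlation between the tests."""
    return NormalDist().inv_cdf(1 - alpha / n_tests)
```

For the full 240 × 380 bubble image this reproduces the Z ≈ 4.87 Bonferroni threshold quoted later in the deck; the cost is that it ignores spatial correlation entirely, which is why it is conservative for smooth fields.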

[Figure: the threshold t solving P(maxS Z(s) ≥ t) = 0.05, and the resulting P value, plotted against the FWHM (Full Width at Half Maximum) of the smoothing filter, for Bonferroni, random field theory, and discrete local maxima.]

Random field theory: the Euler Characteristic (EC) of the excursion set {s: Z(s) ≥ t} is EC = #blobs − #holes. [Figure: observed and expected EC of the excursion set for the neutral face as a function of the threshold t; the EC passes through values such as −7, −11, 13, 14, 9, 1, 0 as t increases.] Heuristic: at high thresholds t the holes disappear, EC ~ 1 or 0, and E(EC) ~ P(max Z ≥ t). There is an exact expression for E(EC) at all thresholds, and the approximation E(EC) ~ P(max Z ≥ t) is extremely accurate.
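The EC bookkeeping (#blobs − #holes) can be computed directly for a 2-D binary excursion set. A sketch using scipy's connected-component labelling (the function name is mine; I use the standard complementary 4/8-connectivity pairing from digital topology, an assumption the deck does not spell out):

```python
import numpy as np
from scipy import ndimage

def euler_characteristic(mask):
    """EC = (#blobs) - (#holes) for a 2-D binary excursion set.
    Blobs use 4-connectivity; the background uses 8-connectivity."""
    mask = np.asarray(mask, dtype=bool)
    _, n_blobs = ndimage.label(mask)  # 4-connected foreground components
    bg, n_bg = ndimage.label(~mask, structure=np.ones((3, 3), int))
    # Background components touching the image border are "outside";
    # the remaining background components are holes.
    border = np.concatenate([bg[0], bg[-1], bg[:, 0], bg[:, -1]])
    outside = {int(lab) for lab in border if lab != 0}
    return n_blobs - (n_bg - len(outside))
```

A filled disk gives EC = 1, an annulus (one blob, one hole) gives EC = 0, and two disjoint blobs give EC = 2, matching the blobs-minus-holes definition above.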

Random field theory: The details
Z(s) ~ N(0,1) is an isotropic Gaussian random field, s ∈ R^2, smoothed with a Gaussian filter of full width at half maximum FWHM, with Λ = Var(∂Z/∂s). Then
P( max_{s in S} Z(s) ≥ t ) ≈ E( EC{ s in S : Z(s) ≥ t } ) = Σ_{d=0}^{2} μ_d(S) ρ_d(t),
where the μ_d(S) are the intrinsic volumes or Minkowski functionals of S (μ_0 = EC(S), μ_1 = half the perimeter, μ_2 = area) and the ρ_d(t) are the EC densities of the field, which depend on Λ (equivalently, on the FWHM).

Random field theory: The brain mapping version
Z(s) is white noise smoothed with a Gaussian filter of full width at half maximum FWHM. Then
P( max_{s in S} Z(s) ≥ t ) ≈ E( EC{ s in S : Z(s) ≥ t } ) = Σ_{d=0}^{2} Resels_d(S) × EC_d(t),
where the Resels_d(S) (resolution elements) are the intrinsic volumes of S in units of FWHM, e.g. Resels_2(S) = area(S) / FWHM^2, and the EC_d(t) are unit EC densities:
EC_0(t) = P(Z ≥ t) = ∫_t^∞ e^{−z²/2} / √(2π) dz,
EC_1(t) = (4 log 2)^{1/2} e^{−t²/2} / (2π),
EC_2(t) = (4 log 2) t e^{−t²/2} / (2π)^{3/2}.

Discrete local maxima
Bonferroni applied to the events {Z(s) ≥ t and Z(s) is a discrete local maximum}, i.e. {Z(s) ≥ t and Z(s_{±j}) ≤ Z(s) for the two lattice neighbours s_{±j} along each axis j}. This is still conservative. If Z(s) is stationary, with Cor(Z(s1), Z(s2)) = ρ(s1 − s2), all we need is P{Z(s) ≥ t and all neighbouring Z's ≤ Z(s)}, a (2D+1)-variate integral.

Discrete local maxima: the "Markovian" trick
If ρ is separable, i.e. for s = (x, y), ρ((x, y)) = ρ((x, 0)) × ρ((0, y)), e.g. the Gaussian spatial correlation function ρ((x, y)) = exp(−½(x² + y²)/w²), then Z(s) has a "Markovian" property: conditional on the central Z(s), the Z's on different axes are independent, Z(s_{±1}) ⊥ Z(s_{±2}) | Z(s). So condition on Z(s) = z, find P{neighbouring Z's ≤ z | Z(s) = z} = Π_j P{Z(s_{±j}) ≤ z | Z(s) = z}, then take expectations over Z(s) = z. This cuts the (2D+1)-variate integral down to a bivariate integral.
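The trick above can be sketched numerically: conditional on Z(s) = z, one axis's neighbour pair is bivariate normal, and by separability the axes multiply. A minimal sketch, assuming a stationary separable Gaussian correlation so that the lag-2 correlation is ρ(1)^4 (the function name and this parameterisation are mine, not the deck's):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

def dlm_pvalue(t, rho1, n_points, D=2):
    """Discrete-local-maxima (DLM) upper bound on P(max_S Z(s) >= t):
    Bonferroni over the events {Z(s) >= t and s is a discrete local max},
    for a stationary separable Gaussian correlation with lag-1 value rho1."""
    rho2 = rho1 ** 4  # lag-2 correlation under Gaussian correlation
    # Conditional on Z(s) = z, the neighbour pair on one axis is bivariate
    # normal: mean rho1*z each, variance 1 - rho1^2, covariance rho2 - rho1^2.
    cvar = 1.0 - rho1 ** 2
    ccov = rho2 - rho1 ** 2
    cov = [[cvar, ccov], [ccov, cvar]]

    def integrand(z):
        # P{both neighbours on one axis <= z | Z(s) = z}, raised to the
        # D-th power (axes are conditionally independent by separability)
        q = multivariate_normal(mean=[rho1 * z] * 2, cov=cov).cdf([z, z])
        return norm.pdf(z) * q ** D

    p_point, _ = quad(integrand, t, np.inf)
    return min(1.0, n_points * p_point)
```

Because {Z(s) ≥ t, s is a DLM} is a subset of {Z(s) ≥ t}, this bound is never above plain Bonferroni, and it tightens as the correlation grows, which is the behaviour shown in the P-value-vs-FWHM figure.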

The result only involves ρ_j = Cor(Z(s), Z(s_{±j})), the nearest-neighbour correlation along each axis j = 1, …, D. For Gaussian data the bound becomes
P( max_{s in S} Z(s) ≥ t ) ≤ Σ_{s in S} ∫_t^∞ φ(z) Π_{j=1}^{D} P{ Z(s_{−j}) ≤ z, Z(s_{+j}) ≤ z | Z(s) = z } dz,
where φ is the standard Gaussian density and each factor in the product is a bivariate Gaussian probability.

Results, corrected for search Random field theory threshold: Z=3.92 (P=0.05) DLM threshold: Z=3.92 (P=0.05) – same Bonferroni threshold: Z=4.87 (P=0.05) – nothing Z~N(0,1) statistic Average face Happy Sad Fearful Neutral

Results, corrected for search: FDR thresholds (Q = 0.05) for the four expressions are Z = 4.87, 3.46, 3.31, 4.87. Z ~ N(0,1) statistic over the average face: Happy, Sad, Fearful, Neutral.

Comparison
Bonferroni (1936): conservative; accurate if spatial correlation is low; simple.
Discrete local maxima (2005, 2007): accurate for all ranges of spatial correlation; a bit messy; only easy for stationary, separable Gaussian data on rectilinear lattices; even if not separable, always seems to be conservative.
Random field theory (1970's): an approximation based on assuming S is continuous; accurate if spatial correlation is high; elegant; easily extended to non-Gaussian, non-isotropic random fields.

Random field theory: Non-Gaussian, non-isotropic
Z(s) = (Z_1(s), …, Z_n(s)), where the Z_i(s) are i.i.d. Gaussian random fields, and the statistic T(s) is a function of them (e.g. a T or F statistic field). Replacing the intrinsic volumes by the Lipschitz-Killing curvatures L_d(S) of S, measured in the metric of the (possibly spatially varying) local correlation Λ(s) = Var(∂Z/∂s), the same type of formula holds:
P( max_{s in S} T(s) ≥ t ) ≈ E( EC{ s in S : T(s) ≥ t } ) = Σ_{d=0}^{D} L_d(S) ρ_d(t),
with ρ_d(t) the EC densities of the statistic T.

Referee report Why bother? Why not just do simulations?

[Figure: fMRI data, 120 scans, 3 scans each of hot, rest, warm, rest, hot, rest, … Shown are the first scan of the fMRI data and the T statistic for the hot − warm effect, T = (hot − warm effect) / sd ~ t110 if there is no effect. One voxel's time course shows a highly significant effect (T = 6.59); another shows no significant effect (T = −0.74); both show slow drift over time.]

Bubbles task in the fMRI scanner
Correlate bubbles with BOLD at every voxel: calculate Z for each pair (bubble pixel, fMRI voxel) over the 3000 trials, giving a 5D "image" of Z statistics.

Discussion: thresholding
- Thresholding in advance is vital, since we cannot store all ~1 billion 5D Z values.
- Resels5 = (image Resels2 = 146.2) × (fMRI Resels3 = 1057.2); for P = 0.05 the threshold is t = 6.22 (approx).
- The threshold based on Gaussian RFT can be improved using new non-Gaussian RFT based on saddle-point approximations (Chamandy et al., 2006): model the bubbles as a smoothed Poisson point process. The improved thresholds are slightly lower, so more activation is detected.
- Only keep 5D local maxima: Z(pixel, voxel) > Z(pixel, 6 neighbours of voxel) and > Z(4 neighbours of pixel, voxel).

Discussion: modeling
- The random response is Y = 1 (correct) or 0 (incorrect), or Y = fMRI.
- The regressors are Xj = bubble mask at pixel j, j = 1, …, 240 × 380 = 91200 (!).
- Logistic regression or ordinary regression: logit(E(Y)) or E(Y) = b0 + X1 b1 + … + X91200 b91200. But there are only n = 3000 observations (trials) …
- Instead, since the regressors are independent, fit them one at a time: logit(E(Y)) or E(Y) = b0 + Xj bj.
- However, the regressors (bubbles) are random with a simple known distribution, so turn the problem around and condition on Y: E(Xj) = c0 + Y cj.
- This is equivalent to conditional logistic regression (Cox, 1962), which gives exact inference for b1 conditional on sufficient statistics for b0. Cox also suggested using saddle-point approximations to improve the accuracy of inference …
- Interactions? logit(E(Y)) or E(Y) = b0 + X1 b1 + … + X91200 b91200 + X1 X2 b1,2 + …

[Figure (repeated): the threshold t solving P(maxS Z(s) ≥ t) = 0.05, and the resulting P value, against the FWHM of the smoothing filter, for Bonferroni, random field theory, and discrete local maxima.]

Bayesian Model Selection (thanks to Ed George)
- The Z-statistic at voxel i is Zi ~ N(mi, 1), i = 1, …, n.
- Most of the mi's are zero (unactivated voxels) and a few are non-zero (activated voxels), but we do not know which voxels are activated, or by how much (mi).
- This is a model selection problem, where we add an extra model parameter (mi) for the mean of each activated voxel.
- Simple Bayesian set-up: each voxel is independently active with probability p, and the activation is itself drawn independently from a Gaussian distribution, mi ~ N(0, c).
- The hyperparameter p controls the expected proportion of activated voxels, and c controls their expected activation.

Surprise! This prior setup is related to the canonical penalized sum-of-squares criterion
AF = Σ_{activated voxels} Zi² − F q,
where
- q is the number of activated voxels, and
- F is a fixed penalty for adding an activated voxel.
Popular model selection criteria simply entail maximizing AF for a particular choice of F, which is equivalent to thresholding the image at √F. Some choices of F:
- F = 0 : all voxels activated
- F = 2 : Mallow's Cp and AIC
- F = log n : BIC
- F = 2 log n : RIC
- P(Z > √F) = 0.05/n : Bonferroni (almost the same as RIC!)
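The implied thresholds √F for the penalties listed above are easy to tabulate; a small sketch (the function name is mine, and only the standard library is used):

```python
import math
from statistics import NormalDist

def model_selection_thresholds(n, alpha=0.05):
    """Thresholds sqrt(F) implied by common penalties F in the criterion
    A_F = sum of Z_i^2 over activated voxels - F*q; maximizing A_F over
    subsets is the same as keeping voxels with |Z_i| > sqrt(F)."""
    return {
        "AIC": math.sqrt(2),                                 # F = 2
        "BIC": math.sqrt(math.log(n)),                       # F = log n
        "RIC": math.sqrt(2 * math.log(n)),                   # F = 2 log n
        "Bonferroni": NormalDist().inv_cdf(1 - alpha / n),   # P(Z > sqrt(F)) = alpha/n
    }
```

For n ≈ 30786 voxels, as in the fMRI example later in the deck, this gives BIC ≈ 3.21, RIC ≈ 4.55, and Bonferroni ≈ 4.66: Bonferroni and RIC are indeed almost the same.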

The Bayesian relationship with AF is obtained by re-expressing the posterior of the activated voxels, given the data:
P(activated voxels | Z's) ∝ exp( c/(2(1+c)) × AF ),
where F = (1+c)/c × { 2 log[(1−p)/p] + log(1+c) }.
Since p and c control the expected number and size of the activations, the dependence of F on p and c provides an implicit connection between the penalty F and the sorts of models for which its value may be appropriate.

The awful truth: p and c are unknown. Empirical Bayes idea: use the p and c that maximize the marginal likelihood, which simplifies to
L(p, c | Z's) ∝ Πi [ (1−p) exp(−Zi²/2) + p (1+c)^{−1/2} exp(−Zi²/(2(1+c))) ].
This is identical to fitting a classic mixture model with
- a probability of (1−p) that Zi ~ N(0, 1),
- a probability of p that Zi ~ N(0, 1+c),
- and √F the value of Z where the two weighted components are equal.
Using these estimated values of p and c gives us an adaptive penalty F, or equivalently a threshold √F, that is implicitly based on the SPM. All we have to do is fit the mixture model … but does it work?
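Fitting that mixture is a small EM iteration, since only p and the activated-component variance 1+c are free. A minimal sketch on simulated Z's (the function name, initial values, and iteration count are mine; F uses the formula from the posterior slide above):

```python
import numpy as np

def fit_spm_mixture(z, iters=200):
    """EM fit of the empirical Bayes mixture
        Z ~ (1-p) N(0,1) + p N(0, 1+c).
    Returns p, c and the adaptive threshold sqrt(F), where
        F = (1+c)/c * (2*log((1-p)/p) + log(1+c))
    is the squared Z at which the two weighted components are equal
    (None when F < 0, i.e. the components never cross)."""
    p, v = 0.5, 4.0                      # v = 1 + c, rough starting values
    for _ in range(iters):
        f0 = (1 - p) * np.exp(-z**2 / 2)
        f1 = p * np.exp(-z**2 / (2 * v)) / np.sqrt(v)
        r = f1 / (f0 + f1)               # responsibility of "activated" component
        p = r.mean()
        v = (r * z**2).sum() / r.sum()   # weighted variance update
    c = v - 1
    F = (1 + c) / c * (2 * np.log((1 - p) / p) + np.log(1 + c))
    return p, c, (np.sqrt(F) if F > 0 else None)
```

On well-separated simulated data this recovers p and c and produces a sensible threshold; the one-run fMRI example below shows how it can also fail, with F < 0 and no crossing point.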

Same data as before: hot − warm stimulus, four runs:
- proportion of activated voxels p = 0.57
- variance of activated voxels c = 5.8 (sd = 2.4)
- penalty F = 1.59 (a bit like AIC)
- threshold √F = 1.26 (?) seems a bit low …
For comparison: AIC √F = 2; FDR (0.05) √F = 2.67; BIC √F = 3.21; RIC √F = 4.55; Bonferroni (0.05) √F = 4.66.
[Figure: histogram of the SPM (n = 30786) with the fitted mixture: 57% activated voxels, 43% unactivated voxels ~ N(0,1); √F is the threshold where the two components are equal.]

Same data as before: hot − warm stimulus, one run:
- proportion of activated voxels p = 0.80
- variance of activated voxels c = 1.55
- penalty F = −3.02 (?)
- all voxels activated!!! What is going on?
For comparison: AIC √F = 2; FDR (0.05) √F = 2.67; BIC √F = 3.21; RIC √F = 4.55; Bonferroni (0.05) √F = 4.66.
[Figure: histogram of the SPM (n = 30768) with the fitted mixture: 80% activated voxels, 20% unactivated voxels ~ N(0,1); the two weighted components are never equal, so there is no threshold.]

MS lesions and cortical thickness
Idea: MS lesions interrupt neuronal signals, causing thinning in downstream cortex.
Data: n = 425 mild MS patients.
[Figure: average cortical thickness (mm) against total lesion volume (cc); correlation = −0.568, T = −14.20 (423 df).]
Charil et al., NeuroImage (2007)

MS lesions and cortical thickness at all pairs of points
The analysis is dominated by total lesion volume and average cortical thickness, so remove these effects:
- cortical thickness CT, smoothed 20 mm; subtract the average cortical thickness
- lesion density LD, smoothed 10 mm
- find the partial correlation (lesion density, cortical thickness) removing total lesion volume, via the linear model CT − av(CT) ~ 1 + TLV + LD, and test for LD
- repeat at all voxels in 3D and all nodes in 2D: ~1 billion correlations, so thresholding is essential!
Look for high negative correlations …

Thresholding? The cross-correlation random field: the correlation between two fields at two different locations, searched over all pairs of locations, one in R (D dimensions) and one in S (E dimensions), with sample size n. MS lesion data: P = 0.05, c = 0.300, T = 6.46. Cao & Worsley, Annals of Applied Probability (1999).

Cluster extent rather than peak height (Friston, 1994)
- Choose a lower threshold, e.g. t = 3.11 (P = 0.001).
- Find clusters, i.e. connected components of the excursion set.
- Measure each cluster by its spatial extent.
- Distribution of extent: fit a quadratic to the peak; Cao, Advances in Applied Probability (1999) shows the cluster extent is approximately distributed as a power of a χ² random variable.
- Distribution of maximum cluster extent: Bonferroni on N = #clusters ~ E(EC).

How do you choose the threshold t for defining clusters?
- If the signal is focal, i.e. ~FWHM of the noise: choose a high threshold, i.e. peak height is better.
- If the signal is broad, i.e. >> FWHM of the noise: choose a low threshold, i.e. cluster extent is better.
Conclusion: cluster extent is better for detecting broad signals.
Alternative: smooth the data with a filter that matches the signal (Matched Filter Theorem) … try a range of filter widths … scale space search … correct using random field theory … a lot of work. Cluster extent is easier!

Thresholding? The cross-correlation random field, continued: the P-value of the maximum correlation, searched over all pairs of locations, one in R (D dimensions) and one in S (E dimensions), has an EC expansion of the form
P( max_{R,S} Correlation > c ) ≈ Σ_{d=0}^{D} Σ_{e=0}^{E} L_d(R) L_e(S) ρ_{d,e}(c),
where the ρ_{d,e}(c) are the EC densities of the cross-correlation field. MS lesion data: P = 0.05, c = 0.300, T = 6.46. Cao & Worsley, Annals of Applied Probability (1999).

[Figure: 'conditional' histogram of correlations against distance (mm), scaled to the same maximum at each distance, shown separately for pairs of points in the same hemisphere and in different hemispheres, with the ± thresholds marked.]

The details …

[Figures: sketches for the tube formula behind the EC approach: the tube Tube(S, r) of radius r about S; Tube_Λ(S, r), which is big or small according to Λ; points s1, s2, s3 of S with unit vectors U(s1), U(s2), U(s3); and the tube Tube(R, r) about a rejection region R in the space of (z1, z2) ~ N2(0, I).]

Summary

Comparison
Z = (average correct bubbles − average incorrect bubbles) / pooled sd.
Proportion of correct bubbles = average correct bubbles / (average all bubbles × 4).
Both depend on the average correct bubbles; the rest is roughly constant.

Random field theory results
For searching in D (= 2) dimensions, the P-value of the maximum Z is
P( max_s Z(s) ≥ t ) ~ E( Euler characteristic of {s: Z(s) ≥ t} ) = ReselsD(S) × ECD(t) (+ boundary terms),
where
ReselsD(S) = image area / (bubble FWHM)² = 146.2 (unitless),
ECD(t) = (4 log 2)^{D/2} t^{D−1} exp(−t²/2) / (2π)^{(D+1)/2}.
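The formula above is easy to evaluate and invert numerically. A sketch keeping only the leading D = 2 term and ignoring the boundary terms (function names are mine):

```python
import math
from scipy.optimize import brentq

def rft_pvalue(t, resels2):
    """Leading (D = 2) term of the RFT approximation:
    P(max Z >= t) ~ Resels_2 * (4 log 2) * t * exp(-t^2/2) / (2 pi)^(3/2),
    ignoring the lower-dimensional boundary terms."""
    return (resels2 * 4 * math.log(2) * t
            * math.exp(-t ** 2 / 2) / (2 * math.pi) ** 1.5)

def rft_threshold(resels2, alpha=0.05):
    """Solve rft_pvalue(t) = alpha for t; the P-value is decreasing in t
    beyond t = 1, so a simple bracketed root-find suffices."""
    return brentq(lambda t: rft_pvalue(t, resels2) - alpha, 1.0, 10.0)
```

With the deck's Resels = 146.2 this gives t ≈ 3.90, close to the quoted corrected threshold Z = 3.92; the small gap is consistent with the omitted boundary terms.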