Issues with analysis and interpretation

Issues with analysis and interpretation Micha Heilbron, UCL Daniel Yon, Birkbeck College London

Outline Recap: errors in statistical inference Multiple comparisons problem (and solutions) Non-independent analyses Erroneous interpretations of interactions Violating assumptions in parametric methods

Classical statistical inference? Prove me wrong!  

[Figure: probability distribution of the t value]

Calculate an a priori false positive rate you’d be happy with (e.g. α = .05)

Type I and Type II Errors Each voxel can therefore be classified as one of four types:
Truly active, declared active: correct detection ✓
Truly inactive, declared active: Type I error (false positive)
Truly active, declared inactive: Type II error (false negative)
Truly inactive, declared inactive: correct rejection ✓

Multiple Comparisons Problem SPM adopts a ‘mass univariate’ approach: independent voxel-wise statistics (e.g. a t value for each voxel). With >80,000 tests at p<.05, expect ~4,000 false positive voxels!

Possible Solution 0: Uncorrected threshold Set an arbitrarily stringent threshold at the voxel level (e.g. p<.001). Better than the behavioural sciences! But this is an ‘unprincipled’ way of controlling error rates when different numbers of voxels are included in different analyses: e.g. analyses of 10,000 vs. 60,000 voxels generate different numbers of false positives. Even dead salmon pass this threshold! Bennett et al. (2009)

Possible Solution 1: Limit Family-Wise Error Rate (FWER)  
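The body of this slide appears to have been lost. Purely as an illustration of what FWER control means (this sketch is not from the slides; the voxel count is taken from the earlier slide), compare an uncorrected α = .05 threshold with a Bonferroni-corrected one on pure-noise data:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_voxels = 80_000                          # rough voxel count from the earlier slide
z = rng.standard_normal(n_voxels)          # null statistics: no true activation anywhere

alpha = 0.05
uncorrected = z > norm.ppf(1 - alpha)              # per-voxel alpha = .05
bonferroni = z > norm.ppf(1 - alpha / n_voxels)    # per-voxel alpha = .05 / 80,000

print(uncorrected.sum())   # roughly alpha * n_voxels ≈ 4,000 false positives
print(bonferroni.sum())    # almost always 0: FWER (any false positive) held at ~.05
```

Bonferroni controls the probability of even one false positive across the whole family of tests, which is why it is so much more conservative than the uncorrected threshold.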

Possible Solution 2: Limit False Discovery Rate  

While 5% of the ‘effects’ in the bottom-left panel are known false positives, most of the signal is recovered.

Bennett et al. (2009) suggest there is no reason why you can’t show uncorrected and FDR-controlled results at the same time. The important thing is that readers know the probability of false positives associated with the results.

Possible Solution 3: Cluster-extent thresholding Incorporates the knowledge that false positives are unlikely to span spatially contiguous voxels, whereas real effects do. Set a primary threshold at the voxel level, e.g. p<.001. Estimate an extent threshold k: the number of contiguous voxels that would be expected to be active under the null hypothesis. This extent threshold is calculated with respect to the intrinsic smoothness of the data, such that surviving clusters have a cluster-level FWER of p<.05.
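SPM derives the extent threshold analytically from random field theory; purely to illustrate the underlying logic (my sketch, with made-up volume size and smoothness), a Monte Carlo version could look like this:

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(1)

def null_max_cluster(shape=(40, 40, 40), fwhm_vox=3.0, p_primary=0.001):
    """Largest suprathreshold cluster in one smoothed pure-noise volume."""
    noise = ndimage.gaussian_filter(rng.standard_normal(shape), fwhm_vox / 2.355)
    noise /= noise.std()                   # re-standardise after smoothing
    labels, n = ndimage.label(noise > stats.norm.isf(p_primary))
    return int(np.bincount(labels.ravel())[1:].max()) if n else 0

# Extent threshold k: 95th percentile of the null distribution of max cluster size,
# so that surviving clusters have a cluster-level FWER of p < .05
null_sizes = [null_max_cluster() for _ in range(100)]
k = np.percentile(null_sizes, 95)
```

The key point is visible in the simulation: the smoother the data, the larger the clusters that arise by chance, so k depends on the data’s intrinsic smoothness.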

...potentially misleading…   Woo, Krishnan & Wager (2014)

...or theoretically uninterpretable Woo, Krishnan & Wager (2014)

Outline Recap: errors in statistical inference Multiple comparisons problem (and solutions) Non-independent analyses Erroneous interpretations of interactions Violating assumptions in parametric methods Suggestion for design: 4 recent ‘controversies’ in neuroscience on data analysis and interpretation

Non-independent selective analyses (‘double dipping’) Vul et al. (2008)

Finding ‘voodoo correlations’ in 4 easy steps See which voxels correlate with some behavioural measure (e.g. neuroticism) Take the voxels which have correlations above a certain threshold (e.g. r=.8) and use these to define your ROI Correlate voxel activity in your ROI with your behavioural measure Voila! A strong brain-behaviour correlation

Double dipping ‘Double dipping’ refers to using the same data twice. Technically, this conflates exploratory research (hypothesis formation) and confirmatory research (hypothesis testing). It creates the illusion of a second result, while you merely reproduce the initial finding. If that initial finding was spurious (e.g. not corrected for multiple comparisons), the resulting correlation can be entirely illusory: a ‘voodoo correlation’.
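The circularity is easy to demonstrate on pure noise. In this sketch (invented numbers, nothing from the paper) every voxel is random, yet the double-dipped ROI still yields a strong brain–behaviour correlation:

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_voxels = 20, 5_000
behaviour = rng.standard_normal(n_subj)            # e.g. a neuroticism score
voxels = rng.standard_normal((n_subj, n_voxels))   # pure noise: true r = 0 everywhere

# Steps 1-2: correlate every voxel with behaviour, keep those above a threshold
r = (voxels - voxels.mean(0)).T @ (behaviour - behaviour.mean())
r /= n_subj * voxels.std(0) * behaviour.std()
roi = r > 0.5

# Step 3: correlate mean ROI activity with the *same* behavioural measure
voodoo_r = np.corrcoef(voxels[:, roi].mean(axis=1), behaviour)[0, 1]
print(round(voodoo_r, 2))   # a "strong" correlation, recovered from pure noise
```

With independent data (e.g. a held-out run), the same ROI would correlate with behaviour at roughly r = 0.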

How to prevent double dipping? Define ROIs independently: anatomically (standardised or subject-specific), or by splitting runs (i.e. define the ROI from one set of scans, conduct analyses on another). Though Poldrack & Mumford (2009) show that it’s still possible to obtain very high correlations (e.g. .7) even when this is taken into account!

Outline Recap: errors in statistical inference Multiple comparisons problem (and solutions) Non-independent analyses Erroneous interpretations of interactions Violating assumptions in parametric methods Suggestion for design: 4 recent ‘controversies’ in neuroscience on data analysis and interpretation

Imagine a study: What is the effect of Valium on ‘event-related anxiety’? Two groups of students are exposed to stress (e.g. presenting a methods class). One group gets a placebo, the other diazepam (Valium). Anxiety is measured before and after the presentation, e.g. with fMRI. Q: Did Valium reduce anxiety?

Imagine a conclusion: “Students receiving placebo showed increased anxiety-related BOLD activity after the presentation (p < 0.01); in students receiving Valium this difference was abolished (F < 1)” Who can spot the error?

Who can spot the error? Not everyone: in an analysis of 513 top-ranked papers, 79 used the erroneous analysis vs. 78 the correct one. Nieuwenhuis et al. (2011), Nature Neuroscience

So, what was the mistake? The studies reported significance and non-significance, but no interaction effect. Directly comparing significance levels is misleading, because the difference between significant and non-significant is itself often not significant. Intuition: Usain Bolt may be the fastest man in the world, but that doesn’t mean he is significantly faster than numbers 2 and 3.
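To make the point concrete, here is a sketch with invented change scores (not data from any real study): the placebo group’s increase is significant, the Valium group’s is not, yet the interaction test shows the groups do not reliably differ:

```python
from scipy import stats

# Invented anxiety change scores (post - pre), one value per student
placebo = [0.2, 0.8, 1.1, 1.5, 0.9, 1.3, 0.4, 1.0]
valium = [-1.2, 1.5, 0.3, 2.0, -0.8, 1.1, 0.9, -0.4]

p_placebo = stats.ttest_1samp(placebo, 0).pvalue         # significant increase
p_valium = stats.ttest_1samp(valium, 0).pvalue           # "abolished" (n.s.)
p_interaction = stats.ttest_ind(placebo, valium).pvalue  # groups do NOT differ
```

Reporting only the first two p-values invites the erroneous conclusion; the interaction test is the one that actually compares the groups.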

How to avoid the mistake? In general, never compare significance values directly: always test interaction effects. In neuroimaging, look at your unthresholded statistical parametric maps alongside your thresholded data (the ‘blobs’). Image courtesy: Neuroskeptic (2014), ‘Ban the blob’

Outline Recap: errors in statistical inference Multiple comparisons problem (and solutions) Non-independent analyses Erroneous interpretations of interactions Violating assumptions in parametric methods Suggestion for design: 4 recent ‘controversies’ in neuroscience on data analysis and interpretation

Violating assumptions in parametric methods Q: What is the ‘real’ Type I error rate of the three most common fMRI software packages? Method: ‘task-based’ analysis of resting-state fMRI data. Answer: ‘empirical’ FWE rates of up to 60% (nominal would be 5%). “These findings speak to the need of validating the statistical methods being used in the neuroimaging field.” Eklund et al. (2015), arXiv preprint

[Figure slides: empirical familywise error rates from Eklund et al. (2015)]

Ongoing controversy… The authors claim violated parametric assumptions cause the inflated error rates. However, others have questioned: the validity of resting-state data as a null distribution; the ill-defined design matrix (an unrealistic analysis); unrepresentative thresholding and smoothing; … Disclaimer: paper not yet peer-reviewed! Eklund et al. (2015), arXiv preprint

Take home message: Beware of results obtained with a p = 0.01 cluster-forming threshold. Consider your (statistical) assumptions. If some assumptions cannot be guaranteed, use non-parametric approaches. Eklund et al. (2015), arXiv preprint

Take home message: fMRI offers many opportunities for errors. However, none of the discussed errors represents a ‘fundamental flaw’ of fMRI. Always formulate a clear hypothesis… and think through all processing steps before collecting data. There is no ‘magic bullet’ against statistical errors.

It’s about balance…