Computational Biology. Jianfeng Feng, Warwick University.

Outline
1. Multiple comparisons
2. FWER correction
3. FDR correction
4. Example

1: Multiple Comparisons

Localizing Activation 1. Construct a model for each voxel of the brain. – "Massive univariate approach" – Regression models (GLM) commonly used: Y = Xβ + ε, with ε ~ N(0, V).

Localizing Activation 2. Perform a statistical test to determine whether task-related activation is present in the voxel: H0: c^T β = 0. Statistical image: map of t-tests across all voxels (a.k.a. t-map).

Localizing Activation 3. Choose an appropriate threshold for determining statistical significance. Statistical parametric map: each significant voxel is color-coded according to the size of its p-value.

Hypothesis Testing
Null hypothesis H0 – statement of no effect (e.g., β1 = 0).
Test statistic T – measures compatibility between the null hypothesis and the data.
P-value – probability that the test statistic would take a value as or more extreme than that actually observed if H0 is true, i.e. P(T > t | H0).
Significance level – a threshold u_α controls the false positive rate at level α = P(T > u_α | H0).
[Figure: null distribution of T, with the p-value as the tail area beyond the observed t.]

Making Errors There are two types of errors one can make when performing significance tests: – Type I error: H0 is true, but we mistakenly reject it (false positive). Controlled by the significance level α. – Type II error: H0 is false, but we fail to reject it (false negative). The probability that a hypothesis test will correctly reject a false null hypothesis is the power of the test.

Making Errors Consider an example of discrimination: we have P positive samples (patients) and N negative samples (healthy controls). Sensitivity, or true positive rate: TPR = TP / P = TP / (TP + FN). Specificity, or true negative rate: TNR = TN / N = TN / (TN + FP). Accuracy: ACC = (TP + TN) / (P + N).
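As a minimal sketch of these definitions, the three rates can be computed directly from the four confusion-matrix counts; the counts in the example call are hypothetical, chosen only to illustrate the formulas.

```python
# Sensitivity, specificity and accuracy from confusion-matrix counts.
# The counts in the example call below are hypothetical.

def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy)."""
    p = tp + fn                  # total positives (patients)
    n = tn + fp                  # total negatives (healthy controls)
    tpr = tp / p                 # sensitivity: TP / (TP + FN)
    tnr = tn / n                 # specificity: TN / (TN + FP)
    acc = (tp + tn) / (p + n)    # accuracy
    return tpr, tnr, acc

tpr, tnr, acc = classification_rates(tp=80, fn=20, tn=90, fp=10)
print(tpr, tnr, acc)  # 0.8 0.9 0.85
```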

Receiver operating characteristic (ROC) curve

Multiple Comparisons Choosing an appropriate threshold is complicated by the fact that we are dealing with a family of tests. If more than one hypothesis test is performed, the risk of making at least one Type I error is greater than the α value for a single test. The more tests one performs, the greater the likelihood of getting at least one false positive.

Multiple Comparisons Which of 100,000 voxels are significant? At α = 0.05, we expect 5,000 false positive voxels. Choosing a threshold is a balance between sensitivity (true positive rate) and specificity (true negative rate).
[Figure: the same statistical map thresholded at t > 1, t > 2, t > 3, t > 4, t > 5.]

Measures of False Positives There exist several ways of quantifying the likelihood of obtaining false positives. Family-Wise Error Rate (FWER) –Probability of any false positives False Discovery Rate (FDR) –Proportion of false positives among rejected tests

2: FWER Correction

Family-Wise Error Rate The family-wise error rate (FWER) is the probability of making one or more Type I errors in a family of tests, under the null hypothesis. FWER controlling methods: –Bonferroni correction –Random Field Theory –Permutation Tests

Problem Formulation Let H0i be the hypothesis that there is no activation in voxel i, where i ∈ V = {1, ..., m} and m is the number of voxels. Let Ti be the value of the test statistic at voxel i. The family-wise null hypothesis, H0, states that there is no activation in any of the m voxels: H0 = ∩_{i ∈ V} H0i.

Problem Formulation If we reject a single voxel null hypothesis, H0i, we will reject the family-wise null hypothesis. A false positive at any voxel gives a family-wise error (FWE). Assuming H0 is true, we want the probability of falsely rejecting H0 to be controlled by α, i.e. P(∪_{i ∈ V} {Ti ≥ u} | H0) ≤ α.

Bonferroni Correction Choose the threshold u so that each individual test is performed at level α/m. By Boole's inequality,
FWER = P(∪_{i ∈ V} {Ti ≥ u} | H0) ≤ Σ_{i ∈ V} P(Ti ≥ u | H0) = m · (α/m) = α.
Hence the family-wise error rate is controlled at level α.

Example Generate 100 × 100 voxels from an iid N(0,1) distribution. Threshold at u = 1.645. Approximately 500 false positives.
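This example is easy to reproduce; a minimal sketch (the seed is arbitrary, so the exact count varies around 500):

```python
# Threshold a pure-noise image at the uncorrected 5% level and count
# the "significant" voxels; all of them are false positives.
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed
voxels = rng.standard_normal((100, 100))   # 100 x 100 iid N(0,1) voxels
u = 1.645                                  # one-sided alpha = 0.05 threshold
n_false_pos = int((voxels > u).sum())
print(n_false_pos)                         # close to 0.05 * 10,000 = 500
```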

Example To control the FWER at 0.05, the Bonferroni-corrected level is 0.05/10,000, which corresponds to u = 4.42. On average, only 5 out of every 100 images generated in this fashion will have one or more values above u. No false positives.
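The corrected threshold quoted above follows from the normal inverse CDF; a quick check, sketched here with the Python standard library:

```python
# Bonferroni: test each of m = 10,000 voxels at alpha/m and convert the
# per-voxel level to a one-sided z threshold.
from statistics import NormalDist

alpha, m = 0.05, 10_000
u_bonf = NormalDist().inv_cdf(1 - alpha / m)
print(round(u_bonf, 2))  # 4.42
```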

Bonferroni Correction The Bonferroni correction is very conservative, i.e. it results in very strict significance levels. It decreases the power of the test (probability of correctly rejecting a false null hypothesis) and greatly increases the chance of false negatives. It is not optimal for correlated data, and most fMRI data has significant spatial correlation.

Spatial Correlation We may be able to choose a more appropriate threshold by using information about the spatial correlation in the data. Random field theory allows one to incorporate the correlation into the calculation of the appropriate threshold. It is based on approximating the distribution of the maximum statistic over the whole image.

Maximum Statistic Link between the FWER and the maximum statistic:
FWER = P(FWE) = P(∪_i {Ti ≥ u} | H0)   [any t-value exceeds u under the null]
     = P(max_i Ti ≥ u | H0)            [the maximum t-value exceeds u under the null]
Choose the threshold u such that the maximum exceeds it with probability only α under the null.
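The identity can be checked by simulation; a sketch with m independent null voxels (independence is a simplifying assumption here, since real images are spatially correlated):

```python
# Estimate FWER = P(max_i T_i > u | H0) by simulating many null images.
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed
m, n_sims, u = 1000, 2000, 3.0            # illustrative sizes and threshold
maxima = rng.standard_normal((n_sims, m)).max(axis=1)
fwer_hat = float((maxima > u).mean())     # fraction of null images with a FWE
print(fwer_hat)  # near 1 - (1 - P(Z > 3))**1000, about 0.74
```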

Random Field Theory A random field is a set of random variables defined at every point in D-dimensional space. A Gaussian random field has a Gaussian distribution at every point and every collection of points. A Gaussian random field is defined by its mean function and covariance function.

Random Field Theory Consider a statistical image to be a lattice representation of a continuous random field. Random field methods are able to: –approximate the upper tail of the maximum distribution, which is the part needed to find the appropriate thresholds; and –account for the spatial dependence in the data.

Random Field Theory Consider a random field Z(s) defined on s ∈ Ω ⊂ R^D, where D is the dimension of the process.

Euler Characteristic The Euler characteristic χ_u is a property of an image after it has been thresholded at u:
– it counts #blobs − #holes;
– at high thresholds, it just counts #blobs.
[Figure: a random field thresholded at u = 0.5 (χ_u = 28 − 1 = 27), u = 2.75 (χ_u = 1) and u = 3.5 (χ_u = 2).]

Controlling the FWER Link between the FWER and the Euler characteristic:
FWER = P(max_i Ti ≥ u | H0)
     = P(one or more blobs | H0)
     ≈ P(χ_u ≥ 1 | H0)   [at high u, no holes exist]
     ≈ E(χ_u | H0)       [at high u, there is rarely more than one blob]
Closed-form results exist for E(χ_u) for Z, t, F and χ² continuous random fields.

3D Gaussian Random Fields For large search regions:
E(χ_u) = R (4 log 2)^(3/2) (2π)^(−2) (u² − 1) e^(−u²/2),
where R = V / (FWHM_x · FWHM_y · FWHM_z).
Here V is the volume of the search region, the full width at half maximum (FWHM) represents the smoothness of the image estimated from the data, and R is the number of resolution elements (resels). For details, please refer to Adler, Random Fields on Manifolds.

Controlling the FWER For large u:
FWER ≈ R (4 log 2)^(3/2) (2π)^(−2) (u² − 1) e^(−u²/2),
where R = V / (FWHM_x · FWHM_y · FWHM_z).
Properties:
– As u increases, the FWER decreases (note that u is large).
– As V increases, the FWER increases.
– As smoothness increases, the FWER decreases.
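As an illustration of how the closed form is used, the sketch below searches for the u at which E(χ_u) falls to 0.05; the search volume and FWHM are hypothetical values, not taken from the slides.

```python
# RFT-style threshold: find u with E(chi_u) <= 0.05 for a 3D Gaussian field.
import math

def expected_ec(u, resels):
    """E(chi_u) = R (4 log 2)^(3/2) (2 pi)^(-2) (u^2 - 1) exp(-u^2 / 2)."""
    return (resels * (4 * math.log(2)) ** 1.5 / (2 * math.pi) ** 2
            * (u ** 2 - 1) * math.exp(-(u ** 2) / 2))

# Hypothetical search region: 1,000,000 mm^3 with 10 mm isotropic FWHM.
resels = 1_000_000 / (10 * 10 * 10)   # R = V / (FWHM_x * FWHM_y * FWHM_z)

u = 2.0
while expected_ec(u, resels) > 0.05:  # E(chi_u) decreases for large u
    u += 0.001
print(round(u, 2))                    # corrected threshold, about 4.6
```

Note how much higher this threshold is than the uncorrected 1.645, yet it is typically lower than the Bonferroni threshold for the same number of voxels, because smoothness reduces the effective number of independent tests.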

RFT Assumptions The entire image is either multivariate Gaussian or derived from multivariate Gaussian images. The statistical image must be sufficiently smooth to approximate a continuous random field. – FWHM at least twice the voxel size. – In practice, FWHM smoothness of 3-4 × the voxel size is preferable. The amount of smoothness is assumed known. – The estimate is biased when images are not sufficiently smooth. There are several layers of approximation.

Applications Imaging genetics. [1] Ge T. et al., 2013, NeuroImage: using ADNI data, we established, for the first time in the literature, a link between a gene (SNP) and structural changes in the brain. [2] Gong XH et al., 2014, Human Brain Mapping: using genotyping experiments, we identified DISC1 and the brain areas associated with schizophrenia.

3: FDR Correction

Issues with FWER Methods that control the FWER (Bonferroni, RFT, permutation tests) provide strong control over the number of false positives. While this is appealing, the resulting thresholds often lead to tests that suffer from low power. Power is critical in fMRI applications because the most interesting effects are usually at the edge of detection.

False Discovery Rate The false discovery rate (FDR) is a recent development in multiple comparison problems due to Benjamini and Hochberg (1995). While the FWER controls the probability of any false positives, the FDR controls the proportion of false positives among all rejected tests.

Notation Suppose we perform tests on m voxels.

                 Declared inactive   Declared active   Total
Truly inactive   TN                  FP                m0
Truly active     FN                  TP                m − m0
Total            m − R               R                 m

Definitions The false discovery rate is the expected proportion of false positives among the rejected tests: FDR = E(FP / R). The FDR is defined to be 0 if R = 0.

Properties A procedure controlling the FDR ensures that, on average, the FDR is no bigger than a pre-specified rate q, which lies between 0 and 1. However, for any given data set the FDR need not be below the bound. An FDR-controlling technique guarantees control only in expectation, i.e. E(FDR) ≤ q.

BH Procedure
1. Select the desired limit q on the FDR (e.g., 0.05).
2. Rank the p-values: p(1) ≤ p(2) ≤ ... ≤ p(m).
3. Let r be the largest i such that p(i) ≤ (i/m) · q.
4. Reject all hypotheses corresponding to p(1), ..., p(r).
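The four steps can be written out directly; a sketch (the function name and the example p-values are illustrative, not from the slides):

```python
# Benjamini-Hochberg step-up procedure: reject H_(1), ..., H_(r) where r is
# the largest i with p_(i) <= (i / m) * q.
import numpy as np

def bh_reject(pvals, q=0.05):
    """Return a boolean rejection mask over the input hypotheses."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)                    # ranks p_(1) <= ... <= p_(m)
    below = pvals[order] <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        r = int(np.nonzero(below)[0].max())      # largest i with p_(i) <= iq/m
        reject[order[: r + 1]] = True            # reject p_(1), ..., p_(r)
    return reject

print(bh_reject([0.001, 0.008, 0.039, 0.041, 0.20], q=0.05))
```

Note the step-up character of the rule: p(3) = 0.039 exceeds its own cutoff 3 · 0.05/5 = 0.03, so only the first two hypotheses are rejected here, yet a p-value above its cutoff can still be rejected whenever some larger-ranked p-value falls below its own cutoff.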

Comments If all null hypotheses are true, the FDR is equivalent to the FWER. Any procedure that controls the FWER also controls the FDR. A procedure that controls only the FDR can be less stringent and lead to a gain in power. Since FDR-controlling procedures work only on the p-values and not on the actual test statistics, they can be applied to any valid statistical test. For details, please refer to Efron's book.

4: Example

Example [Figure: Signal + Noise.]

Example
– α = 0.10, no correction: controls the percentage of false positives among the null voxels.
– FWER control at 10%: controls the occurrence of any false positive.
– FDR control at 10%: controls the percentage of the declared-active voxels that are false positives.
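The three regimes can be compared on synthetic data; in this sketch the voxel counts, the effect size and the seed are all hypothetical choices, not the data behind the slide figure:

```python
# Compare no correction, Bonferroni (FWER) and BH (FDR) at level 0.10 on
# 10,000 null voxels plus 500 truly active voxels (mean shift 4).
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
z = np.concatenate([rng.standard_normal(10_000),        # null voxels
                    rng.standard_normal(500) + 4.0])    # active voxels
truth = np.concatenate([np.zeros(10_000, bool), np.ones(500, bool)])

p = np.array([1 - NormalDist().cdf(v) for v in z])      # one-sided p-values

alpha, m = 0.10, z.size
uncorrected = p < alpha                                 # no correction
bonferroni = p < alpha / m                              # FWER control at 10%

# BH step-up for FDR control at 10%.
order = np.argsort(p)
below = p[order] <= (np.arange(1, m + 1) / m) * alpha
bh = np.zeros(m, bool)
if below.any():
    bh[order[: int(np.nonzero(below)[0].max()) + 1]] = True

for name, rej in [("uncorrected", uncorrected),
                  ("FWER 10%", bonferroni),
                  ("FDR 10%", bh)]:
    fp = int((rej & ~truth).sum())
    print(f"{name}: {int(rej.sum())} rejections, {fp} false positives")
```

Uncorrected thresholding yields on the order of a thousand false positives, Bonferroni nearly none (at the cost of missing many active voxels), and BH keeps the false positives to roughly 10% of what it declares active.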

Uncorrected Thresholds Most published PET and fMRI studies use arbitrary uncorrected thresholds (e.g., p < 0.001). – A likely reason is that with available sample sizes, corrected thresholds are so stringent that power is extremely low. Using uncorrected thresholds is problematic when interpreting conclusions from individual studies, as many activated regions may be false positives. Null findings are hard to disseminate, hence it is difficult to refute false positives established in the literature.

Extent Threshold Sometimes an arbitrary extent threshold is used when reporting results. Here a voxel is only deemed truly active if it belongs to a cluster of k contiguous active voxels (e.g., p < 0.001, 10 contiguous voxels). Unfortunately, this does not necessarily correct the problem, because imaging data are spatially smooth and therefore false positives may appear in clusters.

Example Activation maps with spatially correlated noise, thresholded at three different significance levels (α = 0.10, 0.01, 0.001). Due to the smoothness, the false-positive activations are contiguous regions of multiple voxels. Note: all images smoothed with FWHM = 12 mm.

 =0.10  =0.01  =0.001 Note: All images smoothed with FWHM=12mm Similar activation maps using null data. Example