Multiple testing adjustments European Molecular Biology Laboratory Predoc Bioinformatics Course 17th Nov 2009 Tim Massingham

Motivation
We have already come across several cases where we need to correct p-values, for example pairwise comparisons of gene expression data. (A 6x6 table of pairwise tests, Exp 1 to Exp 6, appeared here.) What happens if we perform several vaccine trials?

Motivation
10 new vaccines are trialled. Declare a vaccine a success if its test has a p-value of less than 0.05. If none of the vaccines work, what is our chance of a success?

Motivation
10 new vaccines are trialled. Declare a vaccine a success if its test has a p-value of less than 0.05. Each trial has probability 0.05 of "success" (false positive) and probability 0.95 of "failure" (true negative). If none of the vaccines work, what is our chance of a "success"?
Probability of at least one = 1 - probability of none = 1 - (probability a trial is unsuccessful)^10 = 1 - 0.95^10 ≈ 0.4
Rule of thumb: multiply the size of the test by the number of tests.
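The arithmetic on this slide can be checked with a short plain-Python sketch (the numbers come from the slide):

```python
# Chance of at least one false positive across 10 independent trials,
# each declared a "success" at size 0.05.
alpha = 0.05
n_trials = 10

p_none = (1 - alpha) ** n_trials   # every trial is a true negative
p_any = 1 - p_none                 # at least one false positive
print(round(p_any, 2))             # 0.4, as on the slide

# Rule of thumb: multiply the size of the test by the number of tests.
print(alpha * n_trials)            # 0.5, a slightly loose upper bound
```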

Motivation
A more extreme example: test an entire population for a disease. The population is a mixture: some people have the disease, some don't, and we want to find the individuals with the disease. Each test result falls into one of four categories:

                     Test reports healthy    Test reports diseased
  Truly healthy      true negative           false positive
  Truly diseased     false negative          true positive

Family Wise Error Rate: control the probability that any false positive occurs.
False Discovery Rate: control the proportion of false positives discovered:

  FDR = # false positives / # positives = # false positives / (# true positives + # false positives)
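The FDR definition above, with made-up counts purely for illustration:

```python
# Hypothetical counts from a screening test (invented for illustration).
true_positives = 90
false_positives = 10

# FDR = false positives / all positive results
fdr = false_positives / (true_positives + false_positives)
print(fdr)  # 0.1: one in ten "discoveries" is expected to be false
```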

Cumulative distribution
A simple examination by eye: rank the data and plot rank against p-value. If the p-values are uniformly distributed, the cumulative distribution should be approximately linear. The curve never decreases, starting at (0, 1) and ending at (1, n). N.B. Ranks are often scaled to (0, 1] by dividing by the largest rank.
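The rank-vs-p-value plot described above can be sketched in plain Python (the p-values here are invented; under a uniform null the points should lie near the diagonal):

```python
# Sort the p-values and pair each with its rank, scaled to (0, 1]
# by dividing by the largest rank, as noted on the slide.
pvalues = [0.72, 0.05, 0.31, 0.90, 0.48]   # made-up example data

ranked = sorted(pvalues)
n = len(ranked)
points = [(p, (i + 1) / n) for i, p in enumerate(ranked)]

for p, scaled_rank in points:
    print(p, scaled_rank)   # the curve never decreases and ends at rank 1.0
```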

Cumulative distribution
Examples (plots shown here, each for 910 p-values): five sets of uniformly distributed p-values, and non-uniformly distributed data with an excess of extreme (small) p-values. A one-sided Kolmogorov test could be used if a formal check is desired.

A little set theory
Represent all possible outcomes of three tests in a Venn diagram; the areas are the probabilities of the events happening. The regions are: test 1 gives a false positive, test 2 gives a false positive, test 3 gives a false positive, the overlaps where all tests give false positives, and the outside region where no test gives a false positive.

A little set theory
P(any test gives a false positive) ≤ P(test 1 gives a false positive) + P(test 2 gives a false positive) + P(test 3 gives a false positive)
The sum of the three areas counts the overlaps more than once, so it is an upper bound.


Bonferroni adjustment
We want to control the probability of any false positive, but we only know how to control the size of each individual test. Keep things simple: do all tests at the same size. If we have n tests, each at size α/n, then the probability of any false positive is at most n × (α/n) = α.

Bonferroni adjustment
If we have n tests, each at size α/n, then the Family-Wise Error Rate is at most α.
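A minimal sketch of the adjustment (equivalent in spirit to R's p.adjust(p, method="bonferroni"), written here in plain Python; the example p-values are made up):

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]

# Testing each of three hypotheses at size 0.05/3 keeps the FWER at or
# below 0.05; equivalently, adjust the p-values and compare with 0.05.
print(bonferroni([0.004, 0.02, 0.3]))   # approximately [0.012, 0.06, 0.9]
```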

Example 1
Look at deviations from Chargaff's 2nd parity rule: the A and T content of the genomes of 910 bugs. Many show significant deviations.
Unadjusted p-values: 559 bugs have p-value < 1e-5. (The first nine p-values, of order 1e-24, and the counts at the larger thresholds appeared here.)
Bonferroni-adjusted p-values: 461 bugs have adjusted p-value < 1e-5. (The first nine adjusted p-values, of order 1e-21, appeared here.)

Aside: p-values measure evidence
We have shown that many bugs deviate substantially from Chargaff's 2nd rule. The p-values tell us that there is significant evidence for a deviation, not that the deviation is large: with lots of bases the test is powerful, able to detect small deviations from 50%. (A boxplot showing the lower quartile, median and upper quartile appeared here.)

Bonferroni is conservative
Conservative: the actual size of the test is less than the bound. This is not too bad for independent tests; it is worst when the tests are positively correlated, e.g. applying the same test to subsets of the data, or applying similar tests to the same data.
There is a more subtle problem. Consider a mixture of blue and red circles, with the null hypothesis "is blue": the red circles are never false positives.
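For independent tests the gap between the bound and the actual error rate is small, as a quick calculation (not part of the original slide) shows:

```python
# Actual FWER of Bonferroni with n independent true nulls, each tested
# at size alpha/n: 1 - (1 - alpha/n)**n, which is always below alpha.
alpha, n = 0.05, 100
actual_fwer = 1 - (1 - alpha / n) ** n
print(actual_fwer < alpha)     # True: the bound is conservative
print(round(actual_fwer, 4))   # still close to alpha when tests are independent
```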

Bonferroni is conservative
P(any false positive) ≤ sum of the individual test sizes, but if an experiment really is different from the null it can never contribute a false positive. The number of potential false positives may therefore be less than the number of tests, and the p-value is over-adjusted.

Holm's method
Holm (1979) suggests repeatedly applying Bonferroni.
Initial Bonferroni: split the tests into an insignificant set and a significant set.
No false positive? Then we have been overly strict; apply Bonferroni again, to the insignificant set only.
False positive? More won't hurt, so we may as well test again.
Step 2, step 3, ...: stop when the "insignificant" set does not shrink further.
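Holm's step-down procedure can be written directly; this is a plain-Python sketch mirroring R's p.adjust(p, method="holm"), with made-up example p-values:

```python
def holm(pvalues):
    """Step-down Bonferroni: the smallest p-value is multiplied by n,
    the next smallest by n-1, and so on, enforcing monotonicity."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for step, i in enumerate(order):
        running_max = max(running_max, (n - step) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

print(holm([0.01, 0.04, 0.03]))   # approximately [0.03, 0.06, 0.06]
```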

Example 2
Return to the Chargaff data: 910 bugs, and more than half are significantly different after Bonferroni adjustment (461 bugs at adjusted p-value < 1e-5), so there is strong evidence that we've over-corrected.
Holm-adjusted p-values: p-value < 0.05 (+24), p-value < 0.01 (+14), p-value < 1e-5: 472 bugs (+12, relative to Bonferroni).
We gained a couple of percent more, but notice that the gains tail off. (The first nine Holm-adjusted p-values, of order 1e-21, appeared here.)

Hochberg's method
Consider a pathological case: apply the same test to the same data multiple times.
# Ten identical pvalues
pvalues <- rep(0.01, 10)
# None are significant with Bonferroni
p.adjust(pvalues, method="bonferroni")
# None are significant with Holm
p.adjust(pvalues, method="holm")
# Hochberg recovers the correctly adjusted pvalues
p.adjust(pvalues, method="hochberg")
Hochberg-adjusted p-values for the Chargaff data: identical to Holm (472 bugs at p-value < 1e-5) ... but Hochberg requires additional assumptions.
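The same pathological example can be re-done in plain Python; this sketch mirrors the R p.adjust calls above (step-down Holm versus step-up Hochberg):

```python
def holm(pvalues):
    """Step-down: multiply the k-th smallest p-value by n-k+1, take a running max."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted, running_max = [0.0] * n, 0.0
    for step, i in enumerate(order):
        running_max = max(running_max, (n - step) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def hochberg(pvalues):
    """Step-up: work from the largest p-value down, taking a running min."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted, running_min = [0.0] * n, 1.0
    for step, i in enumerate(order):
        running_min = min(running_min, (step + 1) * pvalues[i])
        adjusted[i] = running_min
    return adjusted

pvalues = [0.01] * 10
print(holm(pvalues))      # all 0.1: none significant at 0.05
print(hochberg(pvalues))  # all 0.01: the correctly adjusted values
```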

False Discovery Rates
New methods, dating back to 1995. They are gaining popularity in the literature, mainly for large data sets, and are useful for enriching data sets for further analysis.
Recap. FWER: control the probability of any false positive occurring. FDR: control the proportion of false positives that occur.
The "q-value" is the proportion of significant tests expected to be false positives, so the q-value times the number significant is the expected number of false positives.
Methods: Benjamini & Hochberg (1995); Benjamini & Yekutieli (2001); Storey (2002, 2003), a.k.a. the "positive false discovery rate".
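A sketch of the Benjamini & Hochberg (1995) adjustment in plain Python (equivalent in spirit to R's p.adjust(p, method="BH"); the example p-values are made up):

```python
def benjamini_hochberg(pvalues):
    """Adjust p-values so that declaring everything with adjusted
    value <= q significant controls the FDR at level q."""
    n = len(pvalues)
    # Work from the largest p-value down, taking a running minimum
    # of p * n / rank, where rank counts up from the smallest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted, running_min = [0.0] * n, 1.0
    for step, i in enumerate(order):
        rank = n - step                       # ascending rank of this p-value
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = min(1.0, running_min)
    return adjusted

print(benjamini_hochberg([0.01, 0.02, 0.03, 0.04]))  # all approximately 0.04
```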

Example 3
Returning once more to the Chargaff data. Q-values have a different interpretation from p-values: use q-values to get the expected number of false positives.
FDR q-values: q-value < 0.05: 759 bugs; q-value < 0.01: 713; q-value < 1e-5: 547. (The first nine q-values, of order 1e-24, appeared here.)
q-value = 0.05: expect 38 false positives (759 × 0.05)
q-value = 0.01: expect 7 false positives (713 × 0.01)
q-value = 1e-5: expect 1/200 of a false positive
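The expected-false-positive arithmetic from this slide, checked in Python (thresholds and counts taken from the slide):

```python
# q-value threshold -> number of bugs significant at that threshold
significant = {0.05: 759, 0.01: 713, 1e-5: 547}

for q, count in significant.items():
    # expected number of false positives = q-value x number significant
    print(q, round(q * count, 2))
# 0.05 gives about 38, 0.01 about 7, and 1e-5 roughly 1/200, as on the slide
```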

Summary
Holm is always better than Bonferroni. Hochberg can be better, but has additional assumptions. FDR is a more powerful approach: it finds more things significant, controls a different criterion, and is more useful for exploratory analyses than for publications.
A little question: suppose results are published if the p-value is less than 0.01. What proportion of the scientific literature is wrong?