Hypothesis Testing: Statistics for Microarray Data Analysis – Lecture 3 supplement. The Fields Institute for Research in Mathematical Sciences, May 25, 2002.

p-values
The p-value, or observed significance level, p is the chance of getting a test statistic as extreme as or more extreme than the observed one, under the null hypothesis H of no differential expression.

Many tests: a simulation study
Simulations of this process for 6,000 genes with 8 treatments and 8 controls. All the gene expression values were simulated i.i.d. from a N(0, 1) distribution, i.e. NOTHING is differentially expressed.
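A minimal sketch of this simulation (my own illustration in Python, not the original code; it assumes numpy and scipy are available):

```python
# Simulate 6,000 null genes with 8 "treatment" and 8 "control" arrays,
# all values i.i.d. N(0,1), and test each gene with a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n1, n2 = 6000, 8, 8

treat = rng.standard_normal((m, n1))   # m genes x n1 treatment arrays
ctrl = rng.standard_normal((m, n2))    # m genes x n2 control arrays

t, p = stats.ttest_ind(treat, ctrl, axis=1)   # one t-test per gene (row)

# Nothing is differentially expressed, yet about 5% of the 6,000
# unadjusted p-values fall below 0.05 (roughly 300 "discoveries").
print((p < 0.05).sum(), "genes with unadjusted p < 0.05")
```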

Unadjusted p-values
[Table: gene index, t value, and unadjusted p-value for the most extreme genes.]
Clearly we can't just use standard p-value thresholds (0.05, 0.01).

Multiple hypothesis testing: Counting errors
Assume we are testing H_1, H_2, …, H_m; m_0 = # of true null hypotheses, R = # of rejected hypotheses.

                      # true null    # false null    totals
  # non-significant        U               T          m - R
  # significant            V               S            R
  totals                  m_0           m - m_0         m

V = # Type I errors [false positives]
T = # Type II errors [false negatives]

Multiple testing procedures
As we will see, there is a bewildering variety of multiple testing procedures. How can we choose which to use? There is no simple answer, but each procedure can be judged according to a number of criteria:
Interpretation: does the procedure answer a relevant question for you?
Type of control: strong or weak?
Validity: are the assumptions under which the procedure applies clear and definitely or plausibly true, or are they unclear and most probably not true?
Computability: are the procedure's calculations straightforward to carry out accurately, or is there possibly numerical or simulation uncertainty, or discreteness?

Type I error rates
Per-comparison error rate (PCER): the expected value of (number of Type I errors / number of hypotheses), i.e. PCER = E(V/m).
Family-wise error rate (FWER): the probability of at least one Type I error, i.e. FWER = pr(V > 0).
False discovery rate (FDR): the FDR of Benjamini & Hochberg (1995) is the expected proportion of Type I errors among the rejected hypotheses, i.e. FDR = E(V/R).
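As a rough illustration (my own sketch, not part of the slides), the three error rates can be estimated by Monte Carlo under the complete null, where every rejection is a Type I error and R = V:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n1, n2, nsim, alpha = 200, 8, 8, 500, 0.05

V = np.empty(nsim)   # number of Type I errors in each simulated experiment
for s in range(nsim):
    t, p = stats.ttest_ind(rng.standard_normal((m, n1)),
                           rng.standard_normal((m, n2)), axis=1)
    V[s] = (p < alpha).sum()   # every rejection is a false positive here

R = V                          # all nulls are true, so R = V
print("PCER = E(V/m)    ~", (V / m).mean())
print("FWER = pr(V > 0) ~", (V > 0).mean())
print("FDR  = E(V/R)    ~", (V / np.maximum(R, 1)).mean())  # V/R := 0 when R = 0
```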

Strong vs. weak control
The Type I error probabilities are conditional on which hypotheses are true. Strong control refers to control of the Type I error rate under any combination of true and false hypotheses, i.e. any value of m_0. Weak control refers to control of the Type I error rate only when all the null hypotheses are true, i.e. under the complete null hypothesis with m_0 = m. In general, weak control without any other safeguards is unsatisfactory.

Computing p-values by permutations
We focus on one gene only. For the bth iteration, b = 1, …, B:
1. Permute the n data points for the gene (x). The first n_1 are referred to as “treatments”, the second n_2 as “controls”.
2. For the permuted data, calculate the corresponding two-sample t-statistic, t_b.
After all B permutations are done:
3. Put p = #{b: |t_b| ≥ |t|}/B (p_lower if we use > instead of ≥).
With all permutations in the Apo AI data, B = n!/(n_1! n_2!) = 12,870.
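A short sketch of this permutation p-value for one gene, drawing random permutations rather than enumerating all B = 12,870 (function and variable names are illustrative, not from the lecture):

```python
import numpy as np
from scipy import stats

def perm_pvalue(x, n1, B=10000, rng=None):
    """Two-sided permutation p-value for the two-sample t-statistic."""
    rng = np.random.default_rng() if rng is None else rng
    t_obs = stats.ttest_ind(x[:n1], x[n1:]).statistic
    t_perm = np.empty(B)
    for b in range(B):
        xb = rng.permutation(x)                                  # step 1: permute the n values
        t_perm[b] = stats.ttest_ind(xb[:n1], xb[n1:]).statistic  # step 2: t-statistic t_b
    return np.mean(np.abs(t_perm) >= np.abs(t_obs))              # step 3: p = #{b: |t_b| >= |t|}/B

# Example: one gene with 8 "treatment" and 8 "control" measurements.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(1.0, 1.0, 8), rng.normal(0.0, 1.0, 8)])
print(perm_pvalue(x, n1=8, B=2000, rng=rng))
```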

p-value adjustments: single-step
Define adjusted p-values π_i such that the FWER is controlled at level α, where H_i is rejected when π_i ≤ α.
Bonferroni: π_i = min(m p_i, 1)
Sidák: π_i = 1 − (1 − p_i)^m
Bonferroni always gives strong control. Sidák is less conservative than Bonferroni. When the genes are independent, it gives strong control exactly (FWER = α), proof later. It controls the FWER in many other cases, but is still conservative. Less conservative procedures make use of the dependence structure of the test statistics and/or are sequential.
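Both single-step adjustments are one-line computations; a small sketch (my own, not from the slides):

```python
import numpy as np

def bonferroni(p):
    p = np.asarray(p, dtype=float)
    return np.minimum(len(p) * p, 1.0)     # pi_i = min(m * p_i, 1)

def sidak(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - (1.0 - p) ** len(p)       # pi_i = 1 - (1 - p_i)^m

p = [0.0001, 0.004, 0.03, 0.2]
print(bonferroni(p))   # [0.0004 0.016 0.12 0.8]
print(sidak(p))        # slightly smaller than Bonferroni, as expected
```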

Single-step adjustments (ctd)
The minP method of Westfall and Young:
π_i = pr(min_{1 ≤ l ≤ m} P_l ≤ p_i | H)
Based on the joint distribution of the p-values {P_l}. This is the most powerful of the three single-step adjustments. If P_i ~ U[0, 1], it gives a FWER exactly equal to α. It always confers weak control, and gives strong control under a condition known as subset pivotality (definition omitted). That condition applies here.

Permutation-based single-step minP adjustment of p-values
For the bth iteration, b = 1, …, B:
1. Permute the n columns of the data matrix X, obtaining a matrix X_b. The first n_1 columns are referred to as “treatments”, the second n_2 columns as “controls”.
2. For each gene, calculate the corresponding unadjusted p-values p_{i,b}, i = 1, 2, …, m (e.g. by further permutations), based on the permuted matrix X_b.
After all B permutations are done:
3. Compute the adjusted p-values π_i = #{b: min_l p_{l,b} ≤ p_i}/B.
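Assuming the observed unadjusted p-values and the m × B matrix of permutation p-values from steps 1-2 have already been computed, step 3 is a one-liner; a sketch with illustrative names p_obs and P_perm (mine, not from the lecture):

```python
import numpy as np

def minP_single_step(p_obs, P_perm):
    """Single-step minP adjusted p-values.

    p_obs  : length-m array of observed unadjusted p-values.
    P_perm : m x B array, unadjusted p-values recomputed on each permuted data set.
    """
    min_per_perm = P_perm.min(axis=0)                       # min_l p_{l,b} for each b
    # pi_i = #{b : min_l p_{l,b} <= p_i} / B
    return np.array([(min_per_perm <= p).mean() for p in p_obs])
```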

More powerful methods: step-down adjustments
The idea: S. Holm's modification of Bonferroni. It also applies to Sidák, maxT, and minP.

S Holm’s modification of Bonferroni Order the unadjusted p-values such that p r 1 ≤ p r 2 ≤  ≤ p r m.. The indices r 1, r 2, r 3,.. are fixed for given data. For control of the FWER at level, the step-down Holm adjusted p- values are π rj = max k  {1,…,j} {min((m-k+1)p rk, 1). The point here is that we don’t multiply every p rk by the same factor m, but only the smallest. The others are multiplied by successively smaller factors: m-1, m-2,..,. down to multiplying p rm by 1. By taking successive maxima of the first terms in the brackets, we can get monotonicity of these adjusted p-values.

Step-down adjustment of minP
Order the unadjusted p-values such that p_{r_1} ≤ p_{r_2} ≤ … ≤ p_{r_m}. The step-down adjustment has a complicated formula, see below, but in effect it is:
1. Compare min{P_{r_1}, …, P_{r_m}} with p_{r_1};
2. Compare min{P_{r_2}, …, P_{r_m}} with p_{r_2};
3. Compare min{P_{r_3}, …, P_{r_m}} with p_{r_3};
…
m. Compare P_{r_m} with p_{r_m}.
Then enforce monotonicity on the adjusted p_{r_i}. The formula is
π_{r_j} = max_{k ∈ {1,…,j}} {pr(min_{l ∈ {r_k,…,r_m}} P_l ≤ p_{r_k} | H_0^C)}.
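In code, the step-down minP adjustment can be sketched by reusing the p_obs / P_perm objects from the single-step sketch above (again, the names and implementation are mine, not from the lecture):

```python
import numpy as np

def minP_step_down(p_obs, P_perm):
    """Step-down minP adjusted p-values (permutation estimate)."""
    m = len(p_obs)
    order = np.argsort(p_obs)                        # r_1, ..., r_m: increasing p
    adj_sorted = np.empty(m)
    for j in range(m):
        # pr( min over l in {r_j, ..., r_m} of P_l <= p_{r_j} | complete null )
        succ_min = P_perm[order[j:], :].min(axis=0)
        adj_sorted[j] = (succ_min <= p_obs[order[j]]).mean()
    adj_sorted = np.maximum.accumulate(adj_sorted)   # enforce monotonicity
    adj = np.empty(m)
    adj[order] = adj_sorted
    return adj
```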

The computing challenge: iterated permutations
The procedure is quite computationally intensive if B is very large (typically at least 10,000) and we estimate all unadjusted p-values by further permutations. Typical numbers: B = 10,000 permutations to compute one unadjusted p-value; B = 10,000 unadjusted p-values needed (one per outer permutation); m = 6,000 genes. In general the run time is O(mB^2).

Avoiding the computational difficulty of single-step minP adjustment
The maxT method (Chapter 4 of Westfall and Young):
π_i = pr(max_{1 ≤ l ≤ m} |T_l| ≥ |t_i| | H_0^C)
needs only B = 10,000 permutations. However, if the distributions of the test statistics are not identical, it will give more weight to genes with heavy-tailed distributions (which tend to have larger t-values). There is also a fast algorithm which does the minP adjustment in O(mB log B + m log m) time.
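The single-step maxT adjustment has the same shape as the minP sketch above, but works directly on the permutation distribution of the test statistics; a sketch with illustrative names t_obs and T_perm:

```python
import numpy as np

def maxT_single_step(t_obs, T_perm):
    """Single-step maxT adjusted p-values.

    t_obs  : length-m array of observed t-statistics.
    T_perm : m x B array of t-statistics recomputed on each permuted data set.
    """
    max_per_perm = np.abs(T_perm).max(axis=0)               # max_l |T_{l,b}| for each b
    # pi_i = #{b : max_l |T_{l,b}| >= |t_i|} / B
    return np.array([(max_per_perm >= abs(t)).mean() for t in t_obs])
```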

Adjusted p-values, Westfall & Young (1993)
For strong control of the FWER at level α, let |t_{r_m}| ≤ … ≤ |t_{r_2}| ≤ |t_{r_1}| denote the ordered test statistics and define the step-down (maxT) adjusted p-values as
π_{r_j} = max_{k ∈ {1,…,j}} {pr(max_{l ∈ {r_k,…,r_m}} |T_l| ≥ |t_{r_k}| | H_0^C)}.
This takes into account the dependence structure between the hypotheses.

Conclusion: unsuitable. Too much discreteness.

[Table: gene index, t-statistic, unadjusted p-value (× 10^4), and minP, plower, and maxT adjusted p-values for the top-ranked genes.]

Apo AI. Genes with maxT p-value ≤ 0.01

False discovery rate (Benjamini and Hochberg 1995)
Definition: FDR = E(V/R | R > 0) pr(R > 0). Rank the p-values p_{r_1} ≤ p_{r_2} ≤ … ≤ p_{r_m}. The adjusted p-values that control the FDR when the P_i are independently distributed are given by the step-up formula
π_{r_i} = min_{k ∈ {i,…,m}} {min(m p_{r_k}/k, 1)}.
We use this as follows: reject the hypotheses corresponding to p_{r_1}, p_{r_2}, …, p_{r_{k*}}, where k* is the largest k such that p_{r_k} ≤ (k/m)α. This keeps the FDR ≤ α under independence (proof not given). Compare the above with Holm's adjustment to control the FWER, the step-down version of Bonferroni, which is
π_{r_i} = max_{k ∈ {1,…,i}} {min((m − k + 1) p_{r_k}, 1)}.
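A sketch of the Benjamini-Hochberg step-up adjustment given above (illustrative code, not from the lecture):

```python
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg step-up adjusted p-values (controls FDR under independence)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)                                             # r_1, ..., r_m
    adj_sorted = np.minimum(m * p[order] / np.arange(1, m + 1), 1.0)  # min(m p_{r_k}/k, 1)
    # step-up: take minima from the largest p-value downwards
    adj_sorted = np.minimum.accumulate(adj_sorted[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = adj_sorted
    return adj

print(bh_adjust([0.0001, 0.004, 0.03, 0.2]))
# reject every gene whose adjusted p-value is <= alpha
```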

Positive false discovery rate (Storey, 2001, independent case)
A new definition of FDR, called the positive false discovery rate (pFDR):
pFDR = E(V/R | R > 0).
The logic behind this is that in practice at least one gene should be expected to be differentially expressed. The adjusted p-values (called q-values in Storey's paper) control the pFDR:
π_{r_i} = min_{k ∈ {i,…,m}} {(m/k) p_{r_k} π_0}.
Note that π_0 = m_0/m can be estimated, for a suitable λ, by
π_0 = #{p_i > λ} / {(1 − λ) m}.
One choice for λ is 1/2; another is the median of the observed (unadjusted) p-values.
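A sketch of the q-value calculation with π_0 estimated at a fixed λ = 0.5, as described above (illustrative code, not from Storey's software):

```python
import numpy as np

def qvalues(p, lam=0.5):
    """q-values with pi_0 estimated as #{p_i > lambda} / ((1 - lambda) m)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    pi0 = min(1.0, (p > lam).sum() / ((1.0 - lam) * m))
    order = np.argsort(p)
    q_sorted = np.minimum(pi0 * m * p[order] / np.arange(1, m + 1), 1.0)
    q_sorted = np.minimum.accumulate(q_sorted[::-1])[::-1]   # min over k in {i, ..., m}
    q = np.empty(m)
    q[order] = q_sorted
    return q
```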

Positive false discovery rate (Storey, 2001, dependent case)
In order to incorporate dependence, we need to assume identical distributions. Specify δ_0 to be a small number, say 0.2, such that most t-statistics for null hypotheses fall in (−δ_0, δ_0), and δ to be a large number, say 3, such that we reject the hypotheses whose t-statistics exceed δ in absolute value. For the original data, find W = #{i: |t_i| ≤ δ_0} and R = #{i: |t_i| ≥ δ}. We then do B permutations; for each one, b = 1, …, B, we compute W_b = #{i: |t_{i,b}| ≤ δ_0} and R_b = #{i: |t_{i,b}| ≥ δ}. Then we can compute the proportion of genes expected to be null,
π_0 = W / {(W_1 + W_2 + … + W_B)/B},
and an estimate of the pFDR at the threshold δ is
π_0 {(R_1 + R_2 + … + R_B)/B} / R.
Further details can be found in the references.
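A rough sketch of this permutation-based pFDR estimate (illustrative code; t_obs and T_perm are as in the maxT sketch above, and it assumes at least one rejection, R > 0):

```python
import numpy as np

def pfdr_estimate(t_obs, T_perm, delta0=0.2, delta=3.0):
    """Permutation-based pFDR estimate at the rejection threshold |t| >= delta."""
    W = (np.abs(t_obs) <= delta0).sum()            # observed "null-looking" genes
    R = (np.abs(t_obs) >= delta).sum()             # observed rejections (assumed > 0)
    W_b = (np.abs(T_perm) <= delta0).sum(axis=0)   # same counts within each permutation
    R_b = (np.abs(T_perm) >= delta).sum(axis=0)
    pi0 = W / W_b.mean()                           # proportion of genes expected to be null
    return pi0 * R_b.mean() / R                    # estimated pFDR at delta
```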

Discussion
The minP adjustment seems more conservative than the maxT adjustment, but is essentially model-free. With the Callow data, we see that the adjusted minP values are very discrete; it seems that 12,870 permutations are not enough for 6,000 tests. With the Golub data, we see that the number of permutations matters. Discreteness is a real issue here too, but we do have enough permutations. The same ideas extend to other statistics: Wilcoxon, paired t, F, blocked F. The same speed-up works with the bootstrap.

Discussion, ctd.
The major question in practice: control of the FWER or some form of FDR? In the first case, do we use minP, maxT or something else? In the second case, FDR, pFDR or something else? If minP is to be used, we need guidelines for its use in terms of sample sizes and numbers of genes. Another approach is empirical Bayes; there are links with the pFDR.

Acknowledgments
The multiple testing section is based on Y. Ge (Lecture 8 in Stat 246, Statistics in Genetics) and S. Dudoit (Bioconductor short course, Lecture 2).