Multiple Testing in the Survival Analysis of Microarray Data

José A. Correa, Florida Atlantic University
Sandrine Dudoit, University of California, Berkeley
Darlene R. Goldstein, École Polytechnique Fédérale de Lausanne
Contact: linkage@stat.berkeley.edu
Software: http://www.math.fau.edu/correa/

cDNA gene expression data
Data on m genes for n mRNA samples. The entry for gene i in mRNA sample j is the expression level of gene i in that sample, i.e. the (normalized) log(Red intensity / Green intensity).

Gene | sample1 | sample2 | sample3 | sample4 | sample5 | …
1 | 0.46 | 0.30 | 0.80 | 1.51 | 0.90 | …
2 | -0.10 | 0.49 | 0.24 | 0.06 | 0.46 | …
3 | 0.15 | 0.74 | 0.04 | 0.10 | 0.20 | …
4 | -0.45 | -1.03 | -0.79 | -0.56 | -0.32 | …
5 | -0.06 | 1.06 | 1.35 | 1.09 | -1.09 | …

Multiple Testing Problem
Simultaneously test m null hypotheses, one for each gene j:
$H_j$: no association between the expression level of gene j and the covariate or response.
Because microarray experiments monitor the expression levels of thousands of genes simultaneously, there is a large multiplicity issue.
We would like some sense of how 'surprising' the observed results are.
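A minimal Python sketch of the multiplicity issue, under assumptions that are not from the slides: all m nulls are true, the tests are independent, and each p-value is therefore Uniform(0, 1). Testing every gene at an unadjusted α = 0.05 then produces about mα false positives, and the probability of at least one is 1 - (1 - α)^m. The value m = 4026 only mirrors the size of the lymphoma dataset listed later; the helper `count_false_positives` is hypothetical.

```python
import random

def count_false_positives(m, alpha, seed=0):
    """Simulate m independent true nulls (p-values ~ Uniform(0, 1))
    and count how many are rejected at the unadjusted level alpha."""
    rng = random.Random(seed)
    pvals = [rng.random() for _ in range(m)]
    return sum(p <= alpha for p in pvals)

m, alpha = 4026, 0.05
observed = count_false_positives(m, alpha)
expected = m * alpha              # E(V) = m * alpha when every null is true
p_any = 1 - (1 - alpha) ** m      # P(V >= 1) for independent tests
print(f"false positives: {observed} (expected {expected:.1f})")
print(f"P(at least one false positive) = {p_any:.6f}")
```

With thousands of genes, at least one false positive is essentially certain, which is why some form of adjustment is needed.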

Hypothesis Truth vs. Decision
Truth \ Decision | # not rejected | # rejected | Total
# true H | U | V (F+, false positives) | m0
# non-true H | T | S | m1
Total | m - R | R | m

Type I (False Positive) Error Rates
Per-family error rate: $\mathrm{PFER} = E(V)$
Per-comparison error rate: $\mathrm{PCER} = E(V)/m$
Family-wise error rate: $\mathrm{FWER} = \Pr(V \ge 1)$
False discovery rate: $\mathrm{FDR} = E(Q)$, where $Q = V/R$ if $R > 0$ and $Q = 0$ if $R = 0$
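The four error rates can be estimated by Monte Carlo once V and R are counted from the truth-vs-decision table. The setup below is an illustrative assumption, not the slides' analysis: m = 100 one-sided z-tests, 90 true nulls with z ~ N(0, 1), 10 false nulls with z ~ N(3, 1), and each hypothesis rejected at the unadjusted level α = 0.05. The function names are hypothetical.

```python
import math
import random

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def error_counts(is_true_null, rejected):
    """Return (V, R): V = true nulls rejected, R = total rejections."""
    V = sum(t and r for t, r in zip(is_true_null, rejected))
    R = sum(rejected)
    return V, R

def estimate_error_rates(m=100, m0=90, effect=3.0, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo estimates of PFER, PCER, FWER and FDR for unadjusted
    one-sided z-tests at level alpha (illustrative setup)."""
    rng = random.Random(seed)
    is_true_null = [j < m0 for j in range(m)]
    sum_V = sum_any = sum_Q = 0.0
    for _ in range(reps):
        # True nulls: z ~ N(0, 1); false nulls: z ~ N(effect, 1)
        z = [rng.gauss(0.0 if t else effect, 1.0) for t in is_true_null]
        rejected = [normal_sf(zj) <= alpha for zj in z]
        V, R = error_counts(is_true_null, rejected)
        sum_V += V
        sum_any += V >= 1
        sum_Q += V / R if R > 0 else 0.0   # Q = V/R, and Q = 0 when R = 0
    pfer = sum_V / reps
    return {"PFER": pfer, "PCER": pfer / m,
            "FWER": sum_any / reps, "FDR": sum_Q / reps}

rates = estimate_error_rates()
print(rates)
```

Because $V/m \le 1\{V \ge 1\} \le V$ and $Q \le 1\{V \ge 1\}$ in every replicate, the estimates always respect the orderings PCER ≤ FWER ≤ PFER and FDR ≤ FWER stated on the comparison slide.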

Strong vs. Weak Control
All probabilities are conditional on which hypotheses are true.
Strong control refers to control of the Type I error rate under any combination of true and false nulls.
Weak control refers to control of the Type I error rate only under the complete null hypothesis (i.e. all nulls true).
In general, weak control without other safeguards is unsatisfactory.
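To see why weak control alone is unsatisfactory, consider a hypothetical two-stage rule (`gated_procedure`, not from the slides): run a Bonferroni global test of the complete null and, if it rejects, reject every $H_j$ with unadjusted $p_j \le \alpha$. Under the complete null its FWER is at most α, but as soon as one null is clearly false the gate always opens and the true nulls are tested without any adjustment. The p-value models below (Uniform for true nulls, 1e-12 for false nulls) are assumptions for illustration.

```python
import random

def gated_procedure(pvals, alpha):
    """Hypothetical rule: Bonferroni global test of the complete null, then,
    if it rejects, reject every H_j whose unadjusted p_j <= alpha."""
    m = len(pvals)
    if min(pvals) > alpha / m:
        return [False] * m
    return [p <= alpha for p in pvals]

def simulate_fwer(m, m0, alpha=0.05, reps=4000, seed=2):
    """Estimate FWER = P(V >= 1) when the first m0 nulls are true
    (Uniform p-values) and the remaining ones are false (p-value 1e-12)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        pvals = [rng.random() for _ in range(m0)] + [1e-12] * (m - m0)
        rejected = gated_procedure(pvals, alpha)
        hits += any(rejected[:m0])       # was any true null rejected?
    return hits / reps

print("complete null:", simulate_fwer(m=20, m0=20))  # weak control: about alpha or less
print("partial null: ", simulate_fwer(m=20, m0=19))  # strong control fails
```

Under the partial null the FWER is about 1 - 0.95^19, far above the nominal 0.05.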

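Comparison of Type I Error Rates
In general, for a given multiple testing procedure, $\mathrm{PCER} \le \mathrm{FWER} \le \mathrm{PFER}$ and $\mathrm{FDR} \le \mathrm{FWER}$, with $\mathrm{FDR} = \mathrm{FWER}$ under the complete null.
These follow from $V/m \le 1\{V \ge 1\} \le V$ and $Q \le 1\{V \ge 1\}$; under the complete null $V = R$, so $Q = 1\{V \ge 1\}$.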

Adjusted p-values ($p^*$)
If interest is in controlling, e.g., the FWER, the adjusted p-value for hypothesis $H_j$ is
$p_j^* = \inf\{\alpha : H_j \text{ is rejected at FWER} = \alpha\}$.
Hypothesis $H_j$ is rejected at FWER $\alpha$ if $p_j^* \le \alpha$.
Adjusted p-values for other Type I error rates are defined similarly.
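The infimum definition can be applied directly to any rejection rule that is monotone in α (rejecting at α implies rejecting at every larger level): bisect on α for the smallest level at which $H_j$ is rejected. The sketch below is a generic illustration with hypothetical helper names; plugging in the Bonferroni rule recovers $\min(m p_j, 1)$ from the next slide.

```python
def adjusted_pvalue(rejects, pvals, j, tol=1e-10):
    """Approximate p_j* = inf{alpha : H_j is rejected at level alpha} by bisection.
    Assumes rejects(pvals, alpha) -> list of bools is monotone in alpha."""
    if not rejects(pvals, 1.0)[j]:
        return 1.0                  # not rejected even at alpha = 1: cap at 1
    lo, hi = 0.0, 1.0               # H_j is rejected at hi; search below it
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rejects(pvals, mid)[j]:
            hi = mid
        else:
            lo = mid
    return hi

def bonferroni_rejects(pvals, alpha):
    """Bonferroni single-step rule: reject H_j when m * p_j <= alpha."""
    m = len(pvals)
    return [m * p <= alpha for p in pvals]

pvals = [0.001, 0.01, 0.04, 0.3]
approx = [adjusted_pvalue(bonferroni_rejects, pvals, j) for j in range(len(pvals))]
print([round(a, 6) for a in approx])  # [0.004, 0.04, 0.16, 1.0], i.e. min(m * p_j, 1)
```

In practice closed-form adjusted p-values are used when available; the bisection view simply makes the definition concrete.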

Some Advantages of p-value Adjustment
The test level (size) does not need to be determined in advance.
Some procedures are most easily described in terms of their adjusted p-values.
Adjusted p-values are usually easily estimated using resampling.
Procedures can be readily compared based on the corresponding adjusted p-values.

A Little Notation
For hypothesis $H_j$, $j = 1, \dots, m$:
observed test statistic: $t_j$
observed unadjusted p-value: $p_j$
Ordering of the observed (absolute) $t_j$: $\{r_j\}$ such that $|t_{r_1}| \ge |t_{r_2}| \ge \dots \ge |t_{r_m}|$
Ordering of the observed $p_j$: $\{r_j\}$ such that $p_{r_1} \le p_{r_2} \le \dots \le p_{r_m}$
Denote the corresponding random variables by upper-case letters ($T$, $P$).

Control of the FWER
Bonferroni single-step adjusted p-values: $p_j^* = \min(m p_j, 1)$
Holm (1979) step-down adjusted p-values: $p_{r_j}^* = \max_{k = 1, \dots, j} \{\min((m - k + 1) p_{r_k}, 1)\}$
Hochberg (1988) step-up adjusted p-values (Simes inequality): $p_{r_j}^* = \min_{k = j, \dots, m} \{\min((m - k + 1) p_{r_k}, 1)\}$
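The three formulas translate into a sort followed by a running maximum (Holm, step-down) or a running minimum from the largest p-value (Hochberg, step-up). This is a minimal pure-Python sketch of those formulas with hypothetical function names; in R, `p.adjust` offers the same adjustments.

```python
def bonferroni(pvals):
    """Single-step Bonferroni adjusted p-values: min(m * p_j, 1)."""
    m = len(pvals)
    return [min(m * p, 1.0) for p in pvals]

def holm(pvals):
    """Holm (1979) step-down: p*_{r_j} = max_{k<=j} min((m-k+1) p_{r_k}, 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])  # r_1, ..., r_m (smallest p first)
    adj = [0.0] * m
    running_max = 0.0
    for k, idx in enumerate(order):                   # 0-based k, so m - k = m - (k+1) + 1
        running_max = max(running_max, min((m - k) * pvals[idx], 1.0))
        adj[idx] = running_max
    return adj

def hochberg(pvals):
    """Hochberg (1988) step-up: p*_{r_j} = min_{k>=j} min((m-k+1) p_{r_k}, 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    adj = [0.0] * m
    running_min = 1.0
    for k in range(m - 1, -1, -1):                    # walk down from the largest p-value
        idx = order[k]
        running_min = min(running_min, min((m - k) * pvals[idx], 1.0))
        adj[idx] = running_min
    return adj

pvals = [0.01, 0.04, 0.03, 0.005]
for name, fn in [("Bonferroni", bonferroni), ("Holm", holm), ("Hochberg", hochberg)]:
    print(name, [round(p, 4) for p in fn(pvals)])
# Bonferroni [0.04, 0.16, 0.12, 0.02]
# Holm [0.03, 0.06, 0.06, 0.02]
# Hochberg [0.03, 0.04, 0.04, 0.02]
```

For these p-values the adjustments satisfy Hochberg ≤ Holm ≤ Bonferroni elementwise.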

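Control of the FWER
Westfall & Young (1993) step-down minP adjusted p-values: $p_{r_j}^* = \max_{k = 1, \dots, j} \{\Pr(\min_{l \in \{r_k, \dots, r_m\}} P_l \le p_{r_k} \mid H_0^C)\}$
Westfall & Young (1993) step-down maxT adjusted p-values: $p_{r_j}^* = \max_{k = 1, \dots, j} \{\Pr(\max_{l \in \{r_k, \dots, r_m\}} |T_l| \ge |t_{r_k}| \mid H_0^C)\}$
(For minP the $r_j$ order the p-values; for maxT they order the absolute test statistics.)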
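The maxT formula can be estimated by permutation: for each permutation, form successive maxima of $|T_l|$ over $\{r_k, \dots, r_m\}$, count how often they reach $|t_{r_k}|$, then enforce step-down monotonicity. The sketch below assumes B random permutations (not all of them) and, to stay short, swaps in a Pearson correlation between expression and a numeric response in place of the Cox Wald statistic; the data are simulated and the function names are hypothetical.

```python
import random

def maxT_adjusted(t_obs, t_perm):
    """Westfall & Young step-down maxT adjusted p-values.
    t_obs:  observed statistics t_1..t_m.
    t_perm: B vectors of statistics recomputed on permuted data (H0C)."""
    m, B = len(t_obs), len(t_perm)
    # r_1, ..., r_m: genes ordered so that |t_r1| >= |t_r2| >= ... >= |t_rm|
    order = sorted(range(m), key=lambda j: -abs(t_obs[j]))
    counts = [0] * m
    for tb in t_perm:
        u = 0.0
        # Successive maxima u_k = max_{l in {r_k..r_m}} |T_l|, built from the bottom up
        for k in range(m - 1, -1, -1):
            u = max(u, abs(tb[order[k]]))
            if u >= abs(t_obs[order[k]]):
                counts[k] += 1
    adj = [0.0] * m
    running_max = 0.0
    for k in range(m):
        # Step-down monotonicity: take the max over steps 1..k
        running_max = max(running_max, counts[k] / B)
        adj[order[k]] = running_max
    return adj

def correlation(x, y):
    """Pearson correlation, a stand-in for the Cox Wald statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Simulated toy data: gene 0 tracks the response y, the other genes are noise
rng = random.Random(3)
n, m, B = 20, 40, 400
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
expr = [[yi + rng.gauss(0.0, 0.5) for yi in y]]
expr += [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m - 1)]

t_obs = [correlation(row, y) for row in expr]
t_perm = []
for _ in range(B):
    y_perm = y[:]
    rng.shuffle(y_perm)   # permuting y breaks any gene-response link (H0C)
    t_perm.append([correlation(row, y_perm) for row in expr])

adj = maxT_adjusted(t_obs, t_perm)
print("gene 0:", adj[0], " smallest noise-gene value:", min(adj[1:]))
```

Permuting the response while keeping each individual's expression profile intact preserves the correlation among genes, which is how the procedure takes the joint distribution of the statistics into account.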

Westfall & Young (1993) Adjusted p-values
Step-down procedures: successively smaller adjustments at each step.
Take into account the joint distribution of the test statistics.
Less conservative than Bonferroni, Holm, or Hochberg adjusted p-values.
Can be estimated by resampling, but computer-intensive (especially for minP).

maxT vs. minP
The maxT and minP adjusted p-values are the same when the test statistics are identically distributed (id).
When the test statistics are not id, maxT adjustments may be unbalanced (not all tests contribute equally to the adjustment).
maxT is more computationally tractable than minP.
maxT can be more powerful in 'small n, large m' situations.
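Part of what makes minP more expensive is that it needs a raw p-value for every gene under every permutation, not just the statistic. The sketch below assumes those raw p-values are estimated by reusing the same B permutations, which costs on the order of B²m comparisons, and then applies the minP step-down formula. The input format (observed statistics plus B permuted statistic vectors) and the function names are assumptions for illustration.

```python
def raw_perm_pvalues(t_obs, t_perm):
    """Raw permutation p-values p_j = #{b : |T_bj| >= |t_j|} / B, computed for the
    observed statistics and, reusing the same permutations, for each permuted vector."""
    m, B = len(t_obs), len(t_perm)

    def pval(j, value):
        return sum(abs(tb[j]) >= abs(value) for tb in t_perm) / B

    p_obs = [pval(j, t_obs[j]) for j in range(m)]
    p_perm = [[pval(j, tb[j]) for j in range(m)] for tb in t_perm]  # O(B^2 m) work
    return p_obs, p_perm

def minP_adjusted(t_obs, t_perm):
    """Westfall & Young step-down minP adjusted p-values."""
    p_obs, p_perm = raw_perm_pvalues(t_obs, t_perm)
    m, B = len(p_obs), len(p_perm)
    order = sorted(range(m), key=lambda j: p_obs[j])   # p_r1 <= ... <= p_rm
    counts = [0] * m
    for pb in p_perm:
        q = 1.0
        # Successive minima q_k = min_{l in {r_k..r_m}} P_l, built from the bottom up
        for k in range(m - 1, -1, -1):
            q = min(q, pb[order[k]])
            if q <= p_obs[order[k]]:
                counts[k] += 1
    adj = [0.0] * m
    running_max = 0.0
    for k in range(m):                                 # step-down monotonicity
        running_max = max(running_max, counts[k] / B)
        adj[order[k]] = running_max
    return adj

# Tiny hand-checkable example: 2 genes, B = 4 permutations
t_obs = [3.0, 1.0]
t_perm = [[0.5, 0.2], [3.5, 0.1], [0.2, 2.0], [0.1, 0.3]]
print(minP_adjusted(t_obs, t_perm))  # [0.5, 0.5]
```

Because the permutation p-values are themselves discrete, the minimum over genes is compared on the p-value scale, which is what keeps the adjustment balanced across tests that are not identically distributed.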

Control of the FDR
Benjamini & Hochberg (1995): step-up procedure that controls the FDR under independence and certain types of positive dependence
$p_{r_j}^* = \min_{k = j, \dots, m} \{\min(\frac{m}{k} p_{r_k}, 1)\}$
Benjamini & Yekutieli (2001): conservative step-up procedure that controls the FDR under general dependency structures
$p_{r_j}^* = \min_{k = j, \dots, m} \{\min(\frac{m \sum_{i=1}^{m} 1/i}{k} p_{r_k}, 1)\}$
Yekutieli & Benjamini (1999): resampling-based adjusted p-values for controlling the FDR under certain types of dependency structures
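Both FDR formulas share the same step-up structure: multiply the k-th smallest p-value by a factor over k, cap at 1, and take a running minimum from the largest p-value down. Benjamini & Yekutieli only add the constant $\sum_{i=1}^{m} 1/i$. A minimal sketch with hypothetical function names:

```python
def _step_up(pvals, c_m):
    """p*_{r_j} = min_{k>=j} min(m * c_m / k * p_{r_k}, 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    adj = [0.0] * m
    running_min = 1.0
    for k in range(m, 0, -1):                 # 1-based rank k = m, ..., 1
        idx = order[k - 1]
        running_min = min(running_min, min(m * c_m / k * pvals[idx], 1.0))
        adj[idx] = running_min
    return adj

def bh_adjusted(pvals):
    """Benjamini & Hochberg (1995) adjusted p-values."""
    return _step_up(pvals, c_m=1.0)

def by_adjusted(pvals):
    """Benjamini & Yekutieli (2001): BH with the extra factor sum_{i=1}^m 1/i."""
    m = len(pvals)
    return _step_up(pvals, c_m=sum(1.0 / i for i in range(1, m + 1)))

pvals = [0.01, 0.04, 0.03, 0.005]
print("BH", [round(p, 4) for p in bh_adjusted(pvals)])  # BH [0.02, 0.04, 0.04, 0.02]
print("BY", [round(p, 4) for p in by_adjusted(pvals)])  # BY [0.0417, 0.0833, 0.0833, 0.0417]
```

The BY factor grows like log m, so for thousands of genes it is noticeably more conservative than BH.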

Identification of Genes Associated with Survival
Data: survival $y_i$ and gene expression $x_{ij}$ for individuals $i = 1, \dots, n$ and genes $j = 1, \dots, m$.
Fit a Cox model for each gene singly: $h_i(t) = h_0(t) \exp(\beta_j x_{ij})$.
For any gene $j = 1, \dots, m$, we can test $H_j: \beta_j = 0$.
Complete null $H_0^C$: $\beta_j = 0$ for all $j = 1, \dots, m$.
The $H_j$ are tested on the basis of the Wald statistics $t_j$ and their associated p-values $p_j$.

Datasets
Lymphoma (Alizadeh et al.): 40 individuals, 4026 genes
Melanoma (Bittner et al.): 15 individuals, 3613 genes
Both available at http://lpgprot101.nci.nih.gov:8080/GEAW

Results: Lymphoma

Results: Melanoma

Other Proposals from the Microarray Literature
'Neighborhood Analysis', Golub et al.: in general, gives only weak control of the FWER
'Significance Analysis of Microarrays (SAM)' (2 versions):
Efron et al. (2000): weak control of the PFER
Tusher et al. (2001): strong control of the PFER
SAM also estimates an 'FDR', but this 'FDR' is defined as $E(V \mid H_0^C)/R$, not $E(V/R)$.

Controversies
Whether multiple testing methods (adjustments) should be applied at all
Which tests should be included in the family (e.g. all tests performed within a single experiment, which requires defining 'experiment')
Alternatives: Bayesian approach, meta-analysis

Situations where inflated error rates are a concern
It is plausible that all nulls may be true.
A serious claim will be made whenever any p < 0.05 is found.
Much data manipulation may be performed to find a 'significant' result.
The analysis is planned to be exploratory, but the analyst wishes to claim that 'significant' results are real.
The experiment is unlikely to be followed up before serious actions are taken.

Discussion (I)
Lack of significant findings
Small sample sizes
FWER-controlling procedures may be too stringent in microarray applications.
FDR procedures could perhaps be made even more powerful by taking into account the joint distribution of gene expression levels.

Discussion (II)
Computational considerations:
All computing was done in the R statistical environment (Ihaka and Gentleman).
For maxT, the Cox model analysis was repeated for each of 100,800 random permutations of the survival times.
Exact maximum likelihood calculation took about 60 hours per machine on a cluster of 24 PCs, each with a 1 GHz Pentium III and 256 MB of memory.
Time can be reduced substantially by using a score approximation to obtain parameter estimates, and by calling C code from within R.
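One way a score approximation avoids iterative fitting: the Cox score statistic for $H_j: \beta_j = 0$ needs only risk-set means and variances at $\beta = 0$, so every gene and every permutation costs a single pass. The slides do not say which score approximation was used; this sketch assumes the standard one-covariate score test with Breslow risk sets, and assumes that permuting survival times means shuffling the (time, event) pairs jointly across individuals. The toy data and function names are hypothetical.

```python
import math
import random

def cox_score_z(time, event, x):
    """Score statistic U(0) / sqrt(I(0)) for beta = 0 in a one-covariate Cox
    model (Breslow risk sets); no Newton-Raphson iterations are needed."""
    n = len(time)
    U = I = 0.0
    for i in range(n):
        if not event[i]:
            continue
        risk = [x[k] for k in range(n) if time[k] >= time[i]]
        mean = sum(risk) / len(risk)
        U += x[i] - mean                                        # observed minus expected
        I += sum((v - mean) ** 2 for v in risk) / len(risk)    # risk-set variance
    return U / math.sqrt(I)

def permuted_score_stats(time, event, expr, n_perm, seed=0):
    """Score statistics for every gene under n_perm random permutations of the
    (time, event) pairs across individuals, i.e. draws under H0C."""
    rng = random.Random(seed)
    pairs = list(zip(time, event))
    out = []
    for _ in range(n_perm):
        rng.shuffle(pairs)
        t_b = [t for t, _ in pairs]
        e_b = [e for _, e in pairs]
        out.append([cox_score_z(t_b, e_b, row) for row in expr])
    return out

time = [1, 2, 3, 4, 5, 6, 7, 8]
event = [1, 1, 1, 1, 1, 0, 1, 1]                          # 0 = censored
expr = [[2.0, 1.5, 0.2, 1.0, 0.5, -0.3, 0.1, -1.0],
        [-2.0, -1.5, -0.2, -1.0, -0.5, 0.3, -0.1, 1.0]]
z_obs = [cox_score_z(time, event, row) for row in expr]
z_perm = permuted_score_stats(time, event, expr, n_perm=50)
print([round(z, 3) for z in z_obs], len(z_perm))
```

The observed and permuted statistics have the same shape as the inputs a maxT step-down routine needs, so the exact Wald fits can be replaced by this cheaper statistic inside the permutation loop.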

References
Alizadeh et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503-511
Benjamini and Hochberg (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSSB 57: 289-300
Benjamini and Yekutieli (2001) The control of the false discovery rate in multiple testing under dependency. Annals of Statistics
Bittner et al. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406: 536-540
Efron et al. (2000) Microarrays and their use in a comparative experiment. Tech report, Stats, Stanford
Golub et al. (1999) Molecular classification of cancer. Science 286: 531-537

References
Hochberg (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75: 800-802
Holm (1979) A simple sequentially rejective multiple test procedure. Scand. J. Statistics 6: 65-70
Ihaka and Gentleman (1996) R: A language for data analysis and graphics. J Comp Graph Stats 5: 299-314
Tusher et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98: 5116-5121
Westfall and Young (1993) Resampling-based multiple testing: Examples and methods for p-value adjustment. New York: Wiley
Yekutieli and Benjamini (1999) Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. J Stat Plan Inf 82: 171-196

Acknowledgements
Debashis Ghosh
Erin Conlon