Multiple Testing Procedures Examples and Software Implementation.


Multiple Testing in Action
Examples from the new book: S. Dudoit and M. J. van der Laan, Multiple Testing Procedures with Applications to Genomics (2007).

Multiple Testing Software
R package: multtest

Main functions: mt.rawp2adjp()
Computes adjusted p-values for simple (marginal) FWER- and FDR-controlling procedures from a vector of raw (unadjusted) p-values.
Possible methods:
– Bonferroni: single-step adjusted p-values for strong control of the FWER.
– Holm (1979): step-down adjusted p-values for strong control of the FWER.
– Hochberg (1988): step-up adjusted p-values for strong control of the FWER (for raw (unadjusted) p-values satisfying the Simes inequality).
– Sidak single-step: adjusted p-values for strong control of the FWER (for positive orthant dependent test statistics).
– Sidak step-down: adjusted p-values for strong control of the FWER (for positive orthant dependent test statistics).
– BH: adjusted p-values for the Benjamini & Hochberg (1995) step-up FDR-controlling procedure (independent and positive regression dependent test statistics).
– BY: adjusted p-values for the Benjamini & Yekutieli (2001) step-up FDR-controlling procedure (general dependency structures).
Returns the adjusted p-values and a rank index.
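The marginal procedures listed above differ only in how each sorted raw p-value is scaled and in whether a running maximum (step-down) or running minimum (step-up) is taken. The following is a minimal from-scratch sketch in Python of five of them. It is not the multtest implementation; the function name adjust_pvalues and the returned dictionary keys are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def adjust_pvalues(rawp: Sequence[float]) -> Dict[str, List[float]]:
    """Illustrative marginal adjusted p-values (hypothetical helper, not multtest)."""
    m = len(rawp)
    order = sorted(range(m), key=lambda i: rawp[i])  # indices by ascending p-value
    p = [rawp[i] for i in order]

    # Bonferroni (single-step): multiply every p-value by m, cap at 1.
    bonf = [min(1.0, m * x) for x in p]

    # Holm (step-down): running maximum of (m - k) * p_(k), smallest p first.
    holm, running = [], 0.0
    for k, x in enumerate(p):
        running = max(running, min(1.0, (m - k) * x))
        holm.append(running)

    # Step-up procedures: running minimum, starting from the largest p-value.
    harmonic = sum(1.0 / j for j in range(1, m + 1))  # BY correction constant
    hoch, bh, by = [0.0] * m, [0.0] * m, [0.0] * m
    run_hoch = run_bh = run_by = 1.0
    for k in range(m - 1, -1, -1):
        run_hoch = min(run_hoch, (m - k) * p[k])          # Hochberg (1988)
        run_bh = min(run_bh, m * p[k] / (k + 1))          # Benjamini & Hochberg (1995)
        run_by = min(run_by, harmonic * m * p[k] / (k + 1))  # Benjamini & Yekutieli (2001)
        hoch[k], bh[k], by[k] = run_hoch, run_bh, run_by

    # Map the sorted results back to the original order of rawp.
    def unsort(sorted_vals: List[float]) -> List[float]:
        out = [0.0] * m
        for rank, idx in enumerate(order):
            out[idx] = sorted_vals[rank]
        return out

    return {
        "Bonferroni": unsort(bonf),
        "Holm": unsort(holm),
        "Hochberg": unsort(hoch),
        "BH": unsort(bh),
        "BY": unsort(by),
    }
```

Calling adjust_pvalues([0.01, 0.04, 0.03, 0.005]) returns one list per procedure, in the same order as the input. The two Sidak variants are omitted to keep the sketch short.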

Main functions: MTP()
A user-level function to perform multiple testing procedures (MTP).
Available tests (robust versions are available for t-tests and F-tests):
– One-sample t-test
– Two-sample t-test (equal variances, unequal variances, and paired)
– F-test (including block designs)
– lm.XvsZ: t-statistic for the coefficients of X_j ~ Z, for each gene (X_j) in the matrix
– lm.YvsXZ: t-statistic for the coefficients of Y ~ X_j + Z, where Z are additional covariates
– coxph.YvsXZ: same as lm.YvsXZ, but for Cox proportional hazards survival models
Error rates controlled:
– FWER
– gFWER (generalized FWER)
– FDR
– TPPFP (tail probability for the proportion of false positives)
Multiple testing methods:
– single-step maxT
– single-step minP
– step-down maxT
– step-down minP
Bootstrap and permutation null distributions are available.
Returns estimates, statistics, raw and adjusted p-values, etc.

Software Example
Objective: Identify differentially expressed genes between B-cell acute lymphoblastic leukemia (ALL) patients with the BCR/ABL fusion and cytogenetically normal B-cell ALL patients.
– BCR/ABL is one of the most frequent cytogenetic abnormalities in human leukemia.
– It is known to be highly expressed in chronic myeloid leukemia (CML) and acute myeloid leukemia (AML); studies are investigating its prognostic relevance in B-cell ALL patients.
– Goal: identify differentially expressed genes that distinguish BCR/ABL ALL patients from normal ALL patients.
Data:
– Available online in the Bioconductor experimental data package ALL.
– Data are reduced to only B-cell ALL samples of the BCR/ABL or NEG (normal) molecular types.
– 79 patients in total: 37 BCR/ABL and 42 NEG.
– The probe set (12,625) is filtered according to von Heydebreck et al. (2004) and mapped to genes, leaving 2,073 genes.

Single-step maxT procedure using MTP()
Based on two-sample Welch t-statistics and non-parametric estimation of the null distribution from B = 5,000 bootstrap samples.
X = gene set, Y = BCR/ABL classification, seed = 999
SSmaxT is an object of class MTP with attributes; summary, print, and plot methods are available.
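As a rough illustration of what the single-step maxT adjustment computes, the Python sketch below takes observed test statistics and a matrix of null statistics (one row per resample, one column per gene) and returns, for each gene, the proportion of resamples whose largest absolute null statistic reaches the observed absolute statistic. It assumes a two-sided test and that the null matrix has already been built (for example from centered bootstrap statistics). The names welch_t and ss_maxt_adjp are hypothetical, and this is not the code MTP() runs.

```python
from statistics import mean, variance
from typing import List, Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Welch t-statistic (unequal variances); needs >= 2 values per group."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def ss_maxt_adjp(t_obs: Sequence[float],
                 null_stats: Sequence[Sequence[float]]) -> List[float]:
    """Single-step maxT adjusted p-values from a B x m matrix of null statistics."""
    n_resamples = len(null_stats)
    # For every resample, keep the maximum absolute statistic across all genes.
    max_abs = [max(abs(z) for z in row) for row in null_stats]
    # Adjusted p-value: how often that maximum is at least as extreme as |t_j|.
    return [sum(mx >= abs(t) for mx in max_abs) / n_resamples for t in t_obs]
```

Because every gene is compared against the same distribution of maxima, genes with larger absolute statistics always receive smaller (or equal) adjusted p-values.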

maxT Results
summary(SSmaxT)
print(SSmaxT)

plot(SSmaxT)

Single-step minP procedure using MTP()
If keep.nulldist=TRUE was set in the original MTP() call, the MTP object can be updated to apply an alternative multiple testing procedure.
summary(SSminP)
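The reason a stored null distribution can be reused is that minP needs nothing beyond the same matrix of null statistics: it converts each column to null p-values and compares the observed raw p-values with the per-resample minimum. The Python sketch below illustrates this under stated assumptions (two-sided test, p-values estimated as simple proportions, ties counted as exceedances). The function name ss_minp_adjp is hypothetical, and the details may differ from what MTP() does internally.

```python
from typing import List, Sequence, Tuple


def ss_minp_adjp(t_obs: Sequence[float],
                 null_stats: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (raw p-values, single-step minP adjusted p-values)."""
    n_resamples = len(null_stats)
    n_genes = len(t_obs)
    # Absolute null statistics, stored column by column (one column per gene).
    cols = [[abs(row[j]) for row in null_stats] for j in range(n_genes)]

    def tail_prob(col: List[float], value: float) -> float:
        # Proportion of null statistics in this column at least as large as value.
        return sum(z >= value for z in col) / n_resamples

    # Raw p-value of each gene from its own marginal null distribution.
    rawp = [tail_prob(cols[j], abs(t_obs[j])) for j in range(n_genes)]

    # Null p-value of every entry, then the minimum across genes per resample.
    min_p = [
        min(tail_prob(cols[j], cols[j][b]) for j in range(n_genes))
        for b in range(n_resamples)
    ]

    # Adjusted p-value: how often the minimum null p-value is <= the raw p-value.
    adjp = [sum(mp <= p for mp in min_p) / n_resamples for p in rawp]
    return rawp, adjp
```

This direct version costs on the order of B squared times the number of genes, so it is only meant to show the logic on small inputs.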

minP Results
print(SSminP)

plot(SSminP)

Comparing Single-step minP and maxT Results
At FWER level α = 0.05:
– maxT identifies 13 genes
– minP identifies 25 genes
12 genes are identified by both methods.

FWER-controlling marginal multiple testing using mt.rawp2adjp()
Bootstrap unadjusted p-values are provided by the MTP() call (SSmaxT).
Apply marginal FWER-controlling procedures (Bonferroni, Holm, and Hochberg) using mt.rawp2adjp().

FWER-controlling marginal multiple testing using mt.rawp2adjp()
Compare the number of rejected null hypotheses and their ranks at various α cut-offs.
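Counting rejections at several cut-offs amounts to counting, for each procedure, how many adjusted p-values fall at or below each α. A small Python sketch of that tabulation follows. The function name count_rejections and the example values are made up for illustration and are not results from the ALL analysis.

```python
from typing import Dict, List, Sequence


def count_rejections(adjp: Dict[str, Sequence[float]],
                     alphas: Sequence[float]) -> Dict[str, List[int]]:
    """Number of hypotheses with adjusted p-value <= alpha, per procedure and cut-off."""
    return {
        name: [sum(p <= alpha for p in pvals) for alpha in alphas]
        for name, pvals in adjp.items()
    }


# Illustrative adjusted p-values for four hypotheses (invented numbers).
example_adjp = {
    "Bonferroni": [0.04, 0.16, 0.12, 0.02],
    "Holm": [0.03, 0.06, 0.06, 0.02],
    "Hochberg": [0.03, 0.04, 0.04, 0.02],
}
example_counts = count_rejections(example_adjp, alphas=[0.01, 0.05, 0.10])
```

Each list in example_counts is non-decreasing in α, since raising the cut-off can only add rejections.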

Comparison Plots

Summary

MTPs

Acknowledgments
Sandrine Dudoit, who provided the slides and examples for this presentation
Mark van der Laan

References for Section 3
