An Overview of Multiple Testing Procedures for Categorical Data
Joe Heyse, IMPACT Conference, November 20, 2014

Abstract
Multiple comparison and multiple endpoint procedures are applied universally in a broad array of experimental settings. In confirmatory clinical trials of candidate drug and vaccine products, the interest is in controlling the family-wise error rate (FWER) at a specified level α. In many other discovery settings, controlling the false discovery rate (FDR) is gaining popularity as an attractive alternative to strict FWER control. Yosef Hochberg made impactful contributions to both FWER methods (Hochberg, 1988) and FDR methods (Benjamini and Hochberg, 1995), which are widely used in biopharmaceutical applications. When one or more of the hypotheses being tested is based on categorical data, it is possible to increase the power of FWER- and FDR-controlling procedures. This talk will trace the development of multiple comparison procedures for categorical data, starting with a proposal by Mantel (1980) and continuing to the development of fully discrete FDR-controlling procedures. Special attention will be given to Hochberg’s contributions. The situation with multiple correlated endpoints will also be discussed. Simulations and theoretical arguments demonstrate the clear power advantages of multiplicity procedures that properly account for the discreteness of the data.

Overview
Yosef Hochberg made impactful contributions to both FWER and FDR methods, which are widely used in biopharmaceutical applications. With discrete data, it is possible to increase the power of FWER- and FDR-controlling procedures. This talk will trace the development of multiple comparison procedures for categorical data, from early FWER procedures to fully discrete FDR-controlling procedures. Special attention will be given to Hochberg’s contributions. The situation with multiple correlated endpoints will also be discussed. Simulations and theoretical arguments demonstrate the clear power advantages of multiplicity procedures that properly account for the discreteness of the data.

Outline
1. Rodent carcinogenicity study circa …
2. Tests based on $P_{\min}$
3. Discrete Bonferroni method
4. Discrete Hochberg stepwise method
5. Non-independent hypotheses
6. Discrete FDR methods
7. Concluding remarks

Summary of Statistical Results from a Long-Term Carcinogenicity Study in Male Mice
Trend P-values are reported as 1-tailed, using the exact permutational distribution. N indicates a 1-tailed P-value for a negative trend.

Multiplicity of Statistical Tests
Liver, hepatocellular carcinoma was only 1 of K = 15 tumor sites encountered. The most extreme individual trend P-value was $P_{(1)} = \ldots$. The question is how likely it is to observe a value as small as $P_{(1)}$ as the most extreme P-value among the K = 15 sites in this study. The discrete nature of the data must be taken into account, since several tumor sites may not be able to achieve a significance level as small as $P_{(1)}$.
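To see why some tumor sites cannot reach small P-values, the attainable one-sided P-values of a single two-group comparison can be enumerated. The Python sketch below assumes a one-sided Fisher exact test (conditioning on the total number of tumor-bearing animals) and hypothetical group sizes; the function name is illustrative and not taken from the talk.

```python
from math import comb


def fisher_upper_tail_pvalues(n_control: int, n_treated: int, total_events: int) -> list[float]:
    """Attainable one-sided (upper-tail) Fisher exact P-values for one 2x2 table.

    With both group sizes and the total number of events fixed, the number of
    events x in the treated group is hypergeometric under H0, and the one-sided
    P-value for an observed x is Pr(X >= x). Returned in order x = lo, ..., hi.
    """
    n = n_control + n_treated
    lo = max(0, total_events - n_control)  # smallest possible treated count
    hi = min(total_events, n_treated)      # largest possible treated count
    denom = comb(n, total_events)
    # Hypergeometric probabilities as exact integers over a common denominator
    weights = [comb(n_treated, x) * comb(n_control, total_events - x) for x in range(lo, hi + 1)]
    # Upper-tail sums: the P-value attached to each possible outcome x
    return [sum(weights[i:]) / denom for i in range(len(weights))]


# Hypothetical tumor site: 50 animals per group, only 2 tumor-bearing animals in total
pvals = fisher_upper_tail_pvalues(50, 50, 2)
print(min(pvals))  # 1225/4950, about 0.247: the smallest P-value this site can produce
```

With only 2 tumors in total, no split of those tumors between the groups can give this site a P-value below about 0.247, so it cannot compete for the most extreme P-value.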

P-value Adjustment Methods

Bonferroni Bound
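The Bonferroni bound for the smallest of K P-values follows from the union bound. Under the null hypotheses,

$$
\Pr\left(P_{(1)} \le p\right)
= \Pr\left(\bigcup_{k=1}^{K} \{P_k \le p\}\right)
\le \sum_{k=1}^{K} \Pr\left(P_k \le p\right)
\le K p ,
$$

so the Bonferroni-adjusted P-value is $\min(1, K\,P_{(1)})$. The last step uses $\Pr(P_k \le p) \le p$, which holds with equality for continuous tests. For a discrete test, $\Pr(P_k \le p)$ can be much smaller than $p$ (it is zero if test $k$ cannot attain a P-value as small as $p$), and this is where discrete adjustments gain power.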

Discrete Adjusted Bonferroni
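One common formulation of the discrete adjustment, assumed here, replaces the Bonferroni term $K p$ with $\sum_k \Pr(P_k \le p)$, where for an exact test $\Pr(P_k \le p)$ is the largest attainable P-value of test $k$ not exceeding $p$ (zero if test $k$ cannot reach $p$). A minimal Python sketch, assuming each test’s attainable P-values are already known; the function names and the three tests are hypothetical.

```python
from bisect import bisect_right


def attainable_cdf(attainable: list[float], p: float) -> float:
    """Pr(P_k <= p) under H0 for a discrete test.

    For an exact test this equals the largest attainable P-value not
    exceeding p, or 0 if the test cannot reach p at all.
    """
    s = sorted(attainable)
    i = bisect_right(s, p)  # number of attainable values <= p
    return s[i - 1] if i > 0 else 0.0


def discrete_bonferroni(p_obs: float, attainable_sets: list[list[float]]) -> float:
    """Discrete adjusted Bonferroni P-value for an observed (minimum) P-value."""
    return min(1.0, sum(attainable_cdf(a, p_obs) for a in attainable_sets))


# Three hypothetical discrete tests, each described by its attainable P-values
sets = [[0.01, 0.1, 1.0], [0.3, 1.0], [0.005, 0.04, 1.0]]
p_min = 0.01
print(min(1.0, len(sets) * p_min))      # ordinary Bonferroni: 3 x 0.01
print(discrete_bonferroni(p_min, sets))  # 0.01 + 0 + 0.005
```

The second test cannot reach 0.01 at all and contributes nothing, and the third can only reach 0.005, so the discrete adjustment is half the ordinary Bonferroni value here.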

Modified Bonferroni for Discrete Data
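Tarone’s (1990) modification removes tests that cannot reach significance from the Bonferroni multiplier. As it is usually stated: with $\alpha_k^*$ the smallest attainable P-value of test $k$, find the smallest integer $m$ such that the number of tests with $\alpha_k^* < \alpha/m$ is at most $m$, then test only those tests, each at level $\alpha/m$. A Python sketch under that reading; the function name and example values are hypothetical.

```python
def tarone_modified_bonferroni(pvalues, min_attainable, alpha=0.05):
    """Return (m, eligible, rejected) for Tarone's modified Bonferroni rule.

    m:        the Bonferroni divisor actually used (m <= K)
    eligible: indices of tests whose smallest attainable P-value is below alpha / m
    rejected: eligible tests with P-value <= alpha / m
    """
    K = len(pvalues)
    if K == 0:
        return 0, [], []
    for m in range(1, K + 1):
        eligible = [k for k in range(K) if min_attainable[k] < alpha / m]
        if len(eligible) <= m:  # always true by m = K, so the loop stops there at the latest
            break
    rejected = [k for k in eligible if pvalues[k] <= alpha / m]
    return m, eligible, rejected


# Hypothetical example: 5 tests, only 2 of which can reach very small P-values
p = [0.002, 0.6, 0.02, 1.0, 0.4]
p_min_attainable = [0.001, 0.3, 0.004, 0.5, 0.2]
print(tarone_modified_bonferroni(p, p_min_attainable))  # (2, [0, 2], [0, 2])
```

Plain Bonferroni would test each hypothesis at 0.05/5 = 0.01 and reject only the first; dividing by m = 2 instead also rejects the third.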

Notes for Discrete Bonferroni

Bonferroni: Probability of Falsely Rejecting 1 or More True Hypotheses for Increasing Numbers of Hypotheses [figure; x-axis: Number of Hypotheses]
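The plotted values are not preserved, but under the assumption of K independent true null hypotheses the quantity on the vertical axis can be computed directly. With continuous tests, each is rejected with probability exactly α/K; a discrete test is rejected with probability equal to its largest attainable P-value not exceeding α/K. A Python sketch (function names hypothetical):

```python
def bonferroni_fwer_continuous(K: int, alpha: float = 0.05) -> float:
    """FWER of Bonferroni for K independent, continuous, true null hypotheses."""
    return 1 - (1 - alpha / K) ** K


def bonferroni_fwer_discrete(attainable_sets: list[list[float]], alpha: float = 0.05) -> float:
    """FWER of Bonferroni for independent, discrete, true null hypotheses.

    Test k rejects with probability F_k(alpha/K): its largest attainable
    P-value that does not exceed alpha/K (0 if none does).
    """
    K = len(attainable_sets)
    prob_no_rejection = 1.0
    for attainable in attainable_sets:
        f = max((x for x in attainable if x <= alpha / K), default=0.0)
        prob_no_rejection *= 1 - f
    return 1 - prob_no_rejection


for K in (1, 5, 10, 20):
    print(K, round(bonferroni_fwer_continuous(K), 4))
```

The continuous curve stays just below α and approaches $1 - e^{-\alpha}$ as K grows, whereas discrete tests that cannot reach α/K contribute nothing, which is why Bonferroni becomes increasingly conservative with discrete data.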

Nucleotide Changes in cDNA Transcripts (Tarone, 1990)
[Table: ordered nucleotide, 1-sided P-value]

Multiplicity Adjustment for cDNA Data
– Unadjusted P-value for Nucleotide 1: P = …
– Bonferroni: P-adj = 9 × P = …
– Tarone adjusted Bonferroni: P-adj = 2 × P = …
– Discrete Bonferroni: P-adj = …

Hochberg (1988) Step-up Procedure
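Hochberg’s (1988) step-up procedure orders the P-values $p_{(1)} \le \dots \le p_{(K)}$, finds the largest $j$ with $p_{(j)} \le \alpha/(K-j+1)$, and rejects $H_{(1)}, \dots, H_{(j)}$. Equivalently, the adjusted P-value is $\tilde p_{(j)} = \min_{i \ge j} \min\{1, (K-i+1)\,p_{(i)}\}$. A minimal Python sketch of the (continuous) adjusted P-values; the function name is illustrative.

```python
def hochberg_adjust(pvalues):
    """Hochberg step-up adjusted P-values, returned in the input order."""
    K = len(pvalues)
    order = sorted(range(K), key=lambda i: pvalues[i])  # indices by ascending P-value
    adjusted = [0.0] * K
    running_min = 1.0  # also caps every adjusted value at 1
    # Step from the largest P-value (rank K) down to the smallest (rank 1),
    # carrying the running minimum of (K - rank + 1) * p_(rank).
    for rank in range(K, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, (K - rank + 1) * pvalues[i])
        adjusted[i] = running_min
    return adjusted


print(hochberg_adjust([0.01, 0.04, 0.03, 0.20]))  # [0.04, 0.08, 0.08, 0.2]
```

The running minimum is what makes the procedure step up: the P-value 0.03 (rank 2) inherits the smaller value 0.08 from rank 3 rather than its own 3 × 0.03.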

Hochberg: Probability of Falsely Rejecting 1 or More True Hypotheses for Increasing Numbers of Hypotheses [figure; x-axis: Number of Hypotheses]

Nucleotide Changes in cDNA Transcripts (Tarone, 1990)
[Table: ordered nucleotide, 1-sided P-value, Hochberg adjusted P, Hochberg discrete adjusted P]

Non-independent Hypotheses
Accounting for a known dependence structure among the tests can improve the power of the testing procedure. Heyse and Rom (1988) proposed a multivariate permutation test for the rodent carcinogenicity experiment. Westfall and Young (1989, 1993) developed broader resampling approaches, which have become the standard application (SAS PROC MULTTEST). For discrete test statistics, it is possible to construct the exact null distribution.

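Illustration: Multiresponse Representation of Tumor Data
Category | Control (X=0) | Treated (X=1) | Total
Tumor Site A only | 1 | 6 | 7
Tumor Site B only | 3 | 0 | 3
Tumor Sites A & B | 1 | 2 | 3
No tumor | … | … | …
Number on study | … | … | …
Site A: $S_A = 8$, $E(S_A) = 5$. Site B: $S_B = 2$, $E(S_B) = 3$.
Here $S_A$ and $S_B$ are the numbers of treated animals with a tumor at site A and at site B (e.g., $S_A = 6 + 2 = 8$).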

[Figure: Bivariate distribution of scores $(S_A, S_B)$]
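The bivariate null distribution of the scores can be enumerated exactly: under the permutation null, the treated group is a random subset of the pooled animals, so the numbers drawn from each multiresponse category are multivariate hypergeometric. The Python sketch below uses the category totals from the illustration (7, 3, 3); the group sizes are not preserved, so 50 animals per group is a hypothetical choice (equal groups are consistent with $E(S_A) = 5$ and $E(S_B) = 3$). It computes a min-P adjusted P-value under the exact joint distribution, in the spirit of a multivariate permutation test; it is not claimed to reproduce the Heyse and Rom (1988) procedure, and all names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def joint_score_distribution(a_only, b_only, both, none, n_treated):
    """Exact permutation distribution of (S_A, S_B) under H0, as exact fractions."""
    total = a_only + b_only + both + none
    denom = comb(total, n_treated)
    dist = defaultdict(Fraction)
    for a in range(a_only + 1):
        for b in range(b_only + 1):
            for c in range(both + 1):
                d = n_treated - a - b - c  # treated animals with no tumor
                if 0 <= d <= none:
                    w = comb(a_only, a) * comb(b_only, b) * comb(both, c) * comb(none, d)
                    dist[(a + c, b + c)] += Fraction(w, denom)
    return dict(dist)


def upper_tail(dist, axis):
    """One-sided P-value Pr(S >= s) for every attainable value s of one score."""
    marginal = defaultdict(Fraction)
    for scores, pr in dist.items():
        marginal[scores[axis]] += pr
    return {s: sum((pr for t, pr in marginal.items() if t >= s), Fraction(0)) for s in marginal}


def minp_adjusted(dist, observed):
    """Unadjusted and min-P adjusted P-values for sites A and B."""
    tails = [upper_tail(dist, 0), upper_tail(dist, 1)]
    p_obs = [tails[j][observed[j]] for j in range(2)]
    # Adjusted P-value: probability that the smaller of the two P-values is <= the observed one
    adjusted = [
        sum((pr for (sa, sb), pr in dist.items()
             if min(tails[0][sa], tails[1][sb]) <= p_obs[j]), Fraction(0))
        for j in range(2)
    ]
    return p_obs, adjusted


# Category totals from the illustration; 50 per group is HYPOTHETICAL
dist = joint_score_distribution(a_only=7, b_only=3, both=3, none=100 - 13, n_treated=50)
p_obs, p_adj = minp_adjusted(dist, observed=(8, 2))
print([float(x) for x in p_obs], [float(x) for x in p_adj])
```

Exact fractions avoid rounding problems when comparing attainable P-values. Because the two sites share the animals with tumors at both A and B, the adjusted value for site A lies between its unadjusted P-value and twice that value.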

Rejection regions based on the bivariate distribution of scores
[Two grids, Modified Bonferroni and Hochberg, over $S_A = 6, 7, 8, 9, \ldots$ and $S_B = 3, 4, 5, 6$; each cell is marked A, B, AB, or --- according to which hypotheses are rejected]

Return to Rodent Carcinogenicity Study
Method | Adjusted P-value
Adjust extreme (K = 15) | …
Mantel (K* = 9) | …
Mantel et al. (discrete) | …
Heyse and Rom (permutation) | 0.2363

False Discovery Rate (FDR)
Almost all multiplicity considerations in clinical trial applications are designed to control the family-wise error rate (FWER). Benjamini and Hochberg (1995) argued that in certain settings, requiring control of the FWER is often too conservative, and suggested controlling the false discovery rate (FDR) as a more powerful alternative. Accounting for the discreteness of categorical endpoints can further improve the power of FDR (and FWER) methods.
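For reference, both error rates can be written in terms of V, the number of true hypotheses rejected, and R, the total number of rejections (notation assumed here, following common usage):

$$
\mathrm{FWER} = \Pr(V \ge 1), \qquad
\mathrm{FDR} = E\left[\frac{V}{\max(R, 1)}\right].
$$

Because $V/\max(R,1) \le \mathbf{1}\{V \ge 1\}$, the FDR never exceeds the FWER, and the two coincide when every hypothesis is true (then $V = R$).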

Benjamini & Hochberg FDR Controlling Procedure
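The B&H step-up rule at FDR level α, written with the ordered P-values $p_{(1)} \le \dots \le p_{(K)}$ and corresponding hypotheses $H_{(i)}$:

$$
j = \max\left\{ i : p_{(i)} \le \frac{i}{K}\,\alpha \right\}, \qquad \text{reject } H_{(1)}, \dots, H_{(j)},
$$

with no rejections if no such $i$ exists. Because the search starts from the largest P-value, a hypothesis can be rejected even when its own P-value exceeds its critical value, provided a larger P-value passes.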

Alternate Formulation of B&H Method Using Adjusted P-values
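In adjusted-P-value form, $\tilde p_{(i)} = \min_{j \ge i} \min\{1, K p_{(j)}/j\}$, and $H_{(i)}$ is rejected when $\tilde p_{(i)} \le \alpha$. A minimal Python sketch (the function name is illustrative):

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    K = len(pvalues)
    order = sorted(range(K), key=lambda i: pvalues[i])  # indices by ascending P-value
    adjusted = [0.0] * K
    running_min = 1.0  # also caps every adjusted value at 1
    # Step from the largest P-value (rank K) down to the smallest (rank 1),
    # carrying the running minimum of K * p_(j) / j over ranks j >= current rank.
    for rank in range(K, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, K * pvalues[i] / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]
print([round(x, 4) for x in bh_adjust(p)])  # [0.04, 0.0533, 0.0533, 0.2]
```

For the same four P-values, Hochberg’s multiplier $K - j + 1$ would give 0.08 for the middle pair, illustrating the smaller adjustment under FDR control.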

Modified FDR for Discrete Data
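The dimension-reduction idea can be illustrated with a deliberately simplified screen. Every B&H critical value $i\alpha/K$ is at most α, so a test whose smallest attainable P-value exceeds α can never be rejected; dropping such tests and applying B&H to the $K^*$ that remain lowers the divisor. Gilbert’s (2005) Tarone-modified procedure uses its own screening rule; this hypothetical sketch swaps in the simpler screen and is not that procedure.

```python
def screened_bh(pvalues, min_attainable, alpha=0.05):
    """Hypothetical simplified variant: drop tests that can never reach P <= alpha,
    then apply the B&H step-up rule to the K* tests that remain.

    Returns (K*, sorted indices of rejected hypotheses).
    """
    eligible = [k for k, m in enumerate(min_attainable) if m <= alpha]
    k_star = len(eligible)
    ranked = sorted(eligible, key=lambda k: pvalues[k])  # ascending P-values
    n_reject = 0
    for rank, k in enumerate(ranked, start=1):
        if pvalues[k] <= rank * alpha / k_star:
            n_reject = rank  # step-up: remember the largest rank that passes
    return k_star, sorted(ranked[:n_reject])


# Hypothetical example: 6 tests, 3 of which cannot reach P <= 0.05
p = [0.004, 0.6, 0.02, 1.0, 0.4, 0.03]
mins = [0.001, 0.3, 0.004, 0.5, 0.2, 0.01]
print(screened_bh(p, mins))        # (3, [0, 2, 5])
print(screened_bh(p, [0.0] * 6))   # no screening, i.e. plain B&H: (6, [0])
```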

Properties of FDR Control
– The B&H sequential procedure controls the FDR at level $K_0\alpha/K \le \alpha$ for independent hypotheses.
– $\mathrm{FDR} \le \mathrm{FWER}$, with equality if $K = K_0$.
– The Hochberg (1988) stepwise procedure compares $p_{(i)}$ with $\alpha/(K-i+1)$, while the FDR procedure compares $p_{(i)}$ with $i\alpha/K$.
– FDR control is therefore potentially more powerful than FWER-controlling procedures.
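The power comparison follows from comparing the two critical values term by term, for $1 \le i \le K$:

$$
\frac{i\,\alpha}{K} \;\ge\; \frac{\alpha}{K - i + 1}
\iff i\,(K - i + 1) \ge K
\iff (i - 1)(K - i) \ge 0 .
$$

The last inequality always holds, with equality only at $i = 1$ and $i = K$. Every B&H critical value is therefore at least as large as the corresponding Hochberg critical value, so any hypothesis rejected by Hochberg’s procedure is also rejected by B&H at the same α.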

FDR for Categorical Data

Other Approaches for Categorical Data

Example: Genetic Variants of HIV
Gilbert (2005) compared mutation rates at 118 positions in HIV amino-acid sequences between 73 patients with subtype C and 73 patients with subtype B. The B&H FDR procedure identified 12 significant positions. The Tarone-modified FDR procedure reduced the number of positions entering the multiplicity adjustment from 118 to 25 and identified 15 significant positions. The fully discrete FDR procedure identified 20 significant positions.

Recent Developments Using Mid-P-values
Heller and Gur (2012) proposed applying the B&H method to mid-P-values to reduce its conservatism with discrete data. They also developed a novel step-down procedure for discrete data. Simulations showed that the fully discrete adjustment method controlled the FDR and was the most powerful of the tests considered.
Example: the 27 most extreme signals from post-marketing surveillance of spontaneous adverse experiences were evaluated.
– B&H supported 22 signals
– B&H applied to mid-P-values supported 25 signals
– Fully discrete B&H supported all 27 signals
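The mid-P-value counts only half the probability of the observed outcome: mid-P $= \Pr(X > x) + \tfrac{1}{2}\Pr(X = x)$. A Python sketch for a one-sided Fisher exact comparison (function name and counts are hypothetical). Mid-P-values are not guaranteed to satisfy $\Pr(\text{mid-}P \le t) \le t$, so the reduced conservatism comes at the cost of that guarantee.

```python
from math import comb


def fisher_p_and_midp(x, n_control, n_treated, total_events):
    """One-sided (upper-tail) Fisher exact P-value and mid-P-value.

    x is the number of events in the treated group; all margins are fixed.
    """
    lo = max(0, total_events - n_control)
    hi = min(total_events, n_treated)
    if not lo <= x <= hi:
        raise ValueError("x is not attainable with these margins")
    denom = comb(n_control + n_treated, total_events)

    def pmf(y):
        # Hypergeometric probability of y treated events
        return comb(n_treated, y) * comb(n_control, total_events - y) / denom

    p_equal = pmf(x)
    p_greater = sum(pmf(y) for y in range(x + 1, hi + 1))
    return p_greater + p_equal, p_greater + 0.5 * p_equal


# Hypothetical site: 50 animals per group, 2 tumors in total, both in the treated group
p, mid_p = fisher_p_and_midp(2, 50, 50, 2)
print(round(p, 4), round(mid_p, 4))  # 0.2475 0.1237
```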

Simulation Study for Independent Hypotheses
A simulation study was conducted to evaluate the statistical properties of the FDR-controlling methods for discrete data using Fisher’s exact test. Simulation parameters:
– Number of hypotheses: K = 5, 10, 15, 20
– Varying numbers of false hypotheses ($K - K_0$)
– Background rates chosen randomly from U(0.01, 0.5)
– Odds ratios for effect size: OR = 1.5, 2, 2.5, 3
– Sample sizes: N = 10, 25, 50, 100
– α = … (…-tailed)
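A Python sketch of how one simulated set of P-values could be generated under the listed parameters. Assumptions not stated on the slide: N is the per-group sample size, a false hypothesis has treated-group rate $p_1 = \mathrm{OR}\,p_0 / (1 - p_0 + \mathrm{OR}\,p_0)$ obtained from the background rate $p_0$ and the odds ratio, and the test is one-sided. Function names are hypothetical.

```python
import random
from math import comb


def treated_rate(p0: float, odds_ratio: float) -> float:
    """Treated-group event rate implied by background rate p0 and an odds ratio."""
    odds = odds_ratio * p0 / (1 - p0)
    return odds / (1 + odds)


def fisher_upper_p(x_treated: int, x_control: int, n: int) -> float:
    """One-sided Fisher exact P-value Pr(X >= x_treated), n animals per group."""
    total = x_treated + x_control
    hi = min(total, n)
    # math.comb returns 0 when total - y > n, so impossible splits add nothing
    tail = sum(comb(n, y) * comb(n, total - y) for y in range(x_treated, hi + 1))
    return tail / comb(2 * n, total)


def simulate_pvalues(k, k_false, n, odds_ratio, rng):
    """One simulated set of K independent P-values; the first k_false hypotheses are false."""
    pvals = []
    for j in range(k):
        p0 = rng.uniform(0.01, 0.5)  # background rate ~ U(0.01, 0.5)
        p1 = treated_rate(p0, odds_ratio) if j < k_false else p0
        x_control = sum(rng.random() < p0 for _ in range(n))
        x_treated = sum(rng.random() < p1 for _ in range(n))
        pvals.append(fisher_upper_p(x_treated, x_control, n))
    return pvals


rng = random.Random(2014)
print(simulate_pvalues(k=10, k_false=3, n=25, odds_ratio=2.0, rng=rng))
```

Repeating this many times and applying each multiplicity procedure to every simulated set gives the rejection rates summarized in the following slides.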

Rate of Rejecting True Hypotheses When All Hypotheses Are True ($K_0 = K$)

Rate of Rejecting True Hypotheses When Some Hypotheses Are False ($K_0 < K$)

Rate of Rejecting False Hypotheses

Concluding Remarks
Understanding and evaluating multiplicity has been a critically important element of biopharmaceutical statistical applications. Multiplicity issues arise throughout drug and vaccine development, from discovery through clinical development and into the post-approval period. Yosef Hochberg is to be commended for his important contributions. Hochberg’s methods for both FWER and FDR control can be applied in settings with discrete data. Thank you!

References
Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B, 57 (1995).
Gilbert PB. A Modified False Discovery Rate Multiple-Comparisons Procedure for Discrete Data, Applied to Human Immunodeficiency Virus Genetics. Applied Statistics, 54 (2005).
Heller R, Gur H. False Discovery Rate Controlling Procedures for Discrete Tests. arXiv.org/abs/ (2012).
Heyse J. A False Discovery Rate Procedure for Categorical Data. In: Recent Advances in Biostatistics, edited by Bhattacharjee et al. World Scientific Press (2011).
Heyse JF, Rom D. Adjusting for Multiplicity of Statistical Tests in the Analysis of Carcinogenicity Studies. Biometrical Journal, 30 (1988).
Hochberg Y. A Sharper Bonferroni Procedure for Multiple Significance Testing. Biometrika, 75 (1988).
Mantel N. Assessing Laboratory Evidence for Neoplastic Activity. Biometrics, 36 (1980).
Mantel N, Tukey JW, Ciminera JL, Heyse JF. Tumorigenicity Assays, Including Use of the Jackknife. Biometrical Journal, 24 (1982).
Rom DM. Strengthening Some Common Multiple Test Procedures for Discrete Data. Statistics in Medicine, 11 (1992).
Tarone RE. A Modified Bonferroni Method for Discrete Data. Biometrics, 46 (1990).
Westfall PH, Young SS. P-Value Adjustments for Multiple Tests in Multivariate Binomial Models. Journal of the American Statistical Association, 84 (1989).