Detecting Treatment by Biomarker Interaction with Binary Endpoints


Detecting Treatment by Biomarker Interaction with Binary Endpoints
Radha A. Railkar & Devan V. Mehrotra, Merck & Co., Inc.
JSM, Denver, CO, August 1st, 2019

Outline
Precision Medicine
Introduction
TxB Interaction on Which Scale?
Logistic Regression (1 and 2 df tests)
New methods: F* statistics for measuring the strength of the treatment by biomarker interaction on 5 common scales; p-value combination methods (harmonic mean p-value and Aggregated Cauchy Association Test)
Simulation Study & Results
Summary

Precision/Personalized/Stratified Medicine
The objective of precision medicine is to optimize a specific preventive, diagnostic or therapeutic intervention in the subpopulation of patients most likely to benefit from it, by taking into account patient characteristics such as disease subtype, clinical features and/or biomarkers.
The precision medicine approach is the opposite of the "one-size-fits-all" approach, in which disease treatment or prevention strategies are designed for an average person, with little consideration for individual differences.
Precision medicine makes prevention and therapy more efficient, or better adapted to each person, by taking individual differences into account.

Introduction
Context: a phase II/III randomized clinical trial with a binary endpoint, Treatment A [control] vs. Treatment B [new]
Y = subject-level response to treatment (yes/no)
p = E(Y) = true proportion of responders
T = treatment (0=A, 1=B)
B = biomarker subgroup (0=B-, 1=B+)
Goal: to determine whether there is a treatment by biomarker (TxB) interaction

TxB Interaction on Which Scale?
When the endpoint is binary, different treatment effect measures (e.g., risk difference, relative risk, odds ratio) yield different interaction tests.
A TxB interaction may exist on the risk difference (proportion) scale, the relative risk (log) scale, the odds ratio (logit) scale, or on other scales.
It is important to be able to detect interactions on each of these scales, because any such interaction may suggest that the new treatment works better, or causes harm, in a particular biomarker subgroup.
A test is therefore needed against the null hypothesis of no TxB interaction on any scale.
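To make the scale dependence concrete, here is a minimal sketch that computes the difference in treatment effect between B+ and B- on three scales. The response proportions in `example` are hypothetical values chosen for illustration, and the function names are not from the presentation.

```python
import math

# Link functions g(.) for three common effect scales.
SCALES = {
    "proportion": lambda p: p,
    "log": math.log,
    "logit": lambda p: math.log(p / (1.0 - p)),
}

def treatment_effect(scale, p_control, p_new):
    """Treatment effect g(p_new) - g(p_control) on the named scale."""
    g = SCALES[scale]
    return g(p_new) - g(p_control)

def interaction(scale, p_by_subgroup):
    """Treatment effect in B+ minus treatment effect in B- on the named scale."""
    neg = treatment_effect(scale, *p_by_subgroup["B-"])
    pos = treatment_effect(scale, *p_by_subgroup["B+"])
    return pos - neg

# Hypothetical true response proportions: (control A, new B) per subgroup.
example = {"B-": (0.20, 0.40), "B+": (0.40, 0.60)}

for name in SCALES:
    print(f"{name:10s} interaction = {interaction(name, example):.3f}")
```

With these values the risk difference is the same in both subgroups, so the interaction on the proportion scale is zero up to rounding, while the log and logit scales both show a nonzero interaction.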

Logistic Regression for Detecting TxB Interaction
Analysis model: $\text{logit}(p) = \beta_0 + \beta_T T + \beta_B B + \beta_{TB}\, T \times B$ (other covariates can be added)
Traditional approach: test $H_{null}: \beta_{TB} = 0$ (no TxB interaction on the logit scale) with a 1 df likelihood ratio test.
Method proposed by Mehrotra et al. (JSM 2017 presentation): test $H_{null}: \beta_B = \beta_{TB} = 0$ (no TxB interaction on any scale) with a 2 df likelihood ratio test.
Why? TxB interactions rarely exist in the absence of B main effects, so the joint test is a much more powerful way to discover biomarkers with TxB interactions.
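The 2 df test can be sketched without a model-fitting library when the model contains only T, B and TxB (no extra covariates). In that case the full model is saturated over the four (T, B) cells, and the null model has one response probability per treatment arm, so both maximized likelihoods have closed forms. This is an illustrative sketch under that assumption, not the authors' code; the counts in `example` are hypothetical.

```python
import math

def binom_loglik(x, n, p):
    """Binomial log-likelihood kernel x*log(p) + (n-x)*log(1-p), with 0*log(0) = 0."""
    ll = 0.0
    if x > 0:
        ll += x * math.log(p)
    if n - x > 0:
        ll += (n - x) * math.log(1.0 - p)
    return ll

def lrt_2df(cells):
    """2 df likelihood ratio test of beta_B = beta_TB = 0.

    cells maps (t, b) -> (responders, subjects) for t, b in {0, 1}.
    Returns (statistic, p_value).
    """
    ll_full = 0.0
    ll_null = 0.0
    for t in (0, 1):
        # Null model: one response probability per treatment arm, pooled over B.
        x_t = sum(cells[(t, b)][0] for b in (0, 1))
        n_t = sum(cells[(t, b)][1] for b in (0, 1))
        for b in (0, 1):
            x, n = cells[(t, b)]
            # Full model is saturated: fitted value is the observed cell proportion.
            ll_full += binom_loglik(x, n, x / n)
            ll_null += binom_loglik(x, n, x_t / n_t)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # Survival function of a chi-square distribution with 2 df is exp(-x/2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Hypothetical counts: (responders, subjects) for each (treatment, biomarker) cell.
example = {(0, 0): (25, 140), (0, 1): (15, 60), (1, 0): (39, 140), (1, 1): (41, 60)}
stat, p_value = lrt_2df(example)
print(f"LR statistic = {stat:.2f}, p = {p_value:.2g}")
```

The 1 df test of $\beta_{TB} = 0$ has no closed form, because its null model (additive on the logit scale) must be fitted iteratively; a logistic regression routine would be needed for that case and whenever covariates are added.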

F* Statistic for Measuring TxB Interaction on 5 Common Scales
Use a statistic analogous to the Brown and Forsythe (1974) F* statistic to measure the TxB interaction on the following 5 common scales: proportion, logit, log, square root and arcsin.
$$F_g^* = \frac{\sum_{i=0}^{1} n_i \left(\Delta_i - \bar{\Delta}\right)^2}{\sum_{i=0}^{1} \left(1 - \frac{n_i}{N}\right) n_i V(\Delta_i)}$$
where $n_i = \frac{n_{i0}\, n_{i1}}{n_{i0} + n_{i1}}$, $N = \sum_{i=0}^{1} n_i$, $\Delta_i = g(p_{i1}) - g(p_{i0})$, g(.) is the scale of interest, and $\bar{\Delta} = \frac{\sum_{i=0}^{1} n_i \Delta_i}{N}$; $V(\Delta_i)$ can be derived using a first order Taylor series approximation.
Each F*g statistic measures whether the treatment effect on a given scale is consistent across the 2 strata.
Large values of F*g indicate strong evidence of TxB for scale g(.).
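The slide leaves $V(\Delta_i)$ implicit. One way to write the first order Taylor (delta method) approximation is shown below. It assumes independent binomial arms within each biomarker stratum, with $n_{ij}$ and $p_{ij}$ read as the number of subjects and the response proportion in stratum $i$ and treatment arm $j$; that reading of the indices is an assumption, not something the slide states.

```latex
% Delta method: Var[g(\hat p)] \approx [g'(p)]^2 \, p(1-p)/n for each arm
V(\Delta_i) \approx \sum_{j=0}^{1} \left[ g'(p_{ij}) \right]^2 \frac{p_{ij}\,(1 - p_{ij})}{n_{ij}}

% Two special cases:
%   proportion scale, g(p) = p,  g'(p) = 1:
V(\Delta_i) \approx \sum_{j=0}^{1} \frac{p_{ij}\,(1 - p_{ij})}{n_{ij}}
%   logit scale, g(p) = \log\{p/(1-p)\},  g'(p) = 1/\{p(1-p)\}:
V(\Delta_i) \approx \sum_{j=0}^{1} \frac{1}{n_{ij}\, p_{ij}\,(1 - p_{ij})}
```

In practice the unknown $p_{ij}$ would be replaced by the observed proportions.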

F* Statistic (cont’d)
Under the null hypothesis of no TxB interaction on scale g(.), $F_g^*$ is approximately distributed as F(f1, f2), where
$$f_1 = \frac{\left[\sum_{i=0}^{1} n_i V(\Delta_i) - \frac{\sum_{i=0}^{1} n_i^2 V(\Delta_i)}{N}\right]^2}{\sum_{i=0}^{1} \left[n_i V(\Delta_i)\right]^2 + \left[\frac{\sum_{i=0}^{1} n_i^2 V(\Delta_i)}{N}\right]^2 - \frac{2 \sum_{i=0}^{1} n_i \left[n_i V(\Delta_i)\right]^2}{N}} \quad \text{(Mehrotra 1997)}$$
$$f_2 = \frac{\left[\sum_{i=0}^{1} \left(1 - \frac{n_i}{N}\right) n_i V(\Delta_i)\right]^2}{\sum_{i=0}^{1} \frac{\left(1 - \frac{n_i}{N}\right)^2 \left[n_i V(\Delta_i)\right]^2}{n_i - 1}}$$
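The F*g statistic and its two degrees of freedom can be sketched as follows. The function and variable names are illustrative; the variance uses a delta method approximation with observed proportions; and "arcsin" is assumed here to mean the arcsine square-root transform, which the slides do not specify. Observed proportions must lie strictly between 0 and 1 for the log, logit, square root and arcsin derivatives to be defined.

```python
import math

# Each scale maps to (g, g'); g' is used in the delta-method variance.
SCALES = {
    "proportion": (lambda p: p, lambda p: 1.0),
    "logit": (lambda p: math.log(p / (1 - p)), lambda p: 1.0 / (p * (1 - p))),
    "log": (math.log, lambda p: 1.0 / p),
    "sqrt": (math.sqrt, lambda p: 0.5 / math.sqrt(p)),
    # Assumption: "arcsin" is the arcsine square-root transform.
    "arcsin": (lambda p: math.asin(math.sqrt(p)),
               lambda p: 0.5 / math.sqrt(p * (1 - p))),
}

def f_star(scale, strata):
    """Compute (F*, f1, f2) for one scale.

    strata: one ((x_i0, n_i0), (x_i1, n_i1)) pair per biomarker level i,
    giving responders and subjects in the control and new-treatment arms.
    """
    g, dg = SCALES[scale]
    n, delta, s2 = [], [], []
    for (x0, n0), (x1, n1) in strata:
        p0, p1 = x0 / n0, x1 / n1
        n_i = n0 * n1 / (n0 + n1)
        # Delta-method variance of Delta_i = g(p1) - g(p0).
        v_i = dg(p0) ** 2 * p0 * (1 - p0) / n0 + dg(p1) ** 2 * p1 * (1 - p1) / n1
        n.append(n_i)
        delta.append(g(p1) - g(p0))
        s2.append(n_i * v_i)  # n_i * V(Delta_i)
    N = sum(n)
    dbar = sum(ni * di for ni, di in zip(n, delta)) / N
    c = [(1 - ni / N) * si for ni, si in zip(n, s2)]
    f = sum(ni * (di - dbar) ** 2 for ni, di in zip(n, delta)) / sum(c)
    pooled = sum(ni * si for ni, si in zip(n, s2)) / N
    f1 = (sum(s2) - pooled) ** 2 / (
        sum(si ** 2 for si in s2) + pooled ** 2
        - 2 * sum(ni * si ** 2 for ni, si in zip(n, s2)) / N)
    f2 = sum(c) ** 2 / sum(ci ** 2 / (ni - 1) for ci, ni in zip(c, n))
    return f, f1, f2

# Hypothetical counts: B- stratum first, then B+.
example = [((25, 140), (39, 140)), ((15, 60), (41, 60))]
for name in SCALES:
    f, f1, f2 = f_star(name, example)
    print(f"{name:10s} F* = {f:7.3f}  f1 = {f1:.3f}  f2 = {f2:.1f}")
```

With only two strata the f1 formula reduces algebraically to 1, which the output reflects. A p-value for each scale would then come from the upper tail of the F(f1, f2) distribution, for example via `scipy.stats.f.sf(f, f1, f2)` if SciPy is available.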

F* Statistic (cont’d)
Obtain 5 p-values corresponding to the F*g statistics on each of the 5 scales.
Testing strategy: combine the 5 p-values in order to test the null hypothesis of no TxB on any scale.

P-value Combination Methods
Harmonic Mean p-value (HMP) (Wilson 2019): the HMP test combines p-values and corrects for multiple testing while controlling the family-wise error rate (FWER). It is more powerful than common methods such as the Bonferroni and Simes procedures, more stringent than controlling the false discovery rate (FDR), and robust to positive correlations between tests.
$$p_{HM} = \frac{\sum w_i}{\sum w_i / p_i}, \quad \text{where } \sum w_i = 1$$
If $p_{HM} \le \alpha$, reject the null hypothesis of no TxB on any scale.
Used in big data applications such as GWAS.
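A minimal sketch of the weighted harmonic mean p-value formula follows. Equal weights are assumed when none are supplied, and the function name is illustrative; the p-values in the usage line are made up.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p-values; weights must sum to 1."""
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights (an assumption of this sketch)
    if len(weights) != k:
        raise ValueError("need one weight per p-value")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))

# Five hypothetical per-scale p-values (proportion, logit, log, sqrt, arcsin).
p_hm = harmonic_mean_p([0.012, 0.20, 0.45, 0.08, 0.03])
print(f"p_HM = {p_hm:.4f}")
```

Because the harmonic mean is dominated by the smallest p-values, one strongly significant scale can drive the combined result even when the other scales show little evidence.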

P-value Combination Methods (cont’d)
Aggregated Cauchy Association Test (ACAT) (Liu 2019): the test statistic is a weighted sum of Cauchy transformations of the individual p-values. It is a powerful and computationally efficient p-value combination method, developed to boost power in sequencing studies.
$$T_{ACAT} = \sum w_i \tan\{(0.5 - p_i)\pi\}, \quad \text{where } \sum w_i = 1 = w$$
$$p_{ACAT} \approx 0.5 - \frac{\arctan(T_{ACAT}/w)}{\pi}$$
If $p_{ACAT} \le \alpha$, reject the null hypothesis of no TxB on any scale.
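The two ACAT formulas translate directly into a few lines of code. This sketch assumes equal weights by default; the function name and the example p-values are illustrative.

```python
import math

def acat_p(p_values, weights=None):
    """ACAT combined p-value: Cauchy-transform each p-value, then invert the weighted sum."""
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights (an assumption of this sketch)
    if len(weights) != k:
        raise ValueError("need one weight per p-value")
    if any(p <= 0.0 or p >= 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    w = sum(weights)
    t_acat = sum(wi * math.tan((0.5 - pi) * math.pi)
                 for wi, pi in zip(weights, p_values))
    return 0.5 - math.atan(t_acat / w) / math.pi

# Five hypothetical per-scale p-values.
print(f"p_ACAT = {acat_p([0.012, 0.20, 0.45, 0.08, 0.03]):.4f}")
```

A single p-value is returned unchanged by this transformation, and very small p-values map to very large Cauchy values, so they dominate the sum.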

Simulation Study
True proportions of responders; N=200 per treatment group.
Columns, in order: No TxB Scale; B- (70%): A (Control), B (New); B+ (30%): A (Control), B (New); B-A (Proportion): B-, B+; B-A (Logit): B-, B+; B-A (Log): B-, B+
All (Null): 0.20 0.40 0.98 0.69
None1: 0.18 0.28 0.25 0.68 0.10 0.43 0.57 1.85 0.44 1.00
None2: 0.26 0.73 0.06 0.53 0.34 2.38 1.29
Arcsin: 0.37 0.49 0.83 0.91 0.12 0.08 0.09
Log: 0.41 0.92 0.04 0.17 0.86
Logit: 0.19 0.75 0.90 0.21 0.15 1.04 1.09 0.74
Proportion: 0.70 1.35
Square root:
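A simulation of this kind can be sketched as follows for the 2 df (B, TxB) likelihood ratio test. Several choices here are assumptions rather than details given in the slides: subgroup sizes are fixed at 140 B- and 60 B+ per arm, the model has no extra covariates (so the test has a closed form), and the number of simulations is reduced for speed. The `none1_props` values are the None1 row as read from the table (B-: 0.18 vs 0.28; B+: 0.25 vs 0.68), and `null_props` uses 0.20 vs 0.40 in both subgroups.

```python
import math
import random

def draw_cells(rng, props, n_per_arm=200, frac_pos=0.30):
    """Simulate one trial. props maps (t, b) -> true response proportion."""
    n_pos = round(n_per_arm * frac_pos)
    sizes = {0: n_per_arm - n_pos, 1: n_pos}  # fixed subgroup sizes (assumption)
    return {(t, b): (sum(rng.random() < props[(t, b)] for _ in range(sizes[b])),
                     sizes[b])
            for t in (0, 1) for b in (0, 1)}

def loglik(x, n, p):
    """Binomial log-likelihood kernel with 0*log(0) treated as 0."""
    return ((x * math.log(p) if x else 0.0)
            + ((n - x) * math.log(1 - p) if n - x else 0.0))

def lrt_2df_p(cells):
    """P-value of the 2 df LRT of beta_B = beta_TB = 0 (closed form, no covariates)."""
    stat = 0.0
    for t in (0, 1):
        x_t = cells[(t, 0)][0] + cells[(t, 1)][0]
        n_t = cells[(t, 0)][1] + cells[(t, 1)][1]
        for b in (0, 1):
            x, n = cells[(t, b)]
            stat += 2.0 * (loglik(x, n, x / n) - loglik(x, n, x_t / n_t))
    return math.exp(-max(stat, 0.0) / 2.0)  # chi-square(2) upper tail

def rejection_rate(props, n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulated trials in which the test rejects at level alpha."""
    rng = random.Random(seed)
    hits = sum(lrt_2df_p(draw_cells(rng, props)) <= alpha for _ in range(n_sims))
    return hits / n_sims

null_props = {(0, 0): 0.20, (0, 1): 0.20, (1, 0): 0.40, (1, 1): 0.40}
none1_props = {(0, 0): 0.18, (0, 1): 0.25, (1, 0): 0.28, (1, 1): 0.68}

print(f"type I error: {100 * rejection_rate(null_props):.2f}%")
print(f"power (None1): {100 * rejection_rate(none1_props):.2f}%")
```

Estimates from a run this small carry noticeable Monte Carlo error, so they should only roughly track the reported 10,000-simulation figures.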

Simulation Results
N=200 per treatment group; 10,000 simulations.
Columns, in order: No TxB Scale; Logistic Regression (1 df TxB Test); Logistic Regression (2 df B, TxB Test); F* HMP; F* ACATP
Type I Error Rate % (α=5%)
All (Null): 4.96 4.93 4.43 5.36
Power %
None1: 74.93 99.96 83.38 59.53
None2: 98.43 100 99.30 29.91
Arcsin: 7.66 12.15 12.01
Log: 21.22 10.67 11.00
Logit: 5.39 50.94 43.50
Proportion: 11.03 30.63 27.06
Square root: 17.75 8.36 8.74

Summary
For a binary endpoint, different treatment effect measures yield different interaction tests; a test is therefore needed against the null hypothesis of no TxB on any scale.
The logistic regression joint (B, TxB) 2 df test is the most powerful for testing the null of no TxB interaction on any scale.
Tests that define an F* statistic to measure the strength of the TxB on each of the 5 common scales and then apply a p-value combination method are less powerful; the HM p-value combination method has good power for some of the simulated conditions.

References
Brown, M. B. and Forsythe, A. B. (1974). The small sample behavior of some statistics which test the equality of several means. Technometrics, 16, 129-132.
Liu, Y. et al. (2019). ACAT: A fast and powerful p-value combination method for rare-variant analysis in sequencing studies. American Journal of Human Genetics, 104 (3), 410-421.
Mehrotra, D. V. (1997). Improving the Brown-Forsythe solution to the generalized Behrens-Fisher problem. Communications in Statistics, Simulation and Computation, 26, 1139-1145.
Mehrotra, D. V. et al. (2017). A Powerful Learn-and-Confirm Pharmacogenomics Methodology for Randomized Clinical Trials. Presentation at JSM 2017.
Wilson, D. J. (2019). The harmonic mean p-value and model averaging by mean maximum likelihood. Proceedings of the National Academy of Sciences, 116 (4), 1195-1200. DOI:10.1073/pnas.1814092116.