Advancing Pharmacogenomics Analysis of Drug Response in Early-phase Clinical Trials
Hong Zhang, Judong Shen & Devan V. Mehrotra
Early Development Statistics, BARDS, Merck
07/29/2019
Pharmacogenomics (PGx) towards precision medicine
Genetic markers as stratifying factors
Adams et al. Clinical Pharmacogenomics Applications in Nephrology. CJASN 2018.
Schork. Personalized medicine: Time for one-person trials. Nature 2015.
Costs/benefits of PGx
Based on a prospective, open-label, randomized controlled trial (n=110).
Elliot et al. Clinical impact of pharmacogenetic profiling with a clinical decision support tool in polypharmacy home health patients: A prospective pilot randomized controlled trial. PLOS ONE 2017.
Clinical PGx strategy in Merck
[Figure: PGx activities from Phase I through Phase III, modified from Urban et al., 2014. Genetics track: candidate genes (drug metabolism, drug targets), where genetic variation explains PK variability; primary discovery (GWAS + WES), where genetic variation explains variable safety and efficacy; continued discovery (GWAS, targeted genotyping, +/- WES); validation, where a Phase II GWAS "hit" predicts response in Phase 3. Decision point: genetic marker/positive result (YES/NO), leading to a stratified trial enriched for responders or informing discovery. Immuno-oncology track: biomarker validation, mechanism of action, novel biomarkers, new targets.]
PGx in early-phase clinical development
Objective: identify genetic markers that affect drug response.
- Drug exposure: pharmacokinetic (PK) parameters (e.g., Cmax, AUC). Candidate (ADME) gene studies are used in phase 1.
- Drug safety and efficacy: candidate gene studies and genome-wide association studies (GWAS) are used in phase 2 or later.
Approaches: test the association between genetic variants and drug response
- Treatment arm only vs. both arms;
- Single variant vs. multiple variants;
- Single trait vs. multiple traits.
Statistical challenges
- Inflated type I error and limited power for discovery due to small sample sizes;
- Bias adjustment of the effect size for a PGx confirmatory study;
- Lack of actionable findings.
PGx single variant regression model
Consider a generalized linear model in the PGx setting
$$g(E[Y]) = \beta_0 + \beta_T T + \beta_X X + \beta_G G + \beta_{GT} GT,$$
where $Y$ is the response, $T$ is the treatment indicator, $X$ are control covariates, $G$ is the genotype, and $GT$ is the genotype-by-treatment interaction. We want to test the joint effect
$$H_0: \beta_G = \beta_{GT} = 0 \quad \text{vs.} \quad H_1: \beta_G \neq 0 \text{ or } \beta_{GT} \neq 0.$$
Notation: $T$ is binary (placebo $T=0$, treatment $T=1$). $G$, the genotype at a single nucleotide polymorphism (SNP), is coded as 0, 1, or 2. ……
Existing single variant joint test methods
- Likelihood ratio test: $2(l_1 - l_0) \sim \chi^2_2$ under $H_0$, where $l_i$ is the maximized log-likelihood under $H_i$, $i = 0, 1$.
- F test (for continuous trait): $\dfrac{(n-p)(RSS_0 - RSS_1)}{2\,RSS_1} \sim F(2, n-p)$, where $RSS_i$ is the residual sum of squares under $H_i$, $i = 0, 1$, $n$ is the sample size, and $p$ is the number of parameters in the full model.
- Firth's penalized likelihood ratio test (for binary trait): $2(l_1 - l_0) \sim \chi^2_2$, where $l_i$ is the maximized penalized log-likelihood with penalty $\frac{1}{2}\ln\det(I)$, and $I$ is the Fisher information matrix. Firth's method reduces the small-sample bias of maximum likelihood estimates (it is not needed for linear regression, where the estimates of $\beta$ are already unbiased) and thereby improves inference in small samples.
Motivation for developing alternatives to the LRT and Firth test
[Figure panels: MK phase 2 trial, continuous trait, N=118; MK phase 3 trial, binary trait, N=704]
Weighted joint test
Recall the generalized linear model $g(E[Y]) = \beta_0 + \beta_T T + \beta_X X + \beta_G G + \beta_{GT} GT$. Define the composite variable
$$Z_w = wG + (1 - |w|)GT, \quad -1 \le w \le 1.$$
The marginal score statistic is $S_w = Z_w' u$, where $u = Y - \hat\mu$ are the residuals and $\hat\mu$ is the estimator of $E[Y]$ under $H_0$.
We can also conduct a 1-df LRT based on $Z_w$: $2(l_1 - l_0) \sim \chi^2_1$, where $l_0$ is the maximized log-likelihood under $H_0$ and $l_1$ is the maximized log-likelihood under $H_1: g(E[Y]) = \beta_0 + \beta_T T + \beta_X X + \beta Z_w$.
* Dey R, Schmidt EM, Abecasis GR, Lee S. A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. Am. J. Hum. Genet. 2017;2:014.
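For a continuous trait with an identity link, $S_w$ and its null variance can be computed from the null-model OLS fit. The sketch below assumes $\mathrm{Cov}(u) = \hat\sigma^2(I - H)$, with $H$ the hat matrix of the null design and $\hat\sigma^2 = RSS_0/(n - p_0)$, and a two-sided normal p-value; `weighted_score_test` is a hypothetical helper, not the authors' implementation, and a binary trait would need the logistic null-model variance instead.

```python
import numpy as np
from scipy import stats


def weighted_score_test(y, t, x, g, w):
    """Score test of Z_w = w*G + (1-|w|)*G*T for a continuous trait (sketch)."""
    n = len(y)
    d0 = np.column_stack([np.ones(n), t, x])                 # null-model design
    beta0, *_ = np.linalg.lstsq(d0, y, rcond=None)
    u = y - d0 @ beta0                                       # residuals under H0
    z = w * g + (1.0 - abs(w)) * g * t                       # composite variable Z_w
    s_w = float(z @ u)                                       # S_w = Z_w' u
    # Cov(u) = sigma^2 (I - H); since I - H is a projection,
    # Var(S_w) = sigma^2 * ||(I - H) z||^2.
    sigma2 = float(u @ u) / (n - d0.shape[1])
    z_perp = z - d0 @ np.linalg.lstsq(d0, z, rcond=None)[0]
    var_s = sigma2 * float(z_perp @ z_perp)
    p_w = 2.0 * stats.norm.sf(abs(s_w) / np.sqrt(var_s))    # two-sided p-value
    return s_w, float(p_w)


# Illustrative data with a made-up interaction effect
rng = np.random.default_rng(1)
n = 60
t = np.repeat([0.0, 1.0], n // 2)
x = rng.normal(size=n)
g = rng.binomial(2, 0.2, size=n).astype(float)
y = 0.5 * t + 0.4 * g * t + rng.normal(size=n)
for w in (-0.5, 0.0, 0.5, 1.0):
    print(w, weighted_score_test(y, t, x, g, w))
```

Because the residuals $u$ do not depend on $w$, $S_w$ is linear in the two basic scores $G'u$ and $(GT)'u$, which is what makes a grid over $w$ cheap to evaluate.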
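Weighted joint test: another perspective
Recall the generalized linear model
$$\begin{aligned} g(E[Y]) &= \beta_0 + \beta_T T + \beta_X X + \beta_G G + \beta_{GT} GT \\ &= \beta_0 + \beta_T T + \beta_X X + (|\beta_G| + |\beta_{GT}|)\left(\frac{\beta_G}{|\beta_G| + |\beta_{GT}|}G + \frac{\beta_{GT}}{|\beta_G| + |\beta_{GT}|}GT\right) \\ &= \beta_0 + \beta_T T + \beta_X X + \beta(w_1 G + w_2 GT), \end{aligned}$$
where $\beta = |\beta_G| + |\beta_{GT}|$ and $w_1, w_2 \in [-1, 1]$ with $|w_1| + |w_2| = 1$. In light of this transformation, the marginal score test is the score test of a single variable $Z_w = w_1 G + w_2 GT$ that combines the main term and the interaction term according to some weight. Because flipping the signs of both weights only flips the sign of the score, a two-sided test loses nothing by taking $w_2 \ge 0$, which recovers $Z_w = wG + (1-|w|)GT$ with $w = w_1$. The optimal weights are $w_1 = \beta_G/(|\beta_G| + |\beta_{GT}|)$ and $w_2 = \beta_{GT}/(|\beta_G| + |\beta_{GT}|)$, which are never known in practice; we therefore developed an adaptive way to approximate the optimal weight.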
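A small numerical check of the reparameterization above, using illustrative effect sizes $\beta_G = 0.2$ and $\beta_{GT} = -0.6$ (hypothetical values). It also maps a pair $(w_1, w_2)$ with a negative interaction weight onto the single weight $w$ of $Z_w = wG + (1-|w|)GT$; both helper names are made up for this sketch.

```python
def optimal_weights(beta_g, beta_gt):
    """Return (w1, w2, beta) with beta*(w1*G + w2*GT) == beta_g*G + beta_gt*GT."""
    beta = abs(beta_g) + abs(beta_gt)
    if beta == 0:
        raise ValueError("weights are undefined when beta_G = beta_GT = 0")
    return beta_g / beta, beta_gt / beta, beta


def to_single_weight(w1, w2):
    """Map (w1, w2) to w in Z_w = w*G + (1-|w|)*GT, up to an overall sign."""
    # Flipping both signs leaves a two-sided score test unchanged,
    # so the interaction weight can always be made non-negative.
    return w1 if w2 >= 0 else -w1


w1, w2, beta = optimal_weights(0.2, -0.6)
w = to_single_weight(w1, w2)
print(w1, w2, beta, w)
```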
New method 1: Adaptive Weighted jOint Test (AWOT)
To choose the optimal weight, we propose an adaptive approach based on a grid search, for example:
- Equal-step weights: $w_1 = -1, -0.9, \dots, 0, 0.1, \dots, 1$;
- Weights focused on the interaction effect: $w_1 = -1, -0.5, -0.1, -0.05, -0.01, 0, 0.01, 0.05, 0.1, 0.5, 1$.
Let $p_w$ be the p-value of $S_w = (w_1 G + w_2 GT)'u$; the adaptive score test statistic is $\min P = \min_w p_w$. The p-value of $\min P$ can be calculated by noting that
$$P(\min P > p) = P\big(|S_w| < F_w^{-1}(1 - p),\ w = w_1, \dots, w_d\big),$$
where $F_w$ is the null CDF of $|S_w|$. Under $H_0$, $(S_{w_1}, \dots, S_{w_d})$ approximately follows a multivariate normal distribution with mean zero and covariance matrix
$$\Sigma_{ij} = Z_{w_i}'\,\mathrm{Cov}(u)\,Z_{w_j}.$$
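Since every $S_w = w_1 S_G + w_2 S_{GT}$ is a linear combination of $S_G = G'u$ and $S_{GT} = (GT)'u$, the $d$-dimensional normal vector above has rank 2, and $P(\min P \le p)$ reduces to a one-dimensional integral over directions in the plane. Writing the whitened score vector as $R(\cos\theta, \sin\theta)$ with $R^2 \sim \chi^2_2$ independent of a uniform $\theta$ gives $P(\min P \le p) = \frac{1}{2\pi}\int_0^{2\pi} \exp\{-z^2 / (2m(\theta)^2)\}\,d\theta$, where $z$ is the two-sided normal critical value and $m(\theta)$ is the largest $|\cos|$ between $\theta$ and any weight direction. This is our own route to evaluating the formula, not necessarily how AWOT is implemented; it assumes two-sided normal p-values and a known $2\times 2$ null covariance of $(S_G, S_{GT})$, and `awot_minp_pvalue` and the example covariance are hypothetical.

```python
import numpy as np
from scipy import stats


def awot_minp_pvalue(min_p, weights, sigma, n_theta=20000):
    """P(minP <= min_p) under H0 for S_w = w*S_G + (1-|w|)*S_GT (sketch).

    sigma: 2x2 null covariance of (S_G, S_GT); each p_w is a two-sided normal p-value.
    """
    w1 = np.asarray(weights, dtype=float)
    W = np.column_stack([w1, 1.0 - np.abs(w1)])      # d x 2 weight matrix
    L = np.linalg.cholesky(sigma)                    # sigma = L L'
    # (S_G, S_GT) = L v with v ~ N(0, I_2), so S_w = (L'w)' v
    V = W @ L
    V /= np.linalg.norm(V, axis=1, keepdims=True)    # unit direction per weight
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    E = np.column_stack([np.cos(theta), np.sin(theta)])
    # With v = R (cos theta, sin theta): minP <= min_p  iff  R * m(theta) >= z
    m = np.maximum(np.max(np.abs(E @ V.T), axis=1), 1e-12)
    z = stats.norm.isf(min_p / 2.0)                  # two-sided critical value
    # P(R >= r) = exp(-r^2 / 2) for R^2 ~ chi2_2; average over uniform theta
    return float(np.mean(np.exp(-0.5 * (z / m) ** 2)))


sigma = np.array([[2.0, 0.8], [0.8, 1.0]])           # made-up covariance
grid = [-1, -0.5, -0.1, -0.05, -0.01, 0, 0.01, 0.05, 0.1, 0.5, 1]
print(awot_minp_pvalue(1e-4, grid, sigma))
```

With a single weight the function returns `min_p` itself, and it can never exceed $\exp(-z^2/2)$, the value reached if the weights covered every direction in the plane.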
New method 2: Omnibus Test - Cauchy Weighted jOint Test (CWOT)
The minP approach is difficult to extend to the 1-df LRT because the joint distribution of the score and LRT statistics is unknown. To include the LRT and thereby cover more signal patterns, we use the Cauchy p-value combination method*
$$T = \frac{1}{2d}\sum_{i=1}^{d}\sum_{j=1}^{2}\tan\{\pi(0.5 - p_{ij})\},$$
where $p_{ij}$, $i = 1, \dots, d$, $j = 1, 2$, are the p-values of the 1-df LRT ($j = 1$) or the 1-df score test ($j = 2$) based on weight $w_i$. Then
$$P(T > t) \approx P(C > t) = \frac{1}{2} - \frac{\tan^{-1} t}{\pi}$$
for large $t$, where $C$ is a standard Cauchy random variable.
* Liu Y, Xie J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association 2019. DOI: 10.1080/01621459.2018.1554485
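The combination step itself is short enough to sketch directly. This assumes the equal weights $1/(2d)$ of the formula above and takes the per-weight p-values as given (a $d \times 2$ array of 1-df LRT and score p-values); `cauchy_combination` is a hypothetical helper name and the example p-values are made up.

```python
import numpy as np


def cauchy_combination(pvalues):
    """Equal-weight Cauchy combination of p-values (sketch)."""
    p = np.asarray(pvalues, dtype=float).ravel()
    t = float(np.mean(np.tan(np.pi * (0.5 - p))))
    # P(C > t) = 1/2 - arctan(t)/pi; for t > 0 the equivalent form
    # arctan(1/t)/pi avoids cancellation when t is large (tiny p-values).
    if t > 0:
        return float(np.arctan(1.0 / t) / np.pi)
    return float(0.5 - np.arctan(t) / np.pi)


# Rows: weights w_i; columns: 1-df LRT p-value, 1-df score p-value
pvals = np.array([[0.02, 0.03], [1e-6, 4e-6], [0.4, 0.5]])
print(cauchy_combination(pvals))
```

Since $\tan\{\pi(0.5 - p)\} \approx 1/(\pi p)$ for small $p$, the combined p-value is driven by the smallest inputs, roughly their harmonic-type average over the $2d$ tests.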
AWOT real data analysis results for continuous trait (CT)
[Figure panels: MK phase 2 trial, continuous trait, N=118, LRT; MK phase 2 trial, continuous trait, N=118, AWOT]
Simulation results: type I error, joint test for continuous trait
Columns: MAF, n, then CWOT_Score, AWOT, LRT and FT at $\alpha = 5\times10^{-8}$, followed by the same four methods at $\alpha = 1\times10^{-5}$.
0.05 200 1.07E-08 1.87E-08 1.44E-07 4.80E-08 5.64E-06 8.66E-06 1.86E-05 9.90E-06 0.15 1.60E-08 4.00E-08 1.72E-07 6.00E-08 5.87E-06 8.83E-06 1.89E-05 1.01E-05 0.25 8.00E-09 2.50E-08 1.91E-07 4.10E-08 5.99E-06 8.84E-06 1.88E-05 0.03 500 2.01E-08 8.97E-08 5.79E-08 8.34E-06 8.54E-06 1.31E-05 3.20E-08 3.40E-08 8.60E-08 4.60E-08 8.42E-06 8.63E-06 3.10E-08 3.30E-08 8.11E-08 5.01E-08 8.31E-06 8.37E-06 3.00E-08 7.90E-08 4.50E-08 0.01 1000 4.40E-08 4.70E-08 6.90E-08 5.70E-08 9.22E-06 9.07E-06 1.14E-05 9.99E-06 4.90E-08 7.30E-08 5.50E-08 9.08E-06 8.92E-06 1.13E-05 9.89E-06 3.70E-08 3.80E-08 6.10E-08 1.12E-05 9.86E-06 5.00E-08 7.10E-08 6.20E-08 9.32E-06 9.09E-06 9.94E-06 3.50E-08 6.40E-08 9.31E-06 9.11E-06 1.00E-05
Type I error ≥ 1.3α is marked in red; $\beta_T = 0.5$; number of simulations = 1E9.
CWOT_Score: CWOT using only the score-test p-values from each weight. FT: F test.
LRT tends to inflate type I error, whereas AWOT controls it much better (slightly conservatively). The F test controls type I error best.
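The LRT inflation in this table can be reproduced qualitatively with a small simulation. The sketch below is a toy under stated assumptions (n = 30, α = 0.05, one fixed design, Gaussian errors, 20,000 replicates), not the study's settings, which used far more replicates and much smaller α; `type1_error_f_vs_lrt` is a hypothetical helper. Because $LR = n\log\{1 + 2F/(n-p)\}$ is increasing in $F$, every F-test rejection is also an LRT rejection, and for n = 30 and p = 5 the LRT's exact null rejection rate works out to about 0.08.

```python
import numpy as np
from scipy import stats


def type1_error_f_vs_lrt(n=30, n_sim=20000, alpha=0.05, maf=0.3, seed=7):
    """Empirical type I error of the 2-df F test and Gaussian LRT (sketch)."""
    rng = np.random.default_rng(seed)
    t = np.repeat([0.0, 1.0], n // 2)
    x = rng.normal(size=n)
    g = rng.binomial(2, maf, size=n).astype(float)
    d0 = np.column_stack([np.ones(n), t, x])
    d1 = np.column_stack([d0, g, g * t])
    p = d1.shape[1]
    assert np.linalg.matrix_rank(d1) == p, "degenerate genotype draw"
    # Residual-maker matrices I - H for the null and full designs
    m0 = np.eye(n) - d0 @ np.linalg.pinv(d0)
    m1 = np.eye(n) - d1 @ np.linalg.pinv(d1)
    # n_sim responses under H0 (beta_G = beta_GT = 0) with beta_T = 0.5
    y = 0.5 * t[:, None] + rng.normal(size=(n, n_sim))
    rss0 = np.sum((m0 @ y) ** 2, axis=0)
    rss1 = np.sum((m1 @ y) ** 2, axis=0)
    f_stat = (n - p) * (rss0 - rss1) / (2.0 * rss1)
    lr_stat = n * np.log(rss0 / rss1)
    rate_f = np.mean(stats.f.sf(f_stat, 2, n - p) < alpha)
    rate_lrt = np.mean(stats.chi2.sf(lr_stat, 2) < alpha)
    return float(rate_f), float(rate_lrt)


print(type1_error_f_vs_lrt())
```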
Simulation results: power, joint test for continuous trait, $\alpha = 5\times10^{-8}$
CWOT real data analysis results for binary trait (BT)
[Figure: MK phase 3 trial, binary trait, N=704]
Simulation results: type I error, joint test for binary trait, $\alpha = 5\times10^{-8}$
MAF n CWOT LRT
0.05 200 3.40E-08 2.90E-08 3.30E-08 3.80E-08 4.64E-08 0.15 5.30E-08 1.02E-07 4.60E-08 6.90E-08 2.93E-08 6.59E-08 0.25 5.78E-08 7.68E-08 5.43E-08 9.57E-08 2.59E-08 7.97E-08 0.03 500 5.31E-08 6.50E-08 4.81E-08 3.94E-08 2.66E-08 1.98E-08 5.56E-08 7.35E-08 4.00E-08 3.70E-08 3.23E-08 3.35E-08 5.10E-08 5.70E-08 4.10E-08 7.50E-08 2.73E-08 4.48E-08 5.80E-08 4.40E-08 6.70E-08 2.98E-08 6.52E-08 0.01 1000 2.00E-08 2.10E-08 5.00E-08 3.41E-08 5.60E-08 7.90E-08 5.20E-08 3.49E-08 2.48E-08 6.10E-08 6.00E-08 3.11E-08 2.45E-08 4.80E-08 4.36E-08 8.49E-08 3.89E-08 6.75E-08
Type I error ≥ 6.5E-8 (about α + 2 SE) is marked in red; number of simulations = 1E9.
LRT tends to inflate type I error, whereas CWOT controls it well (slightly conservatively).
Simulation results: type I error, joint test for binary trait, $\alpha = 1\times10^{-5}$
MAF n AWOT CWOT LRT Firth
0.05 200 9.60E-06 1.21E-05 1.33E-05 1.40E-06 9.20E-06 8.80E-06 8.00E-06 5.20E-06 6.60E-06 6.10E-06 6.30E-06 5.00E-06 0.15 9.80E-06 1.40E-05 1.59E-05 6.80E-06 8.60E-06 6.20E-06 3.80E-06 6.97E-06 1.06E-05 5.60E-06 0.25 9.00E-06 1.09E-05 1.32E-05 1.12E-05 1.61E-05 7.20E-06 2.20E-06 1.02E-05 1.56E-05 4.60E-06 0.03 500 1.10E-05 1.26E-05 1.43E-05 1.18E-05 8.50E-06 6.90E-06 5.35E-06 4.24E-06 7.40E-06 1.22E-05 1.19E-05 1.04E-05 5.40E-06 6.00E-06 1.14E-05 1.00E-05 7.80E-06 1.05E-05 1.17E-05 8.20E-06 1.16E-05 9.90E-06 1.30E-05 4.20E-06 8.59E-06 1.52E-05 0.01 1000 8.90E-06 1.01E-05 1.80E-06 8.40E-06 3.40E-06 4.80E-06 1.28E-05 6.40E-06 9.40E-06 8.38E-06 6.16E-06 7.00E-06 1.29E-05 1.08E-05 1.44E-05 7.82E-06 8.73E-06 9.35E-06 1.38E-05 9.30E-06 1.20E-05 9.50E-06 1.11E-05 7.41E-06 9.34E-06
Type I error ≥ 1.3E-05 (2 SE) is marked in red; number of simulations = 5E6.
LRT tends to inflate type I error, whereas CWOT controls it well. The Firth method has the most conservative type I error rate.
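The red-flag cut-offs in the two binary-trait tables can be checked against the binomial standard error of an empirical rejection rate, $\mathrm{SE} = \sqrt{\alpha(1-\alpha)/N}$ for $N$ simulations. Assuming that is how "2SE" is meant, both settings give $\alpha + 2\,\mathrm{SE} \approx 1.28\alpha$, in line with the $1.3\alpha$ rule used for the continuous-trait table; `flag_threshold` is a hypothetical helper.

```python
import math


def flag_threshold(alpha, n_sim):
    """alpha plus two binomial standard errors of an empirical rejection rate."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_sim)
    return alpha + 2.0 * se


for alpha, n_sim in [(5e-8, 1e9), (1e-5, 5e6)]:
    thr = flag_threshold(alpha, n_sim)
    print(f"alpha={alpha:g}, n_sim={n_sim:g}: threshold={thr:.3e} ({thr / alpha:.2f} x alpha)")
```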
Simulation results: power, joint test for binary trait, $\alpha = 5\times10^{-8}$
In the PGx scenarios of greatest interest, where the main effect is weak while the interaction effect is large and in the same direction as the treatment effect, the proposed CWOT method has higher power than the 2-df LRT and the Firth test.
Settings: MAF = 15%, $\beta_T = 0.5$, $\beta_0 = -1.75$; the sample size is adjusted to give suitable power. Legend: G=0: SNP-; G=1: SNP+.
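A sketch of a data generator for this power scenario, using the stated $\beta_0 = -1.75$, $\beta_T = 0.5$ and MAF = 15%. Everything else is an assumption: 1:1 randomization, Hardy-Weinberg genotypes coded 0/1/2 (with an optional carrier coding to mirror the SNP-/SNP+ legend), no covariates $X$, and whatever $\beta_G$, $\beta_{GT}$ the caller supplies; `simulate_binary_pgx` is hypothetical and is not the study's simulation code.

```python
import numpy as np


def simulate_binary_pgx(n, beta_g, beta_gt, beta_0=-1.75, beta_t=0.5,
                        maf=0.15, dominant=False, seed=0):
    """Draw (y, t, g) from logit P(Y=1) = b0 + bT*T + bG*G + bGT*G*T (sketch)."""
    rng = np.random.default_rng(seed)
    t = rng.integers(0, 2, size=n).astype(float)     # 1:1 randomization (assumed)
    g = rng.binomial(2, maf, size=n).astype(float)    # HWE allele count (assumed)
    if dominant:
        g = (g > 0).astype(float)                     # SNP+ carriers coded as 1
    eta = beta_0 + beta_t * t + beta_g * g + beta_gt * g * t
    prob = 1.0 / (1.0 + np.exp(-eta))                 # inverse logit
    y = rng.binomial(1, prob)
    return y, t, g


# Hypothetical "weak main effect, strong same-direction interaction" setting
y, t, g = simulate_binary_pgx(2000, beta_g=0.05, beta_gt=1.0)
print(y[t == 0].mean(), y[t == 1].mean())
```

Under these values the placebo response rate among SNP- patients is $\mathrm{expit}(-1.75) \approx 0.148$ and the treated SNP- rate is $\mathrm{expit}(-1.25) \approx 0.223$.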
Conclusions and challenges
Joint test for association studies
- LRT tends to inflate type I error, while AWOT/CWOT control type I error well.
- The Firth method has the most conservative type I error rate.
- The proposed AWOT/CWOT have higher power than the 2-df LRT for the PGx signal patterns of greatest interest.
Challenges of PGx in early-phase clinical trials
- Winner's curse adjustment is needed when designing a confirmatory study.
- PGx signals may be too weak in early-phase studies.
- Prediction/stratification using (many) genetic markers: polygenic score?
Thank you!
Appendix: Score test with SPA
Let $\hat\mu$ be the estimator of $E[Y]$ under the null model, and define the residuals $u = Y - \hat\mu$. The marginal score statistic is $S = G'u$. The null distribution of $S$ can be approximated by a normal distribution. For a binary trait the accuracy can be further improved by the saddlepoint approximation (SPA*), because the cumulant-generating function $K(t)$ of $S$ can be written out explicitly. SPA then calculates the p-value of $S$ as
$$P(S > s) = 1 - F(s) \approx 1 - \Phi\left(x + \frac{1}{x}\log\frac{y}{x}\right),$$
where $x = \operatorname{sign}(\hat t)\sqrt{2\{\hat t s - K(\hat t)\}}$, $y = \hat t\sqrt{K''(\hat t)}$, and $\hat t$ is the solution of $K'(\hat t) = s$.
* Dey R, Schmidt EM, Abecasis GR, Lee S. A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. Am. J. Hum. Genet. 2017;2:014.
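A minimal sketch of the SPA p-value for a binary trait, assuming independent $Y_i \sim \mathrm{Bernoulli}(\hat\mu_i)$ with $\hat\mu$ treated as fixed, so that $K(t) = \sum_i \log(1 - \hat\mu_i + \hat\mu_i e^{G_i t}) - t\sum_i G_i \hat\mu_i$ (the form used by Dey et al.). It handles the upper tail only and uses SciPy's `brentq` to solve $K'(\hat t) = s$; `spa_upper_pvalue` is a hypothetical name, not the API of any SPA package.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom, norm


def spa_upper_pvalue(s, g, mu):
    """Saddlepoint approximation of P(S > s), S = G'(Y - mu) (sketch, s > 0 only)."""
    g = np.asarray(g, dtype=float)
    mu = np.asarray(mu, dtype=float)
    gm = float(g @ mu)

    def K(t):   # cumulant-generating function of S
        return float(np.sum(np.log1p(mu * np.expm1(g * t))) - t * gm)

    def K1(t):  # K'(t)
        e = mu * np.exp(g * t)
        return float(np.sum(g * e / (1.0 - mu + e)) - gm)

    def K2(t):  # K''(t)
        e = mu * np.exp(g * t)
        return float(np.sum(g ** 2 * (1.0 - mu) * e / (1.0 - mu + e) ** 2))

    if s <= 0:
        raise ValueError("this sketch handles the upper tail (s > 0) only")
    hi = 1.0
    while K1(hi) < s:            # K' is increasing, so widen the bracket
        hi *= 2.0
        if hi > 64:
            raise ValueError("s is outside the support of S")
    t_hat = brentq(lambda t: K1(t) - s, 0.0, hi)     # solve K'(t) = s
    x = np.sign(t_hat) * np.sqrt(2.0 * (t_hat * s - K(t_hat)))
    y = t_hat * np.sqrt(K2(t_hat))
    return float(norm.sf(x + np.log(y / x) / x))


# With all G_i = 1 and mu_i = 0.1, S = (sum of Y) - 20 for n = 200,
# so the SPA value can be compared with the exact binomial tail.
p_spa = spa_upper_pvalue(15.0, np.ones(200), np.full(200, 0.1))
print(p_spa, binom.sf(34, 200, 0.1), norm.sf(15.0 / np.sqrt(18.0)))
```

In this skewed example the SPA value falls between the exact binomial tails $P(\sum Y \ge 36)$ and $P(\sum Y \ge 35)$, while the plain normal approximation lands well below both.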