It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of.

It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual.
- Changes in the way we do phase II trials
- Adaptive methods for the refinement and evaluation of predictive biomarkers in the pivotal trials in a non-exploratory manner
- Use of archived tissues in focused “prospective-retrospective” designs based on randomized pivotal trials

Biomarker Adaptive Threshold Design
Wenyu Jiang, Boris Freidlin & Richard Simon JNCI 99: , 2007

Biomarker Adaptive Threshold Design
- Randomized trial of T vs C
- Have identified a univariate biomarker index B thought to be predictive of patients likely to benefit from T relative to C
- Eligibility not restricted by biomarker
- No threshold for biomarker determined
- Biomarker value scaled to range (0,1)
- Time-to-event data

Procedure A: Fallback Procedure
- Compare T vs C for all patients
- If results are significant at level .04, claim broad effectiveness of T
- Otherwise proceed as follows

Procedure A
- Test T vs C restricted to patients with biomarker B > b
- Let S(b) be the log likelihood ratio statistic
- Repeat for all values of b
- Let S* = max{S(b)}
- Compute the null distribution of S* by permuting treatment labels
- If the data value of S* is significant at the 0.01 level, then claim effectiveness of T for a patient subset
- Compute point and interval estimates of the threshold b
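The subset stage of Procedure A can be sketched as follows. This is only an illustration under stated assumptions: the outcome is a binary response rather than time-to-event, a squared two-proportion z statistic stands in for the log likelihood ratio statistic S(b), and the function names, the grid of cutoffs and the toy data are all hypothetical.

```python
import random

def z2_stat(treated, control):
    """Squared two-proportion z statistic: a stand-in for S(b)."""
    n1, n0 = len(treated), len(control)
    if n1 < 5 or n0 < 5:  # too few patients above the cutoff to test
        return 0.0
    pooled = (sum(treated) + sum(control)) / (n1 + n0)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n0)
    diff = sum(treated) / n1 - sum(control) / n0
    return 0.0 if var == 0 else diff ** 2 / var

def max_stat(arm, marker, outcome, cutoffs):
    """Return (S*, b*): the largest S(b) over cutoffs b, using patients with B > b."""
    best, best_b = -1.0, None
    for b in cutoffs:
        t = [y for a, m, y in zip(arm, marker, outcome) if m > b and a == 1]
        c = [y for a, m, y in zip(arm, marker, outcome) if m > b and a == 0]
        s = z2_stat(t, c)
        if s > best:
            best, best_b = s, b
    return best, best_b

def permutation_p(arm, marker, outcome, cutoffs, n_perm=500, seed=0):
    """Permutation p-value of S*, re-maximizing over cutoffs in every shuffle."""
    rng = random.Random(seed)
    s_obs, b_hat = max_stat(arm, marker, outcome, cutoffs)
    labels = list(arm)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_stat(labels, marker, outcome, cutoffs)[0] >= s_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1), b_hat, s_obs

# Toy data (hypothetical): T helps only patients with marker above 0.7.
rng = random.Random(1)
n = 400
arm = [i % 2 for i in range(n)]
marker = [rng.random() for _ in range(n)]
outcome = [int(rng.random() < (0.8 if a == 1 and m > 0.7 else 0.25))
           for a, m in zip(arm, marker)]
cutoffs = [i / 10 for i in range(10)]
p_value, b_hat, s_star = permutation_p(arm, marker, outcome, cutoffs)
print(p_value, b_hat, s_star)
```

Re-maximizing over every cutoff inside each permutation is what keeps the 0.01-level test valid even though the threshold is chosen from the data.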

Sample Size Planning (A)
- Standard broad eligibility trial is sized for 80% power to detect reduction in hazard D at significance level 5%
- Biomarker adaptive threshold design is sized for 80% power to detect the same reduction in hazard D at significance level 4% for the overall analysis

| Model | Hazard reduction for those who benefit | Overall Power | Adaptive Test |
| --- | --- | --- | --- |
| Everyone benefits | 33% | .775 | .751 |
| 50% benefit | 60% | .888 | .932 |
| 25% benefit | 60% | .429 | .604 |

Biomarker Adaptive Threshold A
Estimated Power of Broad Eligibility Design (n=386 events) vs Adaptive Design A (n=412 events); 80% power for 30% hazard reduction

| Model | Broad Eligibility Design | Biomarker Adaptive Threshold A |
| --- | --- | --- |
| 40% reduction in 50% of patients (22% overall reduction) | .70 | .78 |
| 60% reduction in 25% of patients (20% overall reduction) | .65 | .91 |
| 79% reduction in 10% of patients (14% overall reduction) | .35 | .93 |

Estimation of Threshold

506 prostate cancer patients were randomly allocated to one of four arms. Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as control arm C; 1.0 mg DES and 5.0 mg DES were combined as E. The end-point was overall survival (death from any cause).
Covariates:
- Age: in years
- Performance status (pf): not bed-ridden at all vs other
- Tumor size (sz): size of the primary tumor (cm2)
- Index of a combination of tumor stage and histologic grade (sg)
- Serum prostatic acid phosphatase levels (ap)

Prostate Cancer Data

| Covariate | # patients with measured covariate | Overall Test p value | Procedure A Stage 2 p value | Procedure B p value |
| --- | --- | --- | --- | --- |
| AP | 505 | .084 | .019 | .041 |
| SG | 494 | .110 | .025 | .050 |

Prostate Cancer Data

| Covariate | # patients with measured covariate | Estimated Threshold | 95% CI | 80% CI |
| --- | --- | --- | --- | --- |
| AP | 505 | 36 | (9,170) | (25,108) |
| SG | 494 | 11 | (10,13) | (11,11) |

Procedure B
- S(b) = log likelihood ratio statistic for treatment effect in the subset of patients with B > b
- Test statistic T = max{S(0) + R, max{S(b)}}
- Compute the null distribution of the statistic T by permuting treatment labels
- If the data value of the statistic T is significant at the 0.05 level, then reject the null hypothesis that treatment T is ineffective
- Compute point and interval estimates of the threshold b

Sample Size Planning (B)
- Estimate the power of procedure B relative to the standard broad eligibility trial based on Table 1, using the row corresponding to the expected proportion of sensitive patients and the target hazard ratio for sensitive patients
- e.g. 25% sensitive patients and a hazard ratio of .4 give RE = .429/.641 = .67
- When B has power 80%, the overall test has power 80% × .67 = 53%
- Use formula B.2 to determine the approximate number of events needed for the overall test to have power 53% for detecting a hazard ratio of .4 limited to 25% of patients

Example Sample Size Planning for Procedure B
- Design a trial to detect a hazard ratio of 0.4 (60% reduction) limited to 25% of patients
- Relative efficiency from Table 1: .429/.641 = .67
- When procedure B has power 80%, the standard test has power 80% × .67 = 53%
- Formula B.2 gives D’=230 events to have 53% power for the overall test and thus approximately 80% power for B
- The overall test needs D=472 events for 80% power for detecting the diluted treatment effect

Events Needed to Detect a Hazard Ratio With Proportional Hazards

Events (D’) Needed for Overall Test to Detect a Hazard Ratio Limited to a Fraction of Patients

Multiple Biomarker Design: A Generalization of the Biomarker Adaptive Threshold Design
- Have identified K candidate binary classifiers B1, …, BK thought to be predictive of patients likely to benefit from T relative to C
- RCT comparing new treatment T to control C
- Eligibility not restricted by candidate classifiers
- Let the B0 classifier classify all patients positive

Test T vs C restricted to patients positive for Bk for k=0,1,…,K
- Let S(Bk) be a measure of treatment effect in patients positive for Bk
- Let S* = max{S(Bk)}, k* = argmax{S(Bk)}
- S* is the largest treatment effect observed
- k* is the marker that identifies the patients where the largest treatment effect is observed

For a global test of significance
- Randomly permute the treatment labels and repeat the process of computing S* for the shuffled data
- Repeat this to generate the distribution of S* under the null hypothesis that there is no treatment effect for any subset of patients
- The statistical significance level is the area in the tail of the null distribution beyond the value of S* obtained for the unshuffled data
- If the data value of S* is significant at the 0.05 level, then claim effectiveness of T for patients positive for marker k*

Repeating the analysis for bootstrap samples of cases provides an estimate of the stability of k* (the indication)
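The bootstrap stability check for k* might look like the sketch below. It assumes a binary response outcome and uses the difference in response proportions as the treatment-effect measure S(Bk); the slides do not fix a particular measure, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter

def effect(arm, outcome, positive):
    """Response-rate difference (T minus C) among marker-positive patients."""
    t = [y for a, y, pos in zip(arm, outcome, positive) if pos and a == 1]
    c = [y for a, y, pos in zip(arm, outcome, positive) if pos and a == 0]
    if not t or not c:
        return float("-inf")  # effect not estimable in this subset
    return sum(t) / len(t) - sum(c) / len(c)

def best_marker(arm, outcome, classifiers):
    """Return (k*, S*); classifiers[0] plays the role of B0 (everyone positive)."""
    scores = [effect(arm, outcome, col) for col in classifiers]
    k_star = max(range(len(scores)), key=scores.__getitem__)
    return k_star, scores[k_star]

def bootstrap_k_star(arm, outcome, classifiers, n_boot=200, seed=0):
    """Fraction of bootstrap samples of cases in which each marker is selected."""
    rng = random.Random(seed)
    n = len(arm)
    tally = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients
        k, _s = best_marker([arm[i] for i in idx],
                            [outcome[i] for i in idx],
                            [[col[i] for i in idx] for col in classifiers])
        tally[k] += 1
    return {k: count / n_boot for k, count in tally.items()}

# Toy data (hypothetical): only marker B1 identifies patients who benefit.
rng = random.Random(2)
n = 300
arm = [i % 2 for i in range(n)]
b1 = [rng.random() < 0.3 for _ in range(n)]
b2 = [rng.random() < 0.5 for _ in range(n)]
outcome = [int(rng.random() < (0.8 if a == 1 and pos else 0.25))
           for a, pos in zip(arm, b1)]
classifiers = [[True] * n, b1, b2]
k_star, s_star = best_marker(arm, outcome, classifiers)
stability = bootstrap_k_star(arm, outcome, classifiers)
print(k_star, stability)
```

A marker selected in nearly every bootstrap sample suggests a stable indication; selections spread across several markers suggest the choice of k* is not well determined by the data.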

Adaptive Signature Design
An adaptive design for generating and prospectively testing a gene expression signature for sensitive patients
Boris Freidlin and Richard Simon, Clinical Cancer Research 11:7872-8, 2005

Adaptive Signature Design End of Trial Analysis
- Compare E to C for all patients at significance level 0.04
- If overall H0 is rejected, then claim effectiveness of E for eligible patients
- Otherwise:
  - Using only the first half of patients accrued during the trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment E compared to control C
  - Compare E to C for patients accrued in the second stage who are predicted responsive to E based on the classifier
  - Perform the test at significance level 0.01
  - If H0 is rejected, claim effectiveness of E for the subset defined by the classifier

Treatment effect restricted to subset. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.

| Test | Power |
| --- | --- |
| Overall .05 level test | 46.7 |
| Overall .04 level test | 43.1 |
| Sensitive subset .01 level test (performed only when overall .04 level test is negative) | 42.2 |
| Overall adaptive signature design | 85.3 |

Overall treatment effect, no subset effect. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.

| Test | Power |
| --- | --- |
| Overall .05 level test | 74.2 |
| Overall .04 level test | 70.9 |
| Sensitive subset .01 level test | 1.0 |
| Overall adaptive signature design | |

True Model

Classifier Development
- Using data from stage 1 patients, fit all single gene logistic models (j=1,…,M)
- Select genes with interaction significant at a prespecified level

Classification of Stage 2 Patients
For i’th stage 2 patient, selected gene j votes to classify patient as preferentially sensitive to T if

Classification of Stage 2 Patients
Classify i’th stage 2 patient as differentially sensitive to T relative to C if at least G selected genes vote for differential sensitivity of that patient

Empirical Power; Response Rate for Control Patients 25%

| Response Rate in Sensitive Subset | Overall .05 | Overall .04 | Subset .01 | Overall Adaptive |
| --- | --- | --- | --- | --- |
| 98% | 49.5 | 45.4 | 75.8 | 85.7 |
| 95% | 43.0 | 38.5 | 63.1 | 75.0 |
| 87% | 36.7 | 31.7 | 34.5 | 51.6 |
| 80% | 31.6 | 28.4 | 17.6 | 38.8 |
| 71% | 26.0 | 22.6 | 6.3 | 26.3 |

Adaptive Signature Design for Clinical Trial of Advanced Prostate Cancer
Richard Simon, D.Sc.
Chief, Biometric Research Branch, National Cancer Institute

Cancers of a primary site often represent a heterogeneous group of diverse molecular diseases which vary fundamentally with regard to the oncogenic mutations that cause them and their responsiveness to specific drugs

How can we develop new drugs in a manner more consistent with modern tumor biology and obtain reliable information about what regimens work for what kinds of patients?

Developing a drug with a companion test increases complexity and cost of development but should improve chance of success and has substantial benefits for patients and for the economics of medical care

Although the randomized clinical trial remains of fundamental importance for predictive genomic medicine, some of the conventional wisdom of how to design and analyze RCTs requires re-examination. The concept of doing an RCT of thousands of patients to answer a single question about average treatment effect for a target population presumed homogeneous with regard to the direction of treatment efficacy in many cases no longer has an adequate scientific basis.

Predictive biomarkers
Measured before treatment to identify who will benefit from a particular treatment

Prospective Co-Development of Drugs and Companion Diagnostics in Ideal Settings
- Develop a completely specified classifier identifying the patients most likely to benefit from a new drug, based on biology, pre-clinical data and phase I-II studies
- Establish analytical validity of the classifier
- Design and analyze a focused clinical trial to evaluate effectiveness of the new treatment and how it relates to the classifier

Cancer biology is complex and it is not always possible to have the right single completely defined predictive classifier identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual.
- Adaptive methods for the refinement and evaluation of predictive biomarkers in the pivotal trials in a non-exploratory manner
- Use of archived tissues in focused “prospective-retrospective” designs based on previously conducted randomized pivotal trials (Simon, Paik, Hayes; JNCI 101:1-7, 2009)

Adaptive Signature Design
Boris Freidlin and Richard Simon Clinical Cancer Research 11:7872-8, 2005

Adaptive Signature Design End of Trial Analysis
- Compare X to C for all patients at significance level 0.01
- If overall H0 is rejected, then claim effectiveness of X for eligible patients
- Otherwise, compare X to C in an adaptively defined subset of patients using a threshold of statistical significance of 0.04

- Divide the patients randomly into a training set T and a validation set V. The training set will contain one-third of the patients.
- Using the biomarker information, treatment and outcome for the patients in T, develop a binary classifier that identifies the subset of patients who appear most likely to benefit from the new treatment X compared to control C
- f(B1,B2,B3,B4) = log hazard ratio of death for X relative to C as a function of biomarker values
- If f(B1,B2,B3,B4)/ser < c then Classifier(B1,B2,B3,B4) = X
- If f(B1,B2,B3,B4)/ser > c then Classifier(B1,B2,B3,B4) = C
- Cutpoint c optimized

Use the classifier developed in training set T to classify the patients in the validation set V.
- Let VX denote the subset of patients in V who are classified as likely to benefit from X
- Compare survivals of patients who received X to survivals of those who received C for patients in VX
- If the difference in survival is significant at level 0.04, then the new treatment is more effective than the control for patients with biomarker values for which Classifier(B1,B2,B3,B4) = X.

This approach can also be used to identify the subset of patients who don’t benefit from X in cases where X is superior to C overall at the 0.01 level. The patients in VC= V – VX are predicted not to benefit from X. Survivals of X vs C can be examined for patients in that subset and a confidence interval for the hazard ratio calculated.

This design has improved statistical power for identifying treatments that benefit a subset of patients in molecularly heterogeneous diseases. It has greater specificity than the standard approach, which results in over-treatment of vast numbers of patients with approved drugs that do not benefit them.

Sample Size Planning for Advanced Prostate Cancer Trial
- Survival endpoint
- Final analysis when there are 700 deaths total
- 90% power for detecting a 25% overall reduction in hazard at two-sided 0.01 significance level (increase in median from 9 months to 12 months)
- Power for evaluating treatment in adaptively determined subset: 157 deaths required for 80% power to detect 37% reduction in hazard at two-sided 0.04 significance level
- If one-third of patients in the validation set are classifier positive, then to have 157 deaths in the subset we need 157*3=471 deaths in the validation set
- Since the validation set is two-thirds of the total, we require 707 total deaths
- To have 700 deaths at final analysis, 935 patients will be accrued and followed till the event rate is 75%
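The death counts above can be checked with a short calculation. The sketch assumes the usual approximation for a 1:1 randomized log-rank comparison, D = 4(z_{1-α/2} + z_{power})² / (log HR)²; the slides do not state which formula was used, and the function names are hypothetical.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

NORM = NormalDist()

def events_needed(hazard_ratio, alpha_two_sided, power):
    """Approximate deaths needed for a 1:1 randomized log-rank comparison."""
    z_sum = NORM.inv_cdf(1 - alpha_two_sided / 2) + NORM.inv_cdf(power)
    return 4 * z_sum ** 2 / log(hazard_ratio) ** 2

def power_at(events, hazard_ratio, alpha_two_sided):
    """Approximate power of the same comparison for a given number of deaths."""
    z_crit = NORM.inv_cdf(1 - alpha_two_sided / 2)
    return NORM.cdf(sqrt(events) / 2 * abs(log(hazard_ratio)) - z_crit)

# Subset analysis: 37% hazard reduction, two-sided 0.04, 80% power.
subset_deaths = events_needed(0.63, 0.04, 0.80)      # about 157
validation_deaths = 3 * round(subset_deaths)         # one-third classifier positive
total_deaths = ceil(validation_deaths * 3 / 2)       # validation set is two-thirds

# Overall analysis: 25% hazard reduction, two-sided 0.01, 700 deaths.
overall_power = power_at(700, 0.75, 0.01)

print(round(subset_deaths), validation_deaths, total_deaths, round(overall_power, 2))
```

Under this approximation the subset calculation reproduces the 157, 471 and 707 figures, and 700 deaths give overall power close to the stated 90%.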

Sample Size Planning
For this example, the sample size is strongly dependent on having high statistical power for detecting a relatively modest treatment effect overall and in an adaptively defined subset consisting of only 33% of the patients. The number of required patients can be substantially reduced by:
- Targeting larger treatment effects
- Targeting treatment benefits that apply to more than 33% of the patients
- Refining the simple interim analysis for futility described for this example

Tumor specimen at entry as condition for eligibility
- Specimen preserved for later assay
- Assays will be performed prior to analysis using analytically validated tests that are reproducible, robust and accurate for use with archived tissue
- No cut-point required
- Additional markers could be included prior to using specimens

Interim Futility Analysis
- Interim futility analysis conducted when there are approximately 340 patients who have been followed for 6 months after randomization
- The analysis will use 6-month progression-free survival as intermediate endpoint
- If the difference between the X group and the C group is not significant at the one-sided 0.20 level, then accrual will be terminated
- Power 90% for detecting a 12 percentage point increase in the proportion free of recurrence at 6 months from a baseline of 40%
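A rough check of the interim power statement is sketched below. It assumes the roughly 340 patients are split evenly between arms (170 each) and uses a simple normal approximation for comparing two proportions; neither assumption is stated on the slide, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def futility_power(p_control, p_treatment, n_per_arm, alpha_one_sided):
    """Normal-approximation power of a one-sided two-proportion comparison."""
    norm = NormalDist()
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treatment * (1 - p_treatment) / n_per_arm)
    z_alpha = norm.inv_cdf(1 - alpha_one_sided)
    return norm.cdf((p_treatment - p_control) / se - z_alpha)

# 6-month PFS of 40% on C vs 52% on X, one-sided 0.20 level.
power = futility_power(0.40, 0.52, n_per_arm=170, alpha_one_sided=0.20)
print(round(power, 2))
```

With these inputs the approximation lands in the neighborhood of the 90% power quoted for the interim analysis.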

Interim Futility Analysis
- Interim futility analysis does not utilize any of the 5% type I error of the study
- Using 6-month PFS as endpoint for interim futility analysis does not assume that PFS is a valid surrogate of survival; only that it is plausible to not expect a survival benefit if there is no PFS benefit
- Using PFS enables the trial to be stopped earlier if there is no evidence of benefit for X
- The one-sided 0.20 significance level is used because the overall effect may be weak if the treatment benefits only a 33% subset of the patients

If the Markers Were Measured at Randomization
- Analytically validated tests would be required by the start of accrual
- The interim analysis could involve marker-defined subsets of patients
- Restricting accrual based on interim evaluation of marker specific treatment effects could substantially reduce sample size but would introduce additional issues not addressed in the current design

Key Features
- Trial-wise type I error limited to 0.05: chance of any false positive conclusion of treatment benefit limited to 0.05
- Randomized treatment assignment
- Regulatory endpoint
- Sample size sufficient for evaluating treatment effect in 33% subset
- Biomarkers measured using analytically validated tests
- Analysis algorithm pre-defined, and specific analysis plan defined prior to any assaying of tumors or data analysis

This approach is as sound statistically as the conventional one-treatment-fits-all design
- It provides strong evidence for evaluating the new treatment overall and within the classifier positive subset and for evaluating the classifier
- In settings where a single conventional “average effect” trial would be the basis for drug approval, this design should be the basis for approval either overall or for the identified subset.

This approach is more science based and consistent with tumor biology than the standard approach of treating thousands of patients with a heterogeneous disease to answer one question of whether the average treatment effect is zero and then treating everyone in a one treatment fits all manner.

Cross-Validated Adaptive Signature Design
Wenyu Jiang, Boris Freidlin, Richard Simon Clin Ca Res 16:691-8, 2010

Cross-Validated Adaptive Signature Design End of Trial Analysis
- Compare T to C for all patients at significance level α_overall
- If overall H0 is rejected, then claim effectiveness of T for eligible patients
- Otherwise

Otherwise:
- Partition the full data set into K parts
- Form a training set by omitting one of the K parts. The omitted part is the test set
- Using the training set, develop a predictive classifier of the subset of patients who benefit preferentially from the new treatment T compared to control C, using the methods developed for the ASD
- Classify the patients in the test set as either sensitive or not sensitive to T relative to C
- Repeat this procedure K times, leaving out a different part each time
- After this is completed, all patients in the full dataset are classified as sensitive or insensitive

Perform test at significance level 0.05 - α_overall
- Compare T to C for sensitive patients by computing a test statistic S, e.g. the difference in response proportions or the log-rank statistic (for survival)
- Generate the null distribution of S by permuting the treatment labels and repeating the entire K-fold cross-validation procedure
- Perform the test at significance level 0.05 - α_overall
- If H0 is rejected, claim effectiveness of T for the subset defined by the classifier
- The sensitive subset is determined by developing a classifier using the full dataset

80% Response to T in Sensitive Patients; 25% Response to T Otherwise; 25% Response to C; 10% Patients Sensitive

| Test | ASD | CV-ASD |
| --- | --- | --- |
| Overall 0.05 Test | 0.223 | 0.240 |
| Overall 0.04 Test | 0.198 | 0.209 |
| Sensitive Subset 0.01 Test | 0.205 | 0.661 |
| Overall Power | 0.351 | 0.714 |

70% Response to T in Sensitive Patients; 25% Response to T Otherwise; 25% Response to C; 20% Patients Sensitive

| Test | ASD | CV-ASD |
| --- | --- | --- |
| Overall 0.05 Test | 0.486 | 0.503 |
| Overall 0.04 Test | 0.452 | 0.471 |
| Sensitive Subset 0.01 Test | 0.207 | 0.588 |
| Overall Power | 0.525 | 0.731 |

70% Response to T in Sensitive Patients; 25% Response to T Otherwise; 25% Response to C; 30% Patients Sensitive

| Test | ASD | CV-ASD |
| --- | --- | --- |
| Overall 0.05 Test | 0.830 | 0.838 |
| Overall 0.04 Test | 0.794 | 0.808 |
| Sensitive Subset 0.01 Test | 0.306 | 0.723 |
| Overall Power | 0.825 | 0.918 |

35% Response to T; 25% Response to C; No Subset Effect
- Overall 0.05 Test: ASD 0.586, CV-ASD 0.594
- Overall 0.04 Test: ASD 0.546, CV-ASD 0.554
- Sensitive Subset 0.01 Test: 0.009
- Overall Power

25% Response to T; 25% Response to C; No Subset Effect
- Overall 0.05 Test: ASD 0.047, CV-ASD 0.056
- Overall 0.04 Test: ASD 0.04, CV-ASD 0.048
- Sensitive Subset 0.01 Test: 0.001
- Overall Power: 0.041

Predictive Analysis of Clinical Trials
Using cross-validation we can evaluate our methods for analysis of clinical trials, including complex subset analysis algorithms, in terms of their effect on improving patient outcome via informing therapeutic decision making.
R. Simon, Clinical trials for predictive medicine, Clinical Trials 2010:1-9

- Define an algorithm for predicting optimal treatment as a function of covariate vector x using training dataset D
- For patients with covariate vector x, the algorithm predicts preferred treatment R(x | D) = T or R(x | D) = C

- At the conclusion of the trial, randomly partition the patients into 10 equally sized sets P1, …, P10
- Let D-i denote the full dataset minus data for patients in Pi
- Using 10-fold complete cross-validation, omit patients in Pi
- Apply the defined algorithm to analyze the trial using only data in D-i
- For each patient j in Pi, record the treatment recommendation based on fitting the algorithm to data D-i, i.e. Rj=T or Rj=C

Repeat the above for all 10 loops of the cross-validation
When all 10 loops are completed, all patients have been classified as what their optimal treatment is predicted to be

Test of Significance for Effectiveness of T vs C Using the New Algorithm
- Let A denote the set of patients for whom treatment T is predicted optimal with the new algorithm, i.e. Rj=T
- Compare outcomes for patients in A who actually received T to those in A who actually received C
- Let z = standardized log-rank statistic
- Compute statistical significance of z by randomly permuting treatment labels and repeating the entire procedure
- Do this 1000 or more times to generate the permutation null distribution of treatment effect for the patients predicted to be the best candidates for T
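The cross-validated test can be sketched end to end as below. Everything specific here is an assumption made for illustration: the outcome is a binary response, a standardized difference in response proportions stands in for the standardized log-rank statistic z, the prediction algorithm R(x | D) is a deliberately simple rule on one binary marker, and the function names and toy data are hypothetical.

```python
import random

def recommend(train, margin=0.0):
    """Hypothetical algorithm R(x | D): for each marker level, recommend T when
    the training-set response difference (T minus C) exceeds margin."""
    rule = {}
    for level in (0, 1):
        t = [y for x, a, y in train if x == level and a == 1]
        c = [y for x, a, y in train if x == level and a == 0]
        diff = (sum(t) / len(t) - sum(c) / len(c)) if t and c else 0.0
        rule[level] = "T" if diff > margin else "C"
    return rule

def cv_recommendations(data, folds):
    """Complete cross-validation: refit on D minus P_i, then classify P_i."""
    recs = [None] * len(data)
    for fold in folds:
        held_out = set(fold)
        rule = recommend([row for j, row in enumerate(data) if j not in held_out])
        for j in fold:
            recs[j] = rule[data[j][0]]
    return recs

def subset_z(data, recs):
    """Standardized response difference, T vs C, among patients with R_j = T."""
    t = [y for (x, a, y), r in zip(data, recs) if r == "T" and a == 1]
    c = [y for (x, a, y), r in zip(data, recs) if r == "T" and a == 0]
    if len(t) < 2 or len(c) < 2:
        return 0.0
    pooled = (sum(t) + sum(c)) / (len(t) + len(c))
    var = pooled * (1 - pooled) * (1 / len(t) + 1 / len(c))
    return 0.0 if var == 0 else (sum(t) / len(t) - sum(c) / len(c)) / var ** 0.5

def permutation_test(data, k=10, n_perm=1000, seed=0):
    """Permute treatment labels and repeat the whole cross-validation each time."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    z_obs = subset_z(data, cv_recommendations(data, folds))
    arms = [a for _, a, _ in data]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(arms)
        permuted = [(x, a, y) for (x, _a, y), a in zip(data, arms)]
        if subset_z(permuted, cv_recommendations(permuted, folds)) >= z_obs:
            hits += 1
    return z_obs, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): rows are (marker, arm, response); T helps marker-positive only.
rng = random.Random(3)
data = []
for i in range(400):
    x, a = int(rng.random() < 0.3), i % 2
    data.append((x, a, int(rng.random() < (0.8 if a == 1 and x == 1 else 0.25))))
z_obs, p_value = permutation_test(data, n_perm=200)
print(z_obs, p_value)
```

The usage line runs only 200 permutations to keep the example quick; the slide's recommendation of 1000 or more corresponds to the default `n_perm`.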

The significance test based on comparing T vs C for the adaptively defined subset is the basis for demonstrating that T is more effective than C for some patients. This test may be more powerful than the standard overall test in cases where the proportion of patients who benefit from T is limited. Although there is less certainty about which patients actually benefit, prediction accuracy may be substantially greater than for the standard method based on a single null hypothesis test, with greater specificity for identifying the right patients.

506 prostate cancer patients were randomly allocated to one of four arms. Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as control arm C; 1.0 mg DES and 5.0 mg DES were combined as E. The end-point was overall survival (death from any cause).
Covariates:
- Age: in years
- Performance status (pf): not bed-ridden at all vs other
- Tumor size (sz): size of the primary tumor (cm2)
- Index of a combination of tumor stage and histologic grade (sg)
- Serum prostatic acid phosphatase levels (ap)

After removing records with missing observations in any of the covariates, 485 observations remained. A proportional hazards regression model was developed using patients in both E and C groups. Main effect of treatment, main effect of covariates and treatment by covariate interactions were considered.
log[HR(z,x)] = a z + b’x + z c’x
z = 0,1 treatment indicator (z=0 for control)
x = vector of covariates
log[HR(1,x)] – log[HR(0,x)] = a + c’x
Define classifier C(x) = 1 if a + c’x < c, = 0 otherwise
The cutpoint c was fixed to be the median of the a + c’x values in the training set.
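Given fitted coefficients, the classifier defined above reduces to a few lines. In this sketch the treatment main effect `a`, the interaction vector `c` and the covariate rows are made-up illustrative numbers, not estimates from the prostate data, and the function names are hypothetical.

```python
from statistics import median

def log_hr_difference(a, c, x):
    """a + c'x: log hazard ratio of treatment vs control for covariates x."""
    return a + sum(ci * xi for ci, xi in zip(c, x))

def build_classifier(a, c, training_x):
    """Cutpoint is the median of a + c'x over the training set."""
    cutpoint = median(log_hr_difference(a, c, x) for x in training_x)

    def classify(x):
        # 1 = predicted to benefit (log hazard ratio below the cutpoint)
        return 1 if log_hr_difference(a, c, x) < cutpoint else 0

    return classify, cutpoint

# Illustrative values only: two covariates (age, a binary indicator).
a = -0.1
c = [0.02, -0.5]
training_x = [[70, 0], [75, 1], [60, 0], [80, 1]]
classify, cutpoint = build_classifier(a, c, training_x)
print(cutpoint, classify([75, 1]), classify([70, 0]))
```

Fixing the cutpoint at the training-set median classifies about half of the training patients as likely to benefit by construction.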

Figure 1: Overall analysis. The value of the log-rank statistic is 2.9; the new treatment thus shows no benefit overall at the 0.05 level.

Figure 2: Cross-validated survival curves for patients predicted to benefit from the new treatment. log-rank statistic = 10.0, permutation p-value is .002

Figure 3: Survival curves for cases predicted not to benefit from the new treatment. The value of the log-rank statistic is 0.54.

Proportional Hazards Model Fitted to Full Dataset
Columns: coef, p-value
Terms: Treatment, age, pf(Normal.Activity), sz, sg, ap, Treatment*age, Treatment*pf(Normal.Activity), Treatment*sz, Treatment*sg, Treatment*ap

By applying the analysis algorithm to the full RCT dataset D, recommendations are developed for how future patients should be treated; i.e. R(x|D) for all x vectors. The stability of the recommendations can be evaluated based on the distribution of R(x|D(b)) for non-parametric bootstrap samples D(b) from the full dataset D.

Characteristics of Patients for Whom Classifications are Stable
Columns: < 20% of classifiers favor X (Median, IQR); > 80% of classifiers favor X (Median, IQR)
Age: 76, 4; 68, 12
Size: 7 8 17
Stage: 9 2 11
Acid Phos: 0.6, 0.4; 1.45, 9.42
Perf Status: 144 good, 31 poor; 166 good, 2 poor

Standard Analysis Algorithm
- Test the overall H0
- If you reject H0 then treat all future patients with T, otherwise treat all future patients with C

Expected K-Year DFS Using Standard Analysis
- If the overall null hypothesis is not rejected, expected K-year DFS is the observed K-year DFS in the control group
- If the overall null hypothesis is rejected, expected K-year DFS is the observed K-year DFS in the T group

Expected K-Year DFS Using New Algorithm
- Let S(T) = observed K-year DFS for patients j for whom Rj=T and who received treatment T; there are mT such patients
- Let S(C) = observed K-year DFS for patients j for whom Rj=C and who received treatment C; there are mC such patients
- Expected K-year DFS using the new algorithm = {mT S(T) + mC S(C)}/{mT + mC}
- Confidence limits for this estimate can be obtained by bootstrapping the complete cross-validation procedure
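The two expected-DFS estimates can be compared with a small sketch. It assumes every patient's K-year disease-free status is known (no censoring before K years), so that "observed K-year DFS" is a simple proportion; with censoring a Kaplan-Meier estimate would be needed instead. The record layout and function names are hypothetical.

```python
def expected_dfs_standard(records, reject_h0):
    """Observed K-year DFS in the T group if H0 is rejected, else in the C group."""
    arm = "T" if reject_h0 else "C"
    group = [ok for _rec, got, ok in records if got == arm]
    return sum(group) / len(group)

def expected_dfs_new(records):
    """{mT S(T) + mC S(C)} / {mT + mC} over patients treated as recommended."""
    followed_t = [ok for rec, got, ok in records if rec == "T" and got == "T"]
    followed_c = [ok for rec, got, ok in records if rec == "C" and got == "C"]
    m_t, m_c = len(followed_t), len(followed_c)
    s_t = sum(followed_t) / m_t if m_t else 0.0
    s_c = sum(followed_c) / m_c if m_c else 0.0
    return (m_t * s_t + m_c * s_c) / (m_t + m_c)

# Each record: (cross-validated recommendation Rj, treatment received, disease-free at K years).
records = [("T", "T", 1), ("T", "T", 1), ("T", "C", 0), ("C", "C", 1),
           ("C", "C", 0), ("C", "T", 0), ("T", "T", 0), ("C", "C", 1)]
new_dfs = expected_dfs_new(records)
standard_dfs = expected_dfs_standard(records, reject_h0=True)
print(new_dfs, standard_dfs)
```

Only patients whose randomized treatment happens to match the recommendation contribute to the new-algorithm estimate, which is why randomization in the original trial matters for this comparison.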

Hence, alternative methods for analyzing RCT’s can be evaluated in an unbiased manner with regard to their value to patients using the actual RCT data
