Choice of Endpoints for Salvage Studies
Choice of Endpoints
- Clinical Endpoints
  - AIDS-defining events
  - Survival
  - QOL
- Marker-based Endpoints for Efficacy
  - HIV-1 RNA
  - CD4
Choice of Endpoints (Cont.)
- Endpoints for Toxicity
  - Time to treatment discontinuation
  - Targeted adverse events (e.g. lipodystrophy)
- Composite Endpoint
  - Combine information across different endpoint categories.
  - e.g. time to treatment discontinuation for virological failure or intolerance.
HIV RNA Endpoints
- Quantitative (change from baseline to Week x)
- Time to Virological Failure
- Binary
  - Cross-sectional; e.g. above/below threshold at Week x
  - Failed by Week x
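
The endpoint types listed above can be made concrete with a minimal sketch that derives each one from a single subject's viral load series. The function name `rna_endpoints`, the 500 copies/mL threshold, the sample values, and the unconfirmed failure rule (never suppressed, or any rebound after suppression) are all assumptions made for illustration; an actual protocol would spell out its own definitions.

```python
import math

def rna_endpoints(visits, week_x, threshold=500.0):
    """Derive three endpoint types from one subject's viral loads.

    visits maps study week -> HIV-1 RNA in copies/mL; week 0 is baseline.
    The threshold and the failure rule are illustrative assumptions.
    """
    baseline = visits[0]
    at_x = visits.get(week_x)  # None when the week-x sample is missing

    # Quantitative: log10 change from baseline to week x.
    change = None if at_x is None else math.log10(at_x / baseline)

    # Cross-sectional binary: snapshot at week x only.
    below_at_x = None if at_x is None else at_x < threshold

    # Failed by week x: never suppressed, or rebounded at or above the
    # threshold after first going below it (single unconfirmed value).
    ever_suppressed = False
    rebounded = False
    for week in sorted(w for w in visits if 0 < w <= week_x):
        if visits[week] < threshold:
            ever_suppressed = True
        elif ever_suppressed:
            rebounded = True
    failed_by_x = rebounded or not ever_suppressed

    return {"log10_change": change, "below_at_x": below_at_x,
            "failed_by_x": failed_by_x}

# Hypothetical subject: suppressed at week 8, rebound at week 16,
# suppressed again at week 24.
subject = {0: 100000, 4: 2000, 8: 400, 16: 3000, 24: 300}
result = rna_endpoints(subject, week_x=24)
print(result)
```

For this hypothetical subject the week 24 snapshot counts as a success while the failed-by-week-24 endpoint counts as a failure, which is the kind of disagreement between the two binary endpoints that a rebound can produce.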
Cross-Sectional vs. Failure Over Time
- Above/Below threshold at week x
  - Snapshot; not affected by transitional changes in HIV levels.
  - Frequent monitoring not required (batch assaying).
  - Missing data at timepoint especially problematic.
- Failure Endpoints
  - Assessment of response over time; may be affected by transitional changes in HIV levels.
  - Frequent monitoring required (real time assaying).
  - Missing data strategies need to be defined/evaluated.
Time to Failure vs. Cumulative Proportion
- Time to Failure
  - Patterns of failure depend on failure time (assumptions).
  - Can be evaluated within an interim analysis (accommodates differential follow-up).
- Cumulative Proportion
  - Time to failure not considered in analysis.
  - Evaluation with interim analysis may be complicated.
Power Advantages of Time to Event
- If the pooled failure rate is > 50%, a time-to-event endpoint has appreciable sample size advantages.
- e.g., 6 months accrual, 1 year additional follow-up, 2 arm trial:
  - 50% pooled failure rate, 5% sample size savings
  - 70% pooled failure rate, 15% sample size savings
- e.g., 1 year accrual, 6 months additional follow-up, 2 arm trial:
  - 50% pooled failure rate, 12% sample size savings
  - 70% pooled failure rate, 25% sample size savings
Analysis Issues
- With moderate study withdrawal, the sample size savings of the time-to-event endpoint increase further.
- The sample size savings are larger at interim analyses than at final analyses, in proportion to the fraction of subjects who have less follow-up time than the specified interim analysis time.
- Time-to-event endpoints also have advantages for evaluating covariate effects and for flexibility in extending the study by prolonging the follow-up period.
Purely Virologic vs. Composite
- Purely Virologic
  - Focuses on virologic response only; tolerability and safety can be assessed separately.
  - Follow-up for viral load is essential after treatment discontinuation.
- Composite
  - Combines virologic efficacy, tolerability and safety; overall picture.
  - May differ substantially from purely virologic if toxicity rate is high.
  - Purely virologic should be done as secondary endpoint.
Issues in Definition of:
- Virologic Failure
  - Early failure (rise above nadir/baseline, insufficient decline)
  - Amount of time allowed to go below suppression threshold
  - Choice of threshold for suppression and for loss of suppression
  - Fluctuations due to treatment holds, intercurrent illness, etc.
- Regimen Completion
  - Virologic failure definition (see above)
  - Number of drugs added/changed before declaring treatment failure
  - Subjectivity of treatment discontinuation reasons
Clinical Beliefs Underlying the Appropriate Use of Each Endpoint
- Purely Virologic Endpoint: The effect of the investigated therapies on plasma HIV-1 RNA levels captures the essential information needed to define the role of the therapies in clinical practice for the target population.
- Regimen Completion Endpoint: The necessity to change regimens measures tangible benefit to a patient more closely than virological failure alone does, and assessing the virologic effect of treatment is unnecessary.
Types of Study Endpoint in HIV Disease Studies
- Time to Failure
  - Regimen Completion (384, 372A, A5025)
  - Virologic Failure Week x (388, A5076, A5095)
- Binary
  - Below Threshold at Week x (359, 364, 370, 373, A5086)
  - Not Fail by Week x; failure is defined as:
    - Rise Above Threshold (A5073)
    - Rise Above Threshold, Early Failure (347, 368, 398, A5080)
    - Rise Above Threshold, Early Failure, Off Treatment (372B, 400, A5064)
- Cumulative Virologic Failure (343)
Composite Endpoints
- Combine efficacy and toxicity information (e.g. time to Rx discontinuation).
- Events will be more numerous than with pure virologic endpoints, but may dilute the effect of treatment.
- Dilution is especially a concern if Rx discontinuation may be unrelated to Rx (pregnancy, imprisonment, moving).
Example
- Suppose Rx A (compared to B) reduces the percentage reaching the event from 35% to 17.5%.
- We need 100 patients per arm to have 80% power.
- Assume the Rx discontinuation rate is 10%/yr for both treatments, and is included in the endpoint definition.
- We have more endpoints but only 60% power to detect the treatment difference.
- We need 50 additional patients per arm for 80% power.
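
The dilution in this example can be illustrated with a rough power calculation. The sketch below is not the calculation behind the slide: it assumes a two-sided 5% normal-approximation test of two proportions, one year of follow-up, and discontinuation that is independent of failure. The function names are hypothetical. Under these assumptions the pure endpoint gives roughly 80% power at 100 patients per arm, and the composite endpoint gives noticeably less; the exact composite figure depends on the follow-up length and test, which the example does not state.

```python
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = (2 * p_bar * (1 - p_bar) / n_per_arm) ** 0.5
    se_alt = (p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm) ** 0.5
    return norm.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

def with_discontinuation(p_fail, p_disc):
    """Composite event probability, assuming independent discontinuation."""
    return 1 - (1 - p_fail) * (1 - p_disc)

# Pure failure endpoint: 35% vs. 17.5%, 100 patients per arm.
pure = power_two_proportions(0.35, 0.175, 100)

# Composite endpoint: add 10% discontinuation (assumed one year) to both arms.
composite = power_two_proportions(
    with_discontinuation(0.35, 0.10),
    with_discontinuation(0.175, 0.10),
    100,
)

print(f"pure endpoint power:      {pure:.2f}")
print(f"composite endpoint power: {composite:.2f}")
```

The composite endpoint has more events in both arms, but the absolute difference between arms shrinks while the variance grows, so power drops.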
Example Continued
- "Pure" Failure: 100 Evaluable Patients
- Failure including Rx Discontinuation: 100 Evaluable Patients
ACTG 359: Proportion vs. Change
- ACTG 359 is a randomized, partially double-blinded, multicenter factorial study of six oral combination antiretroviral regimens:
  - RD: SQV + RTV + DLV
  - RA: SQV + RTV + ADV
  - RDA: SQV + RTV + DLV + ADV
  - ND: SQV + NFV + DLV
  - NA: SQV + NFV + ADV
  - NDA: SQV + NFV + DLV + ADV
- Subjects received randomized study treatment for 24 weeks.
ACTG 359: Proportion Below Detection
ACTG 359: Mean HIV-1 RNA Change from Baseline
Data Completeness
- Data Descriptions
  - More than 90% of subjects had week 16 virologic and immunologic data.
  - # of subjects with missing RNA data at week 16:
    Treatment  RD  RA  RDA  ND  NA  NDA
    n
  - Data were assumed to be missing at random.
Primary Efficacy Comparison
Proportions of HIV-RNA below 500 at week 16
  RTV            NFV
  28% (35/125)   33% (42/127)
  P = 0.513, Fisher's exact test

  DLV            ADV            DLV + ADV
  40% (34/85)    18% (16/88)    33% (17/79)
  P = 0.006, Chi-square test
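
As an illustration of the first comparison, the following is a minimal sketch of a two-sided Fisher's exact test applied to the RTV and NFV counts above, written with the standard library only. The function name is hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention and an assumption; software differs in how it defines a two-sided exact P value, so the result need not match the reported P = 0.513 exactly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that row 1 contributes x of the col1 successes.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties with p_obs are included.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# RTV: 35 of 125 below 500; NFV: 42 of 127 below 500.
p_value = fisher_exact_two_sided(35, 125 - 35, 42, 127 - 42)
print(f"RTV vs. NFV, two-sided exact P = {p_value:.3f}")
```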
Secondary Efficacy Analysis: RNA Change
HIV RNA week 16 median change from baseline (in log)
  Treatment  RD  RA  RDA  ND  NA  NDA

  Comparison           p (Logrank)   p (Prentice-Wilcoxon)
  RTV vs. NFV
  DLV vs. ADV          0.003         0.011
  DLV vs. DLV + ADV    0.262         0.231
  ADV vs. DLV + ADV    0.104
ACTG 364
Loss to Follow-Up
- Need a policy for handling loss to follow-up.
- Treating drop-out as censored or as failure may be biased.
- Sensitivity analyses with various levels of association between drop-out and failure events.
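
One very simple form of such a sensitivity analysis, for a binary endpoint, is sketched below. It assumes that a fraction q of the drop-outs would have failed and recomputes the failure proportion as q ranges from 0 (no drop-out fails) to 1 (every drop-out fails, i.e. drop-out counted as failure). The function name and the counts are hypothetical; more sophisticated analyses would model the association between drop-out and failure directly.

```python
def failure_rate_sensitivity(n_failed, n_success, n_dropout,
                             fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Failure proportion when a fraction q of drop-outs is assumed to fail."""
    n_total = n_failed + n_success + n_dropout
    return {q: (n_failed + q * n_dropout) / n_total for q in fractions}

# Hypothetical arm: 40 observed failures, 50 observed successes, 10 drop-outs.
rates = failure_rate_sensitivity(40, 50, 10)
for q, rate in rates.items():
    print(f"assumed failure fraction among drop-outs {q:.2f}: "
          f"failure rate {rate:.3f}")

# Setting q to the failure rate among completers reproduces the
# completers-only estimate, which is what ignoring drop-outs amounts to.
completer_rate = 40 / (40 + 50)
same_as_completers = failure_rate_sensitivity(
    40, 50, 10, fractions=(completer_rate,))[completer_rate]
```

If the treatment comparison reaches the same conclusion across the plausible range of q in each arm, the result is robust to the handling of drop-outs.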
ACTG 398
Subjects were stratified by prior PI (protease inhibitor) exposure and selectively randomized to one of four treatment arms:
- SQV Arm: Amprenavir (APV) + Saquinavir (SQVsgc) + Abacavir (ABC) + Efavirenz (EFV) + Adefovir (ADV)
- IDV Arm: APV + Indinavir (IDV) + ABC + EFV + ADV
- NFV Arm: APV + Nelfinavir (NFV) + ABC + EFV + ADV
- Placebo Arm: APV + Placebo (matched to SQVsgc, IDV or NFV) + ABC + EFV + ADV
ACTG 398 Continued
Design and Ideal Enrollment
Arms (columns): SQV, IDV, NFV, Placebo, Total
Prior PI Exposure (rows):
- SQV only: X
- IDV/RTV only: 25 X
- NFV only: X 1565
- NFV and IDV/RTV: 33 X X
- NFV and SQV: X 33 X
- SQV and IDV/RTV: X X
- NFV, SQV and IDV/RTV
- Total
ACTG 398
ACTG 398 Continued
Estimated Virologic Failure at Week 24 for MAR and M=F (Kaplan-Meier)
  Treatment  NNRTI          M=F                 MAR
  Arm        Experienced?   Failure (95% CI)    Failure (95% CI)
  SQV        Yes            0.85 (0.74, 0.95)   0.76 (0.62, 0.90)
             No             0.54 (0.43, 0.66)   0.41 (0.29, 0.53)
  IDV        Yes            0.87 (0.75, 0.99)   0.80 (0.66, 0.94)
             No             0.53 (0.37, 0.69)   0.42 (0.26, 0.59)
  NFV        Yes            0.73 (0.62, 0.83)   0.66 (0.54, 0.77)
             No             0.55 (0.43, 0.67)   0.48 (0.36, 0.60)
  Placebo    Yes            0.91 (0.83, 0.98)   0.82 (0.73, 0.92)
             No             0.63 (0.54, 0.73)   0.52 (0.42, 0.63)
ACTG 398 Continued
Primary Comparison of Treatment Arms vs. Placebo
P-values for RNA < 200 copies/ml at Week 24
- SQV vs Placebo
- IDV vs Placebo
- NFV vs Placebo
- SQV/IDV/NFV vs Placebo
Notes: Results based on the exact test with stratification by prior PI and NNRTI experience.
ACTG 398 Continued
P-values for Confirmed Virologic Failure at/before Week 24 (M=F and MAR)
- SQV vs Placebo
- IDV vs Placebo
- NFV vs Placebo
- SQV/IDV/NFV vs Placebo
Notes: Results based on the exact test with stratification by prior PI and NNRTI experience.
P-values for Time to Confirmed Virologic Failure (M=F and MAR)
- SQV vs Placebo
- IDV vs Placebo
- NFV vs Placebo
- SQV/IDV/NFV vs Placebo
Notes: Results based on the stratified log-rank test with stratification by prior PI and NNRTI experience.
M=F = Missing equals failure (missing RNA samples counted as failures)
MAR = Missing-at-random (missing RNA samples ignored)
Analysis of Quantitative Endpoints
- Censored data methods required (log-rank, Prentice log-rank).
- Bias results from excluding missing data.
- Last observation carried forward can be very biased.
- Consider last rank carried forward for rank-based analysis.
Discussion Points
- Count study withdrawal as failure or as censored?
  - Each analysis is likely biased.
  - Recommend carrying out both analyses as well as more sophisticated sensitivity analyses.
Discussion Points
- What are the criteria for selecting a primary endpoint?
  - Optimally addresses the primary objective, taking into account the patient population and the study drugs.
  - Within the pool of possible surrogate markers, it is maximally accurate as a replacement for true clinical endpoints.
Analysis Points
- If the primary endpoint is binary, the Chi-squared test and Fisher's exact test for a treatment difference are biased if there are censored data.
- A Z-test based on the difference in Kaplan-Meier estimates of the proportion failed is unbiased and efficient; use this test routinely.
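
A minimal sketch of such a Z-test follows. It assumes Greenwood's formula for the variance of each Kaplan-Meier estimate and a two-sided normal reference distribution, neither of which is specified above; the function names and the two small data sets are hypothetical. The test is undefined when both variances are zero (for example, no failures in either arm before the chosen time point).

```python
from statistics import NormalDist

def km_failure(times, events, horizon):
    """Kaplan-Meier estimate of P(failed by horizon) and its Greenwood variance.

    times: follow-up time per subject; events: 1 = failure, 0 = censored.
    """
    surv, greenwood = 1.0, 0.0
    failure_times = sorted({t for t, e in zip(times, events)
                            if e and t <= horizon})
    for t in failure_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1 - failed / at_risk
        if at_risk > failed:
            greenwood += failed / (at_risk * (at_risk - failed))
    return 1 - surv, surv ** 2 * greenwood

def km_z_test(arm_a, arm_b, horizon):
    """Z statistic and two-sided P for the difference in KM failure proportions."""
    fail_a, var_a = km_failure(*arm_a, horizon)
    fail_b, var_b = km_failure(*arm_b, horizon)
    z = (fail_a - fail_b) / (var_a + var_b) ** 0.5
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data: (weeks of follow-up, event indicators) for each arm.
arm_a = ([4, 8, 8, 12, 16, 24, 24, 24, 24, 24],
         [1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
arm_b = ([2, 4, 4, 8, 12, 12, 16, 24, 24, 24],
         [1, 1, 1, 1, 1, 0, 1, 0, 0, 0])
z, p = km_z_test(arm_a, arm_b, horizon=24)
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

Unlike a Chi-squared or Fisher's exact test on the observed counts, this comparison uses the partial follow-up of censored subjects up to the time they were last seen.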