Department Seminar Merck Research Laboratories Jan 10, 2008

Analysis of Stratified Trials – Challenging the “Standard” Methods
Devan V. Mehrotra, Clinical Biostatistics
Department Seminar, Merck Research Laboratories, Jan 10, 2008

Outline
Part I: binary response variable
> Mantel-Haenszel test
> Minimum risk weights
> Simulation results
> Conclusions
Part II: continuous non-normal response variable
> Motivating example
> Technical details

Part I: Analysis of Binary Data

Stratified Trials with Binary Endpoints
> 2 treatments (A and B); number of strata = s
> Binary response (responder/non-responder)
> $p_{ij}$ = true (population) proportion for stratum i, treatment j
> $\delta_i = p_{iA} - p_{iB}$ = true difference for stratum i
> $f_i$ = true (population) relative frequency of stratum i
> $\delta = \sum_{i=1}^{s} f_i \delta_i$ = true overall difference
> $\hat p_{ij}$ = observed proportion for stratum i, treatment j
> $n_{ij}$ = observed number of subjects in stratum i, treatment j

Hypothesis Testing: General Framework (Superiority or Non-Inferiority Trials)
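The formulas on this slide did not survive extraction. As a hedged sketch only (one common way to write a stratified framework of this kind, assumed here rather than recovered from the slide), with $\delta_0 = 0$ for superiority and $\delta_0 > 0$ the non-inferiority margin, and $\hat V_i$ whichever variance estimate is chosen for stratum i:

```latex
H_0:\ \delta \le -\delta_0 \quad\text{versus}\quad H_1:\ \delta > -\delta_0
\hat\delta = \sum_{i=1}^{s} w_i \left(\hat p_{iA} - \hat p_{iB}\right), \qquad \sum_{i=1}^{s} w_i = 1
Z = \frac{\hat\delta + \delta_0}{\sqrt{\sum_{i=1}^{s} w_i^2 \, \hat V_i}}, \qquad \text{reject } H_0 \text{ (one-sided, level } \alpha) \text{ if } Z > z_{1-\alpha}
```

The weights $w_i$ and the variance estimates $\hat V_i$ are the choices compared on the following slides; this sketch leaves out the finite sample term and the continuity correction.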

Mantel-Haenszel Test (1959): Superiority Trials
Note: The MH test is optimal if the odds ratio is constant across strata.

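Choice of Variance
> Null variance [Miettinen & Nurminen, 1985; Farrington & Manning, 1990]: $V_i^{\mathrm{NULL}} = \tilde p_{iA}(1-\tilde p_{iA})/n_{iA} + \tilde p_{iB}(1-\tilde p_{iB})/n_{iB}$, where $(\tilde p_{iA}, \tilde p_{iB})$ is the m.l.e. of $(p_{iA}, p_{iB})$ under the restriction that the stratum difference equals its null-hypothesis value ($p_{iA} - p_{iB} = -\delta_0$, with margin $\delta_0 \ge 0$ and $\delta_0 = 0$ for superiority). Note: the MH test uses the null variance.
> Observed (OBS) variance: $V_i^{\mathrm{OBS}} = \hat p_{iA}(1-\hat p_{iA})/n_{iA} + \hat p_{iB}(1-\hat p_{iB})/n_{iB}$
Note: With 1:1 randomization, $V_i^{\mathrm{NULL}} \ge V_i^{\mathrm{OBS}}$ for superiority trials, and usually so (but not always) for non-inferiority trials.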
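A minimal Python sketch of the two variance choices for a single stratum, restricted to the superiority case ($\delta_0 = 0$), where the restricted m.l.e. of the common response rate is simply the pooled proportion. The non-inferiority case needs the Farrington-Manning / Miettinen-Nurminen restricted m.l.e., which this sketch does not implement; the function name and the example counts are hypothetical.

```python
def variances_superiority(x_a, n_a, x_b, n_b):
    """Null and observed variances of p_A - p_B for one stratum (superiority, delta0 = 0).

    Under delta0 = 0 the restricted m.l.e. of the common proportion is the pooled proportion.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)
    v_null = p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)
    v_obs = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    return v_null, v_obs


# Hypothetical stratum with 1:1 randomization: 30/50 vs 20/50 responders.
v_null, v_obs = variances_superiority(30, 50, 20, 50)
print(round(v_null, 5), round(v_obs, 5))
```

With $n$ subjects per arm the gap between the two works out to $(\hat p_A - \hat p_B)^2/(2n)$, so under 1:1 randomization the null variance can never be the smaller one when $\delta_0 = 0$; for the counts above the gap is $0.2^2/100 = 0.0004$.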

(pA, pB) pairs where Null or Observed Variance is “Better” Non-Inferiority Margin = 15%

(pA, pB) pairs where Null or Observed Variance is “Better” Non-Inferiority Margin = 5%

Choice of Weights
> Cochran-Mantel-Haenszel (CMH) weights: $w_i \propto n_{iA} n_{iB} / (n_{iA} + n_{iB})$
>> Estimator of δ is ~ unbiased.
> Minimum Risk (MR) weights [Mehrotra & Railkar, 2000]
>> Estimator of δ has the smallest mean squared error.
>> If $\delta_i = \delta$ for all i, the MR weights reduce to inverse-variance weights (optimal weights!)

Choice of Finite Sample Term
With CMH weights (i.e., with MH test): is used. With MR weights: is recommended.

Choice of Continuity Correction
With CMH weights: is used by original MH test. However, is a less conservative choice. With MR weights: is recommended. See Mehrotra & Railkar, Stats in Med, 2000

Motivating Example Revisited: Test for Superiority

Simulation Results: Test for Superiority (2 strata)
Stratum frequencies f1 = .7, f2 = .3; scenarios with no T x S interaction on the (a) logit, (b) proportion, (c) square root, and (d) log scale; 100,000 simulations.

Illustrative Example #2: Test for Non-Inferiority

Simulation Results: Test for Non-Inferiority (2 strata)

Simulation Results: Power, Test for Non-Inferiority (2 strata)

Summary (Part I)
For stratified trials with binary responses:
> The popular Mantel-Haenszel test uses sample size (CMH) weights with null variances. It has good power properties if and only if the odds ratio is constant across strata.
> Using minimum risk (MR) weights with observed (OBS) variances will usually provide notably more power than CMH weights with null variances, for both superiority and non-inferiority trials.
> Recommendation: consider MR_OBS as a default, but use simulations to quantify power differences between methods when planning a new trial.
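In the spirit of the recommendation above, a small Monte Carlo sketch in Python can compare two approaches on a hypothetical 2-stratum design: CMH weights with null variances (MH-style) versus inverse-variance weights with observed variances. The inverse-variance weights are a stand-in for the MR weights, and the sketch omits any finite sample term and continuity correction, so its numbers are illustrative only and do not reproduce the results on the earlier slides. The test is one-sided for superiority with critical value 1.959964 (one-sided α = 0.025); the design and all names are assumptions.

```python
import math
import random


def stratified_z(data, cmh_null):
    """Stratified superiority Z statistic for delta0 = 0.

    data: list of (x_a, n_a, x_b, n_b) per stratum (responders, subjects).
    cmh_null=True  -> CMH weights with null (pooled) variances, MH-style
    cmh_null=False -> inverse-variance weights with observed variances (stand-in for MR_OBS)
    """
    diffs, variances, raw = [], [], []
    for x_a, n_a, x_b, n_b in data:
        p_a, p_b = x_a / n_a, x_b / n_b
        diffs.append(p_a - p_b)
        if cmh_null:
            p_pool = (x_a + x_b) / (n_a + n_b)
            v = p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)
            raw.append(n_a * n_b / (n_a + n_b))
        else:
            v = max(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b, 1e-12)  # avoid 1/0
            raw.append(1.0 / v)
        variances.append(v)
    total = sum(raw)
    w = [r / total for r in raw]
    est = sum(wi * d for wi, d in zip(w, diffs))
    se = math.sqrt(sum(wi * wi * v for wi, v in zip(w, variances)))
    return est / se if se > 0 else 0.0


def power(design, n_sims=2000, z_crit=1.959964, seed=1):
    """design: list of (n per arm, p_iA, p_iB). Returns rejection rates for both approaches."""
    rng = random.Random(seed)
    hits = {"CMH_NULL": 0, "IV_OBS": 0}
    for _ in range(n_sims):
        data = []
        for n, p_a, p_b in design:
            x_a = sum(rng.random() < p_a for _ in range(n))
            x_b = sum(rng.random() < p_b for _ in range(n))
            data.append((x_a, n, x_b, n))
        hits["CMH_NULL"] += stratified_z(data, True) > z_crit
        hits["IV_OBS"] += stratified_z(data, False) > z_crit
    return {k: v / n_sims for k, v in hits.items()}


DESIGN = [(70, 0.45, 0.30), (30, 0.80, 0.40)]  # hypothetical; f1 = .7, f2 = .3
result = power(DESIGN)
print(result)
```

Editing `DESIGN` (per-arm size, $p_{iA}$, $p_{iB}$ for each stratum) is a simple way to explore scenarios with and without a treatment-by-stratum interaction before committing to a method.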

Part II: Analysis of Continuous Data Using Ranks

Motivating Example
Hypothetical viral loads of HIV+ subjects (log10 copies/ml)
Females
> Placebo: 3.90, 3.96
> Vaccine: 1.40, 2.80, 2.90
Males
> Placebo: 3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43, 4.68, 4.69, 4.70, 4.85, 5.06, 5.50
> Vaccine: 1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32

Motivating Example (continued)
Observed viral load summaries (log10 copies/ml):
Females
> Placebo: mean 3.93, median 3.93, SD 0.04, n = 2
> Vaccine: mean 2.37, median 2.80, SD 0.84, n = 3
Males
> Placebo: mean 4.27, median 4.36, SD 0.62, n = 16
> Vaccine: mean 3.64, median 3.59, SD 1.27, n = 9
Compared to placebo, the VLs for vaccine appear to be “shifted” to the left (i.e., are numerically smaller). Is the shift statistically significant?
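The summary table can be reproduced from the raw viral loads of the motivating example with Python's standard `statistics` module; SD here is the sample SD (n - 1 denominator), which matches the tabulated values. The dictionary layout is just one convenient choice.

```python
import statistics

data = {
    "Females": {"Placebo": [3.90, 3.96], "Vaccine": [1.40, 2.80, 2.90]},
    "Males": {
        "Placebo": [3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36,
                    4.36, 4.43, 4.68, 4.69, 4.70, 4.85, 5.06, 5.50],
        "Vaccine": [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32],
    },
}

summary = {}
for stratum, arms in data.items():
    for arm, y in arms.items():
        summary[(stratum, arm)] = (
            round(statistics.mean(y), 2),
            round(statistics.median(y), 2),
            round(statistics.stdev(y), 2),  # sample SD, n - 1 denominator
            len(y),
        )
        print(stratum, arm, summary[(stratum, arm)])
```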

Stratified rank-based analysis: SAS implementation
PROC FREQ;
  TABLES gender * trt * vload / CMH SCORES=RANK;
RUN;
PROC FREQ;
  TABLES gender * trt * vload / CMH SCORES=MODRIDIT;
RUN;
PROC TWOSAMPL;  /* part of the PROC StatXact module */
  WI/AS;
  PO trt;
  RE vload;
  ST gender;

2-tailed p-values using the three “methods”:
> PROC FREQ, SCORES = RANK: p = .1506
> PROC FREQ, SCORES = MODRIDIT: p = .0642
> PROC TWOSAMPL: p = .0436*
Different conclusions at α = .05 … why?
> PROC FREQ: ranks based on the pooled sample within each stratum (“stratum-specific” ranks)
>> SCORES = RANK → equal stratum weights
>> SCORES = MODRIDIT → unequal stratum weights
> PROC TWOSAMPL: ranks based on the overall pooled sample, ignoring strata (“stratum-invariant” ranks), with equal stratum weights

Technical Details

Technical Details (continued)

Three Popular Rank-Based Tests
> TEQ: stratum weights $w_i = 1$; PROC FREQ SCORES = RANK; stratum-specific ranks
> TvE: stratum weights $w_i = 1/(n_i + 1)$; PROC FREQ SCORES = MODRIDIT; stratum-specific ranks; van Elteren test (1960)
> TEQ with stratum-invariant ranks: stratum weights $w_i = 1$; PROC TWOSAMPL
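A minimal Python sketch of the weighted stratified rank-sum statistic $\sum_i w_i (R_i - E[R_i]) / \sqrt{\sum_i w_i^2 \operatorname{Var}(R_i)}$, where $R_i$ is the placebo rank sum in stratum i computed from stratum-specific midranks, $\operatorname{Var}(R_i)$ is the exact permutation variance given ties, and the 2-tailed p-value uses a normal approximation. Plugging in $w_i = 1$ gives TEQ and $w_i = 1/(n_i + 1)$ gives the van Elteren test. Function names are hypothetical; this is a sketch of the standard construction, not the SAS code itself.

```python
import math


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def stratified_rank_test(strata, weight):
    """Returns (z, 2-tailed p) for the weighted sum of within-stratum placebo rank sums.

    strata: list of (placebo, vaccine) samples; weight(n_i) gives the stratum weight w_i.
    """
    num = den = 0.0
    for placebo, vaccine in strata:
        pooled = placebo + vaccine
        r = midranks(pooled)                       # stratum-specific ranks
        n, na, nb = len(pooled), len(placebo), len(vaccine)
        center = (n + 1) / 2                       # mean rank under H0
        dev = sum(r[:na]) - na * center            # placebo rank sum minus its null mean
        var = na * nb / (n * (n - 1)) * sum((x - center) ** 2 for x in r)
        w = weight(n)
        num += w * dev
        den += w * w * var
    z = num / math.sqrt(den)
    return z, math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))


strata = [
    ([3.90, 3.96], [1.40, 2.80, 2.90]),                        # females: placebo, vaccine
    ([3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36,
      4.36, 4.43, 4.68, 4.69, 4.70, 4.85, 5.06, 5.50],
     [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32]),  # males
]

z_eq, p_eq = stratified_rank_test(strata, lambda n: 1.0)            # TEQ
z_ve, p_ve = stratified_rank_test(strata, lambda n: 1.0 / (n + 1))  # TvE
print(f"TEQ p = {p_eq:.4f}, TvE p = {p_ve:.4f}")
```

Worked by hand on the viral load data, this construction gives 2-tailed p-values of about .1506 (TEQ) and .0642 (TvE), in line with the PROC FREQ results quoted earlier. The van Elteren weights give the small female stratum, where placebo and vaccine values separate completely, relatively more influence, which is what lowers its p-value.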

If there is no true treatment-by-stratum interaction ($\Delta_i = \Delta$ for all i, where $\Delta_i$ is the treatment shift in stratum i), the van Elteren test is optimal among all the stratified tests, i.e., $w_i = 1/(n_i + 1)$ are the optimal weights. However, if interaction exists, the van Elteren test can suffer a loss of power. In general, is there an asymptotically optimal test (with optimal weights) that allows for interaction? YES … we derived it, based on stratum-specific ranks.

Technical Details (continued): Estimate and 100(1−α)% CI for Δ (the overall treatment shift), Obtained by Inverting the Given Test
Let p(c) = 1-tailed p-value for the test applied to the data after a shift of c. The estimate and confidence limits are obtained via a numerical search.
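The test inversion can be sketched in Python under an explicit assumption: that the test is applied after subtracting a candidate shift c from every placebo value, so that p(c) (upper-tail, normal approximation) increases with c. The point estimate is then taken as the c where p(c) crosses 0.5 and the limits as the crossings of α/2 and 1 − α/2, each found by bisection. This is a common Hodges-Lehmann-style construction assumed here, not necessarily the exact search behind the results on these slides; helper names and the search bracket are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks with ties given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def z_stat(strata, weight):
    """Weighted stratified placebo rank-sum statistic (stratum-specific midranks)."""
    num = den = 0.0
    for placebo, vaccine in strata:
        pooled = placebo + vaccine
        r = midranks(pooled)
        n, na, nb = len(pooled), len(placebo), len(vaccine)
        center = (n + 1) / 2
        w = weight(n)
        num += w * (sum(r[:na]) - na * center)
        den += w * w * na * nb / (n * (n - 1)) * sum((x - center) ** 2 for x in r)
    return num / math.sqrt(den)


def p_upper(c, strata, weight):
    """One-sided p-value after subtracting the candidate shift c from every placebo value."""
    shifted = [([y - c for y in placebo], vaccine) for placebo, vaccine in strata]
    return 0.5 * math.erfc(z_stat(shifted, weight) / math.sqrt(2))


def invert(target, strata, weight, lo=-10.0, hi=10.0, iters=60):
    """Bisection for the c where p_upper crosses target; the bracket is assumed wide enough."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if p_upper(mid, strata, weight) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


STRATA = [
    ([3.90, 3.96], [1.40, 2.80, 2.90]),
    ([3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36,
      4.36, 4.43, 4.68, 4.69, 4.70, 4.85, 5.06, 5.50],
     [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32]),
]
alpha = 0.05
teq = lambda n: 1.0
estimate = invert(0.5, STRATA, teq)
lower = invert(alpha / 2, STRATA, teq)
upper = invert(1 - alpha / 2, STRATA, teq)
print(f"estimate {estimate:.2f}, {100 * (1 - alpha):.0f}% CI ({lower:.2f}, {upper:.2f})")
```

Because p(c) is a step function of c, bisection converges to the jump where it crosses each target, so the returned values are jump points of that function rather than smooth solutions.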

Motivating Example Revisited: 2-tailed p-values
> TEQ: .1505
> TEQ [with stratum-invariant ranks]: .0435*
> TvE (van Elteren test): .0643
> TBrunner (described on slide 29): .0250*
> Tadap: .0990
> Talign (aligned rank test): .0654
Note: All methods except use stratum-specific ranks

Motivating Example Revisited: Estimates and 95% CIs for Δ (selected methods)
Stratum medians (log10 copies/ml):
> Females: placebo 3.93 (n = 2), vaccine 2.80 (n = 3), P - V = 1.13
> Males: placebo 4.36 (n = 16), vaccine 3.59 (n = 9), P - V = 0.77
Method: p-value; estimate; 95% CI
> TEQ: .1506; .80; (-0.28, 1.61)
> TEQ [with stratum-invariant ranks]: .0435; 0.94; (.01, 1.71)
> TvE: .0643; 1.00; (-.04, 1.61)
> Talign: .0654; .84; (-.09, 1.53)

Simulation Study
> 2 treatments, 1:1 randomization within each stratum
> Number of strata = 2, 4, 6, 8, 10, and 12
> Stratum size: $n_i = 10i$ for stratum i
> Different choices of $\Delta_i$ (treatment shift in stratum i):
>> constant across strata (no T x S interaction)
>> positively or negatively associated with stratum size (T x S interaction, with 50% power to detect it)
> Four different distributions for Y: normal, lognormal, mixture of normals $0.9N(m, v) + 0.1N(m^*, v^*)$, and $t_3$
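A data-generating sketch of this design in Python: stratum i has $n_i = 10i$ subjects split 1:1, the treated arm is shifted by $\Delta_i$, and errors come from one of the four listed shapes. The mixture components ($m = 0$, $v = 1$, $m^* = 3$, $v^* = 9$) and the lognormal parameters are hypothetical, since the slide does not give them; the test uses stratum-specific midranks with a normal approximation, and all names are illustrative.

```python
import math
import random


def midranks(values):
    """1-based ranks with ties given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def draw(dist, rng):
    """One error term; all distribution parameters below are hypothetical."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "lognormal":
        return math.exp(rng.gauss(0.0, 1.0))
    if dist == "mixture":  # 0.9 N(0, 1) + 0.1 N(3, 9); rng.gauss takes a standard deviation
        return rng.gauss(0.0, 1.0) if rng.random() < 0.9 else rng.gauss(3.0, 3.0)
    if dist == "t3":  # Z / sqrt(chi-square_3 / 3)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
        return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3)
    raise ValueError(f"unknown distribution: {dist}")


def one_trial(deltas, dist, rng):
    """Stratum i (1-based) has 10*i subjects, 1:1; the treated arm is shifted by deltas[i - 1]."""
    strata = []
    for i, delta in enumerate(deltas, start=1):
        half = 10 * i // 2
        treated = [draw(dist, rng) + delta for _ in range(half)]
        control = [draw(dist, rng) for _ in range(half)]
        strata.append((treated, control))
    return strata


def z_stat(strata, weight):
    """Weighted stratified rank-sum statistic with stratum-specific midranks."""
    num = den = 0.0
    for treated, control in strata:
        pooled = treated + control
        r = midranks(pooled)
        n, na, nb = len(pooled), len(treated), len(control)
        center = (n + 1) / 2
        w = weight(n)
        num += w * (sum(r[:na]) - na * center)
        den += w * w * na * nb / (n * (n - 1)) * sum((x - center) ** 2 for x in r)
    return num / math.sqrt(den)


def rejection_rate(deltas, dist, weight, n_sims=1000, seed=2008):
    """Fraction of simulated trials rejecting at two-sided 5% (normal approximation)."""
    rng = random.Random(seed)
    crit = 1.959964
    hits = sum(abs(z_stat(one_trial(deltas, dist, rng), weight)) > crit for _ in range(n_sims))
    return hits / n_sims


TEQ = lambda n: 1.0
TVE = lambda n: 1.0 / (n + 1)  # van Elteren weights

print(rejection_rate([0.0] * 4, "lognormal", TVE, n_sims=200))
```

Passing `[0.0] * k` as `deltas` estimates the type I error rate with k strata; letting `deltas` increase or decrease with i mimics the interaction scenarios.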

Simulation Results: Type I Error Rate (nominal α = 5%), Normal Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
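The 5.92% bound is the nominal rate plus three binomial Monte Carlo standard errors, $0.05 + 3\sqrt{0.05 \times 0.95 / 5000}$; a one-function Python check (the function name is illustrative):

```python
import math


def mc_upper_bound(alpha=0.05, n_sims=5000, k=3):
    """alpha plus k binomial standard errors of an estimated rejection rate."""
    se = math.sqrt(alpha * (1 - alpha) / n_sims)
    return alpha + k * se


print(f"{100 * mc_upper_bound():.2f}%")  # 5.92%
```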

Simulation Results: Type I Error Rate (nominal α = 5%), Lognormal Distribution

Simulation Results: Type I Error Rate (nominal α = 5%), Mixture of Normals Distribution

Simulation Results: Type I Error Rate (nominal α = 5%), t3 Distribution

Simulation Results: Power (%), No T x S interaction (constant treatment effect across strata)

Simulation Results: Power (%) No T x S interaction

Simulation Results: Power (%) Normal Distribution

Simulation Results: Power (%) Lognormal Distribution

Simulation Results: Power (%) Mixture of Normals

Simulation Results: Power (%) t3 Distribution

Conclusions (Part II)
For rank-based analyses of stratified trials:
> No single method is uniformly the best → Recommendation: use the aligned rank test (Talign) or either of the proposed adaptive tests (Tadap1 or Tadap2). Both approaches were more powerful than the van Elteren test (TvE) in every case studied, notably so when there was a true (but hard to detect) T x S interaction.
> It is time to retire the popular van Elteren test!

References
Brunner, E., Puri, M. L., and Sun, S. (1995). Nonparametric Methods for Stratified Two-Sample Designs with Application to Multiclinic Trials. Journal of the American Statistical Association, 90,
Hodges, J. L. and Lehmann, E. L. (1962). Rank Methods for Combination of Independent Experiments in the Analysis of Variance. Annals of Mathematical Statistics, 33,
Mehrotra, D. V. and Railkar, R. (2000). Minimum Risk Weights for Comparing Treatments in Stratified Binomial Trials. Statistics in Medicine, 19,
Wang, W., Mehrotra, D. V., Chan, I. S. F., and Heyse, J. F. (2006). Non-Inferiority/Equivalence Trials in Vaccine Development. Journal of Biopharmaceutical Statistics, 16,
Öhrvik, J. (1999). Aligned Ranks: A Method of Gaining Efficiency in Rank Tests.
van Elteren, P. H. (1960). On the Combination of Independent Two Sample Tests of Wilcoxon. Bulletin of the International Statistical Institute, 37,
