1
Analysis of Stratified Trials – Challenging the “Standard” Methods
Devan V. Mehrotra, Clinical Biostatistics
Department Seminar, Merck Research Laboratories, Jan 10, 2008
2
Outline
Part I: binary response variable
> Mantel-Haenszel test
> Minimum risk weights
> Simulation results
> Conclusions
Part II: continuous non-normal response variable
> Motivating example
> Technical details
3
Part I: Analysis of Binary Data
4
Stratified Trials with Binary Endpoints
2 treatments (A and B), number of strata = s
Binary response (responder/non-responder)
$p_{ij}$ = true (population) proportion for stratum i, treatment j
$\delta_i = p_{iA} - p_{iB}$ = true difference for stratum i
$f_i$ = true (population) relative frequency for stratum i
$\delta = \sum_i f_i \delta_i$ = true overall difference
$\hat{p}_{ij}$ = observed proportion for stratum i, treatment j
$n_{ij}$ = observed number of subjects in stratum i, treatment j
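A small numeric illustration of this notation may help. It assumes the overall difference is the frequency-weighted average of the stratum differences (the defining formula is not legible in the transcript, so that form is an assumption), and every number below is invented.

```python
# Hypothetical two-stratum example; all values are invented for illustration.
f = [0.7, 0.3]        # relative frequencies of the strata (sum to 1)
p_A = [0.60, 0.40]    # true responder proportions on treatment A, by stratum
p_B = [0.50, 0.35]    # true responder proportions on treatment B, by stratum

# Stratum-level differences and their frequency-weighted average.
delta_i = [a - b for a, b in zip(p_A, p_B)]
delta = sum(f_i * d_i for f_i, d_i in zip(f, delta_i))

print(delta_i, delta)
```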
5
Hypothesis Testing: General Framework Superiority or Non-Inferiority Trials
6
Mantel-Haenszel Test (1959) Superiority Trials
Note: the MH test is optimal if the odds ratio is constant across strata.
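For readers who want to see the mechanics, here is a minimal sketch of the textbook Mantel-Haenszel chi-square statistic (hypergeometric null variance, optional continuity correction). The function name, the `continuity` parameter, and the counts are assumptions for illustration; the slide's own formula did not survive extraction.

```python
import math

def mantel_haenszel(tables, continuity=0.0):
    """tables: one (x_A, n_A, x_B, n_B) tuple per stratum
    (responders and sample size in each arm)."""
    num = 0.0
    var = 0.0
    for x_a, n_a, x_b, n_b in tables:
        n = n_a + n_b
        m = x_a + x_b                      # total responders in the stratum
        num += x_a - n_a * m / n           # observed minus expected under H0
        var += n_a * n_b * m * (n - m) / (n * n * (n - 1))  # hypergeometric variance
    chi2 = max(abs(num) - continuity, 0.0) ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square with 1 df
    return chi2, p

# Invented counts for two strata.
tables = [(30, 50, 20, 50), (12, 20, 8, 20)]
chi2, p = mantel_haenszel(tables)
print(round(chi2, 3), round(p, 4))
```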
7
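Choice of Variance
Null variance [Miettinen & Nurminen, 1985; Farrington & Manning, 1990]: computed from the m.l.e. of the $p_{ij}$ under the null-hypothesis restriction. Note: the MH test uses the null variance.
Observed (OBS) variance: computed from the observed proportions $\hat{p}_{ij}$.
Note: With 1:1 randomization, the null variance is at least as large as the observed variance for superiority trials, and usually so (but not always) for non-inferiority trials.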
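The comparison can be checked numerically for a single stratum. The sketch below assumes a superiority null of zero difference, so the restricted m.l.e. is simply the pooled proportion; the function name and the counts are hypothetical. With equal arm sizes the pooled (null) variance is never smaller than the observed one, because p(1 - p) is concave.

```python
def variances(x_a, n_a, x_b, n_b):
    """Null (pooled) and observed variance of the difference in proportions."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_bar = (x_a + x_b) / (n_a + n_b)   # m.l.e. under the restriction p_A = p_B
    v_null = p_bar * (1 - p_bar) * (1 / n_a + 1 / n_b)
    v_obs = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    return v_null, v_obs

# Invented counts: 30/50 responders on A, 20/50 on B.
v_null, v_obs = variances(30, 50, 20, 50)
print(v_null, v_obs)

# Brute-force check over every possible outcome with 20 subjects per arm.
always_at_least = all(
    variances(a, 20, b, 20)[0] >= variances(a, 20, b, 20)[1] - 1e-15
    for a in range(21) for b in range(21)
)
print(always_at_least)
```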
8
(pA, pB) pairs where Null or Observed Variance is “Better” Non-Inferiority Margin = 15%
9
(pA, pB) pairs where Null or Observed Variance is “Better” Non-Inferiority Margin = 5%
10
Choice of Weights
Cochran-Mantel-Haenszel (CMH) weights
>> Estimator of $\delta$ is ~ unbiased.
Minimum Risk (MR) weights [Mehrotra & Railkar, 2000]
>> Estimator of $\delta$ has smallest mean squared error.
>> If [condition not recovered]: (optimal weights!)
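As an illustration of a weighted analysis, the sketch below uses the usual form of the CMH weights for a risk difference, proportional to n_iA * n_iB / (n_iA + n_iB), together with observed variances. That form is an assumption here because the slide's formulas were lost, and the MR weights are not reproduced since their formula does not appear in the text. The function name and counts are hypothetical.

```python
import math

def cmh_weighted_difference(tables):
    """tables: one (x_A, n_A, x_B, n_B) tuple per stratum."""
    harmonic = [n_a * n_b / (n_a + n_b) for _, n_a, _, n_b in tables]
    total = sum(harmonic)
    weights = [h / total for h in harmonic]   # normalized to sum to 1
    est = 0.0
    var = 0.0
    for w, (x_a, n_a, x_b, n_b) in zip(weights, tables):
        p_a, p_b = x_a / n_a, x_b / n_b
        est += w * (p_a - p_b)
        # Observed (unpooled) variance of the stratum difference.
        var += w * w * (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return weights, est, est / math.sqrt(var)

# Invented counts for two strata.
weights, est, z = cmh_weighted_difference([(30, 50, 20, 50), (12, 20, 8, 20)])
print(weights, est, z)
```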
11
Choice of Finite Sample Term
With CMH weights (i.e., with the MH test): [term not recovered] is used.
With MR weights: [term not recovered] is recommended.
12
Choice of Continuity Correction
With CMH weights: [value not recovered] is used by the original MH test. However, [value not recovered] is a less conservative choice.
With MR weights: [value not recovered] is recommended.
See Mehrotra & Railkar, Statistics in Medicine, 2000.
13
Motivating Example Revisited Test for Superiority
14
Simulation Results Test for Superiority (2 strata)
(f1 = .7, f2 = .3); No TxS interaction on (a) logit, (b) proportion, (c) square root, and (d) log scales; 100,000 simulations.
15
Illustrative Example # 2 Test for Non-Inferiority
16
Simulation Results Test for Non-inferiority (2 strata)
17
Simulation Results: Power Test for Non-inferiority (2 strata)
18
Summary (Part I) For stratified trials with binary responses:
> The popular Mantel-Haenszel test uses sample size (CMH) weights with null variances. It has good power properties if and only if the odds ratio is constant across strata.
> Using minimum risk (MR) weights with observed (OBS) variances will usually provide notably more power than CMH weights with null variances, for both superiority and non-inferiority trials.
> Recommendation: consider MR_OBS as a default, but use simulations to quantify power differences between methods when planning a new trial.
19
Part II: Analysis of Continuous Data Using Ranks
20
Motivating Example
Hypothetical viral loads of HIV+ subjects (log10 copies/ml)
Stratum: Females
  Placebo: 3.90, 3.96
  Vaccine: 1.40, 2.80, 2.90
Stratum: Males
  Placebo: 3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43, 4.68, 4.69, 4.70, 4.85, 5.06, 5.50
  Vaccine: 1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32
21
Motivating Example (continued)
Observed viral load summaries (log10 copies/ml):
Stratum   Summary   Placebo   Vaccine
Females   Mean      3.93      2.37
          Median    3.93      2.80
          SD        0.04      0.84
          n         2         3
Males     Mean      4.27      3.64
          Median    4.36      3.59
          SD        0.62      1.27
          n         16        9
Compared to placebo, the VLs for vaccine appear to be “shifted” to the left (i.e., are numerically smaller). Is the shift statistically significant?
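The summaries can be reproduced directly from the listed viral loads. This is a minimal sketch using only the Python standard library; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
import statistics

# Hypothetical viral loads from the motivating example (log10 copies/ml).
data = {
    ("Females", "Placebo"): [3.90, 3.96],
    ("Females", "Vaccine"): [1.40, 2.80, 2.90],
    ("Males", "Placebo"): [3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36,
                           4.36, 4.43, 4.68, 4.69, 4.70, 4.85, 5.06, 5.50],
    ("Males", "Vaccine"): [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32],
}

summary = {}
for key, values in data.items():
    summary[key] = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),   # sample SD (n - 1 denominator)
        "n": len(values),
    }

for key, s in summary.items():
    print(key, f"{s['mean']:.2f} {s['median']:.2f} {s['sd']:.2f} {s['n']}")
```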
22
Motivating Example (continued)
Stratified rank-based analysis: SAS implementation
PROC FREQ;
  TABLES gender * trt * vload / CMH SCORES=RANK;
  TABLES gender * trt * vload / CMH SCORES=MODRIDIT;
RUN;
PROC TWOSAMPL; [Part of PROC StatXact module]
  WI/AS; PO trt; RE vload; ST gender;
23
Motivating Example (continued)
2-tailed p-values using the three “methods”:
PROC FREQ, SCORES = RANK: p = .1506
PROC FREQ, SCORES = MODRIDIT: p = .0642
PROC TWOSAMPL: p = .0436*
Different conclusions at α = .05 … why?
PROC FREQ
> Ranks based on pooled sample within each stratum (“stratum-specific” ranks)
> SCORES = RANK: equal stratum weights
> SCORES = MODRIDIT: unequal stratum weights
PROC TWOSAMPL
> Ranks based on overall pooled sample, ignoring strata (“stratum-invariant” ranks), with equal stratum weights.
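The difference between the two ranking schemes is easy to see on the female stratum. The sketch below computes midranks (average ranks for ties) in plain Python; the `midranks` helper is an illustrative function, not something taken from SAS or StatXact.

```python
def midranks(values):
    """Ranks 1..N with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

# Motivating-example data: placebo values first, then vaccine, per stratum.
females = [3.90, 3.96, 1.40, 2.80, 2.90]
males = [3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43, 4.68,
         4.69, 4.70, 4.85, 5.06, 5.50,
         1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32]

# Stratum-specific: rank the females among females only.
specific = midranks(females)
# Stratum-invariant: rank everyone together, then read off the females.
invariant = midranks(females + males)[:len(females)]

print(specific)
print(invariant)
```

Under stratum-specific ranking the two female placebo subjects hold ranks 4 and 5 out of 5; under stratum-invariant ranking they sit in the middle of the pooled sample of 30, which changes how much the small female stratum contributes to the test statistic.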
24
Technical Details
25
Technical Details (continued)
26
Technical Details (continued)
Three Popular Rank-Based Tests
Test                  Stratum weights    Comments
TEQ                   wi = 1             PROC FREQ, SCORES = RANK; stratum-specific ranks
TvE                   wi = 1/(ni + 1)    PROC FREQ, SCORES = MODRIDIT; van Elteren test (1960)
[name not recovered]  equal weights      PROC TWOSAMPL; stratum-invariant ranks
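The two stratum-specific tests can be sketched as a weighted sum of within-stratum Wilcoxon rank-sum statistics with a normal approximation and a tie-corrected variance. This is an illustrative implementation under those assumptions, not the SAS code path; the function names are hypothetical. On the motivating data it should give 2-tailed p-values close to the PROC FREQ values quoted in this deck (about .15 with equal weights and about .064 with van Elteren weights).

```python
import math

def midranks(values):
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def stratified_rank_test(strata, weight):
    """strata: list of (placebo, vaccine) lists; weight: function of stratum size N."""
    t = v = 0.0
    for placebo, vaccine in strata:
        m, n = len(vaccine), len(placebo)
        big_n = m + n
        r = midranks(list(vaccine) + list(placebo))   # stratum-specific ranks
        rank_sum = sum(r[:m])                         # vaccine rank sum
        expected = m * (big_n + 1) / 2
        # Permutation variance of the rank sum, valid with ties.
        var = m * n / (big_n * (big_n - 1)) * (
            sum(x * x for x in r) - big_n * (big_n + 1) ** 2 / 4)
        w = weight(big_n)
        t += w * (rank_sum - expected)
        v += w * w * var
    z = t / math.sqrt(v)
    return z, math.erfc(abs(z) / math.sqrt(2))        # 2-tailed normal p-value

strata = [
    ([3.90, 3.96], [1.40, 2.80, 2.90]),
    ([3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43, 4.68,
      4.69, 4.70, 4.85, 5.06, 5.50],
     [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32]),
]
z_eq, p_eq = stratified_rank_test(strata, lambda n: 1.0)            # equal weights
z_ve, p_ve = stratified_rank_test(strata, lambda n: 1.0 / (n + 1))  # van Elteren
print(round(p_eq, 4), round(p_ve, 4))
```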
27
Technical Details (continued)
If there is no true treatment by stratum interaction (i.e., the true treatment effect is the same in every stratum), the van Elteren test is optimal among all the stratified tests, i.e., wi = 1/(ni + 1) are optimal weights. However, if an interaction exists, the van Elteren test can suffer from a power loss.
In general, is there an asymptotically optimal test (with optimal weights) that allows for interaction? YES … we derived it, based on stratum-specific ranks.
28
Technical Details (continued)
29
Technical Details (continued)
30
Technical Details (continued)
31
Technical Details (continued)
32
Technical Details (continued)
33
Technical Details (continued)
34
Technical Details (continued)
Estimate and 100(1-α)% CI for the Treatment Effect, Obtained by Inverting the Given Test
Let p(c) = 1-tailed p-value for the test applied to the data after shifting one treatment group by c.
The estimate and confidence limits are obtained via a numerical search.
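A numerical search of this kind can be sketched with bisection. The version below assumes the shift c is added to every vaccine value, uses the van Elteren statistic with a normal approximation, takes the point estimate as the c where the 1-tailed p-value crosses 0.5, and takes the 95% limits where it crosses 0.025 and 0.975. These conventions and all function names are assumptions, since the slide's definitions were lost. On the motivating data the result should land near the van Elteren entries reported on slide 36.

```python
import math

def midranks(values):
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def p_lower(strata, c):
    """Lower-tail p-value of the van Elteren z after adding c to vaccine values."""
    t = v = 0.0
    for placebo, vaccine in strata:
        shifted = [y + c for y in vaccine]
        m, n = len(shifted), len(placebo)
        big_n = m + n
        r = midranks(shifted + list(placebo))
        var = m * n / (big_n * (big_n - 1)) * (
            sum(x * x for x in r) - big_n * (big_n + 1) ** 2 / 4)
        w = 1.0 / (big_n + 1)
        t += w * (sum(r[:m]) - m * (big_n + 1) / 2)
        v += w * w * var
    return 0.5 * math.erfc(-(t / math.sqrt(v)) / math.sqrt(2))

def solve(strata, target, lo=-10.0, hi=10.0, tol=1e-7):
    """Bisection for the shift at which p_lower crosses the target (p is monotone in c)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p_lower(strata, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

strata = [
    ([3.90, 3.96], [1.40, 2.80, 2.90]),
    ([3.50, 3.50, 3.56, 3.59, 3.69, 3.85, 4.06, 4.36, 4.36, 4.43, 4.68,
      4.69, 4.70, 4.85, 5.06, 5.50],
     [1.79, 2.32, 2.54, 3.42, 3.59, 3.89, 4.64, 5.23, 5.32]),
]
estimate = solve(strata, 0.5)
lower, upper = solve(strata, 0.025), solve(strata, 0.975)
print(round(estimate, 2), (round(lower, 2), round(upper, 2)))
```

Because rank statistics change only when a shifted value passes another observation, p(c) is a step function; bisection converges to the jump at which the target is crossed.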
35
Motivating Example Revisited 2-tailed p-values
TEQ: .1505
TEQ with stratum-invariant ranks: .0435*
TvE (van Elteren test): .0643
TBrunner (described on slide 29): .0250*
Tadap: .0990
Talign (aligned rank test): .0654
Note: All methods except the stratum-invariant version of TEQ use stratum-specific ranks.
36
Motivating Example Revisited: Estimates and 95% CIs for the treatment effect (selected methods)
Stratum   Summary   Placebo   Vaccine   P - V
Females   Median    3.93      2.80      1.13
          n         2         3
Males     Median    4.36      3.59      0.77
          n         16        9

Method                              p-value   Estimate   95% CI
TEQ                                 .1506     .80        (-0.28, 1.61)
TEQ with stratum-invariant ranks    .0435     0.94       (.01, 1.71)
TvE                                 .0643     1.00       (-.04, 1.61)
Talign                              .0654     .84        (-.09, 1.53)
37
Simulation Study 2 treatments, 1:1 randomization per stratum
Number of strata = 2, 4, 6, 8, 10, and 12
Stratum size (ni): 10*i for stratum i
Different choices of the stratum-specific treatment effect:
> constant for each stratum (no TxS interaction)
> positively or negatively associated with stratum size (TxS interaction, with 50% power to detect it)
Four different distributions for Y:
> Normal
> Log Normal
> Mixture of Normals: 0.9N(m,v) + 0.1N(m*,v*)
> t3
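One simulated trial under this design could be generated as follows. This is a sketch under stated assumptions: the mixture parameters, the unit scale of each distribution, and the additive location shift for the treated arm are all invented here, since the slide does not give them; the function names are hypothetical.

```python
import math
import random

def draw(dist, rng):
    """One observation from the named distribution (parameters are assumptions)."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "lognormal":
        return rng.lognormvariate(0.0, 1.0)
    if dist == "mixture":   # 0.9 N(0, 1) + 0.1 N(0, 3^2), an assumed contamination
        return rng.gauss(0.0, 1.0) if rng.random() < 0.9 else rng.gauss(0.0, 3.0)
    if dist == "t3":        # standard normal over sqrt(chi-square_3 / 3)
        return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
    raise ValueError(dist)

def simulate_trial(n_strata, dist, shifts, seed=0):
    """Stratum i has 10*i subjects, randomized 1:1; shifts[i-1] is its treatment effect."""
    rng = random.Random(seed)
    strata = []
    for i in range(1, n_strata + 1):
        per_arm = 5 * i
        control = [draw(dist, rng) for _ in range(per_arm)]
        treated = [draw(dist, rng) + shifts[i - 1] for _ in range(per_arm)]
        strata.append((control, treated))
    return strata

# Constant effect in every stratum (no T x S interaction).
trial = simulate_trial(4, "t3", shifts=[0.5] * 4)
print([len(c) + len(t) for c, t in trial])
```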
38
Simulation Results Type I Error Rate (nominal = 5%) Normal Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
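The 5.92% threshold follows from the binomial standard error of an estimated rejection rate, as this short check shows (variable names are illustrative).

```python
import math

nominal = 0.05    # nominal type I error rate
n_sims = 5000     # number of simulated trials

std_error = math.sqrt(nominal * (1 - nominal) / n_sims)
upper = nominal + 3 * std_error

print(f"{100 * upper:.2f}%")
```

An observed type I error rate above this value would be hard to attribute to simulation noise alone.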
39
Simulation Results Type I Error Rate (nominal = 5%) Lognormal Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
40
Simulation Results Type I Error Rate (nominal = 5%) Mixture of Normals Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
41
Simulation Results Type I Error Rate (nominal = 5%) t3 Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
42
Simulation Results: Power (%) No T x S interaction (constant
43
Simulation Results: Power (%) No T x S interaction
44
Simulation Results: Power (%) Normal Distribution
45
Simulation Results: Power (%) Lognormal Distribution
46
Simulation Results: Power (%) Mixture of Normals
47
Simulation Results: Power (%) t3 Distribution
48
Conclusions (Part II) For rank-based analyses of stratified trials:
> No single method is uniformly the best.
Recommendation: use the aligned rank test (Talign) or either of the proposed adaptive tests (Tadap1 or Tadap2). Both approaches were more powerful than the van Elteren test (TvE) in every case studied, notably so when there was a true (but hard to detect) T x S interaction.
> It is time to retire the popular van Elteren test!
49
References
Brunner, E., Puri, M. L., and Sun, S. (1995). Nonparametric Methods for Stratified Two-Sample Designs with Application to Multiclinic Trials. Journal of the American Statistical Association, 90.
Hodges, J. L. and Lehmann, E. L. (1962). Rank Methods for Combination of Independent Experiments in the Analysis of Variance. Annals of Mathematical Statistics, 33.
Mehrotra, D.V. and Railkar, R. (2000). Minimum Risk Weights for Comparing Treatments in Stratified Binomial Trials. Statistics in Medicine, 19.
Wang, W., Mehrotra, D.V., Chan, I.S.F. and Heyse, J.F. (2006). Non-Inferiority/Equivalence Trials in Vaccine Development. Journal of Biopharmaceutical Statistics, 16.
Öhrvik, J. (1999). Aligned Ranks: A Method of Gaining Efficiency in Rank Tests.
van Elteren, P. H. (1960). On the Combination of Independent Two Sample Tests of Wilcoxon. Bulletin of the International Statistical Institute, 37.