© 2008 Millennium Pharmaceuticals Inc., The Takeda Oncology Company Type I error control using law of iterated logarithm in cumulative meta-analysis Mingxiu Hu, Ph.D. Head of Biostatistics Millennium Pharmaceuticals/The Takeda Oncology Company Collaborators: Gordon Lan and Joseph Cappelleri Midwest Biopharmaceutical Statistics Workshop May 19, 2009
Outline ▐ Meta-analysis vs. cumulative meta-analysis ▐ Key challenge with conventional methods ▐ Law of Iterated Logarithm ▐ Simulation scope and results ▐ Summary
Meta-Analysis Statistical analysis of data from multiple studies Synthesize and summarize results, especially useful for rare events in safety analyses Integrated safety analyses Integrated efficacy analyses Quantify sources of possible heterogeneity & bias
Meta-Analysis vs. Mega-Analysis ▐ Meta-analysis: obtain one estimate from each study, then combine the estimates into an overall estimate via a weighted average ▐ Mega-analysis: lump all data from the different studies together to obtain one estimate, treating patients from different studies as if they were from the same study ▌ Ignores between-study variation ▌ Different studies may have different effects
A Reason for Not Simply Lumping Results ▐ Meta-analysis: RR = 0.5 ▐ Mega-analysis: RR = 0.39
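To make the contrast concrete, here is a minimal sketch on hypothetical 2x2 counts (these are not the data behind the RR values on the slide). It assumes inverse-variance weighting of log relative risks for the meta-analysis and simple pooling of counts for the mega-analysis; the function names are illustrative only.

```python
import math

# Hypothetical counts, NOT the slide's data:
# (events_treatment, n_treatment, events_control, n_control)
studies = [(10, 100, 40, 200), (80, 400, 40, 100)]

def log_rr_and_var(e_t, n_t, e_c, n_c):
    """Log relative risk and its usual large-sample variance."""
    log_rr = math.log((e_t / n_t) / (e_c / n_c))
    var = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c
    return log_rr, var

def meta_rr(studies):
    """Inverse-variance weighted average of the per-study log RRs."""
    num = den = 0.0
    for study in studies:
        log_rr, var = log_rr_and_var(*study)
        num += log_rr / var
        den += 1 / var
    return math.exp(num / den)

def mega_rr(studies):
    """Lump all patients together as if they came from one study."""
    e_t = sum(s[0] for s in studies)
    n_t = sum(s[1] for s in studies)
    e_c = sum(s[2] for s in studies)
    n_c = sum(s[3] for s in studies)
    return (e_t / n_t) / (e_c / n_c)

print(round(meta_rr(studies), 3), round(mega_rr(studies), 3))
```

Each hypothetical study has RR = 0.5, so the meta-analytic estimate is 0.5, while lumping gives 0.675 because the studies differ in both baseline risk and allocation ratio.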
Statistical Models of Pooling in Meta-Analysis Fixed Effect Model – assumes a common treatment effect – weights studies by the inverse of the within-study (sampling) variance Random Effect Model – allows for different treatment effects – weights studies by the inverse of the total variance (sum of within-study variation and between-study variation) – tends to be more conservative (gives a broader confidence interval)
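The two weighting schemes can be sketched in a few lines. This is an illustration under stated assumptions, not the presenter's code: `tau2` stands for a between-study variance that is supplied by the caller, and the input numbers are made up.

```python
import math

def pool(estimates, within_vars, tau2=0.0):
    """Inverse-variance pooling.

    tau2 = 0 gives the fixed effect model; tau2 > 0 adds the
    between-study variance to every study's variance (random effects).
    Returns (pooled estimate, standard error).
    """
    weights = [1.0 / (v + tau2) for v in within_vars]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical per-study effect estimates and within-study variances.
estimates = [0.2, 0.5, 0.1]
within_vars = [0.01, 0.04, 0.02]

fe_est, fe_se = pool(estimates, within_vars)             # fixed effect
re_est, re_se = pool(estimates, within_vars, tau2=0.05)  # random effects
print(fe_est, fe_se)
print(re_est, re_se)
```

Because every weight shrinks when `tau2` is positive, the random effects standard error is larger, which is the sense in which that model is more conservative.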
Famous Meta-Analysis Nissen and Wolski (NEJM, 2007) on GSK’s Avandia: Methods We conducted searches of the published literature, the Web site of the Food and Drug Administration, and a clinical-trials registry maintained by the drug manufacturer (GlaxoSmithKline). Criteria for inclusion in our meta-analysis included a study duration of more than 24 weeks, the use of a randomized control group not receiving rosiglitazone, and the availability of outcome data for myocardial infarction and death from cardiovascular causes. Of 116 potentially relevant studies, 42 trials met the inclusion criteria. We tabulated all occurrences of myocardial infarction and death from cardiovascular causes. Results Data were combined by means of a fixed-effects model. In the 42 trials, the mean age of the subjects was approximately 56 years, and the mean baseline glycated hemoglobin level was approximately 8.2%. In the rosiglitazone group, as compared with the control group, the odds ratio for myocardial infarction was 1.43 (95% confidence interval [CI], 1.03 to 1.98; P=0.03), and the odds ratio for death from cardiovascular causes was 1.64 (95% CI, 0.98 to 2.74; P=0.06). Conclusions Rosiglitazone was associated with a significant increase in the risk of myocardial infarction and with an increase in the risk of death from cardiovascular causes that had borderline significance. Our study was limited by a lack of access to original source data, which would have enabled time-to-event analysis. Despite these limitations, patients and providers should consider the potential for serious adverse cardiovascular effects of treatment with rosiglitazone for type 2 diabetes.
Famous Meta-Analysis Summary on Nissen & Wolski example: ▐ Drug: Avandia (generic name: Rosiglitazone) ▐ Data source: FDA website ▐ Number of trials: 42 ▐ Method: meta-analysis ▐ Statistical model: fixed-effects ▐ Primary endpoints: Myocardial infarction (MI), death from cardiovascular causes ▐ Results: ▌ Odds ratio for MI =1.43, p=0.03 ▌ Odds ratio for death =1.64, p=0.06 ▐ Conclusion: Avandia associated with increased cardiovascular events ▐ Impact: Multi-billion dollar drop in market cap ▐ Statistical controversy: probably not significant if random-effects models are used
Famous Meta-Analysis Juni et al. (Lancet, 2004) on Merck’s Vioxx:
Famous Meta-Analysis Summary on Juni’s example: ▐ Drug: Vioxx (generic name: rofecoxib) ▐ Data source: FDA ▐ Number of trials: 18 RCTs and 11 observational studies ▐ Method: Cumulative meta-analysis ▐ Objective: When did Vioxx’s CV risk become evident ▐ Statistical model: random-effects ▐ Primary endpoints: Myocardial infarction (MI) ▐ Results: ▌ RR=2.30 with p=0.010 at the end of 2000 ▌ RR=2.24 with p=0.007 a year later ▐ Conclusion: Vioxx should have been withdrawn several years earlier ▐ Impact: legal implications ▐ Statistical controversy: no adjustment for repeated testing
Cumulative Meta-Analysis (chronologically ordered RCTs) ▐ Conduct a new statistical pooling every time a new trial or a set of new trials becomes available ▐ Performed retrospectively, it identifies the year when sufficient evidence had accumulated to show a treatment was effective or toxic ▐ Performed prospectively, it may identify an effective treatment or toxicity at the earliest possible moment ▐ Reveals a (temporal) trend towards superiority of the treatment or the control, or indifference
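The re-pooling step can be sketched as a running update. The sketch below assumes a fixed effect model with inverse-variance weights and one new study per pooling; the data are hypothetical, and the Z values it produces are the unadjusted ones that raise the repeated-testing question.

```python
import math

def cumulative_z(estimates, variances):
    """Re-pool after each chronologically ordered study.

    Returns a list of (k, pooled estimate, unadjusted Z) tuples, where
    Z = sum(w_i * y_i) / sqrt(sum(w_i)) and w_i = 1 / variance_i.
    """
    results = []
    sum_w = sum_wy = 0.0
    for k, (y, v) in enumerate(zip(estimates, variances), start=1):
        w = 1.0 / v
        sum_w += w
        sum_wy += w * y
        results.append((k, sum_wy / sum_w, sum_wy / math.sqrt(sum_w)))
    return results

# Hypothetical effect estimates and variances, in order of publication.
history = cumulative_z([0.3, 0.1, 0.25], [0.04, 0.02, 0.05])
for k, pooled, z in history:
    print(k, round(pooled, 3), round(z, 3))
```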
Example of Cumulative Meta-Analysis (Lau et al. NEJM 1992)
What are the key challenges ▐ How to control the overall type I error under repeated testing ▐ It does not fit into the conventional group sequential framework because of the heterogeneity between studies ▌ Studies may be spread out over a long period of time ▌ Patient populations may not be identical ▌ Medical technology changes ▐ We do not know in advance how many tests we will have, which makes multiple comparison methods hard to apply (also because of the complexity of the correlations between tests) ▐ Between-study variance estimation is unreliable, especially at the beginning of the testing process when we only have a small number of studies
What are the key challenges ▐ Overall type I error rates of conventional methods and group sequential methods (nominal α=0.025, =0.4314, =1) based on 100,000 simulation replications; two-sample continuous case ▐ Table columns: Average Sample Size, Maximum Number of Tests, Traditional CMA Method, O’Brien-Fleming Boundary, Pocock Boundary, LIL Method
Law of Iterated Logarithm: Motivation ▐ $X_i \sim$ iid $N(\mu, \sigma^2 = 1)$ ▐ $H_0: \mu = 0$ vs. $H_1: \mu > 0$ ▐ $S_n = X_1 + \dots + X_n \sim N(0, n)$ under $H_0$ ▌ standardized test statistic $Z_n = S_n / \sqrt{n} \sim N(0, 1)$ under $H_0$ ▌ each $Z_n$ is “practically bounded,” i.e., $P(Z_n > z_\alpha \mid H_0) = \alpha$ ▐ The infinite sequence $\{Z_1, \dots, Z_n, \dots\}$ is not “practically bounded” ▌ $P(Z_n > C \text{ for some } n \mid H_0) = 1$ for any $C$, no matter how large ▐ Implication for CMA: under repeated testing, a true null hypothesis will eventually be rejected with probability 1
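A small Monte Carlo sketch illustrates the point for a finite number of looks. It assumes the one-sample setup on this slide (iid standard normal observations under the null) and a one-sided critical value of 1.96; the function name, seed, and replication count are arbitrary choices for illustration.

```python
import math
import random

def rejection_rate(n_max, reps=2000, z_crit=1.96, every_look=True, seed=1):
    """Fraction of null sequences in which Z_n = S_n / sqrt(n) exceeds z_crit.

    every_look=True tests after each observation up to n_max;
    every_look=False tests once, at n = n_max.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        s = 0.0
        for n in range(1, n_max + 1):
            s += rng.gauss(0.0, 1.0)
            if (every_look or n == n_max) and s / math.sqrt(n) > z_crit:
                rejections += 1
                break
    return rejections / reps

single = rejection_rate(200, every_look=False)  # close to the nominal 0.025
repeated = rejection_rate(200)                  # far above the nominal level
print(single, repeated)
```

The single-look rate stays near the nominal level, while testing after every observation inflates it several-fold, and the inflation keeps growing as the number of looks increases.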
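Law of Iterated Logarithm: Motivation ▐ LIL states that $\limsup_{n \to \infty} S_n / \sqrt{2 n \ln\ln n} = 1$ almost surely ▐ Modified test statistic: $Z^*_n = S_n / \sqrt{2 n \ln\ln n} = Z_n / \sqrt{2 \ln\ln n}$ ▐ Therefore, for any given $\alpha$ (no matter how small), we can find a constant $C$ such that $\Pr(Z^*_n > C \text{ for some } n) \le \alpha$ ▐ The sequence $\{Z^*_n\}$ is bounded in probability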
CMA Tests Based on LIL: One-Sample Continuous Case ▐ In order to make $C = z_\alpha$, i.e., $\Pr(Z^*(n) > z_\alpha \text{ for some } n) = \alpha$, we replace “2” in the denominator of $Z^*_n$ by an adjusting factor $\lambda$ to obtain the test statistic for cumulative meta-analysis: $Z^*(n) = S_n / \sqrt{\lambda\, n \ln\ln n}$ ▐ When we conduct group sequential analyses, at the $k$-th inspection we use the test statistic $Z^*(k) = S(k) / \sqrt{\lambda\, n_{ck} \ln\ln n_{ck}}$ ▌ $n_{ck}$ = cumulative number of patients at the $k$-th inspection ▌ $S(k)$ = sum up to the $k$-th inspection ▌ $\lambda$ = correction factor to be determined by simulations
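As a rough illustration of how the penalty behaves, the sketch below divides the cumulative sum by sqrt(lam * n * ln ln n), with `lam` playing the role of the correction factor, and compares it with the unadjusted Z under the null. The batch size, number of inspections, and `lam = 2.0` are assumptions chosen for the example; this idealized iid setting has no between-study heterogeneity, so the adjusted test is expected to look conservative here.

```python
import math
import random

def lil_statistic(s_k, n_ck, lam=2.0):
    """S(k) / sqrt(lam * n_ck * ln ln n_ck); needs n_ck >= 3 so ln ln n_ck > 0."""
    if n_ck < 3:
        raise ValueError("ln ln n is not positive for n < 3")
    return s_k / math.sqrt(lam * n_ck * math.log(math.log(n_ck)))

def overall_type1(inspections=20, batch=50, lam=2.0, reps=5000,
                  z_crit=1.96, adjusted=True, seed=7):
    """Simulated chance of any rejection across all inspections under H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s, n = 0.0, 0
        for _ in range(inspections):
            # The sum of `batch` iid N(0, 1) observations is N(0, batch).
            s += rng.gauss(0.0, math.sqrt(batch))
            n += batch
            stat = lil_statistic(s, n, lam) if adjusted else s / math.sqrt(n)
            if stat > z_crit:
                hits += 1
                break
    return hits / reps

adjusted_rate = overall_type1(adjusted=True)
unadjusted_rate = overall_type1(adjusted=False)
print(adjusted_rate, unadjusted_rate)
```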
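CMA Tests Based on LIL: Extension to General Case ▐ In general, the LIL-based cumulative standardized test statistic at the $k$-th inspection (replace $n_{ck}$ by $I_{ck}$) is $Z^*(k) = \sum_{i} w_i \hat{\theta}_i / \sqrt{\lambda\, I_{ck} \ln\ln I_{ck}}$ ▌ $\sum_{i} w_i \hat{\theta}_i$ = weighted sum of treatment effects ▌ $I_{ck}$ = cumulative information up to the $k$-th inspection ▌ $\lambda$ = correction factor to control the alpha level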
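A minimal sketch of the general statistic follows. It assumes inverse-variance weights w_i = 1 / (within-study variance + between-study variance) and takes the cumulative information to be the sum of those weights; both are assumptions made for illustration, as are the names `lam` and `tau2` and the input values.

```python
import math

def lil_z_general(estimates, within_vars, tau2=0.0, lam=2.0):
    """LIL-penalized statistic from the studies available at an inspection.

    Assumes w_i = 1 / (v_i + tau2) and cumulative information I = sum(w_i).
    Returns (unadjusted Z, penalized Z*).
    """
    weights = [1.0 / (v + tau2) for v in within_vars]
    info = sum(weights)
    if info <= math.e:
        raise ValueError("ln ln I is not positive unless I > e")
    weighted_sum = sum(w * y for w, y in zip(weights, estimates))
    z = weighted_sum / math.sqrt(info)
    z_star = weighted_sum / math.sqrt(lam * info * math.log(math.log(info)))
    return z, z_star

# Hypothetical estimates and variances for the first two studies.
z, z_star = lil_z_general([0.2, 0.3], [0.01, 0.02])
print(round(z, 4), round(z_star, 4))
```

Note that the penalty depends on the numerical scale of the information, so this sketch is only meaningful when information is large enough for ln ln I to be positive.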
CMA Tests Based on LIL: Information ▐ Two-sample mean difference (continuous): the between-study variance $\tau^2$ = 0 for the fixed effects model
CMA Tests Based on LIL: Information ▐ Two sample odds ratio (binary) ▐ Two sample relative risk (binary) ▐ Two sample risk difference (binary)
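For the three binary effect measures, one common way to obtain study-level information is the reciprocal of the usual large-sample variance. The sketch below uses those textbook variance formulas as an assumption, since they may differ from the expressions on the original slide; `tau2` is a hypothetical between-study variance argument.

```python
def binary_variances(e_t, n_t, e_c, n_c):
    """Large-sample variances of log OR, log RR, and risk difference.

    e_t, e_c are event counts and n_t, n_c group sizes (all counts > 0).
    """
    p_t, p_c = e_t / n_t, e_c / n_c
    return {
        "log_or": 1 / e_t + 1 / (n_t - e_t) + 1 / e_c + 1 / (n_c - e_c),
        "log_rr": 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c,
        "rd": p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c,
    }

def information(variance, tau2=0.0):
    """Study information as inverse total variance (tau2 = 0: fixed effects)."""
    return 1.0 / (variance + tau2)

# Hypothetical study: 10/100 events on treatment, 20/100 on control.
v = binary_variances(10, 100, 20, 100)
print({name: round(information(var), 2) for name, var in v.items()})
```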
CMA Tests Based on LIL: Between-Study Variance ▐ Traditional between-study variance estimates: ▐ This can be negative and unstable when the number of studies is small. For the first 5 studies, use a conservative estimate:
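To see how a moment-based estimate can go negative, the sketch below assumes the traditional estimate is the DerSimonian-Laird moment estimator, which is an assumption on our part. It returns the raw value before any truncation at zero, and it does not attempt to reproduce the conservative estimate proposed for the first 5 studies.

```python
def dl_tau2_raw(estimates, within_vars):
    """Untruncated DerSimonian-Laird moment estimate of between-study variance.

    tau2 = (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)), with w_i = 1 / v_i
    and Q the weighted sum of squared deviations from the pooled mean.
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in within_vars]
    sum_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    return (q - (k - 1)) / c

# Hypothetical studies whose estimates agree exactly: Q = 0, so the raw
# estimate is negative and would usually be truncated to zero.
raw = dl_tau2_raw([0.2, 0.2, 0.2], [0.01, 0.02, 0.04])
print(raw, max(raw, 0.0))
```

With only a handful of studies, Q has few degrees of freedom, so the raw estimate swings widely from one inspection to the next.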
CMA Tests Based on LIL: Correction factor ▐ Two-sample mean difference (continuous): $\lambda$ = 2 ▐ Two-sample odds ratio (binary): $\lambda$ = 2 ▐ Two-sample relative risk (binary): $\lambda$ = 2 ▐ Two-sample risk difference (binary): $\lambda$ = 1.5 ▐ These values work for $\alpha$ = 0.025 (one-sided) or 0.05 (two-sided) ▐ For smaller type I error rates such as $\alpha$ = 0.01, the adjusting factors need to be increased slightly (our paper provides a formula that gives a rough but practical estimate; the factor can also be obtained through simulation)
Simulation Scope: What Has Been Evaluated ▐ Maximum number of inspections: 5, 10, 15, 20, 25, 50, 100 ▐ Average number of subjects per study: ▐ Ratio of the two groups' within-study variances varies within [1/4, 4], with one of the variances simulated from $\chi^2(5)/5$ ▐ Ratio of between-study SD to within-study SD: 0.1 to 10 ▐ In the discrete case, P is normally distributed with mean $P_0$, and $P_0$ varies from 0.05 to 0.9 ▐ Sample size allocation to the 2 groups: 40% to 60% ▐ Number of studies per inspection: single or multiple (Poisson(1.5))
Simulation Results: A Taste of the Flavor-Alpha ▐ Comparison of overall type I error rates (nominal α=0.025, =0.4314, / ~[0.2, 5],E 2 =1) based on 100,000 simulation replications; two-sample continuous case ▐ Table columns: Sample Size, Maximum Number of Tests, Traditional CMA Method, O’Brien-Fleming Boundary, Pocock Boundary, LIL Method ▐ For the LIL-based method, the type I error rates do not change appreciably as the number of inspections increases further to 50 and 100 (ln ln(n) kicks in when n gets large)
Simulation Results: A Taste of the Flavor-Power ▐ Comparison of power (α=0.025, =0.4314, / ~[0.2, 5],E 2 =1) based on 100,000 simulation replications ▐ Table columns: Treatment Difference, Average Study Size, Number of Inspections, Power* (LIL Method, Traditional Method) ▐ * Not really comparable because the two methods have different test sizes
Random Effects Cumulative Meta-Analysis for Stroke Example (Single Study per Inspection): Standardized Test Statistics Example
Summary and Discussion ▐ Meta-analysis is naturally useful for safety analysis, especially for rare AEs, which need extremely large sample sizes to reach conclusions ▐ Type I error cannot be controlled by traditional group sequential methods because of the between-study variation and the unknown maximum information ▐ General multiple comparison methods do not apply because we do not know in advance how many tests will be conducted
Summary and Discussion ▐ The LIL-based method, with a tailor-made between-study variance estimate at the beginning of the testing process, controls type I error for a broad range of practical situations ▐ A universal adjusting factor may be conservative when the maximum number of inspections is small. If the maximum number of inspections is known in advance, or in retrospective meta-analyses, we may reduce the adjusting factor λ to increase power (if necessary) using simulations ▐ In meta-analyses, power is usually less of a concern
References ▐ Hu, Cappelleri, and Lan, Clinical Trials 2007, ▐ Lan, Hu, and Cappelleri, Statistica Sinica, 2003, ▐ Berkey et al. Controlled Clinical Trials 1996;17: ▐ Pogue, Yusuf. Controlled Clinical Trials 1997;18: ▐ Whitehead. Statistics in Medicine 1997;16: