Published by Ursula Sherman. Modified over 8 years ago.
1
Biostatistics Case Studies 2006 Peter D. Christenson Biostatistician http://gcrc.LAbiomed.org/Biostat Session 1: Demonstrating Equivalence of Active Treatments: Non-inferiority Studies
2
Terminology Superiority and/or Inferiority Study: Two or more treatments are assumed equal and the study is designed to find overwhelming evidence of a difference. Usually, one treatment is a control, sham, or placebo. Most common comparative study type. It is rare to assess only one of superiority or inferiority (“one-sided” statistical tests), unless there is biological impossibility of one of them.
3
Terminology Equivalence Study: Two treatments are assumed to differ and the study is designed to find overwhelming evidence that they are equal. Usually, the quantity of interest is a measure of biological activity or potency and “treatments” are drugs or lots or batches of drugs. Also known as bioequivalence. Sometimes used to compare clinical outcomes for two active treatments, e.g., statins or vaccines, if neither treatment can be considered standard or accepted. This usually requires large numbers of subjects.
4
Terminology Non-Inferiority Study: Usually a new treatment or regimen is compared with an accepted treatment or regimen or standard of care. The new treatment is assumed inferior to the standard and the study is designed to show overwhelming evidence that it is at least nearly as good, i.e., non-inferior. The new treatment usually has other advantages, e.g., oral vs. injected administration. A negative inferiority study fails to detect inferiority, but does not necessarily give evidence for non-inferiority. The accepted treatment is usually known to be efficacious already, but an added placebo group may also be used. The distinguishing feature is an attempt to prove negativity, not the one-sidedness of the inference.
5
Case Study Ophthalmology 2006; 113:70-76.
6
Abstract
7
Primary Outcome and Study Size
Study Size - Page 72 bottom of column 1.
Primary Outcome - Page 72 middle of column 1.
[Excerpts from the paper shown on the slide, with callouts “Needs Consensus” and “PI’s Gamble”.]
8
Making Inference from Study Results
Regardless of study aim (to prove treatments equivalent or to prove them different), inference can be based on a confidence interval for the primary outcome.
Primary Outcome: Δ = Δu – Δf, where Δf = mean IOP reduction with fixed therapy and Δu = mean IOP reduction with unfixed therapy.
Typical superiority/inferiority study: Compare the 95% CI to 0.
Non-inferiority study: Compare the 95% CI to E, a pre-specified margin of equivalence (1.5 here).
[Figure legend: bar = 95% CI for Δu – Δf = “non-ruled-out values for Δ”.]
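As a minimal sketch of the interval this slide refers to, the snippet below computes a normal-approximation 95% CI for Δu – Δf from group summaries and compares it to 0 and to the margin E. The function name `diff_ci`, the SDs, and the group sizes are hypothetical illustration values, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(mean_u, sd_u, n_u, mean_f, sd_f, n_f, level=0.95):
    """Normal-approximation CI for delta = mean_u - mean_f (two independent groups)."""
    delta = mean_u - mean_f
    se = sqrt(sd_u**2 / n_u + sd_f**2 / n_f)   # standard error of the difference
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return delta - z * se, delta + z * se


# Hypothetical summaries: SD = 3.5 and n = 200 per group are assumptions.
low, high = diff_ci(9.0, 3.5, 200, 8.7, 3.5, 200)
E = 1.5  # pre-specified margin of equivalence from the slide

differs_from_zero = low > 0 or high < 0  # superiority/inferiority question
below_margin = high < E                  # non-inferiority question
print(f"95% CI: {low:.2f} to {high:.2f}")
print("difference detected:", differs_from_zero)
print("non-inferior:", below_margin)
```

The same interval answers both questions; only the reference value (0 or E) changes.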
9
Typical Analysis: Inferiority or Superiority [Not used in this paper]
H0: Δu – Δf = 0
H1: Δu – Δf ≠ 0
Aim: H1 → therapies differ
α = 0.05 & N=2194
Power = 80% for: Δ = Δu – Δf ≥ 1 or ≤ -1
[Figure: 95% CIs for Δu – Δf (“non-ruled-out values for Δ”) plotted against 0. CI entirely above 0: fixed is inferior. CI entirely below 0: fixed is superior. CI containing 0: no difference detected.]
10
Typical Analysis: Inferiority or Superiority [Not used in this paper] Subjects needed to detect treatment differences: Note that this uses α=0.025 to test only inferiority or superiority.
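The study-size calculation on this slide is an image, so the following is only a sketch of the standard normal-approximation formula for comparing two means, n per group = 2σ²(z_α + z_power)²/Δ². The SD of 3.5 is an assumed value (the slides do not state one), and `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, two_sided=True):
    """Subjects per group to detect a true difference `delta` between two means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)
    return ceil(2 * (sd * (z_alpha + z_power) / delta) ** 2)


SD = 3.5  # assumed common SD of IOP reduction; not given in the slides

print(n_per_group(1.0, SD))                                # two-sided, alpha = 0.05
print(n_per_group(1.0, SD, alpha=0.025, two_sided=False))  # one-sided, alpha = 0.025
print(n_per_group(1.0, SD, alpha=0.05, two_sided=False))   # one-sided, alpha = 0.05
```

A one-sided test at α=0.025 needs the same N as a two-sided test at α=0.05, because both use the same critical value. Software based on the t distribution typically reports slightly larger numbers than this approximation.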
11
Typical Analysis: Inferiority Only [Not used in this paper]
H0: Δu – Δf ≤ 0
H1: Δu – Δf > 0
Aim: H1 → fixed is inferior
α = 0.025 & N=2194 (α = 0.05 → N=2153)
Power = 80% for: Δ = Δu – Δf > 1
[Figure: 95% CIs for Δu – Δf (“non-ruled-out values for Δ”) plotted against 0. CI entirely above 0: fixed is inferior. CI not entirely above 0: inferiority not detected.]
12
Non-Inferiority [As in this paper]
H0: Δu – Δf ≥ 1.5
H1: Δu – Δf < 1.5
Aim: H1 → fixed is non-inferior
α = 0.025 & N=2194
Power = 80% for: Δ = Δu – Δf = 0.5
[Figure: 95% CIs for Δu – Δf (“non-ruled-out values for Δ”) plotted against 0 and the margin 1.5. CI entirely below 1.5: fixed is non-inferior. CI entirely below 0: fixed is non-inferior (cannot claim superior). CI not entirely below 1.5: non-inferiority not detected.]
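The non-inferiority hypotheses can be checked either with a one-sided test against the margin or with the upper limit of the two-sided 95% CI; the two give the same decision. The sketch below assumes a normal approximation; `noninferiority_test` and the standard error of 0.2 are hypothetical illustration choices.

```python
from statistics import NormalDist


def noninferiority_test(estimate, se, margin, alpha=0.025):
    """One-sided z-test of H0: delta >= margin vs. H1: delta < margin.

    Returns the one-sided p-value and the upper confidence limit that
    corresponds to the same alpha (the upper end of a 95% CI if alpha=0.025).
    """
    nd = NormalDist()
    z = (estimate - margin) / se
    p_value = nd.cdf(z)  # small when the estimate is far below the margin
    upper = estimate + nd.inv_cdf(1 - alpha) * se
    return p_value, upper


# Hypothetical estimate and standard error, margin 1.5 as on the slide.
p, upper = noninferiority_test(estimate=0.3, se=0.2, margin=1.5)
print(f"p = {p:.2g}, upper limit = {upper:.2f}, non-inferior: {upper < 1.5}")

# A borderline hypothetical case: the estimate is below the margin but too imprecise.
p2, upper2 = noninferiority_test(estimate=1.2, se=0.2, margin=1.5)
print(f"p = {p2:.3f}, upper limit = {upper2:.2f}, non-inferior: {upper2 < 1.5}")
```

In both cases p < 0.025 exactly when the upper limit falls below 1.5, which is why the inference can be read directly off the confidence interval.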
13
Non-Inferiority [As in this paper]
Subjects needed to detect treatment differences:
[Study-size calculation shown on the slide, with callouts “PI’s Gamble” and “Needs Consensus”.]
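For non-inferiority, the usual normal-approximation formula replaces the detectable difference with the distance between the margin and the assumed true difference: n per group = 2σ²(z_α + z_power)²/(E – Δ)². The sketch below uses the slide's margin (1.5) and assumed difference (0.5); the SD of 3.5 and the name `n_noninferiority` are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_noninferiority(margin, assumed_delta, sd, alpha=0.025, power=0.80):
    """Subjects per group for a one-sided non-inferiority test on two means."""
    if assumed_delta >= margin:
        raise ValueError("assumed difference must be smaller than the margin")
    z = NormalDist().inv_cdf
    distance = margin - assumed_delta  # room between assumed truth and margin
    return ceil(2 * (sd * (z(1 - alpha) + z(power)) / distance) ** 2)


SD = 3.5  # assumed common SD; not given in the slides

print(n_noninferiority(1.5, 0.5, SD))  # margin and assumed difference from the slides
print(n_noninferiority(1.5, 0.0, SD))  # more optimistic assumption: no true difference
print(n_noninferiority(1.0, 0.5, SD))  # stricter margin
```

The required N is driven by both choices: assuming a smaller true difference lowers N, while a stricter margin raises it sharply.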
14
Inferiority and Non-Inferiority
The authors used α=0.025 for non-inferiority, so perhaps allowance for two tests (0.05 total) was made, but only one was explicitly powered:
Non-inferiority test: H0: Δu – Δf ≥ 1.5; H1: Δu – Δf < 1.5; Aim: H1 → fixed is non-inferior; α = 0.025; Power = 80% for Δ = Δu – Δf = 0.5
Inferiority test: H0: Δu – Δf ≤ 0; H1: Δu – Δf > 0; Aim: H1 → fixed is inferior; α = 0.025; Power = 80% for Δ = Δu – Δf ≥ 1.5
15
Inferiority and Non-Inferiority
[Figure: 95% CIs for Δu – Δf (“non-ruled-out values for Δ”) plotted against 0 and the margin 1.5, one per possible conclusion: fixed is non-inferior; neither is detected; fixed is inferior; fixed is “non-clinically” inferior.]
Observed Results: Δu = 9.0, Δf = 8.7, Δ = 0.3, 95% CI = -0.1 to 0.7. Fixed is non-inferior.
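The four conclusions on this slide follow from where the 95% CI sits relative to 0 and the margin. A minimal sketch of that decision rule is below; `classify_ci` and its return labels are hypothetical names, and the mapping of CI positions to labels is an interpretation of the slide's figure.

```python
def classify_ci(ci_low, ci_high, margin):
    """Classify a 95% CI for delta = (unfixed - fixed) against 0 and the margin."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    non_inferior = ci_high < margin  # all non-ruled-out values are below the margin
    inferior = ci_low > 0            # all non-ruled-out values are above 0
    if non_inferior and inferior:
        return "non-clinically inferior"  # inferior, but by less than the margin
    if non_inferior:
        return "non-inferior"
    if inferior:
        return "inferior"
    return "neither detected"


# Observed results from the slide: 95% CI = -0.1 to 0.7, margin = 1.5.
print(classify_ci(-0.1, 0.7, 1.5))
```

With the observed interval the upper limit 0.7 is below 1.5 and the interval contains 0, so the rule returns "non-inferior", matching the slide.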
16
Conclusions: General “Negligibly inferior” would be a better term than non-inferior. All inference can be based on confidence intervals. Pre-specify the comparisons to be made. Cannot test for both non-inferiority and superiority. Power the study for only one comparison or for multiple comparisons, e.g., non-inferiority and inferiority. Power can be different for different comparisons. Very careful consideration must be given to choice of margin of equivalence (1.5 here). You can be risky and gamble on what expected differences will be (0.5 here), but the study is worthless if others in the field would find your margin too large.
17
FDA Guidelines
http://www.fda.gov/cder/guidance/4155fnl.pdf
FDA has at least 4 major concerns:
1. Need strong evidence that standard treatment is effective.
2. Must have acceptable margin of equivalence that is much smaller than the effect of the standard over placebo.
3. Trial design must be very close to that which established the effectiveness of the standard treatment.
4. Study conduct must be high quality. This sounds like business-speak about “excellence”, but it’s really referring to the fact that superiority studies are by nature conservative: e.g., non-compliance and misclassification bias the results toward no effect. Those flaws bias a non-inferiority study in the same direction, toward no difference, making it easier to falsely prove the aim.
18
Other Statistical Issues in this Paper
Regression to the mean from screening to baseline visit is common. This study avoided it. See p 71, middle of 2nd column.
Differences among centers were accounted for by analysis of covariance and Cochran-Mantel-Haenszel.
ITT with last-value-carried-forward was used. This causes a bias toward non-inferiority.
Note correct use of SD and SE in Table 3, but technically incorrect p-values that refer to the typical test for differences (which should not be done), not non-inferiority.
Way too many p-values for secondary efficacy analyses in column 2 of p 73. Should be descriptive.
19
Self-Quiz
1. Give an example in your specialty area for a superiority/inferiority study. Now modify it to an equivalence study. Now modify it to a non-inferiority study.
2. T or F: The main point about non-inferiority studies is that we are asking whether a treatment is as good or better vs. worse than another treatment, so it uses a one-sided test.
3. Power for a typical superiority test is the likelihood that you will declare treatment differences (p<0.05) if treatments really differ by some magnitude Δ. Explain what power means for a non-inferiority study.
4. T or F: Last-value-carried-forward is a good way to handle drop-outs in a non-inferiority study. Explain.
20
Self-Quiz
5. Many comparative studies have an evaluator who is masked (blinded) as to subjects’ treatment, especially for subjective outcomes, to prevent bias. Explain how such an evaluator in a non-inferiority study has the power to completely bias the results to prove the aim, if the outcome is a final value rather than change score.
6. T or F: In a non-inferiority study, you should first test for non-inferiority with a confidence interval, and then use a t-test to test for superiority, but only if non-inferiority was established at the first step.
7. What is the meaning of the equivalence margin, and how do you determine it?
8. What do you conclude if the CI for treatment difference does not lie to the left of the equivalence margin?
21
Self-Quiz
9. Suppose the primary outcome for a study is a serum inflammatory marker. If its assay is poor (low reproducibility), then it is more difficult to find treatment differences in a typical superiority/inferiority study than for a better assay, due to this noise. Would it be easier or more difficult to find non-inferiority with this assay, compared to a better assay?
10. Does the assumed treatment difference (0.5 here) for power calculations have the same meaning as the difference used for power calculations in a typical superiority/inferiority study?
22
Appendix: Possible Errors in Study Conclusions
Typical study to demonstrate superiority/inferiority

                           Truth: H0 (No Effect)     Truth: H1 (Effect)
Study claims: No Effect    Correct (specificity)     Error (Type II)
Study claims: Effect       Error (Type I)            Power (sensitivity)

Set α=0.05, so specificity=95%.
Power: Maximize. Typically choose N for 80%.
23
Appendix: Graphical Representation of Power
Typical study to demonstrate superiority/inferiority
[Figure: two curves for the effect (Group B mean – Group A mean), one under H0: true effect=0 and one under H1: true effect=3, with regions labeled “Choose H0” and “Choose H1”. N=100 per group. Effect in study=1.13.]
\\\ = False Positive: Probability of concluding H1 if H0 is true (5%).
/// = False Negative: Probability of concluding H0 if H1 is true (41%). Power=100-41=59%.
Note greater power if larger N, and/or if true effect>3, and/or less subject heterogeneity. Larger Ns give narrower curves.
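The relationships noted on this slide (greater power with larger N, a larger true effect, or less heterogeneity) can be explored with a normal-approximation power function. This is a sketch only: the SD of 10 is an assumed value because the slide does not state one, so the result is not expected to reproduce the slide's 59% exactly, and `power_two_sample` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a true effect `delta`."""
    nd = NormalDist()
    se = sd * sqrt(2 / n_per_group)     # SE of the difference in group means
    z_crit = nd.inv_cdf(1 - alpha / 2)  # critical value, about 1.96
    shift = delta / se                  # true effect in SE units
    # Probability that the test statistic lands in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


SD = 10.0  # assumed subject heterogeneity; not given on the slide

base = power_two_sample(3, SD, 100)
print(f"true effect 3, N=100 per group: {base:.2f}")
print(f"larger N (200 per group):       {power_two_sample(3, SD, 200):.2f}")
print(f"larger true effect (4):         {power_two_sample(4, SD, 100):.2f}")
print(f"less heterogeneity (SD 8):      {power_two_sample(3, 8.0, 100):.2f}")
```

When the true effect is 0, the function returns α itself, which is the false-positive rate shown in the figure.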
24
Appendix: Online Study Size / Power Calculator
www.stat.uiowa.edu/~rlenth/Power
Does NOT include tests for equivalence, non-inferiority, or non-superiority.