1
Biostatistics Case Studies 2007
Session 5: Demonstrating Lack of Treatment Effect: Equivalence or Non-inferiority
Peter D. Christenson, Biostatistician
2
Terminology Superiority and/or Inferiority Study:
Two or more treatments are assumed equal and the study is designed to find overwhelming evidence of a difference. Usually, one treatment is a control, sham, or placebo. Most common comparative study type. It is rare to assess only one of superiority or inferiority (“one-sided” statistical tests), unless there is biological impossibility of one of them.
3
Terminology Equivalence Study:
Two treatments are assumed to differ and the study is designed to find overwhelming evidence that they are equal. Usually, the quantity of interest is a measure of biological activity or potency and “treatments” are drugs or lots or batches of drugs. Also known as bioequivalence. Sometimes used to compare clinical outcomes for two active treatments, e.g., statins or vaccines, if neither treatment can be considered standard or accepted. This usually requires large numbers of subjects.
4
Terminology Non-Inferiority Study:
Usually a new treatment or regimen is compared with an accepted treatment or regimen or standard of care. The new treatment is assumed inferior to the standard and the study is designed to show overwhelming evidence that it is at least nearly as good, i.e., non-inferior. It may have other advantages, e.g., oral vs. injected administration. A negative inferiority study fails to detect inferiority, but does not necessarily give evidence for non-inferiority. The accepted treatment is usually known to be efficacious already, but an added placebo group may also be used. The distinguishing feature is an attempt to prove negativity, not the one-sidedness of the inference.
5
Case Study
8
p_ASA+PPI = 1.5%
Demonstrate: p_clop – p_ASA+PPI ≤ 4%
N = 145/group
Power = 80% for what?
9
Typical Analysis: Inferiority or Superiority
[Not used in this paper]
H0: p_clop – p_ASA+PPI = 0%
H1: p_clop – p_ASA+PPI ≠ 0% (H1 → therapies differ)
α = 0.05
Power = 80% for Δ = |p_clop – p_ASA+PPI| = ?
[Figure: 95% CIs for p_clop – p_ASA+PPI, one per outcome]
Clop inferior (CI entirely above 0)
Clop superior (CI entirely below 0)
No diff detected* (CI includes 0)
* and 80% chance that a Δ of (?) or more would be detected.
10
Typical Analysis: Inferiority or Superiority
[Not used in this paper]
H0: p_clop – p_ASA+PPI = 0%
H1: p_clop – p_ASA+PPI ≠ 0% (H1 → therapies differ)
α = 0.05
Power = 80% for Δ = |p_clop – p_ASA+PPI| = ?
Detectable Δ = 5.5% – 1.5% = 4%
So, N = 331/group → 80% chance that a Δ of 4% or more would be detected.
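The N = 331/group figure can be checked with a short calculation. This is a minimal sketch, assuming the usual pooled-variance normal approximation for comparing two proportions; the slides do not say which formula or software was used, and the function name `n_per_group_difference` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_difference(p1, p2, alpha=0.05, power=0.80):
    """Per-group N for a two-sided test of H0: p1 = p2.

    Assumption: pooled-variance normal approximation, equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))           # SD of the difference under H0
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))      # SD of the difference under H1
    n = (z_alpha * null_sd + z_beta * alt_sd) ** 2 / (p2 - p1) ** 2
    return ceil(n)


# Planning values from the slide: 1.5% vs 5.5%, alpha = 0.05, power = 80%.
print(n_per_group_difference(0.015, 0.055))
```

Under these assumptions the result agrees with the 331 per group quoted on the slide; other approximations (e.g., with a continuity correction) would give a somewhat different N.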
11
Typical Analysis: Inferiority or Superiority
[Not used in this paper]
H0: p_clop – p_ASA+PPI = 0%
H1: p_clop – p_ASA+PPI ≠ 0% (H1 → therapies differ)
α = 0.05
Power = 80% for Δ = |p_clop – p_ASA+PPI| = 4%

Note that this could be formulated as two one-sided tests (TOST):

H0: p_clop – p_ASA+PPI ≤ 0%
H1: p_clop – p_ASA+PPI > 0% (H1 → clop inferior)
α = 0.025
Power = 80% for p_clop – p_ASA+PPI = 4%

H0: p_clop – p_ASA+PPI ≥ 0%
H1: p_clop – p_ASA+PPI < 0% (H1 → clop superior)
α = 0.025
Power = 80% for p_clop – p_ASA+PPI = -4%
12
Demonstrating Equivalence
[Not used in this paper]
H0: |p_clop – p_ASA+PPI| ≥ E%
H1: |p_clop – p_ASA+PPI| < E% (H1 → therapies “equivalent”, within E)

Note that this could be formulated as two one-sided tests (TOST):

H0: p_clop – p_ASA+PPI ≤ -4%
H1: p_clop – p_ASA+PPI > -4% (H1 → clop non-superior)
α = 0.025
Power = 80% for p_clop – p_ASA+PPI = 0%

H0: p_clop – p_ASA+PPI ≥ 4%
H1: p_clop – p_ASA+PPI < 4% (H1 → clop non-inferior)
α = 0.025
Power = 80% for p_clop – p_ASA+PPI = 0%
13
Demonstrating Equivalence
H0: |p_clop – p_ASA+PPI| ≥ 4%
H1: |p_clop – p_ASA+PPI| < 4% (H1 → equivalence)
α = 0.05
Power = 80% for p_clop – p_ASA+PPI = 0
[Figure: 95% CIs for p_clop – p_ASA+PPI on an axis marked at -4 and 4, one per outcome]
Clop non-superior (CI entirely above -4)
Clop non-inferior (CI entirely below 4)
Equivalence* (CI entirely between -4 and 4)
* both non-superior and non-inferior.
14
This Paper: Inferiority and Non-Inferiority
Apparently, two one-sided tests (TOST), but only one explicitly powered:

H0: p_clop – p_ASA+PPI ≤ 0%
H1: p_clop – p_ASA+PPI > 0% (H1 → clop inferior)
α = 0.025
Power = 80% for p_clop – p_ASA+PPI = ?%

H0: p_clop – p_ASA+PPI ≥ 4%
H1: p_clop – p_ASA+PPI < 4% (H1 → clop non-inferior)
α = 0.025
Power = 80% for p_clop – p_ASA+PPI = 0%

The authors chose E = 4% as the largest difference at which the therapies are still considered equivalent.
15
This Paper: Inferiority and Non-Inferiority
[Figure: 95% CIs for p_clop – p_ASA+PPI on an axis marked at -4 and 4, one per decision]
Decisions:
Clop inferior (CI entirely above 0)
Clop non-inferior (CI entirely below 4)
“Non-clinical” inferiority* (CI entirely between 0 and 4)
* clop is statistically inferior, but not enough for clinical significance.
Observed results: p_clop = 8.6%; p_ASA+PPI = 0.7%; 95% CI for p_clop – p_ASA+PPI = 3.4 to 12.4 → clop inferior.
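The decision rule on this slide can be written as a small function of the confidence limits. This is an illustrative sketch only: the function name `classify_ci` and the "inconclusive" label (for a CI that contains 0 and reaches the margin) are assumptions, not part of the slides.

```python
def classify_ci(ci_low, ci_high, margin=4.0):
    """Label a 95% CI for p_clop - p_ASA+PPI (in percentage points).

    ci_low > 0 rejects H0: difference <= 0 (clop inferior);
    ci_high < margin rejects H0: difference >= margin (clop non-inferior).
    """
    inferior = ci_low > 0
    non_inferior = ci_high < margin
    if inferior and non_inferior:
        return "non-clinical inferiority"
    if inferior:
        return "inferior"
    if non_inferior:
        return "non-inferior"
    return "inconclusive"  # hypothetical label; the slide does not name this case


# Observed result reported on the slide: 95% CI = 3.4 to 12.4
print(classify_ci(3.4, 12.4))
```

With the reported interval the lower limit is above 0 and the upper limit is above the 4% margin, so the function returns "inferior", matching the slide.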
16
Power for Test of Clopidogrel Non-Inferiority
H0: p_clop – p_ASA+PPI ≥ 4%
H1: p_clop – p_ASA+PPI < 4% (H1 → clop non-inferior)
α = 0.025
Power = 80% for p_clop – p_ASA+PPI = 0%
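The N = 145/group quoted earlier is consistent with a standard non-inferiority sample-size formula. This is a sketch under assumptions: an unpooled normal approximation, a true rate of 1.5% in both arms (the planning value from the earlier slide), and a hypothetical function name `n_per_group_noninferiority`. The slides do not state how the authors computed N.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_noninferiority(p_std, p_new, margin, alpha=0.025, power=0.80):
    """Per-group N to reject H0: p_new - p_std >= margin (one-sided test).

    Assumption: unpooled normal approximation, equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p_std * (1 - p_std) + p_new * (1 - p_new)
    distance = margin - (p_new - p_std)  # gap between the margin and the true difference
    return ceil((z_alpha + z_beta) ** 2 * variance / distance ** 2)


# Both arms at 1.5%, margin 4%, one-sided alpha = 0.025, power = 80%.
print(n_per_group_noninferiority(0.015, 0.015, 0.04))
```

Under these assumptions the formula gives 145 per group. Note how strongly N depends on the margin: halving it roughly quadruples the required sample size.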
17
Power for Test of Clopidogrel Inferiority
H0: p_clop – p_ASA+PPI ≤ 0%
H1: p_clop – p_ASA+PPI > 0% (H1 → clop inferior)
α = 0.025
Power = 80% for p_clop – p_ASA+PPI = 7.3%
Detectable Δ = 8.8% – 1.5% = 7.3%
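The detectable difference of 7.3% can be checked by computing the power of the one-sided inferiority test at N = 145/group. This is a minimal sketch, assuming a pooled-variance normal approximation; the function name `power_one_sided` is hypothetical and the slides do not say which method was used.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(p_std, p_new, n, alpha=0.025):
    """Power to reject H0: p_new - p_std <= 0 with n subjects per group.

    Assumption: pooled-variance normal approximation, equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    p_bar = (p_std + p_new) / 2
    null_se = sqrt(2 * p_bar * (1 - p_bar) / n)                      # SE under H0
    alt_se = sqrt((p_std * (1 - p_std) + p_new * (1 - p_new)) / n)   # SE under H1
    return NormalDist().cdf(((p_new - p_std) - z_alpha * null_se) / alt_se)


# Power at the slide's detectable difference (1.5% vs 8.8%) and at a 4% difference.
print(round(power_one_sided(0.015, 0.088, 145), 3))
print(round(power_one_sided(0.015, 0.055, 145), 3))
```

Under this approximation the power at a 7.3% difference is close to 80%, while the power at a 4% difference is well below 80%, which illustrates why this N could not reliably detect a smaller difference.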
18
Conclusions: This Paper
In this paper, clop was so inferior that investigators were apparently lucky to have enough power for detecting it. The CI was too wide with this N for detecting a smaller therapy difference.
Investigators justify testing non-inferiority of clop only (and not of Aspirin + Nexium) with the lessened desirability of combination therapy (?).
This is a good approach for size and power for a new competing therapy against a standard, if the N for clop inferiority had been considered also.
Note that power calculations were based on actual %s of subjects, whereas cumulative 12-month incidence was used in the analysis. There are no power calculations for equivalence tests using survival analysis that I know of.
19
Conclusions: General
“Negligibly inferior” would be a better term than non-inferior.
All inference can be based on confidence intervals.
Pre-specify the comparisons to be made. Cannot test for both non-inferiority and superiority.
Power for only one or for multiple comparisons, e.g., non-inferiority and inferiority. Power can be different for different comparisons.
Very careful consideration must be given to choice of margin of equivalence (4% here). The study is worthless if others in the field would find your margin too large.
20
FDA Guidelines http://www.fda.gov/cder/guidance/4155fnl.pdf
FDA has at least 4 major concerns:
1. Need strong evidence that standard treatment is effective.
2. Must have acceptable margin of equivalence that is much smaller than the effect of the standard over placebo.
3. Trial design must be very close to that which established the effectiveness of the standard treatment.
4. Study conduct must be high quality. This sounds like business-speak about “excellence”, but it’s really referring to the fact that superiority studies are by nature conservative: e.g., non-compliance and misclassification bias the results toward no effect. Those flaws in a non-inferiority study have the same bias, making it easier to falsely prove the aim.
21
Appendix: Possible Errors in Study Conclusions
Typical study to demonstrate superiority/inferiority

                          Truth
Study Claims    H0: No Effect      H1: Effect
No Effect       Correct            Error (Type II)
Effect          Error (Type I)     Correct
                (Specificity)      (Sensitivity)

Specificity: set α = 0.05, so specificity = 95%.
Power (sensitivity): maximize; choose N for 80%.
22
Appendix: Graphical Representation of Power
Typical study to demonstrate superiority/inferiority
H0: true effect = 0
HA: true effect = 3
Effect in study = 1.13
N = 100 per group
[Figure: distributions of the effect (Group B mean – Group A mean) under H0 and HA; larger Ns give narrower curves.]
\\\ = Probability of concluding HA if H0 is true (5%).
/// = Probability of concluding H0 if HA is true (41%).
Power = 100 – 41 = 59%
Note greater power if larger N, and/or if true effect > 3, and/or less subject heterogeneity.
23
Appendix: Online Study Size / Power Calculator
Does NOT include tests for equivalence or non-inferiority or non-superiority