Biostatistics in Practice Peter D. Christenson Biostatistician Session 6: Data and Analyses: Too Little or Too Much.

Biostatistics in Practice Peter D. Christenson Biostatistician http://gcrc.LABioMed.org/Biostat Session 6: Data and Analyses: Too Little or Too Much

Too Little or Too Much: Data
Too Little:
  Too few subjects: study not sufficiently powered (Session 4).
  A biasing characteristic not measured: attributability of effects questionable (Session 5).
  Subjects do not complete study, or do not comply, e.g., do not take all doses (This session).
“Too Much”:
  All subjects, not a sample (This session).
  Irrelevant detectability (This session).

Too Little or Too Much: Analyses
Too Few: Miss an Effect
Too Many: Spurious Results
Numerous analyses due to:
  Multiple possible outcomes.
  Ongoing analyses as more subjects accrue.
  Many potential subgroups.

Non-Completing or Non-Complying Subjects

All Study Subjects or “Appropriate” Subset
What is the most relevant group of studied subjects: all randomized, or mostly compliant, or completed study, or …?

Criteria for Appropriate Subset
Study Goal: Scientific effect? Societal impact?
Potential Biased Conclusions: Why not completed? Study arms equivalent?
Slide labels: Primarily Compliance; Primarily Dropout.

Possible Study Populations
Per-Protocol Subjects: Had all measurements, visits, doses, etc.
  “Modified”: relaxations, e.g., 85% of doses.
  Emphasis on scientific effect.
Intention-to-Treat Subjects: Everyone who was randomized.
  “Modified”: slight relaxations, e.g., ≥ 1 dose.
  Emphasis on non-biased policy conclusion.

Possible Bias Using Only Completers
Comparison: % cured, placebo vs. treated.
Many more placebo subjects are not cured, go elsewhere, and do not complete the study.
The cure rate is therefore biased upward among placebo completers.
Conclusion drawn: the treatment looks less effective than it really is.
Other scenarios?
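The completers-only bias can be made concrete with a small arithmetic sketch. All numbers here are hypothetical and chosen only for illustration (200 subjects per arm, true cure rates of 30% and 50%, and dropout that occurs only among uncured subjects); none of them come from the slides.

```python
def cure_rates(n, true_cure_rate, dropout_among_uncured):
    """Return (completers-only cure rate, ITT cure rate).

    Assumption for this sketch: cured subjects always complete the study,
    and only uncured subjects drop out. The ITT rate counts every
    non-completer as not cured.
    """
    cured = n * true_cure_rate
    uncured = n - cured
    uncured_completers = uncured * (1 - dropout_among_uncured)
    completers_rate = cured / (cured + uncured_completers)
    itt_rate = cured / n
    return completers_rate, itt_rate


# Hypothetical arms: half of the uncured placebo subjects leave the study,
# but only 10% of the uncured treated subjects do.
placebo = cure_rates(200, 0.30, 0.50)
treated = cure_rates(200, 0.50, 0.10)

print("Completers only: placebo %.1f%%, treated %.1f%%" % (100 * placebo[0], 100 * treated[0]))
print("ITT:             placebo %.1f%%, treated %.1f%%" % (100 * placebo[1], 100 * treated[1]))
```

Under these assumptions the placebo cure rate among completers is inflated well above its true value, so the apparent treatment benefit shrinks relative to the ITT comparison.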

Intention-to-Treat (ITT)
ITT specifies the population; it includes non-completers.
Still need to define outcomes for non-completers, i.e., “impute” values.
Example from last slide: Typical to define non-completers as not cured.

ITT: Two Ways to Impute Unknown Values
LOCF (last observation carried forward): Ignore Presumed Progression. Uses individual subjects' observations.
LRCF (last rank carried forward): Maintain Expected Relative Progression. Uses ranks.
[Figure: two panels, one per method, each plotting change from baseline (reference line at 0) across baseline, intermediate visit, and final visit.]
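A minimal sketch of LOCF follows, assuming each subject's changes from baseline are stored in visit order with None marking a missed visit. The function name and the example values are hypothetical. LRCF is not sketched here because it needs the ranks of the other subjects at each visit.

```python
def locf(visits):
    """Fill missing visits (None) with the last observed value.

    Visits before the first observation stay None, since there is
    nothing to carry forward yet.
    """
    filled = []
    last = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical subject: baseline change is 0, an intermediate visit was
# observed, and the final visit was missed.
subject = [0.0, -1.5, None]
print(locf(subject))
```

The carried-forward value freezes the subject at the intermediate visit, which is what the slide means by ignoring presumed progression.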

“Too Much” Data

All Possible Data, No Sample
“Too much” data to need probabilistic statements; already have the whole truth.
Not always as obvious as it sounds.
Examples: EMT records, some chart reviews; site-specific, not samples.
Confidence intervals usually irrelevant.
Reference ranges and some non-generalizable comparisons may be valid.

Irrelevant (?) Detectability with Large Study
Significant differences (p<0.05) in %s between placebo and treatment groups:

N/Group   Difference        # Treated* to Cure 1
100       50% vs. 63.7%     7
1000      50% vs. 54.4%     23
5000      50% vs. 52.0%     50
10000     50% vs. 51.4%     71
50000     50% vs. 50.6%     167

*NNT = Number Needed to Treat = 100/Δ
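The NNT column follows directly from NNT = 100/Δ, and the short sketch below reproduces it. The z statistic is an added assumption: the slide does not say which test produced the table, so a pooled two-proportion z-test is used here only to show that each row sits near the two-sided p = 0.05 boundary (z ≈ 1.96), allowing for rounding of the percentages.

```python
import math

# (N per group, treated cure %) as listed on the slide; placebo is 50% throughout.
rows = [(100, 63.7), (1000, 54.4), (5000, 52.0), (10000, 51.4), (50000, 50.6)]
PLACEBO_PCT = 50.0


def nnt(treated_pct, placebo_pct=PLACEBO_PCT):
    """Number needed to treat = 100 / (difference in percentage points)."""
    return 100.0 / (treated_pct - placebo_pct)


def z_statistic(n_per_group, treated_pct, placebo_pct=PLACEBO_PCT):
    """Pooled two-proportion z statistic for equal group sizes (assumed test)."""
    p1 = placebo_pct / 100.0
    p2 = treated_pct / 100.0
    pooled = (p1 + p2) / 2.0
    se = math.sqrt(2.0 * pooled * (1.0 - pooled) / n_per_group)
    return (p2 - p1) / se


for n, pct in rows:
    print(n, round(nnt(pct)), round(z_statistic(n, pct), 2))
```

Every row is about equally "significant", yet the number of patients who must be treated to cure one more grows from 7 to 167 as the study gets larger.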

Too Little or Too Much: Analyses

Multiple: Outcomes, Subgroups, Ongoing effects
Exploring vs. Proving

Multiple Outcomes
Balance Between Missing an Effect and Spurious Results
Food Additives and Hyperactivity Study: Uses composite score. Many other indicators of hyperactivity.

GHA: Global Hyperactivity Aggregate
Components: Teacher ADHD, Parent ADHD, Class ADHD, Conner.
The four components contain 10, 10, 12, and 4 items.
Could perform: 10 + 10 + 12 + 4 = 36 item analyses.

Multiple Subgroup Analyses: Example
Editorial, pp. 1667-69.

Multiple Subgroup Analyses: Example
Comparing Two Treatments in 25 Subgroups + Overall

Multiple Subgroup Analyses
Lagakos NEJM 354(16):1667-1669.
False Positive Conclusions: 72% chance of claiming at least one false effect with 25 comparisons (see next slide).

A Correction for Multiple Analyses
No Correction:
  If using p<0.05, then P[correct neg conclusion] = 0.95.
  If 25 comparisons are independent, P[no false pos] = P[all correct neg] = (1 - 0.05)^25 = (0.95)^25 = 0.28.
  So, P[at least 1 false pos] = 1 - 0.28 = 0.72.
Bonferroni Correction:
  To maintain P[no false pos in k tests] = 0.95 = (1 - p*)^k, need to use p* = 1 - (0.95)^(1/k) ≈ 0.05/k.
  So, use p<0.05/k to maintain <5% overall false positive rate.
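The arithmetic on this slide can be checked with a few lines of Python. The function names are illustrative only, and independence of the k comparisons is assumed, as it is on the slide.

```python
def familywise_error(k, alpha=0.05):
    """P[at least one false positive] across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def exact_threshold(k, overall=0.05):
    """Per-test p* solving (1 - p*)^k = 1 - overall."""
    return 1 - (1 - overall) ** (1 / k)


def bonferroni_threshold(k, overall=0.05):
    """The simpler approximation p* = overall / k."""
    return overall / k


k = 25
print("Uncorrected chance of a false positive:", round(familywise_error(k), 2))
print("Exact per-test threshold:     ", exact_threshold(k))
print("Bonferroni per-test threshold:", bonferroni_threshold(k))
print("Overall rate using 0.05/k:    ", familywise_error(k, bonferroni_threshold(k)))
```

The 0.05/k threshold is slightly stricter than the exact solution, so using it keeps the overall false positive rate just under 5%.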

Accounting for Multiple Analyses
Some formal corrections “built-in” to p-values:
  Bonferroni: general purpose.
  Tukey: for pairs of group means, >2 groups.
  Dunnett: for means of 1 control group vs. each of ≥2 treatment groups.
Formal corrections not necessary: Transparency of what was done is most important. Should be aware yourself of number of analyses and report it with any conclusions.

Reporting Multiple Analyses
Clopidogrel paper 4 slides back: No p-values or probabilistic conclusions for 25 subgroups.
Another paper’s transparency: Cohan, Crit Care Med 33(10):2358-2366.

Multiple Mid-Study Analyses
Should effects be monitored as more and more subjects complete?
Some mid-study analyses:
  Interim analyses
  Study size re-evaluation
  Feasibility analyses

Mid-Study Analyses
[Figure: estimated effect (reference line at 0) vs. number of subjects enrolled over time. Annotations: too many analyses; wrong early conclusion.]
Need to monitor, but also account for many analyses.

Mid-Study Analyses
Mid-study comparisons should not be made before study completion unless planned for (interim analyses). Early comparisons are unstable, and can invalidate final comparisons.
Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early due to very dramatic effects, and final comparisons, if the study continues, are adjusted to validly account for “peeking”.
Continued …

Mid-Study Analyses
Mid-study reassessment of study size is advised for long studies. Only standard deviations to date, not effects themselves, are used to assess original design assumptions.
Feasibility analysis:
–may use the assessment noted above to decide whether to continue the study.
–may measure effects, like interim analyses, by unmasked advisors, to project ahead on the likelihood of finding effects at the planned end of study.
Continued …
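A hedged sketch of a study size reassessment that uses only the standard deviation observed to date. It assumes a two-group comparison of means with the standard normal-approximation formula n = 2(z_α/2 + z_β)² (SD/Δ)² per group; the slides do not specify a formula, and the SD and Δ values below are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Subjects per group needed to detect a mean difference `delta`
    with a two-sided test at level `alpha` and the given power
    (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)


# Hypothetical design: planned with SD = 10, but the pooled SD observed
# mid-study is 12. The target difference (5) is left unchanged, and the
# observed group difference is never examined.
planned = n_per_group(sd=10.0, delta=5.0)
revised = n_per_group(sd=12.0, delta=5.0)
print(planned, revised)
```

Because only the variability is updated, the treatment comparison itself stays unseen and the final p-value is not compromised by this reassessment.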

Mid-Study Analyses
Examples: Studies at Harbor. Randomized; not masked; data available to PI. Compared treatment groups repeatedly, as more subjects were enrolled.
Study 1: Groups do not differ; plan to add more subjects. Consequence → final p-value not valid; probability requires no prior knowledge of effect.
Study 2: Groups differ significantly; plan to stop study. Consequence → use of this p-value not valid; the probability requires incorporating later comparison.
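Why repeated unplanned comparisons invalidate the p-value can be illustrated by simulation. This is a sketch under stated assumptions, not a description of the Harbor studies: outcomes are standard normal with known SD = 1, there is no true group difference, and the groups are compared with a z-test after every 20 subjects per group up to 100. The look schedule, trial count, and seed are all hypothetical.

```python
import math
import random


def simulate(n_trials=2000, looks=(20, 40, 60, 80, 100), z_crit=1.96, seed=0):
    """Return (false positive rate if any look may declare significance,
    false positive rate using only the final look)."""
    rng = random.Random(seed)
    any_look_hits = 0
    final_only_hits = 0
    max_n = looks[-1]
    for _ in range(n_trials):
        # Both groups come from the same distribution: no real effect.
        a = [rng.gauss(0, 1) for _ in range(max_n)]
        b = [rng.gauss(0, 1) for _ in range(max_n)]
        significant_at = []
        for n in looks:
            diff = sum(a[:n]) / n - sum(b[:n]) / n
            z = diff / math.sqrt(2.0 / n)  # known SD of 1 in each group
            significant_at.append(abs(z) > z_crit)
        any_look_hits += any(significant_at)
        final_only_hits += significant_at[-1]
    return any_look_hits / n_trials, final_only_hits / n_trials


peeking_rate, final_only_rate = simulate()
print("False positive rate with peeking:    ", peeking_rate)
print("False positive rate, final look only:", final_only_rate)
```

With five looks the chance of at least one spurious "significant" result is well above the nominal 5%, which is why planned interim analyses adjust the final comparison.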

Conclusions: Bad Science That Seems So Good
1. Re-examining data, or using many outcomes, which seems like due diligence.
2. Adding subjects to a study that is showing marginal effects; stopping early due to strong results.
3. Looking for effects in many subgroups.
Actually bad? Could be negligent NOT to do these, but need to account for doing them.

Course Over? Already? Nils Simonson, in Furberg & Furberg, Evaluating Clinical Research
