1 Statistical Principles for Clinical Research
Sponsored by: NIH General Clinical Research Center, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center
November 1, 2007
Peter D. Christenson
Conducting Clinical Trials 2007

2 Speaker Disclosure Statement The speaker has no financial relationships relevant to this presentation.

3 Recommended Textbook
Covers making inference, design issues, biases, how to read papers, meta-analyses, and dropouts. Non-mathematical, with many examples.

4 Example: Harbor Study Protocol
18 pages of Background and Significance, Preliminary Studies, and Research Design and Methods. Then: “Pearson correlation, repeated measure of the general linear model, ANOVA analyses and student t tests will be used where appropriate. … The [two] main parameters of interest will be … [A and B. For A, using a t-test] 40 subjects provide 80% assurance that a XX reduction … will be detected, with p<0.05. Similar comparisons as for … [A and B] will be carried out …”

5 Example: Harbor Study Protocol: The good …
“The [two] main parameters of interest will be … [A and B. For A, using a t-test,] 40 subjects provide 80% assurance that a XX reduction … will be detected, with p<0.05.” Because it is explicit: it specifies the primary outcomes of interest, and it justifies the number of subjects.

6 Example: Harbor Study Protocol: … the Bad …
“Pearson correlation, repeated measure of the general linear model, ANOVA analyses and student t tests will be used where appropriate. …” Because this is boilerplate: these methods are almost always used. “Where appropriate”? It tries to satisfy the reviewer, not the science.

7 Example: Harbor Study Protocol: … and the Ugly.
“Similar comparisons as for … [A and B] will be carried out …” Because: the primary analysis is OK — the difference between 2 visits for the 2 measures A and B. But 15 measures were taken at each of 19 visits, implying hundreds of possible comparisons. Torture the data long enough, and it will confess to something.

8 Goals of this Presentation More good. Less bad. Less ugly.

9 Biostatistical Involvement in Studies
Off-site statistical design and analysis:
– Multicenter studies; data coordinating center.
– In-house drug company statisticians.
– By CRO through NIH or drug company.
– Local study contracted elsewhere, e.g., UCLA, USC, CRO.
Local protocol, and statistical design and analysis:
– Occasionally multicenter.

10 Studies with Off-Site Biostatistics
You are not responsible for the statistical design and analysis. You are responsible for study conduct, which may:
… impact the analysis and the believability of results.
… reduce the sensitivity (power) of the study to detect effects.

11 Review of Basic Method of Inference from Clinical Studies

12 Typical Study Data Analysis
A large enough “signal-to-noise ratio” proves an effect beyond a reasonable doubt. Often:

Ratio = Signal / Noise = Observed Effect / (Natural Variation / √N)

For a t-test comparing two groups:

t = Difference in Means / (SD / √N)

The degree of allowable doubt determines how large t needs to be: at 5% (p < 0.05), roughly t ≥ 2.
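As a concrete illustration of this ratio, here is a minimal Python sketch with invented data (the group means, SD, and group size are assumptions for illustration, not numbers from the talk):

```python
# Signal-to-noise behind a two-group t-test, with made-up data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=4.0, size=40)  # hypothetical treated group
group_b = rng.normal(loc=12.0, scale=4.0, size=40)  # hypothetical control group

# Signal: difference in means.  Noise: SD scaled down by sqrt(N).
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # |t| near 2 sits near p = 0.05
```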

13 Meaning of p-value
p-value: the probability of a test statistic (ratio) at least as deviant as the one observed, if there is really no effect. Smaller p-values ↔ more evidence of an effect. Valid interpretation of a p-value typically requires:
Proper data generation, e.g., randomness.
Subjects provide independent information.
The data are not used in other statistical tests.
Or: an accounting for not satisfying these criteria.
→ p-values are earned by satisfying these criteria.
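A small simulation can make this definition concrete: under no true effect, the p-value is just the fraction of studies whose statistic is at least as deviant as the observed one. The observed t of 2.1 and the group size below are hypothetical:

```python
# Simulate many studies with NO true effect and count how often the
# t statistic is at least as deviant as an assumed observed t of 2.1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
t_obs = 2.1                       # hypothetical observed statistic
n, n_sims = 40, 10_000
count = 0
for _ in range(n_sims):
    a = rng.normal(size=n)        # both groups drawn from the same
    b = rng.normal(size=n)        # distribution: no true effect
    t, _ = stats.ttest_ind(a, b)
    count += abs(t) >= t_obs
print(f"simulated p ≈ {count / n_sims:.3f}")   # close to the t-table p-value
```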

14 Analogy with Diagnostic Testing

Study Claims:   Truth: No Effect     Truth: Effect
No Effect       Correct              Error
Effect          Error                Correct

Typical: set p ≤ 0.05 → Specificity = 95%. Power (Sensitivity): maximize; choose N for 80%.
Analogy: True Effect ↔ Disease; Study Claim ↔ Diagnosis.

15 Study Conduct Impacting Analysis
Reduced effect detectability (a reduced ratio) results from:
Non-adherence of study personnel to the protocol in general. [Increases variation.]
Enrolling subjects who do not satisfy inclusion or exclusion criteria. [E.g., no effect in the 10% wrongly included and a real effect of 50% → ~0.9(50%) = 45% observed effect. Can decrease the observed effect.]
Subjects not completing the entire study. [May decrease N, or give potentially conflicting results.]

16 Potentially Conflicting Results Example: Subjects not completing the entire study.

17 Tiagabine Study Results: How Believable?
[Figure: the same outcome analyzed three ways.] Conclusions differ depending on how the non-completing subjects (24%) are handled in the analysis. The primary analysis here is specified, but we would prefer robustness to the method of analysis (agreement among methods), which is more likely with more completing subjects.

18 Study Conduct Impacting Analysis: Intention-to-Treat (ITT)
ITT typically specifies that all subjects are included in the analysis, regardless of treatment compliance or whether lost to follow-up. Purposes: avoid bias from subjective exclusions or differential exclusion between treatment groups; sometimes argued to mimic non-compliance in a real-world setting. More emphasis on the policy implications of societal effectiveness than on scientific efficacy. Not appropriate for many studies. Continued …

19 Study Conduct Impacting Analysis: Intention-to-Treat (ITT)
Lost to follow-up: always minimize; there is no “real world” analogy as there is for treatment compliance. Need to define outcomes for non-completing subjects. Current Harbor study: N≈1200 would need N≈3000 if ITT were used, 20% were lost, and the lost were counted as treatment failures.

20 ITT: Need to Impute Unknown Values
[Figure: two panels of change from baseline vs. visit (baseline, intermediate, final) for individual subjects. LOCF ignores the presumed progression after dropout by carrying the last observation forward; LRCF maintains the expected relative progression by carrying the last rank, rather than the observation, forward.]
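A minimal sketch of LOCF in Python/pandas, assuming an invented visit layout and values; LRCF would carry forward each subject's rank rather than the raw observation:

```python
# LOCF (last observation carried forward) on a subjects-by-visits table.
import numpy as np
import pandas as pd

# One row per subject, one column per visit; NaN = subject dropped out.
visits = pd.DataFrame(
    {"baseline": [0.0, 0.0, 0.0],
     "visit1":   [1.2, 0.8, 1.5],
     "visit2":   [2.0, np.nan, 2.9],
     "final":    [3.1, np.nan, np.nan]},
    index=["subj1", "subj2", "subj3"],
)

# Carry each subject's last available value forward along the row,
# ignoring any progression that was presumably still occurring.
locf = visits.ffill(axis=1)
print(locf)
```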

21 Study Conduct Impacting Feasibility: Potential Effects of Slow Enrollment
Needed N may be impossible → study stopped. Competitive site enrollment → local financial loss. Insufficient person-years (PY) of observation for some studies, even if N is attained.
[Figure: number of subjects enrolled vs. year for planned, slower, and slower-yet enrollment; the area under each curve is the person-years accrued. The planned study detects an effect of Δ; the slower ones detect only 1.1Δ and 1.7Δ.]

22 Biostatistical Involvement in Studies
Off-site statistical design and analysis:
– Multicenter studies; data coordinating center.
– In-house drug company statisticians.
– By CRO through NIH or drug company.
– Local study contracted elsewhere, e.g., UCLA, USC, CRO.
Local protocol, and statistical design and analysis:
– Occasionally multicenter.

23 Local Protocols and Data Analysis
1. Develop protocol and data analysis plan.
2. Have a randomization and blinding strategy, if the study requires one.
3. Data management.
4. Perform data analyses.

24 Local Data Analysis Resources
Biostatistician: Peter Christenson, PChristenson@labiomed.org.
– Develops study designs and analysis plans.
– Advises throughout for any study.
– Performs all non-basic analyses.
– Full responsibility for studies with funded %FTE.
– Reviews some protocols for committees.
Data Management: database development for GCRC studies by the database manager.

25 Statistical Components of Protocols
– Target population / source of subjects.
– Quantification of aims, hypotheses.
– Case definitions, endpoints quantified.
– Randomization plan, if any.
– Masking, if used.
– Study size: screened, enrolled, completing.
– Use of data from non-completers.
– Justification of study size (power, precision, other).
– Methods of analysis.
– Mid-study analyses.

26 Selected Statistical Components and Issues

27 Case Definitions and Endpoints
Primary case definitions and endpoints need careful thought; results will need to be reported based on them. Example: a study at Harbor used a very strict definition of cure and analyzed the data with this definition. The cure rates were too low to be taken seriously, but the scientific method requires reporting them; omitting them would be cherry-picking. Publication: use the primary definition; explain; also report results with a secondary definition. This is less credible than having chosen the right definition up front.

28 Randomization
Helps assure the attributability of treatment effects. Blocked randomization assures approximate chronologic equality of the numbers of subjects in each treatment group. Recruiters must not have access to the randomization list. The list can be created with a random number generator in software, printed tables in statistics texts, or even shuffled slips of paper.
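A minimal sketch of one way to generate such a blocked list in Python; the block size of 4 and the A/B labels are assumptions for illustration, not from the talk:

```python
# Blocked randomization for two treatment groups: within each block the
# group counts are equal, so cumulative counts stay approximately equal
# over time.
import random

def blocked_randomization(n_subjects: int, block_size: int = 4) -> list:
    assignments = []
    while len(assignments) < n_subjects:
        block = ["A", "B"] * (block_size // 2)  # equal A/B within each block
        random.shuffle(block)                   # random order within the block
        assignments.extend(block)
    return assignments[:n_subjects]

random.seed(42)  # the list is generated once and kept away from recruiters
print(blocked_randomization(10))   # e.g. ['B', 'A', 'A', 'B', ...]
```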

29 Non-completing Subjects
Enrolled subjects are never “dropouts”. The protocol should specify:
– The primary analysis set (e.g., ITT or per-protocol).
– How final values will be assigned to non-completers.
Time-to-event (survival analysis) studies may not need final assignments; use the time followed. Study size estimates should incorporate the number of expected non-completers.

30 Study Size: Power
Power = the probability of detecting real effects of a specified minimal (clinically relevant) magnitude. Power will be different for each outcome, and depends on the statistical method. Five factors, including power, are inter-related; fixing four of them specifies the fifth (see the sketch below):
– Study size.
– Heterogeneity among subjects (SD).
– Magnitude of the treatment effect to be detected.
– Power to detect this magnitude of effect.
– Acceptable chance of a false positive conclusion, usually 0.05.
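A back-of-envelope sketch of this inter-relation, using the standard normal-approximation formula for a two-group comparison (the SD and Δ plugged in below are invented for illustration):

```python
# n per group ≈ 2 * (z_{1-alpha/2} + z_{power})^2 * (SD / delta)^2
# Fix any four of {n, SD, delta, power, alpha} and the fifth is determined.
from scipy.stats import norm

def n_per_group(sd: float, delta: float, power: float = 0.80,
                alpha: float = 0.05) -> float:
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * z**2 * (sd / delta) ** 2

print(n_per_group(sd=10.0, delta=5.0))   # ≈ 63 subjects per group
```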

31 Free Study Size Software www.stat.uiowa.edu/~rlenth/Power

32 Free Study Size Software: Example
Pilot data: SD = 8.19 in 36 subjects. We propose N = 40 subjects per group in order to provide 80% power to detect (p < 0.05) an effect Δ of 5.2. [Screenshot of the software output.]
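The slide's numbers can be checked against this relation; the sketch below uses statsmodels rather than the Lenth software shown in the talk, but the arithmetic should agree:

```python
# Verify that N = 40 per group gives ~80% power for delta = 5.2, SD = 8.19.
from statsmodels.stats.power import TTestIndPower

effect_size = 5.2 / 8.19   # delta / SD from the pilot data, Cohen's d ≈ 0.63
power = TTestIndPower().solve_power(effect_size=effect_size,
                                    nobs1=40, alpha=0.05,
                                    alternative="two-sided")
print(f"power ≈ {power:.2f}")   # ≈ 0.80, agreeing with the proposal
```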

33 Study Size: May Not Be Based on Power
Precision refers to how well a measure is estimated. Margin of error = the ± value (half-width) of the 95% confidence interval; a smaller margin of error ←→ greater precision. To achieve a specified margin of error, solve the CI formula for N. Polls: N ≈ 1000 → margin of error on a percentage ≈ 1/√N ≈ 3%. Pilot studies, Phase I, and some Phase II: power is not relevant; the goal may be to obtain an SD for future studies.
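A minimal sketch of precision-based sizing for a proportion, solving the 95% CI half-width for N; the ±3% target mirrors the polling example above:

```python
# N needed so that the 95% CI half-width for a proportion is at most
# `margin`; for p near 0.5 this reduces to roughly 1 / margin^2.
import math

def n_for_proportion(margin: float, p: float = 0.5) -> int:
    z = 1.96                               # 95% confidence
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(n_for_proportion(0.03))   # ≈ 1068 — why polls use N ≈ 1000 for ±3%
```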

34 Mid-Study Analyses
Mid-study comparisons should not be made before study completion unless planned for (interim analyses). Early comparisons are unstable and can invalidate final comparisons. Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early due to very dramatic effects; final comparisons, if the study continues, are adjusted to validly account for the “peeking”. Continued …

35 Mid-Study Analyses
[Figure: observed effect vs. number of subjects enrolled over time. The estimate swings widely early on; too many analyses invite a wrong early conclusion. Need to monitor, but also to account for the many analyses.]

36 Mid-Study Analyses
Mid-study reassessment of the study size is advised for long studies. Only the standard deviations to date, not the effects themselves, are used to assess the original design assumptions. A feasibility analysis:
– may use the assessment noted above to decide whether to continue the study.
– may measure effects, like interim analyses do, by unmasked advisors, to project the likelihood of finding effects at the planned end of study.
Continued …
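A sketch of such a blinded reassessment, reusing the normal-approximation formula from slide 30; the mid-study SD of 10.0 is a hypothetical value, while 8.19 and 5.2 are the pilot numbers from slide 32:

```python
# Blinded sample-size reassessment: plug the pooled SD observed so far
# into the original formula (the effects themselves are NOT looked at).
from scipy.stats import norm

def n_per_group(sd, delta, power=0.80, alpha=0.05):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * z**2 * (sd / delta) ** 2

planned = n_per_group(sd=8.19, delta=5.2)    # design assumption
observed = n_per_group(sd=10.0, delta=5.2)   # hypothetical mid-study SD
print(round(planned), round(observed))       # larger SD → more subjects needed
```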

37 Mid-Study Analyses
Examples: studies at Harbor that were randomized but not masked, with data available to the PI, and that compared treatment groups repeatedly as more subjects were enrolled.
Study 1: Groups do not differ; plan to add more subjects. Consequence → the final p-value is not valid; its probability interpretation requires no prior knowledge of the effect.
Study 2: Groups differ significantly; plan to stop the study. Consequence → use of this p-value is not valid; the probability requires incorporating the later comparison.

38 Multiple Analyses at Study End
Lagakos, NEJM 2006;354(16):1667-1669, on subgroup analyses: replacing “subgroup” with “analysis” gives a similar problem. Torturing the data → false positive conclusions.

39 Multiple Analyses at Study End
There are formal methods to incorporate the number of multiple analyses:
– Bonferroni
– Tukey
– Dunnett
Transparency about what was done is most important. Be aware of the number of analyses and report it with any conclusions. A Bonferroni sketch follows.
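A minimal sketch of the Bonferroni method with invented p-values (Tukey and Dunnett handle specific families of pairwise comparisons and are not shown):

```python
# Bonferroni adjustment: each raw p-value is multiplied by the number of
# tests (capped at 1), controlling the overall false positive rate.
from statsmodels.stats.multitest import multipletests

raw_p = [0.01, 0.04, 0.03, 0.20]           # four hypothetical end-of-study tests
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
print(adj_p)    # [0.04 0.16 0.12 0.80]
print(reject)   # only the first test survives at the 0.05 level
```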

40 Summary: Bad Science That May Seem So Good
1. Re-examining data, or using many outcomes, while seeming to perform due diligence.
2. Adding subjects to a study that is showing marginal effects; or stopping early due to strong results.
3. Examining effects in subgroups. See NEJM 2006;354(16):1667-1669.
Actually bad? It could be negligent NOT to do these, but you need to account for doing them.

41 Statistical Software

42 Professional Statistics Software Package
[Screenshot: code (syntax) is entered; data are stored and accessible; output appears in a separate window.]

43 Microsoft Excel for Statistics Primarily for descriptive statistics. Limited output.

44 Almost Free On-Line Statistics Software: www.statcrunch.com
Runs from the browser, not locally. $5 for 6 months of usage. Supported by NSF. Potential HIPAA concerns.

45 Typical Statistics Software Package
Select methods from menus; data in a spreadsheet; output appears after menu selection. Examples: www.ncss.com, www.minitab.com, www.stata.com; $100-$500.

46 http://gcrc.labiomed.org/biostat
This and other biostatistics talks are posted there.

47 Conclusions
Don't put off dealing with slow enrollment; find the cause and solve it. I am available.
Do put off analyses of efficacy, but not of design assumptions. I am available.
P-values are earned by following the methods needed for them to be valid. I am available.
You may have to pay for lack of attention to protocol decisions, to satisfy the scientific method. I am available.
Software always takes more time than expected.

48 Thank You
[Cartoon by Nils Simonson, in Furberg & Furberg, Evaluating Clinical Research.]

