Statistical Principles for Clinical Research Sponsored by: NIH General Clinical Research Center Los Angeles Biomedical Research Institute at Harbor-UCLA.

Statistical Principles for Clinical Research. Sponsored by: NIH General Clinical Research Center, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center. November 1, 2007. Peter D. Christenson. Conducting Clinical Trials 2007.

Speaker Disclosure Statement The speaker has no financial relationships relevant to this presentation.

Recommended Textbook. Topics: making inference, design issues, biases, how to read papers, meta-analyses, dropouts. Non-mathematical, with many examples.

Example: Harbor Study Protocol. 18 pages of Background and Significance, Preliminary Studies, and Research Design and Methods. Then: “Pearson correlation, repeated measure of the general linear model, ANOVA analyses and student t tests will be used where appropriate. … The [two] main parameters of interest will be … [A and B. For A, using a t-test] 40 subjects provide 80% assurance that a XX reduction … will be detected, with p<0.05. Similar comparisons as for … [A and B] will be carried out …”

Example: Harbor Study Protocol. The good … “The [two] main parameters of interest will be … [A and B. For A, using a t-test,] 40 subjects provide 80% assurance that a XX reduction … will be detected, with p<0.05.” Because: Explicit: specifies the primary outcome of interest. Explicit: justifies the number of subjects.

Example: Harbor Study Protocol … the Bad … “Pearson correlation, repeated measure of the general linear model, ANOVA analyses and student t tests will be used where appropriate. …” Because: Boilerplate; these methods are almost always used. “Where appropriate”? Tries to satisfy reviewers, not the science.

Example: Harbor Study Protocol … and the Ugly. “Similar comparisons as for … [A and B] will be carried out …” Because: the primary analysis is OK: the difference between 2 visits for 2 measures, A and B. But 15 measures were taken at each of 19 visits. Torture the data long enough, and it will confess to something.
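The “ugly” comparisons can be quantified. As a rough sketch, assuming every comparison is an independent test at α = 0.05 with no real effects (repeated visits on the same subjects are correlated, so the true figure is somewhat lower), the chance of at least one spurious p < 0.05 grows quickly with the number of tests. The count of 270 below is a hypothetical tally (15 measures, each later visit compared with baseline), not the protocol's stated plan.

```python
# Familywise error rate: chance of at least one false-positive "finding"
# when there are no real effects. ASSUMES independent tests, which repeated
# visits on the same subjects are not; treat the result as a rough guide.

def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across n_tests independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

planned_tests = 2                 # primary plan: measures A and B, one comparison each
exploratory_tests = 15 * (19 - 1) # hypothetical: 15 measures x each later visit vs. baseline

for k in (1, planned_tests, exploratory_tests):
    print(f"{k:>4} tests: P(at least one false positive) = {familywise_error(k):.3f}")
```

With 270 tests the chance of at least one false positive is essentially 1, which is the arithmetic behind the “torture the data” quip.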

Goals of this Presentation More good. Less bad. Less ugly.

Biostatistical Involvement in Studies
– Off-site statistical design and analysis: multicenter studies with a data coordinating center; in-house drug company statisticians; a CRO through NIH or a drug company; local studies contracted elsewhere, e.g., UCLA, USC, or a CRO.
– Local protocol, and statistical design and analysis: occasionally multicenter.

Studies with Off-Site Biostatistics. The local site is not responsible for statistical design and analysis, but is responsible for study conduct that may:
– impact the analysis and the believability of results.
– reduce the sensitivity (power) of the study to detect effects.

Review of Basic Method of Inference from Clinical Studies

Typical Study Data Analysis. A large enough “signal-to-noise ratio” → proves an effect beyond a reasonable doubt. Often:

$$\text{Ratio} = \frac{\text{Signal}}{\text{Noise}} = \frac{\text{Observed Effect}}{\text{Natural Variation}/\sqrt{N}}$$

For a t-test comparing two groups:

$$t = \frac{\text{Difference in Means}}{\text{SD}/\sqrt{N}}$$

The degree of allowable doubt determines how large t needs to be: 5% (p < 0.05) → |t| ≈ 2.
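A minimal Python sketch of the signal-to-noise ratio for two groups, using only the standard library. The slide writes the noise as SD/√N; for two independent groups of sizes n_a and n_b, the standard pooled-variance t-test uses SD·√(1/n_a + 1/n_b) instead. The data are hypothetical, and the cutoff of about 2 is the large-sample approximation (the exact critical value for 14 degrees of freedom is slightly higher).

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance two-sample t: difference in means divided by its standard error."""
    n_a, n_b = len(a), len(b)
    mean_diff = statistics.mean(a) - statistics.mean(b)            # the "signal"
    pooled_var = ((n_a - 1) * statistics.variance(a) +
                  (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))                # the "noise"
    return mean_diff / se

# Hypothetical outcomes, for illustration only.
treated = [12.1, 14.3, 11.8, 15.2, 13.9, 14.8, 12.7, 13.5]
control = [10.2, 11.9, 12.4, 10.8, 11.1, 12.0, 10.5, 11.7]

t = two_sample_t(treated, control)
print(f"t = {t:.2f}; beyond ~2 (5% allowable doubt)? {abs(t) > 2}")
```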

Meaning of p-value. p-value: the probability of a test statistic (ratio) at least as deviant as the one observed, if there is really no effect. Smaller p-values ↔ more evidence of an effect. Valid interpretation of a p-value typically requires:
– Proper data generation, e.g., randomness.
– Subjects who provide independent information.
– Data not used in other statistical tests;
or an accounting for not satisfying these criteria. → p-values are earned by satisfying these criteria appropriately.
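The definition above can be computed directly with a permutation test, a swapped-in technique rather than the t-test the slides use: if there is really no effect and subjects were randomized, the group labels are arbitrary, so reshuffling them generates the “no effect” distribution of the difference in means. The p-value is the fraction of reshuffles at least as deviant as the observed difference. The data and function names are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(a, b, n_perm=10_000, seed=1):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels as if there were no effect
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed - 1e-12:
            extreme += 1     # at least as deviant as observed (tolerance for float ties)
    # Add-one correction keeps the estimate from ever being exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical outcomes, for illustration only.
treated = [12.1, 14.3, 11.8, 15.2, 13.9, 14.8, 12.7, 13.5]
control = [10.2, 11.9, 12.4, 10.8, 11.1, 12.0, 10.5, 11.7]
p = permutation_p_value(treated, control)
print(f"permutation p = {p:.4f}")
```

The reshuffling is only justified when labels are exchangeable under “no effect”, which is why proper data generation (randomization) and independent subjects appear in the list of requirements.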

Analogy with Diagnostic Testing (True Effect ↔ Disease; Study Claim ↔ Diagnosis)

Study Claims \ Truth | No Effect | Effect
No Effect | Correct (specificity) | Error
Effect | Error | Correct (sensitivity = power)

Typical: set p ≤ 0.05, so specificity = 95%; power: maximize, choosing N for 80%.

Study Conduct Impacting Analysis. Decreased effect detectability (a smaller ratio) results from:
– Non-adherence of study personnel to the protocol in general. [Increases variation.]
– Enrolling subjects who do not satisfy inclusion or exclusion criteria. [E.g., no effect in the 10% wrongly included and a real effect of 50% in the rest → ~0.9 × 50% = 45% observed effect. Can decrease the observed effect.]
– Subjects not completing the entire study. [May decrease N, or give potentially conflicting results.]
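The dilution example is simple arithmetic, and it has a direct cost in study size. A minimal sketch, assuming the usual approximation that the required N scales with 1/(effect)² when SD, power, and α are held fixed:

```python
def diluted_effect(true_effect: float, frac_no_effect: float) -> float:
    """Observed effect when a fraction of enrolled subjects cannot respond (effect 0)."""
    return (1.0 - frac_no_effect) * true_effect

def n_inflation(planned_effect: float, observed_effect: float) -> float:
    """Factor by which N must grow to keep power, ASSUMING N scales as 1 / effect**2."""
    return (planned_effect / observed_effect) ** 2

observed = diluted_effect(50.0, 0.10)   # slide example: 50% effect, 10% wrongly included
factor = n_inflation(50.0, observed)
print(f"observed effect = {observed:.0f}%, N must grow by x{factor:.2f}")
```

Shrinking the observed effect from 50% to 45% requires about (50/45)² ≈ 1.23 times as many subjects, roughly 23% more, to keep the same power.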

Potentially Conflicting Results Example: Subjects not completing the entire study.

Tiagabine Study Results: How Believable? Conclusions differ depending on how non-completing subjects (24%) are handled in the analysis. The primary analysis here is specified, but we would prefer results that are robust to the method of analysis (i.e., that agree across methods), which is more likely when more subjects complete.

Study Conduct Impacting Analysis: Intention-to-Treat (ITT). ITT typically specifies that all subjects are included in the analysis, regardless of treatment compliance or loss to follow-up. Purposes: avoid bias from subjective exclusions or differential exclusion between treatment groups; ITT is sometimes argued to mimic non-compliance in a real-world setting. It places more emphasis on policy implications (societal effectiveness) than on scientific efficacy. Not appropriate for many studies. Continued …

Study Conduct Impacting Analysis: Intention-to-Treat (ITT). Lost to follow-up: always minimize it; unlike treatment non-compliance, it has no “real world” analogy. Outcomes must be defined for non-completing subjects. Current Harbor study: N ≈ 1200 would need N ≈ 3000 if ITT were used, 20% were lost, and the lost were counted as treatment failures.

ITT: Need to Impute Unknown Values. [Figure: change from baseline for individual subjects at baseline, an intermediate visit, and the final visit, plotted both as observations and as ranks.] LOCF: ignores presumed progression. LRCF: maintains expected relative progression.
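The figure's two imputation rules can be sketched as follows. LOCF (last observation carried forward) is standard. The slide gives only the idea behind LRCF (keep each subject's relative position, via ranks, as everyone progresses), so the function below is one hypothetical reading, assuming LRCF stands for “last rank carried forward”: a non-completer's percentile rank at their last visit is mapped onto the completers' final-visit values. All names and data are illustrative.

```python
from bisect import bisect_left

def locf(values):
    """Replace each missing visit (None) with the subject's most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def lrcf_final(last_value, values_at_last_visit, completer_finals):
    """HYPOTHETICAL rank-based imputation of a non-completer's final value.

    The subject keeps their percentile rank from the last observed visit and
    receives the completers' final-visit value at that same percentile.
    """
    ranked = sorted(values_at_last_visit)
    pct = bisect_left(ranked, last_value) / max(len(ranked) - 1, 1)
    finals = sorted(completer_finals)
    return finals[round(pct * (len(finals) - 1))]

# Hypothetical changes from baseline at the intermediate visit (5 subjects).
intermediate = [2, 4, 6, 8, 10]
# Final-visit changes for the 4 subjects who completed.
final_completers = [5, 9, 12, 16]
# Subject 5 (change of 10 at the intermediate visit) did not complete.
print(locf([0, 10, None]))                             # [0, 10, 10]
print(lrcf_final(10, intermediate, final_completers))  # 16
```

LOCF freezes the non-completer at a change of 10 even though completers kept improving; the rank-based value of 16 preserves that subject's position as the highest responder.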

Study Conduct Impacting Feasibility: Potential Effects of Slow Enrollment. The needed N may be impossible → study stopped. Competitive site enrollment → local financial loss. Some studies may have insufficient person-years (PY) of observation, even if N is attained. [Figure: number of subjects vs. year for planned, slower, and slower-yet enrollment; the area under each curve = PY. Planned enrollment detects an effect of Δ; slower enrollment detects only 1.1Δ; slower yet, 1.7Δ.]
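To see why slower enrollment costs detectability even when N is reached, here is a sketch under two stated assumptions: subjects enroll uniformly and are all followed to a fixed study end (so person-years are the area described on the slide), and the smallest detectable effect scales as 1/√PY, as it does when information accrues in proportion to follow-up time. The enrollment numbers are hypothetical and do not reproduce the slide's figure.

```python
import math

def person_years(n_subjects, accrual_years, study_years):
    """Total follow-up with uniform enrollment over accrual_years and a fixed study end.

    ASSUMES no dropout. A subject enrolled at time u is followed for
    (study_years - u); averaging u over [0, accrual_years] gives
    study_years - accrual_years / 2 per subject.
    """
    if accrual_years > study_years:
        raise ValueError("enrollment must finish before the study ends")
    return n_subjects * (study_years - accrual_years / 2)

def detectable_effect_multiple(py, planned_py):
    """Smallest detectable effect relative to plan, ASSUMING it scales as 1/sqrt(PY)."""
    return math.sqrt(planned_py / py)

planned = person_years(100, accrual_years=1, study_years=5)  # 450 PY
slower = person_years(100, accrual_years=3, study_years=5)   # 350 PY
print(f"slower enrollment detects ~{detectable_effect_multiple(slower, planned):.2f} x delta")
```

Under the same 1/√PY assumption, the slide's 1.1Δ and 1.7Δ would correspond to roughly 83% and 35% of the planned person-years (1/1.1² and 1/1.7²).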


Local Protocols and Data Analysis
1. Develop the protocol and data analysis plan.
2. Have a randomization and blinding strategy, if the study requires one.
3. Data management.
4. Perform data analyses.

Local Data Analysis Resources. Biostatistician (Peter Christenson): develops study design and analysis plan; advises throughout for any study; performs all non-basic analyses; takes full responsibility for studies with funded %FTE; reviews some protocols for committees. Data management: database development for GCRC studies by the database manager.

Statistical Components of Protocols
– Target population / source of subjects.
– Quantification of aims, hypotheses.
– Case definitions, endpoints quantified.
– Randomization plan, if any.
– Masking, if used.
– Study size: screen, enroll, complete.
– Use of data from non-completers.
– Justification of study size (power, precision, other).
– Methods of analysis.
– Mid-study analyses.

Selected Statistical Components and Issues

Case Definitions and Endpoints. Primary case definitions and endpoints need careful thought, because results will need to be reported based on them. Example: a study at Harbor used a very strict definition of cure and analyzed the data with this definition. The cure rates were too low to be taken seriously. The scientific method requires reporting them anyway; otherwise it is cherry-picking. Publication: use the primary definition, explain it, and also report results with the secondary definition, which is less credible.

Randomization. Helps assure that treatment effects are attributable to the treatment. Blocked randomization keeps the numbers of subjects in each treatment group approximately equal throughout enrollment (chronologically). Recruiters must not have access to the randomization list. The list can be created with a random number generator in software, from printed tables in statistics texts, or even with shuffled slips of paper.
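A minimal permuted-block sketch using Python's standard library. The block size of 4 and the arm labels are assumptions; in practice block sizes are often varied so that recruiters cannot predict the next assignment, and the seed and generated list must be kept away from recruiters just like a printed list.

```python
import random

def blocked_randomization(n_subjects, arms=("A", "B"), block_size=4, seed=None):
    """Permuted-block randomization list.

    Each block contains every arm equally often, in shuffled order, so the
    group sizes never differ by more than block_size / len(arms) at any
    point during enrollment.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_subjects]

schedule = blocked_randomization(12, seed=2007)
print(schedule)
```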

Non-completing Subjects. Enrolled subjects are never “dropouts”. The protocol should specify:
– The primary analysis set (e.g., ITT or per-protocol).
– How final values will be assigned to non-completers.
Time-to-event (survival analysis) studies may not need final assignments; they use the time followed. Study size estimates should incorporate the expected number of non-completers.
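Incorporating expected non-completers into study size is usually a simple inflation. A sketch, assuming the analysis needs a fixed number of completers and the non-completion rate is estimated from earlier studies:

```python
import math

def enrollment_needed(n_completers: int, noncompletion_rate: float) -> int:
    """Subjects to enroll so that, on average, n_completers finish the study."""
    if not 0 <= noncompletion_rate < 1:
        raise ValueError("noncompletion_rate must be in [0, 1)")
    # Subtract a tiny tolerance so float fuzz (e.g., 100.00000000000001) doesn't round up.
    return math.ceil(n_completers / (1 - noncompletion_rate) - 1e-9)

print(enrollment_needed(80, 0.20))  # 40 per group, 20% expected non-completion
```

For example, 80 completers (40 per group) with 20% expected non-completion means enrolling 100 subjects. If non-completers are instead kept in an ITT analysis and counted as failures, the observed effect is diluted and the required N can grow considerably more, as in the Harbor example of N ≈ 1200 versus N ≈ 3000.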

Study Size: Power. Power = the probability of detecting a real effect of a specified minimal (clinically relevant) magnitude. Power will be different for each outcome, and it depends on the statistical method. Five factors, including power, are inter-related; fixing four of them determines the fifth:
– Study size
– Heterogeneity among subjects (SD)
– Magnitude of treatment effect to be detected
– Power to detect this magnitude of effect
– Acceptable chance of a false positive conclusion, usually 0.05
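The five-factor relationship can be made concrete with the common normal-approximation formula for comparing two means. This is a sketch: dedicated software uses the t distribution and gives slightly different numbers. Fixing study size, SD, effect, and α determines power; fixing power instead determines the detectable effect. Function names are hypothetical, and the illustrative values are the SD = 8.19 and Δ = 5.2 from the pilot-data example.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def power_two_sample(n_per_group, sd, delta, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison of means.

    power = Phi(delta / (sd * sqrt(2/n)) - z_{1-alpha/2}); ignores the
    negligible chance of rejecting in the wrong direction.
    """
    noise = sd * (2.0 / n_per_group) ** 0.5
    return _Z.cdf(delta / noise - _Z.inv_cdf(1 - alpha / 2))

def detectable_delta(n_per_group, sd, power=0.80, alpha=0.05):
    """Smallest effect detectable with the given power (same formula solved for delta)."""
    noise = sd * (2.0 / n_per_group) ** 0.5
    return (_Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)) * noise

print(f"power = {power_two_sample(40, 8.19, 5.2):.2f}")
print(f"detectable delta with 80% power = {detectable_delta(40, 8.19):.2f}")
```

For example, 40 subjects per group with SD = 8.19 gives about 81% power to detect Δ = 5.2 under this approximation.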

Free Study Size Software

Free Study Size Software: Example. Pilot data: SD = 8.19 in 36 subjects. We propose N = 40 subjects/group in order to provide 80% power to detect (p < 0.05) an effect Δ of 5.2.
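The example's numbers can be checked with the normal-approximation formula solved for N: n per group = 2(z_{1−α/2} + z_{power})²(SD/Δ)². A minimal sketch using the standard library's NormalDist; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sd, delta, power=0.80, alpha=0.05):
    """Approximate subjects per group for a two-sided two-sample comparison of means."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum * sd / delta) ** 2)

n = n_per_group(sd=8.19, delta=5.2)  # pilot SD and target effect from the example
print(f"normal approximation: {n} subjects per group")
```

The normal approximation gives 39 per group; a calculation based on the t distribution, as typical study-size programs use, comes out slightly higher, consistent with the proposed 40.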

Study Size: May Not Be Based on Power. Precision refers to how well a measure is estimated. Margin of error = the ± value (half-width) of the 95% confidence interval; a smaller margin of error means greater precision. To achieve a specified margin of error, solve the CI formula for N. Polls: N ≈ 1000 → margin of error on a percentage ≈ 1/√N ≈ 3%. Pilot studies, Phase I, and some Phase II studies: power is not relevant; the goal may be to obtain an SD for future studies.
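Solving the 95% CI formula for N is a one-liner. A sketch using the normal approximation; the SD of 10 and margin of 2 are hypothetical. It also shows why the polling rule works: 1.96 × √(0.5 × 0.5) ≈ 0.98, so the worst-case margin for a percentage is about 1/√N.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def margin_of_error_proportion(n, p=0.5):
    """Half-width of a 95% CI for a proportion (normal approximation); p=0.5 is the worst case."""
    return Z95 * math.sqrt(p * (1 - p) / n)

def n_for_margin_mean(sd, margin):
    """Subjects needed so the 95% CI for a mean has half-width <= margin."""
    # From margin = Z95 * sd / sqrt(N), solved for N.
    return math.ceil((Z95 * sd / margin) ** 2)

print(f"poll of 1000: margin ~ {margin_of_error_proportion(1000):.3f}")
print(f"SD 10, margin 2: N = {n_for_margin_mean(10, 2)}")
```

With SD = 10, a margin of ±2 requires 97 subjects; for a poll of N = 1000 the margin is about 3.1%.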

Mid-Study Analyses. Treatment groups should not be compared before study completion unless such comparisons are planned for (interim analyses). Unplanned early comparisons are unstable and can invalidate the final comparisons. Interim analyses are planned comparisons at specific times, usually made by an unmasked advisory board. They allow the study to be stopped early for very dramatic effects, and if the study continues, the final comparisons are adjusted to validly account for the “peeking”. Continued …

Mid-Study Analyses. [Figure: observed effect (relative to 0) vs. number of subjects enrolled over time.] Too many analyses can lead to a wrong early conclusion. There is a need to monitor, but also to account for the many analyses.

Mid-Study Analyses. A mid-study reassessment of study size is advised for long studies. Only the standard deviations to date, not the effects themselves, are used to check the original design assumptions. Feasibility analysis:
– may use the assessment noted above to decide whether to continue the study.
– may measure effects, like interim analyses, by unmasked advisors, to project the likelihood of finding effects at the planned end of the study.
Continued …

Mid-Study Analyses. Examples: studies at Harbor that were randomized but not masked, with data available to the PI, compared treatment groups repeatedly as more subjects were enrolled. Study 1: groups do not differ; plan to add more subjects. Consequence → the final p-value is not valid; its probability calculation requires no prior knowledge of the effect. Study 2: groups differ significantly; plan to stop the study. Consequence → using this p-value is not valid; its probability must incorporate the later comparison.
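The consequence of unplanned repeated comparisons can be seen by simulation. A sketch, assuming no true effect, a known SD of 1, and a naive two-sided z-test at α = 0.05 after every batch of new subjects; a simulated study counts as a false positive if any look crosses the cutoff. All parameters and names are hypothetical.

```python
import random
from statistics import NormalDist

def false_positive_rate_with_peeking(n_looks, n_per_look, n_sims=2000, alpha=0.05, seed=7):
    """Fraction of null studies declared 'significant' at any of n_looks analyses."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        diff_sum, n = 0.0, 0
        for _ in range(n_looks):
            for _ in range(n_per_look):
                # One new subject per group; with no true effect both outcomes are
                # N(0, 1), so their difference has variance 2.
                diff_sum += rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0)
                n += 1
            z = (diff_sum / n) / (2.0 / n) ** 0.5
            if abs(z) > z_crit:  # "significant" at this look: stop and claim an effect
                hits += 1
                break
    return hits / n_sims

for looks in (1, 5, 10):
    rate = false_positive_rate_with_peeking(looks, n_per_look=10)
    print(f"{looks:>2} look(s): false positive rate ~ {rate:.3f}")
```

With a single look the rate stays near 5%; with many looks it rises well above 5%. Planned interim-analysis designs (e.g., Pocock or O'Brien-Fleming boundaries) use stricter cutoffs at each look to hold the overall rate at 5%.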

Multiple Analyses at Study End. [Figure from Lagakos, NEJM 354(16), on subgroup analyses.] Replacing “subgroup” with “analysis” gives a similar problem: torturing the data leads to false positive conclusions.

Multiple Analyses at Study End. There are formal methods that account for the number of analyses: Bonferroni, Tukey, Dunnett. Transparency about what was done is most important: be aware of the number of analyses and report it with any conclusions.
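Of the methods listed, Bonferroni is the simplest to apply by hand: with m analyses, compare each p-value with α/m, or equivalently multiply it by m (capped at 1). Tukey and Dunnett need specialized distributions and are best left to statistical software. The p-values below are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjusted p-values (p * m, capped at 1) and which remain significant."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, significant

adjusted, significant = bonferroni([0.003, 0.02, 0.04, 0.30])
print(adjusted)
print(significant)
```

Only the first result survives the adjustment; reporting the number of analyses alongside it is what makes that conclusion transparent.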

Summary: Bad Science That May Seem So Good
1. Re-examining data, or using many outcomes, seemingly as due diligence.
2. Adding subjects to a study that is showing marginal effects; or stopping early due to strong results.
3. Examining effects in subgroups. See NEJM (16).
Actually bad? It could be negligent NOT to do these, but you need to account for doing them.

Statistical Software

Professional Statistics Software Package. [Screenshot: code entered as syntax; output; stored, accessible data.]

Microsoft Excel for Statistics. Primarily for descriptive statistics; limited output.

Almost Free On-Line Statistics Software. Runs from a browser, not locally. $5 per 6 months of usage. Potential HIPAA concerns. Supported by NSF.

Typical Statistics Software Package. Select methods from menus; output appears after menu selection; data in a spreadsheet. $100–$500.

This and other biostat talks posted

Conclusions. Don’t put off slow enrollment; find the cause and solve it. I am available. Do put off analyses of efficacy, but not of design assumptions. I am available. P-values are earned by following the methods needed for them to be valid. I am available. You may have to pay for lack of attention to protocol decisions, to satisfy the scientific method. I am available. Software always takes more time than expected.

Thank You Nils Simonson, in Furberg & Furberg, Evaluating Clinical Research