Biostatistics in Practice Peter D. Christenson Biostatistician Session 4: Study Size for Precision or Power.

Presentation transcript:

Session 4 Issue How many subjects?

Session 4 Preparation We have been using a recent study on hyperactivity in children under diets with various amounts of food additives for the concepts in this course. The questions below, based on this paper, are intended to prepare you for session 4, which is on determining the size of a study. 1. How many children were deemed necessary to complete the entire study? Use the second column on the 4th page of the paper.

Session 4 Preparation #1

Session 4 Preparation #2 2. The authors allowed for some children to start but not complete the study. What percentage of "dropouts" did they build into their calculations? The statistical requirements are for 80 “evaluable” subjects. They decided on a study size of 120, so they were allowing up to 40/120 = 33% of subjects to not complete.
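The dropout arithmetic above can be sketched in a few lines of Python. The function names are hypothetical and the 25% dropout figure in the second call is an illustrative assumption, not a value from the paper.

```python
import math

def max_dropout_rate(n_evaluable, n_enrolled):
    """Fraction of enrolled subjects who may drop out while still leaving n_evaluable."""
    return (n_enrolled - n_evaluable) / n_enrolled

def enrollment_needed(n_evaluable, dropout_rate):
    """Subjects to enroll so that about n_evaluable complete, given an expected dropout rate."""
    return math.ceil(n_evaluable / (1.0 - dropout_rate))

# Values from the slide: 80 evaluable subjects required, 120 enrolled.
allowed = max_dropout_rate(80, 120)
print(f"Allowed dropout: {allowed:.0%}")

# Hypothetical planning question: how many to enroll if 25% are expected to drop out?
hypothetical = enrollment_needed(80, 0.25)
print(f"Enroll {hypothetical} to retain about 80 at 25% dropout")
```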

Session 4 Preparation #3 3. The authors will perform a test similar to the t-test we discussed last week, to conclude whether there is evidence that hyperactivity differs between Mix A and placebo. There are two mistakes that they may make in this decision. What are they? I. Conclude Mix A ≠ Placebo, when in fact Mix A = Placebo. II. Conclude Mix A = Placebo, when in fact Mix A ≠ Placebo.

Session 4 Preparation #4 and #5 4. How large a difference between Mix A and placebo do they want to detect? 5. Does the value of 0.32 in the study size description (second column on the 4th page) refer to a difference? They seem to imply it is a SD. Based on what we have said about tests comparing "signal" to "noise", do you think both a difference and SD are relevant for determining the study size?

Session 4 Preparation: #4 and #5

Session 4 Preparation #4 and #5 They want to detect a difference Δ of 0.32 in GHA. [ Smallest clinically relevant Δ? ] Both the Δ and SD need to be accounted for. Effect size = Δ / SD = “# of SDs”. Remember, reference range = 4 to 6 SDs. Unusually, in this study GHA is scaled to have an SD of 1, so Δ = effect size = 0.32.

Session 4 Goals: Review estimating and testing; Δ, SD and N in estimating and testing; false positive and false negative conclusions from tests; what is needed to determine study size; software for study size.

Review Estimation Typically: 1. Have sample of N representing “all”. 2. Find mean and SD from the N units. 3. Expect new unit to be within mean ± 2SD. 4. Confident (95%) that mean of all is in mean ± 2SD/√N. May have this info for one or multiple groups.

Study Size to Achieve Precision Precision refers to how well a measure is estimated. Margin of error = the ± value (half-width) of the 95% confidence interval. Lower margin of error ↔ greater precision. To achieve a specified margin of error, say d, solve the CI formula for N: For a mean, d = 2SD/√N, so N = (2SD/d)². For a proportion p, d = 2√[p(1−p)/N] ≤ 1/√N. Most polls use N ≈ 1000, so margin of error on % ≈ 3%.
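As a rough sketch of the precision formulas above, the following Python uses the slide's multiplier of 2 (an approximation to 1.96). The function names and the SD = 10, d = 2 inputs are hypothetical, chosen only for illustration.

```python
import math

def n_for_mean(sd, d):
    """N so that the 95% CI half-width for a mean is about d: N = (2*SD/d)^2."""
    return math.ceil((2 * sd / d) ** 2)

def n_for_proportion(d, p=0.5):
    """N so that the 95% CI half-width for a proportion is about d.
    p = 0.5 is the worst case, where d = 2*sqrt(p(1-p)/N) reaches its bound 1/sqrt(N)."""
    return math.ceil(4 * p * (1 - p) / d ** 2)

def poll_margin(n):
    """Worst-case margin of error for a proportion estimated from n subjects."""
    return 1 / math.sqrt(n)

n_mean = n_for_mean(sd=10, d=2)      # hypothetical SD and target margin
n_prop = n_for_proportion(d=0.03)    # target margin of 3 percentage points
margin_1000 = poll_margin(1000)      # the typical poll size mentioned on the slide

print(n_mean, n_prop, f"{margin_1000:.1%}")
```

The last value shows why polls of about 1000 people report a margin of error near 3%.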

Review Statistical Tests 1. Calculate a standardized quantity for the particular test, a “test statistic”: Often: t = (Mean – Expected) / SE(Mean) If 1 group, Mean may be a change score. If 2 groups, Mean may be the difference between means for two groups. Expected = 0 if no effect. Looking for evidence to contradict “no effect”. Rarely: Mean is not a Δ and Expected ≠ 0.

Review Statistical Tests 2. Compare the test statistic to the range of values it should have if expectations are correct. Often: The range has an approximately normal bell curve. 3. Declare “effect” if test statistic is too extreme, relative to this range. Often: |test statistic| > ~2 → Declare effect.

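t-Test Declare effect if test statistic is “too extreme”. How extreme? Convention: “Too extreme” means < 5% chance of wrongly declaring an effect. t = (mean – expected) / (SD/√N). [Figure: bell curve of the values expected for t under no effect; the central region (95% chance) is labeled “Declare: No Effect” and the 2.5% tails “Declare: Effect”.]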
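A minimal sketch of this calculation for a single group of change scores follows. The data values are invented for illustration, and the cutoff of 2 is the rule of thumb from the review slides rather than an exact t critical value.

```python
import math
import statistics

def one_sample_t(values, expected=0.0):
    """t = (mean - expected) / (SD / sqrt(N)) for one group of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD (divides by N - 1)
    return (mean - expected) / (sd / math.sqrt(n))

# Hypothetical change scores for 8 subjects (not from any study in these slides).
changes = [1.2, -0.4, 0.8, 2.1, 0.3, 1.5, -0.2, 0.9]

t = one_sample_t(changes)                  # expected = 0 means "no effect"
declare_effect = abs(t) > 2                # rule-of-thumb cutoff
print(f"t = {t:.2f}, declare effect: {declare_effect}")
```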

t-Test Declare effect if test statistic is “too extreme”. Convention: “Too extreme” means < 5% chance of wrongly declaring an effect. But, what are the chances of wrongly declaring no effect? [Figure: bell curve expected under no effect, with the central 95% region labeled “Declare: No Effect” and the 2.5% tails “Declare: Effect”.]

t-Test Declare effect if test statistic is “too extreme”. But, what are the chances of wrongly declaring no effect? To answer, we need a similar curve for the range of values expected when there is an effect. [Figure: bell curve expected under no effect, with the central 95% region labeled “Declare: No Effect” and the 2.5% tails “Declare: Effect”.]

Two Possible Errors from t-test Consider just one possible real effect, the value 3. [Figure: two bell curves on an axis of Δ = effect (difference between group means), one centered at no real effect (0) and one at real effect = 3; effect in study = 1.13. The axis shows just Δ, not t = Δ/SE(Δ). Values beyond the cutoff lead to “Conclude effect”.] \\\ = Probability: Conclude Effect, But no Real Effect (5%). /// = Probability: Conclude No Effect, But Real Effect (41%).

Graphical Representation of t-test [Figure: the two bell curves for no real effect (0) and real effect = 3, on an axis of Δ = effect (difference between group means); just Δ, not t = Δ/SE(Δ).] Suppose we need stronger proof; i.e., shift cutoff to right. Then, chance of false positive is reduced to ~1%, but false negative is increased to ~60%.

Power of a Study Statistical power is the sensitivity of a study to detect real effects, if they exist. Power = 100% − chance of a false negative, so it is 100% − 41% = 59% in the example two slides back.
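The tradeoff between false positives and false negatives can be sketched with a normal approximation. The standardized effect of 2.19 (real effect divided by its standard error) is an assumption chosen here so that power comes out near the 59% quoted above; it is not a value stated in the slides, and the stricter-cutoff result is only indicative of the direction of the change.

```python
from statistics import NormalDist

def power_two_sided(standardized_effect, alpha=0.05):
    """Approximate power of a two-sided test when the real effect, in units of
    its standard error, equals standardized_effect (normal approximation)."""
    z = NormalDist()
    cutoff = z.inv_cdf(1 - alpha / 2)
    upper = 1 - z.cdf(cutoff - standardized_effect)   # declare effect in the correct direction
    lower = z.cdf(-cutoff - standardized_effect)      # declare effect in the wrong direction (tiny)
    return upper + lower

effect_in_se_units = 2.19   # assumed, chosen to give roughly 59% power at alpha = 0.05

power_05 = power_two_sided(effect_in_se_units, alpha=0.05)
power_01 = power_two_sided(effect_in_se_units, alpha=0.01)

print(f"alpha=0.05: power={power_05:.0%}, false negative={1 - power_05:.0%}")
print(f"alpha=0.01: power={power_01:.0%}, false negative={1 - power_01:.0%}")
```

Demanding stronger proof (smaller alpha) lowers power for the same study, which is the shift of the cutoff described on the previous slide.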

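Two Possible Errors in a Diagnostic Test Truth: No Disease, Diagnosis: No Disease → Correct (Specificity). Truth: No Disease, Diagnosis: Disease → Error. Truth: Disease, Diagnosis: No Disease → Error. Truth: Disease, Diagnosis: Disease → Correct (Sensitivity). Want high sensitivity for a screening test. Need high specificity in follow-up test. Specificity ↓ as Sensitivity ↑.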

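Analogy with Diagnostic Testing Truth: No Effect, Study Claims: No Effect → Correct (Specificity). Typical: set α = 0.05, so specificity = 95%. Truth: No Effect, Study Claims: Effect → Error (Type I). Truth: Effect, Study Claims: No Effect → Error (Type II). Truth: Effect, Study Claims: Effect → Correct (Sensitivity = Power). Maximize; typical: choose N for 80%.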

Summary: Factors Related to Study Size Five factors are inter-related. Fixing four of these specifies the fifth: 1. Study size, N. 2. Power (often 80% is desirable). 3. p-value cutoff (level of significance, e.g., 0.05). 4. Magnitude of the effect to be detected (Δ). 5. Heterogeneity among subjects (SD). The next slide shows how these factors (except SD) are typically presented in a study protocol.

Quote from Local Protocol Example Thus, with a total of the planned 80 subjects, we are 80% sure to detect (p<0.05) group differences if treatments actually differ by at least 5.2 mm Hg in MAP change, or by a mean 0.34 change in number of vasopressors.

Comments on the Previous Table Typically power=80% and almost always p<0.05. SD was not mentioned. There may be several estimates from other studies (different populations, intervention characteristics such as dosage, time, etc). Here, a pilot study exactly like the trial was performed by the same investigators. Detectable difference refers to the unknown true difference for “all”, not the difference that will be seen eventually in the N study subjects. N ↑ as detectable difference ↓. So, the major consideration is usually a tradeoff between N and the detectable difference.
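The tradeoff between N and the detectable difference can be illustrated with the standard normal-approximation formula for comparing two group means, N per group ≈ 2(z₁₋α/₂ + z_power)² SD² / Δ². This is a sketch only: the function name is hypothetical, SD = 8.16 is the pilot value quoted in the local protocol example, and dedicated software using the t distribution would give slightly larger N.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a true difference delta
    between two group means (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)

sd_pilot = 8.16
sizes = {delta: n_per_group(delta, sd_pilot) for delta in (10.4, 5.2, 2.6)}

for delta, n in sizes.items():
    print(f"detectable difference {delta:>4}: about {n} per group")
```

Halving the detectable difference roughly quadruples the required N, because Δ enters the formula squared.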

Free Study Size Software

Local Protocol Example: Calculations Pilot data: SD=8.16 for ΔMAP in 36 subjects. For p-value<0.05, power=80%, N=40/group, the detectable Δ of 5.2 in the previous table is found as:
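The calculation shown on the original slide is not preserved in this transcript. As a rough check under stated assumptions, the normal-approximation formula Δ ≈ (z₁₋α/₂ + z_power) × SD × √(2/N) can be evaluated with the values given above. The function name is hypothetical, and this is not necessarily the method the study size software uses.

```python
import math
from statistics import NormalDist

def detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Approximate smallest true difference between two group means that is
    detectable with the given power (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sd * math.sqrt(2 / n_per_group)

# Values from the slide: pilot SD = 8.16, N = 40 per group, p < 0.05, power = 80%.
delta = detectable_difference(n_per_group=40, sd=8.16)
print(f"Detectable difference: about {delta:.1f} mm Hg")
```

The normal approximation gives about 5.1, slightly below the 5.2 quoted in the protocol. A small gap in this direction would be expected if the software bases the calculation on the t distribution, which uses somewhat larger multipliers.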

Hyperactivity Study Size Study is 1-sample or paired (for each age group). SD = 1, Δ = 0.32. Use p-value < 0.05. Want power = 80%. Solve for N in software to get N = 79.
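A hand calculation gets close to the software result. The sketch below uses the normal-approximation formula N ≈ ((z₁₋α/₂ + z_power) × SD / Δ)² for a one-sample or paired comparison, then adds z₁₋α/₂²/2, a commonly used small-sample adjustment for the t-test. Both the function name and the choice of adjustment are assumptions for illustration; the slides do not say how the software arrives at its answer.

```python
import math
from statistics import NormalDist

def n_one_sample(delta, sd, alpha=0.05, power=0.80, t_adjust=False):
    """Approximate N for a one-sample or paired test of a mean (two-sided).
    With t_adjust=True, adds z_alpha^2 / 2 to allow for using t rather than z."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = ((z_alpha + z_power) * sd / delta) ** 2
    if t_adjust:
        n += z_alpha ** 2 / 2
    return math.ceil(n)

# Values from the slide: SD = 1, delta = 0.32, p < 0.05, power = 80%.
n_normal = n_one_sample(delta=0.32, sd=1.0)
n_adjusted = n_one_sample(delta=0.32, sd=1.0, t_adjust=True)
print(n_normal, n_adjusted)
```

The unadjusted approximation gives 77, and the adjusted value agrees with the N of 79 reported on the slide.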

Study Size for Some Other Study Types 1. Phase I: Dose escalation. Safety, not efficacy. No power. Use N=3 low dose; if safe N=3 in higher dose, etc. 2. Phase II: Small, primarily safety; look for enough evidence of efficacy to go on to Phase III. Often staged: e.g., if 3/10 respond, test 10 more, etc. 3. Mortality studies: Patterns of deaths over time can be used in sample size calculations. Software not in the online package.

Summary: Study Size and Power 1. Power analysis ensures that effects of a specified magnitude can be detected with a stated probability. 2. Five factors including power are inter-related. Fixing four of these specifies the fifth. 3. For comparing means, need pilot data or data from other studies to estimate SD for the outcome measure. Comparing %s does not require SD. 4. Power analysis helps support the believability of studies if the conclusions turn out to be negative.