Biostatistics in Practice Peter D. Christenson Biostatistician Session 4: Study Size and Power.

Readings for Session 4 from StatisticalPractice.com:
Sample Size Calculations
Some underlying theory and some practical advice
Controlled trials

Outline for this Session
1. Example from a current local protocol.
2. Review statistical hypothesis testing.
3. Formulate example as hypothesis test.
4. Software for study size and power.
5. Other issues.

Local Protocol Example
Brief study outline: Subjects arrive at the ER with TBI (traumatic brain injury). Those with low cortisol, indicating possible adrenal insufficiency and pituitary damage, may or may not recover better if given hydrocortisone (HC) injections. Subjects who consent are randomized to receive HC or placebo for 4 days. Changes in recovery status from the pre-injection to the post-injection period are compared between the HC and placebo groups.
Project #10038: Dan Kelly & Pejman Cohan, Hypopituitarism after Moderate and Severe Head Injury

Local Protocol Example, Cont’d
“The primary outcomes for the hydrocortisone trial are changes in mean MAP and vasopressor use from the 12 hours prior to initiation of randomized treatment to the 96 hours after initiation.”
Mean changes in placebo subjects will be compared with those in hydrocortisone subjects using a two-sample t-test.
Before examining the study size, let’s first discuss how the results will be analyzed.

Recall Statistical (t) Test From Last Session
Suppose results from the study are plotted as:
[Figure: change in MAP for individual subjects, plotted separately for the HC and placebo groups, with Δ marked between the groups.]
Each point is the change in MAP for an individual subject. [Of course, the real study will have many more subjects.]
Is Δ large enough to claim that HC is more effective? Use a t-test.

Local Protocol Example: Analysis with t-test
We are testing:
H0: μ_HC - μ_Placebo = 0 vs. HA: μ_HC - μ_Placebo ≠ 0
where μ_HC is the expected post-pre change in “all potential TBI patients” if HC therapy is applied as in this study, and μ_Placebo is the corresponding expected change under placebo.
Our decision rule is: Choose HA if the estimate of μ_HC - μ_Placebo from our limited sample, i.e., the observed mean change under HC minus the observed mean change under placebo, call it Δ, is too far from 0 (the value specified by H0).
“Too far” is Δ > t_c*SE or Δ < -t_c*SE, where t_c is usually about 2. SE is SE(Δ), calculated from the data; it decreases for larger N and for smaller SD.
In other words, choose HA if |Δ| > t_c*SE, i.e., if |t| = |Δ/SE| > t_c.
By following this rule, there is only a 5% probability of choosing HA if in fact H0 is true.
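The decision rule can be sketched in a few lines of Python. This is only an illustration: the changes in MAP below are made-up numbers, not study data, the helper name `two_sample_t` is hypothetical, and the pooled-variance form of the standard error is assumed (the usual equal-variance two-sample t-test).

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(hc, placebo):
    """Return (delta, se, t) for an equal-variance two-sample t-test."""
    n1, n2 = len(hc), len(placebo)
    delta = mean(hc) - mean(placebo)  # observed difference in mean change
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(hc) + (n2 - 1) * variance(placebo)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE shrinks as N grows or SD falls
    return delta, se, delta / se

# Hypothetical changes in MAP (mmHg) for six subjects per group.
hc_changes = [6.0, 9.5, 3.0, 12.0, 7.5, 10.0]
placebo_changes = [1.0, 4.5, -2.0, 3.0, 0.5, 5.0]

# Two-sided 5% critical value for 10 degrees of freedom, from a t table.
# For larger studies this value approaches 2.
T_C = 2.228

delta, se, t = two_sample_t(hc_changes, placebo_changes)
choose_ha = abs(t) > T_C
print(f"delta={delta:.2f}, SE={se:.2f}, t={t:.2f}, choose HA: {choose_ha}")
```

With these invented numbers the observed difference is several standard errors from 0, so the rule chooses HA.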

Potentially Underpowered Studies
From the previous slide: By following this rule, there is only a 5% probability of choosing HA if in fact H0 is true.
So, the probability is small (5%) that our study will (incorrectly) recommend that TBI subjects receive HC if it is worthless.
But is the study able to correctly recommend that TBI subjects receive HC if it is effective? The probability of this is called the power of the study.
Actually, there is not a single value for power. The study may have, say, 59% power if the true mean HC effect is 3 mmHg in MAP, but it will have more power if the true effect is 4, since the subjects are more likely to reflect this greater effectiveness.
Let’s go back to last session’s graph to see this.
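How power changes with the true effect can be sketched numerically. The code below uses a normal approximation to the t-test, which is an assumption made for simplicity. The standard error `SE_ASSUMED = 1.37` is a hypothetical value, chosen only because it gives roughly 59% power at a true effect of 3; it is not taken from the protocol. The function name `power_two_sided` is likewise illustrative.

```python
from statistics import NormalDist

def power_two_sided(true_effect, se, alpha=0.05):
    """Approximate power of a two-sided test, using the normal distribution."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = true_effect / se           # true effect measured in SE units
    # Probability that the estimate lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

SE_ASSUMED = 1.37  # hypothetical SE of the difference, in mmHg

for effect in (0, 3, 4):
    print(f"true effect {effect}: power = {power_two_sided(effect, SE_ASSUMED):.2f}")
```

At a true effect of 0 the "power" is just the 5% false-positive rate, and it rises as the true effect moves further from 0.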

Graphical Representation of Power
[Figure: two distributions of the effect (HC change minus placebo change). One is centered at the H0 value, true effect = 0; the other at the HA value, true effect = 3. The effect observed in the study, 1.13, is marked.]
\\\ = Probability of concluding HA if H0 is true (5%).
/// = Probability of concluding H0 if HA is true (41%).
Power = 100 - 41 = 59%.
Note greater power if larger N, and/or if true effect > 3.

P-Value
Recall that our decision rule is: Choose HA if |Δ| > t_c*SE, i.e., if |t| = |Δ/SE| > t_c. By following this rule, there is only a 5% probability of choosing HA if in fact H0 is true.
In practice, though, we do not just report our decision as HA or H0. The p-value is the probability, if H0 is correct, that we would observe a Δ at least as far from 0 as the one that actually occurred in the study.
Here, Prob(Δ > 1.13) is the area under the H0 curve to the right of the green line in the previous figure. For the two-sided test used here, the p-value also includes the matching area to the left of -1.13, i.e., p = Prob(|Δ| > 1.13).
Small p-values support HA. Choosing HA is equivalent to p<0.05, so the study result is reported as the p-value, and HC is declared to have an effect if p<0.05.
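As a rough numerical illustration of the tail areas involved, the sketch below computes one-sided and two-sided p-values for the observed effect of 1.13 under a normal approximation. The standard error of 1.37 is a hypothetical value assumed for the example (the transcript does not give one), and `p_values` is an illustrative helper name.

```python
from statistics import NormalDist

def p_values(delta, se):
    """Return (one_sided, two_sided) p-values under a normal approximation."""
    z = delta / se
    upper_tail = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return upper_tail, 2 * upper_tail          # two-sided counts both tails

OBSERVED_DELTA = 1.13  # effect seen in the study, from the figure
SE_ASSUMED = 1.37      # hypothetical SE of the difference

one_sided, two_sided = p_values(OBSERVED_DELTA, SE_ASSUMED)
print(f"one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
print("choose HA" if two_sided < 0.05 else "do not choose HA")
```

Under these assumptions the observed effect is less than one SE from 0, so the p-value is well above 0.05 and the rule does not choose HA.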

Summary: Factors that Determine Study Size
Five factors, including power, are inter-related. Fixing four of these specifies the fifth:
1. Study size, N.
2. Power (often 80% is desirable).
3. Significance level (the p-value cutoff, e.g., 0.05).
4. Magnitude of treatment effect to be detected.
5. Heterogeneity among subjects (standard deviation, SD).
The next slide shows how these factors (except SD) are typically presented in a study protocol.
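To make "fixing four specifies the fifth" concrete, here is a minimal sketch that solves for the study size per group given the other four factors. It uses the standard normal-approximation formula for comparing two means, which is an assumption; dedicated software typically uses the t distribution and may give a slightly larger N. The inputs (SD of 8.16 and a detectable difference of 5.2 mm Hg) are the values quoted for the local protocol later in this session, and `n_per_group` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Subjects needed per group to detect a true mean difference `delta`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

print(n_per_group(delta=5.2, sd=8.16))  # close to the 40 per group planned
print(n_per_group(delta=2.6, sd=8.16))  # halving the effect roughly quadruples N
```

The same relationship could be rearranged to solve for power or for the detectable effect instead.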

Quote from Local Protocol Example
“Thus, with a total of the planned 80 subjects, we are 80% sure to detect (p<0.05) group differences if treatments actually differ by at least 5.2 mm Hg in MAP change, or by a mean 0.34 change in number of vasopressors.”

Comments on Table on Previous Slide
Typically power=80% and almost always p<0.05 are fixed.
SD was not mentioned. If available, several estimates of SD may be used (different populations, intervention characteristics such as dosage, time, etc.). Here, a pilot study exactly like the trial was performed by the investigators.
Detectable difference refers to the unknown true difference, μ_HC - μ_Placebo, not the difference that will eventually be seen in the study.
N ↑ as detectable difference ↓. So, the major consideration is usually a tradeoff between N and the detectable difference.

Software for Study Size Calculations Calculations depend on the specific statistical method. We are using the t-test as an example, but the same concepts apply for, say, comparing % subjects who respond to treatment using another method such as a chi-square test. In software, you specify the method, and 4 of the 5 factors. The value of the fifth factor is calculated. Two free sites for calculations:

A Software Site for Study Size Calculations

Local Protocol Example, Calculations Pilot data: SD=8.16 for ΔMAP in 36 subjects. For p-value<0.05, power=80%, N=40/group, the detectable Δ of 5.2 in the previous table is found as:
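A rough check of that figure can be sketched as follows. The code uses the normal approximation for a two-sample comparison of means, which is an assumption; it is not necessarily the method the investigators' software used. The function name `detectable_difference` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest true mean difference detectable with the given power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided 5% level: about 1.96
    z_beta = z.inv_cdf(power)           # 80% power: about 0.84
    se = sd * sqrt(2 / n_per_group)     # SE of the difference in means
    return (z_alpha + z_beta) * se

delta = detectable_difference(n_per_group=40, sd=8.16)
print(f"detectable difference = {delta:.2f} mm Hg")
```

The approximation gives about 5.1 mm Hg. A calculation based on the t distribution would be expected to come out slightly larger, which is consistent with the 5.2 quoted in the protocol.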

Summary: Study Size and Power
1. Power analysis ensures that effects of a specified magnitude can be detected.
2. Five factors, including power, are inter-related. Fixing four of these specifies the fifth.
3. For comparing means, you need pilot data, or data from other studies, on the variability of subjects for the outcome measure. [E.g., the standard deviation from a previous study.] Comparing rates (%s) does not require pilot variability data; use rates if no pilot data are available for means.
4. Power analysis helps support the believability of (superiority) studies if the conclusions turn out to be negative.
5. To prove no effect (e.g., that a less invasive therapy is as effective as standard care), use an equivalence study design.
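The point about rates can be illustrated with a short sketch: for a proportion p, the variance p(1 - p) follows from the rate itself, so no separate SD estimate is needed. The code uses a common normal-approximation formula for comparing two proportions, which is an assumption, and the rates of 20% and 10% are hypothetical values chosen only for illustration. `n_per_group_rates` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_rates(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to detect a difference between two rates."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # The variability comes directly from the assumed rates.
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_sum ** 2 * variance_sum / (p1 - p2) ** 2)

print(n_per_group_rates(0.20, 0.10))  # hypothetical rates: 20% vs. 10%
```

Only the two rates, the significance level, and the power are required as inputs.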

Self-Test Exercise #1 A study was powered to detect a 10 point mean reduction in LDL cholesterol. A colleague claims that this means that if the subjects decrease LDL cholesterol by a mean 10 points, then p<0.05 and this will be a significant reduction. Explain.

Self-Test Exercise #2 True story: A protocol was designed with 80% power to detect (p<0.05) a 10% disease incidence in subjects receiving placebo vs. a 3.5% incidence in subjects receiving a new drug. This corresponds to a 65% reduction in disease incidence. A comment on the study was: “… there may not be a large enough sample to see the effect size required for a successful outcome. Power calculations indicate that the study is looking for a 65% reduction in incidence of … [disease]. Wouldn’t it also be of interest if there were only a 50% or 40% reduction, thus requiring smaller numbers and making the trial more feasible?” What is your comment on the comment?