Statistical Power and Sample Size Calculations Drug Development Statistics & Data Management July 2014 Cathryn Lewis Professor of Genetic Epidemiology.

Presentation transcript:

Statistical Power and Sample Size Calculations
Drug Development Statistics & Data Management, July 2014
Cathryn Lewis, Professor of Genetic Epidemiology & Statistics, Department of Medical & Molecular Genetics, King’s College London
With thanks to Irene Rebollo Mesa and Frühling Rijsdijk

2 Outline
1. Concepts of power
2. Power and types of error
3. Software to calculate power
4. Power for continuous outcome
5. Power for proportion, success/failure
6. Quiz!

3 Planning a Study
Question: What are the study endpoints?
Types of endpoints:
-Binary clinical outcome: death from disease
-Quantitative: creatinine, cholesterol levels, QOL
-Time to event: time to graft failure, time to death, time to recovery
Good qualities:
-Clinically meaningful
-Practical and feasible to measure
-Occur frequently enough throughout the duration of the trial

4 Planning a Study
Question: What is the expected prevalence of the outcome (discrete) or variability of the outcome (continuous)?
-Based on previous studies, a pilot study or a hospital/NHS report
-Variability and prevalence are vital for power. Both are best at intermediate levels.
Question: What is the expected difference between groups in the proportion of events (if discrete), or in the mean measure (if continuous)?
-Based on previous studies or a pilot study
-Alternatively, the minimum clinically relevant difference
-The larger the difference, the higher the power

5 Design: What is your Hypothesis?
1. Superiority
Objective: To determine whether there is evidence of a statistical difference in the comparison of interest between two Tx regimes:
A: Tx of interest
B: Placebo or active control Tx
H0: The two Txs have equal effect with respect to the mean response
H1: The two Txs are different with respect to the mean response

6 Statistical Power

7 Power
Definition: The expected proportion of samples in which we decide correctly against the null hypothesis
It depends on:
1. Size of the (treatment) effect in the population (δ)
2. The significance level at which we reject the null (0.05)
3. Sample size (N)
4. Design of the study: parallel or crossover etc.
5. Endpoint measurement (categorical, ordinal, continuous)
6. The expected dropout rate
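
The definition above can be checked by brute force: simulate many trials with a known true effect and count how often the test rejects the null. The sketch below is only an illustration under assumptions that are not in the slides: two parallel groups, normally distributed outcomes, a known common standard deviation (so a two-sided z-test is used), and made-up values for the effect, standard deviation and group size. The function name is hypothetical.

```python
import random
from statistics import NormalDist


def simulated_power(delta, sigma, n_per_group, alpha=0.05, n_trials=2000, seed=1):
    """Proportion of simulated two-arm trials in which a two-sided z-test rejects H0.

    Assumes normal outcomes with a known common standard deviation `sigma`
    and a true difference in means of `delta` (illustrative assumptions).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # rejection threshold on |Z|
    se = sigma * (2 / n_per_group) ** 0.5          # SE of the difference in means
    rejections = 0
    for _ in range(n_trials):
        group_a = [rng.gauss(delta, sigma) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_trials


# True effect present: the rejection rate estimates the power (roughly 0.8 here).
print(simulated_power(delta=5.0, sigma=10.0, n_per_group=64))
# No true effect: the rejection rate estimates the type I error (roughly alpha).
print(simulated_power(delta=0.0, sigma=10.0, n_per_group=64))
```

Changing `delta`, `alpha` or `n_per_group` in the call shows items 1 to 3 of the list at work; design, endpoint type and dropout are not modelled in this sketch.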

8 Power primer
We summarise the results of a trial in a statistical analysis with a test statistic (e.g. chi-squared, Z score)
-Provides a measure of support for a certain hypothesis
-Pre-determine a threshold on the test statistic to reject the null hypothesis
YES or NO decision-making: significance testing
[Figure: test statistic axis with NO and YES decision regions]
Inevitably leads to two types of mistake:
-false positive (YES instead of NO) (Type I)
-false negative (NO instead of YES) (Type II)

9 Standard Case
[Figure: sampling distributions of the test statistic T if H0 were true and if HA were true, with alpha 0.05; α and β marked; POWER: 1 - β]

10
              Rejection of H0                           Non-rejection of H0
H0 true       Significance level: Type I error = α
HA true       Power: 1 - Type II error = 1-β            Type II error = β

11 Hypothesis testing
Null hypothesis: no effect
A ‘significant’ result means that we can reject the null hypothesis
A ‘non-significant’ result means that we cannot reject the null hypothesis

12 Statistical significance
The ‘p-value’: the probability of obtaining a result at least as extreme as the one observed if the null were in fact true
Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)
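
As a small illustration of the link between a test statistic and its p-value, the sketch below converts a Z score into a two-sided p-value. It assumes the statistic is standard normal under the null; the function name and the example Z values are illustrative, not taken from the slides.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under H0, of a Z score at least as extreme as `z` in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(two_sided_p_value(1.96))  # close to 0.05, the usual threshold
print(two_sided_p_value(2.58))  # close to 0.01
```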


14
              Rejection of H0                   Non-rejection of H0
H0 true       Type I error at rate α            Nonsignificant result (1-α)
HA true       Significant result (1-β)          Type II error at rate β


16 Increased effect size
[Figure: sampling distributions of T if H0 were true and if HA were true, alpha 0.05; α and β marked; POWER: 1 - β ↑]

17 More conservative α
[Figure: sampling distributions of T if H0 were true and if HA were true, alpha 0.01; α and β marked; POWER: 1 - β ↓]

18 Less conservative α
[Figure: sampling distributions of T if H0 were true and if HA were true, alpha 0.1; α and β marked; POWER: 1 - β ↑]

19 Reduced variation
[Figure: sampling distributions of T if H0 were true and if HA were true, alpha 0.05; α and β marked; POWER: 1 - β ↑]
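
The four comparisons in these figures (effect size, stricter α, looser α, reduced variation) can be reproduced numerically. The sketch below uses the normal approximation for a two-sided comparison of two means with equal group sizes and a known common standard deviation; this model and the baseline numbers are assumptions chosen for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist


def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Distance between the H0 and HA sampling distributions, in standard errors.
    shift = abs(delta) / (sigma * (2 / n_per_group) ** 0.5)
    # Probability that the statistic lands in either rejection region under HA.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


baseline = power_two_means(delta=2, sigma=10, n_per_group=200)
print("standard case        ", round(baseline, 3))
print("increased effect size", round(power_two_means(4, 10, 200), 3))              # power goes up
print("alpha = 0.01         ", round(power_two_means(2, 10, 200, alpha=0.01), 3))  # power goes down
print("alpha = 0.10         ", round(power_two_means(2, 10, 200, alpha=0.10), 3))  # power goes up
print("reduced variation    ", round(power_two_means(2, 5, 200), 3))               # power goes up
```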

20 Determining Sample Size
We need:
–Acceptable type I error rate (α), usually 0.05, or [...] if one sided
–A meaningful difference δ in the response: the smallest Tx effect clinically worth detecting / that we wish to detect
–The desirable power (1-β) to detect this difference, min. 80%
–Ratio of allocation to the groups (equal sample sizes?)
–Whether to use a one-sided or two-sided test
In addition,
–The variability common to the two populations for a continuous endpoint
–The response (event) rate of the control group for a binary endpoint
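
For a continuous endpoint, the ingredients listed above combine in a standard normal-approximation formula for the number of patients per group. The version below assumes a two-sided test, equal allocation and a known common standard deviation σ; the notation z_p for the standard normal quantile is introduced here and does not appear in the slides.

```latex
n_{\text{per group}} \;=\; \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\delta^{2}}
```

The formula makes the trade-offs explicit: halving δ quadruples the required sample size, and a smaller α or β (higher power) increases it through the quantile terms.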

21 Calculating power using software or Web
-PRISM StatMate ($50)
-G*Power 3 (Free)
-Statistical software: SPSS, SAS, Stata, R
-PS Power and Sample size Calculation (free) (Windows)
-Web: Google “Statistical Power Calculation”
-Russell V. Lenth
-David Schoenfeld
-Perform the calculation with two methods and check that the answers are similar

22 Russ Lenth’s Power and Sample size page


24 Determining Sample Size: Continuous outcome
Two Anti-Hypertensives:
–Testing for superiority
Endpoint: Difference in Diastolic BP
–Continuous variable
Relevant parameters
–Difference in Diastolic BP between drugs: δ = 2 mm Hg
–Standard deviation of Diastolic BP in each group: σ = 10 mm Hg
–Significance level: 0.05
–Required power: 0.8
–Assume equal sized groups
Calculate sample size required: 393 patients in each group
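
The figure of 393 per group can be reproduced with the usual normal-approximation formula for comparing two means. The sketch below assumes a two-sided test and a known common standard deviation; the slides do not say which formula the calculator used, and a t-based calculation can give a slightly larger number. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Patients per group for a two-sided comparison of two means (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # quantile for the two-sided significance level
    z_beta = nd.inv_cdf(power)           # quantile for the required power
    return 2 * (sigma * (z_alpha + z_beta) / delta) ** 2


# Anti-hypertensive example: difference 2 mm Hg, standard deviation 10 mm Hg.
print(math.ceil(n_per_group_means(delta=2, sigma=10)))  # 393
```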

25 Russ Lenth’s Power and Sample size page


27 Power, by difference between two groups

28 Continuous outcome
Quantities linked in the calculation:
-Power
-Significance level
-Sample size (equal in each group, fixed ratio?)
-Difference in means
-Standard deviation (equal in each group)

29 Determining Sample Size: Discrete Example
APT070 perfusion vs. cold storage of kidney
Testing for superiority
Endpoint: Delayed Graft Function after transplantation
–Proportion of patients experiencing delayed graft function
Relevant parameters
–Baseline prevalence: 35%
–Minimum clinically significant difference: 10%
–p1=0.35, p2=0.25 [proportion with delayed graft function in each group]
–Significance level α=0.05
–Power = 80%
Calculate sample size required: 349 patients in each group

30 Russ Lenth’s Power and Sample size page

31 With 349 patients on treatment A and 349 patients on treatment B there will be an 80% chance of detecting a significant difference at a two-sided 0.05 significance level. This assumes that the response rate of treatment A is 0.35 and the response rate of treatment B is 0.25.
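
The 349 patients per group quoted for this example is consistent with the normal-approximation formula for two proportions when a continuity correction is applied; without the correction the same inputs give a smaller number. The sketch below shows both. Which formula the calculator in the slides actually used is not stated, so treat the match as an observation rather than a confirmation; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Patients per group for a two-sided comparison of two proportions."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2      # pooled proportion under H0 (equal allocation)
    diff = abs(p1 - p2)
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Continuity correction applied to the uncorrected sample size.
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * diff))) ** 2
    return n


# Delayed graft function example: 35% vs 25%.
print(math.ceil(n_per_group_proportions(0.35, 0.25)))                    # 349
print(math.ceil(n_per_group_proportions(0.35, 0.25, continuity=False)))  # 329
```

The gap between the two results is one reason to run the calculation with two tools and compare the answers.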

32 Discrete outcome
Quantities linked in the calculation:
-Power
-Significance level
-Sample size (equal in each group? fixed ratio?)
-Proportion responding in Group 1
-Proportion responding in Group 2

33 How to use power calculations
Use power prospectively for planning future studies
–Determine an appropriate sample size
–Evaluate a planned study: will it yield useful information?
Put science before statistics
–Use effect sizes that are clinically relevant
–Don’t get distracted by statistical considerations
Perform a pilot study
–Helps establish procedures, understand and protect against the unexpected
–Gives variance estimates needed in determining sample size

34 Design: What is your Hypothesis?
1. Superiority
2. Equivalence
Objective: To demonstrate that two treatments have no clinically meaningful difference
H0: The two Txs’ effects are different with respect to the mean response
H1: The two Txs are equal with respect to the mean response
d = largest difference clinically acceptable

35 Design: What is your Hypothesis?
3. Non-Inferiority
Objective: To demonstrate that a given treatment is not clinically inferior to another
H0: A given Tx is inferior with respect to the mean response
H1: A given Tx is non-inferior with respect to the mean response

36 QUIZ
How many subjects? Which study needs the largest sample size?
Assume 80% power, α = 0.05, two-sided

                   Study A                      Study B
1. Mortality       20% vs 10%                   20% vs 15%
2. Mortality       20% vs 10%                   40% vs 30%
3. Diastolic BP    80 vs 85 mmHg, St. dev 10    90 vs 95 mmHg, St. dev
4. Diastolic BP    80 vs 85 mmHg, St. dev 10    80 vs 85 mmHg, St. dev 8

For each question: (x) more with A (y) more with B (z) the same

37 ANSWERS
1. B 2. B 3. Same 4. A
-Bigger effect size in A (halving of mortality). Smaller effect, larger sample size needed to detect
-A small difference needs more subjects
-Only the standard deviation matters
-Bigger standard deviation, more subjects
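
The quiz answers can be checked with the same normal-approximation formulas used for planning. This is a sketch under assumptions: equal allocation, two-sided α = 0.05, 80% power, no continuity correction, and, for question 3, a standard deviation of 10 in study B (that value is missing from the transcript and is assumed here because the stated answer is "same"). Function names are hypothetical.

```python
import math
from statistics import NormalDist

Z_ALPHA = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided alpha = 0.05
Z_BETA = NormalDist().inv_cdf(0.80)           # 80% power


def n_means(mean_1, mean_2, sd):
    """Patients per group to compare two means with a common standard deviation."""
    return math.ceil(2 * (sd * (Z_ALPHA + Z_BETA) / (mean_1 - mean_2)) ** 2)


def n_proportions(p1, p2):
    """Patients per group to compare two proportions (no continuity correction)."""
    p_bar = (p1 + p2) / 2
    total = (Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
             + Z_BETA * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(total ** 2 / (p1 - p2) ** 2)


quiz = {
    1: (n_proportions(0.20, 0.10), n_proportions(0.20, 0.15)),
    2: (n_proportions(0.20, 0.10), n_proportions(0.40, 0.30)),
    3: (n_means(80, 85, 10), n_means(90, 95, 10)),  # study B SD of 10 is an assumption
    4: (n_means(80, 85, 10), n_means(80, 85, 8)),
}
for question, (n_a, n_b) in quiz.items():
    verdict = "same" if n_a == n_b else ("more with A" if n_a > n_b else "more with B")
    print(question, n_a, n_b, verdict)
```

Question 2 is the instructive one: the absolute difference is 10 percentage points in both studies, but the proportions in study B sit closer to 50%, where a binary outcome is most variable, so B needs more subjects.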