SAMPLE SIZE AND POWER CALCULATION


Assist. Prof. E. Çiğdem Kaspar, Ph.D., Yeditepe University, Faculty of Medicine, Department of Biostatistics, Turkey

Power and Sample Size Statistical studies (surveys, experiments, observational studies, etc.) are always better when they are carefully planned. Good planning has many aspects. The problem should be carefully defined and operationalized. Experimental or observational units must be selected from the appropriate population. The study must be randomized correctly. The procedures must be followed carefully. Reliable instruments should be used to obtain measurements. Finally, the study must be of adequate size, relative to the goals of the study.

Power and Sample Size Statistical significance and biological significance are not the same thing. For example, given a large enough sample size, any statistical hypothesis test is likely to be statistically significant, almost regardless of the biological importance of the results. Conversely, when the sample size is small, biologically interesting phenomena may be missed because statistical tests are unlikely to yield statistically significant results.

Power and Sample Size It is important not to use too many experimental units in an experiment: doing so costs money, time and effort, and it is unethical. Conversely, if too few experimental units are used, the experiment may be unable to detect a clinically or scientifically important response to the treatment. This also wastes resources and could have serious consequences, particularly in safety assessment. We need to avoid making either of these mistakes.

Minimising statistical errors The null hypothesis In a controlled experiment the aim is usually to compare two or more means (or sometimes medians or proportions). We normally set up a “null hypothesis” that there is no difference between the means, and the aim of our experiment is to disprove that null hypothesis.

Minimising statistical errors However, as a result of inter-individual variability we may make a mistake. If we fail to find a true difference, then we have a false negative result, also known as a Type II or β error. Conversely, if we think that there is a difference when in fact it is just due to chance, then we have a false positive, also known as a Type I or α error. These are shown in the table below.
                         Experimental conclusion
State of nature          Accept null hypothesis    Reject null hypothesis
Null hypothesis true     Correct conclusion        Type I (α) error
Null hypothesis false    Type II (β) error         Correct conclusion

Power analysis and the control of statistical errors We can control type I errors because we can estimate the probability that the means could differ to a given degree by chance alone, knowing the sample sizes and the degree of variability (and making some assumptions about the distribution of the data). If it is highly unlikely that the samples came from the same population, we reject the null hypothesis and conclude that the treatment has had an effect. The probability of a type I error is denoted α and is usually set at 0.05, or 5%: for every 100 experiments in which the null hypothesis is true, we would expect, on average, five type I errors. We don't usually set α much lower than this because that would increase the probability of a type II error.
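The 5% figure can be checked with a small simulation. The sketch below is illustrative only: it assumes a two-group comparison with a known standard deviation (a z-test), and the function name, group size and number of experiments are hypothetical choices, not values from the slides.

```python
import random
from statistics import NormalDist


def type_one_error_rate(n_experiments=2000, n=20, sigma=1.0, alpha=0.05, seed=0):
    """Fraction of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sigma * (2 / n) ** 0.5  # standard error of the mean difference (sigma known)
    rejections = 0
    for _ in range(n_experiments):
        # Both groups are drawn from the same population, so H0 holds.
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (sum(treated) / n - sum(control) / n) / se
        if abs(z) > z_crit:
            rejections += 1  # a false positive (type I error)
    return rejections / n_experiments


if __name__ == "__main__":
    print(f"Observed type I error rate: {type_one_error_rate():.3f}")
```

The observed rate should fluctuate around α, and lowering `alpha` lowers it at the cost of power.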

Power analysis and the control of statistical errors Type II errors are more difficult to control. False negative results occur when there is excessive variation ("noise") or when there is only a small response to the treatment (a low "signal"). With a power analysis we can specify the probability of a type II error (β) or, equivalently, the statistical power (one minus the probability of a type II error).

Power Analysis Statistical power is defined as the probability of rejecting the null hypothesis when the alternative hypothesis is true: Power = 1-β = P( reject H0 | H1 true ). Power analysis can be used to determine whether the experiment had a good chance of producing a statistically significant result if a biologically significant difference existed in the population. In research, statistical power is generally calculated for two purposes. It can be calculated before data collection, based on information from previous research, to decide the sample size needed for the study. It can also be calculated after data analysis, usually when the result turns out to be non-significant. In this case, statistical power is calculated to assess whether the non-significant result reflects a real absence of a relation in the population or a lack of statistical power.

Variables involved in a power analysis
The effect size of scientific interest (the signal): This is the magnitude of response to the treatment likely to be of scientific or clinical importance. It has to be specified by the investigator. Alternatively, if the experiment has already been done, it is the actual response (difference between treated and control means).
The variability among experimental units (the noise): This is the standard deviation of the character of interest. It has to come from a previous study or the literature, as the experiment has not yet been done.
The power of the proposed experiment: This is 1-β, where β is the probability of a type II error. This also has to be specified by the investigator. It is often set at 0.8 to 0.9 (80% or 90%).
The alternative hypothesis: The null hypothesis is that the means of the two groups do not differ. The alternative hypothesis may be that they do differ (two-sided), or that they differ in a particular direction (one-sided).
The significance level: As previously explained, this is usually set at 0.05.
The sample size: This is the number in each group. It is usually what we want to estimate. However, we sometimes have only a fixed number of subjects, in which case the power analysis can be used to estimate power or effect size.

Power Analysis For most common statistical tests, power is easily calculated from tables or with statistical software. The power formula depends on the study design; it is not hard, but it can be algebra-intensive, so a researcher may want to use a computer program or consult a statistician. As an example of a hand calculation: suppose a researcher has the null hypothesis μ = μ0 and the alternative hypothesis μ = μ1 ≠ μ0, the population variance σ² is known, and the sample size is n. The null hypothesis is to be rejected at a two-sided significance level α, which gives the critical value Zα/2. With the test statistic Z = (x̄ − μ0)/(σ/√n), the power function is P{Z > Zα/2 or Z < −Zα/2 | μ1} = 1 − Φ[Zα/2 − (μ1 − μ0)/(σ/√n)] + Φ[−Zα/2 − (μ1 − μ0)/(σ/√n)].
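The hand calculation above translates directly into a few lines of code. This is a minimal sketch of the two-sided z-test power function with known σ; the function name `z_test_power` is an assumption made for illustration, and Φ is taken from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    shift = (mu1 - mu0) / (sigma / n ** 0.5)    # (mu1 - mu0) / (sigma / sqrt(n))
    upper = 1 - std_normal.cdf(z_crit - shift)  # P(Z >  Z_{alpha/2} | mu1)
    lower = std_normal.cdf(-z_crit - shift)     # P(Z < -Z_{alpha/2} | mu1)
    return upper + lower


if __name__ == "__main__":
    # Illustrative inputs: a 1-unit shift, sigma = 2, n = 43.
    print(f"power = {z_test_power(4, 5, 2, 43):.3f}")
```

When μ1 equals μ0 the function returns α, as it should: with no true effect, the only rejections are type I errors.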

Power is Affected by
Sample size (n): ↑ n → power ↑. With the other factors held fixed, a larger sample size gives greater power (more subjects → higher power).
Variation in the outcome (σ²): ↓ σ² → power ↑
Significance level (α): ↑ α → power ↑
Difference (effect) to be detected (δ): ↑ δ → power ↑
One-tailed vs. two-tailed tests: Power is greater in a one-tailed test than in the comparable two-tailed test, provided the true effect lies in the hypothesized direction.

Power Analysis After plugging in the required information, a researcher obtains a function that describes the relationship between statistical power and sample size, and can then choose a power level together with its associated sample size. The choice of sample size may also be constrained by factors such as the available budget. Consultants generally recommend a minimum power level of 0.80. Researchers need some information before they can do the power and sample size calculation: previous knowledge about the parameters (their means and variances) and the confidence or significance level required in the study.
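The relationship between power and sample size can be tabulated and searched directly. The sketch below assumes a two-sided one-sample z-test with known σ (a simplification; a t-test would need slightly larger samples), and the helper names and the input values in the demo are hypothetical.

```python
from statistics import NormalDist


def power_one_sample(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a mean shift delta."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma
    return 1 - nd.cdf(z_crit - shift) + nd.cdf(-z_crit - shift)


def power_curve(delta, sigma, sizes, alpha=0.05):
    """List of (n, power) pairs for the requested sample sizes."""
    return [(n, power_one_sample(delta, sigma, n, alpha)) for n in sizes]


def smallest_n(delta, sigma, target=0.80, alpha=0.05, n_max=100_000):
    """Smallest n whose power reaches the target, or None if n_max is exceeded."""
    for n in range(2, n_max + 1):
        if power_one_sample(delta, sigma, n, alpha) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical inputs: detect a shift of 0.5 when sigma = 1.
    for n, p in power_curve(0.5, 1.0, range(10, 61, 10)):
        print(f"n = {n:3d}  power = {p:.2f}")
    print("smallest n for 80% power:", smallest_n(0.5, 1.0))
```

A brute-force search is used instead of a closed-form formula so that the same loop would work with any power function substituted for `power_one_sample`.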

Application-1 The following results are from a pilot study of 29 women, all 35–39 years old. Of particular interest is whether oral contraceptive (OC) use is associated with higher blood pressure.
Simulated sample data:
Group                       n     Mean systolic blood pressure    Standard deviation
Oral contraceptive users    8     132.8                           15.3
Non-OC users                21    127.4                           18.2

Application-1 The sample mean difference in blood pressure is 132.8 − 127.4 = 5.4. This could be considered scientifically significant; however, the result is not statistically significant at the α = 0.05 level. This OC/blood pressure study has power 0.106 to detect a difference in blood pressure of 5.4 or more, if this difference truly exists in the population of women 35–39 years old. When power is this low, it is difficult to determine whether there is truly no difference in population means or whether we simply could not detect it.

Application-1 Power Changes Baseline: n = 29, 2-sample test, 11% power, δ = 5.4, σ = 17.49, α = 0.05, 2-sided test.
Variance/standard deviation (σ):
σ: 17.49 → 4.5, power: 11% → 80%
σ: 17.49 → 20, power: 11% → 9%
Significance level (α):
α: 0.05 → 0.01, power: 11% → 3%
α: 0.05 → 0.10, power: 11% → 18%
Difference to be detected (δ):
δ: 5.4 → 3, power: 11% → 7%
δ: 5.4 → 7, power: 11% → 15%
Sample size (n):
n: 29 → 58, power: 11% → 17%
n: 29 → 25, power: 11% → 10%
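The pattern of these power changes can be approximated with a short script. This is a rough sketch only: it uses a normal approximation with the pooled standard deviation, whereas the slide's figures presumably come from a t-based calculation, so the results agree only to within a percentage point or two. The function names are illustrative.

```python
from statistics import NormalDist


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent groups."""
    return (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5


def two_sample_power(delta, sigma, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    se = sigma * (1 / n1 + 1 / n2) ** 0.5  # standard error of the mean difference
    shift = delta / se
    return 1 - nd.cdf(z_crit - shift) + nd.cdf(-z_crit - shift)


if __name__ == "__main__":
    sd = pooled_sd(15.3, 8, 18.2, 21)  # close to the slide's 17.49
    print(f"pooled SD           : {sd:.2f}")
    print(f"baseline power      : {two_sample_power(5.4, sd, 8, 21):.2f}")
    print(f"sigma reduced to 4.5: {two_sample_power(5.4, 4.5, 8, 21):.2f}")
    print(f"alpha raised to 0.10: {two_sample_power(5.4, sd, 8, 21, alpha=0.10):.2f}")
    # Doubling n is assumed here to keep the same 8:21 group ratio.
    print(f"n doubled to 58     : {two_sample_power(5.4, sd, 16, 42):.2f}")
```

Each change moves power in the direction listed above, even though the approximate values differ slightly from the slide's.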

Sample Size In a research study, a statistical test is applied to determine whether or not there is a significant difference between the means or proportions observed in the comparison groups. Before undertaking a study, the investigator should first determine the minimum number of subjects (i.e., sample size estimation) that must be enrolled in each group in order that the null hypothesis can be rejected if it is false.

Sample Size Sample size estimations are warranted in all clinical studies for both ethical and scientific reasons. The ethical reasons pertain to the risks of enrolling either an inadequate number of subjects or more subjects than the minimum necessary to reject the null hypothesis. In both instances, the risks include randomizing the care of subjects and/or exposing them to unnecessary risk or harm. The scientific reasons pertain to the enrollment of more subjects than necessary, which extends the duration and increases the costs of clinical research studies.

Sample Size Study design depends on:
Variables of interest and type of data (e.g. continuous, categorical)
Desired power
Desired significance level
Effect/difference of clinical importance
Standard deviations of continuous outcome variables
One- or two-sided tests

Tools to Calculate Sample Size
Formulas
    General formulas: these can be complex
    Quick formulas: for particular power and significance levels and specified tests
Special tables for different tests
Altman's nomogram
Computer software

Application-2 Study the effect of a new sleep aid.
1-sample test: change from baseline in sleep time after taking the medication for one week
Two-sided test, α = 0.05, power = 90%
Difference = 1 hr (4 hours of sleep to 5)
Standard deviation = 2 hr
The sample size calculated from these inputs is n = 43.
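The formula used on the slide is not preserved in the transcript. Assuming it is the usual normal-approximation formula for a two-sided one-sample test (an assumption, though it matches the sample sizes quoted for this application), the arithmetic with δ = 1, σ = 2, α = 0.05 and power = 90% works out as:

```latex
n = \left( \frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\delta} \right)^{2}
  = \left( \frac{(1.960 + 1.282) \times 2}{1} \right)^{2}
  \approx 42.03
  \;\Rightarrow\; n = 43 \text{ (rounded up)}
```

Here z with subscript 1−α/2 and 1−β denotes the standard normal quantiles for the significance level and the power.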

Application-2 Change Effect or Difference: change the difference of interest from 1 hr to 2 hr; n goes from 43 to 11.
Change Power: change power from 90% to 80%; n goes from 11 to 8.

Application-2 Change Standard Deviation: change the standard deviation from 2 to 3; n goes from 8 to 18.
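The sequence of sample sizes in this application can be reproduced with a short function. This is a sketch under the assumption that the slides use the normal-approximation formula n = ((z(1−α/2) + z(1−β))·σ/δ)², rounded up; the function name `one_sample_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def one_sample_n(delta, sigma, power=0.90, alpha=0.05):
    """Sample size for a two-sided one-sample test (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # quantile for the significance level
    z_beta = nd.inv_cdf(power)           # quantile for the desired power
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    print(one_sample_n(delta=1, sigma=2, power=0.90))  # baseline scenario
    print(one_sample_n(delta=2, sigma=2, power=0.90))  # larger difference
    print(one_sample_n(delta=2, sigma=2, power=0.80))  # lower power
    print(one_sample_n(delta=2, sigma=3, power=0.80))  # larger standard deviation
```

Under this assumption the four calls give 43, 11, 8 and 18, matching the slides.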

Application-2 Changes in the detectable difference have HUGE impacts on sample size:
20-point difference → 25 patients/group
10-point difference → 100 patients/group
5-point difference → 400 patients/group
Changes in α, β, σ, the number of samples, and whether the test is 1- or 2-sided can all have a large impact on your sample size.
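These figures follow from the fact that, with everything else fixed, the required sample size is proportional to 1/δ²: halving the detectable difference quadruples n. A minimal sketch of that scaling (the helper name `rescale_n` is hypothetical, and the reference point of 25 patients per group at a 20-point difference is taken from the list above):

```python
from math import ceil


def rescale_n(n_ref, delta_ref, delta_new):
    """Rescale a known per-group sample size to a new detectable difference.

    Assumes alpha, power and sigma stay unchanged, so n scales as 1 / delta**2.
    """
    return ceil(n_ref * (delta_ref / delta_new) ** 2)


if __name__ == "__main__":
    for delta in (20, 10, 5):
        print(f"{delta:2d}-point difference: {rescale_n(25, 20, delta)} patients/group")
```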

Conclusion Sample-size planning is often important, and almost always difficult. It requires care in eliciting scientific objectives and in obtaining suitable quantitative information prior to the study. Successful resolution of the sample-size problem requires the close and honest collaboration of statisticians and subject-matter experts. Power and sample size analysis based on pilot data give valuable information on the performance of the experiment and can thereby guide further decisions on experimental design.

Conclusion Researchers can use these calculations as a tool to increase the strength of their inferences, and editors and reviewers can demand that statistical power be reported in all cases where a non-significant result is obtained.