1
SAMPLE SIZE AND POWER CALCULATION
YEDITEPE UNIVERSITY SAMPLE SIZE AND POWER CALCULATION Assist. Prof. E. Çiğdem Kaspar, Ph.D. Yeditepe University, Faculty of Medicine, Department of Biostatistics, Turkey
2
Power and Sample Size Statistical studies (surveys, experiments, observational studies, etc.) are always better when they are carefully planned. Good planning has many aspects. The problem should be carefully defined and operationalized. Experimental or observational units must be selected from the appropriate population. The study must be randomized correctly. The procedures must be followed carefully. Reliable instruments should be used to obtain measurements. Finally, the study must be of adequate size, relative to the goals of the study.
3
Power and Sample Size Statistical significance and biological significance are not the same thing. For example, given a large enough sample size, any statistical hypothesis test is likely to be statistically significant, almost regardless of the biological importance of the results. Conversely, when the sample size is small, biologically interesting phenomena may be missed because statistical tests are unlikely to yield statistically significant results.
4
Power and Sample Size It is important not to use too many experimental units in an experiment, because it costs money, time and effort, and it is unethical. Conversely, if too few experimental units are used, the experiment may be unable to detect a clinically or scientifically important response to the treatment. This also wastes resources and could have serious consequences, particularly in safety assessment. We need to avoid making either of these mistakes.
5
Minimising statistical errors
The null hypothesis In a controlled experiment the aim is usually to compare two or more means (or sometimes medians or proportions). We normally set up a “null hypothesis” that there is no difference between the means, and the aim of our experiment is to disprove that null hypothesis.
6
Minimising statistical errors
However, as a result of inter-individual variability we may make a mistake. If we fail to find a true difference, then we have a false negative result, also known as a Type II or β error. Conversely, if we think that there is a difference when in fact it is just due to chance, then we have a false positive, Type I, or α error. These are shown in the table below.

                           Experimental conclusion
State of nature            Accept null hypothesis     Reject null hypothesis
Null hypothesis true       Correct conclusion         Type I (α) error
Null hypothesis false      Type II (β) error          Correct conclusion
7
Power analysis and the control of statistical errors
We can control Type I errors because we can estimate the probability that the means could differ to a given degree, knowing the sample sizes and the degree of variability (and making some assumptions about the distribution of the data). If it is highly unlikely that the samples came from the same population, we reject the null hypothesis and assume that the treatment has had an effect. The probability of a Type I error is usually set at 0.05, or 5%: for every 100 experiments we would expect, on average, five Type I errors. We don't usually set it much lower than this because doing so would increase the probability of a Type II error.
8
Power analysis and the control of statistical errors
Type II errors are more difficult to control. False negative results occur when there is excessive variation ("noise") or there is only a small response to the treatment (a low "signal"). We can specify the probability of a Type II error, or the statistical power (one minus the probability of a Type II error), if we use a power analysis.
9
Power = 1-β = P( reject H0 | H1 true )
Power Analysis Statistical power is defined as the probability of rejecting the null hypothesis when the alternative hypothesis is true. Power = 1-β = P( reject H0 | H1 true ). Power analysis can be used to determine whether the experiment had a good chance of producing a statistically significant result if a biologically significant difference existed in the population. In research, statistical power is generally calculated for two purposes. It can be calculated before data collection, based on information from previous research, to decide the sample size needed for the study. It can also be calculated after data analysis, usually when the result turns out to be non-significant; in that case, statistical power is calculated to verify whether the non-significant result reflects a genuine absence of an effect in the population or merely a lack of statistical power.
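To make this definition concrete, the following is a minimal simulation sketch in Python (assuming NumPy and SciPy are available; the means, standard deviation and sample size are hypothetical values chosen only for illustration). It estimates power by repeatedly generating data under the alternative hypothesis and counting how often a two-sided one-sample t-test rejects H0 at the 5% level.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
mu0, mu1, sigma, n, alpha = 0.0, 0.5, 1.0, 30, 0.05   # hypothetical values
n_sim = 10_000

rejections = 0
for _ in range(n_sim):
    sample = rng.normal(mu1, sigma, n)                 # data generated under H1
    _, p_value = stats.ttest_1samp(sample, mu0)        # two-sided one-sample t-test
    if p_value < alpha:                                # reject H0 at the 5% level
        rejections += 1

print("Estimated power:", rejections / n_sim)          # approximates P(reject H0 | H1 true)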
10
Variables involved in a power analysis
The effect size of scientific interest (the signal): This is the magnitude of response to the treatment likely to be of scientific or clinical importance. It has to be specified by the investigator. Alternatively, if the experiment has already been done, it is the actual response (the difference between treated and control means).
The variability among experimental units (the noise): This is the standard deviation of the character of interest. It has to come from a previous study or the literature, as the experiment has not yet been done.
The power of the proposed experiment: This is 1-β, where β is the probability of a Type II error. This also has to be specified by the investigator. It is often set at 0.8 to 0.9 (80 or 90%).
The alternative hypothesis: The null hypothesis is that the means of the two groups do not differ. The alternative hypothesis may be that they do differ (two-sided), or that they differ in a particular direction (one-sided).
The significance level: As previously explained, this is usually set at 0.05.
The sample size: This is the number in each group. It is usually what we want to estimate. However, we sometimes have only a fixed number of subjects, in which case the power analysis can be used to estimate power or effect size.
11
Power Analysis For most common statistical tests, power is easily calculated from tables or using statistical computer software. The power formula depends on the study design; it is not hard, but it can be very algebra-intensive, so the researcher may want to use a computer program or consult a statistician. As an example of hand calculation: suppose a researcher has the null hypothesis that μ=μ0 and the alternative hypothesis that μ=μ1≠μ0, and that the population variance σ2 is known. The researcher wants to reject the null hypothesis at a significance level of α, which gives a corresponding Z score, call it Zα/2. The power function is then P{Z>Zα/2 or Z<-Zα/2 | μ1} = 1-Φ[Zα/2-(μ1-μ0)/(σ/√n)] + Φ[-Zα/2-(μ1-μ0)/(σ/√n)].
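This power function translates directly into a short Python sketch (using SciPy's normal distribution; the numeric values passed in at the end are assumptions for illustration only):

from scipy.stats import norm

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known sigma."""
    z_crit = norm.ppf(1 - alpha / 2)                   # Z_alpha/2
    shift = (mu1 - mu0) / (sigma / n ** 0.5)           # (mu1 - mu0) / (sigma / sqrt(n))
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

# Illustrative (assumed) values
print(z_test_power(mu0=100, mu1=105, sigma=15, n=36))  # about 0.52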
12
More subjects → higher power
Power is Affected by Statistical power is positively correlated with the sample size, which means that, given the level of the other factors, a larger sample size gives greater power.
Sample size (n): more subjects → higher power
Variation in the outcome (σ2): ↓ σ2 → power ↑
Significance level (α): ↑ α → power ↑
Difference (effect) to be detected (δ): ↑ δ → power ↑
One-tailed vs. two-tailed tests: power is greater in one-tailed tests than in comparable two-tailed tests
13
Power Analysis After plugging in the required information, a researcher obtains a function that describes the relationship between statistical power and sample size, and can then decide which power level they prefer together with the associated sample size. The choice of sample size may also be constrained by factors such as the financial budget available to the researcher, but consultants generally recommend that the minimum power level be set at 0.80. Researchers must have some information before they can do the power and sample size calculation, including previous knowledge about the parameters (their means and variances) and what confidence or significance level is needed in the study.
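A minimal sketch of that power-versus-sample-size relationship, again under assumed illustrative values (δ = 5, σ = 15, α = 0.05 for a two-sided one-sample z-test), tabulates power over increasing n and stops at the smallest n reaching the conventional 0.80:

from scipy.stats import norm

def power_for_n(n, delta=5.0, sigma=15.0, alpha=0.05):
    """Two-sided one-sample z-test power as a function of n (assumed values)."""
    z_crit = norm.ppf(1 - alpha / 2)
    shift = delta / (sigma / n ** 0.5)
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

n = 2
while power_for_n(n) < 0.80:        # smallest n meeting the 0.80 convention
    n += 1
print(n, round(power_for_n(n), 3))  # n = 71 for these assumed inputs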
14
Mean Systolic Blood Pressure
Application-1 The following results are from a pilot study done on 29 women, all 35–39 years old. Of particular interest is whether oral contraceptive (OC) use is associated with higher blood pressure.

Simulated Sample Data        n     Mean Systolic Blood Pressure    Standard Deviation
Oral Contraceptive users     8     132.8                           15.3
Non-OC users                 21    127.4                           18.2
15
Application-1 The sample mean difference in blood pressure is 132.8 − 127.4 = 5.4. This could be considered scientifically significant; however, the result is not significant at the α = 0.05 level. This OC/blood pressure study has power of only 0.106 to detect a difference in blood pressure of 5.4 or more, if this difference truly exists in the population of women 35–39 years old. When power is this low, it is difficult to determine whether there is no difference in the population means or we simply could not detect it.
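A sketch of this power calculation using a normal approximation (SciPy assumed; the slide's figure of 0.106 is based on the t distribution, so the approximation below gives a slightly higher value of about 0.11):

from scipy.stats import norm

n1, n2 = 8, 21           # OC users, non-OC users
delta = 5.4              # observed mean difference
sigma = 17.49            # common standard deviation used on the slide
alpha = 0.05

se = sigma * (1 / n1 + 1 / n2) ** 0.5
z_crit = norm.ppf(1 - alpha / 2)
shift = delta / se
power = (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)
print(round(power, 3))   # roughly 0.11 under the normal approximation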
16
Application-1 Power Changes
Baseline: n = 29, two-sample test, 11% power, δ = 5.4, σ = 17.49, α = 0.05, two-sided test.
Variance/standard deviation (σ):
σ: 17.49 → 4.5, power: 11% → 80%
σ: 17.49 → 20, power: 11% → 9%
Significance level (α):
α: 0.05 → 0.01, power: 11% → 3%
α: 0.05 → 0.10, power: 11% → 18%
17
Application-1 Power Changes
Baseline: n = 29, two-sample test, 11% power, δ = 5.4, σ = 17.49, α = 0.05, two-sided test.
Difference to be detected (δ):
δ: 5.4 → 3, power: 11% → 7%
δ: 5.4 → 7, power: 11% → 15%
Sample size (n):
n: 29 → 58, power: 11% → 17%
n: 29 → 25, power: 11% → 10%
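The sensitivities on this and the previous slide can be reproduced approximately with the same normal-approximation sketch by varying one input at a time (the values differ slightly from the slide's t-based figures):

from scipy.stats import norm

def two_sample_power(delta, sigma, n1, n2, alpha=0.05):
    """Two-sided two-sample z-test power (normal approximation)."""
    se = sigma * (1 / n1 + 1 / n2) ** 0.5
    z_crit = norm.ppf(1 - alpha / 2)
    shift = delta / se
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

baseline = dict(delta=5.4, sigma=17.49, n1=8, n2=21, alpha=0.05)
print("baseline  :", round(two_sample_power(**baseline), 2))                     # about 0.11
print("sigma=4.5 :", round(two_sample_power(**{**baseline, "sigma": 4.5}), 2))   # about 0.82
print("delta=7   :", round(two_sample_power(**{**baseline, "delta": 7.0}), 2))   # about 0.16
print("alpha=0.01:", round(two_sample_power(**{**baseline, "alpha": 0.01}), 2))  # about 0.03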
18
Sample Size In a research study, a statistical test is applied to determine whether or not there is a significant difference between the means or proportions observed in the comparison groups. Before undertaking a study, the investigator should first determine the minimum number of subjects (i.e., sample size estimation) that must be enrolled in each group in order that the null hypothesis can be rejected if it is false.
19
Sample Size Sample size estimations are warranted in all clinical studies for both ethical and scientific reasons. The ethical reasons pertain to the risks of enrolling either an inadequate number of subjects or more subjects than the minimum necessary to reject the null hypothesis. In both instances, the risks include randomizing the care of subjects and/or exposing them to unnecessary risk/harm. The scientific reasons pertain to the enrollment of more subjects than necessary, because it extends the duration of clinical research studies and increases their costs.
20
Sample Size Study design depends on:
Variables of interest (type of data, e.g. continuous, categorical)
Desired power
Desired significance level
Effect/difference of clinical importance
Standard deviations of continuous outcome variables
One- or two-sided tests
21
Tools to Calculate Sample Size
Formulae:
General formula: these can be complex
Quick formula: for particular power and significance levels and specified tests
Special tables for different tests
Altman's nomogram
Computer software
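As an example of the software route, here is a sketch using the Python statsmodels package (the effect size, significance level and power below are illustrative assumptions) that solves for the per-group sample size of a two-sample t-test:

from statsmodels.stats.power import TTestIndPower

# effect_size is Cohen's d = (mu1 - mu2) / sigma; the numbers are assumptions
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(round(n_per_group, 1))   # about 64 subjects per group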
22
Application-2 Study effect of new sleep aid 1 sample test
Baseline to sleep time after taking the medication for one week. Two-sided test, α = 0.05, power = 90%. Difference = 1 hour (from 4 hours of sleep to 5). Standard deviation = 2 hr. Sample size can be calculated as follows:
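A minimal sketch of this calculation, assuming the standard normal-approximation formula n = ((z1-α/2 + zpower) · σ / δ)² with the values above:

import math
from scipy.stats import norm

def one_sample_n(delta, sigma, alpha=0.05, power=0.90):
    """Two-sided one-sample z-test sample size (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

print(one_sample_n(delta=1, sigma=2))   # about 43 subjects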
23
Application-2 Change Effect or Difference Change Power
Change effect or difference: change the difference of interest from 1 hr to 2 hr; n goes from 43 to 11.
Change power: change power from 90% to 80%; n goes from 11 to 8.
24
Application-2 Change Standard Deviation
Change the standard deviation from 2 to 3; n goes from 8 to 18.
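The same normal-approximation formula reproduces the figures on these slides when one input is varied at a time (a self-contained sketch using the slide's values):

import math
from scipy.stats import norm

def n_needed(delta, sigma, alpha=0.05, power=0.90):
    """One-sample, two-sided sample size under a normal approximation."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil((z * sigma / delta) ** 2)

print(n_needed(delta=2, sigma=2, power=0.90))   # about 11
print(n_needed(delta=2, sigma=2, power=0.80))   # about 8
print(n_needed(delta=2, sigma=3, power=0.80))   # about 18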
25
Application-2 Changes in the detectable difference have HUGE impacts on sample size:
20-point difference → 25 patients/group
10-point difference → 100 patients/group
5-point difference → 400 patients/group
Changes in α, β, σ, the number of samples, and whether the test is 1- or 2-sided can all have a large impact on your sample size.
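The inverse-square relationship between the detectable difference and the required sample size can be seen in a short sketch (per-group n for a two-sample comparison under a normal approximation; σ = 25 and 80% power are assumptions chosen to roughly match the figures above, not values stated on the slide):

import math
from scipy.stats import norm

def two_sample_n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample z-test (normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (z * sigma / delta) ** 2)

for delta in (20, 10, 5):
    # halving the detectable difference roughly quadruples the required n
    print(delta, two_sample_n_per_group(delta, sigma=25))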
26
Conclusion Sample-size planning is often important, and almost always difficult. It requires care in eliciting scientific objectives and in obtaining suitable quantitative information prior to the study. Successful resolution of the sample-size problem requires the close and honest collaboration of statisticians and subject-matter experts. Power and sample size analysis based on pilot data give valuable information on the performance of the experiment and can thereby guide further decisions on experimental design.
27
Conclusion Researchers can use these calculations as a tool to increase the strength of their inferences, and editors and reviewers can demand that statistical power be reported whenever a non-significant result is obtained.