Sample size calculation

Introduction to Intervention Epidemiology, Tunis, 3 November 2014
Ahmed Zaghloul, Ministry of Health and Population - Egypt

Learning objectives
To understand:
- Why we calculate the sample size
- Types of sample size calculations
- Parameters needed to calculate a sample size
- How to calculate sample size

Statistical inference
[Diagram: a sample is drawn from the population; conclusions based on the sample are generalised to the population as hypotheses.]
Source of pictures: CLKER Dorsten Airebouroughnf

How to be confident? What is the proportion of females among participants of the course? Answers? Are you confident/sure?

How did the blind men see the elephant?

Issues in sample size calculation
- The objective is to estimate the minimum sample needed to measure the factor of interest with the accepted confidence level.
- The sample size calculation should relate to the study's primary outcome variable.
- Are there secondary outcomes? How should they be considered in the calculation?
- The decision on sample size remains a trade-off between ideal study size and resources.
- NCD survey examples

Types of Sample Size Calculation
Descriptive study:
- Proportion
- Mean
Analytical study:
- Cohort / cross-sectional
- Case-control

Sample size for proportion
- Population size (N)
- Expected proportion/prevalence (from the literature or experts)
- Confidence level (95%)
- Desired precision (±)

Probability sampling (or random sampling)
Equation for probability sampling
- Probability theory applies
- Calculation of random error is possible
Simple random sampling: $95\%\ \mathrm{CI} = p \pm 1.96 \cdot \sqrt{p(1-p)/n}$, where $SE = \sqrt{p(1-p)/n}$ is the standard error (sampling error).
How big a sample do we need to be 95% sure that the sample proportion of females (or of males) is within ±3% of the true proportion?
$n = \frac{1.96^2 \cdot p(1-p)}{d^2}$

Sample size calculation: simple random sampling
Equation for probability sampling
Simple random sampling (from a large population):
$95\%\ \mathrm{CI} = p \pm 1.96 \cdot \sqrt{p(1-p)/n}$
$d = 1.96 \cdot \sqrt{p(1-p)/n}$
Solve for n:
$n = \frac{1.96^2 \cdot p(1-p)}{d^2}$
n: sample size
p: proportion
d: precision of estimate
Example:
- result 90% (p)
- precision of estimate ±3% (d) [with 95% confidence]
$n = \frac{3.842 \cdot 90 \cdot 10}{3^2} = \frac{3.842 \cdot 900}{9} \approx 384$
The formulas on this slide do not include a finite population correction.
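The worked example on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_size_proportion` is made up for illustration, and it takes p and d as fractions (0.90, 0.03) rather than percentages. It applies the slide's formula without a finite population correction and leaves the rounding to the reader.

```python
def sample_size_proportion(p, d, z=1.96):
    """Sample size for estimating a proportion under simple random sampling.

    p: expected proportion, as a fraction between 0 and 1
    d: desired absolute precision, as a fraction (0.03 means +/- 3%)
    z: standard normal quantile (1.96 for 95% confidence)
    No finite population correction is applied (large population assumed).
    """
    if not 0 < p < 1 or d <= 0:
        raise ValueError("p must be in (0, 1) and d must be positive")
    return z ** 2 * p * (1 - p) / d ** 2


# Slide example: expected result 90%, precision +/- 3%, 95% confidence.
n = sample_size_proportion(p=0.90, d=0.03)
print(f"{n:.1f}")  # about 384, as on the slide
```

Because p(1-p) is largest at p = 0.5, an expected proportion of 50% gives the largest sample size for a fixed precision.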

Software to Calculate Sample Size
OpenEpi :

Calculation using OpenEpi
Population size – what is the size of the population to be surveyed? The default value is one million. In general, when sampling from a "large" population, whether there are 100,000 in the population or 100 million, this will not affect the sample size calculation. However, when the population size is small, this could reduce the sample size.

Anticipated % frequency (p) – provide an educated guess of the percent of the population with the outcome of interest. Of course, if you knew this with any accuracy, you would not need to perform the survey. If you are unsure of the percentage, use of 50% will result in the largest sample size (holding the other three pieces of information being requested the same).

Confidence limits as ± percent of 100 – this question is asking exactly how wide (in absolute terms) you would like the confidence interval to be around your point estimate. Using the default values, with an anticipated frequency of 50% and confidence limits of ±5%, for the calculated sample size the confidence interval would be 50% ± 5%, i.e., (45%, 55%). With an anticipated frequency of 85% and confidence limits of ±10%, for the calculated sample size the confidence interval would be (75%, 95%).

Design effect – if simple random sampling is to be used to select individuals (or whatever the element of analysis), then the design effect (DEFF) should be left as one. If a cluster-type survey is being used in the sampling methodology, the DEFF is frequently larger than one, perhaps in the range of 2 to 10 depending on the outcome being studied. An estimate of the DEFF can usually be found in the literature.

Examples
For a population of 1 million, a confidence level of 95%, a confidence limit of ±5%, a design effect of 1.0 and a change in the expected proportion of the primary outcome as follows:

Expected proportion | Sample size
10% | 139
30% | 323
50% | 384
70% |
90% |
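The sample sizes in this example can be reproduced with a short script. The sketch below assumes the formula that OpenEpi is understood to use for a proportion, n = DEFF · N · p(1-p) / [(d²/z²)(N-1) + p(1-p)], which includes the finite population correction and the design effect; treat that formula, the function name `sample_size_survey` and its parameter names as assumptions for illustration rather than a copy of OpenEpi's code.

```python
import math
from statistics import NormalDist


def sample_size_survey(p, d, population=1_000_000, deff=1.0, confidence=0.95):
    """Sample size for a proportion with finite population correction.

    p: anticipated frequency (fraction), d: absolute precision (fraction),
    population: population size N, deff: design effect (1.0 for simple
    random sampling), confidence: two-sided confidence level.
    Formula assumed: n = DEFF*N*p(1-p) / ((d^2/z^2)*(N-1) + p(1-p)).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    numerator = deff * population * p * (1 - p)
    denominator = (d ** 2 / z ** 2) * (population - 1) + p * (1 - p)
    return math.ceil(numerator / denominator)  # round up to whole persons


for p in (0.10, 0.30, 0.50):
    print(f"{p:.0%}: {sample_size_survey(p, d=0.05)}")
```

With these inputs the script gives 139, 323 and 384, matching the slide. Setting `deff=2.0` roughly doubles the result, and a small `population` (for example 1,000) reduces it.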

Determinants of sample size
Sample size is determined by various factors, such as:
- Significance level (α)
- Power (1-β)
- Expected prevalence
- RR or OR of the outcome of interest
It also depends on the type of study: descriptive or analytical.

Individual on Trial (Did he commit the crime?)
Jury verdict | Truth: innocent | Truth: guilty
Not guilty | Correct decision | Error: type II error (β error)
Guilty | Error: type I error (α error) | Correct decision

Significance testing - Hypothesis
Null hypothesis (H0): any difference is due to chance. There is no significant difference between the specified populations; any observed difference is due to sampling or experimental error.
Alternative hypothesis (H1): accepted when the null hypothesis is rejected because there is a true difference (the difference is not due to chance).

Right decisions and types of errors
Decision | Truth: H0 is true | Truth: H0 is not true
H0 is not rejected | Correct decision | Type II error (β error)
H0 is rejected | Type I error (α error) | Correct decision

Significance level (p value)
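α: type I error
Probability of finding a difference when no difference exists
Usually set to 5% (0.05)
Confidence level (1-α): 95%
p-value: probability of observing a difference at least as large as the one found if there is truly no difference; considered significant when < 0.05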

Power (1-β)
β: type II error
Probability of not finding a difference when a difference really does exist
Power (1-β): probability of finding a difference when a difference really does exist (= sensitivity)
Usually set to 80%
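To make the link between power and sample size concrete, the sketch below estimates the power of a two-sided comparison of two proportions with equal group sizes, using a normal approximation. The function name `power_two_proportions` and the illustrative values (60% versus 40% ill) are assumptions chosen for this example, and the approximation is rough for very small groups.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p0, n_per_group, alpha=0.05):
    """Approximate power to detect p1 vs p0 with n_per_group in each group.

    Normal approximation for a two-sided test at significance level alpha.
    The negligible chance of rejecting H0 in the wrong direction is ignored.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p0) / 2
    # Standard error of the difference under H0 (pooled) and under H1.
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / n_per_group)
    return norm.cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)


# Same true difference (60% vs 40%), increasing group sizes.
for n in (5, 50, 500):
    print(n, round(power_two_proportions(0.60, 0.40, n), 2))
```

The same true difference is detected with low probability in small groups and with near certainty in large ones, which is why power is fixed (usually at 80%) before the sample size is calculated.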

The four values required for a sample size calculation are:
· Two-sided confidence level – most individuals would choose a 95% confidence interval, but a different confidence interval could be entered.
· Power – most individuals choose a power value of 80% or 90%; however, any power level can be entered.
· Ratio of Unexposed to Exposed in sample – place the desired ratio of unexposed individuals to exposed individuals. If there are to be an equal number of unexposed and exposed, then enter the value of 1.0; if there are to be twice as many unexposed as exposed, enter the value of 2.0. Any other ratio can be entered.
· Percent of Unexposed with Outcome – enter an estimate of the percentage of unexposed individuals that will develop (or have) the outcome of interest. For example, in a randomized controlled trial, you would estimate the percentage of those in the comparison group that will develop the outcome of interest during the trial. In a cohort study, enter the percentage of unexposed individuals who will develop the outcome of interest during the study. In a cross-sectional study, enter the estimated prevalence of disease among the unexposed.
The user has the choice of entering an odds ratio, the percent of exposed with the outcome of interest, the risk (prevalence) ratio or the risk (prevalence) difference; just enter one of these. The results using the default values for a risk ratio of 2 are below:

Sample size for cohort
- desired level of significance (α): 0.05
- power of the study (1-β): 80%
- relative risk (RR) worth detecting
- expected frequency/prevalence of disease in the unexposed population (p)
- ratio of unexposed to exposed

Formula: cohort
n = number of exposed or non-exposed
N = 2·n, the total sample size
p = incidence of the disease in the cohort
q = 1 - p
p0 = incidence of the disease among the non-exposed
RR = relative risk
εα = value for the specified alpha
ε2β = value for the specified power
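The equation itself does not appear in this transcript. As a hedged illustration, the sketch below uses a commonly taught normal-approximation formula for comparing two proportions with equal group sizes, which is consistent with the symbols defined above but is an assumption, not necessarily the slide's exact equation: n = [z_α·√(2pq) + z_β·√(p1·q1 + p0·q0)]² / (p1 - p0)², with p1 = RR·p0 and p = (p0 + p1)/2. The function name `cohort_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def cohort_sample_size(p0, rr, alpha=0.05, power=0.80):
    """Number needed in EACH group (exposed and non-exposed), equal groups.

    p0: incidence among the non-exposed (fraction)
    rr: relative risk worth detecting
    Assumed formula (normal approximation, no continuity correction):
    n = (z_a*sqrt(2*p*q) + z_b*sqrt(p1*q1 + p0*q0))^2 / (p1 - p0)^2
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = norm.inv_cdf(power)
    p1 = rr * p0  # incidence among the exposed
    if not 0 < p0 < 1 or not 0 < p1 < 1 or p1 == p0:
        raise ValueError("p0 and rr*p0 must be distinct values in (0, 1)")
    p = (p0 + p1) / 2  # incidence in the whole cohort (equal groups)
    q = 1 - p
    numerator = (z_alpha * math.sqrt(2 * p * q)
                 + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Illustrative inputs: 5% incidence in the non-exposed, RR of 2 worth detecting.
n = cohort_sample_size(p0=0.05, rr=2.0)
print(n, 2 * n)  # per-group size n and total N = 2*n
```

Smaller relative risks or rarer outcomes drive the required n up quickly, so it is worth trying several plausible values of p0 and RR before fixing the design.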

Sample size for case-control
- desired level of significance (α)
- power of the study (1-β)
- odds ratio (OR) worth detecting
- prevalence of exposed persons in the source population
- number of cases
- number of controls per case (ratio of controls to cases)

Formula: case-control
N = number of cases
p = proportion of exposed in the whole sample
q = 1 - p
c = number of controls per case
P0 = proportion of exposed among the controls
P1 = proportion of exposed among the cases
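The equation for this slide is also missing from the transcript. The sketch below assumes a widely used formula for unmatched case-control studies that fits the symbols listed above: N = (1 + 1/c) · p · q · (z_α + z_β)² / (P1 - P0)², where P1 is derived from the odds ratio as P1 = OR·P0 / (1 + P0·(OR - 1)) and p = (P1 + c·P0)/(1 + c). Both the choice of formula and the function name `case_control_sample_size` are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def case_control_sample_size(p0, odds_ratio, controls_per_case=1.0,
                             alpha=0.05, power=0.80):
    """Return (number of cases, number of controls) for an unmatched design.

    p0: proportion exposed among controls (fraction)
    odds_ratio: odds ratio worth detecting
    Assumed formula: N = (1 + 1/c) * p*q * (z_a + z_b)^2 / (P1 - P0)^2
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    c = controls_per_case
    # Proportion exposed among cases implied by P0 and the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    if p1 == p0:
        raise ValueError("odds_ratio must differ from 1")
    p = (p1 + c * p0) / (1 + c)  # proportion exposed in the whole sample
    q = 1 - p
    n_cases = math.ceil((1 + 1 / c) * p * q * (z_alpha + z_beta) ** 2
                        / (p1 - p0) ** 2)
    return n_cases, math.ceil(c * n_cases)


# Illustrative inputs: 40% of controls exposed, OR of 2 worth detecting.
print(case_control_sample_size(p0=0.40, odds_ratio=2.0))
print(case_control_sample_size(p0=0.40, odds_ratio=2.0, controls_per_case=2.0))
```

Raising the number of controls per case lowers the number of cases needed, which helps when cases are rare, although the gain shrinks as the ratio grows.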

The four values required for a sample size calculation are:
· Two-sided confidence level – most individuals would choose a 95% confidence interval, but a different confidence interval could be entered.
· Power – most individuals choose a power value of 80% or 90%; however, any power level can be entered.
· Ratio of Controls to Cases – place the desired ratio of controls to cases. If there are to be an equal number of controls and cases, then enter the value of 1.0; if there are to be twice as many controls as cases, enter the value of 2.0. Any other ratio can be entered.
· Percent of controls exposed – enter an estimate of the percentage of controls that have (or had) the exposure of interest. For example, in a case-control study on cigarette smoking and lung cancer, among the controls (those without lung cancer), what percent would be expected to say they smoked cigarettes?
The user has the choice of entering an odds ratio or the percent of cases with the exposure of interest; just enter one of these, not both.

Effect of sample size
Objective: to determine if exposure to oysters is related to hepatitis A infection
3 cohort studies:
- Study A (N=10)
- Study B (N=100)
- Study C (N=1000)

Cohort Study A (N=10)
Table A (N=10)
Ate oysters | Became ill: Yes | Total | AR
Yes | 3 | 5 | 3/5
No | 2 | 5 | 2/5
Total | 5 | 10 | 5/10
Calculate the relative risk, confidence interval and p value.
RR=1.5, 95% CI: , p=0.53

Cohort Study B (N=100)
Table B (N=100)
Ate oysters | Became ill: Yes | Total | AR
Yes | 30 | 50 | 30/50
No | 20 | 50 | 20/50
Total | 50 | 100 | 50/100
RR=1.5, 95% CI: , p=0.046

Cohort Study C (N=1000)
Table C (N=1000)
Ate oysters | Became ill: Yes | Total | AR
Yes | 300 | 500 | 300/500
No | 200 | 500 | 200/500
Total | 500 | 1000 | 500/1000
RR=1.5, 95% CI: , p<0.001

Sample size and power
All studies: RR=1.5, AR exposed: 60%, AR non-exposed: 40%
- Study A (N=10): no significant association with oysters (p=0.53)
- Study B (N=100): significant association with oysters (p=0.046)
- Study C (N=1000): significant association with oysters (p<0.001)
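The relative risks and p values of the three studies can be recomputed from the 2x2 counts. The sketch below is one way to do it with the standard library only; it assumes an uncorrected Pearson chi-square test (1 degree of freedom) for the p value and a log-based (Katz) 95% confidence interval for the RR. The interval method is an assumption, since the intervals are not shown in this transcript, and the function name `risk_ratio_summary` is hypothetical.

```python
import math


def risk_ratio_summary(a, n1, c, n0, z=1.96):
    """RR, 95% CI and chi-square p value for a cohort 2x2 table.

    a of n1 exposed became ill; c of n0 non-exposed became ill.
    CI: Katz log method (assumed). p value: Pearson chi-square, no
    continuity correction, 1 degree of freedom.
    """
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    ci = (math.exp(math.log(rr) - z * se_log_rr),
          math.exp(math.log(rr) + z * se_log_rr))
    b, d = n1 - a, n0 - c  # exposed not ill, non-exposed not ill
    n = n1 + n0
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n0 * (a + c) * (b + d))
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return rr, ci, p_value


studies = {"A": (3, 5, 2, 5), "B": (30, 50, 20, 50), "C": (300, 500, 200, 500)}
for name, counts in studies.items():
    rr, (low, high), p = risk_ratio_summary(*counts)
    print(f"Study {name}: RR={rr:.2f}, 95% CI {low:.2f} to {high:.2f}, p={p:.3g}")
```

The RR is identical in all three studies; only the interval width and the p value change with N. Different interval or test methods (for example Fisher's exact test for Study A) would give somewhat different numbers.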

Effect of sample size
The larger the sample, the higher the precision.
[Figure: % ill]

Sample size (What else to include?)
Need to increase the initial numbers in accordance with:
- The expected response rate
- The loss to follow-up
- The lack of compliance
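A simple way to apply these adjustments is to divide the calculated sample size by the proportion of participants expected to remain in the analysis. The sketch below assumes the three losses act independently and multiply together, which is a simplification; the function name `inflate_sample_size` and the example rates are made up for illustration.

```python
def inflate_sample_size(n, response_rate=1.0, loss_to_follow_up=0.0,
                        non_compliance=0.0):
    """Number to recruit so that about n participants remain analysable.

    response_rate: expected fraction who agree to take part
    loss_to_follow_up: expected fraction lost during the study
    non_compliance: expected fraction who do not comply
    Assumes the three effects are independent (a simplification).
    Returns an unrounded value; round up before fieldwork.
    """
    retained = response_rate * (1 - loss_to_follow_up) * (1 - non_compliance)
    if not 0 < retained <= 1:
        raise ValueError("the retained fraction must be in (0, 1]")
    return n / retained


# A calculated sample of 384 with an expected 80% response rate.
print(round(inflate_sample_size(384, response_rate=0.80), 1))
```

In this example roughly 480 people would need to be approached to end up with about 384 respondents.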

Conclusions
Sample size calculation and sampling methods:
- Minimum sample needed to measure the factor of interest
- Generalization to the population at reduced cost and time (inference)
- Consider response rate and loss to follow-up
- Sampling methods
- Use a PC, but understand the variables of the equations
To ensure representativeness (and be able to generalize our estimates to the population), sample size calculation and sampling methods are essential.

Conclusions
If in doubt, call a statistician!

Thank you
Ahmed Zaghloul, Ministry of Health and Population - Egypt
