Sample size calculation

Sample size calculation Introduction to Intervention Epidemiology Tunis, 3 November 2014 Ahmed Zaghloul Ministry of Health and Population - Egypt ahmed.helmy@outlook.com

Learning objectives. To understand: why we calculate the sample size; the types of sample size calculations; the parameters needed to calculate a sample size; how to calculate the sample size.

Statistical inference: hypotheses about a population are examined in a sample; conclusions are based on the sample and then generalised to the population. Source of pictures: CLKER http://www.clker.com/clipart-population.html; Dorsten Airebouroughnf http://aireboroughnf.files.wordpress.com/2013/06/questionnaire.jpg

How to be confident? What is the proportion of females among participants of the course? Answers? Are you confident/sure?

How the blind men saw an elephant

Issues in sample size calculation. The objective is to estimate the minimum sample needed to measure the factor of interest with the accepted confidence level. The sample size calculation should relate to the study's primary outcome variable. Are there secondary outcomes? How should they be considered in the calculation? The decision on sample size remains a trade-off between the ideal study size and the available resources. NCD survey examples.

Types of sample size calculation. Descriptive study: proportion, mean. Analytical study: cohort/cross-sectional, case-control.

Sample size for proportion. Inputs: population size (N); expected proportion/prevalence (from the literature or experts); confidence level (95%); desired precision (±).

Probability sampling (or random sampling). How big a sample do we need to be 95% sure that the sample proportion of females (or of males) is within ±3% of the true proportion?

Probability sampling (or random sampling): probability theory applies, so the random error can be calculated. For simple random sampling, $95\%\ \mathrm{CI} = p \pm 1.96 \cdot \sqrt{p(1-p)/n}$, where $\sqrt{p(1-p)/n}$ is the standard error (SE), or sampling error. How big a sample do we need to be 95% sure that the sample proportion of females (or of males) is within ±3% of the true proportion? For a precision d: $n = 1.96^2 \cdot p(1-p)/d^2$.

Sample size calculation: simple random sampling (from a large population). $95\%\ \mathrm{CI} = p \pm 1.96 \cdot \sqrt{p(1-p)/n}$, so the precision is $d = 1.96 \cdot \sqrt{p(1-p)/n}$. Solve for n: $n = 1.96^2 \cdot p(1-p)/d^2 = 3.842 \cdot 90 \cdot 10 / 9 = 384$ (with p and d in percent). The formulas on this slide are without the finite population correction. n: sample size; p: proportion; d: precision of the estimate. Example: result 90% (p), precision of estimate ±3% (d), with 95% confidence.
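The worked example above can be checked with a few lines of Python. This is a minimal sketch, assuming proportions are entered as fractions rather than percentages; the function name `sample_size_proportion` is illustrative and not from the slides.

```python
from statistics import NormalDist


def sample_size_proportion(p, d, confidence=0.95):
    """n = z^2 * p * (1 - p) / d^2, without finite population correction."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / d ** 2


# Slide example: expected result 90%, precision +/-3%, 95% confidence.
n = sample_size_proportion(p=0.90, d=0.03)
print(f"n = {n:.1f}")
```

The unrounded value is just above 384, which the slide reports as 384; rounding up to the next whole subject is the more cautious convention.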

Software to Calculate Sample Size OpenEpi : http://openepi.com/Menu/OE_Menu.htm

Calculation using OpenEpi
· Population size: what is the size of the population to be surveyed? The default value is one million. In general, when sampling from a “large” population, whether there are 100,000 in the population or 100 million, this will not affect the sample size calculation. However, when the population size is small, this could reduce the sample size.
· Anticipated % frequency (p): provide an educated guess of the percent of the population with the outcome of interest. Of course, if you knew this with any accuracy, you would not need to perform the survey. If you are unsure of the percentage, using 50% will result in the largest sample size (holding the other three pieces of information the same).
· Confidence limits as ± percent of 100: this asks how wide (in absolute terms) you would like the confidence interval to be around your point estimate. Using the default values, with an anticipated frequency of 50% and confidence limits of ±5%, for the calculated sample size the confidence interval would be 50% ± 5%, i.e., (45%, 55%). With an anticipated frequency of 85% and confidence limits of ±10%, for the calculated sample size the confidence interval would be (75%, 95%).
· Design effect: if simple random sampling is to be used to select individuals (or whatever the element of analysis), then the design effect (DEFF) should be left as one. If a cluster-type survey is being used in the sampling methodology, the DEFF is frequently larger than one, perhaps in the range of 2 to 10 depending on the outcome being studied. An estimate of the DEFF can usually be found in the literature.

Examples. For a population of 1 million, a confidence level of 95%, confidence limits of ±5%, a design effect of 1.0 and an expected proportion of the primary outcome varying over 10%, 30%, 50%, 70% and 90%:
Expected proportion   Sample size
10%                   139
30%                   323
50%                   384
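The figures above can be approximated in code. The sketch below assumes the usual finite-population formula with a design effect, n = DEFF · N·p(1-p) / [(d²/z²)(N-1) + p(1-p)], rounded up; the slides do not state which formula the calculator uses, so treat this as an assumption. The function name `sample_size_survey` is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_survey(p, d, population=1_000_000, deff=1.0, confidence=0.95):
    """Sample size for a proportion with finite population correction (assumed formula)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    pq = p * (1 - p)
    n = deff * population * pq / ((d ** 2 / z ** 2) * (population - 1) + pq)
    return math.ceil(n)  # round up to whole subjects


# Expected proportions from the slide, with confidence limits of +/-5%.
for p in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"{p:.0%}: {sample_size_survey(p, d=0.05)}")
```

Under this assumption the results for 10%, 30% and 50% agree with the values listed on the slide (139, 323, 384). Passing a small `population` or a `deff` above 1 shows how the population size and cluster sampling change the answer.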

Determinants of sample size. Sample size is determined by various factors, such as: confidence level (1-α), i.e. significance level α; power (1-β); expected prevalence; RR or OR of the outcome of interest. It also depends on the type of study: descriptive or analytical.

Individual on trial (did he commit the crime?)
Truth: innocent or guilty. Jury verdict: not guilty or guilty.
Jury verdict   Truth: innocent      Truth: guilty
Not guilty     Correct decision     Type II error (β error)
Guilty         Type I error (α error)   Correct decision

Significance testing: hypotheses. Null hypothesis (H0): any difference is due to chance. There is no significant difference between specified populations, any observed difference being due to sampling or experimental error. Alternative hypothesis (H1): accepted when the null hypothesis is rejected, because there is a true difference (the difference is not due to chance).

Right decisions and types of errors
Decision             Truth: H0 is true        Truth: H0 is not true
H0 is not rejected   Correct decision         Type II error (β error)
H0 is rejected       Type I error (α error)   Correct decision

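Significance level (α). α: type I error, the probability of finding a difference when no difference exists. Usually set to 5% (0.05), which corresponds to a confidence level (1-α) of 95%. A result is called significant when the p-value is < 0.05. The p-value is the probability of observing a difference at least this large by chance alone if the null hypothesis were true.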

Power (1-β). β: type II error, the probability of not finding a difference when a difference really does exist. Power (1-β): the probability of finding a difference when a difference really does exist (= sensitivity). Usually set to 80%.
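To make the link between power and sample size concrete, here is a rough sketch that approximates the power of a two-sided comparison of two proportions with the normal approximation. The formula choice, the function name `power_two_proportions` and the illustrative risks of 60% and 40% are assumptions for the example, not taken from this slide.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p0, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) for comparing two proportions, equal group sizes."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p0) / 2
    # Standard error of the difference under H0 (pooled) and under H1.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / n_per_group)
    return norm.cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)


# Same true difference (60% vs 40%), increasing group sizes.
for n in (5, 50, 500):
    print(f"n per group = {n}: power = {power_two_proportions(0.60, 0.40, n):.2f}")
```

With the same true difference, power rises from well below the conventional 80% in the smallest groups to nearly 100% in the largest. The approximation is crude for very small groups and is only meant to show the trend.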

Calculation using OpenEpi. The four values required for a sample size calculation are:
· Two-sided confidence level: most individuals would choose a 95% confidence interval, but a different confidence interval could be entered.
· Power: most individuals choose a power value of 80% or 90%; however, any power level can be entered.
· Ratio of unexposed to exposed in sample: enter the desired ratio of unexposed individuals to exposed individuals. If there are to be an equal number of unexposed and exposed, enter the value of 1.0; if there are to be twice as many unexposed as exposed, enter the value of 2.0. Any other ratio can be entered.
· Percent of unexposed with outcome: enter an estimate of the percentage of unexposed individuals that will develop (or have) the outcome of interest. For example, in a randomized controlled trial, you would estimate the percentage of those in the comparison group that will develop the outcome of interest during the trial. In a cohort study, enter the percentage of unexposed individuals who will develop the outcome of interest during the study. In a cross-sectional study, enter the estimated prevalence of disease among the unexposed.
The user has the choice of entering an odds ratio, the percent of exposed with the outcome of interest, the risk (prevalence) ratio or the risk (prevalence) difference; just enter one of these. The results using the default values for a risk ratio of 2 are below:

Sample size for cohort. Inputs: desired level of significance (α): 0.05; power of the study (1-β): 80%; relative risk (RR) worth detecting; expected frequency/prevalence of disease in the unexposed population (p); ratio of unexposed to exposed.

Formula: cohort. n = number of exposed or non-exposed; N = 2·n, the total sample size; p = incidence of the disease in the cohort; q = 1 - p; p0 = incidence of the disease among the non-exposed; RR = relative risk; εα = value for specified alpha; ε2β = value for specified power.
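The formula itself did not survive in this transcript, so the sketch below substitutes one commonly used two-proportion formula for equal-sized groups, n = [z_α·√(2pq) + z_β·√(p1·q1 + p0·q0)]² / (p1 - p0)², with p1 = RR·p0 and p = (p1 + p0)/2. It is an assumption that this corresponds to the slide's formula; the function name `cohort_sample_size` and the example inputs are illustrative.

```python
import math
from statistics import NormalDist


def cohort_sample_size(p0, rr, alpha=0.05, power=0.80):
    """Subjects needed per group (exposed = unexposed), assumed formula without continuity correction."""
    p1 = p0 * rr  # expected incidence among the exposed
    if not 0 < p1 < 1:
        raise ValueError("p0 * rr must lie strictly between 0 and 1")
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # value for the specified alpha (two-sided)
    z_beta = norm.inv_cdf(power)           # value for the specified power
    p = (p1 + p0) / 2                      # incidence in the whole cohort
    q = 1 - p
    numerator = (z_alpha * math.sqrt(2 * p * q)
                 + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Illustrative inputs: 5% incidence among the non-exposed, RR of 2 worth detecting.
n = cohort_sample_size(p0=0.05, rr=2.0)
print(f"n per group = {n}, total N = {2 * n}")
```

Other published variants (for example with a continuity correction or unequal group sizes) give somewhat different numbers, so the output is best read as an order of magnitude.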

Sample size for case-control. Inputs: desired level of significance (α); power of the study (1-β); odds ratio (OR) worth detecting; prevalence of exposed persons in the source population; number of cases; number of controls per case (ratio of controls to cases).

Formula: case-control. N = number of cases; p = proportion of exposed in the whole sample; q = 1 - p; c = number of controls per case; P0 = proportion of exposed among the controls; P1 = proportion of exposed among the cases.
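As with the cohort slide, the formula is missing from the transcript. A commonly used form that fits the symbols listed above is N = (1 + 1/c)·p·q·(z_α + z_β)² / (P1 - P0)², with P1 derived from the odds ratio as P1 = OR·P0 / (1 + P0·(OR - 1)) and p = (P1 + c·P0)/(1 + c). The sketch assumes that form; the function name `case_control_sample_size` and the example inputs are illustrative.

```python
import math
from statistics import NormalDist


def case_control_sample_size(p0, odds_ratio, c=1.0, alpha=0.05, power=0.80):
    """Return (cases, controls) under the assumed formula, c controls per case."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    # Proportion of exposed among the cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p = (p1 + c * p0) / (1 + c)  # proportion exposed in the whole sample
    q = 1 - p
    cases = (1 + 1 / c) * p * q * (z_alpha + z_beta) ** 2 / (p1 - p0) ** 2
    cases = math.ceil(cases)
    return cases, math.ceil(c * cases)


# Illustrative inputs: 40% of controls exposed, OR of 2 worth detecting.
print(case_control_sample_size(p0=0.40, odds_ratio=2.0, c=1))
print(case_control_sample_size(p0=0.40, odds_ratio=2.0, c=2))
```

Raising the number of controls per case lowers the number of cases needed, which is useful when cases are scarce, though the gain shrinks as c grows.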

Calculation using OpenEpi. The four values required for a sample size calculation are:
· Two-sided confidence level: most individuals would choose a 95% confidence interval, but a different confidence interval could be entered.
· Power: most individuals choose a power value of 80% or 90%; however, any power level can be entered.
· Ratio of controls to cases: enter the desired ratio of controls to cases. If there are to be an equal number of controls and cases, enter the value of 1.0; if there are to be twice as many controls as cases, enter the value of 2.0. Any other ratio can be entered.
· Percent of controls exposed: enter an estimate of the percentage of controls that have (or had) the exposure of interest. For example, in a case-control study on cigarette smoking and lung cancer, among the controls (those without lung cancer), what percent would be expected to say they smoked cigarettes?
The user has the choice of entering an odds ratio or the percent of cases with the exposure of interest; just enter one of these, not both.

Effect of sample size. Objective: to determine if exposure to oysters is related to hepatitis A infection. 3 cohort studies: Study A (N=10), Study B (N=100), Study C (N=1000).

Cohort Study A (N=10)
Table A (N=10)
Ate oysters   Became ill   Total   AR
Yes           3            5       3/5
No            2            5       2/5
Total         5            10      5/10
Calculate the relative risk, confidence interval and p value: RR=1.5, 95% CI: 0.4-5.4, p=0.53

Cohort Study B (N=100)
Table B (N=100)
Ate oysters   Became ill   Total   AR
Yes           30           50      30/50
No            20           50      20/50
Total         50           100     50/100
RR=1.5, 95% CI: 1.0-2.3, p=0.046

Cohort Study C (N=1000)
Table C (N=1000)
Ate oysters   Became ill   Total   AR
Yes           300          500     300/500
No            200          500     200/500
Total         500          1000    500/1000
RR=1.5, 95% CI: 1.3-1.7, p<0.001

Sample size and power. All studies: RR=1.5, AR exposed 60%, AR non-exposed 40%. Study A (N=10): no significant association with oysters (p=0.53). Study B (N=100): significant association with oysters (p=0.046). Study C (N=1000): significant association with oysters (p<0.001).
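The three oyster studies can be recomputed with a short script. The slides do not say which methods produced the intervals and p-values; this sketch assumes a Wald interval on the log relative risk and an uncorrected chi-square test with 1 degree of freedom, which come out close to the reported figures. The function name `cohort_summary` is illustrative.

```python
from math import erfc, exp, log, sqrt


def cohort_summary(ill_exposed, n_exposed, ill_unexposed, n_unexposed):
    """Return (RR, CI lower, CI upper, p-value) for a 2x2 cohort table."""
    a, c = ill_exposed, ill_unexposed
    b, d = n_exposed - a, n_unexposed - c  # not ill in each group
    rr = (a / n_exposed) / (c / n_unexposed)
    # Wald 95% confidence interval on the log scale (assumed method).
    se_log_rr = sqrt(1 / a - 1 / n_exposed + 1 / c - 1 / n_unexposed)
    lower = exp(log(rr) - 1.96 * se_log_rr)
    upper = exp(log(rr) + 1.96 * se_log_rr)
    # Uncorrected chi-square statistic; for 1 df the p-value is erfc(sqrt(chi2 / 2)).
    n = n_exposed + n_unexposed
    chi2 = n * (a * d - b * c) ** 2 / (n_exposed * n_unexposed * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))
    return rr, lower, upper, p_value


studies = {"A": (3, 5, 2, 5), "B": (30, 50, 20, 50), "C": (300, 500, 200, 500)}
for name, counts in studies.items():
    rr, lower, upper, p = cohort_summary(*counts)
    print(f"Study {name}: RR={rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}, p={p:.4f}")
```

The relative risk is identical in all three studies; only the width of the interval and the p-value change with the sample size. Small differences from the slide in the last digit are due to rounding.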

Effect of sample size: the larger the sample, the higher the precision. (Figure: % ill.)

Sample size (what else to include?). The initial numbers need to be increased in accordance with: the expected response rate; the loss to follow-up; the lack of compliance.
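One simple way to apply these adjustments is to divide the calculated sample size by the proportion of subjects expected to remain in the analysis. The sketch below assumes the three losses act independently and multiply together, which is a simplification; the function name `inflate_sample_size` and the 85% response rate are illustrative.

```python
import math


def inflate_sample_size(n, response_rate=1.0, loss_to_follow_up=0.0, non_compliance=0.0):
    """Inflate n so that about n subjects remain after the expected losses."""
    # Assumption: the losses are independent, so the retained fractions multiply.
    retained = response_rate * (1 - loss_to_follow_up) * (1 - non_compliance)
    if not 0 < retained <= 1:
        raise ValueError("the retained proportion must be in (0, 1]")
    return math.ceil(n / retained)


# Example: 384 subjects needed, 85% expected response rate.
print(inflate_sample_size(384, response_rate=0.85))
```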

Conclusions: sample size calculation and sampling methods. Minimum sample needed to measure the factor of interest. Generalization to the population at reduced cost and time (inference). Consider the response rate and the loss to follow-up. Use a PC, but understand the variables of the equations. To ensure representativeness (and be able to generalize our estimates to the population), sample size calculation and sampling methods are essential.

Conclusions If in doubt… Call a statistician !!!!

Thank you. Ahmed Zaghloul, Ministry of Health and Population - Egypt, ahmed.helmy@outlook.com