Statistics for clinicians: Biostatistics course by Kevin E. Kip, Ph.D., FAHA. Professor and Executive Director, Research Center, University of South Florida, College of Nursing; Professor, College of Public Health, Department of Epidemiology and Biostatistics; Associate Member, Byrd Alzheimer’s Institute, Morsani College of Medicine, Tampa, FL, USA
Sample size estimation and correlation
SECTION 5.1 Parameters and factors that affect sample size
Learning Outcome: Describe how margin of error, effect size, and variability of the outcome affect sample size estimates.
Overview: All studies should be designed to include enough participants to adequately address the research question. In clinical trials, small-to-modest effects are typically anticipated, so large sample sizes are often needed. In some circumstances, sample sizes are determined by logistical or financial constraints. While this may be the reality, it does not ensure that the research question can be adequately addressed. Sample size estimates should account for loss to follow-up, subject crossover, suboptimal intervention compliance, and outcome rates lower than anticipated. The concepts and formulas that follow are based on statistical considerations irrespective of logistical or financial constraints.
“Effect Size”: A measure of the strength of a relationship, such as the effect of a drug versus placebo on resolution of symptoms. Cohen’s “d” is a common measure of effect size used in sample size estimation:
$d = \frac{\bar{x}_1 - \bar{x}_2}{s}$
General interpretation (Cohen’s conventional benchmarks): d ≈ 0.2 small, d ≈ 0.5 medium, d ≈ 0.8 large.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
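As an illustration of the effect size formula, the sketch below computes Cohen's d from two small samples. It is not part of the original course material: the function name `cohens_d` and the data are hypothetical, and it assumes that s is the pooled sample standard deviation of the two groups.

```python
import math
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d = (mean1 - mean2) / s, with s assumed to be the pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    # Pooled SD weights each group's variance by its degrees of freedom.
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled

# Hypothetical symptom scores for a drug group and a placebo group.
drug = [2, 4, 6]
placebo = [1, 3, 5]
print(cohens_d(drug, placebo))  # 0.5
```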
Common z values (Table 1) for various confidence intervals (precision, i.e. margin of error):
  90% C.I.: z = 1.645
  95% C.I.: z = 1.960
Common z values (Table 1) for type I and type II error rates:
  Type I ($Z_{1-\alpha/2}$): α = 0.10: 1.645; α = 0.05: 1.960
  Type II ($Z_{1-\beta}$): β = 0.20 (80% power): 0.840; β = 0.10 (90% power): 1.282
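The z values used in the sample size formulas are quantiles of the standard normal distribution, so they can also be computed rather than looked up. The following is a minimal sketch using Python's standard library; the helper names `z_two_sided` and `z_power` are hypothetical and not from the course.

```python
from statistics import NormalDist

def z_two_sided(confidence):
    """z for a two-sided confidence level, i.e. Z_(1 - alpha/2) with alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_power(power):
    """z for the desired power, i.e. Z_(1 - beta) with power = 1 - beta."""
    return NormalDist().inv_cdf(power)

print(round(z_two_sided(0.95), 2))  # 1.96
print(round(z_two_sided(0.90), 3))  # 1.645
print(round(z_power(0.80), 2))      # 0.84
print(round(z_power(0.90), 3))      # 1.282
```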
SECTION 5.2 Sample size estimates for a one sample continuous outcome
Learning Outcome: Calculate and interpret sample size estimates for a one-sample continuous outcome: ---Estimate for a confidence interval ---Estimate for a hypothesis test
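Continuous Outcome – One Sample
Sample size to estimate a C.I. for µ: $n = \left(\frac{Z\sigma}{E}\right)^2$
Sample size for a hypothesis test of $H_0: \mu = \mu_0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|\mu_1 - \mu_0|}{\sigma}$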
Continuous Outcome – One Sample (C.I.)
Sample size to estimate a C.I. for µ: $n = \left(\frac{Z\sigma}{E}\right)^2$
Example: Estimate the required sample size for a 95% C.I. of systolic blood pressure in children ages 3 to 5 with congenital heart disease.
Parameters: Margin of error: 5 units (mmHg); Assumed SD: 20; Desired C.I.: 95% (i.e. z = 1.96)
$n = \left(\frac{1.96 \times 20}{5}\right)^2 = 61.5$
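The blood pressure calculation above can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the function name `n_for_ci_mean` is hypothetical, and rounding up to a whole participant is a common convention rather than something stated on this slide.

```python
import math

def n_for_ci_mean(sd, margin, z=1.96):
    """Sample size so that a C.I. for a mean has the given margin of error: (z * sd / E)^2."""
    return (z * sd / margin) ** 2

n = n_for_ci_mean(sd=20, margin=5)
print(round(n, 1), math.ceil(n))  # 61.5 62
```

Halving the margin of error quadruples the required n, because E enters the formula squared.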
Continuous Outcome – One Sample (C.I.) - Practice
Sample size to estimate a C.I. for µ: $n = \left(\frac{Z\sigma}{E}\right)^2$
Example: Estimate the required sample size for a 95% C.I. of body mass index among teenagers ages 14 to 17 in Tampa, FL.
Parameters: Margin of error: 1.5 units; Assumed SD: 4.8; Desired C.I.: 95% (i.e. z = 1.96)
n = _______
Solution: $n = \left(\frac{1.96 \times 4.8}{1.5}\right)^2 = 39.3$
Continuous Outcome – One Sample (H0 Test)
Sample size for a hypothesis test of $H_0: \mu = \mu_0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|\mu_1 - \mu_0|}{\sigma}$
Example: Compare mean fasting glucose level in a sample of heavy coffee drinkers to known values in the general population.
Parameters/Assumptions: General population mean: $\mu_0$ = 95.0 mg/dL; General population SD: $\sigma_0$ = 9.8; Clinically important difference: 5 mg/dL; 2-sided type I error rate (α): 0.05; Desired power (1-β): 0.80
$ES = \frac{|100 - 95|}{9.8} = 0.51$ (Note: 100 = 95 + 5)
$n = \left(\frac{1.96 + 0.84}{0.51}\right)^2 = 30.1$
A sample size of n = 31 will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 5 mg/dL difference in mean glucose levels.
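The glucose example can be checked with the short sketch below. The function name `n_one_sample_test` is hypothetical; the defaults 1.96 and 0.84 are the z values for a 2-sided α of 0.05 and 80% power used in the example. The code keeps the effect size unrounded, which still gives 30.1 to one decimal place.

```python
import math

def n_one_sample_test(diff, sd, z_alpha=1.96, z_beta=0.84):
    """n = ((Z_(1-alpha/2) + Z_(1-beta)) / ES)^2 with ES = |mu1 - mu0| / sd."""
    es = abs(diff) / sd
    return ((z_alpha + z_beta) / es) ** 2

n = n_one_sample_test(diff=5, sd=9.8)
print(round(n, 1), math.ceil(n))  # 30.1 31
```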
Continuous Outcome – One Sample (H0 Test) (Practice)
Sample size for a hypothesis test of $H_0: \mu = \mu_0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|\mu_1 - \mu_0|}{\sigma}$
Example: Compare mean fasting cholesterol level in a sample of intensive swimmers to known values in the general population.
Parameters/Assumptions: General population mean: $\mu_0$ = … mg/dL; General population SD: $\sigma_0$ = 34.8; Clinically important difference: 20 mg/dL; 2-sided type I error rate (α): 0.05; Desired power (1-β): 0.80
ES = __________ n = _____
Solution: $ES = \frac{|\mu_1 - \mu_0|}{34.8} = \frac{20}{34.8} = 0.575$
$n = \left(\frac{1.96 + 0.84}{0.575}\right)^2 = 23.7$
A sample size of n = 24 will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 20 mg/dL difference in mean cholesterol levels.
SECTION 5.3 Sample size estimates for a two-sample (independent groups) continuous outcome
Learning Outcome: Calculate and interpret sample size estimates for a two-sample (independent groups) continuous outcome: ---Estimate for a confidence interval ---Estimate for a hypothesis test
Continuous Outcome – Two Independent Samples
Sample size to estimate a C.I. for $(\mu_1 - \mu_2)$: $n_i = 2\left(\frac{Z\sigma}{E}\right)^2$
Sample size for a hypothesis test of $H_0: \mu_1 = \mu_2$: $n_i = 2\left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|\mu_1 - \mu_2|}{\sigma}$
Continuous Outcome – Two Independent Samples (C.I.)
Sample size to estimate a C.I. for $(\mu_1 - \mu_2)$: $n_i = 2\left(\frac{Z\sigma}{E}\right)^2$
Example: Compare a low-fat versus a low-carb diet for weight loss in obese children.
Parameters: Margin of error: 3 pounds; Assumed pooled SD: 8.1 pounds; Desired C.I.: 95% (i.e. z = 1.96); Assumed dropout rate: 20%
$n_i = 2\left(\frac{1.96 \times 8.1}{3}\right)^2 = 56.0$ (per group; $n_1$ and $n_2$), so 112 subjects in total.
Take into account the dropout rate: N (number to enroll) × (% retained) = desired sample size, so N = desired sample size / (% retained).
N = 112 / 0.80 = 140 subjects
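The two steps in this example, the per-group estimate and the inflation for dropout, can be sketched as follows. The function names `n_per_group_ci` and `number_to_enroll` are hypothetical. To mirror the slide, the per-group value is rounded to one decimal place (56.0) before doubling; rounding each group up to a whole subject first would give a slightly larger total.

```python
def n_per_group_ci(sd, margin, z=1.96):
    """Per-group n for a C.I. on the difference of two independent means: 2 * (z * sd / E)^2."""
    return 2 * (z * sd / margin) ** 2

def number_to_enroll(n_required, dropout_rate):
    """Inflate the required number of completers for anticipated dropout."""
    return n_required / (1 - dropout_rate)

per_group = round(n_per_group_ci(sd=8.1, margin=3), 1)  # 56.0, as on the slide
total = 2 * per_group                                    # 112.0 across both groups
print(per_group, total, round(number_to_enroll(total, 0.20)))  # 56.0 112.0 140
```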
Continuous Outcome – Two Independent Samples (C.I.) (Practice)
Sample size to estimate a C.I. for $(\mu_1 - \mu_2)$: $n_i = 2\left(\frac{Z\sigma}{E}\right)^2$
Example: Compare strength training versus aerobic training for weight loss among obese adults.
Parameters: Margin of error: 4 pounds; Assumed pooled SD: 12.2 pounds; Desired C.I.: 95% (i.e. z = 1.96); Assumed dropout rate: 15%
$n_i$ = ____________________
Take into account the dropout rate: N (number to enroll) × (% retained) = desired sample size, so N = desired sample size / (% retained).
N = ____________________________
Solution: $n_i = 2\left(\frac{1.96 \times 12.2}{4}\right)^2 = 71.5$ (per group; $n_1$ and $n_2$), so 143 subjects in total.
N = 143 / 0.85 = 168.2, i.e. 169 subjects after rounding up
Continuous Outcome – Two Independent Samples (H0 Test)
Sample size for a hypothesis test of $H_0: \mu_1 = \mu_2$: $n_i = 2\left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|\mu_1 - \mu_2|}{\sigma}$
Example: Compare systolic blood pressure in a trial of a new drug vs. placebo.
Parameters/Assumptions: Estimated SD: σ = 19.0 mmHg; Clinically important difference: 5 mmHg; 2-sided type I error rate (α): 0.05; Desired power (1-β): 0.80; Assumed dropout rate: 10%
$ES = \frac{5}{19.0} = 0.263$
$n_i = 2\left(\frac{1.96 + 0.84}{ES}\right)^2 = 226.4$ per group (using the unrounded ES), so n = 2 × 226.4 = 452.8 in total.
Take into account the dropout rate: N (number to enroll) = required sample size / (% retained)
N = 452.8 / 0.90 = 503 subjects
Enrolling N = 503 subjects will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 5 mmHg difference in systolic blood pressure between the 2 study groups.
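A short sketch of this calculation is given below, mainly to show that the slide's total of 452.8 comes from keeping the effect size unrounded. The function name `n_per_group_test` is hypothetical; 1.96 and 0.84 are the z values for a 2-sided α of 0.05 and 80% power.

```python
def n_per_group_test(diff, sd, z_alpha=1.96, z_beta=0.84):
    """Per-group n for a two-sample test: 2 * ((Z_(1-alpha/2) + Z_(1-beta)) / ES)^2."""
    es = abs(diff) / sd  # effect size, kept unrounded
    return 2 * ((z_alpha + z_beta) / es) ** 2

n_i = n_per_group_test(diff=5, sd=19.0)
total = 2 * n_i                    # both groups combined
enroll = total / (1 - 0.10)        # inflate for 10% anticipated dropout
print(round(n_i, 1), round(total, 1), round(enroll))  # 226.4 452.8 503
```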
Continuous Outcome – Two Independent Samples (H0 Test) (Practice)
Sample size for a hypothesis test of $H_0: \mu_1 = \mu_2$: $n_i = 2\left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|\mu_1 - \mu_2|}{\sigma}$
Example: Compare hemoglobin A1c levels between an insulin-sensitizing drug and an insulin-augmentation drug.
Parameters/Assumptions: Estimated SD: σ = 2.7; Clinically important difference: 1.2; 2-sided type I error rate (α): 0.05; Desired power (1-β): 0.90; Assumed dropout rate: 20%
ES = ______ $n_i$ = _____ n = _____
Take into account the dropout rate: N (number to enroll) = required sample size / (% retained)
N = _________________________
Solution: $ES = \frac{1.2}{2.7} = 0.444$
$n_i = 2\left(\frac{1.96 + 1.282}{ES}\right)^2 = 106.4$ per group (using the unrounded ES), so n = 2 × 106.4 = 212.8 in total.
N = 212.8 / 0.80 = 266 subjects
Enrolling N = 266 subjects will ensure that a 2-sided test with α = 0.05 has 90% power to detect a 1.2 (%) difference in hemoglobin A1c between the 2 study groups.
SECTION 5.4 Sample size estimates for a matched sample and a continuous outcome
Learning Outcome: Calculate and interpret sample size estimates for a matched sample and a continuous outcome ---Estimate for a confidence interval ---Estimate for a hypothesis test
Continuous Outcome – Two Matched Samples
Sample size to estimate a C.I. for $\mu_d$: $n = \left(\frac{Z\sigma_d}{E}\right)^2$
Sample size for a hypothesis test of $H_0: \mu_d = 0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{\mu_d}{\sigma_d}$
Continuous Outcome – Two Matched Samples (C.I.)
Sample size to estimate a C.I. for $\mu_d$: $n = \left(\frac{Z\sigma_d}{E}\right)^2$
Example: Crossover trial of a low-fat and a low-carb diet for weight loss in obese children. Each subject is assigned to both diets.
Parameters: Margin of error: 3 pounds difference; Assumed SD (diff): 9.1 pounds; Desired C.I.: 95% (i.e. z = 1.96); Assumed dropout rate: 30%
$n = \left(\frac{1.96 \times 9.1}{3}\right)^2 = 35.3$
Take into account the dropout rate: N (number to enroll) × (% retained) = desired sample size, so N = desired sample size / (% retained).
N = 35.3 / 0.70 = 50.5 subjects
Continuous Outcome – Two Matched Samples (C.I.) (Practice)
Sample size to estimate a C.I. for $\mu_d$: $n = \left(\frac{Z\sigma_d}{E}\right)^2$
Example: Crossover trial of 2 different exercise programs for overweight adults. Each subject is assigned to both programs.
Parameters: Margin of error: 2 pounds difference; Assumed SD (diff): 5.6 pounds; Desired C.I.: 95% (i.e. z = 1.96); Assumed dropout rate: 20%
n = ______
Take into account the dropout rate: N (number to enroll) × (% retained) = desired sample size, so N = desired sample size / (% retained).
N = ______________________________
Solution: $n = \left(\frac{1.96 \times 5.6}{2}\right)^2 = 30.1$
N = 30.1 / 0.80 = 37.6 subjects
Continuous Outcome – Two Matched Samples (H0 Test)
Sample size for a hypothesis test of $H_0: \mu_d = 0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{\mu_d}{\sigma_d}$
Example: Crossover trial of a low-fat and a low-carb diet for weight loss in obese children. Each subject is assigned to both diets.
Parameters/Assumptions: Estimated SD: $\sigma_d$ = 9.1 pounds; Clinically important difference: 3 pounds; 2-sided type I error rate (α): 0.05; Desired power (1-β): 0.80; Assumed dropout rate: 30%
$ES = \frac{3}{9.1} = 0.33$
$n = \left(\frac{1.96 + 0.84}{ES}\right)^2 = 72.1$ (using the unrounded ES)
Take into account the dropout rate: N (number to enroll) = required sample size / (% retained)
N = 72.1 / 0.70 = 103 subjects
Enrolling N = 103 subjects will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 3 pound difference between the 2 diets.
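For matched samples the formulas have the same form as in the one-sample case, with the SD of the within-subject differences in place of σ. The sketch below reruns both crossover diet calculations (C.I. and hypothesis test) with their 30% dropout adjustment; the function names are hypothetical and the code is only a check on the arithmetic.

```python
def n_matched_ci(sd_diff, margin, z=1.96):
    """n for a C.I. on the mean difference: (z * sd_d / E)^2."""
    return (z * sd_diff / margin) ** 2

def n_matched_test(mean_diff, sd_diff, z_alpha=1.96, z_beta=0.84):
    """n for a test of H0: mu_d = 0, with ES = mu_d / sd_d."""
    es = abs(mean_diff) / sd_diff
    return ((z_alpha + z_beta) / es) ** 2

def number_to_enroll(n_required, dropout_rate):
    """Inflate the required number of completers for anticipated dropout."""
    return n_required / (1 - dropout_rate)

n_ci = n_matched_ci(sd_diff=9.1, margin=3)
n_test = n_matched_test(mean_diff=3, sd_diff=9.1)
print(round(n_ci, 1), round(number_to_enroll(n_ci, 0.30), 1))   # 35.3 50.5
print(round(n_test, 1), round(number_to_enroll(n_test, 0.30)))  # 72.1 103
```

With the same 3-pound target and SD, the hypothesis test needs about twice as many subjects as the confidence interval, because the numerator grows from 1.96 to 1.96 + 0.84.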
Continuous Outcome – Two Matched Samples (H0 Test) (Practice)
Sample size for a hypothesis test of $H_0: \mu_d = 0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{\mu_d}{\sigma_d}$
Example: Crossover trial of 2 memory enhancement programs for adults with mild dementia. Each subject is assigned to both programs (outcome: number of words remembered).
Parameters/Assumptions: Estimated SD: $\sigma_d$ = 3.4 words; Clinically important difference: 2 words; 2-sided type I error rate (α): 0.05; Desired power (1-β): 0.80; Assumed dropout rate: 20%
ES = _______ n = __________
Take into account the dropout rate: N (number to enroll) = required sample size / (% retained)
N = __________________________
Solution: $ES = \frac{2}{3.4} = 0.59$
$n = \left(\frac{1.96 + 0.84}{ES}\right)^2 = 22.66$ (using the unrounded ES)
N = 22.66 / 0.80 = 28.3 subjects
Enrolling N = 29 subjects will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 2 word difference in memory recall between the 2 programs.
SECTION 5.5 Sample size estimates for a one sample dichotomous outcome
Learning Outcome: Calculate and interpret sample size estimates for a one sample dichotomous outcome ---Estimate for a confidence interval ---Estimate for a hypothesis test
Dichotomous Outcome – One Sample
Sample size to estimate a C.I. for p: $n = p(1-p)\left(\frac{Z}{E}\right)^2$
Sample size for a hypothesis test of $H_0: p = p_0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1-p_0)}}$
Dichotomous Outcome – One Sample (C.I.)
Sample size to estimate a C.I. for p: $n = p(1-p)\left(\frac{Z}{E}\right)^2$
Example: Estimate the required sample size for a 95% C.I. of the proportion of freshmen at a college who currently smoke cigarettes.
Parameters: Margin of error: 5%; Assumed prevalence (i): 50% (i.e. no data available); Assumed prevalence (ii): 27% (i.e. national data); Desired C.I.: 95% (i.e. z = 1.96)
(i) $n = 0.5(1-0.5)\left(\frac{1.96}{0.05}\right)^2 = 384.2$
(ii) $n = 0.27(1-0.27)\left(\frac{1.96}{0.05}\right)^2 = 302.9$
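The two smoking-prevalence scenarios can be reproduced with the sketch below (the function name `n_for_ci_proportion` is hypothetical). It also illustrates why 50% is used when no data are available: p(1-p) is largest at p = 0.5, so that assumption gives the most conservative (largest) sample size.

```python
def n_for_ci_proportion(p, margin, z=1.96):
    """n so that a C.I. for a proportion has the given margin of error: p(1-p) * (z / E)^2."""
    return p * (1 - p) * (z / margin) ** 2

print(round(n_for_ci_proportion(0.50, 0.05), 1))  # 384.2 (no prior data)
print(round(n_for_ci_proportion(0.27, 0.05), 1))  # 302.9 (national data)
```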
Dichotomous Outcome – One Sample (C.I.) (Practice)
Sample size to estimate a C.I. for p: $n = p(1-p)\left(\frac{Z}{E}\right)^2$
Example: Estimate the required sample size for a 95% C.I. of the proportion of male freshmen at a college who report binge drinking at least once a month.
Parameters: Margin of error: 5%; Assumed prevalence: 14% (i.e. national data); Desired C.I.: 95% (i.e. z = 1.96)
n = _______
Solution: $n = 0.14(1-0.14)\left(\frac{1.96}{0.05}\right)^2 = 185.0$
Dichotomous Outcome – One Sample (H0 Test)
Sample size for a hypothesis test of $H_0: p = p_0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1-p_0)}}$
Example: Compare the prevalence of high LDL cholesterol (>159 mg/dL) between patients with a history of CVD and a population estimate of those without CVD.
Parameters/Assumptions: General population proportion: $p_0$ = 0.26 (26%); Clinically important difference: 0.05 (5%); 2-sided type I error rate (α): 0.05; Desired power (1-β): 0.90
$ES = \frac{|0.31 - 0.26|}{\sqrt{0.26(1-0.26)}} = 0.114$
$n = \left(\frac{1.96 + 1.282}{0.114}\right)^2 = 808.8$
A sample size of n = 809 will ensure that a 2-sided test with α = 0.05 has 90% power to detect a difference of 0.05, or 5%, in the proportion of patients with a history of CVD who have an elevated LDL cholesterol level.
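The LDL cholesterol example can be verified with the following sketch. The function names are hypothetical, and 1.282 is the z value for 90% power; the effect size is kept unrounded, which gives the same n = 809 after rounding up.

```python
import math

def effect_size_proportion(p1, p0):
    """ES = |p1 - p0| / sqrt(p0 * (1 - p0))."""
    return abs(p1 - p0) / math.sqrt(p0 * (1 - p0))

def n_one_sample_proportion_test(p1, p0, z_alpha=1.96, z_beta=0.84):
    """n = ((Z_(1-alpha/2) + Z_(1-beta)) / ES)^2 for a test of H0: p = p0."""
    es = effect_size_proportion(p1, p0)
    return ((z_alpha + z_beta) / es) ** 2

es = effect_size_proportion(p1=0.31, p0=0.26)
n = n_one_sample_proportion_test(p1=0.31, p0=0.26, z_beta=1.282)
print(round(es, 3), math.ceil(n))  # 0.114 809
```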
Dichotomous Outcome – One Sample (H0 Test) (Practice)
Sample size for a hypothesis test of $H_0: p = p_0$: $n = \left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2$, where $ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1-p_0)}}$
Example: Compare the prevalence of the metabolic syndrome between patients with a history of alcohol abuse and a population estimate of those without.
Parameters/Assumptions: General population proportion: $p_0$ = 0.22 (22%); Clinically important difference: 0.06 (6%); 2-sided type I error rate (α): 0.05; Desired power (1-β): 0.80
ES = _______ n = ________
Solution: $ES = \frac{|0.28 - 0.22|}{\sqrt{0.22(1-0.22)}} = 0.1448$
$n = \left(\frac{1.96 + 0.84}{ES}\right)^2 = 373.7$ (using the unrounded ES)
A sample size of n = 374 will ensure that a 2-sided test with α = 0.05 has 80% power to detect a difference of 0.06, or 6%, in the proportion of patients with a history of alcohol abuse who have the metabolic syndrome.