Statistics for clinicians
Biostatistics course by Kevin E. Kip, Ph.D., FAHA
Professor and Executive Director, Research Center, University of South Florida, College of Nursing
Professor, College of Public Health, Department of Epidemiology and Biostatistics
Associate Member, Byrd Alzheimer’s Institute, Morsani College of Medicine
Tampa, FL, USA
SECTION 4.1 Module Overview and Introduction
Hypothesis testing for 2 or more independent groups and non-parametric methods.
Module 4 Learning Outcomes:
1. Calculate and interpret hypothesis tests for 2 or more samples:
   a) 2 samples, continuous outcome
   b) >2 samples, continuous outcome
   c) 2 samples, dichotomous outcome
   d) >2 samples, dichotomous outcome
2. Specify 2-sample hypotheses and conduct formal testing using SPSS
3. Differentiate between parametric and non-parametric tests
4. Identify properties of non-parametric tests
5. Calculate and interpret non-parametric tests:
   a) 2 independent samples: Wilcoxon Rank Sum Test
   b) Matched samples: Wilcoxon Signed Rank Test
   c) >2 independent samples: Kruskal-Wallis Test
6. Conduct and interpret non-parametric analyses using SPSS
Assigned Reading:
Textbook: Essentials of Biostatistics in Public Health, Chapter 7 (Sections 7.5 and 7.7 to 7.9) and Chapter 10
SECTION 4.2 Framework of hypothesis testing
General Steps for Hypothesis Testing:
1) Set up the hypothesis and determine the level of statistical significance (including 1 versus 2-sided hypothesis)
2) Select the appropriate test statistic
3) Set up the decision rule
4) Compute the test statistic
5) Conclusion (interpretation)
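Steps 1, 3, and 5 can be sketched in a few lines of Python. This is only an illustration for a z test: the function names `critical_z` and `reject_h0` are made up for this sketch, and the one-sided branch assumes an upper-tail alternative.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, two_sided=True):
    """Critical value of the standard normal distribution for a given alpha."""
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail_area)


def reject_h0(z, alpha=0.05, two_sided=True):
    """Decision rule: compare the computed statistic with the critical value.

    The one-sided case here assumes an upper-tail alternative (H1: mu1 > mu2).
    """
    crit = critical_z(alpha, two_sided)
    return abs(z) >= crit if two_sided else z >= crit


print(round(critical_z(0.05), 2))         # two-sided critical value at alpha = 0.05
print(reject_h0(2.66), reject_h0(0.5))    # a large and a small test statistic
```

The same critical values appear in the printed z table; computing them simply makes the link between α, the sidedness of the hypothesis, and the decision rule explicit.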
Hypothesis Testing Calculations:
1) Two Samples, Independent Groups
   a) Continuous outcome (Student's t test)
   b) Dichotomous outcome (risk difference or risk ratio; chi-square test)
2) More than 2 Samples, Independent Groups
   a) Continuous outcome (analysis of variance, ANOVA)
   b) Categorical outcome (chi-square test)
Framework of Hypothesis Testing
The goal is to compare sample estimates of a parameter (e.g. mean, proportion) between 2 or more independent groups.
The groups can be defined from a clinical trial, such as treatment versus placebo, or from an observational study, such as men versus women, or exposed versus not exposed.
With 2 groups, one group serves as the “comparison” or “control” group representing the null value.
Groups do not need to be of the same size.
With more than 2 groups, we can test whether any of the groups differ (e.g. in their means) or whether the groups differ in an ordered manner.
SECTION 4.3 Two-sample: independent groups – continuous outcome
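1. Two-Sample: Independent Groups-Continuous Outcome
Parameter: difference in population means, μ1 − μ2
H0: μ1 − μ2 = 0 (equivalently, μ1 = μ2)
H1: μ1 > μ2, μ1 < μ2, or μ1 ≠ μ2
Test statistics:
n1 ≥ 30 and n2 ≥ 30: $z = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}}$ (critical value of z in Table 1C)
n1 < 30 or n2 < 30: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}}$ (critical value of t in Table 2, d.f. = n1 + n2 − 2)
Pooled standard deviation: $S_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$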
1. Two-Sample: Independent Groups-Continuous Outcome
Example: From the Framingham Heart Study (offspring), compare mean systolic blood pressure between men and women.
[Table: n, X̄, and s for men and women]
1) Set up the hypothesis and determine the level of statistical significance (including 1 versus 2-sided hypothesis).
   H0: μ1 = μ2
   H1: μ1 ≠ μ2 (two-sided hypothesis)
   α = 0.05
2) Select the appropriate test statistic: n1 ≥ 30 and n2 ≥ 30, so use z
3) Set up the decision rule: Reject H0 if z ≤ −1.96 or z ≥ 1.96
4) Compute the test statistic: $S_p = \sqrt{359.12} \approx 18.95$, giving z = 2.66
5) Conclusion: Reject H0 because 2.66 > 1.96
1. Two-Sample: Independent Groups-Continuous Outcome (Practice)
Example: From the Heart SCORE Study, compare mean total cholesterol levels between men and women (α = 0.05).
[Table: n, X̄, and s for men and women]
1) Set up the hypothesis and determine the level of statistical significance (including 1 versus 2-sided hypothesis).
   H0: _________________
   H1: _________________
2) Select the appropriate test statistic: _________________
3) Set up the decision rule: _________________
4) Compute the test statistic: _________________
5) Conclusion: _________________

Answers:
1) H0: μ1 = μ2; H1: μ1 ≠ μ2 (two-sided hypothesis)
2) n1 ≥ 30 and n2 ≥ 30, so use z
3) Reject H0 if z ≤ −1.96 or z ≥ 1.96
4) $S_p = \sqrt{\dfrac{(165 - 1)(38.416)^2 + (337 - 1)(42.023)^2}{165 + 337 - 2}} \approx 40.88$
   $z = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/165 + 1/337}} = -6.01$
5) Reject H0: |−6.01| > 1.96
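The pooled standard deviation and standard error in the cholesterol exercise can be checked with a short Python sketch. It uses only the sample sizes and standard deviations that appear in the computation (165, 38.416 and 337, 42.023); the helper names are illustrative, and the group means are not reproduced here, so the final division is left as a function.

```python
from math import sqrt


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation of two independent samples."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def standard_error(n1, s1, n2, s2):
    """Standard error of the difference in means, based on the pooled SD."""
    return pooled_sd(n1, s1, n2, s2) * sqrt(1 / n1 + 1 / n2)


def two_sample_statistic(mean1, mean2, n1, s1, n2, s2):
    """z (large samples) or t (small samples): difference in means over its SE."""
    return (mean1 - mean2) / standard_error(n1, s1, n2, s2)


sp = pooled_sd(165, 38.416, 337, 42.023)
se = standard_error(165, 38.416, 337, 42.023)
print(f"pooled SD = {sp:.2f}, standard error = {se:.2f}")
```

Dividing the observed difference in mean cholesterol by this standard error gives the z statistic that is compared with ±1.96.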
1. Two-Sample: Independent Groups-Continuous Outcome (Practice)
Example: From the Heart SCORE Study, compare mean total cholesterol levels between men and women (α = 0.05).
SPSS:
Analyze > Compare Means > Independent Samples T-Test
Test Variable: Total cholesterol
Group Variable: Gender (defined as 1, 2)
Options: 95% C.I.
SECTION 4.4 Two-sample: independent groups – dichotomous outcome
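2. Two-Sample: Independent Groups-Dichotomous Outcome
Parameter: Risk Difference (RD), p1 − p2, or Risk Ratio (RR), p1 / p2
H0: RD: p1 = p2, or p1 − p2 = 0; RR: p1 / p2 = 1.0
H1: RD: p1 ≠ p2, or p1 − p2 ≠ 0; RR: p1 / p2 ≠ 1.0
Test statistic (critical value of z in Table 1C):
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$, where $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion and x1, x2 are the numbers of successes in each group
Conditions for using z: min[n1 p1, n1 (1 − p1)] > 5 and min[n2 p2, n2 (1 − p2)] > 5
Note: p = proportion of successes (outcomes)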
2. Two-Sample: Independent Groups-Dichotomous Outcome
Example: From the Framingham Heart Study (offspring), compare the prevalence of CVD between smokers and non-smokers.

| | No CVD | CVD | Total |
|---|---|---|---|
| Smoker | 663 | 81 | 744 |
| Non-smoker | 2757 | 298 | 3055 |

p1 = 81/744 = 0.1089; p2 = 298/3055 = 0.0975
Risk Difference (RD): p1 − p2 ≈ 0.011; Risk Ratio (RR): p1 / p2 = 1.12
1) Set up the hypothesis and determine the level of statistical significance (including 1 versus 2-sided hypothesis).
   H0: p1 = p2
   H1: p1 ≠ p2 (two-sided hypothesis)
   α = 0.05
2) Select the appropriate test statistic: min[n1 p1, n1 (1 − p1)] > 5 and min[n2 p2, n2 (1 − p2)] > 5, so use z
3) Set up the decision rule: Reject H0 if z ≤ −1.96 or z ≥ 1.96
4) Compute the test statistic:
   pooled p = (81 + 298) / (744 + 3055) = 379/3799 ≈ 0.0998
   $z = \dfrac{0.1089 - 0.0975}{\sqrt{0.0998(1 - 0.0998)(1/744 + 1/3055)}} \approx 0.92$
5) Conclusion: Do not reject H0: −1.96 < 0.92 < 1.96
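The Framingham smoking and CVD comparison can be reproduced with a minimal Python sketch. It assumes the pooled-proportion form of the z statistic and takes only the counts shown in the example (81 of 744 smokers, 298 of 3055 non-smokers); the function name and the dictionary keys are illustrative.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z test for two independent proportions, using the pooled proportion."""
    # Sample-size condition: at least 5 events and 5 non-events in each group.
    if min(x1, n1 - x1) <= 5 or min(x2, n2 - x2) <= 5:
        raise ValueError("samples too small for the normal approximation")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return {"p1": p1, "p2": p2, "rd": p1 - p2, "rr": p1 / p2,
            "pooled": pooled, "z": (p1 - p2) / se}


result = two_proportion_z(81, 744, 298, 3055)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
print("reject H0" if abs(result["z"]) >= 1.96 else "do not reject H0")
```

Working from the unrounded proportions can shift the last digit of z slightly relative to a hand calculation with rounded values, but the conclusion is unaffected.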
2. Two-Sample: Independent Groups-Dichotomous Outcome (Practice)
Example: From the Heart SCORE Study, compare the prevalence of diabetes by level of weekly exercise.

| Exercise | No diabetes | Diabetes | Total |
|---|---|---|---|
| < 3 times/wk | 177 | 18 | 195 |
| ≥ 3 times/wk | 278 | 23 | 301 |

p1 = _______; p2 = _______
Risk Difference (RD): p1 − p2 = _______; Risk Ratio (RR): p1 / p2 = _______
1) Set up the hypothesis and determine the level of statistical significance (including 1 versus 2-sided hypothesis).
   H0: _______
   H1: _______
   α = 0.05
2) Select the appropriate test statistic: _______
3) Set up the decision rule: Reject H0 if _______
4) Compute the test statistic: pooled p = _______; z = _______
5) Conclusion: _______

Answers:
p1 = 18/195 = 0.0923; p2 = 23/301 = 0.0764
RD: p1 − p2 = 0.0159; RR: p1 / p2 = 1.21
1) H0: p1 = p2; H1: p1 ≠ p2 (two-sided hypothesis); α = 0.05
2) min[n1 p1, n1 (1 − p1)] > 5 and min[n2 p2, n2 (1 − p2)] > 5, so use z
3) Reject H0 if z ≤ −1.96 or z ≥ 1.96
4) pooled p = (18 + 23) / (195 + 301) = 41/496 ≈ 0.0827
   $z = \dfrac{0.0923 - 0.0764}{\sqrt{0.0827(1 - 0.0827)(1/195 + 1/301)}} \approx 0.63$
5) Do not reject H0: −1.96 < 0.63 < 1.96
2. Two-Sample: Independent Groups-Dichotomous Outcome (Practice)
Example: From the Heart SCORE Study, compare the prevalence of diabetes by level of weekly exercise (α = 0.05).
SPSS:
Analyze > Descriptive Statistics > Crosstabs
Row Variable: Exercise ≥ 3 times/week
Column Variable: History of diabetes
Statistics: Chi-square
Cells: Observed, Expected
Note: For a 2 × 2 table, SPSS reports the Yates correction on a separate "Continuity Correction" row; the "Pearson Chi-Square" row itself is uncorrected.
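For a 2 × 2 table, the uncorrected Pearson chi-square equals the square of the two-proportion z statistic, which is why a chi-square test from Crosstabs and the hand-computed z test lead to the same conclusion. The sketch below illustrates this with the exercise and diabetes counts; the "no diabetes" cells (177 and 278) are derived here as group total minus diabetes cases, and the function name and the `yates` flag are illustrative.

```python
def pearson_chi_square(table, yates=False):
    """Pearson chi-square for a table of observed counts (list of rows).

    With yates=True, 0.5 is subtracted from each |observed - expected|
    (the continuity correction, meaningful for 2 x 2 tables only).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff**2 / expected
    return chi2


# Rows: < 3 times/wk, >= 3 times/wk; columns: no diabetes, diabetes.
table = [[177, 18], [278, 23]]
chi2 = pearson_chi_square(table)
chi2_yates = pearson_chi_square(table, yates=True)
print(f"Pearson chi-square = {chi2:.3f}, with Yates correction = {chi2_yates:.3f}")
print("reject H0" if chi2 >= 3.84 else "do not reject H0")  # 3.84 = 1.96 squared, df = 1
```

The continuity-corrected value is always somewhat smaller than the uncorrected one, so it is the more conservative of the two.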
SECTION 4.5 More than two samples: independent groups – continuous outcome
3. More Than Two Independent Groups-Continuous Outcome
Parameter: difference in means for more than 2 groups (ANOVA)
H0: μ1 = μ2 = … = μk
H1: Means are not all equal
Test statistic (F value):
$F = \dfrac{\sum_j n_j (\bar{X}_j - \bar{X})^2 / (k - 1)}{\sum\sum (X - \bar{X}_j)^2 / (N - k)}$
Find the critical value in Table 4 (df1 = k − 1; df2 = N − k)
Where:
n_j = sample size in the jth group (j = 1, 2, 3, …)
X̄_j = mean in the jth group
X̄ = overall mean
X = an individual observation
k = number of independent groups (k > 2)
N = total number of observations in the analysis
ANOVA assumptions:
Outcome follows a normal distribution (all groups)
Variances are approximately equal among groups
3. More Than Two Independent Groups-Continuous Outcome
F = “between-group” variability / “residual or error” variability (“within-group” variability, i.e. variability in the outcome)
The null hypothesis is that all groups are random samples from the same population.
The F statistic assesses whether differences among the means (the numerator) are larger than expected by chance.
The F statistic has 2 degrees of freedom: df1 (numerator) and df2 (denominator); df1 = k − 1, df2 = N − k
Table 4 contains critical values for the F distribution
3. More Than Two Independent Groups-Continuous Outcome
Analysis of Variance (ANOVA) Table

| Source of Variation | Sum of Squares (SS) | Degrees of freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between-group | SSB | k − 1 | MSB = SSB / (k − 1) | F = MSB / MSE |
| Error or residual (random), “within-group” | SSE | N − k | MSE = SSE / (N − k)* | |
| Total | SST | N − 1 | | |

*Textbook on page 150 has typographical error
3. More Than Two Independent Groups-Continuous Outcome
Example: Weight Loss by Treatment (in Pounds)

| | Low Calorie | Low Fat | Low Carb | Control |
|---|---|---|---|---|
| n | 5 | 5 | 5 | 5 |
| X̄ | 6.6 | 3.0 | 3.4 | 1.2 |

1) Set up the hypothesis and determine the level of statistical significance.
   H0: µ1 = µ2 = µ3 = µ4
   H1: Means are not all equal
   α = 0.05
2) Select the appropriate test statistic: F = MSB / MSE
3) Set up the decision rule (see critical value in Table 4):
   df1 = k − 1 = 4 − 1 = 3
   df2 = N − k = 20 − 4 = 16
   Reject H0 if F > 3.24
4) Compute the test statistic (ANOVA table): MSB = SSB / (k − 1); MSE = SSE / (N − k); F = MSB / MSE
   SSB = 5(6.6 − 3.6)² + 5(3.0 − 3.6)² + 5(3.4 − 3.6)² + 5(1.2 − 3.6)² = 45.0 + 1.8 + 0.2 + 28.8 = 75.8
   SSE = 47.4 (see tables 7-24 to 7-28, page 151 of text)
   MSB = 75.8 / (4 − 1) = 25.3
   MSE = 47.4 / (20 − 4) = 3.0
   F = 25.3 / 3.0 = 8.43
5) Conclusion: Reject H0; 8.43 > 3.24
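The weight-loss F test can be reproduced from the summary values with a short Python sketch. It assumes SSE = 47.4 as given in the example (the individual observations are in the textbook tables, not reproduced here); the function name is illustrative. Because the code carries full precision, it yields F ≈ 8.5 rather than the hand-rounded 8.43 (which uses an overall mean of 3.6 and MSE of 3.0); the conclusion is the same.

```python
def anova_from_summary(means, sizes, sse):
    """One-way ANOVA F statistic from group means, group sizes, and a known SSE."""
    k = len(means)
    n_total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    msb = ssb / (k - 1)        # between-group mean square
    mse = sse / (n_total - k)  # within-group (error) mean square
    return {"grand_mean": grand_mean, "ssb": ssb, "msb": msb, "mse": mse,
            "df1": k - 1, "df2": n_total - k, "f": msb / mse}


anova = anova_from_summary(means=[6.6, 3.0, 3.4, 1.2], sizes=[5, 5, 5, 5], sse=47.4)
for name, value in anova.items():
    print(f"{name}: {value:.3f}")
print("reject H0" if anova["f"] > 3.24 else "do not reject H0")  # critical F(3, 16) at alpha = 0.05
```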
3. More Than Two Independent Groups-Continuous Outcome
Example: Weight Loss by Treatment (in Pounds)

| | Low Calorie | Low Fat | Low Carb | Control |
|---|---|---|---|---|
| n | 5 | 5 | 5 | 5 |
| X̄ | 6.6 | 3.0 | 3.4 | 1.2 |

ANOVA orthogonal contrasts of means:
Sometimes, rather than testing only for a difference among all means, we wish to compare specific means, or to test whether the means increase or decrease in a monotonic (linear) manner. This can be achieved with orthogonal contrasts of the means.
The sum of the coefficients in each linear contrast must equal zero.
In the example above, possible contrasts are:
µ1 versus (µ2, µ3, µ4)
(µ1, µ2) versus (µ3, µ4)
(µ1, µ2, µ3) versus µ4
Linear trend
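A single contrast can be evaluated as in the sketch below. The coefficients (3, −1, −1, −1) for "µ1 versus (µ2, µ3, µ4)" are an assumed, conventional choice rather than values taken from the slide, and MSE = 47.4 / 16 comes from the weight-loss ANOVA. The function name is illustrative.

```python
from math import sqrt


def contrast_t(coeffs, means, sizes, mse):
    """Estimate and t statistic for a linear contrast of group means.

    The t statistic has N - k degrees of freedom (the error df of the ANOVA).
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    se = sqrt(mse * sum(c**2 / n for c, n in zip(coeffs, sizes)))
    return estimate, estimate / se


means = [6.6, 3.0, 3.4, 1.2]   # low calorie, low fat, low carb, control
sizes = [5, 5, 5, 5]
mse = 47.4 / 16                # SSE / (N - k) from the ANOVA table

# Assumed coefficients: low calorie versus the average of the other three groups.
estimate, t = contrast_t([3, -1, -1, -1], means, sizes, mse)
print(f"contrast estimate = {estimate:.2f}, t = {t:.2f} on 16 df")
```

Other contrasts from the list follow the same pattern, for example (1, 1, −1, −1) for (µ1, µ2) versus (µ3, µ4), again as an assumed coding.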
3. More Than Two Independent Groups-Continuous Outcome (Practice)
Example: Body Mass Index by Blood Pressure Classification

| | Normal | Pre-hypertensive | Htn Stage I | Htn Stage II |
|---|---|---|---|---|
| n | 88 | 191 | 139 | 55 |
| X̄ | 28.42 | 29.43 | 30.75 | [missing] |
| s | 5.37 | 5.75 | 5.89 | 6.39 |

1) Set up the hypothesis and determine the level of statistical significance.
   H0: _______
   H1: _______
   α = 0.05
2) Select the appropriate test statistic: _______
3) Set up the decision rule (see critical value in Table 4):
   Total N = n1 + n2 + n3 + n4 = _______
   df1 = k − 1 = _______
   df2 = N − k = _______
   Reject H0 if: _______
4) Compute the test statistic (complete the ANOVA table): F = _______
5) Conclusion: _______

Answers:
1) H0: µ1 = µ2 = µ3 = µ4
   H1: Means are not all equal; or H1: Means increase or decrease in a monotonic (linear) manner
   α = 0.05
2) F = MSB / MSE
3) Total N = 88 + 191 + 139 + 55 = 473
   df1 = k − 1 = 4 − 1 = 3
   df2 = N − k = 473 − 4 = 469
   Reject H0 if F > 2.62
4) ANOVA table:

| Source of Variation | Sum of Squares (SS) | Degrees of freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between-group | SSB | k − 1 = 3 | MSB = SSB / (k − 1) = [missing] | F = MSB / MSE = 9.83 |
| Error or residual (random), “within-group” | SSE | N − k = 469 | MSE = SSE / (N − k) = 33.6 | |
| Total | SST | N − 1 = 472 | | |

5) Conclusion: Reject H0: 9.83 > 2.62
   Linear trend: F = 11.27
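When only group sizes and standard deviations are available, the error mean square can be recovered as the pooled variance across groups. The Python sketch below does this for the BMI example using the n and s values from the slide; the function name is illustrative, and the result agrees with the tabulated MSE of 33.6 up to rounding.

```python
def mse_from_group_sds(sizes, sds):
    """Error mean square (pooled variance) from group sizes and standard deviations."""
    k = len(sizes)
    n_total = sum(sizes)
    sse = sum((n - 1) * s**2 for n, s in zip(sizes, sds))  # within-group sum of squares
    return {"n_total": n_total, "df1": k - 1, "df2": n_total - k,
            "sse": sse, "mse": sse / (n_total - k)}


# Normal, pre-hypertensive, hypertension stage I, hypertension stage II
bmi = mse_from_group_sds(sizes=[88, 191, 139, 55], sds=[5.37, 5.75, 5.89, 6.39])
print(f"N = {bmi['n_total']}, df1 = {bmi['df1']}, df2 = {bmi['df2']}")
print(f"SSE = {bmi['sse']:.1f}, MSE = {bmi['mse']:.2f}")
```

MSB, and therefore F, additionally requires all four group means.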
3. More Than Two Independent Groups-Continuous Outcome (Practice)
Example: Body mass index and blood pressure classification in the Heart SCORE Study (α = 0.05)
SPSS:
Analyze > Compare Means > One-Way ANOVA
Dependent Variable: Body mass index
Group Variable (Factor): Blood pressure class
Contrasts
Options: Descriptive, Homogeneity of variance test, Means plot