Lesson Test to See if Samples Come From Same Population
Objectives Test a claim using the Kruskal–Wallis test
Vocabulary Kruskal–Wallis Test -- nonparametric procedure used to test the claim that k (3 or more) independent samples come from populations with the same distribution.
Test of Means of 3 or more groups ●Parametric test of the means of three or more groups (one-way ANOVA): Compared the variability between the sample means with the variability within the samples Performed an F-test of whether all of the population means are equal ●Nonparametric case for three or more groups: Combine all of the samples and rank this combined set of data Compare the rankings for the different groups
Kruskal-Wallis Test ●Assumptions: Samples are simple random samples from three or more populations Data can be ranked ●If the distributions are the same, we would expect the values of the samples, when combined into one large dataset, to be interspersed with each other ●Thus we expect the average rank of each sample to be about the same
Test Statistic for Kruskal–Wallis Test
The test statistic is
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{1}{n_i}\left[R_i - \frac{n_i(N+1)}{2}\right]^2$$
A computational formula for the test statistic is
$$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \dots + \frac{R_k^2}{n_k}\right] - 3(N+1)$$
where
$R_i$ is the sum of the ranks of the ith sample
$R_1^2$ is the square of the sum of the ranks for the first sample
$R_2^2$ is the square of the sum of the ranks for the second sample, and so on
$n_1$ is the number of observations in the first sample
$n_2$ is the number of observations in the second sample, and so on
$N$ is the total number of observations ($N = n_1 + n_2 + \dots + n_k$)
$k$ is the number of populations being compared
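The short derivation below is a sketch of why the definitional form and the computational form of H agree. It relies only on two facts about the ranks of the combined data: the rank sums add up to $N(N+1)/2$, and the sample sizes add up to $N$.

```latex
\begin{align*}
\sum_{i=1}^{k} \frac{1}{n_i}\left[R_i - \frac{n_i(N+1)}{2}\right]^2
  &= \sum_{i=1}^{k} \frac{R_i^2}{n_i}
     - (N+1)\sum_{i=1}^{k} R_i
     + \frac{(N+1)^2}{4}\sum_{i=1}^{k} n_i \\
  &= \sum_{i=1}^{k} \frac{R_i^2}{n_i}
     - \frac{N(N+1)^2}{2} + \frac{N(N+1)^2}{4} \\
  &= \sum_{i=1}^{k} \frac{R_i^2}{n_i} - \frac{N(N+1)^2}{4} \\[4pt]
H = \frac{12}{N(N+1)}\left[\sum_{i=1}^{k} \frac{R_i^2}{n_i} - \frac{N(N+1)^2}{4}\right]
  &= \frac{12}{N(N+1)}\sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)
\end{align*}
```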
Test Statistic (cont) ●Large values of the test statistic H indicate that the R i ’s are different than expected ●If H is too large, then we reject the null hypothesis that the distributions are the same ●This always is a right-tailed test
Critical Value for Kruskal–Wallis Test Small-Sample Case When three populations are being compared and when the sample size from each population is 5 or less, the critical value is obtained from Table XIV in Appendix A. Large-Sample Case When four or more populations are being compared or the sample size from one population is more than 5, the critical value is χ² α with k – 1 degrees of freedom, where k is the number of populations and α is the level of significance.
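For the large-sample case, the chi-square critical value can also be computed instead of read from a table. The sketch below uses only the Python standard library; the function names chi2_upper_tail and chi2_critical_value are illustrative and not part of the lesson. It assumes integer degrees of freedom and uses the standard finite-sum expressions for the chi-square upper tail, then finds the critical value by bisection. It does not cover the small-sample case, which still requires Table XIV.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df (illustrative helper)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of odd powers of sqrt(x)
    term = math.sqrt(x)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + 2.0 * density * total


def chi2_critical_value(alpha, df):
    """Value c with P(X > c) = alpha, found by bisection (illustrative helper)."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:
        hi *= 2.0  # expand until the upper tail drops below alpha
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Three populations at the 0.05 level: k - 1 = 2 degrees of freedom
cv = chi2_critical_value(0.05, 3 - 1)
```

With k = 3 populations and α = 0.05 this reproduces the usual table entry of about 5.991.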
Hypothesis Tests Using Kruskal–Wallis Test
Step 0 Requirements: 1. The samples are independent random samples. 2. The data can be ranked.
Step 1 Box Plots: Draw side-by-side boxplots to compare the sample data from the populations. Doing so helps to visualize the differences, if any, between the medians.
Step 2 Hypotheses: (claim is made regarding distribution of three or more populations) H 0 : the distributions of the populations are the same H 1 : the distributions of the populations are not the same
Step 3 Ranks: Rank all sample observations from smallest to largest. Handle ties by finding the mean of the ranks for tied values. Find the sum of the ranks for each sample.
Step 4 Level of Significance: (level of significance determines the critical value) The critical value is found from Table XIV for small samples. The critical value is χ² α with k – 1 degrees of freedom (found in Table VI) for large samples.
Step 5 Compute Test Statistic: $$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \dots + \frac{R_k^2}{n_k}\right] - 3(N+1)$$
Step 6 Critical Value Comparison: We reject the null hypothesis if the test statistic is greater than the critical value.
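Steps 3 and 5 can be sketched in a few lines of Python. The data below are hypothetical values chosen only to show how a tie is handled, and the function names midranks, rank_sums, and kruskal_wallis_h are illustrative. The statistic is computed with the computational formula and no tie correction.

```python
def midranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2.0  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_sums(samples):
    """Rank the combined data, then add up the ranks belonging to each sample."""
    combined = [x for sample in samples for x in sample]
    ranks = midranks(combined)
    sums, offset = [], 0
    for sample in samples:
        sums.append(sum(ranks[offset:offset + len(sample)]))
        offset += len(sample)
    return sums


def kruskal_wallis_h(samples):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    sums = rank_sums(samples)
    n_total = sum(len(sample) for sample in samples)
    weighted = sum(r * r / len(sample) for r, sample in zip(sums, samples))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Hypothetical data: the two 15s tie for ranks 4 and 5, so each gets 4.5
samples = [[12, 15, 11], [14, 15, 19], [20, 18, 22]]
sums = rank_sums(samples)      # rank sums per sample: 7.5, 14.5, 23.0
h = kruskal_wallis_h(samples)
```

A quick sanity check on any ranking is that the rank sums add up to N(N + 1)/2, which is 45 for these nine observations.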
Kruskal–Wallis Test Hypothesis In this test, the hypotheses are H 0 : The distributions of all of the populations are the same H 1 : The distributions of all of the populations are not the same This is a stronger hypothesis than in ANOVA, where only the means (and not the entire distributions) are compared
Example 1 from 15.7
Each entry is the observation followed by its rank in parentheses.
Obs              Sample 1          Sample 2       Sample 3
1                [illegible] (29)  61 (31.5)      44 (18)
2                43 (16)           41 (14)        65 (34.5)
3                38 (11.5)         44 (18)        62 (33)
4                30 (2)            47 (21)        53 (27.5)
5                61 (31.5)         33 (3)         51 (26)
6                53 (27.5)         29 (1)         49 (22.5)
7                35 (7.5)          59 (30)        49 (22.5)
8                34 (4.5)          35 (7.5)       42 (15)
9                39 (13)           34 (4.5)       35 (7.5)
10               46 (20)           74 (36)        44 (18)
11               50 (24.5)         50 (24.5)      37 (10)
12               35 (7.5)          65 (34.5)      38 (11.5)
Medians (Sums)   41 (194.5)        45.5 (225.5)   46.5 (246)
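Example 1 (cont)
$$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \dots + \frac{R_k^2}{n_k}\right] - 3(N+1)$$
$$H = \frac{12}{36(36+1)}\left[\frac{194.5^2}{12} + \frac{225.5^2}{12} + \frac{246^2}{12}\right] - 3(36+1) \approx 1.009$$
Critical Value: (Large-Sample Case) χ² α with 2 (3 – 1) degrees of freedom, where 3 is the number of populations and 0.05 is the level of significance
CV = 5.991
Conclusion: Since H < CV, we fail to reject (FTR) H 0 ; there is not sufficient evidence that the distributions are different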
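The arithmetic for this example can be checked directly from the rank sums 194.5, 225.5, and 246 and the sample sizes of 12 each. The sketch below makes one assumption that is not on the slides: for 2 degrees of freedom the chi-square upper tail is exp(-x/2), so the critical value at level α is -2 ln α. Variable names are illustrative.

```python
import math

rank_sums = [194.5, 225.5, 246.0]   # R_1, R_2, R_3 from the example table
sizes = [12, 12, 12]                # n_1, n_2, n_3
n_total = sum(sizes)                # N = 36

# Sanity check: the rank sums must add up to N(N+1)/2
assert abs(sum(rank_sums) - n_total * (n_total + 1) / 2) < 1e-9

# Computational formula: H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

# Chi-square critical value for 2 degrees of freedom at alpha = 0.05
alpha = 0.05
critical_value = -2.0 * math.log(alpha)

# Right-tailed test: reject H0 only when H exceeds the critical value
reject_h0 = h > critical_value
```

Here H comes out near 1.009 against a critical value near 5.991, so reject_h0 is False.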
Summary and Homework Summary –The Kruskal-Wallis test is a nonparametric test for comparing the distributions of three or more populations –This test is a comparison of the rank sums of the populations –Critical values for small samples are given in tables –The critical values for large samples can be approximated by a calculation with the chi-square distribution Homework –problems 3, 6, 7, 10 from the CD