Assume as previously that we have k samples on as many treatments. We'll let Rij denote the rank of the sample value Xij among all the sample values combined and, as before, N = the total number of sample values (see Table 3.2.1 on page 86 for the complete notation). Then the Kruskal-Wallis statistic is the rank-based equivalent of the F-statistic, given by the formula

$$KW = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2$$

where n_i is the size of sample i and \bar{R}_i is the mean of the ranks Rij in sample i.

Note that the sum in the KW formula is actually the treatment sum of squares computed on the ranks (remember, (N+1)/2 is the mean of all the N ranks). The constant coefficient of the sum is a "scaling factor" which makes the KW statistic have approximately a chi-square distribution with k-1 degrees of freedom. Thus p-values may be obtained from the chi-square tables when the sample sizes are large enough; from Table A6 in the Appendix (for small sample sizes and for k = 3 or 4); or from a permutation test based on this KW statistic.
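To make the arithmetic concrete, here is a minimal Python sketch of the statistic (the handouts use R and SAS; Python is used here only for illustration). It assumes the usual form KW = 12/(N(N+1)) · Σ n_i (mean rank of sample i − (N+1)/2)², and the function names `midranks` and `kruskal_wallis` as well as the example numbers are made up for this sketch, not taken from the text.

```python
def midranks(values):
    """Rank the values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis(groups):
    """KW statistic (no tie adjustment) for a list of samples."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    grand_mean = (n_total + 1) / 2  # mean of the ranks 1..N
    ss_treat = 0.0
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        start += len(g)
        mean_rank = sum(group_ranks) / len(g)
        ss_treat += len(g) * (mean_rank - grand_mean) ** 2
    return 12.0 / (n_total * (n_total + 1)) * ss_treat


# Hypothetical, perfectly separated samples (not data from the text).
example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(kruskal_wallis(example), 4))  # prints 7.2
```

With k = 3 samples the result would be compared with a chi-square distribution on 2 degrees of freedom, subject to the sample-size caveat above.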
The permutation test based on the KW statistic is done in a similar manner to the others we've done, except that we have to recompute KW after each "shuffle" (sample). Try it now on Example 3.2.1 on page 87; see the R#7 handout.

For SAS, use PROC NPAR1WAY WILCOXON; but be careful about using the EXACT WILCOXON; statement in the k-sample case: it can take several minutes to actually compute the exact probabilities. Try this on the data from Table 3.2.2 on page 87.

In the case of ties in the data, use mid-ranks to compute the ranks and make one of two adjustments (see pp. 88 and 89).
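The shuffling procedure can be sketched roughly as follows in Python (again only as an illustration; the R#7 handout is the reference for the course). The sketch assumes no ties, ranks the pooled data once, and then shuffles the ranks among the samples, which is equivalent to shuffling the data and re-ranking. The names `kw_from_ranks` and `kw_permutation_pvalue`, the number of shuffles, and the seed are all hypothetical choices.

```python
import random


def kw_from_ranks(ranks, sizes):
    """KW statistic from a list of ranks laid out sample by sample."""
    n_total = len(ranks)
    grand_mean = (n_total + 1) / 2
    ss_treat = 0.0
    start = 0
    for n_i in sizes:
        mean_rank = sum(ranks[start:start + n_i]) / n_i
        ss_treat += n_i * (mean_rank - grand_mean) ** 2
        start += n_i
    return 12.0 / (n_total * (n_total + 1)) * ss_treat


def kw_permutation_pvalue(groups, n_shuffles=5000, seed=0):
    """Approximate permutation p-value for KW (assumes no tied values)."""
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    # Rank the pooled data 1..N (valid only when there are no ties).
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0] * len(pooled)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    observed = kw_from_ranks(ranks, sizes)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(ranks)  # random reassignment of ranks to samples
        # Small tolerance guards against floating-point round-off.
        if kw_from_ranks(ranks, sizes) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_shuffles


# Hypothetical samples (not data from the text).
kw_obs, p_value = kw_permutation_pvalue([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(kw_obs, p_value)
```

The p-value is the fraction of shuffles whose KW is at least as large as the observed one; more shuffles give a more stable estimate at the cost of run time.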
It is also possible to create a "KW-like" statistic for general scores (not just ranks or mid-ranks), such as van der Waerden scores. See the statistic GS on page 91 and go over it carefully.

HW: Finish reading this section 3.2; make sure you can calculate the KW statistic in R and SAS and understand the output.

Midterm HW: Apply the Kruskal-Wallis permutation test to the data in problem #2 on page 105.
/* Use the following to calculate the tied KW statistics. */
/* Note that PROC NPAR1WAY does the tied KW on p. 88. */
dm log 'clear';
dm output 'clear';
options ls=80;

data table3_2_3;
  input food_group $ salt_score @@;
  datalines;
pr1 4 pr1 5 pr1 3 pr1 4 pr1 5 pr1 5 pr1 2
pr2 3 pr2 4 pr2 5 pr2 2 pr2 3 pr2 1 pr2 1 pr2 2
pr3 2 pr3 1 pr3 1 pr3 2 pr3 1 pr3 3
;

proc print; run;
proc sort; by food_group; run;
proc rank; run;                 * to get the mid-ranks;
proc means; by food_group; run; * to get the means of the ranks for each group;
proc means; run;                * to get the mean and s.d. for all the ranks combined;

proc npar1way wilcoxon data=table3_2_3;
  class food_group;
  var salt_score;
run;
quit;
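The SAS steps above (mid-ranks, group means of the ranks, overall s.d. of the ranks) are the ingredients of a tie-adjusted statistic. As a cross-check, here is a rough Python sketch of the same arithmetic. It assumes the adjustment has the form Σ n_i (mean score of sample i − overall mean score)² divided by the sample variance of all the scores, which reduces to the ordinary KW when the scores are untied ranks; verify this against pp. 88 and 91 before relying on it. The function names and the optional `score` hook (used here for van der Waerden scores, taken as the normal quantile of rank/(N+1)) are hypothetical.

```python
from statistics import NormalDist, mean, variance


def midranks(values):
    """Rank the values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def scores_statistic(groups, score=None):
    """Treatment sum of squares of the scores over their sample variance.

    With score=None the scores are the mid-ranks (assumed tied-KW form);
    otherwise score(rank, N) converts each mid-rank into a general score.
    """
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    scores = ranks if score is None else [score(r, n_total) for r in ranks]
    grand_mean = mean(scores)
    s2 = variance(scores)  # sample variance, divisor N - 1
    ss_treat = 0.0
    start = 0
    for g in groups:
        group_scores = scores[start:start + len(g)]
        start += len(g)
        ss_treat += len(g) * (mean(group_scores) - grand_mean) ** 2
    return ss_treat / s2


def van_der_waerden(rank, n_total):
    """Normal quantile of rank / (N + 1)."""
    return NormalDist().inv_cdf(rank / (n_total + 1))


# Salt scores from the SAS data step above (table3_2_3).
salt = {
    "pr1": [4, 5, 3, 4, 5, 5, 2],
    "pr2": [3, 4, 5, 2, 3, 1, 1, 2],
    "pr3": [2, 1, 1, 2, 1, 3],
}
samples = list(salt.values())
print("mid-rank version:", scores_statistic(samples))
print("van der Waerden version:", scores_statistic(samples, van_der_waerden))
```

If the assumed form matches the text, the mid-rank value should agree with the chi-square statistic that PROC NPAR1WAY reports for these data.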