Chapter 15 Analysis of Variance (ANOVA)
Analysis of Variance… Analysis of variance is a technique that allows us to compare two or more populations of interval data. Analysis of variance is: an extremely powerful and widely used procedure; a procedure that determines whether differences exist between population means; and a procedure that works by analyzing sample variance.
One-Way Analysis of Variance… Independent samples are drawn from k populations. Note: These populations are referred to as treatments. It is not a requirement that $n_1 = n_2 = \dots = n_k$.
Table Notation for the One-Way Analysis of Variance
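Notation: Independent samples are drawn from k populations (treatments), arranged as columns 1, 2, …, k. Column j lists the observations $x_{1j}, x_{2j}, \dots, x_{n_j j}$, followed by the sample size $n_j$ and the sample mean $\bar{x}_j$. For example, $x_{11}$ is the first observation of the first sample and $x_{22}$ is the second observation of the second sample. X is the “response variable”. The variable’s values are called “responses”.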
One Way Analysis of Variance… New Terminology: x is the response variable, and its values are responses. $x_{ij}$ refers to the i th observation in the j th sample. E.g. $x_{35}$ is the third observation of the fifth sample. $n_j$ = number of observations in the sample taken from the j th population. $\bar{x}_j$ = mean of the j th sample: $\bar{x}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ij}$
One Way Analysis of Variance… The grand mean, $\bar{\bar{x}}$, is the mean of all the observations, i.e.: $\bar{\bar{x}} = \frac{1}{n} \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}$ where $n = n_1 + n_2 + \dots + n_k$ and k is the number of populations.
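A minimal Python sketch of the sample means and the grand mean. The data are hypothetical toy numbers (not the Xm15-01 sales figures), and the function names are illustrative only.

```python
# Hypothetical toy data: k = 3 samples of unequal size, one list per treatment.
samples = [
    [10.0, 12.0, 14.0],
    [20.0, 22.0],
    [30.0, 31.0, 32.0, 35.0],
]

def sample_means(samples):
    """Mean of each sample: x-bar_j = (1 / n_j) * sum over i of x_ij."""
    return [sum(s) / len(s) for s in samples]

def grand_mean(samples):
    """Mean of all n = n_1 + ... + n_k observations pooled together."""
    n = sum(len(s) for s in samples)
    return sum(sum(s) for s in samples) / n

means = sample_means(samples)
grand = grand_mean(samples)
print(means, grand)
```

With unequal sample sizes the grand mean is not simply the average of the sample means, because larger samples contribute more observations to the pooled total.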
One Way Analysis of Variance… More New Terminology: The unit that we measure is called an experimental unit. Population classification criterion is called a factor. Each population is a factor level.
Example 15.1… An apple juice company has a new product featuring more convenience, similar or better quality, and lower price when compared with existing juice products. Which factor should an advertising campaign focus on? Before going national, test markets are set up in three cities, each with its own campaign, and data is recorded. Do differences in sales exist between the test markets?
Data Xm15-01: weekly sales for City 1 (Convenience), City 2 (Quality), and City 3 (Price).
Example 15.1… Terminology: x is the response variable, and its values are responses. Weekly sales is the response variable; the actual sales figures are the responses in this example. $x_{ij}$ refers to the i th observation in the j th sample. E.g. $x_{42}$ is the fourth week’s sales in city #2: 717 pkgs. $x_{20,3}$ is the last week of sales for city #3: 532 pkgs (comma added to the subscript for clarity).
Example 15.1… Terminology: The unit that we measure is called an experimental unit. Here the units are the weeks in each city, and the response variable measured on them is weekly sales. Population classification criterion is called a factor. The advertising strategy is the factor we’re interested in. This is the only factor under consideration (hence the term “one way” analysis of variance). Each population is a factor level. In this example, there are three factor levels: convenience, quality, and price.
In the context of this problem… Response variable – weekly sales Responses – actual sale values Experimental unit – weeks in the three cities when we record sales figures. Factor – the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy. Factor levels – the population (treatment) names. In this problem factor levels are the marketing strategies.
Example 15.1… The null hypothesis in this case is: $H_0: \mu_1 = \mu_2 = \mu_3$, i.e. there are no differences between population means. Our alternative hypothesis becomes: $H_1$: at least two means differ. OK. Now we need some test statistics… IDENTIFY
The rationale of the test statistic: Two types of variability are employed when testing for the equality of the population means.
Graphical demonstration: Employing two types of variability
(Figure: two panels plotting the observations for Treatments 1, 2 and 3.) A small variability within the samples makes it easier to draw a conclusion about the population means. In the second panel the sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
The rationale behind the test statistic – I If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean). If the alternative hypothesis is true, at least some of the sample means would differ. Thus, we measure variability between sample means.
Variability between sample means: The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean. This sum is called the Sum of Squares for Treatments (SST). In our example treatments are represented by the different advertising strategies.
Sum of squares for treatments (SST): $SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$, where there are k treatments, $n_j$ is the size of sample j, and $\bar{x}_j$ is the mean of sample j. Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, large SST indicates large variation between sample means, which supports $H_1$.
Test Statistics… Since $\mu_1 = \mu_2 = \mu_3$ is of interest to us, a statistic that measures the proximity of the sample means to each other would also be of interest. Such a statistic exists, and is called the between-treatments variation. It is denoted SST, short for “sum of squares for treatments”. It is calculated as: $SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$, a sum across the k treatments of squared distances from the grand mean $\bar{\bar{x}}$. A large SST indicates large variation between sample means, which supports $H_1$.
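Example 15.1… Since $SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$, if it were the case that $\bar{x}_1 = \bar{x}_2 = \bar{x}_3$, then SST = 0 and our null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3$, would be supported. More generally, a “small value” of SST supports the null hypothesis. The question is, how small is “small enough”? COMPUTE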
Example 15.1… The following sample statistics and grand mean were computed… Hence, the between-treatments variation, sum of squares for treatments, is: is SST = 57, “large enough” to indicate the population means differ? COMPUTE
The rationale behind the test statistic – II: Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means. Therefore, even though sample means may markedly differ from one another, SST must be judged relative to the “within samples variability”.
Within samples variability: The variability within samples is measured by adding all the squared distances between observations and their sample means. This sum is called the Sum of Squares for Error (SSE). In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all the three cities).
Test Statistics… SST gave us the between-treatments variation. A second statistic, SSE (Sum of Squares for Error), measures the within-treatments variation. SSE is given by: $SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$ or: $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2$. In the second formulation, it is easier to see that it provides a measure of the amount of variation we can expect from the random variable we’ve observed.
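The two sums of squares can be sketched in a few lines of Python. This is an illustration on hypothetical toy data, not the Example 15.1 sales figures; it also checks that the two formulations of SSE agree.

```python
from statistics import mean, variance

# Hypothetical toy data: one list of observations per treatment.
samples = [[10.0, 12.0, 14.0], [20.0, 22.0], [30.0, 31.0, 32.0, 35.0]]

def sst(samples):
    """Between-treatments variation: sum of n_j * (x-bar_j - grand mean)^2."""
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    return sum(len(s) * (mean(s) - grand) ** 2 for s in samples)

def sse_direct(samples):
    """Within-treatments variation: squared distances to each sample mean."""
    return sum((x - mean(s)) ** 2 for s in samples for x in s)

def sse_from_variances(samples):
    """Same quantity written as the sum of (n_j - 1) * s_j^2."""
    return sum((len(s) - 1) * variance(s) for s in samples)

print(sst(samples), sse_direct(samples), sse_from_variances(samples))
```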
Example 15.1… We calculate the sample variances as: $s_j^2 = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$ and from these, calculate the within-treatments variation (sum of squares for error) as: $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$ COMPUTE
Sum of squares for errors (SSE): Is SST = 57, large enough relative to SSE = 506, to reject the null hypothesis that specifies that all the means are equal? We still need a couple more quantities in order to relate SST and SSE together in a meaningful way…
Mean Squares… The mean square for treatments (MST) is given by: $MST = \frac{SST}{k-1}$. The mean square for errors (MSE) is given by: $MSE = \frac{SSE}{n-k}$. And the test statistic: $F = \frac{MST}{MSE}$ is F-distributed with k–1 and n–k degrees of freedom. In Example 15.1: $\nu_1 = 3 - 1 = 2$; $\nu_2 = 60 - 3 = 57$
Example 15.1… We can calculate the mean squares treatment and mean squares error quantities as: COMPUTE
Example 15.1… Giving us our F-statistic of F = 3.23. Does F = 3.23 fall into a rejection region or not? How does it compare to a critical value of F? Note these required conditions: 1. The populations tested are normally distributed. 2. The variances of all the populations tested are equal. COMPUTE
Example 15.1… Since the purpose of calculating the F-statistic is to determine whether the value of SST is large enough to reject the null hypothesis, if SST is large, F will be large. Hence our rejection region is: $F > F_{\alpha,\,k-1,\,n-k}$. Our value for $F_{Critical}$ is: $F_{.05,\,2,\,57} \approx 3.15$ INTERPRET
Example 15.1… Since F = 3.23 is greater than $F_{Critical}$ = 3.15, we reject the null hypothesis ($H_0: \mu_1 = \mu_2 = \mu_3$) in favor of the alternative hypothesis ($H_1$: at least two population means differ). That is: there is enough evidence to infer that the mean weekly sales differ between the three cities. Stated another way: we are quite confident that the strategy used to advertise the product will produce different sales figures. INTERPRET
Summary of Techniques (so far)…
ANOVA Table… The results of analysis of variance are usually reported in an ANOVA table…
Source of Variation | degrees of freedom | Sum of Squares | Mean Square
Treatments | k–1 | SST | MST = SST/(k–1)
Error | n–k | SSE | MSE = SSE/(n–k)
Total | n–1 | SS(Total) |
F-stat = MST/MSE
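The table entries can be assembled as in the sketch below. The function name and the dictionary keys are illustrative choices, and the data are hypothetical; comparing F with a critical value still requires an F table or statistical software.

```python
def one_way_anova(samples):
    """Return the one-way ANOVA table entries for a list of samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    sst = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mst = sst / (k - 1)          # mean square for treatments
    mse = sse / (n - k)          # mean square for error
    return {
        "df_treatments": k - 1,
        "df_error": n - k,
        "df_total": n - 1,
        "SST": sst,
        "SSE": sse,
        "SS(Total)": sst + sse,
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
    }

# Hypothetical toy data, one list per treatment.
table = one_way_anova([[10.0, 12.0, 14.0], [20.0, 22.0], [30.0, 31.0, 32.0, 35.0]])
for name, value in table.items():
    print(name, value)
```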
Table 15.2 ANOVA Table for the One-Way Analysis of Variance
Table 15.3 ANOVA Table for Example 15.1
SPSS Output
Figure 15.3a Histogram of Sales, City 1 (Convenience) Checking required conditions
Figure 15.3b Histogram of Sales, City 2 (Quality)
Figure 15.3c Histogram of Sales, City 3 (Price)
Can We Use t-Test Instead of ANOVA? We can’t, for two reasons. 1. We need to perform more calculations. If we have six populations then we will have to test every pair: $C_2^6 = (6 \times 5)/2 = 15$ tests. 2. It will increase the probability of making a Type I error from 5% to 54%, since $1 - (.95)^{15} \approx .54$.
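The 15 and the 54% can be reproduced with a short calculation. The sketch assumes the pairwise tests are independent, which is the simplification behind the 54% figure; the function name is illustrative.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Number of pairwise tests among k populations and the chance that
    at least one of them commits a Type I error (tests assumed independent)."""
    pairs = comb(k, 2)
    return pairs, 1 - (1 - alpha) ** pairs

pairs, error_rate = familywise_error(6)
print(pairs, round(error_rate, 2))
```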
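Relationship Between t and F Statistics: $F = t^2$. When only two populations are compared, the F statistic equals the square of the equal-variances t statistic (computed values may differ slightly because of rounding). Hence we will draw exactly the same conclusion using analysis of variance as we did when we applied the t test of $\mu_1 - \mu_2$.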
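A quick numerical check of this relationship, on hypothetical data. The sketch assumes the equal-variances (pooled) t statistic; the function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Equal-variances t statistic for the difference between two means."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

def one_way_f(samples):
    """One-way ANOVA F statistic, F = MST / MSE."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    sst = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    return (sst / (k - 1)) / (sse / (n - k))

# Hypothetical two-sample data.
a = [10.0, 12.0, 14.0]
b = [20.0, 22.0, 25.0, 27.0]
t = pooled_t(a, b)
f = one_way_f([a, b])
print(t ** 2, f)
```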
Identifying Factors… Factors that Identify the One-Way Analysis of Variance:
Analysis of Variance Experimental Designs Experimental design is one of the factors that determines which technique we use. In the previous example we compared three populations on the basis of one factor – advertising strategy. One-way analysis of variance is only one of many different experimental designs of the analysis of variance.
Analysis of Variance Experimental Designs A multifactor experiment is one where there are two or more factors that define the treatments. For example, if instead of just varying the advertising strategy for our new apple juice product we also vary the advertising medium (e.g. television or newspaper), then we have a two-factor analysis of variance situation. The first factor, advertising strategy, still has three levels (convenience, quality, and price) while the second factor, advertising medium, has two levels (TV or print).
(Figure: One-way ANOVA, single factor: the response is plotted for Treatments 1, 2 and 3, the levels of one factor. Two-way ANOVA, two factors: the response is plotted for each combination of the levels of Factor A and Factor B.)
Independent Samples and Blocks Similar to the ‘matched pairs experiment’, a randomized block design experiment reduces the variation within the samples, making it easier to detect differences between populations. The term block refers to a matched group of observations from each population. We can also perform a blocked experiment by using the same subject for each treatment in a “repeated measures” experiment.
Independent Samples and Blocks The randomized block experiment is also called the two-way analysis of variance, not to be confused with the two-factor analysis of variance. To illustrate where we’re headed… we’ll do this first
Models of Fixed and Random Effects. Fixed effects: If all possible levels of a factor are included in our analysis we have a fixed effect ANOVA. The conclusion of a fixed effect ANOVA applies only to the levels studied. Random effects: If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effect ANOVA. The conclusion of the random-effect ANOVA applies to all the levels (not only those studied).
Models of Fixed and Random Effects. In some ANOVA models the test statistic of the fixed effects case may differ from the test statistic of the random effect case. Examples: Fixed effects: the advertisement example (15.1), where all the levels of the marketing strategies were included. Random effects: to determine if there is a difference in the production rate of 50 machines, four machines are randomly selected and their production recorded.
Randomized Block Analysis of Variance The purpose of designing a randomized block experiment is to reduce the within-treatments variation to more easily detect differences between the treatment means. In this design, we partition the total variation into three sources of variation: SS(Total) = SST + SSB + SSE where SSB, the sum of squares for blocks, measures the variation between the blocks.
Randomized Blocks: Block all the observations with some commonality across treatments. (Figure: Treatments 1 to 4 laid out across Blocks 1, 2 and 3.)
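Randomized Blocks… In addition to k treatments, we introduce notation for b blocks in our experimental design… $\bar{x}[T]_j$ is the mean of the observations of the j th treatment (e.g. $\bar{x}[T]_2$ is the mean of the observations of the 2nd treatment) and $\bar{x}[B]_i$ is the mean of the observations of the i th block (e.g. $\bar{x}[B]_1$ is the mean of the observations of the 1st block).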
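Sum of Squares: Randomized Block… Squaring the ‘distance’ from the grand mean leads to the following set of formulae… $SS(Total) = \sum_{j=1}^{k} \sum_{i=1}^{b} (x_{ij} - \bar{\bar{x}})^2$; $SST = b \sum_{j=1}^{k} (\bar{x}[T]_j - \bar{\bar{x}})^2$; $SSB = k \sum_{i=1}^{b} (\bar{x}[B]_i - \bar{\bar{x}})^2$; $SSE = \sum_{j=1}^{k} \sum_{i=1}^{b} (x_{ij} - \bar{x}[T]_j - \bar{x}[B]_i + \bar{\bar{x}})^2$. Test statistic for treatments: $F = MST/MSE$. Test statistic for blocks: $F = MSB/MSE$.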
ANOVA Table… We can summarize this new information in an analysis of variance (ANOVA) table for the randomized block analysis of variance as follows…
Source of Variation | d.f. | Sum of Squares | Mean Square | F Statistic
Treatments | k–1 | SST | MST = SST/(k–1) | F = MST/MSE
Blocks | b–1 | SSB | MSB = SSB/(b–1) | F = MSB/MSE
Error | n–k–b+1 | SSE | MSE = SSE/(n–k–b+1) |
Total | n–1 | SS(Total) | |
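A sketch of the randomized block partition SS(Total) = SST + SSB + SSE. The layout (one row per block, one column per treatment), the function name, and the data are assumptions for illustration; they are not the Example 15.2 cholesterol data.

```python
def randomized_block_anova(data):
    """data[i][j] is the observation for block i under treatment j."""
    b, k = len(data), len(data[0])
    n = b * k
    grand = sum(sum(row) for row in data) / n
    t_mean = [sum(row[j] for row in data) / b for j in range(k)]  # treatment means
    b_mean = [sum(row) / k for row in data]                       # block means
    sst = b * sum((m - grand) ** 2 for m in t_mean)
    ssb = k * sum((m - grand) ** 2 for m in b_mean)
    sse = sum(
        (data[i][j] - t_mean[j] - b_mean[i] + grand) ** 2
        for i in range(b) for j in range(k)
    )
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    df_error = n - k - b + 1
    mse = sse / df_error
    return {
        "SST": sst, "SSB": ssb, "SSE": sse, "SS(Total)": ss_total,
        "df_error": df_error,
        "F_treatments": (sst / (k - 1)) / mse,
        "F_blocks": (ssb / (b - 1)) / mse,
    }

# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
result = randomized_block_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 7.0], [2.0, 2.0, 5.0]])
print(result)
```

Removing SSB from what would otherwise be counted as error is what makes the treatment F statistic larger than in the independent-samples design when the blocks really differ.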
Test Statistics & Rejection Regions…
Example 15.2… Are there differences in the effectiveness of four new cholesterol drugs? 25 groups of men were matched according to age & weight, and the results were recorded. The hypotheses to test in this case are: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$; $H_1$: At least two means differ IDENTIFY
Example 15.2… Each of the four drugs can be considered a treatment. Each group can be treated as a block, because its members are matched by age and weight. By setting up the experiment this way, we eliminate the variability in cholesterol reduction related to different combinations of age and weight. This helps detect differences in the mean cholesterol reduction attributed to the different drugs. IDENTIFY
Example 15.2… The Data: cholesterol reduction for each group (block) under Drug 1, Drug 2, Drug 3 and Drug 4.
SPSS Output (annotated: Treatments with k – 1 degrees of freedom and MST; Blocks with b – 1 degrees of freedom and MSB)
The p value to determine whether differences exist between the four drugs (treatments) is .009. Thus we reject $H_0$ in favor of the research hypothesis: at least two means differ. The p value for groups, reported as 0, indicates that there are differences between the groups of men (blocks); that is, age and weight have an impact, but our experimental design accounts for that.
Identifying Factors… Factors that Identify the Randomized Block of the Analysis of Variance:
Two-Factor Analysis of Variance… The original set-up for Example 15.1 examined one factor, namely the effects of the marketing strategy on sales. Emphasis on convenience, Emphasis on quality, or Emphasis on price. Suppose we introduce a second factor, that being the effects of the selected media on sales, that is: Advertise on television, or Advertise in newspapers. To which factor(s) or the interaction of factors can we attribute any differences in mean sales of apple juice?
More Terminology… A complete factorial experiment is an experiment in which the data for all possible combinations of the levels of the factors are gathered. This is also known as a two-way classification. The two factors are usually labeled A & B, with the number of levels of each factor denoted by a & b respectively. Each combination of levels is a treatment, and the number of observations for each combination is the number of replicates, denoted by r. For our purposes, the number of replicates will be the same for each treatment; that is, the design is balanced.
Example 15.3 Test Marketing of Advertising Strategies and Advertising Media. Media: Television & Newspaper. City 1: Convenience – Television; City 2: Convenience – Newspaper; City 3: Quality – Television; City 4: Quality – Newspaper; City 5: Price – Television; City 6: Price – Newspaper
Sales Data: weekly sales for cities C-1 through C-6.
The Data: Factor A: Strategy (Convenience, Quality, & Price). Factor B: Medium (Television & Newspaper). Each strategy column holds the Television observations followed by the Newspaper observations.
Example 15.3… The Data: Factor “A” is Strategy and Factor “B” is Medium. There are a = 3 levels of factor A and b = 2 levels of factor B, yielding 3 x 2 = 6 treatments; each treatment has r = 10 replicates (observations)…
Possible Outcomes… This figure illustrates the case where there are differences between levels of A, but no difference between the levels of B and no interaction between A & B. (Figure: mean response plotted against the levels of factor A; the lines for levels 1 and 2 of factor B coincide.)
Possible Outcomes… This figure illustrates the case where there are differences between levels of B, but no differences between the levels of A and no interaction between A & B. (Figure: mean response plotted against the levels of factor A, with separate lines for level 1 and level 2 of factor B.)
Possible Outcomes… This figure illustrates the case where there are differences between levels of A, and there are differences between the levels of B, but no interaction between A & B (i.e. the factors affect sales independently, which means there is no interaction). (Figure: mean response plotted against the levels of factor A, with separate lines for level 1 and level 2 of factor B.)
Possible Outcomes… This figure shows the levels of A & B interacting. (Figure: mean response plotted against the levels of factor A, with separate lines for level 1 and level 2 of factor B.)
ANOVA Table… Table 15.8
Source of Variation | d.f. | Sum of Squares | Mean Square | F Statistic
Factor A | a–1 | SS(A) | MS(A) = SS(A)/(a–1) | F = MS(A)/MSE
Factor B | b–1 | SS(B) | MS(B) = SS(B)/(b–1) | F = MS(B)/MSE
Interaction | (a–1)(b–1) | SS(AB) | MS(AB) = SS(AB)/[(a–1)(b–1)] | F = MS(AB)/MSE
Error | n–ab | SSE | MSE = SSE/(n–ab) |
Total | n–1 | SS(Total) | |
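The sketch below fills in this table for a balanced design. It assumes the usual balanced two-factor sums of squares (the formulas themselves are not shown on the slide), a nested-list layout chosen for illustration, and hypothetical data with r = 2 rather than the Example 15.3 sales.

```python
def two_factor_anova(cells):
    """cells[i][j] holds the r replicates for level i of A and level j of B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    n = a * b * r
    grand = sum(x for row in cells for cell in row for x in cell) / n
    cell_mean = [[sum(cell) / r for cell in row] for row in cells]
    a_mean = [sum(row) / b for row in cell_mean]
    b_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    ss_a = r * b * sum((m - grand) ** 2 for m in a_mean)
    ss_b = r * a * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = r * sum(
        (cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    sse = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ss_total = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)
    mse = sse / (n - a * b)
    return {
        "SS(A)": ss_a, "SS(B)": ss_b, "SS(AB)": ss_ab, "SSE": sse,
        "SS(Total)": ss_total,
        "df": (a - 1, b - 1, (a - 1) * (b - 1), n - a * b),
        "F_A": (ss_a / (a - 1)) / mse,
        "F_B": (ss_b / (b - 1)) / mse,
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / mse,
    }

# Hypothetical data: a = 3 strategies, b = 2 media, r = 2 replicates per cell.
cells = [
    [[10.0, 12.0], [11.0, 13.0]],
    [[15.0, 17.0], [14.0, 18.0]],
    [[12.0, 12.0], [13.0, 15.0]],
]
result = two_factor_anova(cells)
print(result)
```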
Two Factor ANOVA… Test for the differences between the Levels of Factor A… H 0 : The means of the a levels of Factor A are equal H 1 : At least two means differ Test statistic: F = MS(A) / MSE Example 15.3: Are there differences in the mean sales caused by different marketing strategies? H 0 : μ convenience = μ quality = μ price H 1 : At least two means differ
Two Factor ANOVA… Test for the differences between the Levels of Factor B… $H_0$: The means of the b levels of Factor B are equal. $H_1$: At least two means differ. Test statistic: F = MS(B) / MSE. Example 15.3: Are there differences in the mean sales caused by different advertising media? $H_0: \mu_{television} = \mu_{newspaper}$; $H_1$: At least two means differ
Two Factor ANOVA… Test for interaction between Factors A and B… $H_0$: Factors A and B do not interact to affect the mean responses. $H_1$: Factors A and B do interact to affect the mean responses. Test statistic: F = MS(AB) / MSE. Example 15.3: Are there differences in the mean sales caused by interaction between marketing strategy and advertising medium? $H_0$: Marketing strategy and advertising medium do not interact to affect mean sales. $H_1$: Marketing strategy and advertising medium do interact to affect mean sales.
SPSS Output (annotated rows: Factor B – Media; Factor A – Mktg Strategy; Interaction of A&B; Error) COMPUTE
Example 15.3… INTERPRET There is evidence at the 5% significance level to infer that differences in weekly sales exist between the different marketing strategies (Factor A).
Example 15.3… INTERPRET There is insufficient evidence at the 5% significance level to infer that differences in weekly sales exist between television and newspaper advertising (Factor B).
Example 15.3… INTERPRET There is not enough evidence to conclude that there is an interaction between marketing strategy and advertising medium that affects mean weekly sales (interaction of Factor A & Factor B).
See for yourself… There are differences between the levels of factor A, no difference between the levels of factor B, and no interaction is apparent.
See for yourself… These results indicate that emphasizing quality produces the highest sales and that television and newspapers are equally effective. INTERPRET
Identifying Factors… Independent Samples Two-Factor Analysis of Variance…
Multiple Comparisons… When we conclude from the one-way analysis of variance that at least two treatment means differ (i.e. we reject the null hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_k$), we often need to know which treatment means are responsible for these differences. We will examine three statistical inference procedures that allow us to determine which population means differ: Fisher’s least significant difference (LSD) method, the Bonferroni adjustment, and Tukey’s multiple comparison method.
Multiple Comparisons… Two means are considered different if the difference between the corresponding sample means is larger than a critical number. The general case for this is: IF $|\bar{x}_i - \bar{x}_j| > N_{Critical}$ THEN we conclude $\mu_i$ and $\mu_j$ differ. The larger sample mean is then believed to be associated with a larger population mean.
Fisher’s Least Significant Difference… What is this critical number, $N_{Critical}$? One measure is the Least Significant Difference, given by: $LSD = t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$, where $t_{\alpha/2}$ has n–k degrees of freedom. LSD will be the same for all pairs of means if all k sample sizes are equal. If some sample sizes differ, LSD must be calculated for each combination.
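A small sketch of the LSD comparison. The critical value of t is passed in as an argument because it comes from a t table (n–k degrees of freedom); the numbers used below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def lsd(t_crit, mse, n_i, n_j):
    """Least significant difference for samples i and j.
    t_crit is t_{alpha/2} with n - k degrees of freedom, looked up separately."""
    return t_crit * sqrt(mse * (1 / n_i + 1 / n_j))

def means_differ(mean_i, mean_j, critical_number):
    """Conclude the population means differ if |x-bar_i - x-bar_j| exceeds it."""
    return abs(mean_i - mean_j) > critical_number

# Hypothetical inputs: t_crit = 2.0, MSE = 8.0, four observations per sample.
critical_number = lsd(2.0, 8.0, 4, 4)
print(critical_number, means_differ(12.0, 17.5, critical_number))
```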
Back to Example 15.1… With k = 3 treatments (marketing strategy based on convenience, quality, or price), we will perform three pairwise comparisons based on the sample means. We compare these to the Least Significant Difference, which we calculate at the 5% significance level:
Example 15.1 Fisher’s LSD: We conclude that only the means for convenience and quality differ.
Bonferroni Adjustment to LSD Method… Fisher’s method may result in an increased probability of committing a type I error. We can adjust Fisher’s LSD calculation by using the “Bonferroni adjustment”. Where we used alpha ($\alpha$), say .05, previously, we now use an adjusted value for alpha: $\alpha = \alpha_E / C$, where $\alpha_E$ is the experimentwise Type I error rate and $C = k(k-1)/2$ is the number of pairwise comparisons.
Example 15.1 Bonferroni’s Adjustment: Since we have k = 3 treatments, C = k(k–1)/2 = 3(2)/2 = 3, hence we set our new alpha value to: $\alpha = .05/3 = .0167$. Thus, instead of using $t_{.05/2}$ in our LSD calculation, we are going to use $t_{.0167/2}$.
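The adjusted alpha for Example 15.1 can be checked with a one-line calculation; the function name is illustrative.

```python
def bonferroni_alpha(k, alpha_e=0.05):
    """Per-comparison alpha when all C = k(k-1)/2 pairs of k means are compared."""
    c = k * (k - 1) // 2
    return c, alpha_e / c

comparisons, adjusted = bonferroni_alpha(3)
print(comparisons, round(adjusted, 4))
```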
Bonferroni Similar result as before but different Std.error and Sig…
Tukey’s Multiple Comparison Method… As before, we are looking for a critical number to compare the differences of the sample means against. In this case: $\omega = q_{\alpha}(k, \nu) \sqrt{\frac{MSE}{n_g}}$, where $q_{\alpha}(k, \nu)$ is the critical value of the Studentized range with $\nu$ = n–k degrees of freedom (Table 7 - Appendix B) and $n_g$ is the harmonic mean of the sample sizes. Note: $\omega$ is a lower case Omega, not a “w”.
Example 15.1 Tukey’s Method… Similar result as before but different Std.error and Sig…
Which method to use? In example 15.1, all three multiple comparison methods yielded the same results. This will not always be the case! Generally speaking… If you have identified two or three pairwise comparisons that you wish to make before conducting the analysis of variance, use the Bonferroni method. If you plan to compare all possible combinations, use Tukey’s comparison method.
Nonparametric Tests for Two or More Populations
Kruskal-Wallis Test… So far we’ve been comparing locations of two populations, now we’ll look at comparing two or more populations. The Kruskal-Wallis test is applied to problems where we want to compare two or more populations of ordinal or interval (but nonnormal) data from independent samples. Our hypotheses will be: H 0 : The locations of all k populations are the same. H 1 : At least two population locations differ.
Test Statistic… In order to calculate the Kruskal-Wallis test statistic, we need to: 1. Rank all the observations from smallest (1) to largest (n), and average the ranks in the case of ties. 2. Calculate rank sums for each sample: $T_1, T_2, \dots, T_k$. 3. Lastly, calculate the test statistic (denoted H): $H = \left[ \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j} \right] - 3(n+1)$
Sampling Distribution of the Test Statistic: For sample sizes greater than or equal to 5, the test statistic H is approximately chi-squared distributed with k–1 degrees of freedom. Our rejection region is: $H > \chi^2_{\alpha,k-1}$. And our p-value is: $P(\chi^2 > H)$
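The three steps can be sketched as follows. The helper names and the data are hypothetical, and the sketch applies no correction for ties beyond averaging the ranks, matching the formula for H as stated.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1      # average of ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks

def kruskal_wallis_h(samples):
    """H = 12 / (n(n+1)) * sum(T_j^2 / n_j) - 3(n+1)."""
    pooled = [x for s in samples for x in s]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for s in samples:
        t_j = sum(ranks[start:start + len(s)])   # rank sum of this sample
        total += t_j ** 2 / len(s)
        start += len(s)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical, fully separated samples: rank sums are 6, 15 and 24.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h)
```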
Figure Sampling Distribution of H
Example 21.5… Can we compare customer ratings (4=good … 1=poor) for “speed of service” across three shifts in a fast food restaurant? Our hypotheses will be: $H_0$: The locations of all 3 populations are the same (that is, there is no difference in service between shifts), and $H_1$: At least two population locations differ. Customer ratings for service were recorded… IDENTIFY
Example 21.5… Customers were selected at random from each of the three shifts: 4:00 P.M. to midnight, midnight to 8:00 A.M., and 8:00 A.M. to 4:00 P.M.
Example 21.5… One way to solve the problem is to take the original data, “stack” it, and then sort by customer response & rank bottom to top… COMPUTE sorted by response
Example 21.5… Once it’s in “stacked” format, put in straight rankings from 1 to 30, average the rankings for the same response, then parse them out by shift to come up with rank sum totals… COMPUTE
Example 21.5… COMPUTE H = 2.64. Our critical value of chi-squared (5% significance and k–1 = 2 degrees of freedom) is 5.99; hence there is not enough evidence to reject $H_0$.
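With exactly 2 degrees of freedom the chi-squared tail probability has a closed form, $P(\chi^2 > x) = e^{-x/2}$, so the critical value and an approximate p-value for H = 2.64 can be checked without tables. The function names are illustrative, and the shortcut applies only to the 2-degrees-of-freedom case.

```python
from math import exp, log

def chi2_tail_2df(x):
    """P(chi-squared > x) for exactly 2 degrees of freedom."""
    return exp(-x / 2)

def chi2_critical_2df(alpha):
    """Value c with P(chi-squared > c) = alpha, for 2 degrees of freedom."""
    return -2 * log(alpha)

critical = chi2_critical_2df(0.05)
p_value = chi2_tail_2df(2.64)
print(round(critical, 2), round(p_value, 3))
```

Because the p-value is well above .05, it leads to the same decision as comparing H with the critical value.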
Example 21.5… “There is not enough evidence to infer that a difference in speed of service exists between the three shifts, i.e. all three of the shifts are equally rated, and any action to improve service should be applied to all three shifts” INTERPRET
Example 21.5… compare… p-value COMPUTE
SPSS Output
Identifying Factors… Factors that Identify the Kruskal-Wallis Test…
Friedman Test… The Friedman Test is a technique used to compare two or more populations of ordinal or interval (nonnormal) data that are generated from a matched pairs (blocked) experiment. The hypotheses are the same as before: $H_0$: The locations of all k populations are the same. $H_1$: At least two population locations differ.
Friedman Test – Test Statistic… Since this is a matched pairs (blocked) experiment, we first rank each observation within each of b blocks from smallest to largest (i.e. from 1 to k), averaging any ties. We then compute the rank sums: $T_1, T_2, \dots, T_k$. Then we calculate our test statistic: $F_r = \left[ \frac{12}{b\,k\,(k+1)} \sum_{j=1}^{k} T_j^2 \right] - 3b(k+1)$
Friedman Test – Sampling Distribution of the Test Statistic… The test statistic is approximately chi-squared distributed with k–1 degrees of freedom, provided either k or b is greater than or equal to 5. The rejection region is $F_r > \chi^2_{\alpha,k-1}$ and the p value is $P(\chi^2 > F_r)$. The figure on the next slide depicts the sampling distribution and p value.
Figure Sampling Distribution of F r
Example 21.6… Four managers evaluate and score job applicants on a scale from 1 (good) to 5 (not so good). There have been complaints that the process isn’t fair. Is it the case that all managers score the candidates equally or not? That is: IDENTIFY
Example 21.6… $H_0$: The locations of all 4 populations are the same (i.e. all managers score like candidates alike). $H_1$: At least two population locations differ (i.e. there is some disagreement between managers on scores). The rejection region is $F_r > \chi^2_{\alpha,k-1} = \chi^2_{.05,3} = 7.81$ IDENTIFY
Example 21.6… The data looks like this: COMPUTE There are k=4 populations (managers) and b=8 blocks (applicants) in this set-up.
Example 21.6… Applicant #1, for example, received a top score from one manager and next-to-top scores from the other three. Applicant #7 received a top score from one manager as well, but the other three scored this candidate very low… COMPUTE
Example 21.6… “rank each observation within block from smallest to largest (i.e. from 1 to k), averaging any ties”… For example, consider the case of candidate #2: COMPUTE
 | Manager 1 | Manager 2 | Manager 3 | Manager 4 | checksum
Original scores | 4 | 2 | 3 | 2 |
“Straight” ranking | 4 | 1 | 3 | 2 | 10
Averaged ranking | 4 | (1+2)/2 = 1.5 | 3 | (1+2)/2 = 1.5 | 10
checksum = 1 + 2 + … + k
Example 21.6… Compute the rank sums: T 1, T 2, …, T k and our test statistic… COMPUTE
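The within-block ranking and the statistic $F_r = \frac{12}{b\,k\,(k+1)} \sum T_j^2 - 3b(k+1)$ can be sketched as below. The candidate #2 scores (4, 2, 3, 2) are taken from the example; the four-block data set used for $F_r$ is hypothetical, and the helper names are illustrative.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks

def friedman_fr(blocks):
    """blocks[i][j] is the score given in block i by population (treatment) j."""
    b, k = len(blocks), len(blocks[0])
    ranked = [average_ranks(row) for row in blocks]          # rank within each block
    rank_sums = [sum(row[j] for row in ranked) for j in range(k)]
    fr = 12 / (b * k * (k + 1)) * sum(t ** 2 for t in rank_sums) - 3 * b * (k + 1)
    return rank_sums, fr

# Candidate #2 from the example: two managers tie on a score of 2.
candidate_2 = average_ranks([4, 2, 3, 2])

# Hypothetical data: 4 blocks that all order 3 treatments the same way.
rank_sums, fr = friedman_fr([[1, 2, 3]] * 4)
print(candidate_2, rank_sums, fr)
```

Each block's ranks always add up to 1 + 2 + … + k, which is the checksum used in the example.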
Example 21.6… COMPUTE The rejection region is $F_r > \chi^2_{\alpha,k-1} = \chi^2_{.05,3} = 7.81$
Example 21.6… The value of our Friedman test statistic is compared to a critical value of chi-squared (at 5% significance and 3 d.f.), which is 7.81. Thus, there is sufficient evidence to reject $H_0$ in favor of $H_1$. It appears that the managers’ evaluations of applicants do indeed differ. INTERPRET
SPSS Output
Identifying Factors… Factors that Identify the Friedman Test…