Comparing Means
An ANOVA F-test can be used to determine whether the expected responses at the t levels of an experimental factor differ from each other. When the null hypothesis is rejected, it may be desirable to find which mean(s) differ, and in what rank order. In practice, testing the null hypothesis is often not of primary interest; instead, investigators want to make specific comparisons of the means and to estimate the pooled error.
Means comparison
Three categories:
1. Pair-wise comparisons
2. Comparisons specified before performing the experiment (planned comparisons)
3. Comparisons specified after observing the outcome of the experiment (post hoc comparisons)
Statistical inference procedures for pair-wise comparisons:
- Fisher's least significant difference (LSD) method
- Duncan's Multiple Range Test (DMRT)
- Student-Newman-Keuls (SNK) test
- Tukey's HSD ("Honestly Significant Difference") procedure
Pair Comparison
Suppose there are t means and an F-test has revealed significant differences among them. The next step is an analysis that determines precisely where the differences exist.
Pair Comparison
Two means are considered different if the difference between the corresponding sample means is larger than a critical number. The larger sample mean is then taken to be associated with the larger population mean.
Conditions common to all the methods:
- The ANOVA model is the one-way analysis of variance.
- The conditions required to perform the ANOVA are satisfied.
- The experiment is fixed-effect.
Comparing Pair-Comparison Methods
With the exception of the F-LSD test, there is no good theoretical argument that favors one pair-comparison method over the others, and professional statisticians often disagree on which method is appropriate. In terms of power and the probability of making a Type I error, the tests discussed can be ordered as follows, from least power and lowest P[Type I error] to most power and highest P[Type I error]:
Tukey HSD Test
Student-Newman-Keuls Test
Duncan Multiple Range Test
Fisher LSD Test
If every comparison has to be put into one of the two groups, pairwise comparisons are traditionally considered "post hoc" rather than "a priori".
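Fisher Least Significant Difference (LSD) Method
This method builds on the equal-variances t-test of the difference between two means. The test statistic is improved by using MSE rather than the pooled variance s_p², since MSE is estimated from all t samples. It is concluded that μ_i and μ_j differ at the α significance level if $|\bar{x}_i - \bar{x}_j| > LSD$, where
$LSD = t_{\alpha/2,\nu}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
and ν is the degrees of freedom of MSE (error degrees of freedom).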
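The LSD rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the critical value t_{α/2,ν} is looked up separately (the hypothetical parameter `t_crit`) and that the group means, sizes and MSE come from a completed one-way ANOVA; the function name `fisher_lsd` is made up for this example.

```python
import math
from itertools import combinations

def fisher_lsd(means, sizes, mse, t_crit):
    """Return the index pairs (i, j) whose sample means differ by more than LSD.

    t_crit is t_{alpha/2, nu} with nu = error df; it is an assumed input
    read from a t table, not computed here.
    """
    significant = []
    for i, j in combinations(range(len(means)), 2):
        # LSD depends on the pair's sample sizes, so recompute it per pair
        lsd = t_crit * math.sqrt(mse * (1 / sizes[i] + 1 / sizes[j]))
        if abs(means[i] - means[j]) > lsd:
            significant.append((i, j))
    return significant

# Made-up data: 3 groups of 5 observations, so nu = 15 - 3 = 12 and
# t_{0.025,12} is about 2.179. LSD is then about 2.756.
print(fisher_lsd([10.0, 12.0, 15.0], [5, 5, 5], mse=4.0, t_crit=2.179))
```

With these illustrative numbers the difference of 2 between the first two means falls below the LSD, while the differences of 5 and 3 exceed it.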
Experimentwise Type I Error Rate (α_E) (the effective Type I error)
Fisher's method may result in an increased probability of committing a Type I error. The experimentwise Type I error rate is the probability of committing at least one Type I error when each comparison is made at significance level α. Treating the comparisons as independent, it is calculated as
$\alpha_E = 1 - (1 - \alpha)^C$
where C is the number of pairwise comparisons (for all pairs, C = t(t − 1)/2).
The Bonferroni adjustment determines the Type I error probability required per pairwise comparison (α) to secure a predetermined overall α_E.
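As a quick numeric check of the formula, the sketch below computes α_E for all pairwise comparisons among t means. It assumes, as the formula does, that the C comparisons are independent; `experimentwise_error` is a hypothetical helper name.

```python
def experimentwise_error(alpha, t):
    """alpha_E = 1 - (1 - alpha)^C, with C = t(t-1)/2 pairwise comparisons.

    Assumes the C comparisons are independent, as the formula does.
    """
    c = t * (t - 1) // 2
    return 1 - (1 - alpha) ** c

# With t = 4 means there are 6 comparisons; alpha = 0.05 each gives
# alpha_E = 1 - 0.95**6, roughly 0.2649.
print(round(experimentwise_error(0.05, 4), 4))
```

Even a modest number of groups pushes the chance of at least one false positive well above the nominal 5%.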
Bonferroni Adjustment
The procedure:
1. Compute the number of pairwise comparisons C (for all pairs, C = t(t − 1)/2), where t is the number of populations.
2. Set α = α_E/C, where α_E is the true probability of making at least one Type I error (the experimentwise Type I error).
3. Conclude that μ_i and μ_j differ at the α_E/C significance level if
$|\bar{x}_i - \bar{x}_j| > t_{\alpha_E/(2C),\nu}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
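The Bonferroni steps can be sketched as follows, assuming all t(t − 1)/2 pairs are compared. The helper name `bonferroni_alpha` is hypothetical; the second function re-evaluates α_E = 1 − (1 − α)^C at the adjusted α to show that the target is met.

```python
def bonferroni_alpha(alpha_e, t):
    """Per-comparison alpha for all C = t(t-1)/2 pairs: alpha = alpha_E / C."""
    c = t * (t - 1) // 2
    return alpha_e / c

def resulting_alpha_e(alpha, t):
    """Experimentwise rate implied by a per-comparison alpha (independence assumed)."""
    c = t * (t - 1) // 2
    return 1 - (1 - alpha) ** c

alpha = bonferroni_alpha(0.05, 4)   # 0.05 / 6, about 0.00833 per comparison
print(alpha, resulting_alpha_e(alpha, 4))  # the second value stays below 0.05
```

The adjusted α would then be used in the t critical value, i.e. t_{α_E/(2C),ν}.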
Duncan's Multiple Range Test
The Duncan Multiple Range test uses different significant-difference values for means that are adjacent on the real number line and for means with one, two, or more means in between the two being compared. The significant difference, or range value, is
$R_p = r_{\alpha,p,\nu}\sqrt{\frac{MSE}{n}}$
where r_{α,p,ν} is Duncan's significant range value with parameters p (the range-value) and ν (the MSE degrees of freedom), at significance level α.
Duncan's Multiple Range Test
MSE is the mean square error from the ANOVA table, and n is the number of observations used to calculate each of the means being compared. The range-value p is:
- 2 if the two means being compared are adjacent
- 3 if one mean separates the two means being compared
- 4 if two means separate the two means being compared
- …
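The way the range-value p follows from the rank positions of the sorted means can be sketched as below. This is an illustration only: `r_table` is a hypothetical dict mapping p to r_{α,p,ν}, which must be read from a Duncan table for your α and ν and must cover p = 2, …, t. The sketch also omits the usual consistency rule that no difference is declared significant inside a range already found nonsignificant.

```python
import math

def duncan_pairs(means, n, mse, r_table):
    """Compare every pair of means against Duncan's R_p = r_p * sqrt(MSE / n).

    r_table: {p: r_{alpha,p,nu}} read from a Duncan table (assumed input).
    Returns tuples (low_index, high_index, difference, R_p, significant).
    """
    order = sorted(range(len(means)), key=lambda k: means[k])
    results = []
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            p = b - a + 1  # 2 if adjacent, 3 if one mean lies in between, ...
            crit = r_table[p] * math.sqrt(mse / n)
            diff = means[order[b]] - means[order[a]]
            results.append((order[a], order[b], diff, crit, diff > crit))
    return results

# Made-up means with n = 5 and MSE = 4 (nu = 12); the r values are
# illustrative placeholders, not authoritative table entries.
for row in duncan_pairs([10.0, 12.0, 15.0], 5, 4.0, {2: 3.08, 3: 3.23}):
    print(row)
```

Sorting first is what makes "adjacent" well defined: p counts how many sorted means the compared pair spans.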
Significant Ranges for Duncan’s Multiple Range Test
Student-Newman-Keuls Test
Like the Duncan Multiple Range test, the Student-Newman-Keuls test uses different significant-difference values for adjacent means and for means with one, two, or more means in between the two being compared. The significant difference, or range value, for this test is
$W_p = q_{\alpha,p,\nu}\sqrt{\frac{MSE}{n}}$
where q_{α,p,ν} is the Studentized Range Statistic with parameters p (the range-value) and ν (the MSE degrees of freedom), at significance level α.
Student-Newman-Keuls Test
MSE is the mean square error from the ANOVA table, and n is the number of observations used to calculate each of the means being compared. The range-value p is:
- 2 if the two means being compared are adjacent
- 3 if one mean separates the two means being compared
- 4 if two means separate the two means being compared
- …
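The SNK critical ranges W_p can be tabulated for every range-value at once. A minimal sketch, assuming `q_table` is a hypothetical dict of studentized range values q_{α,p,ν} read from a table; the numbers in the example are illustrative placeholders for α = 0.05 and ν = 12.

```python
import math

def snk_critical_ranges(q_table, mse, n):
    """W_p = q_{alpha,p,nu} * sqrt(MSE / n) for each range-value p in q_table."""
    scale = math.sqrt(mse / n)
    return {p: q * scale for p, q in sorted(q_table.items())}

# Illustrative q values (check against a studentized range table).
ranges = snk_critical_ranges({2: 3.08, 3: 3.77}, mse=4.0, n=5)
for p, w in ranges.items():
    print(p, round(w, 3))
```

Because q grows with p, means that are farther apart in rank must differ by more. With equal sample sizes, W_2 equals Fisher's LSD (since q_{α,2,ν} = √2 · t_{α/2,ν}), and W_t equals Tukey's w.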
Studentized Range Statistic
Tukey HSD Procedure
The Tukey procedure is used to compare all pairs of means after a significant ANOVA F-test. HSD stands for "honestly significant difference".
Tukey HSD Procedure
The test procedure assumes an equal number of observations per population. Find a critical number w as follows:
$w = q_\alpha(t, \nu)\sqrt{\frac{MSE}{n_g}}$
t = number of treatment means
ν = error degrees of freedom (df_E)
n_g = number of observations per population
α = significance level
q_α(t, ν) = critical value obtained from the studentized range table
Tukey Multiple Comparisons
1. Select a pair of means and calculate the difference between the larger and the smaller mean.
2. If $\bar{x}_{max} - \bar{x}_{min} > w$, there is sufficient evidence to conclude that μ_max > μ_min.
3. Repeat this procedure for each pair of samples.
4. Rank the means if possible.
If the sample sizes are not extremely different, the procedure above can still be used, with n_g calculated as the harmonic mean of the sample sizes.
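The Tukey steps, including the harmonic-mean n_g for mildly unequal sample sizes, can be sketched as below. It assumes the critical value q_α(t, ν) is supplied from a studentized range table (hypothetical parameter `q_crit`); `tukey_hsd` is an illustrative name, not a library function.

```python
import math
from itertools import combinations
from statistics import harmonic_mean

def tukey_hsd(means, sizes, mse, q_crit):
    """Flag pairs whose means differ by more than w = q * sqrt(MSE / n_g).

    q_crit is q_alpha(t, nu) from a studentized range table (assumed input).
    n_g is the harmonic mean of the sample sizes; with equal sizes it is
    just the common size.
    """
    n_g = harmonic_mean(sizes)
    w = q_crit * math.sqrt(mse / n_g)
    flagged = []
    for i, j in combinations(range(len(means)), 2):
        if abs(means[i] - means[j]) > w:
            flagged.append((i, j))
    return w, flagged

# Made-up data: t = 3 means, 5 observations each, nu = 12; the q value
# is illustrative (check your table).
w, pairs = tukey_hsd([10.0, 12.0, 15.0], [5, 5, 5], mse=4.0, q_crit=3.77)
print(round(w, 3), pairs)
```

A single w is used for every pair, which is what makes Tukey's procedure the most conservative of the pairwise methods listed earlier.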
Planned Comparisons or Contrasts
In some cases, an experimenter may know ahead of time that it is of interest to compare two particular means, or groups of means. An effective way to do this is to use contrasts, or planned comparisons. These represent specific hypotheses in terms of the treatment means, such as H0: μ1 = μ2 or H0: (μ1 + μ2)/2 = (μ3 + μ4)/2.
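Planned Comparisons or Contrasts
Each contrast can be specified as
$C = \sum_{i=1}^{t} c_i \mu_i$, estimated by $\hat{C} = \sum_{i=1}^{t} c_i \bar{x}_i$,
and the coefficients are required to satisfy
$\sum_{i=1}^{t} c_i = 0$.
A sum of squares can be calculated for a contrast as
$SS_C = \frac{\left(\sum_{i=1}^{t} c_i \bar{x}_i\right)^2}{\sum_{i=1}^{t} c_i^2 / n_i}$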
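Planned Comparisons or Contrasts
Each contrast has 1 degree of freedom, and a contrast can be tested by comparing it with the MSE from the ANOVA:
$F = \frac{SS_C}{MSE}$, with 1 and ν degrees of freedom.
If more than one contrast is tested, it is important that the contrasts all be orthogonal; for two contrasts with coefficients c_i and d_i, that is
$\sum_{i=1}^{t} \frac{c_i d_i}{n_i} = 0$ (with equal sample sizes, $\sum_{i=1}^{t} c_i d_i = 0$).
Note that at most t − 1 mutually orthogonal contrasts can be tested.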
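The contrast estimate, its sum of squares and the F statistic can be computed together. A minimal sketch, assuming the group means, sizes and MSE are already available from the ANOVA; `contrast_test` is a hypothetical name, and the resulting F would still be compared with F_{α;1,ν} from a table.

```python
def contrast_test(coeffs, means, sizes, mse):
    """Return (estimate, SS_C, F) for the contrast sum(c_i * mean_i).

    SS_C = estimate**2 / sum(c_i**2 / n_i); F = SS_C / MSE has 1 and nu df.
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    ss = estimate ** 2 / sum(c * c / n for c, n in zip(coeffs, sizes))
    return estimate, ss, ss / mse

# Made-up data: compare groups 1-2 with groups 3-4, 5 observations each.
est, ss, f = contrast_test([1, 1, -1, -1], [20.0, 22.0, 15.0, 17.0], [5] * 4, 4.0)
print(est, round(ss, 6), round(f, 6))
```

Scaling the coefficients (for example using 1/2 instead of 1) changes the estimate but leaves SS_C and F unchanged.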
Contrast Examples
Design | Characteristics
1 | 3 colors, with cartoons
2 | 3 colors, without cartoons
3 | 5 colors, with cartoons
4 | 5 colors, without cartoons

Comparison of | Hypothesis
The mean sales for the two 3-color designs | μ1 = μ2
The mean sales for the 3-color and 5-color designs | (μ1 + μ2)/2 = (μ3 + μ4)/2
The mean sales for designs with and without cartoons | (μ1 + μ3)/2 = (μ2 + μ4)/2
The mean sales for design 1 with the average sales for all four designs | μ1 = (μ1 + μ2 + μ3 + μ4)/4
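Writing each example hypothesis as a coefficient vector makes it easy to check which pairs are orthogonal. The sketch below assumes equal sample sizes unless `sizes` is given; the integer scalings (for example 3, −1, −1, −1 for "design 1 vs all", obtained by multiplying μ1 − (μ1 + μ2 + μ3 + μ4)/4 by 4) are my choice and do not affect orthogonality.

```python
from itertools import combinations

# Coefficient vectors (designs 1-4) for the four comparisons in the table.
contrasts = {
    "3-color designs": (1, -1, 0, 0),
    "3 vs 5 colors": (1, 1, -1, -1),
    "cartoons vs none": (1, -1, 1, -1),
    "design 1 vs all": (3, -1, -1, -1),
}

def orthogonal(c, d, sizes=None):
    """Equal n: sum c_i d_i == 0; unequal n: sum c_i d_i / n_i == 0."""
    sizes = sizes or [1] * len(c)
    return abs(sum(ci * di / n for ci, di, n in zip(c, d, sizes))) < 1e-12

for (name1, c), (name2, d) in combinations(contrasts.items(), 2):
    print(f"{name1} vs {name2}: {orthogonal(c, d)}")
```

Working the sums by hand, "3 vs 5 colors" is orthogonal to both "3-color designs" and "cartoons vs none", but those two are not orthogonal to each other (their products sum to 2), and "design 1 vs all" is orthogonal to none of the others. So these four comparisons do not form a mutually orthogonal set.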
Post Hoc Comparisons
1. F test for contrasts → Scheffé test
2. Tests all linear combinations at once.
3. Very conservative; not to be used for pairwise comparisons.
4. A priori comparisons
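Scheffé's Critical Differences (for Linear Contrasts)
A linear contrast $\hat{C} = \sum_{i=1}^{p} c_i \bar{x}_i$ is declared significant if $|\hat{C}|$ exceeds
$S = \sqrt{(p-1)F_{\alpha;\,p-1,\,\nu}}\sqrt{MSE\sum_{i=1}^{p}\frac{c_i^2}{n_i}}$
where F_{α; p−1, ν} is the tabled value of the F distribution (p − 1 = df for comparing p means, ν = df for error).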
Scheffé's Critical Differences (for Comparing Two Means)
Two means are declared significantly different if they differ by more than
$\sqrt{(p-1)F_{\alpha;\,p-1,\,\nu}}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
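Both Scheffé critical differences come from one formula, because comparing two means is the contrast with c_i = 1, c_j = −1 and zeros elsewhere. A minimal sketch, assuming `f_crit` is F_{α; p−1, ν} read from an F table; the function name and the example numbers are illustrative.

```python
import math

def scheffe_critical(coeffs, sizes, mse, f_crit):
    """S = sqrt((p - 1) * F) * sqrt(MSE * sum(c_i^2 / n_i)).

    coeffs has one entry per mean (p entries, zeros allowed);
    f_crit = F_{alpha; p-1, nu} from an F table (assumed input).
    """
    p = len(coeffs)
    return math.sqrt((p - 1) * f_crit) * math.sqrt(
        mse * sum(c * c / n for c, n in zip(coeffs, sizes))
    )

# Made-up setting: p = 4 means, 5 observations each, nu = 16, MSE = 4;
# F_{0.05;3,16} is about 3.24 (check your table).
pair = scheffe_critical([1, -1, 0, 0], [5] * 4, 4.0, 3.24)
group = scheffe_critical([1, 1, -1, -1], [5] * 4, 4.0, 3.24)
print(round(pair, 3), round(group, 3))
```

The factor √((p − 1)F) does not depend on which contrast is tested, which is what lets Scheffé's method cover all linear combinations at once, and also why it is conservative for simple pairwise comparisons.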