Statistical Data Analysis - Lecture12 - 01/04/03
How LSD plots work
Recall: when we use an LSD plot to make a pairwise comparison we assume two things: (1) the numbers of observations per group are fairly similar, i.e. balanced or "near balanced" designs are better; and (2) the standard errors of the groups are approximately equal.
It is the second assumption that explains the factor of $\sqrt{2}/2$ in the LSD interval formula.
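
Assume $\mathrm{se}(\bar{y}_1) = \mathrm{se}(\bar{y}_2) = \mathrm{se}$, then $\mathrm{se}(\bar{y}_1 - \bar{y}_2) = \sqrt{\mathrm{se}^2 + \mathrm{se}^2} = \sqrt{2}\,\mathrm{se}$.
Now suppose $\bar{y}_1 > \bar{y}_2$, then the difference is significant if $\bar{y}_1 - \bar{y}_2 > t\sqrt{2}\,\mathrm{se}$, where $t = t_{df}(0.025)$.
Now assuming that the t-value used for the LSD interval is approximately the same as $t$, the two arrows will not overlap if $\bar{y}_1 - \frac{\sqrt{2}}{2}\,t\,\mathrm{se} > \bar{y}_2 + \frac{\sqrt{2}}{2}\,t\,\mathrm{se}$, or $\bar{y}_1 - \bar{y}_2 > \sqrt{2}\,t\,\mathrm{se}$, which is the same condition.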
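
The equivalence argued on this slide can be checked numerically. The following is a minimal sketch, assuming the LSD interval has the form mean ± (√2/2)·t·se (that factor is reconstructed from the equal-standard-error argument, not quoted from the slide). The function names and the test values (t = 1.96, se = 1) are hypothetical and chosen only for illustration.

```python
import math


def lsd_interval(mean, se, t):
    """Assumed LSD interval: mean +/- (sqrt(2)/2) * t * se."""
    half_width = (math.sqrt(2) / 2) * t * se
    return mean - half_width, mean + half_width


def is_significant(mean1, mean2, se, t):
    """Pairwise test with equal standard errors: |difference| > t * sqrt(2) * se."""
    return abs(mean1 - mean2) > t * math.sqrt(2) * se


def arrows_overlap(mean1, mean2, se, t):
    """True if the two LSD intervals share at least one point."""
    lo1, hi1 = lsd_interval(mean1, se, t)
    lo2, hi2 = lsd_interval(mean2, se, t)
    return lo1 <= hi2 and lo2 <= hi1


# Hypothetical values for illustration only.
t_value = 1.96
common_se = 1.0
differences = (1.0, 2.0, 2.7, 2.8, 3.5)

# For each difference: (significant?, arrows overlap?)
check = {
    d: (is_significant(d, 0.0, common_se, t_value),
        arrows_overlap(d, 0.0, common_se, t_value))
    for d in differences
}

for d, (sig, overlap) in check.items():
    print(f"difference {d}: significant={sig}, arrows overlap={overlap}")
```

In every case the difference is significant exactly when the arrows do not overlap, which is what makes the LSD plot a visual version of the pairwise test.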

Comparisons
When you perform a one-way ANOVA, you have the choice of performing some comparisons.
The choices are Tukey’s HSD, Fisher’s Protected LSD, Dunnett’s Multiple Range Test and Hsu’s MCB.
We will consider the first two (Dunnett’s is considered unreliable, and Hsu’s MCB is virtually never used).
Fisher’s comparisons are carried out using the LSD procedure we have discussed (although a different df may have been used).
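
Tukey’s HSD / Tukey-Kramér intervals
Test $H_0: \mu_i = \mu_j$ against $H_1: \mu_i \neq \mu_j$.
The difference is significant if $|\bar{y}_i - \bar{y}_j| > q_{k,N-k}(\alpha)\,\sqrt{\mathrm{WGMS}/n_h}$,
where $q_{k,N-k}(\alpha)$ is the critical value from the Studentised Range Distribution and $n_h = 2/(1/n_i + 1/n_j)$ is the harmonic mean of the two group sizes.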

Example
In this experiment there are three (k = 3) groups with 50 observations per group. ANOVA gives us

Analysis of Variance Table
Response: Y
           Df  Sum Sq Mean Sq F value    Pr(>F)
treatment   2 164.616  82.308   82.29 < 2.2e-16 ***
Residuals 147 147.033   1.000

P-value << 0.05, so definitely significant.
The group means are

     Lo    Med     Hi
-0.0222 0.1230 2.2700

The critical value comes from the Studentised Range Distribution with k = 3 groups and 147 residual degrees of freedom.
Which differences are significant?
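
The question on this slide can be worked through with a few lines of arithmetic. This is a sketch under stated assumptions: the critical difference is taken to have the usual Tukey-Kramer form q·√(WGMS/n), and the value q ≈ 3.35 is an approximate table value for k = 3 and about 147 degrees of freedom at the 5% level, not a figure from the slide. The variable and function names are hypothetical.

```python
import math
from itertools import combinations

# Values taken from the ANOVA output and group means on the slide.
group_means = {"Lo": -0.0222, "Med": 0.1230, "Hi": 2.2700}
wgms = 1.000          # residual (within-group) mean square
n_per_group = 50      # equal group sizes, so the harmonic mean is also 50

# ASSUMPTION: approximate tabulated Studentised range critical value
# for k = 3 groups, ~147 df, alpha = 0.05 (not given on the slide).
q_crit = 3.35

# Assumed Tukey-Kramer critical difference: q * sqrt(WGMS / n_h).
critical_diff = q_crit * math.sqrt(wgms / n_per_group)


def tukey_comparisons(means, threshold):
    """Return {(a, b): True/False} for whether |mean_a - mean_b| exceeds threshold."""
    return {
        (a, b): abs(means[a] - means[b]) > threshold
        for a, b in combinations(means, 2)
    }


results = tukey_comparisons(group_means, critical_diff)

print(f"critical difference: {critical_diff:.3f}")
for (a, b), sig in results.items():
    diff = abs(group_means[a] - group_means[b])
    print(f"{a} vs {b}: |difference| = {diff:.4f}, significant = {sig}")
```

Under these assumptions Hi differs significantly from both Lo and Med, while Lo and Med do not differ. The conclusion is not sensitive to the exact value of q, because the two large differences exceed 2 and the small one is about 0.15.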

Tukey’s HSD / Tukey-Kramér intervals
Bonferroni intervals are very conservative for large numbers of groups (large k). By conservative we mean the intervals are wide.
Fisher’s LSD is at the other end of the scale, where the intervals are quite narrow, so the chance of a type I error is higher than with a Bonferroni interval.
Tukey intervals are somewhere in between. For small k, they behave more like Bonferroni intervals; for large k, like LSD intervals: “the porridge was neither too hot nor too cold, it was just right!”

Linear Contrasts
It is relatively simple to see from our results that the two Michael Crichton books have smaller sentence lengths on average.
Therefore, it may be instructive, and useful, to be able to compare the two authors rather than to compare two books.
We can do this by generalising the concept of confidence intervals for the difference of two means.
All possible pairwise differences between the mean sentence length of the ith book, $\mu_i$, and the mean sentence length of the jth book, $\mu_j$, are linear combinations of the general form $\theta = c_1\mu_1 + c_2\mu_2 + \dots + c_k\mu_k = \sum_{i=1}^{k} c_i\mu_i$.
If the $c_i$'s are specified constants subject to the constraint $\sum_{i=1}^{k} c_i = 0$, then this is called a linear contrast.

Linear Contrasts
For example, $\mu_1 - \mu_2$ is a linear contrast that examines the difference between book 1 (The Eyes of the Dragon) and book 2 (The Tommyknockers).
The coefficients are $c_1 = 1$, $c_2 = -1$, and $c_i = 0$ for all other books.
This contrast satisfies our constraint that the coefficients sum to zero.
It is then easy to see how we can construct contrasts that test groups against each other.

A linear contrast for authors
E.g. we wish to look at the difference between the Stephen King books and the Michael Crichton books, so our contrast takes the form $\theta = \frac{1}{6}\sum_{i \in \text{King}} \mu_i - \frac{1}{2}\sum_{i \in \text{Crichton}} \mu_i$, i.e. the average of the six Stephen King book means minus the average of the two Michael Crichton book means.
To place any confidence intervals on our contrasts we need estimates of the contrast itself and the standard error of the estimate.
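
Building the coefficients for an author contrast is mechanical, and a short sketch makes the sum-to-zero constraint easy to verify. It assumes eight books in total, six by Stephen King and two by Michael Crichton (inferred from N − k = 400 − 8 and "the two Michael Crichton books"), and it assumes the Crichton books are listed last. The helper names are hypothetical.

```python
from fractions import Fraction

# ASSUMPTION: six Stephen King books followed by two Michael Crichton books.
authors = ["King"] * 6 + ["Crichton"] * 2


def author_contrast(author_labels, positive, negative):
    """Coefficients for (average of `positive` groups) - (average of `negative` groups)."""
    n_pos = author_labels.count(positive)
    n_neg = author_labels.count(negative)
    coeffs = []
    for label in author_labels:
        if label == positive:
            coeffs.append(Fraction(1, n_pos))
        elif label == negative:
            coeffs.append(Fraction(-1, n_neg))
        else:
            coeffs.append(Fraction(0))
    return coeffs


def is_contrast(coeffs):
    """A linear combination is a contrast only if its coefficients sum to zero."""
    return sum(coeffs) == 0


author_coeffs = author_contrast(authors, "King", "Crichton")

# The simple pairwise contrast mu_1 - mu_2 from the previous slide.
pairwise_coeffs = [Fraction(1), Fraction(-1)] + [Fraction(0)] * 6

print([str(c) for c in author_coeffs])
print("author contrast sums to zero:", is_contrast(author_coeffs))
print("pairwise contrast sums to zero:", is_contrast(pairwise_coeffs))
```

Exact fractions are used so that the sum-to-zero check is not affected by floating-point rounding.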

Estimating a linear contrast
The estimate of the contrast is easily obtained by replacing the population means with the sample means, i.e. if $\bar{y}_i$ is the mean of the ith group, then an estimate of the contrast is $\hat{\theta} = \sum_{i=1}^{k} c_i\bar{y}_i$.
We’ve seen that the WGMS is an estimate of the variance of each of the groups (remember we assume each group has the same variance), so the square root of the WGMS is an estimate of the standard deviation, $\sigma$.
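
Standard error of a linear contrast
An estimate of the standard deviation (the standard error) of the contrast is then given by $\mathrm{se}(\hat{\theta}) = \sqrt{\mathrm{WGMS}\sum_{i=1}^{k} c_i^2/n_i}$.
Therefore a $100(1-\alpha)\%$ confidence interval for the contrast is given by $\hat{\theta} \pm t_{N-k}(\alpha/2)\,\mathrm{se}(\hat{\theta})$, where $N = \sum_{i=1}^{k} n_i$ is the total number of observations.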

A hypothesis test for linear contrasts
Given that we have an estimate of the statistic, and an estimate of its standard deviation (its standard error), it is relatively simple to go from our confidence interval to a hypothesis test.
Our null and alternative hypotheses are $H_0: \theta = 0$ and $H_1: \theta \neq 0$, so our hypothesised difference $\theta_0$ is zero.
Our test statistic is $t_0 = (\hat{\theta} - \theta_0)/\mathrm{se}(\hat{\theta})$, where $\hat{\theta} = \sum_{i=1}^{k} c_i\bar{y}_i$ and $\mathrm{se}(\hat{\theta}) = \sqrt{\mathrm{WGMS}\sum_{i=1}^{k} c_i^2/n_i}$.

so $t_0 = \sum_{i=1}^{k} c_i\bar{y}_i \Big/ \sqrt{\mathrm{WGMS}\sum_{i=1}^{k} c_i^2/n_i}$.
Now we find the P-value using $P = 2\Pr(T \geq |t_0|)$, where T has a Student's t distribution with N − k degrees of freedom.

E.g. let's test the logged sentence lengths of the Stephen King books against the logged sentence lengths of the Michael Crichton books.
Our estimate of the contrast is $\hat{\theta} = \frac{1}{6}\sum_{i \in \text{King}} \bar{y}_i - \frac{1}{2}\sum_{i \in \text{Crichton}} \bar{y}_i$, where $\bar{y}_i$ is the mean of the logged sentence lengths of the ith book (i = “The Eyes of the Dragon”, ..., “Disclosure”).
Working this out gives 0.75.
The WGMS = 0.507, so our standard error is $\mathrm{se}(\hat{\theta}) = \sqrt{0.507\sum_{i} c_i^2/n_i}$.

N − k = 400 − 8 = 392, so $t_{392}(0.025) \approx z_{0.025} = 1.96$, and our test statistic is $t_0 = 0.75/\mathrm{se}(\hat{\theta})$.
As $t_0 \gg 1.96$ we can say that there is very strong evidence against the null hypothesis that the two authors are the same (on the basis of sentence length).
A 95% confidence interval for our contrast is $0.75 \pm 1.96 \times \mathrm{se}(\hat{\theta})$.
Transforming back to the original scale, this tells us that the sentences in Stephen King’s books are on average approximately 1.8 to 2.5 times longer than those in Michael Crichton’s books.
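
The numbers in this example can be reproduced with a short calculation. The sketch below takes the estimate (0.75), the WGMS (0.507) and the critical value (1.96) from the slides. It assumes 50 sentences per book (400 observations over 8 books), six Stephen King books against two Michael Crichton books, and natural logarithms for the back-transformation. None of these three assumptions is stated explicitly in the lecture, but together they recover the quoted 1.8 to 2.5 range.

```python
import math

# Values quoted on the slides.
estimate = 0.75      # estimated contrast on the log scale
wgms = 0.507         # within-group mean square
t_crit = 1.96        # t_392(0.025), approximated by z_0.025

# ASSUMPTIONS (not stated on the slides):
n_per_book = 50                        # 400 sentences spread evenly over 8 books
coeffs = [1 / 6] * 6 + [-1 / 2] * 2    # six King books vs two Crichton books

# Standard error of the contrast: sqrt(WGMS * sum(c_i^2 / n_i)).
se = math.sqrt(wgms * sum(c ** 2 / n_per_book for c in coeffs))

# Test statistic for H0: contrast = 0.
t0 = estimate / se

# 95% confidence interval on the log scale.
lower = estimate - t_crit * se
upper = estimate + t_crit * se

# Back-transform to a ratio of sentence lengths (assuming natural logs).
ratio_lower = math.exp(lower)
ratio_upper = math.exp(upper)

print(f"standard error: {se:.4f}")
print(f"test statistic: {t0:.2f}")
print(f"95% CI (log scale): ({lower:.3f}, {upper:.3f})")
print(f"95% CI (ratio): ({ratio_lower:.2f}, {ratio_upper:.2f})")
```

With these assumptions the standard error is roughly 0.082 and the test statistic is about 9, far beyond 1.96.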