Tests after a significant F
1. The F test is only a preliminary analysis
2. Planned comparisons vs. post-hoc comparisons
3. What goes in the denominator of our test?
4. What happens to α when we make multiple comparisons among means?
5. t-test for planned comparisons
6. Tukey's HSD test for post-hoc comparisons
7. Newman-Keuls test for post-hoc comparisons
Lecture 16
An aside
We have a set of treatment means, e.g.: X̄1, X̄2, X̄3, X̄4, X̄5. From this set, we can form a number of pairs for comparisons of treatment means. Here are just a few examples of the possible pairs: X̄1 vs. X̄2, X̄3 vs. X̄5, X̄2 vs. X̄5.
The F test is only a preliminary analysis
You have a number of treatments (levels of the independent variable). Each treatment produces a treatment mean. The significant F tells you only that there is a difference among these means somewhere. Pairwise comparisons of the means are then necessary to pinpoint exactly where your effect is.
Planned comparisons
Planned comparisons are tests of differences among the treatment means that you designed your experiment to make possible. Is X̄i different from X̄j? We usually don't do all possible comparisons among the entire set of treatment means. We choose a few specific comparisons on the basis of a theory of the behavior being studied.
Planned comparisons
Doing only a few comparisons is important for two reasons:
1. With α = .05, we would expect to reject H0 by mistake once in 20 tests. If you do all possible comparisons, you might do 20 tests for one experiment – so the odds are good that one of them will be "significant" by chance.
Planned comparisons
2. When you select a few comparisons out of the set of all possible comparisons, you put your theory in jeopardy. Such specific predictions (of differences between means) are unlikely to be correct by chance. If you put your theory in jeopardy and it survives, you have more confidence in your theory. If it doesn't survive, at least you know the theory was wrong.
Planned comparisons
Because we only do a few comparisons when using planned comparisons, we do not need to "adjust α." We do not correct for a higher probability of Type 1 error when doing a small number of planned comparisons.
The denominator of our t-test
Completely Randomized design: a planned comparison uses an independent-groups t-test. The t-test requires an estimate of σ² for the denominator. Where should that estimate come from?
The denominator of our t-test
Previously, to estimate σ², we used a pooled variance based on the two sample variances (s²p). In the CRD ANOVA, each sample variance gives an independent estimate of σ², but the average of the sample variances gives a better estimate of σ².
The denominator of our t-test
In the ANOVA design, we have multiple samples, so we have multiple sample variances. We can use all of these sample variances to compute an estimate of σ². In fact, we have already computed such an estimate – in the Mean Square Error produced for the ANOVA.
Planned Comparisons t-test
t = (X̄i − X̄j) / √(MSE (1/ni + 1/nj))
Choose the pair of means you want to test.
Find MSE in the ANOVA summary table.
Feed these values into the equation above.
Evaluate tobt against tα (df = df for MSE).
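For concreteness, the formula above can be sketched in Python (a minimal illustration; the function name `planned_t` is mine, and the example numbers come from Example 1 later in the lecture):

```python
import math

def planned_t(mean_i, mean_j, mse, n_i, n_j):
    """t statistic for a planned comparison of two treatment means,
    using the ANOVA's MSE as the estimate of error variance."""
    se = math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))
    return (mean_i - mean_j) / se

# Group 1 vs. group 4 from Example 1 (means 77 and 58.75, MSE = 53.979, n = 4 each)
print(round(planned_t(77.0, 58.75, 53.979, 4, 4), 3))  # 3.513
```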
Post-hoc tests
Post-hoc tests are also tests of differences among treatment means. Here, you decide which means you want to test post-hoc – that is, after looking at the data. "Post-hoc" means "after the fact" – after collecting and looking at the data. "A priori" comparisons are those decided on before data collection – differences predicted on the basis of theory.
Post-hoc tests
The problem for post-hoc tests is α. If you do one test with α = .05, the "long-run" probability of a Type 1 error is .05. But when you do many such comparisons, the probability of at least one Type 1 error is no longer .05. It is roughly .05 × k, where k = # of comparisons.
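The α × k figure is a rough upper bound; assuming the k tests were independent, the exact probability of at least one Type 1 error would be 1 − (1 − α)^k. A quick numerical check (illustrative only, not part of the lecture):

```python
alpha, k = 0.05, 20
exact = 1 - (1 - alpha) ** k   # probability of at least one Type 1 error (independent tests)
approx = alpha * k             # the rough alpha * k figure from the slide
print(round(exact, 3), round(approx, 2))  # 0.642 1.0
```

With 20 comparisons, the chance of at least one false rejection is about 64% – far above the nominal 5%.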
Post-hoc tests
IMPORTANT POINT: Even if you do not do all possible comparisons among a set of means explicitly – if you just test the biggest difference among all the pairs of means – you have implicitly tested all the others. This means that the problem alluded to on the previous slide always exists for post-hoc tests.
Two types of Post-hoc Tests
1. Tukey's Honestly Significant Difference (HSD)
Compares all possible pairs of means.
Maintains the Type 1 error rate at α for the entire set of comparisons.
Qobt = (X̄i − X̄j) / √(MSE/n)   (n = sample size)
Tukey's HSD test
To evaluate Qobt, get Qcrit from the table. You will need:
df = df for MSE
k = # of samples in the experiment
α
In Tukey's HSD test, use the same Qcrit for all the comparisons in the experiment.
Tukey's HSD test
NOTE: If sample sizes are not equal, use the harmonic mean of the sample sizes:
ñ = k / Σ(1/ni)   (k = # of samples)
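The harmonic-mean formula is easy to sketch (function name mine; the unequal-n example values are hypothetical):

```python
def harmonic_mean_n(sizes):
    """Harmonic mean of the sample sizes (n-tilde), used for
    Tukey's HSD when group sizes are unequal."""
    return len(sizes) / sum(1.0 / n for n in sizes)

print(harmonic_mean_n([4, 4, 4, 4]))         # equal ns: just n = 4.0
print(round(harmonic_mean_n([4, 6, 8]), 3))  # 5.538 (pulled below the arithmetic mean 6)
```

Note that the harmonic mean is always at or below the arithmetic mean, which makes the resulting test slightly conservative.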
Two types of Post-hoc tests
2. Newman-Keuls test
The N-K is like Tukey's HSD in that it makes all possible comparisons among the sample means, and in that it uses the Q statistic. N-K differs from HSD in that Qcrit varies for different comparisons.
Newman-Keuls test
As with HSD, Qobt = (X̄i − X̄j) / √(MSE/n)   (n = sample size)
Evaluate Qobt against Qcrit obtained from the table, using df, α, and r. r may vary for different comparisons.
Newman-Keuls test
To find r for a given comparison, begin by ordering the sample means from highest to lowest. r is then the number of means spanned (inclusive) by the comparison you want to make.
X̄1 = 77, X̄3 = 74, X̄2 = 72.5, X̄4 = 58.75
X̄1 vs. X̄4 spans all four means: r = 4
X̄1 vs. X̄3 (adjacent): r = 2
X̄3 vs. X̄4 (spans X̄3, X̄2, X̄4): r = 3
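Counting the means spanned can be sketched with a small helper (my own illustration, not from the lecture; it assumes the means are all distinct):

```python
def nk_r(means, i, j):
    """r for Newman-Keuls: the number of means spanned (inclusive) by the
    comparison of means i and j, after sorting from highest to lowest.
    Assumes all means are distinct."""
    ordered = sorted(means, reverse=True)
    return abs(ordered.index(means[i]) - ordered.index(means[j])) + 1

means = [77, 72.5, 74, 58.75]  # X1, X2, X3, X4 from the example
print(nk_r(means, 0, 3))  # X1 vs. X4 spans all four means: r = 4
print(nk_r(means, 0, 2))  # X1 vs. X3 are adjacent: r = 2
```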
Example 1
1. Students taking Summer School courses sometimes attempt to take more than one course at the same time and/or have a full-time job on top of their course(s). To study the effect that these situations may have on a student's performance, four randomly selected students in each of four conditions are compared on their final exam grades in the statistics course they all took.
Example 1
a. Prior to data collection, it was predicted that students taking just one course (no job) would obtain a significantly higher mean final exam grade than students in the two-courses-plus-job group. It was also predicted that the mean final exam grade of students in the two courses (no job) group would not differ significantly from that of students in the one-course-plus-job group. Perform the necessary analyses to determine whether these predictions are borne out by the data, using α = .01 for each prediction.
Example 1a
Notice these words: "Prior to data collection, it was predicted that …" That means this question calls for a planned comparison – so to answer the question, you do not have to do the ANOVA first, as you would if this were a post-hoc test. But you do need MSE.
Example 1
We have the raw data, so we can use the computational formulas learned last week:
CM = (ΣXi)²/n = 1129²/16 = 79665.0625
SSTotal = ΣXi² − CM
SSTreat = Σ(Ti²/ni) − CM
SSE = SSTotal − SSTreat
Example 1
The data:
S only   S + C.S.   S + Job   S + C.S. + J
  78        67         74          59
  69        72         63          62
  86        74         81          68
  75        77         78          46
Totals:
 308       290        296         235
Example 1
SSE = ΣXi² − Σ(Ti²/ni)
ΣXi² = 78² + 69² + … + 46² = 81099
Example 1
SSE = SSTotal − SSTreat
    = (ΣXi² − CM) − (Σ(Ti²/ni) − CM)
    = ΣXi² − Σ(Ti²/ni) − CM + CM
    = ΣXi² − Σ(Ti²/ni)
Example 1
SSE = 81099 − 80451.25 = 647.75
MSE = SSE/df = SSE/(n − p) = 647.75/12 = 53.979
Now, we're ready to make the comparisons…
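The whole computation can be reproduced from the raw data. A minimal sketch (the group labels abbreviate the slide's conditions, and the variable names are mine):

```python
groups = {
    "S only":       [78, 69, 86, 75],
    "S + C.S.":     [67, 72, 74, 77],
    "S + Job":      [74, 63, 81, 78],
    "S + C.S. + J": [59, 62, 68, 46],
}
all_x = [x for g in groups.values() for x in g]
n, p = len(all_x), len(groups)

cm = sum(all_x) ** 2 / n                                     # correction for the mean
ss_total = sum(x * x for x in all_x) - cm                    # SSTotal
ss_treat = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cm  # SSTreat
ss_error = ss_total - ss_treat                               # SSE
mse = ss_error / (n - p)                                     # MSE

print(round(cm, 4), round(ss_error, 2), round(mse, 3))  # 79665.0625 647.75 53.979
```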
Example 1
H0: μ1 = μ4    HA: μ1 > μ4
Rejection region: tobt > tn−p,α = t12,.01 = 2.681
Reject H0 if tobt > 2.681
Example 1
1 vs. 4: t = (77 − 58.75) / √(53.979/4 + 53.979/4)
t = 18.25/5.195 = 3.513. Reject H0. (The prediction is supported.)
See the similarity of the denominator of this test to that of the independent-groups t-test. In both cases, we're using measures of error variability averaged across all the samples available.
Example 1
H0: μ2 = μ3    HA: μ2 ≠ μ3
Rejection region (two-tailed): |tobt| > tn−p,α/2 = t12,.005 = 3.055
Reject H0 if |tobt| > 3.055
Example 1
2 vs. 3: t = (72.5 − 74)/5.195 = −0.29. Do not reject H0.
Example 1
b. After data collection, it was decided to compare the mean final exam grades of the one course (no job) and two courses (no job) groups, and also to compare the mean grade of the one-course-plus-job group with the two-courses-plus-job group. Each comparison was to be tested with α = .05. Perform the appropriate procedures.
Example 1b
Notice these words: "After data collection, it was decided to compare…" This is a post-hoc test. That means we have to do the ANOVA first (by definition – the ANOVA is the hoc this test is post).
Example 1
H0: μ1 = μ2 = μ3 = μ4
HA: At least two means differ significantly
Rejection region: Fobt > F3,12,.05 = 3.49
SSTotal = 81099 − 79665.0625 (CM) = 1433.9375
SSTreat = 80451.25 − 79665.0625 = 786.1875
Example 1
Source      df   SS          MS         F
Treatment    3   786.1875    262.0625   4.85
Error       12   647.75       53.979
Total       15  1433.9375
Decision: Reject H0… now, do the post-hoc test.
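As a quick arithmetic check on the summary table (a sketch using the slide's numbers; variable names are mine):

```python
ss_treat, df_treat = 786.1875, 3
ss_error, df_error = 647.75, 12

ms_treat = ss_treat / df_treat   # Mean Square for Treatment
mse = ss_error / df_error        # Mean Square Error
f = ms_treat / mse               # F ratio

print(round(ms_treat, 4), round(f, 2))  # 262.0625 4.85
```

Since 4.85 > F3,12,.05 = 3.49, the table's decision to reject H0 checks out.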
Example 1
Using the Newman-Keuls procedure, order the means:
X̄1 = 77, X̄3 = 74.0, X̄2 = 72.5, X̄4 = 58.75
Comparison 1: one course, no job vs. two courses, no job (X̄1 vs. X̄2) spans 3 means: r = 3
Comparison 2: one course plus job vs. two courses plus job (X̄3 vs. X̄4) spans 3 means: r = 3
Example 1
H0: μi = μj    HA: μi ≠ μj
Rejection region: Qobt > Qr,df,α = Q3,12,.05 = 3.77
Note: this Qcrit applies to both of the following tests, because both span 3 means.
Example 1
1 vs. 2: Qobt = (77 − 72.5) / √(53.979/4) = 4.5/3.67 = 1.23. Do not reject H0.
Example 1
3 vs. 4: Qobt = (74 − 58.75) / √(53.979/4) = 15.25/3.67 = 4.16. Reject H0.
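Both Q statistics can be computed with one helper (my own sketch; the slides round the denominator √(53.979/4) ≈ 3.674 to 3.67, so the last digit differs slightly):

```python
import math

def q_stat(mean_i, mean_j, mse, n):
    """Studentized range statistic for a pair of treatment means."""
    return (mean_i - mean_j) / math.sqrt(mse / n)

print(round(q_stat(77.0, 72.5, 53.979, 4), 3))   # 1 vs. 2: 1.225 (< 3.77, do not reject)
print(round(q_stat(74.0, 58.75, 53.979, 4), 3))  # 3 vs. 4: 4.151 (> 3.77, reject)
```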
Example 2a
H0: μ1 = μ2 = μ3
HA: At least two means differ significantly
Rejection region: Fobt > F2,87,.05 ≈ F2,60,.05 = 3.15
Note: We cannot use the computational formulas because we do not have the raw data. So, we'll use the conceptual formulas.
Example 2
1. Compute X̄G (the Grand Mean). Since the ns are all equal:
X̄G = (10.5 + 18.0 + 21.1)/3 = 16.533
Example 2
SSTreat = Σni(X̄i − X̄G)²
= 30[(10.5 − 16.53)² + (18.0 − 16.53)² + (21.1 − 16.53)²] = 1782.2
Now we can create the summary table…
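The conceptual formula can be checked numerically (a sketch with the slide's means and ns; variable names are mine):

```python
ns = [30, 30, 30]
means = [10.5, 18.0, 21.1]  # the three grade-level means from Example 2

# Grand mean: weighted average of the group means (equal weights here)
grand = sum(n * m for n, m in zip(ns, means)) / sum(ns)

# SSTreat = sum of n_i * (mean_i - grand mean)^2
ss_treat = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))

print(round(grand, 3), round(ss_treat, 1))  # 16.533 1782.2
```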
Example 2
Source      df   SS        MS      F
Treatment    2   1782.2    891.1   32.7
Error       87   2370.75    27.25
Total       89   4152.95
(SSE is recovered from the given MSE: 27.25 × 87 = 2370.75.)
Decision: Reject H0 – Rotation skill differs significantly across the grades.
Example 2b
H0: μ8 = μ4    HA: μ8 > μ4
Rejection region: tobt > t87,.05 ≈ t29,.05 = 1.699
Reject H0 if tobt > 1.699
Example 2
8 vs. 4: t = (18.0 − 10.5) / √(27.25/30 + 27.25/30)
t = 7.5/1.348 = 5.56. Reject H0. (The prediction is supported.)
See the similarity of the denominator of this test to that of the independent-groups t-test. In both cases, we're using measures of error variability averaged across all the samples available.