Tests after a significant F


1 Tests after a significant F
1. The F test is only a preliminary analysis
2. Planned comparisons vs. post-hoc comparisons
3. What goes in the denominator of our test?
4. What happens to α when we make multiple comparisons among means?
5. t-test for planned comparisons
6. Tukey's HSD test for post-hoc comparisons
7. Newman-Keuls test for post-hoc comparisons
Lecture 16

2 An aside: We have a set of treatment means, e.g.:
X̄1  X̄2  X̄3  X̄4  X̄5
From this set, we can form a number of pairs for comparisons of treatment means – here are just a few examples of the possible pairs:
X̄1 vs. X̄2    X̄3 vs. X̄4    X̄2 vs. X̄5

3 The F test is only a preliminary
You have a number of treatments (levels of the independent variable). Each treatment produces a treatment mean. A significant F tells you only that there is a difference among these means somewhere. Pairwise comparisons of the means are then necessary to pinpoint exactly where your effect is.

4 Planned comparisons Planned comparisons are tests of differences among the treatment means that you designed your experiment to make possible: is X̄i different from X̄j? We usually don't do all possible comparisons among the entire set of treatment means. We choose a few specific comparisons on the basis of a theory of the behavior being studied.

5 Planned comparisons Doing only a few comparisons is important for two reasons: 1. With α = .05, we would expect to reject H0 by mistake once in 20 tests. If you do all possible comparisons, you might do 20 tests for one experiment – so the odds are good that one of them will be "significant" by chance.

6 Planned comparisons 2. When you select a few comparisons out of the set of all possible comparisons, you put your theory in jeopardy. Such specific predictions (of differences between means) are unlikely to be correct by chance. If you put your theory in jeopardy and it survives, you have more confidence in your theory. If it doesn't survive, at least you know the theory was wrong.

7 Planned comparisons Because we do only a few comparisons when using planned comparisons, we do not need to "adjust α." We do not correct for a higher probability of Type 1 error when doing a small number of planned comparisons.

8 The denominator of our t-test
Completely Randomized design: a planned comparison uses an independent-groups t-test. The t-test requires an estimate of σ² for the denominator. Where should that estimate come from?

9 The denominator of our t-test
Previously, to estimate σ², we used a pooled variance based on the two sample variances (S²p). In the CRD ANOVA, each sample variance gives an independent estimate of σ², but the average of the sample variances gives a better estimate of σ².

10 The denominator of our t-test
In the ANOVA design, we have multiple samples, so we have multiple sample variances. We can use all of these sample variances to compute an estimate of σ². In fact, we have already computed such an estimate – the Mean Square Error (MSE) produced for the ANOVA.
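Since MSE is just the error sum of squares divided by its degrees of freedom, with equal group sizes it works out to the plain average of the sample variances. A minimal sketch (the scores below are made up for illustration):

```python
import statistics

# Hypothetical samples of equal size; scores are made up for illustration.
samples = [
    [78, 69, 82, 75],
    [71, 74, 70, 77],
    [60, 55, 63, 57],
]

# Each sample variance is an independent estimate of sigma^2.
variances = [statistics.variance(s) for s in samples]

# MSE = SSE / (n - p): pooled within-group variability.
sse = sum(sum((x - statistics.mean(s)) ** 2 for x in s) for s in samples)
df_error = sum(len(s) - 1 for s in samples)
mse = sse / df_error

# With equal n, MSE equals the average of the sample variances.
print(round(mse, 4), round(statistics.mean(variances), 4))
```

With unequal n, MSE is instead a weighted average of the sample variances, with weights proportional to each group's degrees of freedom.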

11 Planned Comparisons t-test
t = (X̄i − X̄j) / √(MSE(1/ni + 1/nj))
Choose the pair of means you want to test.
Find MSE in the ANOVA summary table.
Feed these values into the equation above.
Evaluate tobt against tα (df = df for MSE).
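The test statistic can be written as a small helper. Plugged in below are MSE = 53.979 and the group means 72.5 and 74 that appear in Example 1 later in the lecture, with n = 4 per group:

```python
import math

def planned_t(xbar_i, xbar_j, mse, n_i, n_j):
    """Planned-comparison t: (Xbar_i - Xbar_j) / sqrt(MSE * (1/n_i + 1/n_j))."""
    return (xbar_i - xbar_j) / math.sqrt(mse * (1 / n_i + 1 / n_j))

# Values from Example 1 later in the lecture (MSE = 53.979, n = 4 per group):
t_obt = planned_t(72.5, 74.0, 53.979, 4, 4)
print(round(t_obt, 2))  # -0.29, evaluated against t with df = df for MSE
```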

12 Post-hoc tests Post-hoc tests are also tests of differences among treatment means. Here, you decide which means you want to test post hoc – that is, after looking at the data. "Post hoc" means "after the fact" – after collecting and looking at the data. "A priori" comparisons are those decided on before data collection – differences predicted on the basis of theory.

13 Post-hoc tests The problem for post-hoc tests is α
If you do one test with α = .05, the "long-run" probability of a Type 1 error is .05. But when you do many such comparisons, the probability of at least one Type 1 error is no longer .05. It is roughly .05 · k, where k = # of comparisons (for k independent tests, the exact value is 1 − (1 − .05)^k).
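The inflation is easy to tabulate. For k independent tests at α = .05, the chance of at least one Type 1 error is exactly 1 − (1 − α)^k, which the rough α·k figure approximates for small k:

```python
alpha = 0.05
print("k  exact    rough")
for k in (1, 5, 10, 20):
    exact = 1 - (1 - alpha) ** k   # P(at least one Type 1 error)
    rough = alpha * k              # the rough approximation from the slide
    print(f"{k:2d} {exact:.3f}   {rough:.3f}")
```

At k = 20 the exact familywise error rate is already about .64 – the "one significant result in 20 tests by chance" problem the previous slides warned about.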

14 Post-hoc tests IMPORTANT POINT:
Even if you do not explicitly do all possible comparisons among a set of means – if you just test the biggest difference among all the pairs of means – you have implicitly tested all the others. This means that the problem described on the previous slides always exists for post-hoc tests.

15 Two types of Post-hoc Tests
1. Tukey's Honestly Significant Difference (HSD)
– compares all possible pairs of means
– maintains the Type 1 error rate at α for the entire set of comparisons
Qobt = (X̄i − X̄j) / √(MSE/n)    (n = sample size)

16 Tukey's HSD test To evaluate Qobt, get Qcrit from a table. You will need:
df = df for MSE
k = # of samples in the experiment
α
In Tukey's HSD test, use the same Qcrit for all the comparisons in the experiment.
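A sketch of the whole HSD procedure, using group means consistent with Example 1 later in the lecture (MSE = 53.979, n = 4 per group). The critical value 4.20 is the tabled Q for k = 4, df = 12, α = .05; check it against your own table:

```python
import math
from itertools import combinations

def q_obt(xbar_i, xbar_j, mse, n):
    """Studentized-range statistic: |Xbar_i - Xbar_j| / sqrt(MSE / n)."""
    return abs(xbar_i - xbar_j) / math.sqrt(mse / n)

# Group means consistent with Example 1; MSE = 53.979, n = 4 per group.
means = {"X1": 77.0, "X3": 74.0, "X2": 72.5, "X4": 58.75}
mse, n = 53.979, 4
q_crit = 4.20  # tabled Q for k = 4, df = 12, alpha = .05 (same Qcrit for every pair)

for a, b in combinations(means, 2):
    q = q_obt(means[a], means[b], mse, n)
    print(a, "vs", b, round(q, 2), "reject" if q > q_crit else "do not reject")
```

Note that HSD uses one Qcrit (here, the k = 4 value) for every pair, which is what keeps the familywise error rate at α.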

17 Tukey's HSD test NOTE: If sample sizes are not equal, use the harmonic mean of the sample sizes:
ñ = k / Σ(1/ni)    (k = # of samples)
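A minimal sketch of the harmonic-mean adjustment (the unequal sizes below are made up):

```python
def harmonic_n(sizes):
    """n-tilde = k / sum(1/n_i): the harmonic mean of the sample sizes."""
    return len(sizes) / sum(1 / n_i for n_i in sizes)

print(harmonic_n([4, 4, 4, 4]))            # equal sizes: just n -> 4.0
print(round(harmonic_n([3, 4, 5, 6]), 2))  # pulled below the plain mean (4.5)
```

The harmonic mean is always at or below the arithmetic mean, which makes the resulting test slightly conservative when sizes are unequal.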

18 Two types of Post-hoc tests
2. Newman-Keuls test The N-K is like Tukey's HSD in that it makes all possible comparisons among the sample means, and in that it uses the Q statistic. N-K differs from HSD in that Qcrit varies for different comparisons.

19 Newman-Keuls test As with HSD, Qobt = (X̄i − X̄j) / √(MSE/n)    (n = sample size)
Evaluate Qobt against Qcrit obtained from a table, using df, α, and r. r may vary for different comparisons.

20 Newman-Keuls test To find r for a given comparison, begin by ordering the sample means from highest to lowest. r is then the number of means spanned (inclusively) by the comparison you want to make. For the ordered means
X̄1  X̄3  X̄2  X̄4
comparing X̄1 with X̄4 spans all four means (r = 4), comparing two adjacent means spans two (r = 2), and comparing X̄1 with X̄2 spans three (r = 3).
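The counting rule can be sketched directly; the ordering below matches the diagram, with numeric values borrowed from Example 1:

```python
def nk_r(means, a, b):
    """r = number of means spanned, inclusively, by the comparison a vs. b."""
    ordered = sorted(means, key=means.get, reverse=True)  # highest to lowest
    i, j = sorted((ordered.index(a), ordered.index(b)))
    return j - i + 1

# Ordered means X1 X3 X2 X4, as in the diagram (values borrowed from Example 1):
means = {"X1": 77.0, "X3": 74.0, "X2": 72.5, "X4": 58.75}
print(nk_r(means, "X1", "X4"))  # spans all four means: r = 4
print(nk_r(means, "X3", "X2"))  # adjacent means: r = 2
print(nk_r(means, "X1", "X2"))  # spans X1, X3, X2: r = 3
```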

21 Example 1 1. Students taking Summer School courses sometimes attempt to take more than one course at the same time and/or hold a full-time job on top of their course(s). To study the effect that these situations may have on a student's performance, four randomly selected students in each of four conditions are compared on their final exam grades in the statistics course they all took.

22 Example 1 a. Prior to data collection, it was predicted that students taking just one course (no job) would obtain a significantly higher mean final exam grade than students in the two-courses-plus-job group. It was also predicted that the mean final exam grade of students in the two-courses (no job) group would not differ significantly from that of students in the one-course-plus-job group. Perform the necessary analyses to determine whether these predictions are borne out by the data, using α = .01 for each prediction.

23 Example 1a Notice these words:
"Prior to data collection, it was predicted that …" That means this question calls for a planned comparison – so to answer the question, you do not have to do the ANOVA first, as you would if this were a post-hoc test. But you do need MSE.

24 Example 1 We have the raw data, so we can use the computational formulas learned last week:
CM = (ΣXi)² / n
SSTotal = ΣXi² − CM
SSTreat = Σ(Ti²/ni) − CM
SSE = SSTotal − SSTreat
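The four computational formulas translate directly into code. The scores below are made-up stand-ins for the slide's raw data, just to show the mechanics:

```python
# Made-up scores for four groups of four, standing in for the slide's raw data.
groups = [
    [78, 69, 82, 75],
    [71, 74, 70, 77],
    [80, 68, 73, 79],
    [60, 55, 63, 57],
]

scores = [x for g in groups for x in g]
n = len(scores)

cm = sum(scores) ** 2 / n                                  # CM = (Sum X)^2 / n
ss_total = sum(x * x for x in scores) - cm                 # SSTotal = Sum X^2 - CM
ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - cm  # SSTreat = Sum(T_i^2/n_i) - CM
ss_error = ss_total - ss_treat                             # SSE by subtraction

print(round(cm, 2), round(ss_total, 2), round(ss_treat, 2), round(ss_error, 2))
```

SSE computed by subtraction here matches the sum of squared deviations within each group, which is the identity the next slides derive.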

25 Example 1 The data: four final-exam scores in each of four groups – S only, S + C.S., S + Job, S + C.S. + J. The resulting group means are X̄1 = 77, X̄2 = 72.5, X̄3 = 74, X̄4 = 58.75.

26 Example 1
SSE = ΣXi² − Σ(Ti²/ni)
ΣXi² = 78² + 69² + … + 46² = 81099

27 Example 1
SSE = SSTotal − SSTreat = (ΣXi² − CM) − (Σ(Ti²/ni) − CM)
    = ΣXi² − Σ(Ti²/ni) − CM + CM = ΣXi² − Σ(Ti²/ni)

28 Example 1
SSE = ΣXi² − Σ(Ti²/ni) = 81099 − 80451.25 = 647.75
MSE = SSE/df = SSE/(n − p) = 647.75/12 = 53.979
Now, we're ready to make the comparisons…

29 Example 1
H0: μ1 = μ4    HA: μ1 > μ4
Rejection region: tobt > tn−p,α = t12,.01 = 2.681
Reject H0 if tobt > 2.681.

30 Example 1 1 vs 4: t = (77 − 58.75) / √(53.979(1/4 + 1/4)) = 18.25/5.195 = 3.51. Reject H0 (the prediction is supported). Note the similarity of the denominator of this test to that of the independent-groups t-test. In both cases, we're using measures of error variability averaged across all the samples available.

31 Example 1
H0: μ2 = μ3    HA: μ2 ≠ μ3
Rejection region: |tobt| > tn−p,α/2 = t12,.005 = 3.055
Reject H0 if |tobt| > 3.055.

32 Example 1 2 vs. 3: t = (72.5 − 74)/5.195 = −0.29. Do not reject H0.

33 Example 1 b. After data collection, it was decided to compare the mean final exam grades of the one-course (no job) and two-courses (no job) groups, and also to compare the mean grade of the one-course-plus-job group with the two-courses-plus-job group. Each comparison was to be tested with α = .05. Perform the appropriate procedures.

34 Example 1b Notice these words:
"After data collection, it was decided to compare…" This is a post-hoc test. That means we have to do the ANOVA first (by definition – the ANOVA is the hoc this test is post).

35 Example 1
H0: μ1 = μ2 = μ3 = μ4    HA: At least two means differ significantly
Rejection region: Fobt > F3,12,.05 = 3.49
CM = (ΣXi)²/n = 1129²/16 = 79665.0625
SSTotal = ΣXi² − CM = 81099 − 79665.0625 = 1433.9375
SSTreat = Σ(Ti²/ni) − CM = 80451.25 − 79665.0625 = 786.1875

36 Example 1
Source      df    SS         MS        F
Treatment    3    786.1875   262.0625  4.85
Error       12    647.75      53.979
Total       15   1433.9375
Decision: Reject H0… now, do the post-hoc test.

37 Example 1 Using the Newman-Keuls procedure, order the means from highest to lowest:
X̄1 (77)  X̄3 (74)  X̄2 (72.5)  X̄4 (58.75)
Comparison 1: one course, no job vs. two courses, no job (X̄1 vs. X̄2) – spans three means, r = 3
Comparison 2: one course plus job vs. two courses plus job (X̄3 vs. X̄4) – spans three means, r = 3

38 Example 1
H0: μi = μj    HA: μi ≠ μj
Rejection region: Qobt > Qr,n−p,α = Q3,12,.05 = 3.77
Note: this Qcrit applies to both of the following tests, because both span r = 3 means.

39 Example 1 1 vs. 2: Qobt = (77 − 72.5) / √(53.979/4) = 4.5/3.674 = 1.22 < 3.77. (Do not reject H0.)

40 Example 1 3 vs. 4: Qobt = (74 − 58.75) / √(53.979/4) = 15.25/3.674 = 4.15 > 3.77. (Reject H0.)

41 Example 2a
H0: μ1 = μ2 = μ3    HA: At least two means differ significantly
Rejection region: Fobt > F2,87,.05 ≈ F2,60,.05 = 3.15
Note: We cannot use the computational formulas because we do not have the raw data. So, we'll use the conceptual formulas.

42 Example 2 1. Compute X̄G (the grand mean).
Since the ns are all equal:
X̄G = (X̄1 + X̄2 + X̄3) / 3

43 Example 2
SSTreat = Σni(X̄i − X̄G)²
        = 30[(X̄1 − X̄G)² + (X̄2 − X̄G)² + (X̄3 − X̄G)²] = 1782.2
Now we can create the summary table…
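With only the group means and a common n in hand, the conceptual formula is a one-liner. The three means below are made up (the slide's actual grade means were not preserved); n = 30 per group as in Example 2:

```python
# Made-up group means; n = 30 per group as in Example 2.
means = [18.0, 25.0, 28.6]
n_per_group = 30

# With equal ns, the grand mean is just the mean of the group means.
grand = sum(means) / len(means)

# SSTreat = Sum n_i * (Xbar_i - Xbar_G)^2
ss_treat = n_per_group * sum((m - grand) ** 2 for m in means)
print(round(ss_treat, 1))
```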

44 Example 2
Source      df    SS       MS      F
Treatment    2    1782.2   891.1   32.7
Error       87    2370.8    27.25
Total       89    4153.0
Decision: Reject H0 – rotation skill differs significantly across the grades.

45 Example 2b
H0: μ8 = μ4    HA: μ8 > μ4
Rejection region: tobt > t87,.05 ≈ t29,.05 = 1.699
Reject H0 if tobt > 1.699.

46 Example 2 8 vs 4: t = (X̄8 − 10.5) / √(MSE(1/30 + 1/30)) = … Reject H0 (the prediction is supported). Note the similarity of the denominator of this test to that of the independent-groups t-test. In both cases, we're using measures of error variability averaged across all the samples available.

