Chapter 11, Lecture 2. More t tests for independent groups: Dunnett's t and Tukey's Honestly Significant Difference
Conceptual review
t and F tests: 2 approaches to measuring the distance between means
There are two ways to tell how far apart things are. When there are only two things, you can directly determine their distance from each other. If they are two measurements, as they usually are in Psychology, you simply subtract one from the other to find their difference. That is the approach used in the t test and its variants.
F tests: Alternatively, when you want to describe the distance of three or more things from each other, the best way to index their distance from each other is to find a central point and talk about their average squared distance (or average unsquared distance) from that point. The further apart things are from each other, the further they will be, on the average, from that central point. That is the approach you have used in the F test (and when treating the t test as a special case of the F test: t for one or two, F for more).
One way or another: ultimately they yield identical results.
We can use either method, or a combination of the two methods, to ask the key question in this part of the course: "Are two or more means further apart than they are likely to be when the null hypothesis is true?"
H0: It's just sampling fluctuation.
If the only thing that makes the two means different is random sampling fluctuation, the means will be fairly close to the population mean and to each other. If an independent variable is pushing the means apart, their distance from each other, or from some central point, will tend to be too great to be explained by the null hypothesis.
Let's review the t test for 2 independent groups
Here's the formula for the independent groups t test:
t = (M1 - M2) / s(M1 - M2), with df = n1 + n2 - 2

Here is the formula for the estimated standard error of the difference between the means of two independent samples:
s(M1 - M2) = sqrt(2 * MSW / nH)
So, to compute a t test for two independent groups, the only (fairly) new computation is nH.
Calculating the Harmonic Mean
nH = k / (1/n1 + 1/n2 + ... + 1/nk)
Notice that this technique allows different numbers of subjects in each group. Oh No!! My rat died! What is going to happen to my experiment?
If the groups are the same size, the harmonic and ordinary mean number of participants is the same.
3 groups, 4 subjects each: nH = 3/(1/4 + 1/4 + 1/4) = 3/.75 = 4.00, the same as the ordinary mean.
When groups do not have equal numbers, the harmonic mean is smaller than the ordinary mean.
4 groups with 6, 4, 8 and 4 participants. Ordinary mean = 22/4 = 5.50 participants each. Harmonic mean: nH = 4/(1/6 + 1/4 + 1/8 + 1/4) = 4/.79 = 5.05 participants.
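Here is a minimal Python sketch of that calculation; the group sizes are the ones used on these slides:

```python
def harmonic_mean(ns):
    """Harmonic mean of the group sizes: nH = k / (1/n1 + 1/n2 + ... + 1/nk)."""
    return len(ns) / sum(1.0 / n for n in ns)

print(harmonic_mean([4, 4, 4]))      # equal groups: 4.0, the same as the ordinary mean
print(harmonic_mean([6, 4, 8, 4]))   # unequal groups: about 5.05, less than 22/4 = 5.50
```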
Let's say you had the following data:
Group 1 has 18 participants and a mean of 75.00. Group 2 has 14 participants and a mean of 71.50. MSW = 20.00. nH = 2/(1/18 + 1/14) = 2/.127 = 15.75. Compute the independent groups t test.
Here's the formula for the independent groups t test:
t = (M1 - M2) / sqrt(2 * MSW / nH)

Let's do the computation for our example:
t = (75.00 - 71.50) / sqrt(2(20.00)/15.75) = 3.50/1.59 = 2.201

df = n1 + n2 - 2 = 18 + 14 - 2 = 30

Is that significant? With 30 df, a t of 2.042 is significant at .05. We would write the result t(30) = 2.201, p < .05.
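A short Python sketch of the same computation, using the means, sample sizes, and MSW from the worked example above:

```python
import math

def independent_t(mean1, mean2, n1, n2, ms_within):
    """Independent-groups t using MSW and the harmonic mean of the two sample sizes."""
    n_h = 2 / (1 / n1 + 1 / n2)               # harmonic mean of the group sizes
    se_diff = math.sqrt(2 * ms_within / n_h)  # estimated standard error of the difference
    return (mean1 - mean2) / se_diff, n1 + n2 - 2

t, df = independent_t(75.00, 71.50, 18, 14, 20.00)
print(round(t, 2), df)   # about 2.20 with 30 df; the .05 critical value for 30 df is 2.042
```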
MULTIPLE COMPARISONS
Two or more experimental treatments vs. a control group.
Many studies compare several treatment groups to a control group. For example, imagine a study comparing three active medications to placebo. Somehow, there is a mix-up at the plant and all the pills in the study are actually the sugar pill – placebo.
All the groups got sugar pills!
Any differences among the groups reflect only sampling fluctuation: H0 is true. In fact, since all the groups are treated identically, we expect the means of the groups to be similar to each other. So the null hypothesis, H0, is true. Significance testing should fail to reject the null hypothesis, and we should attribute all the differences among the groups to sampling fluctuation.
On the other hand, once in a while we would obtain statistically significant differences although the independent variable has no effect.
In those cases, significant differences between means would mislead us into making a Type 1 error.
That is, we would have to say that the independent variable pushed the means apart and that the results should hold up in the population as a whole, when neither is true.
We really don't like making Type 1 errors!
That is why we do significance testing in the first place. However, for a variety of reasons, most of which are dealt with in Chapter 12, we are willing to make five Type 1 errors in 100 studies (p<.05).
NOTE THAT WE ALLOW OURSELVES TO MAKE TYPE 1 ERRORS IN FIVE STUDIES IN 100, NOT FIVE HYPOTHESIS TESTS IN 100. BUT WE OFTEN TEST MORE THAN ONE NULL HYPOTHESIS IN A STUDY. IF WE ALLOW FIVE TYPE 1 ERRORS IN 100 HYPOTHESIS TESTS, AND WE TEST SEVERAL NULL HYPOTHESES IN EACH STUDY, FAR MORE THAN FIVE STUDIES IN 100 WILL CONTAIN AT LEAST ONE TYPE 1 ERROR.
Think of it this way. You must pass a true/false test on advanced tensor calculus. You can't make heads or tails of the subject no matter how hard you try. (I couldn't, and I'm teaching this course.) You have a 5% chance of passing the test each time you take it just by choosing True or False randomly. Which would you prefer, taking the test once or as many times as you wish? You know intuitively that you will pass the test eventually just by chance.
It's the same thing with the t test
Compare 2 groups that differ by chance and you have only a 5% chance of making a Type 1 error. Make lots of comparisons and sooner or later you will make a Type 1 error simply by chance.
The solution: change the alpha level, and thus the critical value of t, on each test so that there is only a 5% chance of getting any Type 1 errors given the number of comparisons you have to make.
For example: comparing 3 experimental treatment groups to a control group.
In a study in which several samples are compared to a control group, we do several independent groups t tests. If there are three treatment groups, we do 3 t tests. In an ordinary t test, the odds on obtaining a strange sample and getting statistical significance when H0 is true are set, by convention, at .05 (5 in 100). The probability of not rejecting a null hypothesis that is correct is therefore 95 in 100 (.95).
What are the odds on a Type 1 error when you compare 3 treatments to a control group?
But the odds on not rejecting 3 correct nulls in a row, using simple t tests, are the probability of correctly retaining the first null, times the probability of also retaining the second null, times the probability of retaining the third null: (.95)(.95)(.95) = (.95)^3 = .857. The odds on one of the three comparisons being significant when the three nulls are true are 1 - .857 = .143, or almost exactly 1 out of 7, not 1 out of 20.
Conclusion: So, if we do three comparisons and don't adjust alpha for each one, the odds on sampling fluctuation (mis)leading us to find a significant difference between at least one of the 3 pairs are over 14 in 100, not 5 in 100.
When there are more comparisons the odds on a Type 1 error get much higher!
Say we have 6 groups to compare to the control group; that yields 6 comparisons. If we do 6 t tests with alpha at .05 for each, we have 95 chances in 100 of properly failing to reject the null (and retaining it) each time. But the odds on properly retaining it every one of the 6 times are (.95)^6 = .95 x .95 x .95 x .95 x .95 x .95 = .735. So there would be a 1 - .735 = .265 = 26.5% chance (or more than 1 chance in 4) of committing at least one Type 1 error by rejecting the null hypothesis when the only reason two groups differ is random sampling fluctuation.
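The arithmetic on the last few slides in one small Python sketch:

```python
def experimentwise_alpha(alpha_per_test, m):
    """Chance of at least one Type 1 error across m comparisons, each tested at alpha_per_test."""
    return 1 - (1 - alpha_per_test) ** m

for m in (1, 3, 6):
    print(m, round(experimentwise_alpha(0.05, m), 3))   # 0.05, 0.143, 0.265
```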
We have to lower alpha for each comparison to keep experimentwise alpha at .05
To make alpha for the whole study (experimentwise alpha) stay at .05 when all the comparisons are considered, we must lower alpha (quite a bit). For example, with 10 treatment groups and one control group, we need to set alpha at a little more than .005 for each of the 10 comparisons. Then, when we consider all 10 comparisons, they yield an experimentwise alpha of .05 [(1 - .0051)^10 = (.9949)^10 = .95].
It's like dividing .05 by the number of comparisons
When there are three groups vs. a control group (and 3 comparisons), it's almost like you divide .05 by 3 and set the critical value of t so that a proportion of .05/3 = .0167 (1.67%) stays in the tails for each t test. That means that you are creating a 100% - 1.67% = 98.33% confidence interval that is consistent with the null hypothesis for each t test. Then you have 5% altogether for the 3 tests. The actual values for the confidence interval are slightly different than that, but it's close.
More than 2 groups? Divide .05 by the number of comparisons.
For 6 comparisons, the critical value for t leaves about .05/6 ~ .0083 in the tails and about 1 - .0083 = .9917 in the body of the t curve. For 10 comparisons, the critical value for t leaves about .05/10 = .0050 in the tails and about 1 - .0050 = .995 in the body of the t curve. (Actual values involve the nth root of .95, so they are a very little different from the values above, e.g., .0085 for 6 comparisons instead of .0083. But dividing .05 by the number of comparisons is a "good enough" way to think about it.)
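A sketch of both versions of the correction: the "divide .05 by the number of comparisons" shortcut, and the exact value based on the nth root of .95 that the slide mentions:

```python
def exact_per_comparison_alpha(alpha_ew, m):
    """Per-comparison alpha that keeps experimentwise alpha at alpha_ew over m comparisons."""
    return 1 - (1 - alpha_ew) ** (1 / m)

for m in (3, 6, 10):
    print(m, round(0.05 / m, 4), round(exact_per_comparison_alpha(0.05, m), 4))
# 3: 0.0167 vs 0.017;  6: 0.0083 vs 0.0085;  10: 0.005 vs 0.0051
```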
Summary: We have to lower alpha for each individual t test to keep experimentwise alpha at .05
If we kept alpha for each individual t test at .05 and then did 10, 20, or 30 comparisons between pairs of means, we would almost certainly get at least one, and possibly more, Type 1 errors. That is, we would get statistically significant findings that would force us to say that two treatments differ in their effects in the population as a whole, when that isn't true. So we must lower alpha for each comparison to get an experimentwise alpha of .05.
How can we do that? Simple! We raise the critical value of t so that there is only a 5% chance of rejecting any of the null hypotheses examined in a study purely because of sampling fluctuation. How high do we raise the critical value of t? That depends on dfW and the number of comparisons. If we had a sufficiently lengthy t table (or the right computer) we could easily generate the correct critical values for t. Dunnett and Tukey did it for us, so we just have to use their tables. Dunnett looked at comparing treatments to a control group. Tukey was concerned with making all possible pairwise comparisons between the different conditions in an experiment.
Table 11.5: Dunnett's t table for comparison of treatment groups to a control group. To find the correct critical value of t, go across to the column for the correct number of groups, including the control group, then down to the row for the appropriate number of df within group, n - k. If your dfW lies between values in the table, either interpolate or use the critical values for the smaller number of df.
(Layout of Dunnett's t table: the number of treatment means total, including the control condition, runs across the top; df within group and alpha run down the left.)
Notice: the numbers in the column for two groups in Dunnett's table are the same as those in an ordinary t table. That is because Dunnett's table corrects for the number of comparisons. When there is only one comparison, we are back at the ordinary t table.
Two groups and a control group.
Group 1 has 18 participants and a mean of 75.00. Group 2 has 16 participants and a mean of 73.00. Group 3 (the control group) has 14 participants and a mean of 71.50. MSW = 20.00. nH = 3/(1/18 + 1/14 + 1/16) = 3/.190 = 15.83. Compute an independent groups t test for each of the two treatment groups compared to Group 3, the control group.
Here's the formula for the independent groups t test again:
t = (M1 - M2) / sqrt(2 * MSW / nH)

Group 1 vs. Group 3 (the control group):
t = (75.00 - 71.50) / sqrt(2(20.00)/15.83) = 3.50/1.59 = 2.201

Group 2 vs. Group 3 (the control group):
t = (73.00 - 71.50) / sqrt(2(20.00)/15.83) = 1.50/1.59 = 0.943
Are those comparisons significant?
If we were just making a single comparison, with 40 df a t of 2.021 is significant at .05; with 60 df a t of 2.000 is needed. Usually, we would just use the value for 40 df. We could interpolate (going 5/20 = 25% of the way from the critical value for 40 df to that for 60 df), but unless it's close, no one bothers! If we used an ordinary t table, Group 1 vs. Group 3 would be significant while Group 2 vs. Group 3 would not.
However, now we have two comparisons and must keep experimentwise alpha at .05.
So, now we use the table for Dunnett's t, a t table corrected for the number of comparisons we are making. For experimentwise alpha to be .05, alpha should be about .05/2 = .025 for each comparison (actually .0253). Again we would look at the critical values for 40 and 60 df, but use the value for 40 df. But those values will be higher due to the more stringent alpha for each test. Let's look at Dunnett's t table.
Now the critical value of t is about 2.29 (2.285 if you bother to interpolate), and our results are different. Group 1 vs. 3: t(45) = 2.201, n.s. Group 2 vs. 3: t(45) = 0.943, n.s. Given the higher critical t value, neither comparison is significant.
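A sketch of both comparisons against the two critical values (the ordinary .05 value of about 2.02 for 40 df and the Dunnett value of about 2.29 read off the table); the means, MSW, and nH are the ones from the example above:

```python
import math

MSW, N_H, CONTROL_MEAN = 20.00, 15.83, 71.50    # values from the example
ORDINARY_CRIT, DUNNETT_CRIT = 2.021, 2.29       # .05 critical values for 40 df

def t_vs_control(treatment_mean):
    """t for one treatment group against the control group, using MSW and nH."""
    return (treatment_mean - CONTROL_MEAN) / math.sqrt(2 * MSW / N_H)

for label, mean in (("Group 1", 75.00), ("Group 2", 73.00)):
    t = t_vs_control(mean)
    print(label, round(t, 3),
          "significant" if t > ORDINARY_CRIT else "n.s.", "by the ordinary t table;",
          "significant" if t > DUNNETT_CRIT else "n.s.", "by Dunnett's table")
```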
Comparing all the group means in an experiment to each other.
Means and Significance
A significant F test could tell us that there is a main effect or an interaction effect. When that happens, we know that there is a difference among the means that is unlikely to occur just by chance. But which means? The only ones we can be sure are significantly different from each other are the highest and lowest means. That's not a problem with the t test, because there are only two means. But when there are more than 2 means, what can we say about the relationship between means other than the highest and lowest?
Tukey's HSD
When we want to compare all the means in an experiment to each other (as opposed to comparing them only to a control group), there are many more possible comparisons that could come out significant just by chance. Keeping experimentwise alpha at .05 requires even more stringent levels of alpha for each individual test.
Also: when there are so many comparisons, it becomes tedious to do individual t tests on them all. So we would like a fancy t test, one that tells us how far apart any two means have to be in order to be significantly different. That will depend on:
1. the number of possible pairwise comparisons,
2. dfW and the critical values of t, and
3. MSW.
Let's see how things change when we compare the experimental groups to each other as well as to the control group. When we were comparing 3 experimental groups to a control group, there were 4 groups and three t tests (Group 1 vs. control, Group 2 vs. control, Group 3 vs. control). When there are 4 groups in a study, comparing all the possible combinations of the four means, two means at a time, requires us to make six comparisons instead of 3.
Here are the (4)(3)/2 = 6 comparisons for 4 groups:
Group 1 vs. Group 2, Group 1 vs. Group 3, Group 1 vs. Group 4, Group 2 vs. Group 3, Group 2 vs. Group 4, Group 3 vs. Group 4.
Here is the formula for the number of possible pairwise comparisons
Number of possible pairwise comparisons = k(k - 1)/2. The number of possible pairwise comparisons equals the number of groups, times the number of groups minus one, divided by 2. 6 groups: (6)(6 - 1)/2 = 6*5/2 = 30/2 = 15. 7 groups: (7)(7 - 1)/2 = 7*6/2 = 42/2 = 21. Etc.
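The same count in a couple of lines of Python, with the pairs listed explicitly for the 4-group case:

```python
from itertools import combinations

def n_pairwise(k):
    """Number of possible pairwise comparisons among k group means: k(k - 1)/2."""
    return k * (k - 1) // 2

print(n_pairwise(4), n_pairwise(6), n_pairwise(7))   # 6, 15, 21
print(list(combinations([1, 2, 3, 4], 2)))           # the six pairs for 4 groups
```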
That's too many t tests, because: 1. It's a pain in the neck to do that many t tests. 2. We'd need a very big Dunnett's table. 3. There has to be a better way to do it.
We need a fancy t test. The rule is t for 2.
When you want to compare 2 groups to determine whether they are significantly different, use the t test. But we need a t test designed for multiple comparisons: alpha adjusted so that experimentwise alpha = .05, and easy to use to compare all the groups in any specific experiment, no matter how many there are.
Six groups: 15 pairwise comparisons
Let's say that we had 6 groups in a study. That gives us (6)(5)/2 = 15 possible pairwise comparisons. Assume H0 is true. With 15 comparisons, the chances of all 15 comparisons failing to be statistically significant are (.95)^15 = .463. So the odds on at least one significant finding (though H0 is true) are 1 - .463 = .537, or 53.7%. By this point the odds on making at least one Type 1 error are higher than the odds of not making one.
We have to lower alpha for each comparison to keep experimentwise alpha at .05
To make alpha stay at .05 when all the comparisons are considered, we must lower alpha (quite a bit). For example, with 6 groups, we need to set alpha at a little more than .05/15 ~ .0033 for each of the 15 comparisons. Then, taken together, the 15 comparisons yield an experimentwise alpha of very close to .05 [(1 - .0034)^15 = (.9966)^15 = .9502 ~ .95].
The q table and the Tukey test
We could find the correct critical values for t with a lot of work and a very lengthy t table. Fortunately, as in the case of Dunnett's t, someone did it for us: a man named Tukey. He gave us the Tukey test and the q table. The q table is a fancy t table, with each value of q equal to the proper critical value of t corrected for the number of comparisons to be made. Then he multiplied the critical values of t by the square root of 2.00 (1.414) to make the equations simpler.
A little algebra: start with the independent groups t test and solve for the difference between the means.
t = (M1 - M2) / sqrt(2 * MSW / nH), so M1 - M2 = t * sqrt(2 * MSW / nH)

A little more algebra: pull the 2 out of the square root.
M1 - M2 = t * sqrt(2) * sqrt(MSW / nH) = 1.414 * t * sqrt(MSW / nH)

HSD: If we multiply the critical value for t, given the number of comparisons we are making, by 1.414, we get Tukey's q statistic. Then the left side of the equation will give us the distance between two means that is large enough to be significant, given the number of comparisons to be made. We call that distance between any two means the honestly significant difference, or HSD. Here is the equation that results.

Moving terms around: minimum significant difference = 1.414 * (t critical) * sqrt(MSW / nH)

Substituting q for 1.414 * (t critical) and HSD for the minimum significant distance between the two means:
HSD = q * sqrt(MSW / nH)
Tukey's Honestly Significant Difference
An HSD is the minimum difference between two means that can be deemed statistically different while keeping the experimentwise alpha at .05. Any two means separated by this amount or greater are significantly different from each other. Any two means separated by less than this amount cannot be considered significantly different.
Calculating HSD, the minimum distance between two significantly different means. Here is the formula again:
HSD = q * sqrt(MSW / nH)
q: look it up in a table, based on dfW and k. nH: compute as before. Remember that this is a post hoc comparison; therefore we have already calculated MSW, computed the ANOVA, and found a statistically significant F ratio.
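A minimal Python sketch of the HSD calculation; q, MSW, and nH are whatever values apply to the study at hand:

```python
import math

def hsd(q, ms_within, n_h):
    """Tukey's honestly significant difference: HSD = q * sqrt(MSW / nH)."""
    return q * math.sqrt(ms_within / n_h)

# Any two means that differ by hsd(...) or more are significantly different.
print(round(hsd(4.04, 1.51, 10), 2))   # the tea example below: about 1.57
```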
The q table for alpha = .05: the number of groups (means) runs across the top; dfW (n - k) runs down the left. There is a whole other table for alpha = .01. (Note: the table in the book has bad values.)
Two examples: the effects of alcohol, and vitamin B in various teas.
Ethanol and minutes of REM sleep
Means: 0 g/kg: 79.28 min; 1 g/kg: 61.54 min; 2 g/kg: ___ min; 3 g/kg: ___ min. MSW = 65. k = 4. n = 16, 4 in each group. A rat in group 3 died! So n = 15 and dfW = n - k = 15 - 4 = 11.
From the q table for alpha = .05, with k = 4 and dfW = 11: q = 4.26.
Harmonic mean: nH = 4/(1/4 + 1/4 + 1/3 + 1/4) = 4/1.083 = 3.69.
Ethanol and sleep
Means: 0 g/kg: 79.28 min; 1 g/kg: 61.54 min. MSW = 65.00, k = 4, n = 15, dfW = n - k = 15 - 4 = 11, q = 4.26. HSD = 4.26 * sqrt(65.00/3.69) = 17.87. Means as far or further apart than 17.87 represent a significant difference and can be generalized.
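A sketch of how the 17.87 threshold falls out of the numbers on this slide (group sizes 4, 4, 3, 4 after the lost rat):

```python
import math

def harmonic_mean(ns):
    return len(ns) / sum(1.0 / n for n in ns)

n_h = harmonic_mean([4, 4, 3, 4])        # one group lost a rat, so nH is about 3.69
hsd = 4.26 * math.sqrt(65.00 / n_h)      # q = 4.26, MSW = 65.00
print(round(n_h, 2), round(hsd, 2))      # 3.69 and about 17.87
print(abs(79.28 - 61.54) > hsd)          # 0 vs. 1 g/kg differ by 17.74: False, so n.s.
```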
Ethanol and Sleep – the six comparisons
HSD = 17.87
0 g/kg vs. 1 g/kg: difference 17.74, n.s.
0 g/kg vs. 2 g/kg: p < .05
0 g/kg vs. 3 g/kg: p < .05
1 g/kg vs. 2 g/kg: n.s.
1 g/kg vs. 3 g/kg: p < .05
2 g/kg vs. 3 g/kg: n.s.
Ethanol and Sleep Conclusion
2 and 3 g/kg of ethanol interrupted sleep significantly more than no ethanol. Also, 3 g/kg of ethanol interrupted sleep significantly more than 1 g/kg of ethanol. No adjoining doses differed significantly (0 vs. 1, 1 vs. 2, 2 vs. 3: all n.s.).
0 vs. 1: n.s.  0 vs. 2: p < .05  0 vs. 3: p < .05  1 vs. 2: n.s.  1 vs. 3: p < .05  2 vs. 3: n.s.
Tea Example
The means are Brand A: 8.27 ml, Brand B: 7.50 ml, Brand C: 6.15 ml, Brand D: 6.00 ml, Brand E: 5.82 ml. MSW = 1.51. k = 5. n = 50, 10 in each group. dfW = n - k = 50 - 5 = 45.
From the q table for alpha = .05, with k = 5 and dfW = 45 (use the smaller tabled number of df, or interpolate): q = 4.04.
nH: when there are the same number of participants in each group, the harmonic mean equals the usual arithmetic mean, so here nH = 10.
Tea example: amount of vitamin B present in various cups of tea, 10 cups in each group.
MSW = 1.51, k = 5, n = 50 (10 in each group), dfW = n - k = 50 - 5 = 45. The means are Brand A: 8.27 ml, Brand B: 7.50 ml, Brand C: 6.15 ml, Brand D: 6.00 ml, Brand E: 5.82 ml. q = 4.04. HSD = 4.04 * sqrt(1.51/10) = 1.57. Means as far or further apart than 1.57 represent an honestly significant difference.
Tea Example – the ten comparisons
HSD = 1.57
A vs. B: 0.77, n.s.
A vs. C: 2.12, p < .05
A vs. D: 2.27, p < .05
A vs. E: 2.45, p < .05
B vs. C: 1.35, n.s.
B vs. D: 1.50, n.s.
B vs. E: 1.68, p < .05
C vs. D: 0.15, n.s.
C vs. E: 0.33, n.s.
D vs. E: 0.18, n.s.
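The differences and decisions in the table above can be reproduced with a few lines of Python, using the brand means and HSD from these slides:

```python
import math
from itertools import combinations

means = {"A": 8.27, "B": 7.50, "C": 6.15, "D": 6.00, "E": 5.82}
hsd = 4.04 * math.sqrt(1.51 / 10)          # q = 4.04, MSW = 1.51, nH = 10: about 1.57

for brand1, brand2 in combinations(means, 2):
    diff = abs(means[brand1] - means[brand2])
    verdict = "p < .05" if diff >= hsd else "n.s."
    print(f"{brand1} vs {brand2}: {diff:.2f}  {verdict}")
```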
Tea Conclusion
Brand A has significantly more nutritional value, as measured by amount of vitamin B, than Brands C, D, and E. Brand B has significantly more vitamin B than Brand E. No other brands differed significantly in nutritional value.
A vs. B: n.s.  A vs. C: p < .05  A vs. D: p < .05  A vs. E: p < .05  B vs. C: n.s.  B vs. D: n.s.  B vs. E: p < .05  C vs. D: n.s.  C vs. E: n.s.  D vs. E: n.s.
End