1 1 Psych 5500/6500 Comparisons Among Treatment Means in Single-Factor Experiments. Fall, 2008

2 2 Example Let’s go back to the example from the previous lecture. We have an experiment where the independent variable is ‘Type of Therapy’ (Control Group, Behavior Modification, Psychoanalysis, Client-Centered, Gestalt) and the dependent variable is the level of depression after 2 months. H0: μCG = μBM = μPA = μCC = μG. HA: at least one μ is different from the rest.

3 3 Data

Control   Beh. Mod.   Psychoanalysis   Client-Centered   Gestalt
   9          6              6                 6             3
   8          6              7                 5             1
   7          4              6                 7             5

4 4 Summary Table

Source     SS      df    MS     F      p
Between   39.60     4   9.90   6.46   .008
Within    15.33    10   1.53
Total     54.93    14

F(4,10) = 6.46, p = .008. We can conclude that at least one μ is different from the rest. We will refer to this as the ‘overall’ or ‘omnibus’ F.
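The omnibus test above can be reproduced in a few lines. This is an illustrative sketch (not part of the original lecture) using SciPy, with the scores taken from the data slide:

```python
# Sketch: reproducing the omnibus (overall) F test with SciPy,
# using the depression scores from the data slide.
from scipy.stats import f_oneway

control = [9, 8, 7]
beh_mod = [6, 6, 4]
psychoanalysis = [6, 7, 6]
client_centered = [6, 5, 7]
gestalt = [3, 1, 5]

F, p = f_oneway(control, beh_mod, psychoanalysis, client_centered, gestalt)
print(f"F(4,10) = {F:.2f}, p = {p:.3f}")  # F(4,10) = 6.46, p = .008
```

The F and p values match the summary table, as they should.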

5 5 Comparisons Comparison procedures allow us to ask much more specific questions. There are many different comparison procedures, but they all involve comparing just two things with each other. This can be accomplished several ways; for a specific comparison you can: 1. Drop all but two groups. 2. Add groups together to end up with two groups. 3. Drop some groups and add others together (to end up with two groups). An important restriction is that no group can appear on both sides of a comparison.

6 6 Examples Dropping all but two groups. In this particular example we will compare each therapy group with the control group, giving us 4 different comparisons (note that other pairwise comparisons could also be made, e.g., BM vs. Gestalt): 1. Control group vs. Behavior Mod: H0: μCG = μBM. 2. Control group vs. Psychoanalysis: H0: μCG = μPA. 3. Control group vs. Client-Centered: H0: μCG = μCC. 4. Control group vs. Gestalt: H0: μCG = μG.

7 7 Example Control Group vs. Behavior Modification

Control Group   Behavior Modification
      9                   6
      8                   6
      7                   4
  Mean = 8           Mean = 5.33

These are the means being compared, but as we shall see the analysis uses the within-group variance of all the groups.

8 8 Adding groups together 5. Control group vs. Therapy (i.e., the Control group vs. all of the therapy groups combined into one large ‘therapy’ group).

9 9 Example

Control Group   Therapy
      9            6
      8            6
      7            4
                   6
                   7
                   6
                   6
                   5
                   7
                   3
                   1
                   5
  Mean = 8     Mean = 5.17

10 10 Dropping some groups and adding other groups together 6. Behavior Modification versus ‘Talking Therapies’ (i.e., drop the control group and compare the Behavior Mod group to a group consisting of Psychoanalysis, Client-Centered, and Gestalt combined).

11 11 Computing the Comparison With a comparison we always end up with two groups, so we could simply do a t test on those two groups. Comparison procedures, however, do something that has more power; they: 1. Recompute MS Between using the two groups of the comparison. 2. Borrow the MS Within from the overall F test. 3. F = MS Between / MS Within. 4. If you want to test a directional (one-tailed) hypothesis, adjust the p value accordingly.
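The steps above can be sketched in code for the Control vs. Therapy comparison. This is a hedged illustration, not the lecture's own computation: it uses the standard contrast-weight formula for the comparison sum of squares (SS = ψ² / Σ(c²/n), an assumption not shown on the slide), which is equivalent to recomputing MS Between for the two groups of the comparison.

```python
# Sketch of the comparison procedure for Control vs. Therapy (all four
# therapy groups combined), borrowing MS Within from the overall F test.
import numpy as np
from scipy.stats import f

groups = [np.array(g, dtype=float) for g in
          ([9, 8, 7], [6, 6, 4], [6, 7, 6], [6, 5, 7], [3, 1, 5])]
weights = np.array([1, -1/4, -1/4, -1/4, -1/4])  # contrast: CG vs. the rest

# Step 2: MS Within pooled over ALL five groups (same as the omnibus test)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_within = sum(len(g) - 1 for g in groups)          # 10
ms_within = ss_within / df_within                    # ~1.53

# Step 1: SS (df = 1) for the two-group comparison, via the contrast formula
psi = sum(w * g.mean() for w, g in zip(weights, groups))
ss_comparison = psi ** 2 / sum(w**2 / len(g) for w, g in zip(weights, groups))

# Step 3: F for the comparison
F_comp = ss_comparison / ms_within
p = f.sf(F_comp, 1, df_within)
print(f"F(1,10) = {F_comp:.2f}, p = {p:.4f}")
```

Note that df Within stays at 10 even though only two (composite) groups are being compared, which is where the extra power comes from.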

12 12 Computing the Comparison This comparison procedure has more power than just doing a t test on the two groups of the comparison: 1. The df for MS Within comes from all of the groups, even if the comparison involves fewer than all of the groups. 2. When you combine groups together, that can create more within-group variance, which would hurt the power of the t test.

13 13

Control Group   Therapy
      9            6
      8            6
      7            4
                  16
                  17
                  16
                   2
                   3
                   1

I’ve changed the data here to make a point. If this were analyzed using the t test, then the variance within the therapy group (a combination of the data from three therapies) would be large and would hurt the power. But with the comparison technique MS Within only looks at the variance within each of the three therapy groups and within the control group.

14 14 Number of Possible Comparisons As the number of groups in the experiment grows, the number of possible comparisons gets very large. For example, in an experiment with 4 groups (‘A’, ‘B’, ‘C’, and ‘D’) some of the comparisons would be: A vs B, A vs C, A vs D, B vs C, B vs D, C vs D, A vs (B+C), A vs (B+D), A vs (C+D), B vs (A+C), B vs (A+D), B vs (C+D), C vs (A+B), C vs (A+D), C vs (B+D)...etc...(A+B) vs (C+D), (A+C) vs (B+D)...etc...A vs (B+C+D), B vs (A+C+D), C vs (A+B+D), D vs (A+B+C).
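How large does this get? A brute-force count (my illustration, not from the lecture) assigns each group to side 1 of the comparison, side 2, or "dropped", requires both sides to be non-empty, and halves the total because swapping the two sides gives the same comparison:

```python
# Sketch: counting every possible two-sided comparison by brute force.
# Each group is assigned to side 1, side 2, or dropped (0); both sides
# must be non-empty, and swapping sides gives the same comparison.
from itertools import product

def n_comparisons(a):
    count = sum(1 for sides in product((0, 1, 2), repeat=a)
                if 1 in sides and 2 in sides)
    return count // 2  # each comparison was counted twice (sides swapped)

print(n_comparisons(4))  # 25 comparisons with groups A, B, C, D
print(n_comparisons(5))  # 90 with five groups, as in the therapy example
```

So even the modest five-group therapy experiment allows 90 distinct comparisons.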

15 15 Error Rate A concern is that if H0 is true, and we make lots of comparisons, and each has a .05 chance of making a Type 1 error (rejecting H0 when H0 is true), then the chance of making at least one Type 1 error becomes quite large. The (rather large and complicated) topic of ‘comparisons’ provides a variety of procedures for keeping the probability of making a Type 1 error under control when making many comparisons.

16 16 Definitions Error rate per comparison: p(making a Type 1 error when performing any one specific comparison | H0 true for that comparison). Error rate per comparison set: p(making at least one Type 1 error when you perform a set of comparisons | H0 true for all of them). Error rate per experiment: if you make several sets of comparisons when analyzing the data from an experiment, then this is p(making at least one Type 1 error in the analysis of the data from the experiment | H0 true for all of them).

17 17 Example We are going to make two independent comparisons (and the null hypothesis happens to be true for both). Each has a .05 chance of making a Type 1 error (i.e., α = .05 per comparison). Thus the error rate per comparison = .05. Now let’s calculate the error rate for that set of two comparisons...

18 18 Remember, you can only make a Type 1 error when H0 is true; we will assume H0 is true for this example. We will let ‘C’ stand for making a correct decision to not reject H0: p(C) = .95. We will let ‘E’ stand for incorrectly rejecting H0, thus making a Type 1 error: p(E) = .05.

19 19 Possible Outcomes

Possible outcome                  Determining probability   Probability of that outcome
C,C (2 correct decisions)         (.95)(.95)                .9025
C,E (1st correct, 2nd an error)   (.95)(.05)                .0475
E,C (1st an error, 2nd correct)   (.05)(.95)                .0475
E,E (2 errors)                    (.05)(.05)                .0025

The probability of making at least one Type 1 error in two comparisons = .0475 + .0475 + .0025 = .0975. Thus the error rate per comparison set = .0975.

20 20 Formula for Error Rate Per Comparison Set If α_PC = error rate per comparison, α_PCS = error rate per comparison set, and k = the number of independent comparisons that will be made, then: α_PCS = 1 − (1 − α_PC)^k

21 21 Examples If the error rate per comparison = .05 and you do three comparisons, then the error rate for that comparison set is α_PCS = 1 − (1 − .05)^3 = .143. With ten comparisons, the error rate per comparison set = .40 (a 40% chance of making at least one Type 1 error!)
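The formula from the previous slide is easy to check numerically; a minimal sketch:

```python
# Sketch: error rate per comparison set for k independent comparisons,
# using alpha_PCS = 1 - (1 - alpha_PC)^k from the formula slide.
def error_rate_per_set(alpha_pc, k):
    return 1 - (1 - alpha_pc) ** k

print(round(error_rate_per_set(0.05, 3), 3))   # 0.143
print(round(error_rate_per_set(0.05, 10), 2))  # 0.4
```

Both values match the slide: .143 for three comparisons, .40 for ten.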

22 22 Non-independent Comparisons There is no simple formula for computing the error rate per comparison set when the comparisons are not independent. We will get to the difference between independent and dependent comparisons shortly. First, I would like to review the various concerns that arise when making several comparisons.

23 23 Error Rate Concern #1 If H0 is true: 1. The more comparisons you make, the more likely it is you’ll make a Type 1 error. This is the point we just covered.

24 24 Error Rate Concern #2 2. One improbable mean (one that differs from the others quite a bit just due to chance) can lead to many wrong decisions to reject H0. Every comparison involving that mean could lead to the decision to reject H0.

25 25 Error Rate Concern #3 3. If you have several groups in your experiment, a comparison of the lowest mean and highest mean is likely to lead to a rejection of H0. The overall F test is not fooled by this; it knows how many groups are in the experiment and looks at how much they all differ from each other. If you were, however, to drop all of the groups except the one with the highest mean and the one with the lowest mean, and do a t test on those two groups, you would probably be able to reject H0 even when the independent variable had no effect.

26 26 Controlling Error Rate There are many, many procedures for controlling error rate when making multiple comparisons, and they all take a different approach to the problem. We will look at a few procedures that cover the basic concepts. To differentiate among those procedures we need to determine whether the comparisons are a priori or a posteriori, and whether they are orthogonal (independent) or non-orthogonal (non-independent).

27 27 a priori and a posteriori A priori comparisons (also known as ‘planned comparisons’) are those you know you are going to want to do before you gather your data. They are based upon theory, specifically upon which comparisons will shed light on the hypotheses you are testing. A posteriori comparisons (also known as ‘post hoc’ or ‘data snooping’ comparisons) are those you run to examine unexpected patterns in your data, or just to snoop out what else your data might tell you.

28 28 Orthogonality Conceptual: Comparisons are orthogonal when they lead to analyses that are not redundant. For example, if one comparison allows us to compare group 1 with group 2, that analysis will not in any way predict the results of a comparison that allows us to compare group 3 with group 4. Those two comparisons are ‘orthogonal’ (i.e. non-redundant).

29 29 Sets of Orthogonal Comparisons If ‘a’ is the number of groups in your experiment, it is possible to come up with a set of a−1 comparisons that are all orthogonal to each other. To determine whether or not comparisons are orthogonal to each other we need to express them as ‘contrast codes’.

30 30 Contrast Codes Contrast codes are sets of constants that add up to zero and that indicate which groups are involved in a comparison (and how much to weight each one).

               CG     BM     PA     CC      G
Comparison 1    1     -1      0      0      0
Comparison 2    0      0      1    -1/2   -1/2
Comparison 3   1/2    1/2     0    -1/2   -1/2

Comparison 1: CG vs. BM
Comparison 2: PA vs. (CC+G)
Comparison 3: (CG+BM) vs. (CC+G)

31 31 Testing Orthogonality Two comparisons are orthogonal if the sum of their products (as demonstrated below) equals zero.

               CG     BM     PA     CC      G
Comparison 1    1     -1      0      0      0
Comparison 2    0      0      1    -1/2   -1/2
1 x 2           0      0      0      0      0

The sum of the products = 0+0+0+0+0 = 0, so these two comparisons are orthogonal.

32 32 Testing Orthogonality Now let’s see if comparisons 1 and 3 are orthogonal...

               CG     BM     PA     CC      G
Comparison 1    1     -1      0      0      0
Comparison 3   1/2    1/2     0    -1/2   -1/2
1 x 3          1/2   -1/2     0      0      0

The sum of the products = 1/2 + (-1/2) + 0 + 0 + 0 = 0, so these two comparisons are orthogonal.

33 33 Testing Orthogonality Now let’s see if comparisons 2 and 3 are orthogonal...

               CG     BM     PA     CC      G
Comparison 2    0      0      1    -1/2   -1/2
Comparison 3   1/2    1/2     0    -1/2   -1/2
2 x 3           0      0      0     1/4    1/4

The sum of the products = 0 + 0 + 0 + 1/4 + 1/4 = 1/2 ≠ 0, so these two comparisons are not orthogonal.
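The three orthogonality checks can be written as dot products of the contrast vectors. A small sketch (mine, not from the lecture), using the contrast codes from the Contrast Codes slide in the order CG, BM, PA, CC, G:

```python
# Sketch: testing orthogonality as a sum of products (a dot product).
import numpy as np

c1 = np.array([1, -1, 0, 0, 0])           # Comparison 1: CG vs. BM
c2 = np.array([0, 0, 1, -1/2, -1/2])      # Comparison 2: PA vs. (CC+G)
c3 = np.array([1/2, 1/2, 0, -1/2, -1/2])  # Comparison 3: (CG+BM) vs. (CC+G)

print(np.dot(c1, c2))  # 0.0 -> orthogonal
print(np.dot(c1, c3))  # 0.0 -> orthogonal
print(np.dot(c2, c3))  # 0.5 -> not orthogonal
```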

34 34 Comparisons 2 and 3 are not orthogonal, so this was not a set of orthogonal comparisons (they all need to be orthogonal to each other, tested by looking at each pair of contrasts). However, as there are five groups in the experiment we should be able to come up with a set of four comparisons that are all orthogonal to each other (pairwise). Actually we can come up with many different sets of four orthogonal comparisons, but we can’t come up with five comparisons that are orthogonal to each other if we have five groups in the experiment.

35 35 Examples of Orthogonal Sets

               CG     BM     PA     CC      G
Comparison 1    1     -1      0      0      0
Comparison 2   1/2    1/2    -1      0      0
Comparison 3   1/3    1/3    1/3    -1      0
Comparison 4   1/4    1/4    1/4    1/4    -1

Check them out: they are all orthogonal to each other (tested as pairs). Note the general pattern here: compare two groups, combine those two groups and compare them to a third group, combine those three groups and compare them to a fourth, and so on.

36 36 Another Example

               CG     BM     PA     CC      G
Comparison 1    1     -1      0      0      0
Comparison 2    0      0      1     -1      0
Comparison 3   1/2    1/2   -1/2   -1/2     0
Comparison 4   1/4    1/4    1/4    1/4    -1

Check them out: this set of comparisons is also orthogonal (pairwise to each other). Note the pattern: compare 2 groups, compare 2 other groups, compare the first pair with the second, combine them all and compare them to yet another group. There are many possible sets of orthogonal comparisons, but unless the same comparison happens to appear in both sets (like comparison 4 here), the comparisons in this set will not be orthogonal to the comparisons in a different set.

37 37 Orthogonal Sets There are various patterns of orthogonality. After a while you get the hang of creating them. Remember, the maximum number of comparisons that can all be orthogonal to each other is a−1. Before going on to the next slide, go back and reread the material on error rate, error rate per comparison and error rate per comparison set, and the three factors that lead to high error rates.

38 38 Error Rate Concerns (revisited) Any method to control error rate has to address these three concerns: 1. The error rate per comparison set goes up as the number of comparisons goes up. 2. One weird mean might lead to a lot of mistaken rejections of H0. 3. In an experiment with several groups, a comparison of the largest and smallest means is likely to be significant.

39 39 Comparison Procedures We will be looking at three different comparison procedures; they each address the error rate concerns in a different fashion: 1. A priori, orthogonal comparisons. 2. Dunn’s method for a priori, non-orthogonal comparisons (also called the Bonferroni method). 3. Scheffe’s method for a posteriori comparisons.

40 40 1) A priori, orthogonal comparisons To perform a set of a priori, orthogonal comparisons: 1. You do not have to first reject H0 on the overall F test before doing these comparisons (as we will see, you do need to reject H0 on the overall F test before doing a post hoc comparison). 2. Set the error rate per comparison at your normal significance level (i.e., .05), which will make the error rate per comparison set greater than .05. 3. You can transform the p value of the comparison to test a directional hypothesis.

41 41 How Error Rate is Controlled 1. The number of comparisons is limited to a−1 by the requirement that they all be orthogonal to each other. 2. One weird mean is not likely to lead to lots of Type 1 errors, as the comparisons are orthogonal and thus aren’t redundant in the questions being asked. 3. As the comparisons are decided upon a priori, you can’t wait until after you see the means and then select the biggest and smallest for your comparison.

42 42 General Strategy for Controlling Error Rate The other comparison procedures keep error rate per comparison set from getting too large by using a more stringent error rate per comparison. In other words, they keep you from falsely rejecting H0 over many comparisons by making it harder to reject H0 for each comparison.

43 43 Dunn’s Method for a priori, Non-Orthogonal Comparisons* 1. You do not have to first reject H0 on the overall F test. 2. Set your significance level for each comparison at α/(# of comparisons). See the next slide for more details. 3. You can transform the F of the comparison to a t test to test a directional hypothesis, or simply adjust the p value. * Also known as the Bonferroni t.

44 44 Significance Level for Dunn’s Method You are controlling Type 1 error by using a smaller significance level equal to α/(# of comparisons). The appropriate ‘# of comparisons’ is a matter of some controversy. The common options (in order of ascending conservatism) are: 1. The number of comparisons you will be making within the set of nonorthogonal a priori comparisons. 2. The total number of a priori comparisons you will be making (orthogonal as well as nonorthogonal). 3. The total number of statistical tests you will be performing on the data. For example, using criterion #1, if you are making 10 a priori, non-orthogonal comparisons then set your significance level for each comparison at .05/10 = .005.
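A quick sketch (my illustration) of why the division works: with the per-comparison level set to α/k, the error rate per comparison set stays at or below α, even in the worst case where all k comparisons are independent.

```python
# Sketch: Dunn/Bonferroni correction for 10 a priori, non-orthogonal
# comparisons, checking that the set-wise error rate stays below .05.
alpha = 0.05
k = 10
alpha_per_comparison = alpha / k  # .05/10 = .005, as on the slide

# Worst case (all k comparisons independent): error rate per comparison set
alpha_per_set = 1 - (1 - alpha_per_comparison) ** k
print(alpha_per_comparison)     # 0.005
print(round(alpha_per_set, 3))  # 0.049 -> stays under .05
```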

45 45 How Error Rate is Controlled 1. You can make as many a priori comparisons as you would like, but the more you make, the harder it is to reject H0 on each one. 2. One weird mean may appear in several comparisons in a redundant way, but the correction to the error rate per comparison should help control that to some degree. 3. As the comparisons are decided upon a priori, you can’t wait until after you see the means and then select the biggest and smallest for your comparison.

46 46 A Posteriori Comparisons The general rule for a posteriori comparisons is that you first need to reject H0 on the overall F test before you can do an a posteriori comparison. In other words, first you have to show there is an effect somewhere in the data before you can snoop around looking for where it is. There are many, many a posteriori comparison procedures available; we will look at one that can do any comparison you want.

47 47 Scheffe’s Method for a posteriori Comparisons 1. You first have to reject H0 on the overall F test. 2. For each comparison, use (F critical from the overall F test) × (a−1) as your F critical value. In our example it would be F critical = (3.48)(5−1) = 13.92.
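The Scheffe critical value for the running example can be computed from the F distribution. A sketch (mine), assuming the omnibus test used α = .05 with df = (4, 10) as in the summary table:

```python
# Sketch: Scheffe critical value for the five-group therapy example,
# assuming alpha = .05 and omnibus df = (4, 10).
from scipy.stats import f

a = 5                                    # number of groups
f_crit_overall = f.ppf(0.95, a - 1, 10)  # ~3.48, the omnibus critical F
f_crit_scheffe = (a - 1) * f_crit_overall
print(round(f_crit_scheffe, 2))          # 13.91 (slide rounds 3.48 x 4 = 13.92)
```

Note that a comparison F has to clear this much higher bar (about 13.9 rather than about 4.96 for an ordinary F(1,10) test) to be significant by Scheffe's method.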

48 48 How Error Rate is Controlled 1. You can make every possible comparison, but you have set your critical value so high that the error rate per comparison set = your significance level. 2. One weird mean may appear in several comparisons in a redundant way, but the correction to the error rate per comparison controls that. 3. The overall F test must be significant before you do this procedure, and the overall F test is not fooled by the likely big difference between the largest and smallest mean when you have many groups.

49 49 Advantages and Disadvantages Advantage of Scheffe’s method: you can do any comparisons you want, and as many as you want. Disadvantage: Scheffe’s method assumes you are going to do every possible comparison, and so it makes the error rate per comparison very low. If you don’t make a lot of comparisons, then you have conservatively biased your chances of rejecting H0.

50 50 Additional Comparison Procedures Tukey’s HSD Test: this is used to make all possible pairwise comparisons among the group means. The error rate per comparison set is your significance level (i.e., .05), while the error rate per comparison is less than your significance level. Dunnett’s Test: this is used to compare each treatment group one at a time with the control group. Again, the error rate per comparison set is your significance level, while the error rate per comparison is less than your significance level.

51 51 On Post-Hoc Comparisons Some researchers (including myself) believe that data-snooping comparisons (e.g., doing all pairwise comparisons) are of dubious value, and that analyses should be limited to results that test a priori hypotheses. This is consistent with a certain view of the processes of science. In this view, post-hoc or data-snooping comparisons should not be interpreted within the context of the theories being examined in a study, but they can serve to help shape where the theory goes next (which then serves as the a priori aspect of the next experiment).

52 52 On Interpreting Comparisons in Statistical Packages Statistical programs make it easy to interpret the results of comparisons. The internal manipulations to control error rate are invisible; they simply report a p value. If the p value is less than or equal to your general significance level then the comparison is statistically significant using whatever procedure is involved. In other words, if p ≤ .05 then reject H0 for that comparison, no matter which technique you are using.

53 53 Using SPSS The one-way ANOVA menu item in SPSS gives you a choice between entering a specific contrast (which will be evaluated as if it were an orthogonal, a priori comparison, i.e., with no control over error rate) or selecting a post hoc procedure. If you select a post hoc procedure then you will be limited to specific types of comparisons; most of the procedures automatically do all pairwise comparisons.

54 54 Effect Size of Comparisons As each comparison reduces the data to two groups, we report the difference between the means of those two groups as a measure of effect size. For example, if the comparison involves Group 1 vs. (Groups 2 & 3 combined), then we have: Mean difference = mean of group 1 − (mean of groups 2 and 3 combined).

55 55 Standardized Effect Size The difference between the two means of the comparison can be turned into a standardized effect size using Cohen’s, Hedges’s, or Glass’s formulas. Cohen’s and Hedges’s formulas assume that all of the population variances are equal. Conveniently, a pooled estimate of the standard deviation can be obtained by taking the square root of MS Within.
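A sketch (my illustration, not from the lecture) of a Cohen's-d-style standardized effect size for the Control vs. Behavior Modification comparison, using sqrt(MS Within) from the full design as the pooled standard deviation:

```python
# Sketch: standardized effect size for Control vs. Behavior Modification,
# pooling the SD as sqrt(MS Within) from all five groups.
import numpy as np

groups = [np.array(g, dtype=float) for g in
          ([9, 8, 7], [6, 6, 4], [6, 7, 6], [6, 5, 7], [3, 1, 5])]
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_within = sum(len(g) - 1 for g in groups)
pooled_sd = np.sqrt(ss_within / df_within)  # sqrt(~1.53) ~ 1.24

d = (groups[0].mean() - groups[1].mean()) / pooled_sd  # (8 - 5.33) / 1.24
print(round(d, 2))  # 2.15
```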

56 56 Standardized Effect Size If the population variances are not equal (an option within the SPSS ANOVA procedure will allow you to test the assumption of homogeneity), then the denominator should be the estimate of the population standard deviation from the control group. Indicate in SPSS that you want the ‘descriptives’ option when doing the ANOVA, then look at the ‘Std Deviation’ of the group you consider to be the control group.

