Chapter 13: Multiple Comparisons
(For Explaining Psychological Statistics, 4th ed., by B. Cohen)

Experimentwise Alpha (α_EW)
–The probability that an experiment will produce any Type I errors among multiple tests (a variation is called familywise alpha).
–If, instead of an ANOVA, all of the possible t tests are performed for a multigroup experiment, α_EW will be greater than α_PC (alpha per comparison), the alpha used for each individual t test.
–Without protection, α_EW increases as the number of groups in a one-way ANOVA increases, because more comparisons mean more opportunities to commit Type I errors.
How Large Can α_EW Get?
–First find the probability of making no Type I errors at all. For one test: usually α_pc = .05; therefore, p(no Type I error) = 1 – α_pc = 1 – .05 = .95.
–For j tests, if each test is independent of all the others, the probability of no errors requires multiplying .95 by itself j times: p(no Type I errors) = (1 – α_pc)^j.
–The probability of one or more Type I errors occurring among the multiple tests is the complement of the probability of no errors. Thus: α_EW = 1 – (1 – α_pc)^j.
–The possible pairwise comparisons following an ANOVA are not all mutually independent, but the above equation does give a reasonable indication of how large α_EW can get if α_pc is not adjusted when performing multiple comparisons.
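The formula above is easy to verify numerically. A minimal sketch (the function name `experimentwise_alpha` is just illustrative):

```python
def experimentwise_alpha(alpha_pc, j):
    """Upper bound on experimentwise alpha for j independent tests:
    alpha_EW = 1 - (1 - alpha_pc)^j."""
    return 1 - (1 - alpha_pc) ** j

# With 4 groups there are 4*3/2 = 6 possible pairwise t tests:
print(experimentwise_alpha(0.05, 6))   # roughly 0.265, far above .05
```

Even with only four groups, the chance of at least one Type I error more than quintuples if α_pc is left at .05.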
Fisher's Protected t Tests
–"Protected" because the F for the ANOVA must be significant before proceeding.
–If we assume homogeneity of variance for the experiment, MS_W is the best estimate of the common error variance and can therefore be used in place of s²_pooled in all of the pairwise comparisons.
–Fisher's Least Significant Difference (LSD) test: two means differ significantly if their difference exceeds LSD.
  Unequal ns: LSD = t_crit √[MS_W (1/n_i + 1/n_j)]
  Equal ns: LSD = t_crit √(2 MS_W / n)
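The equal-n LSD can be sketched as follows, assuming scipy is available for the critical t value (the function name `fisher_lsd` and the example numbers are illustrative):

```python
from math import sqrt
from scipy.stats import t

def fisher_lsd(ms_w, n, df_w, alpha=0.05):
    """Equal-n Fisher LSD: t_crit * sqrt(2 * MS_W / n).
    Only applied after a significant omnibus ANOVA."""
    t_crit = t.ppf(1 - alpha / 2, df_w)   # two-tailed critical t
    return t_crit * sqrt(2 * ms_w / n)

# Example: MS_W = 4.0, n = 5 per group, df_W = 12
print(round(fisher_lsd(4.0, 5, 12), 3))   # ≈ 2.756
```

Any pair of sample means differing by more than this value would be declared significantly different.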
Complete vs. Partial Null Hypothesis
–Complete null: hypothesizes the equality of all the population means. Fisher's protected t tests control α_EW only with respect to experiments for which the complete H_0 is true, or when the study involves only three groups.
–Partial null: hypothesizes the equality of some, but not all, of the population means. If a partial null is true for more than three populations, the ANOVA can attain significance due to one mean that differs from the others, which then allows multiple comparisons among groups whose population means are equal, which, in turn, increases the chances of committing Type I errors. Thus, Fisher's procedure is not protected against partial null hypotheses.
Tukey's Honestly Significant Difference (HSD)
–Maintains good control over α_EW even for partial nulls; it is therefore more conservative than Fisher's LSD test.
–Uses the Studentized range statistic (q) to obtain its critical values. The q statistic is based on the fact that the greater the number of samples drawn from the same population, the larger the difference between the smallest and largest of the sample means becomes. The size of all of the samples is assumed to be the same (n).
–The size of q increases as the number of groups (the "range") increases, but decreases as the size (n) of those groups increases (in a manner similar to Student's t distribution, which is why q is said to be "studentized").
Tukey's HSD Formula
–The formula for Tukey's HSD is: HSD = q_crit √(MS_W / n), where n is the size of each sample, and q_crit is found from the number of groups and the degrees of freedom associated with MS_W (i.e., df_W).
–The number 2, which appears in the LSD formula, is missing from the HSD formula because its value has been incorporated into the table of the q statistic (i.e., the original q values were multiplied by the square root of 2).
–The q statistic is based on equal sample sizes, but if the ns differ only slightly and accidentally, it is reasonable to use the harmonic mean of all of the ns to obtain the value of n for the HSD formula.
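A sketch of the HSD computation, assuming scipy ≥ 1.7 for its `studentized_range` distribution (the function name `tukey_hsd` and the example numbers are illustrative):

```python
from math import sqrt
from scipy.stats import studentized_range  # requires scipy >= 1.7

def tukey_hsd(ms_w, n, k, df_w, alpha=0.05):
    """Tukey HSD = q_crit * sqrt(MS_W / n), where q_crit depends on
    the number of groups (k) and the df of the error term (df_W)."""
    q_crit = studentized_range.ppf(1 - alpha, k, df_w)
    return q_crit * sqrt(ms_w / n)

# Example: 3 groups, MS_W = 4.0, n = 5 per group, df_W = 12
print(round(tukey_hsd(4.0, 5, 3, 12), 3))
```

With these inputs q_crit is about 3.77 (the tabled value for 3 groups and 12 df), so HSD comes out somewhat larger than the corresponding LSD, as the slides describe.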
Properties of Tukey's HSD
–Advantages: 1) α_EW is kept from rising above the value chosen for the test (usually .05), regardless of how many pairs are compared, or whether any partial H_0 is true. 2) It is easy to find confidence intervals for the difference of any two population means.
–Disadvantages: 1) α_EW usually turns out to be below the alpha chosen for the test (e.g., about .02 or .03 if .05 is chosen). Therefore, there are more powerful alternatives to HSD (i.e., less strict about α_EW) that are nonetheless sufficiently conservative. 2) The sample sizes must be equal, or nearly equal.
–HSD does not require the one-way ANOVA to be significant in order to be "protected." It is possible for a pair of means to differ significantly (i.e., exceed HSD) even when the ANOVA is not significant, and it is possible for the ANOVA to be significant and yet for no pair of means to differ by more than HSD.
Confidence Intervals for Tukey's HSD Test
–HSD is considered a simultaneous rather than a sequential comparison method; CIs can be created easily for simultaneous methods.
–The CI for any pair of means in the study is: (M_i – M_j) ± q_crit √(MS_W / n), where q_crit is based on the total number of groups in the study and n is the size of any one group.
–If the groups differ slightly and accidentally in size, the harmonic mean of the sample sizes can be used in place of n in the preceding formula.
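The simultaneous CI above can be sketched directly, again assuming scipy ≥ 1.7 (the function name `tukey_ci` and the example numbers are illustrative):

```python
from math import sqrt
from scipy.stats import studentized_range  # requires scipy >= 1.7

def tukey_ci(mean_i, mean_j, ms_w, n, k, df_w, alpha=0.05):
    """Simultaneous CI for mu_i - mu_j:
    (M_i - M_j) +/- q_crit * sqrt(MS_W / n)."""
    margin = studentized_range.ppf(1 - alpha, k, df_w) * sqrt(ms_w / n)
    diff = mean_i - mean_j
    return diff - margin, diff + margin

# Example: M_i = 24.0, M_j = 20.0, 3 groups, MS_W = 4.0, n = 5, df_W = 12
lo, hi = tukey_ci(24.0, 20.0, 4.0, 5, 3, 12)
print(round(lo, 2), round(hi, 2))
```

Because the interval excludes zero in this example, the two means differ by more than HSD, which is the CI counterpart of declaring the pair significant.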
Try This Example…
–Does diet affect bicycle riding speed? The DV is time (in minutes) to ride 6 miles. Compare the results of the LSD and HSD tests. Which diets differ at the .05 level?
–LSD = 1.65; HSD = 2.02. Only the normal and vegetarian diets differ by HSD, but with LSD both the normal and organic diets differ significantly from the vegetarian diet. HSD is overly conservative with three groups.
Other Procedures for Post Hoc Pairwise Comparisons
–The main difference among the many tests that can be used for multiple pairwise comparisons following an ANOVA is how conservative they are (i.e., how strictly they control Type I errors). The more conservative the test, the less powerful it is (i.e., the greater its rate of Type II errors).
–Newman-Keuls test (AKA Student-Newman-Keuls or SNK test): Critical values come from the Studentized range statistic. Arrange the means in order and use the range between any two of them (instead of the number of groups) to look up critical q values. This makes significance easier to attain, so the test is more powerful than HSD. It used to be one of the most popular post hoc procedures until it was discovered that its extra power came from allowing α_EW to rise above the level that was set for it; the greater the number of groups, the worse the problem. This test is no longer considered acceptable.
Other Procedures for Post Hoc Pairwise Comparisons (cont.)
–Modified LSD (Fisher-Hayter) test: Requires the one-way ANOVA to be significant before proceeding. If it is, calculate a modified HSD value, HSD_FH = q_crit √(MS_W / n), finding the critical q by setting the number of groups to k – 1 rather than k. This test is acceptably conservative, more powerful than Tukey's HSD, and easy to understand and calculate. Unfortunately, it is not well known and is therefore rarely used.
–Dunnett's test: Applies to the specific situation in which several groups are being compared to the same reference (e.g., control) group. It is the optimal test for that situation.
–REGWQ test: Modifies Tukey's test to be more powerful without allowing α_EW to rise above the set value. Like Dunnett's test, it is readily available in SPSS, but rarely used.
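The Fisher-Hayter adjustment is a one-line change to the HSD computation: look up q for one fewer group. A sketch, assuming scipy ≥ 1.7 (the function name `fisher_hayter` and the example numbers are illustrative):

```python
from math import sqrt
from scipy.stats import studentized_range  # requires scipy >= 1.7

def fisher_hayter(ms_w, n, k, df_w, alpha=0.05):
    """Fisher-Hayter critical difference: like Tukey's HSD, but with
    q_crit looked up for k - 1 groups. Use only after a significant ANOVA."""
    q_crit = studentized_range.ppf(1 - alpha, k - 1, df_w)
    return q_crit * sqrt(ms_w / n)

# Example: 4 groups, MS_W = 4.0, n = 5 per group, df_W = 16
print(round(fisher_hayter(4.0, 5, 4, 16), 3))
```

Because q shrinks as the number of groups drops, this critical difference is always smaller than the full HSD for the same data, which is where the extra power comes from.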
Planned Comparisons: the Bonferroni Correction
–Based on the following inequality: α_EW ≤ j α_pc, where j equals the number of comparisons being planned and α_pc is the α used for each comparison.
–The following formula can be used to adjust α_pc accordingly: α_pc = α_EW / j.
–Very conservative; not recommended for post hoc comparisons.
–A very simple and flexible procedure when used for planned comparisons.
–Works best when only a relatively small proportion of the possible tests in the study have been planned.
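The adjustment itself is a single division; a minimal sketch (the function name `bonferroni_alpha` is illustrative):

```python
def bonferroni_alpha(alpha_ew, j):
    """Per-comparison alpha that keeps the experimentwise rate at or
    below alpha_EW for j planned comparisons: alpha_pc = alpha_EW / j."""
    return alpha_ew / j

# Planning 4 comparisons while holding alpha_EW at .05:
print(bonferroni_alpha(0.05, 4))   # 0.0125 per comparison
```

Each of the four planned tests would then be evaluated at the .0125 level, guaranteeing (by the inequality above) that α_EW cannot exceed .05.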
Complex Comparisons
–You might want to compare the average of two groups to the average of three others: (μ_1 + μ_2)/2 versus (μ_3 + μ_4 + μ_5)/3.
–The general format for a linear contrast is: L = Σ c_i μ_i (a weighted sum of the population means).
–Applied to this particular example, the linear contrast in the population looks like this: L = (1/2)μ_1 + (1/2)μ_2 – (1/3)μ_3 – (1/3)μ_4 – (1/3)μ_5.
–Note that the coefficients (the c_i's) add up to zero, which must be the case for the contrast to be considered a linear one.
Sample Estimates for Linear Contrasts
–A linear contrast can be reduced to a single difference score, though it may involve many group means.
–The estimate of a linear contrast from sample means looks like this: L = Σ c_i M_i, where the M_i's are the sample means. When only two means are involved, it is called a pairwise comparison; when more than two means are involved, it is called a complex comparison.
–The sum of squares associated with a linear contrast involving equal-sized samples can be found from this formula: SS_contrast = n L² / Σ c_i², where n is the size of each sample.
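The two formulas above can be sketched together (the function names and the example means are illustrative, using the two-versus-three contrast from the previous slide):

```python
def contrast_estimate(coeffs, means):
    """L = sum of c_i * M_i for a linear contrast (coefficients sum to 0)."""
    assert abs(sum(coeffs)) < 1e-9, "contrast coefficients must sum to zero"
    return sum(c * m for c, m in zip(coeffs, means))

def contrast_ss(coeffs, means, n):
    """SS_contrast = n * L**2 / sum(c_i**2) for equal-sized samples."""
    L = contrast_estimate(coeffs, means)
    return n * L ** 2 / sum(c ** 2 for c in coeffs)

# First two groups vs. last three, n = 6 per group:
c = [1/2, 1/2, -1/3, -1/3, -1/3]
M = [10.0, 12.0, 7.0, 7.0, 9.0]
print(contrast_estimate(c, M))   # L ≈ 3.333
print(contrast_ss(c, M, 6))     # SS_contrast ≈ 80.0
```

Note how five means collapse into the single difference score L, which then yields a one-df sum of squares.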
Testing a Planned Contrast for Significance
–Because the contrast is based on only one df, MS_contrast = SS_contrast. Its error term is just MS_W from the ANOVA, so for equal ns the contrast can be tested as: F = SS_contrast / MS_W.
–The critical F for a planned contrast is based on one df for the numerator and df_W from the ANOVA for the denominator. Note that this critical F will be larger than the critical F for the one-way ANOVA, which has df_bet for the numerator. This is a potential disadvantage of testing a planned contrast.
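A sketch of the test, assuming scipy for the critical F (the function name `contrast_f_test` and the example numbers are illustrative):

```python
from scipy.stats import f

def contrast_f_test(ss_contrast, ms_w, df_w, alpha=0.05):
    """F ratio for a planned contrast and its critical value.
    MS_contrast = SS_contrast because the contrast has only one df."""
    F = ss_contrast / ms_w
    F_crit = f.ppf(1 - alpha, 1, df_w)   # numerator df is always 1
    return F, F_crit

# Example: SS_contrast = 80.0, MS_W = 5.0, df_W = 20
F, F_crit = contrast_f_test(80.0, 5.0, 20)
print(F, round(F_crit, 2))   # F = 16.0, critical F ≈ 4.35
```

Here F greatly exceeds the critical value, so the planned contrast would be declared significant at the .05 level.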
Testing Post Hoc Complex Comparisons
–Scheffé's test: Adjusts the critical F of the omnibus one-way ANOVA by multiplying it by df_bet: F_S = (k – 1) F_crit(k – 1, N_T – k), where N_T is the total number of subjects and k is the number of groups.
–If the one-way ANOVA was not statistically significant, there is no point in testing a post hoc complex contrast; its F ratio will not exceed F_S as defined above.
–Advantages of Scheffé's test: Adequately conservative for complex comparisons (but overly conservative for pairwise comparisons). Requires no special tables (just the F tables). Doesn't require equal ns. Leads to easy creation of CIs.
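Scheffé's adjusted critical value is easy to compute from the ordinary F distribution; a sketch assuming scipy (the function name `scheffe_critical_f` and the example numbers are illustrative):

```python
from scipy.stats import f

def scheffe_critical_f(k, n_total, alpha=0.05):
    """Scheffe's F_S = df_bet * F_crit(df_bet, df_W),
    with df_bet = k - 1 and df_W = N_T - k."""
    df_bet = k - 1
    df_w = n_total - k
    return df_bet * f.ppf(1 - alpha, df_bet, df_w)

# Example: 5 groups, 30 subjects in total
print(round(scheffe_critical_f(5, 30), 2))
```

With five groups, a post hoc contrast must produce an F roughly four times the omnibus critical F to be declared significant, which is what makes Scheffé's test so conservative.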
Orthogonal Contrasts
–Comparisons that represent mutually independent pieces of information.
–The maximum number of orthogonal contrasts in a set is related to the number of groups in the experiment: if there are k groups, there can be, at most, k – 1 mutually orthogonal contrasts. This maximum is the same as df_bet for the ANOVA, and each orthogonal contrast represents 1 df.
–The SSs for a complete set of orthogonal contrasts will add up to SS_bet.
–If the sum of the cross products of the coefficients of two linear contrasts is zero, those contrasts are orthogonal.
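The cross-product rule in the last bullet can be sketched as a simple check (the function name `are_orthogonal` is illustrative; equal ns are assumed):

```python
def are_orthogonal(c1, c2):
    """Two contrasts are orthogonal (for equal ns) when the sum of the
    cross products of their coefficients is zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < 1e-9

# Group 1 vs. group 2, and groups (1,2) vs. groups (3,4): orthogonal
print(are_orthogonal([1, -1, 0, 0], [1/2, 1/2, -1/2, -1/2]))   # True
# Group 1 vs. group 2, and group 1 vs. group 3: not orthogonal
print(are_orthogonal([1, -1, 0, 0], [1, 0, -1, 0]))            # False
```

With four groups, at most three mutually orthogonal contrasts of this kind can be constructed, matching df_bet = k – 1 = 3.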
Properties of Complex Comparisons
–The more closely your contrast matches the actual pattern of sample means, the larger the proportion of SS_bet that will be captured by SS_contrast.
–When the sample means do not fall in the pattern you expect, your F_contrast may be smaller than the ANOVA F.
–It is not acceptable to gain power by choosing a planned contrast unless there is also the possibility of losing power by making the wrong choice.
–Planned contrasts are analogous to one-tailed tests (in a two-group design): they are valid only when you predict the pattern of means before looking at your results.
–Post hoc complex comparisons are allowed only when the omnibus ANOVA is significant and Scheffé's test is used.