Psych 5500/6500 ANOVA: Single-Factor Independent Means Fall, 2008
ANOVA ANOVA is short for ‘Analysis of Variance’; it is also known as the F test. It is applicable in a variety of experimental designs that involve comparing group means to determine whether or not the independent variable had an effect. We will begin with the ‘single-factor independent means’ ANOVA.
‘Single-Factor’ This is similar to the ‘t test for independent means’. As in the t test there is one dependent variable and one independent variable (a ‘factor’ is an independent variable, thus ‘single-factor’ means one IV). But unlike in the t test, with the F test there can be 2 or more levels of the IV (i.e. two or more groups in the experiment).
Independent Means The ANOVA (F test) we will begin with assumes that the scores are independent across groups. In other words, this could be used in a true experimental design, a quasi-experimental design, or a static group design (just like the t test for independent means).
Example 1: True Experimental Design Does Vitamin C affect the length of people’s colds? Randomly divide subjects who are in their first day of a cold into 4 groups, then each group gets a different level of Vitamin C. Measure how long it takes each person to get over their cold (DV). IV= Group 1: 0 mg, Group 2: 100 mg, Group 3: 500 mg, Group 4: 5000 mg
Example 2: Quasi-Experimental Design Do three specific therapies differ in their ability to treat depression? Let subjects select the type of therapy they want (three different kinds are available), then measure their level of depression (DV) after 2 months of therapy (note the IV is manipulated by the experimenter). IV= Group 1: Behavior Modification, Group 2: Gestalt, Group 3: Client-Centered
Example 3: Static Group Design Does the size of a city affect the cancer rate in that city? Randomly select several small cities, several medium sized cities, and several large cities, and measure their cancer rate per 10,000 citizens (DV). Note the IV is not manipulated; instead it is the criterion for assigning cities to groups. IV= Group 1: Small Cities, Group 2: Medium Cities, Group 3: Large Cities
Relationship of t and F If you have two groups (i.e. two levels of your IV) you can use either a ‘t’ or ‘F’ test to analyze the results, and if you are testing a two-tailed hypothesis there is no difference between doing a t test or an F test (with two groups, F obt equals t obt squared and the p values are identical). The F test cannot test a directional (one-tailed) hypothesis. Thus, if you want to do a one-tailed test use t. The t test, however, cannot be used if you have more than two groups; in that case you must use F.
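To make the two-group equivalence concrete, here is a minimal Python sketch. The scores are made up for illustration (they are not from the lecture), and the helper names `pooled_t` and `anova_f` are my own. It computes the pooled-variance t and the single-factor F by hand and shows that t squared matches F.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def pooled_t(g1, g2):
    """t for independent means, using a pooled variance estimate."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean(g1), mean(g2)
    ss1 = sum((y - m1) ** 2 for y in g1)
    ss2 = sum((y - m2) ** 2 for y in g2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / std_error

def anova_f(groups):
    """F = MS Between / MS Within for a list of groups of scores."""
    scores = [y for g in groups for y in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for a two-group experiment
group1 = [4, 6, 5, 7]
group2 = [8, 9, 7, 10]

t_obt = pooled_t(group1, group2)
f_obt = anova_f([group1, group2])
print(t_obt ** 2, f_obt)  # the two values agree up to rounding error
```

Note that squaring t throws away its sign, which is one way to see why F cannot test a directional hypothesis.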
Hypotheses If you have three levels of your IV then: H0: $\mu_1 = \mu_2 = \mu_3$ (one μ for each group). You are saying that all the populations in the experiment have the same mean, and that any differences in the group sample means are just due to chance. So what is HA? HA: $\mu_1 \neq \mu_2 \neq \mu_3$ ?
H0 and HA No, HA: $\mu_1 \neq \mu_2 \neq \mu_3$ doesn’t work, as H0 and HA together must cover every possibility (e.g. what about $\mu_1 = \mu_2 \neq \mu_3$ ?). So, the correct answer is: H0: $\mu_1 = \mu_2 = \mu_3$ HA: at least one μ is different than the rest
Test Statistic We need a statistic whose value we know if H0 is true. With the t test for independent groups the way we tested whether $\mu_1 = \mu_2$ was by using: $t = \frac{\bar{Y}_1 - \bar{Y}_2}{s_{\bar{Y}_1 - \bar{Y}_2}}$ If H0 is true then we expect $t \approx 0$. But what if we have three or more groups? If H0 is $\mu_1 = \mu_2 = \mu_3$ what would we expect if H0 is true?
We need a statistic that will measure whether several group means are about the same (H0 true and the means differ only due to chance) or whether they differ more than you would expect if only chance were involved (i.e. if the independent variable made the populations, and thus the group means, more different than you would expect if only random error were involved). What statistic do we know that measures how much a bunch of numbers (in this case group means, but that doesn’t matter) differ from each other?
Analysis of Variance The essence of the F test for the single-factor independent groups ANOVA is that it examines the variance of the group means to determine whether the group means differ more than you would expect if H0 were true. The logic of how we will do that is based upon ‘partitioning the Sums of Squares’.
Setup We will begin with a simple experiment with three groups, and three scores in each group.
Group 1   Group 2   Group 3
Y1        Y4        Y7
Y2        Y5        Y8
Y3        Y6        Y9
Symbols
Group 1   Group 2   Group 3
Y1        Y4        Y7
Y2        Y5        Y8
Y3        Y6        Y9
N1 = 3    N2 = 3    N3 = 3
Partitioning the Deviation We begin by looking at how far each score is from the mean of all of the scores: $Y - \bar{Y}_{Total}$. Then we break (partition) that distance into two pieces, how far the score is from the mean of its group, and how far the mean of the group is from the total mean: $(Y - \bar{Y}_{Total}) = (Y - \bar{Y}_{Group}) + (\bar{Y}_{Group} - \bar{Y}_{Total})$
Partitioning the SS Now we use those deviations to create three sums of squares. SS Total = SS WithinGroups + SS BetweenGroups SS Total measures the squared deviations of the scores from the mean of all of the scores. SS Within measures the squared deviations of the scores from the mean of the group they are in. SS Between measures the squared deviations of the group means from the mean of all of the scores.
SS’s Sums of squares are a way of measuring variability. Consequently: SS Total reflects how much all of the scores differ from each other (if all the scores were the same they would all equal the total mean and the squared distances would all be zero). SS Within reflects how much the scores differ from other scores in the same group (if all the scores in a group are the same they would all equal the mean of their group and the squared distances would all be zero). SS Between reflects how much the group means differ from each other (if all of the group means were the same they would all equal the total mean and the squared distances would all be zero).
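The partition can be checked numerically. The following Python sketch uses made-up scores (three groups of three, not the handout's data) and assumes SS Between is summed over every score, so each group's squared deviation is counted once per score in that group.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical scores: three groups, three scores in each
groups = [[3, 4, 5], [6, 7, 8], [7, 9, 11]]

scores = [y for g in groups for y in g]
grand_mean = mean(scores)

# Each score's squared distance from the mean of all of the scores
ss_total = sum((y - grand_mean) ** 2 for y in scores)

# Each score's squared distance from the mean of its own group
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

# Each group mean's squared distance from the total mean, once per score
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

print(ss_total, ss_within + ss_between)  # the two values agree up to rounding error
```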
Example… Refer to the handout on partitioning SS.
Partitioning the df The total df for the experiment would be the total number of scores – 1. We are also going to partition that. df Total = df Within + df Between
What are d.f.s? (Discuss in class)…
Mean Squares A mean square is a Sum of Squares divided by its degrees of freedom.
What is a Mean Square? MS Total is not normally computed, as we won’t be needing it, but to make a conceptual point let’s look at it: $MS_{Total} = \frac{SS_{Total}}{df_{Total}} = \frac{\sum (Y - \bar{Y}_{Total})^2}{N - 1}$ Does that look familiar?
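As a rough illustration in Python (made-up scores again), the sketch below partitions the df and shows that MS Total is the ordinary variance estimate of all the scores taken together. The formulas df Between = number of groups - 1 and df Within = N - number of groups are the standard ones and are assumed here.

```python
import statistics

# Hypothetical scores: three groups, three scores in each
groups = [[3, 4, 5], [6, 7, 8], [7, 9, 11]]
scores = [y for g in groups for y in g]

n_total = len(scores)
k = len(groups)

df_total = n_total - 1
df_between = k - 1        # assumed: number of groups minus 1
df_within = n_total - k   # assumed: total N minus number of groups

grand_mean = sum(scores) / n_total
ss_total = sum((y - grand_mean) ** 2 for y in scores)
ms_total = ss_total / df_total

# statistics.variance divides by N - 1, so it should match MS Total
print(ms_total, statistics.variance(scores))
```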
Error Variance The term error variance refers to the variance of the population from which the scores were originally sampled. The use of the term ‘error’ will be clearer next semester; it refers to the error of using the mean to predict each score. For now just think of error variance as the variance of the population from which we sampled. The ANOVA assumes that each population in the study has the same variance.
Mean Square Within Groups MS Within is an estimate of error variance based upon how much the scores differ inside of each group. Essentially, it uses each group to estimate error variance, then pools those different estimates into one good estimate. If the N’s of each group are the same then MS Within is literally the mean of the variance estimates from each group.
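A small Python sketch of that last point, using made-up scores with equal N’s: pooling the within-group sums of squares gives the same number as averaging the separate variance estimates from each group.

```python
import statistics

# Hypothetical scores: three groups with equal N's
groups = [[3, 4, 5], [6, 7, 8], [7, 9, 11]]

n_total = sum(len(g) for g in groups)
df_within = n_total - len(groups)

ss_within = sum(
    (y - statistics.mean(g)) ** 2 for g in groups for y in g
)
ms_within = ss_within / df_within

# Separate variance estimate (N - 1 denominator) from each group
group_variances = [statistics.variance(g) for g in groups]
mean_of_variances = statistics.mean(group_variances)

print(ms_within, mean_of_variances)
```

With unequal N’s the two numbers would generally differ, because the pooled estimate weights each group by its df.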
Mean Square Between Groups MS Between is an estimate of error variance based upon how much the group means differ from each other. Remember that the variance of the population affects the variance of the sample means (the standard error); well it also works the other way, the variance of the sample means tells us something about the variance of the population from which those means were drawn.
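To see how the group means carry information about error variance, here is a hedged Python sketch with made-up scores and equal N’s. It assumes the usual standard-error relationship (variance of sample means = population variance / N), so N times the variance of the group means is an estimate of error variance, and with equal N’s it matches MS Between.

```python
import statistics

# Hypothetical scores: three groups with equal N's
groups = [[3, 4, 5], [6, 7, 8], [7, 9, 11]]
n_per_group = len(groups[0])

scores = [y for g in groups for y in g]
grand_mean = statistics.mean(scores)
group_means = [statistics.mean(g) for g in groups]

ss_between = sum(
    len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
)
ms_between = ss_between / (len(groups) - 1)

# Variance of the sample means, scaled back up to a population variance
estimate_from_means = n_per_group * statistics.variance(group_means)

print(ms_between, estimate_from_means)
```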
F MS Between and MS Within are two independent estimates of the same thing: error variance. F is their ratio: $F = \frac{MS_{Between}}{MS_{Within}}$
Logic of the ANOVA When H0 is true: MS Between and MS Within are two independent estimates of error variance. When H0 is false: the independent variable makes the group means differ more than they would if only chance were involved, which affects MS Between, making it larger. The independent variable, however, does not affect the variance inside of each group, thus MS Within is not affected.
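The logic above can be explored with a small simulation. This is only a sketch under stated assumptions: normally distributed populations with a common standard deviation of 1, ten scores per group, and population means that I made up for the "H0 false" case. The function names are hypothetical.

```python
import random

def f_ratio(groups):
    """F = MS Between / MS Within for a list of groups of scores."""
    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    )
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(scores) - len(groups))
    return ms_between / ms_within

def average_f(population_means, n_per_group=10, sd=1.0, reps=2000, seed=0):
    """Average F over many simulated experiments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [
            [rng.gauss(mu, sd) for _ in range(n_per_group)]
            for mu in population_means
        ]
        total += f_ratio(groups)
    return total / reps

f_when_h0_true = average_f([0.0, 0.0, 0.0])    # all population means equal
f_when_h0_false = average_f([0.0, 0.5, 1.0])   # the IV shifts the means
print(f_when_h0_true, f_when_h0_false)
```

When H0 is true the average F should land near 1, since both mean squares estimate the same error variance; when H0 is false only MS Between is inflated, so the average F is noticeably larger.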
Example IV=type of therapy (control group vs. behavior modification vs. psychoanalysis vs. client-centered vs. gestalt) DV=level of depression after 2 months H0: $\mu_C = \mu_{BM} = \mu_{PA} = \mu_{CC} = \mu_G$ HA: at least one μ is different than the rest.
Data Columns: Control | Beh. Mod. | Psychoanalysis | Client-Centered | Gestalt (three scores per group; the scores themselves are not preserved)
Bar Graph
Computations
SS Total = 54.93, SS Within = 15.33, SS Between = 39.60
df Total = 14, df Within = 10, df Between = 4
MS Within = 1.53, MS Between = 9.90
F obt = 6.46
F c (α = .05, df1 = 4, df2 = 10) = 3.48
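The arithmetic on this slide can be reproduced from the sums of squares alone. A short Python sketch follows; the SS values and the critical value are the ones given above, everything else follows from the definitions.

```python
# Values from the slide
ss_within = 15.33
ss_between = 39.60
n_total = 15      # 5 groups x 3 scores
k = 5             # number of groups
f_crit = 3.48     # critical F for alpha = .05, df1 = 4, df2 = 10

df_between = k - 1
df_within = n_total - k
df_total = n_total - 1

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_obt = ms_between / ms_within

reject_h0 = f_obt > f_crit
print(round(ms_between, 2), round(ms_within, 2), round(f_obt, 2), reject_h0)
```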
We reject H0, because F obt (6.46) is greater than F c (3.48).
p Values & Expressing Results The exact p value can be obtained either by performing the analysis using SPSS or by using my F tool and inputting the df’s and the value of F obt. In this example p=.0078. The way the results are commonly expressed is F(df1, df2)=F obt, p=... In our example it would be: F(4,10)=6.46, p=.0078
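If neither tool is at hand, the p value for this example can also be checked by hand in Python. The sketch below is a shortcut that only works when both df are even (as they are here); it relies on the standard identity that the upper-tail area of F is a regularized incomplete beta function, which reduces to a binomial sum for whole-number parameters. The function name `f_p_value_even_df` is my own, not part of SPSS or any library.

```python
from math import comb

def f_p_value_even_df(f_obt, df1, df2):
    """Upper-tail p value of F; valid only when df1 and df2 are both even."""
    if df1 % 2 or df2 % 2:
        raise ValueError("this shortcut needs even df1 and df2")
    a, b = df2 // 2, df1 // 2
    x = df2 / (df2 + df1 * f_obt)
    n = a + b - 1
    # Regularized incomplete beta I_x(a, b) written as a binomial tail sum
    return sum(
        comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1)
    )

p = f_p_value_even_df(6.46, 4, 10)
print(f"F(4,10)=6.46, p={p:.4f}")
```

For odd df this shortcut does not apply, and a statistics package is the practical route.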
Summary Table Another common way of expressing the results of the analysis is in a ‘Summary Table’.
Source    SS      df   MS     F      p
Between   39.60    4   9.90   6.46   .0078
Within    15.33   10   1.53
Total     54.93   14
Decision H0: $\mu_C = \mu_{BM} = \mu_{PA} = \mu_{CC} = \mu_G$ HA: at least one μ is different than the rest. We have rejected H0, which means that we can conclude that at least one of the population means is different than the rest. It is tempting to say, for example, that the control group (which had a mean level of depression of 8) was more depressed than the Gestalt group (which had a mean level of depression of 3), but we cannot be that specific; we can only say that at least one group was different than the rest. We will learn in a future lecture how to make more specific tests among the group means.
Effect Size Cohen’s d is not capable of expressing an overall effect due to the independent variable when there are more than two groups, because a d built from more than two means cannot be expected to equal zero when H0 is true (i.e. when the independent variable has no effect).
Effect Size (cont.) For the overall effect of the independent variable we will have to turn to measures of association, which examine how much knowing what group a score is in helps us in predicting that score on the dependent variable. We will be covering that next semester. The measure we will be looking at then is called R², and to get a general idea of how it works...
Two example data sets for three groups (scores not preserved): one where R²=0 (knowing which group the score is in doesn’t help at all), and one where R²=1.00 (knowing which group the score is in allows us to know exactly what the score will be). You can see that R² will always be between 0 and 1.
Computing R² In our example: $R^2 = \frac{SS_{Between}}{SS_{Total}} = \frac{39.60}{54.93} = .72$
Cohen’s f GPower uses Cohen’s f to express effect size. While R² will be between 0 and 1, f expands that out to be between 0 and infinity: $f = \sqrt{\frac{R^2}{1 - R^2}}$ The conventions for relating f to effect size are: .10 = small effect, .25 = medium effect, .40 = large effect.
Our Example A whopping big effect size (because we are in Oakley land rather than using real data).
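For reference, here is how the effect size for the therapy example could be computed in Python. This is a sketch that assumes the standard definitions R² = SS Between / SS Total and f = sqrt(R² / (1 - R²)); the SS values are the ones from the Computations slide.

```python
from math import sqrt

# Sums of squares from the therapy example
ss_between = 39.60
ss_total = 54.93

# Assumed definition: proportion of total variability tied to group membership
r_squared = ss_between / ss_total

# Assumed definition of Cohen's f in terms of R squared
cohens_f = sqrt(r_squared / (1 - r_squared))

print(round(r_squared, 2), round(cohens_f, 2))
```

Compared with the .40 convention for a large effect, an f this size is indeed very large.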
GPower and Cohen’s f GPower will compute f for you if you give it the N’s of each group, the means of each group, and the standard deviation (the one you assume each group has in common), but the equation on the previous slide is much simpler. With this information you can then compute the power a priori and post hoc as you did with the t test. In our example power=0.98 (ridiculously large for 3 scores per group, due to the big effect of the IV and the small amount of within-group variance, a byproduct of my making up the data).
Assumptions of This Use of the F test
1. Independence of scores (important).
2. All the populations are normally distributed (the F test is ‘robust’ to violations of this assumption, particularly if N’s are large and roughly equal across groups).
3. Homogeneity of variance (the F test is robust to violations of this assumption if you have roughly equal N’s across the groups). Levene’s test will evaluate this.
Homogeneity of Variance If N’s are not equal, then the effects of violating this assumption are:
1. If larger sample size is associated with larger variances then the actual alpha decreases (biased towards not making a type 1 error but at the expense of power).
2. If larger sample size is associated with smaller variances then the actual alpha increases (biased towards making a type 1 error). If this is the case then either select a smaller significance level (e.g. .01 rather than .05) or ‘transform’ your data.
Levene’s Test Levene’s test for the inequality of variances can be used to test whether a difference exists somewhere among the population variances. In our example with five types of therapy the hypotheses for Levene’s test would be: H0: $\sigma^2_1 = \sigma^2_2 = \sigma^2_3 = \sigma^2_4 = \sigma^2_5$ Ha: at least one σ² is different than the rest.
Now that we know how ANOVA works it is easy to describe how Levene’s test works. Let’s begin with a simple study with just two groups.
        Group 1   Group 2
Mean    10        30
S²      0.5       10
The two groups have very different variances (0.5 vs. 10). We want to test whether it is reasonable to conclude that the populations these groups came from have different variances (our assumption is about populations). H0: $\sigma^2_1 = \sigma^2_2$ Ha: $\sigma^2_1 \neq \sigma^2_2$
The first thing we do is to transform the original scores to deviation scores that reflect how far each score was from the mean of its group (Group 1 mean = 10, Group 2 mean = 30). Then we take the absolute values of those deviations. (Table of original scores and their absolute deviations from the group mean; values not preserved.)
We have changed the data to being a measure of how much each score differed from its group mean. In other words, each score is now a measure of variability. We can see in the absolute deviations that the original scores in group 1 did not vary much from their group mean (and thus didn’t vary much from each other).
Levene’s test simply performs an ANOVA (with only 2 groups you could use a t test) to see if the mean of the absolute deviations differs significantly between the two groups, which tells you whether the variances of the original scores differ significantly. H0: $\mu_1 = \mu_2$ Ha: $\mu_1 \neq \mu_2$ Mean absolute deviation from the group mean: Group 1 = .5, Group 2 = 3.
Result of the ANOVA on the absolute deviation scores (Group 1 mean = .5, Group 2 mean = 3): F(1,6)=15.00, p=.008, so we can conclude that the difference in variances among the groups (original scores) was statistically significant.
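The whole procedure fits in a few lines of Python. The original scores from the slides are not available, so the two groups below are hypothetical scores that I chose to be consistent with the summary values shown (means of 10 and 30, S² of 0.5 and 10 when dividing by N, mean absolute deviations of .5 and 3). Treat them as an assumption, not as the lecture's data.

```python
def mean(xs):
    return sum(xs) / len(xs)

def anova_f(groups):
    """F = MS Between / MS Within for a list of groups of scores."""
    scores = [y for g in groups for y in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def abs_deviations(group):
    """How far each score is from the mean of its own group."""
    m = mean(group)
    return [abs(y - m) for y in group]

# Hypothetical original scores (assumed, chosen to match the slide summaries)
group1 = [9, 10, 10, 11]     # mean 10
group2 = [26, 28, 32, 34]    # mean 30

# Step 1: replace each score with its absolute deviation from the group mean
deviations = [abs_deviations(g) for g in (group1, group2)]

# Step 2: run an ordinary single-factor ANOVA on those deviations
levene_f = anova_f(deviations)
print(round(levene_f, 2))
```

With these assumed scores the F on the absolute deviations comes out at 15 on 1 and 6 df, in line with the result reported above.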
Our Example Mean absolute deviation from the group mean (original data and individual deviation scores not preserved):
Control   B. M.   P.A.    C.C.    Gestalt
M=.67     M=.89   M=.44   M=.67   M=1.33
For the |deviations|, F(4,10)=0.782, p=.562, so we do not reject the H0 of equal variances.
Levene’s: Considerations The previous example would actually have a problem in real life: Levene’s test is not accurate when there are very small N’s in each group. This problem becomes negligible when you have 10 or more scores per group. Also, you should know that since Levene’s procedure simply applies ANOVA to the absolute deviation scores, Levene’s test also assumes that the absolute deviation scores are normally distributed and that the groups have equal variances.
Levene’s: Assumptions Levene’s test is fairly robust to violations of the ANOVA assumptions. A study by Brown and Forsythe (1974), however, suggests that if the populations (of the original scores) are fat tailed you should use the ‘10 percent trimmed mean’ instead of the mean when finding the absolute deviations, and if the populations are skewed you should find the absolute deviation from the median rather than from the mean. To find the 10 percent trimmed mean, first chop off the 10% highest scores and the 10% lowest scores, then find the mean of what is left.
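The two alternative centers only change the first step of the procedure. Here is a minimal Python sketch of that step, using a made-up skewed group of ten scores. `trimmed_mean` and `abs_deviations` are helper names of my own, and the trimming rule (drop a whole number of scores from each end) is an assumption about how to handle group sizes that are not multiples of ten.

```python
import statistics

def trimmed_mean(scores, proportion=0.10):
    """Mean after dropping the lowest and highest `proportion` of scores."""
    ordered = sorted(scores)
    cut = int(len(ordered) * proportion)   # assumed: round down to whole scores
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

def abs_deviations(scores, center):
    """Absolute deviation of each score from the chosen center of its group."""
    c = center(scores)
    return [abs(y - c) for y in scores]

# Hypothetical skewed group with one extreme score
skewed_group = [2, 3, 3, 4, 4, 5, 5, 6, 9, 30]

from_mean = abs_deviations(skewed_group, statistics.mean)
from_median = abs_deviations(skewed_group, statistics.median)
from_trimmed = abs_deviations(skewed_group, trimmed_mean)

print(statistics.mean(skewed_group),
      statistics.median(skewed_group),
      trimmed_mean(skewed_group))
```

The deviation scores from each group would then go into an ordinary single-factor ANOVA, exactly as with the mean-based version.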
Levene’s: Assumptions (cont.) The problem with using either the 10% trimmed mean or the median is that SPSS will do Levene’s for you using the mean. If you want to use the 10 percent trimmed mean or the median you will have to do it yourself, in a similar fashion to what I did in demonstrating how Levene’s works: find the correct deviations and then do an ANOVA on them.
Brown, M. B. & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69,