Introduction to ANOVA Research Designs for ANOVAs Type I Error and Multiple Hypothesis Tests The Logic of ANOVA ANOVA vocabulary, notation, and formulas The F table Comparing pairs of means Relationship between ANOVA and t-tests
Research Designs Like the t-test, ANOVA is used when the independent variable in a study is categorical (e.g. treatment groups) and the dependent variable is continuous. A t-test is used when there are only two groups to compare. ANOVA is used when there are two or more groups
Research Designs Terminology In analysis of variance, an independent variable is called a factor. (e.g. treatment groups) The groups that make up the independent variable are called levels of that factor. (e.g. low dose, high dose, control) A research study that involves only one factor is called a single- factor design. A research study that involves more than one factor is called a factorial design (e.g. treatment group and gender)
Research Designs Suppose that a psychologist wants to examine learning performance under three temperature conditions: 50 o, 70 o, 90 o. Three samples of subjects are selected, one sample for each treatment condition. The subjects are taught some basic material and given a short test. Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X = 4X = 1 Factor: temperature Levels: 50, 70, 90
Research Designs Why don’t we just run three t-tests? compare 50 o sample and 70 o sample compare 50 o sample and 90 o sample compare 70 o sample and 90 o sample
Type I Error and Multiple Tests With an =.05, there is a 5 % risk of a Type I error. Thus, for every 20 hypothesis tests, you expect to make one Type I error. The more tests you do, the more risk there is of a Type I error. Testwise alpha level The alpha level you select for each individual hypothesis test. Experimentwise alpha level The total probability of a Type I error accumulated from all of the separate tests in the experiment. e.g. three t-tests, each at =.05 -> an experimentwise alpha of.14
The Logic of ANOVA ANOVA uses one test with one alpha level to evaluate all the mean differences, thereby avoiding the problem of an inflated experimentwise alpha level H o : 1 = 2 = 3 H A : at least one population mean is different
The Logic of ANOVA There is some variability among the treatment means Is it more than we might expect from chance alone? Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1
The Logic of ANOVA t statistic t = obtained difference between sample means difference expected by chance F statistic F = obtained variance between sample means variance expected by chance Random variation + effect
The Logic of ANOVA
Let’s talk about how to calculate this first The Logic of ANOVA (s b 2 ) (s w 2 ) F statistic F = obtained variance between sample means variance expected by chance F = Between-Group Variance Within-Group Variance
The Logic of ANOVA A summary of how much each datapoint varies from its own treatment mean Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 Within-Groups Variance
The Logic of ANOVA Within Groups variance is our estimate of random variation Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 -use the variance of the datapoints around their own treatment mean -Pooled variance!
The Logic of ANOVA Pooled variance This comes from the “within- treatment” variabilities Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 SS 1 = 6 SS 2 = 6 SS 3 = 4
The Logic of ANOVA Pooled variance we now call Within-Groups variance SS 1 = 6 SS 2 = 6 SS 3 = 4 Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1
The Logic of ANOVA A summary of how much each datapoint varies from its own treatment mean Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 Within-Groups Variance
Now let’s calculate this The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 F statistic F = Between-Groups Variance Within-Groups Variance (s b 2 ) (s w 2 )
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 Between-Group Variance A summary of how much each treatment mean varies from the grand mean
The Logic of ANOVA So what is the variance between our treatment means? This is the “between-treatment” variability Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 We need to calculate SS between
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 First look at the deviation between each treatment mean and the “grand mean”.
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 We need to square those deviations
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 And weight them by the sample size of each treatment group
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 Now add them up to get SS between SS between = 30
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 Between-Groups Variance But what is the df b ? SS between = 30
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 SS between = 30 Between-Groups Variance
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 Between-Group Variance A summary of how much each treatment mean varies from the grand mean
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 So what is our test statistic? F statistic F = Between-Groups Variance Within-Groups Variance (s b 2 ) (s w 2 )
The Logic of ANOVA Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 What are its degrees of freedom? There are two! One for the numerator (df between ) One for the denominator (df within )
The Logic of ANOVA F statistic F = obtained variance between sample means variance expected by chance F = between-treatment variance within-treatment variance F = s b 2 s w 2 These are the two dfs for the F test
Step by Step Step 1: Calculate each treatment mean and the grand mean Step 2: Calculate the SS w, the df w, and the s w 2 Step 3: Calculate the SS b, the df b, and the s b 2 Step 4: Calculate F Step 5: Compare to critical values of F (using dfs and the F table)
ANOVA table SourceSSdfs2s2 F Between treatments Within treatments Total
Computations Step 1: Calculate each treatment mean and the grand mean Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1
Computations SS 1 = 6 SS 2 = 6 SS 3 = 4 Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 Step 2: Calculate the SS w, the df w, and the s w 2
Computations Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1 Step 3: Calculate the SS b, the df b, and the s b 2
Computations Step 4: Calculate F Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1 X 3 = 1
ANOVA table SourceSSdfs2s2 F Between treatments Within treatments Total
ANOVA table SourceSSdfs2s2 F Between treatments Within treatments Total
ANOVA table SourceSSdfs2s2 F Between treatments Within treatments Total4614 Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4 X 1 = 1 X 3 = 1 SS total = 46 SS between = 30SS within = 16
“Partitioning of Variance” Total = Between + Within
“Partitioning of Variance” Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o X 2 = 4X 1 = 1X 3 = 1 6 = 2 + (2) + (2)
“Partitioning of Variance” Treatment 1 50 o Treatment 2 70 o Treatment 3 90 o 0 = 2 + (-1) + (-1) 1= 2 + (-1) + (0) 3= 2 + (-1) + (2) 1= 2 + (-1) + (0) 0= 2 + (-1) + (-1) 4 = 2 + (2) + (0) 3 = 2 + (2) + (-1) 6 = 2 + (2) + (2) 3 = 2 + (2) + (1) 4 = 2 + (2) + (0) 1 = 2 + (-1) + (0) 2 = 2 + (-1) + (1) 2 = 2 + (-1) + (1) 0 = 2 + (-1) + (-1) 0 = 2 + (-1) + (-1) X 2 = 4X 1 = 1X 3 = 1
ANOVA formulas SourceSSdfs2s2 F Between treatments Within treatments Total (MS)
The F Distribution F statistics Because F-ratios are computed from two variances, F values will always be positive numbers. When H o is true, the numerator and denominator of the F- ratio are measuring the same variance. In this case the two sample variances should be about the same size, so the ratio should be near 1. In other words, the distribution of F-ratios should pile up around 1.00.
The F Table F Distribution –Positively skewed distribution: Concentration of values near 1, no values smaller than 0 are possible –Like t, F is a family of curves. Need to use two degrees of freedom (for s b 2 and s w 2 ).
The F Table For an alpha of.05, the critical value of F with degrees of freedom 2,12 is = 3.88 Our F = reject null
1. The samples are independent simple random samples 2. The populations are normal 3. The populations have equal variance Assumptions
Exercise The data depicted below were obtained from an experiment designed to measure the effectiveness of three pain relievers (A, B, and C). A fourth group that received a placebo was also tested. PlaceboDrug ADrug B Drug C Is there any evidence for a significant difference between groups?
SourceSSdfMS Between treatments Within treatments Total Exercise PlaceboDrug ADrug B Drug C
SourceSSdfMS Between treatments 54318F = 9 Within treatments 1682 Total7011 Exercise PlaceboDrug ADrug B Drug C F 3,8 * = 4.07, so reject null
Comparisons of Specific Means The technique to use depends on the nature of the research 1)Run an ANOVA 2)If ANOVA is significant, run post-hoc “multiple comparison” tests (e.g. LSD, Bonferroni, Tukey’s) No a priori Predictions If we do not have strong predictions about how the means will differ: 1)Run contrast tests on the expected differences Yes a priori Predictions If we do have strong predictions about how the means will differ: conservative!
Comparing F and t PlaceboDrug C What is the relationship between F and t? 1.Compute an independent-samples t statistic for this data 2.Compute an F statistic for this data t = F = 12.53
Comparing F and t What is the relationship between F and t? F = t 2