1 ANALYSIS OF VARIANCE (ANOVA) Heibatollah Baghi and Mastee Badii
2 Purpose of ANOVA Use one-way Analysis of Variance to test whether the mean of a variable (the dependent variable) differs among three or more groups –For example, compare whether systolic blood pressure differs among a control group and two treatment groups
3 Purpose of ANOVA One-way ANOVA compares three or more groups defined by a single factor. –For example, you might compare a control with a drug treatment and with a drug treatment plus antagonist. Or you might compare a control with five different treatments. Some experiments involve more than one factor. These data need to be analyzed by two-way ANOVA or factorial ANOVA. –For example, you might compare the effects of three different drugs administered at two times. There are two factors in that experiment: drug treatment and time.
4 Why not do repeated t-tests? Rather than using one-way ANOVA, you might be tempted to use a series of t-tests, comparing two groups each time. Don't do it. Repeated t-tests increase the chance of a type I error (the multiple comparison problem). If you are comparing 5 groups, you will need 10 comparisons of means. When the null hypothesis is true, the probability that at least 1 of the 10 observed significance levels is less than 0.05 is about 0.29.
5 Why not do repeated t-tests? With 10 means (45 comparisons), the probability of finding at least one significant difference is about 0.63. In other words, when the level of significance is .05, there is a 1 in 20 chance that a single t-test will yield a significant result even when the null hypothesis is true. The more t-tests you run, the more that probability increases.
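The comparison counts quoted above (10 for 5 groups, 45 for 10 groups) can be checked with a short Python sketch. The function names are hypothetical. The second function assumes the tests are independent, which pairwise t-tests on shared groups are not, so it gives a rough figure only: it yields about 0.40 and 0.90, noticeably higher than the 0.29 and 0.63 quoted on the slides, presumably because the slide figures account for that dependence.

```python
from math import comb


def pairwise_comparisons(k):
    """Number of distinct pairs of means among k groups."""
    return comb(k, 2)


def familywise_error_if_independent(m, alpha=0.05):
    """Chance of at least one false positive in m tests.

    Assumption (not from the slides): the m tests are independent.
    """
    return 1 - (1 - alpha) ** m


for k in (5, 10):
    m = pairwise_comparisons(k)
    rate = familywise_error_if_independent(m)
    print(f"{k} groups: {m} comparisons, independent-test error rate {rate:.2f}")
```

Either way the message is the same: the error rate climbs quickly as the number of pairwise tests grows.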
6 What Does ANOVA Do? ANOVA involves the partitioning of the variance of the dependent variable into different components: –A. Between Group Variability –B. Within Group Variability More specifically, the Analysis of Variance is a method for partitioning the Total Sum of Squares into two additive and independent parts.
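7 Definition of Total Sum of Squares or Variance
Case   Group 1   Group 2   …   Group p
1      X_11      X_21      …   X_p1
2      X_12      X_22      …   X_p2
3      X_13      X_23      …   X_p3
…      …         …         …   …
n      X_1n      X_2n      …   X_pn
$SS_T = \sum_{j=1}^{p} \sum_{i=1}^{n} (X_{ji} - \bar{X}_{..})^2$, summed across all n times p observations, where $X_{ji}$ is case i in group j and $\bar{X}_{..}$ is the grand average.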
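8 Definition of Between Sum of Squares
Case   Group 1   Group 2   …   Group p
1      X_11      X_21      …   X_p1
2      X_12      X_22      …   X_p2
3      X_13      X_23      …   X_p3
…      …         …         …   …
n      X_1n      X_2n      …   X_pn
$SS_B = \sum_{j=1}^{p} \sum_{i=1}^{n} (\bar{X}_{.j} - \bar{X}_{..})^2 = n \sum_{j=1}^{p} (\bar{X}_{.j} - \bar{X}_{..})^2$, where $\bar{X}_{.j}$ is the average of group j and $\bar{X}_{..}$ is the grand average. The sum of squared differences of group means from the grand mean, taken over every observation, is SS B.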
9 Definition of Within Sum of Squares
Case   Group 1   Group 2   …   Group p
1      X_11      X_21      …   X_p1
2      X_12      X_22      …   X_p2
3      X_13      X_23      …   X_p3
…      …         …         …   …
n      X_1n      X_2n      …   X_pn
$SS_W = \sum_{j=1}^{p} \sum_{i=1}^{n} (X_{ji} - \bar{X}_{.j})^2$, the sum of squared differences of the observations $X_{ji}$ from their group means $\bar{X}_{.j}$.
10 Partitioning of Variance into Different Components Total sum of squares = Between groups sum of squares + Within groups sum of squares
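Why the two parts add up can be sketched in a few lines of algebra. The notation is a reconstruction, not taken verbatim from the slides: $X_{ji}$ is case i in group j, $\bar{X}_{.j}$ is the mean of group j, $\bar{X}_{..}$ is the grand mean, and each of the p groups is assumed to have n cases.

```latex
\begin{align*}
X_{ji} - \bar{X}_{..}
  &= (X_{ji} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X}_{..}) \\
\sum_{j=1}^{p}\sum_{i=1}^{n} (X_{ji} - \bar{X}_{..})^2
  &= \sum_{j=1}^{p}\sum_{i=1}^{n} (X_{ji} - \bar{X}_{.j})^2
   + \sum_{j=1}^{p}\sum_{i=1}^{n} (\bar{X}_{.j} - \bar{X}_{..})^2 \\
  &\quad + 2\sum_{j=1}^{p} (\bar{X}_{.j} - \bar{X}_{..})
          \sum_{i=1}^{n} (X_{ji} - \bar{X}_{.j}) \\
  &= SS_W + SS_B + 0
\end{align*}
```

The cross term vanishes because the deviations of the observations from their own group mean sum to zero within every group.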
11 Test Statistic in ANOVA Test statistic for ANOVA is based on between & within groups SS
12 Test Statistic in ANOVA F = Between group variability / Within group variability –The source of within group variability is individual differences, that is, sampling error across the cases. –The source of between group variability is the effect of the independent (grouping) variable.
13 Steps in Test of Hypothesis 1. Determine the appropriate test 2. Establish the level of significance: α 3. Determine whether to use a one-tailed or two-tailed test 4. Calculate the test statistic 5. Determine the degrees of freedom 6. Compare the computed test statistic against a tabled/critical value (Same as before)
14 1. Determine the Appropriate Test Independent random samples have been taken from each population. The dependent variable is normally distributed in each population (ANOVA is robust with regard to this assumption). Population variances are equal (ANOVA is robust with regard to this assumption). Subjects in each group have been independently sampled.
15 2. Establish Level of Significance α is a predetermined value. By convention: α = .05, α = .01, or α = .001
16 3. Use a Two Tailed Test H 0 : µ1 = µ2 = µ3 = µ4 Where µ1 = population mean for group 1, µ2 = population mean for group 2, µ3 = population mean for group 3, µ4 = population mean for group 4. H a : not H 0
17 3. Use a Two Tailed Test H a : not H 0 The alternative hypothesis does not specify whether –µ1 ≠ µ2 or –µ2 ≠ µ3 or –µ1 ≠ µ3
18 4. Calculating Test Statistics F = (SS B / df B ) / (SS w / df w ), where SS B = sum of squares between, df B = degrees of freedom between, SS w = sum of squares within, and df w = degrees of freedom within
19 4. Calculating Test Statistics By dividing the sum of the squared deviations by the degrees of freedom, we are essentially computing an "average" (or mean) amount of variation. The specific name for the numerator of the F statistic is the mean square between (the average amount of between-group variation). The specific name for the denominator of the F statistic is the mean square within (the average amount of within-group variation).
20 5. Determine Degrees of Freedom Degrees of freedom between –df B = k – 1 –k = number of groups Degrees of freedom within –df w = N – k –N = total number of subjects in the study
21 6. Compare the Computed Test Statistic Against a Tabled Value α = .05 If F c > F α , reject H 0 . If F c ≤ F α , cannot reject H 0 .
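Steps 4 to 6 can be sketched in Python as follows. The data are hypothetical and chosen only to give round numbers; they are not the slides' data. The function name `one_way_anova` is also hypothetical, and the critical value 5.14 is the tabled F for α = .05 with 2 and 6 degrees of freedom, hard-coded here instead of being computed.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k = len(groups)              # number of groups
    n_total = len(all_values)    # total number of subjects

    # Between: squared distance of each group mean from the grand mean,
    # counted once per case in that group.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within: squared distance of each case from its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical data: three groups of three cases each.
groups = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
result = one_way_anova(groups)

F_CRITICAL = 5.14  # tabled F, alpha = .05, df = (2, 6)
reject_null = result["F"] > F_CRITICAL
print(result)
print("Reject H0" if reject_null else "Cannot reject H0")
```

For these made-up numbers the between and within sums of squares are 24 and 6, which add up to the total of 30, and F = 12 exceeds the tabled value.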
22 Example Suppose we had patients with myocardial infarction in the following groups: –Group 1: A music therapy group –Group 2: A relaxation therapy group –Group 3: A control group 15 patients are randomly assigned to the 3 groups and then their stress levels are measured to determine if the interventions were effective in minimizing stress.
23 Example Dependent Variable – The stress scores. Scores range from zero (no stress) to 10 (extreme stress). Independent Variable or Factor – Treatment Conditions (3 levels)
24 Observations
25 Sum of Squares for Each Group Group 1: SS 1 = 20, n1 = 5. Group 2: SS 2 = 10, n2 = 5. Group 3: SS 3 = 16, n3 = 5.
26 SS Within SS Within = SS 1 + SS 2 + SS 3 = 20 + 10 + 16 = 46
27 SS Between SS Between = sum over Groups 1, 2, and 3 of (number of cases in the group) × (group average – grand average)². For these data, SS Between = 70.
28 Sum of Squares Total SS Total = 116
29 Components of Variance SS Total = SS Between + SS Within 116 = 70 + 46
30 Degrees of Freedom df B = k – 1 = 3 – 1 = 2 df w = N – k = 15 – 3 = 12
31 Test Statistic MS Between = 70 / 2 = 35 MS Within = 46 / 12 = 3.83 F c = MS Between / MS Within F c = 35 / 3.83 = 9.13
32 Lookup Critical Value F α = 3.88
33 Conclusions F c = 9.13 > F α = 3.88 F c > F α Therefore Reject H 0
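The arithmetic of this example can be reproduced from the sums of squares alone. The sketch below is a hedged illustration, not the authors' procedure: the function name is hypothetical, and it relies on a closed form for the F distribution that holds only when the numerator has 2 degrees of freedom (that is, exactly three groups), where P(F > f) = (1 + 2f / df w)^(–df w / 2).

```python
def f_test_three_groups(ss_between, ss_within, n_total, alpha=0.05):
    """F test from sums of squares, for exactly three groups."""
    k = 3
    df_between = k - 1          # 2 numerator degrees of freedom
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within

    # Closed forms valid only for 2 numerator degrees of freedom.
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    f_critical = (df_within / 2) * (alpha ** (-2 / df_within) - 1)

    return {
        "ms_between": ms_between, "ms_within": ms_within,
        "F": f_stat, "F_critical": f_critical, "p_value": p_value,
        "reject_null": f_stat > f_critical,
    }


# Values from the stress example: SS Between = 70, SS Within = 46, N = 15.
result = f_test_three_groups(70, 46, 15)
for name, value in result.items():
    print(name, value)
```

With the slide values this gives F of about 9.13 and a critical value of about 3.885, which the slide's table lists as 3.88, so the conclusion to reject the null hypothesis is unchanged.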
34 One-way ANOVA Summary
Source    SS    DF   MS     Fc     Fα
Between   70    2    35     9.13   3.88
Within    46    12   3.83
Total     116   14
35 Multiple Comparison Groups The F test does not tell which pairs of means are not equal. Additional analysis is necessary to determine which pairs differ.
36 Fisher's LSD Test These are the null and alternative hypotheses being tested –H o1 : µ1 = µ2, H a1 : µ1 ≠ µ2 –H o2 : µ1 = µ3, H a2 : µ1 ≠ µ3 –H o3 : µ2 = µ3, H a3 : µ2 ≠ µ3
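37 Fisher's LSD Test Known as the protected t-test. LSD is the least difference between means needed for significance. df = N – k. Use the following formula: $LSD = t_{\alpha/2,\,N-k} \sqrt{MS_W \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$, where $MS_W$ is the mean square within and $n_i$, $n_j$ are the sizes of the two groups being compared.
38 Calculation of LSD With the tabled two-tailed t value of 2.179 (α = .05, df = 12), MS Within = 3.83, and 5 cases per group: LSD = 2.179 × √(3.83 × (1/5 + 1/5)) ≈ 2.70. All pairs of means differing by at least 2.70 points on the stress scale would be significantly different from one another.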
39 Application to Three Samples Mean 1 – Mean 2 = 1 Mean 3 – Mean 1 = 4 Mean 3 – Mean 2 = 5 Null hypotheses: H o1 : µ1 = µ2 Not Rejected (1 < 2.70) H o2 : µ1 = µ3 Rejected (4 > 2.70) H o3 : µ2 = µ3 Rejected (5 > 2.70)
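The LSD comparison can be sketched in Python. This assumes the usual form of Fisher's LSD, t × √(MS Within × (1/n_i + 1/n_j)), which reproduces the 2.70 threshold quoted on the slides. The function name and dictionary labels are hypothetical, and the t value 2.179 (two-tailed, α = .05, df = 12) is taken from a standard t table and hard-coded.

```python
from math import sqrt

T_CRITICAL = 2.179   # tabled two-tailed t, alpha = .05, df = N - k = 12
MS_WITHIN = 46 / 12  # mean square within from the example


def least_significant_difference(t_crit, ms_within, n_i, n_j):
    """Smallest difference between two group means that is significant."""
    return t_crit * sqrt(ms_within * (1 / n_i + 1 / n_j))


lsd = least_significant_difference(T_CRITICAL, MS_WITHIN, 5, 5)

# Absolute mean differences reported on the slide.
differences = {
    "Mean 1 vs Mean 2": 1,
    "Mean 3 vs Mean 1": 4,
    "Mean 3 vs Mean 2": 5,
}
decisions = {pair: diff >= lsd for pair, diff in differences.items()}

print(f"LSD = {lsd:.2f}")
for pair, significant in decisions.items():
    print(pair, "Rejected" if significant else "Not rejected")
```

Only the pairs involving group 3 (the control group) exceed the threshold, matching the decisions listed on the slide.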
40 Use of SPSS in ANOVA
41 Data in SPSS Input Format Columns: Stress Score, Groups
42 SPSS Output for ANOVA Descriptives: Stress Levels. Rows: Music Therapy, Relaxation Therapy, Control Group, Total. Columns: N, Mean, Std. Deviation, Std. Error, 95% Confidence Interval for Mean (Lower Bound, Upper Bound), Minimum, Maximum.
43 SPSS Output for ANOVA Test of Homogeneity of Variances: Stress Levels. Columns: Levene Statistic, df1, df2, Sig. level or p-value. P > .05, therefore the assumption of homogeneity of variance is met. ANOVA: Stress Levels. Rows: Between Groups, Within Groups, Total. Columns: Sum of Squares, df, Mean Square, F, Sig. level or p-value. P < .05, therefore we reject the null hypothesis and continue with the Multiple Comparisons table.
44 SPSS Output for ANOVA Multiple Comparisons. Dependent Variable: Stress Levels. LSD. Columns: (I) Groups, (J) Groups, Mean Difference (I-J), Std. Error, Sig. Level, 95% Confidence Interval. Rows: Music Therapy vs Relaxation Therapy; Music Therapy vs Control Group (*); Relaxation Therapy vs Music Therapy; Relaxation Therapy vs Control Group (*); Control Group vs Music Therapy 4.000(*); Control Group vs Relaxation Therapy 5.000(*). * The mean difference is significant at the .05 level.
45 Take home lesson How to compare means of three or more samples