
1 Copyright (c) Bani K. Mallick1 STAT 651 Lecture # 12

2 Copyright (c) Bani K. Mallick2 Topics in Lecture #12 The ANOVA F-test, and the basics of the F-table

3 Copyright (c) Bani K. Mallick3 Book Sections Covered in Lecture #12 Chapter 8.1-8.2

4 Copyright (c) Bani K. Mallick4 Relevant SPSS Tutorials ANOVA-GLM Post hoc tests

5 Copyright (c) Bani K. Mallick5 ANalysis Of VAriance We now turn to making inferences when there are 3 or more populations. This is classically called ANOVA. It is somewhat more formula-dense than what we have been used to. Tests for normality are also somewhat more complex.

6 Copyright (c) Bani K. Mallick6 ANOVA Suppose we form three populations on the basis of body mass index (BMI): a low, a medium, and a high group, the high group being those with BMI above 28. This forms 3 populations. We want to know whether the three populations have the same mean caloric intake, or if their food composition differs.

7 Copyright (c) Bani K. Mallick7 ANOVA If you do lots of 95% confidence intervals, you’d expect by chance that about 5% will be wrong. Thus, if you do 20 confidence intervals, you expect 1 (= 20 x 5%) will not include the true population parameter. This is a fact of life.
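
A quick numerical check of this claim (a minimal Python sketch; the "at least one miss" line additionally assumes the 20 intervals are independent):

    n_intervals = 20
    miss_rate = 0.05                 # each 95% CI misses with probability 5%
    expected_misses = n_intervals * miss_rate                  # = 1.0
    # If the intervals were independent, the chance that at least one misses:
    prob_at_least_one = 1 - (1 - miss_rate) ** n_intervals     # about 0.64
    print(expected_misses, round(prob_at_least_one, 2))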

8 Copyright (c) Bani K. Mallick8 ANOVA One procedure that is often followed is to do a preliminary test to see whether there are any differences among the populations. Then, once you conclude that some differences exist, you allow somewhat more informality in deciding where those differences manifest themselves. The first step is the ANOVA F-test.

9 Copyright (c) Bani K. Mallick9 ANOVA Consider the ACS data, with 3 BMI groups and measuring the % calories from fat (first FFQ) What is your preliminary conclusion about differences in means/medians? About differences in variability? About massive outliers?

10 Copyright (c) Bani K. Mallick10 ANOVA Consider the ACS data, with 3 BMI groups and measuring the % calories from fat (first FFQ)

11 Copyright (c) Bani K. Mallick11 ANOVA The F-test is easy to compute, and provided in all statistical packages. The populations are 1, 2, …, t. The sample sizes are n_1, n_2, …, n_t. The population means are μ_1, μ_2, …, μ_t. The hypothesis to test is H_0: μ_1 = μ_2 = … = μ_t.

12 Copyright (c) Bani K. Mallick12 ANOVA The data from population i are Y_i1, Y_i2, …, Y_in_i. The sample mean from population i is Ȳ_i. The sample mean of all the data is Ȳ (the grand mean). The total sample size is the total number of observations, called n_T = n_1 + n_2 + … + n_t.

13 Copyright (c) Bani K. Mallick13 ANOVA The ANOVA Table (demo in SPSS in class of ACS data) “Analyze” → “Compare Means” → “1-way ANOVA”: I’ll now show you what each item is

14 Copyright (c) Bani K. Mallick14 ANOVA The sample mean from population i is Ȳ_i. The sample mean of all the data is Ȳ. The idea of the F-test is based on distances. The distance of the data to the overall mean is TSS = Σ_i Σ_j (Y_ij − Ȳ)², the Total Sum of Squares.

15 Copyright (c) Bani K. Mallick15 ANOVA The distance of the data to the overall mean is TSS = Σ_i Σ_j (Y_ij − Ȳ)², the Total Sum of Squares. This has n_T − 1 degrees of freedom.

16 Copyright (c) Bani K. Mallick16 ANOVA Next comes the “Between Groups” row

17 Copyright (c) Bani K. Mallick17 ANOVA The sum of squares between groups is SSB = Σ_i n_i (Ȳ_i − Ȳ)². It has t − 1 degrees of freedom, so the number of populations is the degrees of freedom between groups + 1.

18 Copyright (c) Bani K. Mallick18 ANOVA Next comes the “Within Groups” row

19 Copyright (c) Bani K. Mallick19 ANOVA The distance of the observations to their sample means is SSW = Σ_i Σ_j (Y_ij − Ȳ_i)². This is the Sum of Squares Within. It has n_T − t degrees of freedom.
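
Two identities tie these pieces together: the sums of squares and their degrees of freedom both decompose, TSS = SSB + SSW and n_T − 1 = (t − 1) + (n_T − t). As a check against the ACS output shown later: 960.287 + 15275.639 = 16235.926, which matches the reported total of 16235.925 up to rounding, and 2 + 181 = 183.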

20 Copyright (c) Bani K. Mallick20 ANOVA Next come the “Mean Squares”. These are the different sums of squares divided by their degrees of freedom: MSB = SSB/(t − 1) and MSW = SSW/(n_T − t).

21 Copyright (c) Bani K. Mallick21 ANOVA Next comes the F-statistic It is the ratio of the mean square between groups to the mean square within groups

22 Copyright (c) Bani K. Mallick22 ANOVA The F-statistic is F = MSB / MSW = [SSB/(t − 1)] / [SSW/(n_T − t)].

23 Copyright (c) Bani K. Mallick23 ANOVA The F-statistic is F = MSB / MSW. What values (large or small) indicate differences? Clearly large, since if the population means are equal, the sample means will all be close to the grand mean, and the numerator will be relatively small.
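
To make the formulas concrete, here is a minimal Python sketch of the whole computation (the three groups and their values are invented for illustration, not the ACS data; SciPy is assumed to be available for the cross-check):

    import numpy as np
    from scipy import stats

    groups = [
        np.array([31.2, 28.5, 35.1, 30.4, 29.8]),   # e.g., a low-BMI group
        np.array([33.0, 36.2, 32.5, 34.1, 35.7]),   # e.g., a medium-BMI group
        np.array([38.4, 40.1, 37.2, 39.5, 36.8]),   # e.g., a high-BMI group
    ]

    all_data = np.concatenate(groups)
    grand_mean = all_data.mean()
    n_T = all_data.size                  # total sample size
    t = len(groups)                      # number of populations

    # Total, between-group, and within-group sums of squares
    tss = ((all_data - grand_mean) ** 2).sum()
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
    # Note: tss = ssb + ssw (up to rounding)

    msb = ssb / (t - 1)                  # mean square between groups
    msw = ssw / (n_T - t)                # mean square within groups
    f_stat = msb / msw
    p_value = stats.f.sf(f_stat, t - 1, n_T - t)
    print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

    # Cross-check against SciPy's built-in one-way ANOVA
    print(stats.f_oneway(*groups))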

24 Copyright (c) Bani K. Mallick24 Why do they call it ANOVA? ANOVA = ANalysis Of VAriance. I want to show you the concept in graphs, because these become important in STAT 652. I will illustrate the idea with samples from two populations. The first population will be in red, the second in green. When pooled, I will use blue.

25 Copyright (c) Bani K. Mallick25 Why do they call it ANOVA? The data from the first population. The total distance of the observations to their sample mean is the sum of their squared distances to that mean. [Figure: the first sample's data points (Y's) plotted around their sample mean]

26 Copyright (c) Bani K. Mallick26 Why do they call it ANOVA? I will use a "funny symbol" (shown in the figure) to denote total distance. The total distance of the observations to their sample mean is the sum of their squared distances to that mean. [Figure: the first sample again, with the symbol marking its total within-sample distance]

27 Copyright (c) Bani K. Mallick27 Why do they call it ANOVA? Consider two similar populations. Summing the two symbols gives the SSW = Sum of Squared distances Within the two samples. [Figure: the red and green samples plotted side by side, each with its own symbol for its within-sample distance]

28 Copyright (c) Bani K. Mallick28 Why do they call it ANOVA? Now pool the two similar populations. [Figure: the red and green data points pooled into a single blue sample] The blue symbol represents the sum of squared distances of the total sample to the total sample mean. This is the Total Sum of Squares, or TSS.

29 Copyright (c) Bani K. Mallick29 Why do they call it ANOVA? Now pool the two similar populations. [Figure: the pooled sample again] The blue symbol represents the sum of squared distances of the total sample to the total sample mean; this is the Total Sum of Squares, or TSS. Note how the SSW and the TSS are about the same: when this happens, it indicates equal means for the populations.

30 Copyright (c) Bani K. Mallick30 Why do they call it ANOVA? Now note what happens if the population means are different. [Figure: the two samples plotted with their centers far apart] Note how the TSS has greatly increased. Note how the SSW is the same as before (remember, we add the squared distances separately for the two populations). It is this phenomenon that the F-test measures.

31 Copyright (c) Bani K. Mallick31 ANOVA The F-statistic is compared to the F-distribution with t − 1 and n_T − t degrees of freedom. See Table 8, which lists the cutoff points in terms of α. If the F-statistic exceeds the cutoff, you reject the hypothesis of equality of all the means. SPSS gives you the p-value (significance level) for this test.
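
If Table 8 is not at hand, the cutoff and the p-value can be read off the F-distribution directly. A small sketch (SciPy assumed; the degrees of freedom 2 and 181 and the F-statistic 5.689 are the ones from the ACS output below):

    from scipy import stats

    alpha = 0.05
    df_between, df_within = 2, 181       # t - 1 and n_T - t for the ACS data
    cutoff = stats.f.ppf(1 - alpha, df_between, df_within)
    print(f"Reject H0 at level {alpha} if F > {cutoff:.3f}")

    # p-value for the observed F-statistic from the ACS output
    p_value = stats.f.sf(5.689, df_between, df_within)
    print(f"p-value = {p_value:.4f}")    # about 0.004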

32 Copyright (c) Bani K. Mallick32 ANOVA I have used the language of the book up to this point, which coincides with the SPSS 1-way ANOVA output. For our purposes, the general linear model is more useful. It allows us to compare all the populations simultaneously. It also allows us to check assumptions about normality. Unfortunately, it uses slightly different terminology.

33 Copyright (c) Bani K. Mallick33 ANOVA For our purposes, the general linear model is more useful. ANOVA procedure vs. General Linear Model procedure: note how the F-statistics and p-values are identical.

34 Copyright (c) Bani K. Mallick34 ANOVA For our purposes, the general linear model is more useful. Here are the two SPSS outputs for the ACS data (dependent variable: Baseline FFQ):

ANOVA
                  Sum of Squares    df   Mean Square       F     Sig.
Between Groups          960.287      2       480.143    5.689    .004
Within Groups         15275.639    181        84.396
Total                 16235.925    183

Tests of Between-Subjects Effects
Source             Type III Sum of Squares    df   Mean Square          F     Sig.
Corrected Model               960.287 (a)      2       480.143      5.689    .004
Intercept                  196009.919          1    196009.919   2322.508    .000
BMIGROUP                      960.287          2       480.143      5.689    .004
Error                       15275.639        181        84.396
Total                      226223.216        184
Corrected Total             16235.925        183
a. R Squared = .059 (Adjusted R Squared = .049)

"Between Groups" in the ANOVA table corresponds to the "Corrected Model" row (or the variable name, BMIGROUP) in the GLM table.

35 Copyright (c) Bani K. Mallick35 ANOVA For our purposes, the general linear model is more useful. In the same two outputs, "Within Groups" in the ANOVA table corresponds to "Error" in the GLM table.

36 Copyright (c) Bani K. Mallick36 ANOVA For our purposes, the general linear model is more useful. In the same two outputs, "Total" in the ANOVA table corresponds to "Corrected Total" in the GLM table.

37 Copyright (c) Bani K. Mallick37 ANOVA If the populations have a common variance σ², the Mean Squared Error estimates it. For 2 populations, the mean squared error equals the square of the pooled s.d.: MSE = s_p². Hence, the estimated common s.d. is √MSE.

38 Copyright (c) Bani K. Mallick38 ANOVA There appears to be some difference. The 95% CI for the difference in population means between the high and low BMI groups is 2.26 to 10.20. The 95% CI for the difference in population means between the medium and low BMI groups is from -1.59 to 4.31. High & medium: 1.08 to 8.65. Conclusions?
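
For reference, here is a sketch of how such a pairwise interval can be built from the mean squared error (the group means and sample sizes below are hypothetical placeholders, not the ACS values, and SPSS's post hoc procedure may apply a different adjustment, so this will not reproduce the intervals above exactly):

    import numpy as np
    from scipy import stats

    mse = 84.396                 # mean square within (error) from the ACS output
    df_error = 181               # n_T - t
    sp = np.sqrt(mse)            # estimated common s.d., about 9.19

    # Hypothetical group summaries, for illustration only
    mean_high, n_high = 38.0, 60
    mean_low, n_low = 31.8, 62

    diff = mean_high - mean_low
    se = np.sqrt(mse * (1.0 / n_high + 1.0 / n_low))
    t_crit = stats.t.ppf(0.975, df_error)
    print(f"95% CI: {diff - t_crit * se:.2f} to {diff + t_crit * se:.2f}")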

39 Copyright (c) Bani K. Mallick39 ANOVA The ANOVA Table (demo in SPSS in class of ACS data)

40 Copyright (c) Bani K. Mallick40 ANOVA The ANOVA Table Number of populations is t=3, so degrees of freedom for the model (BMIGROUP) is t-1 = 2

41 Copyright (c) Bani K. Mallick41 ANOVA The ANOVA Table Total sample size is 184, so the degrees of freedom for the corrected total is 184 − 1 = 183.

42 Copyright (c) Bani K. Mallick42 ANOVA The mean square between groups is 480.143. The mean square within groups is 84.396. The F-statistic is the ratio: 5.689.
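
As a quick arithmetic check of these numbers: 480.143 / 84.396 ≈ 5.689, and the within-groups (error) degrees of freedom are 183 − 2 = 181, matching the output above.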

43 Copyright (c) Bani K. Mallick43 ANOVA The p-value is 0.004: what does this mean? What was the null hypothesis?

44 Copyright (c) Bani K. Mallick44 ANOVA The p-value is 0.004: what does this mean? What was the null hypothesis? That the population means are all equal.

45 Copyright (c) Bani K. Mallick45 ANOVA The p-value is 0.004: what does this mean? We have more than 99% confidence that the null hypothesis is false. What was the null hypothesis? That the population means are all equal.

