Presentation is loading. Please wait.

Presentation is loading. Please wait.

Math Interlude: Signs, Symbols, and ANOVA nuts and bolts.

Similar presentations


Presentation on theme: "Math Interlude: Signs, Symbols, and ANOVA nuts and bolts."— Presentation transcript:

1 Math Interlude: Signs, Symbols, and ANOVA nuts and bolts

2 Remember these? Σ= Summation  = Product ! = Factorial Just shorthand!!

3 Example Take 5 data points: 5, 6, 7, 9, 10 Represent them generally as X 1 to X 5 (the observations are indexed by subscripts i): Just a shorthand way to write “add up all five data points”!

4 Example Take 5 data points: 5, 6, 7, 9, 10 Represent them generally as X 1 to X 5 (the observations are indexed by subscripts i):

5 More summation… In general,

6 Working with summation… In general,

7 Practice (just for fun!)…

8 Products… = 5(6)(7)(9)(10) = 18,900 5 data points again: 5, 6, 7, 9, 10

9 Factorials… 5! = 5(4)(3)(2)(1) note: 0! = 1 by convention

10 Review of math symbols… X = often used to indicate the independent (predictor) variable Y = often used to indicate the dependent (or outcome) variable X i = the ith observation of X X ij = in a table, the observation in row i and column j µ y = the “true” (population or theoretical) mean of Y Y = the sample mean of Y  x 2 = the true variance of X /  x = the true standard deviation S x 2 = Sample variance/ S x = Sample standard dev

11 ANOVA: Concepts Review When do you use ANOVA? What assumptions are you making when you use ANOVA? What is the null hypothesis in ANOVA? What does a “significant” ANOVA mean? Why not just do a bunch of t-tests to compare means?

12 ANOVA example S1 a, n=28 aS2 b, n=25 bS3 c, n=21 cP-value d d Calcium (mg)Mean117.8158.7206.50.000 SD e e 62.470.586.2 Iron (mg)Mean2.0 0.854 SD0.6 Folate (μg)Mean26.638.742.60.000 SD13.114.515.1 Zinc (mg)Mean1.91.51.30.055 SD1.01.20.4 a School 1 (most deprived; 40% subsidized lunches). b School 2 (medium deprived; <10% subsidized). c School 3 (least deprived; no subsidization, private school). d ANOVA; significant differences are highlighted in bold (P<0.05). Mean micronutrient intake from the school lunch by school FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England- are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92. Does this example meet the assumptions of ANOVA?

13 ANOVA: Concepts Review On a piece of paper, represent the relationship between school group and calcium as a linear regression equation (no you do not need a computer to calculate this!) predictor=school group outcome=calcium no other predictors in the model set the most deprived school as your reference group round numbers for ease!

14 New Stuff: The Math Behind ANOVA It’s like this: If I have three groups to compare: I could do three pair-wise ttests, but this would increase my type I error So, instead I want to look at the pairwise differences “all at once.” To do this, I can recognize that variance is a statistic that let’s me look at more than one difference at a time…

15 The ANOVA “F-test” Is the difference in the means of the groups more than background noise (=variability within groups)? Summarizes the mean differences between all groups at once. Analogous to pooled variance from a ttest.

16 Side Note: The F-distribution The F-distribution is a continuous probability distribution that depends on two parameters n and m (numerator and denominator degrees of freedom, respectively): http://www.econtools.com/jevons/java/Graphics2D/FDist.html

17 The F-distribution A ratio of variances follows an F-distribution: The F-test tests the hypothesis that two variances are equal. F will be close to 1 if sample variances are equal.

18 For example… Randomize 33 subjects to three groups: 800 mg calcium supplement vs. 1500 mg calcium supplement vs. placebo. Compare the spine bone density of all 3 groups after 1 year.

19 Group means and standard deviations Placebo group (n=11): Mean spine BMD =.92 g/cm 2 standard deviation =.10 g/cm 2 800 mg calcium supplement group (n=11) Mean spine BMD =.94 g/cm 2 standard deviation =.08 g/cm 2 1500 mg calcium supplement group (n=11) Mean spine BMD =1.06 g/cm 2 standard deviation =.11 g/cm 2

20 PLACEBO800mg CALCIUM1500 mg CALCIUM 0.7 0.8 0.9 1.0 1.1 1.2 S P I N E Between group variation Spine bone density vs. treatment Within group variability

21 The F-Test The size of the groups. The difference of each group’s mean from the overall mean. Between-group variation. The average amount of variation within groups. Each group’s variance. Large F value indicates that the between group variation exceeds the within group variation (=the background noise).

22 How to calculate ANOVA’s by hand… Treatment 1Treatment 2Treatment 3Treatment 4 y 11 y 21 y 31 y 41 y 12 y 22 y 32 y 42 y 13 y 23 y 33 y 43 y 14 y 24 y 34 y 44 y 15 y 25 y 35 y 45 y 16 y 26 y 36 y 46 y 17 y 27 y 37 y 47 y 18 y 28 y 38 y 48 y 19 y 29 y 39 y 49 y 110 y 210 y 310 y 410 n=10 obs./group k=4 groups The group means The (within) group variances

23 Sum of Squares Within (SSW), or Sum of Squares Error (SSE) The (within) group variances + ++ Sum of Squares Within (SSW) (or SSE, for chance error) Terminology Note: “Sum of squares” is just a fancy way of saying the numerator of a variance. Sum of squares divided by degrees of freedom = variance Variance times degrees of freedom = sum of squares

24 Sum of Squares Between (SSB), or Sum of Squares Regression (SSR) Sum of Squares Between (SSB). Variability of the group means compared to the grand mean (the variability due to the treatment). Overall mean of all 40 observations (“grand mean”)

25 Total Sum of Squares (SST) Total sum of squares(TSS). Squared difference of every observation from the overall mean. (numerator of variance of Y!)

26 Partitioning of Variance = + SSW + SSB = TSS

27 ANOVA Table Between (k groups) k-1 SSB (sum of squared deviations of group means from grand mean) SSB/k-1 Go to F k-1,nk-k chart Total variation nk-1TSS (sum of squared deviations of observations from grand mean) Source of variation d.f. Sum of squares Mean Sum of Squares F-statisticp-value Within (n individuals per group) nk-k SSW (sum of squared deviations of observations from their group mean) s 2= SSW/nk-k TSS=SSB + SSW

28 ANOVA=t-test Between (2 groups) 1 SSB (squared difference in means multiplied by n) Squared difference in means times n Go to F 1, 2n-2 Chart  notice values are just (t 2n-2 ) 2 Total variation 2n-1TSS Source of variation d.f. Sum of squares Mean Sum of SquaresF-statisticp-value Within2n-2SSW equivalent to numerator of pooled variance Pooled variance

29 Numerical Example Treatment 1Treatment 2Treatment 3Treatment 4 60 inches504847 67524967 42435054 67 5567 56675668 62596165 64676165 59646056 72635960 71656465

30 Example Treatment 1Treatment 2Treatment 3Treatment 4 60 inches504847 67524967 42435054 67 5567 56675668 62596165 64676165 59646056 72635960 71656465 Step 1) calculate the sum of squares between groups: Mean for group 1 = 62.0 Mean for group 2 = 59.7 Mean for group 3 = 56.3 Mean for group 4 = 61.4 Grand mean= 59.85 SSB = [(62-59.85) 2 + (59.7-59.85) 2 + (56.3-59.85) 2 + (61.4-59.85) 2 ] xn per group= 19.65x10 = 196.5

31 Example Treatment 1Treatment 2Treatment 3Treatment 4 60 inches504847 67524967 42435054 67 5567 56675668 62596165 64676165 59646056 72635960 71656465 Step 2) calculate the sum of squares within groups: (60-62) 2 + (67-62) 2 + (42-62) 2 + (67-62) 2 + (56-62) 2 + (62- 62) 2 + (64-62) 2 + (59-62) 2 + (72-62) 2 + (71-62) 2 + (50- 59.7) 2 + (52-59.7) 2 + (43- 59.7) 2 + 67-59.7) 2 + (67- 59.7) 2 + (69-59.7) 2 …+….(sum of 40 squared deviations) = 2060.6

32 Step 3) Fill in the ANOVA table 3 196.5 65.51.14.344 362060.657.2 Source of variation d.f. Sum of squares Mean Sum of Squares F-statistic p-value Between Within Total 39 2257.1

33 Step 3) Fill in the ANOVA table 3 196.5 65.51.14.344 362060.657.2 Source of variation d.f. Sum of squares Mean Sum of Squares F-statistic p-value Between Within Total 39 2257.1 Coefficient of Determination: How much of the variance in height is “explained by” treatment group? R 2= “Coefficient of Determination” = SSB/TSS = 196.5/2275.1=9%

34 Coefficient of Determination The amount of variation in the outcome variable (dependent variable) that is “explained by” the predictor (independent variable).

35 Beyond one-way ANOVA Often, you may want to test more than 1 treatment. ANOVA can accommodate more than 1 treatment or factor, so long as they are independent. Again, the variation partitions beautifully! TSS = SSB1 + SSB2 + SSW

36 Calculating ANOVA from grouped data… S1 a, n=25 aS2 b, n=25 bS3 c, n=25 cP-value d d Calcium (mg)Mean117.8158.7206.50.000 SD e e 62.470.586.2 Iron (mg)Mean2.0 0.854 SD0.6 Folate (μg)Mean26.638.742.60.000 SD13.114.515.1 Zinc (mg) Mean1.91.51.30.055 SD1.01.20.4 a School 1 (most deprived; 40% subsidized lunches). b School 2 (medium deprived; <10% subsidized). c School 3 (least deprived; no subsidization, private school). d ANOVA; significant differences are highlighted in bold (P<0.05). Table 6. Mean micronutrient intake from the school lunch by school FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England- are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.

37 Answer Step 1) calculate the sum of squares between groups: Mean for School 1 = 117.8 Mean for School 2 = 158.7 Mean for School 3 = 206.5 Grand mean: 161 SSB = [(117.8-161) 2 + (158.7-161) 2 + (206.5-161) 2 ] x25 per group= 98,113

38 Answer Step 2) calculate the sum of squares within groups: S.D. for S1 = 62.4 S.D. for S2 = 70.5 S.D. for S3 = 86.2 Therefore, sum of squares within is: (24)[ 62.4 2 + 70.5 2 + 86.2 2 ]=391,066

39 Answer Step 3) Fill in your ANOVA table Source of variation d.f. Sum of squares Mean Sum of Squares F-statistic p-value Between 298,113490569 <.05 Within72 391,0665431 Total74 489,179 **R 2 =98113/489179=20% School “explains” 20% of the variance in lunchtime calcium intake in these kids.

40 ANOVA reminders… A statistically significant ANOVA (F-test) only tells you that at least two of the groups differ, but not which ones differ. Determining which groups differ (when it’s unclear) requires more sophisticated analyses to correct for the problem of multiple comparisons…

41 ANOVA reminders… A statistically significant ANOVA does not imply a clinically significant difference between the groups! For example, if we had found a significant 10-mg difference in calcium, this probably would not be enough to care…

42 ANOVA reminders… When data are not normally distributed and sample size is small, use non- parametric ANOVA equivalent…

43 ANOVA reminders ANOVA is just linear regression!


Download ppt "Math Interlude: Signs, Symbols, and ANOVA nuts and bolts."

Similar presentations


Ads by Google