Published by Cynthia Wright. Modified over 8 years ago.
1
DSCI 346 Yamasaki Lecture 4 ANalysis Of Variance
2
So far we have looked at one- and two-sample problems. We now generalize to more than two groups.

Group 1, Group 2, Group 3, with sample means \(\bar{x}_1\), \(\bar{x}_2\), \(\bar{x}_3\) and overall mean \(\bar{x}\).

Notation: \(x_{ij}\) = jth observation in ith group

DSCI Lect 4 (23 pages)
3
So total variability is broken down into two parts:
1) Variability due to different means (SSB)
2) Variability due to variation about each mean (SSW)
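The decomposition can be checked numerically. The following is a minimal sketch on made-up data (the three groups below are hypothetical, not from the lecture); it computes the between, within, and total sums of squares separately and shows that the first two add up to the third.

```python
# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 4, 6], [5, 6, 7]]

def mean(xs):
    return sum(xs) / len(xs)

all_obs = [x for g in groups for x in g]
grand_mean = mean(all_obs)

# SSB: variability of the group means about the overall mean.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# SSW: variability of each observation about its own group mean.
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
# SST: variability of every observation about the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_obs)

print(ssb, ssw, sst)  # 24.0 12.0 36.0
```

Here SSB + SSW = 24 + 12 = 36 = SST.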
4
Basic assumptions:
1) Normality
2) Equal variance for all groups
3) Independent data points
Recall:
5
Consider the following situation:

\(H_0: \mu_1 = \mu_2 = \dots = \mu_k\)
\(H_A\): not all equal

If \(H_0\) is true, then the \(\bar{x}_i\)'s should be "close" to each other. So \(SSB = \sum_i n_i (\bar{x}_i - \bar{x})^2\) should be "relatively small."

If there are a lot of groups, then we will be adding up a lot of numbers, and the sum will tend to get large just because it is made up of more deviations, even if each deviation is small (similar to the chi-square situation). To take this into account, divide SSB by k-1:

\(SSB / (k-1) = MSB\) (Mean Square Between)
6
If \(H_0\) is true, then MSB is relatively small; when \(H_0\) is false, MSB gets larger. As before, the $1,000,000 question is "How large is too large?" Consider

\(F = MSB / MSW\)

OH NO! Not another table! Yes, another table; but it beats the living daylights out of the alternative of trying to calculate the distribution by hand.
7
Note: What we have is essentially the ratio of two chi-square variables, each divided by its own degrees of freedom. The numerator degrees of freedom (\(\nu_1\)) is k-1, where k is the number of groups. The denominator degrees of freedom (\(\nu_2\)) is N-k, where N is the total number of observations in all the groups combined.

In these tables, rows correspond to the denominator degrees of freedom, columns correspond to the numerator degrees of freedom, and separate tables correspond to different significance levels.

Note that the numerator df and the denominator df are not interchangeable:

\(F_{3,2,.05} = 19.16 \ne F_{2,3,.05} = 9.55\)
8
Example: The table below shows the results of a study on the average dollar value for orders made by individuals at the four locations of a sandwich company.

Location     Sample Size   Mean    Variance
Location 1   8             $7.00   7.341
Location 2   8             $9.00   8.423
Location 3   8             $8.00   7.632
Location 4   8             $9.00   5.016

Do the data provide sufficient evidence to indicate differences in the mean dollar values? Test using \(\alpha = .05\).
9
Typical way of reporting (ANOVA table):

Source                   SS    df    MS    F
Between (a.k.a. among)   SSB   k-1   MSB   MSB/MSW
Within (a.k.a. error)    SSW   N-k   MSW
Total                    SST   N-1
10
Source    SS         df   MS       F
Between   22.00      3    7.3333   1.03
Within    198.8840   28   7.1030
Total     220.8840   31

\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)
\(H_A\): not all equal
\(\alpha = .05\)

From the F table, \(F_{3,28,.05} = 2.95\). Since F = 1.03 is not larger than 2.95, we cannot reject the null hypothesis. Conclude there is insufficient evidence to indicate a difference in average dollar values for orders.
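The ANOVA table for the sandwich example can be rebuilt from the summary statistics alone. The sketch below assumes only the sample sizes, means, and variances given in the example; the function name `one_way_anova_from_summary` is a hypothetical helper, not something from the lecture. It uses SSB as the weighted sum of squared deviations of the group means, and SSW as the sum of (n_i - 1) times each group variance.

```python
# Summary statistics from the sandwich-order example: (n, mean, variance).
locations = [
    (8, 7.00, 7.341),
    (8, 9.00, 8.423),
    (8, 8.00, 7.632),
    (8, 9.00, 5.016),
]

def one_way_anova_from_summary(summary):
    """Build the one-way ANOVA quantities from per-group (n, mean, variance)."""
    k = len(summary)
    n_total = sum(n for n, _, _ in summary)
    grand_mean = sum(n * m for n, m, _ in summary) / n_total
    ssb = sum(n * (m - grand_mean) ** 2 for n, m, _ in summary)
    ssw = sum((n - 1) * v for n, _, v in summary)  # variance * (n - 1) recovers each group's SS
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {"SSB": ssb, "SSW": ssw, "MSB": msb, "MSW": msw,
            "F": msb / msw, "df": (k - 1, n_total - k)}

table = one_way_anova_from_summary(locations)
for name, value in table.items():
    print(name, value)
```

The F statistic that comes out is then compared with the tabled value for 3 and 28 degrees of freedom, exactly as in the slide.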
11
Notes:
1) Since everything is squared (like chi-square), there is no such thing as a one-tailed test.
2) If k=2, then this test is equivalent to the two-sided two-sample t-test with pooled variance (\(F = t^2\)).
3) Even though the theory is based on normality, in practice normality isn't that important.
4) If we reject \(H_0\), we may want to know specifically which means are different. There are several approaches available, but not in this class.
5) This technique can be generalized to include more than one grouping variable, or non-independent groups.
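For the k=2 case in note 2, a standard result is that the one-way ANOVA F statistic equals the square of the pooled-variance two-sample t statistic. The sketch below illustrates this on two small made-up groups (the data are hypothetical, chosen only so the arithmetic is easy to follow).

```python
import math

a = [1, 2, 3]  # hypothetical group 1
b = [2, 4, 6]  # hypothetical group 2

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

# Pooled-variance two-sample t statistic.
pooled_var = (sum_sq_dev(a) + sum_sq_dev(b)) / (len(a) + len(b) - 2)
t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

# One-way ANOVA F statistic for the same two groups (k = 2).
grand_mean = mean(a + b)
msb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in (a, b)) / (2 - 1)
msw = (sum_sq_dev(a) + sum_sq_dev(b)) / (len(a) + len(b) - 2)
f = msb / msw

print(t ** 2, f)  # both are 2.4 up to floating-point rounding
```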
12
Multiple Comparisons: What to do if we reject \(H_0\)

Suppose we have rejected the null hypothesis and now we want to know which means are different.

Comparing Pairs of Population Means using Tukey-Kramer Multiple Comparisons for One-Way ANOVA

\(H_0: \mu_i = \mu_j\)
\(H_A: \mu_i \ne \mu_j\)

Example

              Group 1   Group 2   Group 3
Sample mean   -1.75     2.45      2.58
Sample size   89        91        93

MSW = 26.182
The F test rejected the hypothesis that all population means are equal.
14
Comparison   \(|\bar{x}_i - \bar{x}_j|\)   Critical range   Significant?
1 vs 2       4.20                          1.785            Yes
1 vs 3       4.33                          1.827            Yes
2 vs 3       0.13                          1.818            No
15
What we just introduced is known as One-Way ANOVA (a generalization of the two independent population means problem). In the two population means problem we talked about pairing (non-independence) as a way to control for the impact of other variables. The concept of non-independence can also arise in the ANOVA situation, in which case the matched units are known as blocks.

Consider the following example: A bank is comparing the average house appraisal value between three appraisal companies. If a random sample from each company is chosen, problems could arise due to different characteristics of the houses chosen. To control for this, the bank will have each appraisal company appraise the same houses; the houses are the blocks.
16
Randomized Complete Block ANOVA

Randomized Complete Block ANOVA is similar to One-Way ANOVA, but has an additional sum of squares, SSBL, to compute, and the way the pieces are obtained is a little different.

Here SST = SSB + SSBL + SSE, where SSE (error) takes the place of SSW and is obtained by subtraction, SSE = SST - SSB - SSBL, after first calculating

\(SST = \sum_i \sum_j (x_{ij} - \bar{x})^2\)

which is just like the numerator of \(s^2\) computed over all N observations. Note you can calculate SST for one-way ANOVA in the same fashion.

The block sum of squares is

\(SSBL = k \sum_j (\bar{X}_j - \bar{x})^2\)

where \(\bar{X}_j\) is the average of the jth block and k is the number of groups (so each block contains k observations); this is just like the way we calculated SSB.
17
Example: Comparing three appraisal companies using the same five houses

House (Block)   Co1    Co2   Co3    Block Average
1               78     82    79     79.667
2               102    102   99     101.00
3               68     74    70     70.667
4               83     88    86     85.667
5               95     99    92     95.333
Ave             85.2   89    85.2
18
ANOVA table

Source    SS     df           MS     F
Between   SSB    k-1          MSB    MSB/MSE
Blocks    SSBL   b-1          MSBL   MSBL/MSE
Error     SSE    (k-1)(b-1)   MSE
Total     SST    N-1
19
ANOVA table

Source    SS         df   MS        F
Between   48.132     2    24.066    8.519
Blocks    1759.002   4    439.751   155.671
Error     22.599     8    2.825
Total     1829.733   14

\(F_{2,8,.05} = 4.459\)

Since 8.519 > 4.459, there is sufficient evidence to conclude that not all average appraisals are the same.
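The randomized block table can be reproduced from the raw appraisals. The sketch below is one way to do it, assuming the house-by-company figures from the example; the variable names are illustrative. Because it works with unrounded means, the later decimals come out slightly different from the slide (the company F is roughly 8.54 rather than 8.519), but the comparison with 4.459 leads to the same conclusion.

```python
# Appraisals from the example: one row per house (block), one column per company.
appraisals = [
    [78, 82, 79],
    [102, 102, 99],
    [68, 74, 70],
    [83, 88, 86],
    [95, 99, 92],
]

b = len(appraisals)      # number of blocks (houses)
k = len(appraisals[0])   # number of groups (companies)
n_total = b * k
grand_mean = sum(sum(row) for row in appraisals) / n_total

group_means = [sum(row[i] for row in appraisals) / b for i in range(k)]
block_means = [sum(row) / k for row in appraisals]

sst = sum((x - grand_mean) ** 2 for row in appraisals for x in row)
ssb = b * sum((m - grand_mean) ** 2 for m in group_means)   # between companies
ssbl = k * sum((m - grand_mean) ** 2 for m in block_means)  # between houses
sse = sst - ssb - ssbl                                      # error, by subtraction

msb = ssb / (k - 1)
msbl = ssbl / (b - 1)
mse = sse / ((k - 1) * (b - 1))
f_between = msb / mse
f_blocks = msbl / mse

print(round(ssb, 3), round(ssbl, 3), round(sse, 3), round(sst, 3))
print(round(f_between, 3), round(f_blocks, 3))
```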
20
Comparing Pairs of Population Means using Tukey-Kramer Multiple Comparisons for Randomized Blocks

\(H_0: \mu_i = \mu_j\)
\(H_A: \mu_i \ne \mu_j\)

              Group 1   Group 2   Group 3
Sample mean   85.2      89.0      85.2
Sample size   5         5         5

MSE = 2.825
The F test rejected the hypothesis that all population means are equal.
22
Comparison   \(|\bar{x}_i - \bar{x}_j|\)   Critical range   Significant?
1 vs 2       3.80                          3.037            Yes
1 vs 3       0.00                          3.037            No
2 vs 3       3.80                          3.037            Yes
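The critical range of 3.037 is consistent with the usual Tukey-Kramer form, q times the square root of (MSE/2)(1/n_i + 1/n_j). The sketch below assumes that form and a studentized range table value of q = 4.04 for 3 groups and 8 error degrees of freedom at the .05 level; both the helper name `critical_range` and the q value are assumptions, since the slide with the formula did not survive in this transcript.

```python
import math

def critical_range(q, ms_error, n_i, n_j):
    """Tukey-Kramer critical range for comparing group i with group j."""
    return q * math.sqrt((ms_error / 2) * (1 / n_i + 1 / n_j))

Q_05_3_8 = 4.04  # assumed table value: studentized range, 3 groups, 8 error df, alpha = .05
MSE = 2.825      # from the randomized block ANOVA table
means = {1: 85.2, 2: 89.0, 3: 85.2}
sizes = {1: 5, 2: 5, 3: 5}

significant = {}
for i, j in [(1, 2), (1, 3), (2, 3)]:
    diff = abs(means[i] - means[j])
    cr = critical_range(Q_05_3_8, MSE, sizes[i], sizes[j])
    significant[(i, j)] = diff > cr
    print(f"{i} vs {j}: |diff| = {diff:.2f}, critical range = {cr:.3f}, significant = {diff > cr}")
```

With equal sample sizes the critical range is the same for every pair, which is why the table shows 3.037 three times.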
23
One last use of the F table

Remember we talked about the small-sample t-test for the difference of two means only being valid if the two population variances are equal? We use the F tables to test the hypothesis of equality:

\(F = s_1^2 / s_2^2\), where \(s_1^2 > s_2^2\), and use the corresponding df's (\(n_1 - 1\) for the numerator, \(n_2 - 1\) for the denominator).
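As a small illustration of the variance ratio, the sketch below puts the larger sample variance on top and reports the matching degrees of freedom. It borrows two variances from the sandwich example (Location 4 and Location 2, n = 8 each) purely as sample inputs; the lecture does not run this test on them, and the helper name `variance_ratio` is hypothetical.

```python
def variance_ratio(s1_sq, n1, s2_sq, n2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    return s2_sq / s1_sq, n2 - 1, n1 - 1

# Illustrative inputs: variances 5.016 and 8.423, each from a sample of 8.
f_stat, df_num, df_den = variance_ratio(5.016, 8, 8.423, 8)
print(round(f_stat, 3), df_num, df_den)
```

The resulting F would then be compared with the F table entry for those numerator and denominator degrees of freedom at the chosen significance level.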