Published by Jared Maxwell. Modified over 9 years ago.
1
Always be mindful of the kindness and not the faults of others.
2
One-way ANOVA: Inferences about More than Two Population Means
Model and test for one-way ANOVA
Assumption checking
Nonparametric alternative
3
Analysis of Variance & One Factor Designs (One-Way ANOVA)
Y = RESPONSE VARIABLE (of numerical type) (e.g. battery lifetime)
X = EXPLANATORY VARIABLE (of categorical type) (a possibly influential FACTOR) (e.g. brand of battery)
OBJECTIVE: To determine the impact of X on Y
4
Completely Randomized Design (CRD)
Goal: to study the effect of Factor X.
The same number of observations is taken randomly and independently from the individuals at each level of Factor X, i.e. $n_1 = n_2 = \dots = n_c$ (c levels).
5
Example: Y = LIFETIME (HOURS) by BRAND, 3 replications per level

BRAND        1     2     3     4     5     6     7     8
             1.8   4.2   8.6   7.0   4.2   4.2   7.8   9.0
             5.0   5.4   4.6   5.0   7.8   4.2   7.0   7.4
             1.0   4.2   4.2   9.0   6.6   5.4   9.8   5.8
Level mean   2.6   4.6   5.8   7.0   6.2   4.6   8.2   7.4
Grand mean = 5.8
6
Analysis of Variance
7
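Statistical Model
C “levels” of BRAND, R observations for each level.
The data are $Y_{11}, Y_{12}, \dots, Y_{1R}$ for level 1, $Y_{21}, \dots, Y_{2R}$ for level 2, and so on through $Y_{C1}, \dots, Y_{CR}$ for level C.
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, C, \quad j = 1, \dots, R$$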
8
where
$\mu$ = OVERALL AVERAGE
i = index for FACTOR (Brand) LEVEL
j = index for “replication”
$\tau_i$ = differential effect associated with the i-th level of X (Brand i) = $\mu_i - \mu$
$\varepsilon_{ij}$ = “noise” or “error” due to other factors associated with the (i,j)-th data value
$\mu_i$ = AVERAGE associated with the i-th level of X (Brand i)
$\mu$ = AVERAGE of the $\mu_i$’s
9
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$
By definition, $\sum_{i=1}^{C} \tau_i = 0$.
The experiment produces R x C data values $Y_{ij}$. The analysis produces estimates of $\mu, \tau_1, \dots, \tau_C$. (We can then get estimates of the $\varepsilon_{ij}$ by subtraction.)
10
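Let $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}$, etc., be the level means.
$$\bar{Y}_{\cdot\cdot} = \frac{1}{C}\sum_{i=1}^{C} \bar{Y}_{i\cdot} = \text{“GRAND MEAN”}$$
(assuming the same number of data points in each column; otherwise, $\bar{Y}_{\cdot\cdot}$ = mean of all the data)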
11
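MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
$\bar{Y}_{\cdot\cdot}$ estimates $\mu$
$\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$ estimates $\tau_i$ (= $\mu_i - \mu$) (for all i)
These estimates are based on Gauss’ (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE.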
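As a minimal sketch of these estimates, the Python snippet below computes the level means, the grand mean, and the estimated brand effects for the battery-lifetime example. The variable names (`data`, `level_means`, `grand_mean`, `effects`) are illustrative and do not come from the slides.

```python
# Battery lifetimes (hours) from the example: one list per brand, 3 replications each.
data = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    3: [8.6, 4.6, 4.2],
    4: [7.0, 5.0, 9.0],
    5: [4.2, 7.8, 6.6],
    6: [4.2, 4.2, 5.4],
    7: [7.8, 7.0, 9.8],
    8: [9.0, 7.4, 5.8],
}

# Level means: the estimate of the average lifetime for each brand.
level_means = {brand: sum(ys) / len(ys) for brand, ys in data.items()}

# Grand mean: with equal replication it is the mean of the level means.
grand_mean = sum(level_means.values()) / len(level_means)

# Estimated effects: level mean minus grand mean.
effects = {brand: m - grand_mean for brand, m in level_means.items()}

for brand in data:
    print(f"brand {brand}: mean = {level_means[brand]:.1f}, effect = {effects[brand]:+.1f}")
print(f"grand mean = {grand_mean:.1f}")
```

The estimated effects sum to zero (up to rounding), mirroring the constraint placed on the true effects in the model.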
12
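MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
If you insert the estimates into the MODEL,
(1) $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + \hat{\varepsilon}_{ij}$,
it follows that our estimate of $\varepsilon_{ij}$ is
(2) $\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$, called the residual.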
13
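Then, $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$, or
(3) $(Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$
TOTAL VARIABILITY in Y = Variability in Y associated with X + Variability in Y associated with all other factors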
14
If you square both sides of (3), and double sum both sides (over i and j), you get [after some unpleasant algebra, but lots of terms which “cancel”]
$$\sum_{i=1}^{C}\sum_{j=1}^{R} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = R\sum_{i=1}^{C} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{C}\sum_{j=1}^{R} (Y_{ij} - \bar{Y}_{i\cdot})^2$$
TSS = SSB + SSW (SSE), where
TSS = TOTAL SUM OF SQUARES
SSB = SUM OF SQUARES BETWEEN SAMPLES
SSW (SSE) = SUM OF SQUARES WITHIN SAMPLES
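The cancellation can be made explicit. As a short sketch, writing $\bar{Y}_{i\cdot}$ for the level means and $\bar{Y}_{\cdot\cdot}$ for the grand mean (notation assumed here), squaring the decomposition and summing leaves a single cross term, which vanishes because the deviations from each level mean sum to zero:

$$
\begin{aligned}
\sum_{i=1}^{C}\sum_{j=1}^{R} (Y_{ij}-\bar{Y}_{\cdot\cdot})^2
&= \sum_{i=1}^{C}\sum_{j=1}^{R} (\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})^2
 + \sum_{i=1}^{C}\sum_{j=1}^{R} (Y_{ij}-\bar{Y}_{i\cdot})^2
 + 2\sum_{i=1}^{C} (\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot}) \sum_{j=1}^{R} (Y_{ij}-\bar{Y}_{i\cdot}) \\
\sum_{j=1}^{R} (Y_{ij}-\bar{Y}_{i\cdot}) &= R\,\bar{Y}_{i\cdot} - R\,\bar{Y}_{i\cdot} = 0 \\
\sum_{i=1}^{C}\sum_{j=1}^{R} (\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})^2 &= R\sum_{i=1}^{C} (\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})^2
\end{aligned}
$$

The last line holds because the summand does not depend on j, which is where the factor R in the between-samples sum of squares comes from.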
15
ANOVA TABLE

SOURCE OF VARIABILITY            SSQ   DF          Mean square (M.S.)
Between samples (due to brand)   SSB   C - 1       MSB = SSB / (C - 1)
Within samples (due to error)    SSW   (R - 1)C    MSW = SSW / ((R - 1)C)
TOTAL                            TSS   RC - 1
16
Example: Y = LIFETIME (HOURS) by BRAND, 3 replications per level

BRAND        1     2     3     4     5     6     7     8
             1.8   4.2   8.6   7.0   4.2   4.2   7.8   9.0
             5.0   5.4   4.6   5.0   7.8   4.2   7.0   7.4
             1.0   4.2   4.2   9.0   6.6   5.4   9.8   5.8
Level mean   2.6   4.6   5.8   7.0   6.2   4.6   8.2   7.4
Grand mean = 5.8

$$SSB = 3\left([2.6 - 5.8]^2 + [4.6 - 5.8]^2 + \dots + [7.4 - 5.8]^2\right) = 3\,(23.04) = 69.12$$
17
SSW = ?
Brand 1: $(1.8 - 2.6)^2 = .64$, $(5.0 - 2.6)^2 = 5.76$, $(1.0 - 2.6)^2 = 2.56$; total 8.96
Brand 2: $(4.2 - 4.6)^2 = .16$, $(5.4 - 4.6)^2 = .64$, $(4.2 - 4.6)^2 = .16$; total .96
...
Brand 8: $(9.0 - 7.4)^2 = 2.56$, $(7.4 - 7.4)^2 = 0$, $(5.8 - 7.4)^2 = 2.56$; total 5.12
Total of (8.96 + .96 + ... + 5.12): SSW = 46.72
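The hand computation can be checked in a few lines. The sketch below recomputes SSB, SSW and TSS for the battery data and confirms that the total splits into the two parts; the names `ssb`, `ssw` and `tss` are chosen here for illustration.

```python
# Battery lifetimes (hours): one inner list per brand, 3 replications each.
data = [
    [1.8, 5.0, 1.0],
    [4.2, 5.4, 4.2],
    [8.6, 4.6, 4.2],
    [7.0, 5.0, 9.0],
    [4.2, 7.8, 6.6],
    [4.2, 4.2, 5.4],
    [7.8, 7.0, 9.8],
    [9.0, 7.4, 5.8],
]

all_values = [y for ys in data for y in ys]
grand_mean = sum(all_values) / len(all_values)
level_means = [sum(ys) / len(ys) for ys in data]

# Between-samples sum of squares: replications times squared deviation of each level mean.
ssb = sum(len(ys) * (m - grand_mean) ** 2 for ys, m in zip(data, level_means))

# Within-samples sum of squares: squared deviations from the observation's own level mean.
ssw = sum((y - m) ** 2 for ys, m in zip(data, level_means) for y in ys)

# Total sum of squares: squared deviations from the grand mean.
tss = sum((y - grand_mean) ** 2 for y in all_values)

print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, TSS = {tss:.2f}")
```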
18
ANOVA TABLE

Source of Variability   SSQ      df                  M.S.
BRAND                   69.12    7 = 8 - 1           9.87
ERROR                   46.72    16 = 2 (8)          2.92
TOTAL                   115.84   23 = (3 x 8) - 1
19
We can show:
$$E(MSB) = \sigma^2 + V_{COL}, \qquad V_{COL} = \frac{R}{C-1}\sum_{i=1}^{C} (\mu_i - \mu)^2$$
where “$V_{COL}$” is a MEASURE OF DIFFERENCES AMONG LEVEL MEANS, and
$$E(MSW) = \sigma^2$$
(assuming the $Y_{ij}$ follow $N(\mu_i, \sigma^2)$ and are independent)
20
$E(MSB) = \sigma^2 + V_{COL}$
$E(MSW) = \sigma^2$
This suggests that
if MSB / MSW > 1, there’s some evidence of non-zero $V_{COL}$, or “level of X affects Y”;
if MSB / MSW < 1, there is no evidence that $V_{COL} > 0$, or that “level of X affects Y”.
21
With
$H_0$: Level of X has no impact on Y
$H_1$: Level of X does have impact on Y,
we need MSB / MSW >> 1 to reject $H_0$.
22
More formally,
$H_0$: $\tau_1 = \tau_2 = \dots = \tau_C = 0$
$H_1$: not all $\tau_i = 0$
OR
$H_0$: $\mu_1 = \mu_2 = \dots = \mu_C$ (all level means are equal)
$H_1$: not all $\mu_i$ are EQUAL
23
The distribution of MSB / MSW = “$F_{calc}$” is the F-distribution with (C-1, (R-1)C) degrees of freedom, assuming $H_0$ true.
C = table value (critical value)
24
In our problem:

ANOVA TABLE
Source of Variability   SSQ     df   M.S.
BRAND                   69.12    7   9.87
ERROR                   46.72   16   2.92

$F_{calc}$ = 9.87 / 2.92 = 3.38
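As a small sketch, the table entries follow mechanically from the two sums of squares and the design sizes. The variable names below are illustrative; the inputs (SSB = 69.12, SSW = 46.72, 8 brands, 3 replications) are taken from the example.

```python
# Inputs from the battery example.
ssb, ssw = 69.12, 46.72      # sums of squares between and within samples
n_levels, n_reps = 8, 3      # C brands, R replications per brand

# Degrees of freedom.
df_between = n_levels - 1                # C - 1
df_within = (n_reps - 1) * n_levels      # (R - 1) C
df_total = n_reps * n_levels - 1         # RC - 1

# Mean squares and the F ratio.
msb = ssb / df_between
msw = ssw / df_within
f_calc = msb / msw

print(f"BRAND  SS={ssb:7.2f}  df={df_between:2d}  MS={msb:.2f}")
print(f"ERROR  SS={ssw:7.2f}  df={df_within:2d}  MS={msw:.2f}")
print(f"TOTAL  SS={ssb + ssw:7.2f}  df={df_total:2d}")
print(f"F = {f_calc:.2f}")
```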
25
F table: table 8 (7, 16 DF)
At $\alpha$ = .05, C = 2.66; $F_{calc}$ = 3.38 lies beyond it.
26
Hence, at $\alpha$ = .05, reject $H_0$. (i.e., conclude that level of BRAND does have an impact on battery lifetime.)
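Instead of comparing against a table value, the upper-tail probability of the observed F ratio can be computed directly. The sketch below is one way to do this using only the Python standard library: it integrates the F density numerically with Simpson's rule. The helper names `f_pdf` and `f_upper_tail` are hypothetical, and a statistics package would normally be used instead.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom.

    Returning 0 at x <= 0 is only correct for d1 > 2, which holds here.
    """
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)

def f_upper_tail(f_value, d1, d2, steps=2000):
    """P(F > f_value), as 1 minus a composite Simpson integral over [0, f_value]."""
    h = f_value / steps  # steps must be even for Simpson's rule
    total = f_pdf(0.0, d1, d2) + f_pdf(f_value, d1, d2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_pdf(k * h, d1, d2)
    return 1.0 - total * h / 3

f_calc = (69.12 / 7) / (46.72 / 16)   # MSB / MSW from the example
p_value = f_upper_tail(f_calc, 7, 16)
print(f"F = {f_calc:.2f}, upper-tail probability = {p_value:.3f}")
```

A tail probability below .05 leads to the same decision as finding that the F ratio exceeds the .05 table value.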
27
MINITAB INPUT

life   brand
1.8    1
5.0    1
1.0    1
4.2    2
5.4    2
4.2    2
...
9.0    8
7.4    8
5.8    8
28
ONE FACTOR ANOVA (MINITAB)
MINITAB: STAT >> ANOVA >> ONE-WAY

Analysis of Variance for life
Source   DF       SS     MS      F       P
brand     7    69.12   9.87   3.38   0.021
Error    16    46.72   2.92
Total    23   115.84

MS Error = 2.92 is the estimate of the common variance $\sigma^2$.
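One way to see why the error mean square estimates the common variance is to pool the sample variances of the individual brands. The sketch below does this for the battery data; `pooled_variance` is a name introduced here for illustration.

```python
from statistics import variance

# Battery lifetimes (hours): one inner list per brand.
data = [
    [1.8, 5.0, 1.0],
    [4.2, 5.4, 4.2],
    [8.6, 4.6, 4.2],
    [7.0, 5.0, 9.0],
    [4.2, 7.8, 6.6],
    [4.2, 4.2, 5.4],
    [7.8, 7.0, 9.8],
    [9.0, 7.4, 5.8],
]

# Sample variance within each brand (divisor n - 1).
group_variances = [variance(ys) for ys in data]

# Pooled variance: weight each group's variance by its degrees of freedom.
df_each = [len(ys) - 1 for ys in data]
pooled_variance = sum(df * v for df, v in zip(df_each, group_variances)) / sum(df_each)

print("within-brand variances:", [round(v, 2) for v in group_variances])
print(f"pooled variance = {pooled_variance:.2f}")
```

With equal replication the pooled variance is simply the average of the within-brand variances, and it equals SSW divided by its degrees of freedom.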
30
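Assumptions
MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
1.) The $\varepsilon_{ij}$ are independent random variables. (Check: run order plot)
2.) Each $\varepsilon_{ij}$ is normally distributed, with $E(\varepsilon_{ij}) = 0$ for all i, j. (Check: normality plot & test)
3.) $\sigma^2(\varepsilon_{ij})$ = constant for all i, j. (Check: residual plot & test)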
31
Diagnosis: Normality
Normal probability plot & normality test of residuals
The points on the normality plot must more or less follow a straight line to claim that the residuals are normally distributed. There are statistical tests to verify it formally. The ANOVA method we learn here is not sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.
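To illustrate what the normality plot compares, the sketch below builds its coordinates for the battery example: sorted residuals against standard normal scores. The plotting positions (k - 0.375) / (n + 0.25) are one common convention and an assumption here; Minitab may use a different one. The correlation between the two columns is only a rough numerical summary of straightness, not the formal normality test.

```python
from statistics import NormalDist, mean

# Battery lifetimes (hours): one inner list per brand.
data = [
    [1.8, 5.0, 1.0],
    [4.2, 5.4, 4.2],
    [8.6, 4.6, 4.2],
    [7.0, 5.0, 9.0],
    [4.2, 7.8, 6.6],
    [4.2, 4.2, 5.4],
    [7.8, 7.0, 9.8],
    [9.0, 7.4, 5.8],
]

# Residual = observation minus its own brand mean, sorted for plotting.
residuals = sorted(y - mean(ys) for ys in data for y in ys)
n = len(residuals)

# Standard normal scores at the assumed plotting positions.
normal = NormalDist()
scores = [normal.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = correlation(scores, residuals)
for score, res in zip(scores, residuals):
    print(f"{score:6.2f}  {res:6.2f}")
print(f"correlation of plot coordinates: {r:.3f}")
```

A correlation close to 1 is consistent with the points following a line.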
32
Minitab: Stat >> Basic Statistics >> Normality Test
33
Diagnosis: Constant Variances
Tests and residual plot: fitted values vs. residuals
The points on the residual plot must be more or less within a horizontal band to claim “constant variances”. There are statistical tests to verify it formally. The ANOVA method we learn here is not sensitive to the constant variances assumption. That is, slightly different variances within groups will not change our conclusions much.
34
Minitab: Stat >> ANOVA >> One-Way
35
Minitab: Stat >> ANOVA >> Test for Equal Variances
36
Diagnosis: Randomness/Independence
Run order plot: order vs. residuals
The run order plot must show no “systematic” patterns to claim “randomness”. There are statistical tests to verify it formally. The ANOVA method is sensitive to the randomness assumption. That is, even a small amount of dependence between data points can change our conclusions a lot.
37
Minitab: Stat >> ANOVA >> One-Way
38
KRUSKAL-WALLIS TEST (Non-Parametric Alternative)
$H_0$: The probability distributions are identical for each level of the factor
$H_1$: Not all the distributions are the same
39
BATTERY LIFETIME (hours)
(each column rank ordered, for simplicity)

Brand   A      B      C
        32     32     28
        30     32     21
        30     26     15
        29     26     15
        26     22     14
        23     20     14
        20     19     14
        19     16     11
        18     14      9
        12     14      8
Mean:   23.9   22.1   14.9   (here, irrelevant!!)
40
$H_0$: no difference in distribution among the three brands with respect to battery lifetime
$H_1$: At least one of the 3 brands differs in distribution from the others with respect to lifetime
41
Ranks in ( )

Brand   A           B           C
        32 (29)     32 (29)     28 (24)
        30 (26.5)   32 (29)     21 (18)
        30 (26.5)   26 (22)     15 (10.5)
        29 (25)     26 (22)     15 (10.5)
        26 (22)     22 (19)     14 (7)
        23 (20)     20 (16.5)   14 (7)
        20 (16.5)   19 (14.5)   14 (7)
        19 (14.5)   16 (12)     11 (3)
        18 (13)     14 (7)       9 (2)
        12 (4)      14 (7)       8 (1)

$T_1 = 197$, $T_2 = 178$, $T_3 = 90$
$n_1 = 10$, $n_2 = 10$, $n_3 = 10$
42
TEST STATISTIC:
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{T_j^2}{n_j} - 3(N+1)$$
$n_j$ = number of data values in column j
$N = \sum_{j=1}^{K} n_j$
K = number of columns (levels)
$T_j$ = SUM OF RANKS OF DATA IN COLUMN j, when all DATA are COMBINED
(There is a slight adjustment in the formula as a function of the number of ties in rank.)
43
$$H = \frac{12}{30\,(31)}\left[\frac{197^2}{10} + \frac{178^2}{10} + \frac{90^2}{10}\right] - 3\,(31) = 8.41$$
(with adjustment for ties, we get 8.46)
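The ranking and the statistic can be reproduced with a short sketch. The code below assigns average ranks to tied values across the combined data, sums the ranks per brand, and evaluates the unadjusted H formula; the helper `midranks` is a name introduced here, and the tie adjustment is deliberately left out.

```python
# Battery lifetimes (hours) for the three brands.
samples = {
    "A": [32, 30, 30, 29, 26, 23, 20, 19, 18, 12],
    "B": [32, 32, 26, 26, 22, 20, 19, 16, 14, 14],
    "C": [28, 21, 15, 15, 14, 14, 14, 11, 9, 8],
}

def midranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; give each the average of those ranks.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return rank_of

combined = [y for ys in samples.values() for y in ys]
rank_of = midranks(combined)
n_total = len(combined)

# Rank sum T_j for each brand, using ranks from the combined data.
rank_sums = [sum(rank_of[y] for y in ys) for ys in samples.values()]
sizes = [len(ys) for ys in samples.values()]

# Unadjusted Kruskal-Wallis statistic.
h = 12 / (n_total * (n_total + 1)) * sum(t * t / n for t, n in zip(rank_sums, sizes)) - 3 * (n_total + 1)

print("rank sums:", rank_sums)
print(f"H = {h:.2f}")
```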
44
What do we do with H?
We can show that, under $H_0$, H is well approximated by a $\chi^2$ distribution with df = K - 1.
Here, df = 2, and at $\alpha$ = .05, the critical value is $\chi^2_{2,\,.05}$ = 5.99.
Since H = 8.41 > 5.99, reject $H_0$; conclude that the lifetime distribution is NOT the same for all 3 BRANDS.
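For the special case of 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x / 2), so both the critical value and an approximate p-value can be checked without tables. The sketch below relies on that fact; it does not carry over to other degrees of freedom.

```python
import math

alpha = 0.05
h = 8.41   # Kruskal-Wallis statistic from the example (unadjusted for ties)

# With df = 2, P(chi-square > x) = exp(-x / 2).
# Setting this equal to alpha and solving for x gives the critical value.
critical_value = -2 * math.log(alpha)

# Approximate p-value of the observed statistic under the same formula.
p_value = math.exp(-h / 2)

print(f"critical value = {critical_value:.2f}")
print(f"p-value = {p_value:.3f}")
print("reject H0" if h > critical_value else "do not reject H0")
```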
45
Kruskal-Wallis Test: life versus brand
Minitab: Stat >> Nonparametrics >> Kruskal-Wallis

Kruskal-Wallis Test on life
brand      N   Median   AveRank       Z
1          3    1.800       4.5   -2.09
2          3    4.200       7.8   -1.22
3          3    4.600      11.8   -0.17
4          3    7.000      16.5    1.05
5          3    6.600      13.3    0.22
6          3    4.200       7.8   -1.22
7          3    7.800      20.0    1.96
8          3    7.400      18.2    1.48
Overall   24               12.5

H = 12.78  DF = 7  P = 0.078
H = 13.01  DF = 7  P = 0.072 (adjusted for ties)
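The two H values in this output can be reproduced from the eight-brand data. The sketch below computes the unadjusted statistic and then applies the usual tie correction, dividing by 1 - sum(t^3 - t) / (N^3 - N), where t runs over the sizes of the groups of tied values. That this is the correction Minitab applies is an assumption, although the numbers agree with the output above; the helper name `midranks` is illustrative.

```python
from collections import Counter

# Battery lifetimes (hours): one inner list per brand, 3 replications each.
data = [
    [1.8, 5.0, 1.0],
    [4.2, 5.4, 4.2],
    [8.6, 4.6, 4.2],
    [7.0, 5.0, 9.0],
    [4.2, 7.8, 6.6],
    [4.2, 4.2, 5.4],
    [7.8, 7.0, 9.8],
    [9.0, 7.4, 5.8],
]

def midranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return rank_of

combined = [y for ys in data for y in ys]
n_total = len(combined)
rank_of = midranks(combined)

# Rank sums and average ranks per brand.
rank_sums = [sum(rank_of[y] for y in ys) for ys in data]
avg_ranks = [t / len(ys) for t, ys in zip(rank_sums, data)]

# Unadjusted statistic.
h = 12 / (n_total * (n_total + 1)) * sum(t * t / len(ys) for t, ys in zip(rank_sums, data)) - 3 * (n_total + 1)

# Tie correction based on the sizes of the tied groups.
tie_sizes = Counter(combined).values()
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
h_adjusted = h / correction

print("average ranks:", [round(a, 1) for a in avg_ranks])
print(f"H = {h:.2f}, H adjusted for ties = {h_adjusted:.2f}")
```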