Always be mindful of the kindness and not the faults of others.
One-Way ANOVA: Inferences about More than Two Population Means
• What is ANOVA?
• One-way ANOVA; F tests
• Pairwise comparisons: Bonferroni procedure
Analysis of Variance & One Factor Designs
Y = DEPENDENT VARIABLE (“yield”) (“response variable”) (“quality indicator”)
X = INDEPENDENT VARIABLE (a possibly influential FACTOR)
$\varepsilon$ = Many other factors (possibly, some we’re unaware of)
OBJECTIVE: To determine the impact of X on Y
Mathematical Model: $Y = f(X, \varepsilon)$, where $\varepsilon$ = (impact of) all factors other than X
Ex: Y = Forced expiratory volume in one second (liters)
X = Medical center (Johns Hopkins, Rancho Los Amigos, St. Louis)
$\varepsilon$ = Many other factors (possibly, some we’re unaware of)
Statistical Model
$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$,  $i = 1, \dots, n_j$;  $j = 1, \dots, C$
(Center is, of course, represented as “categorical”)

Data layout, one column per “LEVEL” OF Center:

Replication    1      2      ...    C
1              Y11    Y12    ...    Y1C
2              Y21    Y22    ...    Y2C
...            ...    ...    ...    ...
n              Yn1    Yn2    ...    YnC
Where
$\mu$ = OVERALL AVERAGE
j = index for FACTOR (center) LEVEL
i = index for “replication”
$\tau_j$ = Differential effect (response) associated with jth level of X
$\varepsilon_{ij}$ = “noise” or “error” associated with the (particular) (i,j)th data value.
Let $\mu_j$ = AVERAGE associated with jth level of X. Then $\tau_j = \mu_j - \mu$, and $\mu$ = AVERAGE of the $\mu_j$.
$\bar{Y}_{\cdot 1}$, $\bar{Y}_{\cdot 2}$, etc., are Column Means

                1        2        3        ...    C
                Y11      Y12      Y13      ...    Y1C
                Y21      Y22      Y23      ...    Y2C
                ...      ...      ...      ...    ...
                YR1      YR2      YR3      ...    YRC
Column mean     Ybar.1   Ybar.2   Ybar.3   ...    Ybar.C
$\bar{Y}_{\cdot\cdot} = \sum_{j=1}^{C} \bar{Y}_{\cdot j} / C$ = “GRAND MEAN” (assuming same # data points in each column) (otherwise, $\bar{Y}_{\cdot\cdot}$ = mean of all the data)
MODEL: $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
$\bar{Y}_{\cdot\cdot}$ estimates $\mu$
$\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}$ estimates $\tau_j$ ($= \mu_j - \mu$) (for all j)
These estimates are based on Gauss’ (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE
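MODEL: $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
If you insert the estimates into the MODEL,
(1) $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + \hat{\varepsilon}_{ij}$,
it follows that our estimate of $\varepsilon_{ij}$ is
(2) $\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{\cdot j}$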
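Then, $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$
or,
(3) $(Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$
TOTAL VARIABILITY in Y = Variability in Y associated with X + Variability in Y associated with all other factors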
If you square both sides of (3), and double sum both sides (over i and j), you get [after some unpleasant algebra, but lots of terms which “cancel”]
$$\sum_{j=1}^{C} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{C} n_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{C} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2$$
TSS (TOTAL SUM OF SQUARES) = SSB (SUM OF SQUARES BETWEEN COLUMNS) + SSW, also written SSE (SUM OF SQUARES WITHIN COLUMNS)
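The decomposition TSS = SSB + SSW can be checked numerically. The following is a minimal Python sketch on made-up data: the values in `groups` are hypothetical and are not the FEV1 measurements. It takes the grand mean as the mean of all the data, so unequal column sizes are covered too.

```python
# Hypothetical data: three columns (factor levels) of unequal size.
# These numbers are invented for illustration; they are NOT the FEV1 data.
groups = {
    "A": [3.1, 2.8, 3.4, 2.9],
    "B": [3.6, 3.9, 3.3],
    "C": [2.5, 2.9, 2.7, 3.0, 2.4],
}


def mean(values):
    return sum(values) / len(values)


all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)  # mean of all the data
col_means = {j: mean(ys) for j, ys in groups.items()}

# Total, between-columns, and within-columns sums of squares.
tss = sum((y - grand_mean) ** 2 for y in all_values)
ssb = sum(len(ys) * (col_means[j] - grand_mean) ** 2 for j, ys in groups.items())
ssw = sum((y - col_means[j]) ** 2 for j, ys in groups.items() for y in ys)

print(f"TSS = {tss:.4f}, SSB = {ssb:.4f}, SSW = {ssw:.4f}")
print("TSS equals SSB + SSW:", abs(tss - (ssb + ssw)) < 1e-9)
```

The cross-product term vanishes because deviations from a column mean sum to zero within each column, which is why the check holds for any data set.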
ANOVA TABLE

SOURCE OF VARIABILITY                     SSQ    DF       Mean square (M.S.)
Between Columns (due to center)           SSB    C - 1    MSB = SSB / (C - 1)
Within Columns (due to other factors)     SSW    N - C    MSW = SSW / (N - C)
TOTAL                                     TSS    N - 1
ANOVA TABLE

Source of Variability    SSQ                            df             M.S.
CENTER                   1.583                          2 = 3 - 1      0.791
ERROR                    14.480                         57 = 59 - 2    0.254
TOTAL                    16.063 (= 1.583 + 14.480)      59 = 60 - 1
We can show:
$E(MSB) = \sigma^2 + V_{COL}$
$E(MSW) = \sigma^2$
This suggests that
if $MSB/MSW > 1$, there’s some evidence of non-zero $V_{COL}$, or “level of X affects Y”;
if $MSB/MSW < 1$, there’s no evidence that $V_{COL} > 0$, or that “level of X affects Y”.
With
H0: Level of X has no impact on Y
H1: Level of X does have impact on Y,
we need $MSB/MSW \gg 1$ to reject H0.
More Formally,
H0: $\tau_1 = \tau_2 = \dots = \tau_C = 0$
H1: not all $\tau_j = 0$
OR (All column means are equal)
H0: $\mu_1 = \mu_2 = \dots = \mu_C$
H1: not all $\mu_j$ are EQUAL
The distribution of $MSB/MSW$ = “Fcalc” is the F-distribution with $(df_B, df_W)$ degrees of freedom, assuming H0 true.
$C_\alpha$ = Table Value
In our problem:
ANOVA TABLE

Source of Variability    SSQ                            df             M.S.     Fcalc
CENTER                   1.583                          2 = 3 - 1      0.791    3.12 = 0.791/0.254
ERROR                    14.480                         57 = 59 - 2    0.254
TOTAL                    16.063 (= 1.583 + 14.480)      59 = 60 - 1
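The mean squares and Fcalc follow mechanically from the sums of squares and the degrees of freedom. Here is a small Python sketch using the CENTER and ERROR sums of squares from the table, with N = 60 observations and C = 3 centers; the function name `anova_from_sums` is illustrative only.

```python
def anova_from_sums(ssb, ssw, n_total, n_groups):
    """Build the one-way ANOVA quantities from the two sums of squares."""
    df_between = n_groups - 1        # C - 1
    df_within = n_total - n_groups   # N - C
    msb = ssb / df_between
    msw = ssw / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "tss": ssb + ssw,            # TSS = SSB + SSW
        "msb": msb,
        "msw": msw,
        "f": msb / msw,
    }


# Sums of squares taken from the slide's table (CENTER and ERROR rows).
table = anova_from_sums(ssb=1.583, ssw=14.480, n_total=60, n_groups=3)
for name, value in table.items():
    print(f"{name:>10}: {value:.4f}" if isinstance(value, float) else f"{name:>10}: {value}")
```

Carrying full precision through the division gives an F ratio that rounds to the 3.12 reported on the slide.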
F table: Table A-5
$\alpha = .05$
$C_{.05} = 3.15$
Fcalc = 3.12 (2, 57 DF)
Hence, at $\alpha = .05$, Do Not Reject H0, i.e., conclude that centers don’t differ significantly on FEV1 at the 5% level. The P-value is .052, so the result is significant at the 6% level.
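When the numerator has exactly 2 degrees of freedom, as here, the upper tail of the F distribution has a closed form, $P(F > f) = (1 + 2f/d)^{-d/2}$ for denominator df $d$, so the P-value can be checked without a table or a statistics library. The sketch below relies on that special case only; the helper names are illustrative. The exact critical value for (2, 57) df comes out slightly above the tabled 3.15, presumably because the printed table lists a nearby denominator df rather than 57; the decision is the same either way.

```python
def f_upper_tail_df1_2(f_value, df_within):
    """P(F > f_value) for an F distribution with (2, df_within) df.

    Valid ONLY when the numerator df is 2 (three groups).
    """
    return (1.0 + 2.0 * f_value / df_within) ** (-df_within / 2.0)


def f_critical_df1_2(alpha, df_within):
    """Value c with P(F > c) = alpha, again only for numerator df = 2."""
    return (df_within / 2.0) * (alpha ** (-2.0 / df_within) - 1.0)


f_calc, df_within, alpha = 3.12, 57, 0.05

p_value = f_upper_tail_df1_2(f_calc, df_within)
critical = f_critical_df1_2(alpha, df_within)

print(f"P-value        = {p_value:.4f}")
print(f"critical value = {critical:.3f}")
print("Reject H0" if f_calc > critical else "Do not reject H0")
```

With more than three groups the numerator df is no longer 2 and a general F routine (or the table) is needed instead.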
Multiple Comparison Procedures
Once we reject H0: $\mu_1 = \mu_2 = \dots = \mu_C$ in favor of H1: NOT all $\mu$’s are equal, we don’t yet know the way in which they’re not all equal, but simply that they’re not all the same. If there are 4 columns, are all 4 $\mu$’s different? Are 3 the same and one different? If so, which one? etc.
Overall Type I Error Rate
We set up “$\alpha$” as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha = .05$; each test has type I error (rej H0 when it’s true) of .05. However,
P(at least one type I error in the 3 tests) = 1 - P(accept all 3, given H0 true) = $1 - (.95)^3 \approx .14$
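The same arithmetic shows how quickly the overall error rate grows with the number of tests. A short Python sketch, assuming independent tests as in the slide; the function name is illustrative.

```python
def familywise_error(alpha, n_tests):
    """P(at least one type I error) across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 3, 6, 10):
    print(f"{k:2d} tests at alpha = .05 -> overall rate {familywise_error(0.05, k):.3f}")
```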
Pairwise Comparisons
Bonferroni Correction: Do a series of pairwise t-tests, each at the specified $\alpha$ value divided by the # of comparisons.
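For the three centers there are 3 pairwise comparisons, so each t-test runs at .05/3. A minimal Python sketch of that bookkeeping follows; the function name `bonferroni` is illustrative, and the last line assumes independent tests only to show that the overall rate stays at or below the target.

```python
from math import comb


def bonferroni(alpha, n_groups):
    """Return (# of pairwise comparisons, per-comparison alpha)."""
    n_pairs = comb(n_groups, 2)      # C(C - 1)/2 pairs of columns
    return n_pairs, alpha / n_pairs


n_pairs, per_test_alpha = bonferroni(alpha=0.05, n_groups=3)
individual_conf = 100 * (1 - per_test_alpha)   # individual confidence level, in %

print(f"{n_pairs} comparisons, each at alpha = {per_test_alpha:.4f}")
print(f"individual confidence level = {individual_conf:.1f}%")
print(f"overall rate if independent = {1 - (1 - per_test_alpha) ** n_pairs:.4f}")
```

The individual confidence level of about 98.3% is the figure that appears in the heading of the Minitab comparison output.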
MINITAB INPUT

center    fev1
1         3.23
1         3.47
1         1.86
1         2.47
.         .
.         .
3         2.85
3         2.43
3         3.20
3         3.53
ONE FACTOR ANOVA (MINITAB) MINITAB: STAT>>ANOVA>>ONE-WAY Click for comparisons
Minitab Outputs
Fisher 98.3% Individual Confidence Intervals
All Pairwise Comparisons among Levels of center
Simultaneous confidence level = 95.58%

center = 1 subtracted from:
center    Lower      Center     Upper
2         -0.0049    0.4063     0.8176
3         -0.1215    0.2525     0.6266

center = 2 subtracted from:
center    Lower      Center     Upper
3         -0.5572    -0.1538    0.2496
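A pairwise difference is declared significant when its confidence interval excludes zero. The Python sketch below applies that reading to the three intervals in the Minitab output; the dictionary layout and variable names are assumptions made for illustration.

```python
# (center subtracted, center compared) -> (lower, estimate, upper),
# copied from the Minitab output.
intervals = {
    (1, 2): (-0.0049, 0.4063, 0.8176),
    (1, 3): (-0.1215, 0.2525, 0.6266),
    (2, 3): (-0.5572, -0.1538, 0.2496),
}

significant = {}
for (base, other), (lower, estimate, upper) in intervals.items():
    # The interval excludes zero when both limits share the same sign.
    significant[(base, other)] = lower > 0 or upper < 0
    verdict = "differ" if significant[(base, other)] else "do not differ significantly"
    print(f"center {other} - center {base}: "
          f"[{lower:.4f}, {upper:.4f}] -> {verdict}")
```

All three intervals contain zero, which agrees with the F test not rejecting H0 at the 5% level; the interval for center 2 versus center 1 only just reaches below zero.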