Analysis of Variance

Analysis of Variance (AOV) was originally devised within agricultural statistics for testing the yields of various crops under different nutrient regimes. Typically, a field is divided into a regular array of small plots of a fixed size, arranged in rows and columns. The yield $y_{i,j}$ within each plot is recorded.

If the field is of irregular width, different crops can be grown in each row and we can regard the yields as replicated results for each crop in turn. If the field is rectangular, we can grow different crops in each row and supply different nutrients in each column, and so study the interaction of two factors simultaneously. If the field is square, we can incorporate a third factor. By replicating the sampling over many fields, very sophisticated interactions can be studied.

One-Way Classification

Model: $y_{i,j} = \mu + \alpha_i + \varepsilon_{i,j}$, $\varepsilon_{i,j} \sim N(0, 1)$

where
$\mu$ = overall mean
$\alpha_i$ = effect of the $i$th factor
$\varepsilon_{i,j}$ = error term.

Hypothesis: $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_m$

[Figure: field of irregular width divided into plots, each plot labelled with its yield $y_{i,j}$]
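| Factor | Observations | Totals | Means |
|---|---|---|---|
| (1) | $y_{1,1},\ y_{1,2},\ y_{1,3},\ \dots,\ y_{1,n_1}$ | $T_1 = \sum_j y_{1,j}$ | $\bar{y}_{1.} = T_1 / n_1$ |
| (2) | $y_{2,1},\ y_{2,2},\ y_{2,3},\ \dots,\ y_{2,n_2}$ | $T_2 = \sum_j y_{2,j}$ | $\bar{y}_{2.} = T_2 / n_2$ |
| ... | ... | ... | ... |
| (m) | $y_{m,1},\ y_{m,2},\ y_{m,3},\ \dots,\ y_{m,n_m}$ | $T_m = \sum_j y_{m,j}$ | $\bar{y}_{m.} = T_m / n_m$ |

Overall mean: $\bar{y} = \sum_i \sum_j y_{i,j} / n$, where $n = \sum_i n_i$.

Decomposition of Sums of Squares:

$\sum_i \sum_j (y_{i,j} - \bar{y})^2 = \sum_i n_i (\bar{y}_{i.} - \bar{y})^2 + \sum_i \sum_j (y_{i,j} - \bar{y}_{i.})^2$

Total Variation ($Q$) = Between Factors ($Q_1$) + Residual Variation ($Q_E$)

Under $H_0$, with the unit error variance assumed in the model:

$Q \sim \chi^2_{n-1}$, $Q_1 \sim \chi^2_{m-1}$, $Q_E \sim \chi^2_{n-m}$, with $Q_1$ and $Q_E$ independent, so that

$\dfrac{Q_1 / (m - 1)}{Q_E / (n - m)} \sim F_{m-1,\ n-m}$

AOV Table:

| Variation | D.F. | Sums of Squares | Mean Squares | F |
|---|---|---|---|---|
| Between | $m - 1$ | $Q_1 = \sum_i n_i (\bar{y}_{i.} - \bar{y})^2$ | $MS_1 = Q_1 / (m - 1)$ | $MS_1 / MS_E$ |
| Residual | $n - m$ | $Q_E = \sum_i \sum_j (y_{i,j} - \bar{y}_{i.})^2$ | $MS_E = Q_E / (n - m)$ | |
| Total | $n - 1$ | $Q = \sum_i \sum_j (y_{i,j} - \bar{y})^2$ | $Q / (n - 1)$ | |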
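The one-way decomposition and F ratio above can be checked numerically. The following is a minimal sketch in plain Python, not a replacement for a statistics package; the function name `one_way_aov` and the toy yields are assumptions made up for illustration and do not come from the slides.

```python
def one_way_aov(groups):
    """One-way AOV for a list of groups (one list of observations per factor).

    Returns the sums of squares Q, Q1, QE, the degrees of freedom,
    the mean squares and the F ratio MS1 / MSE.
    """
    m = len(groups)                                  # number of factors
    n = sum(len(g) for g in groups)                  # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]  # y-bar_i. = T_i / n_i

    # Between-factors variation: sum_i n_i (y-bar_i. - y-bar)^2
    q1 = sum(len(g) * (gm - grand_mean) ** 2
             for g, gm in zip(groups, group_means))
    # Residual variation: sum_i sum_j (y_ij - y-bar_i.)^2
    qe = sum((y - gm) ** 2
             for g, gm in zip(groups, group_means) for y in g)
    # Total variation: sum_i sum_j (y_ij - y-bar)^2
    q = sum((y - grand_mean) ** 2 for g in groups for y in g)

    ms1 = q1 / (m - 1)
    mse = qe / (n - m)
    return {"Q": q, "Q1": q1, "QE": qe,
            "df": (m - 1, n - m),
            "MS1": ms1, "MSE": mse, "F": ms1 / mse}


# Hypothetical yields for three crops (illustrative numbers only).
toy_yields = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
table = one_way_aov(toy_yields)
print(table)
```

For these toy numbers the group means are 2, 3 and 6, so $Q_1 = 26$, $Q_E = 6$ and the F ratio is 13 on (2, 6) degrees of freedom. Comparing F with a critical value of the F distribution needs tables or a library, which this sketch leaves out.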
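Two-Way Classification

| | Factor I: 1 | 2 | 3 | ... | n | Means |
|---|---|---|---|---|---|---|
| Factor II: 1 | $y_{1,1}$ | $y_{1,2}$ | $y_{1,3}$ | ... | $y_{1,n}$ | $\bar{y}_{1.}$ |
| ... | ... | ... | ... | ... | ... | ... |
| m | $y_{m,1}$ | $y_{m,2}$ | $y_{m,3}$ | ... | $y_{m,n}$ | $\bar{y}_{m.}$ |
| Means | $\bar{y}_{.1}$ | $\bar{y}_{.2}$ | $\bar{y}_{.3}$ | ... | $\bar{y}_{.n}$ | $\bar{y}$ |

Decomposition of Sums of Squares:

$\sum_i \sum_j (y_{i,j} - \bar{y})^2 = n \sum_i (\bar{y}_{i.} - \bar{y})^2 + m \sum_j (\bar{y}_{.j} - \bar{y})^2 + \sum_i \sum_j (y_{i,j} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y})^2$

Total Variation = Between Rows + Between Columns + Residual Variation

Model: $y_{i,j} = \mu + \alpha_i + \beta_j + \varepsilon_{i,j}$, $\varepsilon_{i,j} \sim N(0, 1)$

$H_0$: All $\alpha_i$ are equal and all $\beta_j$ are equal.

AOV Table:

| Variation | D.F. | Sums of Squares | Mean Squares | F |
|---|---|---|---|---|
| Between Rows | $m - 1$ | $Q_1 = n \sum_i (\bar{y}_{i.} - \bar{y})^2$ | $MS_1 = Q_1 / (m - 1)$ | $MS_1 / MS_E$ |
| Between Columns | $n - 1$ | $Q_2 = m \sum_j (\bar{y}_{.j} - \bar{y})^2$ | $MS_2 = Q_2 / (n - 1)$ | $MS_2 / MS_E$ |
| Residual | $(m-1)(n-1)$ | $Q_E = \sum_i \sum_j (y_{i,j} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y})^2$ | $MS_E = Q_E / ((m-1)(n-1))$ | |
| Total | $mn - 1$ | $Q = \sum_i \sum_j (y_{i,j} - \bar{y})^2$ | $Q / (mn - 1)$ | |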
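The two-way table can be worked through in the same way. The sketch below assumes one observation per cell (no replication), as in the layout above; the function name `two_way_aov` and the 3 x 3 toy table are illustrative assumptions, not data from the slides.

```python
def two_way_aov(table):
    """Two-way AOV (one observation per cell) for an m x n table of yields.

    Rows index one factor and columns the other. Returns the sums of
    squares, degrees of freedom and the two F ratios against MSE.
    """
    m, n = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (m * n)
    row_means = [sum(row) / n for row in table]
    col_means = [sum(table[i][j] for i in range(m)) / m for j in range(n)]

    q1 = n * sum((r - grand) ** 2 for r in row_means)   # between rows
    q2 = m * sum((c - grand) ** 2 for c in col_means)   # between columns
    qe = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
             for i in range(m) for j in range(n))       # residual
    q = sum((table[i][j] - grand) ** 2
            for i in range(m) for j in range(n))        # total

    mse = qe / ((m - 1) * (n - 1))
    return {"Q": q, "Q1": q1, "Q2": q2, "QE": qe,
            "df": {"rows": m - 1, "cols": n - 1,
                   "residual": (m - 1) * (n - 1), "total": m * n - 1},
            "F_rows": (q1 / (m - 1)) / mse,
            "F_cols": (q2 / (n - 1)) / mse}


# Hypothetical 3 x 3 table of yields (illustrative numbers only).
toy_table = [[1, 2, 3],
             [2, 3, 5],
             [3, 5, 6]]
result = two_way_aov(toy_table)
print(result)
```

Because the residual is computed directly from the cell values rather than by subtraction, the printed result also lets a reader confirm that $Q = Q_1 + Q_2 + Q_E$ up to rounding.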
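Two-Way AOV [Example]

Data table: levels of Factor I across the columns and levels of Factor II down the rows, with Totals and Means in the margins.

| Variation | d.f. | S.S. | F |
|---|---|---|---|
| Rows | | | ** |
| Columns | | | |
| Residual | | | |
| Total | | | |

Note that many statistical packages, such as SPSS, are designed for analysing data recorded with variable values in columns and individual observations in rows. Thus the AOV data above would be written as a set of columns, one for the observed variable and one for each factor:

| Variable | Factor | Factor |
|---|---|---|

Normal Regression Model ($p$ independent variables) - AOV

Model: $y = \beta_0 + \sum_i \beta_i x_i + \varepsilon$, $\varepsilon \sim N(0, \sigma)$

| Source | d.f. | S.S. | M.S. | F |
|---|---|---|---|---|
| Regression | $p$ | SSR | MSR | MSR/MSE |
| Error | $n - p - 1$ | SSE | MSE | - |
| Total | $n - 1$ | SST | - | - |

$SSR = \sum_i (\hat{y}_i - \bar{y})^2$
$SSE = \sum_i (y_i - \hat{y}_i)^2$
$SST = \sum_i (y_i - \bar{y})^2$

Value of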
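The regression AOV table can be illustrated for the simplest case. The sketch below assumes a single independent variable ($p = 1$) fitted by ordinary least squares; the function name `regression_aov` and the toy data are hypothetical, and the general $p$-variable case would need a linear-algebra routine that is not shown here.

```python
def regression_aov(x, y):
    """AOV table for simple linear regression y = b0 + b1 * x (p = 1).

    Returns the fitted coefficients, SSR, SSE, SST, the mean squares
    and the F ratio MSR / MSE.
    """
    n, p = len(y), 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares estimates for one predictor.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ssr = sum((f - y_bar) ** 2 for f in fitted)             # regression
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))    # error
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total

    msr = ssr / p
    mse = sse / (n - p - 1)
    return {"b0": b0, "b1": b1,
            "SSR": ssr, "SSE": sse, "SST": sst,
            "df": (p, n - p - 1, n - 1),
            "MSR": msr, "MSE": mse, "F": msr / mse}


# Hypothetical data (illustrative numbers only).
x_toy = [1, 2, 3, 4, 5]
y_toy = [2, 4, 5, 4, 5]
print(regression_aov(x_toy, y_toy))
```

For these toy values SSR = 3.6 and SSE = 2.4, which sum to SST = 6, giving F = 4.5 on (1, 3) degrees of freedom.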
Latin Squares

We can incorporate a third source of variation in our models by the use of latin squares. A latin square is a design with exactly one instance of each "letter" in each row and column.

A B C D
B D A C
C A D B
D C B A

Model: $y_{i,j} = \mu + \alpha_i + \beta_j + \gamma_l + \varepsilon_{i,j}$, $\varepsilon_{i,j} \sim N(0, 1)$

where
$\alpha_i$ = row effects
$\beta_j$ = column effects
$\gamma_l$ = latin square component.

Decomposition of Sums of Squares (and degrees of freedom):

$\sum_i \sum_j (y_{i,j} - \bar{y})^2 = n \sum_i (\bar{y}_{i.} - \bar{y})^2 + n \sum_j (\bar{y}_{.j} - \bar{y})^2 + n \sum_l (\bar{y}_{l} - \bar{y})^2 + \sum_i \sum_j (y_{i,j} - \bar{y}_{i.} - \bar{y}_{.j} - \bar{y}_{l} + 2\bar{y})^2$

Total Variation = Between Rows + Between Columns + Latin Square Variation + Residual Variation

$(n^2 - 1) = (n - 1) + (n - 1) + (n - 1) + (n - 1)(n - 2)$

$H_0$: All $\alpha_i$ are equal, all $\beta_j$ are equal and all $\gamma_l$ are equal.

Experimental design is used heavily in management, educational and sociological applications. Its popularity is based on the fact that the underlying normality conditions are easy to justify, the concepts in the model are easy to understand and reliable software is available.
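As a rough illustration of the latin square design, the sketch below checks the defining property (each letter once per row and column) for the 4 x 4 square given above and then computes the four-way decomposition of the sums of squares. The helper names `is_latin_square` and `latin_square_aov` and the yields in `YIELDS` are assumptions introduced for this example; the yields are invented numbers, not data from the slides.

```python
def is_latin_square(square):
    """True if every row and every column contains each letter exactly once."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    return all({square[i][j] for i in range(n)} == symbols for j in range(n))


def latin_square_aov(square, y):
    """Sums of squares for an n x n latin square with yields y[i][j]."""
    n = len(square)
    grand = sum(sum(row) for row in y) / n ** 2
    row_means = [sum(y[i]) / n for i in range(n)]
    col_means = [sum(y[i][j] for i in range(n)) / n for j in range(n)]

    # Mean yield for each letter (each letter occurs n times).
    totals = {}
    for i in range(n):
        for j in range(n):
            totals[square[i][j]] = totals.get(square[i][j], 0.0) + y[i][j]
    letter_means = {letter: t / n for letter, t in totals.items()}

    q_rows = n * sum((r - grand) ** 2 for r in row_means)
    q_cols = n * sum((c - grand) ** 2 for c in col_means)
    q_letters = n * sum((v - grand) ** 2 for v in letter_means.values())
    q_res = sum((y[i][j] - row_means[i] - col_means[j]
                 - letter_means[square[i][j]] + 2 * grand) ** 2
                for i in range(n) for j in range(n))
    q_total = sum((y[i][j] - grand) ** 2
                  for i in range(n) for j in range(n))

    return {"Q": q_total, "Q_rows": q_rows, "Q_cols": q_cols,
            "Q_letters": q_letters, "Q_res": q_res,
            "df": {"rows": n - 1, "cols": n - 1, "letters": n - 1,
                   "residual": (n - 1) * (n - 2), "total": n * n - 1}}


# The 4 x 4 square from the text, one string per row.
SQUARE = ["ABCD", "BDAC", "CADB", "DCBA"]
# Hypothetical yields, one per plot (illustrative numbers only).
YIELDS = [[10, 12, 15, 11],
          [13, 9, 14, 16],
          [12, 11, 13, 10],
          [14, 15, 9, 12]]

print(is_latin_square(SQUARE))
print(latin_square_aov(SQUARE, YIELDS))
```

The residual is computed directly from its formula, so the output can be used to confirm that the four components add up to the total variation and that the degrees of freedom sum to $n^2 - 1$.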