1
Review
2
Fitting Equations to Data
3
The Multiple Linear Regression Model
An important statistical model
4
In Multiple Linear Regression we assume the following model
Y = b0 + b1X1 + b2X2 + ... + bpXp + e. This model is called the Multiple Linear Regression Model, where b0, b1, b2, ... , bp are unknown parameters and e is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation s.
5
The importance of the Linear model
1. It is the simplest form of a model in which each independent variable has some effect on the dependent variable Y. When fitting models to data one tries to find the simplest form of a model that still adequately describes the relationship between the dependent variable and the independent variables. The linear model is often the first model to be fitted and is only abandoned if it turns out to be inadequate.
6
2. In many instances a linear model is the most appropriate model to describe the dependence relationship between the dependent variable and the independent variables. This will be true if the dependent variable increases at a constant rate as any of the independent variables is increased while holding the other independent variables constant.
7
3. Many non-Linear models can be put into the form of a Linear model by appropriately transforming the dependent variable and/or any or all of the independent variables. This important fact (i.e. the fact that many non-linear models are linearizable) ensures the wide utility of the Linear model.
8
Summary of the Statistics used in Multiple Regression
9
The Least Squares Estimates: the values of b0, b1, ... , bp that minimize the residual sum of squares Σ(yi - ŷi)^2, where ŷi = the predicted value of yi.
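A minimal sketch of how these least squares estimates can be computed numerically; the data, coefficient values, and variable names below are simulated for illustration and are not from the slides.

```python
import numpy as np

# Simulated illustration: n observations on p = 2 independent variables.
rng = np.random.default_rng(0)
n = 50
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 5, n)
e = rng.normal(0, 2.0, n)                 # random disturbance, mean 0, sd s = 2
y = 3.0 + 1.5 * X1 - 0.8 * X2 + e         # Y = b0 + b1 X1 + b2 X2 + e

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), X1, X2])

# Least squares estimates: the values that minimize sum (yi - yhat_i)^2.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat                          # predicted values of yi
rss = np.sum((y - y_hat) ** 2)             # residual sum of squares (SSError)

print("estimates b0, b1, b2:", b_hat)
print("SSError:", rss)
```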
10
The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares (SSTotal) b) Residual Sum of Squares (SSError) c) Regression Sum of Squares (SSReg) Note: i.e. SSTotal = SSReg +SSError
11
The Analysis of Variance Table
Source      Sum of Squares   d.f.     Mean Square                    F
Regression  SSReg            p        SSReg/p = MSReg                MSReg/s2
Error       SSError          n-p-1    SSError/(n-p-1) = MSError = s2
Total       SSTotal          n-1
12
Testing for Hypotheses related to Multiple Regression.
13
When testing hypotheses there are two models of interest.
1. The Complete Model: Y = b0 + b1X1 + b2X2 + b3X3 + ... + bpXp + e. 2. The Reduced Model: the model implied by H0. You are interested in knowing whether the complete model can be simplified to the reduced model.
14
Some Comments: The complete model contains more parameters and will always provide a better fit to the data than the reduced model. The Residual Sum of Squares for the complete model will always be smaller than the R.S.S. for the reduced model. If the reduction in the R.S.S. is small as we change from the reduced model to the complete model, the reduced model should be accepted as providing an adequate fit. If the reduction in the R.S.S. is large as we change from the reduced model to the complete model, the reduced model should be rejected as providing an adequate fit and the complete model should be kept. These principles form the basis for the following test.
15
Testing the General Linear Hypothesis
The F-test for H0 is performed by carrying out two runs of a multiple regression package.
16
Run 1: Fit the complete model.
Resulting in the following Anova Table:
Source            df       Sum of Squares
Regression        p        SSReg
Residual (Error)  n-p-1    SSError
Total             n-1      SSTotal
17
Run 2: Fit the reduced model (q parameters eliminated)
Resulting in the following Anova Table:
Source            df         Sum of Squares
Regression        p-q        SS1Reg
Residual (Error)  n-p+q-1    SS1Error
Total             n-1        SSTotal
18
The Test: The test is carried out using the test statistic F = (SSH0/q)/s2, where SSH0 = SS1Error - SSError = SSReg - SS1Reg and s2 = SSError/(n-p-1). The test statistic, F, has an F-distribution with n1 = q d.f. in the numerator and n2 = n - p - 1 d.f. in the denominator if H0 is true.
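A hedged sketch of how the two runs and the F statistic above might be computed; the simulated data and the particular reduced model used here (dropping X2 and X3, so q = 2) are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 60, 3
X1, X2, X3 = rng.normal(size=(3, n))
y = 2.0 + 1.0 * X1 + rng.normal(0, 1.0, n)     # b2 = b3 = 0 in truth

def rss(design, y):
    """Residual sum of squares from a least squares fit."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ b) ** 2)

ones = np.ones(n)
# Run 1: the complete model (p = 3 independent variables).
ss_error = rss(np.column_stack([ones, X1, X2, X3]), y)
# Run 2: the reduced model implied by H0: b2 = b3 = 0 (q = 2 parameters eliminated).
ss1_error = rss(np.column_stack([ones, X1]), y)

q = 2
s2 = ss_error / (n - p - 1)                    # MSError from the complete model
ss_h0 = ss1_error - ss_error                   # SSH0 = SS1Error - SSError
F = (ss_h0 / q) / s2
p_value = stats.f.sf(F, q, n - p - 1)          # upper-tail F probability
print(F, p_value)
```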
19
The Anova Table for the Test:
Source                     df       Sum of Squares   Mean Square        F
Regression                 p-q      SS1Reg           SS1Reg/(p-q)       MS1Reg/s2
(for the reduced model)
Departure from H0          q        SSH0             SSH0/q             MSH0/s2
Residual (Error)           n-p-1    SSError          s2
Total                      n-1      SSTotal
20
The Use of Dummy Variables
21
In the examples so far the independent variables are continuous numerical variables.
Suppose that some of the independent variables are categorical. Dummy variables are artificially defined variables designed to convert a model including categorical independent variables to the standard multiple regression model.
22
Example: Comparison of Slopes of k Regression Lines with Common Intercept
23
Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable) Y is assumed to be linearly related to X with the slope dependent on treatment (population), while the intercept is the same for each treatment
24
The Model:
25
This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments. Dummy variables are variables that are artificially defined
26
In this case we define a new variable for each category of the categorical variable.
That is we will define Xi for each category of treatments as follows:
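The definition itself appeared as an image on the original slide; one plausible reconstruction, consistent with the complete model and variable list on the following slides, is the slope-dummy coding below. The symbols are assumptions, not a quote from the slide.

```latex
X_i =
\begin{cases}
X & \text{if the observation comes from treatment } i,\\
0 & \text{otherwise,}
\end{cases}
\qquad i = 1, 2, \dots, k .
```

With this coding the complete model Y = b0 + b1X1 + b2X2 + ... + bkXk + e has a common intercept b0 and a treatment-specific slope bi.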
27
Then the model can be written as follows:
The Complete Model: where
28
In this case Dependent Variable: Y Independent Variables: X1, X2, ... , Xk
29
In the above situation we would likely be interested in testing the equality of the slopes, namely the Null Hypothesis H0: b1 = b2 = ... = bk (q = k - 1).
30
The Reduced Model: Dependent Variable: Y. Independent Variable: X = X1 + X2 + ... + Xk.
31
Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)
32
Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable). Y is assumed to be linearly related to X with the intercept dependent on treatment (population), while the slope is the same for each treatment. Y is called the response variable, while X is called the covariate.
33
The Model:
34
In this case we define a new variable for each category of the categorical variable.
That is, we will define Xi for categories i = 1, 2, …, (k – 1) of treatments as follows:
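Again the definition itself was an image in the source; the standard indicator coding for k - 1 of the k treatments is sketched below as an assumption consistent with the variable list on the next slide.

```latex
X_i =
\begin{cases}
1 & \text{if the observation comes from treatment } i,\\
0 & \text{otherwise,}
\end{cases}
\qquad i = 1, 2, \dots, k-1 ,
```

so the complete model uses Y as the dependent variable and X1, ..., Xk-1 together with the covariate X as independent variables, giving treatment-specific intercepts and a common slope.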
35
Then the model can be written as follows:
The Complete Model: where
36
In this case Dependent Variable: Y Independent Variables: X1, X2, ... , Xk-1, X
37
In the above situation we would likely be interested in testing the equality of the intercepts. Namely the Null Hypothesis (q = k – 1)
38
Independent Variable: X
The Reduced Model: Dependent Variable: Y Independent Variable: X
39
The F Test
40
The Analysis of Covariance
This analysis can also be performed by using a package that can perform Analysis of Covariance (ANACOVA) The package sets up the dummy variables automatically
41
Another application of the use of dummy variables
The dependent variable, Y, is linearly related to X, but the slope changes at one or several known values of X (nodes).
42
Piecewise linear relationship: nodes x1, x2, …, xk, with slopes b1, b2, …, bk between successive nodes. The model:
43
Now define Etc.
44
Then the model can be written
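The written-out model was an image in the source; a standard way to express a piecewise-linear (broken-stick) model with known nodes x1 < x2 < ... < xk, offered here as a hedged reconstruction rather than the slide's exact formula, is:

```latex
Y = \beta_0 + \beta_1 X + \sum_{j=1}^{k} \delta_j \,(X - x_j)_+ + e,
\qquad (X - x_j)_+ = \max(0,\, X - x_j),
```

so the slope is b1 before the first node and changes by dj at node xj; each term (X - xj)+ is just another independent variable in a multiple regression.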
45
Selecting the Best Equation
Multiple Regression Selecting the Best Equation
46
Techniques for Selecting the "Best" Regression Equation
The best Regression equation is not necessarily the equation that explains most of the variance in Y (the highest R2). This equation will be the one with all the variables included. The best equation should also be simple and interpretable. (i.e. contain a small no. of variables). Simple (interpretable) & Reliable - opposing criteria. The best equation is a compromise between these two.
47
We will discuss several strategies for selecting the best equation:
1. All Possible Regressions: uses R2, s2, Mallows Cp, where Cp = RSSp/s2complete - [n - 2(p+1)]
2. "Best Subset" Regression: uses R2, Ra2, Mallows Cp
3. Backward Elimination
4. Stepwise Regression
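A sketch of the All Possible Regressions strategy with Mallows Cp as defined above; the simulated data and helper function names are illustrative assumptions, not from the slides.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, k = 80, 4
X = rng.normal(size=(n, k))                      # candidate predictors X1..X4
y = 1.0 + 2.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(0, 1.0, n)

def rss(cols):
    """Residual sum of squares of the model using the predictors in cols."""
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ b) ** 2)

s2_complete = rss(range(k)) / (n - k - 1)        # MSError from the full model

for p in range(k + 1):                           # subsets of each size p
    for cols in combinations(range(k), p):
        cp = rss(cols) / s2_complete - (n - 2 * (p + 1))
        print(sorted(j + 1 for j in cols), "Cp = %.2f" % cp, "p + 1 =", p + 1)
```

Runs whose Cp comes out close to p + 1 are the ones flagged as giving a reasonable fit.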
48
I All Possible Regressions
Suppose we have the p independent variables X1, X2, ..., Xp. Then there are 2^p subsets of variables.
49
Variables in Equation Model
no variables: Y = b0 + e
X1: Y = b0 + b1X1 + e
X2: Y = b0 + b2X2 + e
X3: Y = b0 + b3X3 + e
X1, X2: Y = b0 + b1X1 + b2X2 + e
X1, X3: Y = b0 + b1X1 + b3X3 + e
X2, X3: Y = b0 + b2X2 + b3X3 + e
and X1, X2, X3: Y = b0 + b1X1 + b2X2 + b3X3 + e
50
Use of R2: 1. Assume we carry out the 2^p runs, one for each of the subsets. Divide the runs into the following sets. Set 0: no variables. Set 1: one independent variable. ... Set p: p independent variables. 2. Order the runs in each set according to R2. 3. Examine the leaders in each set looking for consistent patterns, taking into account correlation between independent variables.
51
Example (k = 4): X1, X2, X3, X4. Variables in the leading runs, with 100 R2%:
Set 1: X… , …%
Set 2: X1, X… , …%; X1, X… , …%
Set 3: X1, X2, X… , …%
Set 4: X1, X2, X3, X4 , …%
Examination of the correlation coefficients reveals a high correlation between X1, X3 (r13 = ) and between X2, X4 (r24 = ). Best Equation: Y = b0 + b1X1 + b4X4 + e
52
Use of R2 Number of variables required, p, coincides with where R2 begins to level out
53
Use of the Residual Mean Square (RMS) (s2)
When all of the variables having a non-zero effect have been included in the model, the residual mean square is an estimate of s2. If "significant" variables have been left out then RMS will be biased upward.
54
No. of Variables p | RMS s2(p) | Average s2(p)
1: … , 82.39, …
2: 5.79*, 122.71, 7.48**, …
3: 5.35, 5.33, 5.65, …
4: …
* run X1, X2    ** run X1, X…    s2 approximately 6.
55
Use of s2 Number of variables required, p, coincides with where s2 levels out
56
Use of Mallows Cp If the equation with p variables is adequate then both s2complete and RSSp/(n-p-1) will be estimating s2. If "significant" variables have been left out then RMS will be biased upward.
57
Thus if we plot, for each run, Cp vs. p and look for Cp close to p + 1, we will be able to identify models giving a reasonable fit.
58
Run | Cp | p + 1
no variables: … | 1
1, 2, 3, 4: …, 142.5, 315.2, … | 2
12, 13, 14, …: 2.7, 198.1, 5.5, … | 3
23, 24, …: …, 138.2, 22.4 | 3
123, 124, 134, …: …, 3.0, 3.5, 7.5 | 4
59
Use of Cp Cp p Number of variables required, p, coincides with where Cp becomes close to p + 1
60
II "Best Subset" Regression
Similar to all possible regressions. If p, the number of variables, is large then the number of runs, 2^p, performed could be extremely large. In this algorithm the user supplies the value K and the algorithm identifies the best K subsets of X1, X2, ..., Xp for predicting Y.
61
III Backward Elimination
In this procedure the complete regression equation is determined containing all the variables - X1, X2, ..., Xp. Then variables are checked one at a time and the least significant is dropped from the model at each stage. The procedure is terminated when all of the variables remaining in the equation provide a significant contribution to the prediction of the dependent variable Y.
62
The precise algorithm proceeds as follows:
1. Fit a regression equation containing all variables in the equation.
63
2. A partial F-test is computed for each of the independent variables still in the equation.
The Partial F statistic: F = (RSS2 - RSS1)/MSE1, where RSS1 = the residual sum of squares with all variables that are presently in the equation, RSS2 = the residual sum of squares with one of the variables removed, and MSE1 = the Mean Square for Error with all variables that are presently in the equation.
64
3. The lowest partial F value is compared with Fa for some pre-specified a.
If FLowest < Fa then remove that variable and return to step 2. If FLowest > Fa then accept the equation as it stands.
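A sketch of the backward elimination algorithm described in steps 1-3 above; the simulated data and the a = 0.05 threshold are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 5))                        # X1..X5
y = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 1.0, n)

def rss(cols):
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ b) ** 2)

alpha = 0.05
in_model = list(range(X.shape[1]))                 # step 1: all variables in
while in_model:
    rss1 = rss(in_model)
    mse1 = rss1 / (n - len(in_model) - 1)
    # step 2: partial F for each variable still in the equation
    partial_F = {j: (rss([c for c in in_model if c != j]) - rss1) / mse1
                 for j in in_model}
    worst = min(partial_F, key=partial_F.get)      # step 3: lowest partial F
    F_crit = stats.f.ppf(1 - alpha, 1, n - len(in_model) - 1)
    if partial_F[worst] < F_crit:
        in_model.remove(worst)                     # drop it and repeat
    else:
        break                                      # all remaining are significant
print("variables kept:", [j + 1 for j in in_model])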
65
IV Stepwise Regression
In this procedure the regression equation starts with no variables in the model. Variables are then checked one at a time using the partial correlation coefficient as a measure of importance in predicting the dependent variable Y. At each stage the variable with the highest significant partial correlation coefficient is added to the model. Once this has been done, the partial F statistic is computed for all variables now in the model to check if any of the variables previously added can now be deleted.
66
This procedure is continued until no further variables can be added or deleted from the model.
The partial correlation coefficient for a given variable is the correlation between the given variable and the response when the present independent variables in the equation are held fixed. It is also the correlation between the given variable and the residuals computed from fitting an equation with the present independent variables in the equation.
67
Transformations
68
Transformations to Linearity
Many non-linear curves can be put into a linear form by appropriate transformations of either the dependent variable Y or some (or all) of the independent variables X1, X2, ... , Xp. This leads to the wide utility of the Linear model. We have seen that through the use of dummy variables, categorical independent variables can be incorporated into a Linear Model. We will now see that, through the technique of variable transformation, many examples of non-linear behaviour can also be converted to linear behaviour.
69
Polynomial Models: y = b0 + b1x + b2x^2 + b3x^3. Linear form: Y = b0 + b1X1 + b2X2 + b3X3. Variables: Y = y, X1 = x, X2 = x^2, X3 = x^3.
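A small sketch of this cubic polynomial model fitted as an ordinary linear model in the transformed variables; the data and coefficient values are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 3, 40)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.8 * x**3 + rng.normal(0, 0.3, x.size)

# Variables: X1 = x, X2 = x^2, X3 = x^3 -> linear form Y = b0 + b1 X1 + b2 X2 + b3 X3
design = np.column_stack([np.ones_like(x), x, x**2, x**3])
b_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print("b0, b1, b2, b3 estimates:", b_hat)
```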
70
Exponential Models with a polynomial exponent
Linear form: ln y = b0 + b1X1 + b2X2 + b3X3 + b4X4. Variables: Y = ln y, X1 = x, X2 = x^2, X3 = x^3, X4 = x^4.
71
Trigonometric Polynomial Models
y = b0 + g1cos(2πf1x) + d1sin(2πf1x) + … + gkcos(2πfkx) + dksin(2πfkx). Linear form: Y = b0 + g1C1 + d1S1 + … + gkCk + dkSk. Variables: Y = y, C1 = cos(2πf1x), S1 = sin(2πf1x), … , Ck = cos(2πfkx), Sk = sin(2πfkx).
72
Response Surface models
Dependent variable Y and two independent variables x1 and x2. (These ideas are easily extended to more than two independent variables.) The Model (a cubic response surface model): Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5 + b6X6 + b7X7 + b8X8 + b9X9 + e, where the Xj are the powers and cross-products of x1 and x2 up to degree three.
73
The Box-Cox Family of Transformations
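The family itself was shown as an image on this slide; its usual definition (a standard result, not copied from the slide) is:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[1ex]
\ln y, & \lambda = 0,
\end{cases}
\qquad y > 0 .
```

Here l = 1 leaves y essentially unchanged, l = 0 is the log transformation, and l = -1 is (up to sign and shift) the reciprocal; moving l down the staircase pulls in large values of y more strongly.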
74
The Transformation Staircase
75
The Bulging Rule x up y up y down x down
76
Nonlinearizable models
Non-Linear Models Nonlinearizable models
77
Non-Linear Growth models
Many models cannot be transformed into a linear model. The Mechanistic Growth Model. Equation: or (ignoring e) "rate of increase in Y" =
78
The Logistic Growth Model
Equation: or (ignoring e) “rate of increase in Y” =
79
The Gompertz Growth Model:
Equation: or (ignoring e) “rate of increase in Y” =
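The equations on these three growth-model slides were images; for reference, commonly used parameterizations of these curves (the symbols here are assumptions and may differ from the original slides) are, ignoring the error term:

```latex
\begin{aligned}
\text{Mechanistic: } & Y = \alpha\bigl(1 - \beta e^{-kx}\bigr), & \frac{dY}{dx} &= k(\alpha - Y)\\
\text{Logistic: }    & Y = \frac{\alpha}{1 + \beta e^{-kx}},    & \frac{dY}{dx} &= \frac{k}{\alpha}\,Y(\alpha - Y)\\
\text{Gompertz: }    & Y = \alpha\, e^{-\beta e^{-kx}},         & \frac{dY}{dx} &= k\,Y\ln(\alpha/Y)
\end{aligned}
```

In each case the "rate of increase in Y" depends on Y itself in a non-linear way, which is why these models cannot be linearized by transformation.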
80
Non-Linear Regression
81
Least Squares in the Nonlinear Case
82
Suppose that we have collected data on the response Y,
(y1, y2, ..., yn), corresponding to n sets of values of the independent variables X1, X2, ... and Xp: (x11, x21, ..., xp1), (x12, x22, ..., xp2), ... and (x1n, x2n, ..., xpn).
83
For a set of possible values q1, q2, ... , qq of the parameters, a measure of how well these values fit the model described in equation * above is the residual sum of squares function S(q1, q2, ... , qq) = Σ(yi - ŷi)^2, where ŷi is the predicted value of the response variable yi from the values of the p independent variables x1i, x2i, ..., xpi using the model in equation * and the values of the parameters q1, q2, ... , qq.
84
The Least Squares estimates of q1, q2, ... , qq are the values
which minimize S(q1, q2, ... , qq). It can be shown that if the error terms are independent and normally distributed with mean 0 and common variance s2, then the least squares estimates are also the maximum likelihood estimates of q1, q2, ... , qq.
85
To find the least squares estimates we need to determine when all the derivatives of S(q1, q2, ... , qq) with respect to each parameter q1, q2, ... and qq are equal to zero. This quite often leads to a set of equations in q1, q2, ... and qq that are difficult to solve even with one parameter and a comparatively simple nonlinear model. When more parameters are involved and the model is more complicated, the solution of the normal equations can be extremely difficult to obtain, and iterative methods must be employed.
86
Techniques for Estimating the Parameters of a Nonlinear System
In some nonlinear problems it is convenient to determine equations (the Normal Equations) for the least squares estimates , the values that minimize the sum of squares function, S(q1, q2, ... , qq). These equations are nonlinear and it is usually necessary to develop an iterative technique for solving them.
87
We shall mention three of these:
In addition to this approach there are several currently employed methods available for obtaining the parameter estimates by a routine computer calculation. We shall mention three of these: 1) Steepest descent, 2) Linearization, and 3) Marquardt's procedure.
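A hedged sketch using scipy's implementation of the Levenberg-Marquardt (Marquardt) procedure; the logistic-style model, the simulated data, and the starting values are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, alpha, beta, k):
    """A logistic-type growth curve f(x | theta1, theta2, theta3)."""
    return alpha / (1.0 + beta * np.exp(-k * x))

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 60)
y = f(x, 10.0, 8.0, 0.9) + rng.normal(0, 0.3, x.size)

# Initial estimates (intelligent guesses); method="lm" is Levenberg-Marquardt.
theta0 = [8.0, 5.0, 0.5]
theta_hat, cov = curve_fit(f, x, y, p0=theta0, method="lm")

residual_ss = np.sum((y - f(x, *theta_hat)) ** 2)   # S(theta_hat)
print("least squares estimates:", theta_hat)
print("residual sum of squares:", residual_ss)
```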
88
In each case an iterative procedure is used to find the least squares estimators.
That is, initial estimates for these values are determined. The procedure then finds successively better estimates that hopefully converge to the least squares estimates.
89
Steepest Descent: the procedure follows the steepest descent path on the sum of squares surface from the initial guess.
90
Linearization: The linearization (or Taylor series) method uses the results of linear least squares in a succession of stages. Suppose the postulated model is of the form: Y = f(X1, X2, ..., Xp| q1, q2, ... , qq) + e. Let there be initial values for the parameters q1, q2, ... , qq. These initial values may be intelligent guesses or preliminary estimates based on whatever information is available.
91
These initial values will, hopefully, be improved upon in the successive iterations to be described below. The linearization method approximates f(X1, X2, ..., Xp| q1, q2, ... , qq) with a linear function of q1, q2, ... , qq using a Taylor series expansion of f(X1, X2, ..., Xp| q1, q2, ... , qq) about the initial point and curtailing the expansion at the first derivatives. The method then uses the results of linear least squares to find values that provide the least squares fit of this linear function to the data.
92
The procedure is then repeated until the successive approximations converge, hopefully, to the least squares estimates.
93
Linearization: contours of RSS for the linear approximation, with the initial guess and the 2nd guess.
94
Contours of RSS for the linear approximation, with the initial guess, 2nd guess and 3rd guess.
95
Contours of RSS for the linear approximation, with the initial guess, 2nd, 3rd and 4th guesses.
96
The Examination of Residuals
97
The residuals are defined as the n differences ei = yi - ŷi, i = 1, 2, ..., n,
where yi is an observation and ŷi is the corresponding fitted value obtained by use of the fitted model.
98
Many of the statistical procedures used in linear and nonlinear regression analysis are based on certain assumptions about the random departures from the proposed model. Namely, the random departures are assumed i) to have zero mean, ii) to have a constant variance, s2, iii) to be independent, and iv) to follow a normal distribution.
99
Thus if the fitted model is correct,
the residuals should exhibit tendencies that tend to confirm the above assumptions, or at least, should not exhibit a denial of the assumptions.
100
The principal ways of plotting the residuals ei are:
1. Overall. 2. In time sequence, if the order is known. 3. Against the fitted values 4. Against the independent variables xij for each value of j In addition to these basic plots, the residuals should also be plotted 5. In any way that is sensible for the particular problem under consideration,
101
The residuals can be plotted in an overall plot in several ways.
102
1. The scatter plot. 2. The histogram. 3. The box-whisker plot.
4. The kernel density plot 5. a normal plot or a half normal plot on standard probability paper.
103
The standard statistical test for testing Normality are:
1. The Kolmogorov-Smirnov test. 2. The Chi-square goodness of fit test
104
Namely the random departures for observations that were taken at neighbouring points in time are autocorrelated. This autocorrelation can sometimes be seen in a time sequence plot. The following three graphs show a sequence of residuals that are respectively i) positively autocorrelated , ii) independent and iii) negatively autocorrelated.
105
i) Positively auto-correlated residuals
106
ii) Independent residuals
107
iii) Negatively auto-correlated residuals
108
There are several statistics and statistical tests that can also pick out autocorrelation amongst the residuals. The most common are: i) The Durbin Watson statistic ii) The autocorrelation function iii) The runs test
109
The Durbin Watson statistic :
The Durbin-Watson statistic, which is used frequently to detect serial correlation, is defined by the following formula: D = Σ(ei - ei+1)^2 / Σ ei^2. If the residuals are positively serially correlated the differences, ei - ei+1, will be stochastically small. Hence a small value of the Durbin-Watson statistic will indicate positive autocorrelation. Large values of the Durbin-Watson statistic on the other hand will indicate negative autocorrelation. Critical values for this statistic can be found in many statistical textbooks.
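A small sketch computing the Durbin-Watson statistic and the lag-k autocorrelation of a residual series; the residual vector here is simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
e = rng.normal(0, 1.0, 100)                    # stand-in for regression residuals

# Durbin-Watson: sum of squared successive differences over the sum of squares.
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def r_lag(e, k):
    """Autocorrelation of the residuals at lag k."""
    return np.sum(e[:-k] * e[k:]) / np.sum(e ** 2)

print("Durbin-Watson:", dw)                    # near 2 for independent residuals
print("r_1, r_2, r_3:", [round(r_lag(e, k), 3) for k in (1, 2, 3)])
```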
110
The autocorrelation function:
The autocorrelation function at lag k is defined by rk = Σ ei ei+k / Σ ei^2, where the sum in the numerator runs over residuals a distance k apart in time.
111
This statistic measures the correlation between residuals that occur a distance k apart in time.
One would expect that residuals that are close in time are more correlated than residuals that are separated by a greater distance in time. If the residuals are independent then rk should be close to zero for all values of k. A plot of rk versus k can be very revealing with respect to the independence of the residuals.
112
Some typical patterns of the autocorrelation function are given below:
Auto correlation pattern for independent residuals
113
Various Autocorrelation patterns for serially correlated residuals
115
Plot Against fitted values and the Predictor Variables Xij
If we "step back" from this diagram and the residuals behave in a manner consistent with the assumptions of the model we obtain the impression of a horizontal "band " of residuals which can be represented by the diagram below.
116
Individual observations lying considerably outside of this band indicate that the observation may be an outlier. An outlier is an observation that is not following the normal pattern of the other observations. Such an observation can have a considerable effect on the estimation of the parameters of a model. Sometimes the outlier has occurred because of a typographical error. If this is the case and it is detected then a correction can be made. If the outlier occurs for other (and more natural) reasons it may be appropriate to construct a model that incorporates the occurrence of outliers.
117
If our "step back" view of the residuals resembled any of those shown below we should conclude that assumptions about the model are incorrect. Each pattern may indicate that a different assumption may have to be made to explain the “abnormal” residual pattern. b) a)
118
Pattern a) indicates that the variance of the random departures is not constant (homogeneous) but increases as the value along the horizontal axis increases (time, or one of the independent variables). This indicates that a weighted least squares analysis should be used. The second pattern, b), indicates that the mean value of the residuals is not zero. This is usually because the model (linear or non-linear) has not been correctly specified: linear and quadratic terms that should have been included in the model have been omitted.
119
Example – Analysis of Residuals
Motor Vehicle Data Dependent = mpg Independent = Engine size, horsepower and weight
120
When a linear model was fit and residuals examined graphically the following plot resulted:
121
The pattern that we are looking for is:
122
The pattern that was found is:
This indicates a nonlinear relationship. This can be handled by adding polynomial terms (quadratic, cubic, quartic, etc.) of the independent variables or by transforming the dependent variable.
123
Performing the log transformation on the dependent variable (mpg) results in the following residual plot There still remains some non linearity
124
The log transformation
125
The Box-Cox transformations
Curves shown for l = 2, l = 1, l = 0 and l = -1.
126
The log (l = 0) transformation was not totally successful - try moving further down the staircase of the family of transformations (l = -0.5)
127
try moving a bit further down the staircase of the family of transformations (l = -1.0)
128
The results after deleting the outlier are given below:
129
This corresponds to the model
and
130
Checking normality with a P-P plot
131
Factorial Experiments
Analysis of Variance Experimental Design
132
k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y. k categorical independent variables A, B, C, … (the Factors). Let a = the number of categories of A, b = the number of categories of B, c = the number of categories of C, etc.
133
The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units N = nt = nabc….
134
The treatment combinations can be thought of as being arranged in a k-dimensional rectangular block (e.g. levels 1, 2, …, a of A by levels 1, 2, …, b of B).
135
136
Another way of representing the treatment combinations in a factorial experiment
137
Profile of a Factor: a plot of observation means vs. the levels of the factor. The levels of the other factors may be held constant or we may average over the other levels.
138
Definition: A factor is said to not affect the response if the profile of the factor is horizontal for all combinations of levels of the other factors: No change in the response when you change the levels of the factor (true for all combinations of levels of the other factors) Otherwise the factor is said to affect the response:
139
Definition: Two (or more) factors are said to interact if changes in the response when you change the level of one factor depend on the level(s) of the other factor(s). Profiles of the factor for different levels of the other factor(s) are not parallel Otherwise the factors are said to be additive . Profiles of the factor for different levels of the other factor(s) are parallel.
140
If two (or more) factors interact, each factor affects the response.
If two (or more) factors are additive, it still remains to be determined if the factors affect the response. In factorial experiments we are interested in determining which factors affect the response and which groups of factors interact.
141
Factor A has no effect B A
142
Additive Factors B A
143
Interacting Factors B A
144
The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors; all factors in the interaction affect the response and they interact. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
145
Anova Table entries. Sum of squares for the interaction (or main) effects being tested = (product of the sample size and the numbers of levels of the factors not included in the interaction) × (sum of squares of the effects being tested). Degrees of freedom = df = product of (number of levels - 1) over the factors included in the interaction.
146
Analysis of Variance (ANOVA) Table Entries (Two factors – A and B)
147
The ANOVA Table 2 Factor Experiment
148
Analysis of Variance (ANOVA) Table Entries (Three factors – A, B and C)
149
The ANOVA Table
150
The Completely Randomized Design is called balanced
If the number of observations per treatment combination is unequal the design is called unbalanced (resulting in a mathematically more complex analysis and computations). If for some of the treatment combinations there are no observations the design is called incomplete (some of the parameters - main effects and interactions - cannot be estimated).
151
Factorial Experiments
Analysis of Variance Experimental Design
152
k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y. k categorical independent variables A, B, C, … (the Factors). Let a = the number of categories of A, b = the number of categories of B, c = the number of categories of C, etc.
153
Objectives Determine which factors have some effect on the response Which groups of factors interact
154
The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units N = nt = nabc….
155
Factor A has no effect B A
156
Additive Factors B A
157
Interacting Factors B A
158
The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors; all factors in the interaction affect the response and they interact. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
159
The statistical model for the 3 factor Experiment
160
The statistical model for the 3 factor Experiment
161
Anova table for the 3 factor Experiment
Source   SS        df                     MS        F
A        SSA       a - 1                  MSA       MSA/MSError
B        SSB       b - 1                  MSB       MSB/MSError
C        SSC       c - 1                  MSC       MSC/MSError
AB       SSAB      (a - 1)(b - 1)         MSAB      MSAB/MSError
AC       SSAC      (a - 1)(c - 1)         MSAC      MSAC/MSError
BC       SSBC      (b - 1)(c - 1)         MSBC      MSBC/MSError
ABC      SSABC     (a - 1)(b - 1)(c - 1)  MSABC     MSABC/MSError
Error    SSError   abc(n - 1)             MSError
162
The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors; all factors in the interaction affect the response and they interact. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
163
Random Effects and Fixed Effects Factors
164
So far the factors that we have considered are fixed effects factors
This is the case if the levels of the factor are a fixed set of levels and the conclusions of any analysis are in relation to these levels. If the levels have been selected at random from a population of levels the factor is called a random effects factor. The conclusions of the analysis will be directed at the population of levels and not only the levels selected for the experiment.
165
The Anova table for the two factor model (A, B – fixed)
Source   SS        df              MS        F
A        SSA       a - 1           MSA       MSA/MSError
B        SSB       b - 1           MSB       MSB/MSError
AB       SSAB      (a - 1)(b - 1)  MSAB      MSAB/MSError
Error    SSError   ab(n - 1)       MSError
EMS = Expected Mean Square
166
The Anova table for the two factor model (A – fixed, B - random)
Source   SS        df              MS        F
A        SSA       a - 1           MSA       MSA/MSAB
B        SSB       b - 1           MSB       MSB/MSError
AB       SSAB      (a - 1)(b - 1)  MSAB      MSAB/MSError
Error    SSError   ab(n - 1)       MSError
Note: The divisor for testing the main effects of A is no longer MSError but MSAB.
167
Example: 3 factors A fixed, B, C random
Source EMS F A B C AB AC BC ABC Error
168
Example: 3 factors A , B fixed, C random
Source EMS F A B C AB AC BC ABC Error
169
Example: 3 factors A , B and C fixed
Source EMS F A B C AB AC BC ABC Error
170
Crossed and Nested Factors
171
The factors A, B are called crossed if every level of A appears with every level of B in the treatment combinations.
172
Factor B is said to be nested within factor A if the levels of B differ for each level of A.
173
Example: A company has a = 4 plants for producing paper
Example: A company has a = 4 plants for producing paper. Each plant has 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within a plant.
174
Machines (B) are nested within plants (A)
The model for a two factor experiment with B nested within A.
175
The ANOVA table
Source   SS        df          MS        F                p-value
A        SSA       a - 1       MSA       MSA/MSError
B(A)     SSB(A)    a(b - 1)    MSB(A)    MSB(A)/MSError
Error    SSError   ab(n - 1)   MSError
Note: SSB(A) = SSB + SSAB and a(b - 1) = (b - 1) + (a - 1)(b - 1)
176
Factorial Experiments
Analysis of Variance Factorial Experiments
177
k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y. k categorical independent variables A, B, C, … (the Factors). Let a = the number of categories of A, b = the number of categories of B, c = the number of categories of C, etc.
178
The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units N = nt = nabc….
179
Example: 3 factors A, B, C – all are random effects
Source EMS F A B C AB AC BC ABC Error
180
Example: 3 factors A fixed, B, C random
Source EMS F A B C AB AC BC ABC Error
181
Example: 3 factors A , B fixed, C random
Source EMS F A B C AB AC BC ABC Error
182
Example: 3 factors A , B and C fixed
Source EMS F A B C AB AC BC ABC Error
183
The Analysis of Covariance
ANACOVA
184
Multiple Regression Dependent variable Y (continuous)
Continuous independent variables X1, X2, …, Xp The continuous independent variables X1, X2, …, Xp are quite often measured and observed (not set at specific values or levels)
185
Analysis of Variance Dependent variable Y (continuous)
Categorical independent variables (Factors) A, B, C,… The categorical independent variables A, B, C,… are set at specific values or levels.
186
Analysis of Covariance
Dependent variable Y (continuous) Categorical independent variables (Factors) A, B, C,… Continuous independent variables (covariates) X1, X2, …, Xp
187
The Multiple Regression Model
188
The ANOVA Model
189
The ANACOVA Model
190
ANOVA Tables
191
The Multiple Regression Model
Source       S.S.       d.f.
Regression   SSReg      p
Error        SSError    n - p - 1
Total        SSTotal    n - 1
192
The ANOVA Model
Source          S.S.       d.f.
Main Effects
  A             SSA        a - 1
  B             SSB        b - 1
Interactions
  AB            SSAB       (a - 1)(b - 1)
  ⁞
Error           SSError    n - p - 1
Total           SSTotal    n - 1
193
The ANACOVA Model
Source          S.S.           d.f.
Covariates      SSCovariates   p
Main Effects
  A             SSA            a - 1
  B             SSB            b - 1
Interactions
  AB            SSAB           (a - 1)(b - 1)
  ⁞
Error           SSError        n - p - 1
Total           SSTotal        n - 1
194
Analysis of unbalanced Factorial Designs
Type I, Type II, Type III Sum of Squares
195
Sum of squares for testing an effect
modelComplete ≡ the model with the effect in. modelReduced ≡ the model with the effect out. The sum of squares for testing the effect is SSeffect = RSS(modelReduced) - RSS(modelComplete).
196
Type I SS Type I estimates of the sum of squares associated with an effect in a model are calculated when sums of squares for a model are calculated sequentially Example Consider the three factor factorial experiment with factors A, B and C. The Complete model Y = m + A + B + C + AB + AC + BC + ABC
197
A sequence of increasingly simpler models
Y = m + A + B + C + AB + AC + BC + ABC Y = m + A+ B + C + AB + AC + BC Y = m + A + B+ C + AB + AC Y = m + A + B + C+ AB Y = m + A + B + C Y = m + A + B Y = m + A Y = m
198
Type I S.S.
199
Type II SS: Type II sums of squares are calculated for an effect assuming that the Complete model contains every effect of equal or lesser order. The Reduced model has the effect removed.
200
The Complete models Y = m + A + B + C + AB + AC + BC + ABC (the three factor model) Y = m + A+ B + C + AB + AC + BC (the all two factor model) Y = m + A + B + C (the all main effects model) The Reduced models For a k-factor effect the reduced model is the all k-factor model with the effect removed
202
Type III SS The type III sum of squares is calculated by comparing the full model, to the full model without the effect.
203
Comments: When using the Type I sums of squares the effects are tested in a specified sequence resulting in an increasingly simpler model. The test is valid only if the null hypothesis (H0) has been accepted in the previous tests. When using the Type II sums of squares the test for a k-factor effect is valid only if the all k-factor model can be assumed. When using the Type III sums of squares the tests require neither of these assumptions.
204
An additional Comment When the completely randomized design is balanced (equal number of observations per treatment combination) then type I sum of squares, type II sum of squares and type III sum of squares are equal.
205
Experimental Designs The objective of Experimental design is to reduce the magnitude of random error resulting in more powerful tests to detect experimental effects
206
Other experimental designs
Randomized Block design Repeated Measures designs
207
The Randomized Block Design
208
Suppose a researcher is interested in how several treatments affect a continuous response variable (Y). The treatments may be the levels of a single factor or they may be the combinations of levels of several factors. Suppose we have available to us a total of N = nt experimental units to which we are going to apply the different treatments.
209
The Completely Randomized (CR) design randomly divides the experimental units into t groups of size n and randomly assigns a treatment to each group.
210
The Randomized Block Design
divides the group of experimental units into n homogeneous groups of size t. These homogeneous groups are called blocks. The treatments are then randomly assigned to the experimental units in each block - one treatment to a unit in each block.
211
The Completely Randomizes Design
Treats 1 2 3 … t Experimental units randomly assigned to treatments
212
Randomized Block Design
Blocks All treats appear once in each block
213
The Model for a randomized Block Experiment
yij = m + ti + bj + eij, i = 1, 2, …, t; j = 1, 2, …, b, where yij = the observation in the jth block receiving the ith treatment, m = overall mean, ti = the effect of the ith treatment, bj = the effect of the jth block, and eij = random error.
214
The Anova Table for a randomized Block Experiment
Source   S.S.   d.f.            M.S.   F         p-value
Treat    SST    t - 1           MST    MST/MSE
Block    SSB    b - 1           MSB    MSB/MSE
Error    SSE    (t - 1)(b - 1)  MSE
215
A randomized block experiment is assumed to be a two-factor experiment.
The factors are blocks and treatments. There is one observation per cell. It is assumed that there is no interaction between blocks and treatments. The degrees of freedom for the interaction are used to estimate error.
216
The ANOVA table for the Completely Randomized Design
The ANOVA table for the Completely Randomized Design
Source       df          Sum of Squares
Treatments   t - 1       SSTr
Error        t(n - 1)    SSError
Total        tn - 1      SSTotal
The ANOVA table for the Randomized Block Design
Source       df              Sum of Squares
Blocks       n - 1           SSBlocks
Treatments   t - 1           SSTr
Error        (t - 1)(n - 1)  SSError
Total        tn - 1          SSTotal
217
Comments: The error term for the Completely Randomized Design models variability in the response, y, between experimental units. The error term for the Randomized Block Design models variability in the response, y, between experimental units in the same block (hopefully this is considerably smaller than the error term of the Completely Randomized Design). The ability to detect treatment differences depends on the magnitude of the random error term.
218
Repeated Measures Designs
219
In a Repeated Measures Design
We have experimental units that may be grouped according to one or several factors (the grouping factors) Then on each experimental unit we have not a single measurement but a group of measurements (the repeated measures) The repeated measures may be taken at combinations of levels of one or several factors (The repeated measures factors)
220
Anova Table for a Repeated Measures Design
221
Latin Square Designs
222
Latin Square Designs. Selected Latin Squares:
3 x 3
A B C
B C A
C A B
4 x 4
A B C D    A B C D    A B C D    A B C D
B A D C    B C D A    B D A C    B A D C
C D B A    C D A B    C A D B    C D A B
D C A B    D A B C    D C B A    D C B A
5 x 5
A B C D E
B A E C D
C D A E B
D E B A C
E C D B A
6 x 6
A B C D E F
B F D C A E
C D E F B A
D A F E C B
E C A B F D
F E B A D C
223
Definition: A Latin square is a square array of objects (letters A, B, C, …) such that each object appears once and only once in each row and each column. Example - a 4 x 4 Latin Square:
A B C D
B C D A
C D A B
D A B C
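A short sketch that builds the cyclic t x t Latin square, like the 4 x 4 example above, and then randomizes rows, columns and letters; the randomization step is one common way such a design is generated in practice and is an assumption here, not part of the slide.

```python
import string
import numpy as np

def latin_square(t, seed=None):
    """Cyclic t x t Latin square with rows, columns and letters randomly permuted."""
    rng = np.random.default_rng(seed)
    base = (np.arange(t)[:, None] + np.arange(t)[None, :]) % t   # cyclic square
    base = base[rng.permutation(t)][:, rng.permutation(t)]       # shuffle rows/cols
    letters = rng.permutation(list(string.ascii_uppercase[:t]))  # relabel treatments
    return letters[base]

for row in latin_square(4, seed=7):
    print(" ".join(row))
```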
224
In a Latin square You have three factors:
Treatments (t) (letters A, B, C, …), Rows (t), Columns (t). The number of treatments = the number of rows = the number of columns = t. The row-column combinations are represented by cells in a t x t array. The treatments are assigned to row-column combinations using a Latin-square arrangement.
225
The Model for a Latin Experiment
yij(k) = m + ri + gj + tk + eij(k), i, j, k = 1, 2, …, t, where yij(k) = the observation in the ith row and the jth column receiving the kth treatment, m = overall mean, tk = the effect of the kth treatment, ri = the effect of the ith row, gj = the effect of the jth column, and eij(k) = random error. No interaction between rows, columns and treatments is assumed.
226
A Latin Square experiment is assumed to be a three-factor experiment.
The factors are rows, columns and treatments. It is assumed that there is no interaction between rows, columns and treatments. The degrees of freedom for the interactions are used to estimate error.
227
The Anova Table for a Latin Square Experiment
Source   S.S.    d.f.            M.S.    F           p-value
Treat    SSTr    t - 1           MSTr    MSTr/MSE
Rows     SSRow   t - 1           MSRow   MSRow/MSE
Cols     SSCol   t - 1           MSCol   MSCol/MSE
Error    SSE     (t - 1)(t - 2)  MSE
Total    SST     t^2 - 1
228
Experimental Design Of interest: to compare t treatments (the treatment combinations of one or several factors)
229
The Completely Randomized Design
Treats 1 2 3 … t Experimental units randomly assigned to treatments
230
The Model for a CR Experiment
yij = m + ti + eij, i = 1, 2, …, t; j = 1, 2, …, n, where yij = the jth observation receiving the ith treatment, m = overall mean, ti = the effect of the ith treatment, and eij = random error.
231
The Anova Table for a CR Experiment
Source   S.S.   d.f.       M.S.   F         p-value
Treat    SSTr   t - 1      MST    MST/MSE
Error    SSE    t(n - 1)   MSE
232
Randomized Block Design
Blocks: each of the t treatments (1, 2, 3, …, t) appears once in each block.
233
The Model for a RB Experiment
yij = m + ti + bj + eij, i = 1, 2, …, t; j = 1, 2, …, b, where yij = the observation in the jth block receiving the ith treatment, m = overall mean, ti = the effect of the ith treatment, bj = the effect of the jth block, and eij = random error. No interaction between blocks and treatments is assumed.
234
A Randomized Block experiment is assumed to be a two-factor experiment.
The factors are blocks and treatments. It is assumed that there is no interaction between blocks and treatments. The degrees of freedom for the interaction is used to estimate error.
235
The Anova Table for a randomized Block Experiment
Source   S.S.   d.f.            M.S.   F         p-value
Treat    SST    t - 1           MST    MST/MSE
Block    SSB    b - 1           MSB    MSB/MSE
Error    SSE    (t - 1)(b - 1)  MSE
236
The Latin square Design
Rows × Columns (t × t): all treatments appear once in each row and each column.
237
The Model for a Latin Experiment
yij(k) = m + ri + gj + tk + eij(k), i, j, k = 1, 2, …, t, where yij(k) = the observation in the ith row and the jth column receiving the kth treatment, m = overall mean, tk = the effect of the kth treatment, ri = the effect of the ith row, gj = the effect of the jth column, and eij(k) = random error. No interaction between rows, columns and treatments is assumed.
238
A Latin Square experiment is assumed to be a three-factor experiment.
The factors are rows, columns and treatments. It is assumed that there is no interaction between rows, columns and treatments. The degrees of freedom for the interactions are used to estimate error.
239
The Anova Table for a Latin Square Experiment
Source   S.S.    d.f.            M.S.    F           p-value
Treat    SSTr    t - 1           MSTr    MSTr/MSE
Rows     SSRow   t - 1           MSRow   MSRow/MSE
Cols     SSCol   t - 1           MSCol   MSCol/MSE
Error    SSE     (t - 1)(t - 2)  MSE
Total    SST     t^2 - 1
240
Graeco-Latin Square Designs
Mutually orthogonal Squares
241
Definition: A Graeco-Latin square consists of two Latin squares (one using the letters A, B, C, …, the other using Greek letters a, b, c, …) such that when the two Latin squares are superimposed on each other, each letter of one square appears once and only once with each letter of the other square. The two Latin squares are called mutually orthogonal. Example: a 7 x 7 Graeco-Latin Square
Aa Be Cb Df Ec Fg Gd
Bb Cf Dc Eg Fd Ga Ae
Cc Dg Ed Fa Ge Ab Bf
Dd Ea Fe Gb Af Bc Cg
Ee Fb Gf Ac Bg Cd Da
Ff Gc Ag Bd Ca De Eb
Gg Ad Ba Ce Db Ef Fc
242
Note: There exists at most (t –1) t x t Latin squares L1, L2, …, Lt-1 such that any pair are mutually orthogonal. e.g. It is possible that there exists a set of six 7 x 7 mutually orthogonal Latin squares L1, L2, L3, L4, L5, L6 .
243
The Model for a Greaco-Latin Experiment
j = 1,2,…, t i = 1,2,…, t k = 1,2,…, t l = 1,2,…, t yij(kl) = the observation in ith row and the jth column receiving the kth Latin treatment and the lth Greek treatment
244
yij(kl) = m + ri + gj + tk + ll + eij(kl), where m = overall mean, tk = the effect of the kth Latin treatment, ll = the effect of the lth Greek treatment, ri = the effect of the ith row, gj = the effect of the jth column, and eij(kl) = random error. No interaction between rows, columns, Latin treatments and Greek treatments is assumed.
245
A Graeco-Latin Square experiment is assumed to be a four-factor experiment.
The factors are rows, columns, Latin treatments and Greek treatments. It is assumed that there is no interaction between rows, columns, Latin treatments and Greek treatments. The degrees of freedom for the interactions are used to estimate error.
246
Graeco-Latin Square Experiment
The Anova Table for a Graeco-Latin Square Experiment
Source   S.S.    d.f.            M.S.    F           p-value
Latin    SSLa    t - 1           MSLa    MSLa/MSE
Greek    SSGr    t - 1           MSGr    MSGr/MSE
Rows     SSRow   t - 1           MSRow   MSRow/MSE
Cols     SSCol   t - 1           MSCol   MSCol/MSE
Error    SSE     (t - 1)(t - 3)  MSE
Total    SST     t^2 - 1
247
Incomplete Block Designs
248
Randomized Block Design
We want to compare t treatments Group the N = bt experimental units into b homogeneous blocks of size t. In each block we randomly assign the t treatments to the t experimental units in each block. The ability to detect treatment to treatment differences is dependent on the within block variability.
249
Comments: The within-block variability generally increases with block size; the larger the block size, the larger the within-block variability. For a larger number of treatments, t, it may not be appropriate or feasible to require the block size, k, to be equal to the number of treatments. If the block size, k, is less than the number of treatments (k < t) then all treatments cannot appear in each block. The design is called an Incomplete Block Design.
250
Comments regarding Incomplete block designs
When two treatments appear together in the same block it is possible to estimate the difference in treatment effects; the treatment difference is estimable. If two treatments do not appear together in the same block it may not be possible to estimate the difference in treatment effects; the treatment difference may not be estimable.
251
Example: Consider the block design with 6 treatments and 6 blocks of size two: (1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6). The treatment differences (1 vs 2, 1 vs 3, 2 vs 3, 4 vs 5, 4 vs 6, 5 vs 6) are estimable. If one of the treatments is in the group {1, 2, 3} and the other treatment is in the group {4, 5, 6}, the treatment difference is not estimable.
252
Definitions Two treatments i and i* are said to be connected if there is a sequence of treatments i0 = i, i1, i2, … iM = i* such that each successive pair of treatments (ij and ij+1) appear in the same block In this case the treatment difference is estimable. An incomplete design is said to be connected if all treatment pairs i and i* are connected. In this case all treatment differences are estimable.
253
Example: Consider the block design with 5 treatments and 5 blocks of size two: (1, 2), (2, 3), (1, 3), (4, 5), (1, 4). This incomplete block design is connected. All treatment differences are estimable. Some treatment differences are estimated with a higher precision than others.
254
Incomplete Block Designs
Balanced incomplete block designs Partially balanced incomplete block designs
255
Definition: An incomplete design is said to be a Balanced Incomplete Block Design if (i) all treatments appear in exactly r blocks (this ensures that each treatment is estimated with the same precision), and (ii) all treatment pairs i and i* appear together in exactly l blocks (this ensures that each treatment difference is estimated with the same precision; the value of l is the same for each treatment pair).
256
Some Identities
Let b = the number of blocks, t = the number of treatments, k = the block size, r = the number of times a treatment appears in the experiment, and l = the number of times a pair of treatments appears together in the same block.
bk = rt: both sides of this equation count the total number of experimental units in the experiment.
r(k - 1) = l(t - 1): both sides of this equation count the total number of experimental units that appear in the same blocks as a specific treatment.
257
BIB Design A Balanced Incomplete Block Design
(b = 15, k = 4, t = 6, r = 10, l = 6)
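Plugging this design into the two identities above gives a quick consistency check:

```latex
bk = 15 \times 4 = 60 = rt = 10 \times 6,
\qquad
r(k-1) = 10 \times 3 = 30 = \lambda(t-1) = 6 \times 5 .
```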
258
Anova Table for Incomplete Block Designs
259
Designs for Estimating
Carry-over (or Residual) Effects of Treatments Ref: “Design and Analysis of Experiments” Roger G. Petersen, Dekker 1985
260
The Cross-over or Simple Reversal Design
An Example A clinical psychologist wanted to test two drugs, A and B, which are intended to increase reaction time to a certain stimulus. He has decided to use n = 8 subjects selected at random and randomly divided into two groups of four. The first group will receive drug A first then B, while the second group will receive drug B first then A.
261
To conduct the trial he administered a drug to the individual, waited 15 minutes for absorption, applied the stimulus and then measured reaction time. The data and the design is tabulated below:
262
The Switch-back or Double Reversal Design
An Example: The following study was interested in the effect of concentrate type on the daily production of fat-corrected milk (FCM). Two concentrates were used: A - high fat; and B - low fat. Five test animals were then selected for each of the two sequence groups (A-B-A and B-A-B) in a switch-back design.
263
The data and the design is tabulated below:
One animal in the first group developed mastitis and was removed from the study.
264
The Incomplete Block Switch-back Design
An Example An insurance company was interested in buying a quantity of word processing machines for use by secretaries in the stenographic pool. The selection was narrowed down to three models (A, B, and C). A study was to be carried out , where the time to process a test document would be determined for a group of secretaries on each of the word processing models. For various reasons the company decided to use an incomplete block switch back design using n = 6 secretaries from the secretarial pool.
265
The data and the design is tabulated below:
A BIB incomplete block design with t = 3 treatments (A, B and C) and block size k = 2. Blocks: (A, B), (A, C), (B, C).
266
Designs for Estimating
Carry-over (or Residual) Effects of Treatments Ref: “Design and Analysis of Experiments” Roger G. Petersen, Dekker 1985
267
The Latin Square Change-Over (or Round Robin) Design
Selected Latin Squares Change-Over Designs (Balanced for Residual Effects) Period = Rows Columns = Subjects
268
Four Treatments
269
An Example An experimental psychologist wanted to determine the effect of three new drugs (A, B and C) on the time for laboratory rats to work their way through a maze. A sample of n= 12 test animals were used in the experiment. It was decided to use a Latin square Change-Over experimental design.
270
The data and the design is tabulated below:
271
Orthogonal Linear Contrasts
This is a technique for partitioning ANOVA sum of squares into individual degrees of freedom
272
Definition: Let x1, x2, ... , xp denote p numerical quantities computed from the data. These could be statistics or the raw observations. A linear combination of x1, x2, ... , xp is defined to be a quantity, L, computed in the following manner: L = c1x1 + c2x2 + ... + cpxp, where the coefficients c1, c2, ... , cp are predetermined numerical values.
273
Definition: Let m1, m2, ... , mp denote p means and c1, c2, ... , cp denote p coefficients such that c1 + c2 + ... + cp = 0. Then the linear combination L = c1m1 + c2m2 + ... + cpmp is called a linear contrast of the p means m1, m2, ... , mp.
274
Examples 1. A linear combination A linear contrast 2. A linear contrast L = m1 - 4 m2+ 6m3 - 4 m4 + m5 = (1) m1+ (-4) m2+ (6) m3 + (-4) m4 + (1) m5
275
Definition: Let A = a1m1 + a2m2 + ... + apmp and
B = b1m1 + b2m2 + ... + bpmp be two linear contrasts of the p means m1, m2, ... , mp. Then A and B are called Orthogonal Linear Contrasts if, in addition to a1 + a2 + ... + ap = 0 and b1 + b2 + ... + bp = 0, it is also true that a1b1 + a2b2 + ... + apbp = 0.
276
Example Let Note:
277
Definition: Let A = a1m1 + a2m2 + ... + apmp, B = b1m1 + b2m2 + ... + bpmp, ..., and L = l1m1 + l2m2 + ... + lpmp be a set of linear contrasts of the p means m1, m2, ... , mp. Then the set is called a set of Mutually Orthogonal Linear Contrasts if each linear contrast in the set is orthogonal to every other linear contrast in the set.
278
Theorem: The maximum number of linear contrasts in a set of Mutually Orthogonal Linear Contrasts of the quantities m1, m2, ... , mp is p - 1. p - 1 is called the degrees of freedom (d.f.) for comparing quantities m1, m2, ... , mp .
279
Comments Linear contrasts are making comparisons amongst the p values m1, m2, ... , mp Orthogonal Linear Contrasts are making independent comparisons amongst the p values m1, m2, ... , mp . The number of independent comparisons amongst the p values m1, m2, ... , mp is p – 1.
280
Definition: Let L = c1m1 + c2m2 + ... + cpmp denote a linear contrast of the p means,
where each mean, mi, is calculated from n observations.
281
Then the Sum of Squares for testing the Linear Contrast L, i.e. H0: L = 0 against HA: L ≠ 0, is defined to be SSL = n L^2 / (c1^2 + c2^2 + ... + cp^2).
282
The degrees of freedom (df) for testing the Linear Contrast L is 1, and the F-ratio for testing the Linear Contrast L is F = SSL / MSError.
283
To test if a set of mutually orthogonal linear contrasts are all zero,
i.e. H0: L1 = 0, L2 = 0, ... , Lk = 0, the Sum of Squares is SS = SSL1 + SSL2 + ... + SSLk, the degrees of freedom (df) is k, and the F-ratio is F = (SS/k) / MSError.
284
Theorem: Let L1, L2, ... , Lp-1 denote p - 1 mutually orthogonal linear contrasts for comparing the p means. Then the Sum of Squares for comparing the p means based on p - 1 degrees of freedom, SSBetween, satisfies SSBetween = SSL1 + SSL2 + ... + SSLp-1.
285
Comment Defining a set of Orthogonal Linear Contrasts for comparing the p means allows the researcher to "break apart" the Sum of Squares for comparing the p means, SSBetween, and make individual tests of each the Linear Contrast.
286
Techniques for constructing orthogonal linear contrasts
287
Comparing first k – 1 with kth
Consider the p values – y1, y2, y3, ... , yp L1 = 1st vs 2nd = y1 - y2 L2 = 1st , 2nd vs 3rd = ½ (y1 + y2) – y3 L3 = 1st , 2nd , 3rd vs 4th = 1/3 (y1 + y2 + y3) – y4 etc
288
Helmert contrasts
Contrast coefficients:
L1: -1  1  0  0  0
L2: -1 -1  2  0  0
L3: -1 -1 -1  3  0
L4: -1 -1 -1 -1  4
Contrast explanation:
L1: 2nd versus 1st
L2: 3rd versus 1st and 2nd
L3: 4th versus 1st, 2nd and 3rd
L4: 5th versus 1st, 2nd, 3rd and 4th
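A quick sketch verifying that these Helmert contrasts (for p = 5 means) sum to zero and are mutually orthogonal; the coefficient matrix simply repeats the table above.

```python
import numpy as np
from itertools import combinations

# Helmert contrast coefficients for p = 5 means (rows are L1..L4).
helmert = np.array([
    [-1,  1,  0,  0,  0],
    [-1, -1,  2,  0,  0],
    [-1, -1, -1,  3,  0],
    [-1, -1, -1, -1,  4],
])

print("row sums:", helmert.sum(axis=1))                       # each contrast sums to 0
for i, j in combinations(range(4), 2):
    print(f"L{i+1}.L{j+1} =", int(helmert[i] @ helmert[j]))   # all inner products are 0
```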
289
Comparing between Groups then within groups
Consider the p = 10 values y1, y2, y3, ... , y10. Suppose these 10 values are grouped:
Group 1: y1, y2, y3
Group 2: y4, y5, y6, y7
Group 3: y8, y9, y10
Comparison of Groups (2 d.f.):
L1 = Group 1 vs Group 2 = 1/3(y1 + y2 + y3) - 1/4(y4 + y5 + y6 + y7)
L2 = Groups 1 and 2 vs Group 3 = 1/7(y1 + y2 + y3 + y4 + y5 + y6 + y7) - 1/3(y8 + y9 + y10)
290
Comparison within Groups:
Within Group 1 (2 d.f.)
L3 = 1 vs 2 = y1 - y2
L4 = 1, 2 vs 3 = 1/2(y1 + y2) - y3
Within Group 2 (3 d.f.)
L5 = 4 vs 5 = y4 - y5
L6 = 4, 5 vs 6 = 1/2(y4 + y5) - y6
L7 = 4, 5, 6 vs 7 = 1/3(y4 + y5 + y6) - y7
Within Group 3 (2 d.f.)
L8 = 8 vs 9 = y8 - y9
L9 = 8, 9 vs 10 = 1/2(y8 + y9) - y10
291
Comparisons when Grouping is done on two different ways
Consider the p = ab values y11, y12, y13, ... , y1b, y21, y22, y23, ... , y2b, ... , ya1, ya2, ya3, ... , yab, arranged with Row Groups 1, 2, …, a and Column Groups 1, 2, 3, ..., b:
y11 y12 y13 ... y1b
y21 y22 y23 ... y2b
y31 y32 y33 ... y3b
⁞
ya1 ya2 ya3 ... yab
292
Comparison of Row Groups (a - 1 d.f.) R1 , R2 , R3 , ... , Ra -1
Comparison of Column Groups (b - 1 d.f.) C1 , C2 , C3 , ... , Cb -1 Interaction contrasts (a - 1) (b - 1) d.f. (RC)11 = R1 × C1 , (RC)12 = R1 × C2 , ... , (RC)a - 1,b - 1 = Ra - 1 × Cb – 1 Comment: The coefficients of (RC)ij = Ri × Cj are found by multiplying the coefficients of Ri with the coefficients of Cj.
293
Orthogonal Linear Contrasts
≠ Polynomial Regression
294
Let m1, m2, ... , mp denote p means and consider the first differences
Dmi = mi - mi-1 if m1 = m2 = ... = mp then Dmi = mi - mi-1 = 0 If the points (1, m1), (2, m2) … (p, mp) lie on a straight line with non-zero slope then Dmi = mi - mi-1 ≠ 0 but equal.
295
Consider the 2nd differences
D2mi = (mi - mi-1)-(mi -1 - mi-2) = mi - 2mi-1 + mi-2 If the points (1, m1), (2, m2) … (p, mp) lie on a straight line then D2mi = mi - 2mi-1 + mi-2 = 0 If the points (1, m1), (2, m2) … (p, mp) lie on a quadratic curve then D2mi = mi - 2mi-1 + mi-2 ≠ 0 but equal.
296
Consider the 3rd differences
D3mi = mi - 3mi-1 + 3mi-2 - mi-3 If the points (1, m1), (2, m2) … (p, mp) lie on a quadratic curve then D3mi = mi - 3mi-1 + 3mi-2 - mi-3 = 0 If the points (1, m1), (2, m2) … (p, mp) lie on a cubic curve then D3mi = mi - 3mi-1 + 3mi-2 - mi-3 ≠ 0 but equal.
297
Continuing, 4th differences, D4mi will be non- zero but equal if the points (1, m1), (2, m2) … (p, mp) lie on a quartic curve (4th degree). 5th differences, D5mi will be non- zero but equal if the points (1, m1), (2, m2) … (p, mp) lie on a quintic curve (5th degree). etc.
298
Let L = a2 Dm2 + a3 Dm3 + … + ap Dmp Q2 = b3 D2m3 + … + bp D2mp C = c4 D3m4 + … + cp D3mp Q4 = d5 D4m5+ … + dp D4mp etc. Where a2, …, ap, b1, …, bp, c1, … etc are chosen so that L, Q2, C, Q4, … etc are mutually orthogonal contrasts.
299
If the means are equal then
L = Q2 = C = Q4 = … = 0. If the means are linear then L ≠ 0 but Q2 = C = Q4 = … = 0. If the means are quadratic then Q2 ≠ 0 but C = Q4, … = 0. If the means are cubic then C ≠ 0 but Q4 = … = 0.
300
Orthogonal Linear Contrasts for Polynomial Regression
301
Orthogonal Linear Contrasts for Polynomial Regression
302
Multiple Testing Fisher’s L.S.D. (Least Significant Difference) Procedure Tukey’s Multiple comparison procedure Scheffe’s multiple comparison procedure
303
Multiple Testing – a Simple Example
Suppose we are interested in testing to see if two parameters (q1 and q2) are equal to zero. There are two approaches We could test each parameter separately H0: q1 = 0 against HA: q1 ≠ 0 , then H0: q2 = 0 against HA: q2 ≠ 0 We could develop an overall test H0: q1 = 0, q2= 0 against HA: q1 ≠ 0 or q2 ≠ 0
304
To test each parameter separately
then We might use the following test: then is chosen so that the probability of a Type I errorof each test is a.
305
To perform an overall test
H0: q1 = 0, q2= 0 against HA: q1 ≠ 0 or q2 ≠ 0 we might use the test is chosen so that the probability of a Type I error is a.
313
Multiple Comparison Tests
Post-hoc Tests Multiple Comparison Tests
314
Multiple Comparison Tests
Post-hoc Tests Multiple Comparison Tests
315
Multiple Testing Fisher’s L.S.D. (Least Significant Difference) Procedure Tukey’s Multiple comparison procedure Scheffe’s multiple comparison procedure
316
Suppose we have p means An F-test has revealed that there are significant differences amongst the p means We want to perform an analysis to determine precisely where the differences exist.
317
Example One –way ANOVA The F test – for comparing k means
Situation We have k normal populations Let mi and s denote the mean and standard deviation of population i. i = 1, 2, 3, … k. Note: we assume that the standard deviation for each population is the same. s1 = s2 = … = sk = s
318
We want to test against
319
Anova Table Mean Square F-ratio Between k - 1 SSBetween MSBetween
Source d.f. Sum of Squares Mean Square F-ratio Between k - 1 SSBetween MSBetween MSB /MSW Within N - k SSWithin MSWithin Total N - 1 SSTotal
320
Comments The F-test H0: m1 = m2 = m3 = … = mk against HA: at least one pair of means are different If H0 is accepted we know that all means are equal (not significantly different) If H0 is rejected we conclude that at least one pair of means is significantly different. The F – test gives no information to which pairs of means are different. One now can use two sample t tests to determine which pairs means are significantly different
321
Fishers LSD (least significant difference) procedure:
322
Fishers LSD (least significant difference) procedure:
Test H0: m1 = m2 = m3 = … = mk against HA: at least one pair of means are different, using the ANOVA F-test If H0 is accepted we know that all means are equal (not significantly different). Then stop in this case If H0 is rejected we conclude that at least one pair of means is significantly different, then follow this by using two sample t tests to determine which pairs means are significantly different
323
Tukey’s Multiple Comparison Test
324
Let denote the standard error of each Tukey's Critical Differences Two means are declared significant if they differ by more than this amount. = the tabled value for Tukey’s studentized range p = no. of means, n = df for Error
325
Table: Critical values for Tukey’s studentized Range distribution
329
Scheffe’s Multiple Comparison Test
330
Scheffe's Critical Differences (for Linear contrasts)
A linear contrast is declared significant if it exceeds this amount. = the tabled value for F distribution (p -1 = df for comparing p means, n = df for Error)
331
Scheffe's Critical Differences
(for comparing two means) Two means are declared significant if they differ by more than this amount.
332
Multiple Confidence Intervals
Tukey’s Multiple confidence intervals Scheffe’s Multiple confidence intervals One-at-a-time confidence intervals
333
Tukey’s Multiple confidence intervals
Comments Tukey’s Multiple confidence intervals One-at-a-time confidence intervals The probability that each of these interval contains mi – mj is 1 – a. The probability that all of these interval contains mi – mj is considerably lower than 1 – a Scheffe’s Multiple confidence intervals These intervals can be computed not only for simple differences in means, mi – mj , but also any other linear contrast, c1m1 + … + ckmk. The probability that all of these intervals contain its linear contrast is 1 – a
334
There are many multiple (post hoc) comparison procedures
Tukey’s Scheffe’, Duncan’s Multiple Range Neumann-Keuls etc Considerable controversy: “I have not included the multiple comparison methods of D.B. Duncan because I have been unable to understand their justification” H. Scheffe, Analysis of Variance
335
2k Experiments, Incomplete block designs for 2k experiments, fractional 2k experiments
336
Factorial Experiments
337
k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable y k Categorical independent variables A, B, C, … (the Factors) Let a = the number of categories of A b = the number of categories of B c = the number of categories of C etc. t = abc... Treatment combinations
338
The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors Total number of treatment combinations t = abc…. In the completely randomized design n experimental units (test animals , test plots, etc. are randomly assigned to each treatment combination. Total number of experimental units N = nt=nabc..
339
The ANOVA Table three factor experiment
340
If the number of factors, k, is large then it may be appropriate to keep the number of levels of each factor low (2 or 3) to keep the number of treatment combinations, t, small. t = 2k if a = b =c = ... =2 or t = 3k if a = b =c = ... =3 The experimental designs are called 2k and 3k designs
341
The ANOVA Table 23 experiment
Source Sum of Squares d.f. A SSA 1 B SSB C SSC AB SSAB AC SSAC BC SSBC ABC SSABC Error SSError 23(n – 1)
342
Notation for treatment combinations for 2k experiments
There are several methods for indicating treatment combinations in a 2k experiment and 3k experiment. A sequence of small letters representing the factors with subscripts (0,1 for 2k experiment and 0, 1, 2 for a 3k experiment) A sequence of k digits (0,1 for 2k experiment and 0, 1, 2 for a 3k experiment. A third way of representing treatment combinations for 2k experiment is by representing each treatment combination by a sequence of small letters. If a factor is at its high level, it’s letter is present. If a factor is at its low level, it’s letter is not present.
343
The 8 treatment combinations in a 23 experiment
(a0, b0, c0), (a1, b0, c0), (a0, b1, c0), (a0, b0, c1), (a1, b1, c0), (a1, b0, c1), (a0, b1, c1), (a1, b1, c1) 000, 100, 010, 001, 110, 101, 011, 111 1, a, b, c, ab, ac, bc, abc In the last way of representing the treatment combinations, a more natural ordering is: 1, a, b, ab, c, ac, bc, abc Using this ordering the 16 treatment combinations in a 24 experiment 1, a, b, ab, c, ac, bc, abc, d, da, db, dab, dc, dac, dbc, dabc
344
Notation for Linear contrasts treatment combinations in a 2k experiments
The linear contrast for 1 d.f. representing the Main effect of A LA = (1 + b + c + bc) – (a + ab +ac + abc) = comparison of the treatment combinations when A is at its low level with treatment combinations when A is at its high level. Note: LA = (1 - a) (1 + b) (1 + c) also LB = (1 + a) (1 - b) (1 + c) = (1 + a + c + ac) – (b + ab +bc + abc) LC = (1 + a) (1 + b) (1 - c) = (1 + a + b + ab) – (c + ca +cb + abc)
345
The linear contrast for 1 d.f. representing the interaction AB
LAB = (1 - a) (1 - b) (1 + c) = (1 + ab + c + abc) – (a + b +ac + bc) = comparison of the treatment combinations where A and B are both at a high level or both at a low level with treatment combinations either A is at its high level and B is at a low level or B is at its high level and A is at a low level. LAC = (1 - a) (1 + b) (1 - c) = (1 + ac + b + abc) – (a + c +ab + bc) LBC = (1 + a) (1 - b) (1 - c) = (1 + bc + a + abc) – (b + c +ac + ab)
346
The linear contrast for 1 d.f. representing the interaction ABC
LABC = (1 - a) (1 - b) (1 - c) = (1 + ab + ac + bc) – (a + b + c + abc) In general Linear contrasts are of the form: L = (1 ± a)(1 ± b)(1 ± c) etc We use minus (-) if the factor is present in the effect and plus (+) if the factor is not present.
347
+ × + = + - × + = - + × - = - - × - = +
The sign of coefficients of each treatment for each contrast (LA, LB, LAB, LC, LAC, LBC, LABC) is illustrated in the table below: For the main effects (LA, LB, LC) the sign is negative (-) if the letter is present in the treatment, positive (+) if the letter is not present. The interactions are products of the main effects: + × + = + - × + = - + × - = - - × - = +
348
Strategy for a single replication (n = 1)
The ANOVA Table 23 experiment Source Sum of Squares d.f. A SSA 1 B SSB C SSC AB SSAB AC SSAC BC SSBC ABC SSABC Error SSError 23(n – 1) If n = 1 then there is 0 df for estimating error. In practice the higher order interactions are not usually present. One makes this assumption and pools together these degrees of freedom to estimate Error
349
In a 7 factor experiment (each at two levels) there are 27 =128 treatments.
350
ANOVA table: Pool together these degrees of freedom to estimate Error
Source d.f. Main Effects 7 2-factor interactions 21 3-factor interactions 35 4-factor interactions 5-factor interactions 6-factor interactions 7-factor interaction 1 Pool together these degrees of freedom to estimate Error
351
Randomized Block design for 2k experiments
Blocks ... n 1 2 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc A Randomized Block Design for a 23 experiment
352
The ANOVA Table 23 experiment in RB design
Source Sum of Squares d.f. Blocks SSBlocks n - 1 A SSA 1 B SSB C SSC AB SSAB AC SSAC BC SSBC ABC SSABC Error SSError (23 – 1)(n – 1)
353
Incomplete Block designs for 2k experiments Confounding
354
... 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a
A Randomized Block Design for a 23 experiment Blocks ... n 1 2 3 4 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc
355
Incomplete Block designs for 2k experiments
A Randomized Block Design for a 2k experiment requires blocks of size 2k. The ability to detect treatment differences depends on the magnitude of the within block variability. This can be reduced by decreasing the block size. Blocks 1 2 Example: a 23 experiment in blocks of size 4 (1 replication). The ABC interaction is confounded with blocks ac 1 a bc ac b c ab
356
ac 1 a bc b ac c ab Blocks In this experiment the linear contrast 1 2
LABC = (1 + ab + ac + bc) – (a + b + c + abc) In addition to measuring the ABC interaction it is also subject to block to block differences. ac 1 a bc b ac The ABC interaction it is said to be confounded with block differences. c ab The linear contrasts LA = (1 + b + c + bc) – (a + ab +ac + abc) LB = (1 + a + c + ac) – (b + ab +bc + abc) LC = (1 + a + b + ab) – (c + ca +cb + abc LAB = (1 + ab + c + abc) – (a + b +ac + bc) LAC = (1 + ac + b + abc) – (a + c +ab + bc) LBC = (1 + bc + a + abc) – (b + c +ac + ab) are not subject to block to block differences
357
To confound an interaction (e. g
To confound an interaction (e. g. ABC) consider the linear contrast associated with the interaction: LABC = 1 + ab + ac + bc – a – b – c – abc Assign treatments associated with positive (+) coefficients to one block and treatments associated with negative (-) coefficients to the other block Blocks 1 2 ac 1 a bc b ac c ab
358
The ANOVA Table 23 experiment in incomplete design with 2 blocks of size 4
Source Sum of Squares d.f. Blocks SSBlocks 1 A SSA B SSB C SSC AB SSAB AC SSAC BC SSBC Total SSTotal 7
359
Confounding more than one interaction to further reduce block size
360
Example: contrasts for 23 experiment
If I want to confound ABC, one places the treatments associated with the positive sign (+) in one block and the treatments associated with the negative sign (-) in the other block. If I want to confound both BC and ABC, one chooses the blocks using the sign categories (+,+) (+,-) (-,+) (-,-) Comment: There will also be a third contrast that will also be confounded
361
1 a ab b bc abc ac c LABC = (1 + ab + ac + bc) – (a + b + c + abc) and
Example: a 23 experiment in blocks of size 2 (1 replicate). BC and ABC interaction is confounded in the four block. Block 1 Block 2 Block 3 Block 4 1 a ab b bc abc ac c LABC = (1 + ab + ac + bc) – (a + b + c + abc) and LBC = (1 + bc + a + abc) – (b + c +ac + ab) are confounded with blocks LA = (1 + b + c + bc) – (a + ab +ac + abc) is also confounded with blocks LB = (1 + a + c + ac) – (b + ab +bc + abc) LC = (1 + a + b + ab) – (c + ca +cb + abc LAB = (1 + ab + c + abc) – (a + b +ac + bc) LAC = (1 + ac + b + abc) – (a + c +ab + bc) are not subject to block to block differences
362
The ANOVA Table 23 experiment in incomplete design with 4 blocks of size 2 (ABC, BC and hence A confounded with blocks) Source Sum of Squares d.f. Blocks SSBlocks 3 B SSB 1 C SSC AB SSAB AC SSAC Total SSTotal 7 There are no degrees of freedom for Error. Solution: Assume either one or both of the two factor interactions are not present and use those degrees of freedom to estimate error
363
Rule: (for determining additional contrasts that are confounded with block)
“Multiply” the confounded interactions together. If a factor is raised to the power 2, delete it Example: Suppose that ABC and BC is confounded, then so also is (ABC)(BC) = AB2C2 = A. A better choice would be to confound AC and BC, then the third contrast that would be confounded would be (AC)(BC) = ABC2 = AB
364
If I want to confound both AC and BC, one chooses the blocks using the sign categories (+,+) (+,-) (-,+) (-,-). As noted this would also confound (AC)(BC) = ABC2 = AB. Block 1 Block 2 Block 3 Block 4 1 b a ab abc ac bc c
365
The ANOVA Table 23 experiment in incomplete design with 4 blocks of size 2 (AC, BC and hence AB confounded with blocks) Source Sum of Squares d.f. Blocks SSBlocks 3 A SSA 1 B SSB C SSC ABC SSABC Total SSTotal 7 There are no degrees of freedom for Error. Solution: Assume that the three factor interaction is not present and use this degrees of freedom to estimate error
366
Partial confounding
367
1 ab ab bc 1 a bc c b a c b abc ac ab ab ac 1 a b ac c ab bc
Example: a 23 experiment in blocks of size 4 (3 replicates). BC interaction is confounded in 1st replication. AC interaction is confounded in 2nd replication. AB interaction is confounded in 3rd replication. Replicate 1 BC confounded Replicate 2 AC confounded Replicate 3 AB confounded Block 1 Block 2 Block 3 Block 4 Block 5 Block 6 1 ab ab bc 1 a bc c b a c b abc ac ab ab ac 1 a b ac c ab bc The main effects (A, B and C) and the three factor interaction ABC can be estimated using all three replicates. The two factor interaction AB can be estimated using replicates 1 and 2, AC using replicates 1 and 3, BC using replicates 2 and 3,
368
The ANOVA Table Source Sum of Squares d.f. Reps SSBlocks 2
Blocks within Reps SSBlocks(Reps) 3 A SSA 1 B SSB C SSC AB SSAB Reps I,II AC SSAC Reps I,III BC SSBC Reps II,III ABC SSABC Error SSError 11 Total SSTotal 23
369
Example: A chemist is interested in determining how purity (Y) of a chemical product, depends on agitation rate (A), base component concentration (B) and concentration of reagent (C). He decides to use a 23 design. Only 4 runs can be done each day (Block) and he wanted to have 3 replications of the experiment. Replicate 1 BC confounded Replicate 2 AC confounded Replicate 3 AB confounded day 1 day 2 day 3 day 4 day 5 day 6 1 25 ab 43 abc 39 bc 38 26 a 34 c 30 b 29 37 32 42 ac 40 27 46 52 33 51 36
370
The ANOVA Table F0.05(1,11) = 4.84 and F0.01(1,11) = 9.65 Source
Sum of Squares d.f. Mean Square F Reps 111.00 2 55.50 Blocks within Reps 108.00 3 36.00 A 600.00 1 40.6** B 253.50 17.2** C 54.00 3.7(ns) AB (Reps I,II) 6.25 < 1 AC (Reps I,III) 1.00 BC (Reps II,III) ABC 13.50 Error 162.50 11 14.77 Total 23 F0.05(1,11) = and F0.01(1,11) = 9.65
371
Fractional Factorials
372
In a 2k experiment the number of experimental units required may be quite large even for moderate values of k. For k = 7, 27 = 128 and n27 = 256 if n = 2. Solution: Use only n = 1 replicate and use higher order interactions to estimate error. It is very rare thqt the higher order interactions are significant An alternative solution is to use ½ a replicate, ¼ a replicate, 1/8 a replicate etc. (i.e. a fractional replicate) 2k – 1 = ½ 2k design, 2k – 2 = ¼ 2k design
373
In a fractional factorial design, some ot he effects (interactions or main effects) may not be estimable. However it may be assumed that these effects are not present (in particular the higher order interactions)
374
Example: 24 experiment, A, B, C, D - contrasts
To construct a ½ replicate of this design in which the four factor interaction, ABCD, select only the treatment combinations where the coefficient is positive (+) for ABCD
375
The treatments and contrasts of a ½ 24 = 24-1 experiment
Notice that some of the contrasts are equivalent e.g. A and BCD, B and ACD, etc In this case the two contrasts are said to be aliased. Note the defining contrast, ABCD is aliased with the constant term I. To determine aliased contrasts multiply the any effect by the effect of the defining contrast e.g. (A)×(ABCD) = A2BCD = BCD
376
Aliased contrasts in a 24 -1 design with ABCD the defining contrast
A with BCD B with ACD C with ABD D with ABC AB with CD AC with BD AD with BC If an effect is aliased with another effect you can either estimate one or the other but not both
377
The ANOVA for a 24 -1 design with ABCD the defining contrast
Source df A 1 B C D AB AC AD Total 7
378
Example: ¼ 24 experiment To construct a ¼ replicate of the 24 design. Choose two defining contrasts, AB and CD, say and select only the treatment combinations where the coefficient is positive (+) for both AB and CD
379
The treatments and contrasts of a ¼ 24 = 24-2 experiment
Aliased contrasts I and AC and BD and ABCD A and C and ABD and BCD B and ABC and D and ACD AB and BC and AD and CD
380
The ANOVA for a 24 -1 design with ABCD the defining contrast
Source df A 1 B AB Total 3 There may be better choices for the defining contrasts The smaller fraction of a 2k design becomes more appropriate as k increases.
381
Response surfaces
382
We have a dependent variable y, independent variables x1, x2, ... ,xp
The general form of the model y = f(x1, x2, ... ,xp) + e Contour Map Surface Graph
383
The linear model y = b0 + b1x1 + b2x2 +... + bpxp + e Contour Map
Surface Graph
384
The quadratic response model
Linear terms Quadratic terms Contour Map Surface Graph
385
The quadratic response model (3 variables)
Linear terms Quadratic terms To fit this model we would be given the data on y, x1, x2, x3. From that data we would compute: We then regress y on x1, x2, x3, u4, u5, u6 , u7, u8 and u9
386
Exploration of a response surface The method of steepest ascent
387
Situation We have a dependent variable y, independent variables x1, x2, ... ,xp The general form of the model y = f(x1, x2, ... ,xp) + e We want to find the values of x1, x2, ... ,xp to maximize (or minmize) y. We will assume that the form of f(x1, x2, ... ,xp) is unknown. If it was known (e.g. A quadratic response model), we could estimate the parameters and determine the optimum values of x1, x2, ... ,xp using calculus
388
The method of steepest ascent:
Choose a region in the domain of f(x1, x2, ... ,xp) Collect data in that region Fit a linear model (plane) to that data. Determine from that plane the direction of its steepest ascent. (direction (b1, b2, ... ,bp )) Move off in the direction of steepest ascent collecting on y. Continue moving in that direction as long as y is increasing and stop when y stops increasing. Choose a region surrounding that point and return to step 2. Continue until the plane fitted to the data is horizontal Consider fitting a quadratic response model in this region and determining where it is optimal.
389
The method of steepest ascent:
domain of f(x1, x2, ... ,xp) Optimal (x1, x2) Initial region direction of steepest ascent. Final region 2nd region
390
Logistic regression
391
Recall the simple linear regression model:
y = b0 + b1x + e where we are trying to predict a continuous dependent variable y from a continuous independent variable x. This model can be extended to Multiple linear regression model: y = b0 + b1x1 + b2x2 + … + + bpxp + e Here we are trying to predict a continuous dependent variable y from a several continuous dependent variables x1 , x2 , … , xp .
392
Now suppose the dependent variable y is binary.
It takes on two values “Success” (1) or “Failure” (0) We are interested in predicting a y from a continuous dependent variable x. This is the situation in which Logistic Regression is used
393
The logisitic Regression Model
Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x. is called the odds ratio The ratio: This quantity will also increase with the value of x, ranging from zero to infinity. The quantity: is called the log odds ratio
394
Example: odds ratio, log odds ratio
Suppose a die is rolled: Success = “roll a six”, p = 1/6 The odds ratio The log odds ratio
395
The logisitic Regression Model
Assumes the log odds ratio is linearly related to x. i. e. : In terms of the odds ratio
396
The logisitic Regression Model
Solving for p in terms x. or
397
Interpretation of the parameter b0 (determines the intercept)
x
398
Interpretation of the parameter b1 (determines when p is 0
Interpretation of the parameter b1 (determines when p is 0.50 (along with b0)) p when x
399
Also when is the rate of increase in p with respect to x when p = 0.50
400
Interpretation of the parameter b1 (determines slope when p is 0.50 )
x
401
The data The data will for each case consist of
a value for x, the continuous independent variable a value for y (1 or 0) (Success or Failure) Total of n = 250 cases
403
Estimation of the parameters
The parameters are estimated by Maximum Likelihood estimation and require a statistical package such as SPSS
404
Using SPSS to perform Logistic regression
Open the data file:
405
Choose from the menu: Analyze -> Regression -> Binary Logistic
406
The following dialogue box appears
Select the dependent variable (y) and the independent variable (x) (covariate). Press OK.
407
Here is the output The Estimates and their S.E.
408
The parameter Estimates
409
Interpretation of the parameter b0 (determines the intercept)
Interpretation of the parameter b1 (determines when p is 0.50 (along with b0))
410
Another interpretation of the parameter b1
is the rate of increase in p with respect to x when p = 0.50
411
The Multiple Logistic Regression model
412
Here we attempt to predict the outcome of a binary response variable Y from several independent variables X1, X2 , … etc
413
Multiple Logistic Regression an example
In this example we are interested in determining the risk of infants (who were born prematurely) of developing BPD (bronchopulmonary dysplasia) More specifically we are interested in developing a predictive model which will determine the probability of developing BPD from X1 = gestational Age and X2 = Birthweight
414
For n = 223 infants in prenatal ward the following measurements were determined
X1 = gestational Age (weeks), X2 = Birth weight (grams) and Y = presence of BPD
415
The data
416
The results
417
Graph: Showing Risk of BPD vs GA and BrthWt
418
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
419
Multiway Frequency Tables
Two-Way A
420
Three -Way B A C
421
Three -Way C B A
422
four -Way B A C D
423
Log Linear Model
424
Three-way Frequency Tables
425
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table then in general Where – Side conditions hold
426
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table then in general
427
Hierarchical Log-linear models for categorical Data
For three way tables The hierarchical principle: If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction
428
1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables.
429
2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.
430
3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.
431
4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 3 with variables 1 and 2.
432
5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.
433
6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.
434
7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.
435
8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
436
9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) u23(j,k) + u123(i,j,k) Notation: [123] Description: No simplifying dependence structure.
437
Hierarchical Log-linear models for 3 way table
Description [1][2][3] Mutual independence between all three variables. [1][23] Independence of Variable 1 with variables 2 and 3. [2][13] Independence of Variable 2 with variables 1 and 3. [3][12] Independence of Variable 3 with variables 1 and 2. [12][13] Conditional independence between variables 2 and 3 given variable 1. [12][23] Conditional independence between variables 1 and 3 given variable 2. [13][23] Conditional independence between variables 1 and 2 given variable 3. [12][13] [23] Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable. [123] The saturated model
438
Comments Maximum Likelihood estimates can be computed for any hierarchical log linear model (i.e. more than 2 variables) In certain situations the equations need to be solved numerically For the saturated model (all interactions and main effects), the estimate of mijk… is xijk… .
439
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
440
Multiway Frequency Tables
Two-Way A
441
four -Way B A C D
442
Log Linear Model
443
Two- way table where The multiplicative form:
444
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table then in general where
445
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table then in general or the multiplicative form
446
Comments The log-linear model is similar to the ANOVA models for factorial experiments. The ANOVA models are used to understand the effects of categorical independent variables (factors) on a continuous dependent variable (Y). The log-linear model is used to understand dependence amongst categorical variables The presence of interactions indicate dependence between the variables present in the interactions
447
Hierarchical Log-linear models for categorical Data
For three way tables The hierarchical principle: If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction
448
1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables.
449
2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.
450
3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.
451
4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 3 with variables 1 and 2.
452
5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.
453
6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.
454
7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.
455
8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
456
9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) u23(j,k) + u123(i,j,k) Notation: [123] Description: No simplifying dependence structure.
457
Hierarchical Log-linear models for 3 way table
Description [1][2][3] Mutual independence between all three variables. [1][23] Independence of Variable 1 with variables 2 and 3. [2][13] Independence of Variable 2 with variables 1 and 3. [3][12] Independence of Variable 3 with variables 1 and 2. [12][13] Conditional independence between variables 2 and 3 given variable 1. [12][23] Conditional independence between variables 1 and 3 given variable 2. [13][23] Conditional independence between variables 1 and 2 given variable 3. [12][13] [23] Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable. [123] The saturated model
458
Goodness of Fit Statistics
These statistics can be used to check if a log-linear model will fit the observed frequency table
459
Goodness of Fit Statistics
The Chi-squared statistic The Likelihood Ratio statistic: d.f. = # cells - # parameters fitted We reject the model if c2 or G2 is greater than
460
Conditional Test Statistics
461
Suppose that we are considering two Log-linear models and that Model 2 is a special case of Model 1.
That is the parameters of Model 2 are a subset of the parameters of Model 1. Also assume that Model 1 has been shown to adequately fit the data.
462
In this case one is interested in testing if the differences in the expected frequencies between Model 1 and Model 2 is simply due to random variation] The likelihood ratio chi-square statistic that achieves this goal is:
463
Stepwise selection procedures
Forward Selection Backward Elimination
464
Forward Selection: Starting with a model that under fits the data, log-linear parameters that are not in the model are added step by step until a model that does fit is achieved. At each step the log-linear parameter that is most significant is added to the model: To determine the significance of a parameter added we use the statistic: G2(2|1) = G2(2) – G2(1) Model 1 contains the parameter. Model 2 does not contain the parameter
465
Backward Elimination:
Starting with a model that over fits the data, log-linear parameters that are in the model are deleted step by step until a model that continues to fit the model and has the smallest number of significant parameters is achieved. At each step the log-linear parameter that is least significant is deleted from the model: To determine the significance of a parameter deleted we use the statistic: G2(2|1) = G2(2) – G2(1) Model 1 contains the parameter. Model 2 does not contain the parameter
466
Modelling of response variables
Independent → Dependent
467
Logit Models To date we have not worried whether any of the variables were dependent of independent variables. The logit model is used when we have a single binary dependent variable.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.