1
Review
2
Fitting Equations to Data
3
The Multiple Linear Regression Model
An important statistical model
4
In Multiple Linear Regression we assume the following model
Y = b0 + b1X1 + b2X2 + ... + bpXp + e. This model is called the Multiple Linear Regression Model, where b0, b1, b2, ... , bp are unknown parameters and e is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation s.
5
The importance of the Linear model
1. It is the simplest form of a model in which each independent variable has some effect on the dependent variable Y. When fitting models to data one tries to find the simplest form of a model that still adequately describes the relationship between the dependent variable and the independent variables. The linear model is often the first model to be fitted and is only abandoned if it turns out to be inadequate.
6
2. In many instances a linear model is the most appropriate model to describe the dependence relationship between the dependent variable and the independent variables. This will be true if the dependent variable increases at a constant rate as any of the independent variables is increased while holding the other independent variables constant.
7
3. Many non-Linear models can be put into the form of a Linear model by appropriately transforming the dependent variable and/or any or all of the independent variables. This important fact (i.e. the fact that many non-linear models are linearizable) ensures the wide utility of the Linear model.
8
Summary of the Statistics used in Multiple Regression
9
The Least Squares Estimates: the values of b0, b1, ... , bp that minimize the residual sum of squares Σ(yi - ŷi)^2, where ŷi = the predicted value of yi.
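A minimal sketch of how these least squares estimates can be computed numerically; the data, coefficient values, and variable names below are simulated for illustration and are not from the slides.

```python
import numpy as np

# Simulated illustration: n observations on p = 2 independent variables.
rng = np.random.default_rng(0)
n = 50
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 5, n)
e = rng.normal(0, 2.0, n)                 # random disturbance, mean 0, sd s = 2
y = 3.0 + 1.5 * X1 - 0.8 * X2 + e         # Y = b0 + b1 X1 + b2 X2 + e

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), X1, X2])

# Least squares estimates: the values that minimize sum (yi - yhat_i)^2.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat                          # predicted values of yi
rss = np.sum((y - y_hat) ** 2)             # residual sum of squares (SSError)

print("estimates b0, b1, b2:", b_hat)
print("SSError:", rss)
```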
10
The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares (SSTotal) b) Residual Sum of Squares (SSError) c) Regression Sum of Squares (SSReg) Note: i.e. SSTotal = SSReg +SSError
11
The Analysis of Variance Table
Source      Sum of Squares   d.f.     Mean Square                    F
Regression  SSReg            p        SSReg/p = MSReg                MSReg/s2
Error       SSError          n-p-1    SSError/(n-p-1) = MSError = s2
Total       SSTotal          n-1
12
Testing for Hypotheses related to Multiple Regression.
13
When testing hypotheses there are two models of interest.
1. The Complete Model: Y = b0 + b1X1 + b2X2 + b3X3 + ... + bpXp + e. 2. The Reduced Model: the model implied by H0. You are interested in knowing whether the complete model can be simplified to the reduced model.
14
Some Comments: The complete model contains more parameters and will always provide a better fit to the data than the reduced model. The Residual Sum of Squares for the complete model will always be smaller than the R.S.S. for the reduced model. If the reduction in the R.S.S. is small as we change from the reduced model to the complete model, the reduced model should be accepted as providing an adequate fit. If the reduction in the R.S.S. is large as we change from the reduced model to the complete model, the reduced model should be rejected as providing an adequate fit and the complete model should be kept. These principles form the basis for the following test.
15
Testing the General Linear Hypothesis
The F-test for H0 is performed by carrying out two runs of a multiple regression package.
16
Run 1: Fit the complete model.
Resulting in the following Anova Table:
Source            df       Sum of Squares
Regression        p        SSReg
Residual (Error)  n-p-1    SSError
Total             n-1      SSTotal
17
Run 2: Fit the reduced model (q parameters eliminated)
Resulting in the following Anova Table:
Source            df         Sum of Squares
Regression        p-q        SS1Reg
Residual (Error)  n-p+q-1    SS1Error
Total             n-1        SSTotal
18
The Test: The test is carried out using the test statistic F = (SSH0/q)/s2, where SSH0 = SS1Error - SSError = SSReg - SS1Reg and s2 = SSError/(n-p-1). The test statistic, F, has an F-distribution with n1 = q d.f. in the numerator and n2 = n - p - 1 d.f. in the denominator if H0 is true.
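A hedged sketch of how the two runs and the F statistic above might be computed; the simulated data and the particular reduced model used here (dropping X2 and X3, so q = 2) are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 60, 3
X1, X2, X3 = rng.normal(size=(3, n))
y = 2.0 + 1.0 * X1 + rng.normal(0, 1.0, n)     # b2 = b3 = 0 in truth

def rss(design, y):
    """Residual sum of squares from a least squares fit."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ b) ** 2)

ones = np.ones(n)
# Run 1: the complete model (p = 3 independent variables).
ss_error = rss(np.column_stack([ones, X1, X2, X3]), y)
# Run 2: the reduced model implied by H0: b2 = b3 = 0 (q = 2 parameters eliminated).
ss1_error = rss(np.column_stack([ones, X1]), y)

q = 2
s2 = ss_error / (n - p - 1)                    # MSError from the complete model
ss_h0 = ss1_error - ss_error                   # SSH0 = SS1Error - SSError
F = (ss_h0 / q) / s2
p_value = stats.f.sf(F, q, n - p - 1)          # upper-tail F probability
print(F, p_value)
```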
19
The Anova Table for the Test:
Source                     df       Sum of Squares   Mean Square        F
Regression                 p-q      SS1Reg           SS1Reg/(p-q)       MS1Reg/s2
(for the reduced model)
Departure from H0          q        SSH0             SSH0/q             MSH0/s2
Residual (Error)           n-p-1    SSError          s2
Total                      n-1      SSTotal
20
The Use of Dummy Variables
21
In the examples so far the independent variables are continuous numerical variables.
Suppose that some of the independent variables are categorical. Dummy variables are artificially defined variables designed to convert a model including categorical independent variables to the standard multiple regression model.
22
Example: Comparison of Slopes of k Regression Lines with Common Intercept
23
Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable) Y is assumed to be linearly related to X with the slope dependent on treatment (population), while the intercept is the same for each treatment
24
The Model:
25
This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments. Dummy variables are variables that are artificially defined
26
In this case we define a new variable for each category of the categorical variable.
That is we will define Xi for each category of treatments as follows:
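The definition itself appeared as an image on the original slide; one plausible reconstruction, consistent with the complete model and variable list on the following slides, is the slope-dummy coding below. The symbols are assumptions, not a quote from the slide.

```latex
X_i =
\begin{cases}
X & \text{if the observation comes from treatment } i,\\
0 & \text{otherwise,}
\end{cases}
\qquad i = 1, 2, \dots, k .
```

With this coding the complete model Y = b0 + b1X1 + b2X2 + ... + bkXk + e has a common intercept b0 and a treatment-specific slope bi.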
27
Then the model can be written as follows:
The Complete Model: where
28
In this case Dependent Variable: Y Independent Variables: X1, X2, ... , Xk
29
In the above situation we would likely be interested in testing the equality of the slopes, namely the Null Hypothesis H0: b1 = b2 = ... = bk (q = k - 1).
30
The Reduced Model: Dependent Variable: Y. Independent Variable: X = X1 + X2 + ... + Xk.
31
Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)
32
Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable). Y is assumed to be linearly related to X with the intercept dependent on treatment (population), while the slope is the same for each treatment. Y is called the response variable, while X is called the covariate.
33
The Model:
34
In this case we define a new variable for each category of the categorical variable.
That is, we will define Xi for categories i = 1, 2, …, (k – 1) of treatments as follows:
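Again the definition itself was an image in the source; the standard indicator coding for k - 1 of the k treatments is sketched below as an assumption consistent with the variable list on the next slide.

```latex
X_i =
\begin{cases}
1 & \text{if the observation comes from treatment } i,\\
0 & \text{otherwise,}
\end{cases}
\qquad i = 1, 2, \dots, k-1 ,
```

so the complete model uses Y as the dependent variable and X1, ..., Xk-1 together with the covariate X as independent variables, giving treatment-specific intercepts and a common slope.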
35
Then the model can be written as follows:
The Complete Model: where
36
In this case Dependent Variable: Y Independent Variables: X1, X2, ... , Xk-1, X
37
In the above situation we would likely be interested in testing the equality of the intercepts. Namely the Null Hypothesis (q = k – 1)
38
Independent Variable: X
The Reduced Model: Dependent Variable: Y Independent Variable: X
39
The F Test
40
The Analysis of Covariance
This analysis can also be performed by using a package that can perform Analysis of Covariance (ANACOVA) The package sets up the dummy variables automatically
41
Another application of the use of dummy variables
The dependent variable, Y, is linearly related to X, but the slope changes at one or several known values of X (nodes).
42
Piecewise linear relationship: nodes x1, x2, …, xk, with slopes b1, b2, …, bk between successive nodes. The model:
43
Now define Etc.
44
Then the model can be written
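The written-out model was an image in the source; a standard way to express a piecewise-linear (broken-stick) model with known nodes x1 < x2 < ... < xk, offered here as a hedged reconstruction rather than the slide's exact formula, is:

```latex
Y = \beta_0 + \beta_1 X + \sum_{j=1}^{k} \delta_j \,(X - x_j)_+ + e,
\qquad (X - x_j)_+ = \max(0,\, X - x_j),
```

so the slope is b1 before the first node and changes by dj at node xj; each term (X - xj)+ is just another independent variable in a multiple regression.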
45
Selecting the Best Equation
Multiple Regression Selecting the Best Equation
46
Techniques for Selecting the "Best" Regression Equation
The best Regression equation is not necessarily the equation that explains most of the variance in Y (the highest R2). This equation will be the one with all the variables included. The best equation should also be simple and interpretable. (i.e. contain a small no. of variables). Simple (interpretable) & Reliable - opposing criteria. The best equation is a compromise between these two.
47
We will discuss several strategies for selecting the best equation:
1. All Possible Regressions: uses R2, s2, Mallows Cp, where Cp = RSSp/s2complete - [n - 2(p+1)]
2. "Best Subset" Regression: uses R2, Ra2, Mallows Cp
3. Backward Elimination
4. Stepwise Regression
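A sketch of the All Possible Regressions strategy with Mallows Cp as defined above; the simulated data and helper function names are illustrative assumptions, not from the slides.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, k = 80, 4
X = rng.normal(size=(n, k))                      # candidate predictors X1..X4
y = 1.0 + 2.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(0, 1.0, n)

def rss(cols):
    """Residual sum of squares of the model using the predictors in cols."""
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ b) ** 2)

s2_complete = rss(range(k)) / (n - k - 1)        # MSError from the full model

for p in range(k + 1):                           # subsets of each size p
    for cols in combinations(range(k), p):
        cp = rss(cols) / s2_complete - (n - 2 * (p + 1))
        print(sorted(j + 1 for j in cols), "Cp = %.2f" % cp, "p + 1 =", p + 1)
```

Runs whose Cp comes out close to p + 1 are the ones flagged as giving a reasonable fit.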
48
I All Possible Regressions
Suppose we have the p independent variables X1, X2, ..., Xp. Then there are 2^p subsets of variables.
49
Variables in Equation Model
no variables: Y = b0 + e
X1: Y = b0 + b1X1 + e
X2: Y = b0 + b2X2 + e
X3: Y = b0 + b3X3 + e
X1, X2: Y = b0 + b1X1 + b2X2 + e
X1, X3: Y = b0 + b1X1 + b3X3 + e
X2, X3: Y = b0 + b2X2 + b3X3 + e
and X1, X2, X3: Y = b0 + b1X1 + b2X2 + b3X3 + e
50
Use of R2: 1. Assume we carry out the 2^p runs, one for each of the subsets. Divide the runs into the following sets. Set 0: no variables. Set 1: one independent variable. ... Set p: p independent variables. 2. Order the runs in each set according to R2. 3. Examine the leaders in each set looking for consistent patterns, taking into account correlation between independent variables.
51
Example (k = 4): X1, X2, X3, X4. Variables in the leading runs, with 100 R2%:
Set 1: X… , …%
Set 2: X1, X… , …%; X1, X… , …%
Set 3: X1, X2, X… , …%
Set 4: X1, X2, X3, X4 , …%
Examination of the correlation coefficients reveals a high correlation between X1, X3 (r13 = ) and between X2, X4 (r24 = ). Best Equation: Y = b0 + b1X1 + b4X4 + e
52
Use of R2 Number of variables required, p, coincides with where R2 begins to level out
53
Use of the Residual Mean Square (RMS) (s2)
When all of the variables having a non-zero effect have been included in the model, the residual mean square is an estimate of s2. If "significant" variables have been left out then RMS will be biased upward.
54
No. of Variables p | RMS s2(p) | Average s2(p)
1: … , 82.39, …
2: 5.79*, 122.71, 7.48**, …
3: 5.35, 5.33, 5.65, …
4: …
* run X1, X2    ** run X1, X…    s2 approximately 6.
55
Use of s2 Number of variables required, p, coincides with where s2 levels out
56
Use of Mallows Cp If the equation with p variables is adequate then both s2complete and RSSp/(n-p-1) will be estimating s2. If "significant" variables have been left out then RMS will be biased upward.
57
Thus if we plot, for each run, Cp vs. p and look for Cp close to p + 1, we will be able to identify models giving a reasonable fit.
58
Run | Cp | p + 1
no variables: … | 1
1, 2, 3, 4: …, 142.5, 315.2, … | 2
12, 13, 14, …: 2.7, 198.1, 5.5, … | 3
23, 24, …: …, 138.2, 22.4 | 3
123, 124, 134, …: …, 3.0, 3.5, 7.5 | 4
59
Use of Cp Cp p Number of variables required, p, coincides with where Cp becomes close to p + 1
60
II "Best Subset" Regression
Similar to all possible regressions. If p, the number of variables, is large then the number of runs, 2^p, performed could be extremely large. In this algorithm the user supplies the value K and the algorithm identifies the best K subsets of X1, X2, ..., Xp for predicting Y.
61
III Backward Elimination
In this procedure the complete regression equation is determined containing all the variables - X1, X2, ..., Xp. Then variables are checked one at a time and the least significant is dropped from the model at each stage. The procedure is terminated when all of the variables remaining in the equation provide a significant contribution to the prediction of the dependent variable Y.
62
The precise algorithm proceeds as follows:
1. Fit a regression equation containing all variables in the equation.
63
2. A partial F-test is computed for each of the independent variables still in the equation.
The Partial F statistic: F = (RSS2 - RSS1)/MSE1, where RSS1 = the residual sum of squares with all variables that are presently in the equation, RSS2 = the residual sum of squares with one of the variables removed, and MSE1 = the Mean Square for Error with all variables that are presently in the equation.
64
3. The lowest partial F value is compared with Fa for some pre-specified a.
If FLowest < Fa then remove that variable and return to step 2. If FLowest > Fa then accept the equation as it stands.
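A sketch of the backward elimination algorithm described in steps 1-3 above; the simulated data and the a = 0.05 threshold are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 5))                        # X1..X5
y = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 1.0, n)

def rss(cols):
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ b) ** 2)

alpha = 0.05
in_model = list(range(X.shape[1]))                 # step 1: all variables in
while in_model:
    rss1 = rss(in_model)
    mse1 = rss1 / (n - len(in_model) - 1)
    # step 2: partial F for each variable still in the equation
    partial_F = {j: (rss([c for c in in_model if c != j]) - rss1) / mse1
                 for j in in_model}
    worst = min(partial_F, key=partial_F.get)      # step 3: lowest partial F
    F_crit = stats.f.ppf(1 - alpha, 1, n - len(in_model) - 1)
    if partial_F[worst] < F_crit:
        in_model.remove(worst)                     # drop it and repeat
    else:
        break                                      # all remaining are significant
print("variables kept:", [j + 1 for j in in_model])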
65
IV Stepwise Regression
In this procedure the regression equation starts with no variables in the model. Variables are then checked one at a time using the partial correlation coefficient as a measure of importance in predicting the dependent variable Y. At each stage the variable with the highest significant partial correlation coefficient is added to the model. Once this has been done, the partial F statistic is computed for all variables now in the model to check if any of the variables previously added can now be deleted.
66
This procedure is continued until no further variables can be added or deleted from the model.
The partial correlation coefficient for a given variable is the correlation between the given variable and the response when the present independent variables in the equation are held fixed. It is also the correlation between the given variable and the residuals computed from fitting an equation with the present independent variables in the equation.
67
Transformations
68
Transformations to Linearity
Many non-linear curves can be put into a linear form by appropriate transformations of either the dependent variable Y or some (or all) of the independent variables X1, X2, ... , Xp. This leads to the wide utility of the Linear model. We have seen that through the use of dummy variables, categorical independent variables can be incorporated into a Linear Model. We will now see that, through the technique of variable transformation, many examples of non-linear behaviour can also be converted to linear behaviour.
69
Polynomial Models: y = b0 + b1x + b2x^2 + b3x^3. Linear form: Y = b0 + b1X1 + b2X2 + b3X3. Variables: Y = y, X1 = x, X2 = x^2, X3 = x^3.
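A small sketch of this cubic polynomial model fitted as an ordinary linear model in the transformed variables; the data and coefficient values are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 3, 40)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.8 * x**3 + rng.normal(0, 0.3, x.size)

# Variables: X1 = x, X2 = x^2, X3 = x^3 -> linear form Y = b0 + b1 X1 + b2 X2 + b3 X3
design = np.column_stack([np.ones_like(x), x, x**2, x**3])
b_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print("b0, b1, b2, b3 estimates:", b_hat)
```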
70
Exponential Models with a polynomial exponent
Linear form: ln y = b0 + b1X1 + b2X2 + b3X3 + b4X4. Variables: Y = ln y, X1 = x, X2 = x^2, X3 = x^3, X4 = x^4.
71
Trigonometric Polynomial Models
y = b0 + g1cos(2πf1x) + d1sin(2πf1x) + … + gkcos(2πfkx) + dksin(2πfkx). Linear form: Y = b0 + g1C1 + d1S1 + … + gkCk + dkSk. Variables: Y = y, C1 = cos(2πf1x), S1 = sin(2πf1x), … , Ck = cos(2πfkx), Sk = sin(2πfkx).
72
Response Surface models
Dependent variable Y and two independent variables x1 and x2. (These ideas are easily extended to more than two independent variables.) The Model (a cubic response surface model): Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5 + b6X6 + b7X7 + b8X8 + b9X9 + e, where the Xj are the powers and cross-products of x1 and x2 up to degree three.
73
The Box-Cox Family of Transformations
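The family itself was shown as an image on this slide; its usual definition (a standard result, not copied from the slide) is:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[1ex]
\ln y, & \lambda = 0,
\end{cases}
\qquad y > 0 .
```

Here l = 1 leaves y essentially unchanged, l = 0 is the log transformation, and l = -1 is (up to sign and shift) the reciprocal; moving l down the staircase pulls in large values of y more strongly.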
74
The Transformation Staircase
75
The Bulging Rule x up y up y down x down
76
Nonlinearizable models
Non-Linear Models Nonlinearizable models
77
Non-Linear Growth models
Many models cannot be transformed into a linear model. The Mechanistic Growth Model. Equation: or (ignoring e) "rate of increase in Y" =
78
The Logistic Growth Model
Equation: or (ignoring e) “rate of increase in Y” =
79
The Gompertz Growth Model:
Equation: or (ignoring e) “rate of increase in Y” =
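The equations on these three growth-model slides were images; for reference, commonly used parameterizations of these curves (the symbols here are assumptions and may differ from the original slides) are, ignoring the error term:

```latex
\begin{aligned}
\text{Mechanistic: } & Y = \alpha\bigl(1 - \beta e^{-kx}\bigr), & \frac{dY}{dx} &= k(\alpha - Y)\\
\text{Logistic: }    & Y = \frac{\alpha}{1 + \beta e^{-kx}},    & \frac{dY}{dx} &= \frac{k}{\alpha}\,Y(\alpha - Y)\\
\text{Gompertz: }    & Y = \alpha\, e^{-\beta e^{-kx}},         & \frac{dY}{dx} &= k\,Y\ln(\alpha/Y)
\end{aligned}
```

In each case the "rate of increase in Y" depends on Y itself in a non-linear way, which is why these models cannot be linearized by transformation.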
80
Non-Linear Regression
81
Least Squares in the Nonlinear Case
82
Suppose that we have collected data on the response Y,
(y1, y2, ..., yn), corresponding to n sets of values of the independent variables X1, X2, ... and Xp: (x11, x21, ..., xp1), (x12, x22, ..., xp2), ... and (x1n, x2n, ..., xpn).
83
For a set of possible values q1, q2, ... , qq of the parameters, a measure of how well these values fit the model described in equation * above is the residual sum of squares function S(q1, q2, ... , qq) = Σ(yi - ŷi)^2, where ŷi is the predicted value of the response variable yi from the values of the p independent variables x1i, x2i, ..., xpi using the model in equation * and the values of the parameters q1, q2, ... , qq.
84
The Least Squares estimates of q1, q2, ... , qq are the values
which minimize S(q1, q2, ... , qq). It can be shown that if the error terms are independent and normally distributed with mean 0 and common variance s2, then the least squares estimates are also the maximum likelihood estimates of q1, q2, ... , qq.
85
To find the least squares estimates we need to determine when all the derivatives of S(q1, q2, ... , qq) with respect to each parameter q1, q2, ... and qq are equal to zero. This quite often leads to a set of equations in q1, q2, ... and qq that are difficult to solve even with one parameter and a comparatively simple nonlinear model. When more parameters are involved and the model is more complicated, the solution of the normal equations can be extremely difficult to obtain, and iterative methods must be employed.
86
Techniques for Estimating the Parameters of a Nonlinear System
In some nonlinear problems it is convenient to determine equations (the Normal Equations) for the least squares estimates , the values that minimize the sum of squares function, S(q1, q2, ... , qq). These equations are nonlinear and it is usually necessary to develop an iterative technique for solving them.
87
We shall mention three of these:
In addition to this approach there are several currently employed methods available for obtaining the parameter estimates by a routine computer calculation. We shall mention three of these: 1) Steepest descent, 2) Linearization, and 3) Marquardt's procedure.
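A hedged sketch using scipy's implementation of the Levenberg-Marquardt (Marquardt) procedure; the logistic-style model, the simulated data, and the starting values are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, alpha, beta, k):
    """A logistic-type growth curve f(x | theta1, theta2, theta3)."""
    return alpha / (1.0 + beta * np.exp(-k * x))

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 60)
y = f(x, 10.0, 8.0, 0.9) + rng.normal(0, 0.3, x.size)

# Initial estimates (intelligent guesses); method="lm" is Levenberg-Marquardt.
theta0 = [8.0, 5.0, 0.5]
theta_hat, cov = curve_fit(f, x, y, p0=theta0, method="lm")

residual_ss = np.sum((y - f(x, *theta_hat)) ** 2)   # S(theta_hat)
print("least squares estimates:", theta_hat)
print("residual sum of squares:", residual_ss)
```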
88
In each case an iterative procedure is used to find the least squares estimators.
That is, initial estimates for these values are determined. The procedure then finds successively better estimates that hopefully converge to the least squares estimates.
89
Steepest Descent: the procedure follows the steepest descent path on the sum of squares surface from the initial guess.
90
Linearization: The linearization (or Taylor series) method uses the results of linear least squares in a succession of stages. Suppose the postulated model is of the form: Y = f(X1, X2, ..., Xp| q1, q2, ... , qq) + e. Let there be initial values for the parameters q1, q2, ... , qq. These initial values may be intelligent guesses or preliminary estimates based on whatever information is available.
91
These initial values will, hopefully, be improved upon in the successive iterations to be described below. The linearization method approximates f(X1, X2, ..., Xp| q1, q2, ... , qq) with a linear function of q1, q2, ... , qq using a Taylor series expansion of f(X1, X2, ..., Xp| q1, q2, ... , qq) about the initial point and curtailing the expansion at the first derivatives. The method then uses the results of linear least squares to find values that provide the least squares fit of this linear function to the data.
92
The procedure is then repeated until the successive approximations converge, hopefully, to the least squares estimates.
93
Linearization: contours of RSS for the linear approximation, with the initial guess and the 2nd guess.
94
Contours of RSS for the linear approximation, with the initial guess, 2nd guess and 3rd guess.
95
Contours of RSS for the linear approximation, with the initial guess, 2nd, 3rd and 4th guesses.
96
The Examination of Residuals
97
The residuals are defined as the n differences ei = yi - ŷi, i = 1, 2, ..., n,
where yi is an observation and ŷi is the corresponding fitted value obtained by use of the fitted model.
98
Many of the statistical procedures used in linear and nonlinear regression analysis are based on certain assumptions about the random departures from the proposed model. Namely, the random departures are assumed i) to have zero mean, ii) to have a constant variance, s2, iii) to be independent, and iv) to follow a normal distribution.
99
Thus if the fitted model is correct,
the residuals should exhibit tendencies that tend to confirm the above assumptions, or at least, should not exhibit a denial of the assumptions.
100
The principal ways of plotting the residuals ei are:
1. Overall. 2. In time sequence, if the order is known. 3. Against the fitted values 4. Against the independent variables xij for each value of j In addition to these basic plots, the residuals should also be plotted 5. In any way that is sensible for the particular problem under consideration,
101
The residuals can be plotted in an overall plot in several ways.
102
1. The scatter plot. 2. The histogram. 3. The box-whisker plot.
4. The kernel density plot 5. a normal plot or a half normal plot on standard probability paper.
103
The standard statistical test for testing Normality are:
1. The Kolmogorov-Smirnov test. 2. The Chi-square goodness of fit test
104
Namely the random departures for observations that were taken at neighbouring points in time are autocorrelated. This autocorrelation can sometimes be seen in a time sequence plot. The following three graphs show a sequence of residuals that are respectively i) positively autocorrelated , ii) independent and iii) negatively autocorrelated.
105
i) Positively auto-correlated residuals
106
ii) Independent residuals
107
iii) Negatively auto-correlated residuals
108
There are several statistics and statistical tests that can also pick out autocorrelation amongst the residuals. The most common are: i) The Durbin Watson statistic ii) The autocorrelation function iii) The runs test
109
The Durbin Watson statistic :
The Durbin-Watson statistic, which is used frequently to detect serial correlation, is defined by the following formula: D = Σ(ei - ei+1)^2 / Σ ei^2. If the residuals are positively serially correlated the differences, ei - ei+1, will be stochastically small. Hence a small value of the Durbin-Watson statistic will indicate positive autocorrelation. Large values of the Durbin-Watson statistic on the other hand will indicate negative autocorrelation. Critical values for this statistic can be found in many statistical textbooks.
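A small sketch computing the Durbin-Watson statistic and the lag-k autocorrelation of a residual series; the residual vector here is simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
e = rng.normal(0, 1.0, 100)                    # stand-in for regression residuals

# Durbin-Watson: sum of squared successive differences over the sum of squares.
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def r_lag(e, k):
    """Autocorrelation of the residuals at lag k."""
    return np.sum(e[:-k] * e[k:]) / np.sum(e ** 2)

print("Durbin-Watson:", dw)                    # near 2 for independent residuals
print("r_1, r_2, r_3:", [round(r_lag(e, k), 3) for k in (1, 2, 3)])
```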
110
The autocorrelation function:
The autocorrelation function at lag k is defined by rk = Σ ei ei+k / Σ ei^2, where the sum in the numerator runs over residuals a distance k apart in time.
111
This statistic measures the correlation between residuals that occur a distance k apart in time.
One would expect that residuals that are close in time are more correlated than residuals that are separated by a greater distance in time. If the residuals are independent then rk should be close to zero for all values of k. A plot of rk versus k can be very revealing with respect to the independence of the residuals.
112
Some typical patterns of the autocorrelation function are given below:
Auto correlation pattern for independent residuals
113
Various Autocorrelation patterns for serially correlated residuals
115
Plot Against fitted values and the Predictor Variables Xij
If we "step back" from this diagram and the residuals behave in a manner consistent with the assumptions of the model we obtain the impression of a horizontal "band " of residuals which can be represented by the diagram below.
116
Individual observations lying considerably outside of this band indicate that the observation may be an outlier. An outlier is an observation that is not following the normal pattern of the other observations. Such an observation can have a considerable effect on the estimation of the parameters of a model. Sometimes the outlier has occurred because of a typographical error. If this is the case and it is detected then a correction can be made. If the outlier occurs for other (and more natural) reasons it may be appropriate to construct a model that incorporates the occurrence of outliers.
117
If our "step back" view of the residuals resembled any of those shown below we should conclude that assumptions about the model are incorrect. Each pattern may indicate that a different assumption may have to be made to explain the “abnormal” residual pattern. b) a)
118
Pattern a) indicates that the variance of the random departures is not constant (homogeneous) but increases as the value along the horizontal axis increases (time, or one of the independent variables). This indicates that a weighted least squares analysis should be used. The second pattern, b), indicates that the mean value of the residuals is not zero. This is usually because the model (linear or non-linear) has not been correctly specified: linear and quadratic terms that should have been included in the model have been omitted.
119
Example – Analysis of Residuals
Motor Vehicle Data Dependent = mpg Independent = Engine size, horsepower and weight
120
When a linear model was fit and residuals examined graphically the following plot resulted:
121
The pattern that we are looking for is:
122
The pattern that was found is:
This indicates a nonlinear relationship. This can be handled by adding polynomial terms (quadratic, cubic, quartic, etc.) of the independent variables or by transforming the dependent variable.
123
Performing the log transformation on the dependent variable (mpg) results in the following residual plot There still remains some non linearity
124
The log transformation
125
The Box-Cox transformations
Curves shown for l = 2, l = 1, l = 0 and l = -1.
126
The log (l = 0) transformation was not totally successful - try moving further down the staircase of the family of transformations (l = -0.5)
127
try moving a bit further down the staircase of the family of transformations (l = -1.0)
128
The results after deleting the outlier are given below:
129
This corresponds to the model
and
130
Checking normality with a P-P plot
131
Factorial Experiments
Analysis of Variance Experimental Design
132
k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y. k categorical independent variables A, B, C, … (the Factors). Let a = the number of categories of A, b = the number of categories of B, c = the number of categories of C, etc.
133
The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units N = nt = nabc….
134
The treatment combinations can be thought of as being arranged in a k-dimensional rectangular block (e.g. levels 1, 2, …, a of A by levels 1, 2, …, b of B).
135
136
Another way of representing the treatment combinations in a factorial experiment
137
Profile of a Factor: a plot of observation means vs. the levels of the factor. The levels of the other factors may be held constant or we may average over the other levels.
138
Definition: A factor is said to not affect the response if the profile of the factor is horizontal for all combinations of levels of the other factors: No change in the response when you change the levels of the factor (true for all combinations of levels of the other factors) Otherwise the factor is said to affect the response:
139
Definition: Two (or more) factors are said to interact if changes in the response when you change the level of one factor depend on the level(s) of the other factor(s). Profiles of the factor for different levels of the other factor(s) are not parallel Otherwise the factors are said to be additive . Profiles of the factor for different levels of the other factor(s) are parallel.
140
If two (or more) factors interact, each factor affects the response.
If two (or more) factors are additive, it still remains to be determined if the factors affect the response. In factorial experiments we are interested in determining which factors affect the response and which groups of factors interact.
141
Factor A has no effect B A
142
Additive Factors B A
143
Interacting Factors B A
144
The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors; all factors in the interaction affect the response and they interact. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
145
Anova Table entries. Sum of squares for the interaction (or main) effects being tested = (product of the sample size and the numbers of levels of the factors not included in the interaction) × (sum of squares of the effects being tested). Degrees of freedom = df = product of (number of levels - 1) over the factors included in the interaction.
146
Analysis of Variance (ANOVA) Table Entries (Two factors – A and B)
147
The ANOVA Table 2 Factor Experiment
148
Analysis of Variance (ANOVA) Table Entries (Three factors – A, B and C)
149
The ANOVA Table
150
The Completely Randomized Design is called balanced
If the number of observations per treatment combination is unequal the design is called unbalanced (resulting in a mathematically more complex analysis and computations). If for some of the treatment combinations there are no observations the design is called incomplete (some of the parameters - main effects and interactions - cannot be estimated).
151
Factorial Experiments
Analysis of Variance Experimental Design
152
k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y. k categorical independent variables A, B, C, … (the Factors). Let a = the number of categories of A, b = the number of categories of B, c = the number of categories of C, etc.
153
Objectives Determine which factors have some effect on the response Which groups of factors interact
154
The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units N = nt = nabc….
155
Factor A has no effect B A
156
Additive Factors B A
157
Interacting Factors B A
158
The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors; all factors in the interaction affect the response and they interact. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
159
The statistical model for the 3 factor Experiment
160
The statistical model for the 3 factor Experiment
161
Anova table for the 3 factor Experiment
Source   SS        df                     MS        F
A        SSA       a - 1                  MSA       MSA/MSError
B        SSB       b - 1                  MSB       MSB/MSError
C        SSC       c - 1                  MSC       MSC/MSError
AB       SSAB      (a - 1)(b - 1)         MSAB      MSAB/MSError
AC       SSAC      (a - 1)(c - 1)         MSAC      MSAC/MSError
BC       SSBC      (b - 1)(c - 1)         MSBC      MSBC/MSError
ABC      SSABC     (a - 1)(b - 1)(c - 1)  MSABC     MSABC/MSError
Error    SSError   abc(n - 1)             MSError
162
The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors; all factors in the interaction affect the response and they interact. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
163
Random Effects and Fixed Effects Factors
164
So far the factors that we have considered are fixed effects factors
This is the case if the levels of the factor are a fixed set of levels and the conclusions of any analysis are in relation to these levels. If the levels have been selected at random from a population of levels the factor is called a random effects factor. The conclusions of the analysis will be directed at the population of levels and not only the levels selected for the experiment.
165
The Anova table for the two factor model (A, B – fixed)
Source   SS        df              MS        F
A        SSA       a - 1           MSA       MSA/MSError
B        SSB       b - 1           MSB       MSB/MSError
AB       SSAB      (a - 1)(b - 1)  MSAB      MSAB/MSError
Error    SSError   ab(n - 1)       MSError
EMS = Expected Mean Square
166
The Anova table for the two factor model (A – fixed, B - random)
Source   SS        df              MS        F
A        SSA       a - 1           MSA       MSA/MSAB
B        SSB       b - 1           MSB       MSB/MSError
AB       SSAB      (a - 1)(b - 1)  MSAB      MSAB/MSError
Error    SSError   ab(n - 1)       MSError
Note: The divisor for testing the main effects of A is no longer MSError but MSAB.
167
Example: 3 factors A fixed, B, C random
Source EMS F A B C AB AC BC ABC Error
168
Example: 3 factors A , B fixed, C random
Source EMS F A B C AB AC BC ABC Error
169
Example: 3 factors A , B and C fixed
Source EMS F A B C AB AC BC ABC Error
170
Crossed and Nested Factors
171
The factors A, B are called crossed if every level of A appears with every level of B in the treatment combinations.
172
Factor B is said to be nested within factor A if the levels of B differ for each level of A.
173
Example: A company has a = 4 plants for producing paper
Example: A company has a = 4 plants for producing paper. Each plant has 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within a plant.
174
Machines (B) are nested within plants (A)
The model for a two factor experiment with B nested within A.
175
The ANOVA table
Source   SS        df          MS        F                p-value
A        SSA       a - 1       MSA       MSA/MSError
B(A)     SSB(A)    a(b - 1)    MSB(A)    MSB(A)/MSError
Error    SSError   ab(n - 1)   MSError
Note: SSB(A) = SSB + SSAB and a(b - 1) = (b - 1) + (a - 1)(b - 1)
176
Factorial Experiments
Analysis of Variance Factorial Experiments
177
k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y. k categorical independent variables A, B, C, … (the Factors). Let a = the number of categories of A, b = the number of categories of B, c = the number of categories of C, etc.
178
The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units N = nt = nabc….
179
Example: 3 factors A, B, C – all are random effects
Source EMS F A B C AB AC BC ABC Error
180
Example: 3 factors A fixed, B, C random
Source EMS F A B C AB AC BC ABC Error
181
Example: 3 factors A , B fixed, C random
Source EMS F A B C AB AC BC ABC Error
182
Example: 3 factors A , B and C fixed
Source EMS F A B C AB AC BC ABC Error
183
The Analysis of Covariance
ANACOVA
184
Multiple Regression Dependent variable Y (continuous)
Continuous independent variables X1, X2, …, Xp The continuous independent variables X1, X2, …, Xp are quite often measured and observed (not set at specific values or levels)
185
Analysis of Variance Dependent variable Y (continuous)
Categorical independent variables (Factors) A, B, C,… The categorical independent variables A, B, C,… are set at specific values or levels.
186
Analysis of Covariance
Dependent variable Y (continuous) Categorical independent variables (Factors) A, B, C,… Continuous independent variables (covariates) X1, X2, …, Xp
187
The Multiple Regression Model
188
The ANOVA Model
189
The ANACOVA Model
190
ANOVA Tables
191
The Multiple Regression Model
Source       S.S.       d.f.
Regression   SSReg      p
Error        SSError    n - p - 1
Total        SSTotal    n - 1
192
The ANOVA Model
Source          S.S.       d.f.
Main Effects
  A             SSA        a - 1
  B             SSB        b - 1
Interactions
  AB            SSAB       (a - 1)(b - 1)
  ⁞
Error           SSError    n - p - 1
Total           SSTotal    n - 1
193
The ANACOVA Model
Source          S.S.           d.f.
Covariates      SSCovariates   p
Main Effects
  A             SSA            a - 1
  B             SSB            b - 1
Interactions
  AB            SSAB           (a - 1)(b - 1)
  ⁞
Error           SSError        n - p - 1
Total           SSTotal        n - 1
194
Analysis of unbalanced Factorial Designs
Type I, Type II, Type III Sum of Squares
195
Sum of squares for testing an effect
modelComplete ≡ the model with the effect in. modelReduced ≡ the model with the effect out. The sum of squares for testing the effect is SSeffect = RSS(modelReduced) - RSS(modelComplete).
196
Type I SS Type I estimates of the sum of squares associated with an effect in a model are calculated when sums of squares for a model are calculated sequentially Example Consider the three factor factorial experiment with factors A, B and C. The Complete model Y = m + A + B + C + AB + AC + BC + ABC
197
A sequence of increasingly simpler models
Y = m + A + B + C + AB + AC + BC + ABC Y = m + A+ B + C + AB + AC + BC Y = m + A + B+ C + AB + AC Y = m + A + B + C+ AB Y = m + A + B + C Y = m + A + B Y = m + A Y = m
198
Type I S.S.
199
Type II SS: Type II sums of squares are calculated for an effect assuming that the Complete model contains every effect of equal or lesser order. The Reduced model has the effect removed.
200
The Complete models Y = m + A + B + C + AB + AC + BC + ABC (the three factor model) Y = m + A+ B + C + AB + AC + BC (the all two factor model) Y = m + A + B + C (the all main effects model) The Reduced models For a k-factor effect the reduced model is the all k-factor model with the effect removed
202
Type III SS The type III sum of squares is calculated by comparing the full model, to the full model without the effect.
203
Comments: When using the Type I sums of squares the effects are tested in a specified sequence resulting in an increasingly simpler model. The test is valid only if the null hypothesis (H0) has been accepted in the previous tests. When using the Type II sums of squares the test for a k-factor effect is valid only if the all k-factor model can be assumed. When using the Type III sums of squares the tests require neither of these assumptions.
204
An additional Comment When the completely randomized design is balanced (equal number of observations per treatment combination) then type I sum of squares, type II sum of squares and type III sum of squares are equal.
205
Experimental Designs The objective of Experimental design is to reduce the magnitude of random error resulting in more powerful tests to detect experimental effects
206
Other experimental designs
Randomized Block design Repeated Measures designs
207
The Randomized Block Design
208
Suppose a researcher is interested in how several treatments affect a continuous response variable (Y). The treatments may be the levels of a single factor or they may be the combinations of levels of several factors. Suppose we have available to us a total of N = nt experimental units to which we are going to apply the different treatments.
209
The Completely Randomized (CR) design randomly divides the experimental units into t groups of size n and randomly assigns a treatment to each group.
210
The Randomized Block Design
divides the group of experimental units into n homogeneous groups of size t. These homogeneous groups are called blocks. The treatments are then randomly assigned to the experimental units in each block - one treatment to a unit in each block.
211
The Completely Randomizes Design
Treats 1 2 3 … t Experimental units randomly assigned to treatments
212
Randomized Block Design
Blocks All treats appear once in each block
213
The Model for a randomized Block Experiment
yij = m + ti + bj + eij, i = 1, 2, …, t; j = 1, 2, …, b, where yij = the observation in the jth block receiving the ith treatment, m = overall mean, ti = the effect of the ith treatment, bj = the effect of the jth block, and eij = random error.
214
The Anova Table for a randomized Block Experiment
Source   S.S.   d.f.            M.S.   F         p-value
Treat    SST    t - 1           MST    MST/MSE
Block    SSB    b - 1           MSB    MSB/MSE
Error    SSE    (t - 1)(b - 1)  MSE
215
A randomized block experiment is assumed to be a two-factor experiment.
The factors are blocks and treatments. There is one observation per cell. It is assumed that there is no interaction between blocks and treatments. The degrees of freedom for the interaction are used to estimate error.
216
The ANOVA table for the Completely Randomized Design
The ANOVA table for the Completely Randomized Design
Source       df          Sum of Squares
Treatments   t - 1       SSTr
Error        t(n - 1)    SSError
Total        tn - 1      SSTotal
The ANOVA table for the Randomized Block Design
Source       df              Sum of Squares
Blocks       n - 1           SSBlocks
Treatments   t - 1           SSTr
Error        (t - 1)(n - 1)  SSError
Total        tn - 1          SSTotal
217
Comments: The error term for the Completely Randomized Design models variability in the response, y, between experimental units. The error term for the Randomized Block Design models variability in the response, y, between experimental units in the same block (hopefully this is considerably smaller than the error term of the Completely Randomized Design). The ability to detect treatment differences depends on the magnitude of the random error term.
218
Repeated Measures Designs
219
In a Repeated Measures Design
We have experimental units that may be grouped according to one or several factors (the grouping factors) Then on each experimental unit we have not a single measurement but a group of measurements (the repeated measures) The repeated measures may be taken at combinations of levels of one or several factors (The repeated measures factors)
220
Anova Table for a Repeated Measures Design
221
Latin Square Designs
222
Latin Square Designs. Selected Latin Squares:
3 x 3
A B C
B C A
C A B
4 x 4
A B C D    A B C D    A B C D    A B C D
B A D C    B C D A    B D A C    B A D C
C D B A    C D A B    C A D B    C D A B
D C A B    D A B C    D C B A    D C B A
5 x 5
A B C D E
B A E C D
C D A E B
D E B A C
E C D B A
6 x 6
A B C D E F
B F D C A E
C D E F B A
D A F E C B
E C A B F D
F E B A D C
223
Definition: A Latin square is a square array of objects (letters A, B, C, …) such that each object appears once and only once in each row and each column. Example - a 4 x 4 Latin Square:
A B C D
B C D A
C D A B
D A B C
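A short sketch that builds the cyclic t x t Latin square, like the 4 x 4 example above, and then randomizes rows, columns and letters; the randomization step is one common way such a design is generated in practice and is an assumption here, not part of the slide.

```python
import string
import numpy as np

def latin_square(t, seed=None):
    """Cyclic t x t Latin square with rows, columns and letters randomly permuted."""
    rng = np.random.default_rng(seed)
    base = (np.arange(t)[:, None] + np.arange(t)[None, :]) % t   # cyclic square
    base = base[rng.permutation(t)][:, rng.permutation(t)]       # shuffle rows/cols
    letters = rng.permutation(list(string.ascii_uppercase[:t]))  # relabel treatments
    return letters[base]

for row in latin_square(4, seed=7):
    print(" ".join(row))
```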
224
In a Latin square You have three factors:
Treatments (t) (letters A, B, C, …), Rows (t), Columns (t). The number of treatments = the number of rows = the number of columns = t. The row-column combinations are represented by cells in a t x t array. The treatments are assigned to row-column combinations using a Latin-square arrangement.
225
The Model for a Latin Experiment
yij(k) = m + ri + gj + tk + eij(k), i, j, k = 1, 2, …, t, where yij(k) = the observation in the ith row and the jth column receiving the kth treatment, m = overall mean, tk = the effect of the kth treatment, ri = the effect of the ith row, gj = the effect of the jth column, and eij(k) = random error. No interaction between rows, columns and treatments is assumed.
226
A Latin Square experiment is assumed to be a three-factor experiment.
The factors are rows, columns and treatments. It is assumed that there is no interaction between rows, columns and treatments. The degrees of freedom for the interactions are used to estimate error.
227
The Anova Table for a Latin Square Experiment
Source   S.S.    d.f.            M.S.    F           p-value
Treat    SSTr    t - 1           MSTr    MSTr/MSE
Rows     SSRow   t - 1           MSRow   MSRow/MSE
Cols     SSCol   t - 1           MSCol   MSCol/MSE
Error    SSE     (t - 1)(t - 2)  MSE
Total    SST     t^2 - 1
228
Experimental Design Of interest: to compare t treatments (the treatment combinations of one or several factors)
229
The Completely Randomized Design
Treats 1 2 3 … t Experimental units randomly assigned to treatments
230
The Model for a CR Experiment
yij = m + ti + eij, i = 1, 2, …, t; j = 1, 2, …, n, where yij = the jth observation receiving the ith treatment, m = overall mean, ti = the effect of the ith treatment, and eij = random error.
231
The Anova Table for a CR Experiment
Source   S.S.   d.f.       M.S.   F         p-value
Treat    SSTr   t - 1      MST    MST/MSE
Error    SSE    t(n - 1)   MSE
232
Randomized Block Design
Blocks: each of the t treatments (1, 2, 3, …, t) appears once in each block.
233
The Model for a RB Experiment
yij = m + ti + bj + eij, i = 1, 2, …, t; j = 1, 2, …, b, where yij = the observation in the jth block receiving the ith treatment, m = overall mean, ti = the effect of the ith treatment, bj = the effect of the jth block, and eij = random error. No interaction between blocks and treatments is assumed.
234
A Randomized Block experiment is assumed to be a two-factor experiment.
The factors are blocks and treatments. It is assumed that there is no interaction between blocks and treatments. The degrees of freedom for the interaction is used to estimate error.
235
The Anova Table for a randomized Block Experiment
Source   S.S.   d.f.            M.S.   F         p-value
Treat    SST    t - 1           MST    MST/MSE
Block    SSB    b - 1           MSB    MSB/MSE
Error    SSE    (t - 1)(b - 1)  MSE
236
The Latin square Design
Rows × Columns (t × t): all treatments appear once in each row and each column.
237
The Model for a Latin Experiment
yij(k) = m + ri + gj + tk + eij(k), i, j, k = 1, 2, …, t, where yij(k) = the observation in the ith row and the jth column receiving the kth treatment, m = overall mean, tk = the effect of the kth treatment, ri = the effect of the ith row, gj = the effect of the jth column, and eij(k) = random error. No interaction between rows, columns and treatments is assumed.
238
A Latin Square experiment is assumed to be a three-factor experiment.
The factors are rows, columns and treatments. It is assumed that there is no interaction between rows, columns and treatments. The degrees of freedom for the interactions are used to estimate error.
239
The Anova Table for a Latin Square Experiment
Source   S.S.    d.f.            M.S.    F           p-value
Treat    SSTr    t - 1           MSTr    MSTr/MSE
Rows     SSRow   t - 1           MSRow   MSRow/MSE
Cols     SSCol   t - 1           MSCol   MSCol/MSE
Error    SSE     (t - 1)(t - 2)  MSE
Total    SST     t^2 - 1
240
Graeco-Latin Square Designs
Mutually orthogonal Squares
241
Definition: A Graeco-Latin square consists of two Latin squares (one using the letters A, B, C, …, the other using Greek letters a, b, c, …) such that when the two Latin squares are superimposed on each other, each letter of one square appears once and only once with each letter of the other square. The two Latin squares are called mutually orthogonal. Example: a 7 x 7 Graeco-Latin Square
Aa Be Cb Df Ec Fg Gd
Bb Cf Dc Eg Fd Ga Ae
Cc Dg Ed Fa Ge Ab Bf
Dd Ea Fe Gb Af Bc Cg
Ee Fb Gf Ac Bg Cd Da
Ff Gc Ag Bd Ca De Eb
Gg Ad Ba Ce Db Ef Fc
242
Note: There exists at most (t –1) t x t Latin squares L1, L2, …, Lt-1 such that any pair are mutually orthogonal. e.g. It is possible that there exists a set of six 7 x 7 mutually orthogonal Latin squares L1, L2, L3, L4, L5, L6 .
243
The Model for a Greaco-Latin Experiment
j = 1,2,…, t i = 1,2,…, t k = 1,2,…, t l = 1,2,…, t yij(kl) = the observation in ith row and the jth column receiving the kth Latin treatment and the lth Greek treatment
244
yij(kl) = m + ri + gj + tk + ll + eij(kl), where m = overall mean, tk = the effect of the kth Latin treatment, ll = the effect of the lth Greek treatment, ri = the effect of the ith row, gj = the effect of the jth column, and eij(kl) = random error. No interaction between rows, columns, Latin treatments and Greek treatments is assumed.
245
A Graeco-Latin Square experiment is assumed to be a four-factor experiment.
The factors are rows, columns, Latin treatments and Greek treatments. It is assumed that there is no interaction between rows, columns, Latin treatments and Greek treatments. The degrees of freedom for the interactions are used to estimate error.
246
Graeco-Latin Square Experiment
The Anova Table for a Graeco-Latin Square Experiment
Source   S.S.    d.f.            M.S.    F           p-value
Latin    SSLa    t - 1           MSLa    MSLa/MSE
Greek    SSGr    t - 1           MSGr    MSGr/MSE
Rows     SSRow   t - 1           MSRow   MSRow/MSE
Cols     SSCol   t - 1           MSCol   MSCol/MSE
Error    SSE     (t - 1)(t - 3)  MSE
Total    SST     t^2 - 1
247
Incomplete Block Designs
248
Randomized Block Design
We want to compare t treatments Group the N = bt experimental units into b homogeneous blocks of size t. In each block we randomly assign the t treatments to the t experimental units in each block. The ability to detect treatment to treatment differences is dependent on the within block variability.
249
Comments: The within-block variability generally increases with block size; the larger the block size, the larger the within-block variability. For a larger number of treatments, t, it may not be appropriate or feasible to require the block size, k, to be equal to the number of treatments. If the block size, k, is less than the number of treatments (k < t) then all treatments cannot appear in each block. The design is called an Incomplete Block Design.
250
Comments regarding Incomplete block designs
When two treatments appear together in the same block it is possible to estimate the difference in treatment effects; the treatment difference is estimable. If two treatments do not appear together in the same block it may not be possible to estimate the difference in treatment effects; the treatment difference may not be estimable.
251
Example: Consider the block design with 6 treatments and 6 blocks of size two: (1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6). The treatment differences (1 vs 2, 1 vs 3, 2 vs 3, 4 vs 5, 4 vs 6, 5 vs 6) are estimable. If one of the treatments is in the group {1, 2, 3} and the other treatment is in the group {4, 5, 6}, the treatment difference is not estimable.
252
Definitions Two treatments i and i* are said to be connected if there is a sequence of treatments i0 = i, i1, i2, … iM = i* such that each successive pair of treatments (ij and ij+1) appear in the same block In this case the treatment difference is estimable. An incomplete design is said to be connected if all treatment pairs i and i* are connected. In this case all treatment differences are estimable.
253
Example: Consider the block design with 5 treatments and 5 blocks of size two: (1, 2), (2, 3), (1, 3), (4, 5), (1, 4). This incomplete block design is connected. All treatment differences are estimable. Some treatment differences are estimated with a higher precision than others.
254
Incomplete Block Designs
Balanced incomplete block designs Partially balanced incomplete block designs
255
Definition: An incomplete design is said to be a Balanced Incomplete Block Design if (i) all treatments appear in exactly r blocks (this ensures that each treatment is estimated with the same precision), and (ii) all treatment pairs i and i* appear together in exactly l blocks (this ensures that each treatment difference is estimated with the same precision; the value of l is the same for each treatment pair).
256
Some Identities
Let b = the number of blocks, t = the number of treatments, k = the block size, r = the number of times a treatment appears in the experiment, and l = the number of times a pair of treatments appears together in the same block.
bk = rt: both sides of this equation count the total number of experimental units in the experiment.
r(k - 1) = l(t - 1): both sides of this equation count the total number of experimental units that appear in the same blocks as a specific treatment.
257
BIB Design A Balanced Incomplete Block Design
(b = 15, k = 4, t = 6, r = 10, l = 6)
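Plugging this design into the two identities above gives a quick consistency check:

```latex
bk = 15 \times 4 = 60 = rt = 10 \times 6,
\qquad
r(k-1) = 10 \times 3 = 30 = \lambda(t-1) = 6 \times 5 .
```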
258
Anova Table for Incomplete Block Designs
259
Designs for Estimating
Carry-over (or Residual) Effects of Treatments Ref: “Design and Analysis of Experiments” Roger G. Petersen, Dekker 1985
260
The Cross-over or Simple Reversal Design
An Example A clinical psychologist wanted to test two drugs, A and B, which are intended to increase reaction time to a certain stimulus. He has decided to use n = 8 subjects selected at random and randomly divided into two groups of four. The first group will receive drug A first then B, while the second group will receive drug B first then A.
261
To conduct the trial he administered a drug to the individual, waited 15 minutes for absorption, applied the stimulus and then measured reaction time. The data and the design is tabulated below:
262
The Switch-back or Double Reversal Design
An Example: The following study was interested in the effect of concentrate type on the daily production of fat-corrected milk (FCM). Two concentrates were used: A - high fat; and B - low fat. Five test animals were then selected for each of the two sequence groups (A-B-A and B-A-B) in a switch-back design.
263
The data and the design is tabulated below:
One animal in the first group developed mastitis and was removed from the study.
264
The Incomplete Block Switch-back Design
An Example An insurance company was interested in buying a quantity of word processing machines for use by secretaries in the stenographic pool. The selection was narrowed down to three models (A, B, and C). A study was to be carried out , where the time to process a test document would be determined for a group of secretaries on each of the word processing models. For various reasons the company decided to use an incomplete block switch back design using n = 6 secretaries from the secretarial pool.
265
The data and the design is tabulated below:
A BIB incomplete block design with t = 3 treatments (A, B and C) and block size k = 2. Blocks: (A, B), (A, C), (B, C).
266
Designs for Estimating
Carry-over (or Residual) Effects of Treatments Ref: “Design and Analysis of Experiments” Roger G. Petersen, Dekker 1985
267
The Latin Square Change-Over (or Round Robin) Design
Selected Latin Squares Change-Over Designs (Balanced for Residual Effects) Period = Rows Columns = Subjects
268
Four Treatments
269
An Example An experimental psychologist wanted to determine the effect of three new drugs (A, B and C) on the time for laboratory rats to work their way through a maze. A sample of n= 12 test animals were used in the experiment. It was decided to use a Latin square Change-Over experimental design.
270
The data and the design is tabulated below:
271
Orthogonal Linear Contrasts
This is a technique for partitioning ANOVA sum of squares into individual degrees of freedom
272
Definition: Let x1, x2, ... , xp denote p numerical quantities computed from the data. These could be statistics or the raw observations. A linear combination of x1, x2, ... , xp is defined to be a quantity, L, computed in the following manner: L = c1x1 + c2x2 + ... + cpxp, where the coefficients c1, c2, ... , cp are predetermined numerical values.
273
Definition: Let m1, m2, ... , mp denote p means and c1, c2, ... , cp denote p coefficients such that c1 + c2 + ... + cp = 0. Then the linear combination L = c1m1 + c2m2 + ... + cpmp is called a linear contrast of the p means m1, m2, ... , mp.
274
Examples 1. A linear combination A linear contrast 2. A linear contrast L = m1 - 4 m2+ 6m3 - 4 m4 + m5 = (1) m1+ (-4) m2+ (6) m3 + (-4) m4 + (1) m5
275
Definition: Let A = a1m1 + a2m2 + ... + apmp and
B = b1m1 + b2m2 + ... + bpmp be two linear contrasts of the p means m1, m2, ... , mp. Then A and B are called Orthogonal Linear Contrasts if, in addition to a1 + a2 + ... + ap = 0 and b1 + b2 + ... + bp = 0, it is also true that a1b1 + a2b2 + ... + apbp = 0.
276
Example Let Note:
277
Definition: Let A = a1m1 + a2m2 + ... + apmp, B = b1m1 + b2m2 + ... + bpmp, ..., and L = l1m1 + l2m2 + ... + lpmp be a set of linear contrasts of the p means m1, m2, ... , mp. Then the set is called a set of Mutually Orthogonal Linear Contrasts if each linear contrast in the set is orthogonal to every other linear contrast in the set.
278
Theorem: The maximum number of linear contrasts in a set of Mutually Orthogonal Linear Contrasts of the quantities m1, m2, ... , mp is p - 1. p - 1 is called the degrees of freedom (d.f.) for comparing quantities m1, m2, ... , mp .
279
Comments Linear contrasts are making comparisons amongst the p values m1, m2, ... , mp Orthogonal Linear Contrasts are making independent comparisons amongst the p values m1, m2, ... , mp . The number of independent comparisons amongst the p values m1, m2, ... , mp is p – 1.
280
Definition: Let L = c1m1 + c2m2 + ... + cpmp denote a linear contrast of the p means,
where each mean, mi, is calculated from n observations.
281
Then the Sum of Squares for testing the Linear Contrast L, i.e. H0: L = 0 against HA: L ≠ 0, is defined to be SSL = n L^2 / (c1^2 + c2^2 + ... + cp^2).
282
The degrees of freedom (df) for testing the Linear Contrast L is 1, and the F-ratio for testing the Linear Contrast L is F = SSL / MSError.
283
To test if a set of mutually orthogonal linear contrasts are all zero,
i.e. H0: L1 = 0, L2 = 0, ... , Lk = 0, the Sum of Squares is SS = SSL1 + SSL2 + ... + SSLk, the degrees of freedom (df) is k, and the F-ratio is F = (SS/k) / MSError.
284
Theorem: Let L1, L2, ... , Lp-1 denote p - 1 mutually orthogonal linear contrasts for comparing the p means. Then the Sum of Squares for comparing the p means based on p - 1 degrees of freedom, SSBetween, satisfies SSBetween = SSL1 + SSL2 + ... + SSLp-1.
285
Comment Defining a set of Orthogonal Linear Contrasts for comparing the p means allows the researcher to "break apart" the Sum of Squares for comparing the p means, SSBetween, and make individual tests of each the Linear Contrast.
286
Techniques for constructing orthogonal linear contrasts
287
Comparing first k – 1 with kth
Consider the p values – y1, y2, y3, ... , yp L1 = 1st vs 2nd = y1 - y2 L2 = 1st , 2nd vs 3rd = ½ (y1 + y2) – y3 L3 = 1st , 2nd , 3rd vs 4th = 1/3 (y1 + y2 + y3) – y4 etc
288
Helmert contrasts
Contrast coefficients:
L1: -1  1  0  0  0
L2: -1 -1  2  0  0
L3: -1 -1 -1  3  0
L4: -1 -1 -1 -1  4
Contrast explanation:
L1: 2nd versus 1st
L2: 3rd versus 1st and 2nd
L3: 4th versus 1st, 2nd and 3rd
L4: 5th versus 1st, 2nd, 3rd and 4th
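A quick sketch verifying that these Helmert contrasts (for p = 5 means) sum to zero and are mutually orthogonal; the coefficient matrix simply repeats the table above.

```python
import numpy as np
from itertools import combinations

# Helmert contrast coefficients for p = 5 means (rows are L1..L4).
helmert = np.array([
    [-1,  1,  0,  0,  0],
    [-1, -1,  2,  0,  0],
    [-1, -1, -1,  3,  0],
    [-1, -1, -1, -1,  4],
])

print("row sums:", helmert.sum(axis=1))                       # each contrast sums to 0
for i, j in combinations(range(4), 2):
    print(f"L{i+1}.L{j+1} =", int(helmert[i] @ helmert[j]))   # all inner products are 0
```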
289
Comparing between Groups then within groups
Consider the p = 10 values y1, y2, y3, ... , y10. Suppose these 10 values are grouped:
Group 1: y1, y2, y3
Group 2: y4, y5, y6, y7
Group 3: y8, y9, y10
Comparison of Groups (2 d.f.):
L1 = Group 1 vs Group 2 = 1/3(y1 + y2 + y3) - 1/4(y4 + y5 + y6 + y7)
L2 = Groups 1 and 2 vs Group 3 = 1/7(y1 + y2 + y3 + y4 + y5 + y6 + y7) - 1/3(y8 + y9 + y10)
290
Comparison within Groups:
Within Group 1 (2 d.f.)
L3 = 1 vs 2 = y1 - y2
L4 = 1, 2 vs 3 = 1/2(y1 + y2) - y3
Within Group 2 (3 d.f.)
L5 = 4 vs 5 = y4 - y5
L6 = 4, 5 vs 6 = 1/2(y4 + y5) - y6
L7 = 4, 5, 6 vs 7 = 1/3(y4 + y5 + y6) - y7
Within Group 3 (2 d.f.)
L8 = 8 vs 9 = y8 - y9
L9 = 8, 9 vs 10 = 1/2(y8 + y9) - y10
291
Comparisons when Grouping is done on two different ways
Consider the p = ab values y11, y12, y13, ... , y1b, y21, y22, y23, ... , y2b, ... , ya1, ya2, ya3, ... , yab, arranged with Row Groups 1, 2, …, a and Column Groups 1, 2, 3, ..., b:
y11 y12 y13 ... y1b
y21 y22 y23 ... y2b
y31 y32 y33 ... y3b
⁞
ya1 ya2 ya3 ... yab
292
Comparison of Row Groups (a - 1 d.f.) R1 , R2 , R3 , ... , Ra -1
Comparison of Column Groups (b - 1 d.f.) C1 , C2 , C3 , ... , Cb -1 Interaction contrasts (a - 1) (b - 1) d.f. (RC)11 = R1 × C1 , (RC)12 = R1 × C2 , ... , (RC)a - 1,b - 1 = Ra - 1 × Cb – 1 Comment: The coefficients of (RC)ij = Ri × Cj are found by multiplying the coefficients of Ri with the coefficients of Cj.
293
Orthogonal Linear Contrasts
≠ Polynomial Regression
294
Let m1, m2, ... , mp denote p means and consider the first differences
Dmi = mi - mi-1 if m1 = m2 = ... = mp then Dmi = mi - mi-1 = 0 If the points (1, m1), (2, m2) … (p, mp) lie on a straight line with non-zero slope then Dmi = mi - mi-1 ≠ 0 but equal.
295
Consider the 2nd differences
D2mi = (mi - mi-1)-(mi -1 - mi-2) = mi - 2mi-1 + mi-2 If the points (1, m1), (2, m2) … (p, mp) lie on a straight line then D2mi = mi - 2mi-1 + mi-2 = 0 If the points (1, m1), (2, m2) … (p, mp) lie on a quadratic curve then D2mi = mi - 2mi-1 + mi-2 ≠ 0 but equal.
296
Consider the 3rd differences
D3mi = mi - 3mi-1 + 3mi-2 - mi-3 If the points (1, m1), (2, m2) … (p, mp) lie on a quadratic curve then D3mi = mi - 3mi-1 + 3mi-2 - mi-3 = 0 If the points (1, m1), (2, m2) … (p, mp) lie on a cubic curve then D3mi = mi - 3mi-1 + 3mi-2 - mi-3 ≠ 0 but equal.
297
Continuing, 4th differences, D4mi will be non- zero but equal if the points (1, m1), (2, m2) … (p, mp) lie on a quartic curve (4th degree). 5th differences, D5mi will be non- zero but equal if the points (1, m1), (2, m2) … (p, mp) lie on a quintic curve (5th degree). etc.
298
Let L = a2 Dm2 + a3 Dm3 + … + ap Dmp Q2 = b3 D2m3 + … + bp D2mp C = c4 D3m4 + … + cp D3mp Q4 = d5 D4m5+ … + dp D4mp etc. Where a2, …, ap, b1, …, bp, c1, … etc are chosen so that L, Q2, C, Q4, … etc are mutually orthogonal contrasts.
299
If the means are equal then
L = Q2 = C = Q4 = … = 0. If the means are linear then L ≠ 0 but Q2 = C = Q4 = … = 0. If the means are quadratic then Q2 ≠ 0 but C = Q4, … = 0. If the means are cubic then C ≠ 0 but Q4 = … = 0.
300
Orthogonal Linear Contrasts for Polynomial Regression
301
Orthogonal Linear Contrasts for Polynomial Regression
302
Multiple Testing Fisher’s L.S.D. (Least Significant Difference) Procedure Tukey’s Multiple comparison procedure Scheffe’s multiple comparison procedure
303
Multiple Testing – a Simple Example
Suppose we are interested in testing to see if two parameters (q1 and q2) are equal to zero. There are two approaches We could test each parameter separately H0: q1 = 0 against HA: q1 ≠ 0 , then H0: q2 = 0 against HA: q2 ≠ 0 We could develop an overall test H0: q1 = 0, q2= 0 against HA: q1 ≠ 0 or q2 ≠ 0
304
To test each parameter separately
then We might use the following test: then is chosen so that the probability of a Type I errorof each test is a.
305
To perform an overall test
H0: q1 = 0, q2= 0 against HA: q1 ≠ 0 or q2 ≠ 0 we might use the test is chosen so that the probability of a Type I error is a.
313
Multiple Comparison Tests
Post-hoc Tests Multiple Comparison Tests
314
Multiple Comparison Tests
Post-hoc Tests Multiple Comparison Tests
315
Multiple Testing Fisher’s L.S.D. (Least Significant Difference) Procedure Tukey’s Multiple comparison procedure Scheffe’s multiple comparison procedure
316
Suppose we have p means An F-test has revealed that there are significant differences amongst the p means We want to perform an analysis to determine precisely where the differences exist.
317
Example One –way ANOVA The F test – for comparing k means
Situation We have k normal populations Let mi and s denote the mean and standard deviation of population i. i = 1, 2, 3, … k. Note: we assume that the standard deviation for each population is the same. s1 = s2 = … = sk = s
318
We want to test against
319
Anova Table Mean Square F-ratio Between k - 1 SSBetween MSBetween
Source d.f. Sum of Squares Mean Square F-ratio Between k - 1 SSBetween MSBetween MSB /MSW Within N - k SSWithin MSWithin Total N - 1 SSTotal
320
Comments The F-test H0: m1 = m2 = m3 = … = mk against HA: at least one pair of means are different If H0 is accepted we know that all means are equal (not significantly different) If H0 is rejected we conclude that at least one pair of means is significantly different. The F – test gives no information to which pairs of means are different. One now can use two sample t tests to determine which pairs means are significantly different
321
Fishers LSD (least significant difference) procedure:
322
Fishers LSD (least significant difference) procedure:
Test H0: m1 = m2 = m3 = … = mk against HA: at least one pair of means are different, using the ANOVA F-test If H0 is accepted we know that all means are equal (not significantly different). Then stop in this case If H0 is rejected we conclude that at least one pair of means is significantly different, then follow this by using two sample t tests to determine which pairs means are significantly different
323
Tukey’s Multiple Comparison Test
324
Let denote the standard error of each Tukey's Critical Differences Two means are declared significant if they differ by more than this amount. = the tabled value for Tukey’s studentized range p = no. of means, n = df for Error
325
Table: Critical values for Tukey’s studentized Range distribution
329
Scheffe’s Multiple Comparison Test
330
Scheffe's Critical Differences (for Linear contrasts)
A linear contrast is declared significant if it exceeds this amount. = the tabled value for F distribution (p -1 = df for comparing p means, n = df for Error)
331
Scheffe's Critical Differences
(for comparing two means) Two means are declared significant if they differ by more than this amount.
332
Multiple Confidence Intervals
Tukey’s Multiple confidence intervals Scheffe’s Multiple confidence intervals One-at-a-time confidence intervals
333
Tukey’s Multiple confidence intervals
Comments Tukey’s Multiple confidence intervals One-at-a-time confidence intervals The probability that each of these interval contains mi – mj is 1 – a. The probability that all of these interval contains mi – mj is considerably lower than 1 – a Scheffe’s Multiple confidence intervals These intervals can be computed not only for simple differences in means, mi – mj , but also any other linear contrast, c1m1 + … + ckmk. The probability that all of these intervals contain its linear contrast is 1 – a
334
There are many multiple (post hoc) comparison procedures
Tukey’s Scheffe’, Duncan’s Multiple Range Neumann-Keuls etc Considerable controversy: “I have not included the multiple comparison methods of D.B. Duncan because I have been unable to understand their justification” H. Scheffe, Analysis of Variance
335
2k Experiments, Incomplete block designs for 2k experiments, fractional 2k experiments
336
Factorial Experiments
337
k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable y k Categorical independent variables A, B, C, … (the Factors) Let a = the number of categories of A b = the number of categories of B c = the number of categories of C etc. t = abc... Treatment combinations
338
The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors Total number of treatment combinations t = abc…. In the completely randomized design n experimental units (test animals , test plots, etc. are randomly assigned to each treatment combination. Total number of experimental units N = nt=nabc..
339
The ANOVA Table three factor experiment
340
If the number of factors, k, is large then it may be appropriate to keep the number of levels of each factor low (2 or 3) to keep the number of treatment combinations, t, small. t = 2k if a = b =c = ... =2 or t = 3k if a = b =c = ... =3 The experimental designs are called 2k and 3k designs
341
The ANOVA Table 23 experiment
Source Sum of Squares d.f. A SSA 1 B SSB C SSC AB SSAB AC SSAC BC SSBC ABC SSABC Error SSError 23(n – 1)
342
Notation for treatment combinations for 2k experiments
There are several methods for indicating treatment combinations in a 2k experiment and 3k experiment. A sequence of small letters representing the factors with subscripts (0,1 for 2k experiment and 0, 1, 2 for a 3k experiment) A sequence of k digits (0,1 for 2k experiment and 0, 1, 2 for a 3k experiment. A third way of representing treatment combinations for 2k experiment is by representing each treatment combination by a sequence of small letters. If a factor is at its high level, it’s letter is present. If a factor is at its low level, it’s letter is not present.
343
The 8 treatment combinations in a 23 experiment
(a0, b0, c0), (a1, b0, c0), (a0, b1, c0), (a0, b0, c1), (a1, b1, c0), (a1, b0, c1), (a0, b1, c1), (a1, b1, c1) 000, 100, 010, 001, 110, 101, 011, 111 1, a, b, c, ab, ac, bc, abc In the last way of representing the treatment combinations, a more natural ordering is: 1, a, b, ab, c, ac, bc, abc Using this ordering the 16 treatment combinations in a 24 experiment 1, a, b, ab, c, ac, bc, abc, d, da, db, dab, dc, dac, dbc, dabc
344
Notation for Linear contrasts treatment combinations in a 2k experiments
The linear contrast for 1 d.f. representing the Main effect of A LA = (1 + b + c + bc) – (a + ab +ac + abc) = comparison of the treatment combinations when A is at its low level with treatment combinations when A is at its high level. Note: LA = (1 - a) (1 + b) (1 + c) also LB = (1 + a) (1 - b) (1 + c) = (1 + a + c + ac) – (b + ab +bc + abc) LC = (1 + a) (1 + b) (1 - c) = (1 + a + b + ab) – (c + ca +cb + abc)
345
The linear contrast for 1 d.f. representing the interaction AB
LAB = (1 - a) (1 - b) (1 + c) = (1 + ab + c + abc) – (a + b +ac + bc) = comparison of the treatment combinations where A and B are both at a high level or both at a low level with treatment combinations either A is at its high level and B is at a low level or B is at its high level and A is at a low level. LAC = (1 - a) (1 + b) (1 - c) = (1 + ac + b + abc) – (a + c +ab + bc) LBC = (1 + a) (1 - b) (1 - c) = (1 + bc + a + abc) – (b + c +ac + ab)
346
The linear contrast for 1 d.f. representing the interaction ABC
LABC = (1 - a) (1 - b) (1 - c) = (1 + ab + ac + bc) – (a + b + c + abc) In general Linear contrasts are of the form: L = (1 ± a)(1 ± b)(1 ± c) etc We use minus (-) if the factor is present in the effect and plus (+) if the factor is not present.
347
+ × + = + - × + = - + × - = - - × - = +
The sign of coefficients of each treatment for each contrast (LA, LB, LAB, LC, LAC, LBC, LABC) is illustrated in the table below: For the main effects (LA, LB, LC) the sign is negative (-) if the letter is present in the treatment, positive (+) if the letter is not present. The interactions are products of the main effects: + × + = + - × + = - + × - = - - × - = +
348
Strategy for a single replication (n = 1)
The ANOVA Table 23 experiment Source Sum of Squares d.f. A SSA 1 B SSB C SSC AB SSAB AC SSAC BC SSBC ABC SSABC Error SSError 23(n – 1) If n = 1 then there is 0 df for estimating error. In practice the higher order interactions are not usually present. One makes this assumption and pools together these degrees of freedom to estimate Error
349
In a 7 factor experiment (each at two levels) there are 27 =128 treatments.
350
ANOVA table: Pool together these degrees of freedom to estimate Error
Source d.f. Main Effects 7 2-factor interactions 21 3-factor interactions 35 4-factor interactions 5-factor interactions 6-factor interactions 7-factor interaction 1 Pool together these degrees of freedom to estimate Error
351
Randomized Block design for 2k experiments
Blocks ... n 1 2 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc A Randomized Block Design for a 23 experiment
352
The ANOVA Table 23 experiment in RB design
Source Sum of Squares d.f. Blocks SSBlocks n - 1 A SSA 1 B SSB C SSC AB SSAB AC SSAC BC SSBC ABC SSABC Error SSError (23 – 1)(n – 1)
353
Incomplete Block designs for 2k experiments Confounding
354
... 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a
A Randomized Block Design for a 23 experiment Blocks ... n 1 2 3 4 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc 1 a b ab c ac bc abc
355
Incomplete Block designs for 2k experiments
A Randomized Block Design for a 2k experiment requires blocks of size 2k. The ability to detect treatment differences depends on the magnitude of the within block variability. This can be reduced by decreasing the block size. Blocks 1 2 Example: a 23 experiment in blocks of size 4 (1 replication). The ABC interaction is confounded with blocks ac 1 a bc ac b c ab
356
ac 1 a bc b ac c ab Blocks In this experiment the linear contrast 1 2
LABC = (1 + ab + ac + bc) – (a + b + c + abc) In addition to measuring the ABC interaction it is also subject to block to block differences. ac 1 a bc b ac The ABC interaction it is said to be confounded with block differences. c ab The linear contrasts LA = (1 + b + c + bc) – (a + ab +ac + abc) LB = (1 + a + c + ac) – (b + ab +bc + abc) LC = (1 + a + b + ab) – (c + ca +cb + abc LAB = (1 + ab + c + abc) – (a + b +ac + bc) LAC = (1 + ac + b + abc) – (a + c +ab + bc) LBC = (1 + bc + a + abc) – (b + c +ac + ab) are not subject to block to block differences
357
To confound an interaction (e. g
To confound an interaction (e. g. ABC) consider the linear contrast associated with the interaction: LABC = 1 + ab + ac + bc – a – b – c – abc Assign treatments associated with positive (+) coefficients to one block and treatments associated with negative (-) coefficients to the other block Blocks 1 2 ac 1 a bc b ac c ab
358
The ANOVA Table 23 experiment in incomplete design with 2 blocks of size 4
Source Sum of Squares d.f. Blocks SSBlocks 1 A SSA B SSB C SSC AB SSAB AC SSAC BC SSBC Total SSTotal 7
359
Confounding more than one interaction to further reduce block size
360
Example: contrasts for 23 experiment
If I want to confound ABC, one places the treatments associated with the positive sign (+) in one block and the treatments associated with the negative sign (-) in the other block. If I want to confound both BC and ABC, one chooses the blocks using the sign categories (+,+) (+,-) (-,+) (-,-) Comment: There will also be a third contrast that will also be confounded
361
1 a ab b bc abc ac c LABC = (1 + ab + ac + bc) – (a + b + c + abc) and
Example: a 23 experiment in blocks of size 2 (1 replicate). BC and ABC interaction is confounded in the four block. Block 1 Block 2 Block 3 Block 4 1 a ab b bc abc ac c LABC = (1 + ab + ac + bc) – (a + b + c + abc) and LBC = (1 + bc + a + abc) – (b + c +ac + ab) are confounded with blocks LA = (1 + b + c + bc) – (a + ab +ac + abc) is also confounded with blocks LB = (1 + a + c + ac) – (b + ab +bc + abc) LC = (1 + a + b + ab) – (c + ca +cb + abc LAB = (1 + ab + c + abc) – (a + b +ac + bc) LAC = (1 + ac + b + abc) – (a + c +ab + bc) are not subject to block to block differences
362
The ANOVA Table 23 experiment in incomplete design with 4 blocks of size 2 (ABC, BC and hence A confounded with blocks) Source Sum of Squares d.f. Blocks SSBlocks 3 B SSB 1 C SSC AB SSAB AC SSAC Total SSTotal 7 There are no degrees of freedom for Error. Solution: Assume either one or both of the two factor interactions are not present and use those degrees of freedom to estimate error
363
Rule: (for determining additional contrasts that are confounded with block)
“Multiply” the confounded interactions together. If a factor is raised to the power 2, delete it Example: Suppose that ABC and BC is confounded, then so also is (ABC)(BC) = AB2C2 = A. A better choice would be to confound AC and BC, then the third contrast that would be confounded would be (AC)(BC) = ABC2 = AB
364
If I want to confound both AC and BC, one chooses the blocks using the sign categories (+,+) (+,-) (-,+) (-,-). As noted this would also confound (AC)(BC) = ABC2 = AB. Block 1 Block 2 Block 3 Block 4 1 b a ab abc ac bc c
365
The ANOVA Table 23 experiment in incomplete design with 4 blocks of size 2 (AC, BC and hence AB confounded with blocks) Source Sum of Squares d.f. Blocks SSBlocks 3 A SSA 1 B SSB C SSC ABC SSABC Total SSTotal 7 There are no degrees of freedom for Error. Solution: Assume that the three factor interaction is not present and use this degrees of freedom to estimate error
366
Partial confounding
367
1 ab ab bc 1 a bc c b a c b abc ac ab ab ac 1 a b ac c ab bc
Example: a 23 experiment in blocks of size 4 (3 replicates). BC interaction is confounded in 1st replication. AC interaction is confounded in 2nd replication. AB interaction is confounded in 3rd replication. Replicate 1 BC confounded Replicate 2 AC confounded Replicate 3 AB confounded Block 1 Block 2 Block 3 Block 4 Block 5 Block 6 1 ab ab bc 1 a bc c b a c b abc ac ab ab ac 1 a b ac c ab bc The main effects (A, B and C) and the three factor interaction ABC can be estimated using all three replicates. The two factor interaction AB can be estimated using replicates 1 and 2, AC using replicates 1 and 3, BC using replicates 2 and 3,
368
The ANOVA Table Source Sum of Squares d.f. Reps SSBlocks 2
Blocks within Reps SSBlocks(Reps) 3 A SSA 1 B SSB C SSC AB SSAB Reps I,II AC SSAC Reps I,III BC SSBC Reps II,III ABC SSABC Error SSError 11 Total SSTotal 23
369
Example: A chemist is interested in determining how purity (Y) of a chemical product, depends on agitation rate (A), base component concentration (B) and concentration of reagent (C). He decides to use a 23 design. Only 4 runs can be done each day (Block) and he wanted to have 3 replications of the experiment. Replicate 1 BC confounded Replicate 2 AC confounded Replicate 3 AB confounded day 1 day 2 day 3 day 4 day 5 day 6 1 25 ab 43 abc 39 bc 38 26 a 34 c 30 b 29 37 32 42 ac 40 27 46 52 33 51 36
370
The ANOVA Table F0.05(1,11) = 4.84 and F0.01(1,11) = 9.65 Source
Sum of Squares d.f. Mean Square F Reps 111.00 2 55.50 Blocks within Reps 108.00 3 36.00 A 600.00 1 40.6** B 253.50 17.2** C 54.00 3.7(ns) AB (Reps I,II) 6.25 < 1 AC (Reps I,III) 1.00 BC (Reps II,III) ABC 13.50 Error 162.50 11 14.77 Total 23 F0.05(1,11) = and F0.01(1,11) = 9.65
371
Fractional Factorials
372
In a 2k experiment the number of experimental units required may be quite large even for moderate values of k. For k = 7, 27 = 128 and n27 = 256 if n = 2. Solution: Use only n = 1 replicate and use higher order interactions to estimate error. It is very rare thqt the higher order interactions are significant An alternative solution is to use ½ a replicate, ¼ a replicate, 1/8 a replicate etc. (i.e. a fractional replicate) 2k – 1 = ½ 2k design, 2k – 2 = ¼ 2k design
373
In a fractional factorial design, some ot he effects (interactions or main effects) may not be estimable. However it may be assumed that these effects are not present (in particular the higher order interactions)
374
Example: 24 experiment, A, B, C, D - contrasts
To construct a ½ replicate of this design in which the four factor interaction, ABCD, select only the treatment combinations where the coefficient is positive (+) for ABCD
375
The treatments and contrasts of a ½ 24 = 24-1 experiment
Notice that some of the contrasts are equivalent e.g. A and BCD, B and ACD, etc In this case the two contrasts are said to be aliased. Note the defining contrast, ABCD is aliased with the constant term I. To determine aliased contrasts multiply the any effect by the effect of the defining contrast e.g. (A)×(ABCD) = A2BCD = BCD
376
Aliased contrasts in a 24 -1 design with ABCD the defining contrast
A with BCD B with ACD C with ABD D with ABC AB with CD AC with BD AD with BC If an effect is aliased with another effect you can either estimate one or the other but not both
377
The ANOVA for a 24 -1 design with ABCD the defining contrast
Source df A 1 B C D AB AC AD Total 7
378
Example: ¼ 24 experiment To construct a ¼ replicate of the 24 design. Choose two defining contrasts, AB and CD, say and select only the treatment combinations where the coefficient is positive (+) for both AB and CD
379
The treatments and contrasts of a ¼ 24 = 24-2 experiment
Aliased contrasts I and AC and BD and ABCD A and C and ABD and BCD B and ABC and D and ACD AB and BC and AD and CD
380
The ANOVA for a 24 -1 design with ABCD the defining contrast
Source df A 1 B AB Total 3 There may be better choices for the defining contrasts The smaller fraction of a 2k design becomes more appropriate as k increases.
381
Response surfaces
382
We have a dependent variable y, independent variables x1, x2, ... ,xp
The general form of the model y = f(x1, x2, ... ,xp) + e Contour Map Surface Graph
383
The linear model y = b0 + b1x1 + b2x2 +... + bpxp + e Contour Map
Surface Graph
384
The quadratic response model
Linear terms Quadratic terms Contour Map Surface Graph
385
The quadratic response model (3 variables)
Linear terms Quadratic terms To fit this model we would be given the data on y, x1, x2, x3. From that data we would compute: We then regress y on x1, x2, x3, u4, u5, u6 , u7, u8 and u9
386
Exploration of a response surface The method of steepest ascent
387
Situation We have a dependent variable y, independent variables x1, x2, ... ,xp The general form of the model y = f(x1, x2, ... ,xp) + e We want to find the values of x1, x2, ... ,xp to maximize (or minmize) y. We will assume that the form of f(x1, x2, ... ,xp) is unknown. If it was known (e.g. A quadratic response model), we could estimate the parameters and determine the optimum values of x1, x2, ... ,xp using calculus
388
The method of steepest ascent:
Choose a region in the domain of f(x1, x2, ... ,xp) Collect data in that region Fit a linear model (plane) to that data. Determine from that plane the direction of its steepest ascent. (direction (b1, b2, ... ,bp )) Move off in the direction of steepest ascent collecting on y. Continue moving in that direction as long as y is increasing and stop when y stops increasing. Choose a region surrounding that point and return to step 2. Continue until the plane fitted to the data is horizontal Consider fitting a quadratic response model in this region and determining where it is optimal.
389
The method of steepest ascent:
domain of f(x1, x2, ... ,xp) Optimal (x1, x2) Initial region direction of steepest ascent. Final region 2nd region
390
Logistic regression
391
Recall the simple linear regression model:
y = b0 + b1x + e where we are trying to predict a continuous dependent variable y from a continuous independent variable x. This model can be extended to Multiple linear regression model: y = b0 + b1x1 + b2x2 + … + + bpxp + e Here we are trying to predict a continuous dependent variable y from a several continuous dependent variables x1 , x2 , … , xp .
392
Now suppose the dependent variable y is binary.
It takes on two values “Success” (1) or “Failure” (0) We are interested in predicting a y from a continuous dependent variable x. This is the situation in which Logistic Regression is used
393
The logisitic Regression Model
Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x. is called the odds ratio The ratio: This quantity will also increase with the value of x, ranging from zero to infinity. The quantity: is called the log odds ratio
394
Example: odds ratio, log odds ratio
Suppose a die is rolled: Success = “roll a six”, p = 1/6 The odds ratio The log odds ratio
395
The logisitic Regression Model
Assumes the log odds ratio is linearly related to x. i. e. : In terms of the odds ratio
396
The logisitic Regression Model
Solving for p in terms x. or
397
Interpretation of the parameter b0 (determines the intercept)
x
398
Interpretation of the parameter b1 (determines when p is 0
Interpretation of the parameter b1 (determines when p is 0.50 (along with b0)) p when x
399
Also when is the rate of increase in p with respect to x when p = 0.50
400
Interpretation of the parameter b1 (determines slope when p is 0.50 )
x
401
The data The data will for each case consist of
a value for x, the continuous independent variable a value for y (1 or 0) (Success or Failure) Total of n = 250 cases
403
Estimation of the parameters
The parameters are estimated by Maximum Likelihood estimation and require a statistical package such as SPSS
404
Using SPSS to perform Logistic regression
Open the data file:
405
Choose from the menu: Analyze -> Regression -> Binary Logistic
406
The following dialogue box appears
Select the dependent variable (y) and the independent variable (x) (covariate). Press OK.
407
Here is the output The Estimates and their S.E.
408
The parameter Estimates
409
Interpretation of the parameter b0 (determines the intercept)
Interpretation of the parameter b1 (determines when p is 0.50 (along with b0))
410
Another interpretation of the parameter b1
is the rate of increase in p with respect to x when p = 0.50
411
The Multiple Logistic Regression model
412
Here we attempt to predict the outcome of a binary response variable Y from several independent variables X1, X2 , … etc
413
Multiple Logistic Regression an example
In this example we are interested in determining the risk of infants (who were born prematurely) of developing BPD (bronchopulmonary dysplasia) More specifically we are interested in developing a predictive model which will determine the probability of developing BPD from X1 = gestational Age and X2 = Birthweight
414
For n = 223 infants in prenatal ward the following measurements were determined
X1 = gestational Age (weeks), X2 = Birth weight (grams) and Y = presence of BPD
415
The data
416
The results
417
Graph: Showing Risk of BPD vs GA and BrthWt
418
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
419
Multiway Frequency Tables
Two-Way A
420
Three -Way B A C
421
Three -Way C B A
422
four -Way B A C D
423
Log Linear Model
424
Three-way Frequency Tables
425
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table then in general Where – Side conditions hold
426
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table then in general
427
Hierarchical Log-linear models for categorical Data
For three way tables The hierarchical principle: If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction
428
1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables.
429
2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.
430
3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.
431
4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 3 with variables 1 and 2.
432
5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.
433
6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.
434
7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.
435
8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
436
9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) u23(j,k) + u123(i,j,k) Notation: [123] Description: No simplifying dependence structure.
437
Hierarchical Log-linear models for 3 way table
Description [1][2][3] Mutual independence between all three variables. [1][23] Independence of Variable 1 with variables 2 and 3. [2][13] Independence of Variable 2 with variables 1 and 3. [3][12] Independence of Variable 3 with variables 1 and 2. [12][13] Conditional independence between variables 2 and 3 given variable 1. [12][23] Conditional independence between variables 1 and 3 given variable 2. [13][23] Conditional independence between variables 1 and 2 given variable 3. [12][13] [23] Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable. [123] The saturated model
438
Comments Maximum Likelihood estimates can be computed for any hierarchical log linear model (i.e. more than 2 variables) In certain situations the equations need to be solved numerically For the saturated model (all interactions and main effects), the estimate of mijk… is xijk… .
439
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
440
Multiway Frequency Tables
Two-Way A
441
four -Way B A C D
442
Log Linear Model
443
Two- way table where The multiplicative form:
444
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table then in general where
445
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table then in general or the multiplicative form
446
Comments The log-linear model is similar to the ANOVA models for factorial experiments. The ANOVA models are used to understand the effects of categorical independent variables (factors) on a continuous dependent variable (Y). The log-linear model is used to understand dependence amongst categorical variables The presence of interactions indicate dependence between the variables present in the interactions
447
Hierarchical Log-linear models for categorical Data
For three way tables The hierarchical principle: If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction
448
1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables.
449
2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.
450
3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.
451
4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 3 with variables 1 and 2.
452
5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.
453
6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.
454
7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.
455
8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
456
9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) u23(j,k) + u123(i,j,k) Notation: [123] Description: No simplifying dependence structure.
457
Hierarchical Log-linear models for 3 way table
Description [1][2][3] Mutual independence between all three variables. [1][23] Independence of Variable 1 with variables 2 and 3. [2][13] Independence of Variable 2 with variables 1 and 3. [3][12] Independence of Variable 3 with variables 1 and 2. [12][13] Conditional independence between variables 2 and 3 given variable 1. [12][23] Conditional independence between variables 1 and 3 given variable 2. [13][23] Conditional independence between variables 1 and 2 given variable 3. [12][13] [23] Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable. [123] The saturated model
458
Goodness of Fit Statistics
These statistics can be used to check if a log-linear model will fit the observed frequency table
459
Goodness of Fit Statistics
The Chi-squared statistic The Likelihood Ratio statistic: d.f. = # cells - # parameters fitted We reject the model if c2 or G2 is greater than
460
Conditional Test Statistics
461
Suppose that we are considering two Log-linear models and that Model 2 is a special case of Model 1.
That is the parameters of Model 2 are a subset of the parameters of Model 1. Also assume that Model 1 has been shown to adequately fit the data.
462
In this case one is interested in testing if the differences in the expected frequencies between Model 1 and Model 2 is simply due to random variation] The likelihood ratio chi-square statistic that achieves this goal is:
463
Stepwise selection procedures
Forward Selection Backward Elimination
464
Forward Selection: Starting with a model that under fits the data, log-linear parameters that are not in the model are added step by step until a model that does fit is achieved. At each step the log-linear parameter that is most significant is added to the model: To determine the significance of a parameter added we use the statistic: G2(2|1) = G2(2) – G2(1) Model 1 contains the parameter. Model 2 does not contain the parameter
465
Backward Elimination:
Starting with a model that over fits the data, log-linear parameters that are in the model are deleted step by step until a model that continues to fit the model and has the smallest number of significant parameters is achieved. At each step the log-linear parameter that is least significant is deleted from the model: To determine the significance of a parameter deleted we use the statistic: G2(2|1) = G2(2) – G2(1) Model 1 contains the parameter. Model 2 does not contain the parameter
466
Modelling of response variables
Independent → Dependent
467
Logit Models To date we have not worried whether any of the variables were dependent of independent variables. The logit model is used when we have a single binary dependent variable.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.