
1 Review

2 Fitting Equations to Data

3 The Multiple Linear Regression Model
An important statistical model

4 In Multiple Linear Regression we assume the following model
Y = b0 + b1X1 + b2X2 + ... + bpXp + e. This model is called the Multiple Linear Regression Model, where b0, b1, b2, ... , bp are unknown parameters and e is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation s.

5 The importance of the Linear model
1.     It is the simplest form of a model in which each independent variable has some effect on the dependent variable Y. When fitting models to data one tries to find the simplest form of a model that still adequately describes the relationship between the dependent variable and the independent variables. The linear model is sometimes the first model to be fitted and only abandoned if it turns out to be inadequate.

6 2.     In many instances a linear model is the most appropriate model to describe the dependence relationship between the dependent variable and the independent variables. This will be true if the dependent variable increases at a constant rate as any of the independent variables is increased while holding the other independent variables constant.

7 3.     Many non-Linear models can be put into the form of a Linear model by appropriately transforming the dependent variable and/or any or all of the independent variables. This important fact (i.e. that many non-linear models are linearizable) ensures the wide utility of the Linear model.

8 Summary of the Statistics used in Multiple Regression

9 The Least Squares Estimates:
The values of b0, b1, ... , bp that minimize the sum of the squared differences between each observation yi and its predicted value. Note: the predicted value of yi is the value obtained from the fitted equation.

10 The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares (SSTotal)
b) Residual Sum of Squares (SSError)
c) Regression Sum of Squares (SSReg)
Note: SSTotal = SSReg + SSError

11 The Analysis of Variance Table
Source       Sum of Squares   d.f.     Mean Square                      F
Regression   SSReg            p        SSReg/p = MSReg                  MSReg/s2
Error        SSError          n-p-1    SSError/(n-p-1) = MSError = s2
Total        SSTotal          n-1
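As a concrete illustration of these quantities, here is a minimal Python sketch on simulated data (the data and variable names are made up, not from the slides) that computes the least squares estimates and the Analysis of Variance entries SSTotal, SSReg and SSError.

```python
# A minimal sketch (made-up data) of the least squares estimates and the ANOVA entries.
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
X = rng.normal(size=(n, p))                       # independent variables X1, X2
y = 1.0 + 0.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.3, size=n)

A = np.column_stack([np.ones(n), X])              # design matrix with an intercept column
b_hat = np.linalg.solve(A.T @ A, A.T @ y)         # least squares estimates b0, b1, ..., bp
y_hat = A @ b_hat                                 # predicted values of yi

ss_total = np.sum((y - y.mean()) ** 2)            # adjusted total sum of squares
ss_error = np.sum((y - y_hat) ** 2)               # residual sum of squares
ss_reg = ss_total - ss_error                      # regression sum of squares

ms_reg = ss_reg / p
s2 = ss_error / (n - p - 1)                       # MSError, the estimate of sigma^2
F = ms_reg / s2
print(b_hat, ss_total, ss_reg, ss_error, F)
```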

12 Testing for Hypotheses related to Multiple Regression.

13 When testing hypotheses there are two models of interest.
1. The Complete Model: Y = b0 + b1X1 + b2X2 + b3X3 + ... + bpXp + e
2. The Reduced Model: the model implied by H0.
You are interested in knowing whether the complete model can be simplified to the reduced model.

14 Some Comments The complete model contains more parameters and will always provide a better fit to the data than the reduced model. The Residual Sum of Squares for the complete model will always be smaller than the R.S.S. for the reduced model. If the reduction in the R.S.S. is small as we change from the reduced model to the complete model, the reduced model should be accepted as providing an adequate fit. If the reduction in the R.S.S. is large as we change from the reduced model to the complete model, the reduced model should be rejected as providing an adequate fit and the complete model should be kept. These principles form the basis for the following test.

15 Testing the General Linear Hypothesis
The F-test for H0 is performed by carrying out two runs of a multiple regression package.

16 Run 1: Fit the complete model.
Resulting in the following Anova Table:
Source             df       Sum of Squares
Regression         p        SSReg
Residual (Error)   n-p-1    SSError
Total              n-1      SSTotal

17 Run 2: Fit the reduced model (q parameters eliminated)
Resulting in the following Anova Table:
Source             df         Sum of Squares
Regression         p-q        SS1Reg
Residual (Error)   n-p+q-1    SS1Error
Total              n-1        SSTotal

18 The Test: The test is carried out using the test statistic F = [SSH0/q] / s2, where SSH0 = SS1Error - SSError = SSReg - SS1Reg and s2 = SSError/(n-p-1). The test statistic, F, has an F-distribution with n1 = q d.f. in the numerator and n2 = n - p - 1 d.f. in the denominator if H0 is true.

19 The Anova Table for the Test:
Source                    df       Sum of Squares   Mean Square        F
Regression                p-q      SS1Reg           [1/(p-q)]SS1Reg    MS1Reg/s2
(for the reduced model)
Departure from H0         q        SSH0             (1/q)SSH0          MSH0/s2
Residual (Error)          n-p-1    SSError          s2
Total                     n-1      SSTotal
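To make the "two runs" concrete, here is a small Python sketch on simulated data; the hypothesis H0: b2 = b3 = 0 and all variable names are invented for illustration only.

```python
# Two-run F-test for a general linear hypothesis (made-up data and hypothesis H0: b2 = b3 = 0).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))                                 # three independent variables
y = 2.0 + 1.5 * X[:, 0] + rng.normal(scale=1.0, size=n)     # only X1 truly matters

def residual_ss(y, X_cols):
    """Fit by least squares (with intercept) and return (SSError, number of slopes)."""
    A = np.column_stack([np.ones(len(y))] + list(X_cols))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid), A.shape[1] - 1

# Run 1: complete model (X1, X2, X3);  Run 2: reduced model under H0 (X1 only)
ss_complete, p = residual_ss(y, [X[:, 0], X[:, 1], X[:, 2]])
ss_reduced, _ = residual_ss(y, [X[:, 0]])

q = 2                                           # number of parameters set to zero by H0
s2 = ss_complete / (n - p - 1)                  # MSError from the complete model
ss_h0 = ss_reduced - ss_complete
F = (ss_h0 / q) / s2
p_value = stats.f.sf(F, q, n - p - 1)
print(f"F = {F:.3f}, p-value = {p_value:.4f}")
```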

20 The Use of Dummy Variables

21 In the examples so far the independent variables are continuous numerical variables.
Suppose that some of the independent variables are categorical. Dummy variables are artificially defined variables designed to convert a model including categorical independent variables to the standard multiple regression model.

22 Example: Comparison of Slopes of k Regression Lines with Common Intercept

23 Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable) Y is assumed to be linearly related to X with the slope dependent on treatment (population), while the intercept is the same for each treatment

24 The Model:

25 This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments. Dummy variables are variables that are artificially defined

26 In this case we define a new variable for each category of the categorical variable.
That is we will define Xi for each category of treatments as follows:

27 Then the model can be written as follows:
The Complete Model: where

28 In this case Dependent Variable: Y Independent Variables: X1, X2, ... , Xk

29 In the above situation we would likely be interested in testing the equality of the slopes, namely the Null Hypothesis H0: b1 = b2 = ... = bk (q = k – 1).

30 Independent Variable: X = X1+ X2+... + Xk
The Reduced Model: Dependent Variable: Y; Independent Variable: X = X1 + X2 + ... + Xk
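A small Python sketch (simulated data, invented treatment labels) of how the dummy slope variables could be constructed and the equality-of-slopes F-test carried out:

```python
# Dummy-variable setup for comparing the slopes of k regression lines with a common
# intercept, plus the F-test of equal slopes (all data simulated for illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k, n_per = 3, 20                                   # k treatments, n_per observations each
treatment = np.repeat(np.arange(k), n_per)
x = rng.uniform(0, 10, size=k * n_per)
slopes = np.array([1.0, 1.5, 0.7])                 # true slopes differ between treatments
y = 2.0 + slopes[treatment] * x + rng.normal(scale=1.0, size=k * n_per)

# Dummy slope variables: Xi = x for observations from treatment i, 0 otherwise
X_dummy = np.zeros((len(y), k))
for i in range(k):
    X_dummy[treatment == i, i] = x[treatment == i]

def rss(y, cols):
    A = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2)), A.shape[1]

ss_complete, n_par = rss(y, [X_dummy[:, i] for i in range(k)])   # complete model
ss_reduced, _ = rss(y, [x])                                      # reduced model: one common slope
q = k - 1
df_error = len(y) - n_par
F = ((ss_reduced - ss_complete) / q) / (ss_complete / df_error)
print(f"F = {F:.2f}, p = {stats.f.sf(F, q, df_error):.4f}")
```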

31 Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)

32 Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable). Y is assumed to be linearly related to X with the intercept dependent on treatment (population), while the slope is the same for each treatment. Y is called the response variable, while X is called the covariate.

33 The Model:

34 In this case we define a new variable for each category of the categorical variable.
That is we will define Xi for categories i = 1, 2, …, (k – 1) of treatments as follows:

35 Then the model can be written as follows:
The Complete Model: where

36 In this case Dependent Variable: Y Independent Variables: X1, X2, ... , Xk-1, X

37 In the above situation we would likely be interested in testing the equality of the intercepts. Namely the Null Hypothesis (q = k – 1)

38 Independent Variable: X
The Reduced Model: Dependent Variable: Y Independent Variable: X

39 The F Test

40 The Analysis of Covariance
This analysis can also be performed by using a package that can perform Analysis of Covariance (ANACOVA) The package sets up the dummy variables automatically

41 Another application of the use of dummy variables
The dependent variable, Y, is linearly related to X, but the slope changes at one or several known values of X (nodes). (Figure: plot of Y against X with the nodes marked.)

42 (Figure: Y plotted against X with nodes x1, x2, ..., xk and slopes b1, b2, ..., bk in the successive segments.) The model:

43 Now define Etc.

44 Then the model can be written
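The definitions on the previous slide did not survive transcription; one common construction (an assumption here, not necessarily the slides' exact one) builds new variables from the positive parts (X - node)+. A simulated-data sketch in Python:

```python
# Piecewise-linear ("broken-stick") regression with known nodes, using the assumed
# construction Xj = max(0, X - node_j).  Data and node values are made up.
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, size=60))
nodes = [3.0, 7.0]                                    # known values of X where the slope changes
y = 1.0 + 0.5 * x + 1.2 * np.maximum(0, x - 3.0) - 2.0 * np.maximum(0, x - 7.0)
y = y + rng.normal(scale=0.4, size=x.size)

# Design matrix: intercept, X, and one positive-part column per node
A = np.column_stack([np.ones_like(x), x] + [np.maximum(0.0, x - c) for c in nodes])
b_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated coefficients:", np.round(b_hat, 3))
# The slope in each segment is the running sum of the slope coefficients:
print("segment slopes:", np.round(np.cumsum(b_hat[1:]), 3))
```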

45 Selecting the Best Equation
Multiple Regression Selecting the Best Equation

46 Techniques for Selecting the "Best" Regression Equation
The best Regression equation is not necessarily the equation that explains most of the variance in Y (the highest R2): that equation will be the one with all the variables included. The best equation should also be simple and interpretable (i.e. contain a small number of variables). Simplicity (interpretability) and reliability of fit are opposing criteria; the best equation is a compromise between the two.

47 We will discuss several strategies for selecting the best equation:
1. All Possible Regressions - uses R2, s2, Mallows Cp, where Cp = RSSp/s2complete - [n - 2(p+1)]
2. "Best Subset" Regression - uses R2, Ra2, Mallows Cp
3. Backward Elimination
4. Stepwise Regression

48 I All Possible Regressions
Suppose we have the p independent variables X1, X2, ..., Xp. Then there are 2^p subsets of variables.

49 Variables in Equation Model
no variables:    Y = b0 + e
X1:              Y = b0 + b1X1 + e
X2:              Y = b0 + b2X2 + e
X3:              Y = b0 + b3X3 + e
X1, X2:          Y = b0 + b1X1 + b2X2 + e
X1, X3:          Y = b0 + b1X1 + b3X3 + e
X2, X3:          Y = b0 + b2X2 + b3X3 + e
and X1, X2, X3:  Y = b0 + b1X1 + b2X2 + b3X3 + e

50 Use of R2
1. Assume we carry out 2^p runs, one for each of the subsets. Divide the runs into the following sets:
Set 0: No variables
Set 1: One independent variable.
...
Set p: p independent variables.
2. Order the runs in each set according to R2.
3. Examine the leaders in each set looking for consistent patterns - take into account correlation between independent variables.

51 Example (k = 4): X1, X2, X3, X4
Variables in the leading runs and 100 R2% for each set (values not shown):
Set 1: X_
Set 2: X1, X_ and X1, X_
Set 3: X1, X2, X_
Set 4: X1, X2, X3, X4
Examination of the correlation coefficients reveals a high correlation between X1 and X3 (r13) and between X2 and X4 (r24). Best Equation: Y = b0 + b1X1 + b4X4 + e

52 Use of R2 Number of variables required, p, coincides with where R2 begins to level out

53 Use of the Residual Mean Square (RMS) (s2)
When all of the variables having a non-zero effect have been included in the model, the residual mean square is an estimate of s2. If "significant" variables have been left out, the RMS will be biased upward.

54 No. of Variables p | RMS s2(p) | Average s2(p)
1:  _, 82.39, _, _
2:  5.79*, 122.71, 7.48**, _
3:  5.35, 5.33, 5.65, _
* - run X1, X2    ** - run X1, X_
s2 is approximately 6.

55 Use of s2 Number of variables required, p, coincides with where s2 levels out

56 Use of Mallows Cp If the equation with p variables is adequate then both s2complete and RSSp/(n-p-1) will be estimating s2. If "significant" variables have been left out then RMS will be biased upward.

57 Then, if the model with p variables is adequate, Cp will be close to p + 1. Thus if we plot, for each run, Cp vs p and look for Cp close to p + 1, we will be able to identify models giving a reasonable fit.

58 Run                  Cp                      p + 1
no variables            _                       1
1, 2, 3, 4              _, 142.5, 315.2, _      2
12, 13, 14, ...         2.7, 198.1, 5.5         3
23, 24, ...             _, 138.2, 22.4          3
123, 124, 134, ...      _, 3.0, 3.5, 7.5        4

59 Use of Cp: Number of variables required, p, coincides with where Cp becomes close to p + 1. (Figure: plot of Cp against p.)
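A Python sketch of the "all possible regressions" idea on simulated data, computing R2 and Mallows Cp for every subset (variable names and data are invented for illustration):

```python
# All possible regressions on made-up data: for every subset of the candidate variables,
# compute R2 and Mallows Cp = RSS_p / s2_complete - [n - 2(p + 1)].
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n, k = 60, 4
X = rng.normal(size=(n, k))
y = 3.0 + 2.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(scale=1.0, size=n)  # X1, X4 matter

def rss(cols):
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

ss_total = float(np.sum((y - y.mean()) ** 2))
s2_complete = rss(range(k)) / (n - k - 1)         # MSError from the complete model

for p in range(k + 1):                            # Set 0, Set 1, ..., Set k
    for cols in combinations(range(k), p):
        r2 = 1.0 - rss(cols) / ss_total
        cp = rss(cols) / s2_complete - (n - 2 * (p + 1))
        names = ", ".join(f"X{j + 1}" for j in cols) or "no variables"
        print(f"{names:<18} R2 = {r2:5.3f}   Cp = {cp:7.2f}   p + 1 = {p + 1}")
```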

60 II "Best Subset" Regression
Similar to all possible regressions. If p, the number of variables, is large then the number of runs, 2^p, performed could be extremely large. In this algorithm the user supplies the value K and the algorithm identifies the best K subsets of X1, X2, ..., Xp for predicting Y.

61 III Backward Elimination
In this procedure the complete regression equation is determined containing all the variables - X1, X2, ..., Xp. Then variables are checked one at a time and the least significant is dropped from the model at each stage. The procedure is terminated when all of the variables remaining in the equation provide a significant contribution to the prediction of the dependent variable Y.

62 The precise algorithm proceeds as follows:
Fit a regression equation containing all variables in the equation.

63 2. A partial F-test is computed for each of the independent variables still in the equation.
The Partial F statistic: F = (RSS2 - RSS1) / MSE1, where RSS1 = the residual sum of squares with all variables that are presently in the equation, RSS2 = the residual sum of squares with one of the variables removed, and MSE1 = the Mean Square for Error with all variables that are presently in the equation.

64 3. The lowest partial F value is compared with Fa for some pre-specified a .
If FLowest ≤ Fa then remove that variable and return to step 2. If FLowest > Fa then accept the equation as it stands.
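A minimal Python sketch of backward elimination on simulated data, using the partial F statistic above and a fixed cut-off (the data, variable names and cut-off level are made up for illustration):

```python
# Backward elimination sketch: drop the least significant variable while its partial
# F = (RSS2 - RSS1) / MSE1 does not exceed a pre-specified cut-off F_alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 80, 5
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)   # only X1 and X3 matter

def rss(cols):
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

in_model = list(range(k))
while len(in_model) > 0:
    rss1 = rss(in_model)
    mse1 = rss1 / (n - len(in_model) - 1)
    # partial F for each variable still in the equation
    partial_F = {j: (rss([c for c in in_model if c != j]) - rss1) / mse1 for j in in_model}
    worst = min(partial_F, key=partial_F.get)
    F_alpha = stats.f.ppf(0.95, 1, n - len(in_model) - 1)       # pre-specified cut-off
    if partial_F[worst] <= F_alpha:
        in_model.remove(worst)                                  # drop the least significant
    else:
        break                                                   # all remaining are significant
print("variables kept:", [f"X{j + 1}" for j in in_model])
```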

65 IV Stepwise Regression
In this procedure the regression equation initially contains no variables. Variables are then checked one at a time using the partial correlation coefficient as a measure of importance in predicting the dependent variable Y. At each stage the variable with the highest significant partial correlation coefficient is added to the model. Once this has been done, the partial F statistic is computed for all variables now in the model, to check if any of the variables previously added can now be deleted.

66 This procedure is continued until no further variables can be added or deleted from the model.
The partial correlation coefficient for a given variable is the correlation between the given variable and the response when the present independent variables in the equation are held fixed. It is also the correlation between the given variable and the residuals computed from fitting an equation with the present independent variables in the equation.

67 Transformations

68 Transformations to Linearity
Many non-linear curves can be put into a linear form by appropriate transformations of either the dependent variable Y or some (or all) of the independent variables X1, X2, ... , Xp. This leads to the wide utility of the Linear model. We have seen that through the use of dummy variables, categorical independent variables can be incorporated into a Linear Model. We will now see that through the technique of variable transformation, many examples of non-linear behaviour can also be converted to linear behaviour.

69 Polynomial Models: y = b0 + b1x + b2x^2 + b3x^3
Linear form: Y = b0 + b1X1 + b2X2 + b3X3
Variables: Y = y, X1 = x, X2 = x^2, X3 = x^3
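A short Python sketch (made-up data) showing the cubic polynomial model fitted as a linear model in the transformed variables:

```python
# Fitting y = b0 + b1*x + b2*x^2 + b3*x^3 as a linear model in X1 = x, X2 = x^2, X3 = x^3.
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(-2, 2, 50)
y = 1.0 - 0.5 * x + 0.8 * x**2 + 0.3 * x**3 + rng.normal(scale=0.2, size=x.size)

A = np.column_stack([np.ones_like(x), x, x**2, x**3])   # the transformed variables
b_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated b0, b1, b2, b3:", np.round(b_hat, 3))
```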

70 Exponential Models with a polynomial exponent
Linear form: ln y = b0 + b1X1 + b2X2 + b3X3 + b4X4
Variables: Y = ln y, X1 = x, X2 = x^2, X3 = x^3, X4 = x^4

71 Trigonometric Polynomial Models
y = b0 + g1 cos(2πf1x) + d1 sin(2πf1x) + … + gk cos(2πfkx) + dk sin(2πfkx)
Linear form: Y = b0 + g1C1 + d1S1 + … + gkCk + dkSk
Variables: Y = y, C1 = cos(2πf1x), S1 = sin(2πf1x), … , Ck = cos(2πfkx), Sk = sin(2πfkx)

72 Response Surface models
Dependent variable Y and two independent variables x1 and x2. (These ideas are easily extended to more than two independent variables.) The Model (a cubic response surface model): Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5 + b6X6 + b7X7 + b8X8 + b9X9 + e, where

73 The Box-Cox Family of Transformations
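The family itself did not survive transcription; it is commonly written as y(lambda) = (y^lambda - 1)/lambda for lambda ≠ 0 and ln(y) for lambda = 0 (an assumption here, the standard form). A small Python sketch:

```python
# The Box-Cox family of transformations, in its commonly used form, applied for a few
# lambda values to a made-up positive data vector.
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of a positive array y for a given lambda."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

y = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
for lam in [2, 1, 0.5, 0, -0.5, -1]:        # moving down the "transformation staircase"
    print(lam, np.round(box_cox(y, lam), 3))
```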

74 The Transformation Staircase

75 The Bulging Rule (figure: the direction of the bulge in the scatterplot indicates whether to move x up, y up, y down or x down the ladder of transformations)

76 Nonlinearizable models
Non-Linear Models Nonlinearizable models

77 Non-Linear Growth models
many models cannot be transformed into a linear model The Mechanistic Growth Model Equation: or (ignoring e) “rate of increase in Y” =

78 The Logistic Growth Model
Equation: or (ignoring e) “rate of increase in Y” =

79 The Gompertz Growth Model:
Equation: or (ignoring e) “rate of increase in Y” =
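The equations for these three growth curves were lost in transcription; the forms below are the standard textbook parameterizations (an assumption, not necessarily identical to the slides'), written as Python functions:

```python
# Assumed standard forms of the three growth curves: mechanistic, logistic and Gompertz.
import numpy as np

def mechanistic(x, a, b, k):
    """Mechanistic growth: Y = a * (1 - b * exp(-k * x))."""
    return a * (1.0 - b * np.exp(-k * x))

def logistic(x, a, b, k):
    """Logistic growth: Y = a / (1 + b * exp(-k * x))."""
    return a / (1.0 + b * np.exp(-k * x))

def gompertz(x, a, b, k):
    """Gompertz growth: Y = a * exp(-b * exp(-k * x))."""
    return a * np.exp(-b * np.exp(-k * x))

x = np.linspace(0, 10, 6)
print(np.round(mechanistic(x, 10, 0.9, 0.5), 2))
print(np.round(logistic(x, 10, 9.0, 0.8), 2))
print(np.round(gompertz(x, 10, 3.0, 0.6), 2))
```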

80 Non-Linear Regression

81 Least Squares in the Nonlinear Case

82 Suppose that we have collected data on Y,
(y1, y2, ..., yn), corresponding to n sets of values of the independent variables X1, X2, ... and Xp: (x11, x21, ..., xp1), (x12, x22, ..., xp2), ... and (x1n, x2n, ..., xpn).

83 For a set of possible values q1, q2, ... , qq of the parameters, a measure of how well these values fit the model described in equation * above is the residual sum of squares function S(q1, q2, ... , qq) = Σ [yi - ŷi]^2, where ŷi is the predicted value of the response variable yi from the values of the p independent variables x1i, x2i, ..., xpi, using the model in equation * and the values of the parameters q1, q2, ... , qq.

84 The Least squares estimates of q1, q2, ... , qq, are values
which minimize S(q1, q2, ... , qq). It can be shown that if the error terms are independent and normally distributed with mean 0 and common variance s2, then the least squares estimates are also the maximum likelihood estimates of q1, q2, ... , qq.

85 To find the least squares estimates we need to determine when all the derivatives of S(q1, q2, ... , qq) with respect to each parameter q1, q2, ... and qq are equal to zero. This quite often leads to a set of equations in q1, q2, ... and qq that are difficult to solve, even with one parameter and a comparatively simple nonlinear model. When more parameters are involved and the model is more complicated, the solution of the normal equations can be extremely difficult to obtain, and iterative methods must be employed.

86 Techniques for Estimating the Parameters of a Nonlinear System
In some nonlinear problems it is convenient to determine equations (the Normal Equations) for the least squares estimates , the values that minimize the sum of squares function, S(q1, q2, ... , qq). These equations are nonlinear and it is usually necessary to develop an iterative technique for solving them.

87 We shall mention three of these:
In addition to this approach there are several currently employed methods available for obtaining the parameter estimates by a routine computer calculation. We shall mention three of these: 1) Steepest descent, 2) Linearization, and 3) Marquardt's procedure.

88 In each case an iterative procedure is used to find the least squares estimates.
That is, initial estimates for these values are determined; the procedure then finds successively better estimates that hopefully converge to the least squares estimates.

89 Steepest Descent Steepest descent path Initial guess

90 Linearization The linearization (or Taylor series) method uses the results of linear least squares in a succession of stages. Suppose the postulated model is of the form: Y = f(X1, X2, ..., Xp| q1, q2, ... , qq) + e. Let initial values be chosen for the parameters q1, q2, ... , qq. These initial values may be intelligent guesses or preliminary estimates based on whatever information is available.

91 These initial values will, hopefully, be improved upon in the successive iterations to be described below. The linearization method approximates f(X1, X2, ..., Xp| q1, q2, ... , qq) with a linear function of q1, q2, ... , qq, using a Taylor series expansion of f(X1, X2, ..., Xp| q1, q2, ... , qq) about the current point and curtailing the expansion at the first derivatives. The method then uses the results of linear least squares to find values that provide the least squares fit of this linear function to the data.

92 The procedure is then repeated until the successive approximations converge, hopefully, to the least squares estimates:
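A minimal Python sketch of this linearization (Gauss-Newton) idea for an assumed model Y = q1(1 - exp(-q2 x)) + e, using numerically approximated first derivatives; the model and data are made up, and in practice a library routine would normally be used.

```python
# Gauss-Newton / linearization sketch for the assumed model Y = q1 * (1 - exp(-q2 * x)) + e.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0.1, 10, 40)
y = 5.0 * (1.0 - np.exp(-0.7 * x)) + rng.normal(scale=0.1, size=x.size)

def f(x, theta):
    return theta[0] * (1.0 - np.exp(-theta[1] * x))

theta = np.array([1.0, 0.1])                      # initial guesses for q1, q2
for _ in range(20):
    resid = y - f(x, theta)
    # Jacobian of f with respect to the parameters (forward differences)
    J = np.column_stack([
        (f(x, theta + np.array([1e-6, 0])) - f(x, theta)) / 1e-6,
        (f(x, theta + np.array([0, 1e-6])) - f(x, theta)) / 1e-6,
    ])
    step, *_ = np.linalg.lstsq(J, resid, rcond=None)   # linear least squares at this stage
    theta = theta + step
    if np.max(np.abs(step)) < 1e-8:                    # successive approximations converged
        break

final_resid = y - f(x, theta)
print("estimates of q1, q2:", np.round(theta, 4))
print("residual sum of squares:", round(float(final_resid @ final_resid), 5))
```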

93 Linearization (figure: contours of RSS for the linear approximation, with the initial guess and the 2nd guess marked)

94 (figure: contours of RSS for the linear approximation, with the initial, 2nd and 3rd guesses marked)

95 (figure: contours of RSS for the linear approximation, with the initial, 2nd, 3rd and 4th guesses marked)

96 The Examination of Residuals

97 The residuals are defined as the n differences ei = yi - ŷi,
where yi is an observation and ŷi is the corresponding fitted value obtained by use of the fitted model.

98 Many of the statistical procedures used in linear and nonlinear regression analysis are based on certain assumptions about the random departures from the proposed model. Namely, the random departures are assumed i) to have zero mean, ii) to have a constant variance, s2, iii) to be independent, and iv) to follow a normal distribution.

99 Thus if the fitted model is correct,
the residuals should exhibit tendencies that tend to confirm the above assumptions, or at least, should not exhibit a denial of the assumptions.

100 The principal ways of plotting the residuals ei are:
1. Overall.
2. In time sequence, if the order is known.
3. Against the fitted values.
4. Against the independent variables xij for each value of j.
In addition to these basic plots, the residuals should also be plotted
5. In any way that is sensible for the particular problem under consideration.

101 The residuals can be plotted in an overall plot in several ways.

102
1. The scatter plot.
2. The histogram.
3. The box-whisker plot.
4. The kernel density plot.
5. A normal plot or a half-normal plot on standard probability paper.

103 The standard statistical tests for testing Normality are:
1. The Kolmogorov-Smirnov test.
2. The Chi-square goodness of fit test.

104 Namely the random departures for observations that were taken at neighbouring points in time are autocorrelated. This autocorrelation can sometimes be seen in a time sequence plot. The following three graphs show a sequence of residuals that are respectively i) positively autocorrelated , ii) independent and iii) negatively autocorrelated.

105 i) Positively auto-correlated residuals

106 ii) Independent residuals

107 iii) Negatively auto-correlated residuals

108 There are several statistics and statistical tests that can also pick out autocorrelation amongst the residuals. The most common are: i) The Durbin Watson statistic ii) The autocorrelation function iii) The runs test

109 The Durbin Watson statistic :
The Durbin-Watson statistic, which is used frequently to detect serial correlation, is defined by the following formula: D = Σ (ei - ei+1)^2 / Σ ei^2. If the residuals are positively serially correlated, the differences, ei - ei+1, will be stochastically small. Hence a small value of the Durbin-Watson statistic will indicate positive autocorrelation. Large values of the Durbin-Watson statistic on the other hand will indicate negative autocorrelation. Critical values for this statistic can be found in many statistical textbooks.
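A small Python sketch computing this statistic for two made-up residual series, one independent and one positively autocorrelated:

```python
# Durbin-Watson statistic for a vector of residuals (values near 2 suggest independence,
# small values suggest positive and large values negative autocorrelation).
import numpy as np

def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(8)
independent = rng.normal(size=100)
positively_corr = np.cumsum(rng.normal(size=100)) * 0.3    # slowly drifting residuals
print("independent residuals:     ", round(durbin_watson(independent), 2))
print("positively autocorrelated: ", round(durbin_watson(positively_corr), 2))
```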

110 The autocorrelation function:
The autocorrelation function at lag k is defined by rk = Σ ei ei+k / Σ ei^2. This statistic measures the correlation between residuals that occur a distance k apart in time. One would expect that residuals that are close in time are more correlated than residuals that are separated by a greater distance in time. If the residuals are independent then rk should be close to zero for all values of k. A plot of rk versus k can be very revealing with respect to the independence of the residuals. Some typical patterns of the autocorrelation function are given below:

111 This statistic measures the correlation between residuals that occur a distance k apart in time.
One would expect that residuals that are close in time are more correlated than residuals that are separated by a greater distance in time. If the residuals are independent then rk should be close to zero for all values of k. A plot of rk versus k can be very revealing with respect to the independence of the residuals.
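A short Python sketch of the residual autocorrelation function for the first few lags of a made-up residual series:

```python
# Residual autocorrelation function r_k = sum(e_i * e_{i+k}) / sum(e_i^2).
import numpy as np

def acf(e, max_lag=10):
    e = np.asarray(e, dtype=float)
    denom = np.sum(e ** 2)
    return np.array([np.sum(e[:-k] * e[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(9)
e = rng.normal(size=200)                 # independent residuals: r_k near 0 for all k
print(np.round(acf(e, max_lag=5), 3))
```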

112 Some typical patterns of the autocorrelation function are given below:
Auto correlation pattern for independent residuals

113 Various Autocorrelation patterns for serially correlated residuals

114

115 Plot Against fitted values and the Predictor Variables Xij
If we "step back" from this diagram, and the residuals behave in a manner consistent with the assumptions of the model, we obtain the impression of a horizontal "band" of residuals, as represented by the diagram below.

116 Individual observations lying considerably outside of this band indicate that the observation may be an outlier. An outlier is an observation that is not following the normal pattern of the other observations. Such an observation can have a considerable effect on the estimation of the parameters of a model. Sometimes the outlier has occurred because of a typographical error. If this is the case and it is detected, then a correction can be made. If the outlier occurs for other (and more natural) reasons it may be appropriate to construct a model that incorporates the occurrence of outliers.

117 If our "step back" view of the residuals resembled any of those shown below, we should conclude that the assumptions about the model are incorrect. Each pattern may indicate that a different assumption may have to be made to explain the "abnormal" residual pattern. (Figures a and b below.)

118 Pattern a) indicates that the variance of the random departures is not constant (homogeneous) but increases as the value along the horizontal axis increases (time, or one of the independent variables). This indicates that a weighted least squares analysis should be used. The second pattern, b), indicates that the mean value of the residuals is not zero. This is usually because the model (linear or non-linear) has not been correctly specified: linear and quadratic terms that should have been included in the model have been omitted.

119 Example – Analysis of Residuals
Motor Vehicle Data Dependent = mpg Independent = Engine size, horsepower and weight

120 When a linear model was fit and residuals examined graphically the following plot resulted:

121 The pattern that we are looking for is:

122 The pattern that was found is:
This indicates a nonlinear relationship. This can be handled by adding polynomial terms (quadratic, cubic, quartic etc.) of the independent variables or by transforming the dependent variable.

123 Performing the log transformation on the dependent variable (mpg) results in the following residual plot There still remains some non linearity

124 The log transformation

125 The Box-Cox transformations
l = 2 l = 1 l = 0 l = -1 l = -1

126 The log (l = 0) transformation was not totally successful - try moving further down the staircase of the family of transformations (l = -0.5)

127 try moving a bit further down the staircase of the family of transformations (l = -1.0)

128 The results after deleting the outlier are given below:

129 This corresponds to the model
and

130 Checking normality with a P-P plot

131 Factorial Experiments
Analysis of Variance Experimental Design

132 k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y k Categorical independent variables A, B, C, … (the Factors) Let a = the number of categories of A b = the number of categories of B c = the number of categories of C etc.

133 The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations: t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units: N = nt = nabc….

134 The treatment combinations can be thought of as arranged in a k-dimensional rectangular block
(Figure: the treatment combinations in an a x b grid, rows 1, 2, ..., a for factor A and columns 1, 2, ..., b for factor B.)

135 (Figure: the same arrangement extended to three factors A, B and C as a three-dimensional block.)

136 Another way of representing the treatment combinations in a factorial experiment
(Figure: a table listing the levels of factors A through D for each treatment combination.)

137 Profile of a Factor: a plot of the observation means vs. the levels of the factor. The levels of the other factors may be held constant or we may average over the other levels.

138 Definition: A factor is said to not affect the response if the profile of the factor is horizontal for all combinations of levels of the other factors: No change in the response when you change the levels of the factor (true for all combinations of levels of the other factors) Otherwise the factor is said to affect the response:

139 Definition: Two (or more) factors are said to interact if changes in the response when you change the level of one factor depend on the level(s) of the other factor(s). Profiles of the factor for different levels of the other factor(s) are not parallel Otherwise the factors are said to be additive . Profiles of the factor for different levels of the other factor(s) are parallel.

140 If two (or more) factors interact, each factor affects the response.
If two (or more) factors are additive, it still remains to be determined if the factors affect the response. In factorial experiments we are interested in determining which factors affect the response and which groups of factors interact.

141 Factor A has no effect B A

142 Additive Factors B A

143 Interacting Factors B A

144 The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors: all factors in the interaction affect the response and they interact. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.

145 Anova Table entries: Sum of squares for the interaction (or main) effects being tested = (product of the sample size and the numbers of levels of the factors not included in the interaction) × (sum of squares of the effects being tested). Degrees of freedom = df = product of (number of levels - 1) over the factors included in the interaction.

146 Analysis of Variance (ANOVA) Table Entries (Two factors – A and B)

147 The ANOVA Table 2 Factor Experiment

148 Analysis of Variance (ANOVA) Table Entries (Three factors – A, B and C)

149 The ANOVA Table

150 The Completely Randomized Design is called balanced
If the number of observations per treatment combination is unequal, the design is called unbalanced (resulting in a mathematically more complex analysis and computations). If for some of the treatment combinations there are no observations, the design is called incomplete (some of the parameters - main effects and interactions - cannot be estimated).

151 Factorial Experiments
Analysis of Variance Experimental Design

152 k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y k Categorical independent variables A, B, C, … (the Factors) Let a = the number of categories of A b = the number of categories of B c = the number of categories of C etc.

153 Objectives Determine which factors have some effect on the response Which groups of factors interact

154 The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations: t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units: N = nt = nabc….

155 Factor A has no effect B A

156 Additive Factors B A

157 Interacting Factors B A

158 The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors: all factors in the interaction affect the response and they interact. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.

159 The statistical model for the 3 factor Experiment

160 The statistical model for the 3 factor Experiment

161 Anova table for the 3 factor Experiment
Source   SS        df                      MS        F                p-value
A        SSA       a - 1                   MSA       MSA/MSError
B        SSB       b - 1                   MSB       MSB/MSError
C        SSC       c - 1                   MSC       MSC/MSError
AB       SSAB      (a - 1)(b - 1)          MSAB      MSAB/MSError
AC       SSAC      (a - 1)(c - 1)          MSAC      MSAC/MSError
BC       SSBC      (b - 1)(c - 1)          MSBC      MSBC/MSError
ABC      SSABC     (a - 1)(b - 1)(c - 1)   MSABC     MSABC/MSError
Error    SSError   abc(n - 1)              MSError
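A Python sketch of this table on simulated balanced data, using the statsmodels formula interface (pandas and statsmodels are assumed to be installed; the factor names, level counts and effects are invented for illustration). Because the simulated design is balanced, the sequential table printed here coincides with the other types of sums of squares discussed later.

```python
# Three-factor ANOVA table on simulated balanced data via statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(10)
a, b, c, n = 2, 3, 2, 4                      # levels of A, B, C and replicates per cell
rows = []
for i in range(a):
    for j in range(b):
        for k in range(c):
            for _ in range(n):
                yv = 10 + 2 * i + 1.5 * j + rng.normal()   # A and B have real effects
                rows.append({"A": f"a{i}", "B": f"b{j}", "C": f"c{k}", "y": yv})
df = pd.DataFrame(rows)

model = smf.ols("y ~ C(A) * C(B) * C(C)", data=df).fit()   # all main effects and interactions
print(anova_lm(model))                                      # sequential (Type I) ANOVA table
```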

162 The testing in factorial experiments
Test first the higher order interactions. If an interaction is present there is no need to test lower order interactions or main effects involving those factors. All factors in the interaction affect the response and they interact The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.

163 Random Effects and Fixed Effects Factors

164 So far the factors that we have considered are fixed effects factors
This is the case if the levels of the factor are a fixed set of levels and the conclusions of any analysis are in relation to these levels. If the levels have been selected at random from a population of levels, the factor is called a random effects factor. The conclusions of the analysis will then be directed at the population of levels and not only the levels selected for the experiment.

165 The Anova table for the two factor model (A, B – fixed)
Source   SS        df               MS        EMS   F
A        SSA       a - 1            MSA             MSA/MSError
B        SSB       b - 1            MSB             MSB/MSError
AB       SSAB      (a - 1)(b - 1)   MSAB            MSAB/MSError
Error    SSError   ab(n - 1)        MSError
EMS = Expected Mean Square

166 The Anova table for the two factor model (A – fixed, B - random)
Source   SS        df               MS        EMS   F
A        SSA       a - 1            MSA             MSA/MSAB
B        SSB       b - 1            MSB             MSB/MSError
AB       SSAB      (a - 1)(b - 1)   MSAB            MSAB/MSError
Error    SSError   ab(n - 1)        MSError
Note: The divisor for testing the main effects of A is no longer MSError but MSAB.

167 Example: 3 factors A fixed, B, C random
Source EMS F A B C AB AC BC ABC Error

168 Example: 3 factors A , B fixed, C random
Source EMS F A B C AB AC BC ABC Error

169 Example: 3 factors A , B and C fixed
Source EMS F A B C AB AC BC ABC Error

170 Crossed and Nested Factors

171 The factors A, B are called crossed if every level of A appears with every level of B in the treatment combinations. (Figure: a grid of the levels of A by the levels of B.)

172 Factor B is said to be nested within factor A if the levels of B differ for each level of A.
(Figure: the levels of B branching separately under each level of A.)

173 Example: A company has a = 4 plants for producing paper
Example: A company has a = 4 plants for producing paper. Each plant has 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within a plant.

174 Machines (B) are nested within plants (A)
The model for a two factor experiment with B nested within A.

175 The ANOVA table
Source   SS        df          MS        F                 p-value
A        SSA       a - 1       MSA       MSA/MSError
B(A)     SSB(A)    a(b - 1)    MSB(A)    MSB(A)/MSError
Error    SSError   ab(n - 1)   MSError
Note: SSB(A) = SSB + SSAB and a(b - 1) = (b - 1) + (a - 1)(b - 1)

176 Factorial Experiments
Analysis of Variance Factorial Experiments

177 k Categorical independent variables A, B, C, … (the Factors) Let
Dependent variable Y k Categorical independent variables A, B, C, … (the Factors) Let a = the number of categories of A b = the number of categories of B c = the number of categories of C etc.

178 The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. Total number of treatment combinations: t = abc…. In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. Total number of experimental units: N = nt = nabc….

179 Example: 3 factors A, B, C – all are random effects
Source EMS F A B C AB AC BC ABC Error

180 Example: 3 factors A fixed, B, C random
Source EMS F A B C AB AC BC ABC Error

181 Example: 3 factors A , B fixed, C random
Source EMS F A B C AB AC BC ABC Error

182 Example: 3 factors A , B and C fixed
Source EMS F A B C AB AC BC ABC Error

183 The Analysis of Covariance
ANACOVA

184 Multiple Regression Dependent variable Y (continuous)
Continuous independent variables X1, X2, …, Xp The continuous independent variables X1, X2, …, Xp are quite often measured and observed (not set at specific values or levels)

185 Analysis of Variance Dependent variable Y (continuous)
Categorical independent variables (Factors) A, B, C,… The categorical independent variables A, B, C,… are set at specific values or levels.

186 Analysis of Covariance
Dependent variable Y (continuous) Categorical independent variables (Factors) A, B, C,… Continuous independent variables (covariates) X1, X2, …, Xp

187 The Multiple Regression Model

188 The ANOVA Model

189 The ANACOVA Model

190 ANOVA Tables

191 The Multiple Regression Model
Source       S.S.      d.f.
Regression   SSReg     p
Error        SSError   n - p - 1
Total        SSTotal   n - 1

192 The ANOVA Model
Source         S.S.      d.f.
Main Effects
  A            SSA       a - 1
  B            SSB       b - 1
Interactions
  AB           SSAB      (a - 1)(b - 1)
Error          SSError   n - p - 1
Total          SSTotal   n - 1

193 The ANACOVA Model
Source         S.S.           d.f.
Covariates     SSCovariates   p
Main Effects
  A            SSA            a - 1
  B            SSB            b - 1
Interactions
  AB           SSAB           (a - 1)(b - 1)
Error          SSError        n - p - 1
Total          SSTotal        n - 1

194 Analysis of unbalanced Factorial Designs
Type I, Type II, Type III Sum of Squares

195 Sum of squares for testing an effect
modelComplete ≡ model with the effect in. modelReduced ≡ model with the effect out.

196 Type I SS Type I estimates of the sum of squares associated with an effect in a model are calculated when sums of squares for a model are calculated sequentially Example Consider the three factor factorial experiment with factors A, B and C. The Complete model Y = m + A + B + C + AB + AC + BC + ABC

197 A sequence of increasingly simpler models
Y = m + A + B + C + AB + AC + BC + ABC Y = m + A+ B + C + AB + AC + BC Y = m + A + B+ C + AB + AC Y = m + A + B + C+ AB Y = m + A + B + C Y = m + A + B Y = m + A Y = m

198 Type I S.S.

199 Type II SS Type II sums of squares are calculated for an effect assuming that the Complete model contains every effect of equal or lesser order. The Reduced model has the effect removed.

200 The Complete models:
Y = m + A + B + C + AB + AC + BC + ABC (the three factor model)
Y = m + A + B + C + AB + AC + BC (the all two-factor model)
Y = m + A + B + C (the all main effects model)
The Reduced models: for a k-factor effect, the reduced model is the all k-factor model with the effect removed.

201

202 Type III SS The type III sum of squares is calculated by comparing the full model, to the full model without the effect.

203 Comments When using the Type I sums of squares the effects are tested in a specified sequence, resulting in an increasingly simpler model. The test is valid only if the null hypothesis (H0) has been accepted in the previous tests. When using the Type II sums of squares the test for a k-factor effect is valid only if the all k-factor model can be assumed. When using the Type III sums of squares the tests require neither of these assumptions.

204 An additional Comment When the completely randomized design is balanced (equal number of observations per treatment combination) then type I sum of squares, type II sum of squares and type III sum of squares are equal.

205 Experimental Designs The objective of Experimental design is to reduce the magnitude of random error resulting in more powerful tests to detect experimental effects

206 Other experimental designs
Randomized Block design Repeated Measures designs

207 The Randomized Block Design

208 Suppose a researcher is interested in how several treatments affect a continuous response variable (Y). The treatments may be the levels of a single factor or they may be the combinations of levels of several factors. Suppose we have available to us a total of N = nt experimental units to which we are going to apply the different treatments.

209 The Completely Randomized (CR) design randomly divides the experimental units into t groups of size n and randomly assigns a treatment to each group.

210 The Randomized Block Design
divides the group of experimental units into n homogeneous groups of size t. These homogeneous groups are called blocks. The treatments are then randomly assigned to the experimental units in each block - one treatment to a unit in each block.

211 The Completely Randomized Design
Treats 1 2 3 … t Experimental units randomly assigned to treatments

212 Randomized Block Design
Blocks All treats appear once in each block

213 The Model for a randomized Block Experiment
yij = m + ti + bj + eij, i = 1,2,…, t; j = 1,2,…, b, where
yij = the observation in the jth block receiving the ith treatment
m = overall mean
ti = the effect of the ith treatment
bj = the effect of the jth block
eij = random error

214 The Anova Table for a randomized Block Experiment
Source   S.S.   d.f.          M.S.   F          p-value
Treat    SST    t - 1         MST    MST/MSE
Block    SSB    b - 1         MSB    MSB/MSE
Error    SSE    (t-1)(b-1)    MSE

215 A randomized block experiment is assumed to be a two-factor experiment.
The factors are blocks and treatments. There is one observation per cell. It is assumed that there is no interaction between blocks and treatments. The degrees of freedom for the interaction are used to estimate error.

216 The ANOVA table for the Completely Randomized Design
Source        df               Sum of Squares
Treatments    t - 1            SSTr
Error         t(n - 1)         SSError
Total         tn - 1           SSTotal
The ANOVA table for the Randomized Block Design
Source        df               Sum of Squares
Blocks        n - 1            SSBlocks
Treatments    t - 1            SSTr
Error         (t - 1)(n - 1)   SSError
Total         tn - 1           SSTotal

217 Comments The error term for the Completely Randomized Design models variability in the response, y, between experimental units. The error term for the Randomized Block Design models variability in the response, y, between experimental units in the same block (hopefully considerably smaller than the error for the completely randomized design). The ability to detect treatment differences depends on the magnitude of the random error term.

218 Repeated Measures Designs

219 In a Repeated Measures Design
We have experimental units that may be grouped according to one or several factors (the grouping factors) Then on each experimental unit we have not a single measurement but a group of measurements (the repeated measures) The repeated measures may be taken at combinations of levels of one or several factors (The repeated measures factors)

220 Anova Table for a Repeated Measures Design

221 Latin Square Designs

222 Latin Square Designs - Selected Latin Squares
3 x 3:
A B C
B C A
C A B
(Additional selected squares of sizes 4 x 4, 5 x 5 and 6 x 6 were listed here; a 4 x 4 example is given on the next slide.)

223 Definition: A Latin square is a square array of objects (letters A, B, C, …) such that each object appears once and only once in each row and each column. Example - a 4 x 4 Latin Square:
A B C D
B C D A
C D A B
D A B C
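A tiny Python sketch that generates a cyclic t x t Latin square of this kind (each letter appears exactly once in every row and column):

```python
# Generate a cyclic t x t Latin square, matching the style of the 4 x 4 example above.
import string

def cyclic_latin_square(t):
    letters = string.ascii_uppercase[:t]
    return [[letters[(i + j) % t] for j in range(t)] for i in range(t)]

for row in cyclic_latin_square(4):
    print(" ".join(row))
```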

224 In a Latin square You have three factors:
Treatments (t) (letters A, B, C, …), Rows (t), Columns (t). The number of treatments = the number of rows = the number of columns = t. The row-column combinations are represented by cells in a t x t array. The treatments are assigned to row-column combinations using a Latin-square arrangement.

225 The Model for a Latin Experiment
yij(k) = m + tk + ri + gj + eij(k), i = 1,2,…, t; j = 1,2,…, t; k = 1,2,…, t, where
yij(k) = the observation in the ith row and the jth column receiving the kth treatment
m = overall mean
tk = the effect of the kth treatment
ri = the effect of the ith row
gj = the effect of the jth column
eij(k) = random error
(No interaction between rows, columns and treatments.)

226 A Latin Square experiment is assumed to be a three-factor experiment.
The factors are rows, columns and treatments. It is assumed that there is no interaction between rows, columns and treatments. The degrees of freedom for the interactions is used to estimate error.

227 The Anova Table for a Latin Square Experiment
Source   S.S.    d.f.         M.S.    F           p-value
Treat    SSTr    t - 1        MSTr    MSTr/MSE
Rows     SSRow   t - 1        MSRow   MSRow/MSE
Cols     SSCol   t - 1        MSCol   MSCol/MSE
Error    SSE     (t-1)(t-2)   MSE
Total    SST     t^2 - 1

228 Experimental Design Of interest: to compare t treatments (the treatment combinations of one or several factors)

229 The Completely Randomized Design
Treats 1 2 3 … t Experimental units randomly assigned to treatments

230 The Model for a CR Experiment
yij = m + ti + eij, i = 1,2,…, t; j = 1,2,…, n, where
yij = the jth observation receiving the ith treatment
m = overall mean
ti = the effect of the ith treatment
eij = random error

231 The Anova Table for a CR Experiment
Source   S.S.   d.f.       M.S.   F          p-value
Treat    SSTr   t - 1      MST    MST/MSE
Error    SSE    t(n - 1)   MSE

232 Randomized Block Design
(Diagram: b blocks, each containing the treatments 1, 2, 3, …, t exactly once.) All treatments appear once in each block.

233 The Model for a RB Experiment
yij = m + ti + bj + eij, i = 1,2,…, t; j = 1,2,…, b, where
yij = the observation in the jth block receiving the ith treatment
m = overall mean
ti = the effect of the ith treatment
bj = the effect of the jth block
eij = random error
(No interaction between blocks and treatments.)

234 A Randomized Block experiment is assumed to be a two-factor experiment.
The factors are blocks and treatments. It is assumed that there is no interaction between blocks and treatments. The degrees of freedom for the interaction is used to estimate error.

235 The Anova Table for a randomized Block Experiment
Source   S.S.   d.f.          M.S.   F          p-value
Treat    SST    t - 1         MST    MST/MSE
Block    SSB    b - 1         MSB    MSB/MSE
Error    SSE    (t-1)(b-1)    MSE

236 The Latin square Design
(Diagram: rows and columns form a t x t array, e.g. rows 1 2 3 / 2 3 1 / 3 1 2.) All treatments appear once in each row and each column.

237 The Model for a Latin Experiment
yij(k) = m + tk + ri + gj + eij(k), i = 1,2,…, t; j = 1,2,…, t; k = 1,2,…, t, where
yij(k) = the observation in the ith row and the jth column receiving the kth treatment
m = overall mean
tk = the effect of the kth treatment
ri = the effect of the ith row
gj = the effect of the jth column
eij(k) = random error
(No interaction between rows, columns and treatments.)

238 A Latin Square experiment is assumed to be a three-factor experiment.
The factors are rows, columns and treatments. It is assumed that there is no interaction between rows, columns and treatments. The degrees of freedom for the interactions is used to estimate error.

239 The Anova Table for a Latin Square Experiment
Source   S.S.    d.f.         M.S.    F           p-value
Treat    SSTr    t - 1        MSTr    MSTr/MSE
Rows     SSRow   t - 1        MSRow   MSRow/MSE
Cols     SSCol   t - 1        MSCol   MSCol/MSE
Error    SSE     (t-1)(t-2)   MSE
Total    SST     t^2 - 1

240 Graeco-Latin Square Designs
Mutually orthogonal Squares

241 Definition: A Graeco-Latin square consists of two Latin squares (one using the letters A, B, C, …, the other using Greek letters α, β, γ, …) such that when the two Latin squares are superimposed on each other, each letter of one square appears once and only once with each letter of the other square. The two Latin squares are called mutually orthogonal. Example: a 7 x 7 Graeco-Latin Square:
Aa Be Cb Df Ec Fg Gd
Bb Cf Dc Eg Fd Ga Ae
Cc Dg Ed Fa Ge Ab Bf
Dd Ea Fe Gb Af Bc Cg
Ee Fb Gf Ac Bg Cd Da
Ff Gc Ag Bd Ca De Eb
Gg Ad Ba Ce Db Ef Fc

242 Note: There exists at most (t –1) t x t Latin squares L1, L2, …, Lt-1 such that any pair are mutually orthogonal. e.g. It is possible that there exists a set of six 7 x 7 mutually orthogonal Latin squares L1, L2, L3, L4, L5, L6 .

243 The Model for a Graeco-Latin Experiment
j = 1,2,…, t i = 1,2,…, t k = 1,2,…, t l = 1,2,…, t yij(kl) = the observation in ith row and the jth column receiving the kth Latin treatment and the lth Greek treatment

244 tk = the effect of the kth Latin treatment
m = overall mean tk = the effect of the kth Latin treatment ll = the effect of the lth Greek treatment ri = the effect of the ith row gj = the effect of the jth column eij(k) = random error No interaction between rows, columns, Latin treatments and Greek treatments

245 A Graeco-Latin Square experiment is assumed to be a four-factor experiment.
The factors are rows, columns, Latin treatments and Greek treatments. It is assumed that there is no interaction between rows, columns, Latin treatments and Greek treatments. The degrees of freedom for the interactions is used to estimate error.

246 Graeco-Latin Square Experiment
The Anova Table for a Graeco-Latin Square Experiment
Source   S.S.    d.f.         M.S.    F           p-value
Latin    SSLa    t - 1        MSLa    MSLa/MSE
Greek    SSGr    t - 1        MSGr    MSGr/MSE
Rows     SSRow   t - 1        MSRow   MSRow/MSE
Cols     SSCol   t - 1        MSCol   MSCol/MSE
Error    SSE     (t-1)(t-3)   MSE
Total    SST     t^2 - 1

247 Incomplete Block Designs

248 Randomized Block Design
We want to compare t treatments. Group the N = bt experimental units into b homogeneous blocks of size t. In each block we randomly assign the t treatments to the t experimental units. The ability to detect treatment-to-treatment differences depends on the within-block variability.

249 Comments The within-block variability generally increases with block size: the larger the block size, the larger the within-block variability. For a larger number of treatments, t, it may not be appropriate or feasible to require the block size, k, to be equal to the number of treatments. If the block size, k, is less than the number of treatments (k < t), then all treatments cannot appear in each block. The design is called an Incomplete Block Design.

250 Comments regarding Incomplete block designs
When two treatments appear together in the same block it is possible to estimate the difference in treatment effects: the treatment difference is estimable. If two treatments do not appear together in the same block it may not be possible to estimate the difference in treatment effects: the treatment difference may not be estimable.

251 Example: Consider the block design with 6 treatments and 6 blocks of size two: {1,2}, {2,3}, {1,3}, {4,5}, {5,6}, {4,6}. The treatment differences (1 vs 2, 1 vs 3, 2 vs 3, 4 vs 5, 4 vs 6, 5 vs 6) are estimable. If one of the treatments is in the group {1,2,3} and the other treatment is in the group {4,5,6}, the treatment difference is not estimable.

252 Definitions Two treatments i and i* are said to be connected if there is a sequence of treatments i0 = i, i1, i2, … iM = i* such that each successive pair of treatments (ij and ij+1) appear in the same block In this case the treatment difference is estimable. An incomplete design is said to be connected if all treatment pairs i and i* are connected. In this case all treatment differences are estimable.

253 Example: Consider the block design with 5 treatments and 5 blocks of size two: {1,2}, {2,3}, {1,3}, {4,5}, {1,4}. This incomplete block design is connected. All treatment differences are estimable. Some treatment differences are estimated with a higher precision than others.

254 Incomplete Block Designs
Balanced incomplete block designs Partially balanced incomplete block designs

255 Definition: An incomplete design is said to be a Balanced Incomplete Block Design if (1) all treatments appear in exactly r blocks - this ensures that each treatment is estimated with the same precision - and (2) all treatment pairs i and i* appear together in exactly l blocks - this ensures that each treatment difference is estimated with the same precision; the value of l is the same for each treatment pair.

256 Some Identities bk = rt r(k-1) = l (t – 1)
Let b = the number of blocks, t = the number of treatments, k = the block size, r = the number of times a treatment appears in the experiment, and l = the number of times a pair of treatments appears together in the same block.
bk = rt: both sides of this equation are found by counting the total number of experimental units in the experiment.
r(k-1) = l(t – 1): both sides of this equation are found by counting the total number of experimental units that appear with a specific treatment in the experiment.

257 BIB Design A Balanced Incomplete Block Design
(b = 15, k = 4, t = 6, r = 10, l = 6)
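A tiny Python check of the two identities for the design quoted above:

```python
# Check the BIB identities b*k = r*t and r*(k - 1) = lambda*(t - 1) for b=15, k=4, t=6, r=10, lambda=6.
b, k, t, r, lam = 15, 4, 6, 10, 6
print("b*k = r*t:            ", b * k, "=", r * t, "->", b * k == r * t)
print("r*(k-1) = lam*(t-1):  ", r * (k - 1), "=", lam * (t - 1), "->", r * (k - 1) == lam * (t - 1))
```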

258 Anova Table for Incomplete Block Designs

259 Designs for Estimating
Carry-over (or Residual) Effects of Treatments Ref: “Design and Analysis of Experiments” Roger G. Petersen, Dekker 1985

260 The Cross-over or Simple Reversal Design
An Example A clinical psychologist wanted to test two drugs, A and B, which are intended to increase reaction time to a certain stimulus. He has decided to use n = 8 subjects selected at random and randomly divided into two groups of four. The first group will receive drug A first then B, while the second group will receive drug B first then A.

261 To conduct the trial he administered a drug to the individual, waited 15 minutes for absorption, applied the stimulus and then measured reaction time. The data and the design is tabulated below:

262 The Switch-back or Double Reversal Design
An Example: The following study was interested in the effect of concentrate type on the daily production of fat-corrected milk (FCM). Two concentrates were used: A - high fat; and B - low fat. Five test animals were then selected for each of the two sequence groups (A-B-A and B-A-B) in a switch-back design.

263 The data and the design is tabulated below:
One animal in the first group developed mastitis and was removed from the study.

264 The Incomplete Block Switch-back Design
An Example An insurance company was interested in buying a quantity of word processing machines for use by secretaries in the stenographic pool. The selection was narrowed down to three models (A, B, and C). A study was to be carried out , where the time to process a test document would be determined for a group of secretaries on each of the word processing models. For various reasons the company decided to use an incomplete block switch back design using n = 6 secretaries from the secretarial pool.

265 The data and the design is tabulated below:
BIB incomplete block design with t = 3 treatments – A, B and C – and block size k = 2. Treatment pairs: (A, B), (A, C), (B, C).

266 Designs for Estimating
Carry-over (or Residual) Effects of Treatments Ref: “Design and Analysis of Experiments” Roger G. Petersen, Dekker 1985

267 The Latin Square Change-Over (or Round Robin) Design
Selected Latin Square Change-Over Designs (balanced for residual effects). Period = Rows; Columns = Subjects.

268 Four Treatments

269 An Example An experimental psychologist wanted to determine the effect of three new drugs (A, B and C) on the time for laboratory rats to work their way through a maze. A sample of n= 12 test animals were used in the experiment. It was decided to use a Latin square Change-Over experimental design.

270 The data and the design is tabulated below:

271 Orthogonal Linear Contrasts
This is a technique for partitioning ANOVA sum of squares into individual degrees of freedom

272 Definition: Let x1, x2, ... , xp denote p numerical quantities computed from the data. These could be statistics or the raw observations. A linear combination of x1, x2, ... , xp is defined to be a quantity, L, computed in the following manner: L = c1x1 + c2x2 + ... + cpxp, where the coefficients c1, c2, ... , cp are predetermined numerical values.

273 Definition: Let m1, m2, ... , mp denote p means and c1, c2, ... , cp denote p coefficients such that c1 + c2 + ... + cp = 0. Then the linear combination L = c1m1 + c2m2 + ... + cpmp is called a linear contrast of the p means m1, m2, ... , mp.

274 Examples 1. A linear combination A linear contrast 2. A linear contrast L = m1 - 4 m2+ 6m3 - 4 m4 + m5 = (1) m1+ (-4) m2+ (6) m3 + (-4) m4 + (1) m5

275 Definition: Let A = a1m1 + a2m2 + ... + apmp and
B = b1m1 + b2m2 + ... + bpmp be two linear contrasts of the p means m1, m2, ... , mp. Then A and B are called Orthogonal Linear Contrasts if, in addition to a1 + a2 + ... + ap = 0 and b1 + b2 + ... + bp = 0, it is also true that a1b1 + a2b2 + ... + apbp = 0.

276 Example Let Note:

277 Definition: Let A = a1m1 + a2m2 + ... + apmp, B = b1m1 + b2m2 + ... + bpmp, ... , and L = l1m1 + l2m2 + ... + lpmp be a set of linear contrasts of the p means m1, m2, ... , mp. Then the set is called a set of Mutually Orthogonal Linear Contrasts if each linear contrast in the set is orthogonal to every other linear contrast in the set.

278 Theorem: The maximum number of linear contrasts in a set of Mutually Orthogonal Linear Contrasts of the quantities m1, m2, ... , mp is p - 1. p - 1 is called the degrees of freedom (d.f.) for comparing quantities m1, m2, ... , mp .

279 Comments Linear contrasts are making comparisons amongst the p values m1, m2, ... , mp Orthogonal Linear Contrasts are making independent comparisons amongst the p values m1, m2, ... , mp . The number of independent comparisons amongst the p values m1, m2, ... , mp is p – 1.

280 Definition: Let L = c1m1 + c2m2 + ... + cpmp denote a linear contrast of the p means,
where each mean, mi, is calculated from n observations.

281 Then the Sum of Squares for testing
the Linear Contrast L, i.e. H0: L = 0 against HA: L ≠ 0, is defined to be SSL = n L^2 / (c1^2 + c2^2 + ... + cp^2).

282 The degrees of freedom (df) for testing the Linear Contrast L is 1.
The F-ratio for testing the Linear Contrast L is defined to be F = SSL / MSError, compared with the F-distribution with 1 and the error degrees of freedom.

283 To test if a set of mutually orthogonal linear contrasts are all zero,
i.e. H0: L1 = 0, L2 = 0, ... , Lk = 0, the Sum of Squares is SSH0 = SSL1 + SSL2 + ... + SSLk, the degrees of freedom (df) is k, and the F-ratio is F = (SSH0/k) / MSError.

284 Theorem: Let L1, L2, ... , Lp-1 denote p - 1 mutually orthogonal linear contrasts for comparing the p means. Then the Sum of Squares for comparing the p means based on p - 1 degrees of freedom, SSBetween, satisfies SSBetween = SSL1 + SSL2 + ... + SSLp-1.
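A Python sketch on made-up data: build Helmert-type contrasts for p = 4 group means, check that they are mutually orthogonal, and verify that their sums of squares add up to SSBetween (the data and contrast choice are for illustration only).

```python
# Orthogonal contrasts: check orthogonality and the SSBetween decomposition on simulated data.
import numpy as np

rng = np.random.default_rng(11)
p, n = 4, 10                                     # p treatments, n observations per mean
y = rng.normal(loc=[10, 12, 11, 15], scale=1.0, size=(n, p))
means = y.mean(axis=0)

# Helmert-type contrasts: 2nd vs 1st; 3rd vs 1st and 2nd; 4th vs 1st, 2nd and 3rd
contrasts = np.array([
    [-1.0, 1.0, 0.0, 0.0],
    [-0.5, -0.5, 1.0, 0.0],
    [-1/3, -1/3, -1/3, 1.0],
])
# each row sums to zero and any two rows have zero dot product (mutual orthogonality)
print("row sums:", contrasts.sum(axis=1), " dot products:",
      [round(float(contrasts[i] @ contrasts[j]), 10) for i in range(3) for j in range(i + 1, 3)])

# SS for a contrast based on means of n observations: SS_L = n * L^2 / sum(c_i^2)
ss_contrasts = [n * (c @ means) ** 2 / (c @ c) for c in contrasts]
ss_between = n * np.sum((means - means.mean()) ** 2)
print("sum of contrast SS:", round(sum(ss_contrasts), 4), " SSBetween:", round(ss_between, 4))
```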

285 Comment Defining a set of Orthogonal Linear Contrasts for comparing the p means allows the researcher to "break apart" the Sum of Squares for comparing the p means, SSBetween, and make individual tests of each the Linear Contrast.

286 Techniques for constructing orthogonal linear contrasts

287 Comparing first k – 1 with kth
Consider the p values – y1, y2, y3, ... , yp:
L1 = 1st vs 2nd = y1 - y2
L2 = 1st, 2nd vs 3rd = 1/2(y1 + y2) - y3
L3 = 1st, 2nd, 3rd vs 4th = 1/3(y1 + y2 + y3) - y4
etc.

288 Helmert contrasts
Contrast coefficients:
L1: -1  1  0  0  0
L2: -1 -1  2  0  0
L3: -1 -1 -1  3  0
L4: -1 -1 -1 -1  4
Contrast explanation:
L1: 2nd versus 1st
L2: 3rd versus 1st and 2nd
L3: 4th versus 1st, 2nd and 3rd
L4: 5th versus 1st, 2nd, 3rd and 4th

289 Comparing between Groups then within groups
Consider the p = 10 values – y1, y2, y3, ... , y10. Suppose these 10 values are grouped:
Group 1: y1, y2, y3
Group 2: y4, y5, y6, y7
Group 3: y8, y9, y10
Comparison of Groups (2 d.f.):
L1 = Group 1 vs Group 2 = 1/3(y1 + y2 + y3) - 1/4(y4 + y5 + y6 + y7)
L2 = Group 1, Group 2 vs Group 3 = 1/7(y1 + y2 + y3 + y4 + y5 + y6 + y7) - 1/3(y8 + y9 + y10)

290 Comparison of within Groups
Within Group 1 (2 d.f.):
L3 = 1 vs 2 = y1 - y2
L4 = 1, 2 vs 3 = 1/2(y1 + y2) - y3
Within Group 2 (3 d.f.):
L5 = 4 vs 5 = y4 - y5
L6 = 4, 5 vs 6 = 1/2(y4 + y5) - y6
L7 = 4, 5, 6 vs 7 = 1/3(y4 + y5 + y6) - y7
Within Group 3 (2 d.f.):
L8 = 8 vs 9 = y8 - y9
L9 = 8, 9 vs 10 = 1/2(y8 + y9) - y10

291 Comparisons when grouping is done in two different ways
Consider the p = ab values y11, y12, ... , y1b, y21, y22, ... , y2b, ... , ya1, ya2, ... , yab, arranged with Row Groups 1, ... , a and Column Groups 1, ... , b:
Row 1: y11 y12 y13 ... y1b
Row 2: y21 y22 y23 ... y2b
Row 3: y31 y32 y33 ... y3b
...
Row a: ya1 ya2 ya3 ... yab

292 Comparison of Row Groups (a - 1 d.f.): R1, R2, R3, ... , Ra-1
Comparison of Column Groups (b - 1 d.f.): C1, C2, C3, ... , Cb-1
Interaction contrasts ((a - 1)(b - 1) d.f.): (RC)11 = R1 × C1, (RC)12 = R1 × C2, ... , (RC)a-1,b-1 = Ra-1 × Cb-1
Comment: The coefficients of (RC)ij = Ri × Cj are found by multiplying the coefficients of Ri with the coefficients of Cj.
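As a quick illustration of the comment above, here is a small Python sketch (the row and column contrasts are made-up examples) showing that the interaction contrast coefficients are just the products of the row and column coefficients:

```python
import numpy as np

# Example row contrast (a = 3 row groups) and column contrast (b = 2 column groups)
R1 = np.array([1, -1, 0])    # row group 1 vs row group 2
C1 = np.array([1, -1])       # column group 1 vs column group 2

# The interaction contrast (RC)11 has coefficient R1[i] * C1[j] for cell (i, j)
RC11 = np.outer(R1, C1)
print(RC11)
# [[ 1 -1]
#  [-1  1]
#  [ 0  0]]
print(RC11.sum())            # the coefficients of an interaction contrast sum to zero
```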

293 Orthogonal Linear Contrasts
Polynomial Regression

294 Let m1, m2, ... , mp denote p means and consider the first differences
Dmi = mi - mi-1. If m1 = m2 = ... = mp then Dmi = mi - mi-1 = 0. If the points (1, m1), (2, m2), … , (p, mp) lie on a straight line with non-zero slope then the Dmi = mi - mi-1 are non-zero but all equal.

295 Consider the 2nd differences
D2mi = (mi - mi-1) - (mi-1 - mi-2) = mi - 2mi-1 + mi-2. If the points (1, m1), (2, m2), … , (p, mp) lie on a straight line then D2mi = mi - 2mi-1 + mi-2 = 0. If the points lie on a quadratic curve then the D2mi = mi - 2mi-1 + mi-2 are non-zero but all equal.

296 Consider the 3rd differences
D3mi = mi - 3mi-1 + 3mi-2 - mi-3. If the points (1, m1), (2, m2), … , (p, mp) lie on a quadratic curve then D3mi = mi - 3mi-1 + 3mi-2 - mi-3 = 0. If the points lie on a cubic curve then the D3mi are non-zero but all equal.

297 Continuing: 4th differences, D4mi, will be non-zero but equal if the points (1, m1), (2, m2), … , (p, mp) lie on a quartic curve (4th degree); 5th differences, D5mi, will be non-zero but equal if the points lie on a quintic curve (5th degree); etc.

298 Let L = a2 Dm2 + a3 Dm3 + … + ap Dmp, Q2 = b3 D2m3 + … + bp D2mp, C = c4 D3m4 + … + cp D3mp, Q4 = d5 D4m5 + … + dp D4mp, etc., where the a2, …, ap, b3, …, bp, c4, …, cp, d5, …, dp are chosen so that L, Q2, C, Q4, … are mutually orthogonal contrasts (linear, quadratic, cubic, quartic, …).

299 If the means are equal then
L = Q2 = C = Q4 = … = 0. If the means follow a linear trend then L ≠ 0 but Q2 = C = Q4 = … = 0. If the means follow a quadratic trend then Q2 ≠ 0 but C = Q4 = … = 0. If the means follow a cubic trend then C ≠ 0 but Q4 = … = 0.

300 Orthogonal Linear Contrasts for Polynomial Regression

301 Orthogonal Linear Contrasts for Polynomial Regression
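The tables these two slides refer to contain the standard orthogonal-polynomial contrast coefficients. A small Python sketch, assuming p equally spaced levels, that reproduces them (up to an overall sign) for p = 5 by orthogonalizing the powers of the level index:

```python
import numpy as np

p = 5
x = np.arange(1, p + 1, dtype=float)

# Columns: constant, linear, quadratic, cubic, quartic in the level index
V = np.vander(x, p, increasing=True)

# QR factorization makes the columns mutually orthogonal (Gram-Schmidt)
Q, _ = np.linalg.qr(V)

# Rescale each contrast to the smallest integer coefficients for readability
for k in range(1, p):
    c = Q[:, k]
    c = c / np.min(np.abs(c[np.abs(c) > 1e-9]))
    print(np.round(c).astype(int))
# Expected for p = 5 (up to sign): linear -2 -1 0 1 2, quadratic 2 -1 -2 -1 2,
# cubic -1 2 0 -2 1, quartic 1 -4 6 -4 1
```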

302 Multiple Testing: Fisher's L.S.D. (Least Significant Difference) procedure, Tukey's multiple comparison procedure, Scheffé's multiple comparison procedure

303 Multiple Testing – a Simple Example
Suppose we are interested in testing whether two parameters (q1 and q2) are equal to zero. There are two approaches: 1. We could test each parameter separately: H0: q1 = 0 against HA: q1 ≠ 0, then H0: q2 = 0 against HA: q2 ≠ 0. 2. We could develop an overall test: H0: q1 = 0, q2 = 0 against HA: q1 ≠ 0 or q2 ≠ 0.

304 To test each parameter separately
we might use a test of the form: reject H0: qi = 0 if the test statistic for qi exceeds a critical value in absolute value, where the critical value is chosen so that the probability of a Type I error for each test is a.

305 To perform an overall test of
H0: q1 = 0, q2 = 0 against HA: q1 ≠ 0 or q2 ≠ 0 we might use a single test statistic whose critical value is chosen so that the probability of a Type I error for the overall test is a.

313 Multiple Comparison Tests (Post-hoc Tests)

315 Multiple Testing: Fisher's L.S.D. (Least Significant Difference) procedure, Tukey's multiple comparison procedure, Scheffé's multiple comparison procedure

316 Suppose we have p means. An F-test has revealed that there are significant differences amongst the p means. We want to perform an analysis to determine precisely where the differences exist.

317 Example: One-way ANOVA – the F test for comparing k means
Situation: We have k normal populations. Let mi and si denote the mean and standard deviation of population i, i = 1, 2, 3, … , k. Note: we assume that the standard deviation is the same for each population: s1 = s2 = … = sk = s.

318 We want to test H0: m1 = m2 = m3 = … = mk against HA: at least one pair of means differ.

319 ANOVA Table
Source    d.f.    Sum of Squares    Mean Square    F-ratio
Between   k - 1   SSBetween         MSBetween      MSBetween / MSWithin
Within    N - k   SSWithin          MSWithin
Total     N - 1   SSTotal

320 Comments The F-test of H0: m1 = m2 = m3 = … = mk against HA: at least one pair of means are different. If H0 is accepted we conclude that the means are all equal (not significantly different). If H0 is rejected we conclude that at least one pair of means is significantly different. The F-test gives no information as to which pairs of means are different. One can then use two-sample t tests to determine which pairs of means are significantly different.

321 Fisher's LSD (least significant difference) procedure:

322 Fisher's LSD (least significant difference) procedure:
Test H0: m1 = m2 = m3 = … = mk against HA: at least one pair of means are different, using the ANOVA F-test. If H0 is accepted we conclude that all means are equal (not significantly different) and stop. If H0 is rejected we conclude that at least one pair of means is significantly different, and follow up with two-sample t tests to determine which pairs of means are significantly different.
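A minimal Python sketch of the procedure, using scipy; the three groups below are made-up data, and the follow-up t tests use the pooled within-group mean square as in the LSD method:

```python
import numpy as np
from scipy import stats
from itertools import combinations

# Hypothetical data: k = 3 groups
groups = [np.array([23., 25., 21., 24.]),
          np.array([28., 30., 27., 29.]),
          np.array([26., 24., 27., 25.])]

# Step 1: overall one-way ANOVA F-test
F, p_overall = stats.f_oneway(*groups)
print(F, p_overall)

# Step 2: only if H0 is rejected, compare pairs of means
alpha = 0.05
N = sum(len(g) for g in groups)
k = len(groups)
ms_within = sum(((g - g.mean())**2).sum() for g in groups) / (N - k)

if p_overall < alpha:
    for i, j in combinations(range(k), 2):
        diff = groups[i].mean() - groups[j].mean()
        se = np.sqrt(ms_within * (1/len(groups[i]) + 1/len(groups[j])))
        lsd = stats.t.ppf(1 - alpha/2, N - k) * se   # least significant difference
        print(i, j, round(diff, 2), round(lsd, 2), abs(diff) > lsd)
```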

323 Tukey’s Multiple Comparison Test

324 Tukey's Critical Differences
Let SE = √(MSError / (no. of observations per mean)) denote the standard error of each mean. Two means are declared significantly different if they differ by more than D = q × SE, where q is the tabled value of Tukey's studentized range for p = no. of means and the df for Error.

325 Table: Critical values for Tukey’s studentized Range distribution

329 Scheffe’s Multiple Comparison Test

330 Scheffé's Critical Differences (for linear contrasts)
A linear contrast is declared significant if it exceeds S = √((p - 1) Fa(p - 1, ν)) × SE(L), where Fa(p - 1, ν) is the tabled value of the F distribution (p - 1 = df for comparing p means, ν = df for Error) and SE(L) is the standard error of the estimated contrast.

331 Scheffé's Critical Differences (for comparing two means)
Two means are declared significantly different if they differ by more than √((p - 1) Fa(p - 1, ν)) × √(MSError (1/ni + 1/nj)).
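The two critical differences can be computed rather than read from tables. A sketch in Python, assuming scipy is recent enough to provide the studentized range distribution (scipy >= 1.7); the MSError and design sizes are made-up illustrative values:

```python
import numpy as np
from scipy import stats

p, n = 4, 6                 # number of means, observations per mean (illustrative)
nu = p * (n - 1)            # error degrees of freedom
ms_error = 2.5              # MSError from the ANOVA (made-up value)
alpha = 0.05

se_mean = np.sqrt(ms_error / n)

# Tukey: D = q_alpha(p, nu) * (standard error of a mean)
q = stats.studentized_range.ppf(1 - alpha, p, nu)
tukey_D = q * se_mean

# Scheffe, for the difference of two means:
# S = sqrt((p - 1) * F_alpha(p - 1, nu)) * sqrt(MSError * (1/n + 1/n))
F = stats.f.ppf(1 - alpha, p - 1, nu)
scheffe_D = np.sqrt((p - 1) * F) * np.sqrt(ms_error * 2 / n)

print(tukey_D, scheffe_D)   # Scheffe's difference is typically the wider (more conservative) one
```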

332 Multiple Confidence Intervals
Tukey's multiple confidence intervals; Scheffé's multiple confidence intervals; one-at-a-time confidence intervals.

333 Tukey’s Multiple confidence intervals
Comments Tukey’s Multiple confidence intervals One-at-a-time confidence intervals The probability that each of these interval contains mi – mj is 1 – a. The probability that all of these interval contains mi – mj is considerably lower than 1 – a Scheffe’s Multiple confidence intervals These intervals can be computed not only for simple differences in means, mi – mj , but also any other linear contrast, c1m1 + … + ckmk. The probability that all of these intervals contain its linear contrast is 1 – a

334 There are many multiple (post hoc) comparison procedures
Tukey's, Scheffé's, Duncan's Multiple Range, Newman-Keuls, etc. There is considerable controversy: "I have not included the multiple comparison methods of D. B. Duncan because I have been unable to understand their justification" – H. Scheffé, The Analysis of Variance.

335 2^k Experiments, incomplete block designs for 2^k experiments, fractional 2^k experiments

336 Factorial Experiments

337 Dependent variable y; k categorical independent variables A, B, C, … (the factors). Let a = the number of categories (levels) of A, b = the number of categories of B, c = the number of categories of C, etc. Then t = abc… is the number of treatment combinations.

338 The Completely Randomized Design
We form the set of all treatment combinations – the set of all combinations of the k factors. The total number of treatment combinations is t = abc…. In the completely randomized design, n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination. The total number of experimental units is N = nt = nabc….

339 The ANOVA Table: three-factor experiment

340 If the number of factors, k, is large then it may be appropriate to keep the number of levels of each factor low (2 or 3) to keep the number of treatment combinations, t, small: t = 2^k if a = b = c = ... = 2, or t = 3^k if a = b = c = ... = 3. These experimental designs are called 2^k and 3^k designs.

341 The ANOVA Table: 2^3 experiment
Source   Sum of Squares   d.f.
A        SSA              1
B        SSB              1
C        SSC              1
AB       SSAB             1
AC       SSAC             1
BC       SSBC             1
ABC      SSABC            1
Error    SSError          2^3(n – 1)

342 Notation for treatment combinations in 2^k experiments
There are several methods for indicating treatment combinations in a 2^k experiment and a 3^k experiment: 1. A sequence of small letters representing the factors, with subscripts (0, 1 for a 2^k experiment and 0, 1, 2 for a 3^k experiment). 2. A sequence of k digits (0, 1 for a 2^k experiment and 0, 1, 2 for a 3^k experiment). 3. A third way of representing treatment combinations for a 2^k experiment is by a sequence of small letters: if a factor is at its high level, its letter is present; if a factor is at its low level, its letter is not present.

343 The 8 treatment combinations in a 2^3 experiment
(a0, b0, c0), (a1, b0, c0), (a0, b1, c0), (a0, b0, c1), (a1, b1, c0), (a1, b0, c1), (a0, b1, c1), (a1, b1, c1)
000, 100, 010, 001, 110, 101, 011, 111
1, a, b, c, ab, ac, bc, abc
In the last way of representing the treatment combinations, a more natural ordering is: 1, a, b, ab, c, ac, bc, abc. Using this ordering, the 16 treatment combinations in a 2^4 experiment are: 1, a, b, ab, c, ac, bc, abc, d, da, db, dab, dc, dac, dbc, dabc.

344 Notation for linear contrasts of treatment combinations in a 2^k experiment
The linear contrast with 1 d.f. representing the main effect of A is LA = (1 + b + c + bc) – (a + ab + ac + abc) = a comparison of the treatment combinations where A is at its low level with the treatment combinations where A is at its high level. Note: LA = (1 - a)(1 + b)(1 + c). Also LB = (1 + a)(1 - b)(1 + c) = (1 + a + c + ac) – (b + ab + bc + abc) and LC = (1 + a)(1 + b)(1 - c) = (1 + a + b + ab) – (c + ac + bc + abc).

345 The linear contrast with 1 d.f. representing the interaction AB
LAB = (1 - a)(1 - b)(1 + c) = (1 + ab + c + abc) – (a + b + ac + bc) = a comparison of the treatment combinations where A and B are both at their high level or both at their low level with the treatment combinations where one of A, B is at its high level and the other is at its low level. LAC = (1 - a)(1 + b)(1 - c) = (1 + ac + b + abc) – (a + c + ab + bc). LBC = (1 + a)(1 - b)(1 - c) = (1 + bc + a + abc) – (b + c + ab + ac).

346 The linear contrast with 1 d.f. representing the interaction ABC
LABC = (1 - a)(1 - b)(1 - c) = (1 + ab + ac + bc) – (a + b + c + abc). In general the linear contrasts are of the form L = (1 ± a)(1 ± b)(1 ± c) etc., where we use minus (-) if the factor is present in the effect and plus (+) if the factor is not present.

347 The sign of the coefficient of each treatment combination in each contrast (LA, LB, LAB, LC, LAC, LBC, LABC) can be set out in a table. For the main effects (LA, LB, LC) the sign is negative (-) if the letter is present in the treatment and positive (+) if it is not present. The interactions are products of the main effects: + × + = +, - × + = -, + × - = -, - × - = +.
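A short Python sketch that generates this sign table using the convention stated above (minus when the letter is present in the treatment, plus when it is absent, interactions as products):

```python
# Treatment combinations of a 2^3 experiment in standard order
treatments = ["1", "a", "b", "ab", "c", "ac", "bc", "abc"]

def sign(effect, treatment):
    # Main effect X: -1 if the letter x is present in the treatment, +1 if absent
    # (matching LA = (1 - a)(1 + b)(1 + c), etc.); interactions are products.
    s = 1
    for letter in effect.lower():
        s *= -1 if letter in treatment else 1
    return s

effects = ["A", "B", "AB", "C", "AC", "BC", "ABC"]
for e in effects:
    row = [sign(e, t) for t in treatments]
    print(e, row)
```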

348 Strategy for a single replication (n = 1)
The ANOVA Table for a 2^3 experiment:
Source   Sum of Squares   d.f.
A        SSA              1
B        SSB              1
C        SSC              1
AB       SSAB             1
AC       SSAC             1
BC       SSBC             1
ABC      SSABC            1
Error    SSError          2^3(n – 1)
If n = 1 then there are 0 df for estimating error. In practice the higher-order interactions are usually not present; one makes this assumption and pools their degrees of freedom together to estimate Error.

349 In a 7-factor experiment (each factor at two levels) there are 2^7 = 128 treatments.

350 ANOVA table (pool the higher-order interaction degrees of freedom together to estimate Error)
Source                  d.f.
Main Effects            7
2-factor interactions   21
3-factor interactions   35
4-factor interactions   35
5-factor interactions   21
6-factor interactions   7
7-factor interaction    1

351 Randomized Block design for 2^k experiments
A Randomized Block Design for a 2^3 experiment: each of the n blocks contains all 2^3 = 8 treatment combinations.
Block 1: 1, a, b, ab, c, ac, bc, abc
Block 2: 1, a, b, ab, c, ac, bc, abc
...
Block n: 1, a, b, ab, c, ac, bc, abc

352 The ANOVA Table: 2^3 experiment in a Randomized Block design
Source   Sum of Squares   d.f.
Blocks   SSBlocks         n - 1
A        SSA              1
B        SSB              1
C        SSC              1
AB       SSAB             1
AC       SSAC             1
BC       SSBC             1
ABC      SSABC            1
Error    SSError          (2^3 – 1)(n – 1)

353 Incomplete Block designs for 2^k experiments: Confounding

354 A Randomized Block Design for a 2^3 experiment
Blocks 1, 2, 3, 4, ... , n: each block contains all 8 treatment combinations 1, a, b, ab, c, ac, bc, abc.

355 Incomplete Block designs for 2^k experiments
A Randomized Block Design for a 2^k experiment requires blocks of size 2^k. The ability to detect treatment differences depends on the magnitude of the within-block variability, which can be reduced by decreasing the block size. Example: a 2^3 experiment in blocks of size 4 (1 replication), with the ABC interaction confounded with blocks. Block 1: 1, ab, ac, bc. Block 2: a, b, c, abc.

356 In this experiment (Block 1: 1, ab, ac, bc; Block 2: a, b, c, abc) the linear contrast
LABC = (1 + ab + ac + bc) – (a + b + c + abc), in addition to measuring the ABC interaction, is also subject to block-to-block differences. The ABC interaction is said to be confounded with block differences. The linear contrasts LA = (1 + b + c + bc) – (a + ab + ac + abc), LB = (1 + a + c + ac) – (b + ab + bc + abc), LC = (1 + a + b + ab) – (c + ac + bc + abc), LAB = (1 + ab + c + abc) – (a + b + ac + bc), LAC = (1 + ac + b + abc) – (a + c + ab + bc), and LBC = (1 + bc + a + abc) – (b + c + ab + ac) are not subject to block-to-block differences.

357 To confound an interaction (e.g. ABC), consider the linear contrast associated with the interaction:
LABC = 1 + ab + ac + bc – a – b – c – abc. Assign the treatments associated with positive (+) coefficients to one block and the treatments associated with negative (-) coefficients to the other block. Block 1: 1, ab, ac, bc. Block 2: a, b, c, abc.

358 The ANOVA Table: 2^3 experiment in an incomplete block design with 2 blocks of size 4 (ABC confounded)
Source   Sum of Squares   d.f.
Blocks   SSBlocks         1
A        SSA              1
B        SSB              1
C        SSC              1
AB       SSAB             1
AC       SSAC             1
BC       SSBC             1
Total    SSTotal          7

359 Confounding more than one interaction to further reduce block size

360 Example: contrasts for a 2^3 experiment
To confound ABC, one places the treatments associated with the positive sign (+) in one block and the treatments associated with the negative sign (-) in the other block. To confound both BC and ABC, one chooses the blocks using the sign categories (+,+), (+,-), (-,+), (-,-). Comment: there will also be a third contrast that is confounded.

361 Example: a 2^3 experiment in blocks of size 2 (1 replicate), with the BC and ABC interactions confounded in the four blocks.
Block 1: 1, bc. Block 2: a, abc. Block 3: ab, ac. Block 4: b, c.
LABC = (1 + ab + ac + bc) – (a + b + c + abc) and LBC = (1 + bc + a + abc) – (b + c + ac + ab) are confounded with blocks. LA = (1 + b + c + bc) – (a + ab + ac + abc) is also confounded with blocks. LB = (1 + a + c + ac) – (b + ab + bc + abc), LC = (1 + a + b + ab) – (c + ac + bc + abc), LAB = (1 + ab + c + abc) – (a + b + ac + bc), and LAC = (1 + ac + b + abc) – (a + c + ab + bc) are not subject to block-to-block differences.

362 The ANOVA Table: 2^3 experiment in an incomplete block design with 4 blocks of size 2 (ABC, BC and hence A confounded with blocks)
Source   Sum of Squares   d.f.
Blocks   SSBlocks         3
B        SSB              1
C        SSC              1
AB       SSAB             1
AC       SSAC             1
Total    SSTotal          7
There are no degrees of freedom for Error. Solution: assume that one or both of the two-factor interactions are not present and use those degrees of freedom to estimate error.

363 Rule (for determining additional contrasts that are confounded with blocks):
"Multiply" the confounded interactions together; if a factor is raised to the power 2, delete it. Example: suppose that ABC and BC are confounded; then so also is (ABC)(BC) = AB²C² = A. A better choice would be to confound AC and BC; then the third contrast that would be confounded is (AC)(BC) = ABC² = AB.
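The "multiply and delete squared letters" rule is just the symmetric difference of the letter sets. A tiny Python sketch; the same operation is used later to find aliases in fractional designs:

```python
def generalized_interaction(e1, e2):
    """'Multiply' two effects and delete any letter raised to the power 2.
    This is the symmetric difference of their letter sets."""
    letters = set(e1) ^ set(e2)
    return "".join(sorted(letters)) or "I"

print(generalized_interaction("ABC", "BC"))   # 'A'  -> a main effect becomes confounded
print(generalized_interaction("AC", "BC"))    # 'AB' -> only an interaction is lost
```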

364 To confound both AC and BC, one chooses the blocks using the sign categories (+,+), (+,-), (-,+), (-,-). As noted, this also confounds (AC)(BC) = ABC² = AB. Block 1: 1, abc. Block 2: b, ac. Block 3: a, bc. Block 4: ab, c.

365 The ANOVA Table: 2^3 experiment in an incomplete block design with 4 blocks of size 2 (AC, BC and hence AB confounded with blocks)
Source   Sum of Squares   d.f.
Blocks   SSBlocks         3
A        SSA              1
B        SSB              1
C        SSC              1
ABC      SSABC            1
Total    SSTotal          7
There are no degrees of freedom for Error. Solution: assume that the three-factor interaction is not present and use that degree of freedom to estimate error.

366 Partial confounding

367 Example: a 2^3 experiment in blocks of size 4 (3 replicates). The BC interaction is confounded in the 1st replicate, the AC interaction in the 2nd replicate, and the AB interaction in the 3rd replicate.
Replicate 1 (BC confounded) – Block 1: 1, a, bc, abc; Block 2: b, c, ab, ac.
Replicate 2 (AC confounded) – Block 3: 1, b, ac, abc; Block 4: a, c, ab, bc.
Replicate 3 (AB confounded) – Block 5: 1, ab, c, abc; Block 6: a, b, ac, bc.
The main effects (A, B and C) and the three-factor interaction ABC can be estimated using all three replicates. The two-factor interaction AB can be estimated using replicates 1 and 2, AC using replicates 1 and 3, and BC using replicates 2 and 3.

368 The ANOVA Table
Source              Sum of Squares     d.f.
Reps                SSReps             2
Blocks within Reps  SSBlocks(Reps)     3
A                   SSA                1
B                   SSB                1
C                   SSC                1
AB (Reps I, II)     SSAB               1
AC (Reps I, III)    SSAC               1
BC (Reps II, III)   SSBC               1
ABC                 SSABC              1
Error               SSError            11
Total               SSTotal            23

369 Example: A chemist is interested in determining how the purity (Y) of a chemical product depends on the agitation rate (A), the base component concentration (B) and the concentration of reagent (C). He decides to use a 2^3 design. Only 4 runs can be done each day (block) and he wants 3 replications of the experiment. Replicate 1 (BC confounded): days 1–2; Replicate 2 (AC confounded): days 3–4; Replicate 3 (AB confounded): days 5–6. Data: 1 25, ab 43, abc 39, bc 38, 26, a 34, c 30, b 29, 37, 32, 42, ac 40, 27, 46, 52, 33, 51, 36.

370 The ANOVA Table
Source                Sum of Squares   d.f.   Mean Square   F
Reps                  111.00           2      55.50
Blocks within Reps    108.00           3      36.00
A                     600.00           1      600.00        40.6**
B                     253.50           1      253.50        17.2**
C                     54.00            1      54.00         3.7 (ns)
AB (Reps I, II)       6.25             1      6.25          < 1
AC (Reps I, III)      1.00             1      1.00          < 1
BC (Reps II, III)                      1
ABC                   13.50            1      13.50         < 1
Error                 162.50           11     14.77
Total                                  23
F0.05(1,11) = 4.84 and F0.01(1,11) = 9.65

371 Fractional Factorials

372 In a 2^k experiment the number of experimental units required may be quite large even for moderate values of k. For k = 7, 2^7 = 128 and n·2^7 = 256 if n = 2. Solution: use only n = 1 replicate and use the higher-order interactions to estimate error; it is very rare that the higher-order interactions are significant. An alternative solution is to use ½ a replicate, ¼ a replicate, 1/8 a replicate, etc. (i.e. a fractional replicate): a 2^(k–1) design is ½ of a 2^k design, a 2^(k–2) design is ¼ of a 2^k design.

373 In a fractional factorial design, some of the effects (interactions or main effects) may not be estimable. However, it may be assumed that these effects are not present (in particular the higher-order interactions).

374 Example: 2^4 experiment with factors A, B, C, D – contrasts
To construct a ½ replicate of this design in which the four-factor interaction ABCD is the defining contrast, select only the treatment combinations where the coefficient for ABCD is positive (+).

375 The treatments and contrasts of a ½ 2^4 = 2^(4-1) experiment
Notice that some of the contrasts are equivalent, e.g. A and BCD, B and ACD, etc. In this case the two contrasts are said to be aliased. Note that the defining contrast, ABCD, is aliased with the constant term I. To determine aliased contrasts, multiply any effect by the defining contrast, e.g. (A)×(ABCD) = A²BCD = BCD.

376 Aliased contrasts in a 2^(4-1) design with ABCD the defining contrast
A with BCD; B with ACD; C with ABD; D with ABC; AB with CD; AC with BD; AD with BC.
If an effect is aliased with another effect you can estimate one or the other, but not both.

377 The ANOVA for a 2^(4-1) design with ABCD the defining contrast
Source   df
A        1
B        1
C        1
D        1
AB       1
AC       1
AD       1
Total    7

378 Example: ¼ 2^4 experiment. To construct a ¼ replicate of the 2^4 design, choose two defining contrasts, say AC and BD, and select only the treatment combinations where the coefficient is positive (+) for both AC and BD.

379 The treatments and contrasts of a ¼ 2^4 = 2^(4-2) experiment
Aliased contrasts: I, AC, BD and ABCD; A, C, ABD and BCD; B, ABC, D and ACD; AB, BC, AD and CD.

380 The ANOVA for a 2^(4-2) design with AC and BD the defining contrasts
Source   df
A        1
B        1
AB       1
Total    3
There may be better choices for the defining contrasts. The smaller fractions of a 2^k design become more appropriate as k increases.

381 Response surfaces

382 We have a dependent variable y and independent variables x1, x2, ... , xp.
The general form of the model is y = f(x1, x2, ... , xp) + e. (Figures: contour map and surface graph of f.)

383 The linear model: y = b0 + b1x1 + b2x2 + ... + bpxp + e. (Figures: contour map and surface graph.)

384 The quadratic response model (2 variables): y = b0 + b1x1 + b2x2 (linear terms) + b11x1² + b22x2² + b12x1x2 (quadratic terms) + e. (Figures: contour map and surface graph.)

385 The quadratic response model (3 variables)
y = b0 + b1x1 + b2x2 + b3x3 (linear terms) + b11x1² + b22x2² + b33x3² + b12x1x2 + b13x1x3 + b23x2x3 (quadratic terms) + e. To fit this model we would be given data on y, x1, x2, x3. From those data we would compute u4 = x1², u5 = x2², u6 = x3², u7 = x1x2, u8 = x1x3, u9 = x2x3, and then regress y on x1, x2, x3, u4, u5, u6, u7, u8 and u9.
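A minimal Python sketch of this fitting step using ordinary least squares; the data are simulated stand-ins (made-up coefficients and noise), not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = rng.uniform(-1, 1, size=(n, 3))              # hypothetical settings of x1, x2, x3
x1, x2, x3 = X.T
y = 5 + x1 + 2*x2 - x3 - x1**2 + 0.5*x1*x2 + rng.normal(0, 0.1, n)   # made-up response

# Build the derived regressors: u4..u6 = squares, u7..u9 = cross-products
U = np.column_stack([x1, x2, x3,
                     x1**2, x2**2, x3**2,
                     x1*x2, x1*x3, x2*x3])
design = np.column_stack([np.ones(n), U])

# Ordinary least squares: regress y on x1, x2, x3, u4, ..., u9
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta, 2))
```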

386 Exploration of a response surface The method of steepest ascent

387 Situation: We have a dependent variable y and independent variables x1, x2, ... , xp. The general form of the model is y = f(x1, x2, ... , xp) + e. We want to find the values of x1, x2, ... , xp that maximize (or minimize) y. We will assume that the form of f(x1, x2, ... , xp) is unknown. If it were known (e.g. a quadratic response model), we could estimate the parameters and determine the optimum values of x1, x2, ... , xp using calculus.

388 The method of steepest ascent:
1. Choose a region in the domain of f(x1, x2, ... , xp).
2. Collect data in that region.
3. Fit a linear model (a plane) to those data.
4. Determine from that plane the direction of its steepest ascent (the direction (b1, b2, ... , bp)).
5. Move off in the direction of steepest ascent, collecting data on y. Continue moving in that direction as long as y is increasing and stop when y stops increasing.
6. Choose a region surrounding that point and return to step 2. Continue until the plane fitted to the data is horizontal.
7. Consider fitting a quadratic response model in this final region and determining where it is optimal.
A sketch of one iteration follows.
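A rough Python sketch of steps 3–5, under the assumption that the response is a black box we can only sample; the plane is fitted by least squares and the path simply steps along the fitted gradient (all numbers are made up):

```python
import numpy as np

def fit_plane(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp; returns (b0, b)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def steepest_ascent_path(center, b, step=0.25, n_steps=8):
    """Points along the direction of steepest ascent (b1, ..., bp) from the center."""
    direction = b / np.linalg.norm(b)
    return [center + k * step * direction for k in range(1, n_steps + 1)]

# Hypothetical use: sample y at design points in a starting region, fit the plane,
# then run up the fitted gradient collecting new observations and stop when y
# stops increasing.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(10, 2))
y = 20 + 3*X[:, 0] + 1.5*X[:, 1] + rng.normal(0, 0.2, 10)   # made-up responses
b0, b = fit_plane(X, y)
path = steepest_ascent_path(center=np.zeros(2), b=b)
print(np.round(b, 2))
print(np.round(path[0], 2), np.round(path[-1], 2))
```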

389 The method of steepest ascent (diagram): starting from an initial region in the domain of f(x1, x2, ... , xp), move along the direction of steepest ascent to a 2nd region, then to a final region containing the optimal (x1, x2).

390 Logistic regression

391 Recall the simple linear regression model:
y = b0 + b1x + e, where we are trying to predict a continuous dependent variable y from a continuous independent variable x. This model can be extended to the multiple linear regression model: y = b0 + b1x1 + b2x2 + … + bpxp + e. Here we are trying to predict a continuous dependent variable y from several continuous independent variables x1, x2, … , xp.

392 Now suppose the dependent variable y is binary.
It takes on two values, "Success" (1) or "Failure" (0). We are interested in predicting y from a continuous independent variable x. This is the situation in which Logistic Regression is used.

393 The Logistic Regression Model
Let p denote P[y = 1] = P[Success]; this quantity will increase with the value of x. The ratio p/(1 - p) is called the odds ratio; it will also increase with the value of x, ranging from zero to infinity. The quantity ln(p/(1 - p)) is called the log odds ratio.

394 Example: odds ratio, log odds ratio
Suppose a die is rolled: Success = "roll a six", p = 1/6. The odds ratio is (1/6)/(5/6) = 1/5 = 0.2. The log odds ratio is ln(0.2) = -1.61.

395 The Logistic Regression Model
assumes the log odds ratio is linearly related to x, i.e.: ln(p/(1 - p)) = b0 + b1x. In terms of the odds ratio: p/(1 - p) = e^(b0 + b1x).

396 The Logistic Regression Model
Solving for p in terms of x: p = e^(b0 + b1x) / (1 + e^(b0 + b1x)), or p = 1 / (1 + e^-(b0 + b1x)).
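A small Python sketch of this curve, with made-up values of b0 and b1; it also shows the two interpretations used on the following slides (p = 0.50 at x = -b0/b1, and slope b1/4 there):

```python
import numpy as np

b0, b1 = -4.0, 0.8          # illustrative parameter values

def p(x):
    """P[y = 1] under the logistic regression model."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

x_half = -b0 / b1            # value of x at which p = 0.50
slope_at_half = b1 / 4       # rate of increase of p with x when p = 0.50

print(p(np.array([0.0, x_half, 10.0])))
print(x_half, slope_at_half)
```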

397 Interpretation of the parameter b0 (determines the intercept). (Graph of p against x.)

398 Interpretation of the parameter b1 (together with b0, determines the value of x at which p = 0.50). (Graph of p against x.)

399 Also, b1/4 is the rate of increase in p with respect to x when p = 0.50.

400 Interpretation of the parameter b1 (determines the slope when p is 0.50). (Graph of p against x.)

401 The data
The data for each case will consist of a value for x, the continuous independent variable, and a value for y (1 or 0, Success or Failure). There are a total of n = 250 cases.

403 Estimation of the parameters
The parameters are estimated by Maximum Likelihood estimation, which requires a statistical package such as SPSS.
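Any package that performs maximum likelihood for binary responses can be used. As an alternative to the SPSS steps below, here is a hedged Python sketch using statsmodels on simulated stand-in data (the variable names and values are made up):

```python
import numpy as np
import statsmodels.api as sm

# Made-up data standing in for the n = 250 cases (x continuous, y binary)
rng = np.random.default_rng(2)
x = rng.normal(50, 10, size=250)
p_true = 1 / (1 + np.exp(-(-4 + 0.08 * x)))
y = rng.binomial(1, p_true)

# Maximum likelihood fit of the logistic regression model
X = sm.add_constant(x)                  # columns: intercept, x
fit = sm.Logit(y, X).fit(disp=False)
print(fit.params)                       # estimates of b0 and b1
print(fit.bse)                          # their standard errors
```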

404 Using SPSS to perform Logistic regression
Open the data file:

405 Choose from the menu: Analyze -> Regression -> Binary Logistic

406 The following dialogue box appears
Select the dependent variable (y) and the independent variable (x) (covariate). Press OK.

407 Here is the output The Estimates and their S.E.

408 The parameter Estimates

409 Interpretation of the parameter b0 (determines the intercept)
Interpretation of the parameter b1 (determines when p is 0.50 (along with b0))

410 Another interpretation of the parameter b1:
b1/4 is the rate of increase in p with respect to x when p = 0.50.

411 The Multiple Logistic Regression model

412 Here we attempt to predict the outcome of a binary response variable Y from several independent variables X1, X2 , … etc

413 Multiple Logistic Regression – an example
In this example we are interested in determining the risk of infants (who were born prematurely) of developing BPD (bronchopulmonary dysplasia). More specifically, we are interested in developing a predictive model that will determine the probability of developing BPD from X1 = gestational age and X2 = birth weight.

414 For n = 223 infants in a prenatal ward the following measurements were determined:
X1 = gestational age (weeks), X2 = birth weight (grams) and Y = presence of BPD.

415 The data

416 The results

417 Graph: Showing Risk of BPD vs GA and BrthWt

418 Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data

419 Multiway Frequency Tables: Two-Way table

420 Three-Way table (factors A, B, C)

421 Three-Way table (factors A, B, C)

422 Four-Way table (factors A, B, C, D)

423 Log Linear Model

424 Three-way Frequency Tables

425 Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i, j, k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k), where side conditions hold (the u-terms sum to zero over each of their subscripts).

426 Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i, j, k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k).

427 Hierarchical Log-linear models for categorical Data
For three way tables The hierarchical principle: If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction

428 1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables.

429 2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.

430 3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.

431 4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 1 with variables 2 and 3.

432 5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.

433 6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.

434 7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.

435 8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.

436 9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k). Notation: [123] Description: No simplifying dependence structure.

437 Hierarchical Log-linear models for 3 way table
Model         Description
[1][2][3]     Mutual independence between all three variables.
[1][23]       Independence of Variable 1 with variables 2 and 3.
[2][13]       Independence of Variable 2 with variables 1 and 3.
[3][12]       Independence of Variable 3 with variables 1 and 2.
[12][13]      Conditional independence between variables 2 and 3 given variable 1.
[12][23]      Conditional independence between variables 1 and 3 given variable 2.
[13][23]      Conditional independence between variables 1 and 2 given variable 3.
[12][13][23]  Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
[123]         The saturated model.

438 Comments: Maximum Likelihood estimates can be computed for any hierarchical log-linear model (i.e. more than 2 variables). In certain situations the equations need to be solved numerically. For the saturated model (all interactions and main effects), the estimate of mijk… is xijk… (the observed frequency).
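One common way to obtain these ML estimates numerically is to fit the log-linear model as a Poisson regression of the cell counts on the factor codes. A sketch in Python with statsmodels, using a made-up 2 × 2 × 2 table and the hierarchical model [12][3] as an example:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Made-up 2 x 2 x 2 table of observed cell counts
cells = [(i, j, k) for i in (0, 1) for j in (0, 1) for k in (0, 1)]
counts = [35, 20, 15, 30, 28, 22, 18, 25]
df = pd.DataFrame(cells, columns=["A", "B", "C"])
df["count"] = counts

# The hierarchical model [12][3] (main effects plus the A-B interaction),
# fitted as a Poisson regression of the counts on the factor codes.
fit = smf.glm("count ~ C(A) * C(B) + C(C)", data=df,
              family=sm.families.Poisson()).fit()
print(fit.fittedvalues)      # estimated expected frequencies m_ijk
print(fit.deviance)          # the likelihood-ratio statistic G^2 for this model
```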

443 Two-way table
ln mij = u + u1(i) + u2(j) + u12(i,j), where side conditions hold. The multiplicative form: mij = exp(u) × exp(u1(i)) × exp(u2(j)) × exp(u12(i,j)).
444 Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i, j, k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k), where side conditions hold.

445 Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i, j, k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k), or, in the multiplicative form, mijk = exp(u) × exp(u1(i)) × exp(u2(j)) × exp(u3(k)) × exp(u12(i,j)) × exp(u13(i,k)) × exp(u23(j,k)) × exp(u123(i,j,k)).

446 Comments
The log-linear model is similar to the ANOVA models for factorial experiments. The ANOVA models are used to understand the effects of categorical independent variables (factors) on a continuous dependent variable (Y). The log-linear model is used to understand dependence amongst categorical variables. The presence of interactions indicates dependence between the variables present in the interactions.
458 Goodness of Fit Statistics
These statistics can be used to check if a log-linear model will fit the observed frequency table

459 Goodness of Fit Statistics
The Chi-squared statistic: c2 = Σ (observed – fitted)² / fitted, summed over the cells. The Likelihood Ratio statistic: G2 = 2 Σ observed × ln(observed / fitted). d.f. = # cells - # parameters fitted. We reject the model if c2 or G2 is greater than the critical value of the chi-squared distribution, c2a, with these d.f.
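A short Python sketch computing both statistics from observed and fitted cell frequencies (the numbers are made up) and comparing them with the chi-squared critical value:

```python
import numpy as np
from scipy import stats

observed = np.array([35., 20., 15., 30., 28., 22., 18., 25.])
fitted   = np.array([32.1, 22.9, 17.9, 27.1, 30.9, 19.1, 15.1, 27.9])  # from some model

chi2 = ((observed - fitted)**2 / fitted).sum()          # Pearson chi-squared
G2   = 2 * (observed * np.log(observed / fitted)).sum() # likelihood ratio statistic

df = 8 - 6                       # cells minus parameters fitted (illustrative)
critical = stats.chi2.ppf(0.95, df)
print(chi2, G2, critical)        # reject the model if chi2 or G2 exceeds the critical value
```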

460 Conditional Test Statistics

461 Suppose that we are considering two Log-linear models and that Model 2 is a special case of Model 1.
That is the parameters of Model 2 are a subset of the parameters of Model 1. Also assume that Model 1 has been shown to adequately fit the data.

462 In this case one is interested in testing whether the differences in the expected frequencies between Model 1 and Model 2 are simply due to random variation. The likelihood ratio chi-square statistic that achieves this goal is: G2(2|1) = G2(2) – G2(1), with d.f. equal to the difference in the d.f. of the two models.
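A minimal Python sketch of this conditional test, with made-up G2 values and residual degrees of freedom for the two nested models:

```python
from scipy import stats

# G^2 and residual d.f. for two nested log-linear models (illustrative values):
G2_model1, df_model1 = 2.3, 1     # larger model, already fits adequately
G2_model2, df_model2 = 9.8, 3     # smaller (more restricted) model

G2_cond = G2_model2 - G2_model1   # G^2(2|1)
df_cond = df_model2 - df_model1
p_value = stats.chi2.sf(G2_cond, df_cond)
print(G2_cond, df_cond, p_value)  # a small p-value means the extra restrictions do not hold
```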

463 Stepwise selection procedures
Forward Selection Backward Elimination

464 Forward Selection: Starting with a model that underfits the data, log-linear parameters that are not in the model are added step by step until a model that does fit is achieved. At each step the log-linear parameter that is most significant is added to the model. To determine the significance of a parameter added we use the statistic G2(2|1) = G2(2) – G2(1), where Model 1 contains the parameter and Model 2 does not.

465 Backward Elimination:
Starting with a model that overfits the data, log-linear parameters that are in the model are deleted step by step until a model that continues to fit the data and has the smallest number of significant parameters is achieved. At each step the log-linear parameter that is least significant is deleted from the model. To determine the significance of a parameter deleted we use the statistic G2(2|1) = G2(2) – G2(1), where Model 1 contains the parameter and Model 2 does not.

466 Modelling of response variables
Independent → Dependent

467 Logit Models To date we have not worried about whether any of the variables were dependent or independent variables. The logit model is used when we have a single binary dependent variable.

