Lecture 2.2 2016 Michael Stuart
Design and Analysis of Experiments
Lecture 2.2
1. Review Lecture 2.1
–Minute test
–Why block?
–Deleted residuals
2. Interaction
3. Random Block Effects
4. Introduction to 2-level factorial designs
–a 2² experiment
–introducing the Design Matrix
Why block?
Blocking is useful when known external factors (covariates) affect variation between plots.
Blocking reduces the bias that arises when factor levels are allocated disproportionately to blocks, so that block effects distort the estimated factor effects.
Neighbouring plots are likely to be more homogeneous than separated plots, so that
–blocking reduces variation in results when treatments are compared within blocks
–(and increases precision when results are combined across blocks).
Deleted residuals
Minitab does this automatically for all cases!
They are used to allow each case to be assessed using a criterion not affected by the case.
The residuals are not deleted; it is the case that is deleted while the corresponding "deleted residual" is calculated.
Simple linear regression illustrates:
Deleted residual
Given an exceptional case,
–deleted residual > residual using all the data
–deleted s < s using all the data
–deleted standardised residual >> standardised residual using all the data
Using deleted residuals accentuates exceptional cases
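The first two comparisons can be checked numerically. The following is a minimal sketch for simple linear regression, using made-up data in which the last case is exceptional; the helper name `fit_line` and the data values are illustrative assumptions, and the standardisation step is left out to keep the sketch short.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares intercept a, slope b and residual SD s for y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sqrt(sse / (n - 2))

# Hypothetical data: a straight-line trend with one exceptional case (the last point).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 22.0]
i = 7  # index of the case being assessed

# Fit using all the data, then take the ordinary residual for case i.
a_all, b_all, s_all = fit_line(x, y)
residual = y[i] - (a_all + b_all * x[i])

# Delete case i, refit, and predict the deleted case from the remaining cases.
x_del = x[:i] + x[i + 1:]
y_del = y[:i] + y[i + 1:]
a_del, b_del, s_del = fit_line(x_del, y_del)
deleted_residual = y[i] - (a_del + b_del * x[i])

print(f"residual         = {residual:.2f}, s         = {s_all:.2f}")
print(f"deleted residual = {deleted_residual:.2f}, deleted s = {s_del:.2f}")
```

Because the exceptional case pulls the all-data line towards itself and inflates s, both comparisons go in the direction stated on the slide.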
Multi-factor designs reveal interaction
[Figure: results at each combination of Pressure (Low, High) and Temperature (Low, High)]
Interaction defined Factors interact when the effect of changing one factor depends on the level of the other. Interaction displayed
Iron-deficiency anemia Contributory factors: –cooking pot type Aluminium (A), Clay (C) and Iron (I) –food type Meat (M), Legumes (L) and Vegetables (V)
Interaction
Interaction
[Table: iron content by pot type (Aluminium, Clay, Iron) and food type (Legumes, Vegetable, Meat), with "Change effect" rows between successive pot types]
Two 2-level factors
Temperature    Low Pressure    High Pressure
Low            65              60
High           70              75
Pressure effect
–Low T: 60 – 65 = –5
–High T: 75 – 70 = +5
–Diff: 5 – (–5) = 10
Temperature effect
–Low P: 70 – 65 = 5
–High P: 75 – 60 = 15
–Diff: 15 – 5 = 10
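The same arithmetic can be written as a short sketch. The four results are the ones used in the calculations on this slide; the dictionary layout and variable names are illustrative choices.

```python
# Results from the Pressure x Temperature example, keyed by (temperature, pressure).
y = {("Low", "Low"): 65, ("Low", "High"): 60,
     ("High", "Low"): 70, ("High", "High"): 75}

# Effect of changing Pressure from Low to High, at each Temperature level.
pressure_effect = {t: y[(t, "High")] - y[(t, "Low")] for t in ("Low", "High")}
# Effect of changing Temperature from Low to High, at each Pressure level.
temperature_effect = {p: y[("High", p)] - y[("Low", p)] for p in ("Low", "High")}

# The interaction shows up as a difference between the two conditional effects.
diff_p = pressure_effect["High"] - pressure_effect["Low"]
diff_t = temperature_effect["High"] - temperature_effect["Low"]

print("Pressure effect by temperature:", pressure_effect)
print("Temperature effect by pressure:", temperature_effect)
print("Differences:", diff_p, diff_t)
```

Both differences come out equal, whichever factor is taken as the starting point.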
Model for analysis Iron content includes –a contribution for each food type plus –a contribution for each pot type plus –a contribution for each food type / pot type combination plus –a contribution due to chance variation
Model for analysis
Y = μ + α + β + (αβ) + ε
where
–μ is the overall mean,
–α is the food effect, above or below the mean, depending on which food type is used,
–β is the pot effect, above or below the mean, depending on which pot type is involved,
–(αβ) is the food/pot interaction effect, depending on which food type / pot type combination is used,
–ε represents chance variation
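Estimating the model
[Table: cell means by Pot Type (A, C, I) and Food Type (M, L, V), with Pot Means and Food Means]
Main effect = level mean – overall mean (2.5)
Pot Main Effects
–A: 1.9 – 2.5 = –0.6
–C: 2.0 – 2.5 = –0.5
–I: … – 2.5 = …
Food Main Effects
–M: 3.0 – 2.5 = 0.5
–L: … – 2.5 = …
–V: 1.8 – 2.5 = –0.7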
Interaction effects
[Table: interaction effects by pot type (A, C, I) and food type (M, L, V), bordered by the Pot Effects and Food Effects]
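The estimation steps (overall mean, main effects, then interaction effects as what is left over in each cell) can be sketched as follows. The cell means below are hypothetical values chosen for illustration, not the lecture's data, and the variable names are assumptions.

```python
pots = ["A", "C", "I"]
foods = ["M", "L", "V"]

# Hypothetical cell means (iron content), indexed as cell_mean[pot][food].
cell_mean = {
    "A": {"M": 2.0, "L": 2.3, "V": 1.4},
    "C": {"M": 2.2, "L": 2.4, "V": 1.4},
    "I": {"M": 4.8, "L": 3.7, "V": 2.6},
}

# Overall mean of the nine cell means.
grand = sum(cell_mean[p][f] for p in pots for f in foods) / (len(pots) * len(foods))

# Main effect = level mean minus overall mean.
pot_effect = {p: sum(cell_mean[p][f] for f in foods) / len(foods) - grand for p in pots}
food_effect = {f: sum(cell_mean[p][f] for p in pots) / len(pots) - grand for f in foods}

# Interaction effect = cell mean minus (overall mean + pot effect + food effect).
interaction = {(p, f): cell_mean[p][f] - (grand + pot_effect[p] + food_effect[f])
               for p in pots for f in foods}

print("Pot effects: ", {p: round(e, 2) for p, e in pot_effect.items()})
print("Food effects:", {f: round(e, 2) for f, e in food_effect.items()})
for p in pots:
    print(p, [round(interaction[(p, f)], 2) for f in foods])
```

With unrounded arithmetic, the main effects sum to zero, and the interaction effects sum to zero along every row and every column.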
Estimating σ
Calculate s² from each cell, based on 4 – 1 = 3 df.
Estimate of σ² is the average of s² across all 9 cells, with 9 × 3 = 27 df
Analysis of Variance
SS(Total) = SS(Pot effects) + SS(Food effects) + SS(Interaction effects) + SS(Error)
Source    DF    SS    MS    F-Value    P-Value
Pot
Food
Pot*Food
Error
Total
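The sum-of-squares decomposition can be verified on a small replicated two-way layout. This is a sketch with hypothetical data (2 pot types, 3 food types, 2 replicates per cell, smaller than the lecture's layout); the formulas assume equal replication in every cell.

```python
from statistics import mean, variance

# Hypothetical iron contents, keyed by (pot, food), two replicates per cell.
data = {
    ("A", "M"): [2.1, 1.9], ("A", "L"): [2.4, 2.2], ("A", "V"): [1.2, 1.4],
    ("I", "M"): [4.6, 4.8], ("I", "L"): [3.6, 3.8], ("I", "V"): [2.7, 2.9],
}
pots = sorted({p for p, _ in data})
foods = sorted({f for _, f in data})
r = 2  # replicates per cell (assumed equal for every cell)

grand = mean(y for ys in data.values() for y in ys)
cell = {k: mean(ys) for k, ys in data.items()}
pot_mean = {p: mean(cell[(p, f)] for f in foods) for p in pots}
food_mean = {f: mean(cell[(p, f)] for p in pots) for f in foods}

# Each sum of squares weights a squared effect by the number of runs it applies to.
ss_pot = r * len(foods) * sum((pot_mean[p] - grand) ** 2 for p in pots)
ss_food = r * len(pots) * sum((food_mean[f] - grand) ** 2 for f in foods)
ss_int = r * sum((cell[(p, f)] - pot_mean[p] - food_mean[f] + grand) ** 2
                 for p in pots for f in foods)
ss_error = sum((y - cell[k]) ** 2 for k, ys in data.items() for y in ys)
ss_total = sum((y - grand) ** 2 for ys in data.values() for y in ys)

# MS(Error) is the average of the within-cell variances.
df_error = len(pots) * len(foods) * (r - 1)
ms_error = ss_error / df_error
pooled_s2 = mean(variance(ys) for ys in data.values())

print(f"SS(Total) = {ss_total:.4f}")
print(f"Sum of parts = {ss_pot + ss_food + ss_int + ss_error:.4f}")
print(f"MS(Error) = {ms_error:.4f}, average cell variance = {pooled_s2:.4f}")
```

The last line ties the error mean square back to the cell-by-cell estimate of σ²: with equal replication the two are the same number.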
Recall: Case Study
Reducing yield loss in a chemical process
Process: chemicals blended, filtered and dried
Problem: yield loss at filtration stage
Proposal: adjust initial blend to reduce yield loss
Plan:
–prepare five different blends
–use each blend in successive process runs, in random order
–repeat at later times (blocks)
Results Ref: BlendLoss.xls
Initial data analysis Little variation between blocks More variation between blends Disturbing interaction pattern; see later
Analysis of Variance
Blend Loss analysis model included Blend effects + Block effects + Chance variation,
–NO INTERACTION EFFECTS
Analysis of Variance for Loss
Source    DF    Seq SS    Adj SS    Adj MS    F    P
Blend
Block
Error
Total
Include interaction in model?
Analysis of Variance for Loss
Source    DF    Adj SS    Adj MS    F    P
Blend **
Block **
Blend*Block **
Error 0 * * *
Total
ANOVA with no replication
Recall F-test logic:
–MS(Error) ≈ σ²
–MS(Effect) ≈ σ² + effect contribution
–F = MS(Effect) / MS(Error) ≈ 1 if effect absent, >>1 if effect present
No replication? Use MS(Interaction) as MS(Error)
If Block by Treatment interaction is absent,
–OK
If Block by Treatment interaction is present,
–conservative test
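A minimal sketch of this F-test for an unreplicated blocked layout follows. The losses are hypothetical values for 5 blends in 3 blocks, not the BlendLoss data, and the variable names are illustrative.

```python
from statistics import mean

# Hypothetical losses: one run per blend (key) in each of 3 blocks (list positions).
loss = {
    "A": [12.0, 13.5, 12.6],
    "B": [14.1, 15.0, 14.4],
    "C": [13.2, 14.1, 13.8],
    "D": [11.4, 12.9, 12.3],
    "E": [14.7, 15.9, 15.0],
}
blends = list(loss)
n_blocks = 3

grand = mean(y for ys in loss.values() for y in ys)
blend_mean = {b: mean(ys) for b, ys in loss.items()}
block_mean = [mean(loss[b][j] for b in blends) for j in range(n_blocks)]

ss_blend = n_blocks * sum((m - grand) ** 2 for m in blend_mean.values())
ss_block = len(blends) * sum((m - grand) ** 2 for m in block_mean)
ss_total = sum((y - grand) ** 2 for ys in loss.values() for y in ys)

# With one run per cell, what remains after Blend and Block is the
# Blend*Block interaction sum of squares; it serves as SS(Error).
ss_error = ss_total - ss_blend - ss_block

df_blend = len(blends) - 1
df_block = n_blocks - 1
df_error = df_blend * df_block

f_blend = (ss_blend / df_blend) / (ss_error / df_error)
f_block = (ss_block / df_block) / (ss_error / df_error)
print(f"F(Blend) = {f_blend:.2f} on ({df_blend}, {df_error}) df")
print(f"F(Block) = {f_block:.2f} on ({df_block}, {df_error}) df")
```

If a real interaction is present it inflates the denominator, which makes the F ratios smaller than they should be; that is the sense in which the test is conservative.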
Fitted values show no interaction.
Recall: Estimating the model
Classwork Calculate fitted values
Classwork (cont'd) Make a Block profile plot
Fitted values; NO INTERACTION
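Why the fitted values show no interaction can be seen from a short calculation. This sketch uses hypothetical losses for 5 blends in 3 blocks (not the BlendLoss data) and fits the model with Blend and Block effects only.

```python
from statistics import mean

# Hypothetical losses: one run per blend (key) in each of 3 blocks (list positions).
loss = {
    "A": [12.0, 13.5, 12.6],
    "B": [14.1, 15.0, 14.4],
    "C": [13.2, 14.1, 13.8],
    "D": [11.4, 12.9, 12.3],
    "E": [14.7, 15.9, 15.0],
}
blends = list(loss)
n_blocks = 3

grand = mean(y for ys in loss.values() for y in ys)
blend_effect = {b: mean(loss[b]) - grand for b in blends}
block_effect = [mean(loss[b][j] for b in blends) - grand for j in range(n_blocks)]

# Fitted value = overall mean + blend effect + block effect (no interaction term).
fitted = {b: [grand + blend_effect[b] + block_effect[j] for j in range(n_blocks)]
          for b in blends}
residual = {b: [loss[b][j] - fitted[b][j] for j in range(n_blocks)] for b in blends}

# The step from the first block to the second is the same for every blend,
# so the fitted block profiles are parallel.
steps = {b: fitted[b][1] - fitted[b][0] for b in blends}
for b in blends:
    print(b, [round(v, 2) for v in fitted[b]], "step I to II:", round(steps[b], 2))
```

Parallel profiles are built into the additive model, so any departure from parallelism in the observed data ends up in the residuals.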
Interaction? Blend x Block interaction? no general test without replication
Part 3 Random block effects
Contribution of blend effect is predictable, depends on the known makeup of each blend.
Contribution of block effect is not predictable, depends on current conditions at run time.
Convention:
–Blend effect is fixed,
–Block effect is random
Blend effects α_A, α_B, α_C, α_D, α_E are fixed but unknown; block effects β_I, β_II, β_III are random numbers
Assumption: β ~ N(0, σ_B)
Random block effects
Recall F-test logic:
–MS(Error) ≈ σ²
–MS(Effect) ≈ σ² + effect contribution
–F = MS(Effect) / MS(Error) ≈ 1 if effect absent, >>1 if effect present
For Blend Effect, effect contribution =
For Block Effect, effect contribution =
No effect on logic of F-test
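For reference, the standard expected mean squares for a randomised block design with fixed treatment effects and random block effects can be written out as below. The notation is an assumption: α_i for the fixed blend effects (summing to zero), σ_B for the block standard deviation, a = 5 blends and b = 3 blocks as in the case study.

```latex
\begin{align*}
E[\mathrm{MS(Blend)}] &= \sigma^2 + \frac{b \sum_{i=1}^{a} \alpha_i^2}{a - 1}
                       = \sigma^2 + \frac{3 \sum_{i} \alpha_i^2}{4}, \\
E[\mathrm{MS(Block)}] &= \sigma^2 + a\,\sigma_B^2
                       = \sigma^2 + 5\,\sigma_B^2, \\
E[\mathrm{MS(Error)}] &= \sigma^2 .
\end{align*}
```

In both cases the extra term is zero when the effect is absent and positive otherwise, which is why the F-test logic is unchanged.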
Minitab analysis
Analysis of Variance for Loss, using Adjusted SS for Tests
Source    DF    Adj SS    Adj MS    F-Value    P-Value
Block
Blend
Error
Total
Source    Expected Mean Square for Each Term
1 Block   (3) (1)
2 Blend   (3) + Q[2]
3 Error   (3)
Ref: DCM, p. 125, p. 133
Part 4 Introduction to 2-level factorial designs
A 2² experiment
Project: optimisation of a chemical process yield
Factors (with levels):
–operating temperature (Low, High)
–catalyst (C1, C2)
Design: Process run at all four possible combinations of factor levels, in duplicate, in random order.
Design set up
Go to Excel Randomisation
Design set up: Run order NB: Reset factor levels each time
Classwork
What were the
–experimental units
–factors
–factor levels
–treatments
–response
–blocks
–allocation procedure
Results (run order)
Results (standard order)
Analysis (Minitab) Main effects and Interaction plots ANOVA results –with diagnostics Calculation of t-statistics
Main Effects and Interactions
Minitab DOE command; Analyze Factorial Design subcommand
Estimated Effects and Coefficients for Yield
Term    Effect    Coef    SE Coef    T    P
Constant
Temperature
Catalyst
Temperature*Catalyst
S =
Effect = Coef × 2
SE(Effect) = SE(Coef) × 2
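The relation Effect = Coef × 2 comes from coding each factor as –1 (low) and +1 (high). The following is a minimal sketch with hypothetical duplicate yields, not the lecture's results; it relies on the fact that, for a balanced design with orthogonal ±1 columns, each least-squares coefficient is sum(x·y)/n.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical duplicate yields, keyed by coded (temperature, catalyst) levels:
# -1 = Low / C1, +1 = High / C2.
yields = {(-1, -1): [60, 64], (+1, -1): [70, 74],
          (-1, +1): [58, 62], (+1, +1): [80, 84]}
n = sum(len(ys) for ys in yields.values())  # total number of runs

def coefficient(column):
    """Least-squares coefficient for an orthogonal +/-1 column: sum(x * y) / n."""
    return sum(column(t, c) * y for (t, c), ys in yields.items() for y in ys) / n

columns = {
    "Constant": lambda t, c: 1,
    "Temperature": lambda t, c: t,
    "Catalyst": lambda t, c: c,
    "Temperature*Catalyst": lambda t, c: t * c,
}
coef = {name: coefficient(col) for name, col in columns.items()}
effect = {name: 2 * b for name, b in coef.items() if name != "Constant"}

# Pure-error estimate of sigma: average the within-cell variances (4 cells, 1 df each).
s = sqrt(mean(variance(ys) for ys in yields.values()))
se_coef = s / sqrt(n)
se_effect = 2 * se_coef

for name, b in coef.items():
    print(f"{name:22s} Coef = {b:6.2f}  SE Coef = {se_coef:.2f}  T = {b / se_coef:6.2f}")
print("Effects:", effect, " SE(Effect) =", round(se_effect, 2))
```

A coefficient measures the change in yield per unit of the coded variable; going from low to high is a change of two coded units, hence the factor of 2.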
Minitab DOE Analyze Factorial Design
Estimated Effects and Coefficients for Yield (coded units)
Term    Effect    Coef    SE Coef    T    P
Constant
Temperature
Catalyst
Temperature*Catalyst
S =    R-Sq = 95.83%    R-Sq(adj) = 92.69%
Analysis of Variance for Yield (coded units)
Source    DF    Seq SS    Adj SS    Adj MS    F    P
Main Effects
2-Way Interactions
Residual Error
Pure Error
Total
ANOVA results
ANOVA superfluous for 2ᵏ experiments
"There is nothing to justify this complexity other than a misplaced belief in the universal value of an ANOVA table." BHH, Section 5.10, p.188
"The standard form of the 'analysis of variance' does not seem to me to be useful for 2ⁿ data." Daniel (1976), Section 7.1, p.128
Diagnostic Plots
Direct Calculation
Classwork Calculate a confidence interval for the Temperature effect. All effects may be estimated and tested in this way. Homework Test the statistical significance of and calculate confidence intervals for the Catalyst effect and the Temperature by Catalyst interaction effect.
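The pattern of such a calculation can be sketched by treating an effect as a difference between two means of four runs each. The yields below are hypothetical, not the lecture's results, and the critical value 2.776 is the tabulated 97.5% point of t with 4 df, which is assumed to be the error df for a 2² design in duplicate.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical duplicate yields, keyed by (temperature, catalyst).
cells = {("Low", "C1"): [60, 64], ("High", "C1"): [70, 74],
         ("Low", "C2"): [58, 62], ("High", "C2"): [80, 84]}

high = [y for (temp, _), ys in cells.items() if temp == "High" for y in ys]
low = [y for (temp, _), ys in cells.items() if temp == "Low" for y in ys]

# Effect = mean at High temperature minus mean at Low temperature.
effect = mean(high) - mean(low)

# s^2 from pure error: average of the four within-cell variances (4 df in total).
s2 = mean(variance(ys) for ys in cells.values())

# Standard error of a difference between two means of 4 runs each.
se_effect = sqrt(s2 / len(high) + s2 / len(low))
t_stat = effect / se_effect

T_CRIT = 2.776  # t(0.975, 4 df), from tables
ci = (effect - T_CRIT * se_effect, effect + T_CRIT * se_effect)

print(f"Effect = {effect:.2f}, SE = {se_effect:.2f}, t = {t_stat:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The same steps apply to any other effect once the runs are split into its "+" and "–" groups.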
Application: Finding the optimum
More Minitab results
Least Squares Means for Yield
[Table: Mean and SE Mean for each Temperature level (Low, High), each Catalyst level, and each Temperature*Catalyst combination]
Optimum operating conditions Highest yield achieved –with Catalyst 2 –at High temperature. Estimated yield: 81.5% 95% confidence interval: 81.5 ± 2.78 × 2.622, i.e., 81.5 ± 7.3, i.e., ( 74.2, 88.8 )
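The interval arithmetic can be checked directly. The three inputs are the figures quoted on this slide; the variable names are illustrative.

```python
estimate = 81.5   # estimated yield (%) at High temperature with Catalyst 2
se_mean = 2.622   # standard error of that mean, as quoted above
t_crit = 2.78     # t critical value used above

half_width = t_crit * se_mean
lower, upper = estimate - half_width, estimate + half_width
print(f"{estimate} ± {half_width:.1f}, i.e., ({lower:.1f}, {upper:.1f})")
```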
Exercise As part of a project to develop a GC method for analysing trace compounds in wine without the need for prior extraction of the compounds, a synthetic mixture of aroma compounds in ethanol-water was prepared. The effects of two factors, Injection volume and Solvent flow rate, on GC measured peak areas given by the mixture were assessed using a 2² factorial design with 3 replicate measurements at each design point. The results are shown in the table that follows. What conclusions can be drawn from these data? Display results numerically and graphically. Check model assumptions by using appropriate residual plots.
Measurements for GC study (EM, Exercise 5.1, pp )
Introducing the Design Matrix
Design Matrix
Design Matrix with Y's
Design Matrix with Data
Augmented Design Matrix with Y's
Augmented Design Matrix with Data Calculate effects as Mean(+) – Mean(–)
Dual role of the design matrix Prior to the experiment, the rows designate the design points, the sets of conditions under which the process is to be run. After the experiment, the columns designate the contrasts, the combinations of design point means which measure the main effects of the factors. The extended design matrix facilitates the calculation of interaction effects
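Both roles can be seen in a short sketch. It reuses the four Pressure by Temperature results from earlier in the lecture as the responses, with Pressure as factor A and Temperature as factor B; the coding (–1 = Low, +1 = High) and the variable names are illustrative assumptions.

```python
from statistics import mean

# Rows of the design matrix: the four design points in standard order (A, B).
design = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]
# Responses at those design points (A = Pressure, B = Temperature).
y = [65, 60, 70, 75]

# Augment each row with the interaction column AB = A * B.
augmented = [(a, b, a * b) for a, b in design]

def effect(col):
    """Mean response where the column is +1 minus mean response where it is -1."""
    plus = [yi for row, yi in zip(augmented, y) if row[col] == +1]
    minus = [yi for row, yi in zip(augmented, y) if row[col] == -1]
    return mean(plus) - mean(minus)

effects = {"A": effect(0), "B": effect(1), "AB": effect(2)}
print(effects)
```

With this definition the AB effect is half of the "Diff" computed for the Pressure and Temperature example, because Mean(+) – Mean(–) averages over two runs on each side.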
Reading EM §5.3 DCM §6-2, §6-2