23-1 Analysis of Covariance (Chapter 16) A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable,


1 23-1 Analysis of Covariance (Chapter 16) A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable, X, sometimes called a covariate. The procedure, ANCOVA, is a combination of ANOVA with regression.

2 23-2 Example: Calf Weight Gain An animal scientist wishes to examine the impact of a pair of new dietary supplements on calf weight gain (response). Three treatments are defined: standard diet, standard diet + supplement Q, and standard diet + supplement R. All new calves from a large herd are available for use as study units. She selects 30 calves for study. Calves are assigned to the three diets at random (completely randomized design). Initial weights are recorded, then calves are placed on the diets. At the end of four weeks the final weight is taken and weight gain is computed. Simple analysis of variance and the associated multiple comparisons procedures indicate no significant difference in weight gain between the two supplemented diets, but large differences between the supplemented diets and the standard diet. Is this the end of the story? …

3 23-3 ANOVA Results [Figure: average weight gain (response, g/day) plotted by treatment group: Standard Diet, + Supplement Q, + Supplement R.] Simple ANOVA of a one-way classification would suggest no difference between Supplements Q and R, but both different from the Standard diet.

4 23-4 Initial Weights [Figure: initial weight plotted by treatment group: Standard Diet, + Supplement Q, + Supplement R.] Plotting the initial weights by group shows that the groups did not start with equal initial weights.

5 23-5 Weight Gain to Initial Weight [Figure: growth curve on the Standard Diet, weight (kg) against age.] If animals come into the study at different ages, they have different initial weights and are at different points on the growth curve. Expected weight gains will be different depending on age at entry into the study.

6 23-6 Regression of Initial Weight to Weight Gain [Figure: weight gain (g/day) (Y) against initial weight (x).] If we disregard the age of the animal and instead focus on the initial weight, we see that there is a linear relationship between initial weight and the expected weight gain.

7 23-7 Covariates Initial weight in the previous example is a covariable or covariate. A covariate is a disturbing variable (confounder), that is, it is known to have an effect on the response. Usually, the covariate can be measured but often we may not be able to control its effect through blocking. In the EXAMPLE, had the animal scientist known that the calves were very variable in initial weight (or age), she could have: Created blocks of 3 or 6 equal weight animals, and randomized treatments to calves within these blocks. This would have entailed some cost in terms of time spent sorting the calves and then keeping track of block membership over the life of the study. It was much easier to simply record the calf initial weight and then use analysis of covariance for the final analysis. In many cases, due to the continuous nature of the covariate, blocking is just not feasible.

8 23-8 Expectations under H0 [Figure: expected weight gain (g/day) (Y) against initial weight (x), with the average-weight animal marked.] Under H0: no treatment effects. If all animals had come in with the same initial weight, all three treatments would produce the same weight gain.

9 23-9 Expectations under HA [Figure: expected weight gain (g/day) (Y) against initial weight (x), one line each for Standard Diet (c), + Supplement Q (q) and + Supplement R (r), with the expected gains WG_S, WG_Q and WG_R marked at the average-weight animal.] Under HA: significant treatment effects. Different treatments produce different weight gains for animals of the same initial weight.

10 23-10 Different Initial Weights [Figure: expected weight gain (g/day) (Y) against initial weight (x), with observations labelled c, q and r and the group weight gains WG_S, WG_Q and WG_R marked.] Under H0: no treatment effects. If the average initial weights in the treatment groups differ, the observed weight gains will be different, even if treatments have no effect.

11 23-11 Observed Responses under HA [Figure: weight gain (g/day) (Y) against initial weight (x), with observations labelled c (Standard Diet), q (+ Supplement Q) and r (+ Supplement R) and the group weight gains WG_S, WG_Q and WG_R marked.] Under HA: significant treatment effects. Suppose now that different supplements actually do increase weight gain. This translates to animals in different treatment groups following different, but parallel, regression lines on initial weight. What difference in weight gain is due to initial weight and what is due to treatment?

12 23-12 Observed Group Means [Figure: weight gain (g/day) (Y) against initial weight (x), with observations labelled c (Standard Diet), q (+ Supplement Q) and r (+ Supplement R) and the unadjusted treatment means marked.] Simple one-way classification ANOVA (without accounting for initial weight) gives us the wrong answer!

13 23-13 Predicted Average Responses [Figure: weight gain (g/day) (Y) against initial weight (x), with observations labelled c (Standard Diet), q (+ Supplement Q) and r (+ Supplement R) and the adjusted treatment means marked.] The expected weight gain for each treatment is computed at the average initial weight, and comparisons are then made among these adjusted treatment means.
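The adjustment described on this slide can be written compactly. The notation below is an assumption (the slide gives no formula): \bar{y}_{i\cdot} and \bar{x}_{i\cdot} are the observed mean response and mean covariate for treatment i, \bar{x}_{\cdot\cdot} is the overall covariate mean, and \hat{\beta} is the estimated common slope.

```latex
\bar{y}_{i,\mathrm{adj}} \;=\; \bar{y}_{i\cdot} \;-\; \hat{\beta}\,\bigl(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\bigr),
\qquad i = 1, \dots, t
```

A group that started above the average covariate value has its mean moved back along the regression line to the common covariate value, so the adjusted means are compared on equal footing.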

14 23-14 ANCOVA: Objectives The objective of an analysis of covariance is to compare the treatment means after adjusting for differences among the treatments due to differences in the covariate levels for the treatment groups. The analysis proceeds by combining a regression model with an analysis of variance model.

15 23-15 Model The τ_i, i=1,…,t, measure how each of the t treatments modifies the overall mean response. (The index j=1,…,n runs over the n replicates for each treatment.) The slope coefficient, β, is a measure of how the average response changes as the value of the covariate changes. The analysis proceeds by fitting a linear regression model with dummy variables to code for the different treatment levels.
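The model equation itself did not survive in this transcript. A standard one-covariate ANCOVA model that matches the description on this slide is sketched below. The symbols \mu, \tau_i, \beta and \varepsilon_{ij} are assumed notation, not necessarily the slide's own.

```latex
y_{ij} \;=\; \mu + \tau_i + \beta\, x_{ij} + \varepsilon_{ij},
\qquad i = 1, \dots, t, \quad j = 1, \dots, n,
\qquad \varepsilon_{ij} \sim N(0, \sigma^2) \ \text{independently}
```

Here y_{ij} is the response and x_{ij} the covariate for replicate j of treatment i. The single slope \beta shared by all treatments is what makes the treatment regression lines parallel.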

16 23-16 A Priori Assumptions The covariate is related to the response, and can account for variation in the response. Check with a scatterplot of Y vs. X. The covariate is NOT related to the treatments. If X is related to the treatments, then the variance of the treatment differences is increased relative to that obtained from an ANOVA model without X, which results in a loss of precision. The treatments' regression equations are linear in the covariate. Check with a scatterplot of Y vs. X for each treatment. Non-linearity can be accommodated (e.g. polynomial terms, transforms), but the analysis may be more complex. The regression lines for the different treatments are parallel. This means there is only one slope in the Y vs. X plots. Non-parallel lines can be accommodated, but this complicates the analysis since differences between treatments will then depend on the value of X.

17 23-17 Example Four different formulations of an industrial glue are being tested. The tensile strength (response) of the glue is known to be related to the thickness as applied. Five observations on strength (Y) in pounds, and thickness (X) in 0.01 inches are made for each formulation. Here: There are t=4 treatments (formulations of glue). Covariate X is thickness of applied glue. Each treatment is replicated n=5 times at different values of X.

Formulation   Strength   Thickness
1             46.5       13
1             45.9       14
1             49.8       12
1             46.1       12
1             44.3       14
2             48.7       12
2             49.0       10
2             50.1       11
2             48.5       12
2             45.2       14
3             46.3       15
3             47.1       14
3             48.9       11
3             48.2       11
3             50.3       10
4             44.7       16
4             43.0       15
4             51.0       10
4             48.1       12
4             46.8       11
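For a balanced one-covariate layout like this one, the common-slope fit can be reproduced by hand: pool the within-formulation sums of squares and cross products to get the slope, then shift each formulation mean to the overall mean thickness. The following is a minimal standard-library sketch under that assumption. The variable names are illustrative, and it is not the procedure the statistical packages use internally.

```python
# Glue data from the slide: formulation -> list of (strength Y, thickness X).
glue = {
    1: [(46.5, 13), (45.9, 14), (49.8, 12), (46.1, 12), (44.3, 14)],
    2: [(48.7, 12), (49.0, 10), (50.1, 11), (48.5, 12), (45.2, 14)],
    3: [(46.3, 15), (47.1, 14), (48.9, 11), (48.2, 11), (50.3, 10)],
    4: [(44.7, 16), (43.0, 15), (51.0, 10), (48.1, 12), (46.8, 11)],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Pool the within-formulation sums of squares (X) and cross products (X, Y).
sxx = 0.0
sxy = 0.0
group_means = {}
for formulation, obs in glue.items():
    y_bar = mean([y for y, _ in obs])
    x_bar = mean([x for _, x in obs])
    group_means[formulation] = (y_bar, x_bar)
    sxx += sum((x - x_bar) ** 2 for _, x in obs)
    sxy += sum((x - x_bar) * (y - y_bar) for y, x in obs)

# Common (pooled within-group) slope of strength on thickness.
slope = sxy / sxx

# Overall mean thickness, the point at which adjusted means are evaluated.
grand_x = mean([x for obs in glue.values() for _, x in obs])

# Adjusted mean = observed mean moved along the common slope to grand_x.
adjusted = {
    f: y_bar - slope * (x_bar - grand_x)
    for f, (y_bar, x_bar) in group_means.items()
}

print(f"common slope = {slope:.6f}, mean thickness = {grand_x:.2f}")
for f in sorted(adjusted):
    print(f"formulation {f}: raw mean {group_means[f][0]:.2f}, adjusted {adjusted[f]:.4f}")
```

The slope and adjusted means printed here should agree with the Thickness estimate and the least squares means in the software output shown on the later slides.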

18 23-18 Formulation Profiles

19 23-19 SAS Program

data glue;
  input Formulation Strength Thickness;
  datalines;
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
;
run;

proc glm;
  class formulation;
  model strength = thickness formulation / solution;
  lsmeans formulation / stderr pdiff;
run;

The basic model is a combination of regression and one-way classification.

20 23-20 Output: Use Type III SS to test significance of each variable

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4   66.31065753      16.57766438   10.17     0.0003
Error             15   24.44684247       1.62978950
Corrected Total   19   90.75750000

R-Square   Coeff Var   Root MSE   Strength Mean
0.730636   2.691897    1.276632   47.42500

Source        DF   Type I SS     Mean Square   F Value   Pr > F
Thickness      1   63.50120135   63.50120135   38.96     <.0001
Formulation    3    2.80945618    0.93648539    0.57     0.6405

Source        DF   Type III SS   Mean Square   F Value   Pr > F
Thickness      1   53.20115753   53.20115753   32.64     <.0001
Formulation    3    2.80945618    0.93648539    0.57     0.6405

Parameter       Estimate        Standard Error   t Value   Pr > |t|
Intercept       58.93698630 B   2.21321008       26.63     <.0001
Thickness       -0.95445205     0.16705494       -5.71     <.0001
Formulation 1   -0.00910959 B   0.80810401       -0.01     0.9912
Formulation 2    0.62554795 B   0.82451389        0.76     0.4598
Formulation 3    0.86732877 B   0.81361075        1.07     0.3033
Formulation 4    0.00000000 B   .                 .        .

Regression on thickness is significant. No formulation differences. The MSE is the Error mean square (1.62978950); divide each mean square by the MSE to get its F value.
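The derived quantities in this output follow from the sums of squares by simple arithmetic. The sketch below recomputes them from the printed values as a check. The variable names are illustrative only.

```python
import math

# Values copied from the output above.
ss_model, df_model = 66.31065753, 4
ss_error, df_error = 24.44684247, 15
ss_total = 90.75750000
ss_thickness_adj, df_thickness = 53.20115753, 1      # Type III
ss_formulation_adj, df_formulation = 2.80945618, 3   # Type III
mean_strength = 47.425

# Error mean square: the denominator of every F ratio in the table.
mse = ss_error / df_error

# Each F value is a mean square divided by the MSE.
f_model = (ss_model / df_model) / mse
f_thickness = (ss_thickness_adj / df_thickness) / mse
f_formulation = (ss_formulation_adj / df_formulation) / mse

# Summary statistics printed beneath the ANOVA table.
r_square = ss_model / ss_total
root_mse = math.sqrt(mse)
coeff_var = 100 * root_mse / mean_strength

print(f"MSE = {mse:.8f}")
print(f"F: model {f_model:.2f}, thickness {f_thickness:.2f}, formulation {f_formulation:.2f}")
print(f"R-Square {r_square:.6f}, Root MSE {root_mse:.6f}, Coeff Var {coeff_var:.6f}")
```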

21 23-21 Least Squares Means (Adjusted Formulation means computed at the average value of Thickness [=12.45])

The GLM Procedure
Least Squares Means

Formulation   Strength LSMEAN   Standard Error   Pr > |t|   LSMEAN Number
1             47.0449486        0.5782732        <.0001     1
2             47.6796062        0.5811616        <.0001     2
3             47.9213870        0.5724527        <.0001     3
4             47.0540582        0.5739134        <.0001     4

Least Squares Means for effect Formulation
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Strength

i/j   1        2        3        4
1              0.4574   0.3011   0.9912
2     0.4574            0.7695   0.4598
3     0.3011   0.7695            0.3033
4     0.9912   0.4598   0.3033
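Each least squares mean is the fitted value for that formulation at the average thickness. The sketch below rebuilds them from the parameter estimates printed in the SAS solution (intercept, thickness slope, and formulation effects with formulation 4 set to zero). The names are illustrative.

```python
# Parameter estimates copied from the SAS "solution" output.
intercept = 58.93698630
thickness_slope = -0.95445205
formulation_effect = {1: -0.00910959, 2: 0.62554795, 3: 0.86732877, 4: 0.0}

# Average thickness over all 20 observations, as stated on the slide.
mean_thickness = 12.45

# LS mean = intercept + formulation effect + slope * average thickness.
lsmeans = {
    f: intercept + effect + thickness_slope * mean_thickness
    for f, effect in formulation_effect.items()
}

for f in sorted(lsmeans):
    print(f"formulation {f}: LS mean = {lsmeans[f]:.7f}")
```

Because every formulation is evaluated at the same thickness, differences between LS means equal differences between the formulation effects, which is why the pairwise p-values against formulation 4 match the t-test p-values in the parameter table.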

22 23-22 ANCOVA in Minitab

Worksheet columns: Formulation, Strength, Thickness (the 20 observations of the glue example).

Stat > ANOVA > General Linear Model … > Responses: Strength > Model: Formulation > Covariates: Thickness > Options: Adjusted (Type III) Sums of Squares

General Linear Model: Strength versus Formulation

Factor     Type    Levels   Values
Formulat   fixed   4        1 2 3 4

Source     DF   Seq SS   Adj SS   Adj MS   F       P
Thicknes    1   63.501   53.201   53.201   32.64   0.000
Formulat    3    2.809    2.809    0.936    0.57   0.640
Error      15   24.447   24.447    1.630
Total      19   90.758

Term         Coef      SE Coef   T       P
Constant     59.308    2.099     28.25   0.000
Thicknes     -0.9545   0.1671    -5.71   0.000
Formulat 1   -0.3801   0.5029    -0.76   0.462
Formulat 2    0.2546   0.5062     0.50   0.622
Formulat 3    0.4964   0.4962     1.00   0.333
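The Minitab coefficients differ from the SAS estimates even though the fitted model is the same. The difference is consistent with the two parameterizations: SAS sets the last formulation's effect to zero, while the Minitab table looks like effects constrained to sum to zero (an assumption inferred from the numbers, not stated on the slide). The sketch below converts one set to the other.

```python
# SAS estimates: reference coding with formulation 4 fixed at zero.
sas_intercept = 58.93698630
sas_effects = [-0.00910959, 0.62554795, 0.86732877, 0.0]

# Centering the effects moves their average into the constant term.
shift = sum(sas_effects) / len(sas_effects)
constant = sas_intercept + shift
centered_effects = [effect - shift for effect in sas_effects]

print(f"Constant = {constant:.3f}")
for level, effect in enumerate(centered_effects, start=1):
    print(f"Formulation {level}: {effect:.4f}")
```

Under this reading, the effect for formulation 4, which Minitab does not print, is the negative of the sum of the other three. The thickness slope is unaffected by the choice of coding.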

23 23-23 Factor Plots… > Main Effects Plot > Formulation

24 23-24 ANCOVA in R

> glue <- read.table("glue.txt",header=TRUE)
> glue$Formulation <- as.factor(glue$Formulation)
> # fit linear models: full, thickness only, formulation only
> full.lm <- lm(Strength ~ Formulation + Thickness, data=glue)
> thick.lm <- lm(Strength ~ Thickness, data=glue)
> formu.lm <- lm(Strength ~ Formulation, data=glue)

Test for Formulation differences:

> anova(thick.lm,full.lm)
Analysis of Variance Table
Model 1: Strength ~ Thickness
Model 2: Strength ~ Formulation + Thickness
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     18 27.2563
2     15 24.4468  3    2.8095 0.5746 0.6405

Test for significance of Thickness:

> anova(formu.lm,full.lm)
Analysis of Variance Table
Model 1: Strength ~ Formulation
Model 2: Strength ~ Formulation + Thickness
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     16 77.648
2     15 24.447  1    53.201 32.643 4.105e-05 ***
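Both comparisons are extra-sum-of-squares F tests: the drop in residual sum of squares from the reduced to the full model, per degree of freedom, divided by the full model's residual mean square. The sketch below recomputes the two F statistics from the printed RSS values. The helper name extra_ss_f is hypothetical, not an R function.

```python
def extra_ss_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for comparing a reduced model nested in a full model."""
    df_diff = df_reduced - df_full
    return ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)


# Thickness-only model versus full model: tests the formulation effect.
f_formulation = extra_ss_f(27.2563, 18, 24.4468, 15)

# Formulation-only model versus full model: tests the thickness effect.
f_thickness = extra_ss_f(77.648, 16, 24.447, 15)

print(f"F for Formulation = {f_formulation:.4f}")
print(f"F for Thickness   = {f_thickness:.3f}")
```

These are the same tests as the Type III (adjusted) lines in the SAS and Minitab tables, since each term is tested after the other has been fitted.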

25 23-25 R

Full model (can be refined by omitting formulation):

> summary(full.lm)
Call:
lm(formula = Strength ~ Formulation + Thickness, data = glue)
Residuals:
    Min      1Q  Median      3Q     Max
-1.6380 -1.0398  0.1873  0.6966  2.3255
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  58.92788    2.24551  26.243 5.97e-14 ***
Formulation2  0.63466    0.83193   0.763    0.457
Formulation3  0.87644    0.81840   1.071    0.301
Formulation4  0.00911    0.80810   0.011    0.991
Thickness    -0.95445    0.16706  -5.713 4.11e-05 ***

Reduced model (formulation omitted):

> summary(thick.lm)
Call:
lm(formula = Strength ~ Thickness, data = glue)
Residuals:
    Min      1Q  Median      3Q     Max
-2.0813 -0.7324  0.1274  0.9090  1.9230
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  59.9294     1.9504  30.726  < 2e-16 ***
Thickness    -1.0044     0.1551  -6.476 4.32e-06 ***
Residual standard error: 1.231 on 18 degrees of freedom
Multiple R-Squared: 0.6997, Adjusted R-squared: 0.683
F-statistic: 41.94 on 1 and 18 DF, p-value: 4.317e-06

26 23-26 R Plot lines for full model; but these can all be replaced by single line for reduced model (blue).

27 23-27 R Check fit of reduced model (with just thickness).

