
23-1 Analysis of Covariance (Chapter 16) A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable, X, sometimes called a covariate. The procedure, ANCOVA, is a combination of ANOVA with regression.

23-2 Example: Calf Weight Gain An animal scientist wishes to examine the impact of two new dietary supplements on calf weight gain (response). Three treatments are defined: standard diet, standard diet + supplement Q, and standard diet + supplement R. All new calves from a large herd are available for use as study units. She selects 30 calves for study and randomly assigns them to the three diets (completely randomized design). Initial weights are recorded, then calves are placed on the diets. At the end of four weeks the final weight is taken and weight gain is computed. A simple analysis of variance and the associated multiple comparison procedures indicate no significant difference in weight gain between the two supplemented diets, but large differences between the supplemented diets and the standard diet. Is this the end of the story? …

23-3 ANOVA Results [Dot plot of average weight gain (response, g/day) for Standard Diet, + Supplement Q, and + Supplement R.] A simple one-way classification ANOVA would suggest no difference between Supplements Q and R, but both differ from the Standard diet.

23-4 Initial Weights [Dot plot of initial weight for Standard Diet, + Supplement Q, and + Supplement R.] Plotting the initial weights by group shows that the groups were not equal with respect to initial weight.

23-5 Weight Gain and Initial Weight [Growth curve for the Standard Diet: weight (kg) versus age.] If animals come into the study at different ages, they have different initial weights and are at different points on the growth curve. Expected weight gains will differ depending on age at entry into the study.

23-6 Regression of Weight Gain on Initial Weight [Scatterplot of weight gain (g/day, Y) versus initial weight (X).] If we disregard the age of the animal and instead focus on the initial weight, we see that there is a linear relationship between initial weight and the expected weight gain.

23-7 Covariates Initial weight in the previous example is a covariable or covariate. A covariate is a disturbing variable (confounder); that is, it is known to have an effect on the response. Usually the covariate can be measured, but often we cannot control its effect through blocking. In the example, had the animal scientist known that the calves were highly variable in initial weight (or age), she could have created blocks of 3 or 6 equal-weight animals and randomized treatments to calves within these blocks. This would have entailed some cost in time spent sorting the calves and then keeping track of block membership over the life of the study. It was much easier to simply record each calf's initial weight and then use analysis of covariance for the final analysis. In many cases, due to the continuous nature of the covariate, blocking is simply not feasible.

23-8 Expectations under H0 [Plot of expected weight gain (g/day, Y) versus initial weight (X): a single regression line, with the average-weight animal marked.] Under H0 (no treatment effects): if all animals had come in with the same initial weight, all three treatments would produce the same weight gain.

23-9 Expectations under HA [Plot of expected weight gain (g/day, Y) versus initial weight (X): three parallel regression lines for the Standard Diet (c), + Supplement Q (q), and + Supplement R (r); at the average-weight animal they predict different gains WGS, WGQ, WGR.] Under HA (significant treatment effects): different treatments produce different weight gains for animals of the same initial weight.

23-10 Different Initial Weights [Plot: the three groups (c, q, r) cluster at different initial weights along a single regression line, so their mean gains WGS, WGQ, WGR differ.] Under H0 (no treatment effects): if the average initial weights in the treatment groups differ, the observed weight gains will differ even if the treatments have no effect.

23-11 Observed Responses under HA [Plot: the three treatment groups (c, q, r) follow different but parallel regression lines of weight gain on initial weight, with gains WGS, WGQ, WGR.] Under HA (significant treatment effects): suppose now that the supplements actually do increase weight gain. This translates to animals in different treatment groups following different, but parallel, regression lines with initial weight. How much of the difference in weight gain is due to initial weight, and how much is due to treatment?

23-12 Observed Group Means [Plot: unadjusted treatment means marked on the scatter of weight gain (g/day, Y) versus initial weight (X) for the three groups.] A simple one-way classification ANOVA (without accounting for initial weight) compares these unadjusted treatment means and gives us the wrong answer!
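The pitfall on this slide is easy to reproduce by simulation. The following Python sketch uses made-up numbers, not the calf data: three groups share one regression line with no treatment effect at all, but because their covariate (initial-weight) means differ, the unadjusted group means differ anyway.

```python
import numpy as np

rng = np.random.default_rng(42)
beta = 0.5                           # common slope: gain increases with initial weight
group_x_means = [30.0, 40.0, 50.0]   # the groups entered with different initial weights

gains = []
for mx in group_x_means:
    x = rng.normal(mx, 2.0, size=10)                    # initial weights for this group
    y = 10.0 + beta * x + rng.normal(0, 1.0, size=10)   # NO treatment effect
    gains.append(y.mean())

# The unadjusted means differ by roughly beta * (difference in covariate means),
# even though every group follows the same regression line.
print([round(g, 1) for g in gains])
```

Comparing these raw means with ANOVA would wrongly suggest a treatment effect; ANCOVA removes the part explained by the covariate.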

23-13 Predicted Average Responses [Plot: adjusted treatment means, i.e. the fitted parallel lines evaluated at the average initial weight.] Expected weight gain is computed for each treatment at the average initial weight, and comparisons are then made between these adjusted treatment means.

23-14 ANCOVA: Objectives The objective of an analysis of covariance is to compare the treatment means after adjusting for differences among the treatment groups in the levels of the covariate. The analysis proceeds by combining a regression model with an analysis of variance model.

23-15 Model Yij = μ + τi + β xij + εij, i = 1, …, t; j = 1, …, n. The τi, i = 1, …, t, are estimates of how each of the t treatments modifies the overall mean response. (The index j = 1, …, n runs over the n replicates for each treatment.) The slope coefficient, β, is a measure of how the average response changes as the value of the covariate changes. The analysis proceeds by fitting a linear regression model with dummy variables to code for the different treatment levels.
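As a concrete illustration of fitting this model with dummy variables, here is a Python sketch using plain NumPy least squares. The data are simulated for illustration (they are not the calf or glue data), and the true parameter values are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 3, 10                           # t = 3 treatments, n = 10 replicates each
tau_true = np.array([0.0, 2.0, 4.0])   # treatment effects (illustrative)
beta_true = 0.5                        # covariate slope (illustrative)

# Simulate y_ij = mu + tau_i + beta * x_ij + error
treat = np.repeat(np.arange(t), n)
x = rng.normal(40, 5, size=t * n)
y = 10.0 + tau_true[treat] + beta_true * x + rng.normal(0, 0.5, size=t * n)

# Design matrix: intercept, dummies for treatments 2..t (treatment 1 is baseline),
# and the covariate -- exactly the "regression with dummy variables" formulation.
X = np.column_stack([
    np.ones(t * n),
    (treat == 1).astype(float),
    (treat == 2).astype(float),
    x,
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # approximately [mu, tau_2 - tau_1, tau_3 - tau_1, beta]
```

With treatment 1 as the baseline, the dummy coefficients are estimated treatment differences, which is what PROC GLM's `solution` option and R's `lm()` report.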

23-16 A Priori Assumptions
1. The covariate is related to the response and can account for variation in the response. Check with a scatterplot of Y vs. X.
2. The covariate is NOT related to the treatments. If X is related to the treatments, then the variance of the adjusted treatment differences is increased relative to that obtained from an ANOVA model without X, which results in a loss of precision.
3. Each treatment's regression equation is linear in the covariate. Check with a scatterplot of Y vs. X for each treatment. Non-linearity can be accommodated (e.g. polynomial terms, transforms), but the analysis may be more complex.
4. The regression lines for the different treatments are parallel. This means there is only one slope in the Y vs. X plots. Non-parallel lines can be accommodated, but this complicates the analysis since differences between treatments will then depend on the value of X.

23-17 Example Four different formulations of an industrial glue are being tested. The tensile strength (response) of the glue is known to be related to the thickness as applied. Five observations on strength (Y) in pounds and thickness (X) in 0.01 inches are made for each formulation. Here: there are t = 4 treatments (formulations of glue); the covariate X is the thickness of the applied glue; each treatment is replicated n = 5 times at different values of X. [Data table: Formulation, Strength, Thickness; values not shown.]

23-18 Formulation Profiles

23-19 SAS Program
data glue;
  input Formulation Strength Thickness;
  datalines;
  ... (data lines omitted)
  ;
run;
proc glm;
  class Formulation;
  model Strength = Thickness Formulation / solution;
  lsmeans Formulation / stderr pdiff;
run;
The basic model is a combination of regression and a one-way classification.

23-20 Output: Use Type III SS to test the significance of each variable. [GLM output: overall ANOVA table (Model, Error, Corrected Total, with DF, Sum of Squares, Mean Square, F Value, Pr > F), followed by R-Square, Coeff Var, Root MSE, and the Strength mean; Type I and Type III SS tables show Thickness significant (Pr > F < .0001) and Formulation not significant; parameter estimates with standard errors and t tests are given for the Intercept, Thickness, and the Formulation dummy variables. Dividing a sum of squares by its DF gives the mean square; the Error mean square is the MSE.] Regression on Thickness is significant; there are no Formulation differences.

23-21 Least Squares Means (adjusted Formulation means, computed at the average value of Thickness [= 12.45]) [GLM output: for each Formulation, the Strength LSMEAN, its standard error, and Pr > |t|, followed by the matrix of pairwise p-values, Pr > |t| for H0: LSMean(i) = LSMean(j).]
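A least squares mean is simply the fitted value for each treatment evaluated at the overall average covariate value. The computation can be sketched in Python on simulated data (illustrative values loosely patterned on the glue setup, not the actual measurements):

```python
import numpy as np

rng = np.random.default_rng(7)
t, n = 4, 5                                   # 4 formulations, 5 replicates each
treat = np.repeat(np.arange(t), n)
x = rng.normal(12.45, 1.0, size=t * n)        # thickness-like covariate
y = 40 + 3 * treat - 2 * x + rng.normal(0, 0.5, size=t * n)

# Fit the common-slope ANCOVA model with treatment dummies (treatment 1 baseline).
dummies = (treat[:, None] == np.arange(1, t)).astype(float)
X = np.column_stack([np.ones(t * n), dummies, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adjusted (least squares) mean for each treatment: the prediction at x = mean(x).
x_bar = x.mean()
adj_means = [coef[0] + (coef[i] if i > 0 else 0.0) + coef[t] * x_bar
             for i in range(t)]
# Unadjusted means, for contrast:
raw_means = [y[treat == i].mean() for i in range(t)]
print([round(m, 2) for m in adj_means])
```

Because every adjusted mean is evaluated at the same covariate value, their differences reflect only the treatment effects, which is exactly what `lsmeans ... / pdiff` compares.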

23-22 ANCOVA in Minitab Stat > ANOVA > General Linear Model … > Responses: Strength > Model: Formulation > Covariates: Thickness > Options: Adjusted (Type III) Sums of Squares. [Output — General Linear Model: Strength versus Formulation: factor table (Formulation, fixed, 4 levels); ANOVA table with Seq SS, Adj SS, Adj MS, F, and P for Thickness and Formulation; coefficient table (Coef, SE Coef, T, P) for the Constant, Thickness, and Formulation terms.]

23-23 Factor Plots… > Main Effects Plot > Formulation

23-24 ANCOVA in R
> glue <- read.table("glue.txt", header=TRUE)
> glue$Formulation <- as.factor(glue$Formulation)
> # fit linear models: full, thickness only, formulation only
> full.lm <- lm(Strength ~ Formulation + Thickness, data=glue)
> thick.lm <- lm(Strength ~ Thickness, data=glue)
> formu.lm <- lm(Strength ~ Formulation, data=glue)
>
> anova(thick.lm, full.lm)   # test for Formulation differences
Analysis of Variance Table
Model 1: Strength ~ Thickness
Model 2: Strength ~ Formulation + Thickness
[Res.Df, RSS, Df, Sum of Sq, F, Pr(>F) values not shown; Formulation is not significant]
> anova(formu.lm, full.lm)   # test for significance of Thickness
Analysis of Variance Table
Model 1: Strength ~ Formulation
Model 2: Strength ~ Formulation + Thickness
[Res.Df, RSS, Df, Sum of Sq, F, Pr(>F) values not shown; Thickness is highly significant (Pr(>F) on the order of e-05 ***)]

23-25 R
> summary(full.lm)   # full model (can be refined by omitting Formulation)
Call: lm(formula = Strength ~ Formulation + Thickness, data = glue)
[Residuals and coefficient table not shown; the Intercept and Thickness are highly significant (***), the Formulation dummies are not]
> summary(thick.lm)  # reduced model (Formulation omitted)
Call: lm(formula = Strength ~ Thickness, data = glue)
[Residuals and coefficient table not shown; the Intercept (Pr < 2e-16) and Thickness (Pr on the order of e-06) are highly significant; residual standard error on 18 degrees of freedom; F statistic on 1 and 18 DF, p-value 4.317e-06]

23-26 R [Plot of the fitted lines for the full model; these can all be replaced by a single line for the reduced model (blue).]

23-27 R Check fit of reduced model (with just thickness).