Experimental design and analysis Multiple linear regression Gerry Quinn & Mick Keough, 1998 Do not copy or distribute without permission of authors.
Multiple regression One response (dependent) variable: –Y More than one predictor (independent) variable: –$X_1$, $X_2$, $X_3$ etc. –number of predictors = p Number of observations = n
Example A sample of 51 mammal species (n = 51) Response variable: –total sleep time in hrs/day (y) Predictors: –body weight in kg ($x_1$) –brain weight in g ($x_2$) –maximum life span in years ($x_3$) –gestation time in days ($x_4$)
Regression models Population model (equation): $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \varepsilon_i$ Sample equation: $\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots$
Example Regression model: sleep = intercept + $\beta_1$*bodywt + $\beta_2$*brainwt + $\beta_3$*lifespan + $\beta_4$*gestime
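A minimal sketch of fitting a model of this form by ordinary least squares. The data are synthetic and generated only for illustration (they are not the mammal sleep data), and the names `X`, `true_beta` and `A` are hypothetical.

```python
import numpy as np

# Synthetic data for illustration only (NOT the mammal sleep data).
rng = np.random.default_rng(0)
n, p = 51, 4
X = rng.normal(size=(n, p))                           # columns stand in for x1..x4
true_beta = np.array([10.0, -1.0, 0.5, 0.0, -2.0])    # assumed intercept and slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(size=n)

# Design matrix: a leading column of ones estimates the intercept b0.
A = np.column_stack([np.ones(n), X])

# Least-squares estimates b0, b1, ..., bp of the sample equation.
b, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ b            # predicted y
residuals = y - y_hat    # observed y minus predicted y
print(np.round(b, 3))
```

Because the model includes an intercept, the residuals sum to zero (up to rounding) and are uncorrelated with every predictor column.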
Multiple regression equation [Figure: axes are total sleep, log lifespan and log body weight]
Partial regression coefficients $H_0$: $\beta_1 = 0$ Partial population regression coefficient (slope) for y on $x_1$, holding all other x’s constant, equals zero Example: –slope of regression of sleep against body weight, holding brain weight, max. life span and gestation time constant, is 0.
Partial regression coefficients $H_0$: $\beta_2 = 0$ Partial population regression coefficient (slope) for y on $x_2$, holding all other x’s constant, equals zero Example: –slope of regression of sleep against brain weight, holding body weight, max. life span and gestation time constant, is 0.
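One way to see what "holding all other x's constant" means numerically: the partial slope for $x_1$ equals the simple slope obtained after the linear effect of the other predictors has been removed from both y and $x_1$. The sketch below checks this on synthetic data; the helper `residualise` and all values are assumptions made for illustration.

```python
import numpy as np

def residualise(v, Z):
    """Residuals of v after least-squares regression on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

# Synthetic data for illustration only.
rng = np.random.default_rng(3)
n = 51
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)    # x2 deliberately correlated with x1
x3 = rng.normal(size=n)
y = 5.0 - 1.2 * x1 + 0.7 * x2 + rng.normal(size=n)

ones = np.ones(n)
A = np.column_stack([ones, x1, x2, x3])
b, *_ = np.linalg.lstsq(A, y, rcond=None)     # b[1] is the partial slope for x1

# Remove the linear effect of the other predictors from both y and x1 ...
others = np.column_stack([ones, x2, x3])
y_adj = residualise(y, others)
x1_adj = residualise(x1, others)

# ... then the simple slope of adjusted y on adjusted x1 is the partial slope.
partial_slope = (x1_adj @ y_adj) / (x1_adj @ x1_adj)
print(b[1], partial_slope)
```

The two printed values agree to rounding error, which is why a partial slope is read as the effect of one predictor adjusted for the others.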
Testing $H_0$: $\beta_i = 0$ Use partial t-tests: $t = b_i / SE(b_i)$ Compare with t-distribution with n-(p+1) df (the residual df of the fitted model) Separate t-test for each partial regression coefficient in model Usual logic of t-tests: –reject $H_0$ if P < 0.05
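A hedged sketch of how the partial t statistics can be computed by hand. It assumes the usual least-squares result that the standard errors are the square roots of the diagonal of MS Residual times the inverse of A'A, with residual df n - p - 1 when p slopes and an intercept are estimated. The data are synthetic, and the P-value lookup from the t-distribution is left out.

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
n, p = 51, 4
X = rng.normal(size=(n, p))
y = 10.0 - 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])          # intercept + p predictors
b, *_ = np.linalg.lstsq(A, y, rcond=None)

# Residual mean square on n - p - 1 degrees of freedom.
df_resid = n - p - 1
resid = y - A @ b
ms_resid = resid @ resid / df_resid

# Standard error of each coefficient, then t = b_i / SE(b_i).
se_b = np.sqrt(np.diag(ms_resid * np.linalg.inv(A.T @ A)))
t_stats = b / se_b

for name, est, se, t in zip(["b0", "b1", "b2", "b3", "b4"], b, se_b, t_stats):
    print(f"{name}: estimate = {est:6.3f}, SE = {se:.3f}, t = {t:6.2f}")
print("compare each t with a t-distribution on", df_resid, "df")
```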
Model comparison To test $H_0$: $\beta_1 = 0$ Fit full model: –$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$ Fit reduced model: –$y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \dots$ Calculate $SS_{\text{extra}}$: –$SS_{\text{Regression(full)}} - SS_{\text{Regression(reduced)}}$ $F = MS_{\text{extra}} / MS_{\text{Residual(full)}}$
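The full-versus-reduced comparison can be sketched as follows, on synthetic data with three hypothetical predictors. The helper `fit_ss` is an illustrative name, not part of any library.

```python
import numpy as np

def fit_ss(A, y):
    """Return (SS_regression, SS_residual) for a least-squares fit of y on A."""
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ b
    ss_reg = np.sum((fitted - y.mean()) ** 2)
    ss_res = np.sum((y - fitted) ** 2)
    return ss_reg, ss_res

# Synthetic data for illustration only.
rng = np.random.default_rng(2)
n = 40
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

ones = np.ones(n)
full = np.column_stack([ones, x1, x2, x3])
reduced = np.column_stack([ones, x2, x3])     # omits x1 to test H0: beta1 = 0

ss_reg_full, ss_res_full = fit_ss(full, y)
ss_reg_red, _ = fit_ss(reduced, y)

ss_extra = ss_reg_full - ss_reg_red           # variation explained by x1 alone
df_extra = 1                                  # one term was dropped
df_resid = n - full.shape[1]                  # residual df of the full model
F = (ss_extra / df_extra) / (ss_res_full / df_resid)
print(f"F = {F:.2f} on {df_extra} and {df_resid} df")
```

When a single term is dropped, this F equals the square of the partial t statistic for that term, so the two tests give the same P value.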
Overall regression model $H_0$: $\beta_1 = \beta_2 = \dots = 0$ (all population slopes equal zero). Test of whether overall regression equation is significant. Use ANOVA F-test: –Variation explained by regression –Unexplained (residual) variation
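A small sketch of the overall ANOVA F-test, assuming the usual partition with p df for the regression and n - p - 1 df for the residual. The data are synthetic and the variable names are illustrative.

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(4)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ b

# Partition the total variation in y.
ss_regression = np.sum((fitted - y.mean()) ** 2)   # explained by regression
ss_residual = np.sum((y - fitted) ** 2)            # unexplained (residual)

df_regression = p
df_residual = n - p - 1
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

F = ms_regression / ms_residual
print(f"F = {F:.2f} on {df_regression} and {df_residual} df")
```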
Regression diagnostics Residual is still observed y - predicted y –Studentised residuals still work Other diagnostics still apply: –residual plots –Cook’s D statistics
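As a rough illustration, the diagnostics listed above can be computed from the hat matrix. The sketch assumes internally studentised residuals and the standard Cook's D formula with p + 1 estimated parameters. The data are synthetic, with one unusual observation planted on purpose.

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(6)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([1.5, -0.5]) + rng.normal(size=n)
y[0] += 6.0                                   # plant one unusual observation

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ b                             # observed y - predicted y

# Leverage values are the diagonal of the hat matrix A (A'A)^-1 A'.
H = A @ np.linalg.inv(A.T @ A) @ A.T
leverage = np.diag(H)

ms_resid = resid @ resid / (n - p - 1)

# Internally studentised residuals: residual scaled by its estimated SE.
student = resid / np.sqrt(ms_resid * (1.0 - leverage))

# Cook's D combines residual size and leverage for each observation.
cooks_d = student**2 / (p + 1) * leverage / (1.0 - leverage)

print("largest |studentised residual| at index", int(np.argmax(np.abs(student))))
print("largest Cook's D at index", int(np.argmax(cooks_d)))
```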
Assumptions Normality and homogeneity of variance for response variable Independence of observations Linearity No collinearity
Collinearity Collinearity: –predictors correlated Assumption of no collinearity: –predictor variables are uncorrelated with (i.e. independent of) each other Collinearity makes estimates of $\beta_i$’s and their significance tests unreliable: –low power for individual tests on $\beta_i$’s
Collinearity Response (y) and 2 predictors ($x_1$ and $x_2$); n = 20 1. $x_1$ and $x_2$ uncorrelated (r = -0.24) Table (coeff, se, tol, t, P) for intercept, $x_1$ (P < 0.001) and $x_2$ $R^2$ = 0.787, F = 31.38, P <
2. Rearrange $x_2$ so $x_1$ and $x_2$ highly correlated (r = 0.99) Table (coeff, se, tol, t, P) for intercept, $x_1$ and $x_2$ $R^2$ = 0.780, F = 30.05, P < 0.001
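The effect of rearranging one predictor can be imitated with a small simulation. This is a sketch on synthetic data, not a reconstruction of the example above: the same $x_2$ values are reordered to follow the rank order of $x_1$, and the standard error of the slope for $x_1$ is compared between the two designs. The helper `slope_se` and the generating coefficients are assumptions.

```python
import numpy as np

def slope_se(x1, x2, y):
    """Fit y on x1 and x2 by least squares; return the SEs of (b0, b1, b2)."""
    n = len(y)
    A = np.column_stack([np.ones(n), x1, x2])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    ms_resid = resid @ resid / (n - 3)        # n - p - 1 with p = 2
    return np.sqrt(np.diag(ms_resid * np.linalg.inv(A.T @ A)))

# Synthetic data for illustration only.
rng = np.random.default_rng(7)
n = 20
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise = rng.normal(size=n)

# Rearrange the same x2 values into the rank order of x1.
ranks = np.argsort(np.argsort(x1))
x2_sorted = np.sort(x2)[ranks]

r_unc = np.corrcoef(x1, x2)[0, 1]
r_cor = np.corrcoef(x1, x2_sorted)[0, 1]

se_unc = slope_se(x1, x2, 1.0 + 2.0 * x1 + 1.0 * x2 + noise)
se_cor = slope_se(x1, x2_sorted, 1.0 + 2.0 * x1 + 1.0 * x2_sorted + noise)

print(f"original:   r = {r_unc:.2f}, SE(b1) = {se_unc[1]:.3f}")
print(f"rearranged: r = {r_cor:.2f}, SE(b1) = {se_cor[1]:.3f}")
```

The rearranged design has the same set of predictor values but a much larger standard error for the slope, which is the loss of power for individual tests that collinearity causes.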
Checks for collinearity Correlation matrix between predictors Tolerance for each predictor: –$1 - R^2$ for regression of that predictor on all others –if tolerance is low (<0.1) then collinearity is a problem Variance inflation factor (VIF) for each predictor: –1/tolerance –if VIF > 10 then collinearity is a problem
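Tolerance and VIF can be computed directly from their definitions. The sketch below uses synthetic predictors, two of which are built to be nearly identical. The helper `tolerances` is an illustrative name, and the 0.1 cut-off is the rule of thumb given above.

```python
import numpy as np

def tolerances(X):
    """Tolerance (1 - R^2) of each column of X regressed on all other columns."""
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        ss_res = np.sum((target - others @ coef) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        tol[j] = ss_res / ss_tot              # equals 1 - R^2
    return tol

# Synthetic predictors for illustration only.
rng = np.random.default_rng(8)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)            # nearly a copy of x1
x3 = rng.normal(size=n)                       # unrelated to the others
X = np.column_stack([x1, x2, x3])

tol = tolerances(X)
vif = 1.0 / tol
for name, t, v in zip(["x1", "x2", "x3"], tol, vif):
    flag = "collinearity problem" if t < 0.1 else "ok"
    print(f"{name}: tolerance = {t:.3f}, VIF = {v:.1f} ({flag})")
```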
Explained variance $R^2$: proportion of variation in y explained by linear relationship with $x_1$, $x_2$ etc. $R^2 = SS_{\text{Regression}} / SS_{\text{Total}}$
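A short numerical check of this ratio on synthetic data (all names and values are illustrative assumptions):

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(5)
n = 25
x1, x2 = rng.normal(size=(2, n))
y = 3.0 + 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)

A = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ b

ss_total = np.sum((y - y.mean()) ** 2)
ss_regression = np.sum((fitted - y.mean()) ** 2)

r_squared = ss_regression / ss_total
print(f"R^2 = {r_squared:.3f}")
```

For a least-squares fit with an intercept, this value also equals the squared correlation between observed and predicted y.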
Example Columns: Sleep, Bodywt, Brainwt, Lifespan, Gestime etc. Rows (species): African elephant, Arctic fox etc.
Boxplots of variables
Collinearity problem for body weight and brain weight: –low tolerance –highly correlated Table (Estimate, SE, Tol, t, P) for Intercept (P < 0.001), Bodywt, Brainwt, Lifespan, Gestime $R^2$ = Predictors log transformed
No collinearity between any predictors: –all tolerances OK –reduced SE and larger slope for body weight Table (Estimate, SE, Tol, t, P) for Intercept (P < 0.001), Bodywt, Lifespan, Gestime $R^2$ = Omit brain weight because body weight and brain weight are so highly correlated.
Examples from literature
Lampert (1993) Ecology 74: Response variable: –Daphnia (water flea) clutch size Predictors: –body size (mm) –particulate organic carbon (mg/L) –temperature (°C)
Lampert (1993) Table (Coeff., SE, t, P) for Intercept, Body size, POC, Temp ANOVA P = 0.052, $R^2$ = 0.684, n = 11
Williams et al. (1993) Ecology 74: Response variable: –Zostera (seagrass) growth Predictors: –epiphyte biomass –porewater ammonium
Williams et al. (1993)
Parameter            Coeff.   P
Epiphyte biomass     0.340    >0.05
Porewater ammonium   0.919    <0.05
$R^2$ = 0.71 Tolerance = (so no collinearity)