Experimental design and analysis: Multiple linear regression. © Gerry Quinn & Mick Keough, 1998. Do not copy or distribute without permission of authors.


1 Experimental design and analysis: Multiple linear regression. © Gerry Quinn & Mick Keough, 1998. Do not copy or distribute without permission of authors.

2 Multiple regression
One response (dependent) variable:
– $Y$
More than one predictor (independent) variable:
– $X_1$, $X_2$, $X_3$ etc.
– number of predictors = p
Number of observations = n

3 Example
A sample of 51 mammal species (n = 51)
Response variable:
– total sleep time in hrs/day ($y$)
Predictors:
– body weight in kg ($x_1$)
– brain weight in g ($x_2$)
– maximum life span in years ($x_3$)
– gestation time in days ($x_4$)

4 Regression models
Population model (equation):
$y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon_i$
Sample equation:
$y_i = b_0 + b_1 x_1 + b_2 x_2 + \dots$
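To make the step from the population model to the sample equation concrete, here is a minimal sketch that estimates $b_0, b_1, b_2$ by ordinary least squares using the normal equations. The helper names (`solve`, `fit_ols`) and the data are made up for illustration and are not from the slides; the response is constructed as exactly $1 + 2 x_1 - 3 x_2$ with no error term, so the fit should recover those values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(xs, y):
    """Return [b0, b1, ..., bp] for rows xs = [[x1, ..., xp], ...] and responses y."""
    design = [[1.0] + [float(v) for v in row] for row in xs]  # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: y = 1 + 2*x1 - 3*x2 exactly (no error term).
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in xs]
coefficients = fit_ols(xs, y)  # b0, b1, b2
```

With real data the error term is not zero, so the $b$'s are only estimates of the $\beta$'s; a statistics package would normally be used instead of hand-rolled normal equations.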

5 Example
Regression model:
$\text{sleep} = \text{intercept} + \beta_1 \cdot \text{bodywt} + \beta_2 \cdot \text{brainwt} + \beta_3 \cdot \text{lifespan} + \beta_4 \cdot \text{gestime}$

6 Multiple regression equation
[Figure: total sleep plotted against log lifespan and log body weight]

7 Partial regression coefficients
$H_0$: $\beta_1 = 0$
Partial population regression coefficient (slope) for y on $x_1$, holding all other x's constant, equals zero.
Example:
– slope of regression of sleep against body weight, holding brain weight, max. life span and gestation time constant, is 0.

8 Partial regression coefficients
$H_0$: $\beta_2 = 0$
Partial population regression coefficient (slope) for y on $x_2$, holding all other x's constant, equals zero.
Example:
– slope of regression of sleep against brain weight, holding body weight, max. life span and gestation time constant, is 0.

9 Testing $H_0$: $\beta_i = 0$
Use partial t-tests:
$t = b_i / SE_{b_i}$
Compare with t-distribution with n - (p + 1) df (the residual df for a model with p predictors)
Separate t-test for each partial regression coefficient in model
Usual logic of t-tests:
– reject $H_0$ if P < 0.05
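As a worked check of the partial t-test, the sketch below uses the gestation-time row from the four-predictor sleep example later in this transcript (estimate -5.11, SE 1.81, n = 51). It assumes residual df of n - p - 1 with p = 4 predictors. The two-sided P value is obtained by numerically integrating the t density with Simpson's rule, since the Python standard library has no t-distribution function; the helper names are illustrative only.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 - 2 * (area under the density between 0 and |t|)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


n, p = 51, 4            # observations and predictors in the sleep example
b, se = -5.11, 1.81     # gestation-time estimate and its standard error
df = n - p - 1          # residual degrees of freedom (assumed n - p - 1)
t_stat = b / se
p_value = two_sided_p(t_stat, df)
reject_h0 = p_value < 0.05
```

The computed t statistic agrees with the tabulated -2.82 to rounding, and the P value falls below 0.05, so the null hypothesis of a zero partial slope for gestation time would be rejected.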

10 Model comparison
To test $H_0$: $\beta_1 = 0$
Fit full model:
– $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$
Fit reduced model:
– $y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \dots$
Calculate $SS_{extra}$:
– $SS_{Regression(full)} - SS_{Regression(reduced)}$
$F = MS_{extra} / MS_{Residual(full)}$
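The model-comparison arithmetic can be sketched as a small function. The sums of squares below are hypothetical numbers chosen for easy arithmetic, not values from the slides, and the function name and parameters are illustrative. The residual df of the full model is assumed to be n - p - 1.

```python
def extra_ss_f(ss_reg_full, ss_reg_reduced, ss_resid_full, n, p_full, p_dropped=1):
    """F statistic for the predictors dropped from the full model.

    ss_reg_full / ss_reg_reduced: regression SS of the full and reduced models.
    ss_resid_full: residual SS of the full model.
    n: number of observations; p_full: predictors in the full model.
    p_dropped: number of predictors removed to form the reduced model.
    """
    ss_extra = ss_reg_full - ss_reg_reduced
    ms_extra = ss_extra / p_dropped
    ms_resid_full = ss_resid_full / (n - p_full - 1)
    return ms_extra / ms_resid_full


# Hypothetical example: 23 observations, full model with 2 predictors.
f_stat = extra_ss_f(ss_reg_full=80.0, ss_reg_reduced=50.0,
                    ss_resid_full=40.0, n=23, p_full=2)
```

Here $SS_{extra} = 30$ on 1 df and $MS_{Residual(full)} = 40 / 20 = 2$, giving F = 15 on 1 and 20 df. When a single predictor is dropped, this F equals the square of that predictor's partial t statistic, so the two tests agree.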

11 Overall regression model
$H_0$: $\beta_1 = \beta_2 = \dots = 0$ (all population slopes equal zero).
Test of whether overall regression equation is significant.
Use ANOVA F-test:
– variation explained by regression
– unexplained (residual) variation
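Because $R^2 = SS_{Regression} / SS_{Total}$, the overall F-ratio can be recovered from $R^2$, n and p alone. The sketch below applies this to the two-predictor example (n = 20) reported later in this transcript; the function name is illustrative, and small differences from the tabulated F values are expected because the reported $R^2$ values are rounded to three decimals.

```python
def overall_f(r2, n, p):
    """Overall regression F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    ms_regression_share = r2 / p                  # explained variation per regression df
    ms_residual_share = (1 - r2) / (n - p - 1)    # unexplained variation per residual df
    return ms_regression_share / ms_residual_share


# Two-predictor example, n = 20: R^2 = 0.787 (reported F = 31.38)
f_uncorrelated = overall_f(0.787, n=20, p=2)
# Same example with highly correlated predictors: R^2 = 0.780 (reported F = 30.05)
f_collinear = overall_f(0.780, n=20, p=2)
```

Both values land close to the reported F-ratios, which illustrates that collinearity in that example barely changes the overall test even though the individual coefficient tests change a great deal.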

12 Regression diagnostics
Residual is still observed y - predicted y
– Studentised residuals still work
Other diagnostics still apply:
– residual plots
– Cook's D statistics

13 Assumptions
Normality and homogeneity of variance for response variable
Independence of observations
Linearity
No collinearity

14 Collinearity
Collinearity:
– predictors correlated
Assumption of no collinearity:
– predictor variables are uncorrelated with (i.e. independent of) each other
Collinearity makes estimates of the $\beta_i$'s and their significance tests unreliable:
– low power for individual tests on the $\beta_i$'s

15 Collinearity
Response (y) and 2 predictors ($x_1$ and $x_2$); n = 20
1. $x_1$ and $x_2$ uncorrelated (r = -0.24)

| | coeff | se | tol | t | P |
|---|---|---|---|---|---|
| intercept | -0.17 | 1.03 | | -0.16 | 0.873 |
| $x_1$ | 1.13 | 0.14 | 0.95 | 7.86 | <0.001 |
| $x_2$ | 0.12 | 0.14 | 0.95 | 0.86 | 0.404 |

$R^2$ = 0.787, F = 31.38, P < 0.001

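16 2. Rearrange $x_2$ so that $x_1$ and $x_2$ are highly correlated (r = 0.99)

| | coeff | se | tol | t | P |
|---|---|---|---|---|---|
| intercept | 0.49 | 0.72 | | 0.69 | 0.503 |
| $x_1$ | 1.55 | 1.21 | 0.01 | 1.28 | 0.219 |
| $x_2$ | -0.45 | 1.21 | 0.01 | -0.37 | 0.714 |

$R^2$ = 0.780, F = 30.05, P < 0.001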
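The jump in standard error between the two fits (0.14 to 1.21) is roughly what the drop in tolerance predicts. In ordinary least squares the variance of a slope estimate is proportional to 1/tolerance, so its SE scales with $\sqrt{1/\text{tolerance}}$, provided the residual variance and the spread of the predictor stay the same. The sketch below assumes both hold approximately here (the two $R^2$ values are nearly equal and $x_2$ was only rearranged); it is an illustration of the scaling, not a calculation taken from the slides.

```python
import math

# Values read from the two coefficient tables (n = 20 in both fits).
se_uncorrelated = 0.14    # SE of the x1 slope when r = -0.24
tol_uncorrelated = 0.95   # tolerance in that fit
se_collinear = 1.21       # SE of the x1 slope when r = 0.99
tol_collinear = 0.01      # tolerance as reported (rounded to two decimals)

# Observed inflation of the standard error.
se_ratio = se_collinear / se_uncorrelated

# Inflation predicted from the reported (rounded) tolerances.
predicted_ratio = math.sqrt(tol_uncorrelated / tol_collinear)

# Tolerance implied by the observed SE inflation, under the stated assumptions.
implied_tolerance = tol_uncorrelated / se_ratio ** 2
```

The observed ratio is about 8.6 and the predicted ratio about 9.7; the implied tolerance of roughly 0.013 rounds to the reported 0.01, so the two tables are consistent with each other under these assumptions.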

17 Checks for collinearity
Correlation matrix between predictors
Tolerance for each predictor:
– $1 - R^2$ for regression of that predictor on all others
– if tolerance is low (< 0.1) then collinearity is a problem
Variance inflation factor (VIF) for each predictor:
– 1/tolerance
– if VIF > 10 then collinearity is a problem
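The tolerance and VIF rules of thumb can be sketched in a few lines. The tolerances below are the ones reported for the four-predictor sleep model in this transcript; the function names are illustrative. With only two predictors, the $R^2$ from regressing one on the other is simply the squared correlation, so tolerance reduces to $1 - r^2$.

```python
def tolerance_two_predictors(r):
    """Tolerance when there are exactly two predictors with correlation r."""
    return 1.0 - r ** 2


def vif(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance


def is_collinear(tolerance, tol_cutoff=0.1, vif_cutoff=10.0):
    """Apply the rules of thumb: tolerance < 0.1 or VIF > 10."""
    return tolerance < tol_cutoff or vif(tolerance) > vif_cutoff


# Tolerances reported for the four-predictor sleep model.
tolerances = {"Bodywt": 0.08, "Brainwt": 0.05, "Lifespan": 0.33, "Gestime": 0.36}
vifs = {name: vif(tol) for name, tol in tolerances.items()}
flagged = [name for name, tol in tolerances.items() if is_collinear(tol)]

# Two-predictor example with r = -0.24 between x1 and x2.
tol_example = tolerance_two_predictors(-0.24)
```

Body weight (VIF 12.5) and brain weight (VIF 20) are flagged, while life span and gestation time are not. The two cut-offs are the same rule stated two ways, since VIF > 10 is equivalent to tolerance < 0.1.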

18 Explained variance
$R^2$: proportion of variation in y explained by linear relationship with $x_1$, $x_2$ etc.
$R^2 = SS_{Regression} / SS_{Total}$
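A minimal sketch of the $R^2$ calculation from observed and fitted values follows. The numbers are made up for illustration: the fitted values are the least-squares predictions from regressing these y values on x = 1, 2, 3, 4 (slope 1.6, intercept 0.5), and the function name is hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = SS_Regression / SS_Total for least-squares fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_regression = sum((fitted - mean_y) ** 2 for fitted in y_hat)
    ss_total = sum((observed - mean_y) ** 2 for observed in y)
    return ss_regression / ss_total


# Hypothetical data: y regressed on x = 1, 2, 3, 4 gives y_hat = 0.5 + 1.6 * x.
y = [2.0, 4.0, 5.0, 7.0]
y_hat = [2.1, 3.7, 5.3, 6.9]
r2 = r_squared(y, y_hat)
```

Here $SS_{Regression} = 12.8$ and $SS_{Total} = 13$, so about 98% of the variation in y is explained. The ratio is only a valid $R^2$ when the fitted values come from a least-squares fit that includes an intercept.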

19 Example

| Species | Sleep | Bodywt | Brainwt | Lifespan | Gestime |
|---|---|---|---|---|---|
| African elephant | 3.3 | 6654.000 | 5712.0 | 38.6 | 645 |
| Arctic fox | 12.5 | 3.385 | 44.5 | 14.0 | 60 |
| etc. | | | | | |

20 Boxplots of variables

21 Collinearity problem for body weight and brain weight: low tolerance, highly correlated

| Parameter | Estimate | SE | Tol | t | P |
|---|---|---|---|---|---|
| Intercept | 18.94 | 3.11 | | 6.09 | <0.001 |
| Bodywt | -0.76 | 1.31 | 0.08 | -0.58 | 0.565 |
| Brainwt | -0.84 | 2.03 | 0.05 | -0.42 | 0.680 |
| Lifespan | 2.60 | 2.05 | 0.33 | 1.27 | 0.211 |
| Gestime | -5.11 | 1.81 | 0.36 | -2.82 | 0.007 |

$R^2$ = 0.486
Predictors log transformed

22 No collinearity between any predictors: all tolerances OK
Reduced SE and larger slope (in magnitude) for body weight

| Parameter | Estimate | SE | Tol | t | P |
|---|---|---|---|---|---|
| Intercept | 19.06 | 3.07 | | 6.21 | <0.001 |
| Bodywt | -1.25 | 0.59 | 0.36 | -2.09 | 0.042 |
| Lifespan | 2.19 | 1.78 | 0.43 | 1.23 | 0.225 |
| Gestime | -5.39 | 1.67 | 0.42 | -3.23 | 0.002 |

$R^2$ = 0.484
Omit brain weight because body weight and brain weight are so highly correlated.

23 Examples from literature

24 Lampert (1993) Ecology 74:1455-1466
Response variable:
– Daphnia (water flea) clutch size
Predictors:
– body size (mm)
– particulate organic carbon (mg/L)
– temperature (°C)

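25 Lampert (1993)

| Parameter | Coeff. | SE | t | P |
|---|---|---|---|---|
| Intercept | -42.34 | 27.52 | -1.54 | 0.168 |
| Body size | 14.76 | 7.10 | 2.08 | 0.076 |
| POC | 0.27 | 0.43 | 0.61 | 0.559 |
| Temp | 0.73 | 0.68 | 1.07 | 0.321 |

ANOVA P = 0.052, $R^2$ = 0.684, n = 11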

26 Williams et al. (1993) Ecology 74:904-918
Response variable:
– Zostera (seagrass) growth
Predictors:
– epiphyte biomass
– porewater ammonium

27 Williams et al. (1993)

| Parameter | Coeff. | P |
|---|---|---|
| Epiphyte biomass | 0.340 | >0.05 |
| Porewater ammonium | 0.919 | <0.05 |

$R^2$ = 0.71
Tolerance = 0.839 (so no collinearity)

