
1 Multiple regression models Experimental design and data analysis for biologists (Quinn & Keough, 2002) Environmental sampling and analysis

2 Multiple regression models One response (dependent) variable: –Y More than one predictor (independent) variable: –X 1, X 2, X 3 …, X p –number of predictors = p (j = 1 to p) Number of observations = n (i = 1 to n)

3 Forest fragmentation

4 56 forest patches in SE Victoria (Loyn 1987) Response variable: –bird abundance Predictor variables: –patch area (ha) –years isolated (years) –distance to nearest patch (km) –distance to nearest larger patch (km) –stock grazing intensity (1 to 5 scale) –altitude (m)

5 Biomonitoring with Vallisneria Indicator of sublethal effects of organochlorine contamination: –leaf-to-shoot surface area ratio of Vallisneria americana (response variable) Predictors: –sediment contamination, plant density, PAR, rivermile, water depth 225 sites in the Great Lakes Potter & Lovett-Doust (2001)

6 Regression models Linear model: y i = β 0 + β 1 x i1 + β 2 x i2 + … + ε i Sample equation: ŷ i = b 0 + b 1 x i1 + b 2 x i2 + …

7 Example Regression model: (bird abundance) i = β 0 + β 1 (patch area) i + β 2 (years isolated) i + β 3 (nearest patch distance) i + β 4 (nearest large patch distance) i + β 5 (stock grazing) i + β 6 (altitude) i + ε i
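A minimal Python sketch of fitting this model with statsmodels; the file name loyn.csv and the column names (abundance, log_area, years_isolated, dist_nearest, dist_larger, grazing, altitude) are hypothetical stand-ins for the Loyn (1987) data:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names for the 56 forest patches
birds = pd.read_csv("loyn.csv")

fit = smf.ols(
    "abundance ~ log_area + years_isolated + dist_nearest"
    " + dist_larger + grazing + altitude",
    data=birds,
).fit()
print(fit.summary())  # coefficients, SEs, partial t-tests, r2, overall F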

8 Multiple regression plane [3-D plot: bird abundance against log 10 area and altitude]

9 Partial regression coefficients H 0 : β 1 = 0 Partial population regression coefficient (slope) for Y on X 1, holding all other Xs constant, equals zero Example: –slope of regression of bird abundance against patch area, holding years isolated, distance to nearest patch, distance to nearest larger patch, stock grazing intensity and altitude constant, equals 0.

10 Testing H 0 : β i = 0 Use partial t-tests: t = b i / SE(b i ) Compare with t-distribution with n - p - 1 df Separate t-test for each partial regression coefficient in model Usual logic of t-tests: –reject H 0 if P < 0.05
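A hand-rolled sketch of one partial t-test, using the log 10 area row from slide 14 as illustrative numbers:

from scipy import stats

n, p = 56, 6                 # observations and predictors (Loyn data)
b, se = 7.470, 1.465         # partial slope and SE for log10 area (slide 14)
t = b / se                   # t is about 5.10
df = n - p - 1               # residual df = 49
p_value = 2 * stats.t.sf(abs(t), df)  # two-tailed P, here << 0.001
print(t, p_value)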

11 Model comparison Test H 0 : β 1 = 0 Fit full model: –y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 3 +… Fit reduced model: –y = β 0 + β 2 x 2 + β 3 x 3 +… Calculate SS extra : –SS Regression(full) - SS Regression(reduced) F = MS extra / MS Residual(full)
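A sketch of this full-vs-reduced comparison with statsmodels; anova_lm carries out the SS extra arithmetic above (hypothetical file and column names as in the earlier sketch):

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

birds = pd.read_csv("loyn.csv")
full = smf.ols("abundance ~ log_area + grazing + altitude", data=birds).fit()
reduced = smf.ols("abundance ~ grazing + altitude", data=birds).fit()

# Reports SS_extra, F = MS_extra / MS_Residual(full), and its P-value
print(anova_lm(reduced, full))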

12 Overall regression model H 0 : β 1 = β 2 = ... = 0 (all population slopes equal zero) Test of whether overall regression equation is significant Use ANOVA F-test: –variation explained by regression –unexplained (residual) variation

13 Explained variance r 2 = SS Regression / SS Total : the proportion of variation in Y explained by the linear relationship with X 1, X 2 etc.

14 Forest fragmentation
Parameter         Coefficient   SE      Stand coeff   P
Intercept         20.789        8.285   0             0.015
Log10 area        7.470         1.465   0.565         <0.001
Log10 distance    -0.907        2.676   -0.035        0.736
Log10 ldistance   -0.648        2.123   -0.035        0.761
Grazing           -1.668        0.930   -0.229        0.079
Altitude          0.020         0.024   0.079         0.419
Years             -0.074        0.045   -0.176        0.109
r 2 = 0.685, F 6,49 = 17.754, P < 0.001

15 Biomonitoring with Vallisneria
Parameter                Coefficient    SE             P
Intercept                1.054          0.565          0.063
Sediment contamination   1.352          0.482          0.006
Plant density            0.028          0.007          <0.001
PAR                      -0.087         0.017          <0.001
Rivermile                1.00 x 10 -4   9.17 x 10 -5   0.277
Water depth              0.246          0.486          0.613

16 Assumptions Normality and homogeneity of variance for response variable Independence of observations Linearity No collinearity

17 Scatterplots Scatterplot matrix (SPLOM) –pairwise plots for all variables Partial regression (added variable) plots –relationship between Y and X j, holding other Xs constant –residuals from Y against all Xs except X j vs residuals from X j against all other Xs –graphs partial regression slope for X j
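A sketch of building one added-variable plot by hand, exactly as described: residuals against residuals (reduced predictor set and hypothetical column names for brevity):

import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

birds = pd.read_csv("loyn.csv")

# Residuals of Y on every predictor except log_area
e_y = smf.ols("abundance ~ grazing + altitude", data=birds).fit().resid
# Residuals of log_area on those same remaining predictors
e_x = smf.ols("log_area ~ grazing + altitude", data=birds).fit().resid

plt.scatter(e_x, e_y)   # slope of this cloud = partial slope for log_area
plt.xlabel("log10 area | other Xs")
plt.ylabel("bird abundance | other Xs")
plt.show()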

18 Partial regression plot (log 10 area) [scatterplot: bird abundance residuals against log 10 area residuals]

19 Regression diagnostics Residual: –observed y i - predicted y i Residual plots: –residual against predicted y i –residual against each X Influence: –Cook’s D statistics
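A sketch of these diagnostics in statsmodels (hypothetical file and column names as before):

import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

birds = pd.read_csv("loyn.csv")
fit = smf.ols("abundance ~ log_area + grazing + altitude", data=birds).fit()

# Residuals against predicted values: look for wedges or curvature
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0)
plt.xlabel("predicted"); plt.ylabel("residual")
plt.show()

# Cook's D, one value per observation: flag unusually influential points
cooks_d, _ = fit.get_influence().cooks_distance
print(cooks_d.argmax(), cooks_d.max())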

20 Collinearity Collinearity: –predictors correlated Assumption of no collinearity: –predictor variables uncorrelated with (i.e. independent of) each other Effect of collinearity: –estimates of β j s and significance tests unreliable

21 Collinearity Response (Y) and 2 predictors (X 1 and X 2 ) 1. X 1 and X 2 uncorrelated (r = -0.24)
            coeff   se     tol    t      P
intercept   -0.17   1.03          -0.16  0.873
X 1         1.13    0.14   0.95   7.86   <0.001
X 2         0.12    0.14   0.95   0.86   0.404
r 2 = 0.787, F = 31.38, P < 0.001

22 2. Rearrange X 2 so X 1 and X 2 highly correlated (r = 0.99)
            coeff   se     tol    t      P
intercept   0.49    0.72          0.69   0.503
X 1         1.55    1.21   0.01   1.28   0.219
X 2         -0.45   1.21   0.01   -0.37  0.714
r 2 = 0.780, F = 30.05, P < 0.001
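A simulation in the spirit of slides 21-22 (simulated numbers, not the slides' data): the same response fitted against an uncorrelated and then a nearly collinear second predictor, showing how SE(b 1 ) inflates:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20
x1 = rng.normal(size=n)
y = 1.0 * x1 + rng.normal(scale=0.5, size=n)

for label, x2 in [("uncorrelated", rng.normal(size=n)),
                  ("r near 0.99", x1 + rng.normal(scale=0.05, size=n))]:
    X = sm.add_constant(np.column_stack([x1, x2]))
    fit = sm.OLS(y, X).fit()
    print(label, "SE(b1) =", round(float(fit.bse[1]), 3))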

23 Checks for collinearity Correlation matrix and/or SPLOM between predictors Tolerance for each predictor: –1-r 2 for regression of that predictor on all others –if tolerance is low (near 0.1) then collinearity is a problem –VIF (variance inflation factor) = 1 / tolerance; large values (around 10 or more) signal collinearity
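A sketch of computing tolerance and VIF for each predictor (hypothetical file and column names as before):

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

birds = pd.read_csv("loyn.csv")
X = sm.add_constant(birds[["log_area", "grazing", "altitude", "years_isolated"]])

for i, name in enumerate(X.columns[1:], start=1):   # skip the constant
    vif = variance_inflation_factor(X.values, i)
    print(name, "VIF =", round(vif, 2), "tolerance =", round(1 / vif, 2))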

24 Forest fragmentation Tolerances: 0.396 – 0.681

25 Solutions to collinearity Drop redundant (correlated) predictors Principal components regression –potentially useful –replace predictors by independent components from PCA on predictor variables Ridge regression –controversial and complex

26 Predictor importance Tests on partial regression slopes Standardised partial regression slopes

27 Predictor importance Change in explained variation –compare fit of full model to reduced model omitting X j Hierarchical partitioning –splits the r 2 for each predictor into: its independent contribution, and its joint contribution with the other predictors
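Hierarchical partitioning is usually done with dedicated software (e.g. the hier.part package in R); the brute-force sketch below, assuming plain numpy arrays and a small number of predictors, shows the independent/joint split directly:

from itertools import combinations
import numpy as np
import statsmodels.api as sm

def r2(y, X):
    # r-squared of OLS of y on the columns of X (intercept added)
    if X.shape[1] == 0:
        return 0.0
    return sm.OLS(y, sm.add_constant(X)).fit().rsquared

def hier_part(y, X):
    n, p = X.shape
    indep = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        # average the r2 gain from adding predictor j within each
        # hierarchy level (subset size), then average across levels
        for size in range(p):
            gains = [r2(y, X[:, list(S) + [j]]) - r2(y, X[:, list(S)])
                     for S in combinations(others, size)]
            indep[j] += np.mean(gains) / p
    total = np.array([r2(y, X[:, [j]]) for j in range(p)])
    return indep, total - indep   # independent and joint contributions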

28 Forest fragmentation
Predictor         Independent r 2   Joint r 2   Total r 2   Stand coeff
Log10 area        0.315             0.232       0.548       0.565
Log10 distance    0.007             0.009       0.016       -0.035
Log10 ldistance   0.014             <0.001      0.014       -0.035
Altitude          0.057             0.092       0.149       0.079
Grazing           0.190             0.275       0.466       -0.229
Years             0.101             0.152       0.253       -0.176

29 Interactions Interactive effect of X 1 and X 2 on Y Dependence of partial regression slope of Y against X 1 on the value of X 2 Dependence of partial regression slope of Y against X 2 on the value of X 1 y i = β 0 + β 1 x i1 + β 2 x i2 + β 3 x i1 x i2 + ε i

30 Forest fragmentation Does effect of grazing on bird abundance depend on area? –log 10 area x grazing interaction Does effect of grazing depend on years since isolation? –grazing x years interaction Etc.

31 Interpreting interactions Interactions highly correlated with individual predictors: –collinearity problem –centring variables (subtracting mean) removes collinearity Simple regression slopes: –slope of Y on X 1 for different values of X 2 –slope of Y on X 2 for different values of X 1 –use if interaction is significant
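A sketch of centring and fitting an interaction, plus a simple slope, with hypothetical file and column names as before:

import pandas as pd
import statsmodels.formula.api as smf

birds = pd.read_csv("loyn.csv")
birds["area_c"] = birds["log_area"] - birds["log_area"].mean()
birds["graz_c"] = birds["grazing"] - birds["grazing"].mean()

# area_c:graz_c is now far less correlated with area_c and graz_c
fit = smf.ols("abundance ~ area_c * graz_c", data=birds).fit()
print(fit.summary())

# Simple slope of abundance on area at a grazing level g units above
# the mean: b_area + b_interaction * g
b = fit.params
print(b["area_c"] + b["area_c:graz_c"] * 2.0)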

32 Polynomial regression Modeling some curvilinear relationships Include quadratic (X 2 ) or cubic (X 3 ) terms etc. Quadratic model: y i = β 0 + β 1 x i1 + β 2 x i1 2 + ε i Compare fit with: y i = β 0 + β 1 x i1 + ε i Does quadratic fit better than linear?

33 Local and regional species richness Relationship between local and regional species richness in North America –Caley & Schluter (1997) Two models compared: local spp = β 0 + β 1 (regional spp) + β 2 (regional spp) 2 + ε local spp = β 0 + β 1 (regional spp) + ε
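A sketch of fitting both models, with the quadratic term supplied via I() in the formula (file and column names hypothetical):

import pandas as pd
import statsmodels.formula.api as smf

rich = pd.read_csv("richness.csv")   # hypothetical: local_spp, regional_spp

quad = smf.ols("local_spp ~ regional_spp + I(regional_spp**2)", data=rich).fit()
lin = smf.ols("local_spp ~ regional_spp", data=rich).fit()
print(quad.ssr, lin.ssr)  # residual SS used in the comparison on slide 35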

34 [Plot: local species richness (0–200) against regional species richness (0–250), with linear and quadratic fits]

35 Model comparison Full model: SS Residual = 376.620, df = 5 Reduced model: SS Residual = 1299.257, df = 6 Difference due to (regional spp) 2 : SS Extra = 922.637, df = 1, MS Extra = 922.637 F = 12.249, P = 0.018 See Quinn & Keough Box 6.6
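Checking that arithmetic directly from the quantities on this slide:

from scipy import stats

ss_full, df_full = 376.620, 5
ss_reduced, df_reduced = 1299.257, 6

ss_extra = ss_reduced - ss_full                                 # 922.637, 1 df
f = (ss_extra / (df_reduced - df_full)) / (ss_full / df_full)   # 12.249
p = stats.f.sf(f, df_reduced - df_full, df_full)                # about 0.018
print(round(f, 3), round(p, 3))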

36 Categorical predictors Convert categorical predictors into multiple continuous predictors –dummy (indicator) variables Each dummy variable coded as 0 or 1 Usually number of dummy variables = number of groups minus 1

37 Forest fragmentation
Grazing intensity   Grazing 1   Grazing 2   Grazing 3   Grazing 4
Zero (1)            0           0           0           0
Low (2)             1           0           0           0
Medium (3)          0           1           0           0
High (4)            0           0           1           0
Intense (5)         0           0           0           1
Each dummy variable measures the effect of the low – intense categories compared to the “reference” category – zero grazing
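A sketch of generating these dummy variables with pandas; drop_first=True keeps zero grazing (level 1) as the reference category:

import pandas as pd

grazing = pd.Series([1, 2, 3, 4, 5], name="grazing").astype("category")

# Four 0/1 columns (levels 2-5), matching the table above
dummies = pd.get_dummies(grazing, prefix="grazing", drop_first=True)
print(dummies)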

38 Forest fragmentation
Grazing as a single continuous predictor:
Coefficient   Est       SE      t       P
Intercept     21.603    3.092   6.987   <0.001
Grazing       -2.854    0.713   -4.005  <0.001
Log 10 area   6.890     1.290   5.341   <0.001
Grazing as four dummy variables:
Coefficient   Est       SE      t       P
Intercept     15.716    2.767   5.679   <0.001
Grazing 1     0.383     2.912   0.131   0.896
Grazing 2     -0.189    2.549   -0.074  0.941
Grazing 3     -1.592    2.976   -0.535  0.595
Grazing 4     -11.894   2.931   -4.058  <0.001
Log 10 area   7.247     1.255   5.774   <0.001

39 Categorical predictors All linear models fit categorical predictors using dummy variables ANOVA models combine dummy variables into single factor effect –partition SS into factor and residual –dummy variable effects often provided by software Models with both categorical (factor) and continuous (covariate) predictors –adjust factor effects based on covariate –reduce residual based on strength of relationship between Y and covariate – more powerful test of factor

