University of Ottawa - Bio 4158 - Applied Biostatistics © Antoine Morin and Scott Findlay

Slide 1: Multiple linear regression
- When and why we use it
- The general multiple regression model
- Hypothesis testing in multiple regression
- The problem of multicollinearity
- Multiple regression procedures
- Polynomial regression
- Power analysis in multiple regression
Slide 2: Some GLM procedures
[Table comparing GLM procedures by the type of dependent and independent variables; footnote: *either categorical or treated as a categorical variable]
Slide 3: When do we use multiple regression?
To model the relationship between a continuous dependent variable (Y) and several continuous independent variables (X1, X2, ...), e.g. the relationship between lake primary production, phosphorus concentration, and zooplankton abundance.
[Figure: log production vs. log [P] in two dimensions, and vs. log [P] and log [Zoo] in three]
Slide 4: The multiple regression model: general form
The general model is
  Yi = β0 + β1 Xi1 + β2 Xi2 + ... + βk Xik + εi
which defines a k-dimensional plane, where β0 is the intercept, βj is the partial regression coefficient of Y on Xj, Xij is the value of the ith observation of the independent variable Xj, and εi is the residual of the ith observation.
[Figure: the fitted plane Ŷ as a function of X1 and X2]
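To make this concrete, here is a minimal R sketch (simulated data with hypothetical names, not the course dataset) that fits a two-predictor multiple regression with lm():

# Simulated example: Y depends linearly on two predictors plus noise
set.seed(42)
x1 <- rnorm(50)                                     # first independent variable
x2 <- rnorm(50)                                     # second independent variable
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(50, sd = 0.5)   # true beta0 = 1, beta1 = 2, beta2 = -0.5

fit <- lm(y ~ x1 + x2)    # estimates Y = b0 + b1*X1 + b2*X2 + e
summary(fit)              # partial regression coefficients, t-tests, R-squared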
Slide 5: What is the partial regression coefficient anyway?
βj is the rate of change in Y per unit change in Xj with all other variables held constant; this is not the slope of the regression of Y on Xj pooled over all other variables!
[Figure: partial regressions of Y on X1 at fixed values of X2 (X2 = -3, -1, 1, 3), contrasted with the single pooled (simple) regression]
Slide 6: The effect of scale
Two independent variables on different scales will have different slopes, even if the proportional change in Y is the same. So, if we want to measure the relative strength of the influence of each variable on Y, we must eliminate the effect of differing scales.
[Figure: the same proportional relationship plotted with slope βj = 2 (Xj ranging 0-2) and βj = 0.02 (Xj ranging 0-200)]
Slide 7: The multiple regression model: standardized form
Since βj depends on the scale of Xj, to examine the relative effect of each independent variable we must standardize the regression coefficients: transform all variables to z-scores and fit the regression model to the transformed variables. The standardized coefficients βj* estimate the relative strength of the influence of variable Xj on Y.
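A sketch of the standardization, continuing the simulated example above: z-score every variable with scale() and refit, or rescale the raw coefficients directly.

# Standardized partial regression coefficients via z-scored variables
fit_std <- lm(scale(y) ~ scale(x1) + scale(x2))
coef(fit_std)    # intercept ~ 0; the slopes are the b_j* values

# Equivalent shortcut: b_j* = b_j * sd(X_j) / sd(Y)
coef(fit)[-1] * c(sd(x1), sd(x2)) / sd(y)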
Slide 8: Regression coefficients: summary
- Partial regression coefficient (βj): the slope of the regression of Y on Xj when all other independent variables are held constant.
- Standardized partial regression coefficient (βj*): the rate of change of Y in standard deviation units per one standard deviation change in Xj, with all other independent variables held constant.
Slide 9: Assumptions
- independence of residuals
- homoscedasticity of residuals
- linearity (of Y on each X)
- no measurement error on the independent variables
- normality of residuals
Slide 10: Hypothesis testing in simple linear regression: partitioning the total sums of squares
Total SS = Model (explained) SS + Unexplained (error) SS
Slide 11: Hypothesis testing in multiple regression I: partitioning the total sums of squares
Partition the total sums of squares into model and residual SS: SS(total) = SS(model) + SS(residual).
[Figure: the regression plane of Y on X1 and X2, with the total, model, and residual SS indicated]
Slide 12: Hypothesis testing I: partitioning the total sums of squares
If observed = expected for all i (a perfect fit), MS(model) = s²Y and MS(error) = 0. Calculate F = MS(model)/MS(error) and compare it with the F distribution with k and N - (k + 1) df, where k is the number of independent variables. H0: F = 1.
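The partition and the overall F-test can be reproduced by hand from the fit; a sketch, again on the simulated example above:

# Partition the total SS and form the overall F statistic by hand
ss_total <- sum((y - mean(y))^2)
ss_resid <- sum(residuals(fit)^2)
ss_model <- ss_total - ss_resid

k <- 2                 # number of independent variables
n <- length(y)
F_obs <- (ss_model / k) / (ss_resid / (n - k - 1))
pf(F_obs, k, n - k - 1, lower.tail = FALSE)   # same as summary(fit)$fstatistic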
Slide 13: Hypothesis testing II: testing individual partial regression coefficients
Test each hypothesis H0j: βj = 0 by a t-test, t = bj / SE(bj), with N - (k + 1) df. Note: these are 2-tailed hypotheses!
[Figure: Ŷ vs. X2 with X1 fixed, where H02: β2 = 0 is accepted, and Ŷ vs. X1 with X2 fixed, where H01: β1 = 0 is rejected]
Slide 14: Multicollinearity
The independent variables are correlated, and therefore not independent of one another; evaluate by inspecting the covariance or correlation matrix.
[Figure: scatterplots of a collinear pair (X1 vs. X2) and an independent pair (X3 vs. X2)]
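A quick first check in R, using the simulated predictors from above:

# Inspect pairwise associations among the predictors
X <- cbind(x1, x2)
cor(X)      # off-diagonal values near +/-1 signal collinearity
cov(X)      # the covariance matrix, if the raw scales matter
pairs(X)    # scatterplot matrix for a visual check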
Slide 15: Multicollinearity: problems
If two independent variables X1 and X2 are uncorrelated, then the model sums of squares for a linear model with both included equals the sum of the SS(model) for each considered separately. But if they are correlated, the former will be less than the latter. So the real question is: given a model with X1 included, how much does SS(model) increase when X2 is also included (or vice versa)?
Slide 16: Multicollinearity: consequences
- inflated standard errors for the regression coefficients
- sensitivity of parameter estimates to small changes in the data
- But estimates of the partial regression coefficients remain unbiased.
- One or more independent variables may not appear in the final regression model, not because they do not covary with Y, but because they covary with another X.
Slide 17: Detecting multicollinearity
- high R² but few or no significant t-tests for the individual independent variables
- high pairwise correlations between the X's
- high partial correlations among the regressors (some independent variables are nearly a linear combination of others)
- eigenvalues, the condition index, tolerance, and variance inflation factors (see the sketch after slide 19)
Slide 18: Quantifying the effect of multicollinearity
- Eigenvectors: a set of "lines" λ1, λ2, ..., λk in a k-dimensional space which are orthogonal to each other.
- Eigenvalue: the magnitude (length) of the corresponding eigenvector.
[Figure: the eigenvectors λ1 and λ2 of an uncorrelated and a correlated X1-X2 point cloud]
Slide 19: Quantifying the effect of multicollinearity
- Eigenvalues: if all k eigenvalues are approximately equal, multicollinearity is low.
- Condition index: sqrt(λ(largest)/λ(smallest)); values near 1 indicate low multicollinearity.
- Tolerance: 1 minus the proportion of variance in each independent variable accounted for by all other independent variables; values near 1 indicate low multicollinearity. (The variance inflation factor, VIF, is the reciprocal of the tolerance.)
[Figure: low correlation (λ1 = λ2) vs. high correlation (λ1 >> λ2)]
A computational sketch follows.
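These diagnostics are easy to compute in base R; a sketch on the simulated predictors (the car package's vif() returns variance inflation factors directly, if it is installed):

# Eigenvalue- and tolerance-based collinearity diagnostics
ev <- eigen(cor(cbind(x1, x2)))$values   # eigenvalues of the correlation matrix
sqrt(max(ev) / min(ev))                  # condition index; near 1 => low collinearity

# Tolerance and VIF for x1: regress it on all the other predictors
r2_x1 <- summary(lm(x1 ~ x2))$r.squared
tol   <- 1 - r2_x1    # near 1 => low collinearity
vif   <- 1 / tol      # variance inflation factor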
Slide 20: Remedial measures
- Get more data to reduce the correlations.
- Drop some variables.
- Use principal component or ridge regression; these yield biased estimates, but with smaller standard errors.
Slide 21: Multiple regression: the general idea
Evaluate the significance of a variable by fitting two models, one with the term in and the other with it removed, and test for the change in model fit (ΔMF, e.g. the change in R²) associated with removal of the term in question. Delete the term if ΔMF is small; retain it if ΔMF is large. Unfortunately, ΔMF may depend on which other variables are in the model if there is multicollinearity!
Slide 22: Fitting multiple regression models
Goal: find the "best" model, given the available data. Problem 1: what is "best"?
- highest R²?
- lowest residual mean square?
- highest R² containing only individually significant independent variables?
- maximal R² with the minimum number of independent variables?
Slide 23: Selection of independent variables (cont'd)
Problem 2: even if "best" is defined, by what method do we find it? Possibilities:
- compute all possible models (2^k - 1 of them) and choose the best one;
- use some procedure for winnowing down the set of possible models.
Slide 24: Strategy I: computing all possible models
Compute all possible models and choose the "best" one (a software sketch follows).
Cons: time-consuming; leaves the definition of "best" to the researcher.
Pros: if the "best" model is defined, you will find it!
[Diagram: the lattice of all subsets of {X1, X2, X3}: {X1}, {X2}, {X3}, {X1, X2}, {X1, X3}, {X2, X3}, {X1, X2, X3}]
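In practice the exhaustive search is delegated to software. A sketch assuming the leaps package is installed and the Mregdat data frame used later in the deck is loaded:

# All-subsets selection: score the best model of each size
library(leaps)

subsets <- regsubsets(LOGHERP ~ LOGAREA + CPFOR2 + THTDEN,
                      data = Mregdat, nbest = 1)
summary(subsets)$adjr2   # adjusted R-squared per model size
summary(subsets)$cp      # Mallows' Cp, as in the stepwise output below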
Slide 25: Strategy II: forward selection
Start with the variable that has the highest (significant) R², i.e. the highest partial correlation coefficient r. Add the others one at a time, with the βj's recomputed at each step, until there is no further significant increase in R². Problem: once Xj is included, it stays in, even if it contributes little to SS(model) once other variables are included.
[Diagram: forward selection on {X1, X2, X3} with r2 > r1 > r3: X2 enters first, then X1; final model {X1, X2}]
Slide 26: Forward selection: order of entry
Begin with the variable with the highest partial correlation coefficient. The next entry is the variable that gives the largest significant increase in overall R² by an F-test, above some specified F-to-enter (below a specified p-to-enter) value.
[Diagram: with r2 > r1 > r3 > r4 and p-to-enter = .05: X2 enters (p = .001), then X1 (p = .002), then X3 (p = .04); X4 is not entered (p = .55)]
Slide 27: Strategy III: backward selection
Start with all the variables. Drop variables whose removal does not significantly reduce R², one at a time, starting with the one with the lowest partial correlation coefficient. Problem: once Xj is dropped, it stays out, even if it explains a significant amount of the remaining variability once other variables are excluded.
[Diagram: worked backward deletion on {X1, X2, X3} with r2 < r1 < r3]
Slide 28: Backward selection: order of removal
Begin with the variable with the smallest partial correlation coefficient. The next removal is the variable whose deletion gives the smallest decrease in overall R² by an F-test of the significance of the change, below some specified F-to-remove (above a specified p-to-remove) value.
[Diagram: with r2 > r1 > r3 > r4 and p-to-remove = .10: X4 is removed first (p = .44), then X3 (p = .25); X1 and X2 are retained (p = .009 and .001)]
Slide 29: Strategy IV: stepwise selection
Once a variable is included (removed), the set of remaining variables is scanned for other variables that should now be deleted (included), including those added (removed) at earlier stages. To avoid infinite loops, the entry criterion is made stricter than the removal criterion (p-to-enter < p-to-remove).
[Diagram: worked stepwise example on {X1, X2, X3, X4}, with variables entering and being re-examined for removal at each step]
A sketch with R's step() follows.
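Base R's step() performs this kind of bidirectional search, using AIC rather than p-to-enter/p-to-remove thresholds; a sketch on the course example, assuming the Mregdat data frame from the later slides:

# Stepwise (bidirectional) model selection by AIC
full <- lm(LOGHERP ~ LOGAREA + CPFOR2 + THTDEN, data = Mregdat)
both <- step(full, direction = "both")   # considers both drops and re-additions
summary(both)   # the deck's final model (slides 34-35) is LOGAREA + THTDEN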
Slide 30: Example
Log of herptile species richness (LOGHERP) as a function of log wetland area (LOGAREA), the percentage of land within 1 km covered in forest (CPFOR2), and the density of hard-surface roads within 1 km (THTDEN).
Slide 31: Example (all variables)
Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)   0.2848      0.1914   1.4876    0.1499
LOGAREA       0.2285      0.0576   3.9636    0.0006
CPFOR2        0.0011      0.0014   0.7740    0.4465
THTDEN       -0.0358      0.0157  -2.2761    0.0321

Residual standard error: 0.1619 on 24 degrees of freedom
Multiple R-Squared: 0.5471
F-statistic: 9.662 on 3 and 24 degrees of freedom, the p-value is 0.0002291
2 observations deleted due to missing values
Slide 32: Example: forward stepwise
*** Stepwise Regression ***
*** Stepwise Model Comparisons ***
Start: AIC = 0.8392
LOGHERP ~ LOGAREA + CPFOR2 + THTDEN

Single term deletions
Model: LOGHERP ~ LOGAREA + CPFOR2 + THTDEN
scale: 0.02622382
          Df  Sum of Sq       RSS        Cp
<none>                  0.629372  0.839162
LOGAREA    1  0.4119815  1.041353  1.198696
CPFOR2     1  0.0157081  0.645080  0.802423
THTDEN     1  0.1358509  0.765223  0.922566
Slide 33: Forward stepwise (cont'd)
Step: AIC = 0.8024
LOGHERP ~ LOGAREA + THTDEN

Single term deletions
Model: LOGHERP ~ LOGAREA + THTDEN
scale: 0.02622382
          Df  Sum of Sq       RSS        Cp
<none>                  0.645080  0.802423
LOGAREA    1  0.4020378  1.047118  1.152013
THTDEN     1  0.2509250  0.896005  1.000900

Single term additions
Model: LOGHERP ~ LOGAREA + THTDEN
scale: 0.02622382
         Df   Sum of Sq        RSS         Cp
<none>                  0.6450798  0.8024227
CPFOR2    1  0.01570813  0.6293717  0.8391622
Slide 34: Forward stepwise: final model
*** Linear Model ***
Call: lm(formula = LOGHERP ~ LOGAREA + THTDEN, data = Mregdat, na.action = na.exclude)

Residuals:
     Min       1Q   Median     3Q     Max
 -0.3158  -0.1233  0.02095  0.132  0.3167

Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)   0.3763      0.1493   2.5213    0.0184
LOGAREA       0.2250      0.0570   3.9473    0.0006
THTDEN       -0.0420      0.0135  -3.1184    0.0045

Residual standard error: 0.1606 on 25 degrees of freedom
Multiple R-Squared: 0.5358
F-statistic: 14.43 on 2 and 25 degrees of freedom, the p-value is 0.00006829
Slide 35: Example: backward stepwise (final model)
Backward selection converges to the same final model as forward stepwise (slide 34): LOGHERP ~ LOGAREA + THTDEN, with identical coefficients, R² = 0.5358, and F = 14.43 on 2 and 25 df.
Slide 36: Example: subset model
Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)   0.0271      0.1667   0.1623    0.8724
LOGAREA       0.2478      0.0616   4.0224    0.0005
CPFOR2        0.0027      0.0013   2.0670    0.0492

Residual standard error: 0.175 on 25 degrees of freedom
Multiple R-Squared: 0.4493
F-statistic: 10.2 on 2 and 25 degrees of freedom, the p-value is 0.0005774
2 observations deleted due to missing values
Slide 37: What if the relationship between Y and one or more X's is nonlinear?
- Option 1: transform the data.
- Option 2: use nonlinear regression.
- Option 3: use polynomial regression.
Slide 38: The polynomial regression model
In polynomial regression, the regression model includes terms of increasingly higher powers of the independent variable:
  Yi = β0 + β1 Xi + β2 Xi² + ... + βm Xi^m + εi
Slide 39: The polynomial regression model: procedure
- Fit the simple linear model.
- Fit the model with a quadratic term added and test for the increase in SS(model).
- Continue with higher-order terms (cubic, quartic, etc.) until there is no further significant increase in SS(model).
- Include terms of order up to the number of points of inflexion plus 1.
A sketch follows.
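A minimal sketch of this procedure in R, on simulated data with a known quadratic relationship (names hypothetical):

# Polynomial regression: add powers of X and test the gain in model SS
set.seed(1)
x <- runif(60)
y <- x - x^2 + rnorm(60, sd = 0.05)   # the true relationship is quadratic

fit1 <- lm(y ~ x)              # simple linear model
fit2 <- lm(y ~ poly(x, 2))     # quadratic term added
fit3 <- lm(y ~ poly(x, 3))     # cubic term added

anova(fit1, fit2)   # F-test of the quadratic gain in SS(model)
anova(fit2, fit3)   # F-test of the cubic gain; stop when non-significant

Note that poly() fits orthogonal polynomial terms, which sidesteps much of the collinearity among raw powers noted in the caveats on the next slide.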
Slide 40: Polynomial regression: caveats
- The biological significance of the higher-order terms in a polynomial regression (if any) is generally not known.
- By definition, polynomial terms are strongly correlated; hence standard errors will be large (precision is low) and increase with the order of the term.
- Extrapolation of polynomial models is always nonsense.
[Figure: the parabola Y = X1 - X1², illustrating a fit that becomes absurd outside the observed range]
Slide 41: Partial and total R²
- The total R² (R²_YB) is the proportion of variance in Y accounted for (explained) by a set of independent variables B.
- The partial R² (R²_YA,B - R²_YA) is the proportion of variance in Y accounted for by B when the variance accounted for by another set A is removed.
[Venn diagram: variance accounted for by both A and B (R²_YA,B); by A only (R²_YA, a total R²); by B independently of A (R²_YA,B - R²_YA, a partial R²)]
Slide 42: Partial and total R²
The total R² for set B (R²_YB) equals the partial R² with respect to B (R²_YA,B - R²_YA) if either (1) the total R² for A (R²_YA) is zero, or (2) A and B are independent (in which case R²_YA,B = R²_YA + R²_YB).
[Venn diagram: the total R² for B and the proportion of variance due to B independent of A are equal iff the A and B circles do not overlap]
Slide 43: Partial and total R² in multiple regression
Suppose we have three independent variables X1, X2, and X3.
[Figure: log production as a function of log [P] and log [Zoo], as in slide 3]
Slide 44: Defining effect size in multiple regression
The effect size, denoted f², is given by the ratio of the factor (source) variance proportion to the appropriate error variance proportion: f² = R²(factor) / R²(error). Note: both R²(factor) and R²(error) depend on the null hypothesis under investigation.
Slide 45: Defining effect size in multiple regression: case 1
Case 1: a set B of variables {X1, X2, ...} is related to Y, and the total R² (R²_YB) is determined. The error variance proportion is then 1 - R²_YB, so f² = R²_YB / (1 - R²_YB). H0: R²_YB = 0.
Example: effect of wetland area, surrounding forest cover, and surrounding road density on herptile species richness in southeastern Ontario wetlands; B = {LOGAREA, CPFOR2, THTDEN}.
Slide 46:
From the full-model fit (the same output as slide 31): Multiple R-Squared = 0.5471 on 24 error degrees of freedom. Hence f² = 0.5471 / (1 - 0.5471) ≈ 1.21.
Slide 47: Defining effect size in multiple regression: case 2
Case 2: the proportion of variance of Y due to B over and above that due to A is determined (R²_YA,B - R²_YA). The error variance proportion is then 1 - R²_YA,B, so f² = (R²_YA,B - R²_YA) / (1 - R²_YA,B). H0: R²_YA,B - R²_YA = 0.
Example: herptile richness in southeastern Ontario wetlands; B = {THTDEN}, A = {LOGAREA, CPFOR2}, AB = {LOGAREA, CPFOR2, THTDEN}.
Slide 48:
Full model (AB = {LOGAREA, CPFOR2, THTDEN}):
Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)   0.2848      0.1914   1.4876    0.1499
LOGAREA       0.2285      0.0576   3.9636    0.0006
CPFOR2        0.0011      0.0014   0.7740    0.4465
THTDEN       -0.0358      0.0157  -2.2761    0.0321
Residual standard error: 0.1619 on 24 degrees of freedom
Multiple R-Squared: 0.5471
F-statistic: 9.662 on 3 and 24 degrees of freedom, the p-value is 0.0002291

Reduced model (A = {LOGAREA, CPFOR2}):
Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)   0.0271      0.1667   0.1623    0.8724
LOGAREA       0.2478      0.0616   4.0224    0.0005
CPFOR2        0.0027      0.0013   2.0670    0.0492
Residual standard error: 0.175 on 25 degrees of freedom
Multiple R-Squared: 0.4493
F-statistic: 10.2 on 2 and 25 degrees of freedom, the p-value is 0.0005774
Slide 49: Defining effect size in multiple regression: case 2
The proportion of variance of LOGHERP due to THTDEN (B) over and above that due to LOGAREA and CPFOR2 (A) is R²_YA,B - R²_YA = 0.547 - 0.449 = 0.098. The error variance proportion is then 1 - R²_YA,B = 1 - 0.547 = 0.453. So the effect size for the variable THTDEN is f² = 0.098 / 0.453 = 0.216.
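The same arithmetic in R, refitting the full and reduced models (assuming the Mregdat data frame):

# Effect size f^2 for THTDEN over and above LOGAREA and CPFOR2
full    <- lm(LOGHERP ~ LOGAREA + CPFOR2 + THTDEN, data = Mregdat)
reduced <- lm(LOGHERP ~ LOGAREA + CPFOR2,          data = Mregdat)

r2_full    <- summary(full)$r.squared      # R^2_YA,B = 0.547
r2_reduced <- summary(reduced)$r.squared   # R^2_YA   = 0.449

f2 <- (r2_full - r2_reduced) / (1 - r2_full)   # = 0.098 / 0.453 = 0.216
f2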
Slide 50: Determining power
Once f² has been determined, either a priori (as an alternate hypothesis) or a posteriori (the observed effect size), calculate the noncentral F parameter λ = f²(ν1 + ν2 + 1). Knowing λ and the factor (source) (ν1) and error (ν2) degrees of freedom, we can determine power from the appropriate tables, or from software, for a given α.
[Figure: power (1 - β) as a function of λ, with curves for α = .05 and α = .01 and varying error df ν2]
Slide 51: Example: herptile richness in southeastern Ontario wetlands
- sample of 28 wetlands
- 3 independent variables (LOGAREA, CPFOR2, THTDEN)
- dependent variable: log10 of the number of herptile species
- Question: what is the probability of detecting a true effect size for CPFOR2 equal to the estimated effect size, once the effects of LOGAREA and THTDEN have been controlled for, given α = 0.05?
Slide 52: Example: herptile richness in southeastern Ontario wetlands
The sample effect size f² for CPFOR2, once the effects of LOGAREA and THTDEN have been controlled for, is (0.5471 - 0.5358) / (1 - 0.5471) = 0.024. Source (CPFOR2) df: ν1 = 1. Error df: ν2 = 28 - 3 - 1 = 24 (the residual df of the full model).
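With f², ν1, and ν2 in hand, the power can be computed from R's noncentral F distribution rather than tables; a sketch assuming Cohen's approximation λ = f²(ν1 + ν2 + 1):

# Power to detect the observed CPFOR2 effect at alpha = 0.05
f2    <- 0.024   # sample effect size from this slide
nu1   <- 1       # source (CPFOR2) df
nu2   <- 24      # error df
alpha <- 0.05

lambda <- f2 * (nu1 + nu2 + 1)        # noncentrality parameter
F_crit <- qf(1 - alpha, nu1, nu2)     # critical F under H0
1 - pf(F_crit, nu1, nu2, ncp = lambda)   # power; very low for so small an effect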