Experimental design and analysis: Multiple linear regression. © Gerry Quinn & Mick Keough, 1998. Do not copy or distribute without permission of the authors.



Multiple regression

One response (dependent) variable: Y
More than one predictor (independent) variable: x_1, x_2, x_3, etc.
Number of predictors = p
Number of observations = n

Example

A sample of 51 mammal species (n = 51)
Response variable:
- total sleep time in hrs/day (y)
Predictors:
- body weight in kg (x_1)
- brain weight in g (x_2)
- maximum life span in years (x_3)
- gestation time in days (x_4)

Regression models

Population model (equation): y_i = β_0 + β_1 x_1 + β_2 x_2 + … + ε_i
Sample equation: ŷ_i = b_0 + b_1 x_1 + b_2 x_2 + …
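As a sketch (not part of the original slides), the sample equation can be fitted by ordinary least squares; the data below are simulated and every value is hypothetical:

```python
import numpy as np

# Fit the sample equation y_hat = b0 + b1*x1 + b2*x2 by ordinary least
# squares on simulated data (all coefficients and data are hypothetical).
rng = np.random.default_rng(0)
n = 51
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept column
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b = (b0, b1, b2)
```

With little noise and n = 51, the estimates b land close to the true values used to simulate y.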

Example

Regression model:
sleep = β_0 + β_1·bodywt + β_2·brainwt + β_3·lifespan + β_4·gestime

Multiple regression equation

[Figure: regression surface of total sleep against log body weight and log lifespan]

Partial regression coefficients

H0: β_1 = 0
The partial population regression coefficient (slope) for y on x_1, holding all other x's constant, equals zero.
Example: the slope of the regression of sleep against body weight, holding brain weight, max. life span and gestation time constant, is 0.

Partial regression coefficients

H0: β_2 = 0
The partial population regression coefficient (slope) for y on x_2, holding all other x's constant, equals zero.
Example: the slope of the regression of sleep against brain weight, holding body weight, max. life span and gestation time constant, is 0.

Testing H0: β_i = 0

Use partial t-tests: t = b_i / SE(b_i)
Compare with the t-distribution with n − p − 1 df (the residual df after fitting p slopes and an intercept)
Separate t-test for each partial regression coefficient in the model
Usual logic of t-tests: reject H0 if P < 0.05
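A minimal sketch of these partial t-tests on simulated (hypothetical) data, using the standard OLS coefficient covariance s²(XᵀX)⁻¹:

```python
import numpy as np

# Partial t-tests: t_i = b_i / SE(b_i), on n - p - 1 residual df.
# Data are simulated; the true slope on the first predictor is 2.0
# and the true slope on the second is 0 (all values hypothetical).
rng = np.random.default_rng(1)
n, p = 51, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
df = n - p - 1
s2 = resid @ resid / df                      # residual mean square
cov_b = s2 * np.linalg.inv(X.T @ X)          # covariance matrix of b
se = np.sqrt(np.diag(cov_b))
t = b / se                                   # one t statistic per coefficient
```

The t statistic for the first predictor comes out large (its true slope is nonzero) while the second stays near zero, matching the usual reject/retain logic.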

Model comparison

To test H0: β_1 = 0:
- Fit full model: y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + …
- Fit reduced model: y = β_0 + β_2 x_2 + β_3 x_3 + …
- Calculate SS_extra = SS_Regression(full) − SS_Regression(reduced)
- F = MS_extra / MS_Residual(full)
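The model-comparison steps above can be sketched as follows; the data and coefficients are simulated and hypothetical:

```python
import numpy as np

# Extra-sum-of-squares F test for H0: beta1 = 0, comparing a full model
# (x1 and x2) against a reduced model (x2 only). Simulated data.
rng = np.random.default_rng(2)
n = 51
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

def fit_ss(X, y):
    """Return (regression SS, residuals) for an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    ss_reg = np.sum((y - y.mean()) ** 2) - resid @ resid
    return ss_reg, resid

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = np.column_stack([np.ones(n), x2])        # drop x1 to test H0: beta1 = 0

ss_full, resid_full = fit_ss(X_full, y)
ss_red, _ = fit_ss(X_red, y)

ss_extra = ss_full - ss_red                      # 1 df: one parameter dropped
ms_resid_full = (resid_full @ resid_full) / (n - 3)
F = ss_extra / ms_resid_full                     # compare with F(1, n - 3)
```

Because the true β_1 is nonzero here, the F statistic is large and H0 is rejected; dropping a predictor whose true slope is zero would give F near 1.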

Overall regression model

H0: β_1 = β_2 = … = 0 (all population slopes equal zero).
A test of whether the overall regression equation is significant.
Use the ANOVA F-test, partitioning:
- variation explained by the regression
- unexplained (residual) variation

Regression diagnostics

A residual is still observed y − predicted y; studentised residuals still work.
Other diagnostics still apply:
- residual plots
- Cook's D statistics

Assumptions

- Normality and homogeneity of variance for the response variable
- Independence of observations
- Linearity
- No collinearity

Collinearity

Collinearity: predictors are correlated.
Assumption of no collinearity: predictor variables are uncorrelated with (i.e. independent of) each other.
Collinearity makes estimates of the β_i's and their significance tests unreliable: low power for individual tests on the β_i's.

Collinearity

Response (y) and 2 predictors (x_1 and x_2); n = 20.
1. x_1 and x_2 uncorrelated (r = −0.24):
[Table: coeff, SE, tolerance, t and P for the intercept, x_1 and x_2; x_1: P < 0.001]
R² = 0.787, F = 31.38, P < 0.001

2. Rearrange x_2 so x_1 and x_2 are highly correlated (r = 0.99):
[Table: coeff, SE, tolerance, t and P for the intercept, x_1 and x_2]
R² = 0.780, F = 30.05, P < 0.001

Checks for collinearity

- Correlation matrix between predictors.
- Tolerance for each predictor: 1 − R² for the regression of that predictor on all the others; if tolerance is low (< 0.1), collinearity is a problem.
- Variance inflation factor (VIF) for each predictor: 1 / tolerance; if VIF > 10, collinearity is a problem.
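A sketch of the tolerance/VIF computation, with the second predictor deliberately constructed to be nearly a copy of the first (all data hypothetical):

```python
import numpy as np

# Tolerance of predictor j = 1 - R^2 from regressing x_j on the other
# predictors; VIF = 1 / tolerance. x2 is built to be nearly collinear
# with x1, so both should be flagged; x3 is independent.
rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # highly correlated with x1
x3 = rng.normal(size=n)                    # unrelated predictor
X = np.column_stack([x1, x2, x3])

def tolerance(X, j):
    """1 - R^2 for predictor j regressed on all the other predictors."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ b
    ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return resid @ resid / ss_tot          # residual SS / total SS = 1 - R^2

tol = np.array([tolerance(X, j) for j in range(X.shape[1])])
vif = 1.0 / tol
```

Here tolerance for x_1 and x_2 falls below the 0.1 rule of thumb (VIF well above 10), while x_3's tolerance stays near 1.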

Explained variance

R² = SS_Regression / SS_Total: the proportion of variation in y explained by the linear relationship with x_1, x_2, etc.
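R² can be computed directly from the sums of squares; a sketch on simulated (hypothetical) data:

```python
import numpy as np

# R^2 = SS_Regression / SS_Total = 1 - SS_Residual / SS_Total,
# computed from an OLS fit on simulated data.
rng = np.random.default_rng(4)
n = 51
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = resid @ resid
r_squared = 1.0 - ss_resid / ss_total   # equals SS_Regression / SS_Total
```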

Example

[Table: Sleep, Bodywt, Brainwt, Lifespan and Gestime for each species: African elephant, Arctic fox, etc.]

Boxplots of variables

Collinearity problem for body weight and brain weight: low tolerance, highly correlated.
[Table: estimate, SE, tolerance, t and P for Intercept (P < 0.001), Bodywt, Brainwt, Lifespan and Gestime; R² reported]
Predictors log-transformed.

Omit brain weight because body weight and brain weight are so highly correlated.
No collinearity between any predictors: all tolerances OK; reduced SE and a larger slope for body weight.
[Table: estimate, SE, tolerance, t and P for Intercept (P < 0.001), Bodywt, Lifespan and Gestime; R² reported]

Examples from literature

Lampert (1993) Ecology 74

Response variable:
- Daphnia (water flea) clutch size
Predictors:
- body size (mm)
- particulate organic carbon (mg/L)
- temperature (°C)

Lampert (1993)

[Table: coefficient, SE, t and P for Intercept, Body size, POC and Temperature]
ANOVA P = 0.052, R² = 0.684, n = 11

Williams et al. (1993) Ecology 74

Response variable:
- Zostera (seagrass) growth
Predictors:
- epiphyte biomass
- porewater ammonium

Williams et al. (1993)

Parameter            Coeff.   P
Epiphyte biomass     0.340    >0.05
Porewater ammonium   0.919    <0.05

R² = 0.71; tolerance high (so no collinearity).