Chapter 9 Multiple Linear Regression


Chapter 9 Multiple Linear Regression
BAE 5333 Applied Water Resources Statistics
Biosystems and Agricultural Engineering Department, Division of Agricultural Sciences and Natural Resources, Oklahoma State University
Source: Dr. Dennis R. Helsel & Dr. Edward J. Gilroy, 2006 Applied Environmental Statistics Workshop, and Statistical Methods in Water Resources

Multiple Linear Regression
Y = b0 + b1X1 + b2X2 + … + bkXk + e
- Y = response or dependent variable
- b0 = intercept
- bi = slopes
- Xi = explanatory or independent variables
- e = error term
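
A minimal sketch of fitting this model by ordinary least squares in Python with NumPy (the data and coefficient values here are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))                    # two explanatory variables X1, X2
e = rng.normal(scale=0.5, size=n)              # error term
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + e    # true model: b0=1, b1=2, b2=-3

A = np.column_stack([np.ones(n), X])           # design matrix; column of ones = intercept
b, *_ = np.linalg.lstsq(A, y, rcond=None)      # least-squares estimates [b0, b1, b2]
print(b)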

Multiple Linear Regression Model
[Figure: the fitted plane Y = β0 + β1X1 + β2X2 in (X1, X2, Y) space, with positive and negative errors e shown as vertical distances from the observed points to the plane]
Source: http://www.swlearning.com/quant/kvanli/sixth_edition/powerpoint/ch15.ppt

Multiple Linear Regression
Parametric method of fitting a surface
Same assumptions as Simple Linear Regression:
- Linear pattern of data (in 3+ dimensions)
- Variance of residuals is constant for all X
- Normal distribution of residuals

Biggest Issue in Multiple Linear Regression: Multicollinearity
Cause
- Redundant variables: more than one X variable explaining the same effect
Symptoms
- Slope coefficients with signs that make no sense
- Two variables describing the same effect with opposite signs
- Stepwise, backwards, and forwards methods give different results (more later)

Biggest Issue in Multiple Linear Regression: Multicollinearity
Measure with the Variance Inflation Factor (VIF); see the sketch below
- Measures the correlation (not just pairwise) among the X variables
- Has NOTHING to do with the response variable Y
- Want all VIFs < 10
Solutions
- Drop one or more redundant variables
- Alternate design: collect additional data
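
A sketch of computing VIFs directly from their definition, VIFj = 1 / (1 − Rj²), where Rj² comes from regressing Xj on the other explanatory variables (pure NumPy; X is any n-by-k array of explanatory variables):

import numpy as np

def vifs(X):
    """VIF for each column of X: regress X_j on the other columns
    (plus an intercept) and return 1 / (1 - R_j^2)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

VIFs above 10 flag columns that are candidates for dropping. (If statsmodels is available, its variance_inflation_factor function performs the same calculation.)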

Biggest Issue in Multiple Linear Regression: Multicollinearity
Issues
- The regression equation is still VALID with multicollinearity
- Cannot put any physical meaning on the sign and magnitude of the coefficients
- You should NEVER apply a regression equation outside the range of data that were used to develop it

Regression Model with Two Variables: Three Cases
- X1 & X2 are independent
- X1 & X2 are partially correlated, introducing multicollinearity
- X1 & X2 are perfectly correlated, and thus are redundant variables
Source: http://www.grapentine.com/articles/multicol.pdf

Hypothesis Tests for MLR
t-test for each slope coefficient
- Null Hypothesis: Slope = 0 (no influence of X on Y; do not include in model)
- Alternative Hypothesis: Slope ≠ 0 (X influences Y; keep variable in equation)
Y = b0 + b1X1 + b2X2 + … + bkXk + e

Partial t-test
- H0: bj = 0
- Ha: bj ≠ 0
- Reject H0 when p-value < α (2-sided test)
- Multicollinearity inflates SE(bj) and hence lowers tj = bj / SE(bj)

Overall F-test
Null Hypothesis: all slopes = 0
- Implies the best estimate of Y is the mean of Y
Alternative Hypothesis: at least one slope ≠ 0
- The current model is better than no model
- Does not imply this is the best model
Reject H0 when F is large (p-value < α)
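
Both tests are reported by standard regression software. For example, assuming statsmodels is available, one fitted-model summary shows the partial t-test for each coefficient and the overall F-test (the data here are synthetic):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 2 + X[:, 0] - 0.5 * X[:, 2] + rng.normal(size=40)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.summary())                  # t statistic and p-value for each slope
print(fit.fvalue, fit.f_pvalue)       # overall F-test of all slopes = 0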

How to Build a Good Regression Model
- Choose the best units for Y
- Run the regression with all variables
- Check for non-constant variance with residual plots (see the sketch below)
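
A quick residual check, sketched with matplotlib: plot residuals against fitted values and look for a fan or funnel shape, which suggests non-constant variance (data here are synthetic):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
A = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])   # design matrix
y = A @ np.array([1.0, 2.0, -3.0]) + rng.normal(size=60)       # synthetic response

b, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ b
plt.scatter(fitted, y - fitted)
plt.axhline(0, color="gray")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()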

How to Build a Good Regression Model
- Choose the best units for X using partial plots; want a linear relationship

Partial Plots
- Show the relationship between an explanatory variable (Xi) and the response variable (Y), given that the other independent variables are in the model
- A simple plot of Y vs. Xi will not show this, because it doesn't account for the other Xi's
- Want the plot to be linear; curvature indicates a transformation is required for Xi

Partial Regression Plots (Added Value or Leverage Plots)
- May not show the proper relationship if several variables already in the model are incorrectly specified, or if strong multicollinearity exists
- Plot the residuals from the regression of Y on all X's except Xi against the residuals from Xi regressed on all other X's; in other words, they show the relationship between Y and Xi that remains when the effects of the other X's are removed (see the sketch below)
- Good for diagnosing outliers and for determining whether a variable should be included in the model
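
A sketch of the two regressions behind an added-variable plot for column j (NumPy; the function name is illustrative):

import numpy as np

def added_variable_data(X, y, j):
    """Residual pair plotted in a partial regression (added-variable)
    plot for explanatory variable j."""
    n = len(y)
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    cy, *_ = np.linalg.lstsq(others, y, rcond=None)
    ry = y - others @ cy                 # residuals of y on all X's except X_j
    cx, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    rx = X[:, j] - others @ cx           # residuals of X_j on the other X's
    return rx, ry                        # scatter ry against rx

A useful property: the slope of a simple regression of ry on rx equals the multiple-regression coefficient bj.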

Partial Residual Plots (Adjusted Variable or Component Plots)
- Show the relationship between each explanatory variable (Xi) and the part of the response variable (Y) not explained by all the other variables
- Primarily used to identify violations of the linearity assumption
- Good for diagnosing nonlinearity

Partial Residual Plots (Adjusted Variable or Component Plots)
Partial residual: ej* = y − ŷ(j)
- y = observed dependent variable
- ŷ(j) = predicted y from the regression equation with xj left out of the model
Adjusted explanatory variable: xj* = xj − x̂(j)
- xj = observed explanatory variable
- x̂(j) = predicted xj from a regression of xj on all the other explanatory variables
Plot ej* against xj*.

How to Build a Good Regression Model
- Check for multicollinearity
- One VIF for each X variable: VIFj = 1 / (1 − Rj²), where Rj² is the R² between xj and all the other X's
- Want all Variance Inflation Factors (VIFs) < 10

How to Build a Good Regression Model
Choose the best model using an overall measure of quality:
- Mallows' Cp — low
- Adjusted R² — high
- Predicted R² — high
- PRESS — low
- RMSE — low
R² by itself is not adequate, since it always increases as the number of variables increases.

Different Types of R²
R² (Coefficient of Determination)
- Percentage of total variation explained by the model
- In general, a higher R² indicates a better model fit
Adjusted R²
- Accounts for the number of predictors in the model
- Useful for comparing models with different numbers of predictors
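
As a reference sketch (standard formulas, not from the slides): adjusted R² equals 1 − (1 − R²)(n − 1)/(n − p), where p is the number of coefficients including the intercept.

import numpy as np

def r2_and_adjusted(y, fitted, p):
    """R^2 and adjusted R^2; p = number of coefficients incl. intercept."""
    y = np.asarray(y)
    sse = np.sum((y - fitted) ** 2)       # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p)
    return r2, r2_adj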

Different Types of R²
Predicted R²
- Indicates how well the model predicts responses for new observations
- For each observation i: delete the ith observation from the data set, estimate the regression equation from the remaining n − 1 observations, and use the fitted regression function to obtain Ŷi
- Can prevent overfitting, since it is calculated with observations not included in the model calculation

Measures of Quality Mallow’s Cp Used to compare a full model to a model with a subset of predictors. In general, look for models where Mallows' Cp is small and close to np, where np is the number of predictors in the model (including the constant).   A small Cp value indicates that the model is relatively precise (has small variance) in estimating the true regression coefficients and predicting future responses. Models with considerable lack-of-fit and bias have values of Cp larger than np.

Mallow’s Cp Test Statistic Measures of Quality Mallow’s Cp Test Statistic σ2 = true error, usually estimated as the minimum MSE among the 2k possible models MSE = mean square error for p coefficient model n = Number of observations p = number of coefficients (explanatory variables+1) k = total number of explanatory variables

Measures of Quality Mallow’s Cp SSEp = residual sum of squares for p variables MSEfull = residual mean square with k variables n = Number of observations p = number of independent variables (subset of k) k = total number of independent variables

Measures of Quality
PRESS (Prediction Sum of Squares)
- Assesses the model's predictive ability
- In general, the smaller the PRESS value, the better the model's predictive ability
- Used to calculate Predicted R²
- PRESS = Σ e(i)², where e(i) is the residual for the ith point from a model fit with the ith point deleted (see the sketch below)
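
PRESS can be computed without refitting the model n times, using the leave-one-out shortcut e(i) = ei / (1 − hii), where hii are the diagonal elements of the hat matrix. A sketch (A is the design matrix including the column of ones):

import numpy as np

def press(A, y):
    """PRESS via the leave-one-out shortcut e_(i) = e_i / (1 - h_ii)."""
    H = A @ np.linalg.solve(A.T @ A, A.T)   # hat matrix H = A (A'A)^-1 A'
    e = y - H @ y                           # ordinary residuals
    e_loo = e / (1.0 - np.diag(H))          # deleted residuals
    return np.sum(e_loo ** 2)

# Predicted R^2 then follows as 1 - PRESS / SST, with SST = sum((y - mean(y))^2).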

How to Build a Good Regression Model
Final check: compute the regression model and look for
- A linear pattern in the data
- Constant variance for all X
- Normal distribution of residuals
- Significant t-statistics on all variables

Stepwise Regression
An automated tool used to identify a useful subset of predictors for building a multiple linear regression model. The process systematically adds the most significant variable or removes the least significant variable at each step.
Standard Stepwise
- Adds and removes predictors as needed at each step
- Stops when all variables not in the model have p-values greater than α and all variables in the model have p-values less than or equal to α
Source: MINITAB 15

Stepwise Regression (cont.)
Forward Selection
- Starts with no predictors in the model and adds the most significant variable at each step (see the sketch below)
- Stops when all variables not in the model have p-values greater than α
Backward Elimination
- Starts with all predictors in the model and removes the least significant variable at each step
- Stops when all variables in the model have p-values less than or equal to α
Source: MINITAB 15
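
A hedged sketch of forward selection on p-values (assuming statsmodels is available; the function name, stopping rule, and default α are illustrative, not MINITAB's implementation):

import numpy as np
import statsmodels.api as sm

def forward_selection(X, y, alpha=0.15):
    """Add, at each step, the candidate column with the smallest partial
    t-test p-value; stop when no remaining candidate has p < alpha."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        pvals = {}
        for j in remaining:
            fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
            pvals[j] = fit.pvalues[-1]   # p-value of the newly added column
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected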

Stepwise Regression (cont.)
Potential Pitfalls
- If two independent variables are highly correlated, only one may end up in the model even though either may be important
- Because the procedure fits many models, it can select ones that fit the data well due to chance alone
- It may not end with the model having the highest possible R² value
- Automatic procedures cannot take into account the analyst's knowledge of the data, so the selected model may not be the best from a practical point of view
Source: MINITAB 15

MINITAB Laboratory 8
Reading Assignment: Chapter 11, Multiple Linear Regression (pages 295 to 322), Statistical Methods in Water Resources by D.R. Helsel and R.M. Hirsch