Lecture 24 Multiple Regression (Sections 19.4-19.5)



19.4 Regression Diagnostics - II

The conditions required for the model assessment to apply must be checked:
–Is the error variable normally distributed? Draw a histogram of the residuals.
–Is the regression function correctly specified as a linear function of x1, …, xk? Plot the residuals versus the x's and versus the predicted values ŷ.
–Is the error variance constant? Plot the residuals versus ŷ.
–Are the errors independent? Plot the residuals versus the time periods.
–Can we identify outliers and influential observations?
–Is multicollinearity a problem?
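The diagnostic plots above all start from the residuals of the fitted model. Below is a minimal pure-Python sketch on made-up data: it fits a least-squares line, forms the residuals, and runs a crude constant-variance check (a real analysis would use the residual plots described above; all data and names here are hypothetical).

```python
# Minimal sketch (pure Python, hypothetical data): fit a least-squares line,
# then inspect the residuals that the diagnostic plots are based on.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 4.3, 5.8, 8.4, 9.9, 12.2, 13.7, 16.5]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Crude constant-variance check: compare residual spread in the two halves
# of the x range (a formal check would use the residual-vs-fitted plot).
half = n // 2
var_lo = sum(e ** 2 for e in residuals[:half]) / half
var_hi = sum(e ** 2 for e in residuals[half:]) / (n - half)
print(round(sum(residuals), 6))  # OLS residuals with an intercept sum to ~0
```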

Influential Observation

Influential observation: an observation is influential if removing it would markedly change the results of the analysis. To be influential, a point must either be an outlier in terms of the relationship between its y and x's, or have unusually distant x's (high leverage) and not fall exactly on the relationship between y and the x's that the rest of the data follows.

Simple Linear Regression Example Data in salary.jmp. Y=Weekly Salary, X=Years of Experience.

Identification of Influential Observations

Cook's distance is a measure of the influence of a point: the effect that omitting the observation has on the estimated regression coefficients. In JMP, use Save Columns, Cook's D Influence to obtain Cook's distance.

Cook's Distance

Rule of thumb: an observation with Cook's distance Di > 1 has high influence. You may also be concerned about any observation that has Di < 1 but a much bigger Di than any other observation.
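As a concrete illustration of the rule of thumb above, here is a minimal pure-Python sketch of Cook's distance for a simple regression on made-up data (in JMP, Save Columns, Cook's D Influence computes this for you; the data are hypothetical).

```python
# Sketch of Cook's distance D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
# for a simple regression, pure Python with made-up data.
xs = [1, 2, 3, 4, 5, 6, 7, 20]           # last point has high leverage
ys = [2.0, 4.1, 6.2, 7.9, 10.1, 12.0, 14.2, 15.0]

n, p = len(xs), 2                        # p = number of coefficients (b0, b1)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
mse = sum(e ** 2 for e in res) / (n - p)

def cooks_d(i):
    h = 1.0 / n + (xs[i] - xbar) ** 2 / sxx      # leverage of point i
    return res[i] ** 2 / (p * mse) * h / (1 - h) ** 2

d = [cooks_d(i) for i in range(n)]
flagged = [i for i, di in enumerate(d) if di > 1]  # rule of thumb: D_i > 1
print(flagged)   # the high-leverage point stands out
```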

Strategy for Dealing with Influential Observations/Outliers

Do the conclusions change when the observation is deleted?
–If No: proceed with the observation included. Study the observation to see if anything can be learned.
–If Yes: is there reason to believe the case belongs to a population other than the one under investigation?
 –If Yes: omit the case and proceed.
 –If No: does the case have unusually "distant" independent variables?
  –If Yes: omit the case and proceed. Report conclusions for the reduced range of explanatory variables.
  –If No: not much can be said. More data are needed to resolve the questions.

Multicollinearity

Multicollinearity: condition in which independent variables are highly correlated. Exact collinearity: Y = Weight, X1 = Height in inches, X2 = Height in feet. Since X1 = 12·X2, the two variables carry identical information and provide the same predictions. Multicollinearity causes two kinds of difficulties:
–The t statistics appear to be too small.
–The β coefficients cannot be interpreted as "slopes".

Multicollinearity Diagnostics

Diagnostics:
–High correlation between independent variables.
–Counterintuitive signs on regression coefficients.
–Low values for t-statistics despite a significant overall fit, as measured by the F statistic.
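The first diagnostic above, high correlation between independent variables, can be checked directly. Here is a minimal pure-Python sketch on hypothetical data, echoing the inches-versus-feet example: the exactly collinear pair has correlation 1, so the second variable carries no information beyond the first.

```python
# Sketch: pairwise correlation between predictors as a first
# multicollinearity diagnostic (pure Python, hypothetical height data).
x1 = [60, 62, 65, 68, 70, 72, 75]         # height in inches
x2 = [h / 12 for h in x1]                 # height in feet: exact collinearity
x3 = [110, 140, 150, 155, 170, 180, 200]  # another predictor, merely correlated

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

print(round(corr(x1, x2), 4))  # exactly 1: x2 adds nothing beyond x1
print(round(corr(x1, x3), 4))  # high, but below 1
```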

Diagnostics: Multicollinearity

Example 19.2: Predicting house price (Xm19-02)
–A real estate agent believes that a house's selling price can be predicted using the house size, number of bedrooms, and lot size.
–A random sample of 100 houses was drawn and data recorded.
–Analyze the relationship among the four variables.

Diagnostics: Multicollinearity

The proposed model is PRICE = β0 + β1·BEDROOMS + β2·H-SIZE + β3·LOTSIZE + ε. The model as a whole is valid, yet no single variable is significantly related to the selling price?!

Diagnostics: Multicollinearity

Multicollinearity is found to be the problem here. It causes two kinds of difficulties:
–The t statistics appear to be too small.
–The β coefficients cannot be interpreted as "slopes".

Remedying Violations of the Required Conditions Nonnormality or heteroscedasticity can be remedied using transformations on the y variable. The transformations can improve the linear relationship between the dependent variable and the independent variables. Many computer software systems allow us to make the transformations easily.

Reducing Nonnormality by Transformations

A brief list of transformations:
–y' = log y (for y > 0): use when σε increases with y, or when the error distribution is positively skewed.
–y' = y²: use when σε² is proportional to E(y), or when the error distribution is negatively skewed.
–y' = y^(1/2) (for y > 0): use when σε² is proportional to E(y).
–y' = 1/y: use when σε² increases significantly when y increases beyond some critical value.
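The effect of the y' = log y transformation can be sketched in a few lines of Python on hypothetical, positively skewed data: gaps between successive values grow with the level before the transform and become far more even after it.

```python
# Sketch: applying y' = log y to positive, right-skewed data
# (hypothetical values; spread grows with the level of y).
import math

y = [1.2, 2.0, 3.5, 6.1, 11.0, 24.0, 55.0]
y_log = [math.log(v) for v in y]

# After the log, successive gaps are far more even (variance stabilized).
gaps = [round(b - a, 2) for a, b in zip(y_log, y_log[1:])]
print(gaps)
```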

Durbin-Watson Test: Are the Errors Autocorrelated?

This test detects first-order autocorrelation between consecutive residuals in a time series. If autocorrelation exists, the error variables are not independent. With ei denoting the residual at time i, the test statistic is

d = Σi=2..n (ei − ei−1)² / Σi=1..n ei²

Positive First Order Autocorrelation

Positive first-order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (less than 2). [Figure: residuals plotted against time, drifting slowly above and below zero.]

Negative First Order Autocorrelation

Negative first-order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (greater than 2). [Figure: residuals plotted against time, alternating in sign.]

Durbin-Watson Test in JMP

H0: no first-order autocorrelation.
H1: first-order autocorrelation exists.
After fitting the model in JMP, use Row Diagnostics, Durbin-Watson Test. The reported autocorrelation is an estimate of the correlation between consecutive errors.
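The Durbin-Watson statistic itself is easy to compute from the residuals. Below is a minimal pure-Python sketch with two made-up residual series, one drifting slowly (positive autocorrelation, small d) and one alternating in sign (negative autocorrelation, large d); JMP reports d for you after fitting.

```python
# Sketch of the Durbin-Watson statistic
# d = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2
def durbin_watson(e):
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

# Positively autocorrelated residuals drift slowly -> small d (< 2).
slow = [1.0, 1.2, 0.9, 1.1, -0.8, -1.0, -1.1, -0.9]
# Alternating residuals -> large d (> 2).
alt = [1.0, -1.0, 1.1, -0.9, 1.0, -1.1, 0.9, -1.0]

print(round(durbin_watson(slow), 2))
print(round(durbin_watson(alt), 2))
```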

Testing the Existence of Autocorrelation, Example

Example 19.3 (Xm19-03)
–How does the weather affect the sales of lift tickets at a ski resort?
–Data on ticket sales for the past 20 years, along with the total snowfall and the average temperature during Christmas week in each year, were collected.
–The model hypothesized was TICKETS = β0 + β1·SNOWFALL + β2·TEMPERATURE + ε.
–Regression analysis yielded the following results: