1 4. Multiple Regression I ECON 251 Research Methods.


1 4. Multiple Regression I ECON 251 Research Methods

2 Basic Multiple Regression Model
- In this section, we extend the simple linear regression model and allow for any number ($k$) of independent variables. This should yield a better model in most cases.
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
  Here $y$ is the dependent variable, $x_1, \dots, x_k$ are the independent variables, $\beta_0, \dots, \beta_k$ are the coefficients, and $\varepsilon$ is the random error variable.
- We add Adjusted $R^2$ to our model assessment tools.
- Because of the complexity of the calculations, we will rely exclusively on the computer to do our model estimation.
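As a rough illustration of what the computer does here, the sketch below fits a model with two independent variables by least squares and reports $R^2$ and Adjusted $R^2$. The data are synthetic, and the helper name `fit_ols` is made up for this example; it is not taken from the course software.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Returns the estimated coefficients, R-squared, and adjusted R-squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])   # column of ones for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sse = float(residuals @ residuals)          # unexplained variation
    sst = float(((y - y.mean()) ** 2).sum())    # total variation
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))
    return coef, r2, adj_r2

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

coef, r2, adj_r2 = fit_ols(X, y)
print("coefficients:", coef)
print("R2:", r2, "adjusted R2:", adj_r2)
```

Adjusted $R^2$ penalizes each extra independent variable through the degrees of freedom $n - k - 1$, so it never exceeds $R^2$.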

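3 Basic Multiple Regression Model
The simple linear regression model allows for one independent variable, "x": $y = \beta_0 + \beta_1 x + \varepsilon$
The multiple linear regression model allows for more than one independent variable: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Note how the straight line $y = \beta_0 + \beta_1 x$ becomes a plane, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.
[Figure: the fitted line plotted against X, and the fitted plane plotted against X1 and X2, with Y on the vertical axis]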

4 Regression Diagnostics
- One of the most important aspects of regression analysis is verifying that our results are not being impacted by assumption violations or "other dangers." That is why we return to this important topic. In this section, we will be looking for solutions to instances where we encounter problems.
- Recall our list of "Assumption Violations & Other Dangers":
The error term ($\varepsilon$) is properly distributed, which means:
1. The probability distribution of $\varepsilon$ is normal, with a mean of 0.
2. The standard deviation of $\varepsilon$ is $\sigma_\varepsilon$ for all values of x.
3. The set of errors associated with different values of y are all independent.
Other assumptions that, when violated, can threaten the usefulness of your results include:
4. No unnecessary outliers
5. No serious multicollinearity

5 Assumptions #1 and #2 – Remedying Violations
- We discussed both assumptions in the last section, as well as how to detect violations of them by visual inspection of graphs.
- Nonnormality or heteroscedasticity can be remedied using transformations on the y variable.
- The transformations can improve the linear relationship between the dependent variable and the independent variables.
- Many computer software systems allow us to make the transformations easily.

6 A brief list of transformations
» $y' = \ln y$ (for $y > 0$)
  - Use when $\sigma_\varepsilon$ increases with $y$, or
  - Use when the error distribution is positively skewed
» $y' = y^2$
  - Use when $\sigma^2_\varepsilon$ is proportional to $E(y)$, or
  - Use when the error distribution is negatively skewed
» $y' = y^{1/2}$ (for $y > 0$)
  - Use when $\sigma^2_\varepsilon$ is proportional to $E(y)$
» $y' = 1/y$
  - Use when $\sigma^2_\varepsilon$ increases significantly when $y$ increases beyond some value.
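The four transformations in the list can be applied in a line each. The sketch below is only an illustration: the function name `transform_y` and the labels `"log"`, `"square"`, `"sqrt"`, and `"reciprocal"` are invented here, and the domain checks simply mirror the conditions on y stated in the list.

```python
import numpy as np

def transform_y(y, kind):
    """Return the transformed dependent variable y' for one of the listed transformations."""
    y = np.asarray(y, dtype=float)
    if kind == "log":                 # y' = ln y, requires y > 0
        if np.any(y <= 0):
            raise ValueError("ln y requires y > 0")
        return np.log(y)
    if kind == "square":              # y' = y^2
        return np.square(y)
    if kind == "sqrt":                # y' = y^(1/2), requires y > 0
        if np.any(y <= 0):
            raise ValueError("y^(1/2) requires y > 0")
        return np.sqrt(y)
    if kind == "reciprocal":          # y' = 1/y, undefined at y = 0
        if np.any(y == 0):
            raise ValueError("1/y requires y != 0")
        return 1.0 / y
    raise ValueError(f"unknown transformation: {kind}")

scores = [18.0, 23.0]
print(transform_y(scores, "log"))
print(transform_y(scores, "reciprocal"))
```

After transforming, the regression is re-estimated with y' as the dependent variable and the diagnostics are checked again.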

7 Example – Quiz Score
- A statistics professor wanted to know whether the time limit affects the scores on a quiz.
- A random sample of 100 students was split into 5 groups.
- Each student wrote a quiz, but each group was given a different time limit. See data below.
Analyze these results, and include diagnostics.

8 Example – Quiz Score
The model tested: $SCORE = \beta_0 + \beta_1 TIME + \varepsilon$
The errors seem to be _______ distributed.
There is ________ linear relationship between time and score.
This model is ______ and provides a ______ fit.

9 The standard error of estimate seems to __________ with the predicted value of y. Two transformations are used to remedy this problem: 1. y ’ = ln y 2. y ’ = 1/ y Example – Quiz Score

10 Example – Quiz Score
Let us see what happens when a transformation is applied.
The original data, where "Score" is a function of "Time", include points such as (40, 18) and (40, 23).
The modified data, where LnScore is a function of "Time", replace each score by its natural log. For example, (40, 18) becomes (40, 2.89), because Ln 18 = 2.89.

11 Example – Quiz Score
The new regression analysis and the diagnostics are:
The model tested: $LnScore = \beta'_0 + \beta'_1 TIME + \varepsilon'$
Predicted LnScore = $b_0 + b_1$ Time, where $b_0$ and $b_1$ are the estimated coefficients.
This model is _______ and provides a ________ fit.

12 Example – Quiz Score
The errors seem to be _________ distributed.
The standard error still changes with the predicted y, but the change is _______ than before.

13 Example – Quiz Score
- How do we use the modified model to predict?
- Let TIME = 55 minutes. Then predicted LnScore = $b_0 + b_1 \times 55$, where $b_0$ and $b_1$ are the estimated coefficients of the modified model.
- To find the predicted score, take the antilog: predicted score = $e^{LnScore} = e^{b_0 + 55 b_1}$. This is the score we expect if 55 minutes are given for the quiz.
- Find the predicted score if 50 minutes are given for the quiz.
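The prediction steps for the modified model can be sketched as follows. The quiz data are not reproduced in this transcript, so the scores below are simulated and the five time limits (40 to 60 minutes) are assumptions. Only the procedure of fitting LnScore on Time and then taking the antilog follows the example.

```python
import numpy as np

# Simulated stand-in for the quiz data: 5 groups of 20 students (values are hypothetical).
rng = np.random.default_rng(1)
time = np.repeat([40.0, 45.0, 50.0, 55.0, 60.0], 20)
score = np.exp(2.0 + 0.02 * time + rng.normal(scale=0.1, size=time.size))

# Fit LnScore = b0 + b1 * Time by least squares.
ln_score = np.log(score)
slope, intercept = np.polyfit(time, ln_score, 1)   # highest-degree coefficient first

def predict_score(minutes):
    """Predict on the log scale, then take the antilog to return to the score scale."""
    ln_pred = intercept + slope * minutes
    return float(np.exp(ln_pred))

print("predicted score at 55 minutes:", predict_score(55.0))
print("predicted score at 50 minutes:", predict_score(50.0))
```

The antilog step is needed because the fitted equation predicts LnScore, not Score.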

14 Example – Quiz Score

15 Assumption #5 Violation – Serious Multicollinearity
- Multicollinearity exists when independent variables included in the same regression are linearly related to one another.
- Multicollinearity nearly always exists. We will (somewhat arbitrarily) consider it serious if the absolute value of the correlation coefficient between two independent variables exceeds 0.8.
- Example – House Price: A real estate agent believes that a house's selling price can be predicted using the house size, number of bedrooms, and lot size. A random sample of 100 houses was drawn and the data recorded. Analyze the relationship among the four variables.
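The 0.8 rule of thumb can be checked directly from the correlation matrix of the independent variables. The sketch below uses simulated house data (the real sample is not reproduced here), and the function name `serious_collinearity` is invented for the illustration.

```python
import numpy as np

def serious_collinearity(X, names, threshold=0.8):
    """List pairs of independent variables whose |correlation| exceeds the threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged

# Simulated data (an assumption): lot size is built to track house size closely,
# while bedrooms is drawn independently.
rng = np.random.default_rng(2)
h_size = rng.normal(2000.0, 400.0, size=100)
lot_size = 2.5 * h_size + rng.normal(0.0, 300.0, size=100)
bedrooms = rng.integers(2, 6, size=100).astype(float)

X = np.column_stack([bedrooms, h_size, lot_size])
flagged = serious_collinearity(X, ["BEDROOMS", "H-SIZE", "LOTSIZE"])
for first, second, r in flagged:
    print(f"{first} and {second}: r = {r:.2f}")
```

Pairwise correlations only detect relationships between two variables at a time, which is the criterion stated above.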

16 Example – House Price
- The proposed model is $PRICE = \beta_0 + \beta_1 BEDROOMS + \beta_2 H\text{-}SIZE + \beta_3 LOTSIZE + \varepsilon$
- Excel solution: the model is ____, but no variable is significantly related to the selling price!

17 Example – House Price
- However, when regressing the price on each independent variable alone, it is found that each variable is strongly related to the selling price. Multicollinearity is the source of this problem.
- Multicollinearity inflates the standard errors $s_{b_i}$ of the coefficients, bringing the t-statistics closer to zero and insignificance.
- The $\beta$ coefficients cannot be interpreted as "slopes".
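The inflation of $s_{b_i}$ can be seen in a small simulation. In this sketch, two nearly duplicate independent variables are generated, and the standard error of the first coefficient is compared between the model with both variables and the model with that variable alone. The data and the helper name `coefficient_standard_errors` are assumptions for illustration.

```python
import numpy as np

def coefficient_standard_errors(X, y):
    """Least-squares coefficients and their standard errors (intercept first)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    s2 = float(residuals @ residuals) / (n - k - 1)   # squared standard error of estimate
    cov = s2 * np.linalg.inv(design.T @ design)       # estimated covariance of the coefficients
    return coef, np.sqrt(np.diag(cov))

# Simulated data (an assumption): x2 is almost a copy of x1.
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

_, se_both = coefficient_standard_errors(np.column_stack([x1, x2]), y)
_, se_alone = coefficient_standard_errors(x1, y)
print("s_b1 with x1 and x2 in the model:", se_both[1])
print("s_b1 with x1 alone:", se_alone[1])
```

Because each t-statistic is the coefficient divided by its standard error, a larger $s_{b_i}$ pulls the t-statistic toward zero even when the variable matters.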

18 Correcting for Multicollinearity: Get rid of one of the variables that is a duplicate, and re-estimate the model. With this done, and the high R 2 relative to your first model, and the high p-value for “Bedrooms”, estimate the model with only “House Size” as a variable. Example – House Price

19 Note R 2 is nearly as high as original model, but adjusted R 2 is actually higher than before. F-test for overall validity of model is fine, t-test for your independent variable also fine. This is your final model. Now: Estimate sale price for a house with 3 bedrooms, 2000 sq ft of house on a lot of 5,000 sq ft. Compare results of final model with original model. Example – House Price

20 Assumption #3 Violation – Non-Independence of Errors
- This condition is common with time series data.
- When it exists in time series data, it is referred to as autocorrelation.
- Detection:
  run a regression
  save the residuals
  plot the residuals against time
  if you see a pattern, your regression may have an autocorrelation problem
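The detection steps rely on looking at a plot. As a numeric companion to the plot (an addition for illustration, not part of the procedure above), the sketch below computes the correlation between each residual and the one that follows it. The function name `lag1_residual_correlation` and the two residual series are made up.

```python
import numpy as np

def lag1_residual_correlation(residuals):
    """Correlation between consecutive residuals, taken in time order."""
    e = np.asarray(residuals, dtype=float)
    return float(np.corrcoef(e[:-1], e[1:])[0, 1])

# Two artificial residual series (assumptions for illustration).
drifting = np.linspace(-1.0, 1.0, 20)        # neighbours are similar
alternating = np.array([1.0, -1.0] * 10)     # neighbours differ markedly

print("drifting residuals:", lag1_residual_correlation(drifting))
print("alternating residuals:", lag1_residual_correlation(alternating))
```

A value well above zero suggests that consecutive residuals tend to be similar, and a value well below zero suggests that they tend to alternate. Values near zero are what independent errors would typically produce.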

21 Autocorrelation
Positive first order autocorrelation occurs when consecutive residuals tend to be similar.
Negative first order autocorrelation occurs when consecutive residuals tend to differ markedly.
[Figure: y against time and residuals against time, one pair of panels for positive and one for negative first order autocorrelation]

22 Example – Lift Ticket
- How does the weather affect the sales of lift tickets in a ski resort?
- Data on ticket sales for the past 20 years, along with the total snowfall and the average temperature during Christmas week in each year, were collected.
- The model hypothesized was $TICKETS = \beta_0 + \beta_1 SNOWFALL + \beta_2 TEMPERATURE + \varepsilon$
- Regression analysis yielded the following results:

23 The model seems to be very poor: The fit is _______ (R-square=0.12), It is _________ (Signif. F =0.33) No variable is ___________ to Sales Example – Lift Ticket

24 Example – Lift Ticket
[Figures: residuals over time; residuals vs. predicted y; the error distribution]
The errors are ___ independent.
The error variance is constant.

25 Example – Lift Ticket
- The autocorrelation has occurred over time.
- Therefore, a time-dependent variable added to the model may correct the problem.
The modified regression model: $TICKETS = \beta_0 + \beta_1 SNOWFALL + \beta_2 TEMPERATURE + \beta_3 YEARS + \varepsilon$
- Are all the required conditions for this model met?
- How good is the fit of this model?
- Is the model useful?
- Which variables are linearly related to ticket sales and which ones are not?
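The effect of adding a time variable can be sketched on simulated data. The resort's actual figures are not reproduced in this transcript, so every number below is invented. The simulation builds a steady upward drift into ticket sales and compares the model without YEARS to the modified model with YEARS.

```python
import numpy as np

def fit(X, y):
    """Least-squares fit with an intercept; returns coefficients, residuals, and R-squared."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    deviations = y - y.mean()
    r2 = 1.0 - float(residuals @ residuals) / float(deviations @ deviations)
    return coef, residuals, r2

def lag1(e):
    """Correlation between consecutive residuals."""
    return float(np.corrcoef(e[:-1], e[1:])[0, 1])

# Simulated 20 years of data (all values hypothetical).
rng = np.random.default_rng(4)
years = np.arange(1, 21, dtype=float)
snowfall = rng.normal(50.0, 10.0, size=20)
temperature = rng.normal(-2.0, 3.0, size=20)
tickets = (6000.0 + 5.0 * snowfall - 20.0 * temperature
           + 200.0 * years + rng.normal(0.0, 300.0, size=20))

_, resid_without, r2_without = fit(np.column_stack([snowfall, temperature]), tickets)
_, resid_with, r2_with = fit(np.column_stack([snowfall, temperature, years]), tickets)

print("R2 without YEARS:", r2_without, "lag-1 residual correlation:", lag1(resid_without))
print("R2 with YEARS:", r2_with, "lag-1 residual correlation:", lag1(resid_with))
```

In this simulation the drift over time ends up in the residuals of the first model, which is the kind of pattern a residual-versus-time plot would reveal. Once YEARS is included, the drift is explained by the model instead.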

26 The fit of this model is _____: $R^2$ = 0.74.
The model is _____: Significance F = 5.93E-5.
All the required conditions ______ for this model.
TEMPERATURE is ________ related to ticket sales.
SNOWFALL and YEARS ________ related to ticket sales.