Statistics for Business and Economics
Module 2: Regression and time series analysis
Spring 2010
Lecture 6: Multiple Regression Model Building
Priyantha Wijayatunga, Department of Statistics, Umeå University
These materials are adapted from copyrighted lecture slides (© 2009 W.H. Freeman and Company) from the homepage of the book The Practice of Business Statistics: Using Data for Decisions, Second Edition, by Moore, McCabe, Duckworth and Alwan.

Multiple Linear Regression Model Building
Reference: Chapter 11.3 of the book
- Model building
- Models for curved relationships
- Models with categorical explanatory variables
- Variable selection methods

Model Building
- Often we have many explanatory variables, and a model using just a few of them often predicts about as well as a model using all of them.
- We may also find that the reciprocal of a variable is a better choice than the variable itself, or that including the square of a variable improves prediction.
- How can we find a good model? That is the model-building problem.
- Regression modeling can cover a variety of mathematical models: linear relationships, non-linear relationships, and nominal independent variables. It also provides efficient methods for model building.

Earlier:
- Regression: X, Y interval data
- Correlation: X, Y interval or ordinal data
Now:
- The relation between the independent and dependent variables can be non-linear
- Regression: Y interval data; X interval or nominal data

Prices of Homes
Homes for sale in zip code 47904. The response variable is Price.

Price and square feet
Plot of Price versus Square Feet: the relationship is approximately linear, but curves up somewhat for the higher-priced homes.
Note: we excluded 7 homes with Price > $150,000 and SqFt > 1800.

Regression of Price on square feet:
The fitted model is predicted Price = 45,… + 34.32 SqFt (intercept roughly $45,000).
The coefficient for SqFt is highly statistically significant.
Each additional square foot of area raises the selling price by $34.32 on average.
37.3% of the variation in Price is explained by a linear relationship with SqFt.

Models for curved relationships
- The scatterplot suggests the relationship between square feet and price may be slightly curved.
- One simple kind of curved relationship is a quadratic function.
- The model is: y = β₀ + β₁x + β₂x² + ε
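A minimal sketch (not from the slides) of fitting such a quadratic model in Python with statsmodels; the data and the column names price and sqft are hypothetical stand-ins for the home-price variables:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical home-price data standing in for the example on these slides
df = pd.DataFrame({
    "price": [87000, 95000, 110000, 125000, 140000, 99000],
    "sqft":  [900, 1050, 1200, 1400, 1600, 1100],
})
df["sqft2"] = df["sqft"] ** 2               # the quadratic term

X = sm.add_constant(df[["sqft", "sqft2"]])  # adds the intercept column
model = sm.OLS(df["price"], X).fit()

# The t-test on sqft2 in the summary tells us whether curvature is useful
print(model.summary())
```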

Quadratic regression of Price on square feet:
The fitted model (intercept roughly $81,000) includes both SqFt and SqFt2, where SqFt2 is the square of SqFt.
The coefficient for SqFt2 is not statistically significant (P = 0.41).
38.6% of the variation in Price is explained by this model.
We conclude that adding SqFt2 to our model is not helpful.

Polynomial models:
- pth-order polynomial model of one predictor variable: y = β₀ + β₁x + β₂x² + … + βₚxᵖ + ε
- 2nd-order polynomial model of two predictor variables: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + ε
- 2nd-order polynomial model of two predictor variables with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε

Polynomial Models with One Predictor Variable
Multiple regression model: y = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ + ε
Polynomial model of order p: y = β₀ + β₁x + β₂x² + … + βₚxᵖ + ε

Polynomial Models with One Predictor Variable
First-order model (p = 1): y = β₀ + β₁x + ε
Second-order model (p = 2): y = β₀ + β₁x + β₂x² + ε
The parabola opens downward when β₂ < 0 and upward when β₂ > 0.

Third-order model (p = 3): y = β₀ + β₁x + β₂x² + β₃x³ + ε
The shape of the curve depends on the sign of β₃.

Nominal Independent Variables
- In many real-life situations one or more independent variables are nominal.
- Nominal variables are included in a regression model via indicator variables.
- An indicator variable (I), also called a dummy variable, can take one of two values, zero or one:
  I = 1 if the first of the two conditions is met, 0 if the second condition is met.
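As a sketch (not from the slides) of how indicator variables can be created in Python with pandas; the column name repair_kind is hypothetical:

```python
import pandas as pd

# Hypothetical repair data: a nominal variable with two categories
df = pd.DataFrame({"repair_kind": ["electronic", "mechanic", "electronic"]})

# drop_first=True keeps m-1 indicators for a variable with m categories,
# avoiding perfect collinearity with the intercept
dummies = pd.get_dummies(df["repair_kind"], prefix="I", drop_first=True, dtype=int)
print(dummies)  # a single 0/1 column: I_mechanic
```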

Water Filter Service
A water and sewage company provides services on water filter systems. To estimate the cost of a service job, a model for how much time a repair takes is analysed. The following data were collected.

Service | Months since last repair | Kind of repair | Time for repair (hours)
1  | 2 | electronic | 2.9
2  | 6 | mechanic   | 3.0
3  | 8 | electronic | 4.8
4  | 3 | mechanic   | 1.8
5  | 2 | electronic | 2.9
6  | 7 | electronic | 4.9
7  | 9 | mechanic   | 4.2
8  | 8 | mechanic   | 4.8
9  | 4 | electronic | 4.4
10 | 6 | electronic | 4.5

Our model
Y = time for repair (hours)
X₁ = months since last repair
I = 1 if a mechanical repair, 0 if an electronic repair
Model: Y = β₀ + β₁X₁ + β₂I + ε
Note: for a nominal variable with m categories we need m − 1 indicator variables.
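A sketch of how this model could be fitted in Python with the statsmodels formula API, using the ten observations from the table above:

```python
import pandas as pd
import statsmodels.formula.api as smf

# The water filter service data from the table above
df = pd.DataFrame({
    "months":   [2, 6, 8, 3, 2, 7, 9, 8, 4, 6],
    "mechanic": [0, 1, 0, 1, 0, 0, 1, 1, 0, 0],  # I = 1 for a mechanical repair
    "hours":    [2.9, 3.0, 4.8, 1.8, 2.9, 4.9, 4.2, 4.8, 4.4, 4.5],
})

# Y = b0 + b1*months + b2*I + error
fit = smf.ols("hours ~ months + mechanic", data=df).fit()
print(fit.params)  # estimated b0, b1, b2
```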

Now assume that instead of "type of repair" we had a variable for different regions, say A, B, and C.
I₂ = 1 if region A, 0 otherwise
I₃ = 1 if region B, 0 otherwise
Region C is defined by I₂ = 0 and I₃ = 0; it is called the omitted category.

Our model: Y = β₀ + β₁X₁ + β₂I₂ + β₃I₃ + ε

Interpreting Regression Coefficients
In multiple regression a coefficient is interpreted holding the other variables constant.
The interpretations of b₀ and b₁ are as usual.
When X₁ is held constant, Y for region A is on average b₂ hours more than for region C.
When X₁ is held constant, Y for region B is on average b₃ hours more than for region C.

T-tests for the parameters
In our example, one can test
H₀: β₂ = 0 vs. H₁: β₂ ≠ 0
Research hypothesis: the time for service in region A is different from the time for service in region C.
Or, for another hypothesis test:
H₀: β₃ = 0 vs. H₁: β₃ > 0
Research hypothesis: the time for service in region B is longer than the time for service in region C.
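A sketch of both tests in Python, reusing the water filter fit above as a stand-in: the regression summary reports the two-sided p-value directly, and for a one-sided alternative H₁: β > 0 we use the upper tail of the t distribution.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Reuse the water filter data from the sketch above
df = pd.DataFrame({
    "months":   [2, 6, 8, 3, 2, 7, 9, 8, 4, 6],
    "mechanic": [0, 1, 0, 1, 0, 0, 1, 1, 0, 0],
    "hours":    [2.9, 3.0, 4.8, 1.8, 2.9, 4.9, 4.2, 4.8, 4.4, 4.5],
})
fit = smf.ols("hours ~ months + mechanic", data=df).fit()

# Two-sided test H0: beta = 0 vs H1: beta != 0 is reported directly
t_stat = fit.tvalues["mechanic"]
p_two_sided = fit.pvalues["mechanic"]

# One-sided alternative H1: beta > 0 uses the upper tail of the t distribution
p_one_sided = 1 - stats.t.cdf(t_stat, fit.df_resid)
print(t_stat, p_two_sided, p_one_sided)
```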

Example
A used car's price (in thousands) is believed to be related to the number of miles on the odometer (in thousands) and the colour of the car, where the most popular colours are white and silver. In order to build the model, a random sample of 100 used cars that were sold at auctions during the last month was selected.

Price | Odometer | Color
…     | …        | white
…     | …        | white
…     | …        | other
…     | …        | other
…     | …        | silver
…     | …        | silver
…

I₂ = 1 if the colour is white, 0 otherwise
I₃ = 1 if the colour is silver, 0 otherwise
The other colours are defined by I₂ = 0 and I₃ = 0; "other colours" is the omitted category.

Now our data look like this:

Price | Odometer | I₂ | I₃
…

Our model is: Y = β₀ + β₁(Odometer) + β₂I₂ + β₃I₃ + ε

Models with categorical explanatory variables
- The plot of Price vs. Bedrooms appears to show a curved relationship.
- Create a categorical variable Bed3 using "number of bedrooms": Bed3 = 1 if the home has three or more bedrooms and Bed3 = 0 if it does not.
- Bed3 is called an indicator variable.

Price and number of bedrooms:
The fitted model regresses Price on Bed3; the intercept is roughly $75,000.
The coefficient for Bed3 is statistically significant.
19% of the variation in Price is explained by the model, which suggests that Bed3 may be a useful explanatory variable.

FINAL MODEL: price, square feet, and bathrooms.
The fitted model is predicted Price = 59,… + … SqFt + 13,161 B2 + 16,859 Bh,
where B2 and Bh are indicator variables for an extra full bath and an extra half bath, respectively.
57.7% of the variation in Price is explained by this model.
All coefficients for the explanatory variables are statistically significant.

Variable Selection Methods
- Sometimes the effect of one explanatory variable depends upon the value of another explanatory variable. We account for this in a regression model by including interaction terms.
- Modern regression software offers variable selection methods that examine, for example, the R² values for all possible multiple regression models.
- The software then presents us with the models having the highest R² for each number of explanatory variables (see the sketch below).
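A minimal best-subsets sketch (not from the slides), for hypothetical predictors x1, x2, x3: for each subset size it reports the subset with the highest R².

```python
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
# Hypothetical data: three candidate predictors and a response
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["x1", "x2", "x3"])
df["y"] = 2 * df["x1"] - df["x2"] + rng.normal(size=50)

def r_squared(subset):
    """R-squared of the OLS fit using only the given predictors."""
    X = sm.add_constant(df[list(subset)])
    return sm.OLS(df["y"], X).fit().rsquared

predictors = ["x1", "x2", "x3"]
for k in range(1, len(predictors) + 1):
    best = max(combinations(predictors, k), key=r_squared)
    print(k, best, round(r_squared(best), 3))
```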

Polynomial Models with Two Predictors
First-order model: y = β₀ + β₁x₁ + β₂x₂ + ε
The effect of one predictor variable on y is independent of the effect of the other predictor variable on y: for fixed values x₂ = 1, 2, 3 the lines
y = [β₀ + β₂(1)] + β₁x₁, y = [β₀ + β₂(2)] + β₁x₁, y = [β₀ + β₂(3)] + β₁x₁
are parallel.
First-order model, two predictors, with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε
The two variables interact to affect the value of y: for x₂ = 1, 2, 3 the lines
y = [β₀ + β₂(1)] + [β₁ + β₃(1)]x₁, y = [β₀ + β₂(2)] + [β₁ + β₃(2)]x₁, y = [β₀ + β₂(3)] + [β₁ + β₃(3)]x₁
have different slopes.
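A sketch of fitting an interaction model with the statsmodels formula API, on hypothetical variables x1, x2, y: in a formula, x1:x2 adds the product term, and x1*x2 is shorthand for x1 + x2 + x1:x2.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=40), "x2": rng.normal(size=40)})
# Hypothetical response with a genuine interaction effect
df["y"] = 1 + 2 * df["x1"] + 0.5 * df["x2"] + 1.5 * df["x1"] * df["x2"] + rng.normal(size=40)

# y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error
fit = smf.ols("y ~ x1 * x2", data=df).fit()
print(fit.params)  # includes the x1:x2 interaction coefficient
```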

Polynomial Models with Two Predictors
Second-order model: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + ε
Second-order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε
For fixed x₂ = 1, 2, 3 the first model gives curves in x₁ of the same shape; adding the interaction term lets the curves differ.

Interaction
The interaction effect is the effect of the interaction of two variables on the dependent variable: an extra effect that arises through the combination of two independent variables.
Example: X₁ = price of a product, X₂ = cost of advertising the product, Y = sales.

First-order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε
Second-order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε

Example
Suppose that an analyst working for a fast-food restaurant chain has been asked to construct a regression model that will help identify new locations that are likely to be profitable. The analyst believes that annual gross sales (Y) are related to the macro-economic variable "mean annual household income in the neighbourhood" (X₁) and the demographic variable "mean age of children in the neighbourhood" (X₂) in a second-order model with interaction between them:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₁² + β₄X₂² + β₅X₁X₂ + ε
To estimate the model, 25 areas are selected at random; a portion of the data is as follows:

Revenue (Y) in $1000 | Income (X₁) in $1000 | Age (X₂) in years
…

Regression Diagnostics
The conditions required for the model assessment to apply must be checked:
- Is the error variable normally distributed? Draw a histogram of the residuals.
- Is the error variance constant? Plot the residuals versus ŷ.
- Are the errors independent? Plot the residuals versus the time periods.
- Can we identify outliers?
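A sketch of these three residual plots in Python, on hypothetical data (any fitted statsmodels result would do in place of fit):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=60)})
df["y"] = 3 + 2 * df["x"] + rng.normal(size=60)   # hypothetical data
fit = smf.ols("y ~ x", data=df).fit()

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))
ax1.hist(fit.resid)                           # is the error variable normal?
ax1.set_title("Histogram of residuals")
ax2.scatter(fit.fittedvalues, fit.resid)      # is the error variance constant?
ax2.set_title("Residuals vs fitted")
ax3.plot(fit.resid.values, marker="o")        # are the errors independent?
ax3.set_title("Residuals vs observation order")
plt.tight_layout()
plt.show()
```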

Multicollinearity (collinearity or intercorrelation)
Two or more independent variables are highly correlated.
Example: Y = return, with X₁ = share price, X₂ = interest, X₃ = inflation.
The F-test of the analysis of variance is not affected, but the individual coefficient estimates and their t-tests can become unreliable.
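Variance inflation factors are one standard numeric check for multicollinearity; a sketch with statsmodels, on hypothetical predictors where x2 is nearly collinear with x1:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=100),  # nearly collinear with x1
    "x3": rng.normal(size=100),                  # unrelated predictor
})

X = sm.add_constant(df)
# VIF above about 10 is a common rule of thumb for serious multicollinearity
for i, name in enumerate(df.columns, start=1):   # skip the constant column
    print(name, variance_inflation_factor(X.values, i))
```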

Estimated model

Remedying Violations of the Required Conditions of Regression
- Non-normality or heteroscedasticity can be remedied by transformations of the y variable.
- Transformations can also improve the linear relationship between the dependent variable and the independent variables.
- Most statistical software makes such transformations easy.

Transformation: Y* = log Y

Reducing Non-normality by Transformations
A brief list of transformations:
1. y* = log y (for y > 0): use when σ²ε increases with y, or when the error distribution is positively skewed.
2. y* = y²: use when σ²ε is proportional to E(y), or when the error distribution is negatively skewed.
3. y* = y^(1/2) (for y > 0): use when σ²ε is proportional to E(y).
4. y* = 1/y: use when σ²ε increases significantly when y increases beyond some critical value.
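A sketch of the first transformation in Python, on hypothetical data whose spread grows with y; fitting log(y) instead of y addresses both the heteroscedasticity and the positive skew:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"x": rng.uniform(1, 10, size=80)})
# Multiplicative errors: the spread of y grows with its level
df["y"] = np.exp(0.5 + 0.3 * df["x"]) * rng.lognormal(sigma=0.2, size=80)

# Fit the model on the transformed scale y* = log y
fit = smf.ols("np.log(y) ~ x", data=df).fit()
print(fit.params)  # coefficients on the log scale
```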