Lecture 4 Introduction to Multiple Regression


Learning Objectives
In this chapter, you learn:
- How to develop a multiple regression model
- How to interpret the regression coefficients
- How to determine which independent variables to include in the regression model
- How to use categorical variables in a regression model

The Multiple Regression Model
Idea: examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi).
Multiple regression model with k independent variables:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

where β0 is the Y-intercept, β1 through βk are the population slopes, and ε is the random error.

Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data. Multiple regression equation with k independent variables:

Ŷ = b0 + b1X1 + b2X2 + … + bkXk

where Ŷ is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1 through bk are the estimated slope coefficients. In this chapter we will use Excel to obtain the regression slope coefficients and other regression summary measures.
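The estimation step can be sketched in code. Below is a minimal illustration of the ordinary least squares criterion using NumPy; the data and the "true" coefficients are made up for the demonstration and are not the lecture's pie-sales data:

```python
import numpy as np

# Design matrix: a column of 1s for the intercept, then two X columns.
# Values are illustrative only.
X = np.array([
    [1.0, 5.5, 3.3],
    [1.0, 7.5, 3.0],
    [1.0, 8.0, 4.5],
    [1.0, 6.8, 3.0],
    [1.0, 4.5, 3.7],
])

# Build Y from known coefficients so the fit is exact (no error term).
b_true = np.array([300.0, -25.0, 74.0])
Y = X @ b_true

# np.linalg.lstsq minimizes ||Xb - Y||^2, the least-squares criterion
# that regression software (Excel, MegaStat) also uses.
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(b_hat, 3))  # recovers b0 = 300, b1 = -25, b2 = 74
```

With real data the fit is not exact, and the estimates come with standard errors, which the MegaStat output later in the lecture reports.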

Example: 2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
- Dependent variable: pie sales (units per week)
- Independent variables: price (in $), advertising (in $100s)
Data are collected for 15 weeks.

Pie Sales Example
Multiple regression equation: Sales = b0 + b1(Price) + b2(Advertising)

Week   Pie Sales   Price ($)   Advertising ($100s)
  1       350        5.50            3.3
  2       460        7.50
  3                  8.00            3.0
  4       430                        4.5
  5                  6.80
  6       380                        4.0
  7                  4.50
  8       470        6.40            3.7
  9       450        7.00            3.5
 10       490        5.00
 11       340        7.20
 12       300        7.90            3.2
 13       440        5.90
 14
 15                                  2.7

MegaStat Multiple Regression Output

R²            0.521
Adjusted R²   0.442
R             0.722
Std. Error   47.463
Dep. Var.    Sales

ANOVA table
Source       SS            df   MS            F      p-value
Regression   29,460.0269    2   14,730.0134   6.54   .0120
Residual     27,033.3065   12    2,252.7755
Total        56,493.3333   14

Regression output
variables             coefficients   std. error   t (df=12)   p-value   95% lower   95% upper
Intercept               306.5262
Price ($)               -24.9751      10.8321      -2.306      .0398     -48.5763     -1.3739
Advertising ($100s)      74.1310      25.9673       2.855      .0145      17.5530    130.7089

52.1% of the variation in pie sales is explained by the variation in price and advertising (only 44.2% is explained after adjusting for sample size and the number of variables). A typical error in predicting sales from price and advertising is 47.463 pies.

The Multiple Regression Equation

Sales = 306.526 − 24.975(Price) + 74.131(Advertising)

where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.
b1 = −24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, holding constant the amount of advertising.
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, holding constant the price.

Using the Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350. Note that Advertising is in $100s, so $350 means X2 = 3.5:

Sales = 306.526 − 24.975(5.50) + 74.131(3.5) = 428.62

Predicted sales is 428.62 pies.
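The arithmetic behind this prediction, using the coefficients from the MegaStat output:

```python
# Coefficients from the MegaStat output shown earlier.
b0, b1, b2 = 306.5262, -24.9751, 74.1310

price = 5.50       # selling price in $
advertising = 3.5  # advertising in $100s ($350 -> X2 = 3.5)

predicted_sales = b0 + b1 * price + b2 * advertising
print(round(predicted_sales, 2))  # 428.62 pies
```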

Predictions in Excel using MegaStat

Predictions in MegaStat (continued)

Predicted values for: Sales

Price ($)   Advertising ($100s)   Predicted   95% CI lower   95% CI upper   95% PI lower   95% PI upper
  5.5            3.5               428.622      391.118        466.125        318.617        538.626
  7.0            4.5               465.290      401.301        529.279        343.680        586.900

We can be 95% confident that if price is set at $5.50 and $350 is spent on advertising, then between 319 and 539 pies will be sold (the prediction interval).

Adjusted R²
- R² never decreases when a new X variable is added to the model; this can be a disadvantage when comparing models.
- Adjusted R² tells you the percentage of variability in Y explained by the equation after adjusting for the number of variables in the equation.
- Penalizes excessive use of unimportant independent variables.
- Smaller than R².
- Most useful when deciding between models with different numbers of variables.
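The adjustment can be verified directly from the pie-sales numbers (R² = 0.521, n = 15 weeks, k = 2 independent variables) using the standard adjusted-R² formula:

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r2, n, k = 0.521, 15, 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))  # 0.441; MegaStat reports 0.442 from the unrounded R^2
```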

Is the Model Significant?
F test for overall significance of the model: shows whether there is a linear relationship between all of the X variables considered together and Y. Use the F-test statistic to obtain a p-value.
Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
Ha: at least one βi ≠ 0 (at least one independent variable affects Y)

F Test for Overall Significance in Excel (continued)

ANOVA table
Source       SS            df   MS            F      p-value
Regression   29,460.0269    2   14,730.0134   6.54   .0120
Residual     27,033.3065   12    2,252.7755
Total        56,493.3333   14
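The F statistic in the table is the ratio of the two mean squares, each a sum of squares divided by its degrees of freedom:

```python
# Values from the ANOVA table above.
ss_regression, df_regression = 29460.0269, 2
ss_residual, df_residual = 27033.3065, 12

msr = ss_regression / df_regression  # regression mean square
mse = ss_residual / df_residual      # residual mean square
f_stat = msr / mse
print(round(f_stat, 2))  # 6.54, matching the table (p-value .0120)
```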

F Test for Overall Significance (continued)
H0: β1 = β2 = 0
Ha: β1 and β2 not both zero
α = .05, p-value = .012
Decision: since p = .012 < α = .05, reject H0; we can be 98.8% confident in Ha and conclude that the slopes are not both 0.
Conclusion: there is evidence that at least one independent variable affects Y.

Are Individual Variables Significant?
Obtain p-values for the individual variable slopes to see whether there is a linear relationship between each variable Xj and Y, holding constant the effects of the other X variables.
Hypotheses:
H0: βj = 0 (no linear relationship)
Ha: βj ≠ 0 (linear relationship does exist between Xj and Y)

Are Individual Variables Significant? (continued)
H0: βj = 0 (no linear relationship)
Ha: βj ≠ 0 (linear relationship does exist between Xj and Y)

Regression output
variables             coefficients   std. error   t (df=12)   p-value
Intercept               306.5262
Price ($)               -24.9751      10.8321      -2.306      .0398
Advertising ($100s)      74.1310      25.9673       2.855      .0145
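Each t statistic in the output is simply the coefficient divided by its standard error:

```python
# Coefficient / standard error, values from the regression output above.
t_price = -24.9751 / 10.8321
t_advertising = 74.1310 / 25.9673

print(round(t_price, 3))        # -2.306
print(round(t_advertising, 3))  # 2.855
```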

Inferences about the Slope: t Test Example
H0: βj = 0; Ha: βj ≠ 0
From the output: for Price, p-value = .0398; for Advertising, p-value = .0145; α = .05.
We can be 96.02% confident that price is related to sales holding advertising constant, and 98.55% confident that advertising is related to sales holding price constant.
Conclusion: there is evidence that both Price and Advertising affect pie sales at α = .05.

Confidence Interval Estimate for the Slope
Confidence interval for the population slope βj: bj ± t(α/2, df) × SE(bj)

MegaStat regression output
variables             coeff       std. error   95% lower   95% upper
Intercept             306.5262
Price ($)             -24.9751     10.8321     -48.5763     -1.3739
Advertising ($100s)    74.1310     25.9673      17.5530    130.7089

Weekly sales are reduced by between 1.37 and 48.58 pies for each $1 increase in the selling price, holding advertising constant. Weekly sales are increased by between 17.6 and 130.7 pies for each $100 increase in advertising, holding price constant.
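The interval endpoints can be reproduced as coefficient ± t × SE, where t ≈ 2.1788 is the .975 quantile of the t distribution with 12 degrees of freedom (hard-coded here rather than looked up from a library):

```python
t_crit = 2.1788  # t(.975, df=12), rounded to 4 decimals

ci = {}
for name, coef, se in [("Price", -24.9751, 10.8321),
                       ("Advertising", 74.1310, 25.9673)]:
    ci[name] = (coef - t_crit * se, coef + t_crit * se)
    print(f"{name}: {ci[name][0]:.2f} to {ci[name][1]:.2f}")
# Price: -48.58 to -1.37; Advertising: 17.55 to 130.71
```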

Multiple Regression Assumptions
Errors (residuals) from the regression model: ei = Yi − Ŷi
Assumptions (same as for simple regression):
- The equation is linear in all the X's
- The errors have constant variability
- The errors are normally distributed
- The errors are independent over time

Residual Plots Used in Multiple Regression
These residual plots are used in multiple regression:
- Residuals vs. X1: check linearity (polynomial R² > 0.2?)
- Residuals vs. X2: check linearity (polynomial R² > 0.2?)
- Absolute residuals vs. predicted Y: check constant variability (linear R² > 0.2?)
- Residuals: check normality (normal probability plot, or skewness/kurtosis beyond ±1?)
- Residuals vs. time (if time-series data): check independence (Durbin-Watson < 1.3?)
Use the residual plots and these statistics to check for violations of the regression assumptions, as in Lecture 3.

Using Dummy Variables
A dummy variable is a categorical independent variable with two levels (yes/no, male/female, before/after merger), coded as 0 or 1. Using one assumes the slopes associated with the numerical independent variables do not change with the value of the categorical variable. If a categorical variable has more than two levels, the number of dummy variables needed is (number of levels − 1).
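The coding rule (m levels → m − 1 dummy columns, with the first level as the baseline) can be sketched as follows; the helper function and level names are illustrative, not from a particular library:

```python
def dummy_code(values, levels):
    """One 0/1 column per level except the first, which is the baseline."""
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]

# Two levels -> one dummy column (holiday = 1, no holiday = 0).
print(dummy_code(["no_holiday", "holiday", "no_holiday"],
                 ["no_holiday", "holiday"]))  # [[0], [1], [0]]

# Three levels -> two dummy columns (baseline: winter).
print(dummy_code(["winter", "summer", "spring"],
                 ["winter", "spring", "summer"]))  # [[0, 0], [0, 1], [1, 0]]
```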

Dummy-Variable Example
Let:
Y = pie sales
X1 = price
X2 = holiday (X2 = 1 if a holiday occurred during the week; X2 = 0 if there was no holiday that week)

Dummy-Variable Example (continued)
The two regression lines (sales vs. price) have the same slope but different intercepts: b0 + b2 for holiday weeks (X2 = 1) and b0 for non-holiday weeks (X2 = 0). If H0: β2 = 0 is rejected, then "Holiday" has a significant effect on pie sales.

Interpreting the Dummy Variable Coefficient
Example: Sales = number of pies sold per week; Price = pie price in $; Holiday = 1 if a holiday occurred during the week, 0 if not.
b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.

Interaction Between Independent Variables
Hypothesizes interaction between pairs of X variables: the response to one X variable may vary at different levels of another X variable. The model contains a cross-product term:

Y = β0 + β1X1 + β2X2 + β3X1X2 + ε

Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc.

Effect of Interaction
Given Y = β0 + β1X1 + β2X2 + β3X1X2 + ε:
- Without the interaction term, the effect of X1 on Y is measured by β1
- With the interaction term, the effect of X1 on Y is measured by β1 + β3X2, so the effect changes as X2 changes

Interaction Example
Suppose X2 is a dummy variable and the estimated regression equation is Ŷ = 1 + 2X1 + 3X2 + 4X1X2.
X2 = 0: Ŷ = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
X2 = 1: Ŷ = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
The slopes are different: the effect of X1 on Y depends on the value of X2.
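The two lines can be checked numerically from the estimated equation:

```python
# Y-hat = 1 + 2*X1 + 3*X2 + 4*X1*X2, with X2 a 0/1 dummy.
def y_hat(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# X2 = 0: intercept 1, slope 2.
print(y_hat(0, 0), y_hat(1, 0))  # 1 3
# X2 = 1: intercept 4, slope 6 -- the interaction changes the slope.
print(y_hat(0, 1), y_hat(1, 1))  # 4 10
```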

Collinearity
Collinearity: high correlation exists among two or more independent variables. This means the correlated variables contribute redundant information to the multiple regression model.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc.

Collinearity (continued)
Including two highly correlated independent variables can adversely affect the regression results:
- No new information is provided
- Can lead to unstable coefficients (large standard errors and low t-values)
- Coefficient signs may not match prior expectations

Some Indications of Strong Collinearity
- Incorrect signs on the coefficients
- Large change in the value of a previous coefficient when a new variable is added to the model
- A previously significant variable becomes non-significant when a new independent variable is added
- The estimate of the standard deviation of the model increases when a variable is added

Detecting Collinearity (Variance Inflationary Factor)
VIFj is used to measure collinearity:

VIFj = 1 / (1 − R²j)

where R²j is the coefficient of determination from regressing variable Xj on all the other X variables. If VIFj > 5, Xj is highly correlated with the other independent variables.
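The definition can be sketched in code: regress each Xj on the remaining X's, take that fit's R²j, and report 1/(1 − R²j). The data here are simulated to show one nearly collinear pair:

```python
import numpy as np

def vif(X):
    """Variance inflationary factor for each column of X (no intercept column)."""
    n, p = X.shape
    factors = []
    for j in range(p):
        # Regress column j on a constant plus the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        y = X[:, j]
        b, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ b
        r2_j = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        factors.append(1 / (1 - r2_j))
    return factors

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)             # unrelated to x1: VIF near 1
x3 = x1 + 0.1 * rng.normal(size=50)  # nearly collinear with x1: VIF far above 5
print(np.round(vif(np.column_stack([x1, x2, x3])), 2))
```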

Example: Pie Sales
Recall the multiple regression equation: Sales = b0 + b1(Price) + b2(Advertising), fitted to the 15 weeks of pie-sales data shown earlier.

Detecting Collinearity in Excel using MegaStat
MegaStat / Regression Analysis: check the "variance inflationary factor (VIF)" box.
Output for the pie sales example: since there are only two independent variables, only one VIF is reported. The VIF is < 5, so there is no evidence of collinearity between Price and Advertising.

Lecture 4 Summary
- Developed the multiple regression model
- Discussed interpreting slopes (holding other variables constant)
- Tested the significance of the multiple regression model and the individual coefficients (slopes)
- Discussed adjusted R²
- Discussed using residual plots to check model assumptions
- Used dummy variables to represent categorical variables
- Looked for interactions between variables
- Discussed possible problems with collinearity