Chapter 12: Multiple Regression: Predicting One Factor from Several Others
Irwin/McGraw-Hill, © Andrew F. Siegel, 1997 and 2000


Multiple Regression
• Predicting a single Y variable from two or more X variables
  - Describe and understand the relationship: understand the effect of one X variable while holding the others fixed
  - Forecast (predict) a new observation: lets you use all available information (the X variables) to find out about what you don't know (the Y variable for the new situation)
  - Adjust and control a process, because the regression equation (you hope) tells you what would happen if you made a change

Input Data
• n cases (elementary units)
• k explanatory X variables
• Each case (1, 2, …, n) records one value of Y (the dependent variable to be explained) together with one value of each of X1 (the first independent or explanatory variable) through Xk (the last independent or explanatory variable)

Results
• Intercept: a. The predicted value for Y when every X is 0
• Regression coefficients: b1, b2, …, bk. The effect of each X on Y, holding all other X variables constant
• Prediction equation or regression equation: (Predicted Y) = a + b1X1 + b2X2 + … + bkXk. The predicted Y, given the values for all X variables
• Prediction errors or residuals: (Actual Y) − (Predicted Y)
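These quantities are what standard regression software reports. A minimal sketch in Python, assuming hypothetical data and the statsmodels library (not part of the original slides):

import numpy as np
import statsmodels.api as sm

# Hypothetical data: n = 55 cases, k = 3 explanatory variables.
rng = np.random.default_rng(0)
n, k = 55, 3
X = rng.normal(size=(n, k))
y = 5 + X @ np.array([3.0, -1.0, 0.5]) + rng.normal(scale=2.0, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()  # least squares fit
a = model.params[0]                # intercept a
b = model.params[1:]               # regression coefficients b1, ..., bk
predicted = model.fittedvalues     # a + b1*X1 + ... + bk*Xk for each case
residuals = model.resid            # actual Y minus predicted Y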

Results (continued)
• Standard error of estimate: Se or S. The approximate size of the errors made when predicting Y
• Coefficient of determination: R². The percentage of variability in Y explained by the X variables as a group
• F test: significant or not significant. Tests whether the X variables, as a group, can predict Y better than just randomly
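Continuing the sketch above, both summary measures can be read off the fitted model; Se is the square root of the sum of squared residuals divided by n − k − 1:

s_e = np.sqrt(model.mse_resid)   # standard error of estimate Se
r2 = model.rsquared              # coefficient of determination R^2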

Results (continued)
• t tests for individual regression coefficients: significant or not significant, for each X variable. Tests whether a particular X variable has an effect on Y, holding the other X variables constant. Should be performed only if the F test is significant
• Standard errors of the regression coefficients (with n − k − 1 degrees of freedom): indicate the estimated sampling standard deviation of each regression coefficient. Used in the usual way to find confidence intervals and hypothesis tests for individual regression coefficients
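Continuing the sketch, the coefficient-level quantities come from the same fitted model:

se_b = model.bse                 # standard errors of a, b1, ..., bk
t_stats = model.tvalues          # t statistic: coefficient / standard error
p_vals = model.pvalues           # two-sided p-values
ci = model.conf_int(alpha=0.05)  # 95% intervals, using t with n - k - 1 df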

Example: Magazine Ads
• Input data, used to predict the cost of ads from magazine characteristics:

Magazine       Y: Page Costs   X1: Audience   X2: Percent   X3: Median
               (color ad)      (thousands)    Male          Income
Audubon        $25,315         1,645          51.1          $38,787
Better Homes   …               34,797         …             41,…
…              …               …              …             …
YM             …,270           3,109          …             …,696

Example: Prediction, Intercept a
• Predicted Page Costs = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
• Intercept a = $4,043: essentially a base rate, representing the cost of advertising in a magazine that has no audience, no male readers, and zero income level
• But there are no such magazines; the intercept a is merely there to help achieve the best predictions

Example: Coefficient b1
• Predicted Page Costs = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
• Regression coefficient b1 = 3.79. All else equal:
  - The effect of Audience on Page Costs, while holding Percent Male and Median Income constant
  - The effect of Audience on Page Costs, adjusted for Percent Male and Median Income
• On average, Page Costs are estimated to be $3.79 higher for a magazine with one more (thousand) Audience, as compared to another magazine with the same Percent Male and Median Income

Example: Coefficient b2
• Predicted Page Costs = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
• Regression coefficient b2 = −124. All else equal:
  - The effect of Percent Male on Page Costs, while holding Audience and Median Income constant
  - The effect of Percent Male on Page Costs, adjusted for Audience and Median Income
• On average, Page Costs are estimated to be $124 lower for a magazine with one more percentage point of male readers, as compared to another magazine with the same Audience and Median Income
• But don't believe it! We will see that it is not significant

Example: Coefficient b3
• Predicted Page Costs = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
• Regression coefficient b3 = 0.903. All else equal:
  - The effect of Median Income on Page Costs, while holding Audience and Percent Male constant
  - The effect of Median Income on Page Costs, adjusted for Audience and Percent Male
• On average, Page Costs are estimated to be $0.903 higher for a magazine with one more dollar of Median Income, as compared to another magazine with the same Audience and Percent Male

Example: Prediction and Residual
• Predicted Page Costs for Audubon = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(1,645) − 124(51.1) + 0.903(38,787) = $38,966
• Actual Page Costs are $25,315
• Residual = Actual − Predicted = $25,315 − $38,966 = −$13,651
• Audubon has Page Costs $13,651 lower than you would expect for a magazine with its characteristics (Audience, Percent Male, and Median Income)
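The arithmetic can be checked directly; a minimal sketch, plugging the coefficients above into the prediction equation:

# Audubon's predicted Page Costs and residual, by direct arithmetic.
a, b1, b2, b3 = 4043, 3.79, -124, 0.903
predicted = a + b1 * 1645 + b2 * 51.1 + b3 * 38787
residual = 25315 - predicted     # actual minus predicted
print(round(predicted))          # 38966
print(round(residual))           # -13651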

Example: Standard Error
• Standard error of estimate Se: indicates the approximate size of the prediction errors. About how far are the Y values from their predictions?
• For the magazine data, Se = S = $21,578: actual Page Costs are about $21,578 from their predictions for this group of magazines (using regression)
• Compare to SY = $45,446: actual Page Costs are about $45,446 from their average (not using regression)
• Using the regression equation to predict Page Costs, instead of simply using the average, reduces the typical error from $45,446 to $21,578

Example: Coefficient of Determination
• Coefficient of determination R²: indicates the percentage of the variation in Y that is explained by (or attributed to) all of the X variables. How well do the X variables explain Y?
• For the magazine data, R² = 0.787 = 78.7%: the X variables (Audience, Percent Male, and Median Income) taken together explain 78.7% of the variance of Page Costs
• This leaves 100% − 78.7% = 21.3% of the variation in Page Costs unexplained

Multiple Regression Linear Model
• Linear model for the population:
  Y = (α + β1X1 + β2X2 + … + βkXk) + ε = (population relationship) + randomness
  where ε has a normal distribution with mean 0 and constant standard deviation σ, and this randomness is independent from one case to another
• An assumption needed for statistical inference

Population and Sample Quantities

Quantity                  Population (parameters:   Sample (estimators:
                          fixed and unknown)        random and known)
Intercept or constant     α                         a
Regression coefficients   β1, β2, …, βk             b1, b2, …, bk
Uncertainty in Y          σ                         S or Se

The F Test
• Is the regression significant? Do the X variables, taken together, explain a significant amount of the variation in Y?
• The null hypothesis claims that, in the population, the X variables do not help explain Y; all coefficients are 0:
  H0: β1 = β2 = … = βk = 0
• The research hypothesis claims that, in the population, at least one of the X variables does help explain Y:
  H1: at least one of β1, β2, …, βk ≠ 0

Performing the F Test
• Three equivalent methods for performing the F test; they always give the same result (a code sketch follows this list):
  - Use the p-value: if p < 0.05, the test is significant. Same interpretation as the p-values in Chapter 10
  - Use the R² value: if R² is larger than the value in the R² table, the result is significant. Do the X variables explain more than just randomness?
  - Use the F statistic: if the F statistic is larger than the value in the F table, the result is significant
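A minimal sketch of the p-value route, continuing the statsmodels example from earlier (the R² and F statistic routes use the same fitted quantities, compared against printed tables):

print(model.fvalue, model.f_pvalue)  # F statistic and its p-value
if model.f_pvalue < 0.05:            # p-value method
    print("Significant: the X variables, as a group, help explain Y")
else:
    print("Not significant")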

Example: F Test
• For the magazine data, the X variables (Audience, Percent Male, and Median Income) explain a very highly significant percentage of the variation in Page Costs
• The p-value, listed as 0.000, is very highly significant, since it is less than 0.001
• The R² value, 78.7%, is greater than 27.1% (from the R² table at level 0.1% with n = 55 and k = 3), and is therefore very highly significant
• The F statistic, 62.84, is greater than the value (near 6.171) from the F table at level 0.1%, and is therefore very highly significant

t Tests
• A t test for each regression coefficient, to be used only if the F test is significant; if F is not significant, you should not look at the t tests
• Does the jth X variable have a significant effect on Y, holding the other X variables constant?
• Hypotheses are H0: βj = 0 and H1: βj ≠ 0
• Test using the confidence interval, built from the t table with n − k − 1 degrees of freedom: significant if 0 is not in the interval
• Or use the t statistic: compare it to the t table value with n − k − 1 degrees of freedom
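Continuing the earlier sketch, the confidence-interval version of the decision takes one comparison (here for b2, row 2 of the intervals computed earlier):

lower, upper = ci[2]                      # confidence interval for b2
significant = not (lower <= 0 <= upper)   # significant if 0 is outside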

Example: t Tests
• Testing b1, the coefficient for Audience: b1 = 3.79, t = 13.5, p < 0.001. Audience has a very highly significant effect on Page Costs, after adjusting for Percent Male and Median Income
• Testing b2, the coefficient for Percent Male: b2 = −124, t = −0.90, p > 0.05. Percent Male does not have a significant effect on Page Costs, after adjusting for Audience and Median Income
• Testing b3, the coefficient for Median Income: b3 = 0.903, t = 2.44, p < 0.05. Median Income has a significant effect on Page Costs, after adjusting for Audience and Percent Male

Comparing the X Variables
• Standardized regression coefficients: indicate the relative importance of the information each X variable brings in addition to the others
  - Ordinary regression coefficients are in different units and cannot be compared without standardization
  - Defined as bj(SXj / SY) for the jth X variable; compare the absolute values
• Correlation coefficients: indicate the relative importance of the information each X variable brings, without adjusting for the other X variables
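The standardization puts every coefficient on the same unitless scale; a minimal sketch, continuing the earlier example:

s_x = X.std(axis=0, ddof=1)      # sample std dev of each X variable
s_y = y.std(ddof=1)              # sample std dev of Y
b_std = b * s_x / s_y            # standardized coefficients b_j * S_Xj / S_Y
print(np.abs(b_std))             # compare absolute values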

Problems with Multiple Regression
• Multicollinearity: when some X variables are too similar to one another. The regression might do a good job of explaining and predicting Y, but the t tests might not be significant because no single X variable brings new information beyond the others
• Variable selection: how to choose from a long list of X variables? Too many, and you waste the information in the data; too few, and you risk ignoring useful predictive information
• Model misspecification: perhaps the multiple regression linear model is wrong. Unequal variability? Nonlinearity? Interaction?