Chapter 18: Multiple Regression
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Multiple Regression…
The simple linear regression model was used to analyze how one interval variable (the dependent variable y) is related to one other interval variable (the independent variable x). Multiple regression allows for any number of independent variables. We expect to develop models that fit the data better than a simple linear regression model would.

The Model…
We now assume we have k independent variables potentially related to the one dependent variable. The relationship is represented by the first-order linear model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
where y is the dependent variable, x1, …, xk are the independent variables, β0, β1, …, βk are the coefficients, and ε is the error variable. In the one-variable, two-dimensional case we drew a regression line; here we imagine a response surface.

Required Conditions…
For these regression methods to be valid, the following four conditions for the error variable ε must be met:
• The probability distribution of the error variable ε is normal.
• The mean of the error variable is 0.
• The standard deviation of ε is σε, which is a constant.
• The errors are independent.

Estimating the Coefficients…
The sample regression equation is expressed as
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
We will use computer output to:
Assess the model…
• How well does it fit the data?
• Is it useful?
• Are any required conditions violated?
Employ the model…
• Interpreting the coefficients
• Predictions using the prediction equation
• Estimating the expected value of the dependent variable

Regression Analysis Steps…
1. Use a computer and software to generate the coefficients and the statistics used to assess the model.
2. Diagnose violations of required conditions. If there are problems, attempt to remedy them.
3. Assess the model's fit: standard error of estimate, coefficient of determination, F-test of the analysis of variance.
4. If the standard error of estimate, R², and F are OK, use the model to predict or estimate the expected value of the dependent variable.

Example 18.1 – La Quinta Inns…
Where should La Quinta locate a new motel? Factors influencing profitability:
Factor | Measure
Profitability | operating margin
Competition | # of rooms within a 3-mile radius
Market awareness | distance to the nearest La Quinta inn
Demand generators | offices, higher education
Community | median household income
Physical | distance to downtown
*These measures need to be interval data!

Example 18.1 – La Quinta Inns…
Where should La Quinta locate a new motel? Several possible predictors of profitability were identified, and data were collected. It's believed that operating margin (y) depends on these factors:
x1 = total motel and hotel rooms within a 3-mile radius
x2 = number of miles to the closest competition
x3 = volume of office space in the surrounding community
x4 = college and university student numbers in the community
x5 = median household income in the community
x6 = distance (in miles) to the downtown core

Transformation…
Can we transform these data into a mathematical model of the form
$\text{margin} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_6 x_6 + \varepsilon$
where x1 measures competition (# of rooms), x2 measures market awareness (distance to the nearest alternative), …, and x6 measures the physical factor (distance to downtown)?

Example 18.1…
In Excel: Tools > Data Analysis… > Regression.
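The slides use Excel's Regression tool; for readers working outside Excel, the following is a minimal sketch of the same six-variable least-squares fit in Python with statsmodels. The file name la_quinta.csv and the column names (margin, rooms, nearest, office, enrollment, income, distance) are assumptions for illustration, not the textbook's data file.

```python
# A sketch, not the textbook's Excel procedure: fit the first-order model
# margin = b0 + b1*rooms + ... + b6*distance by least squares.
# File and column names are assumed for illustration.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("la_quinta.csv")   # hypothetical file name
X = sm.add_constant(df[["rooms", "nearest", "office",
                        "enrollment", "income", "distance"]])
y = df["margin"]

model = sm.OLS(y, X).fit()          # least-squares estimates b0, b1, ..., b6
print(model.summary())              # coefficients, s_e, R^2, adjusted R^2, F, t-stats
```

The summary table plays the same role as the Excel regression output referred to throughout the example.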

The Model…
Although we haven't done any assessment of the model yet, at first pass it suggests that increases in the number of miles to the closest competition, office space, student enrollment, and household income will positively affect the operating margin. Likewise, increases in the total number of lodging rooms within a short distance and in the distance from downtown will negatively affect the operating margin.

Model Assessment…
We will assess the model in three ways: the standard error of estimate, the coefficient of determination, and the F-test of the analysis of variance. The standard error of estimate is used "downstream" in the subsequent calculations (though we will restrict our discussion to Excel-based output rather than manual calculations).

Standard Error of Estimate…
In multiple regression, the standard error of estimate is defined as
$s_\varepsilon = \sqrt{\dfrac{SSE}{n - k - 1}}$
where n is the sample size and k is the number of independent variables in the model. We compare this value to the mean value of y. It seems the standard error of estimate is not particularly small relative to that mean. What can we conclude?
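As a check on the formula above, the same quantity can be computed from the residual sum of squares; this sketch assumes `model` and `y` come from the earlier statsmodels fit.

```python
import numpy as np

n = int(model.nobs)               # sample size
k = int(model.df_model)           # number of independent variables
sse = model.ssr                   # statsmodels' name for the residual (error) sum of squares
s_e = np.sqrt(sse / (n - k - 1))  # standard error of estimate
print(s_e, y.mean())              # compare s_e with the mean of y, as the slide suggests
```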

Coefficient of Determination…
Again, the coefficient of determination is defined as
$R^2 = 1 - \dfrac{SSE}{\sum_i (y_i - \bar{y})^2}$
This means that 52.51% of the variation in operating margin is explained by the six independent variables, while 47.49% remains unexplained.

Adjusted R² value…
What's this? The "adjusted" R² is the coefficient of determination adjusted for degrees of freedom. It takes into account the sample size n and the number of independent variables k, and is given by
$R^2_{\text{adj}} = 1 - \dfrac{SSE/(n-k-1)}{\sum_i (y_i - \bar{y})^2/(n-1)} = 1 - (1 - R^2)\dfrac{n-1}{n-k-1}$
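A quick way to see that the adjustment is just a degrees-of-freedom correction; again assuming `model` is the earlier statsmodels fit.

```python
n = int(model.nobs)
k = int(model.df_model)
r2 = model.rsquared
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # equivalent form of the definition above
print(adj_r2, model.rsquared_adj)               # the manual value matches statsmodels' own
```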

Testing the Validity of the Model…
In a multiple regression model (i.e. more than one independent variable), we utilize an analysis of variance technique to test the overall validity of the model. Here's the idea:
H0: β1 = β2 = … = βk = 0
H1: at least one βi is not equal to zero.
If the null hypothesis is true, none of the independent variables is linearly related to y, and so the model is invalid. If at least one βi is not equal to 0, the model does have some validity.

Testing the Validity of the Model…
ANOVA table for regression analysis:
Source of Variation | Degrees of Freedom | Sums of Squares | Mean Squares | F-Statistic
Regression | k | SSR | MSR = SSR/k | F = MSR/MSE
Error | n−k−1 | SSE | MSE = SSE/(n−k−1) |
Total | n−1 | | |
A large value of F indicates that most of the variation in y is explained by the regression equation and that the model is valid. A small value of F indicates that most of the variation in y is unexplained.
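The ANOVA quantities in the table can be reproduced directly from the fitted model; this sketch (assuming `model`, `n`, and `k` from the earlier fit) also computes the critical value that defines the rejection region discussed next.

```python
from scipy import stats

ss_reg = model.ess                         # regression (explained) sum of squares, SSR
sse = model.ssr                            # error sum of squares, SSE
msr = ss_reg / k                           # MSR = SSR / k
mse = sse / (n - k - 1)                    # MSE = SSE / (n - k - 1)
F = msr / mse
f_crit = stats.f.ppf(0.95, k, n - k - 1)   # rejection region: F > f_crit at alpha = 0.05
print(F, model.fvalue, f_crit)             # F matches the value statsmodels reports
```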

Testing the Validity of the Model…
Our rejection region is $F > F_{\alpha, k, n-k-1}$. Since the F statistic Excel calculated exceeds our F critical = 2.17 (and the p-value is essentially zero), we reject H0 in favor of H1; that is, "there is a great deal of evidence to infer that the model is valid."

Table 18.2… Summary
SSE | R² | F | Assessment of Model
0 | 1 | large | Perfect
small | close to 1 | large | Good
large | close to 0 | small | Poor
| 0 | 0 | Useless
Once we're satisfied that the model fits the data as well as possible, and that the required conditions are satisfied, we can interpret and test the individual coefficients and use the model to predict and estimate…

Interpreting the Coefficients*
• Intercept (b0): the average operating margin when all of the independent variables are zero. It's meaningless to try to interpret this value, particularly when 0 is outside the range of the values of the independent variables (as is the case here).
• Number of motel and hotel rooms (b1 = −0.0076): each additional room within three miles of the La Quinta inn decreases the operating margin; i.e. for each additional 1,000 rooms the margin decreases by 7.6%.
• Distance to nearest competitor (b2 = 1.65): for each additional mile between the nearest competitor and a La Quinta inn, the average operating margin increases by 1.65%.
*In each case we assume all other variables are held constant.

Interpreting the Coefficients*
• Office space (b3 = 0.020): for each additional thousand square feet of office space, the margin increases by 0.020%; e.g. an extra 100,000 square feet of office space increases the margin (on average) by 2.0%.
• Student enrollment (b4 = 0.21): for each additional thousand students, the average operating margin increases by 0.21%.
• Median household income (b5 = 0.41): for each additional thousand-dollar increase in median household income, the average operating margin increases by 0.41%.
• Distance to downtown core (b6 = −0.23): for each additional mile to the downtown center, the operating margin decreases on average by 0.23%.
*In each case we assume all other variables are held constant.

Testing the Coefficients…
For each independent variable, we can test whether there is enough evidence of a linear relationship between it and the dependent variable for the entire population:
H0: βi = 0
H1: βi ≠ 0 (for i = 1, 2, …, k)
using $t = \dfrac{b_i - \beta_i}{s_{b_i}}$ as our test statistic (with n − k − 1 degrees of freedom).
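Each t statistic is simply the coefficient divided by its standard error (for the usual hypothesized value βi = 0); a sketch assuming `model`, `n`, and `k` come from the earlier statsmodels fit.

```python
import numpy as np
from scipy import stats

t_stats = model.params / model.bse                     # b_i / s_{b_i}, testing beta_i = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), n - k - 1)  # two-tailed p-values
print(t_stats)
print(p_values)                                        # same values appear in model.summary()
```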

Testing the Coefficients…
We can use our Excel output to quickly test each of the six coefficients in our model. Thus, the number of hotel and motel rooms, distance to the nearest motel, amount of office space, and median household income are linearly related to the operating margin. There is no evidence to infer that college enrollment and distance to the downtown center are linearly related to the operating margin.

Using the Regression Equation…
Much as we did with simple linear regression, we can produce a prediction interval for a particular value of y. As well, we can produce the confidence interval estimate of the expected value of y. Excel's tools will do the work; our role is to set up the problem and to understand and interpret the results.

Using the Regression Equation…
Predict the operating margin if a La Quinta Inn is built at a location where:
• there are 3,815 rooms within 3 miles of the site,
• the closest other hotel or motel is 0.9 miles away,
• the amount of office space is 476,000 square feet,
• there is one college and one university nearby with a total enrollment of 24,500 students,
• census data indicate the median household income in the area (rounded to the nearest thousand) is $35,000, and
• the distance to the downtown center is 11.2 miles.
These are our values of x1 through x6.

Using the Regression Equation…
We add one row (our given values for the independent variables) to the bottom of our data set, then use Tools > Data Analysis Plus > Prediction Interval to crunch the numbers.
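Outside Excel, the same two intervals can be produced with statsmodels' get_prediction; the column names and units (office space and enrollment in thousands, income in thousands of dollars) follow the earlier assumed layout, not the textbook's data file.

```python
import pandas as pd

# One new row holding the site's values of x1..x6, in the same column order
# as the fitted model (constant first). Units follow the coefficient
# interpretations above: office space and enrollment in thousands,
# income in thousands of dollars.
site = pd.DataFrame({"const": [1.0], "rooms": [3815], "nearest": [0.9],
                     "office": [476], "enrollment": [24.5],
                     "income": [35], "distance": [11.2]})

pred = model.get_prediction(site)
print(pred.summary_frame(alpha=0.05))
# obs_ci_lower / obs_ci_upper   -> prediction interval for one site
# mean_ci_lower / mean_ci_upper -> confidence interval for the expected value
```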

Prediction Interval…
We predict that the operating margin will fall between 25.4% and an upper limit that is still below 50%. If management defines a profitable inn as one with an operating margin greater than 50% and an unprofitable inn as one with an operating margin below 30%, they will pass on this site, since the entire prediction interval lies below 50%.

Confidence Interval…
The expected operating margin of all sites that fit this category is estimated to fall in an interval whose lower limit is 33.0%. We interpret this to mean that if we built inns on an infinite number of sites that fit the category described, the mean operating margin would fall within this interval; since the entire interval lies below 50%, the average inn would not be profitable either.

Regression Diagnostics…
Calculate the residuals and check the following (a plotting sketch follows this list):
• Is the error variable nonnormal? Draw the histogram of the residuals.
• Is the error variance constant? Plot the residuals versus the predicted values of y.
• Are the errors independent (time-series data)? Plot the residuals versus the time periods.
• Are there observations that are inaccurate or do not belong to the target population? Double-check the accuracy of outliers and influential observations.
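A minimal matplotlib sketch of the three plots listed above, assuming `model` is the earlier statsmodels fit.

```python
import matplotlib.pyplot as plt

resid = model.resid
fitted = model.fittedvalues

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(resid, bins=15)            # check normality of the errors
axes[0].set_title("Histogram of residuals")
axes[1].scatter(fitted, resid)          # check for constant error variance
axes[1].set_title("Residuals vs predicted")
axes[2].plot(resid.values)              # check independence (observation/time order)
axes[2].set_title("Residuals in time order")
plt.tight_layout()
plt.show()
```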

Regression Diagnostics…
Multiple regression models have a problem that simple regressions do not, namely multicollinearity. It happens when the independent variables are highly correlated. We'll explore this concept through the following example.

Example 18.2…
A real estate agent wanted to develop a model to predict the selling price of a home. The agent believed that the most important variables in determining the price of a house are its size, number of bedrooms, and lot size. The proposed model is
$\text{Price} = \beta_0 + \beta_1(\text{bedrooms}) + \beta_2(\text{house size}) + \beta_3(\text{lot size}) + \varepsilon$
Housing market data have been gathered, and Excel is the analysis tool of choice.

Example 18.2…
Tools > Data Analysis > Regression… The F-test indicates the model is valid, but the t-statistics suggest that none of the variables is related to the selling price.

Example 18.2…
How can the model be valid and fit well when none of the independent variables that make up the model is linearly related to price? To answer this question, we perform a t-test of the coefficient of correlation between each of the independent variables and the dependent variable.

Example 18.2…
Unlike the t-tests in the multiple regression model, these three t-tests tell us that the number of bedrooms, the house size, and the lot size are all linearly related to the price.

Example 18.2…
How do we account for this apparent contradiction? The answer is that the three independent variables are correlated with each other! (This is reasonable: larger houses have more bedrooms and are situated on larger lots, while smaller houses have fewer bedrooms and are located on smaller lots.) Multicollinearity affected the t-tests so that they implied that none of the independent variables is linearly related to price when, in fact, all are.
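One way to see the multicollinearity directly is to compute the pairwise correlations among the predictors; the file and column names below (house_prices.csv, bedrooms, h_size, lot_size) are assumptions for illustration, not the textbook's data file.

```python
import pandas as pd

houses = pd.read_csv("house_prices.csv")   # hypothetical file name
print(houses[["bedrooms", "h_size", "lot_size"]].corr())
# Pairwise correlations close to 1 among the predictors inflate the
# coefficients' standard errors, so the individual t-tests look
# insignificant even though the overall F-test says the model is valid.
```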

Regression Diagnostics – Time Series…
The Durbin-Watson test allows us to determine whether there is evidence of first-order autocorrelation, a condition in which a relationship exists between consecutive residuals e(i−1) and e(i), where i is the time period. The statistic for this test is defined as
$d = \dfrac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
d has a range of values: 0 ≤ d ≤ 4.
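The statistic can be computed directly from its definition; statsmodels also ships the same calculation. A sketch assuming `model` is a fitted time-series regression from an earlier step.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

e = np.asarray(model.resid)
d_manual = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # the definition of d above
print(d_manual, durbin_watson(e))                     # both lie between 0 and 4
```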

Durbin–Watson…
The statistic ranges from 0 to 4. Small values of d (d < 2) indicate positive first-order autocorrelation; large values of d (d > 2) imply negative first-order autocorrelation.

Durbin–Watson… (one-tail test)
To test for positive first-order autocorrelation:
• If d < dL, we conclude that there is enough evidence to show that positive first-order autocorrelation exists.
• If d > dU, we conclude that there is not enough evidence to show that positive first-order autocorrelation exists.
• If dL ≤ d ≤ dU, the test is inconclusive.
(dL and dU come from Table 11, Appendix B.)

Durbin–Watson… (one-tail test)
To test for negative first-order autocorrelation:
• If d > 4 − dL, we conclude that there is enough evidence to show that negative first-order autocorrelation exists.
• If d < 4 − dU, we conclude that there is not enough evidence to show that negative first-order autocorrelation exists.
• If 4 − dU ≤ d ≤ 4 − dL, the test is inconclusive.
(dL and dU come from Table 11, Appendix B.)

Durbin–Watson… (two-tail test)
To test for first-order autocorrelation:
• If d < dL or d > 4 − dL, first-order autocorrelation exists.
• If d falls between dL and dU, or between 4 − dU and 4 − dL, the test is inconclusive.
• If d falls between dU and 4 − dU, there is no evidence of first-order autocorrelation.

Example 18.3…
Can we create a model that will predict lift-ticket sales at a ski hill based on two weather variables?
y = lift-ticket sales during Christmas week
x1 = total snowfall (inches)
x2 = average temperature (degrees Fahrenheit)
Our ski-hill manager collected 20 years of data; let's analyze them now.

Example 18.3…
Both the coefficient of determination and the p-value of the F-test indicate that the model is poor. Neither variable is linearly related to ticket sales.

Example 18.3…
The histogram of the residuals reveals that the errors may be normally distributed.

Example 18.3…
In the plot of residuals versus predicted values (testing for heteroscedasticity), the error variance appears to be constant.

Example 18.3…
Plotting the residuals over time reveals the problem: the errors are not independent!

Example 18.3… Durbin-Watson
Apply the Durbin-Watson statistic from Data Analysis Plus to the entire list of residuals created by the Regression tool.

Example 18.3…
To find the critical values dL and dU, we have n = 20 (twenty years of data) and k = 2 (two variables, snowfall and temperature). Using a 5% significance level and Table 11 of Appendix B, we find dL = 1.10 and dU = 1.54. Since d < dL, we conclude that positive first-order autocorrelation exists.

Example 18.3…
Autocorrelation usually indicates that the model needs to include an independent variable that has a time-ordered effect on the dependent variable, so we create a new variable x3, which takes on the values of the years since the data were gathered (x3 = 1, 2, …, 20). Our new model becomes
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
where y = tickets, x1 = snowfall, x2 = temperature, and x3 = time.
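A sketch of adding the time variable and refitting, assuming the ski-hill data sit in a file and columns named here only for illustration (ski_hill.csv with tickets, snowfall, temperature).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

ski = pd.read_csv("ski_hill.csv")            # hypothetical file name, 20 yearly rows
ski["time"] = np.arange(1, len(ski) + 1)     # x3 = 1, 2, ..., 20
X = sm.add_constant(ski[["snowfall", "temperature", "time"]])
new_model = sm.OLS(ski["tickets"], X).fit()
print(new_model.summary())                   # fit and validity improve
print(durbin_watson(new_model.resid))        # re-check for first-order autocorrelation
```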

Example 18.3…
The fit of the model is high and the model is valid. Snowfall and time (our new variable) are linearly related to ticket sales; temperature is not.

Example 18.3…
If we re-run the Durbin-Watson statistic against the residuals from our new regression analysis, we can conclude that there is not enough evidence to infer the presence of first-order autocorrelation. (Determining dL and dU is left as an exercise for the reader.) Hence, we have improved our model dramatically!