slide 1 DSCI 5340: Predictive Modeling and Business Forecasting Spring 2013 – Dr. Nick Evangelopoulos Lecture 2: Review of Multiple Regression (Ch. 4-5) Material based on: Bowerman-O’Connell-Koehler, Brooks/Cole

slide 2 DSCI 5340 FORECASTING Review of textbook HW: Page 128 Ex 3.12 (Use Excel); Page 128 Ex 3.13, 3.17; Page 132 Ex 3.25; Page 134 Ex 3.35

slide 3 DSCI 5340 FORECASTING Excel Data Analysis add-in: in Excel, make sure the Analysis ToolPak add-in is enabled.

slide 4 DSCI 5340 FORECASTING Ex 3.12 Page 128 Scatter Plot An accountant wishes to predict direct labor cost (y) based on the batch size (x) of a product produced in a job shop. Data for 12 production runs are given.

slide 5 DSCI 5340 FORECASTING Ex 3.13 Page 128 Interpretation of Mean of Y Given X
a. μ_y|x=60 = β0 + β1(60): the average value of y for repeated values of X = 60. This is the point on the regression line predicted for Y at X = 60.
b. μ_y|x=30 = β0 + β1(30): the average value of y for repeated values of X = 30. This is the point on the regression line predicted for Y at X = 30. The distribution of values around X = 30 should be similar to that for X = 60.
c. Interpretation of slope: as the batch size increases by one unit, the direct labor cost increases by β1, the estimated slope in the Excel output.
[Excel SUMMARY OUTPUT for the regression of labor cost on Batch_Size_X: Regression Statistics (Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 12), ANOVA table (df, SS, MS, F, Significance F), and coefficient estimates with standard errors, t statistics, p-values, and 95% confidence limits for the Intercept and Batch_Size_X; the numeric values did not survive the transcript.]
Fitted model: Ŷ = b0 + b1·X, with the coefficients taken from the Excel output.
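The Excel SUMMARY OUTPUT described above can be reproduced in Python; the sketch below is a minimal illustration on synthetic stand-in data (the textbook's 12 production runs are not reproduced here, and the variable names are assumptions).

# Minimal sketch of Ex 3.12-3.13 in Python instead of Excel's Analysis ToolPak.
# The data below are synthetic stand-ins for the textbook's 12 production runs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
batch_size = np.array([5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60], dtype=float)
labor_cost = 18 + 10.1 * batch_size + rng.normal(0, 15, size=batch_size.size)  # synthetic y

X = sm.add_constant(batch_size)            # adds the intercept column (beta_0)
model = sm.OLS(labor_cost, X).fit()

print(model.summary())                     # analogue of Excel's SUMMARY OUTPUT
b0, b1 = model.params
print(f"Fitted line: y-hat = {b0:.2f} + {b1:.2f} * x")
print("Predicted mean labor cost at x = 60:", b0 + b1 * 60)
print("Predicted mean labor cost at x = 30:", b0 + b1 * 30)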

slide 6 DSCI 5340 FORECASTING Intercept β0: the labor cost if the batch size is 0. Theoretically, this cost would be 0, but it can be interpreted as fixed costs. Interpretation of error term: there may be other factors that determine direct labor costs, such as benefits to employees, type of product, number of employees, etc. The model might therefore be more accurate with additional independent variables; their omission is compensated for by the error term in the model. Ex 3.13 Page 128 Interpretation of Model

slide 7 DSCI 5340 FORECASTING Accu-Copiers, Inc., sells and services the Accu-500 copying machine. As part of its standard service contract, the company agrees to perform routine service on the copier. To obtain information about the time it takes to perform routine service, Accu-Copiers has collected data for 11 service calls, shown in Table 3.7 (p. 126). Ex 3.17 Page 128

slide 8 DSCI 5340 FORECASTING EX 3.17 Page 128

slide 9 DSCI 5340 FORECASTING The test for correlation between X and Y, H0: ρ = 0 vs. Ha: ρ ≠ 0, has the same test statistic and p-value as the test for significance of the regression slope coefficient. However, the two tests use different assumptions. EX 3-25 Page 132: Test for correlation
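A quick illustration of this equivalence on synthetic data: the slope t-test from an OLS fit and the Pearson correlation test return the same p-value.

# Sketch: the t-test for the slope and the test for rho = 0 give the same
# test statistic and p-value (synthetic data; variable names are illustrative).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(50, 10, size=30)
y = 5 + 0.8 * x + rng.normal(0, 4, size=30)

fit = sm.OLS(y, sm.add_constant(x)).fit()
r, p_corr = stats.pearsonr(x, y)

print("slope t-stat:", fit.tvalues[1], " p-value:", fit.pvalues[1])
print("correlation r:", r, " p-value:", p_corr)   # same p-value; the assumptions behind the two tests differ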

slide 10 DSCI 5340 FORECASTING EX 3-35 Page 134 A State Department of Taxation asked taxpayers to report the time y (in hours) required to complete a tax form and the number of times x (including this one) the taxpayer has filled out this form.

slide 11 DSCI 5340 FORECASTING EX 3-35 Page 134 To understand this model, note that as x increases, 1/x decreases and thus μ_y|x decreases.
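Assuming, as the slide's wording suggests, a model of the form μ_y|x = β0 + β1(1/x) with β1 > 0, a minimal sketch of fitting it on synthetic data (the textbook's taxpayer data are not reproduced):

# Sketch of the Ex 3-35 reciprocal model, fit on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.integers(1, 11, size=40).astype(float)     # times the form has been filled out (illustrative)
y = 2.0 + 8.0 / x + rng.normal(0, 0.5, size=40)    # hours; falls as x grows

fit = sm.OLS(y, sm.add_constant(1.0 / x)).fit()    # regress y on 1/x, not on x
print(fit.params)                                  # [b0, b1]; with b1 > 0 the fitted mean time decreases in x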

slide 12 DSCI 5340 FORECASTING Multiple Regression Graphically

slide 13 DSCI 5340 FORECASTING The residuals will be denoted êᵢ: êᵢ = yᵢ − ŷᵢ. They represent the distance that each dependent variable value is from the estimated regression line, or the portion of the variation in y that cannot be “explained” with the data available. What assumptions can we test using these residuals? Residuals
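A small sketch, on synthetic data, of computing the residuals êᵢ = yᵢ − ŷᵢ from a fitted line; note that they average to zero, which anticipates Property 1 a few slides below.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 25)
y = 3 + 2 * x + rng.normal(0, 1, 25)

fit = sm.OLS(y, sm.add_constant(x)).fit()
residuals = y - fit.fittedvalues       # e-hat_i = y_i - y-hat_i, identical to fit.resid
print(residuals.mean())                # ~0 by construction of least squares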

slide 14 DSCI 5340 FORECASTING The relationship is linear. The disturbances εᵢ have constant variance σ²ε. The disturbances are independent. The disturbances are normally distributed. What are the Assumptions of Regression Analysis? How can these assumptions be checked? Regression model assumptions

slide 15 DSCI 5340 FORECASTING Graphical Techniques: scatterplots, residual plots, histograms (not an exact science)

slide 16 DSCI 5340 FORECASTING Property 1: The average of the residuals will be equal to zero. This property holds regardless of whether the assumptions are true or not and is a direct result of the way the least-squares method works. Property 2: There should be no systematic pattern in a residual plot. (What is a systematic pattern?) Property 3: Residuals should look like random numbers chosen from a normal distribution. (How close to normality should the chart look?) Properties of residual plots

slide 17 DSCI 5340 FORECASTING In a residual analysis it is suggested that the following plots be used: 1. Plot the residuals versus each explanatory variable. 2. Plot the residuals versus the predicted or fitted values. 3. If the data are measured over time, plot the residuals versus some variable representing the time sequence. Which assumptions can each of these plots support, or indicate a violation of? Residual plots
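A minimal sketch of these three plots on synthetic data (variable names are illustrative):

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(0, 1, n)
fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(x1, fit.resid)                 # 1. residuals vs an explanatory variable (repeat for x2)
axes[0].set_title("residuals vs x1")
axes[1].scatter(fit.fittedvalues, fit.resid)   # 2. residuals vs fitted values
axes[1].set_title("residuals vs fitted")
axes[2].plot(fit.resid)                        # 3. residuals in time order (if the data are a sequence)
axes[2].set_title("residuals vs time order")
plt.tight_layout()
plt.show()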

slide 18 DSCI 5340 FORECASTING Plots may be constructed using the actual residuals, êᵢ, or the standardized residuals. The standardized residuals are simply the residuals divided by their standard deviation. Why do you think standardized residuals are sometimes used instead of regular residuals? Residual plots

slide 19 DSCI 5340 FORECASTING No Violations of the Assumptions of Regression Plot shows random residuals

slide 20 DSCI 5340 FORECASTING Does this Plot Look Like One of the Assumptions of Regression Analysis is Violated?

slide 21 DSCI 5340 FORECASTING PLOT OF RESIDUALS - Standardized values are small.

slide 22 DSCI 5340 FORECASTING The method of least squares estimation chooses the regression coefficient estimates so the error sum of squares, SSE, is a minimum. In doing this, the distances from the true y values, yᵢ, to the points on the regression line or surface, ŷᵢ, are minimized. Least squares thus tries to avoid any large distances from yᵢ to ŷᵢ. Outliers

slide 23 DSCI 5340 FORECASTING OUTLIER: a sample data point whose y value is much different from the y values of the other points in the sample. An outlier is any value whose studentized residual is greater than 2 in absolute value. An outlier does not have to be influential. That is, removing the outlier may not change the regression coefficients very much. Outliers
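A sketch, on synthetic data, of flagging candidate outliers by the |studentized residual| > 2 rule quoted above; statsmodels' "external" studentized residuals are the deleted version described on a later slide.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 30)
y = 2 + 3 * x + rng.normal(0, 1, 30)
y[10] += 8                                             # inject one unusual y value

fit = sm.OLS(y, sm.add_constant(x)).fit()
studentized = OLSInfluence(fit).resid_studentized_external   # deleted (externally studentized) residuals
outliers = np.where(np.abs(studentized) > 2)[0]
print("candidate outliers at rows:", outliers)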

slide 24 DSCI 5340 FORECASTING No influential observations

slide 25 DSCI 5340 FORECASTING A High Leverage Observation That is Not Influential

slide 26 DSCI 5340 FORECASTING The slope of the line appears to be determined almost entirely by this one point. The sixth observation is said to have high leverage and is referred to as a leverage point. What do you think the term “leverage point” means? Leverages

slide 27 DSCI 5340 FORECASTING Another measure sometimes used in place of the standardized residual is the standardized residual computed after deleting the ith observation. This measure is called the studentized residual or studentized deleted residual. (Note that SAS refers to the standardized residual as the studentized residual.) Studentized residuals

slide 28 DSCI 5340 FORECASTING Checking Model Assumptions
Assumption 1 - Normal distribution: construct a histogram of the residuals.
Assumption 2 - Constant variance: plot residuals versus predicted Y values.
Assumption 3 - Errors are independent: Durbin-Watson statistic; plot of errors against time.
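A sketch of these three checks on synthetic data, using statsmodels' Durbin-Watson statistic:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 50)
y = 1 + 2 * x + rng.normal(0, 1, 50)
fit = sm.OLS(y, sm.add_constant(x)).fit()

plt.hist(fit.resid, bins=10)                        # Assumption 1: roughly bell-shaped?
plt.title("Histogram of residuals")
plt.show()

plt.scatter(fit.fittedvalues, fit.resid)            # Assumption 2: roughly constant spread?
plt.axhline(0, color="gray")
plt.title("Residuals vs predicted Y")
plt.show()

print("Durbin-Watson:", durbin_watson(fit.resid))   # Assumption 3: values near 2 suggest independence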

slide 29 DSCI 5340 FORECASTING Detecting Sample Outliers: sample leverages, standardized residuals, Cook's distance measure.
Cook's distance measure: Dᵢ = (standardized residualᵢ)² × [1/(k+1)] × [hᵢ/(1−hᵢ)]
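A sketch, on synthetic data, of computing leverages, standardized residuals, and Cook's distance, and verifying the formula above; here hᵢ is taken to be the leverage of observation i and k the number of predictors (standard notation, assumed for the slide's symbols).

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(8)
k = 2                                               # number of predictors
X_raw = rng.uniform(0, 10, size=(30, k))
y = 1 + X_raw @ np.array([2.0, -1.0]) + rng.normal(0, 1, 30)

fit = sm.OLS(y, sm.add_constant(X_raw)).fit()
infl = OLSInfluence(fit)

h = infl.hat_matrix_diag                            # sample leverages h_i
r = infl.resid_studentized_internal                 # standardized residuals
cooks_d, _ = infl.cooks_distance                    # Cook's distance D_i

manual = r**2 * (1.0 / (k + 1)) * (h / (1 - h))     # formula from the slide
print(np.allclose(cooks_d, manual))                 # should print True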

slide 30 DSCI 5340 FORECASTING Example of An Influential Observation

slide 31 DSCI 5340 FORECASTING If an observation is exerting undue influence on the fit of the model, then from an exploratory and data-mining standpoint, removing the observation may reveal substantial changes in the model. An observation may be miscoded or may not be appropriate for the collected data. No more than 10% of the data should be deleted to improve the model. Should an unusual observation be deleted?

slide 32 DSCI 5340 FORECASTING Dummy Variables
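The slide's illustration did not survive the transcript; below is a minimal, hypothetical sketch of coding a categorical predictor as 0/1 dummy variables, using illustrative column names (sales, price, region) that are not from the textbook.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "sales":  [10, 12, 15, 9, 14, 20, 22, 18],                                  # illustrative values
    "price":  [5.0, 4.8, 4.5, 5.2, 4.6, 4.0, 3.9, 4.2],
    "region": ["East", "East", "West", "East", "West", "West", "East", "West"],
})

# Option 1: explicit dummies (drop_first avoids the dummy-variable trap)
dummies = pd.get_dummies(df["region"], drop_first=True)
print(dummies.head())

# Option 2: let the formula interface create the dummies via C(region)
fit = smf.ols("sales ~ price + C(region)", data=df).fit()
print(fit.params)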

slide 33 DSCI 5340 FORECASTING Test of Null Hypothesis (F-test). Tests the null hypothesis H0: β2 = β3 = ⋯ = βp = 0 against Ha: at least one beta is not zero. The null hypothesis is known as a joint or simultaneous hypothesis, because it compares the values of all βᵢ simultaneously. This tests the overall significance of the regression model: there is an F test for the overall model.
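A minimal illustration on synthetic data: the overall F statistic and its p-value can be read directly off a fitted multiple regression.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
X_raw = rng.normal(size=(40, 3))                   # three predictors
y = 2 + X_raw @ np.array([1.0, 0.0, -0.5]) + rng.normal(0, 1, 40)

fit = sm.OLS(y, sm.add_constant(X_raw)).fit()
print("F statistic:", fit.fvalue)                  # tests all slope coefficients jointly zero
print("p-value    :", fit.f_pvalue)                # a small p-value rejects H0: the model is useful overall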

slide 34 DSCI 5340 FORECASTING Model building: Backward Selection, a "deconstruction" approach.
1. Begin with the saturated (full) regression model.
2. Compute the drop in R² from eliminating each predictor variable, and the corresponding partial F-test value, treating each variable as if it were the last to enter the regression equation.
3. Compare the lowest partial F-test value (designated F_L) to the critical value of F (designated F_C):
a. If F_L < F_C, remove the variable, recompute the regression equation using the remaining predictor variables, and return to step 2.
b. If F_L ≥ F_C, adopt the regression equation as calculated.
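A rough sketch of this backward procedure on synthetic data; the α = 0.05 cutoff, the variable names, and the data are assumptions, not the textbook's.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(10)
n, names = 60, ["x1", "x2", "x3", "x4"]
data = {nm: rng.normal(size=n) for nm in names}
y = 3 + 2 * data["x1"] - 1.5 * data["x3"] + rng.normal(0, 1, n)   # x2, x4 are pure noise

def fit_ols(cols):
    mat = sm.add_constant(np.column_stack([data[c] for c in cols])) if cols else np.ones((n, 1))
    return sm.OLS(y, mat).fit()

cols, alpha = list(names), 0.05                      # step 1: start from the saturated model
while cols:
    full = fit_ols(cols)
    # step 2: partial F for dropping each variable, treated as the last to enter
    partial_f = {}
    for c in cols:
        reduced = fit_ols([v for v in cols if v != c])
        partial_f[c] = (reduced.ssr - full.ssr) / (full.ssr / full.df_resid)
    # step 3: compare the lowest partial F (F_L) with the critical value (F_C)
    weakest = min(partial_f, key=partial_f.get)
    f_crit = stats.f.ppf(1 - alpha, 1, full.df_resid)
    if partial_f[weakest] < f_crit:
        cols.remove(weakest)                         # 3a: remove and return to step 2
    else:
        break                                        # 3b: adopt the current equation
print("retained predictors:", cols)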

slide 35 DSCI 5340 FORECASTING Model building: Stepwise Selection
1. Calculate correlations of all predictors with the response variable.
2. Select the predictor variable with the highest correlation. Regress Y on Xᵢ. Retain the predictor if there is a significant F-test value.
3. Calculate partial correlations of all variables not in the equation with the response variable. Select the next predictor to enter as the one with the highest partial correlation; call this predictor Xⱼ.
4. Compute the regression equation with both Xᵢ and Xⱼ entered. Retain Xⱼ if its partial F-value exceeds the tabulated F with (1, n−2−1) df.
5. Now determine whether Xᵢ warrants retention: compare its partial F-value as if Xⱼ had been entered into the equation first.

slide 36 DSCI 5340 FORECASTING Stepwise Continued
(5, continued) Retain Xᵢ if its F-value exceeds the tabulated F value. Enter a new variable Xₖ, compute the regression with the three predictors, compute partial F-values for Xᵢ, Xⱼ, and Xₖ, and determine whether any should be retained by comparing the observed partial F with the critical F.
6. Retain the regression equation when no other predictor can be entered into or removed from the model.
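A simplified sketch of the stepwise procedure from the last two slides, on synthetic data; the entry/removal threshold (α = 0.05), the cycling guard, and the variable names are assumptions.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(11)
n, names = 80, ["x1", "x2", "x3", "x4", "x5"]
data = {nm: rng.normal(size=n) for nm in names}
y = 1 + 2 * data["x1"] + 1.2 * data["x4"] + rng.normal(0, 1, n)

def fit_ols(cols):
    mat = sm.add_constant(np.column_stack([data[c] for c in cols])) if cols else np.ones((n, 1))
    return sm.OLS(y, mat).fit()

def partial_f(cols, c):
    # Partial F for variable c, treated as the last variable to enter the model `cols`.
    full = fit_ols(cols)
    reduced = fit_ols([v for v in cols if v != c])
    return (reduced.ssr - full.ssr) / (full.ssr / full.df_resid), full.df_resid

selected, alpha = [], 0.05
for _ in range(2 * len(names)):                       # guard against cycling
    remaining = [c for c in names if c not in selected]
    if not remaining:
        break
    # candidate with the largest partial F when entered last (equivalent to the
    # largest partial correlation with the response, given the variables in the model)
    scores = {c: partial_f(selected + [c], c)[0] for c in remaining}
    best = max(scores, key=scores.get)
    f_in, df_in = partial_f(selected + [best], best)
    if f_in <= stats.f.ppf(1 - alpha, 1, df_in):
        break                                         # no remaining predictor qualifies to enter
    selected.append(best)
    for c in list(selected):                          # re-test variables already in the model
        f_out, df_out = partial_f(selected, c)
        if f_out < stats.f.ppf(1 - alpha, 1, df_out):
            selected.remove(c)
print("final predictors:", selected)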