Slide 1 DSCI 5340: Predictive Modeling and Business Forecasting Spring 2013 – Dr. Nick Evangelopoulos Lecture 2: Review of Multiple Regression (Ch. 4-5)

Material based on: Bowerman-O’Connell-Koehler, Brooks/Cole

DSCI 5340 FORECASTING Review of textbook HW: Page 127-128 Ex 3.12 (Use Excel); Page 128 Ex 3.13, 3.17; Page 132 Ex 3.25; Page 134 Ex 3.35

DSCI 5340 FORECASTING Excel Data Analysis Add-in: In Excel, make sure the Analysis ToolPak is enabled as an add-in.

DSCI 5340 FORECASTING Ex 3.12 Page 128: Scatter Plot. An accountant wishes to predict direct labor cost (y) based on the batch size (x) of a product produced in a job shop. Data for 12 production runs are given.

DSCI 5340 FORECASTING Ex 3.13 Page 128: Interpretation of Mean of Y Given X
a. $\mu_{y|x=60} = \beta_0 + \beta_1(60)$: the mean value of y over repeated observations at x = 60. This is the point on the regression line at x = 60.
b. $\mu_{y|x=30} = \beta_0 + \beta_1(30)$: the mean value of y over repeated observations at x = 30. This is the point on the regression line at x = 30. The distribution of y values around the mean at x = 30 should be similar to that at x = 60.
c. Interpretation of slope: as the batch size increases by one unit, the mean direct labor cost increases by the estimate of $\beta_1$, 10.1463.
Fitted model: $\hat{y} = 18.49 + 10.15x$
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.99963578
R Square | 0.999271693
Adjusted R Square | 0.999198862
Standard Error | 8.641541386
Observations | 12
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 1024592.904 | 1024593 | 13720.47 | 5.04436E-17
Residual | 10 | 746.7623752 | 74.67624 | |
Total | 11 | 1025339.667 | | |
Coefficients
Term | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 18.48750754 | 4.676579789 | 3.953211 | 0.002716 | 8.067438459 | 28.90757661
Batch_Size_X | 10.14625896 | 0.086620659 | 117.1344 | 5.04E-17 | 9.953256104 | 10.33926181
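As a check on how the pieces of the Excel output relate to each other, the sketch below recomputes the F statistic, R Square, Standard Error, the slope t statistic, and the 95% interval for the slope from the sums of squares and coefficients shown on this slide. The t critical value 2.228139 (10 degrees of freedom) is an assumption taken from a standard t table, and the function name `predict_mean_cost` is illustrative only.

```python
import math

# Values copied from the Excel SUMMARY OUTPUT on this slide.
ss_regression, df_regression = 1024592.904, 1
ss_residual, df_residual = 746.7623752, 10
b0, se_b0 = 18.48750754, 4.676579789
b1, se_b1 = 10.14625896, 0.086620659

# Assumption: two-sided 95% critical value of t with 10 df, from a t table.
T_CRIT_10DF = 2.228139

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual           # mean square error
f_stat = ms_regression / ms_residual              # overall F statistic
r_square = ss_regression / (ss_regression + ss_residual)
std_error = math.sqrt(ms_residual)                # "Standard Error" in Excel
t_b1 = b1 / se_b1                                 # t Stat for the slope
ci_b1 = (b1 - T_CRIT_10DF * se_b1, b1 + T_CRIT_10DF * se_b1)


def predict_mean_cost(batch_size):
    """Point estimate of mean direct labor cost for a given batch size."""
    return b0 + b1 * batch_size


print("F:", round(f_stat, 2))
print("R Square:", round(r_square, 6))
print("Standard Error:", round(std_error, 4))
print("t Stat (slope):", round(t_b1, 2))
print("95% CI (slope):", [round(v, 4) for v in ci_b1])
print("Estimated mean cost at x=60:", round(predict_mean_cost(60), 2))
print("Estimated mean cost at x=30:", round(predict_mean_cost(30), 2))
```

The recomputed values should agree with the Excel table up to rounding, which is a quick way to confirm that the flattened output was read correctly.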

DSCI 5340 FORECASTING Ex 3.13 Page 128: Interpretation of Model. Intercept $\beta_0$: the estimate 18.49 is the labor cost when the batch size is 0. In theory this cost would be 0, but it can be interpreted as fixed costs. Interpretation of error term: other factors may also determine direct labor costs, such as employee benefits, type of product, and number of employees. The error term stands in for these omitted factors, so the model might be more accurate if some of them were added as independent variables.

DSCI 5340 FORECASTING Ex 3.17 Page 128: Accu-Copiers, Inc., sells and services the Accu-500 copying machine. As part of its standard service contract, the company agrees to perform routine service on the copier. To obtain information about the time it takes to perform routine service, Accu-Copiers has collected data for 11 service calls, shown in Table 3.7 (p. 126).

DSCI 5340 FORECASTING EX 3.17 Page 128

DSCI 5340 FORECASTING EX 3-25 Page 132: Test for correlation. The test for correlation between X and Y, $H_0: \rho = 0$ vs. $H_a: \rho \neq 0$, has the same test statistic and p-value as the test for significance of the regression slope coefficient. However, the two tests use different assumptions.
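The equality of the two test statistics can be checked numerically. The sketch below computes the slope t statistic, $b_1 / s_{b_1}$, and the correlation t statistic, $r\sqrt{n-2}/\sqrt{1-r^2}$, for a small data set. The data are hypothetical (not the textbook exercise) and the function name is illustrative.

```python
import math


def slope_and_correlation_t(x, y):
    """Return (t for the slope, t for the correlation) in simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares fit and its standard error.
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    t_slope = b1 / (s / math.sqrt(sxx))

    # Sample correlation and its t statistic.
    r = sxy / math.sqrt(sxx * syy)
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_slope, t_corr


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
t_slope, t_corr = slope_and_correlation_t(x, y)
print(t_slope, t_corr)  # the two values agree up to floating-point error
```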

DSCI 5340 FORECASTING EX 3-35 Page 134: A State Department of Taxation asked taxpayers to report the time y (in hours) required to complete a tax form and the number of times x (including this one) the taxpayer has filled out this form.

DSCI 5340 FORECASTING EX 3-35 Page 134: To understand this model, note that as x increases, 1/x decreases, and thus $\mu_{y|x}$ decreases (provided the coefficient on 1/x is positive).
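The model equation itself did not survive on this slide. Assuming it has the form $\mu_{y|x} = \beta_0 + \beta_1 (1/x)$, which the reference to 1/x suggests, it can be fitted by ordinary least squares after transforming x to z = 1/x. The sketch below uses hypothetical completion times, not the textbook data, and `fit_line` is an illustrative helper.

```python
def fit_line(z, y):
    """Least-squares intercept and slope for y = b0 + b1*z."""
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    b1 = szy / szz
    b0 = y_bar - b1 * z_bar
    return b0, b1


# Hypothetical data: x = times the form has been filled out, y = hours needed.
x = [1, 2, 3, 4, 5, 6]
y = [8.0, 4.9, 4.1, 3.4, 3.3, 3.0]

# Transform the predictor, then fit a straight line in z = 1/x.
z = [1 / xi for xi in x]
b0, b1 = fit_line(z, y)

# Fitted means on the original x scale.
fitted = [b0 + b1 / xi for xi in x]
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print([round(v, 2) for v in fitted])
```

With a positive slope on 1/x, the fitted mean time falls as x grows and levels off toward the intercept.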

DSCI 5340 FORECASTING Multiple Regression Graphically

DSCI 5340 FORECASTING Residuals: The residuals will be denoted $\hat{e}_i$: $\hat{e}_i = y_i - \hat{y}_i$. They represent the distance of each dependent variable value from the estimated regression line, or the portion of the variation in y that cannot be “explained” with the data available. What assumptions can we test using these residuals?

DSCI 5340 FORECASTING Regression model assumptions: What are the assumptions of regression analysis? The relationship is linear. The disturbances $e_i$ have constant variance $\sigma_e^2$. The disturbances are independent. The disturbances are normally distributed. How can these assumptions be checked?

DSCI 5340 FORECASTING Graphical Techniques (not an exact science): scatterplots, residual plots, histograms

DSCI 5340 FORECASTING Properties of residual plots
Property 1: The average of the residuals will be equal to zero. This property holds regardless of whether the assumptions are true or not and is a direct result of the way the least-squares method works.
Property 2: There should be no systematic pattern in a residual plot. (What is a systematic pattern?)
Property 3: Residuals should look like random numbers chosen from a normal distribution. (How close to normality should the chart look?)
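Property 1 can be seen directly in a small example. The sketch below fits a least-squares line with an intercept to deliberately non-linear, hypothetical data and shows that the residuals still average to zero (and are uncorrelated with x), even though the linearity assumption is clearly violated. The function name is illustrative.

```python
def least_squares_residuals(x, y):
    """Residuals y_i - (b0 + b1*x_i) from a least-squares line with intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data with an obviously non-linear pattern.
x = [2, 4, 5, 7, 9, 12]
y = [3.0, 9.5, 8.0, 20.0, 15.5, 40.0]

residuals = least_squares_residuals(x, y)
mean_residual = sum(residuals) / len(residuals)
cross_product = sum(xi * ei for xi, ei in zip(x, residuals))

print(mean_residual)   # zero up to floating-point error
print(cross_product)   # also zero: residuals are orthogonal to x
```

A zero average therefore says nothing about whether the model is adequate; Properties 2 and 3 are the ones that need the plots.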

DSCI 5340 FORECASTING Residual plots: In a residual analysis it is suggested that the following plots be used: 1. Plot the residuals versus each explanatory variable. 2. Plot the residuals versus the predicted or fitted values. 3. If the data are measured over time, plot the residuals versus some variable representing the time sequence. Which assumptions can each of these plots support, or show to be violated?

DSCI 5340 FORECASTING Residual plots: Plots may be constructed using the actual residuals, $\hat{e}_i$, or the standardized residuals. The standardized residuals are simply the residuals divided by their standard deviation. Why do you think standardized residuals are sometimes used instead of regular residuals?
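A minimal sketch of standardized residuals for simple regression follows. It assumes the usual estimate of a residual's standard deviation, $s\sqrt{1 - h_i}$, where $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$ is the leverage of observation i; the slide does not spell this formula out. The data are hypothetical.

```python
import math


def standardized_residuals(x, y):
    """Return (residuals, leverages, standardized residuals) for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # standard error

    # Leverage of each observation in simple regression.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Each residual divided by its own estimated standard deviation.
    standardized = [
        e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)
    ]
    return residuals, leverages, standardized


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 6.4, 7.7, 10.2, 11.9, 14.3, 15.8]
residuals, leverages, standardized = standardized_residuals(x, y)
for e, h, r in zip(residuals, leverages, standardized):
    print(round(e, 3), round(h, 3), round(r, 3))
```

Because standardized residuals are unit-free, the same rough cutoffs (for example, values beyond 2 in absolute value) can be applied regardless of the scale of y.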

DSCI 5340 FORECASTING No Violations of the Assumptions of Regression: plot shows random residuals

DSCI 5340 FORECASTING Does This Plot Suggest That One of the Assumptions of Regression Analysis Is Violated?

DSCI 5340 FORECASTING PLOT OF RESIDUALS - Standardized values are small.

DSCI 5340 FORECASTING Outliers: The method of least squares estimation chooses the regression coefficient estimates so that the error sum of squares, SSE, is a minimum. In doing this, the squared distances from the observed y values, $y_i$, to the points on the regression line or surface, $\hat{y}_i$, are minimized in total. Least squares thus tries to avoid any large distances from $y_i$ to $\hat{y}_i$.

DSCI 5340 FORECASTING Outliers: An outlier is a sample data point whose y value is much different from the y values of the other points in the sample. As a rule of thumb, an outlier is any value whose studentized residual is greater than 2 in absolute value. An outlier does not have to be influential. That is, removing the outlier may not change the regression coefficients very much.

DSCI 5340 FORECASTING No influential observations

DSCI 5340 FORECASTING A High Leverage Observation That is Not Influential

DSCI 5340 FORECASTING Leverages: The slope of the line appears to be determined almost entirely by this one point. The sixth observation is said to have high leverage and is referred to as a leverage point. What do you think the term “leverage point” means?

DSCI 5340 FORECASTING Studentized residuals: Another measure sometimes used in place of the standardized residual is the standardized residual computed after deleting the ith observation. This measure is called the studentized residual or studentized deleted residual. (Note that SAS refers to the standardized residual as the studentized residual.)
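The studentized deleted residual can be computed two ways, sketched below for simple regression: by literally refitting the line without observation i, or with the standard shortcut $t_i = e_i \sqrt{(n-3) / (SSE(1-h_i) - e_i^2)}$, which needs only the full-sample fit. Both function names and the data are hypothetical; the cutoff of 2 is the rule of thumb from the Outliers slide.

```python
import math


def fit(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, SSE, x_bar, Sxx)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse, x_bar, sxx


def deleted_residual_shortcut(x, y, i):
    """Studentized deleted residual from the full-sample fit only."""
    n = len(x)
    b0, b1, sse, x_bar, sxx = fit(x, y)
    e = y[i] - b0 - b1 * x[i]
    h = 1 / n + (x[i] - x_bar) ** 2 / sxx
    return e * math.sqrt((n - 3) / (sse * (1 - h) - e ** 2))


def deleted_residual_refit(x, y, i):
    """Same quantity, obtained by refitting without observation i."""
    x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    m = len(x_rest)
    b0, b1, sse, x_bar, sxx = fit(x_rest, y_rest)
    s_deleted = math.sqrt(sse / (m - 2))
    prediction_error = y[i] - b0 - b1 * x[i]
    se_prediction = s_deleted * math.sqrt(1 + 1 / m + (x[i] - x_bar) ** 2 / sxx)
    return prediction_error / se_prediction


# Hypothetical data; the sixth observation is an obvious outlier in y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 18.0, 14.1, 16.0]

deleted = [deleted_residual_shortcut(x, y, i) for i in range(len(x))]
flagged = [i for i, t in enumerate(deleted) if abs(t) > 2]
print([round(t, 2) for t in deleted])
print("flagged observations (0-based):", flagged)
```

Deleting the observation before estimating the spread keeps an outlier from inflating the standard error that is used to judge it.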

DSCI 5340 FORECASTING Checking Model Assumptions
Checking Assumption 1 - Normal distribution: construct a histogram.
Checking Assumption 2 - Constant variance: plot residuals versus predicted Y values.
Checking Assumption 3 - Errors are independent: Durbin-Watson statistic; plot of errors and time.
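For the independence check, the Durbin-Watson statistic is $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, computed on residuals in time order. The sketch below is a direct implementation; the two residual series are hypothetical and chosen only to show the extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series.
positively_correlated = [1, 1, 1, -1, -1, -1]   # long runs of the same sign
negatively_correlated = [1, -1, 1, -1, 1, -1]   # sign flips every period

dw_positive = durbin_watson(positively_correlated)
dw_negative = durbin_watson(negatively_correlated)
print(round(dw_positive, 3))  # well below 2
print(round(dw_negative, 3))  # well above 2
```

Values near 2 are consistent with independent errors; values toward 0 suggest positive autocorrelation and values toward 4 suggest negative autocorrelation. Formal conclusions require the tabulated Durbin-Watson bounds.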

DSCI 5340 FORECASTING Detecting Sample Outliers: sample leverages; standardized residuals; Cook’s distance measure.
Cook’s distance measure: $D_i = \dfrac{(\text{standardized residual}_i)^2}{k + 1} \cdot \dfrac{h_i}{1 - h_i}$
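A minimal sketch of the Cook's distance formula for simple regression (k = 1 predictor) follows. It assumes the standardized residual is $e_i / (s\sqrt{1-h_i})$ and the leverage is $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$. The data are hypothetical: the last point sits far from the other x values and off the trend of the rest.

```python
import math

K = 1  # number of predictors in simple regression


def cooks_distances(x, y):
    """Cook's distance for each observation in the fit y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - K - 1))

    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx       # leverage
        r = e / (s * math.sqrt(1 - h))            # standardized residual
        distances.append(r ** 2 / (K + 1) * h / (1 - h))
    return distances


# Hypothetical data: the last observation has high leverage and is off-trend.
x = [1, 2, 3, 4, 5, 15]
y = [2.0, 4.0, 6.1, 7.9, 10.0, 20.0]

distances = cooks_distances(x, y)
most_influential = max(range(len(x)), key=lambda i: distances[i])
print([round(d, 3) for d in distances])
print("largest Cook's distance at observation (0-based):", most_influential)
```

The formula combines the two ingredients listed on the slide: a large standardized residual and a large leverage together produce a large distance.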

DSCI 5340 FORECASTING Example of An Influential Observation

DSCI 5340 FORECASTING Should an unusual observation be deleted? If an observation is exerting undue influence on the fit of the model, then from an exploratory and data-mining standpoint, removing the observation may reveal substantial changes in the model. An observation may be miscoded or may not belong with the rest of the collected data. No more than 10% of the data should be deleted to improve the model.

DSCI 5340 FORECASTING Dummy Variables

DSCI 5340 FORECASTING Test of Null Hypothesis (F-test): Tests the null hypothesis $H_0: \beta_2 = \beta_3 = \cdots = \beta_p = 0$ against $H_a$: at least one of these betas is not zero. The null hypothesis is known as a joint or simultaneous hypothesis, because it compares the values of all $\beta_i$ simultaneously. This F test assesses the overall significance of the regression model.
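The overall F statistic can be written in terms of $R^2$ as $F = \dfrac{R^2 / k}{(1 - R^2)/(n - k - 1)}$, where k is the number of predictors (not counting the intercept) and n is the sample size. The sketch below applies this to the R Square reported in the Ex 3.13 Excel output (n = 12, k = 1). The function name is illustrative, and the p-value is left out because it requires an F distribution table or a statistics library.

```python
def overall_f(r_square, n, k):
    """Overall F statistic for a regression with k predictors and n observations."""
    explained_per_df = r_square / k
    unexplained_per_df = (1 - r_square) / (n - k - 1)
    return explained_per_df / unexplained_per_df


# R Square, n, and k taken from the Ex 3.13 Excel output earlier in the deck.
f_from_excel = overall_f(0.999271693, n=12, k=1)
print(round(f_from_excel, 2))  # close to the F value in the Excel ANOVA table
```

The statistic is compared with the critical value of F with (k, n - k - 1) degrees of freedom; a larger value leads to rejecting $H_0$.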

DSCI 5340 FORECASTING Model building: Backward Selection. A “deconstruction” approach.
1. Begin with the saturated (full) regression model.
2. Compute the drop in $R^2$ as a consequence of eliminating each predictor variable, and the partial F-test value; treat each variable as if it were the last to enter the regression equation.
3. Compare the lowest partial F-test value (designated $F_L$) to the critical value of F (designated $F_C$).
a. If $F_L < F_C$, remove the variable, recompute the regression equation using the remaining predictor variables, and return to step 2.
b. If $F_L > F_C$, adopt the regression equation as calculated.
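The backward selection steps can be sketched in plain Python as below. This is an illustration under stated assumptions, not the procedure of any particular package: the partial F for a variable is computed as (SSE without it minus SSE of the current model) divided by the current model's MSE, the critical value `f_critical` is supplied by the user (4.0 here is a placeholder; in practice $F_C$ comes from an F table with 1 and n - k - 1 degrees of freedom), and the data and names `x1`, `x2` are hypothetical.

```python
def sse_of_fit(columns, y):
    """SSE from regressing y on an intercept plus the given predictor columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(r[j] * r[m] for r in rows) for m in range(p)]
        + [sum(r[j] * yi for r, yi in zip(rows, y))]
        for j in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    coef = [a[j][p] / a[j][j] for j in range(p)]
    return sum(
        (yi - sum(b * v for b, v in zip(coef, r))) ** 2
        for r, yi in zip(rows, y)
    )


def backward_selection(predictors, y, f_critical):
    """Drop the weakest predictor while its partial F is below f_critical."""
    kept = list(predictors)
    n = len(y)
    while kept:
        sse_full = sse_of_fit([predictors[v] for v in kept], y)
        mse_full = sse_full / (n - len(kept) - 1)
        # Partial F of each variable, treated as the last one to enter.
        partial_f = {}
        for name in kept:
            others = [predictors[v] for v in kept if v != name]
            partial_f[name] = (sse_of_fit(others, y) - sse_full) / mse_full
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_critical:
            break                      # F_L >= F_C: adopt the current equation
        kept.remove(weakest)           # F_L < F_C: remove and repeat
    return kept


# Hypothetical data: y depends on x1; x2 is unrelated noise.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [5, 3, 8, 1, 9, 2, 7, 4, 6, 10],
}
y = [5.3, 6.8, 9.1, 10.6, 13.2, 15.4, 16.7, 18.9, 21.2, 22.8]

selected = backward_selection(predictors, y, f_critical=4.0)
print(selected)
```

Solving the normal equations by elimination keeps the sketch free of external libraries; for real work a numerically stabler least-squares routine would be preferable.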

DSCI 5340 FORECASTING Model building: Stepwise Selection
1) Calculate correlations of all predictors with the response variable. Select the predictor variable with the highest correlation. Regress Y on $X_i$. Retain the predictor if there is a significant F-test value.
2) Calculate partial correlations of all variables not in the equation with the response variable. Select the next predictor to enter that has the highest partial correlation. Call this predictor $X_j$.
3) Compute the regression equation with both $X_i$ and $X_j$ entered. Retain $X_j$ if its partial F-value exceeds the tabulated F with (1, n-2-1) df.
4) Now determine whether $X_i$ warrants retention. Compare its partial F-value as if $X_j$ had been entered into the equation first.

DSCI 5340 FORECASTING Stepwise Continued
Retain $X_i$ if its F-value exceeds the tabulated F value.
5) Enter a new variable $X_k$. Compute the regression with three predictors. Compute partial F-values for $X_i$, $X_j$ and $X_k$. Determine whether any should be retained by comparing the observed partial F with the critical F.
6) Retain the regression equation when no other predictor can be entered or removed from the model.
