1 Assessment and Interpretation: MBA Program Admission Policy
The dean of a large university wants to raise the admission standards for its popular MBA program. She plans to develop a method that predicts an applicant's performance in the program. She believes a student's success can be predicted by:
–Undergraduate GPA
–Graduate Management Admission Test (GMAT) score
–Number of years of work experience
2 MBA Program Admission Policy
A random sample of students who completed the MBA was selected (see MBA). Develop a plan to decide which applicants to admit.
3 MBA Program Admission Policy
Solution
–The model to estimate is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
  $y$ = MBA GPA
  $x_1$ = undergraduate GPA [UnderGPA]
  $x_2$ = GMAT score [GMAT]
  $x_3$ = years of work experience [Work]
–The estimated model: MBA GPA = $b_0 + b_1\,\text{UnderGPA} + b_2\,\text{GMAT} + b_3\,\text{Work}$
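As a rough illustration of how the estimates $b_0, \dots, b_3$ could be obtained, the sketch below fits the model by least squares through the normal equations, assuming the usual least-squares criterion. The applicant records and the coefficients used to generate them are invented for the example (they are not the MBA data), and `solve_linear_system` and `fit_ols` are hypothetical helper names.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Invented applicants: (UnderGPA, GMAT, Work). Not the MBA data set.
applicants = [
    (3.0, 600, 2), (3.5, 650, 5), (2.8, 550, 1), (3.9, 700, 3),
    (3.2, 580, 8), (3.6, 620, 4), (2.9, 680, 6), (3.4, 560, 0),
]
# Hypothetical coefficients used only to generate noise-free responses.
true_coefs = [0.5, 0.1, 0.004, 0.05]
mba_gpa = [
    true_coefs[0] + true_coefs[1] * g + true_coefs[2] * m + true_coefs[3] * w
    for g, m, w in applicants
]

b_hat = fit_ols(applicants, mba_gpa)
print([round(b, 4) for b in b_hat])
```

Because the invented responses contain no error term, the fit recovers the generating coefficients up to rounding. With real data the estimates would differ from any "true" values, and the residuals would feed the diagnostics that follow.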
4 MBA Program Admission Policy – Model Diagnostics
We estimate the regression model, then we check: normality of the errors.
5 MBA Program Admission Policy – Model Diagnostics
We estimate the regression model, then we check: the variance of the error variable.
6 MBA Program Admission Policy – Model Diagnostics
7 MBA Program Admission Policy – Model Assessment
–The model is valid (p-value = …).
–46.35% of the variation in MBA GPA is explained by the model.
–GMAT score and years of work experience are linearly related to MBA GPA.
–There is insufficient evidence of a linear relationship between undergraduate GPA and MBA GPA.
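The two headline quantities in this assessment, the coefficient of determination and the F statistic behind the validity test, can be computed from the observed and fitted values. The sketch below assumes a least-squares fit with an intercept, so that SST = SSR + SSE. The toy numbers are invented, and `assess_fit` is a hypothetical helper name. Converting F into a p-value needs the F distribution and is left out.

```python
def assess_fit(y, y_hat, k):
    """Return (R-squared, F statistic) for a fit with k independent variables."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sst - sse                                        # explained variation
    r_squared = 1.0 - sse / sst
    f_stat = (ssr / k) / (sse / (n - k - 1))
    return r_squared, f_stat


# Invented toy data: the fitted values are the means of three groups of two,
# which is the least-squares fit on two indicator variables (k = 2).
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fitted = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]

r2, f = assess_fit(observed, fitted, k=2)
print(f"R-squared = {r2:.4f}, F = {f:.2f}")
```

An R-squared of 0.4635 on the slide would be read the same way: the share of the variation in MBA GPA that the model accounts for.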
8 Regression Diagnostics - II
The conditions required for the model assessment to apply must be checked.
–Is the error variable normally distributed? (Draw a histogram of the residuals.)
–Is the error variance constant? (Plot the residuals versus $\hat{y}$.)
–Are the errors independent? (Plot the residuals versus the time periods.)
–Can we identify outliers?
–Is multicollinearity (intercorrelation) a problem?
9 Diagnostics: Multicollinearity
Example 19.2: Predicting house price (Xm19-02)
–A real estate agent believes that a house's selling price can be predicted using the house size, number of bedrooms, and lot size.
–A random sample of 100 houses was drawn and the data recorded.
–Analyze the relationship among the four variables.
10 Diagnostics: Multicollinearity
The proposed model is
$\text{PRICE} = \beta_0 + \beta_1\,\text{BEDROOMS} + \beta_2\,\text{H-SIZE} + \beta_3\,\text{LOTSIZE} + \varepsilon$
The model is valid, but no variable is significantly related to the selling price?!
11 Diagnostics: Multicollinearity
Multicollinearity is found to be a problem.
Multicollinearity causes two kinds of difficulties:
–The t statistics appear to be too small.
–The coefficients cannot be interpreted as “slopes”.
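One common way to look for multicollinearity is to compute the pairwise correlations among the independent variables. Values near 1 or -1 suggest that the variables carry overlapping information. The sketch below uses invented house records (not Xm19-02), and `pearson` is a hypothetical helper name.

```python
from math import sqrt


def pearson(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data in which bigger houses have more bedrooms and larger lots.
predictors = {
    "BEDROOMS": [2, 3, 3, 4, 4, 5],
    "H-SIZE": [1100, 1500, 1600, 2100, 2200, 2800],
    "LOTSIZE": [4000, 5200, 5600, 7000, 7500, 9100],
}

names = list(predictors)
correlations = {}
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(predictors[first], predictors[second])
        correlations[(first, second)] = r
        print(f"{first} vs {second}: r = {r:.3f}")
```

When every pair is this strongly correlated, the individual t statistics can all look insignificant even though the model as a whole is valid, which is the pattern described on the previous slide.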
12 Remedying Violations of the Required Conditions
–Nonnormality or heteroscedasticity can be remedied using transformations on the y variable.
–The transformations can improve the linear relationship between the dependent variable and the independent variables.
–Many computer software systems allow us to make the transformations easily.
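13 Durbin-Watson Test: Are the Errors Autocorrelated?
–This test detects first-order autocorrelation between consecutive residuals in a time series.
–If autocorrelation exists, the error variables are not independent.
$d = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$, where $e_i$ is the residual at time $i$. The statistic ranges from 0 to 4.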
14 Positive First-Order Autocorrelation
[Figure: residuals plotted against time]
Positive first-order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (less than 2).
15 Negative First-Order Autocorrelation
[Figure: residuals plotted against time]
Negative first-order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (greater than 2).
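The behavior of d in the two cases can be checked numerically. The sketch below computes the Durbin-Watson statistic as the sum of squared differences between consecutive residuals divided by the sum of squared residuals, then applies it to two invented residual series. `durbin_watson` is a hypothetical helper name.

```python
def durbin_watson(residuals):
    """Sum of squared consecutive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Invented residuals that drift smoothly: neighbors are similar.
drifting = [-3, -2, -1, 0, 1, 2, 3]
# Invented residuals that flip sign every period: neighbors differ markedly.
alternating = [1, -1, 1, -1, 1, -1]

d_drifting = durbin_watson(drifting)
d_alternating = durbin_watson(alternating)
print(f"drifting residuals:    d = {d_drifting:.3f}")
print(f"alternating residuals: d = {d_alternating:.3f}")
```

The drifting series gives a d well below 2 and the alternating series a d well above 2, matching the two slides.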
16 One-Tail Test for Positive First-Order Autocorrelation
–If $d < d_L$, there is enough evidence to show that positive first-order autocorrelation exists.
–If $d > d_U$, there is not enough evidence to show that positive first-order autocorrelation exists.
–If $d$ is between $d_L$ and $d_U$, the test is inconclusive.
[Figure: number line for d with $d_L$ and $d_U$ marked. Below $d_L$: positive first-order autocorrelation exists. Between $d_L$ and $d_U$: inconclusive test. Above $d_U$: positive first-order autocorrelation does not exist.]
17 One-Tail Test for Negative First-Order Autocorrelation
–If $d > 4 - d_L$, negative first-order autocorrelation exists.
–If $d < 4 - d_U$, negative first-order autocorrelation does not exist.
–If $d$ falls between $4 - d_U$ and $4 - d_L$, the test is inconclusive.
[Figure: number line for d with $4 - d_U$ and $4 - d_L$ marked. Below $4 - d_U$: negative first-order autocorrelation does not exist. Between $4 - d_U$ and $4 - d_L$: inconclusive test. Above $4 - d_L$: negative first-order autocorrelation exists.]
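The two one-tail decision rules can be written as small functions. This is only a sketch: the function names are hypothetical, the critical values 1.10 and 1.54 are the table values quoted later for n = 20 and k = 2, and the d values passed in are invented. A d that lands exactly on a critical value is treated as inconclusive, which is an assumption of the sketch.

```python
def positive_autocorrelation_test(d, d_l, d_u):
    """One-tail rule for positive first-order autocorrelation."""
    if d < d_l:
        return "positive autocorrelation exists"
    if d > d_u:
        return "no evidence of positive autocorrelation"
    return "inconclusive"


def negative_autocorrelation_test(d, d_l, d_u):
    """One-tail rule for negative first-order autocorrelation."""
    if d > 4 - d_l:
        return "negative autocorrelation exists"
    if d < 4 - d_u:
        return "no evidence of negative autocorrelation"
    return "inconclusive"


D_L, D_U = 1.10, 1.54  # Durbin-Watson table values for n = 20, k = 2

for d in (0.90, 1.30, 2.00, 2.60, 3.10):  # invented values of d
    print(
        f"d = {d:.2f}: "
        f"{positive_autocorrelation_test(d, D_L, D_U)}; "
        f"{negative_autocorrelation_test(d, D_L, D_U)}"
    )
```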
18 Two-Tail Test for First-Order Autocorrelation
–If $d < d_L$ or $d > 4 - d_L$, first-order autocorrelation exists.
–If $d$ falls between $d_L$ and $d_U$ or between $4 - d_U$ and $4 - d_L$, the test is inconclusive.
–If $d$ falls between $d_U$ and $4 - d_U$, there is no evidence of first-order autocorrelation.
[Figure: number line for d with $d_L$, $d_U$, $4 - d_U$, and $4 - d_L$ marked. From left to right the regions are: first-order autocorrelation exists, inconclusive test, first-order autocorrelation does not exist, inconclusive test, first-order autocorrelation exists.]
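The two-tail rule splits the range of d into five regions. A minimal sketch follows, with a hypothetical function name and invented d values. The critical values shown are placeholders, since the values for a two-tail test must be read from the table at the appropriate significance level. Boundary values are treated as inconclusive, which is an assumption of the sketch.

```python
def two_tail_autocorrelation_test(d, d_l, d_u):
    """Two-tail rule for first-order autocorrelation of either sign."""
    if d < d_l or d > 4 - d_l:
        return "autocorrelation exists"
    if d_u < d < 4 - d_u:
        return "no evidence of autocorrelation"
    return "inconclusive"


# Placeholder critical values, used only to exercise the five regions.
d_l, d_u = 1.10, 1.54

two_tail_results = {
    d: two_tail_autocorrelation_test(d, d_l, d_u)
    for d in (0.90, 1.30, 2.00, 2.60, 3.10)  # invented values of d
}
for d, verdict in two_tail_results.items():
    print(f"d = {d:.2f}: {verdict}")
```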
19 Testing the Existence of Autocorrelation, Example
Example 19.3 (Xm19-03)
–How does the weather affect the sales of lift tickets at a ski resort?
–Data on ticket sales over the past 20 years, along with the total snowfall and the average temperature during Christmas week in each year, were collected.
–The model hypothesized was $\text{TICKETS} = \beta_0 + \beta_1\,\text{SNOWFALL} + \beta_2\,\text{TEMPERATURE} + \varepsilon$
–Regression analysis yielded the following results:
20 The Regression Equation – Assessment (I)
The model seems to be very poor:
–R-square=
–It is not valid (Signif. F = 0.3373).
–No variable is linearly related to Sales.
21 Diagnostics: The Error Distribution
[Figure: histogram of the errors]
The errors may be normally distributed.
22 Diagnostics: Heteroscedasticity
[Figure: residuals vs. predicted y]
It appears there is no problem of heteroscedasticity (the error variance seems to be constant).
23 Diagnostics: First-Order Autocorrelation
[Figure: residuals over time]
The errors are not independent!!
24 Diagnostics: First-Order Autocorrelation
Test for positive first-order autocorrelation: n = 20, k = 2. From the Durbin-Watson table we have $d_L = 1.10$, $d_U = 1.54$. The statistic d=
Conclusion: Because $d < d_L$, there is sufficient evidence to infer that positive first-order autocorrelation exists.
Using the computer - Excel
–Tools > Data Analysis > Regression (check the residual option and then OK)
–Tools > Data Analysis Plus > Durbin Watson Statistic > Highlight the range of the residuals from the regression run > OK
25 The Modified Model: Time Included
The modified regression model (Xm19-03mod):
$\text{TICKETS} = \beta_0 + \beta_1\,\text{SNOWFALL} + \beta_2\,\text{TEMPERATURE} + \beta_3\,\text{TIME} + \varepsilon$
–All the required conditions are met for this model.
–The fit of this model is high: $R^2$ =
–The model is valid: Significance F =
–SNOWFALL and TIME are linearly related to ticket sales.
–TEMPERATURE is not linearly related to ticket sales.
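Why adding a time variable can remove autocorrelation is easier to see in a stripped-down setting. The sketch below is a simplification, not the ski-resort model: it uses an invented series with a steady upward trend, fits it once with a constant only and once with a straight line in time, and compares the Durbin-Watson statistic of the two residual series. The data and the helper names `durbin_watson` and `fit_line` are assumptions made for the example.

```python
def durbin_watson(residuals):
    """Sum of squared consecutive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    return numerator / sum(e ** 2 for e in residuals)


def fit_line(t, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    return y_bar - slope * t_bar, slope


# Invented series: a trend of 2 units per period plus small irregular errors.
time = list(range(1, 11))
errors = [0.5, -0.3, 0.2, -0.6, 0.4, -0.1, 0.3, -0.5, 0.1, -0.2]
sales = [10 + 2 * t + e for t, e in zip(time, errors)]

# Model without time: the constant-only fit is the sample mean.
mean_sales = sum(sales) / len(sales)
resid_without_time = [s - mean_sales for s in sales]

# Model with time: the trend is absorbed by the fitted line.
b0, b1 = fit_line(time, sales)
resid_with_time = [s - (b0 + b1 * t) for s, t in zip(sales, time)]

d_without = durbin_watson(resid_without_time)
d_with = durbin_watson(resid_with_time)
print(f"without TIME: d = {d_without:.3f}")
print(f"with TIME:    d = {d_with:.3f}")
```

Without the time variable the residuals inherit the trend, so consecutive residuals are similar and d is close to 0. Once time is in the model, the residuals no longer drift and d moves away from the region that signals positive autocorrelation.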