
Session 10. Applied Regression -- Prof. Juran: Binary Logistic Regression


1 Session 10

2 Applied Regression -- Prof. Juran
Outline: Binary Logistic Regression
Why?
– Theoretical and practical difficulties in using regular (continuous) dependent variables
How?
– Minitab procedure
– Interpreting results
– Some diagnostics
– Making predictions
– Comparison with regular regression model

3 Logistic Regression
In our previous discussions of regression analysis, we have implicitly assumed that the dependent variable is continuous. We have learned some methods for operationalizing binary independent variables (using dummy variables), but have not discussed any method for dealing with categorical or binary dependent variables with regression analysis. (One non-regression method is discriminant analysis.) There are a number of tools available, but we will focus here on logistic regression.

4 The basic idea: instead of predicting the exact value of the (binary) dependent variable, we will try to model the probability that the dependent variable takes on the value of 1, written P(Y = 1 | X1, ..., Xk). In English, this is the probability that the dependent variable is 1, given a particular vector of values for the independent variables.

5 Example: Rick Beck Consumer Credit

6 Why not a normal multiple regression model?

7 Here we have a fitted linear regression whose predicted value serves as the estimated probability. Since this predicted value is an estimated probability, it shouldn't go outside the range from zero to one. But our regression equation is unbounded, and in this data set the predicted value sometimes takes on illogical estimated values outside that range.

8 We address this problem with a logistic response function.
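A common way to write the logistic response function is sketched below. The symbols are assumed notation rather than taken from the slide: π is the probability that the dependent variable equals 1, X_1, ..., X_k are the independent variables, and β_0, ..., β_k are the coefficients. The form agrees with the Minitab regression equation P(1) = exp(Y')/(1 + exp(Y')) reported with the results.

```latex
\pi = P(Y = 1 \mid X_1, \dots, X_k)
    = \frac{\exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}
           {1 + \exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}
```

Because the numerator is always positive and always smaller than the denominator, π stays strictly between 0 and 1 for any values of the independent variables.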


10 This sort of relationship will meet our criterion of keeping the estimated probability in the proper range. (Note: the cumulative normal distribution has a similar shape, and is the basis for the probit model.) What we need is a transformation of either X or the probability such that the relationship is linear. This would enable us to use linear regression to create a model.
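The transformation usually used for this purpose is the log odds (logit), ln(p/(1 - p)), which turns the S-shaped logistic curve into a straight line in X. A minimal Python sketch follows; the coefficients b0 and b1 and the X values are made up purely for illustration and do not come from the slides.

```python
import math

def logistic(z):
    """Logistic response: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds: the inverse of the logistic function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, for illustration only (not from the slides).
b0, b1 = -2.0, 0.5

xs = [-10, 0, 4, 10, 20]
probs = [logistic(b0 + b1 * x) for x in xs]
log_odds = [logit(p) for p in probs]

for x, p, lo in zip(xs, probs, log_odds):
    print(f"x={x:>4}  probability={p:.4f}  log odds={lo:.4f}")
```

The probabilities stay between 0 and 1 however extreme X becomes, while the log odds recover the linear predictor b0 + b1*x.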


14 Minitab Results: Response Information
Here we get the number of observations that fall into each of the two response categories. The response value that has been designated as the “reference event” is the first entry under Value and is labeled as the event. In this case, the reference event is “being in default”.

Response Information
Variable  Value  Count
Default   1        153  (Event)
          0        847
          Total   1000

15 Deviance Table
Source       DF  Adj Dev  Adj Mean  Chi-Square  P-Value
Regression    5  283.811   56.7621      283.81    0.000
  Single      1   13.113   13.1125       13.11    0.000
  Credit D    1   60.523   60.5230       60.52    0.000
  Credit E    1   84.985   84.9850       84.98    0.000
  Children    1    9.932    9.9316        9.93    0.002
  Debt        1   39.674   39.6744       39.67    0.000
Error       994  571.945    0.5754
Total       999  855.756

The chi-square tests for the individual terms are similar to t tests for individual slopes. The chi-square test in the Regression row is similar to the F test for all slopes.
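The arithmetic behind this table can be checked by hand. The sketch below assumes that the Regression deviance is the Total deviance minus the Error deviance, and that each 1-DF term's P-Value is the upper tail of a chi-square distribution with 1 degree of freedom (computed here with the standard library's complementary error function). These assumptions reproduce the reported figures, but they are a check on the numbers, not a description of Minitab's internals.

```python
import math

# Values copied from the Deviance Table.
total_deviance = 855.756   # Total row, DF = 999
error_deviance = 571.945   # Error row, DF = 994

# The Regression deviance is the reduction in deviance due to the predictors.
regression_deviance = total_deviance - error_deviance

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Adjusted deviances of the individual 1-DF terms, from the table.
term_deviance = {
    "Single": 13.113, "Credit D": 60.523, "Credit E": 84.985,
    "Children": 9.932, "Debt": 39.674,
}
p_values = {term: chi2_sf_1df(dev) for term, dev in term_deviance.items()}

print(f"Regression deviance: {regression_deviance:.3f}")
for term, p in p_values.items():
    print(f"{term:<9} p = {p:.3f}")
```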

16 Smaller values of the Akaike Information Criterion (AIC) indicate a better fit.

Deviance R-Sq  R-Sq(adj)     AIC
       33.16%     32.58%  583.95

Coefficients
Term           Coef   SE Coef   VIF
Constant     -1.139     0.337
Single        0.970     0.272  1.56
Credit D      2.023     0.263  1.18
Credit E      3.038     0.348  1.24
Children     -0.849     0.271  1.57
Debt      -0.000019  0.000004  1.07

The Coef column gives the regression model.
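The summary statistics can be tied back to the deviance figures. The formulas in this sketch are assumptions: Deviance R-Sq as one minus the ratio of Error to Total deviance, the adjusted version with the number of predictors added to the Error deviance, and AIC as the Error deviance plus twice the number of estimated coefficients (which treats the deviance as -2 times the log-likelihood, as is the case for an ungrouped binary response). They happen to reproduce the reported values.

```python
# Values from the Minitab output in these slides.
total_deviance = 855.756   # Total row of the Deviance Table
error_deviance = 571.945   # Error row of the Deviance Table
n_predictors = 5           # Single, Credit D, Credit E, Children, Debt
n_coefficients = n_predictors + 1  # predictors plus the constant

deviance_r_sq = 1 - error_deviance / total_deviance
deviance_r_sq_adj = 1 - (error_deviance + n_predictors) / total_deviance
aic = error_deviance + 2 * n_coefficients

print(f"Deviance R-Sq:      {100 * deviance_r_sq:.2f}%")
print(f"Deviance R-Sq(adj): {100 * deviance_r_sq_adj:.2f}%")
print(f"AIC:                {aic:.3f}")
```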

17 The coefficient of 0.970 for Single represents the estimated change in the log odds of default, ln[P(default)/P(not default)], when the subject is single compared to when he/she is not single, with the other independent variables held constant. The coefficient of -0.000019 for Debt corresponds to an estimated change of -0.019 in the log odds of default for a $1000 increase in Debt, with the other independent variables held constant.
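Log odds are hard to read directly, so it can help to exponentiate a coefficient to get the multiplicative change in the odds of default. A short sketch using the coefficients from the Minitab output; treating Debt as measured in dollars follows from the slide's statement about a $1000 increase.

```python
import math

# Coefficients from the Minitab Coefficients table.
coef_single = 0.970
coef_debt = -0.000019   # per $1 of debt (unit inferred from the slide text)

# exp(coefficient) is the multiplicative change in the odds of default.
odds_ratio_single = math.exp(coef_single)
odds_ratio_debt_per_1000 = math.exp(1000 * coef_debt)

print(f"Single vs. not single: odds multiplied by {odds_ratio_single:.3f}")
print(f"$1000 more debt:       odds multiplied by {odds_ratio_debt_per_1000:.3f}")
```

With the other variables held constant, being single multiplies the estimated odds of default by about 2.6, and each additional $1000 of debt lowers them by about 2%.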

18 Regression Equation
P(1) = exp(Y')/(1 + exp(Y'))
Y' = -1.139 + 0.970 Single + 2.023 Credit D + 3.038 Credit E - 0.849 Children - 0.000019 Debt

Goodness-of-Fit Tests
Test              DF  Chi-Square  P-Value
Deviance         994      571.95    1.000
Pearson          994      642.32    1.000
Hosmer-Lemeshow    8       29.76    0.000

19 Fits and Diagnostics for Unusual Observations
Obs  Observed Probability     Fit    Resid  Std Resid
  6                1.0000  0.4641   1.2391       1.25  X
 39                1.0000  0.4372   1.2864       1.30  X
 58                1.0000  0.4671   1.2338       1.25  X
 62                1.0000  0.0872   2.2087       2.21  R
 66                1.0000  0.6670   0.9000       0.91  X
 85                1.0000  0.4510   1.2619       1.28  X
 90                0.0000  0.6372  -1.4240      -1.44  X
115                0.0000  0.5637  -1.2879      -1.30  X
123                1.0000  0.6899   0.8616       0.88  X
136                1.0000  0.1037   2.1288       2.14  R
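The Resid column appears to hold deviance residuals. That is an inference from the numbers rather than something the slide states: the usual binary deviance residual, sqrt(-2 ln(fit)) when the observed value is 1 and -sqrt(-2 ln(1 - fit)) when it is 0, reproduces the reported values to about four decimal places. A small sketch of that check on four of the rows:

```python
import math

def deviance_residual(y, fit):
    """Deviance residual for a binary observation y (0 or 1) with fitted probability fit."""
    if y == 1:
        return math.sqrt(-2.0 * math.log(fit))
    return -math.sqrt(-2.0 * math.log(1.0 - fit))

# (observation, observed value, fit, reported Resid) from the table.
rows = [
    (6, 1, 0.4641, 1.2391),
    (62, 1, 0.0872, 2.2087),
    (90, 0, 0.6372, -1.4240),
    (115, 0, 0.5637, -1.2879),
]
for obs, y, fit, reported in rows:
    computed = deviance_residual(y, fit)
    print(f"Obs {obs:>3}: computed {computed:.4f}, reported {reported:.4f}")
```

Observation 62 shows why a residual gets large: the subject defaulted although the model gave a fitted default probability of under 9%.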

20 Making Predictions
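A prediction can be sketched directly from the Minitab regression equation, P(1) = exp(Y')/(1 + exp(Y')). The function name and the applicant below are hypothetical, and the code assumes that Single, Credit D, Credit E, and Children are 0/1 indicator variables and that Debt is in dollars.

```python
import math

def default_probability(single, credit_d, credit_e, children, debt):
    """Estimated P(default) from the fitted Minitab regression equation."""
    y_prime = (-1.139 + 0.970 * single + 2.023 * credit_d + 3.038 * credit_e
               - 0.849 * children - 0.000019 * debt)
    return math.exp(y_prime) / (1.0 + math.exp(y_prime))

# Hypothetical applicant (values invented for illustration):
# single, credit rating D, no children, $10,000 of debt.
p = default_probability(single=1, credit_d=1, credit_e=0, children=0, debt=10_000)
print(f"Estimated probability of default: {p:.4f}")
```

Unlike a prediction from an ordinary linear regression, the result always lies between 0 and 1, whatever values are supplied.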


28 Summary: Binary Logistic Regression
Why?
– Theoretical and practical difficulties in using regular (continuous) dependent variables
How?
– Minitab procedure
– Interpreting results
– Some diagnostics
– Making predictions
– Comparison with regular regression model

29 For Sessions 11 and 12: Student presentations

