Discrete Choice Modeling William Greene Stern School of Business IFS at UCL February 11-13,
Part 3 Modeling Binary Choice
A Model for Binary Choice
Yes or No decision (Buy/Not buy). Example: choose to fly or not to fly to a destination when there are alternatives.
Model: net utility of flying $U_{fly} = \alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income} + \varepsilon$
Choose to fly if net utility is positive.
Data: X = [1, cost, terminal time], Z = [income], y = 1 if choose fly ($U_{fly} > 0$), 0 if not.
What Can Be Learned from the Data? (A Sample of Consumers, i = 1,…,N)
Are the attributes "relevant"?
Predicting behavior
 - Individual
 - Aggregate
Analyze changes in behavior when attributes change
Application: 210 Commuters Between Sydney and Melbourne
Available modes = Air, Train, Bus, Car
Observed: Choice
Attributes: Cost, terminal time, other
Characteristics: Household income
First application: Fly or other
Binary Choice Data
Columns: Choose Air | Gen. Cost | Term. Time | Income
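An Econometric Model
Choose to fly iff $U_{fly} > 0$.
$U_{fly} = \alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income} + \varepsilon$
$U_{fly} > 0 \iff \varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income})$
Probability model: for any person observed by the analyst,
$\text{Prob(fly)} = \text{Prob}[\varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income})]$
Note the relationship between the unobserved $\varepsilon$ and the outcome.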
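The link between the unobserved ε and the observed outcome can be checked by simulation. A minimal sketch, assuming ε is standard normal (the probit case) and using a hypothetical value for the index α + β1Cost + β2Time + γIncome: the share of simulated people with U_fly > 0 should approach Prob[ε > -index], which for a symmetric distribution equals F(index).

```python
import random
from statistics import NormalDist


def simulated_share(index, n=200_000, seed=1):
    """Share of simulated people with U = index + eps > 0, assuming eps ~ N(0, 1)."""
    rng = random.Random(seed)
    flyers = sum(1 for _ in range(n) if index + rng.gauss(0.0, 1.0) > 0)
    return flyers / n


index = 0.4  # hypothetical value of alpha + b1*Cost + b2*Time + g*Income
print(simulated_share(index))   # simulated Prob(fly)
print(NormalDist().cdf(index))  # Prob[eps > -index] = Phi(index), since N(0, 1) is symmetric
```

With a few hundred thousand draws the two printed numbers agree to about two decimal places.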
[Figure: the index $\alpha + \beta_1 \text{Cost} + \beta_2 \text{TTime} + \gamma \text{Income}$]
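Econometrics
How to estimate $\alpha, \beta_1, \beta_2, \gamma$? It's not regression.
The technique of maximum likelihood:
$\text{Prob}[y=1] = \text{Prob}[\varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income})]$
$\text{Prob}[y=0] = 1 - \text{Prob}[y=1]$
Requires a model for the probability.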
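The maximum likelihood idea fits in a few lines: each person contributes ln Prob[y=1] if they fly and ln Prob[y=0] if they do not. A minimal sketch, assuming the index values z_i = α + β1Cost_i + β2Time_i + γIncome_i are already computed and that `cdf` returns Prob[y = 1 | z] (for a symmetric ε this is F(z)); the function names are hypothetical, and an estimator would search over α, β1, β2, γ to maximize this quantity.

```python
import math


def log_likelihood(y, index, cdf):
    """Binary choice log likelihood.

    y     -- list of 0/1 outcomes
    index -- list of z_i = alpha + b1*Cost_i + b2*Time_i + g*Income_i
    cdf   -- function giving Prob[y = 1 | z]
    Adds ln Prob[y=1] for each y_i = 1 and ln(1 - Prob[y=1]) for each y_i = 0.
    """
    total = 0.0
    for yi, zi in zip(y, index):
        p = cdf(zi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def logistic(z):
    """Standard logistic CDF (the logit choice of F)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data: three people and their index values
print(log_likelihood([1, 0, 1], [0.5, -1.0, 2.0], logistic))
```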
Completing the Model: $F(\varepsilon)$
The distribution:
 - Normal: PROBIT, natural for behavior
 - Logistic: LOGIT, allows "thicker tails"
 - Gompertz: EXTREME VALUE, asymmetric, underlies the basic logit model for multiple choice
Does it matter?
 - Yes: large difference in estimates.
 - Not much: quantities of interest are more stable.
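A minimal sketch of Prob[y = 1] = Prob[ε > -z] under the three distributions, with z the index. The normal and logistic are symmetric, so Prob[ε > -z] = F(z). For the extreme value case the sketch assumes ε is standard type I extreme value (Gumbel), F(e) = exp(-exp(-e)); packages differ in which tail they parameterize, so treat that form as an assumption.

```python
import math
from statistics import NormalDist


def prob_probit(z):
    # eps ~ standard normal (symmetric): Prob[eps > -z] = Phi(z)
    return NormalDist().cdf(z)


def prob_logit(z):
    # eps ~ standard logistic (symmetric): Prob[eps > -z] = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))


def prob_extreme_value(z):
    # ASSUMPTION: eps ~ type I extreme value (Gumbel), F(e) = exp(-exp(-e)).
    # Asymmetric, so Prob[eps > -z] = 1 - F(-z) = 1 - exp(-exp(z)).
    return 1.0 - math.exp(-math.exp(z))


for z in (-2.0, 0.0, 2.0):
    print(z, round(prob_probit(z), 4), round(prob_logit(z), 4), round(prob_extreme_value(z), 4))
```

The distributions also differ in scale (the standard logistic has variance π²/3, the standard normal 1), which is one reason the coefficient estimates differ a lot across models while the fitted probabilities differ much less.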
Estimated Binary Choice Model
Binomial Probit Model, Maximum Likelihood Estimates
Model estimated: Jan 20, 2004 at 04:08:11PM.
Dependent variable: MODE
Weighting variable: None
Number of observations: 210
Iterations completed: 6
Log likelihood function:
Restricted log likelihood:
Chi squared:
Degrees of freedom: 3
Prob[ChiSqd > value] =
Hosmer-Lemeshow chi-squared =
P-value = , with deg.fr. = 8
Columns: Variable | Coefficient | Standard Error | b/St.Er. | P[|Z|>z] | Mean of X
Index function for probability: Constant, GC, TTME, HINC
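The Chi squared entry is the likelihood ratio statistic 2(logL - logL0), comparing the full model with the restricted (constant-only) model; it tests whether the three slopes on GC, TTME and HINC are jointly zero, hence 3 degrees of freedom. A standard-library sketch, assuming logL >= logL0; it uses the closed-form chi-squared(3) survival function, and for other degrees of freedom a library routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def lr_test_df3(logl, logl0):
    """Likelihood ratio test with 3 restrictions.

    Returns (chi squared, Prob[ChiSqd > value]) where chi squared = 2(logL - logL0)
    and the chi-squared(3) survival function is
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    chi2 = 2.0 * (logl - logl0)
    half = chi2 / 2.0
    p_value = math.erfc(math.sqrt(half)) + math.sqrt(2.0 * chi2 / math.pi) * math.exp(-half)
    return chi2, p_value
```

As a check, the 5% critical value of chi-squared(3) is about 7.815, so a statistic of that size should give a p-value near 0.05.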
Estimated Binary Choice Models
Columns: LOGIT (Estimate, t-ratio) | PROBIT (Estimate, t-ratio) | EXTREME VALUE (Estimate, t-ratio)
Rows: Constant, GC, TTME, HINC, Log-L, Log-L(0)
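[Figure: effect on predicted probability of an increase in income ($\gamma$ is positive); the index becomes $\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma(\text{Income}+1)$]
Raising Income by one unit shifts the index up by $\gamma$; because F is increasing and $\gamma > 0$, the predicted probability of flying rises.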
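The income effect can be computed either as the discrete change F(z + γ) - F(z) or as the derivative f(z)γ. A minimal sketch for the logit case, where the density is f(z) = F(z)(1 - F(z)); the index and γ values in the usage line are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def income_effect(index, gamma):
    """Effect on Prob(fly) of a one-unit rise in Income, all else fixed (logit)."""
    discrete = logistic(index + gamma) - logistic(index)  # F(z + gamma) - F(z)
    p = logistic(index)
    derivative = p * (1.0 - p) * gamma                     # f(z) * gamma
    return discrete, derivative


print(income_effect(index=-0.5, gamma=0.02))  # hypothetical index and gamma
```

For a small γ the two versions are close; both have the sign of γ.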
How Well Does the Model Fit?
There is no R squared.
"Fit measures" computed from log L:
 - "pseudo R squared" = 1 - logL/logL0
 - Others…
These do not measure fit.
Direct assessment of the effectiveness of the model at predicting the outcome.
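The pseudo R squared (McFadden's likelihood ratio index) is a one-line computation. A minimal sketch with a hypothetical function name; both log likelihoods are negative and logL >= logL0, so logL/logL0 lies between 0 and 1, which keeps the index between 0 and 1.

```python
def likelihood_ratio_index(logl, logl0):
    """McFadden's pseudo R squared: 1 - logL / logL0.

    logl  -- log likelihood of the fitted model
    logl0 -- log likelihood of the constant-only model (logl0 <= logl < 0)
    """
    return 1.0 - logl / logl0
```

Reversing the ratio (1 - logL0/logL) would give a value at or below zero, which is why the order matters.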
Fit Measures for Binary Choice
Likelihood Ratio Index
 - Bounded by 0 and 1
 - Rises when the model is expanded
Cramer (and others)
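Cramer's measure is commonly defined as the mean fitted probability among the y = 1 cases minus the mean fitted probability among the y = 0 cases. A minimal sketch under that definition (an assumption about which variant a given package reports); `y` and `p` are hypothetical lists of outcomes and fitted probabilities.

```python
def cramer_measure(y, p):
    """Mean fitted P among y = 1 minus mean fitted P among y = 0."""
    p1 = [pi for yi, pi in zip(y, p) if yi == 1]
    p0 = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(p1) / len(p1) - sum(p0) / len(p0)
```

Unlike the likelihood ratio index, this measure is built directly from the fitted probabilities, so it rewards a model that separates the two groups.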
Fit Measures for the Logit Model
Fit Measures for Binomial Choice Model
Probit model for variable MODE
Proportions: P0 = , P1 =
N = 210, N0 = 152, N1 = 58
LogL = , LogL0 =
Estrella = 1-(L/L0)^(-2L0/n) =
Efron | McFadden | Ben./Lerman
Cramer | Veall/Zim. | Rsqrd_ML
Information Criteria: Akaike I.C. | Schwarz I.C.
Pseudo R-squared
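LogL0 does not require estimation: a model with only a constant reproduces the sample proportions, so logL0 = N0 ln P0 + N1 ln P1. A minimal sketch using the counts shown in the output (N0 = 152, N1 = 58), together with the Estrella formula as printed, where L and L0 are the two log likelihoods; the function names are hypothetical.

```python
import math


def restricted_log_likelihood(n0, n1):
    """logL0 for a constant-only model: N0 ln(P0) + N1 ln(P1)."""
    n = n0 + n1
    return n0 * math.log(n0 / n) + n1 * math.log(n1 / n)


def estrella(logl, logl0, n):
    """Estrella = 1 - (L/L0)^(-2 L0 / n), with L, L0 the log likelihoods."""
    return 1.0 - (logl / logl0) ** (-2.0 * logl0 / n)


logl0 = restricted_log_likelihood(152, 58)  # counts N0, N1 from the output
print(round(152 / 210, 4), round(58 / 210, 4), round(logl0, 4))  # P0, P1, LogL0
```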
Predicting the Outcome
Predicted probabilities: P = F(a + b1Cost + b2Time + cIncome)
Predicting outcomes:
 - Predict y = 1 if P is large
 - Use 0.5 for "large" (more likely than not)
 - Count successes and failures
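The prediction rule can be sketched directly. A minimal version, assuming a logistic F and hypothetical coefficient values a, b1, b2, c; y = 1 is predicted only when P is strictly above the threshold, and successes are cases where the prediction matches the observed y.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(a, b1, b2, c, cost, time, income, p_star=0.5):
    """Return (P, predicted y) with P = F(a + b1*Cost + b2*Time + c*Income), F logistic."""
    p = logistic(a + b1 * cost + b2 * time + c * income)
    return p, (1 if p > p_star else 0)


def count_outcomes(y, y_hat):
    """Return (successes, failures): predictions that match / miss the observed y."""
    successes = sum(1 for yi, yh in zip(y, y_hat) if yi == yh)
    return successes, len(y) - successes
```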
Individual Predictions from a Logit Model
Columns: Observation | Observed Y | Predicted Y | Residual | x(i)b | Pr[Y=1]
Note two types of errors and two types of successes.
Predictions in Binary Choice
Predict y = 1 if P > P*.
Success depends on the assumed P*.
ROC Curve
Plot the % of y = 1 correctly predicted (sensitivity) against the % of y = 0 incorrectly predicted as 1 (100 minus specificity) as P* varies.
The 45° line is no fit. Curvature implies fit.
Area under the curve compares models.
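A minimal sketch of the ROC curve and its area, with hypothetical function names. `roc_points` traces (false positive rate, true positive rate) over a list of thresholds P*. `auc` uses the standard rank interpretation of the area: the share of (y = 1, y = 0) pairs in which the y = 1 case has the higher fitted P, with ties counted as one half. An area of 0.5 corresponds to the 45° line.

```python
def roc_points(y, p, thresholds):
    """(false positive rate, true positive rate) for each P*, predicting y = 1 if P > P*."""
    n1 = sum(y)
    n0 = len(y) - n1
    points = []
    for t in thresholds:
        tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi > t)
        fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi > t)
        points.append((fp / n0, tp / n1))
    return points


def auc(y, p):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))
```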
Aggregate Predictions
Frequencies of actual & predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 = .5000
Rows: Actual 0, 1, Total | Columns: Predicted 0, 1, Total | Grand total: 210
Analyzing Predictions
Frequencies of actual & predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 is P*. (This table can be computed with any P*.)
                 Predicted
Actual      0          1          | Total
0           N(a0,p0)   N(a0,p1)   | N(a0)
1           N(a1,p0)   N(a1,p1)   | N(a1)
Total       N(p0)      N(p1)      | N
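The four cells of the table can be counted for any P*. A minimal sketch with a hypothetical function name, keyed by (actual, predicted) so that counts[(0, 1)] is N(a0,p1); `y` and `p` are lists of observed outcomes and fitted probabilities.

```python
def prediction_table(y, p, p_star=0.5):
    """Counts N(a, p) of actual vs. predicted outcomes, predicting y = 1 if P > P*."""
    counts = {(a, pr): 0 for a in (0, 1) for pr in (0, 1)}
    for yi, pi in zip(y, p):
        counts[(yi, 1 if pi > p_star else 0)] += 1
    return counts
```

The row totals N(a0), N(a1) do not depend on P*; moving P* only shifts cases between the predicted columns.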
Analyzing Predictions - Success
Sensitivity = % actual 1s correctly predicted = 100N(a1,p1)/N(a1) % [100(38/58) = 65.5%]
Specificity = % actual 0s correctly predicted = 100N(a0,p0)/N(a0) % [100(151/152) = 99.3%]
Positive predictive value = % predicted 1s that were actual 1s = 100N(a1,p1)/N(p1) % [100(38/39) = 97.4%]
Negative predictive value = % predicted 0s that were actual 0s = 100N(a0,p0)/N(p0) % [100(151/171) = 88.3%]
Correct prediction = % actual 1s and 0s correctly predicted = 100[N(a1,p1)+N(a0,p0)]/N [100(151+38)/210 = 90.0%]
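The five success measures follow from the four cell counts. A minimal sketch with a hypothetical function name, applied to the counts used above: N(a0,p0) = 151, N(a0,p1) = 1, N(a1,p0) = 20, N(a1,p1) = 38.

```python
def success_measures(n00, n01, n10, n11):
    """Percent success measures from N(a0,p0), N(a0,p1), N(a1,p0), N(a1,p1)."""
    n_a0, n_a1 = n00 + n01, n10 + n11  # actual 0s, actual 1s
    n_p0, n_p1 = n00 + n10, n01 + n11  # predicted 0s, predicted 1s
    n = n_a0 + n_a1
    return {
        "sensitivity": 100 * n11 / n_a1,
        "specificity": 100 * n00 / n_a0,
        "positive predictive value": 100 * n11 / n_p1,
        "negative predictive value": 100 * n00 / n_p0,
        "correct prediction": 100 * (n11 + n00) / n,
    }


for name, value in success_measures(151, 1, 20, 38).items():
    print(f"{name}: {value:.1f}%")
```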
Analyzing Predictions - Failures
False positive for true negative = % actual 0s predicted as 1s = 100N(a0,p1)/N(a0) % [100(1/152) = 0.658%]
False negative for true positive = % actual 1s predicted as 0s = 100N(a1,p0)/N(a1) % [100(20/58) = 34.5%]
False positive for predicted positive = % predicted 1s that were actual 0s = 100N(a0,p1)/N(p1) % [100(1/39) = 2.56%]
False negative for predicted negative = % predicted 0s that were actual 1s = 100N(a1,p0)/N(p0) % [100(20/171) = 11.7%]
False predictions = % actual 1s and 0s incorrectly predicted = 100[N(a0,p1)+N(a1,p0)]/N [100(1+20)/210 = 10.0%]
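Each failure measure is 100 minus the matching success measure: for example, false negatives for true positives are 100 minus sensitivity, and false predictions are 100 minus correct predictions. A minimal sketch with a hypothetical function name, using the same four cell counts (151, 1, 20, 38).

```python
def failure_measures(n00, n01, n10, n11):
    """Percent failure measures from N(a0,p0), N(a0,p1), N(a1,p0), N(a1,p1)."""
    n_a0, n_a1 = n00 + n01, n10 + n11  # actual 0s, actual 1s
    n_p0, n_p1 = n00 + n10, n01 + n11  # predicted 0s, predicted 1s
    n = n_a0 + n_a1
    return {
        "false positive for true negative": 100 * n01 / n_a0,
        "false negative for true positive": 100 * n10 / n_a1,
        "false positive for predicted positive": 100 * n01 / n_p1,
        "false negative for predicted negative": 100 * n10 / n_p0,
        "false predictions": 100 * (n01 + n10) / n,
    }


for name, value in failure_measures(151, 1, 20, 38).items():
    print(f"{name}: {value:.3g}%")
```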
Aggregate Prediction is a Useful Way to Assess the Importance of a Variable
Frequencies of actual & predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 = .5000
Model fit without TTME: rows Actual 0, 1, Total | columns Predicted 0, 1, Total | grand total 210
Model fit with TTME: rows Actual 0, 1, Total | columns Predicted 0, 1, Total | grand total 210