Discrete Choice Modeling William Greene Stern School of Business IFS at UCL February 11-13, 2004
Discrete Choice Modeling Econometric Methodology Binary Choice Models Multinomial Choice Model Building Specification Estimation Analysis Applications NLOGIT Software
Our Agenda 1. Methodology 2. Discrete Choice Models 3. Binary Choice Models 4. Panel Data Models for Binary Choice 5. Introduction to NLOGIT 6. Discrete Choice Settings 7. The Multinomial Logit Model 8. Heteroscedasticity in Utility Functions 9. Nested Logit Modeling 10. Latent Class Models 11. Mixed Logit Models and Simulation Based Estimation 12. Revealed and Stated Preference Data Sets
Part 1 Methodology
Measurement as Observation Population Measurement Theory Characteristics Behavior Patterns
Individual Behavioral Modeling Assumptions about behavior Common elements across individuals Unique elements Prediction Population aggregates Individual behavior
Modeling Choice Activity as choices Preferences Behavioral axioms Choice as utility maximization
Inference Population Measurement Econometrics Characteristics Behavior Patterns Choices
Econometric Frameworks Nonparametric Parametric Classical (Sampling Theory) Bayesian
Likelihood Based Inference Methods Behavioral Theory Statistical Theory Observed Measurement Likelihood Function The likelihood function embodies the theoretical description of the population. Characteristics of the population are inferred from the characteristics of the likelihood function. (Bayesian and Classical)
Modeling Discrete Choice Theoretical foundations Econometric methodology Models Statistical bases Econometric methods Estimation with econometric software Applications
Part 2 Basics of Discrete Choice Modeling
Modeling Consumer Choice: Continuous Measurement What do we measure? What is revealed by the data? What is the underlying model? What are the empirical tools? Example: Travel expenditure based on price and income Expenditure Income Low price High price
Discrete Choice Observed outcomes Inherently discrete: number of occurrences (e.g., family size; considered separately) Implicitly continuous: the observed data are discrete by construction (e.g., revealed preferences; our main subject) Implications For model building For analysis and prediction of behavior
Two Fundamental Building Blocks Underlying Behavioral Theory: Random utility model The link between underlying behavior and observed data Empirical Tool: Stochastic, parametric model for binary choice A p latform for models of discrete choice
Random Utility A Theoretical Proposition About Behavior Consumer making a choice among several alternatives Example, brand choice (car, food) Choice setting for a consumer: Notation Consumer i, i = 1, …, N Choice setting t, t = 1, …, T i (may be one) Choice set j, j = 1,…, J i (may be fixed)
Behavioral Assumptions Preferences are transitive and complete wrt choice situations Utility is defined over alternatives: U ijt Utility maximization assumption If U i1t > U i2t, consumer chooses alternative 1, not alternative 2. Revealed preference (duality) If the consumer chooses alternative 1 and not alternative 2, then U i1t > U i2t.
Random Utility Functions U itj = j + i ’x itj + i ’z it + ijt j = Choice specific constant x itj = Attributes of choice presented to person i = Person specific taste weights z it = Characteristics of the person i = Weights on person specific characteristics ijt = Unobserved random component of utility Mean: E[ ijt ] = 0, Var[ ijt ] = 1
Part 3 Modeling Binary Choice
A Model for Binary Choice Yes or No decision (Buy/Not buy) Example, choose to fly or not to fly to a destination when there are alternatives. Model: Net utility of flying U fly = + 1Cost + 2Time + Income + Choose to fly if net utility is positive Data: X = [1,cost,terminal time] Z = [income] y = 1 if choose fly, U fly > 0, 0 if not.
What Can Be Learned from the Data? (A Sample of Consumers, i = 1,…,N) Are the attributes “relevant?” Predicting behavior - Individual - Aggregate Analyze changes in behavior when attributes change
Application 210 Commuters Between Sydney and Melbourne Available modes = Air, Train, Bus, Car Observed: Choice Attributes: Cost, terminal time, other Characteristics: Household income First application: Fly or other
Binary Choice Data Choose Air Gen.Cost Term Time Income
An Econometric Model Choose to fly iff U FLY > 0 U fly = + 1Cost + 2Time + Income + U fly > 0 > -( + 1Cost + 2Time + Income) Probability model: For any person observed by the analyst, Prob(fly) = Prob[ > -( + 1Cost + 2Time + Income)] Note the relationship between the unobserved and the outcome
+ 1Cost + 2TTime + Income
Econometrics How to estimate , 1, 2, ? It’s not regression The technique of maximum likelihood Prob[y=1] = Prob[ > -( + 1Cost + 2Time + Income)] Prob[y=0] = 1 - Prob[y=1] Requires a model for the probability
Completing the Model: F( ) The distribution Normal: PROBIT, natural for behavior Logistic: LOGIT, allows “thicker tails” Gompertz: EXTREME VALUE, asymmetric, underlies the basic logit model for multiple choice Does it matter? Yes, large difference in estimates Not much, quantities of interest are more stable.
Estimated Binary Choice Models LOGIT PROBIT EXTREME VALUE Variable Estimate t-ratio Estimate t-ratio Estimate t-ratio Constant GC TTME HINC Log-L Log-L(0)
+ 1Cost + 2Time + (Income+1) Effect on predicted probability of an increase in income ( is positive)
Marginal Effects in Probability Models Prob[Outcome] = some F( + 1Cost…) “Partial effect” = F( + 1Cost…) / ”x” (derivative) Partial effects are derivatives Result varies with model Logit: F( + 1Cost…) / x = Prob * (1-Prob) * Probit: F( + 1Cost…) / x = Normal density Scaling usually erases model differences
The Delta Method
Marginal Effects for Binary Choice Logit Probit
Estimated Marginal Effects Estimate t-ratio Estimate t-ratio Estimate t-ratio GC TTME HINC Logit Probit Extreme Value
Marginal Effect for a Dummy Variable Prob[y i = 1|x i,d i ] = F( ’x i + d i ) =conditional mean Marginal effect of d Prob[y i = 1|x i,d i =1]=Prob[y i = 1|x i,d i =0] Logit:
(Marginal) Effect – Dummy Variable HighIncm = 1(Income > 50) | Partial derivatives of probabilities with | | respect to the vector of characteristics. | | They are computed at the means of the Xs. | | Observations used are All Obs. | |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X| Characteristics in numerator of Prob[Y = 1] Constant GC E E TTME E E Marginal effect for dummy variable is P|1 - P|0. HIGHINCM E E (Autodetected)
Computing Effects Compute at the data means? Simple Inference is well defined Average individual effects More appropriate? Asymptotic standard errors. (Not done correctly in the literature – terms are correlated!)
Elasticities Elasticity = How to compute standard errors? Delta method Bootstrap Bootstrap the individual elasticities? (Will neglect variation in parameter estimates.) Bootstrap model estimation?
Estimated Income Elasticity for Air Choice Model | Results of bootstrap estimation of model.| | Model has been reestimated 25 times. | | Statistics shown below are centered | | around the original estimate based on | | the original full sample of observations.| | Result is ETA = | | bootstrap samples have 840 observations.| | Estimate RtMnSqDev Skewness Kurtosis | | | | Minimum =.125 Maximum = | Mean Income = 34.55, Mean P =.2716, Estimated ME = , Estimated Elasticity=
Odds Ratio – Logit Model Only Effect Measure? “Effect of a unit change in the odds ratio.”
Inference for Odds Ratios Logit coefficient = , estimate = b Coefficient = exp( ), estimate = exp(b) Standard error = exp(b) times se(b) t ratio is the same
How Well Does the Model Fit? There is no R squared “Fit measures” computed from log L “pseudo R squared = 1 – logL0/logL Others… - these do not measure fit. Direct assessment of the effectiveness of the model at predicting the outcome
Fit Measures for Binary Choice Likelihood Ratio Index Bounded by 0 and 1 Rises when the model is expanded Cramer (and others)
Fit Measures for the Logit Model | Fit Measures for Binomial Choice Model | | Probit model for variable MODE | | Proportions P0= P1= | | N = 210 N0= 152 N1= 58 | | LogL = LogL0 = | | Estrella = 1-(L/L0)^(-2L0/n) = | | Efron | McFadden | Ben./Lerman | | | | | | Cramer | Veall/Zim. | Rsqrd_ML | | | | | | Information Akaike I.C. Schwarz I.C. | | Criteria |
Predicting the Outcome Predicted probabilities P = F(a + b1Cost + b2Time + cIncome) Predicting outcomes Predict y=1 if P is large Use 0.5 for “large” (more likely than not) Count successes and failures
Individual Predictions from a Logit Model Observation Observed Y Predicted Y Residual x(i)b Pr[Y=1] Note two types of errors and two types of successes.
Predictions in Binary Choice Predict y = 1 if P > P* Success depends on the assumed P*
ROC Curve Plot %Y=1 correctly predicted vs. %y=1 incorrectly predicted 45 0 is no fit. Curvature implies fit. Area under the curve compares models
Aggregate Predictions Frequencies of actual & predicted outcomes Predicted outcome has maximum probability. Threshold value for predicting Y=1 =.5000 Predicted Actual 0 1 | Total | | Total | 210
Analyzing Predictions Frequencies of actual & predicted outcomes Predicted outcome has maximum probability. Threshold value for predicting Y=1 is P* (This table can be computed with any P*.) Predicted Actual 0 1 | Total N(a0,p0) N(a0,p1) | N(a0) 1 N(a1,p0) N(a1,p1) | N(a1) Total N(p0) N(p1) | N
Analyzing Predictions - Success Sensitivity = % actual 1s correctly predicted = 100N(a1,p1)/N(a1) % [100(38/58)=65.5%] Specificity = % actual 0s correctly predicted = 100N(a0,p0)/N(a0) % [100(151/152)=99.3%] Positive predictive value = % predicted 1s that were actual 1s = 100N(a1,p1)/N(p1) % [100(38/39)=97.4%] Negative predictive value = % predicted 0s that were actual 0s = 100N(a0,p0)/N(p0) % [100(151/171)=88.3%] Correct prediction = %actual 1s and 0s correctly predicted = 100[N(a1,p1)+N(a0,p0)]/N [100(151+38)/210=90.0%]
Analyzing Predictions - Failures False positive for true negative = %actual 0s predicted as 1s = 100N(a0,p1)/N(a0) % [100(1/152)=0.668%] False negative for true positive = %actual 1s predicted as 0s = 100N(a1,p0)/N(a1) % [100(20/258)=34.5%] False positive for predicted positive = % predicted 1s that were actual 0s = 100N(a0,p1)/N(p1) % [100(1/39)=2/56%] False negative for predicted negative = % predicted 0s that were actual 1s = 100N(a1,p0)/N(p0) % [100(20/171)=11.7%] False predictions = %actual 1s and 0s incorrectly predicted = 100[N(a0,p1)+N(a1,p0)]/N [100(1+20)/210=10.0%]
Aggregate Prediction is a Useful Way to Assess the Importance of a Variable Frequencies of actual & predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 =.5000 Predicted Actual 0 1 | Total | | Total | 210 Predicted Actual 0 1 | Total | | Total | 210 Model fit without TTMEModel fit with TTME
Simulating the Model to Examine Changes in Market Shares Suppose TTME increased by 25% for everyone. Before increase After increase Predicted Actual 0 1 | Total | | Total | 210 Predicted Actual 0 1 | Total | | Total | 210 The model predicts 10 fewer people would fly NOTE: The same model used for both sets of predictions.
Scaling U itj = j + i ’x itj + i ’z it + ijt ijt = Unobserved random component of utility Mean: E[ ijt ] = 0, Var[ ijt ] = 1 Why assume variance = 1? What if there are subgroups with different variances? Cost of ignoring the between group variation? Specifically modeling More general heterogeneity across people Cost of the homogeneity assumption Modeling issues
Choice Between Two Alternatives By way of example: Automobile type Choices (1) SUV or (2) Sedan, J i = 2 One choice situation: T i = 1 Attribute: x ij = price, perhaps others Characteristic: z i = income No variation in taste parameters, i = What do revealed choices tell us?
Modeling the Binary Choice U i,suv = suv + P suv + suv Income + i,suv U i,sed = sed + P sed + sed Income + i,sed Chooses SUV: U i,suv > U i,sed U i,suv - U i,sed > 0 ( SUV - SED ) + (P SUV -P SED ) + ( SUV- sed)Income + i,suv - i,sed > 0 i > -[ + (P SUV -P SED ) + Income]
Probability Model for Choice Between Two Alternatives i > -[ + (P SUV -P SED ) + Income]
Individual vs. Grouped Data Proportions and Frequencies Likelihood is the same Yj i may be 1s and 0s, proportions, or frequencies for the two outcomes.
Weighting and Choice Based Sampling Weighted log likelihood for all data types Endogenous weights for individual data “Biased” sampling – “Choice Based”
Choice Based Sample SamplePopulationWeight Air27.62%14% Ground72.38%86%1.1882
Choice Based Sampling Correction Maximize Weighted Log Likelihood Covariance Matrix Adjustment V = H -1 G H -1 (all three weighted) H = Hessian G = Outer products of gradients “Robust” covariance matrix (?) (Above without weights. What is it robust to?)
Effect of Choice Based Sampling Unweighted |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Constant GC E E TTME E E HINC E E | Weighting variable CBWT | | Corrected for Choice Based Sampling | |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Constant GC E E TTME E E HINC E E
Hypothesis Testing – Neyman/Pearson Comparisons of Likelihood Functions Likelihood Ratio Tests Lagrange Multiplier Tests Distance Measures: Wald Statistics (All to be demonstrated in the lab)
Heteroscedasticity in Binary Choice Models Random utility: Y i = 1 iff ’x i + i > 0 Resemblance to regression: How to accommodate heterogeneity in the random unobserved effects across individuals? Heteroscedasticity – different scaling Parameterize: Var[ i ] = exp( ’z i ) Reformulate probabilities Probit: Partial effects are now very complicated
Application: Credit Data Counts of major derogatory reports) “Deadbeat” = 1 if MAJORDRG > 0 Mean depends on AGE, INCOME, OWNRENT, SELFEMPLOYED Variance depends on AVGEXP, DEPENDT (average monthly expenditure, number of dependents) Probit model with heteroscedasticity
Probit with Heteroscedasticity | Binomial Probit Model | | Dependent variable DEADBEAT | | Number of observations 1319 | | Log likelihood function | | Restricted log likelihood | | Chi-squared | | Degrees of freedom 6 | | Significance level E-04 | |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X| Index function for probability Constant AGE E E INCOME E E OWNRENT E SELFEMPL E-01 Variance function AVGEXP E E DEPNDT E E | Partial derivatives of E[y] = F[*] with | | respect to the vector of characteristics. | | They are computed at the means of the Xs. | | Observations used for means are All Obs. | |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X| Index function for probability Constant E AGE E E INCOME E E OWNRENT E E SELFEMPL E E E-01 Variance function AVGEXP E E DEPNDT E E
Part 4 Panel Data Models for Binary Choice
Panel Data and Binary Choice Models U it = + ’x it + it + Person i specific effect Fixed effects using “dummy” variables U it = i + ’x it + it Random effects using omitted heterogeneity U it = + ’x it + ( it + v i ) Same outcome mechanism: Y it = [U it > 0]
Fixed and Random Effects Models Fixed Effects Robust to both specifications Inconvenient to compute (many parameters) Incidental parameters problem Random Effects Inconsistent if correlated with X Small number of parameters Easier to compute Computation – available estimators
Fixed Effects Dummy variable coefficients U it = i + ’x it + it Can be done by “brute force” for 10,000s of individuals F(.) = appropriate probability for the observed outcome Compute and i for i=1,…,N (may be large) See “Estimating Econometric Models with Fixed Effects” at
Random Effects U it = + ’x it + ( it + v v i ) Logit model (can be generalized) Joint probability for individual i | v i = Unobserved component v i must be eliminated Maximize wrt , and v How to do the integration? Analytic integration – quadrature; most familiar software Simulation
Estimation by Simulation is the sum of the logs of E[Pr(y1,y2,…|v i )]. Can be estimated by sampling vi and averaging. (Use random numbers.)
Random Effects is Equivalent to a Random Constant Term U it = + ’x it + ( it + v v i ) = ( + v i ) + ’x it + it = i + ’x it + it i is random with mean and variance View the simulation as sampling over i Why not make all the coefficients random?
A Sampling Experiment CLOGIT data using GC, TTME, INVT and HINC Standardized data: each X it * is (X it – Mean(X))/S x Constructed utilities U it = GC it * + 1 TTME it * + 1 INVT it * + (Random number it + HINC i *) Treat 4 observations in each group as a panel with T = 4. (We will examine a “live” panel data set in the lab.)
Estimated Fixed Effects Model | FIXED EFFECTS Logit Model | | Maximum Likelihood Estimates | | Dependent variable Z | | Weighting variable None | | Number of observations 840 | | Iterations completed 5 | | Log likelihood function | | Sample is 4 pds and 210 individuals. | | Bypassed 51 groups with inestimable a(i). | | LOGIT (Logistic) probability model | |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X| Index function for probability GC E E TTME E E INVT E E INVC E E Partial derivatives of E[y] = F[*] with respect to the characteristics. Computed at the means of the Xs. Estimated E[y|means,mean alphai]=.501 Estimated scale factor for dE/dx=.250 GC E E TTME E E INVT E E INVC E E WHY?
Estimated Random Effects Model (1) | Logit Model for Panel Data | | Maximum Likelihood Estimates | | Dependent variable Z | | Weighting variable None | | Number of observations 840 | | Iterations completed 15 | | Log likelihood function | | Hosmer-Lemeshow chi-squared = | | P-value= with deg.fr. = 8 | | Random Effects Logit Model for Panel Data | |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X| Characteristics in numerator of Prob[Y = 1] Constant GC E E TTME E E INVT E E INVC E E RndmEfct E E-07
Estimated Random Effects Model (2) | Random Coefficients Logit Model | | Maximum Likelihood Estimates | | Dependent variable Z | | Weighting variable None | | Number of observations 840 | | Iterations completed 14 | | Log likelihood function | | Restricted log likelihood | | Chi-squared | | Degrees of freedom 1 | | Significance level E-01 | | Sample is 4 pds and 210 individuals. | | LOGIT (Logistic) probability model | | Simulation based on 100 random draws | |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X| Nonrandom parameters GC E E TTME E E INVT E E INVC E E Means for random parameters Constant Scale parameters for dists. of random parameters Constant E Conditional Mean at Sample Point.4886 Scale Factor for Marginal Effects.2499 GC E E TTME E E INVT E E INVC E E
Commands for Panel Data Models Model: LOGIT ; Lhs = … ; Rhs = … ; Pds = number of periods Common effect Fixed effects; FEM $ or ; Fixed $ Random; Random Effects $ Simulation; RPM ; Fcn=One(N) $ Use with Probit, Logit (and many others)