Modeling Heterogeneity in Discrete Choice: Recent Developments and Contrasts with Bayesian Estimation William Greene Department of Economics Stern School of Business New York University

Abstract: This study examines some aspects of mixed (random parameters) logit modeling. We present some familiar results in specification and classical estimation of the random parameters model. We then describe several extensions of the mixed logit model developed in recent papers. The relationship of the mixed logit model to Bayesian treatments of the simple multinomial logit model is noted, and comparisons and contrasts of the two methods are described. The techniques described here are applied to two data sets: a stated/revealed choice survey of commuters and a simulated data set on brand choice.

Random Parameters Models of Discrete Choice
 Econometric Methodology for Discrete Choice Models
   Classical Estimation and Inference
   Bayesian Methodology
 Model Building Developments
   The Mixed Logit Model
   Extensions of the Standard Model
   Modeling Individual Heterogeneity
   'Estimation' of Individual Taste Parameters

Useful References
 Classical
   Train, K., Discrete Choice Methods with Simulation, Cambridge. (Train)
   Hensher, D., Rose, J., Greene, W., Applied Choice Analysis, Cambridge.
   Hensher, D., Greene, W., misc. papers.
 Bayesian
   Allenby, G., Lenk, P., "Modeling Household Purchase Behavior with Logistic Normal Regression," JASA.
   Allenby, G., Rossi, P., "Marketing Models of Consumer Heterogeneity," Journal of Econometrics. (A&R)
   Yang, S., Allenby, G., "A Model for Observation, Structural, and Household Heterogeneity in Panel Data," Marketing Letters, 2000.

A Random Utility Model
Random utility model for discrete choice among J alternatives at time t by person i:
U_itj = α_j + β′x_itj + ε_itj
α_j = choice-specific constant
x_itj = attributes of the choice presented to the person (information processing strategy: not all attributes will be evaluated, e.g., lexicographic utility functions over certain attributes)
β = 'taste weights,' 'part worths,' marginal utilities
ε_itj = unobserved random component of utility; mean E[ε_itj] = 0, variance Var[ε_itj] = σ²

The Multinomial Logit Model
Independent type 1 extreme value (Gumbel): F(ε_itj) = exp(−exp(−ε_itj))
Independence across utility functions
Identical variances, σ² = π²/6
Same taste parameters for all individuals
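Under these assumptions the choice probabilities take the familiar closed logit form, P_itj = exp(V_itj) / Σ_m exp(V_itm). A minimal sketch in Python; all attribute values and coefficients here are hypothetical illustrations, not numbers from the applications in these slides:

```python
import numpy as np

def mnl_probabilities(alpha, beta, X):
    """Multinomial logit choice probabilities.

    alpha : (J,) alternative-specific constants
    beta  : (K,) taste weights, common to all individuals
    X     : (J, K) attributes of the J alternatives
    """
    v = alpha + X @ beta        # systematic utilities V_j
    v = v - v.max()             # subtract max to guard against overflow
    expv = np.exp(v)
    return expv / expv.sum()

# Hypothetical example: 3 alternatives, 2 attributes (price, time)
alpha = np.array([0.0, 0.5, -0.2])
beta = np.array([-0.1, -0.05])
X = np.array([[10.0, 30.0],
              [12.0, 20.0],
              [8.0, 40.0]])
p = mnl_probabilities(alpha, beta, X)
```

The probabilities sum to one by construction; the IIA property criticized on the next slide is visible in the fact that the ratio of any two probabilities depends only on those two alternatives' utilities.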

What's Wrong with this MNL Model?
 I.I.D. ⇒ IIA (independence from irrelevant alternatives)
   Peculiar behavioral assumption
   Leads to skewed, implausible empirical results
   Functional forms, e.g., nested logit, avoid IIA
   IIA will be a nonissue in what follows.
 Insufficiently heterogeneous: "… economists are often more interested in aggregate effects and regard heterogeneity as a statistical nuisance parameter problem which must be addressed but not emphasized. Econometricians frequently employ methods which do not allow for the estimation of individual level parameters." (A&R, 1999)

Heterogeneity
 Observational: observable differences across choice makers
 Choice strategy: how consumers make decisions
 Structure: model frameworks
 Preferences: model 'parameters'

Accommodating Heterogeneity
 Observed? Enters the model in familiar (and unfamiliar) ways.
 Unobserved? The purpose of this study.

Observable (Quantifiable) Heterogeneity in Utility Levels
Choice, e.g., among brands of cars
x_itj = attributes: price, features
Z_it = observable characteristics: age, sex, income

Observable Heterogeneity in Preference Weights

'Quantifiable' Heterogeneity in Scaling
w_it = observable characteristics: age, sex, income, etc.

Attention to Heterogeneity
 Modeling heterogeneity is important
 Scaling is extremely important
 Attention to heterogeneity – an informal survey of four literatures:

               Levels    Scaling
   Economics     ●       None
   Education     ●       None
   Marketing     ●       Jordan Louviere
   Transport     ●       ●

Heterogeneity in Choice Strategy
 Consumers avoid 'complexity'
   Lexicographic preferences eliminate certain choices ⇒ the choice set may be endogenously determined
   Simplification strategies may eliminate certain attributes
 Information processing strategy is a source of heterogeneity in the model.

Modeling Attribute Choice
 Conventional: U_ijt = β′x_ijt. For ignored attributes, set x_k,ijt = 0.
   Eliminates x_k,ijt from the utility function
   Price = 0 is not a reasonable datum. Distorts choice probabilities
 Appropriate: formally set β_k = 0
   Requires a 'person specific' model
   Accommodate as part of model estimation (work in progress)
   Stochastic determination of attribute choices

Choice Strategy Heterogeneity
 Methodologically, a rather minor point – construct the appropriate likelihood given known information
 Not a latent class model. Classes are not latent.
 Not the 'variable selection' issue (the worst form of "stepwise" modeling)
 The familiar strategy gives the wrong answer.

Application of Information Strategy
 Stated/revealed preference study; Sydney car commuters surveyed, about 10 choice situations each.
 Existing route vs. 3 proposed alternatives.
 Attribute design
   Original: respondents presented with 3, 4, 5, or 6 attributes
   Attributes – four-level design:
     Free flow time
     Slowed down time
     Stop/start time
     Trip time variability
     Toll cost
     Running cost
   Final: respondents use only some attributes and indicate, when surveyed, which ones they ignored

Estimation Results

"Structural Heterogeneity"
 Marketing literature
 Latent class structures
   Yang/Allenby – latent class random parameters models
   Kamakura et al. – latent class nested logit models with fixed parameters

Latent Classes and Random Parameters

Latent Class Probabilities
 Ambiguous at face value – classical or Bayesian model?
 Equivalent to random parameters models with discrete parameter variation
   Using nested logits, etc. does not change this
   Precisely analogous to continuous 'random parameter' models
 Not always equivalent – zero inflation models

Unobserved Preference Heterogeneity
 What is it?
 How does it enter the model?
   Random 'effects'? Random parameters?

Stochastic Frontier Models with Random Coefficients. M. Tsionas, Journal of Applied Econometrics, 17, 2.
 Bayesian analysis of a production model
 What do we (does he) mean by "random?"

What Do We Mean by Random Parameters?
 Classical
   Distribution across individuals
   Model of heterogeneity across individuals
   Characterization of the population
   "Superpopulation?" (A&R)
 Bayesian
   Parameter uncertainty? (A&R) Whose?
   Distribution defined by a 'prior?' Whose prior? Is it unique? Is one 'right?'
   Definitely NOT heterogeneity. That is handled by individual specific 'random' parameters in a hierarchical model.

Continuous Random Variation in Preference Weights

What Do We 'Estimate?'
 Classical
   f(β_i | Ω, Z_i) = population distribution
   'Estimate' Ω, then
   E[β_i | Ω, Z_i] = conditional mean
   V[β_i | Ω, Z_i] = conditional variance
   Estimation paradigm: "asymptotic (normal)," "approximate," "imaginary samples"
 Bayesian
   f(β | Ω_0) = prior
   L(data | β) = likelihood
   f(β | data, Ω_0) = posterior
   E[β | data, Ω_0] = posterior mean
   V[β | data, Ω_0] = posterior variance
   Estimation paradigm: "exact," "more accurate" ("not general beyond this prior and this sample…")

How Do We 'Estimate It?'
 Objective
   Bayesian: posterior means
   Classical: conditional means
 Mechanics: simulation-based estimator
   Bayesian: random sampling from the posterior distribution. Estimate the mean of a distribution. Always easy.
   Classical: maximum simulated likelihood. Find the maximum of a function. Sometimes very difficult.
 These will look suspiciously similar.

A Practical Model Selection Strategy
What self-contained device is available to suggest that the analyst is fitting the wrong model to the data?
 Classical: The iterations fail to converge. The optimization otherwise breaks down. The model doesn't 'work.'
 Bayesian? E.g., Yang/Allenby: structural/preference heterogeneity has both discrete and continuous variation in the same model. Is this identified? How would you know? The MCMC approach is too easy. It always works.

Bayesian Estimation Platform: The Posterior (to the data) Density

The Estimator is the Posterior Mean
Simulation-based (MCMC) estimation: empirically, the estimator is the average of the retained draws from the posterior. This is not 'exact.' It is the mean of a random sample.
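A sketch of what "the mean of a random sample" means in practice. The draws below stand in for retained MCMC output; since the sampler itself is beside the point here, they are drawn directly from a hypothetical N(1.5, 0.3²) posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for R retained posterior draws of a single coefficient beta.
# A real application would produce these with a Gibbs or
# Metropolis-Hastings sampler; here they are drawn directly.
R = 5000
draws = rng.normal(loc=1.5, scale=0.3, size=R)

posterior_mean = draws.mean()                 # the reported point 'estimate'
mc_error = draws.std(ddof=1) / np.sqrt(R)     # Monte Carlo (simulation) noise
```

The estimate carries simulation error of order 1/√R on top of the statistical uncertainty summarized by the posterior variance; 'exact' refers to the absence of an asymptotic approximation, not the absence of simulation noise.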

Classical Estimation Platform: The Likelihood
Expected value over all possible realizations of β_i (according to the estimated asymptotic distribution), i.e., over all possible samples.

Maximum Simulated Likelihood
True log likelihood: ln L = Σ_i ln ∫ [Π_t P(y_it | β)] f(β | Ω) dβ
Simulated log likelihood: ln L_S = Σ_i ln (1/R) Σ_r Π_t P(y_it | β_ir), where β_ir is the r-th of R draws from f(β | Ω)
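The simulated log likelihood replaces the integral over the mixing distribution with an average over R draws. A self-contained sketch for a cross-section binary logit with one random coefficient β_i ~ N(μ, σ²); the data-generating values and sample size are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulated_loglik(mu, sigma, x, y, R=500):
    """Simulated log likelihood for a binary logit with a single
    random coefficient beta_i ~ N(mu, sigma^2)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        betas = mu + sigma * rng.standard_normal(R)   # R draws of beta_i
        p1 = 1.0 / (1.0 + np.exp(-xi * betas))        # P(y_i = 1 | beta_r)
        p = p1 if yi == 1 else 1.0 - p1
        ll += np.log(p.mean())  # average probability over draws, then log
    return ll

# Hypothetical data: 100 observations generated with beta near 1
x = rng.normal(size=100)
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-1.0 * x))).astype(int)

ll = simulated_loglik(1.0, 0.5, x, y)
```

Maximizing this function over (μ, σ), with the same draws held fixed across iterations, is the classical MSL estimator; note that the log of an average is a biased estimate of the log of the integral for finite R, which is why R must grow with the sample size.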

Individual Parameters

Estimating β_i
"… In contrast, classical approaches to modeling heterogeneity yield only aggregate summaries of heterogeneity and do not provide actionable information about specific groups. The classical approach is therefore of limited value to marketers." (A&R p. 72)

A Bayesian View: Not All Possible Samples, Just This Sample
Looks like the posterior mean

THE Random Parameters Logit Model

Heteroscedastic Logit Kernels
Adds a stochastic underpinning for nested logit to the full random parameters model. More general: allows nests to overlap. Much easier to estimate than nested logit.

Conditional Estimators
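The conditional estimator behind the individual-level results discussed below follows from Bayes' theorem: E[β_i | data_i] is approximated by a likelihood-weighted average of draws from the estimated population distribution. A sketch for the same hypothetical one-coefficient binary logit; μ, σ, and the individual's data are illustrative, not estimates from these slides:

```python
import numpy as np

rng = np.random.default_rng(7)

def conditional_mean_beta(mu, sigma, x_i, y_i, R=2000):
    """Simulated E[beta_i | data_i] for a binary logit with
    beta_i ~ N(mu, sigma^2): weight each population draw by the
    likelihood of individual i's observed choices."""
    betas = mu + sigma * rng.standard_normal(R)        # draws from f(beta | Omega)
    p1 = 1.0 / (1.0 + np.exp(-np.outer(x_i, betas)))   # (T, R) choice probabilities
    p = np.where(np.asarray(y_i)[:, None] == 1, p1, 1.0 - p1)
    L = p.prod(axis=0)                                 # L_i(beta_r): i's likelihood per draw
    w = L / L.sum()                                    # normalized weights
    return float((w * betas).sum())

# An individual who chooses y=1 whenever x is large: the data pull
# the conditional mean above the population mean mu
x_i = np.array([2.0, 2.0, 2.0, -2.0, -2.0])
y_i = np.array([1, 1, 1, 0, 0])
b_i = conditional_mean_beta(mu=0.5, sigma=1.0, x_i=x_i, y_i=y_i)
```

This is the classical counterpart to the hierarchical Bayes posterior mean: the same weighting logic, with Ω replaced by its maximum simulated likelihood estimate rather than integrated against a prior.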

Simulation Based Estimation
 Bayesian: limited to convenient priors (normal, inverse gamma, and Wishart) that produce mathematically tractable posteriors. Largely simple RPMs without heterogeneity.
 Classical: use any distributions for any parts of the heterogeneity that can be simulated. Rich layered model specifications.
   Comparable to Bayesian (normal)
   Constrain parameters to be positive (triangular, lognormal)
   Limit ranges of parameters (uniform, triangular)
   Produce particular shapes of distributions, such as small tails (beta, Weibull, Johnson S_B)
   Heteroscedasticity and scaling heterogeneity
   Nesting and multilayered correlation structures
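Two of the constrained mixing distributions listed above are easy to draw from directly. This sketch (parameter values are arbitrary) shows a lognormal draw that keeps a coefficient strictly positive, and an inverse-CDF draw from a symmetric triangular with bounded support:

```python
import numpy as np

rng = np.random.default_rng(1)
R = 10000

# Lognormal: beta = exp(mu + sigma * z) is strictly positive
beta_ln = np.exp(0.2 + 0.5 * rng.standard_normal(R))

# Symmetric triangular on (a, b) with mode c = (a + b)/2, by inverse CDF
a, b = 0.0, 2.0
c = (a + b) / 2.0
u = rng.random(R)
beta_tri = np.where(
    u < (c - a) / (b - a),
    a + np.sqrt(u * (b - a) * (c - a)),
    b - np.sqrt((1.0 - u) * (b - a) * (b - c)),
)
```

The sign constraint from the lognormal comes at the price of a long right tail, which is one reason the bounded triangular is sometimes preferred for cost coefficients.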

Computational Difficulty?
"Outside of normal linear models with normal random coefficient distributions, performing the integral can be computationally challenging." (A&R, p. 62)
(No longer even remotely true)
(1) MSL with dozens of parameters is simple.
(2) Multivariate normal (multinomial probit) is no longer the benchmark alternative. (See McFadden and Train.)
(3) Intelligent methods of integration (Halton sequences) speed up integration by factors of as much as 10. (These could be used by Bayesians.)

Individual Estimates
 Bayesian… "exact…" What do we mean by "the exact posterior"?
 Classical… "asymptotic…"
 These will be very similar. Counterpoint is not a crippled LCM or MNP. Same model, similar values.
 A theorem of Bernstein-von Mises: Bayesian → Classical as N → ∞. (The likelihood function dominates; posterior mean → mode of the likelihood; the more so as we are able to specify flat priors.)

Application: Shoe Brand Choice
 Simulated data: stated choice, 400 respondents, 8 choice situations
 3 choices + NONE; attributes:
   Fashion = high/low
   Quality = high/low
   Price = 25/50/75/100, coded 1,2,3,4
 Heterogeneity: sex, age (<25, 25-39, 40+)
 Underlying data generated by a 3-class latent class process (100, 200, 100 in classes)
 Thanks to Latent Gold and Jordan Louviere

A Discrete (4 Brand) Choice Model

Random Parameters Model

Heterogeneous (in the Means) Random Parameters Model

Heterogeneity in Both Means and Variances

Individual (Kernel) Effects Model

Multinomial Logit Model Estimates

Individual E[β_i | data_i] Estimates*
*The intervals could be made wider to account for the sampling variability of the underlying (classical) parameter estimators.

Extending the RP Model to WTP
 Use the model to estimate conditional distributions for any function of parameters
 Willingness to pay = β_i,time / β_i,cost
 Use the same method
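A sketch of the WTP computation: take the same conditional (or posterior) draws and push them through the ratio. The means and spreads below are hypothetical, not estimates from the commuter application:

```python
import numpy as np

rng = np.random.default_rng(3)
R = 5000

# Hypothetical draws from individual i's conditional distribution
beta_time = rng.normal(-0.04, 0.005, size=R)   # marginal (dis)utility of travel time
beta_cost = rng.normal(-0.10, 0.010, size=R)   # marginal (dis)utility of cost

wtp_draws = beta_time / beta_cost              # willingness to pay per unit of time saved
wtp_point = wtp_draws.mean()
wtp_interval = np.percentile(wtp_draws, [2.5, 97.5])
```

One caution: a ratio of draws can have extremely heavy tails when the cost coefficient's distribution puts mass near zero, which is one motivation for the bounded mixing distributions discussed earlier.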

What is the 'Individual Estimate?'
 Point estimate of the mean, variance, and range of the random variable β_i | data_i.
 The value is NOT an estimate of β_i; it is an estimate of E[β_i | data_i].
 What would be the best estimate of the actual realization β_i | data_i?
 An interval estimate would account for the sampling 'variation' in the estimator of Ω that enters the computation.
 Bayesian counterpart to the preceding? Posterior mean and variance? The same kind of plot could be done.

Methodological Differences
The focal point of the discussion in the literature is the simplest possible MNL with random coefficients. This is far from adequate to capture the forms of heterogeneity discussed here. Many of the models discussed here are inconvenient or impossible with received Bayesian methods.

Standard Criticisms
 Of the Classical Approach
   Computationally difficult (ML vs. MCMC)
   No attention is paid to household level parameters. There is no natural estimator of individual or household level parameters.
   Responses: see the preceding and Train (2003, ch. 10)
 Of Classical Inference in this Setting
   Asymptotics are "only approximate" and rely on "imaginary samples." Bayesian procedures are "exact."
   Response: The inexactness results from acknowledging that we try to extend these results outside the sample. The Bayesian results are "exact" but have no generality and are useless except for this sample, these data, and this prior. (Or are they? Trying to extend them outside the sample is a distinctly classical exercise.)

Standard Criticisms
 Of the Bayesian Approach
   Computationally difficult. Response: not really, with MCMC and Metropolis-Hastings.
   The prior (conjugate or not) is a canard. It has nothing to do with "prior knowledge" or the uncertainty of the investigator. Response: in fact, the prior usually has little influence on the results. (Bernstein-von Mises theorem)
 Of Bayesian 'Inference'
   It is not statistical inference.
   How do we discern any uncertainty in the results? This is precisely the underpinning of the Bayesian method. There is no uncertainty. It is 'exact.'

A Preconclusion
"The advantage of hierarchical Bayes models of heterogeneity is that they yield disaggregate estimates of model parameters. These estimates are of particular interest to marketers pursuing product differentiation strategies in which products are designed and offered to specific groups of individuals with specific needs. In contrast, classical approaches to modeling heterogeneity yield only aggregate summaries of heterogeneity and do not provide actionable information about specific groups. The classical approach is therefore of limited value to marketers." (A&R p. 72)

Disaggregated Parameters
 The description of classical methods as only producing aggregate results is obviously untrue.
 As regards "targeting specific groups…": both sets of methods produce estimates for the specific data in hand. Unless we want to trot out the specific individuals in this sample to do the analysis and marketing, any extension is problematic. This should be understood in both paradigms.
 NEITHER METHOD PRODUCES ESTIMATES OF INDIVIDUAL PARAMETERS, CLAIMS TO THE CONTRARY NOTWITHSTANDING. BOTH PRODUCE ESTIMATES OF THE MEAN OF THE CONDITIONAL (POSTERIOR) DISTRIBUTION OF POSSIBLE PARAMETER DRAWS, CONDITIONED ON THE PRECISE SPECIFIC DATA FOR INDIVIDUAL I.

Conclusions
 When estimates of the same model are compared, they rarely differ by enough to matter. See Train, chapter 12, for a nice illustration.
 The classical methods shown here provide rich model specifications and do admit 'individual' estimates. They have yet to be emulated by Bayesian methods.
 Just two different algorithms. The philosophical difference in interpretation is a red herring. It appears that each has some advantages and disadvantages.