Discrete Choice Modeling

1 Discrete Choice Modeling
William Greene, Stern School of Business, New York University

2 Part 8 Models for Count Data

3 Application: Major Derogatory Reports
AmEx credit card holders, N = 1310 (of 13,777)
Number of major derogatory reports in 1 year
Issues: nonrandom selection, excess zeros

4 Histogram for Credit Data
Histogram for MAJORDRG
NOBS= 1310, Too low: 0, Too high:
Relative frequencies by bin, in bin order:
Frequency    Cumulative Frequency
( .8038)     ( .8038)
( .1038)     ( .9076)
( .0382)     ( .9458)
( .0183)     ( .9641)
( .0130)     ( .9771)
( .0076)     ( .9847)
( .0038)     ( .9885)
( .0046)     ( .9931)
( .0000)     ( .9931)
( .0015)     ( .9947)
( .0008)     ( .9954)
( .0031)     ( .9985)
( .0008)     ( .9992)
( .0000)     ( .9992)
( .0008)     (1.0000)

5 Doctor Visits

6 Basic Modeling for Counts of Events
E.g., visits to a site, number of purchases, number of doctor visits
Regression approach
Quantitative outcome measured
Discrete variable, model probabilities
Poisson probabilities – “loglinear model”
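As a minimal sketch of the Poisson "loglinear" specification named on this slide, the code below computes Pr[Y = j | x] with conditional mean λ = exp(β’x). The coefficient and covariate values are made up for illustration; they are assumptions, not estimates from the doctor-visits data.

```python
import math

def poisson_prob(j, lam):
    """Poisson probability Pr[Y = j] for mean lam, computed in logs for stability."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def loglinear_mean(beta, x):
    """Conditional mean lambda = exp(beta'x) of the loglinear model."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))

# Hypothetical values (assumptions, not estimates from the slides).
beta = [0.5, 0.02, -0.1]   # constant, age, education
x = [1.0, 40.0, 12.0]      # the leading 1.0 multiplies the constant

lam = loglinear_mean(beta, x)
probs = [poisson_prob(j, lam) for j in range(60)]
mean_from_probs = sum(j * p for j, p in enumerate(probs))
```

The probabilities sum to one (up to truncation at 60) and their mean reproduces λ, which is what makes the exponential mean function the natural regression function for a count.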

7 Poisson Model for Doctor Visits
Poisson Regression
Dependent variable DOCVIS
Log likelihood function
Restricted log likelihood
Chi squared [ 6 d.f.]
Significance level
McFadden Pseudo R-squared
Estimation based on N = , K = 7
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC
Chi-squared = RsqP= .0818
G-squared = RsqD= .0601
Overdispersion tests: g=mu(i) :
Overdispersion tests: g=mu(i)^2:
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***

8 Partial Effects
Partial derivatives of expected val. with respect to the vector of characteristics.
Effects are averaged over individuals.
Observations used for means are All Obs.
Conditional Mean at Sample Point
Scale Factor for Marginal Effects
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
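A minimal sketch of how averaged partial effects of this kind can be computed for an exponential mean, assuming E[y|x] = exp(β’x) so that the derivative with respect to x_k is β_k·exp(β’x). The function name, coefficients, and data rows are hypothetical, and the derivative form is only an approximation for dummy variables such as FEMALE.

```python
import math

def average_partial_effects(beta, X):
    """Return (scale, effects) for an exponential conditional mean.

    scale   = sample average of exp(beta'x_i), the scale factor
    effects = beta_k * scale for every non-constant regressor k
    """
    lams = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]
    scale = sum(lams) / len(lams)
    return scale, [b * scale for b in beta[1:]]

# Hypothetical coefficients and data (first column is the constant).
beta = [0.2, 0.01, -0.05]
X = [[1.0, 30.0, 10.0],
     [1.0, 45.0, 12.0],
     [1.0, 60.0, 16.0]]

scale, ape = average_partial_effects(beta, X)
```

Because every effect is the coefficient times the same scale factor, ratios of partial effects equal ratios of coefficients in this model.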

9 Poisson Model Specification Issues
Equi-dispersion: Var[yi|xi] = E[yi|xi].
Overdispersion: if λi = exp[β’xi + εi], then E[yi|xi] = γexp[β’xi] and Var[yi] > E[yi] (overdispersed).
εi ~ log-Gamma → negative binomial model
εi ~ Normal[0,σ²] → normal-mixture model
εi is viewed as unobserved heterogeneity (“frailty”).
The normal model may be more natural. Estimation is a bit more complicated.
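The overdispersion argument can be checked with a small simulation. This is a sketch under stated assumptions: exp(εi) is drawn from a gamma distribution with mean 1 and variance α (so γ = 1), the mean μ = exp(β’xi) is held fixed at a made-up value, and the Poisson sampler is a simple textbook method rather than anything from the slides.

```python
import math
import random

def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate by multiplying uniforms (Knuth's method)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate(mu, alpha, n, seed=12345):
    """Simulate y ~ Poisson(mu * u), u ~ Gamma with mean 1 and variance alpha."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        u = rng.gammavariate(1.0 / alpha, alpha)  # shape 1/alpha, scale alpha
        ys.append(draw_poisson(mu * u, rng))
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, var

# Hypothetical settings: theory gives mean 2 and variance 2 + 1 * 2**2 = 6.
mean_y, var_y = simulate(mu=2.0, alpha=1.0, n=20000)
```

The sample mean stays near μ while the sample variance is well above it, which is the pattern the overdispersion tests in the Poisson output look for.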

10 Poisson Model for Doctor Visits
Poisson Regression
Dependent variable DOCVIS
Log likelihood function
Restricted log likelihood
Chi squared [ 6 d.f.]
Significance level
McFadden Pseudo R-squared
Estimation based on N = , K = 7
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC
Chi-squared = RsqP= .0818
G-squared = RsqD= .0601
Overdispersion tests: g=mu(i) :
Overdispersion tests: g=mu(i)^2:
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***

11 Alternative Covariance Matrices
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
| Standard – Negative Inverse of Second Derivatives
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
| Robust – Sandwich
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
| Cluster Correction
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***

12 Negative Binomial Specification
Prob(Yi=j|xi) has greater mass to the right and left of the mean.
Conditional mean function is the same as the Poisson: E[yi|xi] = λi = exp(β’xi), so marginal effects have the same form.
Variance is Var[yi|xi] = λi(1 + αλi); α is the overdispersion parameter, and α = 0 reverts to the Poisson.
Poisson is consistent when NegBin is appropriate. Therefore, this is a case for the ROBUST covariance matrix estimator. (Neglected heterogeneity that is uncorrelated with xi.)
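A short numerical check of these moment claims, written as a sketch: it uses the common NB2 parameterization with θ = 1/α and r = θ/(θ + λ), which is an assumption about the exact form behind the slide. The values of λ and α are made up.

```python
import math

def negbin2_pmf(j, lam, alpha):
    """NB2 probability Pr[Y = j] with mean lam and variance lam * (1 + alpha * lam)."""
    theta = 1.0 / alpha
    r = theta / (theta + lam)
    log_p = (math.lgamma(theta + j) - math.lgamma(theta) - math.lgamma(j + 1)
             + theta * math.log(r) + j * math.log(1.0 - r))
    return math.exp(log_p)

def poisson_pmf(j, lam):
    """Poisson probability Pr[Y = j] for mean lam."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

lam, alpha = 2.0, 0.5   # hypothetical values
support = range(400)    # truncation point chosen so the omitted tail is negligible
nb_mean = sum(j * negbin2_pmf(j, lam, alpha) for j in support)
nb_var = sum((j - nb_mean) ** 2 * negbin2_pmf(j, lam, alpha) for j in support)

# More mass at zero than the Poisson with the same mean.
extra_zero_mass = negbin2_pmf(0, lam, alpha) - poisson_pmf(0, lam)
```

The mean matches λ and the variance matches λ(1 + αλ), while the probability of a zero exceeds the Poisson value at the same mean.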

13 NegBin Model for Doctor Visits
Negative Binomial Regression
Dependent variable DOCVIS
Log likelihood function (NegBin LogL)
Restricted log likelihood (Poisson LogL)
Chi squared [ 1 d.f.] (Reject Poisson model)
Significance level
McFadden Pseudo R-squared
Estimation based on N = , K = 8
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC
NegBin form 2; Psi(i) = theta
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
|Dispersion parameter for count data model
Alpha| ***

14 Marginal Effects
Scale Factor for Marginal Effects
POISSON
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
Scale Factor for Marginal Effects
NEGATIVE BINOMIAL
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***

15 Model Formulations
E[yi|xi] = λi
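Only the common mean function of this slide is preserved. As a hedged sketch, the negative binomial forms that appear in the surrounding output (NegBin 1, NegBin 2, NegBin P) are usually distinguished by their variance functions, which can be written as one family, Var[yi|xi] = λi(1 + αλi^(P-1)). That family and the function name below are assumptions supplied for illustration, not text recovered from the slide.

```python
def negbin_p_variance(lam, alpha, p):
    """Variance lam * (1 + alpha * lam**(p - 1)) of the assumed NegBin-P family.

    p = 1 gives NegBin 1: lam * (1 + alpha)
    p = 2 gives NegBin 2: lam * (1 + alpha * lam)
    alpha = 0 gives the Poisson variance lam for any p
    """
    return lam * (1.0 + alpha * lam ** (p - 1))

lam, alpha = 2.0, 0.5  # hypothetical values
variances = {
    "Poisson": negbin_p_variance(lam, 0.0, 2),
    "NB1": negbin_p_variance(lam, alpha, 1),
    "NB2": negbin_p_variance(lam, alpha, 2),
}
```

All three share the mean λi, so the forms differ only in how the variance grows with the mean.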

16 NegBin-1 Model
Negative Binomial Regression
Dependent variable DOCVIS
Log likelihood function
Restricted log likelihood
NegBin form 1; Psi(i) = theta*exp[bx(i)]
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED| **
HHNINC| ***
HHKIDS| ***
|Dispersion parameter for count data model
Alpha| ***

17 NegBin-P Model
Negative Binomial (P) Model
Dependent variable DOCVIS
Log likelihood function
Restricted log likelihood
Chi squared [ 1 d.f.]
Variable| Coefficient Standard Error b/St.Er.
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED| *
HHNINC| ***
HHKIDS| ***
|Dispersion parameter for count data model
Alpha| ***
|Negative Binomial. General form, NegBin P
P| ***
NB NB Poisson

18 Marginal Effects for Different Models
Scale Factor for Marginal Effects
POISSON
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
Scale Factor for Marginal Effects
NEGATIVE BINOMIAL - 2
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
Scale Factor for Marginal Effects
NEGATIVE BINOMIAL - 1
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED| **
HHNINC| ***
HHKIDS| ***
Scale Factor for Marginal Effects
NEGATIVE BINOMIAL - P
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED| *
HHNINC| ***
HHKIDS| ***

19 Zero Inflation – ZIP Models
Two regimes (example: recreation site visits):
Regime 0: zero with probability 1. (Never visits the site.)
Regime 1: Poisson with Pr(0) = exp[-λi], λi = exp[β’xi]. (Number of visits, including zero visits this season.)
Unconditional:
Pr[0] = P(regime 0) + P(regime 1)*Pr[0|regime 1]
Pr[j] = P(regime 1)*Pr[j|regime 1] for j > 0
“Two inflation” – number of children
These are “latent class models”
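The two-regime probabilities can be written out directly. This is a minimal sketch in which the regime-0 probability q is a fixed number; in an estimated ZIP model it would come from a splitting equation such as a logit. The values of λ and q are made up.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability Pr[Y = j] for mean lam."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def zip_pmf(j, lam, q):
    """Zero-inflated Poisson: regime 0 (always zero) with probability q,
    regime 1 (Poisson with mean lam) with probability 1 - q."""
    from_poisson = (1.0 - q) * poisson_pmf(j, lam)
    return q + from_poisson if j == 0 else from_poisson

lam, q = 3.0, 0.4  # hypothetical values
zip_probs = [zip_pmf(j, lam, q) for j in range(100)]
zip_mean = sum(j * p for j, p in enumerate(zip_probs))
```

The zero cell collects mass from both regimes, and the unconditional mean shrinks to (1 - q)λ.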

20 Zero Inflation Models

21 Notes on Zero Inflation Models
Poisson is not nested in ZIP. tau = 0 in ZIP(tau) or γ = 0 in ZIP does not produce Poisson; it produces ZIP with P(regime 0) = ½.
Standard tests are not appropriate. Use the Vuong statistic. The ZIP model almost always wins.
Zero inflation models extend to NB models – ZINB(tau) and ZINB are standard models.
Creates two sources of overdispersion.
Generally difficult to estimate.
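A sketch of the Vuong statistic for two non-nested models, computed from per-observation log densities as V = √n · m̄ / s_m with m_i = ln f_A(y_i) − ln f_B(y_i). The data, the parameter values, and the choice of dividing by n in the standard deviation are assumptions for illustration; in practice both models would be evaluated at their own maximum likelihood estimates.

```python
import math

def log_poisson(j, lam):
    """Log of the Poisson probability Pr[Y = j]."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def log_zip(j, lam, q):
    """Log of the zero-inflated Poisson probability with regime-0 probability q."""
    if j == 0:
        return math.log(q + (1.0 - q) * math.exp(-lam))
    return math.log(1.0 - q) + log_poisson(j, lam)

def vuong_statistic(loglik_a, loglik_b):
    """sqrt(n) * mean(m) / sd(m), where m_i = loglik_a[i] - loglik_b[i]."""
    n = len(loglik_a)
    m = [a - b for a, b in zip(loglik_a, loglik_b)]
    m_bar = sum(m) / n
    s_m = math.sqrt(sum((v - m_bar) ** 2 for v in m) / n)
    return math.sqrt(n) * m_bar / s_m

# Hypothetical sample with many zeros; parameters are fixed by hand, not estimated.
y = [0, 0, 0, 0, 0, 0, 0, 0, 3, 4, 2, 5]
ll_zip = [log_zip(j, 3.5, 0.6) for j in y]
ll_poisson = [log_poisson(j, sum(y) / len(y)) for j in y]

v = vuong_statistic(ll_zip, ll_poisson)
```

Large positive values favor the first model passed in and large negative values favor the second; swapping the arguments flips the sign.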

22 ZIP(τ) Model
Zero Altered Poisson Regression Model
Logistic distribution used for splitting model.
ZAP term in probability is F[tau x ln LAMBDA]
Comparison of estimated models
Pr[0|means] Number of zeros Log-likelihood
Poisson Act.= Prd.=
Z.I.Poisson Act.= Prd.=
Note, the ZIP log-likelihood is not directly comparable.
ZIP model with nonzero Q does not encompass the others.
Vuong statistic for testing ZIP vs. unaltered model is
Distributed as standard normal. A value greater than +1.96 favors the zero altered Z.I.Poisson model. A value less than -1.96 rejects the ZIP model.
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
|Poisson/NB/Gamma regression model
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED| ***
HHNINC| ***
HHKIDS| ***
|Zero inflation model
Tau| ***

23 ZIP Model
Zero Altered Poisson Regression Model
Logistic distribution used for splitting model.
ZAP term in probability is F[tau x Z(i) ]
Comparison of estimated models
Pr[0|means] Number of zeros Log-likelihood
Poisson Act.= Prd.=
Z.I.Poisson Act.= Prd.=
Vuong statistic for testing ZIP vs. unaltered model is
Distributed as standard normal. A value greater than +1.96 favors the zero altered Z.I.Poisson model. A value less than -1.96 rejects the ZIP model.
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
|Poisson/NB/Gamma regression model
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED| ***
HHNINC| ***
HHKIDS| ***
|Zero inflation model
Constant| ***
FEMALE| ***
EDUC| ***

24 Marginal Effects for Different Models
Scale Factor for Marginal Effects
POISSON
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
Scale Factor for Marginal Effects
NEGATIVE BINOMIAL - 2
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED|
HHNINC| ***
HHKIDS| ***
Scale Factor for Marginal Effects
ZERO INFLATED POISSON
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED| ***
HHNINC| ***
HHKIDS| ***

25 A Hurdle Model
Two part model:
Model 1: Probability model for more than zero occurrences
Model 2: Model for number of occurrences given that the number is greater than zero.
Applications common in health economics:
Usage of health care facilities
Use of drugs, alcohol, etc.
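The two parts combine into a single probability function. A minimal sketch, assuming a logit for the hurdle (Model 1) and a Poisson truncated at zero for the positive counts (Model 2); the index value z and the mean λ are made-up numbers.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability Pr[Y = j] for mean lam."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def logistic(z):
    """Logit probability of crossing the hurdle, Pr[y > 0]."""
    return 1.0 / (1.0 + math.exp(-z))

def hurdle_pmf(j, lam, p_pos):
    """Hurdle model: Pr[0] = 1 - p_pos; positive counts follow a Poisson
    truncated at zero, scaled by p_pos."""
    if j == 0:
        return 1.0 - p_pos
    return p_pos * poisson_pmf(j, lam) / (1.0 - math.exp(-lam))

z, lam = 0.3, 2.5           # hypothetical hurdle index and count-model mean
p_pos = logistic(z)
hurdle_probs = [hurdle_pmf(j, lam, p_pos) for j in range(100)]
hurdle_mean = sum(j * p for j, p in enumerate(hurdle_probs))
```

Unlike the zero inflation model, every zero here comes from the binary part, so the probability of a zero is governed by the hurdle equation alone.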

26 Hurdle Model

27 Hurdle Model for Doctor Visits
Poisson hurdle model for counts
Dependent variable DOCVIS
Log likelihood function
Restricted log likelihood
Chi squared [ 1 d.f.]
Significance level
McFadden Pseudo R-squared
Estimation based on N = , K = 10
LOGIT hurdle equation
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
|Parameters of count model equation
Constant| ***
AGE| ***
EDUC| ***
FEMALE| ***
MARRIED| ***
HHNINC| ***
HHKIDS| ***
|Parameters of binary hurdle equation
Constant| ***
FEMALE| ***
EDUC| ***

28 Partial Effects
Partial derivatives of expected val. with respect to the vector of characteristics.
Effects are averaged over individuals.
Observations used for means are All Obs.
Conditional Mean at Sample Point
Scale Factor for Marginal Effects
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
|Effects in Count Model Equation
Constant|
AGE|
EDUC|
FEMALE|
MARRIED|
HHNINC|
HHKIDS|
|Effects in Binary Hurdle Equation
Constant| ***
FEMALE| ***
EDUC| ***
|Combined effect is the sum of the two parts
Constant| *
EDUC| ***
FEMALE| ***

29 Panel Data Models
Heterogeneity: λit = exp(β’xit + ci)
Fixed Effects
Poisson: Standard, no incidental parameters issue
NB: Hausman, Hall, Griliches (1984) put FE in variance, not the mean. Use “brute force” to get a conventional FE model.
Random Effects
Poisson: Log-gamma heterogeneity becomes an NB model. Contemporary treatments use normal heterogeneity with simulation or quadrature based estimators.
NB with random effects is equivalent to two “effects”, one time varying and one time invariant. The model is probably overspecified.
Random Parameters: Mixed models, latent class models, hierarchical – all extended to Poisson and NB

30 Random Parameters Model
Random Coefficients Poisson Model
Dependent variable DOCVIS
Log likelihood function
Restricted log likelihood
Chi squared [ 12 d.f.]
Significance level
McFadden Pseudo R-squared
Estimation based on N = , K = 16
Unbalanced panel has individuals
POISSON regression model
Simulation based on Halton draws
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
|Means for random parameters
Constant| ***
EDUC| ***
MARRIED| ***
HHNINC|
|Scale parameters for dists. of random parameters
Constant| ***
EDUC| ***
MARRIED| ***
HHNINC| ***
|Heterogeneity in the means of random parameters
cONE_AGE| ***
cONE_FEM| ***
cEDU_AGE| ***
cEDU_FEM| ***
cMAR_AGE| ***
cMAR_FEM| ***
cHHN_AGE| ***
cHHN_FEM| ***

