1
Qualitative and Limited Dependent Variable Models Adapted from Vera Tabakova’s notes ECON 4551 Econometrics II Memorial University of Newfoundland
2
16.1 Models with Binary Dependent Variables
16.2 The Logit Model for Binary Choice
16.3 Multinomial Logit
16.4 Conditional Logit
16.5 Ordered Choice Models
16.6 Models for Count Data
16.7 Limited Dependent Variables
3
Examples:
An economic model explaining why some individuals take a second, or third, job and engage in "moonlighting."
An economic model of why the federal government awards development grants to some large cities and not others.
An economic model explaining why someone is in the labour force or not.
4
An economic model explaining why some loan applications are accepted and others not at a large metropolitan bank.
An economic model explaining why some individuals vote "yes" for increased spending in a school board election and others vote "no."
An economic model explaining why some female college students decide to study engineering and others do not.
5
If the probability that an individual drives to work is p, it follows that the probability that the person uses public transportation is 1 − p, as long as these two (mutually exclusive) options exhaust the possibilities.
7
One problem with the linear probability model is that the error term is heteroskedastic: the variance of the error term e varies from one observation to another.

y value   e value              Probability
1         1 − (β1 + β2x)       p = β1 + β2x
0         −(β1 + β2x)          1 − p
8
Using generalized least squares, the estimated variance is p̂i(1 − p̂i) = (b1 + b2xi)[1 − (b1 + b2xi)]. So the problem of heteroskedasticity is not insurmountable…
10
Problems:
We can easily obtain values of p̂ = b1 + b2x that are less than 0 or greater than 1.
Some of the estimated variances in (16.6) may be negative, so WLS would not work.
Of course, the errors are not normally distributed.
R2 is usually very poor and a questionable guide to goodness of fit.
11
Figure 16.1 (a) Standard normal cumulative distribution function (b) Standard normal probability density function
13
dp/dx = φ(β1 + β2x)β2, where φ(·) is the standard normal probability density function (not the cumulative distribution function Φ) evaluated at β1 + β2x. Note that this is clearly a nonlinear model: the marginal effect varies depending on where you measure it.
14
Equation (16.11) has the following implications:
1. Since φ(β1 + β2x) is a probability density function, its value is always positive. Consequently the sign of dp/dx is determined by the sign of β2. In the transportation problem we expect β2 to be positive, so that dp/dx > 0: as x increases, we expect p to increase.
15
2. As x changes, the value of the function Φ(β1 + β2x) changes. The standard normal probability density function reaches its maximum when z = 0, that is, when β1 + β2x = 0. In this case p = Φ(0) = 0.5 and an individual is equally likely to choose car or bus transportation. The slope of the probit function p = Φ(z) is at its maximum when z = 0, the borderline case.
16
3. On the other hand, if β1 + β2x is large, say near 3, then the probability that the individual chooses to drive is very large and close to 1. In this case a change in x will have relatively little effect, since φ(β1 + β2x) will be nearly 0. The same is true if β1 + β2x is a large negative value, say near −3. These results are consistent with the notion that if an individual is "set" in their ways, with p near 0 or 1, the effect of a small change in commuting time will be negligible.
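The behaviour described in these three points can be checked numerically. A minimal Python sketch, using made-up probit coefficients b1 and b2 (not estimates from the text):

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical probit coefficients, for illustration only
b1, b2 = -0.5, 0.1

def marginal_effect(x):
    # Equation (16.11): dp/dx = phi(b1 + b2*x) * b2
    return norm_pdf(b1 + b2 * x) * b2

# At x = 5 the index b1 + b2*x equals 0, so the effect is at its maximum;
# at x = 35 the index equals 3, so the effect is nearly zero.
print(marginal_effect(5.0))
print(marginal_effect(35.0))
```

The effect is largest at the borderline case (index 0) and shrinks toward zero as the index moves toward ±3, matching points 2 and 3 above.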
17
Predicting the probability that an individual chooses the alternative y = 1: p̂ = Φ(b1 + b2x). Although you have to be careful with this interpretation!
18
Suppose that y1 = 1, y2 = 1 and y3 = 0, and that the values of x, in minutes, are x1 = 15, x2 = 20 and x3 = 5.
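Each candidate pair of coefficients implies a likelihood for these three observed choices. A sketch of the probit log-likelihood for this tiny example; the coefficient values passed in below are arbitrary trial values, not the MLE:

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The slide's tiny dataset: choices y and commuting-time values x (minutes)
y = [1, 1, 0]
x = [15.0, 20.0, 5.0]

def log_likelihood(b1, b2):
    # ln L = sum_i [ y_i ln Phi(b1 + b2 x_i) + (1 - y_i) ln(1 - Phi(b1 + b2 x_i)) ]
    ll = 0.0
    for yi, xi in zip(y, x):
        p = norm_cdf(b1 + b2 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

# A trial point that fits the pattern (drive when x is large) beats the
# "coin-flip" point (0, 0):
print(log_likelihood(-1.0, 0.1))
print(log_likelihood(0.0, 0.0))
```

Maximum likelihood estimation searches over (b1, b2) for the pair that makes this function as large as possible.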
19
In large samples the maximum likelihood estimator is normally distributed, consistent and best, in the sense that no competing estimator has smaller variance.
21
Marginal effect of DTIME, measured at DTIME = 20
22
If it takes someone 30 minutes longer to take public transportation than to drive to work, the estimated probability that auto transportation will be selected is p̂ = Φ(b1 + b2(30)) = 0.798. Since this estimated probability is greater than 0.5, we may want to "predict" that when public transportation takes 30 minutes longer than driving to work, the individual will choose to drive. But again, use this prediction cautiously!
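The calculation behind this prediction is just Φ evaluated at the estimated index. A Python sketch; the coefficients below are placeholders in the spirit of the transport example (with these values, Φ(b1 + b2·30) does come out near the slide's 0.798):

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Placeholder probit estimates, not the slide's fitted values
b1, b2 = -0.0644, 0.0300

def predict_prob(dtime):
    # p_hat = Phi(b1 + b2*dtime): estimated probability of driving
    return norm_cdf(b1 + b2 * dtime)

p = predict_prob(30.0)
print(p, "-> predict 'drive'" if p > 0.5 else "-> predict 'bus'")
```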
23
In STATA: Use transport.dta
24
Linear fit???
25
Understand this one, but do not use it! You can choose the p-values. What is the meaning of this test? It uses the NORMAL distribution, not the t distribution, because the properties of the probit estimator are asymptotic.
26
Principles of Econometrics, 3rd Edition. Evaluates at the means by default, too.
27
You can request these iterations in GRETL too. What yields cnorm(-0.0597171)?
28
This is a probability
29
IN STATA:
* marginal effects
mfx
mfx, at(dtime=20)
* direct calculation
nlcom (normalden(_b[_cons]+_b[dtime]*30)*_b[dtime])
and
nlcom (normal(_b[_cons]+_b[dtime]*30))
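The two nlcom lines compute the marginal effect and the predicted probability directly from the estimates. A stdlib-Python mirror of those two calculations, with placeholder values standing in for _b[_cons] and _b[dtime]:

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Stata's normal()."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density, Stata's normalden()."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Placeholder estimates standing in for _b[_cons] and _b[dtime]
b_cons, b_dtime = -0.0644, 0.0300

z = b_cons + b_dtime * 30.0
marginal_effect = norm_pdf(z) * b_dtime  # nlcom (normalden(...)*_b[dtime])
pred_prob = norm_cdf(z)                  # nlcom (normal(...))
print(marginal_effect, pred_prob)
```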
31
so
32
So the “logit”, the log-odds, is actually a fully linear function of X
33
1. As the probability goes from 0 to 1, the logit goes from −∞ to +∞.
2. The logit is linear, but the probability is not.
3. The explanatory variables are individual-specific, but do not change across alternatives.
4. The slope coefficient tells us by how much the log-odds changes with a one-unit change in the variable.
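Point 4 is easy to verify numerically: under the logit model the log-odds moves by exactly β2 for a unit change in x, wherever you start. A small sketch with made-up coefficients:

```python
import math

# Hypothetical logit coefficients, for illustration only
b1, b2 = -0.5, 0.2

def prob(x):
    # Logistic probability: p = 1 / (1 + exp(-(b1 + b2*x)))
    return 1.0 / (1.0 + math.exp(-(b1 + b2 * x)))

def log_odds(x):
    # The logit: log(p / (1 - p)), which equals b1 + b2*x
    return math.log(prob(x) / (1.0 - prob(x)))

# A unit change in x moves the log-odds by exactly b2, at any starting x
print(log_odds(1.0) - log_odds(0.0))
print(log_odds(11.0) - log_odds(10.0))
```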
34
1. This model can in principle be estimated with WLS (due to the heteroskedasticity in the error term) if we have grouped data (glogit in STATA, while blogit will run ML logit on grouped data).
IN GRETL: If you want to use logit for the analysis of proportions (where the dependent variable is the proportion of cases having a certain characteristic at each observation, rather than a 1 or 0 variable indicating whether the characteristic is present), you should not use the logit command, but rather construct the logit variable, as in
genr lgt_p = log(p/(1 - p))
2. Otherwise we use MLE on individual data.
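The GRETL genr line simply transforms each observed proportion into its log-odds. The same construction in Python, on a made-up set of proportions:

```python
import math

# Hypothetical grouped data: proportion with the characteristic at each observation
proportions = [0.10, 0.25, 0.50, 0.80]

# GRETL's  genr lgt_p = log(p/(1 - p))  applied to each proportion
lgt_p = [math.log(p / (1.0 - p)) for p in proportions]
print(lgt_p)
```

The resulting lgt_p variable can then be used as the dependent variable in a (weighted) least squares regression.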
35
Goodness of fit:
McFadden's pseudo-R2 (remember that it does not have any natural interpretation for values between 0 and 1)
Count R2 (% of correct predictions) (dodgy but common!)
Etc.
Measures of goodness of fit are of secondary importance. What counts is the sign of the regression coefficients and their statistical and practical significance.
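A sketch of how McFadden's pseudo-R2 and the count R2 are computed, using a tiny made-up set of outcomes and fitted probabilities:

```python
import math

# Hypothetical outcomes and fitted probabilities, for illustration only
y     = [1, 1, 0, 0, 1]
p_hat = [0.8, 0.6, 0.3, 0.4, 0.7]

# Log-likelihood of the fitted model
ll_full = sum(yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
              for yi, p in zip(y, p_hat))

# Null model (intercept only): every fitted probability equals the sample mean
p_bar = sum(y) / len(y)
ll_null = sum(yi * math.log(p_bar) + (1 - yi) * math.log(1.0 - p_bar)
              for yi in y)

# McFadden's pseudo-R2: 1 - lnL_full / lnL_null
mcfadden_r2 = 1.0 - ll_full / ll_null

# Count R2: share of observations classified correctly at the 0.5 cut-off
count_r2 = sum((p >= 0.5) == (yi == 1) for yi, p in zip(y, p_hat)) / len(y)
print(mcfadden_r2, count_r2)
```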
36
Goodness of fit:
MLE is a large-sample method, so the estimated standard errors are asymptotic; we use Z test statistics (based on the normal distribution) instead of t statistics.
A likelihood ratio test (with a test statistic distributed as chi-square, with df = number of regressors) plays the role of the overall F test.
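The LR statistic is twice the gap between the unrestricted and restricted maximized log-likelihoods. A sketch with made-up log-likelihood values:

```python
# Hypothetical maximized log-likelihoods, for illustration only
ll_unrestricted = -1012.4
ll_restricted   = -1020.9   # the model with the tested slopes set to zero

# LR statistic: 2*(lnL_U - lnL_R), chi-square with df = number of restrictions
lr_stat = 2.0 * (ll_unrestricted - ll_restricted)
df = 2
print(lr_stat)  # compare with the chi-square(2) 5% critical value, 5.99
```

Here lr_stat = 17.0 exceeds 5.99, so at the 5% level we would reject the restricted model.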
37
Goodness of fit: example. See http://www.soziologie.uni-halle.de/langer/logitreg/books/long/stbfitstat.pdf. How do you obtain this?
38
Goodness of fit: example. So in STATA the "ones" do not really have to be actual ones, just non-zeros. IN GRETL, if you do not have a binary dependent variable, it is assumed ordered unless specified multinomial; if the variable is not discrete: error! But be very careful with these measures!
39
More diagnostics (STATA only)
To compute the deviance of the residuals: predict "newname", deviance
The deviance for a logit model is like the RSS in OLS: the smaller the deviance, the better the fit.
And (logit only), to combine it with information about leverage: predict "newnamedelta", ddeviance
(A recommended cut-off value for the ddeviance is 4.)
40
More diagnostics
41
Probit versus Logit: why does the rule of thumb not work for dtime?
42
Probit versus Logit
A matter of taste nowadays, since we all have good computers. The underlying distributions share a mean of zero but have different variances: π²/3 for the logistic and 1 for the standard normal. So the estimated slope coefficients differ by a factor of about 1.8 (≈ π/√3); the logit ones are bigger.
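The factor of about 1.8 is just the ratio of the two standard deviations, since the standard logistic distribution has variance π²/3 while the standard normal has variance 1:

```python
import math

# Standard deviation of the standard logistic distribution: pi / sqrt(3)
logistic_sd = math.pi / math.sqrt(3.0)
normal_sd = 1.0

# Logit slopes tend to exceed probit slopes by roughly this factor
print(logistic_sd / normal_sd)  # about 1.81
```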
43
More on Probit versus Logit
Watch out for "perfect predictions." Luckily, STATA will flag them for you and drop the culprit observations.
GRETL has a mechanism for preventing the algorithm from iterating endlessly in search of a nonexistent maximum. One sub-case of interest is when the perfect-prediction problem arises because of a single binary explanatory variable. In this case, the offending variable is dropped from the model and estimation proceeds with the reduced specification.
44
More on Probit versus Logit
However, it may happen that no single "perfect classifier" exists among the regressors, in which case estimation is simply impossible and the algorithm stops with an error. If this happens, unless your model is trivially mis-specified (like predicting whether a country is an oil exporter on the basis of oil revenues), it is normally a small-sample problem: you probably just don't have enough data to estimate your model. You may want to drop some of your explanatory variables.
45
More on Probit versus Logit
Learn about the test command (Wald tests, based on the chi-squared distribution) and the lrtest command (LR tests), so you can test hypotheses as we did with t tests and F tests in OLS. The two are asymptotically equivalent but can differ in small samples.
46
More on Probit versus Logit Learn about the many extra STATA capabilities, if you use it, that will make your postestimation life much easier Long and Freese’s book is a great resource GRETL is more limited but doing things by hand for now will actually be a good thing!
47
For example
49
More on Probit versus Logit
Stata users? Go through a couple of the examples available online with your own STATA session connected to the internet. Examples:
http://www.ats.ucla.edu/stat/stata/dae/probit.htm
http://www.ats.ucla.edu/stat/stata/dae/logit.htm
http://www.ats.ucla.edu/stat/stata/output/old/lognoframe.htm
http://www.ats.ucla.edu/stat/stata/output/stata_logistic.htm
50
Keywords: binary choice models; censored data; conditional logit; count data models; feasible generalized least squares; Heckit; identification problem; independence of irrelevant alternatives (IIA); index models; individual and alternative specific variables; individual specific variables; latent variables; likelihood function; limited dependent variables; linear probability model; logistic random variable; logit; log-likelihood function; marginal effect; maximum likelihood estimation; multinomial choice models; multinomial logit; odds ratio; ordered choice models; ordered probit; ordinal variables; Poisson random variable; Poisson regression model; probit; selection bias; tobit model; truncated data
51
References
Long, J. S. and J. Freese, Regression Models for Categorical Dependent Variables Using Stata, for all topics (available on Google!)
52
Next Multinomial Logit Conditional Logit