1
Qualitative and Limited Dependent Variable Models Adapted from Vera Tabakova’s notes ECON 4551 Econometrics II Memorial University of Newfoundland
2
16.1 Models with Binary Dependent Variables
16.2 The Logit Model for Binary Choice
16.3 Multinomial Logit
16.4 Conditional Logit
16.5 Ordered Choice Models
16.6 Models for Count Data
16.7 Limited Dependent Variables
3
Examples:
An economic model explaining why some individuals take a second, or third, job and engage in "moonlighting."
An economic model of why the federal government awards development grants to some large cities and not others.
An economic model explaining why someone is in the labour force or not.
4
An economic model explaining why some loan applications are accepted and others not at a large metropolitan bank.
An economic model explaining why some individuals vote "yes" for increased spending in a school board election and others vote "no."
An economic model explaining why some female college students decide to study engineering and others do not.
5
If the probability that an individual drives to work is p, it follows that the probability that the person uses public transportation is 1 − p, as long as these two (mutually exclusive) options exhaust the possibilities.
7
One problem with the linear probability model is that the error term is heteroskedastic: the variance of the error term e varies from one observation to another.

y value   e value              Probability
1         1 − (β1 + β2x)       p = β1 + β2x
0         −(β1 + β2x)          1 − p
8
Using generalized least squares, the estimated variance is p̂i(1 − p̂i) = (b1 + b2xi)[1 − (b1 + b2xi)]. So the problem of heteroskedasticity is not insurmountable…
10
Problems:
We can easily obtain values of p̂ = b1 + b2x that are less than 0 or greater than 1.
Some of the estimated variances in (16.6) may be negative, so WLS would not work.
Of course, the errors are not normally distributed.
R2 is usually very poor and a questionable guide to goodness of fit.
11
Figure 16.1 (a) Standard normal cumulative distribution function (b) Standard normal probability density function
13
dp/dx = φ(β1 + β2x)β2, where φ(·) is the standard normal probability density function (not the cumulative distribution function Φ) evaluated at β1 + β2x. Note that this is clearly a nonlinear model: the marginal effect varies depending on where you measure it.
14
Equation (16.11) has the following implications:
1. Since φ(β1 + β2x) is a probability density function, its value is always positive. Consequently the sign of dp/dx is determined by the sign of β2. In the transportation problem we expect β2 to be positive, so that dp/dx > 0: as x increases, we expect p to increase.
15
2. As x changes, the value of the function Φ(β1 + β2x) changes. The standard normal probability density function reaches its maximum when z = 0, that is, when β1 + β2x = 0. In this case p = Φ(0) = 0.5 and an individual is equally likely to choose car or bus transportation. The slope of the probit function p = Φ(z) is at its maximum when z = 0, the borderline case.
16
3. On the other hand, if β1 + β2x is large, say near 3, then the probability that the individual chooses to drive is very large and close to 1. In this case a change in x will have relatively little effect, since φ(β1 + β2x) will be nearly 0. The same is true if β1 + β2x is a large negative value, say near −3. These results are consistent with the notion that if an individual is "set" in their ways, with p near 0 or 1, the effect of a small change in commuting time will be negligible.
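The behaviour described in these three points can be checked numerically. A minimal Python sketch, using made-up probit coefficients b1 and b2 (not estimates from the text):

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical probit coefficients, for illustration only
b1, b2 = -0.5, 0.1

def marginal_effect(x):
    # Equation (16.11): dp/dx = phi(b1 + b2*x) * b2
    return norm_pdf(b1 + b2 * x) * b2

# At x = 5 the index b1 + b2*x equals 0, so the effect is at its maximum;
# at x = 35 the index equals 3, so the effect is nearly zero.
print(marginal_effect(5.0))
print(marginal_effect(35.0))
```

The effect is largest at the borderline case (index 0) and shrinks toward zero as the index moves toward ±3, matching points 2 and 3 above.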
17
Predicting the probability that an individual chooses the alternative y = 1: p̂ = Φ(b1 + b2x). Although you have to be careful with this interpretation!
18
Suppose that y1 = 1, y2 = 1 and y3 = 0, and that the values of x, in minutes, are x1 = 15, x2 = 20 and x3 = 5.
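Each candidate pair of coefficients implies a likelihood for these three observed choices. A sketch of the probit log-likelihood for this tiny example; the coefficient values passed in below are arbitrary trial values, not the MLE:

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The slide's tiny dataset: choices y and commuting-time values x (minutes)
y = [1, 1, 0]
x = [15.0, 20.0, 5.0]

def log_likelihood(b1, b2):
    # ln L = sum_i [ y_i ln Phi(b1 + b2 x_i) + (1 - y_i) ln(1 - Phi(b1 + b2 x_i)) ]
    ll = 0.0
    for yi, xi in zip(y, x):
        p = norm_cdf(b1 + b2 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

# A trial point that fits the pattern (drive when x is large) beats the
# "coin-flip" point (0, 0):
print(log_likelihood(-1.0, 0.1))
print(log_likelihood(0.0, 0.0))
```

Maximum likelihood estimation searches over (b1, b2) for the pair that makes this function as large as possible.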
19
In large samples the maximum likelihood estimator is normally distributed, consistent and best, in the sense that no competing estimator has smaller variance.
21
Marginal effect of DTIME, measured at DTIME = 20
22
If it takes someone 30 minutes longer to take public transportation than to drive to work, the estimated probability that auto transportation will be selected is p̂ = Φ(b1 + b2(30)) = 0.798. Since this estimated probability is greater than 0.5, we may want to "predict" that when public transportation takes 30 minutes longer than driving to work, the individual will choose to drive. But again, use this prediction cautiously!
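The calculation behind this prediction is just Φ evaluated at the estimated index. A Python sketch; the coefficients below are placeholders in the spirit of the transport example (with these values, Φ(b1 + b2·30) does come out near the slide's 0.798):

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Placeholder probit estimates, not the slide's fitted values
b1, b2 = -0.0644, 0.0300

def predict_prob(dtime):
    # p_hat = Phi(b1 + b2*dtime): estimated probability of driving
    return norm_cdf(b1 + b2 * dtime)

p = predict_prob(30.0)
print(p, "-> predict 'drive'" if p > 0.5 else "-> predict 'bus'")
```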
23
In STATA: Use transport.dta
24
Linear fit???
25
Understand this one, but do not use it! You can choose the p-values. What is the meaning of this test? It uses the NORMAL distribution, not the t distribution, because the properties of the probit estimator are asymptotic.
26
Principles of Econometrics, 3rd Edition. Evaluates at the means by default, too.
27
You can request these iterations in GRETL too. What yields cnorm(-0.0597171)?
28
This is a probability
29
IN STATA:
* marginal effects
mfx
mfx, at(dtime=20)
* direct calculation
nlcom (normalden(_b[_cons]+_b[dtime]*30)*_b[dtime])
and
nlcom (normal(_b[_cons]+_b[dtime]*30))
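The two nlcom lines compute the marginal effect and the predicted probability directly from the estimates. A stdlib-Python mirror of those two calculations, with placeholder values standing in for _b[_cons] and _b[dtime]:

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Stata's normal()."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density, Stata's normalden()."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Placeholder estimates standing in for _b[_cons] and _b[dtime]
b_cons, b_dtime = -0.0644, 0.0300

z = b_cons + b_dtime * 30.0
marginal_effect = norm_pdf(z) * b_dtime  # nlcom (normalden(...)*_b[dtime])
pred_prob = norm_cdf(z)                  # nlcom (normal(...))
print(marginal_effect, pred_prob)
```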
31
so
32
So the “logit”, the log-odds, is actually a fully linear function of X
33
1. As the probability goes from 0 to 1, the logit goes from −∞ to +∞.
2. The logit is linear, but the probability is not.
3. The explanatory variables are individual-specific, but do not change across alternatives.
4. The slope coefficient tells us by how much the log-odds changes with a one-unit change in the variable.
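Point 4 is easy to verify numerically: under the logit model the log-odds moves by exactly β2 for a unit change in x, wherever you start. A small sketch with made-up coefficients:

```python
import math

# Hypothetical logit coefficients, for illustration only
b1, b2 = -0.5, 0.2

def prob(x):
    # Logistic probability: p = 1 / (1 + exp(-(b1 + b2*x)))
    return 1.0 / (1.0 + math.exp(-(b1 + b2 * x)))

def log_odds(x):
    # The logit: log(p / (1 - p)), which equals b1 + b2*x
    return math.log(prob(x) / (1.0 - prob(x)))

# A unit change in x moves the log-odds by exactly b2, at any starting x
print(log_odds(1.0) - log_odds(0.0))
print(log_odds(11.0) - log_odds(10.0))
```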
34
1. This model can in principle be estimated with WLS (due to the heteroskedasticity in the error term) if we have grouped data (glogit in STATA, while blogit will run ML logit on grouped data).
IN GRETL: If you want to use logit for the analysis of proportions (where the dependent variable is the proportion of cases having a certain characteristic at each observation, rather than a 1 or 0 variable indicating whether the characteristic is present), you should not use the logit command, but rather construct the logit variable, as in
genr lgt_p = log(p/(1 - p))
2. Otherwise we use MLE on individual data.
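The GRETL genr line simply transforms each observed proportion into its log-odds. The same construction in Python, on a made-up set of proportions:

```python
import math

# Hypothetical grouped data: proportion with the characteristic at each observation
proportions = [0.10, 0.25, 0.50, 0.80]

# GRETL's  genr lgt_p = log(p/(1 - p))  applied to each proportion
lgt_p = [math.log(p / (1.0 - p)) for p in proportions]
print(lgt_p)
```

The resulting lgt_p variable can then be used as the dependent variable in a (weighted) least squares regression.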
35
Goodness of fit:
McFadden's pseudo-R2 (remember that it does not have any natural interpretation for values between 0 and 1)
Count R2 (% of correct predictions) (dodgy but common!)
Etc.
Measures of goodness of fit are of secondary importance. What counts is the sign of the regression coefficients and their statistical and practical significance.
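A sketch of how McFadden's pseudo-R2 and the count R2 are computed, using a tiny made-up set of outcomes and fitted probabilities:

```python
import math

# Hypothetical outcomes and fitted probabilities, for illustration only
y     = [1, 1, 0, 0, 1]
p_hat = [0.8, 0.6, 0.3, 0.4, 0.7]

# Log-likelihood of the fitted model
ll_full = sum(yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
              for yi, p in zip(y, p_hat))

# Null model (intercept only): every fitted probability equals the sample mean
p_bar = sum(y) / len(y)
ll_null = sum(yi * math.log(p_bar) + (1 - yi) * math.log(1.0 - p_bar)
              for yi in y)

# McFadden's pseudo-R2: 1 - lnL_full / lnL_null
mcfadden_r2 = 1.0 - ll_full / ll_null

# Count R2: share of observations classified correctly at the 0.5 cut-off
count_r2 = sum((p >= 0.5) == (yi == 1) for yi, p in zip(y, p_hat)) / len(y)
print(mcfadden_r2, count_r2)
```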
36
Goodness of fit:
MLE is a large-sample method, so the estimated standard errors are asymptotic; we use Z test statistics (based on the normal distribution) instead of t statistics.
A likelihood ratio test (with a test statistic distributed as chi-square, with df = number of regressors) plays the role of the overall F test.
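The LR statistic is twice the gap between the unrestricted and restricted maximized log-likelihoods. A sketch with made-up log-likelihood values:

```python
# Hypothetical maximized log-likelihoods, for illustration only
ll_unrestricted = -1012.4
ll_restricted   = -1020.9   # the model with the tested slopes set to zero

# LR statistic: 2*(lnL_U - lnL_R), chi-square with df = number of restrictions
lr_stat = 2.0 * (ll_unrestricted - ll_restricted)
df = 2
print(lr_stat)  # compare with the chi-square(2) 5% critical value, 5.99
```

Here lr_stat = 17.0 exceeds 5.99, so at the 5% level we would reject the restricted model.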
37
Goodness of fit: example. See http://www.soziologie.uni-halle.de/langer/logitreg/books/long/stbfitstat.pdf. How do you obtain this?
38
Goodness of fit: example. So in STATA the "ones" do not really have to be actual ones, just non-zeros. IN GRETL, if you do not have a binary dependent variable, it is assumed ordered unless specified multinomial; if the variable is not discrete: error! But be very careful with these measures!
39
More diagnostics (STATA only)
To compute the deviance of the residuals: predict "newname", deviance
The deviance for a logit model is like the RSS in OLS: the smaller the deviance, the better the fit.
And (logit only), to combine it with information about leverage: predict "newnamedelta", ddeviance
(A recommended cut-off value for the ddeviance is 4.)
40
More diagnostics
41
Probit versus Logit: why does the rule of thumb not work for dtime?
42
Probit versus Logit
A matter of taste nowadays, since we all have good computers. The underlying distributions share a mean of zero but have different variances: π²/3 for the logistic and 1 for the standard normal. So the estimated slope coefficients differ by a factor of about 1.8 (≈ π/√3); the logit ones are bigger.
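The factor of about 1.8 is just the ratio of the two standard deviations, since the standard logistic distribution has variance π²/3 while the standard normal has variance 1:

```python
import math

# Standard deviation of the standard logistic distribution: pi / sqrt(3)
logistic_sd = math.pi / math.sqrt(3.0)
normal_sd = 1.0

# Logit slopes tend to exceed probit slopes by roughly this factor
print(logistic_sd / normal_sd)  # about 1.81
```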
43
More on Probit versus Logit
Watch out for "perfect predictions." Luckily, STATA will flag them for you and drop the culprit observations.
GRETL has a mechanism for preventing the algorithm from iterating endlessly in search of a nonexistent maximum. One sub-case of interest is when the perfect-prediction problem arises because of a single binary explanatory variable. In this case, the offending variable is dropped from the model and estimation proceeds with the reduced specification.
44
More on Probit versus Logit
However, it may happen that no single "perfect classifier" exists among the regressors, in which case estimation is simply impossible and the algorithm stops with an error. If this happens, unless your model is trivially mis-specified (like predicting whether a country is an oil exporter on the basis of oil revenues), it is normally a small-sample problem: you probably just don't have enough data to estimate your model. You may want to drop some of your explanatory variables.
45
More on Probit versus Logit
Learn about the test command (Wald tests, based on the chi-squared distribution) and the lrtest command (LR tests), so you can test hypotheses as we did with t tests and F tests in OLS. The two are asymptotically equivalent but can differ in small samples.
46
More on Probit versus Logit Learn about the many extra STATA capabilities, if you use it, that will make your postestimation life much easier Long and Freese’s book is a great resource GRETL is more limited but doing things by hand for now will actually be a good thing!
47
For example
49
More on Probit versus Logit
Stata users? Go through a couple of the examples available online with your own STATA session connected to the internet. Examples:
http://www.ats.ucla.edu/stat/stata/dae/probit.htm
http://www.ats.ucla.edu/stat/stata/dae/logit.htm
http://www.ats.ucla.edu/stat/stata/output/old/lognoframe.htm
http://www.ats.ucla.edu/stat/stata/output/stata_logistic.htm
50
Keywords: binary choice models; censored data; conditional logit; count data models; feasible generalized least squares; Heckit; identification problem; independence of irrelevant alternatives (IIA); index models; individual and alternative specific variables; individual specific variables; latent variables; likelihood function; limited dependent variables; linear probability model; logistic random variable; logit; log-likelihood function; marginal effect; maximum likelihood estimation; multinomial choice models; multinomial logit; odds ratio; ordered choice models; ordered probit; ordinal variables; Poisson random variable; Poisson regression model; probit; selection bias; tobit model; truncated data
51
References
Long, J. S. and J. Freese, Regression Models for Categorical Dependent Variables Using Stata, for all topics (available on Google!)
52
Next Multinomial Logit Conditional Logit