04/19/2006Econ 6161 Econ 616 – Spring 2006 Qualitative Response Regression Models Presented by Yan Hu

Outline
Qualitative Response Regression Models
Binary Response Regression Models
1. The Linear Probability Model (LPM)
2. The Logit Model
3. The Probit Model

What Is a Qualitative Response Regression Model?
The dependent variable is qualitative (a dummy variable) in nature.
--- Binary, or dichotomous, response variable: Y = 1 if the person is in the labor force and Y = 0 if he or she is not.
--- Trichotomous response variable.
--- Polychotomous (multiple-category) response variable.

Binary Response Regression Models
E(Y) is related to the X's through a link function $g$: $g(E(Y)) = \beta X$.
In binary regression, the link function specifies the relationship between E(Y) (the probability that Y = 1, which is also the expected value of Y) and a linear composite score of the X's.

Three Binary Response Regression Models
--- The Linear Probability Model (LPM)
--- The Logit Model
--- The Probit Model

What Is the Linear Probability Model?
Y follows the Bernoulli probability distribution:

Y_i      Probability
0        1 - P
1        P
Total    1

Link function (identity): E(Y) = 0(1 - P) + 1(P) = P
Expression for the LPM: $P = \beta X$

Problems of LPM (1)
1. Non-normality of the disturbances: like $Y_i$, the disturbance $u_i = Y_i - \beta X_i$ takes only two values, so it follows the Bernoulli distribution rather than the normal distribution:

Case       u_i          Probability
Y_i = 1    1 - βX_i     P_i
Y_i = 0    -βX_i        1 - P_i

This problem may not be critical. If the objective is point estimation, the normality assumption for the disturbances is not necessary, and the OLS point estimates remain unbiased. As the sample size increases indefinitely, the OLS estimators tend to be normally distributed.
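
Problems of LPM (2)
2. Heteroscedastic variances of the disturbances: $Var(u_i) = P_i (1 - P_i)$, so the variance is a function of the mean $P_i$.
One way to remove the heteroscedasticity is to transform the model by dividing both sides by $\sqrt{w_i}$, where $w_i = P_i (1 - P_i)$, and then to estimate the transformed equation by OLS (weighted least squares). Because the true $P_i$ are unknown, in practice the weights are computed from the fitted values of a first-step OLS regression.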
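
The effect of the weighting can be checked numerically. What follows is a minimal sketch in Python, used here purely for illustration (the slides themselves use SAS and R). The probabilities in `p_hat` are made-up values, and the helper name `lpm_disturbance_variance` is hypothetical.

```python
import math

def lpm_disturbance_variance(p):
    """Var(u_i) = P_i * (1 - P_i) when the outcome has P(Y = 1) = p."""
    return p * (1.0 - p)

# Made-up probabilities, for illustration only (not data from the slides).
p_hat = [0.1, 0.3, 0.5, 0.7, 0.9]

# The variance changes with P_i: this is the heteroscedasticity problem.
variances = [lpm_disturbance_variance(p) for p in p_hat]

# WLS transformation: every term of the model is divided by sqrt(w_i),
# where w_i = P_i * (1 - P_i).
divisors = [math.sqrt(v) for v in variances]

# After the division the disturbance u_i / sqrt(w_i) has variance w_i / w_i = 1.
transformed_variances = [v / d ** 2 for v, d in zip(variances, divisors)]

for p, v, tv in zip(p_hat, variances, transformed_variances):
    print(f"P = {p:.1f}  Var(u) = {v:.2f}  Var(u / sqrt(w)) = {tv:.2f}")
```

The untransformed variance is largest at P = 0.5 and shrinks toward the extremes, while the transformed disturbance has the same variance at every P, which is what OLS requires.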

Problems of LPM (3)
3. Nonfulfillment of $0 \le E(Y_i \mid X_i) \le 1$.
Two ways of dealing with estimated probabilities $\hat{Y}_i$ that may fall outside the 0 to 1 range:
1. Estimate the LPM by the usual OLS method. If some $\hat{Y}_i$ are less than 0, they are assumed to be 0; if they are greater than 1, they are assumed to be 1.
2. Devise an estimating technique that guarantees that the estimated conditional probabilities lie between 0 and 1, such as the logit and probit models.
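
The first option can be sketched in a few lines of Python (illustrative only; the slides use SAS and R). The data in `x` and `y` are made up, and `ols_fit` is a hypothetical helper written for this sketch, not a library function.

```python
def ols_fit(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data for illustration: x is a regressor, y is a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = ols_fit(x, y)

# Fitted LPM "probabilities"; nothing keeps them inside [0, 1].
fitted = [intercept + slope * xi for xi in x]

# Option 1 from the slide: values below 0 become 0, values above 1 become 1.
clipped = [min(1.0, max(0.0, f)) for f in fitted]

for xi, f, c in zip(x, fitted, clipped):
    print(f"x = {xi:2d}  fitted = {f:7.4f}  clipped = {c:.4f}")
```

With these made-up data the fitted value is negative at the smallest x and exceeds 1 at the largest x, so both kinds of violation appear before clipping.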

Problems of LPM (4)
4. Questionable value of $R^2$ as a measure of goodness of fit.
For a given X, the Y values are either 0 or 1. Therefore, all the Y values lie either along the X-axis or along the line corresponding to 1, and generally no LPM is expected to fit such a scatter well. As a result, the conventionally computed $R^2$ is likely to be much lower than 1 for such models.
Aldrich and Nelson contend that “use of the coefficient of determination as a summary statistic should be avoided in models with qualitative dependent variable.”
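
What Is the Logit Model?
The cumulative logistic distribution:
$P = P(Y = 1 \mid X) = E(Y \mid X) = \frac{1}{1 + e^{-\beta X}}$
[Figure: logistic CDF, with P on the vertical axis (0 to 1) and X on the horizontal axis]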

What Is the Logit Model? (continued)
From the logistic distribution,
$1 - P = \frac{e^{-\beta X}}{1 + e^{-\beta X}}$
$\frac{P}{1 - P} = e^{\beta X}$ (the odds in favor of Y = 1)
$\log\left[\frac{P}{1 - P}\right] = \beta X$
Link function: $g = \log[p/(1-p)]$, where $p$ is the probability of either Y = 1 or Y = 0, depending on the software. Generally, $\log[p/(1-p)] = \beta X$.
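
The three relations above (probability, odds, log of the odds) can be verified numerically. This is a minimal Python sketch for illustration (the slides use SAS and R); the value `z = 1.3` is an arbitrary stand-in for the index βX, and the function names are hypothetical.

```python
import math

def logistic_cdf(z):
    """P = 1 / (1 + exp(-z)), where z stands for the linear index beta * X."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log of the odds, log[p / (1 - p)]; the inverse of logistic_cdf."""
    return math.log(p / (1.0 - p))

z = 1.3                      # arbitrary illustrative value of beta * X
p = logistic_cdf(z)          # probability that Y = 1
odds = p / (1.0 - p)         # should equal exp(z)

print(f"P = {p:.4f}  odds = {odds:.4f}  exp(z) = {math.exp(z):.4f}  logit = {logit(p):.4f}")
```

The probability always stays strictly between 0 and 1, while the logit recovers the linear index, which is why the logit (not the probability) is linear in X.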

Two Types of Data
To estimate the logit model $\log[p/(1-p)] = \beta X$, we have to distinguish two types of data:
--- Data at the individual, or micro, level
--- Grouped or replicated data

Data at the Individual Level
X: family income; Y = 1 if the family owns a house and Y = 0 if it does not. The following table gives data on individual families.

FAMILY    Y    X

Grouped or Replicated Data
The following table shows data on several families grouped according to income level, together with the number of families owning a house at each income level. Corresponding to each income level $X_i$ there are $N_i$ families, $n_i$ of whom are home owners.

Income    N    n

Steps in Estimating the Logit Regression (Grouped Data)
1. For each income level $X_i$, compute the probability of owning a house as $\hat{P}_i = n_i / N_i$.
2. For each $X_i$, obtain the logit as $\hat{L}_i = \log[\hat{P}_i / (1 - \hat{P}_i)]$.
3. To resolve the problem of heteroscedasticity, compute the weights $W_i = N_i \hat{P}_i (1 - \hat{P}_i)$ and transform the model:
   $\sqrt{W_i}\, L_i = \beta_1 \sqrt{W_i} + \beta_2 \sqrt{W_i}\, X_i + \sqrt{W_i}\, u_i$
   or
   $L_i^* = \beta_1 \sqrt{W_i} + \beta_2 X_i^* + v_i$
4. Estimate the transformed equation by OLS on the transformed data (regression through the origin).
5. Establish confidence intervals and/or test hypotheses in the usual OLS framework.
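
The steps above can be sketched end to end in Python (for illustration only; the slides carry them out in SAS). The grouped data below are made up and are not the table used in the slides, and `transform` and `fit_through_origin` are hypothetical helper names. The sketch assumes 0 < n_i < N_i in every group, since the logit is undefined otherwise.

```python
import math

# Made-up grouped data, for illustration only:
# (income X_i in $1000s, N_i families, n_i home owners)
groups = [(6, 40, 8), (10, 60, 18), (15, 100, 45), (25, 70, 49), (40, 25, 20)]

def transform(groups):
    """Return rows (sqrt(W_i), X_i*, L_i*) for the weighted regression."""
    rows = []
    for x, big_n, small_n in groups:
        p_hat = small_n / big_n                    # step 1: n_i / N_i
        l_hat = math.log(p_hat / (1.0 - p_hat))    # step 2: estimated logit
        w = big_n * p_hat * (1.0 - p_hat)          # step 3: weight W_i
        root_w = math.sqrt(w)
        rows.append((root_w, root_w * x, root_w * l_hat))
    return rows

def fit_through_origin(rows):
    """OLS of L* on sqrt(W) and X* without intercept (2x2 normal equations)."""
    a11 = sum(r * r for r, _, _ in rows)
    a12 = sum(r * xs for r, xs, _ in rows)
    a22 = sum(xs * xs for _, xs, _ in rows)
    b1 = sum(r * ls for r, _, ls in rows)
    b2 = sum(xs * ls for _, xs, ls in rows)
    det = a11 * a22 - a12 * a12
    beta1 = (b1 * a22 - b2 * a12) / det            # Cramer's rule
    beta2 = (a11 * b2 - a12 * b1) / det
    return beta1, beta2

rows = transform(groups)                           # step 3
beta1, beta2 = fit_through_origin(rows)            # step 4
print(f"beta1 = {beta1:.4f}  beta2 = {beta2:.4f}")
```

The coefficient on sqrt(W_i) plays the role of the intercept β1 and the coefficient on X_i* is the slope β2, which mirrors a no-intercept regression of the transformed logit on the two transformed regressors.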
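
SAS Program
    /* Import the grouped data (income, n, n1) */
    proc import out=work.incomes
        datafile="c:\yan\econ616\DG-15.4.xls";
    run;

    data incomes1;
        set incomes;
        phat   = n1 / n;                    /* proportion of home owners  */
        lhat   = log(phat / (1 - phat));    /* estimated logit            */
        w      = n * phat * (1 - phat);     /* weight                     */
        wsquar = sqrt(w);                   /* square root of the weight  */
        lstar  = lhat * wsquar;             /* transformed logit          */
        xstar  = income * wsquar;           /* transformed income         */
    run;

    /* Weighted regression through the origin */
    proc reg data=incomes1;
        model lstar = wsquar xstar / noint;
    run;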

SAS Output
The estimated slope coefficient suggests that for a unit ($1000) increase in weighted income, the weighted log of the odds in favor of owning a house goes up by about 0.08 units.

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
wsquar                                                               <.0001
xstar                                                                <.0001

Odds Interpretation
The odds ratio: for a unit increase in weighted income, the (weighted) odds in favor of owning a house are multiplied by $e^{\hat{\beta}_2}$, where $\hat{\beta}_2$ is the estimated slope coefficient; that is, they increase by about 8.17%.
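
The conversion from a logit slope to a percentage change in the odds is a one-line calculation. The Python sketch below is illustrative only; `example_slope` is a hypothetical coefficient, not the estimate from the slides, and `pct_change_in_odds` is a name invented for this sketch.

```python
import math

def pct_change_in_odds(slope, delta_x=1.0):
    """Percentage change in the odds when the regressor rises by delta_x.

    The odds are multiplied by exp(slope * delta_x), so the percentage
    change is (exp(slope * delta_x) - 1) * 100.
    """
    return (math.exp(slope * delta_x) - 1.0) * 100.0

example_slope = 0.05    # hypothetical coefficient, NOT the slides' estimate
print(f"odds change by {pct_change_in_odds(example_slope):.2f}% per unit of X")
```

For small slopes the percentage change is close to 100 times the slope, but the exponential form is the exact one and should be used when the slope is not small.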

An Example of Individual Data
In the following table, Y = 1 if a student's final grade in an intermediate microeconomics course was A and Y = 0 if the final grade was B or C. GPA, TUCE, and Personalized System of Instruction (PSI) are the grade predictors.

OBS    GPA    TUCE    PSI    GRADE    LETTER

SAS Program
    proc import out=work.gpagrade
        datafile="c:\yan\econ616\DG-15.7.xls";
    run;

    proc print data=gpagrade;
    run;

    /* Logit model; event='1' makes PROC LOGISTIC model P(grade = 1) */
    proc logistic data=gpagrade;
        model grade (event='1') = gpa tuce psi;
    run;

    /* or: a logit fit through PROC PROBIT with the logistic distribution */
    proc probit data=gpagrade;
        class grade;
        model grade = gpa tuce psi / d=logistic itprint;
    run;

Output

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept
GPA
TUCE
PSI

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio
Score
Wald

Interpretation
--- Each slope coefficient is a partial slope and measures the change in the estimated logit for a unit change in the value of the given regressor (holding the other regressors constant).
--- Odds interpretation. For example, the odds of getting an A for students who are exposed to the new method of teaching are $e^{\hat{\beta}_{PSI}}$ times the odds for students who are not exposed to it, other things remaining the same.
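
What Is the Probit Model?
Probit link: $p = \Phi(h)$, where $\Phi$ is the cumulative distribution function (CDF) of a standard normal variate and $h$ is the linear composite score of the X's.
$P_i = P(Y = 1 \mid X) = P(I_i^* \le I_i) = P(Z_i \le \beta_1 + \beta_2 X_i) = \Phi(\beta_1 + \beta_2 X_i)$,
where $P(Y = 1 \mid X)$ is the probability that the event occurs given the values of X, $I_i = \beta_1 + \beta_2 X_i$ is the index, $I_i^*$ is its threshold level, and $Z_i \sim N(0, 1)$ is the standard normal variable.
$\beta_1 + \beta_2 X_i = \Phi^{-1}(P_i)$, where $\Phi^{-1}$ is the inverse of the normal CDF.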
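
The probit link and its inverse can be tried out with Python's standard library (illustration only; the slides use SAS and R). The coefficients below are hypothetical, and `probit_probability` and `probit_index` are names invented for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)   # standard normal, Z ~ N(0, 1)

def probit_probability(beta1, beta2, x):
    """P(Y = 1 | X = x) = Phi(beta1 + beta2 * x)."""
    return std_normal.cdf(beta1 + beta2 * x)

def probit_index(p):
    """Inverse link: Phi^{-1}(p), which equals beta1 + beta2 * x."""
    return std_normal.inv_cdf(p)

# Hypothetical coefficients, for illustration only.
beta1, beta2 = -1.0, 0.05
p = probit_probability(beta1, beta2, x=20)

print(f"P = {p:.4f}  recovered index = {probit_index(p):.4f}")
```

Applying the inverse CDF to a probability returns the linear index, which is the step that the grouped-data probit method relies on.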

Use of the Probit Model
The probit model is used when Y is considered the “manifestation” of some unobservable, Gaussian-distributed latent variable.
For example, the decision of a family to own a house or not depends on an unobservable index I (the latent variable) that is determined by one or more explanatory variables, say income X, in such a way that the larger the value of the index I, the greater the probability of the family owning a house.

Probit Estimation with Grouped Data
Method 1:
1. Calculate $\hat{P}_i = n_i / N_i$.
2. Estimate $I_i = \Phi^{-1}(\hat{P}_i)$, where $\Phi$ is the standard normal CDF.
3. Estimate $\beta_1$ and $\beta_2$ by regressing $I_i$ on $X_i$, i.e., $I_i = \beta_1 + \beta_2 X_i + u_i$.
Method 2: Use a SAS or R program directly.
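
Method 1 can be sketched with the Python standard library (illustration only; the slides use SAS and R for Method 2). The grouped data are made up and are not the slides' table, and step 3 is done here with plain, unweighted OLS as an assumption of this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Made-up grouped data, for illustration only: (income X_i, N_i, n_i)
groups = [(6, 40, 8), (10, 60, 18), (15, 100, 45), (25, 70, 49), (40, 25, 20)]

# Steps 1 and 2: P_i^ = n_i / N_i, then I_i = Phi^{-1}(P_i^)
x = [g[0] for g in groups]
index = [std_normal.inv_cdf(small_n / big_n) for _, big_n, small_n in groups]

# Step 3: OLS of I_i on X_i (unweighted, as an assumption of this sketch)
mean_x = sum(x) / len(x)
mean_i = sum(index) / len(index)
beta2 = (sum((xi - mean_x) * (ii - mean_i) for xi, ii in zip(x, index))
         / sum((xi - mean_x) ** 2 for xi in x))
beta1 = mean_i - beta2 * mean_x

# Fitted probabilities implied by the estimated index
fitted_p = [std_normal.cdf(beta1 + beta2 * xi) for xi in x]

print(f"beta1 = {beta1:.4f}  beta2 = {beta2:.4f}")
```

Because the fitted probabilities come from the normal CDF, they stay strictly between 0 and 1, unlike the fitted values of the LPM.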

Program
SAS:
    proc import out=work.incomes
        datafile="c:\yan\econ616\DG xls";
    run;

    /* Grouped probit: n1 owners out of n families at each income level */
    proc genmod data=incomes;
        model n1/n = income / dist=bin link=probit lrci;
    run;

R:
    # Read the three columns (income, N, N1) row by row
    incomes <- as.data.frame(matrix(scan(), ncol = 3, byrow = TRUE))
    names(incomes) <- c("income", "N", "N1")
    incomes$N0 <- incomes$N - incomes$N1    # families that do not own a house
    glmA <- glm(cbind(N1, N0) ~ income, data = incomes,
                family = binomial(link = "probit"))
    summary(glmA)

Output
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                                 e-16 ***
    income                                      e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance:      on 9 degrees of freedom
    Residual deviance:  on 8 degrees of freedom
    AIC:
    Number of Fisher Scoring iterations: 3

Interpretation
We want to find the effect of a unit change in X (income) on the probability that Y = 1, that is, that a family purchases a house.
1. The rate of change of the probability with respect to income is $dP_i / dX_i = f(\beta_1 + \beta_2 X_i)\,\beta_2$, where $f$ is the standard normal density function.
2. At X = 6 (thousand dollars), evaluate the normal density at the fitted index, $f(\hat{\beta}_1 + \hat{\beta}_2 \cdot 6)$, and multiply it by $\hat{\beta}_2$.
Starting from an income level of 6,000 dollars, if income goes up by 1,000 dollars, the probability of a family purchasing a house goes up by about 1.52%.
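
The calculation can be illustrated in Python (for illustration only). The coefficients below are hypothetical and are not the estimates from the slides, so the printed numbers will not match the figure quoted above. The sketch compares the derivative-based effect with the actual change in probability when income rises from 6 to 7.

```python
from statistics import NormalDist

std_normal = NormalDist()

def probit_marginal_effect(beta1, beta2, x):
    """dP/dX = f(beta1 + beta2 * x) * beta2, with f the standard normal density."""
    return std_normal.pdf(beta1 + beta2 * x) * beta2

# Hypothetical coefficients, NOT the estimates from the slides.
beta1, beta2 = -1.0, 0.05
x0 = 6.0    # income of 6 (thousand dollars), as in the slide

# Rate of change of the probability at X = 6
analytic = probit_marginal_effect(beta1, beta2, x0)

# Actual change in probability when income rises by one unit, from 6 to 7
discrete = (std_normal.cdf(beta1 + beta2 * (x0 + 1.0))
            - std_normal.cdf(beta1 + beta2 * x0))

print(f"derivative at X = 6: {analytic:.5f}")
print(f"change in P from X = 6 to X = 7: {discrete:.5f}")
```

The two numbers are close but not identical, because the derivative is a local approximation and the probit curve is not a straight line.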

Probit Model for Individual Data
SAS program:
    proc import out=work.gpagrade
        datafile="c:\yan\econ616\DG-15.7.xls";
    run;

    /* Probit model for the 0/1 grade variable */
    proc probit data=gpagrade;
        class grade;
        model grade = gpa tuce psi;
    run;

Output
Analysis of Parameter Estimates

Parameter    DF    Estimate    Standard Error    95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept
GPA
TUCE
PSI

Marginal Effect of a Change in a Regressor
The effect of all other variables is held constant.
1. LPM: the slope coefficient directly measures the change in the probability of an event occurring as a result of a unit change in the value of a regressor.
2. Logit model: the slope coefficient of a variable gives the change in the log of the odds associated with a unit change in that variable. The rate of change in the probability of the event happening is given by $\beta_j P_i (1 - P_i)$.
3. Probit model: the rate of change in the probability is given by $\beta_j f(\beta X)$, where $f$ is the density function of the standard normal variable.
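
The three formulas can be put side by side in a short Python sketch (illustration only). The slope `beta_j = 1.0` is a hypothetical value used for all three models purely for comparison, and the function names are invented here.

```python
import math
from statistics import NormalDist

def lpm_effect(beta_j):
    """LPM: the marginal effect is the slope itself, whatever the value of X."""
    return beta_j

def logit_effect(beta_j, index):
    """Logit: beta_j * P * (1 - P), with P the logistic CDF at the index."""
    p = 1.0 / (1.0 + math.exp(-index))
    return beta_j * p * (1.0 - p)

def probit_effect(beta_j, index):
    """Probit: beta_j * f(index), with f the standard normal density."""
    return beta_j * NormalDist().pdf(index)

beta_j = 1.0    # hypothetical slope, the same for all three models
for index in (-2.0, 0.0, 2.0):
    print(f"index = {index:+.1f}  LPM = {lpm_effect(beta_j):.4f}  "
          f"logit = {logit_effect(beta_j, index):.4f}  "
          f"probit = {probit_effect(beta_j, index):.4f}")
```

The LPM effect is constant, while the logit and probit effects are largest where the index is 0 (P = 0.5) and shrink as the probability approaches 0 or 1, so in those two models the effect must be evaluated at chosen values of the regressors.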

Logit or Probit?
In most applications the two models give quite similar results. The main difference is that the logistic distribution has slightly fatter tails, so the conditional probability approaches 0 or 1 more slowly under the logit than under the probit. There is no compelling reason to choose one over the other. In practice, many researchers choose the logit model because of its comparative mathematical simplicity.
[Figure: logit and probit cumulative distribution curves, with P running from 0 to 1]
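
The tail comparison can be checked numerically in Python (illustration only). As an assumption of this sketch, the logistic distribution is rescaled to unit variance (its standard deviation is π/√3) so that both distributions are compared on the same scale; `tail_probabilities` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

# The standard logistic distribution has variance pi^2 / 3.
LOGISTIC_SD = math.pi / math.sqrt(3.0)
normal = NormalDist()

def tail_probabilities(z):
    """Upper-tail probability beyond z standard deviations: (logistic, normal)."""
    logistic_tail = 1.0 - 1.0 / (1.0 + math.exp(-z * LOGISTIC_SD))
    normal_tail = 1.0 - normal.cdf(z)
    return logistic_tail, normal_tail

for z in (2.0, 3.0, 4.0):
    logistic_tail, normal_tail = tail_probabilities(z)
    print(f"z = {z:.0f}  logistic tail = {logistic_tail:.6f}  "
          f"normal tail = {normal_tail:.6f}")
```

Far from the center the logistic tail probability is the larger of the two, which is the sense in which the logit curve approaches 0 and 1 more slowly.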

Reading
Damodar N. Gujarati, Basic Econometrics, P