Econ 616 – Spring 2006: Qualitative Response Regression Models. Presented by Yan Hu, 04/19/2006.



Outline
- Qualitative Response Regression Model
- Binary Response Regression Models
  1. The Linear Probability Model (LPM)
  2. The Logit Model
  3. The Probit Model

What Is a Qualitative Response Regression Model?
- The dependent variable is qualitative (or dummy) in nature.
  --- Binary, or dichotomous, response variable: Y = 1 if the person is in the labor force and Y = 0 if he or she is not.
  --- Trichotomous response variable.
  --- Polychotomous (or multiple-category) response variable.

Binary Response Regression Models
- E(Y) is related to the X's through a link function g: $g(E(Y)) = X\beta$.
- In binary regression, the link function specifies the relationship between E(Y) (the probability that Y = 1, which is also the expected value of Y) and a linear composite score of the X's.

Three Binary Response Regression Models
- The Linear Probability Model (LPM)
- The Logit Model
- The Probit Model

What Is the Linear Probability Model?
- Y follows the Bernoulli probability distribution:

  Y_i      Probability
  0        1 - P
  1        P
  Total    1

- Expected value: $E(Y) = 0(1-P) + 1(P) = P$, so the link function is the identity.
- Expression for the LPM: $P = X\beta$.

Problems of LPM (1)
1. Non-normality of the disturbances:
- Like $Y_i$, the disturbance $u_i = Y_i - X_i\beta$ takes only two values, so it follows a Bernoulli-type distribution rather than a normal one:

             u_i                 Probability
  Y_i = 1    $1 - X_i\beta$      $P_i$
  Y_i = 0    $-X_i\beta$         $1 - P_i$

- The problem may not be so critical. If the objective is point estimation, the normality assumption is not necessary and the OLS estimators remain unbiased. As the sample size increases indefinitely, the OLS estimators tend to be normally distributed.

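Problems of LPM (2)
2. Heteroscedastic variances of the disturbances:
- $\mathrm{Var}(u_i) = P_i(1-P_i)$: the variance is a function of the mean $P_i$, so it differs across observations.
- One way to remove the heteroscedasticity is to divide the model through by $\sqrt{w_i}$, where $w_i = P_i(1-P_i)$, and then estimate the transformed equation by OLS. Because the true $P_i$ are unknown, the weights are computed in practice from the fitted values of a first-stage OLS regression.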
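The variance formula on this slide follows directly from the two values the disturbance can take. A short derivation, assuming the LPM holds so that $E(Y_i \mid X_i) = X_i\beta = P_i$:

```latex
% u_i = 1 - P_i with probability P_i, and u_i = -P_i with probability 1 - P_i
\begin{aligned}
E(u_i) &= (1 - P_i)\,P_i + (-P_i)(1 - P_i) = 0, \\
\operatorname{Var}(u_i) &= E(u_i^2)
  = (1 - P_i)^2 P_i + P_i^2 (1 - P_i) \\
  &= P_i (1 - P_i)\,\bigl[(1 - P_i) + P_i\bigr]
  = P_i (1 - P_i).
\end{aligned}
```

Since $P_i$ changes with $X_i$, the variance changes across observations, which is the heteroscedasticity the weighting is meant to remove.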

Problems of LPM (3)
3. Nonfulfillment of $0 \le E(Y_i \mid X_i) \le 1$:
- Two ways of ensuring that the estimated probabilities $\hat{Y}_i$ lie between 0 and 1:
  1. Estimate the LPM by the usual OLS method. If some $\hat{Y}_i$ are less than zero, $\hat{Y}_i$ is assumed to be zero for those cases; if they are greater than 1, they are assumed to be 1.
  2. Devise an estimating technique that guarantees that the estimated conditional probabilities lie between 0 and 1, such as the logit and probit models.
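A minimal sketch of the first approach, using only the Python standard library. The data are hypothetical (they are not the slides' data), and `ols_simple` is a helper written for this illustration. The sketch fits the LPM by OLS, finds fitted values outside the unit interval, and clips them to 0 or 1.

```python
# Hypothetical data: X = family income (thousands), Y = 1 if the family owns a house.
xs = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


b1, b2 = ols_simple(xs, ys)

# Fitted "probabilities" from the LPM; nothing constrains them to [0, 1].
fitted = [b1 + b2 * x for x in xs]
out_of_range = [p for p in fitted if p < 0 or p > 1]

# Approach 1 from the slide: set negative fits to 0 and fits above 1 to 1.
clipped = [min(1.0, max(0.0, p)) for p in fitted]

print("intercept, slope:", b1, b2)
print("fitted values outside [0, 1]:", out_of_range)
```

With these made-up numbers the lowest and highest income levels produce fitted values below 0 and above 1. Clipping fixes the range but not the model, which is the motivation for the logit and probit alternatives.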

Problems of LPM (4)
4. Questionable value of $R^2$ as a measure of goodness of fit:
- For a given X, Y is either 0 or 1, so all the Y values lie either along the X-axis or along the line Y = 1. No LPM is generally expected to fit such a scatter well, so the conventionally computed $R^2$ is likely to be much lower than 1 for such models.
- Aldrich and Nelson contend that "use of the coefficient of determination as a summary statistic should be avoided in models with qualitative dependent variable."

What Is the Logit Model?
- The cumulative logistic distribution: $P = P(Y=1 \mid X) = E(Y \mid X) = \dfrac{1}{1+e^{-\beta X}}$
- [Figure: P plotted against X; P ranges between 0 and 1.]

What Is the Logit Model?
- From the logistic distribution:
  $1 - P = \dfrac{e^{-\beta X}}{1+e^{-\beta X}}$
  $\dfrac{P}{1-P} = e^{\beta X}$ (the odds ratio in favor of Y = 1)
  $\log\left[\dfrac{P}{1-P}\right] = \beta X$
- Link function: $g = \log[p/(1-p)]$, where p is the probability of either Y = 1 or Y = 0, depending on which outcome the software models as the event.
- Generally, $\log[p/(1-p)] = X\beta$.
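The three identities on this slide can be checked numerically. A minimal sketch: the function names and the index value `z` (standing for βX) are illustrative choices, not taken from the slides.

```python
import math


def logistic(z):
    """Cumulative logistic distribution: P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds in favor of the event: P / (1 - P)."""
    return p / (1.0 - p)


def logit(p):
    """Log of the odds, i.e. the logit link."""
    return math.log(odds(p))


z = 0.8            # hypothetical value of the index beta * X
p = logistic(z)

print("P            =", p)
print("1 - P        =", 1 - p, "vs", math.exp(-z) / (1 + math.exp(-z)))
print("P / (1 - P)  =", odds(p), "vs e^z =", math.exp(z))
print("log odds     =", logit(p), "vs z =", z)
```

The logit is linear in X even though P is not, which is what makes the model convenient to estimate and interpret.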

Two Types of Data
- To estimate the logit $\log[p/(1-p)] = X\beta$, we have to distinguish two types of data:
  --- Data at the individual, or micro, level
  --- Grouped or replicated data

Data at the Individual Level
- X: family income; Y = 1 if the family owns a house and Y = 0 if it does not. The following table gives data on individual families.
- [Table: columns FAMILY, Y, X.]

Grouped or Replicated Data
- The following table shows data on several families grouped according to income level, with the number of families owning a house at each income level. Corresponding to each income level $X_i$ there are $N_i$ families, $n_i$ of whom are home owners.
- [Table: columns Income, N, n.]

Steps in Estimating the Logit Regression (Grouped Data)
1. For each income level $X_i$, compute the estimated probability of owning a house as $\hat{P}_i = n_i/N_i$.
2. For each $X_i$, obtain the logit as $\hat{L}_i = \log[\hat{P}_i/(1-\hat{P}_i)]$.
3. To resolve the problem of heteroscedasticity, weight the model with $W_i = N_i\hat{P}_i(1-\hat{P}_i)$:
   $\sqrt{W_i}\,L_i = \beta_1\sqrt{W_i} + \beta_2\sqrt{W_i}\,X_i + \sqrt{W_i}\,u_i$
   or
   $L_i^* = \beta_1\sqrt{W_i} + \beta_2 X_i^* + v_i$,
   where $L_i^* = \sqrt{W_i}\,L_i$, $X_i^* = \sqrt{W_i}\,X_i$, and $v_i = \sqrt{W_i}\,u_i$.
4. Estimate the transformed equation by OLS (regression through the origin).
5. Establish confidence intervals and/or test hypotheses in the usual OLS framework.
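The steps above can be sketched in plain Python. The grouped data below are illustrative numbers chosen for this sketch (the slides' table is not reproduced in the transcript), and `grouped_logit_wls` is a hypothetical helper. It solves the two normal equations of the no-intercept regression of $L_i^*$ on $\sqrt{W_i}$ and $X_i^*$ directly.

```python
import math

# Illustrative grouped data: income level, families at that level, home owners.
incomes = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
totals = [40, 50, 60, 80, 100, 70, 65, 50, 40, 25]
owners = [8, 12, 18, 28, 45, 36, 39, 33, 30, 20]


def grouped_logit_wls(x, n_total, n_event):
    """Weighted least-squares logit for grouped data; returns (b1, b2).

    Assumes 0 < n_event < n_total in every group, otherwise the logit is undefined.
    """
    rows = []
    for xi, big_n, small_n in zip(x, n_total, n_event):
        p_hat = small_n / big_n                      # step 1
        l_hat = math.log(p_hat / (1.0 - p_hat))      # step 2
        w = big_n * p_hat * (1.0 - p_hat)            # step 3: weight W_i
        sw = math.sqrt(w)
        rows.append((sw, sw * xi, sw * l_hat))       # (sqrt(W), X*, L*)

    # Step 4: OLS through the origin of L* on sqrt(W) and X*.
    a11 = sum(s * s for s, _, _ in rows)
    a12 = sum(s * xs for s, xs, _ in rows)
    a22 = sum(xs * xs for _, xs, _ in rows)
    c1 = sum(s * ls for s, _, ls in rows)
    c2 = sum(xs * ls for _, xs, ls in rows)
    det = a11 * a22 - a12 * a12
    b1 = (c1 * a22 - a12 * c2) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2


b1, b2 = grouped_logit_wls(incomes, totals, owners)
print("b1 (coefficient on sqrt(W)):", b1)
print("b2 (coefficient on X*):     ", b2)
```

The coefficient on $\sqrt{W_i}$ plays the role of the intercept in the original, untransformed logit equation, which is why the transformed regression is run without a constant term.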

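SAS Program

    proc import out=work.incomes
        datafile="c:\yan\econ616\DG-15.4.xls";
    run;

    data incomes1;
        set incomes;
        phat = n1/n;                        /* estimated probability of owning a house */
        lhat = log(phat/(1-phat));          /* estimated logit */
        w = n*phat*(1-phat);                /* weight */
        wsquar = sqrt(w);                   /* square root of the weight */
        lstar = round(lhat*wsquar, );       /* weighted logit (rounding unit as in the source) */
        xstar = round(income*wsquar, );     /* weighted income (rounding unit as in the source) */
    run;

    proc reg data=incomes1;
        model lstar = wsquar xstar / noint; /* regression through the origin */
    run;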

SAS Output
- [Table: parameter estimates for wsquar and xstar; columns Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|. Both variables have Pr > |t| < .0001.]
- The estimated slope coefficient suggests that for a unit ($1000) increase in weighted income, the weighted log of the odds in favor of owning a house goes up by 0.08 units.

Odds Interpretation
- The odds ratio is obtained by taking the antilog of the estimated slope coefficient.
- For a unit increase in weighted income, the (weighted) odds in favor of owning a house increase by a factor of $e^{\hat{\beta}_2}$, that is, by about 8.17%.
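The conversion from a logit slope to a percentage change in the odds is a single line of arithmetic. A small sketch, where `odds_percent_change` is a hypothetical helper and the slope 0.05 is a made-up value rather than the slide's estimate:

```python
import math


def odds_percent_change(beta, delta=1.0):
    """Percentage change in the odds when the regressor rises by `delta` units."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


hypothetical_slope = 0.05  # illustrative only, not the estimate from the SAS output
print(odds_percent_change(hypothetical_slope))
```

For small slopes the percentage change is close to 100 times the coefficient, but the exact figure always comes from the antilog.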

An Example of Individual Data
- In the following table, Y = 1 if a student's final grade in an intermediate microeconomics course was A and Y = 0 if the final grade was B or C. GPA, TUCE, and Personalized System of Instruction (PSI) are the grade predictors.
- [Table: columns OBS, GPA, TUCE, PSI, GRADE, LETTER.]

SAS Program

    proc import out=work.gpagrade
        datafile="c:\yan\econ616\DG-15.7.xls";
    run;

    proc print data=gpagrade;
    run;

    proc logistic data=gpagrade;
        model grade(event='1') = gpa tuce psi;   /* model the probability that grade = 1 */
    run;

    /* or */
    proc probit data=gpagrade;
        class grade;
        model grade = gpa tuce psi / d=logistic itprint;   /* logistic distribution gives the logit model */
    run;

Output
- [Table: parameter estimates; columns Parameter, DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq; rows Intercept, GPA, TUCE, PSI.]
- [Table: Testing Global Null Hypothesis: BETA=0; columns Test, Chi-Square, DF, Pr > ChiSq; rows Likelihood Ratio, Score, Wald.]

Interpretation
- Each slope coefficient is a partial slope: it measures the change in the estimated logit for a unit change in the value of the given regressor, holding the other regressors constant.
- Odds interpretation: take the antilog of a slope coefficient. For example, students who are exposed to the new method of teaching (PSI = 1) are $e^{\hat{\beta}_{\mathrm{PSI}}}$ times as likely, in terms of odds, to get an A as students who are not exposed to it, other things remaining the same.

What Is the Probit Model?
- Probit link: $p = \Phi(\eta)$, where $\Phi$ is the cumulative distribution function of a standard normal variate and $\eta$ is the linear predictor.
- $P_i = P(Y=1 \mid X) = P(I_i^* \le I_i) = P(Z_i \le \beta_1+\beta_2 X_i) = \Phi(\beta_1+\beta_2 X_i)$, where $P(Y=1 \mid X)$ is the probability that the event occurs given the value of X, $I_i = \beta_1+\beta_2 X_i$ is the index, $I_i^*$ is its threshold level, and $Z_i \sim N(0,1)$ is the standard normal variable.
- $\beta_1+\beta_2 X_i = \Phi^{-1}(P_i)$, where $\Phi^{-1}$ is the inverse of the normal CDF.
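The probit link and its inverse are both available in Python's standard library through `statistics.NormalDist`. A minimal sketch; the coefficients are hypothetical and the function names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_prob(b1, b2, x):
    """P(Y = 1 | X = x) = Phi(b1 + b2 * x)."""
    return std_normal.cdf(b1 + b2 * x)


def probit_index(p):
    """Inverse link: the index value Phi^{-1}(p) that yields probability p."""
    return std_normal.inv_cdf(p)


b1, b2 = -1.0, 0.05          # hypothetical coefficients
p_at_10 = probit_prob(b1, b2, 10)

print("P(Y=1 | X=10):", p_at_10)
print("recovered index:", probit_index(p_at_10), "(b1 + b2 * 10 = -0.5)")
```

Applying the inverse CDF to a probability recovers the linear index, which is the relationship the grouped-data estimation method relies on.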

Use of Probit Model
- The probit model is used when Y is considered to be the "manifestation" of some unobservable, Gaussian-distributed latent variable in the data.
- For example, the decision of a family to own a house or not depends on an unobservable index I (the latent variable) that is determined by one or more explanatory variables, say income X, in such a way that the larger the value of the index I, the greater the probability of the family owning a house.

Probit Estimation with Grouped Data
- Method 1:
  1. Calculate $\hat{P}_i = n_i/N_i$ (N1/N in the programs).
  2. Obtain $\hat{I}_i = \Phi^{-1}(\hat{P}_i)$, where $\Phi$ is the standard normal CDF.
  3. Estimate $\beta_1$ and $\beta_2$ by regressing $\hat{I}_i$ on $X_i$, i.e., from $\hat{I}_i = \beta_1+\beta_2 X_i + u_i$.
- Method 2: Use a SAS or R program directly.
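Method 1 can be sketched with the standard library alone. The grouped data are illustrative numbers chosen for this sketch, not the slides' table, and the regression in step 3 is plain unweighted OLS, which is a simplification; `grouped_probit_ols` is a hypothetical helper.

```python
from statistics import NormalDist

# Illustrative grouped data: income level, families at that level, home owners.
incomes = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
totals = [40, 50, 60, 80, 100, 70, 65, 50, 40, 25]
owners = [8, 12, 18, 28, 45, 36, 39, 33, 30, 20]


def grouped_probit_ols(x, n_total, n_event):
    """Return (b1, b2) from an OLS regression of Phi^{-1}(p_hat) on x.

    Assumes 0 < n_event < n_total in every group, otherwise inv_cdf is undefined.
    """
    std_normal = NormalDist()
    # Steps 1 and 2: empirical probabilities and their normal-equivalent indices.
    index = [std_normal.inv_cdf(n / big_n) for n, big_n in zip(n_event, n_total)]

    # Step 3: simple least squares of the index on x.
    k = len(x)
    x_bar = sum(x) / k
    i_bar = sum(index) / k
    sxy = sum((xi - x_bar) * (ii - i_bar) for xi, ii in zip(x, index))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return i_bar - slope * x_bar, slope


b1, b2 = grouped_probit_ols(incomes, totals, owners)
print("b1:", b1)
print("b2:", b2)
```

A maximum-likelihood fit such as the SAS or R program of Method 2 weights the groups by their size, so its estimates will generally differ somewhat from this unweighted version.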

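Program

SAS:

    proc import out=work.incomes
        datafile="c:\yan\econ616\DG xls";
    run;

    proc genmod data=incomes;
        class ;
        model n1/n = income / dist=bin link=probit lrci;   /* events/trials syntax for grouped data */
    run;

R:

    # Read income, N, N1 as three columns (scan() reads the values from standard input)
    incomes <- as.data.frame(matrix(scan(), ncol = 3, byrow = TRUE))
    names(incomes) <- c("income", "N", "N1")
    # Number of families that do not own a house at each income level
    N0 <- incomes$N - incomes$N1
    # Binomial GLM with probit link on (successes, failures)
    glmA <- glm(cbind(N1, N0) ~ income, data = incomes,
                family = binomial(link = "probit"))
    summary(glmA)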

Output
- [R summary output: coefficient table with columns Estimate, Std. Error, z value, Pr(>|z|) and rows (Intercept) and income; both p-values are of the order e-16 (***).]
- (Dispersion parameter for binomial family taken to be 1)
- Null deviance on 9 degrees of freedom; residual deviance on 8 degrees of freedom.
- Number of Fisher Scoring iterations: 3

Interpretation
- We want to find the effect of a unit change in X (income) on the probability that Y = 1, that is, that a family purchases a house.
  1. The rate of change of the probability with respect to income: $\dfrac{dP_i}{dX_i} = f(\beta_1+\beta_2 X_i)\,\beta_2$, where f is the standard normal density function.
  2. If X = 6 (thousand dollars), the normal density is evaluated at $\hat{\beta}_1 + \hat{\beta}_2(6)$ and multiplied by $\hat{\beta}_2$.
- Starting with an income level of $6000, if income goes up by $1000, the probability of a family purchasing a house goes up by about 1.52%.
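The marginal-effect calculation on this slide can be sketched as follows. The coefficients are hypothetical, because the estimates are not legible in the transcript, so the printed value is not expected to reproduce the slide's 1.52% figure. The last lines compare the analytic derivative with a numerical one as a sanity check.

```python
from statistics import NormalDist

std_normal = NormalDist()


def probit_marginal_effect(b1, b2, x):
    """dP/dX at X = x for a probit model: f(b1 + b2 * x) * b2."""
    return std_normal.pdf(b1 + b2 * x) * b2


b1, b2 = -1.0, 0.05      # hypothetical estimates, not those from the output slide
effect_at_6 = probit_marginal_effect(b1, b2, 6)

# Numerical check: central finite difference of Phi(b1 + b2 * x) around x = 6.
h = 1e-5
finite_diff = (std_normal.cdf(b1 + b2 * (6 + h))
               - std_normal.cdf(b1 + b2 * (6 - h))) / (2 * h)

print("analytic marginal effect at X = 6:", effect_at_6)
print("finite-difference check:          ", finite_diff)
```

Unlike in the LPM, the effect depends on the starting level of X through the density term, so it has to be reported at a stated income level.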

Probit Model for Individual Data
- SAS program:

    proc import out=work.gpagrade
        datafile="c:\yan\econ616\DG-15.7.xls";
    run;

    proc probit data=gpagrade;
        class grade;
        model grade = gpa tuce psi;   /* default distribution is normal, i.e. probit */
    run;

Output
- [Table: Analysis of Parameter Estimates; columns Parameter, DF, Estimate, Standard Error, 95% Confidence Limits, Chi-Square, Pr > ChiSq; rows Intercept, GPA, TUCE, PSI.]

Marginal Effect of a Change in a Regressor
Holding all other variables constant:
1. LPM: the slope coefficient directly measures the change in the probability of an event occurring as a result of a unit change in the value of a regressor.
2. Logit model: the slope coefficient of a variable gives the change in the log of the odds associated with a unit change in that variable. The rate of change in the probability of the event happening is given by $\beta_j P_i(1-P_i)$.
3. Probit model: the rate of change in the probability is given by $\beta_j f(X\beta)$, where f is the density function of the standard normal variable.
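The logit and probit expressions both follow from the chain rule applied to the respective CDF. A short derivation, writing $z_i = X_i\beta$ for the linear index and $\phi$ for the standard normal density (the slide's $f$):

```latex
\begin{aligned}
\text{Logit:}\quad
P_i &= \frac{1}{1 + e^{-z_i}}, &
\frac{\partial P_i}{\partial z_i}
  &= \frac{e^{-z_i}}{\left(1 + e^{-z_i}\right)^2} = P_i\,(1 - P_i), &
\frac{\partial P_i}{\partial X_{ij}} &= \beta_j\, P_i\,(1 - P_i), \\[4pt]
\text{Probit:}\quad
P_i &= \Phi(z_i), &
\frac{\partial P_i}{\partial z_i} &= \phi(z_i), &
\frac{\partial P_i}{\partial X_{ij}} &= \beta_j\, \phi(X_i\beta).
\end{aligned}
```

In both models the effect of $X_j$ on the probability depends on the values of all the regressors through $z_i$, whereas in the LPM it is the constant $\beta_j$.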

Logit or Probit?
- In most applications the two models give quite similar results; the main difference is that the logistic distribution has slightly fatter tails.
- There is no compelling reason to choose one over the other.
- In practice, many researchers choose the logit model because of its comparative mathematical simplicity.
- [Figure: logit and probit cumulative distribution curves, with P ranging from 0 to 1.]
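One way to see the fatter tails is to compare upper-tail probabilities of the two distributions. The sketch below rescales the logistic distribution to unit variance so that the comparison is like for like; that rescaling is an assumption of this illustration, not something stated on the slide, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()


def logistic_cdf_unit_variance(z):
    """Logistic CDF rescaled to variance 1 (the standard logistic has variance pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-z * math.pi / math.sqrt(3.0)))


def upper_tails(z):
    """Return (logistic tail, normal tail) probabilities beyond z standard deviations."""
    return 1.0 - logistic_cdf_unit_variance(z), 1.0 - std_normal.cdf(z)


for z in (2.0, 3.0, 4.0):
    tail_logit, tail_probit = upper_tails(z)
    print(f"z = {z}: logistic tail = {tail_logit:.6f}, normal tail = {tail_probit:.6f}")
```

Far from the center the logistic tail probability is several times the normal one, while near the center the two curves are close, which is why fitted probabilities from the two models usually differ little.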

Reading
- Damodar N. Gujarati, Basic Econometrics, P