FIN822 Li11 Binary independent and dependent variables.

Presentation transcript:

FIN822 Li11 Binary independent and dependent variables

FIN822 Li22 Binary independent variables $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$

FIN822 Li33 Dummy Variables A dummy variable is a variable that takes on the value 1 or 0. Examples: male (= 1 if the person is male, 0 otherwise), NYSE (= 1 if the stock is listed on the NYSE, 0 otherwise), etc. Dummy variables are also called binary variables.

FIN822 Li44 A Dummy Independent Variable Consider a simple model with one continuous variable (x) and one dummy (d): $y = \beta_0 + \delta_0 d + \beta_1 x + u$. This can be interpreted as an intercept shift. If d = 0, then $y = \beta_0 + \beta_1 x + u$. If d = 1, then $y = (\beta_0 + \delta_0) + \beta_1 x + u$. The case of d = 0 is the base group.
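
The intercept-shift reading can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the data are made up and noise-free, the coefficient values (1.0, 0.5, 2.0) are assumptions chosen for illustration, and `solve_linear_system` and `ols` are hypothetical helpers written here so the example needs no external library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """OLS through the normal equations (X'X) b = X'y.

    Each row must already contain the constant 1.0 as its first entry.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data: intercept 1.0, dummy shift 0.5, slope 2.0.
x = [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0 + 0.5 * di + 2.0 * xi for di, xi in zip(d, x)]

rows = [[1.0, float(di), xi] for di, xi in zip(d, x)]
intercept, dummy_shift, slope = ols(rows, y)

# The d = 1 group has intercept (intercept + dummy_shift) and the same slope.
print(round(intercept, 6), round(dummy_shift, 6), round(slope, 6))
```

Because the data contain no noise, the regression recovers the assumed values; with real data the coefficient on d would be an estimate of the shift, subject to sampling error.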

FIN822 Li55 Example. Y = turnover = trading volume / outstanding shares; X = size of firm = log(stock price × outstanding shares); D = 1 for NYSE stocks, 0 otherwise (AMEX and NASDAQ). If you assume the slope is the same for both groups, then regress $y = \beta_0 + \delta_0 d + \beta_1 x + u$.

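FIN822 Li66 Example of $\delta_0 > 0$ [Figure: y plotted against x; two parallel lines with slope $\beta_1$: $y = \beta_0 + \beta_1 x$ for d = 0 and $y = (\beta_0 + \delta_0) + \beta_1 x$ for d = 1, with intercept $\beta_0$ and a vertical gap of $\delta_0$ between the lines.]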

FIN822 Li77 Dummies for Multiple Categories We can use dummy variables to handle a variable with multiple categories. Suppose everyone in your data is either a HS dropout, HS grad only, or college grad. To compare HS and college grads to HS dropouts, include 2 dummy variables: hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if college grad, 0 otherwise.

FIN822 Li88 Multiple Categories (cont) Any categorical variable can be turned into a set of dummy variables. The base group is represented by the intercept; if there are n categories, there should be n – 1 dummy variables.
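
The n – 1 rule can be illustrated with a short Python sketch. The helper `make_dummies` and the sample data are hypothetical, introduced only for this example; the category names follow the education example above.

```python
def make_dummies(values, base):
    """Return one 0/1 column per category, omitting the base group."""
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical sample with three education categories.
education = ["dropout", "hsgrad", "colgrad", "hsgrad", "dropout"]

# "dropout" is the base group, so it is captured by the intercept.
dummies = make_dummies(education, base="dropout")
print(dummies)
```

Three categories yield two dummy columns. An observation in the base group has a 0 in every column, which is why its level is absorbed by the intercept.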

FIN822 Li99 Example. Y = turnover = trading volume / outstanding shares; X = size of firm = log(stock price × outstanding shares); D = 1 for NYSE stocks, 0 otherwise (AMEX and NASDAQ). If you assume both the slope and the intercept differ between the two groups, you could run a separate regression for each group, or…

FIN822 Li10 Interactions with Dummies We can also consider interacting a dummy variable, d, with a continuous variable, x: $y = \beta_0 + \delta_0 d + \beta_1 x + \delta_1 d \cdot x + u$. If d = 0, then $y = \beta_0 + \beta_1 x + u$. If d = 1, then $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x + u$. This is interpreted as a change in the slope (and intercept).
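
A model with both the dummy and the dummy-times-x interaction gives the same point estimates as fitting a separate line for each group: the coefficient on d equals the difference in intercepts, and the coefficient on d*x equals the difference in slopes. The Python sketch below illustrates that equivalence from the separate-regressions side. The data are made up, and `simple_ols` is a hypothetical helper.

```python
def simple_ols(x, y):
    """Closed-form OLS intercept and slope for y = a + b x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data for two groups (d = 0 and d = 1).
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2.1, 3.9, 6.2, 7.8, 4.0, 4.9, 6.1, 7.0]

x0 = [xi for xi, di in zip(x, d) if di == 0]
y0 = [yi for yi, di in zip(y, d) if di == 0]
x1 = [xi for xi, di in zip(x, d) if di == 1]
y1 = [yi for yi, di in zip(y, d) if di == 1]

intercept0, slope0 = simple_ols(x0, y0)  # base group line
intercept1, slope1 = simple_ols(x1, y1)  # d = 1 group line

# These differences are what the interacted regression reports
# as the coefficients on d and on d*x.
intercept_shift = intercept1 - intercept0
slope_shift = slope1 - slope0
print(round(intercept_shift, 4), round(slope_shift, 4))
```

In this made-up sample the intercept shift is positive and the slope shift is negative. The practical advantage of the single interacted regression is that it provides standard errors for the two shifts directly, so a difference between groups can be tested.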

FIN822 Li11 Example of $\delta_0 > 0$ and $\delta_1 < 0$ [Figure: y plotted against x; line $y = \beta_0 + \beta_1 x$ for d = 0 and line $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x$ for d = 1.]

FIN822 Li12 Caveats A typical use of a dummy variable is when we are looking for a program effect. For example, we may have individuals who received job training and wish to test the effect of training. We need to remember that individuals usually choose whether to participate in a program, which may lead to a self-selection problem.

FIN822 Li13 Self-selection Problems If we can control for everything that is correlated with both participation and the outcome of interest, then it’s not a problem. Often, though, there are unobservables that are correlated with participation, for example people’s skill before the training: low-skill people may be more likely to take the training. In this case, the estimate of the program effect is biased.
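
The direction of the bias can be seen in a small constructed example. This Python sketch uses made-up, noise-free data under stated assumptions: the true training effect is 2.0, each unit of skill adds 3.0 to the outcome, and only people with skill of 4.0 or below choose training. None of these numbers come from the slides.

```python
def mean(values):
    return sum(values) / len(values)


TRUE_EFFECT = 2.0  # assumed true effect of training on the outcome

# Hypothetical pre-training skill levels; low-skill people self-select.
skill = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
trained = [1 if s <= 4.0 else 0 for s in skill]

# Outcome depends on training AND on the (unobserved) skill.
outcome = [10.0 + TRUE_EFFECT * d + 3.0 * s for d, s in zip(trained, skill)]

treated = [y for y, d in zip(outcome, trained) if d == 1]
untreated = [y for y, d in zip(outcome, trained) if d == 0]

# Comparing group means ignores skill, so the skill gap between
# the two groups is wrongly attributed to the training.
naive_effect = mean(treated) - mean(untreated)
print(naive_effect, TRUE_EFFECT)
```

Here the naive comparison suggests that training hurts, even though its assumed effect is positive, because the trained group started with lower skill.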

FIN822 Li14 Self Selection Problem: An example Suppose a professor finds that his evening class students perform better than his daytime class students. Does that mean the professor teaches better at night, or that students learn better at night? Not necessarily. Students who enroll in night classes may simply be more hardworking: they make the effort to come at night, sometimes after a day's work, etc.

FIN822 Li15 Self-selection Problems If we can control (control means adding more explanatory variables) for everything that is correlated with both participation (the dummy variable) and the outcome (y), then it’s not a problem.

FIN822 Li16 Binary Dependent Variables Logit and Probit models $P(y = 1 \mid x) = G(\beta_0 + x\beta)$

FIN822 Li17 Binary Dependent Variables A linear probability model can be written as $P(y = 1 \mid x) = \beta_0 + x\beta$. A drawback to the linear probability model is that predicted values are not constrained to be between 0 and 1. An alternative is to model the probability as a function, $G(\beta_0 + x\beta)$, where $0 < G(z) < 1$.
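
The drawback is easy to reproduce. The Python sketch below uses hypothetical coefficients (0.2 and 0.3) and one regressor. The logistic function stands in for G only as one example of a function bounded strictly between 0 and 1.

```python
import math


def lpm_probability(x, beta0, beta1):
    """Linear probability model: the index is used directly as P(y = 1 | x)."""
    return beta0 + beta1 * x


def bounded_probability(x, beta0, beta1):
    """Same index passed through a G with 0 < G(z) < 1 (logistic, as an example)."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))


beta0, beta1 = 0.2, 0.3  # hypothetical coefficients

for x in (-2.0, 1.0, 4.0):
    print(x, round(lpm_probability(x, beta0, beta1), 3),
          round(bounded_probability(x, beta0, beta1), 3))
```

At x = -2 the linear model returns a negative "probability" and at x = 4 it returns a value above 1, while the bounded version stays inside the unit interval for every x.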

FIN822 Li18 The intuition

FIN822 Li19 The Probit Model One choice for G(z) is the standard normal cumulative distribution function (cdf): $G(z) = \Phi(z) \equiv \int_{-\infty}^{z} \phi(v)\,dv$, where $\phi(z)$ is the standard normal density, so $\phi(z) = (2\pi)^{-1/2} \exp(-z^2/2)$. This case is referred to as a probit model. Since it is a nonlinear model, it cannot be estimated by our usual methods. Use maximum likelihood estimation.

FIN822 Li20 The Logit Model Another common choice for G(z) is the logistic function: $G(z) = \exp(z)/[1 + \exp(z)] = \Lambda(z)$. This case is referred to as a logit model, or sometimes as a logistic regression. Both functions have similar shapes: they are increasing in z, most quickly around 0.
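
The two choices of G can be compared side by side. This is a minimal Python sketch: the standard normal cdf is computed through the error function `math.erf`, and the grid of z values and the finite-difference helper `slope` are assumptions made for illustration.

```python
import math


def probit_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))


def slope(cdf, z, h=1e-5):
    """Central finite-difference approximation to dG/dz."""
    return (cdf(z + h) - cdf(z - h)) / (2.0 * h)


grid = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
for z in grid:
    print(z, round(probit_cdf(z), 4), round(logit_cdf(z), 4))

# Grid point at which each function rises fastest.
steepest_probit = max(grid, key=lambda z: slope(probit_cdf, z))
steepest_logit = max(grid, key=lambda z: slope(logit_cdf, z))
print(steepest_probit, steepest_logit)
```

Both functions equal 0.5 at z = 0 and rise fastest there. The logistic function approaches 0 and 1 more slowly in the tails than the normal cdf does.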

FIN822 Li21 Probits and Logits Both the probit and logit are nonlinear and require maximum likelihood estimation. There is no real reason to prefer one over the other. Traditionally, the logit was seen more often, mainly because the logistic function leads to a more easily computed model.
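
To make "maximum likelihood estimation" concrete, here is a minimal Python sketch of a logit fit with one regressor. It uses Newton-Raphson on the log-likelihood, which is one common way to compute the estimates and not necessarily what any particular software package does. The data are made up, and `fit_logit` is a hypothetical helper.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, x, y):
    """Sum of y*log(p) + (1 - y)*log(1 - p) with p = logistic(b0 + b1*x)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_logit(x, y, iterations=25):
    """Newton-Raphson for P(y = 1 | x) = logistic(b0 + b1*x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), 2 x 2.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^(-1) times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data in which the outcomes overlap (no perfect separation).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logit(x, y)
print(round(b0_hat, 4), round(b1_hat, 4))
print(log_likelihood(b0_hat, b1_hat, x, y) > log_likelihood(0.0, 0.0, x, y))
```

The estimates are the values at which the score is zero. The sketch assumes the outcomes are not perfectly separated by x; with perfect separation the likelihood has no finite maximum and the iterations would not settle.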

FIN822 Li22 Interpretation of Probits and Logits In general we care about the effect of x on P(y = 1|x), that is, we care about $\partial p/\partial x$. For the linear case, this is easily computed as the coefficient on x. For the nonlinear probit and logit models, it’s more complicated: $\partial p/\partial x_j = g(\beta_0 + x\beta)\,\beta_j$, where $g(z)$ is $dG/dz$.

FIN822 Li23 For the logit model: $\partial p/\partial x_j = \beta_j \exp(\beta_0 + x\beta) / [1 + \exp(\beta_0 + x\beta)]^2$. For the probit model: $\partial p/\partial x_j = \beta_j \phi(\beta_0 + x\beta) = \beta_j (2\pi)^{-1/2} \exp(-(\beta_0 + x\beta)^2/2)$.
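
The marginal-effect formulas can be checked against a numerical derivative. In this Python sketch the coefficients (-1.0 and 0.8) and the evaluation point x = 2.0 are hypothetical, and there is a single regressor so that the index is simply beta0 + beta1*x.

```python
import math


def logit_cdf(z):
    return math.exp(z) / (1.0 + math.exp(z))


def probit_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_marginal_effect(z, beta_j):
    """beta_j * exp(z) / (1 + exp(z))^2, evaluated at index z."""
    return beta_j * math.exp(z) / (1.0 + math.exp(z)) ** 2


def probit_marginal_effect(z, beta_j):
    """beta_j * (2*pi)^(-1/2) * exp(-z^2 / 2), evaluated at index z."""
    return beta_j * (2.0 * math.pi) ** -0.5 * math.exp(-z * z / 2.0)


beta0, beta1 = -1.0, 0.8  # hypothetical coefficients
x = 2.0                   # point at which the effect is evaluated
z = beta0 + beta1 * x

# Central finite differences of P(y = 1 | x) with respect to x.
h = 1e-6
numeric_logit = (logit_cdf(beta0 + beta1 * (x + h))
                 - logit_cdf(beta0 + beta1 * (x - h))) / (2.0 * h)
numeric_probit = (probit_cdf(beta0 + beta1 * (x + h))
                  - probit_cdf(beta0 + beta1 * (x - h))) / (2.0 * h)

analytic_logit = logit_marginal_effect(z, beta1)
analytic_probit = probit_marginal_effect(z, beta1)
print(analytic_logit, numeric_logit)
print(analytic_probit, numeric_probit)
```

Unlike the linear case, the marginal effect depends on where it is evaluated: the same coefficient produces a smaller effect when the index is far from 0, because g(z) is largest at z = 0.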

FIN822 Li24 Interpretation (continued) Can examine the sign and significance (based on a standard t test) of coefficients,