So far, we have considered regression models with dummy variables among the independent variables. In this lecture, we will study regression models whose dependent variable is a dummy variable. (Adapted from "Introduction to Econometrics" by Christopher Dougherty.)

BINARY CHOICE MODELS: LINEAR PROBABILITY MODEL Why do some people go to college while others do not? Why do some women enter the labor force while others do not? Why do some people buy houses while others rent? Why do some people migrate while others stay put? Why do some people commit crime while others do not? Why are some loans approved by the bank while others rejected? Why do some people vote while others do not? Why do some people marry while others do not? The models that have been developed for this purpose are known as binary choice models, with the outcome, which we will denote Y, being assigned a value of 1 if the event occurs and 0 otherwise.

The simplest binary choice model is the linear probability model (LPM) where, as the name implies, the probability of the event occurring, p, is assumed to be a linear function of a set of explanatory variables. Of course, p is unobservable; one has data only on the outcome, Y. In the LPM, we regress the dummy variable on a set of Xs using OLS, i.e.,

Y_i = β1 + β2 X_2i + … + βk X_ki + u_i

The LPM predicts the probability of an event occurring, i.e., Y_i = 1. In other words, the RHS of the equation must be interpreted as a probability, i.e., restricted to between 0 and 1. For example, if the predicted value is 0.70, the event has a 70% chance of occurring. The coefficient βk of the LPM can be interpreted as the marginal effect of X_k on the probability that Y_i = 1, holding other factors constant.
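The mechanics above can be sketched in a few lines of code. This is a minimal illustration with made-up data, not the lecture's datasets: simple OLS applied to a 0/1 outcome, with the fitted value read as a probability and the slope as a marginal effect.

```python
# Minimal LPM sketch: fit y = b1 + b2*x by simple OLS on made-up data.
# The numbers are illustrative only, not from the lecture's examples.

def ols_simple(x, y):
    """Return (b1, b2) for the least-squares line y = b1 + b2*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b2 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b1 = ybar - b2 * xbar
    return b1, b2

x = [0, 1, 2, 3]          # explanatory variable
y = [0, 0, 1, 1]          # binary outcome (1 = event occurred)

b1, b2 = ols_simple(x, y)
# b1 ≈ -0.1, b2 = 0.4: each unit of x raises the predicted
# probability of the event by 0.4, holding nothing else (here
# there is nothing else) constant.

# The fitted value at x = 2 is read as a predicted probability:
p_hat = b1 + b2 * 2       # ≈ 0.7, i.e. a 70% chance
```

Note that the fitted value at x = 0 is already negative (about -0.1), which previews the unboundedness problem discussed at the end of this lecture.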

[Figure: the fitted LPM line plotted against X, with Y, p on the vertical axis and intercept β1.] Points on the fitted line represent the predicted probabilities of the event occurring (i.e., Y = 1) for each value of X.

Example: Suppose that we are modeling the decision of women to enter the labor force. A simple LPM of labor force entry as a function of education is then fitted by OLS.

Predictions for the labor force model: the fitted equation gives a predicted probability of entering the labor force for a person with no education, for a person with a high school education, and for a person with a Masters and a Ph.D. (23 years of education).

ILLUSTRATION 1

Why do some people graduate from high school while others drop out? We will define a variable GRAD which is equal to 1 if the individual graduated from high school (i.e., had more than 11 years of schooling), and 0 otherwise. We consider only one explanatory variable, the ASVABC score. Our regression model is of the form:

GRAD = β1 + β2 ASVABC + u

. g GRAD = 0
. replace GRAD = 1 if S > 11
(509 real changes made)
. reg GRAD ASVABC

[Stata regression output]

Here is the result of regressing GRAD on ASVABC. It suggests that every additional point on the ASVABC score increases the probability of graduating by 0.007, that is, by 0.7 percentage points.

The intercept has no sensible meaning. Literally, it suggests that a respondent with an ASVABC score of 0 has a 58% probability of graduating. However, a score of 0 is not possible.
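The Stata lines `g GRAD = 0` and `replace GRAD = 1 if S > 11` construct the dummy from years of schooling S. The same construction in Python, with a made-up list of schooling years, looks like this:

```python
# Mirror of the Stata dummy construction: GRAD = 1 for anyone with
# more than 11 years of schooling (a high-school graduate), else 0.
# The schooling values below are made up for illustration.

S = [8, 11, 12, 12, 16, 10, 18]            # years of schooling
GRAD = [1 if s > 11 else 0 for s in S]

print(GRAD)        # [0, 0, 1, 1, 1, 0, 1]
print(sum(GRAD))   # 4 graduates out of 7
```

The sum of a 0/1 dummy counts the observations for which the event occurred, which is also why the mean of GRAD is the sample proportion of graduates.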

ILLUSTRATION 2

Why do some people buy houses while others rent? We will define a variable HOME which is equal to 1 if the family owns a house, and 0 otherwise. We consider only one explanatory variable, INCOME ($'000). Our regression model is of the form:

HOME = β1 + β2 INCOME + u

Dependent Variable: HOME
Method: Least Squares
Sample: 1 40
Included observations: 40

[EViews regression output]

Here is the result of regressing HOME on INCOME. It suggests that every additional unit of INCOME ($1,000) increases the probability of owning a house by 0.1021, that is, by 10.21 percentage points. If INCOME = 12, the fitted equation gives a predicted value of about 0.28, indicating that if the income of a family is $12,000, the estimated probability of owning a house is 28%.

The intercept has a negative value. Probability cannot be negative, so it is treated as zero. Literally, it suggests that a respondent with zero INCOME has zero probability of owning a house: no income, no house.

ILLUSTRATION 3

Why do some women enter the labor force while others do not? We will define a variable PARTICIPATE which is equal to 1 if the woman has a job or is looking for one, and 0 otherwise (not in the labor force). We consider two explanatory variables:

MARRIED = 1 if the woman is married, 0 otherwise
EDUCATION = number of years of schooling

Our regression model is of the form:

PARTICIPATE = β1 + β2 MARRIED + β3 EDUCATION + u

Dependent Variable: PARTICIPATE
Method: Least Squares
Sample: 1 30
Included observations: 30

[EViews regression output]

The output suggests that the probability of a woman participating in the labor force falls by 38.18 percentage points if she is married, holding her schooling constant. On the other hand, the probability increases by 9.3 percentage points for every additional year of schooling, holding her marital status constant.
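The two marginal effects can be checked numerically. The slopes below (-0.3818 for MARRIED, +0.093 for EDUCATION) are the values implied by the text; the intercept 0.20 is a hypothetical number chosen purely for illustration, since it is not reported above.

```python
# Predicted participation probability from the LPM.  The slope values
# come from the text; B1_HYPOTHETICAL is NOT from the output -- it is
# an assumed intercept used only to make the sketch runnable.

B1_HYPOTHETICAL = 0.20
B_MARRIED = -0.3818
B_EDUC = 0.093

def predict(married, education):
    return B1_HYPOTHETICAL + B_MARRIED * married + B_EDUC * education

# Holding education fixed (here at 12 years), marriage shifts the
# predicted probability down by exactly the MARRIED coefficient:
gap = predict(0, 12) - predict(1, 12)
print(round(gap, 4))   # 0.3818
```

Because the model is linear, the gap is 0.3818 at every education level; that constancy of marginal effects is one of the features the logit and probit models of the next lectures relax.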

SHORTCOMINGS OF THE LPM

As noted earlier, the LPM is estimated using OLS. However, the LPM has several shortcomings.

(1) The error term is not normally distributed

As usual, the value of the dependent variable Y_i in observation i has (i) a deterministic component and (ii) a random component. The deterministic component depends on X_i and the parameters, i.e., E(Y_i). The random component is the error term (u_i).

E(Y_i) is simple to compute, because Y_i can take only two values: 1 with probability p_i and 0 with probability (1 − p_i). The expected value in observation i is:

E(Y_i) = 1 × p_i + 0 × (1 − p_i) = p_i = β1 + β2 X_i

This means that we can rewrite the model as:

Y_i = p_i + u_i = β1 + β2 X_i + u_i
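The expectation above, and the fact that the disturbance term has mean zero, can be verified numerically. A minimal sketch, with a made-up probability p:

```python
# Check E(Y_i) = p_i for a Bernoulli outcome, and that the two-valued
# error term u_i = Y_i - p_i has mean zero.  p is made up.

p = 0.7

# Y is 1 with probability p and 0 with probability (1 - p):
expected_Y = 1 * p + 0 * (1 - p)   # = p = 0.7

# The error term takes only two values...
u_if_event = 1 - p                 #  0.3, occurring with probability p
u_if_no_event = 0 - p              # -0.7, occurring with probability 1 - p

# ...and its probability-weighted mean is zero, as required:
mean_u = p * u_if_event + (1 - p) * u_if_no_event
print(mean_u)                      # 0 (up to floating-point error)
```

A distribution concentrated on just two points cannot be normal, which is exactly the first shortcoming listed above.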

[Figure: the line β1 + β2 X_i plotted against X, with Y, p on the vertical axis.] The probability function is thus the deterministic component of the relationship between Y and X.

The two possible values of Y, which give rise to the observations A and B, are illustrated in the diagram on the next slide. Since Y takes only two values (zero and one), the error term u also takes only two values. Hence the error term does not have a normal distribution. Note: normality is not required for the OLS estimates to be unbiased, but it is necessary for efficiency.

[Figure: at X = X_i, observation A lies at Y = 1 with error 1 − β1 − β2 X_i, and observation B lies at Y = 0 with error −(β1 + β2 X_i), measured from the line β1 + β2 X_i.]

(2) The distribution of the error term is heteroskedastic

The population variance of the error term in observation i is given by:

var(u_i) = p_i (1 − p_i) = (β1 + β2 X_i)(1 − β1 − β2 X_i)

Since the variance of the error term is a function of the value of X, it is not constant. In other words, the distribution of the error term is heteroskedastic. The consequence is that the OLS estimator is inefficient and the standard errors are biased, resulting in incorrect hypothesis tests. Note: weighted least squares (WLS) has been suggested to deal with the problem of heteroskedasticity.
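The dependence of the error variance on X is easy to see numerically. A small sketch with made-up parameter values (b1, b2 and the x values are illustrative only):

```python
# LPM error variance p_i * (1 - p_i) changes with X: heteroskedasticity.
# b1, b2 and the x values below are made up for illustration.

b1, b2 = 0.1, 0.2

def error_variance(x):
    p = b1 + b2 * x          # p_i = b1 + b2 * x_i
    return p * (1 - p)       # var(u_i) = p_i * (1 - p_i)

for x in [0.5, 2.0, 4.0]:
    print(x, round(error_variance(x), 4))
# x = 0.5 -> p = 0.2, var = 0.16
# x = 2.0 -> p = 0.5, var = 0.25   (variance is largest at p = 0.5)
# x = 4.0 -> p = 0.9, var = 0.09
```

The variance peaks where the event is a coin flip (p = 0.5) and shrinks toward zero as p approaches 0 or 1, so no single constant variance can describe all observations.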

(3) The LPM is not compatible with the assumed probability structure

Another shortcoming of the LPM is that the predicted probabilities can be greater than 1 or less than 0. Consider the simple LPM with only one independent variable:

p_i = β1 + β2 X_i

The fitted line is:

ŷ_i = b1 + b2 X_i

As noted earlier, ŷ can be interpreted as the predicted probability of an event occurring (i.e., Y = 1, the probability of success). Probabilities can only range between 0 and 1. However, in OLS there is no constraint that the ŷ estimates fall in the 0–1 range; indeed, ŷ is free to vary between −∞ and +∞.

[Figure: the fitted line β1 + β2 X_i extends below 0 for small values of X and above 1 for large values of X.]

In the range where X is very large or very small, the predicted probability can fall outside the 0–1 range. Some people try to solve this problem by setting predicted probabilities greater than one equal to one, and predicted probabilities less than zero equal to zero. Note: the more appropriate solution is offered by logit or probit models, which constrain the predicted probabilities to lie within the 0–1 range.
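The ad hoc fix mentioned above (truncating out-of-range predictions) amounts to clipping each fitted value into [0, 1]. A sketch with made-up fitted values:

```python
# Ad hoc fix for out-of-range LPM predictions: clip fitted values
# into [0, 1].  The fitted values below are made up for illustration.

def clip_probability(yhat):
    return min(max(yhat, 0.0), 1.0)

fitted = [-0.10, 0.30, 0.70, 1.10]
clipped = [clip_probability(y) for y in fitted]
print(clipped)   # [0.0, 0.3, 0.7, 1.0]
```

Clipping makes the output a legal probability but leaves the underlying linear model unchanged; the logit and probit models instead build the 0–1 constraint into the functional form itself.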