Models with Discrete Dependent Variables


Logit and Probit Models with Discrete Dependent Variables

Why Do We Need A Different Model Than Linear Regression?

Appropriate estimation of relations between variables depends on selecting an appropriate statistical model. There are many different types of estimation problems in political science:

- Continuous variables where the experiment can be viewed as draws from a normal distribution.
- Continuous variables where the experiment is draws from some other distribution.
- Continuous variables where the distribution is truncated or censored.
- Discrete variables. For example, we might model labor force participation, whether to vote for or against, purchase or not purchase, run for office or not run for office, etc.

Models of this last type are sometimes called qualitative response models, because the dependent variables are discrete rather than continuous. There are several types of such models, including the following.

Types of Qualitative Response Models

- Qualitative dichotomy (e.g., vote/not vote type variables): we equate "no" with 0 and "yes" with 1. However, these are qualitative choices, and the 0-1 coding is arbitrary; we could equally well code "no" as 1 and "yes" as 0.
- Qualitative multichotomy (e.g., occupational choice by an individual): let 0 be a clerk, 1 an engineer, 2 an attorney, 3 a politician, 4 a college professor, and 5 other. Here the codings are mere categories and the numbers have no real meaning.
- Rankings (e.g., opinions about a politician's job performance): strongly approve (5), approve (4), don't know (3), disapprove (2), strongly disapprove (1). The values chosen are not quantitative, merely an ordering of preferences or opinions; the difference between outcomes from 5 to 4 is not necessarily the same as from 2 to 1.
- Count outcomes.

Dichotomous Dependent Variables

There are various problems associated with estimating a dichotomous dependent variable under the assumptions of a statistical experiment that draws from a normal distribution, i.e., using linear regression. Obviously the statistical experiment is not draws from a normal distribution, but from something called a Bernoulli distribution. Thus, estimation is likely to be inefficient, and it is theoretically inconsistent with the nature of the statistical experiment. The dependent variable is discrete and truncated on both ends at 0 and 1. This leads to a number of other serious problems. Consider first a graph of the data in a typical sample of Bernoulli experiments.

Note that a linear regression line through the actual data cuts through the data at the point of greatest concentration on each end. The observations will lie close to the regression line only if the X variable is also Bernoulli distributed. This means that measures of fit or hypothesis tests involving the squared errors will be misleading: the regression line will seldom lie near the data.

Relatedly, this feature also means that the residuals from the linear model will be dichotomous and heteroskedastic, rather than normal, raising questions about hypothesis tests. For the linear probability model y = b0 + b1X + e, when y = 1 the residual depends on X: e = 1 - (b0 + b1X). When y = 0 the residual also depends on X: e = -(b0 + b1X). Given X, the residual therefore takes only two values, with variance p(1 - p) where p = b0 + b1X. This means that the residuals from the linear probability model are heteroskedastic and have a dichotomous character.
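A minimal numerical sketch of this point, using hypothetical coefficients for the linear probability model (b0 = 0.2, b1 = 0.05, invented for illustration): given X, the residual is either 1 - p or -p, where p = b0 + b1*x, so its variance p(1 - p) shifts with X.

```python
# Hypothetical linear probability model P(y=1 | x) = b0 + b1*x.
b0, b1 = 0.2, 0.05

results = {}
for x in (1, 5, 9):
    p = b0 + b1 * x                  # fitted probability at this x
    results[x] = {
        "resid_if_y1": 1 - p,        # residual when y = 1
        "resid_if_y0": -p,           # residual when y = 0
        "var": p * (1 - p),          # Var(residual | x) = p(1 - p)
    }

# Only two residual values are possible at each x, and the variance
# changes with x -- the heteroskedasticity described above.
for x, r in results.items():
    print(x, r)
```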

Note that the residuals change systematically with the values of X; this is what is termed endogeneity. They are also not distributed normally. We could "fix" the heteroskedasticity problem by estimating the linear probability model using weighted least squares. However, the problems with this model run deeper. We must be able to interpret results from this model as expected values of probabilities, and the graph below suggests further problems.

Observe that some of the predicted probabilities lie above 1 and below 0. This is not consistent with the rules of probability. We could truncate the model at 0 and 1 to "fix" this problem. However, note that probability, according to this model, is alleged to change in linear fashion with changes in X, and this may not be consistent with reality. For example, consider the probability of home ownership as a function of income. Suppose we have prospective buyers with income around 10k per year: if their income rises by 1k, how much does the probability that they will buy a home change? Now suppose income is around 30k, or around 80k: does the same 1k increase change the probability of ownership by the same amount? In practice, there are many situations where the probability of a yes outcome follows an S-shaped curve, rather than the straight line implied by the linear probability model.
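A small illustration of both defects, with made-up coefficients (not estimates from any real data): the linear model wanders outside [0, 1] at extreme incomes, while an S-shaped (logistic) curve stays bounded.

```python
import math

def lpm(x, b0=-0.3, b1=0.02):
    """Hypothetical linear probability model: can escape [0, 1]."""
    return b0 + b1 * x

def s_curve(x, b0=-3.0, b1=0.05):
    """Hypothetical logistic alternative: always in (0, 1)."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Income in thousands of dollars per year:
for income in (5, 30, 80):
    print(income, round(lpm(income), 2), round(s_curve(income), 3))
```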

Non-Linear Probability Models

To begin, assume the appropriate statistical experiment: draws from a Bernoulli distribution. The probability model for the Bernoulli distribution is f(y) = p^y (1 - p)^(1-y) for y = 0, 1, where p is a parameter reflecting the probability that y = 1. The issue then becomes how to specify the probability that y = 1. We noted above that this probability often follows an S-shaped curve. In other words, the probability that y = 1 remains small until some threshold is crossed, at which point it rises rapidly and remains large after the threshold. This suggests a cumulative distribution function.
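As a quick sketch, the Bernoulli probability model f(y) = p^y (1 - p)^(1-y) written as code; note that the two outcomes exhaust the probability mass for any p.

```python
def bernoulli_pmf(y, p):
    """Bernoulli probability model: P(Y = y) = p**y * (1 - p)**(1 - y), y in {0, 1}."""
    return p ** y * (1 - p) ** (1 - y)

# With p = 0.3: P(y=1) = 0.3, P(y=0) = 0.7, and the two sum to 1.
print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3))
```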

Two different cumulative distribution functions are commonly used in this situation: the cumulative standard normal distribution (probit) and the cumulative logistic distribution (logit). Probit: the probability that y = 1 is given by the cumulative standard normal, P(y = 1) = Phi(Xb), where Phi(z) is the area under the standard normal density to the left of z.
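A sketch of the probit link using the standard library's error function, via the identity Phi(z) = 0.5 * (1 + erf(z / sqrt(2))); the index values below are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# In a probit model, P(y = 1 | x) = Phi(x'b). An index of 0 maps to
# probability 0.5, and the curve flattens in both tails (the S shape):
for index in (-2, 0, 2):
    print(index, round(std_normal_cdf(index), 4))
```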

Logit: the cumulative logistic function for logit is grounded in the concept of an odds ratio. Let the log odds that y = 1 be given by ln[p / (1 - p)] = Xb. Then, solving for the probability that y = 1, we have p = exp(Xb) / (1 + exp(Xb)).
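The logit algebra (log odds to probability and back) can be checked numerically; the index value 1.2 below is arbitrary.

```python
import math

def logit_prob(index):
    """p = exp(x'b) / (1 + exp(x'b)), from the log odds x'b."""
    return math.exp(index) / (1 + math.exp(index))

def log_odds(p):
    """Inverse mapping: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

# A log-odds index of 0 gives even odds, and solving for p then
# mapping back recovers the original index:
print(round(logit_prob(0.0), 3))
print(round(log_odds(logit_prob(1.2)), 3))
```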

Choosing Between Logit and Probit: in the dichotomous case, there is no basis in statistical theory for preferring one over the other. In most applications it makes no difference which one is used. In small samples the two can differ somewhat in their results, but they are quite similar in large samples. Various R-squared measures have been devised for logit and probit. However, none is a measure of the closeness of observations to an expected value as in regression analysis; all are ad hoc.
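A quick numerical check of the similarity claim: with the common rule-of-thumb rescaling of about 1.6 (an approximation, not an exact identity), the logistic CDF tracks the standard normal CDF within a couple of percentage points over the usual range of the index.

```python
import math

def probit_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def logit_cdf(z):
    return 1 / (1 + math.exp(-z))

# Rule of thumb: logit coefficients come out roughly 1.6 times their
# probit counterparts, because logit_cdf(1.6 * z) stays close to the
# normal CDF. Check the largest gap over z in [-3, 3]:
max_gap = max(abs(probit_cdf(z) - logit_cdf(1.6 * z))
              for z in (i / 10 for i in range(-30, 31)))
print(round(max_gap, 3))
```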

Hypothesis Testing

t or z test: we can test the significance of the individual coefficients simply using the point estimates and standard errors (square roots of the diagonal elements of the asymptotic covariance matrix of estimates). Form a z or t statistic by taking z = b / se(b), the estimate divided by its standard error. Confidence intervals take the usual form b +/- z(alpha/2) * se(b).
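A sketch of the z test and confidence interval with a hypothetical coefficient and standard error (not from any real fit).

```python
# Hypothetical logit output: coefficient estimate and asymptotic
# standard error, invented for illustration.
beta_hat, se = 0.8, 0.25

z_stat = beta_hat / se               # z statistic for H0: beta = 0
ci_low = beta_hat - 1.96 * se        # 95% confidence interval bounds
ci_high = beta_hat + 1.96 * se

print(round(z_stat, 2), round(ci_low, 2), round(ci_high, 2))
```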

Interpretation

Interpreting dichotomous logit and probit coefficients: the actual coefficients in a logit or probit analysis are limited in their immediate interpretability. The signs are meaningful, but the magnitudes may not be, particularly when the variables are in different metrics. Above all, note that you cannot interpret the coefficients directly in terms of units of change in y for a unit change in x, as in regression analysis. There are various approaches to imparting substantive meaning to logit and probit results, including:

- Probability calculations
- Graphical methods
- First differences
- Partial derivatives
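A sketch of two of these approaches, first differences and partial derivatives, for a hypothetical logit fit (coefficients invented for illustration). For the logistic model the partial derivative of the probability with respect to x is b1 * p * (1 - p).

```python
import math

def logit_prob(index):
    return 1 / (1 + math.exp(-index))

# Hypothetical logit fit: index = b0 + b1 * x.
b0, b1 = -2.0, 0.1

def first_difference(x, dx=1.0):
    """Change in predicted probability when x rises by dx."""
    return logit_prob(b0 + b1 * (x + dx)) - logit_prob(b0 + b1 * x)

def partial_derivative(x):
    """dP/dx = b1 * p * (1 - p) for the logistic model."""
    p = logit_prob(b0 + b1 * x)
    return b1 * p * (1 - p)

# The same one-unit change in x moves the probability by different
# amounts at different points on the S-curve:
for x in (5, 20, 40):
    print(x, round(first_difference(x), 4), round(partial_derivative(x), 4))
```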