9. Binary Dependent Variables
Presentation transcript:

9. Binary Dependent Variables
9.1 Homogeneous models
–Logit, probit models
–Inference
–Tax preparers
9.2 Random effects models
9.3 Fixed effects models
9.4 Marginal models and GEE
Appendix 9A – Likelihood calculations

9.1 Homogeneous models
The response of interest, $y_{it}$, may now take only the values 0 or 1, a binary dependent variable.
–Typically it indicates whether the ith subject possesses an attribute at time t.
Suppose that the probability that the response equals 1 is denoted by Prob$(y_{it} = 1) = p_{it}$.
–Then we may interpret the mean response as the probability that the response equals 1, that is, E $y_{it} = 0 \cdot$ Prob$(y_{it} = 0) + 1 \cdot$ Prob$(y_{it} = 1) = p_{it}$.
–Further, because $y_{it}^2 = y_{it}$, a straightforward calculation, Var $y_{it}$ = E $y_{it}^2 - ($E $y_{it})^2 = p_{it} - p_{it}^2$, shows that the variance is related to the mean through the expression Var $y_{it} = p_{it}(1 - p_{it})$.

Inadequacy of linear models
Homogeneous means that we do not incorporate subject-specific terms that account for heterogeneity.
Linear models of the form $y_{it} = x_{it}'\beta + \varepsilon_{it}$ are inadequate because:
–The expected response is a probability and thus must lie between 0 and 1, although the linear combination $x_{it}'\beta$ may range between negative and positive infinity.
–Linear models assume homoscedasticity (constant variance), yet the variance of the response, $p_{it}(1 - p_{it})$, depends on the mean, which varies over observations.
–The response must be either 0 or 1, although the distribution of the error term is typically regarded as continuous.

Using nonlinear functions of explanatory variables
In lieu of linear, or additive, functions, we express the probability that the response equals 1 as a nonlinear function of explanatory variables, $p_{it} = \pi(x_{it}'\beta)$. Two special cases are:
–$\pi(z) = e^z/(1 + e^z)$, the logit case
–$\pi(z) = \Phi(z)$, the cumulative standard normal distribution function, the probit case.
These two functions are similar. I focus on the logit case because, unlike the cumulative normal distribution function, it permits closed-form expressions.
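The similarity of the two links can be checked numerically. A minimal sketch in Python (my own illustration, not from the text; the 1.7 rescaling is a textbook rule of thumb):

import numpy as np
from scipy.stats import norm

def logit(z):
    """Logistic distribution function: pi(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))

# The logistic cdf at 1.7*z closely tracks the standard normal cdf at z,
# which is why logit and probit fits are rarely distinguishable in practice.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z={z:+.1f}  logit(z)={logit(z):.4f}  "
          f"probit(z)={norm.cdf(z):.4f}  logit(1.7z)={logit(1.7 * z):.4f}")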

Threshold interpretation
Suppose that there exists an underlying linear model, $y_{it}^* = x_{it}'\beta + \varepsilon_{it}^*$.
–The response $y_{it}^*$ is interpreted as the "propensity" to possess a characteristic.
–We do not observe the propensity, but we do observe when it crosses a threshold, say 0.
–That is, we observe $y_{it} = \mathrm{I}(y_{it}^* > 0)$.
Using the logit distribution function, Prob$(\varepsilon_{it}^* \le a) = 1/(1 + \exp(-a))$.
Note that, by symmetry, Prob$(-\varepsilon_{it}^* \le x_{it}'\beta)$ = Prob$(\varepsilon_{it}^* \le x_{it}'\beta)$. Thus,
$p_{it}$ = Prob$(y_{it}^* > 0)$ = Prob$(\varepsilon_{it}^* > -x_{it}'\beta) = \dfrac{1}{1 + \exp(-x_{it}'\beta)}$.

Random utility interpretation
In economics applications, we think of an individual choosing among c categories.
–Preferences among categories are indexed by an unobserved utility function.
–We model utility as a function of an underlying value plus random noise, that is, $U_{itj} = u_{it}(V_{itj} + \eta_{itj})$, j = 0, 1.
–If $U_{it1} > U_{it0}$, then denote this choice as $y_{it} = 1$.
–Assuming that $u_{it}$ is a strictly increasing function, we have
Prob$(y_{it} = 1)$ = Prob$(U_{it1} > U_{it0})$ = Prob$(\eta_{it0} - \eta_{it1} < V_{it1} - V_{it0})$.
Parameterize the problem by taking $V_{it0} = 0$ and $V_{it1} = x_{it}'\beta$. We may take the difference in the errors, $\eta_{it0} - \eta_{it1}$, to be normal or logistic, corresponding to the probit and logit cases.

Logistic regression
This is another phrase used to describe the logit case.
Using $p = \pi(z)$, the inverse of $\pi$ can be calculated as $z = \pi^{-1}(p) = \ln(p/(1-p))$.
–Define logit$(p) = \ln(p/(1-p))$ to be the logit function.
–Here, $p/(1-p)$ is known as the odds ratio. It has a convenient economic interpretation in terms of fair games. That is, suppose that $p = 0.25$. Then the odds ratio is $0.25/0.75 = 1/3$. The odds against winning are 3 to 1, or the odds for winning are 1 to 3. If we bet $1, then in a fair game we should win $3.
The logistic regression model expresses the linear combination of explanatory variables as the logarithm of the odds ratio, $x_{it}'\beta = \ln(p_{it}/(1-p_{it}))$.

Parameter interpretation
To interpret $\beta = (\beta_1, \beta_2, \ldots, \beta_K)'$, we begin by assuming that the jth explanatory variable, $x_{itj}$, is either 0 or 1.
Then, writing $x_{it}'\beta = \beta_j x_{itj} + \sum_{k \ne j} \beta_k x_{itk}$, we may interpret
$\beta_j$ = ln(odds when $x_{itj} = 1$) $-$ ln(odds when $x_{itj} = 0$).
Thus, $e^{\beta_j}$ = (odds when $x_{itj} = 1$) / (odds when $x_{itj} = 0$), the ratio of the two odds.
To illustrate, if $\beta_j = 0.693$, then $\exp(\beta_j) = 2$.
–The odds (for y = 1) are twice as great for $x_j = 1$ as for $x_j = 0$.

More parameter interpretation
Similarly, assuming that the jth explanatory variable is continuous, we have
$\beta_j = \dfrac{\partial}{\partial x_{itj}}\, x_{it}'\beta = \dfrac{\partial}{\partial x_{itj}} \ln\!\left(\dfrac{p_{it}}{1 - p_{it}}\right)$.
Thus, we may interpret $\beta_j$ as the proportional change in the odds ratio per unit change in $x_{itj}$, known as an elasticity in economics.

Parameter estimation
The customary estimation method is maximum likelihood. The log likelihood of a single observation is
$y_{it} \ln \pi(x_{it}'\beta) + (1 - y_{it}) \ln(1 - \pi(x_{it}'\beta))$.
The log likelihood of the data set is
$L(\beta) = \sum_{it} \left\{ y_{it} \ln \pi(x_{it}'\beta) + (1 - y_{it}) \ln(1 - \pi(x_{it}'\beta)) \right\}$.
Taking partial derivatives with respect to $\beta$ yields the score equations
$\dfrac{\partial}{\partial \beta} L(\beta) = \sum_{it} x_{it}\, \dfrac{\pi'(x_{it}'\beta)}{\pi(x_{it}'\beta)(1 - \pi(x_{it}'\beta))}\, \bigl(y_{it} - \pi(x_{it}'\beta)\bigr) = 0$.
–The solution of these equations, say $b_{MLE}$, yields the maximum likelihood estimate.
The score equations can also be expressed as a generalized estimating equation:
$0 = \sum_i \mathbf{G}_i\, \mathbf{V}_i^{-1} (\mathbf{y}_i - \mathbf{p}_i)$,
where $\mathbf{p}_i = (p_{i1}, \ldots, p_{iT_i})'$ is the vector of means, $\mathbf{G}_i = \partial \mathbf{p}_i'/\partial \beta$ is the $K \times T_i$ matrix of derivatives, and $\mathbf{V}_i$ = diag$(p_{it}(1 - p_{it}))$ is the variance matrix of $\mathbf{y}_i$.

For the logit function
For the logit case, $\pi'(z) = \pi(z)(1 - \pi(z))$, so the score equations simplify. The normal equations are:
$\sum_{it} x_{it}\, \bigl(y_{it} - \pi(x_{it}'\beta)\bigr) = 0$.
–The solution depends on the responses $y_{it}$ only through the vector of statistics $\sum_{it} x_{it}\, y_{it}$.
The solution of these equations, say $b_{MLE}$, yields the maximum likelihood estimate.
This method can be extended to provide standard errors for the estimates.
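To make this concrete, here is a minimal sketch in Python of solving the normal equations by Newton-Raphson (simulated data; all names are my own, not from the text). The update weights by the logit variances $p(1-p)$, and the final information matrix also yields model-based standard errors:

import numpy as np

rng = np.random.default_rng(0)
n_obs, K = 500, 3
X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, K - 1))])
beta_true = np.array([-0.5, 1.0, -1.0])          # hypothetical parameters
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

b = np.zeros(K)
for _ in range(25):
    p_hat = 1.0 / (1.0 + np.exp(-X @ b))
    score = X.T @ (y - p_hat)                    # the logit normal equations
    info = (X * (p_hat * (1.0 - p_hat))[:, None]).T @ X   # information X'WX
    step = np.linalg.solve(info, score)          # Newton-Raphson step
    b += step
    if np.max(np.abs(step)) < 1e-10:
        break

se = np.sqrt(np.diag(np.linalg.inv(info)))       # model-based standard errors
print("b_MLE:", b.round(3), " se:", se.round(3))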

9.2 Random effects models
We accommodate heterogeneity by incorporating subject-specific intercepts of the form $p_{it} = \pi(\alpha_i + x_{it}'\beta)$.
–We assume that the intercepts are realizations of random variables from a common distribution.
We estimate the parameters of the $\{\alpha_i\}$ distribution and the K slope parameters $\beta$.
By using the random effects specification, we dramatically reduce the number of parameters to be estimated compared to the Section 9.3 fixed effects set-up.
–This is similar to the linear model case.
This model is computationally difficult to evaluate.

Commonly used distributions
We assume that the subject-specific effects are independent and come from a common distribution.
–It is customary to assume that the subject-specific effects are normally distributed.
We assume that, conditional on the subject-specific effects, the responses are independent. Thus, there is no serial correlation.
There are two commonly used specifications of the conditional distributions in the random effects panel data model:
–1. A logistic model for the conditional distribution of a response. That is,
Prob$(y_{it} = 1 \mid \alpha_i) = \dfrac{\exp(\alpha_i + x_{it}'\beta)}{1 + \exp(\alpha_i + x_{it}'\beta)}$.
–2. A normal model for the conditional distribution of a response. That is,
Prob$(y_{it} = 1 \mid \alpha_i) = \Phi(\alpha_i + x_{it}'\beta)$,
–where $\Phi$ is the standard normal distribution function.

Likelihood
Let Prob$(y_{it} = 1 \mid \alpha_i) = \pi(\alpha_i + x_{it}'\beta)$ denote the conditional probability, for both the logistic and normal models.
Conditional on $\alpha_i$, the likelihood for the itth observation is:
$\pi(\alpha_i + x_{it}'\beta)^{y_{it}}\, \bigl(1 - \pi(\alpha_i + x_{it}'\beta)\bigr)^{1 - y_{it}}$.
Conditional on $\alpha_i$, the likelihood for the ith subject is:
$\prod_t \pi(\alpha_i + x_{it}'\beta)^{y_{it}}\, \bigl(1 - \pi(\alpha_i + x_{it}'\beta)\bigr)^{1 - y_{it}}$.
Thus, the (unconditional) likelihood for the ith subject is:
$l_i = \displaystyle\int \prod_t \pi(a + x_{it}'\beta)^{y_{it}}\, \bigl(1 - \pi(a + x_{it}'\beta)\bigr)^{1 - y_{it}}\, \frac{1}{\sigma_\alpha}\,\phi\!\left(\frac{a}{\sigma_\alpha}\right) da$.
–Here, $\phi$ is the standard normal density function.
Hence, the total log-likelihood is $\sum_i \ln l_i$.
–Note: lots of evaluations of a numerical integral….
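The integral for $l_i$ is usually evaluated by Gauss-Hermite quadrature. A minimal sketch in Python for one subject under the logit conditional model (my own illustration; the function name and sample data are hypothetical):

import numpy as np

def subject_loglik(y_i, X_i, beta, sigma_alpha, n_nodes=20):
    """ln l_i for one subject, integrating out alpha_i ~ N(0, sigma_alpha^2)."""
    # Gauss-Hermite: int f(a) phi(a/s)/s da ~ sum_k (w_k/sqrt(pi)) f(sqrt(2)*s*z_k)
    z, w = np.polynomial.hermite.hermgauss(n_nodes)
    a = np.sqrt(2.0) * sigma_alpha * z            # nodes on the alpha_i scale
    eta = a[:, None] + X_i @ beta                 # n_nodes x T_i linear predictors
    p = 1.0 / (1.0 + np.exp(-eta))                # logit conditional probabilities
    lik_t = np.where(y_i == 1, p, 1.0 - p)        # conditional likelihood, each t
    return np.log((w / np.sqrt(np.pi)) @ lik_t.prod(axis=1))

# Hypothetical subject with T_i = 4 observations and K = 2 covariates.
X_i = np.array([[1.0, 0.2], [1.0, -0.1], [1.0, 0.5], [1.0, 1.2]])
y_i = np.array([0, 1, 1, 1])
print(subject_loglik(y_i, X_i, beta=np.array([-0.3, 0.8]), sigma_alpha=1.0))

Summing subject_loglik over i gives the total log-likelihood; maximizing over $(\beta, \sigma_\alpha)$ re-evaluates all n integrals at every trial parameter value, which is the computational burden noted above.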

Comparing logit to probit specification
There are no important advantages or disadvantages when choosing the conditional probability $\pi$ to be:
–the logit function (logit model)
–the standard normal distribution function (probit model)
The likelihood involves roughly the same amount of work to evaluate and maximize, although the logit function is slightly easier to evaluate than the standard normal distribution function.
The probit model is slightly easier to interpret because unconditional probabilities can be expressed in terms of the standard normal distribution function. That is, with $\alpha_i \sim N(0, \sigma_\alpha^2)$,
Prob$(y_{it} = 1) = \Phi\!\left(\dfrac{x_{it}'\beta}{\sqrt{1 + \sigma_\alpha^2}}\right)$.

9.3 Fixed effects models
As with homogeneous models, we express the probability of the response being 1 as a nonlinear function of linear combinations of explanatory variables.
To accommodate heterogeneity, we incorporate subject-specific terms of the form $p_{it} = \pi(\alpha_i + x_{it}'\beta)$.
–Here, the subject-specific effects account only for the intercepts and do not include other variables.
–We assume that the $\{\alpha_i\}$ are fixed effects in this section.
In this chapter, we assume that responses are serially uncorrelated.
Important point: binary panel data models with subject dummy variables provide inconsistent parameter estimates….

Maximum likelihood estimation
Unlike random effects models, maximum likelihood estimators are inconsistent in fixed effects models.
–The log likelihood of the data set is
$L = \sum_{it} \left\{ y_{it} \ln \pi(\alpha_i + x_{it}'\beta) + (1 - y_{it}) \ln\bigl(1 - \pi(\alpha_i + x_{it}'\beta)\bigr) \right\}$.
–This log likelihood can still be maximized to yield maximum likelihood estimators.
–However, as the number of subjects n tends to infinity, the number of parameters also tends to infinity.
Intuitively, our ability to estimate $\beta$ is corrupted by our inability to estimate the subject-specific effects $\{\alpha_i\}$ consistently.
–In the linear case, the maximum likelihood estimates are equivalent to the least squares estimates, and the least squares estimates of $\beta$ were consistent: the least squares procedure "swept out" the intercept estimators when producing estimates of $\beta$.

Maximum likelihood estimation is inconsistent
Example 9.2 (Chamberlain, 1978; Hsiao, 1986).
–Let $T_i = 2$, $K = 1$, and $x_{i1} = 0$ and $x_{i2} = 1$.
–Take derivatives of the likelihood function to get the score functions; these are in display (9.8).
–From (9.8), the score functions are
$\dfrac{\partial L}{\partial \alpha_i} = \sum_t \bigl(y_{it} - \pi(\alpha_i + x_{it}\beta)\bigr) = 0$, for each i,
–and
$\dfrac{\partial L}{\partial \beta} = \sum_i x_{i2}\, \bigl(y_{i2} - \pi(\alpha_i + \beta)\bigr) = \sum_i \bigl(y_{i2} - \pi(\alpha_i + \beta)\bigr) = 0$.
–Appendix 9A.1: maximize this to get $b_{mle}$, and show that the probability limit of $b_{mle}$ is $2\beta$, hence an inconsistent estimator of $\beta$.
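A small simulation makes the result visible. The sketch below (my own, in Python) uses the closed forms known for this two-period design: the conditional MLE is $\ln(n_{01}/n_{10})$, and the unconditional fixed-effects MLE is exactly twice that, so the latter converges to $2\beta$:

import numpy as np

rng = np.random.default_rng(1)
n, beta = 200_000, 0.75
alpha = rng.normal(0.0, 1.0, size=n)              # fixed effects, one per subject

def draw(eta):
    """Bernoulli draws with logit probabilities."""
    return rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

y1 = draw(alpha)                                   # period 1: x_i1 = 0
y2 = draw(alpha + beta)                            # period 2: x_i2 = 1

n01 = np.sum((y1 == 0) & (y2 == 1))                # switchers 0 -> 1
n10 = np.sum((y1 == 1) & (y2 == 0))                # switchers 1 -> 0
b_cond = np.log(n01 / n10)                         # conditional MLE: consistent
b_mle = 2.0 * b_cond                               # unconditional MLE, this design
print(f"beta={beta}  b_cond={b_cond:.3f}  b_mle={b_mle:.3f}")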

Conditional maximum likelihood estimation
This estimation technique provides consistent estimates of the beta coefficients.
–It is due to Chamberlain (1980), in the context of fixed effects panel data models.
Let's consider the logit specification of $\pi$, so that
Prob$(y_{it} = 1) = \dfrac{\exp(\alpha_i + x_{it}'\beta)}{1 + \exp(\alpha_i + x_{it}'\beta)}$.
Big idea: with this specification, it turns out that $\sum_t y_{it}$ is a sufficient statistic for $\alpha_i$.
–Thus, if we condition on $\sum_t y_{it}$, then the distribution of the responses will not depend on $\alpha_i$.

Example of the sufficiency
To illustrate how to separate the intercept from the slope effects, consider the case $T_i = 2$.
–Suppose that the sum, $\sum_t y_{it} = y_{i1} + y_{i2}$, equals either 0 or 2.
If the sum equals 0, then Prob$(y_{i1} = 0, y_{i2} = 0 \mid y_{i1} + y_{i2} = 0) = 1$.
If the sum equals 2, then Prob$(y_{i1} = 1, y_{i2} = 1 \mid y_{i1} + y_{i2} = 2) = 1$.
Neither conditional probability depends on $\alpha_i$. Both conditional events are certain and contribute nothing to a conditional likelihood.
–If the sum equals 1,
Prob$(y_{i1} = 0, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) = \dfrac{\exp(x_{i2}'\beta)}{\exp(x_{i1}'\beta) + \exp(x_{i2}'\beta)}$.

Example of the sufficiency
Thus,
Prob$(y_{i1} = 0, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) = \dfrac{\exp\bigl((x_{i2} - x_{i1})'\beta\bigr)}{1 + \exp\bigl((x_{i2} - x_{i1})'\beta\bigr)}$.
This does not depend on $\alpha_i$.
–Note that if an explanatory variable $x_{ij}$ is time-constant ($x_{ij2} = x_{ij1}$), then the corresponding parameter $\beta_j$ disappears from the conditional likelihood.

Conditional likelihood estimation
Let $S_i$ be the random variable representing $\sum_t y_{it}$ and let $sum_i$ be its realization. The conditional likelihood of the data set is
$\prod_{i=1}^{n} \dfrac{\prod_t \pi(\alpha_i + x_{it}'\beta)^{y_{it}} \bigl(1 - \pi(\alpha_i + x_{it}'\beta)\bigr)^{1 - y_{it}}}{\text{Prob}(S_i = sum_i)}$,
which does not depend on the $\{\alpha_i\}$ in the logit case.
–Note that each ratio equals one when $sum_i$ equals 0 or $T_i$.
–The distribution of $S_i$ is messy and difficult to compute for moderate-sized data sets with T more than 10.
This provides a fix for the problem of "infinitely many nuisance parameters."
–However, it is computationally difficult, hard to extend to more complex models, and hard to explain to consumers. A sketch for the $T_i = 2$ case follows.
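As referenced above, a minimal Python sketch of the $T_i = 2$ conditional contribution (a hypothetical helper, my own illustration): only "switchers" contribute, and the contribution depends on the within-subject covariate difference, so the intercept and time-constant columns cancel.

import numpy as np

def cond_loglik_pair(y_i, x_i1, x_i2, beta):
    """Conditional log-likelihood contribution of one subject with T_i = 2."""
    if y_i[0] + y_i[1] in (0, 2):      # conditionally certain: contributes zero
        return 0.0
    d = (x_i2 - x_i1) @ beta           # alpha_i and time-constant columns cancel
    # Prob(y_i1=0, y_i2=1 | sum=1) = exp(d) / (1 + exp(d))
    return d - np.log1p(np.exp(d)) if y_i[1] == 1 else -np.log1p(np.exp(d))

# Hypothetical subject: the intercept column (first entry) drops out of d.
print(cond_loglik_pair(np.array([0, 1]),
                       np.array([1.0, 0.2]), np.array([1.0, 0.9]),
                       beta=np.array([0.5, 1.0])))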

9.4 Marginal models and GEE
Marginal models, also known as "population-averaged" models, only require specification of the first two moments:
–means, variances and covariances
–not a true probability model
–ideal for moment estimation (GEE, GMM)
Begin in the context of the random effects binary dependent variable model:
–The mean is E $y_{it} = p_{it}$.
–The variance is Var $y_{it} = p_{it}(1 - p_{it})$.
–The covariance is Cov$(y_{ir}, y_{is})$, $r \ne s$, induced by the common random effect.

GEE – generalized estimating equations
This is a method of moments procedure.
–It is essentially the same as the generalized method of moments.
–One matches theoretical moments to sample moments, with appropriate weighting.
Idea – find the values of the parameters that satisfy
$0 = \sum_i \mathbf{G}_i\, \mathbf{V}_i^{-1} (\mathbf{y}_i - \mathbf{p}_i)$.
–We have already specified the variance matrix $\mathbf{V}_i$.
–We also use a $K \times T_i$ matrix of derivatives, $\mathbf{G}_i = \partial \mathbf{p}_i'/\partial \beta$.
–For binary variables, we have columns $\partial p_{it}/\partial \beta = \pi'(x_{it}'\beta)\, x_{it}$, so each residual $y_{it} - p_{it}$ is weighted by the sensitivity of its mean.

Marginal Model
Choose the mean function to be $p_{it} = \Phi(x_{it}'\beta)$.
–Motivated by the probit specification.
For the variance function, consider Var $y_{it} = \phi\, p_{it}(1 - p_{it})$, where $\phi$ is a dispersion parameter.
Let Corr$(y_{ir}, y_{is})$ denote the correlation between $y_{ir}$ and $y_{is}$.
–This is known as a working correlation.
Use the exchangeable correlation structure, specified as
Corr$(y_{ir}, y_{is}) = \rho$, for $r \ne s$.
Here, the motivation is that the latent variable $\alpha_i$ is common to all observations within a subject, thus inducing a common correlation.
The parameters $\tau = (\rho, \phi)$ constitute the variance components.
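Putting the pieces together, here is a minimal GEE sketch in Python (simulated data; gee_update and the other names are my own, not a library's API). It iterates the estimating equation under the probit mean, Bernoulli variances, and exchangeable working correlation; the dispersion $\phi$ scales $\mathbf{V}_i$ and cancels from the solution, so it is omitted:

import numpy as np
from scipy.stats import norm

def gee_update(beta, y, X, rho):
    """One Fisher-scoring step solving 0 = sum_i G_i V_i^{-1} (y_i - p_i)."""
    K = X.shape[-1]
    lhs, rhs = np.zeros((K, K)), np.zeros(K)
    for y_i, X_i in zip(y, X):                     # loop over subjects
        eta = X_i @ beta
        p_i = norm.cdf(eta)                        # probit means
        G_i = norm.pdf(eta)[:, None] * X_i         # dp_i/dbeta', T_i x K
        s = np.sqrt(p_i * (1.0 - p_i))             # Bernoulli standard deviations
        T_i = len(y_i)
        R = np.full((T_i, T_i), rho) + (1.0 - rho) * np.eye(T_i)
        Vinv = np.linalg.inv(np.outer(s, s) * R)   # inverse working covariance
        lhs += G_i.T @ Vinv @ G_i
        rhs += G_i.T @ Vinv @ (y_i - p_i)
    return beta + np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(2)
n, T, K = 200, 4, 2
X = np.concatenate([np.ones((n, T, 1)), rng.normal(size=(n, T, K - 1))], axis=2)
y = rng.binomial(1, norm.cdf(X @ np.array([-0.2, 0.7])))
beta = np.zeros(K)
for _ in range(10):                                # iterate to convergence
    beta = gee_update(beta, y, X, rho=0.3)
print("GEE estimate:", beta.round(3))

Note that the data here are generated without within-subject correlation, so rho = 0.3 is deliberately a "working" (misspecified) choice; the GEE estimate remains consistent regardless.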

Robust Standard Errors
Model-based standard errors are taken from the square roots of the diagonal elements of
$\left( \sum_i \mathbf{G}_i \mathbf{V}_i^{-1} \mathbf{G}_i' \right)^{-1}$.
As an alternative, robust or empirical standard errors are from
$\left( \sum_i \mathbf{G}_i \mathbf{V}_i^{-1} \mathbf{G}_i' \right)^{-1} \left( \sum_i \mathbf{G}_i \mathbf{V}_i^{-1} (\mathbf{y}_i - \mathbf{p}_i)(\mathbf{y}_i - \mathbf{p}_i)' \mathbf{V}_i^{-1} \mathbf{G}_i' \right) \left( \sum_i \mathbf{G}_i \mathbf{V}_i^{-1} \mathbf{G}_i' \right)^{-1}$.
These are robust to misspecified heteroscedasticity as well as time series correlation.
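Continuing the GEE sketch above, both covariance estimates share the same "bread," $(\sum_i \mathbf{G}_i \mathbf{V}_i^{-1} \mathbf{G}_i')^{-1}$; the robust version wraps it around an empirical "meat" built from subject-level residuals (again my own illustration, reusing beta, y, X, and rho from the previous block):

import numpy as np
from scipy.stats import norm

def gee_covariances(beta, y, X, rho):
    """Model-based and robust (sandwich) standard errors for a GEE fit."""
    K = X.shape[-1]
    bread, meat = np.zeros((K, K)), np.zeros((K, K))
    for y_i, X_i in zip(y, X):
        eta = X_i @ beta
        p_i = norm.cdf(eta)
        G_i = norm.pdf(eta)[:, None] * X_i
        s = np.sqrt(p_i * (1.0 - p_i))
        T_i = len(y_i)
        R = np.full((T_i, T_i), rho) + (1.0 - rho) * np.eye(T_i)
        Vinv = np.linalg.inv(np.outer(s, s) * R)
        u_i = G_i.T @ Vinv @ (y_i - p_i)           # subject-level score
        bread += G_i.T @ Vinv @ G_i
        meat += np.outer(u_i, u_i)                 # empirical variability
    model_based = np.linalg.inv(bread)
    robust = model_based @ meat @ model_based      # the sandwich form
    return np.sqrt(np.diag(model_based)), np.sqrt(np.diag(robust))

# Usage with the previous block's fit:
# se_model, se_robust = gee_covariances(beta, y, X, rho=0.3)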