Single and Multiple Spell Discrete Time Hazards Models with Parametric and Non-Parametric Corrections for Unobserved Heterogeneity David K. Guilkey.



Demographic Applications:
Single Spell
1. Time until death
2. Time until retirement
3. Time until first marriage
4. Time until first birth
Multiple Spell
1. Time until birth of each child
2. Duration of each spell of employment
We will use time until first birth and the timing of subsequent births as an example throughout the presentation.

The variable of interest is: P(t ≤ T < t+n | T ≥ t)
This is the conditional probability that an individual experiences the event between t and t+n, given that she has not experienced the event before time t.
Example: The dependent variable is the timing of a first birth. Suppose the discrete time interval is a year and we observe each woman from the beginning of her childbearing years: 0 ... 1 ... 2 ... 3
Consider three cases:
Person 1: Has a birth in year 1 (time 0 may be age 12)
Person 2: Has a birth in year 2
Person 3: Still has not had a birth at the end of the observation period

Some important notes:
1. Since we are following the woman from the beginning of her childbearing years, we have eliminated the possibility of left censoring (the event occurs before the observation period).
2. Left censoring combined with unobserved heterogeneity introduces bias into the estimation results. The correction requires the estimation of an “initial conditions” equation similar to a Heckman selection equation, and such equations are well known to yield unstable parameter estimates.
3. The third person is right censored. However, right censoring is easily handled as part of the estimation process.
4. As will be seen below, the dependent variable in a discrete time hazard model is dichotomous. Probit, logit, or complementary log-log (cloglog) models can be used; logit and cloglog are most often used. I use logit since one of the software packages requires it. Results were nearly the same for cloglog in models where the software allowed both (STATA).

The model: Person 1 (birth occurs in the first interval): Which leads to:

Person 2 (No birth in the first year and a birth in the second year): Joint probability is:
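The equations for these two cases are missing from the transcript. One plausible form, assuming a logit hazard with period-specific intercepts and coefficients, is sketched below. The symbols h, α, β, X and L are assumed notation for illustration, not taken from the slides.

```latex
% Assumed notation: h_{ti} is the hazard for woman i in year t,
% X_{ti} her covariates, \alpha_t and \beta_t period-specific parameters.
\begin{align*}
h_{ti} &= P(T_i = t \mid T_i \ge t)
        = \frac{\exp(\alpha_t + X_{ti}\beta_t)}{1 + \exp(\alpha_t + X_{ti}\beta_t)}
  \quad\Longrightarrow\quad
  \log\frac{h_{ti}}{1 - h_{ti}} = \alpha_t + X_{ti}\beta_t \\
% Person 1: birth in year 1
L_1 &= h_{11} \\
% Person 2: no birth in year 1, then a birth in year 2
L_2 &= (1 - h_{12})\, h_{22}
\end{align*}
```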

Person 3: (No births in the observation period)
Estimation:
Time 1: 3 observations
Time 2: 2 observations
Time 3: 1 observation
The three sets of coefficients could be estimated in three separate logits for the set of individuals at risk. This is true since there is no unobserved heterogeneity that links the three time periods together.
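The observation counts above follow from expanding each woman's spell into one record per year at risk. A minimal Python sketch of that person-period expansion, using only the three illustrative persons (the function name, tuple layout and field names are assumptions made for this example):

```python
def expand_person_period(spells):
    """Expand spells into one row per person per period at risk.

    spells: list of (person_id, last_period_observed, event) tuples, where
    event is 1 if the birth occurred in the last period and 0 if censored.
    """
    rows = []
    for pid, last_t, event in spells:
        for t in range(1, last_t + 1):
            # The outcome is 1 only in the period in which the event occurs.
            y = 1 if (event == 1 and t == last_t) else 0
            rows.append({"id": pid, "t": t, "y": y})
    return rows


# Person 1: birth in year 1; Person 2: birth in year 2;
# Person 3: still no birth after year 3 (right censored).
spells = [(1, 1, 1), (2, 2, 1), (3, 3, 0)]
rows = expand_person_period(spells)

# Number of women still at risk (records) in each period.
at_risk = {t: sum(1 for r in rows if r["t"] == t) for t in (1, 2, 3)}
print(at_risk)
```

Each period's logit is then fitted on the records with that value of t, so the censored woman contributes three zeros and needs no special treatment.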

Duration dependence
This is a concept similar to state dependence in a standard panel data model. Duration dependence occurs when the value of the hazard at any point in time depends on the amount of time that has already elapsed. It relates to the propensity of a state towards self-perpetuation.
Examples:
Mortality – hazard increases with time regardless of the values of the other covariates
Unemployment duration – hazard of finding employment may decrease as the length of the unemployment spell increases

Modeling Duration Dependence
In our current model, duration dependence is captured by the intercept terms in the equation since they are allowed to differ at each point in time. To see this more clearly, assume that the effects of the covariates are the same at each point in time (the β’s are the same in the previous equations). Now define T_1ti = 1 if t = 1 and 0 otherwise, with T_2ti and T_3ti defined similarly.
Then we can write (no constant in the model):
This allows for a very flexible pattern of duration dependence, which can be non-linear, for example.
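With the period dummies defined above and a common β, the dummy-variable specification can be sketched as follows. The hazard symbol h and the coefficient names α_1, α_2, α_3 are assumed notation; the slide's own equation is not in the transcript.

```latex
% Assumed notation: h_{ti} is the hazard for woman i in period t.
% One coefficient per period dummy replaces the constant.
\[
\log\frac{h_{ti}}{1 - h_{ti}}
  = \alpha_1 T_{1ti} + \alpha_2 T_{2ti} + \alpha_3 T_{3ti} + X_{ti}\beta
\]
```

Because each period has its own α, the baseline hazard can rise, fall, or change direction from one period to the next without any functional-form restriction.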

A less flexible pattern that requires the estimation of fewer parameters is:
In our example, we examine the birth hazard starting all women at age 10, so age and duration dependence are not separately identified.
A parametric model which allows for non-linear duration dependence is:
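The two parametric forms are not shown in the transcript. A common pair, offered here only as an assumption about what the slide contained, is a linear and a quadratic function of elapsed time t (the symbols h and α are assumed notation):

```latex
% Assumed forms; h_{ti} is the hazard for woman i in period t.
\begin{align*}
\text{Linear:}    \quad & \log\frac{h_{ti}}{1 - h_{ti}} = \alpha_0 + \alpha_1 t + X_{ti}\beta \\
\text{Quadratic:} \quad & \log\frac{h_{ti}}{1 - h_{ti}} = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + X_{ti}\beta
\end{align*}
```

Other non-linear choices, such as a term in log t, would serve the same purpose with the same number of parameters or fewer than a full set of period dummies.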

Empirical Example
Data from the Indonesia Family Life Survey. We first examine the timing of first birth: women are followed from age 10 until first birth.
Data set up:

Simple Models (linear and non-linear duration dependence):

Non-parametric Duration Dependence (using duration or age dummies):

Duration Dependence and Unobserved Heterogeneity Review of dynamic panel data model: where we have a time varying error and a persistent error (sometimes referred to as time invariant unobserved heterogeneity) Define: Then:
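The equations for this review are missing from the transcript. The standard random-effects form, given here as an assumed reconstruction with assumed symbols (μ for the persistent error, ε for the time-varying error, u for the composite error), is:

```latex
% Assumes \mu_i and \varepsilon_{it} are independent, with
% \varepsilon_{it} uncorrelated over time.
\begin{align*}
y_{it} &= X_{it}\beta + \mu_i + \varepsilon_{it} \\
u_{it} &= \mu_i + \varepsilon_{it} \\
\operatorname{Corr}(u_{it}, u_{is}) &= \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_\varepsilon^2}
  \qquad (t \ne s)
\end{align*}
```

Under these assumptions the correlation between errors is the same at every lag, which is the signature of persistent heterogeneity.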

Alternative model (state dependence): where |α|<1
Now:
It is very difficult to distinguish between the models, so we use the hybrid model:
A problem is that this model is more difficult to estimate: neither ordinary least squares nor fixed effects methods yield consistent estimators. Use maximum likelihood (with initial conditions problem) or instrumental variables.
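The state dependence and hybrid equations are missing from the transcript. A standard formulation, stated as an assumption with assumed symbols (μ for the persistent error, ε for the time-varying error), is:

```latex
\begin{align*}
% State dependence: the lagged outcome enters directly, |\alpha| < 1.
y_{it} &= \alpha\, y_{i,t-1} + X_{it}\beta + \varepsilon_{it} \\
% Ignoring covariates, the correlation between outcomes s periods apart
% decays geometrically instead of staying constant.
\operatorname{Corr}(y_{it}, y_{i,t-s}) &= \alpha^{s} \\
% Hybrid: lagged outcome plus a persistent individual effect.
y_{it} &= \alpha\, y_{i,t-1} + X_{it}\beta + \mu_i + \varepsilon_{it}
\end{align*}
```

In the hybrid model the lagged outcome is correlated with μ_i by construction, which is why ordinary least squares is inconsistent.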

Return to first example and unobserved heterogeneity (using person 2 as the example):
Person 2 (No birth in the first year and a birth in the second year):
We can no longer estimate parameters time period by time period, due to selection on unobservables (just as in the standard Heckman selectivity model).
Joint probability is now:
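The joint probability is not shown in the transcript. Assuming the heterogeneity enters as an additive individual term μ_i in the logit index (one common choice; the symbols h, α, β and μ are assumed notation), person 2's probability conditional on μ can be sketched as:

```latex
% Assumed notation: h_{ti}(\mu) is the hazard for woman i in year t
% conditional on her unobserved heterogeneity \mu.
\begin{align*}
h_{ti}(\mu) &= \frac{\exp(\alpha_t + X_{ti}\beta_t + \mu)}{1 + \exp(\alpha_t + X_{ti}\beta_t + \mu)} \\
L_2(\mu_2) &= \bigl(1 - h_{12}(\mu_2)\bigr)\, h_{22}(\mu_2)
\end{align*}
```

The same μ appears in both periods, so the two factors no longer separate into independent period-by-period logits.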

The unconditional joint probability is:
The most commonly used distributional assumption for the unobserved heterogeneity is the normal distribution. The integral is approximated using Gauss-Hermite points and weights (simply looked up in a table for the normal distribution):
K is the number of quadrature points. Adding more points is more accurate but slower (STATA default is 12, which is frequently not enough for rare events).
Heckman-Singer approach: Do not assume a distribution. Directly estimate the points and weights as part of the maximum likelihood estimation process. This is referred to as the discrete factor approximation.
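Both approaches replace the integral over the heterogeneity with a weighted sum of conditional probabilities. They differ only in where the points and weights come from. The Python sketch below illustrates this for person 2, assuming the heterogeneity enters additively in the logit index. The index values, the standard deviation and the two mass points are made-up numbers for illustration, and only a 3-point rule is shown to keep the table short.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# 3-point Gauss-Hermite rule rescaled for a standard normal variable:
# points 0 and +/- sqrt(3), weights 2/3 and 1/6 (the weights sum to 1).
GH3_POINTS = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
GH3_WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]


def spell_prob(y, xb, mu):
    """Probability of the 0/1 sequence y given logit indices xb and heterogeneity mu."""
    p = 1.0
    for y_t, xb_t in zip(y, xb):
        h = logistic(xb_t + mu)
        p *= h if y_t == 1 else 1.0 - h
    return p


def marginal_prob(y, xb, points, weights):
    """Weighted sum of conditional probabilities over the support points."""
    return sum(w * spell_prob(y, xb, m) for m, w in zip(points, weights))


# Person 2: no birth in year 1, birth in year 2 (hypothetical index values).
y = [0, 1]
xb = [-1.0, -0.5]

# Normal heterogeneity: fixed table points scaled by a hypothetical sigma.
sigma = 0.8
normal_points = [sigma * m for m in GH3_POINTS]
p_normal = marginal_prob(y, xb, normal_points, GH3_WEIGHTS)

# Discrete factor: in estimation the points and weights are free parameters;
# here they are simply fixed at hypothetical values.
p_discrete = marginal_prob(y, xb, [-0.7, 0.9], [0.6, 0.4])

# No heterogeneity is the special case of a single point at zero.
p_no_het = spell_prob(y, xb, 0.0)
print(p_normal, p_discrete, p_no_het)
```

In a full estimator, `marginal_prob` would be evaluated for every woman and its log summed, with sigma (normal case) or the points and weights (discrete factor case) maximized along with the coefficients.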

Identification
“it is somewhat heroic to think that we can distinguish between duration dependence and unobserved heterogeneity when we only observe a single cycle for each agent” (Wooldridge – page 705)
Example: Model with no censoring estimated by OLS.
Can identify both using functional form, but the model parameter estimates are frequently unstable.

Examples Assume normality:

Cannot directly compare the coefficients with and without heterogeneity correction because of possible scale differences for discrete dependent variable models. However, scale effects can be removed if you compare ratios of coefficients:
Without unobserved heterogeneity:
With unobserved heterogeneity:
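The reason ratios are comparable can be written in one line. Suppose, as an illustrative assumption, that the two sets of estimates differ by a common unknown scale factor κ (the symbols κ and the tilde are assumed notation):

```latex
% If every coefficient is rescaled by the same factor \kappa,
% the factor cancels in any ratio of two coefficients.
\[
\tilde\beta_j = \kappa\,\beta_j \ \text{ for all } j
\quad\Longrightarrow\quad
\frac{\tilde\beta_j}{\tilde\beta_k} = \frac{\kappa\,\beta_j}{\kappa\,\beta_k} = \frac{\beta_j}{\beta_k}
\]
```

A change in a ratio between the two models therefore reflects a change in the relative effects and not just a change in scale.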

Use Discrete Factor Method

Multiple Spell Discrete Time Hazards Models
Model with no unobserved heterogeneity:
Allow for M births:
With no heterogeneity, estimate M+1 single spell hazards models (or fully interacted model).
Results for fully interacted model (m=0,1,2,3,4):
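The multiple spell equation is missing from the transcript. One way to write it, assuming m = 0, 1, ..., M indexes the number of previous births and each spell has its own logit hazard (all symbols are assumed notation), is:

```latex
% Assumed notation: h_{mti} is the hazard of the next birth for woman i
% at duration t, given m previous births; \alpha_m(t) is the
% spell-specific duration dependence.
\[
\log\frac{h_{mti}}{1 - h_{mti}} = \alpha_m(t) + X_{mti}\beta_m,
\qquad m = 0, 1, \dots, M
\]
```

Restricting α_m(t) and β_m to be equal across m gives the pooled model that uses all available births with a single set of coefficients.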

Simple Model with coefficients restricted to be the same (using all available births for all women):

Add unobserved heterogeneity to the model:
In order to use STATA, we must assume a restrictive form of unobserved heterogeneity for both parametric and non-parametric forms.
Parametric:
More flexible specification would be: where Σ is m x m
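The two specifications are not shown in the transcript. One reading, offered as an assumption, is that the restrictive form uses a single normally distributed effect shared by all spells, while the flexible form gives each spell its own effect with a free covariance matrix (the symbols h, α, β, μ and σ are assumed notation):

```latex
% Assumed notation: h_{mti} is the hazard for woman i in spell m at duration t.
\begin{align*}
\text{Restrictive:} \quad
  & \log\frac{h_{mti}}{1 - h_{mti}} = \alpha_m(t) + X_{mti}\beta_m + \mu_i,
  & \mu_i &\sim N(0, \sigma^2) \\
\text{Flexible:} \quad
  & \log\frac{h_{mti}}{1 - h_{mti}} = \alpha_m(t) + X_{mti}\beta_m + \mu_{mi},
  & (\mu_{0i}, \dots, \mu_{Mi})' &\sim N(0, \Sigma)
\end{align*}
```

The restrictive form forces the unobserved component to have the same size and sign in every birth interval, whereas Σ lets both the variances and the cross-spell correlations differ.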

Estimate assuming normally distributed unobserved heterogeneity (restrict coefficients across births):

Use the discrete factor model (restrict coefficients across births):

Normally distributed unobserved heterogeneity (unrestricted coefficients):

Discrete factor model with two points of support (unrestricted coefficients):

Add non-parametric unobserved heterogeneity with three points of support, unrestricted across equations, using Fortran:

Continued:

Can use likelihood ratio test to compare model without heterogeneity to:
1. Discrete factor model with two points of support using STATA, where we have a restricted form of heterogeneity
2. Discrete factor model with three points of support and unrestricted heterogeneity using Fortran
Tests sequentially reject the simpler models with p-values close to zero.
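The mechanics of such a comparison can be sketched in a few lines of Python. The log-likelihood values and the degrees of freedom below are hypothetical placeholders, not results from the presentation, and the helper names are assumptions. The chi-square reference distribution is itself only an approximation here, because the model without heterogeneity lies on the boundary of the parameter space.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        total = sum(y ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-y) * total
    # Odd df: start from the df = 1 tail and add the recurrence terms.
    tail = math.erfc(math.sqrt(y))
    extra = sum(
        y ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return tail + math.exp(-y) * extra


def lr_test(loglik_restricted, loglik_unrestricted, df):
    """Likelihood ratio statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: model without heterogeneity versus a
# discrete factor model that adds two parameters.
stat, p_value = lr_test(-5230.4, -5201.7, df=2)
print(stat, p_value)
```

Sequential testing repeats the same call with the two-point model as the restricted model and the three-point model as the unrestricted one, using the number of added parameters as df.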