

More complex event history analysis

[Figure: Unemployment and returning to work study. Timeline from start of study (0) to end of study, with a change of state at t1 (0 = unemployed; 1 = working). Label: spell or episode.]

[Figure: Unemployment and returning to work study. Timeline from start of study (0) to end of study, with changes of state at t1, t2 and t3 (0 = unemployed; 1 = working).]

[Figure: Unemployment and returning to work study. Timeline from start of study (0) to end of study, with a move from one state to the other at t1 (0 = unemployed; 1 = working).] Transition = movement from one state to another.

Recurrent events are merely outcomes that can take place on a number of occasions. A simple example is unemployment measured month by month. In any given month an individual can either be employed or unemployed. If we had data for a calendar year we would have twelve discrete outcome measures (i.e. one for each month).
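The month-by-month idea can be sketched as a data structure. Each month becomes its own record, so one person contributes twelve binary outcomes. This is a minimal sketch: the helper name `to_person_months`, the person id and the twelve status values are all hypothetical.

```python
# Hypothetical example: one person's status in each month of a calendar year
# (0 = unemployed, 1 = working). The values are invented for illustration.
def to_person_months(person_id, monthly_status):
    """Turn one person's sequence of monthly outcomes into one record per month."""
    return [
        {"id": person_id, "month": month, "y": status}
        for month, status in enumerate(monthly_status, start=1)
    ]

year = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1]
records = to_person_months(101, year)
print(len(records))  # 12 outcome measures, one per month
print(records[3])    # {'id': 101, 'month': 4, 'y': 1}
```

Stacking such records for many people gives the "long" layout that recurrent-events models are usually fitted to.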

Social scientists now routinely employ statistical models for the analysis of discrete data, most notably logistic and log-linear models, in a wide variety of substantive areas. I believe that the adoption of a recurrent events approach is appealing because it is a logical extension of these models.

Consider a binary outcome, or two-state event:
0 = event has not occurred
1 = event has occurred
In the cross-sectional situation we are used to modelling this with logistic regression.

Unemployment and returning to work study: a study over six months (0 = unemployed; 1 = working)

Example six-month sequences (one outcome per month):
- Constantly unemployed
- Constantly employed
- Employed in month 1, then unemployed
- Unemployed, but gets a job in month 6
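The four patterns above can be written as 0/1 sequences. From these we can count transitions (moves from one state to the other) and spells (uninterrupted runs in one state). This is a sketch: the 0/1 encoding is our reading of the pattern labels, and the helper names are hypothetical.

```python
# The four six-month patterns, encoded as 0/1 (0 = unemployed, 1 = working).
patterns = {
    "constantly unemployed": [0, 0, 0, 0, 0, 0],
    "constantly employed": [1, 1, 1, 1, 1, 1],
    "employed in month 1, then unemployed": [1, 0, 0, 0, 0, 0],
    "unemployed, gets a job in month 6": [0, 0, 0, 0, 0, 1],
}

def count_transitions(seq):
    """Number of moves from one state to the other between consecutive months."""
    return sum(1 for prev, cur in zip(seq, seq[1:]) if prev != cur)

def count_spells(seq):
    """Number of uninterrupted runs (spells or episodes) spent in a single state."""
    return count_transitions(seq) + 1 if seq else 0

for label, seq in patterns.items():
    print(f"{label}: {count_transitions(seq)} transition(s), {count_spells(seq)} spell(s)")
```

Each person still contributes six binary outcomes, but the outcomes are clearly not unrelated draws. That is the issue the next question raises.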

Here we have a binary outcome, so could we simply use logistic regression to model it? Yes and no: we need to think about this issue.

Appropriate software: Statistical Analysis for Binary Recurrent Events (SABRE). SABRE fits appropriate models for recurrent events. It is like GLIM. It can be downloaded free.

SABRE fits two models that are appropriate to this analysis. Model 1: the pooled cross-sectional logit model. Think of this as the same as a logistic regression in any software package.

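Pooled cross-sectional logit model:
$$\Pr(y_{it} = 1 \mid x_{it}) = \frac{\exp(x_{it}'\beta)}{1 + \exp(x_{it}'\beta)}$$
where $y_{it}$ is the outcome for case $i$ at time $t$, $x_{it}$ is a vector of explanatory variables and $\beta$ is a vector of parameters to be estimated.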

We could fit a pooled cross-sectional model to our recurrent events data. This approach can be regarded as a naïve solution to our data analysis problem.
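The naïve pooled model can be sketched as an ordinary logistic regression. Every person-month is treated as an independent observation. This is a minimal Newton-Raphson sketch with one binary explanatory variable. The data are hypothetical: $x = 1$ if the husband is unemployed, $y = 1$ if the wife is employed. The function name `fit_pooled_logit` is illustrative and is not a SABRE command.

```python
import math

def fit_pooled_logit(rows, n_iter=25):
    """Newton-Raphson for logit P(y=1) = b0 + b1*x, treating every row as independent.

    rows: list of (x, y) pairs, one per person-month, pooled across people.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, with I the 2x2 information matrix.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Hypothetical pooled person-months: 7/10 wives employed when x = 0, 3/10 when x = 1.
rows = [(0, 1)] * 7 + [(0, 0)] * 3 + [(1, 1)] * 3 + [(1, 0)] * 7
b0, b1 = fit_pooled_logit(rows)
print(round(b0, 3), round(b1, 3))  # 0.847 -1.695
```

With a single binary covariate, the fit reproduces the two group proportions: $b_0 = \log(7/3)$ and $b_1 = \log(3/7) - \log(7/3)$. The fitting routine never looks at which rows belong to the same person. This is exactly the assumption questioned next.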

We need to consider a number of issues.

[Table: outcomes Y1 and Y2 in consecutive months for each observation.] Pickles' tip: in repeated measures analysis we would need something like a 'paired' t test rather than an 'independent' t test, because we can assume that Y1 and Y2 are related.
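The paired-versus-independent point can be seen numerically. This sketch uses invented measurements $Y_1$, $Y_2$ on the same five individuals. Both are continuous, so it is only an analogy to the binary case. Ignoring the pairing buries a consistent within-person change under between-person variation.

```python
import math
import statistics

# Hypothetical paired measurements on the same five individuals.
y1 = [3.0, 5.0, 7.0, 9.0, 11.0]
y2 = [4.0, 6.0, 8.0, 10.0, 12.5]

def independent_t(a, b):
    """Welch-style t statistic that treats a and b as unrelated samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

def paired_t(a, b):
    """t statistic on within-person differences, which respects the pairing."""
    d = [bi - ai for ai, bi in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

print(round(independent_t(y1, y2), 2))  # about 0.54
print(round(paired_t(y1, y2), 2))       # about 11.0
```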

SABRE fits two models that are appropriate to this analysis. Model 2: the random effects model (or logistic mixture model).

Repeated measures data violate an important assumption of conventional regression models: the responses of an individual at different points in time will not be independent of each other. This problem can be overcome by including an additional, individual-specific error term.

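The random effects model extends the pooled cross-sectional model to include a case-specific random error term, $\varepsilon_i$, to account for residual heterogeneity. For a sequence of outcomes for the $i$th case, the basic random effects model has the integrated (or marginal) likelihood
$$L_i = \int_{-\infty}^{\infty} \prod_{t} \mu_{it}^{\,y_{it}} \,(1 - \mu_{it})^{1 - y_{it}} \, f(\varepsilon)\, d\varepsilon, \qquad \mu_{it} = \frac{\exp(x_{it}'\beta + \varepsilon)}{1 + \exp(x_{it}'\beta + \varepsilon)},$$
where $f(\varepsilon)$ is the distribution of the random effect, usually taken to be normal with mean 0 and standard deviation $\sigma$ (the scale parameter in the SABRE output).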
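One case's contribution to the random effects likelihood can be computed numerically. Multiply the logit probabilities of the observed sequence for a given value of the case-specific effect. Then average over a normal distribution of that effect. This sketch assumes a normal mixing distribution with standard deviation `sigma` and uses a plain midpoint rule. Packages such as SABRE use Gaussian quadrature instead. The function names and the grid settings are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_likelihood(y, eta, sigma, n_grid=2001, width=8.0):
    """Integrated likelihood of one case's 0/1 sequence under a random-intercept logit.

    y: 0/1 outcomes for case i; eta: linear predictors x_it'beta (same length).
    eps ~ N(0, sigma^2) is integrated out with a midpoint rule on [-width*sigma, width*sigma].
    sigma = 0 reduces to the pooled (independent) likelihood.
    """
    if sigma == 0:
        return math.prod(logistic(e) if yt else 1.0 - logistic(e) for yt, e in zip(y, eta))
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        eps = lo + (k + 0.5) * h
        density = math.exp(-0.5 * (eps / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        cond = 1.0  # likelihood of the whole sequence given this value of eps
        for yt, e in zip(y, eta):
            p = logistic(e + eps)
            cond *= p if yt else 1.0 - p
        total += cond * density * h
    return total

stayer = [0, 0, 0, 0, 0, 0]   # "constantly unemployed"
eta = [0.0] * 6               # hypothetical: every x_it'beta equal to 0
print(marginal_likelihood(stayer, eta, 0.0))  # 0.015625, i.e. 0.5 ** 6
print(marginal_likelihood(stayer, eta, 2.0))  # much larger than 0.015625
```

A non-zero `sigma` makes long runs of the same outcome far more probable than the pooled model allows. That is one way residual heterogeneity shows up in recurrent events data.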

Davies and Pickles (1985) have demonstrated that failure to model the effects of residual heterogeneity explicitly may cause severe bias in parameter estimates. With longitudinal data, the effects of omitted explanatory variables can be overtly accounted for within the statistical model. This greatly improves the accuracy of the estimated effects of the explanatory variables.

An example: see Davies, Elias & Penn (1992), a study of wives' employment status.
Y (femp): 0 = wife unemployed; 1 = wife employed
X1 (fmune): 0 = husband employed; 1 = husband unemployed
X2 (fund1): 0 = no child under 1 year; 1 = child under 1 year

Results of various models:
Model | X vars | Deviance | d.f.
Pooled | none | |
Pooled | fmune | |
Pooled | fmune + fund1 | |
Random effects | fmune + fund1 | |

Random effects model: deviance = [value missing] on 1576 residual degrees of freedom.
Parameter | Estimate | S. Error
int | |
fmune (1) | 0 | ALIASED [I]
fmune (2) | |
fund1 (1) | 0 | ALIASED [I]
fund1 (2) | |
scale (random effect) | |

[Figure: State dependence. Past behaviour leads to current behaviour.]

[Figure: State dependence. Employment state (unemployed or employed) in April carried forward to May.]

Lag model: [Table: outcomes Y1 and Y2 in consecutive months for each observation.]

Accounts for the previous outcome ($y_{t-1}$).

This is called a lagged model. A lagged model helps to control for a previous outcome (or behaviour).
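Building the lagged explanatory variable is a data step. Each month from the second onwards gets the previous month's outcome attached. The first month has no predecessor and drops out. A lag model can therefore only be compared with a model fitted to the same reduced set of observations. A common way to write the model is $\text{logit}\,\Pr(y_{it}=1) = x_{it}'\beta + \gamma\, y_{i,t-1} + \varepsilon_i$, which we assume here for illustration. The helper `add_lag` is hypothetical.

```python
def add_lag(person_months):
    """Attach y_{t-1} to each month and drop the first month, which has no lag.

    person_months: 0/1 outcomes for one person, in time order (month 1 first).
    Returns (t, y_t, y_lag) tuples for t = 2, ..., T.
    """
    return [
        (t, y, person_months[t - 2])  # person_months[t - 2] is month t - 1
        for t, y in enumerate(person_months, start=1)
        if t > 1
    ]

print(add_lag([0, 0, 1, 1, 0, 1]))
# [(2, 0, 0), (3, 1, 0), (4, 1, 1), (5, 0, 1), (6, 1, 0)]
```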

Results of models with state dependence:
Model | X vars | Deviance | d.f.
Random effects | fmune + fund1 | |
Drop y | fmune + fund1 | |
Lag | fmune + fund1 | |

Lag model: deviance = [value missing] on 1420 residual degrees of freedom; deviance decrease = [value missing] on 1 residual degree of freedom.
Parameter | Estimate | S. Error
int | |
fmune (1) | 0 | ALIASED [I]
fmune (2) | |
fund1 (1) | 0 | ALIASED [I]
fund1 (2) | |
lag | |
scale | |
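A deviance decrease between nested models is usually referred to a chi-square distribution, with degrees of freedom equal to the number of extra parameters. This is a stdlib-only sketch of that likelihood ratio test. The helper names are hypothetical. No deviance values from the study are used, because they are not shown here. When the test is whether a random effect's scale is zero, the null value lies on the boundary of the parameter space, and the plain chi-square p-value is then conservative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus sum_{j=1}^{(df-1)/2} x^(j-1/2) / (2j-1)!!
    s, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        s += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * s

def deviance_test(deviance_simpler, deviance_richer, df_difference):
    """Return (deviance decrease, p-value) for a richer model nested in a simpler one."""
    decrease = deviance_simpler - deviance_richer
    return decrease, chi2_sf(decrease, df_difference)

print(round(chi2_sf(3.841458820694124, 1), 6))  # 0.05, the usual 5% cut-off on 1 d.f.
```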

State dependence can be explored further by estimating a two-state Markov model.

[Figure: Transition from the April state (unemployed or employed) to May, with explanatory variables entering separately for each April state.] The model provides two sets of estimates.
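The two sets of estimates come from modelling the current outcome separately by previous state. The simplest, covariate-free version just estimates two rows of transition probabilities. The Markov model here adds explanatory variables, and a scale parameter, to each of the two equations. This is a minimal sketch; the helper name is hypothetical. The example reuses the four six-month patterns from earlier, encoded as 0/1.

```python
from collections import Counter

def transition_probabilities(sequences):
    """Estimate P(state at t | state at t-1) for a two-state chain, pooling people.

    sequences: iterable of 0/1 lists (0 = unemployed, 1 = working).
    Returns {(prev, cur): probability}; one row per previous state.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))  # count each (previous, current) pair
    probs = {}
    for prev in (0, 1):
        n_prev = counts[(prev, 0)] + counts[(prev, 1)]
        for cur in (0, 1):
            probs[(prev, cur)] = counts[(prev, cur)] / n_prev if n_prev else float("nan")
    return probs

sequences = [[0] * 6, [1] * 6, [1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]]
p = transition_probabilities(sequences)
print(p[(0, 1)], p[(1, 0)])  # 1/14 leave unemployment, 1/6 leave employment
```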

Results of models with state dependence:
Model | X vars | Deviance | d.f.
Drop y | fmune + fund1 | |
Lag | fmune + fund1 | |
Markov | fmune + fund1 | |

Markov model:
Unemployed women at t-1
Parameter | Estimate | S. Error
int | |
fmune (1) | 0 | ALIASED [I]
fmune (2) | |
fund1 (1) | 0 | ALIASED [I]
fund1 (2) | |
scale | |
Employed women at t-1
Parameter | Estimate | S. Error
int | |
fmune (1) | 0 | ALIASED [I]
fmune (2) | |
fund1 (1) | 0 | ALIASED [I]
fund1 (2) | |
scale | |

SABRE: good points
- Fits appropriate models for recurrent events.
- It is like GLIM.
- It can be downloaded free.
- There is a users' list.
- Uses the deviance to compare models (correct likelihood).
- Fits the Markov model.
- Fits a range of other models (e.g. log-linear and ordinal).
- Can do more advanced analysis (e.g. mover/stayer models).

SABRE: bad points
- It is like GLIM: you need to understand a programming syntax.
- Data management and handling are poor.
- There are few users.

Alternatives to SABRE
- Stata: does not fit the full range of models.
- Multilevel modelling software: okay up to a point, but check that the likelihood is correct (complicated).
- No software other than SABRE fits a continuation ratio model (ordinal), a Markov model or the mover/stayer model.