Event History Models 2 Sociology 229A: Event History Analysis Class 4 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission.



Announcements
–Assignment 2 due
–Assignment # handed out
Agenda
–More EHA models: discrete time models; more details on Cox models & other fully parametric Proportional Hazard models
–Break
–Discussion of paper: Allison and McGinnis

Event History Example
What factors affect how soon a country passes an environmental protection law?
Event: Passing an environmental law in a given year
Risk set: All countries that have not yet passed an environmental protection law
–We decided that risk begins in 1970 (when such laws were invented)
–Countries independent after 1970 are treated as entering the analysis “late”
Option #2: Duration since independence (age)
–But that was less appropriate for the research question.

Example: Environmental Laws
Cross-national time series dataset of nearly 100 countries
Event: when a country writes its first comprehensive environmental law (e.g., EPA)
Data taken from various sources
Independent variables: GDP, population, democracy, degradation, education, domestic and international NGOs
Time duration: countries enter the “risk set” in 1970, or when they become independent
Total sample of 97 countries
73 countries have an event between 1970 and 1998.

Time-Varying Data Structure
[Table: yearly records for INDIA, one row per country-year; cell values not preserved. Variables: newname2, newid3, year, laweventnum, start, end, ss, es, pop. Slide callouts: “Example: Law written”, “Spell”, “State”, “Population”.]

Time-Varying Data Structure
[Table: yearly records for INDIA, one row per country-year; cell values not preserved. Variables: newname2, newid3, year, laweventnum, start, end, ss, es, pop.]
Stset command: stset end, failure(es==1) time0(start)
Note: It is common to drop cases that are not at risk (ex: if start state = 1), BUT it is not necessary… Stata drops cases after the event by default… unless you specify exit(time .)
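A minimal sketch, in Python rather than Stata, of how split-spell records of the kind that stset reads could be built. The names start, end, ss, and es follow the slide; the function name, the country labels, the years, and the origin argument are hypothetical and only illustrate the layout.

```python
def split_spells(unit, entry_year, last_year, event_year=None, origin=1970):
    """Build one record per year at risk for a single unit.

    Time is measured from `origin` (risk begins in 1970 on the slides), so a
    unit that enters later simply has a first `start` greater than zero.
    `event_year=None` means the unit is censored at `last_year`.
    """
    records = []
    for year in range(entry_year, last_year + 1):
        event_here = event_year is not None and year == event_year
        records.append({
            "unit": unit,
            "year": year,
            "start": year - origin,        # would feed time0(start)
            "end": year - origin + 1,      # would feed stset end
            "ss": 0,                       # start state: not yet had the event
            "es": 1 if event_here else 0,  # end state: failure(es==1)
        })
        if event_here:
            break  # single-event setup: no records after the event
    return records


# Hypothetical units: one at risk from 1970 with an event in 1974,
# one entering "late" in 1980 and never experiencing the event.
early = split_spells("COUNTRY_A", 1970, 1998, event_year=1974)
late = split_spells("COUNTRY_B", 1980, 1998)

for row in early:
    print(row)
print(len(late), "records for the late entrant, first start =", late[0]["start"])
```

Keeping one record per year is what lets covariates such as population change from row to row within the same country.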

Time-Varying Data Structure
What if countries pass multiple laws? Called “repeated events”
1. Start state could be reset to zero
2. We can override the Stata default of removing cases after the first event occurs: exit(time .)
[Table: yearly records for INDIA, one row per country-year; cell values not preserved. Variables: newname2, newid3, year, laweventnum, start, end, ss, es, pop.]

Smoothed Hazard Function West vs. non-West

EHA Models in Stata
Cox Models: stcox indep1 indep2 indep3
Default output shows hazard ratios
Useful options:
nohr – requests raw coefs (not hazard ratios)
vce(robust) – specifies robust standard errors
vce(cluster varname) – better SEs for non-independent (clustered) data.

EHA Models in Stata
Parametric Models: streg
streg ind1 ind2 ind3, dist(exponential)
You must specify a functional form (distribution)
–Ex: exponential, Weibull, Gompertz, etc.
–We’ll discuss choices later
streg shares many options with stcox: nohr, vce(robust), vce(cluster varname)

Constant Rate Model: Example
Simple one-variable model comparing West vs. non-West
streg west, dist(exponential) nohr
Exponential regression -- log relative-hazard form
No. of subjects = 97; Number of obs = 2047; No. of failures = 81; Time at risk = 2047
(Std. Err. adjusted for 97 clusters in newid3)
[Output table: coefficients with robust standard errors for west and _cons; the Wald chi2, log pseudolikelihood, and table values are not preserved.]
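As a rough check on what a constant rate model estimates, the sketch below (Python, not Stata) computes the pooled rate from the two totals reported in the output header: 81 failures over 2047 country-years at risk. It assumes a model with no covariates, which is simpler than the west vs. non-West model on the slide, so the numbers are only a baseline for intuition.

```python
import math

failures = 81        # "No. of failures" from the output header
time_at_risk = 2047  # "Time at risk" from the output header (country-years)

# With no covariates, the maximum likelihood estimate of a constant
# (exponential) hazard is simply events divided by exposure time.
rate = failures / time_at_risk

# In log relative-hazard form (the nohr metric), an intercept-only model
# would report the log of this rate as its constant.
log_rate = math.log(rate)

# Under a constant hazard, the expected waiting time is the inverse rate.
expected_wait = 1 / rate

print(f"hazard rate      = {rate:.4f} events per country-year")
print(f"log hazard rate  = {log_rate:.3f}")
print(f"mean waiting time = {expected_wait:.1f} years")
```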

Constant Rate Model: Example
Model with time-varying covariates
No. of subjects = 92; Number of obs = 1938; No. of failures = 77; Time at risk = 1938
(Std. Err. adjusted for 92 clusters in newid3)
[Output table: coefficients with robust standard errors for gdp, degradation, education, democracy, ngo, ingo, _cons; the Wald chi2(6), log pseudolikelihood, and table values are not preserved.]
Democratic countries enact laws at a higher rate than less-democratic countries

Constant Rate Model: Example
Same model – with Hazard Ratios
No. of subjects = 92; Number of obs = 1938; No. of failures = 77; Time at risk = 1938
(Std. Err. adjusted for 92 clusters in newid3)
[Output table: hazard ratios with robust standard errors for gdp, degradation, education, democracy, ngo, ingo; the Wald chi2(6), log pseudolikelihood, and table values are not preserved.]
A 1-point increase in democracy increases the hazard rate by 25.8%!
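The link between a raw coefficient, a hazard ratio, and a percent change can be sketched in a few lines of Python. The hazard ratio of 1.258 used here is only implied by the 25.8% statement on the slide (the table values themselves are not available), and the function name is made up for illustration.

```python
import math


def pct_change_in_hazard(coef):
    """Percent change in the hazard for a 1-unit increase in a covariate."""
    return (math.exp(coef) - 1) * 100


hazard_ratio = 1.258           # implied by the slide's "25.8%" statement
coef = math.log(hazard_ratio)  # the raw coefficient the nohr option would show

one_point = pct_change_in_hazard(coef)

# Effects are multiplicative: a 2-point increase multiplies the hazard
# by the hazard ratio squared, not by twice the percent change.
two_point = (hazard_ratio ** 2 - 1) * 100

print(f"raw coefficient       = {coef:.4f}")
print(f"1-point increase      = {one_point:.1f}% higher hazard")
print(f"2-point increase      = {two_point:.1f}% higher hazard")
```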

Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in the non-West?
Option #1: Create an interaction term
No. of subjects = 92; Number of obs = 1938; No. of failures = 77; Time at risk = 1938
(Std. Err. adjusted for 92 clusters in newid3)
[Output table: coefficients with robust standard errors for gdp, degradation, education, democracy, ngo, ingo, nonwest, ingoXnonwest, _cons; the Wald chi2(8), log pseudolikelihood, and table values are not preserved.]
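To read an interaction such as ingoXnonwest, the main effect and the interaction coefficient are added before exponentiating. The Python sketch below uses two hypothetical coefficients (the slide's estimates are not available) purely to show the arithmetic.

```python
import math

# Hypothetical raw coefficients, NOT the estimates from the slide.
b_ingo = 0.10            # effect of ingo when nonwest = 0
b_ingo_x_nonwest = 0.20  # extra effect of ingo when nonwest = 1


def ingo_hazard_ratio(nonwest):
    """Hazard ratio for a 1-unit increase in ingo, by region (0 = West, 1 = non-West)."""
    return math.exp(b_ingo + b_ingo_x_nonwest * nonwest)


hr_west = ingo_hazard_ratio(0)
hr_nonwest = ingo_hazard_ratio(1)

print(f"ingo hazard ratio, West     = {hr_west:.3f}")
print(f"ingo hazard ratio, non-West = {hr_nonwest:.3f}")
# The exponentiated interaction coefficient is the ratio of the two hazard ratios.
print(f"ratio of the two            = {hr_nonwest / hr_west:.3f}")
```

A positive interaction coefficient would be consistent with the expectation stated on the slide; its sign and size are what the test of that expectation rests on.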

Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in the non-West?
Option #2: Include only non-Western countries in the analysis
No. of subjects = 76; Number of obs = 1720; No. of failures = 61; Time at risk = 1720
(Std. Err. adjusted for 76 clusters in newid3)
[Output table: coefficients with robust standard errors for gdp, degradation, education, democracy, ngo, ingo, _cons; the Wald chi2(6), log pseudolikelihood, and table values are not preserved.]

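Cox Models
The basic Cox model:
$$h(t) = h_0(t)\,\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)$$
Where h(t) is the hazard rate and h_0(t) is some baseline hazard function (to be inferred from the data)
This obviates the need for building a specific functional form into the model
Also written as:
$$\ln h(t) = \ln h_0(t) + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$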
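One way to see why the baseline hazard needs no functional form: in a proportional hazards model the hazard is the baseline times exp(beta * x), so the ratio of hazards for two cases never involves the baseline. The Python sketch below checks this with two made-up baseline shapes and a hypothetical coefficient.

```python
import math


def hazard(t, x, beta, baseline):
    """Proportional hazards form: baseline hazard at t, scaled by exp(beta * x)."""
    return baseline(t) * math.exp(beta * x)


beta = 0.5  # hypothetical coefficient


def rising_baseline(t):
    """A made-up baseline hazard that increases over time."""
    return 0.02 + 0.01 * t


def falling_baseline(t):
    """A made-up baseline hazard that decays over time."""
    return 0.10 * math.exp(-0.3 * t)


all_ratios = []
for name, h0 in [("rising", rising_baseline), ("falling", falling_baseline)]:
    # Compare a case with x = 1 to a case with x = 0 at several times.
    ratios = [hazard(t, 1, beta, h0) / hazard(t, 0, beta, h0) for t in (1, 5, 10)]
    all_ratios.extend(ratios)
    print(name, [round(r, 4) for r in ratios])

print("exp(beta) =", round(math.exp(beta), 4))
```

Whatever the baseline does over time, the ratio stays at exp(beta), which is the quantity reported as a hazard ratio.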

Cox Model: Example
Mostly similar to exponential model…
Cox regression -- Breslow method for ties
No. of subjects = 92; Number of obs = 1938; No. of failures = 77; Time at risk = 1938
(Std. Err. adjusted for 92 clusters in newid3)
[Output table: coefficients with robust standard errors for gdp, degradation, education, democracy, ngo, ingo; the Wald chi2(6), log pseudolikelihood, and table values are not preserved.]
Most effects are similar… though the education effect loses significance…

Discrete Time EHA Models
Distinction: Continuous vs. Discrete EHA
–“Discrete time”: time divided into integer chunks (years, decades, months). Spell start & end times are essentially “rounded off”
–Continuous time: time conceptualized as an unbroken continuum. Times need not be rounded off, and high levels of precision are possible (not just integers, but decimals).

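Discrete Time EHA Models
Issue: Discrete vs. continuous time gives rise to different EHA models
Example: The hazard rate is defined for continuous time (T is the time at which the event occurs):
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
The hazard rate over discrete (identical-sized) chunks of time (t_i) is:
$$h(t_i) = P(T = t_i \mid T \ge t_i)$$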
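In discrete time the hazard is a conditional probability: the share of cases still at risk at the start of an interval that have the event during it. The Python sketch below computes it for three intervals; the counts and function names are hypothetical.

```python
def discrete_hazards(n_at_start, events, censored):
    """Hazard per interval: events in the interval / cases at risk at its start."""
    at_risk = n_at_start
    hazards = []
    for d, c in zip(events, censored):
        hazards.append(d / at_risk)
        at_risk -= d + c  # events and censored cases leave the risk set
    return hazards


def survival(hazards):
    """Probability of no event through each interval: running product of (1 - h)."""
    surv, s = [], 1.0
    for h in hazards:
        s *= 1 - h
        surv.append(s)
    return surv


# Hypothetical counts: 100 cases at risk, followed for three intervals.
hazards = discrete_hazards(100, events=[5, 10, 8], censored=[0, 2, 3])
surv = survival(hazards)

for i, (h, s) in enumerate(zip(hazards, surv), start=1):
    print(f"interval {i}: hazard = {h:.4f}, survival = {s:.4f}")
```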

Discrete Time EHA Models
Issue: If the hazard rate in discrete time is a probability, maybe we can model it as such…
Standard options for modeling probabilities:
–Logistic regression (logit) model
–Probit model
–Complementary log/log model (cloglog): an asymmetric function that starts slowly from p=0 but accelerates more rapidly toward p=1 at the end. Often used when predicted probabilities are very low or high.

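Discrete Time EHA Models
Example: Discrete time logit model
$$\ln\left(\frac{p}{1-p}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$
Where p is the probability of an event (Y=1) for a discrete chunk of time
Complementary log log model looks like this:
$$\ln\left(-\ln(1-p)\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$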
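The two link functions can be compared directly. This Python sketch writes out the logit and complementary log-log transformations and their inverses, then evaluates both at a few values of the linear predictor; the function names and grid of values are chosen only for illustration.

```python
import math


def logit(p):
    """Log odds of p."""
    return math.log(p / (1 - p))


def inv_logit(xb):
    """Probability implied by a logit linear predictor."""
    return 1 / (1 + math.exp(-xb))


def cloglog(p):
    """Complementary log-log of p."""
    return math.log(-math.log(1 - p))


def inv_cloglog(xb):
    """Probability implied by a complementary log-log linear predictor."""
    return 1 - math.exp(-math.exp(xb))


for xb in (-4, -2, 0, 2):
    print(f"xb = {xb:+d}: logit p = {inv_logit(xb):.4f}, cloglog p = {inv_cloglog(xb):.4f}")

# The logit curve is symmetric around p = 0.5; the cloglog curve is not.
logit_symmetry_gap = abs(inv_logit(2) + inv_logit(-2) - 1)
cloglog_symmetry_gap = abs(inv_cloglog(2) + inv_cloglog(-2) - 1)
print("symmetry gap, logit  :", round(logit_symmetry_gap, 6))
print("symmetry gap, cloglog:", round(cloglog_symmetry_gap, 6))
```

At low probabilities the two curves nearly coincide, which is the range that yearly event data like the environmental law example mostly occupies.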

Discrete Time EHA Models
Basic logit/probit/cloglog models are like constant-rate/exponential models
–They assume a constant baseline hazard, represented by the constant in the model
Discrete EHA models are proportional hazard models
–Logit output reports coefficients and odds ratios… But it is appropriate to refer to them as hazard ratios
–Coefficient interpretation is the same
–Raw coefficients require exponentiation to interpret…

Discrete Time EHA: Data
Discrete time models require split-spell data where each spell has constant length
Example: every record in your data represents 1 year
Number of cases represents total time at risk
–Ex: If caseid 1 has 10 records, it was at risk for 10 years…
This differs from continuous models, where records can represent variable amounts of time
–E.g., by providing specific start and end times…

Discrete Time EHA Data
Discrete time data looks like other examples of split spell data
But, each record MUST be the same length
–Example: Country data over time
Logit/probit/cloglog simply models outcome of 1
[Table: yearly records for INDIA, one row per country-year; cell values not preserved. Variables: newname2, newid3, year, laweventnum, start, end, ss, es, pop. Slide callout: “Event (Y=1)”.]

Discrete Time Logit Model
Logit model for discrete time EHA
It is a constant rate model
In fact, results are almost the same as streg…
logit es gdp degradation education democracy ngo ingo
Logistic regression: Number of obs = 1938
[Output table: coefficients for gdp, degradation, education, democracy, ngo, ingo, _cons; the LR chi2(6), log likelihood, pseudo R2, and table values are not preserved.]
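A back-of-the-envelope reason the logit and exponential results land so close together: when events are rare per record, the odds and the rate are nearly the same number. The Python sketch below uses the 77 events and 1938 yearly records reported on the slides and assumes models with a constant only, which is a simplification of the models actually estimated.

```python
import math

events = 77     # failures reported for the covariate sample
records = 1938  # yearly records, so also the total time at risk in years

p = events / records  # per-year probability of an event

# Constant-only logit: the intercept is the log odds of an event.
logit_cons = math.log(p / (1 - p))

# Constant-only exponential model: the intercept is the log of events / exposure.
exp_cons = math.log(events / records)

print(f"per-year probability     = {p:.4f}")
print(f"logit constant (log odds) = {logit_cons:.4f}")
print(f"exponential constant      = {exp_cons:.4f}")
print(f"difference                = {logit_cons - exp_cons:.4f}")
```

The gap between the two constants equals -ln(1 - p), which shrinks toward zero as events become rarer per interval.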

Discrete Time and Cox Models
A Cox model can also be estimated in the discrete time context
Indeed, the discrete time example helps illustrate what a Cox model really is (even in continuous time)
–Idea: Use a conditional logit model, conditioned on the cases in the risk set at each point in time, rather than a traditional logit model

Discrete Time and Cox Models
A conditional logit model estimates common coefficients across models for many groups
Looks at within-group factors, net of overall rate within each group… sorta like a fixed-effects model…
–Box-Steffensmeier & Jones, p. 80
Thus, effects are modeled net of the “baseline hazard”
–Interpretation: A Cox model is like pooling a large set of logit results
In the continuous time context, the group is the current risk set at the time of any failure
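The "net of the baseline hazard" idea can be sketched for a single risk set. Conditional on exactly one case failing, the probability that it is a given case is exp(xb) for that case divided by the sum of exp(xb) over everyone at risk. The Python below uses four hypothetical linear predictors and shows that adding a common constant (standing in for that period's baseline hazard) changes nothing.

```python
import math


def risk_set_probabilities(xb_values):
    """P(case i is the one that fails | exactly one case in the risk set fails)."""
    weights = [math.exp(xb) for xb in xb_values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical linear predictors (beta * X) for four countries at risk in one year.
xb = [0.2, -0.5, 1.0, 0.0]
probs = risk_set_probabilities(xb)

# A period-specific baseline shifts every case's log hazard by the same amount;
# it cancels out of the ratio, so the conditional probabilities are unchanged.
shift = 3.7  # arbitrary stand-in for the log baseline hazard in that year
probs_shifted = risk_set_probabilities([v + shift for v in xb])

for i, (a, b) in enumerate(zip(probs, probs_shifted), start=1):
    print(f"country {i}: {a:.4f} (with baseline shift: {b:.4f})")
```

Multiplying such terms across every period with an event gives the kind of likelihood a conditional logit grouped by year maximizes when each year has a single event; years with tied events need the extra handling the slides defer to later.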

Discrete Time and Cox Models
A conditional logit model on discrete time EHA data yields identical results to a Cox model if you specify the “exact partial” method for handling ties in the continuous time Cox model
–We’ll cover this later

Discrete Time Cox Model
Conditional logit model – a Cox model
Yields identical results to Cox when using discrete data.
clogit es gdp degradation education democracy ngo ingo, group(year)
Conditional (fixed-effects) logistic regression: Number of obs = 1472
[Output table: coefficients for gdp, degradation, education, democracy, ngo, ingo; the LR chi2(6), log likelihood, pseudo R2, and table values are not preserved.]

Discrete vs. Continuous EHA
In practice, we can often use either discrete or continuous methods
Even though time is theoretically continuous, our measures are usually limited to discrete time intervals
–Ex: year, month, day…
For yearly spell data (or any other consistent interval) the data sets are pretty much identical
–If time resolution is extremely poor, there can be advantages to using discrete time models
–Otherwise, continuous time models provide greater flexibility and more modeling options.

EHA Example
In-class group activity: Let’s design a study
Outcome of interest: Students dropping a course
–What is the risk set?
–How would you set up the data?
–What are key independent variables?
–What kind of model would you use?
Work in groups of 2-4, and be prepared to discuss your thoughts…

Reading Discussion Long, J. Scott, Paul D. Allison, and Robert McGinnis “Rank Advancement in Academic Careers: Sex Differences and the Effects of Productivity.” American Sociological Review, 58, 5: