More EHA Models & Diagnostics Sociology 229A: Event History Analysis Class 7 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission.

Announcements Assignment #5 due Final paper assignment handed out Due at end of quarter Class topic: AFT models Stratified Models More on residuals, diagnostics Discussion: Empirical Paper

Short Paper Assignment New Topic: Organizational mortality among “licensed lenders” A type of credit company regulated by New York state –“Mom & pop” lenders… eventually largely outcompeted by modern banks/credit cards… –Examples: Empire City Personal Loan Company –Founded 1932, Dissolved 1938 American Credit Company »Renamed “Liberty Loan Company” in 1942 –Founded 1902, Dissolved 1964 –Branch office in 1947; dissolved in 1955 –Branch office in 1955; censored in 1965.

Short Paper Assignment Licensed lenders dataset –Unit of analysis: Organization Branch offices each have an independent government license, are treated as fully separate organizations –Data structure: Annual data set –Time-series / “Long form”, split-spell data –Outcome of interest: Organizational mortality When the organization dies/dissolves/shuts down –Rudimentary independent variables included…

Short Paper Assignment Project goals: –1. Test a series of hypotheses (which I provide) using EHA models –2. Run some simple EHA diagnostics Check proportionality assumption for one X var Check for outliers using residuals –3. Write up results (4-5 pages) Like the methods/results section of a short journal article…

Accelerated Failure Time Models We’ve been modeling the hazard rate: h(t) Most parametric approaches build on Cox strategy… An alternative approach: model log time Using parametric approach like exponential or Weibull Focus is time rather than hazard rate: $\ln(t_i) = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik} + e_i$ Where last term “e” is assumed to have a distribution that defines the model (e.g., making it Weibull) –Recall: odd distribution of e is the problem with OLS –What if we introduced a complex parameter here!

Accelerated Failure Time Models Cleves et al. 2004: AFT (or “log time”) models aren’t actually new kinds of models Rather, they are re-expressing the same models in a different metric… Instead of expressing effects on hazard rate, coefficients reflect effect on log time to event Instead of “hazard ratios” you can compute “time ratios” –Substantive emphasis is on TIME to event This can be desirable… more concrete than hazard rates –Issue: coefficients have opposite signs!!! A variable that increases hazard rate will decrease time to event.

Proportional Hazard vs. AFT Blossfeld data: Upward employment moves. Stata command: streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(exponential) nohr Output: Exponential regression -- log relative-hazard form. No. of subjects = 591; Number of obs = 591; No. of failures = 84. [Coefficient table for edu, coho2, coho3, lfx, pnoj, pres, _cons: values not preserved in transcript.] Log relative hazard = Proportional hazards model

Proportional Hazard vs. AFT Blossfeld data: Upward employment moves. Stata command: streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(exponential) nohr time Output: Exponential regression -- accelerated failure-time form. No. of subjects = 591; Number of obs = 591; No. of failures = 84. [Coefficient table for edu, coho2, coho3, lfx, pnoj, pres, _cons: values not preserved in transcript.] Streg option “time” specifies AFT form Note that the log likelihood and z-values are the same. However, all signs are opposite, and the coefficients are in a different metric (log time rather than log hazard).

Proportional vs. AFT metric Weibull models: Here, coefficients differ…. Stata command: streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(weibull) nohr Output: Weibull regression -- log relative-hazard form. [Coefficient table for edu, coho2, coho3, lfx, pnoj, pres, _cons: values not preserved in transcript.] Stata command: streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(weibull) nohr time Output: Weibull regression -- accelerated failure-time form. [Coefficient table for edu, coho2, coho3, lfx, pnoj, pres, _cons: values not preserved in transcript.]
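The link between the two metrics can be sketched numerically. For a Weibull model with shape parameter p, an AFT coefficient equals the proportional-hazards coefficient multiplied by -1/p; with p = 1 (the exponential model) only the sign changes. The Python sketch below uses a made-up coefficient of 0.5 and made-up shape values purely for illustration; none of these numbers come from the Blossfeld output, and the function names are hypothetical.

```python
import math

def ph_to_aft(beta_ph, p):
    """Convert a Weibull log relative-hazard coefficient to the AFT (log-time) metric.

    beta_ph: coefficient from the proportional-hazards form
    p: Weibull shape parameter (p = 1 is the exponential model)
    """
    return -beta_ph / p

def hazard_ratio(beta_ph):
    """Exponentiated PH coefficient: multiplicative effect on the hazard."""
    return math.exp(beta_ph)

def time_ratio(beta_aft):
    """Exponentiated AFT coefficient: multiplicative effect on time to event."""
    return math.exp(beta_aft)

# Hypothetical coefficient and shape values (illustration only).
beta_ph = 0.5
for p in (1.0, 2.0):
    beta_aft = ph_to_aft(beta_ph, p)
    print(f"p={p}: AFT coef={beta_aft:.3f}, "
          f"hazard ratio={hazard_ratio(beta_ph):.3f}, "
          f"time ratio={time_ratio(beta_aft):.3f}")
```

A hazard ratio above 1 always pairs with a time ratio below 1: a variable that raises the hazard shortens the expected time to event.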

Accelerated Failure Time Models Remarks: –1. AFT models are less common, but you’ll run across them occasionally –2. It is important to recognize them… Because coefficient interpretations are opposite! –3. STATA currently offers more parametric options for AFT models Log-logistic and log-normal are only available in AFT These are non-monotonic curves, might be useful… –So, you might consider them if you are having trouble with model fit.

Parametric Models & Predictions Parametric models allow prediction of failure times for all cases Whether using proportional hazard or AFT metric –Strategy: run model, then use “predict” command –Issues: 1. You have many prediction options… –“Mean” estimated time; Median estimated time (+ log options) 2. If you have split-spell data, you’ll get a prediction for EACH record in the data –Predictions take into account X variables –As X variables change, predicted time changes, too!

Predicted Times Blossfeld job data (upward moves). Stata command: list id duration event sex time mdtime [Listing of id, duration, event, sex, time, and mdtime for selected records: values not preserved in transcript.] Predicted median time is 80 months, actual upward move occurred in 5 months… Model really doesn’t expect this case to have an upward job transition…

Parametric Models & Predictions Useful things you can do with predictions: –1. Highlight some examples to give your reader a concrete sense of event timing… –2. Construct predictions that reflect different values of X variables Ex: Run model. Make predictions. Recode Xs. Make further predictions –Example: How would the predicted time-to-event change if case was male, rather than female –Ex: Environmental treaties: What is predicted time to treaty signing if democracy were 10 rather than 1? Vividly illustrates coefficient effects.
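The "recode Xs, predict again" idea can be sketched outside Stata. Under a Weibull model in the AFT metric, the predicted median time is exp(xb) multiplied by (ln 2) raised to the power 1/p, where xb is the linear predictor and p is the shape parameter. The Python sketch below assumes a hypothetical model with a constant and a single dummy variable named `female`; the coefficient values are invented for illustration and are not estimates from any data set in these slides.

```python
import math

def weibull_aft_median(xb, p):
    """Predicted median time for a Weibull AFT model.

    xb: linear predictor on the log-time scale
    p: Weibull shape parameter (p = 1 is the exponential model)
    """
    return math.exp(xb) * math.log(2) ** (1.0 / p)

def linear_predictor(coefs, case):
    """Constant plus the sum of coefficient * covariate value."""
    return coefs["_cons"] + sum(coefs[name] * value for name, value in case.items())

# Hypothetical AFT coefficients and shape (illustration only).
coefs = {"_cons": 4.0, "female": -0.3}
p = 1.0

observed = {"female": 1}        # the case as recorded
counterfactual = {"female": 0}  # same case with the X variable recoded

median_observed = weibull_aft_median(linear_predictor(coefs, observed), p)
median_counterfactual = weibull_aft_median(linear_predictor(coefs, counterfactual), p)

print(f"Predicted median time, female=1: {median_observed:.1f}")
print(f"Predicted median time, female=0: {median_counterfactual:.1f}")
```

The ratio of the two predictions equals the time ratio exp(-0.3) for the recoded variable, which is one way to present a coefficient in concrete units of time.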

Residuals – Summary From Cleves et al. (2004) An Introduction to Survival Analysis Using Stata, p. 184: 1. Cox-Snell residuals … are useful for assessing overall model fit 2. Martingale residuals Are useful in determining the functional form of the covariates to be included in the model 3. Schoenfeld residuals (scaled & unscaled), score residuals, and efficient score residuals Are useful for checking & testing the proportional hazard assumption, examining leverage points, and identifying outliers NOTE: A residual is produced for each independent variable… 4. Deviance residuals Are useful in examining model accuracy and identifying outliers.

Cox-Snell Residuals Cox-Snell residuals for case i: $r_{CS_i} = \hat{H}_0(t_i)\exp(X_i\hat{\beta})$ Where H(t)-hat is the estimate of the cumulative hazard –Based on model results B-hats are estimates from the model Xi are values for each case in your data –Interpretation: “The expected number of events in a given time-interval” –Box-Steffensmeier & Jones 2004.

Cox-Snell residuals: Model Fit Cox-Snell residuals can be plotted to assess model fit If model fits well, graph of integrated (cumulative) hazard conditional on Cox-Snell residuals vs. Cox-Snell residuals will fall on a line –Strategy in Stata: (1) Run Cox model, request martingale residuals; (2) Use “predict” to compute Cox-Snell residuals; (3) Stset your data again, with Cox-Snell as time variable; (4) Compute integrated hazard; (5) Graph integrated hazard versus residuals.

Cox-Snell Model Fit Example Cox-Snell Plot for Environmental Law data This looks quite bad. Cumulative hazard should fall on the line… Instead, there is a sizable gap. Note: Don’t worry much about deviations from the line at the right edge of the plot. There are typically few cases there…
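The logic of the Cox-Snell plot can be sketched in a few lines. If the model fits, the residuals behave like censored draws from a unit exponential distribution, so the cumulative hazard estimated from the residuals should be close to the residual values themselves (the 45-degree line). The Python sketch below computes a Nelson-Aalen cumulative hazard by hand; the residual and event values are invented for illustration, and the function name is hypothetical rather than part of any library.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at each distinct time with an event.

    times: analysis times (here, Cox-Snell residuals used as the time variable)
    events: 1 if the case experienced the event, 0 if censored
    Returns a list of (time, cumulative_hazard) pairs in increasing time order.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cumulative = 0.0
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect every record tied at this time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            cumulative += n_events / n_at_risk
            result.append((t, cumulative))
        n_at_risk -= n_leaving
    return result

# Hypothetical Cox-Snell residuals and event indicators (illustration only).
cs_residuals = [0.2, 0.5, 0.9, 1.4, 2.0]
event = [1, 1, 0, 1, 1]

for r, H in nelson_aalen(cs_residuals, event):
    # For a well-fitting model H should track r; a large gap signals poor fit.
    print(f"residual={r:.2f}  cumulative hazard={H:.3f}  gap={H - r:+.3f}")
```

With only a handful of cases the estimate is noisy, especially at the largest residuals where few cases remain at risk, which is why deviations at the right edge of the plot deserve less weight.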

Martingale Residuals Martingale residuals: More intuitive… Difference between observed event (vs. censored) and expected number of events a case is predicted to have –Based on hazard rate given X vars… Martingale residuals range from –infinity to +1 –Often very skewed –Deviance residuals: Normalized version of martingale residuals.
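As a minimal sketch of these definitions: the martingale residual is the event indicator minus the Cox-Snell residual (the expected number of events), and the deviance residual is the standard rescaling sign(M) * sqrt(-2 * [M + d * ln(d - M)]), where d is the event indicator. The Python below uses invented input values, and the function names are hypothetical.

```python
import math

def martingale(event, cox_snell):
    """Observed event (1) or censoring (0) minus the expected number of events."""
    return event - cox_snell

def deviance(event, cox_snell):
    """Deviance residual: a more symmetric rescaling of the martingale residual.

    Requires cox_snell > 0 when event == 1 (the log term is undefined at zero).
    """
    m = martingale(event, cox_snell)
    log_term = event * math.log(event - m) if event else 0.0
    # max() guards against a tiny negative value caused by floating-point rounding.
    magnitude = math.sqrt(max(0.0, -2.0 * (m + log_term)))
    return math.copysign(magnitude, m)

# Hypothetical cases: (event indicator, Cox-Snell residual).
cases = [(1, 0.1), (1, 1.0), (0, 0.5), (0, 3.0)]
for d, cs in cases:
    print(f"event={d} expected={cs:.1f} "
          f"martingale={martingale(d, cs):+.2f} deviance={deviance(d, cs):+.2f}")
```

A case with an event but a very small expected count has a martingale residual near +1 (the model did not expect the event), while a long-surviving case with a large expected count produces a large negative residual; this asymmetry is the skew that the deviance transformation reduces.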

MG Residuals and Functional Form Issue: What functional form of independent variables should you choose? Ex: Should you log your independent variables? –Skewness is one consideration; but you also want to specify the correct relationship between vars… –In OLS regression we can plot X vars versus residuals to identify departures from linearity In EHA, we can do something similar: Estimate Cox model without covariates, save martingale residuals Use “lowess” command to plot mean residuals versus X variables Functional form that is closest to a flat line = best.

MG Residuals and Functional Form Stata code:
*
* Use Martingale Residuals to check functional form
*
stset tf, fail(des)
* Estimate a cox model with NO covariates
* -- option "estimate" makes this happen
* Plus, create a new variable "mg" containing
* Martingale residuals
stcox, mgale(mg) estimate
* Next, plot residuals versus different transformations
* of your X variables (with smoothed mean – lowess)
lowess mg lfx
lowess mg lfxcubed
lowess mg loglfx

Martingale Functional Form Example Blossfeld employment termination data Should labor force experience be raw, logged, cubed? Labor force experience is CUBED… Note the SHARP curve near zero… Very non-linear This is really bad.

Martingale Functional Form Example Blossfeld employment termination data Should labor force experience be raw, logged, cubed? This is RAW labor force experience Not bad… close to a flat line.

Martingale Functional Form Example Blossfeld employment termination data Should labor force experience be raw, logged, cubed? Labor force experience, logged This is the best yet… but not a big difference from raw…

Discussion: Empirical Example Soule, Sarah A and Susan Olzak “When Do Movements Matter? The Politics of Contingency and the Equal Rights Amendment.” American Sociological Review, Vol. 69, No. 4. (Aug., 2004), pp