EHA Diagnostics Sociology 229A: Event History Analysis Class 5 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission.

Announcements
Class topics:
Cox model: examining the baseline hazard
–And the hazard for various groups in your data
Cox model diagnostics (part 1)
Discussion of readings

Cox Model: Baseline Hazard
Cox models involve a “baseline hazard”
Note: baseline = the hazard when all covariates are zero
Question: What does the baseline hazard look like?
–Or the baseline survivor & integrated hazard?
–Stata can estimate the baseline survivor, hazard, and integrated hazard
Two steps:
1. You must ask Stata to save the info when you run the Cox model
–Ex: stcox gdp degradation education democracy ngo ingo, robust nohr basehc(h0)
2. Use the “stcurve” command to plot the baseline curves
–Ex: stcurve, hazard OR stcurve, survival

Cox Model: Baseline Hazard
[Figure: Baseline rate, adoption of environmental law]

Cox Model: Baseline Hazard
Note: It may not always make sense to plot the baseline hazard
Baseline shows the hazard when X variables are zero
Sometimes zero values aren’t very useful/interesting
–Example: Does it make sense to plot the hazard of countries adopting laws, if X vars = zero?
Hazard rate might be quite low
In some cases, you’ll just get a flat zero curve
–Or extremely high values
Solutions:
1. Rescale indep vars before running the Cox model
2. Use stcurve to choose relevant values of vars
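Why rescaling helps can be seen from the Cox model itself. A sketch for a single covariate x, where \bar{x} (the sample mean) and \tilde{h}_0 (the rescaled baseline) are notation assumed here, not taken from the slides:

```latex
h(t \mid x) = h_0(t)\, e^{\beta x}
            = \underbrace{h_0(t)\, e^{\beta \bar{x}}}_{\tilde{h}_0(t)} \; e^{\beta (x - \bar{x})}
```

Centering x leaves the coefficient unchanged, but the baseline then describes a case at the mean of x rather than a case at x = 0.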

Cox Model: Estimated Hazards
You can also use stcurve to plot estimated hazard rates based on values of indep vars
Ex: What is the hazard curve if democracy = 1, 5, 10?
Strategy: use the “at” subcommand:
stcurve, hazard at(democ=1) at2(democ=10)
NOTE: All other variables are pegged at the mean…

Cox: Estimated Hazard Rate
[Figure: Hazard rate for adoption of environmental law]

Cox Model Diagnostics
Issues that you must deal with:
1. How to estimate results with “ties” in your data
–Ties = cases that fail at the exact same time
2. How to identify violations of the proportional hazard assumption
3. Dealing with outliers/influential cases
4. Assessing model fit
–Most of this also applies to parametric models
For parametric models, ties are not a concern
But additional issues come up: choosing the right functional form (shape) to model the hazard

Cox Model Issues: Ties
How to handle ties in data
It is mathematically complex to estimate models when there are tied failures
–That is: two or more cases that have events at the exact same time
Several mathematical approaches:
–Breslow approximation – simplest approach
Stata default, but not the best choice!
–Efron approximation – generally better
More computationally intensive, but with modern computers this is not an issue
stcox var1 var2 var3, efron

Cox Model Issues: Ties
–Exact marginal – “continuous time approximation”
–Box-Steffensmeier & Jones: “Averaged Likelihood”
Assumes ties didn’t happen EXACTLY at the same time… and considers all possible orderings
–Exact partial – “discrete”
–Box-Steffensmeier & Jones: “exact discrete method”
Assumes ties happened EXACTLY at the same time
–Advice: Use Efron at a minimum
Exact methods are often more accurate
–Exact marginal often makes most sense… events rarely occur at the EXACT same time… unless you have discrete data
–But, exact methods can take a LONG time
–For big datasets with many ties, Efron is OK
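To make the difference between the two approximations concrete, here is a minimal sketch of the log partial likelihood for a one-covariate Cox model under the Breslow and Efron treatments of ties. It is an illustration in plain Python, not Stata's implementation; the function name and the toy data are hypothetical.

```python
import math

def log_partial_likelihood(times, events, x, beta, method="efron"):
    """Log partial likelihood for a one-covariate Cox model.

    times  : observed durations
    events : 1 if the case failed, 0 if censored
    x      : covariate values
    method : "breslow" or "efron" (how tied failures are handled)
    """
    if method not in ("breslow", "efron"):
        raise ValueError("method must be 'breslow' or 'efron'")
    risk = [math.exp(beta * xi) for xi in x]
    loglik = 0.0
    # Loop over each distinct failure time.
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        tied = [i for i in at_risk if times[i] == t and events[i] == 1]
        d = len(tied)
        total = sum(risk[i] for i in at_risk)     # sum over the risk set
        tied_sum = sum(risk[i] for i in tied)     # sum over the tied failures
        loglik += sum(beta * x[i] for i in tied)  # numerator terms
        for k in range(d):
            if method == "breslow":
                # Every tied failure is compared with the full risk set.
                denom = total
            else:
                # Efron: remove a fraction k/d of the tied cases at step k.
                denom = total - (k / d) * tied_sum
            loglik -= math.log(denom)
    return loglik

# Hypothetical toy data with ties at t = 2 and t = 5.
times = [2, 2, 3, 5, 5, 5, 7]
events = [1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 0, 1, 0, 1]
ll_breslow = log_partial_likelihood(times, events, x, 0.5, "breslow")
ll_efron = log_partial_likelihood(times, events, x, 0.5, "efron")
```

When no failure times are tied, the two methods give the same value; the more ties there are, the further they diverge.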

Proportional Hazard Assumption
Key assumption: Proportional hazards
Hazards are proportional over time
i.e., the hazard ratio does NOT vary over time
–Example: Effect of “abstinence” program on sexual behavior
Issue: Do abstinence programs lower the rate in a consistent manner across time?
–Or, perhaps the rate is lower initially… but then the rate jumps back up (maybe even exceeds the control group)
–Groups are assumed to have “parallel” hazards
Rather than rates that diverge, converge, or cross
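In equation form, the assumption says that the ratio of two groups' hazards does not depend on t. A sketch for a single covariate, with x_1 and x_0 as assumed labels for the two groups' covariate values:

```latex
\frac{h(t \mid x_1)}{h(t \mid x_0)}
  = \frac{h_0(t)\, e^{\beta x_1}}{h_0(t)\, e^{\beta x_0}}
  = e^{\beta (x_1 - x_0)}
```

The baseline hazard cancels, so the ratio is the same at every point in time. On a log scale the two hazard curves are separated by the constant \beta (x_1 - x_0), which is why they are described as “parallel”.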

Proportional Hazard Assumption
Strategies:
1. Visually examine raw hazard plots for subgroups in your data
Watch for non-parallel trends
A crude method… not the best approach… but often identifies big violations

Proportional Hazard Assumption
[Figure: Visual examination of raw hazard rate]
You want them to change proportionally
If one doubles, so does the other…

Proportional Hazard Assumption
2. Plot -ln(-ln(survivor function)) versus ln(time) across values of X variables
What Stata calls “stphplot”
Parallel lines indicate proportional hazards
Again, convergence and divergence (or crossing) indicate violation
–A less-common approach: compare the observed survivor plot to predicted values (for different values of X)
What Stata calls “stcoxkm”
If observed values are similar to predicted, the assumption is not likely to be violated
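The reason parallel lines indicate proportional hazards can be checked numerically. The sketch below assumes a Weibull baseline and a constant hazard ratio of 2 between two groups (both are illustrative choices, not from the slides) and shows that the gap between the two -ln(-ln(S)) curves is the same at every time point.

```python
import math

def neg_log_neg_log(s):
    """Transform a survival probability S (0 < S < 1) to -ln(-ln(S))."""
    return -math.log(-math.log(s))

def weibull_survival(t, scale, shape, hazard_ratio=1.0):
    """Survivor function S(t) = exp(-hazard_ratio * scale * t**shape)."""
    return math.exp(-hazard_ratio * scale * t ** shape)

times = [1, 2, 5, 10, 20]
hr = 2.0  # hypothetical constant hazard ratio between the two groups

gaps = []
for t in times:
    control = neg_log_neg_log(weibull_survival(t, 0.05, 1.3))
    treated = neg_log_neg_log(weibull_survival(t, 0.05, 1.3, hazard_ratio=hr))
    gaps.append(control - treated)  # vertical distance between the curves
```

Under these assumptions every entry of `gaps` equals ln(2), the log hazard ratio. If the hazard ratio changed over time, the gap would widen or narrow, which is what convergence or divergence in the plot reflects.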

Proportional Hazard Assumption
[Figure: -ln(-ln(survivor)) vs. ln(time) – “stphplot”]
Parallel = good
Convergence suggests violation of proportional hazard assumption
(But, I’ve seen worse!)

Proportional Hazard Assumption
[Figure: Cox estimate vs. observed KM – “stcoxkm”]
Predicted differs from observed for countries in West

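Proportional Hazard Assumption
3. Piecewise Models
Piecewise = break model up into pieces (by time)
–Ex: Split analysis into “early” vs. “late” time
If coefficients vary in different time periods, hazards are not proportional
–Example:
stcox var1 var2 var3 if _t < 10
stcox var1 var2 var3 if _t >= 10
–Note: Filtering on _t works when each case has multiple records (e.g., country-years); with one record per case, split the records at the cut point first (e.g., with stsplit), or cases that survive past the cut drop out of the early risk sets
Look for large changes in coefficients!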
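The splitting step behind a piecewise analysis can be sketched in a few lines. The function below is a hypothetical illustration of episode splitting (similar in spirit to Stata's stsplit, but not its implementation): each spell that crosses the cut point becomes an early record that is censored at the cut and a late record that carries the original outcome. A spell ending exactly at the cut is assigned to the early piece here; that boundary convention is a choice.

```python
def split_episode(start, stop, event, cut):
    """Split one spell (start, stop] at time `cut`.

    Returns a list of (start, stop, event, period) records. The event
    indicator stays with the piece in which the spell actually ends.
    """
    if stop <= cut:
        return [(start, stop, event, "early")]
    if start >= cut:
        return [(start, stop, event, "late")]
    return [
        (start, cut, 0, "early"),     # still at risk at the cut: censored here
        (cut, stop, event, "late"),   # remainder keeps the original outcome
    ]

# Hypothetical spells: (start, stop, event)
spells = [(0, 4, 1), (0, 15, 1), (0, 25, 0)]
records = [piece for s in spells for piece in split_episode(*s, cut=10)]
early = [r for r in records if r[3] == "early"]
late = [r for r in records if r[3] == "late"]
```

All three cases contribute to the early period, including the two that survive past the cut, so the early risk sets stay complete. Each period's records can then be analyzed separately.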

Proportional Hazard Assumption
In a piecewise model, coefficients would differ in non-proportional models
[Figure: two panels, each comparing the effect in Early vs. Late periods]
Proportional: Here, the effect is the same in both time periods
Non-Proportional: Here, the effect is negative in the early period and positive in the late period

Piecewise Models
Look at coefficients at 2 (or more) spans of time
EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
gdp |
degradation |
education |
democracy |
ngo |
ingo |
LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
gdp |
degradation |
education |
democracy |
ngo |
ingo |
[Estimates not preserved in the transcript]
Note: Effect of ngo is larger in early period

Proportional Hazard Assumption
4. Tests based on re-estimating model
Try including time interactions in your model
Recall: Interactions – effect of A on C varies with B
If the effect of variable X on the hazard rate (or ratio) varies with time, then hazards aren’t proportional
–Recall example: Abstinence programs
Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on

Proportional Hazard Assumption
[Figure: two panels, “No time interaction” and “Positive time interaction”; red = abstinence group, green = control]
In non-proportional case, the effect of abstinence programs varies across time

Proportional Hazard Assumption
Strategy: Create variables that reflect the interaction of X variables with time
Significant effects of time interactions indicate non-proportional hazards
Fortunately, inclusion of the interaction term in the model corrects the problem
Issue: X variables can interact with time in multiple ways…
–Linearly
–With “log time” or time squared
–With time dummies
–You may have to try a range of things…
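A short sketch of what a time interaction does to the hazard ratio. It assumes a model of the form log h(t|X) = log h0(t) + beta*X + gamma*X*g(t), where g(t) is the chosen function of time; the coefficient values and the cut point for the time dummy are hypothetical.

```python
import math

def hazard_ratio(t, beta, gamma, g=lambda t: t):
    """Hazard ratio for a one-unit difference in X at time t.

    Assumed model: log h(t|X) = log h0(t) + beta*X + gamma*X*g(t).
    """
    return math.exp(beta + gamma * g(t))

beta, gamma = -0.8, 0.1   # hypothetical coefficients
points = (1, 5, 10)

# Three ways of letting the effect of X vary with time.
linear = [hazard_ratio(t, beta, gamma) for t in points]
log_time = [hazard_ratio(t, beta, gamma, g=math.log) for t in points]
late_dummy = [hazard_ratio(t, beta, gamma, g=lambda t: 1.0 if t >= 5 else 0.0)
              for t in points]

# With gamma = 0 the hazard ratio is constant: proportional hazards.
constant = [hazard_ratio(t, beta, 0.0) for t in points]
```

With these made-up values the linear interaction starts with a hazard ratio below 1 and ends above 1, the kind of reversal described in the abstinence example. A gamma estimated near zero corresponds to the proportional case.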

Proportional Hazard Assumption
[Figure: two panels; red = abstinence group, green = control]
Linear time interaction: Effect grows consistently over time
Try “Abstinence*time”
Interaction with time-period: Effect differs early vs. late
Try “Abstinence*DLate”

Proportional Hazard Assumption
5. Grambsch & Therneau test
–Ex: Stata “estat phtest”
Test for non-zero slope of Schoenfeld residuals vs. time
–A zero slope implies the log hazard ratio is constant over time, i.e., hazards are proportional
Can be applied to the general model, or for each variable
. stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)
. estat phtest
Test of proportional hazards assumption
Time: Time
| chi2 df Prob>chi2
global test |
[Test statistics not preserved in the transcript]
Significant chi-square indicates violation of proportional hazard assumption

Proportional Hazard Assumption
Variable-by-variable test “estat phtest”:
. estat phtest, detail
Test of proportional hazards assumption
Time: Time
| rho chi2 df Prob>chi2
gdp |
degradation |
education |
democracy |
ngo |
ingo |
global test |
[Test statistics not preserved in the transcript]
Note: Certain variables are especially problematic…

Proportional Hazard Assumption
Notes on estat phtest:
1. Requires that you calculate “Schoenfeld residuals” when you run the original Cox model
–And, if you want a test for each variable, you must also request scaled Schoenfeld residuals
2. Test is based on identifying a non-zero time trend… but how should we characterize time?
Options: normal/linear time, log time, time dummies, etc.
–Results may differ depending on your choice
–Ex: estat phtest, log – specifies “log time”
Plot of smoothed Schoenfeld residuals can indicate the best way to characterize time
–Linear trend (not a curve) indicates that time is characterized OK
–Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)
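To show what a Schoenfeld residual is and why its time trend is informative, here is a simplified sketch for a one-covariate model. It is not the Grambsch & Therneau test itself, which uses scaled residuals from the fitted model and a chi-square statistic; the function names, the toy data, and the choice of beta = 0 are all assumptions made for illustration.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Schoenfeld residuals for a one-covariate Cox model.

    For each failure, the residual is the failing case's covariate value
    minus the risk-weighted mean of the covariate among cases still at risk.
    Returns a time-sorted list of (failure_time, residual) pairs.
    """
    out = []
    for i, (t, e) in enumerate(zip(times, events)):
        if e != 1:
            continue  # censored cases have no Schoenfeld residual
        at_risk = [j for j, tj in enumerate(times) if tj >= t]
        weights = [math.exp(beta * x[j]) for j in at_risk]
        expected = sum(w * x[j] for w, j in zip(weights, at_risk)) / sum(weights)
        out.append((t, x[i] - expected))
    return sorted(out)

def slope(pairs):
    """Ordinary least-squares slope of residual on time."""
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in pairs)
    den = sum((t - mean_t) ** 2 for t, _ in pairs)
    return num / den

# Hypothetical data: cases with x = 1 all fail early, x = 0 all fail late.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 1, 1, 0, 0, 0]
resid = schoenfeld_residuals(times, events, x, beta=0.0)
trend = slope(resid)
```

A clear non-zero trend in the residuals over time suggests that the effect of the covariate is changing, which is the pattern the test looks for. Swapping `t` for `math.log(t)` inside `slope` would correspond to characterizing time as log time.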

Proportional Hazard Assumption
What if the assumption is violated?
1. Improve model specification
Add time interactions to address nonproportionality
Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
Variables can be interacted with linear time, log time, time dummies, etc., to address the issue
2. Model groups separately
Split sample along variables that are non-proportional

Proportional Hazard Assumption
What if the assumption is violated?
3. Use a stratified Cox model
Allows a different baseline hazard for each group
–But, you can’t estimate the effect of the stratifying variable!
Ex: stcox var1 var2 var3, strata(Dhighdemoc)
4. Use a piecewise model
Split time into chunks… in which the PH assumption is met
–Requires sufficient sample size in all time periods!
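The stratified model can be written out to show why the stratifying variable gets no coefficient. In this sketch s indexes the strata and h_{0s} is the stratum-specific baseline (notation assumed here):

```latex
h_s(t \mid x) = h_{0s}(t)\, e^{x \beta}, \qquad s = 1, \dots, S
```

Each stratum has its own baseline hazard of arbitrary shape, while the coefficients \beta are shared across strata. Any effect of the stratifying variable is absorbed into h_{0s}(t), so it cannot be estimated separately.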

Proportional Hazard Assumption
What if the assumption is violated?
5. Live with it (but temper your conclusions)
Violation of proportional hazard assumption tends to:
–Overestimate the effect of variables whose hazard ratios are increasing over time
–And underestimate the effect of those whose hazard ratios are decreasing
However, Allison points out: the Cox model is reasonably robust
–Other problems (e.g., model misspecification) are bigger concerns