Published by Philip Randell Clarke. Modified over 9 years ago.
1
EHA Diagnostics
Sociology 229A: Event History Analysis
Class 5
Copyright © 2008 by Evan Schofer
Do not copy or distribute without permission
2
Announcements
Class topics:
Cox model: examining the baseline hazard
– And the hazard for various groups in your data
Cox model diagnostics (part 1)
Discussion of readings
3
Cox Model: Baseline Hazard
Cox models involve a “baseline hazard”
Note: baseline = the hazard when all covariates are zero
Question: What does the baseline hazard look like?
– Or the baseline survivor function and integrated (cumulative) hazard?
– Stata can estimate the baseline survivor, hazard, and integrated hazard.
Two steps:
1. Ask Stata to save the information when you run the Cox model
– Ex: stcox gdp degradation education democracy ngo ingo, robust nohr basehc(h0)
2. Use the “stcurve” command to plot the baseline curves
– Ex: stcurve, hazard OR stcurve, survival
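For intuition about what a saved baseline hazard contains, here is a minimal Python sketch of the Breslow estimator of the baseline cumulative (integrated) hazard. It assumes the coefficients have already been estimated, so each case's linear predictor x'b is known. The function name `breslow_baseline` and the data are hypothetical, and the baseline survivor is approximated here as exp(-H0(t)), which may differ in detail from the estimator Stata itself uses.

```python
import math

def breslow_baseline(times, events, xb):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    times  -- observed time for each case (failure or censoring)
    events -- 1 if the case failed at that time, 0 if censored
    xb     -- fitted linear predictor x'b for each case
    Returns (t, H0(t), S0(t)) at each distinct failure time, with the
    baseline survivor approximated as S0(t) = exp(-H0(t)).
    """
    risk_scores = [math.exp(v) for v in xb]
    H0, curve = 0.0, []
    for t in sorted({ti for ti, di in zip(times, events) if di}):
        failures = sum(1 for ti, di in zip(times, events) if di and ti == t)
        # Risk set: every case still under observation at time t.
        at_risk = sum(r for ti, r in zip(times, risk_scores) if ti >= t)
        H0 += failures / at_risk
        curve.append((t, H0, math.exp(-H0)))
    return curve

# Hypothetical data: four cases, one censored at t = 2, linear predictors all 0.
curve = breslow_baseline([1, 2, 2, 3], [1, 1, 0, 1], [0.0, 0.0, 0.0, 0.0])
```

With every linear predictor equal to zero, as in this usage, the estimate reduces to the Nelson-Aalen estimator of the cumulative hazard.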
4
Cox Model: Baseline Hazard
[Figure: Baseline rate, adoption of environmental law]
5
Cox Model: Baseline Hazard
Note: It may not always make sense to plot the baseline hazard
The baseline shows the hazard when all X variables are zero
Sometimes zero values aren’t very useful or interesting
– Example: Does it make sense to plot the hazard of countries adopting laws when all X variables = zero?
The hazard rate might be quite low
In some cases, you’ll just get a flat curve at zero
– Or extremely high values
– Solutions:
1. Rescale (e.g., center) the independent variables before running the Cox model
2. Use stcurve to plot the hazard at substantively relevant values of the variables
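Rescaling works because the baseline is just the hazard for a case with every covariate at zero. A minimal Python sketch, using hypothetical numbers (coefficient 0.5, raw baseline hazard 0.001 at some time t, covariate mean 8) and a hypothetical helper `cox_hazard`, shows that centering a covariate at its mean moves the baseline to a "typical" case without changing any case's predicted hazard.

```python
import math

def cox_hazard(h0_t, beta, x):
    """Hazard at one time point: h(t | x) = h0(t) * exp(beta * x)."""
    return h0_t * math.exp(beta * x)

# Hypothetical inputs for illustration only.
beta = 0.5      # coefficient on x
h0_raw = 0.001  # baseline hazard at some time t, i.e., for a case with x = 0
x_mean = 8.0    # sample mean of x

# After centering (x_c = x - x_mean), the baseline describes a case at the mean.
h0_centered = cox_hazard(h0_raw, beta, x_mean)

# Any given case gets the same predicted hazard under either scaling.
x = 10.0
h_raw_scale = cox_hazard(h0_raw, beta, x)
h_centered_scale = cox_hazard(h0_centered, beta, x - x_mean)
```

Here the centered baseline is about 55 times the raw one (exp(4)), which is why an uncentered baseline can look like a flat line at zero.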
6
Cox Model: Estimated Hazards
You can also use stcurve to plot estimated hazard rates for chosen values of the independent variables
Ex: What is the hazard curve if democracy = 1, 5, 10?
Strategy: use the “at” options:
stcurve, hazard at1(democracy=1) at2(democracy=5) at3(democracy=10)
NOTE: All other variables are held at their means…
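The logic behind plotting hazards at chosen covariate values can be sketched in Python: each curve is the baseline hazard multiplied by exp(x'b), with unlisted covariates held at their means. This is not Stata's code, and all names and numbers below (`profile_hazard`, the baseline values, coefficients, and means) are hypothetical, not estimates from the slides.

```python
import math

def profile_hazard(h0, coefs, means, **overrides):
    """Hazard curve h(t | x) = h0(t) * exp(x'b) for one covariate profile.
    Covariates not given as keyword overrides are held at their means."""
    values = dict(means, **overrides)
    xb = sum(coefs[name] * values[name] for name in coefs)
    return [h * math.exp(xb) for h in h0]

# Hypothetical inputs, not estimates from the slides.
h0 = [0.002, 0.004, 0.003, 0.005]   # baseline hazard at four time points
coefs = {"democracy": 0.25, "gdp": 0.45}
means = {"democracy": 5.0, "gdp": 1.2}

curves = {d: profile_hazard(h0, coefs, means, democracy=d) for d in (1, 5, 10)}
# Ratio of the democracy = 10 curve to the democracy = 1 curve at each time point.
ratios = [hi / lo for hi, lo in zip(curves[10], curves[1])]
```

Every curve is the baseline times a constant, so the ratio between any two curves is the same at every time point (here exp(0.25 * 9)); this fixed ratio is exactly what the proportional hazard assumption requires.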
7
Cox: Estimated Hazard Rate
[Figure: Hazard rate for adoption of environmental law]
8
Cox Model Diagnostics
Issues that you must deal with:
1. How to estimate results with “ties” in your data
– Ties = cases that fail at the exact same time
2. How to identify violations of the proportional hazard assumption
3. Dealing with outliers/influential cases
4. Assessing model fit
– Most of this also applies to parametric models
Ties are not a concern in parametric models
But additional issues come up: choosing the right functional form (shape) to model the hazard.
9
Cox Model Issues: Ties
How to handle ties in data
It is mathematically complex to estimate models when there are tied failures
– That is: two cases that have events at the exact same time
Several mathematical approaches:
– Breslow approximation – simplest approach
Stata default, but not the best choice!
– Efron approximation – generally better
More computationally intensive, but given the power of modern computers it is not an issue
stcox var1 var2 var3, efron
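To see how the two approximations differ, here is a minimal Python sketch of the likelihood contribution at a single failure time with tied failures. Each case's risk score exp(x'b) is assumed to be already computed, and the function names are hypothetical. Breslow uses the full risk set in every denominator, while Efron removes a growing fraction of the tied cases' risk, as if they failed one after another.

```python
import math

def breslow_term(tied, risk):
    """Breslow contribution at one failure time.
    tied -- risk scores exp(x'b) of the cases failing at this time
    risk -- risk scores of everyone at risk at this time (including the tied cases)"""
    return math.prod(tied) / sum(risk) ** len(tied)

def efron_term(tied, risk):
    """Efron contribution: the k-th denominator (k = 0..d-1) subtracts
    k/d of the tied cases' total risk score from the risk-set total."""
    total, tied_sum, d = sum(risk), sum(tied), len(tied)
    denom = math.prod(total - (k / d) * tied_sum for k in range(d))
    return math.prod(tied) / denom

# Hypothetical example: 2 tied failures among 5 cases with equal risk scores.
b_term = breslow_term([1.0, 1.0], [1.0] * 5)   # 1 / (5 * 5)
e_term = efron_term([1.0, 1.0], [1.0] * 5)     # 1 / (5 * 4)
```

With a single failure the two coincide; the more ties there are relative to the risk set, the more Breslow's denominators overstate the risk set.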
10
Cox Model Issues: Ties
– Exact marginal (stcox option exactm) – “continuous-time” method
– Box-Steffensmeier & Jones: “averaged likelihood”
Assumes ties didn’t happen at EXACTLY the same time… and considers all possible orderings
– Exact partial (stcox option exactp) – “discrete” method
– Box-Steffensmeier & Jones: “exact discrete method”
Assumes ties happened at EXACTLY the same time
– Advice: Use Efron at a minimum
Exact methods are often more accurate
– Exact marginal often makes the most sense… events rarely occur at the EXACT same time… unless you have discrete data
– But exact methods can take a LONG time
– For big datasets with many ties, Efron is OK.
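The two exact methods can be sketched in Python for one tied failure time, again assuming precomputed risk scores exp(x'b) and using hypothetical function names. The marginal version averages the continuous-time likelihood over every ordering of the tied cases. The partial version treats the ties as simultaneous and compares the tied set against every same-sized subset of the risk set. Both loops grow combinatorially with the number of ties, which is why exact methods can be slow.

```python
import math
from itertools import combinations, permutations

def exact_marginal_term(tied, risk):
    """Average over all orderings of the tied cases of the probability that
    they fail first, in that order, among everyone at risk. (Summing instead
    of averaging only changes the term by the constant d!, not the estimates.)"""
    total = sum(risk)
    orders = list(permutations(tied))
    acc = 0.0
    for order in orders:
        remaining, p = total, 1.0
        for r in order:
            p *= r / remaining
            remaining -= r   # a case that has failed leaves the risk set
        acc += p
    return acc / len(orders)

def exact_partial_term(tied, risk):
    """Discrete-time (conditional logit) probability that exactly this set
    of d cases fails, relative to every d-subset of the risk set."""
    d = len(tied)
    denom = sum(math.prod(subset) for subset in combinations(risk, d))
    return math.prod(tied) / denom

# Hypothetical example: 2 tied failures among 5 cases with equal risk scores.
marginal = exact_marginal_term([1.0, 1.0], [1.0] * 5)
partial = exact_partial_term([1.0, 1.0], [1.0] * 5)
```

With equal risk scores the averaged marginal term equals Efron's (1/20 in this example), which helps explain why Efron is a good approximation to the exact marginal method.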
11
Proportional Hazard Assumption
Key assumption: proportional hazards
Hazards for different values of X are proportional over time
i.e., the hazard ratio does NOT vary over time
– Example: Effect of an “abstinence” program on sexual behavior
Issue: Do abstinence programs lower the rate consistently across time?
– Or perhaps the rate is lower initially… but then jumps back up (maybe even exceeding the control group’s rate)
– Groups are assumed to have “parallel” hazards (on a log scale)
Rather than rates that diverge, converge, or cross.
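A small Python sketch makes the assumption concrete. Under proportional hazards the ratio of two groups' hazards is the same at every time point, while in the non-proportional case it drifts and can cross 1. The Weibull hazards and all parameter values below are hypothetical choices for illustration.

```python
def weibull_hazard(t, lam, shape):
    """Weibull hazard h(t) = lam * shape * t**(shape - 1)."""
    return lam * shape * t ** (shape - 1)

times = [1, 2, 5, 10, 20]
control = [weibull_hazard(t, 0.1, 1.5) for t in times]

# Proportional case: the program halves the hazard at every time point.
prop_ratio = [(0.5 * h) / h for h in control]

# Non-proportional case: a Weibull with a different shape. Here the ratio
# works out to t / 3: below 1 early, above 1 after t = 3 (the hazards cross).
treated = [weibull_hazard(t, 0.02, 2.5) for t in times]
nonprop_ratio = [ht / hc for ht, hc in zip(treated, control)]
```

The second case mirrors the abstinence example: the program lowers the rate at first, but the treated group's hazard eventually exceeds the control group's.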
12
Proportional Hazard Assumption
Strategies:
1. Visually examine raw hazard plots for subgroups in your data
Watch for non-parallel trends
A crude method… not the best approach… but often identifies big violations
13
Proportional Hazard Assumption
[Figure: Visual examination of raw hazard rates]
You want them to change proportionally
If one doubles, so does the other…
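A crude version of the raw-hazard comparison can be sketched in Python: failures divided by person-time within fixed time intervals, computed separately for each group. The function `interval_hazards`, the cut points, and the data are hypothetical. With real data you would plot these rates by group and check whether their ratio stays roughly constant across intervals.

```python
def interval_hazards(times, events, cuts):
    """Failures per unit of person-time in each interval [cuts[k], cuts[k+1]).
    Assumes everyone enters at time 0; failures at exactly the last cut are not counted."""
    rates = []
    for a, b in zip(cuts[:-1], cuts[1:]):
        exposure = sum(max(0.0, min(t, b) - a) for t in times)
        failures = sum(1 for t, d in zip(times, events) if d and a <= t < b)
        rates.append(failures / exposure if exposure > 0 else float("nan"))
    return rates

# Hypothetical data for two groups.
group_a = interval_hazards([1, 3, 5, 7], [1, 1, 1, 0], cuts=[0, 4, 8])
group_b = interval_hazards([2, 6, 7, 7], [1, 1, 0, 0], cuts=[0, 4, 8])
ratio_by_interval = [a / b for a, b in zip(group_a, group_b)]
```

In this toy example the ratio is about 2.3 in the first interval and 2.0 in the second: roughly proportional, although with so few cases the rates are very noisy.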
14
Proportional Hazard Assumption
2. Plot -ln(-ln(survivor function)) versus ln(time) across values of X variables
What Stata calls “stphplot”
Parallel lines indicate proportional hazards
Again, convergence, divergence, or crossing indicates a violation
– A less common approach: compare the observed (Kaplan-Meier) survivor plot to values predicted by the Cox model (for different values of X)
What Stata calls “stcoxkm”
If observed values are similar to predicted values, the assumption is not likely to be violated.
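The quantities behind the log-log plot can be sketched in Python: a Kaplan-Meier estimate per group, transformed to -ln(-ln S(t)) and paired with ln(t). The function names are hypothetical, and this is not Stata's code. Under proportional hazards S1(t) = S0(t)^exp(b), so the transformed curves differ by the constant b at every time. That constant gap is why parallel lines are the thing to look for.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier survivor estimate at each distinct failure time."""
    s, curve = 1.0, []
    for t in sorted({ti for ti, di in zip(times, events) if di}):
        n_at_risk = sum(1 for ti in times if ti >= t)
        d_t = sum(1 for ti, di in zip(times, events) if di and ti == t)
        s *= 1 - d_t / n_at_risk
        curve.append((t, s))
    return curve

def loglog_points(curve):
    """(ln t, -ln(-ln S)) pairs; points with S of 0 or 1 are skipped (undefined)."""
    return [(math.log(t), -math.log(-math.log(s))) for t, s in curve if 0 < s < 1]

# Parallel-lines check with exact (hypothetical) survivor curves under PH.
b = 0.7
grid = [1.0, 2.0, 5.0, 10.0]
s0 = [(t, math.exp(-0.2 * t)) for t in grid]
s1 = [(t, s ** math.exp(b)) for t, s in s0]
gaps = [y1 - y0 for (_, y0), (_, y1) in zip(loglog_points(s0), loglog_points(s1))]
```

Every entry of `gaps` equals -b. With estimated Kaplan-Meier curves the gaps would be noisy, so you look for roughly constant vertical separation.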
15
Proportional Hazard Assumption
[Figure: -ln(-ln(survivor)) vs. ln(time), “stphplot”]
Parallel = good
Convergence suggests violation of the proportional hazard assumption
(But I’ve seen worse!)
16
Proportional Hazard Assumption
[Figure: Cox estimate vs. observed Kaplan-Meier, “stcoxkm”]
Predicted differs from observed for countries in the West
17
Proportional Hazard Assumption
3. Piecewise Models
Piecewise = break the model up into pieces (by time)
– Ex: Split the analysis into “early” vs. “late” time
If coefficients vary across time periods, hazards are not proportional
– Example (split records at t = 10 first, so that cases surviving past 10 stay in the early risk sets, censored at 10):
stsplit period, at(10)
stcox var1 var2 var3 if _t0 < 10
stcox var1 var2 var3 if _t0 >= 10
Look for large changes in coefficients!
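Here is a minimal Python sketch of the record splitting that a piecewise Cox model relies on. It is similar in spirit to Stata's stsplit, but the function `split_at` and the data are hypothetical. Each (id, start, stop, event) record that spans the cut point becomes two episodes: the early one is censored at the cut, and the late one carries the original event flag. Long survivors therefore still count in the early risk sets, which would not happen if the early model simply dropped every case whose final time falls after the cut.

```python
def split_at(records, cut):
    """Split (id, start, stop, event) records at time `cut`.
    The early piece is censored at `cut`; the late piece keeps the event flag."""
    episodes = []
    for rid, start, stop, event in records:
        if start < cut < stop:
            episodes.append((rid, start, cut, 0))
            episodes.append((rid, cut, stop, event))
        else:
            episodes.append((rid, start, stop, event))
    return episodes

# Hypothetical single-record data: (id, start, stop, event)
records = [(1, 0, 4, 1), (2, 0, 15, 1), (3, 0, 12, 0)]
episodes = split_at(records, 10)
early = [e for e in episodes if e[1] < 10]    # analogous to: if _t0 < 10
late = [e for e in episodes if e[1] >= 10]    # analogous to: if _t0 >= 10
```

Cases 2 and 3 appear in both pieces: censored at 10 in the early piece, and entering at 10 in the late piece.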
18
Proportional Hazard Assumption
In a piecewise model, coefficients would differ between time periods in non-proportional models
[Figure, two panels, each showing an early and a late period. Proportional: the effect is the same in both time periods. Non-proportional: the effect is negative in the early period and positive in the late period]
19
Piecewise Models
Look at coefficients at 2 (or more) spans of time

EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4465818   .4255587     1.05   0.294    -.3874979    1.280661
 degradation |   -.282548   .1572746    -1.80   0.072    -.5908005    .0257045
   education |  -.0195118   .0328195    -0.59   0.552    -.0838368    .0448131
   democracy |   .2295673   .2625205     0.87   0.382    -.2849634     .744098
         ngo |   .6792462   .3110294     2.18   0.029     .0696399    1.288853
        ingo |   .6664661   .4804229     1.39   0.165    -.2751456    1.608078
------------------------------------------------------------------------------

LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4963942    .357739     1.39   0.165    -.2047613     1.19755
 degradation |  -.5702894   .2395257    -2.38   0.017    -1.039751   -.1008277
   education |   .0142118   .0143762     0.99   0.323    -.0139649    .0423886
   democracy |   .2541799   .0981386     2.59   0.010     .0618317    .4465281
         ngo |   .1742862   .1448187     1.20   0.229    -.1095532    .4581256
        ingo |  -.1134661   .2104308    -0.54   0.590    -.5259028    .2989707
------------------------------------------------------------------------------

Note: Effect of ngo is larger in the early period
20
Proportional Hazard Assumption
4. Tests based on re-estimating the model
Try including time interactions in your model
Recall: Interactions – the effect of A on C varies with B
If the effect of variable X on the hazard rate (or ratio) varies with time, then hazards aren’t proportional
– Recall example: Abstinence programs
Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on
21
Proportional Hazard Assumption
[Figure: hazard over time for the abstinence group (red) and the control group (green). Panels: no time interaction; positive time interaction]
In the non-proportional case, the effect of abstinence programs varies across time
22
Proportional Hazard Assumption
Strategy: Create variables that reflect the interaction of X variables with time
Significant effects of time interactions indicate non-proportional hazards
Fortunately, including the interaction term in the model corrects the problem.
Issue: X variables can interact with time in multiple ways…
– Linearly
– With “log time” or time squared
– With time dummies
– You may have to try a range of things…
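Each way of coding the interaction implies a different pattern for the log hazard ratio over time: log HR(t) = b_x + b_xt * g(t). A minimal Python sketch evaluates that pattern for linear time, log time, and a late-period dummy. The function name and coefficients are hypothetical. One caution when building these variables: the interaction must vary within each case's history (e.g., computed on episode-split data, or via stcox's tvc() option). It should not be X multiplied by each case's final survival time.

```python
import math

def log_hazard_ratio(t, b_x, b_xt, form="linear", cut=10.0):
    """log HR(t) = b_x + b_xt * g(t) for a model containing X and X*g(t)."""
    if form == "linear":          # like Abstinence*time
        g = t
    elif form == "log":           # interaction with log time
        g = math.log(t)
    elif form == "late_dummy":    # like Abstinence*DLate
        g = 1.0 if t >= cut else 0.0
    else:
        raise ValueError(f"unknown form: {form}")
    return b_x + b_xt * g

# Hypothetical coefficients: a protective effect that fades and then reverses.
fading = {t: math.exp(log_hazard_ratio(t, -1.0, 0.1)) for t in (1, 5, 10, 20)}
```

With these hypothetical values, the hazard ratio is below 1 early, equals 1 at t = 10, and exceeds 1 afterwards. A single time-constant coefficient would average over that pattern.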
23
Proportional Hazard Assumption
[Figure: hazard over time for the abstinence group (red) and the control group (green), two panels]
Linear time interaction: effect grows consistently over time
Try “Abstinence*time”
Interaction with time period: effect differs early vs. late
Try “Abstinence*DLate”
24
Proportional Hazard Assumption
5. Grambsch & Therneau test
– Ex: Stata “estat phtest”
Tests for a non-zero slope of the (scaled) Schoenfeld residuals vs. time
– A zero slope implies the log hazard ratio is constant over time, i.e., proportional hazards
Can be applied to the model as a whole, or to each variable
. stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)
. estat phtest

Test of proportional hazards assumption

Time:  Time
----------------------------------------------------------------
            |       chi2       df       Prob>chi2
------------+---------------------------------------------------
global test |      18.14        6          0.0059
----------------------------------------------------------------

A significant chi-square indicates violation of the proportional hazard assumption
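The idea behind the test can be sketched in Python. At each failure, compute a Schoenfeld residual: the failing case's covariate minus the risk-weighted average covariate among those at risk. Then check whether the residuals trend with time. This is a simplified illustration, not Stata's implementation. It assumes a single covariate and no tied failure times, and `beta` would normally be the fitted Cox estimate. The correlation it reports is only analogous to the rho column in estat phtest's detail output; estat phtest adds a formal chi-square test.

```python
import math

def schoenfeld(times, events, x, beta):
    """Raw Schoenfeld residuals for one covariate, as (time, residual) pairs.
    Assumes no tied failure times; beta would normally be the fitted estimate."""
    residuals = []
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue
        # Risk set at t_i, each member weighted by its risk score exp(beta * x).
        risk = [(math.exp(beta * x_j), x_j) for t_j, x_j in zip(times, x) if t_j >= t_i]
        x_bar = sum(w * x_j for w, x_j in risk) / sum(w for w, _ in risk)
        residuals.append((t_i, x_i - x_bar))
    return residuals

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# Hypothetical data: cases with x = 1 fail early, so residuals fall over time.
res = schoenfeld([1, 2, 3, 4], [1, 1, 1, 1], [1.0, 1.0, 0.0, 0.0], beta=0.0)
rho = corr([t for t, _ in res], [r for _, r in res])
```

A clearly non-zero `rho` (here negative) signals that the covariate's effect drifts over time.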
25
Proportional Hazard Assumption
Variable-by-variable test with “estat phtest”:
. estat phtest, detail

Test of proportional hazards assumption

Time:  Time
----------------------------------------------------------------
            |       rho            chi2       df       Prob>chi2
------------+---------------------------------------------------
gdp         |      0.09035         0.63        1         0.4277
degradation |     -0.22735         3.41        1         0.0646
education   |      0.06915         0.47        1         0.4950
democracy   |     -0.04929         0.20        1         0.6560
ngo         |     -0.18691         4.56        1         0.0327
ingo        |     -0.03759         0.34        1         0.5609
------------+---------------------------------------------------
global test |                     18.14        6         0.0059
----------------------------------------------------------------

Note: Certain variables (here ngo, and marginally degradation) are especially problematic…
26
Proportional Hazard Assumption
Notes on estat phtest:
1. Requires that you calculate “Schoenfeld residuals” when you run the original Cox model
– And, if you want a test for each variable, you must also request scaled Schoenfeld residuals
2. The test is based on identifying a non-zero time trend… but how should we characterize time?
Options: normal/linear time, log time, time dummies, etc.
– Results may differ depending on your choice
– Ex: estat phtest, log specifies “log time”
A plot of smoothed Schoenfeld residuals can indicate the best way to characterize time
– A linear trend (not a curve) indicates that time is characterized OK
– Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)
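A small Python sketch shows why the choice of time scale matters. The same residuals can be perfectly linear in log time but only imperfectly linear in raw time, so a trend test on log time picks up the pattern more strongly. The residual values and helper names are hypothetical. In Stata, the log option (and related options such as rank) of estat phtest changes the time scale.

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def time_codings(times):
    """Candidate codings of time: linear, log, and rank (1 = earliest)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    rank = [0] * len(times)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return {"linear": list(times), "log": [math.log(t) for t in times], "rank": rank}

# Hypothetical residuals that rise with log time rather than linear time.
times = [1, 2, 4, 8, 16, 32, 64]
resid = [math.log2(t) - 3 for t in times]
rhos = {name: corr(coded, resid) for name, coded in time_codings(times).items()}
```

Here the log and rank codings give a correlation of 1, while linear time gives about 0.88. In practice, the smoothed residual plot helps you pick the coding under which the trend looks straight.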
27
Proportional Hazard Assumption
What if the assumption is violated?
1. Improve model specification
Add time interactions to address non-proportionality
Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
Variables can be interacted with linear time, log time, time dummies, etc., to address the issue
2. Model groups separately
Split the sample along variables that are non-proportional.
28
Proportional Hazard Assumption
What if the assumption is violated?
3. Use a stratified Cox model
Allows a different baseline hazard for each group
– But you can’t estimate the effect of the stratifying variable!
Ex: stcox var1 var2 var3, strata(Dhighdemoc)
4. Use a piecewise model
Split time into chunks… in which the PH assumption is met
– Requires sufficient sample size in all time periods!
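A minimal Python sketch shows what stratification does to the partial likelihood. Risk sets are formed only within each stratum, so each stratum gets its own baseline hazard, while the coefficients (and hence each case's x'b) are shared. The function names and data are hypothetical, and ties are handled Breslow-style. A stratifying variable is constant within its stratum, so it would cancel out of every risk-set comparison; that is why its effect cannot be estimated.

```python
import math

def cox_loglik(times, events, xb):
    """Cox log partial likelihood for one set of cases (Breslow handling of ties)."""
    ll = 0.0
    for t_i, d_i, v_i in zip(times, events, xb):
        if d_i:
            # Compare the failing case with everyone still at risk at t_i.
            ll += v_i - math.log(sum(math.exp(v) for t, v in zip(times, xb) if t >= t_i))
    return ll

def stratified_loglik(times, events, xb, strata):
    """Sum of within-stratum log partial likelihoods: risk sets never cross strata."""
    total = 0.0
    for s in set(strata):
        idx = [i for i, g in enumerate(strata) if g == s]
        total += cox_loglik([times[i] for i in idx], [events[i] for i in idx], [xb[i] for i in idx])
    return total

# Hypothetical data: four failures, two strata, all linear predictors zero.
t4, e4, zero = [1, 2, 3, 4], [1, 1, 1, 1], [0.0] * 4
pooled = cox_loglik(t4, e4, zero)
stratified = stratified_loglik(t4, e4, zero, [0, 0, 1, 1])
```

The pooled model compares each failure against everyone still at risk, while the stratified model only compares cases within the same stratum.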
29
Proportional Hazard Assumption
What if the assumption is violated?
5. Live with it (but temper your conclusions)
Violation of the proportional hazard assumption tends to:
– Overestimate the effect of variables whose hazard ratios are increasing over time
– And underestimate those whose hazard ratios are decreasing
However, Allison points out that the Cox model is reasonably robust
– Other issues (e.g., model misspecification) are bigger concerns