More EHA Models & Diagnostics Sociology 229A: Event History Analysis Class 7 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission.



2 Announcements
Assignment #5 due
Final paper assignment handed out
–Due at end of quarter
Class topics:
–AFT models
–Stratified models
–More on residuals, diagnostics
–Discussion: empirical paper

3 Short Paper Assignment
New topic: organizational mortality among “licensed lenders”
A type of credit company regulated by New York State
–“Mom & pop” lenders… eventually largely outcompeted by modern banks and credit cards…
–Examples:
Empire City Personal Loan Company
–Founded 1932, dissolved 1938
American Credit Company
»Renamed “Liberty Loan Company” in 1942
–Founded 1902, dissolved 1964
–Branch office in 1947; dissolved in 1955
–Branch office in 1955; censored in 1965

4 Short Paper Assignment
Licensed lenders dataset
–Unit of analysis: organization
Branch offices each have an independent government license and are treated as fully separate organizations
–Data structure: annual data set
–Time-series / “long form,” split-spell data
–Outcome of interest: organizational mortality
When the organization dies, dissolves, or shuts down
–Rudimentary independent variables included…
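To make the annual, split-spell layout concrete, here is a minimal Python sketch (not part of the assignment materials) that expands one organization's life span into yearly records, using the Empire City and branch-office dates from slide 3. The record layout (id, year, start, stop, event) and the rule that event = 1 only in the final year of a dissolved organization are hypothetical choices for illustration; the actual dataset's coding may differ.

```python
def expand_to_annual(org_id, first_year, last_year, dissolved):
    """Split one organization's life span into annual (long-form) records.

    Hypothetical coding: one record per calendar year from first_year
    through last_year; event = 1 only in the last record, and only if
    the organization dissolved (event = 0 throughout if censored).
    """
    rows = []
    for year in range(first_year, last_year + 1):
        rows.append({
            "id": org_id,
            "year": year,
            "start": year - first_year,     # analysis time at start of the spell
            "stop": year - first_year + 1,  # analysis time at end of the spell
            "event": int(dissolved and year == last_year),
        })
    return rows

# Empire City Personal Loan Company: founded 1932, dissolved 1938
empire = expand_to_annual("empire_city", 1932, 1938, dissolved=True)
# American Credit Company branch office: 1955, censored in 1965
branch = expand_to_annual("acc_branch_1955", 1955, 1965, dissolved=False)

for row in empire:
    print(row)
```

Each organization contributes several records, but at most one of them carries an event, which is what makes the data "split-spell."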

5 Short Paper Assignment
Project goals:
–1. Test a series of hypotheses (which I provide) using EHA models
–2. Run some simple EHA diagnostics
Check the proportionality assumption for one X variable
Check for outliers using residuals
–3. Write up results (4–5 pages)
Like the methods/results section of a short journal article…

6 Accelerated Failure Time Models
We’ve been modeling the hazard rate: h(t)
Most parametric approaches build on the Cox strategy…
An alternative approach: model log time
Using a parametric approach like exponential or Weibull
Focus is time rather than the hazard rate:
$\ln(T_i) = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + e_i$
where the last term, e, is assumed to have a distribution that defines the model (e.g., making it Weibull)
–Recall: the odd distribution of e is the problem with OLS
–What if we introduced a complex parameter here!
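A small simulation can make the log-time equation concrete. The Python sketch below (an illustration, not from the slides) draws times from ln(T) = β0 + β1X + e, assuming e = ln(E) with E ~ Exponential(1); that extreme-value error is the choice that turns the log-time model into the exponential model. The parameter values are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # E[ln E] = -EULER_GAMMA when E ~ Exponential(1)

def simulate_exponential_aft(beta0, beta1, x, n, seed=1):
    """Draw n event times from ln(T) = beta0 + beta1*x + e.

    The error is e = ln(E) with E ~ Exponential(1): this extreme-value
    error distribution is what makes the log-time model exponential.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        e = math.log(rng.expovariate(1.0))  # extreme-value (minimum) error term
        times.append(math.exp(beta0 + beta1 * x + e))
    return times

# Hypothetical parameter values, chosen only for illustration
beta0, beta1, x = 2.0, -0.5, 1.0
times = simulate_exponential_aft(beta0, beta1, x, n=100_000)

mean_log_t = sum(math.log(t) for t in times) / len(times)
mean_t = sum(times) / len(times)

# Theory: E[ln T] = beta0 + beta1*x - EULER_GAMMA and E[T] = exp(beta0 + beta1*x)
print(f"mean ln(T): {mean_log_t:.3f} (theory {beta0 + beta1 * x - EULER_GAMMA:.3f})")
print(f"mean T:     {mean_t:.3f} (theory {math.exp(beta0 + beta1 * x):.3f})")
```

Note the sign flip: these times have hazard exp(−(β0 + β1X)), so a log-time coefficient of −0.5 corresponds to +0.5 on the log-hazard scale.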

7 Accelerated Failure Time Models
Cleves et al. 2004: AFT (or “log time”) models aren’t actually new kinds of models
Rather, they re-express the same models in a different metric…
Instead of expressing effects on the hazard rate, coefficients reflect effects on log time to event
Instead of “hazard ratios” you can compute “time ratios” (exponentiated AFT coefficients)
–Substantive emphasis is on TIME to event
This can be desirable… more concrete than hazard rates
–Issue: coefficients have opposite signs!
A variable that increases the hazard rate will decrease time to event

8 Proportional Hazard vs. AFT
Blossfeld data: upward employment moves
. streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(exponential) nohr
Exponential regression -- log relative-hazard form
No. of subjects =          591                  Number of obs   =        591
No. of failures =           84
Time at risk    =        40161
                                                LR chi2(6)      =     131.39
Log likelihood  =   -253.68509                  Prob > chi2     =     0.0000
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |   .3020663   .0429622     7.03   0.000     .2178619    .3862708
       coho2 |   .6366232   .2713856     2.35   0.019     .1047172    1.168529
       coho3 |   .7340517   .2766077     2.65   0.008     .1919105    1.276193
         lfx |  -.0022632   .0020781    -1.09   0.276    -.0063363    .0018098
        pnoj |   .1734636   .1003787     1.73   0.084    -.0232751    .3702022
        pres |   -.143771   .0142008   -10.12   0.000     -.171604    -.115938
       _cons |  -5.116249   .6197422    -8.26   0.000    -6.330922   -3.901577
------------------------------------------------------------------------------
Log relative hazard = proportional hazards model

9 Proportional Hazard vs. AFT
Blossfeld data: upward employment moves
. streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(exponential) nohr time
Exponential regression -- accelerated failure-time form
No. of subjects =          591                  Number of obs   =        591
No. of failures =           84
Time at risk    =        40161
                                                LR chi2(6)      =     131.39
Log likelihood  =   -253.68509                  Prob > chi2     =     0.0000
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |  -.3020663   .0429622    -7.03   0.000    -.3862708   -.2178619
       coho2 |  -.6366232   .2713856    -2.35   0.019    -1.168529   -.1047172
       coho3 |  -.7340517   .2766077    -2.65   0.008    -1.276193   -.1919105
         lfx |   .0022632   .0020781     1.09   0.276    -.0018098    .0063363
        pnoj |  -.1734636   .1003787    -1.73   0.084    -.3702022    .0232751
        pres |    .143771   .0142008    10.12   0.000      .115938     .171604
       _cons |   5.116249   .6197422     8.26   0.000     3.901577    6.330922
------------------------------------------------------------------------------
The streg option “time” specifies the AFT form
Note that the log likelihood and z values are the same. However, all signs are reversed, because the coefficients are now in a different metric (effects on log time rather than on the log hazard).
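The link between the two exponential tables can be checked directly. This Python sketch copies the coefficients above and computes hazard ratios, exp(β_PH), and time ratios, exp(β_AFT); in the exponential model β_AFT = −β_PH, so each time ratio is simply the reciprocal of the matching hazard ratio.

```python
import math

# Coefficients copied from the two exponential streg outputs above
ph = {"edu": 0.3020663, "coho2": 0.6366232, "coho3": 0.7340517,
      "lfx": -0.0022632, "pnoj": 0.1734636, "pres": -0.143771}
aft = {"edu": -0.3020663, "coho2": -0.6366232, "coho3": -0.7340517,
       "lfx": 0.0022632, "pnoj": -0.1734636, "pres": 0.143771}

for var in ph:
    # Exponential model: the AFT coefficient is the negated PH coefficient
    assert math.isclose(aft[var], -ph[var])
    hazard_ratio = math.exp(ph[var])  # multiplicative effect on the hazard
    time_ratio = math.exp(aft[var])   # multiplicative effect on time to event
    print(f"{var:6s} HR = {hazard_ratio:.3f}   TR = {time_ratio:.3f}")
```

For edu, the hazard ratio is about 1.35 and the time ratio about 0.74: each additional unit of edu raises the hazard of an upward move by roughly 35% and shortens the expected time to that move by roughly 26%.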

10 Proportional vs. AFT Metric
Weibull models: here, coefficients differ in magnitude as well as sign…
. streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(weibull) nohr
Weibull regression -- log relative-hazard form
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |   .3004217   .0438282     6.85   0.000     .2145201    .3863234
       coho2 |   .6259013   .2775622     2.25   0.024     .0818895    1.169913
       coho3 |   .7189294   .2886739     2.49   0.013     .1531389     1.28472
         lfx |  -.0022896   .0020818    -1.10   0.271    -.0063698    .0017906
        pnoj |   .1719096   .1007356     1.71   0.088    -.0255286    .3693478
        pres |  -.1430822   .0146639    -9.76   0.000     -.171823   -.1143414
       _cons |  -5.043614   .7361298    -6.85   0.000    -6.486402   -3.600826
------------------------------------------------------------------------------
. streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(weibull) nohr time
Weibull regression -- accelerated failure-time form
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |  -.3048278    .046158    -6.60   0.000    -.3952959   -.2143598
       coho2 |   -.635081   .2753596    -2.31   0.021    -1.174776   -.0953861
       coho3 |  -.7294735   .2817224    -2.59   0.010    -1.281639   -.1773078
         lfx |   .0023232   .0021333     1.09   0.276    -.0018581    .0065045
        pnoj |  -.1744309   .1019852    -1.71   0.087    -.3743182    .0254564
        pres |   .1451807   .0163841     8.86   0.000     .1130684    .1772929
       _cons |   5.117586   .6280134     8.15   0.000     3.886702    6.348469
------------------------------------------------------------------------------
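Why do the Weibull coefficients differ in magnitude? For the Weibull model, β_AFT = −β_PH / p, where p is the Weibull shape parameter. The Python sketch below uses only the coefficients printed above to recover the implied p from each pair; Stata also reports p directly beneath the coefficient table, which the slide does not show.

```python
# Weibull coefficients copied from the two streg outputs above
ph = {"edu": 0.3004217, "coho2": 0.6259013, "coho3": 0.7189294,
      "lfx": -0.0022896, "pnoj": 0.1719096, "pres": -0.1430822,
      "_cons": -5.043614}
aft = {"edu": -0.3048278, "coho2": -0.635081, "coho3": -0.7294735,
       "lfx": 0.0023232, "pnoj": -0.1744309, "pres": 0.1451807,
       "_cons": 5.117586}

# Weibull model: beta_AFT = -beta_PH / p, so p = -beta_PH / beta_AFT
implied_p = {var: -ph[var] / aft[var] for var in ph}
for var, p in implied_p.items():
    print(f"{var:6s} implied p = {p:.4f}")
```

Every pair implies p of about 0.986. Because p is close to 1, the fitted Weibull hazard is nearly flat over time, which is why these estimates are so close to the exponential ones.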

11 Accelerated Failure Time Models
Remarks:
–1. AFT models are less common, but you’ll run across them occasionally
–2. It is important to recognize them… because coefficient signs are reversed relative to hazard models!
–3. Stata currently offers more parametric options for AFT models
Log-logistic and log-normal models are only available in AFT form
Their hazards can be non-monotonic (rising, then falling), which might be useful…
–So, you might consider them if you are having trouble with model fit.
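To see how a log-logistic model can produce a hazard that rises and then falls, the Python sketch below evaluates its hazard function. It assumes the AFT parameterization S(t) = 1 / (1 + (λt)^(1/γ)) with λ = exp(−xβ); the values of λ and γ are hypothetical.

```python
def loglogistic_hazard(t, lam, gamma):
    """Hazard of a log-logistic model with S(t) = 1 / (1 + (lam * t) ** (1 / gamma)).

    With u = (lam * t) ** (1 / gamma): h(t) = f(t) / S(t) = u / (gamma * t * (1 + u)).
    """
    u = (lam * t) ** (1.0 / gamma)
    return u / (gamma * t * (1.0 + u))

grid = [0.25, 0.5, 1.0, 2.0, 4.0]
# gamma < 1: the hazard rises, peaks, then falls (non-monotonic)
rising = [loglogistic_hazard(t, lam=1.0, gamma=0.5) for t in grid]
# gamma >= 1: the hazard only falls
falling = [loglogistic_hazard(t, lam=1.0, gamma=1.5) for t in grid]
print([round(h, 3) for h in rising])
print([round(h, 3) for h in falling])
```

A Weibull hazard, by contrast, is always monotonic (rising, falling, or flat), so it cannot capture a rise-then-fall pattern.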

12 Parametric Models & Predictions
Parametric models allow prediction of failure times for all cases
Whether using the proportional hazard or AFT metric
–Strategy: run model, then use the “predict” command
–Issues:
1. You have many prediction options…
–“Mean” estimated time; median estimated time (plus log-time versions)
2. If you have split-spell data, you’ll get a prediction for EACH record in the data
–Predictions take into account X variables
–As X variables change, the predicted time changes, too!
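The mean and median predictions have closed forms. This Python sketch assumes the Weibull AFT survival function S(t) = exp(−(exp(−xb)·t)^p), with xb the linear predictor in log-time units and p the shape parameter (p = 1 gives the exponential model); the inputs in the example are hypothetical.

```python
import math

def weibull_aft_survival(t, xb, p):
    """S(t | x) = exp(-(exp(-xb) * t) ** p)."""
    return math.exp(-((math.exp(-xb) * t) ** p))

def weibull_aft_predictions(xb, p):
    """Return (mean, median) predicted time for a Weibull AFT model."""
    scale = math.exp(xb)
    mean_time = scale * math.gamma(1.0 + 1.0 / p)
    median_time = scale * math.log(2.0) ** (1.0 / p)
    return mean_time, median_time

# Hypothetical linear predictor and shape parameter
mean_t, median_t = weibull_aft_predictions(xb=4.0, p=1.2)
print(f"mean = {mean_t:.1f}, median = {median_t:.1f}")
print(f"S(median) = {weibull_aft_survival(median_t, 4.0, 1.2):.3f}")
```

Because xb changes whenever the X variables change, each split-spell record gets its own mean and median.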

13 Predicted Times
Blossfeld job data (upward moves)
. list id duration event sex time mdtime
     +---------------------------------------------------+
     | id   duration   event   sex       time     mdtime |
     |---------------------------------------------------|
  1. |  1        427       0     1   130.2342   90.27149 |
  2. |  2         45       1     2   192.2021   133.2243 |
  3. |  2         33       0     2   5651.612   3917.399 |
  4. |  2        219       0     2   5131.651    3556.99 |
 14. |  6         25       1     1   205.6662    142.557 |
 20. |  7          5       1     2   116.0007   80.40555 |
 21. |  7         14       0     2   416.3065   288.5616 |
 29. | 10        120       1     1    690.877   478.8794 |
 30. | 10        141       1     1   2412.739   1672.383 |
 31. | 10        120       0     1   21855.97   15149.41 |
 37. | 12         27       1     1   92.27634   63.96109 |
 38. | 12         70       0     1   2605.027   1805.667 |
 39. | 13         38       0     2   774.3403   536.7318 |
 40. | 13        101       0     2   1094.581   758.7059 |
 41. | 14         35       0     2   579.2303   401.4919 |
 42. | 14         86       0     2   528.3259   366.2076 |
 43. | 15         11       0     1   1612.258   1117.532 |
 44. | 15         11       1     1   139.5957   96.76038 |
     +---------------------------------------------------+
Case 7 (row 20): the predicted median time is about 80 months, but the actual upward move occurred at 5 months…
The model really doesn’t expect this case to have an upward job transition…
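The listing has a built-in consistency check. In an exponential model the predicted mean time is exp(xb) and the predicted median is exp(xb)·ln 2, so their ratio should be 1/ln 2 (about 1.4427) for every record. The Python sketch below applies that check to a few rows copied from the listing; treating time as the mean and mdtime as the median prediction is an assumption based on the variable names.

```python
import math

# (time, mdtime) pairs copied from rows 1, 2, 20, and 31 of the listing above
predictions = [(130.2342, 90.27149), (192.2021, 133.2243),
               (116.0007, 80.40555), (21855.97, 15149.41)]

target = 1 / math.log(2)  # exponential model: mean / median = 1 / ln 2
for mean_t, median_t in predictions:
    print(f"mean/median = {mean_t / median_t:.4f}   1/ln 2 = {target:.4f}")
```

All four ratios match 1/ln 2, which is consistent with an exponential model. The listing also shows predictions varying across one case's records (id 10's median ranges from about 479 to 15,149 months) because the X variables change across split spells.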

14 Parametric Models & Predictions
Useful things you can do with predictions:
–1. Highlight some examples to give your reader a concrete sense of event timing…
–2. Construct predictions that reflect different values of X variables
Ex: Run model. Make predictions. Recode Xs. Make further predictions.
–Example: How would the predicted time to event change if the case were male rather than female?
–Ex: Environmental treaties: What is the predicted time to treaty signing if democracy were 10 rather than 1?
Vividly illustrates coefficient effects.
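The "recode Xs, predict again" strategy can be sketched in Python with the exponential AFT coefficients from slide 9, where the predicted mean time is exp(xb). The covariate profile is hypothetical (not a case from the data), and because sex is not in that model, this sketch varies edu instead.

```python
import math

# Exponential AFT coefficients from the "time" output on slide 9
beta = {"edu": -0.3020663, "coho2": -0.6366232, "coho3": -0.7340517,
        "lfx": 0.0022632, "pnoj": -0.1734636, "pres": 0.143771}
cons = 5.116249

def predicted_mean_time(case):
    """Exponential AFT model: predicted mean time = exp(xb)."""
    xb = cons + sum(coef * case[var] for var, coef in beta.items())
    return math.exp(xb)

# Hypothetical covariate profile (not a case from the data)
case = {"edu": 9, "coho2": 1, "coho3": 0, "lfx": 60, "pnoj": 1, "pres": 40}
baseline = predicted_mean_time(case)

# Recode one X, keep everything else, and predict again
counterfactual = dict(case, edu=13)
alternative = predicted_mean_time(counterfactual)
print(f"edu = 9: {baseline:.0f} months; edu = 13: {alternative:.0f} months")
```

Because the model is linear in log time, raising edu by 4 multiplies the predicted time by exp(4 × −0.3020663), about 0.30, whatever the other covariate values are.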

15 Residuals – Summary
From Cleves et al. (2004), An Introduction to Survival Analysis Using Stata, p. 184:
1. Cox-Snell residuals … are useful for assessing overall model fit
2. Martingale residuals are useful in determining the functional form of the covariates to be included in the model
3. Schoenfeld residuals (scaled & unscaled), score residuals, and efficient score residuals are useful for checking & testing the proportional hazards assumption, examining leverage points, and identifying outliers
NOTE: A residual is produced for each independent variable…
4. Deviance residuals are useful in examining model accuracy and identifying outliers.

16 Cox-Snell Residuals
Cox-Snell residual for case i:
$r_{CS_i} = \exp(X_i\hat{\beta})\,\hat{H}_0(t_i)$
where $\hat{H}_0(t)$ is the estimate of the (baseline) cumulative hazard
–Based on model results
The $\hat{\beta}$s are coefficient estimates from the model
$X_i$ are the covariate values for each case in your data
–Interpretation: “The expected number of events in a given time interval”
–Box-Steffensmeier & Jones 2004.
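The residual r_CS_i = exp(X_i β̂)·Ĥ_0(t_i) can be computed by hand once β̂ is known. This Python sketch estimates Ĥ_0(t) with the Breslow estimator, one common choice (software can differ in details such as tie handling). The survival times, event indicators, and linear predictors X_i β̂ are hypothetical.

```python
import math

def breslow_baseline_cumhaz(times, events, xb):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    xb holds each case's linear predictor X_i * beta_hat (taken as given).
    Returns a function t -> H0_hat(t).
    """
    risk = [math.exp(v) for v in xb]
    steps = []  # (event time, H0_hat just after that time)
    cum = 0.0
    for tj in sorted({t for t, d in zip(times, events) if d == 1}):
        d_j = sum(1 for t, d in zip(times, events) if d == 1 and t == tj)
        risk_set = sum(r for t, r in zip(times, risk) if t >= tj)
        cum += d_j / risk_set
        steps.append((tj, cum))

    def H0(t):
        value = 0.0
        for tj, c in steps:
            if tj > t:
                break
            value = c
        return value

    return H0

def cox_snell_residuals(times, events, xb):
    """r_CS_i = exp(X_i * beta_hat) * H0_hat(t_i)."""
    H0 = breslow_baseline_cumhaz(times, events, xb)
    return [math.exp(v) * H0(t) for t, v in zip(times, xb)]

# Hypothetical survival times, event indicators, and linear predictors
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
xb = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1]

r_cs = cox_snell_residuals(times, events, xb)
print([round(r, 3) for r in r_cs])
```

A handy check: with the Breslow estimator, the Cox-Snell residuals sum exactly to the number of observed events.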

17 Cox-Snell Residuals: Model Fit
Cox-Snell residuals can be plotted to assess model fit
If the model fits well, the residuals behave like a censored sample from a unit exponential distribution, so a graph of their integrated (cumulative) hazard against the Cox-Snell residuals themselves will fall on the 45-degree line
–Strategy in Stata:
Run the Cox model, requesting martingale residuals
Use “predict” to compute Cox-Snell residuals
stset your data again, with the Cox-Snell residual as the time variable
Compute the integrated hazard (e.g., Nelson-Aalen)
Graph the integrated hazard versus the residuals.
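The logic of this plot can be checked by simulation. If a model is correct, its Cox-Snell residuals behave like censored draws from a unit exponential distribution, whose cumulative hazard is H(r) = r. The Python sketch below stands in for the last three steps of the Stata strategy above (re-stset, compute the integrated hazard, graph): it simulates such residuals rather than fitting a real model, computes the Nelson-Aalen cumulative hazard treating the residuals as survival times, and measures the gap from the 45-degree line. The censoring rate is an arbitrary assumption.

```python
import random

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard as a list of (event time, H_hat)."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    cum, curve, i = 0.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = tied = 0
        while i < len(pairs) and pairs[i][0] == t:  # group tied times
            deaths += pairs[i][1]
            tied += 1
            i += 1
        if deaths:
            cum += deaths / n_at_risk
            curve.append((t, cum))
        n_at_risk -= tied
    return curve

# Simulated residuals from a correctly specified model: Exp(1),
# with independent exponential censoring (rate 0.5, an arbitrary choice)
rng = random.Random(42)
residuals, events = [], []
for _ in range(20_000):
    r = rng.expovariate(1.0)
    c = rng.expovariate(0.5)
    residuals.append(min(r, c))
    events.append(1 if r <= c else 0)

curve = nelson_aalen(residuals, events)
# Good fit: H_hat(r) is close to r; the sparse right tail is ignored
gap = max(abs(h - t) for t, h in curve if t <= 1.5)
print(f"largest gap from the 45-degree line for r <= 1.5: {gap:.3f}")
```

The check stops at r = 1.5 for the same reason the slides advise not to worry about the right edge of the plot: few cases remain at risk there, so the estimate is noisy.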


19 Cox-Snell Model Fit Example
Cox-Snell plot for Environmental Law data
This looks quite bad. The cumulative hazard should fall on the line… Instead, there is a sizable gap.
Note: Don’t worry much about deviations from the line at the right edge of the plot. There are typically few cases there…

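20 Martingale Residuals
Martingale residuals: more intuitive…
Difference between the observed event indicator (1 = event, 0 = censored) and the expected number of events a case is predicted to have: $M_i = \delta_i - r_{CS_i}$, where $\delta_i$ is the event indicator and $r_{CS_i}$ is the case’s Cox-Snell residual
–Based on the hazard rate given the X variables
Martingale residuals range from −∞ to +1
–Often very skewed
–Deviance residuals: a normalized version of martingale residuals, transformed to be more symmetric around zero.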
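Both residuals follow from the Cox-Snell residual. This Python sketch uses M_i = δ_i − r_CS_i and the standard deviance transform D_i = sign(M_i)·sqrt(−2[M_i + δ_i ln(δ_i − M_i)]), where δ_i − M_i equals r_CS_i. The event indicators and Cox-Snell residuals are hypothetical.

```python
import math

def martingale_residuals(events, cox_snell):
    """M_i = delta_i - r_CS_i (observed minus expected number of events)."""
    return [d - r for d, r in zip(events, cox_snell)]

def deviance_residuals(events, cox_snell):
    """D_i = sign(M_i) * sqrt(-2 * [M_i + delta_i * ln(delta_i - M_i)])."""
    out = []
    for d, r in zip(events, cox_snell):
        m = d - r
        # delta_i - M_i equals r_CS_i; the log term drops out for censored cases
        log_term = math.log(r) if d == 1 else 0.0
        inside = max(0.0, -2.0 * (m + log_term))  # guard against tiny negative rounding
        out.append(math.copysign(math.sqrt(inside), m))
    return out

# Hypothetical event indicators and Cox-Snell residuals
events = [1, 0, 1, 0]
cs = [0.25, 0.8, 1.6, 2.5]
mg = martingale_residuals(events, cs)
dev = deviance_residuals(events, cs)
print([round(v, 3) for v in mg])
print([round(v, 3) for v in dev])
```

The martingale residual can never exceed 1 (an event with almost no expected hazard) but has no lower bound, which is the source of its skew; the deviance transform keeps the sign and pulls the long negative tail in.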

21 MG Residuals and Functional Form
Issue: What functional form of the independent variables should you choose?
Ex: Should you log your independent variables?
–Skewness is one consideration, but you also want to specify the correct relationship between variables…
–In OLS regression we can plot X variables versus residuals to identify departures from linearity
In EHA, we can do something similar:
Estimate a Cox model without covariates; save the martingale residuals
Use the “lowess” command to plot the smoothed mean residual against each X variable (and its transformations)
The functional form whose smoothed curve is closest to a straight line = best.

22 MG Residuals and Functional Form
Stata code:
*
* Use Martingale Residuals to check functional form
*
stset tf, fail(des)
* Estimate a Cox model with NO covariates
* -- option "estimate" makes this happen
* Plus, create a new variable "mg" containing
* Martingale residuals
* (mgale() is older syntax; newer Stata versions use
*  "stcox, estimate" followed by "predict mg, mgale")
stcox, mgale(mg) estimate
* Next, plot residuals versus different transformations
* of your X variables (with smoothed mean: lowess)
lowess mg lfx
lowess mg lfxcubed
lowess mg loglfx
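Outside Stata, the same idea can be sketched in Python. For a Cox model with no covariates, the estimated cumulative hazard is the Nelson-Aalen estimate, so the martingale residual is M_i = δ_i − Ĥ(t_i). Plotting needs a smoother; this sketch substitutes simple equal-count binned means for lowess, a cruder technique. The durations, events, and covariate values are hypothetical.

```python
import math

def null_model_martingale(times, events):
    """Martingale residuals from a Cox model with no covariates.

    M_i = delta_i - H_hat(t_i), where H_hat is the Nelson-Aalen estimate.
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    increments = []
    for tj in event_times:
        d_j = sum(1 for t, d in zip(times, events) if d == 1 and t == tj)
        n_j = sum(1 for t in times if t >= tj)  # risk set at tj
        increments.append((tj, d_j / n_j))
    return [d - sum(inc for tj, inc in increments if tj <= t)
            for t, d in zip(times, events)]

def binned_means(x, y, n_bins=4):
    """Crude smoother (stand-in for lowess): mean x and mean y in equal-count bins."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = math.ceil(len(x) / n_bins)
    points = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        points.append((sum(x[i] for i in idx) / len(idx),
                       sum(y[i] for i in idx) / len(idx)))
    return points

# Hypothetical durations, event indicators, and labor force experience values
times = [2, 3, 3, 5, 6, 8, 9, 11, 12, 14, 15, 18]
events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
lfx = [120, 95, 60, 80, 30, 55, 40, 20, 35, 10, 15, 5]

mg = null_model_martingale(times, events)
for label, values in [("raw", lfx),
                      ("log", [math.log(v) for v in lfx]),
                      ("cubed", [v ** 3 for v in lfx])]:
    print(label, [(round(xm, 2), round(ym, 2)) for xm, ym in binned_means(values, mg)])
```

Because these transformations are monotone, each bin holds the same cases under every version; what changes is the horizontal spacing of the smoothed points, and hence which version lines them up closest to a straight line.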

23 Martingale Functional Form Example
Blossfeld employment termination data
Should labor force experience be raw, logged, or cubed?
Labor force experience is CUBED…
Note the SHARP curve near zero… very nonlinear
This is really bad.

24 Martingale Functional Form Example
Blossfeld employment termination data
Should labor force experience be raw, logged, or cubed?
This is RAW labor force experience
Not bad… close to a straight line.

25 Martingale Functional Form Example
Blossfeld employment termination data
Should labor force experience be raw, logged, or cubed?
Labor force experience, logged
This is the best yet… but not a big difference from raw…

26 Discussion: Empirical Example
Soule, Sarah A., and Susan Olzak. 2004. “When Do Movements Matter? The Politics of Contingency and the Equal Rights Amendment.” American Sociological Review, Vol. 69, No. 4 (Aug. 2004), pp. 473-497.


