1
Event History Analysis 4 Sociology 8811 Lecture 18 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission
2
Announcements
- Class topic: EHA data structures
- Later: more details on models and diagnostics
- Paper Assignment #2 coming soon…
3
Spell Data
- Background concepts: in EHA, each line of data may not represent an entire case
  – Rather, it describes a case for some span of time: either a spell, or part of a spell
  – Often called “multiple-record” data
- The complete specification of a spell entails:
  – Start and end times of the spell
  – State at the start and end of the spell
- But simple analyses don’t always require all this information
  – For instance, the start state is almost always zero, and thus does not need to be specified
4
Spell Data: Example 1
- Example: student completion of graduate school
- Research question: What attributes of students lead to faster completion of PhDs?
- Time clock options: age, duration, historical time, etc.
  – Which is best for this research question?
  – Answer: duration. All we care about is the time until graduation
  – Age or historical era may be useful predictors, but they aren’t the main substantive concern
5
Spell Data: Example 1
- Independent variables of interest:
  – Science field vs. humanities/social science (1 = yes): perhaps a humanities/social science PhD takes longer
  – Married at start of grad school (1 = yes): perhaps family puts pressure to finish school
  – Years since completion of undergrad: perhaps ‘taking a break’ makes you more focused
- Note: all variables are constant for the case
  – None are “time-varying” covariates, i.e., they don’t change from spell to spell
6
Spell Data: Example 1
Full specification of spell data:

ID  start time  end time  start state  end state  science?  married?  yrs after ugrad
1   0           5.6 yrs   0            1          1         1         0
2   0           9 yrs     0            0          0         0         0
3   0           6.8 yrs   0            1          0         0         2
4   0           2.2 yrs   0            0          1         0         0
5   0           7.2 yrs   0            1          0         1         4
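A minimal Python sketch of how the table above might be held in memory. The `Spell` class and its field names are illustrative assumptions, not part of Stata. Counting end states of 1 gives the number of events, summing end minus start gives total time at risk, and their ratio is the crude event rate, which is also the maximum-likelihood estimate of a constant hazard when there are no covariates.

```python
from dataclasses import dataclass

@dataclass
class Spell:
    # Hypothetical record layout: one record per spell,
    # times in years since starting grad school
    id: int
    start: float
    end: float
    start_state: int
    end_state: int        # 1 = PhD completed, 0 = censored
    science: int
    married: int
    yrs_after_ugrad: int

# The five cases from the table above
spells = [
    Spell(1, 0, 5.6, 0, 1, 1, 1, 0),
    Spell(2, 0, 9.0, 0, 0, 0, 0, 0),
    Spell(3, 0, 6.8, 0, 1, 0, 0, 2),
    Spell(4, 0, 2.2, 0, 0, 1, 0, 0),
    Spell(5, 0, 7.2, 0, 1, 0, 1, 4),
]

events = sum(s.end_state for s in spells)             # completed PhDs
time_at_risk = sum(s.end - s.start for s in spells)   # total years observed
crude_rate = events / time_at_risk                    # events per student-year

print(events, round(time_at_risk, 1), round(crude_rate, 4))
```

For these five cases this gives 3 events over 30.8 years, or roughly 0.097 completions per student-year.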
7
Spell Data: Example 1
Notes on example 1:
1. Start time and start state are always zero
   – Typical of duration datasets
   – Often, specification of those variables can be omitted
2. A single line of data (one “record”) was sufficient to describe an entire case
   – Except if an individual earned two PhDs during the study
3. Equivalent information could be coded in multiple records per individual
   – Total time is what matters, not the number of lines of data
8
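Spell Data: Example 1
The information could also be coded in multiple spells. This single record:

ID  start time  end time  start state  end state  science?  married?  yrs after ugrad
1   0           5.6 yrs   0            1          1         1         0

can also be represented as:

ID  start time  end time  start state  end state  science?  married?  yrs after ugrad
1   0           2 yrs     0            0          1         1         0
1   2           3 yrs     0            0          1         1         0
1   3           5.6 yrs   0            1          1         1         0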
9
Spell Data: Example 1
- Advantage of splitting a single spell into many: you can change the values of independent variables over time
  – Referred to as “time-varying” covariates
- Strategies:
  1. Create a new record any time a covariate changes
  2. Or, if variables change on a regular basis (e.g., yearly, daily), split the records by that unit
     – Create a “yearly spell file” or “hourly spell file”
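Both strategies amount to cutting one spell at chosen time points. A minimal Python sketch, assuming a spell is stored as a (start, end, end_state) tuple; `split_at` is a hypothetical helper written for illustration (within Stata itself, episode splitting is usually done with `stsplit`). Attaching the covariate values in force during each piece is left out for brevity.

```python
def split_at(start, end, end_state, cuts):
    """Hypothetical helper: split the spell (start, end] at the given cut times.

    Cuts outside the open interval (start, end) are ignored. Only the last
    piece keeps the spell's end state; earlier pieces end in state 0
    because the event has not happened yet.
    Returns a list of (start, end, end_state) tuples.
    """
    points = [start] + sorted(c for c in set(cuts) if start < c < end) + [end]
    last = len(points) - 2
    return [
        (points[i], points[i + 1], end_state if i == last else 0)
        for i in range(len(points) - 1)
    ]

# Strategy 1: cut wherever a covariate changes (here at years 2 and 3)
print(split_at(0, 5.6, 1, [2, 3]))
# Strategy 2: a "yearly spell file", cutting at every whole year
print(split_at(0, 5.6, 1, range(1, 6)))
```

The first call reproduces the three-record version of case 1 shown earlier; the total time across pieces is unchanged, which is why the two codings carry equivalent information.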
10
Spell Data: Example 2
- Research question #2: How has NSF funding affected the production of PhDs since 1970?
- Time clock: historical time
  – We are interested in the hazard rate per year for all universities, not the duration for individuals
- Independent variables of interest:
  – Department funding from NSF (varies yearly)
  – Department size (varies yearly)
  – Individual age, gender, marital status
11
Spell Data: Example 2
Yearly split-spell historical time data:

ID  start time  end time  start state  end state  funding $  dept size  gender
83  1971        1972      0            0          800k       65         0
83  1972        1973      0            0          600k       71         0
83  1973        1974      0            0          500k       69         0
83  1974        1975      0            1          900k       76         0
84  1988        1989      0            0          2.5m       106        1
12
EHA in Stata
- Stata requires that you specify how your data are organized
- Stata command: stset (“Survival Time Setup”)
  – No analysis commands will work until this is done
  – Options vary depending on the type of data
13
Stata: stset
- Simple data setup: stset timevar, failure(failvar==1)
  – timevar: the variable holding the end time of each spell
  – failvar==1: the variable that distinguishes events from censored spells, and the value that represents an event
- Assumes the following:
  – Start time is always 0
  – Start state is always 0
  – Any other value of failvar (e.g., 0) represents censoring
- Example: stset endtime, failure(endstate==1)
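The assumptions listed above can be written out as a small Python function. This is a sketch of the logic only, not Stata's implementation; `stset_simple` and the dictionary layout are hypothetical, while `_t0`, `_t`, and `_d` are the names of variables Stata creates.

```python
def stset_simple(records, timevar, failvar, failval=1):
    """Hypothetical sketch of the defaults of
    `stset timevar, failure(failvar==failval)` for single-record data:
    each spell starts at time 0 and _d is 1 only when failvar equals
    failval; any other value is treated as censored."""
    return [
        {"_t0": 0, "_t": r[timevar], "_d": int(r[failvar] == failval)}
        for r in records
    ]

# Two cases from Example 1: one completes at 5.6 years, one is censored at 9
data = [
    {"endtime": 5.6, "endstate": 1},
    {"endtime": 9.0, "endstate": 0},
]
print(stset_simple(data, "endtime", "endstate"))
```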
14
Stata: stset options
- id(varname): specifies a case ID
  – Tells Stata that multiple records belong to a particular case
  – Ex: stset mardate, failure(mar==1) id(caseid)
- Necessary for datasets in which cases have multiple records
  – E.g., patient-month records with columns ID, month, dose, age, cured?: case 1 has records for months 1, 2, and 3 (cured in month 3); case 2 has records for months 1, 2, …
15
Stata: stset options
- origin(expression): specifies “zero” time, i.e., when cases become “at risk”
  – If origin is 1970, then 1971 = 1, 1972 = 2, etc.
  – Origin can also vary from case to case, e.g., so that all cases start at zero when a person gets married
- Ex: origin(time yearmarried) starts the clock at the time of marriage
- Ex: origin(dmarried == 1) starts the clock when dmarried first becomes 1
  – i.e., at the END of the first time span in which dmarried equals 1
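A Python sketch of the two origin rules as the slide describes them: a fixed or case-specific origin time shifts the clock, and a condition-based origin starts it at the end of the first record where the condition holds. The helper names and record layout are hypothetical; dropping records that end at or before the origin is this sketch's way of expressing that the case is not yet at risk.

```python
def first_end_where(records, cond):
    """Hypothetical helper: time at which a condition-based origin such as
    origin(dmarried == 1) starts the clock, i.e. the END time of the
    earliest record where cond holds. Returns None if it never holds."""
    for r in sorted(records, key=lambda r: r["end"]):
        if cond(r):
            return r["end"]
    return None

def to_analysis_time(records, origin):
    """Hypothetical helper: convert calendar times to analysis time
    (time - origin). Records ending at or before the origin are dropped
    (not yet at risk); a record straddling the origin is trimmed to start at 0."""
    out = []
    for r in sorted(records, key=lambda r: r["end"]):
        t = r["end"] - origin
        if t > 0:
            out.append({**r, "_t0": max(r["start"] - origin, 0), "_t": t})
    return out

# Hypothetical person-years for one case; dmarried switches to 1 in the 1991-92 record
person = [
    {"start": 1990, "end": 1991, "dmarried": 0},
    {"start": 1991, "end": 1992, "dmarried": 1},
    {"start": 1992, "end": 1993, "dmarried": 1},
]
origin = first_end_where(person, lambda r: r["dmarried"] == 1)
print(origin, to_analysis_time(person, origin))
```

With a fixed origin, `to_analysis_time(records, 1970)` maps a record ending in 1971 to analysis time 1, matching the slide's example.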
16
Stata: stset options
- Left censoring (strictly speaking, left truncation or delayed entry): cases are unobserved PRIOR to some point in time
- enter(expression): specifies when cases come under observation
  – Even though they have been “at risk” since time = 0
  – If entry time is not specified, Stata assumes that cases are under observation at all times in the dataset
- Ex: enter(time enteredstudy): cases enter at the time given by the variable enteredstudy
- Ex: enter(underobs == 1): cases enter when the variable underobs first equals 1
- NOTE: entry occurs at the END time of that record!!!
  – So the case doesn’t really enter until the NEXT record in the data…
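The note about entry at the END time is easy to misread, so here is a minimal Python sketch of the rule as the slide states it. The function and field names are hypothetical; the sketch only flags records in or out (`_st`-style) and ignores other stset options.

```python
def apply_entry(records, cond):
    """Hypothetical sketch of enter(var == value) for one case's records.

    Entry time is the END time of the first record where cond holds, so
    that record itself is excluded and the case enters on the next record.
    Adds _st = 1 for records kept and _st = 0 for records before entry.
    """
    records = sorted(records, key=lambda r: r["end"])
    entry = next((r["end"] for r in records if cond(r)), None)
    out = []
    for r in records:
        kept = entry is not None and r["end"] > entry
        out.append({**r, "_st": int(kept)})
    return out

case = [
    {"start": 0, "end": 1, "underobs": 0},
    {"start": 1, "end": 2, "underobs": 1},   # condition first true here...
    {"start": 2, "end": 3, "underobs": 1},   # ...but the case enters here
]
print([r["_st"] for r in apply_entry(case, lambda r: r["underobs"] == 1)])
```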
17
Stata: stset options
- When do subjects leave the analysis? Stata defaults:
  – When there is no more data
  – When the case has an event (even if there is more data)
- exit(expression): specifies when cases exit the analysis
  – Ex: exit(lefthospital == 1) removes cases at the time that patients leave the hospital
  – Ex: exit(time yrdivorced)
- Also necessary for data with multiple events per case (repeated events): overrides the Stata default and keeps all cases in the analysis, even after the first event
  – Ex: exit(time .)
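A minimal Python sketch contrasting the default exit rule with the two exit() variants mentioned above. The function and field names are hypothetical; it works on one case's records and marks them with an `_st`-style flag.

```python
def apply_exit(records, keep_after_failure=False, exit_time=None):
    """Hypothetical sketch of when one case's records leave the analysis.

    Default (as on the slide): the case exits after its first event.
    keep_after_failure=True mimics exit(time .): all records are kept.
    exit_time mimics exit(time T): records starting at or after T are
    dropped, and a record that straddles T is cut at T and censored.
    Records are dicts with 'start', 'end', and 'fail' (1 = event).
    """
    out, exited = [], False
    for r in sorted(records, key=lambda r: r["end"]):
        rec = dict(r)
        if exited or (exit_time is not None and r["start"] >= exit_time):
            rec["_st"] = 0
        else:
            rec["_st"] = 1
            if exit_time is not None and r["end"] > exit_time:
                rec["end"], rec["fail"] = exit_time, 0   # censored at T
            if rec["fail"] == 1 and not keep_after_failure:
                exited = True
        out.append(rec)
    return out

case = [
    {"start": 0, "end": 1, "fail": 0},
    {"start": 1, "end": 2, "fail": 1},
    {"start": 2, "end": 3, "fail": 0},
    {"start": 3, "end": 4, "fail": 1},
]
print([r["_st"] for r in apply_exit(case)])                           # default
print([r["_st"] for r in apply_exit(case, keep_after_failure=True)])  # exit(time .)
```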
18
Stata: stset options
- Issue: in multiple-record data, Stata assumes there are no gaps
  – If a case’s records end at times 5, 10, …, Stata assumes the first spell is 0 to 5 and the second is 5 to 10
  – i.e., the second spell begins right when the first ends
- But sometimes you have gaps in your data
  – Cases leave the analysis or are censored for a while
- time0(expression): ex: time0(starttime)
  – Lets you specify when each record begins (rather than relying on Stata’s assumptions)
  – Lets you indicate that a case re-entered LATER than the end of its last spell
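A small Python sketch of why the no-gap assumption matters: without a start variable, a gap in the data is silently filled; with one, as `time0(starttime)` allows, the gap becomes visible. The function names and record layout are hypothetical.

```python
def spans(records, start_var=None):
    """Hypothetical helper: (_t0, _t) for one case's records, ordered by end time.

    Without start_var, each spell is assumed to start where the previous one
    ended (the first at 0): Stata's no-gap assumption. With start_var, the
    recorded start time is used instead, as with time0(starttime).
    """
    out, prev_end = [], 0
    for r in sorted(records, key=lambda r: r["end"]):
        t0 = r[start_var] if start_var else prev_end
        out.append((t0, r["end"]))
        prev_end = r["end"]
    return out

def gaps(span_list):
    """(from, to) intervals between consecutive spans where the case is unobserved."""
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(span_list, span_list[1:])
        if next_start > prev_end
    ]

case = [{"starttime": 0, "end": 5}, {"starttime": 7, "end": 10}]
print(spans(case), gaps(spans(case)))                            # gap is hidden
print(spans(case, "starttime"), gaps(spans(case, "starttime")))  # gap from 5 to 7 shows up
```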
19
EHA Commands: stset
- It is easy to make mistakes with stset! Diagnostic strategies:
  1. Look at the stset output: Stata identifies possible errors
     – It isn’t always right, but it is a start
  2. Use the stdes command
     – Generates summary statistics: total number of cases, spells, and events
     – Plus minimum/maximum entry and exit times
     – Also: the stvary command if you have multiple-record data
  3. Examine Stata’s time variables…
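As a sketch of what to look for in those summaries, the Python function below computes similar totals from stset-style variables. It does not reproduce Stata's stdes output format; the function name and record layout are assumptions. Comparing such totals against what you already know about the data (number of cases, number of events) is a quick way to catch stset errors.

```python
def describe(records):
    """Hypothetical rough analogue of the totals stdes reports, computed
    only over records flagged _st == 1. Each record is a dict with 'id',
    '_t0', '_t', '_d', and '_st'."""
    used = [r for r in records if r["_st"] == 1]
    return {
        "subjects": len({r["id"] for r in used}),
        "records": len(used),
        "failures": sum(r["_d"] for r in used),
        "time_at_risk": sum(r["_t"] - r["_t0"] for r in used),
        "min_entry": min(r["_t0"] for r in used),
        "max_exit": max(r["_t"] for r in used),
    }

data = [
    {"id": 1, "_t0": 0, "_t": 2, "_d": 0, "_st": 1},
    {"id": 1, "_t0": 2, "_t": 3, "_d": 1, "_st": 1},
    {"id": 2, "_t0": 1, "_t": 4, "_d": 0, "_st": 1},
    {"id": 2, "_t0": 4, "_t": 6, "_d": 0, "_st": 0},   # excluded record
]
print(describe(data))
```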
20
EHA Commands: stset
- stset creates a series of new variables that Stata uses in the analysis:
  – _t0 and _t: the start and end of each time span
  – _d: whether an event occurs at the end of the span
  – _st: whether the time span is included in the analysis, versus excluded because the case hadn’t entered yet or had already exited
- If you are having problems with stset, look at these variables to see what is going wrong
  – Ex: list yrmarried dmarried _t0 _t _d _st
  – Or just look in the data editor/viewer
21
Event History Example
- What factors affect how soon a country passes an environmental protection law?
- In this analysis it is possible to use time-varying data… aren’t you excited?
- What is the “state” space? “Law” vs. “no law”
- What is the “event”? Passing an environmental law in a given year
22
Event History Example
- What is the “risk set”? Every country that has not yet passed an environmental protection law
- What is the duration of interest? Time from a country becoming “at risk” to adoption of a law
- What is an appropriate “time clock”?
  – Option 1: the number of years between independence and when the law was written
  – Option 2: historical time, based on an origin time at which countries become “at risk”
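A Python sketch of the risk set under Option 2, assuming (as in the dataset described below) that countries enter in 1970 or at independence, whichever is later, and remain at risk through the year they write their first law. The country data and the `risk_set` helper are hypothetical, for illustration only.

```python
def risk_set(countries, year, first_year=1970):
    """Hypothetical helper: names of countries at risk of passing a first
    environmental law in `year`.

    countries: (name, independence_year, first_law_year or None) tuples.
    A country enters at the later of first_year and independence, and stays
    at risk up to and including the year its first law is written.
    """
    return [
        name
        for name, independence, law_year in countries
        if max(first_year, independence) <= year
        and (law_year is None or year <= law_year)
    ]

# Hypothetical countries, for illustration only
countries = [("A", 1947, 1986), ("B", 1975, None), ("C", 1950, 1972)]
for y in (1971, 1973, 1980, 1990):
    print(y, risk_set(countries, y))
```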
23
Example: Environmental Laws
- Cross-national time-series dataset of nearly 100 countries
- Event: when a country writes its first comprehensive environmental law (e.g., EPA)
- Data taken from various sources
- Independent variables: GDP, population, democracy, degradation, education, domestic and international NGOs
- Time period: analyses cover 1970 to 1998
  – In other words, countries enter the “risk set” in 1970, or when they become independent
- Total sample of 97 countries
  – 73 countries have an event between 1970 and 1998
24
Time-Varying Data Structure
- In the previous example, each row of data was a separate survey respondent
  – Because survey respondents were not tracked over multiple years, those data were not “time-varying”
- In the current example, we have the advantage of time-varying data
  – Each row of data is a country-year
  – Our independent variables may change over time
25
States, Spells, and Events
[Figure: Example (India). State (0 = no law, 1 = law) by year, 1970 to 1998. Spell #1 runs until the law is written; Spell #2 follows.]
26
States, Spells, and Events
[Figure: Example (Iran). State by year, 1970 to 1998. A single spell (Spell #1); no law written as of 1998.]
27
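Time-Varying Data Structure
Example (India), one record per country-year (ss = start state, es = end state, pop = population):

newname2  newid3  year  law  eventnum  start  end   ss  es  pop
INDIA     1119    1978  0    1         1978   1979  0   0   656941
INDIA     1119    1979  0    1         1979   1980  0   0   672021
INDIA     1119    1980  0    1         1980   1981  0   0   687332
INDIA     1119    1981  0    1         1981   1982  0   0   702821
INDIA     1119    1982  0    1         1982   1983  0   0   718426
INDIA     1119    1983  0    1         1983   1984  0   0   734072
INDIA     1119    1984  0    1         1984   1985  0   0   749677
INDIA     1119    1985  0    1         1985   1986  0   0   765147
INDIA     1119    1986  1    1         1986   1987  0   1   781893
INDIA     1119    1987  0    1         1987   1988  1   1   798680
INDIA     1119    1988  0    1         1988   1989  1   1   815590
INDIA     1119    1989  0    1         1989   1990  1   1   832535
INDIA     1119    1990  0    1         1990   1991  1   1   849515
INDIA     1119    1991  0    1         1991   1992  1   1   866530

Slide annotations mark the year the law is written (law), the spell (start, end), the state (ss, es), and population (pop).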
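The ss and es columns can be derived mechanically from the year the law is written. A minimal Python sketch, assuming the law column marks the year of writing and that having a law is an absorbing state (once a country has a law, it stays in state 1); `code_states` is a hypothetical helper.

```python
def code_states(law_by_year):
    """Hypothetical helper: start state (ss) and end state (es) for an
    absorbing "no law" -> "law" process.

    law_by_year: (year, law) pairs with law = 1 in a year a law is written.
    es becomes 1 in the year of the first law and stays 1; ss is the
    previous year's es (0 in the first year).
    Returns (year, ss, es) tuples.
    """
    rows, state = [], 0
    for year, law in sorted(law_by_year):
        ss = state
        state = 1 if (state == 1 or law == 1) else 0
        rows.append((year, ss, state))
    return rows

# India as in the table above: the law is written in 1986
india = [(y, int(y == 1986)) for y in range(1978, 1992)]
for row in code_states(india):
    print(row)
```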
28
Time-Varying Data Structure
(Same India country-year records as on the previous slide.)
- stset command: stset end, failure(es==1) origin(1970)
- Note: it is common to drop cases that are not at risk (e.g., if start state = 1)
  – BUT this is not necessary if stset is done correctly: Stata drops records after the event by default
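To see what the stset command above does to the India records, here is a Python sketch that reproduces its logic for one country. It is not Stata's code, and `stset_rows` is hypothetical. It makes the last note concrete: the post-1986 records with es = 1 need not be dropped by hand, because they are excluded (_st = 0) after the first event.

```python
def stset_rows(rows, origin=1970):
    """Hypothetical sketch of what `stset end, failure(es==1) origin(1970)`
    produces for one country's (start, end, es) records.

    _t0/_t are measured from the origin year, _d = 1 on the first record
    with es == 1, and every later record gets _st = 0 because, by
    default, a case leaves the analysis after its first event.
    """
    out, failed = [], False
    for start, end, es in sorted(rows, key=lambda r: r[1]):
        out.append({
            "start": start,
            "_t0": start - origin,
            "_t": end - origin,
            "_d": int(es == 1 and not failed),
            "_st": 0 if failed else 1,
        })
        if es == 1:
            failed = True
    return out

# India: es = 1 from the 1986 record onward
india = [(y, y + 1, int(y >= 1986)) for y in range(1978, 1992)]
for r in stset_rows(india):
    print(r)
```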
29
Time-Varying Data Structure
- What if countries pass multiple laws? These are called “repeated events”
  1. The start state could be reset to zero
  2. We can override the Stata default of removing cases after the first event occurs: exit(time .)

newname2  newid3  year  law  eventnum  start  end   ss  es  pop
INDIA     1119    1978  0    1         1978   1979  0   0   656941
INDIA     1119    1979  0    1         1979   1980  0   0   672021
INDIA     1119    1980  0    1         1980   1981  0   0   687332
INDIA     1119    1981  0    1         1981   1982  0   0   702821
INDIA     1119    1982  0    1         1982   1983  0   0   718426
INDIA     1119    1983  0    1         1983   1984  0   0   734072
INDIA     1119    1984  0    1         1984   1985  0   0   749677
INDIA     1119    1985  0    1         1985   1986  0   0   765147
INDIA     1119    1986  1    1         1986   1987  0   1   781893
INDIA     1119    1987  0    1         1987   1988  0   0   798680
INDIA     1119    1988  0    1         1988   1989  0   0   815590
INDIA     1119    1989  0    1         1989   1990  0   0   832535
INDIA     1119    1990  0    1         1990   1991  0   1   849515
INDIA     1119    1991  0    1         1991   1992  0   0   866530
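With repeated events it can help to know which spell each record belongs to. A minimal Python sketch, assuming records are ordered by year and the state resets to 0 after each event, as in the table above; `number_spells` is a hypothetical helper.

```python
def number_spells(es_by_year):
    """Hypothetical helper for repeated events kept with exit(time .):
    number each country-year by the spell it belongs to. A new spell
    starts in the year after each event. Returns (year, spell, es) tuples."""
    out, spell = [], 1
    for year, es in sorted(es_by_year):
        out.append((year, spell, es))
        if es == 1:
            spell += 1
    return out

# India as in the table above, with the state reset after each event:
# es = 1 in the 1986 and 1990 records
india = [(y, int(y in (1986, 1990))) for y in range(1978, 1992)]
for row in number_spells(india):
    print(row)
```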
30
Cumulative Survivor Function
31
Cumulative Survivor Function by Region
32
Cumulative Survivor Function West vs. non-West
33
Smoothed Hazard Function West vs. non-West
34
Constant Rate Model: Example
Simple one-variable model comparing West vs. non-West

Exponential regression -- log relative-hazard form

No. of subjects = 97           Number of obs = 2047
No. of failures = 81           Wald chi2(1)  = 12.10
Time at risk    = 2047         Prob > chi2   = 0.0005
Log pseudolikelihood = 275.49924
(Std. Err. adjusted for 97 clusters in newid3)

------------------------------------------------------------------------------
             |                Robust
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        west |   .6931146   .1992638     3.48   0.001     .3025648    1.083664
       _cons |   -3.34054   .0807514   -41.37   0.000     -3.49881    -3.18227
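In the log relative-hazard form the coefficients are logs of hazard ratios, so exponentiating them turns the output into rates. A short Python worked example using the two estimates above; the expected waiting times and survivor values are implied by the constant-rate assumption rather than observed, and the yearly time unit follows from the country-year data.

```python
import math

# Coefficients from the output above (log relative-hazard metric)
b_cons, b_west = -3.34054, 0.6931146

rate_nonwest = math.exp(b_cons)          # yearly hazard when west = 0
rate_west = math.exp(b_cons + b_west)    # yearly hazard when west = 1
hazard_ratio = math.exp(b_west)          # West relative to non-West

for label, rate in [("non-West", rate_nonwest), ("West", rate_west)]:
    # Constant rate: expected wait = 1 / rate, survivor S(t) = exp(-rate * t)
    print(f"{label}: rate {rate:.4f}/yr, expected wait {1 / rate:.1f} yrs, "
          f"S(10) = {math.exp(-rate * 10):.3f}")
print(f"hazard ratio (West vs. non-West) = {hazard_ratio:.3f}")
```

Since .6931146 is almost exactly ln 2, the estimated rate of passing a law for Western countries is about twice the non-Western rate (roughly 0.071 vs. 0.035 per year).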
35
Constant Rate Model: Example
Model with time-varying covariates

No. of subjects = 92           Number of obs = 1938
No. of failures = 77           Wald chi2(6)  = 94.29
Time at risk    = 1938         Prob > chi2   = 0.0000
Log pseudolikelihood = 282.11796
(Std. Err. adjusted for 92 clusters in newid3)

------------------------------------------------------------------------------
             |                Robust
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   -.044568   .1842564    -0.24   0.809    -.4057039    .3165679
 degradation |  -.4766958   .1044108    -4.57   0.000    -.6813372   -.2720543
   education |   .0377531   .0130314     2.90   0.004     .0122121    .0632942
   democracy |   .2295392   .0959669     2.39   0.017     .0414475     .417631
         ngo |   .4258148   .1576803     2.70   0.007     .1167671    .7348624
        ingo |   .3114173    .365112     0.85   0.394    -.4041891    1.027024
       _cons |  -4.565513   1.864396    -2.45   0.014    -8.219663   -.9113642
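Because each coefficient is a log relative hazard, 100 * (exp(b) - 1) gives the percent change in the rate of passing a law for a one-unit increase in that covariate, holding the others fixed. A Python sketch applying this to the estimates above; the units are whatever the variables are coded in, which the output does not show, and statistical significance still comes from the z and P>|z| columns.

```python
import math

# Coefficients (log relative hazards) from the time-varying covariate model above
coefs = {
    "gdp": -0.044568,
    "degradation": -0.4766958,
    "education": 0.0377531,
    "democracy": 0.2295392,
    "ngo": 0.4258148,
    "ingo": 0.3114173,
}

# Percent change in the hazard for a one-unit increase, other covariates held fixed
pct_change = {name: 100 * (math.exp(b) - 1) for name, b in coefs.items()}

for name, pct in pct_change.items():
    print(f"{name:12s} hazard ratio {math.exp(coefs[name]):.3f}  change {pct:+.1f}%")
```

For example, a one-unit increase in degradation is associated with a hazard about 38% lower.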
36
Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in the non-West?
Option #1: Create an interaction term

No. of subjects = 92           Number of obs = 1938
No. of failures = 77           Wald chi2(8)  = 91.25
Time at risk    = 1938         Prob > chi2   = 0.0000
Log pseudolikelihood = 282.5435
(Std. Err. adjusted for 92 clusters in newid3)

------------------------------------------------------------------------------
             |                Robust
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |  -.0789765   .2546507    -0.31   0.756    -.5780827    .4201298
 degradation |  -.4656443   .1177774    -3.95   0.000    -.6964838   -.2348047
   education |   .0425672   .0137641     3.09   0.002       .01559    .0695444
   democracy |   .2277121   .0951693     2.39   0.017     .0411836    .4142406
         ngo |   .4069064   .1595268     2.55   0.011     .0942397    .7195732
        ingo |  -.1326514   .6842896    -0.19   0.846    -1.473834    1.208532
     nonwest |  -3.345421    4.94285    -0.68   0.499    -13.03323    6.342387
ingoXnonwest |     .49408   .6819827     0.72   0.469    -.8425815    1.830741
       _cons |   -1.28664   5.692187    -0.23   0.821    -12.44312    9.869841
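With an interaction term, the ingo effect for non-Western countries is the sum of the main effect and the interaction coefficient. A Python worked example of that arithmetic. A standard error for the sum needs the covariance of the two estimates, which the output above does not show; in Stata, `lincom ingo + ingoXnonwest` run after the model would report it.

```python
import math

# Estimates from the interaction model above
b_ingo = -0.1326514          # ingo effect when nonwest = 0 (West)
b_ingo_x_nonwest = 0.49408   # additional ingo effect when nonwest = 1

effect_west = b_ingo
effect_nonwest = b_ingo + b_ingo_x_nonwest   # main effect + interaction

print(f"West:     b = {effect_west:.4f}, hazard ratio = {math.exp(effect_west):.3f}")
print(f"non-West: b = {effect_nonwest:.4f}, hazard ratio = {math.exp(effect_nonwest):.3f}")
```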
37
Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in the non-West?
Option #2: Include only non-Western countries in the analysis

No. of subjects = 76           Number of obs = 1720
No. of failures = 61           Wald chi2(6)  = 55.26
Time at risk    = 1720         Prob > chi2   = 0.0000
Log pseudolikelihood = 215.57325
(Std. Err. adjusted for 76 clusters in newid3)

------------------------------------------------------------------------------
             |                Robust
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .3521921   .3470927     1.01   0.310    -.3280971    1.032481
 degradation |  -.7326479   .2566293    -2.85   0.004    -1.235632   -.2296637
   education |   .0314009   .0193698     1.62   0.105    -.0065633     .069365
   democracy |   .2387203   .0935281     2.55   0.011     .0554087     .422032
         ngo |   .3604018   .1984957     1.82   0.069    -.0286426    .7494462
        ingo |   .5447586   .4949746     1.10   0.271    -.4253738    1.514891
       _cons |  -8.446306   3.872579    -2.18   0.029    -16.03642   -.8561915
38
Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in developing countries, but only in earlier periods?
We can end the analysis in 1990

No. of subjects = 71           Number of obs = 1348
No. of failures = 21           Wald chi2(6)  = 47.11
Time at risk    = 1348         Prob > chi2   = 0.0000
Log pseudolikelihood = 64.196123
(Std. Err. adjusted for 71 clusters in newid3)

------------------------------------------------------------------------------
             |                Robust
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .6710042   .8928315     0.75   0.452    -1.078913    2.420922
 degradation |  -.8862021   .6193312    -1.43   0.152    -2.100069    .3276647
   education |   .0106282   .0399874     0.27   0.790    -.0677456    .0890021
   democracy |  -.0359767   .2079641    -0.17   0.863    -.4435789    .3716255
         ngo |   .2688239   .3306838     0.81   0.416    -.3793045    .9169522
        ingo |   1.933407   .5669864     3.41   0.001     .8221337    3.044679
       _cons |   -19.4724   7.021114    -2.77   0.006    -33.23353   -5.711268