1
Event History Analysis 1 Sociology 8811 Lecture 14 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission
2
Announcements Paper #1 due on Thursday! Questions? New Topic: Event History Analysis
3
Regression and EHA: Examples Medical Research on Drug Efficacy Question #1: Do patients with larger doses of a drug have lower cholesterol? Approach: OLS Regression If assumptions are met, OLS is appropriate Independent Variable = dosage (“level” of drug) Dependent Variable = cholesterol (“level”)
4
Regression Example: Cholesterol. The relationship between the level of X and Y is modeled as a linear function: Y = a + bX + e. [Figure: Cholesterol Level (y-axis, 100 to 300) plotted against Drug Dosage in mg (x-axis, 0 to 70).]
5
Example 2: Drug & Mortality Suppose a different question: Does increased drug dosage reduce the incidence of mortality among patients? The dependent variable has a different character 1. Whereas cholesterol is measured as a “level” (continuously), mortality is “discrete” Either the patient lives or they don’t (not a “level”) 2. Also, TIMING is an issue Not just if a patient survives, but how long A drug that extends life is good, even if patients die
6
Logit/Probit Strategies Research strategies to address this problem: 1. Use a non-linear regression model for discrete outcomes: Logit, Probit, etc. Dependent variable is a dummy for patient mortality Look for relationship between dosage and mortality Benefit: Easy. An analog of regression Limitation: Doesn’t take timing into account All patients that die have the same influence on the model (whether they live 5 days or 20 years due to the drug dosage).
7
Logit/Probit Strategy: Visual. The relationship between the level of X and the discrete variable Y is modeled as a non-linear function. [Figure: Mortality (y-axis, No/Yes) plotted against Drug Dosage in mg (x-axis, 0 to 70).]
8
Drug & Mortality: OLS Regression Option #2: Use OLS regression to model the time elapsed (duration) until mortality –Rather than ask “did they live or die” (logit/probit), you ask “how long did they live”? Compute a variable that reflects the time until mortality (in relevant time units – e.g., months since drug therapy is started) Model time as the dependent variable Observe: Do patients with high drug doses die later than ones with low doses?
9
OLS Duration Strategy: Visual. Q: Where do you put individuals who were alive at the end of the study? [Figure: Months Until Mortality (y-axis, 0 to 80) plotted against Drug Dosage in mg (x-axis, 0 to 70).]
10
Drug & Mortality: OLS Regression Problem #1: What about patients who don’t experience mortality during study? This is called “censored data” If study is 80 months, you know that Y>80… –But, you don’t have an exact value What do you do? –Treat them as experiencing mortality at the very end of the study? Or approximate time of mortality? –Exclude them? NO! That selects on the dependent variable! Possible solution: Use models for censored data –Ex: tobit model.
11
Drug & Mortality: OLS Regression Problem #2: Temporal data often violates normality assumption of OLS regression Often violations are quite bad “Censored” data is a surmountable problem, but normality violation is usually not So – we shouldn’t typically use OLS!
12
Drug and Mortality: EHA Strategy. Event History Analysis (EHA) provides purchase on this exact type of problem, and others as well. In essence, EHA models a dependent variable that reflects both: 1. whether or not a patient experiences mortality (like logit), and 2. when it occurs (like an OLS regression of duration). Note: This information is typically encoded in 2 or more variables.
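The two-variable encoding can be sketched in Python as follows. This is a minimal illustration, not the course's own code: the function name `encode_outcome`, the patient values, and the 80-month study length (borrowed from the censoring example on the earlier slide) are assumptions made for the example.

```python
STUDY_LENGTH_MONTHS = 80  # assumed study length, taken from the censoring example


def encode_outcome(death_month):
    """Return (duration, event) for one patient.

    death_month is the month of death, or None if the patient was still
    alive when observation stopped. event is 1 if mortality was observed
    during the study and 0 if the case is right censored.
    """
    if death_month is None or death_month > STUDY_LENGTH_MONTHS:
        # Censored: we only know the patient survived the whole study.
        return (STUDY_LENGTH_MONTHS, 0)
    return (death_month, 1)


# Hypothetical patients: two die during the study, one survives it.
patients = [12, 33, None]
encoded = [encode_outcome(m) for m in patients]
print(encoded)  # [(12, 1), (33, 1), (80, 0)]
```

Keeping the censored patient as (80, 0) rather than dropping the case or treating month 80 as a death is what lets EHA use the information that the patient survived at least 80 months.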
13
Drug and Mortality: EHA Strategy Moreover: EHA is very flexible and can address various situations: 1. EHA can address “repeated” events Mortality can only occur once per patient. But, heart attack can occur repeatedly, at different points in time – further confounding OLS or probit 2. EHA can address different time-clocks Durations could be coded in a number of contexts: From start of study. Age of patient. Historical time. And even more complex issues
14
EHA: Overview and Terminology EHA is referred to as “dynamic” modeling i.e., addresses the timing of outcomes: rates Dependent variable is best conceptualized as a rate of some occurrence Not a “level” or “amount” as in OLS regression Think: “How fast?” “How often?” The “occurrence” may be something that can occur only once for each case: e.g., mortality Or, it may be repeatable: e.g., marriages, strategic alliances.
15
EHA: Overview. EHA involves both descriptive and parametric analysis of data, just like regression: scatterplots and partial plots are descriptive; OLS models and hypothesis tests are parametric. Descriptive analyses/plots allow description of the overall rate of some outcome, for all cases or for various subgroups. Parametric models allow hypothesis testing about variables that affect the rate (and can include control variables).
16
EHA: Types of Questions Some types of questions EHA can address: 1. Mortality: Does drug dosage reduce rates? Does “rate” decrease with larger doses? Also: control for race, gender, treatment options, etc 2. Life stage transitions: timing of marriage Is rate affected by gender, class, religion? 3. Organizational mortality Is rate affected by size, historical era, competition? 4. Civil war Is rate affected by economic, political factors?
17
EHA Terminology: States & Events EHA has evolved its own terminology: “State” = the “state of being” of a case Conceptualized in terms of discrete phenomena e.g., alive vs. dead “State space” = the set of all possible states Can be complex: Single, married, divorced, widowed “Event” = Occurrence of the outcome of interest Shift from “alive” to “dead”, “single” to “married” Occurs at a specific, known point in time
18
Terminology: Risk & Spells “Risk Set” = the set of all cases capable of experiencing the event e.g., those “at risk” of experiencing mortality Note: the risk set changes over time… “Spell” = A chunk of time that a case experiences, bounded by: events, and/or the start or end of the study As in “I’m gonna sit here for a spell…” EHA is, in essence, an analysis of a set of spells (experienced by a given sample of cases).
19
States, Spells, & Events: Visually If we assign numeric values to states, it is easy to graph cases over time As they experience 1 or more spells Example: drug & mortality study States: Alive = 0 Dead = 1 Time = measured in months Starting at zero, when the study begins Ending at 60 months, when study ends (5 years).
20
States, Spells, & Events: Visually. Example of mortality at month 33. [Figure: State (y-axis, 0 or 1) over Time in months (x-axis, 0 to 60); Spell #1 runs until the Event at month 33, and Spell #2 runs from the Event to End of Study.] Note: It takes 2 spells to describe this case. But we may only be interested in the first spell, because there is no possibility of change after the transition to state = 1.
21
States, Spells, & Events: Visually. Example of a patient who is cured, i.e., doesn’t experience mortality during the study. [Figure: State (y-axis, 0 or 1) over Time in months (x-axis, 0 to 60); a single Spell #1 in state 0 lasting until End of Study.] Note: Only 1 spell is needed. The spell indicates a consistent state (0) for the period of time in which we have information.
22
More Terminology: Censoring Note: In both cases, data runs out after month 60 Even if the patient is still alive In temporal analysis, we rarely have data for all relevant time for all cases “Censored” = indicates the absence of data before or after a certain point in time As in: “data on cases is censored at 60 months” “Right Censored” = no data after a time point “Left Censored” = no data before a time point
23
States, Spells, & Events: Visually. A more complex state space: partnership. 0 = single, 1 = married, 2 = divorced, 3 = widowed. Individual history: married at 20, divorced at 27, remarried at 33. [Figure: State (y-axis, 0 to 3) over Age in years (x-axis, 16 to 44); Spells #1 through #4, with the final spell right censored at 45.]
24
Measuring States and Times EHA, in short, is the analysis of spells It takes into account the duration of spells, and whether or not there was a change of state at the end States at start and end of spell are measured by assigning pre-defined values to a variable Much like logit/probit or multinomial logit Times at the start and end of spell must also be measured Time Unit = The time metric in the study e.g., minutes, hours, days, months, years, etc
25
Time Clock Time Clock = time reference of the analysis Possibilities: Duration since start of study Chronological age of case (person, firm, country) Duration since end of last spell i.e., clock is set to zero at start of each spell Historical time – the actual calendar date The choice of time-clock can radically change the analysis and meaning of results It is crucial to choose a clock that makes sense for the hypotheses you wish to test
26
Time Clocks Visually: Age. [Figure: State (y-axis, 0 to 3) over Age in years (x-axis, 16 to 44); Spells #1 through #4, ending at End of Study.] EHA examines the rate of transitions as a function of a person’s age.
27
Time Clocks Visually: Duration. Single from 16-20 (4 years), married from 20-27 (7 years), divorced from 27-33 (6 years), remarried from 33-45 (12 years). [Figure: State (y-axis, 0 to 3) over Duration in years (x-axis); Spells #1 through #4, each measured from the start of the spell.] EHA examines the rate of transitions as a function of a person’s duration in their current state.
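Switching from the age clock to the duration clock amounts to resetting time to zero at the start of each spell. The Python sketch below applies this to the partnership history above. The `Spell` class and its field names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Spell:
    state: int        # 0 = single, 1 = married, 2 = divorced, 3 = widowed
    start_age: int    # age clock: age when the spell begins
    end_age: int      # age clock: age when the spell ends
    censored: bool    # True if the spell ends because data runs out


# The individual history from the slides, recorded on the age clock.
history = [
    Spell(state=0, start_age=16, end_age=20, censored=False),
    Spell(state=1, start_age=20, end_age=27, censored=False),
    Spell(state=2, start_age=27, end_age=33, censored=False),
    Spell(state=1, start_age=33, end_age=45, censored=True),
]

# Duration clock: each spell starts at 0 and lasts end_age - start_age years.
durations = [s.end_age - s.start_age for s in history]
print(durations)  # [4, 7, 6, 12]
```

The same four spells yield different time values under each clock, which is why the choice of clock changes what a fitted rate means.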
28
Time Clocks: General Advice Different time-clocks have different strengths We’ll discuss this more… Chronological Age = good for processes clearly linked to age Biological things: fertility, mortality Liability of newness Historical time = useful for examining the impact of historical change on ongoing phenomena E.g., effects of changing regulatory regimes on rates of strategic alliances
29
Moving Toward Analyses: Example Example: Employee retention How long after hiring before employees quit? Data: Sample of 12 employees at McDonalds Time-Clock/Time Unit: duration of employment from time of hiring (measured in days) 2 Possible states: Employed & No longer employed We are uninterested in subsequent hires Therefore, we focus on initial spell, ending in quitting.
30
Example: Employee Retention. Visually, the red line indicates the length of the employment spell for each case. [Figure: one line per case over Time in days (x-axis, 0 to 120); cases still employed at the end are marked Right Censored.]
31
Simple EHA Descriptives. Question: What simple things can we do to describe this sample of 12 employees? 1. Average duration of employment. This is a fairly useful summary statistic: it gives a sense of the overall speed of events, and is especially useful when broken down by sub-groups (e.g., average by gender or compensation plan). Caveat: it only works if all (or nearly all) have quit; many censored cases make the “average” meaningless.
32
Descriptives: Average Duration. Simply calculate the mean time-to-quitting. [Figure: employment spells for each case over Time in days (x-axis, 0 to 120), with right-censored cases marked; Average = 33.4 days.]
33
Simple EHA Descriptives Question: What simple things can we do to describe this sample of 12 employees? 2. Compute “Half Life” of employee tenure Determine time at which attrition equals 50% Also highlights the overall turnover rate Note: Exact value is calculable, even if there are censored cases Again, computing for sub-groups is useful
34
Descriptives: Half Life. Determine the time when ½ of the sample has had the event. [Figure: employment spells for each case over Time in days (x-axis, 0 to 120), with right-censored cases marked; Half Life = 23 days.]
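The half-life calculation can be sketched in Python as below. The twelve durations are hypothetical, since the slides do not list the underlying data. They are chosen only so that the sixth quit falls on day 23, in line with the half life reported above. The function name `half_life` is also just an illustration. The shortcut is valid only when no case is censored before the half-life is reached.

```python
import math


def half_life(durations, observed):
    """Earliest time at which at least half of the sample has had the event.

    Assumes no case is censored before that time. Returns None when fewer
    than half of the cases ever experience the event.
    """
    n = len(durations)
    needed = math.ceil(n / 2)
    event_times = sorted(t for t, seen in zip(durations, observed) if seen)
    if len(event_times) < needed:
        return None
    return event_times[needed - 1]


# Hypothetical data: 9 employees quit, 3 are still employed at day 100.
durations = [3, 7, 11, 15, 19, 23, 36, 52, 71, 100, 100, 100]
observed = [True] * 9 + [False] * 3

print(half_life(durations, observed))  # 23
```

Unlike the mean, this value does not depend on when the three censored employees eventually quit, which is why it remains calculable with censored cases.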
35
Simple EHA Descriptives. Question: What simple things can we do to describe this sample of 12 employees? 3. Tabulate (or plot) quitters in different time-periods, e.g., 1-20 days, 21-40 days, etc. Use absolute numbers of “quitters” or “stayers”, or numbers of quitters as a proportion of “stayers”. Or look at the number (or proportion) who have “survived” (i.e., not quit).
36
Descriptives: Tables. For each period, determine the number or proportion quitting/staying. [Figure: employment spells for each case over Time in days (x-axis, 0 to 120), divided into periods: Day 1-20, 20-40, 40-60, 60-80, 80-100.]
37
EHA Descriptives: Tables
Period | Time Range | Quitters: Total #, % | # staying
1 | Day 1-20 | 5 quit, 42% of all, 42% of remaining | 7 left, 58% of all
2 | Day 21-40 | 2 quit, 17% of all, 29% of remaining | 5 left, 42% of all
3 | Day 41-60 | 1 quit, 8% of all, 20% of remaining | 4 left, 33% of all
4 | Day 61-80 | 1 quit, 8% of all, 25% of remaining | 3 left, 25% of all
38
EHA Descriptives: Tables Remarks on EHA tables: 1. Results of tables change depending on time-ranges chosen (like a histogram) E.g., comparing 20-day ranges vs. 10-day ranges 2. % quitters vs. % quitters as a proportion of those still employed Absolute % can be misleading since the number of people left in the risk set tends to decrease A low # of quitters can actually correspond to a very high rate of quitting for those remaining in the firm Typically, these ratios are more socially meaningful than raw percentages.
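The difference between “% of all” and “% of remaining” can be reproduced with a few lines of Python. The sketch below uses the quit counts from the table above (12 employees; 5, 2, 1, and 1 quits per 20-day period). The function name `life_table` and the dictionary keys are hypothetical. Like the table, it assumes nobody is censored within these periods.

```python
def life_table(n_start, quits_per_period):
    """Build one row per period from the counts of quitters.

    pct_of_all divides by the original sample size. pct_of_remaining
    divides by the number still employed when the period begins
    (the risk set).
    """
    rows = []
    at_risk = n_start
    for label, quits in quits_per_period:
        left = at_risk - quits
        rows.append({
            "period": label,
            "quit": quits,
            "pct_of_all": 100 * quits / n_start,
            "pct_of_remaining": 100 * quits / at_risk,
            "left": left,
            "pct_left": 100 * left / n_start,
        })
        at_risk = left
    return rows


counts = [("Day 1-20", 5), ("Day 21-40", 2), ("Day 41-60", 1), ("Day 61-80", 1)]
table = life_table(12, counts)
for row in table:
    print(row["period"], row["quit"],
          round(row["pct_of_all"]), round(row["pct_of_remaining"]),
          row["left"], round(row["pct_left"]))
```

In the last period a single quit is only 8% of the original sample but 25% of those still employed, which is the point of remark 2.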
39
EHA Descriptives: Plots We can also plot tabular information:
40
The Survivor Function. A more sophisticated version of % remaining, calculated based on continuous time (calculus) rather than on some arbitrary interval (e.g., day 1-20). Survivor Function S(t): the probability (at time = t) of not having the event prior to time t. It is always equal to 1 at time = 0 (when no events can have happened yet) and decreases as more cases experience the event. When graphed, it is typically a decreasing curve that looks a lot like % remaining.
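One common way to estimate S(t) from data with right censoring is the Kaplan-Meier (product-limit) estimator. The slides do not name the estimator behind their plots, so the Python sketch below is an assumption about method. The function name `kaplan_meier` and the twelve durations are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, S(t))] at each distinct time where an event occurs.

    At each event time, the running survival estimate is multiplied by
    (1 - events / number at risk). Censored cases leave the risk set
    without counting as events.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all cases that share the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            events += int(pairs[i][1])
            leaving += 1
            i += 1
        if events > 0:
            surv *= 1 - events / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Hypothetical data: 9 employees quit, 3 are right censored at day 100.
durations = [3, 7, 11, 15, 19, 23, 36, 52, 71, 100, 100, 100]
observed = [True] * 9 + [False] * 3

for t, s in kaplan_meier(durations, observed):
    print(t, round(s, 3))
```

With censoring only at the end of observation, as in this hypothetical sample, the estimate at each event time equals the simple proportion still employed. The two diverge once cases are censored earlier.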
41
Survivor Function McDonald’s Example: Steep decreases indicate lots of quitting at around 20 days
42
The Hazard Function. A more sophisticated version of # events divided by # remaining. Hazard Function h(t): the probability of an event occurring at a given point in time, given that it hasn’t already occurred. Think of it as the rate of events occurring for those at risk of experiencing the event.
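For reference, the standard continuous-time definition of the hazard can be written as follows. The symbol T (the time at which a case experiences the event) and the interval width Δt are introduced here for illustration. They are not notation taken from the slides.

```latex
h(t) = \lim_{\Delta t \to 0}
       \frac{P\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}
```

The conditioning on T ≥ t is the “given that it hasn’t already occurred” part, and dividing by Δt is what makes h(t) a rate rather than a probability.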
43
The Hazard Function Example: High (and wide) peaks indicate lots of quitting
44
Cumulative Hazard Function. Problem: the hazard function is often very spiky and hard to read/interpret. Alternative #1: “Smooth” the hazard function (using a smoothing algorithm). Alternative #2: Use the “cumulative” or “integrated” hazard, i.e., use calculus to “integrate” the hazard function. Recall: an integral represents the area under the curve of another function, here between 0 and t. Integrated hazard functions never decrease (the opposite of the survivor function, which never increases). Big growth indicates that the hazard is high.
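In symbols, the integrated hazard and its standard continuous-time link to the survivor function can be sketched as below. The name H(t) and the integration variable u are assumptions introduced here, not notation from the slides.

```latex
H(t) = \int_{0}^{t} h(u)\, du ,
\qquad
S(t) = \exp\bigl(-H(t)\bigr)
```

Because h(u) is never negative, H(t) can only stay flat or rise, and S(t) correspondingly can only stay flat or fall.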
45
Integrated Hazard Function Example: Steep increases indicate peaks in hazard rate “Flat” areas indicate low hazard rate
46
Descriptive EHA: Marriage Example: Event = Marriage Time Clock: Person’s Age Data Source: NORC General Social Survey Sample: 29,000 individuals
47
Survivor: Marriage Compare survivor for women, men: Survivor plot for Men (declines later) Survivor plot for Women (declines earlier)
48
Integrated Hazard: Marriage. Compare the integrated hazard for women and men: the integrated hazard for men increases more slowly (and remains lower) than that for women.
49
Hazard Plot: Marriage Hazard Rate: Full Sample
50
Survivor Plot: Pros/Cons. Benefits: 1. Clear, simple interpretation. 2. Useful for comparing subgroups in data. Limitations: 1. Mainly useful for a fixed risk set with a single non-repeating event (e.g., drug trials/mortality); if events recur frequently, the survivor drops to zero (and becomes uninterpretable). 2. If the risk set fluctuates a lot, the survivor function becomes harder to interpret.
51
Hazard Plot Pros/Cons. Benefits: Directly shows the rate over time, which is the actual dependent variable modeled. Works well for repeating events. Limitations: Can be difficult to interpret and requires practice. Spikes make it hard to get a clear picture of the trend, so pay close attention to the width of spikes, not just their height! Choice of smoothing algorithm can affect results. Hard to compare groups (due to spikiness).
52
Integrated Hazard Plot Pros/Cons Benefits: Closely related to the dependent variable that you’ll be modeling Very good for comparing groups Works for repeating events Limitations: Not as intuitive as the actual hazard rate Still takes some practice to interpret.