EHA: More On Plots and Interpreting Hazards
Sociology 229A: Event History Analysis, Class 9
Copyright © 2008 by Evan Schofer. Do not copy or distribute without permission.


Announcements
Final paper assignment due next week. Questions?
Class topics: More on interpreting hazard & cumulative hazard functions; more multilevel models…

Hazard Plots: Smoothing
Issue: Stata heavily smooths hazard plots. “Raw” hazard plots are very spiky… smoothing can help with interpretation.
Issue: Too much smoothing obscures the detail within your data.
–Simplest way to control smoothing: set the “width” of the kernel smoother in Stata. EX: sts graph, haz width(3)
–Lower width = less smoothing; try different values.
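To see what the width() option is doing, here is a pure-Python sketch (not Stata’s exact implementation) of kernel-smoothing hazard increments, using an Epanechnikov kernel and hypothetical data with one failure per integer time point. The narrower the bandwidth, the closer the estimate stays to the raw spikes; the wider it is, the more neighboring increments are averaged in and the flatter the peaks become.

```python
def smoothed_hazard(t, failure_times, risk_sets, width):
    """Kernel-smooth the Nelson-Aalen hazard increments (d/n) at time t."""
    total = 0.0
    for t_i, n_i in zip(failure_times, risk_sets):
        u = (t - t_i) / width
        if abs(u) < 1:  # the Epanechnikov kernel has support |u| < 1
            total += 0.75 * (1 - u * u) / width * (1.0 / n_i)
    return total

# Hypothetical data: one failure at each integer time t = 1..10,
# so the risk set shrinks from 10 down to 1.
times = list(range(1, 11))
at_risk = list(range(10, 0, -1))

wide = smoothed_hazard(5, times, at_risk, width=4)      # heavy smoothing
narrow = smoothed_hazard(5, times, at_risk, width=0.5)  # light smoothing
```

With width=0.5 only the increment at t=5 contributes, so the estimate is the kernel peak times 1/6 (0.25 here); with width=4 seven surrounding increments are averaged in and the peak flattens, which is exactly the behavior the plots on the following slides show.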

Hazard Smoothing Environmental Law Data: Default smoothing

Hazard Smoothing Environmental Law Data: width (1)

Hazard Smoothing Environmental Law Data: width (.2)

Hazard Smoothing Don’t make width too small!: width (.001) Stata’s default smoother amplifies peaks in data if width is too small!

Hazard Smoothing: Remarks
Stata’s default smoothing is quite aggressive; it obscures detail in your data.
–Stata’s default smoothing “width” is ~4 in this case; smoothing of 1-2 works much better.
In addition to removing detail, smoothing also lowers the peaks…
Highest peak = .1 (width 4); highest peak = .3 (width .2).
Also: REALLY narrow width exaggerates peaks.
–Highest peak = 50 (width .0001)

Survival Plot Problem: noorigin Issue: Stata always likes to include t=0…

Survival Plot Problem: noorigin Solution: sts graph, noorigin

Plots: Confidence Intervals
Confidence intervals are a good idea; especially useful when comparing groups.
–Stata: sts graph, ci or sts graph, haz ci
–Issue: Adding CIs tends to compress the Y axis to make room for the confidence bands. Makes the hazard look less variable over time. Watch for that…
–Issue: CIs can make charts “busy” / hard to read.

Hazard Plot with 95% CI

Hazard plot with 95% CI

Survivor plot with 95% CI

Other sts graph options
Options to show # of lost, entered, or censored cases:
Lost: puts a number above plots showing cases lost.
Atrisk: shows # of cases at risk.
–Actually, it shows risk per interval. EX: if unit = nation, it shows nation-years in an interval.
Censored: shows number of cases censored.

Sts graph: atrisk

Interpreting Hazard & Cumulative Hazard
The survivor plot has a clear interpretation: the proportion of cases that have not experienced the event.
–Assuming non-repeated events: if events repeat frequently, survivor falls to 0 and stays there…
–Assuming the risk set stays more-or-less constant: survivor never goes back up, even if more cases enter the risk set…
But hazard rates & cumulative hazard rates are harder to understand intuitively… So, I made some illustrative examples.

Hazard Example 1
Start with 10 people; let’s put them in the risk set sequentially.
All cases start at time t=0. One case fails at each point in time.
[Data table: Start, End, Failed?]

Example 1: Survivor Plot

Example 1: Hazard Plot Events occur at an even interval… but rate goes up because the risk set dwindles…

Example 1: Integrated Hazard
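The arithmetic behind these three plots can be sketched in a few lines. This is a pure-Python illustration (not Stata) of the Kaplan-Meier survivor and Nelson-Aalen integrated-hazard computations for Example 1: ten cases entering at t=0, one failing at each time t = 1..10.

```python
# Example 1: ten cases enter at t = 0; one fails at each time t = 1..10.
# Kaplan-Meier survivor and Nelson-Aalen integrated hazard, step by step.
survivor = 1.0
integrated_hazard = 0.0
increments = []
for n_failed in range(10):
    at_risk = 10 - n_failed           # risk set shrinks with each failure
    survivor *= 1 - 1.0 / at_risk     # KM step: multiply by (1 - d/n)
    increments.append(1.0 / at_risk)  # NA increment: d/n grows as n shrinks
    integrated_hazard += 1.0 / at_risk
```

The increments run from 1/10 up to 1/1: even though the failures are evenly spaced, each one is a larger share of a dwindling risk set, which is why the hazard plot rises. The integrated hazard ends at 1/10 + 1/9 + … + 1/1 ≈ 2.93, and the survivor reaches zero when the last case fails.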

Example 2
Let’s figure out what’s really going on… Again, start with 10 people.
Imagine each enters the risk set sequentially, and fails after 1 time unit.
–So only 1 case is at risk at any point in time.
–And there is 1 event at each point in time.
[Data table: Start, End, Failed?]

Example 2: Survivor Plot
Survivor drops to zero when the first case fails… and doesn’t go back up when additional cases enter. NOT very informative…

Example 2: Hazard Plot
Hazard basically sits at 1.0; variations are due to smoothing issues… That’s because for every time unit at risk there is one event.

Interpreting Hazards
Let’s run an exponential model. We’ll estimate the constant only… the baseline hazard.
streg, dist(exponential) nohr
Exponential regression -- log relative-hazard form
No. of subjects = 10    Number of obs = 10
No. of failures = 10
Time at risk = 10
LR chi2(0) = 0.00
Log likelihood =        Prob > chi2 =
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
_cons |
Why is the base rate zero? Answer: We need to exponentiate! Exp(0) = 1. The model estimates the baseline hazard to be 1.0!
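This output can be checked by hand: for an exponential (constant-hazard) model with no covariates, the maximum-likelihood estimate of the hazard is simply events divided by time at risk, and the constant Stata reports is its log. A pure-Python sketch (not Stata):

```python
import math

# Example 2: ten subjects, each at risk for exactly 1 time unit; all fail.
failures = 10
time_at_risk = 10.0

# Constant-hazard MLE: events per unit of time at risk.
rate = failures / time_at_risk        # 1.0
constant = math.log(rate)             # the reported _cons: log(1) = 0
baseline_hazard = math.exp(constant)  # exponentiating recovers 1.0
```

Here every unit of exposure time produces one event, so the rate is 1.0 and its log, the reported constant, is zero. The ten Nelson-Aalen increments are each 1/1, so the integrated hazard accumulates to 10, matching the next slide.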

Example 2: Integrated Hazard
Integrated hazard reaches 10. Same number of events as the previous example… but less time at risk… so the overall cumulated risk was higher.

Example 3
Let’s keep those same cases, but add 10 more.
Each is at risk for 1 time unit; all of the new cases are censored.
[Data table: Start, End, Failed?]

Example 3: Hazard Plot The risk set is doubled, but # events stays the same… So, hazard drops by half… to.5

Interpreting Hazards
Let’s run an exponential model. We’ll estimate the constant only… the baseline hazard.
streg, dist(exponential) nohr
Exponential regression -- log relative-hazard form
No. of subjects = 20    Number of obs = 20
No. of failures = 10
Time at risk = 20
LR chi2(0) = 0.00
Log likelihood =        Prob > chi2 =
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
_cons |
Exp(-.693) = .5. The baseline hazard rate is .5…
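The same hand check works here: doubling the time at risk while holding events fixed halves the rate, and exponentiating the reported constant recovers it. A pure-Python sketch (not Stata):

```python
import math

# Example 3: 20 subjects each at risk for 1 time unit; 10 fail, 10 censored.
rate = 10 / 20.0                # 0.5 events per unit of time at risk
constant = math.log(rate)       # ~ -0.693, the _cons streg reports
recovered = math.exp(constant)  # back-transforming gives the hazard, 0.5
```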

Example 3: Integrated Hazard Likewise, integrated hazard is only half as big…

Example 4
What about when events occur in clumps? Example: two dense clusters of events.
–Between times 1-2 and 4-5.
[Data table: Start, End, Failed?]

Example 4: Survivor Plot Here we see the two “clumps” of events…

Example 4: Hazard Plot
The second “clump” has a much higher hazard because the risk set is much smaller… Default smoothing pretty much wipes out the first clump.

Example 4: Hazard Plot, less smoothing Hazard with “width(.3)” Now both clumps of events are clearly visible…

Example 4: Integrated Hazard
Note how events with a small risk set affect the cumulative hazard more (2nd clump)…

Interpreting Hazards
The hazard rate reflects the rate of events per unit of time at risk.
A constant hazard of .1 for one time unit means that roughly 10% of at-risk cases will have events.
–But things are often more complex than that when hazards are computed in continuous time:
The rate may vary within the interval, depending on how the events are concentrated.
The risk set may change over the interval… esp. if cases leave the risk set.
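Strictly speaking, a constant hazard h implies that the fraction of at-risk cases failing within one time unit is 1 − exp(−h), which is close to h itself when h is small; the 10% figure is a convenient approximation.

```python
import math

# Under a constant hazard h, the probability of an event within one
# time unit is 1 - exp(-h); for small h this is roughly h itself.
h = 0.1
exact = 1 - math.exp(-h)  # ~ 0.0952, close to the 10% rule of thumb
```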

Interpreting Integrated Hazards
Integrated hazards represent the total amount of risk that has accumulated.
If the hazard is constant at .1, the integrated hazard would reach 10 after one hundred time units…
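For a constant hazard, the integrated hazard is just hazard × elapsed time, which gives the figure on the slide; the implied survivor is exp of minus the integrated hazard. A quick check:

```python
import math

# Constant hazard h = 0.1: integrated hazard H(t) = h * t.
h = 0.1
H = h * 100       # reaches 10 after one hundred time units
S = math.exp(-H)  # implied survivor: exp(-10), essentially zero
```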