Event History Analysis 3


Event History Analysis 3 Sociology 8811 Lecture 17 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission

Announcements Topic: More Event History Analysis Models, data structures, etc Note: We have fallen a bit behind schedule Lectures = slightly behind the reading assignments I’ll try to catch up But, you can adjust your reading efforts accordingly.

Review: EHA In essence, EHA models a dependent variable that reflects both: 1. Whether or not a case experiences the event of interest (e.g., mortality or marriage) 2. When it occurs (like an OLS regression of duration) The dependent variable is best conceptualized as a rate of some occurrence EHA involves both descriptive and parametric analysis of data

Survivor: Marriage Compare survivor for women, men: Survivor plot for Men (declines later) Survivor plot for Women (declines earlier)

Integrated Hazard: Marriage Compare Integrated Hazard for women, men: Integrated Hazard for men increases slower (and remains lower) than women

Hazard Plot: Marriage Hazard Rate: Full Sample

Hazard Plot: Marriage Smoothed Hazard Rate: Full Sample

From Plots to Tests to Models It appears from the plots that women get married faster than men Issue: How do we test hypotheses about the difference in rates? Can we be confident that the observed difference between men and women is not merely due to sampling variability?

Tests of Equality for Survivor Fns Idea: Conduct a hypothesis test to see if survivor functions differ across groups Like a t-test for difference in means… Example: Log-Rank Test Based on calculating the expected # failures at each point in time if there were no difference between groups Then, compute difference between observed failures and expected value for each group Analogous to a chi-square test of independence for a crosstab.
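The observed-versus-expected logic of the log-rank test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `logrank_two_groups` and the six toy durations are invented for the example, groups are assumed to be coded 0 and 1, and the variance term is the usual hypergeometric one. It is not Stata's implementation.

```python
def logrank_two_groups(times, events, groups):
    """Toy log-rank test for two groups coded 0 and 1."""
    observed = expected = variance = 0.0
    # Distinct times at which at least one failure occurs
    failure_times = sorted({t for t, e in zip(times, events) if e})
    for t in failure_times:
        at_risk = [(x, e, g) for x, e, g in zip(times, events, groups) if x >= t]
        n = len(at_risk)                                   # all cases still at risk
        n1 = sum(1 for _, _, g in at_risk if g == 1)       # group-1 cases at risk
        d = sum(1 for x, e, _ in at_risk if x == t and e)  # failures at time t
        d1 = sum(1 for x, e, g in at_risk if x == t and e and g == 1)
        observed += d1
        expected += d * n1 / n   # failures expected in group 1 if no difference
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    return observed, expected, chi2

# Made-up data: group 1 fails early, group 0 fails late, no censoring
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]
obs, exp_, chi2 = logrank_two_groups(times, events, groups)
print(obs, round(exp_, 2), round(chi2, 2))
```

Group 1 has more failures than expected under the null of no difference, which is what drives the chi-square upward.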

Log-rank Test Example: Do women marry earlier than men?

    . sts test sex, logrank

             failure _d:  married == 1
       analysis time _t:  endtime

    Log-rank test for equality of survivor functions

          |   Events     Events
    sex   |  observed   expected
    ------+-------------------------
    1     |     10118   12820.67
    2     |     13990   11287.33
    ------+-------------------------
    Total |     24108   24108.00

          chi2(1) =  1389.65
          Pr>chi2 =   0.0000

Significant Chi-square (p<.05) indicates that survivor plots differ
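The direction of the difference can be read straight from the observed and expected counts. The short Python sketch below only redoes arithmetic on the numbers printed in the output above. The crude sum of (observed - expected)^2 / expected is a Pearson-style approximation added here for illustration; it is not the statistic Stata reports (1389.65), which is based on a variance formula, and the crude version usually comes out somewhat smaller.

```python
# Counts copied from the log-rank output, keyed by the value of sex
observed = {1: 10118, 2: 13990}
expected = {1: 12820.67, 2: 11287.33}

# Positive difference = more failures (marriages) than expected under no difference
diff = {g: observed[g] - expected[g] for g in observed}

# Crude Pearson-style approximation (an assumption of this sketch, not Stata's formula)
crude_chi2 = sum(diff[g] ** 2 / expected[g] for g in observed)

print({g: round(v, 2) for g, v in diff.items()})
print(round(crude_chi2, 1))
```

The two differences are equal and opposite, as they must be when observed and expected totals match.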

Tests of Equality for Survivor Fns Stata offers a variety of tests They mainly differ by how they weight cases ex: some place greater weight on early failures Tests available in Stata Log rank, Wilcoxon, Tarone-Ware, Peto-Peto-Prentice See Stata manual “Survival Analysis and Epidemiological Tables” for advice about which to use In many cases, the results are similar across tests Also: Cox test Based on a different principle Can be used with weighted data (“pweights”).

EHA Models Strategy: Model the hazard rate as a function of covariates Much like regression analysis Determine coefficients The extent to which change in independent variables results in a change in the hazard rate Use information from sample to compute t-values (and p-values) Test hypotheses about coefficients

EHA Models Issue: In standard regression, we must choose a proper “functional form” relating X’s to Y’s OLS is a “linear” model – assumes a linear relationship e.g.: Y = a + b1X1 + b2X2 … + bnXn + e Logistic regression for discrete dependent variables – assumes an ‘S-curve’ relationship between variables When modeling the hazard rate h(t) over time, what relationship should we assume? There are many options: assume a flat hazard, or various S-shaped, U-shaped, or J-shaped curves We’ll discuss details later…

Constant Rate Models The simplest parametric EHA model assumes that the base hazard rate is generally “flat” over time Any observed changes are due to changed covariates Called a “Constant Rate” or “Exponential” model Note: assumption of constant rate isn’t always tenable Formula: h(t) = exp(a + b1X1 + b2X2 … + bnXn) Usually rewritten as: ln h(t) = a + b1X1 + b2X2 … + bnXn
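A minimal Python sketch of what "constant rate" means in practice. The coefficient values are hypothetical, chosen only for illustration, and the function names are invented. The survivor formula S(t) = exp(-rate × t) is the standard consequence of a hazard that does not change over time.

```python
import math

def constant_hazard(a, b, x):
    """Hazard under a one-covariate constant rate model: exp(a + b*x)."""
    return math.exp(a + b * x)

def survivor(t, rate):
    """Share of cases still event-free at time t when the hazard is constant."""
    return math.exp(-rate * t)

a, b = -3.0, 0.5                 # hypothetical coefficients
h0 = constant_hazard(a, b, 0)    # hazard when x = 0
h1 = constant_hazard(a, b, 1)    # hazard when x = 1

for t in (0, 10, 20):
    print(t, round(survivor(t, h0), 3), round(survivor(t, h1), 3))
```

The hazard depends on the covariate but not on t, so the only way the rate can change for a case is through a change in x.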

Constant Rate Models Question: Is the constant rate assumption tenable? Answer: Probably not 1. The hazard rate goes up and down over time Not constant at all – even if smoothed 2. The change over time isn’t likely the result of changing covariates (X’s) in our model However, if the change was merely the result of some independent variable, then the underlying (unobserved) rate might, in fact, be constant.

Constant Rate Models Let’s run an analysis anyway… Ignore the violation of assumptions regarding the functional form of the hazard rate Recall -- Constant rate model is: h(t) = exp(a + b1X1 + b2X2 … + bnXn) In this case, we’ll only specify one X var: DFEMALE – dummy variable indicating women Coefficient reflects difference in hazard rate for women versus men.

Constant Rate Model: Marriage A simple one-variable model comparing gender

    . streg sex, dist(exponential) nohr

    No. of subjects =        29269             Number of obs   =     29269
    No. of failures =        24108
    Time at risk    =       693938
                                               LR chi2(1)      =    213.53
    Log likelihood  =   -30891.849             Prob > chi2     =    0.0000

    ------------------------------------------------------------------------------
              _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         Dfemale |   .1898716   .0130504    14.55   0.000     .1642933    .2154499
           _cons |  -3.655465   .0216059  -169.19   0.000    -3.697812   -3.613119
    ------------------------------------------------------------------------------

The positive coefficient for DFemale indicates a higher hazard rate for women
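As a rough sanity check on the scale of these estimates, the header of the output is enough to compute the pooled rate. In a constant rate model with no covariates, the maximum likelihood estimate of the hazard is simply failures divided by time at risk. The sketch below uses only the two totals printed above; the variable names are illustrative.

```python
import math

failures = 24108        # "No. of failures" from the output
time_at_risk = 693938   # "Time at risk" from the output

crude_rate = failures / time_at_risk   # events per unit of analysis time
log_rate = math.log(crude_rate)        # same quantity on the log-hazard scale

print(round(crude_rate, 4), round(log_rate, 2))
```

The pooled log rate is on the same scale as the model's constant and coefficient, which are also expressed as log hazards because of the `nohr` option.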

Constant Rate Coefficients Interpreting the EHA coefficient: b = .19 Coefficients reflect change in log of the hazard Recall one of the ways to write the formula: ln h(t) = a + b1X1 + b2X2 … + bnXn But – we aren’t interested in log rates We’re interested in change in the actual rate Solution: Exponentiate the coefficient i.e., use “inverse-log” function on calculator Result reflects the impact on the actual rate.

Constant Rate Coefficients Exponentiate the coefficient to generate the “hazard ratio” e.g., exp(.19) = 1.21 Multiplying by the hazard ratio indicates the change in hazard rate for each unit increase in the independent variable Multiplying by 1.21 results in a 21% increase A hazard ratio of 2.00 = a 100% increase A hazard ratio of .25 = a 75% decrease.
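The conversion from a log-hazard coefficient to a hazard ratio and a percent change is a one-line calculation. The Python sketch below applies it to the coefficient from the marriage model and to the two example ratios on this slide; the helper name `percent_change` is invented for the example.

```python
import math

def percent_change(hazard_ratio):
    """Percent change in the hazard rate implied by a hazard ratio."""
    return (hazard_ratio - 1) * 100

b = 0.1898716                  # coefficient from the constant rate model
hazard_ratio = math.exp(b)     # the "inverse-log" step

print(round(hazard_ratio, 2), round(percent_change(hazard_ratio), 1))
print(percent_change(2.00), percent_change(0.25))
```

A ratio above 1 is an increase in the rate and a ratio below 1 is a decrease, so a negative coefficient always yields a ratio between 0 and 1.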

Constant Rate Coefficients The variable FEMALE is a dummy variable Women = 1, Men = 0 Increase from 0 to 1 (men to women) reflects a 21% increase in the hazard rate Continuous measures, however, can change by many points (e.g., firm size, age, etc.) To determine effects of multiple point increases (e.g., firm size of 10 vs. 7) multiply repeatedly, i.e., raise the hazard ratio to the power of the increase Ex: Hazard Ratio = .95, increase = 3 units: .95 x .95 x .95 = .86 – indicating a 14% decrease.
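Repeated multiplication is the same as raising the hazard ratio to a power, or equivalently multiplying the coefficient by the size of the change before exponentiating. A small Python sketch of the slide's example, with an invented helper name:

```python
import math

def hazard_ratio_for_change(hr_per_unit, units):
    """Hazard ratio for a change of `units` in the covariate."""
    return hr_per_unit ** units

hr3 = hazard_ratio_for_change(0.95, 3)   # .95 x .95 x .95

# Same result on the coefficient scale: b = ln(.95), change = 3 units
b = math.log(0.95)
hr3_from_coef = math.exp(b * 3)

print(round(hr3, 2), round((hr3 - 1) * 100))
```

Working on the coefficient scale is convenient when the change is not a whole number of units.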

Hypothesis Tests: Marriage Final issue: Is the 21% higher hazard rate for women significantly different from the rate for men? Or is the observed difference likely due to chance? Solution: Hazard rate models calculate standard errors for coefficient estimates Allowing calculation of T-values, P-values

    --------------------------------------------------
          _t |      Coef.   Std. Err.       t    P>|t|
    ---------+----------------------------------------
      Female |   .1898716   .0130504    14.55   0.000
       _cons |  -3.465594   .0099415  -348.60   0.000
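The test statistic is just the coefficient divided by its standard error. The Python sketch below reproduces it from the table, adds a two-sided p-value and an approximate 95% interval using a normal approximation (an assumption of this sketch), and converts the interval to the hazard ratio scale. The last lines check one possible reading of why the constant here differs from the constant in the earlier streg output: if that model used sex coded 1 and 2 while this one uses a 0/1 dummy, the earlier constant plus one coefficient should equal this constant. That coding is an assumption, not something the slides state.

```python
import math

coef, se = 0.1898716, 0.0130504   # Female row of the table

z = coef / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # normal approximation

lo, hi = coef - 1.96 * se, coef + 1.96 * se      # approximate 95% interval
hr_ci = (math.exp(lo), math.exp(hi))             # interval for the hazard ratio

# Assumed coding check: earlier _cons + b versus the _cons shown here
implied_cons = -3.655465 + coef

print(round(z, 2), p_two_sided < 0.05)
print(tuple(round(v, 2) for v in hr_ci))
print(round(implied_cons, 5))
```

Because the whole interval for the hazard ratio lies above 1, the higher rate for women is unlikely to be due to sampling variability alone.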

Types of EHA Models Two main types of proportional EHA Models 1. Parametric Models Specify a functional form of h(t) Constant rate is one example Also: Piecewise Exponential, Gompertz, Weibull, etc. 2. Cox Models Don’t specify a particular form for h(t) Each makes assumptions Like OLS assumptions regarding functional form, error variance, normality, etc. If assumptions are violated, models can’t be trusted.

Parametric Models These models make assumptions about the overall shape of the hazard rate over time Much like OLS regression assumes a linear relationship between X and Y, logit assumes s-curve Options: constant, Gompertz, Weibull There is a piecewise exponential option, too Note: They also make standard statistical assumptions: Independent random sample Properly specified model, etc, etc…

Cox Models The basic Cox model: h(t) = h0(t) exp(b1X1 + b2X2 … + bnXn) Where h(t) is the hazard rate h0(t) is some baseline hazard function (to be inferred from the data) This obviates the need for building a specific functional form into the model bX’s are coefficients and covariates
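A short Python sketch of why the baseline hazard does not need a functional form: whatever values h0(t) takes, it cancels out of the ratio between two cases that differ only in X. The baseline values and the coefficient below are made up for illustration, and the function name is invented.

```python
import math

def cox_hazard(baseline, b, x):
    """Hazard under a one-covariate Cox model: h0(t) * exp(b*x)."""
    return baseline * math.exp(b * x)

b = 0.4                                       # hypothetical coefficient
baseline_by_time = [0.01, 0.05, 0.03, 0.02]   # arbitrary made-up h0(t) values

# Ratio of hazards for x = 1 versus x = 0 at each time point
ratios = [cox_hazard(h0, b, 1) / cox_hazard(h0, b, 0) for h0 in baseline_by_time]

print([round(r, 4) for r in ratios], round(math.exp(b), 4))
```

The ratio equals exp(b) at every time point even though the baseline rises and falls, so the coefficient can be interpreted without ever specifying the shape of h0(t).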

Cox Model Assumptions Cox Models assume that independent variables don’t interact with time At least, not in ways you haven’t controlled for i.e., that the hazard rates at different values of X are proportional (parallel) to each other over time Example: Marriage rate – women vs. men Women have a higher rate at all points in time Question: Does the hazard rate for women diverge or converge with men over time? If so, the proportion (or ratio) of the rates changes and the assumption is violated: use a different model

Cox Model Assumptions: Proportionality: Look for parallel h(t)’s for different sub-groups (values of X’s) [Figure: two panels labeled “Good” and “Bad”, each plotting h(t) against time for women and men]

Cox Model Assumptions: Hazard rates are often too spiky to discern trends Options: 1. Smooth the hazard plots OR 2. Check the integrated hazard rate Look for differences in the overall shape of the curve Note: divergence is OK on an integrated hazard
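One way to see why divergence is acceptable on an integrated hazard plot: if two hazards are proportional, their integrated hazards differ by a constant ratio, so the raw curves spread apart while the gap between their logs stays flat. The Python sketch below illustrates this with made-up hazard series; the function name and all numbers are invented for the example.

```python
import math
from itertools import accumulate

def log_cumhaz_gap(h_a, h_b):
    """Gap between log integrated hazards of two groups at each time point."""
    cum_a = list(accumulate(h_a))   # integrated hazard as a running sum
    cum_b = list(accumulate(h_b))
    return [math.log(a) - math.log(b) for a, b in zip(cum_a, cum_b)]

men = [0.01, 0.04, 0.06, 0.03]             # made-up hazard by period
women_prop = [1.5 * h for h in men]        # proportional: always 1.5 times men
women_nonprop = [0.03, 0.06, 0.06, 0.02]   # ratio to men shrinks over time

gap_prop = log_cumhaz_gap(women_prop, men)
gap_nonprop = log_cumhaz_gap(women_nonprop, men)

print([round(g, 3) for g in gap_prop])
print([round(g, 3) for g in gap_nonprop])
```

A flat gap is consistent with proportionality; a gap that steadily narrows or widens suggests the covariate interacts with time.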

Cox Model: Example Marriage example:

    No. of subjects =        29269             Number of obs   =     29269
    No. of failures =        24108
    Time at risk    =       693938
                                               LR chi2(1)      =   1225.71
    Log likelihood  =   -229548.82             Prob > chi2     =    0.0000

    --------------------------------------------------
          _t |      Coef.   Std. Err.       z    P>|z|
    ---------+----------------------------------------
      Female |   .4551652   .0131031    34.74   0.000
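The Cox coefficient is read the same way as the constant rate coefficient: exponentiate it to get a hazard ratio. The Python sketch below does only that arithmetic on the Female row of the table, plus a check that the reported z equals the coefficient divided by its standard error.

```python
import math

coef, se = 0.4551652, 0.0131031   # Female row of the Cox output

hazard_ratio = math.exp(coef)           # ratio of women's hazard to men's
pct_higher = (hazard_ratio - 1) * 100   # percent difference in the rate
z = coef / se

print(round(hazard_ratio, 2), round(pct_higher), round(z, 2))
```

The estimated difference is noticeably larger than the 21% from the constant rate model, which fits the earlier point that the constant rate assumption is probably not tenable for these data.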

Reading Discussion Frank, David J., Ann M. Hironaka, and Evan Schofer. 2000. “The Nation State and the Natural Environment, 1900-1995.” American Sociological Review, 65 (Feb): 96-116.