Event History Models Sociology 229: Advanced Regression Class 5


Event History Models Sociology 229: Advanced Regression Class 5 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission

Announcements: Assignment 3 due. Agenda: EHA models; discrete time models; more details on Cox models and on fully parametric proportional hazard models; break; discussion of paper: Allison and McGinnis.

Review. Event history analysis focuses attention on rates of events/failures over time. Descriptive approaches include survivor plots, hazard plots, and integrated (cumulative) hazard plots. We can also conduct non-parametric tests to see if rates differ across groups. Example: log-rank test.
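As a reminder of what a survivor plot is built from, here is a minimal Python sketch of the Kaplan-Meier estimator. The function name and the toy data are hypothetical and are not from the course materials; the course itself works in Stata.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: observed time for each subject
    events:    1 if the event occurred at that time, 0 if censored
    """
    # Count how many events happen at each distinct time.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data: five subjects, two of them censored.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
```

Censored subjects stay in the risk set until their censoring time, which is why they still inform the estimate without counting as events.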

[Figure: Hazard plot for marriage. Smoothed hazard rate, full sample.]

EHA Models. Strategy: Model the hazard rate as a function of covariates. Goal: Estimate coefficients that show the impact of independent variables on the hazard rate. We can also use information from the sample to compute t-values (and p-values) and test hypotheses about coefficients.

EHA Models. Issue: In standard regression, we must choose a proper “functional form” relating X’s to Y’s. OLS is a “linear” model: it assumes a linear relationship, e.g.: Y = a + b1X1 + b2X2 … + bnXn + e. Logistic regression for discrete dependent variables assumes an ‘S-curve’ relationship between variables. When modeling the hazard rate h(t) over time, what relationship should we assume? There are many options: assume a flat hazard, or various S-shaped, U-shaped, or J-shaped curves. We’ll discuss details later…

Constant Rate Models. The simplest parametric EHA model assumes that the base hazard rate is generally “flat” over time; any observed changes are due to changed covariates. This is called a “Constant Rate” or “Exponential” model. Note: the assumption of a constant rate isn’t always tenable. Formula: h(t) = exp(a + b1X1 + b2X2 … + bnXn). Usually rewritten as: ln h(t) = a + b1X1 + b2X2 … + bnXn.
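A minimal Python sketch of what a constant rate implies, assuming the standard exponential-model form h(t) = exp(a + b1X1 + … + bnXn). The function names and the numeric values are hypothetical illustrations, not estimates from the course data.

```python
import math

def constant_hazard(a, coefs, xs):
    """Hazard rate under the exponential model: exp(a + sum of b*x)."""
    return math.exp(a + sum(b * x for b, x in zip(coefs, xs)))

def survival(rate, t):
    """Share of cases that have not yet had the event by time t."""
    return math.exp(-rate * t)

# Hypothetical intercept and coefficient for a single covariate equal to 1.
rate = constant_hazard(a=-3.0, coefs=[0.5], xs=[1])

# With a constant rate, the expected waiting time is simply 1 / rate.
mean_duration = 1.0 / rate
```

Because the rate does not depend on t, the only way the hazard can differ between cases is through their covariate values.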

Constant Rate Models Is the constant rate assumption tenable?

Constant Rate Models. Question: Is the constant rate assumption tenable? Answer: A harder question than it seems… The hazard rate goes up and down over time; it is not constant at all, even if smoothed. However, if the change were merely the result of independent variables, then the underlying (base) rate might, in fact, be constant. If your model doesn’t include variables that account for time variation in h(t), then a constant-rate model isn’t suitable.

Constant Rate Models. Let’s run an analysis anyway… Ignore possible violation of assumptions regarding the functional form of h(t). Recall that the constant rate model is: h(t) = exp(a + b1X1 + b2X2 … + bnXn). In this case, we’ll specify only one X variable: DFEMALE, a dummy variable indicating women. The coefficient reflects the difference in hazard rate for women versus men.

Constant Rate Model: Marriage. A simple one-variable model comparing gender:

. streg sex, dist(exponential) nohr

No. of subjects =      29269          Number of obs  =     29269
No. of failures =      24108
Time at risk    =     693938
                                      LR chi2(1)     =    213.53
Log likelihood  = -30891.849          Prob > chi2    =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Dfemale |   .1898716   .0130504    14.55    0.000    .1642933    .2154499
       _cons |  -3.655465   .0216059  -169.19    0.000   -3.697812   -3.613119
------------------------------------------------------------------------------

The positive coefficient for DFemale indicates a higher hazard rate for women.
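To see what these two estimates mean as rates, the sketch below plugs them into the exponential-model form h = exp(_cons + b × Dfemale). That form is assumed here; the variable names are my own, and the time unit is whatever the analysis time was measured in.

```python
import math

# Estimates copied from the output above.
b_female = 0.1898716
cons = -3.655465

# Dfemale = 0 for men, 1 for women.
rate_men = math.exp(cons)
rate_women = math.exp(cons + b_female)

# The ratio of the two rates depends only on the coefficient.
ratio = rate_women / rate_men
```

The rates come out at roughly 0.026 for men and 0.031 for women per unit of time, and their ratio equals exp(.1898716).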

Constant Rate Coefficients. Interpreting the EHA coefficient: b = .19. Coefficients reflect change in the log of the hazard. Recall one of the ways to write the formula: ln h(t) = a + b1X1 + b2X2 … + bnXn. But we aren’t interested in log rates; we’re interested in change in the actual rate. Solution: Exponentiate the coefficient, i.e., use the “inverse-log” function on a calculator. The result reflects the impact on the actual rate.

Constant Rate Coefficients. Exponentiate the coefficient to generate the “hazard ratio”; here, exp(.19) = 1.21. For each unit increase in the independent variable, the hazard rate is multiplied by the hazard ratio. Multiplying by 1.21 results in a 21% increase. A hazard ratio of 2.00 = a 100% increase. A hazard ratio of .25 = a 75% decrease in the rate.
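The conversion from coefficient to hazard ratio to percent change can be sketched in a few lines of Python. The helper names are hypothetical; the arithmetic is just exp(b) and (ratio − 1) × 100.

```python
import math

def hazard_ratio(b):
    """Exponentiate a log-hazard coefficient."""
    return math.exp(b)

def percent_change(hr):
    """Percent change in the hazard rate implied by a hazard ratio."""
    return (hr - 1.0) * 100.0

hr_female = hazard_ratio(0.1898716)    # coefficient from the marriage model
pct_female = percent_change(hr_female)
```

A ratio above 1 gives a positive percent change and a ratio below 1 gives a negative one, so a ratio of .25 corresponds to −75%.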

Constant Rate Coefficients. The variable FEMALE is a dummy variable: Women = 1, Men = 0. An increase from 0 to 1 (men to women) reflects a 21% increase in the hazard rate. Continuous measures, however, can change by many points (e.g., firm size, age, etc.). To determine the effect of a multiple-point increase (e.g., firm size of 10 vs. 7), multiply repeatedly. Ex: Hazard ratio = .95, increase = 3 units: .95 x .95 x .95 = .86, indicating a 14% decrease.
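Repeated multiplication is the same as raising the hazard ratio to a power, or equivalently multiplying the coefficient by the number of units before exponentiating. A short Python sketch, with a hypothetical helper name:

```python
import math

def multi_unit_ratio(hr, units):
    """Hazard ratio for a change of `units` in the covariate."""
    return hr ** units

three_unit = multi_unit_ratio(0.95, 3)         # .95 x .95 x .95

# Same answer on the coefficient scale: exp(units * b), with b = ln(.95).
b = math.log(0.95)
three_unit_from_coef = math.exp(3 * b)
```

Working on the coefficient scale is convenient when software reports b rather than the hazard ratio.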

Hypothesis Tests: Marriage. Final issue: Is the 21% higher hazard rate for women significantly different from the rate for men? Or is the observed difference likely due to chance? Solution: Hazard rate models calculate standard errors for coefficient estimates, allowing calculation of t-values and p-values:

--------------------------------------------------
     _t |      Coef.   Std. Err.       t    P>|t|
--------+-----------------------------------------
 Female |   .1898716   .0130504    14.55    0.000
  _cons |  -3.465594   .0099415  -348.60    0.000
--------------------------------------------------
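The test statistic is the coefficient divided by its standard error. The Python sketch below reproduces it for the Female coefficient, assuming a two-sided test against the standard normal distribution (a large-sample approximation); the variable names are my own.

```python
import math

coef = 0.1898716   # Female coefficient from the output
se = 0.0130504     # its standard error

# Test statistic: how many standard errors the estimate is from zero.
z = coef / se

# Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
p_value = math.erfc(abs(z) / math.sqrt(2))

# 95% confidence interval using the normal critical value 1.959964.
ci_low = coef - 1.959964 * se
ci_high = coef + 1.959964 * se
```

The interval excludes zero, which is another way of saying the difference between women and men is unlikely to be due to chance.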

Types of EHA Models. Two main types of proportional EHA models: 1. Parametric models specify a functional form of h(t): constant rate; also Gompertz, Weibull, etc. 2. Cox models, also called “semi-parametric”, don’t specify a particular form for h(t). Each makes assumptions, like OLS assumptions regarding functional form, error variance, normality, etc. If assumptions are violated, results can’t be trusted.

Parametric Models. Parametric models make assumptions about the shape of the hazard rate over time, conditional on X, much like OLS regression assumes a linear relationship between X and Y and logit assumes an S-curve. Options: constant, Gompertz, Weibull. There is a piecewise exponential option, too. Note: They also make standard statistical assumptions: independent random sample, properly specified model, etc.

Cox Models. The basic Cox model: h(t) = h0(t) exp(b1X1 + b2X2 … + bnXn), where h(t) is the hazard rate; h0(t) is some baseline hazard function (to be inferred from the data), which obviates the need for building a specific functional form into the model; and the bX’s are coefficients and covariates.
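A short derivation shows why the baseline hazard never has to be specified. It assumes the standard Cox form, with two cases that are identical except that the second is one unit higher on X1:

```latex
\[
h(t \mid X) = h_0(t)\,\exp(b_1 X_1 + b_2 X_2 + \dots + b_n X_n)
\]
\[
\frac{h(t \mid X_1 = x + 1)}{h(t \mid X_1 = x)}
  = \frac{h_0(t)\,\exp\bigl(b_1 (x + 1) + b_2 X_2 + \dots + b_n X_n\bigr)}
         {h_0(t)\,\exp\bigl(b_1 x + b_2 X_2 + \dots + b_n X_n\bigr)}
  = \exp(b_1)
\]
```

The baseline h0(t) cancels, so exp(b1) is a hazard ratio with the same interpretation as in the constant rate model.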

Cox Model: Example. Marriage example:

No. of subjects =      29269          Number of obs  =     29269
No. of failures =      24108
Time at risk    =     693938
                                      LR chi2(1)     =   1225.71
Log likelihood  = -229548.82          Prob > chi2    =    0.0000

--------------------------------------------------
     _t |      Coef.   Std. Err.       z    P>|z|
--------+-----------------------------------------
 Female |   .4551652   .0131031    34.74    0.000
--------------------------------------------------

Cox vs. Parametric: Differences. Cox models do not make assumptions about the time-dependence of the hazard rate. Cox models focus on the time-ordering of observed events ONLY. They do not draw information from periods in which no events occur. After all, to do this you’d need to make some assumption about what rate you’d expect in that interval… Benefit: one less assumption to be violated. Cost: the Cox model is less efficient than a properly specified parametric model. Standard errors are bigger; more data are needed to get statistically significant results.
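The "time-ordering only" point can be checked directly. The Python sketch below writes out the Cox partial log-likelihood for a single covariate, assuming no tied event times, and evaluates it on hypothetical toy data before and after stretching the time axis. The function and data are illustrative assumptions, not the course's code.

```python
import math

def cox_partial_loglik(b, times, events, x):
    """Cox partial log-likelihood, one covariate, no tied event times."""
    ll = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored cases contribute only through risk sets
        # Everyone still under observation when case i has its event.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        ll += b * x[i] - math.log(sum(math.exp(b * x[j]) for j in risk_set))
    return ll

# Hypothetical toy data: durations, event indicators, one dummy covariate.
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 1, 0, 1, 1]
x = [1, 0, 1, 0, 1]

# Squaring positive times changes the gaps between events but not their order.
stretched = [t ** 2 for t in times]

ll_original = cox_partial_loglik(0.5, times, events, x)
ll_stretched = cox_partial_loglik(0.5, stretched, events, x)
```

The two values are identical because only the risk sets, which depend on ordering, enter the calculation. A parametric model would give different answers for the two time scales.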

Cox vs. Parametric: Similarities. Models discussed so far are all “proportional hazard” models. Assumption: covariates (X’s) raise or lower the hazard rate in a proportional manner across time. Ex: If women have a higher risk of marriage than men, that elevated risk will be consistent over all time… Another way of putting it: Cox models assume that independent variables don’t interact with time, at least not in ways you haven’t controlled for; i.e., the hazard rates at different values of X are proportional (parallel) to each other over time.

Proportional Hazard Models. Proportionality: X variables shift h(t) up or down in a proportional manner. [Figure: two panels plotting h(t) against time for women and men, one labeled “Proportional” and one labeled “Not Proportional”.]
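The contrast between the two cases can be sketched numerically in Python. The baseline shape, the coefficient values, and the time-varying effect below are all hypothetical choices made for illustration.

```python
import math

def baseline(t):
    """Hypothetical baseline hazard that rises and then falls."""
    return 0.05 * t * math.exp(-t / 10)

def proportional_hazard(t, x, b=0.19):
    """Covariate shifts the baseline by a fixed multiple, exp(b * x)."""
    return baseline(t) * math.exp(b * x)

def nonproportional_hazard(t, x):
    """Covariate effect shrinks with time, so it interacts with t."""
    return baseline(t) * math.exp((0.4 - 0.02 * t) * x)

check_times = (1, 5, 20, 40)

# Ratio of the x=1 hazard to the x=0 hazard at several time points.
prop_ratios = [proportional_hazard(t, 1) / proportional_hazard(t, 0)
               for t in check_times]
nonprop_ratios = [nonproportional_hazard(t, 1) / nonproportional_hazard(t, 0)
                  for t in check_times]
```

In the proportional case the ratio stays at exp(.19) however the baseline moves; in the other case the ratio drifts downward and eventually falls below 1, which is the kind of pattern that violates the assumption.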

Proportional Hazard Models. Issue: Does the hazard rate for women diverge from or converge with the rate for men over time? If so, the proportion (or ratio) of the rates changes, and the proportional hazard assumption is violated. Upcoming classes: We’ll discuss how to check the proportional hazard assumption and address violations…

Reading Discussion. Hironaka, Ann M. 2005. “World Patterns in Civil War Duration.” Chapter 2 in Neverending Wars. Cambridge, MA: Harvard University Press. How are the models set up? What were the outcomes? Findings? Empirical Example: Soule, Sarah A. and Susan Olzak. 2004. “When Do Movements Matter? The Politics of Contingency and the Equal Rights Amendment.” American Sociological Review, Vol. 69, No. 4 (Aug., 2004), pp. 473-497.