Cox Regression II Kristin Sainani Ph.D. http://www.stanford.edu/~kcobb Stanford University Department of Health Research and Policy Kristin Sainani Ph.D.



Topics
1. Stratification
2. Age as time scale
3. Residuals
4. Repeated events
5. Intention-to-treat analysis for RCTs

1. Stratification
Violations of the PH assumption can be resolved by:
- adding a time*covariate interaction
- adding another time-dependent version of the covariate
- stratification

Stratification
Different strata are allowed to have different baseline hazard functions, so the hazard functions do not need to be parallel between strata. The result is essentially a "weighted" hazard ratio, weighted over the different strata. This is useful for "nuisance" confounders (where you do not care to estimate the effect). It does not allow you to evaluate interaction with, or confounding by, the stratification variable (you will miss possible interactions).

Example: stratify on gender (+ = censored)
Males: 1, 3, 4, 10+, 12, 18 (subjects 1-6)
Females: 1, 4, 5, 9+ (subjects 7-10)
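A minimal Python sketch of how stratification changes the risk sets: each event's risk set is drawn only from its own stratum. The function names and the covariate `x` are hypothetical (the slide gives no covariate values). With β = 0 every subject at risk is equally likely to fail, so each factor reduces to 1/(size of its risk set).

```python
import math

# Slide data: (subject id, time, event indicator, stratum); "+" times are censored (event = 0).
DATA = [
    (1, 1, 1, "M"), (2, 3, 1, "M"), (3, 4, 1, "M"), (4, 10, 0, "M"), (5, 12, 1, "M"), (6, 18, 1, "M"),
    (7, 1, 1, "F"), (8, 4, 1, "F"), (9, 5, 1, "F"), (10, 9, 0, "F"),
]

def risk_sets(data):
    """Map each event (by subject id) to its risk set, taken within the subject's own stratum only."""
    out = {}
    for sid, t, event, stratum in data:
        if event:
            out[sid] = sorted(j for j, tj, _, sj in data if sj == stratum and tj >= t)
    return out

def stratified_log_pl(data, beta, x):
    """Log partial likelihood for one hypothetical covariate x (dict: id -> value).

    Each stratum keeps its own baseline hazard, which cancels within the stratum,
    so only within-stratum risk sets appear in the denominators.
    """
    total = 0.0
    for sid, members in risk_sets(data).items():
        denom = sum(math.exp(beta * x[j]) for j in members)
        total += beta * x[sid] - math.log(denom)
    return total

x0 = {i: 0.0 for i in range(1, 11)}  # hypothetical covariate values, all zero
print(risk_sets(DATA))
print(stratified_log_pl(DATA, 0.0, x0))
```

With β = 0 the stratified PL is (1/6 · 1/5 · 1/4 · 1/2 · 1) × (1/4 · 1/3 · 1/2) = 1/5760. An unstratified analysis would instead put all ten subjects in the risk set at t = 1.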

The PL (stratified partial likelihood)
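The following writes out this example for a single hypothetical covariate $x_i$ for subject i (the slide gives no covariate values). The stratified PL is the product of the within-stratum partial likelihoods. Censored subjects 4 and 10 appear only in denominators, and the baseline hazards cancel within each stratum:

```latex
\begin{aligned}
L(\beta) ={}& \frac{e^{\beta x_1}}{\sum_{j=1}^{6} e^{\beta x_j}}
\cdot \frac{e^{\beta x_2}}{\sum_{j=2}^{6} e^{\beta x_j}}
\cdot \frac{e^{\beta x_3}}{\sum_{j=3}^{6} e^{\beta x_j}}
\cdot \frac{e^{\beta x_5}}{e^{\beta x_5} + e^{\beta x_6}}
\cdot \frac{e^{\beta x_6}}{e^{\beta x_6}} \qquad \text{(males)} \\
&\times \frac{e^{\beta x_7}}{\sum_{j=7}^{10} e^{\beta x_j}}
\cdot \frac{e^{\beta x_8}}{\sum_{j=8}^{10} e^{\beta x_j}}
\cdot \frac{e^{\beta x_9}}{e^{\beta x_9} + e^{\beta x_{10}}} \qquad \text{(females)}
\end{aligned}
```

The male and female deaths at t = 1 and t = 4 are not ties because they fall in different strata, and the last male factor equals 1.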

2. Using age as the time-scale in Cox regression
Age is a common confounder in Cox regression, since age is strongly related to death and disease. You may control for age by adding baseline age as a covariate to the Cox model. A better strategy for large-scale longitudinal surveys, such as NHANES, is to use age as your time-scale (rather than time-in-study). You may additionally stratify on birth cohort to control for cohort effects.

Age as time-scale
The risk set becomes everyone who was at risk at a certain age rather than at a certain event time. It contains everyone who was still event-free, and already under observation, at the age of the person who had the event. This requires enough people at risk at all ages (such as in a large-scale, longitudinal survey).

The likelihood with age as time (+ = censored)
Subject:                          1    2    3    4    5
Event time (years-in-study):      3    5    7+   12   13+
Baseline age (years):             28   25   40   29   30
Age at event or censoring:        31   30   47+  41   43+
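The following Python sketch makes the change of risk sets concrete by listing the risk set at each event under both clocks for the five subjects above. The function names are hypothetical. It assumes the usual delayed-entry convention that a subject is at risk at age a only if entry age < a ≤ age at exit. Under that convention, subject 5, who enters at exactly age 30, is not in the risk set for subject 2's event at age 30.

```python
# Slide data: subject -> (baseline age, years in study, event indicator); "+" = censored.
SUBJECTS = {
    1: (28, 3, 1),
    2: (25, 5, 1),
    3: (40, 7, 0),
    4: (29, 12, 1),
    5: (30, 13, 0),
}

def risk_sets_time_in_study(subjects):
    """Risk set at each event when the clock is time since study entry."""
    return {
        i: sorted(j for j, (_, tj, _) in subjects.items() if tj >= ti)
        for i, (_, ti, di) in subjects.items()
        if di
    }

def risk_sets_age(subjects):
    """Risk set at each event when the clock is age (delayed entry).

    Assumed convention: subject j is at risk at age a if entry_age_j < a <= exit_age_j.
    """
    out = {}
    for i, (entry_i, ti, di) in subjects.items():
        if not di:
            continue
        a = entry_i + ti  # age at event
        out[i] = sorted(j for j, (ej, tj, _) in subjects.items() if ej < a <= ej + tj)
    return out

print(risk_sets_time_in_study(SUBJECTS))
print(risk_sets_age(SUBJECTS))
```

For subject 2's event, the age-scale PL factor for a hypothetical covariate $x$ would therefore be $e^{\beta x_2}/(e^{\beta x_1}+e^{\beta x_2}+e^{\beta x_4})$. On the time-in-study scale (event at year 5), the denominator would instead run over subjects 2-5.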

3. Residuals
Residuals are used to investigate the lack of fit of a model to a given subject. For Cox regression, there’s no easy analog to the usual “observed minus predicted” residual of linear regression.

Martingale residual
The martingale residual for individual i is $c_i - \hat{H}(t_i; X_i, \hat{\beta})$: the event indicator $c_i$ (1 if event, 0 if censored) minus the estimated cumulative hazard up to $t_i$ under the fitted model.
E.g., for a subject who was censored at 2 months and whose predicted cumulative hazard to 2 months was 0.20: martingale = 0 - 0.20 = -0.20.
E.g., for a subject who had an event at 13 months and whose predicted cumulative hazard to 13 months was 0.50: martingale = 1 - 0.50 = +0.50.
The residual can be read as excess failures (observed minus expected events). Martingale residuals are not symmetrically distributed, even when the fitted model is correct (they range from -∞ to 1), so they are transformed to deviance residuals...
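Below is a Python sketch of martingale residuals for a single covariate. It assumes the cumulative hazard is estimated with the Breslow estimator, one common choice; the slides do not say which estimator is used. It also assumes a fitted coefficient `beta` is supplied from elsewhere. The function names are hypothetical.

```python
import math

def breslow_cumhaz(times, events, x, beta):
    """Breslow estimate of the cumulative baseline hazard H0(t), evaluated at each subject's own time."""
    n = len(times)
    risk = [math.exp(beta * xi) for xi in x]
    increments = {}
    for s in sorted({t for t, d in zip(times, events) if d}):
        d_s = sum(1 for t, d in zip(times, events) if d and t == s)  # events at time s
        denom = sum(risk[j] for j in range(n) if times[j] >= s)      # risk set at time s
        increments[s] = d_s / denom
    return [sum(inc for s, inc in increments.items() if s <= t) for t in times]

def martingale_residuals(times, events, x, beta):
    """r_i = c_i - H(t_i; x_i, beta), with H(t; x, beta) = H0(t) * exp(beta * x)."""
    h0 = breslow_cumhaz(times, events, x, beta)
    return [c - h * math.exp(beta * xi) for c, h, xi in zip(events, h0, x)]

# Hypothetical data: times, event indicators, one covariate, and an assumed beta.
print(martingale_residuals([2, 3, 5], [1, 0, 1], [0, 1, 0], 0.0))
```

With Breslow's estimator the martingale residuals sum to zero, and each is at most 1; a censored subject's residual is always negative.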

Deviance Residuals
The deviance residual is a normalized transform of the martingale residual. These residuals are much more symmetrically distributed about zero. Observations with large deviance residuals are poorly predicted by the model.

Deviance Residuals
- Behave roughly like residuals from ordinary linear regression.
- Should be approximately symmetrically distributed around 0, with a standard deviation of about 1.0.
- Are negative for observations whose observed survival times are longer than expected.
- Plot deviance residuals against covariates to look for unusual patterns.
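The "normalized transform" can be written out explicitly. The usual definition is $d_i = \operatorname{sign}(r_i)\sqrt{-2\,[\,r_i + c_i \log(c_i - r_i)\,]}$, where $r_i$ is the martingale residual and $c_i$ the event indicator. Here is a minimal Python sketch; the function name is hypothetical.

```python
import math

def deviance_residual(martingale, event):
    """Deviance residual from a martingale residual r and event indicator c (1 = event, 0 = censored).

    d = sign(r) * sqrt(-2 * [r + c * log(c - r)]); the c * log(c - r) term vanishes when c = 0.
    """
    r, c = martingale, event
    inner = r + (c * math.log(c - r) if c else 0.0)
    # max() guards against tiny negative values from floating-point rounding
    return math.copysign(math.sqrt(max(-2.0 * inner, 0.0)), r)

print(deviance_residual(-0.20, 0))  # censored example from the martingale slide
print(deviance_residual(0.50, 1))   # event example from the martingale slide
```

Applied to the two martingale examples (-0.20 censored, +0.50 event), it gives about -0.63 and +0.62. The sign is preserved, so longer-than-expected survival (negative martingale residual) still gives a negative deviance residual.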

Deviance Residuals
In SAS, use the RESDEV= option on the OUTPUT statement of PROC PHREG: output out=outdata resdev=Varname;
Note: you cannot get these diagnostics in SAS if there is a time-dependent covariate in the model.

Example: uis data
The pattern looks fairly symmetric around 0. Out of 628 observations, a few residuals in the range of 3 SD are not unexpected.
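As a rough check on "not unexpected", assume the deviance residuals behave approximately like independent standard normal values. This is an approximation, not an exact property. The expected number beyond ±3 SD among 628 observations is then n · P(|Z| > 3). A Python sketch with a hypothetical function name:

```python
import math

def expected_beyond(n, k=3.0):
    """Expected count of |z| > k among n values, assuming approximately standard normal residuals."""
    p = math.erfc(k / math.sqrt(2.0))  # two-sided tail probability P(|Z| > k)
    return n * p

print(expected_beyond(628))
```

This gives about 1.7, so seeing one or two residuals near 3 SD is consistent with a well-fitting model.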

Example: uis data
What do you think this cluster represents?

Example: censored only

Example: had event only

Schoenfeld residuals
Schoenfeld (1982) proposed the first set of residuals for use with Cox regression packages (Schoenfeld D. Residuals for the proportional hazards regression model. Biometrika, 1982, 69(1):239-241). Instead of a single residual for each individual, there is a separate residual for each individual for each covariate. Note: Schoenfeld residuals are not defined for censored individuals.

Schoenfeld residuals
The Schoenfeld residual is defined as the covariate value for the individual who failed minus its expected value. (This yields a residual for each individual who failed, for each covariate.) The expected value of the covariate at time $t_i$ is a weighted average of the covariate over the individuals in the risk set at $t_i$, each weighted by their likelihood of failure under the fitted model. The person who died was 56; based on the fitted model, how likely is it that the person who died was 56 rather than older?

Example
Five people are left in our risk set at event time = 7 months:
- Female 55-year-old smoker
- Male 45-year-old non-smoker
- Female 67-year-old smoker
- Male 58-year-old smoker
- Male 70-year-old non-smoker
The 55-year-old female smoker is the one who has the event…

Example
Based on our model, we can calculate for each person the probability that he or she is the one who fails at time 7 (call it “p-hat”):
- Female 55-year-old smoker: p-hat = .10
- Male 45-year-old non-smoker: p-hat = .05
- Female 67-year-old smoker: p-hat = .30
- Male 58-year-old smoker: p-hat = .20
- Male 70-year-old non-smoker: p-hat = .30
Thus, the expected value for the AGE of the person who failed is:
55(.10) + 45(.05) + 67(.30) + 58(.20) + 70(.30) = 60.45 ≈ 60
And the Schoenfeld residual is: 55 - 60 = -5
(In a fitted model these probabilities sum to exactly 1 across the risk set; the rounded illustrative values here sum to .95, so the expected age is approximate.)

Example, continued
Using the same p-hats, the expected value for the GENDER (female = 0, male = 1) of the person who failed is:
0(.10) + 1(.05) + 0(.30) + 1(.20) + 1(.30) = .55
And the Schoenfeld residual is: 0 - .55 = -.55
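In the Cox model, the weight for person j in the risk set is $\exp(\beta' x_j)/\sum_k \exp(\beta' x_k)$, so the weights sum to exactly 1. Below is a Python sketch of the Schoenfeld residuals for the five-person risk set above. It codes age in years, male = 1, and smoker = 1. The function name and the coefficient values passed in are hypothetical; with β = 0 every person is equally likely to be the one who fails.

```python
import math

def schoenfeld_residuals(x_failed, risk_set, beta):
    """Schoenfeld residuals (one per covariate) for the subject who failed at one event time.

    x_failed: covariate vector of the subject who failed.
    risk_set: covariate vectors of everyone at risk at that time (including the failure).
    beta:     coefficient vector (assumed to come from a fitted model).
    """
    scores = [math.exp(sum(b * v for b, v in zip(beta, xj))) for xj in risk_set]
    total = sum(scores)
    weights = [s / total for s in scores]  # P(subject j is the one who fails | one failure)
    expected = [sum(w * xj[k] for w, xj in zip(weights, risk_set)) for k in range(len(x_failed))]
    return [xf - e for xf, e in zip(x_failed, expected)], weights

# Risk set at t = 7 months: (age, male, smoker); the first person is the one who failed.
RISK = [(55, 0, 1), (45, 1, 0), (67, 0, 1), (58, 1, 1), (70, 1, 0)]
residuals, weights = schoenfeld_residuals(RISK[0], RISK, [0.0, 0.0, 0.0])
print(residuals, weights)
```

With β = 0 the expected age is the plain average, 59, so the age residual is -4 and the gender residual is -0.6. A nonzero β shifts weight toward higher-risk people, as the slide's p-hats do.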

Schoenfeld residuals
Under the PH assumption, the Schoenfeld residuals are, in principle, independent of time, so a plot that shows a non-random pattern against time is evidence of a violation of the PH assumption.
- Plot Schoenfeld residuals against time to evaluate the PH assumption.
- Regress Schoenfeld residuals against time to test for independence between residuals and time.
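The regression check can be sketched in Python as the least-squares slope of one covariate's Schoenfeld residuals on event time, with the usual t statistic for zero slope (to be compared with a t distribution on n - 2 degrees of freedom). This is a simplified illustration with a hypothetical function name. Packages such as R's `survival::cox.zph` implement a refined test based on scaled Schoenfeld residuals and a transformation of time.

```python
import math

def residual_time_trend(times, residuals):
    """Least-squares slope of residuals on time and the t statistic for zero slope.

    Under PH the slope should be near 0; a large |t| suggests the covariate's effect changes with time.
    """
    n = len(times)
    mt = sum(times) / n
    mr = sum(residuals) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (r - mr) for t, r in zip(times, residuals))
    syy = sum((r - mr) ** 2 for r in residuals)
    slope = sxy / sxx
    if syy == 0:
        return slope, 0.0  # constant residuals: no trend to test
    corr = sxy / math.sqrt(sxx * syy)
    if abs(corr) >= 1:
        return slope, math.copysign(math.inf, corr)
    t_stat = corr * math.sqrt((n - 2) / (1 - corr ** 2))
    return slope, t_stat

print(residual_time_trend([1, 2, 3, 4], [1, -1, -1, 1]))  # no linear trend
```

A random scatter around zero gives a slope and t statistic near 0, while a steady drift in the residuals over time gives a large |t|.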

Example: no pattern with time

Example: violation of PH

Schoenfeld residuals
In SAS, use the RESSCH= option on the OUTPUT statement, giving one new variable name per covariate: output out=outdata ressch=Covariate1 Covariate2 Covariate3;

Summary of the many ways to evaluate the PH assumption…
1. Examine log(-log(S(t))) plots.
   The PH assumption is supported by parallel lines and refuted by lines that cross or nearly cross.
   You must use categorical predictors or categories of a continuous predictor.
2. Include an interaction with time in the model.
   The PH assumption is supported by a non-significant interaction coefficient and refuted by a significant interaction coefficient.
   Retaining the interaction term in the model corrects for the violation of PH.
   Don’t complicate your model in this way unless it’s absolutely necessary!
3. Plot Schoenfeld residuals against time.
   The PH assumption is supported by a random pattern with time and refuted by a non-random pattern.
4. Regress Schoenfeld residuals against time to test for independence between residuals and time.
   The PH assumption is supported by a non-significant relationship between residuals and time, and refuted by a significant relationship.

4. Repeated events
Death (presumably) can only happen once, but many outcomes could happen twice or more: fractures, heart attacks, pregnancy, etc.

Repeated events: Strategy 1
Run a second Cox regression (among those who had a first event), using the first event time as the origin. Repeat for the third, fourth, fifth events, etc.
Problem: sample sizes become smaller and smaller.

Repeated events: Strategy 2
Treat each interval as a distinct observation, so that someone who had 3 events, for example, contributes 3 observations to the dataset.
Major problem: dependence between observations from the same individual.
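A Python sketch of the Strategy 2 layout, with hypothetical function and data, appears below. It creates one row per interval between events, plus a final censored interval if follow-up continues after the last event. Keeping only the rows with a given event number gives the Strategy 1 dataset for that event, with the previous event time as the origin.

```python
def event_intervals(event_times, end_of_followup):
    """Split one person's follow-up into one row per interval.

    event_times: sorted times of that person's events.
    end_of_followup: time the person was last observed.
    Returns rows (event_number, start, stop, gap_time, event_indicator).
    """
    rows, start = [], 0
    for k, t in enumerate(event_times, start=1):
        rows.append((k, start, t, t - start, 1))
        start = t
    if end_of_followup > start:  # censored interval after the last event
        rows.append((len(event_times) + 1, start, end_of_followup, end_of_followup - start, 0))
    return rows

# Hypothetical people: (event times, end of follow-up)
PEOPLE = {"A": ([3, 7, 10], 15), "B": ([5], 12), "C": ([], 9)}
ROWS = [(pid, *row) for pid, (ev, end) in PEOPLE.items() for row in event_intervals(ev, end)]
SECOND_EVENT = [r for r in ROWS if r[1] == 2]  # Strategy 1 dataset for the 2nd event
print(SECOND_EVENT)
```

Person C never has a first event, so C contributes nothing to the second-event analysis, which is why sample sizes shrink under Strategy 1. In the full set of rows, person A's four rows are not independent of one another.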

Repeated events: Strategy 3
Stratify by individual (“fixed effects partial likelihood”). In PROC PHREG: strata id;
Problems:
- Does not work well with RCT data.
- Requires that most individuals have at least 2 events.
- Can only estimate coefficients for covariates that vary across successive spells for each individual; this excludes constant personal characteristics such as age, education, gender, ethnicity, and genotype.
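The restriction on covariates can be checked mechanically. Under stratification by id, a covariate that takes one value across all of a person's spells contributes the same factor to the numerator and to every denominator term of that person's likelihood, so it cancels. Only covariates that vary within individuals can therefore be estimated. A Python sketch, with hypothetical data and names:

```python
from collections import defaultdict

def estimable_covariates(spells, covariates):
    """Covariates that vary across spells within at least one individual.

    spells: dicts with an "id" key plus covariate values, one dict per spell.
    """
    values = defaultdict(lambda: defaultdict(set))
    for spell in spells:
        for cov in covariates:
            values[cov][spell["id"]].add(spell[cov])
    return [cov for cov in covariates if any(len(v) > 1 for v in values[cov].values())]

# Hypothetical spells: treatment changes within person 1; gender never changes within a person.
SPELLS = [
    {"id": 1, "treated": 0, "female": 1},
    {"id": 1, "treated": 1, "female": 1},
    {"id": 2, "treated": 1, "female": 0},
    {"id": 2, "treated": 1, "female": 0},
]
print(estimable_covariates(SPELLS, ["treated", "female"]))
```

Here only "treated" survives; "female" is constant within each person, just as the slide says for gender, ethnicity, or genotype.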

5. Considerations when analyzing data from an RCT…

Intention-to-Treat Analysis
Intention-to-treat analysis: compare outcomes according to the groups to which subjects were initially assigned, regardless of which intervention they actually received. This evaluates treatment effectiveness rather than treatment efficacy.

Why intention to treat?
Non-intention-to-treat analyses lose the benefits of randomization, as the groups may no longer be balanced with regard to factors that influence the outcome. Intention-to-treat analysis simulates “real life,” where patients often don’t adhere perfectly to treatment or may discontinue treatment altogether.

Drop-ins and Drop-outs: example, WHI
Both women on placebo and women on active treatment discontinued study medications. Women on treatment “dropped in” to treatment because their doctors took them off study drugs and put them on hormones to ensure they were on hormones and not placebo. Women on placebo “dropped in” to treatment because their regular doctors put them on hormones (dogma = “hormones are good”).
Note: see also the recent diabetes study in a Swedish population, where they reduced heart disease by 50% among type II diabetics. It was a very tough intervention: multiple drugs, attempts to quit smoking, diet and exercise. Not everyone participated fully; for example, smoking cessation was an utter failure. If everyone had participated more fully in the regimen, there might have been an even stronger effect.
Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.

Effect of intention to treat on the statistical analysis
Intention-to-treat analyses tend to underestimate treatment effects; increased variability “waters down” results.

Example
Take the following hypothetical RCT: treated subjects have a 25% chance of dying during the 2-year study vs. a 50% chance for placebo subjects.
TRUE RR = 25%/50% = .50 (treated subjects have a 50% lower chance of dying)
You do a 2-year RCT of 100 treated and 100 placebo subjects. If nobody switched, you would see about 25 deaths in the treated group and about 50 deaths in the placebo group (give or take a few due to random chance).
Observed RR ≈ .50

Example, continued
BUT suppose that early in the study, 25 treated subjects switch to placebo and 25 placebo subjects switch to treatment. You would see:
about 25*.25 + 75*.50 = 43-44 deaths in the placebo group
and about 25*.50 + 75*.25 = 31 deaths in the treated group
Observed RR = 31/44 ≈ .70
Diluted effect!
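The slide's arithmetic can be written as a short Python sketch. It assumes, as the slide does, that switchers switch early enough to take on the other arm's full 2-year risk. The function name is hypothetical.

```python
def itt_expected_deaths(n, n_switch, risk_own, risk_other):
    """Expected deaths in an arm of size n when n_switch subjects effectively get the other arm's regimen."""
    return n_switch * risk_other + (n - n_switch) * risk_own

treated = itt_expected_deaths(100, 25, 0.25, 0.50)  # 25*.50 + 75*.25
placebo = itt_expected_deaths(100, 25, 0.50, 0.25)  # 25*.25 + 75*.50
observed_rr = treated / placebo
print(treated, placebo, observed_rr)
```

The intention-to-treat RR is 31.25/43.75 ≈ 0.71, compared with a true RR of 0.50.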

References
Paul Allison. Survival Analysis Using SAS. SAS Institute Inc., Cary, NC: 2003.