Survival analysis with time-varying covariates in SAS

Slides:



Advertisements
Similar presentations
Allison Dunning, M.S. Research Biostatistician
Advertisements

Surviving Survival Analysis
If we use a logistic model, we do not have the problem of suggesting risks greater than 1 or less than 0 for some values of X: E[1{outcome = 1} ] = exp(a+bX)/
Exploring the Shape of the Dose-Response Function.
Logistic Regression I Outline Introduction to maximum likelihood estimation (MLE) Introduction to Generalized Linear Models The simplest logistic regression.
Introduction to Survival Analysis October 19, 2004 Brian F. Gage, MD, MSc with thanks to Bing Ho, MD, MPH Division of General Medical Sciences.
HSRP 734: Advanced Statistical Methods July 24, 2008.
Overview of Logistics Regression and its SAS implementation
April 25 Exam April 27 (bring calculator with exp) Cox-Regression
Public Health 5415 Biostatistical Methods II Spring 2005 Greg Grandits Class Times Monday10:10am-12:05pm Wednesday10:10am-11:00am.
Chapter 11 Survival Analysis Part 3. 2 Considering Interactions Adapted from "Anderson" leukemia data as presented in Survival Analysis: A Self-Learning.
PH6415 Review Questions. 2 Question 1 A journal article reports a 95%CI for the relative risk (RR) of an event (treatment versus control as (0.55, 0.97).
Common Problems in Writing Statistical Plan of Clinical Trial Protocol Liying XU CCTER CUHK.
Journal Club Alcohol, Other Drugs, and Health: Current Evidence March–April 2010.
BIOST 536 Lecture 3 1 Lecture 3 – Overview of study designs Prospective/retrospective  Prospective cohort study: Subjects followed; data collection in.
Proportional Hazard Regression Cox Proportional Hazards Modeling (PROC PHREG)
Chapter 11 Survival Analysis Part 2. 2 Survival Analysis and Regression Combine lots of information Combine lots of information Look at several variables.
EPI 809/Spring Multiple Logistic Regression.
Using time-dependent covariates in the Cox model THIS MATERIAL IS NOT REQUIRED FOR YOUR METHODS II EXAM With some examples taken from Fisher and Lin (1999)
Introduction to Survival Analysis Seminar in Statistics 1 Presented by: Stefan Bauer, Stephan Hemri
Journal Club Alcohol and Health: Current Evidence January-February 2005.
Incomplete Block Designs
Notes on Logistic Regression STAT 4330/8330. Introduction Previously, you learned about odds ratios (OR’s). We now transition and begin discussion of.
Measures of disease frequency (I). MEASURES OF DISEASE FREQUENCY Absolute measures of disease frequency: –Incidence –Prevalence –Odds Measures of association:
Model Checking in the Proportional Hazard model
Assessing Survival: Cox Proportional Hazards Model Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.
Survival Analysis A Brief Introduction Survival Function, Hazard Function In many medical studies, the primary endpoint is time until an event.
Analysis of Complex Survey Data
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 12: Multiple and Logistic Regression Marshall University.
Survival Analysis: From Square One to Square Two
Survival analysis Brian Healy, PhD. Previous classes Regression Regression –Linear regression –Multiple regression –Logistic regression.
Logistic Regression II Simple 2x2 Table (courtesy Hosmer and Lemeshow) Exposure=1Exposure=0 Disease = 1 Disease = 0.
Measuring Output from Primary Medical Care, with Quality Adjustment Workshop on measuring Education and Health Volume Output OECD, Paris 6-7 June 2007.
Essentials of survival analysis How to practice evidence based oncology European School of Oncology July 2004 Antwerp, Belgium Dr. Iztok Hozo Professor.
1 Survival Analysis Biomedical Applications Halifax SAS User Group April 29/2011.
G Lecture 121 Analysis of Time to Event Survival Analysis Language Example of time to high anxiety Discrete survival analysis through logistic regression.
Biostatistics Case Studies 2005 Peter D. Christenson Biostatistician Session 4: Taking Risks and Playing the Odds: OR vs.
Covariance structures in longitudinal analysis Which one to choose?
1 Introduction to medical survival analysis John Pearson Biostatistics consultant University of Otago Canterbury 7 October 2008.
Assessing Survival: Cox Proportional Hazards Model
Time-dependent covariates and further remarks on likelihood construction Presenter Li,Yin Nov. 24.
April 6 Logistic Regression –Estimating probability based on logistic model –Testing differences among multiple groups –Assumptions for model.
2 December 2004PubH8420: Parametric Regression Models Slide 1 Applications - SAS Parametric Regression in SAS –PROC LIFEREG –PROC GENMOD –PROC LOGISTIC.
01/20151 EPI 5344: Survival Analysis in Epidemiology Time varying covariates March 24, 2015 Dr. N. Birkett, School of Epidemiology, Public Health & Preventive.
01/20151 EPI 5344: Survival Analysis in Epidemiology Survival curve comparison (non-regression methods) March 3, 2015 Dr. N. Birkett, School of Epidemiology,
Applied Epidemiologic Analysis - P8400 Fall 2002
HSRP 734: Advanced Statistical Methods July 17, 2008.
Introduction to Survival Analysis Utah State University January 28, 2008 Bill Welbourn.
1 1 Using Marginal Structural Model to Estimate and Adjust for Causal Effect of Post- discontinuation Chemotherapy on Survival in Cancer Trials Y. Wang,
April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.
National Hospital Discharge Survey: A Hands-On Workshop Using Public-Use Data Files Michelle N. Podgornik, MPH 2006 Data Users Conference July 11, 2006.
HSRP 734: Advanced Statistical Methods July 31, 2008.
1 THE ROLE OF COVARIATES IN CLINICAL TRIALS ANALYSES Ralph B. D’Agostino, Sr., PhD Boston University FDA ODAC March 13, 2006.
Lecture 12: Cox Proportional Hazards Model
1 Lecture 6: Descriptive follow-up studies Natural history of disease and prognosis Survival analysis: Kaplan-Meier survival curves Cox proportional hazards.
Describing the risk of an event and identifying risk factors Caroline Sabin Professor of Medical Statistics and Epidemiology, Research Department of Infection.
Lecture 3: Parametric Survival Modeling
1 Introduction to Modeling Beyond the Basics (Chapter 7)
The parametric g-formula and inverse probability weighting
MPS/MSc in StatisticsAdaptive & Bayesian - Lect 31 Lecture 3 Sequential designs 3.1 A multi-stage (or group sequential) design 3.2 A general setting for.
EPI 5344: Survival Analysis in Epidemiology Week 6 Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa 03/2016.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 13: Multiple, Logistic and Proportional Hazards Regression.
03/20161 EPI 5344: Survival Analysis in Epidemiology Estimating S(t) from Cox models March 29, 2016 Dr. N. Birkett, School of Epidemiology, Public Health.
April 18 Intro to survival analysis Le 11.1 – 11.2
Survival Analysis: From Square One to Square Two Yin Bun Cheung, Ph.D. Paul Yip, Ph.D. Readings.
Statistics 262: Intermediate Biostatistics
Multiple logistic regression
Selecting the Right Predictors
Kaplan-Meier survival curves and the log rank test
Presentation transcript:

Survival analysis with time-varying covariates in SAS

The Research Question Are patients who are taking ACE/ARBs medications less likely to progress to chronic atrial fibrillation?

Atrial fibrillation (AF) Atrial fibrillation is an irregularity of the heart’s rhythm Due to chaotic electrical activity in the upper chambers (atria), the atria quiver instead of contracting in an organized manner.  Chronic AF (CAF)  patient in AF for long time period (> 7 days?)

Heart Diagram

ACE/ARB medications Medications that generally affect blood pressure Lowering blood pressure in patients with hypertension is associated with lowering the risk of stroke Here, test if these meds are associated with slower progression to CAF

What does the data look like? CAnadian Registry of Atrial Fibrillation (CARAF) Enrolled patients with first ECG documented AF 3 month follow-up and then yearly follow-ups for 10 years Medical history, medications, AF recurrence Observational prospective data

About the data… Medications taken since last visit recorded Medication use can change year by year Actual dates of start (or stop) of medication is not known At each follow-up visit, record occurrence of chronic AF since last visit Not all patients have had the same amount of follow-up time Patients are censored when structural heart disease develops

Given the data structure and question  Cox model with time-varying covariate Event is progression to CAF Once a patient develops CAF, patient no longer followed ACE/ARB use as time-varying covariate Adjust for gender and age when first confirmed with AF Censor on death and at diagnosis of structural heart disease

Cautions about this type of modeling Careful consideration to choose form of time dependent covariate most current value? lagged value? weighted average of previous values? Individualized predictions not possible Check time sequence of covariate and outcome event is diagnosis of disease, weight as time-varying covariate Fisher, LD. and DY Lin. 1999. Annu. Rev. Public Health. 20:145 – 57.

Cautions about this type of modeling TD covariates of exposure or treatment related to the health of the patient internal vs. external covariates treatment/exposure dependent on intermediate

Data collection ACE/ARB Y1 Y3 Y4 Y5 Y2 BL Q1 CAF Y4

Modelling in SAS using PROC PHREG Data can be set up in two ways Counting process format Single observation format

The Counting Process Format Based on counting processes and martingale theory Data for each patient is given by {N,Y,Z} models the intensity/rate of a point process References for counting process formulation of survival analysis Hosmer, D.W. Jr. and Lemeshow, S. (1999), Applied Survival Analysis: Regression Modeling of Time to Event Data. Toronto:John Wiley & Sons, Inc. Therneau, T. M. and Grambsch, P. M. (2000), Modeling Survival Data: Extending the Cox Model, New York: Springer-Verlag.  

The Counting Process Not only useful for modelling time-dependent covariates, but also Time-dependent strata Staggered entry Multiple events per subject

Counting process format Multiple rows of observation/patient Specify for each observation (start time, end time, indicator for event) At risk interval as: (start time, end time] Indicator for event at the end time

Example of CARAF Data caraf_no confirm_dt event_dt caf_yn sex ace_arb 01KAIJ0087 11/06/1993 09/09/1993 M 06/07/1994 27/06/1995 1 01KEEE0550 22/03/1994 08/08/1994 08/08/1995 17/09/1996 08/04/1997 27/04/1998 26/07/1999

Counting process format caraf_no start_day end_day censor sex prev_acearb 01KAIJ0087 90 390 746 01KEEE0550 139 1 504 910 1113 1497 1952

Beware! Since the counting process format can be used for a variety of data structures, easy to incorrectly structure data, so… CHECK, CHECK, CHECK formatted data to ensure that all possible configurations of (start, stop] intervals are created correctly

Counting process method proc phreg data=count; model (start_day, end_day)*censor(0) = age_confirm sex prev_acearb/rl; run;

Single observation format One record per subject Create an vector of values for time and the time-varying covariate Also, require time to event/censoring and a censor indicator

Single observation data format caraf_no Days2 evt ace0 ace1 ace2 ace3 ace4 ace5 ace6 ace7 ace8 ace9 ace10 01KAIJ0087 746 . 01KEEE0550 1952 1 censor days0 days1 days2 days3 days4 days5 days6 days7 days8 days9 days 10 90 390 3885 3886 3887 3888 3889 3890 3891 3892 139 504 910 1113 1497

Single observation method proc phreg data= single_obs; model days2evt*censor(0)= age_confirm sex prev_acearb/rl; array a_acearb{*} ace0-ace10; array a_time{*} days0-days10; if days2evt <= a_time[1] then do; prev_acearb=a_acearb[1]; end; else if days2evt > a_time[11] then do; prev_acearb=a_acearb[11]; else do i=1 to 10; if a_time[i] < days2evt <= a_time[i+1] then do; prev_acearb=a_acearb[i+1]; end; run;

Caution when formatting the data Check, check and check data format Print out subset of data to see if programming statements choose the correct time-varying covariate value

Results Counting Process Method Single Observation Method 2.475 0.502 1.115 0.7895 0.0713 0.40685 0.10862 1 prev_acearb 1.581 0.534 0.919 0.7603 0.0931 0.27681 -0.08445 sex 1.060 1.011 1.035 0.0041 8.2463 0.01214 0.03486 age_confirm 95% Hazard Ratio Confidence Limits Hazard Ratio Pr > ChiSq Chi-Square Standard Error Parameter Estimate DF Variable Analysis of Maximum Likelihood Estimates Counting Process Method 2.475 0.502 1.115 0.7895 0.0713 0.40685 0.10862 1 prev_acearb 1.581 0.534 0.919 0.7603 0.0931 0.27681 -0.08445 sex 1.060 1.011 1.035 0.0041 8.2463 0.01214 0.03486 age_confirm 95% Hazard Ratio Confidence Limits Hazard Ratio Pr > ChiSq Chi-Square Standard Error Parameter Estimate DF Variable Analysis of Maximum Likelihood Estimates

Model Information – Single Observation Method Data Set WORK.SINGLE_OBS Dependent Variable days2evt Censoring Variable censor Censoring Value(s) Ties Handling BRESLOW Number of Observations Read Number of Observations Used 361 361 Summary of the Number of Event and Censored Values Total Event Censored Percent Censored 361 57 304 84.21

Model Information – Counting Process Method Data Set WORK.COUNT Dependent Variable start_day end_day Censoring Variable censor Censoring Value(s) Ties Handling BRESLOW Number of Observations Read Number of Observations Used 1876 1876 Summary of the Number of Event and Censored Values Total Event Censored Percent Censored 1876 57 1819 96.96

Residual analyses For Single observation format proc phreg data= single_obs; model days2evt*censor(0)= age_confirm sex prev_acearb/rl; array a_acearb{*} ace0-ace10; array a_time{*} days0-days10; . output out=single_res resmart=resmart ressch=resch ressco=ressco; run; WARNING: The OUTPUT data set has no observations due to the presence of time-dependent explanatory variables. NOTE: Convergence criterion (GCONV=1E-8) satisfied. NOTE: The data set WORK.SINGLE_RES has 0 observations and 8 variables. NOTE: PROCEDURE PHREG used (Total process time): real time 0.12 seconds cpu time 0.07 seconds

Residuals - continued Counting Process Format proc phreg data=count; model (start_day, end_day)*censor(0) = age_confirm sex prev_acearb/rl; output out=count_mres resmart=resmart ressch=resch ressco=ressco; run;

Faster computations? Counting process format Single observation format NOTE: Convergence criterion (GCONV=1E-8) satisfied. NOTE: PROCEDURE PHREG used (Total process time): real time 0.07 seconds cpu time 0.06 seconds Single observation format NOTE: Convergence criterion (GCONV=1E-8) satisfied. NOTE: PROCEDURE PHREG used (Total process time): real time 0.10 seconds cpu time 0.07 seconds

Incorrect time interval caraf_no visit confirm_ event_ start_ end_ censor date day 06LAFL0178 q1 15/01/1993 30/04/1993 105 y1 24/03/1993 68 1

Incorrect time interval – cont’d Single observation method NOTE: Convergence criterion (GCONV=1E-8) satisfied. NOTE: PROCEDURE PHREG used (Total process time): real time 0.57 seconds cpu time 0.26 seconds

Summary of the Number of Event and Censored Values Single observation Model Information Data Set WORK.SINGLE_OBS Dependent Variable days2evt Censoring Variable censor Censoring Value(s) Ties Handling BRESLOW 83.98 304 58 362 Percent Censored Censored Event Total Summary of the Number of Event and Censored Values

Incorrect time interval-cont’d Using the counting process method NOTE: 1 observations were deleted due either to missing or invalid values for the time, censoring, frequency or explanatory variables or to invalid operations in generating the values for some of the explanatory variables. NOTE: Convergence criterion (GCONV=1E-8) satisfied. NOTE: PROCEDURE PHREG used (Total process time): real time 0.82 seconds cpu time 0.29 seconds

Summary of the Number of Event and Censored Values Counting process Model Information Data Set WORK.COUNT Dependent Variable start_day end_day Censoring Variable censor Censoring Value(s) Ties Handling BRESLOW 96.97 1822 57 1879 Percent Censored Censored Event Total Summary of the Number of Event and Censored Values

Incorrect time interval If at risk interval has the same start and end time caraf_no confirm_dt event_dt start_day end_day 01CRED0022 11/07/1991 17/01/1992 190 08/07/1992 363 30/08/1994 1146 09/08/1995 1490 01HARD0032 23/08/1991 04/09/1992 378

Incorrect time interval In the log file: Counting process – note identifies # observations excluded and analyses continues without problems Single observation – no note  analyses continues without problems

Missing mid follow-up Counting Process Single Observation NOTE: 140 observations were deleted due either to missing or invalid values for the time, censoring, frequency or explanatory variables or to invalid operations in generating the values for some of the explanatory variables. NOTE: Convergence criterion (GCONV=1E-8) satisfied. Single Observation NOTE: 63 observations that were events in the input data set are counted as censored observations due to missing or invalid values in the time-dependent explanatory variables. NOTE: Convergence criterion (GCONV=1E-8) satisfied.

Age as fixed or time-varying? Age as time-varying proc phreg data= single_obs; model days2evt*censor(0)= /*age_confirm*/ age_fup sex prev_acearb/rl; array a_acearb{*} ace0-ace10; array a_time{*} days0-days10; age_fup=age_confirm + days2evt; if days2evt <= a_time[1] then do; prev_acearb=a_acearb[1]; end; else if days2evt > a_time[11] then do; prev_acearb=a_acearb[11]; end; else do i=1 to 10; if a_time[i] < days2evt <= a_time[i+1] then do; prev_acearb=a_acearb[i+1]; end; end; run;

Age as time-varying Single Observation Method 2.475 0.502 1.115 0.7895 0.0713 0.40685 0.10862 1 prev_acearb 1.581 0.534 0.919 0.7603 0.0931 0.27681 -0.08445 sex 1.060 1.011 1.035 0.0041 8.2463 0.01214 0.03486 age_confirm 95% Hazard Ratio Confidence Limits Hazard Ratio Pr > ChiSq Chi-Square Standard Error Parameter Estimate DF Variable Analysis of Maximum Likelihood Estimates Analysis of Maximum Likelihood Estimates Variable DF Parameter Estimate Standard Error Chi-Square Pr > ChiSq Hazard Ratio 95% Hazard Ratio Confidence Limits age_fup 1 0.03486 0.01214 8.2463 0.0041 1.035 1.011 1.060 sex -0.08445 0.27681 0.0931 0.7603 0.919 0.534 1.581 prev_acearb 0.10862 0.40685 0.0713 0.7895 1.115 0.502 2.475

Age as time-varying Why are the results the same for either fixed or time-varying age? log hi(t) = a(t) + bsexxi1 + bage_fupxi2(t) + bprev_acearbxi2(t) Recall: xi2(t) = xi2(0) + t Rewrite equation as: log hi(t) = a*(t) + bsexxi1 + bage_fupxi2(0) + bprev_acearbxi2(t)

Comparison between the two methods Parameter estimates are equivalent between methods Counting process format can be used to model other types of analyses Residual analysis possible only with the counting process format Counting process provides notes in log if incorrect sequential time intervals found Easier to check for correct time-varying covariate value in the counting process format Counting process method faster?

Summary Easy enough to carry out survival analysis with time-varying covariates in SAS but requires careful consideration if appropriate to do and how to formulate Two equivalent methods for formatting for this type of analyses some advantages to the counting process method CHECK your data format regardless of choice!

References Allison, PD. (1995), Survival Analysis Using SAS: A Practical Guide. Cary, NC:SAS Institute Inc. Ake, Christopher F. and Arthur L. Carpenter, 2003, "Extending the Use of PROC PHREG in Survival Analysis", Proceedings of the 11th Annual Western Users of SAS Software, Inc. Users Group Conference, Cary, NC: SAS Institute Inc. Fisher, LD. and DY Lin. 1999. Annu. Rev. Public Health. 20:145 – 57. Hosmer, D.W. Jr. and Lemeshow, S. (1999), Applied Survival Analysis: Regression Modeling of Time to Event Data. Toronto:John Wiley & Sons, Inc. Therneau, T. M. and Grambsch, P. M. (2000), Modeling Survival Data: Extending the Cox Model, New York: Springer-Verlag.