Lecture 13: Cox PHM Part II: Basic Cox Model, Parameter Estimation, Hypothesis Testing

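Recall: Basic Cox PHM
Model: $h(t \mid Z) = h_0(t)\exp(\beta^{\top} Z) = h_0(t)\exp\left(\sum_{k=1}^{p} \beta_k Z_k\right)$
Linear form: $\log\left[h(t \mid Z)/h_0(t)\right] = \sum_{k=1}^{p} \beta_k Z_k$
Proportional hazards because, for two covariate vectors $Z$ and $Z^*$, $h(t \mid Z)/h(t \mid Z^*) = \exp\left[\beta^{\top}(Z - Z^*)\right]$, which does not depend on $t$.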
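To see numerically why the ratio does not depend on $t$, here is a minimal sketch; the Weibull-type baseline hazard `h0` and the values of `beta`, `z1`, `z2` are hypothetical choices for illustration only.

```r
# Hypothetical baseline hazard (Weibull-type, shape 1.5); any positive function works
h0 <- function(t) 1.5 * t^0.5

# Cox model hazard: h(t | z) = h0(t) * exp(beta * z)
cox_hazard <- function(t, z, beta) h0(t) * exp(beta * z)

beta  <- 0.7              # hypothetical log hazard ratio per unit of z
z1    <- 2                # two hypothetical covariate values
z2    <- 1
times <- c(0.5, 1, 5, 10)

# h0(t) cancels in the ratio, so the hazard ratio is the same at every time
hr <- cox_hazard(times, z1, beta) / cox_hazard(times, z2, beta)
print(hr)                 # every entry equals exp(beta * (z1 - z2)) = exp(0.7)
```

Changing `h0` to any other positive function leaves `hr` unchanged, which is exactly the proportional hazards property.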

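Likelihood
Full likelihood (censored data; $\delta_j$ = event indicator, $H_0$ = cumulative baseline hazard):
$L(\beta, h_0) = \prod_{j=1}^{n} \left[h_0(t_j)\exp(\beta^{\top} Z_j)\right]^{\delta_j} \exp\left[-H_0(t_j)\exp(\beta^{\top} Z_j)\right]$
Log-likelihood:
$LL(\beta, h_0) = \sum_{j=1}^{n} \delta_j\left[\ln h_0(t_j) + \beta^{\top} Z_j\right] - \sum_{j=1}^{n} H_0(t_j)\exp(\beta^{\top} Z_j)$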

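Partial Likelihood
The partial likelihood is defined as
$L(\beta) = \prod_{i=1}^{D} \frac{\exp\left(\sum_{k=1}^{p} \beta_k Z_{(i)k}\right)}{\sum_{j \in R(t_i)} \exp\left(\sum_{k=1}^{p} \beta_k Z_{jk}\right)}$
where
– $j = 1, 2, \ldots, n$ indexes individuals
– there are no ties, and $t_1 < t_2 < \cdots < t_D$ are the distinct event times
– $Z_{(i)k}$ is the $k$th covariate associated with the individual whose failure time is $t_i$
– $R(t_i)$ is the risk set at time $t_i$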
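The product above translates almost line for line into code. A minimal sketch in base R, assuming no tied event times; the function name `cox_partial_loglik` and the toy data are hypothetical.

```r
# Log partial likelihood for the Cox model, assuming no tied event times.
# time:   observed times (event or censoring)
# status: 1 = event, 0 = censored
# Z:      n x p covariate matrix (a vector is treated as one covariate)
# beta:   length-p coefficient vector
cox_partial_loglik <- function(beta, time, status, Z) {
  Z <- as.matrix(Z)
  eta <- drop(Z %*% beta)                 # linear predictors beta'Z_j
  ll <- 0
  for (i in which(status == 1)) {         # one factor per observed failure
    at_risk <- time >= time[i]            # risk set R(t_i)
    ll <- ll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  ll
}

# Usage with hypothetical data: at beta = 0 every subject has weight 1, so the
# value is -sum(log(risk-set sizes)) = -log(3) - log(2)
ll0 <- cox_partial_loglik(0, time = c(1, 2, 3), status = c(1, 1, 0), Z = c(0.5, 1, 2))
```

Censored subjects contribute no factor of their own but still enter the denominators while they are in the risk set.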

Estimation
We can use the log of the partial likelihood to obtain an MLE for $\beta$:
– maximize the log-likelihood to solve for the estimates of $\beta$
– score equations and information matrices are found using standard approaches
– solving for the estimates is done numerically (e.g., Newton-Raphson)

Tests of the Model
Testing $H_0: \beta_k = 0$ for all $k = 1, 2, \ldots, p$. Three main tests:
– Wald (chi-square) test
– Likelihood ratio test
– Score test
Under $H_0$, all three have an approximate chi-square distribution with $p$ degrees of freedom.

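Score Equations and Information Matrix
The MLE is found by solving $U_h(\beta) = 0$, $h = 1, \ldots, p$, using Newton-Raphson (which needs $U_h(\beta)$ and $I(\beta)$).
Start with an initial guess $b^{(0)}$ for $\beta$. After the $i$th step of the algorithm, the updated guess is
$b^{(i+1)} = b^{(i)} + I^{-1}\left(b^{(i)}\right) U\left(b^{(i)}\right)$
Repeat this process until convergence is achieved.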
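A minimal Newton-Raphson sketch in base R, assuming no ties. It uses the standard forms of the score, $U(\beta) = \sum_i [Z_{(i)} - \bar Z(t_i)]$, and information, $I(\beta) = \sum_i \mathrm{Cov}_w[Z \mid R(t_i)]$, where $\bar Z(t_i)$ and the covariance are weighted by $\exp(\beta^{\top} Z_j)$ over the risk set. The function name `cox_newton` and its arguments are hypothetical; this is an illustration, not how `coxph` is implemented internally.

```r
# Newton-Raphson for the Cox partial likelihood (no ties), built from the
# score vector U(b) and information matrix I(b).
cox_newton <- function(time, status, Z, tol = 1e-9, max_iter = 50) {
  Z <- as.matrix(Z)
  p <- ncol(Z)
  b <- rep(0, p)                           # initial guess b^(0) = 0
  for (iter in seq_len(max_iter)) {
    U <- rep(0, p)
    I <- matrix(0, p, p)
    w_all <- exp(drop(Z %*% b))            # weights exp(b'Z_j)
    for (i in which(status == 1)) {
      r <- time >= time[i]                 # risk set R(t_i)
      w <- w_all[r]
      Zr <- Z[r, , drop = FALSE]
      zbar <- colSums(Zr * w) / sum(w)     # weighted mean of Z over R(t_i)
      U <- U + Z[i, ] - zbar
      I <- I + crossprod(Zr * w, Zr) / sum(w) - tcrossprod(zbar)
    }
    step <- solve(I, U)                    # I^{-1}(b) U(b)
    b <- b + step                          # b^(i+1) = b^(i) + I^{-1} U
    if (max(abs(step)) < tol) break
  }
  # Variance uses the information at the last evaluated iterate
  list(coef = b, var = solve(I), iter = iter)
}
```

The result can be checked against a direct numerical maximization of the log partial likelihood, as the test below does with `optimize`.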

Developing the Tests
To develop the tests, we must be able to define
– the log-likelihood for the partial likelihood expression
– the score equation(s), $U_h(\beta)$
– the information matrix, $I(\beta)$
A simple example:
– 3 observed times: $t_1 < t_2 < t_3$
– event indicators: $\delta_i = 1, 1, 0$
– 1 covariate: $z_i = z_1, z_2, z_3$

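Log-likelihood
The risk sets are $R(t_1) = \{1, 2, 3\}$ and $R(t_2) = \{2, 3\}$; the censored time $t_3$ contributes no factor:
$LL(\beta) = \beta z_1 - \ln\left(e^{\beta z_1} + e^{\beta z_2} + e^{\beta z_3}\right) + \beta z_2 - \ln\left(e^{\beta z_2} + e^{\beta z_3}\right)$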

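Score Equation
$U(\beta) = \frac{d\,LL(\beta)}{d\beta} = z_1 - \frac{z_1 e^{\beta z_1} + z_2 e^{\beta z_2} + z_3 e^{\beta z_3}}{e^{\beta z_1} + e^{\beta z_2} + e^{\beta z_3}} + z_2 - \frac{z_2 e^{\beta z_2} + z_3 e^{\beta z_3}}{e^{\beta z_2} + e^{\beta z_3}}$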

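Information (Matrix)
With one covariate, the information is a scalar:
$I(\beta) = -\frac{d^2 LL(\beta)}{d\beta^2} = \frac{z_1^2 e^{\beta z_1} + z_2^2 e^{\beta z_2} + z_3^2 e^{\beta z_3}}{e^{\beta z_1} + e^{\beta z_2} + e^{\beta z_3}} - \left(\frac{z_1 e^{\beta z_1} + z_2 e^{\beta z_2} + z_3 e^{\beta z_3}}{e^{\beta z_1} + e^{\beta z_2} + e^{\beta z_3}}\right)^2 + \frac{z_2^2 e^{\beta z_2} + z_3^2 e^{\beta z_3}}{e^{\beta z_2} + e^{\beta z_3}} - \left(\frac{z_2 e^{\beta z_2} + z_3 e^{\beta z_3}}{e^{\beta z_2} + e^{\beta z_3}}\right)^2$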
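The three formulas for the toy example can be checked against each other numerically. A minimal sketch in base R; the covariate values `z = (1, 0, 2)` are hypothetical, and the check compares the analytic score and information with central finite differences.

```r
# Three-observation example: t1 < t2 < t3, delta = (1, 1, 0), one covariate.
# Risk sets: R(t1) = {1, 2, 3}, R(t2) = {2, 3}; t3 is censored, so it adds no term.
z <- c(1.0, 0.0, 2.0)   # hypothetical covariate values z1, z2, z3

loglik <- function(b) {
  b * z[1] - log(sum(exp(b * z[1:3]))) +
    b * z[2] - log(sum(exp(b * z[2:3])))
}

score <- function(b) {
  w1 <- exp(b * z[1:3]); w2 <- exp(b * z[2:3])
  (z[1] - sum(z[1:3] * w1) / sum(w1)) + (z[2] - sum(z[2:3] * w2) / sum(w2))
}

# Information = sum over risk sets of the weighted variance of z
information <- function(b) {
  w1 <- exp(b * z[1:3]); w2 <- exp(b * z[2:3])
  v1 <- sum(z[1:3]^2 * w1) / sum(w1) - (sum(z[1:3] * w1) / sum(w1))^2
  v2 <- sum(z[2:3]^2 * w2) / sum(w2) - (sum(z[2:3] * w2) / sum(w2))^2
  v1 + v2
}

# Finite-difference checks at b = 0.3
h <- 1e-5
num_score <- (loglik(0.3 + h) - loglik(0.3 - h)) / (2 * h)
num_info  <- -(score(0.3 + h) - score(0.3 - h)) / (2 * h)
```

At $\beta = 0$ all weights are 1, so $U(0) = (1 - 1) + (0 - 1) = -1$ and $I(0) = 2/3 + 1 = 5/3$ for these values.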

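Wald Test
Based on: $b \sim N\left(\beta, I^{-1}(b)\right)$ approximately, for large samples
Wald test of $H_0: \beta = \beta_0$: $X^2_W = (b - \beta_0)^{\top} I(b)\,(b - \beta_0)$; for the global test $\beta_0 = 0$, so $X^2_W = b^{\top} I(b)\, b \sim \chi^2_p$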

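Score Test
Based on: $U(\beta_0) \sim N\left(0, I(\beta_0)\right)$ approximately under $H_0: \beta = \beta_0$
Score test: $X^2_{SC} = U(\beta_0)^{\top} I^{-1}(\beta_0)\, U(\beta_0) \sim \chi^2_p$; for the global test, $\beta_0 = 0$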

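Likelihood Ratio Test
$X^2_{LR} = 2\left[LL(b) - LL(\beta_0)\right] \sim \chi^2_p$ under $H_0: \beta = \beta_0$ (for the global test, $\beta_0 = 0$)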
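For the one-covariate toy example, all three global statistics reduce to scalars: $X^2_W = b^2 I(b)$, $X^2_{SC} = U(0)^2 / I(0)$, and $X^2_{LR} = 2[LL(b) - LL(0)]$. A minimal sketch in base R with the same hypothetical covariate values `z = (1, 0, 2)`; the MLE is found with `uniroot` on the score rather than Newton-Raphson, purely for brevity.

```r
z <- c(1.0, 0.0, 2.0)         # hypothetical covariate values (toy example)
risk_sets <- list(1:3, 2:3)   # R(t1), R(t2)
failed <- c(1, 2)             # subjects failing at t1 and t2

loglik <- function(b) sum(sapply(1:2, function(i) {
  r <- risk_sets[[i]]
  b * z[failed[i]] - log(sum(exp(b * z[r])))
}))
score <- function(b) sum(sapply(1:2, function(i) {
  r <- risk_sets[[i]]; w <- exp(b * z[r])
  z[failed[i]] - sum(z[r] * w) / sum(w)
}))
info <- function(b) sum(sapply(1:2, function(i) {
  r <- risk_sets[[i]]; w <- exp(b * z[r]); m <- sum(z[r] * w) / sum(w)
  sum(z[r]^2 * w) / sum(w) - m^2
}))

b_hat <- uniroot(score, c(-10, 10), tol = 1e-12)$root   # MLE solves U(b) = 0

wald_stat  <- b_hat^2 * info(b_hat)              # b' I(b) b
score_stat <- score(0)^2 / info(0)               # U(0)' I^{-1}(0) U(0)
lrt_stat   <- 2 * (loglik(b_hat) - loglik(0))    # 2[LL(b) - LL(0)]
p_values   <- pchisq(c(wald_stat, score_stat, lrt_stat), df = 1, lower.tail = FALSE)
```

Note that the score test needs only quantities at $\beta = 0$, so it does not require fitting the model at all.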

Partial Likelihood for Ties?
The previous partial likelihood applies only when there are no ties. Three primary approaches handle ties:
– Breslow
– Efron
– Cox
When ties are few, all three give similar results.

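Breslow (1974)
$L_1(\beta) = \prod_{i=1}^{D} \frac{\exp(\beta^{\top} s_i)}{\left[\sum_{j \in R(t_i)} \exp(\beta^{\top} Z_j)\right]^{d_i}}$
where $s_i$ is the sum of the vectors $Z_j$ over all individuals who die at time $t_i$, and $d_i$ is the number of deaths at $t_i$.
Each of the $d_i$ events at a given time is treated as distinct, with every one of them facing the full risk set $R(t_i)$.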

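Efron (1977)
$L_2(\beta) = \prod_{i=1}^{D} \frac{\exp(\beta^{\top} s_i)}{\prod_{j=1}^{d_i}\left[\sum_{k \in R(t_i)} \exp(\beta^{\top} Z_k) - \frac{j-1}{d_i} \sum_{k \in D_i} \exp(\beta^{\top} Z_k)\right]}$
where $D_i$ is the set of individuals who have the event at time $t_i$.
This is closer than Breslow's approximation to the correct partial likelihood based on a discrete hazard model.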
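The two approximations differ only in the denominator at tied times. A minimal sketch in base R for one covariate; the function name `cox_loglik_ties` and the test data are hypothetical.

```r
# Log partial likelihood with tied event times: Breslow or Efron approximation.
# One covariate for brevity; time, status, z are vectors of equal length.
cox_loglik_ties <- function(beta, time, status, z, method = c("breslow", "efron")) {
  method <- match.arg(method)
  eta <- beta * z
  ll <- 0
  for (ti in unique(time[status == 1])) {
    D <- which(time == ti & status == 1)  # individuals failing at t_i (D_i)
    R <- which(time >= ti)                # risk set R(t_i)
    d <- length(D)
    s <- sum(z[D])                        # s_i: summed covariates of the failures
    ll <- ll + beta * s
    if (method == "breslow") {
      # every tied failure faces the full risk set
      ll <- ll - d * log(sum(exp(eta[R])))
    } else {
      # the j-th tied failure removes a fraction (j - 1)/d of the tied weight
      for (j in seq_len(d)) {
        ll <- ll - log(sum(exp(eta[R])) - (j - 1) / d * sum(exp(eta[D])))
      }
    }
  }
  ll
}
```

With two failures tied at the first time among four at risk and $\beta = 0$, Breslow's denominator is $4^2 = 16$ while Efron's is $4 \times 3 = 12$; without ties the two methods agree exactly.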

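Cox (1972)
Based on a discrete-time hazard-rate model; assumes a logistic model for the hazard rate.
– Let $h(t \mid Z)$ be the conditional death probability in $(t, t+1)$ given survival to time $t$
– Assume
$\frac{h(t \mid Z)}{1 - h(t \mid Z)} = \frac{h_0(t)}{1 - h_0(t)} \exp(\beta^{\top} Z)$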

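Cox (1972), Discrete
This is the “proper” partial likelihood:
$L_3(\beta) = \prod_{i=1}^{D} \frac{\exp(\beta^{\top} s_i)}{\sum_{q \in Q_i} \exp(\beta^{\top} s^*_q)}$, with $s^*_q = \sum_{j=1}^{d_i} Z_{q_j}$
where
– $Q_i$ is the set of all $d_i$-tuples of individuals who could have been the $d_i$ failures at $t_i$ (all subsets of size $d_i$ of $R(t_i)$)
– $q = \{q_1, q_2, \ldots, q_{d_i}\}$ is one of the elements of $Q_i$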
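The denominator is a sum over every element of $Q_i$, which can be enumerated directly with `combn` for small risk sets. A minimal sketch of one time point's contribution, one covariate; the function name `exact_contribution` is hypothetical, and brute-force enumeration is only for illustration.

```r
# Cox's (1972) discrete-model partial likelihood contribution at one tied time t_i.
# z_risk:     covariate values of the individuals in R(t_i)
# failed_idx: positions (within z_risk) of the d_i individuals who failed at t_i
exact_contribution <- function(beta, z_risk, failed_idx) {
  d <- length(failed_idx)
  s_i <- sum(z_risk[failed_idx])                        # s_i for the observed failures
  subsets <- combn(length(z_risk), d)                   # each column is one q in Q_i
  s_q <- apply(subsets, 2, function(q) sum(z_risk[q]))  # s*_q for each subset
  beta * s_i - log(sum(exp(beta * s_q)))                # log contribution
}
```

With $d_i = 1$ this reduces to the usual untied factor; with $\beta = 0$ it equals $-\log\binom{|R(t_i)|}{d_i}$.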

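Why Not Use Cox Every Time?
Consider the computation: the denominator of the Cox partial likelihood sums over every subset of size $d_i$ of the risk set, so it is much more complicated. The Efron and Breslow approaches are far less computationally intensive. R can do all three (the ties argument of coxph: "efron", the default, "breslow", or "exact").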
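The scale of the problem is easy to see by counting denominator terms at a single tied time. A minimal sketch with a hypothetical risk set of 100 subjects and 10 tied failures; it counts terms for a naive enumeration and says nothing about the algorithm coxph actually uses for ties = "exact".

```r
# Hypothetical tied time: 100 subjects at risk, 10 failures at the same time
n_risk <- 100
d <- 10
terms <- c(
  breslow = 1,                 # one risk-set sum, raised to the power d
  efron   = d,                 # d adjusted risk-set sums
  exact   = choose(n_risk, d)  # one term per d-subset of the risk set (Q_i)
)
print(terms)
```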

“Local” Tests
Testing individual coefficients or, more interestingly, testing a set of coefficients. Examples:
– testing the treatment variables (3 categories)
– testing extent of spread (4 categories)
Same tests as before:
– Wald test
– Score test
– Likelihood ratio test

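Wald Test
Partition $\beta = (\beta_1^{\top}, \beta_2^{\top})^{\top}$, where $\beta_1$ is the $q \times 1$ vector of coefficients to be tested and $\beta_2$ holds the remaining $p - q$ coefficients; partition $b = (b_1^{\top}, b_2^{\top})^{\top}$ the same way. Test $H_0: \beta_1 = \beta_{10}$.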

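The Wald test is then
$X^2_W = (b_1 - \beta_{10})^{\top} \left[I^{11}(b)\right]^{-1} (b_1 - \beta_{10})$
where $I^{11}(b)$ is the upper $q \times q$ submatrix of $I^{-1}(b)$, the estimated covariance matrix of $b$. Under $H_0$, this statistic is approximately $\chi^2$ with $q$ degrees of freedom.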
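A minimal sketch of the local Wald test in base R, taking the coefficient vector and its covariance matrix $I^{-1}(b)$ as inputs; the function name `local_wald` and the numbers in the test are hypothetical.

```r
# Wald test of H0: beta_1 = beta_10 for a subset of q coefficients.
# b:   full coefficient vector
# V:   estimated covariance matrix of b, i.e. I^{-1}(b)
# idx: positions of the q tested coefficients
local_wald <- function(b, V, idx, beta10 = rep(0, length(idx))) {
  diff <- b[idx] - beta10
  V11 <- V[idx, idx, drop = FALSE]      # I^{11}(b): block of the inverse information
  stat <- drop(t(diff) %*% solve(V11) %*% diff)
  q <- length(idx)
  c(statistic = stat, df = q,
    p.value = pchisq(stat, df = q, lower.tail = FALSE))
}
```

For a fitted coxph object, `local_wald(coef(fit), vcov(fit), idx)` would apply the same computation; `vcov()` returns the `var` component used on the R slides.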

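Score Test
Define $U_1[\beta_{10}, b_2(\beta_{10})]$, the vector of $q$ scores for $\beta_1$, evaluated at $\beta_1 = \beta_{10}$ and $\beta_2 = b_2(\beta_{10})$ (the restricted estimate of $\beta_2$ with $\beta_1$ fixed at $\beta_{10}$). The score test is then
$X^2_{SC} = U_1[\beta_{10}, b_2(\beta_{10})]^{\top}\, I^{11}[\beta_{10}, b_2(\beta_{10})]\, U_1[\beta_{10}, b_2(\beta_{10})]$
where $I^{11}$ is again the upper $q \times q$ block of the inverse information. Under $H_0$, this statistic is approximately $\chi^2$ with $q$ degrees of freedom.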

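Likelihood Ratio Test
Let $b_2(\beta_{10})$ be the partial maximum likelihood estimates of $\beta_2$ based on the model with $\beta_1$ set to $\beta_{10}$. The LRT is
$X^2_{LR} = 2\left\{LL(b) - LL[\beta_{10}, b_2(\beta_{10})]\right\}$
where LL = log partial likelihood. Under $H_0$, the LRT has an approximate chi-square distribution with $q$ degrees of freedom.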

Example
A study examining the impact of cancer stage and age at diagnosis on survival in men with larynx cancer.
– 90 subjects
– Outcome: time to death from diagnosis
– Variables: age at diagnosis; stage (I-IV)

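Model
Consider a model with both age and stage. Global tests (fit using the Breslow method), each on 4 df (3 stage indicators + age):
– LRT: 18.07, p = 0.0012
– Wald: 20.82, p = 0.0003
– Score: 24.33, p < 0.0001
Per-variable results (df, beta, se, Wald test, p-value, RR) were reported for Stage II, Stage III, Stage IV, and Age.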
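The p-values for the global statistics follow from the chi-square upper tail. A minimal sketch, assuming 4 df (3 indicator variables for stage II, III, IV vs. stage I, plus age):

```r
# Global test statistics reported for the larynx model (stage + age).
# Assumption: df = 4 (three stage indicators + age).
stats <- c(LRT = 18.07, Wald = 20.82, Score = 24.33)
p <- pchisq(stats, df = 4, lower.tail = FALSE)
round(p, 4)   # roughly 0.0012, 0.0003, and below 0.0001
```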

Local Test
Say we want to test whether the betas for stage are all 0:
$H_0: \beta_{\text{II}} = \beta_{\text{III}} = \beta_{\text{IV}} = 0$
The information needed depends on the test we want to use:
– LRT: $\log L(b)$ for the "full" and "null" models
– Wald: $b_1$ and $I^{11}$
– Score: $U_1[\beta_{10}, b_2(\beta_{10})]$ and $I^{11}[\beta_{10}, b_2(\beta_{10})]$

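Local Test: LRT
Null model: $h(t \mid Z) = h_0(t)\exp(\beta_{\text{age}} Z_{\text{age}})$
Full model: $h(t \mid Z) = h_0(t)\exp(\beta_{\text{II}} Z_{\text{II}} + \beta_{\text{III}} Z_{\text{III}} + \beta_{\text{IV}} Z_{\text{IV}} + \beta_{\text{age}} Z_{\text{age}})$
Compute $\log L(b)$ for the full model and for the null model; then $X^2_{LR} = 2[\log L(b)_{\text{full}} - \log L(b)_{\text{null}}]$, compared with $\chi^2_3$.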

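Local Test: Wald Test
Information matrix: take $I^{11}(b)$, the $3 \times 3$ block of $I^{-1}(b)$ for the stage coefficients.
Test: $X^2_W = b_1^{\top}\left[I^{11}(b)\right]^{-1} b_1$, with $b_1 = (b_{\text{II}}, b_{\text{III}}, b_{\text{IV}})^{\top}$, compared with $\chi^2_3$.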

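Local Test: Score Test
Vector of coefficients under the null: $(\beta_{10}, b_2(\beta_{10})) = (0, 0, 0, b_{\text{age}}(0))$, where $b_{\text{age}}(0)$ is the age estimate from the model without stage.
Information matrix under the null: $I^{11}$ evaluated at that vector.
Test: $X^2_{SC} = U_1^{\top} I^{11} U_1$, compared with $\chi^2_3$.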

Contrasts
Recall the test for $H_0: c^{\top}\beta = c^{\top}\beta_0$.
When a covariate is a factor (e.g., stage), we can test contrasts:
– Stage II vs. Stage III
– Stage II vs. Stage IV
– Stage III vs. Stage IV
Define a contrast vector $c$ and test $H_0: c^{\top}\beta = 0$ with $X^2_W = (c^{\top} b)^2 / \left(c^{\top} I^{-1}(b)\, c\right) \sim \chi^2_1$.
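A minimal sketch of a general contrast test with a 95% CI on the hazard-ratio scale, in base R; the function name `contrast_test` and the numbers in the test are hypothetical. For $c = (-1, 1)$ the variance $c^{\top} V c$ reduces to $V_{11} + V_{22} - 2V_{12}$, the formula used in the R contrast slide.

```r
# Wald test and CI for a linear contrast c'beta (H0: c'beta = 0).
# b: coefficient vector; V: its covariance matrix I^{-1}(b); cvec: contrast vector
contrast_test <- function(b, V, cvec, level = 0.95) {
  est <- sum(cvec * b)                          # c'b
  se  <- sqrt(drop(t(cvec) %*% V %*% cvec))     # sqrt(c' V c)
  z   <- est / se
  zq  <- qnorm(1 - (1 - level) / 2)
  c(estimate = est, se = se, z = z,
    p.value  = 2 * pnorm(-abs(z)),              # two-sided p-value
    hr       = exp(est),
    hr.lower = exp(est - zq * se),
    hr.upper = exp(est + zq * se))
}
```

With a fitted model such as reg3, `contrast_test(coef(reg3), vcov(reg3), c(rep(0, 10), -1, 1, 0))` would compare the third and second extent categories.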

Fitting Models in R
The coxph function fits CPHMs and reports
– coefficient estimates
– global tests (LRT, Wald, and score)
– a local (Wald) test for each individual covariate
We can also easily conduct local tests and contrasts ourselves using
– LRT
– Wald
Score is not so easy…
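One workaround for a local score test, sketched under stated assumptions: coxph's `score` component is the score test evaluated at the initial coefficient values, so fitting the full model with `init` set to $(b_2(\beta_{10}), \beta_{10} = 0)$ and zero iterations gives $U^{\top} I^{-1} U$ at the restricted estimate. Because the scores for $\beta_2$ are (numerically) zero there, this equals the local score test. The model below (testing rx while adjusting for sex) is a hypothetical simplification of the lecture's models.

```r
library(survival)

# One row per patient, selected as in the lecture's coln
coln <- colon[2 * (1:929), ]

# Null model: rx coefficients constrained to 0 (rx omitted)
null_fit <- coxph(Surv(time, status) ~ sex, data = coln)

# Full model evaluated, without iterating, at (b_2(beta_10), beta_10 = 0);
# coefficient order in this formula is sex, rxLev, rxLev+5FU
init_vals <- c(coef(null_fit), 0, 0)
fit0 <- coxph(Surv(time, status) ~ sex + rx, data = coln,
              init = init_vals, control = coxph.control(iter.max = 0))

# Score test at the initial values = local score test for the 2 rx coefficients
score_rx <- fit0$score
p_rx <- pchisq(score_rx, df = 2, lower.tail = FALSE)
```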

Example: Colon Cancer
Trial of adjuvant chemotherapy for colon cancer; 929 subjects.
– Main variable of interest: treatment (placebo, levamisole, levamisole + 5-FU)
– Other variables: demographics (gender, age); tumor obstruction (yes/no); number of lymph nodes involved (number with detectable cancer); extent of local spread (submucosa, muscle, serosa, contiguous structures); adherence to other organs (yes/no)
– Outcome: time to cancer recurrence

Some of the Covariates

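Fitting in R
Treating ordered factors as numeric (a linear trend on the log-hazard scale):
reg2 <- coxph(st ~ sex + rx + perfor + obstruct + adhere + nodes2 + extent, data = coln)
Treating factors as nominal (one indicator per non-reference level):
reg3 <- coxph(st ~ sex + rx + perfor + obstruct + adhere + factor(nodes2) + factor(extent), data = coln)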

Cox PHM Approach
> library(survival)
> data(colon)
> coln <- colon[2*(1:929), ]
> st <- Surv(coln$time, coln$status)
> reg1 <- coxph(st ~ coln$rx)
> attributes(reg1)
$names
 [1] "coefficients" "var" "loglik" "score" "iter" "linear.predictors"
 [7] "residuals" "means" "concordance" "method" "n" "nevent"
[13] "terms" "assign" "wald.test" "y" "formula" "xlevels"
[19] "contrasts" "call"
$class
[1] "coxph"
> reg1$coef
coln$rxLev coln$rxLev+5FU

Results
> summary(reg1)
Call:
coxph(formula = st ~ coln$rx)
  n= 929, number of events= 468
The coefficient table (coef, exp(coef), se(coef), z, Pr(>|z|)) has rows coln$rxLev and coln$rxLev+5FU; coln$rxLev+5FU is highly significant (p on the order of 1e-05, ***). A second table gives exp(coef), exp(-coef), lower .95, and upper .95 for each.
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Likelihood ratio test on 2 df, p=5.175e-06
Wald test on 2 df, p=1.247e-05
Score (logrank) test on 2 df, p=9.804e-06

Multiple Regression Results
> summary(reg2)
Call:
coxph(formula = st ~ sex + rx + perfor + obstruct + adhere + nodes2 + extent, data = coln)
  n= 911, number of events= 456
   (18 observations deleted due to missingness)
Coefficient rows: sex, rxLev, rxLev+5FU (p on the order of 1e-05, ***), perfor, obstruct, adhere (*), nodes2 (p < 2e-16, ***), extent (p on the order of 1e-06, ***).
Likelihood ratio test on 8 df, p=0
Wald test on 8 df, p=0
Score (logrank) test on 8 df, p=0

Multiple Regression Results
> summary(reg3)
Call:
coxph(formula = st ~ sex + rx + perfor + obstruct + adhere + factor(nodes2) + factor(extent), data = coln)
Coefficient rows: sex, rxLev, rxLev+5FU (p on the order of 1e-05, ***), perfor, obstruct, adhere; four factor(nodes2) levels (the second *, the third **, the fourth p < 2e-16, ***); and three factor(extent) levels (the third **).

Ties?
> table(duplicated(st))
FALSE  TRUE
> coxph(st ~ rx, data = coln)                      # default: ties = "efron"
Likelihood ratio test=24.3 on 2 df, p=5.17e-06  n= 929, number of events= 468
> coxph(st ~ rx, data = coln, ties = "breslow")
Likelihood ratio test=24.3 on 2 df, p=5.23e-06  n= 929, number of events= 468
> coxph(st ~ rx, data = coln, ties = "exact")
Likelihood ratio test=24.3 on 2 df, p=5.19e-06  n= 929, number of events= 468
In all three fits, the rxLev coefficient has p on the order of 1e-01 and rxLev+5FU on the order of 1e-05.

Run Time
> system.time(coxph(st ~ rx, data = coln))
> system.time(coxph(st ~ rx, data = coln, ties = "breslow"))
> system.time(coxph(st ~ rx, data = coln, ties = "exact"))
(each call reports user, system, and elapsed times)

Local Tests
Say we want to determine whether extent of local spread matters in the model. Four categories:
– 1 = submucosa
– 2 = muscle
– 3 = serosa
– 4 = contiguous structures
We can use our local tests here (we must set them up ourselves in R):
– LRT
– Wald

Local Test: LRT
> full_fit <- coxph(st ~ sex + rx + perfor + obstruct + adhere + factor(nodes2) + factor(extent), data = coln)
> LLfull <- full_fit$loglik[2]    # loglik[1] is at the initial values (beta = 0), loglik[2] at the MLE
> LLfull
> null_fit <- coxph(st ~ sex + rx + perfor + obstruct + adhere + factor(nodes2), data = coln)
> LLnull <- null_fit$loglik[2]
> LLnull
> LRT_extent <- 2*(LLfull - LLnull)
> LRT_extent
> pval <- pchisq(LRT_extent, df = 3, lower.tail = FALSE)
> pval                             # on the order of 1e-05
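The survival package's `anova()` method for nested coxph fits computes the same likelihood ratio statistic. A minimal sketch, assuming simplified models (the lecture's `nodes2` variable is not recreated here); sex, rx, and extent have no missing values, so both fits use the same rows, which the nested LRT requires.

```r
library(survival)
coln <- colon[2 * (1:929), ]

# Simplified null and full models (assumption: nodes2 omitted)
null_fit <- coxph(Surv(time, status) ~ sex + rx, data = coln)
full_fit <- coxph(Surv(time, status) ~ sex + rx + factor(extent), data = coln)

# LRT by hand, as on the slide: loglik[2] is the log partial likelihood at the MLE
lrt_extent <- 2 * (full_fit$loglik[2] - null_fit$loglik[2])
p_extent   <- pchisq(lrt_extent, df = 3, lower.tail = FALSE)

# The same nested-model LRT from anova() for coxph fits
lrt_table <- anova(null_fit, full_fit)
print(lrt_table)
```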

Local Test: Wald Test
> reg3 <- coxph(st ~ sex + rx + perfor + obstruct + adhere + factor(nodes2) + factor(extent), data = coln)
> b1 <- reg3$coefficients[11:13]     # the three factor(extent) coefficients
> V11 <- reg3$var[11:13, 11:13]       # their block of the covariance matrix I^{-1}(b)
> wald_extent <- t(b1) %*% solve(V11) %*% b1
> wald_extent
> pval <- pchisq(wald_extent, df = 3, lower.tail = FALSE)
> pval                                # on the order of 1e-05

Contrasts
Since at least one of the extent-of-spread coefficients appears to be nonzero, we may also want to contrast the four categories. We can use contrasts, but again we must set this up ourselves in R.

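Contrasts in R
> # Serosa vs. muscle: coefficient of extent 3 minus coefficient of extent 2
> E2vE3 <- reg3$coef[11:12] %*% c(-1, 1)
> se2v3 <- sqrt(reg3$var[11,11] + reg3$var[12,12] - 2*reg3$var[11,12])
> z2v3 <- E2vE3/se2v3
> z2v3
> 2*(1 - pnorm(abs(z2v3)))           # two-sided p-value
> # Contiguous structures vs. muscle: coefficient of extent 4 minus coefficient of extent 2
> E2vE4 <- reg3$coef[c(11,13)] %*% c(-1, 1)
> se2v4 <- sqrt(reg3$var[11,11] + reg3$var[13,13] - 2*reg3$var[11,13])
> z2v4 <- E2vE4/se2v4
> z2v4
> 2*(1 - pnorm(abs(z2v4)))           # on the order of 1e-05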

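Confidence Interval
Construct a 95% CI for the hazard ratio between different extent-of-local-spread categories:
– for a specific model coefficient: $\exp\left(b_k \pm 1.96\,\mathrm{se}(b_k)\right)$
– for a contrast: $\exp\left(c^{\top} b \pm 1.96\sqrt{c^{\top} I^{-1}(b)\, c}\right)$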
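For a single coefficient, R's `confint()` gives the Wald interval on the log-hazard scale, which is then exponentiated; `summary(fit)$conf.int` reports the same limits next to exp(coef). A minimal sketch with a hypothetical simplified model (sex, rx, and extent only), checking one interval by hand:

```r
library(survival)
coln <- colon[2 * (1:929), ]
fit <- coxph(Surv(time, status) ~ sex + rx + factor(extent), data = coln)

# Wald 95% CIs on the log-hazard scale (b +/- z_0.975 * se), then exponentiated
ci_hr <- exp(confint(fit, level = 0.95))

# The same interval by hand for the first coefficient (sex)
b1  <- coef(fit)[1]
se1 <- sqrt(vcov(fit)[1, 1])
by_hand <- exp(b1 + c(-1, 1) * qnorm(0.975) * se1)
```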

CIs for Contrasts in R
> ucv23 <- E2vE3 + qnorm(.975)*se2v3    # upper limit, log-HR scale
> lcv23 <- E2vE3 - qnorm(.975)*se2v3    # lower limit, log-HR scale
> hr23 <- exp(E2vE3)                     # estimated HR, serosa vs. muscle
> hr23
> uhr23 <- exp(ucv23)
> uhr23
> lhr23 <- exp(lcv23)
> lhr23

Confidence Interval
Conclusion: adjusting for the other covariates in the model, the risk (hazard) of death among individuals with serosa involvement is estimated to be 1.75 times that of subjects with only muscle involvement (95% CI for the RR as computed above).

Proportional?
Recall that we are making a strong assumption: proportional hazards for each covariate. We can investigate this to some extent via graphical displays, but these are limited for quantitative variables. We'll learn more about this later.