Lecture 10: Hypothesis Testing II
Weight Functions and Trend Tests
Testing >2 Samples in R

> ### 2-sample testing using toy example
> library(survival)
> time <- c(3,6,9,9,11,16,8,9,10,12,19,23)
> cens <- c(1,0,1,1,0,1,1,1,0,0,1,0)
> grp <- c(1,1,1,1,1,1,2,2,2,2,2,2)
> grp <- as.factor(grp)
>
> sdat <- Surv(time, cens)
> survdiff(sdat ~ grp)
Call: survdiff(formula = sdat ~ grp)

        N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1   [values not recovered from the slide]
grp=2   [values not recovered from the slide]

 Chisq= 1.6  on 1 degrees of freedom, p= 0.203
Testing >2 Samples in R

> survdiff(sdat ~ grp, rho = 1)
Call: survdiff(formula = sdat ~ grp, rho = 1)

        N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1   [values not recovered from the slide]
grp=2   [values not recovered from the slide]

 Chisq= 1.6  on 1 degrees of freedom, p= 0.268
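For reference, survdiff's rho argument selects the weight in the G^rho family: each death is weighted by the pooled Kaplan-Meier estimate raised to rho, so rho = 0 (the default) gives the log-rank test and rho = 1 gives the Peto & Peto modification of the Gehan-Wilcoxon test. A minimal sketch on the toy data above (intermediate values of rho are also allowed):

> # rho = 0: log-rank; rho = 1: Peto & Peto modified Gehan-Wilcoxon;
> # fractional values give intermediate emphasis on early vs. late differences
> survdiff(sdat ~ grp, rho = 0)
> survdiff(sdat ~ grp, rho = 0.5)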
Revisit ‘Linear Dependence’ of $Z_j(\tau)$

How are they linearly dependent?

Two-sample case: because $d_{i1} + d_{i2} = d_i$ and $Y_{i1} + Y_{i2} = Y_i$ at every event time,
$$Z_1(\tau) + Z_2(\tau) = \sum_i W(t_i)\left[d_i - Y_i\,\frac{d_i}{Y_i}\right] = 0,$$
so $Z_2(\tau) = -Z_1(\tau)$ and only one of the two statistics is needed.
Revisit ‘Linear Dependence’ of $Z_j(\tau)$

K-sample case: by the same argument,
$$\sum_{j=1}^{K} Z_j(\tau) = \sum_i W(t_i)\left[\sum_{j=1}^{K} d_{ij} - \frac{d_i}{Y_i}\sum_{j=1}^{K} Y_{ij}\right] = 0,$$
so any one $Z_j(\tau)$ is determined by the other $K-1$, and the chi-square test is built from only $K-1$ of them.
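A quick numerical check of this constraint (a sketch; chk is just an illustrative object name):

> # the per-group observed-minus-expected counts returned by survdiff()
> # sum to zero -- exactly the linear dependence described above
> chk <- survdiff(sdat ~ grp)
> sum(chk$obs - chk$exp)   # zero, up to rounding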
Beyond Log-Rank

The log-rank test has optimal power to detect $H_a$ when the hazard rates of our K groups are proportional. What if they're not?

We've mentioned using other weight functions. Depending on the choice of weight function, we can place emphasis on different regions of the survival curve (e.g., early versus late differences).
Example: Kidney Infection

Data on 119 kidney dialysis patients, comparing time to kidney infection between two groups:
– Catheters placed percutaneously (n = 76)
– Catheters placed surgically (n = 43)
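The slides read these data from a local CSV with columns Time, d, and cath. A comparable public copy ships with the KMsurv package (this is an assumption about sourcing; KMsurv's column names are time, delta, and type rather than the Time, d, cath names used later in these slides):

> # sketch: load the kidney dialysis catheter data from the KMsurv package
> library(survival)
> library(KMsurv)
> data(kidney)
> head(kidney)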
Log-Rank Test

[Worked log-rank table with columns $t_i$, $Y_{i1}$, $d_{i1}$, $Y_{i2}$, $d_{i2}$, $Y_i$, $d_i$ and a final Sum row; the numeric entries were not recovered from the slide.]
Comparisons

Test (p-values were not recovered from the slide):
– Log-Rank
– Gehan
– Tarone-Ware
– Peto-Peto
– Modified Peto-Peto
– Fleming-Harrington, p = 0, q = …
– Fleming-Harrington, p = 1, q = …
– Fleming-Harrington, p = 1, q = …
– Fleming-Harrington, p = 0.5, q = …
– Fleming-Harrington, p = 0.5, q = …
Notice the Differences!

This is a situation of varying inference: you need to be sure you are testing what you think you are testing.
Check:
– Look at the hazards. Do they cross?
Problem:
– Estimating hazards is imprecise (as we've discussed).
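One way to make this check in R (a sketch using the toy variables from the opening example; with the kidney data, substitute its time, censoring, and group variables; cfit is an illustrative object name):

> # plot cumulative hazards by group and look for crossing
> cfit <- survfit(Surv(time, cens) ~ grp)
> plot(cfit, fun = "cumhaz", col = 1:2, lty = 1:2,
+      xlab = "Time", ylab = "Cumulative hazard")
> legend("topleft", legend = levels(grp), col = 1:2, lty = 1:2)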
Cumulative Hazards

[Figure: estimated cumulative hazard curves for the two groups; not reproduced here.]
Hazard Rate (smoothing spline)

[Figure: smoothing-spline estimates of the hazard rates for the two groups; not reproduced here.]
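The figure above uses a smoothing spline; a kernel-based alternative is sketched below. This assumes the muhaz package (not used elsewhere in these notes), with times, cens, and grp as defined on the later "Building Our R Function" slides; h1 and h2 are illustrative object names.

> # kernel-smoothed hazard estimates for the two catheter groups
> library(muhaz)
> h1 <- muhaz(times[grp == 1], cens[grp == 1])
> h2 <- muhaz(times[grp == 2], cens[grp == 2])
> plot(h1, xlab = "Time", ylab = "Hazard rate")
> lines(h2$est.grid, h2$haz.est, lty = 2)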
Misconception

"Whether the survival curves cross tells us about the appropriateness of the log-rank test." Not true:
– Whether survival curves cross depends on censoring and study duration.
– What if they would cross, but we don't follow long enough to see it?
Consider:
– Survival curves cross ⇒ the hazards cross.
– Hazards cross ⇒ the survival curves may or may not cross.
Solution?
– Test regions of t separately, before and after the crossing, based on looking at the hazards.
– Some tests allow for crossing hazards (Yang and Prentice 2005).
Take-home

– The choice of weight function can be critical.
– K&M recommend applying both the log-rank and Gehan tests.
– (Simple) Cox regression is akin to the log-rank test.
– Think carefully about the distribution of weights and about possible crossing of the hazards.
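On the Cox-regression point above: the score test from a one-covariate Cox model is essentially the log-rank test (they agree exactly in the absence of ties and closely with the default tie handling). A minimal sketch on the toy data; cph is an illustrative object name:

> # the Cox score test reproduces the (unweighted) log-rank chi-square
> cph <- coxph(Surv(time, cens) ~ grp)
> summary(cph)$sctest   # score test statistic, df, and p-value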
What About Weights…

R (survdiff) offers only a limited selection of weights, and SAS doesn't seem to allow us to specify arbitrary weights (at least not in proc lifetest). So of course we can write our own function…
R Function for Different Weights

What information will we need to construct the different weights? Can we get this information from R?
Building Our R Function

> times <- kidney$Time
> cens <- kidney$d
> grp <- kidney$cath
> fit <- survfit(Surv(times, cens) ~ 1)
> tm <- summary(fit)$time
> Yi <- fit$n.risk[which(fit$time %in% tm)]
> di <- fit$n.event[which(fit$time %in% tm)]
> Yi
 [1] [values not recovered from the slide]
> di
 [1] [values not recovered from the slide]
> fit <- survfit(st ~ kidney$cath)
> summary(fit)
Call: survfit(formula = st ~ kidney$cath)

                kidney$cath=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 [rows not recovered from the slide]

                kidney$cath=2
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 [rows not recovered from the slide]

> names(fit)
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"
Building Our R Function

> names(fit)
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"
> fit$n.risk
 [1] [values not recovered from the slide]
> fit$n.event
 [1] [values not recovered from the slide]
> names(summary(fit))
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"      "table"
> summary(fit)$n.risk
 [1] [values not recovered from the slide]
> summary(fit)$n.event
 [1] [values not recovered from the slide]
We still need to think about how to determine $Y_{i1}$ and $d_{i1}$ at every time where at least one event occurs, including times at which group 1 contributes only censored observations. We can certainly construct the risk set using what we get out of R.
– Recall how we find the risk set…
Building Our R Function

> dat <- cbind(times, cens)[which(grp == 1), ]
> yij <- dij <- c()
> for (i in 1:length(tm)) {
+   tmi <- tm[i]
+   yij <- append(yij, length(which(dat[, 1] >= tmi)))       # group-1 number at risk at tmi
+   dij <- append(dij, sum(dat[which(dat[, 1] == tmi), 2]))  # group-1 events at tmi
+ }
> yij
 [1] [values not recovered from the slide]
> dij
 [1] [values not recovered from the slide]
Test Statistic

We now have all the parts we need to construct the unweighted ("constant") portion of our test statistic.

> OmEi <- dij - yij*(di/Yi)
> vi <- (yij/Yi)*(1 - yij/Yi)*((Yi - di)/(Yi - 1))*di
> round(OmEi, 3)
 [1] [values not recovered from the slide]
> round(vi, 3)
 [1] [values not recovered from the slide]

Now we need to estimate the weights so we can construct the weighted versions…
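For reference, these lines compute, at each event time $t_i$, the terms of the usual weighted two-sample statistic and its variance (standard notation, as earlier in the lecture):
$$Z_1(\tau) = \sum_i W(t_i)\left[d_{i1} - Y_{i1}\frac{d_i}{Y_i}\right], \qquad
\hat\sigma_{11} = \sum_i W(t_i)^2\,\frac{Y_{i1}}{Y_i}\left(1-\frac{Y_{i1}}{Y_i}\right)\left(\frac{Y_i-d_i}{Y_i-1}\right)d_i,$$
and the test statistic $Z_1(\tau)^2/\hat\sigma_{11}$ is compared to a $\chi^2_1$ distribution under $H_0$.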
Different Weight Functions

# generating weights
Sim1 <- c(1, fit$surv[which(fit$time %in% tm)][1:(length(tm) - 1)])   # pooled KM estimate at the previous event time
if (wt == "lr")  Wti <- rep(1, length(tm))
if (wt == "geh") Wti <- Yi
if (wt == "tw")  Wti <- sqrt(Yi)
if (wt == "pp")  Wti <- cumprod(1 - di/(Yi + 1))
if (wt == "mpp") Wti <- cumprod(1 - di/(Yi + 1))*Yi/(Yi + 1)
if (wt == "fh") {
  if (missing(p) | missing(q))
    stop("Use of Fleming-Harrington Weights requires values for p and q")
  else Wti <- Sim1^p*(1 - Sim1)^q
}

# Example using the Gehan weight
> wt <- "geh"
> if (wt == "geh") Wti <- Yi
> Wti
 [1] [values not recovered from the slide]
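For reference, the options in the code correspond to the following weight functions, with $\tilde S$ the Peto-Peto survival-type estimator and $\hat S$ the pooled Kaplan-Meier estimate:
– "lr" (log-rank): $W(t_i) = 1$
– "geh" (Gehan): $W(t_i) = Y_i$
– "tw" (Tarone-Ware): $W(t_i) = \sqrt{Y_i}$
– "pp" (Peto-Peto): $W(t_i) = \tilde S(t_i) = \prod_{t_j \le t_i}\left(1 - \frac{d_j}{Y_j + 1}\right)$
– "mpp" (modified Peto-Peto): $W(t_i) = \tilde S(t_i)\,\frac{Y_i}{Y_i + 1}$
– "fh" (Fleming-Harrington): $W(t_i) = \hat S(t_{i-1})^p\left[1 - \hat S(t_{i-1})\right]^q$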
Final Calculations

# Apply the chosen weight to our test statistic and its variance
> OmE <- as.numeric(t(Wti) %*% OmEi)
> v <- as.numeric(t(Wti^2) %*% vi)
> tstat <- OmE^2/v
> pval <- pchisq(tstat, df = 1, lower.tail = F)
> OmE
[1] -9
> v
[1] [value not recovered from the slide]
> tstat
[1] [value not recovered from the slide]
> pval
[1] [value not recovered from the slide]
survdiff_wts <- function(times, cens, grp, wt, p, q) {
  # pooled fit: event times, numbers at risk, and events over both groups
  fit <- survfit(Surv(times, cens) ~ 1)
  tm <- summary(fit)$time
  Yi <- fit$n.risk[which(fit$time %in% tm)]
  di <- fit$n.event[which(fit$time %in% tm)]
  # group-1 risk sets and event counts at each event time
  dat <- cbind(times, cens)[which(grp == 1), ]
  yij <- dij <- c()
  for (i in 1:length(tm)) {
    tmi <- tm[i]
    yij <- append(yij, length(which(dat[, 1] >= tmi)))
    dij <- append(dij, sum(dat[which(dat[, 1] == tmi), 2]))
  }
  # unweighted observed-minus-expected and variance terms
  OmEi <- dij - yij*(di/Yi)
  vi <- (yij/Yi)*(1 - yij/Yi)*((Yi - di)/(Yi - 1))*di
  # weight functions
  Sim1 <- c(1, fit$surv[which(fit$time %in% tm)][1:(length(tm) - 1)])
  if (wt == "lr")  Wti <- rep(1, length(tm))
  if (wt == "geh") Wti <- Yi
  if (wt == "tw")  Wti <- sqrt(Yi)
  if (wt == "pp")  Wti <- cumprod(1 - di/(Yi + 1))
  if (wt == "mpp") Wti <- cumprod(1 - di/(Yi + 1))*Yi/(Yi + 1)
  if (wt == "fh") {
    if (missing(p) | missing(q))
      stop("Use of Fleming-Harrington Weights requires values for p and q")
    else Wti <- Sim1^p*(1 - Sim1)^q
  }
  # weighted test statistic, chi-square, and p-value
  OmE <- as.numeric(t(Wti) %*% OmEi)
  v <- as.numeric(t(Wti^2) %*% vi)
  tstat <- OmE^2/v
  pval <- pchisq(tstat, df = 1, lower.tail = F)
  ans <- list(weights = Wti, Z_tau = OmE, sig_11 = v, chisq = tstat, pval = pval)
  names(ans) <- c("Weights", "Z_tau", "sig_11", "chisq value", "pvalue")
  return(ans)
}
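A minimal usage sketch for the function above, with times, cens, and grp as defined on the earlier "Building Our R Function" slides:

> survdiff_wts(times, cens, grp, wt = "geh")                # Gehan weights
> survdiff_wts(times, cens, grp, wt = "fh", p = 1, q = 1)   # Fleming-Harrington weights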
Larynx Cancer

90 patients diagnosed with larynx cancer (1970s). Patients are classified according to disease stage:
– Stages I-IV
We are interested in survival, BUT we want to compare the four stages.
Kaplan-Meier curves

[Figure: Kaplan-Meier survival curves by stage; not reproduced here.]
R: survdiff

> lar <- read.csv("H:public.html\\BMTRY_722_Summer2015\\Date\\larynx.csv")
> time <- lar$time; death <- lar$death; stage <- lar$stage
> st <- Surv(time, death)
> test0 <- survdiff(st ~ stage)
> test0
Call: survdiff(formula = st ~ stage)

         N Observed Expected (O-E)^2/E (O-E)^2/V
stage=1  [values not recovered from the slide]
stage=2
stage=3
stage=4

 Chisq= 22.8  on 3 degrees of freedom, p= 4.53e-05
R: survdiff

> test1 <- survdiff(st ~ stage, rho = 1)
> test1
Call: survdiff(formula = st ~ stage, rho = 1)
…
 Chisq= 23.1  on 3 degrees of freedom, p= 3.85e-05

> test2 <- survdiff(st ~ stage, rho = 3)
> test2
Call: survdiff(formula = st ~ stage, rho = 3)
…
 Chisq= 21.8  on 3 degrees of freedom, p= 7.03e-05

Recall: $W(t_i) = Y(t_i)\,\hat S(t_{i-1})^p\,\left[1 - \hat S(t_{i-1})\right]^q$
What about our hazards?

[Figure not reproduced here.]
R: survdiff

> test3 <- survdiff(st[stage < 3] ~ stage[stage < 3])
 Chisq= 0  on 1 degrees of freedom, p= [not recovered]
> test4 <- survdiff(st ~ factor(stage, exclude = c(2, 4)))
 Chisq= 3.1  on 1 degrees of freedom, p= [not recovered]
> test5 <- survdiff(st ~ factor(stage, exclude = c(2, 3)))
 Chisq= 23.4  on 1 degrees of freedom, p= 1.32e-06
> test6 <- survdiff(st ~ factor(stage, exclude = c(1, 4)))
 Chisq= 1.5  on 1 degrees of freedom, p= [not recovered]
> test7 <- survdiff(st ~ factor(stage, exclude = c(1, 3)))
 Chisq= 11.5  on 1 degrees of freedom, p= [not recovered]
> test8 <- survdiff(st[stage > 2] ~ stage[stage > 2])
 Chisq= 0.5  on 1 degrees of freedom, p= 0.769
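The six pairwise comparisons above can also be generated in a loop; a minimal sketch (the pairs and keep objects are illustrative names, not from the slides):

> # run survdiff() for every pair of stages
> pairs <- combn(1:4, 2)
> for (k in 1:ncol(pairs)) {
+   keep <- stage %in% pairs[, k]
+   cat("Stage", pairs[1, k], "vs Stage", pairs[2, k], "\n")
+   print(survdiff(st[keep] ~ stage[keep]))
+ }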
What About the Differences?

– Here there is not much evidence of the hazards crossing.
– If there isn't overlap (crossing), the various weighted tests will be somewhat consistent with one another.
– The log-rank test is most appropriate when the hazards are proportional.
Tests for Trend

We generally perform tests for trend with ordinal variables:
– Dose level
– PSA categories (prostate cancer)
– Cancer stage
This is different from treating the variable as continuous, although that is one 'accepted' approach. For continuous covariates, we need a regression model (we will get there shortly).
Tests for Trend, Formally

Our hypotheses are
$$H_0:\; h_1(t) = h_2(t) = \cdots = h_K(t) \quad \text{for all } t \le \tau,$$
$$H_a:\; h_1(t) \le h_2(t) \le \cdots \le h_K(t) \quad \text{for } t \le \tau, \text{ with at least one strict inequality.}$$
Any weight function discussed previously can be used. The test statistic is
$$Z = \frac{\sum_{j=1}^{K} a_j Z_j(\tau)}{\sqrt{\sum_{j=1}^{K}\sum_{g=1}^{K} a_j a_g\,\hat\sigma_{jg}}},$$
which is approximately standard normal under $H_0$.
Tests for Trend, Formally

– $a_j$: scores (weights), often chosen as $a_j = j$, but they can be user specified.
– $\hat\sigma_{jg}$: the $(j,g)$ element of the variance-covariance matrix of the $Z_j(\tau)$.
Stage: Ordinal Categories
Trend Test in R

# Test for trend in R: 'test' is a fitted survdiff object for the K ordered
# groups (any of the weighted tests above) and 'wt' holds the scores a_j
surv.trendtest <- function(test, wt) {
  require(survival)
  zj <- test$obs - test$exp     # Z_j(tau): observed minus expected per group
  zv <- test$var                # variance-covariance matrix of the Z_j(tau)
  num <- sum(wt*zj)
  den <- 0
  for (j in 1:length(wt)) {
    for (g in 1:length(wt)) {
      den <- den + wt[j]*wt[g]*zv[j, g]
    }
  }
  den <- sqrt(den)
  zz <- num/den
  pval <- 2*(1 - pnorm(abs(zz)))
  return(list(Z = zz, pvalue = pval))
}
Trend Test in R

> test.t0 <- surv.trendtest(test = test0, wt = 1:4)
> test.t0
$Z
[1] [value not recovered from the slide]
$pvalue
[1] [value not recovered from the slide]

> test.t1 <- surv.trendtest(test = test1, wt = 1:4)
> test.t1
$Z
[1] [value not recovered from the slide]
$pvalue
[1] [value not recovered; of order e-05]
Next Time…

– Stratified tests
– Other K-sample tests