Lecture 11: Hypothesis Testing III

Name: Lecture 11: Hypothesis Testing III
Uploaded: 2017-08-17T01:38:55+00:00
Duration: PTM35S40
Channel: Emery Moore
Description: Lecture 11: Hypothesis Testing III

Lecture 11: Hypothesis Testing III
Stratified Tests, Renyi and Other Tests

Stratified Tests
Adjust for a covariate
Allows you to control for a confounder without using a regression approach
However, like regression, if an interaction is present, it won’t be detected
Assumes the ‘treatment’ effect is the same across strata

Sometimes Confusing
“Stratified” analysis sometimes means:
  Subgroup analysis
  Stratified “combined” test
In this case we mean the combined test
Recall the Mantel-Haenszel odds ratio

Notation
Now three variables:
  Outcome (time to event)
  Group variable (e.g. treatment)
  Strata variable (e.g. gender, cancer grade)
j = 1, 2, …, K indexes groups
s = 1, 2, …, M indexes strata

Similar to the Standard Test
Formal hypothesis
Now Z_j.(τ) is represented by a sum over the strata

From there, inference is the same
Chi-square test with K – 1 d.f., where Σ^-1 is the inverse of the estimated variance-covariance matrix
For the 2-group scenario it can be reduced to a Z-score
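The formulas on these two slides did not survive in the transcript. A sketch of the standard stratified statistic, in the notation of the slides (j, g index groups, s indexes strata, τ is the largest time with subjects at risk in every group; the hats and the symbol σ for the variance terms are notational assumptions):

```latex
% Hypothesis: no group difference within any stratum
H_0:\; h_{1s}(t) = h_{2s}(t) = \dots = h_{Ks}(t),
      \quad s = 1,\dots,M,\; t \le \tau

% Pool the stratum-specific statistics and (co)variances
Z_{j\cdot}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau),
\qquad
\hat\sigma_{jg\cdot} = \sum_{s=1}^{M} \hat\sigma_{jgs}

% K-group test
\chi^2 = \bigl(Z_{1\cdot}(\tau),\dots,Z_{K-1\cdot}(\tau)\bigr)\,
         \hat\Sigma^{-1}\,
         \bigl(Z_{1\cdot}(\tau),\dots,Z_{K-1\cdot}(\tau)\bigr)^{T}
         \;\sim\; \chi^2_{K-1}

% Two-group special case
Z = \frac{\sum_{s=1}^{M} Z_{1s}(\tau)}
         {\sqrt{\sum_{s=1}^{M} \hat\sigma_{11s}}}
    \;\sim\; N(0,1)
```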

Asymptotics
Just like the unstratified test, requires large N
Here it requires an even larger N: think about dividing the sample into M strata
In most cases, there probably is not sufficient N

Small Example
20 subjects received 1 of two treatments
  9 patients received treatment 1
  11 patients received treatment 2
Patients also categorized by disease type
  2 strata
Question: Does the data show a treatment effect after adjusting for disease type?

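Time   Death/Censor   Trt   Disease
  1    Death           1     1
  5    Death           1     1
  5    Censor          1     1
  6    Censor          1     2
  8    Death           1     1
 37    Death           1     2
 49    Death           1     1
 58    Death           1     2
 79    Censor          1     2
 11    Censor          2     1
 50    Death           2     1
 51    Death           2     2
 62    Death           2     1
 67    Death           2     1
 73    Death           2     1
 86    Death           2     1
 90    Death           2     2
 96    Death           2     2
 97    Death           2     2
 97    Death           2     2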

What First?
Data in standard format
  Trt 1: 1, 5, 5+, 6+, 8, 37, 49, 58, 79+
  Trt 2: 11+, 50, 51, 62, 67, 73, 86, 90, 96, 97, 97
We might first conduct a global test
What is our hypothesis?

Constructing Statistic

Calculate Statistic
Z-statistic
χ² statistic

Now Let’s Adjust for Disease Type
Steps:
  Divide the data according to strata
  Calculate Z_js(τ) and its variance within each stratum
  Sum Z_js(τ) and the variances across strata to get Z_j.(τ) and its variance
  Calculate your test statistic from the summed values

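Divide Data By Strata

Disease 1
Time   Death/Censor   Trt
  1    Death           1
  5    Death           1
  5    Censor          1
  8    Death           1
 49    Death           1
 11    Censor          2
 50    Death           2
 62    Death           2
 67    Death           2
 73    Death           2
 86    Death           2

Disease 2
Time   Death/Censor   Trt
  6    Censor          1
 37    Death           1
 58    Death           1
 79    Censor          1
 51    Death           2
 90    Death           2
 96    Death           2
 97    Death           2
 97    Death           2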

Calculate Z_js(τ) and σ_jgs

Calculate the Statistic
Z (or chi-square)
What is our conclusion?

R Code
> library(survival)
> times<-c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
> trt<- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
> strat<-c(1,1,1,2,1,1,2,1,1,2,2,1,1,1,2,1,2,2,2,2)
> death<-c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)
> st<-Surv(times, death)   # survival object used in the calls below
> #Global
> survdiff(st~trt)
Call:
survdiff(formula = st ~ trt)
      N Observed Expected (O-E)^2/E (O-E)^2/V
trt=
trt=
 Chisq= 6.1  on 1 degrees of freedom, p=

R Code
> #Stratified
> survdiff(st~trt + strata(strat))
Call:
survdiff(formula = st ~ trt + strata(strat))
      N Observed Expected (O-E)^2/E (O-E)^2/V
trt=
trt=
 Chisq= 9.5  on 1 degrees of freedom, p=
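The hand calculation behind the global and stratified statistics can be sketched as follows. This is a minimal Python illustration using only the standard library, not the lecture's R code; the helper names `logrank_parts` and `chisq_p` are made up for this sketch. It uses log-rank weights and the usual hypergeometric variance, and treats treatment 1 as the reference group.

```python
from math import erfc, sqrt

# Data copied from the lecture's R vectors.
times = [1, 5, 5, 6, 8, 11, 37, 49, 50, 51, 58, 62, 67, 73, 79, 86, 90, 96, 97, 97]
trt   = [1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2]
strat = [1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2]
death = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

def logrank_parts(times, events, groups, ref=1):
    """Return (Z, V) for group `ref`: observed minus expected deaths and
    the hypergeometric variance, each summed over distinct death times."""
    z = v = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        y = sum(1 for ti in times if ti >= t)                      # at risk, all
        y1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == ref)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == ref)
        z += d1 - y1 * d / y
        if y > 1:  # variance term is undefined when one subject is at risk
            v += d * (y1 / y) * (1 - y1 / y) * (y - d) / (y - 1)
    return z, v

def chisq_p(z, v):
    """Chi-square statistic on 1 d.f. and its upper-tail p-value."""
    chisq = z * z / v
    return chisq, erfc(sqrt(chisq / 2))

# Global test: ignore disease type.
z_g, v_g = logrank_parts(times, death, trt)
chisq_global, p_global = chisq_p(z_g, v_g)

# Stratified test: compute Z and V within each stratum, then add them up.
z_s = v_s = 0.0
for s in sorted(set(strat)):
    idx = [i for i, si in enumerate(strat) if si == s]
    z, v = logrank_parts([times[i] for i in idx],
                         [death[i] for i in idx],
                         [trt[i] for i in idx])
    z_s += z
    v_s += v
chisq_strat, p_strat = chisq_p(z_s, v_s)

print(round(chisq_global, 1), round(chisq_strat, 1))
```

The two rounded chi-square values should agree with the `Chisq=` lines printed by the `survdiff` calls above.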

BMT: Hodgkin’s & Non-Hodgkin’s Lymphoma
Study included 43 BMT patients
Is there a difference in hazard rates between
  Allogenic transplant = HLA matched sibling donor (N=16)
  Autogenic transplant = Own “cleaned” marrow (N=27)
But we want to adjust for disease state
  Non-Hodgkin’s lymphoma (N=23)
  Hodgkin’s disease (N=20)

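Global Test
(group 1 = allogenic)
t_i   d_i   Y_i   Y_i1   d_i1 - Y_i1(d_i/Y_i)   Variance
  2    1    43    16          0.628              0.234
  4    1    42    15          0.643              0.230
 28    1    41    14          0.659              0.225
 30    1    40    13         -0.325              0.219
 32    1    39    13          0.667              0.222
  …
132    1    22     7         -0.318              0.217
140    1    21     7         -0.333              0.222
252    1    18     7         -0.389              0.238
357    1    16     7          0.563              0.246
Sum                           0.886              5.841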

Global Test Results
> library(survival)
> dat<-read.csv("C:\\BJW\\AutoAllo.csv")
> d<-dat$death; t<-dat$time
> dis<-dat$disease; type<-dat$graft
> nostrat<-survdiff(Surv(t, d)~type)
> nostrat
Call:
survdiff(formula = Surv(t, d) ~ type)
       N Observed Expected (O-E)^2/E (O-E)^2/V
type=
type=
 Chisq= 0.1  on 1 degrees of freedom, p= 0.714

Stratified by Disease Type
Non-Hodgkin’s Lymphoma subjects
t_i   d_i   Y_i   Y_i1   d_i1 - Y_i1(d_i/Y_i)   Variance
 28    1    23    11          0.522              0.250
 32    1    22    10          0.545              0.248
 42    1    21     9         -0.429              0.245
 49    1    20     9          0.550              0.248
 53    1    19     8         -0.421              0.244
 57    1    18     8         -0.444              0.247
 63    1    17     8         -0.471              0.249
 81    2    16     8         -1.000              0.467
 84    1    14     8          0.429              0.245
140    1    13     7         -0.538              0.249
252    1    11     7         -0.636              0.231
357    1    10     7          0.300              0.210
524    1     8     6         -0.750              0.188
Sum                          -2.344              3.319

Stratified by Disease Type
Hodgkin’s Disease subjects
t_i   d_i   Y_i   Y_i1   d_i1 - Y_i1(d_i/Y_i)   Variance
  2    1    20     5          0.750              0.188
  4    1    19     4          0.789              0.166
 30    1    18     3         -0.167              0.139
 36    1    17     3         -0.176              0.145
 41    1    16     3         -0.188              0.152
 52    1    15     3         -0.200              0.160
 62    1    14     3         -0.214              0.168
 72    1    13     3          0.769              0.178
 77    1    12     2          0.833              0.139
 79    1    11     1          0.909              0.083
108    1    10     0          0.000              0.000
132    1     9     0          0.000              0.000
Sum                           3.106              1.518

Stratified Test Results
> strat<-survdiff(Surv(t, d)~type + strata(dis))
> strat
Call:
survdiff(formula = Surv(t, d) ~ type + strata(dis))
       N Observed Expected (O-E)^2/E (O-E)^2/V
type=
type=
 Chisq= 0.1  on 1 degrees of freedom, p= 0.729

Stratified Test Results
Again we fail to reject
This seems in error (recall that our survival curves looked VERY different)

Problem?
The treatment effect is not the same in the 2 disease states
They are in different directions
  Z for Non-Hodgkin’s lymphoma = -2.344
  Z for Hodgkin’s disease = 3.106
Stratified approach is NOT appropriate
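The cancellation can be checked with a few lines of arithmetic. The sketch below, in Python with only the standard library, plugs in the Sum rows of the two stratum tables; the names `strata` and `chisq_p` are made up for this illustration.

```python
from math import erfc, sqrt

# (sum of Z terms, sum of variances) from the Sum rows of the stratum tables
strata = {"NHL": (-2.344, 3.319), "Hodgkin's": (3.106, 1.518)}

def chisq_p(z, v):
    """Chi-square statistic on 1 d.f. and its upper-tail p-value."""
    chisq = z * z / v
    return chisq, erfc(sqrt(chisq / 2))

# Each stratum on its own
per_stratum = {name: chisq_p(z, v) for name, (z, v) in strata.items()}

# Stratified (combined) test: add the Z's and the variances first
z_total = sum(z for z, _ in strata.values())
v_total = sum(v for _, v in strata.values())
combined = chisq_p(z_total, v_total)

for name, (chisq, p) in per_stratum.items():
    print(name, round(chisq, 2), round(p, 3))
print("combined", round(combined[0], 2), round(combined[1], 3))
```

Because the two Z sums have opposite signs, they nearly cancel when added, while the variances still accumulate. The combined chi-square is therefore far smaller than either stratum's own statistic.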

Alternative to Stratified Analysis
Alternatives:
Define 4 groups and conduct a K-sample log rank test
  Allogenic and NHL
  Allogenic and Hodgkin’s
  Autogenic and NHL
  Autogenic and Hodgkin’s
Subgroup analysis (by disease) should be performed
  Allo | Hodgkin’s
  Allo | Non-Hodgkin’s

R Code: K-sample test
> allgrp<-ifelse(dis==1 & type==1, 1, 0)
> allgrp<-ifelse(dis==1 & type==2, 2, allgrp)
> allgrp<-ifelse(dis==2 & type==1, 3, allgrp)
> allgrp<-ifelse(dis==2 & type==2, 4, allgrp)
> grp4<-survdiff(Surv(t, d)~allgrp)
> grp4
Call:
survdiff(formula = Surv(t, d) ~ allgrp)
         N Observed Expected (O-E)^2/E (O-E)^2/V
allgrp=
allgrp=
allgrp=
allgrp=
 Chisq= 11.1  on 3 degrees of freedom, p=

R Code: Subgroup analysis
> ### Subgroup (NHL)
> subNHL<-survdiff(Surv(t,d)[which(dis==1)]~type[which(dis==1)])
> subNHL
Call:
survdiff(formula = Surv(t, d)[which(dis == 1)] ~ type[which(dis == 1)])
                        N Observed Expected (O-E)^2/E (O-E)^2/V
type[which(dis == 1)]=
type[which(dis == 1)]=
 Chisq= 1.7  on 1 degrees of freedom, p=
> ### Subgroup (Hodgkins)
> subHD<-survdiff(Surv(t,d)[which(dis==2)]~type[which(dis==2)])
> subHD
Call:
survdiff(formula = Surv(t, d)[which(dis == 2)] ~ type[which(dis == 2)])
                        N Observed Expected (O-E)^2/E (O-E)^2/V
type[which(dis == 2)]=
type[which(dis == 2)]=
 Chisq= 6.4  on 1 degrees of freedom, p=

Summary: Stratified Testing
Alternative to a regression approach to control for a 2nd covariate when examining treatment effect.
Sample size needs to be larger than in the case of testing K groups for test results to be valid.
One needs to be cautious about misinterpreting null results when interactions exist.
We can use a subgroup approach if this fails.

Renyi Tests
The previous tests we discussed all use a weighted integral of the estimated difference in cumulative hazard rates
This doesn’t address the situation where early differences favor one group and later differences favor the other
Solution: Renyi tests, which address the issue of crossing hazard rates

Renyi Test
Censored-data analogs of the Kolmogorov-Smirnov statistic for comparing two uncensored samples
Recall that KS is a test of equality of one-dimensional probability distributions, used to compare two samples

Kolmogorov-Smirnov Test
Recall the empirical distribution function
Hypothesis
The KS statistic is
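The three formulas on this slide were lost in the transcript. A sketch of the standard two-sample definitions, where the sample sizes n_1, n_2 and the observation labels X_{ji} are notation assumed here:

```latex
% Empirical distribution function of sample j (j = 1, 2)
F_{n_j}(x) = \frac{1}{n_j} \sum_{i=1}^{n_j} I\left(X_{ji} \le x\right)

% Hypothesis
H_0:\; F_1(x) = F_2(x) \ \text{for all } x
\qquad \text{vs.} \qquad
H_A:\; F_1(x) \ne F_2(x) \ \text{for some } x

% Two-sample KS statistic
D = \sup_x \left| F_{n_1}(x) - F_{n_2}(x) \right|
```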

Example of a KS test
Two groups observed for a continuous outcome:
  1: -0.2, 3.7, 4.3, 5.0, 7.7, 8.6
  2: -0.9, 0.4, 0.5, 2.6, 3.0, 12.1
We want to determine if the distributions of the outcomes differ (without assuming any distributional form…)

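Constructing KS statistic
   x     P(X1 ≤ x)   P(X2 ≤ x)   |P(X1 ≤ x) - P(X2 ≤ x)|
 -0.9      0           1/6           1/6
 -0.2      1/6         1/6           0
  0.4      1/6         1/3           1/6
  0.5      1/6         1/2           1/3
  2.6      1/6         2/3           1/2
  3.0      1/6         5/6           2/3
  3.7      1/3         5/6           1/2
  4.3      1/2         5/6           1/3
  5.0      2/3         5/6           1/6
  7.7      5/6         5/6           0
  8.6      1           5/6           1/6
 12.1      1           1             0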
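The same table can be built mechanically. A minimal Python sketch using the two samples from the example; the helper names `ecdf` and `ks_statistic` are made up for this illustration, and exact fractions are used so the result is not blurred by rounding.

```python
from fractions import Fraction

# The two samples from the lecture's KS example
g1 = [-0.2, 3.7, 4.3, 5.0, 7.7, 8.6]
g2 = [-0.9, 0.4, 0.5, 2.6, 3.0, 12.1]

def ecdf(sample, x):
    """Empirical distribution function: share of the sample that is <= x."""
    return Fraction(sum(1 for v in sample if v <= x), len(sample))

def ks_statistic(a, b):
    """Largest absolute gap between the two ECDFs. Both are step functions,
    so it is enough to check the observed values."""
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

D = ks_statistic(g1, g2)
print(D)  # 2/3, reached at x = 3.0
```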

K-S Test

Renyi Test Approach
Find the value of Z(t_i) for each failure time
Note this differs from Z(τ), which sums over all t_i ≤ τ
Calculate the series of Z(t_i):
Estimate the standard error of Z(τ) (all times)

Renyi Statistic
When hazard rates cross, the absolute value of Z(t) will have its maximum at some t < τ
Hypothesis test:
Note that multiple tests are made, because we are taking the max over Z(t)
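The formulas for these slides were lost in the transcript. A sketch of the usual two-sample construction, with d_k deaths and Y_k subjects at risk at death time t_k, a second subscript for group, and W the weight function (W = 1 gives the log-rank version); the exact symbols are assumptions chosen to match the column names used elsewhere in the lecture:

```latex
% Running statistic at each death time t_i
Z(t_i) = \sum_{t_k \le t_i} W(t_k)
         \left[ d_{k1} - Y_{k1}\,\frac{d_k}{Y_k} \right]

% Variance of Z at the last time point tau
\sigma^2(\tau) = \sum_{t_k \le \tau} W(t_k)^2\,
   \frac{Y_{k1}}{Y_k}\,\frac{Y_{k2}}{Y_k}\,
   \frac{Y_k - d_k}{Y_k - 1}\, d_k

% Hypothesis and test statistic
H_0:\; h_1(t) = h_2(t) \ \text{for all } t \le \tau
\qquad \text{vs.} \qquad
H_A:\; h_1(t) \ne h_2(t) \ \text{for some } t \le \tau

Q = \frac{\sup\{\, |Z(t)| : t \le \tau \,\}}{\sigma(\tau)}
```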

Test Statistic Q
Use the same variance estimate for the test statistic as in the standard two-sample approach
The distribution of Q is approximated by that of sup{|B(x)|, 0 < x < 1}, where B is a standard Brownian motion process
Use table C.5 to find the associated p-value

Small Example Given the following data Group 1: (7, 8+, 9, 15, 17)

Constructing the statistic
dk dk1 Yk Yk1 Var

Calculating Q First we can calculate Q
Once we have Q we compare to table C.5

Example 2: Kidney Infection
Data on 119 kidney dialysis patients
Comparing time to kidney infection between two groups
  Catheters placed percutaneously (n = 76)
  Catheters placed surgically (n = 43)

Example: Kidney Infection

R Code: Kidney Infection
> library(survival)
> kidney<-read.csv("H:\\public_html\\BMRTY722_Summer2015\\Data\\Kidney.csv")
> time<-kidney$Time
> infect<-kidney$d
> percut<-kidney$cath
> st<-Surv(time, infect)
> LRtest<-survdiff(st~percut)
> LRtest
Call:
survdiff(formula = st ~ percut)
         N Observed Expected (O-E)^2/E (O-E)^2/V
percut=
percut=
 Chisq= 2.5  on 1 degrees of freedom, p= 0.112

How to Test This in R?
We could write our own R function to conduct the Renyi test…
BUT, it turns out there was a package released in April that has the Renyi test (and all weight functions from K & M included)

> library(survMisc)
> RYtest<-comp(survfit(st~percut))
> RYtest
$tne
    t n e n_percut=1 e_percut=1 n_percut=2 e_percut=2
 1:
  :
 …
16:

$tests$lrTests
                               ChiSq df p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Flem~-Harr~ with p=1, q=

$tests$supTests
                               Q p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Renyi Flem~-Harr~ with p=1, q=

> library(survMisc)
> RYtest<-comp(survfit(st~percut), FHp=0, FHq=0)
> RYtest
$tne
    t n e n_percut=1 e_percut=1 n_percut=2 e_percut=2
 1:
  :
 …
16:

$tests$lrTests
                               ChiSq df p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Flem~-Harr~ with p=0, q=

$tests$supTests
                               Q p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Renyi Flem~-Harr~ with p=0, q=

Example 3: Gastric Cancer
Clinical trial of chemotherapy vs. chemotherapy combined with radiotherapy
45 patients randomized to each of two arms
Followed for up to 8 years

R Code: Gastric Cancer
> RYtest<-comp(survfit(Surv(tm, dth)~x, data=dat))
> RYtest
$tne
    t n_x=1 e_x=1 n_x=2 e_x=2 n e
 1:
 …
80:

$tests$lrTests
                               ChiSq df p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Flem~-Harr~ with p=1, q=

$tests$supTests
                               Q p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Renyi Flem~-Harr~ with p=1, q=

Compare To Log Rank
Renyi test: 0.05 < p < 0.06
What would you expect to see from the log rank test? More or less significant?

LR Results
> LRtest<-survdiff(Surv(tm, dth)~x)
> LRtest
Call:
survdiff(formula = Surv(tm, dth) ~ x)
    N Observed Expected (O-E)^2/E (O-E)^2/V
x=
x=
 Chisq= 0.2  on 1 degrees of freedom, p= 0.63

Final Comments on the Renyi Test
Simulations comparing the Renyi vs. log-rank:
  Hazards cross → Renyi test performs better
  Renyi test has little loss of power if the proportional hazards assumption holds (with limited censoring)
  However, with large amounts of censoring, the advantages of the Renyi test decline
So this test provides a good alternative when hazard rates cross.
But caution still needs to be taken when there is a large amount of censoring.

Other Tests for Crossing Hazards
Cramer-von Mises test(s): based on the integrated squared difference between two curves
T-test analog: requires estimation of the mean; compares the area under S1(t) and S2(t)
Brookmeyer-Crowley: censored version of the two-sample median test

Cramer-Von Mises Test
Based on the Nelson-Aalen estimator for the cumulative hazard rate and its associated variance
Ideally we integrate over time 0 to τ, but this integral is estimated by summing over distinct death times

2-Sample T-test analog
Again, this test is based on the difference in the area under the survival curve between two groups
Components of the test include:
  Order all observed times (event and censored)
  Calculate dij, cij, and Yij for both groups
  Calculate the KM estimator for survival and censoring
  Calculate the pooled KM estimate of survival

2-Sample T-test analog
Once these estimates are obtained:
  Construct the weight function
  Construct the test statistic
  Construct the variance of the test statistic
  Calculate a Z-score

Summary of Other 2-Sample Tests
When the hazard rates cross, both the Cramer-Von Mises and the 2-sample t-test analog have greater power than log-rank.
When hazard rates are proportional, both show power loss relative to log-rank.
Performance is similar to the Renyi test when hazards cross, but Renyi has better power for proportional hazards.

Test Based on Fixed Points in Time
Complicated description in K&M (chapter 7.8) However, pretty simple idea when you are comparing two groups:
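The formula that followed on the slide was lost. For two groups, the usual form of the fixed-time-point comparison is a Z-score on the difference of the Kaplan-Meier estimates at a prespecified time t_0, with Greenwood's formula for each variance. The symbols below are assumed notation, offered as a sketch:

```latex
% Compare the two survival estimates at a fixed time t_0
Z = \frac{\hat S_1(t_0) - \hat S_2(t_0)}
         {\sqrt{\hat V\bigl[\hat S_1(t_0)\bigr] + \hat V\bigl[\hat S_2(t_0)\bigr]}}
    \;\sim\; N(0,1) \quad \text{under } H_0: S_1(t_0) = S_2(t_0)

% Greenwood's variance estimate within each group
\hat V\bigl[\hat S(t_0)\bigr] = \hat S(t_0)^2
   \sum_{t_i \le t_0} \frac{d_i}{Y_i\,(Y_i - d_i)}
```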

Next Time
We will begin our discussion of semi-parametric regression modeling in survival analysis.
