1
Lecture 11: Hypothesis Testing III
Stratified Tests
Renyi and Other Tests
2
Stratified Tests
Adjust for a covariate
Allows you to control for a confounder without using a regression approach
However:
  Like a regression without an interaction term, it won’t detect an interaction if one is present
  Assumes the ‘treatment’ effect is the same across strata
3
Sometimes Confusing
“Stratified” analysis sometimes means:
  Subgroup analysis
  Stratified “combined” test
In this case, we mean the combined test
Recall the Mantel-Haenszel odds ratio
4
Notation
Now three variables:
  Outcome (time to event)
  Group variable (e.g., treatment)
  Strata variable (e.g., gender, cancer grade)
j = 1, 2, …, K indexes groups
s = 1, 2, …, M indexes strata
5
Similar to the Standard Test
Formal hypothesis
Now, Zj.(t) is represented by a sum over the strata
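In symbols, a sketch assuming the standard stratified weighted log-rank construction. Here j indexes groups and s indexes strata; τ (the largest time with subjects at risk) and the variance terms σ̂ are notation assumed for illustration:

```latex
\[
H_0:\; h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t)
\quad \text{for } s = 1,\dots,M \text{ and all } t \le \tau
\]
\[
Z_{j\cdot}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau),
\qquad
\hat\sigma_{jg\cdot} = \sum_{s=1}^{M} \hat\sigma_{jgs}
\]
```

Each stratum contributes its own observed-minus-expected total and its own variance; the strata are only combined at the end.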
6
From there, inference is the same
Chi-square test with K - 1 d.f., where Σ^-1 is the inverse of the estimated variance-covariance matrix
For the 2-group scenario it can be reduced to a Z-score
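A sketch of the two forms of the statistic, assuming the usual stratified construction in which Z_{j·}(τ) is the sum over strata of the group-j statistics and Σ̂ is the matrix of summed variance and covariance terms (the hats and the τ are notation assumed here):

```latex
\[
\chi^2 = \bigl(Z_{1\cdot}(\tau),\dots,Z_{K-1\,\cdot}(\tau)\bigr)\,
\hat\Sigma^{-1}\,
\bigl(Z_{1\cdot}(\tau),\dots,Z_{K-1\,\cdot}(\tau)\bigr)^{T}
\;\sim\; \chi^2_{K-1} \text{ under } H_0
\]
\[
Z = \frac{\sum_{s=1}^{M} Z_{1s}(\tau)}{\sqrt{\sum_{s=1}^{M} \hat\sigma_{11s}}}
\;\sim\; N(0,1) \text{ under } H_0 \quad (K = 2)
\]
```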
7
Asymptotics
Just like the unstratified test, requires large N
Here it requires an even larger N: think about dividing the sample into M strata
In most cases, there probably is not sufficient N
8
Small Example
20 subjects received 1 of two treatments
  9 patients received treatment 1
  11 patients received treatment 2
Patients also categorized by disease type
  2 strata
Question: Do the data show a treatment effect after adjusting for disease type?
9
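Time  Death  Trt  Disease     (Death: 1 = death, 0 = censored)
   1      1    1        1
   5      1    1        1
   5      0    1        1
   6      0    1        2
   8      1    1        1
  11      0    2        1
  37      1    1        2
  49      1    1        1
  50      1    2        1
  51      1    2        2
  58      1    1        2
  62      1    2        1
  67      1    2        1
  73      1    2        1
  79      0    1        2
  86      1    2        1
  90      1    2        2
  96      1    2        2
  97      1    2        2
  97      1    2        2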
11
What First?
Data in standard format
  Trt 1: 1, 5, 5+, 6+, 8, 37, 49, 58, 79+
  Trt 2: 11+, 50, 51, 62, 67, 73, 86, 90, 96, 97, 97
We might first conduct a global test
What is our hypothesis?
12
Constructing Statistic
13
Calculate Statistic
Z-statistic
χ2 statistic
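The global calculation can be sketched in base R. This is a minimal illustration, not the lecture's own code: `logrank_parts()` is a hypothetical helper that assumes log-rank weights (W = 1) and the usual hypergeometric variance, and the data vectors are the ones entered in the R code for this example.

```r
# Small example data: time, treatment arm, death indicator (1 = death, 0 = censored)
times <- c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
trt   <- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
death <- c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)

# Hypothetical helper: observed-minus-expected sum (Z) and its variance (V)
# for the reference group, accumulated over the distinct death times.
logrank_parts <- function(time, status, group, ref = 1) {
  Z <- 0
  V <- 0
  for (t in sort(unique(time[status == 1]))) {
    Y  <- sum(time >= t)                              # at risk, both groups
    Y1 <- sum(time >= t & group == ref)               # at risk, reference group
    d  <- sum(time == t & status == 1)                # deaths, both groups
    d1 <- sum(time == t & status == 1 & group == ref) # deaths, reference group
    Z  <- Z + d1 - Y1 * d / Y
    # Variance term is zero when only one subject remains at risk
    if (Y > 1) V <- V + (Y1 / Y) * (1 - Y1 / Y) * d * (Y - d) / (Y - 1)
  }
  c(Z = Z, V = V)
}

g      <- logrank_parts(times, death, trt)
z_stat <- unname(g["Z"] / sqrt(g["V"]))
chisq  <- z_stat^2
p_val  <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(round(c(Z = z_stat, chisq = chisq, p = p_val), 3))
```

The chi-square value rounds to 6.1, which agrees with the `survdiff` output for the global test.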
14
Now Let’s Adjust for Disease Type
Steps:
  Divide the data according to strata
  Calculate Zjs(t) and its variance within each stratum
  Sum Zjs(t) and the variances across strata to get Zj.(t) and its variance
  Calculate your test statistic from the summed Zj.(t) and the summed variance
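The steps above can be sketched in base R. As before, `logrank_parts()` is a hypothetical helper (log-rank weights and hypergeometric variance assumed), and the vectors are the ones entered in the R code for this example.

```r
# Small example data, including the disease stratum
times <- c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
trt   <- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
strat <- c(1,1,1,2,1,1,2,1,1,2,2,1,1,1,2,1,2,2,2,2)
death <- c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)

# Hypothetical helper: observed-minus-expected sum (Z) and variance (V)
# for the reference group within one set of subjects.
logrank_parts <- function(time, status, group, ref = 1) {
  Z <- 0
  V <- 0
  for (t in sort(unique(time[status == 1]))) {
    Y  <- sum(time >= t)
    Y1 <- sum(time >= t & group == ref)
    d  <- sum(time == t & status == 1)
    d1 <- sum(time == t & status == 1 & group == ref)
    Z  <- Z + d1 - Y1 * d / Y
    if (Y > 1) V <- V + (Y1 / Y) * (1 - Y1 / Y) * d * (Y - d) / (Y - 1)
  }
  c(Z = Z, V = V)
}

# Steps 1 and 2: split by stratum, compute Z and V within each stratum
by_stratum <- sapply(split(seq_along(times), strat), function(idx) {
  logrank_parts(times[idx], death[idx], trt[idx])
})

# Step 3: sum across strata
Z_total <- sum(by_stratum["Z", ])
V_total <- sum(by_stratum["V", ])

# Step 4: test statistic
chisq_strat <- Z_total^2 / V_total
p_strat     <- pchisq(chisq_strat, df = 1, lower.tail = FALSE)
print(round(by_stratum, 3))
print(round(c(chisq = chisq_strat, p = p_strat), 4))
```

The stratified chi-square rounds to 9.5, matching the `survdiff(st~trt + strata(strat))` output.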
15
Divide Data by Strata
Disease 1
  Trt 1: 1, 5, 5+, 8, 49
  Trt 2: 11+, 50, 62, 67, 73, 86
Disease 2
  Trt 1: 6+, 37, 58, 79+
  Trt 2: 51, 90, 96, 97, 97
16
Calculate Zjs(t) and its estimated variance
17
Calculate Zjs(t) and its estimated variance
18
Calculate the Statistic
Z (or chi-square)
What is our conclusion?
19
R Code
> library(survival)
> times<-c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
> trt<- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
> strat<-c(1,1,1,2,1,1,2,1,1,2,2,1,1,1,2,1,2,2,2,2)
> death<-c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)
> st<-Surv(times, death)
> #Global
> survdiff(st~trt)
Call:
survdiff(formula = st ~ trt)
      N Observed Expected (O-E)^2/E (O-E)^2/V
trt=
trt=
Chisq= 6.1 on 1 degrees of freedom, p=
20
R Code
> #Stratified
> survdiff(st~trt + strata(strat))
Call:
survdiff(formula = st ~ trt + strata(strat))
      N Observed Expected (O-E)^2/E (O-E)^2/V
trt=
trt=
Chisq= 9.5 on 1 degrees of freedom, p=
21
BMT: Hodgkin’s & Non-Hodgkin’s Lymphoma
Study included 43 BMT patients
Is there a difference in hazard rates between
  Allogeneic transplant = HLA-matched sibling donor (N=16)
  Autologous transplant = own “cleaned” marrow (N=27)
But want to adjust for disease state
  Non-Hodgkin’s lymphoma (N=23)
  Hodgkin’s disease (N=20)
23
Global Test (group 1 = allogeneic)
ti    di   Yi   Yi1   di1 - Yi1(di/Yi)   Variance
2     1    43   16     0.628             0.234
4     1    42   15     0.643             0.230
28    1    41   14     0.659             0.225
30    1    40   13    -0.325             0.219
32    1    39   13     0.667             0.222
…
132   1    22    7    -0.318             0.217
140   1    21    7    -0.333             0.222
252   1    18    7    -0.389             0.238
357   1    16    7     0.563             0.246
Sum                    0.886             5.841
24
Global Test Results
> dat<-read.csv("C:\\BJW\\AutoAllo.csv")
> d<-dat$death; t<-dat$time
> dis<-dat$disease; type<-dat$graft
> nostrat<-survdiff(Surv(t, d)~type)
> nostrat
Call:
survdiff(formula = Surv(t, d) ~ type)
      N Observed Expected (O-E)^2/E (O-E)^2/V
type=
type=
Chisq= 0.1 on 1 degrees of freedom, p= 0.714
25
Stratified by Disease Type
Non-Hodgkin’s Lymphoma subjects (group 1 = allogeneic)
ti    di   Yi   Yi1   di1 - Yi1(di/Yi)   Variance
28    1    23   11     0.522             0.250
32    1    22   10     0.545             0.248
42    1    21    9    -0.429             0.245
49    1    20    9     0.550             0.248
53    1    19    8    -0.421             0.244
57    1    18    8    -0.444             0.247
63    1    17    8    -0.471             0.249
81    2    16    8    -1.000             0.467
84    1    14    8     0.429             0.245
140   1    13    7    -0.538             0.249
252   1    11    7    -0.636             0.231
357   1    10    7     0.300             0.210
524   1     8    6    -0.750             0.188
Sum                   -2.344             3.319
26
Stratified by Disease Type
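Hodgkin’s Disease subjects (group 1 = allogeneic)
ti    di   Yi   Yi1   di1 - Yi1(di/Yi)   Variance
2     1    20    5     0.750             0.188
4     1    19    4     0.789             0.166
30    1    18    3    -0.167             0.139
36    1    17    3    -0.176             0.145
41    1    16    3    -0.188             0.152
52    1    15    3    -0.200             0.160
62    1    14    3    -0.214             0.168
72    1    13    3     0.769             0.178
77    1    12    2     0.833             0.139
79    1    11    1     0.909             0.083
108   1    10    0     0.000             0.000
132   1     9    0     0.000             0.000
Sum                    3.106             1.518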
27
Stratified Test Results
> strat<-survdiff(Surv(t, d)~type + strata(dis))
> strat
Call:
survdiff(formula = Surv(t, d) ~ type + strata(dis))
      N Observed Expected (O-E)^2/E (O-E)^2/V
type=
type=
Chisq= 0.1 on 1 degrees of freedom, p= 0.729
28
Stratified Test Results
Again we fail to reject
This seems in error (recall that our survival curves looked VERY different)
29
Problem?
The treatment effect is not the same in the 2 disease states
They are in different directions
  Z for allo, Non-Hodgkin’s stratum = -2.344
  Z for allo, Hodgkin’s stratum = 3.106
Stratified approach is NOT appropriate
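The cancellation can be checked directly from the stratum sums in the two tables. A minimal sketch in base R; the numbers are the rounded "Sum" rows (Z = -2.344, variance 3.319 for Non-Hodgkin's; Z = 3.106, variance 1.518 for Hodgkin's), so results agree with the `survdiff` output only up to rounding.

```r
# Stratum-level sums for the allogeneic group, taken from the two tables
Z <- c(NHL = -2.344, Hodgkin = 3.106)   # sum of observed minus expected
V <- c(NHL =  3.319, Hodgkin = 1.518)   # sum of variance terms

# Separate 1-d.f. tests within each disease
chisq_by_stratum <- Z^2 / V

# Stratified test: add Z and V across strata first, then square
chisq_stratified <- sum(Z)^2 / sum(V)
p_stratified     <- pchisq(chisq_stratified, df = 1, lower.tail = FALSE)

print(round(chisq_by_stratum, 1))
print(round(c(chisq = chisq_stratified, p = p_stratified), 3))
```

Because the two Z values have opposite signs, they nearly cancel in the sum, so the stratified chi-square is close to zero even though the Hodgkin's stratum on its own shows a sizable difference.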
30
Alternative to Stratified Analysis
Alternatives
Define 4 groups and conduct a K-sample log rank test
  Allogeneic and NHL
  Allogeneic and Hodgkin’s
  Autologous and NHL
  Autologous and Hodgkin’s
Subgroup analysis (by disease) should be performed
  Allo|Hodgkin’s
  Allo|Non-Hodgkin’s
31
R Code- K sample test
> allgrp<-ifelse(dis==1 & type==1, 1, 0)
> allgrp<-ifelse(dis==1 & type==2, 2, allgrp)
> allgrp<-ifelse(dis==2 & type==1, 3, allgrp)
> allgrp<-ifelse(dis==2 & type==2, 4, allgrp)
> grp4<-survdiff(Surv(t, d)~allgrp)
> grp4
Call:
survdiff(formula = Surv(t, d) ~ allgrp)
         N Observed Expected (O-E)^2/E (O-E)^2/V
allgrp=
allgrp=
allgrp=
allgrp=
Chisq= 11.1 on 3 degrees of freedom, p=
32
R Code- Subgroup analysis
> ### Subgroup (NHL)
> subNHL<-survdiff(Surv(t,d)[which(dis==1)]~type[which(dis==1)])
> subNHL
Call:
survdiff(formula = Surv(t, d)[which(dis == 1)] ~ type[which(dis == 1)])
                        N Observed Expected (O-E)^2/E (O-E)^2/V
type[which(dis == 1)]=
type[which(dis == 1)]=
Chisq= 1.7 on 1 degrees of freedom, p=
> ### Subgroup (Hodgkins)
> subHD<-survdiff(Surv(t,d)[which(dis==2)]~type[which(dis==2)])
> subHD
survdiff(formula = Surv(t, d)[which(dis == 2)] ~ type[which(dis == 2)])
                        N Observed Expected (O-E)^2/E (O-E)^2/V
type[which(dis == 2)]=
type[which(dis == 2)]=
Chisq= 6.4 on 1 degrees of freedom, p=
33
Summary: Stratified Testing
Alternative to a regression approach to control for a 2nd covariate when examining treatment effect.
Sample size needs to be larger than in the case of testing K groups for test results to be valid.
One needs to be cautious about misinterpreting null results when interactions exist.
We can use a subgroup approach if this fails.
34
Renyi Tests
The previous tests we discussed all use a weighted integral of the estimated difference in cumulative hazard rates
  This doesn’t address the situation where early differences favor one group and later differences favor another group
Solution: Renyi tests
  i.e., they address the issue of crossing hazard rates
35
Renyi Test
Censored-data analogs of the Kolmogorov-Smirnov statistic for comparing two uncensored samples
Recall: KS is a test of equality of one-dimensional probability distributions, used to compare two samples
36
Kolmogorov-Smirnov Test
Recall the empirical distribution function
Hypothesis
The KS statistic is
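For reference, a sketch of the three pieces in standard notation. The symbols (n1 and n2 for the two sample sizes, D for the statistic) are assumed here for illustration:

```latex
\[
\hat F_{n}(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)
\]
\[
H_0: F_1(x) = F_2(x) \text{ for all } x
\quad \text{vs.} \quad
H_A: F_1(x) \ne F_2(x) \text{ for some } x
\]
\[
D = \sup_{x}\,\bigl|\hat F_{1,n_1}(x) - \hat F_{2,n_2}(x)\bigr|
\]
```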
37
Example of a KS test
Two groups observed for a continuous outcome:
  1: -0.2, 3.7, 4.3, 5.0, 7.7, 8.6
  2: -0.9, 0.4, 0.5, 2.6, 3.0, 12.1
We want to determine whether the distributions of the outcomes differ (without assuming any distributional form…)
38
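Constructing KS statistic
x      P(X1 ≤ x)   P(X2 ≤ x)   |P(X1 ≤ x) - P(X2 ≤ x)|
-0.9   0           1/6         1/6
-0.2   1/6         1/6         0
0.4    1/6         1/3         1/6
0.5    1/6         1/2         1/3
2.6    1/6         2/3         1/2
3.0    1/6         5/6         2/3
3.7    1/3         5/6         1/2
4.3    1/2         5/6         1/3
5.0    2/3         5/6         1/6
7.7    5/6         5/6         0
8.6    1           5/6         1/6
12.1   1           1           0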
40
K-S Test
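The two-group example can be reproduced in base R. A minimal sketch: `ecdf()` and `ks.test()` are standard functions in the stats package, while the variable names are chosen here for illustration.

```r
# Two groups from the KS example
x1 <- c(-0.2, 3.7, 4.3, 5.0, 7.7, 8.6)
x2 <- c(-0.9, 0.4, 0.5, 2.6, 3.0, 12.1)

# Empirical distribution functions, evaluated at every observed value
F1   <- ecdf(x1)
F2   <- ecdf(x2)
grid <- sort(c(x1, x2))
diffs <- abs(F1(grid) - F2(grid))

# KS statistic: largest absolute difference between the two step functions
D        <- max(diffs)
x_at_max <- grid[which.max(diffs)]
print(c(D = D, x = x_at_max))

# Same statistic (plus a p-value) from the built-in two-sample test
ks <- ks.test(x1, x2)
print(ks)
```

The largest gap is 2/3, reached at x = 3.0, where five of the six group 2 values but only one group 1 value have been observed.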
41
Renyi Test Approach
Find the value of Z(ti) for each failure time
  Note: different from Z(τ), which sums over all ti ≤ τ
Calculate the series of Z(ti):
Estimate the standard error of Z(τ) (all times)
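A sketch of the two quantities, assuming the usual two-sample weighted log-rank form with weight function W (W = 1 gives the log-rank version). Here d_k and Y_k are the deaths and number at risk at the k-th death time, the extra subscript 1 or 2 marks the group, and D is the number of distinct death times:

```latex
\[
Z(t_i) = \sum_{t_k \le t_i} W(t_k)\left[d_{k1} - Y_{k1}\,\frac{d_k}{Y_k}\right],
\qquad i = 1,\dots,D
\]
\[
\sigma^2(\tau) = \sum_{t_k \le \tau} W(t_k)^2\,
\frac{Y_{k1}}{Y_k}\,\frac{Y_{k2}}{Y_k}\,
\frac{Y_k - d_k}{Y_k - 1}\, d_k
\]
```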
42
Renyi Statistic
When hazard rates cross, the absolute value of Z(t) will have its max value at some time t < τ
Hypothesis test:
Note that multiple tests are made, because we are taking the max over Z(t)
43
Test Statistic Q
Use the same variance estimate for the test statistic as in the standard two-sample approach
The distribution of Q is approximated by that of sup{|B(x)|, 0 < x < 1}, where B is a standard Brownian motion process
Use table C.5 to find the associated p-value
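As an alternative to a table lookup, the tail probability of sup|B(x)| can be evaluated numerically from its standard series expansion. A minimal sketch in base R; `renyi_p()` is a hypothetical helper name and `n_terms` is an assumed truncation point for the infinite series.

```r
# Hypothetical helper: P(sup |B(x)| > Q) for standard Brownian motion on [0, 1],
# using the series 1 - (4/pi) * sum_k (-1)^k / (2k+1) * exp(-pi^2 (2k+1)^2 / (8 Q^2))
renyi_p <- function(Q, n_terms = 1000) {
  k <- 0:(n_terms - 1)
  series <- sum((-1)^k / (2 * k + 1) *
                exp(-pi^2 * (2 * k + 1)^2 / (8 * Q^2)))
  1 - (4 / pi) * series
}

# p-values for a few values of Q
Q_vals <- c(1.5, 2.0, 2.24, 2.75)
p_vals <- sapply(Q_vals, renyi_p)
print(round(data.frame(Q = Q_vals, p = p_vals), 4))
```

A value of Q near 2.24 corresponds to a p-value of about 0.05, so larger values of Q reject at the 5% level.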
44
Small Example Given the following data Group 1: (7, 8+, 9, 15, 17)
45
Constructing the statistic
dk dk1 Yk Yk1 Var
46
Calculating Q First we can calculate Q
Once we have Q we compare to table C.5
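The calculation of Q can be sketched in base R. Group 2 of this small example is not shown above, so the sketch uses the 20-subject two-treatment data from the stratified example instead; `renyi_Q()` is a hypothetical helper that assumes log-rank weights (W = 1).

```r
# Two-treatment data from the stratified example (1 = death, 0 = censored)
times <- c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
trt   <- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
death <- c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)

# Hypothetical helper: running Z(t_i) path, sigma(tau), and Q = sup|Z(t)| / sigma(tau)
renyi_Q <- function(time, status, group, ref = 1) {
  event_times <- sort(unique(time[status == 1]))
  Z_path <- numeric(length(event_times))
  Z <- 0
  V <- 0
  for (i in seq_along(event_times)) {
    t  <- event_times[i]
    Y  <- sum(time >= t)
    Y1 <- sum(time >= t & group == ref)
    d  <- sum(time == t & status == 1)
    d1 <- sum(time == t & status == 1 & group == ref)
    Z  <- Z + d1 - Y1 * d / Y
    if (Y > 1) V <- V + (Y1 / Y) * (1 - Y1 / Y) * d * (Y - d) / (Y - 1)
    Z_path[i] <- Z                      # Z(t_i): sum up to and including t_i
  }
  list(times = event_times, Z_path = Z_path,
       sigma = sqrt(V), Q = max(abs(Z_path)) / sqrt(V))
}

res <- renyi_Q(times, death, trt)
print(round(data.frame(t = res$times, Z = res$Z_path), 3))
print(round(c(sigma = res$sigma, Q = res$Q), 3))
```

The last entry of the Z path is the ordinary log-rank numerator; Q replaces it with the largest absolute value reached along the path.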
47
Example 2: Kidney Infection
Data on 119 kidney dialysis patients
Comparing time to kidney infection between two groups
  Catheters placed percutaneously (n = 76)
  Catheters placed surgically (n = 43)
48
Example: Kidney Infection
49
R Code: Kidney Infection
> kidney<-read.csv("H:\\public_html\\BMRTY722_Summer2015\\Data\\Kidney.csv")
> time<-kidney$Time
> infect<-kidney$d
> percut<-kidney$cath
> st<-Surv(time, infect)
> LRtest<-survdiff(st~percut)
> LRtest
Call:
survdiff(formula = st ~ percut)
         N Observed Expected (O-E)^2/E (O-E)^2/V
percut=
percut=
Chisq= 2.5 on 1 degrees of freedom, p= 0.112
50
How to Test This in R?
We could write our own R function to conduct the Renyi test…
BUT, it turns out there was a package released in April that has the Renyi test (and all weight functions from K & M included)
51
R Code: Kidney Infection
> library(survMisc)
> RYtest<-comp(survfit(st~percut))
> RYtest
$tne
     t n e n_percut=1 e_percut=1 n_percut=2 e_percut=2
 1:
 :
…
16:
$tests$lrTests
                               ChiSq df p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Flem~-Harr~ with p=1, q=
$tests$supTests
                               Q p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Renyi Flem~-Harr~ with p=1, q=
52
R Code: Kidney Infection
> library(survMisc)
> RYtest<-comp(survfit(st~percut), FHp=0, FHq=0)
> RYtest
$tne
     t n e n_percut=1 e_percut=1 n_percut=2 e_percut=2
 1:
 :
…
16:
$tests$lrTests
                               ChiSq df p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Flem~-Harr~ with p=0, q=
$tests$supTests
                               Q p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Renyi Flem~-Harr~ with p=0, q=
53
Example 3: Gastric Cancer
Clinical trial of chemotherapy vs. chemotherapy combined with radiotherapy
45 patients randomized to each of two arms
Followed for up to 8 years
55
R Code: Gastric Cancer
> RYtest<-comp(survfit(Surv(tm, dth)~x, data=dat))
> RYtest
$tne
     t n_x=1 e_x=1 n_x=2 e_x=2 n e
 1:
…
80:
$tests$lrTests
                               ChiSq df p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Flem~-Harr~ with p=1, q=
$tests$supTests
                               Q p
Log-rank
Gehan-Breslow (mod~ Wilcoxon)
Tarone-Ware
Peto-Peto
Mod~ Peto-Peto (Andersen)
Renyi Flem~-Harr~ with p=1, q=
56
Compare To Log Rank
Renyi test: 0.05 < p < 0.06
What would you expect to see from the log rank test?
More or less significant?
57
LR Results
> LRtest<-survdiff(Surv(tm, dth)~x)
> LRtest
Call:
survdiff(formula = Surv(tm, dth) ~ x)
    N Observed Expected (O-E)^2/E (O-E)^2/V
x=
x=
Chisq= 0.2 on 1 degrees of freedom, p= 0.63
58
Final Comments on the Renyi Test
Simulations comparing the Renyi vs. log-rank:
  When hazards cross, the Renyi test performs better
  The Renyi test has little loss of power if the proportional hazards assumption holds (with limited censoring)
  However, with large amounts of censoring, the advantages of the Renyi test decline
So this test provides a good alternative when hazard rates cross.
But caution still needs to be taken when there is a large amount of censoring.
59
Other Tests for Crossing Hazards
Cramer-von Mises test(s):
  Based on the integrated squared difference between two curves
T-test analog:
  Requires estimation of the mean
  Compares the area under S1(t) and S2(t)
Brookmeyer-Crowley:
  Censored version of the two-sample median test
60
Cramer-Von Mises Test
Based on the Nelson-Aalen estimator for the cumulative hazard rate and its associated variance
Ideally we integrate over time 0 to τ, but this integral is estimated by summing over distinct death times
61
2-Sample T-test analog
Again, this test is based on the difference in the area under the survival curve between two groups
Components of the test include:
  Order all observed times (event and censored)
  Calculate dij, cij, and Yij for both groups
  Calculate the KM estimator for survival and censoring
  Calculate the pooled KM estimate of survival
62
2-Sample T-test analog Once these estimates are obtained:
  Construct the weight function
  Construct the test statistic
  Construct the variance of the test statistic
  Calculate a Z-score according to
63
Summary of Other 2-Sample Tests
When the hazard rates cross, both the Cramer-Von Mises and the 2-sample t-test analog have greater power than log-rank.
When hazard rates are proportional, both show power loss relative to log-rank.
Performance is similar to the Renyi test when hazards cross, but Renyi has better power for proportional hazards.
64
Test Based on Fixed Points in Time
Complicated description in K&M (chapter 7.8) However, pretty simple idea when you are comparing two groups:
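For two groups, the idea can be sketched as a Z-statistic comparing the Kaplan-Meier estimates at a single prespecified time t0. The notation is assumed here for illustration, with V̂ denoting the Greenwood variance estimate of each Kaplan-Meier estimate:

```latex
\[
Z = \frac{\hat S_1(t_0) - \hat S_2(t_0)}
         {\sqrt{\hat V\bigl[\hat S_1(t_0)\bigr] + \hat V\bigl[\hat S_2(t_0)\bigr]}}
\;\sim\; N(0,1) \text{ under } H_0: S_1(t_0) = S_2(t_0)
\]
```

The time t0 should be chosen before looking at the data; picking it where the curves are furthest apart invalidates the nominal p-value.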
65
Next Time
We will begin our discussion of semi-parametric regression modeling in survival analysis.