Lecture 9: Hypothesis Testing
- One-sample tests
- Tests for ≥2 samples

Hypothesis Testing for One-Sample
- Standard set-up: H0: h(t) = h_0(t) for all t ≤ τ versus HA: h(t) ≠ h_0(t) for some t ≤ τ
- What is τ?
- Common approach
  - Assume the distribution is exponential
  - Test that the distribution is exponential with λ = λ_0

Pretty Stringent
- Actually, as long as the hazard is specified for the range of t, tests can be performed

General Form of Test
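The formula on this slide did not survive extraction. As a hedged reconstruction, the standard one-sample statistic in Klein and Moeschberger (K&M, Section 7.2) compares observed and expected weighted events, where W(t) is a weight function, Y(t) is the number at risk, and h_0(t) is the hypothesized hazard:

```latex
\begin{align*}
Z(\tau) &= O(\tau) - E(\tau)
         = \sum_{i=1}^{D} W(t_i)\,\frac{d_i}{Y(t_i)}
           - \int_0^{\tau} W(s)\,h_0(s)\,ds, \\
V[Z(\tau)] &= \int_0^{\tau} W^2(s)\,\frac{h_0(s)}{Y(s)}\,ds, \\
\frac{Z(\tau)}{\sqrt{V[Z(\tau)]}} &\;\approx\; N(0,1)
  \quad \text{under } H_0 \text{ for large samples.}
\end{align*}
```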

Log-Rank
- W(t_i) = Y(t_i)

Accounting for Left-Truncation
- Choice of weights is still W(t) = Y(t)
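With the log-rank weight W(t) = Y(t), the general statistic simplifies considerably. A sketch of the result, assuming subject j enters at time L_j (L_j = 0 without truncation) and exits at time T_j ≤ τ, with H_0(t) the hypothesized cumulative hazard:

```latex
\begin{align*}
O(\tau) &= \sum_{i=1}^{D} d_i \quad \text{(observed number of events)}, \\
E(\tau) &= \int_0^{\tau} Y(s)\,h_0(s)\,ds
         = \sum_{j=1}^{n} \bigl[ H_0(T_j) - H_0(L_j) \bigr], \\
V[Z(\tau)] &= E(\tau), \qquad
\frac{O(\tau) - E(\tau)}{\sqrt{E(\tau)}} \;\approx\; N(0,1) \text{ under } H_0 .
\end{align*}
```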

Other Options
- Harrington and Fleming
  - Allows user to have flexibility in weighting
  - Can choose early or late departures to be more influential
  - Special case: Gehan-Wilcoxon
  - Harrington DP and Fleming TR (1982). A class of rank test procedures for censored survival data. Biometrika 69, 553-566.
- Gatsonis
- Interesting aside
  - Log-rank first introduced for one-sample testing by Breslow (1975)
  - Extended to left-truncation by Hyde (1977) and Woolson (1981)

Notes
- An estimator of the variance, V, can be the empirical estimate rather than the hypothesized value
- When the alternative h(t) > h_0(t) is true, this variance estimator is expected to be larger, and the test less powerful
- If h(t) < h_0(t), this variance estimator is expected to be smaller, and the test more powerful
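For reference, the empirical variance estimator mentioned in these notes is usually written as follows (a sketch following K&M Section 7.2, which replaces the hypothesized hazard increments with the observed ones, d_i / Y(t_i)):

```latex
\[
\hat{V}_{\text{emp}}[Z(\tau)] \;=\; \sum_{i=1}^{D} W(t_i)^2 \,\frac{d_i}{Y(t_i)^2}.
\]
```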

Example: Rheumatoid Arthritis
- 10 white males with RA followed for up to 18 years
- Objective
  - Determine if men with RA are at greater risk of mortality

Entry Time   Exit Time   d_i
43           51          0
44           54          0
45           51          0
45           60          0
48           61          0
49           55          0
50           59          1
51           69          1
53           68          0
54           70          0
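The one-sample log-rank test with left truncation can be sketched in R for the table above. The reference hazard is not given in these slides, so the constant hazard `h0 = 0.01` per year used here is purely a hypothetical placeholder; in practice H_0(t) would come from a reference population life table.

```r
# Rheumatoid arthritis data from the table above (entry time, exit time, death indicator)
entry <- c(43, 44, 45, 45, 48, 49, 50, 51, 53, 54)
exit  <- c(51, 54, 51, 60, 61, 55, 59, 69, 68, 70)
d     <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 0)

# HYPOTHETICAL reference hazard (assumption, not from the lecture):
# a constant 0.01 deaths per person-year, so H0(t) = h0 * t
h0 <- 0.01
H0 <- function(t) h0 * t

O <- sum(d)                       # observed deaths
E <- sum(H0(exit) - H0(entry))    # expected deaths, accumulated only from entry to exit
Z <- (O - E) / sqrt(E)            # with W(t) = Y(t), the null variance equals E

# One-sided p-value for the alternative h(t) > h0(t) (greater mortality risk)
p_one_sided <- pnorm(Z, lower.tail = FALSE)
print(c(O = O, E = E, Z = Z, p = p_one_sided))
```

Only the interval from entry to exit contributes to E, which is how left truncation enters the calculation.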

Bone Marrow Transplant for Leukemia
- Patients undergoing bone marrow transplant (BMT) for acute leukemia
- Three types of leukemia
  - ALL
  - AML low risk
  - AML high risk
- What if we are interested in the overall incidence rate (i.e., either relapse or death)?

BMT Example
- Want to test whether or not survival in BMT patients follows an exponential distribution
  - What does this mean we are asking?
- Can estimate λ from the data (recall the MLE for an exponential distribution)

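R Code
### BMT example
library(survival)
data<-read.csv("H:\\BMTRY_722_Summer2013\\BMT_1_3.csv")
# Failure time: time to relapse (TTR) if relapsed or event-free, otherwise missing for now
failtime<-ifelse(data$Relapse==0 & data$Death==0 | data$Relapse==1, data$TTR, NA)
# The condition on this line was partly lost in extraction; only ">=data$TTD" survived.
# Reconstructed (assumption): deaths without an earlier relapse use time to death (TTD)
failtime<-ifelse(data$Death==1 & data$TTR>=data$TTD, data$TTD, failtime)
# Event indicator: relapse or death
event<-ifelse(data$Relapse==1 | data$Death==1, 1, 0)
st<-Surv(failtime, event)
fit<-survfit(st~1)
plot(fit, xlab="Time", ylab="S(t)", lwd=2)
#Calculating lambda hat (exponential MLE: events / total time at risk)
lambda.hat<-sum(event)/sum(failtime)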

“survdiff” Function
- Description
  - Tests if there is a difference between two or more survival curves using the G-rho family of tests, or for a single curve against a known alternative
- Usage
  - survdiff(formula, data, subset, na.action, rho=0)
- Arguments
  - formula: a formula expression as for other survival models, of the form Surv(time, status) ~ predictors. For a one-sample test, the predictors must consist of a single offset(sp) term, where sp is a vector giving the survival probability for each subject

“survdiff” Function
- Method
  - This function implements the G-rho family of Harrington and Fleming (1982), with weights on each death of S(t)^rho, where S is the Kaplan-Meier estimate of survival
  - With rho=0 this is the log-rank or Mantel-Haenszel test, and with rho=1 it is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test
  - If the right-hand side of the formula consists only of an offset term, then a one-sample test is done
  - To cause missing values in the predictors to be treated as a separate group, rather than being omitted, use the factor function with its exclude argument

R code
#Estimating lambda
> lambda.hat<-sum(event)/sum(failtime)
# Expected S(t) = exp(-lambda.hat*t)
> S.exp<-exp(-lambda.hat*failtime)
> one.sample.test1<-survdiff(st~offset(S.exp))
> one.sample.test1
Observed Expected        Z        p
      83       83        0        1
> one.sample.test2<-survdiff(st~offset(S.exp), rho=1)
> one.sample.test2
Observed Expected        Z        p
      83       83        0  0.00521
#Comparing hypothesized dist’n to empirical dist’n
> plot(fit, conf.int=FALSE, lwd=2)
> lines(sort(failtime), rev(sort(S.exp)), col=2, lwd=2, type="s")
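The log-rank (rho=0) result above, with Observed equal to Expected and p = 1, is not evidence that the exponential fits. With W(t) = Y(t) and no truncation, the expected count is the sum of hypothesized cumulative hazards, the sum of -log(S_0(t_j)) over subjects. Plugging in the MLE makes this equal to the observed count by construction. The sketch below illustrates the identity on simulated data (not the BMT data); the simulation rates are arbitrary assumptions.

```r
set.seed(1)

# Simulated right-censored data (assumption: arbitrary exponential rates)
n        <- 200
t_event  <- rexp(n, rate = 0.002)
t_cens   <- rexp(n, rate = 0.001)
failtime <- pmin(t_event, t_cens)
event    <- as.numeric(t_event <= t_cens)

# Exponential MLE, as on the slide
lambda.hat <- sum(event) / sum(failtime)

# Hypothesized survival at each subject's exit time
S.exp <- exp(-lambda.hat * failtime)

observed <- sum(event)
expected <- sum(-log(S.exp))   # = lambda.hat * sum(failtime) = sum(event)

print(c(observed = observed, expected = expected))
```

Because lambda was estimated from the same data being tested, the rho=0 test is degenerate here; the rho=1 test weights the departures differently and can still detect lack of fit.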

R code
#Estimating lambda for failure times <800
> fail2<-failtime[which(failtime<800)]
> event2<-event[which(failtime<800)]
> lambda.hat2<-sum(event2)/sum(fail2)
# Expected S(t) = exp(-.004*t)
> S.exp2<-exp(-lambda.hat2*fail2)
> st2<-Surv(fail2, event2); fit2<-survfit(st2~1)
> one.sample.testa<-survdiff(st2~offset(S.exp2))
> one.sample.testa
Observed Expected        Z        p
      80       80        0        1
> one.sample.testb<-survdiff(st2~offset(S.exp2), rho=1)
> one.sample.testb
Observed Expected        Z        p
      80       80    0.000    0.477

R code
#Estimating lambda for failure times >=800
> fail3<-failtime[which(failtime>=800)]
> event3<-event[which(failtime>=800)]
> lambda.hat3<-sum(event3)/sum(fail3)
# Expected S(t) = exp(-.004*t)
> S.exp3<-exp(-lambda.hat3*fail3)
> st3<-Surv(fail3, event3); fit3<-survfit(st3~1)
> one.sample.testc<-survdiff(st3~offset(S.exp3))
> one.sample.testc
Observed Expected         Z        p
       3        3 -2.56e-16        1
> one.sample.testd<-survdiff(st3~offset(S.exp3), rho=1)
> one.sample.testd
Observed Expected        Z        p
       3        3   -0.035   0.9730

Conclusions
- So what can we conclude about our original hypothesis?

Relevance
- Becoming more common
- Phase II cancer studies with time-to-event (TTE) outcomes instead of response
- But
  - Often more interested in median or 1-year survival
- Yet
  - Very important for sample size considerations
  - Most often assume study data will have an exponential distribution for sample size calculations

On to something more interesting… comparing ≥2 samples

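Comparing two or more samples
- ANOVA-type approach
  - H0: h_1(t) = h_2(t) = … = h_K(t) for all t ≤ τ
  - HA: at least one of the h_j(t) differs for some t ≤ τ
  - Where τ is the largest time for which all groups have at least one subject at risk
- Data can be right-censored (and left-truncated) for the tests we will discuss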

Notation
- Let t_1 < t_2 < … < t_D be the distinct death times in all samples being compared
- At time t_i, let d_ij be the number of events in group j out of Y_ij individuals at risk (j = 1, 2, …, K)
- Define the pooled counts d_i = d_i1 + … + d_iK and Y_i = Y_i1 + … + Y_iK

Rationale
- Weighted comparisons of the estimated hazard of the jth population under the null hypothesis and alternative hypothesis
- Based on the Nelson-Aalen estimator
- If the null is true, the pooled estimate of h(t) should be an estimator for h_j(t)

Applying the Test
- Let W_j(t) be a positive weight function such that W_j(t_i) = 0 if Y_ij = 0
- If all Z_j(τ)'s are close to zero, then there is little evidence to reject the null

Common Form for Weight Functions
- All commonly used tests choose weight functions such that W_j(t_i) = Y_ij W(t_i)
- Note that the weight W(t_i) is common across all j
- Can redefine Z_j(τ) in terms of the common weight W(t_i)
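The formulas on the last two slides were lost in extraction. A hedged reconstruction following K&M Section 7.3: the general statistic compares the group-specific and pooled Nelson-Aalen increments, and substituting W_j(t_i) = Y_ij W(t_i) gives the simplified form.

```latex
\begin{align*}
Z_j(\tau) &= \sum_{i=1}^{D} W_j(t_i)
             \left\{ \frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i} \right\},
             \qquad j = 1, \dots, K, \\
\intertext{and with $W_j(t_i) = Y_{ij}\,W(t_i)$,}
Z_j(\tau) &= \sum_{i=1}^{D} W(t_i)
             \left\{ d_{ij} - Y_{ij}\,\frac{d_i}{Y_i} \right\}.
\end{align*}
```

In the second form each term is observed minus expected events in group j at time t_i, weighted by W(t_i).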

Test Statistic
- Variance and covariance of Z_j(τ): see K&M p. 207
- Z_1(τ), Z_2(τ), …, Z_K(τ) are linearly dependent because their sum is 0
- For the test statistic, choose any K - 1 components
- Chi-square test with K - 1 d.f.: χ^2 = (Z_1(τ), …, Z_{K-1}(τ)) Σ^{-1} (Z_1(τ), …, Z_{K-1}(τ))^T, where Σ is the variance-covariance matrix of the chosen components
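For convenience, the variance and covariance estimators referenced on this slide are usually written as below. This is a sketch of the standard form for the common-weight case W_j(t_i) = Y_ij W(t_i); check it against K&M p. 207 before relying on it.

```latex
\begin{align*}
\hat{\sigma}_{jj} &= \sum_{i=1}^{D} W(t_i)^2\,
   \frac{Y_{ij}}{Y_i}\left(1 - \frac{Y_{ij}}{Y_i}\right)
   \left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i, \\
\hat{\sigma}_{jg} &= -\sum_{i=1}^{D} W(t_i)^2\,
   \frac{Y_{ij}}{Y_i}\,\frac{Y_{ig}}{Y_i}
   \left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i, \qquad g \neq j, \\
\chi^2 &= \bigl(Z_1(\tau), \dots, Z_{K-1}(\tau)\bigr)\,
   \Sigma^{-1}\,
   \bigl(Z_1(\tau), \dots, Z_{K-1}(\tau)\bigr)^{T}
   \;\sim\; \chi^2_{K-1} \text{ under } H_0 .
\end{align*}
```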

Log-Rank Test for 2 Groups
- For log-rank, W(t_i) = 1
- Have 2 groups and want to test if survival is the same in the groups
- We want to develop a nonparametric test of H0: S_0(t) = S_1(t) for all t

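Log-Rank Test for 2 Groups
- If S_0(t) and S_1(t) follow some parametric distribution and are in the same family, this is easy
- For example, assume both are exponential; the test then reduces to comparing the two rate parameters
- But we need a test whose validity doesn't depend on parametric assumptions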

Constructing the Log-Rank Test
- Recall our notation
  - t_1 < t_2 < … < t_D are the D distinct ordered event times
  - Y_ij = # of people in group j at risk at t_i
  - Y_i = # of people at risk across groups at t_i
  - d_ij = # of people in group j that fail at t_i
  - d_i = # of people across groups that fail at t_i

Constructing the Log-Rank Test
- We can summarize the information at time t_i in a 2x2 table
            Fail     Don't Fail
  Group 0   d_i0     Y_i0 - d_i0
  Group 1   d_i1     Y_i1 - d_i1

Constructing the Log-Rank Test
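The content of this slide was not recovered. The usual construction, sketched here with the group labels 0 and 1 from the 2x2 table, conditions on the margins of each table so that d_i1 is hypergeometric under the null, then sums observed minus expected over event times:

```latex
\begin{align*}
E[d_{i1}] &= \frac{Y_{i1}\,d_i}{Y_i}, \qquad
\operatorname{Var}[d_{i1}] =
   d_i\,\frac{Y_{i1}}{Y_i}\,\frac{Y_{i0}}{Y_i}\,
   \frac{Y_i - d_i}{Y_i - 1}, \\
\chi^2_{LR} &=
   \frac{\left[\sum_{i=1}^{D} \bigl(d_{i1} - E[d_{i1}]\bigr)\right]^2}
        {\sum_{i=1}^{D} \operatorname{Var}[d_{i1}]}
   \;\sim\; \chi^2_1 \text{ under } H_0 .
\end{align*}
```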

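Toy Example
- Say we have the following data on two groups (+ denotes a censored time):
  Group 1: 3, 6+, 9, 9, 11+, 16
  Group 2: 8, 9, 10+, 12+, 19, 23+
- We want to test the hypothesis that survival is the same in the two groups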

Toy Example
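The hand calculation for the toy example can be sketched in base R. The data are taken from the `survdiff` call in these slides; the per-time table and variable names are illustrative choices, not the lecture's own layout.

```r
# Toy data (1 = event, 0 = censored), as used in the survdiff example
time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# Distinct ordered event times
event_times <- sort(unique(time[cens == 1]))

# One row per event time: risk sets, deaths, and hypergeometric mean/variance for group 1
tab <- t(sapply(event_times, function(ti) {
  Y  <- sum(time >= ti)                         # at risk overall
  Y1 <- sum(time >= ti & grp == 1)              # at risk in group 1
  d  <- sum(time == ti & cens == 1)             # deaths overall
  d1 <- sum(time == ti & cens == 1 & grp == 1)  # deaths in group 1
  E1 <- d * Y1 / Y
  V1 <- if (Y > 1) d * (Y1 / Y) * (1 - Y1 / Y) * (Y - d) / (Y - 1) else 0
  c(t = ti, Y = Y, Y1 = Y1, d = d, d1 = d1, E1 = E1, V1 = V1)
}))
tab <- as.data.frame(tab)
print(tab)

O1    <- sum(tab$d1)   # observed deaths in group 1
E1    <- sum(tab$E1)   # expected deaths in group 1 under H0
V     <- sum(tab$V1)   # variance of O1 - E1
chisq <- (O1 - E1)^2 / V
p     <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(c(O1 = O1, E1 = E1, V = V, chisq = chisq, p = p))
```

The totals should agree with the `survdiff` components shown in the slides (`exp`, `var`, and `chisq`).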

Same Test in R
> time<-c(3,6,9,9,11,16,8,9,10,12,19,23)
> cens<-c(1,0,1,1,0,1,1,1,0,0,1,0)
> grp<-c(1,1,1,1,1,1,2,2,2,2,2,2)
> grp<-as.factor(grp)
>
> sdat<-Surv(time, cens)
> survdiff(sdat~grp)
Call:
survdiff(formula = sdat ~ grp)
      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6        4     2.57     0.800      1.62
grp=2 6        3     4.43     0.463      1.62
 Chisq= 1.6  on 1 degrees of freedom, p= 0.203

Same Test in R
> toy<-survdiff(sdat~grp)
> names(toy)
[1] "n"     "obs"   "exp"   "var"   "chisq" "call"
> toy$obs
[1] 4 3
> toy$exp
[1] 2.566667 4.433333
> toy$var
          [,1]      [,2]
[1,]  1.267778 -1.267778
[2,] -1.267778  1.267778
> toy$chisq
[1] 1.620508

UMP Tests

More general: 2 samples
- We can change the weight function
- For K = 2, can use a Z-score or χ^2
- Corrects for ties

Choices for Weight Functions
- W(t_i) = 1
  - Log-rank test
  - Optimal power for detecting differences when hazards are proportional
- W(t_i) = Y_i
  - Gehan test
  - Generalization of the 2-sample Mann-Whitney-Wilcoxon test

Choices for Weight Functions
- Fleming-Harrington
  - General case: W_{p,q}(t_i) = S(t_{i-1})^p [1 - S(t_{i-1})]^q, where S is the pooled Kaplan-Meier estimate
  - Special cases
    - Log-rank: p = q = 0
    - Mann-Whitney-Wilcoxon: p = 1, q = 0
    - q = 0, p > 0: gives greater weight to early departures
    - p = 0, q > 0: gives greater weight to late departures
  - Allows specific choice of influence (for better or worse!)

Others?
- Many
- Not all available in all software (e.g. Gehan not in R)
- Worth trying a few in each situation to compare inferences
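When a weight function is not available in a given package, the two-sample statistic is simple enough to compute directly. The base R sketch below uses a hypothetical helper, `weighted_logrank` (not part of any package), supporting the log-rank weight W(t_i) = 1 and the Gehan weight W(t_i) = Y_i; it is meant as an illustration, not a validated implementation.

```r
# Hypothetical helper: two-sample weighted log-rank test in base R
weighted_logrank <- function(time, status, group, weight = c("logrank", "gehan")) {
  weight <- match.arg(weight)
  g1 <- group == sort(unique(group))[1]        # indicator for the first group
  event_times <- sort(unique(time[status == 1]))
  Z <- 0
  V <- 0
  for (ti in event_times) {
    Y  <- sum(time >= ti)                      # at risk overall
    Y1 <- sum(time >= ti & g1)                 # at risk in first group
    d  <- sum(time == ti & status == 1)        # deaths overall
    d1 <- sum(time == ti & status == 1 & g1)   # deaths in first group
    W  <- if (weight == "gehan") Y else 1      # weight at this event time
    Z  <- Z + W * (d1 - Y1 * d / Y)
    if (Y > 1) {
      V <- V + W^2 * d * (Y1 / Y) * (1 - Y1 / Y) * (Y - d) / (Y - 1)
    }
  }
  chisq <- Z^2 / V
  c(Z = Z, V = V, chisq = chisq, p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Usage on the toy data from these slides
time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

res_logrank <- weighted_logrank(time, cens, grp, weight = "logrank")
res_gehan   <- weighted_logrank(time, cens, grp, weight = "gehan")
print(rbind(logrank = res_logrank, gehan = res_gehan))
```

Comparing the two rows shows how the choice of weights changes the inference: the Gehan weights emphasize early event times, when more subjects are at risk.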

Caveat
- Note we are interested in the average difference (consider log-rank specifically)
- What if hazards cross?
- Could have a significant difference prior to some t, and another significant difference after t: but what if the direction differs?

Next time
- More on different weight functions
- Tests for trends
