Lecture 11: Hypothesis Testing III

Topics: Stratified Tests; Renyi and Other Tests

Stratified Tests
- Adjust for a covariate.
- Allow you to control for a confounder without using a regression approach.
- However, as in regression, if an interaction is present it won't be detected.
- Assumes the "treatment" effect is the same across strata.

Sometimes Confusing
- "Stratified" analysis sometimes means:
  - Subgroup analysis
  - Stratified "combined" test
- In this case, we mean the combined test.
- Recall the Mantel-Haenszel odds ratio.

Notation
- Now three variables:
  - Outcome (time to event)
  - Group variable (e.g., treatment)
  - Strata variable (e.g., gender, cancer grade)
- j = 1, 2, …, K indexes groups
- s = 1, 2, …, M indexes strata

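Similar to the Standard Test
- Formal hypothesis: the hazard rates of the K groups are equal within every stratum.
- Now, Zj.(t) is represented by a sum of the stratum-specific statistics Zjs(t) over the M strata.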

From there, inference is the same
- Chi-square test with K – 1 d.f., where Σ^-1 is the inverse of the estimated variance-covariance matrix.
- For the 2-group scenario it can be reduced to a Z-score.
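Written out, the stratified test can be sketched as follows. The notation is an assumption borrowed from the usual textbook presentation rather than taken from the slides: τ is the largest time with subjects at risk, and σ̂_jgs is the estimated covariance of Zjs and Zgs within stratum s.

```latex
\[ H_0:\; h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t),
   \qquad s = 1,\dots,M,\quad t \le \tau \]

\[ Z_{j.}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau),
   \qquad
   \hat\sigma_{jg.} = \sum_{s=1}^{M} \hat\sigma_{jgs} \]

\[ \chi^2 = \bigl(Z_{1.}(\tau),\dots,Z_{K-1.}(\tau)\bigr)\,
            \hat\Sigma^{-1}\,
            \bigl(Z_{1.}(\tau),\dots,Z_{K-1.}(\tau)\bigr)^{T}
   \;\sim\; \chi^2_{K-1} \ \text{under } H_0 \]

\[ Z = \frac{\sum_{s=1}^{M} Z_{1s}(\tau)}
            {\sqrt{\sum_{s=1}^{M} \hat\sigma_{11s}}}
   \;\sim\; N(0,1) \ \text{under } H_0 \quad (K = 2) \]
```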

Asymptotics
- Just like the unstratified test, requires large N.
- Here it requires an even larger N: think about dividing the sample into M strata.
- In most cases, there probably is not sufficient N.

Small Example
- 20 subjects received 1 of two treatments:
  - 9 patients received treatment 1
  - 11 patients received treatment 2
- Patients were also categorized by disease type (2 strata).
- Question: Does the data show a treatment effect after adjusting for disease type?

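Time  Death (1) / Censor (0)  Trt  Disease
   1                       1    1        1
   5                       1    1        1
   5                       0    1        1
   6                       0    1        2
   8                       1    1        1
  37                       1    1        2
  49                       1    1        1
  58                       1    1        2
  79                       0    1        2
  11                       0    2        1
  50                       1    2        1
  51                       1    2        2
  62                       1    2        1
  67                       1    2        1
  73                       1    2        1
  86                       1    2        1
  90                       1    2        2
  96                       1    2        2
  97                       1    2        2
  97                       1    2        2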

What first
- Data in standard format ("+" marks a censored time):
  - Trt 1: 1, 5, 5+, 6+, 8, 37, 49, 58, 79+
  - Trt 2: 11+, 50, 51, 62, 67, 73, 86, 90, 96, 97, 97
- We might first conduct a global test.
- What is our hypothesis?

Constructing Statistic

Calculate Statistic
- Z-statistic
- χ² statistic
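For the two-sample log-rank test, the pieces can be sketched as below. The symbols are assumptions following the usual notation: d_i and Y_i are the deaths and number at risk at the i-th distinct death time, with a second subscript 1 for treatment 1. O1 = 6 and E1 = 2.63 agree with the survdiff output for this example; the variance of about 1.87 is a hand calculation from the listed data, not a figure from the slides.

```latex
\[ Z_1(\tau) = \sum_{i=1}^{D}\left(d_{i1} - Y_{i1}\,\frac{d_i}{Y_i}\right)
             = O_1 - E_1 = 6 - 2.63 = 3.37 \]

\[ \hat V = \sum_{i=1}^{D} \frac{Y_{i1}}{Y_i}\left(1 - \frac{Y_{i1}}{Y_i}\right)
            \frac{Y_i - d_i}{Y_i - 1}\, d_i \approx 1.87 \]

\[ Z = \frac{Z_1(\tau)}{\sqrt{\hat V}} \approx \frac{3.37}{\sqrt{1.87}} \approx 2.47,
   \qquad \chi^2 = Z^2 \approx 6.1 \ \text{on 1 d.f.} \]
```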

Now Let's Adjust for Disease Type
Steps:
1. Divide the data according to strata.
2. Calculate Zjs(t) and its estimated variance within each stratum.
3. Sum Zjs(t) and the variances across strata to get Zj.(t) and its variance.
4. Calculate your test statistic from these sums.

Divide Data by Strata
Disease 1
Time  Death (1) / Censor (0)  Trt
   1                       1    1
   5                       1    1
   5                       0    1
   8                       1    1
  49                       1    1
  11                       0    2
  50                       1    2
  62                       1    2
  67                       1    2
  73                       1    2
  86                       1    2
Disease 2
Time  Death (1) / Censor (0)  Trt
   6                       0    1
  37                       1    1
  58                       1    1
  79                       0    1
  51                       1    2
  90                       1    2
  96                       1    2
  97                       1    2
  97                       1    2

Calculate Zjs(t) and its estimated variance

Calculate Zjs(t) and its estimated variance

Calculate the Statistic
- Z (or chi-square)
- What is our conclusion?
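The steps above can be sketched in base R without any packages. This is a minimal hand calculation under stated assumptions, not the code used in the lecture: `logrank_parts` is a hypothetical helper name, and the data vectors are the 20-subject example.

```r
# 20-subject example: time, treatment, disease stratum, death indicator
times <- c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
trt   <- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
strat <- c(1,1,1,2,1,1,2,1,1,2,2,1,1,1,2,1,2,2,2,2)
death <- c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)

# Hypothetical helper: log-rank numerator Z and variance V for group 1
logrank_parts <- function(time, status, in_grp1) {
  z <- 0
  v <- 0
  for (tk in sort(unique(time[status == 1]))) {
    Y  <- sum(time >= tk)                          # at risk, both groups
    Y1 <- sum(time >= tk & in_grp1)                # at risk, group 1
    d  <- sum(time == tk & status == 1)            # deaths, both groups
    d1 <- sum(time == tk & status == 1 & in_grp1)  # deaths, group 1
    z <- z + d1 - Y1 * d / Y
    if (Y > 1) v <- v + d * (Y1 / Y) * (1 - Y1 / Y) * (Y - d) / (Y - 1)
  }
  c(Z = z, V = v)
}

# Global (unstratified) test
g <- logrank_parts(times, death, trt == 1)
chisq_global <- unname(g["Z"]^2 / g["V"])

# Stratified test: compute Z and V within each stratum, then sum across strata
parts <- sapply(sort(unique(strat)), function(s) {
  k <- strat == s
  logrank_parts(times[k], death[k], trt[k] == 1)
})
chisq_strat <- sum(parts["Z", ])^2 / sum(parts["V", ])

round(c(global = chisq_global, stratified = chisq_strat), 2)
```

The two chi-square values can be compared with the `survdiff` results for the global and stratified fits on the same data.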

R Code
    > library(survival)
    > times<-c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
    > trt<-c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
    > strat<-c(1,1,1,2,1,1,2,1,1,2,2,1,1,1,2,1,2,2,2,2)
    > death<-c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)
    > st<-Surv(times, death)
    > #Global
    > survdiff(st~trt)
    Call:
    survdiff(formula = st ~ trt)
           N Observed Expected (O-E)^2/E (O-E)^2/V
    trt=1  9        6     2.63     4.329       6.1
    trt=2 11       10    13.37     0.851       6.1
    Chisq= 6.1 on 1 degrees of freedom, p= 0.0136

R Code
    > #Stratified
    > survdiff(st~trt + strata(strat))
    Call:
    survdiff(formula = st ~ trt + strata(strat))
           N Observed Expected (O-E)^2/E (O-E)^2/V
    trt=1  9        6     2.27      6.16      9.46
    trt=2 11       10    13.73      1.02      9.46
    Chisq= 9.5 on 1 degrees of freedom, p= 0.0021

BMT: Hodgkin's & Non-Hodgkin's Lymphoma
- Study included 43 BMT patients.
- Is there a difference in hazard rates between:
  - Allogeneic transplant = HLA-matched sibling donor (N=16)
  - Autologous transplant = own "cleaned" marrow (N=27)
- But we want to adjust for disease state:
  - Non-Hodgkin's lymphoma (N=23)
  - Hodgkin's disease (N=20)

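Global Test (group 1 = allogeneic)
 t_i  d_i  Y_i  Y_i1  d_i1 - Y_i1(d_i/Y_i)  Variance
   2    1   43    16                 0.628     0.234
   4    1   42    15                 0.643     0.230
  28    1   41    14                 0.659     0.225
  30    1   40    13                -0.325     0.219
  32    1   39    13                 0.667     0.222
   …
 132    1   22     7                -0.318     0.217
 140    1   21     7                -0.333     0.222
 252    1   18     7                -0.389     0.238
 357    1   16     7                 0.563     0.246
 Sum                                 0.886     5.841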

Global Test Results
    > dat<-read.csv("C:\\BJW\\AutoAllo.csv")
    > d<-dat$death; t<-dat$time
    > dis<-dat$disease; type<-dat$graft
    > nostrat<-survdiff(Surv(t, d)~type)
    > nostrat
    Call:
    survdiff(formula = Surv(t, d) ~ type)
            N Observed Expected (O-E)^2/E (O-E)^2/V
    type=1 16       10     9.11    0.0862     0.134
    type=2 27       16    16.89    0.0465     0.134
    Chisq= 0.1 on 1 degrees of freedom, p= 0.714

Stratified by Disease Type: Non-Hodgkin's Lymphoma subjects
 t_i  d_i  Y_i  Y_i1  d_i1 - Y_i1(d_i/Y_i)  Variance
  28    1   23    11                 0.522     0.250
  32    1   22    10                 0.545     0.248
  42    1   21     9                -0.429     0.245
  49    1   20     9                 0.550     0.248
  53    1   19     8                -0.421     0.244
  57    1   18     8                -0.444     0.247
  63    1   17     8                -0.471     0.249
  81    2   16     8                -1.000     0.467
  84    1   14     8                 0.429     0.245
 140    1   13     7                -0.538     0.249
 252    1   11     7                -0.636     0.231
 357    1   10     7                 0.300     0.210
 524    1    8     6                -0.750     0.188
 Sum                                -2.344     3.319

Stratified by Disease Type: Hodgkin's Disease subjects
 t_i  d_i  Y_i  Y_i1  d_i1 - Y_i1(d_i/Y_i)  Variance
   2    1   20     5                 0.750     0.188
   4    1   19     4                 0.789     0.166
  30    1   18     3                -0.167     0.139
  36    1   17     3                -0.176     0.145
  41    1   16     3                -0.188     0.152
  52    1   15     3                -0.200     0.160
  62    1   14     3                -0.214     0.168
  72    1   13     3                 0.769     0.178
  77    1   12     2                 0.833     0.139
  79    1   11     1                 0.909     0.083
 108    1   10     0                 0.000     0.000
 132    1    9     0                 0.000     0.000
 Sum                                 3.106     1.518

Stratified Test Results
    > strat<-survdiff(Surv(t, d)~type + strata(dis))
    > strat
    Call:
    survdiff(formula = Surv(t, d) ~ type + strata(dis))
            N Observed Expected (O-E)^2/E (O-E)^2/V
    type=1 16       10     9.24    0.0629      0.12
    type=2 27       16    16.76    0.0347      0.12
    Chisq= 0.1 on 1 degrees of freedom, p= 0.729

Stratified Test Results
- Again we fail to reject.
- This seems in error (recall that our survival curves looked VERY different).

Problem?
- The treatment effect is not the same in the 2 disease states.
- The stratum-specific statistics are in different directions:
  - Z(NHL) = -2.344
  - Z(Hodgkin's) = 3.106
- The stratified approach is NOT appropriate.
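The cancellation is visible in the arithmetic. Using the column sums from the two stratum tables (Z of -2.344 with variance 3.319 for non-Hodgkin's lymphoma, Z of 3.106 with variance 1.518 for Hodgkin's disease), the stratified statistic works out as:

```latex
\[ Z_{1.} = -2.344 + 3.106 = 0.762,
   \qquad
   \hat V = 3.319 + 1.518 = 4.837 \]

\[ \chi^2 = \frac{Z_{1.}^2}{\hat V} = \frac{0.762^2}{4.837} \approx 0.12
   \ \text{on 1 d.f.} \]
```

Two sizeable effects of opposite sign sum to nearly zero, which matches the (O-E)^2/V value of 0.12 in the stratified survdiff output.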

Alternative to Stratified Analysis
- Define 4 groups and conduct a K-sample log-rank test:
  - Allogeneic and NHL
  - Allogeneic and Hodgkin's
  - Autologous and NHL
  - Autologous and Hodgkin's
- Subgroup analysis (by disease) should be performed:
  - Allo vs. auto | Hodgkin's
  - Allo vs. auto | Non-Hodgkin's

R Code: K-sample test
    > allgrp<-ifelse(dis==1 & type==1, 1, 0)
    > allgrp<-ifelse(dis==1 & type==2, 2, allgrp)
    > allgrp<-ifelse(dis==2 & type==1, 3, allgrp)
    > allgrp<-ifelse(dis==2 & type==2, 4, allgrp)
    > grp4<-survdiff(Surv(t, d)~allgrp)
    > grp4
    Call:
    survdiff(formula = Surv(t, d) ~ allgrp)
              N Observed Expected (O-E)^2/E (O-E)^2/V
    allgrp=1 11        5     7.67     0.927     1.350
    allgrp=2 12        9     7.45     0.324     0.459
    allgrp=3  5        5     1.45     8.721     9.567
    allgrp=4 15        7     9.44     0.631     0.997
    Chisq= 11.1 on 3 degrees of freedom, p= 0.0113

R Code: Subgroup analysis
    > ### Subgroup (NHL)
    > subNHL<-survdiff(Surv(t,d)[which(dis==1)]~type[which(dis==1)])
    > subNHL
    Call:
    survdiff(formula = Surv(t, d)[which(dis == 1)] ~ type[which(dis == 1)])
                             N Observed Expected (O-E)^2/E (O-E)^2/V
    type[which(dis == 1)]=1 11        5     7.34     0.748      1.66
    type[which(dis == 1)]=2 12        9     6.66     0.825      1.66
    Chisq= 1.7 on 1 degrees of freedom, p= 0.198
    > ### Subgroup (Hodgkins)
    > subHD<-survdiff(Surv(t,d)[which(dis==2)]~type[which(dis==2)])
    > subHD
    Call:
    survdiff(formula = Surv(t, d)[which(dis == 2)] ~ type[which(dis == 2)])
                             N Observed Expected (O-E)^2/E (O-E)^2/V
    type[which(dis == 2)]=1  5        5     1.89     5.095      6.36
    type[which(dis == 2)]=2 15        7    10.11     0.955      6.36
    Chisq= 6.4 on 1 degrees of freedom, p= 0.0117

Summary: Stratified Testing
- Alternative to a regression approach for controlling for a second covariate when examining a treatment effect.
- Sample size needs to be larger than in the case of testing K groups for the test results to be valid.
- One needs to be cautious about misinterpreting null results when interactions exist.
- We can use a subgroup approach if this fails.

Renyi Tests
- The previous tests we discussed all use a weighted integral of the estimated difference in cumulative hazard rates.
- They don't address the situation where early differences favor one group and later differences favor the other.
- Solution: Renyi tests, which address the issue of crossing hazard rates.

Renyi Test
- Censored-data analogs of the Kolmogorov-Smirnov (KS) statistic for comparing two uncensored samples.
- Recall that KS is a test of equality of one-dimensional probability distributions, used to compare two samples.

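Kolmogorov-Smirnov Test
- Recall the empirical distribution function.
- Hypothesis: the two samples come from the same distribution.
- The KS statistic is the largest absolute difference between the two empirical distribution functions.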
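In symbols, a sketch of the three ingredients for two samples of sizes n1 and n2 (the names F̂_j, n_j, and D are notation chosen here):

```latex
\[ \hat F_j(x) = \frac{1}{n_j}\sum_{i=1}^{n_j} I\!\left(X_{ji} \le x\right),
   \qquad j = 1, 2 \]

\[ H_0:\; F_1(x) = F_2(x) \ \text{for all } x
   \qquad \text{vs.} \qquad
   H_A:\; F_1(x) \ne F_2(x) \ \text{for some } x \]

\[ D = \sup_x \left|\hat F_1(x) - \hat F_2(x)\right| \]
```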

Example of a KS test
- Two groups observed for a continuous outcome:
  - 1: -0.2, 3.7, 4.3, 5.0, 7.7, 8.6
  - 2: -0.9, 0.4, 0.5, 2.6, 3.0, 12.1
- We want to determine whether the distributions of the outcomes differ (without assuming any distributional form…).

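Constructing KS statistic
    x     P(X1 ≤ x)  P(X2 ≤ x)  |P(X1 ≤ x) - P(X2 ≤ x)|
 -0.9     0          1/6        1/6
 -0.2     1/6        1/6        0
  0.4     1/6        1/3        1/6
  0.5     1/6        1/2        1/3
  2.6     1/6        2/3        1/2
  3.0     1/6        5/6        2/3
  3.7     1/3        5/6        1/2
  4.3     1/2        5/6        1/3
  5.0     2/3        5/6        1/6
  7.7     5/6        5/6        0
  8.6     1          5/6        1/6
 12.1     1          1          0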
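The same construction can be sketched in base R using `ecdf()` from the stats package. The variable names are illustrative only.

```r
# Two samples from the KS example
x1 <- c(-0.2, 3.7, 4.3, 5.0, 7.7, 8.6)
x2 <- c(-0.9, 0.4, 0.5, 2.6, 3.0, 12.1)

# Empirical distribution functions, evaluated at every observed value
F1 <- ecdf(x1)
F2 <- ecdf(x2)
grid <- sort(c(x1, x2))
gap  <- abs(F1(grid) - F2(grid))

# KS statistic: the largest gap, and the x at which it occurs
D        <- max(gap)
x_at_max <- grid[which.max(gap)]

data.frame(x = grid, F1 = F1(grid), F2 = F2(grid), gap = gap)
```

Here the largest gap is 2/3, attained at x = 3.0. Calling `ks.test(x1, x2)` reports the same statistic along with a p-value.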

K-S Test

Renyi Test Approach
- Find the value of Z(ti) at each failure time ti.
  - Note this differs from Z(τ), which sums over all ti ≤ τ.
- Calculate the series of Z(ti).
- Estimate the standard error of Z(τ) (all times).

Renyi Statistic
- When hazard rates cross, the absolute value of Z(t) will have its maximum at some value t < τ.
- Hypothesis test: equal hazard rates at all times up to τ vs. different hazard rates at some time.
- Note that multiple tests are made, because we are taking the max over Z(t).
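A sketch of the quantities involved, in notation assumed here from the standard two-sample setup: d_k and Y_k are the deaths and number at risk at failure time t_k, a second subscript marks the group, W is the weight function, and τ is the largest time with both groups at risk.

```latex
\[ Z(t_i) = \sum_{t_k \le t_i} W(t_k)\left[d_{k1} - Y_{k1}\,\frac{d_k}{Y_k}\right],
   \qquad i = 1,\dots,D \]

\[ \sigma^2(\tau) = \sum_{t_k \le \tau} W(t_k)^2\,
      \frac{Y_{k1}}{Y_k}\,\frac{Y_{k2}}{Y_k}\,
      \frac{Y_k - d_k}{Y_k - 1}\, d_k \]

\[ H_0:\; h_1(t) = h_2(t) \ \text{for all } t \le \tau
   \qquad \text{vs.} \qquad
   H_A:\; h_1(t) \ne h_2(t) \ \text{for some } t \le \tau \]

\[ Q = \frac{\sup\{\,|Z(t)|,\; t \le \tau\,\}}{\sigma(\tau)} \]
```

With W(t) = 1 these are the running log-rank sums, so the ordinary log-rank statistic uses only the final value Z(τ) while Q uses the largest excursion along the way.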

Test Statistic Q
- Use the same variance estimate for the test statistic as in the standard two-sample approach.
- The distribution of Q is approximated by that of sup{|B(x)|, 0 ≤ x ≤ 1}, where B is a standard Brownian motion process.
- Use Table C.5 to find the associated p-value.
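Instead of a table lookup, the tail probability of sup|B(x)| can be evaluated from its known alternating series. The function below is a minimal sketch: `renyi_p` is a hypothetical name, and truncating the series at 50 terms is an arbitrary choice that is ample for typical Q values.

```r
# P( sup_{0<=x<=1} |B(x)| > q ) for standard Brownian motion B, using
#   1 - (4/pi) * sum_k (-1)^k / (2k+1) * exp(-pi^2 (2k+1)^2 / (8 q^2))
renyi_p <- function(q, terms = 50) {
  k <- 0:(terms - 1)
  series <- (-1)^k / (2 * k + 1) * exp(-pi^2 * (2 * k + 1)^2 / (8 * q^2))
  1 - (4 / pi) * sum(series)
}

# Q values reported for the log-rank weight in the kidney and gastric examples
p_kidney  <- renyi_p(1.590442)
p_gastric <- renyi_p(2.200066)
round(c(kidney = p_kidney, gastric = p_gastric), 5)
```

These can be checked against the p column of `$tests$supTests` in the survMisc output for the same Q values.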

Small Example Given the following data Group 1: (7, 8+, 9, 15, 17)

Constructing the statistic
(Table columns: dk, dk1, Yk, Yk1, Var)

Calculating Q
- First we calculate Q.
- Once we have Q, we compare it to Table C.5.
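As an illustration of the calculation, here is a base R sketch of Q with log-rank weights (W = 1). Because only Group 1 of the small Renyi example is listed, this sketch substitutes the 20-subject two-treatment data from the stratified-test example; that substitution and all variable names are assumptions for illustration.

```r
# 20-subject example (treatment 1 vs. 2), used here as stand-in data
times <- c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
trt   <- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
death <- c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)

tk <- sort(unique(times[death == 1]))   # distinct failure times

# Per-failure-time increments of Z and of the variance (log-rank weights)
incr <- sapply(tk, function(u) {
  Y  <- sum(times >= u)
  Y1 <- sum(times >= u & trt == 1)
  d  <- sum(times == u & death == 1)
  d1 <- sum(times == u & death == 1 & trt == 1)
  c(z = d1 - Y1 * d / Y,
    v = if (Y > 1) d * (Y1 / Y) * (1 - Y1 / Y) * (Y - d) / (Y - 1) else 0)
})

Zpath <- cumsum(incr["z", ])        # Z(t_i) at each failure time
sigma <- sqrt(sum(incr["v", ]))     # standard error of Z(tau)
Q     <- max(abs(Zpath)) / sigma    # Renyi statistic
t_max <- tk[which.max(abs(Zpath))]  # where the largest excursion occurs

round(c(Q = Q, t_max = t_max), 3)
```

In this data set the running sum peaks at t = 58 and then declines slightly, so Q is a little larger than the ordinary log-rank Z based on the final value.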

Example 2: Kidney Infection
- Data on 119 kidney dialysis patients.
- Comparing time to kidney infection between two groups:
  - Catheters placed percutaneously (n = 76)
  - Catheters placed surgically (n = 43)

Example: Kidney Infection

R Code: Kidney Infection
    > kidney<-read.csv("H:\\public_html\\BMRTY722_Summer2015\\Data\\Kidney.csv")
    > time<-kidney$Time
    > infect<-kidney$d
    > percut<-kidney$cath
    > st<-Surv(time, infect)
    > LRtest<-survdiff(st~percut)
    > LRtest
    Call:
    survdiff(formula = st ~ percut)
              N Observed Expected (O-E)^2/E (O-E)^2/V
    percut=1 43       15       11      1.42      2.53
    percut=2 76       11       15      1.05      2.53
    Chisq= 2.5 on 1 degrees of freedom, p= 0.112

How to Test This in R?
- We could write our own R function to conduct the Renyi test…
- BUT, it turns out there was a package (survMisc) released in April that has the Renyi test (and all weight functions from K & M included).

R Code: Kidney Infection
    > library(survMisc)
    > RYtest<-comp(survfit(st~percut))
    > RYtest
    $tne
           t   n e n_percut=1 e_percut=1 n_percut=2 e_percut=2
     1:  0.5 119 6         76          6         43          0
     2:  1.5 103 1         60          0         43          1
    …
    16: 26.5   5 1          3          0          2          1
    $tests$lrTests
                                        ChiSq df       p
    Log-rank                      2.529506318  1 0.11174
    Gehan-Breslow (mod~ Wilcoxon) 0.002084309  1 0.96359
    Tarone-Ware                   0.402738202  1 0.52568
    Peto-Peto                     1.399160019  1 0.23686
    Mod~ Peto-Peto (Andersen)     1.275908836  1 0.25866
    Flem~-Harr~ with p=1, q=1     9.834062861  1 0.00171
    $tests$supTests
                                           Q       p
    Log-rank                        1.590442 0.22347
    Gehan-Breslow (mod~ Wilcoxon)   1.430499 0.30511
    Tarone-Ware                     1.260498 0.41467
    Peto-Peto                       1.166979 0.48551
    Mod~ Peto-Peto (Andersen)       1.185549 0.47085
    Renyi Flem~-Harr~ with p=1, q=1 7.460348 0.00000

R Code: Kidney Infection
    > library(survMisc)
    > RYtest<-comp(survfit(st~percut), FHp=0, FHq=0)
    > RYtest
    $tne
           t   n e n_percut=1 e_percut=1 n_percut=2 e_percut=2
     1:  0.5 119 6         76          6         43          0
     2:  1.5 103 1         60          0         43          1
    …
    16: 26.5   5 1          3          0          2          1
    $tests$lrTests
                                        ChiSq df       p
    Log-rank                      2.529506318  1 0.11174
    Gehan-Breslow (mod~ Wilcoxon) 0.002084309  1 0.96359
    Tarone-Ware                   0.402738202  1 0.52568
    Peto-Peto                     1.399160019  1 0.23686
    Mod~ Peto-Peto (Andersen)     1.275908836  1 0.25866
    Flem~-Harr~ with p=0, q=0     2.529506318  1 0.11174
    $tests$supTests
                                            Q       p
    Log-rank                        1.5904422 0.22347
    Gehan-Breslow (mod~ Wilcoxon)   1.4304991 0.30511
    Tarone-Ware                     1.2604976 0.41467
    Peto-Peto                       1.1669791 0.48551
    Mod~ Peto-Peto (Andersen)       1.1855486 0.47085
    Renyi Flem~-Harr~ with p=0, q=0 0.9743145 0.65287

Example 3: Gastric Cancer
- Clinical trial of chemotherapy vs. chemotherapy combined with radiotherapy.
- 45 patients randomized to each of two arms.
- Followed for up to 8 years.

R Code: Gastric Cancer
    > RYtest<-comp(survfit(Surv(tm, dth)~x, data=dat))
    > RYtest
    $tne
           t n_x=1 e_x=1 n_x=2 e_x=2  n e
     1:    1    45     1    45     0 90 1
    …
    80: 2363     3     1     6     0  9 1
    $tests$lrTests
                                       ChiSq df       p
    Log-rank                      0.23192760  1 0.63010
    Gehan-Breslow (mod~ Wilcoxon) 3.99653918  1 0.04559
    Tarone-Ware                   1.92661766  1 0.16513
    Peto-Peto                     4.02844247  1 0.04474
    Mod~ Peto-Peto (Andersen)     4.12061234  1 0.04236
    Flem~-Harr~ with p=1, q=1     0.01112868  1 0.91598
    $tests$supTests
                                           Q       p
    Log-rank                        2.200066 0.05560
    Gehan-Breslow (mod~ Wilcoxon)   2.951879 0.00632
    Tarone-Ware                     2.677299 0.01484
    Peto-Peto                       2.965941 0.00604
    Mod~ Peto-Peto (Andersen)       2.997885 0.00544
    Renyi Flem~-Harr~ with p=1, q=1 9.388643 0.00000

Compare To Log Rank
- Renyi test: 0.05 < p < 0.06
- What would you expect to see from the log-rank test? More or less significant?

LR Results
    > LRtest<-survdiff(Surv(tm, dth)~x)
    > LRtest
    Call:
    survdiff(formula = Surv(tm, dth) ~ x)
         N Observed Expected (O-E)^2/E (O-E)^2/V
    x=1 45       43     45.1     0.102     0.232
    x=2 45       39     36.9     0.125     0.232
    Chisq= 0.2 on 1 degrees of freedom, p= 0.63

Final Comments on the Renyi Test
- Simulations comparing the Renyi test vs. log-rank:
  - Hazards cross → Renyi test performs better.
  - Renyi test has little loss of power if the proportional hazards assumption holds (with limited censoring).
  - However, with large amounts of censoring, the advantages of the Renyi test decline.
- So this test provides a good alternative when hazard rates cross, but caution is still needed when there is a large amount of censoring.

Other Tests for Crossing Hazards
- Cramer-von Mises test(s): based on the integrated squared difference between two curves.
- T-test analog: requires estimation of the mean; compares the area under S1(t) and S2(t).
- Brookmeyer-Crowley: censored version of the two-sample median test.

Cramer-von Mises Test
- Based on the Nelson-Aalen estimator for the cumulative hazard rate and its associated variance.
- Ideally we integrate over time 0 to τ, but this integral is estimated by summing over distinct death times.

2-Sample T-test analog
- Again, this test is based on the difference in the area under the survival curve between two groups.
- Components of the test include:
  - Order all observed times (event and censored).
  - Calculate dij, cij, and Yij for both groups.
  - Calculate the KM estimator for survival and censoring.
  - Calculate the pooled KM estimate of survival.

2-Sample T-test analog
- Once these estimates are obtained:
  - Construct the weight function.
  - Construct the test statistic.
  - Construct the variance of the test statistic.
  - Calculate a Z-score from the statistic and its variance.

Summary of Other 2-Sample Tests
- When the hazard rates cross, both the Cramer-von Mises test and the 2-sample t-test analog have greater power than log-rank.
- When hazard rates are proportional, both show power loss relative to log-rank.
- Performance is similar to the Renyi test when hazards cross, but Renyi has better power for proportional hazards.

Test Based on Fixed Points in Time
- Complicated description in K&M (Section 7.8).
- However, pretty simple idea when you are comparing two groups:
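For two groups, the idea can be sketched as a Z-test on the difference in Kaplan-Meier estimates at a prespecified time t0. The notation is assumed here: Ŝ_j is the Kaplan-Meier estimate in group j and V̂ its Greenwood variance estimate.

```latex
\[ H_0:\; S_1(t_0) = S_2(t_0) \]

\[ Z = \frac{\hat S_1(t_0) - \hat S_2(t_0)}
            {\sqrt{\hat V\!\left[\hat S_1(t_0)\right] + \hat V\!\left[\hat S_2(t_0)\right]}}
   \;\approx\; N(0,1) \ \text{under } H_0 \]
```

The time point t0 should be chosen before looking at the data; picking it after inspecting the curves inflates the type I error.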

Next Time We will begin our discussion of semi-parametric regression modeling in survival analysis.