Published by Susan Fisher. Modified over 8 years ago.
1 Borgan and Henderson: Event History Methodology
Lancaster, September 2006
Session 8.1: Cohort sampling for the Cox model
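2 Relative risk regression models
Hazard rate for individual i: $\alpha_i(t) = \alpha_0(t)\, r(\beta, x_i(t))$, where $\alpha_0(t)$ is the baseline hazard and $r(\beta, x_i(t))$ is the relative risk (hazard ratio).
The relative risk for individual i depends on the covariates $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, which may be time-dependent.
Cox: $r(\beta, x_i) = \exp(\beta^T x_i)$
Excess relative risk: $r(\beta, x_i) = \prod_{j=1}^{p} (1 + \beta_j x_{ij})$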
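The two relative risk forms named on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard definitions, exp(beta'x) for the Cox model and the product of (1 + beta_j x_j) for the excess relative risk model. The function names and the covariate values are hypothetical.

```python
import math


def cox_relative_risk(beta, x):
    """Cox (log-linear) relative risk: exp(beta_1*x_1 + ... + beta_p*x_p)."""
    return math.exp(sum(b * xj for b, xj in zip(beta, x)))


def excess_relative_risk(beta, x):
    """Excess relative risk: product over covariates of (1 + beta_j * x_j)."""
    rr = 1.0
    for b, xj in zip(beta, x):
        rr *= 1.0 + b * xj
    return rr


def hazard(baseline_hazard, relative_risk):
    """Hazard rate = baseline hazard times relative risk."""
    return baseline_hazard * relative_risk


# Hypothetical covariate values for one individual.
x = [2.0, 1.0]
beta = [0.38, 0.17]
print(cox_relative_risk(beta, x), excess_relative_risk(beta, x))
```

Both forms give a relative risk of 1 when all covariates are zero, so the baseline hazard is the hazard of an unexposed individual.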
3 Cohort data with delayed entry
[Figure: individuals at risk plotted along the study time axis; arrows mark censored observations]
4 Need information on covariates for all individuals at risk.
Estimate the regression coefficients by maximizing Cox's partial likelihood.
The partial likelihood is a product over all failure times (event times). The contribution for the individual $i_j$ failing at $t_j$ is
$\frac{r(\beta, x_{i_j}(t_j))}{\sum_{l \in R_j} r(\beta, x_l(t_j))}$
where $r(\beta, x)$ is the relative risk, $R_j$ is the risk set at $t_j$, and $n(t)$ is the number at risk at $t-$ (just before $t$).
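To make the product over failure times concrete, here is a minimal sketch of the log partial likelihood for full cohort data with delayed entry. It assumes the Cox form exp(beta*x) for the relative risk, a single time-fixed covariate, and no tied failure times. The function and argument names are hypothetical.

```python
import math


def cox_log_partial_likelihood(beta, entry, exit_, event, x):
    """Log of Cox's partial likelihood for one covariate per individual.

    entry[i], exit_[i]: study times at which individual i enters and leaves
    event[i]: True if i fails at exit_[i], False if censored
    x[i]: covariate value (fixed over time in this sketch)
    """
    n = len(x)
    loglik = 0.0
    for j in range(n):
        if not event[j]:
            continue  # censored observations contribute no factor of their own
        t = exit_[j]
        # Risk set at t: entered before t and still under observation at t.
        risk_set = [l for l in range(n) if entry[l] < t <= exit_[l]]
        denom = sum(math.exp(beta * x[l]) for l in risk_set)
        loglik += beta * x[j] - math.log(denom)
    return loglik


# Hypothetical data: four individuals, two of them with delayed entry.
entry = [0, 0, 1, 2]
exit_ = [3, 5, 4, 6]
event = [True, False, True, True]
x = [1.0, 0.0, 2.0, 0.5]
print(cox_log_partial_likelihood(0.0, entry, exit_, event, x))
```

At beta = 0 every factor reduces to 1/n(t_j). In the hypothetical data the risk sets have sizes 4, 3 and 1, so the result is -log(12). Note that the covariate of every member of every risk set is needed, which is the cost the sampling designs try to avoid.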
5 Cohort sampling designs
It is expensive to collect and check (!) covariate information for all individuals in large cohorts.
It is also unnecessary when there are few events, since the cases carry most of the statistical information.
It is useful to have cohort sampling designs where one only needs to collect covariate information for cases and controls:
- Nested case-control
- Case-cohort
6 Classical nested case-control design
At each failure time, select at random $m - 1$ controls among the $n(t_j) - 1$ non-failures at risk.
[Figure: illustration for m = 2, showing each case and its control]
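The control selection at a single failure time can be sketched as follows. This is an illustration only: the function name and arguments are hypothetical, and the risk set is assumed to be available as a list of identifiers.

```python
import random


def sample_nested_case_control(case, at_risk, m, rng):
    """Return the sampled risk set: the case followed by m - 1 controls.

    case: identifier of the individual failing at this time
    at_risk: identifiers of everyone at risk just before the failure time
             (including the case)
    m: size of the sampled risk set
    rng: a random.Random instance, passed in for reproducibility
    """
    non_failures = [i for i in at_risk if i != case]
    # Simple random sampling without replacement among the non-failures.
    controls = rng.sample(non_failures, m - 1)
    return [case] + controls


rng = random.Random(2006)
print(sample_nested_case_control(case=3, at_risk=list(range(10)), m=2, rng=rng))
```

The sampling is repeated independently at each failure time, so the same individual may serve as a control for several cases and may later become a case.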
7 Counter-matched nested case-control design
The statistical information in a sampled risk set (a case and its controls) depends on the variation of the covariate values within the set.
We may obtain "large" variation of an exposure of interest by counter-matching on
(i) a surrogate measure for the exposure, or
(ii) the exposure itself when correcting for a confounder.
Classify each individual at risk into one of L strata based on information available for everyone, then select the controls by stratified random sampling.
8 Want a specified number $m_l$ from each stratum $l$ in a sampled risk set (a case and its controls).
Select $m_l$ controls among those at risk in stratum $l$, except for the case's stratum $s$, where only $m_s - 1$ controls are selected.
[Figure: illustration for L = 2 and $m_1 = m_2 = 1$]
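The stratified control selection described on this slide might look like this in code. It is a sketch under the assumption that stratum membership is known for everyone at risk. The names `stratum` and `m` and the data layout (plain dictionaries) are hypothetical.

```python
import random


def sample_counter_matched(case, at_risk, stratum, m, rng):
    """Return a counter-matched sampled risk set (case first).

    case: identifier of the failing individual
    at_risk: identifiers at risk just before the failure time (incl. the case)
    stratum: dict mapping identifier -> stratum label
    m: dict mapping stratum label -> number m_l wanted in the sampled risk set
    rng: a random.Random instance
    """
    sampled = [case]
    for label, m_l in m.items():
        candidates = [i for i in at_risk if stratum[i] == label and i != case]
        # The case already fills one slot in its own stratum.
        k = m_l - 1 if label == stratum[case] else m_l
        # rng.sample raises ValueError if the stratum has too few at risk.
        sampled.extend(rng.sample(candidates, k))
    return sampled


stratum = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
rng = random.Random(8)
print(sample_counter_matched(0, list(stratum), stratum, {1: 1, 2: 1}, rng))
```

With L = 2 and m_1 = m_2 = 1, as in the illustration, a case in stratum 1 always gets its single control from stratum 2, and vice versa.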
9 A sampling design is described by its sampling distribution.
Classical nested case-control design: if individual i fails at time t, the probability of selecting r as the sampled risk set is
$\pi(r \mid t, i) = \binom{n(t) - 1}{m - 1}^{-1}$
(we assume that r is a subset of the risk set, that r is of size m, and that i is in r).
The sampled risk set consists of the case $i_j$ and the $m - 1$ controls.
10 Counter-matched nested case-control design: denote by $n_l(t)$ the number at risk in stratum $l$ at time $t-$.
If individual i in stratum s(i) fails at time t, the probability of selecting r as the sampled risk set is
$\pi(r \mid t, i) = \left[ \prod_{l=1}^{L} \binom{n_l(t)}{m_l} \right]^{-1} \frac{n_{s(i)}(t)}{m_{s(i)}}$
(under suitable assumptions on the set r).
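The sampling probabilities follow from counting the equally likely sets the designs can produce. The sketch below computes them that way and, for counter-matching, also in a product form that isolates the factor n_{s(i)}(t) / m_{s(i)} depending on the case's stratum. The function names and the stratum counts are hypothetical.

```python
from math import comb


def prob_classical(n_at_risk, m):
    """Classical design: m - 1 controls drawn from the n(t) - 1 non-failures."""
    return 1 / comb(n_at_risk - 1, m - 1)


def prob_counter_matched(n, m, case_stratum):
    """Counter-matching, by direct counting.

    n, m: dicts mapping stratum label -> number at risk / number wanted.
    m_l controls are drawn from each stratum, except the case's stratum,
    where m_s - 1 are drawn from the n_s - 1 non-failures.
    """
    ways = 1
    for label in n:
        if label == case_stratum:
            ways *= comb(n[label] - 1, m[label] - 1)
        else:
            ways *= comb(n[label], m[label])
    return 1 / ways


def prob_counter_matched_product_form(n, m, case_stratum):
    """Same probability, using comb(n-1, m-1) = comb(n, m) * m / n."""
    total = 1
    for label in n:
        total *= comb(n[label], m[label])
    return (n[case_stratum] / m[case_stratum]) / total


n = {1: 30, 2: 10}  # hypothetical numbers at risk per stratum
m = {1: 1, 2: 1}
print(prob_counter_matched(n, m, case_stratum=2))
print(prob_counter_matched_product_form(n, m, case_stratum=2))
```

Only the factor n_{s(i)}(t) / m_{s(i)} differs between members of the same sampled risk set, which is why it is the only part of the sampling probability that survives in the partial likelihood.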
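11 Partial likelihood
Introduce the counting process $N_{(i,r)}(t)$, counting the number of times in $[0, t]$ that individual i fails and the sampled risk set equals r.
The corresponding intensity process takes the form
$\lambda_{(i,r)}(t) = Y_i(t)\, \alpha_i(t)\, \pi(r \mid t, i)$
where $Y_i(t)$ is the at risk indicator, $\alpha_i(t)$ is the hazard rate, and $\pi(r \mid t, i)$ is the sampling probability.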
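12 Introduce the aggregated processes $N_r(t) = \sum_i N_{(i,r)}(t)$, with intensity processes $\lambda_r(t) = \sum_{l \in r} \lambda_{(l,r)}(t)$, where $\lambda_{(l,r)}(t)$ is the intensity process of $N_{(l,r)}(t)$.
Probability that individual i fails, given that a failure occurs at t and that the sampled risk set is r:
$\frac{\lambda_{(i,r)}(t)}{\lambda_r(t)} = \frac{r(\beta, x_i(t))\, \pi(r \mid t, i)}{\sum_{l \in r} r(\beta, x_l(t))\, \pi(r \mid t, l)}$
where $r(\beta, x)$ is the relative risk and $\pi(r \mid t, l)$ is the probability of sampling r when l fails at t.
The partial likelihood is a product of such factors over all failures and sampled risk set occurrences (after cancelling common factors).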
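13 Contribution to the partial likelihood from a sampled risk set $\tilde{R}_j$ (the case $i_j$ failing at $t_j$ and its controls), with $r(\beta, x)$ the relative risk:
Classical nested case-control: $\frac{r(\beta, x_{i_j}(t_j))}{\sum_{l \in \tilde{R}_j} r(\beta, x_l(t_j))}$
Counter-matched case-control: $\frac{r(\beta, x_{i_j}(t_j))\, w_{i_j}(t_j)}{\sum_{l \in \tilde{R}_j} r(\beta, x_l(t_j))\, w_l(t_j)}$ with weights $w_l(t) = n_{s(l)}(t) / m_{s(l)}$
One may estimate the regression parameters with software for relative risk regression (Cox, etc.) that allows for "offsets".
By counting process arguments similar to those for the full cohort, one may show that the usual large sample likelihood methods apply.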
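The role of the "offsets" can be seen in a small sketch of one sampled risk set's contribution. It assumes the Cox form exp(beta*x) and counter-matching weights of the form w_l = n_{s(l)}(t) / m_{s(l)} (number at risk in the member's stratum divided by the number sampled from it). The function name and the numbers are hypothetical.

```python
import math


def log_contribution(beta, x, w, case_index=0):
    """Log partial likelihood contribution of one sampled risk set.

    x[l], w[l]: covariate value and sampling weight of member l
    case_index: position of the case within the sampled risk set
    """
    # log(w) is added to the linear predictor with a fixed coefficient of 1,
    # which is exactly what an offset term does in regression software.
    terms = [beta * x_l + math.log(w_l) for x_l, w_l in zip(x, w)]
    log_denom = math.log(sum(math.exp(t) for t in terms))
    return terms[case_index] - log_denom


# Classical design: equal weights cancel, so any constant works.
print(log_contribution(0.5, x=[1.0, 0.0], w=[1.0, 1.0]))

# Counter-matched 1:1 design, hypothetical strata with 10 and 30 at risk.
print(log_contribution(0.5, x=[1.0, 0.0], w=[10.0, 30.0]))
```

In a stratified (conditional) Cox or conditional logistic fit, with each sampled risk set as its own stratum, log(w) would be supplied as the offset; for the classical design no offset is needed.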
14 Uranium miners cohort
3347 uranium miners from the Colorado Plateau were included in the study cohort 1950-60 and followed up until the end of 1982.
258 lung cancer deaths.
Interested in the effect of radon and smoking exposure on the risk of lung cancer death.
Exposure information is available for the full cohort; cohort sampling is used here for illustration.
15 Design               Radon ($\beta_1$)   Smoke ($\beta_2$)
1:1 case-control        0.42 (0.20)         0.23 (0.10)
1:1 counter-matched     0.39 (0.14)         0.25 (0.10)
1:3 case-control        0.43 (0.16)         0.20 (0.07)
1:3 counter-matched     0.41 (0.13)         0.19 (0.07)
Full cohort             0.38 (0.11)         0.17 (0.05)
Counter-match on radon exposure quartiles.
Fit the excess relative risk model $r(\beta, x_i) = (1 + \beta_1 x_{i1})(1 + \beta_2 x_{i2})$, where
$x_{i1}$ = cumulative radon (100 WLMs)
$x_{i2}$ = cumulative smoking (1000 packs)
16 Classical case-cohort design
Select at random a subcohort C consisting of a fraction p of the full cohort.
[Figure: illustration for p = 0.50, with the subcohort marked]
17 Use a pseudo likelihood for estimation.
Software for relative risk regression (Cox, etc.) may be "tricked" into doing the estimation.
Likelihood methods do not apply: standard errors from statistical software need to be corrected, and likelihood ratio tests cannot be used.
Contribution to pseudo likelihood for a case:
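Several versions of the case-cohort pseudo likelihood exist, and the slide text does not say which one is meant. As a hedged illustration, one common version (Prentice's original proposal) gives the case $i_j$ failing at $t_j$ the contribution below, where $r(\beta, x)$ is the relative risk, $C$ is the subcohort, $R_j$ is the full risk set at $t_j$, and the symbol $S_j$ is introduced here only for this sketch.

```latex
\frac{r(\beta, x_{i_j}(t_j))}{\sum_{l \in S_j} r(\beta, x_l(t_j))},
\qquad
S_j = (C \cap R_j) \cup \{ i_j \}
```

The denominator runs over the subcohort members at risk, plus the case itself if it occurs outside the subcohort. The Self and Prentice variant sums over $C \cap R_j$ only. In both versions the same subcohort members reappear in many denominators, so the factors are dependent and the product is not a partial likelihood.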
18 Stratified case-cohort design
Select the subcohort by stratified random sampling of a fraction $p_s$ from stratum $s$.
[Figure: illustration for S = 2 and $p_1 = p_2 = 0.50$]
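Subcohort selection by stratified random sampling can be sketched as below. The classical case-cohort design is the special case with a single stratum. The function name, the dictionary layout and the rounding of the stratum sample sizes are assumptions made for this illustration.

```python
import random


def sample_stratified_subcohort(stratum, p, rng):
    """Draw a subcohort by sampling a fraction p[s] from each stratum s.

    stratum: dict mapping identifier -> stratum label (known for everyone)
    p: dict mapping stratum label -> sampling fraction p_s
    rng: a random.Random instance
    """
    subcohort = []
    for label, p_s in p.items():
        members = sorted(i for i in stratum if stratum[i] == label)
        # Sample size per stratum; Python's round() rounds halves to even.
        k = round(p_s * len(members))
        subcohort.extend(rng.sample(members, k))
    return subcohort


# Hypothetical cohort of 10: six individuals in stratum 1, four in stratum 2.
stratum = {i: (1 if i < 6 else 2) for i in range(10)}
rng = random.Random(18)
print(sample_stratified_subcohort(stratum, {1: 0.5, 2: 0.5}, rng))
```

Unlike nested case-control sampling, this is done once, at the outset, without reference to who later becomes a case.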
19 Contribution to pseudo likelihood for a case:
Weights: for i in stratum s
Alternative versions of the pseudo likelihood are available.
20 Simulation with one normal covariate
Stratify into two strata according to a binary surrogate that is available for everyone; 10% of individuals are surrogate positive.
The covariate is N(0,1) for surrogate negative individuals and N( , ) for surrogate positive individuals.
Baseline hazard and censoring are adjusted to give 10% failures and 20% censoring before the "closure of the study".
21 Simulation repeated 1000 times with 1000 individuals in the cohort:
- 100 individuals in the subcohort for case-cohort
- 100 controls (on average) for nested case-control
Efficiencies in % relative to full cohort:
Design                                 =2    =4    =4
                                       =1    =1    =2
Classical nested case-control          40    32    27
Classical case-cohort                  39    30    19
Counter-matched nested case-control    46    76    75
Stratified case-cohort                 51    71    72
22 Nested case-control or case-cohort?
Statistical inference:
- Nested case-control (NCC): the usual likelihood methods apply, and standard software may be used for the analysis
- Case-cohort (CC): likelihood methods are not valid, but statistical software may be "tricked" into doing the analysis
Statistical efficiency is about the same for the two designs.
Missing covariates:
- NCC: in a 1:1 design, a sampled risk set is lost if covariate information is missing for the control
- CC: missing covariates in the subcohort are less serious
23 Logistics for prospective studies:
- NCC: control sampling has to wait until cases occur
- CC: the subcohort can be selected at the outset
Time scale for analysis:
- NCC: must be decided before sampling of controls
- CC: need not be decided before sampling of the subcohort