Published by Hugo Carpenter. Modified over 9 years ago.
1
Using martingale residuals to assess goodness of fit for sampled risk set data
Ørnulf Borgan, Department of Mathematics, University of Oslo
Based on joint work with Bryan Langholz
2
Outline:
Example: Uranium miners cohort
Cohort model, data and martingale residuals
Risk set sampling
Martingale residuals and goodness-of-fit tests for sampled risk set data
Concluding remarks
3
Uranium miners cohort:
3347 uranium miners from the Colorado Plateau were included in the study cohort 1950-60
Followed up until the end of 1982
258 lung cancer deaths
Interested in the effect of radon and smoking exposure on the risk of lung cancer death
Exposure information is available for the full cohort. We will sample from the risk sets for illustration (e.g. Langholz & Goldstein, 1996)
4
Relative risk regression models
Hazard rate for individual i: relative risk × baseline hazard
Relative risk for individual i depends on covariates x_i1, x_i2, …, x_ip (possibly time-dependent)
Cox:
Excess relative risk:
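In standard notation, the two relative risk models named on this slide can be written as below. This is a sketch in assumed notation, since the slide's own formulas are not reproduced in the text: the symbols \alpha_i, \alpha_0, r and \beta are assumptions, and the product form of the excess relative risk model is one common choice rather than a form confirmed by the slide.

```latex
% Hazard rate = relative risk x baseline hazard
\alpha_i(t) = r(\beta, x_i(t)) \, \alpha_0(t)

% Cox model
r(\beta, x_i(t)) = \exp\{\beta_1 x_{i1}(t) + \cdots + \beta_p x_{ip}(t)\}

% Excess relative risk model (assumed product form)
r(\beta, x_i(t)) = \prod_{k=1}^{p} \{1 + \beta_k x_{ik}(t)\}
```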
5
Cohort data:
[Figure: individuals at risk plotted against study time; arrows are censored observations]
6
t_1 < t_2 < t_3 < … : times of failures
i_j : individual failing at t_j ("case")
Counting process for individual i:
Intensity process λ_i(t) is given by
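The two definitions on this slide have a standard form, sketched here in assumed notation: Y_i(t) is the at-risk indicator, \alpha_i(t) the hazard rate of individual i, r the relative risk and \alpha_0 the baseline hazard. None of these symbols appear in the extracted text.

```latex
% Counting process: number of observed failures for individual i in [0, t]
N_i(t) = \sum_{j \ge 1} I\{t_j \le t,\; i_j = i\}

% Intensity process: at-risk indicator times hazard rate
\lambda_i(t) = Y_i(t)\, \alpha_i(t) = Y_i(t)\, r(\beta, x_i(t))\, \alpha_0(t)
```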
7
Cumulative intensity processes:
Martingales:
Martingale residual processes:
(formula labels: at risk indicator, hazard rate)
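As a rough illustration of the three quantities listed on this slide, the sketch below computes cohort martingale residuals at the end of follow-up, that is, the observed failure count minus the estimated cumulative intensity for each individual. It assumes a Cox model with one fixed covariate, a coefficient `beta` supplied by the caller (not estimated here), and a Breslow-type estimate of the baseline hazard. The function name and the toy data are hypothetical.

```python
import math


def martingale_residuals(times, events, x, beta):
    """Residuals N_i - Lambda_hat_i at end of follow-up (assumed Cox model).

    times[i]  : exit time of individual i (failure or censoring)
    events[i] : 1 if individual i failed at times[i], 0 if censored
    x[i]      : single fixed covariate (illustrative assumption)
    beta      : regression coefficient, taken as given
    """
    n = len(times)
    risk = [math.exp(beta * xi) for xi in x]
    failure_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum_intensity = [0.0] * n
    for t in failure_times:
        at_risk = [i for i in range(n) if times[i] >= t]
        n_failures = sum(1 for i in range(n) if times[i] == t and events[i] == 1)
        # Breslow-type increment of the cumulative baseline hazard at t
        increment = n_failures / sum(risk[i] for i in at_risk)
        for i in at_risk:
            cum_intensity[i] += risk[i] * increment
    return [events[i] - cum_intensity[i] for i in range(n)]


# Toy data, invented for illustration only
residuals = martingale_residuals(
    times=[2.0, 3.0, 5.0, 6.0, 8.0],
    events=[1, 0, 1, 0, 1],
    x=[1.2, 0.3, 0.8, 0.1, 0.5],
    beta=0.7,
)
```

With this estimator the residuals sum to zero over the cohort, which is a convenient sanity check.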
8
Martingale residual processes may be used to assess goodness of fit:
Plot individual martingale residuals versus covariates (Therneau, Grambsch & Fleming, 1990)
Plot grouped martingale residual processes versus time (Aalen, 1993; Grønnesby & Borgan, 1996)
The latter may be extended to sampled risk set data
9
Risk set sampling
Cohort studies need information on covariates for all individuals at risk
Expensive to collect and check (!) this information for all individuals in large cohorts
For risk set sampling designs one only needs to collect covariate information for the cases and a few controls sampled at the failure times
10
Select m - 1 controls among the n(t) - 1 non-failures at risk if a case occurs at time t, i.e. match on study time
Illustration for m = 2 (figure legend: case, control)
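The selection rule above can be sketched in a few lines. This is a minimal illustration that assumes no tied failure times and simple random sampling without replacement; the function name, the seed argument and the toy data are hypothetical.

```python
import random


def sample_risk_sets(times, events, m, seed=0):
    """For each case, draw m - 1 controls from the other individuals at risk.

    times[i]  : exit time of individual i
    events[i] : 1 if individual i failed at times[i], 0 if censored
    Returns a list of (failure_time, case, controls), ordered by failure time.
    """
    rng = random.Random(seed)
    sampled = []
    for case, (t, d) in enumerate(zip(times, events)):
        if d != 1:
            continue
        # Non-failures at risk at time t: still under observation, not the case
        candidates = [i for i in range(len(times)) if times[i] >= t and i != case]
        # If fewer than m - 1 candidates remain, take all of them
        k = min(m - 1, len(candidates))
        controls = sorted(rng.sample(candidates, k))
        sampled.append((t, case, controls))
    return sorted(sampled, key=lambda s: s[0])


# Toy data, invented for illustration only
toy_times = [5, 3, 8, 6, 2, 9]
toy_events = [1, 0, 1, 0, 1, 0]
sets_m2 = sample_risk_sets(toy_times, toy_events, m=2)
```

Because controls are drawn from everyone still at risk, an individual may serve as a control for an earlier case and later become a case.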
11
A sampling design for the controls is described by its sampling distribution
If individual i fails at time t, the probability of selecting the set r as the sampled risk set is
(we assume that r is a subset of the risk set, that r is of size m and that i is in r)
A sampled risk set consists of the case i_j and its controls
A number of sampling designs are available
The classical nested case-control design:
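For the classical nested case-control design, where the m - 1 controls are drawn by simple random sampling without replacement from the n(t) - 1 non-failures at risk, the sampling distribution takes the following form. The symbol \pi(r \mid t, i) is assumed notation for the probability described above.

```latex
\pi(r \mid t, i) = \binom{n(t) - 1}{m - 1}^{-1}
\qquad \text{for } r \text{ a subset of the risk set with } |r| = m \text{ and } i \in r
```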
12
Inference on the regression coefficients can be based on the partial likelihood
The partial likelihood enjoys the usual likelihood properties (Borgan, Goldstein & Langholz, 1995)
For the classical nested case-control design, the partial likelihood simplifies
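A standard way to write the partial likelihood for sampled risk set data is sketched below. The notation is assumed rather than taken from the slide: \tilde R_j is the sampled risk set at t_j, r is the relative risk function, and \pi(r \mid t, i) is the sampling distribution of the design.

```latex
% General risk set sampling design
L(\beta) = \prod_{t_j}
  \frac{r(\beta, x_{i_j}(t_j)) \, \pi(\tilde R_j \mid t_j, i_j)}
       {\sum_{l \in \tilde R_j} r(\beta, x_l(t_j)) \, \pi(\tilde R_j \mid t_j, l)}

% Classical nested case-control: \pi does not depend on which member is the case, so it cancels
L(\beta) = \prod_{t_j}
  \frac{r(\beta, x_{i_j}(t_j))}
       {\sum_{l \in \tilde R_j} r(\beta, x_l(t_j))}
```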
13
Martingale residuals and goodness-of-fit tests for sampled risk set data
Introduce the counting processes
Intensity processes take the form:
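One standard formulation of these processes, given here in assumed notation, counts the joint occurrence of individual i failing and the set r being selected as the sampled risk set. Here \tilde R_j is the sampled risk set at t_j, Y_i the at-risk indicator, r(\beta, \cdot) the relative risk, \alpha_0 the baseline hazard and \pi the sampling distribution.

```latex
% Counting process for the pair (i, r)
N_{(i,r)}(t) = \sum_{j \ge 1} I\{t_j \le t,\; (i_j, \tilde R_j) = (i, r)\}

% Intensity process: cohort intensity times the probability of sampling r
\lambda_{(i,r)}(t) = Y_i(t)\, r(\beta, x_i(t))\, \alpha_0(t)\, \pi(r \mid t, i)
```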
14
Corresponding martingales:
Martingale residual processes:
The martingale residual processes are of little practical use on their own, but they may be aggregated over groups of individuals to produce useful plots
15
For group g:
May be interpreted as "observed - expected" number of failures in group g
Asymptotic distribution may be derived using counting process methods
Simplifies for classical nested case-control
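The "observed - expected" interpretation can be made concrete with a short sketch for the classical nested case-control design. It assumes that at each failure time the expected contribution of group g is the share of the sampled risk set's total estimated relative risk that falls in g. The function name, the input layout and the toy numbers are hypothetical.

```python
def grouped_residual_processes(sampled_sets, groups):
    """Cumulative observed minus expected failures per group.

    sampled_sets : list of (failure_time, members); members is a list of
                   (relative_risk, group) pairs with the case listed first
    groups       : group labels to track
    Returns {group: [(failure_time, cumulative residual), ...]}.
    """
    totals = {g: 0.0 for g in groups}
    paths = {g: [] for g in groups}
    for t, members in sorted(sampled_sets, key=lambda s: s[0]):
        denom = sum(rr for rr, _ in members)
        case_group = members[0][1]
        for g in groups:
            observed = 1.0 if case_group == g else 0.0
            # Expected share of the failure attributed to group g
            expected = sum(rr for rr, grp in members if grp == g) / denom
            totals[g] += observed - expected
            paths[g].append((t, totals[g]))
    return paths


# Toy sampled risk sets, invented for illustration only
toy_sets = [
    (1.0, [(2.0, "I"), (1.0, "II")]),
    (2.0, [(1.0, "II"), (1.0, "II")]),
]
toy_paths = grouped_residual_processes(toy_sets, groups=["I", "II"])
```

At every failure time the increments sum to zero across groups, since one failure is observed and one is expected in total.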
16
Illustration: uranium miners cohort
Fit excess relative risk model:
x_i1 = cumulative radon (100 WLMs)
x_i2 = cumulative smoking (1000 packs)
For classical nested case-control with three controls per case:
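To show what fitting such a model involves, here is a minimal sketch of the log partial likelihood for classical nested case-control data under a two-covariate excess relative risk model. The product form (1 + b1*x1)(1 + b2*x2) is an assumption, as are the function names and the toy data. The slide's fitted estimates are not reproduced in the text and are not guessed here.

```python
import math


def err_relative_risk(beta, x):
    """Assumed excess relative risk form: (1 + b1*x1) * (1 + b2*x2)."""
    return (1.0 + beta[0] * x[0]) * (1.0 + beta[1] * x[1])


def log_partial_likelihood(beta, sampled_sets):
    """Log partial likelihood for classical nested case-control data.

    sampled_sets : list of sampled risk sets; each is a list of covariate
                   pairs (x1, x2) with the case listed first
    """
    total = 0.0
    for members in sampled_sets:
        risks = [err_relative_risk(beta, x) for x in members]
        # Case's relative risk over the sum within its sampled risk set
        total += math.log(risks[0]) - math.log(sum(risks))
    return total


# One toy set with a case and three controls, invented for illustration only
toy_set = [[(10.0, 5.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]]
ll_null = log_partial_likelihood((0.0, 0.0), toy_set)
ll_alt = log_partial_likelihood((0.1, 0.1), toy_set)
```

In practice the coefficients would be found by maximizing this function numerically, subject to every factor 1 + b_k*x_k staying positive.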
17
Aggregate martingale residual processes in three groups according to cumulative radon exposure:
Groups: I: […] 1500 WLMs
There are indications of an interaction between cumulative radon exposure and age
18
Observed and expected number of failures in the groups for ages below and above 60 years:

Age and group                   Observed   Expected
Below 60 years & group I           30        30.7
Below 60 years & group II          39        45.9
Below 60 years & group III         81        73.4
Above 60 years & group I           27        27.7
Above 60 years & group II          45        36.1
Above 60 years & group III         36        44.2

Chi-squared statistic with 2(3 - 1) = 4 df takes the value 10.5 (P-value 3.2%)
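Two quick arithmetic checks on these numbers are sketched below. Both columns should total the 258 lung cancer deaths in the cohort, and the tail probability of a chi-squared variable with 4 df has the closed form exp(-x/2)(1 + x/2). The test statistic itself is not recomputed, because it depends on a covariance estimate that is not shown on the slide. The variable and function names are hypothetical.

```python
import math

# Values transcribed from the table (groups I, II, III; below then above 60 years)
observed = [30, 39, 81, 27, 45, 36]
expected = [30.7, 45.9, 73.4, 27.7, 36.1, 44.2]

total_observed = sum(observed)
total_expected = sum(expected)


def chi2_sf_4df(x):
    """Upper tail probability of a chi-squared variable with 4 df."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


p_value = chi2_sf_4df(10.5)
print(total_observed, round(total_expected, 1), round(p_value, 4))
```

For the rounded statistic 10.5 this gives a P-value of about 3.3%, in line with the reported 3.2%, which was presumably computed from the unrounded statistic.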
19
Concluding remarks
The counting process formulation of nested case-control studies:
Introduces a time aspect that is usually disregarded for sampled risk set data
Gives a model formulation similar to that for cohort data and thereby opens up for similar methodological developments as for cohort studies
Grouped martingale residual processes are one example of this. They make it possible to check for time-dependent effects and other deviations from the model
20
Questions and further developments of grouped martingale residual plots and related goodness-of-fit methods:
How should the grouping be performed?
How do specific deviations from the model turn up in the plots?
Kolmogorov-Smirnov and Cramér-von Mises type tests? (Durbin's approximation, Lin et al.'s simulation trick)