Published by Gervais Fields. Modified over 9 years ago.
1
TWO-STAGE CASE-CONTROL STUDIES USING EXPOSURE ESTIMATES FROM A GEOGRAPHICAL INFORMATION SYSTEM
Jonas Björk (1) & Ulf Strömberg (2)
(1) Competence Center for Clinical Research
(2) Occupational and Environmental Medicine
Lund University Hospital
2
OUTLINE OF TALK
Previous project: What have we done? (Jonas Björk)
Ongoing project: What shall we do? (Ulf Strömberg)
3
Two-stage procedure for case-control studies
1st stage: complete data obtained from registries
- Disease status
- General characteristics
- Group affiliation (e.g. occupation or residential area)
- Group-level exposure X_G
2nd stage: individual exposure data for a subset of the 1st stage sample
4
Exposure database: group-level exposure
JEM = Job Exposure Matrix: proportion exposed in each occupational group
GIS: average concentration of an air pollutant in each residential group (area)
5
JEM - proportion exposed
Most data typically in groups with low X_G
6
Linear relation between proportion exposed and relative risk
Assumption: no confounding between or within groups
Example: RR (exposed vs. unexposed) = 2.0

Proportion exposed X_G    Average RR
0%                        1.0
10%                       0.10 × 2.0 + 0.90 × 1.0 = 1.1
50%                       1.5
100%                      2.0
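The table values can be reproduced with a short sketch. The function name `average_relative_risk` is introduced here for illustration only; it assumes, as on the slide, that unexposed subjects have RR = 1.0 and that there is no confounding.

```python
def average_relative_risk(proportion_exposed, rr_exposed=2.0):
    """Group-average RR as a weighted mean of exposed and unexposed subjects.

    Assumes RR = 1.0 for the unexposed and no confounding between or within groups.
    """
    return proportion_exposed * rr_exposed + (1.0 - proportion_exposed) * 1.0


# Reproduce the rows of the example table (RR for exposed vs. unexposed = 2.0).
for p in (0.0, 0.10, 0.50, 1.0):
    print(f"{p:.0%}: {average_relative_risk(p):.1f}")
```

The average RR is linear in the proportion exposed, which is what motivates a linear model in X_G.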
7
Linear OR model: OR(X_G) = 1 + β X_G
X_G = exposure proportion
OR for exposed vs. unexposed = OR(1) = 1 + β
[Figure: OR plotted against X_G from 0 to 1, with the OR axis marked at 1 and OR(1)]
Most data typically in groups with low X_G
8
Confounding between groups
General confounders (e.g. gender and age) can normally be adjusted for.
Assuming no confounding within groups and no effect modification in any stratum s_k:
OR(X_G; s_1, s_2, ..., s_k) = (1 + β X_G) exp(Σ_k γ_k s_k)
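As a minimal sketch, the model above can be evaluated as follows. The function name `odds_ratio` and its argument names are hypothetical; the coefficients β and γ_k would in practice be estimated from data, not supplied by hand.

```python
import math


def odds_ratio(x_g, beta, gammas=(), strata=()):
    """Evaluate OR(X_G; s_1..s_k) = (1 + beta * X_G) * exp(sum_k gamma_k * s_k).

    x_g    : group-level exposure proportion (0 to 1)
    beta   : excess OR for exposed vs. unexposed, so OR(1) = 1 + beta
    gammas : confounder coefficients gamma_k (illustrative inputs)
    strata : confounder values s_k, same length as gammas
    """
    if len(gammas) != len(strata):
        raise ValueError("gammas and strata must have the same length")
    linear_part = 1.0 + beta * x_g
    confounder_part = math.exp(sum(g * s for g, s in zip(gammas, strata)))
    return linear_part * confounder_part
```

With no confounders the second factor equals 1, and the expression reduces to the linear OR model OR(X_G) = 1 + β X_G.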
9
Combining 1st and 2nd stage data
Assumption: 2nd stage data are missing at random (MAR), conditional on disease status and 1st stage group affiliation
For subjects with missing 2nd stage data: use 1st stage data to calculate the expected number of exposed/unexposed
Expectation-maximization (EM) algorithm
10
EM algorithm (Wacholder & Weinberg 1994)
1. Select a starting value, e.g. OR = 1
2. E-step: among the non-participants, calculate the expected number of exposed/unexposed cases and controls in each group
3. M-step: maximize the likelihood for the observed + expected cell frequencies using the chosen risk model for individual-level data (not necessarily linear), giving a new OR estimate
4. Repeat steps 2 and 3 until convergence
11
E-step in our situation (Strömberg & Björk, submitted)
Complete the data in each group G:
m_0 controls with missing 2nd stage data: expected number of exposed = m_0 × X_G
m_1 cases with missing 2nd stage data: expected number of exposed = m_1 × X_G × ÔR / [1 + (ÔR − 1) × X_G]
ÔR = current OR estimate
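The E-step formulas and the surrounding EM iteration can be sketched as below. This is an illustration under simplifying assumptions, not the authors' implementation: there are no confounders, exposure is dichotomous, and the M-step is taken to be the cross-product odds ratio of the completed 2×2 table pooled over groups. The function names and the dictionary keys (`m0`, `m1`, `x_g`, `cases_exposed`, and so on) are hypothetical.

```python
def e_step(m0, m1, x_g, or_hat):
    """Expected exposed/unexposed counts among subjects lacking 2nd stage data.

    m0, m1 : controls and cases in the group with missing 2nd stage data
    x_g    : group-level proportion exposed
    or_hat : current OR estimate
    Returns (cases exposed, cases unexposed, controls exposed, controls unexposed).
    """
    controls_exposed = m0 * x_g
    cases_exposed = m1 * x_g * or_hat / (1.0 + (or_hat - 1.0) * x_g)
    return cases_exposed, m1 - cases_exposed, controls_exposed, m0 - controls_exposed


def em_odds_ratio(groups, or_start=1.0, tol=1e-8, max_iter=500):
    """Iterate E- and M-steps until the OR estimate stabilizes.

    Assumption: the M-step is the cross-product ratio of the pooled,
    completed 2x2 table (no confounders, dichotomous exposure).
    """
    or_hat = or_start
    for _ in range(max_iter):
        a = b = c = d = 0.0  # exposed cases, unexposed cases, exposed controls, unexposed controls
        for g in groups:
            ea, eb, ec, ed = e_step(g["m0"], g["m1"], g["x_g"], or_hat)
            a += g["cases_exposed"] + ea
            b += g["cases_unexposed"] + eb
            c += g["controls_exposed"] + ec
            d += g["controls_unexposed"] + ed
        or_new = (a * d) / (b * c)
        if abs(or_new - or_hat) < tol:
            return or_new
        or_hat = or_new
    return or_hat
```

Each entry of `groups` holds the observed 2nd stage counts for one group together with the numbers of non-participants and the group-level exposure proportion. A risk model with confounders would need a regression fit in the M-step instead of the closed-form ratio used here.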
12
Simulated case-control studies
400 cases and 1200 controls in the 1st stage
2nd stage participation: 75% of the cases, 25% of the controls
Selective participation of 2nd stage controls: Corr(Participation, X_G) = 0, > 0, < 0
1000 replications in each scenario
True OR = 3
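One replication of such a study might be generated along the following lines. The slides do not specify the number of groups, the distribution of X_G, or the mechanism for selective participation, so everything beyond the sample sizes, participation rates, and true OR is an assumption: groups are taken to be equally common among controls, and participation is non-selective (the Corr = 0 scenario). The function name `simulate_study` and the record keys are hypothetical.

```python
import random


def simulate_study(x_groups, true_or=3.0, n_cases=400, n_controls=1200,
                   part_case=0.75, part_control=0.25, seed=0):
    """Simulate one two-stage case-control sample (Corr(Participation, X_G) = 0).

    x_groups : assumed list of group-level exposure proportions, one per group
    Returns a list of subject records (dicts).
    """
    rng = random.Random(seed)
    # Relative to controls, cases come from group G with weight 1 + (OR - 1) * X_G.
    case_weights = [1.0 + (true_or - 1.0) * x for x in x_groups]
    subjects = []
    for _ in range(n_cases):
        x_g = rng.choices(x_groups, weights=case_weights)[0]
        p_exposed = x_g * true_or / (1.0 + (true_or - 1.0) * x_g)
        subjects.append({"case": 1, "x_g": x_g,
                         "exposed": int(rng.random() < p_exposed),
                         "participates": rng.random() < part_case})
    for _ in range(n_controls):
        x_g = rng.choice(x_groups)  # assumption: groups equally common among controls
        subjects.append({"case": 0, "x_g": x_g,
                         "exposed": int(rng.random() < x_g),
                         "participates": rng.random() < part_control})
    return subjects
```

Individual exposure would be treated as observed only for subjects with `participates` set to true. The selective-participation scenarios would additionally require a control participation probability that depends on X_G.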
13
Simulations - Results

Participation           1st stage data only       2nd stage data only       EM method
                        (400 + 1200)              (300 + 300)               (400 + 1200)
                        OR    SD    Coverage      OR    SD    Coverage      OR    SD    Coverage
Corr(Part., X_G) = 0    3.0   0.18  95.0%         3.0   0.23  95.6%         3.0   0.15  95.5%
Corr(Part., X_G) < 0    3.0   0.18  95.0%         5.3   0.29  45.8%         3.0   0.15  95.0%
Corr(Part., X_G) > 0    3.0   0.18  95.0%         1.8   0.20  32.9%         3.0   0.15  95.5%

SD = empirical standard deviation of the ln(OR) estimates
Coverage = coverage of 95% confidence intervals
14
Simulations - Conclusions
Combining 1st and 2nd stage data using the EM method can:
1. Improve precision
2. Remove bias from selective participation
The method is sensitive to errors in the (1st stage) external exposure data!
15
Simulations – Conclusions II
The EM method is sensitive to:
1. Violations of the MAR assumption (conditional on disease status and 1st stage group affiliation)
2. Errors in the (1st stage) external exposure data
16
Ongoing methodological research project
Focus on exposure estimates from a GIS
17
GIS data: NO2 (Scania)
18
Two-stage exposure assessment procedure
[Figure: groups with X_G = 4.8, X_G = 10.1, X_G = 20.1, ..., containing individuals with exposure x_i]
1st stage: X_G represents mean exposure levels rather than proportion exposed
2nd stage: x_i is a continuous, rather than a dichotomous, exposure variable
19
Assume a linear relation between x_i and disease odds (cf. radon exposure and lung cancer [Weinberg et al., 1996]).
[Figure: disease odds plotted against x_i]
For the "only 1st stage" subjects: no bias is expected from using their X_G values (Berkson errors), provided MAR in each group, independent of disease status.
EM method? Exposure variation in each group?
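One way to see why group means introduce no bias under a linear odds model is the following sketch. The baseline odds α and slope β are symbols introduced here for illustration, and X_G is taken to be the mean of x_i within group G.

```latex
\begin{align*}
\text{odds}(x_i) &= \alpha\,(1 + \beta x_i) \\
\mathrm{E}\left[\text{odds}(x_i) \mid G\right]
  &= \alpha\,\bigl(1 + \beta\,\mathrm{E}[x_i \mid G]\bigr)
   = \alpha\,(1 + \beta X_G)
\end{align*}
```

Because the model is linear in x_i, averaging the odds over the individuals in a group gives the same value as evaluating the model at the group mean. This would not hold in general for a nonlinear (e.g. log-linear) model.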
20
Two-stage exposure assessment procedure – related work
Multilevel studies with applications to a study of air pollution [Navidi et al., 1994]: pooling exposure effect estimates based on individual-level and group-level models, respectively
21
Collecting data on confounders or effect modifiers at 2nd stage
[Figure: groups with X_G = 4.8, X_G = 10.1, X_G = 20.1, ..., containing individuals with covariate c_i]
1st stage: X_G = mean exposure levels
2nd stage: c_i is a covariate, e.g. smoking history
22
Data on confounders or effect modifiers at 2nd stage – estimation of exposure effect
Confounder adjustment based on logistic regression: pseudo-likelihood approach [Cain & Breslow, 1988]
More general approach: EM method [Wacholder & Weinberg, 1994]
23
Design stage (“stage 0”)
Group 1, Group 2, Group 3, ... Subjects?
1st stage: How many geographical areas (groups)?
2nd stage: Fractions of the 1st stage cases and controls?
24
Design stage – related work
Two-stage exposure assessment: power depends more strongly on the number of groups than on the number of subjects per group [Navidi et al., 1994]
25
References I
Björk & Strömberg. Int J Epidemiol 2002;31:154-60.
Strömberg & Björk. “Incorporating group-level exposure information in case-control studies with missing data on dichotomous exposures”. Submitted.
26
References II
Cain & Breslow. Am J Epidemiol 1988;128:1198-1206.
Navidi et al. Environ Health Perspect 1994;102(Suppl 8):25-32.
Wacholder & Weinberg. Biometrics 1994;50:350-7.
Weinberg et al. Epidemiology 1996;7:190-7.