Presentation is loading. Please wait.

Presentation is loading. Please wait.

Analysis of time-stratified case-crossover studies in environmental epidemiology using Stata Aurelio Tobías Spanish Council for Scientific Research (CSIC),

Similar presentations


Presentation on theme: "Analysis of time-stratified case-crossover studies in environmental epidemiology using Stata Aurelio Tobías Spanish Council for Scientific Research (CSIC),"— Presentation transcript:

1 Analysis of time-stratified case-crossover studies in environmental epidemiology using Stata
Aurelio Tobías Spanish Council for Scientific Research (CSIC), Barcelona, Spain Ben Armstrong and Antonio Gasparrini London School of Hygiene and Tropical Medicine, UK 2014 UK Stata Users Group meeting London, 12th September 2014

2 Background The time stratified case crossover design is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes

3 The case-crossover design
Proposed by Maclure (1991) to study transient effects on the risk of acute health events Have you made ​​any unusual activity … Health event … compared to your usual routine? Only cases are sampled assessing changes in exposure

4 The case-crossover design
Proposed by Maclure (1991) to study transient effects on the risk of acute health events Exposed? Exposed? Health event Control period Risk period Only cases are sampled assessing changes in exposure

5 The case-crossover design
Proposed by Maclure (1991) to study transient effects on the risk of acute health events Analysis likewise a matched case-control Exposed? Exposed? Health event Control period Risk period Only cases are sampled assessing changes in exposure Control Exp. No Risk period a c b d OR = b/c

6 Application to environmental studies
Firstly adapted to air pollution studies Philadelphia (Neas et al. 1999) and Barcelona (Sunyer et al. 2000) comparing exposure levels for a given day (t) when health event occurs vs. levels before (t-7) and after (t+7) the health event Allows to control for time-trend and seasonality by design, since compares exposure levels between same weekdays within each month of each year

7 Time-stratified case-crossover
There is a large industry on how to choose the referent window (Wt). It has been shown that this method of analysis sometimes can produce ‘overlap bias’. Matched case-control studies require the distribution in each matched set to be excangeable. This exchangeability condition may not be met since samplig the reference window (Wt) determines the exposure. (Figueiras et al. 2010)

8 Option 1: conditional logistic regression
Dataset needs to be reshaped from time-series format to individual matched case-control by using the new command, . mkcco vary, date(vardate) This also creates three new variables, case case days (=1) wn weight observations by n events on case day ccset set num. to match case-control days Then, use Stata’s clogit command

9 . use madrid, clear . list date y pm10 temp in 1/31, noobs clean date y pm10 temp 01jan 02jan 03jan 04jan 05jan 06jan 07jan 08jan 09jan 10jan 11jan 12jan 13jan 14jan 15jan 16jan 17jan 18jan 19jan 20jan 21jan 22jan 23jan 24jan 25jan 26jan 27jan 28jan 29jan 30jan 31jan

10 Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
. generate dow = dow(date) . list date y pm10 temp in 1/31 if dow==1, noobs clean date y pm10 temp 06jan 13jan 20jan 27jan . mkcco y, date(date) Dataset reshaped from 1096 to 5480 obs. (1096 case days and 4384 control days). New variables case, wn and ccset have been added to the resaphed dataset. Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

11 Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
. list date y pm10 temp case ccset wn in 1/36 if dow==1, noobs clean date y pm10 temp case ccset wn 06jan 13jan 20jan 27jan ... Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

12 Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
... date y pm10 temp case ccset wn 06jan 13jan 20jan 27jan Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

13 Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
... date y pm10 temp case ccset wn 06jan 13jan 20jan 27jan Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

14 Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
... date y pm10 temp case ccset wn 06jan 13jan 20jan 27jan Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

15 . clogit case pm10 [w=wn], group(ccset)
(frequency weights assumed) ... Conditional (fixed-effects) logistic regression Number of obs = LR chi2(1) = Prob > chi2 = Log likelihood = Pseudo R = case | Coef. Std. Err z P>|z| [95% Conf. Interval] pm10 |

16 Option 1: conditional logistic regression
Advantage Similar approach to a matched case-control study, being familiar to the epidemiologist Limitations Unfriendly data management Large (huge) data-sets at individual level Slow convergence when fitting interaction terms for individual factors (sex, age) Cannot account for overdispersion and residual autocorrelation

17 Option 2: Poisson regression
Lu and Zeger (2007) “ … the time-stratified case-crossover design leads to the same estimate as obtained from a Poisson regression with dummy variables indicating the strata” Strata defined by the 3-way interaction between year, month and day of week Either poisson or glm (jointly with the scale option) Stata’s commands can be used

18 . use madrid, clear . generate yy = year(date) . generate mm = month(date) . generate dow = dow(date) . poisson y pm10 i.yy##i.mm#i.dow ... Poisson regression Number of obs = LR chi2(252) = Prob > chi2 = Log likelihood = Pseudo R = y | Coef. Std. Err z P>|z| [95% Conf. Interval] pm10 | | parameters! _cons |

19 . glm y pm10 i.yy##i.mm#i.dow, family(poisson) link(log) scale(x2)
... Generalized linear models No. of obs = Optimization : ML Residual df = Scale parameter = Deviance = (1/df) Deviance = Pearson = (1/df) Pearson = Variance function: V(u) = u [Poisson] Link function : g(u) = ln(u) [Log] AIC = Log likelihood = BIC = | OIM y | Coef. Std. Err z P>|z| [95% Conf. Interval] pm10 | | _cons | (Standard errors scaled using square root of Pearson X2-based dispersion.)

20 Option 2: Poisson regression
Advantages Avoids reshape of dataset keeping original time-series format Can account for overdispersion and residual autocorrelation Limitations Large number of interaction parameters, slowing the convergence time (even making difficult it with missing data) Even slower when fitting interaction terms for individual factors (sex, age)

21 Option 3: Conditional Poisson regression
Armstrong and Gasparrini (ISEE 2011) “ … Poisson models with large numbers of indicator variables can alternatively be fit with conditional Poisson models, conditioning on numbers of events in the time stratum” Firstly proposed by Farrington (1995) for the self-controlled case-series design for vaccine safety studies Strata defined by grouping same weekday within each month of each year, and then use Stata’s xtpoisson command jointly with the fe option But overdispersion needs to accounted for with the new xtpois_ovdp command

22 . egen set = group(yy mm dow)
. xtset set date ... . xtpoisson y pm10, fe Conditional fixed-effects Poisson regression Number of obs = Group variable: set Number of groups = Obs per group: min = avg = max = Wald chi2(1) = Log likelihood = Prob > chi = y | Coef. Std. Err z P>|z| [95% Conf. Interval] pm10 | . xtpois_ovdp Estimate and standard errors corrected for over-dipersion df: 843 ; pearson x2: ; dispersion: y | Coef. Std. Err z P>|z| [95% Conf. Interval] pm10 |

23 CPU time My example (3 years data, 60 events/day) clogit 0.181 sec.
poisson sec. xtpoisson sec.

24 Option 3: Conditional Poisson regression
Advantages Avoids reshape of dataset keeping original time-series format Can account for overdispersion and residual autocorrelation Easy fit of interaction terms for individual factors (sex, age) with faster convergence Limitations ?

25 Summary Case-crossover studies can easily be analysed using Stata
Conditional logistic regression requires unfriendly data management, and large (huge) data-sets at individual level Poisson, 3-way interaction, regression requires to fit large number of interaction parameters, making it difficult convergence Conditional Poisson seems to solve both

26 Thanks for your attention!

27 Thanks for your attention!


Download ppt "Analysis of time-stratified case-crossover studies in environmental epidemiology using Stata Aurelio Tobías Spanish Council for Scientific Research (CSIC),"

Similar presentations


Ads by Google