Analysis of time-stratified case-crossover studies in environmental epidemiology using Stata Aurelio Tobías Spanish Council for Scientific Research (CSIC),


Analysis of time-stratified case-crossover studies in environmental epidemiology using Stata Aurelio Tobías Spanish Council for Scientific Research (CSIC), Barcelona, Spain Ben Armstrong and Antonio Gasparrini London School of Hygiene and Tropical Medicine, UK 2014 UK Stata Users Group meeting London, 12th September 2014

Background
The time-stratified case-crossover design is a popular alternative to conventional time-series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes.

The case-crossover design
Proposed by Maclure (1991) to study transient effects on the risk of acute health events.
"Have you done any unusual activity … compared to your usual routine?" [Timeline diagram marking the health event]
Only cases are sampled, assessing changes in exposure.


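The case-crossover design
Proposed by Maclure (1991) to study transient effects on the risk of acute health events.
[Timeline diagram: "Exposed?" in the control period, "Exposed?" in the risk period, then the health event]
Only cases are sampled, assessing changes in exposure.
The analysis is the same as for a matched case-control study: a 2x2 table cross-classifies exposure (Exp./No) in the risk period against exposure in the control period, with cells a, b, c and d, where b and c are the discordant cells, and OR = b/c.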
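The matched-pair odds ratio on this slide can be illustrated with Stata's immediate command mcci. This is a minimal sketch with hypothetical counts (they do not come from the presentation), treating the risk period as the "case" and the control period as the "control", so that b counts cases exposed only in the risk period and c counts cases exposed only in the control period.

```stata
* Hypothetical 2x2 counts of cases, for illustration only:
*   a = exposed in both periods            (12)
*   b = exposed in the risk period only    (30)
*   c = exposed in the control period only (15)
*   d = exposed in neither period          (43)
mcci 12 30 15 43

* The matched odds ratio uses only the discordant cells: OR = b/c
scalar or_hat = r(or)
display "Matched OR = " or_hat
```

Only the discordant cells b and c contribute to the estimate; cases with the same exposure status in both periods carry no information about the transient effect.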

Application to environmental studies
First adapted to air pollution studies in Philadelphia (Neas et al. 1999) and Barcelona (Sunyer et al. 2000), comparing exposure levels on the day (t) when the health event occurs with levels before (t-7) and after (t+7) the health event.
This controls for time trend and seasonality by design, since it compares exposure levels between the same weekdays within each month of each year.

Time-stratified case-crossover
There is a large literature on how to choose the referent window (Wt).
It has been shown that this method of analysis can sometimes produce 'overlap bias'.
Matched case-control studies require the distribution in each matched set to be exchangeable. This exchangeability condition may not be met, since sampling the reference window (Wt) determines the exposure (Figueiras et al. 2010).

Option 1: conditional logistic regression
The dataset needs to be reshaped from time-series format to individual matched case-control format using the new command mkcco:
. mkcco vary, date(vardate)
This also creates three new variables:
case    case days (=1)
wn      weight for observations: the n events on the case day
ccset   set number to match case and control days
Then use Stata's clogit command.

. use madrid, clear
. list date y pm10 temp in 1/31, noobs clean

     date    y   pm10   temp
01jan2003   64     14    9.6
02jan2003   56     12     11
03jan2003   73     16    9.8
04jan2003   69     14    8.8
05jan2003   71     17    4.3
06jan2003   68      9      7
07jan2003   63     19      3
08jan2003   85     13    6.8
09jan2003   67     13    4.6
10jan2003   80     22    2.7
11jan2003   65     23      1
12jan2003   95     17    .85
13jan2003   60     45     .5
14jan2003   76     60    .45
15jan2003   77     72    1.3
16jan2003   75     54    2.6
17jan2003   76     65      2
18jan2003   75     43     .8
19jan2003   74     12    6.3
20jan2003   79     19    5.8
21jan2003   73     16      8
22jan2003   72     21    6.7
23jan2003   72     25      7
24jan2003   74     46    6.2
25jan2003   59     42    8.1
26jan2003   77     26     10
27jan2003   64     22     15
28jan2003   65     41    9.7
29jan2003   74     18    5.7
30jan2003   77     17    4.8
31jan2003   55     17    2.3

[Figure: monthly calendar, Sun to Sat, days 1 to 31]

. generate dow = dow(date)
. list date y pm10 temp in 1/31 if dow==1, noobs clean

     date    y   pm10   temp
06jan2003   68      9      7
13jan2003   60     45     .5
20jan2003   79     19    5.8
27jan2003   64     22     15

. mkcco y, date(date)
Dataset reshaped from 1096 to 5480 obs. (1096 case days and 4384 control days).
New variables case, wn and ccset have been added to the reshaped dataset.

[Figure: the same monthly calendar, repeated for each matched set]

. list date y pm10 temp case ccset wn in 1/36 if dow==1, noobs clean

     date    y   pm10   temp   case     ccset   wn
06jan2003   68      9      7      1   2003021   68
13jan2003   60     45     .5      0   2003021   68
20jan2003   79     19    5.8      0   2003021   68
27jan2003   64     22   14.8      0   2003021   68
...
06jan2003   68      9      7      0   2003022   60
13jan2003   60     45     .5      1   2003022   60
20jan2003   79     19    5.8      0   2003022   60
27jan2003   64     22   14.8      0   2003022   60
...
06jan2003   68      9      7      0   2003023   79
13jan2003   60     45     .5      0   2003023   79
20jan2003   79     19    5.8      1   2003023   79
27jan2003   64     22   14.8      0   2003023   79
...
06jan2003   68      9      7      0   2003024   64
13jan2003   60     45     .5      0   2003024   64
20jan2003   79     19    5.8      0   2003024   64
27jan2003   64     22   14.8      1   2003024   64
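The matched sets listed above group days that share the same weekday within the same month and year. The sketch below shows how such strata could be built by hand on a bare three-year daily calendar. It is not the mkcco implementation, and the variable names stratum and ndays are illustrative assumptions.

```stata
* Build an empty daily calendar for 2003 to 2005 (1096 days), no health data
clear
set obs 1096
generate date = mdy(1, 1, 2003) + _n - 1
format date %td

* Components of the time stratum
generate yy  = year(date)
generate mm  = month(date)
generate dow = dow(date)          // 0 = Sunday, ..., 6 = Saturday

* One stratum per year x month x weekday combination
egen stratum = group(yy mm dow)

* Number of days in each stratum (a month holds 4 or 5 of any weekday)
bysort stratum: generate ndays = _N
summarize stratum ndays
```

Within a stratum, each day serves in turn as the case day and the remaining days as its referents, which is the pattern visible in the case column of the listing.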

. clogit case pm10 [w=wn], group(ccset)
(frequency weights assumed)
...
Conditional (fixed-effects) logistic regression   Number of obs   =     295025
                                                  LR chi2(1)      =       8.39
                                                  Prob > chi2     =     0.0038
Log likelihood = -98913.41                        Pseudo R2       =     0.0000
------------------------------------------------------------------------------
        case |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408
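The pm10 coefficient is on the log scale per one-unit change in pm10, so it is often easier to read as a percent change in risk for a larger increment. The sketch below converts the reported estimate and confidence limits to a percent change per 10-unit increase in pm10. The 10-unit increment is an assumption chosen for illustration, and the units are those recorded in the dataset.

```stata
* Coefficient and 95% confidence limits copied from the clogit output
scalar b_pm10 = .0007406
scalar ci_lo  = .0002404
scalar ci_hi  = .0012408

* Percent change in risk per 10-unit increase in pm10: 100*(exp(10*b) - 1)
scalar pct    = 100 * (exp(10 * b_pm10) - 1)
scalar pct_lo = 100 * (exp(10 * ci_lo)  - 1)
scalar pct_hi = 100 * (exp(10 * ci_hi)  - 1)

display "Percent change per 10 units: " %5.2f pct ///
    " (95% CI " %5.2f pct_lo " to " %5.2f pct_hi ")"
```

With the reported estimate this works out to an increase of roughly 0.74% per 10 units of pm10.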

Option 1: conditional logistic regression
Advantage: the approach is similar to that of a matched case-control study, so it is familiar to epidemiologists.
Limitations: unfriendly data management; large (huge) datasets at the individual level; slow convergence when fitting interaction terms for individual factors (sex, age); cannot account for overdispersion or residual autocorrelation.

Option 2: Poisson regression
Lu and Zeger (2007): " … the time-stratified case-crossover design leads to the same estimate as obtained from a Poisson regression with dummy variables indicating the strata"
Strata are defined by the 3-way interaction between year, month and day of week.
Either of Stata's poisson or glm commands (the latter with the scale option) can be used.

. use madrid, clear
. generate yy = year(date)
. generate mm = month(date)
. generate dow = dow(date)
. poisson y pm10 i.yy##i.mm#i.dow
...
Poisson regression                                Number of obs   =       1096
                                                  LR chi2(252)    =    1374.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -3787.589                        Pseudo R2       =     0.1535
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408
             |   ... 252 parameters!
       _cons |    4.35927   .0563535    77.36   0.000     4.248819    4.469721

. glm y pm10 i.yy##i.mm#i.dow, family(poisson) link(log) scale(x2)
...
Generalized linear models                         No. of obs      =       1096
Optimization     : ML                             Residual df     =        843
                                                  Scale parameter =          1
Deviance         =  1069.73306                    (1/df) Deviance =    1.26896
Pearson          =  1065.616046                   (1/df) Pearson  =   1.264076
Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]
                                                  AIC             =   7.373338
Log likelihood   = -3787.588997                   BIC             =   -4830.78
------------------------------------------------------------------------------
             |                 OIM
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002869     2.58   0.010     .0001782     .001303
             |
       _cons |    4.35927   .0633589    68.80   0.000     4.235088    4.483451
(Standard errors scaled using square root of Pearson X2-based dispersion.)
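The scaled standard error in the glm output can be reproduced by hand from the figures it reports. The sketch below takes the Pearson statistic, the residual degrees of freedom and the unscaled standard error from the poisson and glm outputs shown in the slides. The scalar names are illustrative.

```stata
* Figures copied from the outputs above
scalar pearson_x2 = 1065.616046     // Pearson chi-squared
scalar resid_df   = 843             // residual degrees of freedom
scalar se_naive   = .0002552        // pm10 standard error without scaling

* Dispersion estimate and standard error scaled by its square root
scalar phi       = pearson_x2 / resid_df
scalar se_scaled = se_naive * sqrt(phi)

display "dispersion = " %8.6f phi
display "scaled SE  = " %9.7f se_scaled
display "z          = " %5.2f .0007406 / se_scaled
```

The point estimate is unchanged by the correction; only the standard error, and hence the z statistic and confidence interval, are affected.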

Option 2: Poisson regression
Advantages: avoids reshaping the dataset, keeping the original time-series format; can account for overdispersion and residual autocorrelation.
Limitations: the large number of interaction parameters slows convergence (and can even make it difficult with missing data); even slower when fitting interaction terms for individual factors (sex, age).

Option 3: Conditional Poisson regression
Armstrong and Gasparrini (ISEE 2011): " … Poisson models with large numbers of indicator variables can alternatively be fit with conditional Poisson models, conditioning on numbers of events in the time stratum"
First proposed by Farrington (1995) for the self-controlled case-series design for vaccine safety studies.
Strata are defined by grouping the same weekday within each month of each year; then use Stata's xtpoisson command with the fe option.
But overdispersion needs to be accounted for with the new xtpois_ovdp command.

. egen set = group(yy mm dow)
. xtset set date
...
. xtpoisson y pm10, fe
Conditional fixed-effects Poisson regression      Number of obs      =    1096
Group variable: set                               Number of groups   =     252
                                                  Obs per group: min =       4
                                                                 avg =     4.3
                                                                 max =       5
                                                  Wald chi2(1)       =    8.42
Log likelihood = -2854.4979                       Prob > chi2        =  0.0037
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408

. xtpois_ovdp
Estimate and standard errors corrected for over-dispersion
df: 843 ; pearson x2: 1065.6 ; dispersion: 1.26
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        pm10 |   .0007406   .0002869     2.58   0.010     .0001782     .001303
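The agreement between the Poisson model with stratum indicators and the conditional Poisson model can be checked on simulated data. The sketch below does not use the madrid dataset: the exposure, the outcome and the true coefficient are all simulated assumptions, and the strata are entered as a single factor variable (i.stratum), which here stands in for the year by month by weekday stratification.

```stata
* Simulated daily series for 2003 to 2005 (illustrative values only)
clear
set seed 12345
set obs 1096
generate date = mdy(1, 1, 2003) + _n - 1
format date %td
generate yy  = year(date)
generate mm  = month(date)
generate dow = dow(date)
egen stratum = group(yy mm dow)

* Hypothetical exposure and Poisson counts with a small true effect
generate pm10 = 30 + 10 * rnormal()
generate y    = rpoisson(exp(4 + 0.001 * pm10))

* Option 2: Poisson regression with one indicator per stratum
quietly poisson y pm10 i.stratum
scalar b_pois  = _b[pm10]
scalar se_pois = _se[pm10]

* Option 3: conditional (fixed-effects) Poisson regression
quietly xtset stratum
quietly xtpoisson y pm10, fe
scalar b_cond  = _b[pm10]
scalar se_cond = _se[pm10]

display "poisson:   b = " %10.7f b_pois "  se = " %10.7f se_pois
display "xtpoisson: b = " %10.7f b_cond "  se = " %10.7f se_cond
```

The two fits should agree up to numerical tolerance, while the conditional model avoids estimating the stratum parameters explicitly.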

CPU time
My example (3 years of data, 60 events/day):
clogit      0.181 sec.
poisson     2.111 sec.
xtpoisson   0.066 sec.

Option 3: Conditional Poisson regression
Advantages: avoids reshaping the dataset, keeping the original time-series format; can account for overdispersion and residual autocorrelation; easy fitting of interaction terms for individual factors (sex, age), with faster convergence.
Limitations: ?

Summary
Case-crossover studies can easily be analysed using Stata.
Conditional logistic regression requires unfriendly data management and large (huge) datasets at the individual level.
Poisson regression with the 3-way interaction requires fitting a large number of interaction parameters, which makes convergence difficult.
Conditional Poisson regression seems to solve both problems.

Thanks for your attention!
