Inference from ecological models: air pollution and stroke using data from Sheffield, England.
Ravi Maheswaran, Guangquan Li, Jane Law, Robert Haining, Marta Blangiardo, Sylvia Richardson, Nicky Best
Outline:
1. Background to the Sheffield study and results presented at Geomed 2005
2. From the Poisson to the Binomial model
3. Results
4. Conclusions
1. Nitrogen oxides (NOx) and stroke mortality in Sheffield, England (Geomed 2005).
Strokes account for 8%-12% of UK deaths.
Some evidence of a link between air pollution and stroke:
- studies of severe air pollution episodes (e.g. 1952 London smog);
- analysis of daily time series (e.g. Kan et al. (2003): Shanghai);
- cohort studies (e.g. Nafstad et al. (2004): Norwegian males).
Since the absolute number of deaths is small, statistical power is limited even in large cohort studies, particularly for a factor whose effect may not be large.
Small area ecological studies may help:
- by providing another way of looking at the relationship;
- by allowing the analysis of very large populations at a much lower cost than a cohort study;
- because small areas are likely to be more homogeneous (than large areas) in terms of population characteristics, thus reducing the risk of ecological bias.
Data
Stroke mortality data:
- ICD9 codes ; c. 3k stroke deaths in a population of c. 200k aged over 45;
- aggregated by Enumeration District (ED; c. 150 households), age (5-year cohorts from 45 to 85+) and sex;
- deaths per ED (min expected: 0.1; max: 10.9).
Population data:
(i) 1991 Census data on demography and deprivation (Townsend index), recorded at the Enumeration District level (n=1030).
(ii) Sheffield Health and Illness Prevalence survey (2000): random sample stratified by ward; >10k respondents, of whom >9.5k gave complete age, sex and smoking information. Average of 2.43 smokers per ED (min expected: 0.19; max expected: 19.24).
Environmental data: quantifying NOx exposure. The Indic-Airviro model.
Average annual mean pollution levels (excl. 1998): NOx (µg/m³)
Areal Interpolation (from grid to ED): point in polygon – weighted PostPoint
NOx data transferred to the enumeration district framework after application of the weighted PostPoint method of areal interpolation.
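The slides do not spell out the weighting scheme, so the following is only a minimal sketch of one plausible reading of a weighted point-in-polygon interpolation. It assumes each postcode point has already been assigned to an ED and given the modelled grid value at its location, and that the weight is a household count. The function name, the tuple layout and the numbers are all hypothetical.

```python
from collections import defaultdict

def weighted_point_interpolation(points):
    """Return {ed_id: weighted mean NOx} from (ed_id, weight, nox) tuples.

    Assumption (not from the slides): each postcode point carries the grid
    NOx value at its location and a weight such as its household count.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for ed_id, weight, nox in points:
        weighted_sum[ed_id] += weight * nox
        weight_total[ed_id] += weight
    # EDs with zero total weight are skipped rather than divided by zero.
    return {ed: weighted_sum[ed] / weight_total[ed]
            for ed in weighted_sum if weight_total[ed] > 0}

# Hypothetical postcode points: (ED id, households, modelled NOx in ug/m3).
points = [
    ("ED_A", 10, 40.0),
    ("ED_A", 30, 60.0),
    ("ED_B", 20, 35.0),
]
ed_nox = weighted_point_interpolation(points)
print(ed_nox)
```

Weighting by households pulls the ED value towards where people actually live, which is the motivation for using point locations rather than the ED's area-average.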
Poisson Model
$y_i$ = number of stroke deaths in area $i$.
$y_i \sim \mathrm{Poisson}(\mu_i)$, $\mu_i = r_i E_i$
$r_i$ = underlying true area-specific relative risk in area $i$.
$E_i$ = expected number of deaths in area $i$, standardized for age, sex and socio-economic deprivation: $E_i = \sum_m \lambda_m n_{i,m}$
$\lambda_m$ = age-sex-deprivation specific mortality rate for population subgroup $m$.
$n_{i,m}$ = size of population subgroup $m$ in area $i$.
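As an illustration of the standardization step, the expected count for an area is the sum over population subgroups of the subgroup mortality rate times the subgroup population in that area. The sketch below uses made-up rates and populations for three subgroups and two areas; none of the numbers come from the Sheffield data.

```python
def expected_deaths(rates, population):
    """Expected deaths per area: sum over subgroups of rate * subgroup size.

    rates[m]         : reference mortality rate for subgroup m
    population[i][m] : size of subgroup m in area i
    """
    return [sum(rate * n for rate, n in zip(rates, row)) for row in population]

# Hypothetical reference rates for three age-sex-deprivation subgroups.
rates = [0.001, 0.004, 0.012]
# Hypothetical subgroup populations for two areas.
population = [
    [200, 100, 50],
    [150, 120, 80],
]
observed = [2, 1]  # hypothetical observed deaths per area

expected = expected_deaths(rates, population)
# Crude relative risk estimate per area (observed / expected).
crude_rr = [y / e for y, e in zip(observed, expected)]
print(expected, crude_rr)
```

With expected counts as small as those reported for some EDs, the crude ratio is very unstable, which is one reason for modelling the relative risk rather than reading it off directly.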
Generalized linear model:
$x_i$ = NOx level in area $i$.
$z_i^{ave}$ = smoking prevalence ratio in area $i$ (spatial moving average using the observed and expected counts).
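The model equation itself is not recoverable from the slide text. A common log-linear form for a Poisson regression with a categorical exposure and one continuous covariate is sketched below. The symbols $\alpha$, $\beta_k$, $\gamma$ and the category index $c(i)$ are assumptions introduced here for illustration, with $\mu_i = r_i E_i$ denoting the Poisson mean.

```latex
\begin{aligned}
\log r_i &= \alpha + \beta_{c(i)} + \gamma\, z_i^{ave},
  \qquad \beta_1 = 0 \ \text{(reference category)},\\
\log \mu_i &= \log E_i + \log r_i .
\end{aligned}
```

Under this form, $\exp(\beta_k)$ is the relative risk for NOx category $k$ against the reference category, and $\log E_i$ enters as an offset.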
Poisson regression controlling for age, sex, deprivation and smoking prevalence.

Parameter            Rel. Risk (95% CI), WinBUGS    Rel. Risk (95% CI), SAS
NOx category 5            ( )                       1.48 ( )
NOx category 4            ( )                       1.26 ( )
NOx category 3            (0.98–1.24)               1.10 ( )
NOx category 2            ( )                       1.12 ( )
NOx category 1       1                              1
Smoking: z ave       0.93 ( )                       0.93 ( )

DIC:
Deviance/df = 2.3
Bayesian hierarchical spatial model:
Fitted to allow for overdispersion due to:
- small area population heterogeneity;
- missing covariates (that may be spatially autocorrelated).
To allow for the uncertainty associated with the smoking data (small counts; missing values), an errors-in-variables model was used for $z_i$.
$e_i$ = unexplained area-specific log relative risk in area $i$ after adjusting for $x$ and $z^{est}$:
$e_i = v_i + s_i$
$v_i$ = unstructured random effects (zero-mean normal prior).
$s_i$ = spatially structured random effects (zero-mean intrinsic conditional autoregressive prior).
$z_i^{est} = \log(\text{smoke.r}_i) = \text{smoke.}\alpha + \text{smoke.v}_i + \text{smoke.s}_i$
Priors:
- flat priors used for the intercept and the regression coefficients;
- gamma(0.5, ) used for the precision parameters of the random effect terms.
Spatial fraction (SF):
$SF = \mathrm{Var}(s_i) / [\mathrm{Var}(s_i) + \mathrm{Var}(v_i)]$
The ratio of the estimated marginal variance of the spatial random effect to the sum of the estimated marginal variances of the spatial and the unstructured random effects.
SF → 1 implies spatial heterogeneity dominates; SF → 0 implies unstructured heterogeneity dominates.
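A minimal sketch of how the spatial fraction could be computed from one set of random-effect values, assuming the marginal variances are estimated empirically as the variance across areas. The values below are invented; in an MCMC fit the calculation would typically be repeated at each iteration to give a posterior distribution for SF.

```python
from statistics import pvariance

def spatial_fraction(s, v):
    """Empirical spatial fraction: Var(s) / (Var(s) + Var(v)).

    s : spatially structured random effects, one per area
    v : unstructured random effects, one per area
    """
    var_s = pvariance(s)
    var_v = pvariance(v)
    return var_s / (var_s + var_v)

# Hypothetical random-effect values for four areas (one MCMC draw).
s = [0.4, -0.2, 0.1, -0.3]
v = [0.05, -0.05, 0.02, -0.02]

sf = spatial_fraction(s, v)
print(round(sf, 3))
```

Here the spatial component has far more spread than the unstructured one, so SF is close to 1.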
Poisson regression with spatial random effects, controlling for age, sex, deprivation and smoking prevalence.

Parameter            Rel. Risk (95% CI), WinBUGS
NOx category 5            ( )
NOx category 4            ( )
NOx category 3            ( )
NOx category 2            ( )
NOx category 1       1
Smoking: z est       1.05 ( )

Spatial fraction (model; for smoking): (0.006; 0.99)
DIC =
Conclusions:
Evidence of an association between NOx and stroke mortality:
1. threshold level for an effect;
2. effect size diminishes after including random effects to allow for overdispersion and missing variables;
3. spatially smoothing NOx to allow for local journeys did not make a difference to the size of the effect;
4. unable to allow for long- and short-term population movements;
5. no association with smoking prevalence (effect of definition? small sample sizes in some EDs?).
2. Fitting a Binomial Model
- Stroke is not contagious, so outcomes for individuals are independent Bernoulli random variables, and at the area level they aggregate to Binomial random variables.
- Because stroke is relatively rare, the Poisson assumption should give similar results, but it is only an approximation.
- We also have data on the proportion exposed to different levels of NOx at the ED level, which was not previously used.
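To give a feel for how close the Poisson approximation is at this scale, the sketch below compares Binomial(n, p) and Poisson(np) probabilities for a single hypothetical ED. The values n = 200 and p = 0.015 are rough illustrative figures only (about 200k people over 1030 EDs, about 3k deaths in 200k), not quantities taken from the analysis.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

n, p = 200, 0.015      # hypothetical ED size and individual risk
mu = n * p             # matching Poisson mean

for k in range(7):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, mu), 4))

# Largest pointwise difference over the counts that carry almost all the mass.
max_diff = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(21))
print(max_diff)
```

The two distributions agree closely but not exactly, and the Binomial has a slightly smaller variance, np(1 - p), than the Poisson, np.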
Ecological analysis

              Not-exposed   Exposed   Margins
Death              ?            ?        M
Not death          ?            ?        M
Totals             T            T        M

? = unknown (but of interest)
T = observed (not previously used)
M = observed (and used in the previous analysis)
Within-ED population distribution by PostPoint.
Dichotomised individual-level model
$x_{i,j}$ is 0 (if individual $j$ in area $i$ is not exposed) or 1 (if individual $j$ in area $i$ is exposed).
$p_{i,0}$: stroke risk in the not-exposed group in area $i$.
$p_{i,1}$: stroke risk in the exposed group in area $i$.
$z_i$ denotes other area-level covariates (e.g. deprivation).
$v_i \sim N(0, \sigma^2)$: an unstructured random effect to account for unmeasured covariates.
Depending on the exposure status of the individual:
- the person is in the exposed group;
- the person is in the not-exposed group.
This can be extended to a categorical exposure variable with more than 2 levels. Various extensions of the model, such as incorporating continuous exposure, can be found in Jackson et al. (2006).
Jackson, C. H., Best, N. G. and Richardson, S. Improving ecological inference using individual-level data. Statistics in Medicine (2006) 25(12):
An area-level model incorporating the distribution of within-area exposure
where $\phi_i$ = proportion of the population in area $i$ in the exposed category.
$p_i$ = probability of stroke death in area $i$, regardless of exposure.
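The slide's equation is not recoverable, but the definitions above imply an area-level risk that averages the group-specific risks over the exposure distribution. The following is a sketch in assumed notation: $N_i$ (population at risk in area $i$), $\phi_i$ (proportion exposed), and $p_{i,0}$, $p_{i,1}$ (risks in the not-exposed and exposed groups) are symbols chosen here for illustration.

```latex
y_i \sim \mathrm{Binomial}(N_i,\, p_i), \qquad
p_i = (1 - \phi_i)\, p_{i,0} + \phi_i\, p_{i,1} .
```

This is the law of total probability applied within each area: a randomly chosen resident is exposed with probability $\phi_i$ and then carries the risk of the group they belong to.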
Remark
Applying a Binomial model with the proportion of exposed individuals as a covariate is not, in general, equivalent to the area-level model derived from an individual-level model. The discrepancy is a source of ecological bias.
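A small numerical sketch of why the two formulations differ. It assumes a log link, a common baseline risk and a common individual-level relative risk across areas, all of which are simplifications chosen for illustration. The "naive" model treats the proportion exposed as an ordinary covariate on the log scale, while the "mixture" model averages the two group risks.

```python
from math import exp, log

def mixture_risk(phi, p0, rr):
    """Area risk derived from the individual-level model:
    a proportion phi has risk p0 * rr, the rest has risk p0."""
    return (1.0 - phi) * p0 + phi * p0 * rr

def naive_risk(phi, p0, rr):
    """Area risk when phi is used as a covariate on the log scale:
    log p = log p0 + phi * log rr."""
    return p0 * exp(phi * log(rr))

p0, rr = 0.01, 1.5   # hypothetical baseline risk and individual relative risk

for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(phi, round(mixture_risk(phi, p0, rr), 6), round(naive_risk(phi, p0, rr), 6))
```

The two agree only when an area is entirely exposed or entirely not exposed. For mixed areas the mixture risk is linear in the proportion exposed while the naive risk is log-linear, so a naive fit would not in general recover the individual-level relative risk.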
3. Results
Binomial regression controlling for age, sex (18 strata) and deprivation, and incorporating the within-area distribution of exposure.

Parameter            Rel. Risk (95% CI)             Rel. Risk (95% CI)
                     without unstr. R.E.            with unstr. R.E.
NOx category 5            (1.14 – 1.52)             1.07 (0.88 – 1.29)
NOx category 4            (1.03 – 1.30)             1.05 (0.86 – 1.25)
NOx category 3            (0.99 – 1.22)             0.92 (0.75 – 1.10)
NOx category 2            (0.87 – 1.13)             0.87 (0.73 – 1.04)
NOx category 1       1                              1
                     DIC:    pD: 8                  DIC:    pD:
A dichotomised-exposure Binomial regression model controlling for age, sex (4 strata; 18 strata) and deprivation, and incorporating data on the within-area distribution of exposure.

Parameter            Rel. Risk (95% CI), 4 strata    Rel. Risk (95% CI), 18 strata
NOx category
  Exposed            1.20 (1.05 – 1.34)              1.14 (1.00 – 1.30)
  Non-exposed        1                               1

The exposed category comprises NOx categories 4 and 5 in the previous slide; the non-exposed category comprises categories 1, 2 and 3.
4. Conclusions
1. Incorporation of information on within-area exposure resulted in a reduction of the estimated relative risk compared to the earlier set of results.
2. Lower risks in categories 2 and 3 in the binomial model with 5 exposure categories may indicate that some confounding effects have not been accounted for in the current model; in the absence of additional information, these effects could be "averaged out" by combining some exposure categories.
3. Fitting a reduced model with two exposure categories does indicate a significant effect in the exposed group after adjusting for age, sex and deprivation.
4. Increasing the number of age-sex cohorts from 4 to 18 in the dichotomous-exposure model reduced the estimated relative risk to 1.14 (95% CI: 1.00, 1.30), but there is still evidence of a significant effect.
Differences between the current approach and the earlier modelling:
- The Poisson model is prone to ecological bias since, for exposure, only aggregated information was used.
- Here we attempt to reduce the bias by utilizing data on the within-area distribution of exposure, i.e. the proportion of people in the exposed and non-exposed groups.
- Deprivation was absorbed into the expected number of cases in the earlier work; here it has been included as a covariate. We could adjust for deprivation in the baseline risks.
- There was no adjustment for smoking prevalence since it was not significant in the earlier modelling. The possibility exists of using lung cancer mortality as a proxy for smoking instead.