Inference from ecological models: air pollution and stroke using data from Sheffield, England. Ravi Maheswaran, Guangquan Li, Jane Law, Robert Haining,

Inference from ecological models: air pollution and stroke using data from Sheffield, England. Ravi Maheswaran, Guangquan Li, Jane Law, Robert Haining, Marta Blangiardo, Sylvia Richardson, Nicky Best

Outline:
1. Background to the Sheffield study and results presented at Geomed 2005
2. From the Poisson to the Binomial model
3. Results
4. Conclusions

1. Nitrogen oxides (NOx) and stroke mortality in Sheffield, England (Geomed 2005)
Strokes account for 8%–12% of UK deaths. There is some evidence of a link between air pollution and stroke from:
- studies of severe air pollution episodes (e.g. the 1952 London smog);
- analyses of daily time series (e.g. Kan et al. (2003), Shanghai);
- cohort studies (e.g. Nafstad et al. (2004), Norwegian males).

Because the absolute number of stroke deaths is small, the statistical power of tests is limited even in large cohort studies, particularly for a factor whose effect may be modest. Small-area ecological studies may help:
- by providing another way of looking at the relationship;
- by allowing very large populations to be analysed at much lower cost than a cohort study;
- because small areas are likely to be more homogeneous than large areas in population characteristics, which reduces the risk of ecological bias.

Data
Stroke mortality data: ICD-9 codes …; c. 3,000 stroke deaths in a population of c. 200,000 aged over 45. Deaths are aggregated by Enumeration District (ED; c. 150 households), age (5-year bands from 45 to 85+) and sex. Expected deaths per ED: min 0.1; max 10.9.
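As a quick check on the stratification, the sketch below enumerates the age-sex strata implied by 5-year bands from 45 to 85+ and two sexes. It gives 9 × 2 = 18 strata, which matches the "18 strata" used in the later Binomial models; that link is our inference, not something the slides state. The helper names are hypothetical.

```python
# Hypothetical helpers: enumerate the age-sex strata implied by the slide
# (5-year age bands from 45 up to an open-ended 85+ band, two sexes).

def age_bands(start=45, open_from=85, width=5):
    """Closed 5-year bands from `start` up to `open_from`, then an open band."""
    bands = [f"{lo}-{lo + width - 1}" for lo in range(start, open_from, width)]
    bands.append(f"{open_from}+")
    return bands


def age_sex_strata(sexes=("male", "female")):
    """Every (sex, age band) combination."""
    return [(sex, band) for sex in sexes for band in age_bands()]


print(age_bands())
print(len(age_sex_strata()))
```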

Population data:
(i) 1991 Census data on demography and deprivation (Townsend index), recorded at the Enumeration District level (n = 1030).
(ii) Sheffield Health and Illness Prevalence Survey (2000): a random sample stratified by ward; >10k respondents, of whom >9.5k gave complete age, sex and smoking information. Average of 2.43 smokers per ED (min expected: 0.19; max expected: 19.24).

Environmental data: quantifying NOx exposure with the Indic-Airviro model.

Average annual mean pollution levels, excluding 1998: NOx (µg/m³)

Areal interpolation (from grid to ED): point in polygon, weighted PostPoint.

NOx data transferred to the Enumeration District framework after applying the weighted PostPoint method of areal interpolation.
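A minimal sketch of a population-weighted point-in-polygon interpolation, assuming that each postcode point (PostPoint) has already been assigned to its ED, carries a population or household weight, and takes the NOx value of the square grid cell it falls in. The exact weighting used in the Sheffield study is not given on the slides; `interpolate_to_eds`, the grid layout and the weights are illustrative assumptions.

```python
from collections import defaultdict


def grid_value(grid, x, y, cell_size):
    """NOx value of the square grid cell containing point (x, y)."""
    return grid[(int(x // cell_size), int(y // cell_size))]


def interpolate_to_eds(points, grid, cell_size):
    """Weighted mean of grid NOx over the postcode points in each ED.

    points: iterable of (ed_id, x, y, weight), where ed_id comes from a prior
    point-in-polygon step and weight is, e.g., a household count (assumed).
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for ed_id, x, y, weight in points:
        num[ed_id] += weight * grid_value(grid, x, y, cell_size)
        den[ed_id] += weight
    return {ed: num[ed] / den[ed] for ed in num if den[ed] > 0}


# Hypothetical 200 m grid with two cells and three postcode points.
grid = {(0, 0): 40.0, (1, 0): 60.0}
points = [("ED1", 50, 50, 30), ("ED1", 250, 50, 10), ("ED2", 300, 120, 5)]
print(interpolate_to_eds(points, grid, cell_size=200))
```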

Poisson model
$y_i$ = number of stroke deaths in area $i$.
$$y_i \sim \text{Poisson}(\mu_i), \qquad \mu_i = r_i E_i$$
$r_i$ = underlying true relative risk specific to area $i$.
$E_i$ = expected number of deaths in area $i$, standardized for age, sex and socio-economic deprivation:
$$E_i = \sum_m \lambda_m \, n_{i,m}$$
where $\lambda_m$ = age-sex-deprivation-specific mortality rate for population subgroup $m$, and $n_{i,m}$ = size of population subgroup $m$ in area $i$.
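The expected count is indirect standardisation: reference subgroup rates are applied to each area's own population. A minimal sketch with hypothetical subgroup labels and rates; the ratio $y_i / E_i$ is the standardised mortality ratio, which is also the maximum-likelihood estimate of $r_i$ when an area is fitted on its own under the Poisson model.

```python
def expected_count(rates, pop):
    """E_i = sum_m lambda_m * n_{i,m}: reference subgroup rates applied to area i."""
    return sum(rate * pop.get(m, 0) for m, rate in rates.items())


def smr(observed, expected):
    """y_i / E_i, the standardised mortality ratio (MLE of r_i for a single area)."""
    return observed / expected


# Hypothetical subgroups m (e.g. age-sex-deprivation cells) and rates lambda_m.
rates = {"m1": 0.002, "m2": 0.004}
pop_area = {"m1": 500, "m2": 250}     # n_{i,m} for one area i
E = expected_count(rates, pop_area)   # E_i for this area
print(E, smr(3, E))
```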

Generalized linear model (log link) for $r_i$, with covariates:
$x_i$ = NOx level in area $i$;
$z_i^{\text{ave}}$ = smoking prevalence ratio in area $i$ (a spatial moving average computed from the observed and expected counts).
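The slides describe $z_i^{\text{ave}}$ only as a spatial moving average built from observed and expected smoker counts. One plausible reading, sketched below as an assumption, pools the counts over each ED and its neighbours before taking the ratio, which stabilises EDs with very few survey respondents. The neighbour structure, counts and function name are hypothetical.

```python
def moving_average_ratio(observed, expected, neighbours):
    """Smoothed observed/expected ratio pooled over each area and its neighbours."""
    ratios = {}
    for i in observed:
        window = [i, *neighbours.get(i, [])]
        o = sum(observed[j] for j in window)
        e = sum(expected[j] for j in window)
        ratios[i] = o / e
    return ratios


# Hypothetical EDs A-B-C in a line; B alone has 0 observed smokers.
observed = {"A": 2, "B": 0, "C": 4}
expected = {"A": 2.5, "B": 1.0, "C": 2.5}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(moving_average_ratio(observed, expected, neighbours))
```

Without pooling, ED B would get a ratio of 0; pooling pulls it towards its neighbours.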

Poisson regression controlling for age, sex, deprivation and smoking prevalence.
Parameter | Rel. Risk (95% CI), WinBUGS | Rel. Risk (95% CI), SAS
NOx category: ( ) 1.48 ( ) ( ) 1.26 ( ) (0.98–1.24) 1.10 ( ) ( ) 1.12 ( ) 1 1 1
Smoking, $z^{\text{ave}}$: 0.93 ( ) | 0.93 ( )
DIC: …; Deviance/df = 2.3

Bayesian hierarchical spatial model: fitted to allow for overdispersion due to:
- small-area population heterogeneity;
- missing covariates (which may be spatially autocorrelated).
To allow for the uncertainty associated with the smoking data (small counts; missing values), an errors-in-variables model was used for $z_i$.

$e_i$ = unexplained area-specific log relative risk in area $i$ after adjusting for $x$ and $z^{\text{est}}$:
$$e_i = v_i + s_i$$
$v_i$ = unstructured random effects (zero-mean normal prior);
$s_i$ = spatially structured random effects (zero-mean intrinsic conditional autoregressive prior).
$$z_i^{\text{est}} = \log(\text{smoke.}r_i) = \text{smoke.}\alpha + \text{smoke.}v_i + \text{smoke.}s_i$$
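Under the usual intrinsic CAR specification with binary adjacency weights (assumed here; the slides do not state the weights), each $s_i$, given all the others, is normal with mean equal to the average of its neighbours' values and variance $\sigma_s^2$ divided by its number of neighbours. The sketch computes that conditional distribution; `icar_conditional` and the example values are hypothetical.

```python
def icar_conditional(s, neighbours, i, sigma2_s):
    """Mean and variance of s_i given the other s_j under an intrinsic CAR
    prior with binary adjacency weights (w_ij = 1 for neighbours)."""
    nbrs = neighbours[i]
    if not nbrs:
        raise ValueError(f"area {i!r} has no neighbours")
    mean = sum(s[j] for j in nbrs) / len(nbrs)
    var = sigma2_s / len(nbrs)
    return mean, var


# Hypothetical values of the spatial effects for three areas.
s = {"A": 0.2, "B": -0.1, "C": 0.3}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(icar_conditional(s, neighbours, "B", sigma2_s=0.5))
```

Areas with more neighbours are smoothed more strongly, because their conditional variance is smaller.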

Priors:
- flat priors for $\alpha$, $\beta$ and $\gamma$;
- gamma(0.5, …) priors for the precision parameters of the random-effect terms.
Spatial fraction (SF):
$$\text{SF} = \frac{\text{Var}(s_i)}{\text{Var}(s_i) + \text{Var}(v_i)}$$
i.e. the ratio of the estimated marginal variance of the spatial random effect to the sum of the estimated marginal variances of the spatial and unstructured random effects. SF → 1 implies that spatial heterogeneity dominates; SF → 0 implies that unstructured heterogeneity dominates.
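The spatial fraction can be approximated from the area-level random effects by taking the empirical variance across areas of $s_i$ and of $v_i$, for example at each MCMC iteration, and then summarising over iterations. A sketch using the standard library with made-up effect values for a single iteration:

```python
import statistics


def spatial_fraction(s_values, v_values):
    """Var(s)/[Var(s) + Var(v)], using empirical variances across areas."""
    var_s = statistics.pvariance(s_values)
    var_v = statistics.pvariance(v_values)
    return var_s / (var_s + var_v)


# Made-up area-level effects from a single MCMC iteration.
s_draw = [0.3, -0.3, 0.3, -0.3]
v_draw = [0.1, -0.1, 0.1, -0.1]
print(round(spatial_fraction(s_draw, v_draw), 3))
```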

Poisson regression with spatial random effects, controlling for age, sex, deprivation and smoking prevalence.
Parameter | Rel. Risk (95% CI), WinBUGS
NOx category: ( ) ( ) ( ) ( ) 1 1
Smoking, $z^{\text{est}}$: 1.05 ( )
Spatial fraction (model; for smoking): 0.006; 0.99
DIC: …

Conclusions: there is evidence of an association between NOx and stroke mortality.
1. There appears to be a threshold level for an effect.
2. The effect size diminishes after including random effects to allow for overdispersion and missing variables.
3. Spatially smoothing NOx to allow for local journeys did not change the size of the effect.
4. We were unable to allow for long- and short-term population movements.
5. There was no association with smoking prevalence (an effect of its definition? small sample sizes in some EDs?).

2. Fitting a Binomial model
- Stroke is not contagious, so outcomes for individuals are independent Bernoulli random variables, which therefore aggregate to Binomial random variables at the area level.
- Because stroke is relatively rare, the Poisson assumption should give similar results, but it is only an approximation.
- We also have data, at the ED level, on the proportion of the population exposed to different levels of NOx, which was not used previously.
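To see how close the Poisson approximation is for an ED-sized population, the sketch compares Binomial and Poisson probabilities. The inputs are illustrative only: n = 200 is roughly 200,000 residents aged over 45 divided among 1030 EDs, and p = 0.015 is roughly 3,000 deaths divided by 200,000, treating the whole study period as a single risk period.

```python
from math import comb, exp, factorial


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)


n, p = 200, 0.015  # illustrative ED size and period risk (see lead-in)
for k in range(7):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```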

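Ecological analysis (within-area 2 × 2 table):
          | Not exposed | Exposed  | Margins
Death     | unknown     | unknown  | observed
Not death | unknown     | unknown  | observed
Totals    | observed    | observed |
Interior cells: unknown (but of interest). Exposure totals (bottom row): observed (not previously used). Outcome margins (right column): observed (and used in the previous analysis).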
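The interior cells cannot be recovered from the margins alone. The sketch builds two hypothetical within-area tables with identical margins (5 deaths, 60 not exposed, 40 exposed) that imply very different within-area risk ratios, and computes the simple deterministic bounds on the number of exposed deaths that the margins allow. All counts and function names are made up for illustration.

```python
def margins(table):
    """Observed margins of a within-area 2x2 table keyed by (death, exposed)."""
    deaths = table[(1, 0)] + table[(1, 1)]
    not_exposed = table[(1, 0)] + table[(0, 0)]
    exposed = table[(1, 1)] + table[(0, 1)]
    return deaths, not_exposed, exposed


def risk_ratio(table):
    """Risk in the exposed divided by risk in the not exposed."""
    risk_exposed = table[(1, 1)] / (table[(1, 1)] + table[(0, 1)])
    risk_not_exposed = table[(1, 0)] / (table[(1, 0)] + table[(0, 0)])
    return risk_exposed / risk_not_exposed


def exposed_death_bounds(deaths, not_exposed, exposed):
    """Range of exposed deaths consistent with the margins alone."""
    return max(0, deaths - not_exposed), min(deaths, exposed)


# Two hypothetical areas with identical margins: 5 deaths, 60 not exposed, 40 exposed.
table_a = {(1, 0): 2, (1, 1): 3, (0, 0): 58, (0, 1): 37}
table_b = {(1, 0): 4, (1, 1): 1, (0, 0): 56, (0, 1): 39}
print(margins(table_a), margins(table_b))
print(risk_ratio(table_a), risk_ratio(table_b))
print(exposed_death_bounds(*margins(table_a)))
```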

Within-ED population distribution by PostPoint.

Dichotomised individual-level model
$x_{ij}$ = 0 if individual $j$ in area $i$ is not exposed, or 1 if individual $j$ in area $i$ is exposed.
$p_{0i}$: stroke risk in the not-exposed group in area $i$.
$p_{1i}$: stroke risk in the exposed group in area $i$.
$z_i$ denotes other area-level covariates (e.g. deprivation).
$v_i \sim N(0, \sigma^2)$: an unstructured random effect to account for unmeasured covariates.

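Depending on the exposure status of the individual:
$$p_{ij} = \begin{cases} p_{1i} \ \text{(stroke risk in the exposed group in area } i\text{)} & \text{if the person is in the exposed group } (x_{ij} = 1) \\ p_{0i} \ \text{(stroke risk in the not-exposed group in area } i\text{)} & \text{if the person is in the not-exposed group } (x_{ij} = 0) \end{cases}$$
This can be extended to a categorical exposure variable with more than two levels. Various extensions of the model, such as incorporating a continuous exposure, can be found in Jackson et al. (2006).
Jackson, C. H., Best, N. G. and Richardson, S. Improving ecological inference using individual-level data. Statistics in Medicine (2006) 25(12).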

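An area-level model incorporating the within-area distribution of exposure:
$$p_i = (1 - \phi_i)\, p_{0i} + \phi_i\, p_{1i}$$
where $\phi_i$ = proportion of the population in area $i$ in the exposed category, $p_{0i}$ and $p_{1i}$ are the stroke risks in the not-exposed and exposed groups in area $i$, and $p_i$ = probability of stroke death in area $i$, regardless of exposure.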
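A minimal sketch of the area-level probability and the Binomial log-likelihood it feeds into. The risks and proportions are hypothetical; in the fitted model the category risks would themselves depend on covariates and random effects, and the likelihood would be summed over age-sex strata. Written with a list of proportions, the same function covers the extension to more than two exposure categories.

```python
from math import comb, log


def area_risk(proportions, risks):
    """p_i = sum_k phi_{ik} * p_{ki}: category risks weighted by the share of
    area i's population in each exposure category (two categories gives
    (1 - phi_i) * p_{0i} + phi_i * p_{1i})."""
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("exposure proportions must sum to 1")
    return sum(phi * p for phi, p in zip(proportions, risks))


def binomial_loglik(y, n, p):
    """Log-likelihood of y deaths among n people with common risk p."""
    return log(comb(n, y)) + y * log(p) + (n - y) * log(1 - p)


# Hypothetical area: 25% exposed, risks 0.012 (not exposed) and 0.018 (exposed).
p_i = area_risk([0.75, 0.25], [0.012, 0.018])
print(p_i, binomial_loglik(3, 200, p_i))
```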

Remark: applying a Binomial model with the proportion of exposed individuals as an area-level covariate does not, in general, reproduce the area-level model derived from the individual-level model; the discrepancy between the two is ecological bias.
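Assuming, for illustration only, a log link at the individual level ($\log p_0 = \alpha$, $\log p_1 = \alpha + \beta$, no other covariates), averaging the individual risks gives $\log p_i = \alpha + \log(1 - \phi_i + \phi_i e^{\beta})$, whereas the naive ecological model is $\log p_i = \alpha + \beta\phi_i$. Because log is concave, the first is never smaller than the second, and they coincide only when $\beta = 0$ or $\phi_i$ is 0 or 1. The sketch evaluates the gap for a hypothetical relative risk of 2.

```python
from math import exp, log


def log_risk_aggregated(alpha, beta, phi):
    """log p_i from averaging individual risks, assuming log p_0 = alpha and
    log p_1 = alpha + beta (constant within the area)."""
    return alpha + log(1 - phi + phi * exp(beta))


def log_risk_naive(alpha, beta, phi):
    """Ecological regression with the proportion exposed as a covariate."""
    return alpha + beta * phi


alpha, beta = log(0.01), log(2.0)  # hypothetical baseline risk 1%, RR = 2
for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
    gap = log_risk_aggregated(alpha, beta, phi) - log_risk_naive(alpha, beta, phi)
    print(phi, round(gap, 4))
```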

3. Results
Binomial regression controlling for age, sex (18 strata) and deprivation, and incorporating the within-area distribution of exposure.
NOx category | Rel. Risk (95% CI), without unstructured R.E. | Rel. Risk (95% CI), with unstructured R.E.
5 | … (1.14–1.52) | 1.07 (0.88–1.29)
4 | … (1.03–1.30) | 1.05 (0.86–1.25)
3 | … (0.99–1.22) | 0.92 (0.75–1.10)
2 | … (0.87–1.13) | 0.87 (0.73–1.04)
1 | 1 | 1
DIC: …; pD: 8 | DIC: …; pD: …

A dichotomised-exposure Binomial regression model controlling for age, sex (4 strata; 18 strata) and deprivation, and incorporating data on the within-area distribution of exposure.
NOx category | Rel. Risk (95% CI), 4 strata | Rel. Risk (95% CI), 18 strata
Exposed | 1.20 (1.05–1.34) | 1.14 (1.00–1.30)
Non-exposed | 1 | 1
The exposed category comprises NOx categories 4 and 5 in the previous slide; the non-exposed category comprises categories 1, 2 and 3.

4. Conclusions
1. Incorporating information on within-area exposure reduced the estimated relative risks compared with the earlier set of results.
2. Lower risks in categories 2 and 3 in the Binomial model with five exposure categories may indicate that some confounding effects have not been accounted for in the current model; in the absence of additional information, these effects could be “averaged out” by combining some exposure categories.
3. Fitting a reduced model with two exposure categories does indicate a significant effect in the exposed group after adjusting for age, sex and deprivation.
4. Increasing the number of age-sex strata from 4 to 18 in the dichotomous-exposure model reduced the estimated relative risk to 1.14 (95% CI: 1.00, 1.30), but there is still evidence of a significant effect.

Differences between the current approach and the earlier modelling:
- The Poisson model is prone to ecological bias because only aggregated information was used for exposure.
- Here we attempt to reduce this bias by using data on the within-area distribution of exposure, i.e. the proportions of people in the exposed and non-exposed groups.
- Deprivation was absorbed into the expected number of cases in the earlier work; here it has been included as a covariate. We could instead adjust for deprivation in the baseline risks.
- There was no adjustment for smoking prevalence, since it was not significant in the earlier modelling. Lung cancer mortality could be used as a proxy for smoking instead.