Nicky Best and Chris Jackson With Sylvia Richardson Department of Epidemiology and Public Health Imperial College, London


Bayesian graphical models for inference from combinations of data

Example: low birth weight and air pollution
Does exposure to air pollution during pregnancy increase the risk of low birth weight? The example illustrates various biases.
Combine datasets with different strengths:
- Survey data (Millennium Cohort Study): small, but great individual detail.
- Administrative data (national births register): large, but little individual detail.
A single underlying model is assumed to govern both datasets, elaborated as appropriate to handle biases.

Low birth weight
Important determinant of future health → population health indicator.
Established risk factors:
- Tobacco smoking during pregnancy.
- Ethnicity (South Asian: an issue for UK data).
- Maternal age, weight, height, number of previous births.
The role of environmental risk factors, such as air pollution, is less clear:
- Various studies around the world suggest a link.
- Exposure to urban air pollution is correlated with socioeconomic factors → ethnicity, tobacco smoking → confounding.

Data sources (1): Millennium Cohort Study
- About 15,000 births in the UK between September 2000 and August 2001 (we study only England and Wales, singleton births).
- Postcode made available to us under strict security → match individuals with the annual mean concentration of certain air pollutants (PM10, NO2, CO, SO2) (NETCEN).
- Birth weight and a reasonably complete set of confounder data available.
This allows a reasonable analysis, but issues remain:
- Low power to detect a small effect → could be improved by incorporating other data.
- Selection bias…

Selection of Millennium Cohort
[Figure: stratified selection tree. All UK wards are split by country (England, Scotland, Wales, Northern Ireland), then into high and low child poverty wards, with a further high-ethnic-minority stratum; the selection probability differs between strata.]

Selection bias in the Millennium Cohort
The survey samples some parts of the population disproportionately.
- If the selection scheme (child poverty / ethnicity) is related to both the exposure (pollution) and the outcome (low birth weight), then the estimate of their association is biased.
Accounting for selection bias:
- Adjust the model for all variables affecting selection, or
- Weight cases by the inverse probability of selection.
Cluster sampling → within-ward correlations → for correct standard errors for inference on the population, use a hierarchical (multilevel) model with groups defined by wards.
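The following is a minimal Python sketch of the second option, inverse-probability-of-selection weighting. It applies the weighting to a simple exposed/unexposed odds ratio. Several details are hypothetical:
- the strata "A" and "B";
- their selection probabilities (0.5 and 0.1);
- all counts.

The real MCS design has more strata, and the analysis in these slides uses a regression model rather than a 2x2 table. Here the sample counts are exactly the population counts times the selection probabilities. Weighting therefore recovers the population odds ratio 970/370 ≈ 2.62, whereas the unweighted sample gives 1903/763 ≈ 2.49.

```python
def weighted_odds_ratio(records, selection_prob):
    """Odds ratio for LBW vs exposure, weighting each baby by 1 / P(selection).

    records: iterable of (stratum, exposed, lbw) with exposed and lbw in {0, 1}.
    selection_prob: dict mapping stratum -> probability that a baby in that
    stratum is sampled (hypothetical values below).
    """
    n = [[0.0, 0.0], [0.0, 0.0]]  # n[exposed][lbw] = weighted count
    for stratum, exposed, lbw in records:
        n[exposed][lbw] += 1.0 / selection_prob[stratum]
    return (n[1][1] * n[0][0]) / (n[1][0] * n[0][1])


def expand(stratum, counts):
    """Turn {(exposed, lbw): count} into one record per baby."""
    return [(stratum, e, y) for (e, y), k in counts.items() for _ in range(k)]


# Hypothetical sample: stratum "A" (e.g. high child poverty wards) sampled with
# probability 0.5, stratum "B" with probability 0.1.
sample = (expand("A", {(1, 1): 10, (1, 0): 90, (0, 1): 5, (0, 0): 95})
          + expand("B", {(1, 1): 1, (1, 0): 19, (0, 1): 2, (0, 0): 78}))
probs = {"A": 0.5, "B": 0.1}

print("weighted:  ", weighted_odds_ratio(sample, probs))
print("unweighted:", weighted_odds_ratio(sample, {"A": 1.0, "B": 1.0}))
```

Weighting only corrects the point estimate. The within-ward clustering still has to be handled separately, for example with the hierarchical model mentioned above.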

Data sources (2): National birth register
- Every birth in the population is recorded.
- Individual data with postcode (→ pollution exposure) and birth weight available to us under strict security.
- Social class and employment status of parents also available for a 10% sample. We study only this 10% sample: 50,000 births between Sep 2000 and Aug
- Larger dataset, no selection bias, …but no confounder information, especially ethnicity and smoking.

Data sources (3): Aggregate data
- Ethnic composition of the population → 2001 census → for census output areas (~500 individuals).
- Tobacco expenditure → consumer surveys (CACI, who produce ACORN consumer classification data) → for census output areas.
…linked by postcode to Millennium Cohort and national register data.
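The following is a minimal Python sketch of the postcode linkage step. It assumes hypothetical in-memory lookup tables. The postcodes, output-area codes, column names and values are invented for illustration and do not reflect the real census or CACI file layouts.

```python
# Hypothetical lookup tables standing in for a postcode-to-output-area file
# and the area-level census / consumer-survey data.
POSTCODE_TO_OA = {"XX1 1AA": "OA0001", "XX2 2BB": "OA0002"}
OA_DATA = {
    "OA0001": {"prop_south_asian": 0.12, "tobacco_spend_z": 0.8},
    "OA0002": {"prop_south_asian": 0.02, "tobacco_spend_z": -0.3},
}


def link_to_areas(births):
    """Attach area-level aggregates to each birth record via its postcode.

    Births whose postcode is not in the lookup are returned separately so
    they can be inspected rather than silently dropped.
    """
    linked, unmatched = [], []
    for birth in births:
        oa = POSTCODE_TO_OA.get(birth["postcode"].strip().upper())
        if oa is None:
            unmatched.append(birth)
        else:
            linked.append({**birth, "output_area": oa, **OA_DATA[oa]})
    return linked, unmatched


linked, unmatched = link_to_areas([
    {"id": 1, "postcode": "xx1 1aa", "lbw": 0},
    {"id": 2, "postcode": "ZZ9 9ZZ", "lbw": 1},
])
print(linked)
print(unmatched)
```

Keeping unmatched records separate makes linkage failures visible. Such failures could otherwise act as an unrecognised selection mechanism.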

Birth weight and pollution (source: MCS)

Birth weight and ethnicity (source: MCS)

Birth weight and smoking (source: MCS)

Pollution and confounders (source: MCS)

Models for formally analysing combined data
We want an estimate of the association between low birth weight and pollution, using all data, accounting for:
- Selection bias in MCS → adjust models for all predictors of selection, or weight by the inverse probability of selection.
- Missing confounders in the register → Bayesian graphical model…

Graphical model representation
[Figure: directed graph linking POLL, ETH and LBW for baby i in the register and baby j in the MCS, with a single MODEL node shared by both datasets; nodes are marked as known or unknown.]
- LBW_i: low birth weight.
- POLL_i: pollution exposure (plus other confounders observed in both datasets).
- ETH_i: ethnicity and smoking; only observed in the MCS.
The same MODEL is assumed to govern both datasets.

Adding in the imputation model
[Figure: the previous graph extended with AGG_i and AGG_j nodes and a second model node, MODEL(imputation), alongside MODEL(LBW).]
- AGG_i: aggregate ethnicity/smoking data for the area of residence of baby i.
- MODEL for imputation of ETH_i in terms of aggregate data and other variables; estimated from the observed ETH_j in the MCS.
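The joint posterior implied by this graph can be sketched as below. The symbols are ours, not the slides':
- β for the parameters of MODEL(LBW);
- γ for the parameters of MODEL(imputation);
- ETH_reg for the unobserved ETH_i of register babies.

For brevity the sketch assumes that the imputation model depends only on AGG; the fuller imputation model in these slides also uses individual-level variables and survey strata. Selection terms are omitted. Summing over the 8 ethnicity-by-smoking categories shows how each register baby contributes to estimating β, even though its ETH_i is unknown.

```latex
p(\beta, \gamma, \mathrm{ETH}_{\mathrm{reg}} \mid \text{data})
  \propto p(\beta)\, p(\gamma)
  \prod_{j \in \mathrm{MCS}} p(\mathrm{LBW}_j \mid \mathrm{POLL}_j, \mathrm{ETH}_j, \beta)\,
                             p(\mathrm{ETH}_j \mid \mathrm{AGG}_j, \gamma)
  \prod_{i \in \mathrm{reg}} p(\mathrm{LBW}_i \mid \mathrm{POLL}_i, \mathrm{ETH}_i, \beta)\,
                             p(\mathrm{ETH}_i \mid \mathrm{AGG}_i, \gamma)

p(\mathrm{LBW}_i \mid \mathrm{POLL}_i, \mathrm{AGG}_i, \beta, \gamma)
  = \sum_{k=1}^{8} p(\mathrm{LBW}_i \mid \mathrm{POLL}_i, \mathrm{ETH}_i = k, \beta)\,
                   p(\mathrm{ETH}_i = k \mid \mathrm{AGG}_i, \gamma)
```

In the MCMC run, the unknown ETH_i can either be sampled alongside β and γ or summed out as in the second line. Both target the same posterior for β and γ.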

Bayesian model
Estimate both:
- the imputation model for missing ethnicity and smoking;
- the outcome model for the association between low birth weight and pollution.
All beliefs about unknown quantities are expressed as probability distributions.
- Prior beliefs (often ignorance) are modified in light of the data → posterior distributions.
The joint posterior distribution of all unknowns is estimated by Markov chain Monte Carlo (MCMC) simulation (WinBUGS software).
The graphical representation of the model guides the MCMC simulation.
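The following is a minimal random-walk Metropolis sampler in Python, to illustrate what "estimated by MCMC simulation" means. It uses a much simpler model than the one in these slides:
- a logistic regression of LBW on a single binary exposure;
- hypothetical counts: 200 of 2,000 exposed and 100 of 2,000 unexposed babies with low birth weight;
- N(0, 10²) priors.

This is not what WinBUGS does internally. WinBUGS builds a sampler node by node from the graph, using Gibbs-type updates where it can. The sketch only shows the general idea of drawing from a posterior by simulation.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_posterior(a, b, y1, n1, y0, n0, prior_sd=10.0):
    """Log posterior (up to a constant) for logit P(LBW) = a + b * exposed.

    y1 of n1 exposed and y0 of n0 unexposed babies have low birth weight;
    a and b have independent N(0, prior_sd^2) priors.
    """
    p0, p1 = expit(a), expit(a + b)
    loglik = (y1 * math.log(p1) + (n1 - y1) * math.log(1.0 - p1)
              + y0 * math.log(p0) + (n0 - y0) * math.log(1.0 - p0))
    logprior = -(a * a + b * b) / (2.0 * prior_sd ** 2)
    return loglik + logprior


def metropolis(y1, n1, y0, n0, n_iter=20000, burn_in=2000, step=0.1, seed=1):
    """Random-walk Metropolis; returns post-burn-in draws of the log odds ratio b."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    current = log_posterior(a, b, y1, n1, y0, n0)
    draws = []
    for it in range(n_iter):
        # Propose a small Gaussian step in both parameters
        a_new = a + rng.gauss(0.0, step)
        b_new = b + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new, y1, n1, y0, n0)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            a, b, current = a_new, b_new, proposed
        if it >= burn_in:
            draws.append(b)
    return draws


draws = metropolis(y1=200, n1=2000, y0=100, n0=2000)
mean_b = sum(draws) / len(draws)
print(f"posterior mean log OR = {mean_b:.3f}, OR = {math.exp(mean_b):.2f}")
```

With these vague priors, the posterior mean of b should be close to the log of the sample odds ratio, log((200 × 1900) / (1800 × 100)) ≈ 0.75.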

Variables in the final models: (1) regression model for low birth weight
Probability that baby i has birth weight under 2.5 kg, modelled in terms of:
- Pollution (NO2 and SO2)
- Ethnicity (White / South Asian / Black / other)
- Smoking during pregnancy (yes/no)
- Social class of mother
- Survey selection strata (for MCS data)
Other variables (mother's weight, height, maternal age, number of previous births, hypertension during pregnancy, …) were not included, because they were not significant in multiple regression or were not confounded with pollution.
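The slides list the covariates but not the exact model. The sketch below assumes an additive logistic regression, with our own symbols:
- e(i), c(i) and s(i) are the ethnicity, maternal social class and MCS selection stratum of baby i, each with one reference level (e.g. White for ethnicity);
- SMK_i indicates smoking during pregnancy;
- I_MCS(i) is 1 for MCS babies and 0 for register babies.

Under that assumption, the outcome model could be written as:

```latex
\operatorname{logit} \Pr(\mathrm{LBW}_i = 1)
  = \beta_0 + \beta_1\,\mathrm{NO}_{2,i} + \beta_2\,\mathrm{SO}_{2,i}
  + \beta^{\mathrm{eth}}_{e(i)} + \beta^{\mathrm{smk}}\,\mathrm{SMK}_i
  + \beta^{\mathrm{sc}}_{c(i)} + I_{\mathrm{MCS}}(i)\,\beta^{\mathrm{str}}_{s(i)}
```

Suppose NO2_i and SO2_i are expressed in units of their interquartile range across England and Wales. Then exp(β_1) and exp(β_2) correspond to the NO2 and SO2 odds ratios in the results table.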

Variables in the final models: (2) imputation model for missing data
Probability that baby i is in one of eight categories:
- ethnicity: 1. White / 2. South Asian / 3. Black / 4. other
- smoking during pregnancy: 1. No / 2. Yes
Modelled in terms of small-area variables for baby i:
- Proportion of the population in each of three ethnic minority categories (South Asian / Black / other)
- Tobacco expenditure
- MCS survey selection strata
…and some individual-level variables for baby i:
- Pollution exposure
- Low birth weight
- Social class, employment status of mother
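The following is a minimal Python sketch of drawing one imputed category for a register baby. It assumes a multinomial-logit model driven only by the area ethnic proportions and a standardised tobacco-expenditure score. The parameter names theta, delta0 and delta1, and their values, are hypothetical. The individual-level predictors and selection strata listed above are omitted for brevity. In the Bayesian model, such draws would be repeated inside the MCMC run rather than made once.

```python
import math
import random

ETHNICITY = ["White", "South Asian", "Black", "Other"]
# The 8 joint categories: (ethnicity, smoked during pregnancy)
CATEGORIES = [(eth, smokes) for eth in ETHNICITY for smokes in (False, True)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def category_probs(minority_props, tobacco_z, theta=1.0, delta0=-1.5, delta1=0.5):
    """Hypothetical multinomial-logit imputation model over the 8 categories.

    minority_props: area proportions (South Asian, Black, Other), all > 0.
    tobacco_z: standardised area tobacco expenditure.
    theta, delta0, delta1: illustrative values; in a real analysis the
    coefficients would be estimated from MCS babies with observed categories.
    """
    props = dict(zip(ETHNICITY[1:], minority_props))
    props["White"] = 1.0 - sum(minority_props)
    smoke_score = delta0 + delta1 * tobacco_z
    scores = [theta * math.log(props[eth]) + (smoke_score if smokes else 0.0)
              for eth, smokes in CATEGORIES]
    return softmax(scores)


def impute(minority_props, tobacco_z, rng):
    """Draw one (ethnicity, smokes) category for a register baby."""
    probs = category_probs(minority_props, tobacco_z)
    return rng.choices(CATEGORIES, weights=probs, k=1)[0]


rng = random.Random(42)
print(impute([0.12, 0.03, 0.02], tobacco_z=0.8, rng=rng))
```

With theta = 1, the imputed ethnicity probabilities reproduce the area proportions exactly. The smoking probability within each ethnic group is then expit(delta0 + delta1 * tobacco_z). A fitted model would generally depart from both.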

Odds ratios (posterior mean, 95% CI)

Data                             | NO2*              | SO2*              | Smoking           | South Asian
Register, ignore confounding     | 1.20 (1.13, 1.27) | 1.03 (1.00, 1.07) | -                 | -
MCS                              | 1.04 (0.89, 1.21) | 1.04 (0.96, 1.12) | 2.00 (1.71, 2.34) | 2.76 (2.14, 3.56)
MCS, ignore selection            | 1.08 (0.94, 1.23) | 1.04 (0.96, 1.12) | 2.00 (1.71, 2.34) | 3.01 (2.42, 3.74)
Register + MCS                   | 0.97 (0.91, 1.03) | 1.01 (0.97, 1.05) | 1.94 (1.80, 2.10) | 2.92 (2.61, 3.26)
Register, adjust for confounding | 0.97 (0.91, 1.04) | 1.01 (0.97, 1.07) | 1.94 (1.76, 2.12) | 2.93 (2.57, 3.33)

*One unit of pollution concentration = interquartile range of pollution concentration across England and Wales.
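The NO2 column can be read with a little arithmetic on the log scale. The following minimal Python sketch does two things:
- it compares the widths of the 95% intervals for NO2;
- it checks whether each interval excludes an odds ratio of 1.

Comparing log-scale widths is our own rough summary of precision, not a method used in these slides.

```python
import math


def log_width(lower, upper):
    """Width of a 95% interval on the log odds ratio scale."""
    return math.log(upper) - math.log(lower)


def excludes_one(lower, upper):
    """True if the interval lies entirely above or below OR = 1."""
    return lower > 1.0 or upper < 1.0


# NO2 rows from the odds-ratio table: (posterior mean, lower, upper)
no2 = {
    "Register, ignore confounding": (1.20, 1.13, 1.27),
    "MCS": (1.04, 0.89, 1.21),
    "Register + MCS": (0.97, 0.91, 1.03),
}

for source, (odds_ratio, lower, upper) in no2.items():
    print(f"{source:30s} OR {odds_ratio:.2f}  log-width {log_width(lower, upper):.3f}"
          f"  excludes 1: {excludes_one(lower, upper)}")

# How many times wider the MCS-only interval is than the combined one
ratio = log_width(0.89, 1.21) / log_width(0.91, 1.03)
print(f"MCS-only interval is {ratio:.1f}x wider than Register + MCS")
```

On the log scale, the MCS-only NO2 interval is roughly 2.5 times wider than the Register + MCS interval, which reflects the gain in power from combining. The register-only estimate that ignores confounding is the only NO2 row in the table whose interval excludes 1.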

Conclusions so far
- No evidence for an association of pollution exposure with low birth weight.
- Combining the datasets can:
  - increase the statistical power of the survey data;
  - alleviate bias due to confounding in the administrative data.
- The selection mechanism of the survey must be allowed for when combining data.

Work in progress
- Sensitivity to different choices for the imputation model:
  - external data (e.g. small-area data) on confounders are not always available.
- More investigation of selection bias, and of different ways of accounting for it.
- Quantify the relative influence of each dataset.
- Other biases, expected to be smaller problems:
  - missing data in MCS;
  - exposure measurement error.
- Distinguish between preterm birth and low full-term birth weight.

Combining aggregate and individual data
Aggregate (ecological) data:
- Administrative data are usually aggregated to preserve confidentiality.
- Making inferences on individual-level risk factors and outcomes from aggregate data risks "ecological bias", caused by:
  - within-area variability of risk factors;
  - confounding, since only a limited number of variables are available.
- This needs appropriate models, and often individual data → survey/cohort data, case-control data.
Combining aggregate and individual data can:
- reduce ecological bias and increase power;
- distinguish contextual effects from individual effects.
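The following is a minimal numerical illustration of the first source of ecological bias, within-area variability of a risk factor. It assumes a hypothetical individual-level logistic model with log odds -3 + 1.5x. Two areas have the same mean exposure. In only one of them does exposure vary within the area, and only there does the average of the individual risks differ from the risk obtained by plugging the area mean into the individual model.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def area_risk(exposures, alpha, beta):
    """Area-level risk: the average of the individual logistic risks."""
    return sum(expit(alpha + beta * x) for x in exposures) / len(exposures)


def naive_area_risk(exposures, alpha, beta):
    """Risk from plugging the area-mean exposure into the individual model."""
    mean_x = sum(exposures) / len(exposures)
    return expit(alpha + beta * mean_x)


alpha, beta = -3.0, 1.5                 # hypothetical individual-level parameters
homogeneous = [0.5] * 10                # everyone has the same exposure
heterogeneous = [0.0] * 5 + [1.0] * 5   # same mean (0.5), varies within the area

for name, xs in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    print(f"{name:14s} averaged {area_risk(xs, alpha, beta):.4f}"
          f"  plug-in {naive_area_risk(xs, alpha, beta):.4f}")
```

For the heterogeneous area, the averaged risk is larger than the plug-in risk, because the logistic curve is convex over this range of risks. An analysis that sees only area means cannot distinguish the two areas.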
