Health effects modeling Lianne Sheppard University of Washington.

Presentation transcript:


Outline
- Introduction
- Conceptual overview for health effect studies
  - Disease and risk model
  - Exposure and measurement models
- Health effects study designs and relationship to exposure assessment
  - Measured exposure focused through the lens of study design
- Challenges in health modeling
- Example 1: Cohort study – implications of predicted exposure
- Example 2: Time series study – understanding the estimated health effect parameter
- Discussion

Introduction
- Epidemiological study interpretation
  - Estimates of association in the context of the particular study
    - Study population
    - Health outcome
    - Exposure metric and data
    - Confounders and other adjustment variables
    - Study design
  - Can’t infer causality from observational studies
- Goal: Understand properties of health effect estimates in epidemiological studies
- Context: Health effects of ambient air pollution
- My interests:
  - Impact and implications of specific study designs in the context of the exposure data – study design as a lens to focus the data
  - Role of exposure assessment and data on health effect estimates

Conceptual framework for health effect studies
- Disease model
  - Relates the true environmental exposure to the disease outcome
  - Includes the health effect parameter(s) of interest
- Exposure model
  - Describes the distribution of exposure over space, time, and individuals
- Measurement model
  - Relates measured exposures to the true unknown exposure
- Study design
  - Sources of exposure variation should frame the design of any epidemiological study
  - Limitations in exposure assessment that will lead to measurement error bias must also be considered

Disease model
- Relates the exposure to the disease outcome, e.g. $E(Y_{it}) = \exp(X^P_{it}\beta + Z_{it}\gamma)$ for the outcome $Y_{it}$ on individual i at time t, personal exposures $X^P_{it}$, and health effect parameter β
  - β is the parameter of interest – “toxicity”
- Also includes
  - Confounders and other adjustment variables ($Z_{it}$)
  - A dependence model (as needed)
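To make the log-linear form of the disease model concrete, the sketch below evaluates the mean outcome and converts β into a relative risk for a chosen exposure increment. The function names are hypothetical and any numbers passed to them are illustrative assumptions, not values from the presentation.

```python
import math


def expected_outcome(x_personal, z, beta, gamma):
    """Mean outcome under the log-linear model exp(x*beta + sum_k z_k*gamma_k)."""
    linear_predictor = x_personal * beta + sum(zk * gk for zk, gk in zip(z, gamma))
    return math.exp(linear_predictor)


def relative_risk(beta, delta):
    """Multiplicative change in the mean outcome for a delta-unit exposure increase."""
    return math.exp(beta * delta)
```

Because the model is multiplicative, the ratio of two expected outcomes that differ only in exposure equals `relative_risk(beta, delta)` whatever the covariate values are.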

Risk model
- The disease model includes the risk model – a model to reflect risk over time
- Under an expanded risk model, the disease model uses a weight function β(t;s), which denotes the influence of exposure at time s on risk at time t

Risk model examples
- Current risk: Risk at time t is affected by exposure at time t
- Cumulative constant risk: Risk is determined by cumulative exposure during the previous m days
- Lagged constant risk: Risk is determined by cumulative exposure during the previous m days, lagged n days
- Cumulative time-varying risk: Risk varies over time and is determined by cumulative exposure during the previous m days
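The risk model examples can be read as different ways of summarizing a daily exposure series before it enters the disease model. The sketch below is one plausible reading, not the formulas from the original slide: it assumes a list `x` of daily exposures, an m-day window that ends `lag` days before day t, and hypothetical function names.

```python
def current_exposure(x, t):
    """Current risk: only the exposure on day t matters."""
    return x[t]


def cumulative_exposure(x, t, m, lag=0):
    """Constant-weight sum over the m-day window ending `lag` days before day t.

    lag=0 gives the cumulative constant risk metric; lag=n gives the lagged one.
    """
    end = t - lag
    start = end - m + 1
    if start < 0:
        raise ValueError("not enough exposure history for this window")
    return sum(x[start:end + 1])


def weighted_exposure(x, t, weights):
    """Time-varying risk: weights[k] multiplies the exposure k days before day t."""
    if t - len(weights) + 1 < 0:
        raise ValueError("not enough exposure history for these weights")
    return sum(w * x[t - k] for k, w in enumerate(weights))
```

The constant-weight cases are special cases of `weighted_exposure` with all weights equal to 1, which is why a time-varying risk model is the most general of the four.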

Basic personal air pollution exposure model (e.g. particulate matter – PM)
- Total personal exposure = Non-ambient source exposure + Fraction of ambient × Ambient source concentration:
  $X^P_{it} = X^N_{it} + \alpha_{it} C_{it}$, for person i at time t
- Ambient source exposure: $X^A_{it} = \alpha_{it} C_{it}$
- We can measure $C_{it}$ and $X^P_{it}$
- Assume ambient and non-ambient sources are independent

Exposure model component: α
- Fraction of ambient concentration experienced as exposure: $\alpha_{it} = o_{it} + (1 - o_{it}) F_{\mathrm{inf}(it)}$
  - $o_{it}$ is the fraction of time spent outdoors
  - $F_{\mathrm{inf}(it)}$ is the infiltration efficiency (building filter)
- Varies by season, person/building, region, species (or characteristic)
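The two exposure equations compose directly: α is computed from time outdoors and infiltration efficiency, then scales the ambient concentration before the non-ambient contribution is added. A minimal sketch, with hypothetical function and argument names:

```python
def ambient_attenuation(frac_outdoors, infiltration_eff):
    """alpha = o + (1 - o) * F_inf: fraction of ambient concentration experienced."""
    return frac_outdoors + (1.0 - frac_outdoors) * infiltration_eff


def ambient_exposure(alpha, ambient_conc):
    """Ambient source exposure X^A = alpha * C."""
    return alpha * ambient_conc


def personal_exposure(nonambient, alpha, ambient_conc):
    """Total personal exposure X^P = X^N + alpha * C."""
    return nonambient + ambient_exposure(alpha, ambient_conc)
```

Note that α equals 1 for someone who is always outdoors and equals the infiltration efficiency for someone who is always indoors.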

Measurement model
- Needed because typically only measurements of $C_{it}$ are available while $X^P_{it}$ or $X^A_{it}$ are of interest
- The measurement model defines sources of variation that:
  - The data don’t capture (“Berkson”)
  - The data capture but aren’t of interest (“classical”)
- Measurement models
  - Are needed to avoid bias
  - Are assumed to not provide additional information about health effects
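The practical difference between the two error types shows up in a small simulation. The sketch below is an illustration under simplifying assumptions that are not from the presentation: a linear (rather than log-linear) outcome model, unit-variance normal exposures and errors, and hypothetical function names. Under classical error the regression slope is attenuated toward zero; under Berkson error it stays close to the true β.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n=20000, beta=1.0, seed=1):
    """Return (classical_slope, berkson_slope) from regressing y on the measurement."""
    rng = random.Random(seed)

    # Classical: measurement = truth + noise (variation the data capture
    # but that is not of interest).
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    w_classical = [x + rng.gauss(0, 1) for x in x_true]
    y_classical = [beta * x + rng.gauss(0, 0.5) for x in x_true]

    # Berkson: truth = measurement + noise (variation the data do not capture).
    w_berkson = [rng.gauss(0, 1) for _ in range(n)]
    x_berkson = [w + rng.gauss(0, 1) for w in w_berkson]
    y_berkson = [beta * x + rng.gauss(0, 0.5) for x in x_berkson]

    return ols_slope(w_classical, y_classical), ols_slope(w_berkson, y_berkson)
```

With equal exposure and error variances, the classical slope should land near half of β, while the Berkson slope should land near β with a larger spread.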

Health effect study designs – Ambient source air pollution exposure
- Rely most on short-term temporal exposure variation:
  - Panel studies
  - Time series studies
  - Case-crossover studies
- Rely most on spatial exposure variation:
  - Cohort studies
  - Migration studies
- Rely on either or both temporal and spatial variation:
  - Medium term longitudinal studies
  - Cross-sectional studies

Panel studies
- Enroll a panel of subjects and observe them repeatedly over time
- Strengths
  - Possible to collect comprehensive personal, home indoor, and home outdoor exposure data on every subject
  - Uniquely suited to study personal exposure effects
  - Can directly measure health outcomes
- Challenges
  - High effort for a limited number of subjects
  - Power limited for affordable studies and rare outcomes
  - Significant feasibility issues need to be overcome
  - Can be very difficult to detect small effects because of the large heterogeneity in individual responses and uneven compliance with the study protocol (medication use, data collection)
  - Heterogeneity between subjects can swamp the small effects of air pollution
  - Analysis approach can affect conclusions, particularly with typical small panel sizes

Time series studies
- Estimate the association between time-varying ambient concentration and time-varying population event counts
- Rely on temporal exposure variation
- Strengths
  - Simple and inexpensive (use administrative data)
  - Powerful: can target huge populations
  - Appear uniquely suited to estimate acute health effects of ambient pollutants for rare events
  - Bias due to spatial variation in PM is likely to be small
- Challenges
  - Sources of bias not well understood (an ecological design, so ecological bias is possible)
    - However, individuals are crossed with time, so ecological biases are much less likely to dominate than when individuals are nested
  - Results can be sensitive to modeling choices (and software)
  - Confounding removed through modeling
  - Don’t capture chronic effects, non-ambient exposures
  - Don’t estimate toxicity (rather estimate attenuated toxicity, attenuated for building characteristics and population behavior)

Case-crossover studies
- Assess acute effects of air pollution by comparing exposures on the day with an event (index day) to days without the event (referent days)
- Essentially time series studies with a different approach to confounding control:
  - Confounding controlled by matching (and modeling) rather than modeling alone
- Some approaches to referent selection lead to biased health effects (overlap bias)
  - Time-stratified referent selection recommended
  - Commonly used symmetric bidirectional referents are subject to overlap bias
- Similar scientific considerations as time series studies
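Time-stratified referent selection can be sketched in a few lines. The slide does not say how the strata are defined; the version below assumes a common choice, in which the referents are all other days in the same calendar month that fall on the same day of the week as the index day. The function name is hypothetical.

```python
import calendar
from datetime import date


def time_stratified_referents(index_day):
    """Other days in the index day's month that share its day of the week."""
    days_in_month = calendar.monthrange(index_day.year, index_day.month)[1]
    referents = []
    for day in range(1, days_in_month + 1):
        candidate = date(index_day.year, index_day.month, day)
        if candidate != index_day and candidate.weekday() == index_day.weekday():
            referents.append(candidate)
    return referents
```

Because the strata are fixed in advance and do not depend on when the event occurred, every day in a stratum has the same referent set, which is the property that avoids overlap bias.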

Cohort studies
- Follow subjects over time to relate some measure of usual exposure to health events
- Rely on variation in exposure over space (shared exposure) and individual (total exposure, including unshared components)
- Incomplete exposure ascertainment implies
  - Need to rely on an exposure prediction model
  - Because of limited exposure assessment, these are semi-individual studies
    - Can’t rule out ecological biases
    - Individuals are nested within areas
- Unclear how to best accumulate exposure over time. What are the implications? e.g.,
  - Average exposure
  - Time-varying risk model

Challenges in analysis and interpretation of epidemiological studies – Bias
- Air pollution health effects are small and thus can be easily swamped by even small biases
- Confounding is
  - A major source of bias
  - Orders of magnitude larger than the air pollution effect of interest
- Other less well understood issues
  - Exposure vs. concentration and attenuation of ambient exposure (recall ambient exposure = ambient concentration × α)
    - Loss of information
    - Bias
    - Policy implications
  - Specification, cross-level, and overlap biases
  - Model selection

Small Effects and Large Confounders: Air pollution signal is an order of magnitude smaller than confounder effects (time series study example) Courtesy of Francesca Dominici and NMMAPS

Challenges in analysis and interpretation of epidemiological studies – Uncertainty
- Uncertainty of the model: key features
  - Linearity of the exposure-response model
  - Which single or distributed lags in the risk model?
  - Multiple pollutants
  - Confounder control
- Exposure data, metrics, and measurement error
  - How does measured “exposure” relate to true exposure?
- Additional model selection issues
  - Model selection process often not disclosed
  - Model averaging as an alternative

Exposure data considerations for health effects analyses
- Considerations in study planning:
  - Source of variation needed for study design
  - Measurements available or feasible to collect
  - Predicted exposure required?
- Interpretation of estimated health effects depends on exposure data used in the analysis
  - Example 1: Effect of prediction on cohort study health effect estimates
  - Example 2: Time series study health effects estimates: Interpretation and relevant features of personal exposure when concentration is used in the analysis

Cohort study and predicted exposure example: Simulation set-up
- Realistic setting:
  - Monitored PM2.5 data
  - Outcome model based on cardiovascular events using published estimates (Women’s Health Initiative, Miller et al 2007)
  - Los Angeles geography
- Compare exposure prediction models: Nearest monitor vs. universal kriging
- Simulation structure
  - Simulate spatially dependent exposure for subject residences and monitoring sites
    - Explore a variety of exposure models
  - Use true exposure to generate the health outcome data
  - Predict exposure from monitoring site data only
  - Estimate health effects conditioned on modeled (and true) exposure
- Purpose: To investigate how prediction of pollutants over space affects estimated relative risk in a cohort study
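Of the two prediction models compared, nearest-monitor assignment is simple enough to sketch directly. The version below assumes planar (x, y) coordinates and Euclidean distance, with a hypothetical function name; it is not the code used in the study. Universal kriging, the alternative, additionally requires a fitted mean and covariance model and is not shown.

```python
import math


def nearest_monitor_prediction(residences, monitors, monitor_values):
    """Assign each residence the concentration measured at its closest monitor."""
    predictions = []
    for rx, ry in residences:
        distances = [math.hypot(rx - mx, ry - my) for mx, my in monitors]
        nearest = distances.index(min(distances))  # first monitor wins on ties
        predictions.append(monitor_values[nearest])
    return predictions
```

Every residence in a monitor's catchment gets the same predicted value, so this predictor produces a piecewise-constant surface and ignores any spatial trend between monitors.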

Cohort study and predicted exposure example: Simulation study area

Underlying PM2.5 AQS monitoring data
- PM2.5 Air Quality Standard (AQS) monitors
  - 22 monitors in five counties in greater Los Angeles
  - PM2.5 concentration in year 2000 (black < red < green < blue)
- Spatial analysis to estimate parameters:
  - Mean (using geographic covariates)
  - Variance
  - Range
  - Partial sill
  - Nugget

Exposure (concentration) models
- Multivariate normal distribution with spatial autocorrelation using assumed mean and covariance model parameters
- Realizations of PM2.5 at 2,000 residences and 22 monitoring sites
- Five underlying true exposure models (TEM) using different spatial structures, listed with their source of spatial variability and initial parameters:
  - TEM 1 (Geographic characteristics): Short; Small; mean 2nd order
  - TEM 2 (Medium range): range Middle; partial sill Large; nugget Small; mean Constant
  - TEM 3 (Measurement error only): range Short; partial sill Small; nugget Large; mean Constant
  - TEM 4 (Short range): range Shortest; partial sill Large; nugget Small; mean Constant
  - TEM 5 (Long range): range Long; partial sill Large; nugget Small; mean Constant
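Drawing one realization from a spatially autocorrelated multivariate normal can be sketched as follows. The transcript does not name the covariance family, so the exponential form used here is an assumption, as are the function names; the pure-Python Cholesky factorization is only practical for a small number of sites, not the 2,022 locations in the study.

```python
import math
import random


def exp_covariance(sites, partial_sill, range_, nugget):
    """Exponential covariance: partial_sill * exp(-h / range_), plus nugget at h = 0."""
    n = len(sites)
    cov = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(sites):
        for j, (xj, yj) in enumerate(sites):
            h = math.hypot(xi - xj, yi - yj)
            cov[i][j] = partial_sill * math.exp(-h / range_)
            if i == j:
                cov[i][j] += nugget
    return cov


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a positive definite matrix a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def simulate_field(sites, mean, partial_sill, range_, nugget, rng):
    """One multivariate normal draw with constant mean at the given sites."""
    lower = cholesky(exp_covariance(sites, partial_sill, range_, nugget))
    z = [rng.gauss(0, 1) for _ in sites]
    return [mean + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i in range(len(sites))]
```

In these terms, a long range with a large partial sill and small nugget gives a smooth, highly predictable surface, while a large nugget with a small partial sill gives values that are close to independent from site to site.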

Examples of spatial surfaces
- Spatial surface of five exposure models, one realization of each (lighter = higher concentration)
[Figure: panels for Geographic characteristics, Medium range, Measurement error only, Short range, Long range]

True and predicted PM2.5
- Relationship between true and predicted PM2.5 at 2,000 individual sites in one simulation
[Figure: panels of True vs. Nearest, True vs. Kriged, and Nearest vs. Kriged for each exposure model: Geographic characteristics, Medium range, Measurement error only, Short range, Long range]
- Observations:
  - Better association between predictions and true values when there is more spatial structure
  - Spatial structure can be
    - In the mean model (TEM 1)
    - In the variance model (TEM 5)
  - Models 1 and 2 were based on different estimated fits to the same data, with model 1 allowing a spatially varying mean and model 2 assuming a constant mean. Model 1 appears to capture spatial structure better.

Health effect estimates – Geographical characteristics exposure
- Comparison of β estimates for true and modeled PM2.5
[Figure: panels comparing β estimates for True exposure vs. nearest neighbor, True exposure vs. kriged, and Nearest neighbor vs. kriged, each with the x=y line and best fit line]

Health effect estimates – Exposures with little spatial structure
[Figure: panels for Measurement error only and for Short range (low spatial correlation)]

Health effect estimates – Spatially dependent exposures only in the variance model
[Figure: panels for Medium range (medium spatial correlation) and for Long range (high spatial correlation)]

Health effect estimates – Summary
[Table: columns are True exposure, Fitted exposure, Bias², Variance, Mean squared error, Coverage probability. Rows give the fitted exposures True, Nearest, and Kriging for each true exposure: Geographical characteristics; Medium range (medium spatial correlation); Measurement error; Short range (low spatial correlation); Long range (high spatial correlation)]
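The summary columns are standard simulation performance measures. The sketch below shows how they could be computed from a set of simulated β estimates and their standard errors; it assumes normal-theory 95% intervals (estimate ± 1.96 × standard error) and a hypothetical function name, and it is not the authors' code.

```python
import statistics


def summarize_estimates(estimates, std_errors, true_beta, z=1.96):
    """Bias squared, variance, MSE, and interval coverage across simulations."""
    mean_estimate = statistics.fmean(estimates)
    bias_sq = (mean_estimate - true_beta) ** 2
    # Population variance, so that mse == bias_sq + variance holds exactly.
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(b - true_beta) ** 2 for b in estimates])
    covered = [abs(b - true_beta) <= z * se for b, se in zip(estimates, std_errors)]
    coverage = sum(covered) / len(covered)
    return {"bias_sq": bias_sq, "variance": variance, "mse": mse, "coverage": coverage}
```

The decomposition MSE = Bias² + Variance explains how an exposure model can give less biased health effect estimates and still have a worse MSE: the extra variance outweighs the bias reduction.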

Conclusions: Impact of predicted exposure on cohort study health effect estimates
- Exposure prediction
  - Kriging prediction gave better estimates of PM2.5 than nearest monitor prediction
    - Less biased
    - Generally smaller prediction error
  - Kriging predictions were less variable than nearest monitor predictions
- Health effect estimates
  - Kriged PM2.5 as compared to nearest monitor PM2.5 had:
    - Better coverage (in most cases)
    - Less biased health effect estimates
    - More variable health effect estimates (and thus worse MSE)
  - Underlying exposure models with higher spatial dependence had better coverage
    - Results more consistent with prior expectations for a Berkson measurement error model
  - Less than 95% coverage with predicted exposure
    - Not incorporating uncertainty of prediction in this analysis

Discussion: Impact of predicted exposure on cohort study health effect estimates
- Other lessons learned:
  - More dense monitoring doesn’t change these results
    - Only 22 monitor measurements
    - Same results for up to 42 monitors
  - Not all the kriging results were believable
    - Spatial statistics is iterative, uses judgment and thus is not well suited to our nonjudgmental approach to the simulations
    - Some realizations of kriging parameter estimates were unacceptably large
  - Universal kriging performed better on average than ordinary kriging
    - Fewer poor estimates of kriging parameters, even when the true exposure had a constant mean
    - Better coverage for health effects
- Spatial pollution structure best suited to modeling and good health effect estimates:
  - High spatial variability
  - Spatial variability characterized in the mean model
  - Spatial variability in the variance model should have long range and a smaller partial sill so there is relatively small prediction error variance

Acute Air Pollution Health Effects: Sources of Bias in Time Series Studies
- Use of concentration when exposure is of interest
  - Not estimating toxicity
  - Not accounting for time-varying ambient attenuation
- Substitution of measured for true concentration
  - Classical measurement error
  - Dropping the within-day component of exposure variation by using central site concentration measurements
- Specification bias (small because the effects are small)
- Cross-level bias (inference on effects in individuals when the data only come from groups)
- Inadequate adjustment for covariates
  - Uncontrolled confounding
  - Multipollutant exposures

Time series study example: Impact of aspects of personal exposure – Set-up
- We conducted simulation studies to assess the behavior of time series study estimates under differing exposure and measurement models
- Assume
  - Acute risk model (same day exposure only)
  - Total personal exposure affects true disease risk
  - Only ambient concentration is measured and used in the time series study analysis
- Simulate individual data; analyze using a time series study design

Time series study example: Impact of aspects of personal exposure – Set-up
- Assume a true individual-level disease model with personal exposure
- Personal exposure model: $X^P_{it} = [\text{non-ambient source}]_{it} + \alpha_{it} C_{it}$
- Generate NT personal exposures and binary events for N=100,000 individuals on T=1,000 days
- Use a time series study analysis with ambient concentration measurements
- Assess the impact of
  - Major independent non-ambient exposure contributions
  - Seasonally varying ambient attenuation α
  - Varying characterizations of daily exposure or concentration measurements

Time series study example: Impact of aspects of personal exposure – Results
- Time series studies estimate αβ – toxicity times ambient attenuation
- Non-ambient source exposure doesn’t affect estimates when it is independent of ambient concentration
- Variation in α affects time series study results when it is seasonal and correlated with ambient concentration (supported by data – see next slide):
  - Larger estimates if α is high when concentration is high
  - Smaller estimates if α is low when concentration is high
- Average concentration from multiple monitors improves estimates slightly (reduction in classical measurement error)
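The direction of these results can be reproduced without any random simulation. The sketch below is a noise-free illustration, not the authors' individual-level simulation: it assumes a log-linear mean event rate driven by ambient exposure $\alpha_t C_t$, a sinusoidal seasonal pattern, and hypothetical parameter values (β = 0.01, mean α = 0.5). It regresses the log mean rate on concentration alone and compares the slope with αβ.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def fitted_concentration_slope(conc, alpha, beta, intercept=-5.0):
    """Slope from regressing log mean rate = intercept + beta*alpha_t*C_t on C_t."""
    log_mean = [intercept + beta * a * c for a, c in zip(alpha, conc)]
    return ols_slope(conc, log_mean)


# One year of daily data with a seasonal cycle (all values hypothetical).
season = [math.sin(2 * math.pi * t / 365) for t in range(365)]
conc = [20 + 10 * s for s in season]
beta = 0.01

slope_constant = fitted_concentration_slope(conc, [0.5] * 365, beta)
slope_high_with_conc = fitted_concentration_slope(
    conc, [0.5 + 0.2 * s for s in season], beta)
slope_low_with_conc = fitted_concentration_slope(
    conc, [0.5 - 0.2 * s for s in season], beta)
```

With constant α the slope equals αβ. When α peaks in the high-concentration season the slope is larger than the mean α times β, and when α dips in that season the slope is smaller, even though β and the mean of α are the same in all three cases.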

Regression Coefficients for CVD-Related Hospital Admissions vs. Ambient PM10
[Figure: CVD coefficient plotted against central air conditioning (%), for summer peaking cities and winter peaking cities; annotation: ↑ => smaller summer α]
Source: Janssen N, Schwartz J, Zanobetti A, Suh H (2002). Environ. Health Perspect.
Slide courtesy of Doug Dockery

Time series study example: Impact of aspects of personal exposure – Summary
- Measurements’ effect on health effect parameter interpretation:
  - Models with concentration as the predictor don’t estimate toxicity alone
  - When the disease model has a simple form, e.g. $E(Y) = \exp(X^A \beta) = \exp(C \alpha \beta)$:
    - β is toxicity
    - Assuming $X^A = C\alpha$, the disease model with ambient concentration has parameter αβ
  - Differences between estimates of αβ can be due to variations in α (e.g. due to season, region or individual)
- Huge policy implications that variation in time series study health effect estimates is not (only) toxicity

Time series study example: Impact of aspects of personal exposure – Discussion
- Ambient attenuation (α) is not just measurement error
  - In models with concentration as the predictor, it changes the interpretation of the estimated health effect parameter (not just toxicity)
- α has structure that varies by season, region, person, species (due to e.g. size, reactivity)
- Averaging exposure over time or area averages over α
- α is not measured and its properties (e.g. seasonality, population patterns) are not well understood – an important area for exposure assessment research

Discussion – Health modeling in the context of exposure data
- These two examples illustrate ways study design and exposure data influence
  - The health effect parameters estimated
  - The characteristics of the health effect estimates
- Design of choice depends on:
  - Health outcome of interest
  - Exposure characteristics of interest (e.g. is exposure usual or unusual?)
  - What sources of variation in exposure do available exposure data capture?
  - If an exposure prediction model is needed, are there sufficient data to produce a good model that captures the key sources of variation?
  - Feasibility

Discussion – Other research directions
- Link health effect parameters from acute and chronic exposures
- Ascertain time-varying risk in cohort studies
- Incorporation of complex risk models into policy estimates
- Effect of exposure structure on estimates in single vs. distributed lag models
- Multipollutant exposures
- More complete estimates of uncertainty. Uncertainty due to:
  - Model selection
  - Exposure assessment and predicted exposure
  - Form of the distributed risk model
  - Confounder selection
  - Subgroup selection