Exposure Prediction and Measurement Error in Air Pollution and Health Studies Lianne Sheppard Adam A. Szpiro, Sun-Young Kim University of Washington CMAS.

Slides:



Advertisements
Similar presentations
Estimating health effects of air pollution: challenges ahead Francesca Dominici SAMSI Workshop September
Advertisements

Spatial point patterns and Geostatistics an introduction
Sources and effects of bias in investigating links between adverse health outcomes and environmental hazards Frank Dunstan University of Wales College.
Introduction to Smoothing and Spatial Regression
Kriging.
Uncertainty and confidence intervals Statistical estimation methods, Finse Friday , 12.45–14.05 Andreas Lindén.
The General Linear Model Or, What the Hell’s Going on During Estimation?
Linear regression models
U.S. EPA Office of Research & Development October 30, 2013 Prakash V. Bhave, Mary K. McCabe, Valerie C. Garcia Atmospheric Modeling & Analysis Division.
Basic geostatistics Austin Troy.
Gaussian process emulation of multiple outputs Tony O’Hagan, MUCM, Sheffield.
Critical Issues of Exposure Assessment for Human Health Studies of Air Pollution Michelle L. Bell Yale University SAMSI September 15, 2009.
Multiple Linear Regression Model
1-1 Regression Models  Population Deterministic Regression Model Y i =  0 +  1 X i u Y i only depends on the value of X i and no other factor can affect.
Spatial Interpolation
FOUR METHODS OF ESTIMATING PM 2.5 ANNUAL AVERAGES Yan Liu and Amy Nail Department of Statistics North Carolina State University EPA Office of Air Quality,
Applied Geostatistics
SAMPLING DISTRIBUTIONS. SAMPLING VARIABILITY
Model Choice in Time Series Studies of Air Pollution and Health Roger D. Peng, PhD Department of Biostatistics Johns Hopkins Blomberg School of Public.
Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design Mark Delorey Joint work with F. Jay Breidt and Jean Opsomer September 8, 2005.
Ordinary Kriging Process in ArcGIS
Applications of Nonparametric Survey Regression Estimation in Aquatic Resources F. Jay Breidt, Siobhan Everson-Stewart, Alicia Johnson, Jean D. Opsomer.
1 Practical issues and tools for modeling temporal and spatio-temporal trends in atmospheric pollutant monitoring data Paul D. Sampson Department of Statistics.
Method of Soil Analysis 1. 5 Geostatistics Introduction 1. 5
Introduction to Regression Analysis, Chapter 13,
For Better Accuracy Eick: Ensemble Learning
Correlation & Regression
Regression and Correlation Methods Judy Zhong Ph.D.
Inference for regression - Simple linear regression
Chapter 9 Statistical Data Analysis
Modeling Menstrual Cycle Length in Pre- and Peri-Menopausal Women Michael Elliott Xiaobi Huang Sioban Harlow University of Michigan School of Public Health.
Measurement Error.
Modeling errors in physical activity data Sarah Nusser Department of Statistics and Center for Survey Statistics and Methodology Iowa State University.
Air Quality Health Risk Assessment – Methodological Issues and Needs Presented to SAMSI September 19, 2007 Research Triangle Park, NC Anne E. Smith, Ph.D.
TWO-STAGE CASE-CONTROL STUDIES USING EXPOSURE ESTIMATES FROM A GEOGRAPHICAL INFORMATION SYSTEM Jonas Björk 1 & Ulf Strömberg 2 1 Competence Center for.
Spatial Statistics Jonathan Bossenbroek, PhD Dept of Env. Sciences Lake Erie Center University of Toledo.
Geo479/579: Geostatistics Ch12. Ordinary Kriging (1)
Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE.
CMAS special session Oct 13, 2010 Air pollution exposure estimation: 1.what’s been done? 2.what’s wrong with that? 3.what can be done? 4.how and what to.
Geographic Information Science
Health effects modeling Lianne Sheppard University of Washington.
Spatial Statistics in Ecology: Continuous Data Lecture Three.
Managerial Economics Demand Estimation & Forecasting.
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Spatial Interpolation III
Section 10.1 Confidence Intervals
Predicting Long-term Exposures for Health Effect Studies Lianne Sheppard Adam A. Szpiro, Johan Lindström, Paul D. Sampson and the MESA Air team University.
9.3 and 9.4 The Spatial Model And Spatial Prediction and the Kriging Paradigm.
Analysis Overheads1 Analyzing Heterogeneous Distributions: Multiple Regression Analysis Analog to the ANOVA is restricted to a single categorical between.
Office of Research and Development National Exposure Research Laboratory, Atmospheric Modeling and Analysis Division Office of Research and Development.
Missing Values Raymond Kim Pink Preechavanichwong Andrew Wendel October 27, 2015.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 11: Models Marshall University Genomics Core Facility.
The effect of variable sampling efficiency on reliability of the observation error as a measure of uncertainty in abundance indices from scientific surveys.
Case Selection and Resampling Lucila Ohno-Machado HST951.
Exposure Assessment for Health Effect Studies: Insights from Air Pollution Epidemiology Lianne Sheppard University of Washington Special thanks to Sun-Young.
Machine Learning 5. Parametric Methods.
Technical Details of Network Assessment Methodology: Concentration Estimation Uncertainty Area of Station Sampling Zone Population in Station Sampling.
INTEGRATING SATELLITE AND MONITORING DATA TO RETROSPECTIVELY ESTIMATE MONTHLY PM 2.5 CONCENTRATIONS IN THE EASTERN U.S. Christopher J. Paciorek 1 and Yang.
Geo479/579: Geostatistics Ch12. Ordinary Kriging (2)
Matrix Models for Population Management & Conservation March 2014 Lecture 10 Uncertainty, Process Variance, and Retrospective Perturbation Analysis.
Capture-recapture Models for Open Populations “Single-age Models” 6.13 UF-2015.
- 1 - Preliminaries Multivariate normal model (section 3.6, Gelman) –For a multi-parameter vector y, multivariate normal distribution is where  is covariance.
The “Big Picture” (from Heath 1995). Simple Linear Regression.
Bias-Variance Analysis in Regression  True function is y = f(x) +  where  is normally distributed with zero mean and standard deviation .  Given a.
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Inference about the slope parameter and correlation
Why Model? Make predictions or forecasts where we don’t have data.
Statistical Methods for Model Evaluation – Moving Beyond the Comparison of Matched Observations and Output for Model Grid Cells Kristen M. Foley1, Jenise.
Paul D. Sampson Peter Guttorp
Task 6 Statistical Approaches
Presentation transcript:

Exposure Prediction and Measurement Error in Air Pollution and Health Studies Lianne Sheppard Adam A. Szpiro, Sun-Young Kim University of Washington CMAS Special Session, October 13, 2010

Introduction Most epidemiological studies assess associations between air pollutants and a disease outcome by estimating a health effect (e.g. regression parameter such as a relative risk): –A complete set of pertinent exposure measurements is typically not available  Need to use an approach to assign (e.g. predict) exposure It is important to account for the quality of the exposure estimates in the health analysis  Exposure assessment for epidemiology should be evaluated in the context of the health effect estimation goal Focus of this talk: Exposure measurement error in cohort studies 2

Typical Approach for Air Pollution Epidemiology Studies 1.Assign (or predict, estimate) exposure as accurately as possible 2.Plug in exposure estimates into the disease model; estimate health effects Challenge – exposure measurement error –Health effect estimate is affected by the nature and quality of the exposure assessment approach –Health effect estimate may be Biased More (or less) variable –Typical analysis does not account for uncertainty in exposure prediction  inference not correct

Measurement Error Error in the outcome –Standard part of regression Models don’t explain all the variation in health outcomes –Explicitly incorporated: Y = β 0 + Xβ X + ε Measurement error in the exposure –Not a routine part of regression –Two general classes: Berkson – “measure part of the true exposure” Classical – “measure the true exposure plus noise” –Has an impact on health effect estimates, typically: Berkson – unbiased but more variable Classical – biased and (more or) less variable Often the exposure measurement error structure will have features of both types 4

Outcome Error Only “true outcome is model + error”

Classical Measurement Error “measure true exposure + noise”

Berkson Measurement Error “measure part of the true exposure”

“Plug-in Exposure” Health Effect Estimates Typical exposure assignment approaches –Time series studies: Daily average of all regulatory monitor measurements in a geographic area –Cohort studies: Predicted long-term average concentration for each subject based on a model (kriging, land use regression) or the nearest monitor Health effect regression models that ignore exposure assignment approach can be (but aren’t always) misleading. Impact depends on –Study design Type of study – focus on temporal or spatial variability? Alignment of monitoring and subject networks? Sample sizes –Underlying exposure distribution –Exposure assignment approach and quality Research is needed to define the best criteria

Impact on Time Series Study Results: Average Concentration vs. Personal Exposure Measurement error comes from a mixture of sources; some are Berkson and unlikely to cause bias –Berkson: Non-ambient source exposure doesn’t affect estimates when it is independent of ambient concentration; –Classical: Average concentration from multiple representative monitors gives better results (reduction in classical measurement error) –Unknown impact: Siting of regulatory monitors, particularly for pollutants with strong spatio-temporal structure Differences between health effect estimates in different studies may be driven by variations in population exposures –Parameter misalignment: Different health parameter due to replacing exposure with concentration Behaviors affecting population exposure vary by metropolitan areas Impact of monitor siting: Spatially homogeneous pollutants are not as sensitive to monitor locations  Some components may be very sensitive to monitor siting References: Zeger et al 2000; Sheppard et al 2005; Sarnat et al 2010

10 Impact on Cohort Study Results: Individual Exposure Predictions with Spatially Misaligned Data Cohort study disease model relates individual exposure to individual disease outcomes Exposure data are “spatially misaligned” in the cohort study setting –Spatial misalignment occurs when exposure data are not available at the locations of interest for epidemiology Air pollution exposures are typically predicted from misaligned data using –Nearest monitor interpolation –GIS covariate regression (land use regression) –Interpolation by geostatistical methods (kriging) –Semi-parametric smoothing Measurement error from predicted exposures can be decomposed into two parts: –Berkson-like –Classical-like

Exposure Surface Prediction True Exposure: XPredicted Exposure: W

12 Impact on Cohort Study Results: Measurement Error from Spatially Misaligned Predictions Measurement error structure is complex –Not purely classical or Berkson Berkson-like component results from information lost in smoothing (i.e. predictions are smoother than data) Classical-like component is related to uncertainty in estimating the exposure model parameters  Standard correction approaches are not appropriate Measurement error might be less of a problem when the exposure is more predictable. Depends on: –Good spatial structure in the underlying exposure surface Spatially varying mean structure Longer range (i.e. large scale spatial correlation) Small nugget (not much local variation left over) –The availability of data to capture this structure Measurements that represent the exposure variability Comparability of the subject and monitor locations

13 Health Effect Estimates Example – The Longer the Range the Better the Performance True exposure Fitted exposure (R 2 ) Bias 2 Variance Mean square error Coverage probability of 95% confidence interval Least predictable (shortest range) True Nearest Kriging (0) True Nearest Kriging (.20) True Nearest Kriging (.40) Most Predictable (longest range) True Nearest Kriging (.47) Note: Exposure models based on a constant mean model and dependence characterized by a spherical variogram with fixed partial sill (45), no nugget, and varying range (1-500 km) Reference: Kim, Sheppard, Kim (2009) Epidemiology

14 Exposure Measurement Error – Correction Approaches for Spatially Misaligned Data Exposure Simulation Joint Model 2-Stage Approach Estimate exposure and disease models jointly –Asymptotically optimal Practical problems –Computationally intensive –Published simulation examples haven’t worked (Gryparis et al, 2009; Madsen et al 2008) –Feedback between exposure and health models can lead to bias Particularly with sparse exposure and rich health data (Wakefield & Shaddick, 2006) Predict exposure at subject locations in the first stage Correct the disease model estimates for the predicted exposure in the second stage. –Parametric bootstrap –Parameter bootstrap –Szpiro, Sheppard, Lumley (2010). Efficient measurement error correction with spatially misaligned data. Available online. Use simulated exposure in the health analysis: –Generate multiple samples from the estimated exposure distribution –Plug into disease model and estimate parameters –Average estimates and fix the variance Gives biased estimates (Gryparis et al, 2009; Little 1992) Reasonable to simulate exposure for risk assessment

15 Exposure Measurement Error Correction – Simulation Results True health effect coefficient: β X = Partial parametric bootstrap only corrects for the Berkson-like error component 2 Parametric bootstrap based on 100 simulations; all others based on 2,000 Reference: Szpiro, Sheppard, Lumley (2010)

16 Exposure Measurement Error – Discussion The quality of exposure estimates affects health results –Assess: Bias Variance Coverage –Also relevant Study design and data structure –Monitoring network vs. subject locations Features of the underlying exposure Exposure prediction approach and estimation results Measurement error structure is complex and not purely classical or Berkson Emerging research findings suggest exposure prediction and health effect estimation should be treated as one problem