FOUR METHODS OF ESTIMATING PM 2.5 ANNUAL AVERAGES Yan Liu and Amy Nail Department of Statistics North Carolina State University EPA Office of Air Quality,


FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES
Yan Liu and Amy Nail
Department of Statistics, North Carolina State University
EPA Office of Air Quality Planning and Standards, Emissions, Monitoring, and Analysis Division

Project Objectives
Estimate the annual average PM2.5 concentration.
Estimate the standard errors associated with the annual average estimates.
Estimate the probability that a site’s annual average exceeds 15 µg/m³.
Do all of this at 2400 lattice points, for 2000 and 2001.
Compare 4 different methodologies:
  1. Quarter-based analysis (Yan)
  2. Annual-based analysis (Yan)
  Daily-based analyses:
  3. “Doug’s method” (Bill)
  4. Generalized least squares in SAS Proc Mixed (Amy)

Why are Standard Errors Important?
We may estimate that the annual average for lattice point 329 is 16 µg/m³, which exceeds the standard of 15 µg/m³. But our estimate carries some uncertainty, expressed as its standard error, and we’d like to take that uncertainty into account to determine the probability that the true annual average at lattice point 329 exceeds 15.
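To make the idea concrete, here is a minimal sketch of how an estimate and its standard error could be turned into an exceedance probability. It assumes the estimation error is approximately normal, which the slides do not state; the function name and the standard error of 1.0 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def exceedance_probability(estimate, std_error, threshold=15.0):
    """P(true annual average > threshold), ASSUMING the true value is
    normally distributed around the estimate with the given standard error.
    The normality assumption is ours, not the presentation's."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return 1.0 - NormalDist(mu=estimate, sigma=std_error).cdf(threshold)


# Lattice point 329: estimate of 16 ug/m^3 with a hypothetical standard error of 1.0
p = exceedance_probability(16.0, 1.0)
print(round(p, 3))  # -> 0.841
```

The larger the standard error, the closer this probability moves toward 0.5, which is why a point estimate above 15 is not enough on its own.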

In addition to maps like this...

…we also want maps like this.
Note: This map is WRONG, so don’t show it to anyone! We haven’t yet figured out the correct way to determine the errors, so we cannot draw a correct probability map.

Map of 2400 Lattice Points

Data Description
Concentrations of PM2.5 measured during 2000 and 2001
Domain analyzed: the portion of the U.S. east of −100° longitude
Concentrations measured every third day

Methods 3 & 4 - Daily-Based
Used the every-third-day data (122 days per year)
Kriged each day to obtain predictions at the 2400 lattice points
At each lattice point, fit a time series to the 122 days’ estimates to estimate the annual average
Calculated the time-series error for the annual average (using Proc ARIMA)

Method 4 - “Amy’s Method”
Fit a quadratic surface using Generalized Least Squares in SAS Proc Mixed
Used Restricted (or Residual) Maximum Likelihood to estimate all parameters
Did not assume iid errors when fitting the quadratic surface, so its coefficients are estimated with the covariance structure taken into account
Specified an exponential covariance structure with a nugget
Estimated every parameter separately for each day

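Model for one day
\[ Y_{ij} = \beta_0 + \beta_1 i + \beta_2 i^2 + \beta_3 j + \beta_4 j^2 + \beta_5 ij + \varepsilon_{ij} \]
where i = latitude, j = longitude, \( E(\varepsilon_{ij}) = 0 \), and
\[ \operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{i'j'}) =
\begin{cases}
\sigma_n^2 + \sigma^2 e^{-\mathrm{dist}/\theta}, & i = i' \text{ and } j = j' \\
\sigma^2 e^{-\mathrm{dist}/\theta}, & i \ne i' \text{ or } j \ne j'
\end{cases} \]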
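The one-day model can be sketched in a few lines of NumPy. This is not the SAS Proc Mixed fit: it builds the exponential-plus-nugget covariance matrix and computes the generalized least squares coefficients of the quadratic surface while treating the covariance parameters as known, whereas the presentation estimates them by REML. All function and parameter names (`sill`, `range_`, `nugget`) are assumptions made for illustration.

```python
import numpy as np


def exp_nugget_cov(coords, sill, range_, nugget):
    """Exponential covariance sill * exp(-dist / range_), with the nugget
    added only where a site is paired with itself (the diagonal)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    cov = sill * np.exp(-dist / range_)
    cov[np.diag_indices_from(cov)] += nugget
    return cov


def quadratic_design(coords):
    """Design matrix with columns 1, i, i^2, j, j^2, i*j (i = latitude, j = longitude)."""
    coords = np.asarray(coords, dtype=float)
    i, j = coords[:, 0], coords[:, 1]
    return np.column_stack([np.ones_like(i), i, i ** 2, j, j ** 2, i * j])


def gls_fit(X, y, V):
    """GLS estimate (X' V^-1 X)^-1 X' V^-1 y for a KNOWN covariance matrix V."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)


# Tiny synthetic illustration (made-up coordinates and parameter values)
rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 10.0, size=(20, 2))
V = exp_nugget_cov(sites, sill=2.0, range_=5.0, nugget=0.5)
X = quadratic_design(sites)
y = X @ np.array([10.0, 0.5, -0.02, 0.3, 0.01, 0.005])
beta_hat = gls_fit(X, y, V)
```

Because the covariance matrix enters the estimate, nearby sites that carry largely redundant information are down-weighted relative to an ordinary least squares fit.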

Model for one site
\[ Y_k = \mu + \phi\,(Y_{k-1} - \mu) + e_k, \qquad k = 1, \dots, 122 \]
where \( E(e_k) = 0 \) and \( \operatorname{Var}(e_k) = \sigma^2 \).
Note: this is an AR1 model. The errors are iid \( (0, \sigma^2) \) because the temporal correlation is accounted for by the \( \phi\,(Y_{k-1} - \mu) \) term.
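For a stationary AR1 series, the standard error of the sample mean has a closed form, which gives a feel for how much temporal correlation inflates the uncertainty of an annual average. The sketch below is a textbook formula, not necessarily what Proc ARIMA reports; the function name and the arguments `phi` (the AR1 coefficient) and `sigma2` (the innovation variance) are names chosen here for illustration.

```python
import numpy as np


def ar1_mean_std_error(phi, sigma2, n):
    """Standard error of the mean of n consecutive values of a stationary AR(1).

    Var(mean) = (gamma0 / n) * (1 + 2 * sum_{h=1}^{n-1} (1 - h/n) * phi**h),
    where gamma0 = sigma2 / (1 - phi**2) is the marginal variance.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    gamma0 = sigma2 / (1.0 - phi ** 2)
    lags = np.arange(1, n)
    inflation = 1.0 + 2.0 * np.sum((1.0 - lags / n) * phi ** lags)
    return float(np.sqrt(gamma0 / n * inflation))


# Hypothetical values: 122 sampled days, moderate positive autocorrelation
se_correlated = ar1_mean_std_error(phi=0.5, sigma2=4.0, n=122)
se_independent = ar1_mean_std_error(phi=0.0, sigma2=4.0, n=122)
```

With positive autocorrelation the standard error is larger than the iid value sqrt(sigma2 / n), because the 122 days carry less independent information than 122 independent observations would.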

What if we “propagate” errors?
At a given lattice point we have 122 days’ worth of predictions, each with a kriging prediction error. What if we treat the 122 days as independent observations (they aren’t; they follow an AR1 process) and combine the errors accordingly? We do this for each of our 2400 lattice points.
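Under the (admittedly wrong) independence assumption, combining the daily errors is ordinary error propagation for a mean: the variances add, and the sum is divided by the number of days squared. A minimal sketch, with a hypothetical function name and made-up daily standard errors:

```python
import math


def propagated_std_error(daily_std_errors):
    """Standard error of the average of daily predictions, ASSUMING the daily
    kriging prediction errors are independent:
    SE(mean) = sqrt(sum of se_k**2) / n."""
    n = len(daily_std_errors)
    if n == 0:
        raise ValueError("need at least one daily standard error")
    return math.sqrt(sum(s * s for s in daily_std_errors)) / n


# Four days, each with a kriging standard error of 2.0 (made-up numbers)
print(propagated_std_error([2.0, 2.0, 2.0, 2.0]))  # -> 1.0
```

This ignores both the temporal correlation between days and the uncertainty that the time-series model itself contributes.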

The Big Problem
None of our standard error estimates are correct! We need to learn how to put spatial error components together with temporal error components.

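Model for all sites and days?
\[ Y_{ijk} = \beta_{0,k} + \beta_{1,k}\, i + \beta_{2,k}\, i^2 + \beta_{3,k}\, j + \beta_{4,k}\, j^2 + \beta_{5,k}\, ij + \varepsilon_{ijk} + e_{ijk} \]
where \( E(\varepsilon_{ijk}) = 0 \) and \( E(e_{ijk}) = 0 \).
We’ve assumed isotropy and stationarity for simplicity. But how do we model \( \operatorname{Cov}(\varepsilon_{ijk}, \varepsilon_{i'j'k'}) \), \( \operatorname{Cov}(e_{ijk}, e_{i'j'k'}) \), and \( \operatorname{Cov}(\varepsilon_{ijk}, e_{i'j'k'}) \)?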

Separability
We’ve been treating the covariance structure as separable, meaning that the 1-D temporal and 2-D spatial covariance structures can be estimated separately and then combined mathematically to obtain a 3-D space-time covariance structure. We need to test for separability, and if the covariance components are separable, we need to combine them appropriately. We are just now learning how to do this.
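One common way to combine separable components is the product form, in which the space-time covariance between (site s, day t) and (site s', day t') is the temporal correlation between t and t' times the spatial covariance between s and s'. With observations stacked day by day, that is a Kronecker product. The sketch below assumes this product form, an AR1 temporal correlation, and an exponential spatial covariance; whether that is the right combination for these data is exactly the open question on the slide, and all names are hypothetical.

```python
import numpy as np


def ar1_correlation(n_days, phi):
    """AR(1) correlation matrix: corr(day t, day t') = phi ** |t - t'|."""
    days = np.arange(n_days)
    lags = np.abs(np.subtract.outer(days, days))
    return phi ** lags


def exponential_covariance(coords, sill, range_):
    """Spatial covariance sill * exp(-dist / range_) between all site pairs."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sill * np.exp(-dist / range_)


def separable_covariance(time_corr, space_cov):
    """Space-time covariance under the product (separable) form.

    Rows and columns are ordered day by day: index = day * n_sites + site.
    """
    return np.kron(time_corr, space_cov)


# Made-up example: 3 sites observed on 4 days
sites = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
C_time = ar1_correlation(4, phi=0.6)
C_space = exponential_covariance(sites, sill=2.0, range_=3.0)
C_full = separable_covariance(C_time, C_space)  # shape (12, 12)
```

The appeal of separability is practical: the two small matrices can be estimated and inverted separately, instead of working with the full space-time matrix directly.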

Next steps…
Investigate the separability of the covariance structure and the correct method for combining space and time covariance components.
Attempt a 3-dimensional kriging. No assumption of separability is required to do this. We must, however, write our own code for this project because there is no software package (to our knowledge) that performs such an analysis. This method would allow us to use even more data than we are using now, as we would not be restricted to every third day.

Other next steps…
Try two methods Stefanski recommended. One method avoids the issue of separability by treating the kriging prediction errors as measurement errors on the time-series “observations.” The other method…