Case Example: Using a Stratified Sampling Design & Field XRF to Reduce the 95% UCL for Residential Soil Lead. Deana Crumbling, EPA/OSRTI/TIFSD

Presentation transcript:

1 Case Example: Using a Stratified Sampling Design & Field XRF to Reduce the 95% UCL for Residential Soil Lead Deana Crumbling, EPA/OSRTI/TIFSD EPA Annual Quality Conference

2 What things increase the interval between the sample mean & UCL?
• High variability in data set
• Data set is from a non-normal or non-parametric distribution
• Small number of physical samples in the statistical sample
What creates high data variability?
• True changes in matrix concentrations across space
• Inadequate soil sample homogenization
• Artifact caused by small analytical subsample mass

3 Variability as an artifact of small analytical sample mass As analytical sample volumes increase, data variability decreases & distribution goes from lognormal to normal (assumes whole sample is measured)

4 Reduce the UCL by addressing:
• Variability artifacts
• Non-normal statistical distributions
• Small number of physical samples in the statistical sample
• High variability due to true variation
By procedures that support:
• Sample homogenization
• Increased sample mass
• True changes in matrix concentrations across space
• Physical manipulation of sample, increased volume (MIS), and/or sufficient replicate analyses

5 Can anything be done about true spatial variations in concentration?
(Statistical) Stratified Sampling Design
• "Methods for Evaluating the Attainment of Cleanup Standards, Volume 1: Soils and Solid Media", 1989, section
• Guidance on Choosing a Sampling Design for Environmental Data Collection (EPA QA/G-5S), 2002, Chap
• Data Quality Assessment: Statistical Methods for Practitioners (EPA QA/G-9S), 2006, section
• Purpose: determine the overall mean & UCL for a decision unit (DU) when different sections of the DU have different means & standard deviations (SDs).

6 What Makes a Stratified Design Different?
To calculate the average over the entire area, routine practice is that data go straight into a database, and then…
Sum(all) = 2736; then 2736 ÷ 12 = 228 ppm
"Dividing by 12" assumes equal weight is given to each sample (1/12th of total area)
Sample values shown (ppm; 12 locations, two values missing here): 16, 22, 20, 18, 15, 21, 25, 120, 184, 155

7 But the CSM supports partitioning the site into 3 distinct portions based on similar populations
• 75% of area; ave = 20 (samples: 16, 22, 20, 18, 15, 21, 25)
• 20% of area; ave = 153 (samples: 120, 184, 155)
• 5% of area; ave = 1070
20(0.75) + 153(0.20) + 1070(0.05) = 99 ppm
A spatially weighted mean makes a difference!
Summary table (columns: Low, Mid, High, Routine, Stratified; rows: Mean, SD, 95% UCL). Surviving entries, 95% UCL row: Routine 434 (Δ=196); Stratified 143 (Δ=44)
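The arithmetic on slides 6 and 7 can be checked with a short sketch. It uses only the rounded stratum averages (20, 153 and 1070 ppm), the area fractions, and the routine total (2736 over 12 samples) quoted on the slides. The function name is illustrative and not part of the presentation.

```python
def area_weighted_mean(stratum_means, area_fractions):
    """Spatially weighted mean: sum of (stratum mean x area fraction)."""
    if abs(sum(area_fractions) - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return sum(m * w for m, w in zip(stratum_means, area_fractions))

# Values as quoted on the slides (the stratum averages are rounded there).
stratum_means = [20, 153, 1070]      # ppm: low, mid, high strata
area_fractions = [0.75, 0.20, 0.05]  # fraction of the total area

routine_mean = 2736 / 12             # every sample weighted 1/12
stratified_mean = area_weighted_mean(stratum_means, area_fractions)

print(f"routine mean:    {routine_mean:.0f} ppm")
print(f"stratified mean: {stratified_mean:.0f} ppm")
```

The routine calculation lets the two high samples (2 of 12, or about 17% of the weight) stand in for an area that is only 5% of the site, which is why the two means differ so much.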

8 Basic Principles of a Stratified Sampling Design
The CSM is the basis for defining both the DU & its strata
• Decision Unit (DU) = a unit for which a decision is made: a single drum, a batch of drums, risk exposure unit, remediation unit, etc.
• The DU is the volume & dimensions over which an average conc is desired
• Strata are created by different release or transport mechanisms, which cause different contaminant patterns within the DU
• Target properties like conc level & variability differ from stratum to stratum w/in the DU

9 Basic Principles (cont'd)
• DU is delineated (stratified) into non-overlapping subsections according to the CSM
• Each stratum's area/volume is recorded as a fraction of the DU's area/volume
• Each stratum's conc mean & SD are determined
• The means & SDs are weighted and mathematically combined → overall mean & UCL for the DU
• Can apply stratification to data analysis even if not planned into sampling, but must have spatial info & final CSM available
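The weighting-and-combining step can be sketched as follows. The slides do not give the formula, so this assumes the standard textbook stratified estimator (weighted mean, standard error from the weighted stratum variances) with a Satterthwaite approximation for the degrees of freedom. The function names and the input numbers are hypothetical, and the t critical value is looked up by the caller rather than computed.

```python
import math

def stratified_mean_ucl(weights, means, sds, ns, t_crit):
    """Combine per-stratum statistics into a DU-wide mean and one-sided UCL.

    weights: stratum area (or volume) fractions, summing to 1
    means, sds, ns: per-stratum sample mean, sample SD and sample count
    t_crit: one-sided t critical value for the chosen confidence and df
    """
    mean = sum(w * m for w, m in zip(weights, means))
    # Variance of the weighted mean: sum of W_h^2 * s_h^2 / n_h
    terms = [w**2 * s**2 / n for w, s, n in zip(weights, sds, ns)]
    se = math.sqrt(sum(terms))
    return mean, mean + t_crit * se

def satterthwaite_df(weights, sds, ns):
    """Approximate degrees of freedom for the combined standard error."""
    terms = [w**2 * s**2 / n for w, s, n in zip(weights, sds, ns)]
    return sum(terms) ** 2 / sum(t**2 / (n - 1) for t, n in zip(terms, ns))

# Hypothetical inputs, not data from the presentation.
weights, means, sds, ns = [0.5, 0.5], [10.0, 20.0], [2.0, 2.0], [4, 4]
df = satterthwaite_df(weights, sds, ns)
# 1.943 is the one-sided 95% t value for 6 degrees of freedom.
mean, ucl = stratified_mean_ucl(weights, means, sds, ns, t_crit=1.943)
print(f"df = {df:.1f}, mean = {mean:.1f}, 95% UCL = {ucl:.2f}")
```

Because each stratum's variance is scaled by the square of its area fraction, a small, highly variable stratum contributes little to the combined standard error.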

10 Benefits of a Stratified Sampling Design
• Small areas of very high or low conc do not bias the overall mean of the DU.
• Reduces variability (SD) in the DU data set
• Reduces statistical uncertainty (as distance between mean & UCL)
• Preserves spatial information to identify source/transport mechanisms & support remedial design.

11 Case Example: XRF with stratified sampling design
• Properties in old town near Pb battery recycling plant
• XRF Pb data from bagged soil samples (~300 gram)
[Photo: plastic bag of soil]

12 Decision Goals
• Resolve confusion over past conflicting data.
• Determine mean (95% UCL) for exposure unit (entire yard): 500 ppm risk-based A/L; if over, clean up high-contamination areas
• Pb source? Suggested by spatial contaminant pattern (does facility have liability?)
Data Collection Design
• Property divided into 3 sections (strata)
  • Front yard (likely "same" conc within & own SD)
  • Side yard (ditto)
  • Back yard (ditto)
• Each stratum → 5 ~equal subsections (sample units)
• 1 grab (or MIS) sample ( g) into plastic bag
• 5 sample units/stratum or 15 sample units/DU (the EU)

13 Preliminary CSM of Simplified Property
Diagram labels: Back Yard: 5 Samples; Front Yard: 5 Samples; Side Yard: 5 Bagged Samples; House Footprint
Area fractions shown: 0.60, 0.25, 0.15
Action Level (entire yard) = 500 ppm
• Potential release: Traffic (facility truck, Pb gasoline); Pb house paint; facility's atmospheric deposition; combination. Expected Pb conc: Higher.
• Potential release: Pb paint; atmos dep. Pb conc: Uncertain (near road, house?)
• Potential release: Pb paint (near structures); atmos dep. Expected Pb conc: Lower.

14 XRF Bag Analysis
• Four 30-sec XRF readings on each bag (2 on front & 2 on back)
• Results entered in real time into a pre-programmed spreadsheet
• Spreadsheet immediately calculates:
  1. ave & SD for each bag
  2. ave & SD within each stratum (yard section)
  3. ave & UCL for the decision unit (entire property)
  4. the greater of within-bag vs. between-bag variability
• IF statistical uncertainty interferes w/ desired decision confidence for the DU: use #4 & a series of decision trees to reduce statistical uncertainty until a confident decision is possible
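A minimal sketch of spreadsheet items 1, 2 and 4, assuming four replicate readings per bag. The readings are invented for illustration. The slide does not say how within-bag and between-bag variability are compared, so the comparison used here (pooled within-bag SD against the SD of the bag means) is an assumption.

```python
from statistics import mean, stdev

# Hypothetical XRF readings (ppm): stratum -> bag -> four readings.
readings = {
    "front": {"F1": [880, 910, 860, 890], "F2": [940, 900, 920, 960]},
    "back":  {"B1": [60, 70, 65, 75],     "B2": [90, 85, 95, 100]},
}

def bag_stats(bags):
    """Item 1: mean and SD of the replicate readings on each bag."""
    return {bag: (mean(vals), stdev(vals)) for bag, vals in bags.items()}

def stratum_stats(bags):
    """Items 2 and 4: stratum mean, between-bag SD, pooled within-bag SD."""
    per_bag = bag_stats(bags)
    bag_means = [m for m, _ in per_bag.values()]
    between_sd = stdev(bag_means)
    # Pooled within-bag SD; equal replicate counts per bag are assumed.
    within_sd = mean([s**2 for _, s in per_bag.values()]) ** 0.5
    larger = "within-bag" if within_sd > between_sd else "between-bag"
    return mean(bag_means), between_sd, within_sd, larger

for name, bags in readings.items():
    m, between, within, larger = stratum_stats(bags)
    print(f"{name}: mean={m:.1f} between-bag SD={between:.1f} "
          f"within-bag SD={within:.1f} larger={larger}")
```

If within-bag variability dominates, more replicate readings or better homogenization is the natural remedy. If between-bag variability dominates, more bags or revised strata are more likely to narrow the interval.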

15 Minimizing Variability Improves Statistical Confidence in EPCs
Strategy & Results for Example Yard (columns: Strategy; Mean (XRF); 95UCL (1/2 CI width))
• Uncontrolled micro-scale (within-bag) variability (single analysis) & routine calc: (171)
• Control within-bag variability (replicates); still use routine EPC calculation: (154)
• Stratified sampling & data analysis on preliminary CSM: (35)
• Stratified sampling & data analysis on mature CSM: (32)
NOTE: "Routine" calculation applies the same weighting to all data points, & the database loses their spatial representativeness
Note: 1/2 CI width = mean-to-UCL width

16 Data Used to Mature the CSM
• Preliminary CSM: an informed hypothesis about strata boundaries
• Mature CSM: data confirm or modify the hypothesis about strata boundaries
[Diagram label: House Footprint]

17 Progressive Data Uncertainty Management
Columns: Unit; Scale; Value (ppm); CV; 95% UCL (Mean-to-UCL width)
• 1 XRF reading on 1 Front yard (FY) bag (instrument-reported error); XRF instrument only; 95% UCL 801* (51*)
• 1 Bag (4 XRF readings on same bag); micro-scale; 95% UCL 870 (81)
• Immature CSM, FY section only (10 bag samples); short-scale; 95% UCL 907 (136)
• Mature CSM, revised FY section only (7 bag samples); CSM ↓; 95% UCL 977 (77)
• Combine w/ Side & Back sections → mature CSM, entire yard (area-wt'd); long-scale over property; 95% UCL 231 (32); stratification & ↑ n → ↓ width
* Normal z-distribution used for the XRF instrument's counting statistics; the remaining rows use the t-distribution

18 There is the question of XRF-ICP data comparability for this project, but no time in this talk to cover it. Bottom line: the XRF data set needed adjustment to be more comparable to the ICP data set; however, the adjustment did not change the compliance decision for this property.

19 There is the Question of XRF-ICP Data Comparability
• No time here for details about XRF-ICP data comparability. But there was a problem.
• XRF was significantly biased LOW compared to ICP.
• Investigation found that the plastic bags used decreased the XRF signal.
• When plastic interference combined with soil moisture in the 15-20% range, the XRF data needed adjustment to be more comparable to the ICP data.
• Adjusted XRF data were usable for decisions.

20 How Do They Compare? Notice strong upward deviation from the ideal regression line (i.e., slope > 1). Indicates that ICP results are consistently higher than XRF results

21 Adjusting XRF Data to Make More Comparable to ICP
• There was a consistent, statistically significant bias between XRF & ICP data.
• The mathematical relationship was consistent enough to adjust the XRF results using the ICP vs. XRF regression equation.
• Strategy: use the XRF data (x) to predict what the ICP results (y) would be
• Don't need to adjust every XRF data point. Can adjust the XRF means & UCLs directly. (Note: DO NOT adjust SDs!)
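The adjustment strategy can be sketched as below, assuming a simple linear regression of ICP (y) on XRF (x). The slope, intercept, readings and UCL are placeholders invented for illustration and are not the project's fitted values.

```python
from statistics import mean

# Hypothetical regression of ICP (y) on XRF (x): y = INTERCEPT + SLOPE * x.
# Placeholder coefficients only; the presentation does not list them.
SLOPE = 1.3
INTERCEPT = 10.0

def adjust_xrf(value_ppm):
    """Predict the ICP-equivalent value from an XRF value."""
    return INTERCEPT + SLOPE * value_ppm

xrf_readings = [180.0, 200.0, 220.0]  # hypothetical XRF results, ppm
xrf_mean = mean(xrf_readings)
xrf_ucl = 230.0                        # hypothetical XRF-based 95% UCL, ppm

# Because the adjustment is linear, adjusting the mean directly gives the
# same answer as adjusting every reading and then averaging.
adj_mean_direct = adjust_xrf(xrf_mean)
adj_mean_pointwise = mean([adjust_xrf(x) for x in xrf_readings])
adj_ucl = adjust_xrf(xrf_ucl)

print(f"adjusted mean: {adj_mean_direct:.1f} ppm "
      f"(pointwise: {adj_mean_pointwise:.1f} ppm)")
print(f"adjusted UCL:  {adj_ucl:.1f} ppm")
```

This is why individual data points need not be adjusted: the summary statistics used for the decision can be passed through the regression equation directly.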

22 Previous summary table w/ adjustment for XRF bias
Columns: Strategy; Mean (XRF, Adj); 95UCL (XRF, Adj)
• Uncontrolled var. & traditional calc
• Controlled var.; traditional calc
• Stratified on prelim CSM
• Stratified on mature CSM
Decision: The Pb conc for this property is compliant with the 500 ppm risk benchmark.

23 Deana M. Crumbling, M.S. U.S. EPA, Office of Superfund Remediation & Technology Innovation 1200 Pennsylvania Ave., NW (5203P) Washington, DC PH: (703) Questions ?