Using logistic regression to predict the probability of occurrence of volatile organic compounds in ground water by Michael Moran NAWQA VOC National Synthesis.

Slides:



Advertisements
Similar presentations
Regional Scale Ground-Water Vulnerability Assessments in the Mid-Atlantic Region Based on Statistical Probability Models A Multi-Scale and Multi-Threshold.
Advertisements

Dale T Littlejohn Senior Geologist. What is fate and transport in the vadose zone? Vadose Zone Hydrocarbon release from buried pipeline Aquifer Surface.
Copyright © 2011 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 12 Measures of Association.
11 Simple Linear Regression and Correlation CHAPTER OUTLINE
Using isotopic analysis to determine the source and fate of groundwater contamination.
Learning Objectives Copyright © 2002 South-Western/Thomson Learning Data Analysis: Bivariate Correlation and Regression CHAPTER sixteen.
Learning Objectives 1 Copyright © 2002 South-Western/Thomson Learning Data Analysis: Bivariate Correlation and Regression CHAPTER sixteen.
Very-Short-Range Forecast of Precipitation in Japan World Weather Research Program Symposium on Nowcasting and Very Short Range Forecasting Toulouse France,
How to Build a Groundwater Model Activity Source Created by the USA Groundwater Foundation; modified from the Science Olympiad event, Awesome Aquifers.
Statistics 350 Lecture 21. Today Last Day: Tests and partial R 2 Today: Multicollinearity.
Statistics: Data Analysis and Presentation Fr Clinic II.
Linear Regression and Correlation
Biological and Environmental Engineering Soil & Water Research Group Spatial Variability of Groundwater Soluble Phosphorous on an Alluvial Valley-Fill.
Air toxics measurements in a Seattle neighborhood Doris Montecastro and Hal Westberg Laboratory for Atmospheric Research Dept. of Civil and Environmental.
U.S. Department of the Interior U.S. Geological Survey Characterizing the Landscape for Water-Quality Analysis Methods and implementation 2006 National.
Analyzing quantitative data – section III Week 10 Lecture 1.
THE IDENTIFICATION PROBLEM
Statistical hypothesis testing – Inferential statistics II. Testing for associations.
Contamination of Ground Water by Perchloroethene (PCE) – A National Perspective By Michael Moran U.S. Geological Survey U.S. Department of the Interior.
Correlation & Regression
Chapter 11 Simple Regression
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Soil Contamination Countermeasures Hyogo Prefectural Government Agricultural & Environmental Affairs Department Environmental Management Bureau Water &
Overview of USGS Groundwater Quality Assessment Activities and Related Data in Alabama 2011 Alabama Water Resources Conference September 9, 2011, Perdido.
Statistics for Decision Making Bivariate Descriptive Statistics QM Fall 2003 Instructor: John Seydel, Ph.D.
SESSION Last Update 17 th June 2011 Regression.
Basic Social Statistic for AL Geography HO Pui-sing.
Groundwater Quality Beneath an Area of Urban Residential and Commercial Land Use, Mobile, Alabama Alabama Water Resources Conference September.
Multiple regression - Inference for multiple regression - A case study IPS chapters 11.1 and 11.2 © 2006 W.H. Freeman and Company.
Spatial Association Defining the relationship between two variables.
Statistical Methods Statistical Methods Descriptive Inferential
GROUND WATER MONITORING TO EVALUATE EFFECTS OF LAND USE ON WATER QUALITY Mike Trojan Erin Eid Jennifer Maloney Jim Stockinger Minnesota Pollution Control.
Introduction to Behavioral Statistics Correlation & Regression.
Proposed Tools and Approach for Ground-Water Vulnerability Assessment (GWAVA) Using a Geographic Information System and Simulation Modeling Jack Barbash.
U.S. Department of the Interior U.S. Geological Survey Charles G. Crawford and Robert J. Gilliom National Water-Quality Assessment Program Pesticide National.
Simple Linear Regression. The term linear regression implies that  Y|x is linearly related to x by the population regression equation  Y|x =  +  x.
STATISTICAL ANALYSIS FOR THE MATHEMATICALLY-CHALLENGED Associate Professor Phua Kai Lit School of Medicine & Health Sciences Monash University (Sunway.
Chapter 16 Data Analysis: Testing for Associations.
Probability of Detecting Atrazine and Elevated Concentrations of Nitrate in Colorado’s Ground Water USGS Water-Resources Investigations Report
U.S. Department of the Interior U.S. Geological Survey Assessment of Shallow Ground-Water Quality in Agricultural and Urban Areas Within the Arid and Semiarid.
Tracking Groundwater Contamination
Trends in Shallow Ground-Water Quality of the Delmarva Peninsula Results from regional and local studies Linda M. Debrewer Judith M. Denver U.S. Department.
Chapter 6 (cont.) Difference Estimation. Recall the Regression Estimation Procedure 2.
Chapter Thirteen Copyright © 2006 John Wiley & Sons, Inc. Bivariate Correlation and Regression.
Working With Simple Models to Predict Contaminant Migration Matt Small U.S. EPA, Region 9, Underground Storage Tanks Program Office.
1 LANDFILL GAS IMPACTS TO SHALLOW GROUNDWATER Steve Wampler, AquAeTer, Inc. Louis Bull, Waste Management Groundwater Protection Program What is the real.
Ground-Water Quality Trends in the South Platte River Alluvial Aquifer, Colorado Suzanne S. Paschke, Shana L. Mashburn, Jennifer L. Flynn, and Breton Bruce.
Quality of Ground Water and Finished Water of Community Water Systems – Preliminary Findings U.S. Department of Interior U.S. Geological Survey Jessica.
SWBAT: Calculate and interpret the residual plot for a line of regression Do Now: Do heavier cars really use more gasoline? In the following data set,
Hydrogeologic Analysis of the Delphi Corporation Site, Wyoming Michigan Mark Bryson, Emily Daniels, Sara Nagorsen, Kirk Perschbacher, Joe Root, Jason Stewart,
Statistical methods for real estate data prof. RNDr. Beáta Stehlíková, CSc
Ch.4. Groundwater  Recharge in Arid Region Evaporation is significant, which can make a big enrichment in isotopic compositions Evaporative enrichment.
Jim Stockinger, Erin Eid, Jennifer Maloney and Mike Trojan
Environmental Modeling Weighting GIS Layers Weighting GIS Layers.
By Alex Walton Josh Bush Alex Walton, Josh Bush1.
MULTIVARIATE ANALYSIS. Multivariate analysis  It refers to all statistical techniques that simultaneously analyze multiple measurements on objects under.
UNEXPECTED VOCS IN SOIL GAS ASSESSMENT RESULTS James M. Harless, PhD, CHMM Vice President / Principal Cheryl Kehres-Dietrich, CGWP Principal Paul Roberts.
BUSINESS MATHEMATICS & STATISTICS. Module 6 Correlation ( Lecture 28-29) Line Fitting ( Lectures 30-31) Time Series and Exponential Smoothing ( Lectures.
NAWQA Program U.S. Department of the Interior U.S. Geological Survey Regression models for explaining and predicting concentrations of organochlorine pesticides.
Introduction Many problems in Engineering, Management, Health Sciences and other Sciences involve exploring the relationships between two or more variables.
Chapter 11 Linear Regression and Correlation. Explanatory and Response Variables are Numeric Relationship between the mean of the response variable and.
Assessing Ground Water Vulnerability through Statistical Methods I NWQMC May 7-11, 2006 San Jose, CA NWQMC May 7-11, 2006 San Jose, CA.
Final Presentation of GIS in Water Resources
Causality, Null Hypothesis Testing, and Bivariate Analysis
Leaking Underground Storage Tanks
Introduction to Behavioral Statistics
SA3202 Statistical Methods for Social Sciences
Linear Regression and Correlation
Linear Regression and Correlation
Descriptive Statistics Univariate Data
Presentation transcript:

Using logistic regression to predict the probability of occurrence of volatile organic compounds in ground water by Michael Moran NAWQA VOC National Synthesis

NAWQA VOC Monitoring Cycle 1 Ground Water Sampling   3,883 NAWQA-sampled wells analyzed for 55 target VOCs   161 total networks   1,185 wells sampled by other agencies (Retro) and analyzed for up to 55 VOCs   98 MAS networks   33 Urban networks   30 Agricultural networks

NAWQA VOC Monitoring

Statistical Analyses Hypothesis Testing   Contingency tables   Mann-Whitney test   Kolmogorov-Smirnov test Correlation   Spearman correlation Associations   Multivariate logistic regression

Logistic Regression Advantages   Response is ideal for censored data   Can be univariate or multivariate   Can identify variables associated with response   Can predict probability of binary or ordinal response relative to independent variables Disadvantages   Independent variables often generalized   Can be difficult to extrapolate probability beyond sample point

Logistic Regression Logistic Regression Equation where, P = probability of detecting VOC,  0 = y-intercept,  i = slope coefficient of X 1 to X i explanatory variables, X i = 1 to i explanatory variables

Prediction   MTBE and 4 solvents; DCM, PCE, TCA, TCE   Regressions were done to establish associations   No attempt to predict probability of occurrence VOC Analyses Process Understanding   Any VOC for all networks – National Scale [ES&T, vol. 33, no. 23, p ]   Individual VOCs in urban land use studies [ES&T, vol. 38, no. 20, p ]   Most frequently detected VOCs analyzed

Prediction – National Scale   Prediction was applied to analysis of any VOC for all networks [ES&T, vol. 33, no. 23, p ]   Logistic regression equation was univariate   Probability of occurrence of any VOC was only related to population density   Although little was learned about STF, the probability of occurrence could be extrapolated VOC Analyses

Taken from: Squillace, P.J., Moran, M.J., Lapham, W.W., Price, C.V., Clawges, R.M., and Zogorski, J.S., 1999, Volatile organic compounds in untreated ambient groundwater of the United States, : Environmental Science & Technology, v. 33, no. 23, p VOC Analyses

Process Understanding – Urban Areas   Prediction could not be applied to analysis of individual VOCs in urban land-use studies [ES&T, vol. 38, no. 20, p ] VOC Analyses   Logistic regression only yielded associations   Many equations included variables that could not be extrapolated beyond the sample point   Example: dissolved oxygen was related to many VOCs but the concentration of dissolved oxygen was unknown beyond the sampled well

Process Understanding – Major Aquifers VOC Analyses   Source and transport variables often surrogates   Example: sources often were population density and urban land use   For some variables, extrapolation beyond the sample point may not be possible   Example: transport variable depth to top of open interval

Process Understanding – Various VOCs in Ground Water VOC Analyses VOCAssociated variables Methyl tert-butyl ether (MTBE) population density use of MTBE in gasoline recharge aquifer consolidation soil permeability leaking underground storage tanks Perchlorethene (PCE) urban land use sand content of soil dissolved oxygen soil erodibility RCRA sites within 1km depth to top of screened interval septic systems within 1km Trichloroethene (TCE) population density dissolved oxygen RCRA sites within 1km CERCLA sites within 1km casing diameter 1,1,1- Trichloroethane (TCA) dissolved oxygen urban land use population density depth to top of screened interval recharge CERCLA sites within 1km septic systems within 1km Methylene chloride (DCM) population density bulk density of soil sand content of soil urban land use median year of home construction - estimation is possible - estimation may be difficult

Observations Logistic Regression IS useful for:   Identifying variables associated with occurrence of VOCs in ground water  This may be useful for determining the factors that cause or influence occurrence   Ranking variables associated with occurrence by importance  This may be useful for determining the relative importance of factors that cause or influence occurrence

Observations Logistic Regression MAY BE useful for: S Predicting the probability of occurrence of VOCs in ground water provided: 1) 1) Data are not spatially limited 2) Data are not from highly specific sampling approaches 3) Explanatory variables can be extrapolated

Hypothetical Cases Hypothetical Aquifer Study Area Sampled wells used to develop LR equation Extrapolate LR equation to here?

Hypothetical Urban Area New residential/ commercial areas Extrapolate LR equation to here? Sampled wells used to develop LR equation Hypothetical Cases

Well A Point B Average withdrawal point dissolved oxygen? industry? Benzene Occurrence Extrapolate LR equation to here? Hypothetical Cases

Conclusions   Logistic regression is useful for determining associations that may aid in understanding sources, transport, and fate   For determining associations, logistic regression should use explanatory variables that are as specific as possible   Although associations found using logistic regression may be insightful, they do not imply a direct cause and effect

Conclusions   For prediction purposes, logistic regression should use explanatory variables that: 1) 1) Are ubiquitous to the area of interest 2) Can be estimated or extrapolated   Logistic regression can be useful in predicting the probability of occurrence of ground water contaminants if: 1) 1) Data are not highly localized 2) Data are not from highly specific sampling approaches