
Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional Geostatistical Model
Erin Peterson, Geosciences Department, Colorado State University, Fort Collins, Colorado

The work reported here was developed under STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. EPA does not endorse any products or commercial services mentioned in this presentation.
Space-Time Aquatic Resources Modeling and Analysis Program
This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095.

Overview

The Clean Water Act (CWA), 1972
Section 303(d): requires states and tribes to identify water quality impaired stream segments
Section 305(b): requires a biennial water quality inventory
–Characterizes regional water quality
–Based on attainment of designated-use standards assigned to individual stream segments

Probability-based Random Survey Designs
Used to meet Section 305(b) requirements
Derive a regional estimate of stream condition
–Assign a weight based on stream order
–Provides a representative sample of streams by order
–Statistical inference about the population of streams, within stream order, over a large area
–Reported in stream miles based on inference of attainment
Disadvantages
–Does not take watershed influence into account
–Does not identify the spatial location of impaired stream segments
–Fails to meet the requirements of CWA Section 303(d)

Purpose
Develop a geostatistical methodology, based on coarse-scale GIS data and field surveys, that can be used to predict water quality characteristics of stream segments throughout a large geographic area (e.g., a state)

[Figure: Scale. Nested spatial scales running from coarse to fine (Landscape, Nested Watersheds, River Network, Segment, Reach, Microhabitat), with the terrestrial and aquatic factors acting at each scale. Examples: climate, atmospheric deposition, geology, topography, and soil type at the landscape scale; network connectivity, flow direction, network configuration, drainage density, and confluence density at the river network scale; contributing area, riparian vegetation type and condition, floodplain/valley floor width, localized disturbances, and land use/land cover at the segment scale; shading, detritus inputs, substrate, and overhanging vegetation at the microhabitat scale.]

Geostatistical Modeling (a.k.a. kriging)
Interpolation method
Allows spatial autocorrelation in the error term
–More accurate predictions
Fit an autocovariance function to the data
–Describes the relationship between observations based on separation distance
Autocovariance parameters
1)Nugget: variation between sites as separation distance approaches zero
2)Sill: semivariance value at which the curve levels off (its asymptote)
3)Range: distance within which spatial autocorrelation occurs
[Figure: semivariogram of semivariance against separation distance, with the nugget, sill, and range marked]
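As a rough illustration of how the three parameters shape the curve, here is a minimal sketch of an exponential semivariogram. The parameterization is an assumption (total sill = nugget + partial sill, and a "practical" range at which the curve has covered about 95% of the partial sill); the slide does not state which convention was used, and the numbers are made up.

```python
import numpy as np

def exponential_semivariogram(h, nugget, sill, range_):
    """Semivariance at separation distance h for an exponential model.

    Assumed convention: `sill` is the total sill and `range_` is the
    practical range (about 95% of the partial sill is reached there).
    """
    h = np.asarray(h, dtype=float)
    partial_sill = sill - nugget
    gamma = nugget + partial_sill * (1.0 - np.exp(-3.0 * h / range_))
    # The semivariance is defined as zero at exactly zero separation;
    # the nugget is the limit as h approaches zero from above.
    return np.where(h == 0.0, 0.0, gamma)

# Made-up parameter values, for illustration only.
distances = np.array([0.0, 1.0, 10.0, 100.0])
values = exponential_semivariogram(distances, nugget=1.0, sill=3.0, range_=10.0)
```

Just above zero separation the curve starts at the nugget, then climbs toward the sill and is nearly flat beyond the range.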

Distance Measures & Spatial Relationships
Straight-line Distance (SLD)
–Geostatistical models are typically based on SLD
Distances and relationships are represented differently depending on the distance measure
[Figure: example sites A, B, and C]

Distance Measures & Spatial Relationships
Symmetric Hydrologic Distance (SHD)
–Hydrologic connectivity: fish movement
[Figure: example sites A, B, and C]

Distance Measures & Spatial Relationships
Asymmetric Hydrologic Distance (AHD)
–Longitudinal transport of material
Distances and relationships are represented differently depending on the distance measure
[Figure: example sites A, B, and C]
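To make the contrast between the three measures concrete, the following sketch computes them on a hypothetical Y-shaped network: sites A and B sit on two tributaries and site C lies below their confluence. The coordinates and stream lengths are invented for illustration, and the helper names are not from the FLoWS tools.

```python
import math

# Hypothetical Y-shaped network: A and B are on separate tributaries,
# C is downstream of the confluence where the tributaries join.
coords = {"A": (0.0, 4.0), "B": (4.0, 4.0), "C": (2.0, 0.0)}
# Stream length (km) from each site to the confluence (made-up values).
to_confluence = {"A": 5.0, "B": 6.0, "C": 3.0}
upstream_of_confluence = {"A": True, "B": True, "C": False}

def sld(s, t):
    """Straight-line (Euclidean) distance between two sites."""
    (x1, y1), (x2, y2) = coords[s], coords[t]
    return math.hypot(x2 - x1, y2 - y1)

def shd(s, t):
    """Symmetric hydrologic distance: shortest path along the network.

    In this toy network every path between distinct sites passes
    through the confluence.
    """
    if s == t:
        return 0.0
    return to_confluence[s] + to_confluence[t]

def ahd(s, t):
    """Asymmetric hydrologic distance from s downstream to t.

    Returns None when water does not flow from s to t.
    """
    if upstream_of_confluence[s] and not upstream_of_confluence[t]:
        return to_confluence[s] + to_confluence[t]
    return None
```

Here A and B are close in straight-line terms (4 km), farther apart along the network (11 km), and unrelated under the asymmetric measure because neither drains to the other.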

Distance Measures & Spatial Relationships
Distances and relationships are represented differently depending on the distance measure
Challenge: spatial autocovariance models developed for SLD may not be valid for hydrologic distances
–The covariance matrix is not guaranteed to be positive definite
[Figure: example sites A, B, and C]
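Positive definiteness can be checked numerically by inspecting the eigenvalues of a candidate covariance matrix. The sketch below is a generic check, not the procedure used in this study; the second matrix is a made-up example of an invalid covariance structure, not one derived from a stream network.

```python
import numpy as np

def is_positive_definite(cov, tol=1e-10):
    """True when every eigenvalue of the symmetric matrix exceeds tol."""
    cov = np.asarray(cov, dtype=float)
    return bool(np.all(np.linalg.eigvalsh(cov) > tol))

# Exponential covariance on straight-line distances between three
# equally spaced sites on a line: a valid covariance matrix.
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
valid = np.exp(-d)

# Made-up matrix: sites 1-2 and 2-3 are strongly related while
# sites 1-3 are almost unrelated, which no valid model can produce.
invalid = np.array([[1.0, 0.9, 0.1],
                    [0.9, 1.0, 0.9],
                    [0.1, 0.9, 1.0]])

valid_ok = is_positive_definite(valid)
invalid_ok = is_positive_definite(invalid)
```

A matrix that fails this check can yield negative prediction variances, which is why a covariance model must be valid for the distance measure it is paired with.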

Asymmetric Autocovariance Models for Stream Networks
Weighted asymmetric hydrologic distance (WAHD)
Developed by Jay Ver Hoef, National Marine Mammal Laboratory, Seattle
Moving average models
–Incorporate flow volume and flow direction, and use hydrologic distance
–Positive definite covariance matrices
Ver Hoef, J.M., Peterson, E.E., and Theobald, D.M. Spatial Statistical Models that Use Flow and Stream Distance. Environmental and Ecological Statistics. In press.

Patterns of Spatial Autocorrelation in Stream Water Chemistry

Objectives
Evaluate 8 chemical response variables
1.pH measured in the lab (PHLAB)
2.Conductivity (COND) measured in the lab, μmho/cm
3.Dissolved oxygen (DO), mg/l
4.Dissolved organic carbon (DOC), mg/l
5.Nitrate-nitrogen (NO3), mg/l
6.Sulfate (SO4), mg/l
7.Acid neutralizing capacity (ANC), μeq/l
8.Temperature (TEMP), °C
Determine which distance measure is most appropriate
–SLD
–SHD
–WAHD
–More than one?
Find the range of spatial autocorrelation

Dataset Maryland Biological Stream Survey (MBSS) Data Maryland Department of Natural Resources –1995, 1996, 1997 Stratified probability-based random survey design 881 sites in 17 interbasins

[Figure: study area map showing Maryland within the northeastern U.S., with Chesapeake Bay, Baltimore, Annapolis, and Washington D.C. labeled]

Spatial Distribution of MBSS Data
[Figure: map of MBSS survey sites]

GIS Tools
Automated tools needed to extract data about hydrologic relationships between survey sites did not exist!
Wrote Visual Basic for Applications (VBA) programs to:
1.Calculate watershed covariates for each stream segment
–Functional Linkage of Watersheds and Streams (FLoWS)
2.Calculate separation distances between sites
–SLD, SHD, asymmetric hydrologic distance (AHD)
3.Calculate the spatial weights for the WAHD
4.Convert GIS data to a format compatible with statistics software
FLoWS tools will be available on the STARMAP website: http://nrel.colostate.edu/projects/starmap
[Figure: three example sites (1, 2, 3) shown under SLD, SHD, and AHD]

Spatial Weights for WAHD
Proportional influence (PI): influence of each neighboring survey site on a downstream survey site
–Weighted by catchment area: surrogate for flow volume
1.Calculate the PI of each upstream segment on the segment directly downstream
2.Calculate the PI of one survey site on another site
–Flow-connected sites
–Multiply the segment PIs
[Figure: stream segments A, B, and C with the watersheds of segments A and B; the segment PI of A is computed from Watershed Area A and Watershed Area B]

Spatial Weights for WAHD
[Figure: example network of stream segments A through H with survey sites marked]

Spatial Weights for WAHD
Worked example on the network of segments A through H: Site PI = B * D * F * G, the product of the segment PIs along the flow path between the two flow-connected sites
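The two-step PI calculation can be sketched as follows. The segment PI formula used here (a segment's share of the combined catchment area of the segments meeting at a confluence) is an assumption, since the slide's formula did not survive in this transcript; the watershed areas are invented.

```python
from math import prod

def segment_pi(area, sibling_area):
    """PI of an upstream segment on the segment directly downstream.

    Assumption: the segment's share of the combined catchment area of
    the two segments that meet at the confluence.
    """
    return area / (area + sibling_area)

def site_pi(segment_pis):
    """PI of an upstream site on a flow-connected downstream site:
    the product of the segment PIs along the flow path."""
    return prod(segment_pis)

# Made-up watershed areas (km^2) for segments B, D, F, and G and for
# the sibling segment each one meets at its downstream confluence.
pis = {
    "B": segment_pi(30.0, 10.0),
    "D": segment_pi(50.0, 50.0),
    "F": segment_pi(20.0, 60.0),
    "G": segment_pi(90.0, 30.0),
}
pi_upstream_on_downstream = site_pi(pis.values())
```

Because each segment PI is at most 1, the site PI shrinks with every confluence between the two sites, so distant upstream sites carry less weight.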

Data for Geostatistical Modeling
1.Distance matrices
–SLD, SHD, AHD
2.Spatial weights matrix
–Contains flow-dependent weights for WAHD
3.Watershed covariates
–Lumped watershed covariates, e.g., mean elevation, % urban
4.Observations
–MBSS survey sites

Geostatistical Modeling Methods
Validation set
–Unique for each chemical response variable
–100 sites
Initial covariate selection
–Reduce covariates to 5
Model development
–Restricted model space to all possible linear models
–Model set = 32 models (2^5 models)
–One model set each for the general linear model (GLM), SLD, SHD, and WAHD models
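The count of 32 follows from each of the 5 covariates being either in or out of a model. A small sketch that enumerates such a model set is shown below; the covariate names are placeholders, not the covariates used in the study.

```python
from itertools import combinations

# Placeholder covariate names, for illustration only.
covariates = ["x1", "x2", "x3", "x4", "x5"]

# Every subset of the covariates, from the intercept-only model (empty
# subset) to the full model, defines one candidate linear model.
model_set = [
    subset
    for size in range(len(covariates) + 1)
    for subset in combinations(covariates, size)
]
n_models = len(model_set)
```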

Geostatistical Modeling Methods
Geostatistical model parameter estimation
–Maximize the profile log-likelihood function
–Start from the log-likelihood function of the parameters (β, σ², and the autocorrelation parameters) given the observed data Z
–Maximizing the log-likelihood with respect to β and σ² yields a maximum likelihood estimator (MLE) for each
–Both MLEs can be written as functions of the autocorrelation parameters alone
–Derive the profile log-likelihood function by substituting the MLEs of β and σ² back into the log-likelihood function
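The slide's equations are not preserved in this transcript. For orientation, the standard Gaussian form of this derivation is sketched below; the notation is an assumption (X is the design matrix, θ the vector of autocorrelation parameters, and σ²Σ(θ) the covariance matrix of Z) and may differ from what the slide showed.

```latex
% Log-likelihood of (\beta, \sigma^2, \theta) given the n observations Z
\ell(\beta, \sigma^2, \theta; Z) =
  -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2
  - \frac{1}{2}\log\lvert\Sigma(\theta)\rvert
  - \frac{1}{2\sigma^2}(Z - X\beta)^{\top}\Sigma(\theta)^{-1}(Z - X\beta)

% Maximizing over \beta and \sigma^2 for fixed \theta
\hat{\beta}(\theta) =
  \bigl(X^{\top}\Sigma(\theta)^{-1}X\bigr)^{-1}X^{\top}\Sigma(\theta)^{-1}Z
\qquad
\hat{\sigma}^2(\theta) =
  \frac{1}{n}\bigl(Z - X\hat{\beta}(\theta)\bigr)^{\top}\Sigma(\theta)^{-1}
  \bigl(Z - X\hat{\beta}(\theta)\bigr)

% Profile log-likelihood, a function of \theta alone
\ell_{p}(\theta; Z) =
  -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\hat{\sigma}^2(\theta)
  - \frac{1}{2}\log\lvert\Sigma(\theta)\rvert - \frac{n}{2}
```

Profiling reduces the numerical search to the few autocorrelation parameters, since β and σ² have closed-form estimates for any given θ.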

Geostatistical Modeling Methods
Fit exponential autocorrelation function
–Gives the covariance based on the distance between two sites, D, given the covariance parameter estimates: nugget, sill, and range
Model selection within model set
–GLM: corrected Akaike Information Criterion (AICC)
–Geostatistical models: spatial AICC (Hoeting et al., in press), where n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters
http://www.stat.colostate.edu/~jah/papers/spavarsel.pdf
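The spatial AICC formula itself is missing from this transcript. The sketch below uses the small-sample penalty 2n(p + k + 1)/(n - p - k - 2), which is my understanding of the form given by Hoeting et al. and matches the slide's definitions of n, p, and k; treat it as an assumption and check it against the paper before relying on it.

```python
def spatial_aicc(log_likelihood, n, p, k):
    """Corrected AIC for a spatial model (assumed form, see lead-in).

    n: number of observations
    p: number of regression parameters (p - 1 covariates plus intercept)
    k: number of autocorrelation parameters
    """
    denominator = n - p - k - 2
    if denominator <= 0:
        raise ValueError("too few observations for this many parameters")
    penalty = 2.0 * n * (p + k + 1) / denominator
    return -2.0 * log_likelihood + penalty

# Made-up values: 50 sites, 2 covariates plus intercept, and 3
# autocorrelation parameters (nugget, sill, range).
score = spatial_aicc(log_likelihood=-100.0, n=50, p=3, k=3)
```

Within one model set, the model with the smallest score is preferred.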

Geostatistical Modeling Methods
Model selection between model types
–100 predictions: universal kriging algorithm
–Mean square prediction error (MSPE)
–Cannot use AICC to compare models based on different distance measures
Model comparison: r² for observed vs. predicted values
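Both comparison statistics are simple to compute from the validation predictions. A minimal sketch follows, assuming r² means the squared Pearson correlation between observed and predicted values; the numbers are made up.

```python
import numpy as np

def mspe(observed, predicted):
    """Mean square prediction error over the validation sites."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)

# Made-up validation values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 2.5, 4.5]
error = mspe(obs, pred)
fit = r_squared(obs, pred)
```

MSPE is on the squared scale of the response, so it can be compared across model types for one response variable but not across response variables with different units.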

Results
Summary statistics for distance measures
–Spatial neighborhood differs
–Affects number of neighboring sites
–Affects median, mean, and maximum separation distance
[Table: summary statistics for distance measures in kilometers using DO (n=826). * Asymmetric hydrologic distance is not weighted here.]

Results
Range of spatial autocorrelation differs
–Shortest for SLD
–TEMP = shortest range values
–DO = largest range values
Mean range values
–SLD = 28.2 km
–SHD = 88.03 km
–WAHD = 57.8 km
[Figure: range values by response variable for SLD, SHD, and WAHD; the labels 180.79 and 301.76 appear on the plot]

Results
Distance measures
–GLM always has less predictive ability
–More than one distance measure usually performed well
–SLD, SHD, WAHD: PHLAB & DOC
–SLD and SHD: ANC, DO, NO3
–WAHD & SHD: COND, TEMP
–SLD: SO4
[Figure: MSPE for the GLM, SLD, SHD, and WAHD models]

Results
Predictive ability of models
–Strong: ANC, COND, DOC, NO3, PHLAB
–Weak: DO, TEMP, SO4
[Figure: r² for the GLM, SLD, SHD, and WAHD models]

Discussion
Distance measure influences how spatial relationships are represented in a stream network
–Site's relative influence on other sites
–Dictates form and size of spatial neighborhood
Important because it impacts the accuracy of the geostatistical model predictions
[Figure: spatial neighborhoods under SLD, SHD, and WAHD]

Geostatistical models describe more variability than GLM
–Patterns of spatial autocorrelation found at relatively coarse scale
More than one distance measure performed well
–SLD never substantially inferior
–Do not represent movement through network
Different range of spatial autocorrelation?
–Larger SHD and WAHD range values
–Separation distance larger when restricted to network
SLD, SHD, and WAHD represent spatial autocorrelation in continuous coarse-scale variables

Discussion
Probability-based random survey design negatively affected WAHD
–Maximizes spatial independence of sites
–Does not represent spatial relationships in networks
–Validation sites randomly selected
Sample size = 881
–244 sites did not have neighbors
–Number of sites with ≤1 neighbor: 393
–Mean number of neighbors per site: 2.81
[Figure: frequency of sites by number of neighboring sites]

Discussion
WAHD models explained more variability as neighboring sites increased
–Not when neighbors had similar watershed conditions and significantly different chemical response values
[Figure: Difference plotted against Number of Neighboring Sites for the WAHD and GLM models]

Discussion
GLM predictions improved as number of neighbors increased
–Clusters of sites in space have similar watershed conditions
–Statistical regression pulled towards the cluster
GLM contained hidden spatial information
–Explained additional variability in data with more neighbors
[Figure: Difference plotted against Number of Neighboring Sites for the WAHD and GLM models]

Predictive Ability of Geostatistical Models
[Figure: r² (0 to 1.0) for PH, ANC, NO3, COND, DOC, SO4, DO, and TEMP, arranged by the scale of dominant ecological processes from coarse to fine]

Conclusions
1)Spatial autocorrelation exists in stream chemistry data at a relatively coarse scale
2)Geostatistical models improve the accuracy of water chemistry predictions
3)Patterns of spatial autocorrelation differ between chemical response variables
–Ecological processes acting at different spatial scales
4)SLD is the most suitable distance measure at regional scale at this time
–Unsuitable survey designs
–SHD: GIS processing time is prohibitive

Conclusions
5)Results are scale specific
–Spatial patterns change with survey scale
–Other patterns may emerge at shorter separation distances
6)Further research is needed at finer scales
–Watershed or small stream network
7)Need new survey designs for stream networks
–Capture both coarse and fine scale variation
–Ensure that hydrologic neighborhoods are represented

Objective
Demonstrate how a geostatistical methodology can be used to meet the requirements of the Clean Water Act
1)Predict regional water quality conditions
2)Identify the spatial location of potentially impaired stream segments

[Figure: map of 1996 MBSS DOC data, with north arrow and scale bar in kilometers]

Methods
Potential covariates

Methods
Potential covariates after initial model selection (10)

Methods
Fit geostatistical models
–Two distance measures: SLD and WAHD
–Restricted model space to all possible linear models
–1024 models per set (2^10 models)
Parameter estimation
–Maximized the profile log-likelihood function

Model selection within distance measure & autocorrelation function
–Spatial AICC (Hoeting et al., in press)
Model selection between distance measure & autocorrelation function
–Cross-validation method using universal kriging algorithm
–312 predictions
–MSPE
Model comparison: r² for the observed vs. predicted values

Results
SLD models performed better than WAHD
–Exception: spherical model
Best models: SLD exponential, Mariah, and rational quadratic models
–r² for SLD model predictions almost identical
–Further analysis restricted to SLD Mariah model
[Figure: MSPE by autocorrelation function (Exponential, Spherical, Mariah, Hole Effect, Linear with Sill, Rational Quadratic)]

Results
Covariates for SLD Mariah model: WATER, EMERGWET, WOODYWET, FELPERC, & MINTEMP
–Positive relationship with DOC: WATER, EMERGWET, WOODYWET, MINTEMP
–Negative relationship with DOC: FELPERC

Cross-validation intervals for Mariah model regression coefficients
Cross-validation interval: 95% of regression coefficients produced by leave-one-out cross-validation procedure
–Narrow intervals
Few extreme regression coefficient values
–Not produced by common sites
–Covariate values for the site are represented in observed data
–Not clustered in space
Model coefficients represent change in log10 DOC per unit of X

r² for Observed vs. Predicted Values
n = 312 sites
r² = 0.72
1 influential site
–r² without site = 0.66

Squared Prediction Error (SPE) Model Fit

Discussion
SLD models more accurate than WAHD models
Landscape-scale covariates were not restricted to watershed boundaries
–Geology type
–Temperature
–Wetlands & water

Discussion
Regression coefficients
Narrow cross-validation intervals
–Spatial location of the sites not as important as watershed characteristics
Extreme regression coefficient values
–Not produced by common sites
–Not clustered in space
–Local-scale factor may have affected stream DOC
–Point source of organic waste

Spatial Patterns in Model Fit
North and east of Chesapeake Bay: large SPE values
Naturally acidic blackwater streams with elevated DOC
Not well represented in observed dataset
–2 blackwater sites
Geostatistical model unable to account for natural variability
–Large squared prediction errors
–Large prediction variances
[Figure: map of SPE values]

Spatial Patterns in Model Fit
West of Chesapeake Bay: low SPE values
Due to statistical and spatial distribution of observed data
–Regression equation fit to the mean in the data
–Most observed sites = low DOC values
Less variation in western and central Maryland
–Neighboring sites tend to be similar
Separation distances shorter in the west
–Short separation distances = stronger covariances
[Figure: map of SPE values]

Model Performance
Unable to account for abrupt differences in DOC values between neighboring sites with similar watershed conditions
What caused abrupt differences?
Point sources of organic pollution
–Not represented in the model
Non-point sources of pollution
–Lumped watershed attributes are non-spatial
–Differences due to spatial location of land use are not represented
–Challenging to represent ecological processes using coarse-scale lumped attributes
–i.e., flow path of water

Generate Model Predictions
Prediction sites
–Study area: 1st, 2nd, and 3rd order non-tidal streams
–3083 segments = 5973 stream km
–Identify downstream node of each segment and create a prediction site
–More than one site at each confluence
Generate predictions and prediction variances
–SLD Mariah model
–Universal kriging algorithm
Assigned predictions and prediction variances back to stream segments in GIS
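For readers unfamiliar with universal kriging, here is a compact sketch of how a prediction and its prediction variance are obtained at one site. It swaps in an exponential covariance on straight-line distance rather than the Mariah model used in the study, treats the covariance parameters as already estimated, and uses invented coordinates, covariate values, and observations.

```python
import numpy as np

def exp_cov(d, nugget, partial_sill, range_):
    """Exponential covariance; the nugget applies only at zero distance."""
    d = np.asarray(d, dtype=float)
    return partial_sill * np.exp(-d / range_) + nugget * (d == 0.0)

def universal_kriging(coords, X, z, coord0, x0, nugget, partial_sill, range_):
    """Prediction and prediction variance at one site.

    coords, X, z: locations, design matrix, and observations at surveyed sites
    coord0, x0:   location and covariate vector at the prediction site
    """
    coords, X, z = (np.asarray(a, dtype=float) for a in (coords, X, z))
    coord0, x0 = np.asarray(coord0, dtype=float), np.asarray(x0, dtype=float)
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d0 = np.linalg.norm(coords - coord0, axis=1)
    Sigma = exp_cov(D, nugget, partial_sill, range_)
    c0 = exp_cov(d0, nugget, partial_sill, range_)
    Si_X = np.linalg.solve(Sigma, X)
    Si_c0 = np.linalg.solve(Sigma, c0)
    XtSiX = X.T @ Si_X
    # Generalized least squares estimate of the regression coefficients.
    beta = np.linalg.solve(XtSiX, X.T @ np.linalg.solve(Sigma, z))
    # Regression mean plus a spatially weighted correction from residuals.
    pred = x0 @ beta + Si_c0 @ (z - X @ beta)
    m = x0 - X.T @ Si_c0
    var = (nugget + partial_sill) - c0 @ Si_c0 + m @ np.linalg.solve(XtSiX, m)
    return float(pred), float(var)

# Made-up data: five survey sites, an intercept, and one covariate.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
X = np.column_stack([np.ones(5), [0.1, 0.4, 0.3, 0.9, 0.5]])
z = np.array([1.0, 1.5, 1.2, 2.0, 1.4])
pred, var = universal_kriging(coords, X, z, [0.5, 0.5], [1.0, 0.4],
                              nugget=0.1, partial_sill=1.0, range_=2.0)
```

Repeating the call for each prediction site gives the pair of values (prediction, variance) that would be joined back to the stream segments.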

DOC Predictions (mg/l)

Weak Model Fit

Strong Model Fit

Water Quality Attainment by Stream Kilometers
Threshold values for DOC
–Set by Maryland Department of Natural Resources
–High DOC values may indicate biological or ecological stress

Implications for Water Quality Monitoring
1)One geostatistical model can be used to predict DOC in stream segments throughout a large area
–Can be used to provide an estimate of regional stream DOC values
–Cannot identify point sources of organic pollution
2)Tradeoff between cost-efficiency and model accuracy
Western Maryland
–Can be described using a single geostatistical model
Eastern and northeastern Maryland
–Accept poor model fit
–Collect additional survey data for regional geostatistical model
–Develop a separate geostatistical model for eastern Maryland

Implications for Water Quality Monitoring
3)Apply this methodology to other regulated constituents
Technical and Regulatory Services Administration within the Maryland Department of the Environment (MDE) is modifying the National Hydrography Dataset (NHD)
–Include water quality standards & stream-use designations by NHD segment
Use water quality standards instead of thresholds
–Categorize predictions into potentially impaired or unimpaired status
–Report on attainment in stream miles/kilometers
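The categorize-and-report step amounts to comparing each segment's prediction with a standard and summing segment lengths by status. A minimal sketch follows; the segment records, field names, and the threshold of 8.0 mg/l are all hypothetical.

```python
def attainment_by_length(segments, threshold):
    """Sum stream length (km) by status, flagging a segment as
    potentially impaired when its prediction exceeds the threshold."""
    totals = {"potentially impaired": 0.0, "unimpaired": 0.0}
    for seg in segments:
        if seg["prediction"] > threshold:
            status = "potentially impaired"
        else:
            status = "unimpaired"
        totals[status] += seg["length_km"]
    return totals

# Made-up segment predictions (mg/l) and lengths (km).
segments = [
    {"id": 1, "length_km": 2.0, "prediction": 3.1},
    {"id": 2, "length_km": 1.5, "prediction": 9.4},
    {"id": 3, "length_km": 4.0, "prediction": 5.0},
]
totals = attainment_by_length(segments, threshold=8.0)
```

A fuller version could also use the prediction variance, for example by flagging segments whose prediction interval straddles the standard as priorities for field sampling.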

Conclusions
1)Geostatistical models generated more accurate DOC predictions than previous non-spatial models based on coarse-scale landscape data
2)SLD is more appropriate than WAHD for regional geostatistical modeling of DOC at this time
3)Adds value to existing water quality monitoring efforts
–Can be used to comply with the CWA more easily
–Additional field sampling is not necessary
–Inferences about regional stream condition can be generated
–Can be used to identify the spatial location of potentially impaired stream segments

Conclusions
4)Model predictions and prediction variances
–Allow additional field efforts to be concentrated in areas with large amounts of uncertainty and in areas with a greater potential for water quality impairment
5)Model results can be displayed visually
–Allows professionals to communicate results to a wide variety of audiences

Thank You!
Advisors: Dave Theobald and Melinda Laituri
Committee Members: Will Clements and Brian Bledsoe
Collaborators: N. Scott Urquhart, Jay M. Ver Hoef, and Andrew A. Merton
Team Theobald: Grant Wilcox, John Norman, Nate Peterson, and Melissa Sherburne
Dennis Ojima and Keith Paustian
Family and friends
My husband Nate

Questions?
