Developing O/E (Observed-to-Expected) Models for Assessing Biological Condition. Chuck Hawkins, Western Center for Monitoring and Assessment of Freshwater Ecosystems.

Presentation transcript:

Developing O/E (Observed-to-Expected) Models for Assessing Biological Condition
Chuck Hawkins, Western Center for Monitoring and Assessment of Freshwater Ecosystems, Utah State University
11 May 2006, National Water Quality Monitoring Council, 5th National Monitoring Conference, San Jose, California

Content of Short Course:
–O/E as a concept.
–E: simple idea, not so easy to estimate.
–Sampling, probabilities of capture, and E.
–E as a function of taxon-specific probabilities of capture.
–Predicting E: it only hurts for a while!
–Errors, inferences, and two types of assessments.
–O/E and the WSA – understanding the numbers.

What is O/E? O/E is a measure of the taxonomic completeness of the biological community observed at a site. E = 8 taxa, O = 3 taxa, O/E = 3 / 8 = 0.38.

O/E Allows Comparison of “Apples” and “Oranges”

O/E is a Site-Specific, Standardized Measure of Biodiversity Loss. O = 7 and E = 10 give O/E = 0.70; O = 21 and E = 30 also give O/E = 0.70.

E: simple idea, not so easy to estimate: Accurately and precisely describing the biota expected in different waterbodies.

Hypothetical Variation in Probabilities of Capture of Several Taxa Along One Natural Environmental Gradient. [Figure: probability of capture plotted against temperature.] The challenge is compounded because taxa pc’s are simultaneously controlled by several natural factors.

A Segue into the Messy Issue of Probabilities. Sampling means uncertainty! Sampling error, probabilities of capture, and E.

[Figure: number of taxa plotted against sampling effort, with points labeled lab sub-sample, field sample, and complete census.] The actual composition and number of taxa in any given sample will have a random component.

E as a function of taxon-specific probabilities of capture. Although E = 8 taxa is a true statement, the picture of distinct composition is misleading. The real composition associated with E is actually a bit fuzzy. [Figure labels: PC = 1, PC = 0.8, PC = 0.5, PC = 0.2.]

Sampling and Probabilities of Capture
Taxon      Replicate Sample Number   Freq (pc)
Baetis     **********                1.0
Perla      ********                  0.8
Corixa     *****                     0.5
Drunella   *****                     0.5
Epeorus    *                         0.1
Sp Count
E = ∑ pc = mean number of taxa / sample = 2.9.
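The table above can be reproduced with a short Python sketch. The frequencies (10, 8, 5, 5, and 1 occurrences in 10 replicate samples) come from the slide; which replicate each taxon turns up in is an arbitrary assumption, and the function and variable names are illustrative only.

```python
# Presence (1) / absence (0) of each taxon in 10 replicate samples.
# Occurrence counts follow the slide; the position of each occurrence
# within the series of replicates is an arbitrary assumption.
replicates = {
    "Baetis":   [1] * 10,
    "Perla":    [1] * 8 + [0] * 2,
    "Corixa":   [1] * 5 + [0] * 5,
    "Drunella": [0] * 5 + [1] * 5,
    "Epeorus":  [1] + [0] * 9,
}


def capture_probabilities(presence):
    """Estimate each taxon's pc as its frequency of occurrence."""
    return {taxon: sum(hits) / len(hits) for taxon, hits in presence.items()}


pc = capture_probabilities(replicates)

# E is the sum of the taxon-specific probabilities of capture.
expected = sum(pc.values())

# The same number falls out as the mean taxa count per replicate sample.
n_samples = 10
taxa_per_sample = [
    sum(hits[i] for hits in replicates.values()) for i in range(n_samples)
]
mean_richness = sum(taxa_per_sample) / n_samples

print(round(expected, 2), round(mean_richness, 2))
```

Individual replicates contain anywhere from two to four taxa under this arrangement, which is the "fuzzy" composition the earlier slide refers to, yet their mean equals the sum of the pc's.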

Calculating O/E is simple... if we can estimate probabilities of capture.

How O/E is Calculated: The sum of taxa pc’s estimates the number of taxa (E) that should be observed given standard sampling.
Taxon      pc     O
Atherix    0.92   ●
Baetis     0.86   ●
Caenis     0.70
Drunella   0.63
Epeorus    0.51   ●
Farula     0.32
Gyrinus    0.07
Hyalella   0.00
E = 4.01, O = 3
O/E = 3 / 4.01 = 0.75
(The slide’s additional observation columns, O2 and O3, also total 3 taxa each.)

Predicting E: it only hurts for a while! Two basic approaches:
–Model many individual species (logistic regression models) and then combine the many predictions.
–Model a few assemblage types and then ‘back out’ probabilities of capture for individual species.
–We do the latter.

Yes, explaining how E is predicted can be a little complicated. [Cartoon: a blackboard covered in a long equation for E.] “In layman’s terms? I’m afraid I don’t know any layman’s terms.”

The basic approach to estimating pc’s from predictions of assemblage type was worked out several years ago.
Moss, D., M. T. Furse, J. F. Wright, and P. D. Armitage. The prediction of the macro-invertebrate fauna of unpolluted running-water sites in Great Britain using environmental data. Freshwater Biology 17:
Empirical modeling that derives predictions of the probability of capturing a species at a new location from observations at ‘reference’ sites.
A primer is on our web page:

Three Major Steps in Estimating E
1. Classify reference sites based on their biological similarity.
2. Predict the class of a new site from environmental attributes with a discriminant functions model.
3. Weight frequencies of occurrence of taxa within classes by the site’s probabilities of class membership to estimate pc’s and then E.

Classifying Reference Sites (sites within classes are seldom spatially clustered). [Figure: reference site classes A to H, distinguished by water temperature and elevation (cool water, high elevation vs. warm water, low elevation), geology (limestone vs. granite watershed), limestone springs, and stream size (small, medium, big).] Environmental features associated with biologically defined classes.

Frequencies of occurrence of the most common taxa in three reference classes (figure labels: A, B, C; Small, Big, Medium; Granite):
Hydropsyche 100%, Caenis 95%, Baetis 90%, Tricorythodes 80%, Drunella grandis 70%
Baetis 100%, Drunella grandis 85%, Arctopsyche 80%, Neophylax 75%, Optioservus 70%
Baetis 95%, Epeorus 90%, Simulium 90%, Arctopsyche 75%, Zapada 70%
We could use these numbers to estimate E at a new site belonging to one of these stream ‘types’, but what if the site is ‘medium-big’, etc.?

Discriminant Functions Models Classify New Sites in Terms of Their Probabilities of Class Membership. [Diagram: biologically defined reference classes (Class A, Class B, Class C, Class D) and reference site predictor variables (catchment area, geology, latitude, longitude, elevation, etc.) feed a discriminant analysis, which produces the discriminant model.]

By Weighting Taxon Frequencies of Occurrence within a Class by the Probabilities of Class Membership, We Can Estimate Individual Taxon Probabilities of Capture. [Diagram: predictor variable values enter the discriminant model; for each class A, B, C, D the table lists the probability of class membership, the frequency of the taxon in the class, and the class contribution to PC.] Probability of taxon being in a sample if the site is in reference condition = 0.38.
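A minimal Python sketch of this weighting step follows. The per-class numbers on the slide did not survive in the transcript, so the membership probabilities and taxon frequencies below are hypothetical values chosen only so that the result matches the 0.38 shown; `taxon_pc` is an illustrative name, not a function from any published O/E software.

```python
def taxon_pc(class_membership, class_frequency):
    """Weight a taxon's frequency in each reference class by the site's
    probability of belonging to that class, then sum the contributions."""
    if len(class_membership) != len(class_frequency):
        raise ValueError("need one membership probability per class")
    return sum(p * f for p, f in zip(class_membership, class_frequency))


# Hypothetical values for classes A, B, C, D (not taken from the slide).
membership = [0.5, 0.4, 0.1, 0.0]   # output of the discriminant model
frequency = [0.6, 0.2, 0.0, 0.9]    # taxon frequency within each class

contributions = [p * f for p, f in zip(membership, frequency)]
pc = taxon_pc(membership, frequency)

print([round(c, 2) for c in contributions], round(pc, 2))
```

Class D shows why the weighting matters in this made-up case: the taxon is common there, but the site has no chance of belonging to that class, so it adds nothing to the pc.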

The model estimates the pc’s of every taxon (i.e., those observed in at least one reference site sample) at every assessed site, not just a few as shown here for illustration. Also, if a taxon is predicted to have a pc of zero, it does not contribute to O!
Taxon      pc     O
Atherix    0.70   *
Baetis     0.92   *
Caenis     0.86
Drunella   0.63
Epeorus    0.51   *
Farula     0.38
Gyrinus    0.07
Hyalella   0.00   *
E = 4.07, O = 3
O/E = 3 / 4.07 = 0.74
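The arithmetic in this table, including the rule that a taxon with pc = 0 does not count toward O, can be checked with a small Python sketch. The pc values and observed taxa are the ones on the slide; the function name `o_over_e` is an assumption made for illustration.

```python
# Predicted probabilities of capture and observed taxa, from the slide.
pc = {
    "Atherix": 0.70, "Baetis": 0.92, "Caenis": 0.86, "Drunella": 0.63,
    "Epeorus": 0.51, "Farula": 0.38, "Gyrinus": 0.07, "Hyalella": 0.00,
}
observed_taxa = {"Atherix", "Baetis", "Epeorus", "Hyalella"}


def o_over_e(pc, observed_taxa):
    """Return (O, E, O/E) for one sample.

    E is the sum of predicted pc's. O counts only observed taxa that the
    model expects at the site (pc > 0), so Hyalella is ignored here.
    """
    expected = sum(pc.values())
    observed = sum(1 for taxon in observed_taxa if pc.get(taxon, 0.0) > 0.0)
    return observed, expected, observed / expected


o, e, ratio = o_over_e(pc, observed_taxa)
print(o, round(e, 2), round(ratio, 2))
```

Four taxa were collected, but only three are counted, which keeps O and E defined over the same set of taxa.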

Errors, inferences, and two types of assessments. Model error. Inferring site condition. Inferring regional conditions.

Need to Estimate Prediction Error for Site Assessments. Is a site with O/E = 0.8 impaired? [Figure: O plotted against E, with the O/E scale marked at 1.]

Statistical Issues Regarding Inferences of Impairment (Single Samples). Statistical hypothesis: does the observed O/E value for a single sample come from the same distribution of values estimated for reference sites? That is, the site is either equivalent to reference or it is not. Ideally, we should set a threshold value that balances Type I and Type II errors. Type I error is easy to set, but Type II errors are problematic. The 10th and 90th percentiles of reference site values have been used.
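As a rough illustration of the percentile approach, the Python sketch below derives 10th and 90th percentile thresholds from a set of reference-site O/E values and checks a single sample against them. The reference values are hypothetical, and the `inclusive` interpolation method is just one of several reasonable percentile definitions.

```python
import statistics

# Hypothetical O/E values at reference sites (not from the presentation).
reference_oe = [0.78, 0.85, 0.90, 0.93, 0.97, 1.00,
                1.03, 1.06, 1.10, 1.15, 1.22]


def reference_band(values):
    """Return the 10th and 90th percentiles of reference O/E values."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[-1]


def assess(oe, low, high):
    """Compare a single-sample O/E with the reference band."""
    if oe < low:
        return "below reference range"
    if oe > high:
        return "above reference range"
    return "within reference range"


low, high = reference_band(reference_oe)
print(round(low, 2), round(high, 2), assess(0.80, low, high))
```

With a one-sided 10th percentile threshold, about 10% of true reference samples would be flagged by construction, which is the Type I error rate that is easy to set; the Type II rate depends on how far impaired sites actually fall below 1.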

How Good Can a Model Be? The SD of O/E values calculated at reference quality sites is a measure of overall model error.
–Part sampling error
–Part prediction error (random and systematic)
A model can be no more precise than random sampling error. A model should be no worse than a null model, i.e., one that assumes all sites have similar biota.
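The comparison between a predictive model and a null model can be sketched in Python as follows. All numbers are hypothetical. The null model is implemented here by assigning every site the same E, set to the mean observed richness at reference sites; that is one simple reading of "all sites have similar biota", not necessarily the exact procedure used by the author.

```python
import statistics

# Hypothetical reference-site data (not from the presentation).
observed = [12, 18, 9, 15, 21, 6]                       # O at each site
expected_model = [12.5, 17.0, 10.0, 14.0, 20.0, 7.0]    # site-specific E

# Null model (assumption): every site gets the same E, taken here as
# the mean number of taxa observed across reference sites.
expected_null = statistics.mean(observed)

oe_model = [o / e for o, e in zip(observed, expected_model)]
oe_null = [o / expected_null for o in observed]

# SD of reference-site O/E values is the measure of overall model error.
sd_model = statistics.stdev(oe_model)
sd_null = statistics.stdev(oe_null)

print(round(sd_model, 3), round(sd_null, 3))
```

A useful model should give an SD well below the null SD, while the SD expected from replicate sampling alone sets the floor that no model can beat.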

For Regional Assessments, We Want to Compare the Distribution of Observed O/E Values Among Sites with the Expected Distribution. [Figure: the distribution of O/E expected if all sites are in reference condition (centered on 1) overlaid on the actual distribution; stream miles are labeled 35%, 40%, and 25% across the classes Fair, Poor, and Ref.]

Statistical Issues Regarding Inferences of Impairment (Multiple Sites and Replicated Samples at a Site). Statistical hypothesis: is the observed mean different from 1 (the reference mean)? This test allows us to ask questions regarding how impaired a site or population of sites is. Sensitivity of the test is a function of model precision and sample size. Methods for balancing Type I and II errors are well worked out. Replicate samples at a site allow estimation of confidence limits around estimates of O/E.
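One way to carry out such a test is a one-sample t-test of the mean O/E against 1, sketched below in Python with the standard library only. The replicate values are hypothetical, and the critical value 2.776 is the two-sided 95% t value for 4 degrees of freedom, so it has to be changed if the number of replicates changes.

```python
import math
import statistics


def mean_oe_test(values, reference_mean=1.0, t_critical=2.776):
    """One-sample t statistic and confidence limits for mean O/E.

    t_critical must match len(values) - 1 degrees of freedom; the default
    is the two-sided 95% value for 4 degrees of freedom (5 replicates).
    """
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = (mean - reference_mean) / std_error
    limits = (mean - t_critical * std_error, mean + t_critical * std_error)
    return mean, t_stat, limits


# Hypothetical O/E values from five replicate samples at one site.
replicate_oe = [0.70, 0.78, 0.74, 0.81, 0.72]
mean, t_stat, (ci_low, ci_high) = mean_oe_test(replicate_oe)

print(round(mean, 3), round(t_stat, 2), round(ci_low, 3), round(ci_high, 3))
```

In this made-up case the upper confidence limit stays below 1, so the mean O/E differs from the reference mean at the 5% level; more replicates or a more precise model would narrow the interval further.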

O/E and the WSA (Wadeable Streams Assessment) – understanding the numbers.
–WSA reference sites
–How many models?
–Reference site classes
–Model predictors
–Model performance
–Assessment results

1097 Reference Sites in Three Super-Ecoregions: West, Plains, and Eastern Highlands.

Sample Sizes (columns: West | Plains | East; rows: Calibration, Validation)

Great variability in geographic distribution of sites within classes. The Western model used 30 classes of streams for modeling. Graphic courtesy of Pete Ode.

Predictor Variables (columns: West | Plains | E. Highlands)
Longitude | --- | Longitude
Elevation --- Day of Year Basin Area Stream Slope ---
Air Temperature | Freeze-Free Days | Air Temperature
Log Precipitation Wet Days

Variation in Predictor Variable Values Within and Among the Eastern Highland Reference Site Classes

Model Performance (columns: West | Plains | E. Highlands). Validation: Mean, SD (model), SD (null). Test Sites: Mean.

Sites sampled for the Wadeable Streams Assessment by EPA Region.

Biodiversity status of the Nation’s streams as measured by O/E. Data summarized as % of stream miles in each of 4 O/E classes.

Concluding Remarks: O/E has an intuitive biological meaning. It means the same thing everywhere. Its derivation and interpretation are independent of type and knowledge of stressors in the region. It is quantitative, but...

Our interpretations of assessments are still only as good as our understanding of aquatic ecosystems.