1 - COURSE 4.3 - APPLICATIONS OF STATISTICS IN WATER QUALITY MONITORING.



2 APPLICATIONS IN WATER QUALITY MONITORING Peter Kelderman UNESCO-IHE Institute for Water Education Online Module Water Quality Assessment

3 CONTENTS Necessary water quality monitoring frequency; Significant differences between two data sets; Seasonal trends; Linear correlation; ANOVA; Sign test; Trend detection


5 NECESSARY MONITORING FREQUENCY In an annual water quality monitoring programme measuring phosphate (average = 0.20 mg P/L; standard deviation $s_x$ = 0.05 mg P/L), it is required that the average is known, with 95% confidence, to within 0.03 mg P/L of this average. How many samples/year must be taken to fulfil this requirement? Assume a normal distribution of data.

6 Use the formula of the t-test: 95% confidence interval = $x_{avg} \pm t\, s_x/\sqrt{n}$. So: $t\, s_x/\sqrt{n} = 0.03$ mg P/L (required max. distance away from the average). Or: $n = (t\, s_x/0.03)^2$; for n > 10, t ≈ 2.2 (see t-table; 2-tailed) → $n = (2.2 \times 0.05/0.03)^2 \approx 13$ (i.e. roughly monthly intervals). With this n, the assumption t ≈ 2.2 was correct. Suppose the calculation had instead yielded a small result such as n = 6: the t value for n = 6 is about 2.6, so n must then be recalculated with this larger t value, and this "iteration" repeated until n and the t value used agree with each other.
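The sample-size calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `n_estimate` and `smallest_n` are hypothetical, and the t values are the standard two-tailed 95% table values for d.f. = 1 to 30. Note that the one-step estimate gives 13.4, which the slide rounds to 13, whereas a strict search for the smallest n that meets the 0.03 mg P/L requirement gives 14.

```python
import math

# Two-tailed 95% critical t values for d.f. = 1..30 (standard t-table).
T_95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
        2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
        2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042]


def t_crit_95(df):
    """Critical t value; beyond the table, fall back to the normal value 1.96
    (an approximation that is slightly optimistic for d.f. just above 30)."""
    return T_95[df - 1] if df <= len(T_95) else 1.96


def n_estimate(t, s, d):
    """One-step estimate n = (t * s / d)^2 for an assumed t value."""
    return (t * s / d) ** 2


def smallest_n(s, d, n_max=1000):
    """Smallest n whose 95% half-width t * s / sqrt(n) is at most d."""
    for n in range(2, n_max + 1):
        if t_crit_95(n - 1) * s / math.sqrt(n) <= d:
            return n
    raise ValueError("no n up to n_max meets the requirement")


# Phosphate example: s_x = 0.05 mg P/L, allowable distance 0.03 mg P/L.
print(n_estimate(2.2, 0.05, 0.03))   # one-step estimate with t = 2.2
print(smallest_n(0.05, 0.03))        # strict search over n
```

The search avoids the manual "iteration" by checking each n against its own t value.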

7 “In an annual water quality monitoring programme measuring phosphate (average = 0.20 mg P/L; $s_x$ = 0.05 mg P/L), it is required that the average is known, with 95% confidence, to within 0.03 mg P/L of this average”

8 As a water quality manager, you have two “degrees of freedom” to influence the necessary monitoring frequency (for a “fixed” standard deviation): 1. The allowable range around the average. In the above example, if you reduce it from 0.03 to 0.01 mg P/L, the monitoring frequency must be 9 times higher (check this!). 2. The level of confidence (90%, 99%, ... confidence intervals) → see t-table. Indeed, this is the normal way of estimating frequencies in water quality monitoring programmes!
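The “check this!” can be done with a short Python sketch. It assumes the one-step formula n = (t s / d)^2 with a fixed t value; the function name `n_estimate` is hypothetical. For the second “degree of freedom”, the large-sample normal values (z) are used in place of t, which is an approximation that only holds for large n.

```python
from statistics import NormalDist


def n_estimate(t, s, d):
    """One-step estimate n = (t * s / d)^2 for an assumed t (or z) value."""
    return (t * s / d) ** 2


# 1. Allowable range: 0.03 -> 0.01 mg P/L, same t and standard deviation.
ratio = n_estimate(2.2, 0.05, 0.01) / n_estimate(2.2, 0.05, 0.03)
print(ratio)  # n scales with 1/d^2, so the factor is (0.03/0.01)^2

# 2. Confidence level: two-tailed z values as a large-sample stand-in for t.
for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    print(confidence, round(z, 3), round(n_estimate(z, 0.05, 0.03), 1))
```

Because n depends on the square of both t and 1/d, tightening either requirement quickly increases the number of samples.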

9 Example: EU Water Framework Directive (EU-WFD). We discussed items of the EU-WFD before; see e.g. Course 3-4. The EU-WFD sets guidelines for the required allowable range around the average (“precision”) and the confidence for detecting long-term trends. Temporarily, lower precision and confidence may be accepted, e.g. for socio-economic reasons.

10 Example from EU-WFD: number of river samples/year necessary to estimate PO4 concentrations with 90%, 75% and 50% precision at 90% confidence. (Figure; axis label: allowable range)


12 COMPARISON BETWEEN 2 YEARS OF WQ DATA Significant difference between the 2 years? (Figure: data for year 1 and year 2)

13 Also for detecting a “step trend”, e.g. after introduction of wastewater treatment. (Figure; axis label: [BOD])

14 PROCEDURE Factors playing a role: the difference between the averages, d; the number of observations, n; the standard deviation, $s_x$. This leads to the formula for the t test value: $t_{test} = d/(s_x/\sqrt{n})$. Look up this value in the t-table to see whether or not there is a significant difference between the two data sets at significance level α (the chance of wrongly concluding that the two years are different).

15 Intermezzo: HYPOTHESIS SETTING

16 In statistics, you often test a certain hypothesis, e.g. “there is a significant relationship between two variables”. In the present example, it would be:
H0 hypothesis: the two years do not have different averages
H1 hypothesis: the two years do have different averages
α: chance that you incorrectly think the years are different (“false positive”); β: chance that you incorrectly do not detect a difference (“false negative”). (We will work only with α here.)

Decision     | REALITY: H0 is true   | REALITY: H0 is not true
Accept H0    | correct               | β ("Type 2 error")
Reject H0    | α ("Type 1 error")    | correct

17 Example: Given two years of water quality data, use the “pooled t-test” to determine whether the averages for the two years are significantly different at the 90% and 95% confidence levels. Year 1: $n_x$ = 22; $x_{avg}$ = 9.4 mg/L; $s_x^2$ = 6.2 (mg/L)². Year 2: $n_y$ = 19; $y_{avg}$ = 8.2 mg/L; $s_y^2$ = 5.4 (mg/L)². ($n_x$, $n_y$ = number of observations; $x_{avg}$, $y_{avg}$ = average values; $s_x^2$, $s_y^2$ = variances of the two data sets)

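18 Step 1: find the “pooled variance” $s_w^2$ of the two data sets. Year 1: $n_x$ = 22; $x_{avg}$ = 9.4 mg/L; $s_x^2$ = 6.2 (mg/L)². Year 2: $n_y$ = 19; $y_{avg}$ = 8.2 mg/L; $s_y^2$ = 5.4 (mg/L)². → $s_w^2 = \frac{(n_x - 1)\, s_x^2 + (n_y - 1)\, s_y^2}{n_x + n_y - 2} = \frac{21 \times 6.2 + 18 \times 5.4}{39} = 5.83$ (mg/L)², so $s_w$ = 2.41 mg/L.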

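19 Step 2: Calculate the t test value. Year 1: $n_x$ = 22; $x_{avg}$ = 9.4 mg/L; $s_x^2$ = 6.2 (mg/L)². Year 2: $n_y$ = 19; $y_{avg}$ = 8.2 mg/L; $s_y^2$ = 5.4 (mg/L)². → $t_{test} = \frac{x_{avg} - y_{avg}}{s_w \sqrt{1/n_x + 1/n_y}} = \frac{9.4 - 8.2}{2.41 \sqrt{1/22 + 1/19}} = 1.59$, where $s_w^2$ is the pooled variance from Step 1 (here 5.83 (mg/L)², so $s_w$ = 2.41 mg/L).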

20 22 + 19 = 41 observations → d.f. = 41 - 2 = 39 (two independent data sets; for each, subtract 1). 2-tailed t-test: p = 0.05 → t = 2.02; p = 0.1 → t ≈ 1.68. Our t value of 1.59 is smaller than both of these t values, so the difference is not significant (p > 0.1).
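The pooled t-test of this example can be reproduced with a short Python sketch. The function names are hypothetical, and the formulas are the standard pooled-variance ones; the critical values are passed in by hand (2.02 from the slide, and 1.68 from a standard t-table for d.f. = 39).

```python
import math


def pooled_variance(n_x, var_x, n_y, var_y):
    """Pooled variance s_w^2 of two independent data sets."""
    return ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)


def pooled_t(mean_x, mean_y, n_x, var_x, n_y, var_y):
    """Return (t test value, degrees of freedom) for the pooled t-test."""
    s_w = math.sqrt(pooled_variance(n_x, var_x, n_y, var_y))
    t = (mean_x - mean_y) / (s_w * math.sqrt(1 / n_x + 1 / n_y))
    return t, n_x + n_y - 2


# Year 1: n = 22, average 9.4 mg/L, variance 6.2; Year 2: n = 19, 8.2, 5.4.
t_value, df = pooled_t(9.4, 8.2, 22, 6.2, 19, 5.4)
print(round(pooled_variance(22, 6.2, 19, 5.4), 2), round(t_value, 2), df)

# Compare with two-tailed critical values for d.f. = 39.
for p, t_crit in ((0.05, 2.02), (0.1, 1.68)):
    print(p, "significant" if abs(t_value) > t_crit else "not significant")
```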

21 What if we had found another t value? t test = …: significant at the “p < 0.1 level”; now check: t test = …: significant, p < 0.05; t test = …: significant, p < 0.01; t test = …: significant, p < 0.001.


23 Oxygen content in a small ditch, maximum values (often “supersaturation”: > 100% oxygen). Large seasonal variations lead to high $s_w$ values → trends cannot be detected.

24 Kruskal-Wallis test: 1. Divide the data into four seasons (or 12 months). 2. Rank all data (highest value = rank 1). 3. Take the sum of ranks for each of the four seasons (e.g. in the above figure: Σ Winter = 580; Σ Spring = 675; Σ Summer = 325; Σ Autumn = …). 4. Apply the Kruskal-Wallis test (comparable with the t-test) → significant differences between seasons? At which level?
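The ranking steps above can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and made-up oxygen saturation values, since the slide data are not available. It uses the standard statistic H = 12/(N(N+1)) Σ R_i²/n_i - 3(N+1) without a tie correction; H does not depend on whether the highest or the lowest value gets rank 1.

```python
def average_ranks(values):
    """Ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # sum of ranks per season
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical oxygen saturation (%) per season, for illustration only.
seasons = [[85, 90, 88], [110, 120, 105], [140, 150, 135], [95, 100, 92]]
h = kruskal_wallis_h(seasons)
# For 4 groups (d.f. = 3) the chi-square critical value at p = 0.05 is 7.815;
# for small groups this chi-square comparison is only approximate.
print(round(h, 2), "significant" if h > 7.815 else "not significant")
```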


26 CORRELATION ANALYSIS Measures the strength of association between two variables → correlation coefficient r. r = 1: 100% positive correlation; r = -1: 100% negative correlation; r = 0: no correlation. Used for, e.g.: optimisation of monitoring (reducing frequency, number of variables, number of stations); finding out common sources of pollution; etc.
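As an illustration, the correlation coefficient r can be computed with the standard Pearson formula. The sketch below uses a hypothetical function name and made-up paired measurements; it is not the EXCEL procedure used in the course exercise.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long data series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired measurements of two water quality variables.
var_a = [12.0, 15.5, 20.1, 24.8, 31.0, 35.2]
var_b = [110.0, 128.0, 160.0, 175.0, 220.0, 241.0]
r = pearson_r(var_a, var_b)
print(round(r, 3), round(r ** 2, 3))  # r and the explained fraction r^2
```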

27 EXAMPLES Highly significant correlation between chloride and conductivity → leave out Cl- (routinely)? Less significant correlation of conductivity with hardness.

28 Example GEMS programme: correlation between discharge and alkalinity (log-log transformed!). $r^2$ gives the % of the variation that can be ascribed to the relationship between the two variables (here: 72.3%).

29 EXAMPLE: Correlation between NH4-N and N_tot in Kirinya wetland, Uganda. Is this t test value significant? (r-value found with EXCEL; see exercise)

30 2-tailed t-test: 12 samples, so d.f. = 12 - 2 = 10 (two dependent data sets). t test = 3.94 → highly significant correlation (p < 0.01).
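The significance of a correlation coefficient is commonly tested with t = r √(n - 2) / √(1 - r²), with n - 2 degrees of freedom. The sketch below assumes this standard formula (the slide formula itself is not in the transcript) and uses hypothetical function names. The r value of about 0.78 is back-calculated from the reported t = 3.94 and n = 12, not taken from the slides; the critical value 3.169 (d.f. = 10, p = 0.01, 2-tailed) comes from a standard t-table.

```python
import math


def t_from_r(r, n):
    """t test value for a correlation coefficient r based on n data pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def r_from_t(t, n):
    """Inverse of t_from_r: the r that corresponds to a given t value."""
    return t / math.sqrt(t * t + n - 2)


T_CRIT_P01_DF10 = 3.169  # two-tailed, p = 0.01, d.f. = 10 (standard t-table)

r_implied = r_from_t(3.94, 12)   # back-calculated, an assumption
print(round(r_implied, 2))
print(3.94 > T_CRIT_P01_DF10)    # True means significant at p < 0.01
```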

31 BE CAREFUL WITH... Data clouds with just a few outliers. Solutions could be: leave out outliers (apply a test, e.g. the Dixon Q test), or use RANK correlation (Spearman rank test; see book Chapman). Too many data: for, say, 100 data, already $r^2$ = 0.04 would give a significant (p < 0.05) correlation (only 4% of the variation in the values of the variables explained by the relationship between the two variables!?). Non-linear trends → apply non-linear regression.
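To illustrate why rank correlation is less sensitive to outliers, the sketch below computes the Spearman rank coefficient as the Pearson correlation of the ranks. Function names and data are hypothetical; the last y value is a deliberate outlier.

```python
import math


def average_ranks(values):
    """Ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return s_xy / math.sqrt(sum((a - mx) ** 2 for a in x)
                            * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]          # monotonic, but with one extreme value
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

The outlier pulls the ordinary r down, while the rank coefficient only sees that y increases whenever x does.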

32 Example: relationship light penetration, P content, pH, colour of lakes as f(altitude) (see Håkanson, 2006)

33 EXAMPLE: Correlation between TOC and COD in a wastewater. Graph made using EXCEL; r = 0.89? (Correlation valid only for a limited range!)


35 ANOVA In research, there are often data sets containing different “groups” (e.g. different soil types). For each group (soil type), we have e.g. data on P adsorption onto the soil. ANOVA (Analysis Of Variance) looks at the variability of the data and divides it into: “between groups variability” (e.g. between different soil types) and “within groups variability” (differences between replicates) → F value; from this value it can be decided whether or not the “groups” are significantly different, and at what significance level. Just as for the t-test, the number of degrees of freedom plays a big role.

36 Example: P adsorption capacity $q_m$ of five land use types in the Migina catchment, Rwanda. Box plots represent the 25th and 75th percentiles; bars represent minimum and maximum values. Only “wetland” soil is significantly different from the rest (one-way ANOVA*; p < 0.001). * One independent variable
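The split into “between groups” and “within groups” variability can be sketched as a one-way ANOVA F calculation. The function name and the data are hypothetical (made-up replicate values for three soil types); the resulting F value still has to be compared with an F-table for the given degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, d.f. between groups, d.f. within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means versus the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-groups sum of squares: replicates versus their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical replicate measurements for three soil types.
soil_types = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_value, df_b, df_w = one_way_anova_f(soil_types)
print(f_value, df_b, df_w)
```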


38 Another useful (non-parametric) test: the sign test. It compares pairs of data in data sets A and B and determines for how many pairs Data A > Data B (and for how many pairs Data A < Data B, or the two are equal) → is data set A significantly larger/smaller than B? Especially useful if there are large variations in data sets A and B (e.g. seasonal trends).

39 Example: monitoring sediment resuspension at two stations in Lake Markermeer, Netherlands; this was done by using “sediment traps” hung near the bottom and at half-depth. STA different from STB? Differences between “bottom” and “half-depth” traps?

40 Very large variations over the season; however, the data can be compared in pairs, since they were monitored on the same days. Comparing bottom traps: resuspension at STB always (n = 7) > STA → “values at STB > STA” (p < 0.05). Comparing bottom with “half-depth” resuspension: bottom values always (n = 14) > half-depth values → “values at bottom > half-depth” (p < 0.01). Significance levels are a function of n and of the “paired differences” (>, <, =) → look up in a “sign test table”.
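Instead of a sign test table, the significance level can be computed directly from the binomial distribution. The sketch below uses a hypothetical function name and assumes a two-sided test in which equal pairs are dropped; it reproduces the two significance levels quoted above.

```python
from math import comb


def sign_test_p(n_plus, n_minus):
    """Two-sided sign test p-value; pairs with equal values are left out."""
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    # Probability of k or fewer of the rarer sign if both signs are equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Bottom traps: STB > STA in all 7 pairs.
print(sign_test_p(7, 0))
# Bottom versus half-depth traps: bottom larger in all 14 pairs.
print(sign_test_p(14, 0))
```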


42 TREND DETECTION See book Chapman, Ch. 10. There are many types of trends. A simple t-test can be used for trend detection, but there are also more rigorous statistical tests; these are outside the scope of these lectures.

43 Example: BOD trend in 77 New Zealand rivers. Highly significant decreasing BOD trend from … For details, see MTM IV, p. 207.

44 Trends will more likely be detected for higher monitoring frequency

45 We have only discussed basic statistical procedures. In this Course unit, you will also find an EXCEL exercise dealing with the topics we have just highlighted. We did not deal with more advanced statistical tools such as multivariate techniques and cluster analysis; these two techniques will be highlighted in two (optional) presentations.

46 Literature Statistics D. Chapman (1996). Water Quality Assessments - A guide to use of Biota, sediments and water in Environmental monitoring. London, Chapman and Hall. D.C. Montgomery & G.C. Runger (2003). Applied statistics and probability for engineers. New York, Wiley and sons.

47 FURTHER READING Proceedings “Monitoring tailor-made” MTM II, p. 153 (trend assessment New Zealand waters) MTM II, p. 287 (North sea optimization; see Course 3.5) MTM III, p. 113 (Design monitoring programme) MTM III, p. 307 (Trend detection) MTM III, p. 323 (Multivariate techniques) MTM IV, p. 207 (Trend detection; see before)