Published by Elfreda Lawrence; modified over 8 years ago
2
COURSE 4 - DATA HANDLING AND PRESENTATION
3
UNESCO-IHE Institute for Water Education Online Module Water Quality Assessment
4
Objectives of the OLC
After successful completion of the course, participants will be able to:
- understand and apply concepts of water quality and pollution processes in rivers and lakes;
- understand and apply the different steps of the monitoring cycle in rivers, lakes and groundwater;
- apply common statistical techniques for water quality data evaluation;
- design sound and sustainable freshwater quality monitoring and assessment programmes under specified conditions.
5
(Course 4.1. 3.5.) 4.2. STATISTICAL CONCEPTS
Peter Kelderman
UNESCO-IHE Institute for Water Education, Online Course Water Quality Assessment
6
Some water quality questions that can be tackled with “statistics”:
- What is the general water quality at a site, and is it significantly different from that at another site?
- Is the water quality improving or getting worse?
- How do certain variables relate to others?
- What monitoring frequency is needed for a certain desired “reliability”?
- How accurate are the values of the water quality parameters?
- Etc.
7
HINT: Consult specialised statisticians when processing water quality data sets. They should also be closely involved in the set-up, execution and evaluation of water quality monitoring programmes!
8
CONTENTS
- Introduction: one year of data on oil refinery effluents
- Normal and non-normal distributions
- t-test and confidence intervals
- Non-parametric statistics
10
LEVELS OF OIL-IN-WATER: 1 YEAR OF DATA
How should we process these 284 data points?
11
Graph: shows the trend and the maximum/minimum values.
12
FREQUENCY TABLE OF THE DATA
Split the data up into intervals of, e.g., 5 mg/L.
Conclusions:
- Most data are grouped around 10-15 mg/L.
- There are “outliers” towards higher values.
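The binning step behind such a frequency table can be sketched in a few lines of Python. This is a minimal illustration, not the course's own procedure: the function name `frequency_table` and the ten concentrations below are hypothetical (the 284 refinery values are not reproduced here), and each interval is taken as [low, high), with empty intervals kept so the table matches a histogram.

```python
from collections import Counter

def frequency_table(values, width=5.0):
    """Count values per interval [k*width, (k+1)*width), including empty intervals."""
    counts = Counter(int(v // width) for v in values)
    lo, hi = min(counts), max(counts)
    return [(k * width, (k + 1) * width, counts[k]) for k in range(lo, hi + 1)]

# Hypothetical oil-in-water concentrations in mg/L (NOT the 284 refinery values)
oil = [3.0, 8.5, 11.0, 12.0, 12.5, 13.0, 14.5, 16.0, 22.0, 41.0]

for low, high, n in frequency_table(oil):
    print(f"[{low:4.1f}, {high:4.1f}) mg/L: {n}")
```

Changing `width` to another interval size shows how sensitive the picture is to the choice of bins.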
13
HISTOGRAM OF DATA
- The data are “skewed” to the right.
- This is a slightly “non-normal” distribution.
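Right-skewness can also be put into a number. The sketch below is an assumption, not a method from the slides: it uses the moment coefficient of skewness (positive for a long right tail) and compares mean and median, since a mean above the median is another sign of right skew. The data are the same hypothetical values as above, not the refinery data.

```python
from statistics import mean, median

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical oil-in-water concentrations in mg/L
oil = [3.0, 8.5, 11.0, 12.0, 12.5, 13.0, 14.5, 16.0, 22.0, 41.0]
print(f"mean = {mean(oil):.2f}, median = {median(oil):.2f}, g1 = {skewness(oil):.2f}")
```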
15
Normal/non-normal distributions
- Normal distribution: symmetric around the average (“Gauss curve”).
- Non-normal distribution: “skewed”, may have two peaks, etc.
16
Testing for normality:
- Visually: does the histogram look like a “Gauss curve”?
- Graphically, using “probability paper” (see graph).
- Using rigorous statistical tests such as the “chi-square test”.
See the book by Chapman, Chapter 10.
17
NORMAL DISTRIBUTIONS ARE USEFUL...
- Many data sets form a more or less normal distribution; however, water quality data in particular may deviate from it!
- With normal distributions, many common statistical tests can be performed (t-test, 95% confidence intervals, ...).
- Non-normal distributions can often be “transformed” into normal distributions, e.g. by a logarithmic (log) transformation.
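The effect of a log transformation on skewed data can be sketched as follows. This is a minimal illustration under stated assumptions: the values are hypothetical, base-10 logarithms are used, and all values must be strictly positive (zeros or values below the detection limit need separate treatment). The skewness measure is the moment coefficient, as an assumed choice.

```python
import math
from statistics import mean

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

oil = [3.0, 8.5, 11.0, 12.0, 12.5, 13.0, 14.5, 16.0, 22.0, 41.0]  # hypothetical, mg/L
log_oil = [math.log10(x) for x in oil]  # requires strictly positive values

print(f"skewness raw: {skewness(oil):.2f}, after log10: {skewness(log_oil):.2f}")
```

For these values the skewness moves much closer to zero after the transformation; whether a transformed data set is close enough to normal still has to be checked with the tests listed for normality.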
18
NORMAL DISTRIBUTION
Bell-shaped “Gauss curve”; its bending points lie ±1 standard deviation away from the average.
Standard deviation (use EXCEL!):
$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where $\bar{x}$ (x avg.) = average (= mean) value, $x_i$ = individual observations and n = number of samples.
19
EXAMPLE (USING EXCEL)
- “Variance” = $s_x^2$ (here: 2.77)
- “Standard error” (of the mean) = $s_x/\sqrt{n}$ (here: 0.52)
- “Coefficient of variation”: $cv = (s_x/\bar{x}) \times 100\%$ (here: 41%)
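The same summary numbers can be computed outside EXCEL. This sketch uses Python's standard `statistics` module, whose `stdev` and `variance` divide by n - 1 like the sample standard deviation above (comparable to Excel's STDEV.S). The function name `describe` and the sample values are hypothetical; they are not the data behind the slide's 2.77 / 0.52 / 41%.

```python
import math
from statistics import mean, stdev, variance

def describe(values):
    """Mean, sample standard deviation, variance, standard error and coefficient of variation."""
    n = len(values)
    x_avg = mean(values)
    s_x = stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "n": n,
        "mean": x_avg,
        "s_x": s_x,
        "variance": variance(values),          # equals s_x ** 2
        "standard_error": s_x / math.sqrt(n),   # standard error of the mean
        "cv_percent": 100 * s_x / x_avg,        # coefficient of variation in %
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, not the slide's data set
for name, value in describe(sample).items():
    print(f"{name}: {value:.3g}")
```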
20
In this Course 4, we will also do a small exercise using EXCEL.
21
STANDARD DEVIATION
The standard deviation gives a good idea of the spread of the data around the average; for normal distributions:
- $\bar{x} \pm s_x$: 68% of all values
- $\bar{x} \pm 2s_x$: 95% of all values
- $\bar{x} \pm 3s_x$: 99.7% of all values
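These percentages follow from the normal distribution itself: the fraction within ±k standard deviations equals erf(k/√2). The short sketch below (function name hypothetical) reproduces the rule; it also shows that exactly 95% corresponds to ±1.96 standard deviations, while ±2 standard deviations gives about 95.4%.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 1.96):
    print(f"x_avg ± {k} s_x: {100 * fraction_within(k):.1f}% of all values")
```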
23
The Student t-test
Invented by Gosset (1876-1937) under the pseudonym “Student”.
An average value alone does not say much: how sure are you of this value? In environmental science and management, we often want “95% confidence” (or perhaps 90% or 99%); in other words, a 5% chance that our statement is wrong. For small data sets (n < 10), we have to take a relatively large uncertainty into account (see the t-values).
24
95% CONFIDENCE INTERVAL FOR A DATA SET
For n samples/observations with mean value $\bar{x}$ and standard deviation $s_x$, there is a 95% chance that the true average value lies within:
$\bar{x} \pm t\, s_x/\sqrt{n}$
Example: for 10 Cl⁻ determinations, $\bar{x}$ = 99.0 mg/L, $s_x$ = 2.0 mg/L and t = 2.3. Then:
$\bar{x}_{95\%}$ = 99.0 ± (2.3 × 2.0/√10) = 99.0 ± 1.5 mg/L
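The interval formula translates directly into code. In this minimal sketch (function name hypothetical), t is passed in, read from the t-table as on the slides, rather than computed; the call reproduces the chloride example with the slide's rounded t = 2.3 (the table value for df = 9 is 2.262, which would give ±1.4 mg/L).

```python
import math

def confidence_interval(x_avg, s_x, n, t):
    """Two-sided confidence interval x_avg ± t * s_x / sqrt(n); t is read from the t-table."""
    half_width = t * s_x / math.sqrt(n)
    return x_avg - half_width, x_avg + half_width

# Chloride example: n = 10, x_avg = 99.0 mg/L, s_x = 2.0 mg/L, t = 2.3
low, high = confidence_interval(99.0, 2.0, 10, 2.3)
print(f"95% C.I.: {low:.1f} to {high:.1f} mg/L")
```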
25
$\bar{x} \pm t\, s_x/\sqrt{n}$
Case 1: n = 10, $\bar{x}$ = 200, $s_x$ = 5.0, t = 2.3: 95% C.I. = 200 ± 2.3 × 5/√10 = 200 ± 3.6
Case 2: n = 100, $\bar{x}$ = 200, $s_x$ = 5.0, t = 1.96: 95% C.I. = 200 ± 1.96 × 5/√100 = 200 ± 1.0
“Reliability” increases with (at least) the square root of the number of samples: the interval shrinks with 1/√n, and t also becomes smaller as n grows.
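The two cases can be compared numerically. This sketch (function name hypothetical) uses the t-values given on the slide and checks that ten times more samples shrinks the interval by more than the factor √10 alone would, because t decreases as well.

```python
import math

def half_width(s_x, n, t):
    """Half-width t * s_x / sqrt(n) of the confidence interval."""
    return t * s_x / math.sqrt(n)

# Case 1 and Case 2 from the slide (s_x = 5.0 in both)
for n, t in ((10, 2.3), (100, 1.96)):
    print(f"n = {n:3d}: 200 ± {half_width(5.0, n, t):.1f}")

ratio = half_width(5.0, 100, 1.96) / half_width(5.0, 10, 2.3)
print(f"ratio = {ratio:.3f} (1/sqrt(10) = {1 / math.sqrt(10):.3f})")
```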
26
t-VALUES
27
SIGNIFICANCE LEVELS
Significance level α = 1 - (% confidence)/100, so for 95% confidence: α = 0.05.
We could also use, e.g.:
- 90% confidence limit: α = 0.10
- 99% confidence limit: α = 0.01
α is also written as p:
- more than 90% sure: p < 0.1
- more than 95% sure: p < 0.05
- more than 99% sure: p < 0.01
28
t-VALUES
“Degrees of freedom” (df) = n - 1 (for one parameter)
In the table: α = p = 0.05 and α = p = 0.01
29
Degrees of freedom...?
Think of having 10 objects, of which you must pick one at a time. At the start, you can choose out of 10 objects, then out of 9, and so on. With two objects left, there is still a choice: 1 out of 2. But with one object left, there is no more “freedom of choice”. So in this case: degrees of freedom = n - 1.
30
Let’s practise a bit with the t-table! What would be the t-value for:
- 10 observations, α = 0.05?
- 20 observations, α = 0.01?
- 25 observations, α = 0.001?
31
Please check yourself (two-sided values, df = n - 1):
- 10 observations, α = 0.05: t = 2.26
- 20 observations, α = 0.01: t = 2.86
- 25 observations, α = 0.001: t = 3.75
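The table values can be cross-checked numerically. The sketch below is a stand-in for a printed t-table, not part of the course material: it integrates the Student t density with Simpson's rule (standard library only) and finds the critical value by bisection. The function names are hypothetical, and `two_sided=False` gives the one-tailed (bottom-row) values.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df):
    """P(T <= x) for x >= 0: 0.5 (left half, by symmetry) + Simpson's rule on [0, x]."""
    steps = 2 * max(500, math.ceil(x / 0.01))  # even number of steps, width <= 0.005
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, alpha, two_sided=True):
    """Critical t value for significance level alpha, found by bisection on the CDF."""
    target = 1 - (alpha / 2 if two_sided else alpha)
    hi = 1.0
    while t_cdf(hi, df) < target:  # widen the search interval until it brackets the root
        hi *= 2
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The three exercise questions: df = n - 1, two-sided alpha
for n, alpha in ((10, 0.05), (20, 0.01), (25, 0.001)):
    print(f"n = {n}, alpha = {alpha}: t = {t_critical(n - 1, alpha):.2f}")
```

Where SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, df)` gives the same two-sided values directly.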
32
So what if we used, e.g., a 99% confidence interval for the earlier example?
$\bar{x} \pm t\, s_x/\sqrt{n}$
For 10 Cl⁻ determinations, $\bar{x}$ = 99.0 mg/L and $s_x$ = 2.0 mg/L; the t-value is now 3.3 (table value 3.25 for df = 9, α = 0.01).
Then $\bar{x}_{99\%}$ = 99.0 ± (3.3 × 2.0/√10) = 99.0 ± 2.1 mg/L
(so a wider range than the 95% C.I. of 99.0 ± 1.5 mg/L).
33
ONE-SIDED OR TWO-SIDED (TAILED)
- Two-sided (two-tailed): both the left- and right-hand sides are important; read the α-values in the top row.
- One-sided (one-tailed): only one side is important; read the α-values in the bottom row.
34
EXAMPLE: 1-TAILED TEST
In a river, five oxygen values have been measured, with average [O2] = 6.0 mg/L and $s_x$ = 1.1 mg/L. What is the lower 95% confidence limit (C.L.) for this O2 value?
n = 5, so df = 4; t = 2.13 (bottom row)
O2,avg ≥ 6.0 - t·$s_x$/√n = 6.0 - (2.13 × 1.1/√5) = 6.0 - 1.05
O2,avg ≥ 4.95 mg/L (95% C.L.)
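The one-tailed calculation only subtracts the margin on one side. A minimal sketch (function name hypothetical), using the one-tailed t-value 2.13 read from the bottom row of the table for df = 4:

```python
import math

def lower_confidence_limit(x_avg, s_x, n, t_one_sided):
    """One-sided lower confidence limit x_avg - t * s_x / sqrt(n)."""
    return x_avg - t_one_sided * s_x / math.sqrt(n)

# O2 example: n = 5 (df = 4), x_avg = 6.0 mg/L, s_x = 1.1 mg/L, one-tailed t = 2.13
print(f"O2,avg >= {lower_confidence_limit(6.0, 1.1, 5, 2.13):.2f} mg/L (95% C.L.)")
```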
35
34 CONTENTS Introduction- one year data of oil refinery effluents Normal and non-normal distributions t-test and confidence intervals Non-parametric statistics
36
NON-PARAMETRIC STATISTICS
These do not rely on an assumed frequency distribution (such as the normal distribution) and its related tests. Examples: the “range”, the 50% value (median) and the 90th percentile.
They are especially suitable for non-normal distributions (which hold for many water quality data), but they can also be used for normal distributions.
Advantages: easier to compute and understand; more resistant to “outliers”, etc.
37
EXAMPLE: CONDUCTIVITY IN WATER SAMPLES
Sort the data in increasing order.
- Range: 118-430
- Median: 218 (50% of the values are smaller and 50% are larger than 218)
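Range and median only need the sorted data. The sketch below (function name and values hypothetical; the 27 conductivity values are not listed in the text) uses `statistics.median`, which averages the two middle values when the number of observations is even.

```python
from statistics import median

def range_and_median(values):
    """Return (minimum, maximum, median) after sorting in increasing order."""
    ordered = sorted(values)
    return ordered[0], ordered[-1], median(ordered)

conductivity = [218, 305, 118, 430, 174, 260, 395]  # hypothetical values
low, high, med = range_and_median(conductivity)
print(f"Range: {low}-{high}, median: {med}")
```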
39
See the book by Chapman, Chapter 10 (here N = 27 observations, so N + 1 = 28):
75th percentile $P_{75}$ (75% of the observations are smaller):
Rank of $P_{75}$ = (75/100)(N + 1) = (0.75)(28) = 21
So observation #21 (305) is the 75th percentile.
90th percentile $P_{90}$ (90% of the observations are smaller):
Rank of $P_{90}$ = (90/100)(N + 1) = (0.9)(28) = 25.2
The percentile lies between observations #25 and #26; by interpolation: $P_{90}$ = (395 × 0.8) + (396 × 0.2) ≈ 395
Check that the 25th percentile $P_{25}$ = 174.
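The rank rule with interpolation can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, ranks below 1 or above N are clamped to the smallest or largest observation, and the test data are simply the ranks 1 to 27. The (N + 1) rank appears to match Excel's PERCENTILE.EXC; other software (e.g. PERCENTILE.INC) uses a different rank formula, so results can differ slightly.

```python
import math

def percentile(values, p):
    """p-th percentile using rank = (p/100) * (N + 1) and linear interpolation."""
    x = sorted(values)
    n = len(x)
    rank = p / 100 * (n + 1)
    if rank <= 1:
        return x[0]            # clamp below the first observation
    if rank >= n:
        return x[-1]           # clamp above the last observation
    k = math.floor(rank)       # observation just below the rank (1-based)
    frac = rank - k            # fraction of the way to the next observation
    return x[k - 1] + frac * (x[k] - x[k - 1])  # list indices are 0-based

observations = list(range(1, 28))  # hypothetical: N = 27, value equals rank
for p in (25, 50, 75, 90):
    print(f"P{p} = {percentile(observations, p):.1f}")
```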
40
EXAMPLE: OIL REFINERY DATA (n = 284)
Put the data in a table with the frequency of each value.
41
Data given as “cumulative percentage”; for example, (2 + 5 + 6 + 11)/284 = 24/284 = 8.5%, so 8.5% of the values are ≤ 3 mg/L.
- Median = 12.5 mg/L
- 90th percentile = 36 mg/L
A discharge permit could then stipulate:
- “Not allowed to exceed 12.5 mg/L oil more than 50% of the time”
- “Not allowed to exceed 36 mg/L oil more than 10% of the time”
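Cumulative percentages are running totals of the frequency table divided by n. The sketch below reproduces the 8.5% from the first four counts on the slide, then reads off the first value at which a cumulative percentage is reached (50% for the median class, 90% for the 90th percentile). Function names and the small second table are hypothetical.

```python
from itertools import accumulate

def cumulative_percentages(counts, total):
    """Running total of the frequency counts, as a percentage of total."""
    return [100 * c / total for c in accumulate(counts)]

def first_level_reaching(levels, counts, percent):
    """Smallest level whose cumulative percentage reaches `percent`."""
    for level, cum in zip(levels, cumulative_percentages(counts, sum(counts))):
        if cum >= percent:
            return level
    return None

# First four counts of the refinery frequency table (2 + 5 + 6 + 11 = 24 of n = 284)
cum = cumulative_percentages([2, 5, 6, 11], 284)
print(", ".join(f"{c:.1f}%" for c in cum))

# Hypothetical small table: levels in mg/L with their counts
levels = [5, 10, 15, 20, 40]
counts = [3, 10, 6, 2, 1]
print(first_level_reaching(levels, counts, 50), first_level_reaching(levels, counts, 90))
```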
42
EXAMPLE FROM GEMS PROGRAMME
Note: “flagged values” are values below the detection limit or out of range.