1 - COURSE 4 - DATA HANDLING AND PRESENTATION UNESCO-IHE Institute for Water Education Online Module Water Quality Assessment.

Presentation transcript:

1 - COURSE 4 - DATA HANDLING AND PRESENTATION

UNESCO-IHE Institute for Water Education Online Module Water Quality Assessment

Objectives of the OLC
After successful completion of the course, participants will be able to:
- Understand and apply concepts of water quality and pollution processes in rivers and lakes
- Understand and apply the different steps of the monitoring cycle in rivers, lakes and groundwater
- Apply common statistical techniques for water quality data evaluation
- Design sound and sustainable freshwater quality monitoring and assessment programmes under specified conditions.

4 (Course 4.1.  3.5.) 4.2. STATISTICAL CONCEPTS Peter Kelderman UNESCO-IHE Institute for Water Education Online Course Water Quality Assessment

5 Some water quality questions that can be tackled with “Statistics”:
- What is the general water quality at a site; is this significantly different from that at another site?
- Is the water quality improving or getting worse?
- How do certain variables relate to others?
- What is a necessary monitoring frequency for a certain desired “reliability”?
- How accurate are the values of the water quality parameters?
- Etc., etc.

6 HINT: Consult specialised statisticians for processing water quality data sets. They should also be closely involved in the set-up, execution and evaluation of water quality monitoring programmes!

7 CONTENTS Introduction- one year data of oil refinery effluents Normal and non-normal distributions t-test and confidence intervals Non-parametric statistics

9 LEVELS OF OIL-IN-WATER: 1 YEAR OF DATA How to process these 284 data??

10 Graph: yields trend + maximum/minimum

11 FREQUENCY TABLE OF THE DATA Conclusions: most data grouped around mg/L; split up into intervals of e.g. 5 mg/L; “outliers” towards higher values.
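The grouping into 5 mg/L intervals can be sketched in a few lines of Python. This is only an illustration: the concentrations below are hypothetical values, not the refinery data, and the function name `frequency_table` is made up for this example.

```python
from collections import Counter

WIDTH = 5.0  # interval width in mg/L, as suggested on the slide


def frequency_table(values, width=WIDTH):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = Counter(int(v // width) * width for v in values)
    return sorted(counts.items())


# Hypothetical oil-in-water concentrations (mg/L), NOT the course data set
oil_mg_l = [2.0, 7.5, 11.0, 12.5, 13.0, 14.9, 18.0, 36.0, 52.0]

table = frequency_table(oil_mg_l)
for lower, count in table:
    print(f"{lower:5.1f} to {lower + WIDTH:5.1f} mg/L: {count}")
```

In this made-up sample most values fall in the 10 to 15 mg/L interval, with a few isolated high values, which is the kind of pattern the frequency table is meant to reveal.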

12 HISTOGRAM OF DATA Data “skewed” to the right → slightly “non-normal distribution”

13 CONTENTS Introduction- one year data of oil refinery effluents Normal and non-normal distributions t-test and confidence intervals Non-parametric statistics

14 Normal/non-normal distributions Normal distribution: symmetric around the average (“Gauss curve”). Non-normal: “skewed”, may have two peaks, etc.

15 Testing for Normality: Visually: does the histogram look like a “Gauss curve”? Graphically, using “probability paper”; see graph. Using rigorous statistical tests such as the “Chi square test”. See book Chapman, Chapter 10.
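The “probability paper” idea can be imitated numerically: sort the data, pair each value with the standard-normal quantile expected at its rank, and check how close the pairs lie to a straight line. The sketch below is an assumption-laden illustration, not the method from Chapman: the plotting position (i - 0.5)/n is one common convention among several, the function names are invented here, and both data sets are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_scores(n):
    """Standard-normal quantiles expected for n sorted observations."""
    # Plotting position (i - 0.5)/n is an assumed convention
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def probability_plot_r(values):
    """Correlation between sorted data and normal scores (near 1: roughly normal)."""
    xs = sorted(values)
    zs = normal_scores(len(xs))
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)


# Hypothetical data sets for illustration only
symmetric = [8, 9, 10, 10, 11, 11, 12, 12, 13, 14]
skewed = [2, 3, 4, 5, 6, 8, 10, 15, 25, 60]

r_sym = probability_plot_r(symmetric)
r_skew = probability_plot_r(skewed)
print(f"symmetric: r = {r_sym:.3f}, skewed: r = {r_skew:.3f}")
```

A correlation close to 1 means the points would fall nearly on a straight line on probability paper; the right-skewed sample gives a visibly lower value. This is a screening aid, not a substitute for a formal test.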

16 NORMAL DISTRIBUTIONS ARE USEFUL: Many data sets form a more or less normal distribution; however, water quality data in particular may deviate from that! With the help of normal distributions, many common statistical tests can be performed (t-test, 95% confidence intervals, etc.). Non-normal distributions can often be “transformed” to normal distributions, e.g. by a log transformation.
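As a rough illustration of the transformation idea, the sketch below compares the skewness of a right-skewed sample before and after taking logarithms. The data are hypothetical, and the simple moment-based skewness used here is just one of several possible definitions.

```python
from math import log10
from statistics import mean


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed concentrations (mg/L), for illustration only
raw = [2, 3, 4, 5, 6, 8, 10, 15, 25, 60]
logged = [log10(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness raw: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

For this sample the log-transformed values are much closer to symmetric, so tests that assume normality are more defensible on the transformed scale. Results then need to be back-transformed for reporting.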

17 NORMAL DISTRIBUTION Bell-shaped “Gauss curve”. Bending points: ± 1 standard deviation away from the average. Standard deviation: use EXCEL! Symbols: x avg. = average (= mean) value; x i = individual observations; n = number of samples.

18 EXAMPLE (USING EXCEL) “Variance” = s x 2 (Here: = 2.77). “Standard error” (of the mean): s x /√n (Here: = 0.52). “Coefficient of variation”: cv = (s x /x avg. )*100% (Here: = 41%).
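The same quantities can be computed outside EXCEL. The sketch below uses Python's `statistics` module and assumes the sample standard deviation (divisor n - 1), which is also what EXCEL's STDEV function uses. The data are hypothetical and do not reproduce the figures on the slide.

```python
from math import sqrt
from statistics import mean, stdev, variance


def describe(values):
    """Mean, variance, standard deviation, standard error and CV of a sample."""
    n = len(values)
    x_avg = mean(values)
    s_x = stdev(values)  # sample standard deviation, divisor n - 1
    return {
        "n": n,
        "mean": x_avg,
        "variance": variance(values),       # s_x squared
        "std_dev": s_x,
        "std_error": s_x / sqrt(n),         # standard error of the mean
        "cv_percent": 100.0 * s_x / x_avg,  # coefficient of variation
    }


# Hypothetical measurements, for illustration only
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>10}: {value:.3f}")
```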

19 In this Course 4 we will also do a small exercise using EXCEL.

20 STANDARD DEVIATION The standard deviation gives a good idea of the spread of the data around the average; for normal distributions: x avg ± s x : 68% of all values; x avg ± 2 s x : 95% of all values; x avg ± 3 s x : 99.7% of all values.
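These percentages follow directly from the normal distribution and can be checked with the standard library. The helper name `fraction_within` is invented for this sketch. Note that ± 2 standard deviations covers slightly more than 95% (the exact 95% range is ± 1.96 standard deviations).

```python
from statistics import NormalDist


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


coverage = {k: fraction_within(k) for k in (1, 2, 3)}
for k, frac in coverage.items():
    print(f"within {k} standard deviation(s): {100 * frac:.1f}%")
```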

21 CONTENTS Introduction- one year data of oil refinery effluents Normal and non-normal distributions t-test and confidence intervals Non-parametric statistics

22 The Student t-test Invented by Gosset, under the pseudonym “Student”. An average value alone does not say much; how sure are you of this value? In environmental science and management we would often like to have “95% confidence” (or maybe 90%, or 99%). In other words: a 5% chance that we are wrong in our statement. For small data sets (n<10) we have to take into account a relatively large uncertainty → see t values.

23 95% CONFIDENCE INTERVAL FOR DATA SET For n samples/observations with mean value x avg and standard deviation s x, there is a 95% chance that the (true) average value lies between: X avg ± t s x /√n. Example: for 10 Cl⁻ determinations, x avg = 99.0 mg/L; s x = 2.0 mg/L; t = 2.3. Then x avg., 95% = 99.0 ± (2.3 * 2.0/√10) = 99.0 ± 1.5 mg/L.
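The chloride example can be reproduced with a few lines of Python. This is a minimal sketch: the function name is invented here, and the t value is passed in by hand (2.3, as read from the t-table on the slides) rather than computed.

```python
from math import sqrt


def confidence_interval(x_avg, s_x, n, t):
    """Return (lower, upper) limits of x_avg ± t * s_x / sqrt(n)."""
    half_width = t * s_x / sqrt(n)
    return x_avg - half_width, x_avg + half_width


# Values from the chloride example: 10 determinations, t = 2.3 from the table
low, high = confidence_interval(x_avg=99.0, s_x=2.0, n=10, t=2.3)
print(f"95% C.I.: {low:.1f} to {high:.1f} mg/L")
```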

24 X avg ± t s x /√n
Case 1: n=10; X avg. = 200; s x = 5.0; t = 2.3 → 95% C.I. = 200 ± 2.3*5/√10 = 200 ± 3.6
Case 2: n=100; X avg. = 200; s x = 5.0; t = 1.96 → 95% C.I. = 200 ± 1.96*5/√100 = 200 ± 1.0
→ "Reliability" increases with (at least) the square root of the number of samples
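The square-root effect can be made explicit by holding everything constant except n. In the sketch below t is fixed at 1.96 for simplicity, which is an assumption that is only reasonable for larger n; for small samples t is larger, which is why the gain is “at least” the square root. The sample sizes are chosen purely for illustration.

```python
from math import sqrt


def half_width(s_x, n, t):
    """Half-width of the confidence interval, t * s_x / sqrt(n)."""
    return t * s_x / sqrt(n)


# Same spread (s_x = 5.0) and fixed t = 1.96; only the sample size changes
widths = {n: half_width(5.0, n, 1.96) for n in (25, 100, 400)}
for n, w in widths.items():
    print(f"n = {n:3d}: x_avg ± {w:.2f}")
```

Quadrupling the number of samples halves the width of the interval, so each further gain in reliability costs disproportionately more sampling effort.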

25 t- VALUES

26 SIGNIFICANCE LEVELS Significance level α = 1 - (% confidence)/100. So for 95% confidence: α = 0.05. We could also use e.g.: 90% confidence limit: α = 0.10; 99% confidence limit: α = 0.01. α is also written as: p. If you are more than 90% sure, then p < 0.1; more than 95% sure: p < 0.05; more than 99% sure: p < 0.01.

27 t- VALUES “Degrees of freedom” = n-1 (for one parameter); α = p = 0.05; α = p = 0.01

28 Degrees of freedom? Think of having 10 objects, which you must pick one by one. At the start, you can choose out of 10 objects, then out of 9, and so on. With two objects left, there is still a choice: 1 out of 2. But with one object left, there is no more “freedom of choice”. So in this case: degrees of freedom = n-1.

29 Let’s practise a bit with the t-table! What would be the t value for: 10 observations, α = 0.05? 20 observations, α = 0.01? 25 observations, α = 0.001?

30 Please check yourself: 10 observations, α = 0.05 → t = 2.26; 20 observations, α = 0.01 → t = 2.86; 25 observations, α = 0.001 → t = 3.75
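Reading the t-table amounts to two steps: convert the confidence level to α, then look up the row for n - 1 degrees of freedom. The sketch below hard-codes a small excerpt of standard two-tailed critical values (rounded to three decimals) as an assumption for illustration; it is not the full table from the slides, and the function names are invented here.

```python
# Excerpt of two-tailed critical t values, keyed by degrees of freedom, then alpha
T_TABLE = {
    4:  {0.05: 2.776, 0.01: 4.604, 0.001: 8.610},
    9:  {0.05: 2.262, 0.01: 3.250, 0.001: 4.781},
    19: {0.05: 2.093, 0.01: 2.861, 0.001: 3.883},
    24: {0.05: 2.064, 0.01: 2.797, 0.001: 3.745},
}


def alpha_from_confidence(percent):
    """Significance level alpha = 1 - (% confidence)/100."""
    return round(1 - percent / 100, 10)  # rounding removes floating-point noise


def t_value(n_observations, alpha):
    """Two-tailed t value for n observations (degrees of freedom = n - 1)."""
    df = n_observations - 1
    return T_TABLE[df][alpha]


for n, conf in ((10, 95), (20, 99), (25, 99.9)):
    a = alpha_from_confidence(conf)
    print(f"n = {n}, alpha = {a}: t = {t_value(n, a)}")
```

Only the sample sizes present in the excerpt can be looked up; any other n raises a `KeyError`, which is a deliberate limitation of this small example.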

31 So what if we would use e.g. a 99% confidence interval for the earlier example? X avg ± t s x /√n. For 10 Cl⁻ determinations, x avg = 99.0 mg/L; s x = 2.0 mg/L; the t-value is now 3.3. Then x avg., 99% = 99.0 ± (3.3 * 2.0/√10) = 99.0 ± 2.1 mg/L (so a wider range than for the 95% C.I. = 99.0 ± 1.5 mg/L).

32 ONE-SIDED OR TWO-SIDED (TAILED) Two-sided (two-tailed): both the left and the right hand side are important; then read the top row α-values. One-sided: only one side is important; then read the bottom row α-values.

33 EXAMPLE: 1-TAILED TEST In a river, five oxygen values have been measured, with average [O 2 ] = 6.0 mg/L and s x = 1.1 mg/L. What is the lower 95% confidence limit (C.L.) for this O 2 value? n=5; so df = 4 → t = 2.13 (bottom row). O 2 avg. - t*s x /√n = 6.0 - (2.13*1.1/√5) = 4.95 → O 2 avg ≥ 4.95 mg/L (95% C.L.)
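The one-sided calculation differs from the two-sided interval only in which t value is read from the table and in keeping a single limit. A minimal sketch, with an invented function name and the one-tailed t value (2.13 for df = 4) supplied by hand:

```python
from math import sqrt


def lower_confidence_limit(x_avg, s_x, n, t_one_sided):
    """One-sided lower limit: x_avg - t * s_x / sqrt(n)."""
    return x_avg - t_one_sided * s_x / sqrt(n)


# Oxygen example: n = 5, df = 4, one-tailed t = 2.13 at the 95% level
limit = lower_confidence_limit(x_avg=6.0, s_x=1.1, n=5, t_one_sided=2.13)
print(f"Average O2 is at least {limit:.2f} mg/L (95% confidence)")
```

A one-sided limit suits oxygen because only low concentrations are a concern; for a pollutant the same idea would be applied on the upper side.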

34 CONTENTS Introduction- one year data of oil refinery effluents Normal and non-normal distributions t-test and confidence intervals Non-parametric statistics

35 NON-PARAMETRIC STATISTICS These do not make use of an assumed frequency distribution and the related tests. E.g.: “Range”; 50% value (median); 90 th percentile. Especially suitable for non-normal distributions (which holds for many water quality data); however, they can also be used for normal distributions. Advantages: easier to compute and understand; more resistant towards “outliers”, etc.

36 EXAMPLE: CONDUCTIVITY IN WATER SAMPLES Sort the data in increasing order. Range: Median: 218 (50% is smaller; 50% is larger than 218)

37

38 See Book Chapman, Ch. 10:
75 th percentile P 75 (75% of observations are smaller): Rank of P 75 = (75/100)*(N+1) = (0.75)(28) = 21. So observation 21 (305) is the 75th percentile.
90 th percentile P 90 (90% of observations are smaller): Rank of P 90 = (90/100)*(N+1) = (0.9)(28) = 25.2. The percentile lies in between #25 and #26; by interpolation: P 90 = (395*0.8) + (396*0.2) = 395.2 ≈ 395.
Check that the 25 th percentile P 25 = 174.
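The rank rule above, rank = (P/100)*(N+1) with linear interpolation between neighbouring observations, can be sketched as follows. The function name is invented, the clamping at the ends of the data is a choice made for this example, and the 27 data values are hypothetical (not the conductivity data).

```python
from math import floor


def percentile(sorted_values, p):
    """P-th percentile using rank = p/100 * (N + 1) with linear interpolation."""
    n = len(sorted_values)
    rank = p / 100 * (n + 1)   # 1-based rank, may be fractional
    lower = floor(rank)
    frac = rank - lower
    if lower < 1:              # rank below the first observation: clamp
        return sorted_values[0]
    if lower >= n:             # rank at or beyond the last observation: clamp
        return sorted_values[-1]
    # Weight the lower neighbour by (1 - frac) and the upper neighbour by frac
    return sorted_values[lower - 1] * (1 - frac) + sorted_values[lower] * frac


# Hypothetical sorted data with N = 27, so the ranks match the worked example
data = [10.0 * i for i in range(1, 28)]
p25, p75, p90 = (percentile(data, p) for p in (25, 75, 90))
print(f"P25 = {p25:.1f}, P75 = {p75:.1f}, P90 = {p90:.1f}")
```

With N = 27 the ranks are 7, 21 and 25.2, as in the worked example, so P 90 is again a 0.8/0.2 blend of observations #25 and #26.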

39 EXAMPLE: OIL REFINERY DATA (n = 284) Put data in Table with frequency for each value

40 Data given as “cumulative percentage”; for example 24/284 = 8.5% → 8.5% of the values ≤ 3 mg/L. Median = 12.5 mg/L; 90 th percentile = 36 mg/L. A discharge permit could then stipulate:
- “Not allowed to exceed 12.5 mg/L oil in > 50% of the time”
- “Not allowed to exceed 36 mg/L oil in > 10% of the time”
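Turning a frequency table into cumulative percentages, and reading off the median and the 90th percentile, might look like the sketch below. The frequency table is hypothetical: only the first entry (24 of 284 values at or below 3 mg/L) mirrors the slide, and the function names are invented for this example.

```python
def cumulative_percentages(freq):
    """Return (value, cumulative % of observations <= value) in increasing order."""
    total = sum(freq.values())
    running = 0
    result = []
    for value in sorted(freq):
        running += freq[value]
        result.append((value, 100.0 * running / total))
    return result


def value_at_percentage(cumulative, target):
    """Smallest tabulated value whose cumulative percentage reaches the target."""
    for value, pct in cumulative:
        if pct >= target:
            return value
    return cumulative[-1][0]


# Hypothetical frequency table: concentration (mg/L) -> number of observations
freq = {3: 24, 10: 100, 20: 120, 40: 40}  # 284 observations in total

cum = cumulative_percentages(freq)
median_value = value_at_percentage(cum, 50)
p90_value = value_at_percentage(cum, 90)
for value, pct in cum:
    print(f"<= {value:2d} mg/L: {pct:5.1f}%")
```

With so few classes the percentiles are coarse; a real table with one row per measured value, as on the slide, gives finer resolution.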

41 EXAMPLE FROM GEMS PROGRAMME Note: “flagged values”: below detection limit or out of range

42