Published by Jenifer Spry. Modified over 10 years ago.
2
Application of Statistical Techniques to Interpretation of Water Monitoring Data
Eric Smith, Golde Holtzman, and Carl Zipper
3
Outline
I. Water quality data: program design (CEZ, 15 min)
II. Characteristics of water-quality data (CEZ, 15 min)
III. Describing water quality (GIH, 30 min)
IV. Data analysis for making decisions
  A. Compliance with numerical standards (EPS, 45 min)
  Dinner Break
  B. Locational / temporal comparisons (cause and effect) (EPS, 45 min)
  C. Detection of water-quality trends (GIH, 60 min)
4
III. Describing water quality (GIH, 30 min)
Rivers and streams are an essential component of the biosphere
Rivers are alive
Life is characterized by variation
Statistics is the science of variation
Statistical Thinking/Statistical Perspective
  – Thinking in terms of variation
  – Thinking in terms of distribution
5
The present problem is multivariate
WATER QUALITY as a function of TIME, under the influence of covariates like FLOW, at multiple LOCATIONS
6
WQ variable versus time
[Figure: water variable plotted against time in years]
7
Bear Creek below Town of Wise STP
8
Univariate WQ Variable
[Figure: water quality plotted against time]
15
Univariate WQ Variable
[Figure: single axis labeled Water Quality]
20
Univariate Perspective, Real Data (pH below STP)
21
The three most important pieces of information in a sample:
Central Location
  – Mean, Median, Mode
Dispersion
  – Range, Standard Deviation, Inter-Quartile Range
Shape
  – Symmetry, skewness, kurtosis
  – No mode, unimodal, bimodal, multimodal
22
Central Location: Sample Mean
(Sum of all observations) / (sample size)
Center of gravity of the distribution
Depends on each observation, therefore sensitive to outliers
28
Central Location: Sample Median
Center of the ordered array, i.e., the (0.5)(n + 1)th observation in the ordered array.
If sample size n is odd, then the median is the middle value in the ordered array.
  Example A: 1, 1, 0, 2, 3
  Order: 0, 1, 1, 2, 3
  n = 5, odd, (0.5)(n + 1) = 3
  Median = 1
If sample size n is even, then the median is the average of the two middle values in the ordered array.
  Example B: 1, 1, 0, 2, 3, 6
  Order: 0, 1, 1, 2, 3, 6
  n = 6, even, (0.5)(n + 1) = 3.5
  Median = (1 + 2)/2 = 1.5
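The location rule above can be sketched in a few lines of Python. The function name `median_by_location` is made up for illustration and is not from the slides; the two inputs are Examples A and B.

```python
def median_by_location(values):
    """Median via the (0.5)(n + 1) location rule in the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    # The location (0.5)(n + 1) is a whole number when n is odd and ends
    # in .5 when n is even. Taking the 1-based ranks just below and just
    # above it covers both cases (they coincide when n is odd).
    lower_rank = (n + 1) // 2
    upper_rank = (n + 2) // 2
    return (ordered[lower_rank - 1] + ordered[upper_rank - 1]) / 2


example_a = [1, 1, 0, 2, 3]      # n = 5, location 3
example_b = [1, 1, 0, 2, 3, 6]   # n = 6, location 3.5

print(median_by_location(example_a))
print(median_by_location(example_b))
```

Averaging the two neighboring ranks avoids a separate odd/even branch; for odd n it simply averages the middle value with itself.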
29
Central Location: Sample Median
Center of the ordered array
Depends on the magnitude of the central observations only, therefore NOT sensitive to outliers
35
Central Location: Mean vs. Median
Mean is influenced by outliers
Median is robust against (resistant to) outliers
Mean moves toward outliers
Median almost always represents the bulk of the observations
Comparison of mean and median tells us about outliers
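A small illustration of this contrast, as a sketch only: the clean sample reuses the values 1, 1, 0, 2, 3 from the median example, and the outlier value 300 is invented for the demonstration.

```python
from statistics import mean, median

clean = [1, 1, 0, 2, 3]
# Hypothetical contamination: the largest value is replaced by an
# extreme reading (300 is an invented number, not from the slides).
contaminated = [1, 1, 0, 2, 300]

mean_shift = mean(contaminated) - mean(clean)
median_shift = median(contaminated) - median(clean)

print("mean:  ", mean(clean), "->", mean(contaminated))
print("median:", median(clean), "->", median(contaminated))
```

The mean is pulled toward the outlier while the median stays put, so a large gap between the two is a cue to look for outliers or skewness.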
36
Dispersion
Range
Standard Deviation
Inter-quartile Range
37
Dispersion: Range
Maximum - Minimum
Easy to calculate
Easy to interpret
Depends on sample size (biased)
Therefore not good for statistical inference
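One way to see why the range depends on sample size: adding observations can only widen it or leave it unchanged, never shrink it. A minimal sketch, with hypothetical pH readings and a made-up helper name `running_range`:

```python
def running_range(values):
    """Range (max - min) of the first k observations, for k = 1..n."""
    ranges = []
    lowest = highest = values[0]
    for x in values:
        lowest = min(lowest, x)
        highest = max(highest, x)
        ranges.append(highest - lowest)
    return ranges


# Hypothetical pH readings in the order they were collected.
readings = [7.6, 7.2, 7.9, 7.5, 8.3, 7.4, 6.8, 7.7]
ranges = running_range(readings)

for k, r in enumerate(ranges, start=1):
    print(f"first {k} observations: range = {r:.1f}")
```

Because the sample range tends to grow with n, ranges from samples of different sizes are not directly comparable.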
38
Dispersion: Standard Deviation
[Figure: example distributions with SD = 1 and SD = 2; axis tick values not recoverable]
39
Dispersion: Properties of SD
SD ≥ 0 for all data
SD = 0 if and only if all observations are the same (no variation)
Familiar intervals for a normal distribution:
  – 68% expected within 1 SD of the mean
  – 95% expected within 2 SD
  – 99.7% expected within 3 SD
  – Exact (to rounding) for a normal distribution, ballpark for any distribution
For any distribution, nearly all observations lie within 3 SD of the mean (at least 8/9, about 89%, by Chebyshev's inequality)
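The familiar intervals can be checked against the standard normal distribution with Python's standard library. This is a sketch; the helper name `coverage_within` is an assumption for illustration. The three-SD figure comes out closer to 99.7%.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k):
    """Probability that a normal observation lies within k SD of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.2%}")
```

These percentages hold only for a normal distribution; for skewed water-quality variables they are a rough guide.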
40
Interpretation of SD
n = 200, SD = 0.41, Median = 7.6, Mean = 7.6
41
Quartiles, Percentiles, Quantiles, Five Number Summary, Boxplot
Name       Quartile       Percentile         Quantile
Maximum    4th quartile   100th percentile   1.00 quantile
           3rd quartile   75th percentile    0.75 quantile
Median     2nd quartile   50th percentile    0.50 quantile
           1st quartile   25th percentile    0.25 quantile
Minimum    0th quartile   0th percentile     0.00 quantile
42
Quartiles (undergrad classes)
E.g., Sample: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Rank   Value
10     5.1     Maximum
9      3.9
8      3.8     3rd Quartile
7      3.8
6      2.3
               Median, 2nd Quartile (midway between ranks 5 and 6)
5      2.2
4      0
3      0       1st Quartile
2      -0.4
1      -3.1    Minimum
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00
43
5-Number Summary and Boxplot (undergrad perspective)
Min     Q1     Q2     Q3     Max
-3.10   0.00   2.25   3.80   5.10
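The undergraduate approach can be read as "median of each half of the ordered array". The sketch below follows that reading, which is an assumption about the method behind the slide's numbers. It takes the sample's two lowest values as -3.1 and -0.4, as the later weighted-average arithmetic with -0.4 implies. The function name is made up, and for odd n it leaves the middle value out of both halves, which is only one of several conventions.

```python
from statistics import median


def five_number_summary_halves(values):
    """Min, Q1, median, Q3, max, with Q1 and Q3 as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]        # the `half` smallest observations
    upper_half = ordered[n - half:]    # the `half` largest observations
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
summary = five_number_summary_halves(sample)
print(summary)
```

For this sample the result should agree with the five-number summary on the slide, up to floating-point rounding.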
44
Terminology Warning: Quartiles, a.k.a. Percentiles, a.k.a. Quantiles
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00
Quartiles                   Percentiles        Quantiles
Q4 = 4th quartile = Max     100th percentile   Q1.00 = 1.00 quantile
Q3 = 3rd quartile           75th percentile    Q0.75 = 0.75 quantile
Q2 = 2nd quartile = Med     50th percentile    Q0.50 = 0.50 quantile
Q1 = 1st quartile           25th percentile    Q0.25 = 0.25 quantile
Q0 = 0th quartile = Min     0th percentile     Q0.00 = 0.00 quantile
45
Terminology Warning: But Percentiles and Quantiles are more general
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00
Quartiles                   Percentiles        Quantiles
Q4 = 4th quartile = Max     100th percentile   Q1.00 = 1.00 quantile
                            95th percentile    Q0.95 = 0.95 quantile
Q3 = 3rd quartile           75th percentile    Q0.75 = 0.75 quantile
                            60th percentile    Q0.60 = 0.60 quantile
Q2 = 2nd quartile = Med     50th percentile    Q0.50 = 0.50 quantile
                            34th percentile    Q0.34 = 0.34 quantile
Q1 = 1st quartile           25th percentile    Q0.25 = 0.25 quantile
                            2.5th percentile   Q0.025 = 0.025 quantile
Q0 = 0th quartile = Min     0th percentile     Q0.00 = 0.00 quantile
46
Quantile Location and Quantiles
E.g., Sample: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Value   Rank
5.1     10
3.9     9
3.8     8
3.8     7
2.3     6
2.2     5
0       4
0       3
-0.4    2
-3.1    1
Minimum = -3.1, Maximum = 5.1
Quantile Rank   Quantile Location   Quartile
0.75 = 3/4
0.50 = 2/4
0.25 = 1/4
47
Quantile Location and Quantiles by weighted averages (graduate classes)
E.g., Sample: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Example: Find the 20th percentile of the sample above.
Step 1: q = 0.20, n = 10
  L = q(n + 1) = 0.20(10 + 1) = 2.2, indicating the 2.2th observation in the ordered array.
Step 2: Therefore the 0.20 quantile is a weighted average of the 2nd and 3rd observations in the ordered array, which are a = -0.4 and b = 0, and the weight is w = 0.2 (the fractional part of L).
  Q = a + w(b - a) = -0.4 + 0.2(0 - (-0.4)) = -0.40 + 0.08 = -0.32
48
Quantile Location and Quantiles by weighted averages (graduate classes)
E.g., Sample: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Step 2: a = -0.4, b = 0, w = 0.2
  Q = a + w(b - a)
    = -0.4 + 0.2(0 - (-0.4))
    = -0.4 + 0.2(0.4)
    = -0.40 + 0.08
    = -0.32
[Figure: number line marked at -0.4, 0, and 0.4, with Q = -0.32 located between -0.4 and 0]
49
Quantile Location and Quantiles
Example: 0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10
Value   Rank
5.1     10
3.9     9
3.8     8
3.8     7
2.3     6
2.2     5
0       4
0       3
-0.4    2
-3.1    1
Quantile rank, q   Quantile location, L   Quantile, Q                          Common name
1.00               n = 10                 5.1                                  Maximum
0.75               0.75(10 + 1) = 8.25    3.8 + 0.25(3.9 - 3.8) = 3.825        3rd Quartile
0.50               0.5(10 + 1) = 5.5      2.2 + 0.5(2.3 - 2.2) = 2.25          Median, or 2nd Quartile
0.25               0.25(10 + 1) = 2.75    -0.4 + 0.75[0 - (-0.4)] = -0.1       1st Quartile
0.00               1                      -3.1                                 Minimum
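The weighted-average rule, L = q(n + 1) followed by Q = a + w(b - a), can be sketched as a short function. The name `quantile_weighted` is made up for illustration. Clamping L to the interval from 1 to n, so that q = 0 and q = 1 return the minimum and maximum, is an assumption chosen to match the table's first and last rows. The sample's two lowest values are read as -3.1 and -0.4, as the arithmetic with -0.4 implies.

```python
import math


def quantile_weighted(values, q):
    """Quantile at location L = q(n + 1), interpolating between neighbors."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quantile of an empty sample is undefined")
    location = q * (n + 1)               # 1-based, possibly fractional
    location = min(max(location, 1), n)  # assumption: clamp to ranks 1..n
    k = math.floor(location)             # rank of the lower neighbor
    w = location - k                     # weight = fractional part of L
    a = ordered[k - 1]                   # observation at rank k
    b = ordered[min(k, n - 1)]           # observation at rank k + 1 (or the max)
    return a + w * (b - a)


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
for q in (0.0, 0.20, 0.25, 0.50, 0.75, 1.0):
    print(f"q = {q:.2f}: Q = {quantile_weighted(sample, q):.3f}")
```

Python's `statistics.quantiles` with its default `method='exclusive'` uses the same q(n + 1) placement for cut points, so it can serve as a cross-check for the quartiles.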
50
5-Number Summary and Boxplot using weighted averages for quantiles
Min     Q1      Q2     Q3      Max
-3.10   -0.10   2.25   3.825   5.10
Note the slightly different results from using weighted averages.
51
Dispersion: IQR
Inter-Quartile Range = (3rd Quartile) - (1st Quartile)
Robust against outliers
52
Interpretation of IQR
n = 200, SD = 0.41, Median = 7.6, Mean = 7.6, IQR = 0.54
For a normal distribution, Median ± 2 IQR includes 99.3% of observations
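The 99.3% figure can be checked for a standard normal distribution, where the median is 0 and the IQR is about 1.349 SD. A minimal sketch using only the standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1, so the median is also 0

# IQR of the standard normal: distance between the 0.75 and 0.25 quantiles.
iqr = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)

# Probability of falling within 2 IQR on either side of the median.
coverage = std_normal.cdf(2 * iqr) - std_normal.cdf(-2 * iqr)

print(f"IQR = {iqr:.3f} SD")
print(f"median +/- 2 IQR covers {coverage:.1%}")
```

As a rough consistency check on the slide's numbers, a normal distribution with SD = 0.41 would have an IQR of about 1.349 × 0.41 ≈ 0.55, close to the observed 0.54.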
55
Shape: Symmetry and Skewness
Symmetry means bilateral symmetry, skewness = 0
  – Mean = Median (approximately)
Positive Skewness (asymmetric tail in positive direction)
  – Mean > Median
Negative Skewness (asymmetric tail in negative direction)
  – Mean < Median
Comparison of mean and median tells us about shape
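The link between the sign of the skewness and the ordering of mean and median can be illustrated with a few made-up samples. The sketch uses the moment coefficient of skewness, m3 / m2^1.5, which is one common definition; the slides do not say which formula they have in mind, and all data values below are hypothetical.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (divisor n)."""
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical samples, invented for illustration.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]     # long tail toward large values
left_skewed = [-x for x in right_skewed]     # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(f"{name}: skewness = {sample_skewness(data):+.2f}, "
          f"mean = {mean(data)}, median = {median(data)}")
```

The mean-versus-median comparison is a rule of thumb; it usually agrees with the sign of the skewness for unimodal data but is not guaranteed to.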
56
Bear Creek below Town of Wise STP
57
Outlier Box Plot
[Figure: box plot with labeled parts: outliers, whisker, median, 75th percentile = 3rd quartile, 25th percentile = 1st quartile, IQR]
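The slide does not state the rule used to flag outliers. A common convention, assumed here, marks points more than 1.5 IQR beyond the quartiles. The sketch below applies that convention, with quartiles from `statistics.quantiles`, whose default `method='exclusive'` places them at q(n + 1). The function names and the extra value 12.0 are made up for illustration; the sample's two lowest values are read as -3.1 and -0.4.

```python
from statistics import quantiles


def outlier_fences(values, k=1.5):
    """Lower and upper fences at Q1 - k*IQR and Q3 + k*IQR (k = 1.5 assumed)."""
    q1, _, q3 = quantiles(values, n=4)   # default method is 'exclusive'
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Observations lying outside the fences."""
    low, high = outlier_fences(values, k)
    return [x for x in values if x < low or x > high]


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
print(find_outliers(sample))            # the slide's sample
print(find_outliers(sample + [12.0]))   # with one invented extreme value
```

Under this convention the whiskers extend to the most extreme observations still inside the fences, and anything beyond is drawn as an individual point.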
58
Wise, VA, below STP
[Figure panels: pH; TKN (mg/l)]
59
Wise, VA, below STP
[Figure panels: DO (% saturation); BOD (mg/l)]
60
Wise, VA, below STP
[Figure panels: Total Phosphorus (mg/l); Fecal Coliforms]