1
BAE 5333 Applied Water Resources Statistics
Biosystems and Agricultural Engineering Department, Division of Agricultural Sciences and Natural Resources, Oklahoma State University
Source: Dr. Dennis R. Helsel & Dr. Edward J. Gilroy, 2006 Applied Environmental Statistics Workshop, and Statistical Methods in Water Resources
2
TEXTBOOK Free on-line at:
3
SOFTWARE: Minitab Version 15 Statistical Software
4
Chapter 1: SUMMARIZING DATA, Numbers and Graphs
Choosing a Statistical Method Depends on:
Data characteristics
Study objectives
5
Characteristics of Environmental Data
Lower bound of zero
Presence of outliers, high values
Positive skewness
Non-normal distribution
High variance
Data below recording limits
Data collected by other people
6
Categories of Measured Data
Continuous: 1.10, 2.56, …
Discrete: 1, 2, 5, 15
Qualitative, Grouped, Categorical: Site 1, Site 2, Site 3; Below Detection Limit, Above Detection Limit
7
Histograms: show how many times Y occurs in each of several groups of X.
Require grouping of a continuous variable.
Y-axis: frequency or relative frequency.
X: independent variable. Y: dependent variable.
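The grouping step behind a histogram can be sketched in a few lines. This is only an illustration: the function name `histogram_counts`, the bin edges, and the concentration values are hypothetical and do not come from the slides.

```python
def histogram_counts(values, edges):
    """Count how many values fall into each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last_bin = i == len(counts) - 1
            # The last bin also includes its right edge so the maximum is counted.
            if edges[i] <= v < edges[i + 1] or (is_last_bin and v == edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical concentrations (illustrative values only)
conc = [0.2, 0.4, 0.5, 0.9, 1.1, 1.3, 2.6, 4.8]
edges = [0, 1, 2, 3, 4, 5]

counts = histogram_counts(conc, edges)        # frequency (Y-axis)
rel_freq = [c / len(conc) for c in counts]    # relative frequency (Y-axis)
print(counts, rel_freq)
```

Most of the hypothetical values land in the lowest bin, which is the shape a positively skewed variable usually produces.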
8
Box Plots: good for continuous data; based on percentiles
50th percentile (median): 50 percent of the data are below or equal to the median
9
Inter-quartile Range (IQR): A Measure of Variability
1st Quartile (Q1): 25% of the data ≤ this value
2nd Quartile (Q2) = Median: 50% of the data ≤ this value
3rd Quartile (Q3): 75% of the data ≤ this value
IQR = Q3 - Q1
Example 1: 25th percentile (Q1) = 2.5, 75th percentile (Q3) = 15, IQR = 15 - 2.5 = 12.5
Example 2: 1, 2, 3, …, 97, 98, 99; Q1 = 25, Q3 = 75, IQR = 75 - 25 = 50
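The 1 to 99 example can be checked with Python's standard library. This is a sketch, not the course's Minitab procedure: statistical packages use slightly different interpolation rules for quartiles, so results for other data sets may differ a little between tools.

```python
from statistics import quantiles

data = list(range(1, 100))            # 1, 2, ..., 99

# quantiles(..., n=4) returns the three quartile cut points.
# The default method ("exclusive") is used here.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)                # 25.0 50.0 75.0 50.0
```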
10
Box Plots
11
Box Plots: Outliers and Whiskers
Ends of vertical lines: whiskers. A whisker extends to the highest or lowest data value within the limit. Data values beyond the limits are plotted individually as outliers.
Upper Limit = Q3 + 1.5 (Q3 - Q1)
Lower Limit = Q1 - 1.5 (Q3 - Q1)
Q1 = first quartile, 25th percentile
Q3 = third quartile, 75th percentile
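A minimal sketch of the whisker rule follows. It assumes the common convention of 1.5 times the inter-quartile range beyond each quartile. The helper name `boxplot_limits`, the multiplier argument `k`, and the data are illustrative assumptions rather than anything taken from the slides or from Minitab.

```python
from statistics import quantiles


def boxplot_limits(data, k=1.5):
    """Return whisker limits, whisker ends, and outliers.

    k = 1.5 is an assumed (commonly used) multiplier of the IQR.
    """
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        # Whiskers stop at the most extreme data values inside the limits.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower_limit or x > upper_limit],
    }


# Hypothetical data: 1..99 plus one unusually high value
result = boxplot_limits(list(range(1, 100)) + [250])
print(result)
```

The whisker ends are actual data values, not the limits themselves, which is why the upper whisker in this example stops at 99 even though the upper limit is larger.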
12
Population vs. Sample: data are samples that we assume represent the characteristics of a population.
13
Summary Statistics: Mean and Standard Deviation
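As a reminder of the two statistics named on this slide, here is a short sketch of the sample mean and the sample standard deviation (with the n - 1 divisor). The function names and the data values are hypothetical.

```python
import math
from statistics import mean, stdev


def sample_mean(x):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(x) / len(x)


def sample_sd(x):
    """Sample standard deviation: uses n - 1 in the denominator."""
    m = sample_mean(x)
    return math.sqrt(sum((xi - m) ** 2 for xi in x) / (len(x) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only
xbar = sample_mean(data)
s = sample_sd(data)

# The standard-library functions give the same results.
print(xbar, s, mean(data), stdev(data))
```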
14
Mean vs. Median: Effect of Outliers
Suppose an error is made in recording one value. The median changes little or not at all, but the mean is pulled toward the erroneous value. The mean is NOT a “resistant” measure of the center. The median and percentiles are generally not sensitive to outliers.
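The effect can be illustrated with a small made-up data set in which the largest value is mistyped with an extra zero. The numbers below are assumptions chosen for illustration, not the values shown on the original slide.

```python
from statistics import mean, median

correct = [2, 4, 8, 9, 11, 11, 12]
with_error = [2, 4, 8, 9, 11, 11, 120]   # 12 mistakenly recorded as 120

mean_before, mean_after = mean(correct), mean(with_error)
median_before, median_after = median(correct), median(with_error)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```

In this sketch the single bad value nearly triples the mean, while the median does not move.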
15
Symmetric vs. Skewed Data
Box Plots Approximate Normal Distribution Non-normal Distribution
16
Positive Skewness: Common in Environmental Data
17
Probability Density Function (pdf) Normal (Gaussian) Distribution
X = continuous variable; μ = mean; σ² = variance
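The normal density is f(x) = exp(-(x - μ)² / (2σ²)) / sqrt(2πσ²). A small sketch of that standard formula follows; the function name `normal_pdf` and its argument names are illustrative.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Normal (Gaussian) density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


peak = normal_pdf(0.0)                      # height of the standard normal at its mean
check = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)
print(peak, check)
```

Note that the function takes the variance σ², while `statistics.NormalDist` takes the standard deviation σ.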
18
Cumulative Distribution Function
Cumulative Distribution Function, F(b): area under the probability density function f_X(x) from the lower limit a of the distribution up to b.
19
Calculating Probabilities
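For a continuous variable, the probability of falling between two values is the difference of the cumulative distribution function at those values: P(a < X ≤ b) = F(b) - F(a). The sketch below assumes a normal distribution and uses the error function from the standard library; the function names are illustrative.

```python
import math


def normal_cdf(b, mu=0.0, sigma=1.0):
    """F(b) for a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((b - mu) / (sigma * math.sqrt(2.0))))


def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a < X <= b) = F(b) - F(a)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


p_one_sigma = prob_between(-1.0, 1.0)   # about 0.68 for a standard normal
print(round(p_one_sigma, 4))
```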
20
Example Distributions
Uniform Triangular
21
Example Distributions
Lognormal Exponential
22
Boxplot vs. Probability Density Function
Normal pdf: μ = 0, σ² = 1
23
Symmetric vs. Skewed Data Histograms and Box Plots
Symmetrical Data Approximates a Normal Distribution
24
Symmetric vs. Skewed Data Histograms and Box Plots
Box plot is compressed due to outliers.
25
Increasing Skewness (top half box width increases)
26
Cumulative Distribution Functions
Histogram of the natural log of loads and the resulting empirical cumulative distribution function (CDF). Blue: best-fit normal distribution. Red: empirical CDF.
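An empirical CDF is the fraction of observations less than or equal to each sorted value. A minimal sketch, using hypothetical load values (not the data behind the figure) and an illustrative function name `ecdf`:

```python
import math


def ecdf(data):
    """Return (value, cumulative fraction) pairs for the sorted data."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


loads = [3.0, 12.0, 1.5, 40.0, 7.0]              # illustrative values only
points = ecdf([math.log(v) for v in loads])      # ECDF of the natural logs

for value, fraction in points:
    print(f"{value:6.3f}  {fraction:.2f}")
```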
27
Probability Plots
A theoretical normal distribution plots as a straight line on normal probability paper. If the data also plot as a straight line, they follow a normal distribution.
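The straight-line check can be approximated numerically by pairing the sorted data with normal quantiles and computing their correlation. The sketch below assumes the Cunnane plotting position, (i - 0.4) / (n + 0.2). That choice, the function names, and the synthetic data are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean


def normal_scores(n, a=0.4):
    """Standard normal quantiles at plotting positions (i - a) / (n + 1 - 2a)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]


def probability_plot_r(data):
    """Correlation between sorted data and normal scores (near 1 = straight line)."""
    y = sorted(data)
    z = normal_scores(len(y))
    my, mz = mean(y), mean(z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    szz = sum((zi - mz) ** 2 for zi in z)
    syy = sum((yi - my) ** 2 for yi in y)
    return szy / math.sqrt(szz * syy)


# Synthetic, strongly right-skewed values whose logs are exactly "normal-looking"
skewed = [math.exp(1.5 * z) for z in normal_scores(25)]
r_raw = probability_plot_r(skewed)
r_log = probability_plot_r([math.log(v) for v in skewed])
print(r_raw, r_log)
```

The correlation for the raw values is clearly lower than for their logs, which mirrors the pattern of concentrations that are not normal while their logs are.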
28
Concentrations are Not Normally Distributed
29
Logs of Concentration are Normally Distributed
30
What to do with skewed data?
Data with outliers have a mean that may be larger than 75% of the data. If we want a more “typical” measure of the center, we have two choices:
Use a different method, i.e. use the median or geometric mean
Transform the data
31
Purpose of Transformations
Make data more normal
Make data more linear
Make the variance more constant
32
Positive and Negative Skew
Source:
33
Transformations Using Ladder of Powers
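The slide's table did not survive extraction, so the following is only a sketch of the ladder of powers as it is commonly presented: powers above 1 for negative skew, no change at 1, then square root, log (the power 0 step), and negative reciprocals for increasingly strong positive skew. The function names, the sign flip for negative powers (used to preserve the ordering of the data), and the data are assumptions.

```python
import math


def ladder_transform(x, theta):
    """Power transformation for positive x; theta = 0 means the natural log."""
    if x <= 0:
        raise ValueError("this sketch assumes strictly positive data")
    if theta == 0:
        return math.log(x)
    if theta > 0:
        return x ** theta
    return -(x ** theta)        # negative sign keeps larger x mapped to larger values


def skewness(values):
    """Moment coefficient of skewness (0 for perfectly symmetric data)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic right-skewed data whose logs are symmetric
data = [math.exp(z) for z in (-2, -1, 0, 1, 2)]
skews = {theta: skewness([ladder_transform(v, theta) for v in data])
         for theta in (1, 0.5, 0, -0.5, -1)}
print(skews)
```

Moving down the ladder reduces the positive skew step by step. Going past the right step overcorrects and produces negative skew.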
34
Geometric Mean: the mean of the natural logs of the data, transformed back to the original units (exponentiated)
If the logs are normally distributed, the geometric mean is an estimate of the MEDIAN, NOT the mean. Example: Mean = 7.40, Median = 0.50, Geometric Mean = 0.62.
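A short sketch of the calculation: average the natural logs, then exponentiate. The data below are hypothetical and are not the values behind the slide's example. The function name `geo_mean` is illustrative, and `statistics.geometric_mean` is shown only as a cross-check.

```python
import math
from statistics import mean, median, geometric_mean


def geo_mean(values):
    """Geometric mean: exp of the mean of the natural logs (positive data only)."""
    return math.exp(mean([math.log(v) for v in values]))


data = [0.1, 0.3, 0.5, 0.8, 35.0]   # illustrative skewed values with one high outlier

gm = geo_mean(data)
print(f"mean = {mean(data):.2f}, median = {median(data):.2f}, geometric mean = {gm:.2f}")
print(abs(gm - geometric_mean(data)) < 1e-9)
```

For skewed data like these, the geometric mean sits much closer to the median than to the arithmetic mean.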
35
Outliers: observations that are different from the rest of the observations in the data set. They may be the most important observations in the data set (example: Antarctic ozone data). NEVER throw away an outlier; use an alternate method or transform the data.
36
Cause of Outliers
Measurement or recording error. Solution: identify and fix problem
Skewed data. Solution: use alternate method or transformation
Data from a different population. Solution: split into two groups based on science and analyze separately
37
MINITAB Laboratory 1 and Reading Assignment
Reading: Chapter 1, Summarizing Data (pages 1-12), and Chapter 2, Graphical Data Analysis (pages 17-64), in Statistical Methods in Water Resources by D.R. Helsel and R.M. Hirsch
Lab: MINITAB Laboratory 1