1
BAE 6520 Applied Environmental Statistics
Biosystems and Agricultural Engineering Department
Division of Agricultural Sciences and Natural Resources
Oklahoma State University
Source: Dr. Dennis R. Helsel & Dr. Edward J. Gilroy, 2006 Applied Environmental Statistics Workshop, and Statistical Methods in Water Resources
2
TEXTBOOK Free on-line at:
3
Chapter 1: Summarizing Data (Numbers and Graphs)
Choosing a Statistical Method Depends on:
Data characteristics
Study objectives
4
Characteristics of Environmental Data
Lower bound of zero
Presence of outliers, high values
Positive skewness
Non-normal distribution
High variance
Data below recording limits
Data collected by other people
5
Categories of Measured Data
Continuous: 1.10, 2.56, ...
Discrete: 1, 2, 5, 15
Qualitative, Grouped, Categorical: Site 1, Site 2, Site 3; Below Detection Limit, Above Detection Limit
6
Histograms
Show how many times values occur in each of several groups (bins) of a variable.
Require grouping of a continuous variable
Y-axis: frequency or relative frequency
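The grouping step can be sketched in a few lines. This is only an illustration: the concentrations, the bin width of 2.0, and the helper name `histogram_counts` are assumptions made for the example, not values from the course data.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0.0):
    """Count values per bin [start + k*w, start + (k+1)*w).

    Assumes every value is >= start (true for data with a lower bound of zero).
    """
    counts = Counter(int((v - start) // bin_width) for v in values)
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]

# Hypothetical concentrations (illustrative values only)
conc = [0.2, 0.5, 0.7, 1.1, 1.3, 1.8, 2.4, 3.9, 7.5]

freq = histogram_counts(conc, bin_width=2.0)   # frequency per bin
rel_freq = [c / len(conc) for c in freq]       # relative frequency per bin

print(freq)      # counts for the bins 0-2, 2-4, 4-6, 6-8
print(rel_freq)
```

Either `freq` or `rel_freq` can go on the Y-axis; the shape of the histogram is the same.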
7
Box Plots
Good for continuous data
Based on percentiles
50th percentile (median): 50 percent of the data are below or equal to the median
8
Inter-quartile Range (IQR): A Measure of Variability
IQR = 75th percentile - 25th percentile
Represents the middle half of the data
Example: 25th percentile = 2.5, 75th percentile = 15, so IQR = 15 - 2.5 = 12.5
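A minimal sketch of the quartile and IQR calculation, using Python's standard `statistics.quantiles`. The data are hypothetical, and the `"inclusive"` interpolation method is an assumption; other percentile conventions give slightly different quartiles for small samples.

```python
from statistics import quantiles

# Hypothetical, positively skewed concentrations (illustrative values only)
conc = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0]

# n=4 returns the three cut points: 25th, 50th (median), and 75th percentiles
q1, median, q3 = quantiles(conc, n=4, method="inclusive")
iqr = q3 - q1  # width of the middle half of the data

print(q1, median, q3, iqr)
```

Note that the largest value (40.0) has no influence on the IQR, which is why it is a resistant measure of variability.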
9
Box Plots
10
Box Plots: Outliers and Whiskers
The ends of the vertical lines are the whiskers.
Whisker: extends to the highest or lowest data value within the limit.
Upper Limit = Q3 + 1.5 (Q3 - Q1)
Lower Limit = Q1 - 1.5 (Q3 - Q1)
Q1 = First Quartile, 25th percentile
Q3 = Third Quartile, 75th percentile
Observations beyond these limits are plotted as outliers.
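The whisker rule can be sketched as follows. The multiplier `k = 1.5` is the common box plot convention and is treated here as an assumption; the data, the helper name `box_plot_limits`, and the `"inclusive"` quartile method are likewise illustrative choices.

```python
from statistics import quantiles

def box_plot_limits(values, k=1.5):
    """Return limits, whisker ends, and outliers for a standard box plot.

    k is the multiplier on (Q3 - Q1); 1.5 is an assumed convention.
    """
    xs = sorted(values)
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    step = k * (q3 - q1)
    lower_limit, upper_limit = q1 - step, q3 + step
    inside = [x for x in xs if lower_limit <= x <= upper_limit]
    outliers = [x for x in xs if x < lower_limit or x > upper_limit]
    return {
        "limits": (lower_limit, upper_limit),
        # Whiskers end at actual data values, not at the limits themselves
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical concentrations (illustrative values only)
conc = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0]
result = box_plot_limits(conc)
print(result)
```

The lower limit can be negative even when the data cannot be; the whisker still stops at the lowest observed value.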
11
Population vs. Sample Data are samples that we assume represent the characteristics of a population.
12
Summary Statistics
Mean and Standard Deviation
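As a small worked example, the sample mean and sample standard deviation (divisor n - 1) can be computed with Python's standard library. The data values are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
from statistics import mean, stdev

# Hypothetical sample (illustrative values only)
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = mean(sample)   # sum of the values divided by n
sample_sd = stdev(sample)    # sqrt of sum of squared deviations / (n - 1)

print(sample_mean, sample_sd)
```

Here the squared deviations from the mean sum to 32, so the sample standard deviation is the square root of 32/7.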
13
Mean vs. Median Effect of Outliers
Suppose an error is made, and Median Mean Becomes: The mean is NOT a resistant measure of the center Median and percentiles are generally not sensitive to outliers
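The effect is easy to reproduce. In this sketch the data are illustrative values, and the "error" is an assumed slip of the decimal point that turns the largest value, 12, into 120.

```python
from statistics import mean, median

# Illustrative data (not the slide's lost example values)
correct = [2, 4, 8, 9, 11, 11, 12]
with_error = [2, 4, 8, 9, 11, 11, 120]  # assumed recording error: 12 -> 120

mean_before, mean_after = mean(correct), mean(with_error)
median_before, median_after = median(correct), median(with_error)

print(mean_before, mean_after)      # the mean moves toward the bad value
print(median_before, median_after)  # the median is unchanged
```

One bad value out of seven nearly triples the mean, while the median stays at 9.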
14
Symmetric vs. Skewed Data
Box plots: approximately normal distribution vs. non-normal distribution
15
Positive Skewness
Common in Environmental Data
16
Symmetric vs. Skewed Data Histograms and Box Plots
Symmetrical Data Approximates a Normal Distribution
17
Symmetric vs. Skewed Data Histograms and Box Plots
Box plot is compressed due to outliers.
18
Increasing Skewness (top half of the box widens)
19
Cumulative Distribution Functions
Histogram of the natural log of loads and the resulting empirical cumulative distribution function (CDF).
Blue: best-fit normal distribution
Red: empirical CDF
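An empirical CDF is simple to build by hand: sort the data and assign each value the fraction of observations at or below it. The sketch below uses hypothetical load values and the simple i/n definition; other plotting-position formulas exist, so treat this choice as an assumption.

```python
import math

def ecdf(values):
    """Return (x, F(x)) pairs, where F(x) is the fraction of data <= x."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Hypothetical loads (illustrative values only)
loads = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0]
log_loads = [math.log(v) for v in loads]  # natural logs, as on the slide

points = ecdf(log_loads)
for x, f in points:
    print(f"{x:6.3f}  {f:.3f}")
```

Plotting `points` as a step function gives the empirical CDF; a fitted normal CDF can be drawn on the same axes for comparison.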
20
Probability Plots
A theoretical normal distribution plots as a straight line on normal probability paper.
If the data also plot as a straight line, they follow a normal distribution.
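Straightness on a probability plot can be checked numerically by correlating the sorted data with the matching normal quantiles. This is a sketch under stated assumptions: the data are hypothetical, the plotting position (i - 0.5)/n is one of several in use, and the helper names are made up for the example.

```python
import math
from statistics import NormalDist

def normal_quantiles(n):
    """Standard normal quantiles at plotting positions (i - 0.5) / n (assumed)."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def probability_plot_r(values):
    """Correlation between sorted data and normal quantiles (1.0 = straight line)."""
    xs = sorted(values)
    return pearson_r(normal_quantiles(len(xs)), xs)

# Hypothetical concentrations (illustrative values only)
conc = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0]
r_raw = probability_plot_r(conc)
r_log = probability_plot_r([math.log(c) for c in conc])

print(round(r_raw, 3), round(r_log, 3))
```

For these values the logs plot much closer to a straight line than the raw concentrations, which is the pattern the next two slides illustrate.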
21
Concentrations are Not Normally Distributed
22
Logs of Concentration are Normally Distributed
23
What to do with skewed data?
Data with outliers have a mean that may be larger than 75% of the data.
If we want a more "typical" measure of the center, we have two choices:
Use a different method, e.g. use the median or geometric mean
Transform the data
24
Purpose of Transformations
Make data more normal
Make data more linear
Make variance more constant
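The first purpose can be illustrated with a skewness coefficient computed before and after a log transformation. The data are hypothetical, and the moment-based skewness formula used here (third central moment divided by the 1.5 power of the second) is one common definition, chosen as an assumption.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical, positively skewed concentrations (illustrative values only)
conc = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0]
logs = [math.log(c) for c in conc]

skew_raw = skewness(conc)
skew_log = skewness(logs)

print(round(skew_raw, 2), round(skew_log, 2))
```

The raw data are strongly right-skewed, while the logs are close to symmetric, so methods that assume normality are more defensible on the log scale.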
25
Positive and Negative Skew
Source:
26
Transformations Using Ladder of Powers
27
Geometric Mean
The mean of the natural logs of the data, transformed back to the original units by exponentiating.
If the logs are normally distributed, the geometric mean is an estimate of the MEDIAN, NOT the mean.
Example: Mean = 7.40, Median = 0.50, Geometric Mean = 0.62
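A short sketch of the calculation: take natural logs, average them, and exponentiate the result. The data are hypothetical and do not reproduce the slide's example values.

```python
import math
from statistics import mean, median

def geometric_mean(values):
    """exp of the mean of the natural logs; requires all values > 0."""
    return math.exp(mean(math.log(v) for v in values))

# Hypothetical concentrations with one high value (illustrative only)
conc = [0.1, 0.2, 0.5, 1.0, 50.0]

gm = geometric_mean(conc)
print(mean(conc), median(conc), gm)
```

As on the slide, the geometric mean lands near the median and far below the arithmetic mean, which is pulled up by the single high value.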
28
Outliers
Observations that are different from the rest of the observations in the data set
May be the most important observations in the data set
Example: Antarctic ozone data
NEVER throw away an outlier
Use an alternate method or transform the data
29
Causes of Outliers
Measurement or recording error. Solution: identify and fix the problem
Skewed data. Solution: use an alternate method or transformation
Data from a different population. Solution: split into two groups based on science and analyze separately