BAE 6520 Applied Environmental Statistics


1 BAE 6520 Applied Environmental Statistics
Biosystems and Agricultural Engineering Department, Division of Agricultural Sciences and Natural Resources, Oklahoma State University. Source: Dr. Dennis R. Helsel & Dr. Edward J. Gilroy, 2006 Applied Environmental Statistics Workshop, and Statistical Methods in Water Resources.

2 TEXTBOOK Free on-line at:

3 Choosing a Statistical Method
Chapter 1: Summarizing Data (Numbers and Graphs). Choosing a statistical method depends on the data characteristics and the study objectives.

4 Characteristics of Environmental Data
Lower bound of zero; presence of outliers (high values); positive skewness; non-normal distribution; high variance; data below recording limits; data collected by other people.

5 Categories of Measured Data
Continuous: 1.10, 2.56, … Discrete: 1, 2, 5, 15. Qualitative (grouped, categorical): Site 1, Site 2, Site 3; Below Detection Limit, Above Detection Limit.

6 Histograms
Show how many observations fall into each of several groups (intervals) of a variable. They require grouping of a continuous variable; the y-axis shows frequency or relative frequency.
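The grouping a histogram needs can be sketched in a few lines. This is a minimal illustration, assuming equal-width, left-closed bins [start + k·width, start + (k + 1)·width); the function `histogram_counts`, its parameters, and the data values are hypothetical, not taken from the workshop materials.

```python
import math
from collections import Counter

def histogram_counts(values, width, start=0.0):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    Returns (bin_start, frequency, relative_frequency) for every bin from the
    lowest occupied bin to the highest, including empty bins in between.
    """
    n = len(values)
    # Bin index for each value; floor keeps the lower edge inside the bin.
    counts = Counter(math.floor((v - start) / width) for v in values)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        freq = counts[k]  # a Counter returns 0 for bins with no values
        rows.append((start + k * width, freq, freq / n))
    return rows

data = [0.4, 1.2, 1.7, 2.3, 2.9, 3.1, 4.8, 9.6]  # hypothetical concentrations
for lo, freq, rel in histogram_counts(data, width=2.0):
    print(f"[{lo:4.1f}, {lo + 2.0:4.1f})  frequency={freq}  relative={rel:.3f}")
```

Choosing the bin width is a judgment call; a different width or starting point can change the apparent shape of the histogram.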

7 Box Plots
Good for continuous data; based on percentiles. 50th percentile (median): 50 percent of the data are below or equal to the median.

8 Interquartile Range (IQR): A Measure of Variability
IQR = 75th percentile – 25th percentile; it represents the middle half of the data. Example: 25th percentile = 2.5 and 75th percentile = 15, so IQR = 15 – 2.5 = 12.5.
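Percentiles and the IQR can be computed with Python's standard `statistics` module. A minimal sketch, assuming the convention that the pth sample percentile sits at rank (n + 1)p, interpolating between neighboring ranks, which is what `statistics.quantiles` does with `method="exclusive"`; other software may use slightly different definitions. The helper names `quartiles` and `iqr` and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Return [Q1, median, Q3] using the (n + 1) * p rank convention."""
    # method="exclusive" (the default) interpolates at rank (n + 1) * p
    return statistics.quantiles(values, n=4, method="exclusive")

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [1, 2, 3, 5, 8, 13, 21]  # hypothetical, sorted only for readability
print(quartiles(data), iqr(data))  # prints [2.0, 5.0, 13.0] 11.0
```

With n = 7 the quartile ranks are 2, 4, and 6, so no interpolation is needed here; for other sample sizes the result falls between two observations.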

9 Box Plots

10 Box Plots: Outliers and Whiskers (Ends of the Vertical Lines)
Whisker: extends to the highest or lowest data value within the limit. Upper Limit = Q3 + 1.5 (Q3 – Q1); Lower Limit = Q1 – 1.5 (Q3 – Q1), where Q1 = first quartile (25th percentile) and Q3 = third quartile (75th percentile). Data values beyond these limits are plotted individually as outliers.
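The whisker limits can be sketched as follows, assuming the common multiplier of 1.5 × IQR and the same (n + 1)p quartile convention as `statistics.quantiles(method="exclusive")`. The function name `box_plot_summary` and the data are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers for a standard box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    step = 1.5 * (q3 - q1)                      # 1.5 x IQR
    lower_limit, upper_limit = q1 - step, q3 + step
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations still inside the limits.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        # Observations beyond the limits are drawn individually.
        "outliers": sorted(v for v in values if v < lower_limit or v > upper_limit),
    }

data = [0.5, 0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 9.7]  # hypothetical
print(box_plot_summary(data))
```

Note that the whisker ends are actual observations, not the limits themselves.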

11 Population vs. Sample
Data are samples that we assume represent the characteristics of a population.

12 Summary Statistics: Mean and Standard Deviation
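Because the data are treated as a sample from a population, the standard deviation is normally computed with an n − 1 denominator. A minimal sketch; the function names and data are hypothetical, and `statistics.mean` and `statistics.stdev` compute the same quantities.

```python
import math
import statistics

def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    xbar = sample_mean(values)
    squared_deviations = sum((x - xbar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical
print(sample_mean(data), sample_sd(data))
print(statistics.mean(data), statistics.stdev(data))  # library cross-check
```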

13 Mean vs. Median: Effect of Outliers
Suppose an error is made and one value is recorded as much too large: the median changes little, if at all, but the mean becomes much larger. The mean is NOT a resistant measure of the center; the median and percentiles are generally not sensitive to outliers.
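The effect of a single recording error can be illustrated with a small, hypothetical data set in which one value is entered with a misplaced decimal point. The values below are invented for illustration only.

```python
import statistics

# Hypothetical measurements, and the same data after one recording error
# (2.8 entered as 28.0 because of a misplaced decimal point).
measured = [1.2, 1.5, 1.9, 2.3, 2.8]
recorded = [1.2, 1.5, 1.9, 2.3, 28.0]

for label, data in (("correct", measured), ("with error", recorded)):
    print(f"{label:10s}  median = {statistics.median(data):.2f}  mean = {statistics.mean(data):.2f}")
```

Here the median stays at 1.90 while the mean jumps from 1.94 to 6.98, which is larger than four of the five values.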

14 Symmetric vs. Skewed Data
Box plots: approximately normal distribution vs. non-normal distribution.

15 Positive Skewness
Positive skewness is common in environmental data.
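Skewness can be quantified with a coefficient of skewness. This sketch uses the adjusted sample coefficient g = n / ((n − 1)(n − 2)) · Σ((xᵢ − x̄)/s)³, one common definition (others differ by small-sample factors); positive values indicate a long right tail. The function name and the concentration values are hypothetical. The last lines compare the raw values with their natural logs.

```python
import math
import statistics

def skewness(values):
    """Adjusted sample coefficient of skewness.

    g = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), with s the sample SD.
    """
    n = len(values)
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # n - 1 denominator
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in values)

conc = [0.2, 0.3, 0.5, 0.6, 0.9, 1.4, 2.8, 9.5]  # hypothetical concentrations
log_conc = [math.log(c) for c in conc]

print("skewness of concentrations:", round(skewness(conc), 2))
print("skewness of ln(concentrations):", round(skewness(log_conc), 2))
```

For these values the coefficient drops sharply after the log transformation, consistent with the idea that logs of concentrations are closer to a normal distribution.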

16 Symmetric vs. Skewed Data: Histograms and Box Plots
Symmetric data approximate a normal distribution.

17 Symmetric vs. Skewed Data: Histograms and Box Plots
The box plot is compressed because of the outliers.

18 Increasing Skewness
As skewness increases, the top half of the box (median to 75th percentile) becomes longer relative to the bottom half.

19 Cumulative Distribution Functions
Histogram of the natural logs of loads and the resulting empirical cumulative distribution function (CDF). Blue: best-fit normal distribution; red: empirical CDF.
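An empirical CDF gives, for each observed value x, the fraction of observations less than or equal to x. A minimal sketch that also evaluates a best-fit normal CDF for comparison, assuming the fit uses the sample mean and standard deviation of the logs (via `statistics.NormalDist.from_samples`); the function name and load values are hypothetical.

```python
import math
import statistics
from bisect import bisect_right

def empirical_cdf(values):
    """Return (x, F(x)) for each distinct x, where F(x) is the fraction of observations <= x."""
    xs = sorted(values)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]

loads = [12, 30, 45, 80, 150, 400, 1100]   # hypothetical loads
log_loads = [math.log(v) for v in loads]   # natural logs of the loads

# Best-fit normal for the logs, using the sample mean and sample standard deviation.
fitted = statistics.NormalDist.from_samples(log_loads)

for x, f_emp in empirical_cdf(log_loads):
    print(f"ln(load)={x:6.3f}  empirical={f_emp:.3f}  fitted normal={fitted.cdf(x):.3f}")
```

`bisect_right` handles tied values, so each distinct value gets the full fraction of observations at or below it.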

20 Probability Plots
A theoretical normal distribution plots as a straight line on normal probability paper. If the data also plot as a straight line, they are consistent with a normal distribution.

21 Concentrations Are Not Normally Distributed

22 Logs of Concentrations Are Normally Distributed

23 What to Do with Skewed Data?
Data with outliers have a mean that may be larger than 75% of the data. If we want a more “typical” measure of the center, we have two choices: (1) use a different method, e.g. the median or geometric mean, or (2) transform the data.

24 Purpose of Transformations
Make data more normal; make relationships more linear; make the variance more constant.

25 Positive and Negative Skew
Source:

26 Transformations Using Ladder of Powers
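Tukey's ladder of powers replaces x with x^θ for a chosen power θ, with θ = 0 standing for the logarithm. Moving down the ladder (θ < 1) pulls in a long right tail, so it is used for positively skewed data; moving up (θ > 1) addresses negative skew. A minimal sketch, assuming strictly positive data and negating the negative powers so that the transform keeps the data in the same order; `LADDER` and `ladder_transform` are hypothetical names.

```python
import math

# Tukey's ladder of powers; theta = 0 stands for the natural logarithm.
LADDER = [2, 1, 0.5, 0, -0.5, -1]

def ladder_transform(x, theta):
    """Transform a positive value x with power theta from the ladder."""
    if x <= 0:
        raise ValueError("the ladder of powers needs strictly positive data")
    if theta == 0:
        return math.log(x)
    if theta > 0:
        return x ** theta
    # Negative powers reverse the order of the data, so negate to keep it.
    return -(x ** theta)

conc = [0.2, 0.3, 0.5, 0.6, 0.9, 1.4, 2.8, 9.5]  # hypothetical, positively skewed
for theta in LADDER:
    print(theta, [round(ladder_transform(c, theta), 3) for c in conc])
```

Stepping down the ladder one power at a time and rechecking a box plot or probability plot after each step is one way to find the mildest power that makes the data roughly symmetric.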

27 Geometric Mean: Exponentiated Mean of the Natural Logs of the Data
If the logs are normally distributed, the geometric mean is an estimate of the MEDIAN, NOT the mean. Example: Mean = 7.40, Median = 0.50, Geometric Mean = 0.62.
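The geometric mean is the exponential of the mean of the natural logs. A minimal sketch; the function name and data are hypothetical, and the result can be checked against `statistics.geometric_mean` (Python 3.8+).

```python
import math
import statistics

def geometric_mean(values):
    """exp of the mean of the natural logs; requires strictly positive values."""
    return math.exp(statistics.mean([math.log(v) for v in values]))

conc = [0.2, 0.3, 0.5, 0.6, 0.9, 1.4, 2.8, 9.5]  # hypothetical, positively skewed
print("mean           =", round(statistics.mean(conc), 3))
print("median         =", round(statistics.median(conc), 3))
print("geometric mean =", round(geometric_mean(conc), 3))
print("library check  =", round(statistics.geometric_mean(conc), 3))
```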

28 Outliers
Observations that are different from the rest of the observations in the data set. They may be the most important observations in the data set (example: Antarctic ozone data). NEVER throw away an outlier; use an alternate method or transform the data.

29 Causes of Outliers
Measurement or recording error. Solution: identify and fix the problem. Skewed data. Solution: use an alternate method or a transformation. Data from a different population. Solution: split into two groups based on the science and analyze them separately.

