BAE 5333 Applied Water Resources Statistics

BAE 5333 Applied Water Resources Statistics Biosystems and Agricultural Engineering Department Division of Agricultural Sciences and Natural Resources Oklahoma State University Source Dr. Dennis R. Helsel & Dr. Edward J. Gilroy 2006 Applied Environmental Statistics Workshop and Statistical Methods in Water Resources

TEXTBOOK Free on-line at: http://pubs.usgs.gov/twri/twri4a3/

SOFTWARE Minitab Version 15 Statistical Software http://www.minitab.com/products/minitab/

Chapter 1 SUMMARIZING DATA: Numbers and Graphs. Choosing a Statistical Method Depends on: Data characteristics; Study objectives

Characteristics of Environmental Data Lower bound of zero Presence of outliers, high values Positive skewness Non-normal distribution High variance Data below recording limits Data collected by other people

Categories of Measured Data Continuous: 1.10, 2.56, 100.5, … Discrete: 1, 2, 5, 15 Qualitative, Grouped, Categorical: Site 1, Site 2, Site 3; Below Detection Limit, Above Detection Limit

Histograms Show how many times Y occurs in each of several groups of X. Require grouping of a continuous variable Y-axis: frequency or relative frequency X - Independent Variable Y - Dependent Variable

Box Plots Good for continuous data Based on percentiles 50th percentile (median) 50 percent of data below or equal to median

Inter-quartile Range (IQR) A Measure of Variability
1st Quartile (Q1) = 25% of data ≤ this value
2nd Quartile (Q2) = Median = 50% of data ≤ this value
3rd Quartile (Q3) = 75% of data ≤ this value
IQR = Q3 - Q1
Example 1: 1 3 7 10 13 21 25th percentile (Q1) = 2.5 75th percentile (Q3) = 15 IQR = 15 – 2.5 = 12.5
Example 2: 1, 2, 3, …, 97, 98, 99 Q1 = 25 Q3 = 75 IQR = 75 - 25 = 50
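The two IQR examples can be checked with a short Python sketch. Quartile conventions differ between software packages; the "exclusive" rule used here (position (n + 1)p with linear interpolation) is chosen because it reproduces the slide's values, and the helper name `iqr` is only illustrative.

```python
from statistics import quantiles


def iqr(data):
    """Return (Q1, Q3, IQR) using the (n + 1)p position rule."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = quantiles(data, n=4, method="exclusive")
    return q1, q3, q3 - q1


print(iqr([1, 3, 7, 10, 13, 21]))  # (2.5, 15.0, 12.5)
print(iqr(range(1, 100)))          # (25.0, 75.0, 50.0)
```

Other quartile rules (for example the "inclusive" method) can give slightly different Q1 and Q3 for small samples.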

Box Plots

Box Plots: Whiskers and Outliers Ends of vertical lines are the whiskers. Each whisker extends to the highest or lowest data value within the limit. Upper Limit = Q3 + 1.5 (Q3 - Q1) Lower Limit = Q1 - 1.5 (Q3 - Q1) Values beyond the limits are plotted individually as outliers. Q1 = First Quartile, 25th percentile Q3 = Third Quartile, 75th percentile
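A minimal sketch of the whisker rule, assuming the same (n + 1)p quartile convention as the IQR example. The data set is hypothetical (the earlier example values plus one added high value, 95), and `boxplot_limits` is an illustrative name, not a Minitab function.

```python
from statistics import quantiles


def boxplot_limits(data):
    """Compute the 1.5 * IQR limits, whisker ends, and outliers."""
    values = sorted(data)
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    step = 1.5 * (q3 - q1)
    lower_limit, upper_limit = q1 - step, q3 + step
    # Whiskers stop at the most extreme data values still inside the limits.
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "limits": (lower_limit, upper_limit),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical data: Q1 = 3, Q3 = 21, so the limits are -24 and 48.
print(boxplot_limits([1, 3, 7, 10, 13, 21, 95]))
```

Here the upper whisker ends at 21, the largest value below the upper limit of 48, and 95 is flagged as an outlier.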

Population vs. Sample Data are samples that we assume represent the characteristics of a population.

Summary Statistics Mean and Standard Deviation
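For reference, the usual sample definitions are written out below. The n − 1 divisor (sample standard deviation) is an assumption about which form the slide showed.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```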

Mean vs. Median Effect of Outliers
Suppose an error is made, and 21 is recorded as 210:
Data                Median   Mean
1 3 7 10 13 21      8.5      9.2
1 3 7 10 13 210     8.5      40.7
The mean is NOT a “resistant” measure of the center. Median and percentiles are generally not sensitive to outliers.
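The effect of the single recording error can be reproduced with Python's standard library; the variable names are illustrative.

```python
from statistics import mean, median

original = [1, 3, 7, 10, 13, 21]
with_error = [1, 3, 7, 10, 13, 210]  # 21 mistakenly recorded as 210

for label, data in (("original", original), ("with error", with_error)):
    print(f"{label}: median = {median(data)}, mean = {mean(data):.1f}")
# original: median = 8.5, mean = 9.2
# with error: median = 8.5, mean = 40.7
```

The median depends only on the middle two values, so changing the largest observation leaves it untouched, while the mean is pulled toward the error.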

Symmetric vs. Skewed Data Box Plots Approximate Normal Distribution Non-normal Distribution

Positive Skewness Common in Environmental Data

Probability Density Function (pdf) Normal (Gaussian) Distribution X = continuous variable μ = mean σ2 = variance
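In the symbols listed on this slide, the standard form of the normal density is:

```latex
f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}
         \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad -\infty < x < \infty
```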

Cumulative Distribution Function, F(b) Area under the probability density function fX(x) from −∞ to b. The area from a to b is the probability that X falls between a and b: P(a ≤ X ≤ b) = F(b) − F(a).

Calculating Probabilities
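As a sketch of the calculation, the probability that a normal variable falls between a and b is the difference of two CDF values. The helper `prob_between` is an illustrative name; `NormalDist` is from Python's standard library.

```python
from statistics import NormalDist


def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) for X ~ Normal(mu, sigma), computed as F(b) - F(a)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


# Probability of falling within one standard deviation of the mean.
print(round(prob_between(-1, 1), 4))  # 0.6827
```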

Example Distributions Uniform Triangular

Example Distributions Lognormal Exponential

Boxplot vs. Probability Density Function pdf Normal μ=0 σ2=1 http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png
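The positions of the box and the 1.5 × IQR limits on a standard normal curve can be worked out numerically. This is a sketch for μ = 0, σ = 1 only; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)   # box edges
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits
outside = z.cdf(lower) + (1 - z.cdf(upper))    # area beyond both limits

print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {iqr:.3f}")
print(f"limits = ({lower:.3f}, {upper:.3f}), fraction outside = {outside:.4f}")
```

For normally distributed data, roughly 0.7% of values are expected to fall beyond the limits, so a box plot showing many more outliers than that suggests a non-normal distribution.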

Symmetric vs. Skewed Data Histograms and Box Plots Symmetrical Data Approximates a Normal Distribution

Symmetric vs. Skewed Data Histograms and Box Plots Box plot is compressed due to outliers.

Increasing Skewness (top half box width increases)

Cumulative Distribution Functions Histogram of natural log of loads and the resulting empirical cumulative distribution function (CDF) Blue – best fit normal distribution Red – Empirical CDF
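A minimal sketch of comparing an empirical CDF with a fitted normal CDF. The log-load values are hypothetical, and fitting the normal curve by the sample mean and standard deviation is an assumption about what "best fit" means on the slide.

```python
from statistics import NormalDist, mean, stdev


def ecdf(values):
    """Return (x, fraction of data <= x) pairs for the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


# Hypothetical natural logs of loads, for illustration only.
log_loads = [2.1, 2.9, 3.4, 3.8, 4.0, 4.3, 4.7, 5.2, 5.9]

# Normal distribution fitted by matching the sample mean and standard deviation.
fitted = NormalDist(mean(log_loads), stdev(log_loads))

for x, p in ecdf(log_loads):
    print(f"{x:4.1f}  empirical = {p:.2f}  normal = {fitted.cdf(x):.2f}")
```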

Probability Plots A theoretical normal distribution plots as a straight line on normal probability paper. If the data also plot as a straight line, they follow a normal distribution.
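The coordinates of a normal probability plot can be sketched as follows. The plotting-position formula (i − 0.4)/(n + 0.2) is one common choice and is an assumption here; other software may use a different constant. The function names and the concentration values are hypothetical.

```python
import math
from statistics import NormalDist


def normal_plot_points(values, a=0.4):
    """Pair each sorted value with its expected standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - a) / (n + 1 - 2 * a)), x)
        for i, x in enumerate(ordered, start=1)
    ]


def pearson_r(points):
    """Correlation between normal quantiles and data; near 1 means straight."""
    n = len(points)
    mz = sum(z for z, _ in points) / n
    mx = sum(x for _, x in points) / n
    szx = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z, _ in points)
    sxx = sum((x - mx) ** 2 for _, x in points)
    return szx / math.sqrt(szz * sxx)


# Hypothetical concentrations, right-skewed.
conc = [0.2, 0.3, 0.5, 0.6, 0.9, 1.4, 2.2, 4.1, 9.8]
print("raw data r:", pearson_r(normal_plot_points(conc)))
print("log data r:", pearson_r(normal_plot_points([math.log(c) for c in conc])))
```

A correlation closer to 1 for the logs than for the raw values indicates that the log-transformed data plot closer to a straight line.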

Concentrations are Not Normally Distributed

Logs of Concentration are Normally Distributed

What to do with skewed data? Data with outliers have a mean that may be larger than 75% of the data. If we want a more “typical” measure of the center, we have two choices: Use a different method, e.g. use the median or geometric mean; Transform the data

Purpose of Transformations Make data more normal Make data more linear Make variance more constant

Positive and Negative Skew Source: http://www.georgetown.edu/departments/psychology/researchmethods/statistics/begin.htm

Transformations Using Ladder of Powers
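A sketch of stepping along a ladder of powers and checking the skewness after each transformation. The rungs listed are the commonly cited ones and are an assumption, since the slide's table is not in the transcript; the data are hypothetical and must be strictly positive.

```python
import math

# Assumed rungs, from strongest correction of negative skew (x^3)
# to strongest correction of positive skew (-1/x).
LADDER = [
    ("x^3", lambda x: x ** 3),
    ("x^2", lambda x: x ** 2),
    ("x", lambda x: x),
    ("sqrt(x)", math.sqrt),
    ("x^(1/3)", lambda x: x ** (1 / 3)),
    ("log(x)", math.log),
    ("-1/sqrt(x)", lambda x: -1 / math.sqrt(x)),
    ("-1/x", lambda x: -1 / x),
]


def skewness(values):
    """Moment coefficient of skewness: m3 / m2^1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def ladder_skewness(data):
    """Skewness of the data after each transformation on the ladder."""
    return {name: skewness([f(x) for x in data]) for name, f in LADDER}


# Hypothetical positively skewed data whose logs are symmetric.
data = [math.exp(z) for z in (-2, -1, 0, 1, 2)]
for name, g in ladder_skewness(data).items():
    print(f"{name:>10}: skewness = {g:+.3f}")
```

The transformation whose skewness is closest to zero is a reasonable starting candidate; for these values it is the log.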

Geometric Mean Exponential of the mean of the natural logs of the data: GM = exp(mean(ln x)). If the logs are normally distributed, the geometric mean is an estimate of the MEDIAN, NOT the mean. Example: Mean = 7.40 Median = 0.50 Geometric Mean = 0.62
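The definition can be sketched in a few lines. The data are hypothetical, chosen so that their logs are symmetric; they are not the data behind the slide's example values.

```python
import math
from statistics import mean, median


def geometric_mean(values):
    """Back-transform the mean of the natural logs; values must be > 0."""
    return math.exp(mean([math.log(v) for v in values]))


# Hypothetical skewed data whose logs are -2, -1, 0, 1, 2.
data = [math.exp(z) for z in (-2, -1, 0, 1, 2)]

print(f"mean           = {mean(data):.2f}")
print(f"median         = {median(data):.2f}")
print(f"geometric mean = {geometric_mean(data):.2f}")
```

Because the logs are symmetric about zero, the geometric mean coincides with the median here, while the arithmetic mean is pulled upward by the largest values.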

Outliers Observations that are different from the rest of the observations in the data set May be the most important observations in the data set Example: Antarctic ozone data NEVER throw away outliers Use an alternate method or transform the data

Causes of Outliers
Measurement or recording error. Solution: identify and fix problem
Skewed data. Solution: use alternate method or transformation
Data from a different population. Solution: split into two groups based on science and analyze separately

Reading Assignment: Chapter 1 Summarizing Data (pages 1-12) Chapter 2 Graphical Data Analysis (pages 17-64) Statistical Methods in Water Resources by D.R. Helsel and R.M. Hirsch. MINITAB Laboratory 1