BAE 6520 Applied Environmental Statistics


BAE 6520 Applied Environmental Statistics. Biosystems and Agricultural Engineering Department, Division of Agricultural Sciences and Natural Resources, Oklahoma State University. Source: Dr. Dennis R. Helsel & Dr. Edward J. Gilroy, 2006 Applied Environmental Statistics Workshop, and Statistical Methods in Water Resources

TEXTBOOK Free on-line at: http://pubs.usgs.gov/twri/twri4a3/

Chapter 1 SUMMARIZING DATA Numbers and Graphs Choosing a Statistical Method Depends on: Data characteristics Study objectives

Characteristics of Environmental Data Lower bound of zero Presence of outliers, high values Positive skewness Non-normal distribution High variance Data below recording limits Data collected by other people

Categories of Measured Data Continuous: 1.10, 2.56, 100.5, … Discrete: 1, 2, 5, 15 Qualitative, Grouped, Categorical: Site 1, Site 2, Site 3; Below Detection Limit, Above Detection Limit

Histograms Show how many observations fall in each of several groups (bins) of X. Require grouping of a continuous variable Y-axis: frequency or relative frequency

Box Plots Good for continuous data Based on percentiles 50th percentile (median) 50 percent of data below or equal to median

Inter-quartile Range (IQR) A Measure of Variability IQR = 75th percentile – 25th percentile Represents the middle half of the data Example data: 1 3 7 10 13 21 25th Percentile (2.5) 75th Percentile (15) IQR = 15 – 2.5 = 12.5
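
The quartiles on this slide (2.5 and 15) can be reproduced by interpolating at position (n + 1) * p in the sorted data. That rule is inferred from the numbers, not stated on the slide, and other software conventions give slightly different quartiles. A minimal Python sketch under that assumption (the function name is illustrative):

```python
def weibull_percentile(data, p):
    """Percentile at fraction p (0 < p < 1) using the (n + 1) * p position rule.

    Assumption: this interpolation rule is inferred from the slide's numbers.
    """
    x = sorted(data)
    n = len(x)
    pos = (n + 1) * p      # 1-based position in the sorted data
    lo = int(pos)          # rank at or just below that position
    frac = pos - lo        # fraction of the way to the next rank
    if lo < 1:             # position falls before the first value
        return x[0]
    if lo >= n:            # position falls at or beyond the last value
        return x[-1]
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])


data = [1, 3, 7, 10, 13, 21]
q1 = weibull_percentile(data, 0.25)
q3 = weibull_percentile(data, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)  # 2.5 15.0 12.5
```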

Box Plots

Box Plots Outliers Ends of Vertical Lines - Whiskers Whisker – extends to highest or lowest data value within the limit. Upper Limit = Q3 + 1.5 (Q3 - Q1) Lower Limit = Q1 - 1.5 (Q3 - Q1) Q1 = First Quartile, 25th percentile Q3 = Third Quartile, 75th percentile
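
As a rough illustration of the whisker rule, the sketch below computes the limits, the whisker ends, and any points plotted as outliers. It uses the standard lower limit Q1 - 1.5 (Q3 - Q1) and takes the quartiles as inputs; the function name and the reuse of the IQR example data are illustrative choices, not part of the slides.

```python
def box_plot_whiskers(data, q1, q3, k=1.5):
    """Return limits, whisker ends, and outliers for a box plot."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # Whiskers stop at the most extreme data values still inside the limits.
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Quartiles 2.5 and 15 are the values given on the IQR slide.
result = box_plot_whiskers([1, 3, 7, 10, 13, 21], q1=2.5, q3=15.0)
print(result)
```

For these six values the limits are -16.25 and 33.75, so no point is flagged and the whiskers end at 1 and 21.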

Population vs. Sample Data are samples that we assume represent the characteristics of a population.

Summary Statistics Mean and Standard Deviation

Mean vs. Median Effect of Outliers Suppose an error is made, and 1 3 7 10 13 21 (Median 8.5, Mean 9.2) becomes 1 3 7 10 13 210 (Median 8.5, Mean 40.7). The mean is NOT a resistant measure of the center. Median and percentiles are generally not sensitive to outliers.
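
The numbers on this slide can be checked with Python's standard `statistics` module. This is only a small sketch: the helper name `summarize` is illustrative, and the sample standard deviation is included as an extra comparison that does not appear on the slide.

```python
import statistics

original = [1, 3, 7, 10, 13, 21]
with_error = [1, 3, 7, 10, 13, 210]   # last value changed from 21 to 210


def summarize(values):
    """Median, mean, and sample standard deviation, rounded for display."""
    return {
        "median": statistics.median(values),
        "mean": round(statistics.mean(values), 1),
        "stdev": round(statistics.stdev(values), 1),
    }


before = summarize(original)
after = summarize(with_error)
print(before)   # median 8.5, mean 9.2
print(after)    # median 8.5, mean 40.7
```

The median is unchanged by the single bad value, while the mean and the standard deviation both increase sharply.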

Symmetric vs. Skewed Data Box Plots Approximate Normal Distribution Non-normal Distribution

Positive Skewness Common in Environmental Data

Symmetric vs. Skewed Data Histograms and Box Plots Symmetrical Data Approximates a Normal Distribution

Symmetric vs. Skewed Data Histograms and Box Plots Box plot is compressed due to outliers.

Increasing Skewness (top half box width increases)

Cumulative Distribution Functions Histogram of natural log of loads and the resulting empirical cumulative distribution function (CDF). Blue – best fit normal distribution Red – Empirical CDF
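
An empirical CDF is the fraction of observations less than or equal to each sorted value. The sketch below shows one simple way to compute it for log-transformed data. The load values are made up for illustration and are not the data behind the slide's figure.

```python
import math


def empirical_cdf(data):
    """Pair each sorted value with the fraction of observations <= it.

    Simplification: tied values each get their own step.
    """
    x = sorted(data)
    n = len(x)
    return [(v, (i + 1) / n) for i, v in enumerate(x)]


loads = [1, 3, 7, 10, 13, 21]               # hypothetical loads, for illustration only
log_loads = [math.log(v) for v in loads]    # natural logs, as in the slide's histogram
ecdf = empirical_cdf(log_loads)
for value, fraction in ecdf:
    print(round(value, 2), round(fraction, 2))
```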

Probability Plots Theoretical normal distribution plots as a straight line on normal probability paper. If the data also plot as a straight line, they follow a normal distribution.
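
One rough way to judge straightness without probability paper is to pair each sorted value with a standard normal quantile and look at the correlation between the two. The sketch below assumes the plotting position i / (n + 1) and uses made-up, strongly skewed data; neither the plotting position nor the correlation check comes from the slides.

```python
import math
from statistics import NormalDist


def normal_probability_points(data):
    """Pair each sorted value with the normal quantile of its plotting position."""
    x = sorted(data)
    n = len(x)
    std_normal = NormalDist()
    # Assumed plotting position: i / (n + 1) for ranks i = 1..n.
    return [(std_normal.inv_cdf((i + 1) / (n + 1)), v) for i, v in enumerate(x)]


def correlation(pairs):
    """Pearson correlation of (quantile, value) pairs; near 1 means nearly straight."""
    n = len(pairs)
    mean_q = sum(q for q, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    cov = sum((q - mean_q) * (v - mean_v) for q, v in pairs)
    var_q = sum((q - mean_q) ** 2 for q, _ in pairs)
    var_v = sum((v - mean_v) ** 2 for _, v in pairs)
    return cov / math.sqrt(var_q * var_v)


conc = [1, 2, 4, 8, 16, 32, 64]             # hypothetical skewed concentrations
r_raw = correlation(normal_probability_points(conc))
r_log = correlation(normal_probability_points([math.log(v) for v in conc]))
print(round(r_raw, 3), round(r_log, 3))
```

For this made-up data set the logs plot closer to a straight line than the raw values do.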

Concentrations are Not Normally Distributed

Logs of Concentration are Normally Distributed

What to do with skewed data? Data with outliers have a mean that may be larger than 75% of the data. If we want a more “typical” measure of the center, we have two choices: Use a different method, e.g. the median or geometric mean. Transform the data.

Purpose of Transformations Make data more normal Make data more linear Make the variance of the data more constant

Positive and Negative Skew Source: http://www.georgetown.edu/departments/psychology/researchmethods/statistics/begin.htm

Transformations Using Ladder of Powers

Geometric Mean Mean of the natural logs of the data, transformed back to the original units. If the logs are normally distributed, the geometric mean is: An estimate of the MEDIAN, NOT the mean. Mean = 7.40 Median = 0.50 Geometric Mean = 0.62
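
In code, the geometric mean is the exponential of the mean of the natural logs. The sketch below uses made-up data that is symmetric on the log scale. It is not the data set behind the slide's values, which is not given.

```python
import math
import statistics


def geometric_mean(data):
    """exp(mean(ln x)); every value must be greater than zero."""
    logs = [math.log(v) for v in data]
    return math.exp(sum(logs) / len(logs))


data = [1, 10, 100]                 # hypothetical, symmetric in log units
gm = geometric_mean(data)
mean = statistics.mean(data)
median = statistics.median(data)
print(round(gm, 2), mean, median)   # geometric mean matches the median here, not the mean
```

Python 3.8 and later also provide `statistics.geometric_mean`; the explicit version is shown to make the log and back-transform steps visible.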

Outliers Observations that are different from the rest of the observations in the data set. May be the most important observations in the data set. Example: Antarctic ozone data. NEVER throw away an outlier. Use an alternate method or transform the data.

Causes of Outliers Measurement or recording error Solution: identify and fix problem Skewed data Solution: use alternate method or transformation Data from a different population Solution: split into two groups based on science and analyze separately