Published by Gwenda Avis Nicholson. Modified over 8 years ago.
1
Module 2: Terminology of Data Sets
Attributes of Data Sets (Mean and Spread)
Melinda Ronca-Battista, ITEP
Catherine Brown, U.S. EPA
2
Histogram
a.k.a. “frequency distribution”
Many types of datasets form “bell”-shaped histograms, a.k.a. “normal,” “standard,” or “Gaussian” curves
3
Typical Histogram
4
Only 2 Factors
Mean: the center of the data, where most of the data are
Spread: how much of the data lies how far from the mean
5
Center
Mean = average
Outliers can strongly affect the mean
The distribution may not be symmetrical
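To illustrate how one outlier can pull the mean, here is a minimal sketch. The readings are made-up numbers in arbitrary units, not data from the presentation.

```python
from statistics import mean

# Hypothetical readings (made-up values, arbitrary units).
readings = [10, 12, 11, 13, 12, 11, 12]

# The same readings plus one unusually high value (the outlier).
with_outlier = readings + [95]

mean_without = mean(readings)
mean_with = mean(with_outlier)

print(f"mean without outlier: {mean_without:.2f}")
print(f"mean with outlier:    {mean_with:.2f}")
```

A single extreme value out of eight is enough to roughly double the mean here, even though seven of the readings sit between 10 and 13.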
6
Many Environmental Distributions
7
Normal Distribution Is Useful
8
Spread
Sample standard deviation: STDEV(range)
The bigger the STDEV is compared to the mean, the wider the spread
Coefficient of variation: COV = STDEV/mean
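As a rough sketch of the two spread measures, the snippet below computes a sample standard deviation and the COV for a small made-up data set. It assumes Python's `statistics.stdev`, which, like the spreadsheet STDEV function, divides by n - 1.

```python
from statistics import mean, stdev

# Hypothetical data set (made-up values).
values = [2, 4, 4, 4, 5, 5, 7, 9]

avg = mean(values)    # center of the data
sd = stdev(values)    # sample standard deviation (divides by n - 1)
cov = sd / avg        # COV = STDEV / mean, a unitless measure of spread

print(f"mean  = {avg}")
print(f"STDEV = {sd:.3f}")
print(f"COV   = {cov:.3f}")
```

Because the COV is a ratio, it lets you compare the spread of data sets whose means differ a lot.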
9
How can we use the normal distribution?
“Map” our distribution onto a normal distribution, using our mean and STDEV
Then we can predict how many values fall within different degrees of “spread” away from the mean
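The prediction step can be sketched with Python's `statistics.NormalDist`. The helper name `fraction_within` and the example mean and STDEV are assumptions made for illustration, and the result only holds to the extent that the data really are close to normal.

```python
from statistics import NormalDist

def fraction_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within k STDEVs of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Hypothetical mean and STDEV "mapped" onto a normal distribution.
example_mean, example_stdev = 50.0, 10.0

for k in (1, 2, 3):
    share = fraction_within(k, example_mean, example_stdev)
    print(f"within {k} STDEV of the mean: {share:.1%}")
```

The fractions come out at roughly 68%, 95%, and 99.7% for 1, 2, and 3 STDEVs, whatever mean and STDEV are plugged in.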
10
Sample vs. Population
Our sample is a subset
We assume our subset is a subset of the “real” population
The closer our subset is to the real population, the better our prediction will be
Good sampling plans produce better representations of the “real” distribution
11
Subsets Might Be Biased
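A toy example of a biased subset, using a made-up "population" of the numbers 1 to 100. Both sampling schemes are hypothetical and chosen only to make the contrast obvious.

```python
from statistics import mean

# Hypothetical "real" population: the whole numbers 1 through 100.
population = list(range(1, 101))

# A subset spread evenly across the population: 5, 15, ..., 95.
spread_sample = population[4::10]

# A biased subset that only looks at the ten highest values: 91 ... 100.
biased_sample = population[-10:]

print(f"population mean:    {mean(population)}")
print(f"spread sample mean: {mean(spread_sample)}")
print(f"biased sample mean: {mean(biased_sample)}")
```

Both subsets have ten values, so the difference in their means comes from how the values were chosen, not from how many were taken.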
12
Terminology
Mu (μ) = mean
Sigma (σ) = standard deviation, estimated from a sample as s
13
How is this useful?
Calculate the mean and STDEV
Now we can predict reality!
Put any “x” value in context: how many STDEVs away from the mean is it?
14
Standard Deviation
STDEV(range)
15
Z Score
Z shows how far the “x” value you are interested in is from the mean, in units of STDEV: Z = (x - mean) / STDEV
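A minimal sketch of the Z score calculation, using the usual definition Z = (x - mean) / STDEV. The function name `z_score` and the example numbers are illustrative assumptions.

```python
def z_score(x, mean, stdev):
    """Number of STDEVs that x lies above (+) or below (-) the mean."""
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return (x - mean) / stdev

# Hypothetical data set summary: mean 20, STDEV 5.
print(z_score(30, 20, 5))   # a value two STDEVs above the mean
print(z_score(15, 20, 5))   # a value one STDEV below the mean
```

A positive Z means the value is above the mean and a negative Z means it is below.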
16
Z Scores Are Proportions of Spread
18
Sample Size Affects Confidence
The larger N is, the better your estimate (STDEV) reflects the real spread
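One way to see the effect of N is a small simulation: draw many samples of size N from a population whose true STDEV is 1, and look at how widely the STDEV estimates scatter. This is a sketch on synthetic random data. The function name, trial count, and seed are arbitrary assumptions.

```python
import random
from statistics import stdev

def spread_of_stdev_estimates(n, trials=200, seed=0):
    """Range (max - min) of STDEV estimates across repeated samples of size n.

    Every sample is drawn from a synthetic normal population with true STDEV 1.
    """
    rng = random.Random(seed)
    estimates = [
        stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return max(estimates) - min(estimates)

for n in (5, 30, 300):
    print(f"N = {n:3d}: STDEV estimates range over {spread_of_stdev_estimates(n):.2f}")
```

With small N the estimates scatter widely around the true value of 1. With large N they cluster much more tightly.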
19
Air Quality
Daily sampling gives the best estimate of reality
1-in-3-day sampling is a compromise
1-in-6-day sampling gives a worse estimate
1-in-12-day sampling is even worse
How well does one location estimate all the air in an airshed?
We compromise on both sampling frequency and number of sites
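The sampling-frequency trade-off can be sketched by thinning a synthetic daily series. The 360 daily values below are random numbers rather than real air quality data, and the helper `every_kth_day` is a hypothetical name.

```python
import random
from statistics import mean

# Synthetic "daily" concentrations for 360 days (random, not real measurements).
rng = random.Random(42)
daily = [rng.gauss(20.0, 5.0) for _ in range(360)]

def every_kth_day(series, k):
    """Keep one value out of every k days, starting with the first day."""
    return series[::k]

for k in (1, 3, 6, 12):
    subset = every_kth_day(daily, k)
    print(f"1-in-{k} day sampling: N = {len(subset):3d}, mean = {mean(subset):.2f}")
```

Less frequent sampling leaves fewer values, from 360 down to 30, so the mean is estimated from less information. In any single run a sparse schedule may happen to land close to the daily mean, but on average it is the less reliable estimate.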
20
Module 2 Summary
Data sets estimate reality
Mean (average)
Spread (STDEV)
N
Good sampling plans produce good estimates of reality