Module 2: Terminology of Data Sets. Attributes of Data Sets (Mean and Spread). Melinda Ronca-Battista, ITEP; Catherine Brown, U.S. EPA.


1 Module 2: Terminology of Data Sets. Attributes of Data Sets (Mean and Spread). Melinda Ronca-Battista, ITEP; Catherine Brown, U.S. EPA

2 Histogram
- a.k.a. “frequency distribution”
- Many types of datasets form “bell”-shaped histograms
- a.k.a. “normal,” “standard,” “Gaussian” curves

3 Typical Histogram

4 Only 2 Factors
- Mean (center of the data, where most values lie)
- Spread (how much of the data lies how far from the mean)

5 Center
- Mean = average
- Outliers can strongly affect the mean
- Distribution may not be symmetrical
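To illustrate how strongly one outlier can pull the mean, here is a minimal Python sketch. The readings are made-up values chosen for easy arithmetic, not data from the presentation.

```python
from statistics import mean

# Hypothetical readings (made-up values, for illustration only)
readings = [8.0, 9.0, 10.0, 11.0, 12.0]

# The same readings plus one extreme value (the outlier)
with_outlier = readings + [70.0]

print(mean(readings))      # 10.0
print(mean(with_outlier))  # 20.0
```

A single extreme value doubles the mean here, even though five of the six readings are unchanged.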

6 Many Environmental Distributions

7 Normal Distribution Useful

8 Spread
- Sample standard deviation
- Spreadsheet formula: STDEV(range)
- The bigger the STDEV is compared to the mean, the wider the spread
- COV (coefficient of variation) = STDEV/mean
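The same two quantities can be computed outside a spreadsheet. The sketch below uses Python's standard library; `statistics.stdev` is the sample standard deviation (it divides by n - 1, as the spreadsheet STDEV function does). The data values are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical data set (made-up values, for illustration only)
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

m = mean(values)   # center of the data
s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
cov = s / m        # coefficient of variation: spread relative to the mean

print(f"mean={m:.3f}  stdev={s:.3f}  COV={cov:.3f}")
```

Because COV is a ratio, it allows the spread of data sets with very different means to be compared on the same footing.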

9 How can we use the normal distribution?
- “Map” our distribution onto a normal distribution, using our mean and stdev
- Then we can predict what fraction of the values falls within each degree of “spread” away from the mean
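As a sketch of that prediction step, the Python snippet below builds a normal distribution from a mean and stdev and asks what fraction of values should fall within 1, 2, and 3 standard deviations of the mean. The mean and stdev used are assumed values, and `fraction_within` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

# Assumed mean and standard deviation (hypothetical values)
mu, sigma = 10.0, 2.0
dist = NormalDist(mu, sigma)

def fraction_within(k):
    """Fraction of a normal distribution lying within k stdevs of the mean."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 3))
# 1 0.683
# 2 0.954
# 3 0.997
```

These fractions do not depend on the particular mean and stdev chosen, which is what makes the mapping useful.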

10 Sample vs. Population
- Our sample is a subset
- We assume our sample is a subset of the “real” population
- The closer our sample is to the real population, the better our prediction will be
- Good sampling plans produce better representations of the “real” distribution

11 Subsets Might be Biased

12 Terminology
- Mu = μ = mean
- Sigma = σ = standard deviation

13 How is this useful?
- Calculate mean and stdev
- NOW can predict reality!
- Put any “x” value in context of how many STDEVs away from the mean it is

14 Standard Deviation
- STDEV(range)

15 Z Score
- Z shows how far from the mean the “x” value you are interested in lies, measured in standard deviations
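A minimal Python sketch of the Z score calculation, assuming the usual definition z = (x - mean) / stdev with the sample standard deviation. The function name `z_score` and the data values are illustrative only.

```python
from statistics import mean, stdev

def z_score(x, data):
    """Number of sample standard deviations that x lies from the mean of data."""
    return (x - mean(data)) / stdev(data)

# Hypothetical data set (made-up values, for illustration only)
data = [10.0, 12.0, 14.0, 16.0, 18.0]

print(z_score(14.0, data))            # 0.0, because 14.0 is the mean
print(round(z_score(20.0, data), 2))  # 1.9, about 1.9 stdevs above the mean
```

A negative Z means the value lies below the mean; a Z near zero means the value is typical of the data set.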

16 Z Scores are Proportions of Spread


18 Sample Size Affects Confidence
- The larger N is, the better your estimate (STDEV) reflects the real spread
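One way to see this effect is a small simulation: draw many samples of size N from a population with a known spread and check how much the STDEV estimates vary from sample to sample. The sketch below is an illustration under assumed values (a normal population with mean 10 and stdev 2); the function names are hypothetical.

```python
import random
from statistics import stdev

random.seed(0)     # fixed seed so the run is repeatable
TRUE_SIGMA = 2.0   # assumed "real" spread of the population

def stdev_estimates(n, trials=200):
    """Sample stdev from each of `trials` random samples of size n."""
    return [
        stdev([random.gauss(10.0, TRUE_SIGMA) for _ in range(n)])
        for _ in range(trials)
    ]

def scatter(n):
    """How much the stdev estimates vary between samples of size n."""
    return stdev(stdev_estimates(n))

results = {n: scatter(n) for n in (5, 50, 500)}
for n, value in results.items():
    print(n, round(value, 3))
```

The scatter of the estimates shrinks as N grows, so a larger sample gives a STDEV that is more likely to be close to the real spread.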

19 Air Quality
- Daily sampling gives the best estimate of reality
- 1-in-3-day sampling is a compromise
- 1-in-6-day sampling gives a worse estimate
- 1-in-12-day sampling is worse still
- How well does one location estimate all the air in an airshed?
- Compromise in both sampling frequency and number of sites
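The trade-off between sampling frequency and data volume can be sketched by thinning a daily record. The snippet below generates a synthetic year of daily concentrations (random made-up values, not real monitoring data) and compares the mean from daily, 1-in-3, 1-in-6, and 1-in-12-day schedules. The function name `every_nth` is introduced here for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic daily concentrations for 360 days (hypothetical values)
daily = [random.gauss(12.0, 4.0) for _ in range(360)]

def every_nth(data, n):
    """Keep one value every n days, starting with the first day."""
    return data[::n]

for n in (1, 3, 6, 12):
    subset = every_nth(daily, n)
    print(f"1-in-{n} day: N={len(subset):3d}  mean={mean(subset):.2f}")
```

Each sparser schedule leaves fewer values to estimate the same year, so its mean is more likely to drift from the daily mean.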

20 Module 2 Summary
- Data sets estimate reality
- Mean (average)
- Spread (stdev)
- N
- Good sampling plans produce good estimates of reality

