Module 7: Comparing Datasets and Comparing a Dataset with a Standard How different is enough?


1 Module 7: Comparing Datasets and Comparing a Dataset with a Standard How different is enough?

2 Concepts
• Independence of each data point
• Test statistics
• Central Limit Theorem
• Standard error of the mean
• Confidence interval for a mean
• Significance levels
• How to apply in Excel

3 Independent Measurements
• Each measurement must be independent (shake up the basket of tickets)
• Examples of non-independent measurements
  - Public responses to questions (one result affects the next person's answer)
  - Samplers placed too close together, so the air flows are affected

4 Test Statistics
• A test statistic is a number calculated from the data
• In Student's t test, for example, the statistic is t
• If the population is normally distributed and the sample is large, 95% of the curve lies in the inner portion, symmetrically between t = -1.96 on the left and t = 1.96 on the right
• If t is >= 1.96, you are in the right-hand tail of the curve, outside that inner 95%

5 Test statistics correspond to significance levels
• "P" stands for percentile
• The pth percentile is the value below which a fraction p of the data falls, and above which a fraction 1-p falls
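The link between a test statistic and a percentile can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for Excel (an assumption; the slides only use Excel), and the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Percentile reached at z = 1.96 (fraction of the curve to the left of z)
p_below = std_normal.cdf(1.96)

# Fraction of the curve in the inner portion between -1.96 and +1.96
central = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Going the other way: the z value that leaves 2.5% in the right-hand tail
z_975 = std_normal.inv_cdf(0.975)

print(f"percentile at z = 1.96: {p_below:.3f}")
print(f"inner portion:          {central:.3f}")
print(f"z at 97.5th percentile: {z_975:.2f}")
```

A z of 1.96 sits at the 97.5th percentile, which leaves 2.5% in each tail and 95% in the middle.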

6 Two Major Types of Questions
• Comparing a mean against a standard
  - Does air quality here meet the NAAQS?
• Comparing two datasets
  - Is air quality different in 2006 than in 2005?
  - Better?
  - Worse?

7 Comparing Mean to a Standard
• Did air quality meet the CARB annual standard of 12 µg/m³?
Year   Ft Smith avg   Ft Smith min   Ft Smith max   N (Ft Smith)
'05    14.78          0.1            37.9           77

8 Central Limit Theorem (magic!)
• Even if the underlying population is not normally distributed...
• ...if we repeatedly take datasets from it...
• ...the means of these different datasets cluster around the true mean
• The distribution of these means is approximately normal, and gets closer to normal as each dataset gets larger!
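A small simulation can make the theorem concrete. This is only a sketch under stated assumptions: the exponential population, the dataset size of 50, the 2000 repetitions, and the seed are all arbitrary choices for illustration and do not come from the module.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (skewed, true mean = 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Take 2000 separate datasets of 50 points each and keep each dataset's mean
means = [sample_mean(50) for _ in range(2000)]

center = statistics.fmean(means)  # should sit near the true mean of 1
spread = statistics.stdev(means)  # should sit near 1/sqrt(50), about 0.14

# For a normal distribution, about 95% of values fall within 1.96 spreads
within = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)

print(f"mean of means = {center:.3f}")
print(f"spread of means = {spread:.3f}")
print(f"share within 1.96 spreads = {within:.1%}")
```

Even though the individual data points are strongly skewed, the dataset means pile up symmetrically around 1, with roughly 95% of them within 1.96 spreads of the center.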

9 Magic Concept #2: Standard Error of the Mean
• Represents the uncertainty around the mean
• As sample size N gets bigger, the error gets smaller (in proportion to 1/√N)!
• The bigger the N, the more tightly you can estimate the mean
• LIKE a standard deviation, but it describes the spread of the mean of YOUR sample, not the spread of individual values in the population
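To see how quickly the error shrinks, the sketch below computes the standard error for several sample sizes. The standard deviation of 8.66 and N = 77 are the Ft Smith values used later in this module; the other sample sizes are hypothetical, and the function name is illustrative.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(N)."""
    return sd / math.sqrt(n)

sd = 8.66  # standard deviation from the Ft Smith example in this module
for n in (10, 77, 300, 1000):  # 77 is the actual N; the others are hypothetical
    print(f"N = {n:4d}  standard error = {standard_error(sd, n):.2f}")
```

Because of the square root, quadrupling N only halves the standard error.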

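10 For a "large" sample (N > 60), or when very close to a normal distribution...
Confidence interval for the population mean is:
$\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{N}}$
(where $\bar{x}$ is the sample mean, $\sigma$ the standard deviation, and $N$ the sample size)
Choice of z determines 90%, 95%, etc.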

11 For a "Small" Sample
Replace the z value with a t value to get...
$\bar{x} \pm t \cdot \frac{\sigma}{\sqrt{N}}$
...where "t" comes from Student's t distribution, and depends on sample size

12 Student's t Distribution vs. Normal Z Distribution

13 Compare t and Z Values

14 What happens as sample gets larger?

15 What happens to CI as sample gets larger?
For large samples, Z and t values become almost identical, so the CIs are almost identical
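The convergence of t toward Z can be checked with a few numbers. The t values below are two-sided 95% critical values copied from a standard Student's t table, not computed here, and the sample sizes are arbitrary illustrations rather than values from the module.

```python
from statistics import NormalDist

# Two-sided 95% critical values of Student's t from a standard t table
# (key = degrees of freedom, which is N - 1). Table values, not computed here.
T_95 = {5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}

z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

for df, t in T_95.items():
    n = df + 1
    # Ratio of CI half-widths (t-based over z-based); sigma/sqrt(N) cancels out
    ratio = t / z_95
    print(f"N = {n:3d}  t = {t:.3f}  Z = {z_95:.3f}  t-based CI is {ratio:.2f}x as wide")
```

At N = 6 the t-based interval is roughly 30% wider than the Z-based one; by N = 61 the difference is about 2%.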

16 First, graph and review data
• Use the box plot add-in
• Evaluate spread
• Evaluate how far apart the mean and median are
• (assume sampling design and QC are good)

17 Excel Summary Stats

18 1. Use the box-plot add-in 2. Calculate summary stats
Statistic   Value
N           77
Min         0.1
25th        7.5
Median      13.7
75th        18.1
Max         37.9
Mean        14.8
SD          8.7
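The same set of summary statistics can be produced outside Excel. The sketch below uses Python's standard library on a short list of made-up concentrations; these numbers are hypothetical and are NOT the Ft Smith measurements, which the transcript does not include. The `inclusive` quantile method is chosen on the assumption that it matches Excel's QUARTILE interpolation.

```python
import statistics

# Hypothetical concentrations in µg/m³, made up for illustration only
data = [3.2, 7.5, 9.1, 12.4, 13.7, 15.0, 18.1, 22.6, 30.3]

# Quartiles, interpolating between the min and max of the data
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = {
    "N": len(data),
    "Min": min(data),
    "25th": q1,
    "Median": median,
    "75th": q3,
    "Max": max(data),
    "Mean": statistics.fmean(data),
    "SD": statistics.stdev(data),  # sample standard deviation (N - 1 divisor)
}

for name, value in summary.items():
    print(f"{name:7s} {value:.1f}")
```

Comparing the Mean and Median rows is the quick check for skew that the previous slides recommend.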

19 Our Question
• How confident can we be (90%? 95%? more?) that this mean of 14.78 is really greater than the standard of 12?
• We saw that N = 77, and that the mean and median are not too different
• So use z (normal) rather than t

20 The mean is 14.8 +- what?
• We know the equation for the CI is $\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{N}}$
• The width of the confidence interval reflects how sure we want to be that the CI includes the true mean
• Now, decide how confident we want to be

21 CI Calculation
• For 95%, z = 1.96 (often rounded to 2)
• Standard error (sigma/square root of N) = (8.66/square root of 77) = 0.987
• Half-width of the CI around the mean = 2 x 0.987, or about 2
• We can be 95% sure that the true mean is included in (mean +- 2), or 14.8 - 2 = 12.8 at the low end, to 14.8 + 2 = 16.8 at the high end
• This does NOT include 12!
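The calculation above can be reproduced in a few lines. This is a minimal sketch using the Ft Smith numbers from the slides (mean 14.78, sigma 8.66, N = 77); the function name is illustrative, and it uses the unrounded z of 1.96 rather than 2.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Large-sample (z-based) confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)              # z times the standard error
    return mean - half_width, mean + half_width

standard = 12
low, high = mean_confidence_interval(mean=14.78, sd=8.66, n=77)

print(f"95% CI: {low:.2f} to {high:.2f}")
print("CI excludes the standard:", not (low <= standard <= high))
```

With z = 1.96 the half-width is about 1.93, so the interval runs from roughly 12.85 to 16.71 and still does not reach down to 12.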

22 Excel can also calculate a confidence interval around the mean
Mean, plus and minus 1.93, is a 95% confidence interval that does NOT include 12!

23 We know we are more than 95% confident, but how confident can we be that Ft Smith mean > 12?
• Calculate where on the curve our mean of 14.8 is, in terms of a z (normal) score...
• ...or, if N is small, use a t score

24 To find where we are on the curve, calc the test statistic...
• Ft Smith mean = 14.8, sigma = 8.66, N = 77
• Calculate the test statistic, in this case the z factor (we decided we can use the z rather than the t distribution)
• If N were < 60, the test stat would be t, but calculated the same way
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}}$, where $\bar{x}$ is the data's mean and $\mu$ is the standard of 12

25 Calculate z Easily
• Numerator: our mean of 14.8 minus the standard of 12 (treat the standard as the real mean, mu) = 2.8
• Denominator: the standard error, sigma/square root of N = 0.987 (same as for the CI)
• So z = 2.8/0.987 = 2.84
• So where is this z on the curve?
• Remember, at z = 3 we are to the right of more than 99% of the curve
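The same arithmetic as a short sketch, using the Ft Smith values from the slides. The function and parameter names are illustrative.

```python
import math

def z_statistic(sample_mean, standard, sd, n):
    """Distance of the sample mean from the standard, in standard errors."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - standard) / standard_error

z = z_statistic(sample_mean=14.8, standard=12, sd=8.66, n=77)
print(f"z = {z:.2f}")
```

For a small sample the calculation is identical; only the distribution used to interpret the result changes from z to t.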

26 Where on the curve?
Our z of 2.84 lies between Z = 2 and Z = 3, so we can be more than 95% confident that the true mean is above the standard of 12

27 You can calculate exactly where on the curve, using Excel
• Use the NORMSDIST function, with z
• If z = 2.84 (a t score from a large sample gives nearly the same result), in Excel =NORMSDIST(2.84) yields 0.998
• That is a 99.8% probability that the true mean is above the standard of 12
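For readers without Excel, the same lookup can be sketched with Python's standard library. `NormalDist().cdf(z)` returns the area under the standard normal curve to the left of z, which is the quantity NORMSDIST reports; the variable names are illustrative.

```python
from statistics import NormalDist

z = 2.84  # test statistic calculated on the previous slides

# Area under the standard normal curve to the left of z,
# the same quantity Excel's NORMSDIST(z) returns
confidence = NormalDist().cdf(z)

print(f"{confidence:.1%}")
```

This is a one-sided figure: it is the share of the curve below z = 2.84, leaving about 0.2% in the right-hand tail.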

