Module 7: Comparing Datasets and Comparing a Dataset with a Standard How different is enough?

Presentation transcript:

Module 7: Comparing Datasets and Comparing a Dataset with a Standard
How different is enough?

Concepts
• Independence of each data point
• Test statistics
• Central Limit Theorem
• Standard error of the mean
• Confidence interval for a mean
• Significance levels
• How to apply in Excel

Independent Measurements
• Each measurement must be independent (shake up the basket of tickets)
• Examples of non-independent measurements:
  – Public responses to questions (one result affects the next person’s answer)
  – Samplers too close together, so air flows are affected

Test Statistics
• A test statistic is a number calculated from the data
• In Student’s t test, for example, the test statistic is t
• If t >= 1.96 (the large-sample value, where t is close to z) and
  – the population is normally distributed,
  – you are in the right tail of the curve,
  – where 95% of the data lie in the inner portion, symmetrically between right and left (t = 1.96 on the right, t = -1.96 on the left)

Test statistics correspond to significance levels
• “P” stands for percentile
• The pth percentile is the value that a fraction p of the data falls below and a fraction 1 - p falls above
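To connect the 1.96 cutoff with percentiles, here is a minimal Python sketch (the slides themselves use Excel, so Python is a substitution). It assumes a standard normal curve; the helper name `percentile_of` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def percentile_of(z):
    """Fraction of a standard normal population that falls below z."""
    return standard_normal.cdf(z)

upper = percentile_of(1.96)   # fraction below the right-hand cutoff
lower = percentile_of(-1.96)  # fraction below the left-hand cutoff
inner = upper - lower         # fraction in the symmetric inner portion

print(f"below z = 1.96:  {upper:.3f}")
print(f"below z = -1.96: {lower:.3f}")
print(f"inner portion:   {inner:.3f}")
```

The inner portion comes out at roughly 95%, which is why 1.96 is the usual cutoff for a two-sided 95% statement.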

Two Major Types of Questions
• Comparing a mean against a standard
  – Does air quality here meet NAAQS?
• Comparing two datasets
  – Is air quality different in 2006 than in 2005?
  – Better?
  – Worse?

Comparing Mean to a Standard
• Did air quality meet the CARB annual standard of 12 µg/m³?
[Table: year, Ft Smith avg, Ft Smith min, Ft Smith max, N (Fort Smith)]

Central Limit Theorem (magic!)
• Even if the underlying population is not normally distributed,
• if we repeatedly take datasets,
• the means of these different datasets cluster around the true mean
• The distribution of these means is approximately normal!
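The claim above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is a made-up, strongly skewed exponential distribution with a true mean of 10, and the sample size (50) and number of repeated datasets (2000) are arbitrary choices, not values from the module.

```python
import random
import statistics

def sample_means(draw, sample_size, n_samples, seed=0):
    """Return the mean of each of n_samples simulated datasets."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

def draw_exponential(rng):
    """One value from a hypothetical skewed population with true mean 10."""
    return rng.expovariate(1 / 10)

means = sample_means(draw_exponential, sample_size=50, n_samples=2000)
center = statistics.fmean(means)  # should sit close to the true mean of 10
spread = statistics.stdev(means)  # should sit close to 10 / sqrt(50)

print(f"mean of the sample means:   {center:.2f}")
print(f"spread of the sample means: {spread:.2f}")
```

Even though individual values are far from normal, the dataset means pile up around 10, and their spread is close to the population standard deviation divided by the square root of the sample size.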

Magic Concept #2: Standard Error of the Mean
• Represents the uncertainty around the mean
• As sample size N gets bigger, the error gets smaller!
• The bigger the N, the more tightly you can estimate the mean
• LIKE a standard deviation, but it describes the spread of YOUR sample’s mean rather than of individual values
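A short sketch of how the standard error shrinks as N grows, assuming the usual formula (standard deviation divided by the square root of N). The standard deviation of 8.66 is borrowed from the Ft Smith example later in the module; the sample sizes other than 77 are arbitrary.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation over sqrt(N)."""
    return sd / math.sqrt(n)

SD = 8.66  # standard deviation from the module's Ft Smith example

for n in (10, 77, 300):
    print(f"N = {n:3d}: standard error = {standard_error(SD, n):.2f}")
```

Because N sits under a square root, quadrupling the sample size only halves the standard error.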

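For a “large” sample (N > 60), or when very close to a normal distribution…
The confidence interval for the population mean is:
$\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{N}}$
where $\bar{x}$ is the sample mean, $\sigma$ the standard deviation, and $N$ the sample size.
The choice of z determines 90%, 95%, etc.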
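To show how the choice of z maps to a confidence level, here is a minimal sketch using Python's standard library. The function name `z_for_confidence` is illustrative, and a two-sided interval is assumed.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z multiplier for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2  # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

A higher confidence level needs a larger z, and therefore a wider interval around the same mean.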

For a “Small” Sample
Replace the Z value with a t value to get…
$\bar{x} \pm t \cdot \frac{\sigma}{\sqrt{N}}$
…where “t” comes from Student’s t distribution, and depends on sample size

Student’s t Distribution vs. Normal Z Distribution

Compare t and Z Values

What happens as sample gets larger?

What happens to CI as sample gets larger?
For large samples, Z and t values become almost identical, so the CIs are almost identical
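The convergence of t toward Z can be sketched numerically. Python's standard library has no Student's t distribution, so this sketch assumes a handful of two-sided 95% critical values copied from a standard t table; the standard deviation of 8.66 is reused from the module's example purely for illustration.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical values of Student's t from a standard t table,
# keyed by degrees of freedom (N - 1). These are table values, not computed.
T_95 = {5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

def half_width(multiplier, sd, n):
    """Half-width of a confidence interval: multiplier * sd / sqrt(N)."""
    return multiplier * sd / math.sqrt(n)

SD = 8.66  # illustrative standard deviation

for df, t in T_95.items():
    n = df + 1
    t_hw = half_width(t, SD, n)
    z_hw = half_width(Z_95, SD, n)
    print(f"N = {n:3d}: t = {t:.3f}, half-width {t_hw:.2f} (t) vs {z_hw:.2f} (z)")
```

For small N the t-based interval is noticeably wider; by N of about 60 and above, the two multipliers differ only in the second decimal place.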

First, graph and review data
• Use the box plot add-in
• Evaluate the spread
• Evaluate how far apart the mean and median are
• (Assume sampling design and QC are good)

Excel Summary Stats

N       77
Min     0.1
25th    7.5
Median
75th    18.1
Max     37.9
Mean    14.8
SD      8.7

1. Use the box-plot add-in
2. Calculate summary stats

Our Question
• Can we be 95%, 90%, or how confident that this mean of 14.8 is really greater than the standard of 12?
• We saw that N = 77, and the mean and median are not too different
• So use z (normal) rather than t

The mean is what?
• We know the equation for the CI is $\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{N}}$
• The width of the confidence interval reflects how sure we want to be that this CI includes the true mean
• Now, decide how confident we want to be

CI Calculation
• For 95%, z = 1.96 (often rounded to 2)
• Standard error (sigma / square root of N) = (8.66 / square root of 77) ≈ 0.99
• CI half-width around the mean = 2 x 0.99, or about 2
• We can be 95% sure that the true mean is included in (mean +- 2), or about 12.8 at the low end to about 16.8 at the high end
• This does NOT include 12!

Excel can also calculate a confidence interval around the mean
The mean, plus and minus 1.93, is a 95% confidence interval that does NOT include 12!
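The confidence interval for the Ft Smith data can be reproduced outside Excel. This is a sketch in Python rather than the module's own method; it uses the unrounded z of about 1.96 instead of 2, and the function name `z_confidence_interval` is illustrative.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (low, high, half_width) of a z-based CI for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width

STANDARD = 12  # the annual standard the mean is compared against

# Ft Smith values from the slides: mean 14.8, sigma 8.66, N = 77
low, high, half_width = z_confidence_interval(14.8, 8.66, 77)

print(f"half-width: {half_width:.2f}")
print(f"95% CI: {low:.2f} to {high:.2f}")
print("standard inside CI:", low <= STANDARD <= high)
```

With the unrounded z, the half-width matches the 1.93 that Excel reports, and the lower end of the interval stays above 12.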

We know we are more than 95% confident, but how confident can we be that the Ft Smith mean > 12?
• Calculate where on the curve our mean of 14.8 is, in terms of a z (normal) score…
• …or, if N is small, use a t score

To find where we are on the curve, calculate the test statistic…
• Ft Smith mean = 14.8, sigma = 8.66, N = 77
• Calculate the test statistic, in this case the z factor (we decided we can use the z rather than the t distribution)
• If N were < 60, the test statistic would be t, but it is calculated the same way
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}}$, where $\bar{x}$ is the data’s mean and $\mu$ is the standard of 12

Calculate z Easily
• The numerator is our mean of 14.8 minus the standard of 12 (treat the real mean μ (mu) as the standard), which is 2.8
• The standard error is sigma / square root of N ≈ 0.99 (0.987 unrounded; same as for the CI)
• So z = 2.8 / 0.987 = 2.84
• So where is this z on the curve?
• Remember, at z = 3 we are to the right of ~99%
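The same arithmetic as a small Python sketch, using the Ft Smith numbers from the slides. The function name `z_statistic` is illustrative; the calculation is just the numerator divided by the standard error.

```python
import math

def z_statistic(sample_mean, standard, sd, n):
    """How many standard errors the sample mean lies above the standard."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - standard) / standard_error

# Ft Smith values from the slides: mean 14.8, standard 12, sigma 8.66, N = 77
z = z_statistic(14.8, 12, 8.66, 77)
print(f"z = {z:.2f}")
```

Keeping the standard error unrounded is what gives 2.84; dividing by a rounded 0.98 would give a slightly larger value.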

Where on the curve?
[Figure: normal curve with Z = 2 and Z = 3 marked]
So it is between 95% and 99% probable that the true mean is above 12

You can calculate exactly where on the curve, using Excel
• Use the NORMSDIST function with z
• If z (or t) = 2.84, Excel yields a 99.8% probability that the true mean is above 12
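For readers without Excel, the same lookup can be sketched with Python's standard library. This assumes the standard normal curve, as NORMSDIST does, and a one-sided reading (the area to the left of z).

```python
from statistics import NormalDist

z = 2.84  # test statistic from the Ft Smith calculation

# Area under the standard normal curve to the left of z
probability = NormalDist().cdf(z)

print(f"area to the left of z = {z}: {probability:.1%}")
```

The result rounds to the 99.8% quoted on the slide.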