Introductory Statistics for Laboratorians dealing with High Throughput Data sets Centers for Disease Control.

Problem 14: Means of Samples
Draw a sample of size 2 from the population in the hat. Compute the mean of the sample. Write the mean of your sample on the pad. Compute the average of the means on the pad. Compute the standard deviation of the means on the pad.

Problem 14: Means of Samples
Draw a sample of size 3 from the population in the hat. Compute the mean of the sample. Write the mean of your sample on the pad. Compute the average of the means on the pad. Compute the standard deviation of the means on the pad.

Problem 14: Means of Samples
Draw a sample of size 4 from the population in the hat. Compute the mean of the sample. Write the mean of your sample on the pad. Compute the average of the means on the pad. Compute the standard deviation of the means on the pad.

Problem 14
What do we know about the mean of the population in the hat? Why aren’t the means of all the samples the same? How accurate are the estimates of the population mean based on the sample means?

Definitions
Scientific research involves intensive study of small groups (called samples) in order to draw conclusions about much larger groups (called populations). Statistical inference uses techniques for drawing inferences or generalizations from samples to populations. Such inferences are always subject to error.

Definitions
Population: a collection of objects, events, or individuals having a common characteristic that the researcher is interested in studying.
Sample: a small set selected from the population for study.
Population parameters are symbolized with Greek letters (theoretical distribution). Sample statistics are computed by the researcher on her/his samples.

Definitions
The population mean (mu or μ) is the number the researcher is trying to estimate. Each of the samples provides a mean (x̄) that is an estimate of the population mean. The variability of the sample means tells us how much error there is in our research.

Population from Problem 14
Population, N = 6:
  S1 = 2
  S2 = 3
  S3 = 5
  S4 = 6
  S5 = 6
  S6 = 8
  Total = 30
Mu = μ = 5
Sigma = σ = 2
Variance = σ² = 4
The actual mean of the population is 5 and the variance is 4. How close did the means of the samples come to the true population parameter? Was the mean of the means closer? Does sample size matter?
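The population parameters above can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides. Note that `pvariance` and `pstdev` divide by N (population formulas), which is what the slide uses.

```python
from statistics import fmean, pvariance, pstdev

# The six values in the hat. This is the whole population, not a sample.
population = [2, 3, 5, 6, 6, 8]

mu = fmean(population)            # population mean
sigma_sq = pvariance(population)  # population variance (divides by N, not N - 1)
sigma = pstdev(population)        # population standard deviation

print("mean:", mu, "variance:", sigma_sq, "sd:", sigma)
```

The printed values should agree with the slide: a mean of 5, a variance of 4, and a standard deviation of 2.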

Population Distribution
The mean of the population is 5. The variance is 4. The population is not normally distributed. The theoretical distribution of the population is unknown.

Problem 15
List all possible samples of size 2 from the population in Problem 14 (use the pad provided). Compute the mean of each sample. Compute the mean of the sample means. Compute the variance and standard deviation of the means (the variability of the means is an estimate of the amount of error in our inferences).

Problem 15
There are 36 possible samples of size 2 (6 × 6: values are drawn with replacement and order counts). The table (columns: Sample, Mean) shows the first 14. Here is a list of all 36 means:
2, 2.5, 3.5, 4, 4, 5, 2.5, 3, 4, 4.5, 4.5, 5.5, 3.5, 4, 5, 5.5, 5.5, 6.5, 4, 4.5, 5.5, 6, 6, 7, 4, 4.5, 5.5, 6, 6, 7, 5, 5.5, 6.5, 7, 7, 8

Problem 15
This is the distribution of the 36 means of samples of size 2. The mean of the means is 5. The variance of the means is 2.
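Problem 15 can also be worked by brute force. The sketch below assumes, as the count of 36 implies, that samples are ordered and drawn with replacement; `itertools.product` enumerates them, and the variable names are illustrative only.

```python
from itertools import product
from statistics import fmean, pvariance

population = [2, 3, 5, 6, 6, 8]

# Every ordered sample of size 2, drawn with replacement: 6 * 6 = 36 samples.
samples = list(product(population, repeat=2))
sample_means = [fmean(s) for s in samples]

# Summaries of the sampling distribution of the mean for N = 2.
mean_of_means = fmean(sample_means)
variance_of_means = pvariance(sample_means)

print(len(samples), mean_of_means, variance_of_means)
```

The mean of the 36 means equals the population mean (5), and their variance equals the population variance divided by the sample size (4 / 2 = 2).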

Problem 16
List all possible samples of size 3 from the population in Problem 14. Compute the mean of each sample. Compute the mean of the sample means. Compute the variance and standard deviation of the means (the variability of the means is an estimate of the amount of error in our inferences).

Problem 16
The table (columns: Sample, Mean) shows the first 10 samples. There are a total of 216 possible samples (I really wrote them all out and computed the means of each).

Problem 16
This is the distribution of the means of the 216 samples of size 3.

Problem 17
List all possible samples of size 4 from the population in Problem 14. Compute the mean of each sample. Compute the mean of the sample means. Compute the variance and standard deviation of the means (the variability of the means is an estimate of the amount of error in our inferences).

Problem 17
The table (columns: Sample, Mean) shows the first 10 samples. There are a total of 1296 possible different samples.

Problem 17
This is the distribution of the means of the 1296 samples of size 4.

Definition
Sampling Distribution of the Mean: the distribution of the means of all possible samples of size N.

Summary
Figure panels: Population; Sampling Distribution of the Mean for Samples of N = 2; Sampling Distribution of the Mean for Samples of N = 3; Sampling Distribution of the Mean for Samples of N = 4.
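The three sampling distributions in Problems 15 to 17 can be generated rather than written out by hand. The following is a sketch under the same assumption as before (ordered samples drawn with replacement); the helper `sampling_distribution` is a hypothetical name, and `Fraction` is used only so the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import product
from statistics import pvariance

# Exact rational arithmetic avoids floating-point round-off in the check.
population = [Fraction(v) for v in (2, 3, 5, 6, 6, 8)]


def sampling_distribution(n):
    """Means of every ordered sample of size n drawn with replacement."""
    return [sum(s) / n for s in product(population, repeat=n)]


results = {}
for n in (2, 3, 4):
    means = sampling_distribution(n)
    # (number of samples, mean of the means, variance of the means)
    results[n] = (len(means), sum(means) / len(means), pvariance(means))
    print(n, *results[n])
```

For each N the number of samples is 6 to the power N (36, 216, 1296), the mean of the means stays at 5, and the variance of the means is 4 / N.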

Central Limit Theorem
Given any population (with any distribution, normal or otherwise) with mean μ and variance σ², as the sample size N increases the sampling distribution of the mean
1. Approaches a normal distribution with
2. Mean μ and
3. Variance σ²/N

Effect of Sample Size on the Sampling Distribution
As the sample size gets bigger, the standard deviation of the sampling distribution gets smaller.
Definition: Standard Error: the standard deviation of the sampling distribution.
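To see how quickly the standard error shrinks, here is a small sketch that evaluates σ/√N for the sample sizes used in this presentation. The function name `standard_error` is illustrative, and σ = 2 is the population standard deviation from Problem 14.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)


sigma = 2.0  # population standard deviation from Problem 14
for n in (2, 3, 4, 10, 1000, 10_000):
    print(f"N = {n:>6}: standard error = {standard_error(sigma, n):.4f}")
```

Because N sits under a square root, cutting the standard error in half takes four times as many observations.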

Problem 18
We are studying the attitude of people in the USA toward the President’s foreign policy. We use the following survey question: Use the following scale to indicate your level of agreement or disagreement with the President’s foreign policy.

Problem 18
In a real survey we would not know the population mean or variance; we’d have to estimate them from data. For purposes of this example, pretend we know that the mean agreement with the President’s foreign policy for the whole USA is 5 (slight agreement) with a variance of 4 (standard deviation of 2 points). What ratings are 95% of the population between?

Problem 18
If the population were normally distributed, which it probably isn’t, 95% of it would be between about 1.1 and 8.9 (5 ± 1.96 × 2). The population is all over the place, all the way from approximately 1 to 9.
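The range that would hold 95% of a normally distributed population can be computed directly. This sketch assumes normality (which the slide itself doubts) and takes the two-sided 95% critical value from `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0  # population mean and standard deviation from Problem 18

# Two-sided 95% critical value of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

lower = mu - z * sigma
upper = mu + z * sigma
print(f"95% of the population lies between {lower:.2f} and {upper:.2f}")
```

The result is roughly 1 to 9, which matches the spread described in the slide.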

Problem 18
We wish to use a sample of people to estimate the mean of the population (pretend we don’t know that the mean is 5 and the variance is 4). We draw a sample of N = 10 people. Here are their ratings: 5, 5, 6, 3, 4, 5, 5, 6, 4, 5. Since most of the people are in the middle of the population distribution, most of our sample is in the middle also. The mean of the sample is 4.8. The standard deviation of the sample is 0.87.

Problem 18
Sample size N = 10. The mean of the sample is 4.8, the SD = 0.87. The central limit theorem says the mean of samples of size 10 is approximately normally distributed with a mean equal to the population mean and a variance equal to the population variance / sample size. Population variance / sample size is 4/10 = 0.4. Standard error = square root of 0.4 = 0.632. Compute the 95% confidence interval on the mean.

Problem 18
The central limit theorem tells us that when we draw samples of size 10 from this distribution, 95% of the time the true mean will fall inside the interval computed from the sample. For this sample the interval is about 3.56 to 6.04 (4.8 ± 1.96 × 0.632). This is called the 95% confidence interval. It gives the accuracy of our estimate of the population mean. We estimate the population mean is 4.8 (and we are 95% sure it is between about 3.6 and 6.0). The accuracy is plus or minus about 1.2 points.
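The interval can be recomputed from the ten ratings. This is a sketch that treats the population variance (4) as known, as the problem does, so it uses the normal critical value rather than Student's t. The variable names are illustrative. Note that the square root of 0.4 evaluates to about 0.632.

```python
import math
from statistics import NormalDist, fmean

ratings = [5, 5, 6, 3, 4, 5, 5, 6, 4, 5]
sigma_sq = 4.0  # population variance, treated as known in Problem 18

sample_mean = fmean(ratings)
standard_error = math.sqrt(sigma_sq / len(ratings))

# Two-sided 95% critical value of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)
margin = z * standard_error
ci = (sample_mean - margin, sample_mean + margin)

print(f"mean = {sample_mean:.2f}, SE = {standard_error:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), margin = {margin:.2f}")
```

In a real survey the population variance would be unknown and estimated from the sample; with only ten observations a t-based interval would then be the more usual choice.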

Problem 18
Could we improve the accuracy by selecting a sample of 1000 people? N = 1000. We survey 1000 people. The mean of the sample is 4.95. The standard error is the square root of the population variance divided by the sample size = sqrt(4/1000) = 0.063.

Problem 18
N = 1000. The 95% confidence interval is from 4.83 to 5.07. The accuracy is plus or minus 0.12 points.
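Comparing the two sample sizes side by side shows what the extra 990 respondents buy. The sketch below uses a hypothetical helper `margin_of_error` with the conventional 1.96 multiplier and the known population standard deviation of 2.

```python
import math


def margin_of_error(sigma, n, z=1.96):
    """Half-width of the 95% confidence interval when sigma is known."""
    return z * sigma / math.sqrt(n)


sigma = 2.0  # population standard deviation from Problem 18
m10 = margin_of_error(sigma, 10)
m1000 = margin_of_error(sigma, 1000)
ratio = m10 / m1000

print(f"N = 10:   +/- {m10:.3f}")
print(f"N = 1000: +/- {m1000:.3f}")
print(f"improvement factor: {ratio:.1f}")
```

A sample 100 times larger makes the interval only 10 times narrower, because the margin scales with one over the square root of N.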

Problem 18
The green graph is the sampling distribution for samples of size 10. The red graph is the sampling distribution for samples of size 1000. 95% of the red distribution is much closer to the true mean.

Application
The central limit theorem is what makes political polling possible on election night. It is easy for a major national polling firm to sample 10,000 people. With a sample that large, the standard error is the population standard deviation divided by 100 (the square root of 10,000).
