A Sampling Distribution

Slides:



Advertisements
Similar presentations
Statistics Review.
Advertisements

A Sampling Distribution
Estimation in Sampling
Chapter 8: Estimating with Confidence
Chapter 10: Estimating with Confidence
Objectives Look at Central Limit Theorem Sampling distribution of the mean.
The standard error of the sample mean and confidence intervals
The standard error of the sample mean and confidence intervals
1 The Basics of Regression Regression is a statistical technique that can ultimately be used for forecasting.
1 Hypothesis Testing In this section I want to review a few things and then introduce hypothesis testing.
The standard error of the sample mean and confidence intervals How far is the average sample mean from the population mean? In what interval around mu.
Chapter Sampling Distributions and Hypothesis Testing.
1 The Sample Mean rule Recall we learned a variable could have a normal distribution? This was useful because then we could say approximately.
Chapter 10: Estimating with Confidence
Chapter 7 Probability and Samples: The Distribution of Sample Means
Inferential Statistics
The standard error of the sample mean and confidence intervals How far is the average sample mean from the population mean? In what interval around mu.
Leon-Guerrero and Frankfort-Nachmias,
1 Psych 5500/6500 Statistics and Parameters Fall, 2008.
INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s.
Comparing Two Groups’ Means or Proportions
Significance Tests …and their significance. Significance Tests Remember how a sampling distribution of means is created? Take a sample of size 500 from.
Sociology 5811: Lecture 7: Samples, Populations, The Sampling Distribution Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.
Estimation Statistics with Confidence. Estimation Before we collect our sample, we know:  -3z -2z -1z 0z 1z 2z 3z Repeated sampling sample means would.
Chapter 11: Estimation Estimation Defined Confidence Levels
STA Lecture 161 STA 291 Lecture 16 Normal distributions: ( mean and SD ) use table or web page. The sampling distribution of and are both (approximately)
AP Statistics Chapter 9 Notes.
Estimation of Statistical Parameters
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 8: Estimating with Confidence Section 8.1 Confidence Intervals: The.
1 Statistical Inference Greg C Elvers. 2 Why Use Statistical Inference Whenever we collect data, we want our results to be true for the entire population.
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
Lecture 14 Dustin Lueker. 2  Inferential statistical methods provide predictions about characteristics of a population, based on information in a sample.
Introduction to Inferential Statistics. Introduction  Researchers most often have a population that is too large to test, so have to draw a sample from.
Rule of sample proportions IF:1.There is a population proportion of interest 2.We have a random sample from the population 3.The sample is large enough.
Sampling Distribution WELCOME to INFERENTIAL STATISTICS.
Inferential Statistics Part 1 Chapter 8 P
+ “Statisticians use a confidence interval to describe the amount of uncertainty associated with a sample estimate of a population parameter.”confidence.
Chapter 10: Introduction to Statistical Inference.
+ DO NOW. + Chapter 8 Estimating with Confidence 8.1Confidence Intervals: The Basics 8.2Estimating a Population Proportion 8.3Estimating a Population.
Chapter 10: Confidence Intervals
Review Normal Distributions –Draw a picture. –Convert to standard normal (if necessary) –Use the binomial tables to look up the value. –In the case of.
Introduction to Inference Sampling Distributions.
Ch 8 Estimating with Confidence 8.1: Confidence Intervals.
INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE.
1 Probability and Statistics Confidence Intervals.
SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS. SAMPLING AND SAMPLING VARIATION Sample Knowledge of students No. of red blood cells in a person Length of.
SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS. SAMPLING AND SAMPLING VARIATION Sample Knowledge of students No. of red blood cells in a person Length of.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 8: Estimating with Confidence Section 8.1 Confidence Intervals: The.
Lecture 13 Dustin Lueker. 2  Inferential statistical methods provide predictions about characteristics of a population, based on information in a sample.
The Normal Distribution. Normal and Skewed Distributions.
CHAPTER 6: SAMPLING, SAMPLING DISTRIBUTIONS, AND ESTIMATION Leon-Guerrero and Frankfort-Nachmias, Essentials of Statistics for a Diverse Society.
And distribution of sample means
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Calculating Probabilities for Any Normal Variable
Chapter 8: Estimating with Confidence
Confidence Intervals with Proportions
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Advanced Algebra Unit 1 Vocabulary
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
How Confident Are You?.
Presentation transcript:

A Sampling Distribution The way our means would be distributed if we collected a sample, recorded the mean and threw it back, and collected another, recorded the mean and threw it back, and did this again and again, ad nauseam!

A Sampling Distribution From Vogt: A theoretical frequency distribution of the scores for or values of a statistic, such as a mean. Any statistic that can be computed for a sample has a sampling distribution. A sampling distribution is the distribution of statistics that would be produced in repeated random sampling (with replacement) from the same population. It is all possible values of a statistic and their probabilities of occurring for a sample of a particular size. Sampling distributions are used to calculate the probability that sample statistics could have occurred by chance and thus to decide whether something that is true of a sample statistic is also likely to be true of a population parameter.

A Sampling Distribution We are moving from descriptive statistics to inferential statistics. Inferential statistics allow the researcher to come to conclusions about a population on the basis of descriptive statistics about a sample.

A Sampling Distribution For example: Your sample says that a candidate gets support from 47%. Inferential statistics allow you to say that the candidate gets support from 47% of the population with a margin of error of +/- 4%. This means that the support in the population is likely somewhere between 43% and 51%.

A Sampling Distribution Margin of error is taken directly from a sampling distribution. It looks like this: 95% of Possible Sample Means 47% 43% 51% Your Sample Mean

A Sampling Distribution Let’s create a sampling distribution of means… Take a sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Take another sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Take another sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Take another sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Take another sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Take another sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Let’s repeat sampling of sizes 1,500 from the US. Record the mean incomes. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Let’s repeat sampling of sizes 1,500 from the US. Record the mean incomes. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Let’s repeat sampling of sizes 1,500 from the US. Record the mean incomes. Our census said the mean is $30K. $30K

A Sampling Distribution Let’s create a sampling distribution of means… Let’s repeat sampling of sizes 1,500 from the US. Record the mean incomes. Our census said the mean is $30K. The sample means would stack up in a normal curve. A normal sampling distribution. $30K

A Sampling Distribution Say that the standard deviation of this distribution is $10K. Think back to the empirical rule. What are the odds you would get a sample mean that is more than $20K off. The sample means would stack up in a normal curve. A normal sampling distribution. $30K -3z -2z -1z 0z 1z 2z 3z

A Sampling Distribution Say that the standard deviation of this distribution is $10K. Think back to the empirical rule. What are the odds you would get a sample mean that is more than $20K off. The sample means would stack up in a normal curve. A normal sampling distribution. 2.5% 2.5% $30K -3z -2z -1z 0z 1z 2z 3z

A Sampling Distribution Social Scientists usually get only one chance to sample. Our graphic display indicates that chances are good that the mean of our one sample will not precisely represent the population’s mean. This is called sampling error. If we can determine the variability (standard deviation) of the sampling distribution, we can make estimates of how far off our sample’s mean will be from the population’s mean.

A Sampling Distribution Knowing the likely variability of the sample means when repeatedly sampling gives us a context within which to judge how much we can trust the number we got from our sample. For example, if the variability is low, , we can trust our number more than if the variability is high, .

A Sampling Distribution Which sampling distribution has the lower variability or standard deviation? a b Now a definition! The standard deviation of a normal sampling distribution is called the standard error. The first sampling distribution above, a, has a lower standard error.

A Sampling Distribution Statisticians have found that the standard error of a sampling distribution is quite directly affected by the number of cases in the sample(s), and the variability of the population distribution. Population Variability: For example, Americans’ incomes are quite widely distributed, from $0 to Bill Gates’. Americans’ car values are less widely distributed, from about $50 to about $50K. The standard error of the latter’s sampling distribution will be a lot less variable.

A Sampling Distribution The sample size affects the sampling distribution: Standard error = population standard deviation / square root of sample size Y-bar= /n

A Sampling Distribution Standard error = population standard deviation / square root of sample size Y-bar= /n IF the population income were distributed with mean,  = $30K with standard deviation,  = $10K …the sampling distribution changes for varying sample sizes n = 2,500, Y-bar= $10K/50 = $200 n = 25, Y-bar= $10K/5 = $2,000 $30k

A Sampling Distribution Some rules about the sampling distribution of the mean… For a random sample of size n from a population having mean  and standard deviation , the sampling distribution of Y-bar (glitter-bar?) has mean  and standard error Y-bar = /n The Central Limit Theorem says that for random sampling, as the sample size n grows, the sampling distribution of Y-bar approaches a normal distribution. The sampling distribution will be normal no matter what the population distribution’s shape as long as n > 30. If n < 30, the sampling distribution is likely normal only if the underlying population’s distribution is normal. As n increases, the standard error (remember that this word means standard deviation of the sampling distribution) gets smaller. Precision provided by any given sample increases as sample size n increases.

A Sampling Distribution So we know in advance of ever collecting a sample, that if sample size is sufficiently large: Repeated samples would pile up in a normal distribution The sample means will center on the true population mean The standard error will be a function of the population variability and sample size The larger the sample size, the more precise, or efficient, a particular sample is 95% of all sample means will fall between +/- 2 s.e. from the population mean

A Sampling Distribution So why are sampling distributions less variable when sample size is larger? Example 1: Think about what kind of variability you would get if you collected income through repeated samples of size 1 each. Contrast that with the variability you would get if you collected income through repeated samples of size N – 1 (or 300 million minus one) each.

A Sampling Distribution So why are sampling distributions less variable when sample size is larger? Example 1: Think about what kind of variability you would get if you collected income through repeated samples of size 1 each. Contrast that with the variability you would get if you collected income through repeated samples of size N – 1 (or 300 million minus one) each. Example 2: Think about drawing the population distribution and playing “darts” where the mean is the bull’s-eye. Record each one of your attempts. Contrast that with playing “darts” but doing it in rounds of 30 and recording the average of each round. What kind of variability will you see in the first versus the second way of recording your scores. …Now, do you trust larger samples to be more accurate?

A Sampling Distribution An Example: A population’s car values are  = $12K with  = $4K. Which sampling distribution is for sample size 625 and which is for 2500? What are their s.e.’s? 95% of M’s 95% of M’s ? $12K ? ? $12K ? -3 -2 -1 0 1 2 3 -3-2-1 0 1 2 3

A Sampling Distribution An Example: A population’s car values are  = $12K with  = $4K. Which sampling distribution is for sample size 625 and which is for 2500? What are their s.e.’s? s.e. = $4K/25 = $160 s.e. = $4K/50 = $80 (625 = 25) (2500 = 50) 95% of M’s 95% of M’s $11,840 $12K $12,320 $11,920$12K $12,160 -3 -2 -1 0 1 2 3 -3-2-1 0 1 2 3

A Sampling Distribution A population’s car values are  = $12K with  = $4K. Which sampling distribution is for sample size 625 and which is for 2500? Which sample will be more precise? If you get a particularly bad sample, which sample size will help you be sure that you are closer to the true mean? 95% of M’s 95% of M’s $11,840 $12K $12,320 $11,920$12K $12,160 -3 -2 -1 0 1 2 3 -3-2-1 0 1 2 3

Estimation Before we collect our sample, we know: Repeated sampling sample means would stack up in a normal curve, Centered on the true population mean, With a standard error (measure of dispersion) that depends on 1. population standard deviation 2. sample size  -3z -2z -1z 0z 1z 2z 3z

Estimation But we do not know: 1. True Population Mean 2. Population Standard Deviation Repeated sampling sample means would stack up in a normal curve, Centered on the true population mean, With a standard error (measure of dispersion) that depends on 1. population standard deviation 2. sample size  -3z -2z -1z 0z 1z 2z 3z

Estimation Will our sample be one of these (accurate)? Or one of these (inaccurate)?  -3z -2z -1z 0z 1z 2z 3z

Estimation Which is more likely? accurate? or inaccurate?  -3z -2z -1z 0z 1z 2z 3z 68% 95%

Estimation We’re most likely to get close to the true population mean… Our sample’s mean is the best guess of the population mean, but it is not precise.  -3z -2z -1z 0z 1z 2z 3z 68% 95%

Estimation And if we increase our sample size (n)…  -3z -2z -1z 0z 1z 2z 3z 68% 95%

Estimation And if we increase our sample size our sample mean is an even better estimate of the population mean, we are more precise!  -3 -2 -1 0 1 2 3 -3z -2z -1z 0z 1z 2z 3z 68% 95%

Estimation We know that the standard deviation of this pile of samples (standard error) equals the population standard deviation () divided by the square root of the sample size (n).  -3 -2 -1 0 1 2 3 68% 95%

Estimation But we do not know the population standard deviation! What is our best guess of that?  -3 -2 -1 0 1 2 3 68% 95%

Estimation Our best guess of the population standard deviation is our sample’s s.d.! On average, this s.d. gives population . In fact, when we calculate that, we use “n – 1” to make our “estimate” larger to reflect that dispersion of a sample is smaller than a population’s.  (Yi – Y)2 s.d. = n - 1 = Cases in the sample 0 5 10 15 20 25 30 35 0 5 10 15 20 25 30 35 Population Dispersion Sample Dispersion

Estimation So now we know that we can use the sample standard deviation to stand in for the population’s standard deviation. So we can use run the formula for standard error with that s.d. estimate and get a good estimate s.e = of the dispersion of the  n sampling distribution.  -3 -2 -1 0 1 2 3 68% 95%

Estimation Now we know how far off our sample mean is likely to be from the true population mean! 68% of means will be within +/- 1 s.e. 95% of means will be within +/- 2 s.e. s.d. s.e. =  n  -3 -2 -1 0 1 2 3 68% 95%

Estimation For example, if we took GPAs from a sample of 625 students and our s.d. was .50… 68% of means will be within +/- 1(.02) 95% of means will be within +/- 2 (.02) .5 s.e. =  625 = 0.02  -3 -2 -1 0 1 2 3 68% 95%

Estimation For example, if we took GPAs from a sample of 625 students and our s.d. was .50… If our sample were this one( ), our estimate of the mean would be right! .5 s.e. =  625 = 0.02  -3 -2 -1 0 1 2 3 68% 95%

Estimation For example, if we took GPAs from a sample of 625 students and our s.d. was .50… But what if it were this one( )? We’d be slightly wrong, but well within +/- 2 (.02) 95% of samples would be! .5 s.e. =  625 = 0.02  -3 -2 -1 0 1 2 3 68% 95%

Estimation A sample’s mean is the best estimate of the population mean. But what if we base our estimate on this erroneous sample? But what if it were this one( )? We’d be slightly wrong, but well within +/- 2 s.d. 95% of samples would be! s.d. s.e. =  n  -3 -2 -1 0 1 2 3 68% 95%

Estimation Well, let’s center our sampling distribution on our sample’s mean( ). Check it Out! The true mean falls within the 95% bracket. s.d. s.e. =  n  -3 -2 -1 0 1 2 3 68% 95%

Estimation What if the sample we collected were this one ( )? Check it Out! The true mean falls within the 95% bracket. s.d. s.e. =  n  -3 -2 -1 0 1 2 3 68% 95%

Estimation The sampling distribution allows us to: 1. Be humble and admit that our sample statistic may not be the population’s and 2. Forms a measuring device with which we can determine a range where the true population mean is likely to fall... this is called a confidence interval.

Estimation If you calculate your sampling distribution’s standard error, you can form a device that tells you that if your sample mean is wrong, there is a documented a range in which the true population mean is likely 2Xist. Check it Out! The true mean falls within the 95% bracket. s.d. s.e. =  n  -3 -2 -1 0 1 2 3 68% 95%

Estimation For example, if we took GPAs from a sample of 625 students and our mean was 2.5 and s.d. was .50… We make a confidence interval (C.I.)by… Calculating the s.e. (.02) and Going +/- 2 s.e. from the mean. .5 s.e. =  625 = 0.02 =2.52 -3 -2 -1 0 1 2 3 68% 95% 95% C.I. = 2.5 +/- 2(.02) = 2.46 to 2.54 We are 95% confident that the true mean is in this range!

Estimation Guys… This is power! Knowing that the spread of 95% of normally distributed sample means has outer limits… We know that if we put these limits around our sample mean… We have defined the range where the true mean has a 95% probability of being!

Estimation Our sample statistics provide enough information to give us a great estimation (highly educated guess) about population statistics. We do this without needing to know the population mean—without needing to have a census.

Estimation Another Example: Sample of 2,500 with an average income of $28,000 with a standard deviation of $8,000. Provide a 95% C.I. = M +/- 2 (s.e.) s.e. = $8,000/2,500 = $160 2 * $160 = $320 C.I. = $28,000 +/- $320 C.I. >>> $27,680 to $28,320

Estimation Another Example: Sample of 2,500 with an average income of $28,000 with a standard deviation of $8,000. Provide a 95% C.I. = M +/- 2 (s.e.) s.e. = $8,000/2,500 = $160 2 * $160 = $320 C.I. = $28,000 +/- $320 C.I. >>> $27,680 to $28,320 We are 95% confident that the true mean falls from $27,680 up to $28,320

Estimation Another Example: Sample of 2,500 with an average income of $28,000 with a standard deviation of $8,000. What if we want a 99% confidence interval?

Estimation Oooh! Technically speaking, on a normal curve, 95% of cases fall between +/- 1.96 standard deviations rather than 2. 99% fall between +/- 2.58 z’s

Empirical Rule vs. Actuality Estimation Empirical Rule vs. Actuality 68% 1z 0.99z 95% 2z 1.96z 99.9973% 3z (all) 3z

Estimation Another Example: Sample of 2,500 with an average income of $28,000 with a standard deviation of $8,000. What if we want a 99% confidence interval? 99% C.I. = $28,000 +/- 2.58 * $320 $27,174.4 to $28,825.6 We are 99% confident the population mean falls between these values. Why did the interval get wider???