Sampling Distributions

RELEVANCE To see how sampling can be used to predict population values.

Need for samples and their means
The census is only done once every 10 years because it is impractical to do it often. Therefore, the sample becomes very important.

The Sampling Issue…… The goal of the survey is to get the same results that would be obtained if everyone in the entire population had answered. It is important that every member of the population has an equal chance of being chosen. We want to estimate how the people who were not surveyed would respond to the survey. So we use the sample to predict the population. In higher-level statistics, we never assume we know the population mean or population variance, because statisticians know that this will never be the case.

Learning Objectives
Determine the sampling distributions of means and proportions.
Explain the Central Limit Theorem.
Determine the effect on the sampling distribution when the samples are relatively large compared to the population from which they are drawn.
© 2002 The Wadsworth Group

Sampling Distribution of the Mean
When the population is normally distributed:
Shape: Regardless of sample size, the distribution of sample means will be normally distributed.
Center: The mean of the distribution of sample means is the mean of the population. Sample size does not affect the center of the distribution.
Spread: The standard deviation of the distribution of sample means, or the standard error, is σ_x̄ = σ/√n.

Sample Mean vs. Population Mean……
Not every sample mean will be the same as the population mean, but if you take good samples the means will be very close.

What is a Sampling Distribution?
Sampling Distribution – the distribution of values of a sample statistic (such as the sample mean) obtained from repeated samples, all of the same size and all drawn from the same population.

Example…… Consider the following set: {0,2,4,6,8}.
a. Make a list of all possible samples of size 2 that can be drawn from this set.
b. Construct a sampling distribution of the sample means for samples of size 2.
c. Graph the histogram of the population and of the sampling distribution. What do you notice?

a. {0,2,4,6,8} Sets of 2 (drawn with replacement, so repeats are allowed and order matters):
(0,0) (2,0) (4,0) (6,0) (8,0)
(0,2) (2,2) (4,2) (6,2) (8,2)
(0,4) (2,4) (4,4) (6,4) (8,4)
(0,6) (2,6) (4,6) (6,6) (8,6)
(0,8) (2,8) (4,8) (6,8) (8,8)

b. First, find the mean of each sample……
(0,0) 0   (2,0) 1   (4,0) 2   (6,0) 3   (8,0) 4
(0,2) 1   (2,2) 2   (4,2) 3   (6,2) 4   (8,2) 5
(0,4) 2   (2,4) 3   (4,4) 4   (6,4) 5   (8,4) 6
(0,6) 3   (2,6) 4   (4,6) 5   (6,6) 6   (8,6) 7
(0,8) 4   (2,8) 5   (4,8) 6   (6,8) 7   (8,8) 8

Sample Space Notice that each of these 25 samples is equally likely to occur. Therefore, the probability of each sample is 1/25 = 0.04. A sample mean that comes from several samples has a correspondingly larger probability.

The sampling distribution of the sample means (SDSM)
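x̄     P(x̄)
0     1/25
1     2/25
2     3/25
3     4/25
4     5/25
5     4/25
6     3/25
7     2/25
8     1/25
Notice the shape: it is symmetric and mound-shaped, already much closer to a normal curve than the flat population is.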
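The table can be checked by brute force. The following is a minimal Python sketch, not part of the original slides, that lists every ordered sample of size 2 (with replacement) from {0,2,4,6,8} and tallies the sample means. The variable names are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 6, 8]

# Every ordered sample of size 2, drawn with replacement: 5 * 5 = 25 samples.
samples = list(product(population, repeat=2))

# Tally how many of the 25 samples produce each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Each sample has probability 1/25, so P(mean) = (number of samples) / 25.
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

for mean, count in sorted(counts.items()):
    print(f"mean = {mean}: P = {count}/{len(samples)}")
```

The probabilities rise from 1/25 at the ends to 5/25 in the middle, which is the mound shape seen in the histogram.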

Example – You Try…… Let’s say I picked out all the grades for the last quiz that were either 57, 67, 77, 87, or 97 and put them in a pile. Find every possible combination of quiz grades I could get if I picked 2 quizzes from this pile. NOTE: There will be 25 possible combinations. So out of my population of every person’s quiz grade, I took a sample of just those who got a 57, 67, 77, 87, or 97.

Now let’s find the mean for each pair
(57, 57) 57   (67, 57) 62   (77, 57) 67   (87, 57) 72   (97, 57) 77
(57, 67) 62   (67, 67) 67   (77, 67) 72   (87, 67) 77   (97, 67) 82
(57, 77) 67   (67, 77) 72   (77, 77) 77   (87, 77) 82   (97, 77) 87
(57, 87) 72   (67, 87) 77   (77, 87) 82   (87, 87) 87   (97, 87) 92
(57, 97) 77   (67, 97) 82   (77, 97) 87   (87, 97) 92   (97, 97) 97

There are 25 possible combinations
Each of these samples is equally likely, so what is the probability of each sample?

Each sample has a 1/25 chance of selection.
Let’s make a chart of the sample means.

Chart and Graph
x̄      P(x̄)
57     1/25 = 0.04
62     2/25 = 0.08
67     3/25 = 0.12
72     4/25 = 0.16
77     5/25 = 0.20
82     4/25 = 0.16
87     3/25 = 0.12
92     2/25 = 0.08
97     1/25 = 0.04
Hmmmmm…….. That graph looks familiar….. What does it look like?

Sampling Distribution of Sample Means
If all possible random samples, each of size n, are taken from any population with mean μ and st. deviation σ, then the SDSM will:
Have a sampling distribution mean equal to the population mean: μ_x̄ = μ.
Have a sampling distribution standard deviation (called the Standard Error) equal to the population st. dev. divided by the square root of the sample size: σ_x̄ = σ/√n.

Important
Error does not mean there’s a mistake. It means there is a gap between the population and sample results.
Variability in a population of individuals is measured in standard deviations.
But sample means vary because you’re not sampling the whole population, only a subset, and as samples vary, so will their means.
Variability in the sample mean is measured in terms of standard errors.
The first component of standard error is the sample size, n. Standard error decreases as sample size increases. It makes sense that having more data gives less variation.
The second component of standard error involves the amount of diversity in the population. If the population standard deviation increases, the standard error of the sample means also increases.
BOTTOM LINE: Estimating the population average is harder when the population varies a lot to begin with (easier if it is more consistent). Sooooo, the standard error of the sample mean is also larger when the population standard deviation is larger.

Standard error and sample size

RECAP: The Standard Error of the Mean……
The symbol used to represent the standard deviation of the sample means, also known as the standard error of the mean, is σ_x̄, where σ_x̄ = σ/√n. This measures the spread. (Note: “n” is the size of each sample.)

The SDSM follows these rules…….
a. A normal parent population produces a normal sampling distribution.
b. When the parent population is NOT normal, the CLT says the sampling distribution will be approximately normal once the sample size is large enough. (We’ll see this later.)

Let’s show how this works using an example…..
Consider all possible samples of size 2 from {2,4,6}. Find the probability distribution of the population and draw its histogram, then find the sampling distribution of the sample means and draw its histogram.

Probability Distribution of Parent & Histogram……
x     P(x)
2     1/3
4     1/3
6     1/3

Now, let’s do a sampling distribution of sets of 2 from this population we just described.

The sets of 2 and their means……
(2,2) 2   (4,2) 3   (6,2) 4
(2,4) 3   (4,4) 4   (6,4) 5
(2,6) 4   (4,6) 5   (6,6) 6

Sampling Distribution……
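x̄     P(x̄)
2     1/9
3     2/9
4     3/9
5     2/9
6     1/9
Find the mean of the sampling distribution: μ_x̄ = 2(1/9) + 3(2/9) + 4(3/9) + 5(2/9) + 6(1/9) = 36/9 = 4, the same as the population mean.
Find the st. dev. of the sampling dist. (i.e., the standard error): σ_x̄ = √(12/9) ≈ 1.15, which matches σ/√n = 1.63/√2 ≈ 1.15.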
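The two SDSM rules can be checked numerically for this small population. This is a rough Python sketch, not from the original slides, using only the standard library; `pstdev` is the population standard deviation, which is the right choice here because we list every possible sample rather than a random subset.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [2, 4, 6]
n = 2  # size of each sample

# Mean of every ordered sample of size n, drawn with replacement (9 samples).
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mu = fmean(population)      # population mean
sigma = pstdev(population)  # population standard deviation

mean_of_means = fmean(sample_means)  # center of the SDSM
sd_of_means = pstdev(sample_means)   # spread of the SDSM (standard error)

print(f"population mean = {mu}, mean of sample means = {mean_of_means}")
print(f"sigma / sqrt(n) = {sigma / sqrt(n):.4f}, "
      f"st. dev. of sample means = {sd_of_means:.4f}")
```

Both printed pairs agree: the SDSM is centered at the population mean, and its standard deviation equals σ/√n.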

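The Histogram…… Now, take a look at the shape of the histogram of the sampling distribution. It is symmetric and mound-shaped, so even with samples of size 2 it is already closer to a normal curve than the flat parent population.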

Properties of SDSM(Standard Error) – Center, Shape, Spread

Sample Question A certain population has a mean of 437 and a standard deviation of 63. Many samples of size 49 are randomly selected and the means are calculated.
A. What value would you expect to find for the mean of all these samples?
B. What value would you expect to find for the st. deviation (standard error) of all these samples?
C. What shape would you expect the distribution of all these sample means to have?

Why is Sample Size Important?
What happens as the sample size increases? Answer: As the sample size increases, the standard deviation of the sample means (the standard error) decreases. This means that the variation among sample means is decreasing. Remember, less variation is better.
Larger sample size: smaller variation.
Smaller sample size: larger variation.
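To see how quickly the standard error shrinks, here is a small Python sketch. It borrows σ = 63 from the sample question above; the helper name `standard_error` and the list of sample sizes are illustrative assumptions, not part of the original slides.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: population st. dev. over sqrt(sample size)."""
    return sigma / sqrt(n)


sigma = 63  # population standard deviation from the sample question

# Standard error for a range of sample sizes (49 is the size in the question).
for n in (1, 9, 49, 100, 441):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.2f}")
```

Because n sits under a square root, cutting the standard error in half takes four times as many observations.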

The shape of the distribution……
If the population has a normal distribution, then the sampling distribution of the sample means will also be normal. If the population is NOT normally distributed, then the Central Limit Theorem tells us that the sampling distribution will still be approximately normal, provided the sample size is large enough.

Central Limit Theorem
According to the Central Limit Theorem (CLT), the larger the sample size, the more normal the distribution of sample means becomes. The CLT is central to the concept of statistical inference because it permits us to draw conclusions about the population based strictly on sample data without having knowledge about the distribution of the underlying (total) population.

The CLT…… Definition – The SDSM will more closely resemble the normal distribution as the sample size increases. The CLT can be used to answer questions about sample means in the same manner that the normal distribution can be used to answer questions about individual values.
Note: The CLT is used when the sampled population is NOT normal. The sampling distribution will be approximately normal under the right conditions.

Visualizing the Central Limit Theorem Using Dice
Sec. 5.3 Visualizing the Central Limit Theorem Using Dice Suppose we roll one die 1,000 times and record the outcome of each roll, which can be the number 1, 2, 3, 4, 5, or 6.

Now suppose we roll two dice 1,000 times and record the mean of the two numbers that appear on each roll. To find the mean for a single roll, we add the two numbers and divide by 2.

Suppose we roll five dice 1,000 times and record the mean of the five numbers on each roll.

Now we will further increase the number of dice to ten on each of 1,000 rolls.

What do you notice about the shape of the distribution as the sample size increases? It approximates a normal distribution.
What do you notice about the mean of the distribution of sample means as the sample size increases in comparison to the true mean of the population (3.5)? It approaches the population mean.
What do you notice about the standard deviation of the distribution of means as the sample size increases? It gets smaller, representing lower variation.
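The dice experiment is easy to simulate. Below is a minimal Python sketch, not from the original slides, that rolls 1, 2, 5, and 10 dice 1,000 times each and summarizes the recorded means. The function name `simulate_means` and the fixed seed are arbitrary choices made for illustration; a different seed gives slightly different numbers.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so reruns give the same numbers (arbitrary choice)


def simulate_means(num_dice, num_rolls=1000):
    """Roll `num_dice` fair dice `num_rolls` times; return the mean of each roll."""
    return [
        fmean(random.randint(1, 6) for _ in range(num_dice))
        for _ in range(num_rolls)
    ]


# One list of 1,000 recorded means for each number of dice used in the slides.
results = {num_dice: simulate_means(num_dice) for num_dice in (1, 2, 5, 10)}

for num_dice, means in results.items():
    print(f"{num_dice:>2} dice: mean of means = {fmean(means):.3f}, "
          f"st. dev. of means = {pstdev(means):.3f}")
```

In each case the mean of the recorded means should land near 3.5, while the standard deviation should shrink as more dice are averaged, roughly following σ/√n with σ ≈ 1.71 for a single die.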

The Central Limit Theorem https://www.youtube.com/watch?v=Mjy0AbJ5rJw
The distribution of means will be approximately a normal distribution for larger sample sizes.
The mean of the distribution of means approaches the population mean, μ, for large sample sizes.
The standard deviation of the distribution of means approaches σ/√n for large sample sizes, where σ is the standard deviation of the population and n is the sample size.

The Central Limit Theorem Side Notes
For practical purposes, the distribution of means will be nearly normal if the sample size is larger than 30.
If the original population is normally distributed, then the sample means will remain normally distributed for any sample size n, and the distribution will become narrower as n increases.
The original variable can have any distribution; it does not have to be a normal distribution.

Shapes of Distributions as Sample Size Increases

Example 1 ~ Predicting Test Scores
You are a middle school principal and your 100 eighth-graders are about to take a national standardized test. The test is designed so that the mean score is μ = 400 with a standard deviation of σ = 70. Assume the scores are normally distributed.
a. What is the likelihood that one of your eighth-graders, selected at random, will score below 375 on the exam?
Since the distribution is normal, we can just use z-scores to determine the percentage for one student: z = (375 − 400)/70 ≈ −0.36.
According to the table, a z-score of −0.36 corresponds to about 36%, which means that about 36% of all students can be expected to score below 375. Thus there is a 36% chance that a randomly selected student will score below 375.
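The table lookup in part a can be reproduced with Python's standard library. This is a small sketch, not part of the original slides; `NormalDist` comes from the `statistics` module (Python 3.8 or later), and the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 400, 70  # mean and standard deviation of individual scores
score = 375

# z-score of a single student's score.
z = (score - mu) / sigma

# Probability that one randomly selected student scores below 375.
p_below = NormalDist(mu, sigma).cdf(score)

print(f"z = {z:.2f}, P(score < {score}) = {p_below:.3f}")
```

The computed probability is about 0.36, in line with the z-table.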

b. Your performance as a principal depends on how well your entire group of eighth-graders scores on the exam. What is the likelihood that your group of 100 eighth-graders will have a mean score below 375?
According to the C.L.T., if we take random groups of, say, 100 students and study their means, then the distribution of the means will approach normal. Hence, the mean of the distribution of means is μ = 400 and its standard error is σ/√n = 70/√100 = 70/10 = 7.
Therefore, the z-score for a mean of 375 with a standard deviation of 7 is: z = (375 − 400)/7 ≈ −3.57.
The percent that corresponds to a z-score of −3.57 is about 0.02%, which means that only about 0.02% of all samples of 100 students will have a mean score below 375. In other words, about 1 in 5000 samples of 100 students will have a mean score below 375.
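The same calculation for the group mean can be sketched in Python. This is an illustrative check, not part of the original slides; the only change from the single-student case is that the spread is the standard error σ/√n instead of σ.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 70, 100

# Spread of the distribution of sample means for groups of n students.
standard_error = sigma / sqrt(n)

# z-score of a group mean of 375, measured in standard errors.
z = (375 - mu) / standard_error

# Probability that the mean of 100 students falls below 375.
p_mean_below = NormalDist(mu, standard_error).cdf(375)

print(f"standard error = {standard_error}, z = {z:.2f}")
print(f"P(mean < 375) = {p_mean_below:.5f}")
```

The probability comes out to roughly 2 in 10,000, far smaller than the chance that a single student scores below 375, because averaging 100 scores cuts the spread from 70 to 7.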

Importance of approximate probabilities
As we saw in the last example, probabilities about sample means can be found with the standard normal (Z) distribution. Even if the distribution of the population is not normal or is unknown, as long as your sample size (n) is large enough, you can still use the CLT to find approximate probabilities by using the z formula and converting the z-value with the z-table. When you use the CLT to find a probability, make sure to say that your answer is an approximation, one that should be close enough because you have a large n. Beyond the actual calculations, probabilities about sampling distributions can also help you decide whether an assumption or a claim about a population mean is on target, based on your data. The process of checking assumptions or challenging claims about a population is called hypothesis testing (later in the course).
