Chapter 18: Sampling Distribution Models

AP Statistics

Overview of Chapter
We have already discussed samples and descriptive statistics, like sample proportions and sample means. We know that if we take a large enough sample, our results should be close to what we would get if we asked the entire population (as long as the sample is random, etc.). In this chapter, we look at many samples to help us do many things. Perhaps the most important of those things is determining what is statistically significant.

Modeling the Distribution of Sample Proportions
Suppose that a poll was conducted in September in which 1000 people were asked if they supported sending more troops to Afghanistan, and 45% said yes. A few days later, a different polling organization asked the same question of 1000 people and instead found that 42% said yes. Which one is correct? Should we be surprised by these different results? Why or why not?

To answer those questions, we would have to assume that one of those proportions is “correct” and then imagine what would happen if we looked at the results of many, many different samples of 1000 people. How much would those samples differ? What would the distribution of the proportions who said yes look like?

What we would find is that the distribution of those many, many sample proportions is symmetric and unimodal, centered on the true population proportion (or what you are calling the true proportion). Because the distribution is symmetric and unimodal, we can model the sample proportions with a Normal model, AS LONG AS CERTAIN ASSUMPTIONS AND CONDITIONS ARE SATISFIED!

Once we can establish the use of the Normal model, we are then able to find the standard deviation of the sampling distribution. Our model therefore has the parameters N(p, √(pq/n)), where p is the population proportion, q = 1 − p, and n is the sample size.

Visual of How A Model of a Sampling Distribution of Proportions is Formed
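One way to see how such a model forms is to simulate it. The sketch below assumes the true proportion is 0.45 and repeatedly draws samples of 1000 people; the function name, the number of repetitions, and the seed are illustrative choices, not part of the original slides. It compares the spread of the simulated sample proportions with the standard deviation √(pq/n) that the Normal model predicts.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of 'yes'."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        yes_count = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(yes_count / n)
    return proportions


p, n = 0.45, 1000  # assumed "true" proportion and the poll's sample size
props = simulate_sample_proportions(p, n, num_samples=2000)

sim_mean = statistics.mean(props)      # center of the simulated distribution
sim_sd = statistics.pstdev(props)      # spread of the simulated distribution
model_sd = math.sqrt(p * (1 - p) / n)  # spread predicted by the Normal model

print(f"simulated mean = {sim_mean:.4f}, simulated SD = {sim_sd:.4f}")
print(f"model mean     = {p:.4f}, model SD     = {model_sd:.4f}")
```

The simulated center and spread should land close to the model values, which is the sense in which the Normal model describes the sampling distribution.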

Summary of Modeling the Distribution of Sample Proportions

Normal Model for the Distribution of the Percent of Americans Who Believe We Should Send More Troops to Afghanistan
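As a rough sketch of this model, the code below treats the first poll's 45% as the true proportion (an assumption made only for illustration) and works out the standard deviation for samples of 1000, the 68-95-99.7 bands, and how many standard deviations away the second poll's 42% falls.

```python
import math

p = 0.45  # assumed true proportion (taken from the first poll)
n = 1000  # sample size of each poll
q = 1 - p

sd = math.sqrt(p * q / n)  # standard deviation of the sample proportion

# 68-95-99.7 rule: intervals of 1, 2, and 3 standard deviations around p
for k, coverage in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = p - k * sd, p + k * sd
    print(f"about {coverage} of samples fall between {low:.3f} and {high:.3f}")

# How unusual is a second poll reporting 42%?
z = (0.42 - p) / sd
print(f"SD = {sd:.4f}, z-score of 0.42 = {z:.2f}")
```

Under these assumptions the second poll sits a little under two standard deviations below the center, so it is on the low side but not wildly surprising.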

Assumptions and Conditions
We can only use the Normal Model for the Distribution of Sample Proportions IF two assumptions are met:
* The sampled values must be independent of each other.
* The sample size, n, must be large enough.

Assumptions and Conditions
It is difficult (and sometimes impossible) to check those assumptions directly. Instead, we verify certain conditions that provide information about the assumptions. Those conditions are the Randomization Condition, the 10% Condition, and the Success/Failure Condition.

The Three Conditions for using a Normal Model for Sampling Distribution of Proportions
* Randomization Condition: The sample should be an SRS (or we should at least be very confident that it is not biased).
* 10% Condition: If sampling is done without replacement, the sample size must be no larger than 10% of the population.
* Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.
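The three conditions can be written as a small checklist. This is a minimal sketch: the function name and the population figure in the usage lines are hypothetical, and the Randomization Condition is passed in as a yes/no judgment because it cannot be computed from the numbers.

```python
def check_proportion_conditions(n, p, population_size, is_random_sample):
    """Return which of the three conditions hold for a sample proportion."""
    q = 1 - p
    return {
        # Judgment call about how the data were collected
        "randomization": is_random_sample,
        # Sample is no more than 10% of the population
        "ten_percent": n <= 0.10 * population_size,
        # Expect at least 10 successes and at least 10 failures
        "success_failure": n * p >= 10 and n * q >= 10,
    }


# Poll of 1000 people from a large population (population size is hypothetical)
poll_check = check_proportion_conditions(1000, 0.45, 300_000_000, True)
print(poll_check)

# A small sample with a rare outcome fails the Success/Failure Condition
small_check = check_proportion_conditions(20, 0.10, 300_000_000, True)
print(small_check)
```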

Thoughts about Sampling Distribution Models
A proportion is no longer something we just compute. We now see it as a random quantity that has a distribution. These models can tell us how much variation to expect when we sample (and what we shouldn’t expect). Sampling distributions act as a bridge between the real world of data and an imaginary model. This bridge, and the model that results, has huge implications in statistics.

Example #1
Assume that 30% of all students at a university wear contact lenses. We randomly pick 100 students and want to know the approximate probability that more than one-third of those students wear contacts. (In the process of answering this question, specify the appropriate model, the mean, and the standard deviation. Be sure to verify that the conditions are met.)
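One possible way to work Example #1 is sketched below. It assumes the university has well over 1000 students, so that 100 students is less than 10% of the population (the example does not state the enrollment). The helper `normal_upper_tail` is an illustrative name; it uses the standard library's `math.erfc` to get the area above a z-score.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard Normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p, n = 0.30, 100
q = 1 - p

# Success/Failure Condition: np = 30 and nq = 70 are both at least 10
assert n * p >= 10 and n * q >= 10

mean = p                   # center of the sampling distribution
sd = math.sqrt(p * q / n)  # standard deviation of the sample proportion

cutoff = 1 / 3             # "more than one-third"
z = (cutoff - mean) / sd
prob = normal_upper_tail(z)

print(f"Model: N({mean:.2f}, {sd:.4f})")
print(f"z = {z:.2f}, P(p-hat > 1/3) is about {prob:.3f}")
```

With these inputs the model is roughly N(0.30, 0.046), and the probability comes out somewhat under one in four.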

Modeling Distributions of Sample Means
Below is the distribution of the numbers on the face of a die if 10,000 dice were rolled.

Below are the distributions of rolling 2, 3, 5 and 20 dice and taking the mean of the rolls. What do you notice?
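The pattern in these distributions can be reproduced with a short simulation. The sketch below is only an illustration (the function name, trial count, and seed are arbitrary choices): it rolls 1, 2, 3, 5, and 20 dice many times, averages each roll, and reports the center and spread of the averages.

```python
import random
import statistics


def simulate_dice_means(num_dice, num_trials=10_000, seed=1):
    """Roll num_dice dice num_trials times; return the mean of each roll."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_trials):
        total = sum(rng.randint(1, 6) for _ in range(num_dice))
        means.append(total / num_dice)
    return means


summary = {}
for num_dice in (1, 2, 3, 5, 20):
    means = simulate_dice_means(num_dice)
    summary[num_dice] = (statistics.mean(means), statistics.pstdev(means))
    center, spread = summary[num_dice]
    print(f"{num_dice:>2} dice: mean of means = {center:.3f}, SD of means = {spread:.3f}")
```

The center stays near 3.5 in every case, while the spread shrinks as more dice are averaged.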

The distribution of sample means (like that of sample proportions) is symmetric and unimodal. As long as a few assumptions and conditions are met, that distribution can be modeled using the Normal Model. This concept (along with a few other important points) is called the Central Limit Theorem (CLT). Because of its importance, it is sometimes called the Fundamental Theorem of Statistics.

The Central Limit Theorem
Very simply, the Central Limit Theorem states: The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal Model. The larger the sample size, the better the approximation will be.

The Central Limit Theorem
The sampling distribution of any mean becomes more nearly Normal as the sample size grows. The shape of the population distribution does NOT matter: with a large enough sample, the distribution of sample means will approximate the Normal curve. We need to verify two assumptions: the observations are independent, and they were collected with randomization. We use conditions to help us check those important assumptions.

Conditions for Central Limit Theorem
In order to justify those assumptions, you can check these three conditions:
* Randomization Condition: The data must be sampled randomly.
* 10% Condition: If sampling is done without replacement, the sample size must be no larger than 10% of the population. This helps satisfy the Independence Assumption.
* Large Enough Sample Condition: This gets discussed more in Chapter 24, but for now just think about whether your sample size is large enough given what you know about the population.

Important Information about Central Limit Theorem
The CLT does NOT talk about the distribution of the data from the sample. It talks about the sample means and sample proportions of many different random samples drawn from the same population.

Normal Model for the Distribution of Sample Means
A few things to remember:
* The model will be centered at the population mean.
* Means have smaller standard deviations than individuals.
* The standard deviation of the sample mean falls as the sample size grows.
* The relationship between the standard deviation of the mean and the sample size can be shown by the formula: SD(ȳ) = σ/√n

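The Normal Model for the Distribution of Sample Means has the parameters N(μ, σ/√n), where μ is the population mean, σ is the population standard deviation, and n is the sample size.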
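To make the link between sample size and spread concrete, the sketch below evaluates the standard deviation of the sample mean, σ/√n, for several sample sizes. The value σ = 10 and the function name are chosen only for illustration.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)


sigma = 10  # illustrative population standard deviation
for n in (1, 4, 25, 100):
    print(f"n = {n:>3}: SD of the sample mean = {sd_of_sample_mean(sigma, n):.2f}")
```

Because of the square root, quadrupling the sample size only halves the standard deviation of the mean.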

Example #2
Assume that SAT scores are normally distributed with a mean of 500 and a standard deviation of 10. Describe the distribution of sample means if we randomly pick 50 students. Verify that the conditions are met. Do you think it would be unreasonable to have a randomly selected group of 50 students with a mean of 550? Justify your answer using statistics.
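A possible setup for Example #2 is sketched below, using the mean and standard deviation exactly as stated in the example. It assumes the 50 students are a random sample and make up less than 10% of all test takers. The z-score measures how far a group mean of 550 sits from the center of the sampling distribution.

```python
import math

mu = 500     # population mean, as stated in the example
sigma = 10   # population standard deviation, as stated in the example
n = 50       # sample size

sd_mean = sigma / math.sqrt(n)  # standard deviation of the sample mean
print(f"Model for the sample mean: N({mu}, {sd_mean:.3f})")

observed_mean = 550
z = (observed_mean - mu) / sd_mean
print(f"z-score of a sample mean of {observed_mean}: {z:.1f}")

# By the 68-95-99.7 rule, sample means beyond 3 SDs are already very rare
print("More than 3 SDs from the mean?", abs(z) > 3)
```

A z-score this far beyond 3 says that such a group mean would be extremely unlikely under the stated model.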

Standard Error
The Standard Error is what we call our estimate of the standard deviation of a sampling distribution when we don’t know the population proportion or the population standard deviation.
For the sampling distribution of sample proportions: SE(p̂) = √(p̂q̂/n)
For the sampling distribution of sample means: SE(ȳ) = s/√n
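The two standard errors can be computed directly from sample results. In the sketch below the function names and the small data set are made up for illustration; the sample proportion and the sample standard deviation stand in for the unknown population values.

```python
import math
import statistics


def se_proportion(p_hat, n):
    """Standard error of a sample proportion, using p-hat in place of p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_mean(sample):
    """Standard error of a sample mean, using s in place of sigma."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))


# Poll result: 45% of 1000 people said yes
se_poll = se_proportion(0.45, 1000)
print(f"SE of the sample proportion = {se_poll:.4f}")

# Hypothetical sample of eight measurements
data = [2, 4, 4, 4, 5, 5, 7, 9]
se_data = se_mean(data)
print(f"SE of the sample mean = {se_data:.4f}")
```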

Sampling Distribution Models (Visual of Logic)

Problems to Look Out For
* Don’t confuse the sampling distribution with the distribution of the sample.
* Beware of observations that are not independent.
* Watch out for small samples from skewed populations. It will take large sample sizes to “undo” the skewness and create symmetric sampling distributions.
