Understand that Sampling error is really sample variability


1 Understand that Sampling error is really sample variability
AP Statistics Objectives Ch18: Understand that sampling error is really sample variability: unavoidable and predictable.

2 Be able to check all conditions for a sampling distribution
AP Statistics Objectives Ch18: Be able to check all conditions for a sampling distribution. Be able to describe the shape, center, and spread of a sampling distribution.

3 Describe the shape, center, and spread of sample means
AP Statistics Objectives Ch18: Be able to check the conditions necessary to use the Central Limit Theorem. Describe the shape, center, and spread of sample means.

4 Vocabulary
Sampling error
Sampling distribution model
Sampling distribution model for a proportion
Sampling distribution model for a mean
Central Limit Theorem
Standard error

5 Textbook Notes Classroom Notes Chapter 18 Assignments

6 Chapter 18 Assignment Part I
pp #2&4,10,12&14,21&22 Chapter 18 Assignment Part II pp #26,30,36,42

7 Modeling the Distribution of Sample Proportions
Rather than showing real repeated samples, imagine what would happen if we were to actually draw many samples. Now imagine what would happen if we looked at the sample proportions for these samples. What would the histogram of all the sample proportions look like?

8 Modeling the Distribution of Sample Proportions (cont.)
We would expect the histogram of the sample proportions to center at the true proportion, p, in the population. As far as the shape of the histogram goes, we can simulate a bunch of random samples that we didn’t really draw.

9 Modeling the Distribution of Sample Proportions (cont.)
It turns out that the histogram is unimodal, symmetric, and centered at p. More specifically, it’s an amazing and fortunate fact that a Normal model is just the right one for the histogram of sample proportions. To use a Normal model, we need to specify its mean and standard deviation. The mean of this particular Normal is at p.

10 Modeling the Distribution of Sample Proportions (cont.)
When working with proportions, knowing the mean automatically gives us the standard deviation as well. The standard deviation we will use is $\sqrt{pq/n}$, where $q = 1 - p$ and $n$ is the sample size. So, the distribution of the sample proportions is modeled with a probability model that is $N\left(p, \sqrt{pq/n}\right)$.
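A quick simulation can make this model concrete. The Python sketch below is illustrative only: the values p = 0.30 and n = 200, the number of samples, and the function name are assumptions chosen for the example and do not come from the slides.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


# Assumed illustration values (not from the slides).
p, n = 0.30, 200
props = simulate_sample_proportions(p, n, num_samples=5000)

simulated_mean = statistics.fmean(props)
simulated_sd = statistics.pstdev(props)
model_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)

print(f"mean of simulated proportions: {simulated_mean:.4f} (model: {p})")
print(f"SD of simulated proportions:   {simulated_sd:.4f} (model: {model_sd:.4f})")
```

The simulated center and spread should land close to p and sqrt(pq/n), which is what the Normal model predicts.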

11 Modeling the Distribution of Sample Proportions (cont.)
A picture of what we just discussed is as follows:

12 How Good Is the Normal Model?
The Normal model becomes a better model for the distribution of sample proportions as the sample size gets bigger. Just how big of a sample do we need? This will soon be revealed…

13 Assumptions and Conditions
Most models are useful only when specific assumptions are true. There are two assumptions in the case of the model for the distribution of sample proportions: The sampled values must be independent of each other. The sample size, n, must be large enough.

14 Assumptions and Conditions (cont.)
Assumptions are hard—often impossible—to check. That’s why we assume them. Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions. The corresponding conditions to check before using the Normal to model the distribution of sample proportions are the 10% Condition and the Success/Failure Condition.

15 Assumptions and Conditions (cont.)
10% condition: If sampling is done without replacement, then the sample size, n, must be no larger than 10% of the population. Success/failure condition: The sample size has to be big enough so that both $np$ and $nq$ are at least 10. So, we need a sample that is large enough but not too large.
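The two conditions can be written as a small check. This is a sketch under stated assumptions: the function name and the optional population_size argument are hypothetical, and the threshold follows the "at least 10" wording used in the Chapter 18 notes.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Return (ten_percent_ok, success_failure_ok) for a sample of size n.

    population_size=None means the population is treated as effectively
    unlimited, so the 10% condition is taken as satisfied.
    """
    q = 1 - p
    ten_percent_ok = population_size is None or n <= 0.10 * population_size
    success_failure_ok = n * p >= 10 and n * q >= 10
    return ten_percent_ok, success_failure_ok


# Example values chosen for illustration.
print(check_proportion_conditions(100, 0.70))                        # large population
print(check_proportion_conditions(16, 0.50))                         # too few expected successes
print(check_proportion_conditions(200, 0.50, population_size=1000))  # sample is 20% of population
```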

16 A Sampling Distribution Model for a Proportion
A proportion is no longer just a computation from a set of data. It is now a random quantity that has a distribution. This distribution is called the sampling distribution model for proportions. Even though we depend on sampling distribution models, we never actually get to see them. We never actually take repeated samples from the same population and make a histogram. We only imagine or simulate them.

17 A Sampling Distribution Model for a Proportion (cont.)
Still, sampling distribution models are important because they act as a bridge from the real world of data to the imaginary world of the statistic and enable us to say something about the population when all we have is data from the real world.

18 The Sampling Distribution Model for a Proportion (cont.)
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with
Mean: $\mu(\hat{p}) = p$
Standard deviation: $SD(\hat{p}) = \sqrt{pq/n}$

19 What About Quantitative Data?
Proportions summarize categorical variables. The Normal sampling distribution model looks like it will be very useful. Can we do something similar with quantitative data? We can indeed. Even more remarkable, not only can we use all of the same concepts, but almost the same model.

20 Simulating the Sampling Distribution of a Mean
Like any statistic computed from a random sample, a sample mean also has a sampling distribution. We can use simulation to get a sense as to what the sampling distribution of the sample mean might look like…

21 Means – The “Average” of One Die
Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:

22 Means – Averaging More Dice
Looking at the average of two dice after a simulation of 10,000 tosses: The average of three dice after a simulation of 10,000 tosses looks like:

23 Means – Averaging Still More Dice
The average of 5 dice after a simulation of 10,000 tosses looks like: The average of 20 dice after a simulation of 10,000 tosses looks like:

24 Means – What the Simulations Show
As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean. So, we see the shape continuing to tighten around 3.5. And, it probably does not shock you that the sampling distribution of a mean becomes Normal.
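The dice simulations described on these slides can be reproduced with a short script. This is a minimal sketch: the function name and the seed are assumptions, while the 10,000 tosses and the dice counts (1, 2, 3, 5, 20) follow the slides.

```python
import random
import statistics


def averages_of_dice(num_dice, num_trials=10_000, seed=1):
    """Toss num_dice fair dice num_trials times; return the average of each toss."""
    rng = random.Random(seed)  # fixed seed for a repeatable run
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_trials)
    ]


spreads = {}
centers = {}
for k in (1, 2, 3, 5, 20):
    avgs = averages_of_dice(k)
    centers[k] = statistics.fmean(avgs)
    spreads[k] = statistics.pstdev(avgs)
    print(f"{k:>2} dice: mean of averages = {centers[k]:.3f}, SD = {spreads[k]:.3f}")
```

The center stays near 3.5 while the spread shrinks as more dice are averaged, which is the tightening the histograms show.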

25 The Fundamental Theorem of Statistics
The sampling distribution of any mean becomes Normal as the sample size grows. All we need is for the observations to be independent and collected with randomization. We don’t even care about the shape of the population distribution! The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).

26 The Fundamental Theorem of Statistics (cont.)
The CLT is surprising and a bit weird: Not only does the histogram of the sample means get closer and closer to the Normal model as the sample size grows, but this is true regardless of the shape of the population distribution. The CLT works better (and faster) the closer the population model is to a Normal itself. It also works better for larger samples.

27 The Fundamental Theorem of Statistics (cont.)
The Central Limit Theorem (CLT) The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.

28 But Which Normal? The CLT says that the sampling distribution of any mean or proportion is approximately Normal. But which Normal model? For proportions, the sampling distribution is centered at the population proportion. For means, it’s centered at the population mean. But what about the standard deviations?

29 But Which Normal? (cont.)
The Normal model for the sampling distribution of the mean has a standard deviation equal to $\sigma/\sqrt{n}$, where σ is the population standard deviation.

30 But Which Normal? (cont.)
The Normal model for the sampling distribution of the proportion has a standard deviation equal to $\sqrt{pq/n}$.

31 Assumptions and Conditions
The CLT requires remarkably few assumptions, so there are few conditions to check: Random Sampling Condition: The data values must be sampled randomly or the concept of a sampling distribution makes no sense. Independence Assumption: The sample values must be mutually independent. (When the sample is drawn without replacement, check the 10% condition…) Large Enough Sample Condition: There is no one-size-fits-all rule.

32 Diminishing Returns The standard deviation of the sampling distribution declines only with the square root of the sample size. While we’d always like a larger sample, the square root limits how much we can make a sample tell about the population. (This is an example of the Law of Diminishing Returns.)
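A few numbers show the square-root effect. In this sketch the population standard deviation of 16 and the sample sizes are assumed values used only for illustration.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 16  # assumed population standard deviation
for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: SD of sample mean = {sd_of_sample_mean(sigma, n):.2f}")
```

Each time the sample size is quadrupled, the standard deviation is only cut in half.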

33 Standard Error Both of the sampling distributions we’ve looked at are Normal. For proportions: $SD(\hat{p}) = \sqrt{pq/n}$. For means: $SD(\bar{x}) = \sigma/\sqrt{n}$.

34 Standard Error (cont.) When we don’t know p or σ, we’re stuck, right?
Nope. We will use sample statistics to estimate these population parameters. Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.

35 Standard Error (cont.) For a sample proportion, the standard error is $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$.
For the sample mean, the standard error is $SE(\bar{x}) = s/\sqrt{n}$.

36 Sampling Distribution Models
Always remember that the statistic itself is a random quantity. We can’t know what our statistic will be because it comes from a random sample. Fortunately, for the mean and proportion, the CLT tells us that we can model their sampling distribution directly with a Normal model.

37 Sampling Distribution Models (cont.)
There are two basic truths about sampling distributions: Sampling distributions arise because samples vary. Each random sample will have different cases and, so, a different value of the statistic. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.

38 The Process Going Into the Sampling Distribution Model

39 What Can Go Wrong? Don’t confuse the sampling distribution with the distribution of the sample. When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics. The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples—the one you got and the ones you didn’t get.

40 What Can Go Wrong? (cont.)
Beware of observations that are not independent. The CLT depends crucially on the assumption of independence. You can’t check this with your data—you have to think about how the data were gathered. Watch out for small samples from skewed populations. The more skewed the distribution, the larger the sample size we need for the CLT to work.

41 What have we learned? Sample proportions and means will vary from sample to sample—that’s sampling error (sampling variability). Sampling variability may be unavoidable, but it is also predictable!

42 What have we learned? (cont.)
We’ve learned to describe the behavior of sample proportions when our sample is random and large enough to expect at least 10 successes and failures. We’ve also learned to describe the behavior of sample means (thanks to the CLT!) when our sample is random (and larger if our data come from a population that’s not roughly unimodal and symmetric).

43 Chapter 18
                      Sample Statistic (Latin)    Population or Model Parameter (Greek)
Mean                  $\bar{x}$                   $\mu$
Standard Deviation    $s$                         $\sigma$
Proportion            $\hat{p}$                   $\rho$
Already Taken

44 Chapter 18 1. Sampling Error – natural variation from sample to sample (aka Sampling Variability)

45 Chapter 18 Sampling Distribution for a Proportion
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with mean $\mu_{\hat{p}} = p$ and standard deviation $SD(\hat{p}) = \sqrt{pq/n}$, where $\hat{p}$ is the sample proportion and $p$ is the population’s proportion; $n$ is the sample size and $q = 1 - p$.

46 Chapter 18 2. Sampling Distribution for a Proportion
- Used with CATEGORICAL DATA. Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is estimated by a Normal model $N\left(\rho, \sqrt{\rho q/n}\right)$ with mean $\mu_{\hat{p}} = \rho$ and standard deviation $SD(\hat{p}) = \sqrt{\rho q/n}$, where $\hat{p}$ is the sample proportion and $\rho$ is the population’s proportion; $n$ is the sample size and $q = 1 - \rho$.

47 Chapter 18 Sampling Distribution for a Proportion
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with …
3. So the two assumptions to use the Normal model are: independent values and the sample size is large enough.
4. Assumptions are difficult to check so you should always check the following conditions:

48 Chapter 18 4. Assumptions are difficult to check so you should always check the following conditions:
Independence Assumption
i) Randomization Condition: An SRS, or at least a sampling method that is not biased and that is representative of the population.
ii) 10% Condition: If sampling without replacement, then the sample size, n, must be no larger than 10% of the population.
Large Enough Sample Assumption
i) Success/Failure Condition: The sample size, n, has to be big enough that both np and nq are at least 10.

49 5. M&M’s Investigation
Consider a “fun size” pack of plain M&M’s. Discuss the assumptions and conditions needed to estimate the sampling distribution using a Normal model. We can assume the M&M’s are independent of each other because each “fun size” pack can be considered representative of the entire population of M&M’s and the number of M&M’s in a “fun size” pack of M&M’s is certainly less than 10% of all M&M’s.

50 5. M&M’s Investigation
We can assume that the “fun size” pack is large enough if np and nq are both at least 10. This requires that we know
1) The number of M&M’s in our “fun size” pack
2) The proportion of a particular color of M&M.

51 We need to pick a color.
$\rho$: 0.24, 0.20, 0.16, 0.14, 0.13, 0.13
We’ll start with blue, but write each of the proportions down.

52 5. M&M’s Investigation
$\rho$               0.24    0.20    0.16    0.14    0.13    0.13
n = 18    $n\rho$    4.3     3.6     2.9     2.5     2.3     2.3     Sample size too small
          $nq$       13.7    14.4    15.1    15.5    15.7    15.7
n = 57    $n\rho$    13.7    11.4    9.1     8.0     7.4     7.4     Too small for all but blue & orange
          $nq$       43.3    45.6    47.9    49.0    49.6    49.6
n = 429   $n\rho$    103     85.8    68.6    60.1    55.8    55.8    Sample size large enough for all colors
          $nq$       326     343.2   360.4   368.9   373.2   373.2
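The Success/Failure check behind this slide can be recomputed directly. The sketch below uses the six proportions and the three sample sizes (18, 57, 429) from the slide; the variable and function names are illustrative assumptions.

```python
# Color proportions as listed on the slide (blue is 0.24).
proportions = [0.24, 0.20, 0.16, 0.14, 0.13, 0.13]
sample_sizes = [18, 57, 429]


def success_failure_ok(n, p):
    """True when both expected successes (np) and failures (nq) are at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


passing = {}
for n in sample_sizes:
    passing[n] = [p for p in proportions if success_failure_ok(n, p)]
    print(f"n = {n}:")
    for p in proportions:
        print(f"  p = {p:.2f}  np = {n * p:6.1f}  nq = {n * (1 - p):6.1f}")
    print(f"  proportions passing the condition: {passing[n]}")
```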

53 6. Since the conditions are met for an “Individual Size” Bag of plain M&M’s, let’s use that bag as our sample.
NOTE: We can assume the M&M’s are independent of each other because the conditions that worked for a “fun size” pack still apply for an “individual size” bag. We can now use a Normal model to estimate our sampling distribution of the proportion of blue, but which Normal model?
Mean: $\mu_{\hat{p}} = \rho = 0.24$
Standard Deviation: $SD(\hat{p}) = \sqrt{\rho q/n} = \sqrt{(0.24)(0.76)/57} \approx 0.057$
Model: $N(0.24, 0.057)$

54 a. What is the probability that the number of blue M&M’s in an “individual size” bag of 57 M&M’s will be 10 or fewer?
Model: $N(0.24, 0.057)$
$\hat{p} = 10/57 \approx 0.175$
$z = \dfrac{\hat{p} - \mu}{\sigma} = \dfrac{0.175 - 0.24}{0.057} \approx -1.14$
$P(z < -1.14) \approx 0.127$

55 b. What is the probability that the number of blue M&M’s in an “individual size” bag of 57 M&M’s will be 20 or more?
Model: $N(0.24, 0.057)$
$\hat{p} = 20/57 \approx 0.351$
$z = \dfrac{\hat{p} - \mu}{\sigma} = \dfrac{0.351 - 0.24}{0.057} \approx 1.95$
$P(z > 1.95) \approx 0.026$
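Both probabilities for the blue M&M’s can be checked with Python’s standard-library Normal distribution. This is a sketch, not the slides’ own method: it keeps the standard deviation unrounded, so the results may differ from the table-based answers in the third decimal place. Variable names are assumptions.

```python
import math
from statistics import NormalDist

p, n = 0.24, 57  # proportion of blue and bag size, from the slides
model = NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))

# a. 10 or fewer blue: sample proportion at most 10/57
prob_10_or_fewer = model.cdf(10 / n)

# b. 20 or more blue: sample proportion at least 20/57
prob_20_or_more = 1 - model.cdf(20 / n)

print(f"SD of sample proportion: {model.stdev:.3f}")
print(f"P(10 or fewer blue): {prob_10_or_fewer:.3f}")
print(f"P(20 or more blue):  {prob_20_or_more:.3f}")
```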

56 I’ll assign a different color
7. Since the conditions are met for a “Medium Bag” of plain M&M’s, let’s use that bag as our sample. We can now use a Normal model to estimate our sampling distribution for the proportion of _________________ , but which Normal model? On your own. I’ll assign a different color to each group.

57 Chapter 18 Fine. Let’s draw the distribution for N( , ) 68% 95% 99.7%

58 Chapter 18 CATEGORICAL DATA Sampling Distribution for a Proportion
- Used with CATEGORICAL DATA. Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with mean $\mu_{\hat{p}} = p$ and standard deviation $SD(\hat{p}) = \sqrt{pq/n}$, where $\hat{p}$ is the sample proportion and $p$ is the population’s proportion; $n$ is the sample size and $q = 1 - p$.

59 Sampling Distribution of the Sample Proportion 𝑝
1. Suppose that 70% of all dialysis patients will survive for at least 5 years. If 100 new dialysis patients are selected at random, what is the probability that the proportion surviving for at least 5 years will exceed 80%?

60 Sampling Distribution of the Sample Proportion 𝑝
1. Suppose that 70% of all dialysis patients will survive for at least 5 years. If 100 new dialysis patients are selected at random, what is the probability that the proportion surviving for at least 5 years will exceed 80%? Independence is reasonable, because the problem states that 100 dialysis patients are selected at random and 100 patients would be less than 10% of all dialysis patients. (10% Condition) Success/Failure Condition- np = 100(.70) = 70 > 10 nq = 100(.30) = 30 > 10 so the sample size is reasonably large enough.

61 Sampling Distribution of the Sample Proportion 𝑝
1. Suppose that 70% of all dialysis patients will survive for at least 5 years. If 100 new dialysis patients are selected at random, what is the probability that the proportion surviving for at least 5 years will exceed 80%? Since the conditions are satisfied, we can model the sampling distribution of $\hat{p}$ with a Normal model with $\mu(\hat{p}) = p = 0.70$ and $SD(\hat{p}) = \sqrt{pq/n} = \sqrt{(0.7)(0.3)/100} \approx 0.0458$.
$z = \dfrac{\hat{p} - \mu}{\sigma} = \dfrac{0.80 - 0.70}{0.0458} \approx 2.183$

62 Sampling Distribution of the Sample Proportion 𝑝
1. Suppose that 70% of all dialysis patients will survive for at least 5 years. If 100 new dialysis patients are selected at random, what is the probability that the proportion surviving for at least 5 years will exceed 80%?
$z = 2.183$
$P(\hat{p} > 0.80) = P(z > 2.183) = 0.015$
There is about a 1.5% chance that more than 80% of the 100 sampled patients will survive for at least 5 years.
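The dialysis calculation can be verified numerically. This sketch uses Python’s standard-library NormalDist with the problem’s values (p = 0.70, n = 100); the variable names are assumptions.

```python
import math
from statistics import NormalDist

p, n = 0.70, 100
sd = math.sqrt(p * (1 - p) / n)  # SD of the sample proportion
model = NormalDist(mu=p, sigma=sd)

z = (0.80 - p) / sd
prob_exceeds_80 = 1 - model.cdf(0.80)

print(f"SD = {sd:.4f}, z = {z:.3f}")
print(f"P(sample proportion > 0.80) = {prob_exceeds_80:.3f}")
```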

63 Sampling Distribution of the Sample Proportion 𝑝
2. It is estimated that 48% of all motorists use their seat belts. If a police officer observes 400 cars go by in an hour, what is the probability that the proportion of drivers wearing seat belts is between 45% and 55%?

64 Sampling Distribution of the Sample Proportion 𝑝
2. It is estimated that 48% of all motorists use their seat belts. If a police officer observes 400 cars go by in an hour, what is the probability that the proportion of drivers wearing seat belts is between 45% and 55%? Independence is reasonable, because the 400 cars seen on the highway could be considered representative of all cars on the highway and 400 cars would be less than 10% of all cars on the highways. (10% condition) Success/Failure Condition- np = 400(.48) = 192 > 10 nq = 400(.52) = 208 > 10 So the sample size is reasonably large enough.

65 Sampling Distribution of the Sample Proportion 𝑝
2. It is estimated that 48% of all motorists use their seat belts. If a police officer observes 400 cars go by in an hour, what is the probability that the proportion of drivers wearing seat belts is between 45% and 55%? Since the conditions are satisfied, we can model the sampling distribution of $\hat{p}$ with a Normal model with $\mu(\hat{p}) = p = 0.48$ and $SD(\hat{p}) = \sqrt{pq/n} = \sqrt{(0.48)(0.52)/400} \approx 0.0250$.
$z = \dfrac{0.45 - 0.48}{0.0250} = -1.2$
$z = \dfrac{0.55 - 0.48}{0.0250} = 2.8$

66 Sampling Distribution of the Sample Proportion 𝑝
2. It is estimated that 48% of all motorists use their seat belts. If a police officer observes 400 cars go by in an hour, what is the probability that the proportion of drivers wearing seat belts is between 45% and 55%?
$z = -1.2$ and $z = 2.8$
$P(0.45 < \hat{p} < 0.55) = P(-1.2 < z < 2.8) = 0.882$
There is about an 88.2% chance that the proportion of belted motorists the officer sees will be between 45% and 55% of the 400 cars observed.
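An interval probability like this one is the difference of two cumulative probabilities. The sketch below checks the seat-belt answer with Python’s standard-library NormalDist; the variable names are assumptions.

```python
import math
from statistics import NormalDist

p, n = 0.48, 400
sd = math.sqrt(p * (1 - p) / n)  # SD of the sample proportion
model = NormalDist(mu=p, sigma=sd)

z_low = (0.45 - p) / sd
z_high = (0.55 - p) / sd
prob_between = model.cdf(0.55) - model.cdf(0.45)

print(f"SD = {sd:.4f}, z from {z_low:.2f} to {z_high:.2f}")
print(f"P(0.45 < sample proportion < 0.55) = {prob_between:.3f}")
```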

67 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions.

68 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions. a) What shape would you expect this histogram to be? Why?

69 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions. b) Where do you expect the histogram to be centered?

70 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions. c) How much variability would you expect among these proportions?

71 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions. d) Explain why a Normal model should not be used here.

72 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions. e) The students try again, but toss the coins 25 times each. Use the 68-95-99.7 Rule to describe the sampling distribution model.

73 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions. f) Confirm that you can use a Normal model here.

74 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions. g) They increase the number of tosses to 64 each. Draw and label the appropriate sampling distribution model. Check the appropriate conditions to justify your model.


76 Sampling Distribution of the Sample Proportion 𝑝
3. In a large class of introductory Statistics students, the professor has each person toss a coin 16 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions. h) Explain how the sampling distribution model changes as the number of tosses increases.
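To see how the model changes across parts (a) through (h), the center, spread, and Success/Failure check can be tabulated for 16, 25, and 64 tosses. This is a sketch that assumes a fair coin (p = 0.5); the function name is hypothetical.

```python
import math

p = 0.5  # assumed fair coin


def coin_model(n):
    """Return (mean, SD, success_failure_ok) for the proportion of heads in n tosses."""
    sd = math.sqrt(p * (1 - p) / n)
    ok = n * p >= 10 and n * (1 - p) >= 10
    return p, sd, ok


for n in (16, 25, 64):
    mean, sd, ok = coin_model(n)
    print(f"n = {n}: mean = {mean}, SD = {sd:.4f}, np = nq = {n * p}, Normal model OK: {ok}")
    if ok:
        # 68-95-99.7 Rule: intervals of 1, 2, and 3 SDs around the mean
        for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%")):
            print(f"    about {pct} of proportions between {mean - k * sd:.4f} and {mean + k * sd:.4f}")
```

With 16 tosses only 8 heads and 8 tails are expected, so the Success/Failure condition fails; with 25 and 64 tosses it holds and the spread shrinks.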

77 Chapter 18 1. Central Limit Theorem (CLT) -The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation.

78 2. Central Limit Theorem Assumptions
Independence Assumption i) Randomization Condition: An SRS or at least a sampling method that is not biased and that is representative of the population ii) 10% Condition: If sampling without replacement, then the sample size, n, must be no larger than 10% of the population.

79 2. Central Limit Theorem Assumptions
Independence Assumption i) Randomization Condition: An SRS or at least a sampling method that is not biased and that is representative of the population ii) 10% Condition: If sampling without replacement, then the sample size, n, must be no larger than 10% of the population. Large Enough Sample Assumption i) Depends on distribution of population a) If it is already unimodal & symmetric, then a small sample size is fine b) If it has a Strong skew, then a larger sample size is needed… > 40

80 Chapter 18 Central Limit Theorem Assumptions
1. Must be independent samples
--- Check 10% Condition
--- Must be collected at random
2. Must be large enough
--- Depends on distribution of population
--- Already unimodal & symmetric: small # in sample is fine
--- Strong skew: larger # in sample needed: > 30

81 3. Sampling Distribution for a Mean
- Used with Quantitative Data. Provided that the sampled values are drawn at random from a population of mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean will have mean $\mu_{\bar{x}} = \mu$ and standard deviation $SD(\bar{x}) = \sigma_{\bar{x}} = \sigma/\sqrt{n}$. The shape of the sampling distribution will be approximately Normal as long as the sample size is large enough. The larger the sample size is, the more closely the Normal model approximates the sampling distribution.


83 Central Limit Thm Let’s look at the age of our pennies.

84 Chapter 18 4. Standard Error –
Both sampling distributions require knowledge of the population as a whole. But sometimes we only have information on the sample itself. For proportions we may not know $p$, only $\hat{p}$: $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$. For means we may not know $\sigma$, only $s$: $SE(\bar{x}) = s/\sqrt{n}$. Use what you have to estimate. Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.
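Both standard errors are computed from the sample alone. In this sketch the function names and the example data are made up for illustration and are not from the slides.

```python
import math
import statistics


def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Made-up example data.
print(f"SE of proportion (50 successes in 100): {se_proportion(50, 100):.3f}")
print(f"SE of mean: {se_mean([2, 4, 4, 4, 5, 5, 7, 9]):.3f}")
```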


86 1998 Question 1 5. Consider the sampling distribution of a sample mean obtained by random sampling from an infinite population. This population has a distribution that is highly skewed toward the larger values. (a) How is the mean of the sampling distribution related to the mean of the population?

87 1998 Question 1 5. Consider the sampling distribution of a sample mean obtained by random sampling from an infinite population. This population has a distribution that is highly skewed toward the larger values. (b) How is the standard deviation of the sampling distribution related to the standard deviation of the population?

88 1998 Question 1 5. Consider the sampling distribution of a sample mean obtained by random sampling from an infinite population. This population has a distribution that is highly skewed toward the larger values. (c) How is the shape of the sampling distribution affected by the sample size?

89 Page 430 #25 6. Assume that the duration of human pregnancies can be described by a Normal model with mean 266 days and standard deviation 16 days. a) What percentage of pregnancies should last between 270 and 280 days?

90 Page 430 #25 6. Assume that the duration of human pregnancies can be described by a Normal model with mean 266 days and standard deviation 16 days. b) At least how many days should the longest 25% of all pregnancies last?

91 Page 430 #25 6. Assume that the duration of human pregnancies can be described by a Normal model with mean 266 days and standard deviation 16 days. c) Suppose a certain obstetrician is currently providing prenatal care to 60 pregnant women. Let $\bar{y}$ represent the mean length of their pregnancies. According to the Central Limit Theorem, what’s the distribution of this sample mean, $\bar{y}$? Specify the model, mean, and standard deviation. Independence is reasonable, because the 60 women that are being treated can be considered representative of all pregnant women and the 60 women would be less than 10% of all pregnant women. (10% condition) Since the duration of human pregnancies can be described by a Normal model, any sample size is reasonably large enough. CONTINUED

92 Page 430 #25 6. Assume that the duration of human pregnancies can be described by a Normal model with mean 266 days and standard deviation 16 days. c) Suppose a certain obstetrician is currently providing prenatal care to 60 pregnant women. Let $\bar{y}$ represent the mean length of their pregnancies. According to the Central Limit Theorem, what’s the distribution of this sample mean, $\bar{y}$? Specify the model, mean, and standard deviation.

93 Page 430 #25 Assume that the duration of human pregnancies can be described by a Normal model with mean 266 days and standard deviation 16 days. d) What’s the probability that the mean duration of these patients’ pregnancies will be less than 260 days? $z = \dfrac{260 - 266}{16/\sqrt{60}} = \dfrac{260 - 266}{2.07} \approx -2.90$
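Parts (a), (b), and (d) of the pregnancy problem can be explored with Python’s standard-library NormalDist. This is a sketch for checking work, not the textbook’s solution; the variable names are assumptions.

```python
import math
from statistics import NormalDist

# Model for a single pregnancy, from the problem statement.
pregnancy = NormalDist(mu=266, sigma=16)

# a) share of pregnancies lasting between 270 and 280 days
share_270_to_280 = pregnancy.cdf(280) - pregnancy.cdf(270)

# b) cutoff for the longest 25% of pregnancies (the 75th percentile)
longest_quarter_cutoff = pregnancy.inv_cdf(0.75)

# c/d) model for the mean of n = 60 pregnancies, then P(mean < 260)
n = 60
mean_model = NormalDist(mu=266, sigma=16 / math.sqrt(n))
prob_mean_below_260 = mean_model.cdf(260)

print(f"a) P(270 < days < 280) = {share_270_to_280:.3f}")
print(f"b) longest 25% last at least {longest_quarter_cutoff:.1f} days")
print(f"c) SD of the sample mean = {mean_model.stdev:.2f}")
print(f"d) P(mean < 260) = {prob_mean_below_260:.4f}")
```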

94 Page 430 #31
7. The College Board reported the score distribution shown in the table for all students who took the 2004 AP Statistics exam.
SCORE    % of STUDENTS
5        12.5
4        22.5
3        24.8
2        19.8
1        20.4
a) Find the mean and standard deviation of the scores.

95 Page 430 #31
7. The College Board reported the score distribution shown in the table for all students who took the 2004 AP Statistics exam. b) If we select a random sample of 40 AP Statistics students, would you expect their scores to follow a Normal model? Explain.

96 Page 430 #31
7. The College Board reported the score distribution shown in the table for all students who took the 2004 AP Statistics exam. c) Consider the mean scores of random samples of 40 AP Statistics students. Describe the sampling model for these means (shape, center, and spread). Independence is reasonable, because the 40 student scores were selected at random and the 40 scores are less than 10% of all scores. (10% condition) A sample size of 40 is reasonably large enough. CONTINUED

97 Page 430 #31
7. The College Board reported the score distribution shown in the table for all students who took the 2004 AP Statistics exam. c) Consider the mean scores of random samples of 40 AP Statistics students. Describe the sampling model for these means (shape, center, and spread).
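The mean and standard deviation in part (a), and the spread of the sampling model in part (c), follow from the score table. The sketch below treats the table as a discrete probability model; the variable names are assumptions.

```python
import math

# Score distribution from the table (score: percent of students).
score_percent = {5: 12.5, 4: 22.5, 3: 24.8, 2: 19.8, 1: 20.4}
probs = {score: pct / 100 for score, pct in score_percent.items()}

# a) mean and standard deviation of a single score
mean_score = sum(score * pr for score, pr in probs.items())
variance = sum((score - mean_score) ** 2 * pr for score, pr in probs.items())
sd_score = math.sqrt(variance)

# c) spread of the sampling model for the mean of n = 40 scores
n = 40
sd_of_mean = sd_score / math.sqrt(n)

print(f"mean score = {mean_score:.3f}")
print(f"SD of scores = {sd_score:.3f}")
print(f"SD of the mean of {n} scores = {sd_of_mean:.3f}")
```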

98 8. Scores of students on the ACT college entrance exam in a recent year had the normal distribution with mean $\mu = 18.6$ and standard deviation $\sigma = 5.9$. a. What is the probability that a single student randomly chosen from all those taking the ACT scores 21 or higher? $z = \dfrac{x - \mu}{\sigma} = \dfrac{21 - 18.6}{5.9} \approx 0.407$; P(ACT score > 21) = P(z > 0.407) = 0.342. There is about a 34.2% chance that the ACT score from a random student will be 21 or higher.

99 8. Scores of students on the ACT college entrance exam in a recent year had the normal distribution with mean $\mu = 18.6$ and standard deviation $\sigma = 5.9$. b. What is the probability that the mean score for 36 students randomly selected from all those who took the ACT is 21 or higher? Hint: Standard deviation of the sampling distribution is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Independence is reasonable, because the problem states that the 36 students are selected at random and 36 students would be less than 10% of all students taking the ACT. Since the population of ACT scores had a “normal distribution”, any sample size would be large enough.

100 8. Scores of students on the ACT college entrance exam in a recent year had the normal distribution with mean $\mu = 18.6$ and standard deviation $\sigma = 5.9$. b. What is the probability that the mean score for 36 students randomly selected from all those who took the ACT is 21 or higher? Hint: Standard deviation of the sampling distribution is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Since the conditions are satisfied, we can model the sampling distribution of $\bar{x}$ with a Normal model with $\mu_{\bar{x}} = \mu = 18.6$ and $SD(\bar{x}) = \sigma/\sqrt{n} = 5.9/\sqrt{36} \approx 0.9833$. $z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{21 - 18.6}{0.9833} \approx 2.44$

101 8. Scores of students on the ACT college entrance exam in a recent year had the normal distribution with mean $\mu = 18.6$ and standard deviation $\sigma = 5.9$. b. What is the probability that the mean score for 36 students randomly selected from all those who took the ACT is 21 or higher? Hint: Standard deviation of the sampling distribution is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
$z = 2.44$; $P(\bar{x} > 21) = P(z > 2.44) = 0.007$. There is about a 0.7% chance that the mean ACT score from our sample of 36 students will be 21 or higher.
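Parts (a) and (b) differ only in which standard deviation is used: σ for a single student, σ/√n for the mean of 36 students. The sketch below compares the two with Python’s standard-library NormalDist; the variable names are assumptions.

```python
import math
from statistics import NormalDist

mu, sigma, n = 18.6, 5.9, 36  # values from the problem statement

single_student = NormalDist(mu=mu, sigma=sigma)
sample_mean = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

prob_single = 1 - single_student.cdf(21)  # a. one student scores 21 or higher
prob_mean = 1 - sample_mean.cdf(21)       # b. mean of 36 students is 21 or higher

print(f"a) P(single score >= 21) = {prob_single:.3f}")
print(f"b) SD of sample mean = {sample_mean.stdev:.4f}")
print(f"b) P(mean of {n} scores >= 21) = {prob_mean:.4f}")
```

A score of 21 is fairly ordinary for one student but quite unusual for the average of 36 students, because the sample mean varies much less than individual scores.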


