8.1 Sampling Distributions LEARNING GOAL Understand the fundamental ideas of sampling distributions and how the distribution of sample means and the distribution of sample proportions are formed. Also learn the notation used to represent sample means and proportions. Page 334 Copyright © 2009 Pearson Education, Inc.
Sample Means: The Basic Idea Table 8.1 lists the weights of the five starting players (labeled A through E for convenience) on a professional basketball team. We regard these five players as the entire population (with a mean of 242.4 pounds). Samples drawn from this population of five players can range in size from n = 1 (one player out of the five) to n = 5 (all five players). With a sample size of n = 1, there are 5 different samples that could be selected: Each player is a sample. The mean of each sample of size n = 1 is simply the weight of the player in the sample. Page 334. Slide 8.1-2
Figure 8.1 shows a histogram of the means of the 5 samples; it is called a distribution of sample means, because it shows the means of all 5 samples of size n = 1. The distribution of sample means created by this process is an example of a sampling distribution. This term simply refers to a distribution of a sample statistic, such as a mean, taken from all possible samples of a particular size. Figure 8.1 Sampling distribution for sample size n = 1. Page 335. Slide 8.1-3
Notice that the mean of the 5 sample means is the mean of the entire population: (215 + 242 + 225 + 215 + 315) / 5 = 242.4 pounds. This demonstrates a general rule: The mean of a distribution of sample means is the population mean. Page 335. Slide 8.1-4
Let’s move on to samples of size n = 2, in which each sample consists of two different players. With five players, there are 10 different samples of size n = 2. Each sample has its own mean. Table 8.2 lists the 10 samples with their means. Figure 8.2 shows the distribution of all 10 sample means. Again, notice that the mean of the distribution of sample means is equal to the population mean, 242.4 pounds. Page 335. Slide 8.1-5
Ten different samples of size n = 3 are possible in a population of five players. Table 8.3 shows these samples and their means, and Figure 8.3 shows the distribution of these sample means. Again, the mean of the distribution of sample means is equal to the population mean, 242.4 pounds. Page 336. Slide 8.1-6
With a sample size of n = 4, only 5 different samples are possible. Table 8.4 shows these samples and their means, and Figure 8.4 shows the distribution of these sample means. Page 334. Slide 8.1-7
Finally, for a population of five players, there is only 1 possible sample of size n = 5: the entire population. In this case, the distribution of sample means is just a single bar (Figure 8.5). Again, the mean of the distribution of sample means is the population mean, 242.4 pounds. Figure 8.5 Sampling distribution for sample size n = 5. To summarize, when we work with all possible samples of a given size from a population, the mean of the distribution of sample means is always the population mean. Page 336. Slide 8.1-8
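The rule summarized above can be checked by brute force. The sketch below is only an illustration: it takes the five weights from the population-mean calculation earlier in this section, lists every possible sample of each size, and averages the sample means. The names `weights` and `sample_means` are made up for this example.

```python
from itertools import combinations
from statistics import mean

# Player weights in pounds, taken from the population-mean calculation above.
weights = [215, 242, 225, 215, 315]
population_mean = mean(weights)  # 1212 / 5 = 242.4


def sample_means(values, n):
    """Return the mean of every possible sample of size n (no player repeated)."""
    return [mean(sample) for sample in combinations(values, n)]


for n in range(1, len(weights) + 1):
    means = sample_means(weights, n)
    # Columns: sample size, number of possible samples, mean of the sample means.
    print(n, len(means), round(mean(means), 1))
```

Each row should show the same mean of sample means, 242.4 pounds, while the number of possible samples runs 5, 10, 10, 5, 1. Note that `combinations` treats the two 215-pound players as different people, which matches the way the tables count samples.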
Sample Means with Larger Populations In typical statistical applications, populations are huge and it is impractical or expensive to survey every individual in the population; consequently, we rarely know the true population mean, μ. Therefore, it makes sense to consider using the mean of a sample to estimate the mean of the entire population. Although a sample is easier to work with, it cannot possibly represent the entire population exactly. Therefore, we should not expect an estimate of the population mean obtained from a sample to be perfect. The error that we introduce by working with a sample is called the sampling error. Page 337. Slide 8.1-9
Sampling Error The sampling error is the error introduced because a random sample is used to estimate a population parameter. It does not include other sources of error, such as those due to biased sampling, bad survey questions, or recording mistakes. Page 338. Slide 8.1-10
TIME OUT TO THINK Would you expect the sampling error to increase or decrease if the sample size were increased? Explain. Page 338. Slide 8.1-11
Results from a survey of students who were asked how many hours they spend per week using a search engine on the Internet: n = 400, μ = 3.88, σ = 2.40. Page 337. Slide 8.1-12
A sample of 32 students selected from the 400 on the previous slide. Sample 1: 1.1 7.8 6.8 4.9 3.0 6.5 5.2 2.2 5.1 3.4 4.7 7.0 3.8 5.7 6.5 2.7 2.6 1.4 7.1 5.5 3.1 5.0 6.8 6.5 1.7 2.1 1.2 0.3 0.9 2.4 2.5 7.8. The mean of this sample is x̄ = 4.17; we use the standard notation x̄ to denote this mean. We say that x̄ is a sample statistic because it comes from a sample of the entire population. Thus, x̄ is called a sample mean. Page 338. Slide 8.1-13
Notation for Population and Sample Means: n = sample size; μ = population mean; x̄ = sample mean. Page 338. Slide 8.1-14
A different sample of 32 students selected from the 400. Sample 2: 1.8 0.4 4.0 2.4 0.8 6.2 0.8 6.6 5.7 7.9 2.5 3.6 5.2 5.7 6.5 1.2 5.4 5.7 7.2 5.1 3.2 3.1 5.0 3.1 0.5 3.9 3.1 5.8 2.9 7.2 0.9 4.0. For this sample, x̄ = 3.98. Now you have two sample means that don’t agree with each other, and neither one agrees with the true population mean: x̄₁ = 4.17 (slide 13), x̄₂ = 3.98, μ = 3.88 (slide 12). Page 338. Slide 8.1-15
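As a quick check on the two sample means, the sketch below recomputes them from the listed values and compares each with the population mean μ = 3.88. The variable names are illustrative only, and the difference printed in the last column is simply x̄ − μ for that one sample.

```python
from statistics import mean

# The 32 values listed for Sample 1 and Sample 2 in this section.
sample_1 = [1.1, 7.8, 6.8, 4.9, 3.0, 6.5, 5.2, 2.2, 5.1, 3.4, 4.7, 7.0,
            3.8, 5.7, 6.5, 2.7, 2.6, 1.4, 7.1, 5.5, 3.1, 5.0, 6.8, 6.5,
            1.7, 2.1, 1.2, 0.3, 0.9, 2.4, 2.5, 7.8]
sample_2 = [1.8, 0.4, 4.0, 2.4, 0.8, 6.2, 0.8, 6.6, 5.7, 7.9, 2.5, 3.6,
            5.2, 5.7, 6.5, 1.2, 5.4, 5.7, 7.2, 5.1, 3.2, 3.1, 5.0, 3.1,
            0.5, 3.9, 3.1, 5.8, 2.9, 7.2, 0.9, 4.0]
mu = 3.88  # population mean reported for all 400 students

for name, sample in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    x_bar = mean(sample)
    # Columns: sample label, sample mean, difference from the population mean.
    print(name, round(x_bar, 2), round(x_bar - mu, 2))
```

Both samples overshoot μ, by about 0.29 and 0.10 hours respectively, which is the kind of sample-to-sample variation the sampling error describes.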
Figure 8.6 shows a histogram that results from 100 different samples, each with 32 students. Notice that this histogram is very close to a normal distribution and its mean is very close to the population mean, μ = 3.88. Figure 8.6 A distribution of 100 sample means, with a sample size of n = 32, appears close to a normal distribution with a mean of 3.88. Pages 338-339. Slide 8.1-16
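The process behind a figure like this can be imitated with a short simulation. The sketch below is not the survey data: the 400 responses are not listed here, so it invents a HYPOTHETICAL population of 400 values between 0 and 8 hours, then draws 100 random samples of 32 and compares the mean of the sample means with the population mean.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# HYPOTHETICAL population: 400 made-up weekly hours between 0 and 8.
# These are NOT the real survey responses, which are not reproduced in the text.
population = [round(rng.uniform(0, 8), 1) for _ in range(400)]
mu = mean(population)

# Draw 100 random samples of 32 students (without replacement), one mean each.
sample_means = [mean(rng.sample(population, 32)) for _ in range(100)]

# The two printed values should be close to each other.
print(round(mu, 2), round(mean(sample_means), 2))
```

Individual sample means scatter around μ, but their average lands very near it, which is the pattern the histogram shows.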
TIME OUT TO THINK Suppose you choose only one sample of size n = 32. According to Figure 8.6, are you more likely to choose a sample with a mean less than 2.5 or a sample with a mean less than 3.5? Explain. Page 338. Slide 8.1-17
The Distribution of Sample Means The distribution of sample means is the distribution that results when we find the means of all possible samples of a given size. The larger the sample size, the more closely this distribution approximates a normal distribution. In all cases, the mean of the distribution of sample means equals the population mean. If only one sample is available, its sample mean, x̄, is the best estimate for the population mean, μ. Page 339. Slide 8.1-18
If we were to include all possible samples of size n = 32, this distribution would have these characteristics:
• The distribution of sample means is approximately a normal distribution.
• The mean of the distribution of sample means is 3.88 (the mean of the population).
• The standard deviation of the distribution of sample means depends on the population standard deviation and the sample size. The population standard deviation is σ = 2.40 and the sample size is n = 32, so the standard deviation of sample means is σ/√n = 2.40/√32 = 0.42.
Page 339. Slide 8.1-20
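The σ/√n calculation is easy to reproduce. In this minimal sketch the function name `sd_of_sample_means` is just a label chosen for the example; it is not notation from the text.

```python
import math


def sd_of_sample_means(sigma, n):
    """Standard deviation of the distribution of sample means: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Search-engine survey: sigma = 2.40 hours, samples of n = 32 students.
print(round(sd_of_sample_means(2.40, 32), 2))  # 0.42
```

Because n sits under a square root, quadrupling the sample size only halves the spread of the sample means.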
Suppose we select the following random sample of 32 responses from the 400 responses given earlier. Sample 3: 5.8 7.5 5.8 5.2 3.9 3.4 7.3 4.1 0.5 7.9 7.7 7.7 5.0 2.3 7.8 2.3 5.0 6.8 6.5 1.7 2.1 7.3 4.0 2.2 5.6 4.7 5.3 3.5 6.5 3.4 6.6 5.0. The mean of this sample is x̄ = 5.01. Given that the mean of the distribution of sample means is 3.88 and the standard deviation is 0.42, the sample mean of x̄ = 5.01 has a standard score of z = (sample mean – pop. mean) / standard deviation = (5.01 – 3.88) / 0.42 = 2.7. Note that standard scores are discussed in Section 5.2. Page 340. Slide 8.1-21
The sample (from the previous slide) has a standard score of z = 2.7, indicating that it is 2.7 standard deviations above the mean of the sampling distribution. From Table 5.1, this standard score corresponds to the 99.65th percentile, so the probability of selecting another sample with a mean less than 5.01 is about 0.9965. It follows that the probability of selecting another sample with a mean greater than 5.01 is about 1 – 0.9965 = 0.0035. Apparently, the sample we selected is rather extreme within this distribution. Page 340. Slide 8.1-22
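The table lookup can be reproduced with the standard normal distribution in Python's `statistics` module. This sketch assumes, as the text does, that the distribution of sample means is approximately normal, and it rounds z to one decimal place to mimic reading a printed table; without that rounding the percentile comes out slightly lower.

```python
from statistics import NormalDist

mu = 3.88      # mean of the distribution of sample means
sd = 0.42      # standard deviation of the distribution of sample means
x_bar = 5.01   # mean of Sample 3

z = round((x_bar - mu) / sd, 1)   # rounded as for a table lookup
p_below = NormalDist().cdf(z)     # chance another sample mean is below 5.01
p_above = 1 - p_below             # chance another sample mean is above 5.01

print(z, round(p_below, 4), round(p_above, 4))  # 2.7 0.9965 0.0035
```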
TIME OUT TO THINK Suppose a sample mean is in the 95th percentile. Explain why the probability of randomly selecting another sample with a mean greater than the first mean is 0.05. Page 340. Slide 8.1-23
EXAMPLE 1 Sampling Farms Texas has roughly 225,000 farms, more than any other state in the United States. The actual mean farm size is μ = 582 acres and the standard deviation is σ = 150 acres. For random samples of n = 100 farms, find the mean and standard deviation of the distribution of sample means. What is the probability of selecting a random sample of 100 farms with a mean greater than 600 acres? Solution: Because the distribution of sample means is a normal distribution, its mean should be the same as the mean of the entire population, which is 582 acres. The standard deviation of the sampling distribution is σ/√n = 150/√100 = 15. Page 340. Slide 8.1-24
EXAMPLE 1 Sampling Farms Solution: (cont.) A sample mean of x̄ = 600 acres therefore has a standard score of z = (sample mean – pop. mean) / standard deviation = (600 – 582) / 15 = 1.2. According to Table 5.1, this standard score is in the 88th percentile, so the probability of selecting a sample with a mean less than 600 acres is about 0.88. Thus, the probability of selecting a sample with a mean greater than 600 acres is about 0.12. Page 340. Slide 8.1-25
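The whole of Example 1 fits in a few lines. This is a sketch under the example's own assumption that the distribution of sample means is normal; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 582       # population mean farm size, acres
sigma = 150    # population standard deviation, acres
n = 100        # farms per sample

sd_means = sigma / math.sqrt(n)            # 150 / 10 = 15 acres
z = (600 - mu) / sd_means                  # standard score of a 600-acre sample mean
p_above = 1 - NormalDist().cdf(z)          # chance of a sample mean above 600 acres

print(sd_means, z, round(p_above, 2))  # 15.0 1.2 0.12
```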
Sample Proportions In a survey where 400 students were asked if they own a car, 240 replied that they did. The exact proportion of car owners is p = 240/400 = 0.6. This population proportion, p = 0.6, is another example of a population parameter. Pages 340-341. Table 5.1 is on page 211. Slide 8.1-26
TIME OUT TO THINK Give another survey question that would result in a population proportion rather than a population mean. Page 341. Slide 8.1-27
A sample of 32 was selected from the 400 students and 21 were car owners: p̂ = 21/32 = 0.656. This proportion is another example of a sample statistic. In this case, it is a sample proportion because it is the proportion of car owners within a sample; we use the symbol p̂ (read “p-hat”) to distinguish this sample proportion from the population proportion, p. Page 341. Slide 8.1-28
Notation for Population and Sample Proportions: n = sample size; p = population proportion; p̂ = sample proportion. Page 341. Slide 8.1-29
Figure 8.7 shows a histogram of sample proportions from 100 samples of size n = 32. As we found for sample means, this distribution of sample proportions is very close to a normal distribution. Furthermore, the mean of this distribution is very close to the population proportion of 0.6. Figure 8.7 The distribution of 100 sample proportions, with a sample size of 32, appears to be close to a normal distribution. Page 342. Slide 8.1-30
Suppose it were possible to select all possible samples of size n = 32. The resulting distribution would be called a distribution of sample proportions. The mean of this distribution equals the population proportion exactly. This distribution approaches a normal distribution as the sample size increases. In practice, we often have only one sample to work with. In that case, the best estimate for the population proportion, p, is the sample proportion, p̂. Page 342. Slide 8.1-31
The Distribution of Sample Proportions The distribution of sample proportions is the distribution that results when we find the proportions (p̂) in all possible samples of a given size. The larger the sample size, the more closely this distribution approximates a normal distribution. In all cases, the mean of the distribution of sample proportions equals the population proportion. If only one sample is available, its sample proportion, p̂, is the best estimate for the population proportion, p. Page 342. Slide 8.1-32
EXAMPLE 2 Analyzing a Sample Proportion Consider the distribution of sample proportions shown in Figure 8.7 (slide 30). Assume that its mean is p = 0.6 and its standard deviation is 0.1. Suppose you randomly select the following sample of 32 responses: Y Y N Y Y Y Y N Y Y Y Y Y Y N Y Y N Y Y Y N Y Y N Y Y N Y N Y Y. Compute the sample proportion, p̂, for this sample. How far does it lie from the mean of the distribution? What is the probability of selecting another sample with a proportion greater than the one you selected? Solution: The proportion of Y responses in this sample is p̂ = 24/32 = 0.75. Pages 342-343. Slide 8.1-33
EXAMPLE 2 Analyzing a Sample Proportion Solution: (cont.) Using a mean of 0.6 and a standard deviation of 0.1, we find that the sample statistic, p̂ = 0.75, has a standard score of z = (sample proportion – pop. proportion) / standard deviation = (0.75 – 0.6) / 0.1 = 1.5. The sample proportion is 1.5 standard deviations above the mean of the distribution. Using Table 5.1, we see that a standard score of 1.5 corresponds to the 93rd percentile. The probability of selecting another sample with a proportion less than the one we selected is about 0.93. Page 343. Slide 8.1-34
EXAMPLE 2 Analyzing a Sample Proportion Solution: (cont.) Thus, the probability of selecting another sample with a proportion greater than the one we selected is about 1 – 0.93 = 0.07. In other words, if we were to select 100 random samples of 32 responses, we should expect to see only 7 samples with a higher proportion than the one we selected. Page 343. Slide 8.1-35
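Example 2 can be retraced from the raw Y/N responses. The sketch below uses the mean (0.6) and standard deviation (0.1) that the example assumes for the distribution of sample proportions, and treats that distribution as normal; the variable names are illustrative.

```python
from statistics import NormalDist

# The 32 responses listed in Example 2 (Y = owns a car, N = does not).
responses = "Y Y N Y Y Y Y N Y Y Y Y Y Y N Y Y N Y Y Y N Y Y N Y Y N Y N Y Y".split()

p_hat = responses.count("Y") / len(responses)   # 24 / 32 = 0.75

p = 0.6     # assumed mean of the distribution of sample proportions
sd = 0.1    # assumed standard deviation of that distribution

z = round((p_hat - p) / sd, 1)        # rounded as for a table lookup
p_above = 1 - NormalDist().cdf(z)     # chance of an even higher sample proportion

print(p_hat, z, round(p_above, 2))  # 0.75 1.5 0.07
```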