Asking and Answering Questions About A Population Mean

Chapter 12 Asking and Answering Questions About A Population Mean Created by Kathy Fritz

Let's review some statistical notation.
n = the sample size
x̄ = the mean of a sample
s = the standard deviation of a sample
μ = the mean of the entire population
σ = the standard deviation of the entire population

When the purpose of a statistical study is to learn about a population mean μ, the sample mean x̄ can be used as an estimate of μ. To understand statistical inference procedures based on x̄, you must first study how sampling variability causes x̄ to vary in value from one sample to another. The sample size n and characteristics of the population (its shape, mean value μ, and standard deviation σ) are important in determining the sampling distribution of x̄.

The Sampling Distribution of the Sample Mean
Properties of the Sampling Distribution of x̄
Central Limit Theorem

The paper “Mean Platelet Volume in Patients with Metabolic Syndrome and Its Relationship with Coronary Artery Disease” (Thrombosis Research, 2007) includes data suggesting that the distribution of x = platelet volume for patients who do not have metabolic syndrome is approximately normal with mean μ = 8.25 and standard deviation σ = 0.75. What values for the sample mean would be expected if you were to take a random sample of size 5 from this population distribution? To find out, a statistical software package was used to select 500 random samples of size n = 5 from the population. The sample mean platelet volume x̄ was computed for each sample, and these 500 values were used to construct a density histogram.

Platelet Volume Continued . . .
Notice that a sample mean may be as small as 7.3 or as large as 9.1. A sample of size 5 from the population of patients won’t always provide very precise information about the mean platelet volume in the population. What if a larger sample is selected? To investigate the effect of sample size on the behavior of x̄, we selected 500 samples of size 10, 500 samples of size 20, and 500 samples of size 30. Density histograms of the resulting x̄ values were then constructed.

What do you notice about the means of these distributions?
What do you notice about the standard deviations of these distributions?
What do you notice about the shape of these distributions?
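The repeated-sampling experiment in the platelet volume example can be imitated with a short simulation. This is only a sketch: it assumes the population is exactly normal with μ = 8.25 and σ = 0.75, and the function name `simulate_sample_means` and the seed are illustrative choices, not part of the original study.

```python
import random
import statistics

MU, SIGMA = 8.25, 0.75   # population mean and sd from the platelet example
NUM_SAMPLES = 500        # number of samples drawn for each sample size


def simulate_sample_means(n, num_samples=NUM_SAMPLES, seed=0):
    """Return num_samples sample means, each from a random sample of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(num_samples)
    ]


for n in (5, 10, 20, 30):
    means = simulate_sample_means(n)
    print(
        f"n={n:2d}  mean of x-bar values={statistics.fmean(means):.3f}  "
        f"sd of x-bar values={statistics.stdev(means):.3f}"
    )
```

In a run of this kind the mean of the x̄ values should stay close to 8.25 for every n, while their standard deviation should shrink as n grows.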

The paper “Is the Overtime Period in an NHL Game Long Enough?” (American Statistician, 2008) gave data on the time (in minutes) from the start of the game to the first goal scored for the 281 regular season games from the season that went into overtime. The density histogram for the data is shown below. Let’s consider these 281 values as a population. The distribution is strongly positively skewed with mean μ = 13 minutes and a median of 10 minutes. Using a statistical software package, we selected 500 samples for each of the sample sizes n = 5, n = 10, n = 20, and n = 30. We then constructed a histogram of the 500 x̄ values for each of the four sample sizes.

Are these distributions centered at approximately μ = 13?
What do you notice about the standard deviations of these distributions?
What do you notice about the shape of these distributions?

General Properties of Sampling Distributions of x̄
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$. This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample.

General Properties Continued . .
Rule 3: When the population distribution is normal, the sampling distribution of x̄ is also normal for any sample size n.
[Figure: population distribution and sampling distributions of x̄ for n = 4 and n = 16]

General Properties Continued . . .
Rule 4: Central Limit Theorem. When n is sufficiently large, the sampling distribution of x̄ is well approximated by a normal curve, even when the population distribution is not itself normal.
How large is “sufficiently large” anyway? The CLT can safely be applied if n ≥ 30.
[Figure: population distribution and sampling distributions of x̄ for n = 4 and n = 16]
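A rough way to watch Rules 1, 2, and 4 at work is to sample from a skewed population. The sketch below assumes an exponential population with mean 13 (so σ = 13 as well), chosen only because it is strongly right-skewed like the overtime data; it is not the NHL data set. The helper names `sample_means` and `skewness` are illustrative.

```python
import random
import statistics


def sample_means(n, num_samples=500, pop_mean=13.0, seed=1):
    """Sample means from an assumed exponential population with mean pop_mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1 / pop_mean) for _ in range(n))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Average cubed z-score: about 0 for a symmetric set of values."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


for n in (5, 10, 20, 30):
    means = sample_means(n)
    print(
        f"n={n:2d}  mean={statistics.fmean(means):.2f}  "
        f"sd={statistics.stdev(means):.2f}  "
        f"sigma/sqrt(n)={13 / n ** 0.5:.2f}  "
        f"skewness={skewness(means):.2f}"
    )
```

Under these assumptions the center should stay near 13, the standard deviation should track σ/√n, and the skewness should move toward 0 as n increases.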

In a study of the courtship of mating scorpion flies, one variable of interest was x = courtship time, which was defined as the time from the beginning of a female-male interaction until mating. Data suggest that it is reasonable to think that the population mean and standard deviation of x are μ = 117.1 minutes and σ = 109.1 minutes.

Do you think the distribution of x, courtship time, is approximately normal? Explain.

The sampling distribution of x̄ = mean courtship time for a random sample of 20 scorpion fly mating pairs would have mean $\mu_{\bar{x}} = \mu = 117.1$ minutes. The standard deviation of x̄ is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{109.1}{\sqrt{20}} = 24.40$ minutes.

Because courtship time cannot be negative and σ is nearly as large as μ, the population distribution must be skewed rather than normal. With n = 20 (less than 30), it is not reasonable to assume that the shape of the sampling distribution of x̄ is normal.

x = fat content of a hot dog
A hot dog manufacturer claims that one of its brands of hot dogs has a mean fat content of μ = 18 grams per hot dog. Consumers of this brand would probably not be disturbed if the mean was less than 18 grams, but would be unhappy if it exceeded 18 grams. In this situation, the variable of interest is x = fat content of a hot dog. For purposes of this example, suppose we know that σ, the standard deviation of the x distribution, is equal to 1 gram. An independent testing organization is asked to analyze a random sample of 36 hot dogs. The fat content for each of the 36 hot dogs is measured and the sample mean is calculated to be x̄ = 18.4 grams. Does this result suggest that the manufacturer’s claim that the population mean is 18 is incorrect?

Hot Dogs Continued . . .
Let’s look at the sampling distribution of x̄:
The sample size, n = 36, is greater than 30, so the Central Limit Theorem applies and the sampling distribution of x̄ will be approximately normal.
The standard deviation of the x̄ distribution is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1}{\sqrt{36}} = 0.1667$.
If the manufacturer’s claim is correct, you also know that $\mu_{\bar{x}} = \mu = 18$.

Hot Dogs Continued . . .
You know that even if μ = 18, x̄ will not usually be exactly 18 due to sampling variability. But is it likely that you would see a sample mean at least as large as 18.4 when the population mean is really 18? Using the normal distribution, you can compute the probability of observing a sample mean this large. If the manufacturer’s claim is correct,

$P(\bar{x} \ge 18.4) \approx P\left(z \ge \dfrac{18.4 - 18}{0.1667}\right) = P(z \ge 2.4) = 0.0082$

Values of x̄ as large as 18.4 will be observed only about 0.82% of the time when a random sample of size 36 is taken from a population with μ = 18 and σ = 1. The value x̄ = 18.4 is enough greater than 18 that you should be skeptical of the manufacturer’s claim.
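The hot dog calculation can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and the normal model rests on the CLT condition (n = 36) already discussed.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 18.0    # claimed population mean (grams)
sigma = 1.0    # population standard deviation, assumed known in the example
n = 36         # sample size
x_bar = 18.4   # observed sample mean (grams)

sigma_x_bar = sigma / sqrt(n)           # standard deviation of x-bar
z = (x_bar - mu_0) / sigma_x_bar        # standardized sample mean
p_upper = 1 - NormalDist().cdf(z)       # P(z >= computed value)

print(f"sigma_x_bar = {sigma_x_bar:.4f}")
print(f"z = {z:.2f}")
print(f"P(x-bar >= 18.4) = {p_upper:.4f}")
```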

A Confidence Interval for a Population Mean
t distributions
A One-Sample t Confidence Interval for μ

Confidence intervals for μ when σ is known
The general formula for a confidence interval estimate is

statistic ± (critical value)(standard error of the statistic)

From Section 12.1, you know that:
The sampling distribution of x̄ is centered at μ.
The standard deviation of x̄ is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
As long as n is large (n ≥ 30), the sampling distribution of x̄ is approximately normal.

This suggests that a confidence interval for a population mean when the sample size is large and σ is known is

$\bar{x} \pm (z \text{ critical value}) \dfrac{\sigma}{\sqrt{n}}$

Here x̄ is the statistic that provides an estimate of μ, σ/√n is the standard error of x̄, and a z critical value is used because the sampling distribution of x̄ is approximately normal when n is large.

Cosmic radiation levels rise with increasing altitude, prompting researchers to consider how pilots and flight crews might be affected by increased exposure to cosmic radiation. A study reported a mean annual cosmic radiation dose of 219 mrem for a sample of flight personnel of Xinjiang Airlines. Suppose this mean is based on a random sample of 100 flight crew members and that σ = 35 mrem. Let:
μ = mean annual cosmic radiation exposure for all Xinjiang Airlines flight crew members
A 95% confidence interval for μ is

$\bar{x} \pm (z \text{ critical value}) \dfrac{\sigma}{\sqrt{n}} = 219 \pm (1.96) \dfrac{35}{\sqrt{100}} = (212.14, 225.86)$

Based on this sample, plausible values of μ, the mean annual cosmic radiation exposure for all Xinjiang Airlines flight crew members, are between 212.14 and 225.86 mrem.

It is almost never the case that you would not know the value of the population mean (which is why you would be using sample data to estimate it) but would know the value of the population standard deviation. For this reason, you will probably never use this confidence interval formula. Let’s see what to do when the population standard deviation σ is unknown.
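As a check on the arithmetic, the z interval for the cosmic radiation example can be computed directly. This sketch assumes σ is known, as the example does; the function name `z_confidence_interval` is a hypothetical helper, and the critical value comes from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Large-sample interval x_bar +/- z* sigma/sqrt(n), with sigma known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_confidence_interval(x_bar=219, sigma=35, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f}) mrem")
```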

Important Properties of t Distributions
Just as there are many different normal distributions, there are also many different t distributions. t distributions are distinguished by a positive whole number called degrees of freedom (df).
1) The t distribution corresponding to any particular number of degrees of freedom is bell shaped and centered at zero (just like the standard normal (z) distribution).
2) Each t distribution is more spread out than the standard normal distribution.
[Figure: z curve and t curve for 2 df]
Why is the z curve taller than the t curve for 2 df?

Important Properties of t Distributions Continued . . .
3) As the number of degrees of freedom increases, the spread of the corresponding t distribution decreases.
[Figure: t curve for 2 df and t curve for 8 df]

Important Properties of t Distributions Continued . . .
4) As the number of degrees of freedom increases, the corresponding sequence of t distributions approaches the standard normal distribution.
[Figure: z curve, t curve for 2 df, and t curve for 5 df]
For what df would the t distribution be approximately the same as a standard normal distribution?

Finding t critical values
Table 3 t Critical Values

Central area captured    .80     .90     .95     .98     .99     .998    .999
Confidence level         80%     90%     95%     98%     99%     99.8%   99.9%
df
 1                       3.08    6.31   12.71   31.82   63.66  318.31  636.62
 2                       1.89    2.92    4.30    6.97    9.93   22.33   31.60
 3                       1.64    2.35    3.18    4.54    5.84   10.21   12.92
 4                       1.53    2.13    2.78    3.75    4.60    7.17    8.61
 5                       1.48    2.02    2.57    3.37    4.03    5.89    6.86
 6                       1.44    1.94    2.45    3.14    3.71    5.21    5.96
 7                       1.42    1.90    2.37    3.00    3.50    4.79    5.41
 8                       1.40    1.86    2.31    2.90    3.36    4.50    5.04
 9                       1.38    1.83    2.26    2.82    3.25    4.30    4.78
10                       1.37    1.81    2.23    2.76    3.17    4.14    4.59

Suppose you wish to compute a 95% confidence interval using sample data. The sample size is n = 10, so df = 10 – 1 = 9. Reading the row for 9 df and the column for 95% confidence, the t critical value is 2.26.
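When a table is not at hand, a t critical value can be approximated numerically. The sketch below is one possible approach, not the method behind Table 3: it integrates the t density with Simpson's rule and uses bisection to find the value that captures the requested central area. All function names are illustrative, and the search is capped at 100, so very small df combined with very high confidence levels fall outside its range.

```python
from math import exp, lgamma, pi, sqrt


def t_density(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def central_area(t, df, intervals=4000):
    """Area under the t curve between -t and t, using Simpson's rule."""
    h = t / intervals
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 2 * total * h / 3  # double the area from 0 to t (symmetry)


def t_critical(df, confidence):
    """Bisection search on [0, 100] for the t value capturing `confidence`."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"df = 9, 95%: t critical value = {t_critical(9, 0.95):.2f}")
```

For df = 9 and 95% confidence this should agree with the table entry of 2.26.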

One-Sample t Confidence Interval for a Population Mean μ
This interval is used when the population standard deviation σ is unknown. It is appropriate when the following conditions are met:
1. The sample is a random sample from the population of interest or the sample is selected in a way that results in a sample that is representative of the population.
2. The sample size is large (n ≥ 30) or the population distribution is normal.

When these conditions are met, a confidence interval for the population mean is

$\bar{x} \pm (t \text{ critical value}) \dfrac{s}{\sqrt{n}}$

The t critical value is based on df = n – 1 and the desired confidence level. t critical values are found in Table 3 or by using technology.

One-Sample t Confidence Interval for a Population Mean μ Continued . . .
Interpretation of Confidence Interval: You can be confident that the actual value of the population mean is included in the computed interval. In a given problem, this statement should be worded in context.
Interpretation of Confidence Level: The confidence level specifies the long-run proportion of the time that this method is expected to be successful in capturing the actual population mean.

During a flu outbreak, many people visit emergency rooms. Before being treated, they often spend time in crowded waiting rooms where other patients may be exposed. A study was performed investigating a drive-through model where flu patients are evaluated while they remain in their cars. In the study, 38 people were each given a scenario for a flu case that was selected at random from the set of all flu cases actually seen in the emergency room. The scenarios provided the “patient” with a medical history and a description of symptoms that would allow the patient to respond to questions from the examining physician. The patients were processed using a drive-through procedure that was implemented in the parking structure of Stanford University Hospital. The time to process each case from admission to discharge was recorded. The following sample statistics were computed from the data:
n = 38, x̄ = 26 minutes, s = 1.57 minutes
Researchers were interested in estimating the mean processing time for flu patients using the drive-through model. Use 95% confidence to estimate this mean.

Drive-through Model Continued . . .
Step 1 (Estimate): You want to estimate the value of μ, the mean time to process a flu case using the new drive-through model.
Step 2 (Method): Because the answers to the four key questions are: 1) estimation, 2) sample data, 3) one numerical variable, and 4) one sample, consider using a one-sample t confidence interval for a population mean. A confidence level of 95% was specified for this example.

Drive-through Model Continued . . .
Step 3 (Check): Because the sample size is large (38 > 30), you do not need to worry about whether the population distribution is approximately normal. The sample is a random sample because the 38 flu cases were randomly selected from the population of all flu cases seen at the emergency room.
Step 4 (Calculate): t critical value (from Table 3) = 2.02

$\bar{x} \pm (t \text{ critical value}) \dfrac{s}{\sqrt{n}} = 26 \pm (2.02) \dfrac{1.57}{\sqrt{38}} = (25.486, 26.514)$

Drive-through Model Continued . . .
Step 5 (Communicate Results):
Confidence Interval: You can be 95% confident that the actual mean processing time for emergency room flu cases using the new drive-through model is between 25.486 minutes and 26.514 minutes.
Confidence Level: The method used to construct this interval estimate is successful in capturing the actual value of the population mean about 95% of the time.
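The Step 4 arithmetic for the drive-through example can be reproduced in a few lines. This is a sketch from the summary statistics only; the helper name `t_confidence_interval` is hypothetical, and the critical value 2.02 is supplied by hand from the example rather than computed.

```python
from math import sqrt


def t_confidence_interval(x_bar, s, n, t_crit):
    """One-sample t interval: x_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Summary statistics and table critical value from the drive-through example.
low, high = t_confidence_interval(x_bar=26, s=1.57, n=38, t_crit=2.02)
print(f"95% CI: ({low:.3f}, {high:.3f}) minutes")
```

Passing the critical value as an argument keeps the sketch independent of any particular table or software.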

In a study, seven chimpanzees learned to use an apparatus that dispensed food when either of two ropes was pulled. When one of the ropes was pulled, only the chimp controlling the apparatus received food. When the other rope was pulled, food was dispensed both to the chimp controlling the apparatus and also to a chimp in the adjoining cage. The accompanying data represent the number of times out of 36 trials that each of seven chimps chose the option that would provide food to both chimps. Construct a 99% confidence interval for the mean number of times out of 36 trials chimpanzees would choose the option that would provide food to both chimps.

Chimp Problem Continued . . .
Step 1 (Estimate): The mean number of times out of 36 trials that chimps choose the charitable response, μ, will be estimated.
Step 2 (Method): Because the answers to the four key questions are estimation, sample data, one numerical variable, and one sample, a one-sample t confidence interval for a population mean will be considered. A confidence level of 99% was specified.

Chimp Problem Continued . . .
Step 3 (Check): It was stated that it is reasonable to regard the sample as representative of the population. Because the sample size is small (n = 7), you need to consider whether it is reasonable to think that the distribution of the number of charitable responses out of 36 for the population of all chimps is at least approximately normal. This is typically done by plotting the data. The normal probability plot is reasonably straight, so it seems plausible that the population distribution is approximately normal.

Chimp Problem Continued . . .
Step 4 (Calculate): x̄ = 21.29, s = 1.80, and the t critical value for 99% confidence with df = 7 – 1 = 6 is 3.71.

$\bar{x} \pm (t \text{ critical value}) \dfrac{s}{\sqrt{n}} = 21.29 \pm (3.71) \dfrac{1.80}{\sqrt{7}} = (18.77, 23.81)$

Step 5 (Communicate Results): Based on this sample, you can be 99% confident that the population mean number of charitable responses (out of 36 trials) is between 18.77 and 23.81.

Choosing a Sample Size
The margin of error M associated with a 95% confidence level is

$M = 1.96 \dfrac{\sigma}{\sqrt{n}}$

Recall that the margin of error is the maximum likely estimation error expected when the statistic is used as an estimator. Solving for n gives the sample size required to estimate a population mean μ with a specified margin of error M:

$n = \left( \dfrac{1.96\,\sigma}{M} \right)^2$

If the value of σ is unknown, it may be estimated based on previous information or, for a population that is not skewed, by using range/4.

A college financial advisor wants to estimate the mean cost of textbooks per quarter for students at the college. For the estimate to be useful, it should have a margin of error of $20 or less. How large a sample should be used to be confident of achieving this level of accuracy? Suppose the financial advisor thinks that the amount spent on books varies widely, but that most values are between $150 and $550. A reasonable estimate of σ is

$\dfrac{\text{range}}{4} = \dfrac{550 - 150}{4} = 100$

Using this estimate of the population standard deviation, the required sample size is

$n = \left( \dfrac{1.96\,\sigma}{M} \right)^2 = \left( \dfrac{(1.96)(100)}{20} \right)^2 = 96.04$

Rounding up, a sample size of 97 or larger is recommended.
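The textbook-cost calculation translates directly into code. The sketch below assumes a 95% confidence level (z critical value 1.96) and the range/4 estimate of σ used in the example; the function name `required_sample_size` is illustrative.

```python
from math import ceil


def required_sample_size(sigma, margin, z_crit=1.96):
    """Smallest whole n with z_crit * sigma / sqrt(n) <= margin."""
    return ceil((z_crit * sigma / margin) ** 2)


sigma_estimate = (550 - 150) / 4     # range / 4 estimate of sigma
n = required_sample_size(sigma_estimate, margin=20)
print(f"estimated sigma = {sigma_estimate:.0f}, required n = {n}")
```

Rounding up with `ceil` matters here: rounding 96.04 to the nearest integer would give a margin of error slightly larger than $20.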

Testing Hypotheses About a Population Mean

Hypotheses: When testing hypotheses about a population mean, the null hypothesis will have the form H0: μ = μ0, where μ0 is a particular hypothesized value. The alternative hypothesis has one of the following three forms, depending on the research question being addressed:
Ha: μ > μ0
Ha: μ < μ0
Ha: μ ≠ μ0

Test Statistic
If n is large (n ≥ 30) or if the population distribution is approximately normal, the appropriate test statistic is

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

If the null hypothesis is true, this test statistic has a t distribution with df = n – 1. This means that the P-value for a hypothesis test about a population mean will be based on a t distribution and not the standard normal distribution.

Computing P-values
1. Upper-tailed test: Ha: μ > μ0. P-value = area in the upper tail (to the right of the calculated t).
2. Lower-tailed test: Ha: μ < μ0. P-value = area in the lower tail (to the left of the calculated t).
3. Two-tailed test: Ha: μ ≠ μ0. P-value = sum of the areas in the two tails (beyond the calculated –t and t).
[Figure: t curves with the P-value area shaded for each of the three cases]

The One-Sample t-test for a Population Mean
Appropriate when the following conditions are met:
1. The sample is a random sample from the population of interest or the sample is selected in a way that results in a sample that is representative of the population.
2. The sample size is large (n ≥ 30) or the population distribution is normal.

When these conditions are met, the following test statistic can be used:

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

where μ0 is the hypothesized value from the null hypothesis.

The One-Sample t-test for a Population Mean Continued . . .
Null hypothesis: H0: μ = μ0
When the conditions are met and the null hypothesis is true, the t test statistic has a t distribution with df = n – 1.

When the Alternative Hypothesis Is . . .    The P-value Is . . .
Ha: μ > μ0                                  Area under the t curve to the right of the calculated value of the test statistic
Ha: μ < μ0                                  Area under the t curve to the left of the calculated value of the test statistic
Ha: μ ≠ μ0                                  2·(area to the right of t) if t is positive, or 2·(area to the left of t) if t is negative
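The three P-value rules can be written as one small function. The sketch below is an assumption-laden illustration rather than a standard routine: it approximates tail areas by integrating the t density with Simpson's rule, and the function names and the '>', '<', '!=' labels for the alternative are made up for this example. Statistical software would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt


def t_density(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def area_zero_to(t, df, intervals=4000):
    """Area under the t curve between 0 and t (t >= 0), by Simpson's rule."""
    h = t / intervals
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return total * h / 3


def p_value(t, df, alternative):
    """P-value for Ha of the form '>', '<', or '!=' (two-tailed)."""
    one_tail = 0.5 - area_zero_to(abs(t), df)  # area beyond |t| in one tail
    if alternative == "!=":
        return 2 * one_tail
    if alternative == ">":                      # area to the right of t
        return one_tail if t >= 0 else 1 - one_tail
    if alternative == "<":                      # area to the left of t
        return one_tail if t <= 0 else 1 - one_tail
    raise ValueError("alternative must be '>', '<', or '!='")


print(f"two-tailed P-value for t = 2.262, df = 9: {p_value(2.262, 9, '!='):.3f}")
```

The symmetry of the t curve about zero is what lets a single one-tail area serve all three cases.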

A study conducted by researchers at Pennsylvania State University investigated whether time perception, an indication of a person’s ability to concentrate, is impaired during nicotine withdrawal. After a 24-hour smoking abstinence, 20 smokers were asked to estimate how much time had elapsed during a 45-second period. Researchers wanted to see whether smoking abstinence had a negative impact on time perception, causing elapsed time to be overestimated. Suppose the resulting data on perceived elapsed time (in seconds) were as follows:
69 65 72 73 59 55 39 52 67 57 56 50 70 47 45 64 53
What are the mean and standard deviation of the sample?
n = 20, x̄ = 59.30, s = 9.84

Smoking Abstinence Continued . . .
Step 1 (Hypotheses): The population mean is μ = mean perceived elapsed time for smokers who have abstained from smoking for 24 hours.
Null hypothesis: H0: μ = 45
Alternative hypothesis: Ha: μ > 45
Step 2 (Method): Because the answers to the four key questions are: 1) hypothesis test, 2) sample data, 3) one numerical variable, and 4) one sample, consider a one-sample t test for a population mean. When the null hypothesis is true, the test statistic will have a t distribution with df = 20 – 1 = 19. You should choose a significance level based on a consideration of the consequences of Type I and Type II errors. In this situation, because neither type of error is much more serious than the other, a value for α of 0.05 is a reasonable choice.
Significance level: α = 0.05

Step 3 (Check): The researchers conducting the study indicated that they believed that the sample was selected in a way that would result in a sample that was representative of all smokers in general. Because n is only 20 in this example, you need to verify that the normality condition is reasonable. A boxplot of the sample data is shown here. Although the boxplot is not perfectly symmetric, it is not too skewed and there are no outliers. It is reasonable to think that the population distribution is at least approximately normal.

Step 4 (Calculate): n = 20, x̄ = 59.30, s = 9.84
Test statistic:

$t = \dfrac{59.30 - 45}{9.84 / \sqrt{20}} = 6.50$

This is an upper-tailed test, so the P-value is the area under the t curve with df = 19 to the right of the computed t value.
Associated P-value: P-value = area under the t curve to the right of 6.50 = P(t > 6.50) ≈ 0

Step 5 (Communicate Results):
Decision: Because the P-value ≈ 0 < 0.05, reject H0.
Conclusion: There is convincing evidence that the mean perceived time elapsed is greater than the actual time elapsed of 45 seconds. It would be very unlikely to see a sample mean this extreme just by chance when H0 is true.
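The test statistic for the smoking abstinence example can be recomputed from the summary statistics. This is a minimal sketch; `one_sample_t` is a hypothetical helper name, and the P-value itself would still come from a t table or software.

```python
from math import sqrt


def one_sample_t(x_bar, s, n, mu_0):
    """Return the one-sample t statistic and its degrees of freedom."""
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1


t, df = one_sample_t(x_bar=59.30, s=9.84, n=20, mu_0=45)
print(f"t = {t:.2f} with df = {df}")
```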

Statistical Versus Practical Significance
Carrying out a hypothesis test amounts to deciding whether the value obtained for the test statistic could plausibly have resulted when H0 is true. When the value of the test statistic leads to rejection of H0, it is customary to say that the result is statistically significant at the chosen significance level α. However, statistical significance does NOT mean that the true situation differs from what the null hypothesis states in any practical sense. See the following example.

Let μ denote the actual mean score on a standardized test for children in a large school district. The mean score for all children in the United States is known to be 100. District administrators are interested in testing H0: μ = 100 versus Ha: μ > 100 at a chosen significance level α. Data from a random sample of 2500 children in the district resulted in x̄ = 101 and s = 15. Minitab output from a one-sample t test is shown here:

One-Sample T
Test of mu = 100 vs > 100
   N     Mean    StDev   SE Mean   95% Lower Bound      T       P
2500  101.000   15.000     0.300                     3.33   0.000

From the Minitab output, P-value ≈ 0, so H0 is rejected. But there is only a difference of 1 between the sample mean of 101 and the hypothesized population mean of 100. From a practical point of view, a 1-point difference here may not be important.
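To see why such a small difference can still be statistically significant, it helps to hold the 1-point difference and s = 15 fixed and vary only the sample size. In the sketch below, n = 2500 is the value from the example; the other sample sizes are hypothetical comparisons added for illustration.

```python
from math import sqrt

MU_0 = 100    # hypothesized mean
X_BAR = 101   # observed sample mean
S = 15        # sample standard deviation


def t_statistic(n):
    """t statistic for the fixed 1-point difference at sample size n."""
    return (X_BAR - MU_0) / (S / sqrt(n))


# Only n = 2500 comes from the example; 25 and 100 are for comparison.
for n in (25, 100, 2500):
    print(f"n={n:4d}  SE of mean={S / sqrt(n):.3f}  t={t_statistic(n):.2f}")
```

The difference being tested never changes; only the standard error shrinks, which is why a very large sample can flag a practically unimportant difference.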

Avoid These Common Mistakes

You will need to think about the distinction between proportions and means when choosing an appropriate method. The best way to decide which method to use is to focus on the type of data:
Categorical data: think proportions.
Numerical data: think means.

Keep in mind that conditions are important. Use of the one-sample t confidence interval and hypothesis test REQUIRES that the conditions are met. Be sure to check that these conditions are met before using these methods.

Remember that the results of a hypothesis test can never provide strong support for the null hypothesis. Make sure that you don’t confuse “I am not convinced that the null is false” with the statement “I am convinced that the null hypothesis is true”. These are not the same!
