Download presentation
Presentation is loading. Please wait.
Published byElla Wilkins Modified over 8 years ago
1
Chapter 11 Inference with Means Sampling Distribution, Confidence Intervals, and Hypothesis Tests
2
The campus of Fritz University has a fish pond. Suppose there are 20 fish in the pond. The lengths of the fish (in inches) are given below: 4.55.410.37.98.56.611.78.92.29.8 6.34.39.68.713.34.610.713.47.75.6 We caught fish with lengths 6.3 inches, 2.2 inches, and 13.3 inches. x = 7.27 inches Let’s catch two more samples and look at the sample means. 2 nd sample - 8.5, 4.6, and 5.6 inches. x = 6.23 inches 3 rd sample – 10.3, 8.9, and 13.4 inches. x = 10.87 inches Suppose we randomly catch a sample of 3 fish from this pond and measure their length. What would the mean length of the sample be? This is an example of sampling variability This is a statistic! The true mean = 8. Notice that some sample means are closer and some farther away; some above and some below the mean.
3
Fish Pond Continued... 4.55.410.37.98.56.611.78.92.29.8 6.34.39.68.713.34.610.713.47.75.6 There are 1140 ( 20 C 3 ) different possible samples of size 3 from this population. If we were to catch all those different samples and calculate the mean length of each sample, we would have a distribution of all possible x. This would be the sampling distribution of x.
4
Sampling Distributions of x The distribution that would be formed by considering the value of a sample statistic for every possible sample of a given size from a population. In this case, the sample statistic is the sample mean x.
5
Fish Pond Revisited... Suppose there are only 5 fish in the pond. The lengths of the fish (in inches) are given below: x = 7.84 x = 3.262 What is the mean and standard deviation of this population? 6.611.78.92.29.8 We will keep the population size small so that we can find ALL the possible samples.
6
Fish Pond Revisited... What is the mean and standard deviation of these sample means? 6.611.78.92.29.8 Pairs 6.6 & 11.7 6.6 & 8.9 6.6 & 2.2 6.6 & 9.8 11.7 & 8.9 11.7 & 2.2 11.7 & 9.8 8.9 & 2.2 8.9 & 9.8 2.2 & 9.8 x9.157.754.48.210.36.9510.755.559.356 x = 7.84 x = 1.998 How do these values compare to the population mean and standard deviation? x = 7.84 and x = 3.262 Let’s find all the samples of size 2. These values determine the sampling distribution of x for samples of size 2.
7
Fish Pond Revisited... 6.611.78.92.29.8 Triples 6.6, 11.7, 8.9 6.6, 11.7, 2.2 6.6, 11.7, 9.8 6.6, 8.9, 2.2 6.6, 8.9, 9.8 6.6, 2.2, 9.8 11.7, 8.9, 2.2 11.7, 8.9, 9.8 11.7, 2.2, 9.8 8.9, 2.2, 9.8 x9.0676.8339.3675.98.4336.27.610.1337.96.967 x = 7.84 x = 1.332 How do these values compare to the population mean and standard deviation? x = 7.84 and x = 3.262 Now let’s find all the samples of size 3. What is the mean and standard deviation of these sample means? These values determine the sampling distribution of x for samples of size 3.
8
What do you notice? The mean of the sampling distribution EQUALS the mean of the population. As the sample size increases, the standard deviation of the sampling distribution decreases. x = x = as n xx
9
General Properties of Sampling Distributions of x Rule 1: Rule 2: This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample Note that in the previous fish pond examples this standard deviation formula was not correct because the sample sizes were more than 10% of the population.
10
Sampling Distribution Activity – Day 1
11
General Properties Continued... Rule 3: When the population distribution is normal, the sampling distribution of x is also normal for any sample size n.
12
The paper “Mean Platelet Volume in Patients with Metabolic Syndrome and Its Relationship with Coronary Artery Disease” (Thrombosis Research, 2007) includes data that suggests that the distribution of platelet volume of patients who do not have metabolic syndrome is approximately normal with mean = 8.25 and standard deviation = 0.75. We can use Minitab to generate random samples from this population. We will generate 500 random samples of n = 5 and compute the sample mean for each.
13
Platelets Continued... Similarly, we will generate 500 random samples of n = 10, n = 20, and n = 30. The density histograms below display the resulting 500 x for each of the given sample sizes. What do you notice about the means of these histograms? What do you notice about the standard deviation of these histograms? What do you notice about the shape of these histograms?
14
General Properties Continued... Rule 4: Central Limit Theorem When n is sufficiently large, the sampling distribution of x is well approximated by a normal curve, even when the population distribution is not itself normal. CLT can safely be applied if n exceeds 30. How large is “sufficiently large” anyway?
15
The paper “Is the Overtime Period in an NHL Game Long Enough?” (American Statistician, 2008) gave data on the time (in minutes) from the start of the game to the first goal scored for the 281 regular season games from the 2005-2006 season that went into overtime. The density histogram for the data is shown below. Let’s consider these 281 values as a population. The distribution is strongly positively skewed with mean = 13 minutes and with a median of 10 minutes. Using Minitab, we will generate 500 samples of the following sample sizes from this distribution: n = 5, n = 10, n = 20, n = 30.
16
These are the density histograms for the 500 samples Are these histograms centered at approximately = 13? What do you notice about the standard deviations of these histograms? What do you notice about the shape of these histograms?
17
A hot dog manufacturer asserts that one of its brands of hot dogs has a average fat content of 18 grams per hot dog with standard deviation of 1 gram. Consumers of this brand would probably not be disturbed if the mean was less than 18 grams, but would be unhappy if it exceeded 18 grams. An independent testing organization is asked to analyze a random sample of 36 hot dogs. Suppose the resulting sample mean is 18.4 grams. Does this result suggest that the manufacturer’s claim is incorrect? Since the sample size is greater than 30, the Central Limit Theorem applies. So the distribution of x is approximately normal with
18
Hot Dogs Continued... Suppose the resulting sample mean is 18.4 grams. Does this result suggest that the manufacturer’s claim is incorrect? P(x > 18.4) = Yes, the probability of getting a sample mean of 18.4 is small enough to cause us to doubt that the manufacturer’s claim is correct. What do we call this?
19
Now let’s look at confidence intervals to estimate the mean of a population.
20
Cosmic radiation levels rise with increasing altitude, promoting researchers to consider how pilots and flight crews might be affected by increased exposure to cosmic radiation. A study reported a mean annual cosmic radiation dose of 219 mrems with a standard deviation of 35 mrems for a sample of flight personnel of Xinjiang Airlines. Suppose this mean is based on a random sample of 100 flight crew members. Remember that: Are these values parameters or statistics? What value would you use for ? The use of the value of s introduces extra variability. Therefore, we will use the distribution of t values since it has more variability than a standard normal curve.
21
What are t-distributions? Created by William Gosset Let’s look at some graphs of t-distributions
22
Important Properties of t Distributions 1)The t distribution corresponding to any particular number of degrees of freedom is bell shaped and centered at zero (just like the standard normal (z) distribution). 2)Each t distribution is more spread out than the standard normal distribution. z curve t curve for 2 df Why is the z curve taller than the t curve for 2 df? 0 t distributions are described by degrees of freedom (df).
23
Important Properties of t Distributions Continued... 3) As the number of degrees of freedom increases, the spread of the corresponding t distribution decreases. 4) As the number of degrees of freedom increases, the corresponding sequence of t distributions approaches the standard normal distribution. z curve t curve for 2 df t curve for 5 df For what df would the t distribution be approximately the same as a standard normal distribution? 0
24
Confidence intervals for when is unknown
25
Confidence intervals for when is unknown The general formula for a confidence interval for a population mean based on a sample of size n is... Where the t critical value is based on df = n - 1. But this is unknown – so what do we use? What is this called? Standard error What is this called? estimate What is this called? Margin of error
26
How to find t* Use Table B for t distributions Look up confidence level at bottom & df on the sides df = n – 1 Find these t* 90% confidence when n = 5 95% confidence when n = 15 t* = 2.132 t* = 2.145 Can also use invT on the calculator!
27
Steps for a Confidence Interval Verify assumptions –Random sample –Distribution is (approximately) normal AND why Given Large sample size Graph data Calculate interval Write conclusion in context We are ____% confident that the true mean ______ is between ___ and ___. You MUST show the graph We typically use either a boxplot or a normal probability plot to assess normality.
28
Cosmic radiation levels rise with increasing altitude, promoting researchers to consider how pilots and flight crews might be affected by increased exposure to cosmic radiation. A study reported a mean annual cosmic radiation dose of 219 mrems and a standard deviation of 35 mrems for a sample of flight personnel of Xinjiang Airlines. Suppose this mean is based on a random sample of 100 flight crew members. Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members. 1)Data is from a random sample of crew members 2)Sample size n is large (n > 30) First, verify that the conditions are met.
29
Cosmic Radiation Continued... Let x = 219 mrems n = 100 flight crew members s = 35 mrems. Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members. What does this mean in context? We are 95% confident that the actual mean annual cosmic radiation exposure for Xinjiang flight crew members is between 212.06 mrems and 225.94 mrems. What would happen to the width of this interval if the confidence level was 90% instead of 95%?
30
The article “Chimps Aren’t Charitable” (Newsday, November 2, 2005) summarized the results of a research study published in the journal Nature. In this study, chimpanzees learned to use an apparatus that dispersed food when either of two ropes was pulled. When one of the ropes was pulled, only the chimp controlling the apparatus received food. When the other rope was pulled, food was dispensed both to the chimp controlling the apparatus and also a chimp in the adjoining cage. The accompanying data represent the number of times out of 36 trials that each of seven chimps chose the option that would provide food to both chimps (charitable response). 23222124192020 Compute a 99% confidence interval for the mean number of charitable responses for the population of all chimps. First verify that the conditions for a t-interval are met.
31
The plot is reasonable straight, so it seems plausible that the population distribution of number of charitable responses is approximately normal. Number of Charitable Responses Normal Scores Chimps Continued... 23222124192020 Since n is small, we need to verify if it is plausible that this sample is from a population that is approximately normal. Let’s use a normal probability plot. Let’s suppose it is reasonable to regard this sample of seven chimps as representative of the chimp population.
32
Chimps Continued... 23222124192020 x = 21.29 and s = 1.80df = 7 – 1 = 6 We are 99% confident that the mean number of charitable responses for the population of all chimps is between 18.77 and 23.81.
33
The margin of error associated with a confidence interval is Solve this for n: Choosing a Sample Size We can use this to find the necessary sample size for a particular margin of error. This requires to be known – since it is rarely known, we MUST assume it is some specified value.
34
The financial aid office wishes to estimate the mean cost of textbooks per quarter for students at a particular university. For the estimate to be useful, it should be within $20 of the true population mean. How large a sample should be used to be 95% confident of achieving this level of accuracy? Assume = $100 Always round sample size up to the next whole number!
35
Hypothesis Tests for a Population Mean
36
Let’s review the assumptions for a confidence interval for a population mean 2) The distribution is (approximately) normal because sample size n is large (n > 30) or the graph indicates it’s plausible, and 3) , the population standard deviation, is unknown 1) x is the sample mean from a random sample, The assumptions are the same for a large-sample hypothesis test for a population mean.
37
The One-Sample t-test for a Population Mean Null hypothesis:H 0 : = hypothesized value Alternative Hypothesis: H a : > hypothesized value H a : < hypothesized value H a : ≠ hypothesized value Test Statistic:
38
How to find a p-value To find a p-value: Use tcdf(LB, UB, df) Find the p-value for a one-sided test with a test statistic t = -1.5 when n = 15. tcdf(-∞,-1.5,14) =.0779 What would you do if this were a two-sided test?
39
A study conducted by researchers at Pennsylvania State University investigated whether time perception, an indication of a person’s ability to concentrate, is impaired during nicotine withdrawal. After a 24-hour smoking abstinence, 20 smokers were asked to estimate how much time had elapsed during a 45-second period. Researchers wanted to see whether smoking abstinence had a negative impact on time perception, causing elapsed time to be overestimated. Suppose the resulting data on perceived elapsed time (in seconds) were as follows: 695572735955395257 56507047564570545753 x = 57.30 s = 9.30 n = 20 What is the mean and standard deviation of the sample?
40
Smoking Abstinence Continued... H 0 : = 45 H a : > 45 Assumptions: 1)It is reasonable to believe that the sample of smokers is representative of all smokers. 695572735955395257 56507047564570545753 x = 57.30 s = 9.3 n = 20 Where is the true mean perceived elapsed time for smokers who have abstained from smoking for 24-hours State the hypotheses. Verify assumptions. 50406070 2) Since the sample size is not at least 30, we must determine if it is plausible that the population distribution is approximately normal. To do this, we need to graph the data using a boxplot or normal probability plot Since the boxplot is approximately symmetrical, it is plausible that the population distribution is approximately normal.
41
Smoking Abstinence Continued... H 0 : = 45 H a : > 45 Test statistic: P-value ≈ 0 =.05 Since P-value < , we reject H 0. There is convincing evidence that the mean perceived elapsed time is greater than the actual elapsed time of 45 seconds. 695572735955395257 56507047564570545753 x = 57.30 s = 9.3 n = 20 Where is the true mean perceived elapsed time for smokers who have abstained from smoking for 24-hours Compute the test statistic and P-value.
42
Smoking Abstinence Continued... H 0 : = 45 H a : > 45 Since P-value < , we reject H 0. =.05 x = 57.30 s = 9.3 n = 20 Where is the true mean perceived elapsed time for smokers who have abstained from smoking for 24-hours Compute the appropriate confidence interval. Since this is a one-tailed test, goes in the upper tail..05 goes in the lower tail, leaving.90 in the middle. Notice that the hypothesized value of 45 is NOT in the 90% confidence interval and that we “rejected” H 0 ! 695572735955395257 56507047564570545753
43
A growing concern of employers is time spent in activities like surfing the Internet and emailing friends during work hours. The San Luis Obispo Tribune summarized the findings of a large survey of workers in an article that ran under the headline “Who Goofs Off More than 2 Hours a Day? Most Workers, Survey Says” (August 3, 2006). Suppose that the CEO of a large company wants to determine whether the average amount of wasted time during an 8-hour day for employees of her company is less than the reported 120 minutes. Each person in a random sample of 10 employees was contrasted and asked about daily wasted time at work. The resulting data are the following: 108112117130111131113 105128 x = 116.80 s = 9.45 n = 10 What is the mean and standard deviation of the sample?
44
H 0 : = 120 H a : < 120 Assumptions: 1) The given sample was a random sample of employees Surfing Internet Continued... 108112117130111131113 105128 x = 116.80 s = 9.45 n = 10 State the hypotheses. Where is the true mean daily wasted time for employees of this company Verify the assumptions. 2) Since the sample size is not at least 30, we must determine if it is plausible that the population distribution is approximately normal. The boxplot reveals some skewness, but there is no outliers. It is plausible that the population distribution is approximately normal. 110120130
45
H 0 : = 120 H a : < 120 Test Statistic: P-value =.150 =.05 Since p-value > , we fail to reject H 0. There is not sufficient evidence to conclude that the mean daily wasted time for employees of this company is less than 120 minutes. Surfing Internet Continued... 108112117130111131113 105128 x = 116.80 s = 9.45 n = 10 Where is the true mean daily wasted time for employees of this company Compute the test statistic and P-value. What potential error could we have made? Type II
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.