
9.3 Tests about a Population Mean

Objectives (SWBAT):
STATE and CHECK the Random, 10%, and Normal/Large Sample conditions for performing a significance test about a population mean.
PERFORM a significance test about a population mean.
USE a confidence interval to draw a conclusion for a two-sided test about a population parameter.
PERFORM a significance test about a mean difference using paired data.

What are the three conditions for conducting a significance test for a population mean? How are these different from the conditions for calculating a confidence interval for a population mean?

Conditions for Performing a Significance Test about a Mean
Random: The data come from a well-designed random sample or randomized experiment.
10%: When sampling without replacement, check that n ≤ (1/10)N.
Normal/Large Sample: The population has a Normal distribution or the sample size is large (n ≥ 30). If the population distribution has unknown shape and n < 30, use a graph of the sample data to assess the Normality of the population. Do not use t procedures if the graph shows strong skewness or outliers.

The conditions are the same as the conditions for constructing a confidence interval for a population mean.
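The condition checks above can be sketched as a small helper. This is only an illustration: the function name `check_conditions` and its parameters are made up for this sketch, and the Random and graph checks still have to be judged by a person and passed in as True/False.

```python
def check_conditions(n, population_size=None, random_design=True,
                     population_normal=False, graph_ok=False):
    """Report whether the Random, 10%, and Normal/Large Sample conditions hold.

    n                 -- sample size
    population_size   -- N, or None when the 10% check does not apply
                         (for example, a randomized experiment)
    random_design     -- True if the data come from a random sample
                         or randomized experiment (a human judgment)
    population_normal -- True if the population is known to be Normal
    graph_ok          -- True if a graph of the sample shows no strong
                         skewness or outliers (a human judgment)
    """
    results = {"Random": random_design}

    # 10% condition: only needed when sampling without replacement.
    if population_size is None:
        results["10%"] = True
    else:
        results["10%"] = n <= population_size / 10

    # Normal/Large Sample: Normal population, OR n >= 30,
    # OR (small n) a graph with no strong skewness or outliers.
    results["Normal/Large Sample"] = population_normal or n >= 30 or graph_ok
    return results


# Hypothetical settings, chosen only to exercise each branch.
print(check_conditions(n=50, population_size=400))   # 50 > 40, so 10% fails
print(check_conditions(n=24, graph_ok=True))         # small n, graph looks fine
```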

What test statistic do we use when testing a population mean? Is the formula on the formula sheet?

As with proportions, all we get on the formula sheet is the generic form:

$$\text{standardized test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$$

Reminder: we always operate under the assumption that the null hypothesis is true. Also, with means, we use the t distribution unless, in a rare instance, we know the population standard deviation. Because the population standard deviation σ is usually unknown, we use the sample standard deviation $s_x$ in its place. The resulting test statistic has the standard error of the sample mean in the denominator:

$$t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$$

When the Normal condition is met, this statistic has a t distribution with n − 1 degrees of freedom.
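The one-sample t statistic described above (sample mean minus the null value, divided by the standard error of the sample mean) can be computed from summary statistics as in the sketch below. The function name `one_sample_t` and the numbers passed to it are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """Return (t, df) for a one-sample t test from summary statistics."""
    standard_error = s / math.sqrt(n)   # standard error of the sample mean
    t = (xbar - mu0) / standard_error   # computed assuming H0 (mu = mu0) is true
    return t, n - 1                     # degrees of freedom are n - 1


# Hypothetical values: xbar = 52, mu0 = 50, s = 6, n = 36
t, df = one_sample_t(xbar=52.0, mu0=50.0, s=6.0, n=36)
print(t, df)
```

With these made-up values the standard error is 6/√36 = 1, so the statistic is simply the difference 52 − 50.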

How do you calculate p-values using the t distributions?

You can use either the table or the calculator to get a p-value. Let’s look at an example. A battery company wants to test $H_0: \mu = 30$ versus $H_a: \mu > 30$ based on an SRS of 15 new AAA batteries. The sample mean lifetime and sample standard deviation give a test statistic of t = 1.54. The P-value is the probability of getting a result this large or larger in the direction indicated by $H_a$, that is, P(t ≥ 1.54).

Go to the df = 14 row. Since the t statistic falls between the values 1.345 and 1.761, the “Upper-tail probability p” is between 0.10 and 0.05. The P-value for this test is between 0.05 and 0.10.

Upper-tail probability p
df      .10      .05      .025
13      1.350    1.771    2.160
14      1.345    1.761    2.145
15      1.341    1.753    2.131
        80%      90%      95%
Confidence level C

Reminder: for two-sided tests, double the p-value.
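For readers without the table or a graphing calculator, the upper-tail probability for the battery example (t = 1.54, df = 14) can be approximated with the standard library alone. The sketch below integrates the t density numerically with Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are made up for this illustration, and the method is a stand-in for the calculator's tcdf, not a description of how tcdf works internally.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) using Simpson's rule (steps must be even).

    The t density is symmetric about 0, so
    P(T >= t) = 0.5 - (area under the density from 0 to t).
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2      # Simpson weights: 4, 2, 4, 2, ...
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_value = t_upper_tail(1.54, 14)
print(round(p_value, 4))   # should land between 0.05 and 0.10, as the table says
```

As a sanity check, `t_upper_tail(1.761, 14)` should come out very close to 0.05, matching the df = 14 row of the table.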

Alternate Example: Short Subs

Abby and Raquel like to eat sub sandwiches. However, they noticed that the lengths of the “6-inch sub” sandwiches they get at their favorite restaurant seemed shorter than the advertised length. To investigate, they randomly selected 24 different times during the next month and ordered a “6-inch” sub. Here are the actual lengths of each of the 24 sandwiches (in inches):

4.50  4.75  4.75  5.00  5.00  5.00  5.50  5.50
5.50  5.50  5.50  5.50  5.75  5.75  5.75  6.00
6.00  6.00  6.00  6.00  6.50  6.75  6.75  7.00

P-value for the Short Subs example, with t = −2.38 and df = 24 − 1 = 23:

Calculator: tcdf(lower: −10000, upper: −2.38, df: 23) = 0.0130

Alternate Example: Short Subs (continued)

b) Given your conclusion in part (a), which kind of mistake, a Type I or a Type II error, could you have made? Explain what this mistake would mean in context.

Because we rejected the null hypothesis, it is possible that we made a Type I error. In other words, it is possible that we found convincing evidence that the mean length was less than 6 inches when in reality the mean length is 6 inches.

To do this on the calculator:
Start by entering the 24 sandwich lengths into a list.
STAT > TESTS > 2: T-Test
List: wherever you put the data
Calculate
t = −2.407, p = 0.0123
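The same test statistic can be reproduced from the raw data without a calculator. The sketch below assumes the hypotheses implied by the example's conclusion ($H_0: \mu = 6$ versus $H_a: \mu < 6$) and uses only Python's standard library; the variable names are illustrative.

```python
import math
import statistics

# The 24 sandwich lengths (inches) from the Short Subs example
lengths = [4.50, 4.75, 4.75, 5.00, 5.00, 5.00, 5.50, 5.50,
           5.50, 5.50, 5.50, 5.50, 5.75, 5.75, 5.75, 6.00,
           6.00, 6.00, 6.00, 6.00, 6.50, 6.75, 6.75, 7.00]

mu0 = 6.0                          # advertised length under H0 (assumed null value)
n = len(lengths)
xbar = statistics.mean(lengths)    # sample mean
s = statistics.stdev(lengths)      # sample standard deviation (divides by n - 1)
t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

print(f"n = {n}, mean = {xbar:.3f}, s = {s:.3f}, t = {t:.3f}, df = {df}")
```

Working from the unrounded data gives the calculator's t ≈ −2.407. The slightly different value t = −2.38 used with tcdf most likely comes from rounding the mean and standard deviation before dividing.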

Can you use your calculator for the Do step? Are there any drawbacks? As we’ve seen, yes, you can use your calculator. However, if you just show calculator results with no work, and one or more values are wrong, you won’t get any credit for the “do” step. If you opt for the calculator-only method, name the procedure (t test) and report the test statistic, degrees of freedom, and p-value.

Alternate Example: Don’t Break the Ice

In the children’s game Don’t Break the Ice, small plastic ice cubes are squeezed into a square frame. Each child takes turns tapping out a cube of “ice” with a plastic hammer, hoping that the remaining cubes don’t collapse. For the game to work correctly, the cubes must be big enough so that they hold each other in place in the plastic frame but not so big that they are too difficult to tap out. The machine that produces the plastic cubes is designed to make cubes that are 29.5 millimeters (mm) wide, but the actual width varies a little. To ensure that the machine is working well, a supervisor inspects a random sample of 50 cubes every hour and measures their width. The Fathom output summarizes the data from a sample taken during one hour.

a) Interpret the standard deviation and the standard error provided by the computer output.

Standard deviation: The widths of the cubes are typically about 0.090 mm from the mean width (29.4943 mm).
Standard error: In random samples of size 50, the sample mean will typically be about 0.013 mm from the true mean.
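The two numbers in part (a) are linked: the standard error is the sample standard deviation divided by √n. A quick check, using the rounded standard deviation of 0.090 mm quoted above (the computer output itself may carry more decimal places):

```python
import math

s = 0.090    # sample standard deviation of the widths (mm), as quoted in the text
n = 50       # cubes inspected each hour

standard_error = s / math.sqrt(n)   # typical distance of the sample mean from the true mean
print(round(standard_error, 4))
```

Rounded to three decimal places, this agrees with the 0.013 mm standard error in the interpretation above.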

Alternate Example: Don’t Break the Ice (continued)

b) There are two possible explanations for why the sample mean (29.4943 mm) differs from the target of 29.5 mm:
1) The true mean really is not 29.5 mm.
2) The true mean really is 29.5 mm, but we got a sample mean of 29.4943 mm by random chance.


Alternate Example: Don’t Break the Ice (continued)

P-value: The table has no row for df = 49, so use the next smaller value available, df = 40. With t = −0.4595, the tail probability is greater than 0.25, so the two-sided P-value is greater than 2(0.25) = 0.50.

Using technology: With df = 49, the calculator’s t test gives a P-value of 2(0.32395) = 0.6479.

On the calculator: T-Test (Stats input)
t = −0.4595, p = 0.6479, df = 50 − 1 = 49

Paired Data

Comparative studies are more convincing than single-sample investigations. For that reason, one-sample inference is less common than comparative inference. Study designs that involve making two observations on the same individual, or one observation on each of two similar individuals, result in paired data.

When paired data result from measuring the same quantitative variable twice, we can make comparisons by analyzing the differences in each pair. If the conditions for inference are met, we can use one-sample t procedures to perform inference about the mean difference $\mu_d$. These methods are sometimes called paired t procedures.

Our null hypothesis will be that there is no mean difference between the two items being compared ($H_0: \mu_d = 0$). Our alternative will be that there is a mean difference, for example $H_a: \mu_d > 0$ for a one-sided test; the direction depends on the question and on the order of subtraction. Be sure to define the order in which you are subtracting.

Alternate Example: Is the express lane faster?

For their second semester project in AP Statistics, Libby and Kathryn decided to investigate which line was faster in the supermarket: the express lane or the regular lane. To collect their data, they randomly selected 15 times during a week, went to the same store, and bought the same item. However, one of them used the express lane and the other used a regular lane. To decide which lane each of them would use, they flipped a coin. If it was heads, Libby used the express lane and Kathryn used the regular lane. If it was tails, Libby used the regular lane and Kathryn used the express lane. They entered their randomly assigned lanes at the same time, and each recorded the time in seconds it took them to complete the transaction. Carry out a test to see if there is convincing evidence that the express lane is faster.

Since these data are paired, we will consider the differences in time (regular − express). Here are the 15 differences. In this case, a positive difference means that the express lane was faster.

5  246  −46  121  30  55  79  −94
−17  95  20  14  129  −39  42

Now find the mean difference by adding up these differences and dividing by 15: 640/15 ≈ 42.7 seconds.


On the calculator: T-Test (Data input)
µ0: 0
List: the list containing the differences
µ: >µ0
Calculate
t = 1.9668, p = 0.0347, df = 14
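A paired t test is just a one-sample t test on the differences, so the calculator result can be checked by hand. The sketch below reads the 15 differences (regular − express, in seconds) as listed in the example; that reading is an assumption, but it is consistent with the stated total of 640. Variable names are illustrative.

```python
import math
import statistics

# Differences in seconds (regular - express); positive means express was faster
differences = [5, 246, -46, 121, 30, 55, 79, -94,
               -17, 95, 20, 14, 129, -39, 42]

n = len(differences)
mean_diff = statistics.mean(differences)    # sample mean difference
s_diff = statistics.stdev(differences)      # sample standard deviation of the differences
t = (mean_diff - 0) / (s_diff / math.sqrt(n))   # H0: mu_d = 0
df = n - 1

print(f"sum = {sum(differences)}, mean = {mean_diff:.1f}, t = {t:.4f}, df = {df}")
```

The statistic should agree with the calculator's t = 1.9668 on 14 degrees of freedom.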

What is the difference between statistical and practical significance?

Statistical Significance and Practical Importance

When a null hypothesis (“no effect” or “no difference”) can be rejected at the usual levels (α = 0.05 or α = 0.01), there is good evidence of a difference. But that difference may be very small. When large samples are available, even tiny deviations from the null hypothesis will be significant.

Remember the wise saying: Statistical significance is not the same thing as practical importance. The remedy for attaching too much importance to statistical significance is to pay attention to the actual data as well as to the p-value. Plot the data and examine it carefully. Are there outliers or other departures from a consistent pattern? A few outlying observations can produce highly significant tests.
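To see why large samples make tiny deviations significant, hold the observed difference and the standard deviation fixed and let only n grow. All numbers in this sketch are hypothetical (a difference of 0.1 units with s = 1), chosen purely to show the effect of sample size on the t statistic.

```python
import math


def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic from summary statistics."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Same tiny difference (0.1) and same spread (s = 1); only n changes.
results = [(n, t_statistic(xbar=0.1, mu0=0.0, s=1.0, n=n))
           for n in (25, 2_500, 250_000)]

for n, t in results:
    print(f"n = {n:>7}: t = {t:.1f}")
```

Each 100-fold increase in n multiplies t by 10, so a difference that is nowhere near significant at n = 25 becomes overwhelmingly significant at n = 250,000, even though its practical size has not changed.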

What is the problem of multiple tests? Beware of Multiple Analyses Statistical significance ought to mean that you have found a difference that you were looking for. The reasoning behind statistical significance works well if you decide what difference you are seeking, design a study to search for it, and use a significance test to weigh the evidence you get. In other settings, significance may have little meaning.

Suppose that 20 significance tests were conducted and in each case the null hypothesis was true. What is the probability that we avoid a Type I error in all 20 tests?
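One way to work this out, assuming every test uses α = 0.05 and the 20 tests are independent (the question does not state either, so both are assumptions of this sketch):

```python
alpha = 0.05       # assumed significance level for each test
num_tests = 20

# When H0 is true, each test avoids a Type I error with probability 1 - alpha.
p_no_type_i_error = (1 - alpha) ** num_tests
p_at_least_one = 1 - p_no_type_i_error

print(round(p_no_type_i_error, 4), round(p_at_least_one, 4))
```

Under these assumptions there is only about a 36% chance of avoiding a Type I error in all 20 tests, so at least one false "significant" result is more likely than not.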
