Copyright © 2009 Pearson Education, Inc. Chapter 21 More About Tests.

Presentation transcript:

Copyright © 2009 Pearson Education, Inc. Chapter 21 More About Tests

Copyright © 2009 Pearson Education, Inc. Slide 1- 3 Objectives: The student will be able to: Explain Type I and Type II errors in the context of the problem.

Copyright © 2009 Pearson Education, Inc. Slide 1- 4 Zero In on the Null Null hypotheses have special requirements. To perform a hypothesis test, the null must be a statement about the value of a parameter for a model. We then use this value to compute the probability that the observed sample statistic—or something even farther from the null value—will occur.

Copyright © 2009 Pearson Education, Inc. Slide 1- 5 Zero In on the Null (cont.) There is a temptation to state your claim as the null hypothesis. However, you cannot prove a null hypothesis true. So, it makes more sense to use what you want to show as the alternative. This way, when you reject the null, you are left with what you want to show.

Copyright © 2009 Pearson Education, Inc. Slide 1- 6 How to Think About P-Values A P-value is a conditional probability: the probability of getting the observed statistic, or something even farther from the null value, given that the null hypothesis is true. The P-value is NOT the probability that the null hypothesis is true. It’s not even the conditional probability that the null hypothesis is true given the data. Be careful to interpret the P-value correctly.

Copyright © 2009 Pearson Education, Inc. Slide 1- 7 What to Do with a High P-Value When we see a small P-value, our observed event would be very rare if the null hypothesis were true. We use this as evidence to reject the null hypothesis. However, big P-values just mean that what we observed isn’t surprising. That is, the results are in line with our assumption that the null hypothesis models the world, so we have no reason to reject it. A big P-value doesn’t prove that the null hypothesis is true, but it certainly offers no evidence that it is not true. Thus, when we see a large P-value, all we can say is that we “don’t reject the null hypothesis.”

Copyright © 2009 Pearson Education, Inc. Slide 1- 8 Alpha Levels We can define “rare event” arbitrarily by setting a threshold for our P-value. If our P-value falls below that point, we’ll reject H0. We call such results statistically significant. The threshold is called an alpha level, denoted by α.

Copyright © 2009 Pearson Education, Inc. Slide 1- 9 Alpha Levels (cont.) Common alpha levels are 0.10, 0.05, and 0.01. You have the option, almost the obligation, to consider your alpha level carefully and choose an appropriate one for the situation. The alpha level is also called the significance level. When we reject the null hypothesis, we say that the test is “significant at that level.”

Copyright © 2009 Pearson Education, Inc. Slide Alpha Levels (cont.) What can you say if the P-value does not fall below α? You should say that “The data have failed to provide sufficient evidence to reject the null hypothesis.” Don’t say that you “accept the null hypothesis.”

Copyright © 2009 Pearson Education, Inc. Slide Alpha Levels (cont.) Recall that, in a jury trial, if we do not find the defendant guilty, we say the defendant is “not guilty”—we don’t say that the defendant is “innocent.”

Copyright © 2009 Pearson Education, Inc. Slide What Not to Say About Significance What do we mean when we say that a test is statistically significant? All we mean is that the test statistic had a P-value lower than our alpha level. Don’t be lulled into thinking that statistical significance carries with it any sense of practical importance or impact.

Copyright © 2009 Pearson Education, Inc. Slide What Not to Say About Significance (cont.) For large samples, even small, unimportant (“insignificant”) deviations from the null hypothesis can be statistically significant. On the other hand, if the sample is not large enough, even large, financially or scientifically “significant” differences may not be statistically significant.
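A quick numerical sketch of this point, using made-up numbers (a null value of 0.50 and an observed proportion of 0.51 are assumptions chosen only for illustration, and `two_sided_p_value` is a hypothetical helper): the same one-point deviation is unremarkable in a small sample and overwhelmingly "significant" in a huge one.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(p_hat, p0, n):
    """Two-sided P-value for a one-proportion z-test (Normal model)."""
    sd = sqrt(p0 * (1 - p0) / n)          # SD(p-hat) assuming the null is true
    z = (p_hat - p0) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data: identical deviation from the null, very different n
small_n = two_sided_p_value(0.51, 0.50, 100)
large_n = two_sided_p_value(0.51, 0.50, 1_000_000)

print(f"n = 100:       P = {small_n:.3f}")
print(f"n = 1,000,000: P = {large_n:.3g}")
```

Whether a one-point difference matters in practice is a separate question that the P-value cannot answer.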

Copyright © 2009 Pearson Education, Inc. Slide Confidence Intervals and Hypothesis Tests Confidence intervals and hypothesis tests are built from the same calculations. They have the same assumptions and conditions. You can approximate a hypothesis test by examining a confidence interval. Just ask whether the null hypothesis value is consistent with a confidence interval for the parameter at the corresponding confidence level.

Copyright © 2009 Pearson Education, Inc. Slide Confidence Intervals and Hypothesis Tests (cont.) Because confidence intervals are two-sided, they correspond to two-sided tests. In general, a confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α-level of (100 – C)%.

Copyright © 2009 Pearson Education, Inc. Recall … There are supposed to be 20% orange M&Ms in a bag. Suppose a bag of 122 has only 21 orange ones. Does this contradict the company’s 20% claim? We previously solved this using a two-sided one-proportion z-test. H0: p = 0.2, HA: p ≠ 0.2. Alternatively, we can look at the 95% confidence interval for p, which is p^ ± 1.96 × SE(p^). This reflects 95% confidence that the true proportion is within the identified range. We ask: does this interval include the null hypothesis proportion? Slide 1- 16
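The two routes described for the M&M example can be sketched side by side. This is a minimal illustration under the Normal model using only the numbers on the slide; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 122, 21, 0.20
p_hat = x / n

# Route 1: two-sided one-proportion z-test (SD uses the null value p0)
sd = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Route 2: 95% confidence interval (SE uses the observed p_hat)
se = sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(0.975)      # roughly 1.96
ci_low, ci_high = p_hat - z_star * se, p_hat + z_star * se

print(f"z = {z:.2f}, P-value = {p_value:.3f}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f}); contains p0? {ci_low < p0 < ci_high}")
```

Both routes lead to the same decision here: the interval contains 0.20 and the P-value is well above 0.05, so the data do not contradict the 20% claim.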

Copyright © 2009 Pearson Education, Inc. Slide Confidence Intervals and Hypothesis Tests (cont.) The relationship between confidence intervals and one-sided hypothesis tests is a little more complicated. A confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an α-level of ½(100 – C)%. So 90% confidence corresponds to an alpha of 5%. For a one-sided test, we use a “looser” confidence interval because we are only really concerned with one side of the interval.

Copyright © 2009 Pearson Education, Inc. Slide Recall our example from Chapter 20 A 1996 report from the U.S. Consumer Product Safety Commission claimed that at least 90% of all American homes have at least one smoke detector. A city’s fire department has been running a public safety campaign about smoke detectors consisting of posters, billboards, and ads on radio, TV, and in the newspaper. The city wonders if this concerted effort has raised the local level above the 90% national rate. Building inspectors visit 400 randomly selected homes and find that 376 have smoke detectors. Is this strong evidence that the local rate is higher than the national rate?

Copyright © 2009 Pearson Education, Inc. We solved this using a one-proportion z-test. Mechanics: n = 400, x = 376, p^ = 376/400 = 0.94, p0 = 0.9 and q0 = 0.1. z = (0.94 – 0.90) / √( (0.9)(0.1) / 400 ) = 2.67. P = P(z > 2.67) = 0.0038. Alternatively, for an alpha level of 0.05 (5%), we could have looked at the 90% confidence interval for p. This interval is p^ ± z* × SE(p^). Slide 1- 19
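The mechanics for the smoke detector example can be checked with a short script. This is a sketch under the Normal model using the slide's numbers; the one-sided test at α = 0.05 is paired with a 90% interval, as described above.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 400, 376, 0.90
p_hat = x / n

# One-sided (upper-tail) one-proportion z-test
sd = sqrt(p0 * (1 - p0) / n)              # SD(p-hat) under the null
z = (p_hat - p0) / sd
p_value = 1 - NormalDist().cdf(z)

# Matching 90% confidence interval (5% in each tail)
se = sqrt(p_hat * (1 - p_hat) / n)        # SE(p-hat) from the sample
z_star = NormalDist().inv_cdf(0.95)       # roughly 1.645
ci_low, ci_high = p_hat - z_star * se, p_hat + z_star * se

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print(f"90% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

The whole interval lies above 0.90, which agrees with the small P-value: both point to rejecting the null hypothesis at the 5% level.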

Copyright © 2009 Pearson Education, Inc. Practice In May 2007, George W. Bush’s approval rating stood at 30% according to a CBS News/New York Times national survey of 1125 randomly selected adults. Make a 95% confidence interval for his approval rating. Based on the confidence interval, test the null hypothesis that Bush’s approval rating was no better than the 27% level established by Richard Nixon during the Watergate scandal. What is the significance level of this test? Slide 1- 20
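One possible way to work the practice problem, offered as a sketch rather than the official solution: it treats the reported 30% as the sample proportion and applies the one-sided rule from the earlier slide, where a C% interval corresponds to an α of ½(100 – C)%.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat, p_null = 1125, 0.30, 0.27
confidence = 0.95

# 95% confidence interval for the approval rating
se = sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
ci_low, ci_high = p_hat - z_star * se, p_hat + z_star * se

# One-sided alternative (rating is better than 27%): alpha is half of 100% - C
alpha_one_sided = (1 - confidence) / 2
reject_null = ci_low > p_null

print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
print(f"Reject H0 at alpha = {alpha_one_sided:.3f}? {reject_null}")
```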

Copyright © 2009 Pearson Education, Inc. Aside… Notation etc. Slide 1- 21

Copyright © 2009 Pearson Education, Inc. Which P is Which? p is the true population proportion. This is usually unknown. p0 is the value we assume for p under the null hypothesis. p^ is our observed sample proportion. P is the P-value found using a z-test: the probability, if the null hypothesis is true, of observing a sample proportion at least as far from p0 as our p^. P(z>z*) is the probability that a z-score z is greater than a critical z-score z*. Slide 1- 22

Copyright © 2009 Pearson Education, Inc. When to use SE(p^) and when to use SD(p^) We use SE(p^) when we are looking at confidence intervals based on an observed proportion and we don’t know the actual population parameter: as part of creating a one-proportion z-interval, and as part of our conclusion statement after rejecting the null hypothesis in a one-proportion z-test. We use SD(p^) when we are conducting a one-proportion z-test. In this case we are assuming the null hypothesis, that p = p0, so we use p0 and q0 in our calculation and call it a standard deviation because we have assumed a known model. Slide 1- 23
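For reference, the two quantities named on this slide can be written out as formulas. The notation here is my own shorthand, with q^ = 1 – p^, q0 = 1 – p0, and n the sample size:

```latex
\[
SE(\hat{p}) = \sqrt{\frac{\hat{p}\,\hat{q}}{n}}
\qquad\text{(interval: uses the observed } \hat{p}\text{)}
\]
\[
SD(\hat{p}) = \sqrt{\frac{p_0\,q_0}{n}}
\qquad\text{(test: uses the null value } p_0\text{)}
\]
```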

Copyright © 2009 Pearson Education, Inc. What if we’re wrong… We can be incorrect in two different ways. Let’s talk about these ways. Slide 1- 24

Copyright © 2009 Pearson Education, Inc. Slide Making Errors Nobody’s perfect. Even with lots of evidence we can still make the wrong decision. When we perform a hypothesis test, we can make mistakes in two ways: I. The null hypothesis is true, but we mistakenly reject it. (Type I error) II. The null hypothesis is false, but we fail to reject it. (Type II error)

Copyright © 2009 Pearson Education, Inc. Slide Making Errors (cont.) Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent. Here’s an illustration of the four situations in a hypothesis test:

Copyright © 2009 Pearson Education, Inc. Slide Making Errors (cont.) How often will a Type I error occur? Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level. When H0 is false and we reject it, we have done the right thing. A test’s ability to detect a false hypothesis is called the power of the test.

Copyright © 2009 Pearson Education, Inc. Slide Making Errors (cont.) When H0 is false and we fail to reject it, we have made a Type II error. We assign the letter β to the probability of this mistake. It’s harder to assess the value of β because we don’t know what the value of the parameter really is.

Copyright © 2009 Pearson Education, Inc. Slide Making Errors (cont.) One way to focus our attention on a particular β is to think about the effect size. Ask “How big a difference would matter?” We could reduce β for all alternative parameter values by increasing α. This would reduce β but increase the chance of a Type I error. This tension between Type I and Type II errors is inevitable. The only way to reduce both types of errors is to collect more data. Otherwise, we just wind up trading off one kind of error against the other.

Copyright © 2009 Pearson Education, Inc. Examples – Type I and Type II errors Production managers on an assembly line must monitor to be sure that the level of defective products remains small. They periodically inspect a random sample of the items produced. If they find a significant increase in the proportion of items that must be rejected, they will halt the assembly process until the problem can be identified and repaired. What are the hypotheses? Would you do a one-tailed or two-tailed test? In this context, what is a Type I error? Type II? Highway safety engineers test new road signs, hoping that increased reflectivity will make them more visible to drivers. Volunteers drive through a test course with several of the new and old style signs and rate which kind shows up the best. What are the hypotheses? In this context, what are the Type I and Type II errors? Slide 1- 30

Copyright © 2009 Pearson Education, Inc. In 2005 the US Census Bureau reported that 68.9% of American families owned their homes. Census data reveal that the ownership rate in one small city is much lower. The city council is debating a plan to offer tax breaks to first-time home buyers in order to encourage people to become home owners. They decide to adopt the plan on a 2-year trial basis and use the data they collect to make a decision about continuing the tax breaks. Since this plan costs the city tax revenues, they will continue to use it only if there is strong evidence that the rate of home ownership is increasing. What will their hypotheses be? H0: the rate of home ownership does not change. HA: the rate of home ownership increases. What would a Type I error be? We reject H0 when H0 is true: we think that the ownership rate has increased when really it has stayed the same. What would a Type II error be? We fail to reject H0 when H0 is false: we think that the rate of ownership has stayed the same when really it has increased. For each type of error, who will be harmed? Type I: the taxpayers and the city. Type II: future first-time home buyers, who will not get the tax break. Slide 1- 31

Copyright © 2009 Pearson Education, Inc. More practice: Dropouts A statistics professor has observed that about 13% of the students who initially enroll in his Introductory Stats class withdraw before the end of the semester. A salesman suggests that he try a statistics software package that gets students more involved with computers, predicting it will cut the dropout rate. The software is expensive, so the salesman offers to let the professor use it for a semester to see if the dropout rate goes down significantly. Is this a one-tailed or two-tailed test? Write the null and alternative hypotheses. In this context, what would happen if the professor makes a Type I error? In this context, what would happen if the professor makes a Type II error? Slide 1- 32

Copyright © 2009 Pearson Education, Inc. Dropouts part II Initially 203 students signed up for the course. They used the software suggested by the salesman and only 11 dropped out of the course. Should the professor spend the money for this software? Support your recommendation with an appropriate test. Explain what your P-value means. Slide 1- 33
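One way the mechanics of this exercise might look, as a sketch and not the official answer. It assumes a one-sided, lower-tail one-proportion z-test with H0: p = 0.13 and HA: p < 0.13, and it assumes the class can be treated as a representative sample.

```python
from math import sqrt
from statistics import NormalDist

n, dropped, p0 = 203, 11, 0.13
p_hat = dropped / n

# Success/failure check for the Normal model: both counts should be at least 10
expected_dropouts = n * p0
expected_stayers = n * (1 - p0)

# Lower-tail test: is the dropout rate below 13%?
sd = sqrt(p0 * (1 - p0) / n)              # SD(p-hat) under the null
z = (p_hat - p0) / sd
p_value = NormalDist().cdf(z)

print(f"p-hat = {p_hat:.3f}, z = {z:.2f}, P-value = {p_value:.5f}")
```

The P-value here is the probability of seeing a dropout rate this low or lower in a class of 203 if the true rate were still 13%. Whether the reduction justifies the cost of the software is a practical question the test does not settle.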

Copyright © 2009 Pearson Education, Inc. Remaining slides are For Your Information Slide 1- 34

Copyright © 2009 Pearson Education, Inc. Power – motivating example Suppose you are testing a new type of running shoe to see if it’s worth the money. If the new shoe made you 5% faster, how easy would it be to tell? It would take a lot of runs to make that difference clear. On the other hand if the shoe made you twice as fast, it should be clear from a very small sample size (of test runs). The effect size and the sample size are important to determining the power of a test. Slide 1- 35

Copyright © 2009 Pearson Education, Inc. Slide Power The power of a test is the probability that it correctly rejects a false null hypothesis, that is, the chance of noticing a difference that is really there. When the power is high, we can be confident that we’ve looked hard enough at the situation. The power of a test is 1 – β, where β is the probability of failing to reject a false null hypothesis.

Copyright © 2009 Pearson Education, Inc. Slide Power (cont.) Whenever a study fails to reject its null hypothesis, the test’s power comes into question. We have to ask ourselves: was the null hypothesis really true, or did our test lack the power to detect a real difference? When we calculate power, we imagine that the null hypothesis is false. The value of the power depends on how far the truth lies from the null hypothesis value. The distance between the null hypothesis value, p0, and the truth, p, is called the effect size. Power depends directly on effect size.

Copyright © 2009 Pearson Education, Inc. Slide A Picture Worth Words The larger the effect size, the easier it should be to see it. Obtaining a larger sample size decreases the probability of a Type II error, so it increases the power. It also makes sense that the more we’re willing to accept a Type I error, the less likely we will be to make a Type II error.
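The three claims on this slide can be illustrated with a rough power calculation. This is a sketch under the Normal approximation for an upper-tail one-proportion z-test; `power_upper_tail` is a hypothetical helper, and the "true" proportions, sample sizes, and alpha levels below are assumptions picked for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(p0, p_true, n, alpha):
    """Approximate power of an upper-tail one-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejecting H0: p = p0
    cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    # Chance that p-hat exceeds the cutoff when the truth is p_true
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - NormalDist().cdf((cutoff - p_true) / sd_true)

# Hypothetical baseline, then change one ingredient at a time
base = power_upper_tail(p0=0.90, p_true=0.92, n=400, alpha=0.05)
bigger_effect = power_upper_tail(p0=0.90, p_true=0.94, n=400, alpha=0.05)
bigger_sample = power_upper_tail(p0=0.90, p_true=0.92, n=1600, alpha=0.05)
bigger_alpha = power_upper_tail(p0=0.90, p_true=0.92, n=400, alpha=0.10)

for label, value in [("baseline", base), ("larger effect", bigger_effect),
                     ("larger sample", bigger_sample), ("larger alpha", bigger_alpha)]:
    print(f"{label:14s} power = {value:.3f}, beta = {1 - value:.3f}")
```

Each change raises the power relative to the baseline, but only the larger sample does so without requiring a bigger real effect or accepting a higher Type I error rate.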

Copyright © 2009 Pearson Education, Inc. Slide This diagram shows the relationship between these concepts: A Picture Worth Words (cont)

Copyright © 2009 Pearson Education, Inc. Slide Reducing Both Type I and Type II Error The previous figure seems to show that if we reduce Type I error, we must automatically increase Type II error. But, we can reduce both types of error by making both curves narrower. How do we make the curves narrower? Increase the sample size.

Copyright © 2009 Pearson Education, Inc. Slide Reducing Both Type I and Type II Error (cont.) This figure has means that are just as far apart as in the previous figure, but the sample sizes are larger, the standard deviations are smaller, and the error rates are reduced:

Copyright © 2009 Pearson Education, Inc. Slide Reducing Both Type I and Type II Error (cont.) Original comparison of errors: Comparison of errors with a larger sample size:

Copyright © 2009 Pearson Education, Inc. Slide What Can Go Wrong? Don’t interpret the P-value as the probability that H0 is true. The P-value is about the data, not the hypothesis. It’s the probability of the data given that H0 is true, not the other way around. Don’t believe too strongly in arbitrary alpha levels. It’s better to report your P-value and a confidence interval so that the reader can make her/his own decision.

Copyright © 2009 Pearson Education, Inc. Slide What Can Go Wrong? (cont.) Don’t confuse practical and statistical significance. Just because a test is statistically significant doesn’t mean that it is significant in practice. And, sample size can impact your decision about a null hypothesis, making you miss an important difference or find an “insignificant” difference. Don’t forget that in spite of all your care, you might make a wrong decision.