AP Statistics More About Tests and Intervals


Chapter 21: More About Tests and Intervals

Objectives: Alpha Level, Statistically Significant, Significance Level, Confidence Interval as a Hypothesis Test, Type I Error, Type II Error, Power, Effect Size.

Conclusions in Hypothesis Testing If the P-value is small, reject the null hypothesis H0 and accept the alternative hypothesis Ha. If the P-value is large, fail to reject the null hypothesis H0. Failing to reject does not mean you accept H0; it means the available evidence is not strong enough to warrant rejection of H0.

Alpha Levels Sometimes we need to make a firm decision about whether or not to reject the null hypothesis. When the P-value is small, it tells us that our data are rare given the null hypothesis. How rare is “rare”?

Alpha Levels We can define “rare event” arbitrarily by setting a threshold for our P-value. If our P-value falls below that point, we’ll reject H0. We call such results statistically significant. The threshold is called an alpha level, denoted by α.

Statistically Significant Means the results observed were unlikely to have occurred by chance alone. A statistically significant result is a result with a small P-value.

Alpha Levels Common alpha levels are 0.10, 0.05, and 0.01. You have the option—almost the obligation—to consider your alpha level carefully and choose an appropriate one for the situation. The alpha level is also called the significance level. When we reject the null hypothesis, we say that the test is “significant at that level.”
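
To make the decision rule concrete, here is a minimal Python sketch (not from the slides); the function name `decide` and its messages are illustrative:

```python
# Minimal sketch of the decision rule: reject H0 when the P-value
# falls below the chosen alpha level.

def decide(p_value, alpha=0.05):
    """Return a plain-language decision for a significance test."""
    if p_value < alpha:
        return f"Reject H0: statistically significant at alpha = {alpha}"
    return f"Fail to reject H0 at alpha = {alpha}"

print(decide(0.032, alpha=0.05))  # significant at the 0.05 level
print(decide(0.032, alpha=0.01))  # the same P-value is not significant at 0.01
```

Notice that the same P-value can be significant at one alpha level and not at another.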

Alpha Levels What can you say if the P-value does not fall below α? You should say that “The data have failed to provide sufficient evidence to reject the null hypothesis.” Don’t say that you “accept the null hypothesis.” Recall that, in a jury trial, if we do not find the defendant guilty, we say the defendant is “not guilty”—we don’t say that the defendant is “innocent.”

Alpha Levels The P-value gives the reader far more information than just stating that you reject or fail to reject the null. In fact, by providing a P-value to the reader, you allow that person to make his or her own decisions about the test. What you consider to be statistically significant might not be the same as what someone else considers statistically significant. There is more than one alpha level that can be used, but each test will give only one P-value.

Summary Significance Level (α) The threshold P-value used to determine statistical significance. If the P-value is < α, we say that the data are statistically significant at level α. If we choose α = .05 (5%), we are requiring that the data give evidence against H0 so strong that it would happen no more than 5% of the time (1 time in 20) when H0 is true. For a two-tailed test, the significance level is the complement of the confidence level: a 95% confidence level is the same as α = .05.

Significant vs. Important What do we mean when we say that a test is statistically significant? All we mean is that the test statistic had a P-value lower than our alpha level. Don’t be lulled into thinking that statistical significance carries with it any sense of practical importance or impact.

Significant vs. Important For large samples, even small, unimportant (“insignificant”) deviations from the null hypothesis can be statistically significant. On the other hand, if the sample is not large enough, even large, financially or scientifically “significant” differences may not be statistically significant. It’s good practice to report the magnitude of the difference between the observed statistic value and the null hypothesis value (in the data units) along with the P-value on which we base statistical significance.

Confidence Intervals and Hypothesis Tests Confidence intervals and hypothesis tests are built from the same calculations. They have the same assumptions and conditions. You can approximate a hypothesis test by examining a confidence interval. Just ask whether the null hypothesis value is consistent with a confidence interval for the parameter at the corresponding confidence level.

Confidence Intervals and Hypothesis Tests Because confidence intervals are two-sided, they correspond to two-sided tests. In general, a confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α-level of (100 – C)%.

Confidence Intervals and Hypothesis Tests The relationship between confidence intervals and one-sided hypothesis tests is a little more complicated. A confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an α-level of ½(100 – C)%.

Tests from Confidence Intervals A two-sided test at significance level α can be carried out directly from a confidence interval with confidence level C = 1 – α. For a two-sided test at significance level α = .05, the corresponding confidence level is 95%. A level-α two-sided significance test rejects the null hypothesis H0: p = p0 when the value of p0 falls outside the level 1 – α confidence interval for p. For example, for a level α = .01 two-sided test of H0: p = p0, calculate a 99% confidence interval for p; if p0 is not in the interval, reject H0.
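
Here is a sketch of this procedure in Python for a one-proportion z-interval; the counts are hypothetical, and z* comes from the standard library’s NormalDist:

```python
# Sketch: a two-sided test of H0: p = p0 carried out from a level
# C = 1 - alpha confidence interval (one-proportion z-interval).
from math import sqrt
from statistics import NormalDist

def test_via_ci(successes, n, p0, alpha=0.05):
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for level 1 - alpha
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    lo, hi = p_hat - me, p_hat + me
    return (lo, hi), not (lo <= p0 <= hi)          # reject H0 if p0 is outside

# Hypothetical data: 110 successes in 200 trials, testing H0: p = 0.5.
(lo, hi), reject = test_via_ci(110, 200, p0=0.5)
print(f"95% CI: ({lo:.3f}, {hi:.3f}); reject H0: {reject}")  # p0 = 0.5 is inside
```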

Example – Making a decision based on a CI The baseline seven-year risk of heart attacks for diabetics is 20.2%. In 2007 a NEJM study reported a 95% confidence interval equivalent to 20.8% to 40.0% for the risk among patients taking the diabetes drug Avandia. What did this confidence interval suggest to the FDA about the safety of the drug? The FDA could be 95% confident that the interval from 20.8% to 40.0% included the true risk of heart attack for diabetes patients taking Avandia. Because the lower limit of this interval was higher than the baseline risk of 20.2%, there was evidence of an increased risk.

Example Teens are at the greatest risk of being killed or injured in traffic crashes. Because many of these deaths could easily be prevented by the use of safety belts, several states have begun “Click It or Ticket” campaigns to increase seatbelt use. Overall use in Massachusetts quickly increased from 51% in 2002 to 64.8% in 2006, with a goal of surpassing the national average of 82%. Recently, a local newspaper reported that a roadblock resulted in 23 tickets to drivers who were unbelted, out of 134 stopped for inspection. Does this provide evidence that the goal of over 82% compliance was met? Use a confidence interval to test this hypothesis.

Solution Hypotheses H0: p = .82, Ha: p > .82. Calculate a 90% CI – To use a confidence interval, we need a confidence level that corresponds to the alpha level of the test. If we use α = .05, we should construct a 90% CI, because this is a one-sided test. That will leave 5% on each side of the observed proportion (an α-level of ½(100 – C)%).

Solution P A N I C Parameter p0: 82% national average safety belt use. Assumptions Randomization Condition: This wasn’t a random sample, but I assume these drivers are representative of the driving public. 10% Condition: The police stopped fewer than 10% of all drivers. Success/Failure Condition: np̂ = 111 > 10 and nq̂ = 23 > 10.

Solution Name One-sample proportion z-interval. Interval p̂ ± z*·√(p̂q̂/n) = 0.828 ± 1.645·√((0.828)(0.172)/134) ≈ 0.828 ± 0.054, or (0.774, 0.882).

Solution Conclusion in context I am 90% confident that between 77.4% and 88.2% of all drivers wear their seatbelts. Hypothesis test conclusion Because the hypothesized rate of 82% is within this interval, I fail to reject the null hypothesis. There is insufficient evidence to conclude that the campaign was truly effective and now more than 82% of all drivers are wearing seatbelts.
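
As a check on the arithmetic, here is the same 90% interval computed in Python (a sketch using the slide’s counts):

```python
# The seatbelt example: 90% one-proportion z-interval for the
# 111 belted drivers among the 134 stopped at the roadblock.
from math import sqrt
from statistics import NormalDist

n, belted = 134, 134 - 23              # 23 of the 134 drivers were unbelted
p_hat = belted / n                     # about 0.828
z_star = NormalDist().inv_cdf(0.95)    # z* of about 1.645 for 90% confidence
me = z_star * sqrt(p_hat * (1 - p_hat) / n)
print(f"90% CI: ({p_hat - me:.3f}, {p_hat + me:.3f})")
# Prints roughly (0.775, 0.882), matching the slide's 77.4% to 88.2%
# up to rounding; 0.82 lies inside, so we fail to reject H0.
```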

Your Turn Suppose a researcher believes that the voters in a particular state are not going to follow their usual trend in some particular year. Ordinarily 65% of them are Republicans and 35% of them are Democrats, but this time the belief is that some recent event in the state is going to have an unpredictable but strong impact on the voters’ preference. That is, the researcher’s theory is that the event will be meaningful, though she would not like to venture a guess as to which party will profit from it. The researcher plans to test the null hypothesis using a 95% Confidence Interval. The researcher randomly polls 450 voters in all and it turns out that 223 of them are Republicans.

*A 95% Confidence Interval for Small Samples When the Success/Failure Condition fails, all is not lost. A simple adjustment to the calculation lets us make a confidence interval anyway. All we do is add four phony observations, two successes and two failures. So instead of p̂ = y/n we use the adjusted proportion p̃ = (y + 2)/(n + 4).

*A Better Confidence Interval for Proportions Now the adjusted interval is p̃ ± z*·√(p̃(1 − p̃)/(n + 4)). The adjusted form gives better performance overall and works much better for proportions near 0 or 1. It has the additional advantage that we no longer need to check the Success/Failure Condition.

Example – “Plus-Four” Interval Surgeons examined their results to compare two methods for a surgical procedure used to alleviate pain on the outside of the wrist. A new method was compared with the traditional method for the procedure. Of 45 operations using the traditional method, 3 were unsuccessful, for a failure rate of 6.7%. With only 3 failures, the data don’t satisfy the Success/Failure Condition, so we can’t use a standard confidence interval. What’s the 95% confidence interval using the “plus-four” method?

Solution The “plus-four” interval uses the adjusted proportion. There were 42 successes and 3 failures. Adding 2 successes and 2 failures, we find p̃ = (42 + 2)/(45 + 4) = 44/49 ≈ 0.898. The 95% CI is 0.898 ± 1.96·√((0.898)(0.102)/49) ≈ 0.898 ± 0.085, or (0.813, 0.983).
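
The same calculation as a short Python sketch (counts taken from the example):

```python
# Plus-four interval for the surgery example: add two successes and
# two failures before computing a one-proportion z-interval.
from math import sqrt
from statistics import NormalDist

successes, failures = 42, 3
y, n = successes + 2, successes + failures + 4   # 44 successes out of 49
p_tilde = y / n                                  # adjusted proportion, about 0.898
z_star = NormalDist().inv_cdf(0.975)             # z* of about 1.96 for 95% confidence
me = z_star * sqrt(p_tilde * (1 - p_tilde) / n)
print(f"95% plus-four CI: ({p_tilde - me:.3f}, {p_tilde + me:.3f})")
# Prints roughly (0.813, 0.983) for the success rate of the procedure.
```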

Making Errors Here’s some shocking news for you: nobody’s perfect. Even with lots of evidence we can still make the wrong decision. When we perform a hypothesis test, we can make mistakes in two ways: The null hypothesis is true, but we mistakenly reject it. (Type I error) The null hypothesis is false, but we fail to reject it. (Type II error)

Type I Error A Type I error is the mistake of rejecting the null hypothesis when it is true (rejecting a correct H0). The symbol α (alpha) is used to represent the probability of a Type I error. Example on page 375 of text.

Type II Error A Type II error is the mistake of failing to reject the null hypothesis when it is false (failing to reject an incorrect H0). The symbol β (beta) is used to represent the probability of a Type II error.

Type I and II errors—court of law H0: The person on trial is not a thief. (In the U.S., people are considered innocent unless proven otherwise.) Ha: The person on trial is a thief. (The police believe this person is the main suspect.) A Type I error is made if a jury convicts a truly innocent person. (They reject the null hypothesis even though the null hypothesis is actually true.) A Type II error is made if a truly guilty person is set free. (The jury fails to reject the null hypothesis even though the null hypothesis is false.)

Making Errors Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent. Here’s an illustration of the four situations in a hypothesis test:

                       H0 is true                     H0 is false
Reject H0              Type I error (probability α)   Correct decision
Fail to reject H0      Correct decision               Type II error (probability β)

Example Given the hypothesis H0: Sam is innocent. What would a Type I error be? Innocent Sam goes to jail (rejecting a correct H0). What would a Type II error be? Guilty Sam goes free (failing to reject an incorrect H0).

Your Turn Given the hypothesis H0: The projector bulb does not need replacing. What would a Type I error be? What would a Type II error be?

Making Errors How often will a Type I error occur? Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level. When H0 is false and we reject it, we have done the right thing. A test’s ability to detect a false hypothesis is called the power of the test.

Making Errors When H0 is false and we fail to reject it, we have made a Type II error. We assign the letter β to the probability of this mistake. It’s harder to assess the value of β because we don’t know what the value of the parameter really is. There is no single value for β; we can think of a whole collection of β’s, one for each incorrect parameter value.

Making Errors One way to focus our attention on a particular β is to think about the effect size. Ask “How big a difference would matter?” We could reduce β for all alternative parameter values by increasing α. This would reduce β but increase the chance of a Type I error. This tension between Type I and Type II errors is inevitable. The only way to reduce both types of errors is to collect more data. Otherwise, we just wind up trading off one kind of error against the other.

Power The power of a test is the probability that it correctly rejects a false null hypothesis. When the power is high, we can be confident that we’ve looked hard enough at the situation. The power of a test is 1 – β, because β is the probability that a test fails to reject a false null hypothesis and power is the probability that it does reject.

Power Whenever a study fails to reject its null hypothesis, the test’s power comes into question. When we calculate power, we imagine that the null hypothesis is false. The value of the power depends on how far the truth lies from the null hypothesis value. The distance between the null hypothesis value, p0, and the truth, p, is called the effect size. Power depends directly on effect size.
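
To see how power depends on both effect size and sample size, here is a Python sketch for a one-sided one-proportion z-test; the specific values of p0, p, and n below are illustrative, not from the slides:

```python
# Sketch: power of a one-sided test of H0: p = p0 vs Ha: p > p0,
# evaluated at a supposed true value p (effect size = p - p0).
from math import sqrt
from statistics import NormalDist

def power(p0, p, n, alpha=0.05):
    z_star = NormalDist().inv_cdf(1 - alpha)         # one-sided critical value
    crit = p0 + z_star * sqrt(p0 * (1 - p0) / n)     # reject H0 when p_hat > crit
    se_true = sqrt(p * (1 - p) / n)                  # SD of p_hat if the truth is p
    return 1 - NormalDist(mu=p, sigma=se_true).cdf(crit)

print(round(power(0.82, 0.86, n=134), 2))   # about 0.31: modest power
print(round(power(0.82, 0.86, n=500), 2))   # about 0.78: same effect, larger n
```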

A Picture Worth a Thousand Words The larger the effect size, the easier it should be to see it. Obtaining a larger sample size decreases the probability of a Type II error, so it increases the power. It also makes sense that the more we’re willing to accept a Type I error, the less likely we will be to make a Type II error.

A Picture Worth a Thousand Words This diagram (not reproduced in the transcript) shows the relationship between these concepts: the sampling distributions under the null and the true parameter value, with α, β, and the power appearing as areas under the curves.

Reducing Both Type I and Type II Error The previous figure seems to show that if we reduce Type I error, we must automatically increase Type II error. But we can reduce both types of error by making both curves narrower. How do we make the curves narrower? Increase the sample size. Why? A larger sample size means less variation.

Reducing Both Type I and Type II Error This figure has means that are just as far apart as in the previous figure, but the sample sizes are larger, the standard deviations are smaller, and the error rates are reduced (figure not reproduced in the transcript).

Reducing Both Type I and Type II Error [Figures not reproduced: the original comparison of errors, and the same comparison with a larger sample size.]