Type II Error, Power and Sample Size Calculations


Inference for Means: Part 2. Type II Error, Power and Sample Size Calculations

Alpha Levels: a Threshold for the P-value Sometimes we need to make a firm decision about whether or not to reject the null hypothesis. When the P-value is small, it tells us that our data are rare if the null hypothesis is true. How rare is “rare”?

Alpha Levels - 2 We can define “rare event” arbitrarily by setting a threshold for our P-value. If our P-value falls below that threshold, we’ll reject H0. We call such results statistically significant. The threshold is called an alpha level, denoted by α.

Alpha Levels - 3 Common alpha levels are 0.10, 0.05, and 0.01. You have the option—almost the obligation—to consider your alpha level carefully and choose an appropriate one for the situation. The alpha level is also called the significance level. When we reject the null hypothesis, we say that the test is “significant at that level.” Rejection Region (RR): values of the test statistic z that lead to rejection of the null hypothesis H0.

α Levels and Rejection Regions, 1-Tail (upper tail)
HA: μ > μ0; n = 30, df = 29
α      Rejection region
.10    t > 1.3114
.05    t > 1.6991
.01    t > 2.4620

Recall: Sweetening colas. Is there evidence that storage results in sweetness loss in colas? H0: μ = 0 versus Ha: μ > 0 (one-sided test). Use α = 0.05.
Taster   Sweetness loss
1         2.0
2         0.4
3         0.7
4         2.0
5        -0.4
6         2.2
7        -1.3
8         1.2
9         1.1
10        2.3
Average 1.02; standard deviation 1.196; degrees of freedom n − 1 = 9.
Rejection Region: t > 1.8331
Test statistic: \( t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{1.02}{1.196/\sqrt{10}} \approx 2.70 \)
Conclusion: since the test statistic value 2.70 is in the RR, reject H0: μ = 0 in favor of Ha: μ > 0; there is sufficient evidence to conclude that there is a loss of sweetness due to storage.
P-value = P(t9 > 2.70) = .012
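The cola numbers can be re-derived from the ten listed sweetness losses. The sketch below uses only Python's standard library; the critical value 1.8331 is copied from the slide rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Sweetness losses for the 10 tasters, as listed on the slide.
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

n = len(losses)
x_bar = mean(losses)
s = stdev(losses)  # sample standard deviation (divides by n - 1)

# One-sample t statistic for H0: mu = 0.
t_stat = (x_bar - 0) / (s / sqrt(n))

# Critical value quoted on the slide (df = 9, alpha = .05, upper tail).
t_crit = 1.8331
reject_h0 = t_stat > t_crit

print(f"mean={x_bar:.2f}, s={s:.3f}, t={t_stat:.2f}, reject H0: {reject_h0}")
```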

α Levels and Rejection Regions, 1-Tail (lower tail)
HA: μ < μ0; n = 30, df = 29
α      Rejection region
.10    t < -1.3114
.05    t < -1.6991
.01    t < -2.4620

Rejection region for a 2-tail test with α = 0.05, df = 29. A 2-tailed test puts area α/2 in each tail: a middle area of 1 − α = .95 and tail areas of α/2 = 0.025. From the t table, RR = {t: t < -2.0452 or t > 2.0452}.

α Levels and Rejection Regions, 2-Tail
HA: μ ≠ μ0; n = 30, df = 29
α      Rejection region
.10    t < -1.6991 or t > 1.6991
.05    t < -2.0452 or t > 2.0452
.01    t < -2.7564 or t > 2.7564

Summary: α Levels and Rejection Regions; n = 40, df = 39
HA: μ > μ0
α      Rejection region
.01    t > 2.4258
.05    t > 1.6849
.10    t > 1.3036
HA: μ < μ0
α      Rejection region
.01    t < -2.4258
.05    t < -1.6849
.10    t < -1.3036
HA: μ ≠ μ0
α      Rejection region
.01    t < -2.7079 or t > 2.7079
.02    t < -2.4258 or t > 2.4258
.05    t < -2.0227 or t > 2.0227
.10    t < -1.6849 or t > 1.6849

Alpha Levels (cont.) What can you say if the P-value does not fall below α ? You should say that “The data have failed to provide sufficient evidence to reject the null hypothesis.” Don’t say that you “accept the null hypothesis.”

Alpha Levels (cont.) H0: defendant innocent; HA: defendant guilty Recall that, in a jury trial, if we do not find the defendant guilty, we say the defendant is “not guilty”—we don’t say that the defendant is “innocent.”

Alpha Levels (cont.) The P-value gives the reader far more information than just stating that you reject or fail to reject the null. In fact, by providing a P-value to the reader, you allow that person to make his or her own choice of the significance level α. What you consider to be statistically significant might not be the same as what someone else considers statistically significant. There is more than one alpha level that can be used, but each test will give only one P-value.

Confidence Intervals and Hypothesis Tests Because confidence intervals are two-sided, they correspond to two-sided (two-tailed) hypothesis tests. In general, a confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α-level of (100 − C)%. For example: if a 2-sided hypothesis test at level .05 rejects H0, then the null hypothesized value of μ will not be in a 95% confidence interval calculated from the same data. A 95% confidence interval shows the null hypothesis values μ0 for which a 2-sided hypothesis test at level .05 will NOT reject the null hypothesis.

Confidence Intervals and Hypothesis Tests: Example Sleep researchers claim that college students need at least 7 hours of sleep each night. A random sample of n = 25 college students was asked how many hours they slept the previous night. Construct a 90% confidence interval for the mean hours μ slept by college students each night. Perform the hypothesis test H0: μ = 7 vs Ha: μ ≠ 7 at α = .10.

Solution: The 90% confidence interval (6.272, 7.008) gives the values of μ0 for which a 2-tailed hypothesis test of H0: μ = μ0 versus HA: μ ≠ μ0 at α = .10 will NOT reject H0: μ = μ0. Since 7 lies inside the interval, the test at α = .10 does not reject H0: μ = 7.
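The link between the interval and the test can be checked with a short sketch. The transcript does not preserve the sample mean or standard error, so both are back-derived here from the reported interval (an assumption), and the critical value 1.7109 for df = 24 is taken from a standard t table rather than from the slides.

```python
# Reported 90% confidence interval for mu (from the slide).
ci_low, ci_high = 6.272, 7.008
mu0 = 7  # hypothesized mean under H0

# ASSUMPTION: t critical value for df = 24, upper-tail area .05,
# taken from a standard t table (not shown in the transcript).
t_crit = 1.7109

# Back-derive the sample mean and standard error from the interval:
# interval = x_bar +/- t_crit * se.
x_bar = (ci_low + ci_high) / 2
se = (ci_high - ci_low) / 2 / t_crit

# Two-tailed test of H0: mu = mu0 at alpha = .10.
t_stat = (x_bar - mu0) / se
reject_h0 = abs(t_stat) > t_crit

# The test rejects exactly when mu0 falls outside the interval.
mu0_inside_ci = ci_low <= mu0 <= ci_high
print(f"x_bar={x_bar:.2f}, t={t_stat:.3f}, reject H0: {reject_h0}, "
      f"mu0 inside CI: {mu0_inside_ci}")
```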

Making Errors Here’s some shocking news for you: nobody’s perfect. Even with lots of evidence we can still make the wrong decision. When we perform a hypothesis test, we can make mistakes in two ways: (1) The null hypothesis is true, but we mistakenly reject it (Type I error). (2) The null hypothesis is false, but we fail to reject it (Type II error).

H0: not pregnant, Ha: pregnant

Making Errors (cont.) Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent. Here’s an illustration of the four situations in a hypothesis test:
                      H0 true             H0 false
Reject H0             Type I error        Correct decision
Fail to reject H0     Correct decision    Type II error

Making Errors (cont.) How often will a Type I error occur? Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level. When H0 is false and we reject it, we have done the right thing. A test’s ability to detect a false hypothesis is called the power of the test.

Making Errors (cont.) When H0 is false and we fail to reject it, we have made a Type II error. We assign the letter β to the probability of this mistake. It’s harder to assess the value of β because we don’t know what the value of the parameter really is. There is no single value for β--we can think of a whole collection of β’s, one for each incorrect parameter value.

Making Errors (cont.) One way to focus our attention on a particular β is to think about the effect size. Ask “How big a difference would matter?” We could reduce β for all alternative parameter values by increasing α. This would reduce β but increase the chance of a Type I error. This tension between Type I and Type II errors is inevitable. The only way to reduce both types of errors is to collect more data. Otherwise, we just wind up trading off one kind of error against the other.

Hypothesis Testing for μ, Type II Error Probabilities (Right-tail example) A new billing system for a department store will be cost-effective only if the mean monthly account is more than $170. A sample of 400 accounts has a mean of $174 and s = $65. Can we conclude that the new system will be cost-effective?

Example (cont.) Hypotheses The population of interest is the credit accounts at the store. We want to know whether the mean account for all customers is greater than $170. H0: μ = 170, HA: μ > 170, where μ is the mean account value for all customers.

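Example (cont.) Test statistic for H0: μ = 170 versus HA: μ > 170: \( t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{174 - 170}{65/\sqrt{400}} \approx 1.23 \)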

Example (cont.) P-value: The probability of observing a value of the test statistic as extreme or more extreme than t = 1.23, given that μ = 170, is P(t399 > 1.23) ≈ .11. Since the P-value > .05, we conclude that there is not sufficient evidence to reject H0: μ = 170. A Type II error is possible.

Calculating β, the Probability of a Type II Error Calculating β for the t test is not at all straightforward and is beyond the level of this course. The distribution of the test statistic t is quite complicated when H0 is false and HA is true. However, we can obtain very good approximate values for β using z (the standard normal) in place of t.

Calculating β, the Probability of a Type II Error (cont.) We need to specify an appropriate significance level α; determine the rejection region in terms of z; then calculate the probability of not being in the rejection region when μ = μ1, where μ1 is a value of μ that makes HA true.

Example (cont.) calculating β. Test statistic for H0: μ = 170 versus HA: μ > 170: \( z = \frac{\bar{x} - 170}{65/\sqrt{400}} \). Choose α = .05. Rejection region in terms of z: z > z.05 = 1.645.

Example (cont.) calculating β. Express the rejection region directly, not in standardized terms. The rejection region with α = .05 is \( \bar{x} > 170 + 1.645 \cdot \frac{65}{\sqrt{400}} \approx 175.35 \). Specify the alternative value under HA: let the alternative value be μ = 180 (rather than just μ > 170), so that H0: μ = 170 and HA: μ = 180. Do not reject H0 when x̄ ≤ 175.35.

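Example (cont.) calculating β. A Type II error occurs when a false H0 is not rejected. Suppose μ = 180, that is, H0: μ = 170 is false. The false H0 is not rejected when x̄ falls at or below the cutoff \( 170 + 1.645 \cdot 65/\sqrt{400} \approx 175.35 \), so \( \beta(180) = P(\bar{x} \le 175.35 \mid \mu = 180) = P\left(z \le \frac{175.35 - 180}{65/\sqrt{400}}\right) = P(z \le -1.43) = .0764 \).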

Example (cont.) calculating β. Power when μ = 180 is 1 − β(180) = 1 − .0764 = .9236, for H0: μ = 170 versus HA: μ = 180.
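The right-tail calculation can be reproduced with the standard normal approximation the slides recommend. This is a minimal sketch using Python's `statistics.NormalDist`; it treats s = 65 as if it were σ, and the variable names are illustrative. Because it uses an unrounded z.05, the result differs from the table-based .0764 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_a = 170, 180          # null value and specified alternative
sigma, n, alpha = 65, 400, 0.05

z = NormalDist()              # standard normal
se = sigma / sqrt(n)          # standard deviation of the sample mean

# Rejection region in terms of x_bar: reject H0 when x_bar > cutoff.
cutoff = mu0 + z.inv_cdf(1 - alpha) * se

# beta = P(x_bar <= cutoff) when the true mean is mu_a.
beta = NormalDist(mu_a, se).cdf(cutoff)
power = 1 - beta

print(f"cutoff={cutoff:.2f}, beta={beta:.4f}, power={power:.4f}")
```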

Effects on β of changing α. Increasing the significance level α decreases the value of β, and vice versa: if α2 > α1, then β2 < β1 (illustrated for μ = 170 versus μ = 180).
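The trade-off can be seen numerically by recomputing β(180) for the billing example at several significance levels. This sketch assumes the same normal approximation (s = 65 used as σ, n = 400); the helper name `beta_right_tail` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_right_tail(alpha, mu0=170, mu_a=180, sigma=65, n=400):
    """Approximate P(Type II error) for HA: mu > mu0 when the true mean is mu_a."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return NormalDist(mu_a, se).cdf(cutoff)

# beta shrinks as alpha grows, and vice versa.
betas = {alpha: beta_right_tail(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, beta in betas.items():
    print(f"alpha={alpha:.2f} -> beta={beta:.4f}")
```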

Judging the Test A hypothesis test is effectively defined by the significance level α and by the sample size n. If the probability of a Type II error β is judged to be too large, we can reduce it by increasing α, and/or increasing the sample size.

Judging the Test Increasing the sample size reduces β. By increasing the sample size, the standard deviation of the sampling distribution of the mean decreases. Thus, the cutoff value of x̄ for the rejection region decreases.

Judging the Test Increasing the sample size reduces β. Note what happens when n increases: α does not change, but β becomes smaller (illustrated for μ = 170 versus μ = 180).

Judging the Test Increasing the sample size reduces β. In the example, suppose n increases from 400 to 1000. α remains 5%, but the probability of a Type II error decreases dramatically.
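How dramatic the decrease is can be sketched by comparing β(180) at n = 400 and n = 1000, again under the normal approximation with s = 65 standing in for σ. The helper name `beta_for_n` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_for_n(n, mu0=170, mu_a=180, sigma=65, alpha=0.05):
    """Approximate beta for the right-tail billing test at sample size n."""
    se = sigma / sqrt(n)
    # The cutoff for x_bar moves toward mu0 as n grows.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return cutoff, NormalDist(mu_a, se).cdf(cutoff)

cutoff_400, beta_400 = beta_for_n(400)
cutoff_1000, beta_1000 = beta_for_n(1000)

print(f"n=400:  cutoff={cutoff_400:.2f}, beta={beta_400:.4f}")
print(f"n=1000: cutoff={cutoff_1000:.2f}, beta={beta_1000:.5f}")
```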

A Left-Tail Test: Self-Addressed Stamped Envelopes. The chief financial officer at FedEx believes that including a stamped self-addressed (SSA) envelope in the monthly invoice sent to customers will decrease the amount of time it takes for customers to pay their monthly bills. Currently, customers return their payments in 24 days on average, with a standard deviation of 6 days. Stamped self-addressed envelopes are included with the bills for 75 randomly selected customers. The number of days until they return their payment is recorded.

A Left-Tail Test: Hypotheses The parameter tested is the population mean payment period (μ) for customers who receive self-addressed stamped envelopes with their bill. The hypotheses are: H0: μ = 24, H1: μ < 24. Use α = .05; n = 75.

A Left-Tail Test: Rejection Region The rejection region: t < -t.05,74 = -1.666. Results from the 75 randomly selected customers:

A Left-Tail Test: Test Statistic The test statistic is t = -1.52. Since the rejection region is t < -1.666 and -1.52 is not below -1.666, we do not reject the null hypothesis. Note that the P-value = P(t74 < -1.52) = .066. Since our decision is to not reject the null hypothesis, a Type II error is possible.

Left-Tail Test: Calculating β, the Probability of a Type II Error The CFO thinks that a decrease of one day in the average payment return time will cover the costs of the envelopes since customer checks can be deposited earlier. What is β(23), the probability of a Type II error when the true mean payment return time μ is 23 days?

Left-tail test: calculating β (cont.) Test statistic for H0: μ = 24 versus HA: μ < 24: \( z = \frac{\bar{x} - 24}{6/\sqrt{75}} \). Choose α = .05. Rejection region in terms of z: z < -z.05 = -1.645.

Left-tail test: calculating β (cont.) Express the rejection region directly, not in standardized terms. The rejection region with α = .05 is \( \bar{x} < 24 - 1.645 \cdot \frac{6}{\sqrt{75}} \approx 22.86 \). Specify the alternative value under HA: let the alternative value be μ = 23 (rather than just μ < 24), so that H0: μ = 24 and HA: μ = 23. Do not reject H0 when x̄ ≥ 22.86.

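Left-tail test: calculating β (cont.) With H0: μ = 24 and HA: μ = 23 at α = .05, \( \beta(23) = P(\bar{x} \ge 22.86 \mid \mu = 23) = P\left(z \ge \frac{22.86 - 23}{6/\sqrt{75}}\right) = P(z \ge -0.20) \approx .58 \). Power when μ = 23 is 1 − β(23) = 1 − .58 = .42.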
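For a left-tail test the rejection region lies below the cutoff, so β is the probability of landing at or above it. A minimal sketch under the same normal approximation, using σ = 6 and n = 75 from the envelope example (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_a = 24, 23            # null value and specified alternative
sigma, n, alpha = 6, 75, 0.05

se = sigma / sqrt(n)

# Rejection region in terms of x_bar: reject H0 when x_bar < cutoff.
cutoff = mu0 - NormalDist().inv_cdf(1 - alpha) * se

# beta = P(x_bar >= cutoff) when the true mean is mu_a.
beta = 1 - NormalDist(mu_a, se).cdf(cutoff)
power = 1 - beta

print(f"cutoff={cutoff:.2f}, beta={beta:.2f}, power={power:.2f}")
```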

A Two-Tail Test for μ The Federal Communications Commission (FCC) wants competition between phone companies. The FCC wants to investigate whether AT&T rates differ from their competitor’s rates. According to data from the FCC, the mean monthly long-distance bill for all AT&T residential customers is $17.09.

A Two-Tail Test (cont.) A random sample of 100 AT&T customers is selected and their bills are recalculated using a leading competitor’s rates. The sample mean x̄ and standard deviation s of the bills are computed using the competitor’s rates. Can we infer that there is a difference between AT&T’s bills and the competitor’s bills (on average)?

A Two-Tail Test (cont.) Is the mean different from 17.09? n = 100; use α = .05. H0: μ = 17.09, HA: μ ≠ 17.09.

A Two-Tail Test (cont.) Rejection region (t with 99 df): t < -tα/2 = -1.9842 or t > tα/2 = 1.9842, with area α/2 = 0.025 in each tail.

A Two-Tail Test: Conclusion The test statistic has |t| = 1.19, which lies between -tα/2 = -1.9842 and tα/2 = 1.9842, so we do not reject H0. There is insufficient evidence to conclude that there is a difference between the bills of AT&T and the competitor. Also, by the P-value approach: the P-value = P(t < -1.19) + P(t > 1.19) = 2(.1184) = .2368 > .05. A Type II error is possible.

Two-Tail Test: Calculating β, the Probability of a Type II Error The FCC would like to detect a decrease of $1.50 in the average competitor’s bill (17.09 − 1.50 = 15.59). What is β(15.59), the probability of a Type II error when the true mean competitor’s bill μ is $15.59?

Two-Tail Test: Calculating β (cont.) Rejection region in terms of x̄: reject H0 in either tail (area α/2 = 0.025 on each side of 17.09); do not reject H0 in between.

Two-Tail Test: Calculating β (cont.) Power when μ = 15.59 is 1 − β(15.59) = .972, for H0: μ = 17.09 versus HA: μ = 15.59.
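For a two-tail test, β is the probability that x̄ falls between the two cutoffs when the true mean is the specified alternative. The sketch below uses the normal approximation. The transcript does not preserve the sample standard deviation, so `s_assumed = 3.87` is an assumption, chosen only because it is consistent with the stated power of .972; it should not be read as the value on the original slide.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_a = 17.09, 15.59      # null value and specified alternative
n, alpha = 100, 0.05

# ASSUMPTION: the standard deviation is missing from the transcript;
# 3.87 is a hypothetical value consistent with the stated power .972.
s_assumed = 3.87

se = s_assumed / sqrt(n)
z_half = NormalDist().inv_cdf(1 - alpha / 2)

# Do not reject H0 when lower <= x_bar <= upper.
lower = mu0 - z_half * se
upper = mu0 + z_half * se

# beta = P(lower <= x_bar <= upper) when the true mean is mu_a.
sampling_dist = NormalDist(mu_a, se)
beta = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)
power = 1 - beta

print(f"do-not-reject region=({lower:.2f}, {upper:.2f}), "
      f"beta={beta:.3f}, power={power:.3f}")
```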

General formula: Type II Error Probability β(μA) for a Level α Test
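The formula itself is not preserved in the transcript. The three worked examples follow a common pattern, which can be written out as below under the assumptions used there: z in place of t, σ known or well estimated by s, and μA the specified alternative mean. Φ, the standard normal cumulative distribution function, is notation introduced here and does not appear in the slides.

```latex
\begin{aligned}
H_A:\ \mu > \mu_0: \quad & \beta(\mu_A) = \Phi\!\left(z_{\alpha} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right) \\[4pt]
H_A:\ \mu < \mu_0: \quad & \beta(\mu_A) = 1 - \Phi\!\left(-z_{\alpha} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right) \\[4pt]
H_A:\ \mu \ne \mu_0: \quad & \beta(\mu_A) = \Phi\!\left(z_{\alpha/2} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right) - \Phi\!\left(-z_{\alpha/2} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right)
\end{aligned}
```

As a check, the first line applied to the billing example gives Φ(1.645 + (170 − 180)/3.25) = Φ(-1.43) = .0764, consistent with the power of .9236 found for μ = 180.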

Sample Size n for which a level α test also has β(μA) = β
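The sample-size formula is likewise missing from the transcript. Under the normal approximation, the usual expression is n = [σ(zα + zβ)/(μ0 − μA)]² for a one-tailed test, with zα/2 replacing zα as an approximation for a two-tailed test. The sketch below implements that expression; the function name `sample_size` and the target β = .05 are hypothetical choices for illustration, applied to the billing example.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size(mu0, mu_a, sigma, alpha, beta, two_tailed=False):
    """Smallest n (normal approximation) for a level-alpha test with
    Type II error probability about beta at the alternative mu_a."""
    z = NormalDist()
    tail_area = alpha / 2 if two_tailed else alpha
    z_alpha = z.inv_cdf(1 - tail_area)
    z_beta = z.inv_cdf(1 - beta)
    n = (sigma * (z_alpha + z_beta) / (mu0 - mu_a)) ** 2
    return ceil(n)  # round up so beta does not exceed the target

# Billing example: detect mu = 180 vs mu0 = 170 with alpha = beta = .05.
n_needed = sample_size(170, 180, 65, alpha=0.05, beta=0.05)

# Verify by recomputing beta at that sample size.
se = 65 / sqrt(n_needed)
cutoff = 170 + NormalDist().inv_cdf(0.95) * se
achieved_beta = NormalDist(180, se).cdf(cutoff)

print(f"n={n_needed}, achieved beta={achieved_beta:.4f}")
```

Rounding up matters: the formula returns a non-integer n, and the next larger integer is the first sample size at which β is at or below the target.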