Presentation is loading. Please wait.

Presentation is loading. Please wait.

Type II Error, Power and Sample Size Calculations

Similar presentations


Presentation on theme: "Type II Error, Power and Sample Size Calculations"— Presentation transcript:

1 Type II Error, Power and Sample Size Calculations
Inference for Means: Part 2 Type II Error, Power and Sample Size Calculations

2 Alpha Levels: a Threshold for the P-value
Sometimes we need to make a firm decision about whether or not to reject the null hypothesis. When the P-value is small, it tells us that our data are rare if the null hypothesis is true. How rare is “rare”?

3 Alpha Levels - 2 We can define “rare event” arbitrarily by setting a threshold for our P-value. If our P-value falls below that threshold, we’ll reject H0. We call such results statistically significant. The threshold is called an alpha level, denoted by α.

4 Alpha Levels - 3 Common alpha levels are 0.10, 0.05, and 0.01.
You have the option—almost the obligation—to consider your alpha level carefully and choose an appropriate one for the situation. The alpha level is also called the significance level. When we reject the null hypothesis, we say that the test is “significant at that level.” Rejection Region (RR): values of the test statistic z that lead to rejection of the null hypothesis H0.

5 Levels and Rejection Regions 1-Tail
QTM1310/ Sharpe Levels and Rejection Regions 1-Tail If HA:  > 0 and =.10 then RR={t: t > } n = 30 df = 29 If HA:  > 0 and =.05 then RR={t: t > } Rej Region .10 t > .05 t > .01 t > If HA:  > 0 and =.01 then RR={t: t > } 5 5

6 Rejection Region: t > 1.8331
Recall: Sweetening colas Is there evidence that storage results in sweetness loss in colas? H0:  = 0 versus Ha:  > 0 (one-sided test) Use  = 0.05 Taster Sweetness loss ___________________________ Average Standard deviation Degrees of freedom n − 1 = 9 Rejection Region: t > Test statistic: Conclusion: since the test statistic value 2.70 is in the RR, reject H0 :  = 0 in favor of Ha :  > 0; there is sufficient evidence to conclude that there is a loss of sweetness due to storage. P-value= P(t9 > 2.70) = . 012

7 Levels and Rejection Regions 1-Tail
QTM1310/ Sharpe Levels and Rejection Regions 1-Tail If HA:  < 0 and =.10 then RR={t: t < } n = 30 df = 29 If HA:  < 0 and =.05 then RR={t: t < } Rej Region .10 t < .05 t < .01 t < If HA:  < 0 and =.01 then RR={t: t < } 7 7

8 Rejection region for a 2-tail test with α = 0.05
df = 29 A 2-tailed test means that area α/2 is in each tail, thus: -A middle area of 1 − α = .95, and tail areas of α /2 = RR={t < , t > } From t Table

9 Levels and Rejection Regions 2-Tail
QTM1310/ Sharpe Levels and Rejection Regions 2-Tail If HA:   0 and =.10, then RR={t: t < , t >1.6991} n = 30 df = 29 If HA:   0 and =.05, then RR={t: t < , t > } Rejection Region .10 t < , t > .05 t < , t > .01 t < , t > If HA:   0 and =.01, then RR={t: t < , t > } 9 9

10 Summary:  Levels and Rejection Regions; n = 40, df = 39
HA:  > 0 HA:  < 0 HA:  ≠ 0 Rejection Region .01 t > .05 t > .10 t > Rejection Region .01 t < .05 t < .10 t < Rejection Region .01 t < , t > .02 t < , t > .05 t < , t > .10 t < , t >

11 Alpha Levels (cont.) What can you say if the P-value does not fall below α ? You should say that “The data have failed to provide sufficient evidence to reject the null hypothesis.” Don’t say that you “accept the null hypothesis.”

12 Alpha Levels (cont.) H0: defendant innocent; HA: defendant guilty
Recall that, in a jury trial, if we do not find the defendant guilty, we say the defendant is “not guilty”—we don’t say that the defendant is “innocent.”

13 Alpha Levels (cont.) The P-value gives the reader far more information than just stating that you reject or fail to reject the null. In fact, by providing a P-value to the reader, you allow that person to make his or her own choice of the significance level . What you consider to be statistically significant might not be the same as what someone else considers statistically significant. There is more than one alpha level that can be used, but each test will give only one P-value.

14 Confidence Intervals and Hypothesis Tests
Because confidence intervals are two-sided, they correspond to two-sided (two-tailed) hypothesis tests. In general, a confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α-level of 100 – C%. For example: If a 2-sided hypothesis test at level .05 rejects H0 , then the null hypothesized value of  will not be in a 95% confidence interval calculated from the same data. A 95% confidence interval shows the values of null hypothesis values 0 for which a 2-sided hypothesis test at level .05 will NOT reject the null hypothesis.

15 Confidence Intervals and Hypothesis Tests: Example
Sleep researchers claim that college students need at least 7 hours of sleep each night. A random sample of n = 25 college students was asked how many hours they slept the previous night. Summary: Construct a 90% confidence interval for the mean hours  slept by college students each night. Perform the hypothesis test H0:=7 vs Ha: 7 at =.10

16 H0:  = 0 , HA:  ¹ 0 at  =.10 will NOT reject H0:  = 0
Solution The 90% confidence interval (6.272, 7.008) gives the values of 0 for which a 2-tailed hypothesis test H0:  = 0 , HA:  ¹ 0 at  =.10 will NOT reject H0:  = 0

17 Here’s some shocking news for you: nobody’s perfect
Here’s some shocking news for you: nobody’s perfect. Even with lots of evidence we can still make the wrong decision. When we perform a hypothesis test, we can make mistakes in two ways: The null hypothesis is not false, but we mistakenly reject it. (Type I error) The null hypothesis is false, but we fail to reject it. (Type II error) Making Errors

18 H0: not pregnant, Ha: pregnant

19 Making Errors (cont.) Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent. Here’s an illustration of the four situations in a hypothesis test:

20 Making Errors (cont.) How often will a Type I error occur?
Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level. When H0 is false and we reject it, we have done the right thing. A test’s ability to detect a false hypothesis is called the power of the test.

21 Making Errors (cont.) When H0 is false and we fail to reject it, we have made a Type II error. We assign the letter β to the probability of this mistake. It’s harder to assess the value of β because we don’t know what the value of the parameter really is. There is no single value for β--we can think of a whole collection of β’s, one for each incorrect parameter value.

22 Making Errors (cont.) One way to focus our attention on a particular β is to think about the effect size. Ask “How big a difference would matter?” We could reduce β for all alternative parameter values by increasing α. This would reduce β but increase the chance of a Type I error. This tension between Type I and Type II errors is inevitable. The only way to reduce both types of errors is to collect more data. Otherwise, we just wind up trading off one kind of error against the other.

23 Hypothesis Testing for , Type II Error Probabilities (Right-tail example)
A new billing system for a department store will be cost- effective only if the mean monthly account is more than $170. A sample of 400 accounts has a mean of $174 and s = $65. Can we conclude that the new system will be cost effective?

24 Example (cont.) Hypotheses
The population of interest is the credit accounts at the store. We want to know whether the mean account for all customers is greater than $170. H0 : m = 170 HA : m > 170 Where m is the mean account value for all customers

25 Example (cont.) Test statistic: H0 : m = 170 HA : m > 170

26 Type II error is possible
Example (cont.) P-value: The probability of observing a value of the test statistic as extreme or more extreme then t = 1.23, given that m = 170 is… t399 Since the P-value > .05, we conclude that there is not sufficient evidence to reject H0 : =170. Type II error is possible

27 Calculating , the Probability of a Type II Error
Calculating  for the t test is not at all straightforward and is beyond the level of this course The distribution of the test statistic t is quite complicated when H0 is false and HA is true However, we can obtain very good approximate values for  using z (the standard normal) in place of t.

28 Calculating , the Probability of a Type II Error (cont.)
We need to specify an appropriate significance level ; Determine the rejection region in terms of z Then calculate the probability of not being in the rejection when  = 1, where 1 is a value of  that makes HA true.

29 Example (cont.) calculating 
Test statistic: H0 : m = 170 HA : m > 170 Choose  = .05 Rejection region in terms of z: z > z.05 = 1.645 a = 0.05

30 Example (cont.) calculating 
Express the rejection region directly, not in standardized terms The rejection region with a = .05. Let the alternative value be m = 180 (rather than just m>170) H0: m = 170 HA: m = 180 a=.05 m= 170 m=180 Specify the alternative value under HA. Do not reject H0

31 Example (cont.) calculating 
A Type II error occurs when a false H0 is not rejected. Suppose =180, that is, H0 is false. A false H0… H0: m = 170 …is not rejected H1: m = 180 a=.05 m= 170 m=180

32 Example (cont.) calculating 
Power when =180 = 1-(180)=.9236 H0: m = 170 H1: m = 180 m=180 m= 170

33 Effects on b of changing a
Increasing the significance level a, decreases the value of b, and vice versa. a2 > b2 < a1 b1 m= 170 m=180

34 Judging the Test A hypothesis test is effectively defined by the significance level a and by the sample size n. If the probability of a Type II error b is judged to be too large, we can reduce it by increasing a, and/or increasing the sample size.

35 Judging the Test Increasing the sample size reduces b
By increasing the sample size the standard deviation of the sampling distribution of the mean decreases. Thus, the cutoff value of for the rejection region decreases.

36 Judging the Test Increasing the sample size reduces b
Note what happens when n increases: a does not change, but b becomes smaller m=180 m= 170

37 Judging the Test Increasing the sample size reduces b
In the example, suppose n increases from 400 to 1000. a remains 5%, but the probability of a Type II error decreases dramatically.

38 A Left - Tail Test Self-Addressed Stamped Envelopes.
The chief financial officer in FedEx believes that including a stamped self-addressed (SSA) envelope in the monthly invoice sent to customers will decrease the amount of time it take for customers to pay their monthly bills. Currently, customers return their payments in 24 days on the average, with a standard deviation of 6 days. Stamped self-addressed envelopes are included with the bills for 75 randomly selected customers. The number of days until they return their payment is recorded.

39 A Left - Tail Test: Hypotheses
The parameter tested is the population mean payment period (m) for customers who receive self-addressed stamped envelopes with their bill. The hypotheses are: H0: m = 24 H1: m < 24 Use  = .05; n = 75.

40 A Left - Tail Test: Rejection Region
The rejection region: t < t.05,74 = 1.666 Results from the 75 randomly selected customers:

41 A Left -Tail Test: Test Statistic
The test statistic is: Since the rejection region is We do not reject the null hypothesis. Note that the P-value = P(t74 < -1.52) = .066. Since our decision is to not reject the null hypothesis, A Type II error is possible.

42 Left-Tail Test: Calculating , the Probability of a Type II Error
The CFO thinks that a decrease of one day in the average payment return time will cover the costs of the envelopes since customer checks can be deposited earlier. What is (23), the probability of a Type II error when the true mean payment return time  is 23 days?

43 Left-tail test: calculating  (cont.)
Test statistic: H0 : m = 24 HA : m < 24 Choose  = .05 Rejection region in terms of z: z < -z.05 = a = 0.05

44 Left-tail test: calculating  (cont.)
Express the rejection region directly, not in standardized terms The rejection region with a = .05. Let the alternative value be m = 23 (rather than just m < 24) m=24 H0: m = 24 Specify the alternative value under HA. m= 23 HA: m = 23 Do not reject H0 a=.05

45 Left-tail test: calculating  (cont.)
Power when =23 is 1-(23)=1-.58=.42 H0: m = 24 HA: m = 23 m=24 a=.05 m= 23

46 A Two - Tail Test for  The Federal Communications Commission (FCC) wants competition between phone companies. The FCC wants to investigate if AT&T rates differ from their competitor’s rates. According to data from the (FCC) the mean monthly long-distance bills for all AT&T residential customers is $17.09.

47 A Two - Tail Test (cont.) A random sample of 100 AT&T customers is selected and their bills are recalculated using a leading competitor’s rates. The mean and standard deviation of the bills using the competitor’s rates are Can we infer that there is a difference between AT&T’s bills and the competitor’s bills (on the average)?

48 A Two - Tail Test (cont.) Is the mean different from 17.09?
n = 100; use  = .05 H0: m = 17.09

49 A Two – Tail Test (cont.) Rejection region t99 a/2 = 0.025
a/2 = 0.025 -ta/2 = ta/2 = Rejection region

50 A Two – Tail Test: Conclusion
There is insufficient evidence to conclude that there is a difference between the bills of AT&T and the competitor. Also, by the P-value approach: The P-value = P(t < -1.19) + P(t > 1.19) = 2(.1184) = > .05 a/2 = 0.025 a/2 = 0.025 A Type II error is possible -1.19 1.19 -ta/2 = ta/2 =

51 Two-Tail Test: Calculating , the Probability of a Type II Error
The FCC would like to detect a decrease of $1.50 in the average competitor’s bill. ( =15.59) What is (15.59), the probability of a Type II error when the true mean competitor’s bill  is $15.59?

52 Two – Tail Test: Calculating  (cont.)
Rejection region a/2 = 0.025 a/2 = 0.025 Do not reject H0 17.09 Reject H0

53 Two – Tail Test: Calculating  (cont.)
Power when =15.59 is 1-(15.59)=.972 H0: m = 17.09 HA: m = 15.59 m=17.09 m= 15.59

54 General formula: Type II Error Probability (A) for a Level  Test

55 Sample Size n for which a level  test also has (A) = 


Download ppt "Type II Error, Power and Sample Size Calculations"

Similar presentations


Ads by Google