Statistical Hypothesis Testing Review
A statistical hypothesis is an assertion concerning one or more populations.
In statistics, a hypothesis test is conducted on a set of two mutually exclusive statements:
H0 : null hypothesis
H1 : alternate hypothesis
Example:
H0 : μ = 17
H1 : μ ≠ 17
We sometimes refer to the null hypothesis as the “equals” hypothesis.
Draw the critical region and the critical value.
EGR252 2016 Ch10 Lec2and3 9th edition rev3 EGR 252 F06 Ch. 10 8th edition
Potential errors in decision-making
α: probability of committing a Type I error, i.e., rejecting the null hypothesis given that the null hypothesis is true.
α = P(reject H0 | H0 is true)
β: probability of committing a Type II error, i.e., failing to reject the null hypothesis given that the alternate hypothesis is true.
β = P(fail to reject H0 | H1 is true)
Power of the test = 1 - β: probability of rejecting the null hypothesis given that the alternate is true.
Power = P(reject H0 | H1 is true)
Hypothesis Testing – Approach 1
Approach 1 - Fixed probability of Type I error.
1. State the null and alternative hypotheses.
2. Choose a fixed significance level α.
3. Specify the appropriate test statistic and establish the critical region based on α. Draw a graphic representation.
4. Calculate the value of the test statistic based on the sample data.
5. Make a decision to reject H0 or fail to reject H0, based on the location of the test statistic.
6. Make an engineering or scientific conclusion.
Recall our question about the amount of coffee in the cup:
1. H0 : μ = 8 oz., H1 : μ < 8 oz.
2. α = 0.05
3. zcrit = -1.645 (lower tail, α = 0.05)
5. If zcalc < -1.645, reject H0; if zcalc > -1.645, fail to reject H0.
6. e.g., coffee in the cup is significantly less than 8 oz., or coffee in the cup is not significantly less than 8 oz.
Hypothesis Testing – Approach 2
Approach 2 - Significance testing based on the calculated P-value.
1. State the null and alternative hypotheses.
2. Choose an appropriate test statistic.
3. Calculate the value of the test statistic and determine the P-value. Draw a graphic representation.
4. Make a decision to reject H0 or fail to reject H0, based on the P-value.
5. Make an engineering or scientific conclusion.
Recall our question about the amount of coffee in the cup:
1. H0 : μ = 8 oz., H1 : μ < 8 oz.
2. If the variance is known or n is large, use a z-test (assume z in this case).
3. From zcalc, determine the P-value from Table A.3 or from a statistical software package.
4. As P approaches 0, H0 is not plausible and is rejected (e.g., coffee in the cup is significantly less than 8 oz.). As P approaches 1, H0 is not rejected (coffee in the cup is not significantly less than 8 oz.).
Note: Approach 1 is the classical method. Approach 2 is gaining acceptance, partly because of the increasing availability of statistical software packages. The conclusion based on a P-value requires judgment: the smaller the P-value, the less plausible the null hypothesis.
[Figure: P-value scale running from 0 to 1.00, with p = 0.05 marked near the low end.]
Example: Single Sample Test of the Mean
P-value Approach
A sample of 20 cars driven under varying highway conditions achieved fuel efficiencies as follows:
Sample mean x̄ = 34.271 mpg
Sample std dev s = 2.915 mpg
Test the hypothesis that the population mean equals 35.0 mpg vs. μ < 35.
Step 1: State the hypotheses.
H0: μ = 35
H1: μ < 35
Step 2: Determine the appropriate test statistic.
σ unknown, n = 20. Therefore, use the t distribution.
Single Sample Example (cont.)
Approach 2:
tcalc = (x̄ - μ0) / (s / sqrt(n)) = (34.271 - 35) / (2.915 / sqrt(20)) = -1.11842
In Excel: =(34.271-35)/(2.915/(SQRT(20)))
Find the probability from a chart or use Excel’s TDIST function:
P(T ≤ -1.118) = TDIST(1.118, 19, 1) = 0.138665, so p ≈ 0.14
That is, about 13.9% of the area under the curve lies to the left of t = -1.118. Draw the graph.
NOTE: in Table A.4, pg. 672, the α value associated with 1.118 (by symmetry) falls between 0.15 and 0.10.
Decision: Fail to reject the null hypothesis; H0 is plausible.
Conclusion: The mean is not significantly less than 35 mpg.
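The t statistic and its lower-tail P-value can be checked without Excel. The following is a minimal sketch using only the Python standard library; the helper names (t_pdf, t_upper_tail) and the use of Simpson's rule to integrate the t density are illustrative assumptions, not part of the course materials.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_abs, df, steps=2000):
    """P(T >= t_abs) for t_abs >= 0, via Simpson's rule on [0, t_abs] (steps must be even)."""
    h = t_abs / steps
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the total area lies above 0; subtract the area between 0 and t_abs.
    return 0.5 - total * h / 3

# Summary statistics from the fuel-efficiency example.
x_bar, mu0, s, n = 34.271, 35.0, 2.915, 20

t_calc = (x_bar - mu0) / (s / math.sqrt(n))
# H1 is mu < 35, so the P-value is the lower-tail area, which by symmetry
# equals the upper-tail area beyond |t_calc|.
p_value = t_upper_tail(abs(t_calc), n - 1)

print(f"t_calc = {t_calc:.5f}, one-tail p = {p_value:.3f}")
```

The result should agree with the TDIST value to about three decimal places; the numerical integration is only a stand-in for a statistical table or software package.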
Example (concl.)
Approach 1: Predetermined significance level (alpha)
Step 1: Use the same hypotheses.
Step 2: Let’s set alpha at 0.05.
Step 3: Determine the critical value of t that separates the “reject H0” region from the “do not reject H0” region.
tα, n-1 = t0.05,19 = 1.729
Since H1 specifies “<” (a one-sided, or one-tailed, test), we declare tcrit = -1.729.
Step 4: Using the equation, we calculate tcalc = -1.11842.
Step 5: Decision: Fail to reject H0, since tcalc = -1.11842 is not less than tcrit = -1.729.
Step 6: Conclusion: The mean is not significantly less than 35 mpg.
Draw the picture.
Your turn … same data, different hypotheses
A sample of 20 cars driven under varying highway conditions achieved fuel efficiencies as follows:
Sample mean x̄ = 34.271 mpg
Sample std dev s = 2.915 mpg
Test the hypothesis that the population mean equals 35.0 mpg vs. μ ≠ 35 at an α level of 0.05. Be sure to draw the picture.
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6 (Conclusion will be different.)
Note: t0.025,19 = 2.093, so the critical values are -2.093 and 2.093.
Two-Sample Hypothesis Testing
A professor has designed an experiment to test the effect of reading the textbook before attempting to complete a homework assignment.
Four students who read the textbook before attempting the homework recorded the following times (in hours) to complete the assignment:
3.1, 2.8, 0.5, 1.9
Five students who did not read the textbook before attempting the homework recorded the following times (in hours) to complete the assignment:
0.9, 1.4, 2.1, 5.3, 4.6
Two-Sample Hypothesis Testing
Define the difference in the two means as:
μ1 - μ2 = d0
where d0 is the hypothesized value of the difference.
What are the hypotheses?
NOTE: d0 is often 0 (there is, statistically speaking, no difference in the means).
H0: μ1 - μ2 = 0
H1: μ1 - μ2 < 0 (note: compare lower to higher for a lower-tail test)
or H1: μ1 - μ2 ≠ 0
or H1: μ1 – μ2 > 0 (note: compare higher to lower for an upper-tail test)
Our Example Using Excel
Reading:    n1 = 4, mean x̄1 = 2.075, s1² = 1.363
No reading: n2 = 5, mean x̄2 = 2.860, s2² = 3.883
If we have reason to believe the population variances are “equal”, we can conduct a t-test assuming equal variances in Minitab or Excel.

t-Test: Two-Sample Assuming Equal Variances
                               Read         DoNotRead
Mean                           2.075        2.860
Variance                       1.3625       3.883
Observations                   4            5
Pooled Variance                2.8027857
Hypothesized Mean Difference
df                             7
t Stat                         -0.698986
P(T<=t) one-tail               0.2535567
t Critical one-tail            1.8945775
P(T<=t) two-tail               0.5071134
t Critical two-tail            2.3646226

Check: sp² = (3(1.363) + 4(3.883)) / (4 + 5 - 2) = 2.803, sp = 1.674
Your turn …
Lower-tail test (μ1 - μ2 < 0)
  “Fixed α” approach (“Approach 1”) at α = 0.05 level.
  “p-value” approach (“Approach 2”)
Upper-tail test (μ2 – μ1 > 0)
  “Fixed α” approach at α = 0.05 level.
  “p-value” approach
Two-tailed test (μ1 - μ2 ≠ 0)
Recall: we have to compare the higher mean minus the lower mean to conduct an upper-tail test.
Our Example – Hand Calculation
Reading:    n1 = 4, mean x̄1 = 2.075, s1² = 1.363
No reading: n2 = 5, mean x̄2 = 2.860, s2² = 3.883
To conduct the test by hand, we must calculate sp².
sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) = (3(1.363) + 4(3.883)) / (4 + 5 - 2) = 2.803
sp = 1.674
and tcalc = ????
Lower-tail test (μ1 - μ2 < 0)
Why? Draw the picture.
Approach 1: df = 7, t0.05,7 = 1.895, so tcrit = -1.895
Calculation: tcalc = ((2.075 - 2.860) - 0) / (1.674 * sqrt(1/4 + 1/5)) = -0.70
Approach 2: =TDIST(0.7,7,1) = 0.253259
Decision: fail to reject H0, since tcalc = -0.70 is not less than tcrit = -1.895 (equivalently, P = 0.25 > 0.05).
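The pooled-variance calculation can also be reproduced from the raw completion times. This is a sketch under the equal-variances assumption; the function name pooled_t is hypothetical, and the critical value is hard-coded from the slide rather than computed.

```python
import math
import statistics

# Completion times in hours, from the homework experiment.
read = [3.1, 2.8, 0.5, 1.9]
no_read = [0.9, 1.4, 2.1, 5.3, 4.6]

def pooled_t(x1, x2, d0=0.0):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)  # sample variance (n - 1 denominator)
    s2_sq = statistics.variance(x2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    diff = statistics.mean(x1) - statistics.mean(x2) - d0
    t = diff / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, sp_sq, n1 + n2 - 2

t_calc, sp_sq, df = pooled_t(read, no_read)

# Lower-tail test at alpha = 0.05: t_crit = -t(0.05, 7) = -1.895 from the t table.
t_crit = -1.895
reject = t_calc < t_crit

print(f"sp^2 = {sp_sq:.4f}, df = {df}, t_calc = {t_calc:.4f}, reject H0: {reject}")
```

Swapping the argument order, pooled_t(no_read, read), gives the positive statistic used for the upper-tail version of the test.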
Upper-tail test (μ2 – μ1 > 0)
tcalc = ((2.860 - 2.075) - 0) / (1.674 * sqrt(1/4 + 1/5)) = 0.70
Approach 1: df = 7, t0.05,7 = 1.895, so tcrit = 1.895
Approach 2: =TDIST(0.7,7,1) = 0.253259
Decision: fail to reject H0
Conclusions
The data do not support the hypothesis that the mean time to complete homework is less for students who read the textbook (equivalently, that it is more for students who do not read the textbook).
or
There is no statistically significant difference in the time required to complete the homework for the people who read the text ahead of time vs. those who did not.
Our Example Using Excel
Reading:    n1 = 4, mean x̄1 = 2.075, s1² = 1.363
No reading: n2 = 5, mean x̄2 = 2.860, s2² = 3.883
What if we do not have reason to believe the population variances are “equal”? We can conduct a t-test assuming unequal variances in Minitab or Excel.

t-Test: Two-Sample Assuming Unequal Variances
                               Read         DoNotRead
Mean                           2.075        2.86
Variance                       1.3625       3.883
Observations                   4            5
Hypothesized Mean Difference
df                             7
t Stat                         -0.7426759
P(T<=t) one-tail               0.2409258
t Critical one-tail            1.8945775
P(T<=t) two-tail               0.4818516
t Critical two-tail            2.3646226

For comparison, the equal-variances test gave t Stat = -0.698986 and P(T<=t) one-tail = 0.2535567, with df = 7 and pooled variance 2.8027857.
Another Example: Low Carb Meals
Suppose we want to test the difference in carbohydrate content between two “low-carb” meals. Random samples of the two meals are tested in the lab and the carbohydrate content per serving (in grams) is recorded, with the following results:
n1 = 15, x̄1 = 27.2, s1² = 11
n2 = 10, x̄2 = 23.9, s2² = 23
tcalc = (27.2 - 23.9) / SQRT(11/15 + 23/10) = 1.894759
ν = 14.69 ≈ 15 (using the equation in Table 10.3; also cited as Table 10.2, pg. 313)
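When the variances are not pooled, both the statistic and the approximate degrees of freedom come from the per-sample terms s²/n. The sketch below assumes the degrees-of-freedom equation referenced on the slide is the usual Satterthwaite approximation; the function name welch_t is hypothetical.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2, d0=0.0):
    """Two-sample t statistic and approximate df without pooling the variances."""
    v1, v2 = var1 / n1, var2 / n2  # squared standard errors of each sample mean
    t = (mean1 - mean2 - d0) / math.sqrt(v1 + v2)
    # Satterthwaite approximation to the degrees of freedom (assumed here).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from the low-carb meal example.
t_calc, df = welch_t(27.2, 11, 15, 23.9, 23, 10)

print(f"t_calc = {t_calc:.6f}, df = {df:.2f}")
```

The unrounded degrees of freedom are then rounded to the nearest integer before looking up the critical value in a t table.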
Example (cont.)
What are our options for hypotheses?
One-tailed: H0: μ1 - μ2 = 0, H1: μ1 - μ2 > 0
Two-tailed: H0: μ1 - μ2 = 0, H1: μ1 - μ2 ≠ 0
At an α level of 0.05:
One-tailed test, t0.05,15 = 1.753
Two-tailed test, t0.025,15 = 2.131
How are our conclusions affected? (tcalc = 1.895)
Our data don’t support a conclusion that the mean carb contents of the two meals are different at an alpha level of .05. (What is H1? H1: μ1 - μ2 ≠ 0, since 1.895 < 2.131.)
Our data do support a conclusion that Meal 1 has more carbs, on average, than Meal 2 at an alpha level of .05. (What is H1? H1: μ1 - μ2 > 0, since 1.895 > 1.753.)
Note: 1-sided p-value =TDIST(1.895,15,1) = 0.038766; 2-sided p-value =TDIST(1.895,15,2) = 0.077533
Special Case: Paired Sample T-Test
Which designs are paired-sample?

Car    Radial   Belted
1      **       **        Radial and belted tires placed on each car.
2      **       **
3      **       **
4      **       **
Answer: paired-sample.

Person   Pre   Post
1        **    **         Pre- and post-test administered to each person.
2        **    **
3        **    **
Answer: paired-sample.

Student   Test1   Test2
1         **      **      4 scores from test 1, 4 scores from test 2.
2         **      **
Answer: maybe. If we have information that the test 1 and test 2 scores can be matched to a particular individual for every subject in the study, it is a paired-sample experiment; otherwise it is a 2-sample experiment.
Shear Strength Example*
An article in the Journal of Strain Analysis compares several methods for predicting the shear strength of steel plate girders. Data for two of these methods, when applied to nine specific girders, are shown in the table on the next slide. We would like to determine if there is any difference, on average, between the two methods.
Procedure: We will conduct a paired-sample t-test at the 0.05 significance level to determine if there is a difference between the two methods.
Difference scores, d: 0.125, 0.159, etc.
* adapted from Montgomery & Runger, Applied Statistics and Probability for Engineers.
Shear Strength Example Data

Girder   Karlsruhe Method   Lehigh Method   Difference (d)
1        1.186              1.061           0.125
2        1.151              0.992           0.159
3        1.322              1.063           0.259
4        1.339              1.062           0.277
5        1.200              1.065           0.135
6        1.402              1.178           0.224
7        1.365              1.037           0.328
8        1.537              1.086           0.451
9        1.559              1.052           0.507
Shear Strength Example Calculations
Hypotheses:
H0: μD = 0
H1: μD ≠ 0
t0.025,8 = 2.306. Why 8? (df = n - 1 = 9 - 1 = 8)
Calculation of difference scores (d, from the data table), mean and standard deviation, and tcalc:
d̄ = 0.2739
sd = 0.1351
tcalc = (d̄ - d0) / (sd / sqrt(n)) = (0.2739 - 0) / (0.1351 / 3) = 6.082
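The paired-sample statistic is a one-sample t-test applied to the difference scores. A minimal sketch using the girder data from the table follows; the variable names are illustrative, and the critical value 2.306 is taken from the slide rather than computed.

```python
import math
import statistics

# Predicted shear strengths for the nine girders, from the data table.
karlsruhe = [1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559]
lehigh = [1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052]

# Difference scores: Karlsruhe minus Lehigh for each girder.
d = [k - l for k, l in zip(karlsruhe, lehigh)]

n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)  # sample standard deviation (n - 1 denominator)
t_calc = (d_bar - 0) / (s_d / math.sqrt(n))

# Two-tailed test at alpha = 0.05 with df = n - 1 = 8.
t_crit = 2.306
reject = abs(t_calc) > t_crit

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t_calc = {t_calc:.3f}, reject H0: {reject}")
```

Because the differences are taken as Karlsruhe minus Lehigh, a positive t_calc points toward larger predictions from the Karlsruhe method.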
What does this mean?
Draw the graphic: t distribution with tcrit = -2.306 (lower boundary) and 2.306 (upper boundary), and tcalc = 6.082.
Decision: reject H0
Conclusion: The two methods produce different results, on average. Since we subtracted Lehigh scores from Karlsruhe scores, and the resulting difference scores were positive, we have evidence that the Karlsruhe method yields larger strength predictions.
Goodness-of-Fit Tests
Procedures for confirming or refuting hypotheses about the distributions of random variables.
Hypotheses:
H0: The population follows a particular distribution.
H1: The population does not follow the distribution.
Examples:
H0: The data come from a normal distribution.
H1: The data do not come from a normal distribution.
Goodness of Fit Tests: Basic Method
Test statistic is χ²
Draw the picture (χ² graph).
Determine the critical value χ² with parameters α, ν = k – 1, where k = number of “cells” (Table A.5, pg 739-740).
Calculate χ² from the sample.
Compare χ²calc to χ²crit; show both on the drawing.
Make a decision about H0.
State your conclusion.
Note: some texts use ν = k - 1 - h, where h is the number of parameters in the distribution being tested (e.g., 1 for Poisson, 2 for normal).
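The basic method can be sketched in a few lines. The slides give no goodness-of-fit data, so the counts below are hypothetical (60 rolls of a die tested against a uniform distribution) and serve only to illustrate the mechanics; the function name chi_square_gof is also an assumption.

```python
def chi_square_gof(observed, expected):
    """Chi-squared statistic: sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# HYPOTHETICAL data: counts of faces 1..6 in 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]
k = len(observed)

# H0: the die is fair, so every cell has the same expected count.
expected = [sum(observed) / k] * k

chi2_calc = chi_square_gof(observed, expected)
df = k - 1  # no parameters estimated from the data, so v = k - 1

# Critical value for alpha = 0.05 and df = 5, from a chi-squared table.
chi2_crit = 11.070
reject = chi2_calc > chi2_crit

print(f"chi2_calc = {chi2_calc:.3f}, df = {df}, reject H0: {reject}")
```

For a distribution whose parameters are estimated from the sample, the degrees of freedom would be reduced as described in the note on k - 1 - h.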
Tests of Independence
Example: 500 employees were surveyed with respect to pension plan preferences.
Hypotheses
H0: Worker Type and Pension Plan are independent.
H1: Worker Type and Pension Plan are not independent.
1. Develop a contingency table showing the observed values for the 500 people surveyed.

Worker Type    Pension Plan            Total
               #1      #2      #3
Salaried       160     140     40      340
Hourly         40      60      60      160
Total          200     200     100     500
Calculation of Expected Values

Worker Type    Pension Plan            Total
               #1      #2      #3
Salaried       160     140     40      340
Hourly         40      60      60      160
Total          200     200     100     500

2. Calculate expected probabilities and expected counts.
P(#1 ∩ S) = P(#1)*P(S) = (200/500)*(340/500) = 0.272
E(#1 ∩ S) = 0.272 * 500 = 136
P(#1 ∩ H) = P(#1)*P(H) = (200/500)*(160/500) = 0.128
E(#1 ∩ H) = 0.128 * 500 = 64

            #1      #2      #3
S (exp.)    136     136     68
H (exp.)    64      64      32
Calculate the Sample-based Statistic
χ²calc = Σ (observed - expected)^2 / expected
= (160-136)^2/136 + (140-136)^2/136 + (40-68)^2/68 + (40-64)^2/64 + (60-64)^2/64 + (60-32)^2/32
= 49.63
The Chi-Squared Test of Independence
5. Compare to the critical statistic, χ²α,v, where v = (r – 1)(c – 1).
Note: v is the symbol for degrees of freedom.
For our example, suppose α = 0.01.
χ²0.01,2 = 9.210 (from Table A.5, p. 740)
χ²calc = 49.63
Decision: χ²calc > χ²crit, so reject the null hypothesis.
Conclusion: worker type and pension plan are not independent.
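The whole independence test, from observed counts to decision, can be sketched as follows. The observed counts are the ones that appear in the χ² calculation and the Minitab output (Salaried: 160, 140, 40; Hourly: 40, 60, 60). The closed-form P-value exp(-χ²/2) is an assumption that holds only because this table has exactly 2 degrees of freedom.

```python
import math

# Observed counts: rows are worker types (salaried, hourly), columns are plans #1..#3.
observed = [
    [160, 140, 40],
    [40, 60, 60],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, expected count = (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_calc = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 ONLY, the chi-squared upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi2_calc / 2)

chi2_crit = 9.210  # alpha = 0.01, df = 2, from the chi-squared table
reject = chi2_calc > chi2_crit

print(f"chi2_calc = {chi2_calc:.3f}, df = {df}, p = {p_value:.2e}, reject H0: {reject}")
```

For tables of other sizes the P-value would need a chi-squared table or a statistics package; only the expected-count and statistic calculations carry over unchanged.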
The Chi-Squared Test in Minitab 15
Chi-Square Test: pp1, pp2, pp3
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            pp1       pp2       pp3     Total
1           160       140        40       340
         136.00    136.00     68.00
          4.235     0.118    11.529

2            40        60        60       160
          64.00     64.00     32.00
          9.000     0.250    24.500

Total       200       200       100       500

Test statistic: Chi-Sq calc = 49.632, DF = 2, P-Value = 0.000
Reject Ho. Conclude that worker and plan are not independent.