Analyze Phase Hypothesis Testing Non Normal Data Part 2

Presentation transcript:

Analyze Phase Hypothesis Testing Non Normal Data Part 2 Now we will continue in the Analyze Phase with “Hypothesis Testing Non-Normal Data Part 2”.

Hypothesis Testing Non Normal Data Part 2 Module roadmap: Welcome to Analyze; “X” Sifting; Inferential Statistics; Intro to Hypothesis Testing; Hypothesis Testing ND P1; Hypothesis Testing ND P2; Hypothesis Testing NND P1; Hypothesis Testing NND P2 (Tests for Proportions, Contingency Tables); Wrap Up & Action Items. The core fundamentals of this phase are Tests for Proportions and Contingency Tables.

Hypothesis Testing Roadmap: Attribute Data
  One Factor, One Sample: One Sample Proportion
  One Factor, Two Samples: Two Sample Proportion (MINITAB™: Stat - Basic Stats - 2 Proportions). If P-value < 0.05 the proportions are different.
  One Factor, Two or More Samples: Chi Square Test (Contingency Table). If P-value < 0.05 at least one proportion is different.
  Two Factors: Chi Square Test (Contingency Table). If P-value < 0.05 the factors are not independent.
  Chi Square Test in MINITAB™: Stat - Tables - Chi-Square Test
We will now continue with the roadmap for Attribute Data. Since Attribute Data is Non-normal by definition it belongs in this module on Non-normal Data.

Sample Size and Types of Data
For Continuous Data:
  Capability Analysis: a minimum of 30 samples
  Hypothesis Testing: depends on the practical difference to be detected and the inherent variation in the process, as well as the statistical confidence you wish to have.
For Attribute Data:
  Capability Analysis: a lot of samples
  Hypothesis Testing: a lot, but depends on the practical difference to be detected as well as the statistical confidence you wish to have.
Sample size is dependent on the type of data. MINITAB™ can estimate sample sizes, but remember: the smaller the difference that needs to be detected, the larger the sample size must be!

Proportion versus a Target This test is used to determine if the process proportion (p) equals some desired value, p0. The hypotheses: H0: p = p0, Ha: p ≠ p0. The observed test statistic is calculated as follows (normal approximation): $Z_{obs} = \dfrac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$, where $\hat{p}$ is the sample proportion and n is the sample size. This is compared to $Z_{crit} = Z_{\alpha/2}$. This formula is an approximation for ease of manual calculation.
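The normal-approximation test of a proportion against a target can be sketched in a few lines of Python. This is a minimal illustration, not MINITAB™'s implementation: it assumes the statistic Z = (p̂ − p0) / sqrt(p0(1 − p0)/n) and a two-sided alternative. The function name `one_proportion_z` is hypothetical, and the counts (480 accurate out of 500, target 0.99) are the shipping figures used in this module's example.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0, alpha=0.05):
    """Normal-approximation test of H0: p = p0 against Ha: p != p0."""
    p_hat = successes / n                      # sample proportion
    se0 = sqrt(p0 * (1 - p0) / n)              # standard error under H0
    z_obs = (p_hat - p0) / se0                 # observed test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    return z_obs, z_crit, p_value


# Shipping example: 480 accurate shipments out of 500, target accuracy 99%.
z_obs, z_crit, p_value = one_proportion_z(480, 500, 0.99)
print(f"Z_obs = {z_obs:.2f}, Z_crit = {z_crit:.2f}")
print("Reject H0" if abs(z_obs) > z_crit else "Fail to reject H0")
```

Because |Z_obs| is far larger than Z_crit here, the approximation and an exact test lead to the same decision; for proportions this close to 1 and smaller samples the two can disagree, which is one reason to prefer an exact method.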

Proportion versus a Target Shipping accuracy has a target of 99%; determine if the current process is on target. Hypotheses: H0: p = 0.99, Ha: p ≠ 0.99. One sample proportion test. Choose α = 5%. Sample size: Stat > Power and Sample Size > 1 Proportion… Enter multiple values for the alternative values of p and MINITAB™ will give the different sample sizes. Now let’s try an example.

Proportion versus a Target Our sample included 500 shipped items of which 480 were accurate.
Power and Sample Size
Test for One Proportion
Testing proportion = 0.99 (versus not = 0.99)
Alpha = 0.05
  Alternative   Sample   Target
  Proportion    Size     Power    Actual Power
  0.95           140     0.9      0.900247
  0.96           221     0.9      0.900389
  0.97           428     0.9      0.900316
  0.98          1402     0.9      0.900026
Take note of how quickly the sample size increases as the alternative proportion approaches the hypothesized proportion of 0.99. It would require 1402 samples to tell a difference between 98% and 99% accuracy. Our sample of 500 will do because the observed sample proportion is 480/500 = 96%, and detecting an alternative of 96% requires only 221 samples.

Proportion versus a Target Stat > Basic Statistics > 1 Proportion…
Test and CI for One Proportion
Test of p = 0.99 vs p not = 0.99
                                                        Exact
  Sample   X     N     Sample p   95% CI                 P-value
  1        480   500   0.960000   (0.938897, 0.975399)   0.000
Statistical Conclusion: Reject the null hypothesis because the hypothesized proportion (0.99) is not within the confidence interval. Practical Conclusion: We are not performing to the accuracy target of 99%. After you analyze the data you will see the statistical conclusion is to reject the null hypothesis. The practical conclusion is that the process is not performing to the desired accuracy of 99%.
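For readers without MINITAB™, the exact one-proportion result can be approximated with the Python standard library. The sketch below assumes the "exact" interval is the Clopper-Pearson interval and uses one common definition of the two-sided exact p-value (the total probability of all outcomes no more likely than the one observed); both are assumptions about the method, not statements taken from the slides. All function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1, computed in log space."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def _bisect(increasing_fn, target, iterations=60):
    """Find p in (0, 1) where an increasing function of p reaches target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if increasing_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a proportion."""
    # Lower limit: P(X >= x | p) = alpha/2; upper limit: P(X <= x | p) = alpha/2.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _bisect(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def exact_p_value(x, n, p0):
    """Two-sided exact p-value: sum of outcomes no more likely than x under H0."""
    observed = binom_pmf(x, n, p0)
    total = sum(pk for pk in (binom_pmf(k, n, p0) for k in range(n + 1))
                if pk <= observed * (1 + 1e-7))
    return min(1.0, total)


lower, upper = clopper_pearson(480, 500)
p_value = exact_p_value(480, 500, 0.99)
print(f"95% CI: ({lower:.6f}, {upper:.6f}), P-value: {p_value:.3f}")
print("0.99 inside CI:", lower <= 0.99 <= upper)
```

The interval should agree closely with the MINITAB™ output above, and the target of 0.99 falls outside it, which is the basis for rejecting the null hypothesis.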

Exercise Exercise objective: To practice solving a problem presented using the appropriate Hypothesis Test. You are the shipping manager charged with improving shipping accuracy. Your annual bonus depends on your ability to prove shipping accuracy is better than the target of 80%. How many samples do you need to take if the anticipated sample proportion is 82%? Out of 2000 shipments only 1680 were accurate. Do you get your annual bonus? Was the sample size good enough?

Proportion vs Target Example: Solution First we must determine the proper sample size to detect a difference from our target of 80%. Stat > Power and Sample Size > 1 Proportion… The Alternative Proportion should be .82 and the Hypothesized Proportion should be .80. Select a Power Value of .9 and click “OK”. As you can see, the Sample Size should be at least 4073 to detect a 2% difference from the target.
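The sample size reported here can be reproduced approximately with the textbook normal-approximation formula n = [(Z_{α/2}·sqrt(p0(1 − p0)) + Z_β·sqrt(p1(1 − p1))) / (p1 − p0)]². The sketch below assumes a two-sided test at α = 0.05 and power 0.90; the function name `n_one_proportion` is hypothetical, and the formula is an assumption about the calculation rather than MINITAB™'s documented algorithm, so results for other inputs may differ from MINITAB™ by a sample or so.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_one_proportion(p0, p1, alpha=0.05, power=0.90):
    """Approximate sample size to detect p1 when testing H0: p = p0 (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)            # quantile for the target power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


# Exercise: hypothesized proportion 0.80, anticipated proportion 0.82.
n_required = n_one_proportion(0.80, 0.82)
print(n_required)
```

The squared difference in the denominator is why halving the difference to be detected roughly quadruples the required sample size.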

Proportion versus Target Example: Solution Now let’s calculate if we receive our bonus. Out of the 2000 shipments 1680 were accurate. Do you get your bonus? Yes, you get your bonus since .80 is not within the confidence interval. Was the sample size sufficient? Because the observed accuracy was 84% (1680/2000), well above the anticipated 82%, the sample size was sufficient. Answer: Use an alternative proportion of .82 and a hypothesized proportion of .80; n = 4073. Either you had better ship a lot of stuff or you had better improve the process more than just 2%!

Comparing Two Proportions This test is used to determine if the process defect rate (or proportion, p) of one sample differs by a certain amount, D, from that of another sample (e.g., before and after your improvement actions). The hypotheses: H0: p1 − p2 = D, Ha: p1 − p2 ≠ D. The test statistic is calculated as follows: $Z_{obs} = \dfrac{\hat{p}_1 - \hat{p}_2 - D}{\sqrt{\hat{p}_1 (1 - \hat{p}_1)/n_1 + \hat{p}_2 (1 - \hat{p}_2)/n_2}}$. This is compared to $Z_{critical} = Z_{\alpha/2}$. Catch some Z’s! MINITAB™ gives you a choice of using the Normal approximation or the exact method. We will use the exact method. The formula here is an approximation for ease of manual calculation.

Sample Size and Two Proportions Take a few moments to practice calculating the minimum sample size required to detect a difference between two proportions using a power of 0.90. Enter the expected proportion for proportion 2 (null hypothesis). For a more conservative estimate when the null hypothesis is close to 100%, use the smaller proportion for p1. When the null hypothesis is close to 0, use the larger proportion for p1.
  α     δ      p1     p2     n
  5%    .01    0.79   0.8    ___________
  5%    .01    0.81   0.8    ___________
  5%    .02    0.08   0.1    ___________
  5%    .02    0.12   0.1    ___________
  5%    .01    0.47   0.5    ___________
  5%    .01    0.53   0.5    ___________
Answers: 34,247 32,986 4,301 5,142 5,831

Proportion versus a Target Shipping accuracy must improve from a historical baseline of 85% towards a target of 95%. Determine if the process improvements made have increased the accuracy. Hypotheses: H0: p1 − p2 = 0.0, Ha: p1 − p2 ≠ 0.0. Two sample proportion test. Choose α = 5%. Sample size: Stat > Power and Sample Size > 2 Proportions…
Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for proportion 2 = 0.95
Alpha = 0.05
                  Sample   Target
  Proportion 1    Size     Power    Actual Power
  0.85            188      0.9      0.901451
The sample size is for each group.
In MINITAB™ click Stat > Power and Sample Size > 2 Proportions. For the field “Proportion 1 values:” type .85, for the field “Power values:” type .90, and in the last field, “Proportion 2:”, enter .95; then click OK. A sample of at least 188 is necessary for each group to be able to detect a 10% difference. If you have reason to believe your improved process has only improved to 90% and you would like to be able to prove an improvement is occurring, the sample size of 188 is not appropriate. Recalculate using .90 for proportion 2 and leave proportion 1 at .85. It would require a sample size of 918 for each sample!
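The per-group sample sizes quoted above (188 and 918) can be approximated with a standard normal-approximation formula for two proportions: n = [Z_{α/2}·sqrt(2·p̄(1 − p̄)) + Z_β·sqrt(p1(1 − p1) + p2(1 − p2))]² / (p1 − p2)², where p̄ is the average of p1 and p2. This is a sketch under that assumption, with a two-sided test and equal group sizes; it is not claimed to be MINITAB™'s exact algorithm, and `n_two_proportions` is a hypothetical name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_two_proportions(p1, p2, alpha=0.05, power=0.90):
    """Approximate per-group sample size to detect p1 versus p2 (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                              # pooled proportion under H0
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(((null_term + alt_term) / (p1 - p2)) ** 2)


# Baseline 85% versus a target of 95%, then versus a smaller improvement to 90%.
print(n_two_proportions(0.85, 0.95))
print(n_two_proportions(0.85, 0.90))
```

Halving the detectable difference from 10% to 5% raises the requirement almost fivefold, which is the point the slide makes about planning for a realistic improvement.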

Comparing Two Proportions The following data were taken:
                        Total Samples   Accurate
  Before Improvement    600             510
  After Improvement     225             212
Calculate proportions: Before Improvement: 600 samples, 510 accurate. After Improvement: 225 samples, 212 accurate. The data shown here were gathered for two processes.

Comparing Two Proportions Stat > Basic Statistics > 2 Proportions…
Test and CI for Two Proportions
  Sample   X     N     Sample p
  1        510   600   0.850000
  2        212   225   0.942222
Difference = p (1) - p (2)
Estimate for difference: -0.0922222
95% CI for difference: (-0.134005, -0.0504399)
Test for difference = 0 (vs not = 0): Z = -4.33 P-Value = 0.000
Statistical Conclusion: Reject the null. Practical Conclusion: You have achieved a significant difference in accuracy. To compare two proportions in MINITAB™ select Stat > Basic Statistics > 2 Proportions… Select the “Summarized data” option, input the appropriate data in the “Trials:” and “Events:” columns and click “OK”.
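The Z value and confidence interval in this output can be checked by hand. The sketch below assumes the unpooled standard error, sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2), which is the choice that reproduces the figures shown; the function name `two_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, diff0=0.0, alpha=0.05):
    """Normal-approximation test of H0: p1 - p2 = diff0 (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled standard error
    z_obs = (p1 - p2 - diff0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (p1 - p2 - z_crit * se, p1 - p2 + z_crit * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    return z_obs, p_value, ci


# Before improvement: 510 of 600 accurate; after improvement: 212 of 225 accurate.
z_obs, p_value, ci = two_proportion_z(510, 600, 212, 225)
print(f"Z = {z_obs:.2f}, P-Value = {p_value:.3f}")
print(f"95% CI for difference: ({ci[0]:.6f}, {ci[1]:.6f})")
```

The interval excludes zero, which is equivalent to rejecting the null hypothesis of no difference at α = 0.05.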

Exercise Exercise objective: To practice solving a problem presented using the appropriate Hypothesis Test. Boris and Igor tend to make a lot of mistakes writing requisitions. Who is worse? Is the sample size large enough?

2 Proportion vs Target Example: Solution First we need to calculate our estimated p1 and p2 for Boris and Igor. [Slide shows the calculated proportions for Boris and Igor.] Please read the slide.

2 Proportion vs Target Example: Solution Now let’s see what the minimum sample size should be… Stat > Power and Sample Size > 2 Proportions
  Sample   X    N     Sample p
  1        47   356   0.132022
  2        99   571   0.173380
Difference = p (1) - p (2)
Estimate for difference: -0.0413576
95% CI for difference: (-0.0882694, 0.00555426)
Test for difference = 0 (vs not = 0): Z = -1.73 P-Value = 0.084
Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for proportion 2 = 0.13
Alpha = 0.05
                  Sample   Target
  Proportion 1    Size     Power    Actual Power
  0.17            1673     0.9      0.900078
The sample size is for each group.
As you can see, we fail to reject the null hypothesis with the data given. One conclusion is that the sample size is not large enough. It would take a minimum sample size of 1673 in each group to distinguish the sample proportions for Boris and Igor.

Contingency Tables Contingency Tables are used to simultaneously compare more than two sample proportions with each other. It is called a Contingency Table because we are testing if the proportion is contingent upon, or dependent upon, the factor used to subgroup the data. This test generally works the best with five or more observations in each cell. Observations can be pooled by combining cells. Some examples for use include:
  Return proportion by product line
  Claim proportion by customer
  Defect proportion by manufacturing line
That? ..oh, that’s my contingency table! Please read the slide.

Contingency Tables The null hypothesis is that the population proportions of each group are the same. H0: p1 = p2 = p3 = … = pn, Ha: at least one p is different. Statisticians have shown the following statistic forms a chi-square distribution when H0 is true: $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$, where “observed” is the sample frequency, “expected” is the calculated frequency based on the null hypothesis and the summation is over all cells in the table. Please read the slide.

Test Statistic Calculations Chi-square Test: $\chi^2_{obs} = \sum \dfrac{(O - E)^2}{E}$, with $E = \dfrac{F_{row} \times F_{col}}{F_{total}}$, compared to $\chi^2_{critical} = \chi^2_{\alpha,\nu}$ from the Chi-Square Table. Where:
  O = the observed value (from sample data)
  E = the expected value
  r = number of rows
  c = number of columns
  Frow = total frequency for that row
  Fcol = total frequency for that column
  Ftotal = total frequency for the table
  ν = degrees of freedom [(r-1)(c-1)]
Wow!!! Can you believe this is the math in a Contingency Table? Thank goodness for MINITAB™. Now let’s do an example.

Contingency Table Example Larry, Curley and Moe are order entry operators and you suspect one of them has a lower defect rate than the others. Ho: pMoe = pLarry = pCurley Ha: at least one p is different Use Contingency Table since there are 3 proportions. Sample Size: To ensure a minimum of 5 occurrences were detected the test was run for one day. Can’t you clowns get the entries correct?! Note the data gathered in the table. Curley is not looking too good right now (as if he ever did).

Contingency Table Example The sample data are the “observed” frequencies. To calculate the “expected” frequencies, first add the rows and columns. Then calculate the overall proportion for each row, for example 33/108 = 0.306.

Contingency Table Example Now use these proportions to calculate the expected frequencies in each cell, for example: 0.306 * 45 = 13.8 (Curley, defective) and 0.694 * 38 = 26.4 (Larry, OK). Please read the slide.

Contingency Table Example Next calculate the 2 value for each cell in the table: Finally add these numbers to get the observed chi-square: Moe Larry Curley Defective 0.912 1.123 2.841 OK 0.401 0.494 1.250 Please read the slide.

Contingency Table Example A summary of the table: [Slide shows the summary table with rows Defective and OK.] The final step is to create a summary table including the observed chi-squared.

Contingency Table Example Critical Value: Like any other Hypothesis Test, compare the observed statistic with the critical statistic. We decide α = 0.05, so what else do we need to know? For a chi-square distribution we need to specify the degrees of freedom, ν. In a Contingency Table ν = (r - 1)(c - 1), where r = # of rows and c = # of columns. In our example we have 2 rows and 3 columns so ν = 2. What is the critical chi-square? For a Contingency Table all the risk is in the right hand tail (i.e. a one-tail test); look it up in MINITAB™ using Calc > Probability Distributions > Chi-Square… Please read the slide.
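As an alternative to a table lookup, the critical chi-square can be computed directly. The sketch below is limited to even degrees of freedom, where the right-tail probability has the closed form exp(−x/2)·Σ_{k<ν/2} (x/2)^k / k!, and it finds the critical value by bisection. The function names are hypothetical and the even-ν restriction is a simplification made for this illustration.

```python
from math import exp


def chi2_sf_even_df(x, df):
    """Right-tail probability P(chi-square > x) for an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term                # add (x/2)^k / k!
        term *= half / (k + 1)
    return exp(-half) * total


def chi2_critical(alpha, df, upper=1000.0, iterations=200):
    """Value x with right-tail area alpha, found by bisection."""
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid                 # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


# 2 rows and 3 columns give df = (2 - 1) * (3 - 1) = 2.
df = (2 - 1) * (3 - 1)
critical = chi2_critical(0.05, df)
print(f"Critical chi-square for alpha = 0.05, df = {df}: {critical:.3f}")
```

For ν = 2 the tail probability reduces to exp(−x/2), so the critical value is simply −2·ln(α).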

Contingency Table Example Graphical Summary: Since the observed chi-square exceeds the critical chi-square we reject the null hypothesis that the defect rate is independent of which person enters the orders. [Figure: chi-square probability density function for ν = 2, showing the accept and reject regions.] Please read the slide.

Contingency Table Example Using MINITAB™: Of course MINITAB™ eliminates the tedium of crunching these numbers. Type the order entry data from the Contingency Table Example into MINITAB™ as shown. Notice the row labels are not necessary, and row and column totals are not used, just the observed counts for each cell. Please read the slide.

Contingency Table Example Stat > Tables > Chi-Square Test (2 way table in worksheet)
Chi-Square Test: Moe, Larry, Curley
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
            Moe     Larry   Curley   Total
  1           5         8       20      33
           7.64     11.61    13.75
          0.912     1.123    2.841
  2          20        30       25      75
          17.36     26.39    31.25
          0.401     0.494    1.250
  Total      25        38       45     108
Chi-Sq = 7.021, DF = 2, P-Value = 0.030
Statistical Conclusion: Reject the null hypothesis. Practical Conclusion: The defect rate for one of these stooges is different. In other words, defect rate is contingent upon the stooge. As you can see, the data confirm both conclusions.
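The whole contingency table calculation (expected counts, cell contributions, observed chi-square, degrees of freedom and P-value) can be reproduced with a short script. This is a sketch using only the Python standard library; the P-value comes from a series expansion of the regularized incomplete gamma function, which is one way to evaluate the chi-square tail and not necessarily what MINITAB™ does. The function names are hypothetical; the counts are the observed values from the output above (row 1 = defective, row 2 = OK).

```python
from math import exp, lgamma, log


def chi2_sf(x, df):
    """Right-tail probability of a chi-square variable (series expansion)."""
    if x <= 0:
        return 1.0
    a, half = df / 2, x / 2
    term = total = 1 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= half / (a + n)
        total += term
    # Regularized lower incomplete gamma P(a, x/2), then take the upper tail.
    return 1 - total * exp(-half + a * log(half) - lgamma(a))


def chi_square_test(observed):
    """Chi-square test of independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected count
            chi2 += (o - e) ** 2 / e                          # cell contribution
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)


# Columns: Moe, Larry, Curley. Rows: defective, OK.
observed = [[5, 8, 20],
            [20, 30, 25]]
chi2, df, p_value = chi_square_test(observed)
print(f"Chi-Sq = {chi2:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```

Since the P-value is below 0.05, the script supports the same decision as the MINITAB™ output: reject the null hypothesis that the defect rate is the same for all three operators.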

Exercise Exercise objective: To practice solving a problem presented using the appropriate Hypothesis Test. You are the quotations manager and your team thinks the reason you do not get a contract depends on its complexity. You determine a way to measure complexity and classify lost contracts as follows: Write the null and alternative hypotheses. Does complexity have an effect?

Contingency Table Example: Solution First we need to create a table in MINITAB™. Secondly, in MINITAB™ perform a Chi-Square Test: Stat > Tables > Chi-Square Test. Please read the slide.

Contingency Table Example: Solution Are the factors independent of each other? After analyzing the data we can see the P-value is 0.426, which is larger than 0.05. Therefore we fail to reject the null hypothesis. Instructor notes: 1. H0: plow = pmed = phigh, Ha: at least one is different. 2. Observed chi-square = 3.856, critical chi-square = 9.488, df = (3-1)(3-1) = 4. Fail to reject. There is no evidence that contracts are lost because of their complexity.

Overview Contingency Tables are another form of Hypothesis Testing. They are used to test for association (or dependency) between two classifications. The null hypothesis is that the classifications are independent. A Chi-square Test is used for frequency (count) type data. If the data is converted to a rate (over time) then a continuous type test would be possible. However, determining the period of time that the rate is based on can be controversial. We do not want to just pick a convenient interval; there needs to be some rationale behind the decision. Many times we see rates based on a day because that is the easiest way to collect data. However a more appropriate way would be to look at the rate distribution per hour. Per hour? Per day? Per month? Please read the slide.

Summary At this point you should be able to:
  Calculate and explain tests for proportions
  Calculate and explain Contingency Tests
  Explain and execute a Chi-squared Test
Please read the slide.

Learn about IASSC Certifications and Exam options at http://www.iassc.org/six-sigma-certification/ IASSC Certified Lean Six Sigma Green Belt (ICGB) The International Association for Six Sigma Certification (IASSC) is a Professional Association dedicated to growing and enhancing the standards within the Lean Six Sigma Community. IASSC is the only independent third-party certification body within the Lean Six Sigma Industry that does not provide training, mentoring and coaching or consulting services. IASSC exclusively facilitates and delivers centralized universal Lean Six Sigma Certification Standards testing and organizational Accreditations. The IASSC Certified Lean Six Sigma Green Belt (ICGB) is an internationally recognized professional who is well versed in the Lean Six Sigma Methodology and who leads or supports improvement projects. The Certified Green Belt Exam is a 3 hour, 100 question proctored exam.