Analyze Phase Hypothesis Testing Non Normal Data Part 2

Now we will continue in the Analyze Phase with “Hypothesis Testing Non-Normal Data Part 2”.

Hypothesis Testing Non Normal Data Part 2
The core fundamentals of this phase are Tests for Proportions and Contingency Tables.

Analyze Phase modules:
- Welcome to Analyze
- “X” Sifting
- Inferential Statistics
- Intro to Hypothesis Testing
- Hypothesis Testing ND P1
- Hypothesis Testing ND P2
- Hypothesis Testing NND P1
- Hypothesis Testing NND P2 (this module): Tests for Proportions, Contingency Tables
- Wrap Up & Action Items

Hypothesis Testing Roadmap Attribute Data
We will now continue with the roadmap for Attribute Data. Since Attribute Data is Non-normal by definition, it belongs in this module on Non-normal Data.

Attribute Data
- One Factor, One Sample: One Sample Proportion
- One Factor, Two Samples: Two Sample Proportion. MINITAB™: Stat - Basic Stats - 2 Proportions. If P-value < 0.05 the proportions are different.
- One Factor, Two or More Samples: Chi Square Test (Contingency Table). MINITAB™: Stat - Tables - Chi-Square Test. If P-value < 0.05 at least one proportion is different.
- Two Factors: Chi Square Test (Contingency Table). MINITAB™: Stat - Tables - Chi-Square Test. If P-value < 0.05 the factors are not independent.

Sample Size and Types of Data
Sample size is dependent on the type of data.

For Continuous Data:
- Capability Analysis: a minimum of 30 samples
- Hypothesis Testing: depends on the practical difference to be detected and the inherent variation in the process, as well as the statistical confidence you wish to have.

For Attribute Data:
- Capability Analysis: a lot of samples
- Hypothesis Testing: a lot, but depends on the practical difference to be detected as well as the statistical confidence you wish to have.

MINITAB™ can estimate sample sizes, but remember: the smaller the difference that needs to be detected, the larger the sample size must be!

Proportion versus a Target
This test is used to determine if the process proportion (p) equals some desired value, p0.

The hypotheses:
Ho: p = p0
Ha: p ≠ p0

The observed test statistic is calculated as follows (normal approximation):

$$Z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$

where $\hat{p}$ is the sample proportion and n is the sample size. This is compared to $Z_{crit} = Z_{\alpha/2}$.

This formula is an approximation for ease of manual calculation.

Now let’s try an example. Shipping accuracy has a target of 99%; determine if the current process is on target.

Hypotheses:
Ho: p = 0.99
Ha: p ≠ 0.99
One sample proportion test
Choose α = 5%

Sample size: Stat > Power and Sample Size > 1 Proportion… Enter multiple values for the alternative values of p and MINITAB™ will give the corresponding sample sizes.

Our sample included 500 shipped items, of which 480 were accurate.

Power and Sample Size
Test for One Proportion
Testing proportion = 0.99 (versus not = 0.99)
Alpha = 0.05
Output columns: Alternative Proportion, Sample Size, Target Power, Actual Power

Take note of how quickly the sample size increases as the alternative proportion approaches the hypothesized 99%. It would require 1402 samples to tell the difference between 98% and 99% accuracy. Our sample of 500 will do because the alternative proportion, from the sample, is 480/500 = 96%.
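The sample size quoted above can be approximated by hand. The sketch below uses the standard normal-approximation formula for a two-sided one-proportion test; the function name `one_proportion_sample_size` is hypothetical, and the power of 0.90 is an assumption, since the power value used for this slide did not survive in the transcript. MINITAB™ may use a different method internally, so treat this as a rough check rather than a reproduction of its algorithm.

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_proportion_sample_size(p0, p_alt, alpha=0.05, power=0.90):
    """Approximate n for a two-sided one-proportion test (normal approximation).

    p0 is the hypothesized proportion, p_alt the alternative proportion.
    The default power of 0.90 is an assumption, not a value from the slide.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (z_alpha * sqrt(p0 * (1 - p0))
                 + z_beta * sqrt(p_alt * (1 - p_alt)))
    return ceil((numerator / (p_alt - p0)) ** 2)


n_98_vs_99 = one_proportion_sample_size(p0=0.99, p_alt=0.98)
n_96_vs_99 = one_proportion_sample_size(p0=0.99, p_alt=0.96)
print(n_98_vs_99)  # 1402, matching the figure quoted above
print(n_96_vs_99)  # well under the 500 items actually sampled
```

Under these assumptions, moving the alternative from 96% to 98% raises the required sample from a few hundred to 1402, which is the effect the slide asks you to notice.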

Stat > Basic Statistics > 1 Proportion…

Test and CI for One Proportion
Test of p = 0.99 vs p not = 0.99
Output columns: Sample, X, N, Sample p, 95% CI ( , ), Exact P-value

Statistical Conclusion: Reject the null hypothesis because the hypothesized proportion is not within the confidence interval.
Practical Conclusion: We are not performing to the accuracy target of 99%.

After you analyze the data you will see the statistical conclusion is to reject the null hypothesis. The practical conclusion is that the process is not performing to the desired accuracy of 99%.
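As a rough manual cross-check of this conclusion, the normal-approximation test statistic can be computed for the 480 accurate items out of 500. This is a sketch only: the MINITAB™ output above reports an exact P-value, so the numbers here will not match it digit for digit, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 480, 500      # accurate shipments and sample size from the example
p0 = 0.99            # hypothesized (target) proportion
alpha = 0.05

p_hat = x / n                                     # sample proportion, 0.96
z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # normal-approximation statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))  # approximate two-sided P-value

reject_null = abs(z_obs) > z_crit
print(f"z_obs = {z_obs:.2f}, z_crit = {z_crit:.2f}, reject Ho: {reject_null}")
```

The observed statistic is far beyond the critical value, which agrees with the decision to reject the null hypothesis. Note that n(1 - p0) is only 5 here, so the normal approximation is at the edge of its usual comfort zone.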

Exercise
Exercise objective: To practice solving a problem presented using the appropriate Hypothesis Test.

You are the shipping manager charged with improving shipping accuracy. Your annual bonus depends on your ability to prove shipping accuracy is better than the target of 80%.

1. How many samples do you need to take if the anticipated sample proportion is 82%?
2. Out of 2000 shipments only 1680 were accurate. Do you get your annual bonus? Was the sample size good enough?

Proportion vs Target Example: Solution
First we must determine the proper sample size to test against our target of 80%. Stat > Power and Sample Size > 1 Proportion… The Alternative Proportion should be .82 and the Hypothesized Proportion should be .80. Select a Power Value of .9 and click “OK”. As you can see, the Sample Size should be at least 4073 to prove our hypothesis.

Proportion versus Target Example: Solution
Now let’s calculate if we receive our bonus… Out of the 2000 shipments 1680 were accurate.

Do you get your bonus? Yes, you get your bonus since .80 is not within the confidence interval.

Was the sample size sufficient? Yes. Because the observed accuracy was 84% (1680/2000), rather than the anticipated 82%, the sample size was sufficient.

Answer: Use an alternative proportion of .82 and a hypothesized proportion of .80, which gives n = 4073. Either you had better ship a lot of stuff or you had better improve the process by more than just 2%!
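The confidence-interval argument can be illustrated with a normal-approximation (Wald) interval for 1680 accurate shipments out of 2000. This is a sketch under that assumption; MINITAB™ may report an exact interval whose limits differ slightly, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 1680, 2000    # accurate shipments and total shipments from the exercise
target = 0.80        # accuracy target the bonus depends on
alpha = 0.05

p_hat = x / n                                   # observed accuracy, 0.84
z = NormalDist().inv_cdf(1 - alpha / 2)         # about 1.96 for a 95% interval
half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # Wald margin of error
ci_lower, ci_upper = p_hat - half_width, p_hat + half_width

target_outside_ci = not (ci_lower <= target <= ci_upper)
print(f"95% CI: ({ci_lower:.4f}, {ci_upper:.4f}); target outside CI: {target_outside_ci}")
```

With these inputs the interval runs from roughly 0.824 to 0.856, so the 0.80 target falls below it, consistent with the answer that the bonus is earned.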

Comparing Two Proportions
This test is used to determine if the process defect rate (or proportion, p) of one sample differs by a certain amount, D, from that of another sample (e.g., before and after your improvement actions).

The hypotheses:
H0: p1 – p2 = D
Ha: p1 – p2 ≠ D

The test statistic is calculated as follows:

$$Z_{obs} = \frac{\hat{p}_1 - \hat{p}_2 - D}{\sqrt{\hat{p}_1 (1 - \hat{p}_1) / n_1 + \hat{p}_2 (1 - \hat{p}_2) / n_2}}$$

This is compared to $Z_{critical} = Z_{\alpha/2}$. Catch some Z’s!

MINITAB™ gives you a choice of using the Normal approximation or the exact method. We will use the exact method. The formula here is an approximation for ease of manual calculation.

Sample Size and Two Proportions
Take a few moments to practice calculating the minimum sample size required to detect a difference between two proportions using a power of 0.90. Enter the expected proportion for proportion 2 (null hypothesis). For a more conservative estimate when the null hypothesis is close to 100%, use the smaller proportion for p1. When the null hypothesis is close to 0%, use the larger proportion for p1.

α | p1 | p2 | n
5% | | | ___________
5% | | | ___________
5% | | | ___________
5% | | | ___________
5% | | | ___________
5% | | | ___________

Answers: 34,247 32,986 4,301 5,142 5,831

Shipping accuracy must improve from a historical baseline of 85% towards a target of 95%. Determine if the process improvements made have increased the accuracy.

Hypotheses:
Ho: p1 – p2 = 0.0
Ha: p1 – p2 ≠ 0.0
Two sample proportion test
Choose α = 5%
Sample size: Stat > Power and Sample Size > 2 Proportions…

Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for proportion 2 = 0.95
Alpha = 0.05
Output columns: Proportion, Sample Size, Target Power, Actual Power
The sample size is for each group.

In MINITAB™ click Stat > Power and Sample Size > 2 Proportions. In the field “Proportion 1 values:” type .85, in the field “Power values:” type .90, and in the last field, “Proportion 2:”, enter .95; then click OK.

A sample of at least 188 is necessary for each group to be able to detect a difference of 10 percentage points. If you have reason to believe your improved process has only improved to 90%, and you would like to be able to prove an improvement is occurring, the sample size of 188 is not appropriate. Recalculate using .90 for proportion 2 and leave proportion 1 at .85. It would require a sample size of 918 for each sample!
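The two per-group sample sizes above can be approximated with the usual normal-approximation formula for comparing two proportions. The function name `two_proportion_sample_size` is hypothetical and the formula is an assumed stand-in for whatever MINITAB™ computes internally; it is offered as a sanity check, not as the tool's documented algorithm.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.90):
    """Approximate per-group n for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # average proportion under Ho
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((numerator / (p1 - p2)) ** 2)


n_85_vs_95 = two_proportion_sample_size(0.85, 0.95)
n_85_vs_90 = two_proportion_sample_size(0.85, 0.90)
print(n_85_vs_95)  # 188 per group
print(n_85_vs_90)  # 918 per group
```

Halving the difference to be detected (from 10 points to 5) increases the required sample roughly fivefold, which is why the smaller improvement is so much harder to prove.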

The data shown here was gathered for two processes.

The following data were taken:

                     Total Samples   Accurate
Before Improvement   600             510
After Improvement    225             212

Calculate proportions:
Before Improvement: 600 samples, 510 accurate
After Improvement: 225 samples, 212 accurate

To compare two proportions in MINITAB™ select Stat > Basic Statistics > 2 Proportions… Select the “Summarized data” option, input the appropriate data in the “Trials:” and “Events:” columns and click “OK”.

Test and CI for Two Proportions
Output columns: Sample, X, N, Sample p
Difference = p (1) - p (2)
Estimate for difference:
95% CI for difference: ( , )
Test for difference = 0 (vs not = 0): Z =    P-Value = 0.000

Statistical Conclusion: Reject the null hypothesis.
Practical Conclusion: You have achieved a significant difference in accuracy.
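The before/after comparison (510 of 600 accurate versus 212 of 225 accurate) can be checked by hand with a normal-approximation test. The sketch below assumes an unpooled standard error and D = 0; the values missing from the output above are not reconstructed here, and MINITAB™'s exact method may give slightly different figures.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 510, 600    # before improvement: accurate, total
x2, n2 = 212, 225    # after improvement: accurate, total
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2                                              # p(1) - p(2)
std_err = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)     # unpooled (assumption)
z_obs = diff / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))            # two-sided
reject_null = p_value < alpha

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, z = {z_obs:.2f}, P-value = {p_value:.3f}")
```

The approximate P-value rounds to 0.000, in line with the reported result, so the improvement from 85% to about 94% accuracy is statistically significant under this approximation as well.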

Exercise
Exercise objective: To practice solving a problem presented using the appropriate Hypothesis Test.

Boris and Igor tend to make a lot of mistakes writing requisitions.
1. Who is worse?
2. Is the sample size large enough?

2 Proportion vs Target Example: Solution
First we need to calculate our estimated p1 and p2 for Boris and Igor.

2 Proportion vs Target Example: Solution
Now let’s see what the minimum sample size should be…

Output columns: Sample, X, N, Sample p
Difference = p (1) - p (2)
Estimate for difference:
95% CI for difference: ( , )
Test for difference = 0 (vs not = 0): Z =    P-Value = 0.084

As you can see, we fail to reject the null hypothesis with the data given. One conclusion is that the sample size is not large enough.

Stat > Power and Sample Size > 2 Proportions

Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus not =)
Calculating power for proportion 2 = 0.13
Alpha = 0.05
Output columns: Proportion, Sample Size, Target Power, Actual Power
The sample size is for each group.

It would take a minimum sample size of 1673 to distinguish the sample proportions for Boris and Igor.

Contingency Tables
Contingency Tables are used to simultaneously compare more than two sample proportions with each other. It is called a Contingency Table because we are testing if the proportion is contingent upon, or dependent upon, the factor used to subgroup the data.

This test generally works best with five or more observations in each cell. Observations can be pooled by combining cells.

Some examples for use include:
- Return proportion by product line
- Claim proportion by customer
- Defect proportion by manufacturing line

That? ..oh, that’s my contingency table!

Contingency Tables
The null hypothesis is that the population proportions of each group are the same.

Ho: p1 = p2 = p3 = … = pn
Ha: at least one p is different

Statisticians have shown the following statistic forms a chi-square distribution when H0 is true:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Where “observed” is the sample frequency, “expected” is the calculated frequency based on the null hypothesis and the summation is over all cells in the table.

Test Statistic Calculations
Chi-square Test

$$\chi^2_{o} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

$$E_{ij} = \frac{F_{row} \times F_{col}}{F_{total}}$$

$$\chi^2_{critical} = \chi^2_{\alpha, \nu}$$ (from the Chi-Square Table)

Where:
O = the observed value (from sample data)
E = the expected value
r = number of rows
c = number of columns
Frow = total frequency for that row
Fcol = total frequency for that column
Ftotal = total frequency for the table
ν = degrees of freedom [(r-1)(c-1)]

Wow!!! Can you believe this is the math in a Contingency Table? Thank goodness for MINITAB™. Now let’s do an example.

Contingency Table Example
Larry, Curley and Moe are order entry operators and you suspect one of them has a lower defect rate than the others.

Ho: pMoe = pLarry = pCurley
Ha: at least one p is different

Use a Contingency Table since there are 3 proportions.

Sample Size: To ensure a minimum of 5 occurrences were detected, the test was run for one day.

Can’t you clowns get the entries correct?!

Note the data gathered in the table. Curley is not looking too good right now (as if he ever did).

The sample data are the “observed” frequencies. To calculate the “expected” frequencies, first add the rows and columns. Then calculate the overall proportion for each row, for example: 33/108 = 0.306

Now use these proportions to calculate the expected frequencies in each cell:
0.306 * 45 = 13.8
0.694 * 38 = 26.4
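The expected-frequency step can be sketched using only the totals quoted in this example: 33 defective entries out of 108, and column totals of 45 and 38. The third column total (25) is inferred here on the assumption that the three operators' totals sum to 108; which operator owns which column is not stated in the surviving text, so the columns are left unnamed.

```python
grand_total = 108
row_totals = {"Defective": 33, "OK": grand_total - 33}  # OK total inferred as 75
col_totals = [45, 38, grand_total - 45 - 38]            # third total inferred as 25

# Expected count for each cell: row total * column total / grand total
expected = {
    row: [row_total * col_total / grand_total for col_total in col_totals]
    for row, row_total in row_totals.items()
}

for row, values in expected.items():
    print(row, [round(v, 2) for v in values])
```

Computing with unrounded proportions gives 13.75 for the first defective cell and about 26.39 for the second OK cell; the slide's 13.8 and 26.4 come from the same calculation with the row proportions rounded to three decimals.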

Next calculate the χ² value for each cell in the table, (observed - expected)² / expected. Finally add these numbers to get the observed chi-square.

(Table of per-cell χ² values. Columns: Moe, Larry, Curley. Rows: Defective, OK.)

The final step is to create a summary table, with rows Defective and OK, including the observed chi-squared.
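The per-cell calculation and the final sum can be sketched as below. Important caveat: the observed counts did not survive in this transcript. The counts used here are an assumed reconstruction, chosen only because they are consistent with the totals quoted in the example (33 defective of 108; column totals 45, 38 and 25) and with the Chi-Sq of 7.021 reported in the MINITAB™ output. They should not be read as the original slide data.

```python
# ASSUMED counts (not from the transcript): consistent with the quoted totals
observed = {
    "Moe":    {"Defective": 5,  "OK": 20},
    "Larry":  {"Defective": 8,  "OK": 30},
    "Curley": {"Defective": 20, "OK": 25},
}

rows = ["Defective", "OK"]
grand_total = sum(sum(cells.values()) for cells in observed.values())
row_totals = {r: sum(cells[r] for cells in observed.values()) for r in rows}
col_totals = {name: sum(cells.values()) for name, cells in observed.items()}

chi_sq = 0.0
for name, cells in observed.items():
    for r in rows:
        expected = row_totals[r] * col_totals[name] / grand_total
        contribution = (cells[r] - expected) ** 2 / expected  # (O - E)^2 / E
        chi_sq += contribution
        print(f"{name:7s} {r:10s} O={cells[r]:3d} E={expected:6.2f} chi2={contribution:.3f}")

print(f"Observed chi-square = {chi_sq:.3f}")
```

With these assumed counts the largest single contribution comes from Curley's defective cell, which fits the remark that Curley is not looking too good.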

Critical Value ~ Like any other Hypothesis Test, compare the observed statistic with the critical statistic. We decide α = 0.05, so what else do we need to know?

For a chi-square distribution we need to specify ν, the degrees of freedom. In a Contingency Table: ν = (r - 1)(c - 1), where r = # of rows and c = # of columns.

In our example we have 2 rows and 3 columns, so ν = 2.

What is the critical chi-square? For a Contingency Table all the risk is in the right hand tail (i.e. a one-tail test); look it up in MINITAB™ using Calc > Probability Distributions > Chi-Square…

Graphical Summary: Since the observed chi-square exceeds the critical chi-square, we reject the null hypothesis that the defect rate is independent of which person enters the orders.

(Figure: chi-square probability density function for ν = 2, showing the Accept and Reject regions.)

Using MINITAB™ ~ Of course MINITAB™ eliminates the tedium of crunching these numbers. Type the order entry data from the Contingency Table Example into MINITAB™ as shown. Notice the row labels are not necessary, and row and column totals are not used, just the observed counts for each cell.

Stat > Tables > Chi-Square Test (2 way table in worksheet)

Chi-Square Test: Moe, Larry, Curley
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
Output columns: Moe, Larry, Curley, Total
Chi-Sq = 7.021, DF = 2, P-Value = 0.030

Statistical Conclusion: Reject the null hypothesis.
Practical Conclusion: The defect rate for one of these stooges is different. In other words, defect rate is contingent upon the stooge.
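The reported Chi-Sq = 7.021 with DF = 2 can be checked without a table. For exactly 2 degrees of freedom the chi-square distribution has a closed form: the right-tail probability is exp(-x/2) and the critical value is -2 ln(α). The sketch below relies on that special case, so it does not generalize to other degrees of freedom.

```python
from math import exp, log

alpha = 0.05
chi_sq_observed = 7.021   # from the MINITAB output above
dof = 2                   # (2 rows - 1) * (3 columns - 1)

# Closed forms valid only for 2 degrees of freedom
chi_sq_critical = -2 * log(alpha)     # right-tail critical value
p_value = exp(-chi_sq_observed / 2)   # right-tail probability of the observed value

reject_null = chi_sq_observed > chi_sq_critical
print(f"critical = {chi_sq_critical:.3f}, P-value = {p_value:.3f}, reject Ho: {reject_null}")
```

The critical value is about 5.991 and the P-value rounds to 0.030, matching the output, so the observed statistic falls in the rejection region.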

Exercise
Exercise objective: To practice solving a problem presented using the appropriate Hypothesis Test.

You are the quotations manager and your team thinks the reason you do not get a contract depends on its complexity. You determine a way to measure complexity and classify lost contracts as follows:

1. Write the null and alternative hypothesis.
2. Does complexity have an effect?

Contingency Table Example: Solution
First, we need to create a table in MINITAB™. Second, in MINITAB™ perform a Chi-Square Test: Stat > Tables > Chi-Square Test

Contingency Table Example: Solution
Are the factors independent of each other? After analyzing the data we can see the P-value is larger than 0.05. Therefore we fail to reject the null hypothesis.

Instructor notes:
1. Ho: plow = pmed = phigh
   Ha: at least one is different
2. Obs Chi square =    Crit Chi square =    df = (3-1)(3-1) = 4
   Fail to reject. There is no basis for saying they do not get contracts because of complexity.

Overview
Contingency Tables are another form of Hypothesis Testing. They are used to test for association (or dependency) between two classifications. The null hypothesis is that the classifications are independent.

A Chi-square Test is used for frequency (count) type data. If the data is converted to a rate (over time) then a continuous type test would be possible. However, determining the period of time that the rate is based on can be controversial. We do not want to just pick a convenient interval; there needs to be some rationale behind the decision. Many times we see rates based on a day because that is the easiest way to collect data. However, a more appropriate way would be to look at the rate distribution per hour.

Per hour? Per day? Per month?

Summary
At this point you should be able to:
- Calculate and explain tests for proportions
- Calculate and explain Contingency Tests
- Explain and execute a Chi-squared Test

Learn about IASSC Certifications and Exam options at…
IASSC Certified Lean Six Sigma Green Belt (ICGB)
The International Association for Six Sigma Certification (IASSC) is a Professional Association dedicated to growing and enhancing the standards within the Lean Six Sigma Community. IASSC is the only independent third-party certification body within the Lean Six Sigma Industry that does not provide training, mentoring and coaching or consulting services. IASSC exclusively facilitates and delivers centralized universal Lean Six Sigma Certification Standards testing and organizational Accreditations.

The IASSC Certified Lean Six Sigma Green Belt (ICGB) is an internationally recognized professional who is well versed in the Lean Six Sigma Methodology and who leads or supports improvement projects. The Certified Green Belt Exam is a 3 hour, 100 question proctored exam.
