Comparing Distributions III: Chi squared test, ANOVA By Peter Woolf University of Michigan Michigan Chemical Process Dynamics and Controls.


1 Comparing Distributions III: Chi squared test, ANOVA By Peter Woolf (pwoolf@umich.edu) University of Michigan Michigan Chemical Process Dynamics and Controls Open Textbook version 1.0 Creative commons

2 Scenario: You have two parallel processes, Unit 1 and Unit 2, that carry out the same reaction using very similar equipment. Question: Are these units actually behaving the same or not?

3 Approach: (1) Gather data on yield from both units. A plot of the data does not clearly show any difference.

4 Approach: (1) Gather data on yield from both units. (2) Perform statistical analysis:
- Fisher’s exact test (requires binning)
- Chi squared test (requires binning)
- ANOVA (works directly on the data)

5 Binning Data: Data reduction by reassigning data into windows (here, HIGH and LOW).

6 Binning Data: Data reduction by reassigning data into windows (here, HIGH and LOW). Choosing a binning strategy:
- Assign to bins that naturally appear, such as groupings or important thresholds (e.g. yield>50 is profitable, so this is a natural window).
- If multiple windows appear, assign multiple bins.
- If no natural bins appear, choose equally sized bins or above/below average.
- Bin in Excel with IF statements.
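The thresholding step can also be sketched outside Excel. The following is a minimal Python illustration, assuming a single threshold of 50 (the slide's profitability example); the yield values and the helper name `bin_high_low` are hypothetical, since the slides do not list the raw data.

```python
# Hypothetical yield measurements; the slides do not list the raw data.
unit1_yields = [48.2, 51.7, 44.9, 53.1, 47.5]
unit2_yields = [52.4, 55.0, 49.3, 58.6, 51.1]

THRESHOLD = 50.0  # from the slide's example: yield > 50 is profitable


def bin_high_low(yields, threshold):
    """Return (high_count, low_count) for one unit's measurements."""
    high = sum(1 for y in yields if y > threshold)
    return high, len(yields) - high


# One (HIGH, LOW) pair per unit: the rows of a contingency table.
contingency = {
    "Unit 1": bin_high_low(unit1_yields, THRESHOLD),
    "Unit 2": bin_high_low(unit2_yields, THRESHOLD),
}

for unit, (high, low) in contingency.items():
    print(f"{unit}: HIGH={high} LOW={low}")
```

Each row of `contingency` holds the HIGH and LOW counts for one unit, which is the input that Fisher’s exact and chi squared tests need.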

7 For Fisher’s exact and Chi squared tests, create a contingency table from the HIGH and LOW bins.
Contingency table:
          High   Low   Total
Unit 1      53    97     150
Unit 2      82    68     150
Total      135   165     300
As mentioned in the last lecture, we can use Fisher’s exact test to calculate a p-value: the probability of finding this configuration, or a more extreme one, at random.

8 Observed configuration:
          High   Low   Total
Unit 1      53    97     150
Unit 2      82    68     150
Total      135   165     300
“More extreme” configuration:
          High   Low   Total
Unit 1      52    98     150
Unit 2      83    67     150
Total      135   165     300
“Most extreme” configuration:
          High   Low   Total
Unit 1       0   150     150
Unit 2     135    15     150
Total      135   165     300
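Stepping from the observed table to the most extreme one and summing the probability of each configuration can be sketched as follows. This is a minimal illustration, assuming the row and column totals are held fixed (the hypergeometric model behind Fisher’s exact test) and a one-sided tail; the function names `table_probability` and `lower_tail_p` are illustrative, not from the lecture.

```python
from math import comb

ROW1_TOTAL = 150   # samples from Unit 1
HIGH_TOTAL = 135   # HIGH samples across both units
GRAND_TOTAL = 300  # all samples


def table_probability(unit1_high):
    """Probability of one 2x2 configuration with all margins fixed."""
    low_total = GRAND_TOTAL - HIGH_TOTAL
    unit1_low = ROW1_TOTAL - unit1_high
    return (comb(HIGH_TOTAL, unit1_high) * comb(low_total, unit1_low)
            / comb(GRAND_TOTAL, ROW1_TOTAL))


def lower_tail_p(observed_unit1_high):
    """Sum over the observed table and every more extreme one (fewer HIGH in Unit 1)."""
    return sum(table_probability(k) for k in range(0, observed_unit1_high + 1))


p_more_extreme = lower_tail_p(53)
print(f"one-sided p-value: {p_more_extreme:.5f}")
```

The printed value can be compared with the 0.0005 that the slides report for the “more extreme” area. Whether that figure includes the observed table itself is not stated, so small differences are to be expected.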

9 [Plot: probability of configuration vs. # changes away from observed. The observed case sits far from the most likely cases if this were a random sample. Total area = 1.0; more extreme = 0.0005; less extreme = 0.9995.] Conclusion: The units are behaving differently. IDEA! The distance between the observed case and the most likely case if random is far, so can we just use that?

10 IDEA! The distance between the observed case and the most likely case if random is far, so can we just use that? [Plot: probability of configuration vs. # changes away from observed.] If this distance is “big” then the observed case is unusual. What is this point (the most likely case)?

11 What is this point?
Observed case:
          High   Low   Total
Unit 1      53    97     150
Unit 2      82    68     150
Total      135   165     300
Most likely case if random (expected counts):
          High                   Low                    Total
Unit 1    150*(135/300) = 67.5   150*(165/300) = 82.5     150
Unit 2    150*(135/300) = 67.5   150*(165/300) = 82.5     150
Total     135                    165                      300
Distance between these two cases? The raw difference depends on the magnitude of the counts, so normalize it. Chi squared statistic: $\chi^2 = \sum_{\text{cells}} (\text{observed} - \text{expected})^2 / \text{expected}$

12 Observed case:
          High   Low   Total
Unit 1      53    97     150
Unit 2      82    68     150
Total      135   165     300
Most likely case if random (expected counts):
          High                   Low                    Total
Unit 1    150*(135/300) = 67.5   150*(165/300) = 82.5     150
Unit 2    150*(135/300) = 67.5   150*(165/300) = 82.5     150
Total     135                    165                      300
Chi squared statistic: for this case, chi squared is about 11.33. Okay.. So what? What is the p-value?
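The expected counts and the normalized distance can be checked with a short calculation. This is a sketch using the standard Pearson form of the statistic (sum over cells of (observed - expected)^2 / expected); the function names are illustrative.

```python
observed = [
    [53, 97],  # Unit 1: HIGH, LOW
    [82, 68],  # Unit 2: HIGH, LOW
]


def expected_counts(table):
    """Expected count per cell if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Pearson chi squared: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


print(expected_counts(observed))
print(round(chi_squared(observed), 2))
```

For a 2 by 2 table the statistic has (2 - 1) * (2 - 1) = 1 degree of freedom.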

13 Chi squared statistic. For this case, chi squared = 11.33. The chi squared statistic has a known distribution that can be looked up or found in Excel using “chidist” with 1 degree of freedom: =chidist(11.33,1)=0.00076. This can be done in a more automated way in Excel using “chitest”. For this case, chitest and Fisher’s exact agree.
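The same lookup can be reproduced without Excel. For 1 degree of freedom only, the right-tail probability of the chi squared distribution has a closed form, erfc(sqrt(x/2)), so the sketch below needs nothing beyond Python's standard library. The function name is illustrative, and the shortcut does not apply to other degrees of freedom.

```python
from math import erfc, sqrt


def chi2_right_tail_1dof(statistic):
    """Right-tail p-value of a chi squared statistic with 1 degree of freedom.

    Valid only for 1 degree of freedom, where chi squared is the square
    of a standard normal variable.
    """
    return erfc(sqrt(statistic / 2.0))


p_value = chi2_right_tail_1dof(11.33)
print(f"{p_value:.5f}")
```

This plays the role of =chidist(11.33,1) for the 2 by 2 case; larger tables need a general chi squared distribution function.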

14 Chi squared test vs. Fisher’s exact
- For a random null, Fisher’s exact will always yield a correct result.
- The Chi squared test is often easier to carry out (the math is easier).
- Chi squared will give incorrect results when fewer than 20 samples are present, or when there are between 20 and 40 samples and one expected number is 5 or below.
- In the small-sample example shown, chitest says the result is 2x more significant than Fisher’s exact does. The error is due to the small sample effect.
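The sample-size guidance above can be written as a small check to run before trusting a chi squared result. This is a sketch of the slide's rule of thumb only, with a hypothetical function name; it is not a formal validity test.

```python
def chi_squared_rule_of_thumb_ok(table):
    """Apply the slide's guidance to a contingency table of counts.

    Returns False when there are fewer than 20 samples, or when there
    are 20 to 40 samples and any expected count is 5 or below.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    min_expected = min(r * c / total for r in row_totals for c in col_totals)

    if total < 20:
        return False
    if total <= 40 and min_expected <= 5:
        return False
    return True


print(chi_squared_rule_of_thumb_ok([[53, 97], [82, 68]]))  # 300 samples
print(chi_squared_rule_of_thumb_ok([[3, 4], [5, 3]]))      # 15 samples
```

When the check fails, Fisher’s exact test is the safer choice according to the comparison above.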

15 Chi squared test vs. Fisher’s exact (continued). The Chi squared test is easy to do for larger contingency tables and when the expected distribution is not random. The same analysis can be done with a Fisher’s-like test, but the math gets much harder. Example: 3 by 3 contingency table with a model for expectations. The observed table is close to the expected one, but far from random.

16 Approach: (1) Gather data on yield from both units. (2) Perform statistical analysis:
- Fisher’s exact test (requires binning)
- Chi squared test (requires binning)
- ANOVA (works directly on the data)

17 ANOVA: Analysis of Variance. A method to compare continuous measurements to determine if they are sampled from the same or different distributions. For a single factor ANOVA, we assume that each observation in each class can be modeled as:
Observation = overall mean + class effect + random error
In the study we are following in this class, the class effect would be the effect of unit 1 or unit 2. ANOVA analysis can be easily done in Excel using Tools->Data Analysis->ANOVA.
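The model above leads to the usual single factor F statistic: the variation between class means divided by the variation within classes. The sketch below computes only the F statistic with the standard library; the yield numbers and the function name are hypothetical. Turning F into a p-value needs the F distribution, which Excel's ANOVA tool reports directly.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a single factor ANOVA over lists of measurements."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation explained by the class effect (between groups).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left to random error (within groups).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical yields for two units; the slides do not list the raw data.
unit1 = [48.2, 51.7, 44.9, 53.1, 47.5]
unit2 = [52.4, 55.0, 49.3, 58.6, 51.1]
print(round(one_way_anova_f([unit1, unit2]), 3))
```

A large F means the class effect is large relative to the random error, which corresponds to a small p-value.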

18 1 way ANOVA. Key value: the p-value here tells the probability of seeing a difference at least this large if both units (each group) were sampled from the same distribution.

19 2 way ANOVA with replicates. Scenario: Testing three units in triplicate, each with three different control architectures: Feedback (FB), Model predictive control (MPC), and a cascade architecture. In each case we measure the yield. Questions:
1) Do the units significantly differ?
2) Do the control architectures significantly differ?
In Excel: Tools->Data Analysis->ANOVA: Two factor with replication
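The calculation behind a two factor ANOVA with replication can be sketched as follows, assuming a balanced design (the same number of replicates in every cell). The yields are hypothetical and the function name is illustrative; rows stand for control architectures and columns for units. Only F statistics are computed, since p-values need the F distribution.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced two factor design.

    cells[i][j] is the list of replicates for row level i, column level j.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    row_means = [mean(x for cell in row for x in cell) for row in cells]
    col_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_rows = b * r * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * r * sum((m - grand) ** 2 for m in col_means)
    ss_inter = r * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    ms_error = ss_error / (a * b * (r - 1))
    return {
        "rows": ss_rows / (a - 1) / ms_error,
        "columns": ss_cols / (b - 1) / ms_error,
        "interaction": ss_inter / ((a - 1) * (b - 1)) / ms_error,
    }


# Hypothetical yields: rows = FB, MPC, cascade; columns = units 1 to 3.
yields = [
    [[61.0, 63.0, 62.0], [60.0, 64.0, 61.0], [62.0, 63.0, 60.0]],
    [[70.0, 72.0, 71.0], [69.0, 71.0, 73.0], [71.0, 70.0, 72.0]],
    [[66.0, 65.0, 67.0], [64.0, 66.0, 68.0], [65.0, 67.0, 66.0]],
]
print({name: round(f, 2) for name, f in two_way_anova_f(yields).items()})
```

Each F statistic is a ratio of mean squares, so it cannot be negative; a negative value in a spreadsheet output points to a problem with the input ranges or layout rather than with the data.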

20 2 way ANOVA with replicates (Excel output). Controllers (samples) have a significant effect. Columns (units) don’t have a significant effect. One part of the output (marked ?? on the slide) looks like an error, and may be why we get a negative F value and no p-value.

21 ANOVA
- ANOVA tells you if factors are significantly related to an outcome according to a linear model. Nonlinear relationships can be strong, but may appear insignificant in an ANOVA analysis.
- ANOVA does not tell you the model parameters.
- ANOVA, t-test, and z-test all provide similar kinds of information for different kinds of data.

22 Physical process (Unit 1, Unit 2) -> Experimental Data -> Statistical Analysis -> Results: Unit 1 is different from unit 2. This difference is clearer in the binned data (the chi squared and Fisher’s exact p-values are smaller than the ANOVA p-value).

23 Take Home Messages
- Chi squared tests are analogous to Fisher’s exact tests, but are generally easier to calculate.
- Chi squared tests fail when sample sizes are small.
- ANOVA determines if lists of continuous measurements are likely the same or different.
- ANOVA can determine the significance of a set of factors on the measurements.

24 The following pages have additional examples of ChemE applications of ANOVA analyses

25 Solution approach: two factor ANOVA. Factor 1: Farm. Factor 2: Shipper. See if a factor has a significant p-value.

26 Looking at averages and ranges, it looks like shipper Rex has a somewhat worse record than Ned. The farms have some variation, but it is small. This said, both shippers will bring wheat with moths, but Rex will bring more.

27 1) Import data into Excel. 2) Select Tools->Data Analysis->ANOVA: Two factor with Replication. Conclusion: the factor “shipper” has a significant influence on the moth probability, with a p-value of 0.03.

28 ANOVA- ChemE examples How does temperature affect yield?

29 ANOVA- ChemE examples Do both temperature and concentration affect yield?

30 ANOVA- ChemE examples How can controlling v4 and v2 differently affect process profitability? Example from 2006 controls wiki: http://controls.engin.umich.edu/wiki/index.php/Design_of_experiments_via_taguchi_methods:_one_and_two_way_layouts

31 How can controlling v4 and v2 differently affect process profitability? [Table: DATA] Example from 2006 controls wiki: http://controls.engin.umich.edu/wiki/index.php/Design_of_experiments_via_taguchi_methods:_one_and_two_way_layouts

32 How can controlling v4 and v2 differently affect process profitability? [Tables: DATA and ANOVA output] Example from 2006 controls wiki: http://controls.engin.umich.edu/wiki/index.php/Design_of_experiments_via_taguchi_methods:_one_and_two_way_layouts

