Comparing Distributions III: Chi squared test, ANOVA By Peter Woolf University of Michigan Michigan Chemical Process Dynamics and Controls.

Presentation transcript:

Comparing Distributions III: Chi squared test, ANOVA. By Peter Woolf, University of Michigan. Michigan Chemical Process Dynamics and Controls Open Textbook, version 1.0. Creative Commons.

Scenario: You have two parallel processes (Unit 1 and Unit 2) that carry out the same reaction using very similar equipment. Question: Are these units actually behaving the same or not?

Approach: (1) Gather data on yield from both units. A plot of the data does not clearly show any difference.

Approach:
(1) Gather data on yield from both units
(2) Perform statistical analysis
- Fisher’s exact test (requires binning)
- Chi squared test (requires binning)
- ANOVA (works directly on data)

Binning Data: Data reduction by reassigning data into windows (e.g. HIGH and LOW).
Choosing a binning strategy:
- Assign to bins that naturally appear, such as groupings or important thresholds (e.g. yield>50 is profitable, so this is a natural window)
- If multiple windows appear, assign multiple bins
- If no natural bins appear, choose equally sized bins or above/below average
- Bin in Excel with IF.. THEN statements
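The binning step can also be sketched outside Excel. The Python snippet below is a minimal illustration, not the lecture's own method: the yield values, the threshold of 50, and the function names `bin_yield` and `contingency_counts` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical yield measurements (percent); NOT the data from the slides.
yields = {
    "Unit 1": [52.1, 48.3, 55.0, 61.2, 49.9, 50.4],
    "Unit 2": [47.5, 44.0, 51.3, 46.8, 49.0, 53.7],
}

THRESHOLD = 50.0  # assumed natural threshold: yield > 50 is profitable


def bin_yield(value, threshold=THRESHOLD):
    """Same logic as an Excel formula such as =IF(A1>50,"HIGH","LOW")."""
    return "HIGH" if value > threshold else "LOW"


def contingency_counts(data, threshold=THRESHOLD):
    """Count HIGH and LOW observations for each unit."""
    table = {}
    for unit, values in data.items():
        counts = Counter(bin_yield(v, threshold) for v in values)
        table[unit] = {"HIGH": counts["HIGH"], "LOW": counts["LOW"]}
    return table


counts = contingency_counts(yields)
for unit, row in counts.items():
    print(unit, row)
```

The per-unit HIGH and LOW counts are the cells of a contingency table.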

For Fisher’s exact and Chi squared tests, create a contingency table with one row per unit (Unit 1, Unit 2) and one column per bin (High, Low). As mentioned in last lecture, we can use Fisher’s exact to calculate a p-value: the probability of finding this configuration, or a more extreme one, at random.
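As a rough sketch of what Fisher’s exact test computes, the Python code below sums hypergeometric probabilities over all 2x2 tables that share the observed margins. The cell counts are hypothetical: they were chosen only to match the totals quoted later in the lecture (150 samples per unit, bins of 135 and 165), and treating 135 as the High total is an assumption. The function names are also illustrative.

```python
from math import comb


def table_probability(a, row1, row2, col1):
    """Hypergeometric probability of a 2x2 table whose top-left cell is a,
    given fixed row totals (row1, row2) and first-column total col1."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, row1, row2, col1)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_value = 0.0
    for k in range(lowest, highest + 1):
        p_k = table_probability(k, row1, row2, col1)
        # Small tolerance so ties with the observed table are included.
        if p_k <= p_observed * (1 + 1e-9):
            p_value += p_k
    return p_value


# Hypothetical counts (rows: Unit 1, Unit 2; columns: High, Low).
p = fisher_exact_two_sided(82, 68, 53, 97)
print(f"Fisher's exact two-sided p-value: {p:.5f}")
```

This version uses one common two-sided convention; other tools may define "more extreme" slightly differently.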

[Figure: three contingency tables (rows: Unit 1, Unit 2; columns: High, Low) showing the observed configuration, a “more extreme” configuration, and the “most extreme” configuration.]

[Plot: probability of configuration versus number of changes away from the observed case, marking the observed case, the more extreme and less extreme configurations, and the most likely cases if this were a random sample. Total area = 1.0.]
Conclusion: The units are behaving differently.
IDEA! The observed case is far from the most likely case if random, so can we just use that distance?

[Plot: probability of configuration versus number of changes away from the observed case.]
IDEA! The observed case is far from the most likely case if random, so can we just use that distance? If this distance is “big”, then the observed case is unusual. What is this point (the peak of the plot)?

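What is this point? It is the most likely case if random, which we compare with the observed case. Each unit has 150 of the 300 samples, and the two bins hold 135 and 165 samples in total, so the most likely counts for each unit are:
150*(135/300) = 67.5
150*(165/300) = 82.5
Distance between these two cases? The raw difference depends on the magnitude of the counts, so normalize it. The result is the chi squared statistic: the sum over all cells of (observed - expected)^2 / expected.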

Observed case versus most likely case if random (150*(135/300) = 67.5 and 150*(165/300) = 82.5 for each unit). Chi squared statistic for this case: 11.33. Okay.. So what? What is the p-value?
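The expected counts and the chi squared statistic can be reproduced with a few lines of Python. The observed counts below are hypothetical. They were picked to be consistent with the totals quoted in the lecture (150 per unit, bins of 135 and 165) and to give a statistic close to 11.33, but they are not taken from the slides.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed, expected):
    """Sum over all cells of (observed - expected)^2 / expected."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts (rows: Unit 1, Unit 2; columns: High, Low).
observed = [[82, 68], [53, 97]]
expected = expected_counts(observed)
chi2 = chi_squared_statistic(observed, expected)
print("expected counts:", expected)
print("chi squared statistic:", round(chi2, 2))
```

Dividing each squared difference by the expected count is the normalization step: it keeps cells with large counts from dominating the distance.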

The chi squared statistic has a known distribution that can be looked up or found in Excel using “chidist”. For this case, with 1 degree of freedom: =chidist(11.33,1), which is approximately 0.00076. This can be done in a more automated way in Excel using “chitest”. For this case chitest and Fisher’s exact agree.
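For readers without Excel, the lookup for 1 degree of freedom can be sketched with the Python standard library. It relies on the fact that a chi squared variable with 1 degree of freedom is the square of a standard normal variable, so the upper tail reduces to a complementary error function. The function name is an assumption for illustration, and the shortcut does not apply to other degrees of freedom.

```python
from math import erfc, sqrt


def chi_squared_p_value_1dof(statistic):
    """Upper-tail probability of a chi squared variable with 1 degree of freedom.

    With 1 degree of freedom the statistic is the square of a standard
    normal variable, so the tail probability is erfc(sqrt(statistic / 2)).
    """
    return erfc(sqrt(statistic / 2.0))


p_value = chi_squared_p_value_1dof(11.33)
print(f"p-value for chi squared = 11.33, 1 degree of freedom: {p_value:.5f}")
```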

Chi squared test vs. Fisher’s exact
- For a random null, Fisher’s exact will always yield a correct result.
- Chi squared test is often easier to carry out (the math is easier).
- Chi squared will give incorrect results when fewer than 20 samples are present, or when there are between 20 and 40 samples and any expected count is 5 or below.
[Example output: chitest says the result is 2x more significant. This error is due to the small sample effect.]
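The sample-size rule of thumb above can be written as a small check to run before trusting a chi squared result. This is only a sketch of the rule as stated on the slide; the function name and the example tables are assumptions.

```python
def chi_squared_is_reliable(expected):
    """Apply the slide's rule of thumb to a table of expected counts.

    Returns False when there are fewer than 20 samples, or when there are
    between 20 and 40 samples and any expected count is 5 or below.
    """
    cells = [e for row in expected for e in row]
    total = sum(cells)  # expected counts add up to the number of samples
    if total < 20:
        return False
    if total <= 40 and min(cells) <= 5:
        return False
    return True


print(chi_squared_is_reliable([[67.5, 82.5], [67.5, 82.5]]))  # large sample
print(chi_squared_is_reliable([[4.0, 6.0], [4.0, 6.0]]))      # small sample
```

When the check fails, Fisher’s exact test is the safer choice.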

Chi squared test vs. Fisher’s exact (continued). The chi squared test is easy to do for larger contingency tables and when the expected distribution is not random. Both can be done with a Fisher’s-like test, but the math gets much harder. Example: 3 by 3 contingency table with a model for expectations. Observed is close to the expected, but far from random.

Approach:
(1) Gather data on yield from both units
(2) Perform statistical analysis
- Fisher’s exact test (requires binning)
- Chi squared test (requires binning)
- ANOVA (works directly on data)

ANOVA: Analysis of Variance. A method to compare continuous measurements to determine if they are sampled from the same or different distributions. For a single factor ANOVA, we assume that each observation in each class can be modeled as:
Observation = overall mean + class effect + random error
In the study we are following in this class, the class effect would be the effect of unit 1 or unit 2. ANOVA can be easily done in Excel using Tools->Data Analysis->ANOVA.
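To make the single factor model concrete, the sketch below computes the F statistic by splitting the variation into a between-class part (the class effect) and a within-class part (the random error). The yield values are hypothetical and the function name is illustrative. The p-value would then come from an F distribution with the returned degrees of freedom, which is what Excel's ANOVA tool reports.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single factor ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: the "class effect" part of the model.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: the "random error" part of the model.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical yields for the two units; NOT the data from the slides.
unit_1 = [52.0, 55.0, 49.0, 58.0, 51.0]
unit_2 = [47.0, 50.0, 45.0, 49.0, 44.0]

f_stat, df_between, df_within = one_way_anova_f([unit_1, unit_2])
print(f"F = {f_stat:.3f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F means the spread between the unit means is large compared with the scatter inside each unit.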

1 way ANOVA. Key value: the p-value here is the probability of seeing differences between the groups at least this large if both units (each group) were really the same.

2 way ANOVA with replicates. Scenario: Testing three units in triplicate, each with three different control architectures: feedback (FB), model predictive control (MPC), and a cascade architecture. In each case we measure the yield.
Questions:
1) Do the units significantly differ?
2) Do the control architectures significantly differ?
In Excel: Tools->Data Analysis->ANOVA: Two factor with replication
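The two factor calculation can be sketched in Python for a balanced design (the same number of replicates in every cell). The yields are hypothetical, the function name `two_way_anova` is an assumption, and the code returns sums of squares, degrees of freedom, and F statistics only; p-values would still come from an F distribution.

```python
from statistics import mean


def two_way_anova(cells):
    """cells[i][j] lists the replicates for row level i and column level j.

    Assumes equal replicates per cell. Returns a dict mapping each source of
    variation to (sum of squares, degrees of freedom, F statistic).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    row_means = [mean(x for cell in row for x in cell) for row in cells]
    col_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_rows = b * n * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * n * sum((m - grand) ** 2 for m in col_means)
    ss_inter = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_within = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    df = {"rows": a - 1, "columns": b - 1,
          "interaction": (a - 1) * (b - 1), "within": a * b * (n - 1)}
    ms_within = ss_within / df["within"]
    ss = {"rows": ss_rows, "columns": ss_cols, "interaction": ss_inter}
    result = {k: (v, df[k], (v / df[k]) / ms_within) for k, v in ss.items()}
    result["within"] = (ss_within, df["within"], None)
    return result


# Hypothetical yields: rows are controllers (FB, MPC, cascade),
# columns are the three units, three replicates per cell.
yields = [
    [[50.0, 52.0, 51.0], [49.0, 51.0, 50.0], [51.0, 50.0, 52.0]],
    [[58.0, 60.0, 59.0], [57.0, 59.0, 61.0], [60.0, 58.0, 59.0]],
    [[54.0, 55.0, 53.0], [55.0, 53.0, 54.0], [53.0, 56.0, 54.0]],
]

table = two_way_anova(yields)
for source, (ss_value, dof, f_value) in table.items():
    print(source, round(ss_value, 3), dof, f_value)
```

In this layout the rows play the role of Excel's "Sample" factor and the columns the role of its "Columns" factor.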

2 way ANOVA with replicates. Annotations on the Excel output:
- Controllers (samples) have a significant effect.
- Columns (units) don’t have a significant effect.
- ?? Looks like an error, and may be why we get a negative F value and no p-value.

ANOVA
- ANOVA tells you if factors are significantly related to an outcome according to a linear model.
  – Nonlinear relationships can be strong, but may appear insignificant in an ANOVA.
- ANOVA does not tell you the model parameters.
- ANOVA, t-test, and z-test all provide similar kinds of information for different kinds of data.

Physical process (Unit 1, Unit 2) -> Experimental data -> Statistical analysis -> Results: Unit 1 is different from Unit 2. This difference is clearer in the binned data (the chi squared and Fisher’s p-values are smaller than the ANOVA p-value).

Take Home Messages
- Chi squared tests are analogous to Fisher’s exact tests, but are generally easier to calculate.
- Chi squared tests fail when sample sizes are small.
- ANOVA determines if lists of continuous measurements are likely the same or different.
- ANOVA can determine the significance of a set of factors on the measurements.

The following pages have additional examples of ChemE applications of ANOVA analyses

Solution approach: two factor ANOVA. Factor 1: Farm. Factor 2: Shipper. See if a factor has a significant p-value.

Looking at averages and ranges, it looks like shipper Rex has a somewhat worse record than Ned. The farms have some variation, but it is small. This said, both shippers will bring wheat with moths, but Rex will bring more.

1) Import data into Excel. 2) Select Tools->Data Analysis->ANOVA: Two factor with Replication. Conclusion: the factor “shipper” has a significant influence on the moth probability, with a p-value of 0.03.

ANOVA - ChemE examples: How does temperature affect yield?

ANOVA - ChemE examples: Do both temperature and concentration affect yield?

ANOVA - ChemE examples: How can controlling v4 and v2 differently affect process profitability? Example from 2006 controls wiki.
[Figures: the example’s data table (DATA) and the corresponding ANOVA output (ANOVA).]