
1 CSI5388: Functional Elements of Statistics for Machine Learning Part II

2 Contents of the Lecture
Part I (the previous set of lecture notes):
Definitions and Preliminaries
Hypothesis Testing: Parametric Approaches
Part II (this set of lecture notes):
Hypothesis Testing: Non-Parametric Approaches
Power of a Test
Statistical Tests for Comparing Multiple Classifiers

3 Non-parametric approaches to hypothesis testing
The hypothesis testing procedures discussed in the previous lecture are called parametric: they are based on assumptions about the distribution of the population on which the test is run, and they rely on the estimation of parameters of these distributions. In our case, we assumed that the distributions were either normal or followed a Student t distribution, and the parameters we estimated were the mean and the variance. The problem we now turn to is hypothesis testing that is not based on assumptions about the distribution and does not rely on the estimation of parameters.

4 The Different Types of Non-Parametric Hypothesis Testing Approaches I
There are two important families of tests that involve neither distributional assumptions nor parameter estimation:
Nonparametric tests, which rely on ranking the data and performing a statistical test on the ranks.
Resampling statistics, which consist of drawing samples repeatedly from a population and evaluating the distribution of the result. Resampling statistics will be discussed in the next lecture.

5 The Different Types of Non-Parametric Hypothesis Testing Approaches II
Nonparametric tests are quite useful for populations in which outliers skew the distribution too much; ranking eliminates the problem. However, they are typically less powerful (see below) than parametric tests.
Resampling statistics are useful when the statistic of interest cannot be derived analytically (e.g., statistics about the median of a population) unless we assume a normal distribution.

6 Non-Parametric Tests: Wilcoxon’s Rank-Sum Test
The case of independent samples
The case of matched pairs

7 Wilcoxon’s Rank-Sum Tests
Wilcoxon’s rank-sum tests are equivalent to the t-test, but apply when the normality assumption is not met. As a result of their non-parametric nature, however, power is lost (see below for a formal discussion of power). In particular, these tests are not as specific as their parametric equivalents: although we interpret their result as a statement about the central tendency of the distributions under study, a significant result could in fact reflect some other difference between those distributions.

8 Wilcoxon’s Rank-Sum Test (for two independent samples): Informal Description I
Given two samples, with n1 observations in group 1 and n2 observations in group 2, the null hypothesis we are trying to reject is: “H0: the two samples come from identical populations (not just populations with the same mean)”. We consider two cases:
Case 1: The null hypothesis is false (to a substantial degree) and the scores from population 1 are generally lower than those of population 2.
Case 2: The null hypothesis is true, i.e., the two samples came from the same population.

9 Wilcoxon’s Rank-Sum Test (for two independent samples): Informal Description II
In both cases, the procedure consists of ranking the scores of the two samples taken together.
Case 1: We expect the ranks from sample 1 to be generally lower than those of sample 2; in particular, we expect the sum of the ranks in group 1 to be smaller than the sum of the ranks in group 2.
Case 2: We expect the sum of the ranks of the first group to be about equal to the sum of the ranks of the second group.

10 Wilcoxon’s Rank-Sum Test (for two independent samples): n1 and n2 ≤ 25
Consider two groups of data of sizes n1 and n2, respectively, where n1 is the smaller sample size.
Rank their scores together from lowest to highest.
In case of an x-way tie just after rank y, assign the average rank ((y+1) + (y+2) + … + (y+x))/x to all the tied elements.
Add the ranks of the group containing the smaller number of samples (n1); if both groups contain the same number of samples, choose the group with the smaller rank sum. Call this sum Ws.
Find the critical value V in the Wilcoxon table for n1, n2, and the required significance level, where n1 in the table also corresponds to the smaller sample.
Compare Ws to V and conclude that the difference between the two groups is significant at the chosen level (α for a one-tailed test, 2α for a two-tailed test) only if Ws < V. If Ws ≥ V, the null hypothesis cannot be rejected.
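To make the ranking and tie-handling concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are available; the two groups of scores are hypothetical):

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical scores for two independent groups; group 1 is the smaller one.
group1 = np.array([1.2, 3.4, 2.2, 5.0])           # n1 = 4
group2 = np.array([4.1, 6.3, 5.5, 7.2, 6.8])      # n2 = 5

# Rank all scores together from lowest to highest; ties receive the
# average of the ranks they span (the tie rule described above).
ranks = rankdata(np.concatenate([group1, group2]), method="average")

# Ws is the sum of the ranks belonging to the smaller group (group 1 here).
Ws = ranks[: len(group1)].sum()
print(f"Ws = {Ws}")  # compare Ws against the tabled critical value V
```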

11 Wilcoxon’s Rank-Sum Test (for two independent samples): n1 and n2 > 25
Compute Ws as before.
Use the fact that Ws approaches a normal distribution as the sample sizes increase, with:
mean m = n1(n1 + n2 + 1)/2, and
standard error std = sqrt(n1·n2·(n1 + n2 + 1)/12).
Compute the z statistic z = (Ws − m)/std and use the tables of the normal distribution.
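The large-sample approximation is a one-liner; the sketch below (plain Python, with a hypothetical Ws and group sizes) computes the z statistic from the formulas above:

```python
import math

def rank_sum_z(Ws, n1, n2):
    """Normal approximation to Ws for large samples (n1, n2 > 25)."""
    m = n1 * (n1 + n2 + 1) / 2                       # mean of Ws under H0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # standard error of Ws
    return (Ws - m) / std

# Hypothetical Ws for groups of size 30 and 40.
print(f"z = {rank_sum_z(Ws=980.0, n1=30, n2=40):.3f}")
```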

12 Wilcoxon’s Matched-Pairs Signed-Ranks Test (for paired scores): Informal Description
Logic of the test: the same population is tested under two different conditions, C1 and C2. If there is improvement in C2, then most of the results recorded in C2 will be greater than those recorded in C1, and those that are not greater will be smaller by only a small amount.

13 Wilcoxon’s Matched-Pairs Signed-Ranks Test (for paired scores): n ≤ 50
We calculate the difference score for each pair of measurements.
We rank all the difference scores without paying attention to their signs (i.e., we rank their absolute values).
We assign the algebraic sign of each difference to its rank.
We sum the positive and the negative ranks separately.
We choose as test statistic T the smaller of the absolute values of the two sums.
We compare T to a Wilcoxon T table.
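These steps translate directly into code; here is a minimal sketch (assuming NumPy and SciPy, with hypothetical paired scores under the two conditions):

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical paired scores under conditions C1 and C2.
c1 = np.array([0.71, 0.68, 0.75, 0.80, 0.66, 0.73])
c2 = np.array([0.74, 0.72, 0.74, 0.85, 0.70, 0.79])

diff = c2 - c1
diff = diff[diff != 0]           # zero differences carry no sign, so by
                                 # convention they are dropped
ranks = rankdata(np.abs(diff))   # rank the absolute differences
signed = np.sign(diff) * ranks   # re-attach the algebraic signs

pos_sum = signed[signed > 0].sum()
neg_sum = -signed[signed < 0].sum()
T = min(pos_sum, neg_sum)        # compare T to a Wilcoxon T table
print(f"T = {T}")
```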

14 Wilcoxon’s Matched-Pairs Signed-Ranks Test (for paired scores): n > 50
Compute T as before.
Use the fact that T approaches a normal distribution as size increases, with:
mean m = n(n + 1)/4, and
standard error std = sqrt(n(n + 1)(2n + 1)/24).
Compute the z statistic z = (T − m)/std and use the tables of the normal distribution.
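As for the rank-sum test, the approximation is easy to compute directly (plain Python; T and n below are hypothetical):

```python
import math

def signed_rank_z(T, n):
    """Normal approximation to T for large samples (n > 50)."""
    m = n * (n + 1) / 4                                # mean of T under H0
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)    # standard error of T
    return (T - m) / std

print(f"z = {signed_rank_z(T=450.0, n=60):.3f}")  # hypothetical T and n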

15 Power Analysis

16 Type I and Type II Errors
Definition: A Type I error (α) corresponds to the error of rejecting H0, the null hypothesis, when it is, in fact, true. A Type II error (β) corresponds to the error of failing to reject H0 when it is false.
Definition: The power of a test is the probability of rejecting H0 given that it is false: Power = 1 − β.
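As a numerical illustration, power can be computed for a given effect size, sample size, and α. The sketch below assumes the statsmodels library; the effect size, sample sizes, and α are arbitrary choices:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sided independent t-test with a medium effect (d = 0.5),
# 30 observations per group, and alpha = 0.05.
power = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"power = {power:.2f}")    # roughly 0.47: such a study is underpowered

# Sample size per group needed to reach power = 0.80 for the same effect.
n = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"n per group = {n:.0f}")  # roughly 64
```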

17 Why does Power Matter? I
All the hypothesis tests described in the previous three sections are only concerned with controlling the Type I error; i.e., they try to ascertain the conditions under which a hypothesis is rightly rejected. They are not at all concerned with the case where the null hypothesis is really false but we do not reject it.

18 Why does Power Matter? II
In the case of machine learning, reducing the Type I error means reducing the probability of saying that there is a difference in the performance of two classifiers when, in fact, there isn’t. Reducing the Type II error means reducing the probability of saying that there is no difference in the performance of the two classifiers when, in fact, there is.
Power matters because we do not want to discard a classifier that shouldn’t have been discarded; if a test does not have enough power, this kind of situation can arise.

19 What is the Effect Size?
The effect size measures how strong the relationship between two entities is. In particular, if we consider a particular procedure, then in addition to knowing how statistically significant its effect is, we may want to know how large that effect is.
There are different measures of effect size, including:
Pearson’s correlation coefficient
Odds ratio
Cohen’s d statistic
Cohen’s d statistic is appropriate in the context of a t-test on means. It is thus the effect size measure we concentrate on here.

20 Cohen’s d Statistic
Cohen’s d statistic is expressed as:
d = (X̄1 − X̄2) / sp
where sp², the pooled variance estimate, is:
sp² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)
and sp is its square root.
[Note: this is not exactly Cohen’s d measure, which was expressed in terms of population parameters; what we show above is an estimate of d.]
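A minimal sketch of this estimate in Python (assuming NumPy; the two score lists are hypothetical classifier accuracies over repeated runs):

```python
import numpy as np

def cohens_d(x1, x2):
    """Estimate Cohen's d from two samples using the pooled variance."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    n1, n2 = len(x1), len(x2)
    # Pooled variance: ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2)
    sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    return (x1.mean() - x2.mean()) / np.sqrt(sp2)

# Hypothetical accuracies of two classifiers over 10 runs each.
a = [0.83, 0.81, 0.84, 0.80, 0.82, 0.85, 0.81, 0.83, 0.82, 0.84]
b = [0.79, 0.78, 0.81, 0.77, 0.80, 0.82, 0.78, 0.79, 0.80, 0.81]
print(f"d = {cohens_d(a, b):.2f}")
```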

21 Usefulness of the d statistic
d is useful in that it standardizes the difference between the two means: we can talk about deviations as proportions of a standard deviation, which is more useful than raw differences, which are domain dependent.
Cohen came up with a set of guidelines for d:
d = .2 is a small effect, but is probably meaningful;
d = .5 is a medium effect that is noticeable;
d = .8 is a large effect.

22 Statistical Tests for Comparing Multiple Classifiers

23 What is the Analysis of Variance (ANOVA)?
The analysis of variance is similar to the t-test in that it deals with differences between sample means. However, unlike the t-test, which is restricted to the difference between two means, ANOVA allows us to assess whether the differences observed among any number of means are statistically significant.
In addition, ANOVA allows us to deal with more than one independent variable. For example, we could choose as two independent variables 1) the learning algorithm and 2) the domain to which the learning algorithm is applied.

24 Why is ANOVA useful?
One may wonder why ANOVA is useful in the context of classifier evaluation. Very simply, if we want to answer the common question “How do various classifiers fare on different data sets?”, then we have two independent variables, the learning algorithm and the domain, and a lot of results. ANOVA makes it easy to tell whether the differences observed are indeed significant.

25 Variations on the ANOVA Theme
There are different implementations of ANOVA:
One-way ANOVA is a linear model that assesses whether the difference in the performance measures of classifiers over different datasets is statistically significant, but it does not distinguish between the within-dataset and the between-dataset variability of the performance measures.
Two-way/multi-way ANOVA can deal with more than one independent variable: for instance, two performance measures over different classifiers over various datasets.
There are also other related tests: Friedman’s test, post-hoc tests such as the Tukey test, etc.

26 How does One-Way ANOVA work? I
It considers various groups of observations and takes as its null hypothesis that all the group means are equal; the alternative hypothesis is that they are not all equal. The ANOVA model is:
x_ij = μ_i + e_ij
where x_ij is the j-th observation from group i, μ_i is the mean of group i, and e_ij is noise that is normally distributed with mean 0 and common standard deviation σ.
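To make the model concrete, here is a tiny simulation that generates data exactly as the model describes (a sketch assuming NumPy; the group means and σ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate x_ij = mu_i + e_ij for three groups with a common noise level.
mu = [0.70, 0.75, 0.72]        # hypothetical group means mu_i
sigma = 0.05                   # common standard deviation of the noise e_ij
n_per_group = 20

groups = [m + sigma * rng.standard_normal(n_per_group) for m in mu]
```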

27 How does One-Way ANOVA work? II
ANOVA monitors three different kinds of variation in the data:
Within-group variation
Between-group variation
Total variation = within-group variation + between-group variation
Each of these variations is represented by a sum of squares (SS) of the variations. The statistic of interest in ANOVA is
F = between-group variation / within-group variation.
Larger F’s demonstrate greater statistical significance than smaller ones. As for z and t, there are tables of significance levels associated with the F-ratio.
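In practice, the F statistic and its significance level can be obtained directly. A minimal sketch assuming SciPy, with hypothetical per-fold accuracies for three classifiers:

```python
from scipy.stats import f_oneway

# Hypothetical accuracies of three classifiers over five test folds each.
clf_a = [0.81, 0.79, 0.83, 0.80, 0.82]
clf_b = [0.78, 0.77, 0.80, 0.76, 0.79]
clf_c = [0.85, 0.84, 0.86, 0.83, 0.85]

F, p = f_oneway(clf_a, clf_b, clf_c)
print(f"F = {F:.2f}, p = {p:.4f}")  # a large F and small p suggest the
                                    # group means are not all equal
```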

28 How does One-Way ANOVA work? III
The goal of ANOVA is to find out whether or not the differences in means between different groups are statistically significant. To do so, ANOVA partitions the total variance into variance caused by random error (the within-group SS) and variance caused by actual differences between means (the between-group SS).
If the null hypothesis holds, then the within-group SS should be about the same as the between-group SS. We can compare these two SSs using the F test, which checks whether their ratio is significantly greater than 1.

29 What is Multi-Way ANOVA?
In one-way ANOVA, we simply considered several groups. For example, this could correspond to comparing the performance of 10 different classifiers on one domain. What about the case where we compare the performance of these same 10 classifiers on 5 domains? Two-way ANOVA can help with that.
If we were to add a further dimension, such as 6 different (but matched) threshold levels (as in AUC) for each classifier on the same 5 domains, then three-way ANOVA could be used, and so on.

30 How Does Multi-Way ANOVA work?
In our example, the difference between one-way ANOVA and two-way ANOVA can be illustrated as follows:
In one-way ANOVA, we would calculate the within-group SS by collapsing together the results obtained on all the data sets within each classifier’s results.
In two-way ANOVA, we would calculate all the within-classifier, within-domain variances separately and then pool the results.
As a result, the pooled within-group SS of two-way ANOVA is smaller than the pooled within-group SS of one-way ANOVA. Multi-way ANOVA is thus a more statistically powerful test than one-way ANOVA, since we need fewer observations to find significant effects. A sketch of a two-way analysis appears below.
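Here is a minimal two-way ANOVA sketch, assuming the pandas and statsmodels libraries; the classifier/dataset accuracies are hypothetical:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical accuracies: three classifiers, two datasets, two runs per cell.
df = pd.DataFrame({
    "classifier": ["A", "A", "B", "B", "C", "C"] * 2,
    "dataset":    ["d1", "d2"] * 6,
    "accuracy":   [0.81, 0.77, 0.79, 0.74, 0.85, 0.80,
                   0.82, 0.76, 0.78, 0.75, 0.86, 0.81],
})

# Two factors: the learning algorithm and the domain.
model = ols("accuracy ~ C(classifier) + C(dataset)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F and p for each factor
```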