ASSESSING LEARNING ALGORITHMS
Yılmaz KILIÇASLAN

Assessing the performance of the learning algorithm

A learning algorithm is good if it produces hypotheses that do a good job of predicting the classifications of unseen examples. Later on, we will see how prediction quality can be estimated in advance. For now, we will look at a methodology for assessing prediction quality after the fact.

Assessing the quality of a hypothesis

A methodology for assessing the quality of a hypothesis (a code sketch follows the list):

1. Collect a large set of examples.
2. Divide it into two disjoint sets:
   a. the training set, and
   b. the test set.
3. Apply the learning algorithm to the training set, generating a hypothesis h.
4. Measure the percentage of examples in the test set that are correctly classified by h.
5. Repeat steps 2 to 4 for different sizes of training sets and different randomly selected training sets of each size.
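The five steps translate directly into code. The following is a minimal sketch, not part of the original slides; it assumes scikit-learn's DecisionTreeClassifier as the learner, and the assess function name is our own.

import random
from sklearn.tree import DecisionTreeClassifier

def assess(examples, labels, train_size, trials=20):
    """Average test-set accuracy of a learner over random train/test splits."""
    scores = []
    indices = list(range(len(examples)))
    for _ in range(trials):
        random.shuffle(indices)                           # step 2: random disjoint split
        train, test = indices[:train_size], indices[train_size:]
        h = DecisionTreeClassifier().fit(                 # step 3: generate hypothesis h
            [examples[i] for i in train], [labels[i] for i in train])
        predictions = h.predict([examples[i] for i in test])
        correct = sum(pred == labels[i] for pred, i in zip(predictions, test))
        scores.append(correct / len(test))                # step 4: accuracy on the test set
    return sum(scores) / len(scores)                      # step 5: repeat and average

Calling assess with a range of train_size values and plotting the averages produces a learning curve like the one on the next slide.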

The learning curve for the algorithm

[Figure: learning curve; x-axis is training set size.] A learning curve for the decision tree algorithm on 100 randomly generated examples in the restaurant domain. The graph summarizes 20 trials.

Evaluation Metrics: Accuracy, Recall and Precision

Accuracy: a simple metric that computes the fraction of instances for which the correct result is returned.
Recall: the fraction of correct instances among all instances that actually belong to the relevant subset.
Precision: the fraction of correct instances among those that the algorithm believes to belong to the relevant subset.

Recall versus Precision (in information retrieval)

Precision can be seen as a measure of exactness or fidelity, whereas recall is a measure of completeness.

[Figure: the outcome of a query (oval) in relation to all relevant documents (left) and the non-relevant documents (right); the more correct results (green), the better. Precision is indicated by the horizontal arrow, recall by the diagonal arrow.]

Recall and Precision in Information Retrieval

In the field of information retrieval, precision is the fraction of retrieved documents that are relevant to the search:

$$\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$$

Recall in information retrieval is the fraction of the documents relevant to the query that are successfully retrieved:

$$\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$$
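As a worked illustration (our own, not from the slides), the two set-based definitions can be computed directly from the retrieved and relevant document sets; the document IDs below are hypothetical:

def precision(retrieved, relevant):
    # fraction of retrieved documents that are relevant
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    # fraction of relevant documents that were retrieved
    return len(retrieved & relevant) / len(relevant)

# Hypothetical query: 4 documents retrieved, 5 documents actually relevant.
retrieved, relevant = {1, 2, 3, 4}, {2, 4, 5, 6, 7}
print(precision(retrieved, relevant))   # 2/4 = 0.5
print(recall(retrieved, relevant))      # 2/5 = 0.4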

Recall and Precision in Classification - I

In the context of classification tasks, the terms true positives, true negatives, false positives and false negatives are used to compare the given classification of an item (the class label assigned to the item by a classifier) with the desired correct classification (the class the item actually belongs to). This is illustrated by the table below:

                                     Correct Result / Classification
                                     E1                    E2
Obtained Result /    E1    tp (true positive)    fp (false positive)
Classification       E2    fn (false negative)   tn (true negative)

Recall and Precision in Classification - II

Recall and precision are then defined as:

$$\text{recall} = \frac{tp}{tp + fn}, \qquad \text{precision} = \frac{tp}{tp + fp}$$
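These formulas translate directly into code; the counts below are hypothetical, chosen only to illustrate the arithmetic:

def classification_metrics(tp, fp, fn, tn):
    recall = tp / (tp + fn)            # fraction of actual positives that were found
    precision = tp / (tp + fp)         # fraction of predicted positives that are correct
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return recall, precision, accuracy

print(classification_metrics(tp=40, fp=10, fn=20, tn=30))
# (0.666..., 0.8, 0.7)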

Peeking at the test data

Obviously, the learning algorithm must not be allowed to "see" the test data before the learned hypothesis is tested on them. Unfortunately, it is all too easy to fall into the trap of peeking at the test data:

- A learning algorithm can have various "knobs" that can be twiddled to tune its behavior; for example, various different criteria for choosing the next attribute in decision tree learning.
- We generate hypotheses for various different settings of the knobs, measure their performance on the test set, and report the prediction performance of the best hypothesis.
- Alas, peeking has occurred!
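The standard way to tune the knobs without peeking is to hold out a separate validation set and consult the test set only once. The sketch below is our own illustration; learn, accuracy and knob_settings are hypothetical stand-ins for the learner, its evaluation function and its tunable settings:

def tune_without_peeking(learn, accuracy, knob_settings, train, validation, test):
    # choose the best knob setting using the validation set only
    best = max(knob_settings, key=lambda k: accuracy(learn(train, k), validation))
    final_h = learn(train + validation, best)   # optionally retrain on all non-test data
    return best, accuracy(final_h, test)        # the test set is consulted exactly once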

Noise and overfitting - I

Unfortunately, this is far from the whole story. It is quite possible, and in fact likely, that even when vital information is missing, the decision tree learning algorithm will find a decision tree that is consistent with all the examples. This is because the algorithm can use the irrelevant attributes, if any, to make spurious distinctions among the examples.

Noise and overfitting - II

Consider the problem of trying to predict the roll of a die. Suppose that experiments are carried out during an extended period of time with various dice and that the attributes describing each training example are as follows:

1. Day: the day on which the die was rolled (Mon, Tue, Wed, Thu).
2. Month: the month in which the die was rolled (Jan or Feb).
3. Color: the color of the die (Red or Blue).

As long as no two examples have identical descriptions, Decision-Tree-Learning will find an exact hypothesis, which is in fact spurious. What we would like is for Decision-Tree-Learning to return a single leaf node with probabilities close to 1/6 for each roll.
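This effect is easy to simulate. The sketch below is our own illustration using scikit-learn's DecisionTreeClassifier; continuous random attributes stand in for Day, Month and Color so that no two examples share a description, matching the slide's caveat:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 3))             # three attributes of pure noise
y = rng.integers(1, 7, size=200)     # die rolls, uniform on 1..6

tree = DecisionTreeClassifier().fit(X[:100], y[:100])
print(tree.score(X[:100], y[:100]))  # 1.0: an exact but spurious hypothesis
print(tree.score(X[100:], y[100:]))  # roughly 1/6: no real predictive power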

Decision tree pruning

Decision tree pruning is a simple technique for the treatment of overfitting. Pruning works by preventing recursive splitting on attributes that are not clearly relevant, even when the data at that node in the tree are not uniformly classified. Information gain is a good clue to irrelevance. But how large a gain should we require in order to split on a particular attribute?

Statistical significance test

We can answer the preceding question by using a statistical significance test. Such a test begins by assuming that there is no underlying pattern (the so-called null hypothesis). Then the actual data are analyzed to calculate the extent to which they deviate from a perfect absence of pattern. If the degree of deviation is statistically unlikely (usually taken to mean a 5% probability or less), then that is considered good evidence for the presence of a significant pattern in the data. The probabilities are calculated from standard distributions of the amount of deviation one would expect to see in random sampling.

Deviation from the null hypothesis

Here the null hypothesis is that the attribute is irrelevant. Suppose the attribute splits a set containing $p$ positive and $n$ negative examples into $v$ subsets, the $k$-th containing $p_k$ positive and $n_k$ negative examples. Under the null hypothesis, the expected numbers of positive and negative examples in subset $k$ are

$$\hat{p}_k = p \cdot \frac{p_k + n_k}{p + n}, \qquad \hat{n}_k = n \cdot \frac{p_k + n_k}{p + n}$$

χ² (chi-squared) distribution

A convenient measure of the total deviation is given by

$$D = \sum_{k=1}^{v} \left( \frac{(p_k - \hat{p}_k)^2}{\hat{p}_k} + \frac{(n_k - \hat{n}_k)^2}{\hat{n}_k} \right)$$

Under the null hypothesis, the value of $D$ is distributed according to the χ² (chi-squared) distribution with $v - 1$ degrees of freedom. The probability that the attribute is really irrelevant can be calculated with the help of standard χ² tables or with statistical software.
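As a sketch of how this pretest might be applied before splitting on an attribute (our own illustration, not from the slides; scipy.stats.chi2 supplies the tail probability, and the counts in the example are hypothetical):

from scipy.stats import chi2

def attribute_is_irrelevant(splits, alpha=0.05):
    # splits: list of (p_k, n_k) positive/negative counts in each of the v subsets
    p = sum(pk for pk, nk in splits)
    n = sum(nk for pk, nk in splits)
    D = 0.0
    for pk, nk in splits:
        expected_p = p * (pk + nk) / (p + n)   # expected positives under the null
        expected_n = n * (pk + nk) / (p + n)   # expected negatives under the null
        D += (pk - expected_p) ** 2 / expected_p + (nk - expected_n) ** 2 / expected_n
    p_value = chi2.sf(D, df=len(splits) - 1)   # right-tail probability of D
    return p_value > alpha                     # True: deviation plausible under the null

# Example: an attribute with 3 values splitting 12 examples.
print(attribute_is_irrelevant([(2, 2), (3, 1), (1, 3)]))   # True: gain not significant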

Reference

Russell, S. and P. Norvig (2003). Artificial Intelligence: A Modern Approach. Prentice Hall.