ASSESSING LEARNING ALGORITHMS Yılmaz KILIÇASLAN

Assessing the performance of the learning algorithm A learning algorithm is good if it produces hypotheses that do a good job of predicting the classifications of unseen examples. Later, we will see how prediction quality can be estimated in advance. For now, we look at a methodology for assessing prediction quality after the fact.

Assessing the quality of a hypothesis A methodology for assessing the quality of a hypothesis: 1. Collect a large set of examples. 2. Divide it into two disjoint sets: the training set and the test set. 3. Apply the learning algorithm to the training set, generating a hypothesis h. 4. Measure the percentage of examples in the test set that are correctly classified by h. 5. Repeat steps 2 to 4 for different sizes of training sets and different randomly selected training sets of each size.
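The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the majority-class "learner" is a stand-in for any real algorithm, and names such as split and learn_majority are made up for the example.

```python
import random

def split(examples, train_fraction, rng):
    """Step 2: divide the examples into two disjoint sets, training and test."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def learn_majority(training_set):
    """Step 3 stand-in learner: always predict the most common training label."""
    labels = [label for _, label in training_set]
    majority = max(set(labels), key=labels.count)
    return lambda attributes: majority

def accuracy(hypothesis, test_set):
    """Step 4: fraction of test examples correctly classified by h."""
    correct = sum(1 for attrs, label in test_set if hypothesis(attrs) == label)
    return correct / len(test_set)

rng = random.Random(0)
# Toy data: one numeric attribute; the label says whether it exceeds 0.3.
examples = [((x,), x > 0.3) for x in [rng.random() for _ in range(200)]]
train, test = split(examples, 0.5, rng)
h = learn_majority(train)
print(f"test-set accuracy: {accuracy(h, test):.2f}")
```

Step 5 would wrap the last three lines in a loop over training-set sizes and random splits, averaging the resulting accuracies.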

The learning curve for the algorithm [Figure: % correct on test set vs. training set size] A learning curve for the decision tree algorithm on 100 randomly generated examples in the restaurant domain. The graph summarizes 20 trials.

Peeking at the test data Obviously, the learning algorithm must not be allowed to "see" the test data before the learned hypothesis is tested on them. Unfortunately, it is all too easy to fall into the trap of peeking at the test data:  A learning algorithm can have various "knobs" that can be twiddled to tune its behavior; for example, different criteria for choosing the next attribute in decision tree learning.  We generate hypotheses for various settings of the knobs, measure their performance on the test set, and report the prediction performance of the best hypothesis.  Alas, peeking has occurred!
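One common way to avoid peeking is to tune the knobs on a separate validation set and touch the test set exactly once, at the end. The sketch below is illustrative only: the "learner" is deliberately trivial (its one knob is the decision threshold itself, and it ignores the training data), which keeps the pattern visible.

```python
import random

rng = random.Random(42)
# Toy task: the true label is 1 exactly when x exceeds the unknown threshold 0.6.
xs = [rng.random() for _ in range(300)]
data = [(x, int(x > 0.6)) for x in xs]
train, validation, test = data[:100], data[100:200], data[200:]

def learn_threshold(examples, knob):
    """A learner with one tunable "knob": the decision threshold it uses."""
    return lambda x: int(x > knob)

def accuracy(h, examples):
    return sum(h(x) == y for x, y in examples) / len(examples)

# Tune the knob on the VALIDATION set, never on the test set...
best_knob = max((k / 10 for k in range(1, 10)),
                key=lambda k: accuracy(learn_threshold(train, k), validation))
final_h = learn_threshold(train, best_knob)
# ...then measure test-set performance exactly once, with the knob frozen.
print(f"chosen knob: {best_knob}, test accuracy: {accuracy(final_h, test):.2f}")
```

Reporting the best-of-nine accuracy on the test set itself, instead of on the validation set, would be exactly the peeking trap described above.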

Noise and overfitting - I Unfortunately, this is far from the whole story. It is quite possible, and in fact likely, that even when vital information is missing, the decision tree learning algorithm will find a decision tree that is consistent with all the examples. This is because the algorithm can use the irrelevant attributes, if any, to make spurious distinctions among the examples.

Noise and overfitting - II Consider the problem of trying to predict the roll of a die. Suppose that experiments are carried out over an extended period of time with various dice and that the attributes describing each training example are as follows: 1. Day: the day on which the die was rolled (Mon, Tue, Wed, Thu). 2. Month: the month in which the die was rolled (Jan or Feb). 3. Color: the color of the die (Red or Blue). As long as no two examples have identical descriptions, DECISION-TREE-LEARNING will find an exact hypothesis, which is in fact spurious. What we would like is for DECISION-TREE-LEARNING to return a single leaf node with probabilities close to 1/6 for each roll.
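The dice example can be simulated directly. Rather than a full decision tree, this sketch uses a memorizing lookup table as the "exactly consistent hypothesis" (an unpruned tree on distinct examples behaves the same way): it scores perfectly on the training rolls but no better than chance on fresh rolls, because Day, Month, and Color carry no information about the outcome.

```python
import itertools
import random

rng = random.Random(7)
days, months, colors = ("Mon", "Tue", "Wed", "Thu"), ("Jan", "Feb"), ("Red", "Blue")

# One training example per distinct description, so an exactly consistent
# hypothesis exists -- even though the roll is independent of all attributes.
train = [(desc, rng.randint(1, 6)) for desc in itertools.product(days, months, colors)]

# A memorizing "learner": like an unpruned decision tree, it is consistent
# with every training example it has seen.
table = dict(train)
h = lambda desc: table[desc]

train_acc = sum(h(d) == r for d, r in train) / len(train)

# Fresh rolls with the same descriptions: the attributes predict nothing.
test = [(desc, rng.randint(1, 6))
        for desc in itertools.product(days, months, colors) for _ in range(100)]
test_acc = sum(h(d) == r for d, r in test) / len(test)
print(train_acc, round(test_acc, 3))  # exact fit on training, near 1/6 on test
```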

Decision tree pruning Decision tree pruning is a simple technique for the treatment of overfitting. Pruning works by preventing recursive splitting on attributes that are not clearly relevant, even when the data at that node in the tree are not uniformly classified. The information gain is a good clue to irrelevance.
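The information-gain clue can be made concrete. Below is a short sketch of entropy and information gain as used by ID3-style tree learners; the attribute-index representation and the toy data are invented for the example. An attribute whose gain is near zero, like attribute 1 here, is exactly the kind of split pruning would reject.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class proportions in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v), splitting S on attribute A."""
    labels = [label for _, label in examples]
    groups = {}
    for attrs, label in examples:
        groups.setdefault(attrs[attribute], []).append(label)
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Attribute 0 determines the label; attribute 1 is irrelevant noise.
data = [(("sunny", "red"), True), (("sunny", "blue"), True),
        (("rainy", "red"), False), (("rainy", "blue"), False)]
print(information_gain(data, 0))  # 1.0 bit: perfectly informative
print(information_gain(data, 1))  # 0.0 bits: irrelevant, a pruning candidate
```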

Cross-validation - I Cross-validation is another technique that reduces overfitting. It can be applied to any learning algorithm, not just decision tree learning. The basic idea is to estimate how well each hypothesis will predict unseen data:  set aside some fraction of the known data and  use it to test the prediction performance of a hypothesis induced from the remaining data.

Cross-validation - II K-fold cross-validation means that you run k experiments, each time setting aside a different 1/k of the data to test on, and average the results. Popular values for k are 5 and 10. The extreme is k = n, also known as leave-one-out cross-validation. Cross-validation can be used in conjunction with any tree-construction method (including pruning) in order to select a tree with good prediction performance. To avoid peeking, we must then measure this performance with a new test set.
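K-fold cross-validation is short enough to sketch in full. The helper names and the majority-class stand-in learner are invented for illustration; any learning algorithm with the same learn(train) -> hypothesis shape would slot in.

```python
import random

def k_fold_cross_validation(examples, k, learn, rng):
    """Run k experiments, each holding out a different 1/k of the data as the
    test set, and return the mean test accuracy over the k folds."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        h = learn(train)
        scores.append(sum(h(x) == y for x, y in test) / len(test))
    return sum(scores) / k

def learn_majority(train):
    """Stand-in learner: predict the majority label of the training fold."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

rng = random.Random(0)
data = [(x, x > 50) for x in range(100)]
score = k_fold_cross_validation(data, 5, learn_majority, rng)
print(f"5-fold cross-validation accuracy: {score:.2f}")
```

Setting k = len(examples) turns this into leave-one-out cross-validation; and, as the slide notes, once cross-validation has been used to pick a tree, a fresh test set is still needed to report its performance without peeking.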

Ensemble learning - I The idea of ensemble learning methods is to select a whole collection, or ensemble, of hypotheses from the hypothesis space and combine their predictions. The motivation for ensemble learning is to minimize the risk of misclassification. Another way to think about the ensemble idea is as a generic way of enlarging the hypothesis space. (See next page)

Ensemble learning - II Illustration of the increased expressive power obtained by ensemble learning.

Ensemble learning - III The most widely used ensemble method is called boosting. To understand how it works, we need first to explain the idea of a weighted training set. In such a training set, each example has an associated weight w j > 0. The higher the weight of an example, the higher is the importance attached to it during the learning of a hypothesis.

Ensemble learning - IV Boosting starts with w j = 1 for all the examples (i.e., a normal training set). From this set, it generates the first hypothesis, h 1. This hypothesis will classify some of the training examples correctly and some incorrectly. We would like the next hypothesis to do better on the misclassified examples, so we increase their weights while decreasing the weights of the correctly classified examples. From this new weighted training set, we generate hypothesis h 2. The process continues in this way until we have generated M hypotheses, where M is an input to the boosting algorithm. The final ensemble hypothesis is a weighted-majority combination of all the M hypotheses, each weighted according to how well it performed on the training set.
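The loop just described can be sketched as a minimal AdaBoost, with one-threshold "decision stumps" as the weak hypotheses. This is a hedged illustration, not the course's exact pseudocode: the multiplicative reweighting used here is the standard AdaBoost rule, labels are taken to be -1/+1, and all names are invented.

```python
import math

def learn_stump(examples, weights):
    """Weak learner: the threshold/direction stump with lowest weighted error."""
    best_error, best_h = None, None
    for threshold in sorted({x for x, _ in examples}):
        for direction in (+1, -1):
            h = lambda x, t=threshold, d=direction: d if x >= t else -d
            error = sum(w for (x, y), w in zip(examples, weights) if h(x) != y)
            if best_error is None or error < best_error:
                best_error, best_h = error, h
    return best_error, best_h

def adaboost(examples, M):
    """Generate M hypotheses from successively reweighted training sets."""
    n = len(examples)
    weights = [1.0 / n] * n              # start with equal weight on every example
    ensemble = []
    for _ in range(M):
        error, h = learn_stump(examples, weights)
        error = max(error, 1e-10)        # guard against a perfect weak hypothesis
        z = 0.5 * math.log((1 - error) / error)   # hypothesis weight
        # Raise the weights of misclassified examples, lower the rest, renormalize.
        weights = [w * math.exp(-z * y * h(x)) for (x, y), w in zip(examples, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((z, h))
    # Final hypothesis: weighted-majority vote of the M hypotheses.
    return lambda x: 1 if sum(z * h(x) for z, h in ensemble) >= 0 else -1

# Labels in {-1, +1}; no single stump gets this pattern right, but the ensemble does.
data = [(1, +1), (2, +1), (3, -1), (4, -1), (5, +1), (6, +1)]
H = adaboost(data, M=5)
print(sum(H(x) == y for x, y in data), "of", len(data), "classified correctly")
```

Note how the ensemble fits a non-monotone pattern that no individual threshold stump can represent, which is the "enlarged hypothesis space" idea from the earlier slide.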

Ensemble learning - V How the boosting algorithm works. (See Figure 18.9)