Categorical data

Decision Tree Classification

Which feature to split on? Try to classify as many examples as possible with each split. (The figure shows an example of a good split.)

Which feature to split on? This is a bad split: no classifications are obtained.

Improving a good split

Decision Tree Algorithm Framework
- If you have positive and negative examples, use a splitting criterion to decide on the best attribute to split.
- Each child is a new decision tree: call the algorithm again with the parent feature removed.
- If all data points in a child node are the same class, classify the node as that class.
- If no attributes are left, classify by majority rule.
- If no data points are left, no such example was seen: classify as the majority class of the entire dataset.
A code sketch of this framework appears below.
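A minimal sketch of the framework in Python, assuming examples are dicts with a "label" key and `domains` maps each attribute to its possible values; the `best_attribute` splitting criterion is passed in as a function (its entropy-based version is discussed in the slides below). All names are illustrative, not the lecture's own code.

```python
from collections import Counter

def majority(examples):
    """Most common class label among the given examples."""
    return Counter(ex["label"] for ex in examples).most_common(1)[0][0]

def build_tree(examples, attributes, domains, default, best_attribute):
    """best_attribute(examples, attributes) is the splitting criterion."""
    if not examples:                   # no data points left: no such example seen,
        return default                 # so fall back on the whole-dataset majority
    labels = {ex["label"] for ex in examples}
    if len(labels) == 1:               # all points in the node are the same class
        return labels.pop()
    if not attributes:                 # no attributes left: majority rule
        return majority(examples)
    attr = best_attribute(examples, attributes)
    rest = [a for a in attributes if a != attr]    # parent feature removed
    return {attr: {v: build_tree([ex for ex in examples if ex[attr] == v],
                                 rest, domains, default, best_attribute)
                   for v in domains[attr]}}
```

The tree comes back as nested dicts: an internal node maps an attribute to one subtree per value, and a leaf is just a class label.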

Splitting Criterion: the ID3 algorithm. Some information theory (worked on the blackboard).

Issues with training and test sets. Do you know the correct classification for the test set? If you do, why not include it in the training set to get a better classifier? If you don't, how can you measure the performance of your classifier?

Cross Validation
- Tenfold cross-validation: ten iterations; pull a different tenth of the dataset out each time to act as the test set, train on the remaining data, and measure performance on the held-out tenth.
- Leave-one-out cross-validation: similar, but leave only one point out each time, then count correct vs. incorrect predictions.
Both procedures are sketched below.
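A minimal sketch of both procedures using scikit-learn in Python (an assumption: the course demos use Weka; this just illustrates the same idea, with the iris data standing in for any dataset).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Tenfold: ten train/test iterations, each holding out a different tenth.
tenfold = cross_val_score(tree, X, y,
                          cv=KFold(n_splits=10, shuffle=True, random_state=0))
print("10-fold mean accuracy: %.3f" % tenfold.mean())

# Leave-one-out: one held-out point per iteration, so the mean score is
# exactly the fraction of points classified correctly.
loo = cross_val_score(tree, X, y, cv=LeaveOneOut())
print("leave-one-out accuracy: %.3f" % loo.mean())
```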

Noise and Overfitting
- Can we always obtain a decision tree that is consistent with the data?
- Do we always want a decision tree that is consistent with the data?
- Example: predict which Carleton students become CEOs. Features: state/country of origin, GPA letter, major, age, high school GPA, junior high GPA, ...
- What happens with only a few features? What happens with many features?

Overfitting
- Fitting a classifier "too closely" to the data: finding patterns that aren't really there.
- Prevented in decision trees by pruning: when building trees, stop recursing on irrelevant attributes.
- Do statistical tests at each node to determine whether the split should continue.
A pruning sketch follows.
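A hedged sketch of pruning in Python with scikit-learn. Note this uses cost-complexity pruning via the `ccp_alpha` parameter, which is what that library provides; the per-node statistical test described above is analogous in spirit but not this exact mechanism, and the alpha value here is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree grows until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A pruned tree trades training fit for simplicity (illustrative alpha).
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_train, y_train)

print("unpruned: %d leaves, test accuracy %.3f"
      % (full.get_n_leaves(), full.score(X_test, y_test)))
print("pruned:   %d leaves, test accuracy %.3f"
      % (pruned.get_n_leaves(), pruned.score(X_test, y_test)))
```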

Examples of decision trees using Weka

Preventing overfitting by cross validation. Another technique to prevent overfitting (is this valid?): keep recursing on the decision tree as long as you continue to get improved accuracy on the test set. A sketch of the idea follows.
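A sketch of that idea in Python: grow deeper trees only while held-out accuracy improves. As the slide's question hints, tuning against the test set itself leaks information, so the held-out portion here is better thought of as a validation set; tree depth stands in for "how far the recursion continues".

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_acc, best_depth = 0.0, 0
for depth in range(1, 11):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    acc = clf.fit(X_train, y_train).score(X_val, y_val)
    if acc <= best_acc:
        break                    # deeper recursion no longer helps: stop
    best_acc, best_depth = acc, depth

print("stop at depth %d (validation accuracy %.3f)" % (best_depth, best_acc))
```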

Review: how to decide which attribute to split on
- The dataset has two classes, P and N.
- Relationship between information and randomness: the more random a dataset is (its points mixed between P and N), the more information is provided by the message "your point is in class P (or N)"; the less random the dataset, the less information that message provides.
- Information of a message m: I(m) = -log2 P(m) bits.
- Randomness of a dataset with p points in P and n points in N: H(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n)).

How much randomness in split?

Which split is better? Patrons split: randomness ≈ 0.459. Type split: randomness = 1. Patrons has less randomness, so it is the better split. Randomness is often referred to as entropy (by analogy with thermodynamics). The sketch below reproduces both numbers.
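A short Python sketch of the computation, using the entropy formula from the review slide. The per-value positive/negative counts are assumed to be those of the standard 12-example restaurant dataset (Russell and Norvig) that the Patrons/Type example comes from.

```python
import math

def entropy(p, n):
    """Randomness of a dataset with p positive and n negative points, in bits."""
    total = p + n
    return -sum(c / total * math.log2(c / total) for c in (p, n) if c)

def remainder(children, total):
    """Expected randomness after a split: weighted sum of child entropies."""
    return sum((p + n) / total * entropy(p, n) for p, n in children)

# Patrons split (assumed counts): None -> 0+/2-, Some -> 4+/0-, Full -> 2+/4-
print("Patrons: %.3f" % remainder([(0, 2), (4, 0), (2, 4)], 12))          # ~0.459
# Type split: every value has an even class mix, so nothing is gained.
print("Type:    %.3f" % remainder([(1, 1), (1, 1), (2, 2), (2, 2)], 12))  # 1.0
```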

Learning Logical Descriptions: hypothesis (formula shown on the slide).

Learning Logical Descriptions
- The goal is to learn a logical hypothesis consistent with the data.
- Example: a hypothesis consistent with X1 (formula on the slide). Is it consistent with X2?
- X2 is a false negative for the hypothesis if the hypothesis says negative, but it should be positive.
- X2 is a false positive for the hypothesis if the hypothesis says positive, but it should be negative.

Current-best-hypothesis search
- Start with an initial hypothesis and adjust it as you see new examples.
- Example: based on X1, arbitrarily start with hypothesis H1 (formula on the slide).
- X2 should be -, but H1 says +. H1 is not restrictive enough: specialize it, giving H2.
- X3 should be +, but H2 says -. H2 is too restrictive: generalize it, giving H3.

Current-best-hypothesis search (continued)
- X4 should be +, but H3 says -. Must generalize again.
- What if you end up with an inconsistent hypothesis that you cannot modify to make work? Back up the search and try a different route (search tree on the blackboard).
A sketch of the whole procedure follows.
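One simple instantiation of current-best-hypothesis search in Python, assuming examples are dicts of categorical attributes with a boolean label and hypotheses are conjunctions of (attribute, value) equality tests. The blackboard version (which can also introduce disjuncts when generalizing) is richer; everything here is illustrative.

```python
def predict(h, x):
    """A hypothesis is a conjunction of (attribute, value) equality tests."""
    return all(x[a] == v for a, v in h)

def consistent(h, examples):
    return all(predict(h, x) == label for x, label in examples)

def generalize(h, x):
    """False negative (h says -, should be +): drop the tests x violates."""
    return frozenset((a, v) for a, v in h if x[a] == v)

def specializations(h, x, positive):
    """False positive: add one test a stored positive satisfies but x fails."""
    for a, v in positive.items():
        if x.get(a) != v and (a, v) not in h:
            yield h | {(a, v)}

def current_best(stream):
    seen, h = [], frozenset()        # start with the most general hypothesis
    for x, label in stream:
        seen.append((x, label))
        if predict(h, x) == label:
            continue                 # hypothesis already agrees; no change
        if label:                    # false negative: generalize
            h = generalize(h, x)
        else:                        # false positive: try each specialization
            candidates = [c for p, l in seen if l
                            for c in specializations(h, x, p)]
            h = next((c for c in candidates if consistent(c, seen)), h)
        if not consistent(h, seen):
            # the slide's failure case: back up and try a different route
            raise ValueError("no consistent modification; backtracking needed")
    return h

# Toy run: the learner ends up requiring patrons == "some".
print(current_best([({"patrons": "some"}, True), ({"patrons": "none"}, False)]))
```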

Neural Networks. Moving on to Chapter 19: neural networks.