Decision Trees with Numeric Tests

Industrial-strength algorithms
For an algorithm to be useful in a wide range of real-world applications it must:
- permit numeric attributes
- allow missing values
- be robust in the presence of noise
Basic schemes need to be extended to fulfill these requirements.

C4.5: History
Predecessors: ID3, CHAID
C4.5 innovations (Quinlan):
- permits numeric attributes
- deals sensibly with missing values
- pruning to deal with noisy data
C4.5 is one of the best-known and most widely used learning algorithms.
Last research version: C4.8, implemented in Weka as J4.8 (Java).
Commercial successor: C5.0 (available from RuleQuest).

Numeric attributes
Standard method: binary splits, e.g. temperature < 45.
Unlike a nominal attribute, a numeric attribute has many possible split points.
The solution is a straightforward extension (see the sketch below):
- evaluate the info gain (or another measure) for every possible split point of the attribute
- choose the "best" split point
- the info gain of the best split point is the info gain of the attribute
This is computationally more demanding than handling nominal attributes.
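A minimal sketch of this procedure in Python (not the actual C4.5/J4.8 code; the function and variable names are made up for illustration, and candidate points are simply the midpoints between adjacent distinct values):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def best_numeric_split(values, labels):
    """Return (best split point, its info gain) for one numeric attribute.

    values: numeric attribute values; labels: the corresponding class labels.
    Candidate split points are placed halfway between adjacent sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([c for _, c in pairs])          # entropy before splitting
    best_gain, best_point = -1.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                               # identical values: no split point in between
        point = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for v, c in pairs if v < point]
        right = [c for v, c in pairs if v >= point]
        info = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - info
        if gain > best_gain:
            best_gain, best_point = gain, point
    return best_point, best_gain
```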

Example
Split on the temperature attribute:

value:  64  65  68  69  70  71  72  72  75  75  80  81  83  85
class:  Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

E.g. split at 71.5:
- temperature < 71.5: yes/4, no/2
- temperature >= 71.5: yes/5, no/3
Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
Place split points halfway between adjacent values.
All split points can be evaluated in one pass over the sorted values!
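The 0.939 figure can be checked by expanding the weighted entropy with the standard entropy formula (this expansion is not on the slide):

\[
\begin{aligned}
\text{info}([4,2]) &= -\frac{4}{6}\log_2\frac{4}{6} - \frac{2}{6}\log_2\frac{2}{6} \approx 0.918\ \text{bits}\\
\text{info}([5,3]) &= -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} \approx 0.954\ \text{bits}\\
\text{info}([4,2],[5,3]) &= \frac{6}{14}\cdot 0.918 + \frac{8}{14}\cdot 0.954 \approx 0.939\ \text{bits}
\end{aligned}
\]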

Example
Split on the temperature attribute: [the original slide shows a chart of the info gain at every candidate split point over the sorted temperature values and their classes]
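The numbers behind such a chart can be reproduced with a short loop, a sketch under the same assumptions as above (the entropy helper is repeated so the snippet runs on its own):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

temps   = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
classes = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes",
           "yes", "yes", "no", "yes", "yes", "no"]

pairs = sorted(zip(temps, classes))
base = entropy(classes)                             # entropy before splitting (9 yes, 5 no)
for i in range(1, len(pairs)):
    if pairs[i - 1][0] == pairs[i][0]:
        continue                                    # identical adjacent values: no split point
    point = (pairs[i - 1][0] + pairs[i][0]) / 2
    left  = [c for v, c in pairs if v < point]
    right = [c for v, c in pairs if v >= point]
    info = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
    print(f"split at {point}: gain = {base - info:.3f} bits")
```

For the split at 71.5 this prints the 0.939 bits of remaining information from the previous slide as a gain of roughly 0.001 bits over the unsplit entropy of about 0.940 bits.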

Speeding up
Entropy only needs to be evaluated between points of different classes (Fayyad & Irani, 1992):

value:  64  65  68  69  70  71  72  72  75  75  80  81  83  85
class:  Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

The potential optimal breakpoints are the boundaries where the class changes; breakpoints between values of the same class cannot be optimal (see the sketch below).
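A small sketch of this optimization (the helper name is hypothetical; C4.5's own handling of tied values and minimum leaf sizes is more involved):

```python
from itertools import groupby

def boundary_split_points(values, labels):
    """Candidate split points for one numeric attribute: midpoints between
    adjacent distinct values. A midpoint is skipped when the two neighbouring
    value groups contain only the same single class, because such a point
    cannot be an optimal breakpoint (Fayyad & Irani, 1992)."""
    pairs = sorted(zip(values, labels))
    # the set of classes seen at each distinct value, in sorted value order
    groups = [(v, {c for _, c in grp}) for v, grp in groupby(pairs, key=lambda p: p[0])]
    points = []
    for (v1, cs1), (v2, cs2) in zip(groups, groups[1:]):
        if cs1 == cs2 and len(cs1) == 1:
            continue  # same pure class on both sides: not a potential breakpoint
        points.append((v1 + v2) / 2)
    return points
```

On the temperature column above this keeps 8 of the 11 distinct midpoints as candidates.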

Missing as a separate value
A missing value is denoted by "?" in C4.5.
Simple idea: treat "missing" as a separate attribute value.
Q: When is this not appropriate?
A: When values are missing for different reasons.
Example 1: a gene expression measurement could be missing precisely because the expression is very high or very low.
Example 2: the field IsPregnant=missing should be treated differently for a male patient (effectively "no") than for a female patient of age 25 (genuinely unknown).

Missing values - advanced
Split an instance with a missing value into pieces:
- a piece going down a branch receives a weight proportional to the popularity of the branch
- the weights sum to 1
Info gain works with fractional instances: use sums of weights instead of counts.
During classification, split the test instance into pieces in the same way and merge the resulting probability distributions using the weights (see the sketch below).
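A minimal sketch of the classification side, assuming a simple hypothetical node representation (these are not C4.5's actual data structures, and missing values are encoded here as None rather than "?"):

```python
from collections import Counter

class Node:
    """Internal node: tests one attribute; leaf: stores a class distribution."""
    def __init__(self, attribute=None, children=None, class_counts=None,
                 branch_weights=None):
        self.attribute = attribute                  # attribute tested here (None for a leaf)
        self.children = children or {}              # attribute value -> child Node
        self.class_counts = class_counts            # Counter of classes (leaves only)
        self.branch_weights = branch_weights or {}  # value -> fraction of training instances

def class_distribution(node, instance, weight=1.0):
    """Return {class: weight} for one (possibly incomplete) instance."""
    if node.attribute is None:                      # leaf: return its distribution, scaled
        total = sum(node.class_counts.values())
        return {c: weight * n / total for c, n in node.class_counts.items()}
    value = instance.get(node.attribute)
    dist = Counter()
    if value is None:                               # missing: send weighted pieces down every branch
        for v, child in node.children.items():
            dist.update(class_distribution(child, instance,
                                           weight * node.branch_weights[v]))
    else:                                           # known value: follow the matching branch
        dist.update(class_distribution(node.children[value], instance, weight))
    return dist
```

The predicted class is then the one with the largest merged weight, e.g. max(dist, key=dist.get).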

Application: Computer Vision 1

Application: Computer Vision 2
Feature extraction:
- color (RGB, hue, saturation)
- edge, orientation
- texture
- XY coordinates
- 3D information

Application: Computer Vision 3 how grey? below horizon?

Application: Computer Vision 4 prediction

Application: Computer Vision 4 inverse perspective

Application: Computer Vision 5 inverse perspective path planning

Quiz 1
Q: If an attribute A has high info gain, does it always appear in a decision tree?
A: No. If A is highly correlated with another attribute B and infoGain(B) > infoGain(A), then B will be chosen for the split, and further splitting on A may not be useful.
[The original slide illustrates this with a plot of + and - examples over attributes A and B.]

Quiz 2
Q: Can an attribute appear more than once in a decision tree?
A: Yes. If a test is not at the root of the tree, it can appear in different branches.
Q: And on a single path in the tree (from root to leaf)?
A: A numeric attribute can appear more than once on a path, but only with different numeric conditions (e.g. a coarse threshold near the root and a finer one further down). Repeating the same nominal test on a path is never useful.
[The original slide shows a small figure with + and - examples along attribute A.]

Quiz 3
Q: If an attribute A has infoGain(A) = 0 on the full training set, can it ever appear in a decision tree?
A: Yes. All attributes may have zero info gain at the root, yet an attribute's info gain often changes after splitting on another attribute. The classic example is the XOR problem on two attributes A and B: neither attribute helps on its own, but once the data is split on B, A separates the classes perfectly (see the sketch below).
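A tiny numeric check of the XOR case (the toy data and helper names are invented for illustration, not taken from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(instances, attribute):
    """Info gain of splitting `instances` (list of (features, class)) on `attribute`."""
    labels = [c for _, c in instances]
    split = {}
    for x, c in instances:
        split.setdefault(x[attribute], []).append(c)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# XOR toy data: the class is "+" exactly when A and B differ.
data = [({"A": 0, "B": 0}, "-"), ({"A": 0, "B": 1}, "+"),
        ({"A": 1, "B": 0}, "+"), ({"A": 1, "B": 1}, "-")]

print(info_gain(data, "A"))                        # 0.0 bits: useless on its own
print(info_gain(data, "B"))                        # 0.0 bits: useless on its own
after_split = [(x, c) for x, c in data if x["B"] == 0]
print(info_gain(after_split, "A"))                 # 1.0 bit: fully informative after splitting on B
```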