Decision Trees Chapter 18 From Data to Knowledge.


Decision Trees Chapter 18 From Data to Knowledge

Concerns
Representational bias:
–Hyperrectangles – does this match the domain?
Generalization accuracy:
–Is the learned concept correct?
Comprehensibility:
–e.g. medical diagnosis
Efficiency of learning
Efficiency of the learned procedure

Simple Example: Weather Data
Four features: outlook, windy (nominal); temperature, humidity (numeric). The class attribute is play (yes/no).
A decision tree learned from this data (J48-style output):
outlook = sunny
|  humidity <= 75: yes (2.0)
|  humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|  windy = TRUE: no (2.0)
|  windy = FALSE: yes (3.0)
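The learned tree can be read directly as nested tests. A minimal Python sketch (the classify function, attribute names, and example instance are illustrative, not from the slides):

def classify(instance):
    """Apply the learned weather tree to one instance (a dict of attribute values)."""
    if instance["outlook"] == "sunny":
        return "yes" if instance["humidity"] <= 75 else "no"
    elif instance["outlook"] == "overcast":
        return "yes"
    else:  # rainy
        return "no" if instance["windy"] else "yes"

print(classify({"outlook": "sunny", "humidity": 70, "windy": False}))  # yes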

Dumb DT Algorithm
Build tree (discrete features only):
If all examples below the node are homogeneous (one class), stop.
Else pick a feature at random, create a node for that feature, and form a subtree for each value of the feature. Recurse on each subtree.
Will this work?
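A minimal Python sketch of this "dumb" builder, assuming discrete features and a dataset given as a list of (feature-dict, class-label) pairs; all names here are illustrative:

import random

def build_dumb_tree(examples, features):
    """examples: list of (dict_of_feature_values, class_label) pairs."""
    labels = {label for _, label in examples}
    if len(labels) == 1:                       # homogeneous: make a leaf
        return labels.pop()
    if not features:                           # no features left: majority-class leaf
        return max(labels, key=lambda c: sum(1 for _, y in examples if y == c))
    f = random.choice(features)                # pick a feature at random
    remaining = [g for g in features if g != f]
    tree = {"feature": f, "children": {}}
    for value in {x[f] for x, _ in examples}:  # one subtree per observed value
        subset = [(x, y) for x, y in examples if x[f] == value]
        tree["children"][value] = build_dumb_tree(subset, remaining)
    return tree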

Properties of the Dumb Algorithm
Complexity:
–Checking homogeneity costs O(DataSize)
–Splitting costs O(DataSize)
–Multiplied by the number of nodes in the tree, this gives a bound on the total work
Accuracy on the training set:
–Perfect
Accuracy on the test set:
–Not great; almost random

Many DT Models
Random selection worked (it built a tree), but it picked just one of very many possible trees:
–With N binary features, roughly N * 2*(N-1) * 2*(N-2) * ... = O(2^N * N!) trees. Ugh!
Which trees are best? Occam's razor: small ones (testable?)
Exhaustive search is impossible, so maybe heuristic search. But what heuristic?
Goal: replace random selection with heuristic selection.

Heuristic DT Algorithm: Entropy
For a set S with mixed classes c1, c2, ..., ck:
Entropy(S) = - sum_i pi * lg(pi), where pi is the probability of class ci.
To score a split, sum the weighted entropies of the subtrees, where each weight is the proportion of examples falling in that subtree.
This defines a quality measure on features.
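A minimal sketch of the entropy computation in Python (function and variable names are illustrative):

from math import log2
from collections import Counter

def entropy(labels):
    """Entropy(S) = - sum_i p_i * lg(p_i) over the class labels in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["+", "+", "-", "-"]))  # 1.0 bit: two classes, equally present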

Heuristic Score of a Feature
Say a split on feature f yields (4+, 4-) and (1+, 3-):
quality of f = 8/12 * E({4+,4-}) + 4/12 * E({1+,3-})
             = 8/12 * 1 + 4/12 * (-1/4*lg(1/4) - 3/4*lg(3/4))
Do this for every feature (lower weighted entropy is better).
J48 is roughly the dumb algorithm + the entropy heuristic.
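A quick numerical check of this worked example (a self-contained Python sketch; the printed value is approximate):

from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

quality_f = (8 / 12) * entropy(4 * "+" + 4 * "-") + (4 / 12) * entropy("+" + 3 * "-")
print(round(quality_f, 3))  # 0.937 bits; lower weighted entropy = better split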

Shannon Entropy
Entropy is the only function that:
–Is 0 when only one class is present
–Is k when there are 2^k classes, equally present
–Is "additive", i.e. E(X,Y) = E(X) + E(Y) if X and Y are independent
Entropy is sometimes called uncertainty and sometimes information. The uncertainty is defined on a random variable whose "draws" are from the set of classes.
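A one-line check of the second property (illustrative Python):

from math import log2

n_classes = 2 ** 3  # 2^k equally present classes, here k = 3
print(-sum((1 / n_classes) * log2(1 / n_classes) for _ in range(n_classes)))  # 3.0 = k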

Majority Function
Suppose there are 2n boolean features and the class is "n or more of the features are on".
How big is the tree? At least C(2n, n) leaves.
Prototype function: "at least k of n are true" is a common medical concept.
Concepts that are prototypical do not match the representational bias of DTs.
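To see how fast that lower bound on the number of leaves grows, a quick illustration (the values of n are hypothetical):

from math import comb

for n in (5, 10, 20):
    print(n, comb(2 * n, n))  # 252, 184756, 137846528820 leaves at minimum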

DTs with Real-Valued Attributes
Idea: convert to the already-solved (discrete) problem.
For each real-valued attribute f with sorted values v1, v2, ..., vn, add binary features:
f1: f < (v1+v2)/2
f2: f < (v2+v3)/2
etc.
Other approaches are possible, e.g. fi: f < vj for any vj, so no sorting is needed.
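A minimal sketch of this conversion, assuming the attribute's values arrive as a list of numbers (names and data are illustrative):

def midpoint_thresholds(values):
    """Candidate thresholds: midpoints between consecutive sorted values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

humidity = [65, 70, 75, 80, 90]
thresholds = midpoint_thresholds(humidity)
print(thresholds)  # [67.5, 72.5, 77.5, 85.0]
# each threshold t defines a binary feature "humidity < t"
binary_features = [[v < t for t in thresholds] for v in humidity]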

DTs -> Rules (PART)
For each leaf, we make a rule by collecting the tests along the path to that leaf.
Number of rules = number of leaves.
Simplification: test each condition in a rule and see if dropping it harms accuracy.
Can we go from rules back to DTs?
–Not easily. Hint: a set of rules gives no root to start from.
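A minimal sketch of extracting one rule per leaf from a tree stored as nested dicts (the same hypothetical structure as the builder sketched earlier; the example tree is illustrative):

def extract_rules(tree, conditions=()):
    """Yield (conditions, class_label) pairs, one per leaf."""
    if not isinstance(tree, dict):  # leaf: emit the tests collected on the path
        yield list(conditions), tree
        return
    for value, subtree in tree["children"].items():
        yield from extract_rules(subtree, conditions + ((tree["feature"], value),))

tree = {"feature": "outlook",
        "children": {"overcast": "yes",
                     "sunny": {"feature": "windy",
                               "children": {True: "no", False: "yes"}}}}
for conds, label in extract_rules(tree):
    print(" AND ".join(f"{f}={v}" for f, v in conds) or "TRUE", "->", label)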

Summary
Comprehensible if the tree is not large.
Effective if a small number of features suffices (representational bias).
Handles multi-class problems naturally.
Easily generates rules (expert system)
–And measures of confidence (leaf counts)
Can be extended to regression.
Easy to implement, with low complexity.