Decision Trees
- Decision tree representation
- ID3 learning algorithm
- Entropy, information gain
- Overfitting

Introduction
Goal: categorization. Given an event, predict its category. Examples:
- Who won a given ball game?
- How should we file a given email?
- What word sense was intended for a given occurrence of a word?
An event is a list of features. Examples:
- Ball game: which players were on offense?
- Email: who sent the email?
- Disambiguation: what was the preceding word?

Introduction
Use training data to build the decision tree; use the decision tree to predict categories for new events.
[Diagram: training events and categories feed into a decision tree, which maps new events to categories.]

Decision Tree for PlayTennis
- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf node assigns a classification
Example (partial): Outlook is tested at the root (branches Sunny, Overcast, Rain); along the Sunny branch, Humidity is tested (High -> No, Normal -> Yes).

Word Sense Disambiguation
Given an occurrence of a word, decide which sense, or meaning, was intended. Example: "run"
- run1: move swiftly (I ran to the store.)
- run2: operate (I run a store.)
- run3: flow (Water runs from the spring.)
- run4: length of torn stitches (Her stockings had a run.)
- etc.

Word Sense Disambiguation
Categories: use word sense labels (run1, run2, etc.) to name the possible categories.
Features: features describe the context of the word we want to disambiguate. Possible features include:
- near(w): is the given word near an occurrence of word w?
- pos: the word's part of speech
- left(w): is the word immediately preceded by the word w?
- etc.
(A toy feature extractor is sketched below.)
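
To make the feature list concrete, here is a minimal Python sketch of such an extractor. The token representation as (word, pos) pairs, the window size for near(w), and the probe words are all illustrative assumptions, not part of the original slides.

```python
def wsd_features(tokens, i, window=3):
    """Toy feature extractor for the target word tokens[i].

    tokens: list of (word, pos) pairs -- an assumed representation.
    near(w) is modeled here as "w occurs within +/- `window` tokens".
    """
    word, pos = tokens[i]
    feats = {"pos": pos}
    # left(w): the immediately preceding word, if any
    feats["left"] = tokens[i - 1][0] if i > 0 else None
    nearby = {w for w, _ in tokens[max(0, i - window):i + window + 1]}
    for probe in ("race", "river", "stocking"):  # illustrative probe words
        feats[f"near({probe})"] = probe in nearby
    return feats

# Example: features for "run" in "I run a store"
print(wsd_features([("I", "pron"), ("run", "verb"),
                    ("a", "det"), ("store", "noun")], 1))
```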

Word Sense Disambiguation
Example decision tree (note: decision trees for WSD tend to be quite large):
[Tree: the root tests pos (noun vs. verb); deeper nodes test near(race), near(river), and near(stocking) with yes/no branches; leaves assign senses such as run1, run3, and run4.]

WSD: Sample Training Data
Each training example pairs feature values (pos, near(race), near(river), near(stockings)) with a word-sense label.
[Table of sample rows with senses run1-run4; the individual rows are not recoverable from the transcript.]

Decision Tree for Conjunction: Outlook=Sunny ∧ Wind=Weak
Outlook
  Sunny -> Wind
    Strong -> No
    Weak -> Yes
  Overcast -> No
  Rain -> No

Decision Tree for Disjunction: Outlook=Sunny ∨ Wind=Weak
Outlook
  Sunny -> Yes
  Overcast -> Wind
    Strong -> No
    Weak -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Decision Tree for XOR: Outlook=Sunny XOR Wind=Weak
Outlook
  Sunny -> Wind
    Strong -> Yes
    Weak -> No
  Overcast -> Wind
    Strong -> No
    Weak -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Decision Tree
Decision trees represent disjunctions of conjunctions:
Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes
(Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
This equivalence is verified in the sketch below.
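
The equivalence can be checked mechanically. Below is a small Python sketch (function names are ours) that writes the tree as nested conditionals and as the disjunction of conjunctions above, then asserts they agree on every possible input.

```python
import itertools

def tree_yes(outlook, humidity, wind):
    """The PlayTennis tree above as nested tests."""
    if outlook == "Sunny":
        return humidity == "Normal"
    if outlook == "Overcast":
        return True
    return wind == "Weak"  # outlook == "Rain"

def dnf_yes(outlook, humidity, wind):
    """One conjunction per path to a Yes leaf, joined by disjunction."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

# The two formulations agree on every input:
for o, h, w in itertools.product(["Sunny", "Overcast", "Rain"],
                                 ["High", "Normal"], ["Weak", "Strong"]):
    assert tree_yes(o, h, w) == dnf_yes(o, h, w)
```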

When to Consider Decision Trees
- Instances describable by attribute-value pairs
- Target function is discrete valued
- Disjunctive hypothesis may be required
- Possibly noisy training data
- Missing attribute values
Examples: medical diagnosis, credit risk analysis, object classification for a robot manipulator (Tan 1993)

Top-Down Induction of Decision Trees (ID3)
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant
4. Sort the training examples to the leaf nodes according to the attribute value of their branch
5. If all training examples are perfectly classified (same value of the target attribute), stop; otherwise iterate over the new leaf nodes
A runnable sketch of this recursion follows.
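
As a rough illustration, here is a minimal recursive Python sketch of this loop. The representation of examples as dicts mapping attribute names to values is our assumption, and `best_attribute` is passed in as a parameter (e.g. the attribute with the highest information gain, defined on the following slides).

```python
from collections import Counter

def most_common(examples, target):
    """Majority class label among the examples."""
    return Counter(ex[target] for ex in examples).most_common(1)[0][0]

def id3(examples, target, attributes, best_attribute):
    """Sketch of ID3; `best_attribute(examples, target, attributes)`
    picks the split attribute, e.g. by information gain."""
    labels = {ex[target] for ex in examples}
    if len(labels) == 1:              # all examples agree: make a leaf
        return labels.pop()
    if not attributes:                # no attributes left: majority leaf
        return most_common(examples, target)
    a = best_attribute(examples, target, attributes)
    tree = {a: {}}
    for v in {ex[a] for ex in examples}:   # one branch per observed value
        subset = [ex for ex in examples if ex[a] == v]
        tree[a][v] = id3(subset, target,
                         [b for b in attributes if b != a], best_attribute)
    return tree
```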

Which Attribute Is Best?
A1=? splits [29+,35-] into [21+,5-] (True) and [8+,30-] (False).
A2=? splits [29+,35-] into [18+,33-] (True) and [11+,2-] (False).

Entropy
Entropy(S) = - p+ log2 p+ - p- log2 p-
- S is a sample of training examples
- p+ is the proportion of positive examples in S
- p- is the proportion of negative examples in S
- Entropy measures the impurity of S

Entropy
Entropy(S) = the expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code).
Why? Information theory: an optimal-length code assigns -log2 p bits to a message having probability p. So the expected number of bits to encode the class (+ or -) of a random member of S is:
  p+ (-log2 p+) + p- (-log2 p-) = - p+ log2 p+ - p- log2 p-
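
A direct Python translation of this definition (the helper is ours, using the usual convention 0 * log2(0) = 0):

```python
import math

def entropy(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                    # convention: 0 * log2(0) = 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(29, 35), 2))     # 0.99, the [29+,35-] sample used below
```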

Information Gain (with S = E, the full example set)
Gain(S, A): the expected reduction in entropy due to sorting S on attribute A:
  Gain(S, A) = Entropy(S) - Σ over v in Values(A) of (|S_v| / |S|) Entropy(S_v)
Entropy([29+,35-]) = -29/64 log2 29/64 - 35/64 log2 35/64 = 0.99
In the running example, A1=? splits [29+,35-] into [21+,5-] (True) and [8+,30-] (False), while A2=? splits it into [18+,33-] (True) and [11+,2-] (False).

Information Gain
For A1:
  Entropy([21+,5-]) = 0.71, Entropy([8+,30-]) = 0.74
  Gain(S, A1) = Entropy(S) - (26/64) Entropy([21+,5-]) - (38/64) Entropy([8+,30-]) = 0.27
For A2:
  Entropy([18+,33-]) = 0.94, Entropy([11+,2-]) = 0.62
  Gain(S, A2) = Entropy(S) - (51/64) Entropy([18+,33-]) - (13/64) Entropy([11+,2-]) = 0.12
A1 yields the larger gain, so it is the better split.
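
These numbers can be reproduced with a small helper (ours) built on the `entropy` function sketched above:

```python
def gain(parent, splits):
    """Information gain: parent entropy minus the weighted entropy of the
    subsets; `parent` and each split are (pos, neg) count pairs."""
    total = sum(parent)
    g = entropy(*parent)
    for pos, neg in splits:
        g -= (pos + neg) / total * entropy(pos, neg)
    return g

print(round(gain((29, 35), [(21, 5), (8, 30)]), 2))   # A1: 0.27
print(round(gain((29, 35), [(18, 33), (11, 2)]), 2))  # A2: 0.12
```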

Training Examples

Day  Outlook   Temp.  Humidity  Wind    PlayTennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No

Selecting the Next Attribute
Humidity splits S = [9+,5-] into High [3+,4-] (E=0.985) and Normal [6+,1-] (E=0.592):
  Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
Wind splits S into Weak [6+,2-] (E=0.811) and Strong [3+,3-] (E=1.0):
  Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
Humidity provides greater information gain than Wind with respect to the target classification.

Selecting the Next Attribute
Outlook splits S into Sunny [2+,3-] (E=0.971), Overcast [4+,0-] (E=0.0), and Rain [3+,2-] (E=0.971):
  Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247

Selecting the Next Attribute
The information gain values for the four attributes are:
  Gain(S, Outlook) = 0.247
  Gain(S, Humidity) = 0.151
  Gain(S, Wind) = 0.048
  Gain(S, Temperature) = 0.029
where S denotes the collection of training examples. Outlook has the highest gain, so it becomes the root attribute; the sketch below reproduces these numbers.
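
As a check, the following sketch recomputes these gains from the training table above, reusing the `entropy` and `gain` helpers sketched earlier (the tuple encoding of the data is ours):

```python
from collections import Counter

# The 14 PlayTennis examples from the Training Examples table,
# as (Outlook, Temp., Humidity, Wind, PlayTennis) tuples.
DATA = [
    ("Sunny","Hot","High","Weak","No"),      ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),  ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),   ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),  ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),("Rain","Mild","High","Strong","No"),
]

def counts(rows):
    """(pos, neg) counts of the PlayTennis label."""
    c = Counter(row[-1] for row in rows)
    return c["Yes"], c["No"]

def gain_on(rows, col):
    """Information gain of splitting `rows` on column index `col`."""
    splits = [counts([r for r in rows if r[col] == v])
              for v in {r[col] for r in rows}]
    return gain(counts(rows), splits)

for i, name in enumerate(["Outlook", "Temp.", "Humidity", "Wind"]):
    print(name, round(gain_on(DATA, i), 3))
# Outlook 0.247, Temp. 0.029, Humidity 0.152, Wind 0.048
# (the slide's 0.151 for Humidity comes from rounding intermediate entropies)
```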

ID3 Algorithm
Splitting S = [D1,...,D14] ([9+,5-]) on Outlook gives:
  Sunny: S_sunny = [D1,D2,D8,D9,D11] ([2+,3-]) -> ?
  Overcast: [D3,D7,D12,D13] ([4+,0-]) -> Yes
  Rain: [D4,D5,D6,D10,D14] ([3+,2-]) -> ?
Which attribute should be tested at the Sunny branch?
  Gain(S_sunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
  Gain(S_sunny, Temp.) = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
  Gain(S_sunny, Wind) = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019
Humidity has the highest gain, so it is tested below the Sunny branch.

ID3 Algorithm
The final tree:
Outlook
  Sunny -> Humidity
    High -> No    [D1,D2,D8]
    Normal -> Yes [D9,D11]
  Overcast -> Yes [D3,D7,D12,D13]
  Rain -> Wind
    Strong -> No  [D6,D14]
    Weak -> Yes   [D4,D5,D10]
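
The learned tree can be written down directly as nested dicts and used to classify new days; this closing sketch (the representation is ours) mirrors the tree above:

```python
TREE = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(tree, example):
    """Walk nested {attribute: {value: subtree-or-label}} dicts to a label."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))         # the attribute tested at this node
        tree = tree[attribute][example[attribute]]
    return tree

print(classify(TREE, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
print(classify(TREE, {"Outlook": "Rain", "Wind": "Strong"}))       # No
```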