COMP61011 : Machine Learning Decision Trees

Recap: Decision Stumps. (Figure: training points spread along a single continuous feature axis, roughly 0 to 60.) Q. Where is a good threshold?

From Decision Stumps, to Decision Trees
- New type of non-linear model
- Copes naturally with continuous and categorical data
- Fast to both train and test (highly parallelizable)
- Generates a set of interpretable rules

Recap: Decision Stumps. (Figure: a stump thresholding the values 10.5, 14.1, 17.0, 21.0, 23.2, 27.1, 30.1, 42.0, 47.0, 57.3, 59.9 into a "yes" branch and a "no" branch.) The stump "splits" the dataset. Here we have 4 classification errors.

A modified stump. (Figure: the same data split at a different threshold, so the points fall into the "yes" and "no" branches differently.) Here we have 3 classification errors.
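
A minimal sketch of how a stump's threshold can be chosen: try a threshold between every pair of adjacent values and keep the one with fewest errors. The feature values below are the ones on the slides; the 0/1 labels are made up for illustration, since the transcript does not preserve which point carried which label.

import numpy as np

# Feature values from the slide; the labels are assumed, purely for illustration.
x = np.array([10.5, 14.1, 17.0, 21.0, 23.2, 27.1, 30.1, 42.0, 47.0, 57.3, 59.9])
y = np.array([1,    1,    0,    1,    1,    0,    0,    0,    1,    0,    0])

def stump_errors(threshold, x, y):
    # Predict 1 below the threshold and 0 above it, or the reverse,
    # whichever gives fewer mistakes.
    pred = (x <= threshold).astype(int)
    return min(int((pred != y).sum()), int((1 - pred != y).sum()))

# Candidate thresholds: midpoints between adjacent sorted values.
candidates = (x[:-1] + x[1:]) / 2
best = min(candidates, key=lambda t: stump_errors(t, x, y))
print(best, stump_errors(best, x, y))   # best threshold and its error count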

Recursion… Each branch of the split is just another dataset, so build a stump on it! Repeating this on every branch is what turns a stump into a tree.

Decision Trees = nested rules:
if x > 25 then
    if x > 50 then y = 0 else y = 1; endif
else
    if x > 16 then y = 0 else y = 1; endif
endif
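
The same rule as a small Python function, a direct transcription of the pseudocode above (the test points passed in at the end are values taken from the earlier number-line slides):

def predict(x):
    # Each 'if' is an internal node of the tree; each value of y is a leaf.
    if x > 25:
        y = 0 if x > 50 else 1
    else:
        y = 0 if x > 16 else 1
    return y

print(predict(57.3), predict(30.1), predict(21.0), predict(14.1))  # 0 1 0 1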

Here’s one I made earlier…..

‘pop’ is the number of training points that arrived at that node. ‘err’ is the fraction of those examples incorrectly classified.

Increasing the maximum depth (10). Decreasing the minimum number of examples required to make a split (5).
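
For concreteness, here is how those two controls look in scikit-learn, used here purely as an illustration (the course labs may use a different toolkit); max_depth and min_samples_split correspond to the maximum depth and the minimum number of examples required to make a split:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the course dataset.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=10,         # allow a deeper, more complex tree
                              min_samples_split=5,  # only split nodes holding >= 5 examples
                              random_state=0)
tree.fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))
print("test accuracy: ", tree.score(X_test, y_test))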

Overfitting…. (Figure: testing error plotted against tree depth 1 to 9; the error reaches a minimum at some optimal depth, after which the tree starts to overfit.)
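
A sketch of how the curve on this slide could be reproduced: grow trees of increasing depth and watch the testing error fall and then rise again (scikit-learn and synthetic data are stand-ins here, as above):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for depth in range(1, 10):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
    print(f"depth {depth}: test error {1 - tree.score(X_test, y_test):.3f}")
# Past some depth the test error typically stops falling and starts to rise:
# that is the point at which the tree starts to overfit.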

A little break. Talk to your neighbours.

We’ve been assuming continuous variables!

The Tennis Problem

(Figure: a decision tree for the tennis problem. Outlook is tested at the root: the SUNNY branch tests Humidity (HIGH leads to NO, NORMAL leads to YES), the OVERCAST branch predicts YES, and the RAIN branch tests Wind (STRONG leads to NO, WEAK leads to YES).)

The Tennis Problem. Note: of the 14 examples, 9 say “YES” while 5 say “NO”.

Partitioning the data…

Thinking in Probabilities…

The “Information” in a feature. More uncertainty = less information. H(X) = 1 (the maximum for a binary variable, e.g. a 50/50 split).

The “Information” in a feature. Less uncertainty = more information. H(X) = 0.72193 (e.g. a 20/80 split).

Entropy: H(X) = - sum over x of p(x) log2 p(x), measured in bits.

Calculating Entropy
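
A worked version of the calculation; the numbers reproduce those quoted on these slides (a 50/50 split has entropy 1, a 20/80 split has entropy 0.72193, and the 9-YES / 5-NO tennis labels have entropy of about 0.940):

from math import log2

def entropy(probabilities):
    # Shannon entropy in bits: H = -sum of p * log2(p), with 0*log(0) taken as 0.
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))      # 1.0
print(entropy([0.2, 0.8]))      # 0.72193...
print(entropy([9/14, 5/14]))    # 0.940...  (9 YES vs 5 NO)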

Information Gain, also known as “Mutual Information”
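
In symbols (the standard definition, which the transcript does not preserve): the information gain of a feature X about the label Y is I(X;Y) = H(Y) - H(Y|X), the entropy of the labels before the split minus the weighted average entropy of the labels within each branch after the split.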

Why don’t we just measure the number of errors?! Errors = 0.25 … for both! Mutual information: I(X1;Y) = 0.1887, I(X2;Y) = 0.3113.
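
A sketch that reproduces those two numbers. The slide's actual contingency tables are not preserved in the transcript, so the counts below are an assumption consistent with it: 8 examples, 4 of each class, where the best split on either feature misclassifies 2 of the 8 (error 0.25), but X2 isolates a pure branch and therefore carries more information:

import numpy as np

def entropy(counts):
    # Entropy (bits) of a discrete distribution given as counts.
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(table):
    # I(X;Y) = H(Y) - H(Y|X) for a table of counts (rows: feature value, cols: class).
    table = np.asarray(table, dtype=float)
    n = table.sum()
    h_y = entropy(table.sum(axis=0))
    h_y_given_x = sum((row.sum() / n) * entropy(row) for row in table)
    return h_y - h_y_given_x

# Hypothetical counts, chosen only to match the figures quoted on the slide.
X1 = [[3, 1], [1, 3]]   # each branch gets one example wrong: error 2/8 = 0.25
X2 = [[0, 2], [4, 2]]   # one pure branch, one 4-vs-2 branch:  error 2/8 = 0.25

print(mutual_information(X1))   # ~0.1887
print(mutual_information(X2))   # ~0.3113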

We split on the feature with the maximum information gain.
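
A sketch of making that choice for the tennis problem. The transcript only tells us the class counts (9 YES, 5 NO), so the table below assumes the standard 14-example PlayTennis dataset from Mitchell (1997), which those counts match:

from collections import Counter
from math import log2

# Assumed dataset: (Outlook, Temp, Humidity, Wind, PlayTennis), Mitchell (1997).
data = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
features = ["Outlook", "Temp", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, col):
    h_before = entropy([r[-1] for r in rows])
    h_after = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        h_after += (len(subset) / len(rows)) * entropy(subset)
    return h_before - h_after

for i, name in enumerate(features):
    print(f"{name:8s} gain = {information_gain(data, i):.3f}")
# Outlook has the largest gain, so it becomes the root test of the tree.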

(Figure: the decision tree grown for the tennis problem, with tests on Outlook, Temperature, Humidity and Wind, feature values such as SUNNY, OVERCAST, RAIN, HOT, MILD, COOL, HIGH, NORMAL, WEAK and STRONG on its branches, and YES/NO leaves.)

Decision Trees, done. Now, grab some lunch. Then, get to the labs. Start to think about your projects…

Reminder: this is the halfway point! Project suggestions are in the handbook. Feedback opportunity next week. Final project deadline: 4pm Friday of week six (that’s the end of reading week – submit to SSO).

Team Projects: a deep study of any topic in ML that you want!
6-page report (maximum), written as a research paper. Minimum font size 11, margins 2cm. NO EXCEPTIONS.
Does not have to be “academic”. You could build something – but you must thoroughly analyze and evaluate it according to the same principles.
50 marks, based on your 6-page report. Grades divided equally unless a case is made.
Graded according to:
- Technical understanding/achievements
- Experimental methodologies
- Depth of critical analysis
- Writing style and exposition