Machine Learning: Decision Tree Learning

Machine Learning: Decision Tree Learning CMPT 420 / CMPG 720

Learning from Examples An agent is learning if it improves its performance on future tasks after making observations about the world. One class of learning problem: from a collection of input-output pairs, learn a function that predicts the output for new inputs. e.g., weather forecast, games

Why learning? The designer cannot anticipate all changes: a program designed to predict tomorrow’s stock market prices must learn to adapt when conditions change. Programmers sometimes have no idea how to program a solution themselves, e.g., recognizing faces.

Types of Learning Supervised learning: the agent observes example input-output pairs and learns a function that maps inputs to outputs, e.g., a spam detector. Unsupervised learning: correct answers are not given, e.g., clustering.

Supervised Learning Learning a function/rule from specific input-output pairs is also called inductive learning. Given a training set of N example pairs (x1,y1), (x2,y2), ..., (xN,yN), where each y was generated by an unknown target function y = f(x). Problem: find a hypothesis h such that h ≈ f. h generalizes well if it correctly predicts the value of y for novel examples (the test set).
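A minimal sketch of what "generalizes well" means in practice: measure how often a hypothesis agrees with the labels on examples it was not built from. The data, the link-count feature, and both hypotheses are invented for illustration and are not part of the lecture.

```python
def accuracy(h, examples):
    """Fraction of (x, y) pairs for which hypothesis h predicts y correctly."""
    correct = sum(1 for x, y in examples if h(x) == y)
    return correct / len(examples)

# Hypothetical data: x = number of links in a message, y = is it spam?
training_set = [(0, False), (1, False), (3, True), (5, True)]
test_set = [(2, False), (4, True), (7, True)]

def h_threshold(num_links):
    # A simple rule that captures the pattern in the training data.
    return num_links > 2

memorized = dict(training_set)

def h_memorize(num_links):
    # Recalls the training answers and says "not spam" for anything unseen.
    return memorized.get(num_links, False)

print(accuracy(h_threshold, training_set))  # 1.0
print(accuracy(h_threshold, test_set))      # 1.0
print(accuracy(h_memorize, training_set))   # 1.0
print(round(accuracy(h_memorize, test_set), 2))  # 0.33
```

Both hypotheses agree with every training example, but only the threshold rule also predicts the novel examples correctly.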

Supervised Learning When the output y is one of a finite set of values (e.g., sunny, cloudy, rainy), the learning problem is called classification. With only two values it is called Boolean or binary classification, e.g., spam detector, male/female face. When y is a number (e.g., tomorrow’s temperature), the problem is called regression.

Inductive learning method The points are in the (x,y) plane, where y = f(x). We approximate f with h: construct or adjust h to agree with f on the training set, e.g., by linear fitting or curve fitting. h is consistent if it agrees with f on all examples. How do we choose from among multiple consistent hypotheses?

Inductive learning method Ockham’s razor: prefer the simplest hypothesis consistent with the data (named after the 14th-century English philosopher William of Ockham).
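One way to picture Ockham’s razor in code, as a rough sketch: keep only the hypotheses that agree with every training example, then pick the least complex one. Using the polynomial degree as the complexity score is an assumption made for this example; the lecture does not define a complexity measure.

```python
def is_consistent(h, examples):
    """True if h agrees with the label of every example."""
    return all(h(x) == y for x, y in examples)

def simplest_consistent(hypotheses, examples):
    """hypotheses: list of (name, complexity, function) triples.
    Returns the consistent triple with the lowest complexity, or None."""
    consistent = [hyp for hyp in hypotheses if is_consistent(hyp[2], examples)]
    if not consistent:
        return None
    return min(consistent, key=lambda hyp: hyp[1])

# Hypothetical training set: three points that lie on y = 2x + 1.
examples = [(0, 1), (1, 3), (2, 5)]

# Complexity here is the polynomial degree (an illustrative choice).
hypotheses = [
    ("constant", 0, lambda x: 1),
    ("cubic", 3, lambda x: x**3 - 3 * x**2 + 4 * x + 1),
    ("linear", 1, lambda x: 2 * x + 1),
]

name, degree, _ = simplest_consistent(hypotheses, examples)
print(name, degree)  # linear 1
```

The cubic also passes through all three points, so it is consistent, but the linear hypothesis is preferred because it is simpler.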

Learning decision trees One of the simplest and yet most successful forms of machine learning. A decision tree represents a function that takes as input a vector of attribute values and returns a “decision” – a single output.
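To make "a function from attribute values to a decision" concrete, here is a small sketch of one possible representation. The nested-tuple encoding, the `decide` helper, and the tiny tree itself are illustrative assumptions, not the tree shown in the lecture.

```python
# A leaf is a plain string (the decision).
# An internal node is a pair: (attribute name, {attribute value: subtree}).
tree = ("Patrons", {
    "None": "No",
    "Some": "Yes",
    "Full": ("Hungry", {"Yes": "Yes", "No": "No"}),
})

def decide(tree, example):
    """Follow the branch matching the example's attribute values until a leaf."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

print(decide(tree, {"Patrons": "Some", "Hungry": "No"}))   # Yes
print(decide(tree, {"Patrons": "Full", "Hungry": "No"}))   # No
print(decide(tree, {"Patrons": "Full", "Hungry": "Yes"}))  # Yes
```

Each call tests one attribute per level, so a decision is reached after at most as many tests as the tree is deep.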

Learning decision trees Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
Alternate: is there an alternative restaurant nearby?
Bar: is there a comfortable bar area to wait in?
Fri/Sat: is today Friday or Saturday?
Hungry: are we hungry?
Patrons: number of people in the restaurant (None, Some, Full)
Price: price range ($, $$, $$$)
Raining: is it raining outside?
Reservation: have we made a reservation?
Type: kind of restaurant (French, Italian, Thai, Burger)
WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

Attribute-based representations Examples are described by attribute values. A training set of 12 examples covers situations where I will or won't wait for a table. Each example is classified as positive (T) or negative (F).

Decision tree The tree does not use the Price and Type attributes.

Goal: to find the most compact decision tree

Constructing the Decision Tree Recursion: test the most important attribute first. This divides the problem into smaller subproblems that can be solved recursively.
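The recursion can be sketched roughly as follows. This is an ID3-style illustration, not necessarily the exact algorithm used in the lecture: "most important" is taken to mean the attribute whose split leaves the lowest weighted entropy, and the six training examples are invented.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def remainder(examples, attribute):
    """Weighted entropy of the labels left after splitting on attribute."""
    groups = {}
    for features, label in examples:
        groups.setdefault(features[attribute], []).append(label)
    return sum(len(g) / len(examples) * entropy(g) for g in groups.values())

def build_tree(examples, attributes):
    """Return a leaf label, or (attribute, {value: subtree})."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:
        # Pure subset, or nothing left to test: predict the majority label.
        return Counter(labels).most_common(1)[0][0]
    # Assumption: the "most important" attribute minimizes the remainder.
    best = min(attributes, key=lambda a: remainder(examples, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {features[best] for features, _ in examples}:
        subset = [(f, l) for f, l in examples if f[best] == value]
        branches[value] = build_tree(subset, rest)  # smaller subproblem
    return (best, branches)

# Hypothetical training examples (not the lecture's 12-example set).
examples = [
    ({"Patrons": "Some", "Hungry": "Yes"}, "Yes"),
    ({"Patrons": "Some", "Hungry": "No"}, "Yes"),
    ({"Patrons": "None", "Hungry": "Yes"}, "No"),
    ({"Patrons": "None", "Hungry": "No"}, "No"),
    ({"Patrons": "Full", "Hungry": "Yes"}, "Yes"),
    ({"Patrons": "Full", "Hungry": "No"}, "No"),
]

tree = build_tree(examples, ["Hungry", "Patrons"])
print(tree[0])  # Patrons
```

On this data, splitting on Patrons settles two of the three branches immediately, so only the Full branch needs a further test on Hungry.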

Choosing a good attribute Which is a better choice?

Choosing the Best Attribute: Information theory (Shannon and Weaver, 1949) Entropy: a measure of the uncertainty of a random variable. A coin that always comes up heads: 0 bits. A flip of a fair coin (heads or tails): 1 bit. The roll of a fair four-sided die: 2 bits.

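Formula for Entropy For a random variable V with values v_k, each with probability P(v_k): $H(V) = -\sum_k P(v_k) \log_2 P(v_k)$. Suppose we have a collection of 10 examples, 5 positive, 5 negative: $H(1/2, 1/2) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit. Suppose we have a collection of 100 examples, 1 positive and 99 negative: $H(1/100, 99/100) = -0.01\log_2 0.01 - 0.99\log_2 0.99 \approx 0.08$ bits.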
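The entropy values quoted above can be checked with a few lines of code. This is a small sketch; the function name and the convention of skipping zero-probability outcomes (treating 0 log 0 as 0) are choices made here.

```python
from math import log2

def entropy(probabilities):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    # Outcomes with probability 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(-p * log2(p) for p in probabilities if p > 0)

print(entropy([1.0, 0.0]))                 # coin that always lands heads: 0.0
print(entropy([0.5, 0.5]))                 # fair coin: 1.0
print(entropy([0.25] * 4))                 # fair four-sided die: 2.0
print(round(entropy([0.01, 0.99]), 2))     # 1 positive, 99 negative: 0.08
```

The more lopsided the distribution, the closer the entropy gets to 0, which is why a nearly pure set of examples needs little further information to classify.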

Information gain Information gain (from an attribute test) = the difference between the original information requirement and the new requirement after the test. Choose the attribute with the largest information gain (IG).
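As a sketch of the computation: the gain is the entropy of the whole set minus the size-weighted entropy of the subsets produced by the test. The per-value (positive, negative) counts for Patrons and Type below are assumed from the textbook version of the 12-example restaurant data (Russell & Norvig); they do not appear in this transcript.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a set described by its class counts."""
    total = sum(counts)
    return sum(-(n / total) * log2(n / total) for n in counts if n > 0)

def information_gain(parent_counts, child_counts):
    """Entropy before the split minus the weighted entropy after it."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder

# Assumed (positive, negative) counts per attribute value.
patrons = [(0, 2), (4, 0), (2, 4)]                   # None, Some, Full
restaurant_type = [(1, 1), (1, 1), (2, 2), (2, 2)]   # French, Italian, Thai, Burger

gain_patrons = information_gain((6, 6), patrons)
gain_type = information_gain((6, 6), restaurant_type)

print(round(gain_patrons, 3))  # 0.541
print(gain_patrons > gain_type)  # True; the gain for Type is approximately 0
```

Under these counts Patrons is the better first test: two of its three branches are already pure, while every Type branch is still an even split.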

Example contd. Decision tree learned from the 12 examples:

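Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No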