Decision Tree Learning CMPT 463

Reminders
Homework 7 is due on Tuesday, May 10
Projects are due on Tuesday, May 10
o Moodle submission: readme.doc and project.zip
Final Exam Review
o Monday, May 9

Learning from Examples
An agent is learning if it improves its performance on future tasks after making observations about the world.
One class of learning problem:
o From a collection of input-output pairs, learn a function that predicts the output for new inputs.
o E.g., weather forecasting, Google image search

Why learning?
The designer cannot anticipate all changes.
o A program designed to predict tomorrow's stock market prices must learn to adapt when conditions change.
Programmers sometimes have no idea how to program a solution.
o E.g., recognizing faces

Types of Learning
Supervised learning
o The agent observes example input-output pairs and learns a function that maps inputs to outputs
o e.g., a spam detector
Unsupervised learning
o Correct answers are not given
o e.g., clustering
Reinforcement learning
o The agent learns from rewards or punishments
o e.g., a taxi agent learns from the lack of a tip

Supervised Learning
Learning a function/rule from specific input-output pairs is also called inductive learning.
Given a training set of N example pairs:
o (x1, y1), (x2, y2), ..., (xN, yN)
o where each yi was generated by an unknown target function y = f(x)
Problem: find a hypothesis h such that h ≈ f.
h generalizes well if it correctly predicts the value of y for novel examples (the test set).

Supervised Learning
When the output y is one of a finite set of values (e.g., sunny, cloudy, rainy), the learning problem is called classification.
o Boolean or binary classification if there are only two values
o e.g., spam detection, male/female face classification
When y is a number (e.g., tomorrow's temperature), the problem is called regression.

Inductive learning method
The example points lie in the (x,y) plane, where y = f(x).
We approximate f with h, selected from a hypothesis space H.
Construct/adjust h to agree with f on the training set.

Inductive learning method
Construct/adjust h to agree with f on the training set.
E.g., linear fitting:

Inductive learning method
Construct/adjust h to agree with f on the training set.
E.g., curve fitting:

Inductive learning method
Construct/adjust h to agree with f on the training set.
(h is consistent if it agrees with f on all examples.)
E.g., curve fitting:

Inductive learning method
Construct/adjust h to agree with f on the training set.
(h is consistent if it agrees with f on all examples.)
E.g., curve fitting:
How do we choose from among multiple consistent hypotheses?

Inductive learning method
Ockham's razor: prefer the simplest hypothesis consistent with the data.
(After William of Ockham, a 14th-century English philosopher.)
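To make the curve-fitting discussion concrete, here is a minimal Python sketch (not from the slides; the data points are invented for illustration) comparing a simple and a complex hypothesis on the same training set:

import numpy as np

# Toy training set: six (x, y) points from an unknown f (invented data)
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 4.9])

# Two hypotheses h from different hypothesis spaces H
h_line = np.polyfit(xs, ys, deg=1)   # simple: a straight line
h_poly = np.polyfit(xs, ys, deg=5)   # complex: degree-5 polynomial,
                                     # interpolates all six points exactly

for name, h in [("line", h_line), ("degree-5", h_poly)]:
    mse = np.mean((np.polyval(h, xs) - ys) ** 2)
    print(name, "training MSE:", round(float(mse), 5))

The degree-5 fit is consistent (essentially zero training error) but wiggles between the points; Ockham's razor prefers the nearly-as-good straight line.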

Learning decision trees
One of the simplest and yet most successful forms of machine learning.
A decision tree represents a function that takes as input a vector of attribute values and returns a "decision" – a single output value.
o Here: discrete inputs, Boolean classification

Learning decision trees
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60 minutes)

Attribute-based representations
Examples are described by attribute values.
A training set of 12 examples, e.g., situations where I will/won't wait for a table.
Classification of examples is positive (T) or negative (F).

Decision trees
One possible representation for hypotheses.
The "true" tree for deciding whether to wait (Price and Type do not appear in it):

Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, each truth table row maps to a path from root to leaf.
Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example.
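As a small illustration (a sketch, not code from the course), a Boolean function can be written directly as a tree of nested dicts, and each truth-table row traces one path to a leaf:

# The tree below represents A XOR B; leaves are the Boolean outputs
tree = {"attr": "A",
        True:  {"attr": "B", True: False, False: True},
        False: {"attr": "B", True: True,  False: False}}

def classify(tree, example):
    # Follow the path selected by the example's attribute values
    while isinstance(tree, dict):
        tree = tree[example[tree["attr"]]]
    return tree

# One truth-table row -> one root-to-leaf path
for a in (False, True):
    for b in (False, True):
        print(a, b, "->", classify(tree, {"A": a, "B": b}))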

Goal: to find the most compact decision tree.

Constructing the Decision Tree
Goal: find the smallest decision tree consistent with the examples.
Divide-and-conquer:
o Test the most important attribute first; this divides the problem into smaller subproblems that can be solved recursively.
o "Most important" = the attribute that best splits the examples.

Constructing the Decision Tree
Form a tree with root = best attribute.
For each value v_i (or range) of the best attribute:
o Select those examples with best = v_i
o Construct subtree_i by recursively calling the decision tree learner with that subset of examples and all attributes except best
o Add a branch to the tree with label = v_i and subtree = subtree_i
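Below is a minimal Python sketch of this recursive procedure. The example representation and the helper names (learn_tree, plurality, choose_best_attribute) are our assumptions for illustration; choose_best_attribute is a placeholder for the information-gain test introduced on the following slides.

from collections import Counter

def plurality(examples):
    # Most common class label among the examples
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def choose_best_attribute(examples, attributes):
    # Placeholder: a real learner returns the attribute with the largest
    # information gain (see the next slides); here we just take the first
    return attributes[0]

def learn_tree(examples, attributes, values):
    classes = {e["class"] for e in examples}
    if len(classes) == 1:               # all examples agree: make a leaf
        return classes.pop()
    if not attributes:                  # no attributes left: majority leaf
        return plurality(examples)
    best = choose_best_attribute(examples, attributes)
    tree = {"attr": best}
    rest = [a for a in attributes if a != best]
    for v in values[best]:              # one branch per value v_i
        subset = [e for e in examples if e[best] == v]
        # Empty subset: fall back to the parent's majority class
        tree[v] = learn_tree(subset, rest, values) if subset else plurality(examples)
    return tree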

Decision tree learning
Aim: find a small tree consistent with the training examples.
Idea: (recursively) choose the "most significant" attribute as the root of the (sub)tree.

Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
Which is a better choice: splitting on Patrons or on Type?

Choosing the Best Attribute: Binary Classification
We want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum when it makes no distinction.
Information theory (Shannon and Weaver, 1949):
o Entropy: a measure of the uncertainty of a random variable
A coin that always comes up heads --> 0 bits
A flip of a fair coin (heads or tails) --> 1 bit
The roll of a fair four-sided die --> 2 bits
o Information gain: the expected reduction in entropy caused by partitioning the examples according to an attribute

Formula for Entropy
For a variable with possible values v_1, ..., v_n occurring with probabilities p_1, ..., p_n:
H(p_1, ..., p_n) = - Σ_k p_k log2 p_k
Examples:
Suppose we have a collection of 10 examples, 5 positive, 5 negative:
H(1/2, 1/2) = -(1/2) log2 (1/2) - (1/2) log2 (1/2) = 1 bit
Suppose we have a collection of 100 examples, 1 positive and 99 negative:
H(1/100, 99/100) = -0.01 log2 0.01 - 0.99 log2 0.99 ≈ 0.08 bits
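A direct transcription of this formula in Python (a sketch; the function name is ours):

from math import log2

def entropy(probs):
    # H(p_1, ..., p_n) = -sum_k p_k log2 p_k; terms with p_k = 0 contribute 0
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/2]))       # 1.0 bit
print(entropy([1/100, 99/100]))  # ~0.0808 bits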

Information gain
Information gain (from an attribute test) = the difference between the original information requirement and the new requirement after testing the attribute:
Remainder(A) = Σ_k (p_k + n_k)/(p + n) × H(p_k/(p_k + n_k), n_k/(p_k + n_k))
IG(A) = H(p/(p + n), n/(p + n)) - Remainder(A)
where p and n count the positive and negative examples, and p_k, n_k count them in the k-th subset produced by attribute A.
Choose the attribute with the largest IG.

Information gain
For the training set, p = n = 6, so H(6/12, 6/12) = 1 bit.
Consider the attributes Patrons and Type (and the others too):
o IG(Patrons) = 1 - [ (2/12)H(0/2, 2/2) + (4/12)H(4/4, 0/4) + (6/12)H(2/6, 4/6) ] ≈ 0.541 bits
o IG(Type) = 1 - [ (2/12)H(1/2, 1/2) + (2/12)H(1/2, 1/2) + (4/12)H(2/4, 2/4) + (4/12)H(2/4, 2/4) ] = 0 bits
Patrons has the highest IG of all the attributes, so it is chosen by the DTL algorithm as the root.
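The computation behind these numbers, as a Python sketch. The per-branch (positive, negative) counts below are taken from the standard AIMA restaurant training set; since the example table itself is not reproduced in these slides, treat those counts as assumptions:

from math import log2

def b(q):
    # Entropy of a Boolean variable that is true with probability q
    return 0.0 if q in (0.0, 1.0) else -(q * log2(q) + (1 - q) * log2(1 - q))

def gain(branches, p=6, n=6):
    # IG(A) = H(p/(p+n)) - sum_k (p_k + n_k)/(p + n) * H(p_k/(p_k + n_k))
    remainder = sum((pk + nk) / (p + n) * b(pk / (pk + nk)) for pk, nk in branches)
    return b(p / (p + n)) - remainder

print(gain([(0, 2), (4, 0), (2, 4)]))          # Patrons: ~0.541 bits
print(gain([(1, 1), (1, 1), (2, 2), (2, 2)]))  # Type: 0.0 bits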

Example contd.
Decision tree learned from the 12 examples:
It is substantially simpler than the "true" tree.

Another example: the PlayTennis training set (Mitchell, 1997)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
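As a closing sketch, the same information-gain computation applied to the PlayTennis table above to pick the root attribute (the code layout is ours; the data is exactly the table):

from math import log2
from collections import Counter, defaultdict

rows = [
    ("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"), ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"), ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Wind"]

def H(labels):
    # Entropy of the class labels in a subset of examples
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(col):
    # IG = H(whole set) - expected entropy after splitting on column col
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    remainder = sum(len(g) / len(rows) * H(g) for g in groups.values())
    return H([r[-1] for r in rows]) - remainder

for i, a in enumerate(attrs):
    print(a, round(gain(i), 3))  # Outlook has the largest gain (~0.247)

Running this shows Outlook would be chosen as the root for the PlayTennis data, just as Patrons was for the restaurant data.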