Decision Trees

General Learning Task

DEFINE: a set X of instances (n-tuples x = (x1, ..., xn))
– E.g., days described by attributes (or features): Sky, Temp, Humidity, Wind, Water, Forecast

A target function y, e.g.:
– EnjoySport: X -> Y = {0, 1} (an example of concept learning)
– WhichSport: X -> Y = {Tennis, Soccer, Volleyball}
– InchesOfRain: X -> Y = [0, 10]

GIVEN: training examples D
– positive and negative examples of the target function, as pairs (x, y(x))

FIND: a hypothesis h such that h(x) approximates y(x).
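
A minimal sketch of this setup in Python. The attribute values follow the standard EnjoySport training table; the tuple-and-pair representation and the toy hypothesis h are illustrative, not anything from the slides.

# Instances x are n-tuples of attribute values, in the order:
# (Sky, Temp, Humidity, Wind, Water, Forecast)
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),    # positive example
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), 1),    # positive example
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),  # negative example
]

def h(x):
    """One candidate hypothesis h: X -> {0, 1} for the EnjoySport concept."""
    return 1 if x[0] == "Sunny" else 0

# h should approximate y; here it matches all of D:
print(all(h(x) == y for x, y in D))  # True on these three examples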

Hypothesis Spaces

A hypothesis space H is a subset of all functions y: X -> Y, e.g.:
– MC2: conjunctions of literals
– Decision trees: any function
– 2-level decision trees: any function of two attributes, some functions of three

Candidate-Elimination algorithm:
– Searches H for a hypothesis that matches the training data
– Exploits the general-to-specific ordering of hypotheses

Decision trees:
– Incrementally grow a tree by splitting the training examples on attribute values
– Can be thought of as looping for i = 1, ..., n: search H_i (the i-level trees) for a hypothesis h that matches the data

Decision Trees represent disjunctions of conjunctions: each path from the root to a Yes leaf is one conjunction of attribute tests, and the tree as a whole is their disjunction, e.g.

(Sunny ^ Normal) v Overcast v (Rain ^ Weak)
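
Read off the tree, the concept is a small Boolean function. A sketch, assuming the attributes Sky, Humidity, and Wind from the running example:

def enjoy_sport(sky, humidity, wind):
    """Each path to a Yes leaf is one conjunction; the tree is their disjunction."""
    return ((sky == "Sunny" and humidity == "Normal")
            or sky == "Overcast"
            or (sky == "Rain" and wind == "Weak"))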

Decision Trees vs. MC2

MC2 can't represent (Sunny v Cloudy): MC2 hypotheses must constrain each attribute to a single value, if at all. A decision tree represents it directly, e.g. a root split on Sky with Yes leaves under the Sunny and Cloudy branches and a No leaf otherwise.

Learning Parity with D-Trees

How to solve 2-bit parity:
– Two-step look-ahead, or
– Split on pairs of attributes at once

For k attributes, why not just do k-step look-ahead, or split on k attribute values at once? Both blow up exponentially in k.
=> Parity functions are the "victims" of the decision tree's inductive bias.
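
A quick check of why one-step greedy splitting fails here: for 2-bit parity, every single attribute has zero information gain. Illustrative code, not from the slides:

from collections import Counter
from math import log2

def H(labels):
    """Entropy of a label multiset."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

# 2-bit parity: y = x0 XOR x1
data = [((a, b), a ^ b) for a in (0, 1) for b in (0, 1)]
ys = [y for _, y in data]

for i in (0, 1):
    branches = [[y for x, y in data if x[i] == v] for v in (0, 1)]
    gain = H(ys) - sum(len(b) / len(data) * H(b) for b in branches)
    print(f"Gain(x{i}) = {gain}")  # 0.0 for both: neither bit helps by itself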

Information gain is mutual information:

Gain(Y, xi) = H(Y) - H(Y | xi) = I(Y; xi)
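
A direct computation of this quantity; the helper names entropy and info_gain are mine:

from collections import Counter
from math import log2

def entropy(ys):
    n = len(ys)
    return -sum(c / n * log2(c / n) for c in Counter(ys).values())

def info_gain(xs, ys):
    """H(Y) - H(Y | X): the mutual information between attribute X and label Y."""
    n = len(ys)
    h_cond = sum(
        sum(1 for x in xs if x == v) / n
        * entropy([y for x, y in zip(xs, ys) if x == v])
        for v in set(xs)
    )
    return entropy(ys) - h_cond

print(info_gain(["a", "a", "b", "b"], [1, 1, 0, 0]))  # 1.0: X determines Y
print(info_gain(["a", "b", "a", "b"], [1, 1, 0, 0]))  # 0.0: X is irrelevant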

Overfitting is due to "noise"

Sources of noise:
– Erroneous training data:
  - the concept variable is incorrect (annotator error)
  - attributes are mis-measured
– Much more significant:
  - irrelevant attributes
  - target function not deterministic in the attributes

Irrelevant attributes

If many attributes are noisy, information gains can be spurious. E.g., with 20 noisy attributes and 10 training examples, the expected number of depth-3 trees that split the training data perfectly using only noisy attributes is 13.4.

Potential solution: statistical significance tests (e.g., chi-square), as sketched below.
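
One way to implement such a test, sketched with scipy's chi-square test of independence on the split's contingency table. The helper and the 0.05 threshold are illustrative choices, not from the slides:

from scipy.stats import chi2_contingency

def split_is_significant(xs, ys, alpha=0.05):
    """Chi-square test of independence between attribute X and label Y."""
    values, labels = sorted(set(xs)), sorted(set(ys))
    table = [[sum(1 for x, y in zip(xs, ys) if x == v and y == l)
              for l in labels] for v in values]
    chi2, p, _, _ = chi2_contingency(table)
    return p < alpha  # refuse the split when the association could be chance

# With 10 examples and 20 noisy attributes, most candidate splits should fail this test.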

Non-determinism

In general, we can't measure all the variables we need for perfect prediction, so the target function is not uniquely determined by the attribute values.

Non-determinism: Example

(Training data: a table of Humidity values with EnjoySport labels.)

Decent hypothesis:
– Humidity > 0.70 => No
– otherwise => Yes

Overfit hypothesis:
– Humidity > 0.89 => No
– Humidity > 0.80 ^ Humidity <= 0.89 => Yes
– Humidity > 0.70 ^ Humidity <= 0.80 => No
– Humidity <= 0.70 => Yes
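
The two hypotheses as code, with the thresholds taken from the slide:

def decent(humidity):
    return "No" if humidity > 0.70 else "Yes"

def overfit(humidity):
    # Memorizes an interval around each training point instead of one cut.
    if humidity > 0.89:
        return "No"
    if humidity > 0.80:
        return "Yes"
    if humidity > 0.70:
        return "No"
    return "Yes"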

Rule #2 of Machine Learning

The best hypothesis almost never achieves 100% accuracy on the training data.
(Rule #1 was: you can't learn anything without inductive bias.)

Hypothesis Space Comparisons

Task: concept learning with k binary attributes.

Hypothesis space H       | # of semantically distinct h
Rote learning            | 2^(2^k)
MC2                      | 3^k + 1
1-level decision tree    | 2k + 2
n-level decision tree    | 2^(2^k)
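
These counts can be checked by brute force for small k. A sketch that enumerates hypotheses and counts distinct labelings of the 2^k instances; the enumeration code is mine:

from itertools import product

k = 3
instances = list(product((0, 1), repeat=k))

def distinct(hypotheses):
    """Number of semantically distinct hypotheses = distinct labelings of X."""
    return len({tuple(h(x) for x in instances) for h in hypotheses})

# Conjunctions of literals: each attribute is required-1, required-0, or ignored.
def conj(cs):
    return lambda x: all(c is None or x[i] == c for i, c in enumerate(cs))

conjunctions = [conj(cs) for cs in product((0, 1, None), repeat=k)]
print(distinct(conjunctions) + 1)  # 3^k + 1, counting the empty (always-No) hypothesis

# 1-level trees: pick one attribute and label the two leaves.
one_level = [lambda x, i=i, a=a, b=b: a if x[i] == 0 else b
             for i in range(k) for a in (0, 1) for b in (0, 1)]
print(distinct(one_level))  # 2k + 2: the two constant trees coincide across attributes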

Decision Trees – Strengths

– Very popular technique
– Fast
– Useful when:
  - instances are attribute-value pairs
  - the target function is discrete
  - concepts are likely to be disjunctions
  - attributes may be noisy

Decision Trees – Weaknesses

– Less useful for continuous outputs
– Can have difficulty with continuous input features as well: e.g., a target concept that is a circle in the (x1, x2) plane is hard to represent with a decision tree, but very simple with the instance-based methods we'll discuss later.

A decision tree learning algorithm, along the lines of ID3:
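
A minimal runnable sketch in the spirit of ID3, reconstructed from the usual definitions rather than taken from the slide:

from collections import Counter
from math import log2

def entropy(ys):
    n = len(ys)
    return -sum(c / n * log2(c / n) for c in Counter(ys).values())

def partitions(data, i):
    """Split (x, y) pairs by the value of attribute i."""
    parts = {}
    for x, y in data:
        parts.setdefault(x[i], []).append((x, y))
    return parts

def info_gain(data, i):
    ys = [y for _, y in data]
    remainder = sum(len(sub) / len(data) * entropy([y for _, y in sub])
                    for sub in partitions(data, i).values())
    return entropy(ys) - remainder

def id3(data, attrs):
    ys = [y for _, y in data]
    if len(set(ys)) == 1:          # pure node: predict the single class
        return ys[0]
    if not attrs:                  # attributes exhausted: predict the majority class
        return Counter(ys).most_common(1)[0][0]
    best = max(attrs, key=lambda i: info_gain(data, i))
    return {(best, v): id3(sub, [a for a in attrs if a != best])
            for v, sub in partitions(data, best).items()}

# Usage with the training pairs D from the first slide:
# tree = id3(D, attrs=list(range(6)))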