
Decision Trees

What is a decision tree?
Input = an assignment of values for the given attributes
– Discrete (often Boolean) or continuous
Output = a predicted value
– Discrete: classification
– Continuous: regression
Structure:
– Internal node: tests one attribute
– Leaf node: an output value (or a linear function of the attributes)
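A minimal Python sketch of this structure (the `Node`/`Leaf` names and the dictionary-of-branches representation are illustrative assumptions, not from the slides):

```python
# A possible representation of the structure described above.
from dataclasses import dataclass, field

@dataclass
class Leaf:
    value: object        # output value: a class label or a number

@dataclass
class Node:
    attribute: str                                 # attribute tested at this internal node
    branches: dict = field(default_factory=dict)   # attribute value -> subtree

def predict(tree, example):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.value
```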

Example
Attributes:
– Alternate: Boolean
– Bar: Boolean
– Fri/Sat: Boolean
– Hungry: Boolean
– Patrons: {None, Some, Full}
– Price: {$, $$, $$$}
– Raining: Boolean
– Reservation: Boolean
– Type: {French, Italian, Thai, Burger}
– WaitEstimate: {0-10 min, 10-30 min, 30-60 min, >60 min}
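For concreteness, here is how two training examples might look in this vocabulary (hypothetical rows, not necessarily from the lecture's actual dataset):

```python
# Hypothetical WillWait training examples; the final key is the target label.
examples = [
    {"Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
     "Patrons": "Some", "Price": "$$$", "Raining": False, "Reservation": True,
     "Type": "French", "WaitEstimate": "0-10", "WillWait": True},
    {"Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
     "Patrons": "Full", "Price": "$", "Raining": False, "Reservation": False,
     "Type": "Thai", "WaitEstimate": "30-60", "WillWait": False},
]
```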

Decision Tree for WillWait

Decision Tree Properties
– Propositional: attribute-value pairs; not useful for relationships between objects
– Universal: the space of possible decision trees includes every Boolean function
– Not efficient for all functions (e.g., parity, majority)
– Good for disjunctive functions

Decision Tree Learning
From a training set, construct a tree.

Methods for Constructing Decision Trees
– One path for each example in the training set: not robust, little predictive value
– Better: look for a "small" tree, applying Ockham's razor
– ID3: a simple, non-backtracking search through the space of decision trees

Algorithm
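The algorithm itself appears on the slide as a diagram (see Credits). Below is a hedged Python sketch of the standard recursive ID3/DTL loop it refers to, reusing `Node` and `Leaf` from above; the target name "WillWait" is an assumption from the running example.

```python
from collections import Counter

def majority(examples, target="WillWait"):
    """Most common target value among the given examples."""
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def learn_tree(examples, attributes, choose_attribute, parent_examples=(), target="WillWait"):
    if not examples:                        # no examples left: use the parent's majority
        return Leaf(majority(parent_examples, target))
    labels = {e[target] for e in examples}
    if len(labels) == 1:                    # all examples agree: done
        return Leaf(labels.pop())
    if not attributes:                      # attributes exhausted: majority vote
        return Leaf(majority(examples, target))
    best = choose_attribute(attributes, examples)      # greedy step (next slide)
    node = Node(best)
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        rest = [a for a in attributes if a != best]
        node.branches[v] = learn_tree(subset, rest, choose_attribute, examples, target)
    return node
```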

Choose-Attribute
Greedy algorithm: use the attribute that gives the greatest immediate information gain.

Information/Entropy
Information provided by knowing an answer:
– Possible answers vᵢ with probabilities P(vᵢ)
– I({P(vᵢ)}) = −Σᵢ P(vᵢ) log₂ P(vᵢ)
– I({0.5, 0.5}) = −0.5·(−1) − 0.5·(−1) = 1
– I({0.01, 0.99}) = −0.01·(−6.6) − 0.99·(−0.014) ≈ 0.08
Estimate the probabilities from the set of examples:
– Example: 6 yes and 6 no, so estimate P(yes) = P(no) = 0.5; 1 bit required
After testing an attribute, take the weighted sum of the information required for each subset of the examples.
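The same computations in code (a sketch; the Boolean target "WillWait" is an assumed default):

```python
import math

def information(probs):
    """I({p_i}) = -sum_i p_i * log2(p_i), treating 0*log2(0) as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def remainder(examples, attribute, target="WillWait"):
    """Weighted sum of the information still required after testing `attribute`."""
    total = len(examples)
    rem = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == v]
        pos = sum(1 for e in subset if e[target])
        n = len(subset)
        rem += n / total * information([pos / n, (n - pos) / n])
    return rem

def information_gain(examples, attribute, target="WillWait"):
    pos = sum(1 for e in examples if e[target])
    n = len(examples)
    return information([pos / n, (n - pos) / n]) - remainder(examples, attribute, target)
```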

After Testing “Type”
(2/12)·I(1/2, 1/2) + (2/12)·I(1/2, 1/2) + (4/12)·I(1/2, 1/2) + (4/12)·I(1/2, 1/2) = 1 bit

After Testing “Patrons”
(2/12)·I(0, 1) + (4/12)·I(1, 0) + (6/12)·I(2/6, 4/6) ≈ 0.46 bits
Testing Patrons leaves less remaining information (a gain of about 0.54 bits versus 0 for Type), so it is the better first split.
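Reproducing both numbers with the `information` function from the entropy sketch:

```python
# 12 examples, 6 yes and 6 no: 1 bit required before any test.
prior = information([0.5, 0.5])                                      # 1.0

# Type: four subsets (weights 2/12, 2/12, 4/12, 4/12), each split evenly.
rem_type = (2/12 + 2/12 + 4/12 + 4/12) * information([1/2, 1/2])     # 1.0

# Patrons: None (0 of 2 yes), Some (4 of 4 yes), Full (2 of 6 yes).
rem_patrons = (2/12 * information([0, 1]) + 4/12 * information([1, 0])
               + 6/12 * information([2/6, 4/6]))                     # ~0.459

print(prior - rem_type)      # 0.0   -> Type gains nothing
print(prior - rem_patrons)   # ~0.54 -> Patrons is the better first test
```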

Repeat Adding Tests
Note: the induced tree is not the same as the "true" function; it is the best we can do given the examples.

Overfitting
The algorithm can find meaningless regularities, e.g., using date & time to predict the roll of a die.
Approaches to fixing this:
– Stop the tree from growing too far
– Allow the tree to overfit, then prune

χ² Pruning
Is a split irrelevant?
– Information gain close to zero, but how close?
– Assume there is no underlying pattern (the null hypothesis)
– If statistical analysis shows the observed gain has < 5% probability under the null hypothesis, conclude the attribute is relevant
Pruning provides noise tolerance.
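A hedged sketch of this test, assuming a Boolean target: each branch of the candidate split contributes its deviation from the counts expected under the null hypothesis, and SciPy supplies the χ² tail probability.

```python
from scipy.stats import chi2

def split_is_relevant(subsets, alpha=0.05):
    """subsets: one (pos, neg) count pair per branch of the candidate split."""
    p = sum(pos for pos, neg in subsets)
    n = sum(neg for pos, neg in subsets)
    delta = 0.0
    for pos, neg in subsets:
        k = pos + neg
        expected_pos = p * k / (p + n)      # counts expected if the split is irrelevant
        expected_neg = n * k / (p + n)
        if expected_pos > 0:
            delta += (pos - expected_pos) ** 2 / expected_pos
        if expected_neg > 0:
            delta += (neg - expected_neg) ** 2 / expected_neg
    dof = len(subsets) - 1                  # degrees of freedom
    return chi2.sf(delta, dof) < alpha      # small tail probability -> keep the split

# The Patrons split from earlier: (0,2), (4,0), (2,4).
print(split_is_relevant([(0, 2), (4, 0), (2, 4)]))   # True: relevant at the 5% level
```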

Rule Post-Pruning
– Convert the tree into a set of rules (one per root-to-leaf path)
– Prune (generalize) the preconditions of each rule when doing so improves accuracy
– Rules may now overlap
– Consider rules in order of accuracy when classifying
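A short sketch of the first step, converting the `Node`/`Leaf` tree from earlier into one rule per root-to-leaf path:

```python
def tree_to_rules(tree, preconditions=()):
    """Yield (preconditions, conclusion) pairs, one rule per path."""
    if isinstance(tree, Leaf):
        yield list(preconditions), tree.value
        return
    for value, subtree in tree.branches.items():
        yield from tree_to_rules(subtree, preconditions + ((tree.attribute, value),))
```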

Continuous-Valued Attributes
– Split based on some threshold, e.g., X < 97 vs. X ≥ 97
– Many possible split points
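One common way to enumerate the candidates (a sketch, assuming the example representation used above): only midpoints between consecutive sorted values whose labels differ need to be scored, each by information gain as before.

```python
def candidate_thresholds(examples, attribute, target="WillWait"):
    """Midpoints between consecutive sorted values where the class label changes."""
    pairs = sorted((e[attribute], e[target]) for e in examples)
    return [(x1 + x2) / 2
            for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])
            if y1 != y2]
```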

Multi-Valued Attributes
– The information gain of an attribute with many values may be huge (e.g., Date)
– Rather than absolute gain, use the ratio of gain to SplitInformation:
SplitInformation(S, A) = −Σᵢ (|Sᵢ|/|S|) log₂(|Sᵢ|/|S|), where the Sᵢ are the subsets of S produced by the values of A
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
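In code, reusing `math` and `information_gain` from the entropy sketch:

```python
def split_information(examples, attribute):
    """Entropy of the split itself: how evenly the attribute divides the examples."""
    total = len(examples)
    fracs = [sum(1 for e in examples if e[attribute] == v) / total
             for v in {e[attribute] for e in examples}]
    return -sum(f * math.log2(f) for f in fracs if f > 0)

def gain_ratio(examples, attribute, target="WillWait"):
    si = split_information(examples, attribute)
    return information_gain(examples, attribute, target) / si if si > 0 else 0.0
```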

Continuous-Valued Outputs
– Each leaf holds a linear function of the attributes rather than a single value
– Such a tree is called a regression tree
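A sketch of such a leaf, reusing the `dataclass` import from the first code block (the weights here are illustrative; in practice they would be fit to the examples that reach the leaf):

```python
@dataclass
class LinearLeaf:
    weights: dict            # attribute name -> coefficient
    bias: float = 0.0

    def predict(self, example):
        return self.bias + sum(w * example[a] for a, w in self.weights.items())
```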

Missing Attribute Values
An attribute value in an example may not be known. Two options:
– Assign the most common value amongst comparable examples
– Split the example into fractional examples based on the observed distribution of values
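A sketch of the fractional-examples option: an example whose value is missing is sent down every branch, weighted by the distribution observed among the examples where the value is known.

```python
from collections import Counter

def fractional_split(examples, attribute):
    """Return {value: [(example, weight), ...]}, weighting missing-value examples."""
    known = [e for e in examples if e.get(attribute) is not None]
    dist = Counter(e[attribute] for e in known)
    total = sum(dist.values())
    split = {v: [] for v in dist}
    for e in examples:
        if e.get(attribute) is not None:
            split[e[attribute]].append((e, 1.0))
        else:                               # missing: a fraction goes down each branch
            for v, count in dist.items():
                split[v].append((e, count / total))
    return split
```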

Credits
Diagrams from "Artificial Intelligence: A Modern Approach" by Russell and Norvig.