Induction of Decision Trees
Blaž Zupan and Ivan Bratko
magix.fri.uni-lj.si/predavanja/uisp
An Example Data Set and Decision Tree
[figure: an example data set and a decision tree over the attributes outlook (sunny / rainy), company, and sailboat (big / med / small), with yes/no class labels at the leaves]

Classification
[figure: the same tree used to classify a new example by following the outlook, company, and sailboat tests down to a yes/no leaf]

Tree induced by Assistant Professional
Interesting: the accuracy of this tree compared to medical specialists.
[figure: Breast Cancer Recurrence tree with tests on Degree of Malig (< 3 / >= 3), Tumor Size (< 15 / >= 15), Involved Nodes, and Age; the leaves carry class counts, e.g. "no rec 125, recurr 39" and "no rec 32, recurr 0"]

Another Example

Simple Tree
[figure: tree rooted at Outlook — the sunny branch tests Humidity (high → N, normal → P), overcast → P, and the rainy branch tests Windy (yes → N, no → P)]

Complicated Tree
[figure: a much larger tree for the same data, rooted at Temperature (cold / moderate / hot), with repeated tests on Outlook, Windy, and Humidity along its branches, including a null leaf]

Attribute Selection Criteria

Main principle:
- Select the attribute which partitions the learning set into subsets that are as "pure" as possible

Various measures of purity:
- Information-theoretic (entropy)
- Gini index
- χ²
- ReliefF
- ...

Information-Theoretic Approach

- To classify an object, a certain amount of information I is needed
- After we have learned the value of attribute A, only some residual amount of information, Ires(A), is needed to classify the object
- Gain(A) = I − Ires(A)
- The most "informative" attribute is the one that minimizes Ires(A), i.e., maximizes Gain(A)

Entropy

The average amount of information I needed to classify an object is given by the entropy measure:

    I = − Σc p(c) log2 p(c)

For a two-class problem, entropy as a function of p(c1) is

    I = − p(c1) log2 p(c1) − (1 − p(c1)) log2 (1 − p(c1)),

which is 0 bits for a pure set and peaks at 1 bit when p(c1) = 0.5.
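As a quick check on the formula, a minimal entropy function can be sketched in plain Python (the function name and example calls are ours, not from the slides):

```python
from math import log2

def entropy(probs):
    """Average information, in bits, needed to classify an object,
    given the class probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Two-class behaviour: maximal impurity is 1 bit at p(c1) = 0.5
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([5/14, 9/14])) # entropy of the shapes data set used below
```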

Triangles and Squares

Data Set: A set of classified objects

Entropy

- 5 triangles, 9 squares
- Class probabilities: p(triangle) = 5/14, p(square) = 9/14
- Entropy: I = −(5/14) log2(5/14) − (9/14) log2(9/14) ≈ 0.940 bits

Entropy reduction by data set partitioning

[figure: the data set split by Color? into red, yellow, and green subsets]

Splitting on an attribute replaces the single entropy with the weighted average of the subset entropies.
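The residual information Ires(A) left after a split can be sketched as follows; the per-value class counts below are hypothetical illustrations, not the slide's exact figures:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def residual_information(subsets):
    """Ires(A): entropy remaining after splitting on attribute A.
    `subsets` maps each attribute value to its list of class counts."""
    total = sum(sum(counts) for counts in subsets.values())
    return sum(
        (sum(counts) / total) * entropy([c / sum(counts) for c in counts])
        for counts in subsets.values()
    )

# Hypothetical split on Color: [triangles, squares] per value
split = {"red": [3, 2], "yellow": [0, 4], "green": [2, 3]}
print(residual_information(split))  # less than the parent entropy of ~0.940
```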

Entropy of the attribute's values

[figure: the entropy of each subset (red, yellow, green) produced by the Color? split]

Information Gain

[figure: information gain of the Color? split — the parent entropy minus the weighted entropies of the red, yellow, and green subsets]

Information Gain of the Attribute

Attributes:
- Gain(Color) = …
- Gain(Outline) = …
- Gain(Dot) = …

Heuristic: the attribute with the highest gain is chosen.
This heuristic is local (local minimization of impurity).
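The selection heuristic can be sketched as a gain computation over candidate splits; the two splits below are hypothetical stand-ins for the slide's Color/Outline/Dot attributes:

```python
from math import log2

def entropy(counts):
    """Entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(parent_counts, subsets):
    """Gain(A) = I - Ires(A), with subsets given as class-count lists."""
    total = sum(parent_counts)
    i_res = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent_counts) - i_res

parent = [5, 9]  # 5 triangles, 9 squares
candidates = {
    "split_a": [[3, 2], [0, 4], [2, 3]],  # hypothetical three-way split
    "split_b": [[4, 5], [1, 4]],          # hypothetical two-way split
}
best = max(candidates, key=lambda a: gain(parent, candidates[a]))
print(best, gain(parent, candidates[best]))
```

A split that leaves the class distribution unchanged has gain 0, which is why the heuristic never prefers an uninformative attribute.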

Color? → red / yellow / green
[figure: after the Color? split, gains are recomputed within an impure subset]
Gain(Outline) = … − 0 = … bits
Gain(Dot) = … − … = … bits

[figure: the tree after splitting on Color?, with one impure subset further split by Outline? (dashed / solid)]
Gain(Outline) = … − … = … bits
Gain(Dot) = … − 0 = … bits

[figure: the final growth step — the remaining impure subset is split by Dot? (yes / no), completing the tree with the tests Color?, Outline?, and Dot?]

Decision Tree

[figure: the final tree — root Color? (red / yellow / green); one branch leads to Dot? (yes → triangle, no → square), another to Outline? (dashed → triangle, solid → square), and one branch is a pure square leaf]
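The whole procedure above — compute gains, split on the best attribute, recurse until the subsets are pure — can be sketched as a minimal ID3-style learner. The tiny shapes data set is a hypothetical stand-in for the slides' examples:

```python
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def gain(rows, labels, attr):
    """Information gain of splitting (rows, labels) on attribute attr."""
    total = len(labels)
    values = {}
    for row, y in zip(rows, labels):
        values.setdefault(row[attr], []).append(y)
    i_res = sum(len(sub) / total * entropy(sub) for sub in values.values())
    return entropy(labels) - i_res

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:            # pure node -> leaf
        return labels[0]
    if not attrs:                        # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    tree = {}
    for value in {row[best] for row in rows}:
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in sub]
        sub_labels = [y for _, y in sub]
        tree[value] = id3(sub_rows, sub_labels,
                          [a for a in attrs if a != best])
    return {best: tree}

# Hypothetical shapes data: attributes color, outline, dot
rows = [
    {"color": "red",    "outline": "dashed", "dot": "yes"},
    {"color": "red",    "outline": "solid",  "dot": "no"},
    {"color": "yellow", "outline": "dashed", "dot": "no"},
    {"color": "green",  "outline": "dashed", "dot": "no"},
    {"color": "green",  "outline": "solid",  "dot": "no"},
]
labels = ["triangle", "square", "square", "triangle", "square"]
print(id3(rows, labels, ["color", "outline", "dot"]))
```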

Gini Index

Another sensible measure of impurity (i and j are classes):

    Gini = Σ(i ≠ j) p(i) p(j) = 1 − Σi p(i)²

After applying attribute A, the resulting Gini index is the weighted average over the attribute's values v:

    Gini(A) = Σv p(v) · Gini(v)

Gini can be interpreted as the expected error rate.

Gini Index

For the triangles-and-squares data set: Gini = 1 − (5/14)² − (9/14)² ≈ 0.459

Gini Index for Color

[figure: the Gini index of each subset (red, yellow, green) produced by the Color? split, combined into Gini(Color)]

Gain of Gini Index

GiniGain(A) = Gini − Gini(A); as with information gain, the attribute with the highest gain is preferred.
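A sketch of the Gini computations, using the 5-triangle / 9-square counts from the entropy slide; the three-way split stands in for the Color? split and its counts are hypothetical:

```python
def gini(counts):
    """Gini impurity: 1 - sum_i p(i)^2, interpretable as expected error rate."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_after_split(subsets):
    """Gini(A): weighted Gini over the subsets produced by attribute A."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s) / total * gini(s) for s in subsets)

parent = [5, 9]                          # triangles, squares
split = [[3, 2], [0, 4], [2, 3]]         # hypothetical Color? split
print(gini(parent))                      # ~0.459
print(gini(parent) - gini_after_split(split))  # GiniGain of the split
```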

Three Impurity Measures

- These impurity measures assess the effect of a single attribute
- The "most informative" criterion they define is local (and "myopic")
- It does not reliably predict the combined effect of several attributes applied jointly