Induction of Decision Trees and ID3


Agenda
- Example of Decision Trees
- What is ID3?
- Entropy
- Calculating Entropy with Code
- Information Gain
- Advantages and Disadvantages
- Examples

An Example Decision Tree
[Figure: a medical decision tree. The root tests the location of the pain (abdomen, throat, chest, none); lower nodes test fever, cough, and bacterial vs. viral infection; the leaves are diagnoses such as appendicitis, heart attack, influenza, and cold.]
Each leaf of this tree designates a class (category). A training example is classified by the decision tree as follows: start at the root of the tree and test the attribute specified by that node; then move down the branch corresponding to the value of that attribute in the given example. This process is repeated for the subtree rooted at the new node.
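
The procedure just described is easy to state in code. Below is a minimal sketch; the nested-dict tree encoding, the function name classify, and the toy medical tree are my own illustrative assumptions, not taken from the slides.

```python
# A minimal sketch of the classification walk described above. The tree
# encoding (nested dicts) and the example tree are hypothetical.

def classify(tree, example):
    """Walk from the root, following the branch that matches the example's
    value for each tested attribute, until a leaf (class label) is reached."""
    while isinstance(tree, dict):                 # internal (decision) node
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]       # descend the matching branch
    return tree                                   # leaf: the predicted class

# Hypothetical encoding of part of the tree on this slide.
tree = {"pain": {
    "abdomen": "appendicitis",
    "chest": "heart attack",
    "throat": {"fever": {"yes": "influenza", "no": "cold"}},
    "none": "no diagnosis",
}}
print(classify(tree, {"pain": "throat", "fever": "yes"}))  # -> influenza
```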

An Example Data Set and Decision Tree
[Figure: a sailing data set with attributes outlook (sunny, rainy), company (no, med, big), and sailboat (small, big), each example labeled with the class yes or no, shown next to a decision tree induced from it.]

Classification
[Figure: classifying a new example with the tree: the root tests outlook (sunny, rainy); the rainy branch tests company (no, med, big), and one branch tests sailboat (small, big); the leaves are the classes yes and no.]

Induction of Decision Trees
Data set (learning set): each example = attributes + class.
TDIDT: Top-Down Induction of Decision Trees.

Decision Trees
A decision tree encodes rules for classifying data using attributes. The tree consists of decision nodes and leaf nodes. A decision node has two or more branches, each representing a value of the attribute being tested. A leaf node holds a homogeneous result (all examples in one class), which requires no further testing.
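
As a concrete illustration of the two node kinds, here is a minimal sketch; the class and field names are my own assumptions, not from the slides.

```python
# A minimal sketch of the two node kinds described above; all names
# (Leaf, DecisionNode, label, attribute, branches) are assumptions.
from dataclasses import dataclass, field

@dataclass
class Leaf:
    label: str  # the single class shared by all examples at this leaf

@dataclass
class DecisionNode:
    attribute: str                                 # attribute tested here
    branches: dict = field(default_factory=dict)   # value -> child node
```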

Some TDIDT Systems
- ID3 (Quinlan, 1979)
- CART (Breiman et al., 1984)
- Assistant (Cestnik et al., 1987)
- C4.5 (Quinlan, 1993)
- ...

What is ID3? An algorithm for building decision trees, invented by J. Ross Quinlan in 1979. ID3 stands for Iterative Dichotomiser 3. It uses information theory, developed by Claude Shannon in 1948. The tree is built from the top down, with no backtracking. Information gain is used to select the most useful attribute for classification.

TDIDT Algorithm
Also known as ID3 (Quinlan). To construct a decision tree T from a learning set S:
- If all examples in S belong to some class C, make a leaf labeled C.
- Otherwise:
  - select the "most informative" attribute A,
  - partition S according to A's values,
  - recursively construct subtrees T1, T2, ... for the subsets of S.
A sketch of this recursion in code follows below.
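
A compact, hedged sketch of the recursion. The representation choices (examples as (attributes, class) pairs, trees as nested dicts) and the parameterized attribute selector are my own; the selection measure itself (information gain) is defined on the slides that follow.

```python
# A minimal TDIDT/ID3 recursion sketch; encoding choices are assumptions.
from collections import Counter

def tdidt(examples, attributes, select):
    """examples: list of (attrs_dict, cls) pairs.
    select(examples, attributes) picks the 'most informative' attribute."""
    counts = Counter(cls for _, cls in examples)
    if len(counts) == 1 or not attributes:   # all in one class C (or no
        return counts.most_common(1)[0][0]   # attributes left) -> leaf C
    a = select(examples, attributes)         # "most informative" attribute A
    values = {attrs[a] for attrs, _ in examples}
    return {a: {v: tdidt([e for e in examples if e[0][a] == v],  # partition S
                         [b for b in attributes if b != a],      # by A's values
                         select)                                 # recurse
                for v in values}}

# Tiny demo with a trivial selector (real ID3 uses information gain):
data = [({"Color": "red"}, "triangle"), ({"Color": "yellow"}, "square")]
print(tdidt(data, ["Color"], lambda ex, attrs: attrs[0]))
# -> {'Color': {'red': 'triangle', 'yellow': 'square'}}
```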

TDIDT Algorithm
[Figure: the shape of the resulting tree T: a root node testing attribute A, with branches labeled by A's values v1, v2, ..., vn leading to recursively built subtrees T1, T2, ..., Tn.]

Another Example

Simple Tree
[Figure: a small tree for the weather data: Outlook at the root (sunny, overcast, rainy); the sunny branch tests Humidity (high → N, normal → P), overcast is a P leaf, and the rainy branch tests Windy (yes → N, no → P).]

Complicated Tree
[Figure: a much larger tree for the same data, rooted at Temperature and re-testing Outlook, Windy, and Humidity along many paths; it fits the same examples as the simple tree but with far more tests.]

Attribute Selection Criteria
Main principle: select the attribute which partitions the learning set into subsets that are as "pure" as possible.
Various measures of purity:
- information-theoretic (entropy)
- χ² (chi-squared)
- ...

Information-Theoretic Approach
To classify an object, a certain amount of information I is needed.
After we have learned the value of attribute A, only some remaining amount of information, the residual information Ires, is needed to classify the object.
Gain(A) = I - Ires(A).
The most "informative" attribute is the one that minimizes Ires, i.e., maximizes the gain.

Entropy
The average amount of information I needed to classify an object is given by the entropy measure:

$$I = -\sum_{c} p(c)\,\log_2 p(c)$$

[Figure: for a two-class problem, entropy plotted as a function of p(c1); it is 0 for a pure set and peaks at 1 bit when p(c1) = 0.5.]
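
This is the "Calculating Entropy with Code" item from the agenda. A hedged sketch: the function name and the counts-based interface are my own choices.

```python
# Entropy of a class distribution given as a list of class counts.
# The 0 * log2(0) terms are taken as 0, so zero counts are skipped.
import math

def entropy(class_counts):
    total = sum(class_counts)
    return sum(-n / total * math.log2(n / total)
               for n in class_counts if n > 0)

print(entropy([5, 9]))   # two-class example: ~0.940 bits
print(entropy([7, 7]))   # equal classes: 1.0 bit (maximum)
print(entropy([14, 0]))  # pure set: 0 bits
```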

Residual Information
After applying attribute A, S is partitioned into subsets according to the values v of A. Ires is the weighted sum of the amounts of information (entropies) of the subsets:

$$I_{res}(A) = \sum_{v} P(v)\, I(S_v)$$

where P(v) is the proportion of examples with value v and I(S_v) is the entropy of that subset.

Information Gain (IG)
Information gain is based on the decrease in entropy after a dataset is split on an attribute: which attribute creates the most homogeneous branches?
- First the entropy of the total dataset is calculated.
- The dataset is then split on the different attributes, and the entropy of each branch is calculated.
- The branch entropies are added proportionally (weighted by branch size) to get the total entropy of the split, which is subtracted from the entropy before the split.
- The result is the information gain, i.e., the decrease in entropy.
- The attribute that yields the largest IG is chosen for the decision node.
A sketch of this computation in code follows below.
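
A hedged sketch of the gain computation; the counts-based interface is my own. The demo reproduces Gain(Color) from the triangles-and-squares example on the later slides, with per-subset counts inferred from the totals (5 triangles, 9 squares) and the 0.971 subset entropies quoted there.

```python
# Information gain = entropy before the split minus the weighted
# (residual) entropy of the subsets produced by the split.
import math

def entropy(class_counts):
    total = sum(class_counts)
    return sum(-n / total * math.log2(n / total)
               for n in class_counts if n > 0)

def information_gain(total_counts, subset_counts):
    total = sum(total_counts)
    residual = sum(sum(sub) / total * entropy(sub) for sub in subset_counts)
    return entropy(total_counts) - residual

# Triangles-and-squares demo; counts per Color subset are inferred:
# red = [2, 3], green = [3, 2], yellow = [0, 4] as (triangles, squares).
print(information_gain([5, 9], [[2, 3], [3, 2], [0, 4]]))
# -> ~0.2467, quoted as 0.246 on the slides
```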

Information Gain (cont’d)
A branch whose subset has entropy 0 is a leaf node. Otherwise, the branch needs further splitting to classify its dataset. The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.

Triangles and Squares

Example: Triangles and Squares
Data set: a set of classified objects (triangles and squares).
[Figure: the objects, each described by its Color, Outline, and Dot attributes.]

Entropy
5 triangles, 9 squares.
Class probabilities: p(triangle) = 5/14, p(square) = 9/14.
[Figure: the data set with the class probabilities and the resulting entropy.]
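
The arithmetic behind that slide, written out (this follows directly from the 5/14 and 9/14 probabilities):

$$I = -\frac{5}{14}\log_2\frac{5}{14} - \frac{9}{14}\log_2\frac{9}{14} \approx 0.940 \text{ bits}$$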

Entropy reduction by data set partitioning
[Figure: the data set partitioned by the attribute Color into red, yellow, and green subsets.]


Information Gain
[Figure: the same partition of the data set by Color, used to compute the information gain of the attribute.]

Information Gain of the Attributes
- Gain(Color) = 0.246
- Gain(Outline) = 0.151
- Gain(Dot) = 0.048
Heuristic: the attribute with the highest gain is chosen. This heuristic is local (greedy, local minimization of impurity).
The computation behind Gain(Color) is spelled out below.
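
Spelled out (subset sizes of 5 red, 5 green, and 4 yellow objects are inferred from the totals and the 0.971 subset entropies on the next slides):

$$I_{res}(\mathrm{Color}) = \frac{5}{14}(0.971) + \frac{5}{14}(0.971) + \frac{4}{14}(0) \approx 0.694$$

$$\mathrm{Gain}(\mathrm{Color}) = 0.940 - 0.694 \approx 0.246$$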

[Figure: the tree after splitting on Color; the yellow subset is already pure (all squares) and becomes a leaf.]
For the red subset:
Gain(Outline) = 0.971 - 0 = 0.971 bits
Gain(Dot) = 0.971 - 0.951 = 0.020 bits
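
For reference, 0.971 is the entropy of a five-object subset with a 2:3 class split (an inference consistent with the totals):

$$-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} \approx 0.971 \text{ bits}$$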

[Figure: the growing tree; the red branch now tests Outline (dashed → triangle, solid → square).]
For the green subset:
Gain(Outline) = 0.971 - 0.951 = 0.020 bits
Gain(Dot) = 0.971 - 0 = 0.971 bits

[Figure: the nearly complete tree: Color at the root, the red branch testing Outline (dashed, solid), and the green branch testing Dot (yes, no).]

Decision Tree
[Figure: the final tree.]
Color = red → test Outline: dashed → triangle, solid → square.
Color = green → test Dot: yes → triangle, no → square.
Color = yellow → square.
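
In the nested-dict encoding used in the earlier classify() sketch, the final tree would look like this (the encoding itself is my assumption):

```python
# The final triangles-and-squares tree in the nested-dict encoding
# from the earlier classify() sketch (the encoding is an assumption).
tree = {"Color": {
    "red":    {"Outline": {"dashed": "triangle", "solid": "square"}},
    "green":  {"Dot":     {"yes": "triangle",    "no": "square"}},
    "yellow": "square",
}}
```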

Advantages of using ID3
- Understandable prediction rules are created from the training data.
- Builds the tree quickly, and the resulting tree tends to be short.
- Only needs to test enough attributes until all data is classified.
- Finding leaf nodes enables test data to be pruned, reducing the number of tests.
- The whole dataset is searched to create the tree.

Disadvantages of using ID3
- Data may be over-fitted or over-classified if the training sample is small.
- Only one attribute at a time is tested for making a decision.
- Classifying continuous data may be computationally expensive, as many candidate splits must be evaluated to see where to break the continuum. A sketch of this is given below.
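
To make the last point concrete, here is a hedged sketch of the usual way a continuous attribute is handled in C4.5-style learners (not part of ID3 itself): every midpoint between consecutive sorted values is a candidate threshold, and each candidate must be scored. The names and the toy data are my own.

```python
# Score every candidate threshold between consecutive sorted values;
# the number of candidates (and re-scorings) is what makes this costly.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-n / total * math.log2(n / total)
               for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) maximizing information gain for x <= t."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2        # candidate midpoint
        left = [l for x, l in pairs if x <= t]
        right = [l for x, l in pairs if x > t]
        residual = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        if base - residual > best_gain:
            best_t, best_gain = t, base - residual
    return best_t, best_gain

print(best_threshold([64, 65, 68, 69, 70], ["P", "N", "P", "P", "N"]))
# -> (69.5, ~0.322)
```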

Exercise
Submission deadline: end of the month of Aban (late November).