ID3 Algorithm Allan Neymark CS157B – Spring 2007

Agenda
- Decision Trees
- What is ID3?
- Entropy
- Calculating Entropy with Code
- Information Gain
- Advantages and Disadvantages
- Example

Decision Trees Rules for classifying data using attributes. The tree consists of decision nodes and leaf nodes. A decision node has two or more branches, each representing a value of the attribute being tested. A leaf node represents a homogeneous result (all instances in one class), which requires no further classification testing.

Decision Tree Example [Tree diagram: the root node tests Outlook (sunny / overcast / rain); the sunny branch tests Humidity (high → No, normal → Yes); the rain branch tests Windy (true → No, false → Yes); overcast leads directly to Yes.]

What is ID3? A mathematical algorithm for building the decision tree. Invented by J. Ross Quinlan (published in Machine Learning, 1986). Uses Information Theory, introduced by Shannon in 1948. Builds the tree from the top down, with no backtracking. Information Gain is used to select the most useful attribute for classification.

Entropy A measure of the homogeneity of a sample. A completely homogeneous sample has entropy 0; an equally divided sample has entropy 1. For a sample S of positive and negative elements, the formula for entropy is:

Entropy(S) = -p+ log2(p+) - p- log2(p-)

Entropy Example For a sample with 9 positive and 5 negative instances:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Calculating Entropy with Code Most programming languages and calculators do not have a log2 function. Use a conversion factor: take the log of 2 in an available base, and divide by it. Example: log10(2) = 0.301, so log2(3/5) = log10(3/5) / 0.301.

Calculating Entropy with Code (cont'd) Taking log10(0) produces an error, but by convention a zero-probability term contributes 0 to the entropy sum. So substitute 0 for a term like (0/3) log10(0/3); do not try to calculate log10(0/3).
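As a concrete sketch of both points (in Python, since the slides do not fix a language; the function name and counts-based interface are my choices, not from the original):

import math

def entropy(counts):
    # Entropy of a sample given its class counts, e.g. [9, 5].
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # by convention 0 * log2(0) = 0; skip to avoid log10(0)
        p = count / total
        # log2(p) via the conversion factor: log10(p) / log10(2)
        result -= p * (math.log10(p) / math.log10(2))
    return result

print(entropy([9, 5]))  # prints 0.940..., matching the example above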

Information Gain (IG) The information gain is based on the decrease in entropy after a dataset is split on an attribute. Which attribute creates the most homogeneous branches?
1. First, the entropy of the total dataset is calculated.
2. The dataset is then split on the different attributes, and the entropy for each branch is calculated.
3. The branch entropies are added proportionally (weighted by branch size) to get the total entropy for the split.
4. The resulting entropy is subtracted from the entropy before the split. The result is the Information Gain, or decrease in entropy.
5. The attribute that yields the largest IG is chosen for the decision node.
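As a minimal sketch of these steps, building on the entropy function above (the interface is an assumption for brevity):

def information_gain(parent_counts, branch_counts):
    # IG = entropy before the split minus the weighted entropy after it.
    # parent_counts: class counts before the split, e.g. [9, 5]
    # branch_counts: class counts per branch, e.g. [[6, 2], [3, 3]]
    total = sum(parent_counts)
    split_entropy = sum(sum(branch) / total * entropy(branch)
                        for branch in branch_counts)
    return entropy(parent_counts) - split_entropy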

Information Gain (cont’d) A branch set with entropy of 0 is a leaf node. Otherwise, the branch needs further splitting to classify its dataset. The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
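Putting the pieces together, the recursion might look like the following sketch (all names are illustrative, not from the slides; a real implementation must also handle empty branches, ties, and continuous attributes):

def id3(examples, attributes):
    # examples: list of (attribute_value_dict, class_label) pairs
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:
        return labels[0]  # entropy 0: this branch becomes a leaf node
    if not attributes:
        return max(set(labels), key=labels.count)  # fall back to majority class
    # Choose the attribute with the largest information gain.
    best = max(attributes, key=lambda a: gain_for(examples, a))
    tree = {best: {}}
    rest = [a for a in attributes if a != best]
    for value in {values[best] for values, _ in examples}:
        subset = [(v, l) for v, l in examples if v[best] == value]
        tree[best][value] = id3(subset, rest)  # recurse on non-leaf branches
    return tree

def gain_for(examples, attribute):
    # Information gain of splitting these examples on one attribute.
    def class_counts(subset):
        labels = [label for _, label in subset]
        return [labels.count(c) for c in set(labels)]
    branches = {}
    for values, label in examples:
        branches.setdefault(values[attribute], []).append((values, label))
    return information_gain(class_counts(examples),
                            [class_counts(b) for b in branches.values()])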

Advantages of using ID3
- Understandable prediction rules are created from the training data.
- Builds the tree quickly, and builds a short tree.
- Only needs to test enough attributes to classify all of the data.
- Finding leaf nodes enables test data to be pruned, reducing the number of tests.
- The whole dataset is searched to create the tree.

Disadvantages of using ID3
- Data may be over-fitted or over-classified if a small sample is used for training.
- Only one attribute at a time is tested for making a decision.
- Classifying continuous data may be computationally expensive, as many trees must be generated to find where to break the continuum.

Example: The Simpsons

Person   Hair Length   Weight   Age   Class
Homer    0"            250      36    M
Marge    10"           150      34    F
Bart     2"            90       10    M
Lisa     6"            78       8     F
Maggie   4"            20       1     F
Abe      1"            170      70    M
Selma    8"            160      41    F
Otto     10"           180      38    M
Krusty   6"            200      45    M
Comic    8"            290      38    ?
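For reference, here is one way the nine labeled rows could be encoded for the helper functions sketched earlier (Comic, the unlabeled row, is held out; the attribute names are my own):

train = [
    ({"hair": 0,  "weight": 250, "age": 36}, "M"),  # Homer
    ({"hair": 10, "weight": 150, "age": 34}, "F"),  # Marge
    ({"hair": 2,  "weight": 90,  "age": 10}, "M"),  # Bart
    ({"hair": 6,  "weight": 78,  "age": 8},  "F"),  # Lisa
    ({"hair": 4,  "weight": 20,  "age": 1},  "F"),  # Maggie
    ({"hair": 1,  "weight": 170, "age": 70}, "M"),  # Abe
    ({"hair": 8,  "weight": 160, "age": 41}, "F"),  # Selma
    ({"hair": 10, "weight": 180, "age": 38}, "M"),  # Otto
    ({"hair": 6,  "weight": 200, "age": 45}, "M"),  # Krusty
]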

Let us try splitting on Hair Length.

Hair Length <= 5?

Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(1F,3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113   (yes branch)
Entropy(3F,2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710   (no branch)

Gain(Hair Length <= 5) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911

Let us try splitting on Weight.

Weight <= 160?

Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(4F,1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219   (yes branch)
Entropy(0F,4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0        (no branch)

Gain(Weight <= 160) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900

Let us try splitting on Age.

Age <= 40?

Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(3F,3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1        (yes branch)
Entropy(1F,2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183   (no branch)

Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183
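All three gains can be reproduced with the information_gain helper sketched earlier (class counts taken straight from the splits above):

parent = [4, 5]                                    # 4 F, 5 M overall
print(information_gain(parent, [[1, 3], [3, 2]]))  # Hair Length <= 5: 0.0911
print(information_gain(parent, [[4, 1], [0, 4]]))  # Weight <= 160:    0.5900
print(information_gain(parent, [[3, 3], [1, 2]]))  # Age <= 40:        0.0183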

Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified. So we simply recurse! This time we find that we can split on Hair Length, and we are done!

Weight <= 160?
  no → Male
  yes → Hair Length <= 2?

We don't need to keep the data around, just the test conditions:

Weight <= 160?
  no → Male
  yes → Hair Length <= 2?
    yes → Male
    no → Female

How would these people be classified?

It is trivial to convert Decision Trees to rules...

Rules to Classify Males/Females:
If Weight greater than 160, classify as Male
Elseif Hair Length less than or equal to 2, classify as Male
Else classify as Female
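As plain code, the rules are just nested conditionals (a direct transcription of the rules above, with illustrative names):

def classify(weight, hair_length):
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

print(classify(weight=290, hair_length=8))  # Comic weighs over 160 -> "Male"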

References
Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, 1.
_dtrees2.html
Professor Sin-Min Lee, SJSU.