Iterative Dichotomiser 3 (ID3) Algorithm Medha Pradhan CS 157B, Spring 2007.


Iterative Dichotomiser 3 (ID3) Algorithm Medha Pradhan CS 157B, Spring 2007

Agenda
Basics of decision trees
Introduction to ID3
Entropy and information gain
Two examples

Basics
What is a decision tree? A tree in which each branching (decision) node represents a choice between two or more alternatives, and every branching node lies on a path to a leaf node.
Decision node: specifies a test of some attribute.
Leaf node: indicates the classification of an example.
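One lightweight way to picture this structure (an illustrative convention, not from the slides) is a nested dictionary in Python: a decision node is keyed by the attribute it tests, and a leaf node is just a class label. All names below are made up.

    # Illustrative only: a decision node tests one attribute and branches on its
    # values; a leaf node is simply a classification label.
    toy_tree = {
        "AttributeA": {                    # decision node: test AttributeA
            "value1": "ClassX",            # leaf node: classify as ClassX
            "value2": {"AttributeB": {     # another decision node on the path
                "value3": "ClassY",
                "value4": "ClassX",
            }},
        }
    }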

ID3
Invented by J. Ross Quinlan.
Employs a top-down, greedy search through the space of possible decision trees. The search is greedy because there is no backtracking: at each node the algorithm commits to the locally best choice.
At each step, ID3 selects the attribute that is most useful for classifying the examples, i.e., the attribute with the highest information gain.

Entropy
Entropy measures the impurity of an arbitrary collection of examples. For a collection S containing positive and negative examples:
Entropy(S) = -p+ log2(p+) - p- log2(p-)
where p+ is the proportion of positive examples and p- is the proportion of negative examples.
Entropy(S) = 0 if all members of S belong to the same class.
Entropy(S) = 1 (the maximum) when S contains equal numbers of positive and negative examples.
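As a quick numeric check of this definition (the helper entropy2 below is illustrative and not part of the original slides):

    from math import log2

    def entropy2(p_pos):
        # Two-class entropy from the proportion of positive examples; terms with
        # p = 0 or p = 1 contribute nothing.
        return sum(-p * log2(p) for p in (p_pos, 1.0 - p_pos) if 0 < p < 1)

    print(entropy2(0.5))    # 1.0   -> members split equally (maximum)
    print(entropy2(1.0))    # 0     -> all members in the same class
    print(entropy2(4 / 6))  # 0.918 -> the 4-positive / 2-negative set used in Example 1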

Information Gain
Information gain measures the expected reduction in entropy caused by partitioning the examples on an attribute. The higher the gain, the greater the expected reduction in entropy.
Gain(S, A) = Entropy(S) - Σ (v in Values(A)) (|Sv| / |S|) · Entropy(Sv)
where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v.
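As a rough sketch of how this formula translates into code (assuming examples are stored as dictionaries of attribute values, with target naming the class attribute; the names entropy and information_gain are illustrative):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy of a collection, given its list of class labels.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(examples, attribute, target):
        # Gain(S, A) = Entropy(S) - sum over v in Values(A) of |Sv|/|S| * Entropy(Sv)
        labels = [e[target] for e in examples]
        remainder = 0.0
        for value in {e[attribute] for e in examples}:
            subset_labels = [e[target] for e in examples if e[attribute] == value]
            remainder += len(subset_labels) / len(examples) * entropy(subset_labels)
        return entropy(labels) - remainder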

Example 1
Sample training data to determine whether an animal lays eggs. The first four attributes are independent (condition) attributes; Lays Eggs is the dependent (decision) attribute.

Animal      Warm-blooded  Feathers  Fur  Swims  Lays Eggs
Ostrich     Yes           Yes       No   No     Yes
Crocodile   No            No        No   Yes    Yes
Raven       Yes           Yes       No   No     Yes
Albatross   Yes           Yes       No   No     Yes
Dolphin     Yes           No        No   Yes    No
Koala       Yes           No        Yes  No     No

Entropy(4Y, 2N) = -(4/6) log2(4/6) - (2/6) log2(2/6) = 0.9183
Now we find the information gain for all four attributes: Warm-blooded, Feathers, Fur, and Swims.

For attribute 'Warm-blooded':
Values(Warm-blooded): [Yes, No], S = [4Y, 2N]
S_Yes = [3Y, 2N], E(S_Yes) = 0.9710
S_No = [1Y, 0N], E(S_No) = 0 (all members belong to the same class)
Gain(S, Warm-blooded) = 0.9183 - [(5/6)·0.9710 + (1/6)·0] = 0.1092

For attribute 'Feathers':
Values(Feathers): [Yes, No], S = [4Y, 2N]
S_Yes = [3Y, 0N], E(S_Yes) = 0
S_No = [1Y, 2N], E(S_No) = 0.9183
Gain(S, Feathers) = 0.9183 - [(3/6)·0 + (3/6)·0.9183] = 0.4591

For attribute 'Fur':
Values(Fur): [Yes, No], S = [4Y, 2N]
S_Yes = [0Y, 1N], E(S_Yes) = 0
S_No = [4Y, 1N], E(S_No) = 0.7219
Gain(S, Fur) = 0.9183 - [(1/6)·0 + (5/6)·0.7219] = 0.3167

For attribute 'Swims':
Values(Swims): [Yes, No], S = [4Y, 2N]
S_Yes = [1Y, 1N], E(S_Yes) = 1 (equal members in both classes)
S_No = [3Y, 1N], E(S_No) = 0.8113
Gain(S, Swims) = 0.9183 - [(2/6)·1 + (4/6)·0.8113] = 0.0441

Gain(S, Warm-blooded) = 0.1092
Gain(S, Feathers) = 0.4591
Gain(S, Fur) = 0.3167
Gain(S, Swims) = 0.0441
Gain(S, Feathers) is the maximum, so Feathers becomes the root node.

Feathers
Y: [Ostrich, Raven, Albatross] -> Lays Eggs
N: [Crocodile, Dolphin, Koala] -> ?

The 'Yes' descendant contains only positive examples, so it becomes a leaf node with the classification 'Lays Eggs'.

We now repeat the procedure for S = [Crocodile, Dolphin, Koala], i.e., S = [1+, 2-]:
Entropy(S) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.9183

Animal      Warm-blooded  Feathers  Fur  Swims  Lays Eggs
Crocodile   No            No        No   Yes    Yes
Dolphin     Yes           No        No   Yes    No
Koala       Yes           No        Yes  No     No

For attribute 'Warm-blooded':
Values(Warm-blooded): [Yes, No], S = [1Y, 2N]
S_Yes = [0Y, 2N], E(S_Yes) = 0
S_No = [1Y, 0N], E(S_No) = 0
Gain(S, Warm-blooded) = 0.9183 - [(2/3)·0 + (1/3)·0] = 0.9183

For attribute 'Fur':
Values(Fur): [Yes, No], S = [1Y, 2N]
S_Yes = [0Y, 1N], E(S_Yes) = 0
S_No = [1Y, 1N], E(S_No) = 1
Gain(S, Fur) = 0.9183 - [(1/3)·0 + (2/3)·1] = 0.2516

For attribute 'Swims':
Values(Swims): [Yes, No], S = [1Y, 2N]
S_Yes = [1Y, 1N], E(S_Yes) = 1
S_No = [0Y, 1N], E(S_No) = 0
Gain(S, Swims) = 0.9183 - [(2/3)·1 + (1/3)·0] = 0.2516

Gain(S, Warm-blooded) is the maximum.

The final decision tree will be:

Feathers
Y: Lays eggs
N: Warm-blooded
   Y: Does not lay eggs
   N: Lays eggs
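Putting the pieces together, a minimal, self-contained sketch of the recursion can reproduce this tree from the Example 1 data. The function names and the dictionary encoding of the data are illustrative assumptions, not from the original slides.

    # A self-contained sketch: helper functions plus the greedy recursion, applied
    # to the Example 1 data. Ties between equally good attributes are broken
    # arbitrarily here.
    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy of a collection, given its list of class labels.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(examples, attribute, target):
        # Expected entropy reduction from partitioning `examples` on `attribute`.
        labels = [e[target] for e in examples]
        remainder = 0.0
        for value in {e[attribute] for e in examples}:
            subset = [e[target] for e in examples if e[attribute] == value]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder

    def id3(examples, attributes, target):
        labels = [e[target] for e in examples]
        if len(set(labels)) == 1:              # all examples in one class -> leaf
            return labels[0]
        if not attributes:                     # nothing left to test -> majority leaf
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: information_gain(examples, a, target))
        remaining = [a for a in attributes if a != best]
        return {best: {value: id3([e for e in examples if e[best] == value],
                                  remaining, target)
                       for value in {e[best] for e in examples}}}

    animals = [  # Example 1 training data, one dict per row of the table
        {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No",  "Lays Eggs": "Yes"},  # Ostrich
        {"Warm-blooded": "No",  "Feathers": "No",  "Fur": "No",  "Swims": "Yes", "Lays Eggs": "Yes"},  # Crocodile
        {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No",  "Lays Eggs": "Yes"},  # Raven
        {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No",  "Lays Eggs": "Yes"},  # Albatross
        {"Warm-blooded": "Yes", "Feathers": "No",  "Fur": "No",  "Swims": "Yes", "Lays Eggs": "No"},   # Dolphin
        {"Warm-blooded": "Yes", "Feathers": "No",  "Fur": "Yes", "Swims": "No",  "Lays Eggs": "No"},   # Koala
    ]
    print(id3(animals, ["Warm-blooded", "Feathers", "Fur", "Swims"], "Lays Eggs"))
    # Prints a nested dict with Feathers at the root and Warm-blooded under the
    # 'No' branch, matching the tree above (branch order may vary).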

Example 2
Factors affecting sunburn.

Name    Hair    Height   Weight   Lotion  Sunburned
Sarah   Blonde  Average  Light    No      Yes
Dana    Blonde  Tall     Average  Yes     No
Alex    Brown   Short    Average  Yes     No
Annie   Blonde  Short    Average  No      Yes
Emily   Red     Average  Heavy    No      Yes
Pete    Brown   Tall     Heavy    No      No
John    Brown   Average  Heavy    No      No
Katie   Blonde  Short    Light    Yes     No

S = [3+, 5-]
Entropy(S) = -(3/8) log2(3/8) - (5/8) log2(5/8) = 0.9544
Find the information gain for all four attributes: Hair, Height, Weight, and Lotion.

For attribute 'Hair':
Values(Hair): [Blonde, Brown, Red], S = [3+, 5-]
S_Blonde = [2+, 2-], E(S_Blonde) = 1
S_Brown = [0+, 3-], E(S_Brown) = 0
S_Red = [1+, 0-], E(S_Red) = 0
Gain(S, Hair) = 0.9544 - [(4/8)·1 + (3/8)·0 + (1/8)·0] = 0.4544

For attribute 'Height':
Values(Height): [Average, Tall, Short]
S_Average = [2+, 1-], E(S_Average) = 0.9183
S_Tall = [0+, 2-], E(S_Tall) = 0
S_Short = [1+, 2-], E(S_Short) = 0.9183
Gain(S, Height) = 0.9544 - [(3/8)·0.9183 + (2/8)·0 + (3/8)·0.9183] = 0.2657

For attribute 'Weight':
Values(Weight): [Light, Average, Heavy]
S_Light = [1+, 1-], E(S_Light) = 1
S_Average = [1+, 2-], E(S_Average) = 0.9183
S_Heavy = [1+, 2-], E(S_Heavy) = 0.9183
Gain(S, Weight) = 0.9544 - [(2/8)·1 + (3/8)·0.9183 + (3/8)·0.9183] = 0.0157

For attribute 'Lotion':
Values(Lotion): [Yes, No]
S_Yes = [0+, 3-], E(S_Yes) = 0
S_No = [3+, 2-], E(S_No) = 0.9710
Gain(S, Lotion) = 0.9544 - [(3/8)·0 + (5/8)·0.9710] = 0.3476

Gain(S, Hair) = 0.4544
Gain(S, Height) = 0.2657
Gain(S, Weight) = 0.0157
Gain(S, Lotion) = 0.3476
Gain(S, Hair) is the maximum, so Hair becomes the root node.

Hair
Blonde: [Sarah, Dana, Annie, Katie] -> ?
Red: [Emily] -> Sunburned
Brown: [Alex, Pete, John] -> Not sunburned
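The four root gains above can be cross-checked in a few lines, here using SciPy's entropy routine (assuming SciPy is available; the gain helper and the per-value count lists are read off the table, and all names are illustrative):

    from scipy.stats import entropy

    def gain(parent_counts, per_value_counts):
        # Information gain from the parent class counts and per-value class counts.
        n = sum(parent_counts)
        remainder = sum(sum(c) / n * entropy(c, base=2) for c in per_value_counts)
        return entropy(parent_counts, base=2) - remainder

    S = [3, 5]  # 3 sunburned, 5 not sunburned
    print(gain(S, [[2, 2], [0, 3], [1, 0]]))  # Hair:   Blonde, Brown, Red    -> ~0.4544
    print(gain(S, [[2, 1], [0, 2], [1, 2]]))  # Height: Average, Tall, Short  -> ~0.2657
    print(gain(S, [[1, 1], [1, 2], [1, 2]]))  # Weight: Light, Average, Heavy -> ~0.0157
    print(gain(S, [[0, 3], [3, 2]]))          # Lotion: Yes, No               -> ~0.3476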

Repeating the procedure for S = [Sarah, Dana, Annie, Katie], i.e., S = [2+, 2-]:
Entropy(S) = 1
Find the information gain for the remaining three attributes: Height, Weight, and Lotion.

Name    Hair    Height   Weight   Lotion  Sunburned
Sarah   Blonde  Average  Light    No      Yes
Dana    Blonde  Tall     Average  Yes     No
Annie   Blonde  Short    Average  No      Yes
Katie   Blonde  Short    Light    Yes     No

For attribute 'Height':
Values(Height): [Average, Tall, Short], S = [2+, 2-]
S_Average = [1+, 0-], E(S_Average) = 0
S_Tall = [0+, 1-], E(S_Tall) = 0
S_Short = [1+, 1-], E(S_Short) = 1
Gain(S, Height) = 1 - [(1/4)·0 + (1/4)·0 + (2/4)·1] = 0.5

For attribute 'Weight':
Values(Weight): [Average, Light], S = [2+, 2-]
S_Average = [1+, 1-], E(S_Average) = 1
S_Light = [1+, 1-], E(S_Light) = 1
Gain(S, Weight) = 1 - [(2/4)·1 + (2/4)·1] = 0

For attribute 'Lotion':
Values(Lotion): [Yes, No], S = [2+, 2-]
S_Yes = [0+, 2-], E(S_Yes) = 0
S_No = [2+, 0-], E(S_No) = 0
Gain(S, Lotion) = 1 - [(2/4)·0 + (2/4)·0] = 1

Therefore, Gain(S, Lotion) is the maximum.

In this case, the final decision tree will be:

Hair
Blonde: Lotion
   Y: Not sunburned
   N: Sunburned
Red: Sunburned
Brown: Not sunburned
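A finished tree like this can be applied to new examples by walking from the root to a leaf. A minimal sketch (the nested-dictionary encoding and the classify helper are illustrative assumptions, not part of the slides):

    sunburn_tree = {
        "Hair": {
            "Blonde": {"Lotion": {"Yes": "Not sunburned", "No": "Sunburned"}},
            "Red": "Sunburned",
            "Brown": "Not sunburned",
        }
    }

    def classify(tree, example):
        # Follow decision nodes until reaching a leaf (a plain string label).
        while isinstance(tree, dict):
            attribute, branches = next(iter(tree.items()))
            tree = branches[example[attribute]]
        return tree

    print(classify(sunburn_tree, {"Hair": "Blonde", "Lotion": "No"}))  # Sunburned (like Sarah)
    print(classify(sunburn_tree, {"Hair": "Brown", "Lotion": "Yes"}))  # Not sunburned (like Alex)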

References "Machine Learning", by Tom Mitchell, McGraw-Hill, 1997 "Building Decision Trees with the ID3 Algorithm", by: Andrew Colin, Dr. Dobbs Journal, June html 1.html Professor Sin-Min Lee, SJSU.