Induction of Decision Trees

Induction of Decision Trees Blaž Zupan and Ivan Bratko magix.fri.uni-lj.si/predavanja/uisp

An Example Data Set and Decision Tree
[Figure: a small sailing data set and the decision tree induced from it, with attributes outlook (sunny/rainy), company (no/med/big), and sailboat (small/big), and class values yes/no.]

Classification
[Figure: classifying a new example with the tree: outlook is tested first (sunny/rainy); along the rainy branch, company (no/big/med) and then sailboat (small/big) are tested until a yes/no leaf is reached.]

Induction of Decision Trees
- Data set (learning set): each example = attributes + class
- Induced description = decision tree
- TDIDT: Top-Down Induction of Decision Trees (recursive partitioning)

Some TDIDT Systems
- ID3 (Quinlan, 1979)
- CART (Breiman et al., 1984)
- Assistant (Cestnik et al., 1987)
- C4.5 (Quinlan, 1993)
- See5 (Quinlan, 1997)
- ...
- Orange (Demšar, Zupan, 1998-2003)

Analysis of Severe Trauma Patients Data
Attributes: the worst pH value at ICU (PH_ICU) and the worst activated partial thromboplastin time (APPT_WORST). The induced tree:
- PH_ICU < 7.2: Death, 0.0 (0/15)
- PH_ICU 7.2-7.33: test APPT_WORST
  - APPT_WORST < 78.7: Well, 0.82 (9/11)
  - APPT_WORST >= 78.7: Death, 0.0 (0/7)
- PH_ICU > 7.33: Well, 0.88 (14/16)
PH_ICU and APPT_WORST are exactly the two factors (theoretically) advocated as the most important ones in the study by Rotondo et al., 1997.

Breast Cancer Recurrence
[Figure: tree induced by Assistant Professional, splitting on Degree of Malignancy (< 3 / >= 3), Tumor Size (< 15 / >= 15), Involved Nodes (< 3 / >= 3), and Age; leaves give case counts such as "no rec 125 / recurr 39".]
Interesting: the accuracy of this tree compared to that of medical specialists.

Prostate Cancer Recurrence
[Figure: tree splitting on Secondary Gleason Grade (1,2 / 3 / 4 / 5), PSA Level (≤ 14.9 / > 14.9), Stage (T1c, T2a, T2b, T2c / T1ab, T3), and Primary Gleason Grade (2,3 / ...), with No/Yes leaves.]

TDIDT Algorithm
Also known as ID3 (Quinlan). To construct decision tree T from learning set S:
- If all examples in S belong to some class C, then make a leaf labeled C.
- Otherwise:
  - select the "most informative" attribute A,
  - partition S according to A's values,
  - recursively construct subtrees T1, T2, ... for the subsets of S.
A minimal sketch of this recursion follows below.
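
Below is a minimal Python sketch of the recursion (my illustration, not code from the slides). The example format, a list of (attribute-dict, class) pairs, and the pluggable select function for picking the "most informative" attribute (e.g. by information gain, defined on later slides) are assumptions made for the sketch.

    from collections import Counter

    def tdidt(examples, attributes, select):
        """Construct a decision tree from `examples`, a list of (attrs, cls) pairs."""
        classes = [cls for _, cls in examples]
        # If all examples in S belong to some class C, make a leaf labeled C.
        if len(set(classes)) == 1:
            return classes[0]
        # No attribute left to test: fall back to the majority class.
        if not attributes:
            return Counter(classes).most_common(1)[0][0]
        # Select the "most informative" attribute A ...
        a = select(examples, attributes)
        remaining = [attr for attr in attributes if attr != a]
        # ... partition S according to A's values and recurse on every subset.
        subtrees = {}
        for value in {attrs[a] for attrs, _ in examples}:
            subset = [(attrs, cls) for attrs, cls in examples if attrs[a] == value]
            subtrees[value] = tdidt(subset, remaining, select)
        return (a, subtrees)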

TDIDT Algorithm
[Figure: the resulting tree T: attribute A at the root, one branch for each of A's values v1, v2, ..., vn, and subtrees T1, T2, ..., Tn below the branches.]

Another Example

Simple Tree
[Figure: Outlook at the root; the sunny branch tests Humidity (high: N, normal: P), the overcast branch is a P leaf, and the rainy branch tests Windy (yes: N, no: P).]

Complicated Tree
[Figure: a much larger tree for the same data, rooted at Temperature (hot/cold/moderate) and repeatedly testing Outlook, Windy, and Humidity before reaching the same P/N leaves.]

Attribute Selection Criteria
Main principle: select the attribute which partitions the learning set into subsets that are as "pure" as possible.
Various measures of purity:
- information-theoretic
- Gini index
- chi-square (χ²)
- ReliefF
- ...
Various improvements:
- probability estimates
- normalization
- binarization, subsetting

Information-Theoretic Approach
- To classify an object, a certain amount of information I is needed.
- After we have learned the value of attribute A, we only need some remaining amount of information to classify the object: Ires, the residual information.
- Gain(A) = I − Ires(A)
- The most "informative" attribute is the one that minimizes Ires, i.e., maximizes the Gain.

Entropy vs. Gini vs. Classification Error
[Figure: the three impurity measures plotted against the class probability; all are zero for a pure node and maximal at a uniform class distribution.]

Entropy
The average amount of information I needed to classify an object is given by the entropy measure: I = −Σc p(c) · log2 p(c).
For a two-class problem: [Figure: entropy as a function of p(c1), equal to 0 at p(c1) = 0 or 1 and peaking at 1 bit for p(c1) = 0.5.]
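
For concreteness, the entropy just defined, together with the Gini index and classification error compared two slides earlier, can be written in a few lines of Python (a sketch of the standard definitions, not code from the slides); each takes a node's list of class probabilities and returns 0 for a pure node:

    from math import log2

    def entropy(probs):
        """I = -sum over classes of p(c) * log2 p(c)."""
        return -sum(p * log2(p) for p in probs if p > 0)

    def gini(probs):
        """Gini index: 1 - sum of p(c)^2."""
        return 1.0 - sum(p * p for p in probs)

    def classification_error(probs):
        """Misclassification error: 1 - max p(c)."""
        return 1.0 - max(probs)

    # Two-class sanity check: entropy peaks at 1 bit when p(c1) = 0.5.
    print(entropy([0.5, 0.5]))  # 1.0
    print(entropy([1.0]))       # -0.0: a pure node needs no further information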

Residual Information
- After applying attribute A, S is partitioned into subsets according to the values v of A.
- Ires is equal to the weighted sum of the amounts of information for the subsets: Ires(A) = −Σv p(v) Σc p(c|v) · log2 p(c|v)
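
Continuing the sketch, residual information and gain follow directly from these definitions; this reuses the entropy helper and the (attribute-dict, class) example format assumed earlier:

    from collections import Counter

    def class_probs(examples):
        """Class probability list p(c) for a set of (attrs, cls) examples."""
        counts = Counter(cls for _, cls in examples)
        return [n / len(examples) for n in counts.values()]

    def residual_info(examples, attribute):
        """I_res(A): subset entropies weighted by the subset probabilities p(v)."""
        i_res = 0.0
        for value in {attrs[attribute] for attrs, _ in examples}:
            subset = [(attrs, cls) for attrs, cls in examples
                      if attrs[attribute] == value]
            i_res += len(subset) / len(examples) * entropy(class_probs(subset))
        return i_res

    def gain(examples, attribute):
        """Gain(A) = I - I_res(A)."""
        return entropy(class_probs(examples)) - residual_info(examples, attribute)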

Triangles and Squares

Triangles and Squares Data Set: a set of classified objects
[Figure: a collection of triangles and squares, each drawn with a color, an outline style, and an optional dot.]

Entropy
5 triangles, 9 squares.
Class probabilities: p(triangle) = 5/14, p(square) = 9/14.
Entropy: E = −(5/14) · log2(5/14) − (9/14) · log2(9/14) = 0.940 bits
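Spelling out the arithmetic as a worked step (using log2(5/14) ≈ −1.485 and log2(9/14) ≈ −0.637):

    E = (5/14) · 1.485 + (9/14) · 0.637 ≈ 0.530 + 0.410 = 0.940 bits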

Entropy reduction by data set partitioning
[Figure: the data set split by Color? into red, yellow, and green subsets.]

Entropy of the Attribute's Values
[Figure: the entropy of each subset produced by the Color? split (red, yellow, green), computed separately.]

Information Gain
[Figure: the information gain of the Color? split, computed from the weighted subset entropies.]

Information Gain of the Attributes
- Gain(Color) = 0.246
- Gain(Outline) = 0.151
- Gain(Dot) = 0.048
Heuristic: the attribute with the highest gain is chosen. This heuristic is local (local minimization of impurity); a usage sketch follows below.
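
To make the greedy choice concrete, here is how the gain helper sketched earlier would drive it. The five examples below are hypothetical stand-ins, not the 14 objects from the slides, so their gain values will not reproduce the 0.246/0.151/0.048 figures:

    # Hypothetical mini data set in the (attribute-dict, class) format used above.
    examples = [
        ({"Color": "red",    "Outline": "solid",  "Dot": "yes"}, "triangle"),
        ({"Color": "red",    "Outline": "dashed", "Dot": "no"},  "square"),
        ({"Color": "green",  "Outline": "solid",  "Dot": "no"},  "square"),
        ({"Color": "yellow", "Outline": "dashed", "Dot": "yes"}, "triangle"),
        ({"Color": "yellow", "Outline": "solid",  "Dot": "no"},  "square"),
    ]

    # Greedy, local step: pick the attribute with the highest information gain.
    best = max(["Color", "Outline", "Dot"], key=lambda a: gain(examples, a))
    print(best)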

After splitting on Color, the same criterion is applied within each subset. Within the yellow subset: Gain(Outline) = 0.971 − 0 = 0.971 bits, Gain(Dot) = 0.971 − 0.951 = 0.020 bits, so Outline wins.
[Figure: the partially grown tree after the Color? split (red, green, yellow branches).]

Within the red subset: Gain(Outline) = 0.971 − 0.951 = 0.020 bits, Gain(Dot) = 0.971 − 0 = 0.971 bits, so Dot wins.
[Figure: the tree after adding an Outline? split (solid/dashed) under the yellow branch.]

[Figure: the assembled tree: Color? at the root, with a Dot? test under the red branch and an Outline? test under the yellow branch.]

Decision Tree
The final induced tree:
- Color = red: test Dot (yes: triangle, no: square)
- Color = green: square
- Color = yellow: test Outline (dashed: triangle, solid: square)
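
Read back as a classifier, and assuming the reconstruction above (Dot tested under red, Outline under yellow, green always a square), the final tree is just nested conditionals; a sketch:

    def classify(obj):
        """Walk the induced tree for one object given as an attribute dict."""
        if obj["Color"] == "red":
            return "triangle" if obj["Dot"] == "yes" else "square"
        if obj["Color"] == "green":
            return "square"
        # Color == yellow: the outline decides.
        return "triangle" if obj["Outline"] == "dashed" else "square"

    print(classify({"Color": "red", "Outline": "solid", "Dot": "yes"}))  # triangle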