By: Ashmi Banerjee (125186), Suman Datta ( ), CSE, 3rd year.
INTRODUCTION TO DECISION TREES
Decision tree learning is one of the most widely used and practical methods for inductive inference. It approximates discrete-valued functions, is robust to noisy data, and can learn disjunctive expressions. Decision tree methods are among the most popular inductive inference algorithms and have been applied successfully to a broad range of tasks, from learning to diagnose medical cases to learning to assess the credit risk of loan applicants.
DECISION TREE REPRESENTATION
A decision tree is a classification model whose structure consists of nodes and arcs. In general, a node is labelled with an attribute name, and an arc with a valid value of the attribute at the node from which the arc originates. The top-most node is called the root of the tree, and the bottom nodes are called the leaves. Each leaf is labelled with a class (a value of the class attribute). To classify an instance, the tree is traversed top-down, following at each node the arc whose attribute value matches the instance. The traversal ends at a leaf node, and the instance is assigned the class label of that leaf.
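The top-down traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own implementation: the nested-dictionary layout, the name `classify`, and the particular tree (the commonly cited PlayTennis tree with attributes Outlook, Humidity and Wind) are assumptions made for the example.

```python
# Assumed representation: an internal node is a dict holding an attribute
# name and one branch per attribute value; a leaf is a plain class label.
play_tennis_tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {
            "attribute": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
        "Overcast": "Yes",
        "Rain": {
            "attribute": "Wind",
            "branches": {"Strong": "No", "Weak": "Yes"},
        },
    },
}


def classify(node, instance):
    """Walk from the root to a leaf, following the arc that matches the instance."""
    while isinstance(node, dict):          # still at an internal node
        value = instance[node["attribute"]]  # the instance's value for the tested attribute
        node = node["branches"][value]       # follow the matching arc
    return node                            # leaf reached: this is the class label


sunny_humid_morning = {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}
label = classify(play_tennis_tree, sunny_humid_morning)
```

Here `label` ends up as the class stored at the leaf reached through Outlook = Sunny and Humidity = High. An attribute value with no matching arc raises a `KeyError`, which a fuller implementation would need to handle.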
TYPES OF ATTRIBUTES
1. Binary attributes
2. Nominal attributes
3. Ordinal attributes
4. Continuous attributes
The test condition for a binary attribute generates two potential outcomes.
A nominal attribute can have many values. Its test condition can produce a multiway split with one subgroup per distinct value of the attribute.
Ordinal attributes can also produce binary or multiway splits. Their values can be grouped as long as the grouping does not violate the order property of the attribute values.
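The difference between grouping nominal and ordinal values can be made concrete with a short sketch. The function names and the example values (shirt sizes, car types) are hypothetical and are not taken from the slides. For an ordinal attribute, only groupings that keep neighbouring values together are allowed, so a binary split is simply a cut point in the ordered list.

```python
from itertools import combinations


def nominal_binary_splits(values):
    """All ways to divide distinct nominal values into two non-empty groups."""
    values = list(values)
    first, rest = values[0], values[1:]
    splits = []
    # Keep the first value in the left group so that mirrored splits
    # (left and right swapped) are not listed twice.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = [first, *extra]
            right = [v for v in values if v not in left]
            splits.append((left, right))
    return splits


def ordinal_binary_splits(ordered_values):
    """Binary splits that respect the order: one cut point between neighbours."""
    values = list(ordered_values)
    return [(values[:i], values[i:]) for i in range(1, len(values))]


size_splits = ordinal_binary_splits(["Small", "Medium", "Large", "XL"])
type_splits = nominal_binary_splits(["Family", "Sports", "Luxury"])
```

With four ordered sizes there are three order-preserving binary splits; a grouping such as {Small, Large} against {Medium, XL} never appears because it breaks the order. A nominal attribute with k distinct values has 2^(k-1) - 1 binary splits, which gives three for the three car types.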
To illustrate a decision tree, consider the learning task represented by the training examples of the following table. Here the target attribute PlayTennis, which can have values yes or no for different Saturday mornings, is to be predicted based on other attributes of the morning in question.
AN ILLUSTRATIVE EXAMPLE (CONTD.)
BUT IS OUR MODEL A GOOD ONE? Will it predict correctly for all data?
MEASURES FOR SELECTING THE BEST FIT
The smaller the degree of impurity in the leaf nodes, the more skewed the class distribution in each leaf. Impurity is quantified by a measure such as entropy.
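As a worked illustration of one such impurity measure, the standard definition of entropy is given below. The symbols E and S follow the notation used in this deck; the class proportions p_i and the number of classes c are notation assumed here, since the original formula slides did not survive extraction.

```latex
% Entropy of a collection S whose examples fall into c classes,
% where p_i is the proportion of S belonging to class i
E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

% Worked example: S contains 9 positive and 5 negative examples, [9+, 5-]
E([9+, 5-]) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940
```

For a two-class problem, a node whose examples all belong to one class has E = 0 (no impurity), and a node split evenly between the two classes has E = 1 (maximum impurity).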
Humidity provides greater information gain than Wind, relative to the target classification. Here, E stands for entropy and S for the original collection of examples. Given an initial collection S of 9 positive and 5 negative examples, [9+, 5-], sorting these by their Humidity produces collections of [3+, 4-] (Humidity = High) and [6+, 1-] (Humidity = Normal). The information gained by this partitioning is 0.151, compared to a gain of only 0.048 for the attribute Wind.
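The comparison above can be reproduced with a short Python sketch. Information gain is computed as the entropy of the original collection minus the size-weighted average entropy of the partitions. The function names are hypothetical. The Humidity counts come from the text above; the Wind counts, [6+, 2-] for Weak and [3+, 3-] for Strong, are not stated on the slide and are assumed here from the standard PlayTennis data set.

```python
from math import log2


def entropy(counts):
    """Entropy of a collection described by its per-class example counts."""
    total = sum(counts)
    # Classes with zero examples contribute nothing and are skipped.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = sum(parent)
    weighted = sum(sum(part) / total * entropy(part) for part in partitions)
    return entropy(parent) - weighted


S = (9, 5)  # (positive, negative) counts for the original collection

# Humidity: High -> [3+, 4-], Normal -> [6+, 1-] (from the text)
gain_humidity = information_gain(S, [(3, 4), (6, 1)])

# Wind: Weak -> [6+, 2-], Strong -> [3+, 3-] (assumed, see lead-in)
gain_wind = information_gain(S, [(6, 2), (3, 3)])
```

Under these assumptions `gain_humidity` comes out near 0.1518, which the text reports as 0.151, and `gain_wind` comes out near 0.048, so Humidity is the better of the two attributes to test.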