
1 MULTI-INTERVAL DISCRETIZATION OF CONTINUOUS VALUED ATTRIBUTES FOR CLASSIFICATION LEARNING KIRANKUMAR K. TAMBALKAR

2 What is Discretization?  Discretization concerns the process of transferring continuous functions, models and equations into discrete values.  This process is usually carried out as a first step towards making them suitable for numerical evaluation and implementation on digital computers.

3 Why Discretization?  The main aim is to reduce the number of values of a continuous attribute by mapping it to a discrete attribute.  Typically, the data is discretized into partitions of K equal lengths/widths (equal-width intervals).
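
As a minimal illustration of equal-width binning (not taken from the slides; the ages array, the helper name equal_width_bins, and K = 3 are invented for the example), a Python sketch might look like this:

    import numpy as np

    def equal_width_bins(values, k):
        """Map a 1-D numeric array to k equal-width intervals, returning bin indices 0..k-1."""
        edges = np.linspace(values.min(), values.max(), k + 1)   # k+1 edges define k equal-width bins
        # digitize against the k-1 interior edges, so every value falls in a bin 0..k-1
        return np.digitize(values, edges[1:-1])

    ages = np.array([18, 22, 25, 31, 40, 47, 52, 63])
    print(equal_width_bins(ages, k=3))   # [0 0 0 0 1 1 2 2]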

4 Discretization  Discretization of continuous-valued attributes.  We first present the result about information entropy minimization and the heuristic for binary discretization (two-interval splits), then a better understanding of the heuristic and its behavior, and formal evidence that supports the usage of the heuristic in this context.

5 Binary Discretization  A continuous-valued attribute is typically discretized during decision tree generation by partitioning its range into two intervals. A threshold value T for the continuous attribute A is determined, and the test A <= T is assigned to the left branch while A > T is assigned to the right branch. We call such a threshold value T a cut point.
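
A tiny Python sketch of that threshold test (the function name split_by_threshold and the sample values are mine, not from the slides): examples with A <= T go to the left subset, the rest to the right.

    def split_by_threshold(values, labels, t):
        """Partition the examples at cut point T: S1 gets A <= T, S2 gets A > T."""
        s1 = [(x, y) for x, y in zip(values, labels) if x <= t]
        s2 = [(x, y) for x, y in zip(values, labels) if x > t]
        return s1, s2

    left, right = split_by_threshold([20, 31, 47], ["<=50K", ">50K", ">50K"], t=25)
    print(left)    # [(20, '<=50K')]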

6 What is Entropy?  Entropy, also called expected information entropy, is the value that describes how consistently a potential split lines up with the class labels. Ex: let's say we are looking at people below age 25. Out of that group, how many people can we expect to have an income above 50K or below 50K? Lower entropy is better, and an entropy value of 0 is the best.
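
A short Python sketch of this idea (the income labels below are invented): the class entropy of a group of examples, where a pure group scores 0 bits and an evenly mixed group scores 1 bit.

    from collections import Counter
    from math import log2

    def class_entropy(labels):
        """Ent(S) = -sum_i P(Ci, S) * log2 P(Ci, S), summed over the classes present in S."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    print(class_entropy([">50K", "<=50K", ">50K", "<=50K"]))   # 1.0   (evenly mixed group)
    print(class_entropy([">50K", ">50K", ">50K", "<=50K"]))    # ~0.811 (more consistent, so lower)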

7 Data set example

    Features (f1)    Features (f2)    Class Labels
    a1               b1               (class label)
    a2               b2               (class label)
    a3               b3               (class label)
    a4               b4               (class label)
    a5               b5               (class label)
    a6               b6               (class label)
    a7               b7               (class label)
    a8               b8               (class label)
    a9               b9               (class label)

8 Algorithm  Binary Discretization  We select an attribute for branching at a node having a set S of N examples. For each continuous-valued attribute A, we select the "best" cut point T_A from its range of values by evaluation: first we sort the given data into increasing order of attribute A, and the midpoint between each successive pair of examples in the sorted sequence is evaluated as a potential cut point. Thus, for each continuous-valued attribute, N-1 evaluations will take place. For each evaluation of a candidate cut point T, the data is partitioned into two sets and the class entropy of the resulting partition is computed.
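
A sketch of the candidate cut point generation described here, using the unsorted values from the example slide below; the helper name candidate_cut_points is mine.

    def candidate_cut_points(values):
        """Sort the attribute values; the midpoint of each successive pair is a potential cut point."""
        s = sorted(values)
        # skip duplicate neighbours, so at most N-1 candidates are produced
        return [(a + b) / 2 for a, b in zip(s, s[1:]) if a != b]

    print(candidate_cut_points([8, 9, 7, 3, 2, 5, 1, 6, 4, 10]))
    # [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]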

9 Example  [Figure: the example values 8, 9, 7, 3, 2, 5, 1, 6, 4, 10 arranged on a number line in increasing order]

10 Example

11 Algorithm  Binary Discretization  Let T partition the set S of examples into the subsets S1 and S2. Let there be k classes C1, ..., Ck, and let P(Ci, S) be the proportion of examples in S that have class Ci. The class entropy of a subset S is defined as:

12 Algorithm  Class Entropy  Ent(S) = - Σ_{i=1..k} P(Ci, S) · log2 P(Ci, S)

13 Algorithm  Binary Discretization  When the logarithm base is 2, Ent(S) measures the amount of information, in bits, needed to specify the classes in S. We then evaluate the resulting class entropy after a set S is partitioned into two sets S1 and S2.

14 Algorithm  Example  For an example set S, an attribute A, and a cut point value T: let S1 be the subset of examples in S with A-values <= T, and let S2 = S - S1. The class information entropy of the partition induced by T, E(A, T; S), is defined as:

15 Algorithm  Example  E(A, T; S) = (|S1| / |S|) · Ent(S1) + (|S2| / |S|) · Ent(S2)
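
A minimal Python sketch of E(A, T; S), reusing class_entropy and split_by_threshold from the earlier sketches; the ages/income toy data are invented for illustration.

    def partition_entropy(values, labels, t):
        """E(A, T; S) = |S1|/|S| * Ent(S1) + |S2|/|S| * Ent(S2) for the cut point T."""
        s1, s2 = split_by_threshold(values, labels, t)
        n = len(labels)
        return (len(s1) / n) * class_entropy([y for _, y in s1]) + \
               (len(s2) / n) * class_entropy([y for _, y in s2])

    ages   = [20, 23, 25, 31, 40, 47]
    income = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]
    print(partition_entropy(ages, income, t=24.0))   # ~0.541: the right-hand subset is still mixed
    print(partition_entropy(ages, income, t=28.0))   # 0.0 (may print as -0.0): both subsets are pure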

16 Algorithm  Binary Discretization  A binary discretization for A is determined by selecting the cut point T_A for which E(A, T_A; S) is minimum amongst all the candidate cut points.
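
Putting the pieces together, a hedged sketch of this selection step; candidate_cut_points and partition_entropy come from the earlier sketches, and ages/income are the toy data above.

    def best_cut_point(values, labels):
        """Return (T_A, E(A, T_A; S)) for the candidate with minimum class information entropy."""
        return min(((t, partition_entropy(values, labels, t))
                    for t in candidate_cut_points(values)),
                   key=lambda pair: pair[1])

    t_a, e_min = best_cut_point(ages, income)
    print(t_a, e_min)   # 28.0 -0.0 -- the midpoint that separates the two classes perfectly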

17 Gain of the entropy  Once we have found the minimum amongst all the candidate cut points, we then compute the gain in entropy.  How do we compute the gain of entropy?

18 Gain of the entropy  Gain(A, T; S) = Ent(S) - E(A, T; S)

19

20 MDLPC Criterion  The Minimum Description Length Principle:  Once we have the gain of the entropy, we are ready to state our decision criterion, based on the MDLP, for accepting or rejecting a given partition.

21 MDLPC Criterion  If the partition induced by a cut point T for a set S of N examples is accepted, the discretization proceeds and a discrete value is assigned to each resulting partition of the data.  If the partition induced by the cut point T is rejected, the selected cut point is discarded and candidate cut points are searched again in the given example set.
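
The slides do not spell out the acceptance test itself; as stated in Fayyad and Irani's paper, the MDLPC criterion accepts a cut point T iff Gain(A, T; S) > [log2(N - 1) + Delta(A, T; S)] / N, where Delta(A, T; S) = log2(3^k - 2) - [k · Ent(S) - k1 · Ent(S1) - k2 · Ent(S2)] and k, k1, k2 count the classes present in S, S1, S2. Below is a sketch of that test, reusing class_entropy, split_by_threshold, and the toy data from the earlier sketches.

    from math import log2

    def mdlp_accept(values, labels, t):
        """Accept the cut point T iff Gain(A, T; S) > (log2(N-1) + Delta(A, T; S)) / N."""
        s1, s2 = split_by_threshold(values, labels, t)
        y1, y2 = [y for _, y in s1], [y for _, y in s2]
        n, k = len(labels), len(set(labels))
        ent_s, ent_1, ent_2 = class_entropy(labels), class_entropy(y1), class_entropy(y2)
        gain = ent_s - (len(y1) / n * ent_1 + len(y2) / n * ent_2)
        delta = log2(3 ** k - 2) - (k * ent_s - len(set(y1)) * ent_1 - len(set(y2)) * ent_2)
        return gain > (log2(n - 1) + delta) / n

    print(mdlp_accept(ages, income, t=28.0))   # True: the cleanly separating cut is worth encoding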

22 Empirical Evaluation  We compare four decision strategies for deciding whether or not to accept a partition. The variations of the algorithm are as follows (a code sketch of the strategies appears after this list):
Never cut: the original binary-interval algorithm.
Always cut: always accept a cut unless all examples have the same class or the same value for the attribute.
Random cut: accept or reject by flipping a fair coin.
MDLP cut: the derived MDLPC criterion.
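
As a sketch only (not the authors' code), the four strategies can be treated as interchangeable accept/reject predicates; random_cut flips a fair coin as described, and mdlp_accept from the previous sketch plays the role of the MDLP cut.

    import random

    def never_cut(values, labels, t):
        # original binary-interval behaviour: no further partitions are accepted
        return False

    def always_cut(values, labels, t):
        # accept unless all examples share one class or one attribute value
        return len(set(labels)) > 1 and len(set(values)) > 1

    def random_cut(values, labels, t):
        # accept or reject by flipping a fair coin
        return random.random() < 0.5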

23 Results

24 Thank you

