MULTI-INTERVAL DISCRETIZATION OF CONTINUOUS-VALUED ATTRIBUTES FOR CLASSIFICATION LEARNING
KIRANKUMAR K. TAMBALKAR

What is Discretization?
- Discretization is the process of transforming continuous functions, models, and equations into discrete values.
- It is usually carried out as a first step towards making them suitable for numerical evaluation and implementation on digital computers.

Why Discretization?
- The main aim is to reduce the number of values a continuous attribute takes by mapping it to a discrete attribute.
- Typically the data is discretized into K partitions of equal length/width (equal intervals).
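For illustration, a minimal Python sketch of the equal-width binning mentioned above; the attribute values and the choice of K = 4 bins are hypothetical, not taken from the presentation:

```python
import numpy as np

# Hypothetical continuous attribute values and number of bins (K).
values = np.array([2.1, 3.7, 5.0, 8.4, 9.9, 12.3, 15.0, 18.6])
K = 4

# Equal-width binning: split the attribute's range into K intervals of equal width.
edges = np.linspace(values.min(), values.max(), K + 1)

# Assign each value the index (0 .. K-1) of the interval it falls into.
bins = np.clip(np.digitize(values, edges[1:-1]), 0, K - 1)
print(edges)
print(bins)
```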

Discretization
- Discretization of continuous-valued attributes.
- We first present a result about information entropy minimization.
- Heuristic for binary discretization (two-interval splits): a better understanding of the heuristic and its behavior, and formal evidence that supports the use of the heuristic in this context.

Binary Discretization
- A continuous-valued attribute is typically discretized during decision tree generation by partitioning its range into two intervals.
- A threshold value T for the continuous attribute A is determined, and the test A <= T is assigned to the left branch while A > T is assigned to the right branch.
- We call such a threshold value T a cut point.

What is Entropy?
- Entropy (also called expected information entropy) is a value that describes how consistently a potential split matches up with the class labels: it measures the impurity of the class distribution within a group.
- Example: suppose we look at the group of people below age 25. Out of that group, how many people can we expect to have an income above 50K versus below 50K?
- Lower entropy is better, and an entropy of 0 (all examples in the group share one class) is the best.
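To make the example above concrete, a small hand computation (the counts 3 and 7 are hypothetical, chosen only for illustration): if 3 of 10 people under 25 earn above 50K and 7 earn below, the entropy of that group is Ent = -(0.3 log2 0.3 + 0.7 log2 0.7) ≈ 0.881 bits, whereas a group in which everyone earns below 50K has entropy -(1.0 log2 1.0) = 0 bits.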

Data set example: each row is one training example described by two attribute values (one per feature) together with its class label.

Features (f1)    Features (f2)    Class Labels
a1               b1               class label
a2               b2               class label
a3               b3               class label
a4               b4               class label
a5               b5               class label
a6               b6               class label
a7               b7               class label
a8               b8               class label
a9               b9               class label

Algorithm: Binary Discretization
- We select an attribute for branching at a node holding a set S of N examples.
- For each continuous-valued attribute A we select the "best" cut point T_A from its range of values by evaluation.
- First we sort the examples into increasing order of attribute A; the midpoint between each successive pair of examples in the sorted sequence is evaluated as a potential cut point. Thus, for each continuous-valued attribute, N-1 evaluations take place.
- For each evaluation of a candidate cut point T, the data is partitioned into two sets and the class entropy of the resulting partition is computed.
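A minimal sketch of the candidate generation step just described, assuming the attribute values are given as a plain list (the function name and sample values are my own):

```python
def candidate_cut_points(values):
    """Return the midpoints between successive distinct values of a
    continuous attribute, in increasing order (at most N-1 candidates)."""
    sorted_vals = sorted(values)
    candidates = []
    for lo, hi in zip(sorted_vals, sorted_vals[1:]):
        if lo != hi:  # identical neighbouring values produce no usable midpoint
            candidates.append((lo + hi) / 2.0)
    return candidates

# Hypothetical attribute values
print(candidate_cut_points([4.0, 2.0, 6.5, 2.0, 9.0]))  # -> [3.0, 5.25, 7.75]
```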

Example

Example

Algorithm: Binary Discretization
- Let T partition the set S of examples into the subsets S1 and S2.
- Let there be k classes C1, ..., Ck, and let P(Ci, S) be the proportion of examples in S that have class Ci. The class entropy of a subset S is then defined as:
  Ent(S) = - Σ_{i=1..k} P(Ci, S) log2 P(Ci, S)
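A minimal Python sketch of Ent(S) as defined above; the function name and the example labels are my own:

```python
import math
from collections import Counter

def class_entropy(labels):
    """Ent(S): entropy of the class distribution of an example set S,
    where `labels` is the list of class labels of the examples in S."""
    if not labels:
        return 0.0
    n = len(labels)
    entropy = 0.0
    for count in Counter(labels).values():
        p = count / n  # P(Ci, S)
        entropy -= p * math.log2(p)
    return entropy

# Hypothetical labels: 3 positive and 7 negative examples
print(round(class_entropy(["+"] * 3 + ["-"] * 7), 3))  # ~0.881
```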

Algorithm: Class Entropy

Algorithm: Binary Discretization
- When the logarithm base is 2, Ent(S) measures, in bits, the amount of information needed to specify the classes in S.
- To evaluate the resulting class entropy after a set S is partitioned into two sets S1 and S2, we take the weighted average of their class entropies, as defined next.

Algorithm: Example
- For an example set S, an attribute A, and a cut point value T: let S1 be the subset of examples in S with A-values <= T, and S2 = S - S1.
- The class information entropy of the partition induced by T, E(A, T, S), is defined as:
  E(A, T, S) = (|S1| / |S|) Ent(S1) + (|S2| / |S|) Ent(S2)
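A minimal sketch of E(A, T, S) as defined above, assuming the examples are given as (attribute value, class label) pairs; the helper names and the sample data are my own:

```python
import math
from collections import Counter

def class_entropy(labels):
    """Ent(S) for a list of class labels."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def partition_entropy(examples, cut_point):
    """E(A, T, S): weighted class entropy after splitting the (value, label)
    pairs at the cut point T on attribute A."""
    s1 = [label for value, label in examples if value <= cut_point]
    s2 = [label for value, label in examples if value > cut_point]
    n = len(examples)
    return (len(s1) / n) * class_entropy(s1) + (len(s2) / n) * class_entropy(s2)

# Hypothetical examples: (attribute value, class label)
data = [(2.0, "no"), (3.0, "no"), (5.0, "yes"), (7.0, "yes"), (8.0, "yes")]
print(round(partition_entropy(data, 4.0), 3))  # 0.0: this cut separates the classes perfectly
```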

Algorithm: Example

Algorithm: Binary Discretization
- A binary discretization for A is determined by selecting the cut point T_A for which E(A, T_A, S) is minimal amongst all the candidate cut points.
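Putting the previous pieces together, a self-contained sketch of the cut point selection; this is an illustration under the same assumptions as above (my own helper names and sample data), not the authors' code:

```python
import math
from collections import Counter

def class_entropy(labels):
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut_point(examples):
    """Return (T_A, E(A, T_A, S)) minimizing the partition entropy over all
    midpoint candidates; `examples` is a list of (value, label) pairs."""
    examples = sorted(examples)
    n = len(examples)
    best = (None, float("inf"))
    for i in range(n - 1):
        if examples[i][0] == examples[i + 1][0]:
            continue  # no usable midpoint between identical values
        t = (examples[i][0] + examples[i + 1][0]) / 2.0
        s1 = [lab for val, lab in examples if val <= t]
        s2 = [lab for val, lab in examples if val > t]
        e = (len(s1) / n) * class_entropy(s1) + (len(s2) / n) * class_entropy(s2)
        if e < best[1]:
            best = (t, e)
    return best

data = [(2.0, "no"), (3.0, "no"), (5.0, "yes"), (7.0, "yes"), (8.0, "yes")]
print(best_cut_point(data))  # (4.0, 0.0)
```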

Gain of the Entropy
- Once we have found the cut point that is minimal amongst all the candidates, we compute the gain in entropy.
- How is the gain of entropy computed?

Gain of the Entropy
- Gain(A, T; S) = Ent(S) - E(A, T, S), i.e. the class entropy of S before the split minus the weighted class entropy after the split.
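Continuing the hypothetical five-example data set used in the sketch above: Ent(S) = -(0.4 log2 0.4 + 0.6 log2 0.6) ≈ 0.971 bits, and the best cut T_A = 4.0 gives E(A, T_A, S) = 0, so Gain(A, T_A; S) ≈ 0.971 bits.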

MDLPC Criterion (the Minimum Description Length Principle)
- Once we have the gain of the entropy, we are ready to state our decision criterion for accepting or rejecting a given partition, based on the MDLP.

MDLPC Criterion
- If the partition induced by a cut point T for a set S of N examples is accepted, the discretization goes through: the data is split at T, discrete values are assigned, and the same procedure is applied recursively to each resulting subset.
- If the partition induced by a cut point T for a set S of N examples is rejected, the selected cut point is not used and that interval is not split further.
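A sketch of the MDLPC acceptance test, following the decision rule given in Fayyad and Irani's paper: a cut T is accepted iff Gain(A, T; S) > log2(N - 1)/N + Delta(A, T; S)/N, where Delta(A, T; S) = log2(3^k - 2) - [k Ent(S) - k1 Ent(S1) - k2 Ent(S2)] and k, k1, k2 are the numbers of classes present in S, S1, and S2. The helper names below are my own, and this is an illustrative implementation rather than the authors' code:

```python
import math
from collections import Counter

def class_entropy(labels):
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdlpc_accepts(s_labels, s1_labels, s2_labels):
    """Return True if the MDLPC criterion accepts the split of S into S1, S2."""
    n = len(s_labels)
    ent_s, ent_s1, ent_s2 = map(class_entropy, (s_labels, s1_labels, s2_labels))
    # Weighted class entropy after the split: E(A, T, S)
    e = (len(s1_labels) / n) * ent_s1 + (len(s2_labels) / n) * ent_s2
    gain = ent_s - e
    # Numbers of distinct classes in S, S1, S2
    k, k1, k2 = (len(set(x)) for x in (s_labels, s1_labels, s2_labels))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * ent_s1 - k2 * ent_s2)
    return gain > (math.log2(n - 1) + delta) / n

# Hypothetical split: S1 is pure "no", S2 is pure "yes" -> accepted
print(mdlpc_accepts(["no"] * 8 + ["yes"] * 8, ["no"] * 8, ["yes"] * 8))  # True
```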

Empirical Evaluation
We compare four different decision strategies for deciding whether or not to accept a partition. The variations of the algorithm follow these criteria:
- Never Cut: the original binary-interval algorithm.
- Always Cut: always accept a cut unless all examples have the same class or the same value for the attribute.
- Random Cut: accept or reject by flipping a fair coin.
- MDLP Cut: the derived MDLPC criterion.

Results

Thank you