Ordinal Decision Trees
Qinghua Hu, Harbin Institute of Technology
October 20, 2010

Outline
1. Problem of ordinal classification
2. Rule learning for ordinal classification
3. Evaluating attribute quality with rank entropy in ordinal classification
4. Constructing ordinal decision trees
5. Experimental analysis
6. Conclusions and future work

1. Ordinal classification

There are two classes of classification tasks:
- Nominal classification: assign nominal class labels to objects according to their features.
- Ordinal classification: assign ordinal class labels to objects according to their criteria.

1. Ordinal classification

Nominal classes vs. ordinal classes: take disease diagnosis as an example.

1. Ordinal classification

Nominal classes vs. ordinal classes. [Table: flu cases with decisions slight, moderate, severe.] There is an ordinal structure on the decision, namely the severity of flu: severe > moderate > slight.

1. Ordinal classification

Nominal classification and inconsistent samples: in the nominal setting, samples with the same feature values should receive the same decision; samples violating this are inconsistent. Different consistency assumptions are therefore used in nominal and ordinal classification.

1. Ordinal classification

[Table: flu cases with decisions slight, moderate, severe.] Ordinal classification: the better the feature values, the better the decision. A sample that takes worse feature values but gets a better decision is inconsistent.

1. Ordinal classification

Ordinal classification occurs in a wide range of applications, such as:
- Production quality measurement
- Bank credit analysis
- Disease or fault severity evaluation
- Submission or project review
- Social investigation analysis
- …

1. Ordinal classification

Different consistency assumptions are used. Nominal classification: objects taking the same or similar feature values should be classified into the same class; otherwise, the task is not consistent. Formally, if x = y, then d(x) = d(y).

1. Ordinal classification

Different consistency assumptions are used. Ordinal classification: objects taking better feature values should be classified into better classes; otherwise, the task is not consistent. Formally, if x >= y, then d(x) >= d(y); see the sketch below.
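To make the ordinal assumption concrete, here is a minimal Python sketch (not part of the original slides; the toy flu data and function names are illustrative assumptions) that tests monotonic consistency of a labeled sample set:

```python
# A minimal sketch (not from the slides): the ordinal consistency assumption
# as an executable check. The toy flu data below is an illustrative assumption.

def dominates(x, y):
    """True if x takes values at least as good as y on every attribute."""
    return all(a >= b for a, b in zip(x, y))

def is_monotonically_consistent(X, d):
    """Ordinal assumption: x >= y on all features must imply d(x) >= d(y)."""
    return all(not (dominates(X[i], X[j]) and d[i] < d[j])
               for i in range(len(X)) for j in range(len(X)))

# Grades 0..2 for (fever, cough); decisions 0=slight, 1=moderate, 2=severe.
X = [(0, 0), (1, 1), (2, 2)]
print(is_monotonically_consistent(X, [0, 1, 2]))  # True: better features, better decisions
print(is_monotonically_consistent(X, [0, 2, 1]))  # False: worse features, better decision
```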

2. Rule learning for ordinal classification

2. Rule learning for ordinal classification

Decision tree algorithms for nominal classification:
- CART: Classification and Regression Trees (Breiman et al., 1984)
- ID3, C4.5, See5: R. Quinlan, 1986, 1993, 2004

Disadvantage in ordinal classification: these algorithms adopt information entropy and mutual information to evaluate the capability of features in classification, which does not consider the ordinal structure in ordinal data. Even given a consistent data set, these algorithms may output inconsistent rules.

2. Rule learning for ordinal classification

The most important issue in constructing decision trees is to design a measure of feature quality and to select the best feature to split the samples.

3. Attribute quality in ordinal classification

Ordinal information (Q. Hu, D. Yu, et al., 2010).

3. Attribute quality in ordinal classification

$[x_i]_B^{\geq}$: the subset of samples whose feature values are at least as good as those of $x_i$ in terms of the attributes in $B$.
$[x_i]_d^{\geq}$: the subset of samples whose decisions are at least as good as that of $x_i$.
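To make the notation concrete, here is a small helper (an assumed sketch, not the authors' code; the toy rows are illustrative) that computes these dominance sets directly from the definition:

```python
# A sketch computing the dominance sets defined above; rows are tuples of
# attribute values, so the decision column is passed as 1-tuples.

def dominance_set(i, rows):
    """[x_i]^>= : indices j such that sample j is at least as good as
    sample i on every attribute in the given rows."""
    return frozenset(j for j, r in enumerate(rows)
                     if all(a >= b for a, b in zip(r, rows[i])))

features = [(0, 1), (1, 1), (2, 2)]   # values on attribute set B
decisions = [(0,), (1,), (1,)]        # decision column d
print(dominance_set(0, features))     # [x_0]_B^>=  -> frozenset({0, 1, 2})
print(dominance_set(1, decisions))    # [x_1]_d^>=  -> frozenset({1, 2})
```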

3. Attribute quality in ordinal classification

Shannon's entropy, written in terms of the equivalence classes $[x_i]_B = \{x_j : x_j =_B x_i\}$, is defined as

$$H(B) = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_B|}{n},$$

where $|\cdot|$ denotes the number of elements of a set.

3. Attribute quality in ordinal classification

Replacing the equivalence classes with the dominance sets defined above gives the ascending rank entropy (Hu et al., 2010):

$$RH_B^{\geq}(U) = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_B^{\geq}|}{n}.$$

If B is a set of attributes and C is a decision, then RMI can be viewed as a coefficient of ordinal relevance between B and C, so it reflects the capability of B in predicting C.

3. Attribute quality in ordinal classification the ascending rank mutual information between X and Y. If we consider x is a feature, y is a decision, then we can see RMI reflects the ordinal consistency

4. Ordinal tree construction

Given a set of training samples, how do we induce a decision model from the data? (REOT)
1. Compute the rank mutual information between each feature and the decision, based on the samples in the root node.
2. Select the feature with maximal rank mutual information and split the samples according to that feature's values.
3. For each resulting node, compute the rank mutual information between each feature and the decision based on the samples in that node and select the best feature; repeat until each node is pure.

A compact sketch of this loop appears below.
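Here is a compact sketch of that induction loop, reusing the rank_mutual_information helper sketched earlier. The dict-based tree and majority-vote leaves are assumptions for illustration, not the authors' code:

```python
# A hedged sketch of the induction steps above; assumes rank_mutual_information
# and dominance_set from the earlier snippets are in scope.

def build_ordinal_tree(samples, labels, feature_ids):
    if len(set(labels)) == 1 or not feature_ids:   # pure node, or no feature left
        return {"leaf": max(set(labels), key=labels.count)}
    # steps 1-2: pick the feature with maximal RMI on the samples in this node
    best = max(feature_ids,
               key=lambda f: rank_mutual_information([(s[f],) for s in samples],
                                                     labels))
    node = {"feature": best, "children": {}}
    # step 3: split on the selected feature's values and recurse on each child
    for v in sorted({s[best] for s in samples}):
        idx = [i for i, s in enumerate(samples) if s[best] == v]
        node["children"][v] = build_ordinal_tree(
            [samples[i] for i in idx], [labels[i] for i in idx],
            [f for f in feature_ids if f != best])
    return node

tree = build_ordinal_tree([(0, 0), (1, 0), (1, 1), (2, 1)], [0, 0, 1, 1], [0, 1])
print(tree)
```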

5. Experimental analysis

An artificial data set: 30 samples, 2 attributes, 5 classes. [Figure: the learned partition, highlighting inconsistent rules.]

5. Experimental analysis

6. Conclusions and future work

- Ordinal classification learning is very sensitive to noise: a few noisy samples may completely change the evaluation of feature quality, so a robust measure of feature quality is desirable.
- Rank mutual information combines the advantages of information entropy and dominance rough sets. The new measure not only captures ordinal consistency but is also robust to noisy information.
- The proposed ordinal decision tree algorithm produces monotonically consistent decision trees when the given training sets are monotonically consistent. It also yields a more precise decision model than CART and REOT when the data sets are not consistent.

6. Conclusions and future work

- In real-world applications, some features are ordinal and others are nominal; this is the most general case. We should distinguish ordinal from nominal features and exploit the proper information structures hidden in each. We will develop algorithms for learning rules from such mixed features in future work.