Multi-label Classification without Multi-label Cost – Multi-label Random Decision Tree Classifier (1. IBM Research – China; 2. IBM T.J. Watson Research Center)

Multi-label Classification without Multi-label Cost – Multi-label Random Decision Tree Classifier
Presenter: Xiatian Zhang
Authors: Xiatian Zhang, Quan Yuan, Shiwan Zhao, Wei Fan, Wentao Zheng, Zhong Wang
Affiliations: 1. IBM Research – China; 2. IBM T.J. Watson Research Center

Multi-label Classification
 Classical Classification (Single-label Classification)
–The classes are exclusive: if an example belongs to one class, it cannot belong to any other class
 Multi-label Classification
–A picture, video, or article may belong to several compatible categories
–A gene segment can control several biological functions
(Figure: an example image tagged with the labels Tree, Lake, Ice, Winter, Park)

Existing Multi-label Classification Methods
 Grigorios Tsoumakas et al. [2007] summarize the existing methods for multi-label classification
 Two strategies
–Problem Transformation: transform the multi-label classification problem into one or more single-label classification problems
–Algorithm Adaptation: adapt single-label classifiers to solve the multi-label classification problem directly, typically with high complexity

Problem Transformation Approaches
 Label Powerset (LP)
–Label Powerset treats each unique subset of labels that occurs in the multi-label dataset as a single class (e.g. {L1, L2, L3} becomes one class)
 Binary Relevance (BR)
–Binary Relevance learns one binary classifier per label: for each label Li, Classifier i learns Li+ vs. Li–
(Both transformations are illustrated in the sketch below)
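As a rough illustration (not from the slides), here is a minimal Python sketch of the two transformations on a toy label assignment Y; all variable names are hypothetical:

```python
# A minimal sketch (assumed, not from the slides) of the two transformations.
# Y holds one label set per training example.
Y = [{"L1", "L2"}, {"L2"}, {"L1", "L3"}, {"L2"}]

# Label Powerset: each distinct label subset becomes one single-label class,
# so ONE multi-class model is trained on these targets.
lp_targets = [frozenset(y) for y in Y]
print(set(lp_targets))  # 3 distinct classes for this toy data

# Binary Relevance: one binary target vector per label; train |L| independent
# binary classifiers, one per entry of br_targets.
labels = sorted(set().union(*Y))
br_targets = {lab: [1 if lab in y else 0 for y in Y] for lab in labels}
print(br_targets)  # {'L1': [1,0,1,0], 'L2': [1,1,0,1], 'L3': [0,0,1,0]}
```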

The Large-Number-of-Labels Problem
 Hundreds of labels, or even more, arise in practice
–Text categorization
–Protein function classification
–Semantic annotation of multimedia
 Impact on multi-label classification methods
–Label Powerset: the number of training examples for each particular label combination becomes much smaller
–Binary Relevance: computational complexity grows linearly with the number of labels
–Algorithm Adaptation: typically even worse than Binary Relevance

HOMER for the Large-Number-of-Labels Problem
 HOMER (Hierarchy Of Multilabel classifERs) was developed by Grigorios Tsoumakas et al.
 The HOMER algorithm constructs a hierarchy of multi-label classifiers, each one dealing with a much smaller set of labels.

Our Method – Without Label Cost
 Without label cost
–Training time is almost independent of the number of labels |L|
 But with reliable quality
–Classification quality is comparable to that of mainstream methods across different datasets
 How do we achieve this?

Our Method – Without Label Cost (cont.)
 A Binary Relevance method built on Random Decision Trees
 Random Decision Tree [Fan et al., 2003]
–The training process is independent of label information
–Random construction with very low cost
–Stable quality in many applications

Random Decision Tree – Tree Construction
 At each node, an unused feature is chosen randomly
–A discrete feature is unused if it has never been chosen previously on the decision path from the root to the current node
–A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen
 Construction stops when one of the following happens:
–A node becomes too small (<= 4 examples)
–The total height of the tree exceeds some limit, such as the total number of features
 The construction process is independent of label information (see the sketch below)
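A minimal Python sketch of this construction, under simplifying assumptions not stated on the slide (binary splits, all features treated as continuous, thresholds drawn from the data); note that labels never appear:

```python
import random

def build_rdt(X, depth=0, max_depth=None, min_size=4):
    """Sketch of random tree construction. Assumptions: binary splits,
    all features treated as continuous. Labels are never consulted."""
    n_features = len(X[0])
    if max_depth is None:
        max_depth = n_features             # slide: height limited by #features
    if len(X) <= min_size or depth >= max_depth:
        return {"leaf": True, "stats": {}}     # class counts filled in later
    f = random.randrange(n_features)           # feature chosen at random
    thr = random.choice(X)[f]                  # random threshold from the data
    left  = [x for x in X if x[f] <  thr]
    right = [x for x in X if x[f] >= thr]
    if not left or not right:                  # degenerate split -> make a leaf
        return {"leaf": True, "stats": {}}
    return {"leaf": False, "feature": f, "thr": thr,
            "left":  build_rdt(left,  depth + 1, max_depth, min_size),
            "right": build_rdt(right, depth + 1, max_depth, min_size)}
```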

Random Decision Tree – Node Statistics
 Classification and probability estimation:
–Each node of the tree keeps the number of examples belonging to each class
 Collecting these node statistics costs very little computation (see the sketch below)
(Figure: an example tree with tests F1<0.5, F2>0.7, F3>0.3 and leaf counts +:200/-:10 and +:30/-:70)
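Continuing the sketch above (and simplifying the slide's per-node counts to leaf-only counts), a single labeled pass fills in the statistics; `route` and `record_stats` are hypothetical helper names:

```python
def route(tree, x):
    """Follow x from the root down to its leaf."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] < tree["thr"] else tree["right"]
    return tree

def record_stats(tree, X, y):
    """One pass over the training data: count class members at each leaf.
    This is the only step that looks at labels, hence the low cost."""
    for x, label in zip(X, y):
        leaf = route(tree, x)
        leaf["stats"][label] = leaf["stats"].get(label, 0) + 1
```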

Random Decision Tree – Classification
 During classification, each tree outputs a posterior probability: the test instance x is routed to a leaf, e.g. the leaf with counts +:30/-:70, giving P(+|x) = 30/100 = 0.3
(Figure: the same example tree as on the previous slide)

Random Decision Tree – Ensemble
 For an instance x, average the estimated probabilities over all trees and take the average as the predicted probability for x
 Example: one tree gives P(+|x) = 30/100 = 0.3 and another gives P'(+|x) = 30/50 = 0.6, so the ensemble predicts (P(+|x) + P'(+|x))/2 = 0.45 (see the sketch below)
(Figure: two example trees routing x to leaves with counts +:30/-:70 and +:30/-:20)
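Reusing `route` from the node-statistics sketch, the per-tree posterior and its ensemble average might look like this (hypothetical helpers; the numbers in the comments match the slide's example):

```python
def posterior(tree, x, positive="+"):
    """Per-tree posterior, e.g. a leaf with +:30 / -:70 gives 30/100 = 0.3."""
    stats = route(tree, x)["stats"]
    total = sum(stats.values())
    return stats.get(positive, 0) / total if total else 0.0

def ensemble_posterior(trees, x):
    """Average over trees, e.g. (0.3 + 0.6) / 2 = 0.45 as on the slide."""
    return sum(posterior(t, x) for t in trees) / len(trees)
```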

Multi-label Random Decision Tree
 Each leaf now keeps counts per label, e.g. L1+:30/L1–:70 and L2+:50/L2–:50 in one tree; L1+:30/L1–:20 and L2+:20/L2–:80 in another
 Per-tree posteriors: P(L1+|x) = 30/100 = 0.3, P'(L1+|x) = 30/50 = 0.6; P(L2+|x) = 50/100 = 0.5, P'(L2+|x) = 20/100 = 0.2
 Ensemble averages: (P(L1+|x) + P'(L1+|x))/2 = 0.45 and (P(L2+|x) + P'(L2+|x))/2 = 0.35 (see the sketch below)
(Figure: two example trees with tests F1<0.5, F2>0.7, F3>0.3 and F3>0.5, F2<0.7, F1>0.7)
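A multi-label variant of the same sketch: each leaf keeps one positive counter per label plus the leaf size, and prediction averages the per-label posteriors across trees (again hypothetical helper names, reusing `route` from above):

```python
def ml_record_stats(tree, X, Y):
    """Multi-label pass: at each leaf, count examples and positives per label
    (Y[i] is the set of labels of example i)."""
    for x, labels in zip(X, Y):
        leaf = route(tree, x)
        leaf["n"] = leaf.get("n", 0) + 1
        for lab in labels:
            leaf.setdefault("pos", {})
            leaf["pos"][lab] = leaf["pos"].get(lab, 0) + 1

def ml_posterior(trees, x, label):
    """Average P(label+ | x) over trees, e.g. (0.3 + 0.6) / 2 = 0.45 for L1."""
    def one(tree):
        leaf = route(tree, x)
        n = leaf.get("n", 0)
        return leaf.get("pos", {}).get(label, 0) / n if n else 0.0
    return sum(one(t) for t in trees) / len(trees)
```

Note that one labeled pass fills the counters for all labels at once, which is why training cost does not grow with |L|.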

Why Does RDT Work?
 The ensemble learning view
–Our analysis
–Other explanations
 The non-parametric estimation view

Complexity of Multi-label Random Decision Trees
 Training complexity
–m is the number of trees, n is the number of instances
–t is the average number of labels at each leaf node, with t << n and t << |L|
–Training is independent of the number of labels |L|
–For comparison: the complexity of C4.5, which depends on V_i, the number of distinct values of the i-th attribute, and the complexity of HOMER
 Test complexity
–q is the average depth of the tree branches
–Testing is also independent of the number of labels |L|
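The complexity formulas themselves did not survive the transcript. As a hedged reconstruction from the variables defined above (m, n, t, q), and purely an assumption rather than the paper's stated bounds, the costs are plausibly:

```latex
% Plausible reconstruction (assumption, not verbatim from the paper):
% each of the m trees routes all n instances to average depth q and updates
% t label counters per leaf during training; at test time one instance
% traverses m trees and reads t counters at each leaf it reaches.
T_{\mathrm{train}} = O\!\big(m \cdot n \cdot (q + t)\big), \qquad
T_{\mathrm{test}} = O\!\big(m \cdot (q + t)\big)
% Neither expression contains |L|, matching the slide's claim that both
% training and testing are independent of the number of labels.
```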

Experiment – Metrics and Datasets
 Quality metrics: (table not preserved)
 Datasets: (table not preserved)

Experiment – Quality

Experiment – Computational Cost

Experiment – Computational Cost cont.

Experiment – Computational Cost cont.

Future Work
 Leverage the relationships among labels
 Apply ML-RDT to recommendation
 Parallelization and streaming implementation