Intelligent Database Systems Lab
Improving classification models with taxonomy information
Authors: Luca Cagliero, Paolo Garza (DKE, 2013)
Presenter: CHANG, SHIH-JIE
Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments
Motivation: Many approaches to building accurate classifiers have been proposed, but integrating taxonomy information into the data used for classifier training has not been investigated so far.
Objectives: This paper presents a general-purpose strategy to improve the accuracy of structured-data classifiers by exploiting a taxonomy built over the data items.
Definition. Aggregation tree
Definition. Multiple-taxonomy
Let T = {t_1, …, t_n} be a set of attributes. A multiple-taxonomy Θ = {AT_1, …, AT_m} is a forest of aggregation trees defined on the domains of the attributes in T.
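One possible way to picture a multiple-taxonomy is as one child-to-parent map per attribute. The sketch below is only an illustration: the names `multiple_taxonomy` and `ancestors`, and all the location values, are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# An aggregation tree for one attribute, stored as child -> parent links.
AggregationTree = Dict[str, str]

# A multiple-taxonomy: one aggregation tree per attribute (a forest).
# All values here are illustrative, not taken from the paper's datasets.
multiple_taxonomy: Dict[str, AggregationTree] = {
    "Location": {
        "Turin": "Italy",
        "Rome": "Italy",
        "Paris": "France",
        "Italy": "Europe",
        "France": "Europe",
    },
}


def ancestors(attribute: str, value: str,
              taxonomy: Dict[str, AggregationTree]) -> List[str]:
    """Return the generalizations of `value`, from nearest to the root."""
    tree = taxonomy.get(attribute, {})  # attributes without a tree have no ancestors
    result: List[str] = []
    while value in tree:
        value = tree[value]
        result.append(value)
    return result


print(ancestors("Location", "Turin", multiple_taxonomy))
```

Attributes with no aggregation tree simply return an empty list, so a taxonomy does not have to cover every attribute in T.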
Methodology
Methodology – Multiple-taxonomy over data items in D
Two-step process:
(i) Generalized classification rule mining.
    Example: {(Location, Italy)} ⇒ {(User category, Entrepreneur)} (support s = 50%, confidence c = 100%)
    (1) An extended version of the training dataset is generated first.
    (2) An FP-tree-like representation of the extended dataset is generated. Only frequent items are included in the FP-tree.
(ii) Rule selection by means of lazy pruning.
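The first step can be sketched as follows, under simplifying assumptions: each record is extended with the ancestors of its values, items below a minimum support are dropped, and the support and confidence of a generalized rule are counted directly. The toy dataset, the taxonomy, and the function names are hypothetical, and no actual FP-tree is built here.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, str]                # (attribute, value)
Labeled = Tuple[FrozenSet[Item], str]  # (items, class label)

# Hypothetical taxonomy and training data, chosen for illustration only.
TAXONOMY: Dict[str, Dict[str, str]] = {
    "Location": {"Turin": "Italy", "Rome": "Italy",
                 "Paris": "France", "Lyon": "France"},
}
TRAINING = [
    ({"Location": "Turin"}, "Entrepreneur"),
    ({"Location": "Rome"}, "Entrepreneur"),
    ({"Location": "Paris"}, "Student"),
    ({"Location": "Lyon"}, "Student"),
]


def extend_record(record: Dict[str, str]) -> FrozenSet[Item]:
    """Add every generalization of each value to the record."""
    items: Set[Item] = set()
    for attr, value in record.items():
        items.add((attr, value))
        tree = TAXONOMY.get(attr, {})
        while value in tree:
            value = tree[value]
            items.add((attr, value))
    return frozenset(items)


def frequent_items(data: List[Labeled], min_support: float) -> Set[Item]:
    """Keep only items whose relative frequency reaches min_support."""
    counts = Counter(item for items, _ in data for item in items)
    return {item for item, n in counts.items() if n / len(data) >= min_support}


def support_confidence(body: FrozenSet[Item], label: str,
                       data: List[Labeled]) -> Tuple[float, float]:
    """Support and confidence of the rule body => label."""
    body_count = sum(1 for items, _ in data if body <= items)
    rule_count = sum(1 for items, cls in data if body <= items and cls == label)
    confidence = rule_count / body_count if body_count else 0.0
    return rule_count / len(data), confidence


extended = [(extend_record(rec), cls) for rec, cls in TRAINING]
frequent = frequent_items(extended, min_support=0.5)
s, c = support_confidence(frozenset({("Location", "Italy")}),
                          "Entrepreneur", extended)
print(f"s={s:.0%}, c={c:.0%}")
```

On this toy data the city-level items are each too rare to reach the threshold, while the country-level items are frequent, which is the kind of case where generalized rules can add information.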
Methodology – Lazy pruning
(1) Rules that only misclassify training data are pruned.
(2) Rules that correctly classify at least one training record are grouped in the Level I rule set, while rules that remain unused during the training phase are kept in the Level II rule set.
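A minimal sketch of this selection step is given below. It assumes that rules arrive already sorted by rank and that the records covered by a kept rule are removed before the next rule is examined; both are assumptions of this sketch rather than details stated on the slide. The `Rule` class, `lazy_prune`, and the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Item = Tuple[str, str]
Record = Tuple[FrozenSet[Item], str]  # (items, class label)


@dataclass(frozen=True)
class Rule:
    body: FrozenSet[Item]
    label: str

    def matches(self, items: FrozenSet[Item]) -> bool:
        return self.body <= items


def lazy_prune(sorted_rules: List[Rule],
               training: List[Record]) -> Tuple[List[Rule], List[Rule]]:
    """Split rules into Level I (used correctly) and Level II (unused)."""
    remaining = list(training)
    level1: List[Rule] = []
    level2: List[Rule] = []
    for rule in sorted_rules:
        covered = [rec for rec in remaining if rule.matches(rec[0])]
        if not covered:
            level2.append(rule)  # never used on the training data
        elif any(cls == rule.label for _, cls in covered):
            level1.append(rule)  # at least one correct classification
            # Assumption: covered records are removed once a rule is kept.
            remaining = [rec for rec in remaining if not rule.matches(rec[0])]
        # Otherwise the rule only misclassifies and is pruned.
    return level1, level2


def body(*items: Item) -> FrozenSet[Item]:
    return frozenset(items)


training = [
    (body(("Location", "Turin"), ("Location", "Italy")), "Entrepreneur"),
    (body(("Location", "Rome"), ("Location", "Italy")), "Entrepreneur"),
    (body(("Location", "Paris"), ("Location", "France")), "Student"),
]
rule_a = Rule(body(("Location", "Italy")), "Entrepreneur")
rule_b = Rule(body(("Location", "France")), "Entrepreneur")
rule_c = Rule(body(("Location", "Turin")), "Entrepreneur")
rule_d = Rule(body(("Location", "France")), "Student")

level1, level2 = lazy_prune([rule_a, rule_b, rule_c, rule_d], training)
print(len(level1), len(level2))
```

Here `rule_b` is dropped because it only misclassifies, while `rule_c` goes to Level II because the records it would match were already covered by `rule_a`.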
Methodology – The G−L3 algorithm
Methodology – G−L3 class prediction
When a new test case rt has to be classified, G−L3 considers the sorted rule sets in Level I and Level II. The top-ranked Level I rule matching rt is applied first. If none of the Level I rules match rt, the top-ranked Level II rule matching rt is considered. If no rule in either set matches rt, the default class label is assigned to rt.
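The prediction procedure described above could be sketched as a two-level lookup. This is an illustration only: rules are represented as hypothetical `(body, label)` pairs already sorted by rank, and the test case is assumed to have been extended with its generalized items beforehand.

```python
from typing import FrozenSet, List, Tuple

Item = Tuple[str, str]
Rule = Tuple[FrozenSet[Item], str]  # (rule body, predicted class label)


def predict(items: FrozenSet[Item], level1: List[Rule],
            level2: List[Rule], default_label: str) -> str:
    """Return the label of the first matching rule, Level I before Level II."""
    for rule_set in (level1, level2):
        for rule_body, label in rule_set:  # each set is already sorted by rank
            if rule_body <= items:
                return label
    return default_label  # no rule in either level matches


# Hypothetical model and test cases.
level1 = [(frozenset({("Location", "Italy")}), "Entrepreneur")]
level2 = [(frozenset({("Location", "Paris")}), "Student")]

case_italy = frozenset({("Location", "Rome"), ("Location", "Italy")})
case_paris = frozenset({("Location", "Paris"), ("Location", "France")})
case_other = frozenset({("Location", "Berlin")})

print(predict(case_italy, level1, level2, "Employee"))
print(predict(case_paris, level1, level2, "Employee"))
print(predict(case_other, level1, level2, "Employee"))
```

Level II is consulted only when Level I has no matching rule, so a lower-ranked Level I match still takes precedence over any Level II rule.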
Experiments – Dataset characteristics
Experiments – Accuracy comparison (baseline vs. extended)
Experiments – Accuracy comparison
Experiments – Execution time comparison
Conclusions – Taxonomy integration is shown to yield significant accuracy improvements.
Comments
Advantages – More accurate.
Applications – Classification, data mining.