Published by Albert Fields. Modified over 9 years ago.
1
Intelligent Database Systems Lab. Improving classification models with taxonomy information. Authors: Luca Cagliero, Paolo Garza (DKE, 2013). Presenter: CHANG, SHIH-JIE.
2
Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments
3
Motivation: Many approaches to building accurate classifiers have been proposed, but integrating taxonomy information into the data used for classifier training has not been investigated so far.
4
Objectives: This paper presents a general-purpose strategy that improves the accuracy of structured-data classifiers by exploiting a taxonomy built over the data items.
5
Definition. Aggregation tree. Definition. Multiple-taxonomy: Let T = {t_1, …, t_n} be a set of attributes. A multiple-taxonomy Θ = {AT_1, …, AT_m} is a forest of aggregation trees defined on the domains of the attributes in T.
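One way to picture the two definitions is as a parent map per attribute. The sketch below is only an illustration, not the paper's data structure: the class name `AggregationTree`, the `ancestors` helper, the Location values, and the simplification of one tree per attribute are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AggregationTree:
    """Generalization hierarchy over the domain of one attribute (hypothetical layout).

    `parent` maps each value to its direct generalization; the root has no entry.
    """
    attribute: str
    parent: Dict[str, str] = field(default_factory=dict)

    def ancestors(self, value: str) -> List[str]:
        """Return every generalization of `value`, nearest first."""
        result: List[str] = []
        seen = {value}
        while value in self.parent:
            value = self.parent[value]
            if value in seen:  # guard against a malformed (cyclic) hierarchy
                raise ValueError("aggregation tree contains a cycle")
            seen.add(value)
            result.append(value)
        return result


# Toy hierarchy (assumed values, not taken from the paper).
location_tree = AggregationTree(
    "Location",
    {"Turin": "Italy", "Rome": "Italy", "Paris": "France",
     "Italy": "Europe", "France": "Europe"},
)

# A multiple-taxonomy as a forest; here simplified to one tree per attribute name.
multi_taxonomy = {location_tree.attribute: location_tree}

print(multi_taxonomy["Location"].ancestors("Turin"))
```

Storing only the direct parent keeps the tree compact, and walking upward is all that is needed to generalize a value.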
6
Methodology
7
Methodology – Multiple-taxonomy over data items in D
8
Methodology – Two-step process: (i) Generalized classification rule mining, e.g. {(Location, Italy)} ⇒ {(User category, Entrepreneur)} (s = 50%, c = 100%): (1) an extended version of the training dataset is generated first; (2) an FP-tree-like representation of the extended dataset is then built, in which only frequent items are included. (ii) Rule selection by means of lazy pruning.
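Step (i)(1) and the support/confidence figures can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `PARENT` map, the four toy records, and the helper names `extend` and `support_confidence`. The records were chosen so that the example rule reaches s = 50% and c = 100%; they are not the paper's data, and the FP-tree step is not shown.

```python
# Toy taxonomy over Location values (hypothetical): value -> direct generalization.
PARENT = {"Turin": "Italy", "Rome": "Italy", "Paris": "France"}

# Toy training records (hypothetical): (set of (attribute, value) items, class label).
TRAIN = [
    ({("Location", "Turin")}, "Entrepreneur"),
    ({("Location", "Rome")}, "Entrepreneur"),
    ({("Location", "Paris")}, "Student"),
    ({("Location", "Paris")}, "Entrepreneur"),
]


def extend(items):
    """Return the items plus one generalized item per taxonomy ancestor."""
    extended = set(items)
    for attribute, value in items:
        while value in PARENT:
            value = PARENT[value]
            extended.add((attribute, value))
    return extended


def support_confidence(antecedent, label, dataset):
    """Support and confidence of the rule antecedent => label on `dataset`."""
    covered = [lbl for items, lbl in dataset if antecedent <= items]
    hits = sum(1 for lbl in covered if lbl == label)
    support = hits / len(dataset)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Extended training dataset: every record also carries its generalized items.
EXTENDED = [(extend(items), label) for items, label in TRAIN]

s, c = support_confidence({("Location", "Italy")}, "Entrepreneur", EXTENDED)
print(f"s={s:.0%}, c={c:.0%}")  # prints "s=50%, c=100%"
```

On the unextended `TRAIN` records the same rule matches nothing, because no record mentions Italy directly; the generalized rule only becomes minable after the extension step.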
9
Methodology – lazy pruning: (1) Rules that only misclassify training data are pruned. (2) Rules that correctly classify at least one training record are grouped in the Level I rule set, while rules that remain unused during the training phase are kept in Level II.
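The two pruning rules above can be sketched roughly as follows. The slide does not say what happens to training records once a rule has covered them; this sketch assumes they are removed so that later rules only see uncovered records. The function name `lazy_prune` and the rule/record layout are hypothetical.

```python
def lazy_prune(sorted_rules, training_data):
    """Split rank-sorted rules into Level I and Level II, dropping harmful rules.

    sorted_rules:  list of (antecedent_set, class_label), best rule first.
    training_data: list of (item_set, class_label).
    """
    remaining = list(training_data)
    level1, level2 = [], []
    for antecedent, label in sorted_rules:
        covered = [rec for rec in remaining if antecedent <= rec[0]]
        if not covered:
            # The rule was never used during training: keep it in Level II.
            level2.append((antecedent, label))
        elif any(rec_label == label for _, rec_label in covered):
            # At least one correct classification: the rule goes to Level I.
            level1.append((antecedent, label))
            # Assumption: covered records are removed from further consideration.
            remaining = [rec for rec in remaining if not antecedent <= rec[0]]
        # Otherwise the rule only misclassifies training data and is pruned.
    return level1, level2


rules = [({"a"}, "x"), ({"b"}, "y"), ({"c"}, "x")]
data = [({"a", "b"}, "x"), ({"b"}, "x")]
print(lazy_prune(rules, data))
```

In this toy run the first rule lands in Level I, the second is pruned because it only misclassifies, and the third is kept in Level II because it never fires.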
10
Methodology – The G−L3 algorithm
11
Methodology
12
Methodology – G−L3 class prediction: When a new test case rt has to be classified, G−L3 considers the sorted rule sets in Level I and Level II. The top-ranked Level I rule matching rt is used first. If no Level I rule matches rt, the top-ranked Level II rule matching rt is considered. If no rule in either set matches rt, the default class label is assigned to rt.
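The prediction procedure amounts to a two-stage lookup with a fallback. The following is a minimal sketch under assumed names (`predict`, `default_label`) and an assumed rule layout (antecedent set, class label), with both rule lists already sorted by rank; it is not the authors' implementation.

```python
def predict(record, level1, level2, default_label):
    """Return the class of the first matching rule, trying Level I before Level II.

    record:         set of items describing the test case.
    level1, level2: rank-sorted lists of (antecedent_set, class_label).
    """
    for rule_set in (level1, level2):
        for antecedent, label in rule_set:
            if antecedent <= record:  # the rule matches the test case
                return label
    return default_label  # no rule in either level matched


level1 = [({"a"}, "x")]
level2 = [({"c"}, "y")]
print(predict({"a", "c"}, level1, level2, "z"))  # Level I rule wins
print(predict({"c"}, level1, level2, "z"))       # falls back to Level II
print(predict({"d"}, level1, level2, "z"))       # falls back to the default
```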
13
Experiments – Dataset characteristics
14
Experiments – Accuracy comparison (baseline vs. extended)
15
Experiments – Accuracy comparison
16
Experiments
17
Experiments
18
Experiments
19
Experiments – Execution time comparison
20
Conclusions – Taxonomy integration is shown to yield significant accuracy improvements.
21
Comments: Advantages – more accurate. Applications – classification, data mining.