1
Mapping Between Taxonomies
Elena Eneva
11 Dec 2001
Advanced IR Seminar
2
Mapping Between Taxonomies
- Taxonomies: formal systems of orderly classification of knowledge, designed for a specific purpose
- Companies organize information in various ways (e.g., one taxonomy for marketing, another for product development)
3
Approach
[Diagram: the same documents organized under two taxonomies, by country (German, French) and by industry (Textile, Automobile)]
7
Approach
[Diagram: the by-industry taxonomy alone (Textile, Automobile)]
8
Approach
[Diagram: a document ("abc") placed in the by-industry taxonomy (Textile, Automobile)]
10
Approach
[Diagram: a document ("abc") shown against both taxonomies, by country (German, French) and by industry (Textile, Automobile)]
13
Datasets
Two classification schemes:
- Reuters 2001 (807900 docs)
  - Topics (127)
  - Industry categories (871)
  - Regions (376)
- Hoovers-255 and Hoovers-28 (4286 docs)
  - Industry categories (28)
  - Industry categories (255)
14
Learning
- Two separate methods of learning for the documents:
  - Old document category -> new document category
  - Document contents -> new category
- Combined method:
  - Weighted average based on confidence
  - Final result determined by a decision tree
- One combined learner: uses both the old category and the contents as features
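The first method above (old category -> new category) can be read as estimating how often each old label co-occurs with each new label. The following is a minimal sketch under that assumption, not the method used in the talk: `fit_category_map` and the training pairs are hypothetical, and plain counting stands in for the actual learners.

```python
from collections import Counter, defaultdict

def fit_category_map(pairs):
    """Estimate P(new_category | old_category) from (old, new) label pairs."""
    counts = defaultdict(Counter)
    for old, new in pairs:
        counts[old][new] += 1
    # Turn raw co-occurrence counts into conditional probabilities per old label.
    return {
        old: {new: n / sum(c.values()) for new, n in c.items()}
        for old, c in counts.items()
    }

# Hypothetical training pairs: (label in the old taxonomy, label in the new one).
train = [
    ("German", "Automobile"),
    ("German", "Automobile"),
    ("German", "Textile"),
    ("French", "Textile"),
]
model = fit_category_map(train)

# Most likely new category for a document whose old label is "German".
best_for_german = max(model["German"], key=model["German"].get)
```

A learner of this kind never looks at the words of a document, which is why the slides pair it with a content-based learner.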
15
Simple Learners
- Simple Decision Tree (C4.5): learns probabilities of new categories based on one kind of feature:
  - Old categories (doesn't know about documents/words)
  - Word-based classification (doesn't know about old categories)
- Naïve Bayes (Rainbow):
  - Old categories (doesn't know about documents/words)
  - Word-based classification (doesn't know about old categories)
- Support Vector Machine (SVM-Light):
  - Word-based classification (doesn't know about old categories), linear kernel [results will be reported in the final paper]
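To make the word-based Naïve Bayes learner concrete, here is a small sketch of a multinomial Naïve Bayes classifier with add-one smoothing. It is an illustration only and does not reproduce Rainbow; the function names `train_nb` and `predict_nb` and the toy documents are assumptions introduced here.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_words, category). Returns the counts NB needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, cat in docs:
        class_counts[cat] += 1
        word_counts[cat].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    """Return {category: posterior probability} for a bag of words."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for cat, n_docs in class_counts.items():
        total_words = sum(word_counts[cat].values())
        score = math.log(n_docs / total_docs)  # class prior
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed word likelihood.
                score += math.log((word_counts[cat][w] + 1) / (total_words + len(vocab)))
        log_scores[cat] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

# Hypothetical toy documents labeled with new (by-industry) categories.
docs = [
    (["engine", "car"], "Automobile"),
    (["car", "wheel"], "Automobile"),
    (["cotton", "fabric"], "Textile"),
]
nb_model = train_nb(docs)
posterior = predict_nb(nb_model, ["car", "engine"])
```

The returned posterior doubles as a confidence value, which is what a combination step would consume.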
16
Learning
[Diagram: two separate learners (DT, NB, SVM), one using the document content ("abc") and one using the document labels]
17
Combined Learners
- Weighted average: voting scheme
- Combination decision tree: takes the outputs and confidences of two of the simple learners and predicts the new category
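The weighted-average combination can be sketched as follows. The slides do not say how confidence is defined, so this example assumes each learner's confidence is its highest predicted probability; `weighted_average` and the two input distributions are hypothetical.

```python
def weighted_average(dist_a, dist_b):
    """Combine two {category: probability} outputs, weighting each by its confidence.

    Assumption: a learner's confidence is the probability of its top category.
    """
    conf_a = max(dist_a.values())
    conf_b = max(dist_b.values())
    total = conf_a + conf_b
    cats = set(dist_a) | set(dist_b)
    return {
        c: (conf_a * dist_a.get(c, 0.0) + conf_b * dist_b.get(c, 0.0)) / total
        for c in cats
    }

# Hypothetical outputs of the label-based and the word-based learner.
from_labels = {"Automobile": 0.9, "Textile": 0.1}
from_words = {"Automobile": 0.4, "Textile": 0.6}

combined = weighted_average(from_labels, from_words)
prediction = max(combined, key=combined.get)
```

Here the more confident learner dominates, so the two learners' disagreement is resolved in favor of the label-based prediction. The combination decision tree replaces this fixed rule with a learned one.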
18
Learning
[Diagram: two ways of combining. Using both the content and the label: a single DT over the document ("abc"). Combining the two outputs: DT, NB, SVM outputs merged by voting or by a 3rd classifier]
19
Results: Words Only (5-fold cross-validation)
20
Results: Categories Only (5-fold cross-validation)
21
Results: Combination (5-fold cross-validation)
22
Results
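The results above are reported under 5-fold cross-validation. As a reminder of what that procedure involves, here is a minimal sketch; the helper names and the majority-class learner are hypothetical and only serve to make the example runnable.

```python
from collections import Counter

def k_fold_indices(n_items, k=5):
    """Yield (train_idx, test_idx) lists; each item is tested exactly once.

    Assumes n_items >= k so that no fold is empty.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx

def cross_validate(items, labels, train_fn, predict_fn, k=5):
    """Return the mean accuracy over k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(items), k):
        model = train_fn([items[j] for j in train_idx], [labels[j] for j in train_idx])
        correct = sum(predict_fn(model, items[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Hypothetical stand-in learner: always predicts the most common training label.
def train_majority(items, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, item):
    return model

splits = list(k_fold_indices(10, k=5))
mean_acc = cross_validate(list(range(10)), ["A"] * 10, train_majority, predict_majority)
```

Any of the simple or combined learners could be plugged in through `train_fn` and `predict_fn` in place of the majority-class stand-in.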
23
Remarks
- The hierarchy (old classes) is usually ignored
- Shown that it helps
- Learners are not the issue
- Better way of understanding
- The old label (or hierarchy path) is metadata
24
Remaining Work
- SVM results (running even as we speak)
- Repeat experiments on Reuters-2001
- Internal hierarchies
- Missing labels
- Less correlated types of classes
- Results in standard evaluation format
25
Future Work
- Try with a web dataset (Google and Yahoo! hierarchies)
- Hierarchies of more levels
- Metadata (for non-text sources)
26
Related Literature
- A Study of Approaches to Hypertext Categorization. Y. Yang, S. Slattery, and R. Ghani. Journal of Intelligent Information Systems, Volume 18, Number 2, March 2002 (to appear).
- Learning Mappings between Data Schemas. A. Doan, P. Domingos, and A. Levy. Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000, Austin, TX.
27
Questions and Suggestions
The end.
29
Taxonomies
- Formal systems of orderly classification of knowledge, designed for a specific purpose
- Change of purpose, change of taxonomies
- Businesses often need and keep the information in several structures
- Important to be able to automatically map between taxonomies
30
Useful Mappings
- Companies organizing information in various ways (e.g., one taxonomy for marketing, another for product development)
- Personal online bookmark classification
- Search engines (e.g., Google, Yahoo)
- EU Committee for Standardization: “detailed overview of the existing taxonomies officially used in the EU, in order to derive general concepts such as: information organisation, properties, multilinguality, keywords, etc. and, last but not least, the mapping between.”