Mapping Between Taxonomies Elena Eneva 11 Dec 2001 Advanced IR Seminar.

1 Mapping Between Taxonomies Elena Eneva 11 Dec 2001 Advanced IR Seminar

2 Mapping Between Taxonomies
- Taxonomies: formal systems of orderly classification of knowledge, designed for a specific purpose
- Companies organize information in various ways (e.g. one taxonomy for marketing, another for product development)

3-12 Approach
[Diagram built up over ten slides. Labels: By country (German, French); By industry (Textile, Automobile); abc]

13 Datasets
Two classification schemes:
- Reuters 2001 (807900 docs)
  - Topics (127)
  - Industry categories (871)
  - Regions (376)
- Hoovers-255 and Hoovers-28 (4286 docs)
  - Industry categories (28)
  - Industry categories (255)

14 Learning
- 2 separate methods of learning for the documents:
  - Old doc category -> new doc category
  - Doc contents -> new category
- Combined method:
  - Weighted average based on confidence
  - Final result determined by a decision tree
- One combined learner – uses both old category and contents as features
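The "old doc category -> new doc category" method can be illustrated with a minimal sketch. This is not the learner used in the talk (the deck names C4.5 and Naïve Bayes); it only estimates P(new category | old category) from co-occurrence counts. The function names and the toy training pairs are invented for illustration.

```python
from collections import Counter, defaultdict

def fit_category_map(pairs):
    """Estimate P(new category | old category) from (old, new) training pairs."""
    counts = defaultdict(Counter)
    for old, new in pairs:
        counts[old][new] += 1
    return {
        old: {new: n / sum(c.values()) for new, n in c.items()}
        for old, c in counts.items()
    }

def predict_category(model, old):
    """Return (most likely new category, its probability); (None, 0.0) if old is unseen."""
    dist = model.get(old)
    if not dist:
        return None, 0.0
    best = max(dist, key=dist.get)
    return best, dist[best]

# Toy (old, new) pairs in the spirit of the country/industry example; hypothetical data.
train = [
    ("German", "Automobile"), ("German", "Automobile"), ("German", "Textile"),
    ("French", "Textile"), ("French", "Textile"),
]
model = fit_category_map(train)
print(predict_category(model, "German"))
```

The returned probability can serve as the confidence that a combined method weighs against a content-based prediction.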

15 Simple Learners
- Simple Decision Tree (C4.5) – learns probabilities of new categories based on 1 kind of feature:
  - Old categories (doesn't know about documents/words)
  - Word-based classification (doesn't know about old categories)
- Naïve Bayes (rainbow)
  - Old categories (doesn't know about documents/words)
  - Word-based classification (doesn't know about old categories)
- Support Vector Machine (SVM-Light)
  - Word-based classification (doesn't know about old categories), linear kernel [results will be reported in the final paper]
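As a rough illustration of word-based classification with Naïve Bayes, here is a small multinomial model with Laplace smoothing. It is a sketch written from scratch, not the rainbow toolkit named on the slide; the function names and the four toy documents are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_words, label). Collect the counts multinomial NB needs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab

def predict_nb(model, words):
    """Return the posterior distribution over labels for a bag of words."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed word likelihood
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise in log space to avoid underflow on long documents.
    top = max(log_scores.values())
    unnorm = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnorm.values())
    return {label: v / z for label, v in unnorm.items()}

# Hypothetical toy corpus.
docs = [
    ("engine wheel car".split(), "Automobile"),
    ("car engine factory".split(), "Automobile"),
    ("cotton fabric loom".split(), "Textile"),
    ("fabric dye cotton".split(), "Textile"),
]
nb_model = train_nb(docs)
posterior = predict_nb(nb_model, ["car", "engine"])
print(max(posterior, key=posterior.get))
```

The same code covers the "old categories" variant if each document is represented by its old category label instead of its words.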

16 Learning
- Using the document content
- Using the document labels
[Diagram labels: abc; DT, NB, SVM]

17 Combined Learners
- Weighted Average
- Voting scheme
- Combination Decision Tree
  - takes the outputs and confidences of two of the simple learners, predicts new category
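The first two combination schemes can be sketched as follows. The deck does not say how confidence is measured or how votes are counted, so this sketch assumes each learner outputs a probability distribution over new categories, takes a learner's confidence to be its highest probability, and uses a plain majority vote. All names and numbers are illustrative.

```python
from collections import Counter

def weighted_average(dists):
    """Combine label distributions, weighting each by its own confidence.

    Assumption: confidence = the learner's maximum class probability.
    """
    combined = Counter()
    total_weight = 0.0
    for dist in dists:
        weight = max(dist.values())
        total_weight += weight
        for label, p in dist.items():
            combined[label] += weight * p
    return {label: s / total_weight for label, s in combined.items()}

def vote(predictions):
    """Majority vote over predicted labels; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of a category-based and a word-based learner.
from_category = {"Automobile": 0.6, "Textile": 0.4}
from_words = {"Automobile": 0.1, "Textile": 0.9}

combined = weighted_average([from_category, from_words])
print(max(combined, key=combined.get))
print(vote(["Textile", "Automobile", "Textile"]))
```

Here the word-based learner is more confident, so its preference for Textile dominates the average. A combination decision tree would instead be trained on the two learners' outputs and confidences as input features.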

18 Learning
- Using both the content and the label
- Combining the two outputs
[Diagram labels: abc, DT; abc; DT, NB, SVM; voting; 3rd classifier]

19 Results Words Only  5-fold cross validation

20 Results Categories Only  5-fold cross validation

21 Results Combination  5-fold cross validation

22 Results
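The results are reported under 5-fold cross validation. A minimal sketch of that protocol follows, assuming contiguous folds, at least k items, and plain accuracy as the score; the deck does not state its splitting or scoring details. The `fit` and `predict` callables are placeholders for any of the learners.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size

def cross_validate(items, labels, fit, predict, k=5):
    """Mean accuracy over k folds.

    fit(train_items, train_labels) -> model; predict(model, item) -> label.
    Assumes len(items) >= k so that no fold is empty.
    """
    accuracies = []
    for train, test in k_fold_indices(len(items), k):
        model = fit([items[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, items[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Usage with a trivial majority-class learner on hypothetical data.
def fit_majority(train_items, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, item):
    return model

items = list(range(10))
labels = ["Textile"] * 10
print(cross_validate(items, labels, fit_majority, predict_majority))
```

In practice the items would usually be shuffled before splitting so that each fold sees a similar mix of categories.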

23 Remarks
- Hierarchy (old classes) is usually ignored
- Shown that it helps
- Learners are not the issue
- Better way of understanding
- Old label (or hierarchy path) is metadata

24 Remaining Work
- SVM results (running even as we speak)
- Repeat experiments on Reuters-2001
- Internal hierarchies
- Missing labels
- Less correlated types of classes
- Results in standard evaluation format

25 Future Work
- Try with a web dataset (Google and Yahoo! hierarchies)
- Hierarchies of more levels
- Metadata (for non-text sources)

26 Related Literature
- Y. Yang, S. Slattery, R. Ghani. A Study of Approaches to Hypertext Categorization. Journal of Intelligent Information Systems, Volume 18, Number 2, March 2002 (to appear).
- A. Doan, P. Domingos, A. Levy. Learning Mappings between Data Schemas. Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000, Austin, TX.

27 Questions and Suggestions The end.

29 Taxonomies
- Formal systems of orderly classification of knowledge, which are designed for a specific purpose
- Change of purpose, change of taxonomies
- Businesses often need and keep the information in several structures
- Important to be able to automatically map between taxonomies

30 Useful Mappings
- Companies organizing information in various ways (e.g. one taxonomy for marketing, another for product development)
- Personal online bookmark classification
- Search engines (e.g. Google, Yahoo)
- EU Committee for Standardization: “detailed overview of the existing taxonomies officially used in the EU, in order to derive general concepts such as: information organisation, properties, multilinguality, keywords, etc. and, last but not least, the mapping between.”

