1
Mapping Between Taxonomies
Elena Eneva
30 Oct 2001
Advanced IR Seminar
2
Idea Review
By country: German, French
By industry: Textile, Automobile
3
Learning Algorithms
2 separate learners for the documents
– Old doc category -> new doc category
– Doc contents -> new category
Put together
– Weighted average based on confidence
– Final result determined by a decision tree
One combined learner
– Uses both old category and contents as features
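The "weighted average based on confidence" option can be sketched as follows. This is a minimal illustration, not the system's actual code: the function name `combine_by_confidence` is hypothetical, and it assumes each learner outputs a probability distribution over the new categories, with a learner's confidence taken to be its highest probability.

```python
def combine_by_confidence(dist_a, dist_b):
    """Blend two category distributions, weighting each by its confidence.

    Assumption: confidence = the largest probability a learner assigns.
    """
    conf_a = max(dist_a.values())
    conf_b = max(dist_b.values())
    weight_a = conf_a / (conf_a + conf_b)
    weight_b = 1.0 - weight_a
    labels = set(dist_a) | set(dist_b)
    return {
        label: weight_a * dist_a.get(label, 0.0) + weight_b * dist_b.get(label, 0.0)
        for label in labels
    }


# Made-up outputs of the two learners for one document.
from_old_category = {"Textile": 0.9, "Automobile": 0.1}
from_contents = {"Textile": 0.4, "Automobile": 0.6}

combined = combine_by_confidence(from_old_category, from_contents)
predicted = max(combined, key=combined.get)
```

Here the category-based learner is more confident (0.9 vs. 0.6), so it receives weight 0.6 and the blended prediction is "Textile".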
4
Data Sets
Hoovers – 4285 documents
– 28 categories
– 255 categories
Reuters 2001 – 810597 documents
– Topics
– Industry categories
5
Current System
Simple Decision Tree (C4.5) – learns probabilities of new categories based on old categories (doesn’t know about documents/words)
Naïve Bayes (rainbow) – word-based classification into the new categories (doesn’t know about old categories)
Combination (Decision Tree) – takes the outputs and confidences of the two, predicts new category
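To illustrate what "learns probabilities of new categories based on old categories" could look like, here is a small count-based sketch. It is a stand-in for the idea only, not C4.5: with a single categorical feature, it simply estimates P(new category | old category) from co-occurrence counts. The function name `fit_category_map` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_category_map(pairs):
    """Estimate P(new category | old category) from (old, new) training pairs."""
    counts = defaultdict(Counter)
    for old, new in pairs:
        counts[old][new] += 1
    return {
        old: {new: n / sum(new_counts.values()) for new, n in new_counts.items()}
        for old, new_counts in counts.items()
    }


# Toy training pairs: (old category, new category).
training_pairs = [
    ("German", "Automobile"),
    ("German", "Automobile"),
    ("German", "Textile"),
    ("French", "Textile"),
]

category_probs = fit_category_map(training_pairs)
```

The resulting per-category distributions, together with the word-based classifier's outputs, are the kind of inputs a combining learner could consume.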
6
Current Results
Accuracy (%), five-fold cross validation

          NB tr   NB te   DT tr   DT te   Comb tr   Comb te
28p255    ?       21.14   30.02   26.19   67.72     30.26
255p28    ?       ?       100
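For reference, five-fold cross validation splits the documents into five disjoint folds and uses each fold once as the test set while training on the other four. The sketch below shows one way to generate such splits; the helper `k_fold_indices`, the shuffling, and the seed are assumptions for illustration, not details taken from the experiments.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 4285 is the Hoovers document count from the Data Sets slide.
splits = list(k_fold_indices(4285, k=5))
```

The reported "tr" and "te" accuracies would then be averages over the five training and test portions, respectively.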
7
Work in Progress
Naïve Bayes for 255 predict 28 (expect higher accuracies)
Use one classifier only (taking both kinds of features: words & old categories) – NB
An additional single simple classifier – KNN (and VNC-Light, if there is time in the end)
Run everything on Reuters 2001 (in addition to Hoovers)
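One possible way to give a single Naïve Bayes classifier both kinds of features is to add the old category to each document as an extra pseudo-token. The sketch below assumes that encoding; the `featurize` helper, the `__old__=` token prefix, the `TinyNaiveBayes` class, and the toy data are all hypothetical, and the classifier is a plain multinomial Naïve Bayes with add-one smoothing rather than rainbow.

```python
import math
from collections import Counter, defaultdict


def featurize(words, old_category):
    """Append the old category as a pseudo-token next to the document's words."""
    return list(words) + [f"__old__={old_category}"]


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for token in tokens:
                smoothed = self.token_counts[label][token] + 1
                score += math.log(smoothed / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy documents: words plus the old (by-country) category.
docs = [
    featurize(["engine", "car"], "German"),
    featurize(["fabric", "cotton"], "French"),
    featurize(["car", "wheel"], "German"),
    featurize(["cotton", "thread"], "French"),
]
labels = ["Automobile", "Textile", "Automobile", "Textile"]

model = TinyNaiveBayes().fit(docs, labels)
prediction = model.predict(featurize(["car"], "German"))
```

Treating the old category as just another token keeps the model simple, at the cost of giving it the same weight as any single word.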
8
Comments?
http://www.cs.cmu.edu/~eneva/tax.htm
The end.