1
Merging Taxonomies
2
Assertion: Creation and maintenance of large ontologies will require the capability to merge taxonomies. This problem is similar to the problem of merging e-commerce catalogs. R. Agrawal, R. Srikant: On Catalog Integration. WWW-10
3
Catalog Integration Problem: Integrate products from a new catalog into the master catalog.
4
The Problem (cont.) After integration:
5
Desired Solution: Automatically integrate products, with little or no effort on the part of the user, in a domain-independent way. Problem size: a million products, thousands of nodes in the hierarchy.
6
How do we do it? Build a classification model (rules) using product descriptions in the master catalog. Example: If the product description contains "DRAM", the product is likely to be in the "Memory" category. Use the classification model to predict categories for products in the new catalog.
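The train-then-predict idea can be sketched with a small multinomial Naive Bayes text classifier. This is only an illustrative sketch: the four-product master catalog, the category names other than "Memory", whitespace tokenization, and Laplace smoothing are all assumptions made for the example, not details taken from the slides.

```python
import math
from collections import Counter, defaultdict

def train(master):
    """Build per-category word counts from (description, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of products
    vocab = set()
    for text, category in master:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def log_scores(model, text):
    """Return {category: log Pr(category) + sum of log Pr(word | category)}."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in the master catalog carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[category][word] + 1)
                              / (total_words + len(vocab)))
        scores[category] = score
    return scores

def predict(model, text):
    """Pick the category with the highest score."""
    scores = log_scores(model, text)
    return max(scores, key=scores.get)

# Hypothetical master catalog, for illustration only.
master = [
    ("16MB DRAM module", "Memory"),
    ("synchronous DRAM chip", "Memory"),
    ("8-bit microcontroller", "Processors"),
    ("32-bit RISC processor", "Processors"),
]
model = train(master)
prediction = predict(model, "low power DRAM")  # a product from the new catalog
```

Here a new-catalog description containing "DRAM" ends up in "Memory", mirroring the rule in the example above.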
7
National Semiconductor Files
8
National Semiconductor Files with Categories
9
Accuracy on Pangea Data: B2B portal for electronic components: 1200 categories, 40K training documents. 500 categories with < 5 documents. Accuracy: 72% for top choice, 99.7% for top 5 choices.
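The "top choice" and "top 5 choices" figures are top-k accuracies: a prediction counts as correct if the true category appears among the classifier's k highest-ranked guesses. A minimal sketch of that measurement follows; the ranked lists and labels are made-up illustrative data, not the Pangea results.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of items whose true label is among the first k ranked guesses."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# Hypothetical ranked guesses (best first) for four products.
ranked = [
    ["Memory", "Logic", "Analog"],
    ["Logic", "Memory", "Analog"],
    ["Analog", "Logic", "Memory"],
    ["Logic", "Analog", "Memory"],
]
truth = ["Memory", "Memory", "Logic", "Memory"]

top1 = top_k_accuracy(ranked, truth, 1)
top2 = top_k_accuracy(ranked, truth, 2)
top3 = top_k_accuracy(ranked, truth, 3)
```

A large gap between top-1 and top-k accuracy, as on the slide, suggests the classifier usually narrows the choice to a few candidate categories even when its first guess is wrong.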
10
Enhanced Algorithm: Use affinity information in the catalog to be integrated: products in the same category are similar. Bias the classifier to incorporate this information. The accuracy boost depends on the quality of the current catalog: use a tuning set to determine the amount of bias.
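Choosing the amount of bias with a tuning set can be sketched as a small search over candidate weights. The scoring form used here (text score plus weight times a node-affinity score, both in log space), the candidate weights, and the three tuning items are assumptions for illustration; the slides do not specify them.

```python
def classify(item, weight):
    """Pick the category maximizing text score + weight * node-affinity score.

    item is a (text_score, node_score) pair of dicts keyed by category.
    This combined score is a hypothetical stand-in for the biased classifier.
    """
    text_score, node_score = item
    return max(text_score, key=lambda c: text_score[c] + weight * node_score[c])

def choose_weight(candidates, tuning_set):
    """Return the candidate weight with the highest accuracy on the tuning set."""
    def accuracy(weight):
        correct = sum(classify(item, weight) == label for item, label in tuning_set)
        return correct / len(tuning_set)
    return max(candidates, key=accuracy)

# Hypothetical tuning set: products from one node whose true categories are known.
node_score = {"Memory": -0.2, "Logic": -1.6}  # the node leans toward "Memory"
tuning_set = [
    (({"Memory": -2.0, "Logic": -1.5}, node_score), "Memory"),
    (({"Memory": -1.0, "Logic": -3.0}, node_score), "Memory"),
    (({"Memory": -4.0, "Logic": -1.0}, node_score), "Logic"),
]
best = choose_weight([0.0, 1.0, 10.0], tuning_set)
```

In this toy data a weight of 0 ignores the node and misses the first product, while a weight of 10 trusts the node so much that it overrides the clear text evidence for the third; the moderate weight wins.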
11
Algorithm: Extension of Naive Bayes classification to incorporate affinity information.
12
Empirical Results
13
Improvement in Accuracy (Pangea)
14
Improvement in Accuracy (Reuters)
15
Improvement in Accuracy (Google.Outdoors)
16
Tune Set Size (Pangea)
17
Summary: The catalog integration technology can be directly used for creating and evolving large taxonomies. See WWW-2000 paper for experimental results on merging Yahoo and Google categorizations.
18
Naive Bayes Classifier
19
Naive Bayes Classifier (cont.)
20
Enhanced Classifier
21
Algorithm Outline: For each node S in the member hierarchy: i. Tentatively classify each product p in S using the standard model. ii. Use the results of step i to compute Pr(class | S). iii. Re-classify each product in S using the enhanced model.
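The three steps can be sketched for a single node S as follows. The outline does not say how Pr(class | S) is estimated, so the estimate below (master-catalog prior times the smoothed tentative count raised to a weight, then normalized) is an assumption, as are the `smoothing` parameter, the two class names, and the log-likelihood values standing in for the standard model's output.

```python
import math
from collections import Counter

def integrate_node(likelihoods, priors, weight, smoothing=1.0):
    """Classify all products of one node S in two passes.

    likelihoods: one dict per product, mapping class -> log Pr(product | class)
    priors:      class -> Pr(class) estimated from the master catalog
    weight:      amount of bias toward the node's dominant classes (0 = none)
    """
    classes = list(priors)

    # Step i: tentative classification with the standard model.
    tentative = [
        max(classes, key=lambda c: math.log(priors[c]) + ll[c])
        for ll in likelihoods
    ]

    # Step ii: estimate Pr(class | S) from the tentative labels (assumed form).
    counts = Counter(tentative)
    raw = {c: priors[c] * (counts[c] + smoothing) ** weight for c in classes}
    total = sum(raw.values())
    node_prior = {c: raw[c] / total for c in classes}

    # Step iii: re-classify, replacing Pr(class) with Pr(class | S).
    final = [
        max(classes, key=lambda c: math.log(node_prior[c]) + ll[c])
        for ll in likelihoods
    ]
    return tentative, node_prior, final

# Hypothetical node with four products; the last one is a borderline case.
priors = {"Memory": 0.5, "Logic": 0.5}
likelihoods = [
    {"Memory": -1.0, "Logic": -3.0},
    {"Memory": -1.2, "Logic": -2.5},
    {"Memory": -0.8, "Logic": -4.0},
    {"Memory": -2.2, "Logic": -2.0},
]
tentative, node_prior, final = integrate_node(likelihoods, priors, weight=2.0)
```

In this example the borderline fourth product is tentatively placed in "Logic", but because its three siblings in S go to "Memory", the second pass moves it to "Memory" as well. With `weight=0` the node prior equals the master-catalog prior and the second pass reproduces the first.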