Learning to Match Ontologies on the Semantic Web
AnHai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy
GLUE: identifies mappings between websites; uses machine learning; uses common-sense knowledge and domain constraints
Motivation: data comes from different ontologies; answers come from multiple web pages; manual mapping is very tedious, error-prone, and not very scalable
Outline: Overview of GLUE; GLUE Architecture; Case Studies; CGLUE Case Studies; Conclusion; Assessment
Overview: assumes two ontologies; 1-1 matching; similarity between two concepts, defined by computing the joint distribution P(A,B), P(A,~B), P(~A,B), P(~A,~B); machine learning: multistrategy learning, exploiting domain constraints; data instances
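Overview [Figure: GLUE architecture. Taxonomies O1 and O2 feed the base learners L1 ... Lk and the meta-learner M, which produce joint distributions; the Similarity Estimator applies a similarity function to produce the similarity matrix; the Relaxation Labeler combines the matrix with common knowledge and domain constraints to output mappings for the taxonomies.]
Distribution Estimator [Figure: taxonomies O1 and O2 feed base learners L1 ... Lk; the meta-learner M combines their predictions to produce the joint distributions.]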
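Distribution Estimator [Figure: taxonomy O1 with nodes R, D, C, A, F, E and instances t1 to t7; the instances are split into the groups t1,t2,t3,t4 and t5,t6,t7, which are used to train learner L.]
Distribution Estimator [Figure: taxonomy O2 with nodes G, H, B, J, I and instances s1 to s6, grouped as s1,s2,s3,s4 and s5,s6; the trained learner L is applied to these instances.]
Distribution Estimator [Figure: resulting partition of the O2 instances: s1,s3 / s5 / s6 / s2,s4.]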
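The partitioning in the three Distribution Estimator slides can be turned into a joint distribution by counting instances. The following is a minimal sketch, assuming each taxonomy's instances have already been sorted into the four cells (A,B), (A,~B), (~A,B), (~A,~B). The function name `estimate_joint`, the cell encoding, and the assignment of the t and s instances to particular cells are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

# Each cell is (in_A, in_B); e.g. (True, False) stands for "A and not B".
CELLS = [(True, True), (True, False), (False, True), (False, False)]

def estimate_joint(u1, u2):
    """Estimate P(A,B), P(A,~B), P(~A,B), P(~A,~B) by counting.

    u1, u2: dicts mapping a cell to the set of instances of taxonomy
    O1 (resp. O2) that fall into that cell.
    """
    total = sum(len(s) for s in u1.values()) + sum(len(s) for s in u2.values())
    return {
        cell: Fraction(len(u1.get(cell, ())) + len(u2.get(cell, ())), total)
        for cell in CELLS
    }

# Hypothetical cell assignment for the O1 instances t1..t7.
u1 = {
    (True, True): {"t1", "t2"},
    (True, False): {"t3", "t4"},
    (False, True): {"t5"},
    (False, False): {"t6", "t7"},
}
# Assumed reading of the final partition slide for the O2 instances s1..s6.
u2 = {
    (True, True): {"s1", "s3"},
    (True, False): {"s5"},
    (False, True): {"s2", "s4"},
    (False, False): {"s6"},
}

joint = estimate_joint(u1, u2)
for cell in CELLS:
    print(cell, joint[cell])
```

Exact fractions are used so that the four probabilities visibly sum to one; floats would work equally well.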
Multistrategy Learning: Base Learners: Content Learner (word frequency, Naïve Bayes); Name Learner (full name; specific and descriptive elements); Meta-Learner
Meta-Learner: combines the base learners; gives each learner a weight; weights come from user input
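One simple way to combine base learners with user-supplied weights is a weighted average of their confidence scores. The sketch below assumes that form; the function name `combine`, the learner names, and all numbers are illustrative rather than taken from the slides.

```python
def combine(predictions, weights):
    """Weighted average of base-learner scores.

    predictions: {learner_name: {label: score}}
    weights:     {learner_name: weight} (supplied by the user)
    """
    total_weight = sum(weights[name] for name in predictions)
    labels = set()
    for scores in predictions.values():
        labels.update(scores)
    return {
        label: sum(weights[name] * scores.get(label, 0.0)
                   for name, scores in predictions.items()) / total_weight
        for label in labels
    }

# Illustrative scores for one instance and one concept A.
predictions = {
    "content": {"A": 0.8, "not A": 0.2},
    "name": {"A": 0.3, "not A": 0.7},
}
weights = {"content": 0.6, "name": 0.4}

combined = combine(predictions, weights)
print(combined)
```

Dividing by the total weight keeps the combined scores on the same scale as the inputs even when the weights do not sum to one.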
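Similarity Estimator [Figure: the joint distributions and a similarity function enter the Similarity Estimator, which outputs the similarity matrix.]
Similarity Estimator: applies a similarity function supplied by the user (e.g., Jaccard-sim); outputs a similarity matrix between concepts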
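The Jaccard coefficient can be written in terms of the joint distribution as P(A,B) / (P(A,B) + P(A,~B) + P(~A,B)). The sketch below applies it to every concept pair to build a similarity matrix; the function names, the dictionary layout, and the concept names and probabilities are illustrative assumptions.

```python
def jaccard_sim(p_ab, p_a_notb, p_nota_b):
    """Jaccard similarity from three cells of the joint distribution."""
    denom = p_ab + p_a_notb + p_nota_b
    return p_ab / denom if denom > 0 else 0.0

def similarity_matrix(joints):
    """joints: {(concept_a, concept_b): (P(A,B), P(A,~B), P(~A,B))}.

    Returns {(concept_a, concept_b): similarity}.
    """
    return {pair: jaccard_sim(*cells) for pair, cells in joints.items()}

# Hypothetical concept pairs and probabilities.
joints = {
    ("Faculty", "Academic-Staff"): (0.30, 0.05, 0.05),
    ("Faculty", "Courses"): (0.00, 0.35, 0.40),
}
matrix = similarity_matrix(joints)
for pair, sim in matrix.items():
    print(pair, round(sim, 3))
```

P(~A,~B) does not appear in the formula, so two concepts are not made more similar by the instances that belong to neither.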
Where are we? Find similarities; compute similarities; satisfy constraints
Relaxation Labeler [Figure: the similarity matrix, common knowledge, and domain constraints enter the Relaxation Labeler, which outputs mappings for the taxonomies.]
Constraints: domain-independent (general knowledge); domain-dependent (interaction between two nodes); model each as a feature f()
Domain Independent
Relaxation Labeler: searches for the best mapping given the constraints; each node's label is influenced by its “neighborhood”; performs local optimization
Local Optimization: 1. Assign initial labels. 2. Perform optimization. 3. Use a formula to change a label. 4. Repeat steps 2-3.
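The four steps above can be sketched as a small iterative loop. This is a simplified illustration, not the update formula from the slides: it assumes each node starts with its most similar label, and that a hypothetical `compat` function scores how well a candidate label agrees with the labels currently held by neighboring nodes. All names and numbers are assumptions.

```python
import math

def relax(nodes, labels, init_sim, neighbors, compat, max_iters=50):
    """Greedy relaxation-style labeling.

    init_sim[x][l]: initial similarity between node x and label l.
    neighbors[x]:   nodes whose labels influence x.
    compat(l, m):   compatibility of label l with a neighbor's label m.
    """
    # Step 1: assign initial labels from the similarity scores alone.
    assign = {x: max(labels, key=lambda l: init_sim[x][l]) for x in nodes}
    for _ in range(max_iters):
        changed = False
        for x in nodes:
            # Steps 2-3: rescore each label given the neighborhood.
            def score(l):
                return init_sim[x][l] * math.prod(
                    compat(l, assign[n]) for n in neighbors[x])
            best = max(labels, key=score)
            if best != assign[x]:
                assign[x] = best
                changed = True
        # Step 4: repeat until no label changes.
        if not changed:
            break
    return assign

nodes = ["X1", "X2"]
labels = ["P", "Q"]
init_sim = {"X1": {"P": 0.9, "Q": 0.1}, "X2": {"P": 0.55, "Q": 0.45}}
neighbors = {"X1": ["X2"], "X2": ["X1"]}

def compat(label, neighbor_label):
    # Hypothetical constraint: neighbors sharing one label is discouraged.
    return 0.2 if label == neighbor_label else 1.0

mapping = relax(nodes, labels, init_sim, neighbors, compat)
print(mapping)
```

In this toy run both nodes initially prefer P, and the constraint pushes X2 to Q because X1 has the stronger claim on P. Like any local optimization, the result depends on the starting labels and is not guaranteed to be globally best.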
Local Optimization [Formula legend: X is a node in taxonomy O1; the label comes from taxonomy O2; the formula conditions on everything we know and on the other label assignments to all nodes besides X.]
Local Optimization
Where are we? [Figure: the GLUE architecture diagram from the Overview slide, repeated: taxonomies O1 and O2, base learners L1 ... Lk, meta-learner M, joint distributions, Similarity Estimator, similarity matrix, Relaxation Labeler with common knowledge and domain constraints, mappings for the taxonomies.]
Case Study: university catalogs; business profiles; for each one: entire set of data instances; cleaned it up
Results
Improvements: insufficient training data; local optimization; additional base learners; ambiguous best match
CGLUE
Beam Search: uses structure and data; no relaxation labeling (no constraints)
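Beam search keeps only the few best candidates at each step and expands those. The sketch below is a generic illustration under strong assumptions: candidates are unions of O2 concepts, they are scored against one O1 concept by Jaccard overlap of instance sets, and only data instances are used (the slide says the actual search also uses structure). The function names, concept names, and instance sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two instance sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank(candidates, beam_width):
    """Keep the top candidates; ties are broken by concept names."""
    return sorted(candidates, key=lambda c: (-c[0], sorted(c[1])))[:beam_width]

def beam_search(target, concepts, beam_width=2, max_size=3):
    """Find a union of concepts whose instances best match `target`.

    target:   set of instances of one O1 concept.
    concepts: {name: set of instances} for the O2 concepts.
    Returns (score, frozenset of concept names).
    """
    beam = rank([(jaccard(target, inst), frozenset([name]))
                 for name, inst in concepts.items()], beam_width)
    best = beam[0]
    for _ in range(max_size - 1):
        expanded = {}
        for _, combo in beam:
            for name in concepts:
                if name in combo:
                    continue
                new = combo | {name}
                covered = set().union(*(concepts[n] for n in new))
                expanded[new] = jaccard(target, covered)
        if not expanded:
            break
        beam = rank([(s, c) for c, s in expanded.items()], beam_width)
        if beam[0][0] > best[0]:
            best = beam[0]
    return best

target = {1, 2, 3, 4}
concepts = {
    "Undergrad-Courses": {1, 2},
    "Grad-Courses": {3, 4},
    "Seminars": {4, 5},
}
score, combo = beam_search(target, concepts)
print(score, sorted(combo))
```

A wider beam explores more candidate unions at higher cost; with a width of one the search reduces to greedy hill climbing.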
CGLUE Case Study
Improvements: incorporate domain constraints; object identification
Conclusion: semantic similarity; multistrategy learning; relaxation labeling; CGLUE
Assessment: data instances; additional sites?; CGLUE; future work