Download presentation
Presentation is loading. Please wait.
Published byKeila Doidge Modified over 9 years ago
1
Learning to Map between Ontologies on the Semantic Web AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy Databases and Data Mining group University of Washington
2
Semantic Web Mark-up data on the web using ontologies Enable intelligent information processing over the web Personal software agents Queries over multiple web pages …
3
An Example Semantic Mappings allow information processing across ontologies www.cs.washington.edu www.cs.usyd.edu.au James Cook PhD, U Sydney Data Instance Find Prof. Cook, a professor in a Seattle college, earlier an assoc. professor at his alma mater in Australia Semantic Mapping People StaffFaculty ProfessorAssoc. Professor Asst. Professor AcademicTechnical ProfessorSenior Lecturer Staff Name Education ……
4
Semantic Web: State of the Art Languages for ontologies RDF, DAML+OIL,… Ontology learning and Ontology design tools [Maedche’02], Protégé, Ontolingua,… Semantic Mappings crucial to the SW vision [Uscold’01, Berners-Lee, et al.’01] Without semantic mappings…Tower of Babel !!!
5
Semantic Mapping Challenges Ontologies can be very different Different vocabularies, different design principles Overlap, but not coincide Semantic Mapping information Data instances marked up with ontologies Concept names and taxonomic structure Constraints on the mapping
6
Overview People StaffFaculty ProfessorAssoc. Professor Asst. Professor AcademicTechnical ProfessorSenior Lecturer Staff Faculty Define Similarity Sim(Fac,Staff) Sim(Fac,Prof) Sim(Fac,Acad) Compute Similarity Satisfy Constraints Academic ?
7
Our Contributions An automatic solution to taxonomy matching Handles different similarity notions Exploits information in data instances and taxonomic structure, using multi-strategy learning Extend solution to handle wide variety of constraints, using Relaxation Labeling GLUE An implementation, our GLUE system, and experiments on real-world taxonomies High accuracy (68-98%) on large taxonomies (100-330 concepts)
8
Defining Similarity Multiple Similarity measures in terms of the JPD Assoc. Prof Snr. Lecturer A, S A, S A, S A,S P(A, S) + P(A,S) + P( A,S) P(A,S) = Joint Probability Distribution: P(A,S),P( A,S),P(A, S),P( A, S) Hypothetical Common Marked up domain P(A S) P(A S) Sim(Assoc. Prof., Snr. Lect.) = [Jaccard, 1908]
9
No common data instances In practice, not easy to find data tagged with both ontologies ! United StatesAustralia Solution: Use Machine Learning A AA S SS
10
Machine Learning for computing similarities JPD estimated by counting the sizes of the partitions CL S S SSUnited States Australia A AA S SS CL A A AA A, SA,S A, S A,S A,S A,S A, S A, S
11
Improve Predictive Accuracy – Use Multi-Strategy Learning Single Classifier cannot exploit all available information Combine the prediction of multiple classifiers CL A 1 A AA A AA CL A N A AA … Content Learner Frequencies on different words in the text in the data instances Name Learner Words used in the names of concepts in the taxonomy Others … Meta-Learner
12
So far… Define Similarity Compute Similarity Satisfy Constraints Joint Probability Distribution Multi-strategy Learning
13
Constraints due to the taxonomy structure Domain specific constraints Department-Chair can only map to a unique concept Numerous constraints of different types Staff People Next Step: Exploit Constraints StaffFac Prof Assoc. ProfAsst. Prof AcadTech ProfSnr. Lect. Lect. People StaffParents Children Extended Relaxation Labeling to ontology matching
14
Solution: Relaxation Labeling Find the best label assignment given a set of constraints Start with an initial label assignment Iteratively improves labels, given constraints Standard Relaxation Labeling not applicable Extended in many ways Acad Staff People Staff Fac Prof Assoc. ProfAsst. Prof Tech ProfSnr. Lect. Lect.PeopleProf Assoc. Prof Asst. Prof Fac Prof ?Staff Snr. Lect. Lect.StaffAcad Prof Lect. ?
15
Distribution Estimator Joint Distributions:P(A,B),P(A, B),… Taxonomy O 2 (structure + data instances) Taxonomy O 1 (structure + data instances) Putting it all together GLUE System Relaxation Labeler Generic & Domain constraints Mappings for O 1, Mappings for O 2 Similarity Estimator Similarity Matrix Similarity function Distribution Estimator Meta Learner Learner CL 1 Learner CL N
16
Real World Experiments Taxonomies on the web University classes (UW and Cornell) Companies (Yahoo and The Standard) For each taxonomy Extracted data instances – course descriptions, and company profiles Trivial data cleaning 100 – 300 concepts per taxonomy 3-4 depth of taxonomies 10-90 average data instances per concept Evaluation against manual mappings as the gold standard
17
Results University IUniversity IICompanies
18
Related Work Our LSD schema matching system [Doan, Domingos, Halevy ’01] GLUE handles taxonomies, richer models, and a much richer set of constraints Other Ontology and Schema Matching work [Noy, Musen’01], [Melnik, et al.’02], [Ichise, et al.’01] Mostly heuristics, or single machine learning techniques Relaxation Labeling for constraint satisfaction [Hummel, Zucker’83], [Chakrabarti, et al.’00] Significantly extend this approach
19
Conclusions & Future Work An automated solution to taxonomy matching Handles multiple notions of similarity Exploits data instances and taxonomy structure Incorporates generic and domain-specific constraints Produces high accuracy results Future Work More expressive models Complex Mappings Automated reasoning about mappings between models
20
An Example Semantic Mappings allow information processing across ontologies www.cs.washington.edu www.cs.usyd.edu.au James Cook PhD, U Sydney Data Instance Find Prof. Cook, a professor in a Seattle college, earlier an assoc. professor at his alma mater in Australia Semantic Mapping People StaffFaculty ProfessorAssoc. Professor Asst. Professor AcademicTechnical ProfessorSenior Lecturer Staff Name Education ……
21
Acad Solution: Relaxation Labeling Iterative estimation of most likely label assignment Staff People Staff Fac Prof Assoc. ProfAsst. Prof Tech ProfSnr. Lect. Lect.StaffProf Snr. Lect. Lect. Tech Prof Lect. Prof Lect.Staff Challenges Making the computation tractable – large number of labels Combining effects of various constraintsPeopleProf Assoc. Prof Asst. Prof Acad ? Fac
22
People StaffFaculty ProfessorAssoc. Professor Asst. Professor AcademicTechnical ProfessorSenior Lecturer Staff Name Education …… Languages for Ontologies E.g. DAML+OIL Ontology Design Tools E.g. Protégé, Ontolingua,… Semantic Mapping
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.