oMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR
2 Outline Motivations oMAP A formal framework The different classifiers used Evaluation Conclusion
3 Motivations Heterogeneity of information systems Ontologies as a solution to data heterogeneity on the Web Ontologies are themselves heterogeneous: knowledge representation language degree of formalization Semantic Web More and more OWL/RDF ontologies on the Web Need for comparing/reusing/merging ontologies partially covering the same domain different version of the same ontology
4 Motivations (cont.) Distributed Information Retrieval Resource selection: The agent has to select a subset of some relevant resources Query reformulation: For every selected resource, the agent has to re-formulate its information need accordingly Data fusion & rank aggregation: The results from the selected resources have finally to be merged together.
5 Aligning Ontologies A matching operator: Input: a set of discrete entities (tables, XML elements, classes, properties…) Output: relationship holding between the entities (subsumption, equivalence, disjointness…) a confidence measure Automatic vs manual techniques Numerous work from various communities schema matching, machine learning, data integration
6 Example Equivalence Subsumption Disjointness
7 oMAP: A Formal Framework Inspirations: Formal work in data exchange [Fagin et al., 2003] GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003] Notations: A mapping is a tuple: M = (T, S, ∑ ) S et T are the source and target ontologies S i is an OWL entity (class, datatype property, object property) of the ontology ∑ is a set of mapping rules: α ij T j ← S i
8 oMAP: Overall Strategy A three step process: Form possible ∑ sets and estimate its quality based on the quality measures for its mapping rules For each mapping rule T j ← S i, estimate its confidence α ij which also depends on the ∑ it belongs to Use heuristics to build iteratively the final set of mappings
9 oMAP: Combining Classifiers Weight of a mapping rule: α ij = w (S i,T j, ∑ ) Using different classifiers: w (S i,T j, CL k ) is the classifier's approximation of the rule T j ← S i Combining the approximations: Use of a priority list: CL 1 CL 2 … CL n
10 Terminological Classifiers Same entity names (or URI) Same entity name stems
11 Terminological Classifiers String distance name WordNet distance name lcs is the longest common substring between S i and T j sim =
12 Machine Learning-Based Classifiers Collecting individuals: label for the named individuals data value for the datatype properties type for the anonymous individuals and the range of object properties Recursion on the OWL definition: depth parameter
13 Machine Learning-Based Classifiers Example Individual (x 1 type (Conference) value (label "Int. Conf. on WISE") value (location x 2 ) ) Individual (x 2 type (Address) value (city "New York city") value (country "USA") ) u1 = ("Int. Conf. on WISE", "Address") u2 = ("Address", "New York City", "USA") Naïve Bayes text classifier
14 Structural and Semantics- Based Classifier If S i and T j are property names: If S i and T j are concept names 1 : 1 Where D = D(S i ) * D(T j ) ; D(S i ) represents the set of concepts directly parent of S i
15 Structural and Semantics- Based Classifier Let C S =(QR.C) and D T =(Q’R’.D), then 1 : Let C S =(op C 1 …C m ) and D T =(op’ D 1 …D m ), then 2 : 1 Where Q,Q’ are quantifiers, R,R’ are property names and C,D concept expressions 2 Where op, op’ are concept constructors and n,m ≥ 1
16 Structural and Semantics- Based Classifier Possible values for w op and w Q weights w op w Q ⊓⊔ ¬ ⊓ 11/40 ⊔ 10 ¬1 1 1 n n m m 11/3 m 1
17 Evaluation More and more techniques / tools for aligning ontologies difficult to compare all the approaches theoretically pragmatism: evaluation campaign and contest I3CON : based on the NIST Text Retrieval Conference model EON : systematic benchmark tests on all OWL constructs OAEI : Alignment API [Euzenat, ISWC 2004] common format for representing / exchanging the alignments found tools and metrics for evaluating these alignments
18 3 series of tests on bibliographic ontologies: simple tests: identity, specialization/generalization of the language systematic tests: some features of the initial ontology are progressively discarded complex tests: aligning 4 real ontologies available on the Web The directory real world case consists of aligning web sites directory using the large dataset
19
20 Conclusion oMAP: a formal framework for aligning automatically OWL ontologies Combining several specific classifiers terminological classifiers machine learning-based classifiers structural and semantics-based classifier Empirical evaluation on benchmark tests using traditional information retrieval metrics machine resources, memory, computation time… not yet considered
21 Future Work Alignment: Using additional classifiers: kNN, KL-distance, WordNet or other terminological resources straightforward theoretically but practically difficult Finding complex alignment name = firstName + lastName Distributed Information Retrieval Automated relevant resource selection
22 Useful Links oMAP: Tutorial: Schema and Ontology ESWC Alignment API: OAEI: State of the Art: P. Shvaiko and J. Euzenat: A Survey of Shema-based Matching Approaches. Journal on Data Semantics (JoDS), 2005 KW Consortium: State of the Art on Ontology Alignment. Knowledge Web D2.2.3, 2004