Berendt: Advanced databases, first semester 2008
Advanced databases – Defining and combining heterogeneous databases: Ontology matching
Bettina Berendt
Katholieke Universiteit Leuven, Department of Computer Science
Last update: 28 October 2008
Agenda
- DBs & ontologies: What's new in ontology matching?
- A classification of schema-based ontology matching
- Example: OLA
- Example: Semantic Integration Through Invariants
- Evaluating matching
- Involving the user: Explanations
Recap: The match problem for relational databases
Given two schemas S1 and S2, find a mapping between those elements of S1 and S2 that correspond semantically to each other.
Recap: The water ontology (and how it would give rise to a match problem)
[Figure: class hierarchy with the classes NaturallyOccurringWaterSource, BodyOfWater, Stream, Ocean, Lake, Sea, River, Tributary, Brook, Rivulet]
Properties:
- feedsFrom: River
- emptiesInto: BodyOfWater
  (property characteristics annotated in the figure: Functional, Inverse Functional, Inverse)
- containedIn: BodyOfWater (Transitive)
- connectsTo: NaturallyOccurringWaterSource (Symmetric)
Ex.: How would this map to the taxonomy WaterEntity, River, LakeOrPond, OceanOrSea (and their properties)? Or to non-English ontologies?
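A purely name-based matcher already shows what this exercise involves. The sketch below is an illustration, not a method from the lecture: it assumes that CamelCase class names can be split into words and compares the resulting word sets with the Jaccard coefficient; the stopword list, the function names and the threshold are arbitrary choices.

```python
import re

# Assumed stopwords: connectives inside compound names such as LakeOrPond.
STOPWORDS = {"or", "of", "and"}

WATER = ["Ocean", "Lake", "BodyOfWater", "River", "Stream", "Sea",
         "NaturallyOccurringWaterSource", "Tributary", "Brook", "Rivulet"]
TAXONOMY = ["WaterEntity", "River", "LakeOrPond", "OceanOrSea"]


def tokens(name: str) -> set[str]:
    """Split a CamelCase name into lower-case words, dropping stopwords."""
    words = re.findall(r"[A-Z][a-z]*|[a-z]+", name)
    return {w.lower() for w in words} - STOPWORDS


def jaccard(a: set[str], b: set[str]) -> float:
    """|a & b| / |a | b|, or 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def name_match(source: list[str], target: list[str],
               threshold: float) -> list[tuple[str, str, float]]:
    """All (source, target, score) pairs whose token overlap reaches the threshold."""
    pairs = []
    for s in source:
        for t in target:
            score = jaccard(tokens(s), tokens(t))
            if score >= threshold:
                pairs.append((s, t, score))
    return pairs


for s, t, score in name_match(WATER, TAXONOMY, threshold=0.5):
    print(f"{s:>6} -> {t:<10} {score:.2f}")
```

With threshold 0.5 this yields four pairs: Ocean and Sea to OceanOrSea, Lake to LakeOrPond, and River to River (BodyOfWater and WaterEntity share only one of three words). Two weaknesses are visible: Lake is narrower than LakeOrPond, so the appropriate relation is subsumption rather than equivalence; and Stream, Brook, Rivulet and Tributary find no partner at all, because relating them to River needs background knowledge or ontology structure rather than string overlap.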
A real example: Mice and humans
The anatomy real-world case is about matching the Adult Mouse Anatomy (2744 classes) and the NCI Thesaurus (3304 classes) describing human anatomy.
Another real example: Web-scale, cross-lingual thesauri and directories
This real-world test case requires matching very large resources (vlcr) available on the web, viz. DBpedia, WordNet and the Dutch audiovisual archive (GTAA). DBpedia is multilingual and GTAA is in Dutch.
- GTAA: terms: ~3800 subject keywords, ~ "Person"s, ~ "Names", ~ locations, 113 genres, ~ makers
- DBpedia: 2.18 million resources or "things", each tied to an article in the English-language Wikipedia
- WordNet: unique synsets, word-sense pairs
Motivation: Further applications
- All of the applications of schema matching
- Specific (Semantic-) Web-related applications:
  - Agent communication
  - Web Services integration
Similarities between database schemas and ontologies (or between databases and knowledge bases)
- Same problems: match conceptual schemata, map instances
- Much overlap in expressivity, including
  - objects,
  - properties,
  - aggregation,
  - generalization,
  - set-valued properties,
  - constraints
- Structural similarities:
  - matching the structure of relational tables is similar to matching class hierarchies
⇒ Schema matching methods from the DB literature can be re-used.
Database (schema)s and ontologies: differences
- Databases are often created to structure a given set of data, whereas ontologies are created to describe the common structure of a domain (independent of data and applications).
  ⇒ Schema-based (as opposed to instance-based) matching is more common. (Characteristic #1)
- Ontologies often have richer expressivity.
  ⇒ This can be exploited by matching algorithms. (Characteristic #2)
- Ontologies are designed to be shared and extended.
  ⇒ An interesting type of re-use matching: re-using an upper ontology. (Characteristic #3)
Note: Further differences (with different effects on the matching problem; not treated in this lecture)
- Databases are often created to structure a given set of data, whereas ontologies are created to describe the common structure of a domain (independent of data and applications).
  ⇒ Different primary roles of constraints:
  - In DBs: integrity constraints ensure the integrity of the data (= instances).
  - In ontologies: constraints express meaning and ensure consistency (either of the ontology or of the instances).
- Different foci of the processing engines:
  - SQL engines: answer queries, reason with views, ensure data integrity.
  - Inference engines: derive new information via automated inference; taxonomic reasoning is key.
- Database schemas often do not provide explicit semantics for their data (the semantics is lost after design and is not part of the DB specification), whereas ontologies are logical systems that obey formal semantics; e.g., ontology definitions can be interpreted as a set of logical axioms.
Agenda: A classification of schema-based ontology matching
The match operator (1): Basics
- Match operator: f(o, o′) = alignment between o and o′
  - for schemas/ontologies o, o′
- Alignment:
  - a set of mapping elements
- Mapping elements:
  - an element of o, an element of o′, a relation, and some further information
The match operator (2): Potential extra inputs
- r: external resources (thesauri, ...)
- p: parameters (weights, thresholds, ...)
- an input alignment A, to be completed by the process
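The operator and its extra inputs can be written down as a small data structure plus one function. In this sketch all names are hypothetical: a MappingElement carries the two elements, a relation and a confidence as its "further information"; the external resource r is modelled as a plain synonym dictionary, p as a single threshold, and the input alignment A is kept and only completed for pairs it does not already cover.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class MappingElement:
    """One correspondence (hypothetical encoding)."""
    source: str         # element of o
    target: str         # element of o'
    relation: str       # e.g. "=" for equivalence
    confidence: float   # the "further information", here a degree in [0, 1]


def match_operator(o: Iterable[str], o2: Iterable[str],
                   sim: Callable[[str, str], float],
                   threshold: float = 0.5,                          # parameter p
                   synonyms: Optional[dict[str, set[str]]] = None,  # resource r
                   input_alignment: frozenset = frozenset()         # alignment A
                   ) -> set[MappingElement]:
    """f(o, o') -> alignment, completing the input alignment A."""
    synonyms = synonyms or {}
    alignment = set(input_alignment)                 # everything in A is kept
    covered = {(m.source, m.target) for m in alignment}
    targets = list(o2)
    for e in o:
        for e2 in targets:
            if (e, e2) in covered:
                continue                             # already decided in A
            # The (hypothetical) synonym table overrides the base similarity.
            score = 1.0 if e2 in synonyms.get(e, set()) else sim(e, e2)
            if score >= threshold:
                alignment.add(MappingElement(e, e2, "=", score))
    return alignment


def exact(a: str, b: str) -> float:
    """Toy base similarity: case-insensitive string equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


A0 = frozenset({MappingElement("Person", "Human", "=", 1.0)})
A = match_operator(["Person", "Car", "Auto"], ["Human", "car", "Automobile"],
                   exact, synonyms={"Auto": {"Automobile"}}, input_alignment=A0)
for m in sorted(A, key=lambda m: m.source):
    print(m)
```

Keeping A untouched and skipping the pairs it covers is only one way to read "completed by the process"; a matcher could equally revise the confidence of elements in A.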
Recap: Rahm & Bernstein's classification of schema matching approaches
The methods that are important when the schema is in the foreground (ontology matching characteristic #1)
The extension by Shvaiko & Euzenat (2005) [Partial view]
A classification of approaches
[Figure annotations: "Schema Matching lecture"; "today (characteristic #2)"; "today: explanations"]
Agenda: Example: OLA
Basic ideas
- In principle, a fully automatic approach; the user can at most adapt parameters
- Input: two ontologies in OWL-Lite
- A dedicated typed graph representation of the language concentrates the information needed for computing the similarity between OWL entities
- A similarity measure that encompasses all OWL-Lite features
- Basic idea: the similarity between two elements depends on their pairwise similarity and on that of all adjacent elements
- The computed measure is used for generating an alignment
- Special provisions for key problems (collection comparison, circularities)
OL graphs
[Figure: an example ontology expressed in UML, and the OL graph it becomes]
OL graphs
- Categories of nodes:
  - class (C)
  - object (O)
  - relation (R)
  - property (P)
  - property instance (A)
  - datatype (D)
  - datavalue (V)
  - property restriction labels (L)
- Edges express relationships:
  - rdfs:subClassOf (S) between two classes or two properties
  - rdf:type (I) between objects and classes, property instances and properties, values and datatypes
  - A between classes and properties, and between objects and property instances
  - owl:Restriction (R), expressing the restriction on a property in a class
  - valuation (U) of a property in an individual
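A typed graph of this kind can be held as a node-to-category map plus a set of labelled edges. The sketch below is a minimal version under that assumption: the category and edge letters are copied from the slide, while the class and method names and the tiny example ontology are invented and are not OLA's actual data model.

```python
from dataclasses import dataclass, field

# Node categories and edge types as listed on the slide.
NODE_CATEGORIES = {
    "C": "class", "O": "object", "R": "relation", "P": "property",
    "A": "property instance", "D": "datatype", "V": "datavalue",
    "L": "property restriction label",
}
EDGE_TYPES = {
    "S": "rdfs:subClassOf",
    "I": "rdf:type",
    "A": "class-property / object-property-instance link",
    "R": "owl:Restriction",
    "U": "valuation",
}


@dataclass
class OLGraph:
    category: dict[str, str] = field(default_factory=dict)         # node -> category
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (x, y, type)

    def add_node(self, node: str, cat: str) -> None:
        if cat not in NODE_CATEGORIES:
            raise ValueError(f"unknown node category {cat!r}")
        self.category[node] = cat

    def add_edge(self, x: str, y: str, kind: str) -> None:
        if kind not in EDGE_TYPES:
            raise ValueError(f"unknown edge type {kind!r}")
        if x not in self.category or y not in self.category:
            raise KeyError("both end points must be added as nodes first")
        self.edges.add((x, y, kind))

    def neighbours(self, x: str, kind: str) -> set[str]:
        """F(x): all nodes y with an edge (x, y, F), where F = kind."""
        return {b for (a, b, k) in self.edges if a == x and k == kind}

    def relationships(self, x: str) -> set[str]:
        """Edge types in which x takes part; simplification: outgoing edges only."""
        return {k for (a, _, k) in self.edges if a == x}


# Hypothetical example: Student subClassOf Person, Person has property name,
# alice is of type Student.
g = OLGraph()
for node, cat in [("Person", "C"), ("Student", "C"), ("name", "P"), ("alice", "O")]:
    g.add_node(node, cat)
g.add_edge("Student", "Person", "S")
g.add_edge("Person", "name", "A")
g.add_edge("alice", "Student", "I")
print(g.neighbours("Student", "S"))
```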
The similarity measure for a pair of nodes
The similarity of an anchor pair (x, x′) is defined by its contributors, i.e., the neighbouring nodes:
- x, x′: nodes
- N(x): the set of all relationships in which x participates
- Sim: the similarity, with values in [0, 1]
- The weights are normalized to sum to 1
- F(x) = {y | ⟨x, y, F⟩ ∈ E}: the nodes reached from x via relationship F
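In Euzenat and Valtchev's paper, as we read it, these ingredients combine into a weighted sum over relationship types: Sim(x, x′) = sum over F in N(x) of π_F · MSim(F(x), F(x′)), where MSim compares two sets of neighbours by pairing their elements and dividing by the size of the larger set. The sketch below implements that shape for a single step only, under several assumptions: the neighbour similarities come from a hypothetical lookup table instead of being computed recursively, the pairing is greedy rather than optimal, and relationship types for which both nodes have no neighbours are skipped, with the remaining weights renormalized. All names and numbers are invented.

```python
from typing import Callable

Sim = Callable[[str, str], float]


def msim(left: set[str], right: set[str], sim: Sim) -> float:
    """Set similarity: greedy one-to-one pairing, normalized by the larger set."""
    if not left or not right:
        return 0.0
    candidates = sorted(((sim(a, b), a, b) for a in left for b in right),
                        key=lambda t: t[0], reverse=True)
    used_left, used_right, total = set(), set(), 0.0
    for score, a, b in candidates:
        if a not in used_left and b not in used_right:
            used_left.add(a)
            used_right.add(b)
            total += score
    return total / max(len(left), len(right))


def node_sim(nx: dict[str, set[str]], ny: dict[str, set[str]],
             weights: dict[str, float], sim: Sim) -> float:
    """Weighted sum over relationship types F of MSim(F(x), F(x'))."""
    acc = used_weight = 0.0
    for f, w in weights.items():
        left, right = nx.get(f, set()), ny.get(f, set())
        if not left and not right:
            continue                      # type absent on both sides: skip it
        acc += w * msim(left, right, sim)
        used_weight += w
    return acc / used_weight if used_weight else 0.0


# Hypothetical similarities between neighbour nodes; in OLA these would
# themselves be computed by the same measure.
BASE = {("colour", "color"): 0.9, ("maker", "manufacturer"): 0.6}


def base_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return BASE.get((a, b), BASE.get((b, a), 0.0))


car = {"S": {"Vehicle"}, "A": {"colour", "maker"}}            # F(x) per type
automobile = {"S": {"Vehicle"}, "A": {"color", "manufacturer"}}
weights = {"S": 0.4, "A": 0.6}                                # sum to 1
print(round(node_sim(car, automobile, weights, base_sim), 3))
```

Because the neighbours' similarities depend in turn on the similarity of the anchor pair, the full measure is circular; this one-step sketch sidesteps the issue that the slide's "special provisions for circularities" address.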
The set of similarities
One similarity function per node category, each an instance of the general measure. A labelling function assigns a URI reference to each node from C, O, R, P, D and A.
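The URI references give the measure its lexical input. One possible way to compare them is a normalized string similarity on their local names, sketched here with Python's standard difflib; SequenceMatcher's ratio is a stand-in and not necessarily the string measure OLA uses, and the example URIs are hypothetical.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def local_name(uri: str) -> str:
    """The fragment after '#', or else the last path segment of a URI reference."""
    parsed = urlparse(uri)
    if parsed.fragment:
        return parsed.fragment
    return parsed.path.rstrip("/").rsplit("/", 1)[-1]


def label_sim(uri1: str, uri2: str) -> float:
    """String similarity in [0, 1] between the local names of two URI references."""
    a, b = local_name(uri1).lower(), local_name(uri2).lower()
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical URIs for illustration.
print(label_sim("http://example.org/o1#Human", "http://example.org/o2#Person"))
print(label_sim("http://example.org/o1#Colour", "http://example.org/o2/color"))
```

Human and Person share almost no characters, so a purely lexical score for this pair is low; a higher overall similarity for such a pair has to come from the structural part of the measure.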
Example (partial view)
With a minimum-similarity threshold of 0.5, only the pair Human–Person is returned.
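Turning similarity values into an alignment can be as simple as applying a threshold. In this sketch the function name, the optional greedy one-to-one filter and the similarity values are assumptions made for illustration; they are not OLA's extraction procedure.

```python
def extract_alignment(sims: dict[tuple[str, str], float],
                      threshold: float = 0.5,
                      one_to_one: bool = False) -> list[tuple[str, str, float]]:
    """Keep pairs whose similarity reaches the threshold, best first.

    With one_to_one=True, a greedy pass additionally lets every entity
    appear in at most one pair.
    """
    result, used_src, used_tgt = [], set(), set()
    for (s, t), v in sorted(sims.items(), key=lambda kv: kv[1], reverse=True):
        if v < threshold:
            break                        # sorted: all remaining pairs are lower
        if one_to_one and (s in used_src or t in used_tgt):
            continue
        result.append((s, t, v))
        used_src.add(s)
        used_tgt.add(t)
    return result


# Invented similarity values, for illustration only.
SIMS = {("Car", "Automobile"): 0.85, ("Car", "Vehicle"): 0.55,
        ("Wheel", "Tyre"): 0.45, ("Driver", "Automobile"): 0.2}
print(extract_alignment(SIMS))                   # threshold 0.5
print(extract_alignment(SIMS, one_to_one=True))
```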
Agenda: Example: Semantic Integration Through Invariants
Basic idea
- A methodology for responding to characteristic #3:
  - Different application ontologies are not created independently of each other, but as extensions of a core ontology.
- In principle, a highly interactive (semi-automated) method: users are asked to describe the axiomatic properties of their ontologies.
Example: PSL Core (Process Specification Language), axiomatizing a set of intuitive semantic primitives for describing the fundamental concepts of manufacturing processes
Primitive lexicon:
- Relations:
  - (object ?x)
  - (activity ?a)
  - (activity_occurrence ?occ)
  - (timepoint ?t)
  - (before ?t1 ?t2)
  - (occurrence_of ?occ ?a)
  - (participates_in ?x ?occ ?t)
- Functions:
  - (beginof ?occ)
  - (endof ?occ)
- ...
Axiom 1. The before relation only holds between timepoints.
  (forall (?t1 ?t2)
    (if (before ?t1 ?t2)
        (and (timepoint ?t1) (timepoint ?t2))))
Axiom 2. The before relation is a total ordering.
  (forall (?t1 ?t2)
    (if (and (timepoint ?t1) (timepoint ?t2))
        (or (= ?t1 ?t2) (before ?t1 ?t2) (before ?t2 ?t1))))
...
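Reading the KIF axioms as conditions on a finite structure makes them checkable. The sketch below is not part of any PSL tooling; it simply evaluates Axioms 1 and 2 over an explicitly listed, invented set of timepoints and before pairs. Note that Axiom 2 on its own only states that any two timepoints are comparable; further order properties such as transitivity would have to come from the axioms the slide elides.

```python
from itertools import product


def satisfies_axiom1(timepoints: set, before: set[tuple]) -> bool:
    """Axiom 1: (before ?t1 ?t2) implies (timepoint ?t1) and (timepoint ?t2)."""
    return all(t1 in timepoints and t2 in timepoints for t1, t2 in before)


def satisfies_axiom2(timepoints: set, before: set[tuple]) -> bool:
    """Axiom 2: any two timepoints are equal or related by before in some direction."""
    return all(t1 == t2 or (t1, t2) in before or (t2, t1) in before
               for t1, t2 in product(timepoints, repeat=2))


# A hypothetical finite structure: three timepoints ordered 0 < 1 < 2.
T = {0, 1, 2}
BEFORE = {(0, 1), (1, 2), (0, 2)}
print(satisfies_axiom1(T, BEFORE), satisfies_axiom2(T, BEFORE))
# Dropping (0, 2) leaves 0 and 2 incomparable, which violates Axiom 2.
print(satisfies_axiom2(T, {(0, 1), (1, 2)}))
```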
Basic idea: ontologies extend PSL
Definitional extensions
- Preserving semantics is equivalent to preserving the models of the axioms.
  - Models are preserved up to isomorphism.
- Classify models by using invariants (properties of models that are preserved by isomorphism), e.g.:
  - automorphism groups, endomorphism semigroups
- Classes of activities and objects are specified using these invariants.
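An automorphism is an isomorphism of a model onto itself, and the automorphism group is one such invariant: isomorphic models have isomorphic automorphism groups, so in particular the same number of automorphisms. The brute-force sketch below is only a toy illustration of the notion on small structures with a single binary relation; the invariants used in the PSL work are defined on PSL's own models, which are not reproduced here.

```python
from itertools import permutations


def automorphisms(domain: set, relation: set[tuple]) -> list[dict]:
    """All bijections pi on the domain with (a, b) in R iff (pi(a), pi(b)) in R.

    Brute force over all |domain|! permutations, so only for tiny structures.
    """
    elems = sorted(domain)
    autos = []
    for image in permutations(elems):
        pi = dict(zip(elems, image))
        # pi is a bijection, so the image of R has |R| elements; equality
        # with R therefore gives the "iff" in both directions.
        if {(pi[a], pi[b]) for a, b in relation} == relation:
            autos.append(pi)
    return autos


chain = {(0, 1), (1, 2), (0, 2)}                    # a strict linear order
cycle = {(0, 1), (1, 2), (2, 0)}                    # a directed 3-cycle
chain_copy = {("a", "b"), ("b", "c"), ("a", "c")}   # isomorphic to chain

print(len(automorphisms({0, 1, 2}, chain)))         # rigid: only the identity
print(len(automorphisms({0, 1, 2}, cycle)))         # the three rotations
print(len(automorphisms({"a", "b", "c"}, chain_copy)))
print(len(automorphisms({0, 1, 2}, set())))         # no structure: all 3! maps
```

The chain and its relabelled copy give the same count, as an invariant must, while the cycle and the empty relation are told apart from the chain. A matching count alone does not prove that two models are isomorphic.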
Example: Markovian activities, i.e., activities whose preconditions depend only on the state prior to their occurrences
- Defined by class (3)
- NB: class (4): there are additional non-Markovian constraints on the legal occurrences of the activity
Twenty questions
Application ontology designers are asked questions about their classes (e.g., myclass) in order to map these classes to PSL. The answers determine translation definitions, e.g.:
- Answer 1 → translation definition
- Answers 1 and 2 → translation definition
Ontology mappings
- The designers of different extensions of PSL are asked these questions.
- Classes from different extensions of PSL can be mapped to each other if they preserve the same invariants.
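Because both designers answer the same questions, the answers act as a shared key. A minimal sketch, assuming each class is summarized by a tuple of yes/no answers: classes from two PSL extensions are paired when their tuples coincide. The class names, the three questions and the answers are hypothetical, and the sketch compresses the translation step into a direct comparison of answers.

```python
from collections import defaultdict

# Hypothetical answer profiles: for each class, its designer's answers to the
# same list of questions (True/False per question).
PROFILES_A = {"Paint": (True, False, True), "Assemble": (False, False, True)}
PROFILES_B = {"Coat": (True, False, True), "Inspect": (True, True, False)}


def map_by_profiles(profiles_a: dict[str, tuple],
                    profiles_b: dict[str, tuple]) -> dict[str, list[str]]:
    """Map each class of ontology A to the classes of B with identical answers."""
    by_answers = defaultdict(list)
    for cls, answers in profiles_b.items():
        by_answers[answers].append(cls)
    return {cls: sorted(by_answers.get(answers, []))
            for cls, answers in profiles_a.items()}


print(map_by_profiles(PROFILES_A, PROFILES_B))
```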
Agenda: Evaluating matching
The Ontology Alignment Evaluation Initiative (OAEI)
The 2008 competitors were also tested on the 2007 tests; the results were not significantly different.
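The OAEI scores systems by comparing each one's alignment with a reference alignment, chiefly via precision (the share of found correspondences that are correct) and recall (the share of reference correspondences that were found). A minimal sketch of these measures and their harmonic mean, the F-measure; correspondences are simplified to (source, target) pairs, ignoring relation and confidence, and the data are invented.

```python
def evaluate(found: set[tuple[str, str]],
             reference: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Precision, recall and F-measure of an alignment against a reference."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Invented alignments: 3 pairs found, 4 in the reference, 2 of them shared.
found = {("Car", "Automobile"), ("Wheel", "Tyre"), ("Driver", "Vehicle")}
reference = {("Car", "Automobile"), ("Wheel", "Tyre"),
             ("Driver", "Chauffeur"), ("Road", "Street")}
p, r, f = evaluate(found, reference)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```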
(How did OLA score?)
Note: OLA did not participate in 2008.
Agenda: Involving the user: Explanations
Example ontologies
To be matched by S-Match with the help of WordNet (among other resources)
Explanations – level 1: explanations in English
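S-Match reports semantic relations between nodes, such as equivalence, more general, less general and disjointness. As a rough illustration of what a level-1 explanation involves, the sketch below renders one such result as an English sentence from a template plus a list of supporting facts. The function, the relation codes and the example evidence are hypothetical; this is not S-Match's explanation mechanism.

```python
# Hypothetical relation codes mapped to English templates.
TEMPLATES = {
    "=": "{a} is equivalent to {b}",
    "<": "{a} is less general than {b}",
    ">": "{a} is more general than {b}",
    "!": "{a} is disjoint from {b}",
}


def explain(a: str, b: str, relation: str, evidence: list[str]) -> str:
    """Render one match result, plus the facts that support it, as a sentence."""
    if relation not in TEMPLATES:
        raise ValueError(f"unknown relation {relation!r}")
    sentence = TEMPLATES[relation].format(a=a, b=b)
    if evidence:
        sentence += ", because " + "; ".join(evidence)
    return sentence + "."


# Invented evidence, for illustration only.
print(explain("Images/Europe", "Pictures/Europe", "=",
              ["the thesaurus lists 'image' and 'picture' as synonyms",
               "both nodes are labelled 'Europe'"]))
```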
Explanations – level 2: source metadata information
Miscellaneous
Large ontologies in the life sciences (examples)
Different languages have different (lexicalized) concept boundaries
[Figure: the concepts older brother, younger brother, older sister, younger sister]
(Some more examples after the rivers ...)
Next lecture
- DBs & ontologies: What's new in ontology matching?
- A classification of schema-based ontology matching
- Example: OLA
- Example: Semantic Integration Through Invariants
- Evaluating matching
- Involving the user: Explanations
- Finding implicit knowledge (I): From Deductive Databases to Knowledge Discovery in Databases
References / background reading; acknowledgements
- Overviews:
  - P. Shvaiko, J. Euzenat: A Survey of Schema-based Matching Approaches. Journal on Data Semantics.
  - M. Uschold and M. Grüninger: Ontologies and semantics for seamless connectivity. SIGMOD Record, 33(3).
  - N. Noy: Semantic Integration: A Survey of Ontology-based Approaches. SIGMOD Record, 33(3).
- OLA: J. Euzenat, P. Valtchev: Similarity-based ontology alignment in OWL-lite. In Proceedings of ECAI. ftp://ftp.inrialpes.fr/pub/exmo/publications/euzenat2004c.pdf
- Invariants: M. Grüninger and J. Kopena: Semantic integration through invariants. In A. Doan, A. Halevy, and N. Noy, editors, Workshop on Semantic Integration at ISWC-2003, Sanibel Island, FL.
- Explanations in S-Match: Pavel Shvaiko, Fausto Giunchiglia, Paulo Pinheiro da Silva & Deborah L. McGuinness (2005). Web Explanations for Semantic Heterogeneity Discovery. In The Semantic Web: Research and Applications. Springer: LNCS.
Acknowledgements
- pp taken from: Michael Gruninger (undated). Using Model-Theoretic Invariants for Semantic Integration. ance_Metrics/PerMIS_2004/Proceedings/Gruninger.pdf
- p. 41 (Large ontologies in the life sciences) taken from: Toralf Kirsten, Andreas Thor, & Erhard Rahm (2007). Instance-based matching of large life science ontologies. In Data Integration in the Life Sciences (pp ). Springer: LNCS.
- p. 42 (lexicalized concept boundaries) taken from: P. Koch (2005). Vorlesung: Probleme der romanischen Wortbildung [Lecture: Problems of Romance word formation]. tuebingen.de/peter.koch/Semester/Sose05/Handout%201%20Lexikologie%20Semantik.pdf