Page 1 Composing Mappings between Schemas using a Reference Ontology - ODBASE’04 - Eduard Dragut, Ramon Lawrence
Composing Mappings between Schemas using a Reference Ontology
Eduard Dragut, Ramon Lawrence
Iowa Database and Emerging Applications (IDEA) Laboratory
University of Iowa
{eduard-dragut,

Outline
- Motivation
- Integration Approach
- Background
- Architecture Overview
- Ontological Matching
- Composing Mappings
- Global View Construction
- Experimental Results
- Future Work and Conclusions

Motivation
- Many organizations have pre-existing ontologies that are not suitable as global views but are suitable as reference ontologies to aid integration.
  - Example: the National Cancer Institute (NCI) and the National Institutes of Health (NIH) have the caBIG grid prototype, which standardizes terminology (EVS, caDSR) and data elements in the cancer domain.
- Schema-to-ontology matching requires integrators to understand only their own schema, not all the schemas that they may want to integrate.

Integration Approach
[Figure: the NCBI Database and the Expression Database are each matched to the Reference Ontology (schema matching), producing schema-to-ontology mappings; Compose & Merge turns these into the Global View, which answers User Queries.]

Background: Ontologies and Integration
- Ontologies as the integrated, global view
  - Carnot project (Collet91) with Cyc ontology (Lenat90)
  - ONTOBROKER (Decker98), OBSERVER (Mena00)
- Tools for semi-automatically merging ontologies
  - PROMPT (Noy00), Ontobuilder (Gal04)
- Use ontologies as matching/integration aids
  - MOMIS (Beneventano03) using WordNet
  - Indirect (Xu03), CUPID (Madhavan01), COMA (Do02)
- Matching ontologies (Doan02)
- “Discovering” ontologies (Madhavan03)
  - Corpus-based matching

Background: Model Management
- Model management as proposed by (Bernstein03) is intended to allow high-level schema operations.
  - Operators include: Invert, Compose, Match, Merge.
  - Warning: The semantics of some operators are not yet fully defined, and some operators are not completely automatic.
- Definitions:
  - A match is a semantic correspondence between schema elements.
  - A mapping between schema elements is an expression that relates the elements.
  - Note that most schema matching systems, such as COMA, produce matches, not mappings.

Architecture Overview
- We assume a pre-existing reference ontology that has been “accepted” in a domain.
  - The ontology is NOT a global view and may not cover the information in all schemas. It cannot be edited.
- Global view construction is a 3-step process:
  - 1) Independently match each schema to the ontology.
  - 2) Compose schema-to-ontology matches to produce schema-to-schema mappings.
  - 3) Merge the schema mappings to produce the global view.
- The challenge is to automate this as much as possible.

Benefits of Approach
- Even with manual integration there are several benefits to using a reference ontology:
  - 1) An integrator needs to understand only their own schema and the ontology, not the other schemas to be integrated.
  - 2) Most validation is performed once, during schema-to-ontology matching, and not for every schema integrated.
  - 3) Schema-to-ontology matchings can be re-used every time a new schema is integrated into the federation.
- Automation can:
  - 1) Help construct schema-to-ontology matchings.
  - 2) Perform composition of mappings.
  - 3) Build a global view from the composed mappings.

Automation Challenges
- There are several challenges in automating this process:
  - 1) Schema matching systems such as COMA are designed for simpler relational schemas. Ontologies must be mapped into a suitable format for use with COMA.
  - 2) Schema-to-ontology matching is less accurate due to the more complicated ontological structure and because the ontology may not model the entire domain or may model it differently.
  - 3) Composing matchings often results in many false matches, which must be handled.
  - 4) A method for merging schemas using model management primitive operators is required.
    - Even with these operators, Merge is not fully automatic.

Background: COMA
- COMA (Do02) is a schema matching system that can flexibly combine different match algorithms and re-use match results.
  - Match algorithms use names, paths, and schema properties in various ways.
- The mapping format between two schemas R and S is a triple (r, s, v) where r is in R, s is in S, and v is the similarity value in [0, 1] between elements r and s.
- A schema in COMA is represented as a rooted directed acyclic graph. Schema elements are nodes, which may be connected by links of different types.

Ontological Matching
- The first step is to convert ontologies in OWL/DAML format into COMA’s graph representation format.
  - Wrote a program that used the JENA parser.
- During the conversion:
  - 1) Explicitly converted each named relationship in the ontology into a node and several edges in the graph.
  - 2) Explicitly encoded attributes inherited over IS-A links, since COMA does not support IS-A.
- After conversion, COMA automatically produces a schema-to-ontology match, because from COMA's point of view it is matching two relational schemas.

Converting Ontology to a Graph
[Figure panels: Converting Named Relationships; Making IS-A Explicit]
* Also create a single root POOntology as required by COMA.

Ontological Matching: Max versus noMax
- One challenge is deciding what this match should look like.
- Two choices:
  - 1) Max - For each schema element, keep the best match with the ontology (if any).
  - 2) NoMax - For each schema element, keep all the matches that are above the cutoff threshold.
- Since Max generates at most one match per schema element, it is probably the best choice in semi-automated settings. NoMax will generate many matches, which must be filtered out by the user or during composition.

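The two selection strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matches are plain (schema element, ontology element, similarity) tuples, the threshold comparison is inclusive, and the element names and similarity values are hypothetical rather than taken from the experiments.

```python
# A match is (schema_element, ontology_element, similarity in [0, 1]).
Match = tuple[str, str, float]


def select_no_max(matches: list[Match], threshold: float) -> list[Match]:
    """NoMax: keep every match whose similarity reaches the threshold."""
    # Assumption: "above the cutoff" is treated as >= threshold here.
    return [m for m in matches if m[2] >= threshold]


def select_max(matches: list[Match], threshold: float) -> list[Match]:
    """Max: keep only the best above-threshold match per schema element."""
    best: dict[str, Match] = {}
    for m in select_no_max(matches, threshold):
        element = m[0]
        if element not in best or m[2] > best[element][2]:
            best[element] = m
    return list(best.values())


# Hypothetical candidate matches for two schema elements.
candidates = [
    ("Contact.Name", "Person.LastName", 0.7),
    ("Contact.Name", "Organization.name", 0.6),
    ("Contact.Position", "Person.FirstName", 0.2),
]

no_max_result = select_no_max(candidates, 0.5)  # both Contact.Name matches
max_result = select_max(candidates, 0.5)        # only the 0.7 match
```

With NoMax, the ambiguity between the two Contact.Name candidates is passed on to the user or to the composition step; with Max it is resolved immediately, at the risk of discarding the correct candidate.
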
Composing Mappings
- Schema-to-ontology mappings must be composed to produce direct schema-to-schema mappings.
- Since mappings carry no semantics, two objects are assumed to be identical if they map to the same ontological concept. Composition is performed transitively and is implemented using a natural join.
  - That is, if element r is similar to o and o is similar to s, then we assume that r is similar to s.
- For example:
  - Matches (r, o, v1) and (s, o, v2) can be composed to yield (r, s, v).
  - The similarity values v1 and v2 may be combined into v using various functions, although average is the most common.

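A minimal sketch of this composition as a join on the shared ontology element, combining similarities by averaging. The tuple layout, the function name `compose`, and the example elements are assumptions made for illustration; the slides only state that a natural join is used.

```python
from statistics import mean

# A match is (schema_element, ontology_element, similarity in [0, 1]).
Match = tuple[str, str, float]


def compose(r_to_o: list[Match], s_to_o: list[Match]) -> list[Match]:
    """Join two schema-to-ontology match lists on the ontology element."""
    # Index the second match list by ontology element for the join.
    by_concept: dict[str, list[tuple[str, float]]] = {}
    for s, o, v in s_to_o:
        by_concept.setdefault(o, []).append((s, v))

    composed: list[Match] = []
    for r, o, v1 in r_to_o:
        for s, v2 in by_concept.get(o, []):
            # Average is one possible combination function.
            composed.append((r, s, mean([v1, v2])))
    return composed


# Hypothetical schema-to-ontology matches.
s1_to_o = [("S1.Name", "O.LastName", 0.5)]
s2_to_o = [("S2.LastName", "O.LastName", 1.0), ("S2.FirstName", "O.FirstName", 1.0)]

s1_to_s2 = compose(s1_to_o, s2_to_o)
```

Only elements that reach the same ontology concept are paired, so S2.FirstName produces no composed match in this example.
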
Composition Example
[Figure: S1 and S2 are each matched to ontology O; Compose then yields direct matches between S1 and S2.]
- S1: Contact, CompanyName, Name, Position
- S2: Contact, FirstName, LastName, Position
- O: contact, Person, FirstName, LastName, Organization, name

Global View Construction
- One possible application of schema-to-schema mappings constructed in this way is building a global view.
- The paper gives a script that uses model management operators to compose any number of schema-to-ontology mappings into a single global view for all sources.
- Note that this algorithm is neither perfect nor fully automatic, as the mappings are not perfect and the Merge operator may require human intervention.

Global View Construction Example
[Figure]

Experimental Setup
- Matched the 5 sample order schemas used to evaluate COMA: CIDR, Excel, Noris, Paragon, and Apertum. Numbered these schemas 1, 2, 3, 4, and 5.
- Created a reference ontology that models some of the domain (but not all of it) and is quite different from the schemas (it uses IS-A, for example).
- Used the matchings specified with COMA as ground truth.
- Evaluation metrics:
  - Precision = # of correct matches / # of suggested matches
  - Recall = # of correct matches returned / # of total matches
  - Overall = Recall * (2 - 1 / Precision)

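The three metrics translate directly into code. The sketch below assumes non-zero denominators, and the counts in the example are hypothetical, not experimental results.

```python
def precision(correct: int, suggested: int) -> float:
    """Fraction of suggested matches that are correct."""
    return correct / suggested


def recall(correct: int, total_true: int) -> float:
    """Fraction of the ground-truth matches that were returned."""
    return correct / total_true


def overall(p: float, r: float) -> float:
    """Overall = Recall * (2 - 1 / Precision)."""
    # The factor (2 - 1/p) is negative whenever precision is below 0.5.
    return r * (2 - 1 / p)


# Hypothetical counts: 5 suggested matches, 4 of them correct, 8 true matches.
p = precision(correct=4, suggested=5)
r = recall(correct=4, total_true=8)
score = overall(p, r)
```

Because Overall drops to zero at a precision of 0.5 and goes negative below it, it penalizes false matches more heavily than precision and recall do on their own.
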
Reference Order Ontology
[Figure]

Experiment #1: Schema-to-Ontology Matching
- Goal: Evaluate the accuracy of schema-to-ontology matching.
- Method:
  - Automatically convert the ontology into COMA format and match each schema with the ontology.
- Evaluation:
  - Measured the percent overlap of the schema and ontology. For many schemas, only 60% of their concepts were in the ontology.
  - Evaluated the precision, recall, and overall measures relative to the number of matches that could be found.
    - E.g., if overlap was 60% and recall was 50%, then only 30% of all schema elements were matched, BUT of all the possible matches, 50% were found.

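The arithmetic behind the example above, written out (the symbol names are chosen here for illustration):

```latex
\[
  \underbrace{\frac{\text{matched elements}}{\text{all schema elements}}}_{\text{absolute coverage}}
  \;=\; \text{overlap} \times \text{recall}
  \;=\; 0.60 \times 0.50
  \;=\; 0.30
\]
```

Recall is reported relative to the 60% of elements that have a counterpart in the ontology, which is why it reads 50% rather than 30%.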
Experiment #1: Results
[Figure]
* noMax is poor for schema 5 because Buyer is incorrectly matched to the ontology.

Experiment #2: Schema-to-Schema Mappings
- Goal: Determine the accuracy of producing schema-to-schema mappings by composing schema-to-ontology matchings.
- Method:
  - Used automatically generated schema-to-ontology matchings and composed them. Evaluated the composition result against COMA answers for direct matching. Evaluated noMax and Max techniques and manual mappings.

Experiment #2: Results (Overall)
[Figure]
* The 1-2 schema pair is poor because of the Street mapping.
* The 4-5 schema pair is poor because of the Buyer mapping.

Experiment #3: Improving Direct Matches
- Goal: Determine if the accuracy of producing direct schema-to-schema mappings can be improved by re-using schema-to-ontology matches.
- Method:
  - Generate schema-to-schema mappings by composing schema-to-ontology matchings and then use this as past matching information for COMA.
  - Allow COMA to perform a direct match given this information. Evaluated noMax and Max techniques and manual mappings.

Experiment #3: Results (Overall)
[Figure]
* The 1-2 schema pair is poor because of the Street mapping.

Discussion and Conclusions
- Major findings:
  - 1) Schema-to-ontology mappings can be constructed with good accuracy (70-80% precision, 60% recall).
  - 2) The composition of schema-to-ontology matchings produces results similar to direct matching with COMA.
  - 3) Max has higher precision than noMax but lower recall. Max is probably best when the user must filter incorrect matches, and it always saves work.
  - 4) It is valuable to re-use schema-to-ontology matchings (either automatic or manually constructed) to improve the accuracy of direct matchings.
- Major conclusion: There is a benefit to building semi-automatic schema-to-ontology matchings for use in integration and global view construction.

Future Work and Challenges
- The major challenge is that the mappings carry no semantics, which often results in incorrect matches being suggested after composition.
  - We are currently working on extending the mappings to capture semantics to avoid many of these cases.
- The approach is not fully automatic (nor will it ever be). However, most manual work is in the schema-to-ontology matching stage. We need better algorithms and tools to support this matching.
- We want to perform experimental evaluation on larger ontologies such as those from NCI.
  - Issue: Many ontologies are not in a suitable form for intermediate mapping with schemas (they are just taxonomies).

Extra Slides

Ontology Conversion Algorithm
- 1) Each ontology concept (class) becomes a node in the graph.
- 2) For each property (attribute) of a class, add a node to the graph and connect it to its class.
- 3) Non-basetype properties (those with domain and range in the ontology) are converted by:
  - 3a) Creating a node in the graph for the relationship.
  - 3b) Adding an edge from the domain class to this node.
  - 3c) Adding an edge from the new node to the range class.
  - Note: Properties whose domain or range is a union/intersection of concepts are not currently supported.
- 4) IS-A expanded by graph traversal.

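The four steps above can be sketched as follows. This is not the authors' JENA-based converter: the `OntClass` and `Relationship` types, the `Class.attribute` node naming, and the example ontology are all hypothetical stand-ins for a parsed OWL/DAML model, and IS-A expansion is assumed to copy inherited attributes down an acyclic, single-parent hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    name: str
    attributes: list[str] = field(default_factory=list)  # base-type properties
    parent: Optional[str] = None                         # IS-A link, if any


@dataclass
class Relationship:
    name: str
    domain: str  # a single class; unions/intersections are not handled
    range: str


def convert(classes: list[OntClass], relationships: list[Relationship]):
    """Return (nodes, edges) of a graph built from the ontology."""
    by_name = {c.name: c for c in classes}
    nodes: set[str] = set()
    edges: set[tuple[str, str]] = set()

    def all_attributes(cls: OntClass) -> list[str]:
        # Step 4: walk up the IS-A chain and collect inherited attributes.
        attrs = list(cls.attributes)
        while cls.parent is not None:
            cls = by_name[cls.parent]
            attrs.extend(cls.attributes)
        return attrs

    for c in classes:
        nodes.add(c.name)                    # step 1: class becomes a node
        for a in all_attributes(c):          # step 2 (plus inherited, step 4)
            attr_node = f"{c.name}.{a}"
            nodes.add(attr_node)
            edges.add((c.name, attr_node))

    for rel in relationships:                # step 3: named relationships
        nodes.add(rel.name)                  # 3a: node for the relationship
        edges.add((rel.domain, rel.name))    # 3b: domain class -> node
        edges.add((rel.name, rel.range))     # 3c: node -> range class
    return nodes, edges


# Hypothetical order ontology fragment.
classes = [
    OntClass("Person", ["name"]),
    OntClass("Customer", ["customerId"], parent="Person"),
    OntClass("Order", ["orderDate"]),
]
relationships = [Relationship("places", domain="Customer", range="Order")]
nodes, edges = convert(classes, relationships)
```

After conversion, Customer carries its own customerId node plus a copy of the inherited name attribute, so a matcher with no notion of IS-A still sees it.
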
Mapping Composition Challenges
- Composing an N:1 match with a 1:N match results in a cross-product.
- These cases cannot be handled, as the mappings have no semantics.

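A small illustration of the cross-product effect, with hypothetical element names: two elements of each schema match the same ontology concept, so joining on that concept pairs every element of one schema with every element of the other.

```python
from itertools import product

# Hypothetical N:1 matches: two S1 elements map to one ontology concept,
# and two S2 elements map to that same concept.
s1_to_o = [("S1.FirstName", "O.name"), ("S1.LastName", "O.name")]
s2_to_o = [("S2.First", "O.name"), ("S2.Last", "O.name")]

# Joining on the ontology concept yields every S1 x S2 combination.
composed = [
    (r, s)
    for (r, o1), (s, o2) in product(s1_to_o, s2_to_o)
    if o1 == o2
]
```

The result holds four pairs, including (S1.FirstName, S2.Last) and (S1.LastName, S2.First), and nothing in the match triples distinguishes these from the two intended pairs.
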
Global View Construction Script
Computes the global view of N source schemas (with ontology mappings).

Operator GlobalView(ArraySchemas, ArrayMappings, O, n)
  // ArraySchemas stores the n schemas
  // ArrayMappings stores the n schema-to-ontology mappings
  If n <= 0 Then Return empty schema;
  If n == 1 Then Return ArraySchemas[0];
  S1 = ArraySchemas[0];
  S2 = ArraySchemas[1];
  map1 = ArrayMappings[0];
  map2 = ArrayMappings[1];
  <S, map> = GlobalView2(S1, S2, O, map1, map2);
  For (i = 2; i <= n-1; i++)
    S1 = S;
    map1 = map;
    S2 = ArraySchemas[i];
    map2 = ArrayMappings[i];
    <S, map> = GlobalView2(S1, S2, O, map1, map2);
  end for;
  Return <S, map>;

Global View Construction Script (2)
Computes the global view of two source schemas (with ontology mappings).

Operator GlobalView2(S1, S2, O, S1_O, S2_O)
  S1_S2 = S1_O * Invert(S2_O);
  <M, S1_M, S2_M> = Merge(S1, S2, S1_S2);
  M_O = Invert(S1_M) * S1_O + Invert(S2_M) * S2_O;
  Return <M, M_O>;

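Structurally, the N-schema script is a left fold of the two-schema operator over the list of sources. The Python sketch below shows only that control flow: `global_view2` is passed in as a parameter because Merge is not fully automatic, and `toy_global_view2` is a purely hypothetical stand-in that records the call order. Returning a (schema, mapping) pair in every case, including for a single schema, is an assumption of this sketch.

```python
from typing import Any, Callable, Optional, Sequence

# Signature assumed for the two-schema operator:
# (S1, S2, O, S1_O, S2_O) -> (merged schema, merged-schema-to-ontology mapping)
GlobalView2 = Callable[[Any, Any, Any, Any, Any], tuple[Any, Any]]


def global_view(
    schemas: Sequence[Any],
    mappings: Sequence[Any],
    ontology: Any,
    global_view2: GlobalView2,
) -> Optional[tuple[Any, Any]]:
    """Fold global_view2 over the sources, left to right."""
    if not schemas:
        return None  # stands in for the "empty schema" case
    view, view_map = schemas[0], mappings[0]
    for schema, mapping in zip(schemas[1:], mappings[1:]):
        # The running global view plays the role of S1 in the next step.
        view, view_map = global_view2(view, schema, ontology, view_map, mapping)
    return view, view_map


def toy_global_view2(s1, s2, o, s1_o, s2_o):
    """Hypothetical stand-in: records merge order, unions the mappings."""
    return f"({s1}+{s2})", s1_o | s2_o


result = global_view(["A", "B", "C"], [{"a"}, {"b"}, {"c"}], "O", toy_global_view2)
```

Each new source is merged into the running view, so integrating an additional schema costs one call to the two-schema operator and re-uses the mapping produced so far.
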
Sample Order Schema
Excel XML Schema
<Schema name="PurchaseOrder.biz" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">

Experiment #2: Precision
[Figure]

Experiment #2: Recall
[Figure]

Experiment #3: Results (Precision)
[Figure]

Experiment #3: Results (Recall)
[Figure]