Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships. Eduard C. Dragut, Ramon Lawrence.

1 Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships
Eduard C. Dragut (University of Illinois at Chicago), Ramon Lawrence (University of British Columbia Okanagan)
ODBASE 2006, Montpellier, France

2 Page 2 E. Dragut and R. Lawrence - Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships
Talk Overview
- Introduction
- Background
  - Model and Mapping representation systems
- Proposed Mapping Representation System
  - Invert and Compose operator definitions and properties
  - Mappings Composition
- Experiment
  - Estimate the quality of the proposed system

3 Introduction - Terminology
- Models
  - denote a representation of a domain in a formal language (e.g., EER, Relational, Description Logic)
  - have two components [Russell et al. 2003]
    - terminological (or metadata): this is the focus of this work and talk
    - extensional (i.e., facts or instances)
- Mappings
  - describe how two models are related to each other

4 Introduction - Mappings
- Ways to define mappings between models
  - binary relationships, called morphisms [Melnik et al. 2003] or inter-schema correspondences [Popa et al. 2002]
  - mapping using a helper model [Bernstein et al. 2003]
  - mapping as queries [Madhavan et al. 2003, Bernstein et al. 2006]
- Our work falls in the class of the first two types of mappings.
  - We call them metadata level mappings.
  - They are not concerned with the instances of a model.

5 Introduction - Examples
- Examples of models
  - diagrams, interface definitions, database schemas, web site layouts, control flow, XML schemas
- Applications of mappings
  - mapping between XML schemas to drive message translation
  - schema and database integration
  - mapping between ontologies to help in the process of merging and alignment

6 Background - Mapping creation
- Mapping creation will rarely be completely automated.
- The general strategy is to build mappings semi-automatically:
  - use heuristics to generate matchings (e.g., name similarity); surveys: [Rahm and Bernstein 2001, Shvaiko and Euzenat 2005]
  - translate matches into formulas, e.g., the Clio project [Popa et al. 2002]
  - generate new mappings from existing mappings
    - Composition, e.g., [Madhavan et al. 2003, Bernstein et al. 2006]
    - Invert, e.g., [Fagin 2006]
- Semi-automatic tools can significantly speed up the process.

7 Background - Morphisms
- Mapping
  - is just a set of binary relations between the elements of two models
  - is a set of pairs
- Advantages/Disadvantages
  - their expressiveness is enough for certain classes of problems, and they exhibit certain mathematical properties [Melnik et al. 2003]
  - main drawback: assumes similarity to be transitive

8 Background - Morphisms problems
- Composition
  - (a, b) ○ (b, c) = (a, c), due to the transitivity assumption
- Problems with this technique
  - Whenever an m:1 correspondence is composed with a 1:n correspondence, the composition result is a cross-product, and many of the resulting pairs are false positives.
  - It may miss relationships or suggest false ones.
- Legend (figure not reproduced in the transcript): blue = correct; red = false positive or missed
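The cross-product effect can be illustrated with a minimal sketch of composition over morphisms, assuming a morphism is stored as a set of pairs. The function name `compose_morphisms` and the element names are hypothetical and are not taken from the talk's example.

```python
def compose_morphisms(map1, map2):
    """Compose two morphisms (sets of pairs), treating similarity as transitive:
    (a, c) is produced whenever (a, b) is in map1 and (b, c) is in map2."""
    return {(a, c) for (a, b) in map1 for (b2, c) in map2 if b == b2}


# Hypothetical element names chosen only for illustration.
map1 = {("firstName", "name"), ("lastName", "name")}   # m:1 correspondence
map2 = {("name", "givenName"), ("name", "surname")}    # 1:n correspondence

composed = compose_morphisms(map1, map2)
for pair in sorted(composed):
    print(pair)
```

All four combinations are produced, including pairs such as ("firstName", "surname") that a person would likely reject, and nothing in the result distinguishes them from the plausible ones.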

9 Background - Mapping with helper models
- Algorithm (for right compose) [Bernstein et al. 2002]
  - copy the right-hand-side mapping
  - for each mapping element m on the right (i.e., in map2), compute its Input(m)
  - for each mapping element m on the right (i.e., in map2), set its domain to the union of the domains of Input(m)
- Example: (figure not reproduced in the transcript)
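The three steps above can be sketched roughly as follows. This is only an approximation under stated assumptions: the slide does not define Input(m), so the sketch assumes it is the set of elements of map1 whose range overlaps the domain of m. The class `HelperElement`, its fields `dom` and `rng`, and the element names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelperElement:
    dom: frozenset  # elements on the left-hand side of this mapping element
    rng: frozenset  # elements on the right-hand side of this mapping element


def right_compose(map1, map2):
    """Sketch of right compose: copy map2, then rewire each element's domain."""
    composed = []
    for m in map2:
        # Assumed meaning of Input(m): map1 elements whose range overlaps m's domain.
        inputs = [m1 for m1 in map1 if m1.rng & m.dom]
        # New domain is the union of the domains of Input(m) (empty if no inputs).
        new_dom = frozenset().union(*(m1.dom for m1 in inputs))
        composed.append(HelperElement(new_dom, m.rng))
    return composed


map1 = [HelperElement(frozenset({"a1"}), frozenset({"b1"})),
        HelperElement(frozenset({"a2"}), frozenset({"b2"}))]
map2 = [HelperElement(frozenset({"b1", "b2"}), frozenset({"c1"})),
        HelperElement(frozenset({"b3"}), frozenset({"c2"}))]

result = right_compose(map1, map2)
```

In this toy input the second element of map2 has no inputs in map1, so its composed domain is empty.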

10 Background - Mapping with helper models
- Composition result (figure not reproduced in the transcript)
- Problems with this technique
  - It may miss relationships or suggest false ones.
- Legend: red = missed relationships

11 The Objectives
- The driving motivation
  - The need for a mapping definition subsuming the relationship kinds that state-of-the-art matching algorithms discover with high precision.
  - Investigate to what extent a set of operations over this mapping definition can be defined.
- Provide a mapping representation at the metadata level combining the advantages of morphisms and mappings with helper models.
  - The former has good mathematical properties.
  - The latter is more expressive.
- Provide a compose algorithm that exploits the semantic relationships within the mapping expression to produce correct semantic relationships whenever these can be determined automatically and to isolate those instances that require human intervention.

12 Proposed Mappings Representation
- Model
  - A model has expressiveness similar to that of an EER model and is consistent with the definition of model used in previous work on model management [Bernstein et al. 2002, Pottinger and Bernstein 2003].
- Mapping Representation
  - A mapping consists of a set of mapping elements; each mapping element is a directed, kinded binary relationship between a pair of elements not in the same model.
  - Mapping elements are triplets relating an element a, an element b, and a type, where type ∈ {IsA, AKindOf, HasA, PartOf, =, Contains, ContainedBy, Unknown, Complex}.
- Comments
  - Some of these types were introduced in other works, e.g., [Euzenat 2004, Giunchiglia et al. 2004, Pottinger and Bernstein 2003, Xu and Embley 2003, Wu et al. 2004].
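One possible way to encode this representation is sketched below. The field order (source, target, kind), the names `Kind`, `MappingElement`, and `check_mapping`, and the sample element names are assumptions made for illustration; the transcript does not preserve the original triplet notation.

```python
from enum import Enum
from typing import NamedTuple


class Kind(Enum):
    """The nine relationship kinds listed on the slide."""
    IS_A = "IsA"
    A_KIND_OF = "AKindOf"
    HAS_A = "HasA"
    PART_OF = "PartOf"
    EQUAL = "="
    CONTAINS = "Contains"
    CONTAINED_BY = "ContainedBy"
    UNKNOWN = "Unknown"
    COMPLEX = "Complex"


class MappingElement(NamedTuple):
    source: str  # element of the first model
    target: str  # element of the second model
    kind: Kind   # directed relationship kind from source to target


def check_mapping(mapping, model_a, model_b):
    """True if every element relates an element of model_a to one of model_b."""
    return all(m.source in model_a and m.target in model_b for m in mapping)


# Hypothetical models and mapping, for illustration only.
model_a = {"Order", "Item"}
model_b = {"PurchaseOrder", "Product"}
mapping = {
    MappingElement("Order", "PurchaseOrder", Kind.EQUAL),
    MappingElement("Item", "Product", Kind.IS_A),
}
```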

13 Proposed Mappings Representation
- An example (figure not reproduced in the transcript)
- Most of the relationship kinds in the mapping representation are well known, except for Unknown and Complex.
  - Unknown means that the relationship between concepts a and b is not precisely known.
  - Complex means that the relationship between concepts a and b may require a functional specification: a = f(b)
    - e.g., Price = PriceVat(VAT + 1)

14 Operators - Invert
- Each of the relationship types introduced has well-defined inversion properties:
  - IsA inverted is AKindOf, HasA inverted is PartOf, Contains inverted is ContainedBy
- Definition [invert for mapping elements]:
  - Consider a mapping element m. Its corresponding inverted mapping element is denoted m^-1. (The triplet expressions for m and m^-1, and the accompanying example, were lost in extraction.)
- Definition [invert for mappings]:
  - Given two models A and B and a mapping, map, between them, the invert of map, denoted map^-1, is defined from B to A and is given by: map^-1 = { m^-1 | m ∈ map }
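A minimal sketch of the Invert operator follows, assuming a mapping element is stored as a tuple (a, b, kind) with the kind given as a string. The three inverse pairs come from the slide; treating "=", Unknown, and Complex as their own inverses is an assumption, and the function names and sample data are hypothetical.

```python
INVERSE_KIND = {
    # Pairs stated on the slide.
    "IsA": "AKindOf", "AKindOf": "IsA",
    "HasA": "PartOf", "PartOf": "HasA",
    "Contains": "ContainedBy", "ContainedBy": "Contains",
    # Assumption: these kinds are taken to be their own inverse.
    "=": "=", "Unknown": "Unknown", "Complex": "Complex",
}


def invert_element(element):
    """Swap the two related elements and replace the kind by its inverse."""
    a, b, kind = element
    return (b, a, INVERSE_KIND[kind])


def invert_mapping(mapping):
    """Invert a mapping from A to B into a mapping from B to A."""
    return {invert_element(m) for m in mapping}


# Hypothetical mapping, for illustration only.
mapping = {("Item", "Product", "IsA"), ("Order", "Line", "HasA")}
inverted = invert_mapping(mapping)
```

Under these assumptions, inverting twice returns the original mapping.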

15 Operators - Compose
- Composing two mappings involves defining a composition operation between the elements of the mappings (i.e., between the triplets that make up each mapping).
- Example: (the composed triplets shown on the slide were lost in extraction)

16 Compose Properties
- Remarks:
  - The result of composing two mappings whose mapping elements are expressed as triplets is closed, i.e., the result is again a mapping made of such triplets.
  - Mapping composition is symmetric in this framework with respect to inversion (the operands of the slide's formula, of the form ( ○ )^-1 = ○ , were lost in extraction).
  - The result of composing two mappings does not produce false correspondences between the elements of the two models, i.e., it does not suggest false directed, kinded relationships.
  - The Compose operator uses the Unknown relationship to indicate when it is not possible (in general) to suggest a relationship type given only the information expressed in the two mappings.
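The role of Unknown in composition can be sketched as follows. The composition table used in the talk is not reproduced in the transcript, so the rules in `compose_kinds` are illustrative assumptions only: "=" acts as an identity, a kind assumed to be transitive composes with itself to the same kind, and every other combination falls back to Unknown. Triplets are stored as (a, b, kind) tuples, and all names are hypothetical.

```python
# Assumption: kinds treated as transitive for the purpose of this sketch.
ASSUMED_TRANSITIVE = {"IsA", "AKindOf", "HasA", "PartOf", "Contains", "ContainedBy"}


def compose_kinds(k1, k2):
    """Illustrative rule set, not the authors' table."""
    if k1 == "=":
        return k2
    if k2 == "=":
        return k1
    if k1 == k2 and k1 in ASSUMED_TRANSITIVE:
        return k1
    return "Unknown"  # cannot be determined from the two mappings alone


def compose(map1, map2):
    """Join triplets on the shared middle element and combine their kinds."""
    return {(a, c, compose_kinds(k1, k2))
            for (a, b, k1) in map1
            for (b2, c, k2) in map2 if b == b2}


# Hypothetical mappings through an intermediary model.
map1 = {("Order", "PurchaseOrder", "="), ("Item", "Product", "IsA")}
map2 = {("PurchaseOrder", "PO", "="), ("Product", "Goods", "IsA"),
        ("Product", "Price", "HasA")}

composed = compose(map1, map2)
```

The output is again a set of triplets over the same kinds, and the one combination the sketch cannot resolve (IsA followed by HasA) is flagged as Unknown instead of being given a guessed kind.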

17 Experiment - Setup
- Experiment goal:
  - Show that the composition framework is robust when applied to real-world applications and that we are able to correctly identify problematic cases.
  - We compare it against mappings as morphisms.
- Five real-world XML schemas in the purchase order domain: CIDR, Excel, Noris, Paragon, and Apertum, from www.biztalk.org
  - They were used in other projects: [Dragut and Lawrence 2004, Madhavan et al. 2001]
- A reference ontology
  - to which each XML schema is manually mapped, both using morphisms [Dragut and Lawrence 2004] and using the new mapping definition.

18 Experiment - Setup
- Example of XML schemas:
  - XML Excel and CIDR schemas (figure not reproduced in the transcript)

19 Experiment - Intermediary model
- Comments:
  - The intermediary model does not have all concepts in the schemas (e.g., unitOfMeasure, count, and VAT).
  - The intermediary model is structurally different from the five schemas considered, and it is defined using OWL.

20 Experiment - Methodology
- Step 1: map the five schemas to the intermediary model
  - first, using morphisms
  - second, using the proposed mapping
- Step 2: apply the compose operators to compute direct mappings between the schemas
  - first, employing composition over morphisms
  - second, using the new compose operator
- Step 3: measure the quality of the two compositions in terms of Precision, Recall, and Overall User Effort.
  - A new metric, User Effort, is introduced.
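Precision and Recall in Step 3 can be computed with the standard set-based definitions, as in the sketch below. The function name `precision_recall`, the use of plain pairs, and the sample data are assumptions for illustration; the transcript does not show how the reference mappings were encoded.

```python
def precision_recall(produced, reference):
    """Standard definitions over sets of correspondences.

    precision = correct produced / all produced
    recall    = correct produced / all reference correspondences
    """
    correct = len(produced & reference)
    precision = correct / len(produced) if produced else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical composed result and manually built reference mapping.
produced = {("firstName", "givenName"), ("firstName", "surname"),
            ("lastName", "givenName"), ("lastName", "surname")}
reference = {("firstName", "givenName"), ("lastName", "surname")}

precision, recall = precision_recall(produced, reference)
```

With this toy input, half of the produced pairs are correct and no reference pair is missed.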

21 Experiment - Stats
- Overall statistics after composition was computed (table not reproduced in the transcript)
- CIDR, Excel, Noris, Paragon, and Apertum are assigned numbers 1, 2, 3, 4, and 5, respectively.

22 Experiment - Stats
- User effort
  - User effort is the percentage of mappings that must be validated by a user.
  - For morphisms, user effort is 100%, as there is no way to distinguish true relationships from false ones.
  - In our framework, it is the ratio of the number of Unknown relationships to the number of all produced relationships. On average it is only 19%.
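The User Effort ratio defined above can be computed as in this small sketch, assuming composed mapping elements are stored as (a, b, kind) tuples with string kinds. The function name and the sample data are hypothetical and do not reproduce the experiment's figures.

```python
def user_effort(composed):
    """Fraction of produced relationships that are Unknown and need validation."""
    if not composed:
        return 0.0
    unknown = sum(1 for (_, _, kind) in composed if kind == "Unknown")
    return unknown / len(composed)


# Hypothetical composition result with one Unknown out of five elements.
composed = {
    ("Order", "PO", "="),
    ("Item", "Goods", "IsA"),
    ("Item", "Price", "Unknown"),
    ("Address", "City", "HasA"),
    ("Total", "Amount", "="),
}

effort = user_effort(composed)
print(f"user effort: {effort:.0%}")
```

For a morphism-based composition, every produced pair would have to be checked, which corresponds to an effort of 1.0 under this definition.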

23 End
Thank you for your time and patience!

