Schema Matching (Christiano Santiago, Mar 27, 2008)


1 Mar 27, 2008, Christiano Santiago. Schema Matching. Matching Large XML Schemas (Erhard Rahm, Hong-Hai Do, Sabine Maßmann); Putting Context into Schema Matching (Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster); COMA - A System for Flexible Combination of Schema Matching Approaches (Hong-Hai Do, Erhard Rahm)

2 Goals: introductory concepts of schema matching; context-sensitive versus context-insensitive matching; complexity of XSD schemas

3 Agenda: Terminology; Different Approaches; XML Schema Definition; Context-Insensitive; Context-Sensitive; Q&A

4 Terminology. Schema matching (meaning): the process of identifying that two objects are semantically related. Mapping (conversion): the transformations between the objects.

5 Terminology. Example of a match and a transformation. Schemas: Student (Name, SSN, Level, Major, Marks) and GradStudent (Name, ID, Major, Grades). Matches: Student.Name ≈ GradStudent.Name; Student.SSN ≈ GradStudent.ID; Student.Marks ≈ GradStudent.Grades
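A minimal Python sketch of the distinction between a match and a mapping: the match list only records which elements correspond, while the mapping attaches a conversion to each correspondence. The `marks_to_grade` function and its thresholds are invented for illustration; the slides do not say how marks are converted to grades.

```python
from typing import Any, Callable, Dict, List, Tuple

# Matches from the slide: pairs of semantically related elements
# (Student attribute, GradStudent attribute).
MATCHES: List[Tuple[str, str]] = [
    ("Name", "Name"),
    ("SSN", "ID"),
    ("Marks", "Grades"),
]


def marks_to_grade(marks: int) -> str:
    """Hypothetical conversion; the thresholds are assumptions."""
    if marks >= 80:
        return "A"
    if marks >= 65:
        return "B"
    return "C"


# Mapping: each match plus the transformation that converts the value.
MAPPING: Dict[Tuple[str, str], Callable[[Any], Any]] = {
    ("Name", "Name"): lambda value: value,
    ("SSN", "ID"): lambda value: value,
    ("Marks", "Grades"): marks_to_grade,
}


def to_grad_student(student: Dict[str, Any]) -> Dict[str, Any]:
    """Build a GradStudent record by applying every mapped transformation."""
    return {tgt: convert(student[src]) for (src, tgt), convert in MAPPING.items()}
```

Unmatched attributes such as Level are simply not carried over, since the slide lists no correspondence for them.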

6 Schema Matching

7 Context: context-insensitive; context-sensitive

8 Different Approaches: schema-level matchers; instance-level matchers; hybrid matchers; reusing matching information

9 Schema-Level Matchers: only consider schema information, such as name, description, data type, relationships, constraints, and number of nesting levels
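As a rough illustration of a schema-level matcher, the Python sketch below scores two elements using only their names and data types. The equal weighting, the `NUMERIC_TYPES` compatibility group, and the element dictionaries are assumptions made for the example, not part of any system named in the slides.

```python
from difflib import SequenceMatcher

# Assumed compatibility group: numeric types count as partially similar.
NUMERIC_TYPES = {"int", "integer", "decimal", "float"}


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(t1: str, t2: str) -> float:
    """1.0 for identical types, 0.5 for two numeric types, else 0.0."""
    if t1 == t2:
        return 1.0
    if t1 in NUMERIC_TYPES and t2 in NUMERIC_TYPES:
        return 0.5
    return 0.0


def schema_level_score(e1: dict, e2: dict) -> float:
    """Average of name and type similarity (equal weights are an assumption)."""
    return 0.5 * name_similarity(e1["name"], e2["name"]) + 0.5 * type_similarity(
        e1["type"], e2["type"]
    )
```

No instance data is consulted, which is what separates this kind of matcher from the instance-level one.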

10 Instance-Level Matchers: use instance-level data to gather insight into the content and meaning of schema elements. Linguistic characterization: Dept, DeptName, EmpName. Constraint-based characterization: value patterns such as 416-7362100 or M3J1P3
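One way to picture the constraint-based side of instance-level matching is to reduce each value to a character-class pattern and compare the patterns seen in two columns. The sketch below is an assumption about how such a matcher could look; the function names and the Jaccard comparison are illustrative choices.

```python
from typing import Iterable


def value_pattern(value: str) -> str:
    """Replace digits with '9' and letters with 'A'; keep other characters."""
    chars = []
    for ch in value:
        if ch.isdigit():
            chars.append("9")
        elif ch.isalpha():
            chars.append("A")
        else:
            chars.append(ch)
    return "".join(chars)


def pattern_similarity(col_a: Iterable[str], col_b: Iterable[str]) -> float:
    """Jaccard overlap of the value patterns observed in two columns."""
    patterns_a = {value_pattern(v) for v in col_a}
    patterns_b = {value_pattern(v) for v in col_b}
    if not patterns_a or not patterns_b:
        return 0.0
    return len(patterns_a & patterns_b) / len(patterns_a | patterns_b)
```

With the slide's sample values, a phone-like column and a postal-code-like column share no pattern, so they would not be proposed as a match.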

11 Hybrid Matchers: combine more than one approach

12 Reusing Matching Information: use previous matching information for future matching tasks; structures or substructures often repeat. Caution: Salary and Income may be treated as equivalent in a payroll application but not in a tax reporting application
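Reuse can be sketched as composing two earlier match results that share a schema: if A matches B and B matches C, then A is proposed as a match for C. In the Python sketch below, averaging the two scores is an assumption, and the schema and element names are hypothetical. The caution on the slide applies here too, since a composed match is only a candidate and may not hold in the new application.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def compose(first: Dict[Pair, float], second: Dict[Pair, float]) -> Dict[Pair, float]:
    """Derive A-to-C candidates from A-to-B and B-to-C match results."""
    composed: Dict[Pair, float] = {}
    for (a, b), score_ab in first.items():
        for (b2, c), score_bc in second.items():
            if b != b2:
                continue
            score = (score_ab + score_bc) / 2  # assumed aggregation
            # Keep the best score if several intermediate elements link a and c.
            composed[(a, c)] = max(score, composed.get((a, c), 0.0))
    return composed
```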

13 XML Schema Definition (XSD). Data types: 19 built-in primitive data types; 25 built-in derived data types; user-defined complex types

14 XML Schema Definition (XSD). Complex type definition: attribute and child elements. Example instance values: Don Smith; Dallas, TX

15 XML Schema Definition (XSD): shared schema components

16 XML Schema Definition (XSD). Match system approaches: COMA: path-based; Cupid: materialized. Scalability issue: the XCBL Order schema contains 1451 components, including 91 shared types. After resolving the shared components, 26000+ nodes/paths were identified.
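The growth from components to paths can be reproduced on a toy schema. In the Python sketch below, the schema is a hypothetical stand-in (not the XCBL Order schema): a shared Address type is referenced from two places, so every element beneath it is counted once per referencing path.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical schema graph: component -> child components.
# "Address" is a shared type referenced by both BillTo and ShipTo.
SCHEMA: Dict[str, List[str]] = {
    "Order": ["BillTo", "ShipTo"],
    "BillTo": ["Address"],
    "ShipTo": ["Address"],
    "Address": ["Street", "City", "Zip"],
    "Street": [],
    "City": [],
    "Zip": [],
}


@lru_cache(maxsize=None)
def count_paths(node: str) -> int:
    """Number of root-to-node paths in the subtree obtained by unfolding node."""
    return 1 + sum(count_paths(child) for child in SCHEMA[node])


component_count = len(SCHEMA)        # distinct components in the graph
path_count = count_paths("Order")    # nodes after resolving shared components
```

Here 7 components unfold into 11 paths; with many shared types nested inside each other, the same effect compounds.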

17 XML Schema Definition (XSD). Distributed schemas: XSD allows a schema to be distributed over several schema documents (.xsd files) and namespaces.

18 XML Schema Definition (XSD): determining the similarity between complex types, and matching them, can be as difficult as matching two complete schemas.

19 Standard Schema Matching (Context-Insensitive). Matchers: matching algorithms compute similarity scores between pairs of attributes. Weights: the scores are weighted. Confidence scores are derived using standard statistical techniques. Selection of the best matches
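The weighting and selection steps can be sketched as follows in Python. This is a simplified illustration under stated assumptions: matcher outputs are given as dictionaries, the weights and the 0.5 threshold are arbitrary, and the statistical derivation of confidence scores mentioned on the slide is not modelled.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source attribute, target attribute)


def weighted_scores(
    matcher_scores: Dict[str, Dict[Pair, float]], weights: Dict[str, float]
) -> Dict[Pair, float]:
    """Weighted average of the per-matcher similarity scores for each pair."""
    total_weight = sum(weights[m] for m in matcher_scores)
    combined: Dict[Pair, float] = {}
    for matcher, scores in matcher_scores.items():
        for pair, score in scores.items():
            share = weights[matcher] * score / total_weight
            combined[pair] = combined.get(pair, 0.0) + share
    return combined


def select_best(combined: Dict[Pair, float], threshold: float = 0.5) -> Dict[str, Pair]:
    """For each target attribute keep the best-scoring source above the threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for (src, tgt), score in combined.items():
        if score >= threshold and score > best.get(tgt, ("", 0.0))[1]:
            best[tgt] = (src, score)
    return {tgt: (src, tgt) for tgt, (src, _) in best.items()}
```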

20 Fragment-Based Schema Matching (Context-Insensitive). Steps: fragment identification; identification of fragment-pair candidates; fragment matching; result combination
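The four steps can be sketched in Python under simplifying assumptions: fragments are taken as already identified (a dictionary from fragment name to element names), candidate pairs are chosen by fragment-name similarity, and a plain string matcher stands in for the real matchers. All names and the 0.5 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_fragments(
    source: Dict[str, List[str]],
    target: Dict[str, List[str]],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    result: List[Tuple[str, str]] = []
    # Step 1 (fragment identification) is assumed done: the dict keys are fragments.
    for s_frag, s_elems in source.items():
        # Step 2: pick the most similar target fragment as the candidate pair.
        t_frag = max(target, key=lambda t: sim(s_frag, t))
        if sim(s_frag, t_frag) < threshold:
            continue
        # Step 3: match elements only within the selected fragment pair.
        for s_elem in s_elems:
            t_elem = max(target[t_frag], key=lambda e: sim(s_elem, e))
            if sim(s_elem, t_elem) >= threshold:
                # Step 4: combine fragment results into one schema-level result.
                result.append((f"{s_frag}.{s_elem}", f"{t_frag}.{t_elem}"))
    return result
```

Restricting step 3 to candidate fragment pairs is what shrinks the search space compared with matching every element against every other element.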

21 Prototype. Based on COMA (COmbining MAtch algorithm). Support for schemas spread over multiple files; multiple matching strategies; fragment-based approach; result combination

22 COMA. Schema representation: schemas are represented by rooted DAGs (Directed Acyclic Graphs).

23 COMA. Directed Acyclic Graphs: directed graph with no cycles; part tree and part graph; used in critical path analysis, expression tree evaluation, and game evaluation
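Whether a schema graph qualifies as a rooted DAG can be checked mechanically. The Python sketch below uses the standard-library `graphlib` module (Python 3.9 or later); the graph encoding as a dictionary of child lists and the sample schema are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def is_dag(children: Dict[str, List[str]]) -> bool:
    """True if the directed graph has no cycles.

    TopologicalSorter expects predecessor lists; passing child lists only
    reverses the resulting order, which does not affect cycle detection.
    """
    try:
        tuple(TopologicalSorter(children).static_order())
        return True
    except CycleError:
        return False


def roots(children: Dict[str, List[str]]) -> List[str]:
    """Nodes that no other node references as a child."""
    referenced = {child for kids in children.values() for child in kids}
    return [node for node in children if node not in referenced]


def is_rooted_dag(children: Dict[str, List[str]]) -> bool:
    return is_dag(children) and len(roots(children)) == 1
```

A shared component shows up as a node with two parents, which is why the structure is "part tree and part graph".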

24 COMA: Match processing reusability

25 Continuity of this work. 2004: COMA prototype. 2005: COMA++, which extended the previous COMA prototype; high quality and fast execution times; default combination of 4 matchers. 2007: MOMA (Mapping-based Object Matching)

26 Context Schema Matching (Context-Sensitive). False negatives. Contextual matches: R_S.price.price → R_T.music.price when R_S.price.prcode = “reg”; R_S.price.price → R_T.music.sale when R_S.price.prcode = “sale”
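A contextual match can be read as an ordinary match guarded by a condition on the source row. The Python sketch below assumes that the “reg” condition belongs to the price match and the “sale” condition to the sale match, and it introduces a hypothetical `id` column to group source rows into target records; neither detail is stated on the slide.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# (source attribute, target attribute, condition on the source row)
CONTEXTUAL_MATCHES: List[Tuple[str, str, Callable[[Row], bool]]] = [
    ("price", "price", lambda row: row["prcode"] == "reg"),
    ("price", "sale", lambda row: row["prcode"] == "sale"),
]


def translate(price_rows: List[Row]) -> Dict[Any, Row]:
    """Fill target music records from source price rows, one match at a time."""
    music: Dict[Any, Row] = {}
    for row in price_rows:
        record = music.setdefault(row["id"], {})  # "id" is a hypothetical key
        for src_attr, tgt_attr, condition in CONTEXTUAL_MATCHES:
            if condition(row):
                record[tgt_attr] = row[src_attr]
    return music
```

Without the conditions, a context-insensitive matcher would have to send every source price to a single target attribute, so one of the two correct matches would be missed.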

27 Context Schema Matching (Context-Sensitive). Two techniques for selecting contextual matches. MultiTable: find the single match with the highest confidence for every target attribute. QualTable: find the best matches on a per-table basis

28 Context Schema Matching (Context-Sensitive). Experimental results: “Because of its poor performance, MultiTable is not considered further”

29 Conclusion: Current schema matching approaches still need to improve for large and complex schemas. The large search space increases both the likelihood of false matches and execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages like XSD.

30 Questions

