Published by Carol George. Modified over 9 years ago.
1
Mar 27, 2008, Christiano Santiago
Schema Matching
  Matching Large XML Schemas. Erhard Rahm, Hong-Hai Do, Sabine Maßmann
  Putting Context into Schema Matching. Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster
  COMA - A System for Flexible Combination of Schema Matching Approaches. Hong-Hai Do, Erhard Rahm
2
Goals
  Introductory concepts of schema matching
  Context-sensitive versus context-insensitive matching
  Complexity of XSD schemas
3
Agenda
  Terminology
  Different Approaches
  XML Schema Definition
  Context-Insensitive
  Context-Sensitive
  Q&A
4
Terminology
  Schema matching: the process of identifying that two objects are semantically related (meaning).
  Mapping: the transformations between the objects (conversion).
5
Terminology
  Schemas
    Student(Name, SSN, Level, Major, Marks)
    GradStudent(Name, ID, Major, Grades)
  Correspondences
    Student.Name ≈ GradStudent.Name
    Student.SSN ≈ GradStudent.ID
    Student.Marks ≈ GradStudent.Grades
  (Figure labels: Match, Transformation)
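The distinction between a match and a mapping on the slide above can be illustrated with a minimal sketch. The correspondences are taken from the slide; the function name `apply_mapping`, the sample record, and the choice to copy values unchanged are assumptions made only for illustration.

```python
# Match result: which Student attribute corresponds to which GradStudent attribute.
# These three pairs come from the slide; everything else here is hypothetical.
matches = {
    "Student.Name": "GradStudent.Name",
    "Student.SSN": "GradStudent.ID",
    "Student.Marks": "GradStudent.Grades",
}


def apply_mapping(student: dict) -> dict:
    """Build a GradStudent record from a Student record using the matches.

    A real mapping could also convert values (for example marks to letter
    grades); this sketch copies each matched value unchanged.
    """
    grad = {}
    for source, target in matches.items():
        source_field = source.split(".", 1)[1]
        target_field = target.split(".", 1)[1]
        if source_field in student:
            grad[target_field] = student[source_field]
    return grad


# Hypothetical sample record.
student = {"Name": "Ann", "SSN": "123", "Level": "MSc", "Major": "CS", "Marks": [90, 85]}
grad = apply_mapping(student)
```

The dictionary is the output of matching; the function is the mapping that uses it to move data.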
6
Schema Matching
7
Context
  Context-insensitive
  Context-sensitive
8
Different Approaches
  Schema-level matchers
  Instance-level matchers
  Hybrid matchers
  Reusing matching information
9
Schema-Level Matchers
  Only consider schema information
    Name
    Description
    Data type
    Relationship
    Constraints
    Number of nesting levels
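As a rough illustration of a schema-level matcher, the sketch below scores two elements using only their names and data types, two of the items listed above. It is not the matcher of any system named in this presentation: the string measure (`difflib.SequenceMatcher`), the weights, and the element dictionaries are all assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def schema_level_score(e1: dict, e2: dict,
                       name_weight: float = 0.7,
                       type_weight: float = 0.3) -> float:
    """Weighted sum of name similarity and data-type equality.

    The weights are illustrative assumptions, not tuned values.
    """
    name_sim = name_similarity(e1["name"], e2["name"])
    type_sim = 1.0 if e1["type"] == e2["type"] else 0.0
    return name_weight * name_sim + type_weight * type_sim


# Hypothetical element descriptions: no instance data is consulted.
student_name = {"name": "Name", "type": "string"}
grad_name = {"name": "Name", "type": "string"}
student_marks = {"name": "Marks", "type": "decimal"}

same = schema_level_score(student_name, grad_name)
different = schema_level_score(student_name, student_marks)
```

A pair such as SSN and ID would score poorly under this name-based measure, which is one motivation for the instance-level and hybrid matchers that follow.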
10
Instance-Level Matchers
  Use instance-level data to gather insight into the content and meaning of schema elements
    Linguistic: Dept, DeptName, EmpName
    Constraints: 416-7362100, M3J1P3
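One way to picture a constraint-based instance-level matcher is to reduce each value to a character pattern and compare the patterns found in two columns. The sketch below does this for the two sample values on the slide; the pattern alphabet (`9` for digits, `A` for letters), the Jaccard comparison, and the extra value `K1A0B1` are assumptions for illustration.

```python
def value_pattern(value: str) -> str:
    """Replace digits with '9' and letters with 'A'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def pattern_overlap(values_a: list, values_b: list) -> float:
    """Jaccard overlap of the pattern sets observed in two columns."""
    patterns_a = {value_pattern(v) for v in values_a}
    patterns_b = {value_pattern(v) for v in values_b}
    if not patterns_a or not patterns_b:
        return 0.0
    return len(patterns_a & patterns_b) / len(patterns_a | patterns_b)


phone_pattern = value_pattern("416-7362100")
code_pattern = value_pattern("M3J1P3")
# "K1A0B1" is a made-up value with the same shape as "M3J1P3".
same_shape = pattern_overlap(["M3J1P3"], ["K1A0B1"])
different_shape = pattern_overlap(["M3J1P3"], ["416-7362100"])
```

Columns whose values share a shape become match candidates even when their names differ.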
11
Hybrid-Level Matchers
  Combine more than one approach
12
Reusing Matching Information
  Use previous matching information for future matching tasks
  Structures or substructures often repeat
  Caution: whether a previous match can be reused depends on the application
    Salary & Income: Payroll versus Tax Reporting
13
XML Schema Definition (XSD)
  Data types
    19 built-in primitive data types
    25 built-in derived data types
    User-defined complex types
14
XML Schema Definition (XSD)
  Complex type definition
    Attribute
    Child elements
  (Example values shown on the slide: Don Smith; Dallas, TX)
15
XML Schema Definition (XSD)
  Shared schema components
16
XML Schema Definition (XSD)
  Match system approaches
    COMA: path-based
    Cupid: materialized
  Scalability issue: the XCBL Order schema contains 1451 components, including 91 shared types. After resolving the shared components, 26000+ nodes/paths were identified.
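The growth from components to paths can be seen on a toy schema: a shared component reachable through several parents contributes one path per parent. The sketch below enumerates root-to-node paths in a small graph; the schema, its element names, and the function `enumerate_paths` are hypothetical and are not taken from XCBL or from COMA.

```python
def enumerate_paths(graph: dict, root: str) -> list:
    """Return every root-to-node path as a dotted string (depth-first).

    Assumes the graph is acyclic; a shared node is visited once per parent.
    """
    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        paths.append(".".join(path))
        for child in graph.get(node, []):
            walk(child, path)

    walk(root, [])
    return paths


# Hypothetical schema graph: "Address" is a shared component used twice.
graph = {
    "Order": ["BillTo", "ShipTo"],
    "BillTo": ["Address"],
    "ShipTo": ["Address"],
    "Address": ["Street", "City"],
}

nodes = set(graph) | {child for children in graph.values() for child in children}
paths = enumerate_paths(graph, "Order")
```

Here 6 distinct components yield 9 paths, because `Address` and its children are counted once under `BillTo` and once under `ShipTo`. With many shared types nested inside each other the multiplication compounds.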
17
XML Schema Definition (XSD)
  Distributed schemas
    XSD allows a schema to be distributed over several schema documents (.xsd files) and namespaces
18
XML Schema Definition (XSD)
  Determining the similarity between complex types, and matching them, can be as difficult as matching two complete schemas.
19
Standard Schema Matching (Context-Insensitive)
  Matchers
    Matching algorithms compute similarity scores between a pair of attributes
  Weights
    Scores are weighted
    Confidence scores are identified based on standard statistical techniques
  Selection of best matches
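The three steps above (matcher scores, weighting, selection) can be sketched as follows. This is an assumed minimal pipeline, not the procedure of any cited paper: the scores are made up, the combination is a plain weighted average, and the statistical derivation of confidence scores mentioned on the slide is not modeled.

```python
def combine(scores_by_matcher: dict, weights: dict) -> dict:
    """Weighted average of per-matcher similarity scores for each attribute pair."""
    total_weight = sum(weights.values())
    combined = {}
    for matcher, scores in scores_by_matcher.items():
        for pair, score in scores.items():
            combined[pair] = combined.get(pair, 0.0) + weights[matcher] * score / total_weight
    return combined


def select_best(combined: dict, threshold: float = 0.5) -> dict:
    """Keep, for each source attribute, the highest-scoring target above the threshold."""
    best = {}
    for (source, target), score in combined.items():
        if score >= threshold and score > best.get(source, (None, 0.0))[1]:
            best[source] = (target, score)
    return best


# Made-up scores from two hypothetical matchers.
scores_by_matcher = {
    "name": {("SSN", "ID"): 0.2, ("SSN", "Name"): 0.3, ("Name", "Name"): 1.0},
    "instance": {("SSN", "ID"): 0.9, ("SSN", "Name"): 0.1, ("Name", "Name"): 0.8},
}
weights = {"name": 1.0, "instance": 1.0}

best = select_best(combine(scores_by_matcher, weights))
```

In this example the name matcher alone would prefer SSN to Name over SSN to ID, and the instance matcher's score reverses that choice after combination.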
20
Fragment-Based Schema Matching (Context-Insensitive)
  Fragment identification
  Identifying fragment-pair candidates
  Fragment matching
  Result combination
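The four steps above can be sketched on a toy example. Everything specific here is an assumption: fragments are given as a root name with a flat list of elements, candidate fragment pairs are chosen by root-name similarity, and `difflib.SequenceMatcher` stands in for whatever matchers a real system would apply.

```python
from difflib import SequenceMatcher


def sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_fragments(source_fragments: dict, target_fragments: dict,
                    fragment_threshold: float = 0.6,
                    element_threshold: float = 0.6) -> dict:
    """Match elements only inside fragment pairs whose roots look similar."""
    result = {}
    for s_root, s_elements in source_fragments.items():
        # Candidate identification: most similar target fragment by root name.
        t_root = max(target_fragments, key=lambda t: sim(s_root, t))
        if sim(s_root, t_root) < fragment_threshold:
            continue  # no plausible partner fragment, skip its elements entirely
        # Fragment matching: compare elements within the chosen pair only.
        for s_element in s_elements:
            t_element = max(target_fragments[t_root], key=lambda e: sim(s_element, e))
            if sim(s_element, t_element) >= element_threshold:
                # Result combination: collect all fragment-level matches in one result.
                result[f"{s_root}.{s_element}"] = f"{t_root}.{t_element}"
    return result


# Hypothetical fragments (step 1, fragment identification, is assumed done).
source_fragments = {"ShipTo": ["Street", "City"], "Items": ["Qty"]}
target_fragments = {"ShipToAddress": ["StreetName", "CityName"], "Payment": ["Amount"]}

result = match_fragments(source_fragments, target_fragments)
```

The benefit is a smaller search space: `Items.Qty` is never compared with any target element because its fragment has no candidate partner.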
21
Prototype
  Based on COMA (COmbining MAtch algorithms)
  Support for schemas spread over multiple files
  Multiple matching strategies
  Fragment-based approach
  Result combination
22
COMA
  Schema representation
    Schemas are represented by rooted DAGs (Directed Acyclic Graphs).
23
COMA
  Directed Acyclic Graphs
    Directed graph
    With no cycles
    Part tree and part graph
    Used in Critical Path Analysis, Expression Tree Evaluation and Game Evaluation
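The "no cycles" property can be checked with a depth-first search that looks for an edge back into the node currently being explored. The sketch below is a generic check, not part of COMA; the adjacency-list format and the example graphs are assumptions.

```python
def is_acyclic(graph: dict) -> bool:
    """Return True if the directed graph (adjacency lists) contains no cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}

    def visit(node) -> bool:
        color[node] = GRAY
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:
                return False  # edge back into the current path: a cycle
            if state == WHITE and not visit(child):
                return False
        color[node] = BLACK
        return True

    return all(color.get(node, WHITE) != WHITE or visit(node) for node in graph)


# Hypothetical graphs. In the first, "Address" has two parents, so it is
# not a tree, but it is still acyclic ("part tree and part graph").
shared_component = {
    "Order": ["BillTo", "ShipTo"],
    "BillTo": ["Address"],
    "ShipTo": ["Address"],
}
recursive_definition = {"Part": ["SubPart"], "SubPart": ["Part"]}

dag_ok = is_acyclic(shared_component)
cycle_found = not is_acyclic(recursive_definition)
```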
24
COMA
  Match processing reusability
25
Continuity of this work
  2004: COMA prototype
  2005: COMA++, an extension of the previous COMA prototype
    High quality and fast execution times
    Default combination of 4 matchers
  2007: MOMA (Mapping-based Object Matching)
26
Context Schema Matching (Context-Sensitive)
  False negatives
    R_S.price.price → R_T.music.price
    R_S.price.price → R_T.music.sale
    R_S.price.prcode = “reg”
    R_S.price.prcode = “sale”
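A contextual match can be pictured as an ordinary match with a condition on the source rows attached. The sketch below assumes that the two conditions on the slide pair with the two matches in the order listed ("reg" with music.price, "sale" with music.sale); that pairing, the data structure, and the function `target_for` are assumptions for illustration.

```python
# Each contextual match: source attribute, target attribute, and the
# condition on the source row under which the match holds.
# The pairing of conditions with targets is an assumption (see lead-in).
contextual_matches = [
    {"source": "price.price", "target": "music.price", "condition": ("prcode", "reg")},
    {"source": "price.price", "target": "music.sale", "condition": ("prcode", "sale")},
]


def target_for(row: dict):
    """Return the target attribute for a source row, or None if no condition holds."""
    for match in contextual_matches:
        attribute, value = match["condition"]
        if row.get(attribute) == value:
            return match["target"]
    return None


# Hypothetical source rows.
regular_row = {"price": 12.0, "prcode": "reg"}
sale_row = {"price": 9.0, "prcode": "sale"}

regular_target = target_for(regular_row)
sale_target = target_for(sale_row)
```

A context-insensitive matcher sees a single `price.price` column holding a mixture of both kinds of values, which is how a correct match can be missed (a false negative).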
27
Context Schema Matching (Context-Sensitive)
  Two techniques for selecting contextual matches:
    MultiTable: find the single match with the highest confidence for every target attribute
    QualTable: find the best matches on a per-table basis
28
Context Schema Matching (Context-Sensitive)
  Experimental results
    “Because of its poor performance, MultiTable is not considered further”
29
Conclusion
  Current schema matching approaches still need to improve for large and complex schemas.
  The large search space increases both the likelihood of false matches and execution times.
  Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages like XSD.
30
Questions