1
Generic Schema Matching with Cupid
Jayant Madhavan, Philip A. Bernstein, Erhard Rahm
Proceedings of the 27th VLDB Conference
2
Schema Matching
3
Schema Matching (Cont.)
Definition: Finding a mapping between those elements of two schemas that semantically correspond to each other
Applications
  Schema integration
  Data translation
  XML message mapping
  Data warehouse loading
Goal
4
Taxonomy
  Schema vs. Instance based
  Element vs. Structure granularity
  Linguistic based
  Constraint based
  Matching cardinality
  Auxiliary information
  Individual vs. Combinational
5
Cupid
  Schema-based
  Automated linguistic-based matching
  Both element-based and structure-based
  Biased toward similarity of atomic elements
  Exploits internal structure
  Exploits keys, referential constraints and views
  Makes context-dependent matches of a shared type
  1:n mapping
6
Similarity Coefficient Computation
First Phase: Linguistic matching
  Names
  Data types
  Domains
  Linguistic similarity coefficient: lsim
Second Phase: Structural matching
  Contexts
  Linguistic similarity coefficients
  Structural similarity coefficient: ssim
Hybrid: wsim = w_struct * ssim + (1 - w_struct) * lsim
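The hybrid formula on this slide can be sketched in a few lines. This is only an illustration of the weighted sum; the function name `weighted_similarity`, the default weight of 0.5, and the range check are assumptions made here, not values taken from the slides.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Combine the two coefficients: wsim = w_struct * ssim + (1 - w_struct) * lsim.

    The default w_struct and the [0, 1] check are assumptions for this sketch.
    """
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


if __name__ == "__main__":
    # With w_struct = 0 only the linguistic score counts; with 1 only the structural one.
    for w in (0.0, 0.5, 1.0):
        print(w, weighted_similarity(lsim=1.0, ssim=0.5, w_struct=w))
```

Setting `w_struct` closer to 1 lets structure dominate, which matters when element names are uninformative; closer to 0 it trusts the names.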
7
Linguistic Matching
Normalization
  Tokenization
  Expansion
  Elimination
Categorization
  Data types
  Schema hierarchy
  Linguistic contents
Comparison: Linguistic Similarity Coefficient (lsim)
  Thesaurus
  Sub-string matching
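The normalization and comparison steps above can be sketched as follows. This is a minimal illustration, not Cupid's implementation: the lookup tables `ABBREVIATIONS`, `STOP_WORDS`, and `SYNONYMS`, the synonym score of 0.8, the length-ratio score for sub-string matches, and the averaging of best token matches are all assumptions made for the example. Categorization is left out.

```python
import re

# Hypothetical lookup tables; a real matcher would load these from a thesaurus.
ABBREVIATIONS = {"po": ["purchase", "order"], "qty": ["quantity"], "num": ["number"]}
STOP_WORDS = {"of", "the", "a", "an", "in", "for"}
SYNONYMS = {frozenset({"bill", "invoice"}), frozenset({"customer", "client"})}

# Zero-width split points: lower/digit -> upper, and the end of an acronym ("POLines").
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def normalize(name):
    """Tokenization, expansion, and elimination of one element name."""
    spaced = _CAMEL_BOUNDARY.sub(" ", name)
    tokens = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]
    expanded = []
    for token in tokens:
        expanded.extend(ABBREVIATIONS.get(token, [token]))  # expand abbreviations
    return [t for t in expanded if t not in STOP_WORDS]      # drop stop words


def token_similarity(a, b):
    """Compare two tokens via equality, the thesaurus, then sub-string matching."""
    if a == b:
        return 1.0
    if frozenset({a, b}) in SYNONYMS:
        return 0.8  # assumed score for a thesaurus synonym
    short, long_ = sorted((a, b), key=len)
    if short and short in long_:
        return len(short) / len(long_)  # assumed sub-string score
    return 0.0


def name_similarity(name1, name2):
    """Average each token's best match in the other name (an assumed aggregation)."""
    t1, t2 = normalize(name1), normalize(name2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(token_similarity(a, b) for b in t2) for a in t1)
    best2 = sum(max(token_similarity(a, b) for a in t1) for b in t2)
    return (best1 + best2) / (len(t1) + len(t2))


if __name__ == "__main__":
    print(normalize("POLines"))
    print(name_similarity("BillTo", "InvoiceTo"))
```

Averaging in both directions keeps the score symmetric, so the result does not depend on which schema is listed first.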
8
Structural Matching Bottom-up Mutually Recursive
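One way to picture a bottom-up, mutually reinforcing computation is the sketch below. It is a heavily simplified illustration under stated assumptions, not the algorithm from the paper: it keeps a single similarity value per pair, scores two inner nodes by the fraction of their leaves that have a strong link to a leaf on the other side, and then scales the leaf-pair similarities in those subtrees up or down. The names `Node`, `tree_match`, `leaf_sim`, and all threshold and factor values (`th_accept`, `th_high`, `th_low`, `c_inc`, `c_dec`) are hypothetical, and node names are assumed to be unique within each tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def leaves(node):
    """All leaf nodes in the subtree rooted at node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def post_order(node):
    """Yield children before parents, so the traversal runs bottom-up."""
    for child in node.children:
        yield from post_order(child)
    yield node


def tree_match(root1, root2, leaf_sim, th_accept=0.5, th_high=0.6,
               th_low=0.35, c_inc=1.2, c_dec=0.9):
    """Return {(name1, name2): similarity}; all parameter values are hypothetical."""
    sim = {(a.name, b.name): leaf_sim(a.name, b.name)
           for a in leaves(root1) for b in leaves(root2)}
    for n1 in post_order(root1):
        for n2 in post_order(root2):
            if not (n1.children and n2.children):
                continue  # only inner-node pairs are scored in this sketch
            l1, l2 = leaves(n1), leaves(n2)
            # Count leaves that have at least one strong link to the other subtree.
            strong1 = sum(any(sim[a.name, b.name] >= th_accept for b in l2) for a in l1)
            strong2 = sum(any(sim[a.name, b.name] >= th_accept for a in l1) for b in l2)
            score = (strong1 + strong2) / (len(l1) + len(l2))
            sim[n1.name, n2.name] = score
            # Similar contexts reinforce their leaf pairs; dissimilar ones weaken them.
            factor = c_inc if score >= th_high else c_dec if score <= th_low else 1.0
            for a in l1:
                for b in l2:
                    sim[a.name, b.name] = min(1.0, sim[a.name, b.name] * factor)
    return sim


if __name__ == "__main__":
    po = Node("PO", [Node("City"), Node("Street"), Node("Zip")])
    order = Node("Order", [Node("City"), Node("Road"), Node("Zip")])
    initial = {("Street", "Road"): 0.45}  # made-up starting score for this pair
    result = tree_match(po, order, lambda a, b: 1.0 if a == b else initial.get((a, b), 0.0))
    print(result["PO", "Order"], result["Street", "Road"])
```

In the usage example, the pair (Street, Road) starts just below the acceptance threshold and is lifted above it because the parents share most of their other leaves, which is the kind of context effect that name comparison alone cannot capture.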
9
Example
10
Example (Cont.)
12
Schema Graphs
  Elements
  Relationships (containment, aggregation, and IsDerivedFrom)
  Matching Shared Types (context-dependent mappings)
  Matching Referential Constraints
  General Schemas
13
Matching Shared Types
14
Matching Referential Constraints
15
Other Features
  Optionality
  Views
  Initial Mappings
  Lazy Expansion
  Pruning Leaves
16
Comparative Study
Algorithms
  MOMIS
  DIKE
  Cupid
Canonical Examples
Real World Example
17
Canonical Examples
  Identical schemas
  Atomic elements with same names, but different data types
  Atomic elements with same data types, but different names (a prefix or suffix is added)
  Different class names, but atomic elements with same names and data types
  Different nesting of the data: similar schemas with nested and flat structures
  Type substitution or context-dependent mapping
18
Real World Example
19
Experimental Conclusions
  Linguistic matching
  Thesaurus
  Linguistic similarity with no structure similarity
  Granularity of similarity computation
  Leaves
  Structure information beyond the immediate vicinity
  Context-dependent mappings
  Performance parameters
20
Future Work
A Truly Robust Solution
  Machine learning applied to instances
  Natural language technology
  Pattern matching to reuse known matches
Immediate Challenges
  Off-the-shelf thesaurus
  Schema annotations
  Automatic tuning of the control parameters
  Scalability analysis and testing
  More comparative analysis of algorithms