Generic Schema Matching with Cupid
Jayant Madhavan, Philip A. Bernstein, Erhard Rahm
Proceedings of the 27th VLDB Conference

Schema Matching

Schema Matching (Cont.)
- Definition: finding a mapping between those elements of two schemas that semantically correspond to each other
- Applications
  - Schema integration
  - Data translation
  - XML message mapping
  - Data warehouse loading
- Goal

Taxonomy
- Schema vs. instance based
- Element vs. structure granularity
- Linguistic based
- Constraint based
- Matching cardinality
- Auxiliary information
- Individual vs. combinational

Cupid
- Schema-based
- Automated linguistic-based matching
- Both element-based and structure-based
- Biased toward similarity of atomic elements
- Exploits internal structure
- Exploits keys, referential constraints and views
- Makes context-dependent matches of a shared type
- 1:n mapping

Similarity Coefficient Computation
- First phase: linguistic matching
  - Names
  - Data types
  - Domains
  - Result: linguistic similarity coefficient, lsim
- Second phase: structural matching
  - Contexts
  - Linguistic similarity coefficients
  - Result: structural similarity coefficient, ssim
- Hybrid: wsim = w_struct * ssim + (1 - w_struct) * lsim
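The hybrid coefficient is a convex combination of the two phase results. A minimal sketch of that formula follows; the function name and the default weight of 0.5 are illustrative assumptions, not values taken from the slides.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Combine linguistic and structural similarity into wsim.

    wsim = w_struct * ssim + (1 - w_struct) * lsim
    The default w_struct is a placeholder chosen for illustration.
    """
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim
```

With w_struct = 0 the result is purely linguistic, and with w_struct = 1 it is purely structural; values in between trade one off against the other.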

Linguistic Matching
- Normalization
  - Tokenization
  - Expansion
  - Elimination
- Categorization
  - Data types
  - Schema hierarchy
  - Linguistic content
- Comparison: linguistic similarity coefficient (lsim)
  - Thesaurus
  - Sub-string matching
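The normalization and comparison steps can be sketched roughly as below. This is an illustration under stated assumptions, not the Cupid implementation: the lookup tables (ABBREVIATIONS, STOP_WORDS, SYNONYMS), the substring score, and the best-match averaging rule are all hypothetical, and the sketch covers element names only, leaving out categorization.

```python
import re

# Hypothetical lookup tables standing in for a real thesaurus.
ABBREVIATIONS = {"qty": "quantity", "po": "purchase order", "num": "number"}
STOP_WORDS = {"of", "the", "a", "an", "and", "to"}
SYNONYMS = {frozenset({"bill", "invoice"}): 0.9}


def normalize(name):
    """Tokenize a name, expand abbreviations, and eliminate stop words."""
    # Tokenization: split on punctuation, digits, and camelCase boundaries.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    tokens = []
    for part in parts:
        # Expansion: replace known abbreviations and acronyms.
        tokens.extend(ABBREVIATIONS.get(part.lower(), part.lower()).split())
    # Elimination: drop articles, prepositions, and similar filler.
    return [t for t in tokens if t not in STOP_WORDS]


def token_sim(a, b):
    """Compare two tokens via the thesaurus, then by sub-string matching."""
    if a == b:
        return 1.0
    score = SYNONYMS.get(frozenset((a, b)))
    if score is not None:
        return score
    shorter, longer = sorted((a, b), key=len)
    if len(shorter) >= 3 and shorter in longer:
        return len(shorter) / len(longer)  # assumed scoring rule
    return 0.0


def lsim(name1, name2):
    """Average each token's best match in the other name, in both directions."""
    t1, t2 = normalize(name1), normalize(name2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(token_sim(a, b) for b in t2) for a in t1)
    best2 = sum(max(token_sim(a, b) for a in t1) for b in t2)
    return (best1 + best2) / (len(t1) + len(t2))
```

For example, normalize("POBillTo") splits the name into PO, Bill, and To, expands PO, and drops the stop word, so names that differ only in abbreviation or filler words compare as equal.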

Structural Matching
- Bottom-up
- Mutually recursive
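One way to picture the bottom-up, mutually recursive computation is the simplified sketch below. It assumes that the structural similarity of two non-leaf elements is the fraction of leaves in their subtrees that have a strong link (wsim at or above th_accept) to a leaf on the other side, and that a strong ancestor match in turn raises the similarity of the leaves beneath it. The Node class, the threshold values, the additive increment c_inc, and the data-type scores are all illustrative assumptions; a symmetric penalty for weak ancestor matches is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing so nodes can serve as dict keys
class Node:
    name: str
    dtype: str = ""
    children: list = field(default_factory=list)

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def postorder(self):
        for child in self.children:
            yield from child.postorder()
        yield self


def tree_match(s_root, t_root, lsim, w_struct=0.5,
               th_accept=0.5, th_high=0.6, c_inc=0.1):
    """Return a dict mapping (s, t) node pairs to their weighted similarity."""
    ssim, wsim = {}, {}

    def update_wsim(s, t):
        wsim[s, t] = w_struct * ssim[s, t] + (1 - w_struct) * lsim(s.name, t.name)

    # Leaves start from data-type compatibility (assumed scores).
    for s in s_root.leaves():
        for t in t_root.leaves():
            ssim[s, t] = 1.0 if s.dtype == t.dtype else 0.5

    # Post-order guarantees leaf pairs are scored before their ancestors.
    for s in s_root.postorder():
        for t in t_root.postorder():
            if s.children or t.children:
                s_leaves, t_leaves = s.leaves(), t.leaves()
                linked = sum(any(wsim[x, y] >= th_accept for y in t_leaves)
                             for x in s_leaves)
                linked += sum(any(wsim[x, y] >= th_accept for x in s_leaves)
                              for y in t_leaves)
                ssim[s, t] = linked / (len(s_leaves) + len(t_leaves))
            update_wsim(s, t)
            # Feedback: similar ancestors make their leaves more similar.
            if s.children and t.children and wsim[s, t] > th_high:
                for x in s.leaves():
                    for y in t.leaves():
                        ssim[x, y] = min(1.0, ssim[x, y] + c_inc)
                        update_wsim(x, y)
    return wsim
```

The lsim argument is any callable that scores two element names, so the structural phase stays independent of how the linguistic phase is implemented.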

Example

Example (Cont.)

Schema Graphs
- Elements
- Relationships (containment, aggregation, and IsDerivedFrom)
- Matching shared types (context-dependent mappings)
- Matching referential constraints
- General schemas

Matching Shared Types
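A context-dependent mapping needs a separate match candidate for each place a shared type is used. A minimal sketch of that idea, assuming the schema graph is given as a plain dictionary from element name to child names (a hypothetical representation, not one taken from the slides), expands the graph into one root-to-element path per context:

```python
def expand(graph, root, path=()):
    """Expand a schema graph into one path per element context.

    A type reachable from several parents appears once per parent path,
    so each use can be matched independently.
    """
    if root in path:
        raise ValueError(f"cycle through {root!r}; expansion would not terminate")
    here = path + (root,)
    paths = [here]
    for child in graph.get(root, []):
        paths.extend(expand(graph, child, here))
    return paths


# Address is shared by BillTo and ShipTo (hypothetical example schema).
schema = {
    "PO": ["BillTo", "ShipTo"],
    "BillTo": ["Address"],
    "ShipTo": ["Address"],
    "Address": ["Street", "City"],
}
contexts = expand(schema, "PO")
```

Here City appears under both PO/BillTo/Address and PO/ShipTo/Address, so a billing city and a shipping city in another schema can each be mapped to their own copy.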

Matching Referential Constraints

Other Features
- Optionality
- Views
- Initial mappings
- Lazy expansion
- Pruning leaves

Comparative Study
- Algorithms
  - MOMIS
  - DIKE
  - Cupid
- Canonical examples
- Real world example

Canonical Examples
- Identical schemas
- Atomic elements with the same names, but different data types
- Atomic elements with the same data types, but different names (a prefix or suffix is added)
- Different class names, but atomic elements with the same names and data types
- Different nesting of the data: similar schemas with nested and flat structures
- Type substitution or context-dependent mapping

Real World Example

Experimental Conclusions
- Linguistic matching
- Thesaurus
- Linguistic similarity with no structure similarity
- Granularity of similarity computation
- Leaves
- Structure information beyond the immediate vicinity
- Context-dependent mappings
- Performance parameters

Future Work
- A truly robust solution
  - Machine learning applied to instances
  - Natural language technology
  - Pattern matching to reuse known matches
- Immediate challenges
  - Off-the-shelf thesaurus
  - Schema annotations
  - Automatic tuning of the control parameters
  - Scalability analysis and testing
  - More comparative analysis of algorithms
