1 A Survey of Approaches to Automatic Schema Matching
Erhard Rahm, Philip A. Bernstein
The VLDB Journal 10:334-350 (2001)
2 The Problem
- Schema matching
  - Input schemas
  - Output mappings
- Motivations
  - Manual schema matching
  - Generic and customizable schema matching
3 Application Domains
- Schema integration: structures and terminological relationships
- Data warehouses: source-to-warehouse transformation
- E-commerce: message translation
- Semantic query processing: a run-time scenario
4 The Match Operator
- Representations of input schemas and output mapping
  - Schema representation
    - Schema elements
    - Structure
  - Mapping representation
    - Mapping elements
    - Mapping expressions
- Matching function
  - Mathematically unsatisfying
  - Heuristics
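The representations named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the paper's formalism: the class names `SchemaElement` and `MappingElement`, the `match` function signature, and the leaf-name heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaElement:
    """One schema element, identified by its path inside the schema (assumed layout)."""
    path: str            # e.g. "Address.Street"
    data_type: str = ""


@dataclass(frozen=True)
class MappingElement:
    """Relates element(s) of S1 to element(s) of S2, with an optional mapping expression."""
    s1_elements: tuple
    s2_elements: tuple
    expression: str = ""


def match(s1, s2, same):
    """Return one 1:1 mapping element for every pair the heuristic `same` accepts."""
    return [
        MappingElement((e1.path,), (e2.path,))
        for e1 in s1
        for e2 in s2
        if same(e1, e2)
    ]


def leaf(element):
    """Last path component, lower-cased (hypothetical helper)."""
    return element.path.rsplit(".", 1)[-1].lower()


s1 = [SchemaElement("Address.Street"), SchemaElement("Address.Zip")]
s2 = [SchemaElement("CustomerAddress.Street"), SchemaElement("CustomerAddress.PostalCode")]
mapping = match(s1, s2, lambda a, b: leaf(a) == leaf(b))
```

Because the heuristic is passed in as a parameter, the same `match` skeleton can be reused with any of the matchers discussed on later slides.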
5 Architecture for Generic Match
Figure components:
- Tool 1 (Portal schemas)
- Tool 2 (E-business schemas)
- Tool 3 (Data warehousing schemas)
- Global libraries (dictionaries, schemas, …)
- Schema import/export
- Generic Match Implementation
- Internal schema representation
6 Classification of Approaches
- Individual matchers
  - Instance vs. schema
  - Element vs. structure matching
  - Language vs. constraint
  - Matching cardinality (1:1, 1:n, n:1, and n:m)
  - Auxiliary information
- Combinations of multiple matchers
7 Schema-level Approaches
- Granularity of match (element-level vs. structure-level)
- Match cardinality
- Linguistic approaches
- Constraint-based approaches
- Reusing schema and mapping information
8 Granularity of match
S1 elements | S2 elements | Match
Address (Street, City, State, Zip) | CustomerAddress (Street, City, USState, PostalCode) | Full structure match of Address and CustomerAddress
AccountOwner (Name, Address, Birthdate, TaxExempt) | Customer (Cname, CAddress, Cphone) | Partial structural match of AccountOwner and Customer
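The distinction between the two examples on this slide can be sketched as a coverage check. This is an illustrative sketch only: the function `structure_match` and the element-level pairs fed into it are assumptions for the example, since the slide does not state which child elements correspond.

```python
def structure_match(s1_children, s2_children, element_pairs):
    """Classify a structure-level match from assumed element-level correspondences.

    "full"    : every child on both sides takes part in some pair
    "partial" : at least one pair exists, but some child is left unmatched
    "none"    : no pair connects the two structures
    """
    relevant = [(a, b) for a, b in element_pairs if a in s1_children and b in s2_children]
    covered1 = {a for a, _ in relevant}
    covered2 = {b for _, b in relevant}
    if not relevant:
        return "none"
    if covered1 == set(s1_children) and covered2 == set(s2_children):
        return "full"
    return "partial"


address = ["Street", "City", "State", "Zip"]
customer_address = ["Street", "City", "USState", "PostalCode"]
# Assumed element-level correspondences, not given explicitly on the slide.
address_pairs = [("Street", "Street"), ("City", "City"),
                 ("State", "USState"), ("Zip", "PostalCode")]

account_owner = ["Name", "Address", "Birthdate", "TaxExempt"]
customer = ["Cname", "CAddress", "Cphone"]
owner_pairs = [("Name", "Cname"), ("Address", "CAddress")]

full_result = structure_match(address, customer_address, address_pairs)
partial_result = structure_match(account_owner, customer, owner_pairs)
```

Under these assumed pairs, `full_result` is `"full"` and `partial_result` is `"partial"`, mirroring the two rows of the slide.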
9 Match Cardinality
Local match cardinalities | S1 element(s) | S2 element(s) | Matching expression
1. 1:1, element-level | Price | Amount | Amount = Price
2. n:1, element-level | Price, Tax | Cost | Cost = Price * (1 + Tax/100)
3. 1:n, element-level | Name | FirstName, LastName | FirstName, LastName = Extract(Name, …)
4. n:1, structure-level (n:m element-level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = select B.Title, P.Name from B, P where B.PuNo = P.PuNo
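To make the first three matching expressions concrete, the sketch below applies them to a single record. The sample values are invented for illustration, and `split_name` is a hypothetical stand-in for `Extract(Name, …)`, whose actual arguments the slide leaves elided.

```python
def cost_from_price_tax(price, tax_percent):
    """n:1 element-level mapping from the slide: Cost = Price * (1 + Tax/100)."""
    return price * (1 + tax_percent / 100)


def split_name(name):
    """1:n element-level mapping; a hypothetical stand-in for Extract(Name, ...).

    Splits at the first space, which is only an assumption about the name format.
    """
    first, _, last = name.strip().partition(" ")
    return first, last


# Invented sample record in schema S1.
s1_record = {"Price": 200.0, "Tax": 25.0, "Name": "Ada Lovelace"}

s2_record = {
    "Amount": s1_record["Price"],                                        # 1:1, Amount = Price
    "Cost": cost_from_price_tax(s1_record["Price"], s1_record["Tax"]),  # n:1
}
s2_record["FirstName"], s2_record["LastName"] = split_name(s1_record["Name"])  # 1:n
```

The fourth row of the table is already an executable expression in its own right: the SQL join over `B` and `P` is the mapping.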
10 Linguistic Approaches
- Name matching
  - Equality of names
  - Equality of canonical name representations
  - Equality of synonyms
  - Equality of hypernyms
  - Similarity of names based on common substrings, edit distance, pronunciation, and soundex
  - User-provided name matches
- Description matching
  - Ex. S1: empn //employee name
  - Ex. S2: name //name of employee
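Several of the name-matching criteria listed above can be combined into one small similarity function. The sketch below covers canonical names, synonyms, and edit distance only; the synonym table, the canonicalization rule, and the scoring formula are assumptions chosen for illustration, not taken from the paper.

```python
import re

# Hypothetical synonym table; a real matcher would consult a thesaurus or dictionary.
SYNONYMS = {frozenset({"zip", "postalcode"})}


def canonical(name):
    """Canonical form: lower-case with everything except letters and digits removed."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def name_similarity(n1, n2):
    """Score in [0, 1]: 1.0 for equal canonical names or synonyms, else edit-based."""
    c1, c2 = canonical(n1), canonical(n2)
    if c1 == c2 or frozenset({c1, c2}) in SYNONYMS:
        return 1.0
    # c1 != c2 here, so at least one name is non-empty and the divisor is positive.
    return 1 - edit_distance(c1, c2) / max(len(c1), len(c2))


zip_score = name_similarity("Zip", "Postal_Code")
cname_score = name_similarity("Cname", "Name")
```

Hypernym, pronunciation, and soundex matching would need additional resources (a term hierarchy or a phonetic encoder) and are left out of this sketch.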
11 Constraint-based Approaches
12 Reusing Schema and Mapping Information
13 Instance-level Approaches
- Linguistic characterization
  - Information retrieval techniques
  - Ex. extracting keywords and themes
- Constraint-based characterization
  - Numeric value ranges
  - Numeric value averages
  - Character patterns (PhoneNr, ISBNs, SSNs, …)
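The constraint-based characterizations above amount to profiling the values stored in a column. A minimal sketch, assuming a simple pattern alphabet (`9` for digits, `A` for letters) and invented sample values; the function names are hypothetical:

```python
import re
from collections import Counter


def numeric_profile(values):
    """Value range and average of a numeric column."""
    nums = [float(v) for v in values]
    return {"min": min(nums), "max": max(nums), "avg": sum(nums) / len(nums)}


def char_pattern(value):
    """Abstract a string into a character pattern: digits -> 9, letters -> A."""
    digits_masked = re.sub(r"\d", "9", value)
    return re.sub(r"[A-Za-z]", "A", digits_masked)


def dominant_pattern(values):
    """Most frequent character pattern among the instances of a column."""
    return Counter(char_pattern(v) for v in values).most_common(1)[0][0]


# Invented instance data for two columns.
phone_column = ["555-1234", "555-9876", "n/a"]
price_column = [10, 20, 30]

phone_pattern = dominant_pattern(phone_column)
price_stats = numeric_profile(price_column)
```

Two columns from different schemas whose profiles agree (same dominant pattern, overlapping value ranges) become match candidates even when their names share nothing.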
14 Combining Different Matchers
- Hybrid matchers
  - Hard-wired combination of multiple matching criteria
  - Better performance
- Composite matchers
  - Independent basic matchers
  - Flexible execution order
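A composite matcher in the sense of this slide can be sketched as a function that combines the scores of independent basic matchers. The weighted average used here is only one possible combination rule and is an assumption of this example, as are the two toy matchers and the dictionary layout of the elements.

```python
def exact_name(e1, e2):
    """Basic matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if e1["name"].lower() == e2["name"].lower() else 0.0


def same_type(e1, e2):
    """Basic matcher: 1.0 if the declared data types are equal, else 0.0."""
    return 1.0 if e1["type"] == e2["type"] else 0.0


def composite(matchers, weights):
    """Combine independent matchers into one by a weighted average of their scores."""
    total = sum(weights)

    def combined(e1, e2):
        return sum(w * m(e1, e2) for m, w in zip(matchers, weights)) / total

    return combined


price = {"name": "Price", "type": "decimal"}
amount = {"name": "Amount", "type": "decimal"}

matcher = composite([exact_name, same_type], weights=[2, 1])
price_amount_score = matcher(price, amount)   # names differ, types agree
price_price_score = matcher(price, price)     # both matchers agree
```

Because the basic matchers are passed in as a list, they can be added, removed, or reweighted without touching their code. A hybrid matcher would instead fold both criteria into a single function.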
15 Sample Approaches
- SEMINT
- LSD
- SKAT
- TranScm
- DIKE
- ARTEMIS
- CUPID
17
(Rows with fewer than five entries list their values in source order; their column positions were lost.)
Approach | SEMINT | LSD | TranScm | Cupid | BYU Approach
Schema type | Relational, files | XML | SGML, OO | XML, relational | OSM
Metadata representation | Attribute-based | XML | Labeled graph | Extended ER | OSM
Match granularity: 1:1; 1:1 and 1:n; 1:1 and n:m
Schema-level match, name-based: ****
Schema-level match, constraint-based: * ***
Schema-level match, structure matching: ****
Instance-level match, text-oriented: * *
Instance-level match, constraint-oriented: ** *
Reuse/auxiliary information used: ** **
Combination of matches: Hybrid; Composite; Hybrid; Composite
Manual work / user input | * | * | * | * | *
Application area: Data integration; Data integration; Data translation; Generic
Remarks: Neural network
18 Conclusion
- Propose a taxonomy that covers many of the existing approaches
- Suggest quantitative work on the relative performance and accuracy of different approaches