Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration Li Xu David W. Embley David Jackman.

1 Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration. Li Xu, David W. Embley, David Jackman

2 Background. Problem: attribute matching. Techniques: data values; data-dictionary information; structural properties; ontologies; terminological relationships.

3 Approach. Given a target schema T and a source schema S, the framework consists of individual facet matching, combining multiple facets, and iteration.

4 Example. [Figure: example source schema S and target schema T. Object sets shown include Car, Year, Make, Model, Cost, Style, Feature, Phone, Mileage, and Miles, connected by "has" relationships with participation constraints such as 0:1 and 0:*.]

5 Individual Facet Matching. Facets: terminological relationships; data-value characteristics; target-specific regular-expression matches.

6 Terminological Relationships. Input: the names of attribute A in T and attribute B in S. Tools: WordNet and a C4.5 decision tree. Selected features: f0: same word; f1: synonym; f2: sum of the distances of A and B to a common hypernym root; f3: number of different common hypernym roots of A and B; f4: sum of the number of senses of A and B.
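The five features can be illustrated with a minimal sketch. It does not call WordNet: LEXICON is an invented stand-in in which each word maps to its senses, and each sense is the hypernym path from that sense up to a root. f2 is read here as the smallest summed distance to a shared hypernym, which is one possible interpretation of the slide, not necessarily the authors' definition.

```python
from typing import Optional

# Invented stand-in for WordNet (hypothetical data): word -> senses, where each sense is
# its hypernym path from the sense itself (index 0) up to a root (last element).
LEXICON = {
    "mileage": [("mileage.n.1", "distance.n.1", "measure.n.1", "abstraction.n.1")],
    "miles": [("mile.n.1", "linear_unit.n.1", "unit.n.1", "measure.n.1", "abstraction.n.1")],
    "cost": [("cost.n.1", "value.n.1", "measure.n.1", "abstraction.n.1"),
             ("cost.n.2", "expense.n.1", "outgo.n.1", "possession.n.1", "abstraction.n.1")],
    "price": [("cost.n.1", "value.n.1", "measure.n.1", "abstraction.n.1")],
    "phone": [("phone.n.1", "device.n.1", "artifact.n.1", "entity.n.1")],
}

def senses(word):
    return LEXICON.get(word.lower(), [])

def hypernym_distance(a, b) -> Optional[int]:
    """Smallest (steps from a sense of A + steps from a sense of B) to a shared node, or None."""
    best = None
    for path_a in senses(a):
        for path_b in senses(b):
            for i, node in enumerate(path_a):
                if node in path_b:
                    d = i + path_b.index(node)
                    best = d if best is None else min(best, d)
    return best

def term_features(a, b):
    sa, sb = senses(a), senses(b)
    return {
        "f0_same_word": a.lower() == b.lower(),
        "f1_synonym": bool({p[0] for p in sa} & {p[0] for p in sb}),  # share a sense
        "f2_hypernym_distance": hypernym_distance(a, b),
        "f3_common_roots": len({p[-1] for p in sa} & {p[-1] for p in sb}),
        "f4_total_senses": len(sa) + len(sb),
    }

print(term_features("Mileage", "Miles"))
```

For "Mileage" and "Miles" the sketch gives f1 = False but f2 = 5 through the shared node measure.n.1, which is the kind of evidence a decision tree could weigh against the sense counts in f4.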

7 WordNet Rule. Features used in the rule: the number of different common hypernym roots of A and B; the sum of the distances of A and B to a common hypernym; the sum of the number of senses of A and B.
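A rule like this comes out of the C4.5 tree, which ranks candidate splits by gain ratio (information gain divided by split information). The sketch below is a simplified illustration that scores binary threshold splits on one numeric feature; the training values are hypothetical, and full C4.5 adds further details such as pruning and its own procedure for choosing thresholds on continuous attributes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs. value > threshold."""
    n = len(labels)
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    parts = [p for p in (left, right) if p]
    remainder = sum(len(p) / n * entropy(p) for p in parts)
    gain = entropy(labels) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    return gain / split_info if split_info > 0 else 0.0

def best_threshold(values, labels):
    """Try midpoints between adjacent distinct values; keep the highest gain ratio."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))

# Hypothetical training pairs: f3 (common hypernym roots) and whether the pair truly matched.
f3_values = [0, 0, 1, 2, 2, 3]
is_match = [False, False, False, True, True, True]
print(best_threshold(f3_values, is_match))  # 1.5
```

Here the split at 1.5 separates the classes perfectly, so its gain ratio is 1.0, while the split at 0.5 scores 0.5.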

8 WordNet Confidences

9 Data-Value Characteristics. C4.5 decision tree over features from [LC94]. Numeric data: mean, variance, coefficient of variation, standard deviation. Alphanumeric data: string length, numeric ratio, space ratio.
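The value features listed above can be computed per attribute roughly as follows. This is a sketch under stated assumptions: population (not sample) variance, and character ratios taken over all characters of all instances together; the exact definitions in [LC94] may differ.

```python
import statistics

def numeric_features(values):
    """Mean, variance, coefficient of variation, and standard deviation (population form)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return {
        "mean": mean,
        "variance": statistics.pvariance(values),
        "coefficient_of_variation": std / mean if mean else float("nan"),
        "standard_deviation": std,
    }

def alphanumeric_features(values):
    """Average string length plus the share of digit and whitespace characters."""
    chars = "".join(values)
    denom = len(chars) or 1  # avoid division by zero for all-empty strings
    return {
        "avg_length": len(chars) / len(values),
        "numeric_ratio": sum(c.isdigit() for c in chars) / denom,
        "space_ratio": sum(c.isspace() for c in chars) / denom,
    }

print(numeric_features([10, 20, 30, 40]))
print(alphanumeric_features(["555 1234", "555 9876"]))
```

Phone-number-like strings, for example, produce a high numeric ratio and a small, stable space ratio, which helps separate them from free-text attributes.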

10 Value-Characteristics Confidences

11 Expected Data Values. Each target attribute A in T has a data frame; each source attribute B in S supplies data instances. Hit ratio for (A, B) = N'/N, where N' is the number of B data instances consistent with the specifications of A's data frame and N is the number of B data instances.
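The hit ratio can be sketched directly from its definition. The data frame is reduced here to a list of regular expressions, which is an assumption: the data frames in the original work may specify more than value patterns. YEAR_FRAME and its patterns are hypothetical.

```python
import re

def hit_ratio(frame_patterns, instances):
    """N'/N: share of source instances matching at least one pattern of the target data frame."""
    if not instances:
        return 0.0
    compiled = [re.compile(p) for p in frame_patterns]
    hits = sum(1 for v in instances if any(rx.fullmatch(v.strip()) for rx in compiled))
    return hits / len(instances)

# Hypothetical, heavily simplified data frame for a target attribute Year.
YEAR_FRAME = [r"(19|20)\d{2}", r"'\d{2}"]
print(hit_ratio(YEAR_FRAME, ["1998", "2001", "'97", "blue", "2003"]))  # 0.8
```

Four of the five instances match a Year pattern, so N' = 4, N = 5, and the hit ratio is 0.8.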

12 Expected-Values Confidences

13 Combined Confidences. Threshold: 0.5. [Table: 0/1 match matrix obtained by applying the threshold to the combined confidences.]
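The slides give the 0.5 threshold but not the combination formula. The sketch below assumes a plain average of the per-facet confidences and treats a combined score of at least 0.5 as a match; both choices are assumptions, and all confidence values are hypothetical.

```python
def combine_facets(facet_confidences, threshold=0.5):
    """Average each (target, source) pair's confidences across facets, then threshold to 0/1."""
    pairs = set().union(*facet_confidences)
    decisions = {}
    for pair in pairs:
        scores = [fc[pair] for fc in facet_confidences if pair in fc]
        decisions[pair] = 1 if sum(scores) / len(scores) >= threshold else 0
    return decisions

# Hypothetical per-facet confidences for target attribute "Miles" against two source attributes.
terminological = {("Miles", "Mileage"): 0.7, ("Miles", "Phone"): 0.1}
value_characteristics = {("Miles", "Mileage"): 0.8, ("Miles", "Phone"): 0.3}
expected_values = {("Miles", "Mileage"): 0.9, ("Miles", "Phone"): 0.2}

decisions = combine_facets([terminological, value_characteristics, expected_values])
print(decisions[("Miles", "Mileage")], decisions[("Miles", "Phone")])  # 1 0
```

Averaging lets a strong facet compensate for a weak one; a weighted combination or a learned combiner would be natural alternatives.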

14 Final Confidences

15 Experimental Results. Matched attributes: 100% (32 of 32). Unmatched attributes: 99.5% (374 of 376); the two exceptions were ("Feature", "Color") and ("Feature", "Body Type"). F1 93.75%, F2 84%, F3 92%; F1 98.9%, F2 97.9%, F3 98.4%.
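The headline percentages follow directly from the reported counts; this quick check only recomputes them and adds no new results.

```python
def percent_correct(correct, total):
    """Share of correctly decided attribute pairs, as a percentage rounded to one decimal."""
    return round(100 * correct / total, 1)

print(percent_correct(32, 32))    # 100.0
print(percent_correct(374, 376))  # 99.5
print(376 - 374)                  # 2, the two listed exceptions
```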

16 Future Work. Additional facets of metadata; more sophisticated combinations of facets; additional application domains; automated feature selection.

17 Questions?

