1
Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration
David W. Embley, David Jackman, Li Xu
2
Background
Problem: Attribute Matching
Matching Possibilities (Facets):
- Attribute Names
- Data-Value Characteristics
- Expected Data Values
- Data-Dictionary Information
- Structural Properties
3
Approach
Inputs: Target Schema T, Source Schema S
Framework:
- Individual Facet Matching
- Combining Facets
- Best-First Match Iteration
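The slides name a best-first match iteration but do not spell it out. The following Python sketch shows one plausible reading, assuming a 1:1 matching in which the highest-scoring remaining (target, source) pair is accepted, both attributes are retired, and iteration stops once the best remaining score falls below 0.5 (the threshold the deck uses for combined measures). The function name `best_first_matches` and the sample scores are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def best_first_matches(
    confidence: Dict[Tuple[str, str], float], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Greedily accept the highest-confidence (target, source) pair, retire both
    attributes, and repeat until no remaining pair reaches the threshold."""
    accepted: List[Tuple[str, str, float]] = []
    used_targets: Set[str] = set()
    used_sources: Set[str] = set()
    # Visiting pairs in descending confidence is equivalent to repeatedly
    # picking the best remaining pair, because scores are not recomputed here.
    for (target, source), score in sorted(
        confidence.items(), key=lambda item: item[1], reverse=True
    ):
        if score < threshold:
            break  # every later pair scores no higher
        if target in used_targets or source in used_sources:
            continue  # one side is already matched (1:1 assumption)
        accepted.append((target, source, score))
        used_targets.add(target)
        used_sources.add(source)
    return accepted


if __name__ == "__main__":
    # Hypothetical combined confidences keyed by (target attribute, source attribute).
    scores = {
        ("Year", "Year"): 0.95,
        ("Make", "Make"): 0.90,
        ("Make", "Model"): 0.60,
        ("Model", "Model"): 0.85,
        ("Miles", "Mileage"): 0.70,
        ("Miles", "Phone"): 0.30,
    }
    for target, source, score in best_first_matches(scores):
        print(f"{target} <- {source} ({score:.2f})")
```

A variant that recomputes confidences after each accepted match (for example, to let one decision inform the next) would need an explicit loop that re-sorts the remaining pairs each round.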
4
Example
[Figure: source schema S and target schema T for cars, drawn as conceptual models in which Car is linked by "has" relationships (participation constraints such as 0:1 and 0:*) to attributes including Year, Make, Model, Cost, Style, Feature, Mileage/Miles, and Phone; lines connect corresponding attributes such as Year, Make, Model, and Mileage/Miles.]
5
Individual Facet Matching
- Attribute Names
- Data-Value Characteristics
- Expected Data Values
6
Attribute Names
Target attribute A in T; source attribute B in S
WordNet
C4.5 decision tree; feature selection:
- f0: same word
- f1: synonym
- f2: sum of the distances of A and B to a common hypernym root
- f3: number of different common hypernym roots of A and B
- f4: sum of the number of senses of A and B
7
WordNet Rule
- The number of different common hypernym roots of A and B
- The sum of the distances of A and B to a common hypernym
- The sum of the number of senses of A and B
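To make the five attribute-name features concrete, here is a minimal Python sketch. It uses a tiny hand-built lexicon (`SENSES` and `HYPERNYM`, both hypothetical) in place of WordNet so it runs without any corpus, and it assumes f2 is the smallest sum of distances over hypernyms shared by some sense of A and some sense of B, and f3 counts the hierarchy tops shared by the senses of A and B. The resulting feature vectors would be the input to a decision-tree learner such as C4.5, which is not shown.

```python
from typing import Dict, List, Optional, Union

# Hypothetical toy lexicon standing in for WordNet: each word maps to its
# senses (synset ids), and each synset maps to its hypernym (None at a root).
SENSES: Dict[str, List[str]] = {
    "cost": ["expense#1", "cost#2"],
    "price": ["expense#1", "price#2"],
    "mileage": ["distance#1"],
}
HYPERNYM: Dict[str, Optional[str]] = {
    "expense#1": "amount#1",
    "amount#1": "abstraction#1",
    "abstraction#1": None,
    "cost#2": "sacrifice#1",
    "sacrifice#1": "act#1",
    "act#1": None,
    "price#2": "value#1",
    "value#1": "abstraction#1",
    "distance#1": "size#1",
    "size#1": "abstraction#1",
}


def ancestors(synset: str) -> Dict[str, int]:
    """Map the synset and each of its hypernyms to its distance from the synset."""
    chain: Dict[str, int] = {}
    node: Optional[str] = synset
    dist = 0
    while node is not None:
        chain[node] = dist
        node = HYPERNYM[node]
        dist += 1
    return chain


def root(synset: str) -> str:
    """The farthest ancestor is the top (root) of the hypernym hierarchy."""
    chain = ancestors(synset)
    return max(chain, key=lambda s: chain[s])


def name_features(a: str, b: str) -> Dict[str, Union[bool, int, float]]:
    """Compute f0..f4 for target attribute name a and source attribute name b."""
    a, b = a.lower(), b.lower()
    senses_a, senses_b = SENSES.get(a, []), SENSES.get(b, [])
    # f2 (assumed reading): smallest d(sense of A, h) + d(sense of B, h)
    # over every hypernym h the two senses share; inf if none is shared.
    f2: float = float("inf")
    for sa in senses_a:
        anc_a = ancestors(sa)
        for sb in senses_b:
            anc_b = ancestors(sb)
            for h in anc_a.keys() & anc_b.keys():
                f2 = min(f2, anc_a[h] + anc_b[h])
    roots_a = {root(s) for s in senses_a}
    roots_b = {root(s) for s in senses_b}
    return {
        "f0": a == b,                               # same word
        "f1": bool(set(senses_a) & set(senses_b)),  # synonyms share a sense
        "f2": f2,                                   # distance to common hypernym
        "f3": len(roots_a & roots_b),               # shared hypernym roots
        "f4": len(senses_a) + len(senses_b),        # total number of senses
    }


if __name__ == "__main__":
    print(name_features("Cost", "Price"))
    print(name_features("Cost", "Mileage"))
```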
8
Confidence Measures
9
Data-Value Characteristics
C4.5 decision tree features:
- Numeric data: mean, variation, standard deviation, …
- Alphanumeric data: string length, numeric ratio, space ratio
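A sketch of how these column-level features might be computed in Python, assuming "variation" means variance (it could instead denote the coefficient of variation) and that the alphanumeric ratios are taken over all characters of a column's values. The function names are hypothetical; the outputs would feed a C4.5-style classifier that is not shown.

```python
import statistics
from typing import Dict, List


def numeric_features(values: List[float]) -> Dict[str, float]:
    """Summary statistics of a numeric column ("variation" read as variance here)."""
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


def alphanumeric_features(values: List[str]) -> Dict[str, float]:
    """Average string length plus digit and space ratios over all characters."""
    if not values:
        raise ValueError("need at least one value")
    total_chars = sum(len(v) for v in values)
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch == " " for v in values for ch in v)
    return {
        "avg_length": total_chars / len(values),
        "numeric_ratio": digits / total_chars if total_chars else 0.0,
        "space_ratio": spaces / total_chars if total_chars else 0.0,
    }


if __name__ == "__main__":
    print(numeric_features([2, 4, 4, 4, 5, 5, 7, 9]))       # e.g. a price column
    print(alphanumeric_features(["555-1234", "801 555 1234"]))  # e.g. a phone column
```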
10
Confidence Measures
11
Expected Data Values
Target Schema T and Source Schema S
- Regular-expression recognizer for attribute A in T
- Data instances for attribute B in S
Hit Ratio = N'/N for the (A, B) match, where
- N' is the number of B data instances recognized by the regular expressions of A
- N is the number of B data instances
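The hit ratio translates directly into Python. This sketch assumes an instance counts as recognized when it fully matches at least one of A's regular expressions; the year recognizers shown are hypothetical stand-ins for a real recognizer of the target attribute.

```python
import re
from typing import Iterable, List


def hit_ratio(recognizers: List[str], source_values: Iterable[str]) -> float:
    """N'/N: fraction of B's instances matched by any regular expression of A."""
    patterns = [re.compile(p) for p in recognizers]
    values = [v.strip() for v in source_values]
    if not values:
        return 0.0  # N = 0: no evidence either way
    recognized = sum(1 for v in values if any(p.fullmatch(v) for p in patterns))
    return recognized / len(values)


# Hypothetical recognizer for a target "Year" attribute: 4-digit or 'YY years.
YEAR_RECOGNIZERS = [r"(19|20)\d{2}", r"'\d{2}"]

if __name__ == "__main__":
    source_year = ["1998", "2003", "'97", "red", "15000"]
    print(hit_ratio(YEAR_RECOGNIZERS, source_year))  # N' = 3, N = 5
```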
12
Confidence Measures
13
Combined Measures
Threshold: 0.5
[Table: 0/1 match matrix of target attributes against source attributes; the individual entries are not recoverable from the extracted text.]
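The slides do not say how the per-facet confidences are combined. As an assumption, this Python sketch takes the plain average of whichever facet confidences are available for a pair (None marks a facet that produced no score) and writes 1 into the match matrix when that average reaches the 0.5 threshold, else 0. All names and scores are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

FacetScores = Dict[Tuple[str, str], List[Optional[float]]]


def combined_confidence(scores: Sequence[Optional[float]]) -> float:
    """Average the facet confidences that are present (assumed combination rule)."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present) if present else 0.0


def combined_matrix(
    targets: List[str],
    sources: List[str],
    facet_scores: FacetScores,
    threshold: float = 0.5,
) -> List[List[int]]:
    """Rows are target attributes, columns are source attributes; 1 = match."""
    return [
        [
            1 if combined_confidence(facet_scores.get((t, s), [])) >= threshold else 0
            for s in sources
        ]
        for t in targets
    ]


if __name__ == "__main__":
    # Hypothetical per-facet confidences: [names, value characteristics, expected values].
    scores: FacetScores = {
        ("Year", "Year"): [0.9, 0.8, 0.95],
        ("Year", "Phone"): [0.2, 0.6, 0.3],
        ("Make", "Make"): [0.7, None, 0.6],
        ("Make", "Phone"): [0.1, 0.4, 0.0],
    }
    for row in combined_matrix(["Year", "Make"], ["Year", "Make", "Phone"], scores):
        print(row)
```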
14
Final Confidence Measures
15
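Experimental Results
Multifaceted matching:
- Matched attributes: 100% (32 of 32)
- Unmatched attributes: 99.5% (374 of 376); misclassified: “Feature” and “Color”; “Feature” and “Body Type”
Individual facets (F1, F2, F3):
- Matched attributes: F1 93.75%, F2 84%, F3 92%
- Unmatched attributes: F1 98.9%, F2 97.9%, F3 98.4%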
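As a quick check of the reported multifaceted rates, assuming each is the number of correct decisions divided by the number of decisions in its category:

```latex
\frac{32}{32} = 100\%, \qquad
\frac{374}{376} \approx 0.9947 \approx 99.5\%, \qquad
376 - 374 = 2
```

The two misclassified unmatched pairs are presumably the two “Feature” pairs listed with the results.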
16
Conclusions
- Direct attribute matching: feasible
- Individual-facet matching: good
- Multifaceted matching: better
17
Future Work
- Additional Facets
- More Sophisticated Combinations
- Additional Application Domains
- Automating Feature Selection
- Indirect Attribute Matching
www.deg.byu.edu