1
Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration
Li Xu, David W. Embley, David Jackman
2
Background
Problem: attribute matching
Techniques:
  Data values
  Data-dictionary information
  Structural properties
  Ontologies
  Terminological relationships
3
Approach
Inputs: target schema T and source schema S
Framework: individual facet matching; combining multiple facets; iteration.
4
Example
[Figure: source schema S and target schema T for a car domain. Object sets include Car, Year, Make, Model, Cost, Style, Feature, Mileage/Miles, and Phone, linked to Car by "has" relationships with participation constraints such as 0:1 and 0:*; corresponding attributes (Year, Make, Model, Mileage and Miles) are connected across the two schemas.]
5
Individual Facet Matching
  Terminological relationships
  Data-value characteristics
  Target-specific, regular-expression matches
6
Terminological Relationships
Input: attribute names, A in T and B in S
Tools: WordNet; C4.5 decision tree
Feature selection:
  f0: same word
  f1: synonym
  f2: sum of the distances of A and B to a common hypernym root
  f3: number of different common hypernym roots of A and B
  f4: sum of the number of senses of A and B
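To make the five features concrete, here is a minimal sketch that computes f0 to f4 for a pair of attribute names. It does not call WordNet: the toy lexicon, its made-up synset names, and the helper names are hypothetical, with each sense modeled as a hypernym chain from the sense up to a root. It also assumes f2 is measured to the closest common hypernym (the smallest summed distance), which is one reading of the slide.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon standing in for WordNet: each word maps to its senses,
# and each sense is a hypernym chain from the sense's synset up to a root synset.
Lexicon = Dict[str, List[List[str]]]

TOY_LEXICON: Lexicon = {
    "cost":    [["cost.n.01", "outlay.n.01", "payment.n.01", "entity.n.01"],
                ["cost.n.02", "value.n.01", "abstraction.n.01"]],
    "price":   [["cost.n.02", "value.n.01", "abstraction.n.01"]],
    "mileage": [["mileage.n.01", "distance.n.01", "abstraction.n.01"]],
    "phone":   [["phone.n.01", "device.n.01", "entity.n.01"]],
}


def synsets(lex: Lexicon, word: str) -> set:
    """Synsets that directly express the word (first element of each chain)."""
    return {chain[0] for chain in lex.get(word, [])}


def distance(lex: Lexicon, word: str, synset: str) -> Optional[int]:
    """Fewest hypernym links from any sense of `word` up to `synset`."""
    hops = [chain.index(synset) for chain in lex.get(word, []) if synset in chain]
    return min(hops) if hops else None


def features(lex: Lexicon, a: str, b: str) -> dict:
    chains_a, chains_b = lex.get(a, []), lex.get(b, [])
    # Synsets reachable from both words are their common hypernyms.
    common = {s for c in chains_a for s in c} & {s for c in chains_b for s in c}
    # Assumption: f2 uses the closest common hypernym (minimum summed distance).
    dists = [distance(lex, a, s) + distance(lex, b, s) for s in common]
    roots_a = {c[-1] for c in chains_a}
    roots_b = {c[-1] for c in chains_b}
    return {
        "f0_same_word": a == b,
        "f1_synonym": a != b and bool(synsets(lex, a) & synsets(lex, b)),
        "f2_hypernym_distance": min(dists) if dists else None,
        "f3_common_roots": len(roots_a & roots_b),
        "f4_total_senses": len(chains_a) + len(chains_b),
    }
```

For example, `features(TOY_LEXICON, "cost", "price")` reports the two names as synonyms (they share `cost.n.02`) with one common root and three senses in total; these feature vectors are what a decision tree such as C4.5 would then be trained on.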
7
WordNet Rule
[Figure: rule over the WordNet features, testing the number of different common hypernym roots of A and B, the sum of the distances of A and B to a common hypernym, and the sum of the number of senses of A and B.]
8
WordNet Confidences
9
Data-Value Characteristics
Tool: C4.5 decision tree
Features [LC94]:
  Numeric data: mean, variance, coefficient of variation, standard deviation
  Alphanumeric data: string length, numeric ratio, space ratio
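The value characteristics can be sketched as simple statistics over a column of data instances. This is a minimal illustration, not the [LC94] definitions: it assumes population (not sample) variance, takes "string length" as the average length, and computes the numeric and space ratios as the fraction of all characters that are digits or whitespace. The function names are hypothetical.

```python
import statistics
from typing import List


def numeric_features(values: List[float]) -> dict:
    """Characteristics of a numeric column (population statistics assumed)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {
        "mean": mean,
        "variance": statistics.pvariance(values),
        "std_dev": sd,
        # Coefficient of variation: standard deviation relative to the mean.
        "coef_variation": sd / mean if mean else None,
    }


def alnum_features(values: List[str]) -> dict:
    """Characteristics of an alphanumeric column, pooled over all characters."""
    total = sum(len(v) for v in values)
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch.isspace() for v in values for ch in v)
    return {
        "avg_length": total / len(values),
        "numeric_ratio": digits / total if total else 0.0,
        "space_ratio": spaces / total if total else 0.0,
    }
```

A phone-number column such as `["555-1234", "call 555 9876"]` yields a high numeric ratio, while a column of make names would yield a ratio near zero; such differences are what let a classifier separate attributes by their values.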
10
Value-Characteristics Confidences
11
Expected Data Values
Target schema T: data frames
Source schema S: data instances
Hit ratio for a pair (A, B) = N'/N, where
  N' = number of B data instances consistent with the specifications of A's data frame;
  N = number of B data instances.
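The hit ratio can be sketched by reducing a data frame to a single regular expression and counting how many source instances it accepts. This is a simplifying assumption: real data frames may carry several patterns and recognize values inside longer text, whereas this sketch requires the whole instance to match. `hit_ratio` and `YEAR_FRAME` are hypothetical names.

```python
import re
from typing import Iterable

# Hypothetical data frame for a target "Year" attribute: four-digit years
# from 1900 to 2099, or an abbreviated form such as '97.
YEAR_FRAME = r"(19|20)\d{2}|'\d{2}"


def hit_ratio(pattern: str, instances: Iterable[str]) -> float:
    """N'/N: the fraction of source instances the target data frame accepts."""
    values = list(instances)
    if not values:  # N = 0: no evidence either way
        return 0.0
    frame = re.compile(pattern)
    hits = sum(1 for v in values if frame.fullmatch(v.strip()))  # N'
    return hits / len(values)
```

For instance, `hit_ratio(YEAR_FRAME, ["1998", "2003", "'97", "Ford"])` gives 3/4 = 0.75, since only "Ford" falls outside the Year frame.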
12
Expected-Values Confidences
13
Combined Confidences
Threshold: 0.5
[Table: source-target attribute pairs after thresholding the combined confidences at 0.5; each cell is 1 (match) or 0 (no match).]
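The thresholding step can be sketched as follows. The slides do not say how the facet confidences are combined, so this sketch assumes a plain average of whatever facets scored a pair, and it treats a combined confidence at or above the threshold as a match; both choices, and the function name `combine`, are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (target attribute, source attribute)


def combine(facets: List[Dict[Pair, float]], threshold: float = 0.5) -> Dict[Pair, int]:
    """Average each pair's facet confidences, then map to 1 (match) or 0."""
    pairs = set().union(*facets)  # every pair scored by at least one facet
    result = {}
    for pair in pairs:
        scores = [facet[pair] for facet in facets if pair in facet]
        combined = sum(scores) / len(scores)  # assumption: simple average
        result[pair] = 1 if combined >= threshold else 0
    return result
```

With three facets scoring ("Make", "Make") at 0.9, 0.8, and 1.0, the average of 0.9 clears the 0.5 threshold and the pair becomes a 1; a pair averaging 0.3 becomes a 0. A weighted average or a learned combiner would slot into the same place.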
14
Final Confidences
15
Experimental Results
Matched attributes: 100% (32 of 32); F1 93.75%, F2 84%, F3 92%
Unmatched attributes: 99.5% (374 of 376); F1 98.9%, F2 97.9%, F3 98.4%
Exceptions: “Feature”–“Color”; “Feature”–“Body Type”
16
Future Work
  Additional facets of metadata
  More sophisticated combinations
  Additional application domains
  Automating feature selection
17
Questions?