1
Discovering Direct and Indirect Matches for Schema Elements
Li Xu
Data Extraction Group, Brigham Young University
Sponsored by NSF
2
Car Problem
[Figure: source and target car schemas. Element labels shown: Car, Year, Cost, Style, Feature, Phone, Miles, Mileage, Model, Make & Model, Color, Body Type.]
3
Applications
Data Integration
Schema Integration
Message Mapping
Data Translation
4
Approach
Direct Matches
Indirect Matches
  Union
  Selection
  Composition
  Decomposition
5
Union and Selection
[Figure: the same source and target car schemas as on slide 2. Element labels shown: Car, Year, Cost, Style, Feature, Phone, Miles, Mileage, Model, Make & Model, Color, Body Type.]
6
Composition and Decomposition
[Figure: the same source and target car schemas as on slide 2. Element labels shown: Car, Year, Cost, Style, Feature, Phone, Miles, Mileage, Model, Make & Model, Color, Body Type.]
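The four indirect-match operations named on the preceding slides can be illustrated on toy values. This is a minimal sketch, not the authors' implementation: the slides do not spell out which source elements map to which target elements, so the pairings, the sample values, and the `KNOWN_COLORS` lookup below are all hypothetical.

```python
# Toy illustration of the four indirect-match operations.
# All element pairings and values below are hypothetical examples.

def union(*columns):
    """Union: merge the values of several source elements into one target element."""
    merged = []
    for column in columns:
        merged.extend(column)
    return merged

def selection(values, predicate):
    """Selection: keep only the source values that belong to the target element."""
    return [v for v in values if predicate(v)]

def composition(left, right, sep=" "):
    """Composition: concatenate the values of two source elements pairwise."""
    return [f"{a}{sep}{b}" for a, b in zip(left, right)]

def decomposition(values, sep=" "):
    """Decomposition: split each source value into two target values."""
    firsts, seconds = [], []
    for v in values:
        first, _, second = v.partition(sep)
        firsts.append(first)
        seconds.append(second)
    return firsts, seconds

KNOWN_COLORS = {"red", "blue", "white"}  # assumed lookup list

phones = union(["555-0100"], ["555-0199"])
colors = selection(["red", "sunroof", "blue"], lambda v: v in KNOWN_COLORS)
make_model = composition(["Ford", "Honda"], ["Mustang", "Accord"])
makes, models = decomposition(["Ford Mustang", "Honda Accord"])
```

Union and selection work on sets of values of whole elements, while composition and decomposition change the shape of each individual value, which is why the deck treats them as two separate pairs.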
7
Matching Techniques
Terminological Relationships
Value Characteristics
Expected Data Values
Structure
8
Terminological Relationships
WordNet
Machine-Learned Rules
Example: (Make, Brand)
  The number of different common hypernym roots of A and B
  The sum of distances of A and B to a common hypernym
  The sum of the number of senses of A and B
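One of the listed features, the sum of distances of A and B to a common hypernym, can be sketched without WordNet itself. The example below assumes a tiny hand-built child-to-parent map (`HYPERNYM`) as a stand-in; its entries are illustrative and are not taken from WordNet, and choosing the nearest common hypernym is likewise an assumption.

```python
# Hand-built child -> parent hypernym map; a stand-in for WordNet.
# The entries are illustrative assumptions, not WordNet data.
HYPERNYM = {
    "make": "kind",
    "brand": "kind",
    "kind": "category",
    "category": "concept",
    "color": "property",
    "property": "concept",
}

def hypernym_path(word):
    """Return [word, parent, grandparent, ...] up to the root."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def distance_sum(a, b):
    """Sum of the distances from a and b to their nearest common hypernym.

    Returns None when the two words share no hypernym in the map.
    """
    path_a = hypernym_path(a)
    depth_b = {node: depth for depth, node in enumerate(hypernym_path(b))}
    for depth_a, node in enumerate(path_a):
        if node in depth_b:
            return depth_a + depth_b[node]
    return None
```

With this toy map, `distance_sum("make", "brand")` is smaller than `distance_sum("make", "color")`, which is the kind of signal a learned rule could threshold on.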
9
Value Characteristics
Machine Learning
Features [LC94]
  String length, numeric ratio, space ratio
  Mean, variance, coefficient of variation, standard deviation
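The features on this slide can be computed per schema element from its data values. The sketch below is an assumption-laden illustration: it takes the statistics over string lengths, uses population variance, and defines the numeric and space ratios as fractions of all characters. The slide does not give these definitions, and `value_features` is a hypothetical helper name.

```python
import statistics

def value_features(values):
    """Compute simple value-characteristic features for one schema element.

    Assumptions: statistics are taken over string lengths, and the ratios
    are the share of digit / whitespace characters among all characters.
    """
    if not values:
        raise ValueError("values must be non-empty")
    lengths = [len(v) for v in values]
    total_chars = sum(lengths)
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch.isspace() for v in values for ch in v)
    mean_len = statistics.fmean(lengths)
    std_len = statistics.pstdev(lengths)
    return {
        "mean_length": mean_len,
        "variance_length": statistics.pvariance(lengths),
        "std_length": std_len,
        # Coefficient of variation: standard deviation relative to the mean.
        "cv_length": std_len / mean_len if mean_len else 0.0,
        "numeric_ratio": digits / total_chars if total_chars else 0.0,
        "space_ratio": spaces / total_chars if total_chars else 0.0,
    }
```

Two elements whose feature vectors are close (for example, two columns of four-digit years) become candidates for a match even when their names differ.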
10
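Expected Values
Application Concepts
Data Recognizers
[Figure: schema element Make & Model on one side and elements Brand and Model on the other, labeled Target and Source. Data recognizers: CarMake ("ford", "honda", …) and CarModel ("accord", "mustang", "taurus", …). Sample values: Ford Mustang, Ford Taurus, Ford F150, … recognized as CarMake . CarModel; Legend, Mustang, A4, … recognized as CarModel; Acura, Audi, BMW, … recognized as CarMake.]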
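The idea of data recognizers for expected values can be sketched with plain lookup sets. This is a simplified stand-in, assuming recognizers are case-insensitive dictionary lookups built only from the sample values visible on the slide; the function names and the hit-rate scoring are hypothetical, and the actual recognizers may be far richer.

```python
# Lookup sets built only from the sample values visible on the slide.
CAR_MAKES = {"ford", "honda", "acura", "audi", "bmw"}
CAR_MODELS = {"accord", "mustang", "taurus", "legend", "a4", "f150"}

def is_make(value):
    return value.strip().lower() in CAR_MAKES

def is_model(value):
    return value.strip().lower() in CAR_MODELS

def is_make_then_model(value):
    """True when the value is a known make followed by a known model."""
    make, _, model = value.strip().partition(" ")
    return is_make(make) and is_model(model)

RECOGNIZERS = {
    "CarMake": is_make,
    "CarModel": is_model,
    "CarMake.CarModel": is_make_then_model,
}

def best_concept(values):
    """Return (concept, hit_rate) for the recognizer that matches most values.

    Returns (None, 0.0) when no recognizer matches any value.
    """
    if not values:
        return (None, 0.0)
    rates = {
        name: sum(1 for v in values if recognize(v)) / len(values)
        for name, recognize in RECOGNIZERS.items()
    }
    name = max(rates, key=rates.get)
    if rates[name] == 0:
        return (None, 0.0)
    return (name, rates[name])
```

When one element's values are recognized as `CarMake.CarModel` while two elements on the other side are recognized as `CarMake` and `CarModel`, that pattern suggests a composition or decomposition match rather than a direct one.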
11
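Structure
[Figure: two purchase-order schema trees, labeled Target and Source. One tree: PO, POShipTo, POBillTo, POLines, City, Street, Item, Count, Line, Qty, UoM. The other tree: PurchaseOrder, DeliverTo, InvoiceTo, Items, Item, ItemCount, ItemNumber, Quantity, UnitOfMeasure, Address, City, Street.]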
12
Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees from slide 11, with DeliverTo set apart.]
13
Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees from slide 11, with POShipTo and DeliverTo shown together, each over City and Street.]
14
Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees from slide 11, with Line, Qty, UoM shown alongside ItemNumber, Quantity, UnitOfMeasure; POShipTo and DeliverTo shown together.]
15
Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees from slide 11, with Count, Line, Qty, Quantity, and UnitOfMeasure set apart; POShipTo and DeliverTo shown together.]
16
Experiments
Methodology
Measures
  Precision
  Recall
  F Measure
17
Results

Application (Number of Schemas)  Precision (%)  Recall (%)  F (%)  Correct  False Positive  False Negative
Course Schedule (5)              98             93          96     119      2               9
Faculty Member (5)               100            100         100    140      0               0
Real Estate (5)                  92             96          94     235      20              10

Data borrowed from Univ. of Washington
Indirect Matches: 94% (precision, recall, F-measure)
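The percentages on this slide can be reproduced from the Correct, False Positive, and False Negative counts. The sketch below assumes the F measure is the usual harmonic mean of precision and recall; the slides do not state the formula, and `scores` is a hypothetical helper name.

```python
def scores(correct, false_positive, false_negative):
    """Return (precision, recall, f_measure) as fractions in [0, 1].

    Assumes the F measure is the harmonic mean of precision and recall.
    """
    proposed = correct + false_positive   # matches the system reported
    expected = correct + false_negative   # matches in the ground truth
    precision = correct / proposed if proposed else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def as_percent(triple):
    """Round each score to a whole percentage."""
    return tuple(round(100 * x) for x in triple)

course_schedule = as_percent(scores(119, 2, 9))
faculty_member = as_percent(scores(140, 0, 0))
real_estate = as_percent(scores(235, 20, 10))
```

Under that assumption the rounded values agree with the reported percentages for all three applications.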
18
Ground-Truthing
[Figure: element labels Contact, Cell phone, Office phone, Firm location, Fax, Email, Firm name, Agent name, with the annotation "Agent's or Firm's".]
19
Limitation (Expected Data Value)
20
Contributions
Direct Matches
Indirect Matches
Expected values
Structure
High Precision and High Recall