1
Direct and Indirect Matching of Schema Elements for Data Integration on the Web
Li Xu
Data Extraction Group, Brigham Young University
Sponsored by NSF
2
Car Schema Matching
[Figure: source and target car schemas. Labels, in extraction order: Source, Car, Year, Cost, Style, Year, Feature, Cost, Car, Phone, Target, Car, Miles, Mileage, Model, Make & Model, Color, Body Type.]
3
Mapping
  Direct Matches
  Indirect Matches
    Union
    Selection
    Composition
    Decomposition
4
Union and Selection
[Figure: source and target car schemas. Labels, in extraction order: Car, Source, Car, Year, Cost, Style, Year, Feature, Cost, Car, Phone, Target, Car, Miles, Mileage, Model, Make & Model, Color, Body Type.]
5
Composition and Decomposition
[Figure: source and target car schemas. Labels, in extraction order: Car, Source, Car, Year, Cost, Style, Year, Feature, Cost, Car, Phone, Target, Car, Miles, Mileage, Model, Make & Model, Color, Body Type.]
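The four indirect-match operations named on the slides can be illustrated on toy data. This is only a sketch of one plausible reading: the helper names, the sample values, and the pairing of operations with elements (Feature to Color, Phone, Make & Model) are assumptions, not taken from the slides.

```python
# Toy illustration of union, selection, composition and decomposition.
# All values, helper names and element pairings here are hypothetical.

def union(*value_lists):
    """Union: a target element collects values from several source elements."""
    merged = []
    for values in value_lists:
        for v in values:
            if v not in merged:  # keep first occurrence, preserve order
                merged.append(v)
    return merged

def selection(values, predicate):
    """Selection: a target element takes only a subset of a source element's values."""
    return [v for v in values if predicate(v)]

def composition(parts, sep=" "):
    """Composition: several source values are joined into one target value."""
    return sep.join(parts)

def decomposition(value, known_prefixes):
    """Decomposition: one source value is split into two target values."""
    for prefix in known_prefixes:
        if value.lower().startswith(prefix.lower() + " "):
            return value[:len(prefix)], value[len(prefix) + 1:]
    return None  # no known prefix, so the value cannot be split

COLORS = {"red", "blue", "white"}  # assumed vocabulary for a Color element
features = ["red", "sunroof", "blue", "air conditioning"]

colors = selection(features, lambda v: v in COLORS)
phones = union(["555-0100"], ["555-0199", "555-0100"])
make_model = composition(["Ford", "Mustang"])
parts = decomposition("Ford Mustang", ["Ford", "Honda"])
```

In this reading, selection and decomposition narrow or split source values, while union and composition merge them.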
6
Matching Techniques
  Terminological Relationships
  Value Characteristics
  Expected Data Values
  Structure
7
Terminological Relationships
  WordNet
  Machine-Learned Rules
  Example: (Make, Brand)
  Features for a pair of names A and B:
    The number of different common hypernym roots of A and B
    The sum of the distances of A and B to a common hypernym
    The sum of the number of senses of A and B
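The three WordNet-based features can be sketched over a tiny hand-made hypernym table. The sense names, the parent links, and the choice of the nearest common hypernym for the distance feature are all assumptions made for illustration; this is not real WordNet data and not the learned rules themselves.

```python
# Toy hypernym data (hypothetical, not real WordNet content).
# Each word has a list of senses; each sense has one parent, None at a root.
SENSES = {
    "make": ["make.n.01", "make.v.01"],
    "brand": ["brand.n.01", "brand.n.02"],
}
PARENT = {
    "make.n.01": "kind.n.01",
    "brand.n.01": "kind.n.01",
    "kind.n.01": "category.n.01",
    "category.n.01": None,
    "make.v.01": None,
    "brand.n.02": "symbol.n.01",
    "symbol.n.01": None,
}

def ancestors(sense):
    """Map the sense and each of its hypernyms to its distance from the sense."""
    dist, d = {}, 0
    while sense is not None:
        dist[sense] = d
        sense = PARENT[sense]
        d += 1
    return dist

def term_features(a, b):
    """Compute the three terminological features for the word pair (a, b)."""
    roots, best = set(), None
    for sa in SENSES[a]:
        for sb in SENSES[b]:
            da, db = ancestors(sa), ancestors(sb)
            for h in set(da) & set(db):      # hypernyms shared by both senses
                if PARENT[h] is None:
                    roots.add(h)             # a shared root of the hierarchy
                total = da[h] + db[h]
                if best is None or total < best:
                    best = total             # assumption: use the nearest common hypernym
    return {
        "common_roots": len(roots),
        "distance_sum": best,                # None when no hypernym is shared
        "sense_count": len(SENSES[a]) + len(SENSES[b]),
    }

features = term_features("make", "brand")
```

A learned rule would then take such a feature vector as input and decide whether the pair counts as a terminological match.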
8
Value Characteristics
  Machine Learning
  Features [LC94]:
    String length, numeric ratio, space ratio
    Mean, variance, coefficient of variation, standard deviation
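A minimal sketch of how such value features might be computed for one column of string values. The slide does not say which quantity the mean, variance, and related statistics are taken over; string length is assumed here, and the function name and dictionary keys are hypothetical.

```python
import statistics

def value_features(values):
    """Summarize a non-empty list of string values with simple statistics."""
    lengths = [len(v) for v in values]
    # Share of digit characters and of space characters in each value.
    numeric = [sum(c.isdigit() for c in v) / len(v) if v else 0.0 for v in values]
    spaces = [v.count(" ") / len(v) if v else 0.0 for v in values]

    mean_len = statistics.mean(lengths)
    std_len = statistics.pstdev(lengths)     # population standard deviation
    return {
        "mean_length": mean_len,
        "variance_length": statistics.pvariance(lengths),
        "std_length": std_len,
        # Coefficient of variation: standard deviation relative to the mean.
        "cv_length": std_len / mean_len if mean_len else 0.0,
        "mean_numeric_ratio": statistics.mean(numeric),
        "mean_space_ratio": statistics.mean(spaces),
    }

year_like = value_features(["1999", "2001", "98"])
name_like = value_features(["Ford Mustang", "Honda Accord"])
```

Columns such as Year and Make & Model would produce clearly different vectors (all digits versus no digits and one space), which is the signal a learner can exploit.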
9
Expected Values
  Application Concepts
  Data Frames
    CarMake: “ford”, “honda”, …
    CarModel: “accord”, “mustang”, “taurus”, …
[Figure: Target and Source elements with sample values. Make & Model: Ford Mustang, Ford Taurus, Ford F150, … (CarMake . CarModel). Brand: Acura, Audi, BMW, … (CarMake). Model: Legend, Mustang, A4, … (CarModel).]
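The expected-values idea can be sketched with simple lookup sets standing in for data frames. The sets below extend the slide's examples with the sample values from the figure, and the functions `recognize` and `profile` are hypothetical names; real data frames would use richer recognizers than exact token lookup.

```python
from collections import Counter

# Stand-ins for the CarMake and CarModel data frames (assumed contents).
CAR_MAKE = {"ford", "honda", "acura", "audi", "bmw"}
CAR_MODEL = {"accord", "mustang", "taurus", "f150", "legend", "a4"}

def recognize(value):
    """Label each whitespace-separated token with the concept it matches, if any."""
    labels = []
    for token in value.lower().split():
        if token in CAR_MAKE:
            labels.append("CarMake")
        elif token in CAR_MODEL:
            labels.append("CarModel")
        else:
            labels.append(None)
    return tuple(labels)

def profile(values):
    """Return the most common concept pattern among a schema element's values."""
    return Counter(recognize(v) for v in values).most_common(1)[0][0]

make_and_model = profile(["Ford Mustang", "Ford Taurus", "Ford F150"])
brand = profile(["Acura", "Audi", "BMW"])
model = profile(["Legend", "Mustang", "A4"])

# If one element's pattern equals the concatenation of two others' patterns,
# that suggests a composition/decomposition match between them.
is_composition = make_and_model == brand + model
```

Here the Make & Model values are recognized as CarMake followed by CarModel, which is what links that element to the separate Brand and Model elements.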
10
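Structure
[Figure: two schema trees, labeled Target and Source. Node labels, in extraction order: PO; POShipTo, POBillTo, POLines; City, Street, City, Street, Item, Count; Line, Qty, UoM. PurchaseOrder; DeliverTo, InvoiceTo, Items; Item, ItemCount; ItemNumber, Quantity, UnitOfMeasure; City, Street; Address.]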
11
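Structure (Cont.)
[Figure: two schema trees, labeled Target and Source. Node labels, in extraction order: PO; POShipTo, POBillTo, POLines; City, Street, City, Street, Item, Count; Line, Qty, UoM. PurchaseOrder; DeliverTo, InvoiceTo, Items; ItemCount; ItemNumber, Quantity, UnitOfMeasure; City, Street; Address; DeliverTo.]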
12
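Structure (Cont.)
[Figure: two schema trees, labeled Target and Source. Node labels, in extraction order: PO; POBillTo, POLines; City, Street, City, Street, Item, Count; Line, Qty, UoM. PurchaseOrder; InvoiceTo, Items; ItemCount; ItemNumber, Quantity, UnitOfMeasure; City, Street, City, Street; POShipTo, DeliverTo.]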
13
Structure (Cont.)
[Figure: two schema trees, labeled Target and Source. Node labels, in extraction order: PO; POBillTo, POLines; City, Street, City, Street, Item, Count. PurchaseOrder; InvoiceTo, Items; ItemCount; City, Street, City, Street; Line, Qty, UoM, ItemNumber, Quantity; Line, Qty, UoM, ItemNumber, Quantity; Line, Qty, Quantity, UnitOfMeasure; POShipTo, DeliverTo.]
14
Structure (Cont.)
[Figure: two schema trees, labeled Target and Source. Node labels, in extraction order: PO; POBillTo, POLines; City, Street, City, Street, Item, Count; Line, Qty, UoM. PurchaseOrder; InvoiceTo, Items; ItemCount; ItemNumber, Quantity; City, Street, City, Street, City, Street, City, Street; Count; Line, Qty, Quantity, UnitOfMeasure; POShipTo, DeliverTo.]
15
Experiments
  Methodology
  Measures
    Precision
    Recall
    F Measure
16
Results
Application (Number of Schemes)  Precision (%)  Recall (%)  F (%)  Correct  False Positive  False Negative
Course Schedule (5)              98             93          96     119      2               9
Faculty Member (5)               100            100         100    140      0               0
Real Estate (5)                  92             96          94     235      20              10
Data borrowed from Univ. of Washington
Indirect Matches: 94% (precision, recall, F-measure)
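As a check on how the measures relate to the counts, the sketch below recomputes precision, recall, and F-measure, taking the Results counts as 119/2/9, 140/0/0, and 235/20/10 (correct / false positive / false negative). It assumes the usual definition of F as the harmonic mean of precision and recall, which the slides do not spell out; the function name is hypothetical.

```python
def precision_recall_f(correct, false_pos, false_neg):
    """Return (precision, recall, F) as fractions between 0 and 1."""
    precision = correct / (correct + false_pos)
    recall = correct / (correct + false_neg)
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure

# Counts per application: (correct, false positives, false negatives).
counts = {
    "Course Schedule": (119, 2, 9),
    "Faculty Member": (140, 0, 0),
    "Real Estate": (235, 20, 10),
}

# Convert to whole percentages for comparison with the reported figures.
percentages = {
    name: tuple(round(100 * x) for x in precision_recall_f(*c))
    for name, c in counts.items()
}
```

Under these assumptions the rounded percentages agree with the reported figures: 98/93/96, 100/100/100, and 92/96/94.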
17
Contributions
  Direct Matches
  Indirect Matches
  Expected values
  Structure
  High Precision and High Recall