1
Schema Mapping: Experiences and Lessons Learned
Yihong Ding
Data Extraction Group, Brigham Young University
Sponsored by NSF
2
Schema Mapping
Semantic correspondence between two schemas
Significance
– data integration
– data warehouses
– ontology merging
– message translation in e-commerce
– semantic query processing
– etc.
3
Schema Representation
[Figure: two schemas. Node labels, in extraction order: House, Agent, Golf course, Water front, Phone_evening, Name, Address, Street, City, State, Basic_features, beds, SQFT, MLS, agent, location_description, name, home phone, office phone, location, Location, MLS, Bedrooms, Phone_day, cell phone]
4
1:1 Mapping Cardinality
[Figure: the two schemas from the Schema Representation slide, shown again to illustrate 1:1 mappings]
5
n:1 Mapping Cardinality
[Figure: the two schemas from the Schema Representation slide, shown again to illustrate n:1 mappings]
6
n:m Mapping Cardinality
[Figure: the two schemas from the Schema Representation slide, shown again to illustrate n:m mappings]
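The three cardinalities on the preceding slides differ only in how many elements sit on each side of a correspondence. The following is a minimal sketch of that idea, not the presenter's implementation: `Correspondence` and `cardinality` are hypothetical names, and the element groupings in the usage note are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One semantic correspondence between element sets of two schemas."""
    source: frozenset  # element names taken from one schema
    target: frozenset  # element names taken from the other schema


def cardinality(c: Correspondence) -> str:
    """Classify a correspondence as 1:1, n:1, 1:n, or n:m."""
    if not c.source or not c.target:
        raise ValueError("both sides of a correspondence must be non-empty")
    left = "1" if len(c.source) == 1 else "n"
    if len(c.target) == 1:
        right = "1"
    else:
        # Use "m" only when the left side is already "n".
        right = "m" if left == "n" else "n"
    return f"{left}:{right}"
```

For example, `Correspondence(frozenset({"Street", "City", "State"}), frozenset({"location"}))` is classified as n:1, while two phone elements on one side and three on the other give n:m.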
7
Object-Set Matcher (schema-level)
Name-based matcher
– string and substring comparison
– linguistic methods: stemming, stop words, removing ignorable characters, etc.
– thesaurus: WordNet, etc.
1:1 mapping cardinality
[Figure labels: Agent, Name; agent, name]
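A name-based matcher of the kind listed on this slide can be approximated with the standard library alone. This is a rough sketch under stated assumptions: the stop-word list is invented for illustration, the plural stripping is a crude stand-in for a real stemmer, the thesaurus step (WordNet) is left out, and `normalize` and `names_match` are hypothetical names.

```python
import re

STOP_WORDS = {"of", "the", "a", "an", "and"}  # assumed, illustrative list


def normalize(name: str) -> list[str]:
    """Lowercase, split on ignorable characters, drop stop words, strip a plural 's'."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    result = []
    for tok in tokens:
        if not tok or tok in STOP_WORDS:
            continue
        # Crude stand-in for stemming: remove one trailing "s" (but not "ss").
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        result.append(tok)
    return result


def names_match(a: str, b: str) -> bool:
    """True if the normalized names are equal or one is a substring of the other."""
    na, nb = "".join(normalize(a)), "".join(normalize(b))
    if not na or not nb:
        return False
    return na == nb or na in nb or nb in na
```

With these rules, `Agent` matches `agent`, `Phone_day` matches `phone day`, and `Bedrooms` matches `beds` through the substring test. Plain substring matching also produces false positives on short names, which is one reason to combine it with other matchers.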
8
Object-Set Matcher (instance-level)
Data Frame
– multiple regular expressions in Perl style
– as simple as a list of data values
Data-frame matcher
– use: compare recognized data values
– benefit: able to recognize disjunctive data value sets
– bias: data frame may not correspond 100% with the semantics
– limitation: a needed data frame might not exist
1:1 mapping cardinality
[Figure: Object-set A (Ford, Honda) and Object-set B (Chevy, Toyota), both recognized by the Car Model data frame (Ford, Honda, Chevy, Toyota, …)]
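The benefit claimed above, matching object sets whose values do not overlap, can be sketched as follows. This is an illustration rather than the presenter's code: the data frame is reduced to a list of regular expressions built from the four values on the slide, and the 0.8 threshold and the function names are assumptions.

```python
import re

# A data frame reduced to a list of patterns; contents taken from the slide's example.
CAR_MODEL_FRAME = [
    re.compile(p, re.IGNORECASE) for p in (r"ford", r"honda", r"chevy", r"toyota")
]


def recognized_fraction(values, frame):
    """Fraction of values fully matched by at least one pattern in the frame."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if any(p.fullmatch(v.strip()) for p in frame))
    return hits / len(values)


def data_frame_match(values_a, values_b, frame, threshold=0.8):
    """Two object sets match when the same frame recognizes most values in each.

    The threshold is an assumed parameter, not a value from the presentation.
    """
    return (
        recognized_fraction(values_a, frame) >= threshold
        and recognized_fraction(values_b, frame) >= threshold
    )
```

Here `data_frame_match(["Ford", "Honda"], ["Chevy", "Toyota"], CAR_MODEL_FRAME)` succeeds even though the two value sets are disjoint, because both are recognized by the same frame.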
9
Extended Data-Frame Matcher (instance-level)
n:1 mapping cardinality
Add a STRICT_SUBSTRING operation
With the help of structural analysis
[Figure: Schema 1 and Schema 2; Address (Street, City, State) and location; example value: 120 N. University Ave., Provo, UT]
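The slide does not define STRICT_SUBSTRING. One plausible reading, assumed here, is a proper-substring test: each value from the n finer-grained object sets appears inside the single coarser value without being equal to it. The sketch below uses that reading, with hypothetical function names and the slide's example address.

```python
def strict_substring(part: str, whole: str) -> bool:
    """True if `part` occurs inside `whole` and is not the whole string.

    Assumed semantics; the presentation does not spell out the operation.
    """
    part, whole = part.strip(), whole.strip()
    return bool(part) and part != whole and part in whole


def n_to_1_candidate(part_values: dict, whole_value: str) -> bool:
    """True if every part value is a strict substring of the single whole value."""
    return bool(part_values) and all(
        strict_substring(v, whole_value) for v in part_values.values()
    )
```

For instance, `{"Street": "120 N. University Ave.", "City": "Provo", "State": "UT"}` is an n:1 candidate for `"120 N. University Ave., Provo, UT"`. In practice the test would run over many instances, and structural analysis would pick which element groups to try.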
10
Direct Structure Matcher
Comparing structure similarity between two candidate schemas
1:1 mapping cardinality
[Figure labels: Agent, Name, Fax, Address; agent, name, fax, phone, Location, phone_day]
11
Reference Structure Matcher
If A and B match C, then A matches B.
Able to solve n:m mapping cardinality
1:1, n:1, and n:m mapping cardinalities
[Figure: Schema 1, Schema 2, and a reference structure; labels: Phone, Day, Evening, Cell, Home, Office]
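The rule "if A and B match C, then A matches B" amounts to grouping the elements of both schemas by the reference concept they match. The sketch below assumes the element-to-concept assignments already exist (for example, from the name and data-frame matchers). `match_via_reference` and the sample assignments are hypothetical.

```python
from collections import defaultdict


def match_via_reference(to_ref_1: dict, to_ref_2: dict) -> dict:
    """Group elements of two schemas by the reference concept each one matches.

    Returns {concept: (elements_from_schema_1, elements_from_schema_2)} for
    every concept matched from both sides.
    """
    groups = defaultdict(lambda: (set(), set()))
    for elem, concept in to_ref_1.items():
        groups[concept][0].add(elem)
    for elem, concept in to_ref_2.items():
        groups[concept][1].add(elem)
    # A derived mapping exists only where both schemas reach the same concept.
    return {c: (a, b) for c, (a, b) in groups.items() if a and b}
```

If `Phone_day` and `Phone_evening` in one schema and `home phone`, `office phone`, and `cell phone` in the other are all assigned to a reference concept `Phone`, the result is a single 2:3 group, which is an n:m mapping that pairwise 1:1 matching would not produce.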
12
Experiments

Application (Number of Schemas)   Precision (%)   Recall (%)   F (%)   Number Matches   Number Correct   Number Incorrect
Faculty Member (5)                100             100          100     540              540              0
Course Schedule (5)               99              93           96      490              454              6
Real Estate (5)                   90              94           92      876              820              92

Data borrowed from Univ. of Washington [DDH, SIGMOD01]
Indirect Matches: precision 87%, recall 94%, F-measure 90%

Rough Comparison with U of W Results
* Faculty Member – Accuracy: ~92%
* Course Schedule – Accuracy: ~71%
* Real Estate (2 tests) – Accuracy: ~75%
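The reported percentages are consistent with the usual definitions of precision, recall, and F-measure, assuming "Number Matches" counts the expected (true) matches. The slide does not state the formulas, so the sketch below is a consistency check under that assumption, with a hypothetical function name.

```python
def precision_recall_f(expected: int, correct: int, incorrect: int):
    """Return (precision, recall, F-measure) as fractions in [0, 1].

    expected  : number of true matches (assumed meaning of "Number Matches")
    correct   : matches found that are right
    incorrect : matches found that are wrong
    """
    found = correct + incorrect
    precision = correct / found if found else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Course Schedule figures: 490 expected, 454 correct, 6 incorrect.
p, r, f = precision_recall_f(490, 454, 6)
print(round(p * 100), round(r * 100), round(f * 100))  # 99 93 96
```

The Real Estate figures (876 expected, 820 correct, 92 incorrect) round to 90, 94, and 92 in the same way.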
13
Lessons Learned
n:1 and n:m matches occur frequently.
– 22% = 97/437 [DMD+03] (Course Catalog, Company Profile)
– 45% = 287/638 (Car Ads, Cell Phones, Real Estate)
Reference structures provide a way to solve the long-standing, hard cluster-mapping (n:m cardinality) problem.
Data frames improve the instance-level matchers.
Combining schema-level and instance-level matchers improves the results.