Schema Mapping: Experiences and Lessons Learned
Yihong Ding
Data Extraction Group, Brigham Young University
Sponsored by NSF

2 Schema Mapping
Semantic correspondence between two schemas
Significance
  – data integration
  – data warehouses
  – ontology merging
  – message translation in e-commerce
  – semantic query processing
  – etc.

3 Schema Representation
[Figure: two example real-estate schemas drawn as graphs. Node labels: House, Agent, Golf course, Water front, Phone_evening, Name, Address, Street, City, State, Basic_features, beds, SQFT, MLS, agent, location_description, name, home phone, office phone, location, Location, MLS, Bedrooms, Phone_day, cell phone]

4 1:1 Mapping Cardinality
[Figure: the two example schemas of slide 3, illustrating a 1:1 mapping]

5 n:1 Mapping Cardinality
[Figure: the two example schemas of slide 3, illustrating an n:1 mapping]

6 n:m Mapping Cardinality
[Figure: the two example schemas of slide 3, illustrating an n:m mapping]

7 Object-Set Matcher (schema-level)
Name-based matcher
  – string and substring comparison
  – linguistic methods: stemming, stop words, removing ignorable characters, etc.
  – thesaurus: WordNet, etc.
1:1 mapping cardinality
[Figure labels: Agent, Name, agent, name]
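A minimal sketch of a name-based matcher along the lines listed on this slide, not the authors' implementation. The stop-word list, the synonym table (standing in for a thesaurus such as WordNet), and the function names are assumptions made for illustration; stemming is left out to keep the example short.

```python
import re

# Hypothetical stop words and synonyms; a real matcher would consult
# a thesaurus such as WordNet instead of this hard-coded table.
STOP_WORDS = {"of", "the", "a", "an"}
SYNONYMS = {"location": "address"}

def normalize(name):
    """Lower-case a name, split on ignorable characters, drop stop words."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return [SYNONYMS.get(t, t) for t in tokens if t and t not in STOP_WORDS]

def name_match(a, b):
    """True when the normalized names are equal or one contains the other."""
    na = " ".join(normalize(a))
    nb = " ".join(normalize(b))
    return bool(na) and bool(nb) and (na == nb or na in nb or nb in na)
```

Under these assumptions, name_match("Agent", "agent") and name_match("Phone_day", "phone") both hold, while name_match("Beds", "MLS") does not.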

8 Object-Set Matcher (instance-level)
Data frame
  – multiple regular expressions in Perl style
  – as simple as a list of data values
Data-frame matcher
  – use: compare recognized data values
  – benefit: able to recognize disjunctive data value sets
  – bias: data frame may not correspond 100% with the semantics
  – limitation: a needed data frame might not exist
1:1 mapping cardinality
[Figure: Object-set A (Ford, Honda); Object-set B (Chevy, Toyota); data frame Car Model (Ford, Honda, Chevy, Toyota, …)]
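The data-frame idea can be sketched as follows, assuming a data frame is a compiled regular expression built from a list of known values. The name CAR_MODEL, the helper functions, and the 0.8 threshold are hypothetical choices for illustration, not taken from the slides.

```python
import re

# Hypothetical data frame: a value list compiled into one
# case-insensitive regular expression.
CAR_MODEL = re.compile(r"ford|honda|chevy|toyota", re.IGNORECASE)

def recognized_fraction(values, data_frame):
    """Fraction of the values that the data frame recognizes in full."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if data_frame.fullmatch(v.strip()))
    return hits / len(values)

def data_frame_match(values_a, values_b, data_frame, threshold=0.8):
    """Match two object sets when the data frame recognizes most values of both.

    The threshold is an assumed tuning parameter.
    """
    return (recognized_fraction(values_a, data_frame) >= threshold
            and recognized_fraction(values_b, data_frame) >= threshold)
```

With this sketch, the disjoint value sets ["Ford", "Honda"] and ["Chevy", "Toyota"] still match, because both are recognized by the same data frame even though they share no values.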

9 Extended Data-Frame Matcher (instance-level)
n:1 mapping cardinality
Add a STRICT_SUBSTRING operation
With the help of structural analysis
[Figure labels: Address, Street, City, State, location, Schema 2, Schema. Example value: N. University Ave., Provo, UT]
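One possible reading of the STRICT_SUBSTRING operation, sketched under the assumption that it tests whether each component value (for example a street, a city, and a state) appears as a proper substring of a composite value (for example a location). The slides do not define the operation, so both function names and the semantics here are illustrative.

```python
def strict_substring(part, whole):
    """True when part is non-empty, occurs in whole, and is not all of whole."""
    return bool(part) and part != whole and part in whole

def n_to_1_match(component_values, composite_value):
    """True when every component value is a strict substring of the composite."""
    return bool(component_values) and all(
        strict_substring(v, composite_value) for v in component_values
    )
```

For the slide's example value "N. University Ave., Provo, UT", the components "N. University Ave.", "Provo", and "UT" would all pass this test, which is what suggests an n:1 correspondence between Street, City, State and location.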

10 Direct Structure Matcher
Compares the structural similarity of two candidate schemas
1:1 mapping cardinality
[Figure labels: Agent, Name, Fax, Address, agent, name, fax, phone, Location, phone_day]

11 Reference Structure Matcher
If A and B match C, then A matches B.
Able to solve n:m mapping cardinality
1:1, n:1, and n:m mapping cardinalities
[Figure labels: Schema 1, Schema 2; Phone; Day Phone, Evening Phone, Cell Phone, Home Phone, Office Phone; Day Phone, Evening Phone; Home Phone, Office Phone, Cell Phone]
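The rule "if A and B match C, then A matches B" can be sketched by grouping the elements of each schema under the reference concept they match. The function name and the dictionary-based input format are assumptions for illustration; how elements are matched to the reference structure in the first place is outside this sketch.

```python
from collections import defaultdict

def reference_matches(schema1_to_ref, schema2_to_ref):
    """Pair up elements of two schemas that match the same reference concept.

    Each argument maps a schema element to the reference concept it matched.
    Returns {concept: (schema-1 elements, schema-2 elements)}, keeping only
    concepts matched by both schemas.
    """
    groups = defaultdict(lambda: ([], []))
    for element, concept in schema1_to_ref.items():
        groups[concept][0].append(element)
    for element, concept in schema2_to_ref.items():
        groups[concept][1].append(element)
    return {c: (sorted(a), sorted(b)) for c, (a, b) in groups.items() if a and b}

# Illustrative input using element names from the example schemas.
schema1 = {"Phone_day": "Phone", "Phone_evening": "Phone", "Name": "Name"}
schema2 = {"home phone": "Phone", "office phone": "Phone", "cell phone": "Phone"}
result = reference_matches(schema1, schema2)
```

Here two elements of one schema and three of the other all match the reference concept Phone, so the result is a single 2:3 group, which is an n:m mapping obtained without comparing the two schemas directly.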

12 Experiments
Data borrowed from Univ. of Washington [DDH, SIGMOD01]
Table columns: Application (Number of Schemas), Precision (%), Recall (%), F (%), Number Matches, Number Correct, Number Incorrect
Table rows: Faculty Member (5), Course Schedule (5), Real Estate (5)
Indirect matches: precision 87%, recall 94%, F-measure 90%
Rough comparison with U of W results
  * Faculty Member – Accuracy: ~92%
  * Course Schedule – Accuracy: ~71%
  * Real Estate (2 tests) – Accuracy: ~75%
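For reference, the precision, recall, and F-measure figures can be computed with the standard definitions, with F taken as the harmonic mean of precision and recall. That the slides use exactly these definitions is an assumption, and the function and parameter names are illustrative.

```python
def precision_recall(num_correct, num_incorrect, num_expected):
    """Standard precision and recall.

    num_correct: proposed matches that are right
    num_incorrect: proposed matches that are wrong
    num_expected: matches that should have been found
    """
    proposed = num_correct + num_incorrect
    precision = num_correct / proposed if proposed else 0.0
    recall = num_correct / num_expected if num_expected else 0.0
    return precision, recall

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Applied to the indirect-match figures above, f_measure(0.87, 0.94) is about 0.904, which is consistent with the reported F-measure of 90%.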

13 Lessons Learned
n:1 and n:m matches occur frequently.
  – 22% = 97/437 [DMD+03] (Course Catalog, Company Profile)
  – 45% = 287/638 (Car Ads, Cell Phones, Real Estate)
Reference structures provide a way to solve the long-standing, hard cluster-mapping (n:m cardinality) problem.
Data frames improve the instance-level matchers.
Combining schema-level and instance-level matchers improves the results.