Direct and Indirect Matching of Schema Elements for Data Integration on the Web. Li Xu, Data Extraction Group, Brigham Young University. Sponsored by NSF.


Schema Matching [Figure: source and target car schemas; element labels include Car, Year, Cost, Style, Feature, Phone, Miles, Mileage, Model, Make & Model, Color, Body Type.]

Mapping: Direct Matches; Indirect Matches (Union, Selection, Composition, Decomposition)

Union and Selection [Figure: source and target car schemas, as on the Schema Matching slide.]

Composition and Decomposition [Figure: source and target car schemas, as on the Schema Matching slide.]
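The four indirect-match operations named on these slides can be pictured with a small sketch. This is only an illustration under assumptions: the rows, the phone element names, the function names, and the idea that Feature values are selected into Color are all made up here and are not taken from the presentation.

```python
# Hypothetical source rows; every value is invented for illustration.
source_rows = [
    {"Make": "Ford", "Model": "Mustang", "Feature": "red"},
    {"Make": "Honda", "Model": "Accord", "Feature": "sunroof"},
]

def composition(rows, parts, target, sep=" "):
    """Merge several source values into one target value."""
    return [{target: sep.join(r[p] for p in parts)} for r in rows]

def decomposition(rows, source, targets, sep=" "):
    """Split one source value into several target values."""
    out = []
    for r in rows:
        pieces = r[source].split(sep, len(targets) - 1)
        out.append(dict(zip(targets, pieces)))
    return out

def selection(rows, source, target, allowed):
    """Keep only the source values that belong to the target concept."""
    return [{target: r[source]} for r in rows if r[source] in allowed]

def union(rows, sources, target):
    """Gather the values of several source elements under one target element."""
    return [{target: r[s]} for r in rows for s in sources if s in r]

composed = composition(source_rows, ["Make", "Model"], "Make & Model")
split_again = decomposition(composed, "Make & Model", ["Make", "Model"])
colors = selection(source_rows, "Feature", "Color", {"red", "blue"})
phones = union(
    [{"Day Phone": "555-0100", "Cell Phone": "555-0101"}],  # hypothetical elements
    ["Day Phone", "Cell Phone"],
    "Phone",
)
```

Composition and decomposition are inverses of each other in this sketch, which is why splitting the composed values gives back the original Make and Model pairs.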

Matching Techniques: Terminological Relationships; Value Characteristics; Expected Data Values; Structure

Terminological Relationships: WordNet; Machine-Learned Rules. Example: (Make, Brand). Features: the number of different common hypernym roots of A and B; the sum of the distances of A and B to a common hypernym; the sum of the number of senses of A and B.
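A minimal sketch of how the three features might be computed, assuming a toy hypernym graph in place of WordNet. Every sense name and link below is invented, and taking the closest common hypernym for the distance feature is an assumption rather than something the slide states.

```python
# Toy stand-in for WordNet: all sense names and links are invented.
SENSES = {
    "make": ["make.n.01", "make.v.01"],
    "brand": ["brand.n.01", "brand.n.02"],
}
HYPERNYM = {  # child sense -> parent sense; roots have no entry
    "make.n.01": "kind.n.01",
    "brand.n.01": "kind.n.01",
    "kind.n.01": "category.n.01",
    "category.n.01": "entity.n.01",
    "brand.n.02": "symbol.n.01",
    "symbol.n.01": "entity.n.01",
    "make.v.01": "create.v.01",
}

def ancestors(sense):
    """Map each hypernym of `sense` (itself included) to its distance."""
    dist, d = {}, 0
    while sense is not None:
        dist[sense] = d
        sense = HYPERNYM.get(sense)
        d += 1
    return dist

def terminological_features(a, b):
    """Compute the three slide features for words a and b."""
    roots, best = set(), None
    for sa in SENSES[a]:
        da = ancestors(sa)
        for sb in SENSES[b]:
            db = ancestors(sb)
            for h in da.keys() & db.keys():
                if HYPERNYM.get(h) is None:   # h is a root shared by both
                    roots.add(h)
                total = da[h] + db[h]
                if best is None or total < best:
                    best = total              # closest common hypernym (assumed)
    return {
        "common_roots": len(roots),
        "distance_sum": best,                 # None if no common hypernym
        "sense_sum": len(SENSES[a]) + len(SENSES[b]),
    }

features = terminological_features("make", "brand")
```

Feature vectors of this kind would then be the input to whatever rule learner produces the machine-learned rules; that step is not sketched here.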

Value Characteristics: Machine Learning. Features [LC94]: string length, numeric ratio, space ratio; mean, variation, coefficient of variation, standard deviation.
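As a rough illustration, the listed value characteristics could be computed over a column of values as below. The function names are hypothetical, reading "variation" as population variance is an assumption, and the exact definitions used in [LC94] may differ.

```python
import statistics

def char_features(values):
    """Character-level features for a non-empty list of non-empty strings."""
    total_chars = sum(len(v) for v in values)
    return {
        "mean_length": total_chars / len(values),
        "numeric_ratio": sum(c.isdigit() for v in values for c in v) / total_chars,
        "space_ratio": sum(c.isspace() for v in values for c in v) / total_chars,
    }

def numeric_features(numbers):
    """Summary statistics for a numeric column with a nonzero mean."""
    mean = statistics.mean(numbers)
    stdev = statistics.pstdev(numbers)
    return {
        "mean": mean,
        "variance": statistics.pvariance(numbers),  # assumed reading of "variation"
        "stdev": stdev,
        "coefficient_of_variation": stdev / mean,
    }

text_stats = char_features(["ab 12", "c3"])
num_stats = numeric_features([2, 4, 4, 4, 5, 5, 7, 9])
```

Two columns whose feature vectors are close are candidates for a match, whatever their names are.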

Expected Values: Application Concepts; Data Frames. CarMake: “ford”, “honda”, … CarModel: “accord”, “mustang”, “taurus”, … [Figure: Make & Model values (Ford Mustang, Ford Taurus, Ford F150, …) recognized as CarMake . CarModel; Brand values (Acura, Audi, BMW, …) recognized as CarMake; Model values (Legend, Mustang, A4, …) recognized as CarModel.]
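One way to picture expected-value matching is a small recognizer per application concept, applied to the values of each schema element. The sketch below is an assumption-laden illustration: the regular expressions, the 0.8 threshold, and the function names are invented, and real data frames would be far richer than these short value lists.

```python
import re

# Hypothetical, tiny recognizers built from the values shown on the slide.
DATA_FRAMES = {
    "CarMake": re.compile(r"\b(ford|honda|acura|audi|bmw)\b", re.IGNORECASE),
    "CarModel": re.compile(
        r"\b(accord|mustang|taurus|legend|a4|f150)\b", re.IGNORECASE
    ),
}

def concept_hit_rates(values):
    """Fraction of values in which each concept's recognizer fires."""
    return {
        concept: sum(bool(pattern.search(v)) for v in values) / len(values)
        for concept, pattern in DATA_FRAMES.items()
    }

def concepts_for(values, threshold=0.8):
    """Concepts recognized in at least `threshold` of the values (assumed cutoff)."""
    rates = concept_hit_rates(values)
    return sorted(c for c, rate in rates.items() if rate >= threshold)

make_and_model = concepts_for(["Ford Mustang", "Ford Taurus", "Ford F150"])
brand = concepts_for(["Acura", "Audi", "BMW"])
model = concepts_for(["Legend", "Mustang", "A4"])
```

An element whose values trigger both recognizers, as Make & Model does here, is a candidate for an indirect match against two separate elements that each trigger one of them.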

Structure [Figure: two purchase-order schema trees, labeled Target and Source. PO: POShipTo (City, Street), POBillTo (City, Street), POLines (Item (Line, Qty, UoM), Count). PurchaseOrder: DeliverTo, InvoiceTo, Items (Item (ItemNumber, Quantity, UnitOfMeasure), ItemCount), Address (City, Street).]

Structure (Cont.) [Figure: four further builds of the same PO and PurchaseOrder diagram, in which POShipTo and DeliverTo are drawn together with City and Street, and the Item leaves (Line, Qty, UoM; ItemNumber, Quantity, UnitOfMeasure) and Count are repositioned.]

Experiments: Methodology; Measures (Precision, Recall, F Measure)
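For reference, the three measures can be computed from counts of correct, false-positive, and false-negative matches as sketched below. The function name and the example counts are made up, and the F measure is assumed to be the balanced harmonic mean of precision and recall.

```python
def match_scores(correct, false_positive, false_negative):
    """Return (precision, recall, F) from match counts.

    Assumes at least one correct match, so no denominator is zero.
    """
    precision = correct / (correct + false_positive)
    recall = correct / (correct + false_negative)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Invented counts, not results from the presentation.
precision, recall, f_measure = match_scores(90, 10, 30)
```

With these invented counts, precision is 90/100 and recall is 90/120, so the F measure falls between the two, closer to the lower value.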

Results [Table: columns Applications (Number of Schemes), Precision (%), Recall (%), F (%), Correct, False Positive, False Negative; rows Course Schedule (5), Faculty Member (5), Real Estate (5); cell values missing from the transcript.] Data borrowed from Univ. of Washington. Indirect Matches: 94% (precision, recall, F-measure).

Contributions: Direct Matches; Indirect Matches; Expected values; Structure; High Precision and High Recall