Discovering Direct and Indirect Matches for Schema Elements. Li Xu, Data Extraction Group, Brigham Young University. Sponsored by NSF.


Problem: [Figure: Source and Target schemas for a car application. Node labels: Car, Year, Cost, Style, Feature, Phone, Miles, Mileage, Model, Make & Model, Color, Body Type.]

Applications: Data Integration, Schema Integration, Message Mapping, Data Translation

Approach: Direct Matches; Indirect Matches (Union, Selection, Composition, Decomposition)

Union and Selection: [Figure: Source and Target car schemas. Node labels: Car, Year, Cost, Style, Feature, Phone, Miles, Mileage, Model, Make & Model, Color, Body Type.]

Composition and Decomposition: [Figure: Source and Target car schemas. Node labels: Car, Year, Cost, Style, Feature, Phone, Miles, Mileage, Model, Make & Model, Color, Body Type.]
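The four indirect-match operations named in the Approach slide can be illustrated with a small sketch. The slides do not define the operations formally, so the function names, signatures, and sample values below are hypothetical and only show the general idea of each operation on attribute values.

```python
def compose(make, model):
    """Composition: build one target value from several source values."""
    return f"{make} {model}"


def decompose(make_and_model):
    """Decomposition: split one source value into several target values.

    Assumes the first space separates the two parts.
    """
    make, _, model = make_and_model.partition(" ")
    return make, model


def select(values, predicate):
    """Selection: keep only the source values that belong to the target element."""
    return [v for v in values if predicate(v)]


def union(*value_lists):
    """Union: merge values of several source elements, dropping duplicates."""
    merged = []
    for values in value_lists:
        for v in values:
            if v not in merged:
                merged.append(v)
    return merged
```

For instance, `select(["red", "sunroof", "blue"], lambda v: v in {"red", "blue"})` keeps only the color-like values of a mixed list, which is the kind of filtering a selection match would need.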

Matching Techniques: Terminological Relationships, Value Characteristics, Expected Data Values, Structure

Terminological Relationships: WordNet; Machine-Learned Rules. Example: (Make, Brand). Features: the number of different common hypernym roots of A and B; the sum of distances of A and B to a common hypernym; the sum of the number of senses of A and B.
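As an illustration of one of these features, the sum of distances of A and B to a common hypernym, here is a minimal sketch. It does not use WordNet: the `HYPERNYM` table is a hypothetical toy taxonomy, and the function names are assumptions made for this example only.

```python
# Toy hypernym links (child -> parent); a hypothetical stand-in for WordNet.
HYPERNYM = {
    "make": "kind",
    "brand": "kind",
    "kind": "category",
    "category": "concept",
}


def ancestors(word):
    """Map the word and each of its hypernyms to its distance from the word."""
    dist, steps = {}, 0
    while word is not None:
        dist[word] = steps
        word = HYPERNYM.get(word)
        steps += 1
    return dist


def hypernym_distance(a, b):
    """Smallest summed distance from a and b to a shared hypernym, or None."""
    da, db = ancestors(a), ancestors(b)
    common = set(da) & set(db)
    return min(da[w] + db[w] for w in common) if common else None
```

A small value suggests the two names are terminologically close; in the slides this is only one of three features fed to machine-learned rules.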

Value Characteristics: Machine Learning. Features [LC94]: string length, numeric ratio, space ratio; mean, variation, coefficient of variation, standard deviation.
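A rough sketch of how such features might be computed for one column of values follows. The slides do not give the exact definitions from [LC94], so the formulas below (ratios over all characters, statistics over string lengths) are assumptions, and `value_features` is a hypothetical name.

```python
from statistics import mean, pstdev


def value_features(values):
    """Summarize a non-empty list of string values with simple statistics."""
    if not values:
        raise ValueError("values must not be empty")
    lengths = [len(v) for v in values]
    total_chars = sum(lengths)
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch.isspace() for v in values for ch in v)
    avg_len = mean(lengths)
    std_len = pstdev(lengths)  # population standard deviation of lengths
    return {
        "mean_length": avg_len,
        "std_length": std_len,
        "cv_length": std_len / avg_len if avg_len else 0.0,
        "numeric_ratio": digits / total_chars if total_chars else 0.0,
        "space_ratio": spaces / total_chars if total_chars else 0.0,
    }
```

Two schema elements whose columns produce similar feature vectors would then be candidates for a match, for example a source Year and a target Year that both yield four-character, all-numeric values.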

Expected Values: Application Concepts; Data Recognizers. CarMake: “ford”, “honda”, …; CarModel: “accord”, “mustang”, “taurus”, … [Figure: Target element Make & Model (values Ford Mustang, Ford Taurus, Ford F150, …; recognized as CarMake . CarModel) and Source elements Brand (values Acura, Audi, BMW, …; recognized as CarMake) and Model (values Legend, Mustang, A4, …; recognized as CarModel).]
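The idea of a data recognizer can be sketched with simple lexicon lookups. This is only an illustration: the lexicons are hypothetical, extended with the sample values from the slide, and `recognize` and `concept_pattern` are names invented here, not part of the presented system.

```python
# Hypothetical lexicons of expected values for two application concepts.
CAR_MAKE = {"ford", "honda", "acura", "audi", "bmw"}
CAR_MODEL = {"accord", "mustang", "taurus", "legend", "a4", "f150"}


def recognize(value):
    """Label each whitespace-separated token with the concept that contains it."""
    labels = []
    for token in value.lower().split():
        if token in CAR_MAKE:
            labels.append("CarMake")
        elif token in CAR_MODEL:
            labels.append("CarModel")
        else:
            labels.append(None)  # token not covered by any lexicon
    return labels


def concept_pattern(values):
    """Return the label sequence shared by all values, or None if they disagree."""
    patterns = {tuple(recognize(v)) for v in values}
    return patterns.pop() if len(patterns) == 1 else None
```

With these lexicons, values such as "Ford Mustang" yield the pattern `("CarMake", "CarModel")`, while values such as "Acura" yield `("CarMake",)`, which hints that one element is a composition of the concepts found separately in the other schema.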

Structure: [Figure: Source and Target schema trees. PO: POShipTo, POBillTo, POLines, City, Street, Item, Count, Line, Qty, UoM. PurchaseOrder: DeliverTo, InvoiceTo, Items, Item, ItemCount, ItemNumber, Quantity, UnitOfMeasure, Address, City, Street.]

Structure (Cont.): [Figures: four further slides over the PO and PurchaseOrder schema trees. Labels shown side by side in these slides include POShipTo and DeliverTo; Line, Qty, UoM and ItemNumber, Quantity, UnitOfMeasure; Count and ItemCount.]

Experiments: Methodology; Measures: Precision, Recall, F-Measure
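The three measures can be computed from counts of correct, false-positive, and false-negative matches. The sketch below assumes the balanced (harmonic-mean) form of the F-measure, which the slides do not spell out; `match_quality` is a hypothetical helper name.

```python
def match_quality(correct, false_pos, false_neg):
    """Return (precision, recall, F-measure) from match counts."""
    proposed = correct + false_pos   # matches the system reported
    expected = correct + false_neg   # matches in the ground truth
    precision = correct / proposed if proposed else 0.0
    recall = correct / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, 8 correct matches with 2 false positives and 2 false negatives give precision, recall, and F-measure of about 0.8 each.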

Results: [Table: columns Application (Number of Schemes), Precision (%), Recall (%), F (%), Correct, False Positive, False Negative; rows Course Schedule (5), Faculty Member (5), Real Estate (5); cell values not preserved in the transcript.] Data borrowed from Univ. of Washington. Indirect Matches: 94% (precision, recall, F-measure).

Ground-Truthing: [Figure labels: Contact, Cell phone, Office phone, Firm location, Fax, Firm name, Agent name; annotation "Agent’s or Firm’s".]

Limitation (Expected Data Value)

Contributions: Direct Matches; Indirect Matches; Expected Values; Structure; High Precision and High Recall