XML Schema Integration Ray Dos Santos July 19, 2009.

Slides:



Advertisements
Similar presentations
Three-Step Database Design
Advertisements

Angelo Augusto Frozza, Ronaldo dos Santos Mello {frozza, A Method for Defining Semantic Similarities between GML Schemas Angelo Augusto.
Schema Matching and Query Rewriting in Ontology-based Data Integration Zdeňka Linková ICS AS CR Advisor: Július Štuller.
Chapter 6: Entity-Relationship Model (part I)
Chapter 10: Information Integration. Bing Liu, UIC ACL-07 2 Introduction At the end of last topic, we identified the problem of integrating extracted.
Data Design The futureERD - CardinalityCODINGRelationshipsDefinition.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 3 The Basic (Flat) Relational Model.
Advanced Topics COMP163: Database Management Systems University of the Pacific December 9, 2008.
Agenda for Week 1/31 & 2/2 Learn about database design
Modeling Data The Entity Relationship Model (ER) For Database Design.
©Silberschatz, Korth and Sudarshan2.1Database System Concepts Chapter 2: Entity-Relationship Model Entity Sets Relationship Sets Design Issues Mapping.
Data Modeling 1 Yong Choi School of Business CSUB.
BIS310: Week 7 BIS310: Structured Analysis and Design Data Modeling and Database Design.
CONCEPTS OF E-R MODEL. CONTENTS Entity Attributes Data Value Entity Types Types of Entity Types Relationships Relationship Constraints.
A survey of approaches to automatic schema matching Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001.
Content Resource- Elamsari and Navathe, Fundamentals of Database Management systems.
Web Mining ( 網路探勘 ) WM10 TLMXM1A Wed 8,9 (15:10-17:00) U705 Information Integration ( 資訊整合 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Module Title? Data Base Design 30/6/2007 Entity Relationship Diagrams (ERDs)
Database Systems: Design, Implementation, and Management Ninth Edition
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Concepts and Terminology Introduction to Database.
A SURVEY OF APPROACHES TO AUTOMATIC SCHEMA MATCHING Sushant Vemparala Gaurang Telang.
MIS 301 Information Systems in Organizations Dave Salisbury ( )
Extensible Markup Language (XML) Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879).ISO 8879 XML is a.
MIS 3053 Database Design & Applications The University of Tulsa Professor: Akhilesh Bajaj ER Model Lecture 1 © Akhilesh Bajaj, 2000, 2002, 2003, 2004.
1 The Relational Database Model. 2 Learning Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical.
©Silberschatz, Korth and Sudarshan2.1Database System Concepts Chapter 2: Entity-Relationship Model Entity Sets Relationship Sets Design Issues Mapping.
1 Chapter 1 Introduction. 2 Introduction n Definition A database management system (DBMS) is a general-purpose software system that facilitates the process.
1 Relational Databases and SQL. Learning Objectives Understand techniques to model complex accounting phenomena in an E-R diagram Develop E-R diagrams.
Chapter 4 Entity Relationship (ER) Modeling.  ER model forms the basis of an ER diagram  ERD represents conceptual database as viewed by end user 
Entity-Relationship (ER) Modelling ER modelling - Identify entities - Identify relationships - Construct ER diagram - Collect attributes for entities &
Entity-Relationship Model Using High-Level Conceptual Data Models for Database Design Entity Types, Sets, Attributes and Keys Relationship Types, Sets,
1 A Demo of Logical Database Design. 2 Aim of the demo To develop an understanding of the logical view of data and the importance of the relational model.
Chapter 13 Designing Databases Systems Analysis and Design Kendall & Kendall Sixth Edition.
A Survey of Approaches to Automatic Schema Matching (VLDB Journal, 2001) November 7, 2008 IDB SNU Presented by Kangpyo Lee.
Databases Illuminated Chapter 3 The Entity Relationship Model.
 XML: What and Why?  Application: Web Services  XQuery: The XML Query Language  Good News: XQuery, as a declarative language, is ideal for automatic.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 3 The Relational Database Model.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management
CSE314 Database Systems Lecture 3 The Relational Data Model and Relational Database Constraints Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
Metadata : an overview XML and Educational Metadata, SBU, London, 10 July 2001 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported.
1 DATABASE TECHNOLOGIES (Part 2) BUS Abdou Illia, Fall 2015 (September 9, 2015)
INTRODUCTION TO DATA MODELING CS 260 Database Systems.
UML Basics and XML Basics Navigating the ISO Standards.
1 ER Modeling BUAD/American University Mapping ER modeling to Relationships.
Working with XML. Markup Languages Text-based languages based on SGML Text-based languages based on SGML SGML = Standard Generalized Markup Language SGML.
Enable Semantic Interoperability for Decision Support and Risk Management Presented by Dr. David Li Key Contributors: Dr. Ruixin Yang and Dr. John Qu.
The relational model1 The relational model Mathematical basis for relational databases.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
Database vs File System Integrated Data Reduced Data Duplication Program/Data Independence Easier representation for user Separate/Isolated data Appl.
Databases Databases are collections of information; our study repeats a theme: Tell the computer the structure, and it can help you! © 2004, Lawrence Snyder.
1 The Relational Data Model David J. Stucki. Relational Model Concepts 2 Fundamental concept: the relation  The Relational Model represents an entire.
IT 5433 LM3 Relational Data Model. Learning Objectives: List the 5 properties of relations List the properties of a candidate key, primary key and foreign.
COP Introduction to Database Structures
Database Systems: Design, Implementation, and Management Tenth Edition
Logical Database Design and the Rational Model
Entity-Relationship Model
Relational Databases.
CIS 207 The Relational Database Model
Lecture 2 The Relational Model
Entity Relationship Diagrams
Chapter 4 Entity Relationship (ER) Modeling
ISC321 Database Systems I Chapter 10: Object and Object-Relational Databases: Concepts, Models, Languages, and Standards Spring 2015 Dr. Abdullah Almutairi.
Data Model.
Database Modeling using Entity Relationship Model (E-R Model)
Developing a Metadata Element Set and/or an Application Profile
Chapter 7: Entity-Relationship Model
INSTRUCTOR: MRS T.G. ZHOU
Presentation transcript:

XML Schema Integration Ray Dos Santos July 19, 2009

2 XML integration Fundamental problem: schema matching, which takes two (or more) schemas to produce a mapping between elements (or attributes) of the two (or more) schemas that correspond semantically to each other. Objective: find corresponding entities.

3 Some application domains for XML HealthCare Level Seven Geography Markup Language (GML) Systems Biology Markup Language (SBML) XBRL, the XML based Business Reporting standard Global Justice XML Data Model (GJXDM) ebXML e.g. Encoded Archival Description Application Digital photography metadata XMP An XML grammar for sensor data (SensorML) Real Simple Syndication (RSS 2.0)

4 Integrating two schemas Consider two schemas, S1 and S2, representing two customer relations, Cust and Customer. S1 S2 Cust Customer CNoCustID CompNameCompany FirstNameContact LastNamePhone

5 Integrating two schemas (contd) Represent the mapping with a similarity relation, , over the power sets of S1 and S2, where each pair in  represents one element of the mapping. E.g., Cust.CNo  Customer.CustID Cust.CompName  Customer.Company {Cust.FirstName, Cust.LastName}  Customer.Contact

6 Different types of matching Schema-level only matching: only schema information is considered. Domain and instance-level only matching: some instance data (data records) and possibly the domain of each attribute are used. This case is quite common on the Web. Integrated matching of schema, domain and instance data: Both schema and instance data (possibly domain information) are available.

7 Pre-processing for integration Tokenization: break an item into atomic words using a dictionary, e.g.,  Break “fromCity” into “from” and “city”  Break “first-name” into “first” and “name” Expansion: expand abbreviations and acronyms to their full words, e.g.,  From “dept” to “departure” Stopword removal and stemming Standardization of words: Irregular words are standardized to a single form, e.g.,  From “colour” to “color”

8 Schema-level matching Schema level matching relies on information such as name, description, data type, relationship type (e.g., part-of, is-a, etc), constraints, etc. Match cardinality:  1:1 match: one element in one schema matches one element of another schema.  1:m match: one element in one schema matches m elements of another schema.  m:n match: m elements in one schema matches n elements of another schema.

9 An example m:1 match is similar to 1:m match. m:n match is complex, and there is little work on it.

10 Linguistic approaches They are used to derive match candidates based on names, comments or descriptions of schema elements: Name match:  Equality of names  Synonyms  Equality of hypernyms: A is a hypernym of B is B is a kind-of A.  Common sub-strings  Cosine similarity  User-provided name match: usually a domain dependent match dictionary

11 Linguistic approaches (contd) Description match: in many files, there are comments to schema elements, e.g., Cosine similarity can be used to compare comments after stemming and stopword removal.

12 Constraint based approaches Constraints such as data types, value ranges, uniqueness, relationship types, etc. An equivalent or compatibility table for data types and keys can be provided. E.g.,  string  varchar, and (primiary key)  unique Note: On the Web, the constraint information is often not available, but some can be inferred based on the domain and instance data.

13 Domain and instance-level matching In many applications, some data instances or attribute domains may be available. Value characteristics are used in matching. Two different types of domains  Simple domain: each value in the domain has only a single component (the value cannot be decomposed).  Composite domain: each value in the domain contains more than one component.

14 Match of simple domains A simple domain can be of any type. If the data type information is not available (this is often the case on the Web), the instance values can often be used to infer types, e.g.,  Words may be considered as strings  Phone numbers can have a regular expression pattern. Data type patterns (in regular expressions) can be learned automatically or defined manually.  E.g., used to identify such types as integer, real, string, month, weekday, date, time, zip code, phone numbers, etc.

15 XML is different from databases Limited use of acronyms and abbreviations on the XML: but natural language words and phrases, for general public to understand.  Databases use acronyms and abbreviations extensively. Limited vocabulary: for easy understanding A large number of similar databases: a large number of sites offer the same services or selling the same products. Additional structures: the information is usually organized in some meaningful way. But the organization needs to be understood first.  Related attributes are together.  Hierarchical organization.

16 Instance-based matching via footprints  Assume a global schema is given and a set of instances are also given.  The method uses each instance value of every attribute to probe the underlying ontology to obtain the footprints.  These footprints are used to help with a best-faith matching estimate. It performs matches based on  Events  Relationships  Spatial characteristics  Attributes

17 Entity Rel Event Object Spatial -- name -- description -- shape -- size -- atts -- begin at -- stop at -- elapse -- start -- end -- touches -- overlaps -- contains -- spans -- within -- parent -- child -- type-of -- is-a -- part-of Washington DC Monuments Attached to the ground. Intelligently places itself at the height of the underlying terrain , ,0 Washington Monument Major monuments of the federal capital ,0

18 Entity Rel Event Objec t Spati al -- name -- description -- shape -- size -- atts -- begin at -- stop at -- elapse -- start -- end -- touches -- overlaps -- contains -- spans -- within -- parent -- child -- type-of -- is-a -- part-of FootPrint: 01ZZ Events: XX ZZ XX Rel: 01ZZ Spatial: XX Temporal, moving objects Relationships among objects Location

19 Next Steps  How to define footprints  Define rules to minimize footprint size and count  Order footprints such that the most appropriates are “looked at” first  Footprints for subtrees ?