XML Schema Integration Resources : Louise Lane & Kalpdrum Passi, Sanjay Madria and Mukesh Mohania - “A Model for XML Schema Integration”, and My Research.


XML Schema Integration Resources : Louise Lane & Kalpdrum Passi, Sanjay Madria and Mukesh Mohania - “A Model for XML Schema Integration”, and My Research in Fall, 2001 with Dr. Madria

Contents What is XML Data Integration Why business applications use XML What is XML Schema Different ways to integrate XML data XML Schema Integration XML Namespaces Phases in Schema Integration XML Schema Data Model Graphical representation of the model

Contents contd.. Conflicts resolution Integration phase Construction of Global schema Advantages Disadvantages Conclusion

What is XML XML is a markup language for documents containing structured information. A markup language is a mechanism to identify structures in a document. XML documents are self-describing, so XML provides a platform-independent means to describe data and can therefore transport data from one platform to another. XML documents can be created and used by applications.

Data Integration E-Commerce applications use data from different sources, and this data needs to be integrated. A mediated schema is created to represent a particular application domain, and the data sources are mapped as views over the mediated schema.

Why Business applications use XML Business applications need to exchange data with one another. The data should be independent of its representation and of the platform. XML is also used when two or more organizations merge. When organizations merge, interoperability among their documents is necessary, which can be achieved using XML integration.

XML Schema XML Schema is recommended by the W3C as the standard schema language for validating documents. XML Schema has greater expressive power than DTDs for data exchange and for integrating data from various sources.

Different ways to integrate XML data Integrating XML documents: mapping local schemas to a global/integrated schema if the global schema is known, or querying the data to obtain the required global schema. Integrating XML Schemas.

Extracting Schema from XML Documents Minimal spanning graphs can be extracted from different documents, and the schema can be constructed from these graphs. Heuristic rules are applied to the spanning graphs to construct the schema. The paper “Re-engineering Structures from Web Documents” by Chuang-Hue, Ee-Peng, and Wee-Keong deals with constructing a DTD schema for given XML documents.

Complexities in integrating XML Documents 1. Extract the schema from each document. 2. Integrate the schemas obtained, or map the individual schema documents to the global schema if the global schema is already present. 3. Parse the XML documents and integrate the data according to the global schema. Querying the XML documents can also be used to obtain the integrated document.

Tukwila Data Integration System The Tukwila data integration system uses a mediated schema to integrate data from different sources. The user poses a query over the mediated schema, and the system reformulates the query over the data sources and executes it. Tukwila uses a query reformulator and optimizer to query large amounts of data efficiently. The MiniCon algorithm is used to map the query from the mediated schema to the data sources. Tukwila also uses an x-scan operator that can query streaming XML data.

Tukwila x-scan operator Querying techniques such as XML-QL and XQL need the complete XML document to be downloaded before it can be queried.

Tukwila x-scan operator contd.. The x-scan operator matches regular path expression patterns from the query and returns results in a pipelined fashion as the data streams across the network.
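
The idea of matching a path while the document is still arriving can be illustrated with a minimal Python sketch. This is not Tukwila's x-scan implementation: it uses the standard library's incremental parser, matches only a fixed tag path rather than a full regular path expression, and the sample document and the name stream_matches are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def stream_matches(source, path):
    """Yield the text of each element whose tag path ends with `path`.

    Results are produced as soon as the matching element is closed,
    without waiting for the rest of the document.
    """
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack[-len(path):] == path:
                yield (elem.text or "").strip()
            stack.pop()
            elem.clear()  # drop content that is no longer needed

# Hypothetical sample data, loosely modelled on the gs_equipment documents.
SAMPLE = (
    b"<fleet>"
    b"<gs_equipment><serial_number>FRD-1</serial_number></gs_equipment>"
    b"<gs_equipment><serial_number>QJ-2</serial_number></gs_equipment>"
    b"</fleet>"
)

results = list(stream_matches(io.BytesIO(SAMPLE), ["gs_equipment", "serial_number"]))
print(results)
```

Because stream_matches is a generator, a consumer can start processing the first match before the parser has read the second gs_equipment element.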

XML Schema Integration The automated integration of XML schemas is beneficial to both traditional forms of integration: view integration and database integration. An integrated schema forms the basis for a valid query language over a particular set of XML documents. Because the schemas to be integrated currently validate a set of existing XML documents, data integrity and continued document delivery are chief concerns of the integration process.

XML Namespace XML Schema requires the use of namespaces to uniquely identify schema structures (elements, attributes, datatypes, etc.). The name of each structure is prefixed by a namespace prefix that identifies the namespace in which the structure is defined. A practical example of schema integration is the merger of two companies.
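
As a small illustration of how namespaces keep identically named structures apart, the following Python sketch parses a document in which two elements share a local name but belong to different namespaces. The namespace URIs and the helper split_qname are assumptions made for the example; they are not the URIs used in the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the prefixes a and b are arbitrary.
DOC = """<root xmlns:a="http://example.org/GSE1" xmlns:b="http://example.org/GSE2">
  <a:serial_number>1</a:serial_number>
  <b:serial_number>2</b:serial_number>
</root>"""

def split_qname(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

root = ET.fromstring(DOC)
qualified = [split_qname(child.tag) for child in root]
print(qualified)
```

Both children have the local name serial_number, yet they remain distinguishable because the parser resolves each prefix to its namespace URI.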

Documents and schemas of the companies that merge <gs_equipment xmlns=" xmlns:xsi=" Schema-instance" xsi:schemaLocation=" le.org GSE1.xsd"> Air to Ground FRD Vancouver 6A <gs_equipment xmlns=" xmlns:xsi=" instance" xsi:schemaLocation=" GSE2.xsd"> Winnipeg main Quick as a Jet GSE QJ-TT September

<schema xmlns:xsd=" targetNamespace=" elementFormDefault="qualified" xmlns:GSE1=" <element name="serial_number" type="xsd:string" minOccurs="1" maxOccurs="1" /> <element name="service_hours" type="xsd:integer" minOccurs="0" maxOccurs="1" > <schema xmlns:xsd=" targetNamespace=" elementFormDefault="qualified" xmlns:GSE2="

An object-oriented data model called XSDM (XML Schema Data Model) is defined. A three-layered architecture consisting of pre-integration, comparison and integration is used for the integration. A global schema must meet the following criteria: completeness, minimality and understandability. The optionality of elements is expanded to meet boundary restrictions.
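
One possible reading of "expanding optionality" is that the global schema takes the widest occurrence range allowed by either local schema, so that documents valid under either one stay valid. The Python sketch below shows that reading only; the function merge_occurrence and the use of infinity for an unbounded maxOccurs are assumptions, not part of the original model.

```python
UNBOUNDED = float("inf")  # stands in for maxOccurs="unbounded"

def merge_occurrence(a, b):
    """Return the (minOccurs, maxOccurs) range that covers both input ranges."""
    (min_a, max_a), (min_b, max_b) = a, b
    return (min(min_a, min_b), max(max_a, max_b))

# A required element merged with an optional one becomes optional.
merged = merge_occurrence((1, 1), (0, 1))
print(merged)
```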

Three Phases of integration Pre-Integration: In this phase, element, attribute and datatype definitions are extracted by parsing the actual schema document. Comparison: In this phase, the correspondences between elements and attributes are determined either by semantic learning or by human interaction. Integration: In this phase, conflicts between the corresponding elements and/or attributes, such as naming conflicts, datatype conflicts and structural conflicts, are resolved.
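
The pre-integration step can be sketched in Python as a pass over a schema document that collects its element declarations. This is an illustrative sketch rather than the authors' parser: the schema text is a reduced, hypothetical version of the GSE1 sample (the gs_equipment wrapper and its complexType are assumed), and extract_elements is a made-up helper name.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Reduced, hypothetical schema based on the serial_number and
# service_hours declarations shown for GSE1.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="gs_equipment">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="serial_number" type="xsd:string" minOccurs="1" maxOccurs="1"/>
        <xsd:element name="service_hours" type="xsd:integer" minOccurs="0" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

def extract_elements(xsd_text):
    """Collect name, type and occurrence bounds of every element declaration."""
    root = ET.fromstring(xsd_text)
    found = []
    for el in root.iter(f"{{{XSD_NS}}}element"):
        found.append({
            "name": el.get("name"),
            "type": el.get("type"),  # None when the type is declared inline
            "minOccurs": el.get("minOccurs", "1"),  # XML Schema default is 1
            "maxOccurs": el.get("maxOccurs", "1"),
        })
    return found

definitions = extract_elements(SCHEMA)
for d in definitions:
    print(d)
```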

XML Schema Data Model (XSDM) Four structures are defined – Node Object, Child Object, Datatype Object and Attribute Object. Node Object: Represents an element, which may be either non-terminal or terminal. Each node has a set of structures that define it – Name, Namespace, Attribute, Datatype, Substitution Group Name, Child List and Node Type, which is one of six types – terminal, sequence, choice, all, any or empty. Child Object: Represents an element that is part of a Child List. Each child has structures that define it – Name, Namespace, Max Occurrences and Min Occurrences.

XML Schema Data Model (XSDM) contd.. Datatype Object: Represents the datatype of elements and attributes. The structures that define it are Name, Variety (atomic, union, list), Kind (43 simple and derived datatypes) and Constraining Facets. Attribute Object: Represents an attribute associated with a non-terminal or terminal element. The structures that define an attribute are Name, Namespace, Use, Datatype and Value (default value).
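
The four XSDM structures map naturally onto plain record types. The Python sketch below is one way to write them down, assuming the field lists given on the slides; the class names, default values and the example namespace URI are illustrative choices, not the notation used in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Datatype:
    name: str
    variety: str = "atomic"  # atomic, union or list
    kind: str = "string"     # one of the simple or derived datatypes
    facets: dict = field(default_factory=dict)  # constraining facets

@dataclass
class Attribute:
    name: str
    namespace: str
    use: str = "optional"
    datatype: Optional[Datatype] = None
    value: Optional[str] = None  # default value

@dataclass
class Child:
    name: str
    namespace: str
    min_occurrences: int = 1
    max_occurrences: int = 1

@dataclass
class Node:
    name: str
    namespace: str
    node_type: str  # terminal, sequence, choice, all, any or empty
    attributes: List[Attribute] = field(default_factory=list)
    datatype: Optional[Datatype] = None
    substitution_group: Optional[str] = None
    child_list: List[Child] = field(default_factory=list)

NS = "http://example.org/GSE1"  # hypothetical namespace URI

equipment = Node(
    name="gs_equipment",
    namespace=NS,
    node_type="sequence",
    child_list=[
        Child("serial_number", NS, min_occurrences=1, max_occurrences=1),
        Child("service_hours", NS, min_occurrences=0, max_occurrences=1),
    ],
)
serial = Node("serial_number", NS, "terminal", datatype=Datatype("string"))
print(equipment)
```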

Graphical Representation of XML Schemas

Graphical representation of sample schema for GSE1

Graphical representation of sample schema for GSE2

Conflict Resolution Naming Conflicts: Synonym naming conflict: Different names but the same definition. Solved using substitution group names. Homonym naming conflict: Same name but different structure. Homonym conflicts at non-terminals are called structural conflicts, and those at terminals are called datatype conflicts.
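
The naming-conflict categories above can be expressed as a small decision function. The Python sketch below assumes each element is summarised as a dictionary with a name, a terminal flag and a definition, and it approximates "same definition" by simple equality; in the method described here that judgement comes from the comparison phase. The element name serial_no is made up for the example.

```python
def classify_naming_conflict(a, b):
    """Classify the naming conflict between two corresponding elements, if any."""
    same_name = a["name"] == b["name"]
    same_definition = a["definition"] == b["definition"]
    if not same_name and same_definition:
        return "synonym"
    if same_name and not same_definition:
        # Homonyms at terminals are datatype conflicts; otherwise structural.
        if a["terminal"] and b["terminal"]:
            return "homonym (datatype conflict)"
        return "homonym (structural conflict)"
    return None  # no naming conflict

gse1_serial = {"name": "serial_number", "terminal": True, "definition": "xsd:string"}
gse2_serial = {"name": "serial_no", "terminal": True, "definition": "xsd:string"}
gse1_hours = {"name": "service_hours", "terminal": True, "definition": "xsd:integer"}
gse2_hours = {"name": "service_hours", "terminal": True, "definition": "xsd:string"}

print(classify_naming_conflict(gse1_serial, gse2_serial))
print(classify_naming_conflict(gse1_hours, gse2_hours))
```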

Conflict Resolution contd.. Datatype & scale differences: Disjoint or incompatible datatypes – union, e.g. string and integer. Compatible datatypes – scale adjustment, e.g. integer and float. Enumerated datatypes – take the set of all the enumerations, e.g. {a,b}, {b,c} => {a,b,c}. Scale differences – constraining facet redefinition.
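
The datatype rules above can be sketched as a single resolution function in Python. The sketch makes several assumptions: datatypes are represented as strings, enumerations as sets, a union as a tagged tuple, and "scale adjustment" is read as widening to the more general numeric type. None of these representations comes from the original model.

```python
NUMERIC_ORDER = ["integer", "float"]  # narrower type first (assumed ordering)

def resolve_datatype(a, b):
    """Pick a global datatype that accepts values of both local datatypes."""
    if a == b:
        return a
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        return a | b  # enumerations: set of all enumerated values
    if a in NUMERIC_ORDER and b in NUMERIC_ORDER:
        return max(a, b, key=NUMERIC_ORDER.index)  # compatible: widen
    return ("union", a, b)  # disjoint or incompatible: union datatype

print(resolve_datatype({"a", "b"}, {"b", "c"}))
print(resolve_datatype("integer", "float"))
print(resolve_datatype("string", "integer"))
```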

Conflict Resolution contd.. Structural Conflicts: Type conflicts: Terminal in one schema and non-terminal in the other schema – add both to the global schema. Key conflicts: If both schemas have their own keys, then the global schema’s key should be a composite of both keys. If an element is declared as a key in one schema and as a non-key in the other schema, complete knowledge of the data present in the documents is required. If the same element is declared as a key in both schemas, a prefix can be added to the keys to make the key elements globally unique.

Integration phase 1. Constructing the correspondences table 2. Constructing the dependencies table The correspondences table contains the information about the corresponding elements/attributes. An entry in the dependencies table denotes the dependency of an element on other elements/attributes. Elements/attributes are integrated only after their dependencies are integrated.
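
The rule that an element is integrated only after its dependencies amounts to processing the dependencies table in topological order. The Python sketch below shows this with the standard library's graphlib module (Python 3.9 or later); the table contents and the function name integration_order are illustrative, not taken from the original slides.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies table: each element maps to the
# elements/attributes that must be integrated before it.
dependencies = {
    "gs_equipment": {"serial_number", "service_hours"},
    "serial_number": set(),
    "service_hours": set(),
}

def integration_order(table):
    """Return an order in which every entry follows all of its dependencies."""
    return list(TopologicalSorter(table).static_order())

order = integration_order(dependencies)
print(order)
```

The relative order of serial_number and service_hours is not fixed, since neither depends on the other; only gs_equipment is guaranteed to come last.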

Graphical representation of Global schema obtained

Construction of the Global schema Document Once the integration process is completed, the global schema in XSDM notation is used to construct the global XML schema document. Constructing the XML schema document is a straightforward process because all the data about the schema is present in the XSDM notation.
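
To show why this last step is mechanical, the Python sketch below serialises a very small in-memory description into an XML Schema document. It is a simplification under stated assumptions: the input is a root name plus a list of (name, type, minOccurs, maxOccurs) tuples rather than full XSDM objects, only a sequence content model is produced, and build_schema is a made-up helper name.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xsd", XSD_NS)  # serialise with the xsd prefix

def build_schema(root_name, children):
    """Build a schema with one root element holding a sequence of children."""
    schema = ET.Element(f"{{{XSD_NS}}}schema", {"elementFormDefault": "qualified"})
    root_el = ET.SubElement(schema, f"{{{XSD_NS}}}element", {"name": root_name})
    complex_type = ET.SubElement(root_el, f"{{{XSD_NS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XSD_NS}}}sequence")
    for name, type_name, min_occurs, max_occurs in children:
        ET.SubElement(sequence, f"{{{XSD_NS}}}element", {
            "name": name,
            "type": type_name,
            "minOccurs": str(min_occurs),
            "maxOccurs": str(max_occurs),
        })
    return ET.tostring(schema, encoding="unicode")

schema_text = build_schema("gs_equipment", [
    ("serial_number", "xsd:string", 1, 1),
    ("service_hours", "xsd:integer", 0, 1),
])
print(schema_text)
```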

Global schema document <schema xmlns:xsd=" targetNamespace=" elementFormDefault="qualified" xmlns:GSEM=" xmlns:GSE2=" >

Global schema document Contd..

Advantages This method is useful when a required global schema is not present. The global XML schema obtained is complete, minimal and understandable. Human interaction is required only to a limited extent. Even when the local schemas are large and complex, the global schema can be obtained efficiently.

Disadvantages User interaction is required; the task cannot be done using semantic learning alone. The method does not resolve all key conflicts, because resolving them requires complete knowledge of the data. The method has no cross-check on the user’s input. The process may result in a non-minimal schema if the user does not recognize all the correspondences.

Conclusion This method successfully integrates schema documents, and it can be implemented.