Main challenges in XML/Relational mapping Juha Sallinen Hannes Tolvanen.


Agenda
- Introduction: XML and databases
- Objectives of the study
- Findings
- Conclusions

Introduction: XML and databases

Basic definitions
- XML/relational mapping means transforming data between the XML and relational data models
- A mapping method is the way the mapping is done

Native vs. relational
- Why store XML documents in a relational database rather than in a native XML database?
  - Current native XML database technology is immature
  - It is an emerging technology with no "de facto" standard
  - Well-working relational databases are already in use
    - Efficient and usable
    - May have been in use for years

Mapping dilemma
- The XML data model supports much more flexible data structures than the relational model
- Two fundamental differences:
  - XML tags
  - Nested structure of XML elements vs. flat structure of relational tables
- If an XML document does not originate from a relational data source, the data may not fit a relational schema very well

Dichotomy of mapping methods
- There are two fundamentally different techniques for storing XML documents in a relational database:
  - LOB presentation
  - Composed presentation

LOB presentation
- LOB stands for Large Object
- One XML document is put into a single column of a relational table
- At least one additional column is needed for indexing
- Does not take full advantage of a classical relational database (one without XML extensions)
  - SQL cannot be used to query individual XML elements
- Not a very interesting choice!
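
A minimal sketch of LOB presentation, assuming Python's built-in sqlite3 module stands in for the relational database. The table name xml_docs, its column names, and the sample document are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample document; element names are illustrative only.
DOC = "<addressbook><person><name>person1</name></person></addressbook>"

conn = sqlite3.connect(":memory:")
# One column holds the whole document; doc_id is the extra column used for lookup.
conn.execute("CREATE TABLE xml_docs (doc_id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("INSERT INTO xml_docs (doc_id, body) VALUES (?, ?)", (1, DOC))

# Retrieval by key returns the document exactly as it was stored.
(stored,) = conn.execute(
    "SELECT body FROM xml_docs WHERE doc_id = ?", (1,)
).fetchone()

# Plain SQL sees only an opaque string: it can pattern-match text,
# but it has no notion of elements or nesting.
hits = conn.execute(
    "SELECT doc_id FROM xml_docs WHERE body LIKE ?", ("%person1%",)
).fetchall()
```

The LIKE query matches the text "person1" wherever it occurs in the document, which is why this approach cannot answer element-level queries without XML extensions.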

Composed presentation
- The data structure of an XML document is "shredded" over one or more tables
- Example: different elements go to different columns
- There are multiple ways to do this
  - Table-based and object-relational mapping are introduced later
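
Shredding can be sketched as follows, assuming Python's xml.etree.ElementTree and sqlite3 modules. The element names name and address and the table layout are assumptions made for illustration; only addressbook and person are named in the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical markup around the text values used in the slides.
DOC = """<addressbook>
  <person>
    <name>eddy example</name>
    <address>mannerheimintie 10, helsinki</address>
  </person>
</addressbook>"""

conn = sqlite3.connect(":memory:")
# Each child element of person becomes one column of the person table.
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)

root = ET.fromstring(DOC)
for person in root.findall("person"):
    conn.execute(
        "INSERT INTO person (name, address) VALUES (?, ?)",
        (person.findtext("name"), person.findtext("address")),
    )

rows = conn.execute("SELECT name, address FROM person ORDER BY person_id").fetchall()
```

Once the values sit in ordinary columns, they can be filtered, joined and indexed with plain SQL.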

Objectives of the study

- Find and explain the main issues to consider when converting an XML schema to a relational schema
  - In other words, the main challenges that have to be taken into account by
    - designers of XML/relational mapping methods
    - users who need to map the data explicitly
- Find and briefly describe two general mapping methods based on composed presentation

Findings

Issues to consider in mapping
- Some of the most essential data characteristics
  - Existence of a schema definition document
  - Stability of the schema
  - Degree of structure
- Usage model for data
  - Queries against the database
  - Requirement of preserving "hidden" information
- DBMS implementation
  - Not covered by the study, because the scope was limited to the classical relational model

Data characteristics: Existence of XML schema definition
- A schema definition states how the structure of XML documents conforming to the schema is restricted
  - XSD (XML Schema Definition) and DTD (Document Type Definition) are currently the dominant standards for defining an XML schema
- If we have the schema definition, the conversion to a relational schema is based on it
- If we do not have the schema definition, we have to guess how the structure of the given XML vocabulary is restricted
  - The guesses are based on instances of the vocabulary (XML documents); in other words, we extract the schema from the available data
  - This is not unproblematic, as the next example shows

Data characteristics: Existence of XML schema definition 2 - Example
- Illustration of the problem of extracting the schema from data:
  eddy example
  mannerheimintie 10, helsinki
- We might deduce from the document that we wish to restrict the schema to

Data characteristics: Existence of XML schema definition 2 - Example continued
- But if the following document is received from the data source, we have to either extend our relational schema, dismiss the data that the relational schema does not support (the summer cottage's address), or combine the two fields:
  person2
  jämeräntaival 10, espoo
  hiekkatie 7, oulu
- We can alter the database schema by adding an extra column to the table mapped from the addressbook element to support the new information
  - This solution cannot be applied, however, unless we know that the relation between person and summercottage is 1:1. We might get documents containing persons with many summer cottage addresses, and we would again have to alter the database schema: we would have to create a property table for the addresses.
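
The guessing step can be sketched in Python as counting how often each child element occurs under one parent across the documents seen so far. The functions max_child_counts and storage_plan, the element names name and address, and the third sample document are hypothetical; the slides only give the text values and the names addressbook, person and summercottage.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def max_child_counts(documents, parent_tag):
    """Return {child_tag: max occurrences under any single parent_tag element}."""
    maxima = defaultdict(int)
    for text in documents:
        root = ET.fromstring(text)
        for parent in root.iter(parent_tag):
            counts = defaultdict(int)
            for child in parent:
                counts[child.tag] += 1
            for tag, n in counts.items():
                maxima[tag] = max(maxima[tag], n)
    return dict(maxima)

def storage_plan(maxima):
    """Children seen at most once fit a column; repeated children need a property table."""
    return {tag: ("column" if n <= 1 else "property table") for tag, n in maxima.items()}

# Assumed markup around the values from the slides.
DOC1 = ("<addressbook><person><name>eddy example</name>"
        "<address>mannerheimintie 10, helsinki</address></person></addressbook>")
DOC2 = ("<addressbook><person><name>person2</name>"
        "<address>jämeräntaival 10, espoo</address>"
        "<summercottage>hiekkatie 7, oulu</summercottage></person></addressbook>")
# Invented third document: a person with two summer cottages.
DOC3 = ("<addressbook><person><name>person3</name>"
        "<summercottage>cottage a</summercottage>"
        "<summercottage>cottage b</summercottage></person></addressbook>")

plan_after_first = storage_plan(max_child_counts([DOC1], "person"))
plan_after_all = storage_plan(max_child_counts([DOC1, DOC2, DOC3], "person"))
```

The plan reflects only the documents seen so far: after the first document there is no summercottage entry at all, and only the third document reveals that a single column would not be enough.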

Evolving schema
- If the schema of the XML vocabulary is defined but changes over time, the corresponding changes must be made to the relational schema
- Changes to the relational schema are not always as easy to make as in the previous example (if the composed approach is used)
- The likelihood of schema changes should be evaluated

Degree of structure of the XML schema
- Categorization used in the study:
  1. Structured data
     - Data is totally independent of the presentation used to describe it
     - The document can be navigated without examining it first
  2. Semi-structured data
     - Some blocks of the document may contain optional parts
  3. Marked-up text
     - Documents require the preservation of "hidden" information
     - E.g. HTML documents
- These terms have different meanings in the literature. The information on the following slide is based on the definitions on this slide.

Degree of structure of the XML schema
- Structured documents can easily be mapped to a database using composed presentation
- Semi-structured documents can also be decomposed if a schema definition is provided. If mixed content is included, it depends on how the data is used whether LOB presentation is better for the mixed-content block than further fragmentation.
- The requirement of marked-up text to preserve "hidden" information is discussed later

Storing mixed content in relations
- Mixed content: document elements embedded in character data
  - E.g. example here you have a short example
- Designing a relational schema to store mixed content
  - If there are blocks in the content that make sense only as a whole, decomposing those blocks makes no sense
  - If we have strong arguments for decomposing a block containing mixed content, one possible decomposition method is to create one table for the root element, one property table for character data, and one property table for every element that appears in the content

Mixed content mapping example
- DTD
- Example instance: Here we have a nice example !
- Relational schema
  - A(a_pk)
  - B(a_fk, b, bOrder)
  - C(a_fk, c, cOrder)
  - PCDATA(a_fk, pcdata, pcdataOrder)
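
The relational schema above can be exercised with a small Python sketch. The transcript does not show the markup of the example instance, so the placement of the B and C tags below is an assumption, as is the choice to number all pieces of one A element with a single shared position counter stored in the three Order columns.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed markup for the example instance.
DOC = "<A>Here we have <B>a nice</B> <C>example</C>!</A>"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a_pk INTEGER PRIMARY KEY);
    CREATE TABLE B (a_fk INTEGER REFERENCES A(a_pk), b TEXT, bOrder INTEGER);
    CREATE TABLE C (a_fk INTEGER REFERENCES A(a_pk), c TEXT, cOrder INTEGER);
    CREATE TABLE PCDATA (a_fk INTEGER REFERENCES A(a_pk), pcdata TEXT, pcdataOrder INTEGER);
""")

INSERT_FOR_TAG = {
    "B": "INSERT INTO B VALUES (?, ?, ?)",
    "C": "INSERT INTO C VALUES (?, ?, ?)",
}

def shred(conn, xml_text):
    """Store one A element; every piece of content gets the next position number."""
    a = ET.fromstring(xml_text)
    a_pk = conn.execute("INSERT INTO A DEFAULT VALUES").lastrowid
    position = 0

    def add_text(text):
        nonlocal position
        if text:  # skip missing or empty character data
            conn.execute("INSERT INTO PCDATA VALUES (?, ?, ?)", (a_pk, text, position))
            position += 1

    add_text(a.text)              # character data before the first child
    for child in a:
        conn.execute(INSERT_FOR_TAG[child.tag], (a_pk, child.text, position))
        position += 1
        add_text(child.tail)      # character data following this child
    return a_pk

def rebuild(conn, a_pk):
    """Reassemble the element from the three property tables in position order.
    No XML escaping is done; this is enough for the sample only."""
    rows = conn.execute("""
        SELECT bOrder AS pos, '<B>' || b || '</B>' AS piece FROM B WHERE a_fk = :pk
        UNION ALL
        SELECT cOrder, '<C>' || c || '</C>' FROM C WHERE a_fk = :pk
        UNION ALL
        SELECT pcdataOrder, pcdata FROM PCDATA WHERE a_fk = :pk
        ORDER BY pos
    """, {"pk": a_pk}).fetchall()
    return "<A>" + "".join(piece for _, piece in rows) + "</A>"

a_pk = shred(conn, DOC)
rebuilt = rebuild(conn, a_pk)
```

Without the order columns the pieces could not be put back in sequence, which is why each property table carries one.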

Usage models for data: Type of queries executed against the database
- The spectrum of queries
  - Queries that retrieve XML documents
  - Queries that retrieve fragments of XML documents
  - Queries that make transformations on XML data
  - And even more complex queries...

Query examples 1
- Sample documents
  person1 jämeräntaival 10 espoo hiekkatie 7, oulu
  person2 smt 10 espoo hiekkatie 7, oulu
- Query emitting an XML fragment: select the names of persons who live in Espoo
  person1 person2

Query examples 2
- Query making a transformation: "select the number of persons living in Espoo"
  2
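
Both example queries can be sketched against a shredded table in Python with sqlite3. The person table and its columns (name, street, city, summercottage) and the result and name wrapper elements are assumptions, since the transcript keeps only the text values of the sample documents.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical shredded form of the two sample documents.
conn.execute("CREATE TABLE person (name TEXT, street TEXT, city TEXT, summercottage TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
    ("person1", "jämeräntaival 10", "espoo", "hiekkatie 7, oulu"),
    ("person2", "smt 10", "espoo", "hiekkatie 7, oulu"),
])

# Query emitting an XML fragment: names of persons who live in Espoo.
names = [row[0] for row in conn.execute(
    "SELECT name FROM person WHERE city = ? ORDER BY name", ("espoo",)
)]
result = ET.Element("result")
for n in names:
    ET.SubElement(result, "name").text = n
fragment = ET.tostring(result, encoding="unicode")

# Query making a transformation: number of persons living in Espoo.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM person WHERE city = ?", ("espoo",)
).fetchone()
```

The relational part of each query is ordinary SQL; turning the rows back into an XML fragment is a separate step on top of the query result.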

Preservation of "hidden" information
- An XML document contains "hidden" information that relates to the presentation of the data, not the data itself
  - Order of elements
  - Comments
  - Whitespace
- It might be required that the original XML documents can be retrieved
  - Trivial when LOB presentation is used
  - If composed presentation is used, all "hidden" information needs to be stored in relations
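
A small Python illustration of how "hidden" information is lost when a document goes through a typical parse step before being decomposed. It assumes xml.etree.ElementTree (Python 3.8 or later for the insert_comments option); the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Invented document with a comment and inter-element whitespace.
ORIGINAL = "<note><!-- reviewed --><to>a</to>  <from>b</from></note>"

# The default parser skips comments, so a parse/serialize round trip
# no longer reproduces the original text.
roundtrip = ET.tostring(ET.fromstring(ORIGINAL), encoding="unicode")

# Comments survive only if the parser is asked to keep them, and a
# decomposition would then also have to store them in some relation.
keeping_parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.tostring(ET.fromstring(ORIGINAL, parser=keeping_parser), encoding="unicode")
```

With LOB presentation none of this matters, because the stored string is returned as it was received.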

Table-based mapping
Listing 1. Required structure of XML document in table-based mapping (Bourret, 2001).
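
Listing 1 itself is not reproduced in this transcript. The Python sketch below assumes the document shape commonly described for table-based mapping: a root element whose children are tables, each holding row elements whose children are column values. The element names database, persons, row, name and city and the function load_table_based are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed document shape: root -> table -> row -> column values.
DOC = """<database>
  <persons>
    <row><name>person1</name><city>espoo</city></row>
    <row><name>person2</name><city>espoo</city></row>
  </persons>
</database>"""

def load_table_based(conn, xml_text):
    """Create one SQL table per table element and one SQL row per row element."""
    root = ET.fromstring(xml_text)
    for table in root:
        rows = table.findall("row")
        if not rows:
            continue
        # Column names are taken from the first row; tags are trusted here,
        # which is acceptable only for a sketch.
        columns = [col.tag for col in rows[0]]
        col_list = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table.tag}" ({col_list})')
        marks = ", ".join("?" for _ in columns)
        for row in rows:
            values = [row.findtext(c) for c in columns]
            conn.execute(f'INSERT INTO "{table.tag}" VALUES ({marks})', values)

conn = sqlite3.connect(":memory:")
load_table_based(conn, DOC)
loaded = conn.execute("SELECT name, city FROM persons ORDER BY rowid").fetchall()
```

The mapping is mechanical because the document already mirrors the table layout; documents with any other shape cannot be loaded this way.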

Object-Relational mapping
- A mapping method for any XML document that has a schema definition
- The idea is to convert the schema of the document to an object schema, and then convert the object schema to a relational schema
- The object/relational conversion step is predefined, but the XML/object conversion leaves some freedom to define the object view that is mapped from the XML data
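
The two steps can be sketched in Python with dataclasses playing the role of the object schema. The classes Person and Address, the element names, and the rule used in the second step (one table per class, scalar fields as columns, a contained list as a child table with a foreign key) are assumptions for illustration, not the method of any particular tool.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hypothetical object view of the data.
@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    addresses: List[Address] = field(default_factory=list)

DOC = """<addressbook>
  <person>
    <name>person1</name>
    <address><street>jämeräntaival 10</street><city>espoo</city></address>
    <address><street>hiekkatie 7</street><city>oulu</city></address>
  </person>
</addressbook>"""

def xml_to_objects(xml_text):
    """Step 1 (XML/object): build the object view from the document."""
    root = ET.fromstring(xml_text)
    return [
        Person(
            p.findtext("name"),
            [Address(a.findtext("street"), a.findtext("city")) for a in p.findall("address")],
        )
        for p in root.findall("person")
    ]

def objects_to_relations(conn, persons):
    """Step 2 (object/relational): class -> table, field -> column, list -> child table."""
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE address (
            person_fk INTEGER REFERENCES person(person_id), street TEXT, city TEXT);
    """)
    for p in persons:
        pid = conn.execute("INSERT INTO person (name) VALUES (?)", (p.name,)).lastrowid
        conn.executemany(
            "INSERT INTO address VALUES (?, ?, ?)",
            [(pid, a.street, a.city) for a in p.addresses],
        )

conn = sqlite3.connect(":memory:")
people = xml_to_objects(DOC)
objects_to_relations(conn, people)
joined = conn.execute(
    "SELECT p.name, a.street, a.city FROM person p "
    "JOIN address a ON a.person_fk = p.person_id ORDER BY a.rowid"
).fetchall()
```

The freedom mentioned above sits in the first step: the same document could just as well be viewed as a Person with a single address string, which would lead to a different relational schema.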

Conclusions

- Choosing among the possible relational representations for XML data involves many issues that must be considered
- Some of the issues limit the choice to LOB presentation (no schema, rapidly evolving schema, queries that only retrieve original documents)
- LOB presentation can also be used for storing blocks of the document that are not referenced from elsewhere
- The usual reason why decomposition is generally preferred, where possible, is the performance gain. The data also becomes more accessible to applications that use the database but do not publish any views of the data in XML.