Management of XML Documents in Object-Relational Databases
Thomas Kudrass, Matthias Conrad
HTWK Leipzig
EDBT Workshop XML-Based Data Management, Prague, 24 March 2002

© T. Kudrass, HTWK Leipzig
Overview
Motivation
Object-Relational Database Concepts
Parsing XML Documents
XML-to-ORDB Mapping
Meta-Data
Special Issues
Conclusions

Motivation
Storing XML documents in a DBMS
Using existing database technology
Dealing with complex objects:
  – XML documents are complex objects
  – avoid any decomposition
  – object-relational database technology is a good choice for representing complex objects

User-Defined Types in ORDB
Complex data types
  – object type
  – collection type
Object references
Object views

Example: Object Types
CREATE TYPE Type_Professor AS OBJECT (
  PName   VARCHAR(80),
  Subject VARCHAR(120));
Object-valued attribute:
CREATE TYPE Type_Course AS OBJECT (
  Name      VARCHAR(100),
  Professor Type_Professor);
Object table:
CREATE TABLE TabProfessor OF Type_Professor;

Example: Collection Types
CREATE TYPE Type_Professor AS OBJECT (
  PName   VARCHAR(80),
  Subject VARCHAR(120));
Array:
CREATE TYPE TypeVa_Professor AS VARRAY(5) OF Type_Professor;
Nested table:
CREATE TYPE Type_TabProfessor AS TABLE OF Type_Professor;
CREATE TABLE TabDept (
  DName     VARCHAR(80),
  Professor Type_TabProfessor)
  NESTED TABLE Professor STORE AS TabProfessor_List;

Example: Object References
CREATE TYPE Type_Professor AS OBJECT (
  PName VARCHAR(80),
  Dept  VARCHAR(120));
CREATE TABLE TabProfessor OF Type_Professor;
CREATE TYPE Type_Course AS OBJECT (
  Name     VARCHAR(200),
  Prof_Ref REF Type_Professor);
CREATE TABLE TabCourse OF Type_Course;
Prof_Ref: reference to objects of the object table TabProfessor

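Parsing DTD and XML
[Diagram: the XML document passes through the Oracle XML V2 parser (well-formedness and validity check) and yields an XML DOM tree. The DTD passes through the DTD parser (syntax check) and yields a DTD DOM tree. XML2Oracle uses both trees for the schema definition and accesses the Oracle DBMS via JDBC/ODBC.]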
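Here is a minimal Python sketch of the parsing step. It is not the authors' XML2Oracle tool. It uses the standard library's xml.dom.minidom in place of the Oracle XML V2 parser, so it checks only well-formedness, because minidom does not validate against a DTD. It then walks the resulting DOM tree. The helper names parse_document and element_paths are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_document(text):
    """Return a DOM tree if `text` is well-formed XML, otherwise None.

    Only well-formedness is checked; minidom performs no DTD validation.
    """
    try:
        return minidom.parseString(text)
    except ExpatError:
        return None


def element_paths(node, prefix=""):
    """Walk a DOM tree and list the path of every element in document order."""
    paths = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            path = f"{prefix}/{child.tagName}"
            paths.append(path)
            paths.extend(element_paths(child, path))
    return paths
```

For example, parse_document returns None for "<University><Student></University>" because the Student element is never closed.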


Object-Based Mapping
DTD → classes → tables
CLASS A {
  STRING b;
  C c;
}
CLASS C {
  STRING d;
}
CREATE TABLE A (
  a_pk INTEGER NOT NULL,
  b    VARCHAR(30) NOT NULL);
CREATE TABLE C (
  c_pk INTEGER NOT NULL,
  a_fk INTEGER NOT NULL,
  d    VARCHAR(10) NOT NULL);
Modification of the mapping algorithm [Bourret]:
  – no class definitions
  – use objects of the DTD tree

Step 1
Each complex element → table
Each set-valued element → table
Primary key in each table
CREATE TABLE TabUniversity (IDUniversity ...
CREATE TABLE TabStudent (IDStudent ...
CREATE TABLE TabCourse (IDCourse ...
CREATE TABLE TabProfessor (IDProfessor ...
CREATE TABLE TabSubject (IDSubject ...

Step 2
Other elements & attributes → table columns
CREATE TABLE TabUniversity (IDUniversity, attrStudyCourse, ...
CREATE TABLE TabStudent (IDStudent, attrStudNr, attrLName, attrFName, ...
CREATE TABLE TblMatrikelNr (IDMatrikelNr, attrMNummer, ...
CREATE TABLE TabCourse (IDCourse, attrName, attrCreditPts, ...
CREATE TABLE TabProfessor (IDProfessor, attrPName, attrDept, ...
CREATE TABLE TabSubject (IDSubject, attrSubject, ...

Step 3
Relationships between elements → foreign keys
CREATE TABLE TabUniversity (
  IDUniversity    INTEGER NOT NULL,
  attrStudyCourse VARCHAR(4000) NOT NULL,
  PRIMARY KEY (IDUniversity));
CREATE TABLE TabStudent (
  IDStudent    INTEGER NOT NULL,
  IDUniversity INTEGER NOT NULL,
  attrStudNr   VARCHAR(4000) NOT NULL,
  attrLName    VARCHAR(4000) NOT NULL,
  attrFName    VARCHAR(4000) NOT NULL,
  PRIMARY KEY (IDStudent),
  CONSTRAINT conMatrikel FOREIGN KEY (IDUniversity)
    REFERENCES TabUniversity (IDUniversity));
...
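Steps 1 to 3 can be sketched in Python. The sketch assumes a hypothetical, already-parsed DTD tree (DtdNode) with these simplifications:
- every #PCDATA element and XML attribute is a leaf;
- every column is VARCHAR(4000) NOT NULL, as on the slides;
- occurrence indicators are reduced to a set_valued flag.

It illustrates the rules and is not the authors' generator. Optional elements (?) and attribute defaults are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DtdNode:
    """Hypothetical simplified DTD element; children are its subelements in order."""
    name: str
    set_valued: bool = False  # occurrence indicator + or * in the DTD
    children: list[DtdNode] = field(default_factory=list)

    @property
    def needs_table(self) -> bool:
        # Step 1: complex and set-valued elements become tables.
        return bool(self.children) or self.set_valued


def generate_tables(node: DtdNode, parent: DtdNode | None = None) -> list[str]:
    """Return CREATE TABLE statements for `node` and its subtree (depth-first)."""
    if not node.needs_table:
        return []
    cols = [f"ID{node.name} INTEGER NOT NULL"]  # Step 1: primary key column
    if parent is not None:
        cols.append(f"ID{parent.name} INTEGER NOT NULL")  # Step 3: link to parent
    if not node.children:
        # A set-valued text element stores its own content.
        cols.append(f"attr{node.name} VARCHAR(4000) NOT NULL")
    for child in node.children:
        if not child.needs_table:
            # Step 2: remaining simple elements become columns.
            cols.append(f"attr{child.name} VARCHAR(4000) NOT NULL")
    cols.append(f"PRIMARY KEY (ID{node.name})")
    if parent is not None:
        cols.append(f"FOREIGN KEY (ID{parent.name}) "
                    f"REFERENCES Tab{parent.name} (ID{parent.name})")
    ddl = [f"CREATE TABLE Tab{node.name} (\n  " + ",\n  ".join(cols) + ");"]
    for child in node.children:
        ddl.extend(generate_tables(child, node))
    return ddl
```

Applied to a University tree with set-valued Student, Course and Subject elements, it yields TabUniversity, TabStudent, TabCourse, TabProfessor and TabSubject in that order. Each child table carries its parent's ID as a foreign key.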

ORDBS Oracle and XML
Basic idea:
  – generate an object-relational schema from the DTD
  – natural representation of an XML document by combining user-defined types
Different mapping rules:
  – simple elements
  – complex elements
  – set-valued elements
  – complex set-valued elements

XML Attributes & Simple Elements
Elements of #PCDATA type and XML attributes → attributes of the object type
Domain of simple elements:
  – no type information in the DTD: numeric vs. alphanumeric? length?
  – restrictions of the DBMS (e.g. an Oracle VARCHAR column holds at most 4000 bytes)
Mapping of an XML attribute of a simple element → definition of an object type for both the attribute and the element

XML Attributes & Simple Elements (Example)
CREATE TYPE Type_Dept AS OBJECT (
  attrDept     VARCHAR(4000),
  attrDAddress VARCHAR(4000));
CREATE TYPE Type_Professor AS OBJECT (
  attrPAddress VARCHAR(4000),
  attrPName    VARCHAR(4000),
  attrSubject  VARCHAR(4000),
  attrDept     Type_Dept);
CREATE TABLE TabProfessor OF Type_Professor;
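The attribute rule can be sketched as a small Python generator. It is a hypothetical helper, not part of the described tool. It assumes that every #PCDATA element and XML attribute maps to VARCHAR(4000). It also assumes the caller already knows which simple subelements carry XML attributes.

```python
def object_types(element, xml_attributes, simple_children):
    """Map one element to Oracle object types (hypothetical helper).

    xml_attributes:  names of the XML attributes of `element`
    simple_children: #PCDATA subelement name -> names of *its* XML attributes
    Returns the DDL statements in creation order.
    """
    ddl = []
    fields = [f"attr{a} VARCHAR(4000)" for a in xml_attributes]
    for child, child_attrs in simple_children.items():
        if child_attrs:
            # A simple element carrying XML attributes gets its own object type.
            inner = [f"attr{child} VARCHAR(4000)"]
            inner += [f"attr{a} VARCHAR(4000)" for a in child_attrs]
            ddl.append(f"CREATE TYPE Type_{child} AS OBJECT ({', '.join(inner)});")
            fields.append(f"attr{child} Type_{child}")
        else:
            fields.append(f"attr{child} VARCHAR(4000)")
    ddl.append(f"CREATE TYPE Type_{element} AS OBJECT ({', '.join(fields)});")
    ddl.append(f"CREATE TABLE Tab{element} OF Type_{element};")
    return ddl
```

Consider the call object_types("Professor", ["PAddress"], {"PName": [], "Subject": [], "Dept": ["DAddress"]}). It returns Type_Dept, Type_Professor and the TabProfessor table in creation order, with the same attributes as the Professor example.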

Complex Elements
Nesting of elements by composite DB object types
CREATE TYPE Type_Professor AS OBJECT (
  attrPName   VARCHAR(4000),
  attrSubject VARCHAR(4000),
  attrDept    VARCHAR(4000));
CREATE TYPE Type_Course AS OBJECT (
  attrName      VARCHAR(4000),
  attrProfessor Type_Professor,
  attrCreditPts VARCHAR(4000));
CREATE TYPE Type_Student AS OBJECT (
  attrStudNr VARCHAR(4000),
  attrLName  VARCHAR(4000),
  attrFName  VARCHAR(4000),
  attrCourse Type_Course);
CREATE TABLE TabUniversity (
  attrStudyCourse VARCHAR(4000),
  attrStudent     Type_Student);
INSERT INTO TabUniversity VALUES (
  'Computer Science',
  Type_Student('23374', 'Conrad', 'Matthias',
    Type_Course('Databases II',
      Type_Professor('Kudrass', 'Database Systems', 'Computer Science'),
      '4')));
SELECT u.attrStudent.attrLName
FROM TabUniversity u
WHERE u.attrStudent.attrCourse.attrProfessor.attrPName = 'Kudrass';
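This Python sketch shows one way a document instance could become the nested constructor calls of the INSERT statement, using xml.etree.ElementTree. It is an assumption, not the authors' loader. It presumes that:
- subelements appear in the same order as the attributes of the corresponding object type;
- there are no XML attributes or repeated elements;
- every value is stored as a string.

```python
import xml.etree.ElementTree as ET


def sql_literal(text):
    """Quote element content as an SQL string literal (quotes doubled)."""
    return "'" + (text or "").strip().replace("'", "''") + "'"


def value(elem):
    """Text-only elements become literals, complex ones Type_<tag>(...) calls."""
    if len(elem) == 0:
        return sql_literal(elem.text)
    args = ", ".join(value(child) for child in elem)
    return f"Type_{elem.tag}({args})"


def insert_statement(xml_text):
    """Build an INSERT for the table Tab<root> from one XML document."""
    root = ET.fromstring(xml_text)
    values = ", ".join(value(child) for child in root)
    return f"INSERT INTO Tab{root.tag} VALUES ({values});"
```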

Set-Valued Elements
Multiple occurrence (in the DTD): marked by + or *
DBMS restrictions:
  – a collection type is applicable to set-valued elements with text-valued subelements, e.g. VARRAY OF VARCHAR
  – a collection type is not applicable to set-valued elements with complex subelements, since those subelements may be set-valued again
Solutions:
  – use newer DBMS releases (e.g. Oracle 9i)
  – model relationships with object references

Set-Valued Elements
CREATE TYPE Type_University AS OBJECT (
  attrStudyCourse VARCHAR(4000));
CREATE TABLE TabUniversity OF Type_University;
CREATE TYPE Type_Student AS OBJECT (
  attrJahrgang   VARCHAR(4000),
  attrUniversity REF Type_University);
CREATE TABLE TabStudent OF Type_Student;
The set-valued element Student is modeled as the object type Type_Student. Its attribute attrUniversity is a reference to University objects in the object table TabUniversity.
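Here is a minimal Python sketch of the reference-based alternative. It assumes a hypothetical in-memory stand-in for object tables:
- each occurrence of a set-valued element becomes its own row with a generated OID;
- the child row stores its parent's OID in attr<Parent>, playing the role of the REF column attrUniversity.

Complex subelements that are not set-valued are flattened to their text here, which a real mapping would not do.

```python
import itertools
import xml.etree.ElementTree as ET


def load_with_refs(xml_text, set_valued):
    """Split a document into per-element 'object tables' (dicts of rows).

    Elements whose tag is in `set_valued` get their own rows; each such row
    stores the parent's OID in attr<ParentTag>, a stand-in for a REF.
    """
    next_oid = itertools.count(1)
    tables = {}

    def store(elem, parent=None):
        row = {"OID": next(next_oid)}
        if parent is not None:
            parent_tag, parent_oid = parent
            row[f"attr{parent_tag}"] = parent_oid  # reference to the parent object
        tables.setdefault(f"Tab{elem.tag}", []).append(row)
        for child in elem:
            if child.tag in set_valued:
                store(child, (elem.tag, row["OID"]))
            else:
                row[f"attr{child.tag}"] = (child.text or "").strip()

    store(ET.fromstring(xml_text))
    return tables
```

Keeping the reference in the child avoids a collection attribute in the parent entirely. This is what makes the approach usable on releases without nested collections.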

Set-Valued Elements
CREATE TYPE TypeVA_Subject AS VARRAY(100) OF VARCHAR(4000);
CREATE TYPE Type_Professor AS OBJECT (
  attrPName   VARCHAR(4000),
  attrSubject TypeVA_Subject,
  attrDept    VARCHAR(4000));
CREATE TYPE TypeVA_Professor AS VARRAY(100) OF Type_Professor;
CREATE TYPE Type_Course AS OBJECT (
  attrName      VARCHAR(4000),
  attrProfessor TypeVA_Professor,
  attrCreditPts VARCHAR(4000));
CREATE TYPE TypeVA_Course AS VARRAY(100) OF Type_Course;
CREATE TYPE Type_Student AS OBJECT (
  attrStudNr VARCHAR(4000),
  attrLName  VARCHAR(4000),
  attrFName  VARCHAR(4000),
  attrCourse TypeVA_Course);
CREATE TYPE TypeVA_Student AS VARRAY(100) OF Type_Student;
CREATE TABLE TabUniversity (
  attrStudyCourse VARCHAR(4000),
  attrStudent     TypeVA_Student);
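The order of these statements matters, because Oracle requires a type to exist before another type or table uses it as an attribute type. A minimal Python sketch can emit the types bottom-up. It assumes a hypothetical DTD tree (Elem) and the naming conventions of the examples (Type_, TypeVA_, VARRAY(100), VARCHAR(4000)). Complex elements get object types, and elements marked + or * get VARRAY types. Nested collections, such as TypeVA_Subject inside Type_Professor, presume a release that supports them (e.g. Oracle 9i).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Elem:
    """Hypothetical simplified DTD element; set_valued marks + or *."""
    name: str
    set_valued: bool = False
    children: list[Elem] = field(default_factory=list)


def ordb_schema(root: Elem, varray_size: int = 100) -> list[str]:
    """Emit object and VARRAY types bottom-up, then the table for the root."""
    ddl: list[str] = []

    def column_type(e: Elem) -> str:
        if e.children:
            # Children are mapped first, so each type exists before it is used.
            cols = ", ".join(f"attr{c.name} {column_type(c)}" for c in e.children)
            ddl.append(f"CREATE TYPE Type_{e.name} AS OBJECT ({cols});")
            base = f"Type_{e.name}"
        else:
            base = "VARCHAR(4000)"
        if e.set_valued:
            ddl.append(f"CREATE TYPE TypeVA_{e.name} AS VARRAY({varray_size}) OF {base};")
            return f"TypeVA_{e.name}"
        return base

    cols = ", ".join(f"attr{c.name} {column_type(c)}" for c in root.children)
    ddl.append(f"CREATE TABLE Tab{root.name} ({cols});")
    return ddl
```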

Set-Valued Elements: Example
INSERT INTO TabUniversity VALUES (
  'Computer Science',
  TypeVA_Student(
    Type_Student('23374', 'Conrad', 'Matthias',
      TypeVA_Course(
        Type_Course('Databases II',
          TypeVA_Professor(
            Type_Professor('Kudrass',
              TypeVA_Subject('Database Systems', 'Operating Systems'),
              'Computer Science')),
          '4'),
        Type_Course('CAD Intro',
          TypeVA_Professor(
            Type_Professor('Jaeger',
              TypeVA_Subject('CAD', 'CAE'),
              'Computer Science')),
          '4'),
        ...)),
    Type_Student('00011', 'Meier', 'Ralf', … ) … ) ...);
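Here is a Python sketch of generating such a statement from a document. Runs of a set-valued element are grouped with itertools.groupby into one TypeVA_<Element>(...) constructor. Every other element becomes a Type_<Element>(...) call or a string literal. The sketch makes these hypothetical assumptions:
- subelements occur in the order of the type attributes;
- every set-valued element occurs at least once;
- the occurrences of a set-valued element are adjacent.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def sql_literal(text):
    """Quote element content as an SQL string literal (quotes doubled)."""
    return "'" + (text or "").strip().replace("'", "''") + "'"


def arguments(elem, set_valued):
    """Constructor arguments for elem's children; each run of a set-valued
    element is wrapped in a single TypeVA_<tag>(...) collection constructor."""
    args = []
    for tag, group in groupby(elem, key=lambda child: child.tag):
        values = [value(child, set_valued) for child in group]
        if tag in set_valued:
            args.append(f"TypeVA_{tag}({', '.join(values)})")
        else:
            args.extend(values)
    return args


def value(elem, set_valued):
    """Text-only elements become literals, complex ones Type_<tag>(...) calls."""
    if len(elem) == 0:
        return sql_literal(elem.text)
    return f"Type_{elem.tag}({', '.join(arguments(elem, set_valued))})"


def insert_statement(xml_text, set_valued):
    """Build an INSERT for the table Tab<root> from one XML document."""
    root = ET.fromstring(xml_text)
    return f"INSERT INTO Tab{root.tag} VALUES ({', '.join(arguments(root, set_valued))});"
```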

Dealing with Null Values
Restrictions on NOT NULL constraints in the object-relational DB schema:
  – NOT NULL constraints can be declared on tables, not in object types
  – NOT NULL constraints are not applicable to collection types
Object-valued attributes:
  – use CHECK constraints to express NOT NULL
Loss of DTD semantics → DTD in the database

Dealing with CHECK Constraints
CREATE TYPE Type_Address AS OBJECT (
  attrStreet VARCHAR(4000),
  attrCity   VARCHAR(4000));
CREATE TYPE Type_Course AS OBJECT (
  attrName    VARCHAR(4000),
  attrAddress Type_Address);
CREATE TABLE TabCourse OF Type_Course (
  attrName NOT NULL,
  CHECK (attrAddress.attrStreet IS NOT NULL));
-- 1. ORA-02290: desired error message (street missing)
INSERT INTO TabCourse VALUES ('CAD Intro', Type_Address(NULL, 'Leipzig'));
-- 2. ORA-02290: undesired error message (no address at all)
INSERT INTO TabCourse VALUES ('RN', NULL);
The second insert is also rejected. When attrAddress itself is NULL, attrAddress.attrStreet evaluates to NULL as well. The CHECK constraint therefore cannot distinguish a missing address from an address without a street.

Meta-Data about XML Documents
  – unique document ID for each document
  – prolog information
  – document location (URL)
  – namespace
  – element vs. attribute
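Here is a minimal Python sketch of collecting such meta-data with the standard library's expat bindings. The dictionary layout and the counter used as document ID are hypothetical. The sketch records:
- the URL;
- the XML declaration and DOCTYPE from the prolog;
- the namespace of the root element.

It does not record whether each value came from an element or an attribute.

```python
import itertools
from xml.parsers import expat

_doc_ids = itertools.count(1)  # hypothetical source of unique document IDs


def document_metadata(xml_text, url):
    """Collect document-level meta-data that a node-by-node mapping would lose."""
    meta = {"DocID": next(_doc_ids), "URL": url, "version": None,
            "encoding": None, "doctype": None, "system_id": None,
            "root": None, "namespace": None}
    # With a separator, namespaced names are reported as "URI localname".
    parser = expat.ParserCreate(namespace_separator=" ")

    def on_xml_decl(version, encoding, standalone):
        meta["version"], meta["encoding"] = version, encoding

    def on_doctype(name, system_id, public_id, has_internal_subset):
        meta["doctype"], meta["system_id"] = name, system_id

    def on_start(name, attributes):
        if meta["root"] is None:  # only the document element is recorded
            uri, _, local = name.rpartition(" ")
            meta["root"], meta["namespace"] = local, uri or None

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return meta
```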

Naming Conventions for DB Objects
Rules:
  – Tab + element name → table name
  – Type_ + element name → object type name
  – TypeVa_ + element name → array name
No conflicts with keywords
Introduction of a schema ID
Naming rule: SchemaID + naming convention + name
CREATE TYPE DTD01_Type_University AS OBJECT (
  attrStudyCourse VARCHAR(4000));
CREATE TYPE DTD02_Type_University AS OBJECT (
  attrRegister VARCHAR(4000));
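The naming rule can be expressed as a short Python function; the function name and kind labels are hypothetical. The 30-character check does not come from the slides. It reflects Oracle's identifier limit before release 12.2, which a schema ID combined with a long element name can exceed. Column names follow the attr prefix used throughout the examples and get no schema ID, since they are scoped by their type or table.

```python
PREFIXES = {"table": "Tab", "type": "Type_", "array": "TypeVa_"}
MAX_IDENTIFIER_LENGTH = 30  # Oracle identifier limit before release 12.2


def db_object_name(schema_id, kind, element):
    """SchemaID + naming convention + element name, e.g. DTD01_Type_University."""
    if kind == "column":
        name = f"attr{element}"  # scoped by its type/table, so no schema ID
    else:
        name = f"{schema_id}_{PREFIXES[kind]}{element}"
    if len(name) > MAX_IDENTIFIER_LENGTH:
        raise ValueError(f"identifier too long for Oracle: {name}")
    return name
```

Because every generated name starts with a prefix, an element such as Order still yields TabOrder rather than the reserved word ORDER. This is how the convention avoids keyword conflicts.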

Conclusions: Advantages
Non-atomic domains possible
  – natural representation of XML documents
  – nesting of any complexity possible
Simple queries using dot notation
Object references (OIDs) to represent relationships

Conclusions: Drawbacks
Mapping deficiencies
  – possible restrictions of element types in collections
  – no adequate mapping of NOT NULL constraints
Loss of information
  – prolog, comments, processing instructions
  – entity references
  – attribute vs. element?
Schema evolution
  – modification of the DTD → modification of the DB
Type information
  – target type VARCHAR is not sufficient

Outlook
Graph-based creation of a schema
Source: XML Schema
Use the CLOB datatype
Enhance the meta-schema
  – comments, processing instructions and their position in the document
  – entity references and their substitution text