Efficiently Publishing Relational Data as XML Documents IBM Almaden Research Center Eugene Shekita Rimon Barr Michael Carey Bruce Lindsay Hamid Pirahesh.


Efficiently Publishing Relational Data as XML Documents Jayavel Shanmugasundaram (Univ. Wisconsin/IBM Almaden) Joint work with: Eugene Shekita, Rimon Barr, Michael Carey, Bruce Lindsay, Hamid Pirahesh, Berthold Reinwald (IBM Almaden Research Center)

Outline Why? How? Which? Hence

XML Example John Mary Internet Recycling

What is the big deal about XML? Elegantly models complex, hierarchical/graph-structured data. Domain-specific tags (unlike HTML). Standardized! => Fast emerging as the dominant standard for data exchange on the WWW

Why Relational Data? Most business data is stored in relational databases. Unlikely to change in the near future –Scalability, Reliability, Performance, Tools => Need efficient means to publish relational data as XML documents

Usage Scenario Existing Database System (RDBMS) Application/User Query to produce XML Documents XML Result (processed or displayed in browser) The Internet

Outline Why? How? Which? Hence

Example Relational Schema
Department (DeptId, DeptName): (10, Purchasing)
Project (ProjId, DeptId, ProjName): (…, 10, Internet) (795, 10, Recycling)
Employee (EmpId, DeptId, EmpName, Salary): (…, 10, John, 50K) (91, 10, Mary, 70K)

XML Representation John Mary Internet Recycling

Main Issues Relational data is flat, XML is a tagged graph How do we specify translation from flat model to a graph model? –A query language to map from relations to XML How do we transform flat representations to tagged nested representations? –Efficient implementation strategies

Outline Why? How? –Language? –Mechanism? Which? Hence

SQL: Key Ideas Sub-queries to specify nesting Scalar functions to specify tags/attributes –XML Constructors Aggregate functions to group child elements

Example Relational Schema
Department (DeptId, DeptName): (10, Purchasing)
Project (ProjId, DeptId, ProjName): (…, 10, Internet) (795, 10, Recycling)
Employee (EmpId, DeptId, EmpName, Salary): (…, 10, John, 50K) (91, 10, Mary, 70K)

SQL: Query to publish XML Select DEPT(d.name,, ) From Department d

SQL: XML Constructor Define XML Constructor DEPT(dname: varchar(20), emplist: xml, projlist: xml) As ( {emplist} {projlist} )
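The constructor idea can be sketched outside SQL as plain functions that wrap their arguments in tags. This is only an illustration in Python, not the slides' SQL extension. The markup inside the constructor body did not survive in this transcript, so every tag name below (`department`, `emplist`, `projlist`, `employee`, `project`) and the `name` attribute are assumptions, and `xmlagg` is a hypothetical stand-in for the XMLAGG aggregate.

```python
from xml.sax.saxutils import escape, quoteattr

# All tag names are assumed for illustration; the original markup is not in the transcript.

def emp(name):
    """Scalar constructor: wrap one employee name in an element."""
    return f"<employee>{escape(name)}</employee>"

def proj(name):
    """Scalar constructor: wrap one project name in an element."""
    return f"<project>{escape(name)}</project>"

def xmlagg(fragments):
    """Stand-in for the XMLAGG aggregate: concatenate child fragments."""
    return "".join(fragments)

def dept(dname, emplist, projlist):
    """Constructor taking a name plus two already-built XML fragments."""
    return (
        f"<department name={quoteattr(dname)}>"
        f"<emplist>{emplist}</emplist>"
        f"<projlist>{projlist}</projlist>"
        f"</department>"
    )

example = dept(
    "Purchasing",
    xmlagg([emp("John"), emp("Mary")]),
    xmlagg([proj("Internet"), proj("Recycling")]),
)
```

The nesting of the function calls mirrors the nesting of the sub-queries in the publishing query: each inner aggregate builds a list of children that the outer constructor places inside the parent element.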

SQL: Query to publish XML Select DEPT(d.name,, ) From Department d

SQL: Query to publish XML Select DEPT(d.name, (Select XMLAGG(EMP(e.name)) From Employee e Where e.deptno = d.deptno), (Select XMLAGG(PROJ(p.name)) From Project p Where p.deptno = d.deptno) ) From Department d

Query Result John Mary Internet Recycling ( )

Outline Why? How? –Language? –Mechanism? Which? Hence

Relations to XML: Issues Two main differences: –Nesting (structuring) –Tagging Space of alternatives: –Late Tagging vs. Early Tagging –Late Structuring vs. Early Structuring –Inside Engine vs. Outside Engine

Stored Procedure Approach Issue queries for sub-structures and tag them Could be a Stored Procedure DBMS Engine Department Employee Project Problem: Too many SQL queries! (10, Purchasing) (John) (Mary) (Internet) (Recycling) Early Tagging, Early Structuring, Outside Engine
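A minimal sketch of why this approach issues so many queries, assuming a simplified version of the example schema in an in-memory SQLite database (the id columns other than DeptId are dropped, and the tag names are assumptions). The function name `publish_outside_engine` and the query counter are hypothetical, added only to make the cost visible.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

def make_db():
    """Simplified copy of the example schema (assumed column subset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (DeptId INTEGER, EmpName TEXT);
        CREATE TABLE Project (DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (10, 'John'), (10, 'Mary');
        INSERT INTO Project VALUES (10, 'Internet'), (10, 'Recycling');
    """)
    return conn

def publish_outside_engine(conn):
    """Tag and nest in application code, one query per sub-structure."""
    queries = 1
    depts = conn.execute(
        "SELECT DeptId, DeptName FROM Department ORDER BY DeptId").fetchall()
    parts = []
    for dept_id, dept_name in depts:
        emps = conn.execute(
            "SELECT EmpName FROM Employee WHERE DeptId = ? ORDER BY EmpName",
            (dept_id,)).fetchall()
        projs = conn.execute(
            "SELECT ProjName FROM Project WHERE DeptId = ? ORDER BY ProjName",
            (dept_id,)).fetchall()
        queries += 2  # two extra round trips for every department
        parts.append(f"<department name={quoteattr(dept_name)}><emplist>")
        parts.extend(f"<employee>{escape(n)}</employee>" for (n,) in emps)
        parts.append("</emplist><projlist>")
        parts.extend(f"<project>{escape(n)}</project>" for (n,) in projs)
        parts.append("</projlist></department>")
    return "".join(parts), queries

xml_text, query_count = publish_outside_engine(make_db())
```

With N departments this sketch issues 1 + 2N queries, and the count grows further with every additional nesting level.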

Correlated CLOB Approach Problem: Correlated execution of sub-queries Select DEPT(d.name, (Select XMLAGG(EMP(e.name)) From Employee e Where e.deptno = d.deptno), (Select XMLAGG(PROJ(p.name)) From Project p Where p.deptno = d.deptno) ) From Department d Early Tagging, Early Structuring, Inside Engine

De-Correlated CLOB Approach Compute employee lists associated with all departments Compute project lists associated with all departments Join results above on department id Early Tagging, Early Structuring, Inside Engine Problem: CLOBs during query processing
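The three steps above can be approximated in SQLite, which has no XML constructors. In this sketch string concatenation stands in for the tagging functions and `group_concat` stands in for XMLAGG, so it shows the shape of the plan rather than the system described on the slides. The simplified schema, the tag names, and the function name `publish_decorrelated` are assumptions, and element text is not escaped.

```python
import sqlite3

def make_db():
    """Simplified copy of the example schema (assumed column subset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (DeptId INTEGER, EmpName TEXT);
        CREATE TABLE Project (DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (10, 'John'), (10, 'Mary');
        INSERT INTO Project VALUES (10, 'Internet'), (10, 'Recycling');
    """)
    return conn

DECORRELATED_SQL = """
WITH EmpStruct(DeptId, empinfo) AS (      -- step 1: employee lists per department
    SELECT d.DeptId,
           group_concat('<employee>' || e.EmpName || '</employee>', '')
    FROM Department d LEFT JOIN Employee e ON d.DeptId = e.DeptId
    GROUP BY d.DeptId),
ProjStruct(DeptId, projinfo) AS (         -- step 2: project lists per department
    SELECT d.DeptId,
           group_concat('<project>' || p.ProjName || '</project>', '')
    FROM Department d LEFT JOIN Project p ON d.DeptId = p.DeptId
    GROUP BY d.DeptId)
SELECT '<department name="' || d.DeptName || '">'
       || '<emplist>' || COALESCE(es.empinfo, '') || '</emplist>'
       || '<projlist>' || COALESCE(ps.projinfo, '') || '</projlist>'
       || '</department>'
FROM Department d                         -- step 3: join on department id
JOIN EmpStruct es ON es.DeptId = d.DeptId
JOIN ProjStruct ps ON ps.DeptId = d.DeptId
"""

def publish_decorrelated(conn):
    """Run the single de-correlated query; one tagged string per department."""
    return [row[0] for row in conn.execute(DECORRELATED_SQL)]

docs = publish_decorrelated(make_db())
```

The already-tagged fragments flow through the grouping and join operators as large strings, which is the "CLOBs during query processing" cost named on the slide. SQLite does not guarantee the order of values inside `group_concat`, so the order of children within each list is unspecified here.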

Late Tagging, Late Structuring XML document content produced without structure (in arbitrary order) Tagger enforces order as final step Relational Query Processing Unstructured content Tagging Result XML Document

Redundant Relation Approach How do we represent nested content as relations? (10, Purchasing) (10, Internet) (10, Recycling) (10, John) (10, Mary) (Purchasing, John, Internet) (Purchasing, John, Recycling) (Purchasing, Mary, Internet) (Purchasing, Mary, Recycling) Problem: Large relation due to data redundancy! Late Tagging, Late Structuring
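The redundancy can be reproduced with a single join over a simplified copy of the example schema. This is a sketch in SQLite via Python; the reduced column set and the function name `redundant_relation` are assumptions made for illustration.

```python
import sqlite3

def make_db():
    """Simplified copy of the example schema (assumed column subset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (DeptId INTEGER, EmpName TEXT);
        CREATE TABLE Project (DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (10, 'John'), (10, 'Mary');
        INSERT INTO Project VALUES (10, 'Internet'), (10, 'Recycling');
    """)
    return conn

def redundant_relation(conn):
    """Join both child tables to the parent: every employee pairs with every project."""
    return conn.execute("""
        SELECT d.DeptName, e.EmpName, p.ProjName
        FROM Department d
        JOIN Employee e ON e.DeptId = d.DeptId
        JOIN Project p ON p.DeptId = d.DeptId
        ORDER BY e.EmpName, p.ProjName
    """).fetchall()

rows = redundant_relation(make_db())
```

A department with e employees and p projects contributes e × p rows, although the document only needs e + p child elements. Here that is 4 rows for 2 + 2 children; with 100 employees and 100 projects it would be 10,000 rows for 200 children.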

Outer Union Approach How do we represent nested content as relations?
Department: (10, Purchasing)
Department-Project: (Purchasing, Internet) (Purchasing, Recycling)
Department-Employee: (Purchasing, John) (Purchasing, Mary)
Union: (Purchasing, null, Internet, 0) (Purchasing, null, Recycling, 0) (Purchasing, John, null, 1) (Purchasing, Mary, null, 1)
Problem: Wide tuples (having many columns)
Late Tagging, Late Structuring
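One way to produce these tuples is a `UNION ALL` of one branch per child type, each padded with NULLs and carrying a type column. The sketch below uses SQLite through Python over a simplified copy of the example schema; the column subset and the function name `outer_union` are assumptions.

```python
import sqlite3

def make_db():
    """Simplified copy of the example schema (assumed column subset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (DeptId INTEGER, EmpName TEXT);
        CREATE TABLE Project (DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (10, 'John'), (10, 'Mary');
        INSERT INTO Project VALUES (10, 'Internet'), (10, 'Recycling');
    """)
    return conn

def outer_union(conn):
    """One row per child element; the last column says which branch it came from."""
    return conn.execute("""
        SELECT d.DeptName, NULL AS EmpName, p.ProjName, 0 AS Type
        FROM Department d JOIN Project p ON p.DeptId = d.DeptId
        UNION ALL
        SELECT d.DeptName, e.EmpName, NULL AS ProjName, 1 AS Type
        FROM Department d JOIN Employee e ON e.DeptId = d.DeptId
    """).fetchall()

rows = outer_union(make_db())
```

The row count is now additive (e + p per department) instead of multiplicative, at the price of one mostly-NULL column per child attribute, which is the wide-tuple problem named on the slide.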

Hash-based Tagger Results not structured early –In arbitrary order Tagger has to enforce order during tagging –Hash-based approach Inside/Outside engine tagger Late Tagging, Late Structuring Problem: Requires memory for entire document
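A minimal sketch of a hash-based tagger over outer-union rows of the form (DeptName, EmpName, ProjName, Type), where Type 1 marks an employee row and Type 0 a project row as in the tuples shown earlier. The function name `hash_tag` and all tag names are assumptions. The dictionary plays the role of the hash table, and nothing can be emitted until every row has been seen.

```python
from xml.sax.saxutils import escape, quoteattr

def hash_tag(rows):
    """Group unordered outer-union rows by parent, then tag each group."""
    groups = {}  # DeptName -> child lists; grows with the size of the document
    for dept, emp, proj, kind in rows:
        group = groups.setdefault(dept, {"emps": [], "projs": []})
        if kind == 1:
            group["emps"].append(emp)
        else:
            group["projs"].append(proj)

    out = []
    for dept, group in groups.items():
        out.append(f"<department name={quoteattr(dept)}><emplist>")
        out.extend(f"<employee>{escape(e)}</employee>" for e in group["emps"])
        out.append("</emplist><projlist>")
        out.extend(f"<project>{escape(p)}</project>" for p in group["projs"])
        out.append("</projlist></department>")
    return "".join(out)

# Rows deliberately interleaved, as an unsorted query result might be.
unordered = [
    ("Purchasing", None, "Recycling", 0),
    ("Purchasing", "Mary", None, 1),
    ("Purchasing", None, "Internet", 0),
    ("Purchasing", "John", None, 1),
]
document = hash_tag(unordered)
```

Children appear in arrival order within each list. Because the last input row may belong to the first department, the whole grouped structure has to stay in memory, which matches the memory problem stated on the slide.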

Late Tagging, Early Structuring Structured XML document content produced Tagger just adds tags (constant space) Relational Query Processing Structured content Tagging Result XML Document

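Sorted Outer Union Approach
Hierarchy: A has children B and C; B has children D and E; C has children F and G
Outer-union tuples (n = null): (A, B, n, D, n, n, n) (A, B, n, n, E, n, n) (A, n, C, n, n, F, n) (A, n, C, n, n, n, G)
Sort By: Aid, Bid, Cid
Problem: Only partial ordering required
Late Tagging, Early Structuring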

Constant Space Tagger Detects changes in XML document hierarchy Adds appropriate opening/closing tags Inside/outside engine Late Tagging, Early Structuring
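A sketch of such a tagger as a Python generator, assuming its input is already sorted so that all rows of one department are adjacent and, within a department, all rows of one type are adjacent. Rows have the outer-union form (DeptName, EmpName, ProjName, Type). The function name `stream_tags`, the tag names, and the sort done in Python for the demo are assumptions; in the approach on the slides the ordering would come from the query itself.

```python
from xml.sax.saxutils import escape, quoteattr

LIST_TAG = {1: "emplist", 0: "projlist"}  # assumed tag names
ITEM_TAG = {1: "employee", 0: "project"}

def stream_tags(sorted_rows):
    """Emit tags while scanning; only the previous row's keys are remembered."""
    cur_dept = None
    cur_kind = None
    for dept, emp, proj, kind in sorted_rows:
        if dept != cur_dept:
            if cur_dept is not None:  # close the previous department
                yield f"</{LIST_TAG[cur_kind]}></department>"
            yield f"<department name={quoteattr(dept)}>"
            cur_dept, cur_kind = dept, None
        if kind != cur_kind:
            if cur_kind is not None:  # close the previous child list
                yield f"</{LIST_TAG[cur_kind]}>"
            yield f"<{LIST_TAG[kind]}>"
            cur_kind = kind
        value = emp if kind == 1 else proj
        yield f"<{ITEM_TAG[kind]}>{escape(value)}</{ITEM_TAG[kind]}>"
    if cur_dept is not None:
        yield f"</{LIST_TAG[cur_kind]}></department>"

rows = [
    ("Purchasing", None, "Internet", 0),
    ("Purchasing", "John", None, 1),
    ("Purchasing", None, "Recycling", 0),
    ("Purchasing", "Mary", None, 1),
]
# Demo only: sort by department, employees (Type 1) before projects (Type 0).
ordered = sorted(rows, key=lambda r: (r[0], -r[3]))
document = "".join(stream_tags(ordered))
```

The state is two variables regardless of document size, in contrast to a tagger that must hold the whole result. One limitation of this sketch: a department with no rows of a given type gets no empty list element for it.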

Classification of Alternatives
Early Tagging, Early Structuring: Inside Engine: Correlated CLOB, De-Correlated CLOB; Outside Engine: Stored Procedure
Late Tagging, Early Structuring: Inside Engine: Sorted Outer Union (Tagging inside); Outside Engine: Sorted Outer Union (Tagging outside)
Late Tagging, Late Structuring: Inside Engine: Unsorted Outer Union (Tagging inside); Outside Engine: Unsorted Outer Union (Tagging outside)

Outline Why? How? –Language? –Mechanism? Which? Hence

Where Does Time Go?

Performance Evaluation Summary
Early Tagging, Early Structuring: Inside Engine: Correlated CLOB, De-Correlated CLOB; Outside Engine: Stored Procedure
Late Tagging, Early Structuring: Inside Engine: Sorted Outer Union (Tagging inside); Outside Engine: Sorted Outer Union (Tagging outside)
Late Tagging, Late Structuring: Inside Engine: Unsorted Outer Union (Tagging inside); Outside Engine: Unsorted Outer Union (Tagging outside)

Outline Why? How? –Language? –Mechanism? Which? Hence

Conclusion Publishing XML from relational sources important in Internet SQL-based language specification Implementation Alternatives –Inside engine >> Outside engine –Unsorted Outer Union : sufficient main memory –Sorted Outer Union : otherwise (most stable)

Related Work SilkRoute (WWW 2000) Oracle’s XML extensions (ICDE 2000) Microsoft’s XDR XPERANTO (VLDB demo tomorrow)

Performance Evaluation Query Depth Query Fan Out Database Size

Effect of Query Depth

De-Correlated CLOB Approach Problem: CLOBs during processing
With EmpStruct (deptname, empinfo) AS (
  Select d.deptname, XMLAGG(EMP(employee, e.empname))
  From department d left join employee e on d.deptid = e.deptid
  Group By d.deptname),
ProjStruct (deptname, projinfo) AS (
  Select d.deptname, XMLAGG(PROJ(employee, p.projname))
  From department d left join project p on d.deptid = p.deptid
  Group By d.deptname)
Select DEPT(d1.deptname, d1.empinfo, d2.projinfo)
From EmpStruct d1 full join ProjStruct d2 on d1.deptname = d2.deptname
Early Tagging, Early Structuring, Inside Engine