Efficiently Publishing Relational Data as XML Documents Jayavel Shanmugasundaram et al. Proceedings -VLDB 2000, Cairo.

What drove them? No… it wasn’t the chauffeur… XML is rapidly emerging as a global standard. A large amount of data is stored in RDBMSs and needs to be exchanged.

Primary issues: language specification; implementation (what method works best?); adding TAG and STRUCTURE (when do you do these operations?).

Roadmap: language specification; implementation (early tagging and structuring; late tagging and structuring; early structuring with late tagging); performance evaluation.

Our little sample… Dilys Thomas Gift for Consulate VISA Woman Traveller’s cheques due Feb 12. Note the elements, names/tags, and ID refs.

Underlying tables
  Customer(id int, name varchar)
  Account(id varchar, custID int, acctnum int)
  Item(id int, poID int, desc varchar)
  PurchOrder(id int, custID int, acctID varchar, date varchar)
  Payment(id int, poID int, desc varchar)

SQL-based language spec.
SQL functions:
  Define XMLConstruct ITEM(id int, desc varchar) AS { $desc }
SQL aggregates:
  Select XMLAGG(ITEM(id, desc)) From Item  // returns an XML aggregation of items
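
The constructor and aggregate above can be approximated outside DB2 to see how they compose. This is a minimal sketch, assuming SQLite user-defined functions as a stand-in for engine support: the element layout produced by `item`, the helper names, and the second sample row are all hypothetical, since the slide's constructor body did not survive extraction.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


class XmlAgg:
    """Stand-in for XMLAGG: concatenates the XML fragments of a group."""

    def __init__(self):
        self.parts = []

    def step(self, fragment):
        if fragment is not None:
            self.parts.append(fragment)

    def finalize(self):
        return "".join(self.parts)


def item(item_id, desc):
    # Hypothetical element layout for the ITEM constructor (an assumption).
    return f"<item id={quoteattr(str(item_id))}>{escape(desc)}</item>"


def build_connection():
    conn = sqlite3.connect(":memory:")
    conn.create_function("ITEM", 2, item)       # scalar XML constructor
    conn.create_aggregate("XMLAGG", 1, XmlAgg)  # XML aggregate
    conn.executescript("""
        CREATE TABLE Item(id INT, poID INT, "desc" VARCHAR);
        INSERT INTO Item VALUES (1, 10, 'Gift for Consulate');
        INSERT INTO Item VALUES (2, 10, 'Pens & paper');
    """)
    return conn


def items_xml(conn, po_id):
    # Same shape as: Select XMLAGG(ITEM(id, desc)) From Item
    row = conn.execute(
        'SELECT XMLAGG(ITEM(id, "desc")) FROM Item WHERE poID = ?', (po_id,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(items_xml(build_connection(), 10))
```

The aggregate makes no promise about fragment order, which is one reason the ordering question comes up again in the sorted outer union approach.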

Sample query
Select cust.name, CUST(cust.id, cust.name,
    (Select XMLAGG(ACCT(acct.id, acct.acctnum))
     From Account acct
     Where acct.custID = cust.id),
    (Select XMLAGG(PORDER(porder.id, porder.acctID, porder.date,
        (Select XMLAGG(ITEM(item.id, item.desc))
         From Item item
         Where item.poID = porder.id),
        (Select XMLAGG(PAYMENT(pay.id, pay.desc))
         From Payment pay
         Where pay.poID = porder.id)))
     From PurchOrder porder
     Where porder.custID = cust.id))
From Customer cust
Constructs XML from the relational tables.

Implementation alternatives: late vs. early tagging; late vs. early structuring (late structuring with early tagging is not an option). Figure labels: DB, Result, TAG, STRUCTURE.

Early tagging and structuring. Stored Procedure: explicitly issue nested queries, getting the corresponding nested data with further queries. Done outside the relational engine. Tag/structure as soon as results are available. Too many queries per tuple. Fixed join order (nested loop join).
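
The stored-procedure approach can be sketched as nested query loops run from application code. This is an illustrative sketch only, assuming SQLite in place of DB2 and a reduced three-table schema; the element names, the helper names, and all sample values other than those on the sample slide are made up. It also counts the queries issued, which shows why the number of queries grows with the number of parent tuples.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customer(id INT, name VARCHAR);
        CREATE TABLE PurchOrder(id INT, custID INT, acctID VARCHAR, date VARCHAR);
        CREATE TABLE Item(id INT, poID INT, "desc" VARCHAR);
        INSERT INTO Customer VALUES (1, 'Dilys Thomas');
        INSERT INTO PurchOrder VALUES (10, 1, 'A1', '2000-01-10');
        INSERT INTO PurchOrder VALUES (11, 1, 'A1', '2000-02-01');
        INSERT INTO Item VALUES (100, 10, 'Gift for Consulate');
        INSERT INTO Item VALUES (101, 11, 'Stamps');
    """)
    return conn


def publish_customers(conn):
    """Tag and structure while fetching; returns (xml_text, queries_issued)."""
    queries = 1  # the outermost query over Customer
    root = ET.Element("customers")
    customers = conn.execute("SELECT id, name FROM Customer ORDER BY id").fetchall()
    for cust_id, name in customers:
        cust = ET.SubElement(root, "customer", id=str(cust_id))
        ET.SubElement(cust, "name").text = name
        queries += 1  # one order query per customer tuple
        orders = conn.execute(
            "SELECT id, date FROM PurchOrder WHERE custID = ? ORDER BY id",
            (cust_id,),
        ).fetchall()
        for po_id, date in orders:
            po = ET.SubElement(cust, "porder", id=str(po_id), date=date)
            queries += 1  # one item query per order tuple
            items = conn.execute(
                'SELECT id, "desc" FROM Item WHERE poID = ? ORDER BY id',
                (po_id,),
            ).fetchall()
            for item_id, desc in items:
                ET.SubElement(po, "item", id=str(item_id)).text = desc
    return ET.tostring(root, encoding="unicode"), queries


if __name__ == "__main__":
    xml_text, n = publish_customers(make_db())
    print(n, xml_text)
```

The loop nesting is also the join order: the application, not the optimizer, decides it, which is the "fixed order" drawback named on the slide.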

Contd… Correlated CLOB: push the queries into the engine; plug XMLAGG and XMLCONSTRUCT support into the engine; the engine has to handle huge CLOBs; fixed join order. Decorrelated CLOB: decorrelate the query and use outer joins, so the join order is no longer fixed; CLOBs are still carried around (due to early tagging!).

Late tagging and structuring. Two phases: content creation, then tagging/structuring. Redundant Relation: blindly join all constituent tables; ‘parent’ data is repeated. Unsorted Outer Union: decorrelate the query, compute common subexpressions, and use outer joins; take an outer union of the result tables; the number of columns grows with the width/depth of the XML document. This is the Path Outer Union.

Contd… Alternatively, don’t repeat the parent node at every child: feed the parent into the outer union and keep only the parent IDs with the children. This is the Node Outer Union. It greatly increases the number of tuples generated.

Outer Union. Note: outer joins are used to retain parents. The outer union (OU) has a separate column in the result for each input column; unused columns are set to NULL; a type column is added to each row.
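
A small sketch of what such an outer union looks like in plain SQL, run here through SQLite. It assumes a reduced schema (Customer, PurchOrder, Item), emits each parent as its own row in the node-outer-union style, and uses hypothetical column aliases and sample rows; it is not the query DB2 generates.

```python
import sqlite3

# One branch per element type. Every branch produces the same wide row:
# a type column, the key columns, and NULL for columns it does not use.
OUTER_UNION = """
SELECT 0 AS type, c.id AS custID, c.name AS custName,
       NULL AS poID, NULL AS poDate, NULL AS itemID, NULL AS itemDesc
FROM Customer c
UNION ALL
SELECT 1, c.id, NULL, p.id, p.date, NULL, NULL
FROM Customer c JOIN PurchOrder p ON p.custID = c.id
UNION ALL
SELECT 2, c.id, NULL, p.id, NULL, i.id, i."desc"
FROM Customer c JOIN PurchOrder p ON p.custID = c.id
                JOIN Item i ON i.poID = p.id
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customer(id INT, name VARCHAR);
        CREATE TABLE PurchOrder(id INT, custID INT, acctID VARCHAR, date VARCHAR);
        CREATE TABLE Item(id INT, poID INT, "desc" VARCHAR);
        INSERT INTO Customer VALUES (1, 'Dilys Thomas');
        INSERT INTO Customer VALUES (2, 'Ann Lee');
        INSERT INTO PurchOrder VALUES (10, 1, 'A1', '2000-01-10');
        INSERT INTO Item VALUES (100, 10, 'Gift for Consulate');
        INSERT INTO Item VALUES (101, 10, 'Stamps');
    """)
    return conn


def outer_union(conn):
    """Return the untagged content rows; tagging happens in a later phase."""
    return conn.execute(OUTER_UNION).fetchall()


if __name__ == "__main__":
    for row in outer_union(make_db()):
        print(row)
```

Because parents travel as separate rows, a customer without orders still appears in the result, and child rows carry only the parent IDs needed to attach them later.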

Contd… Two phases: content creation, then tagging/structuring. Inside the engine: XMLAGG and XMLCONSTRUCT support required; tagging is the final step, after the content is generated; CLOBs are not carried around. Outside the engine: group siblings; eliminate duplicates; extract information and tag.

Grouping data: HASH! Every row in the final table has a column with the name of its element together with all its parents (a.b.c.d.e). Check whether the parent path (a.b.c.d) is already in the hash table; if so, tag the row and add it as another child at that level. Otherwise check the next ancestor up (a.b.c), add the missing levels, and so on until you either find a hash hit or reach the root element. Tuples can come in any order. Sufficient memory required!
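
The hash-based grouping can be sketched as follows. This is an illustration under assumptions: rows use the hypothetical layout (type, custID, custName, poID, poDate, itemID, itemDesc), the hash key is the tuple of ancestor IDs, and the element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

TAGS = ("customer", "porder", "item")

# Hypothetical outer-union rows:
# (type, custID, custName, poID, poDate, itemID, itemDesc)
SAMPLE_ROWS = [
    (0, 1, "Dilys Thomas", None, None, None, None),
    (1, 1, None, 10, "2000-01-10", None, None),
    (2, 1, None, 10, None, 100, "Gift for Consulate"),
    (2, 1, None, 10, None, 101, "Stamps"),
    (0, 2, "Ann Lee", None, None, None, None),
]


def tag_unsorted(rows):
    """Build the XML tree from rows that may arrive in any order."""
    root = ET.Element("customers")
    nodes = {}  # hash table: id path, e.g. (1, 10, 100) -> Element

    def get_node(path):
        # Hash hit: the element already exists.
        if path in nodes:
            return nodes[path]
        # Miss: walk up until an ancestor (or the root) is found,
        # creating the missing levels on the way back down.
        parent = root if len(path) == 1 else get_node(path[:-1])
        node = ET.SubElement(parent, TAGS[len(path) - 1], id=str(path[-1]))
        nodes[path] = node
        return node

    for type_, cust_id, cust_name, po_id, po_date, item_id, item_desc in rows:
        node = get_node((cust_id, po_id, item_id)[: type_ + 1])
        if type_ == 0:
            node.set("name", cust_name)
        elif type_ == 1:
            node.set("date", po_date)
        else:
            node.text = item_desc
    return root


if __name__ == "__main__":
    print(ET.tostring(tag_unsorted(reversed(SAMPLE_ROWS)), encoding="unicode"))
```

The whole tree and the hash table stay in memory until the last row has been seen, which is the memory requirement the slide warns about.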

Late tagging, early structuring. As before, only now SORT the outer union. Ensure parent information comes before child information. Information about a node and its descendants is completed before information about any other node starts. Ordering follows user-defined conditions.

Sort and tag. Sort on primary keys: define an order on the primary keys (CustID, AcId, POId, ItemId, PaymentID) based on the structure of the XML document. Parent tuples have values in the first few columns and NULLs in the later ones; NULLs sort low. Tag in constant memory: the maximum amount of information to be stored is proportional only to the depth of the XML document.
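
A sketch of sort-then-tag on a reduced key order (custID, poID, itemID). The row layout, tag names, and sample data are assumptions for illustration, and the sort is done in Python here rather than by the database. The tagger streams its output and keeps only the IDs of the currently open elements.

```python
from xml.sax.saxutils import escape, quoteattr

TAGS = ("customer", "porder", "item")

# Hypothetical outer-union rows:
# (type, custID, custName, poID, poDate, itemID, itemDesc)
SAMPLE_ROWS = [
    (2, 1, None, 10, None, 101, "Stamps"),
    (0, 2, "Ann Lee", None, None, None, None),
    (1, 1, None, 10, "2000-01-10", None, None),
    (2, 1, None, 10, None, 100, "Gift for Consulate"),
    (0, 1, "Dilys Thomas", None, None, None, None),
]


def sort_key(row):
    # NULLs sort low: (False, 0) orders before (True, value).
    keys = (row[1], row[3], row[5])
    return tuple((k is not None, 0 if k is None else k) for k in keys)


def tag_sorted(rows):
    """Yield XML fragments for rows already sorted parent-before-child."""
    open_ids = []  # IDs of open elements; never deeper than the document
    yield "<customers>"
    for type_, cust_id, cust_name, po_id, po_date, item_id, item_desc in rows:
        path = [cust_id, po_id, item_id][: type_ + 1]
        ancestors = path[:-1]
        keep = 0
        while (keep < len(open_ids) and keep < len(ancestors)
               and open_ids[keep] == ancestors[keep]):
            keep += 1
        # Close every open element that is not an ancestor of this row.
        while len(open_ids) > keep:
            yield f"</{TAGS[len(open_ids) - 1]}>"
            open_ids.pop()
        if keep != len(ancestors):
            raise ValueError("input is not sorted parent-before-child")
        ident = quoteattr(str(path[-1]))
        if type_ == 0:
            yield f"<customer id={ident} name={quoteattr(cust_name)}>"
        elif type_ == 1:
            yield f"<porder id={ident} date={quoteattr(po_date)}>"
        else:
            yield f"<item id={ident}>{escape(item_desc)}"
        open_ids.append(path[-1])
    while open_ids:
        yield f"</{TAGS[len(open_ids) - 1]}>"
        open_ids.pop()
    yield "</customers>"


def publish(rows):
    return "".join(tag_sorted(sorted(rows, key=sort_key)))


if __name__ == "__main__":
    print(publish(SAMPLE_ROWS))
```

Since `open_ids` holds at most one ID per nesting level, the tagger's state is bounded by the document depth rather than by the number of rows.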

Modeling transformations: query fan-out; query nesting depth; number of root nodes (tuples in the root table); number of leaf tuples (tuples corresponding to all leaf nodes); structure; result size.

Results (graph time!!) “Inside the engine” versions are about 3 times as fast as their “outside” counterparts.

Inside vs. outside the engine. Cost components: query execution, bind-out, tag/structure, write XML to file. Bind-out time is not required for the inside-the-engine approaches.

Query fan-out. Increasing query fan-out means more joins and more time. CorrCLOB has to use a nested-loop join order: bad performance. Unsorted OU is better than Sorted OU: the cost of sorting exceeds the cost of the more complex tagging. DecorrCLOB is optimized by the DB2 engine; CLOBs are retained in memory (low fan-out).

Query depth. DecorrCLOB: huge increase! As the complexity of the queries increases, the engine makes bad choices (sorting after XMLAGG, etc.).

Number of roots. Outer Union approaches are not affected. CorrCLOB at #roots = 1 is equivalent to just 2 queries!

Number of tuples, memory. With sufficient memory, no great changes! Without it, Unsorted OU, which requires a large amount of space for tagging, fails (overflow!). Sorted OU is based on scalable sorting, so it adapts better to large sizes and less memory.

Roadmap: language specification; implementation (early tagging and structuring; late tagging and structuring; early structuring with late tagging); performance evaluation. Quo vadis?