
Introduction
XML: an emerging standard for exchanging data on the WWW.
Relational database: the most widely used type of DBMS.
Goal: how to map relational data into XML documents.

Introduction (cont’d)
“Efficiently Publishing Relational Data as XML Documents”
Lei Jiang: a language for conversion, implementations
Yan Zhang: implementations
Yong Zhuge: performance comparison

Language
SQL with minor scalar and aggregate function extensions for XML construction
Advantage: use existing APIs and processing infrastructure of RDBMS
Other language proposals
Example

John Doe
//first purchase order
1 Jan 2000
Shoes
Bungee Ropes
due January 15
due January 20
due February 15
//second purchase order
…

Customer(id integer, name varchar(20))
Account(id varchar(20), custId integer, acctnum integer)
PurchOrder(id integer, custId integer, acctId varchar(20), date varchar(10))
Item(id integer, poId integer, desc varchar(10))
Payment(id integer, poId integer, desc varchar(10))

Select cust.name, CUST(cust.id, cust.name,
    (Select XMLAGG(ACCT(acct.id, acct.acctnum))
     From Account acct
     Where acct.custId = cust.id),
    (Select XMLAGG(PORDER(porder.id, porder.acctId, porder.date,
        (Select XMLAGG(ITEM(item.id, item.desc))
         From Item item
         Where item.poId = porder.id),
        (Select XMLAGG(PAYMENT(pay.id, pay.desc))
         From Payment pay
         Where pay.poId = porder.id)))
     From PurchOrder porder
     Where porder.custId = cust.id))
From Customer cust

Define XML Constructor CUST (custId: integer, custName: varchar(20), acctList: xml, porderList: xml) AS {
    $custName
    $acctList
    $porderList
}
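The roles of the scalar constructor (CUST, ACCT) and the aggregate (XMLAGG) can be sketched outside the database in Python. This is only an illustration: the element names (customer, name, accounts, porders, account), the function names, and the sample account numbers are assumptions, not the paper's definitions.

```python
from xml.sax.saxutils import escape, quoteattr


def xmlagg(fragments):
    """Aggregate: concatenate the XML fragments produced for a group of rows."""
    return "".join(fragments)


def acct(acct_id, acctnum):
    """Scalar constructor for one account row (element name is assumed)."""
    return f"<account id={quoteattr(str(acct_id))}>{acctnum}</account>"


def cust(cust_id, cust_name, acct_list, porder_list):
    """Scalar constructor: wrap scalar values and already-built XML fragments."""
    return (
        f"<customer id={quoteattr(str(cust_id))}>"
        f"<name>{escape(cust_name)}</name>"
        f"<accounts>{acct_list}</accounts>"
        f"<porders>{porder_list}</porders>"
        f"</customer>"
    )


# Hypothetical sample rows: (account id, account number).
accounts = xmlagg(acct(i, n) for i, n in [("A1", 1001), ("A2", 1002)])
doc = cust(1, "John Doe", accounts, "")
print(doc)
```

The nesting in the SQL query maps onto nested calls here: each correlated sub-query yields one aggregated fragment that is passed to the enclosing constructor as an argument.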

Implementation
Add tags and structure to the relational tables
Early Tagging, Early Structuring
Late Tagging, Late Structuring
Late Tagging, Early Structuring
Outside Engine, Inside Engine

Early Tagging, Early Structuring
Outside engine: Stored Procedure Approach
– Simplest technique, commonly used
– Drawback: overhead of issuing many queries
Inside engine: Correlated CLOB and De-Correlated CLOB Approaches
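A minimal sketch of the outside-engine, nested-query style, using Python's sqlite3 module as a stand-in for a stored procedure. The reduced schema, the sample rows, and the element names are assumptions for illustration. The counter makes the drawback visible: one query is issued per parent row at every level.

```python
import sqlite3
from xml.sax.saxutils import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer(id INTEGER, name TEXT);
    CREATE TABLE PurchOrder(id INTEGER, custId INTEGER, date TEXT);
    CREATE TABLE Item(id INTEGER, poId INTEGER, "desc" TEXT);
    INSERT INTO Customer VALUES (1, 'John Doe');
    INSERT INTO PurchOrder VALUES (10, 1, '1 Jan 2000');
    INSERT INTO Item VALUES (100, 10, 'Shoes'), (101, 10, 'Bungee Ropes');
""")

stats = {"queries": 0}  # counts how many queries the publisher issues


def run(sql, params=()):
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def publish():
    """Tag and structure while iterating: one nested query per parent row."""
    out = []
    for cid, name in run("SELECT id, name FROM Customer ORDER BY id"):
        out.append(f'<customer id="{cid}"><name>{escape(name)}</name>')
        for pid, date in run(
            "SELECT id, date FROM PurchOrder WHERE custId = ? ORDER BY id", (cid,)
        ):
            out.append(f'<porder id="{pid}"><date>{escape(date)}</date>')
            for iid, desc in run(
                'SELECT id, "desc" FROM Item WHERE poId = ? ORDER BY id', (pid,)
            ):
                out.append(f'<item id="{iid}">{escape(desc)}</item>')
            out.append("</porder>")
        out.append("</customer>")
    return "".join(out)


xml = publish()
print(stats["queries"], xml)
```

With one customer and one purchase order this already takes three queries; the count grows with the number of customers and purchase orders, which is the overhead the slide refers to.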

Late Tagging, Late Structuring
Content creation
– Relational data is produced
Tagging and structuring
– Relational data is structured and tagged to produce the XML document

Content Creation
Redundant Relation Approach
– Join every table together
– Simple
– Redundancy
Unsorted Outer Union Approach
– Compute each path using joins
– One tuple per data item at the leaf level
– Sub-expressions are shared to reduce redundancy

Content Creation (cont’d) (Unsorted Outer Union Approach)
[Figure: query plan. Customer is outer-joined with Account, giving (CustId, CustInfo, AcctId, AcctInfo), and with PurchaseOrder, giving (CustId, CustInfo, POId, POInfo). The PurchaseOrder result is outer-joined with Item, giving (CustId, CustInfo, POId, POInfo, ItemId, ItemInfo), and with Payment, giving (CustId, CustInfo, POId, POInfo, PaymentId, PaymentInfo). An Outer Union combines the results.]
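The difference between the two content-creation approaches can be sketched with sqlite3. The reduced schema (no Payment table), the sample rows, and the column aliases are assumptions for illustration, and the sharing of sub-expressions between branches is not shown. The outer union pads the columns of the other paths with NULL, so each leaf item appears in exactly one tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer(id INTEGER, name TEXT);
    CREATE TABLE Account(id TEXT, custId INTEGER, acctnum INTEGER);
    CREATE TABLE PurchOrder(id INTEGER, custId INTEGER, date TEXT);
    CREATE TABLE Item(id INTEGER, poId INTEGER, "desc" TEXT);
    INSERT INTO Customer VALUES (1, 'John Doe');
    INSERT INTO Account VALUES ('A1', 1, 1001), ('A2', 1, 1002);
    INSERT INTO PurchOrder VALUES (10, 1, '1 Jan 2000');
    INSERT INTO Item VALUES (100, 10, 'Shoes'), (101, 10, 'Bungee Ropes'),
                            (102, 10, 'Helmet');
""")

# Redundant relation: join everything, so accounts and items multiply.
REDUNDANT = """
    SELECT c.id, a.id, p.id, i.id
    FROM Customer c
    LEFT JOIN Account a ON a.custId = c.id
    LEFT JOIN PurchOrder p ON p.custId = c.id
    LEFT JOIN Item i ON i.poId = p.id
"""

# Unsorted outer union: one branch per root-to-leaf path, NULL-padded.
OUTER_UNION = """
    SELECT c.id AS custId, c.name AS custName,
           a.id AS acctId, a.acctnum AS acctNum,
           NULL AS poId, NULL AS poDate, NULL AS itemId, NULL AS itemDesc
    FROM Customer c LEFT JOIN Account a ON a.custId = c.id
    UNION ALL
    SELECT c.id, c.name, NULL, NULL, p.id, p.date, i.id, i."desc"
    FROM Customer c
    JOIN PurchOrder p ON p.custId = c.id
    LEFT JOIN Item i ON i.poId = p.id
"""

redundant_rows = conn.execute(REDUNDANT).fetchall()
outer_union_rows = conn.execute(OUTER_UNION).fetchall()
print(len(redundant_rows), len(outer_union_rows))
```

With two accounts and three items, the redundant join returns 2 × 3 = 6 rows, while the outer union returns 2 + 3 = 5; the gap widens as the fan-out grows.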

Structuring & Tagging (Hash-based Tagger)
Two things need to be done:
1. Group all siblings in the desired XML document under the same parent
– To recognize siblings, look for tuples that share the same parent
– A main-memory hash table does this, keyed on the parent’s type and id information
2. Extract the information from each tuple and tag it to produce the XML result
– This is done after all the input tuples have been hashed
– The output process is straightforward
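The two steps can be sketched in Python over a handful of unsorted outer-union tuples. The tuple layout, the element names, and the simplified output shape (node text placed directly inside its element) are assumptions for illustration; children are sorted on output only to make the result deterministic.

```python
from collections import defaultdict
from xml.sax.saxutils import escape

# Hypothetical layout:
# (custId, custName, acctId, acctNum, poId, poDate, itemId, itemDesc)
rows = [
    (1, "John Doe", None, None, 10, "1 Jan 2000", 101, "Bungee Ropes"),
    (1, "John Doe", "A1", 1001, None, None, None, None),
    (1, "John Doe", None, None, 10, "1 Jan 2000", 100, "Shoes"),
]


def hash_tag(rows):
    # Step 1: hash every tuple under its parent's (type, id).
    children = defaultdict(dict)  # (parent kind, id) -> {(child kind, id): text}
    roots = {}
    for cid, cname, aid, anum, pid, pdate, iid, idesc in rows:
        roots[cid] = cname
        if aid is not None:
            children[("customer", cid)][("account", aid)] = anum
        if pid is not None:
            children[("customer", cid)][("porder", pid)] = pdate
            if iid is not None:
                children[("porder", pid)][("item", iid)] = idesc

    # Step 2: once everything is hashed, walk the groups and add tags.
    def emit(kind, ident, text):
        kids = sorted(children.get((kind, ident), {}).items())
        inner = "".join(emit(k, i, t) for (k, i), t in kids)
        return f'<{kind} id="{ident}">{escape(str(text))}{inner}</{kind}>'

    return "".join(emit("customer", cid, name) for cid, name in sorted(roots.items()))


xml = hash_tag(rows)
print(xml)
```

Because nothing can be written until every tuple has been hashed, the table must hold the whole result, which is where the memory pressure of this approach comes from.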

Late Tagging, Early Structuring
Why?
– Late tagging with late structuring needs complex memory management
– “Structured content” and a “constant space tagger” eliminate this problem
Structured content creation (Sorted Outer Union)
– The key is to order the relational content the same way that it needs to appear in the result XML document
– Two conditions must be satisfied:
1. Parent information occurs before, or with, child information
2. Information about a particular node and its descendants is not mixed in with information about non-descendant nodes

Late Tagging, Early Structuring (cont’d)
– A single final relational sort of the unstructured relational content is sufficient
– Null values are sorted first, so parents are always sorted before their children
– A parent’s id occurs before the child’s id in the sort order, which ensures that the children of a parent node are grouped together
Tagging Sorted Data
– Easy: the tuples are already in order
– Add tags and write out
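A minimal sketch of a nulls-first sort followed by a constant-space tagger. The tuple layout (one row per node, identified by its deepest non-null id), the element names, and the sample rows are assumptions for illustration. The tagger remembers only the path of currently open elements, so its state is bounded by the document depth rather than by the data size.

```python
from xml.sax.saxutils import escape

KINDS = ("customer", "porder", "item", "payment")

# Hypothetical layout: (custId, poId, itemId, payId, text), unsorted.
rows = [
    (1, 10, 101, None, "Bungee Ropes"),
    (1, 10, None, 200, "due January 15"),
    (1, None, None, None, "John Doe"),
    (1, 10, 100, None, "Shoes"),
    (1, 10, None, None, "1 Jan 2000"),
]


def nulls_first(row):
    """Sort key on the id columns; (False, None) sorts before (True, value)."""
    return tuple((v is not None, v) for v in row[:4])


def tag_sorted(sorted_rows):
    out, stack = [], []  # stack holds only the currently open path
    for *ids, text in sorted_rows:
        path = [(k, i) for k, i in zip(KINDS, ids) if i is not None]
        # Find how much of the open path is shared with this node's path.
        keep = 0
        while keep < min(len(stack), len(path)) and stack[keep] == path[keep]:
            keep += 1
        # Close everything that is not an ancestor of this node.
        while len(stack) > keep:
            out.append(f"</{stack.pop()[0]}>")
        # Open the new part of the path, then write the node's text.
        for kind, ident in path[keep:]:
            stack.append((kind, ident))
            out.append(f'<{kind} id="{ident}">')
        out.append(escape(str(text)))
    while stack:
        out.append(f"</{stack.pop()[0]}>")
    return "".join(out)


xml = tag_sorted(sorted(rows, key=nulls_first))
print(xml)
```

In this sketch the output is collected in a list for convenience; a real tagger would write each fragment to the output stream as soon as it is produced.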

Performance Comparison of Alternatives for Publishing XML
The parameters in our experiment:
1) Query fan out
2) Query depth
3) Number of roots
4) Number of leaf tuples
(Only balanced queries are considered in our experiment.)

Performance Comparison of Alternatives for Publishing XML
Parameter Settings for Experiment

Parameter        Range of values      Default
Query Fan Out    2, 3, 4              2
Query Depth      2, 3, 4              2
# Roots          1, 50, 500, 5000,
# Leaf Tuples    , ,

Summary and Conclusion
This paper introduced, implemented, and tested a mechanism for converting relational data to XML documents. The approaches tested include Stored Proc, CLOB-Corr, CLOB-DeCorr, Unsorted OU (In/Out), and Sorted OU (In/Out). The results point to the following conclusions:
1) Constructing an XML document inside the relational engine is far more efficient than doing so outside the engine, mainly because of the high cost of binding out tuples to host variables.
2) When processing can be done in main memory, the Unsorted Outer Union approach is stable and always among the very best (both inside and outside the engine).
3) When processing cannot be done in main memory, the Sorted Outer Union approach is the approach of choice (both inside and outside the engine), because the relational sort operator scales well.