
Introduction XML: an emerging standard for exchanging data on the WWW. Relational database: most widely used DBMS. Goal: how to map the relational data.



2 Introduction XML: an emerging standard for exchanging data on the WWW. Relational database: the most widely used DBMS. Goal: how to map relational data into XML documents.

3 Introduction (cont’d) Paper: “Efficiently Publishing Relational Data as XML Documents”. Lei Jiang: the language for conversion and its implementations. Yan Zhang: implementations. Yong Zhuge: performance comparison.

4 Language SQL with minor scalar and aggregate function extensions for XML construction. Advantage: reuses the existing APIs and query-processing infrastructure of the RDBMS. Other language proposals exist. Example:

5 Example XML document: customer John Doe; accounts 1894654 and 3849342; first purchase order: date 1 Jan 2000, items Shoes and Bungee Ropes, payments due January 15, due January 20 and due February 15; second purchase order …
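The following minimal Python sketch rebuilds the first purchase order of this example with the standard `xml.etree.ElementTree` module, so the target shape is concrete. The transcript keeps only the text values. The element and attribute names (`customer`, `name`, `account`, `porder`, `date`, `item`, `payment`, `id`, `acct`), the ids, and the account the order uses are all assumptions. The second purchase order is left out because the slide elides it.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names, attribute names and ids; only the text values come from the slide.
def build_customer():
    cust = ET.Element("customer", id="1")
    ET.SubElement(cust, "name").text = "John Doe"
    for acct_id, acctnum in [("A1", "1894654"), ("A2", "3849342")]:
        ET.SubElement(cust, "account", id=acct_id).text = acctnum
    # First purchase order; which account it is charged to is an assumption.
    po = ET.SubElement(cust, "porder", id="1", acct="A1")
    ET.SubElement(po, "date").text = "1 Jan 2000"
    for item_id, desc in [("1", "Shoes"), ("2", "Bungee Ropes")]:
        ET.SubElement(po, "item", id=item_id).text = desc
    for pay_id, desc in [("1", "due January 15"), ("2", "due January 20"),
                         ("3", "due February 15")]:
        ET.SubElement(po, "payment", id=pay_id).text = desc
    return cust

doc = build_customer()
print(ET.tostring(doc, encoding="unicode"))
```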

6 Relational schema:
Customer(id integer, name varchar(20))
Account(id varchar(20), custId integer, acctnum integer)
PurchOrder(id integer, custId integer, acctId varchar(20), date varchar(10))
Item(id integer, poId integer, desc varchar(10))
Payment(id integer, poId integer, desc varchar(10))
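This minimal sketch uses Python's built-in `sqlite3` module to create the five tables and load the first purchase order of the example. SQLite, the ids, and the account used by the order are assumptions. The second order is omitted because the slide elides it. `desc` is quoted because DESC is an SQL keyword. SQLite does not enforce `varchar` lengths, so values longer than 10 characters, such as 'Bungee Ropes', are accepted.

```python
import sqlite3

# Schema from the slide plus hypothetical sample rows taken from the example document.
SCHEMA_AND_DATA = """
CREATE TABLE Customer   (id INTEGER, name VARCHAR(20));
CREATE TABLE Account    (id VARCHAR(20), custId INTEGER, acctnum INTEGER);
CREATE TABLE PurchOrder (id INTEGER, custId INTEGER, acctId VARCHAR(20), date VARCHAR(10));
CREATE TABLE Item       (id INTEGER, poId INTEGER, "desc" VARCHAR(10));
CREATE TABLE Payment    (id INTEGER, poId INTEGER, "desc" VARCHAR(10));
INSERT INTO Customer   VALUES (1, 'John Doe');
INSERT INTO Account    VALUES ('A1', 1, 1894654), ('A2', 1, 3849342);
INSERT INTO PurchOrder VALUES (1, 1, 'A1', '1 Jan 2000');
INSERT INTO Item       VALUES (1, 1, 'Shoes'), (2, 1, 'Bungee Ropes');
INSERT INTO Payment    VALUES (1, 1, 'due January 15'), (2, 1, 'due January 20'),
                              (3, 1, 'due February 15');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)

# Row count per table, to check that the load worked.
counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ["Customer", "Account", "PurchOrder", "Item", "Payment"]}
print(counts)
```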

7 Select cust.name, CUST(cust.id, cust.name,
    (Select XMLAGG(ACCT(acct.id, acct.acctnum))
     From Account acct
     Where acct.custId = cust.id),
    (Select XMLAGG(PORDER(porder.id, porder.acctId, porder.date,
         (Select XMLAGG(ITEM(item.id, item.desc))
          From Item item
          Where item.poId = porder.id),
         (Select XMLAGG(PAYMENT(pay.id, pay.desc))
          From Payment pay
          Where pay.poId = porder.id)))
     From PurchOrder porder
     Where porder.custId = cust.id))
From Customer cust

8 Define XML Constructor CUST (custId: integer, custName: varchar(20), acctList: xml, porderList: xml) AS { $custName $acctList $porderList }
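SQLite has no `XMLAGG` or `Define XML Constructor`, but both can be imitated with user-defined functions. That is enough to run essentially the query of slide 7. Two changes apply: `desc` is quoted, and the purchase order's account column is taken from the schema as `acctId`. In this sketch each constructor is a Python function that returns an XML string, and `XMLAGG` is a custom aggregate that concatenates fragments. The tag names, ids and sample rows are assumptions, values are not XML-escaped, and fragments inside one `XMLAGG` come out in whatever order SQLite scans the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer   (id INTEGER, name VARCHAR(20));
CREATE TABLE Account    (id VARCHAR(20), custId INTEGER, acctnum INTEGER);
CREATE TABLE PurchOrder (id INTEGER, custId INTEGER, acctId VARCHAR(20), date VARCHAR(10));
CREATE TABLE Item       (id INTEGER, poId INTEGER, "desc" VARCHAR(10));
CREATE TABLE Payment    (id INTEGER, poId INTEGER, "desc" VARCHAR(10));
INSERT INTO Customer   VALUES (1, 'John Doe');
INSERT INTO Account    VALUES ('A1', 1, 1894654), ('A2', 1, 3849342);
INSERT INTO PurchOrder VALUES (1, 1, 'A1', '1 Jan 2000');
INSERT INTO Item       VALUES (1, 1, 'Shoes'), (2, 1, 'Bungee Ropes');
INSERT INTO Payment    VALUES (1, 1, 'due January 15'), (2, 1, 'due January 20'),
                              (3, 1, 'due February 15');
""")

# Hypothetical stand-ins for the CUST, ACCT, PORDER, ITEM and PAYMENT constructors.
def cust(cust_id, name, acct_list, porder_list):
    return (f'<customer id="{cust_id}"><name>{name}</name>'
            f'{acct_list or ""}{porder_list or ""}</customer>')

def acct(acct_id, acctnum):
    return f'<account id="{acct_id}">{acctnum}</account>'

def porder(po_id, acct_id, date, item_list, payment_list):
    return (f'<porder id="{po_id}" acct="{acct_id}"><date>{date}</date>'
            f'{item_list or ""}{payment_list or ""}</porder>')

def item(item_id, desc):
    return f'<item id="{item_id}">{desc}</item>'

def payment(pay_id, desc):
    return f'<payment id="{pay_id}">{desc}</payment>'

class XmlAgg:
    """Aggregate that concatenates the XML fragments of one group."""
    def __init__(self):
        self.parts = []
    def step(self, fragment):
        if fragment is not None:
            self.parts.append(fragment)
    def finalize(self):
        return "".join(self.parts)

conn.create_function("CUST", 4, cust)
conn.create_function("ACCT", 2, acct)
conn.create_function("PORDER", 5, porder)
conn.create_function("ITEM", 2, item)
conn.create_function("PAYMENT", 2, payment)
conn.create_aggregate("XMLAGG", 1, XmlAgg)

QUERY = """
SELECT cust.name,
       CUST(cust.id, cust.name,
            (SELECT XMLAGG(ACCT(acct.id, acct.acctnum))
             FROM Account acct
             WHERE acct.custId = cust.id),
            (SELECT XMLAGG(PORDER(porder.id, porder.acctId, porder.date,
                        (SELECT XMLAGG(ITEM(item.id, item."desc"))
                         FROM Item item WHERE item.poId = porder.id),
                        (SELECT XMLAGG(PAYMENT(pay.id, pay."desc"))
                         FROM Payment pay WHERE pay.poId = porder.id)))
             FROM PurchOrder porder
             WHERE porder.custId = cust.id))
FROM Customer cust
"""
name, xml = conn.execute(QUERY).fetchone()
print(xml)
```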

9 Implementation Add tags and structure to the relational data. Alternatives: Early Tagging, Early Structuring; Late Tagging, Late Structuring; Late Tagging, Early Structuring. Each can be implemented outside or inside the relational engine.

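10 Early Tagging, Early Structuring Outside the engine: the Stored Procedure approach. It is the simplest technique and the most commonly used. Drawback: the overhead of issuing many queries, since the children of each parent row are fetched by separate queries. Inside the engine: the Correlated CLOB and De-Correlated CLOB approaches.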
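The stored procedure approach amounts to a nested loop. The client issues a new query for every parent row and tags the results as they arrive. The Python/`sqlite3` sketch below uses hypothetical tag names and sample data and counts the queries it issues. For C customers with P purchase orders each, the count is 1 + 2C + 2CP, and that growth is the overhead the slide refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer   (id INTEGER, name VARCHAR(20));
CREATE TABLE Account    (id VARCHAR(20), custId INTEGER, acctnum INTEGER);
CREATE TABLE PurchOrder (id INTEGER, custId INTEGER, acctId VARCHAR(20), date VARCHAR(10));
CREATE TABLE Item       (id INTEGER, poId INTEGER, "desc" VARCHAR(10));
CREATE TABLE Payment    (id INTEGER, poId INTEGER, "desc" VARCHAR(10));
INSERT INTO Customer   VALUES (1, 'John Doe');
INSERT INTO Account    VALUES ('A1', 1, 1894654), ('A2', 1, 3849342);
INSERT INTO PurchOrder VALUES (1, 1, 'A1', '1 Jan 2000');
INSERT INTO Item       VALUES (1, 1, 'Shoes'), (2, 1, 'Bungee Ropes');
INSERT INTO Payment    VALUES (1, 1, 'due January 15'), (2, 1, 'due January 20'),
                              (3, 1, 'due February 15');
""")

def publish_stored_proc(conn):
    """Nested-loop publishing: one query per parent row, tagging done in the client."""
    queries = 0

    def run(sql, args=()):
        nonlocal queries
        queries += 1
        return conn.execute(sql, args).fetchall()

    out = []
    for cust_id, name in run("SELECT id, name FROM Customer ORDER BY id"):
        out.append(f'<customer id="{cust_id}"><name>{name}</name>')
        for acct_id, acctnum in run(
                "SELECT id, acctnum FROM Account WHERE custId = ? ORDER BY id", (cust_id,)):
            out.append(f'<account id="{acct_id}">{acctnum}</account>')
        for po_id, acct_id, date in run(
                "SELECT id, acctId, date FROM PurchOrder WHERE custId = ? ORDER BY id", (cust_id,)):
            out.append(f'<porder id="{po_id}" acct="{acct_id}"><date>{date}</date>')
            for item_id, desc in run(
                    'SELECT id, "desc" FROM Item WHERE poId = ? ORDER BY id', (po_id,)):
                out.append(f'<item id="{item_id}">{desc}</item>')
            for pay_id, desc in run(
                    'SELECT id, "desc" FROM Payment WHERE poId = ? ORDER BY id', (po_id,)):
                out.append(f'<payment id="{pay_id}">{desc}</payment>')
            out.append("</porder>")
        out.append("</customer>")
    return "".join(out), queries

xml, n_queries = publish_stored_proc(conn)
print(n_queries)  # 5 for one customer with one purchase order
```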

11 Late Tagging, Late Structuring Content creation: the relational data is produced. Tagging and structuring: the relational data is structured and tagged to produce the XML document.

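12 Content Creation Redundant Relation approach: join every table together. Simple, but redundant, because each parent row is repeated for every combination of its children (e.g. every account of a customer paired with every item and payment of each order). Unsorted Outer Union approach: compute each path using joins; one tuple per data item at the leaf level; shared sub-expressions reduce redundancy.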
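A toy count shows the difference. Joining everything pairs each account with every (item, payment) combination, while an outer union stores each leaf data item once. The data mirrors the example with hypothetical ids ("A1", "A2", "PO1"). The customer columns are dropped because there is only one customer.

```python
from itertools import product

# Hypothetical ids; values from the example document.
accounts = ["A1", "A2"]
orders = {"PO1": {"items": ["Shoes", "Bungee Ropes"],
                  "payments": ["due January 15", "due January 20", "due February 15"]}}

# Redundant Relation: inner join of all tables, so the children multiply.
redundant = [(acct, po, item, pay)
             for acct in accounts
             for po, kids in orders.items()
             for item, pay in product(kids["items"], kids["payments"])]

# Outer union at the leaf level: one tuple per leaf item, NULL (None) elsewhere.
outer_union = ([(acct, None, None, None) for acct in accounts]
               + [(None, po, item, None) for po, kids in orders.items() for item in kids["items"]]
               + [(None, po, None, pay) for po, kids in orders.items() for pay in kids["payments"]])

print(len(redundant), len(outer_union))  # 12 7
```

The redundant relation grows with the product of the child counts (2 × 2 × 3), the outer union with their sum (2 + 2 + 3).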

13 Content Creation (cont’d) Unsorted Outer Union plan (diagram): Customer is outer-joined with Account and with PurchaseOrder; the PurchaseOrder result is outer-joined with Item and with Payment (left and right outer joins); the branch results are combined by an Outer Union. Branch output schemas: (CustId, CustInfo, AcctId, AcctInfo); (CustId, CustInfo, POId, POInfo); (CustId, CustInfo, POId, POInfo, ItemId, ItemInfo); (CustId, CustInfo, POId, POInfo, PaymentId, PaymentInfo).
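The plan can be written as one SQL statement. This sketch runs on SQLite from Python, and the sample data and column layout are hypothetical. The customer/purchase-order outer join is factored into a common table expression that the item and payment branches reuse, which expresses the shared sub-expression. Whether SQLite evaluates the CTE once or inlines it is up to its planner. Each branch pads the columns it does not own with NULLs so all branches can be unioned. As a simplification, the item and payment branches use inner joins, since a purchase order without items or payments is already represented by its own branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer   (id INTEGER, name VARCHAR(20));
CREATE TABLE Account    (id VARCHAR(20), custId INTEGER, acctnum INTEGER);
CREATE TABLE PurchOrder (id INTEGER, custId INTEGER, acctId VARCHAR(20), date VARCHAR(10));
CREATE TABLE Item       (id INTEGER, poId INTEGER, "desc" VARCHAR(10));
CREATE TABLE Payment    (id INTEGER, poId INTEGER, "desc" VARCHAR(10));
INSERT INTO Customer   VALUES (1, 'John Doe');
INSERT INTO Account    VALUES ('A1', 1, 1894654), ('A2', 1, 3849342);
INSERT INTO PurchOrder VALUES (1, 1, 'A1', '1 Jan 2000');
INSERT INTO Item       VALUES (1, 1, 'Shoes'), (2, 1, 'Bungee Ropes');
INSERT INTO Payment    VALUES (1, 1, 'due January 15'), (2, 1, 'due January 20'),
                              (3, 1, 'due February 15');
""")

# Columns: custId, custName, acctId, acctnum, poId, poDate, itemId, itemDesc, payId, payDesc
OUTER_UNION = """
WITH CustPO AS (  -- shared sub-expression: Customer LEFT OUTER JOIN PurchOrder
  SELECT c.id AS custId, c.name AS custName, po.id AS poId, po.date AS poDate
  FROM Customer c LEFT OUTER JOIN PurchOrder po ON po.custId = c.id)
SELECT c.id, c.name, a.id, a.acctnum, NULL, NULL, NULL, NULL, NULL, NULL
  FROM Customer c LEFT OUTER JOIN Account a ON a.custId = c.id
UNION ALL
SELECT custId, custName, NULL, NULL, poId, poDate, NULL, NULL, NULL, NULL
  FROM CustPO
UNION ALL
SELECT cp.custId, cp.custName, NULL, NULL, cp.poId, cp.poDate, i.id, i."desc", NULL, NULL
  FROM CustPO cp JOIN Item i ON i.poId = cp.poId
UNION ALL
SELECT cp.custId, cp.custName, NULL, NULL, cp.poId, cp.poDate, NULL, NULL, p.id, p."desc"
  FROM CustPO cp JOIN Payment p ON p.poId = cp.poId
"""
rows = conn.execute(OUTER_UNION).fetchall()
for row in rows:
    print(row)
```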

14 Structuring & Tagging (Hash-based Tagger) Two tasks: 1. Group all siblings of the desired XML document under the same parent. Siblings are recognized by having the same parent, so a main-memory hash table keyed on the parent's type and id is used. 2. Extract the information from each tuple and tag it to produce the XML result. This is done after all input tuples have been hashed; the output process is then straightforward.
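This minimal Python sketch follows the two steps. Phase 1 hashes every tuple under its parent's (type, id), and only then does phase 2 walk the table and emit tags. As a simplifying assumption, each input tuple describes one node as (type, id, parent_type, parent_id, text) rather than using the outer-union column layout. The tag names are hypothetical.

```python
from collections import defaultdict

def hash_tag(tuples):
    """Hash-based tagger sketch; each tuple is (type, id, parent_type, parent_id, text)."""
    children = defaultdict(list)  # (parent type, parent id) -> child keys, in arrival order
    content = {}                  # (type, id) -> text of that node
    roots = []

    # Phase 1: hash every input tuple under its parent; nothing is output yet.
    for typ, nid, ptype, pid, text in tuples:
        key = (typ, nid)
        if key in content:        # a node may occur in several tuples; keep it once
            continue
        content[key] = text
        if ptype is None:
            roots.append(key)
        else:
            children[(ptype, pid)].append(key)

    # Phase 2: after all tuples are hashed, walk from the roots and add tags.
    def emit(key, out):
        typ, nid = key
        out.append(f'<{typ} id="{nid}">{content[key]}')
        for child in children[key]:
            emit(child, out)
        out.append(f"</{typ}>")

    out = []
    for root in roots:
        emit(root, out)
    return "".join(out)

# Unsorted input: children may arrive before their parents.
rows = [
    ("payment", 2, "porder", 1, "due January 20"),
    ("account", "A1", "customer", 1, 1894654),
    ("item", 1, "porder", 1, "Shoes"),
    ("porder", 1, "customer", 1, "1 Jan 2000"),
    ("customer", 1, None, None, "John Doe"),
    ("payment", 1, "porder", 1, "due January 15"),
]
xml = hash_tag(rows)
print(xml)
```

Because output cannot start until every tuple is in the hash table, memory grows with the size of the whole document.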

15 Late Tagging, Early Structuring Why? Late tagging, late structuring needs complex memory management, because the hash table must hold the data of the whole document before any output is produced. "Structured content" plus a "constant space tagger" eliminate this problem. Structured content creation (Sorted Outer Union): the key is to order the relational content the same way it must appear in the resulting XML document. Two conditions must hold: 1. Parent information occurs before, or together with, child information. 2. Information about a particular node and its descendants is not interleaved with information about non-descendant nodes.
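One way to meet both conditions is to sort the outer-union tuples on their id columns from the root downward, with NULL sorting before any value. The Python sketch below assumes a hypothetical column layout (custId, acctId, poId, itemId, payId) and emulates NULLS FIRST with a key function. SQLite, for example, already treats NULL as smaller than any other value in an ascending ORDER BY.

```python
# Hypothetical outer-union tuples: (custId, acctId, poId, itemId, payId); None = NULL.
rows = [
    (1, None, 1, 2, None),        # item 2 of purchase order 1
    (1, "A2", None, None, None),  # account A2
    (1, None, 1, None, 1),        # payment 1 of purchase order 1
    (1, None, None, None, None),  # the customer itself
    (1, None, 1, None, None),     # purchase order 1 itself
    (1, "A1", None, None, None),  # account A1
    (1, None, 1, 1, None),        # item 1 of purchase order 1
]

def nulls_first(row):
    """Sort key placing NULL (None) before any non-NULL value in each column."""
    return tuple((0,) if v is None else (1, v) for v in row)

sorted_rows = sorted(rows, key=nulls_first)
for row in sorted_rows:
    print(row)
```

After the sort, the customer comes first, purchase order 1 precedes all of its payments and items, and those descendants are contiguous, followed by the accounts.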

16 Late Tagging, Early Structuring (cont’d) A single final relational sort of the unstructured relational content is sufficient. NULL values sort first, so parents always sort before their children. The parent's id precedes the child's id in the sort key, which ensures that the children of a parent node are grouped together. Tagging sorted data is easy: the tuples are already in document order, so the tagger only adds tags and writes them out.
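With sorted input, the tagger only needs a stack of currently open elements. Before emitting a tuple, it closes every open element that is not an ancestor of the tuple's node. The stack never grows beyond the depth of the document, so memory does not depend on document size. The sketch assumes a hypothetical column layout (custId, acctId, poId, itemId, payId, text) whose rows are already sorted on the id columns with NULLs first. The tag names are also hypothetical.

```python
# Hypothetical mapping: deepest non-NULL id column -> (tag, ancestor id columns).
LEVELS = {0: ("customer", ()), 1: ("account", (0,)), 2: ("porder", (0,)),
          3: ("item", (0, 2)), 4: ("payment", (0, 2))}

def node_path(row):
    """Path of (tag, id) pairs from the root down to the node this row describes."""
    ids = row[:5]
    col = max(i for i, v in enumerate(ids) if v is not None)
    tag, ancestors = LEVELS[col]
    return tuple((LEVELS[a][0], ids[a]) for a in ancestors) + ((tag, ids[col]),)

def stack_tag(sorted_rows):
    out, stack = [], []  # stack holds the paths of currently open elements
    for row in sorted_rows:
        path = node_path(row)
        # Close every open element that is not an ancestor of this node.
        while stack and stack[-1] != path[:len(stack[-1])]:
            out.append(f"</{stack.pop()[-1][0]}>")
        tag, nid = path[-1]
        out.append(f'<{tag} id="{nid}">{row[5]}')
        stack.append(path)
    while stack:
        out.append(f"</{stack.pop()[-1][0]}>")
    return "".join(out)

sorted_rows = [
    (1, None, None, None, None, "John Doe"),
    (1, None, 1, None, None, "1 Jan 2000"),
    (1, None, 1, None, 1, "due January 15"),
    (1, None, 1, 1, None, "Shoes"),
    (1, None, 1, 2, None, "Bungee Ropes"),
    (1, "A1", None, None, None, 1894654),
    (1, "A2", None, None, None, 3849342),
]
xml = stack_tag(sorted_rows)
print(xml)
```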

17 Performance Comparison of Alternatives for Publishing XML Parameters in our experiment: 1) query fan-out, 2) query depth, 3) number of roots, 4) number of leaf tuples. Only balanced queries are considered in our experiment.

18 Performance Comparison of Alternatives for Publishing XML
Parameter Settings for Experiment
Parameter        Range of values             Default
Query fan-out    2, 3, 4                     2
Query depth      2, 3, 4                     2
# Roots          1, 50, 500, 5000, 40000     5000
# Leaf tuples    160000, 320000, 480000      320000

19 Performance Comparison of Alternatives for publishing XML


21 Summary and Conclusion This paper introduced, implemented and tested mechanisms for converting relational data to XML documents. The approaches tested include Stored Proc, CLOB-Corr, CLOB-DeCorr, Unsorted OU (In/Out) and Sorted OU (In/Out). The results point to the following conclusions: 1) Constructing an XML document inside the relational engine is far more efficient than doing so outside the engine, mainly because of the high cost of binding out tuples to host variables. 2) When processing can be done in main memory, the Unsorted Outer Union approach is a stable choice that is always among the very best, both inside and outside the engine. 3) When processing cannot be done in main memory, the Sorted Outer Union approach is the approach of choice, both inside and outside the engine, because the relational sort operator scales well.

