2005rel-xml-i1 Relational to XML Transformations: Background & Issues; Preliminaries; Execution strategies; The SilkRoute System

1 2005rel-xml-i1 Relational to XML Transformations: Background & Issues; Preliminaries; Execution strategies; The SilkRoute System

2 2005rel-xml-i2 Background & Issues : XML – a de-facto standard for data exchange (B2B) Business data is, and will continue to be, stored in relational db’s : reliability, optimized query processing, established applications => Need efficient generation of XML data from relational db’s Subject was investigated in several projects (early 2000’s) : Xperanto (IBM Almaden) SilkRoute (AT&T) PRATA (Bell Labs/Lucent)

3 2005rel-xml-i3 Issues : Relations are flat and unordered; tuples are pure data XML is nested, tagged, and ordered Need a language/interface to specify the needed data and its form Which parts of the work should be performed inside, and which outside, the relational engine? Relational engines are very good at optimizing SQL queries (efficient execution) and at sorting But they do not deal with tagging and do not generate XML

4 2005rel-xml-i4 Xperanto : The IBM team had access to DB2 internals Extended SQL by a few primitives to generate XML, and implemented these in the DB2 relational engine Analyzed the space of execution strategies using simulations; concluded that 2 strategies, both doing almost all work as one (extended) SQL query, are best But: We do not have access to db internals => The interesting part is their analysis/simulations

5 2005rel-xml-i5 SilkRoute : Relational db is presented as an XML view (standard transformation) Desired XML specified in XML query language –Initial version: home-brewed XML query language –Last version: XQuery Main idea: Composing the user query with the query that defines the standard view allows the data to be generated by SQL queries + tagging Found that one big SQL query is not always best

6 2005rel-xml-i6 PRATA : Use DTD’s as description of desired XML, & a generalization of attribute grammars with query actions to specify the needed data Can handle recursion in DTD’s (former approaches cannot) Can still optimize to use a small number of SQL queries for data generation

7 2005rel-xml-i7 Comment : We are now back in the GAV approach View: standard XML view of relational Desired XML: query on this view Main idea: query composition Complications: XML data is tagged, relational is not Nested data Different nesting in view and query target => Need to change structure => May need fusion

8 2005rel-xml-i8 Execution strategies The issue: Data is stored in relational tables, and can be retrieved with one/few/many SQL queries Which approach is more efficient? How can the approach be implemented, assuming the transformation is written in some XML-ish query language?

9 2005rel-xml-i9 Xperanto execution strategies : Base example :

10 2005rel-xml-i10 The source relational schema:

11 2005rel-xml-i11 Space of evaluation strategies : Early/late tagging Early/late structuring (to form the nested XML structure) All work inside the engine, or (at least part) outside the engine Some combinations are meaningless, e.g. early tagging/late structuring

12 2005rel-xml-i12 1 st strategy : early tagging, early structuring, outside the engine An application issues a sequence of SQL queries, matching the structure of the result e.g: For each customer do 1. retrieve root – customer info – cust. name & id retrieved, tagged & output 2. retrieve, tag, & output customer account info 3. retrieve, tag, & output customer purchase orders 4. for each PO, retrieve, tag, output items, then payment info Early structuring : queries follow structure of generated doc Early tagging : each element is tagged when retrieved Outside the engine : obvious
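The query-per-object loop of the 1st strategy can be sketched as follows. This is a minimal illustration, not the Xperanto code: it assumes a hypothetical two-table subset of the customer schema (the column names `name`, `acctNum` and the helper names `build_db`, `export_customers` are invented here) and uses SQLite in place of a production engine.

```python
import sqlite3
from xml.sax.saxutils import escape

def build_db():
    # Hypothetical subset of the schema; table contents are made up.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Account (id INTEGER PRIMARY KEY, custId INTEGER, acctNum TEXT);
        INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO Account VALUES (10, 1, 'A-1'), (11, 1, 'A-2');
    """)
    return db

def export_customers(db):
    """Early structuring and early tagging, outside the engine.

    Returns the XML text and the number of SQL queries issued."""
    parts, n_queries = [], 1
    customers = db.execute("SELECT id, name FROM Customer ORDER BY id").fetchall()
    for cid, name in customers:
        # Tag the root info as soon as it is retrieved.
        parts.append(f'<customer id="{cid}"><name>{escape(name)}</name>')
        # One extra query per customer: the application performs a nested-loop join.
        n_queries += 1
        accounts = db.execute(
            "SELECT acctNum FROM Account WHERE custId = ? ORDER BY id", (cid,))
        for (num,) in accounts:
            parts.append(f"<account>{escape(num)}</account>")
        parts.append("</customer>")
    return "".join(parts), n_queries

xml, n_queries = export_customers(build_db())
print(n_queries, xml)
```

With two customers the sketch issues three queries; adding purchase orders, items and payments would add further queries per customer and per order, which is the granularity problem this strategy suffers from.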

13 2005rel-xml-i13 Shortcomings : Many small granularity queries – several queries per “object” – serious performance problems Performs a nested loop join – a fixed join order and join strategy – the relational engine might explore others

14 2005rel-xml-i14 2 nd strategy: Early structuring, tagging inside the machine : For this, augment the db engine with New data type : xml Constructors for the kinds of elements in the document, e.g.

15 2005rel-xml-i15 Now, can express query as: XMLAGG aggregates several XML fragments into one For example, two accounts for the customer

16 2005rel-xml-i16 The XML fragments have variable size => represented as Character large objects (CLOBs) Problems: CLOBs are stored separately from their tuples, hence may need separate fetches Each XML constructor copies the input CLOBs to form its output CLOB – a lot of copying Advantage : One large query, rather than many small ones Still nested loop join, but possibly engine can select another strategy (?)

17 2005rel-xml-i17 3 rd strategy: Late structuring & tagging : If both structuring and tagging are done late (possibly outside engine), we can separate the process into two steps Content creation – retrieve the data from the db, inside the engine Structuring and tagging – the 1 st possibly inside, the 2 nd outside

18 2005rel-xml-i18 Contents creation – outer join approach :
Select cust.*, acct.*, porder.*, pay.*, item.*
From Customer cust
left join Account acct on cust.id = acct.custId
left join PurchaseOrder porder on cust.id = porder.custId
left join Item item on porder.id = item.poId
left join Payment pay on porder.id = pay.poId
Left join: a customer should occur in the result even if it has no account, or has an account but no purchase orders, etc. Result for a customer w/o some fields is padded with nulls Join is performed for each path in tree (root to leaf) in some order Disadvantages : ??
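One way to probe the question of disadvantages is to run the outer join on a tiny data set and count rows. The sketch below is an assumption-laden illustration: the tables carry only id columns, the data is invented, and SQLite stands in for the relational engine.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Account (id INTEGER PRIMARY KEY, custId INTEGER);
    CREATE TABLE PurchaseOrder (id INTEGER PRIMARY KEY, custId INTEGER);
    CREATE TABLE Item (id INTEGER PRIMARY KEY, poId INTEGER);
    CREATE TABLE Payment (id INTEGER PRIMARY KEY, poId INTEGER);
    -- Invented data: Ann has 2 accounts and 1 order with 3 items and 2 payments.
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Account VALUES (10, 1), (11, 1);
    INSERT INTO PurchaseOrder VALUES (20, 1);
    INSERT INTO Item VALUES (30, 20), (31, 20), (32, 20);
    INSERT INTO Payment VALUES (40, 20), (41, 20);
""")

rows = db.execute("""
    SELECT cust.id, acct.id, porder.id, item.id, pay.id
    FROM Customer cust
    LEFT JOIN Account acct ON cust.id = acct.custId
    LEFT JOIN PurchaseOrder porder ON cust.id = porder.custId
    LEFT JOIN Item item ON porder.id = item.poId
    LEFT JOIN Payment pay ON porder.id = pay.poId
""").fetchall()

source_tuples = 2 + 2 + 1 + 3 + 2   # rows stored across the five tables
print(len(rows), "result rows for", source_tuples, "source tuples")
```

Ann alone contributes 2 x 3 x 2 = 12 result rows because sibling branches (accounts, items, payments) are multiplied against each other, while Bob contributes one null-padded row. The result grows with the product of the branch sizes, which matches the data-redundancy complaint made in the performance comparison.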

19 2005rel-xml-i19 Contents creation – (unsorted) path outer union approach :
Select cust.*, acct.*, type = 1
From Customer cust left join Account acct on cust.id = acct.custId
Outer union
Select cust.*, porder.*, item.*, type = 2
From Customer cust left join PurchaseOrder porder on cust.id = porder.custId left join Item item on porder.id = item.poId
Outer union
Select cust.*, porder.*, pay.*, type = 3
From Customer cust left join PurchaseOrder porder on cust.id = porder.custId left join Payment pay on porder.id = pay.poId
Outer union: pads with nulls, like the left/right join, but does not duplicate data (as much) Each sub-query is a join for one leaf-to-root path Note: the Customer/PurchaseOrder join is repeated in two of the sub-queries

20 2005rel-xml-i20 Contents creation – (unsorted) node outer union approach : The previous strategy still has some redundancy: A parent’s info is replicated with each of its descendants Can avoid this by carrying only the id’s of the parents in the descendants
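The difference between the path and node variants can be made concrete on two tables. In this sketch the schema, data and column names are invented, and since SQLite has no `OUTER UNION`, it is emulated with `UNION ALL` plus explicit `NULL` padding.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Account (id INTEGER PRIMARY KEY, custId INTEGER, acctNum TEXT);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Account VALUES (10, 1, 'A-1'), (11, 1, 'A-2'), (12, 1, 'A-3');
""")

# Path variant: one sub-query per root-to-leaf path, so every account row
# repeats the full customer tuple.
path_rows = db.execute("""
    SELECT cust.id, cust.name, acct.id, acct.acctNum, 1 AS type
    FROM Customer cust LEFT JOIN Account acct ON cust.id = acct.custId
""").fetchall()

# Node variant: one sub-query per node type; a child row carries only the
# id of its parent, and the parent's data appears once in its own row.
node_rows = db.execute("""
    SELECT id AS custId, name, NULL AS acctId, NULL AS acctNum, 0 AS type FROM Customer
    UNION ALL
    SELECT custId, NULL, id, acctNum, 1 FROM Account
""").fetchall()

name_copies_path = sum(1 for r in path_rows if r[1] == "Ann")
name_copies_node = sum(1 for r in node_rows if r[1] == "Ann")
print(name_copies_path, name_copies_node)
```

Here the customer name is shipped three times in the path variant and once in the node variant; the node variant returns exactly one row per stored tuple.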

21 2005rel-xml-i21 Structuring & tagging (for unsorted outer union) : Use a hash-based tagger: a hash table whose key is the type and ancestor id’s of an element in the XML tree For each tuple in the relational result that defines a node in the tree, find out (by hashing) if its parent is present: Yes – just add this new node (tag) No – add nodes for all the missing ancestors along the path to root (hashing repeatedly for shorter paths) Main disadvantage : For large outputs, main memory shortage Also, does not necessarily satisfy required order – may need sorting
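A hash-based tagger along these lines might look like the sketch below. The row layout `(kind, path, label)`, where `path` holds the id's from the root down to the node, and the names `PARENT`, `get_node`, `hash_tag` are assumptions made for illustration; the actual tuple format used by Xperanto is not given in the slides.

```python
from xml.etree import ElementTree as ET

# Assumed tree shape: which element type is the parent of which.
PARENT = {"customer": None, "account": "customer", "order": "customer", "item": "order"}

def get_node(kind, path, nodes, root):
    """Return the element for (kind, path), creating missing ancestors on the way."""
    key = (kind, path)
    if key in nodes:                      # hash lookup: node already present
        return nodes[key]
    parent_kind = PARENT[kind]
    if parent_kind is None:
        parent = root
    else:                                 # hash again with the shorter path
        parent = get_node(parent_kind, path[:-1], nodes, root)
    node = ET.SubElement(parent, kind, id=str(path[-1]))
    nodes[key] = node
    return node

def hash_tag(rows):
    root = ET.Element("customers")
    nodes = {}                            # holds every node: memory grows with the output
    for kind, path, label in rows:
        get_node(kind, path, nodes, root).set("label", label)
    return root

# Tuples may arrive in any order; the item arrives before its ancestors here.
rows = [
    ("item", (1, 20, 30), "pen"),
    ("customer", (1,), "Ann"),
    ("order", (1, 20), "PO-20"),
    ("account", (1, 10), "A-1"),
]
print(ET.tostring(hash_tag(rows), encoding="unicode"))
```

The table keeps a reference to every element until the end, which illustrates the memory concern, and children appear in arrival order (the order before the account here), which illustrates why a separate sort may still be needed.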

22 2005rel-xml-i22 Last strategy : early structuring & late tagging : The idea: order the relational contents in the same order it needs to appear in the (flattened) XML file Then, tagging (& nesting) can be performed in constant space All info about a node X appears before/with the information of its children The info of X & its descendents appears together (no mixing with descendents of other nodes) The children are ordered as required by the XML def

23 2005rel-xml-i23 Contents creation – sorted path/node outer union approach : Same as the outer union approach, but add a final sort step Relational engines are sorting experts, including external sorting Sort on id fields, with id’s of higher nodes preceding those of lower nodes In example: CustId, AcctId, PoId, ItemId, PaymentId Nulls should be accounted for in sorting, and null values should precede non-nulls
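A sorted node outer union followed by a streaming tagger can be sketched as below. The assumptions are considerable: a four-table schema with invented columns (`label`, `acctNum`), `UNION ALL` with `NULL` padding standing in for outer union, a numeric `type` column, and SQLite, whose ascending sort happens to place NULLs first. The tagger keeps only a stack of open elements, so its memory is bounded by the depth of the tree rather than by the size of the output.

```python
import sqlite3
from xml.sax.saxutils import quoteattr

# type -> (tag name, depth in the XML tree); an encoding assumed for this sketch
TAGS = {0: ("customer", 1), 1: ("account", 2), 2: ("order", 2), 3: ("item", 3)}

def sorted_rows():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Account (id INTEGER PRIMARY KEY, custId INTEGER, acctNum TEXT);
        CREATE TABLE PurchaseOrder (id INTEGER PRIMARY KEY, custId INTEGER, label TEXT);
        CREATE TABLE Item (id INTEGER PRIMARY KEY, poId INTEGER, name TEXT);
        INSERT INTO Customer VALUES (2, 'Bob'), (1, 'Ann');
        INSERT INTO Account VALUES (10, 1, 'A-1');
        INSERT INTO PurchaseOrder VALUES (20, 1, 'PO-20');
        INSERT INTO Item VALUES (31, 20, 'ink'), (30, 20, 'pen');
    """)
    # ORDER BY 2, 3, 4, 5 sorts on custId, acctId, poId, itemId (NULLs first).
    return db.execute("""
        SELECT 0 AS type, id AS custId, NULL AS acctId, NULL AS poId,
               NULL AS itemId, name AS data FROM Customer
        UNION ALL SELECT 1, custId, id, NULL, NULL, acctNum FROM Account
        UNION ALL SELECT 2, custId, NULL, id, NULL, label FROM PurchaseOrder
        UNION ALL SELECT 3, po.custId, NULL, po.id, it.id, it.name
                  FROM Item it JOIN PurchaseOrder po ON it.poId = po.id
        ORDER BY 2, 3, 4, 5
    """).fetchall()

def tag(rows):
    """Stream tags from sorted rows, holding only the stack of open elements."""
    stack = []
    for type_, *_ids, data in rows:
        name, depth = TAGS[type_]
        while len(stack) >= depth:        # close finished siblings and subtrees
            yield f"</{stack.pop()}>"
        yield f"<{name} label={quoteattr(data)}>"
        stack.append(name)
    while stack:
        yield f"</{stack.pop()}>"

xml = "".join(tag(sorted_rows()))
print(xml)
```

Because NULLs sort first, each parent row precedes its children and each subtree is contiguous. With this particular key the order (whose AcctId is NULL) comes out before the account; obtaining a different sibling order would require a different sort key.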

24 2005rel-xml-i24 Performance comparisons (fig. 13) : Outer join performs badly – too much data redundancy (not shown) One large query is better than many small ones (stored proc.) Inside the engine outperforms outside the engine (for similar strategies): outside the engine needs to copy data and bind to external variables Binding-out time is a significant component for all approaches (shown in black in the figure)

