SilkRoute: A Framework for Publishing Rational Data in XML

SilkRoute: A Framework for Publishing Rational Data in XML
Mary F. Fernandez Yana Kadiyska Dan Suciu Atsuyuki Morishima Wang Chiew Tan ACM Transactions on Database System 2002, volume 27, no. 4

Outline Introduction System Architecture XQuery to SQL Translation
XML Publishing Optimization Conclusion

Introduction

Introduction XML as language for data exchange between applications
Platform neutral Storage independent Semantics rich But most existing data are stored in RDBMS Conversion from RDBMS to XML should be General Selective Efficient

Design Goals General Two different data representations
Inherent complex conversion No public XML schema matches exactly relational counterparts Need for mapping multiple XML to same relational source under different situations XML Relational Nested Flat Unnormalized Normalized Public Proprietary

Design Goals Selective Efficient XML schema as virtual view
XQuery may only extract parts of the view Only materialize the data fragment needed Efficient Reduce computational overhead as much as possible Exploit optimization and evaluation engines provided by DB

Contributions XQuery to SQL Translation
Separate structure of XML output from computation which constructs it Introduce view forest which nodes are mapping from XML to SQL Optimization Problem for XML Publishing XQuery is mapped to one or more SQL Selecting the most efficient set

System Architecture

System Architecture 3-tier architecture Backend database for storage
SilkRoute as middleware Third party application (Client)

System Architecture SilkRoute transforms relational schema to canonical view forest

System Architecture Administrator uses XQuery to extract public XML view over canonical view, this gives public view forest

System Architecture Client application uses XQuery to data in interest, this gives application view forest

System Architecture Planner inside SilkRoute gets application view forest to produce SQLQuery (which then enters RDBMS) XMLTemplate (which then enters XML Generator)

System Architecture Tuples from RDBMS RDBMS is fed into the XMLTemplate inside generator Final XML output is then returned to the client

XQuery to SQL Translation

View Forest Internal representation of XML, a critical role in SilkRoute Each node carries XML label SQL query fragment For internal nodes, XML label is either element or attribute names; SQL fragment contains FROM and WHERE clauses For leaf nodes, XML label is atomic type; SQL fragment consists of SELECT clause

View Forest – SQL Association
A complete SQL query Cn is associated with each node n SELECT – if n is leaf node, use its SELECT clause; otherwise SELECT * FROM – concat all FROM clauses of n and its ancestor WHERE– concat all WHERE clauses of n and its ancestor

An Example To view product code, call CN1.2.6.1.1= SELECT p.code
FROM Clothing c, Problems p WHERE c.category = “outerwear” AND p.pid = c.pid

VFCA View Forest Composition Algorithm
Translate XQuery into view forest Used in view forest composer Operate on a subset of XQuery syntax, XQueryCore, which includes everything but NOT Recursive functions and operators (eg “before” , “after”, “is”, “isnot” Functions depending on XML document order

Q(V1, …, Vm) = VQ(I) for any I of the schema
VFCA Suppose I – instance of relational schema V(I) – view forest corresponding to the instance I Q(V(I)) – XQuery over the view forest V(I), the result return from this query is again a view forest, denoted by VQ(I) Input: An XQueryCore expression Q over view forests V1, …, Vm Output: A view forest VQ = VFCA(V1, …, Vm) such that Q(V1, …, Vm) = VQ(I) for any I of the schema

VFCA Traditional XQuery Evaluation VFCA
Inner XQuery is evaluated first to return results for outer XQuery Producing a number of intermediate XML outputs VFCA Only view forest is produced, eliminating intermediate results Data extraction is done once using the set of SQL realized from forest

XML Publishing Optimization

View Forest Evaluation
Steps in evaluating view forest Planner adds a key to each SQL fragment Planner partitions each view tree into one or more subtrees Each subtree is realized to a SQL The set of all SQL is submitted to relational engine XML generator consumes tuple streams and construct one single integrated relation XML output is produced by nesting tuples integrated relation and tagging each element

The set of SQL emitted from view forest evaluation is called a plan Given a view forest , there can be many different plans (exponential to the number of edges) Two extreme cases Emit a SQL for each node (fully partitioned strategy) Emit a SQL for each tree in the forest (unified strategy)

LEFT OUTER JOIN to connect a parent and child nodes’ SQL UNION to bind together sibling nodes’ SQL ORDER BY is needed to allow XML generator takes only one single pass over the tuple streams The tree can also be split into connected components SQL is then generated for each component

SELECT 1 AS L1, c.pid, L2, (c.price * Q.discount) as sale, Q.code, Q.comments FROM Clothing c LEFT OUTER JOIN ((SELECT 1 AS L2, d.pid AS pid, d.discount AS discount, null AS code, null AS comments FROM Discount d) UNION (SELECT 2 AS L2, p.pid AS pid, null AS discount, p.code AS code, p.comments AS comments FROM Problems p)) AS Q ON c.pid = Q.pid ORDER BY L1, c.pid, L2, Q.code

SELECT 1 as L1, c.pid, (d.discount * c.price) FROM Cloothing c LEFT OUTER JOIN (SELECT d.pid, d.discount FROM Discount d) ON c.pid = d.pid ORDER BY c.pid SELECT 2 as L1, c.pid, p.comments FROM Clothing c, Problems p WHERE c.pid = p.pid ORDER BY c.pid, p.code

Choosing Efficient Plan
As there exists exponential number of plans, SilkRoute resorts to heuristcs UNION and OUTER JOIN are costly operations Only choosing edges whose two associated queries are less expensive to evaluate together than separately Cost Difference= cost(qc) – (cost(q1) + cost(q2)) Where qc is the combined query of parent and child nodes And q1 and q2 are the parent query and child query respectively

Choosing Efficient Plan
Seek for cost estimates from relational source engine The estimate cost of a query q is obtained by cost(q, a, b) = a * evaluation_cost(q) + b * data_size(q) AND data_size(q) = f(|attrs(q)| * cardinality(q)) Relational engine provides evaluation_cost and cardinality a and b are weights

Results

Conclusion Proprietary commercial products have prospered in this field DB2 SQL Server Oracle Native XML database SilkRoute – free!! (work with MySQL/PostgreSQL)

Thank you!!

SilkRoute: A Framework for Publishing Rational Data in XML

Similar presentations

Presentation on theme: "SilkRoute: A Framework for Publishing Rational Data in XML"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

SilkRoute: A Framework for Publishing Rational Data in XML

Similar presentations

Presentation on theme: "SilkRoute: A Framework for Publishing Rational Data in XML"— Presentation transcript:

Similar presentations

About project

Feedback