Download presentation
Presentation is loading. Please wait.
Published byΜυρίνα Λούπης Modified over 6 years ago
1
SilkRoute: A Framework for Publishing Rational Data in XML
Mary F. Fernandez Yana Kadiyska Dan Suciu Atsuyuki Morishima Wang Chiew Tan ACM Transactions on Database System 2002, volume 27, no. 4
2
Outline Introduction System Architecture XQuery to SQL Translation
XML Publishing Optimization Conclusion
3
Introduction
4
Introduction XML as language for data exchange between applications
Platform neutral Storage independent Semantics rich But most existing data are stored in RDBMS Conversion from RDBMS to XML should be General Selective Efficient
5
Design Goals General Two different data representations
Inherent complex conversion No public XML schema matches exactly relational counterparts Need for mapping multiple XML to same relational source under different situations XML Relational Nested Flat Unnormalized Normalized Public Proprietary
6
Design Goals Selective Efficient XML schema as virtual view
XQuery may only extract parts of the view Only materialize the data fragment needed Efficient Reduce computational overhead as much as possible Exploit optimization and evaluation engines provided by DB
7
Contributions XQuery to SQL Translation
Separate structure of XML output from computation which constructs it Introduce view forest which nodes are mapping from XML to SQL Optimization Problem for XML Publishing XQuery is mapped to one or more SQL Selecting the most efficient set
8
System Architecture
9
System Architecture 3-tier architecture Backend database for storage
SilkRoute as middleware Third party application (Client)
10
System Architecture SilkRoute transforms relational schema to canonical view forest
11
System Architecture Administrator uses XQuery to extract public XML view over canonical view, this gives public view forest
12
System Architecture Client application uses XQuery to data in interest, this gives application view forest
13
System Architecture Planner inside SilkRoute gets application view forest to produce SQLQuery (which then enters RDBMS) XMLTemplate (which then enters XML Generator)
14
System Architecture Tuples from RDBMS RDBMS is fed into the XMLTemplate inside generator Final XML output is then returned to the client
15
XQuery to SQL Translation
16
View Forest Internal representation of XML, a critical role in SilkRoute Each node carries XML label SQL query fragment For internal nodes, XML label is either element or attribute names; SQL fragment contains FROM and WHERE clauses For leaf nodes, XML label is atomic type; SQL fragment consists of SELECT clause
17
View Forest – SQL Association
A complete SQL query Cn is associated with each node n SELECT – if n is leaf node, use its SELECT clause; otherwise SELECT * FROM – concat all FROM clauses of n and its ancestor WHERE– concat all WHERE clauses of n and its ancestor
18
An Example To view product code, call CN1.2.6.1.1= SELECT p.code
FROM Clothing c, Problems p WHERE c.category = “outerwear” AND p.pid = c.pid
19
VFCA View Forest Composition Algorithm
Translate XQuery into view forest Used in view forest composer Operate on a subset of XQuery syntax, XQueryCore, which includes everything but NOT Recursive functions and operators (eg “before” , “after”, “is”, “isnot” Functions depending on XML document order
20
Q(V1, …, Vm) = VQ(I) for any I of the schema
VFCA Suppose I – instance of relational schema V(I) – view forest corresponding to the instance I Q(V(I)) – XQuery over the view forest V(I), the result return from this query is again a view forest, denoted by VQ(I) Input: An XQueryCore expression Q over view forests V1, …, Vm Output: A view forest VQ = VFCA(V1, …, Vm) such that Q(V1, …, Vm) = VQ(I) for any I of the schema
21
VFCA Traditional XQuery Evaluation VFCA
Inner XQuery is evaluated first to return results for outer XQuery Producing a number of intermediate XML outputs VFCA Only view forest is produced, eliminating intermediate results Data extraction is done once using the set of SQL realized from forest
22
XML Publishing Optimization
23
View Forest Evaluation
Steps in evaluating view forest Planner adds a key to each SQL fragment Planner partitions each view tree into one or more subtrees Each subtree is realized to a SQL The set of all SQL is submitted to relational engine XML generator consumes tuple streams and construct one single integrated relation XML output is produced by nesting tuples integrated relation and tagging each element
24
View Forest Evaluation
25
View Forest Evaluation
The set of SQL emitted from view forest evaluation is called a plan Given a view forest , there can be many different plans (exponential to the number of edges) Two extreme cases Emit a SQL for each node (fully partitioned strategy) Emit a SQL for each tree in the forest (unified strategy)
26
View Forest Evaluation
LEFT OUTER JOIN to connect a parent and child nodes’ SQL UNION to bind together sibling nodes’ SQL ORDER BY is needed to allow XML generator takes only one single pass over the tuple streams The tree can also be split into connected components SQL is then generated for each component
27
View Forest Evaluation
SELECT 1 AS L1, c.pid, L2, (c.price * Q.discount) as sale, Q.code, Q.comments FROM Clothing c LEFT OUTER JOIN ((SELECT 1 AS L2, d.pid AS pid, d.discount AS discount, null AS code, null AS comments FROM Discount d) UNION (SELECT 2 AS L2, p.pid AS pid, null AS discount, p.code AS code, p.comments AS comments FROM Problems p)) AS Q ON c.pid = Q.pid ORDER BY L1, c.pid, L2, Q.code
28
View Forest Evaluation
SELECT 1 as L1, c.pid, (d.discount * c.price) FROM Cloothing c LEFT OUTER JOIN (SELECT d.pid, d.discount FROM Discount d) ON c.pid = d.pid ORDER BY c.pid SELECT 2 as L1, c.pid, p.comments FROM Clothing c, Problems p WHERE c.pid = p.pid ORDER BY c.pid, p.code
29
Choosing Efficient Plan
As there exists exponential number of plans, SilkRoute resorts to heuristcs UNION and OUTER JOIN are costly operations Only choosing edges whose two associated queries are less expensive to evaluate together than separately Cost Difference= cost(qc) – (cost(q1) + cost(q2)) Where qc is the combined query of parent and child nodes And q1 and q2 are the parent query and child query respectively
30
Choosing Efficient Plan
Seek for cost estimates from relational source engine The estimate cost of a query q is obtained by cost(q, a, b) = a * evaluation_cost(q) + b * data_size(q) AND data_size(q) = f(|attrs(q)| * cardinality(q)) Relational engine provides evaluation_cost and cardinality a and b are weights
31
Results
32
Conclusion Proprietary commercial products have prospered in this field DB2 SQL Server Oracle Native XML database SilkRoute – free!! (work with MySQL/PostgreSQL)
33
Thank you!!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.