Management of XML and Semistructured Data


1 Management of XML and Semistructured Data
Lecture 15: Publishing XML Data From Relations
Wednesday, May 16th, 2001

2 Overview
XML Publishing Example
XML Publishing Languages
Virtual XML Publishing
Materialized XML Publishing (next time)

3 XML Publishing Today
Legacy data → XML data
Legacy data: fragmented into many flat relations, 3rd normal form, proprietary
XML data: nested, un-normalized, public (450 schemas at

4 XML Publishing: an Example
Legacy data in E/R:
  Eu-Stores (euSid, name, country)
  US-Stores (usSid, name, url)
  Products (pid, name, priceUSD)
  Eu-Sales (date): relates Eu-Stores and Products
  US-Sales (date, tax): relates US-Stores and Products

5 XML Publishing: an Example
XML view:
  <allsales>
    <country>
      <name> France </name>
      <store>
        <name> Nicolas </name>
        <product>
          <name> Blanc de Blanc </name>
          <sold> 10/10/2000 </sold>
          <sold> 12/10/2000 </sold>
        </product>
        <product>…</product>…
      </store>….
    </country>
    …
  </allsales>
In summary: group by country, store, product
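The grouping on this slide can be sketched in a few lines of Python. This is only an illustration of "group by country, store, product" over flat rows, not how any of the systems in this lecture work. The sample rows, the `publish` function, and the third row's values are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rows, as a join of stores, sales and products might return:
# (country, store name, product name, date sold)
rows = [
    ("France", "Nicolas", "Blanc de Blanc", "10/10/2000"),
    ("France", "Nicolas", "Blanc de Blanc", "12/10/2000"),
    ("France", "Nicolas", "Saint Emilion", "11/10/2000"),  # invented row
]

def publish(rows):
    """Nest flat rows by country, then store, then product."""
    root = ET.Element("allsales")
    countries, stores, products = {}, {}, {}
    for country, store, product, sold in rows:
        if country not in countries:
            c = ET.SubElement(root, "country")
            ET.SubElement(c, "name").text = country
            countries[country] = c
        if (country, store) not in stores:
            s = ET.SubElement(countries[country], "store")
            ET.SubElement(s, "name").text = store
            stores[(country, store)] = s
        key = (country, store, product)
        if key not in products:
            p = ET.SubElement(stores[(country, store)], "product")
            ET.SubElement(p, "name").text = product
            products[key] = p
        # Every row contributes one <sold> element under its product.
        ET.SubElement(products[key], "sold").text = sold
    return root

print(ET.tostring(publish(rows), encoding="unicode"))
```

Each dictionary plays the role of one grouping level, so a repeated country or store is created only once.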

6 Output “schema”:
  allsales
    country *
      name (PCDATA)
      store *
        name (PCDATA)
        url ? (PCDATA)
        product *
          name (PCDATA)
          sold *
            date (PCDATA)
            tax ? (PCDATA)

7 XML Publishing
Need a language for specifying the Relational → XML mapping
SilkRoute: a SQL/XML-QL blend
IBM (formerly the XPERANTO project): an extension of SQL
SQL Server: “FOR XML”, an extension of SQL
XDR’s

8 XML Publishing: SilkRoute
In SilkRoute [Fernandez, Suciu, Tan ’00]:
  { FROM EuStores $S, EuSales $L, Products $P
    WHERE $S.euSid = $L.euSid AND $L.pid = $P.pid
    CONSTRUCT
      <allsales()>
        <country($S.country)>
          <name> $S.country </name>
          <store($S.euSid)>
            <name> $S.name </name>
            <product($P.pid)>
              <name> $P.name </name>
              <price> $P.priceUSD </price>
            </product>
          </store>
        </country>
      </allsales>
  } /* union….. */

9 XML Publishing : SilkRoute
…. /* union */
  { FROM USStores $S, USSales $L, Products $P
    WHERE $S.usSid = $L.usSid AND $L.pid = $P.pid
    CONSTRUCT
      <allsales()>
        <country("USA")>
          <name> USA </name>
          <store($S.usSid)>
            <name> $S.name </name>
            <url> $S.url </url>
            <product($P.pid)>
              <name> $P.name </name>
              <price> $P.priceUSD </price>
              <tax> $L.tax </tax>
            </product>
          </store>
        </country>
      </allsales>
  }

10 Internal Representation
View Tree: Non-recursive datalog (SELECT DISTINCT … )
Tree:
  allsales()
    * country(c)
        name(c)
        * store(c,x)
            name(n)
            ? url(c,x,u)
            * product(c,x,y)
                name(n)
                * sold(c,x,y,d)
                    date(c,x,y,d)
                    ? Tax(c,x,y,d,t)
Rules:
  allsales() :-
  country(c) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
  country("USA") :-
  store(c,x) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
  store(c,x) :- USStores(x,_,_), USSales(x,y,_), Products(y,_,_), c="USA"
  url(c,x,u) :- USStores(x,_,u), USSales(x,y,_), Products(y,_,_)
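As a rough sketch of what "non-recursive datalog (SELECT DISTINCT …)" means here, the two store(c,x) rules can be written as one SQL union and run against an in-memory SQLite database. The table layouts, the sample rows, and the name `STORE_RULE` are assumptions for illustration; only the shape of the query follows the rules on this slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema, guessed from the E/R slide; not given in this form.
    CREATE TABLE EuStores(euSid TEXT, name TEXT, country TEXT);
    CREATE TABLE USStores(usSid TEXT, name TEXT, url TEXT);
    CREATE TABLE EuSales(euSid TEXT, pid TEXT, date TEXT);
    CREATE TABLE USSales(usSid TEXT, pid TEXT, date TEXT, tax REAL);
    CREATE TABLE Products(pid TEXT, name TEXT, priceUSD REAL);

    -- Invented sample data.
    INSERT INTO EuStores VALUES ('e1', 'Nicolas', 'France'), ('e2', 'Empty', 'Italy');
    INSERT INTO USStores VALUES ('u1', 'FNAC', 'http://example.com');
    INSERT INTO EuSales VALUES ('e1', 'p1', '10/10/2000'), ('e1', 'p2', '12/10/2000');
    INSERT INTO USSales VALUES ('u1', 'p1', '1/1/2000', 0.08);
    INSERT INTO Products VALUES ('p1', 'Blanc de Blanc', 10.0), ('p2', 'Saint Emilion', 23.99);
""")

# One SELECT DISTINCT per datalog rule for store(c,x); UNION merges the rules.
STORE_RULE = """
    SELECT DISTINCT S.country AS c, S.euSid AS x
    FROM EuStores S, EuSales L, Products P
    WHERE S.euSid = L.euSid AND L.pid = P.pid
    UNION
    SELECT DISTINCT 'USA' AS c, S.usSid AS x
    FROM USStores S, USSales L, Products P
    WHERE S.usSid = L.usSid AND L.pid = P.pid
"""

store_nodes = sorted(conn.execute(STORE_RULE).fetchall())
print(store_nodes)
```

Each returned (c, x) pair corresponds to one store element in the published document; the store with no sales produces no node because the rule joins with the sales table.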

11 XML Publishing : IBM
XPERANTO: Publishing Object-Relational Data as XML. Carey, Florescu, Ives, Lu, Shanmugasundaram, Shekita, Subramanian. WebDB’2000
Efficiently Publishing Relational Data as XML Documents. Shanmugasundaram, Shekita, Barr, Carey, Lindsay, Pirahesh, Reinwald. VLDB’2000

12 XML Publishing : IBM
SQL + User defined functions
  (Select S.name, STORE(S.euSid, S.name,
      (Select XMLAGG(PRODUCT(P.pid, P.name, P.priceUSD))
       From EuSales L, Products P
       Where S.euSid = L.euSid AND L.pid = P.pid))
   From EuStores S)
  Union All

  Define XML Constructor STORE(storeID: integer, name: varchar(20), prodList: xml) AS {
    <store id=$storeID>
      <name> $name </name>
      $prodList
    </store>
  }
  Define XML Constructor PRODUCT( ...) AS { . . . }

13 XML Publishing : SQL Server
Three modes:
RAW mode
Auto mode
Explicit mode

14 XML Publishing : SQL Server, RAW Mode
  Select S.euSid, P.name, P.price
  From Stores S, EuSales L, Products P
  Where S.euSid = L.euSid AND L.pid = P.pid
  For XML Raw
Result:
  <row euSid="SLKDJFS" name="Saint Emilion" price="23.99"/>
  <row euSid="DRJLKSD" name="Loire" price="12.99"/>
Flat XML
Default tag and attribute names

15 XML Publishing : SQL Server Auto Mode
  Select S.euSid, P.name, P.price
  From Stores S, EuSales L, Products P
  Where S.euSid = L.euSid AND L.pid = P.pid
  For XML Auto
Result:
  <Stores euSid="SLKDJFS">
    <Product name="Saint Emilion" price="23.99"/>
    <Product name="Loire" price="12.99"/>
  </Stores>
  <Stores euSid="FGJISOD"> . . .
Nested XML
Default tag and attribute names
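To make the difference between the two output shapes concrete, here is a small Python emulation. It reproduces only the shape of the results shown on the RAW and Auto slides and says nothing about how SQL Server computes them. The functions `raw_mode` and `auto_mode` and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical result rows of the join, already ordered by store id.
rows = [
    {"euSid": "SLKDJFS", "name": "Saint Emilion", "price": "23.99"},
    {"euSid": "SLKDJFS", "name": "Loire", "price": "12.99"},
]

def raw_mode(rows):
    """Flat shape: one <row> element per tuple, columns become attributes."""
    return [ET.Element("row", dict(r)) for r in rows]

def auto_mode(rows):
    """Nested shape: consecutive rows with the same euSid share one parent."""
    result = []
    for sid, group in groupby(rows, key=lambda r: r["euSid"]):
        store = ET.Element("Stores", {"euSid": sid})
        for r in group:
            ET.SubElement(store, "Product", {"name": r["name"], "price": r["price"]})
        result.append(store)
    return result

for elem in raw_mode(rows) + auto_mode(rows):
    print(ET.tostring(elem, encoding="unicode"))
```

Because `groupby` only merges adjacent rows, the nesting in this sketch depends on the row order, which mirrors the order sensitivity discussed for the explicit mode.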

16 XML Publishing : SQL Server Explicit Mode
Nested XML
User defined tags and attributes
Idea: write SQL queries with complex column names
Ad-hoc, order dependent semantics

17 XML Publishing : SQL Server Explicit Mode
  (Select 1 as Tag, null as Parent,
          S.euSid as [Store!1!id],
          S.name as [Store!1!name!element],
          null as [Product!2!name!element],
          null as [Product!2!price!element]
   From Stores S)
  Union All
  (Select 2 as Tag, 1 as Parent,
          S.euSid as [Store!1!id],
          null as [Store!1!name!element],
          P.name as [Product!2!name!element],
          P.price as [Product!2!price!element]
   From Stores S, EuSales L, Products P
   Where S.euSid = L.euSid AND L.pid = P.pid)
  Order By [Store!1!id]

18 XML Publishing : SQL Server Explicit Mode
All column names are legal SQL names
Special form: [tagname!k!something], or [tagname!k!something!element], or other variations...
Hence everything is legal SQL
But what does it mean ?
Construct the universal table first
Then process the table sequentially

19 XML Publishing : SQL Server Explicit Mode
Universal table:
  Tag  Parent  Store!1!id  Store!1!name!element  Product!2!name!element  Product!2!price!element
  1    null    ABCDE       Nicolas               null                    null
  2    1       ABCDE       null                  Saint Emilion           23.99
  2    1       ABCDE       null                  Loire                   12.99
  1    null    FDKLS       FNAC                  null                    null
  2    1       FDKLS       null                  Databases               49.99
  . . .

20 XML Publishing : SQL Server Explicit Mode
Converting universal table to XML:
Scan the rows sequentially; for each row, let Tag = k
Look up only the columns with that tag: all are called [tagname!k!something], with the same tagname
  What happens if one has a different tagname ?
Create an element called tagname
  Output <tagname>
  Columns become its children: either subelements or attributes
If Parent is specified, the last element with that tag is the parent; otherwise it is a root element
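The sequential scan described on this slide can be sketched as follows. This is a minimal reading of the steps above, not SQL Server's actual algorithm: it assumes every column for a given tag uses the same tagname, and it handles only the plain attribute form and the `!element` form. The names `COLUMNS`, `TABLE`, and `universal_to_xml` are made up for the example, and the table values follow the universal table of the previous slide.

```python
import xml.etree.ElementTree as ET

COLUMNS = ["Tag", "Parent", "Store!1!id", "Store!1!name!element",
           "Product!2!name!element", "Product!2!price!element"]

TABLE = [
    (1, None, "ABCDE", "Nicolas", None, None),
    (2, 1, "ABCDE", None, "Saint Emilion", "23.99"),
    (2, 1, "ABCDE", None, "Loire", "12.99"),
    (1, None, "FDKLS", "FNAC", None, None),
    (2, 1, "FDKLS", None, "Databases", "49.99"),
]

def universal_to_xml(columns, table):
    roots = []
    last = {}  # tag number -> most recently created element with that tag
    for row in table:
        rec = dict(zip(columns, row))
        k, parent = rec["Tag"], rec["Parent"]
        elem = None
        for col, value in rec.items():
            parts = col.split("!")
            # Skip Tag/Parent and every column that belongs to another tag.
            if len(parts) < 3 or int(parts[1]) != k:
                continue
            if elem is None:
                elem = ET.Element(parts[0])  # element named after tagname
            if value is None:
                continue
            if parts[3:] == ["element"]:
                ET.SubElement(elem, parts[2]).text = str(value)  # subelement
            else:
                elem.set(parts[2], str(value))  # attribute
        if elem is None:
            continue  # no column for this tag; nothing to output
        if parent is None:
            roots.append(elem)  # no Parent: a root element
        else:
            last[parent].append(elem)  # last element with the Parent tag
        last[k] = elem
    return roots

for store in universal_to_xml(COLUMNS, TABLE):
    print(ET.tostring(store, encoding="unicode"))
```

The `last` dictionary is what makes the semantics order dependent: a product row attaches to whichever store row was scanned most recently.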

21 XML Publishing : SQL Server Explicit Mode
  <Store id="ABCDE">
    <name> Nicolas </name>
    <Product>
      <name> Saint Emilion </name>
      <price> 23.99 </price>
    </Product>
    <Product>
      <name> Loire </name>
      <price> 12.99 </price>
    </Product>
  </Store>
  <Store id="FDKLS">
    <name> FNAC </name>
    . . .

22 XML Publishing : SQL Server Explicit Mode
Seems complex, but also powerful
Can construct arbitrarily deeply nested hierarchies
  How ?
However, they are very, very limited
  Why ?

23 XML Publishing: Virtual View
Don’t compute the XML data yet
Users ask XML queries
System composes the query with the view, sends the result to the RDBMS
Main issue: compose queries

24 XML Publishing: Virtual View in SilkRoute
Find names and urls of all stores that sold on 1/1/2000 (in an XML-QL / XQuery melange):
  WHERE <allsales/country/store>
          <product/sold/date> 1/1/2000 </>
          <name> $X </>
          <url> $Y </>
        </>
  RETURN $X , $Y

25 Query Composition
View Tree:
  allsales()
    country(c)
      name(c)
      store(c,x)
        name(n)
        url(c,x,u)
        product(c,x,y)
          name(n)
          sold(c,x,y,d)
            date(c,x,y,d)
            Tax(c,x,y,d,t)
XML-QL Query Pattern:
  $n1: allsales
    $n2: country
      $n3: store
        $X: name
        $Y: url
        $n4: product
          $n5: sold
            $Z: date = 1/1/2000
“Evaluate” the XML pattern(s) on the view tree, combine all datalog rules

26 Query Composition
Result (in theory…):
  ( SELECT S.name, S.url
    FROM USStores S, USSales L, Products P
    WHERE S.usSid=L.usSid AND L.pid=P.pid AND L.date='1/1/2000')
  UNION
  ( SELECT S2.name, S2.url
    FROM EUStores S1, EUSales L1, Products P1,
         USStores S2, USSales L2, Products P2
    WHERE S1.euSid=L1.euSid AND L1.pid=P1.pid AND L1.date='1/1/2000'
      AND S2.usSid=L2.usSid AND L2.pid=P2.pid
      AND S1.country='USA' AND S1.euSid = S2.usSid)

27 Complexity of XML Publishing
But in practice: 5-7 times more joins !
Need query minimization
Could this be avoided ?
We thought hard and couldn’t find a better way
It is NP-hard !

28 XML Publishing Is NP-Hard
View Tree:
  customer
    ? order (PCDATA)       order() :- Q1
    ? complaint (PCDATA)   complaint() :- Q2
XML query:
  WHERE <customer> <order> $x </> <complaint> $y </> </>
  RETURN ( )
The composed SQL query is: Q1 JOIN Q2
Minimizing it is NP-hard ! (can be shown…)
Reduction from the clique problem:
Q1 describes a graph with named nodes, e.g. V1(x1), V2(x2), V3(x3), V4(x4), E(x1,x2), E(x4,x2), …
Q2 describes a k-clique, e.g. E(y1,y2), E(y1,y3), …, E(yk-1,yk)
If the graph has a k-clique, then (Q1 join Q2) = Q1
Conversely, if the minimized query of Q1 join Q2 is a graph isomorphic to the original graph, then the graph has a k-clique.
It is easy to check isomorphism, since all nodes are named (i.e. there is at most one homomorphism, and it can be found in polynomial time).
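The heart of the reduction is that the k-clique query Q2 is redundant in Q1 JOIN Q2 exactly when its variables can be mapped into the graph described by Q1. A brute-force check of that mapping is sketched below; it assumes the edge relation is symmetric and has no self-loops, and it takes exponential time, which is consistent with the hardness claim. The function names and the sample graph are invented for the example.

```python
from itertools import product

def has_homomorphism(pattern_edges, graph_edges):
    """True if some assignment of pattern variables to graph nodes sends
    every pattern edge to a graph edge (brute force over all assignments)."""
    variables = sorted({v for edge in pattern_edges for v in edge})
    nodes = sorted({n for edge in graph_edges for n in edge})
    edges = set(graph_edges)
    for image in product(nodes, repeat=len(variables)):
        h = dict(zip(variables, image))
        if all((h[a], h[b]) in edges for a, b in pattern_edges):
            return True
    return False

def clique_query(k):
    """Body of Q2: E(y1,y2), E(y1,y3), ..., E(yk-1,yk)."""
    return [(f"y{i}", f"y{j}") for i in range(1, k + 1) for j in range(i + 1, k + 1)]

def symmetric(edges):
    """Store each undirected edge in both directions."""
    return [(a, b) for a, b in edges] + [(b, a) for a, b in edges]

# Invented graph: a triangle x1, x2, x3 plus a pendant node x4.
graph = symmetric([("x1", "x2"), ("x2", "x3"), ("x1", "x3"), ("x3", "x4")])

print(has_homomorphism(clique_query(3), graph))  # the triangle is a 3-clique
print(has_homomorphism(clique_query(4), graph))  # no 4-clique in this graph
```

Since every pair of clique variables is joined by an edge and the graph has no self-loops, any successful mapping must use k distinct, mutually adjacent nodes.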

