Managing XML and Semistructured Data Lecture 18: Publishing XML Data From Relations Prof. Dan Suciu Spring 2001.



In this lecture
  – Virtual XML Publishing
  – Materialized XML Publishing
Resources
  – Efficiently Publishing Relational Data as XML Documents, by Shanmugasundaram, Shekita, Barr, Carey, Lindsay, Pirahesh, Reinwald, in VLDB'2000

XML Publishing
XML view defined declaratively
  – SQL extensions [Exodus]
  – RXL [SilkRoute]
Virtual XML publishing
  – Accept XML queries (e.g. XML-QL), translate to SQL
  – Main issue: compose queries
Materialized XML publishing
  – Compute the entire XML view, which is large
  – Main issue: compute a large query efficiently

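Virtual XML Publishing
Legacy data in E/R:
[Figure: E/R diagram with entities Eu-Stores, US-Stores and Products, linked by the relationships Eu-Sales and US-Sales. Attributes shown: name, country (Eu-Stores); name, url (US-Stores); date, tax (sales); name, priceUSD (Products); keys euSid, usSid, pid.]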

Virtual XML Publishing
XML view:
[XML example whose tags were lost in extraction: country France, store Nicolas, product Blanc de Blanc, followed by its sale dates, …]
In summary: group by country, then store, then product

Output “schema” (leaves are PCDATA; * marks a repeated element, ? an optional one):
allsales
  country *
    name
    store *
      name
      url ?
      product *
        name
        sold *
          date
          tax ?

Virtual XML Publishing In SilkRoute
{ FROM EuStores $S, EuSales $L, Products $P
  WHERE $S.euSid = $L.euSid AND $L.pid = $P.pid
  CONSTRUCT $S.country $S.name $P.name $P.priceUSD }
/* union … */

Virtual XML Publishing
… /* union */
{ FROM USStores $S, USSales $L, Products $P
  WHERE $S.usSid = $L.usSid AND $L.pid = $P.pid
  CONSTRUCT USA $S.name $S.url $P.name $P.priceUSD $L.tax }

Internal Representation
Non-recursive datalog (SELECT DISTINCT … )
View Tree (each node carries a datalog atom; leaf values c, n, n, d, t, u; edge labels * * * * ?):
allsales()
  country(c)
    name(c)
    store(c,x)
      name(n)
      url(c,x,u)
      product(c,x,y)
        name(n)
        sold(c,x,y,d)
          date(c,x,y,d)
          Tax(c,x,y,d,t)
Rules:
allsales() :-
country(c) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
country(“USA”) :-
store(c,x) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
store(c,x) :- USStores(x,_,_), USSales(x,y,_), Products(y,_,_), c=“USA”
url(c,x,u) :- USStores(x,_,u), USSales(x,y,_), Products(y,_,_)
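The correspondence between a datalog rule and a SELECT DISTINCT query can be sketched as follows. This is a minimal illustration, not SilkRoute code: the column names and the sample rows (including the store "Vinum" and all dates) are assumptions made up for the example, and SQLite stands in for the RDBMS.

```python
import sqlite3

# Hypothetical miniature instance of the lecture's schema (column names assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EuStores(euSid, name, country);
CREATE TABLE EuSales(euSid, pid, date);
CREATE TABLE Products(pid, name, priceUSD);
INSERT INTO EuStores VALUES (1, 'Nicolas', 'France'),
                            (2, 'Vinum',   'France'),
                            (3, 'Empty',   'Spain');
INSERT INTO EuSales  VALUES (1, 10, '10/10/2000'), (2, 10, '11/10/2000');
INSERT INTO Products VALUES (10, 'Blanc de Blanc', 12.0);
""")

# country(c) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
# Shared variables (x, y) become join conditions; the head variable c
# becomes the SELECT DISTINCT list.
COUNTRY_RULE = """
SELECT DISTINCT S.country
FROM EuStores S, EuSales L, Products P
WHERE S.euSid = L.euSid AND L.pid = P.pid
"""

countries = [row[0] for row in conn.execute(COUNTRY_RULE)]
print(countries)
```

On this sample instance only France qualifies, because the Spanish store has no sales and therefore fails the join.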

Virtual XML Publishing
Don’t compute the XML data yet
Users ask XML queries
System composes them with the view and sends the result to the RDBMS
Main issue: compose queries

XML Publishing: Virtual View in SilkRoute
Find the names and urls of all stores that sold on 1/1/2000 (in an XML-QL / XQuery melange):
WHERE 1/1/2000 $X $Y
RETURN $X, $Y

Query Composition
[Figure: the XML-QL query pattern (allsales, country, store, product, sold, date = 1/1/2000, name $X, url $Y; node variables $n1 $n2 $n3 $n4 $n5 $Z) drawn next to the view tree (allsales(), country(c), name(c), store(c,x), name(n), url(c,x,u), product(c,x,y), name(n), sold(c,x,y,d), date(c,x,y,d), Tax(c,x,y,d,t)).]
“Evaluate” the XML pattern(s) on the view tree, combine all datalog rules

Query Composition
Result (in theory…):
( SELECT DISTINCT S.name, S.url
  FROM USStores S, USSales L, Products P
  WHERE S.usSid = L.usSid AND L.pid = P.pid AND L.date = '1/1/2000' )
UNION
( SELECT DISTINCT S2.name, S2.url
  FROM EUStores S1, EUSales L1, Products P1,
       USStores S2, USSales L2, Products P2
  WHERE S1.euSid = L1.euSid AND L1.pid = P1.pid AND L1.date = '1/1/2000'
    AND S2.usSid = L2.usSid AND L2.pid = P2.pid
    AND S1.country = 'USA' AND S1.euSid = S2.usSid )
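To make the composed query concrete, the first UNION branch can be run against a toy database. This is only a sketch: the column layout of USStores, USSales and Products is assumed, the store names and URLs are invented, and SQLite stands in for the RDBMS.

```python
import sqlite3

# Hypothetical sample data; store names, URLs and tax rates are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USStores(usSid, name, url);
CREATE TABLE USSales(usSid, pid, date, tax);
CREATE TABLE Products(pid, name, priceUSD);
INSERT INTO USStores VALUES (1, 'WineCo', 'http://wineco.example'),
                            (2, 'Cellar', 'http://cellar.example');
INSERT INTO USSales  VALUES (1, 10, '1/1/2000', 0.08),
                            (1, 11, '1/1/2000', 0.08),
                            (2, 10, '2/1/2000', 0.07);
INSERT INTO Products VALUES (10, 'Blanc de Blanc', 12.0), (11, 'Merlot', 9.5);
""")

# First branch of the composed query: US stores that sold on 1/1/2000.
FIRST_BRANCH = """
SELECT DISTINCT S.name, S.url
FROM USStores S, USSales L, Products P
WHERE S.usSid = L.usSid AND L.pid = P.pid AND L.date = '1/1/2000'
"""

stores = conn.execute(FIRST_BRANCH).fetchall()
print(stores)
```

WineCo sold two products on that date, yet DISTINCT returns it once, which matches the set semantics of the datalog rules.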

Complexity of XML Publishing
But in practice: 5-7 times more joins!
  – Need query minimization
Could this be avoided?
  – No: it is NP-hard

XML Publishing Is NP-Hard
View Tree: customer, with optional (?) children order and complaint, both PCDATA
order() :- Q1
complaint() :- Q2
XML query: WHERE $x $y RETURN ( )
The composed SQL query is: Q1 JOIN Q2
Minimizing it is NP-hard! (can be shown…)

Materialized XML Publishing
Efficiently Publishing Relational Data as XML Documents, Shanmugasundaram et al., VLDB’2000
Considers several alternatives, both inside and outside the engine

Materialized XML Publishing
Create the structure (i.e. nesting):
  – Early
  – Late
Add tags:
  – Early
  – Late
Do this:
  – Inside relational engine
  – Outside relational engine
Note: may add tags only after structuring has completed

Example
CONSTRUCT
FROM EuStores $S
CONSTRUCT $S.name
    FROM Owners $O
    WHERE $S.oID = $O.oID
    CONSTRUCT $O.name
    FROM EuSales $L, Products $P
    WHERE $S.euSid = $L.euSid AND $L.pid = $P.pid
    CONSTRUCT $P.name $P.priceUSD

Early Structuring, Early Tagging
The Stored Procedure Approach
Advantage: very simple
Disadvantage: multiple SQL queries submitted

XMLObject result = “ ”
SQLCursor C1 = “Select S.euSid, S.oID, S.name From EuStores S”
FOR x IN C1 DO
    result = result + “ ” + C1.name + “ ”
    SQLCursor C2 = “Select O.name From Owners O Where O.oID = %C1.oID”
    FOR y IN C2 DO
        result = result + “ ” + C2.name + “ ”
    SQLCursor C3 = “Select P.name, P.priceUSD From ... Where ...”
    FOR z IN C3 DO
        result = result + “ ” + C3.name + ...
    result = result + “ ”
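The nested-cursor idea can be sketched in runnable form as below. This is an illustration under assumptions, not the paper's implementation: the element names (stores, store, name, owner, product, price), the column layout and the sample rows are all hypothetical, and SQLite plays the role of the RDBMS. Note that one inner query is issued per store, which is the disadvantage named on the slide.

```python
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical sample instance (owner name and prices are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EuStores(euSid, oID, name);
CREATE TABLE Owners(oID, name);
CREATE TABLE EuSales(euSid, pid);
CREATE TABLE Products(pid, name, priceUSD);
INSERT INTO EuStores VALUES (1, 100, 'Nicolas');
INSERT INTO Owners   VALUES (100, 'Anne');
INSERT INTO EuSales  VALUES (1, 10);
INSERT INTO Products VALUES (10, 'Blanc de Blanc', 12);
""")

def publish(conn):
    """Build the XML by nested loops: structure and tags are produced together."""
    parts = ["<stores>"]
    outer = conn.execute("SELECT euSid, oID, name FROM EuStores ORDER BY euSid")
    for eu_sid, o_id, s_name in outer:
        parts.append(f"<store><name>{escape(s_name)}</name>")
        # One owner query per store.
        for (o_name,) in conn.execute(
                "SELECT name FROM Owners WHERE oID = ?", (o_id,)):
            parts.append(f"<owner>{escape(o_name)}</owner>")
        # One product query per store.
        for p_name, price in conn.execute(
                "SELECT P.name, P.priceUSD FROM EuSales L, Products P "
                "WHERE L.euSid = ? AND L.pid = P.pid", (eu_sid,)):
            parts.append(
                f"<product><name>{escape(p_name)}</name>"
                f"<price>{price}</price></product>")
        parts.append("</store>")
    parts.append("</stores>")
    return "".join(parts)

doc = publish(conn)
print(doc)
```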

Early Structuring, Early Tagging
The correlated CLOB approach
Still nested loops...
Creates large CLOBs, a problem for the engine

SELECT XMLAGG(STORE(S.name,
         XMLAGG(OWNER(SELECT O.oID FROM Owners O WHERE S.oID = O.oID)),
         XMLAGG(PRODUCT(SELECT P.name, P.priceUSD
                        FROM EuSales L, Products P
                        WHERE S.euSid = L.euSid AND L.pid = P.pid))))
FROM EuStores S

Early Structuring, Early Tagging
The de-correlated CLOB approach

GroupBy euSid and XMLAGG(
    EuStores S1 LEFT OUTER JOIN Owners O ON S1.oID = O.oID)
JOIN
GroupBy euSid and XMLAGG(
    EuStores S2 LEFT OUTER JOIN
        (SELECT L.euSid, P.name, P.priceUSD
         FROM EuSales L, Products P
         WHERE L.pid = P.pid)
    ON S2.euSid = L.euSid)
ON S1.euSid = S2.euSid

Early Structuring, Early Tagging
The de-correlated CLOB approach
Modify the engine to do the groupBy’s and the tagging
Better than nested loops (why?)
Still large CLOBs
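The de-correlated shape (two group-by aggregations over outer joins, joined back on the store key) can be approximated in a stock engine. The sketch below is only an analogy: SQLite has no XMLAGG, so its `group_concat` aggregate stands in for it, and the element names, column layout and sample rows are assumptions.

```python
import sqlite3

# Hypothetical sample instance; store 2 has neither a known owner nor sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EuStores(euSid, oID, name);
CREATE TABLE Owners(oID, name);
CREATE TABLE EuSales(euSid, pid);
CREATE TABLE Products(pid, name, priceUSD);
INSERT INTO EuStores VALUES (1, 100, 'Nicolas'), (2, 200, 'Vinum');
INSERT INTO Owners   VALUES (100, 'Anne');
INSERT INTO EuSales  VALUES (1, 10), (1, 11);
INSERT INTO Products VALUES (10, 'Blanc de Blanc', 12), (11, 'Merlot', 9);
""")

# Each subquery groups by euSid and aggregates tagged fragments;
# group_concat skips NULLs, so a store without children yields NULL.
DECORRELATED = """
SELECT S.name, own.frag, prod.frag
FROM EuStores S
JOIN (SELECT S1.euSid AS euSid,
             group_concat('<owner>' || O.name || '</owner>', '') AS frag
      FROM EuStores S1 LEFT OUTER JOIN Owners O ON S1.oID = O.oID
      GROUP BY S1.euSid) own
  ON S.euSid = own.euSid
JOIN (SELECT S2.euSid AS euSid,
             group_concat('<product>' || LP.name || '</product>', '') AS frag
      FROM EuStores S2 LEFT OUTER JOIN
           (SELECT L.euSid AS euSid, P.name AS name
            FROM EuSales L, Products P
            WHERE L.pid = P.pid) LP
        ON S2.euSid = LP.euSid
      GROUP BY S2.euSid) prod
  ON S.euSid = prod.euSid
ORDER BY S.euSid
"""

rows = conn.execute(DECORRELATED).fetchall()
for row in rows:
    print(row)
```

The order of fragments inside one aggregated string is not guaranteed by `group_concat`, so a real tagger would need an explicit ordering.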

Late Tagging
Idea: create a flat table first, then nest and tag
The flat table consists of outer joins and outer unions:
  – Unsorted → late structuring
  – Sorted → early structuring

Review of Outer Joins and Outer Unions
Left outer join
  – e.g. R(A,B) left outer join S(B,C) = T(A,B,C)

R:  A   B       S:  B   C       T:  A   B   C
    a1  b1          b1  c1          a1  b1  c1
    a2  b2          b1  c2          a1  b1  c2
    a3  b3          b3  c3          a2  b2  -
                                    a3  b3  c3
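The left outer join table above can be reproduced with a few lines of code. This is a plain nested-loop sketch over Python tuples (the function name is made up for the example), with `None` playing the role of the "-" padding.

```python
def left_outer_join(r, s):
    """R(A,B) left outer join S(B,C) on B; unmatched R rows are padded with None."""
    out = []
    for a, b in r:
        matches = [c for b2, c in s if b2 == b]
        if matches:
            out.extend((a, b, c) for c in matches)
        else:
            out.append((a, b, None))  # keep the R row even without a partner
    return out

R = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
S = [("b1", "c1"), ("b1", "c2"), ("b3", "c3")]
T = left_outer_join(R, S)
for row in T:
    print(row)
```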

Review of Outer Joins and Outer Unions
Outer union
  – e.g. R(A,B) outer union S(A,C) = T(Tag,A,B,C)

R:  A   B       S:  A   C       T:  Tag  A   B   C
    a1  b1          a3  c3          1    a1  b1  -
    a2  b2          a4  c4          1    a2  b2  -
                    a5  c5          2    a3  -   c3
                                    2    a4  -   c4
                                    2    a5  -   c5
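The outer union can be sketched the same way. The function name is hypothetical; the tag column records which input each row came from, and `None` stands for the "-" entries.

```python
def outer_union(r, s):
    """R(A,B) outer union S(A,C) -> T(Tag, A, B, C), padding missing columns."""
    tagged_r = [(1, a, b, None) for a, b in r]   # rows of R have no C value
    tagged_s = [(2, a, None, c) for a, c in s]   # rows of S have no B value
    return tagged_r + tagged_s

R = [("a1", "b1"), ("a2", "b2")]
S = [("a3", "c3"), ("a4", "c4"), ("a5", "c5")]
T = outer_union(R, S)
for row in T:
    print(row)
```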

Late Tagging, Late Structuring
Construct the table:
(EuStores LEFT OUTER JOIN Owners)
OUTER UNION
(EuStores LEFT OUTER JOIN EuSales JOIN Products)
Tagging:
  – Use a main-memory hash table to group elements on store ID
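A hash-based tagger for the unsorted table might look like the sketch below. It is an assumption-laden illustration: the row layout (tag, euSid, store name, owner, product), the element names and the sample rows are invented, and a Python dict plays the role of the main-memory hash table.

```python
def tag_unsorted(rows):
    """Group outer-union rows by store ID in a hash table, then emit XML."""
    stores = {}  # euSid -> collected children; grows with the number of stores
    for tag, eu_sid, store_name, owner, product in rows:
        entry = stores.setdefault(
            eu_sid, {"name": store_name, "owners": [], "products": []})
        if tag == 1 and owner is not None:
            entry["owners"].append(owner)
        elif tag == 2 and product is not None:
            entry["products"].append(product)
    parts = []
    for entry in stores.values():
        parts.append(f"<store><name>{entry['name']}</name>")
        parts += [f"<owner>{o}</owner>" for o in entry["owners"]]
        parts += [f"<product>{p}</product>" for p in entry["products"]]
        parts.append("</store>")
    return "".join(parts)

# Rows arrive in arbitrary order; tag 1 = owner branch, tag 2 = product branch.
rows = [
    (2, 1, "Nicolas", None, "Blanc de Blanc"),
    (1, 2, "Vinum", None, None),
    (1, 1, "Nicolas", "Anne", None),
    (2, 2, "Vinum", None, None),
]
doc = tag_unsorted(rows)
print(doc)
```

All structuring happens after the query returns, which is why memory use depends on the size of the result.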

Late Tagging, Early Structuring
Same table, but now sort by store ID and tag:
(EuStores LEFT OUTER JOIN Owners)
OUTER UNION
(EuStores LEFT OUTER JOIN EuSales JOIN Products)
ORDER BY euSid, tag
Constant space tagger
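Once the rows arrive sorted by (euSid, tag), the tagger only needs to remember the current store. The generator below is a sketch of that idea under the same assumed row layout (tag, euSid, store name, owner, product) and invented element names; it is not the paper's tagger.

```python
def tag_sorted(rows):
    """Stream XML fragments from rows sorted by (euSid, tag), using O(1) state."""
    current = None  # the only state kept: ID of the store currently open
    for tag, eu_sid, store_name, owner, product in rows:
        if eu_sid != current:
            if current is not None:
                yield "</store>"          # close the previous store
            yield f"<store><name>{store_name}</name>"
            current = eu_sid
        if tag == 1 and owner is not None:
            yield f"<owner>{owner}</owner>"
        elif tag == 2 and product is not None:
            yield f"<product>{product}</product>"
    if current is not None:
        yield "</store>"

# Rows as the ORDER BY euSid, tag clause would deliver them.
rows = [
    (1, 1, "Nicolas", "Anne", None),
    (2, 1, "Nicolas", None, "Blanc de Blanc"),
    (1, 2, "Vinum", None, None),
    (2, 2, "Vinum", None, None),
]
doc = "".join(tag_sorted(rows))
print(doc)
```

The sort moves the structuring work into the engine, so the tagger itself can run in a single pass.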

Materialized XML Publishing
SilkRoute, SIGMOD’2001
The outer union / outer join query is large
Hard for some RDBMSs to optimize
Split it into smaller queries, then merge-sort the tuple streams
Idea: use the view tree; each partition defines a plan

View Tree
[Figure: the view tree (allsales(), country(c), name(c), store(c,x), name(n), url(c,x,u), product(c,x,y), name(n), sold(c,x,y,d), date(c,x,y,d), Tax(c,x,y,d,t); edge labels * * * * ?) split into parts evaluated by the queries Q1, Q2, Q3, Q4.]
Q1 = ... join
Q2 = ... left outer join
Q3 = ... join
Q4 = ... join

In general:
  – A “1” edge corresponds to a join
  – A “*” edge corresponds to a left outer join
  – There are 2^n possible plans
Choose best plan using heuristics
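The 2^n count can be illustrated by enumerating partitions of a small view tree. The sketch assumes that n is the number of edges and that a plan is obtained by choosing which edges to keep inside a single SQL query, with each resulting connected component becoming one query; it also uses a simplified tree containing only the internal nodes. None of the function names come from SilkRoute.

```python
from itertools import combinations

# Simplified view tree (internal nodes only); an assumption for illustration.
NODES = ["allsales", "country", "store", "product", "sold"]
EDGES = [("allsales", "country"), ("country", "store"),
         ("store", "product"), ("product", "sold")]

def partition(kept_edges):
    """Return the connected components induced by the kept edges."""
    parent = {n: n for n in NODES}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for p, c in kept_edges:
        parent[find(c)] = find(p)       # merge child component into parent's
    groups = {}
    for n in NODES:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

def all_plans():
    """One plan per subset of edges: 2**len(EDGES) plans in total."""
    plans = []
    for k in range(len(EDGES) + 1):
        for kept in combinations(EDGES, k):
            plans.append(partition(kept))
    return plans

plans = all_plans()
print(len(plans))
print(plans[0])    # no edge kept: every node is its own query
print(plans[-1])   # all edges kept: one large query
```

Keeping every edge gives the single large outer-join query, while cutting every edge gives one small query per node; the heuristics mentioned above choose a point between these extremes.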