Wednesday, May 22, 2002 XML Publishing, Storage

Lecture 16: XML Publishing, Storage (Wednesday, May 22, 2002)

Virtual XML Publishing
Don’t compute the XML data yet
Users ask XML queries
The system composes each query with the view and sends the result to the RDBMS
Main issue: composing queries

Materialized XML Publishing
Efficiently Publishing Relational Data as XML Documents, Shanmugasundaram et al., VLDB 2000
Considers several alternatives, both inside and outside the engine

Materialized XML Publishing
Create the structure (i.e. nesting): early or late
Add tags: early or late
Do this: inside or outside the relational engine
Note: tags may be added only after structuring has completed

Example
<allsales>
  for $S in db/EuStores
  return
    <store>
      <name> $S/name </name>
      for $O in db/Owners
      where $S/oID = $O/oID
      return <owner> $O/name </owner>
      for $L in EuSales, $P in Products
      where $S/euSid = $L/euSid AND $L/pid = $P/pid
      return
        <product>
          <name> $P/name </name>
          <price> $P/priceUSD </price>
        </product>
    </store>
</allsales>

Early Structuring, Early Tagging
The Stored Procedure Approach
Advantage: very simple
Disadvantage: multiple SQL queries submitted

XMLObject result = “<allsales>”
SQLCursor C1 = “select S.euSid, S.oID, S.name from EuStores S”
for x in C1 do
  result = result + “<store> <name>” + C1.name + “</name>”
  SQLCursor C2 = “select O.name from Owners O where O.oID=%C1.oID”
  for y in C2 do
    result = result + “<owner>” + C2.name + “</owner>”
  SQLCursor C3 = “select P.name, P.priceUSD from ... Where ...”
  for z in C3 do
    result = result + “<product> <name>” + C3.name + ...
  result = result + “</store>”
result = result + “</allsales>”
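The stored procedure approach can be sketched in Python with the standard sqlite3 module. This is only an illustration under assumptions: the sample rows, the helper names make_db and publish_nested, and the join condition of the third query are not from the slides, and no XML escaping is done. It shows the cost the slide points out: one owner query and one product query are issued per store.

```python
import sqlite3

def make_db():
    # Hypothetical sample data; table and column names follow the slides.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE EuStores(euSid INTEGER, name TEXT, oID INTEGER);
        CREATE TABLE Owners(oID INTEGER, name TEXT);
        CREATE TABLE EuSales(euSid INTEGER, pid INTEGER);
        CREATE TABLE Products(pid INTEGER, name TEXT, priceUSD INTEGER);
        INSERT INTO EuStores VALUES (1, 'Nord', 10), (2, 'Sud', 20);
        INSERT INTO Owners VALUES (10, 'Ann');
        INSERT INTO Products VALUES (100, 'lamp', 30), (101, 'desk', 90);
        INSERT INTO EuSales VALUES (1, 100), (1, 101);
    """)
    return db

def publish_nested(db):
    # Structure and tags are produced together, one nested cursor per level.
    out = ["<allsales>"]
    stores = db.execute(
        "SELECT euSid, name, oID FROM EuStores ORDER BY euSid").fetchall()
    for eu_sid, s_name, o_id in stores:
        out.append(f"<store><name>{s_name}</name>")
        # One extra query per store for its owners.
        for (o_name,) in db.execute(
                "SELECT name FROM Owners WHERE oID = ?", (o_id,)):
            out.append(f"<owner>{o_name}</owner>")
        # One extra query per store for its products (assumed join condition).
        for p_name, price in db.execute(
                "SELECT P.name, P.priceUSD FROM EuSales L, Products P "
                "WHERE L.euSid = ? AND L.pid = P.pid ORDER BY P.pid",
                (eu_sid,)):
            out.append(f"<product><name>{p_name}</name>"
                       f"<price>{price}</price></product>")
        out.append("</store>")
    out.append("</allsales>")
    return "".join(out)
```

With N stores this submits 1 + 2N SQL statements, which is the disadvantage the later approaches try to remove.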

Early Structuring, Early Tagging
The correlated CLOB approach

select XMLAGG(STORE(S.name,
         (select XMLAGG(OWNER(O.oID))
          from Owners O
          where S.oID = O.oID),
         (select XMLAGG(PRODUCT(P.name, P.priceUSD))
          from EuSales L, Products P
          where S.euSid = L.euSid AND L.pid = P.pid)))
from EuStores S

Early Structuring, Early Tagging
The correlated CLOB approach
Still nested loops...
Creates large CLOBs: a problem for the engine

procedure OWNER(id : varchar(20)) {
  return “<owner>” + id + “</owner>”
}
procedure PRODUCT(name : varchar(20), price : integer) {
  return “<product> <name>” + name + “</name>” +
         “<price>” + price + “</price> </product>”
}
XMLAGG = builtin aggregate operator; concatenates all strings in a set of strings
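The correlated CLOB query can be imitated in SQLite as a rough sketch. The assumptions are: SQLite's built-in group_concat with an empty separator stands in for XMLAGG, the tagging procedures are registered as Python user-defined functions, the STORE function (not defined on the slides) is made up here, OWNER receives the owner name as in the XQuery example, and the sample rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE EuStores(euSid INTEGER, name TEXT, oID INTEGER);
    CREATE TABLE Owners(oID INTEGER, name TEXT);
    CREATE TABLE EuSales(euSid INTEGER, pid INTEGER);
    CREATE TABLE Products(pid INTEGER, name TEXT, priceUSD INTEGER);
    INSERT INTO EuStores VALUES (1, 'Nord', 10), (2, 'Sud', 20);
    INSERT INTO Owners VALUES (10, 'Ann');
    INSERT INTO Products VALUES (100, 'lamp', 30);
    INSERT INTO EuSales VALUES (1, 100);
""")

def owner(name):
    return f"<owner>{name}</owner>"

def product(name, price):
    return f"<product><name>{name}</name><price>{price}</price></product>"

def store(name, owners, products):
    # An aggregate over an empty set is NULL (None here), so default to "".
    return f"<store><name>{name}</name>{owners or ''}{products or ''}</store>"

# Register the tagging procedures so the SQL engine can call them.
db.create_function("OWNER", 1, owner)
db.create_function("PRODUCT", 2, product)
db.create_function("STORE", 3, store)

QUERY = """
SELECT group_concat(STORE(S.name,
         (SELECT group_concat(OWNER(O.name), '')
          FROM Owners O WHERE O.oID = S.oID),
         (SELECT group_concat(PRODUCT(P.name, P.priceUSD), '')
          FROM EuSales L, Products P
          WHERE L.euSid = S.euSid AND L.pid = P.pid)), '')
FROM EuStores S
"""

# A single statement returns the whole document body as one large string.
xml = "<allsales>" + db.execute(QUERY).fetchone()[0] + "</allsales>"
```

Only one statement is submitted, but the two correlated subqueries are still evaluated once per store, and the result is a single large string value. The order in which group_concat concatenates its inputs is not guaranteed.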

Early Structuring, Early Tagging
The de-correlated CLOB approach

GroupBy euSid and XMLAGG (
  EuStores S1 LEFT OUTER JOIN Owners O ON S1.oId = O.oId)
JOIN
GroupBy euSid and XMLAGG (
  EuStores S2 LEFT OUTER JOIN
    (SELECT L.euSid, P.name, P.priceUSD
     FROM EuSales L, Products P
     WHERE L.pid = P.pid)
  ON S2.euSid = L.euSid)
ON S1.euSid = S2.euSid

Early Structuring, Early Tagging
The de-correlated CLOB approach
Modify the engine to do groupBy’s and taggings
Better than nested loops (why?)
Still large CLOBs
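A rough SQLite rendering of the de-correlated plan follows. It is a sketch under assumptions: group_concat stands in for XMLAGG, tags are built with string concatenation instead of engine-level tagging operators, and the sample rows and the aliases A, B, LP are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE EuStores(euSid INTEGER, name TEXT, oID INTEGER);
    CREATE TABLE Owners(oID INTEGER, name TEXT);
    CREATE TABLE EuSales(euSid INTEGER, pid INTEGER);
    CREATE TABLE Products(pid INTEGER, name TEXT, priceUSD INTEGER);
    INSERT INTO EuStores VALUES (1, 'Nord', 10), (2, 'Sud', 20);
    INSERT INTO Owners VALUES (10, 'Ann');
    INSERT INTO Products VALUES (100, 'lamp', 30);
    INSERT INTO EuSales VALUES (1, 100);
""")

# Two grouped outer joins, joined on the store key: no correlated subqueries.
QUERY = """
SELECT S.name, A.owners, B.products
FROM EuStores S
JOIN (SELECT S1.euSid,
             group_concat('<owner>' || O.name || '</owner>', '') AS owners
      FROM EuStores S1 LEFT OUTER JOIN Owners O ON S1.oID = O.oID
      GROUP BY S1.euSid) A ON A.euSid = S.euSid
JOIN (SELECT S2.euSid,
             group_concat('<product><name>' || LP.name || '</name><price>'
                          || LP.priceUSD || '</price></product>', '') AS products
      FROM EuStores S2 LEFT OUTER JOIN
           (SELECT L.euSid, P.name, P.priceUSD
            FROM EuSales L, Products P
            WHERE L.pid = P.pid) LP ON S2.euSid = LP.euSid
      GROUP BY S2.euSid) B ON B.euSid = S.euSid
ORDER BY S.euSid
"""

# Stores without owners or products get NULL (None) aggregates.
rows = db.execute(QUERY).fetchall()
```

Because the subqueries no longer refer to the outer store row, the optimizer is free to pick hash or sort-merge joins instead of re-running a subquery per store.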

Late Tagging
Idea: create a flat table first, then nest and tag
The flat table consists of outer joins and outer unions:
Unsorted → late structuring
Sorted → early structuring

Review of Outer Joins and Outer Unions
Left outer join, e.g. R(A,B) left outer join S(B,C) = T(A,B,C)

R:  A   B        S:  B   C        T:  A   B   C
    a1  b1           b1  c1           a1  b1  c1
    a2  b2           b1  c2           a1  b1  c2
    a3  b3           b3  c3           a2  b2  -
                                      a3  b3  c3
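The left outer join on this slide can be reproduced in SQLite as a quick check; the only assumption is that the padding value shown as "-" is SQL NULL, which Python reports as None.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE R(A TEXT, B TEXT);
    CREATE TABLE S(B TEXT, C TEXT);
    INSERT INTO R VALUES ('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3');
    INSERT INTO S VALUES ('b1', 'c1'), ('b1', 'c2'), ('b3', 'c3');
""")

# Every R tuple survives; R tuples without a match in S are padded with NULL.
T = db.execute("""
    SELECT R.A, R.B, S.C
    FROM R LEFT OUTER JOIN S ON R.B = S.B
    ORDER BY R.A, S.C
""").fetchall()
```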

Review of Outer Joins and Outer Unions
Outer union, e.g. R(A,B) outer union S(A,C) = T(Tag, A, B, C)

R:  A   B        S:  A   C        T:  Tag  A   B   C
    a1  b1           a3  c3           1    a1  b1  -
    a2  b2           a4  c4           1    a2  b2  -
                     a5  c5           2    a3  -   c3
                                      2    a4  -   c4
                                      2    a5  -   c5
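Standard SQL has no outer union operator in most engines, so one common workaround, shown here as a sketch in SQLite, is UNION ALL over branches padded with NULL plus a constant tag column that records which input each row came from.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE R(A TEXT, B TEXT);
    CREATE TABLE S(A TEXT, C TEXT);
    INSERT INTO R VALUES ('a1', 'b1'), ('a2', 'b2');
    INSERT INTO S VALUES ('a3', 'c3'), ('a4', 'c4'), ('a5', 'c5');
""")

# Each branch supplies NULL for the columns it lacks; Tag identifies the branch.
T = db.execute("""
    SELECT 1 AS Tag, A, B, NULL AS C FROM R
    UNION ALL
    SELECT 2 AS Tag, A, NULL AS B, C FROM S
    ORDER BY Tag, A
""").fetchall()
```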

Late Tagging, Late Structuring
Construct the table:
  (EuStores LEFT OUTER JOIN Owners)
  OUTER UNION
  (EuStores LEFT OUTER JOIN EuSales JOIN Products)
Tagging: use a main memory hash table to group elements on store ID
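A minimal sketch of a hash-based tagger, assuming the flat table arrives as unsorted tuples of the hypothetical shape (tag, euSid, storeName, ownerName, productName, priceUSD), with tag 1 for the owner branch and tag 2 for the product branch of the outer union. The function name tag_unsorted is made up.

```python
def tag_unsorted(rows):
    """Group flat outer-union rows by store ID in a hash table, then tag."""
    stores = {}  # euSid -> collected fragments for that store
    for tag, eu_sid, s_name, o_name, p_name, price in rows:
        entry = stores.setdefault(
            eu_sid, {"name": s_name, "owners": [], "products": []})
        if tag == 1 and o_name is not None:
            entry["owners"].append(f"<owner>{o_name}</owner>")
        elif tag == 2 and p_name is not None:
            entry["products"].append(
                f"<product><name>{p_name}</name><price>{price}</price></product>")
    # Nothing can be emitted until all rows are seen: memory grows with the data.
    parts = ["<allsales>"]
    for entry in stores.values():
        parts.append(f"<store><name>{entry['name']}</name>")
        parts.extend(entry["owners"])
        parts.extend(entry["products"])
        parts.append("</store>")
    parts.append("</allsales>")
    return "".join(parts)

# Hypothetical unsorted input.
unsorted_rows = [
    (2, 1, "Nord", None, "lamp", 30),
    (1, 2, "Sud", None, None, None),
    (1, 1, "Nord", "Ann", None, None),
    (2, 2, "Sud", None, None, None),
]
```

Calling tag_unsorted(unsorted_rows) groups the two Nord rows even though they are not adjacent in the input.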

Late Tagging, Early Structuring
Same table, but now sort by store ID and tag:
  (EuStores LEFT OUTER JOIN Owners)
  OUTER UNION
  (EuStores LEFT OUTER JOIN EuSales JOIN Products)
  ORDER BY euSid, tag
Constant space tagger
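A sketch of a constant-space tagger, assuming rows of the hypothetical shape (tag, euSid, storeName, ownerName, productName, priceUSD) that arrive already sorted by (euSid, tag). The generator only remembers the current store ID, so its memory use does not depend on the input size.

```python
def tag_sorted(rows):
    """Stream XML fragments from rows sorted by (euSid, tag)."""
    yield "<allsales>"
    current = None  # the only state kept: the store currently open
    for tag, eu_sid, s_name, o_name, p_name, price in rows:
        if eu_sid != current:
            if current is not None:
                yield "</store>"  # close the previous store
            yield f"<store><name>{s_name}</name>"
            current = eu_sid
        if tag == 1 and o_name is not None:
            yield f"<owner>{o_name}</owner>"
        elif tag == 2 and p_name is not None:
            yield f"<product><name>{p_name}</name><price>{price}</price></product>"
    if current is not None:
        yield "</store>"
    yield "</allsales>"

# Hypothetical input, sorted by store ID and then by tag.
sorted_rows = [
    (1, 1, "Nord", "Ann", None, None),
    (2, 1, "Nord", None, "lamp", 30),
    (1, 2, "Sud", None, None, None),
    (2, 2, "Sud", None, None, None),
]
```

Joining the fragments with "".join(tag_sorted(sorted_rows)) gives the document; in a real pipeline they could be written to the output stream as they are produced.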

Materialized XML Publishing
SilkRoute, SIGMOD 2001
The outer union / outer join query is large
Hard for some RDBMSs to optimize
Split it into smaller queries, then merge-sort the tuple streams
Idea: use the view tree; each partition defines a plan

View Tree
[Figure: view tree rooted at allsales, with nodes country, name, store, url, sale, sold, date and tax, edges labeled “*” or “?”, partitioned into subqueries Q1 to Q4]
Q1 = ...join
Q2 = ...left outer join
Q3 = ...join
Q4 = ...join

Choose best plan using heuristics
In general:
A “1” edge corresponds to a join
A “*” edge corresponds to a left outer join
For a view tree with n edges there are 2^n possible plans
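The 2^n count can be seen by enumerating plans directly: each edge of the view tree is either kept inside one SQL query or cut, and the connected pieces that remain are the partitions. The sketch below assumes a small made-up tree (NODES and EDGES are hypothetical, not the tree from the figure) and made-up helper names.

```python
from itertools import combinations

# Hypothetical view tree with n = 3 edges.
NODES = ["allsales", "country", "store", "sale"]
EDGES = [("allsales", "country"), ("country", "store"), ("store", "sale")]

def plans(edges):
    """Yield every subset of edges kept inside a single query (2^n subsets)."""
    for k in range(len(edges) + 1):
        for kept in combinations(edges, k):
            yield list(kept)

def partitions(nodes, kept):
    """Group nodes connected by kept edges; each group becomes one SQL query."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in kept:
        parent[find(b)] = find(a)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

all_plans = list(plans(EDGES))
```

Cutting every edge gives one query per node, keeping every edge gives the single large outer union / outer join query, and the heuristics choose among the plans in between.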

XML Storage in a Relational DB
Use generic schema [Florescu, Kossmann 1999]
Use DTD to derive schema [Shanmugasundaram, et al. 1999]
Use data mining to derive schema [Deutsch, Fernandez, Suciu 1999]
Use the Path table [T.Amagasa, T.Shimura, S.Uemura 2001]

XML Storage: Ternary Relation [Florescu, Kossmann 1999]
Use a generic relational schema (independent of the XML schema):
Ref(source,label,dest)
Val(node,value)

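XML Storage: Ternary Relation [Florescu, Kossmann 1999]
[Figure: example instance stored in Ref and Val. Node &o1 has a paper edge to &o2; &o2 has year, title, author and author edges to nodes &o3 to &o6; the leaf values are “The Calculus”, “…”, “…” and “1986”]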

XML Storage: Ternary Relation
XPath to SQL translation:
XPath: /paper[year=“1986”]/author
SQL:
  Select . . . . . . . . . . . . . .
  From . . . . . . . . . . . . . . .
  Where . . . . . . . . . . . . . .
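One possible translation for the ternary schema, offered as a sketch and not as the slide's intended answer. The assumptions are: &o1 is the document root, the assignment of labels to nodes &o3 to &o6 and the author values are invented, and the query returns the author values. Each XPath step becomes a self-join on Ref.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Ref(source TEXT, label TEXT, dest TEXT);
    CREATE TABLE Val(node TEXT, value TEXT);
    -- Hypothetical instance; which node carries which label is assumed.
    INSERT INTO Ref VALUES
        ('&o1', 'paper', '&o2'), ('&o2', 'title', '&o3'),
        ('&o2', 'author', '&o4'), ('&o2', 'author', '&o5'),
        ('&o2', 'year', '&o6');
    INSERT INTO Val VALUES
        ('&o3', 'The Calculus'), ('&o4', 'A. Author'),
        ('&o5', 'B. Author'), ('&o6', '1986');
""")

# /paper[year="1986"]/author: one Ref alias per step, one Val alias per value.
QUERY = """
SELECT VA.value
FROM Ref P, Ref Y, Val VY, Ref A, Val VA
WHERE P.source = '&o1' AND P.label = 'paper'
  AND Y.source = P.dest AND Y.label = 'year'
  AND VY.node = Y.dest AND VY.value = '1986'
  AND A.source = P.dest AND A.label = 'author'
  AND VA.node = A.dest
ORDER BY A.dest
"""

authors = [value for (value,) in db.execute(QUERY)]
```

The number of joins grows with the length of the path expression, which is the main cost of the generic schema.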

XML Storage: Ternary Relation
In practice, more tables may be needed:
RefTag1(source,dest)
RefTag2(source,dest)
…
IntVal(node,intVal)
RealVal(node,realVal)

XML Storage: DTD to Schema
[Christophides, Abiteboul, Cluet, Scholl 1994]
[Shanmugasundaram, Tufte, He, Zhang, DeWitt, Naughton 1999]
Idea: use the XML schema to derive the relational schema

XML Storage: DTD to Schema
DTD:
  <!ELEMENT paper (title, author*, year?)>
  <!ELEMENT author (firstName, lastName)>
Relational schema:
  Paper(pid, title, year)
  Author(aid, pid, firstName, lastName)

XML Storage: DTD to Schema
XPath to SQL translation:
XPath: /paper[year=“1986”]/author
SQL:
  Select . . . . . . . . . . . . . .
  From . . . . . . . . . . . . . . .
  Where . . . . . . . . . . . . . .
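With the DTD-derived schema the same XPath needs a single join. The following is a sketch of one possible answer, assuming the Paper and Author tables from the previous slide, hypothetical sample rows, year stored as text, and a result made of the authors' name columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Paper(pid INTEGER, title TEXT, year TEXT);
    CREATE TABLE Author(aid INTEGER, pid INTEGER,
                        firstName TEXT, lastName TEXT);
    -- Hypothetical sample rows.
    INSERT INTO Paper VALUES (1, 'The Calculus', '1986'), (2, 'Other', '1999');
    INSERT INTO Author VALUES
        (1, 1, 'Ada', 'Smith'), (2, 1, 'Bo', 'Vance'), (3, 2, 'Cy', 'Wallace');
""")

# /paper[year="1986"]/author: the predicate is a selection on Paper,
# and the child step is the foreign-key join from Author to Paper.
QUERY = """
SELECT A.firstName, A.lastName
FROM Paper P, Author A
WHERE P.year = '1986' AND A.pid = P.pid
ORDER BY A.aid
"""

authors = db.execute(QUERY).fetchall()
```

Compared with the ternary relation, the optional year and the required title are inlined into Paper, so only the repeated author element costs a join.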

XML Storage: Data Mining to Schema [Deutsch, Fernandez, Suciu 1999]
Given:
  One large XML data instance
  No schema/DTD
  Query workload
Problem: find a “good” relational schema for it
Notice: even when a DTD is present, it may be imprecise
E.g. when a person may have 1-3 phones, the DTD says only phone*

XML Storage: Data Mining to Schema [Deutsch, Fernandez, Suciu 1999]
[Figure: paper data with author, title, year, fn and ln elements, mapped to tables Paper1 and Paper2]

XML Storage: Data Mining to Schema
XPath to SQL translation:
XPath: /paper[year=“1986”]/author
SQL:

XML Storage: the Path Relation Method [T.Amagasa, T.Shimura, S.Uemura 2001]
Store paths as strings
XPath expressions are translated to the SQL LIKE operator
Additional information is stored for parent/child and ancestor/descendant relationships

XML Storage: the Path Relation Method
Path
pathID  Pathexpr
1       #/bib
2       #/bib#/paper
3       #/bib#/paper#/author
4       #/bib#/paper#/title
5       #/bib#/paper#/year
6       #/bib#/book#/author
7       #/bib#/book#/title
8       #/bib#/book#/publisher
One entry for every path in the database
Relatively small

XML Storage: the Path Relation Method
Element(NodeID, pathID, Start, End, ParentID)
[Table: sample Element rows giving each node's path ID, start and end positions, and parent ID]
One entry for every element in the database
Relatively large

XML Storage: the Path Relation Method
Val
NodeID  Val
3       Smith
4       Vance
5       Tim
6       Wallace
7       The Best Cooking Book Ever
8       2
. . .
One entry for every leaf in the database
Relatively large

XML Storage: the Path Relation Method
XPath to SQL translation:
XPath: /bib/paper[year=“1986”]//figure
SQL:
  Select . . . . . . . . . . . . . .
  From . . . . . . . . . . . . . . .
  Where . . . . . . . . . . . . . .
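One possible translation for the path relation schema, again as a sketch and not as the slide's intended answer. The assumptions are: the three tables have the columns shown on the previous slides, the sample rows (including figure and section paths) are invented, the child step uses ParentID, and the descendant step combines LIKE on the path string with Start/End interval containment.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Path(pathID INTEGER, Pathexpr TEXT);
    CREATE TABLE Element(NodeID INTEGER, pathID INTEGER,
                         "Start" INTEGER, "End" INTEGER, ParentID INTEGER);
    CREATE TABLE Val(NodeID INTEGER, Val TEXT);
    -- Hypothetical sample data.
    INSERT INTO Path VALUES
        (1, '#/bib'), (2, '#/bib#/paper'), (3, '#/bib#/paper#/year'),
        (4, '#/bib#/paper#/section'), (5, '#/bib#/paper#/section#/figure'),
        (6, '#/bib#/paper#/figure');
    INSERT INTO Element VALUES
        (1, 1, 0, 1000, NULL),
        (2, 2, 5, 200, 1), (3, 3, 10, 20, 2),
        (4, 4, 25, 50, 2), (5, 5, 30, 40, 4),
        (6, 2, 300, 500, 1), (7, 3, 310, 320, 6), (8, 6, 330, 340, 6);
    INSERT INTO Val VALUES (3, '1986'), (7, '1999');
""")

# /bib/paper[year="1986"]//figure
QUERY = """
SELECT F.NodeID
FROM Path PP, Element P, Path PY, Element Y, Val V, Path PF, Element F
WHERE PP.Pathexpr = '#/bib#/paper' AND P.pathID = PP.pathID
  AND PY.Pathexpr = '#/bib#/paper#/year' AND Y.pathID = PY.pathID
  AND Y.ParentID = P.NodeID                    -- year is a child of paper
  AND V.NodeID = Y.NodeID AND V.Val = '1986'
  AND PF.Pathexpr LIKE '#/bib#/paper#%/figure' -- // becomes a LIKE pattern
  AND F.pathID = PF.pathID
  AND F."Start" > P."Start" AND F."End" < P."End"  -- figure inside this paper
ORDER BY F.NodeID
"""

figures = [node for (node,) in db.execute(QUERY)]
```

The LIKE pattern selects figure paths at any depth below paper, and the interval test ties each figure to the particular paper that satisfied the year predicate, so the figure in the 1999 paper is excluded.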