Lecture 13: XQuery XML Publishing, XML Storage

Slides:

Advertisements

Similar presentations

XML: Extensible Markup Language

Advertisements

Web Data Management XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections.

XML May 3 rd, XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !

XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.

CSE 6331 © Leonidas Fegaras XML and Relational Databases 1 XML and Relational Databases Leonidas Fegaras.

Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.

1 Part 3: Query Languages Managing XML and Semistructured Data.

Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.

Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27.

Querying XML (cont.). Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?

1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.

1 Lecture 12: XQuery in SQL Server Monday, October 23, 2006.

1 Lecture 9: XQuery. 2 XQuery Motivation XPath expressivity insufficient –no join queries (as in SQL) –no changes to the XML structure possible –no quantifiers.

1 Managing XML and Semistructured Data Part 1: Preliminaries, Motivation and Overview Acknowledgement: Part of the materials in this set of XML slides.

1 COS 425: Database and Information Management Systems XML and information exchange.

Query Languages - XQuery Slides partially from Dan Suciu.

XML May 1 st, XML for Representing Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” persons.

1 Introduction to Database Systems CSE 444 Lecture 11 Xpath/XQuery April 23, 2008.

1 Lecture 11: Xpath/XQuery Friday, October 20, 2006.

1 Lecture 4 XML/Xpath/XQuery Tuesday, January 30, 2007.

XML and Databases 198:541. XML Motivation  Huge amounts of unstructured data on the web: HTML documents  No structure information  Only format instructions.

End of SQL XML April 22 th, Null Values If x=Null then 4*(3-x)/7 is still NULL If x=Null then x=“Joe” is UNKNOWN Three boolean values: –FALSE =

XML, XML Schema, XPath and XQuery Query Languages CS561 Slides collated from several sources, including D. Suciu at Univ. of Washington.

Xpath to XQuery February 23rd, Other Stuff HW 3 is out. Instructions for Phase 3 are out. Today: finish Xpath, start and finish Xquery. From Wednesday:

1 Lecture 16: Querying XML Data: XPath, XQuery Friday, February 11, 2005.

Querying XML February 12 th, Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will.

1 Lecture 12: XML Publishing, XML Storage Monday, October 24, 2005.

Xquery. Summary of XQuery FLWR expressions FOR and LET expressions Collections and sorting Resource W3C recommendation:

Introduction to XQuery Resources: Official URL: Short intros:

1 XQuery Slides From Dr. Suciu. 2 FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...

XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.

VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation An Introduction to XQuery.

Company LOGO OODB and XML Database Management Systems – Fall 2012 Matthew Moccaro.

End of XML February 19 th, FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...

Database Systems Part VII: XML Querying Software School of Hunan University

Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)

PROCESSING AND QUERYING XML 1. ROADMAP Models for Parsing XML Documents XPath Language XQuery Language XML inside DBMSs 2.

XML query. introduction An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever.

More XML: semantics, DTDs, XPATH February 18, 2004.

1 XQuery Slides From Dr. Suciu. 2 XQuery Based on Quilt, which is based on XML-QL Uses XPath to express more complex queries.

XML May 6th, Instructor AnHai Doan Brief bio –high school in Vietnam & undergrad in Hungary –M.S. at Wisconsin –Ph.D. at Washington under Alon &

1 Lecture 13: XQuery XML Publishing, XML Storage Monday, October 28, 2002.

1 CSE544 XML, XQuery Wednesday, April 5, Announcements Project Ideas are posted (to discuss in class) Groups due today Proposals due Monday Two.

XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections and sorting 2.

Lecture 17: XPath and XQuery Wednesday, Nov. 7, 2001.

1 CSE544: Lecture 7 XQuery, Relational Algebra Monday, 4/22/02.

1 Lecture 12: XML, XPath, XQuery Friday, October 24, 2003.

1 CSE544 XML, XQuery Monday+Wednesday, April 11+13, 2004.

Lecture 10 XML Monday, Oct. 21, 2001.

Lecture 11: Xpath/XQuery

Querying XML and Semistructured Data

Lecture 15: Midterm Review

Database Processing with XML

Lecture 11 XML Wednesday, Oct. 24, 2001.

XML: Schemas, Queries Wednesday, 4/17/2002

eXtensible Markup Language (XML)

Lecture 12: XML, XPath, XQuery

Introduction to Database Systems CSE 444 Lecture 12 More Xquery and Xquery in SQL Server April 25, 2008.

Lecture 9: XML Monday, October 17, 2005.

Wednesday, May 29, 2002 XML Storage Final Review

Lecture 8: XML Data Wednesday, October

Lecture 24: Final Review Friday, March 10, 2006.

Xquery Slides From Dr. Suciu.

Wednesday, May 22, 2002 XML Publishing, Storage

Introduction to Database Systems CSE 444 Lecture 10 XML

Lecture 12: XQuery in SQL Server

Introduction to Database Systems CSE 444 Lecture 12 Xquery in SQL Server October 22, 2007.

Lecture 11: XML and Semistructured Data

Lecture 14: XML Publishing & Storage Midterm Review

Presentation transcript:

Lecture 13: XQuery XML Publishing, XML Storage Monday, October 28, 2002

FLWR (“Flower”) Expressions FOR ... LET... WHERE... RETURN...

XQuery Find all book titles published after 1995: FOR $x IN document("bib.xml")/bib/book WHERE $x/year/text() > 1995 RETURN { $x/title } Result: <title> abc </title> <title> def </title> <title> ghi </title>

XQuery Find book titles by the coauthors of “Database Theory”: FOR $x IN bib/book[title/text() = “Database Theory”] $y IN bib/book[author/text() = $x/author/text()] RETURN <answer> { $y/title/text() } </answer> Result: <answer> abc </ answer > < answer > def </ answer > < answer > abc </ answer > < answer > ghk </ answer > Question: Why do we get duplicates ?

XQuery Same as before, but eliminate duplicates: FOR $x IN bib/book[title/text() = “Database Theory”]/author/text() $y IN distinct(bib/book[author/text() = $x] /title/text()) RETURN <answer> { $y } </answer> Result: <answer> abc </ answer > < answer > def </ answer > < answer > ghk </ answer > distinct = a function that eliminates duplicates Need to apply to a collection of text values, not of elements – note how query has changed

SQL and XQuery Side-by-side Product(pid, name, maker) Company(cid, name, city) Find all products made in Seattle FOR $x in /db/Product/row $y in /db/Company/row WHERE $x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle” RETURN { $x/name } SELECT x.name FROM Product x, Company y WHERE x.maker=y.cid and y.city=“Seattle” SQL XQuery FOR $y in /db/Company/row[city/text()=“Seattle”] $x in /db/Product/row[maker/text()=$y/cid/text()] RETURN { $x/name } Cool XQuery

XQuery: Nesting For each author of a book by Morgan Kaufmann, list all books she published: FOR $a IN /bib/book[publisher /text()=“Morgan Kaufmann”]/author) RETURN <result> { $a, FOR $t IN /bib/book[author/text()=$a/text()]/title RETURN $t } </result> In the RETURN clause comma concatenates XML fragments

XQuery Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <author> Smith </author> <title> ghi </title>

XQuery FOR $x in expr -- binds $x to each value in the list expr LET $x := expr -- binds $x to the entire list expr Useful for common subexpressions and for aggregations

XQuery Find books whose price is larger than average: LET $a:=avg(/bib/book/price/text()) FOR $b in /bib/book WHERE $b/price/text() > $a RETURN { $b }

XQuery Find all publishers that published more than 100 books: <big_publishers> { FOR $p IN distinct(//publisher/text()) LET $b := document("bib.xml")/book[publisher/text() = $p] WHERE count($b) > 100 RETURN <publisher> { $p } </publisher> } </big_publishers> $b is a collection of elements, not a single element count = a (aggregate) function that returns the number of elms

XQuery Summary: FOR-LET-WHERE-RETURN = FLWR FOR/LET Clauses List of tuples WHERE Clause List of tuples RETURN Clause Instance of Xquery data model

FOR v.s. LET FOR Binds node variables  iteration LET Binds collection variables  one value

FOR v.s. LET FOR $x IN /bib/book Returns: <result> <book>...</book></result> ... FOR $x IN /bib/book RETURN <result> { $x } </result> LET $x := /bib/book RETURN <result> { $x } </result> Returns: <result> <book>...</book> <book>...</book> ... </result>

Collections in XQuery Ordered and unordered collections /bib/book/author/text() = an ordered collection: result is in document order distinct(/bib/book/author/text()) = an unordered collection: the output order is implementation dependent LET $a := /bib/book  $a is a collection $b/author  a collection (several authors...) Returns: <result> <author>...</author> <author>...</author> ... </result> RETURN <result> { $b/author } </result>

The Role of XML Data XML is designed for data exchange, not to replace relational or E/R data Sources of XML data: Created manually with text editors: not really data Generated automatically from relational data (will discuss next) Text files, replacing older data formats: Web server logs, scientific data (biological, astronomical) Stored/processed in native XML engines: very few applications need that today

XML from/to Relational Data XML publishing: relational data  XML XML storage: XML  relational data

XML Publishing XML Tuple streams Web Relational XML publishing Database XML publishing Application Xpath/ XQuery SQL

XML Publishing Exporting the data is easy: we do this already for HTML Translating XQuery  SQL is hard XML publishing systems: Research: Experanto (IBM/DB2), SilkRoute (AT&T Labs and UW) XQuery  SQL Commercial: SQL Server, Oracle only Xpath  SQL and with restrictions

XML Publishing Relational schema: Student(sid, name, address) Will follow SilkRoute, more or less Relational schema: Student(sid, name, address) Course(cid, title, room) Enroll(sid, cid, grade) enroll student course

XML Publishing Group by courses: redundant representation of students <xmlview> <course> <title> Operating Systems </title> <room> MGH084 </room> <student> <name> John </name> <address> Seattle </address > <grade> 3.8 </grade> </student> <student> …</student> … </course> <course> <title> Database </title> <room> EE045 </room> <student> <name> Mary </name> <address> Shoreline </address > <grade> 3.9 </grade> </xmlview> Group by courses: redundant representation of students Other representations possible too

XML Publishing First thing to do: design the DTD: <!ELEMENT xmlview (course*)> <!ELEMENT course (title, room, student*)> <!ELEMENT student (name,address,grade)> <!ELEMENT name (#PCDATA)> <!ELEMENT address (#PCDATA)> <!ELEMENT grade (#PCDATA)>

Now we write an XQuery to export relational data  XML Note: result is is the right DTD <xmlview> { FOR $x IN /db/Course/row RETURN <course> <title> { $x/title/text() } </title> <room> { $x/room/text() } </room> { FOR $y IN /db/Enroll/row[cid/text() = $x/cid/text()]/row $z IN /db/Student/row[sid/text() = $y/sid/text()]/row RETURN <student> <name> { $z/name/text() } </name> <address> { $z/address/text() } </address> <grade> { $y/grade/text() } </grade> </student> } </course> } </xmlview>

SilkRoute does this automatically XML Publishing Query: find Mary’s grade in Operating Systems XQuery FOR $x IN /xmlview/course[title/text()=“Operating Systems”], $y IN $x/student/[name/text()=“Mary”] RETURN <answer> $y/grade/text() </answer> SilkRoute does this automatically SELECT Enroll.grade FROM Student, Enroll, Course WHERE Student.name=“Mary” and Course.title=“OS” and Student.sid = Enroll.sid and Enroll.cid = Course.cid SQL

XML Publishing How do we choose the output structure ? Determined by agreement, with our partners, or dictated by committees XML dialects (called applications) = DTDs XML Data is often nested, irregular, etc No normal forms for XML

XML Storage Often the XML data is small and is parsed directly into the application (DOM API) Sometimes it is big, and we need to store it in a database The XML storage problem: How do we choose the schema of the database ? Much harder than XML publishing (why ?)

XML Storage Two solutions: Schema derived from DTD Storing XML as a graph: “Edge relation”

Designing a Schema from DTD Design a relational schema for: <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office?, phone*)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, ((price,availability)|description))> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

Designing a Schema from DTD <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office?, phone*)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, ((price,availability)|description))> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]> First, construct the DTD graph: company * * person product * ssn name office phone pid price avail. descr.

Designing a Schema from DTD Next, design the relational schema, using common sense. company * * person product * ssn name office phone pid price avail. descr. Person(ssn, name, office) Phone(ssn, phone) Product(pid, name, price, avail., descr.) Which attributes may be null ?

Designing a Schema from DTD What happens to queries: FOR $x IN /company/product[description] RETURN <answer> { $x/name, $x/description } </answer> SELECT Product.name, Product.description FROM Product WHERE Product.description IS NOT NULL

Storing XML as a Graph Every XML instance is a tree Hence we can store it as any graph, using an Edge table In addition we need a Value table to store the data values (#PCDATA)

Storing XML as a Graph Edge Value db book publisher title author state “Complete Guide to DB2” “Chamberlin” “Transaction Processing” “Bernstein” “Newcomer” “Morgan Kaufman” “CA” 1 2 3 4 5 6 7 8 9 10 11 Edge Source Tag Dest db 1 book 2 title 3 author 4 5 6 7 . . . Value Source Val 3 Complete guide . . . 4 Chamberlin 6 . . .

Storing XML as a Graph FOR $x IN /db/book[author/text()=“Chamberlin”] What happens to queries: FOR $x IN /db/book[author/text()=“Chamberlin”] RETURN $x/title

Storing XML as a Graph What happens to queries: SELECT vtitle.value FROM Edge xdb, Edge xbook, Edge xauthor, Edge xtitle, Value vauthor, Value vtitle WHERE xdb.source=0 and xdb.tag = ‘db’ and xdb.dest = xbook.source and xbook.tag = ‘book’ and xbook.dest = xauthor.source and xauthor.tag = ‘author’ and xbook.dest = xtitle.source and xtitle.tag = ‘title’ and xauthor.dest = vauthor.source and vauthor.value = ‘Chamberin and xtitle.dest = vtitle.source

Storing XML as a Graph Edge relation summary: Same relational schema for every XML document: Edge(Source, Tag, Dest) Value(Source, Val) Generic: works for every XML instance But inefficient: Repeat tags multiple times Need many joins to reconstruct data