Lecture 11: Xpath/XQuery

Slides:



Advertisements
Similar presentations
Spring Part III: Introduction to XPath XML Path Language.
Advertisements

Web Data Management XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections.
XML May 3 rd, XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !
1 Web Data Management Path Expressions. 2 In this lecture Path expressions Regular path expressions Evaluation techniques Resources: Data on the Web Abiteboul,
XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
1 Lecture 6 XML/Xpath/XQuery Wednesday, November 3, 2010 Dan Suciu -- CSEP544 Fall 2010.
1 Part 3: Query Languages Managing XML and Semistructured Data.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27.
Querying XML (cont.). Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?
IS432: Semi-Structured Data Dr. Azeddine Chikh. 7. XQuery.
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
QSX (LN 3)1 Query Languages for XML XPath XQuery XSLT (not being covered today!) (Slides courtesy Wenfei Fan, Univ Edinburgh and Bell Labs)
1 Lecture 9: XQuery. 2 XQuery Motivation XPath expressivity insufficient –no join queries (as in SQL) –no changes to the XML structure possible –no quantifiers.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 357 Database Systems I Query Languages for XML.
Query Languages - XQuery Slides partially from Dan Suciu.
1 Efficient Processing of XPath Queries Using Indexes Yan Chen 1, Sanjay Madria 1, Kalpdrum Passi 2, Sourav Bhowmick 3 1 Department of Computer Science,
XML May 2 nd, Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.
Managing XML and Semistructured Data Lecture 6: XPath Prof. Dan Suciu Spring 2001.
XML May 1 st, XML for Representing Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” persons.
1 Introduction to Database Systems CSE 444 Lecture 11 Xpath/XQuery April 23, 2008.
1 Lecture 11: Xpath/XQuery Friday, October 20, 2006.
Lecture #6 XML November 2 nd, Administration Thanks for the mid-term comments Comment on the book & readings Project #2 Project #1 Homework #4 Homework.
1 Lecture 4 XML/Xpath/XQuery Tuesday, January 30, 2007.
XML, XML Schema, XPath and XQuery Query Languages CS561 Slides collated from several sources, including D. Suciu at Univ. of Washington.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Xpath to XQuery February 23rd, Other Stuff HW 3 is out. Instructions for Phase 3 are out. Today: finish Xpath, start and finish Xquery. From Wednesday:
1 Lecture 16: Querying XML Data: XPath, XQuery Friday, February 11, 2005.
Querying XML February 12 th, Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will.
Xquery. Summary of XQuery FLWR expressions FOR and LET expressions Collections and sorting Resource W3C recommendation:
Introduction to XQuery Resources: Official URL: Short intros:
1 XQuery Slides From Dr. Suciu. 2 FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...
XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives.
End of XML February 19 th, FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...
Management of XML and Semistructured Data Lecture 5: Query Languages Wednesday, 4/1/2001.
Lecture 6: XML Query Languages Thursday, January 18, 2001.
Database Systems Part VII: XML Querying Software School of Hunan University
CSE 636 Data Integration Fall 2006 XML Query Languages XPath.
XML query. introduction An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever.
More XML: semantics, DTDs, XPATH February 18, 2004.
1 XQuery Slides From Dr. Suciu. 2 XQuery Based on Quilt, which is based on XML-QL Uses XPath to express more complex queries.
1 Lecture 13: XQuery XML Publishing, XML Storage Monday, October 28, 2002.
IS432 Semi-Structured Data Lecture 6: XQuery Dr. Gamal Al-Shorbagy.
IS432 Semi-Structured Data Lecture 4: XPath Dr. Gamal Al-Shorbagy.
1 CSE544 XML, XQuery Wednesday, April 5, Announcements Project Ideas are posted (to discuss in class) Groups due today Proposals due Monday Two.
CSE 6331 © Leonidas Fegaras XQuery 1 XQuery Leonidas Fegaras.
1 Lecture 5: Relational Algebra and XML Monday, April 26th, 2004.
XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections and sorting 2.
Lecture 17: XPath and XQuery Wednesday, Nov. 7, 2001.
1 CSE544: Lecture 7 XQuery, Relational Algebra Monday, 4/22/02.
1 Lecture 12: XML, XPath, XQuery Friday, October 24, 2003.
1 CSE544 XML, XQuery Monday+Wednesday, April 11+13, 2004.
Lecture 14: Relational Algebra Projects XML?
XML path expressions CSE 350 Fall 2003.
Querying XML and Semistructured Data
XML: Schemas, Queries Wednesday, 4/17/2002
Lecture 12: XML, XPath, XQuery
Lecture 9: XML Monday, October 17, 2005.
Wednesday, May 29, 2002 XML Storage Final Review
XQuery Leonidas Fegaras.
Xquery Slides From Dr. Suciu.
Introduction to Database Systems CSE 444 Lecture 10 XML
Lecture 15: Querying XML Friday, October 27, 2000.
Lecture 11: XML and Semistructured Data
Lecture 13: XQuery XML Publishing, XML Storage
Presentation transcript:

Lecture 11: Xpath/XQuery Wednesday, April 17, 2007

Outline XPath XQuery See recommend readings in previous lecture

Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal will not discuss in class

Sample Data for Queries <bib> <book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year> </book> <book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year> </book> </bib>

Data Model for XPath The root The root element . . . . bib book book publisher author . . . . Addison-Wesley Serge Abiteboul

XPath: Simple Expressions /bib/book/year Result: <year> 1995 </year> <year> 1998 </year> Result: empty (there were no papers) /bib/paper/year /bib What’s the difference ? /

XPath: Restricted Kleene Closure //author Result:<author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> Result: <first-name> Rick </first-name> /bib//first-name

Xpath: Attribute Nodes /bib/book/@price Result: “55” @price means that price is has to be an attribute

Xpath: Wildcard //author/* Result: <first-name> Rick </first-name> <last-name> Hull </last-name> * Matches any element @* Matches any attribute //author/*

Xpath: Text Nodes /bib/book/author/text() Result: Serge Abiteboul Victor Vianu Jeffrey D. Ullman Rick Hull doesn’t appear because he has firstname, lastname Functions in XPath: text() = matches the text value node() = matches any node (= * or @* or text()) name() = returns the name of the current tag

Xpath: Predicates /bib/book/author[firstname] Result: <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>

Xpath: More Predicates /bib/book/author[firstname][address[.//zip][city]]/lastname Result: <lastname> … </lastname> <lastname> … </lastname> How do we read this ? First remove all qualifiers (predicates): /bib/book/author /lastname Then add them one by one: /bib/book/author[firstname][address]/lastname etc

Xpath: More Predicates /bib/book[@price < 60] /bib/book[author/@age < 25] /bib/book[author/text()]

Xpath: Position Predicates The 2nd book /bib/book[2] /bib/book[last()] The last book The 2nd of all books in 1998 /bib/book[@year = 1998] [2] /bib/book[2][@year = 1998] 2nd book IF it is in 1998

Xpath: More Axes /bib/book[.//review] /bib/book[./review] . means current node /bib/book[./review] /bib/book[review] Same as /bib/author/firstname /bib/author/. /firstname Same as

Xpath: More Axes /bib/author/.. /author/zip /bib/author/zip .. means parent node /bib/author/.. /author/zip /bib/author/zip Same as /bib/book[.//review/../comments] Same as /bib/book[.//*[comments][review]] Hint: don’t use ..

Xpath: Summary bib matches a bib element * matches any element / matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper|book matches a paper or a book @price matches a price attribute bib/book/@price matches price attribute in book, in bib bib/book[@price<“55”]/author/lastname matches…

XQuery Based on Quilt, which is based on XML-QL Uses XPath to express more complex queries

FLWR (“Flower”) Expressions FOR ... LET... WHERE... RETURN...

FOR-WHERE-RETURN Find all book titles published after 1995: FOR $x IN document("bib.xml")/bib/book WHERE $x/year/text() > 1995 RETURN $x/title Result: <title> abc </title> <title> def </title> <title> ghi </title>

FOR-WHERE-RETURN Equivalently (perhaps more geekish) FOR $x IN document("bib.xml")/bib/book[year/text() > 1995] /title RETURN $x And even shorter: document("bib.xml")/bib/book[year/text() > 1995] /title

COERCION The query: FOR $x IN document("bib.xml")/bib/book[year > 1995] /title RETURN $x Is rewritten by the system into: FOR $x IN document("bib.xml")/bib/book[year/text() > 1995] /title RETURN $x

FOR-WHERE-RETURN Find all book titles and the year when they were published: FOR $x IN document("bib.xml")/ bib/book RETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer> Result: <answer> <title> abc </title> <year> 1995 </ year > </answer> <answer> <title> def </title> < year > 2002 </ year > </answer> <answer> <title> ghk </title> < year > 1980 </ year > </answer>

FOR-WHERE-RETURN Notice the use of “{“ and “}” What is the result without them ? FOR $x IN document("bib.xml")/ bib/book RETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer> <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>

Nesting For each author of a book by Morgan Kaufmann, list all books she published: FOR $b IN document(“bib.xml”)/bib, $a IN $b/book[publisher /text()=“Morgan Kaufmann”]/author RETURN <result> { $a, FOR $t IN $b/book[author/text()=$a/text()]/title RETURN $t } </result> In the RETURN clause comma concatenates XML fragments

Result <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <author> Smith </author> <title> ghi </title>

Aggregates Find all books with more than 3 authors: FOR $x IN document("bib.xml")/bib/book WHERE count($x/author)>3 RETURN $x count = a function that counts avg = computes the average sum = computes the sum distinct-values = eliminates duplicates

Aggregates Same thing: FOR $x IN document("bib.xml")/bib/book[count(author)>3] RETURN $x

Eliminating Duplicates Print all authors: FOR $a IN distinct-values($b/book/author/text()) RETURN <author> { $a } </author> Note: distinct-values applies ONLY to values, NOT elements

The LET Clause Find books whose price is larger than average: FOR $b in document(“bib.xml”)/bib LET $a:=avg($b/book/price/text()) FOR $x in $b/book WHERE $x/price/text() > $a RETURN $x

Flattening Compute a list of (author, title) pairs Input: <book> <title> Databases </title> <author> Widom </author> <author> Ullman </author> </answer> FOR $b IN document("bib.xml")/bib/book, $x IN $b/title/text(), $y IN $b/author/text() RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer> Output: <answer> <title> Databases </title> <author> Widom </author> </answer> <answer> <title> Databases </title> <author> Ullman </author> </answer>

Re-grouping For each author, return all titles of her/his books Result: <answer> <author> efg </author> <title> abc </title> <title> klm </title> . . . . </answer> FOR $b IN document("bib.xml")/bib, $x IN $b/book/author/text() RETURN <answer> <author> { $x } </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer> What about duplicate authors ?

Re-grouping Same, but eliminate duplicate authors: FOR $b IN document("bib.xml")/bib LET $a := distinct-values($b/book/author/text()) FOR $x IN $a RETURN <answer> <author> $x </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

Re-grouping Same thing: FOR $b IN document("bib.xml")/bib, $x IN distinct-values($b/book/author/text()) RETURN <answer> <author> $x </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

SQL and XQuery Side-by-side Find all product names, prices, sort by price Product(pid, name, maker, price) FOR $x in document(“db.xml”)/db/Product/row ORDER BY $x/price/text() RETURN <answer> { $x/name, $x/price } </answer> XQuery SELECT x.name, x.price FROM Product x ORDER BY x.price SQL

Xquery’s Answer <answer> <name> abc </name> <price> 7 </price> </answer> <answer> <name> def </name> <price> 23 </price> </answer> . . . . Notice: this is NOT a well-formed document ! (WHY ???)

Producing a Well-Formed Answer <myQuery> { FOR $x in document(“db.xml”)/db/Product/row ORDER BY $x/price/text() RETURN <answer> { $x/name, $x/price } </answer> } </myQuery>

Xquery’s Answer <myQuery> <answer> <name> abc </name> <price> 7 </price> </answer> <answer> <name> def </name> <price> 23 </price> </answer> . . . . </myQuery> Now it is well-formed !

SQL and XQuery Side-by-side Product(pid, name, maker, price) Company(cid, name, city, revenues) Find all products made in Seattle FOR $r in document(“db.xml”)/db, $x in $r/Product/row, $y in $r/Company/row WHERE $x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle” RETURN { $x/name } XQuery SELECT x.name FROM Product x, Company y WHERE x.maker=y.cid and y.city=“Seattle” SQL FOR $y in /db/Company/row[city/text()=“Seattle”], $x in /db/Product/row[maker/text()=$y/cid/text()] RETURN { $x/name } Cool XQuery

<product> <row> <pid> 123 </pid> <name> abc </name> <maker> efg </maker> </row> <row> …. </row> … </product> <product> . . . </product> . . . .

SQL and XQuery Side-by-side For each company with revenues < 1M count the products over $100 SELECT y.name, count(*) FROM Product x, Company y WHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000 GROUP BY y.cid, y.name FOR $r in document(“db.xml”)/db, $y in $r/Company/row[revenue/text()<1000000] RETURN <proudCompany> <companyName> { $y/name/text() } </companyName> <numberOfExpensiveProducts> { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) } </numberOfExpensiveProducts> </proudCompany>

SQL and XQuery Side-by-side Find companies with at least 30 products, and their average price SELECT y.name, avg(x.price) FROM Product x, Company y WHERE x.maker=y.cid GROUP BY y.cid, y.name HAVING count(*) > 30 An element FOR $r in document(“db.xml”)/db, $y in $r/Company/row LET $p := $r/Product/row[maker/text()=$y/cid/text()] WHERE count($p) > 30 RETURN <theCompany> <companyName> { $y/name/text() } </companyName> <avgPrice> avg($p/price/text()) </avgPrice> </theCompany> A collection