Query Languages for XML: XQuery
Adrian Pop, Paul Pop
Computer and Information Science Dept., Linköpings universitet
Outline
- Motivation
  - XML applications, types of queries
- Approaches
  - Requirements on a query language
  - Path expressions, the basic building block
  - XML query languages: XML-QL, YATL, Lorel, XQL
- XQuery
  - Background, history
  - Concepts, examples
  - FLWR expressions
  - FOR and LET expressions
  - Collections and sorting
- Available software, demo
  - Examples
  - XQuery vs. XSLT
- Summary
Motivation
- XML applications represent many types of information from many sources:
  - structured and semi-structured documents
  - relational databases
  - object repositories
- This information has to be accessed, filtered, grouped, transformed, etc.
- Query languages are needed to retrieve and interpret information from diverse sources.
- Querying a database is different from transforming a document.
Document World vs. Database World
- Two worlds, two querying approaches:
  - XML-as-document: roots in SGML; queried using path expressions
  - XML-as-data: middleware, an interface to databases; queried with SQL-like constructs
- An XML query language has to work in both worlds: it should work across all types of XML data sources and applications.
- Problem: existing query languages were designed for specific types of data; they are robust for those types and weak for others.
Types of Queries
The W3C specification identifies several important classes of queries:
- Filtering: e.g., compute a table of contents for a document
- Joins: combine data from multiple sources in a single result
- Grouping: form data into groups and apply aggregate functions such as "average" or "count"
- Queries on sequence: queries where sequence and hierarchy (i.e., precedence relationships) are important
Requirements on a Query Language
- Output: a query language should output XML
  - enables composition of queries
  - views can be defined via a single query
  - transparent to applications
  - server-side processing
- The following operations should all be possible in a single query:
  - Selection: choosing a document or element based on content, structure, or attributes
  - Extraction: pulling out particular elements of a document
  - Reduction: removing selected sub-elements of an element
  - Restructuring: constructing a new set of element instances to hold queried data
  - Combination: merging two or more elements into one
- No schema required / exploit the available schema
  - queries should work on XML data that has no schema or DTD
  - an existing schema should be used to detect errors at compile time
Requirements on a Query Language, Cont.
- Preserve order and association: a query should preserve the order and grouping of elements
- Programmatic manipulation: queries will be constructed by programs and interfaces, so programs should be able to manipulate the representation of queries easily
- XML representation: queries should be mutually embeddable with XML
- XLink and XPointer cognizant
- Namespace alias independence: a query should not depend on namespace aliases local to an XML document
- Support for new datatypes
- Suitable for metadata
Path Expressions
- Query languages for XML and semi-structured data model the data as an edge-labeled directed graph.
- They need the ability to reach arbitrary depths in the data graph; this is achieved using path expressions, the basic building block of a query language.
- A path expression is a sequence of edge labels l1, l2, …, ln. It is a query whose result, for a given data graph, is a set of nodes.
- Paths can be specified based on properties:
  - a property of the path, e.g., the path must traverse the book edge
  - a property of an individual edge label, e.g., the label contains the substring "Victor"
  - regular expressions are used to describe path properties
- Limitations: path expressions
  - cannot create new nodes in the database
  - cannot perform joins
  - cannot test values stored in the database
Path Expressions
Data, modeled as an edge-labeled directed graph:
[Figure: bibliography graph rooted at Bib (&o1), with paper and book edges and, below them, references, author, title, year, http, publisher, page, firstname/lastname, and first/last edges; leaf values include "Serge", "Abiteboul", 1997, "Victor", "Vianu".]
Example path expressions and their results:
- Bib.paper = {&o12, &o29}
- Bib.book.publisher = {&o51}
- Bib.paper.author.lastname = {&o71, &206}
Regular Path Expressions
    R ::= label | _ | R.R | (R|R) | R* | R+ | R?
where _ matches any single edge label.
Examples:
- Bib.(paper|book).author
- Bib.book.author.lastname?
- Bib.book.(references)*.author
- Bib.(_)*.zip
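To make the grammar concrete, here is a small evaluator sketched in Python. It assumes a hypothetical encoding of the data graph as a dictionary from object ids to (label, target) edges, and it represents expressions as nested tuples instead of parsing the textual syntax. The graph reproduces the three results listed for the example data, but all of its other edges are invented.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding: node id -> list of (edge label, target node id).
Graph = Dict[str, List[Tuple[str, str]]]
# Regular path expressions as nested tuples (a hand-built AST, no parser).
Regex = tuple


def lab(name: str) -> Regex:
    return ("label", name)


def seq(*rs: Regex) -> Regex:
    """R1.R2. ... .Rn, nested to the left."""
    out = rs[0]
    for r in rs[1:]:
        out = ("seq", out, r)
    return out


def step(graph: Graph, nodes: Set[str], pred: Callable[[str], bool]) -> Set[str]:
    """Follow one edge whose label satisfies pred, from every node in nodes."""
    return {dst for n in nodes for (lbl, dst) in graph.get(n, []) if pred(lbl)}


def evaluate(graph: Graph, r: Regex, nodes: Set[str]) -> Set[str]:
    """Return the set of nodes reachable from `nodes` along a path matching r."""
    kind = r[0]
    if kind == "label":                      # a specific edge label
        return step(graph, nodes, lambda lbl: lbl == r[1])
    if kind == "any":                        # _ : any single label
        return step(graph, nodes, lambda lbl: True)
    if kind == "seq":                        # R.R
        return evaluate(graph, r[2], evaluate(graph, r[1], nodes))
    if kind == "alt":                        # (R|R)
        return evaluate(graph, r[1], nodes) | evaluate(graph, r[2], nodes)
    if kind == "opt":                        # R? : zero or one match of R
        return set(nodes) | evaluate(graph, r[1], nodes)
    if kind in ("star", "plus"):             # R* (zero or more), R+ (one or more)
        result = set(nodes) if kind == "star" else set()
        frontier = evaluate(graph, r[1], nodes)
        while frontier - result:             # iterate until no new nodes appear
            new = frontier - result
            result |= new
            frontier = evaluate(graph, r[1], new)
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")


# Hypothetical graph: it reproduces the three results listed under the
# figure; every other edge is an assumption, not read from the figure.
BIB: Graph = {
    "&o1": [("paper", "&o12"), ("book", "&o24"), ("paper", "&o29")],
    "&o12": [("author", "&o43"), ("title", "&o44"), ("year", "&o45")],
    "&o24": [("author", "&o46"), ("title", "&o47"), ("publisher", "&o51"),
             ("references", "&o12")],
    "&o29": [("author", "&o49"), ("title", "&o50")],
    "&o43": [("firstname", "&o70"), ("lastname", "&o71")],
    "&o46": [("first", "&o48"), ("last", "&o52")],
    "&o49": [("firstname", "&243"), ("lastname", "&206")],
}
ROOT = {"&o1"}  # the node reached through the entry point Bib

if __name__ == "__main__":
    # Bib.paper.author.lastname
    print(sorted(evaluate(BIB, seq(lab("paper"), lab("author"), lab("lastname")), ROOT)))
    # Bib.book.(references)*.author
    print(sorted(evaluate(BIB, seq(lab("book"), ("star", lab("references")), lab("author")), ROOT)))
```

R* and R+ are computed by repeating one application of R until no new nodes appear, so evaluation also terminates on cyclic graphs. Note how Bib.book.author.lastname? keeps the author node itself when it has no lastname edge.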
XML Query Languages
- Semistructured databases
  - XML-QL: A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. A query language for XML.
  - YATL: S. Cluet, S. Jacqmin, and J. Siméon. The New YATL: Design and Specifications. Working draft.
  - Lorel: S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel query language for semistructured data. ftp://db.stanford.edu/pub/papers/lorel96.ps
- Structured text, search techniques
  - XQL: J. Robie. The design of XQL, 1999. design.html
XML Query Examples
Example data: a list of books
- TCP/IP Illustrated; Stevens W.; Addison-Wesley
- Advanced Programming in the Unix environment; Stevens W.; Addison-Wesley
- Data on the Web; Abiteboul Serge, Buneman Peter, Suciu Dan; Morgan Kaufmann Publishers
- The Economics of Technology and Content for Digital TV; Gerbarg Darcy, CITI; Kluwer Academic Publishers
XML Query Examples, Cont.
Query: List books published by Addison-Wesley after 1991, including their year and title.
Result:
- TCP/IP Illustrated
- Advanced Programming in the Unix environment
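The same query can be sketched without any XML query language, which makes the pattern / filter / constructor split explicit. This Python sketch uses the standard xml.etree.ElementTree module; the year attributes and the element layout are assumptions borrowed from the W3C XML Query Use Cases (XMP) sample data, since the example data above does not show them. The function name books_by is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: the year values and element layout follow the W3C
# XML Query Use Cases (XMP) bibliography; authors are omitted because the
# query does not use them.
BIB_XML = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
    <publisher>Kluwer Academic Publishers</publisher></book>
</bib>
"""


def books_by(bib: ET.Element, publisher: str, after: int) -> ET.Element:
    result = ET.Element("bib")                          # constructor: new root
    for book in bib.findall("book"):                    # pattern: bind each book
        year = book.get("year")
        if book.findtext("publisher") == publisher and int(year) > after:  # filter
            out = ET.SubElement(result, "book", year=year)  # constructor
            ET.SubElement(out, "title").text = book.findtext("title")
    return result


if __name__ == "__main__":
    answer = books_by(ET.fromstring(BIB_XML), "Addison-Wesley", 1991)
    print(ET.tostring(answer, encoding="unicode"))
```

The output is a new bib element holding one book element (with its year attribute and title) per match, which is the shape the constructor clauses in the following languages build.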
Features of Query Languages
- A query has three parts:
  - a pattern clause matches nested elements in the input document and binds variables
  - a filter clause tests the bound variables
  - a constructor clause specifies the result in terms of the bound variables
- Join operator: combines data from different portions of documents
- Path expressions: allow querying without precise knowledge of the document structure
- Other useful features:
  - the ability to check for the absence of information, e.g., missing fields
  - use of arbitrary external functions, such as aggregation functions, string comparison functions, etc.
  - use of navigation operators, which simplify handling data with references
XML-QL
    CONSTRUCT <bib> {
      WHERE
        <bib>
          <book year=$y>
            <title>$t</title>
            <publisher><name>Addison-Wesley</name></publisher>
          </book>
        </bib> IN "…",
        $y > 1991
      CONSTRUCT
        <book year=$y><title>$t</title></book>
    } </bib>
- Patterns and filters appear in the WHERE clause; the constructor appears in the CONSTRUCT clause.
- The result of the inner WHERE clause is a relation that maps variables to tuples of values that satisfy the clause: here, all pairs of year and title values bound to ($y, $t).
- The result contains one book element for each book that satisfies the WHERE clause of the inner query, i.e., one for each pair ($y, $t).
YATL
    make
      bib [ *book [ @year [ $y ],
                    title [ $t ] ] ]
    match "…" with
      bib [ *book [ @year [ $y ],
                    title [ $t ],
                    publisher [ name [ $n ] ] ] ]
    where $n = "Addison-Wesley" and $y > 1991
- The constructor appears in the make clause, patterns appear in the match clause, and filters appear in the where clause.
- The pattern states that a bib element may have many book elements, but that each book element has one year attribute, one publisher element, and one title element.
Lorel
    select xml(bib:{
      (select xml(book:{@year:y, title:t})
       from bib.book b, b.title t, b.year y
       where b.publisher = "Addison-Wesley" and y > 1991)})
- The constructor appears in the select clause, patterns appear in the from clause, and both patterns and filters appear in the where clause.
- bib is used as the entry point for the data in the XML document.
- The from clause binds variables to the element ids of elements denoted by the given pattern, and the where clause selects those elements that satisfy the given filters.
- The select clause constructs a new XML book element with a year attribute and a title element.
XQL
    document("…")/bib {
      book[publisher/name="Addison-Wesley" and @year > 1991] {
        @year | title
      }
    }
- XQL comes from the "document world".
- The pattern document("…")/bib selects all top-level bib elements and evaluates the nested expression for each such element.
- The nested expression selects the book elements that are children of a bib element and that satisfy the filter clause in brackets.
- The inner-most expression selects the book's year attribute and title element.
- XQL does not have a constructor clause; the pattern expressions determine the result of the query.
XQuery: An XML Query Language
- W3C standard
- Derived from Quilt (Jonathan Robie, Don Chamberlin, and Daniela Florescu)
- Based on XML-QL
- Relevant W3C documents:
  - XML Query Requirements
  - XML Query Use Cases
  - XQuery 1.0: An XML Query Language
  - XQuery 1.0 and XPath 2.0 Data Model
  - XQuery 1.0 Formal Semantics
  - XML Syntax for XQuery 1.0 (XQueryX)
XQuery
    <bib>
    {
      for $b in //bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return
        <book year="{ $b/@year }">
          { $b/title }
        </book>
    }
    </bib>
Overview:
- Path expressions: XPath
- FLWR ("flower") expressions
- FOR vs. LET expressions
- Collections and sorting
- Other constructs
XPath
- W3C standard
- Building block for other W3C standards:
  - XSL Transformations (XSLT)
  - XML Link (XLink)
  - XML Pointer (XPointer)
  - XML Query
- Was originally part of XSL
XPath Overview
    bib               matches a bib element
    *                 matches any element
    /                 matches the root element
    /bib              matches a bib element under root
    bib/paper         matches a paper in bib
    bib//paper        matches a paper in bib, at any depth
    //paper           matches a paper at any depth
    paper|book        matches a paper or a book
    @price            matches a price attribute
    bib/book/@price   matches a price attribute in book, in bib
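Python's xml.etree.ElementTree implements only a limited subset of XPath, but it is enough to try several of the patterns above. In this sketch the document is hypothetical and evaluation starts at the bib element, so bib/paper becomes the relative path "paper". The union and attribute-selection patterns, which ElementTree does not support, are emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; evaluation starts at the bib element, so the
# XPath pattern bib/paper becomes the relative path "paper" here.
DOC = """
<bib>
  <paper><title>P1</title></paper>
  <book price="55"><title>B1</title>
    <references><paper><title>P2</title></paper></references>
  </book>
  <thesis><title>T1</title></thesis>
</bib>
"""

bib = ET.fromstring(DOC)

any_child = bib.findall("*")             # *          : any (child) element
papers = bib.findall("paper")            # bib/paper  : a paper in bib
deep_papers = bib.findall(".//paper")    # bib//paper : a paper in bib, at any depth
# paper|book: ElementTree has no union operator, so filter children by tag.
paper_or_book = [e for e in bib if e.tag in ("paper", "book")]
# bib/book/@price: ElementTree cannot return attribute nodes, only read them.
prices = [b.get("price") for b in bib.findall("book") if "price" in b.attrib]

if __name__ == "__main__":
    print([p.findtext("title") for p in deep_papers])   # ['P1', 'P2']
```

Both deep_papers and paper_or_book keep document order, which matters once queries start to depend on sequence.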
FLWR ("Flower") Expressions
- A FLWR expression consists of FOR and LET clauses (in any order, possibly repeated), an optional WHERE clause, and a RETURN clause: FOR... LET... FOR... LET... WHERE... RETURN...
- Example: find all book titles published after 1995
    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN $x/title
- Result: TCP/IP Illustrated, Advanced Programming in the Unix environment, Data on the Web, The Economics of Technology and Content …
FLWR ("Flower") Expressions, Cont.
- FOR $x IN expr binds $x to each element in the list expr
- LET $x := expr binds $x to the entire list expr; this is useful for common subexpressions and for aggregations
[Diagram: the FOR/LET clauses produce a list of tuples; the WHERE clause filters that list of tuples; the RETURN clause produces an instance of the XQuery data model.]
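FOR vs. LET
FOR query:
    FOR $x IN document("bib.xml")/bib/book
    RETURN <result> $x </result>
Returns one result element per book:
    <result> <book>...</book> </result>
    <result> <book>...</book> </result>
    ...
LET query:
    LET $x := document("bib.xml")/bib/book
    RETURN <result> $x </result>
Returns a single result element containing all the books:
    <result> <book>...</book> <book>...</book> ... </result>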
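The difference between the two queries lies entirely in the bindings: FOR produces one tuple per item, LET produces one tuple holding the whole sequence, and RETURN then builds one result per tuple. A Python sketch of that distinction, with hypothetical strings standing in for the book elements and lists standing in for the constructed result elements (the helper names are illustrative, not part of XQuery):

```python
from typing import Dict, List

# Hypothetical stand-ins for the book elements returned by
# document("bib.xml")/bib/book.
BOOKS: List[str] = ["book A", "book B", "book C"]

Binding = Dict[str, object]


def for_clause(var: str, seq: List[str]) -> List[Binding]:
    """FOR $var IN seq: one binding tuple per item."""
    return [{var: item} for item in seq]


def let_clause(var: str, seq: List[str]) -> List[Binding]:
    """LET $var := seq: a single tuple bound to the whole sequence."""
    return [{var: list(seq)}]


def return_result(tuples: List[Binding], var: str) -> List[List[str]]:
    """RETURN <result> $var </result>: one result element per tuple."""
    results = []
    for t in tuples:
        value = t[var]
        results.append(list(value) if isinstance(value, list) else [value])
    return results


for_out = return_result(for_clause("x", BOOKS), "x")
let_out = return_result(let_clause("x", BOOKS), "x")

if __name__ == "__main__":
    print(len(for_out), len(let_out))   # 3 1
```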
More Complex FLWR Expressions
For each author of a book by Morgan Kaufmann, list all his/her books:
    FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
    RETURN <result>
             $a,
             FOR $t IN /bib/book[author=$a]/title
             RETURN $t
           </result>
(distinct eliminates duplicates)
Find books whose price is larger than average:
    LET $a := avg(document("bib.xml")/bib/book/@price)
    FOR $b IN document("bib.xml")/bib/book
    WHERE $b/@price > $a
    RETURN $b
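In the second query, LET binds the average once and FOR/WHERE then filters the books against it. A Python sketch of the same two steps with ElementTree; the price attributes and titles are hypothetical, since the example data on the earlier slides does not show prices.

```python
import xml.etree.ElementTree as ET
from statistics import mean
from typing import List

# Hypothetical data: the slides' example data does not show prices.
BIB_XML = """
<bib>
  <book price="65.95"><title>A</title></book>
  <book price="39.95"><title>B</title></book>
  <book price="129.95"><title>C</title></book>
</bib>
"""


def above_average(bib: ET.Element) -> List[ET.Element]:
    books = bib.findall("book")
    # LET $a := avg(.../book/@price): computed once, bound to a single value
    a = mean(float(b.get("price")) for b in books)
    # FOR $b IN .../book WHERE $b/@price > $a RETURN $b
    return [b for b in books if float(b.get("price")) > a]


if __name__ == "__main__":
    print([b.findtext("title") for b in above_average(ET.fromstring(BIB_XML))])  # ['C']
```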
Collections in XQuery
- Ordered and unordered collections:
  - /bib/book/author is an ordered collection
  - distinct(/bib/book/author) is an unordered collection
- LET $a := /bib/book binds $a to a collection
- $b/author is a collection (several authors...)
- Collections in expressions:
  - $b/@price → a list of n prices
  - $b/@price * 0.7 → a list of n numbers
  - $b/@price * $b/@quantity → a list of n × m numbers
Sorting in XQuery
- Sorting arguments refer to the name space of the RETURN clause, not the FOR clause.
- To sort on an element you don't want to display, return it, then remove it with an additional query.
    FOR $p IN distinct(document("bib.xml")//publisher)
    RETURN <publisher>
             <name> $p/text() </name>,
             FOR $b IN document("bib.xml")//book[publisher = $p]
             RETURN <book> $b/title, $b/@price </book>
             SORTBY(price DESCENDING)
           </publisher>
    SORTBY(name)
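The nested query groups books by publisher, sorts each group by price in descending order, and sorts the groups by publisher name. A Python sketch using hypothetical (title, publisher, price) records in place of the XML; sorted with reverse=True plays the role of DESCENDING, and sorting happens on the constructed groups, mirroring how SORTBY refers to what RETURN builds.

```python
from typing import Dict, List, Tuple

# Hypothetical records standing in for book elements: (title, publisher, price).
BOOKS: List[Tuple[str, str, float]] = [
    ("T1", "Pub B", 40.0),
    ("T2", "Pub A", 65.0),
    ("T3", "Pub B", 90.0),
    ("T4", "Pub A", 30.0),
]

Group = Tuple[str, List[Tuple[str, float]]]


def by_publisher(books: List[Tuple[str, str, float]]) -> List[Group]:
    # FOR $p IN distinct(//publisher): one group per distinct publisher
    groups: Dict[str, List[Tuple[str, float]]] = {}
    for title, publisher, price in books:
        groups.setdefault(publisher, []).append((title, price))
    # Inner SORTBY(price DESCENDING) on each group, outer SORTBY(name) on groups
    return [(name, sorted(items, key=lambda tp: tp[1], reverse=True))
            for name, items in sorted(groups.items())]


if __name__ == "__main__":
    for name, items in by_publisher(BOOKS):
        print(name, items)
```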
If-Then-Else
    FOR $h IN //holding
    RETURN <holding>
             $h/title,
             IF $h/@type = "Journal"
             THEN $h/editor
             ELSE $h/author
           </holding>
    SORTBY (title)
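Existential and Universal Quantifiers
Books with some paragraph that mentions both sailing and windsurfing:
    FOR $b IN //book
    WHERE SOME $p IN $b//para SATISFIES
          contains($p, "sailing") AND contains($p, "windsurfing")
    RETURN $b/title
Books in which every paragraph mentions sailing:
    FOR $b IN //book
    WHERE EVERY $p IN $b//para SATISFIES
          contains($p, "sailing")
    RETURN $b/title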
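SOME ... SATISFIES and EVERY ... SATISFIES correspond to Python's any() and all(). A sketch over hypothetical books, each reduced to a title and a list of paragraph strings, with substring tests standing in for contains(). As in XQuery, EVERY over an empty sequence is true, so a book without paragraphs passes the second query.

```python
from typing import List, Tuple

# Hypothetical books: (title, list of para texts).
BOOKS: List[Tuple[str, List[str]]] = [
    ("Water Sports", ["sailing and windsurfing basics", "safety"]),
    ("Sailing Only", ["sailing in light wind", "sailing at night"]),
    ("Empty Book", []),
]


def some_para_mentions_both(books: List[Tuple[str, List[str]]]) -> List[str]:
    # WHERE SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing")
    return [title for title, paras in books
            if any("sailing" in p and "windsurfing" in p for p in paras)]


def every_para_mentions_sailing(books: List[Tuple[str, List[str]]]) -> List[str]:
    # WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
    return [title for title, paras in books if all("sailing" in p for p in paras)]


if __name__ == "__main__":
    print(some_para_mentions_both(BOOKS))
    print(every_para_mentions_sailing(BOOKS))
```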
Other Constructs
- BEFORE and AFTER, for dealing with order in the input
- FILTER, which deletes some edges in the result tree
- Recursive functions
  - currently: arbitrary recursion
  - perhaps more restrictions in the future?
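XQueryX
    LET $authors := /book/author
    RETURN <AUTHORS> { $authors } </AUTHORS>
[Figure: the same query in XQueryX, the XML syntax for XQuery, with nodes for the path steps book and author, the constructed element AUTHORS, and the variable $authors.]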
XQuery Software
- QuiP (developer/downloads/default.htm)
  - Software AG
  - Windows and Linux on x86
  - Features: latest W3C syntax; graphical user interface
- Kweelt
  - Open source
  - Runs on all Java platforms
  - Problems: older syntax, from previous W3C requirements; no graphical user interface
Example Application: Cruise Controller
- Vehicle cruise controller
- Modelled with a process graph of 32 processes
- Mapped on 5 nodes: CEM, ABS, ETM, ECM, TCM
[Figure: schedule table for the cruise controller, showing processes P1 to P4 and messages m1 to m4 placed in slots S0 and S1 over Rounds 1 to 5, annotated with 24 ms.]
XML Model of the Cruise Controller
[Figure: excerpts from the model files architecture.xml, behaviour.xml, mapping.xml, and schedule.xml.]
Requirements on the Cruise Controller
- Requirements on the model
  - The model should be consistent:
    - every process should be mapped to one and only one node
    - every sensor/actuator should be connected
  - The schedule should be correct:
    - the schedule should respect the precedence constraints
    - no two slots in the schedule should overlap
- Cruise controller requirements
  - Timing requirements: the CC should execute within 100 ms
  - Resource requirements: the sum of the processes' memory on a node should not exceed that node's capacity
- These requirements should be expressed in XQuery!
Resource Requirements: Query
The sum of the processes' memory on a node should not exceed that node's capacity:
    for $map in document("data/sweb/mapping.xml")//MAP, $nod in = let $proc := = $map/Process] return <processor { for $process in $proc return <process /> }
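The requirement itself is a per-node sum compared against a capacity. Since the query above is incomplete, here is a Python sketch of the check under an assumed, hypothetical model: a mapping from process to node, a memory figure per process, and a capacity per node. Only the node names CEM and ABS come from the cruise controller description; the process names, sizes, and capacities are invented.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical model data (not taken from mapping.xml or architecture.xml).
MAPPING: Dict[str, str] = {"P1": "CEM", "P2": "CEM", "P3": "ABS", "P4": "ABS"}
MEMORY: Dict[str, int] = {"P1": 40, "P2": 70, "P3": 30, "P4": 20}
CAPACITY: Dict[str, int] = {"CEM": 100, "ABS": 128}


def overloaded_nodes(mapping: Dict[str, str], memory: Dict[str, int],
                     capacity: Dict[str, int]) -> List[str]:
    """Nodes whose mapped processes need more memory than the node offers."""
    used: Dict[str, int] = defaultdict(int)
    for process, node in mapping.items():   # a dict maps each process to one node
        used[node] += memory[process]
    return sorted(node for node, total in used.items() if total > capacity[node])


if __name__ == "__main__":
    print(overloaded_nodes(MAPPING, MEMORY, CAPACITY))
```

An empty list means the resource requirement holds; in the hypothetical data, CEM needs 110 units against a capacity of 100.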
Resource Requirements: Result
Query result: check_resource_consistency.xml … …
Use Case "XMP": Experiences and Exemplars
Data (xmp-data.xml):
- TCP/IP Illustrated; Stevens W.; Addison-Wesley
- Advanced Prog… the Unix environment; Stevens W.; Addison-Wesley
- Data on the Web; Abiteboul Serge, Buneman Peter, Suciu Dan; Morgan Kaufmann Publishers
- The Economics of Technology and Content for Digital TV; Gerbarg Darcy, CITI; Kluwer Academic Publishers
Query (XMPQ1.xquery):
    <bib>
    {
      for $b in document("data/xmp-data.xml")/bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return
        <book year="{ $b/@year }">
          { $b/title }
        </book>
    }
    </bib>
Use Case "XMP": Experiences and Exemplars
Result of: List books published by Addison-Wesley after 1991, including their year and title.
- TCP/IP Illustrated
- Advanced Programming in the Unix environment
Use Case "TREE": Queries that preserve hierarchy
Data (tree-data.xml): the book "Data on the Web" by Serge Abiteboul, Peter Buneman, and Dan Suciu, with sections such as Introduction, Audience, and Web Data and the Two Cultures (each containing text), and figures such as "Traditional client/server architecture" ….
Query (TREEQ2.xquery):
    { for $f in document("data/tree-data.xml")//figure return { ) }
Use Case "TREE": Queries that preserve hierarchy
Result of: Prepare a (flat) figure list for the first book, listing all the figures and their titles. Preserve the original attributes of each element, if any.
- Traditional client/server architecture
- Graph representations of structures
- Examples of Relations
Use Case "TREE": Queries that preserve hierarchy
TREEQ3.xquery:
    ( { count(document("data/tree-data.xml")//section) },
      { count(document("data/tree-data.xml")//figure) } )
Result (inside <quip:result xmlns:quip=" softwareag.com/tamino/quip/">): 7 sections, 3 figures
TREEQ4.xquery:
    { count(document("data/tree-data.xml")/book/section) }
Result (inside <quip:result xmlns:quip=" softwareag.com/tamino/quip/">): 2 top-level sections
Use Case "SEQ": Queries based on sequence
Data (report1.xml), Procedure section: "The patient was taken to the operating room where she was placed in supine position and induced under general anesthesia. A Foley catheter was placed to decompress the bladder and the abdomen was then prepped and draped in sterile fashion. A curvilinear incision was made in the midline immediately infraumbilical and the subcutaneous tissue was divided using electrocautery. The fascia was identified and #2 0 Maxon stay sutures were placed on each side of the midline. The fascia was divided using electrocautery and the peritoneum was entered. …"
Query (SEQQ2.xquery):
    for $s in document("data/report1.xml")//section[section.title = "Procedure"]
    let $instruments := $s//instrument
    for $i in 1 to 2
    return $instruments[$i]
Result of: In the Procedure section of Report1, what are the first two instruments to be used?
Result (inside <quip:result xmlns:quip=" softwareag.com/tamino/quip/">):
- using electrocautery.
- electrocautery
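The query relies on document order: $s//instrument returns the instruments in the order they occur, and the positional predicate keeps the first two. ElementTree's iter() also walks elements in document order, so a Python sketch can simply slice the result. The XML below is a shortened, hypothetical version of report1.xml that keeps only the elements the query touches; the third instrument is invented so that the cut-off is visible, and only the first matching section is considered.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, shortened version of report1.xml.
REPORT = """
<report>
  <section>
    <section.title>Procedure</section.title>
    <section.content>
      The subcutaneous tissue was divided
      <instrument>using electrocautery.</instrument>
      The fascia was divided using <instrument>electrocautery</instrument>
      and later closed with a <instrument>needle driver</instrument>.
    </section.content>
  </section>
</report>
"""


def first_instruments(report: ET.Element, title: str, n: int = 2) -> List[str]:
    for section in report.iter("section"):
        # section[section.title = "Procedure"]
        if section.findtext("section.title") == title:
            # $s//instrument in document order, then positions 1..n
            return [i.text for i in section.iter("instrument")][:n]
    return []


if __name__ == "__main__":
    print(first_instruments(ET.fromstring(REPORT), "Procedure"))
```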
Use Case "R": Access to Relational Data
USERS
    USERID  NAME            RATING
    U01     Tom Jones       B
    U02     Mary Doe        A
    U03     Dee Linquent    D
    U04     Roger Smith     C
    U05     Jack Sprat      B
    U06     Rip Van Winkle  B
ITEMS (columns ITEMNO, DESCR, O_BY, DATE, PRICE), with items 1001 Red Bicycle, Motorcycle, Old Bicycle, Tricycle, Tennis Racket, Helicopter, Racing Bicycle, Broken Bicycle
BIDS (columns USERID, ITEMNO, BID, BID_DATE) ….
Query over the relational data (RQ2.xquery):
    {
      for $u in document("users.xml")//user_tuple
      for $i in document("items.xml")//item_tuple
      where $u/rating > "C"
        and $i/reserve_price > 1000
        and $i/offered_by = $u/userid
      return
        { $u/name }
        { $u/rating }
        { $i/description }
        { $i/reserve_price }
    }
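RQ2 is a relational join written as two nested FOR clauses with the join condition in WHERE. The Python sketch below uses the USERS rows from the table; the ITEMS rows (owners and reserve prices) are hypothetical, because those columns are not recoverable from the slide. As in the query, "rating worse than C" is a plain string comparison.

```python
from typing import List, NamedTuple, Tuple


class User(NamedTuple):
    userid: str
    name: str
    rating: str


class Item(NamedTuple):
    description: str
    offered_by: str
    reserve_price: int


# USERS rows as shown on the slide.
USERS = [User("U01", "Tom Jones", "B"), User("U02", "Mary Doe", "A"),
         User("U03", "Dee Linquent", "D"), User("U04", "Roger Smith", "C"),
         User("U05", "Jack Sprat", "B"), User("U06", "Rip Van Winkle", "B")]

# Hypothetical ITEMS rows: owners and reserve prices are illustrative only.
ITEMS = [Item("Red Bicycle", "U01", 40), Item("Motorcycle", "U02", 1500),
         Item("Helicopter", "U03", 50000), Item("Racing Bicycle", "U04", 200)]


def risky_offers(users: List[User], items: List[Item]) -> List[Tuple[str, str, str, int]]:
    return [(u.name, u.rating, i.description, i.reserve_price)
            for u in users                      # for $u in ...//user_tuple
            for i in items                      # for $i in ...//item_tuple
            if u.rating > "C"                   # string comparison, as in RQ2
            and i.reserve_price > 1000
            and i.offered_by == u.userid]       # join condition


if __name__ == "__main__":
    print(risky_offers(USERS, ITEMS))
```

The nested comprehension is a nested-loop join; it is the simplest faithful reading of two FOR clauses, not an efficient join strategy.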
Use Case "R": Access to Relational Data
Result of: Find cases where a user with a rating worse (alphabetically, greater) than "C" is offering an item with a reserve price of more than 1000.
- Dee Linquent, D, Helicopter
Use Case "R": Access to Relational Data
    {
      for $u in document("data/R-users.xml")//user_tuple
      let $b := document("data/R-bids.xml")//bid_tuple[userid = $u/userid
                  and int(string-value(bid)) >= 100]
      where count($b) > 1
      return { $u/name/text() }
    }
Result of: List names of users who have placed multiple bids of at least $100 each.
- Mary Doe
- Dee Linquent
- Roger Smith
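Here LET binds, for each user, the whole sequence of qualifying bids, and count() aggregates it in WHERE. A Python sketch with hypothetical bid rows (the BIDS table is not recoverable from the slide); the function name multiple_big_bidders is illustrative.

```python
from typing import List, Tuple

USERS: List[Tuple[str, str]] = [("U01", "Tom Jones"), ("U02", "Mary Doe"),
                                ("U03", "Dee Linquent")]
# Hypothetical bid rows: (userid, itemno, bid amount).
BIDS: List[Tuple[str, str, int]] = [
    ("U01", "1001", 35), ("U01", "1002", 150),
    ("U02", "1002", 400), ("U02", "1007", 175),
    ("U03", "1001", 120), ("U03", "1007", 225),
]


def multiple_big_bidders(users: List[Tuple[str, str]],
                         bids: List[Tuple[str, str, int]],
                         minimum: int = 100) -> List[str]:
    names = []
    for userid, name in users:                     # for $u in ...//user_tuple
        # let $b := all of this user's bids of at least `minimum` (one sequence)
        b = [amount for uid, _, amount in bids if uid == userid and amount >= minimum]
        if len(b) > 1:                             # where count($b) > 1
            names.append(name)
    return names


if __name__ == "__main__":
    print(multiple_big_bidders(USERS, BIDS))
```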
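Use Case "PARTS": Recursive Parts Explosion
Data: parts-data.xml (a partlist document)
Query (PARTSQ1a.xquery):
    define function one_level(xs:AnyType $p, xs:AnyType $ps) returns xs:AnyType
    {
      <part partid="{ $p/@partid }" name="{ $p/@name }">
      {
        for $s in $ps[@partof = $p/@partid]
        return one_level($s, $ps)
      }
      </part>
    }

    <parttree>
    {
      let $ps := document("data/parts-data.xml")/partlist/part
      for $p in $ps[empty(@partof)]
      return one_level($p, $ps)
    }
    </parttree>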
Use Case "PARTS": Recursive Parts Explosion
Result of: Convert the sample document from "partlist" format to "parttree" format (see DTD section for definitions). In the result document, part containment is represented by containment of one element inside another. Each part that is not part of any other part should appear as a separate top-level element in the output document.
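The recursion can be sketched in Python with ElementTree, assuming, as in the W3C PARTS use case, that each part element carries partid and name attributes and an optional partof attribute naming its parent part. The sample parts are hypothetical; one_level mirrors the recursive function, and to_parttree is an illustrative name for the top-level loop.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical partlist: partof points at the partid of the containing part.
PARTLIST = """
<partlist>
  <part partid="0" name="car"/>
  <part partid="1" partof="0" name="engine"/>
  <part partid="2" partof="0" name="door"/>
  <part partid="3" partof="1" name="piston"/>
  <part partid="4" name="bicycle"/>
</partlist>
"""


def one_level(p: ET.Element, ps: List[ET.Element]) -> ET.Element:
    """Copy part p and, recursively, every part whose partof is p's partid."""
    node = ET.Element("part", partid=p.get("partid"), name=p.get("name"))
    for s in ps:
        if s.get("partof") == p.get("partid"):
            node.append(one_level(s, ps))
    return node


def to_parttree(partlist: ET.Element) -> ET.Element:
    ps = partlist.findall("part")
    tree = ET.Element("parttree")
    for p in ps:
        if p.get("partof") is None:        # top-level part: not part of anything
            tree.append(one_level(p, ps))
    return tree


if __name__ == "__main__":
    print(ET.tostring(to_parttree(ET.fromstring(PARTLIST)), encoding="unicode"))
```

Like the XQuery function, this recursion assumes the partof links form a tree; a cycle in the input would make it recurse without end.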
XSLT & XQuery: is there a difference?
XSLT (xslt.xls):
    <xsl:transform xmlns:xsl=" -XSL/Transform" version="1.0">
      <xsl:for-each select="document('xmp-data.xml')/bib/book">
        <xsl:if test="publisher='Addison-Wesley' …
XQuery (XMPQ1.xquery):
    <bib>
    {
      for $b in document("data/xmp-data.xml")/bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return
        <book year="{ $b/@year }">
          { $b/title }
        </book>
    }
    </bib>
XSLT & XQuery
- XQuery result: TCP/IP Illustrated; Advanced Programming in the Unix environment
- XSLT (Xalan engine) result: TCP/IP Illustrated; Advanced Programming in the Unix environment
Summary
- Motivation
  - XML applications, types of queries
- Approaches
  - Requirements on a query language
  - Path expressions, the basic building block
  - XML query languages
- XQuery
  - Background, history
  - Concepts, examples
  - FLWR expressions
  - FOR and LET expressions
  - Collections and sorting
- Available software, demo
  - Examples
  - XQuery vs. XSLT
- Summary