Introduction to XQuery
  Resources:
    –Official URL:
    –Short intros:
    –Or see Ramakrishnan & Gehrke text
  Lecture modified from slides by Dan Suciu
XML vs. Relational Data
  Relation:
    name     phone
    “John”   3634
    “Sue”    6343
    “Dick”   6363
  … in XML:
    { row: { name: “John”, phone: 3634 },
      row: { name: “Sue”, phone: 6343 },
      row: { name: “Dick”, phone: 6363 } }
Relational to XML Data
  A relation instance is basically a tree with:
    –Unbounded fanout at level 1 (i.e., any # of rows)
    –Fixed fanout at level 2 (i.e., fixed # of fields)
  XML data is essentially an arbitrary tree:
    –Unbounded fanout at all nodes/levels
    –Any number of levels
    –Variable # of children at different nodes, variable path lengths
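A minimal sketch of the tree view of a relation described above, written in Python with the standard-library xml.etree.ElementTree module rather than an XQuery processor. The element names table, row, name and phone and the helper relation_to_xml are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(rows, fields, root_tag="table"):
    """Encode a relation instance as a two-level XML tree (hypothetical helper)."""
    root = ET.Element(root_tag)
    for row in rows:
        # Level 1: unbounded fanout, one <row> child per tuple.
        row_elem = ET.SubElement(root, "row")
        for field, value in zip(fields, row):
            # Level 2: fixed fanout, one child per field of the schema.
            ET.SubElement(row_elem, field).text = str(value)
    return root

rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]
table = relation_to_xml(rows, ["name", "phone"])
print(ET.tostring(table, encoding="unicode"))
```

Every row element has the same two children here. General XML drops that restriction, which is why the relation is only a special case of the tree model.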
Query Language for XML
  Must be high-level: “SQL for XML”
  Must conform to XML Schema
    –But also work in the absence of schema info
  Support simple and complex/nested datatypes
  Support universal and existential quantifiers, and aggregation
  Operations on sequences and hierarchies of document structures
  Capability to transform and create XML structures
XQuery
  Influenced by XML-QL, Lorel, Quilt, YATL
    –Also XPath and XML Schema
  Reads a sequence of XML fragments or atomic values and returns a sequence of XML fragments or atomic values
    –Inputs/outputs are objects defined by the XML-Query data model, rather than strings in XML syntax
Overview of XQuery
  Path expressions
  Element constructors
  FLWOR (“flower”) expressions
    –Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.
  Expressions are evaluated w.r.t. a context:
    –Context item (current node)
    –Context position (in the sequence being processed)
    –Context size (of the sequence being processed)
    –Context also includes namespaces, variables, functions, date, etc.
Path Expressions
  Examples:
    Bib/paper
    Bib/book/publisher
    Bib/paper/author/lastname
  Given an XML document, the value of a path expression p is a set of objects
Path Expression Examples
  [Figure: the document Doc drawn as a tree of object identifiers (&o1, &o12, &o24, …) rooted at Bib. Edge labels include paper, book, references, author, title, year, http, publisher, page, firstname and lastname; leaf values include “Serge”, “Abiteboul”, 1997, “Victor” and “Vianu”.]
  Bib/paper =
  Bib/book/publisher =
  Bib/paper/author/lastname =
  Note that order of elements matters!
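A small sketch of how the three path expressions evaluate, using Python's xml.etree.ElementTree as a stand-in for an XQuery processor. The sample document and the helper eval_path are assumptions made for illustration; the document is not a transcription of the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data whose element names follow the path examples.
BIB = ET.fromstring(
    "<Bib>"
    "<paper><author><lastname>Abiteboul</lastname></author></paper>"
    "<book><publisher>P1</publisher></book>"
    "<paper><author><lastname>Vianu</lastname></author></paper>"
    "</Bib>"
)

def eval_path(root, expr):
    """Evaluate a child-step path such as 'Bib/paper/author' from the document root."""
    first, _, rest = expr.partition("/")
    if first != root.tag:
        return []
    # findall returns the matching elements in document order.
    return root.findall(rest) if rest else [root]

for expr in ("Bib/paper", "Bib/book/publisher", "Bib/paper/author/lastname"):
    print(expr, "=", [(e.tag, e.text) for e in eval_path(BIB, expr)])
```

The matches come back in document order, so the last names of the two papers appear in the order of the papers themselves.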
Element Construction
  An XQuery expression can construct new values or structures
  Example: consider the path expressions from the previous slide.
    –Each of them returns a newly constructed sequence of elements
    –The key point is that we don’t just return existing structures or atomic values; we can rearrange them as we wish into new structures
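To make the idea of rearranging existing pieces into a new structure concrete, here is a hedged Python sketch using xml.etree.ElementTree. The input book and the output names entry, by and what are invented for the example; an XQuery element constructor would express the same thing declaratively.

```python
import xml.etree.ElementTree as ET

# Hypothetical input element.
book = ET.fromstring(
    "<book><title>abc</title><author>Jones</author><year>1997</year></book>"
)

# Construct a new element that reuses values from the input in a different shape:
# the year becomes an attribute, and author/title are renamed and reordered.
entry = ET.Element("entry", {"year": book.findtext("year")})
ET.SubElement(entry, "by").text = book.findtext("author")
ET.SubElement(entry, "what").text = book.findtext("title")

print(ET.tostring(entry, encoding="unicode"))
```

The input element is left untouched; the query result is a separate, newly built tree.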
FLWOR Expressions
  FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
  Evaluation pipeline:
    –FOR / LET clauses: produce a list of tuples
    –WHERE clause: filters the list of tuples
    –ORDERBY / RETURN clause: produces an instance of the XQuery data model
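The FLWOR stages can be sketched as a pipeline over a list of variable bindings. The following Python emulation is only an illustration under assumed sample data (three books with made-up titles and years); it is not how an XQuery engine is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>abc</title><year>1997</year></book>"
    "<book><title>def</title><year>1990</year></book>"
    "<book><title>ghi</title><year>2001</year></book>"
    "</bib>"
)

# FOR: one tuple of bindings per book.
tuples = [{"x": book} for book in BIB.findall("book")]
# LET: add a binding to each tuple without changing the number of tuples.
for t in tuples:
    t["y"] = int(t["x"].findtext("year"))

# WHERE: keep only the tuples that satisfy the condition.
tuples = [t for t in tuples if t["y"] > 1995]

# ORDERBY: sort the surviving tuples (newest first here).
tuples.sort(key=lambda t: t["y"], reverse=True)

# RETURN: evaluate the return expression once per tuple.
result = [t["x"].findtext("title") for t in tuples]
print(result)
```

FOR multiplies tuples, LET only extends them, WHERE discards some, and RETURN is evaluated once for each tuple that remains.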
FOR vs. LET
  FOR $x IN list-expr
    –Binds $x in turn to each value in the list expr
  LET $x := list-expr
    –Binds $x to the entire list expr
    –Useful for common sub-expressions and for aggregations
FOR vs. LET: Example
  FOR $x IN document("bib.xml")/bib/book
  RETURN $x
    –Returns: ...
    –Notice that RETURN is evaluated once per book, so the result is built from several pieces
  LET $x := document("bib.xml")/bib/book
  RETURN $x
    –Returns: ...
    –Notice that RETURN is evaluated exactly once, with $x bound to the whole sequence of books
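The difference is easiest to see when the RETURN expression wraps its input in a new element. The Python sketch below assumes such a wrapper; the tag name wrapper, the helper wrap and the three sample books are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = ET.fromstring("<bib><book>b1</book><book>b2</book><book>b3</book></bib>")
books = BIB.findall("book")

def wrap(items):
    """Stand-in for a RETURN expression that builds one new element around its input."""
    elem = ET.Element("wrapper")
    elem.extend(items)
    return elem

# FOR $x IN books: $x is bound to each book in turn, so RETURN runs once per book.
for_result = [wrap([x]) for x in books]

# LET $x := books: $x is bound to the whole sequence, so RETURN runs exactly once.
x = books
let_result = [wrap(x)]

print(len(for_result), len(let_result))
```

With FOR there is one wrapper per book; with LET there is a single wrapper that contains all the books.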
XQuery Example 1
  Find the titles of all books published after 1995:
    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN $x/title
  Result:
    <title>abc</title>
    <title>def</title>
    <title>ghi</title>
XQuery Example 2
  For each author of a book published by Morgan Kaufmann, list all books she published:
    FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
    RETURN $a,
           FOR $t IN /bib/book[author=$a]/title
           RETURN $t
  distinct = a function that eliminates duplicates (after converting inputs to atomic values)
Results for Example 2
    Jones
      abc
      def
    Smith
      ghi
  Observe how the nested structure of the result elements is determined by the nested structure of the query.
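A Python emulation of Example 2 shows how the outer and inner FOR loops produce the nested result. The sample document, including the publisher name Other Press, is made up so that the output matches the Jones/Smith result above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = ET.fromstring(
    "<bib>"
    "<book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>"
    "<book><publisher>Other Press</publisher><author>Jones</author><title>def</title></book>"
    "<book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>"
    "</bib>"
)
books = BIB.findall("book")

# distinct(...): authors of Morgan Kaufmann books, compared by their atomic (string) value.
authors = []
for b in books:
    if b.findtext("publisher") == "Morgan Kaufmann":
        for a in b.findall("author"):
            if a.text not in authors:
                authors.append(a.text)

# Outer FOR over authors; inner FOR over every book by that author, whatever the publisher.
result = [
    (a, [b.findtext("title") for b in books
         if a in [x.text for x in b.findall("author")]])
    for a in authors
]
print(result)
```

The inner loop is not restricted to Morgan Kaufmann books, which is why the title def appears under Jones.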
XQuery Example 3
  count = (aggregate) function that returns the number of elements
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $b := document("bib.xml")//book[publisher = $p]
    WHERE count($b) > 100
    RETURN $p
  For each publisher p:
    –Let the list of books published by p be b
    –Count the # of books in b, and return p if count(b) > 100
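The same FOR / LET / WHERE pattern can be sketched in Python. The sample data is invented, and the threshold is lowered from 100 to 2 (an assumption made purely so that the toy document yields a result).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = ET.fromstring(
    "<bib>"
    "<book><publisher>P1</publisher></book>"
    "<book><publisher>P2</publisher></book>"
    "<book><publisher>P1</publisher></book>"
    "<book><publisher>P1</publisher></book>"
    "</bib>"
)

THRESHOLD = 2  # the slide uses 100; lowered for the toy data

# FOR $p IN distinct(//publisher)
publishers = []
for p in BIB.iter("publisher"):
    if p.text not in publishers:
        publishers.append(p.text)

result = []
for p in publishers:
    # LET $b := //book[publisher = $p]  ($b is the whole list, not one book)
    b = [bk for bk in BIB.iter("book") if bk.findtext("publisher") == p]
    # WHERE count($b) > THRESHOLD
    if len(b) > THRESHOLD:
        # RETURN $p
        result.append(p)
print(result)
```

LET is what makes the aggregate possible: $b has to hold the entire list of books for count to be meaningful.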
XQuery Example 4
  Find books whose price is larger than average:
    LET $a := avg(document("bib.xml")/bib/book/price)
    FOR $b IN document("bib.xml")/bib/book
    WHERE $b/price > $a
    RETURN $b
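A short Python sketch of Example 4, assuming three made-up books with numeric prices. The average is computed once, as with LET, and then reused for every book visited by FOR.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample data.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>abc</title><price>10</price></book>"
    "<book><title>def</title><price>20</price></book>"
    "<book><title>ghi</title><price>60</price></book>"
    "</bib>"
)
books = BIB.findall("book")

# LET $a := avg(/bib/book/price): evaluated once over the whole sequence of prices.
a = mean(float(b.findtext("price")) for b in books)

# FOR $b IN /bib/book WHERE $b/price > $a RETURN $b
result = [b for b in books if float(b.findtext("price")) > a]
print(a, [b.findtext("title") for b in result])
```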
Collections in XQuery
  Ordered and unordered collections
    –/bib/book/author = an ordered collection
    –distinct(/bib/book/author) = an unordered collection
  Examples:
    –LET $a := /bib/book: $a is a collection; the statement iterates over all books in the collection
    –$b/author is also a collection (several authors...)
  However:
    –RETURN $b/author returns a single collection!
Collections in XQuery
  What about collections in expressions?
    –$b/price: list of n prices
    –$b/price * 0.7: list of n numbers??
    –$b/price * $b/quantity: list of n x m numbers??
    –Valid only if the two sequences have at most one element
    –Atomization
  Comparisons:
    –$book1/author eq "Kennedy": value comparison (each operand must be at most one atomic value)
    –$book1/author = "Kennedy": general comparison (true if some author equals "Kennedy")
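The rules for sequences in expressions can be sketched in Python by modelling an atomized sequence as a list. The function names general_eq, value_eq and times are hypothetical, and raising TypeError is only a stand-in for XQuery's type error.

```python
def general_eq(left, right):
    """General comparison (=): true if SOME pair of items, one from each side, is equal."""
    return any(l == r for l in left for r in right)

def value_eq(left, right):
    """Value comparison (eq): each operand may hold at most one item.

    Returns None when an operand is empty, standing in for the empty sequence.
    """
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq needs at most one item on each side")
    if not left or not right:
        return None
    return left[0] == right[0]

def times(left, right):
    """Arithmetic (*): like eq, defined only when each operand has at most one item."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("* needs at most one item on each side")
    if not left or not right:
        return []
    return [left[0] * right[0]]

authors = ["Kennedy", "Smith"]           # hypothetical book with two authors
print(general_eq(authors, ["Kennedy"]))  # some author is Kennedy
print(value_eq(["Kennedy"], ["Kennedy"]))
print(times([10.0], [0.5]))
```

Calling value_eq(authors, ["Kennedy"]) or times([10.0, 20.0], [0.5]) raises an error in this sketch, which answers the two question marks above: a multi-item sequence is rejected rather than multiplied element by element.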
Sorting in XQuery
    FOR $p IN distinct(document("bib.xml")//publisher)
    ORDERBY $p
    RETURN $p/text(),
           FOR $b IN document("bib.xml")//book[publisher = $p]
           ORDERBY $b/price DESCENDING
           RETURN $b/title, $b/price
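The two-level sort can be emulated in Python as an outer sort on publisher and an inner descending sort on price. The sample document is invented, and prices are compared as numbers, which is an assumption about the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = ET.fromstring(
    "<bib>"
    "<book><publisher>P2</publisher><title>abc</title><price>30</price></book>"
    "<book><publisher>P1</publisher><title>def</title><price>10</price></book>"
    "<book><publisher>P1</publisher><title>ghi</title><price>25</price></book>"
    "</bib>"
)
books = list(BIB.iter("book"))

result = []
# FOR $p IN distinct(//publisher) ORDERBY $p
for p in sorted({b.findtext("publisher") for b in books}):
    # FOR $b IN //book[publisher = $p] ORDERBY $b/price DESCENDING
    by_price = sorted(
        (b for b in books if b.findtext("publisher") == p),
        key=lambda b: float(b.findtext("price")),
        reverse=True,
    )
    # RETURN $b/title, $b/price
    result.append((p, [(b.findtext("title"), b.findtext("price")) for b in by_price]))
print(result)
```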
Conditional Expressions: If-Then-Else
    FOR $h IN //holding
    ORDERBY $h/title
    RETURN $h/title,
           IF = "Journal"
           THEN $h/editor
           ELSE $h/author
Existential Quantifiers
    FOR $b IN //book
    WHERE SOME $p IN $b//para SATISFIES
          contains($p, "sailing") AND contains($p, "windsurfing")
    RETURN $b/title
Universal Quantifiers
    FOR $b IN //book
    WHERE EVERY $p IN $b//para SATISFIES
          contains($p, "sailing")
    RETURN $b/title
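SOME and EVERY correspond closely to Python's any and all, which gives a compact way to try out both quantified queries. The sample library and its titles t1 to t3 are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the third book has no paragraphs at all.
LIB = ET.fromstring(
    "<library>"
    "<book><title>t1</title><para>sailing and windsurfing</para><para>cooking</para></book>"
    "<book><title>t2</title><para>sailing</para><para>more sailing</para></book>"
    "<book><title>t3</title></book>"
    "</library>"
)

def paras(book):
    """Text of every para descendant of the book, mirroring $b//para."""
    return [p.text or "" for p in book.iter("para")]

# WHERE SOME $p IN $b//para SATISFIES contains(sailing) AND contains(windsurfing)
some = [b.findtext("title") for b in LIB.iter("book")
        if any("sailing" in p and "windsurfing" in p for p in paras(b))]

# WHERE EVERY $p IN $b//para SATISFIES contains(sailing)
every = [b.findtext("title") for b in LIB.iter("book")
         if all("sailing" in p for p in paras(b))]

print(some, every)
```

A book without any paragraphs satisfies EVERY vacuously but never satisfies SOME, so t3 shows up only in the second result.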
Other Stuff in XQuery
  Before and After
    –for dealing with order in the input
  Filter
    –deletes some edges in the result tree
  Recursive functions
  Namespaces
  References, links …
  Lots more stuff …
Appendix: XML Schema and XQuery Data Model
XML Schema
  Includes primitive data types (integers, strings, dates, etc.)
  Supports value-based constraints (integers > 100)
  User-definable structured types
  Inheritance (extension or restriction)
  Foreign keys
  Element-type reference constraints
Sample XML Schema …
XML-Query Data Model
  Describes XML data as a tree
  Node ::= DocNode | ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode
XML-Query Data Model
  Element node (simplified definition):
    elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) → ElemNode
  QNameValue means “a tag name”
  Reads: “Give me a tag, a set of attributes, a list of elements/values, and I will return an element”
XML-Query Data Model
  Example:
    <book price = "55" currency = "USD">
      <title> Foundations … </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <year> 1995 </year>
    </book>
  In the data model:
    book1 = elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])
    price2 = attrNode(…) /* next */
    currency3 = attrNode(…)
    title4 = elemNode(title, string9)
    …
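The constructor view of the data model can be mirrored with a few Python dataclasses. This is a simplified sketch: the class layout, the field names and the three-argument elem_node helper are assumptions, and only the element, attribute and value node kinds are modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

@dataclass(frozen=True)
class ValueNode:
    value: str

@dataclass(frozen=True)
class AttrNode:
    name: str
    value: str

@dataclass(frozen=True)
class ElemNode:
    name: str                                            # QNameValue: the tag name
    attrs: FrozenSet[AttrNode]                           # {AttrNode}: an unordered set
    children: Tuple[Union["ElemNode", ValueNode], ...]   # [ElemNode | ValueNode]: an ordered list

def elem_node(name, attrs, children):
    """Give me a tag, a set of attributes and a list of children; get back an element."""
    return ElemNode(name, frozenset(attrs), tuple(children))

# The book example, using the same variable names as the slide.
price2 = AttrNode("price", "55")
currency3 = AttrNode("currency", "USD")
title4 = elem_node("title", [], [ValueNode("Foundations …")])
author5 = elem_node("author", [], [ValueNode("Abiteboul")])
author6 = elem_node("author", [], [ValueNode("Hull")])
author7 = elem_node("author", [], [ValueNode("Vianu")])
year8 = elem_node("year", [], [ValueNode("1995")])
book1 = elem_node("book", {price2, currency3},
                  [title4, author5, author6, author7, year8])

print([child.name for child in book1.children])
```

Attributes are kept in a set because their order carries no meaning, while children are kept in a tuple because document order does.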