XML, XML Schema, XPath and XQuery
Slides collated from various sources, many from Dan Suciu at the University of Washington
CS561 - Spring
XML
– W3C standard to complement HTML (version 2, 10/2000)
– origins: structured text, SGML
– motivation:
  – HTML describes presentation
  – XML describes content
From HTML to XML
HTML describes the presentation
HTML
Bibliography
Foundations of Databases
Abiteboul, Hull, Vianu
Addison Wesley, 1995
Data on the Web
Abiteboul, Buneman, Suciu
Morgan Kaufmann, 1999
XML
Foundations… Abiteboul Hull Vianu Addison Wesley 1995 …
XML describes the content
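XML Terminology
– tags: book, title, author, …
– start tag, e.g. <book>; end tag, e.g. </book>
– elements: a start tag, its content and the matching end tag, e.g. <book>…</book>
– elements are nested
– empty element: <tag></tag>, abbreviated <tag/>
– an XML document: single root element
– well-formed XML document: all tags match and are properly nested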
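These rules can be checked mechanically by handing text to an XML parser. A minimal sketch using Python's standard xml.etree.ElementTree, which raises ParseError for a mismatched end tag or for a second root element; the element names and texts are only sample values.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as one well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Matching, nested tags under a single root element.
print(is_well_formed("<book><title>Data on the Web</title><author/></book>"))  # True
# End tags in the wrong order.
print(is_well_formed("<book><title>Data on the Web</book></title>"))  # False
# Two root elements.
print(is_well_formed("<book/><book/>"))  # False
```

A real parser enforces every well-formedness rule (quoted attribute values, legal characters, and so on), not only tag matching, so this is a stricter test than the one-line definition on the slide.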
More XML: Attributes
Foundations of Databases Abiteboul … 1995
– attributes are an alternative way to represent data
More XML: Oids and References
Jane Mary John
– oids and references in XML are just syntax
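To see what "just syntax" means in practice: an XML parser hands back id and idref values as plain strings, and following them is left to the application. A sketch in Python, assuming a hypothetical document whose attribute names id, idref and mother are chosen for illustration; without a DTD or schema declaring them, XML gives these names no special meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: id, idref and mother are illustrative attribute names.
DOC = """<people>
  <person id="p1"><name>Jane</name></person>
  <person id="p2"><name>Mary</name><children idref="p3 p1"/></person>
  <person id="p3" mother="p2"><name>John</name></person>
</people>"""

root = ET.fromstring(DOC)

# The parser returns the references as plain strings; resolving them is up to us.
by_id = {p.get("id"): p for p in root.iter("person")}


def children_of(person):
    """Follow the space-separated idref list in a person's <children> element."""
    refs = person.find("children")
    if refs is None:
        return []
    return [by_id[r].findtext("name") for r in refs.get("idref").split()]


print(children_of(by_id["p2"]))                          # ['John', 'Jane']
print(by_id[by_id["p3"].get("mother")].findtext("name"))  # Mary
```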
XML Namespaces (1/99)
– name ::= [prefix:]localpart
… 15 … … 15 …
XML Namespaces
… …
– syntactic: …, …
– semantic: provide URL for schema
… defined here
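Python's ElementTree illustrates both sides of namespaces: syntactically a name is prefix:localpart, but the parser replaces the prefix with the namespace URI it is bound to. A sketch, assuming a made-up namespace URI http://example.org/isbn and sample values.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the prefix "isbn".
DOC = """<book xmlns:isbn="http://example.org/isbn">
  <title>Data on the Web</title>
  <number>15</number>
  <isbn:number>1234</isbn:number>
</book>"""

root = ET.fromstring(DOC)

# ElementTree rewrites prefix:localpart as {namespace-URI}localpart.
print([child.tag for child in root])
# ['title', 'number', '{http://example.org/isbn}number']

# Lookups go through the URI; the prefix used here ("i") need not match the document's.
ns = {"i": "http://example.org/isbn"}
print(root.findtext("number"))                   # 15
print(root.findtext("i:number", namespaces=ns))  # 1234
```

The lookup uses the prefix i although the document writes isbn: only the URI identifies the namespace, and the prefix is just a local abbreviation for it.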
XML Data Model
Several competing models:
– Document Object Model (DOM): (2/2001)
  – class hierarchy (node, element, attribute, …)
  – objects have behavior
  – defines an API to inspect/modify the document
– Infoset and the post-schema-validation (PSV) infoset
– XML Query data model
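Python's standard library ships a small DOM implementation, xml.dom.minidom, which shows the DOM view described above: typed node objects, methods on them, and an API to inspect and modify the document. A sketch; the attribute and element added are illustrative choices, with values taken from the bibliography example.

```python
from xml.dom import minidom

doc = minidom.parseString("<bib><book><title>Data on the Web</title></book></bib>")

# Inspect: nodes are objects from a class hierarchy (Document, Element, Text, ...).
title = doc.getElementsByTagName("title")[0]
print(type(title).__name__)                          # Element
print(title.firstChild.nodeType == title.TEXT_NODE)  # True
print(title.firstChild.data)                         # Data on the Web

# Modify: objects have behavior, e.g. setAttribute and appendChild.
book = doc.getElementsByTagName("book")[0]
book.setAttribute("year", "1999")
publisher = doc.createElement("publisher")
publisher.appendChild(doc.createTextNode("Morgan Kaufmann"))
book.appendChild(publisher)

print(doc.documentElement.toxml())
# <bib><book year="1999"><title>Data on the Web</title><publisher>Morgan Kaufmann</publisher></book></bib>
```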
XML Schemas
– 1/10/2000
– generalizes DTDs
– uses XML syntax
– two documents: structure and datatypes
– XML Schema is complex
XML Schemas
DTD:
Elements vs. Types in XML Schema
DTD:
Elements vs. Types in XML Schema
Types:
– Simple types (integers, strings, ...)
– Complex types (regular expressions, like in DTDs)
Element-type-element alternation:
– Root element has a complex type
– That type is a regular expression of elements
– Those elements have their complex types...
– ...
– On the leaves we have simple types
Local and Global Types in XML Schema
– Local type: [define locally the person's type]
– Global type: [define here the type ttt]
– Global types can be reused in other elements
Local vs. Global Elements in XML Schema
– Local element: ...
– Global element: ...
– Global elements: as in DTDs
Regular Expressions in XML Schema
Recall the element-type-element alternation: a complex type is a [regular expression on elements]
Regular expressions:
– sequence of A B C = A B C
– choice of A B C = A | B | C
– group of A B C = (A B C)
– … occurring zero or more times = (...)*
– … occurring zero or one time = (...)?
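Because content models are regular expressions over child-element names, checking one can be sketched with an ordinary regex engine. A minimal Python sketch, assuming hypothetical helper functions that mirror the constructions above (XML Schema spells them xsd:sequence, xsd:choice, xsd:group and minOccurs/maxOccurs); the book content model is invented for the example. Real validators compile the schema into automata rather than Python regexes.

```python
import re
import xml.etree.ElementTree as ET


# Hypothetical helpers: each returns a Python regex over child-element names.
# Names are encoded as "<name>" tokens so that "a" can never match part of "ab".
def elem(name):
    return "<" + re.escape(name) + ">"


def seq(*parts):     # A B C
    return "".join(parts)


def choice(*parts):  # A | B | C
    return "(?:" + "|".join(parts) + ")"


def group(*parts):   # (A B C)
    return "(?:" + "".join(parts) + ")"


def star(part):      # (...)*
    return "(?:" + part + ")*"


def opt(part):       # (...)?
    return "(?:" + part + ")?"


def conforms(element, model):
    """True if the element's children, in document order, match the model."""
    children = "".join("<" + child.tag + ">" for child in element)
    return re.fullmatch(model, children) is not None


# Hypothetical content model for <book>: a title, one or more authors,
# an optional (year, month) group, then a publisher or a journal.
BOOK = seq(elem("title"), elem("author"), star(elem("author")),
           opt(group(elem("year"), elem("month"))),
           choice(elem("publisher"), elem("journal")))

ok = ET.fromstring("<book><title/><author/><author/><year/><month/><publisher/></book>")
bad = ET.fromstring("<book><author/><title/><publisher/></book>")
print(conforms(ok, BOOK), conforms(bad, BOOK))  # True False
```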
Attributes in XML Schema
– Attributes are associated with the type, not with the element
– Only complex types can have attributes; adding attributes to simple types takes more work
Derived Types by Extension
– Corresponds to inheritance
Keys in XML Schema
XML: Lawnmower, Baby Monitor, Lapis Necklace, Sturdy Shelves
XML Schema:
Keys in XML Schema
In general, two flavors: unique and key
– Note: all XPath expressions "start" at the element currently being defined
– The fields must identify a single node
Keys in XML Schema
– Unique = guarantees uniqueness
– Key = guarantees uniqueness and existence
– All XPath expressions are "restricted" (relative paths only):
  – a/b | a/c OK for selector
  – .//a/b/*/c OK for field
– Note: better than the DTD's ID mechanism
Keys in XML Schema: Examples
Recall: each field must identify a single node, so every selected element must have a single forename and a single surname
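The difference between unique and key, and the rule that each field must select a single node, can be sketched in Python. This is a hypothetical checker, not an XML Schema validator: ElementTree's small path language stands in for the restricted XPath, fields are assumed to be child elements (no @attribute fields), and values are compared as text rather than as typed values.

```python
import xml.etree.ElementTree as ET


def check_identity(scope, selector, fields, is_key):
    """Hypothetical checker: key (is_key=True) or unique (is_key=False).

    `selector` is evaluated from `scope` and each field from a selected node,
    echoing how the XPath expressions start at the element being defined.
    Returns a list of violation messages (empty if the constraint holds).
    """
    errors, seen = [], set()
    for node in scope.findall(selector):
        values, complete = [], True
        for field in fields:
            hits = node.findall(field)
            if len(hits) > 1:
                errors.append(f"{field!r} selects {len(hits)} nodes")
            if len(hits) != 1:
                complete = False
                if is_key and not hits:  # key: every field must exist
                    errors.append(f"{field!r} is missing")
                continue
            values.append(hits[0].text)
        if not complete:  # incomplete entries take no part in the uniqueness check
            continue
        value = tuple(values)
        if value in seen:
            errors.append(f"duplicate value {value}")
        seen.add(value)
    return errors


# Hypothetical data: the third person repeats the first; the fourth lacks a surname.
root = ET.fromstring("""<people>
  <person><forename>Jane</forename><surname>Smith</surname></person>
  <person><forename>John</forename><surname>Smith</surname></person>
  <person><forename>Jane</forename><surname>Smith</surname></person>
  <person><forename>Mary</forename></person>
</people>""")

print(check_identity(root, "person", ["forename", "surname"], is_key=True))
# ["duplicate value ('Jane', 'Smith')", "'surname' is missing"]
print(check_identity(root, "person", ["forename", "surname"], is_key=False))
# ["duplicate value ('Jane', 'Smith')"]
```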
Foreign Keys in XML Schema
Example
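A foreign key (XML Schema's keyref) requires every referencing value to match some key value. A sketch in Python under simplifying assumptions: single-field keys, child-element fields, and plain text comparison. The document and its element names are hypothetical; the product names are borrowed from the key example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each <order> refers to a <product> by its <number>.
root = ET.fromstring("""<shop>
  <product><number>p1</number><name>Lawnmower</name></product>
  <product><number>p2</number><name>Baby Monitor</name></product>
  <order><productRef>p2</productRef></order>
  <order><productRef>p9</productRef></order>
</shop>""")


def dangling_refs(scope, key_selector, key_field, ref_selector, ref_field):
    """Return the referencing values that match no key value (keyref violations)."""
    keys = {node.findtext(key_field) for node in scope.findall(key_selector)}
    refs = [node.findtext(ref_field) for node in scope.findall(ref_selector)]
    return [ref for ref in refs if ref not in keys]


print(dangling_refs(root, "product", "number", "order", "productRef"))  # ['p9']
```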
XPATH
XPath
– Goal: to select (access) nodes of a document
– XPath main construct: axis navigation
– An XPath path consists of one or more navigation steps, separated by /
– Navigation step: axis + node-test + predicates
– Examples
  – /descendant::node()/child::author
  – /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
– XPath also offers shortcuts
  – no axis means child
  – // = /descendant-or-self::node()/
XPath: Child Axis Navigation
author is shorthand for child::author. Examples:
– aaa -- all the child nodes labeled aaa (1,3)
– aaa/bbb -- all the bbb grandchildren of aaa children (4)
– */bbb -- all the bbb grandchildren of any child (4,6)
– . -- the context node
– / -- the root node
[Figure: example tree below the context node, with nodes labeled aaa, bbb, ccc]
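Python's ElementTree supports a subset of XPath (names, *, ., / and //, plus a few predicates), enough to try the child-axis examples. A sketch on a hypothetical tree whose id attributes are chosen so the answers match the numbers listed above; it is not the tree from the slide's figure. ElementTree paths are relative to the element you start from, so there is no equivalent of / (the root) from inside.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree below the context node; the id attributes number the nodes.
context = ET.fromstring("""<context>
  <aaa id="1"><bbb id="4"/><ccc id="5"/></aaa>
  <ccc id="2"><bbb id="6"/></ccc>
  <aaa id="3"/>
</context>""")


def ids(path):
    """Evaluate an ElementTree path from the context node; return the ids found."""
    return [e.get("id") for e in context.findall(path)]


print(ids("aaa"))      # ['1', '3']  all aaa children
print(ids("aaa/bbb"))  # ['4']       bbb grandchildren through aaa children
print(ids("*/bbb"))    # ['4', '6']  bbb grandchildren through any child
print(context.find(".") is context)  # True: "." is the context node
```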
XPath: Child Axis Navigation
– /doc -- all the doc children of the root
– ./aaa -- all the aaa children of the context node (equivalent to aaa)
– text() -- all the text children of the context node
– node() -- all the children of the context node (includes text nodes, but not attribute nodes: attributes are not children in XPath)
– .. -- the parent of the context node
– .// -- the context node and all its descendants
– // -- the root node and all its descendants
– //text() -- all the text nodes in the document
Predicates
– node()[2] -- the second child node of the context node
– chapter[5] -- the fifth chapter child of the context node
– node()[last()] -- the last child node of the context node
– chapter[title = "introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all the text on descendant text nodes)
– person[.//firstname = "Joe"] -- the person children of the context node that have in their descendants a firstname element with string-value "Joe"
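ElementTree also understands positional predicates and simple text-equality predicates, so several of the examples above can be run directly. A sketch on a hypothetical document; ElementTree's predicate syntax is limited to a few forms (tag, @attribute, position, equality on text), so the descendant test person[.//firstname = "Joe"] is written out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three chapters and two persons.
book = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>basics</title></chapter>"
    "<chapter><title>queries</title></chapter>"
    "<person><name><firstname>Joe</firstname></name></person>"
    "<person><firstname>Ann</firstname></person>"
    "</book>"
)


def titles(path):
    return [c.findtext("title") for c in book.findall(path)]


print(titles("chapter[2]"))                     # ['basics']
print(titles("chapter[last()]"))                # ['queries']
print(titles("chapter[title='introduction']"))  # ['introduction']

# person[.//firstname = "Joe"], by hand: p.iter() walks the element and its
# descendants, and itertext() concatenates descendant text (the string-value).
joes = [p for p in book.findall("person")
        if any("".join(f.itertext()) == "Joe" for f in p.iter("firstname"))]
print(len(joes))  # 1
```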
Axis navigation
So far, nearly all our expressions have moved us down by moving to child nodes. Exceptions were:
– . -- stay where you are
– .. -- go to the parent
– / -- go to the root
– // -- all descendants of the root
– .// -- all descendants of the context node
XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
– Some of these (self, parent) describe single nodes, others describe sequences of nodes.
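ElementTree elements do not store a parent pointer, so reverse axes such as ancestor or preceding-sibling have to be computed. A sketch, assuming a child-to-parent dictionary built once per document; the tree is a made-up example. Results on the reverse axes are listed nearest node first, which matches how XPath counts positions on reverse axes (ancestor::*[1] is the parent).

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: <a> has children <b> and <f>; <b> has children c, d, e.
root = ET.fromstring("<a><b><c/><d/><e/></b><f/></a>")

# ElementTree has no parent pointers, so build a child -> parent map once.
parent = {child: p for p in root.iter() for child in p}


def ancestor(n):
    """ancestor axis, nearest first: parent, grandparent, ..., root."""
    out = []
    while n in parent:
        n = parent[n]
        out.append(n)
    return out


def following_sibling(n):
    siblings = list(parent[n])  # KeyError for the root, which has no parent
    return siblings[siblings.index(n) + 1:]


def preceding_sibling(n):
    siblings = list(parent[n])
    return siblings[:siblings.index(n)][::-1]  # nearest first


d = root.find("b/d")
print([x.tag for x in ancestor(d)])           # ['b', 'a']
print([x.tag for x in following_sibling(d)])  # ['e']
print([x.tag for x in preceding_sibling(d)])  # ['c']
```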
XPath Navigation Axes
[Figure: the navigation axes ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute, namespace, self]
XPath abbreviated syntax
– // = /descendant-or-self::node()/
– . = self::node()
– .// = self::node()/descendant-or-self::node()/
– .. = parent::node()
– / = (document root)
XPath
– Reasonably widely adopted: in XML Schema and in query languages
– Neither more nor less expressive than regular path expressions (their expressive power is incomparable)
Query Languages - XQuery
Summary of XQuery
– FLWR expressions
– FOR and LET expressions
– Collections and sorting
Resources
– XQuery: A Query Language for XML, Chamberlin, Florescu, et al.
– W3C recommendation:
XQuery
– Based on Quilt (which is based on XML-QL)
– XML Query data model (ordered)
FLWR ("Flower") Expressions
FOR ... LET ... FOR ... LET ... WHERE ... RETURN ...
(any sequence of FOR and LET clauses, an optional WHERE clause, then a RETURN clause)
XQuery
Find all book titles published after 1995:
  FOR $x IN document("bib.xml")/bib/book
  WHERE $x/year > 1995
  RETURN $x/title
Result: abc, def, ghi
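The FLWR query can be mirrored with Python's ElementTree to make the evaluation order explicit: iterate over /bib/book (FOR), filter (WHERE), and collect the title (RETURN). A sketch; the contents of bib.xml are hypothetical, chosen to reproduce the result shown, and the file is inlined as a string. The year is converted to a number before comparing, as an assumption about the intended numeric comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml contents (titles and years are made up).
bib = ET.fromstring("""<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>def</title><year>2000</year></book>
  <book><title>xyz</title><year>1990</year></book>
  <book><title>ghi</title><year>1997</year></book>
</bib>""")

# FOR $x IN /bib/book  WHERE $x/year > 1995  RETURN $x/title
result = [x.find("title")
          for x in bib.findall("book")        # FOR: one binding per book
          if int(x.findtext("year")) > 1995]  # WHERE: numeric comparison
print([t.text for t in result])  # ['abc', 'def', 'ghi']
```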
XQuery
For each author of a book by Morgan Kaufmann, list all books she published:
  FOR $a IN distinct(document("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
  RETURN $a,
         FOR $t IN /bib/book[author = $a]/title
         RETURN $t
distinct = a function that eliminates duplicates
XQuery
Result:
  Jones: abc, def
  Smith: ghi
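The nested query can be sketched in Python the same way: an outer loop over the distinct Morgan Kaufmann authors and an inner loop that collects every title by that author, whoever published it. The bib.xml contents are hypothetical, chosen so the output matches the result shown; dict.fromkeys removes duplicates while keeping first-seen order.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml, chosen so the output matches the result shown above.
bib = ET.fromstring("""<bib>
  <book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><author>Jones</author><publisher>Addison Wesley</publisher></book>
  <book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>
</bib>""")

# FOR $a IN distinct(/bib/book[publisher = "Morgan Kaufmann"]/author)
mk_authors = bib.findall("book[publisher='Morgan Kaufmann']/author")
authors = list(dict.fromkeys(a.text for a in mk_authors))

# RETURN $a, FOR $t IN /bib/book[author = $a]/title RETURN $t
# (author = $a holds if any author of the book equals $a)
result = {
    a: [b.findtext("title") for b in bib.findall("book")
        if a in [x.text for x in b.findall("author")]]
    for a in authors
}
print(result)  # {'Jones': ['abc', 'def'], 'Smith': ['ghi']}
```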
XQuery
– FOR $x IN expr -- binds $x to each element in the list expr, in turn
– LET $x := expr -- binds $x to the entire list expr
  – Useful for common subexpressions and for aggregations
XQuery
count = an (aggregate) function that returns the number of elements
Find publishers with more than 100 books:
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")/bib/book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
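A Python sketch of the count query, to show that LET binds one whole list per publisher and that count() is applied to that list. The document is a hypothetical toy, so the threshold is lowered from 100 to 1 (an assumption made only to get a non-empty answer).

```python
import xml.etree.ElementTree as ET

# Hypothetical toy bib.xml.
bib = ET.fromstring("""<bib>
  <book><publisher>Morgan Kaufmann</publisher></book>
  <book><publisher>Addison Wesley</publisher></book>
  <book><publisher>Morgan Kaufmann</publisher></book>
</bib>""")

THRESHOLD = 1  # stands in for the 100 of the query, to suit the toy data

result = []
# FOR $p IN distinct(//publisher)
for p in dict.fromkeys(e.text for e in bib.iter("publisher")):
    # LET $b := /bib/book[publisher = $p]  (one whole list per publisher)
    b = [book for book in bib.findall("book") if book.findtext("publisher") == p]
    # WHERE count($b) > THRESHOLD  RETURN $p
    if len(b) > THRESHOLD:
        result.append(p)

print(result)  # ['Morgan Kaufmann']
```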
XQuery
Find books whose price is larger than average:
  LET $a := avg(document("bib.xml")…)
  FOR $b IN document("bib.xml")/bib/book
  WHERE … > $a
  RETURN $b
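A sketch of the average-price query in Python. Since part of the query text above is missing, this assumes price is stored as a price attribute on each book; the data are hypothetical. The LET value is computed once, before the FOR loop, which is why LET suits aggregates.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical bib.xml; price is assumed to be an attribute of <book>.
bib = ET.fromstring("""<bib>
  <book price="30"><title>abc</title></book>
  <book price="55"><title>def</title></book>
  <book price="65"><title>ghi</title></book>
</bib>""")
books = bib.findall("book")

# LET $a := avg(...): evaluated once, before the FOR loop
a = mean(float(b.get("price")) for b in books)

# FOR $b IN /bib/book WHERE (price of $b) > $a RETURN $b
result = [b for b in books if float(b.get("price")) > a]
print(a, [b.findtext("title") for b in result])  # 50.0 ['def', 'ghi']
```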
XQuery Summary
FOR-LET-WHERE-RETURN = FLWR
– FOR/LET clauses produce a list of tuples
– the WHERE clause filters that list of tuples
– the RETURN clause turns each remaining tuple into output: an instance of the XQuery data model
FOR vs. LET
– FOR: binds node variables; iteration
– LET: binds collection variables; one value
FOR vs. LET
  FOR $x IN document("bib.xml")/bib/book
  RETURN $x
Returns:...
  LET $x := document("bib.xml")/bib/book
  RETURN $x
Returns:...
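The FOR/LET difference shows up once RETURN builds something around $x. A Python sketch that models each constructed result as a list: FOR gives one result per book, LET gives a single result holding all books. The two-book document is hypothetical.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("<bib><book>abc</book><book>def</book></bib>")

# FOR $x IN /bib/book: $x is bound to each book in turn,
# so RETURN runs once per book and builds one result each.
for_results = [[x] for x in bib.findall("book")]

# LET $x := /bib/book: $x is bound once, to the whole list,
# so RETURN runs once and builds a single result.
let_results = [bib.findall("book")]

print(len(for_results), len(let_results))          # 2 1
print([[b.text for b in r] for r in for_results])  # [['abc'], ['def']]
print([[b.text for b in r] for r in let_results])  # [['abc', 'def']]
```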
Collections in XQuery
Ordered and unordered collections
– /bib/book/author = an ordered collection
– distinct(/bib/book/author) = an unordered collection
  LET $a := /bib/book -- $a is a collection
  $b/author -- a collection (several authors...)
  RETURN $b/author
Returns:...
Sorting in XQuery
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN $p/text(),
         FOR $b IN document("bib.xml")//book[publisher = $p]
         RETURN $b/title, …
         SORTBY(price DESCENDING)
  SORTBY(name)
Sorting in XQuery
– Sorting arguments refer to the names in the output of the RETURN clause, not to the FOR clause
– To sort on an element you don't want to display, first return it, then remove it with an additional query.
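A Python sketch of the sorting query and of the advice above: the inner step returns (title, price) pairs so that the price is available as a sort key, and a final step drops the price again. Price is assumed to be a book attribute and the data are hypothetical. Python's sort is stable (also with reverse=True), so books with equal prices keep their document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml; price is assumed to be an attribute of <book>.
bib = ET.fromstring("""<bib>
  <book price="40"><title>abc</title><publisher>Morgan Kaufmann</publisher></book>
  <book price="65"><title>def</title><publisher>Addison Wesley</publisher></book>
  <book price="55"><title>ghi</title><publisher>Morgan Kaufmann</publisher></book>
</bib>""")

result = []
for p in dict.fromkeys(e.text for e in bib.iter("publisher")):
    # Inner RETURN: (title, price) pairs, so price exists in the output.
    books = [(b.findtext("title"), float(b.get("price")))
             for b in bib.findall("book") if b.findtext("publisher") == p]
    books.sort(key=lambda pair: pair[1], reverse=True)  # SORTBY(price DESCENDING)
    result.append((p, books))
result.sort(key=lambda entry: entry[0])                 # SORTBY(name)

# Follow-up step: drop the price once it has served as a sort key.
titles_only = [(p, [t for t, _ in books]) for p, books in result]
print(titles_only)
# [('Addison Wesley', ['def']), ('Morgan Kaufmann', ['ghi', 'abc'])]
```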
If-Then-Else
  FOR $h IN //holding
  RETURN $h/title,
         IF … = "Journal" THEN $h/editor ELSE $h/author
  SORTBY (title)
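The conditional in the RETURN clause translates directly into a Python conditional expression. A sketch, assuming the kind of holding is recorded in a hypothetical type attribute (the test in the query above is not fully shown); the library contents are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical holdings; "type" is an assumed attribute name.
library = ET.fromstring("""<library>
  <holding type="Journal"><title>J1</title><editor>E1</editor></holding>
  <holding type="Book"><title>Data on the Web</title><author>Suciu</author></holding>
</library>""")

rows = []
for h in library.iter("holding"):                 # FOR $h IN //holding
    who = h.find("editor") if h.get("type") == "Journal" else h.find("author")
    rows.append((h.findtext("title"), who.text))  # RETURN $h/title, IF ... ELSE ...
rows.sort(key=lambda row: row[0])                 # SORTBY (title)
print(rows)  # [('Data on the Web', 'Suciu'), ('J1', 'E1')]
```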
Existential Quantifiers
  FOR $b IN //book
  WHERE SOME $p IN $b//para SATISFIES
        contains($p, "sailing") AND contains($p, "windsurfing")
  RETURN $b/title
Universal Quantifiers
  FOR $b IN //book
  WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
  RETURN $b/title
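SOME and EVERY correspond to Python's any() and all(). A sketch on hypothetical books, taking the string value of each para with itertext(). Note book C, which has no para at all: EVERY is vacuously true for it (as in XQuery, where every over an empty sequence is true), while SOME is false.

```python
import xml.etree.ElementTree as ET

# Hypothetical books.
root = ET.fromstring("""<library>
  <book><title>A</title><para>sailing and windsurfing</para><para>sailing</para></book>
  <book><title>B</title><para>sailing</para><para>windsurfing</para></book>
  <book><title>C</title></book>
</library>""")


def text(p):
    return "".join(p.itertext())  # string value, as contains() would see it


# SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing")
some = [b.findtext("title") for b in root.iter("book")
        if any("sailing" in text(p) and "windsurfing" in text(p)
               for p in b.iter("para"))]

# EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every = [b.findtext("title") for b in root.iter("book")
         if all("sailing" in text(p) for p in b.iter("para"))]

print(some)   # ['A']
print(every)  # ['A', 'C']
```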