Introduction to XML and XQuery Guangjun (Kevin) Xie
Nov 28, 2005, York University
Road Map
  XML data model
  XML data vs relational data
  XPath 2.0
  XQuery
  Processing XQuery
XML Data Model: XML Information Set (Infoset)
  An infoset is an abstract data set containing all the information in an XML document.
  It provides a consistent set of definitions for referring to the information in a well-formed XML document.
  Infosets usually result from parsing XML documents, but they can also be synthetic:
    built through an API such as DOM
    produced by transforming an existing infoset
  An infoset consists of a number of information items.
XML Data Model: XML Infoset
  "Information set" and "information item" are similar in meaning to the generic terms "tree" and "node".
  An information item is an abstract description of some part of an XML document.
  Each information item has a set of associated named properties, written as [property name].
XML Data Model: Information Items
  There are 11 types of information items:
    1. Document information item
    2. Element information items
    3. Attribute information items
    4. Character information items
    5. Processing instruction information items
    6. Unexpanded entity reference information items
    7. Comment information items
    8. The document type declaration information item
    9. Unparsed entity information items
    10. Notation information items
    11. Namespace information items
  We will discuss the first 3 today.
XML Data Model: Document Information Item
  There is exactly one document information item in an infoset.
  All other information is accessible through its properties:
    [children]: processing instructions, comments, etc.
    [document element]: the element information item corresponding to the document element
    [version]: the XML version of the document
    ... etc.
XML Data Model: Element Information Items
  There is one element information item for each element in the XML document.
  The "root" element item is the value of the [document element] property of the document information item.
  Properties:
    [namespace name]: the namespace name of the element, if any
    [local name]: the local part of the element name
    [children]: the information items directly inside this element
    [attributes]: the attribute information items of this element
    [parent]: the information item that contains this item
    ... etc.
XML Data Model: Attribute Information Items
  There is one attribute information item for each attribute of an XML element.
  Properties:
    [namespace name]: the namespace name of the attribute, if any
    [local name]: the local part of the attribute name
    [attribute type]: the declared type of this attribute
    [owner element]: the element information item that carries this attribute
    ... etc.
XML Data Model: Infoset Example
  <msg:message doc:date="..." xmlns:doc="..." xmlns:msg="...">Phone home!</msg:message>
  The information set contains:
    A document information item.
    An element information item with namespace name "...", local part "message", and prefix "msg".
    An attribute information item with namespace name "...", local part "date", prefix "doc", and normalized value "...".
    Three namespace information items, for the xml, doc, and msg namespaces.
    Two attribute information items for the namespace attributes.
    Eleven character information items for the character data.
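XML Data Model: Infoset Example
  [Figure: tree diagram of the example infoset. Nodes shown: the document item (version 1.0), the msg:message element item, the attribute items xmlns:msg, xmlns:doc, and doc:date, and the character items of "Phone home!". Legend: document, element, attribute, and character information items.]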
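The element and attribute properties from the example can be inspected with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree, which exposes only part of the infoset. The namespace URIs and the date value are placeholders chosen for illustration, not the ones from the original example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs and date value (assumptions, not the original ones).
doc = ET.fromstring(
    '<msg:message doc:date="2005-11-28" '
    'xmlns:doc="http://example.org/doc" '
    'xmlns:msg="http://example.org/msg">Phone home!</msg:message>'
)

# ElementTree folds [namespace name] and [local name] into "{uri}local".
namespace_name, local_name = doc.tag[1:].split("}")

# Ordinary attributes only; the xmlns declarations are not listed here.
attributes = dict(doc.attrib)

# One character information item per character of the text content.
character_count = len(doc.text)

print(namespace_name, local_name, attributes, character_count)
```

ElementTree drops the prefix and the namespace attributes after parsing, so the [prefix] property and the namespace information items are not visible through this API.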
XML Data vs Relational Data
  Relational databases stem from commercial data processing.
    Information usually has a regular structure.
  XML has its roots in text document processing.
    Documents often have an irregular structure.
  Both are general models, capable of representing all forms of information.
  Their different heritages mean that each is optimized for different types of applications.
XML Data vs Relational Data: Nesting
  XML model
    Deeply nested structure
    Flexible (not predefined)
    Nested data is easily queried with the descendant axis in XPath 2.0
  Relational model
    Flat table structure
    Primary and foreign keys represent nesting relationships
    Complex and flexible nesting may result in awkward queries
XML Data vs Relational Data: Metadata
  XML model
    Metadata is mixed with ordinary data
    High ratio of metadata to ordinary data
  Relational model
    Metadata is easily factored out
    Queries that involve metadata are difficult
    Ex: find the names of the columns that contain the value "red"
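In the XML model the same "which names hold the value red" question is an ordinary query, because element names travel with the data. A small sketch in Python's standard library follows; the record and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are invented for this sketch.
item = ET.fromstring(
    "<item><name>wagon</name><body>red</body>"
    "<trim>blue</trim><wheels>red</wheels></item>"
)

# Tag names (metadata) are queried in the same pass as the values (data).
names_with_red = [child.tag for child in item if child.text == "red"]
print(names_with_red)
```

The relational counterpart would need to consult the catalog or enumerate the columns by hand, since column names are not part of the rows.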
XML Data vs Relational Data: Ordering
  XML model
    Documents have an intrinsic order that cannot be derived from the values
    Ex: the order of the sentences in a book is essential
    This poses a challenge for the query language
  Relational model
    Any ordering depends on values
    Rows are not considered to have an order
XML Data vs Relational Data: Null Values
  XML model
    A missing value is represented by the absence of an element
    Retrieving a missing value yields an empty sequence
    Rules are needed for how operators handle an empty sequence
  Relational model
    A "null" value represents a missing value
    Rules define how operators behave in the presence of null
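The "absence means missing" convention can be seen directly in a parser API. This sketch uses Python's xml.etree.ElementTree with hypothetical data in which one book has no price element.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second book simply has no <price> element.
store = ET.fromstring(
    "<bookstore>"
    "<book><title>A</title><price>10.00</price></book>"
    "<book><title>B</title></book>"
    "</bookstore>"
)

books = store.findall("book")
present = [p.text for p in books[0].findall("price")]
missing = [p.text for p in books[1].findall("price")]  # no element, so an empty list

print(present, missing)
```

Code that consumes the result still has to decide what an empty list means for a comparison or an aggregate, which is the rule-making problem the slide refers to.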
XML Data vs Relational Data: Structural Transformations
  XML model
    Queries operate on XML documents and generate new XML documents
    XPath 2.0: navigating inside a document
    XQuery: joining elements, constructing new elements and structures
  Relational model
    Queries operate on tables and generate new tables
XML Data vs Relational Data: Data Definition
  XML model
    Content is a mixture of primitive data and nested elements
    Elements may be optional
    Constraints on cardinality and order
    This poses challenges for type inference
    Ex: how can we prove that the output satisfies a given schema?
  Relational model
    Specifies the properties of columns
    All rows have the same columns
    Relatively simple
XPath 2.0: What's XPath?
  XPath is a specification for defining parts of an XML document.
  XPath 2.0 provides a way to locate an individual node or a set of nodes in an XML data model.
  XPath 2.0 is closely related to XQuery.
    Both use the same data model, based on the XML Infoset.
    XQuery uses XPath to refer to information in the data model.
  XPath 2.0 uses path expressions to navigate XML documents and select nodes.
    A path expression evaluates to a sequence of nodes.
    Path expressions look very much like the paths used in a traditional computer file system.
  XPath 2.0 is a W3C recommendation.
XPath 2.0: Data Model
  Represents all the values involved in a query, including
    the input and the output of the query
    all values of expressions used in intermediate calculations
  Based on the XML Infoset
  Shared with XQuery
  Models XML data as trees
  Sequence-based data model
    Sequences represent sets of trees or tree fragments
    Everything is a sequence
    Sequences never contain other sequences
XPath 2.0: Data Model
  A tree whose root node is a document node is referred to as a document.
  A tree whose root node is not a document node is referred to as a fragment.
XPath 2.0: Data Model
  Every instance of the data model is a sequence.
  A sequence is an ordered collection of zero or more items.
  An item is either a node or an atomic value.
  A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values.
  A single item appearing on its own is modeled as a sequence containing one item.
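The two rules "sequences never contain other sequences" and "a single item is a sequence of one" can be modeled in a few lines. This is an illustrative Python sketch, not how any XPath engine is implemented; the helper name seq is hypothetical and Python lists stand in for sequences.

```python
def seq(*parts):
    """Build a flat sequence: nested sequences are spliced in, never nested."""
    flat = []
    for part in parts:
        if isinstance(part, list):
            flat.extend(seq(*part))  # splice the inner sequence's items
        else:
            flat.append(part)        # a single item contributes itself
    return flat


# Mirrors the XPath 2.0 expression (1, (2, 3), (), "a").
mixed = seq(1, seq(2, 3), seq(), "a")
single = seq(5)
print(mixed, single)
```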
XPath 2.0: Data Model
  There are seven kinds of nodes in the data model:
    Document node
    Element node
    Attribute node
    Text node
    Namespace node
    Processing instruction node
    Comment node
XPath 2.0: Sample XML Document (books.xml)
  The bookstore document contains four book elements:
    Everyday Italian (Giada De Laurentiis)
    Harry Potter (J K. Rowling)
    XQuery Kick Start (James McGovern, Per Bothner, Kurt Cagle, James Linn, Vaidyanathan Nagarajan)
    Learning XML (Erik T. Ray)
XPath 2.0: Example
  /bookstore/book evaluates to a sequence of nodes, one for each book element:
    Everyday Italian (Giada De Laurentiis)
    Harry Potter (J K. Rowling)
    XQuery Kick Start (James McGovern, Per Bothner, Kurt Cagle, James Linn, Vaidyanathan Nagarajan)
    Learning XML (Erik T. Ray)
  //book evaluates to the same result.
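The two path expressions can be tried out with Python's xml.etree.ElementTree, which supports a limited XPath subset evaluated relative to an element. The sketch below assumes the element names title and author (only bookstore and book appear in the path expressions) and shortens the author list.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the title and author element names are guesses.
BOOKS_XML = """
<bookstore>
  <book><title>Everyday Italian</title><author>Giada De Laurentiis</author></book>
  <book><title>Harry Potter</title><author>J K. Rowling</author></book>
  <book><title>XQuery Kick Start</title><author>James McGovern</author>
        <author>Per Bothner</author></book>
  <book><title>Learning XML</title><author>Erik T. Ray</author></book>
</bookstore>
"""
bookstore = ET.fromstring(BOOKS_XML)

# Counterpart of /bookstore/book: book children of the document element.
by_path = bookstore.findall("book")

# Counterpart of //book: book elements at any depth, in document order.
by_descendant = bookstore.findall(".//book")

titles = [book.findtext("title") for book in by_path]
same_nodes = by_path == by_descendant
print(titles, same_nodes)
```

Both searches return the same four element nodes here because every book is a direct child of bookstore; with deeper nesting only the descendant search would find the inner ones.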
XPath 2.0: Example
  The expression evaluates to a sequence containing 2 book element nodes:
    XQuery Kick Start (James McGovern, Per Bothner, Kurt Cagle, James Linn, Vaidyanathan Nagarajan)
    Learning XML (Erik T. Ray)
XPath 2.0: Example
  some $x in //book satisfies $x/price > 49
    evaluates to a sequence containing the single atomic value true
  every $x in //book satisfies $x/price > 49
    evaluates to a sequence containing the single atomic value false
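The quantified expressions some and every correspond closely to Python's any and all. The sketch below uses hypothetical prices, chosen so that exactly one book costs more than 49; the price element name comes from the expressions above.

```python
import xml.etree.ElementTree as ET

# Hypothetical prices (assumptions made for this sketch).
bookstore = ET.fromstring(
    "<bookstore>"
    "<book><title>Harry Potter</title><price>29.99</price></book>"
    "<book><title>XQuery Kick Start</title><price>49.99</price></book>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

prices = [float(p.text) for p in bookstore.findall(".//book/price")]

# some $x in //book satisfies $x/price > 49
some_over_49 = any(p > 49 for p in prices)

# every $x in //book satisfies $x/price > 49
every_over_49 = all(p > 49 for p in prices)

print(some_over_49, every_over_49)
```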
XPath 2.0: Example
  /bookstore/book[position()=1] evaluates to a sequence containing one element node:
    Everyday Italian (Giada De Laurentiis)
XQuery: What's XQuery?
  XQuery is the language for querying XML data.
  It finds and extracts elements and attributes from XML documents.
  XQuery is to XML what SQL is to relational databases.
  Many of the concepts and techniques used in SQL processing and optimization can be applied to XQuery processing and optimization.
XQuery: What's XQuery?
  XQuery is built on XPath 2.0 expressions.
    XQuery 1.0 and XPath 2.0 share the same data model.
    They support the same functions and operators.
    Understanding XPath 2.0 is essential to understanding XQuery.
  Supported by all the major database vendors
    IBM
    Oracle
    Microsoft
    etc.
XQuery: What's XQuery?
  Closed with respect to its data model
    The value of every expression in the language is guaranteed to be in the data model.
    XPath 2.0 is also closed.
  Designed to be a functional language
    No side effects
    Processes and produces sequences
  XQuery is becoming a W3C standard
    The current version is XQuery 1.0
    Not yet a W3C Recommendation (XQuery 1.0 is a Candidate Recommendation)
XQuery: FLWOR Expressions
  for: binds a variable to each item of a sequence in turn
  let: binds a variable to a whole sequence
  where: applies conditions to the bindings produced by for
  order by: sorts the output of the for expression
  return: builds the result for each binding; together the results form one sequence
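The five clauses map onto familiar list operations. The following Python sketch is only an analogy, using made-up (title, year) pairs in place of book nodes; it is not how an XQuery engine evaluates FLWOR.

```python
# Hypothetical data: (title, year) pairs standing in for a sequence of book nodes.
books = [("C", 1999), ("A", 2001), ("B", 1985)]

cutoff = 1990                                         # let: bind a name to a value
bindings = [b for b in books if b[1] > cutoff]        # for + where: iterate and filter
bindings.sort(key=lambda b: b[0])                     # order by: sort the bindings
result = [title.lower() for title, _ in bindings]     # return: one result per binding

print(result)
```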
XQuery: Sample XML Document (bib.xml)
  TCP/IP Illustrated; Stevens W.; Addison-Wesley
  Advanced Programming in the Unix environment; Stevens W.; Addison-Wesley
  Data on the Web; Abiteboul Serge, Buneman Peter, Suciu Dan; Morgan Kaufmann Publishers
  The Economics of Technology and Content for Digital TV; Gerbarg Darcy, CITI; Kluwer Academic Publishers
XQuery: Sample XML Document (reviews.xml)
  Data on the Web: A very good discussion of semi-structured database systems and XML.
  Advanced Programming in the Unix environment: A clear and detailed discussion of UNIX programming.
  TCP/IP Illustrated: One of the best books on TCP/IP.
XQuery: Sample XML Document (prices.xml)
  Advanced Programming in the Unix environment; bstore2.example.com
  Advanced Programming in the Unix environment; bstore1.example.com
  TCP/IP Illustrated; bstore2.example.com
  TCP/IP Illustrated; bstore1.example.com
  Data on the Web; bstore2.example.com
  Data on the Web; bstore1.example.com; 39.95
XQuery: Example 1
  List the books published by Addison-Wesley after 1991, including their year and title.
  Solution in XQuery:
    <bib>
    {
      for $b in doc("bib.xml")/bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return
        <book year="{ $b/@year }">
          { $b/title }
        </book>
    }
    </bib>
  Result:
    TCP/IP Illustrated
    Advanced Programming in the Unix environment
XQuery: Example 2
  Create a flat list of all the title-author pairs.
  Solution in XQuery:
    for $b in doc("bib.xml")/bib/book,
        $t in $b/title,
        $a in $b/author
    return
      <result>
        { $t }
        { $a }
      </result>
  Result:
    TCP/IP Illustrated; Stevens W.
    Advanced Programming in the Unix environment; Stevens W.
    Data on the Web; Abiteboul Serge
    Data on the Web; Buneman Peter
    Data on the Web; Suciu Dan
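The three for bindings behave like nested loops, which is why a book with several authors yields several pairs. A rough Python equivalent follows, using a fragment of the sample data; the last and first element names inside author are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed markup for a fragment of bib.xml (last/first element names are guesses).
bib = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title>"
    "<author><last>Stevens</last><first>W.</first></author></book>"
    "<book><title>Data on the Web</title>"
    "<author><last>Abiteboul</last><first>Serge</first></author>"
    "<author><last>Buneman</last><first>Peter</first></author></book>"
    "</bib>"
)

pairs = [
    (t.text, f"{a.findtext('last')} {a.findtext('first')}")
    for b in bib.findall("book")      # for $b in .../bib/book
    for t in b.findall("title")       # $t in $b/title
    for a in b.findall("author")      # $a in $b/author
]
print(pairs)
```

A book without any author produces no pair at all, because the innermost loop has nothing to iterate over.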
XQuery: Example 3
  For each book in the bibliography, list the title and authors.
  Solution in XQuery:
    for $b in doc("bib.xml")/bib/book
    return
      <result>
        { $b/title }
        { $b/author }
      </result>
  Result:
    TCP/IP Illustrated; Stevens W.
    Advanced Programming in the Unix environment; Stevens W.
    Data on the Web; Abiteboul Serge, Buneman Peter, Suciu Dan
    The Economics of Technology and Content for Digital TV (no authors)
XQuery: Example 4
  For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.
  Solution in XQuery:
    <books-with-prices>
    {
      for $b in doc("bib.xml")//book,
          $a in doc("reviews.xml")//entry
      where $b/title = $a/title
      return
        <book-with-prices>
          { $b/title }
          <price-bstore2>{ $a/price/text() }</price-bstore2>
          <price-bstore1>{ $b/price/text() }</price-bstore1>
        </book-with-prices>
    }
    </books-with-prices>
  Result:
    TCP/IP Illustrated
    Advanced Programming in the Unix environment
    Data on the Web
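The query above is a value-based join on title across two documents. A minimal Python sketch of the same join follows; the markup is assumed and all prices are hypothetical, since this is only meant to show the join logic.

```python
import xml.etree.ElementTree as ET

# Assumed markup and hypothetical prices.
bib = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>The Economics of Technology and Content for Digital TV</title>"
    "<price>129.95</price></book>"
    "</bib>"
)
reviews = ET.fromstring(
    "<reviews>"
    "<entry><title>TCP/IP Illustrated</title><price>60.00</price></entry>"
    "<entry><title>Data on the Web</title><price>34.95</price></entry>"
    "</reviews>"
)

# Nested loops plus a condition: (title, price in reviews.xml, price in bib.xml).
joined = [
    (b.findtext("title"), a.findtext("price"), b.findtext("price"))
    for b in bib.iter("book")          # for $b in doc("bib.xml")//book
    for a in reviews.iter("entry")     # $a in doc("reviews.xml")//entry
    if b.findtext("title") == a.findtext("title")   # where $b/title = $a/title
]
print(joined)
```

Only titles present in both documents survive, which matches an inner join in SQL.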
XQuery: Example 5
  List the titles and years of all books published by Addison-Wesley after 1991, in alphabetical order.
  Solution in XQuery:
    <bib>
    {
      for $b in doc("bib.xml")//book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      order by $b/title
      return
        <book>
          { $b/@year }
          { $b/title }
        </book>
    }
    </bib>
  Result:
    Advanced Programming in the Unix environment
    TCP/IP Illustrated
XQuery: Example 6
  In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute.
  Solution in XQuery:
    <results>
    {
      let $doc := doc("prices.xml")
      for $t in distinct-values($doc//book/title)
      let $p := $doc//book[title = $t]/price
      return
        <minprice title="{ $t }">
          <price>{ min($p) }</price>
        </minprice>
    }
    </results>
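The combination of distinct-values and min amounts to a group-by with an aggregate. The Python sketch below shows the same computation with a dictionary; the markup is assumed and the prices are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed markup and hypothetical prices; only the grouping logic matters here.
prices_doc = ET.fromstring(
    "<prices>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>TCP/IP Illustrated</title><price>63.50</price></book>"
    "<book><title>Data on the Web</title><price>34.95</price></book>"
    "<book><title>Data on the Web</title><price>39.95</price></book>"
    "</prices>"
)

min_price = {}
for book in prices_doc.iter("book"):
    title = book.findtext("title")
    price = float(book.findtext("price"))
    # Keep the smallest price seen so far for each distinct title.
    min_price[title] = min(price, min_price.get(title, price))

print(min_price)
```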
XQuery: Sample XML Document (book.xml)
  Data on the Web
  Serge Abiteboul, Peter Buneman, Dan Suciu
  Introduction: Text...
    Audience: Text...
    Web Data and the Two Cultures: Text...
      Traditional client/server architecture
      Text...
  A Syntax For Data: Text...
    Graph representations of structures
    Text...
    Base Types: Text...
    Representing Relational Databases: Text...
      Examples of Relations
    Representing Object Databases: Text...
XQuery: Example 7
  Prepare a (nested) table of contents, listing all sections and their titles. Preserve the original attributes of each element, if any.
  Solution in XQuery:
    declare function local:toc($book-or-section as element()) as element()*
    {
      for $section in $book-or-section/section
      return
        <section>
          { $section/@*, $section/title, local:toc($section) }
        </section>
    };

    <toc>
    {
      for $s in doc("book.xml")/book return local:toc($s)
    }
    </toc>
  Result:
    Introduction
      Audience
      Web Data and the Two Cultures
    A Syntax For Data
      Base Types
      Representing Relational Databases
      Representing Object Databases
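The recursive function can be mirrored almost line for line in Python. The sketch below works on a cut-down, assumed version of book.xml; the id attribute and the p element name are hypothetical, and the function name toc simply echoes local:toc.

```python
import xml.etree.ElementTree as ET

# Cut-down, assumed version of book.xml (the id attribute is hypothetical).
book = ET.fromstring(
    "<book><title>Data on the Web</title>"
    "<section id='intro'><title>Introduction</title><p>Text...</p>"
    "<section><title>Audience</title><p>Text...</p></section>"
    "</section>"
    "<section><title>A Syntax For Data</title><p>Text...</p></section>"
    "</book>"
)


def toc(book_or_section):
    """Return one <section> element per child section, recursively."""
    entries = []
    for section in book_or_section.findall("section"):
        entry = ET.Element("section", dict(section.attrib))  # keep attributes
        title = section.find("title")
        if title is not None:
            ET.SubElement(entry, "title").text = title.text
        entry.extend(toc(section))  # nested sections become nested entries
        entries.append(entry)
    return entries


toc_root = ET.Element("toc")
toc_root.extend(toc(book))
toc_xml = ET.tostring(toc_root, encoding="unicode")
print(toc_xml)
```

Only title children and nested sections are copied, so paragraph text and any other content are left out of the table of contents.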
Processing XQuery: Approaches for Querying XML Data
  Mapping XML data into relational data
    Query with SQL
    May produce too many relations
    Loss of information may occur
    Ex: ordering, explicit hierarchical relationships between elements
  Using specific query languages
    Usually integrated with SQL and relational data management
    SQL/XML or XQuery
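One simple way to picture the first approach is to shred a document into a single node table and answer a path query with SQL joins. The sketch below uses Python's sqlite3 and xml.etree.ElementTree; the table layout (id, parent, pos, tag, text) is an assumption made for illustration and is not the mapping used by any of the systems discussed here.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<bib><book><title>TCP/IP Illustrated</title></book>"
    "<book><title>Data on the Web</title></book></bib>"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
    "pos INTEGER, tag TEXT, text TEXT)"
)


def shred(elem, parent_id, pos):
    """Store one element as a row, then recurse into its children in order."""
    cur = conn.execute(
        "INSERT INTO node (parent, pos, tag, text) VALUES (?, ?, ?, ?)",
        (parent_id, pos, elem.tag, (elem.text or "").strip()),
    )
    row_id = cur.lastrowid
    for child_pos, child in enumerate(elem, start=1):
        shred(child, row_id, child_pos)


shred(doc, None, 1)

# SQL counterpart of /bib/book/title: one self-join per path step.
rows = conn.execute(
    "SELECT t.text FROM node bib "
    "JOIN node book ON book.parent = bib.id AND book.tag = 'book' "
    "JOIN node t ON t.parent = book.id AND t.tag = 'title' "
    "WHERE bib.parent IS NULL AND bib.tag = 'bib' "
    "ORDER BY book.pos, t.pos"
).fetchall()
titles = [row[0] for row in rows]
print(titles)
```

The explicit pos and parent columns are what keep ordering and hierarchy from being lost, and the growing chain of self-joins hints at why long path expressions become awkward in SQL.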
Processing XQuery: IBM System RX
  SQL/XQuery compiler
  A new XQuery parser is added to the existing relational query processor.
  All components are extended to process XQuery.
Processing XQuery: Oracle
  XQuery compilation engine
  The parser converts XQuery into XQueryX.
    XQueryX is an XML representation of XQuery (another W3C Candidate Recommendation).
  An XML parser constructs a DOM tree from the XQueryX.
  Subsequent processing works on the DOM tree.
  The corresponding components are extended for XQuery too.
Processing XQuery: Microsoft
  XQuery compilation
  XQuery is compiled into an XML algebra tree, which is an internal representation.
  The algebra tree can be optimized and executed by the relational query processor.
  Optimizations are rule-based.
  A mapper traverses the algebra tree, converting each XML operator into a relational operator sub-tree.