1 Querying XML Documents
2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases
3 Only Some Trees are Relations They have height two The root has an unbounded number of children All nodes in the second layer (records) have a fixed number of child nodes (fields) Trees are ordered, while both rows and columns of tables may be permuted without changing the meaning of the data
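4 An XML tree [Figure: a tree rooted at books. Each book node has author children plus a title and a year. The first book has authors S. Abiteboul, P. Buneman, and D. Suciu, title Data on the Web, and year 2000; a further book has authors … A. Moller.]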
5 Relational Tables

bid | title                                       | year
b1  | Data on the Web                             | 2000
b2  | An Introduction to XML and Web Technologies | 2006

bid | author
b1  | S. Abiteboul
b1  | P. Buneman
b1  | D. Suciu
b2  | A. Moller
b2  | M. Schwartzbach
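The correspondence between the two tables and the tree on the previous slide can be sketched in a few lines of Python. This is only an illustration using the standard library's `xml.etree.ElementTree`; the function name `tables_to_tree` and the element names `books`, `book`, `author`, `title`, and `year` are assumptions taken from the tree figure, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Rows copied from the two relational tables on this slide.
books = [
    ("b1", "Data on the Web", "2000"),
    ("b2", "An Introduction to XML and Web Technologies", "2006"),
]
authors = [
    ("b1", "S. Abiteboul"),
    ("b1", "P. Buneman"),
    ("b1", "D. Suciu"),
    ("b2", "A. Moller"),
    ("b2", "M. Schwartzbach"),
]


def tables_to_tree(books, authors):
    """Join the tables on bid and nest the authors under their book."""
    root = ET.Element("books")
    for bid, title, year in books:
        book = ET.SubElement(root, "book")
        for owner, name in authors:
            if owner == bid:
                ET.SubElement(book, "author").text = name
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "year").text = year
    return root


tree = tables_to_tree(books, authors)
print(ET.tostring(tree, encoding="unicode"))
```

The join key `bid` disappears in the tree: nesting expresses the relationship that the foreign key expresses in the tables.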
6 Query Usage Scenarios Data-oriented: to query XML data, like using SQL to query relational databases. Document-oriented: to retrieve parts of documents, to provide dynamic indexes, to perform context-sensitive searching, to generate new documents as combinations of existing documents. Programming: to automatically generate documentation, similar to the Javadoc tool. Hybrid: to retrieve information from hybrid data, such as patient records.
7 XQuery Design Requirements XML syntax and human-readable Declarative Namespace aware Coordinate with XML Schema Support simple and complex data types Can combine multiple documents Can transform and create XML trees
8 What is XQuery? XQuery is the language for querying XML data XQuery 1.0 is a strict superset of XPath 2.0 XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators XQuery has the extra expressive power to join information from different sources and generate new XML fragments XQuery and XSLT are both domain-specific languages for combining and transforming XML data from multiple sources XQuery is designed from scratch, XSLT is an intellectual descendant of CSS Technically, they may emulate each other XQuery is defined by the W3C XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.)
9 XQuery Prolog XQuery expressions are evaluated relative to a context, which is explicitly provided by a prolog. Declarations specify various parameters for the XQuery processor, such as: xquery version "1.0"; declare boundary-space preserve; declare boundary-space strip; declare default element namespace URI; declare default function namespace URI; import schema at URI; declare namespace NCName = URI;
10 Implicit Declarations Declarations implicitly defined in any XQuery implementation: declare namespace xml = " declare namespace xs = " declare namespace xsi = " declare namespace fn = " declare namespace xdt = " declare namespace local = "
11 XPath vs. XQuery XPath expressions are also XQuery expressions. XPath expressions must be evaluated in a static context, which must be provided by the invoking application. The XQuery prolog gives the required static context. The initial context node, position, and size are undefined because an XQuery expression may work on multiple XML documents. Only the axes child, descendant, parent, attribute, self, and descendant-or-self are required to be implemented in XQuery.
12 Datatype Expressions Same atomic values as XPath 2.0 Also lots of primitive simple values: xs:string("XML is fun") xs:boolean("true") xs:decimal("3.1415") xs:float(" E23") xs:dateTime(" T13:20:00-05:00") xs:time("13:20:00-05:00") xs:date(" ") xs:gYearMonth(" ") xs:gYear("1999") xs:hexBinary("48656c6c6f0a") xs:base64Binary("SGVsbG8K") xs:anyURI(" xs:QName("rcp:recipe")
13 XML Expressions XQuery expressions may compute new XML nodes Expressions may denote element, character data, comment, and processing instruction nodes Each node is created with a unique node identity Constructors may be either direct or computed
14 Direct Constructors Use the standard XML syntax: an expression such as an element with text content baz evaluates to the given XML fragment. Note that nodes are created with unique identity, so comparing two evaluations of the same constructor with is evaluates to false (the operators is, <<, and >> are used to compare nodes on identity and document order).
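Node identity has a close analogue in Python, where `is` compares object identity rather than content. The sketch below is an analogy only, not XQuery: it builds two elements with the same (hypothetical) name `foo` and shows that they serialize identically yet are distinct nodes.

```python
import xml.etree.ElementTree as ET

# Two separately constructed elements with identical content.
first = ET.Element("foo")
second = ET.Element("foo")

# Same serialized form, so they are equal "by value" ...
same_content = ET.tostring(first) == ET.tostring(second)

# ... but each constructor call created a fresh node, so identity differs.
same_node = first is second

print(same_content, same_node)
```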
15 Same Namespace Declarations (1/3) declare default element namespace " John Doe CEO, Widget Inc. (202)
16 Same Namespace Declarations (2/3) declare namespace b = " John Doe CEO, Widget Inc. (202)
17 Same Namespace Declarations (3/3) John Doe CEO, Widget Inc. (202)
18 Enclosed Expressions Expressions create an element named numbers which has a single character data node with value (if boundary-space is set to strip) {1, 2, 3, 4, 5} {1, "2", 3, 4, 5} {1 to 5} 1 {1+1} {" "} {"3"} {" "} {4 to 5} Enclosed expressions are allowed inside attribute values
19 Explicit Constructors The constant expression John Doe CEO, Widget Inc. (202) Can be written as: element card { namespace { " }, element name { text { "John Doe" } }, element title { text { "CEO, Widget Inc." } }, element { text { } }, element phone { text { "(202) " } }, element logo { attribute uri { "widget.gif" } } }
20 Computed QNames Qualified names can be replaced by enclosed expressions evaluating to equivalent strings: element { "card" } { namespace { " }, element { "name" } { text { "John Doe" } }, element { "title" } { text { "CEO, Widget Inc." } }, element { " " } { text { } }, element { "phone" } { text { "(202) " } }, element { "logo" } { attribute { "uri" } { "widget.gif" } } }
21 Bilingual Business Cards Controlled by a global variable $lang : element { if ($lang="Danish") then "kort" else "card" } { namespace { " }, element { if ($lang="Danish") then "navn" else "name" } { text { "John Doe" } }, element { if ($lang="Danish") then "titel" else "title" } { text { "CEO, Widget Inc." } }, element { " " } { text { } }, element { if ($lang="Danish") then "telefon" else "phone"} { text { "(202) " } }, element { "logo" } { attribute { "uri" } { "widget.gif" } } }
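The idea of computing element names at run time can be mirrored in Python. The following is a rough analogue, not the slide's XQuery: the function `card` and its helper `pick` are hypothetical names, and the email and phone elements are left out because their values did not survive in the slide text.

```python
import xml.etree.ElementTree as ET


def card(lang):
    """Build a business card whose element names depend on lang."""

    def pick(danish, english):
        # Plays the role of: if ($lang="Danish") then ... else ...
        return danish if lang == "Danish" else english

    root = ET.Element(pick("kort", "card"))
    ET.SubElement(root, pick("navn", "name")).text = "John Doe"
    ET.SubElement(root, pick("titel", "title")).text = "CEO, Widget Inc."
    ET.SubElement(root, "logo", {"uri": "widget.gif"})
    return root


print(ET.tostring(card("Danish"), encoding="unicode"))
print(ET.tostring(card("English"), encoding="unicode"))
```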
22 FLWOR Expressions Used for general queries: { for $s in fn:doc("students.xml")//student let $m := $s/major where fn:count($m) ge 2 order by return { $s/name/text() } }
23 The Difference Between For and Let (1/4) A FLWOR expression for $x in (1, 2, 3, 4) let $y := ("a", "b", "c") return ($x, $y) Output 1, a, b, c, 2, a, b, c, 3, a, b, c, 4, a, b, c
24 The Difference Between For and Let (2/4) A FLWOR expression let $x := (1, 2, 3, 4) for $y in ("a", "b", "c") return ($x, $y) Output 1, 2, 3, 4, a, 1, 2, 3, 4, b, 1, 2, 3, 4, c
25 The Difference Between For and Let (3/4) A FLWOR expression for $x in (1, 2, 3, 4) for $y in ("a", "b", "c") return ($x, $y) Output 1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b, 3, c, 4, a, 4, b, 4, c
26 The Difference Between For and Let (4/4) A FLWOR expression let $x := (1, 2, 3, 4) let $y := ("a", "b", "c") return ($x, $y) Output 1, 2, 3, 4, a, b, c
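The four combinations can be replayed in Python, where `for` corresponds to iterating over a sequence and `let` to binding the whole sequence once. This is a sketch of the semantics only; the helper `flatten` is a hypothetical name that imitates the way XQuery concatenates the sequences produced by each `return`.

```python
xs = [1, 2, 3, 4]
ys = ["a", "b", "c"]


def flatten(sequences):
    """XQuery sequences never nest, so concatenate the per-iteration results."""
    return [item for seq in sequences for item in seq]


# (1/4) for $x ... let $y ...: one result per x, each carrying all of ys.
for_let = flatten([x, *ys] for x in xs)

# (2/4) let $x ... for $y ...: one result per y, each carrying all of xs.
let_for = flatten([*xs, y] for y in ys)

# (3/4) for $x ... for $y ...: one result per (x, y) pair.
for_for = flatten([x, y] for x in xs for y in ys)

# (4/4) let $x ... let $y ...: a single result holding both sequences.
let_let = [*xs, *ys]

for name, value in [("for/let", for_let), ("let/for", let_for),
                    ("for/for", for_for), ("let/let", let_let)]:
    print(name, value)
```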
27 An XML Example Document: books.xml Everyday Italian Giada De Laurentiis Harry Potter J K. Rowling
28 An XML Example Document: books.xml (cont.) XQuery Kick Start James McGovern Per Bothner Kurt Cagle James Linn Vaidyanathan Nagarajan Learning XML Erik T. Ray
29 Path Expressions The following path expression is used to select all the title elements in the "books.xml" file: doc("books.xml")/bookstore/book/title The XQuery above will extract the following: Everyday Italian Harry Potter XQuery Kick Start Learning XML
30 Predicates The following predicate is used to select all the book elements under the bookstore element that have a price element with a value that is less than 30: doc("books.xml")/bookstore/book[price<30] The XQuery above will extract the following: Harry Potter J K. Rowling
31 FLWOR Expressions The following FLWOR expression selects all the title elements of book elements under the bookstore element whose price element has a value higher than 30: for $x in doc("books.xml")/bookstore/book where $x/price>30 return $x/title The result will be: XQuery Kick Start Learning XML
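The path expression, the predicate, and the FLWOR query of the last three slides can be tried out with Python's `xml.etree.ElementTree`. This is a sketch under stated assumptions: the markup of books.xml was lost from the slides, so the document below is a reduced stand-in, and the four prices are assumed values chosen only to be consistent with the results the slides report.

```python
import xml.etree.ElementTree as ET

# Stand-in for books.xml; the prices are assumptions, not taken from the slides.
BOOKS_XML = """
<bookstore>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>
""".strip()

bookstore = ET.fromstring(BOOKS_XML)


def price(book):
    return float(book.find("price").text)


# doc("books.xml")/bookstore/book/title
titles = [t.text for t in bookstore.findall("./book/title")]

# doc("books.xml")/bookstore/book[price<30]
cheap = [b.find("title").text for b in bookstore.findall("book") if price(b) < 30]

# for $x in .../book where $x/price>30 return $x/title
expensive = [b.find("title").text for b in bookstore.findall("book") if price(b) > 30]

print(titles)
print(cheap)
print(expensive)
```

ElementTree supports only a small XPath subset, so the numeric predicates are written as ordinary Python filters here.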
32 Conditional Expressions Notes on the "if-then-else" syntax: parentheses around the if expression are required. else is required, but it can be just else (). for $x in doc("books.xml")/bookstore/book return if then {data($x/title)} else {data($x/title)} The result of the example above will be: Everyday Italian Harry Potter Learning XML XQuery Kick Start
33 Comparisons General comparisons: =, !=, <, <=, >, >=. Value comparisons: eq, ne, lt, le, gt, ge. > 10 The expression above returns true if any q attributes have values greater than 10. gt 10 The expression above returns true if exactly one q attribute is returned by the expression and its value is greater than 10. If more than one q is returned, an error occurs.
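The difference between the two comparison families can be modelled in Python. This is a sketch of the semantics, not an XQuery implementation: `general_gt` and `value_gt` are hypothetical names, a Python list stands in for an XQuery sequence, and `None` stands in for the empty sequence that a value comparison yields when its operand is empty.

```python
def general_gt(values, threshold):
    """General comparison (>): true if ANY item satisfies the comparison."""
    return any(v > threshold for v in values)


def value_gt(values, threshold):
    """Value comparison (gt): the operand must hold at most one item."""
    if len(values) == 0:
        return None  # models the empty sequence
    if len(values) > 1:
        raise TypeError("gt requires a single item, got %d" % len(values))
    return values[0] > threshold


print(general_gt([5, 12], 10))  # existential: one item above 10 is enough
print(value_gt([12], 10))       # exactly one item, compared directly
try:
    value_gt([5, 12], 10)
except TypeError as err:
    print("error:", err)
```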
34 Selecting and Filtering Elements The at keyword can be used to count the iteration: for $x at $i in doc("books.xml")/bookstore/book/title return {$i}. {data($x)} Result: 1. Everyday Italian 2. Harry Potter 3. XQuery Kick Start 4. Learning XML More than one in expression is also allowed in the for clause. Use commas to separate the in expressions: for $x in (10,20), $y in (100,200) return x={$x} and y={$y} Result: x=10 and y=100 x=10 and y=200 x=20 and y=100 x=20 and y=200
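Both features have direct Python counterparts, which may help when reading the results above: the `at` variable behaves like `enumerate` with a 1-based start, and several `in` expressions in one `for` clause behave like a nested loop, that is, a Cartesian product with the leftmost variable varying slowest. The sketch below is an analogy using the titles listed on the slide.

```python
from itertools import product

titles = ["Everyday Italian", "Harry Potter", "XQuery Kick Start", "Learning XML"]

# for $x at $i in ... : $i is the 1-based position of $x in the sequence.
numbered = [f"{i}. {title}" for i, title in enumerate(titles, start=1)]

# for $x in (10,20), $y in (100,200): every combination, x varying slowest.
pairs = [f"x={x} and y={y}" for x, y in product((10, 20), (100, 200))]

print(numbered)
print(pairs)
```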
35 Computing Joins fridge.xml: eggs olive oil ketchup unrecognizable moldy thing Find recipes that use some ingredients in the refrigerator declare namespace rcp = " for $r in fn:doc("recipes.xml")//rcp:recipe for $i in for $s in fn:doc("fridge.xml")//stuff[text()=$i] return fn:distinct-values($r/rcp:title/text() )
36 Inverting a Relation declare namespace rcp = " { for $i in distinct-values( ) return { for $r in fn:doc("recipes.xml")//rcp:recipe where return $r/rcp:title/text() } }
37 Sorting the Results declare namespace rcp = " { for $i in order by $i return { for $r in fn:doc("recipes.xml")//rcp:recipe where order by $r/rcp:title/text() return $r/rcp:title/text() } }
38 A More Complicated Sorting for $s in fn:doc("students.xml")//student order by fn:count( ) descending, fn:count($s/major) descending, xs:integer($s/age/text()) ascending return $s/name/text()
39 Using Functions declare function local:grade($g) { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0 }; declare function local:gpa($s) { fn:avg(for $g in return local:grade($g)) }; { for $s in fn:doc("students.xml")//student return }
40 Extend the Expressive Power Using a recursive function to generate an XML tree of a given height: declare function local:gen($n) { if ($n eq 0) else { local:gen($n - 1), local:gen($n - 1) } }; Compute the height of an XML tree: declare function local:height($x) { if (fn:empty($x/*)) then 1 else fn:max(for $y in $x/* return local:height($y))+1 };
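The two recursive functions translate almost line by line into Python. In this sketch the element names `leaf` and `node` are placeholders, since the names used in the original constructor were lost from the slide; `height` follows the XQuery definition shown above, where an element without child elements has height 1.

```python
import xml.etree.ElementTree as ET


def gen(n):
    """Build a full binary tree; the element names are placeholders."""
    if n == 0:
        return ET.Element("leaf")
    node = ET.Element("node")
    node.append(gen(n - 1))
    node.append(gen(n - 1))
    return node


def height(x):
    """Height of an element: 1 for a childless element, else 1 + tallest child."""
    children = list(x)
    if not children:
        return 1
    return max(height(child) for child in children) + 1


tree = gen(3)
print(height(tree), sum(1 for _ in tree.iter()))
```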
41 A Textual Outline Cailles en Sarcophages pastry chilled unsalted butter flour salt ice water filling baked chicken marinated chicken small chickens, cut up Herbes de Provence dry white wine orange juice minced garlic truffle oil...
42 Computing Textual Outlines declare namespace rcp = " declare function local:ingredients($i,$p) { fn:string-join( for $j in $i/rcp:ingredient return local:ingredients($j,fn:concat($p," "))),""),"") }; declare function local:recipes($r) { fn:concat($r/rcp:title/text(),"
",local:ingredients($r," ")) }; fn:string-join( for $r in fn:doc("recipes.xml")//rcp:recipe[5] return local:recipes($r),"" )
43 XML Databases How can XML and databases be merged? Several different approaches: extract XML views of relations use SQL to generate XML shred XML into relational databases
44 Automatic XML Views (1/2) xmlelement(name, "Students", select xmlelement(name, "record", xmlattributes(s.id, s.name, s.age)) from Students s )
45 Automatic XML Views (2/2) Joe Average Jack Doe 18 xmlelement(name, "Students", select xmlelement(name, "record", xmlforest(s.id, s.name, s.age)) from Students s )
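The difference between the two views, columns as attributes versus columns as child elements, can be sketched in Python. This is an illustration of the resulting shapes only, not of any database's SQL/XML implementation: the function names are hypothetical, and the ids and Joe Average's age in the sample rows are made-up values.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of the Students table (ids and first age are invented).
students = [
    {"id": "1", "name": "Joe Average", "age": "21"},
    {"id": "2", "name": "Jack Doe", "age": "18"},
]


def view_with_attributes(rows):
    """Mirrors the xmlattributes variant: each column becomes an attribute."""
    root = ET.Element("Students")
    for row in rows:
        ET.SubElement(root, "record", dict(row))
    return root


def view_with_forest(rows):
    """Mirrors the xmlforest variant: each column becomes a child element."""
    root = ET.Element("Students")
    for row in rows:
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return root


print(ET.tostring(view_with_attributes(students), encoding="unicode"))
print(ET.tostring(view_with_forest(students), encoding="unicode"))
```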
46 Storing XML Documents Designing a specialized system for storing native XML data Using a DBMS to store the whole XML documents as text fields Using a DBMS to store the document contents as data elements It must support the XML’s ordered data model
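47 Using a DBMS: Relational DTD
Schema-aware. An element that can occur at most once in its parent is stored as a column of the table representing its parent.

The book table:
ParentID | ID | title             | year
2        | 3  | “Data on The Web” | “2000”
2        | 4  | …                 | …

The author table:
ParentID | ID | TEXT
3        | 5  | “S. Abiteboul”
3        | 6  | “P. Buneman”
3        | 7  | “D. Suciu”

[Figure: a tree rooted at references. Each book node has author children plus a title and a year. The first book has authors S. Abiteboul, P. Buneman, and D. Suciu, title Data on the Web, and year 2000; the second book is elided (…).]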
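48 Using a DBMS: Edge Schema-less. A single table is used to store the entire document. Each node is assigned an ID in depth-first order. [Figure: a tree with a root node above references. Each book node has author children plus a title and a year. The first book has authors S. Abiteboul, P. Buneman, and D. Suciu, title Data on the Web, and year 2000; the second book is elided (…).]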
49 Using a DBMS: Edge (cont.)

The edge table:
SourceID | tag       | ordinal | TargetID | Data
1        | reference | 1       | 2        | NULL
2        | book      | 1       | 3        | NULL
2        | book      | 2       | 4        | NULL
3        | author    | 1       | 0        | “S. Abiteboul”
3        | author    | 2       | 0        | “P. Buneman”
3        | author    | 3       | 0        | “D. Suciu”
3        | title     | 4       | 0        | “Data on The Web”
3        | year      | 5       | 0        | “2000”
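One way such an edge table could be produced is sketched below in Python. This is a minimal illustration under assumptions, not the scheme of any particular system: the function name `shred` is hypothetical, ID 1 is reserved for the document root, elements with child elements receive IDs in depth-first order, and childless elements are stored inline with TargetID 0 and their text as Data, as in the table above. The content of the second book is invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """
<references>
  <book>
    <author>S. Abiteboul</author><author>P. Buneman</author><author>D. Suciu</author>
    <title>Data on The Web</title><year>2000</year>
  </book>
  <book><author>A. Moller</author></book>
</references>
""".strip()


def shred(root_elem):
    """Return edge rows (SourceID, tag, ordinal, TargetID, Data)."""
    rows = []
    next_id = 1  # ID 1 is reserved for the document root

    def visit(source_id, elem, ordinal):
        nonlocal next_id
        if len(elem) == 0:
            # Childless element: store its text inline, no node ID needed.
            rows.append((source_id, elem.tag, ordinal, 0, (elem.text or "").strip()))
            return
        next_id += 1
        target_id = next_id
        rows.append((source_id, elem.tag, ordinal, target_id, None))
        for position, child in enumerate(elem, start=1):
            visit(target_id, child, position)

    visit(1, root_elem, 1)
    # Present the rows grouped by source node, then by sibling position.
    return sorted(rows, key=lambda row: (row[0], row[2]))


edge_table = shred(ET.fromstring(DOC))
for row in edge_table:
    print(row)
```

Storing the ordinal in each row is what preserves XML's document order, which a plain relational table would otherwise lose.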