SDPL Querying XML with XQuery

5 Querying XML
• How to access various XML data sources?
• XQuery, the XML Query Language, W3C Rec., Jan '07
  – joint work by the XML Query and XSL WGs
    » together with XPath 2.0 and XSLT 2.0
    » started ~1999; 2nd Ed. in Dec 2010
  – influenced by many research groups and query languages
    » Quilt, XPath, XQL, XML-QL, SQL, OQL, Lorel, ...
  – a query language for any XML-represented data: both documents and databases

Outline of this section
• Quick overview of XQuery
• Review of XPath, emphasizing XPath 2.0 vs 1.0
  – items, types, and sequences
  – tree model, location path expressions; comparison operators
• Central features of XQuery (over those of XPath 2.0)
  – element constructors, FLWOR expressions
  – use cases, user-defined functions, querying relational data
• Comparison of XQuery and XSLT 1.0
• XQuery for problem solving
  – examples of application to puzzles
• Summary

Capabilities of XQuery (1)
• XQuery allows selecting, reorganizing and transforming XML data
  – respecting document content, structure, hierarchy, and order
• Selection, filtering, and search
• Combine and join
  – data from different parts of a document, or from multiple documents
• Sort, group, and aggregate
• Transform, restructure and create XML data
• Operate on numbers and dates
• Manipulate content strings

Capabilities of XQuery (2)
• Closure property:
  – results of XML queries are also XML (well-formed document fragments)
  – → queries can be combined, without limit
• Extensibility:
  – supports user-defined functions on all data types of the data model
• In-place update of XML data is not supported by XQuery itself
  – specified separately in "XQuery Update Facility 1.0", W3C Rec. March 2011

XQuery in a Nutshell
• Functional expression language
• Strongly typed: (optional) type-checking of expressions, and validation of results (we'll concentrate on processing)
  – predeclared prefix for type names: xs="http://www.w3.org/2001/XMLSchema"
• Extends XPath 2.0
  – XQuery 1.0 and XPath 2.0 Functions and Operators, Rec. Jan 2007
    » over 100; for numbers, strings, dates and times, Booleans, documents & URIs, nodes, and sequences
• XQuery ≈ XPath + XSLT' + SQL' (roughly)

Example Query
    xquery version "1.0"; (: optional declaration :)
    … Cheap Books … { for $b in …
• Syntax "concise and easily understood"
• An XML-based syntax (XQueryX) has also been specified
  – easier for applications, harder for humans

A possible result
    Cheap Books
    Computing with Logic, David Maier, Benjamin Cummings, 1999
    Designing Internet applications, Michael Leventhal, Prentice Hall, 1998

XQuery and XPath
• XQuery is an extension of XPath (2.0)
  – common data model, 108 functions and 68 operators
  – → review some XPath first
• XPath is used in several other contexts, too:
  – for pattern matching and selection in XSLT
  – in validation rules of Schematron
  – for uniqueness constraints in XML Schema
  – for addressing in XLink and XPointer

XPath in a Nutshell
• XPath 1.0 (W3C Rec. 11/99)
  – a compact non-XML syntax for addressing parts of XML documents (as node-sets)
  – also operations on strings, numbers and truth values
• XPath 2.0 (W3C Rec. 1/07) extends and generalizes:
  – data manipulated as sequences of items
    » item = a node or an atomic value of a simple XML Schema datatype

Literal Atomic Values and Their Types
• Examples:
    "-12"            instance of xs:string
    -12              instance of xs:integer
    1.2              instance of xs:decimal
    1.2E3            instance of xs:double
    string(1.2E3)    instance of xs:string
    number("+12")    instance of xs:double
    xs:date("…")     instance of xs:date
    true()           instance of xs:boolean

XPath 2.0/XQuery Type Hierarchy
[Figure: type hierarchy diagram, continued over two slides]

XQuery/XPath 2.0 Sequences
• Expressions operate on, and return, sequences of
  – atomic values (of simple XML Schema types) and
  – nodes
  – an item is the same as a singleton sequence
  – sequences are flat: no sequences as items
    » (1, (2, 3), (), 1) = (1, 2, 3, 1)
  – sequences are ordered, and can contain duplicates
• Unlimited combination of expressions, often with automatic type conversions (e.g. for arithmetic)

Sequence Expressions
• Constant sequences are constructed by listing values
  – comma (,) is a catenation operator
    » (1, (2, 3), (), 1) = (1, 2, 3, 1)
• Range expressions for integer sequences:
  – 1 to 4            →  (1, 2, 3, 4)
  – 4 to 1            →  ()
  – reverse(1 to 4)   →  (4, 3, 2, 1)

Accessing Documents
• XQuery operates on nodes accessible by input functions
  – fn:doc("URI")
    » document node of the XML document available at URI
    » roughly the same as document("URI") in XSLT 1.0
  – fn:collection("URI")
    » sequence of nodes from URI
    » association defined by the implementation
  – predeclared prefix for the default function namespace: fn="http://www.w3.org/2005/xpath-functions"

XQuery/XPath 2.0 Data Model
• Documents are viewed as trees made of six types of nodes:
  – document nodes (additional root above the document element)
  – element nodes
  – attribute nodes
  – text nodes
  – comments and processing instructions
• Obs 1: No entity nodes, and no CDATA sections
• Obs 2: Namespace nodes have been deprecated

Document Trees
• Defined in Sect. 5 of the XPath 1.0 spec
  – for XSLT/XPath 2.0 & XQuery in their joint Data Model
• Element nodes have the elements, text nodes, comments and processing instructions of their (direct) content as their children
  – NB: attribute nodes are not children (but have a parent)
  – → they have no siblings either
  – the string value of a document or element node is the concatenation of all its text-node descendants

Document Order
• Document order of nodes:
  – = their left-to-right pre-order
  – document root first
  – other nodes in the order of the first character of their XML markup in the document text
  – → an element precedes its attribute nodes, which precede any content nodes of the element
  – order between nodes belonging to different trees is implementation-dependent, but consistent and stable

Location Paths
• XPath can select any parts of a document tree using location paths
  – evaluated with respect to a context item (.)
    » assigned on path steps, after the first one
    » a path expression typically starts with $x or doc(…)
  – result: sequence of nodes, in document order, without duplicates

Path Expressions
• Similar to XPath 1.0: [/[/]] Expr / … / Expr
  – but steps are more liberal:
  – arbitrary expressions are OK, but steps before the last one must produce node sequences
  – 6 (of 13 XPath) axes required: child, descendant, attribute, self, descendant-or-self, parent
    » others (except namespace) optional, available if the Full Axis Feature is supported
    » with the document-order operators (<<, >>) sufficient for expressing queries (→ Exercises)

Location paths
• Consist of location steps separated by '/'
  – each step produces a sequence of items
  – steps are evaluated left-to-right, each item in turn as the context item
• Complete location step: AxisName :: NodeTest ( [ PredicateExpr ] )*
  – the axis specifies the tree relationship between the context node and the selected nodes
  – the node test restricts the type and name of nodes
  – filtered further by 0 or more predicates

Location steps: Axes
• In total 12 axes (~ directions in tree)
  – for staying at the context node: self
  – for going downwards:
    » child, descendant, descendant-or-self
  – for going upwards:
    » parent, ancestor, ancestor-or-self
  – for moving towards the start/end of the document:
    » preceding-sibling, following-sibling, preceding, following
  – "special" axes
    » attribute (namespace deprecated in XPath 2.0)
  – axes required in XQuery implementations: self, child, descendant, descendant-or-self, parent, attribute

Notes on Location Paths (1)
• XPath 2.0 allows unrestricted expressions as steps
  – but intermediate steps must produce nodes only
• Numeric predicates support array-style access: $rows[$i]
• Predicates are evaluated one step at a time. This sometimes causes confusion with shorthand notations:
  – doc("doc.xml")//title[3] selects the third title child of each parent (likely none!). Why?
  – = doc("doc.xml")/descendant-or-self::node()/child::title[3]
  – to get the third title in the document, use (doc("doc.xml")//title)[3]

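The difference between the two forms can be checked on a small in-memory document. This is a minimal sketch; the book/chap/title structure is made up for illustration and is not from the slides.

```xquery
(: Hypothetical document: three chapters, each with exactly one title :)
let $doc := document {
  <book>
    <chap><title>A</title></chap>
    <chap><title>B</title></chap>
    <chap><title>C</title></chap>
  </book>
}
return (
  (: [3] is applied per parent; no chap has a third title child :)
  count($doc//title[3]),          (: 0 :)
  (: parentheses first collect all titles, then [3] picks the third :)
  string(($doc//title)[3])        (: "C" :)
)
```
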
Notes on Location Paths (2)
• References to attributes and subelements are easy to use as predicates
  – get divisions that are of class C or have a head: a predicate of the form […  or head]
  – values are coerced to Booleans on demand
    » a string or sequence is true iff it is non-empty
    » a number is false if and only if it is zero or NaN
      (but a single number as a predicate tests for equality with position())

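A minimal sketch of such a predicate follows. The element name div and the attribute name class are assumptions for illustration, since the original expression is not preserved.

```xquery
(: Hypothetical data: div elements with a class attribute and optional head :)
let $doc :=
  <doc>
    <div class="C"/>
    <div class="D"><head>Intro</head></div>
    <div class="D"/>
  </doc>
return
  (: @class = "C" is a comparison; head is a node sequence coerced to a Boolean :)
  count($doc/div[@class = "C" or head])    (: 2 :)
```

The first div qualifies through its class, the second through its non-empty head sequence, and the third fails both tests.
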
Filter Expressions
• Location steps can be filtered by predicates:
    doc("foo.xml")/body/(chap | app)[last()]/title
  – the title of the last chapter or appendix, whichever is last
  – (chap | app) is an XPath 2.0 extended step
• Other sequences, too:
    (1 to 20)[. mod 5 eq 0]  →  (5, 10, 15, 20)
  – ('.' generalized from the XPath 1.0 shorthand for self::node() into the context item)

Path Steps as a Map operator
• XPath 2.0 path expressions provide a kind of map facility: a new sequence is computed by evaluating an expression for each item of the input sequence
• Example: get all salaries incremented by 20%: …(… * 1.2)
• Useful tricks, like providing defaults for missing attributes: …(… * 1.2)

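The map idiom and the default-value trick might look as follows. This is a sketch only: the emp elements and the salary attribute are hypothetical names, and the idiom (@salary, 0)[1] is one common way to supply a default, not necessarily the one on the original slide.

```xquery
(: Hypothetical data: Bob has no salary attribute :)
let $emps :=
  <emps>
    <emp name="Ann" salary="1000"/>
    <emp name="Bob"/>
  </emps>
return
  (: the last step is evaluated once per emp;
     (@salary, 0)[1] is @salary if present, otherwise the default 0 :)
  $emps/emp/((@salary, 0)[1] * 1.2)
```

The result is a sequence of two numbers, about 1200 for Ann and 0 for Bob.
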
Path Steps as a Map (2)
• Limitation: steps are applicable to node sequences only. Example: an invalid attempt to square the numbers 1, 2, ..., 10:
    (1 to 10)/(. * .)
• Work-around: translate the items first to text nodes:
    (for $i in 1 to 10 return text{ $i })/(. * .)
  or simply:
    for $i in 1 to 10 return $i * $i
• Function calls can also be used as steps:
    myFun:toTextNodes(1 to 10)/myFun:square(.)

Set Operations on Node Sequences
• Assume variable bindings: $s1 to the nodes a, b, c and $s2 to the nodes c, d, e
• Then:
    $s1 union $s2      =  a b c d e
    $s1 intersect $s2  =  c
    $s1 except $s2     =  a b
  – based on node identity (node1 is node2)
  – results are without duplicates, in document order

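A runnable version of this example, as a sketch: the wrapper element r is an assumption, introduced only so that all five nodes belong to one tree and have a defined document order.

```xquery
(: Hypothetical tree holding the five nodes :)
let $doc := <r><a/><b/><c/><d/><e/></r>
let $s1 := ($doc/a, $doc/b, $doc/c)
let $s2 := ($doc/c, $doc/d, $doc/e)
return (
  ($s1 union $s2)/name(),        (: a b c d e :)
  ($s1 intersect $s2)/name(),    (: c :)
  ($s1 except $s2)/name()        (: a b :)
)
```
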
Node Comparisons
• To compare single nodes,
  – for identity: is
      $book//chap[@id = "ch1"] is ($book//chap)[1]
      true iff the chapter with id="ch1" is the first chap
  – for document order: <<, >>
      $book//chap[@id = "ch2"] >> $book//title[. eq "Intro"]
      true iff the chapter with id="ch2" appears after Intro
  – if either operand is empty, then the result is empty (~ false)

Comparing values of sequences and items
• General comparisons between sequences:
  – =, !=, <, <=, >, >=
  – existential semantics: true iff some pair of values from the operand sequences satisfies the condition
    » all of these hold: (1,2) = (2,3); (2,3) = (3,4); (1,2) != (3,4)
    » same as in XPath 1.0: //book[author = "Aho"] → books where some author is Aho
  – "Is (some) author of $book Ann or Bob?": $book/author = ("Ann", "Bob")
  – a general comparison on position() also selects a slice of $seq from position $s to $e

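The slice idiom can be sketched as below. The concrete expression is an assumption (the slide's own expression is not preserved), and the sample values are made up.

```xquery
let $seq := ("a", "b", "c", "d", "e"),
    $s := 2,
    $e := 4
return
  (: position() = ($s to $e) is true iff the position equals SOME value in the range :)
  $seq[position() = ($s to $e)]    (: b c d :)
```
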
Set operations for sequences of atomic items
• A general comparison used as a predicate yields set operations on atomic sequences $A and $B:
  – union of $A and $B
  – intersection of $A and $B
  – difference of $A and $B
• Such comparisons require items of compatible types

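One common way to write the three operations is sketched here. The expressions are an assumption, since the originals are not preserved, and distinct-values() returns its result in an implementation-dependent order.

```xquery
let $A := (1, 2, 3, 3),
    $B := (3, 4)
return (
  distinct-values(($A, $B)),           (: union: 1 2 3 4 :)
  distinct-values($A[. = $B]),         (: intersection: 3 :)
  distinct-values($A[not(. = $B)])     (: difference: 1 2 :)
)
```

The predicate . = $B is an existential test, so it holds for an item of $A exactly when that value occurs somewhere in $B.
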
Value Comparisons
• For comparing single values:
  – eq, ne, lt, le, gt, ge
    » 1 eq 3 - 2; 10 lt 20; a predicate of the form [… le 100]
  – the last assumes that a numeric type has been assigned by validation
    » otherwise the value has type xs:untypedAtomic, which is cast to xs:string (→ TYPE ERROR)
  – → general comparisons are more convenient with unvalidated elements & attributes

Working with Untyped Values
• Text values may receive a specific type in a schema-validated element or attribute; otherwise their type is xs:untypedAtomic
• Automatic atomization and casting make dealing with them easy. Example:
    let $elem := <…>…</…>
    return ( "Value of",
             concat(substring($elem, 1, 6), ".. is about"),
             round-half-to-even($elem, 2) )
    → Value of ….. is about 2.72

General vs Value Comparisons wrt Types
• Comparisons atomize operands: nodes → typed values
• Assume that $E is bound to an unvalidated element with content 007
• General comparisons try to cast xs:untypedAtomic operands to compatible types:
  – compared with a number, $E is cast to xs:double: xs:double("007") = 7
  – compared with a string, $E is cast to xs:string: "007"
• Value comparisons cast xs:untypedAtomic values to strings:
    $E lt "6"   (: true: xs:string("007") lt "6" :)
    $E lt 6     (: TYPE ERROR: cannot compare xs:untypedAtomic to xs:integer :)

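The casting rules can be tried out with a small sketch. The element name e is hypothetical; any unvalidated element with content 007 behaves the same way.

```xquery
(: Hypothetical unvalidated element; its typed value is xs:untypedAtomic("007") :)
let $E := <e>007</e>
return (
  $E = 7,        (: true: "007" is cast to xs:double, and 7 = 7 :)
  $E = "007",    (: true: compared as strings :)
  $E = "7",      (: false: "007" and "7" differ as strings :)
  $E lt "6"      (: true: value comparison on strings, "007" lt "6" :)
)
```

Adding $E lt 6 to the sequence would raise a type error, because the value comparison casts the untyped value to xs:string, which cannot be compared with xs:integer.
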
What does XQuery add to XPath 2.0?
• A query is an expression
  – any XPath expression is a query
• XQuery adds to XPath expressions
  – element constructors (≈ XSLT templates)
  – FLWOR expressions ("flower"; for-let-where-order by-return)

Central XQuery Expressions
• Path expressions
• Sequence expressions
• Comparison operators
• Conditionals: if (..) then .. else ..
• Quantified expressions (some/every $var in … satisfies …)
• Element constructors (≈ XSLT templates)
• FLWOR expressions ("flower"; for-let-where-order by-return)
  – a simpler for-return expression exists also in XPath 2.0

Example: Quantified Expression
• Find book elements which have at least 10 sections in each of their chapters:
    doc("Books.xml")//book[
      every $c in .//chapter
      satisfies count($c//section) ge 10 ]

Element Constructors
• Direct element constructors ~ XSLT templates:
  – start and end tag enclosing the content
  – literal fragments are written directly, expressions are enclosed in braces { and } (≈ XSLT 1.0 attribute value templates)
• Often used inside another expression that binds the variables used in the element constructor
  – (there is no 'current node' in XQuery)
  – see next

Example
• An emp element with an empid attribute and child elements name and job, from values in variables $id, $n, and $j:
    <emp empid="{$id}">
      <name>{$n}</name>
      <job>{$j}</job>
    </emp>
• Also computed constructors:
    element {"emp"} {
      attribute {"empid"} {$id},
      <name>{$n}</name>,
      <job>{$j}</job> }

Identity of Component Nodes
• Each node has a node identity, and at most one parent. Existing nodes are copied before they get a new parent.
• Example:
    let $x := <e>Hi</e>,
        $y := <…>{$x}</…>
    return not($x is $y/e) and deep-equal($x, $y/e)
    → true

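A runnable sketch of the copy semantics follows; the element name wrapper is hypothetical and stands in for the enclosing element of the example.

```xquery
let $x := <e>Hi</e>
(: embedding $x in a new element copies it; the copy gets a new identity :)
let $y := <wrapper>{$x}</wrapper>
return (
  not($x is $y/e),          (: true: the child of $y is a different node :)
  deep-equal($x, $y/e)      (: true: but it has the same name and content :)
)
```
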
FLWOR ("flower") Expressions
• Constructed from for, let, where, order by and return clauses (~ SQL select-from-where)
• Syntax: (ForClause | LetClause)+ WhereClause? OrderByClause? "return" Expr
• FLWOR binds variables to values, and uses these bindings to construct a result (an ordered sequence of items)

Flow of data in a FLWOR expression
[Figure: data-flow diagram; labels: sequence of items, tuple (row), items]

for clauses
    for $V1 in Exp1 (, $V2 in Exp2, …)
  – associates each variable Vi with expression Expi (e.g. a path expression)
• Result: list of tuples, each containing a binding for each of the variables
• Can be thought of as loops iterating over the items returned by the respective expressions

Example: for clause
    for $i in (1,2), $j in (1 to $i)
    return <tuple><i>{$i}</i><j>{$j}</j></tuple>
• Result:
    <tuple><i>1</i><j>1</j></tuple>
    <tuple><i>2</i><j>1</j></tuple>
    <tuple><i>2</i><j>2</j></tuple>

let clauses
• let also binds variables to expressions
  – each variable gets the entire sequence as its value (without iterating over the items of the sequence)
  – results in binding a single sequence for each variable
• Compare:
  – for $b in doc("bib.xml")//book      → many bindings (to single books)
  – let $bl := doc("bib.xml")//book     → a single binding (to a sequence of books)

Example: let clauses
    let $s := (…, …, …)
    return <out>{$s}</out>
• Result: a single out element containing all three items:
    <out> … </out>
• Compare with for, which returns one out element per item:
    for $s in (…, …, …)
    return <out>{$s}</out>

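The contrast can be made concrete with a sketch; the item elements one, two and three are hypothetical stand-ins for the lost sequence items.

```xquery
(
  (: let: $s is the whole sequence, so return is instantiated once :)
  ( let $s := (<one/>, <two/>, <three/>)
    return <out>{$s}</out> ),

  (: for: $s is bound to each item in turn, so return is instantiated three times :)
  ( for $s in (<one/>, <two/>, <three/>)
    return <out>{$s}</out> )
)
```

The query returns four out elements: one with three children from the let version, followed by three with a single child each from the for version.
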
for/let clauses
• A FLWOR expression may contain several fors and lets
  – each may refer to variables bound in previous clauses
• The result of the for/let sequence:
  – an ordered list of tuples of bound variables
  – number of tuples = product of the cardinalities of the sequences returned by the for expressions

where clause
• Binding tuples generated by for and let clauses are filtered by an optional where clause
  – tuples with a true condition are used to instantiate the return clause
• The where clause may contain several predicates connected by and, or, and fn:not()
  – these usually refer to the bound variables
  – sequences as Booleans (similarly to node-sets in XPath 1.0): empty ~ false; non-empty ~ true

where clause
• for binds variables to single items → value comparisons, e.g. $color eq "red"
• let binds variables to whole sequences → general comparisons, e.g. $colors = "red"
  (~ some $c in $colors satisfies $c eq "red")
  – a number of aggregation functions are available: avg(), sum(), count(), max(), min() (sum() and count() also in XPath 1.0)

return clause
• The return clause generates the output of the FLWOR expression
• It is instantiated once for each binding tuple
• It often contains element constructors, references to bound variables, and nested sub-expressions

Example: for + return
    for $i in (1,2), $j in (1 to $i)
    return <tuple><i>{$i}</i><j>{$j}</j></tuple>
• Result:
    <tuple><i>1</i><j>1</j></tuple>
    <tuple><i>2</i><j>1</j></tuple>
    <tuple><i>2</i><j>2</j></tuple>

Example: Prime numbers
• Generate prime numbers up to $N, i.e., integers in {2, 3, …, $N} which are not divisible by others:
    declare variable $N external;
    let $cands := 2 to $N
    for $cand in $cands
    where count($cands[$cand mod . eq 0]) eq 1
    return $cand

Positional variables
• For items, we can also get their position in the sequence:
    for $char at $i in ("a", "b", "c")
    return concat($i, ".", $char, ";")
    → 1.a; 2.b; 3.c;
• Could pair items by their position:
    let $boys := doc("kids.xml")//boy,
        $girls := doc("kids.xml")//girl
    for $b at $i in $boys
    where $i le count($girls)
    return <…>{ $b, $girls[$i] }</…>

Prime numbers more efficiently
• Only smaller numbers can be divisors; it is useless to test against others:
    let $cands := 2 to $N
    for $cand at $pos in $cands
    where empty( $cands[position() lt $pos]
                       [$cand mod . eq 0] )
    return $cand

Effect of the Optimization (with Saxon-HE 9.3)
• Quite positive on both time and space:
  [Figure: time and space measurements]
  – can be optimized much more (see later)

Examples (adapted from "XML Query Use Cases")
• Assume a document named "bib.xml" containing a list of book elements; the queries that follow use their title, author, publisher, year and price subelements

List Morgan Kaufmann book titles since 1998
    <recent-MK-books> {
      for $b in doc("bib.xml")//book
      where $b/publisher = "Morgan Kaufmann"
        and $b/year >= 1998
      return <…>{$b/title}</…>
    } </recent-MK-books>

Result could be...
    <recent-MK-books>
      … TCP/IP Illustrated …
      … Advanced Programming in the Unix environment …
    </recent-MK-books>

Publishers with avg price of their books:
    for $p in fn:distinct-values( doc("bib.xml")//publisher )
    let $a := avg( doc("bib.xml")//book[publisher = $p]/price )
    return <…> <…>{$p}</…> <…>{$a}</…> </…>
• fn:distinct-values() returns atomic values, without duplicates

Invert the book-list structure
    <…> { (: group books by authors :)
      for $a in distinct-values( doc("bib.xml")//author )
      return <…> {
        <…>{$a}</…>,
        for $b in doc("bib.xml")//book[author = $a]
        return $b/title
      } </…>
    } </…>

List of publishers sorted alphabetically, and their books in descending order of price
    for $p in distinct-values( doc("bib.xml")//publisher )
    order by $p
    return <…> <…>{$p}</…> {
      for $b in doc("bib.xml")//book[publisher = $p]
      order by number($b/price) descending
      return <…>{$b/title, $b/price}</…>
    } </…>
• number() makes order by treat untyped values as numbers, instead of the xs:string default

Queries on Document Order
• $x << $y is true iff node $x precedes node $y in document order ($x >> $y similarly)

Example Query on Document Order
• Consider a surgical report with procedure elements that contain incision sub-elements
• Return a "critical sequence" of contents between the first and the second incisions of the first procedure

Computing a "critical sequence"
    <…> {
      let $p := (doc("report.xml")//procedure)[1]
      for $n in $p/node()
      where $n >> ($p//incision)[1]
        and $n << ($p//incision)[2]
      return $n
    } </…>
• NB: if the incisions are not children of the procedure, then an ancestor of the second incision gets into the result. How to avoid this?

User-defined functions: Example
    declare function local:precedes-not-anc($a as node()?, $b as node()?)
        as xs:boolean {
      $a << $b and (: $a is not an ancestor of $b: :)
      empty($a/descendant::node() intersect $b)
    };
• local: is the predeclared prefix for the namespace of local function names
  – Alternatively:
      declare namespace my = "…";
      declare function my:precedes(... (as above)

User-defined functions: Example
• Now, "critical sequence" without ancestors of incision:
    <…> {
      let $p := (doc("report.xml")//procedure)[1]
      for $n in $p/node()
      where $n >> ($p//incision)[1]
        and local:precedes-not-anc($n, ($p//incision)[2])
      return $n
    } </…>

Prime numbers with a function
• Method: (a variant of) the "Sieve of Eratosthenes"
  – Initialize a list of numbers 2, 3, 4, …, n
  – Repeat:
    (1) Move the first of the remaining numbers, p, to Primes
    (2) Cross out the multiples of p (sufficient to start at p*p)

The sieve() function
• Invocation: pr:sieve(2 to $N)
    declare namespace pr = "…";
    declare function pr:sieve($cands as xs:integer*) as xs:integer* {
      (: Pre: $cands are ascending and contain no multiples of primes < $cands[1] :)
      if ($cands[1] * $cands[1] gt $cands[last()]) then
        $cands (: all of $cands are primes :)
      else ( $cands[1],
             pr:sieve($cands[. mod $cands[1] ne 0]) )
    };
• NB: if-then-else

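A self-contained variant that can be run directly is sketched below. It uses the predeclared local: prefix instead of pr:, and it adds a guard for an empty candidate list, which is not in the function above (without it, calling the function with an empty sequence would recurse forever).

```xquery
declare function local:sieve($cands as xs:integer*) as xs:integer* {
  (: added guard, an assumption beyond the original: nothing left to sieve :)
  if (empty($cands)) then ()
  (: no candidate can have a factor >= $cands[1] other than itself :)
  else if ($cands[1] * $cands[1] gt $cands[last()]) then $cands
  (: first candidate is prime; drop its multiples and continue :)
  else ( $cands[1],
         local:sieve($cands[. mod $cands[1] ne 0]) )
};

local:sieve(2 to 30)   (: 2 3 5 7 11 13 17 19 23 29 :)
```

For 2 to 30 the recursion emits 2, 3 and 5 in turn; the remaining list starts at 7, and since 7 * 7 exceeds 29 the rest is returned as is.
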
Efficiency of pr:sieve() (with Saxon-HE 9.3)
[Figure: running times of pr:sieve()]
  – … vs 23 s (!) for $N = 100,000 with the previous optimized FLWOR expression

Recursive Transformations
• Example: "Table of contents" for nested sections
  – exclude anything but titles, and the tags of sect elements

The TOC function
    declare namespace my = "…";
    declare function my:toc( $n as element() ) as element()* {
      if (name($n) = "sect") then
        <sect>{ for $c in $n/* return my:toc($c) }</sect>
      else if (name($n) = "title") then $n
      else (: do child elems, if any: :)
        for $c in $n/* return my:toc($c)
    };
    my:toc(doc("mydoc.xml")/*)

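To see the recursion at work without an external file, the function can be applied to an inline element. This is a sketch; the sample document and the use of the local: prefix are assumptions for illustration.

```xquery
declare function local:toc($n as element()) as element()* {
  if (name($n) = "sect") then
    (: keep the sect tags, recurse into the children :)
    <sect>{ for $c in $n/* return local:toc($c) }</sect>
  else if (name($n) = "title") then $n
  else (: any other element: only its descendants can contribute :)
    for $c in $n/* return local:toc($c)
};

(: hypothetical input with one nested section :)
local:toc(
  <doc>
    <title>D</title>
    <p>text</p>
    <sect>
      <title>S1</title>
      <p>more text</p>
      <sect><title>S1.1</title></sect>
    </sect>
  </doc>
)
```

The result consists of the title D followed by a sect element that holds the title S1 and a nested sect with the title S1.1; both p elements are dropped.
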
Querying relational data
• Lots of data is stored in relational databases
• We should be able to access them, too
• Example: tables for Parts and Suppliers
  – P (pno, descrip): part numbers and descriptions
  – S (sno, sname): supplier numbers and names
  – SP (sno, pno, price): who supplies which parts and for what price?

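Possible XML representation of relations
• Each table is represented as a document (p.xml, s.xml, sp.xml) containing any number of tuple elements (p_tuple*, s_tuple*, sp_tuple*), with one subelement per column
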
Selecting in SQL vs. XQuery
• SQL:
    SELECT pno
    FROM p
    WHERE descrip LIKE 'Gear%'
    ORDER BY pno;
• XQuery:
    for $p in doc("p.xml")//p_tuple
    where starts-with($p/descrip, "Gear")
    order by $p/pno
    return $p/pno

Grouping
• Many queries involve grouping data and applying aggregation functions like count or avg to each group
• In SQL: GROUP BY and HAVING clauses
• Example: find the part number and average price for parts with at least 3 suppliers

Grouping: SQL
    SELECT pno, avg(price) AS avgprice
    FROM sp
    GROUP BY pno
    HAVING count(*) >= 3
    ORDER BY pno;

Grouping: XQuery
    for $pn in distinct-values( doc("sp.xml")//pno )
    let $grp := doc("sp.xml")//sp_tuple[pno = $pn]
    where count($grp) >= 3
    order by $pn
    return <…> {
      <…>{$pn}</…>,
      <…>{avg($grp/price)}</…>
    } </…>

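The grouping pattern can be tried on inline data. In this sketch the wrapper sp_table, the result element part and the sample tuples are hypothetical; sp_tuple, sno, pno and price follow the queries on these slides.

```xquery
(: Hypothetical SP table: part 10 has three suppliers, part 20 only one :)
let $sp :=
  <sp_table>
    <sp_tuple><sno>1</sno><pno>10</pno><price>4</price></sp_tuple>
    <sp_tuple><sno>2</sno><pno>10</pno><price>5</price></sp_tuple>
    <sp_tuple><sno>3</sno><pno>10</pno><price>6</price></sp_tuple>
    <sp_tuple><sno>1</sno><pno>20</pno><price>9</price></sp_tuple>
  </sp_table>
for $pn in distinct-values($sp/sp_tuple/pno)     (: one iteration per group key :)
let $grp := $sp/sp_tuple[pno = $pn]              (: the tuples of the group :)
where count($grp) >= 3                           (: plays the role of HAVING :)
order by $pn
return <part pno="{$pn}" avgprice="{avg($grp/price)}"/>
```

Only part 10 passes the where clause, so the result is a single part element with an average price of 5.
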
Joins
• Example: return a "flat" list of supplier names and their part descriptions, in alphabetic order
    for $sp in doc("sp.xml")//sp_tuple,
        $p in doc("p.xml")//p_tuple[pno = $sp/pno],
        $s in doc("s.xml")//s_tuple[sno = $sp/sno]
    order by $p/descrip, $s/sname
    return <…> {
      $s/sname,
      $p/descrip
    } </…>

XQuery vs. XSLT 1.0
• Could we express XQuery queries with XSLT?
  – in principle yes, always (but it could be tedious)
• Partial XSLT simulation of FLWOR expressions:
  – XQuery: for $x in Expr … rest of query
  – can be expressed in XSLT as:
      <xsl:for-each select="Expr">
        <xsl:variable name="x" select="." />
        … translation of the rest of the query
      </xsl:for-each>

XQuery vs. XSLT 1.0
  – XQuery: let $y := Expr … corresponds directly to:
      <xsl:variable name="y" select="Expr" />
  – and where Condition … rest … to:
      <xsl:if test="Condition">
        … translation of the rest
      </xsl:if>

XQuery vs. XSLT 1.0
  – XQuery: return ElemConstructor can be simulated with a corresponding XSLT template:
    » static fragments as such
    » enclosed expressions in element content, e.g. {$s/sname}, become
        <xsl:copy-of select="$s/sname" />

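XQuery vs. XSLT 1.0: Example
• XQuery:
    for $b in doc("bib.xml")//book
    where $b/publ = "MK" and $b/year > 1998
    return <…>{$b/title}</…>
• XSLT:
    <xsl:for-each select="document('bib.xml')//book">
      <xsl:variable name="b" select="." />
      <xsl:if test="$b/publ = 'MK' and $b/year > 1998">
        <…><xsl:copy-of select="$b/title" /></…>
      </xsl:if>
    </xsl:for-each>
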
XSLT for FLWOR Expressions
• The sketched simulation is not complete:
  – only two things, roughly, can be done with XSLT 1.0 result tree fragments produced by templates:
    » insertion in the result tree, with xsl:copy-of
    » conversion to a string, with xsl:value-of
  – it is not possible to apply other operations to results (like, e.g., sorting in XQuery):
      for $y in ( <…>{$x/code}</…>, ... )
      order by $y/key

Using XQuery for Problem Solving
• XQuery has features which make it a potential tool for experimental problem solving
  – compositionality, flexible combining of expressions
  – easy manipulation of sequences
  – FLWOR expressions for non-deterministic search
  – general comparisons
  – XML/XPath trees and recursion for generic data representation and repetition
(P. Kilpeläinen, Manuscript 2011)

Filtering Integer Sequences
• XQuery yields simple solutions to some arithmetic problems
• Example: "Add numbers below one thousand that are multiples of 3 or 5"
    sum( (1 to 999)[. mod 3 eq 0 or . mod 5 eq 0] )
• Generation of prime numbers was considered earlier

Solving "Tricky Triangles"
• A puzzle of 9 cards with animal figures:
  [Figure: the puzzle cards; ™ Dan Gilbert Art Group]

Modeling the Puzzle
• Matching parts are represented as opposite numbers:
  [Table: columns Dolphin, Head, Front, Rear, Tail; rows Dark blue, Light blue, Big green, Small green, White; surviving cell values: Dark blue 1, 2, -2; Big green NA, 5, -5, NA; Small green 6, -6, NA]
• Cards are represented by empty elements, and the figures on their sides by attributes (in clockwise order)

Representing Puzzle Cards
• Introduce card elements:
    declare variable $cards := ( <card … />, <card … />, … );

Generating Rotations of Cards
• The three turns of a $card are created by a function:
    declare function tr:rotations($card as element(card))
        as element(card) {
      …
    };

Invoking the solution function
• Card rotations are first generated, then passed to a solution function:
    let $cardsRotated := $cards/tr:rotations(.)
    return tr:solutions($cardsRotated)
• The function uses 9 variables for card slots:
  [Figure: layout of the card slots $c1 to $c9]

Solution function for the puzzle
• Possible turns are selected from the remaining cards, in a careful order of slots:
    declare function tr:solutions($cards as element(card)+)
        as element(solution)* {
      (: Start from the top card, $c1: :)
      for $c1 in $cards/turn
      let $cards := $cards except $c1/parent::card
      (: Try matching turns for the mid-card of the 2nd row: :)
      for $c3 in … = …
      let $cards := $cards except $c3/parent::card
      …
• NB: let clauses for simulating assignment statements

Solution function (cont)
• Cards are selected in a similar manner, until …
      (: … VARIABLES UP TO THE LAST TWO EXCLUDED … :)
      for $c8 in = and
                 =
      let $cards := $cards except $c8/parent::card
      for $c9 in =
      return <solution>{ $c1, $c2, $c3, $c4, $c5, $c6, $c7, $c8, $c9 }</solution>
    };
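The select-from-remaining-cards pattern can be shown on a deliberately simplified layout. The Python sketch below is for illustration only and is not the triangle puzzle itself: it assumes cards are 3-tuples of side codes laid in a row, where side 2 of one card touches side 0 of the next. The names `rotations` and `chains` are hypothetical.

```python
from typing import Iterator, List, Tuple

Card = Tuple[int, int, int]  # hypothetical encoding: three side codes


def rotations(card: Card) -> List[Card]:
    """The three cyclic shifts (turns) of a card."""
    return [card[i:] + card[:i] for i in range(3)]


def chains(cards: List[Card],
           placed: Tuple[Card, ...] = ()) -> Iterator[Tuple[Card, ...]]:
    """Yield every row of all cards in which touching sides are opposite."""
    if not cards:
        yield placed
        return
    for i, card in enumerate(cards):
        # Counterpart of: let $cards := $cards except $c/parent::card
        remaining = cards[:i] + cards[i + 1:]
        for turn in rotations(card):
            # Side 0 of the new turn must be opposite to side 2 of the previous one.
            if not placed or placed[-1][2] + turn[0] == 0:
                yield from chains(remaining, placed + (turn,))


for chain in chains([(1, 9, 2), (-2, 8, 3)]):
    print(chain)
```

As in the XQuery function, a slot is filled only with turns that match the neighbors already placed, so most branches are cut early instead of being enumerated.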
Solutions
• The program finds two unique solutions:
Solutions (cont)
• Corresponding arrangements:
• Well, actually more …
Avoiding multiple solutions
• Each solution is reported three times, once for each rotation of the entire puzzle
• There is no obvious way to avoid the duplicates
• It would be easy if the puzzle had a slot at the center of symmetry (the card in that slot could be restricted to a single turn)
Efficiency
• Solutions computed in 0.8 sec and 30 MB (with Saxon-HE)
• A careful order of card selections is relevant for restricting the number of combinations
• The number of brute-force combinations is high:
  – 9 x 3 = 27 for the 1st card, 27 x (8 x 3) = 648 for the first two, etc.
• Number of ways of choosing the cards up to the ith:
    Strategy     1st  2nd  3rd     4th      5th     6th     7th     8th     9th
    Brute-force  27   648  13,608  244,944  4x10^6  4x10^7  4x10^8  2x10^9  7x10^9
    Careful
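The brute-force figures follow from multiplying, slot by slot, the number of remaining cards by the three possible turns. A short Python sketch of that arithmetic, for illustration only (the function name `brute_force_counts` is hypothetical):

```python
from typing import List


def brute_force_counts(cards: int = 9, turns: int = 3) -> List[int]:
    """Cumulative number of unconstrained ways to fill the first 1..cards slots."""
    counts, total = [], 1
    for i in range(cards):
        total *= (cards - i) * turns  # remaining cards x turns for slot i + 1
        counts.append(total)
    return counts


print(brute_force_counts())
# [27, 648, 13608, 244944, 3674160, 44089920, 396809280, 2380855680, 7142567040]
```

The last value, about 7x10^9, is the size of the search space that the careful slot order avoids enumerating.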
Solving Sudoku Puzzles
• Discussion of an XQuery Sudoku solver
• Example: Inkala’s “AI Escargot” Sudoku and its input representation:
• Internal representation of cells:
• Start by loading the cells of a board:
    let $cells := sudo:preprocess( doc($SudoDoc)//row )
Loading Sudoku Boards
    declare namespace sudo="
    declare variable $SudoDoc external;
    declare function sudo:preprocess( $rows as element(row)+ ) as element(cell)+ {
      (: Return cells with numbers for row, col and box :)
      for $row at $rowNum in $rows
      let $colContents := tokenize(string($row), ",\s*")
      for $colCont at $colNum in $colContents
      return <cell rowNum="{$rowNum}" colNum="{$colNum}"
                   val="{xs:integer($colCont)}" box="{sudo:boxNum($rowNum, $colNum)}" />
    };
• (Generation of box numbers: see next)
Generating box numbers for Sudoku cells
    declare function sudo:boxNum( $rowNum as xs:integer, $colNum as xs:integer )
        as xs:integer { (: box number in 1,2,...,9 :)
      (($rowNum - 1) idiv 3)*3 + ($colNum - 1) idiv 3 + 1
    };
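The loading step and the box numbering can be mirrored in a few lines of Python, shown here for illustration only. The sketch assumes that each input row is a comma-separated string with 0 for an empty cell, and that the 3x3 boxes are numbered row by row from 1, as the comment "box number in 1,2,...,9" suggests. The dictionary keys follow the attribute names of the cell elements; the function names are hypothetical.

```python
import re
from typing import Dict, List


def box_num(row_num: int, col_num: int) -> int:
    """Box number in 1..9 for 1-based row and column numbers."""
    return ((row_num - 1) // 3) * 3 + (col_num - 1) // 3 + 1


def preprocess(rows: List[str]) -> List[Dict[str, int]]:
    """Turn comma-separated rows into cell records with row, column and box numbers."""
    cells = []
    for row_num, row in enumerate(rows, start=1):
        for col_num, text in enumerate(re.split(r",\s*", row.strip()), start=1):
            cells.append({"rowNum": row_num, "colNum": col_num,
                          "val": int(text), "box": box_num(row_num, col_num)})
    return cells


# Print the box number of every cell as a 9 x 9 grid.
for r in range(1, 10):
    print(" ".join(str(box_num(r, c)) for c in range(1, 10)))
```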
Invoking the solution function
• After preprocessing, pass free cells and fixed cells to a solution function, and display its solutions:
    let $freeCells := = 0],
        $fixedCells := $cells except $freeCells
    for $solution in sudo:solution($freeCells, $fixedCells)
    return sudo:displayCells($solution/*)
• Function solution() returns board elements, whose cells are displayed by displayCells():
The Sudoku solution function
    declare function sudo:solution( $freeCells as element(cell)*,
                                    $fixedCells as element(cell)+ ) as element(board)* {
      if ( empty($freeCells) ) then (: the board is complete :)
        <board>{ $fixedCells }</board>
      else
        let $cell := $freeCells[1] (: pick any unfilled cell :)
        let $thisRow := eq
            $thisCol := eq
            $thisBox := eq
        let $forbiddenVals := ($thisRow | $thisCol |
        for $val in (1 to 9)[not(. = $forbiddenVals)]
        let $fixedCells2 := ( $fixedCells, { ne "val"] } )
        return sudo:solution($freeCells except $cell, $fixedCells2)
    };
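The recursion is plain backtracking: take a free cell, try every value not already used in its row, column, or box, and recurse on the rest. A minimal Python sketch of the same idea, for illustration only and not a transcription of the XQuery code. It assumes cells are keyed by 1-based (row, column) pairs; the names `solutions` and `box_num` and the sample puzzle are assumptions.

```python
from typing import Dict, Iterator, List, Tuple

Pos = Tuple[int, int]  # (row, column), both 1-based


def box_num(r: int, c: int) -> int:
    return ((r - 1) // 3) * 3 + (c - 1) // 3 + 1


def solutions(free: List[Pos], fixed: Dict[Pos, int]) -> Iterator[Dict[Pos, int]]:
    """Yield every completed board reachable from the fixed cells."""
    if not free:  # the board is complete
        yield fixed
        return
    (r, c), rest = free[0], free[1:]  # pick any unfilled cell
    forbidden = {v for (fr, fc), v in fixed.items()
                 if fr == r or fc == c or box_num(fr, fc) == box_num(r, c)}
    for val in range(1, 10):
        if val not in forbidden:
            yield from solutions(rest, {**fixed, (r, c): val})


puzzle = [  # a widely reproduced example puzzle; 0 marks a free cell
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
fixed = {(r, c): v for r, row in enumerate(puzzle, 1)
         for c, v in enumerate(row, 1) if v}
free = [(r, c) for r in range(1, 10) for c in range(1, 10) if (r, c) not in fixed]
board = next(solutions(free, fixed))
for r in range(1, 10):
    print(" ".join(str(board[(r, c)]) for c in range(1, 10)))
```

Like the XQuery version, the function yields all solutions rather than stopping at the first; `next()` simply takes the first one.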
Displaying a solved board
    declare variable $line-btw-boxes := " ";
    declare function sudo:displayCells( $cells as element(cell)+ ) as xs:string+ {
      for $rowNum in (1 to 9)
      return ( = $rowNum]),
               if ($rowNum eq 3 or $rowNum eq 6) then $line-btw-boxes
               else () )
    };
    declare function sudo:displayRow( $cells as element(cell)+) as xs:string+ {
      for $colNum in (1 to 9)
      return = (: + bar btw boxes: :)
               if ($colNum eq 3 or $colNum eq 6) then "|" else () ), " "
    };
Heuristic Optimization
• It is intuitively useful to fill the most constrained cells first
• For this, compute the number of constraints for a $cell:
    declare function sudo:numOfConstraints(
        $cell as element(cell), $fixedCells as element(cell)+ ) as xs:integer {
      let $neighbors := eq or
                        eq eq
      return count( )
    };
• Use this function to choose maximally constrained free cells:
Heuristic Optimization (2)
    declare function sudo:mostConstrainedFreeCells(
        $freeCells as element(cell)+, $fixedCells as element(cell)+ )
        as element(cell)* { (: Maximally constrained free cells: :)
      let $maxNumOfConstraints := max( for $cell in $freeCells
                                       return sudo:numOfConstraints($cell, $fixedCells) )
      for $cell in $freeCells
      where sudo:numOfConstraints($cell, $fixedCells) eq $maxNumOfConstraints
      return $cell
    };
• Use in function sudo:solution():
    let $cell := (: pick a maximally constrained cell :)
      sudo:mostConstrainedFreeCells($freeCells, $fixedCells)[1]
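The heuristic amounts to ranking free cells by how many values their fixed neighbors already rule out. A rough Python sketch, for illustration only. It assumes that the number of constraints is the count of distinct values among fixed cells in the same row, column, or box (the argument of count() is not recoverable from the slide), and that cells are keyed by 1-based (row, column) pairs; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]  # (row, column), both 1-based


def box_num(r: int, c: int) -> int:
    return ((r - 1) // 3) * 3 + (c - 1) // 3 + 1


def num_of_constraints(cell: Pos, fixed: Dict[Pos, int]) -> int:
    """Assumed measure: distinct values among fixed cells sharing row, column or box."""
    r, c = cell
    return len({v for (fr, fc), v in fixed.items()
                if fr == r or fc == c or box_num(fr, fc) == box_num(r, c)})


def most_constrained_free_cells(free: List[Pos], fixed: Dict[Pos, int]) -> List[Pos]:
    """All free cells whose constraint count is maximal."""
    best = max(num_of_constraints(cell, fixed) for cell in free)
    return [cell for cell in free if num_of_constraints(cell, fixed) == best]


fixed = {(1, 1): 1, (1, 2): 2, (5, 5): 3}
free = [(1, 3), (9, 9), (5, 3)]
print(most_constrained_free_cells(free, fixed))  # [(1, 3)]
```

Taking the first element of the returned list corresponds to the `[1]` selection used inside the solution function.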
XQuery Sudoku Efficiency
• Efficiency and the effect of the optimization vary by puzzle instance:
                  unoptimized         with heuristic
    Puzzle        Time     Memory     Time    Memory
    Easy          s        36 MB      +20%    +33%
    Hard          s        146 MB     -45%    -44%
    AI Escargot   4.0 s    270 MB     +20%    +25%
    Fiendish 2    6.9 s    360 MB     -40%    -5%
    Inkala ’10    7.1 s    360 MB     -40%    -3%
    Minimal       s        400 MB     -40%    -5%
    Double        s        410 MB     -85%    -15%
XQuery: Summary
– A recent W3C XML query language, also capable of general XML processing
– Vendor support??
  » mentions 50+ prototypes or products (2004: ~30, 2005: ~40; free, commercial, ...; Oracle, IBM DB2, MS SQL Server; native XML databases, ...)
– Future??
  » Promising confluence of document and database research
  » High potential for XML-based data integration