Processing of structured documents, Part 6: XML Query language


1 Processing of structured documents Part 6

2 XML Query language
- W3C 20.12.2001: working drafts
  - A data model (XQuery 1.0 and XPath 2.0)
  - XQuery 1.0 Formal Semantics (June 2001)
  - XQuery 1.0: A Query Language for XML
- influenced by the work of many research groups and query languages
- goal: a query language that is broadly applicable across all types of XML data sources

3 Usage scenarios
- Human-readable documents
  - perform queries on structured documents and collections of documents, such as technical manuals,
    - to retrieve individual documents,
    - to generate tables of contents,
    - to search for information in structures found within a document, or
    - to generate new documents as the result of a query

4 Usage scenarios
- Data-oriented documents
  - perform queries on the XML representation of database data, object data, or other traditional data sources
    - to extract data from these sources
    - to transform data into new XML representations
    - to integrate data from multiple heterogeneous data sources
  - the XML representation of data sources may be either physical or virtual:
    - data may be physically encoded in XML, or an XML representation of the data may be produced

5 Usage scenarios
- Mixed-model documents
  - perform both document-oriented and data-oriented queries on documents with embedded data, such as catalogs, patient health records, employment records
- Administrative data
  - perform queries on configuration files, user profiles, or administrative logs represented in XML
- Native XML repositories (databases)

6 Usage scenarios
- Filtering streams
  - perform queries on streams of XML data to process the data (logs of email messages, network packets, stock market data, newswire feeds, EDI)
    - to filter and route messages represented in XML
    - to extract data from XML streams
    - to transform data in XML streams
- DOM
  - perform queries on DOM structures to return sets of nodes that meet the specified criteria

7 Usage scenarios
- Multiple syntactic environments
  - queries may be used in many environments
  - a query might be embedded in a URL, an XML page, or a JSP or ASP page
  - it might be represented by a string in a program written in a general-purpose programming language
  - it might be provided as an argument on the command line or on standard input

8 Requirements
- Query language syntax
  - the XML Query Language may have more than one syntax binding
  - one query language syntax must be convenient for humans to read and write
  - one query language syntax must be expressed in XML in a way that reflects the underlying structure of the query
- Declarativity
  - the language must be declarative
  - it must not enforce a particular evaluation strategy

9 Requirements
- Reliance on XML Information Set
  - the XML Query data model relies on information provided by XML Processors and Schema Processors
  - it must ensure that it does not require information that is not made available by such processors
- Datatypes
  - the data model must represent both XML 1.0 character data and the simple and complex types of the XML Schema specification
- Schema availability
  - queries must be possible whether or not a schema is available

10 Requirements: functionality
- Support operations (selection, projection, aggregation, sorting, etc.) on all data types:
  - Choose a part of the data based on content or structure
  - Also operations on hierarchy and sequence of document structures
- Structural preservation and transformation:
  - Preserve the relative hierarchy and sequence of input document structures in the query results
  - Transform XML structures and create new XML structures
- Combination and joining:
  - Combine related information from different parts of a given document or from multiple documents

11 Requirements: functionality
- References:
  - Queries must be able to traverse intra- and inter-document references
- Closure property:
  - The result of an XML document query is also an XML document (well-formed, though usually not valid)
  - The results of a query can be used as input to another query
- Extensibility:
  - The query language should support the use of externally defined functions on all datatypes of the data model

12 XQuery
- Design goals:
  - a small, easily implementable language
  - queries are concise and easily understood
  - flexible enough to query a broad spectrum of XML information sources (incl. both databases and documents)
  - a human-readable query syntax
- features borrowed from many languages
  - Quilt, XPath, XQL, XML-QL, SQL, OQL, Lorel, ...

13 XQuery vs. other XML activities
- the data model is the XQuery 1.0 and XPath 2.0 Data Model
- the semantics of XQuery are defined in the XQuery Formal Semantics
- the type system is based on the type system of XML Schema
- path expressions (for navigating in hierarchic documents) = path expressions of XPath 2.0
- the XML-based syntax is described in XQueryX

14 XQuery
- A query is represented as an expression
- several kinds of expressions -> several forms
- expressions can be nested with full generality
- the input and output of a query are instances of a data model (XQuery 1.0 and XPath 2.0 Data Model)
  - a fragment of a document or a collection of documents may lack a common root and may be modeled as an ordered forest of nodes

15 An instance of the Data Model - an ordered forest

16 XQuery expressions
- path expressions
- element constructors
- FLWR ("flower"; for-let-where-return) expressions
- expressions involving operators and functions
- conditional expressions
- quantified expressions
- expressions that test or modify datatypes

17 Path expressions
- the result of a path expression is an ordered list of nodes (document order)
  - each node includes its descendant nodes -> the result is an ordered forest
- the top-level nodes in the result are ordered according to their position in the original hierarchy (in top-down, left-right order)
- no duplicate nodes
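The two properties above (document order, no duplicates) can be sketched in Python with the standard library's ElementTree. This is only an emulation of the behaviour described on the slide, not an XQuery implementation; the sample document and the helper name `in_document_order` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this sketch.
doc = ET.fromstring(
    "<library><book><title>A</title></book>"
    "<journal><title>B</title></journal>"
    "<book><title>C</title></book></library>"
)

# Document order = position in a top-down, left-to-right traversal.
position = {node: i for i, node in enumerate(doc.iter())}

def in_document_order(nodes):
    """Drop duplicate nodes (by identity) and sort the rest by document order."""
    unique = {id(n): n for n in nodes}.values()
    return sorted(unique, key=lambda n: position[n])

# Two overlapping selections, concatenated so that the raw list
# contains duplicates and is not in document order.
selected = doc.findall("./book/title") + doc.findall(".//title")
titles = [t.text for t in in_document_order(selected)]
print(titles)
```

The raw selection holds five nodes (A, C, A, B, C); after the clean-up each title node appears once, in the order it occurs in the document.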

18 Element constructors
- An element constructor creates an XML element
- consists of a start tag and an end tag, enclosing an optional list of expressions that provide the content of the element
  - the start tag may also specify the values of one or more attributes
- typical use:
  - nested inside another expression that binds variables that are used in the element constructor

19 Example
- Generate an <emp> element containing an "empid" attribute and nested <name> and <job> elements. The values of the attribute and nested elements are specified elsewhere.

    <emp empid = {$id}>
      <name> {$n} </name> ,
      <job> {$j} </job>
    </emp>

20 Element constructors
- In an element constructor, curly braces {} delimit enclosed expressions, distinguishing them from literal text
- enclosed expressions are evaluated and replaced by their value, whereas material outside curly braces is simply treated as literal text
- an enclosed expression may evaluate to any sequence of nodes and/or simple values
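As a rough analogy, an element constructor behaves like a template whose enclosed expressions are filled in at evaluation time. The Python sketch below builds such an element with ElementTree; the element names (`emp`, `name`, `job`), the function name `make_emp`, and the sample values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def make_emp(emp_id, name, job):
    """Build an element whose attribute and children come from variables,
    playing the role of the enclosed expressions {$id}, {$n}, {$j}."""
    emp = ET.Element("emp", {"empid": str(emp_id)})  # literal tag, computed attribute
    ET.SubElement(emp, "name").text = name            # computed content
    ET.SubElement(emp, "job").text = job              # computed content
    return emp

xml_text = ET.tostring(
    make_emp(12345, "John Smith", "Anthropologist"), encoding="unicode"
)
print(xml_text)
```

The tag names are fixed in the code, just as literal text outside curly braces is fixed in the constructor, while the attribute value and element content are computed.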

21 Computed element constructors
- Generate an element with a computed name, containing nested elements named <description> and <price>

    element {$tagname} {
      <description> {$d} </description> ,
      <price> {$p} </price>
    }

22 FLWR expressions
- Constructed from for, let, where, and return clauses
- similar to SQL select-from-where
- clauses must appear in a specific order
  - 1. for/let, 2. where, 3. return
- a FLWR expression binds values to one or more variables and then uses these variables to construct a result (in general, an ordered forest of nodes)

23 A flow of data in a FLWR expression

24 for clauses
- A for clause introduces one or more variables, associating each variable with an expression that returns a list of nodes (e.g. a path expression)
- the result of a for clause is a list of tuples, each of which contains a binding for each of the variables
- each variable in a for clause can be thought of as iterating over the nodes returned by its respective expression

25 let clauses
- A let clause is also used to bind one or more variables to one or more expressions
- a let clause binds each variable to the value of its respective expression without iteration
- results in a single binding for each variable
- Compare:
  - for $x in /library/book -> many bindings (books)
  - let $x := /library/book -> single binding (a list of books)

26 for/let clauses
- A FLWR expression may contain several for and let clauses
  - each of these clauses may contain references to variables bound in previous clauses
- the result of the for/let sequence:
  - an ordered list of tuples of bound variables
- the number of tuples generated by the for/let sequence:
  - the product of the cardinalities of the node-lists returned by the expressions in the for clauses

27 for/let clauses

    let $s := (<one/>, <two/>, <three/>)
    return <out>{$s}</out>

Result:

    <out> <one/> <two/> <three/> </out>

28 for/let clauses

    for $s in (<one/>, <two/>, <three/>)
    return <out>{$s}</out>

Result:

    <out> <one/> </out>
    <out> <two/> </out>
    <out> <three/> </out>

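29 for/let clauses

    for $i in (1, 2), $j in (3, 4)
    return <tuple> <i>{ $i }</i> <j>{ $j }</j> </tuple>

Result:

    <tuple> <i>1</i> <j>3</j> </tuple>
    <tuple> <i>1</i> <j>4</j> </tuple>
    <tuple> <i>2</i> <j>3</j> </tuple>
    <tuple> <i>2</i> <j>4</j> </tuple>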
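The difference between for and let bindings, and the product rule for the number of tuples, can be mimicked in a few lines of Python. This is a sketch of the tuple-stream idea only; representing a binding tuple as a dictionary and the sample list `books` are assumptions of the sketch.

```python
from itertools import product

books = ["b1", "b2", "b3"]  # stands in for the nodes of /library/book

# for $x in /library/book  -> one binding tuple per book
for_tuples = [{"x": b} for b in books]

# let $x := /library/book  -> a single tuple binding $x to the whole list
let_tuples = [{"x": books}]

# for $i in (1, 2), $j in (3, 4) -> the product of both lists, in order
pairs = [{"i": i, "j": j} for i, j in product((1, 2), (3, 4))]

print(len(for_tuples), len(let_tuples), len(pairs))
print([(t["i"], t["j"]) for t in pairs])
```

Two for clauses over lists of two items each yield 2 x 2 = 4 tuples, in the same (1,3), (1,4), (2,3), (2,4) order as the slide's result.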

30 where clause
- Each of the binding tuples generated by the for and let clauses can be filtered by an optional where clause
- only those tuples for which the condition in the where clause is true are used to invoke the return clause
- the where clause may contain several predicates connected by and, or, and not
  - predicates usually contain references to the bound variables

31 where clause
- Variables bound by a for clause represent a single node
  - -> scalar predicates, e.g. $p/color = "Red"
- Variables bound by a let clause may represent lists of nodes
  - -> list-oriented predicates, e.g. avg($p/price) > 100
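A small Python sketch may make the scalar versus list-oriented distinction concrete. The data (`parts`, `groups`) and their field names are invented for illustration; only the two predicate shapes come from the slide.

```python
from statistics import mean

# Hypothetical data standing in for XML nodes.
parts = [
    {"color": "Red", "price": 120.0},
    {"color": "Blue", "price": 90.0},
    {"color": "Red", "price": 60.0},
]

# for-bound variable: each $p is a single node, so the predicate is scalar
# (compare $p/color = "Red").
red_prices = [p["price"] for p in parts if p["color"] == "Red"]

# let-bound variable: $p is a whole list, so the predicate aggregates over it
# (compare avg($p/price) > 100).
groups = {
    "bolts": [{"price": 120.0}, {"price": 150.0}],
    "nuts": [{"price": 90.0}, {"price": 60.0}],
}
expensive = [
    name for name, ps in groups.items()
    if mean(p["price"] for p in ps) > 100
]

print(red_prices, expensive)
```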

32 return clause
- The return clause generates the output of the FLWR expression
  - a node, an ordered forest of nodes, or a primitive value
- is executed on each tuple
- contains an expression that often contains element constructors, references to bound variables, and nested subexpressions

33 Examples
- Assume: a document named "bib.xml"
- contains a list of <book> elements
- each <book> contains a <title> element, one or more <author> elements, a <publisher> element, a <year> element, and a <price> element

34 List the titles of books published by Addison Wesley after 1998

    {
      for $b in document("bib.xml")//book
      where $b/publisher = "Addison Wesley"
        and $b/year > "1998"
      return {$b/title}
    }
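For readers who want to try the logic of this query without an XQuery processor, here is a minimal Python emulation with ElementTree. The contents of `BIB` are invented sample data that follow the structure the slides assume for bib.xml; unlike the slide, the sketch compares the year as a number rather than as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the assumed bib.xml structure.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title><author>Stevens</author>
        <publisher>Addison Wesley</publisher><year>1999</year><price>65.95</price></book>
  <book><title>Advanced Programming in the Unix environment</title><author>Stevens</author>
        <publisher>Addison Wesley</publisher><year>2000</year><price>65.95</price></book>
  <book><title>Data on the Web</title><author>Abiteboul</author>
        <publisher>Morgan Kaufmann</publisher><year>2000</year><price>39.95</price></book>
  <book><title>An Older Book</title><author>Nobody</author>
        <publisher>Addison Wesley</publisher><year>1994</year><price>20.00</price></book>
</bib>"""

bib = ET.fromstring(BIB)

# for $b in //book where ... return $b/title
titles = [
    b.findtext("title")
    for b in bib.iter("book")
    if b.findtext("publisher") == "Addison Wesley"
    and int(b.findtext("year")) > 1998
]
print(titles)
```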

35 Result could be...

    TCP/IP Illustrated
    Advanced Programming in the Unix environment

36 List each publisher and the average price of its books

    for $p in distinct-values(document("bib.xml")//publisher)
    let $a := avg(document("bib.xml")//book[publisher = $p]/price)
    return
      {
        {$p/text()},
        {$a}
      }
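The same for/let pattern (iterate over distinct publishers, bind the matching books, aggregate) can be sketched in Python. The sample records in `books` are assumptions; the shape of the computation is what matters.

```python
from statistics import mean

# Hypothetical (publisher, price) pairs extracted from bib.xml.
books = [
    ("Addison Wesley", 60.0),
    ("Morgan Kaufmann", 40.0),
    ("Addison Wesley", 70.0),
]

# distinct-values(//publisher): keep first occurrences, preserving order
publishers = list(dict.fromkeys(p for p, _ in books))

# let $a := avg(//book[publisher = $p]/price), evaluated once per publisher
averages = {
    p: mean(price for pub, price in books if pub == p)
    for p in publishers
}
print(averages)
```

Each publisher appears once in the result, paired with the mean of its book prices.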

37 List the publishers who have published more than 100 books

    {
      for $p in distinct-values(document("bib.xml")//publisher)
      let $b := document("bib.xml")//book[publisher = $p]
      where count($b) > 100
      return $p
    }

38 Invert the structure of the input document so that each distinct author element contains a list of book-titles

    {
      let $input := document("bib.xml")
      for $a in distinct-values($input//author)
      return
        {
          { $a/text() },
          {
            for $b in $input//book
            where $b/author = $a
            return $b/title
          }
        }
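Inverting a book-to-authors hierarchy into an author-to-titles hierarchy is a regrouping step. A Python sketch of the idea, with invented sample data:

```python
# Hypothetical (title, authors) pairs in document order.
books = [
    ("TCP/IP Illustrated", ["Stevens"]),
    ("Data on the Web", ["Abiteboul", "Buneman", "Suciu"]),
    ("Advanced Programming in the Unix environment", ["Stevens"]),
]

# For each distinct author, collect the titles of the books they wrote.
titles_by_author = {}
for title, authors in books:
    for author in authors:
        titles_by_author.setdefault(author, []).append(title)

for author, titles in titles_by_author.items():
    print(author, "->", titles)
```

The dictionary plays the role of the outer loop over distinct authors, and each value list corresponds to the nested query that returns the titles for one author.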

39 Make an alphabetic list of publishers; within each publisher, make a list of books (title & price), in descending order by price

    for $p in distinct-values(document("bib.xml")//publisher)
    return
      {$p/text()}
      {
        for $b in document("bib.xml")//book[publisher = $p]
        return
          {$b/title}
          {$b/price}
        sortby(price descending)
      }
    sortby(name)

40 Operators in expressions
- Expressions can be constructed using infix and prefix operators; nested expressions inside parentheses can serve as operands
- arithmetic and logical operators; collection operators (union, intersect, except)

41 Queries on sequence
- XQuery uses the precedes and follows operators to express conditions based on sequence
- the following example involves a surgical report that contains procedure, incision and anesthesia elements
- the query returns a critical sequence that contains all elements and nodes found between the 1st and 2nd incisions of the 1st procedure

42 Queries on sequence

    {
      let $proc := //procedure[1]
      for $n in $proc//node()
      where $n follows ($proc//incision)[1]
        and $n precedes ($proc//incision)[2]
      return $n
    }
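The follows/precedes test is a comparison of positions in document order. The Python sketch below shows the idea on an invented surgical report; note that ElementTree does not expose text nodes as separate nodes, so the sketch only considers elements, and the element names other than `procedure` and `incision` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical report used only for this sketch.
report = ET.fromstring(
    "<report><procedure>"
    "<prep/><incision/><action>clamp</action><observation/><incision/><closure/>"
    "</procedure></report>"
)

proc = report.find(".//procedure")   # //procedure[1]
nodes = list(proc.iter())[1:]        # descendants of $proc in document order
order = {n: i for i, n in enumerate(nodes)}

incisions = [n for n in nodes if n.tag == "incision"]
first, second = order[incisions[0]], order[incisions[1]]

# $n follows the 1st incision and precedes the 2nd incision
between = [n.tag for n in nodes if first < order[n] < second]
print(between)
```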

43 Conditional expressions
- if-then-else
- conditional expressions can be nested and used wherever a value is expected
- assume: a library has many holdings (element with a "type" attribute that identifies its type, e.g. book or journal). All holdings have a title and other nested elements that depend on the type of holding

44 Make a list of holdings, ordered by title. For journals, include the editor, and for all others, include the author

    for $h in //holding
    return
      {$h/title},
      {if ($h/@type = "Journal")
       then $h/editor
       else $h/author}
    sortby (title)

45 Quantifiers
- It may be necessary to test for existence of some element that satisfies a condition, or to determine whether all elements in some collection satisfy a condition
- -> existential and universal quantifiers

46 Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph

    for $b in //book
    where some $p in $b//para satisfies
      contains($p/text(), "sailing") and contains($p/text(), "windsurfing")
    return $b/title

47 Find titles of books in which sailing is mentioned in every paragraph

    for $b in //book
    where every $p in $b//para satisfies
      contains($p/text(), "sailing")
    return $b/title
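The some/every quantifiers correspond closely to Python's built-in `any` and `all`. The sketch below applies both to invented book data (titles mapped to paragraph texts).

```python
# Hypothetical books: title -> list of paragraph texts.
books = {
    "Water Sports": ["sailing and windsurfing are popular", "sailing needs wind"],
    "Sailing Basics": ["sailing is fun", "windsurfing is different"],
    "Knots": ["sailing knots", "more sailing knots"],
}

# some $p in $b//para satisfies (contains sailing and contains windsurfing)
same_paragraph = [
    title for title, paras in books.items()
    if any("sailing" in p and "windsurfing" in p for p in paras)
]

# every $p in $b//para satisfies contains sailing
every_paragraph = [
    title for title, paras in books.items()
    if all("sailing" in p for p in paras)
]

print(same_paragraph)
print(every_paragraph)
```

"Sailing Basics" mentions both words but never in the same paragraph, so it fails the existential test; it also fails the universal test because one paragraph lacks "sailing". As with Python's `all`, a universal quantifier over an empty list of paragraphs is true.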

48 Filtering
- Function filter (in XQuery core function library)
- one parameter
  - expression that evaluates to an ordered forest of nodes
- filter returns copies of some of the nodes in the original document
  - order and hierarchy are preserved
- nodes that are copied:
  - nodes that are present at any level in the original document and are also top-level nodes of the forest returned by the parameter

49 Action of filter on a hierarchy
- filter(C//(A | B))

50 Prepare a table of contents for the document "cookbook.xml", containing nested sections and their titles

    let $b := document("cookbook.xml")
    return
      {
        filter($b//(section | section/title | section/title/text()))
      }
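One way to picture what filter does: walk the document, copy only the selected nodes, and attach each copy to the copy of its nearest selected ancestor, so that order and relative hierarchy survive. The Python sketch below follows that reading of the slides; `filter_tree`, its parameters, and the sample cookbook are assumptions, not the draft's definition of the function.

```python
import xml.etree.ElementTree as ET

def filter_tree(root, keep, keep_text=lambda n: False):
    """Copy the nodes for which keep(node) is true. Each copy is attached
    to the copy of its nearest kept ancestor; copies without a kept
    ancestor become top-level trees of the returned forest."""
    forest = []

    def visit(node, parent_copy):
        if keep(node):
            copy = ET.Element(node.tag, dict(node.attrib))
            if keep_text(node):
                copy.text = node.text
            if parent_copy is None:
                forest.append(copy)
            else:
                parent_copy.append(copy)
            parent_copy = copy
        for child in node:
            visit(child, parent_copy)

    visit(root, None)
    return forest

# Hypothetical cookbook used only for this sketch.
book = ET.fromstring(
    "<cookbook><title>My Cookbook</title>"
    "<section><title>Soups</title><p>intro</p>"
    "<section><title>Cold soups</title><recipe>gazpacho</recipe></section>"
    "</section>"
    "<section><title>Desserts</title></section></cookbook>"
)
parent = {child: p for p in book.iter() for child in p}

def is_section_title(n):
    return n.tag == "title" and n in parent and parent[n].tag == "section"

# section | section/title | section/title/text()
toc = filter_tree(
    book,
    keep=lambda n: n.tag == "section" or is_section_title(n),
    keep_text=is_section_title,
)
print("".join(ET.tostring(s, encoding="unicode") for s in toc))
```

The cookbook's own title, the paragraph, and the recipe are dropped, while the nested section keeps its place inside its parent section.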

51 Other built-in functions
- A core library of built-in functions
- document: returns the root node of a named document
- all functions of the XPath core function library
- all the aggregation functions of SQL
  - avg, sum, count, max, min...
- distinct-values: eliminates duplicates from a list
- empty: returns true if and only if its argument is an empty list

52 User-defined functions
- Users are allowed to define their own functions
- each function definition must
  - declare the datatypes of its parameters and result
  - provide an expression that defines how the result of the function is computed from its parameters
- when a function is invoked, its arguments must be valid instances of the declared parameter types
- the result must also be a valid instance of its declared type

53 Functions
- Example: assume a purchase order bound to variable $po1 …

    define function timezone(element of type po:USAddress $a)
      returns integer
    {...}

- call: timezone($po1/shipTo) - timezone($po1/billTo)
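The slide leaves the function body elided, so the following Python sketch is purely illustrative of a typed function declaration and call. The `USAddress` fields, the `UTC_OFFSET` table, and the sample addresses are all invented; only the function name and the shape of the call come from the slide.

```python
from dataclasses import dataclass

@dataclass
class USAddress:
    """Hypothetical stand-in for an element of type po:USAddress."""
    city: str
    state: str

# Invented lookup table: UTC offset in hours (standard time) per state.
UTC_OFFSET = {"NY": -5, "IL": -6, "CO": -7, "CA": -8}

def timezone(a: USAddress) -> int:
    """Declared parameter and result types, checked at call time."""
    if not isinstance(a, USAddress):
        raise TypeError("timezone() expects a USAddress")
    return UTC_OFFSET[a.state]

ship_to = USAddress("Mill Valley", "CA")
bill_to = USAddress("New York", "NY")

# timezone($po1/shipTo) - timezone($po1/billTo)
difference = timezone(ship_to) - timezone(bill_to)
print(difference)
```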

54 Querying relational data
- A lot of data is stored in relational databases
- an XML query language should be able to access this data
- Example: suppliers and parts
  - Table S: supplier numbers (sno) and names (sname)
  - Table P: part numbers (pno) and descriptions (descrip)
  - Table SP: relationships between suppliers and the parts they supply, including the price (price) of each part from each supplier

55 One possible XML representation of relational data

56 SQL vs. XQuery
- SQL:

    SELECT pno
    FROM p
    WHERE descrip LIKE '%Gear%'
    ORDER BY pno;

- XQuery:

    for $p in document("p.xml")//p_tuple
    where contains($p/descrip, "Gear")
    return $p/pno
    sortby(.)

57 Grouping
- Many relational queries involve forming data into groups and applying some aggregation function such as count or avg to each group
- in SQL: GROUP BY and HAVING clauses
- Example: Find the part number and average price for parts that have at least 3 suppliers

58 Grouping: SQL

    SELECT pno, avg(price) AS avgprice
    FROM sp
    GROUP BY pno
    HAVING count(*) >= 3
    ORDER BY pno;

59 Grouping: XQuery

    for $pn in distinct-values(document("sp.xml")//pno)
    let $sp := document("sp.xml")//sp_tuple[pno = $pn]
    where count($sp) >= 3
    return
      {$pn}
      <avgprice> {avg($sp/price)} </avgprice>
    sortby(pno)
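The group-then-filter-then-aggregate pattern shared by the SQL and XQuery versions can be checked on a toy table. The rows in `sp` below are invented; the column order (sno, pno, price) follows the SP table described earlier.

```python
from statistics import mean

# Hypothetical SP table rows: (sno, pno, price).
sp = [
    ("s1", "p1", 10.0), ("s2", "p1", 20.0), ("s3", "p1", 30.0),
    ("s1", "p2", 5.0), ("s2", "p2", 7.0),
    ("s1", "p3", 1.0), ("s2", "p3", 2.0), ("s3", "p3", 3.0), ("s4", "p3", 6.0),
]

# GROUP BY pno: collect the prices offered for each part
prices_by_part = {}
for sno, pno, price in sp:
    prices_by_part.setdefault(pno, []).append(price)

# HAVING count(*) >= 3, then avg(price), ORDER BY pno
result = sorted(
    (pno, mean(prices))
    for pno, prices in prices_by_part.items()
    if len(prices) >= 3
)
print(result)
```

Part p2 has only two suppliers and is filtered out, which is what the where clause on count($sp) does in the XQuery version.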

60 Joins
- Joins combine data from multiple sources into a single query result
- Example: Return a "flat" list of supplier names and their part descriptions, in alphabetic order

    for $sp in document("sp.xml")//sp_tuple,
        $p in document("p.xml")//p_tuple[pno = $sp/pno],
        $s in document("s.xml")//s_tuple[sno = $sp/sno]
    return
      {
        $s/sname,
        $p/descrip
      }
    sortby (sname, descrip)

61 Example: "left outer join"
- Return names of all the suppliers in alphabetic order, including those that supply no parts; inside each supplier element, list the descriptions of all the parts it supplies, in alphabetic order

    for $s in document("s.xml")//s_tuple
    return
      {
        $s/sname,
        for $sp in document("sp.xml")//sp_tuple[sno = $s/sno],
            $p in document("p.xml")//p_tuple[pno = $sp/pno]
        return $p/descrip
        sortby(.)
      }
    sortby(sname)
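The difference between the flat join and the "left outer join" can be seen on a small example. In the Python sketch below the three tables are invented sample data; the inner join only produces suppliers that appear in SP, while the nested form starts from every supplier and may yield an empty list of parts.

```python
# Hypothetical tables: S (sno -> sname), P (pno -> descrip), SP (sno, pno).
s = {"s1": "Acme", "s2": "Baker", "s3": "Cole"}
p = {"p1": "Gear", "p2": "Bolt"}
sp = [("s1", "p1"), ("s1", "p2"), ("s2", "p1")]

# Flat (inner) join: one (sname, descrip) pair per SP row, sorted
inner = sorted((s[sno], p[pno]) for sno, pno in sp)

# "Left outer join": iterate over all suppliers by name, nesting a sorted
# list of the parts each one supplies (possibly empty)
outer = {
    sname: sorted(p[pno] for sp_sno, pno in sp if sp_sno == sno)
    for sno, sname in sorted(s.items(), key=lambda item: item[1])
}

print(inner)
print(outer)
```

Supplier "Cole" supplies no parts, so it is absent from the flat join but present in the nested result with an empty list.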

