Query Languages for XML


1 Query Languages for XML
XPath

2 The XPath Data Model Given an XML document, XPath operates on it and produces values that are sequences of items. An item is either: A value of primitive type, e.g., integer or string; A node (defined next).

3 Principal Kinds of Nodes
Documents represent entire XML documents. Local path name or a URL. Elements are pieces of a document consisting of some opening tag, its matching closing tag (if any), and everything in between. Attributes are names that are given values inside opening tags.

4 Document Nodes Formed by doc(filename)
filename can be a local name or a URL. Example: doc(“univ.xml”) or doc(“/mydir/univ.xml”) All XPath queries refer to a doc node, either explicitly or implicitly. Example: key definitions in XML Schema have XPath expressions that refer to the document described by the schema.

5 Running Example <!DOCTYPE UNIV [ <!ELEMENT UNIV (STUDENT*, COURSE*)> <!ELEMENT STUDENT (NAME, MARK+)> <!ATTLIST STUDENT sno ID #REQUIRED> <!ELEMENT NAME (#PCDATA)> <!ELEMENT MARK (#PCDATA)> <!ATTLIST MARK theCNO IDREF #REQUIRED> <!ELEMENT COURSE (#PCDATA)> <!ATTLIST COURSE cno ID #REQUIRED> ]>

6 Example Document
<UNIV> <STUDENT sno = ”007”> <NAME>James Bond</NAME> <MARK theCNO = ”CS123”>80</MARK> <MARK theCNO = ”CS456”>98</MARK> </STUDENT> … <COURSE cno = ”CS123”> DB</COURSE>… </UNIV> Each tagged piece, such as <NAME>James Bond</NAME>, is an element node. Each name-value pair inside an opening tag, such as sno = ”007”, is an attribute node. The document node is all of this, plus the header (<?xml version… ?>).

7 Nodes as Semistructured Data
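[Figure: the example document drawn as a tree. The document node univ.xml has the root element UNIV. UNIV has a STUDENT child (attribute sno = ”007”) and a COURSE child (attribute cno = ”CS123”, text DB). STUDENT has a NAME child (text James Bond) and two MARK children (theCNO = ”CS123”, text 80; theCNO = ”CS456”, text 98).]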

8 Paths in XML Documents XPath is a language for describing paths in XML documents. The result of the described path is a sequence of items.

9 Path Expressions Simple path expressions are sequences of slashes (/) and tags, starting with /. Example: /UNIV/STUDENT/MARK Construct the result by starting with just the doc node and processing each tag in turn from the left.

10 Evaluating a Path Expression
Assume the first tag is the root. Processing the doc node by this tag results in a sequence consisting of only the root element. e.g., /UNIV Suppose we have obtained a sequence of items from processing the previous tags, and the next tag is X. For each item that is an element node, replace the element by all its subelements with tag X.
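The evaluation rule above can be sketched in Python. This is only an illustration, not an XPath engine: the function name eval_simple_path is made up for this sketch, and the sample document is a trimmed copy of the running example with the elided (…) parts left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
UNIV_XML = """<UNIV>
  <STUDENT sno="007">
    <NAME>James Bond</NAME>
    <MARK theCNO="CS123">80</MARK>
    <MARK theCNO="CS456">98</MARK>
  </STUDENT>
  <COURSE cno="CS123">DB</COURSE>
</UNIV>"""


def eval_simple_path(root, path):
    """Evaluate a simple path such as /UNIV/STUDENT/MARK tag by tag."""
    tags = path.strip("/").split("/")
    # Processing the doc node by the first tag yields just the root element.
    items = [root] if root.tag == tags[0] else []
    for tag in tags[1:]:
        # Replace each element by all of its subelements with this tag.
        items = [child for item in items for child in item if child.tag == tag]
    return items


root = ET.fromstring(UNIV_XML)
marks = eval_simple_path(root, "/UNIV/STUDENT/MARK")
print([m.text for m in marks])
```

Each pass through the loop corresponds to one tag of the path, so the intermediate values of items are the sequences for /UNIV, /UNIV/STUDENT, and /UNIV/STUDENT/MARK.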

11 Example: /UNIV One item, the UNIV element <UNIV> <STUDENT sno = ”007”> <NAME>James Bond</NAME> <MARK theCNO = ”CS123”>80</MARK> <MARK theCNO = ”CS456”>98</MARK> </STUDENT> … <COURSE cno = ”CS123”> DB</COURSE>… </UNIV>

12 Example: /UNIV/STUDENT
<UNIV> <STUDENT sno = ”007”> <NAME>James Bond</NAME> <MARK theCNO = ”CS123”>80</MARK> <MARK theCNO = ”CS456”>98</MARK> </STUDENT> … <COURSE cno = ”CS123”> DB</COURSE>… </UNIV> This STUDENT element followed by all the other STUDENT elements

13 Example: /UNIV/STUDENT/MARK
<UNIV> <STUDENT sno = ”007”> <NAME>James Bond</NAME> <MARK theCNO = ”CS123”>80</MARK> <MARK theCNO = ”CS456”>98</MARK> </STUDENT> … <COURSE cno = ”CS123”> DB</COURSE>… </UNIV> These MARK elements followed by the MARK elements of all the other STUDENTs.

14 Relative Path We can use XPath expressions that are relative to the current node or sequence of nodes. Do not start with /. Example If we have arrived at node /UNIV, then we can use relative path STUDENT/NAME or COURSE to describe its subelements. Lu Chaojun, SJTU

15 Attributes in Paths Instead of going to subelements with a given tag, you can go to an attribute of the elements you already have. An attribute is indicated by @ in front of its name.

16 Evaluating Attributes
When a path expression ends in an attribute, the result is typically a sequence of values of primitive type.
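As a rough Python illustration of a path that ends in an attribute, the sketch below collects the theCNO values of all MARK elements. It assumes a trimmed copy of the running example. Python's ElementTree supports only a subset of XPath and cannot end a path in @name, so the attribute is read from each selected element instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK>'
    '</STUDENT><COURSE cno="CS123">DB</COURSE></UNIV>'
)

# Counterpart of /UNIV/STUDENT/MARK/@theCNO: select the MARK elements
# relative to the UNIV root, then read the attribute value from each.
course_numbers = [mark.get("theCNO") for mark in root.findall("./STUDENT/MARK")]
print(course_numbers)
```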

17 Example: /UNIV/STUDENT/MARK/@theCNO
<UNIV> <STUDENT sno = ”007”> <NAME>James Bond</NAME> <MARK theCNO = ”CS123”>80</MARK> <MARK theCNO = ”CS456”>98</MARK> </STUDENT> … <COURSE cno = ”CS123”> DB</COURSE>… </UNIV> These attributes contribute ”CS123” ”CS456” to the result, followed by other theCNO values.

18 Paths that Begin Anywhere
If the path begins with //X, then the first step can begin at the root or any subelement of the root, as long as the tag is X. In fact, //X can appear anywhere in a path, e.g., /UNIV//NAME. The // is shorthand for a kind of axis (see the following slides on axes).
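A small Python sketch of the "begin anywhere" step, assuming a trimmed copy of the running example. ElementTree writes the step as .//X relative to a starting element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK>'
    '</STUDENT><COURSE cno="CS123">DB</COURSE></UNIV>'
)

# Counterpart of /UNIV//NAME: NAME elements at any depth below UNIV.
names = [n.text for n in root.findall(".//NAME")]

# Counterpart of /UNIV//MARK: MARK elements at any depth below UNIV.
marks = [m.text for m in root.findall(".//MARK")]
print(names, marks)
```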

19 Axes: Modes of Navigation
In general, path expressions allow us to start at the root and execute steps to find a sequence of nodes. At each step, we may follow any one of several axes. The default axis is child:: --- go to all the children of the current set of nodes. Shorthand: /

20 Example: Axes /UNIV/STUDENT is really shorthand for /child::UNIV/child::STUDENT. @ is really shorthand for the attribute:: axis. Thus, /UNIV/STUDENT/@sno is shorthand for /child::UNIV/child::STUDENT/attribute::sno

21 More Axes Some other useful axes are:
parent:: = parent(s) of the current node(s). Shorthand: .. self:: = the current node(s). Shorthand: . (the dot). descendant-or-self:: = the current node(s) and all descendants. Shorthand: // Others: ancestor::, ancestor-or-self::, following-sibling::, etc.
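The axes can be approximated in Python's ElementTree, which offers only a subset of them. The sketch below is an analogy rather than full axis support, and it assumes a trimmed copy of the running example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK>'
    '</STUDENT><COURSE cno="CS123">DB</COURSE></UNIV>'
)

# child:: axis: iterating over an element yields its child elements.
children = [child.tag for child in root]

# parent:: axis via the .. shorthand: parents of all MARK elements.
parents = root.findall(".//MARK/..")

# descendant-or-self:: axis: iter() yields the element itself and all
# of its descendants, in document order.
descendants_or_self = [elem.tag for elem in root.iter()]

print(children, [p.tag for p in parents], descendants_or_self)
```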

22 Wildcard * A star (*) in place of a tag represents “any tag”.
Example: /*/*/NAME represents all NAME elements at the third level of nesting. @* represents “any attribute”. Example:

23 Example: /UNIV/* <UNIV> <STUDENT sno = ”007”> <NAME>James Bond</NAME> <MARK theCNO = ”CS123”>80</MARK> <MARK theCNO = ”CS456”>98</MARK> </STUDENT> … <COURSE cno = ”CS123”> DB</COURSE>… </UNIV> This STUDENT element, all other STUDENT elements, the COURSE element, all other COURSE elements

24 Selection Conditions A condition inside […] may follow a tag.
If so, then only paths that have that tag and also satisfy the condition are included in the result of a path expression. Sequence comparisons have an implied “there exists” sense: two sequences are related if any pair of items, one from each sequence, are related by the given comparison operator.

25 Example: Selection Condition
/UNIV/STUDENT/MARK[. < 90] <UNIV> <STUDENT sno = ”007”> <NAME>James Bond</NAME> <MARK theCNO = ”CS123”>80</MARK> <MARK theCNO = ”CS456”>98</MARK> </STUDENT> … <COURSE cno = ”CS123”> DB</COURSE>… </UNIV> The condition that the MARK be < 90 makes the CS123 mark (80), but not the CS456 mark (98), part of the result.
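The effect of the condition [. < 90] can be checked with a short Python sketch. ElementTree's XPath subset has no numeric comparisons, so the condition is applied as an ordinary Python filter. The document is a trimmed copy of the running example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK>'
    '</STUDENT><COURSE cno="CS123">DB</COURSE></UNIV>'
)

# Counterpart of /UNIV/STUDENT/MARK[. < 90]: keep only the MARK
# elements whose own value is below 90.
low_marks = [
    mark for mark in root.findall("./STUDENT/MARK") if float(mark.text) < 90
]
print([(m.get("theCNO"), m.text) for m in low_marks])
```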

26 Example: Attribute in Selection
/UNIV/STUDENT/MARK[@theCNO = ”CS123”] <UNIV> <STUDENT sno = ”007”> <NAME>James Bond</NAME> <MARK theCNO = ”CS123”>80</MARK> <MARK theCNO = ”CS456”>98</MARK> </STUDENT> … <COURSE cno = ”CS123”> DB</COURSE>… </UNIV> Now the MARK element for CS123 is selected, along with any other marks for CS123.

27 Other Forms of Conditions
Here are some useful forms of conditions: X[i] is true for an X that is the ith X child of its parent. X[T] is true for an X having a subelement with tag T. X[@A] is true for an X having attribute A.
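These forms of conditions are within the XPath subset that Python's ElementTree supports, so they can be tried directly. The sketch assumes a trimmed copy of the running example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK>'
    '</STUDENT><COURSE cno="CS123">DB</COURSE></UNIV>'
)

# X[i]: the second MARK child of each STUDENT.
second_marks = root.findall("./STUDENT/MARK[2]")

# X[T]: STUDENT elements that have a NAME subelement.
students_with_name = root.findall("./STUDENT[NAME]")

# X[@A]: STUDENT elements that have an sno attribute.
students_with_sno = root.findall("./STUDENT[@sno]")

# Attribute value in a condition: marks for course CS123.
cs123_marks = root.findall("./STUDENT/MARK[@theCNO='CS123']")

print([m.text for m in second_marks], [m.text for m in cs123_marks])
```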

28 Query Languages for XML
XQuery

29 XQuery XQuery extends XPath to a query language that has power similar to SQL. Uses the same sequence-of-items data model. XQuery is an expression/functional language. Any XQuery expression can be an argument of any other XQuery expression. Like relational algebra

30 More About Item Sequences
XQuery will sometimes form sequences of sequences. All sequences are flattened. Example: (1 2 () (3 4)) = (1 2 3 4). Here () is the empty sequence.
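Flattening can be pictured with a few lines of Python, where nested tuples stand in for nested sequences. The function name flatten is chosen for this sketch only.

```python
def flatten(seq):
    """Flatten nested tuples/lists into one flat list of items."""
    flat = []
    for item in seq:
        if isinstance(item, (tuple, list)):
            # A nested sequence contributes its items, not itself.
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


print(flatten((1, 2, (), (3, 4))))  # [1, 2, 3, 4]
```

The empty tuple contributes nothing, which is why the empty sequence disappears from the flattened result.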

31 FLWR Expressions One or more for and/or let clauses.
Then an optional where clause. Exactly one return clause.

32 Semantics of FLWR Expressions
Each for creates a loop. let produces only a local definition. At each iteration of the nested loops, if any, evaluate the where clause. If the where clause returns TRUE, invoke the return clause, and append its value to the output.
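The semantics above map closely onto an ordinary loop. The Python sketch below mimics a hypothetical FLWR query (for $s in /UNIV/STUDENT, let $m := $s/MARK, where count($m) >= 2, return the student's name). Both the query and the second student are assumptions made for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Running example plus a hypothetical second student (sno 008).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK></STUDENT>'
    '<STUDENT sno="008"><NAME>Hypothetical Student</NAME>'
    '<MARK theCNO="CS123">70</MARK></STUDENT>'
    '<COURSE cno="CS123">DB</COURSE></UNIV>'
)


def flwr(univ):
    output = []
    for student in univ.findall("./STUDENT"):       # for: creates a loop
        marks = student.findall("MARK")             # let: local definition only
        if len(marks) >= 2:                         # where: tested per iteration
            output.append(student.findtext("NAME")) # return: appended to output
    return output


print(flwr(root))
```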

33 for Clauses for var in exp, ... Variables begin with $.
A for-variable takes on each item in the sequence denoted by the expression, in turn. Whatever follows this for is executed once for each value of the variable.

34 Example: for
for $c in document(”univ.xml”)/UNIV/COURSE/@cno return <CNO> {$c} </CNO>
The braces mean: “Expand the enclosed string by replacing variables and path exps. by their values.” $c ranges over the cno attributes of all courses in our example document. Result is a sequence of CNO elements: <CNO>CS123</CNO> <CNO>CS456</CNO> . . .

35 Use of Braces When a variable name like $x, or an expression, could be text, we need to surround it by braces to avoid having it interpreted literally. Example: <A>$x</A> is an A-element with value ”$x”. <A>{$x}</A> is correct. But return $x is unambiguous. You cannot return an untagged string without quoting it, as return ”$x”.

36 let Clauses let var := exp, ...
Value of the variable becomes the sequence of items defined by the expression. Note let does not cause iteration; for does.

37 Example: let
let $d := document(”univ.xml”) let $c := $d/UNIV/COURSE/@cno return <CNO> {$c} </CNO> Returns one element with all the course numbers, like: <CNO>CS123 CS456 …</CNO>
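The difference between for and let can be mimicked in Python. The sketch follows the results as the slides describe them: one CNO element per course for for, a single CNO element for let. The helper make_cno and the second COURSE element (its name is a placeholder) are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Running example with a second course; the name "X" is a placeholder.
root = ET.fromstring(
    '<UNIV><COURSE cno="CS123">DB</COURSE><COURSE cno="CS456">X</COURSE></UNIV>'
)

cnos = [course.get("cno") for course in root.findall("./COURSE")]


def make_cno(text):
    """Build a CNO element with the given text and serialize it."""
    elem = ET.Element("CNO")
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


# for: the return clause runs once per course number.
per_course = [make_cno(cno) for cno in cnos]

# let: the variable holds the whole sequence, so return runs once.
all_in_one = make_cno(" ".join(cnos))

print(per_course)
print(all_in_one)
```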

38 order by Clauses FLWR is really FLWOR: an order-by clause can precede the return. Form: order by <expression> With optional ascending or descending. The expression is evaluated for each assignment to variables. Determines placement in output sequence.

39 Example: order by List marks, lowest first.
let $d := document(”univ.xml”) for $p in … order by $p return $p The for clause generates bindings for $p to MARK elements. Order those bindings by the values inside the elements (automatic coercion). Each binding is evaluated for the output. The result is a sequence of MARK elements.
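In Python terms, order by corresponds to sorting the bindings before the return step. The sketch below sorts MARK elements by their numeric value and assumes a trimmed copy of the running example. The explicit float conversion plays the role of the automatic coercion.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS456">98</MARK><MARK theCNO="CS123">80</MARK>'
    '</STUDENT></UNIV>'
)

marks = root.findall("./STUDENT/MARK")

# order by $p (ascending is the default): sort bindings by element value.
ascending = sorted(marks, key=lambda mark: float(mark.text))

# order by $p descending.
descending = sorted(marks, key=lambda mark: float(mark.text), reverse=True)

print([m.text for m in ascending], [m.text for m in descending])
```

The result is still a list of MARK elements, not of bare numbers, which matches the slide's remark about the output.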

40 Predicates Normally, conditions imply existential quantification.
e.g., for two sequences of items to be equal, we have only to find any pair of items, one from each side, that equate.

41 Strict Comparisons To require that the things being compared are sequences of only one element, use the comparison operators eq, ne, lt, le, gt, ge. Example: $x/NAME eq ”James Bond” is true if $x/NAME is a single NAME element whose value is “James Bond”.
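The contrast between the existential = and the strict eq can be sketched in Python over plain lists. The function names general_eq and value_eq are made up for this illustration. The handling of the empty sequence (result None, standing in for an empty result) follows the XQuery rule for value comparisons, which the slide does not spell out.

```python
def general_eq(left, right):
    """Existential comparison: true if ANY pair of items is equal."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """Strict comparison: each side must hold at most one item."""
    if len(left) == 0 or len(right) == 0:
        return None  # stands in for the empty sequence
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq requires a single item on each side")
    return left[0] == right[0]


print(general_eq(["James Bond", "Someone Else"], ["James Bond"]))
print(value_eq(["James Bond"], ["James Bond"]))
```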

42 Boolean Values in XQuery
The effective boolean value (EBV) of an expression is: The actual value if the expression is of type boolean. FALSE if the expression evaluates to 0, ”” [the empty string], or () [the empty sequence]. TRUE otherwise.
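The rule as stated on this slide can be written down as a small Python function. This encodes only the simplified rule given here; the function name ebv and the use of Python tuples and lists for sequences are assumptions of the sketch.

```python
def ebv(value):
    """Effective boolean value, following the simplified rule on the slide."""
    if isinstance(value, bool):
        return value                      # booleans keep their actual value
    if isinstance(value, (int, float)) and value == 0:
        return False                      # the number 0
    if isinstance(value, (str, tuple, list)) and len(value) == 0:
        return False                      # empty string or empty sequence
    return True                           # everything else


print(ebv(0), ebv(""), ebv(()), ebv("x"), ebv(42))
```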

43 Comparison of Elem. and Values
When an element is compared to a primitive value, the element is treated as its value, if that value is atomic. Example: /UNIV/STUDENT[@sno = ”007”]/MARK[@theCNO = ”CS123”] eq ”80” is true if student 007 gets 80 for CS123.

44 Comparison of Two Elements
It is insufficient that two elements look alike. Example: eq is false. For elements to be equal, they must be the same, physically, in the implied document.

45 Getting Data From Elements
Suppose we want to compare the values of elements, rather than their location in documents. To extract just the value (e.g., the mark itself) from an element E, use data(E).

46 Eliminating Duplicates
Use the function distinct-values applied to a sequence. Subtlety: this function strips tags away from elements and compares the string values. But it doesn’t restore the tags in the result. Example: return distinct-values( let $d := doc(”univ.xml”) return $d/UNIV/STUDENT/MARK)
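A Python analogy for distinct-values: take the string value of each MARK element and drop repeats. The result contains plain strings, not elements, which mirrors the subtlety noted above. The second student, who also has a mark of 80, is a hypothetical addition to the running example.

```python
import xml.etree.ElementTree as ET

# Running example plus a hypothetical second student with a repeated mark.
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK></STUDENT>'
    '<STUDENT sno="008"><NAME>Hypothetical Student</NAME>'
    '<MARK theCNO="CS123">80</MARK></STUDENT></UNIV>'
)

# Strip the tags (keep only the text), then remove duplicates while
# preserving first-seen order.
values = [mark.text for mark in root.findall("./STUDENT/MARK")]
distinct = list(dict.fromkeys(values))
print(distinct)
```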

47 Quantifier Expressions
some $x in E1 satisfies E2 Evaluate the sequence E1. Let $x (any variable) be each item in the sequence, and evaluate E2. Return TRUE if E2 has EBV TRUE for at least one $x. Analogously: every $x in E1 satisfies E2
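Python's any and all behave like some and every. The sketch below asks whether some mark, and whether every mark, of the example student is below 90. The condition is chosen for illustration, and the document is a trimmed copy of the running example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK>'
    '</STUDENT></UNIV>'
)

marks = [float(mark.text) for mark in root.findall("./STUDENT/MARK")]

# some $x in marks satisfies $x < 90
some_below_90 = any(x < 90 for x in marks)

# every $x in marks satisfies $x < 90
every_below_90 = all(x < 90 for x in marks)

print(some_below_90, every_below_90)
```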

48 Aggregations Aggregation functions take a sequence as argument and return its count, sum, max, etc. Example: let $d := doc(”univ.xml”) for $s in $d/UNIV/STUDENT where count($s/MARK) > 100 return $s
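The aggregation functions have direct Python counterparts (len, sum, max). The sketch computes them over the marks of each student in a trimmed copy of the running example. The threshold of 1 in the filter is chosen only so that the small sample produces a result.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK>'
    '</STUDENT></UNIV>'
)

student = root.find("./STUDENT")
marks = [float(mark.text) for mark in student.findall("MARK")]

mark_count = len(marks)   # count($s/MARK)
mark_sum = sum(marks)     # sum($s/MARK)
mark_max = max(marks)     # max($s/MARK)

# where count($s/MARK) > 1 return $s
selected = [s for s in root.findall("./STUDENT") if len(s.findall("MARK")) > 1]

print(mark_count, mark_sum, mark_max, [s.get("sno") for s in selected])
```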

49 Branching Expressions
if (E1) then E2 else E3 is evaluated by: Compute the EBV of E1. If true, the result is E2; else the result is E3. Example: if ($x/@sno eq ”007”) then $x/NAME else ()

50 Query Languages for XML
XSLT

51 XSLT XSLT (Extensible Stylesheet Language – Transformation) is another language to process XML documents. Originally intended as a presentation language: transform XML into an HTML page that could be displayed. It can also extract data from XML or transform XML -> XML, thus serving as a query language.

52 XSLT Programs Like XML Schema, an XSLT program is itself an XML document. Usually called a stylesheet. XSLT has a special namespace of tags, usually indicated by xsl:. <?xml version=”1.0” ?> <xsl:stylesheet xmlns:xsl = ”http://www.w3.org/1999/XSL/Transform”> … </xsl:stylesheet>

53 Templates A stylesheet has one or more templates.
A template describes a set of elements (of the document being processed) and what should be done with them. The form: <xsl:template match = path > … </xsl:template> Attribute match gives an XPath expression describing how to find the nodes to which the template applies.

54 Example: Template <xsl:template match = ”/”> <HTML><BODY> <B>This is a document.</B> </BODY> </HTML> </xsl:template> The match path ”/” matches only the root. Output of the template is an HTML page.

55 Obtaining Values from XML
The output usually depends on the data of the input XML. Use xsl:value-of to extract data. Example: <xsl:template match = ”/UNIV/STUDENT”> <xsl:value-of select = … /> <BR/> </xsl:template>

56 Recursive Use of Templates
An XSLT document usually contains many templates. Start by finding the first one that applies to the root. Any template can have within it <xsl:apply-templates/>, which causes the template-matching to apply recursively from the current node.

57 Apply-Templates Attribute select gives an XPath expression describing the subelements to which we apply templates. Example: <xsl:apply-templates select = ”A/B” /> says to follow all paths tagged A, B from the current node and apply all templates there.

58 Example: Apply-Templates
<xsl:template match = ”/”> <Students> <xsl:apply-templates /> </Students> </xsl:template> <xsl:template match = ”UNIV/STUDENT”> <Stu> <xsl:value-of select = ”NAME”/> </Stu> </xsl:template>
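Python's standard library has no XSLT processor, so the following is only a hand-written emulation of how the two templates above would be applied recursively. It is not XSLT itself, and the function names are made up for this sketch. Elements with no matching template fall back to XSLT's built-in rule (copy text, recurse into children), which is why the COURSE text appears in the output. Two simplifications are assumed: whitespace-only text is dropped, and the STUDENT template is matched on the tag alone rather than on the full UNIV/STUDENT pattern.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the running example (assumption: elided parts omitted).
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">80</MARK><MARK theCNO="CS456">98</MARK>'
    '</STUDENT><COURSE cno="CS123">DB</COURSE></UNIV>'
)


def apply_templates(node, out):
    """Use the STUDENT template if it matches, else the built-in rule."""
    if node.tag == "STUDENT":
        # Stands in for the template with match = "UNIV/STUDENT".
        out.append("<Stu>" + node.findtext("NAME", default="") + "</Stu>")
        return
    # Built-in rule for unmatched elements: copy text, then recurse.
    if node.text and node.text.strip():
        out.append(node.text.strip())
    for child in node:
        apply_templates(child, out)


def transform(document_root):
    # Stands in for the template with match = "/".
    out = ["<Students>"]
    apply_templates(document_root, out)
    out.append("</Students>")
    return "".join(out)


result = transform(root)
print(result)
```

To keep the COURSE text out of the output, a real stylesheet could restrict the recursion, for example with a select attribute on xsl:apply-templates.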

59 Iteration Loop within a template:
<xsl:for-each select = “XPath expression” > … </xsl:for-each> executes the body of the for-each at each child of the current node that is reached by the path.

60 Conditionals Branching: <xsl:if test = ”boolean expression” > … </xsl:if>
executes the body if and only if the boolean expression is true.

61 End

