1
XPath By Laouina Marouane
2
Outline
Introduction
Data Model
Expressions
Patterns
Location Paths
Example
XPath 2.0
Practice
Conclusion
3
What is XPath?
A scheme for locating documents and identifying substructures within them.
A language designed to be used by both XSL Transformations (XSLT) and XPointer.
Provides common syntax and semantics for functionality shared between XSLT and XPointer.
Primary purpose: address 'parts' of an XML document, and provide basic facilities for manipulation of strings, numbers and booleans.
W3C Recommendation, November 16, 1999.
Latest version: http://www.w3.org/TR/xpath
4
Why XPath?
Unique identifiers are not sufficient:
  Assigning a unique identifier to every element is a burden
  Identity of an element may be unknown
  Identifiers cannot handle ranges of text
  May be inconvenient to identify a large number of objects by listing their identifiers
5
Introduction
XPath uses a compact, string-based syntax rather than an XML element-based syntax.
Operates on the abstract, logical structure of an XML document (tree of nodes) rather than its surface syntax.
Uses a path notation (like URLs) to navigate through this hierarchical tree structure, from which it got its name.
A subset of it can be used for matching, i.e. testing whether or not a node matches a pattern.
Models an XML document as a tree of nodes of several types, such as element, attribute and text.
Supports Namespaces.
The name of a node is a pair consisting of a local part and a namespace URI.
Example of an XPath expression: /bib/book/publisher
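As a rough illustration of the path expression above, the following Python sketch uses the standard library module xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document is made up for this illustration. Because ElementTree evaluates paths relative to an element, the absolute path /bib/book/publisher is written here as book/publisher, starting from the bib element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bib>
  <book><publisher>Addison-Wesley</publisher><author>Serge Abiteboul</author></book>
  <book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author></book>
</bib>
"""

bib = ET.fromstring(XML)  # the document element <bib>

# Counterpart of /bib/book/publisher, evaluated from the <bib> element.
publishers = [p.text for p in bib.findall("book/publisher")]
print(publishers)
```

Each step (book, then publisher) narrows the selection to children of the nodes selected by the previous step.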
6
Data Model
Treats an XML document as a logical tree.
The tree has seven types of node:
  Root node: the root of the document, not the document element
  Element nodes: one for each element in the document (an element may have a unique ID)
  Attribute nodes
  Namespace nodes
  Processing instruction nodes
  Comment nodes
  Text nodes
The tree is ordered (document order): it reads from top to bottom and left to right.
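To make the node types concrete, here is a minimal Python sketch that walks a small document with the standard library module xml.dom.minidom and prints the kind of each node. The document and the processing instruction name page-layout are hypothetical. The DOM is used only as a stand-in: it is close to, but not the same as, the XPath data model (for example, it has no namespace nodes, and attributes are not visited by this walk).

```python
from xml.dom import minidom

# Hypothetical document containing several kinds of node.
XML = ('<?page-layout columns="2"?><!-- a comment -->'
       '<doc id="d1">Some <em>emphasis</em> here.</doc>')

KIND = {
    minidom.Node.DOCUMENT_NODE: "root",
    minidom.Node.ELEMENT_NODE: "element",
    minidom.Node.TEXT_NODE: "text",
    minidom.Node.COMMENT_NODE: "comment",
    minidom.Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}

def walk(node, depth=0, out=None):
    """Collect (depth, kind, label) tuples in document order."""
    out = [] if out is None else out
    if node.nodeType == minidom.Node.ELEMENT_NODE:
        label = node.nodeName
    else:
        label = node.nodeValue or ""
    out.append((depth, KIND[node.nodeType], label))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out

tree = walk(minidom.parseString(XML))
for depth, kind, label in tree:
    print("  " * depth + kind, repr(label))
```

The root is a separate node above the document element doc, and the processing instruction and comment are its children alongside doc.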
7
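Data Model
[Figure: document tree for the bib example, with labels for the root, the root element (bib), a processing instruction and a comment; bib contains book, whose children include publisher (Addison-Wesley) and author (Serge Abiteboul).]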
8
Example
For this simple doc:
  <doc> Some emphasis here. Some more stuff. </doc>
Might be represented as:
  [Figure: tree with the root at the top, the <doc> element below it, and text nodes as leaves.]
9
Expressions
A text string used to select elements, attributes, processing instructions, or text.
The primary syntactic construct in XPath.
An expression is evaluated to yield an object, which has one of the following four basic types:
1. node-set (an unordered collection of nodes without duplicates)
2. boolean (true or false)
3. number (a floating-point number)
4. string (a sequence of UCS characters)
10
Element Context
Meaning of an element can depend upon its context.
Want to search for, e.g., the title of a book, not the title of a person.
XPath exploits the sequential and hierarchical context of XML to specify elements by their context (i.e. location in the hierarchy):
  title
  book/title
  person/title
11
Context
Expression evaluation occurs with respect to a context. The context consists of:
1. a node (the context node)
2. a pair of non-zero positive integers (the context position and the context size)
3. a set of variable bindings
4. a function library
5. the set of namespace declarations in scope for the expression
12
More on context types
The context position is always less than or equal to the context size.
The variable bindings consist of a mapping from variable names to variable values.
The function library consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result.
The namespace declarations consist of a mapping from prefixes to namespace URIs.
13
Patterns
A pattern is an expression used not to find objects, but to establish whether a specific object matches certain criteria.
Very important in the XSLT specification.
The '|' symbol is used to specify alternative patterns for matching:
  note|warning|/book/intro
14
Location Paths
One important kind of expression is a location path (a special case of expr).
The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path.
Location paths can recursively contain expressions that are used to filter sets of nodes.
LocationPath (the most important construct) describes a path from one point to another.
Analogy: a set of street directions, e.g. "Second store on the left after the third light".
Two types of paths: relative and absolute.
Composed of a series of steps (one or more) and optional predicates.
15
Relative Paths
A relative location path consists of a sequence of one or more location steps separated by /.
Each step selects a set of nodes relative to a context node; each node in that set is used as a context node for the following step.
E.g. para selects the children of the current node that are named 'para'.
  A para child of the current node is selected; a para nested inside a note child is not selected until note becomes the context node.
The verbose expression is child::para
16
Absolute Paths
An absolute location path consists of / optionally followed by a relative location path.
A / by itself selects the root node of the document containing the context node.
17
Location Steps
A location step has three parts:
1. an axis, which specifies the tree relationship between the nodes selected by the location step and the context node,
2. a node test, which specifies the node type and expanded-name of the nodes selected by the location step, and
3. zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step.
18
Location Steps parts explained
Axes
  13 axes are defined in XPath:
  ancestor, ancestor-or-self
  attribute
  child
  descendant, descendant-or-self
  following
  preceding
  following-sibling, preceding-sibling
  namespace
  parent
  self
Node test
  Identifies the type of node. Evaluates to true/false.
  Can be a name or a function to evaluate/verify the type.
Predicate
  XPath boolean expressions in square brackets following the basis (axis & node test)
19
Location Steps in syntax The syntax for a location step is the axis name and node test separated by a double colon, followed by zero or more expressions each in square brackets. For example, in child::para[position()=1], child is the name of the axis, para is the node test and [position()=1] is a predicate
20
Abbreviated Syntax
child:: can be omitted from a location step (child is the default axis).
  div/para is equivalent to child::div/child::para
attribute:: can be abbreviated to @
// is short for /descendant-or-self::node()/
A location step of . is short for self::node()
  e.g. .//para is short for self::node()/descendant-or-self::node()/child::para
A location step of .. is short for parent::node()
21
Wildcards
Sometimes you don't or can't know the names.
Can use the wildcard '*' for any single element.
  book/intro/title and book/chapter/title are matched by book/*/title (but so is book/appendix/title)
  Verbose: child::*
Multiple asterisks can match several levels.
  But you must know the exact level, and that inappropriate matches won't be made.
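The wildcard behaviour can be tried out with a short Python sketch using xml.etree.ElementTree from the standard library (which implements only part of XPath). The book document below is hypothetical; the path book/*/title is written as */title because it is evaluated from the book element.

```python
import xml.etree.ElementTree as ET

# Hypothetical book with three differently named sections.
book = ET.fromstring(
    "<book>"
    "<title>Book title</title>"
    "<intro><title>Intro title</title></intro>"
    "<chapter><title>Chapter title</title></chapter>"
    "<appendix><title>Appendix title</title></appendix>"
    "</book>"
)

# Counterpart of book/*/title: the wildcard stands for exactly one level.
titles = [t.text for t in book.findall("*/title")]
print(titles)
```

The title directly under book is not selected, since * must consume one intermediate element, while the appendix title is selected whether or not that was intended.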
22
Descendants
Rather than use a wildcard, recursively search through descendants.
chapter//para will go through the chapter hierarchy and select any para elements, at any depth below the starting node.
Verbose: child::chapter/descendant-or-self::node()/child::para
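The difference between a child step and a descendant search can be sketched in Python with xml.etree.ElementTree. The chapter document is an invented example, and the paths are evaluated from the chapter element, so chapter//para becomes .//para.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter with para elements at three depths.
chapter = ET.fromstring(
    "<chapter>"
    "<para>top-level</para>"
    "<section><para>nested</para>"
    "<note><para>deeply nested</para></note></section>"
    "</chapter>"
)

children = [p.text for p in chapter.findall("para")]        # child::para
descendants = [p.text for p in chapter.findall(".//para")]  # .//para

print(children)
print(descendants)
```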
23
Ancestors
To signify the parent of the context element:
  '..'
  Verbose: parent::node()
To find all 'title' elements that share the parent of the context node:
  ../title
  Verbose: parent::node()/child::title
24
Other Relationships
May move around the siblings of the current context element:
  preceding-sibling::
  following-sibling::
[Figure: tree diagram labeling the preceding-sibling::, following-sibling::, parent:: and child:: axes.]
25
Other Relationships (2)
Can access all ancestors and descendants of the current context element:
  ancestor::
  descendant::
These axes don't select siblings.
[Figure: tree diagram labeling the descendant:: and ancestor:: axes.]
26
Other Relationships (3)
Can access all ancestors and descendants of the current context element, together with the context element itself:
  ancestor-or-self::
  descendant-or-self::
These axes don't select siblings.
[Figure: tree diagram labeling the descendant-or-self:: and ancestor-or-self:: axes.]
27
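Other Relationships (4)
Can access all nodes that come entirely before or entirely after the current context element in document order:
  preceding:: (nodes that end before the context node starts; ancestors are excluded)
  following:: (nodes that start after the context node ends; descendants are excluded)
Can access attributes:
  attribute::
[Figure: tree diagram labeling the following::, preceding:: and attribute:: axes.]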
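The axes above can be approximated in Python over an xml.etree.ElementTree tree. This is a minimal sketch under stated assumptions: the tree and the helper names (parent_of, ancestors, siblings, descendants) are invented for the illustration, only element nodes are considered, and results are returned as lists rather than as XPath node-sets.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: a contains b, c and g; c contains d, e and f.
root = ET.fromstring("<a><b/><c><d/><e/><f/></c><g/></a>")

# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """ancestor:: axis, nearest ancestor first."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def siblings(node):
    """Return (preceding-sibling, following-sibling) lists in document order."""
    if node not in parent_of:
        return [], []
    family = list(parent_of[node])
    index = next(i for i, sib in enumerate(family) if sib is node)
    return family[:index], family[index + 1:]

def descendants(node):
    """descendant:: axis in document order (the node itself is excluded)."""
    return list(node.iter())[1:]

e = root.find("c/e")
before, after = siblings(e)
print([n.tag for n in ancestors(e)])
print([n.tag for n in before], [n.tag for n in after])
print([n.tag for n in descendants(root.find("c"))])
```

Note that neither the ancestors nor the descendants of e include its siblings d and f, which matches the remark that those axes don't select siblings.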
28
Predicate Filters
Location paths are indiscriminate.
  May get a list of items that are selected
A predicate filter is used to filter the list.
  The filter is held between '[ ]'
Simplest is a predicate using the position() function:
  exon[position() = 1] selects the 1st exon
  intron[2] selects the 2nd intron
Can combine tests with 'and' and 'or'.
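Positional predicates like the ones above can be tried in Python with xml.etree.ElementTree, which accepts [n] and [last()] but not the full position() = n form. The transcript document and its id attributes are hypothetical, added only so the selected elements can be told apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript with alternating exon and intron children.
transcript = ET.fromstring(
    "<transcript>"
    "<exon id='e1'/><intron id='i1'/>"
    "<exon id='e2'/><intron id='i2'/>"
    "<exon id='e3'/>"
    "</transcript>"
)

first_exon = transcript.findall("exon[1]")       # same as exon[position() = 1]
second_intron = transcript.findall("intron[2]")  # 2nd intron
last_exon = transcript.findall("exon[last()]")   # last exon

print([n.get("id") for n in first_exon + second_intron + last_exon])
```

The position is counted among the nodes selected by the step (here, the exon or intron children), not among all children of the transcript.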
29
Position Tests
The last() function
  Returns the context size, so position() = last() locates the last sibling in the list
The count() function
  Evaluates the number of items in a node-set
  child::transcript[count(child::intron) = 1]
The id() function
  Selects the element with the given unique identifier
  id("ENS0001")
  Note that child::transcript[id("ENS0001")] keeps every transcript child whenever any element in the document has that ID; it does not test the ID of the transcript itself
30
Attribute Tests
Attributes can be selected:
  feature/@type
Elements can be selected depending upon an attribute value:
  feature[@type="exon"]
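A small Python sketch of the attribute tests, using xml.etree.ElementTree. The gene document is hypothetical. ElementTree supports the [@type='exon'] predicate but cannot return attribute nodes, so feature/@type is approximated here by reading the attribute from each selected element.

```python
import xml.etree.ElementTree as ET

# Hypothetical gene with three feature children.
gene = ET.fromstring(
    "<gene>"
    "<feature type='exon' start='10'/>"
    "<feature type='intron' start='50'/>"
    "<feature type='exon' start='90'/>"
    "</gene>"
)

# feature[@type="exon"]: elements filtered by attribute value.
exons = gene.findall("feature[@type='exon']")

# Stand-in for feature/@type: read the attribute of every feature child.
types = [f.get("type") for f in gene.findall("feature")]

print([f.get("start") for f in exons])
print(types)
```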
31
Functions
Node tests and functions in XPath:
  text() matches text nodes
  node() matches any node (e.g. what *, @* or text() would match, depending on the axis)
  name() returns the name of a node (by default, the context node)
32
Booleans
A boolean can only have two values: true or false.
The following expressions can be evaluated:
  or
  and
  =, !=
  <=, <, >=, >
33
Example
Operations perform boolean tests on conditions:
  exon[not(position() = 1)]
  transcript[not(exon)]
  intron[position() != last()]
  exon[position() > 2]
  exon[position() >= 3]
  exon[position() = 1 or position() = last()]
34
Numbers
A number represents a floating-point number.
The numeric operators convert their operands to numbers.
Operators include: +, -, *, div, mod
  Since XML allows - in names, the - operator typically needs to be preceded by whitespace
  Example: 5 mod 2 returns 1
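The mod operator returns the remainder of a truncating division, so its result takes the sign of the left operand. That differs from Python's % operator. The sketch below models div and mod in Python for finite operands and non-zero divisors only; the function names xpath_div and xpath_mod are made up for the illustration.

```python
import math

def xpath_div(a, b):
    """Model of XPath 'div': floating-point division (non-zero b only here)."""
    return a / b

def xpath_mod(a, b):
    """Model of XPath 'mod': remainder of a truncating division (non-zero b only here)."""
    return math.fmod(a, b)

print(xpath_div(5, 2))    # floating-point result, not integer division
print(xpath_mod(5, 2))
print(xpath_mod(-5, 2))   # sign follows the left operand
print(-5 % 2)             # Python's own % behaves differently
```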
35
Strings
Strings consist of a sequence of zero or more characters.
A character is defined as in the XML Recommendation.
36
Example
Strings can be tested for characters and substrings. For a note containing the text "hello there":
  note[contains(text(), "hello")] tests the text node children of note (in XPath 1.0, only the first one is used)
  note[contains(., "hello")] tests the string-value of note
The '.' is the current node; its string-value includes the text of all its descendants.
37
Example (2)
starts-with(string, pattern)
  note[starts-with(., "hello")]
string(exp)
  note[contains(., string(2))]
substring-after(string, terminator)
substring-before(string, terminator)
substring(string, offset, length)
38
Example (3)
normalize-space(string)
  Removes leading and trailing whitespace, and collapses internal runs of whitespace into a single space
translate(string, source, replace)
  translate(., ";+", ",") replaces each ';' with ',' and removes each '+'
concat(strings)
string-length(string)
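The behaviour of translate and normalize-space can be modelled in a few lines of Python. This is a sketch of the XPath 1.0 semantics as described in the Recommendation, not an XPath implementation; the function names xpath_translate and xpath_normalize_space are made up for the illustration.

```python
import re

def xpath_translate(text, source, replace):
    """Model of translate(): map source[i] to replace[i]; delete characters
    of source that have no counterpart in replace."""
    mapping = {}
    for i, ch in enumerate(source):
        if ord(ch) in mapping:
            continue  # the first occurrence of a character in source wins
        mapping[ord(ch)] = replace[i] if i < len(replace) else None
    return text.translate(mapping)  # a None value deletes the character

def xpath_normalize_space(text):
    """Model of normalize-space(): collapse runs of space, tab, CR and LF
    into one space, then strip the ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

print(xpath_translate("a;b+c", ";+", ","))
print(xpath_normalize_space("  hello \n  there "))
```

Because the source string ";+" is longer than the replacement ",", the ';' is mapped to ',' while the '+' has no counterpart and is dropped.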
39
Core Function Library
XPath defines a core set of functions and operators.
All implementations of XPath must implement the core function library.
Node-set functions
  list/item[position() mod 2 = 1] selects every odd-numbered item of a list
  id("foo")/child::para[position()=5] selects the 5th para child of the element with the unique ID foo
String functions
  substring("12345", 0, 3) returns "12"
Boolean functions
  true() returns the boolean value true
Number functions
  sum(node-set) returns the sum of the values of the nodes, each converted to a number
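The result of substring("12345", 0, 3) is less surprising once the rule is spelled out: character positions start at 1, and the function keeps the characters whose position p satisfies round(start) <= p < round(start) + round(length). With start 0 and length 3, that leaves positions 1 and 2. The Python sketch below models this rule for finite arguments only (NaN and infinities are not handled); the names xpath_round and xpath_substring are made up for the illustration.

```python
import math

def xpath_round(x):
    """Model of round(): nearest integer, with halves rounded towards +infinity."""
    return math.floor(x + 0.5)

def xpath_substring(text, start, length=None):
    """Model of substring() for finite arguments; positions are 1-based."""
    first = xpath_round(start)
    if length is None:
        return "".join(ch for pos, ch in enumerate(text, 1) if pos >= first)
    end = first + xpath_round(length)  # exclusive upper bound
    return "".join(ch for pos, ch in enumerate(text, 1) if first <= pos < end)

print(xpath_substring("12345", 0, 3))
print(xpath_substring("12345", 2, 3))
print(xpath_substring("12345", 2))
```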
40
Example for XPath Queries
A bib document with two book entries:
  Addison-Wesley; Serge Abiteboul, Rick Hull, Victor Vianu; Foundations of Databases; 1995
  Freeman; Jeffrey D. Ullman; Principles of Database and Knowledge Base Systems; 1998
41
Example summary
bib matches a bib element
* matches any element
/ matches the root node
/bib matches a bib element under the root
bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth
//paper matches a paper at any depth
paper|book matches a paper or a book
@price matches a price attribute
bib/book/@price matches a price attribute in book, in bib
bib/book[@price<"55"]/author/lastname matches…
42
XPath 2.0
Latest version: http://www.w3.org/TR/xpath20/
W3C Working Draft 22 August 2003
Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages.
43
XPath 2.0 (2)
XPath 2.0 is a much more powerful language that operates on a much larger domain of data types.
A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents.
Driving forces behind XPath 2.0 include not only the XPath 2.0 Requirements document but also many of the XML Query language requirements.
XPath 2.0 is a strict syntactic subset of XQuery 1.0.
44
XPath 2.0 (3) XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc. In addition, a number of functions and operators are provided for processing and constructing these different data types
45
XPath 2.0 (4)
Everything is a sequence.
Sequences are ordered.
In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets. In XPath 2.0, the concept of the node-set has been generalized and extended.
Sequences may contain simple-typed values as well as nodes.
The "for" expression enables iteration over sequences.
46
XPath 2.0 (5)
"for" expression:
  sum(for $x in /order/item return $x/price * $x/quantity)
Conditional expression:
  if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
Quantifiers:
  some $x in /students/student/name satisfies $x = "Fred"
  every $x in /students/student/name satisfies $x = "Fred"
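For readers more familiar with general-purpose languages, the "for" expression and the quantifiers above correspond roughly to a generator expression and to any/all in Python. The sketch below is an analogy, not an XPath 2.0 evaluation; it uses xml.etree.ElementTree, and the order and students documents are invented sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
order = ET.fromstring(
    "<order>"
    "<item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10.00</price><quantity>1</quantity></item>"
    "</order>"
)
students = ET.fromstring(
    "<students>"
    "<student><name>Fred</name></student>"
    "<student><name>Wilma</name></student>"
    "</students>"
)

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(
    float(x.findtext("price")) * float(x.findtext("quantity"))
    for x in order.findall("item")
)

names = [n.text for n in students.findall("student/name")]
some_fred = any(name == "Fred" for name in names)   # some ... satisfies
every_fred = all(name == "Fred" for name in names)  # every ... satisfies

print(total, some_fred, every_fred)
```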
47
XPath 2.0 (6)
Intersections, differences, unions:
The except operator selects all of a given node-set, except for certain nodes:
  @* except @exc:foo
The intersect operator:
  $x intersect /foo/bar
48
Some Practice
Try XPath Visualizer. You can download it from: http://www.vbxml.com/downloads/files/xpathvisualiserseptember.zip
It can help you with:
  Learning and playing with XPath expressions.
  Composing and visually verifying the exact XPath expression when designing an XSLT stylesheet.
  Obtaining the quantitative characteristics of an XML document: counts, sums, arithmetical and relational results, strings, substrings, etc.
49
Conclusion
XPath provides a concise and intuitive way to address into XML documents.
Standard part of the XSLT and XPointer specifications.
Implementing XPath basically requires learning the abbreviated syntax of location path expressions and the functions of the core library.
50
References
http://www.w3.org/TR/xpath
http://www.w3.org/TR/xpath20/
http://www.vbxml.com/xpathvisualizer/default.asp
http://www.xml.com/pub/a/2002/03/20/xpath2.html
XML in a Nutshell