Published by Allan Cain. Modified over 9 years ago.
1
August 2006

Chapter 6 - XPath & XPointer
Learning XML by Erik T. Ray

Slides were developed by Jack Davis, College of Information Science and Technology, Radford University
2
XPath

XML is often compared to a database because of the way it structures information for easy retrieval. With a little knowledge of the markup language, you can locate and extract any piece of information.

XPath is used to locate a specific data element from a known location within a document, and it can be used to provide contextual details to a processing program. For example, one could specify that items in a list should use a particular kind of bullet specified in a metadata section at the beginning of the document.

Every XML document can be represented graphically as a tree structure. Because there is only one possible tree configuration for any given document, there is a unique path from the root to any other point. XPath describes how to climb the tree in a series of steps.
3
XML Trees

Each step in a path touches a branching or terminal point in the tree called a node. Elements are the most familiar kind of node, but they are not the only kind. A terminal node is called a leaf (it has no descendants).

There are seven different kinds of nodes:

- root: The root of the document is a special kind of node. It's not an element, but rather it contains the document element. It also contains any comments or processing instructions that surround the document element.
- element: Elements and the root node share a special property among nodes: they alone can contain other nodes. An element node can contain other elements. In a tree, this is the point where branches meet. Empty elements are leaf nodes.
4
XML Trees (cont.)

- attribute: XPath treats attributes as separate nodes from their element hosts. This allows you to select the element as a whole or only the attribute in that element using the same path syntax. An attribute is like an element that contains only text.
- text: A region of uninterrupted text is treated as a leaf node. It is always the child of an element. An element may have more than one text node child, however, if its text is broken up by elements or other node types. Keep this in mind when you process the text in an element: you may have to check more than one node.
- comment: An XML comment is considered a valid node.
- processing instruction: Like comments, a processing instruction can appear anywhere in the document under the root.
5
XML Trees (cont.)

- namespace: A namespace is actually a region of the document, not just the possession of a single element: all the descendants of the element that declares it are affected. XML processors must pay special attention to namespaces, so XPath makes the namespace a unique node type.

The DTD is not included in the list of nodes.

XPath maintains the structure and content of a document so that the document could be reconstructed almost exactly. (The order of the attributes might change, but since the order of attributes is not significant in XML, a semantically equivalent document can be produced.)

In XML, any node in the tree can be thought of as a tree in its own right (a subtree). Trees facilitate recursive programming. XSLT is elegant because a rule treats every element as a tree.
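The two points above, that an element can have several text node children and that every node is a subtree suited to recursion, can be sketched in Python. This is a minimal illustration using the standard library's `xml.dom.minidom`, whose DOM tree is close to, but not identical to, the XPath data model. The `para` fragment and the `collect_text` helper are made up for this example.

```python
from xml.dom import minidom


def collect_text(node):
    """Recursively gather the text of every text node in the subtree at node."""
    if node.nodeType == node.TEXT_NODE:
        return [node.data]
    pieces = []
    for child in node.childNodes:
        # Each child is treated as a tree in its own right.
        pieces.extend(collect_text(child))
    return pieces


# Hypothetical fragment: the text of <para> is interrupted by an <emph> element.
doc = minidom.parseString("<para>Crab cakes are <emph>very</emph> tasty.</para>")
para = doc.documentElement

# Text nodes that are direct children of <para>: there are two, not one.
direct_text = [c.data for c in para.childNodes if c.nodeType == c.TEXT_NODE]

# All text in the subtree, found by recursion.
all_text = collect_text(para)

print(direct_text)
print("".join(all_text))
```

Reading only the first text child of `para` would miss the text after `emph`, which is why the recursive walk is the safer habit.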
6
Finding Nodes

XPath uses a chain of steps, called a location path, to define the location of nodes in an XML tree. A location step has three parts:

- an axis that describes the direction to travel,
- a node test that specifies what kinds of nodes apply,
- and a set of optional predicates that use Boolean tests to reduce the candidates.

The axis is a keyword that specifies a direction you can travel from any node. You can go up through ancestors, down through descendants, or linearly through siblings.
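To make the three parts of a step concrete, here is a rough Python sketch that splits a single step written in the unabbreviated `axis::node-test[predicate]` form. The `Step` type and `parse_step` function are hypothetical names for this illustration. The sketch does not handle nested brackets or abbreviations such as `@` and `..`, and it assumes the child axis when no axis is written.

```python
import re
from typing import List, NamedTuple


class Step(NamedTuple):
    axis: str
    node_test: str
    predicates: List[str]


# axis:: (optional), then the node test, then zero or more [predicate] groups.
STEP_RE = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)


def parse_step(step: str) -> Step:
    """Split one location step into axis, node test, and predicates."""
    match = STEP_RE.match(step)
    if match is None:
        raise ValueError(f"not a simple location step: {step!r}")
    axis = match.group("axis") or "child"  # assumed default when omitted
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return Step(axis, match.group("test"), predicates)


print(parse_step("child::para[@role='note'][position()=1]"))
print(parse_step("ancestor::chapter"))
```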
7
Node Axes

Each axis type and the nodes it matches:

- ancestor: All nodes above the context node, including the parent, grandparent, and so on up to the root node.
- ancestor-or-self: The ancestor nodes plus the context node.
- attribute: Attributes of the context node.
- child: Children of the context node.
- descendant: Children of the context node, plus their children, and so on down to the leaves of the subtree.
- descendant-or-self: The descendant nodes plus the context node.
8
Node Axes (cont.)

Each axis type and the nodes it matches:

- following: Nodes that follow the context node at any level in the document. Does not include the context node's descendants, but does include following siblings and their descendants.
- following-sibling: Nodes that follow the context node at the same level.
- namespace: All the namespace nodes of an element.
- parent: The parent of the context node.
- preceding: Nodes that occur before the context node at any level. Does not include the context node's ancestors, but does include preceding siblings and their descendants.
- preceding-sibling: Nodes that occur before the context node at the same level.
- self: The context node.
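A few of these axes can be imitated by hand in Python to see which nodes each one reaches. This is only a sketch: the document and its `id` values are invented, only element nodes are considered, and the helper functions are hypothetical names, not part of any XPath library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the id attributes exist only to label the nodes.
doc = ET.fromstring(
    "<book id='b'>"
    "<chapter id='c1'>"
    "<section id='s1'/><section id='s2'><para id='p1'/></section><section id='s3'/>"
    "</chapter>"
    "<chapter id='c2'/>"
    "</book>"
)

# ElementTree elements do not store a parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def ancestors(node):
    """ancestor axis: parent, grandparent, and so on up to the top element."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def descendants(node):
    """descendant axis: children, their children, and so on down to the leaves."""
    return [n for n in node.iter() if n is not node]


def following_siblings(node):
    """following-sibling axis: later nodes at the same level."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


def preceding_siblings(node):
    """preceding-sibling axis: earlier nodes at the same level."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)]


def ids(nodes):
    return [n.get("id") for n in nodes]


context = doc.find(".//section[@id='s2']")
print("ancestor:", ids(ancestors(context)))
print("descendant:", ids(descendants(context)))
print("following-sibling:", ids(following_siblings(context)))
print("preceding-sibling:", ids(preceding_siblings(context)))
```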
9
Node Axes

After the axis comes a node test parameter, joined to the axis by a double colon (::). A name can be used in place of an explicit node type, in which case the node type is inferred from the axis: on the attribute axis the name refers to an attribute, on the namespace axis it refers to a namespace, and on all other axes it refers to an element.

In the absence of an axis specifier, the axis is assumed to be child and the node is assumed to be of type element.
10
Node tests

Each term and the nodes it matches:

- /: The root node: the node containing the root element and any comments or processing instructions that precede it.
- node(): Any node. For example, attribute::node() would select all the attributes of the context node.
- *: On the attribute axis, any attribute; on the namespace axis, any namespace; on other axes, any element.
- crabcake: On the attribute axis, the attribute named crabcake of the context node (the same applies to namespaces and elements on their axes).
- text(): Any text node.
- processing-instruction(): Any processing instruction.
- processing-instruction('for-web'): Any processing instruction with the target for-web.
- comment(): Any comment node.
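The node tests in this table can be mimicked by filtering DOM nodes on their type. The sketch below uses Python's standard `xml.dom.minidom`. The `menu` document is invented for the example, and filtering by `nodeType` is only an approximation of how an XPath processor applies a node test.

```python
from xml.dom import Node, minidom

# Hypothetical document with one of each node kind under <menu>.
doc = minidom.parseString(
    "<menu><?for-web hide?><!-- seasonal -->"
    "<crabcake price='9'>Fresh</crabcake><?print bold?></menu>"
)
menu = doc.documentElement
children = list(menu.childNodes)  # roughly child::node()

# child::* keeps only element nodes.
elements = [n.tagName for n in children if n.nodeType == Node.ELEMENT_NODE]

# child::comment()
comments = [n.data for n in children if n.nodeType == Node.COMMENT_NODE]

# child::processing-instruction()
pi_targets = [
    n.target for n in children if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

# child::processing-instruction('for-web')
for_web = [
    n.data
    for n in children
    if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE and n.target == "for-web"
]

crabcake = menu.getElementsByTagName("crabcake")[0]

# crabcake/text()
text_nodes = [n.data for n in crabcake.childNodes if n.nodeType == Node.TEXT_NODE]

# crabcake/attribute::node()
attribute_names = list(crabcake.attributes.keys())

print(elements, comments, pi_targets, for_web, text_nodes, attribute_names)
```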
11
Location Paths

Location path steps are chained together using the slash (/) character. Each step gets you a little closer to the node you want to locate. It's like giving directions. For example, to get from the root node to a para element inside a section inside a chapter inside a book, a path might look like this:

book/chapter/section/para

XPath defines some handy shortcuts:

- @role matches an attribute named role.
- . matches the context node.
- /* matches the document element. Any path that starts with a / is an absolute path; the next step is *, which matches any element.
- parent::*/following-sibling::para matches all para elements that are following siblings of the context node's parent.
12
Location Paths (cont.)

- .. matches the parent node. The double dot is shorthand for parent::node().
- .//para matches any element of type para that is a descendant of the current node. The // is shorthand for /descendant-or-self::node()/.
- //para matches any para descending from the root node, that is, all paras anywhere in the document. A location path starting with // is assumed to begin at the root.
- ../* matches all sibling elements (and the context node if it is an element).
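Several of these shortcuts can be tried out with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. A minimal sketch follows, assuming a made-up `book` document. Note that ElementTree evaluates paths relative to an element, so absolute paths such as `/*` are left out, and it cannot return attribute nodes, so `@role` is read with `get()` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical document matching the book/chapter/section/para example.
book = ET.fromstring(
    "<book>"
    "<chapter><section role='intro'><para>one</para><para>two</para></section></chapter>"
    "<appendix><para>three</para></appendix>"
    "</book>"
)

# chapter/section/para: each step moves one level down the child axis.
direct = book.findall("chapter/section/para")

# .//para: every para that is a descendant of the context node (here, book).
anywhere = book.findall(".//para")

# .//para/..: the parent of each para, with duplicates removed.
parents = book.findall(".//para/..")

# ElementTree cannot select attribute nodes, so @role is read from the element.
roles = [section.get("role") for section in book.findall(".//section")]

print([p.text for p in direct])
print([p.text for p in anywhere])
print([p.tag for p in parents])
print(roles)
```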
13
Examples

Look at the document example in 6-1, which is a sample XML document. Here are some location path examples:

- /quotelist/child::node() matches all the quotation elements plus the XML comment.
- /quotelist/quotation matches all the quotation elements.
- /*/* matches all the quotation elements.
- //comment()/following-sibling::*/@style matches the style attribute of the last quotation element.
- id('q2')/.. matches the document element.
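The sample document itself is not reproduced on this slide, so the Python sketch below uses a stand-in with the same shape the examples imply: a quotelist holding five quotation elements, with a comment before the last one. Its text content and the exact attribute values are assumptions. Because the standard library's ElementTree has no `comment()`, `following-sibling`, or `id()` support, those paths are emulated by hand, and the `id` attribute is simply matched by value.

```python
import xml.etree.ElementTree as ET

# Stand-in for the sample document; contents are assumed, not quoted.
SAMPLE = """<quotelist>
  <quotation id="q1" style="wise"><text>First</text></quotation>
  <quotation id="q2" style="political"><text>Second</text></quotation>
  <quotation id="q3" style="silly"><text>Third</text></quotation>
  <quotation id="q4" style="wise"><text>Fourth</text></quotation>
  <!-- stand-in comment -->
  <quotation id="q5" style="political"><text>Fifth</text></quotation>
</quotelist>"""

# Keep comments in the tree (insert_comments needs Python 3.8 or later).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(SAMPLE, parser=parser)

# /quotelist/child::node(), ignoring whitespace: the quotations plus the comment.
children = list(root)

# /quotelist/quotation
quotations = root.findall("quotation")

# //comment()/following-sibling::*/@style, emulated with list slicing.
comments = [c for c in children if c.tag is ET.Comment]
styles_after_comment = [
    sibling.get("style")
    for comment in comments
    for sibling in children[children.index(comment) + 1:]
    if sibling.tag is not ET.Comment
]

# id('q2')/.. emulated by matching the id attribute, then stepping to the parent.
parent_of_q2 = root.find("quotation[@id='q2']/..")

print(len(children), len(quotations), styles_after_comment, parent_of_q2.tag)
```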
14
Predicates

If the axis and node type aren't sufficient to narrow down the selection, you can use one or more predicates (Boolean expressions). Every node that passes this test (in addition to the node test and axis specifier) is included in the final node set. Nodes that fail the test are not.

Examples:

- //quotation[@id="q3"]/text matches the text element in the third quotation element.
- /*/*[position()=last()] matches the last quotation element. The position() function equals the position of the most recent step among eligible candidates. The function last() is equal to the total number of candidates (in this case 5).
- //quotation[@style='silly' or @style='wise'] matches the first, third, and fourth quotation elements. The or keyword is a Boolean operator.
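These predicates can be approximated with Python's standard `xml.etree.ElementTree`. The document below is a stand-in whose shape follows the descriptions above (five quotations, with styles chosen so that the first, third, and fourth are silly or wise); its contents are assumptions. ElementTree supports `[@attr='value']` and `[last()]` but not the `or` keyword, so that predicate is written as an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

# Stand-in for the sample document; contents are assumed, not quoted.
root = ET.fromstring(
    "<quotelist>"
    "<quotation id='q1' style='wise'><text>First</text></quotation>"
    "<quotation id='q2' style='political'><text>Second</text></quotation>"
    "<quotation id='q3' style='silly'><text>Third</text></quotation>"
    "<quotation id='q4' style='wise'><text>Fourth</text></quotation>"
    "<quotation id='q5' style='political'><text>Fifth</text></quotation>"
    "</quotelist>"
)

# //quotation[@id="q3"]/text
q3_text = root.findall("quotation[@id='q3']/text")

# /*/*[position()=last()], written with ElementTree's [last()] shortcut.
last_quotation = root.findall("quotation[last()]")

# //quotation[@style='silly' or @style='wise'], emulated with a Python filter.
silly_or_wise = [
    q for q in root.findall("quotation") if q.get("style") in ("silly", "wise")
]

print([t.text for t in q3_text])
print([q.get("id") for q in last_quotation])
print([q.get("id") for q in silly_or_wise])
```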
15
Boolean Expressions

XPath contains a full set of comparison operators to compare strings or numbers. There are also node set expressions, which evaluate to a set of nodes. This is a set in the strict mathematical sense, meaning that it contains no duplicates. The same node can be added many times, but the set will always contain only one copy of it.

Node set functions:

- count(node set): Returns the number of nodes in node set.
- generate-id(node set): Returns a string containing a unique identifier for the first node in node set, or for the context node if the argument is left out. This string is generated by the processor and guaranteed to be unique for each node.
- last(): Returns the number of nodes in the context node set.
16
Node Set Functions (cont.)

- local-name(node set): Returns the name of the first node in node set, without the namespace prefix.
- name(node set): Returns the name of the first node in node set, including the namespace prefix.
- namespace-uri(node set): Returns the URI of the namespace for the first node in node set. Without an argument, it returns the namespace URI for the context node.
- position(): Returns the number of the context node in the context node set.

There are also functions that create node sets, pulling together nodes from all over the document.
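What these functions return can be pictured with plain Python over an ElementTree result list. This is a loose analogy rather than an XPath engine: the document, the namespace URI, and the `local_name` and `namespace_uri` helpers are all invented for the sketch. ElementTree stores a qualified tag as `{uri}local` and discards the prefix, so there is no direct equivalent of name() here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI is made up.
doc = ET.fromstring(
    '<r:menu xmlns:r="http://example.com/recipes">'
    "<r:dish>crab cakes</r:dish><r:dish>chowder</r:dish><note>seasonal</note>"
    "</r:menu>"
)
NS = {"r": "http://example.com/recipes"}


def namespace_uri(elem):
    """Like namespace-uri(): the URI part of a '{uri}local' tag, or ''."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def local_name(elem):
    """Like local-name(): the tag without any namespace part."""
    return elem.tag.split("}", 1)[1] if elem.tag.startswith("{") else elem.tag


dishes = doc.findall("r:dish", NS)

# count(node set) corresponds to the length of the result list.
count = len(dishes)

# position() is 1-based; last() equals the size of the context node set.
numbered = [(position, dish.text) for position, dish in enumerate(dishes, start=1)]

# A node set holds no duplicates: adding a node twice leaves one copy.
merged = list(dict.fromkeys(dishes + dishes[:1]))

print(count, numbered, len(merged))
print(local_name(dishes[0]), namespace_uri(dishes[0]))
print(local_name(doc[2]), repr(namespace_uri(doc[2])))
```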