XPath By Laouina Marouane

Outline
- Introduction
- Data Model
- Expression
  - Patterns
  - Location Paths
- Example
- XPath 2.0
- Practice
- Conclusion

What is XPath?
- A scheme for locating documents and identifying substructures within them.
- A language designed to be used by both XSL Transformations (XSLT) and XPointer.
- Provides common syntax and semantics for functionality shared between XSLT and XPointer.
- Primary purpose: address parts of an XML document, and provide basic facilities for manipulating strings, numbers and booleans.
- W3C Recommendation, November 16, 1999.
- Latest version: http://www.w3.org/TR/xpath

Why XPath?
- Unique identifiers are not sufficient:
  - Assigning a unique identifier to every element is a burden.
  - The identity of an element may be unknown.
  - Identifiers cannot handle ranges of text.
  - It may be inconvenient to identify a large number of objects by listing their identifiers.

Introduction
- XPath uses a compact, string-based syntax rather than an XML element-based one.
- Operates on the abstract, logical structure of an XML document (a tree of nodes) rather than its surface syntax.
- Uses a path notation (like URLs) to navigate through this hierarchical tree structure, from which it got its name.
- A subset of it can be used for matching, i.e. testing whether or not a node matches a pattern.
- Models an XML document as a tree of nodes of types such as element, attribute and text.
- Supports namespaces: the name of a node is a pair consisting of a local part and a namespace URI.
- Example of an XPath expression: /bib/book/publisher

Data Model
- Treats an XML document as a logical tree.
- The tree consists of seven types of node:
  - Root node: the root of the document, not the document element
  - Element nodes: one for each element in the document
    - Unique IDs
  - Attribute nodes
  - Namespace nodes
  - Processing instruction nodes
  - Comment nodes
  - Text nodes
- The tree is ordered: document order reads from top to bottom and left to right.

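Data Model
[Figure: tree diagram of the bib document, with labels for the root, the root element (bib), a processing instruction and a comment; bib contains book, whose children include publisher (Addison-Wesley) and author (Serge Abiteboul).]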

Example
For this simple doc:
<doc> Some emphasis here. Some more stuff. </doc>
Might be represented as:
[Figure: tree diagram with the root at the top, the <doc> element below it, and text nodes as leaves]
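The node types of the data model can be made visible with a small tree walk. The sketch below uses Python's standard `xml.dom.minidom`, which implements the DOM rather than the XPath data model, so it is only an approximation (for instance, it has no namespace nodes). The sample document and the `outline` helper are hypothetical and not taken from the slides.

```python
from xml.dom import Node, minidom

# Hypothetical sample document, chosen so that several node types appear.
XML = (
    '<?xml-stylesheet href="style.css"?>'
    "<!-- a comment -->"
    '<doc id="d1">Some <em>emphasis</em> here.</doc>'
)

KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def outline(node, depth=0):
    """Return one indented line per node, visiting the tree in document order."""
    if node.nodeType in (Node.ELEMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE):
        label = node.nodeName
    else:
        label = (node.nodeValue or "").strip()
    kind = KIND.get(node.nodeType, "other")
    lines = ["  " * depth + f"{kind} {label}".rstrip()]
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes belong to the element but are not among its children.
        for name in sorted(node.attributes.keys()):
            lines.append("  " * (depth + 1) + f"attribute {name}")
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


for line in outline(minidom.parseString(XML)):
    print(line)
```

The root printed first is the document itself; the `doc` element is only one of its children, next to the processing instruction and the comment.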

Expressions
- A text string used to select an element, attribute, processing instruction, or text.
- The primary syntactic construct in XPath.
- An expression is evaluated to yield an object, which has one of the following four basic types:
  1. node-set (an unordered collection of nodes without duplicates)
  2. boolean (true or false)
  3. number (a floating-point number)
  4. string (a sequence of UCS characters)

Element Context
- The meaning of an element can depend upon its context:
  … … … …
- We want to search for, e.g., the title of a book, not the title of a person.
- XPath exploits the sequential and hierarchical context of XML to specify elements by their context (i.e. location in the hierarchy):
  - title
  - book/title
  - person/title

Context
- Expression evaluation occurs with respect to a context.
- The context consists of:
  1. a node (the context node)
  2. a pair of non-zero positive integers (the context position and the context size)
  3. a set of variable bindings
  4. a function library
  5. the set of namespace declarations in scope for the expression

More on context types
- The context position is always less than or equal to the context size.
- The variable bindings consist of a mapping from variable names to variable values.
- The function library consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result.
- The namespace declarations consist of a mapping from prefixes to namespace URIs.

Patterns
- A pattern is an expression used not to find objects, but to establish whether a specific object matches certain criteria.
- Very important in the XSLT specification.
- The '|' symbol is used to specify alternative patterns for matching:
  note|warning|/book/intro

Location Paths
- One important kind of expression is a location path (a special case of an expression).
- The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path.
- Location paths can recursively contain expressions that are used to filter sets of nodes.
- A LocationPath (the most important construct) describes a path from one point to another.
- Analogy: a set of street directions, e.g. “Second store on the left after the third light”.
- Two types of paths: relative and absolute.
- Composed of a series of one or more steps, with optional predicates.

Relative Paths
- A relative location path consists of a sequence of one or more location steps separated by /.
- Each step selects a set of nodes relative to a context node; each node in that set is used as a context node for the following step.
- E.g. para will select the children of the current node that are named 'para':
  //Current node … … //Selected … //Not selected until note
- The verbose expression is child::para.

Absolute Paths
- An absolute location path consists of / optionally followed by a relative location path.
- A / by itself selects the root node of the document containing the context node.
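How the context node changes the result of a relative path can be tried with Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath and always evaluates a path relative to the element it is called on. The document below is a hypothetical sample whose element names follow the /bib/book/publisher example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, not the one from the slides.
bib = ET.fromstring(
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher></book>"
    "<book><publisher>Freeman</publisher></book>"
    "</bib>"
)

# Context node: bib. Step 1 selects its book children,
# step 2 selects the publisher children of each of those books.
from_bib = [p.text for p in bib.findall("book/publisher")]

# Context node: the first book. The same path now selects nothing,
# because a book element has no book children.
first_book = bib.find("book")
from_book = [p.text for p in first_book.findall("book/publisher")]

# A one-step relative path from the book does reach its publisher.
direct = [p.text for p in first_book.findall("publisher")]

print(from_bib, from_book, direct)
```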

Location Steps
- A location step has three parts:
  1. an axis, which specifies the tree relationship between the nodes selected by the location step and the context node,
  2. a node test, which specifies the node type and expanded-name of the nodes selected by the location step, and
  3. zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step.

Location Steps parts explained
- Axes
  - 13 axes are defined in XPath:
    - ancestor, ancestor-or-self
    - attribute
    - child
    - descendant, descendant-or-self
    - following
    - preceding
    - following-sibling, preceding-sibling
    - namespace
    - parent
    - self
- Node test
  - Identifies the type of node; evaluates to true/false.
  - Can be a name or a function to evaluate/verify the type.
- Predicate
  - XPath boolean expressions in square brackets following the basis (axis and node test).

Location Steps in syntax
- The syntax for a location step is the axis name and node test separated by a double colon, followed by zero or more expressions, each in square brackets.
- For example, in child::para[position()=1], child is the name of the axis, para is the node test and [position()=1] is a predicate.

Abbreviated Syntax
- child:: can be omitted from a location step (child is the default axis): div/para is equivalent to child::div/child::para
- attribute:: can be abbreviated to @
- // is short for /descendant-or-self::node()/
- A location step of . is short for self::node(); e.g. .//para is short for self::node()/descendant-or-self::node()/child::para
- A location step of .. is short for parent::node()

Wildcards
- Sometimes we don't or can't know names.
- Can use the wildcard '*' for any single element:
  - book/intro/title and book/chapter/title are matched by book/*/title (but so is book/appendix/title)
- Verbose form: child::*
- Multiple asterisks can match several levels:
  - But you must know the exact level, and that inappropriate matches won't be made.
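A quick way to see that each '*' stands for exactly one level is to run the wildcard paths against a small tree. This is a sketch with Python's standard `xml.etree.ElementTree` and a hypothetical book document; because the context node is the book element itself, the path `*/title` plays the role of book/*/title.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with titles at two different depths.
book = ET.fromstring(
    "<book>"
    "<intro><title>Welcome</title></intro>"
    "<chapter><title>Basics</title>"
    "<section><title>Details</title></section></chapter>"
    "<appendix><title>Tables</title></appendix>"
    "</book>"
)

# One wildcard: titles exactly two levels below book (any element in between).
one_level = [t.text for t in book.findall("*/title")]

# Two wildcards: titles exactly three levels below book.
two_levels = [t.text for t in book.findall("*/*/title")]

print(one_level)
print(two_levels)
```

The appendix title is matched along with the intro and chapter titles, while the section title needs the second asterisk.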

Descendants
- Rather than use a wildcard, recursively search through descendants.
- chapter//para will go through the chapter hierarchy and select any para elements:
  //Starting node … … //Selected … //Selected
- Verbose form: child::chapter/descendant-or-self::node()/child::para
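The difference between a child step and a descendant search can be checked with a short sketch. It uses Python's standard `xml.etree.ElementTree` on a hypothetical chapter; the context node is the chapter element, so `.//para` corresponds to chapter//para.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter with para elements at three depths.
chapter = ET.fromstring(
    "<chapter>"
    "<para>one</para>"
    "<section><para>two</para>"
    "<note><para>three</para></note></section>"
    "</chapter>"
)

# Child step only: para elements directly under the chapter.
children_only = [p.text for p in chapter.findall("para")]

# Descendant search: para elements at any depth, in document order.
any_depth = [p.text for p in chapter.findall(".//para")]

print(children_only, any_depth)
```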

Ancestors
- To signify the parent of the context element:
  - '..'
  - parent::node()
- To find all 'title' elements that share the parent of the context node:
  - ../title
  - parent::node()/child::title
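The parent step can be tried with Python's standard `xml.etree.ElementTree`, with one caveat: that module cannot climb above the element the search starts from, so this sketch starts at a hypothetical chapter, steps down to its section, and then back up with '..'.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter: the chapter and its section each have a title.
chapter = ET.fromstring(
    "<chapter>"
    "<title>Basics</title>"
    "<section><title>Details</title></section>"
    "</chapter>"
)

# section      -> the section child of the chapter
# ..           -> the parent of that section (the chapter again)
# title        -> title children of that parent
sibling_titles = [t.text for t in chapter.findall("section/../title")]

print(sibling_titles)
```

Only the chapter's own title is returned; the section's title is a child of the section, not of its parent.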

Other Relationships
- May move around siblings of the current context element:
  - preceding-sibling::
  - following-sibling::
- [Figure: tree diagram labelling the preceding-sibling::, following-sibling::, parent:: and child:: axes]

Other Relationships (2)
- Can access all ancestors and descendants of the current context element:
  - ancestor::
  - descendant::
- These axes don't select siblings.
- [Figure: tree diagram labelling the descendant:: and ancestor:: axes]

Other Relationships (3)
- Can access all ancestors and descendants of the current context element, together with the element itself:
  - ancestor-or-self::
  - descendant-or-self::
- These axes don't select siblings.
- [Figure: tree diagram labelling the descendant-or-self:: and ancestor-or-self:: axes]

Other Relationships (4)
- Can access all nodes that end before (preceding::) or start after (following::) the current context element in document order; its ancestors and descendants are excluded:
  - preceding::
  - following::
- Can access attributes:
  - attribute::
- [Figure: tree diagram labelling the following::, preceding:: and attribute:: axes]

Predicate Filters
- Location paths are indiscriminate:
  - May get a list of items that are selected.
- A predicate filter is used to filter the list:
  - The filter is held between '[ ]'.
- The simplest is a position() predicate:
  - exon[position() = 1]   (1st exon)
  - intron[2]   (2nd intron)
- Can combine tests with 'and' and 'or'.
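Positional predicates can be tried with Python's standard `xml.etree.ElementTree`. It accepts the abbreviated forms `[1]`, `[2]` and `[last()]` but not the position() function itself. The transcript below is a hypothetical sample that reuses the exon and intron names.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript with alternating exons and introns.
transcript = ET.fromstring(
    "<transcript>"
    "<exon>e1</exon><intron>i1</intron>"
    "<exon>e2</exon><intron>i2</intron>"
    "<exon>e3</exon>"
    "</transcript>"
)

# Positions count only the nodes selected by the step (exons among exons).
first_exon = [e.text for e in transcript.findall("exon[1]")]
second_intron = [i.text for i in transcript.findall("intron[2]")]
last_exon = [e.text for e in transcript.findall("exon[last()]")]

print(first_exon, second_intron, last_exon)
```

Note that `intron[2]` returns the second intron even though it is the fourth child of the transcript.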

Position Tests
- The last() function
  - Returns the position of the last item in the list (the context size).
- The count() function
  - Evaluates the number of items in a list: child::transcript[count(child::intron) = 1]
- The id() function
  - Selects the element with the given unique identifier: id("ENS0001")

Attribute Tests
- Attributes can be selected: feature/@type
- Elements can be selected depending on an attribute value: feature[@type="exon"]
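Attribute predicates are among the forms that Python's standard `xml.etree.ElementTree` supports. The module cannot return attribute nodes, so in this sketch `Element.get` stands in for feature/@type. The gene document is a hypothetical sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two typed features and one without a type.
gene = ET.fromstring(
    "<gene>"
    '<feature type="exon">e1</feature>'
    '<feature type="intron">i1</feature>'
    "<feature>unknown</feature>"
    "</gene>"
)

# feature[@type="exon"]: elements selected by attribute value.
exons = [f.text for f in gene.findall("feature[@type='exon']")]

# feature[@type]: elements that carry the attribute at all.
typed = gene.findall("feature[@type]")

# Stand-in for feature/@type: read the attribute from each selected element.
types = [f.get("type") for f in typed]

print(exons, types)
```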

Functions
Node tests and functions in XPath:
- text() matches text nodes
- node() matches any node, whatever its type
- name() returns the name of the current node

Booleans
- A boolean can have only two values: true or false.
- The following operators can be used in boolean expressions:
  - or
  - and
  - =, !=
  - <=, <, >=, >

Example
- Operations perform boolean tests on conditions:
  - exon[not(position() = 1)]
  - transcript[not(exon)]
  - intron[position() != last()]
  - exon[position() > 2]
  - exon[position() >= 3]
  - exon[position() = 1 or position() = last()]
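Predicates that combine tests with 'or' depend on how XPath 1.0 converts values to booleans: a number is true unless it is zero or NaN, and a string or node-set is true unless it is empty. The sketch below models those rules in plain Python; `xpath_boolean` and the list of exons are hypothetical stand-ins, and no XPath engine is involved.

```python
import math


def xpath_boolean(value):
    """Approximate XPath 1.0 boolean() for numbers, strings and node-sets (lists)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    return len(value) > 0  # strings and node-sets: true when non-empty


exons = ["e1", "e2", "e3"]  # hypothetical node-set in document order
size = len(exons)           # plays the role of last()

# position() = 1 or position() = last(): keeps the first and the last item.
ends = [e for pos, e in enumerate(exons, start=1) if pos == 1 or pos == size]

# not(position() = 1): keeps everything except the first item.
rest = [e for pos, e in enumerate(exons, start=1) if not pos == 1]

# A bare number used as a test is converted with boolean(); a non-zero
# context size is always true, so such a predicate keeps every item.
everything = [
    e for pos, e in enumerate(exons, start=1) if pos == 1 or xpath_boolean(size)
]

print(ends, rest, everything)
```

This is why a position test has to be written out on both sides of 'or': comparing position() with last() is different from testing last() on its own.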

Numbers
- A number represents a floating-point number.
- The numeric operators convert their operands to numbers.
- Operators include: +, -, *, div, mod
  - Since XML allows - in names, the - operator typically needs to be preceded by whitespace.
  - Example: 5 mod 2 returns 1
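The mod operator in XPath 1.0 returns the remainder of a truncating division, so the result takes the sign of the left operand. Python's `%` operator behaves differently for negative operands, while `math.fmod` follows the same rule. The helper name `xpath_mod` is hypothetical; this is a small sketch for finite numbers only.

```python
import math


def xpath_mod(a, b):
    """Remainder from truncating division; the result has the sign of a."""
    return math.fmod(a, b)


print(xpath_mod(5, 2))    # the slide's example: 5 mod 2
print(xpath_mod(-5, 2))   # differs from Python's -5 % 2
print(-5 % 2)
```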

Strings
- Strings consist of a sequence of zero or more characters.
- A character is defined in the XML Recommendation.

Example
- Strings can be tested for characters and substrings:
  - hello there
    note[contains(text(), "hello")]
  - hello there
    note[contains(., "hello")]
- The '.' is the current node, and will go through all children.

Example (2)
- starts-with(string, pattern): note[starts-with(., "hello")]
- string(exp): note[contains(., string(2))]
- substring-after(string, terminator)
- substring-before(string, terminator)
- substring(string, offset, length)

Example (3)
- normalize-space(string)
  - Removes leading and trailing whitespace and collapses internal runs of whitespace to a single space.
- translate(string, source, replace): translate(., ";+", ",")
- concat(strings)
- string-length(string)
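The behaviour of a few XPath 1.0 string functions (normalize-space, translate, substring-before and substring-after) can be modelled in plain Python. The functions below are hypothetical re-implementations written for illustration, not calls into an XPath engine.

```python
import re


def normalize_space(s):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def translate(s, source, replace):
    """Map each character of source to the one at the same index in replace.

    Characters of source with no counterpart in replace are deleted;
    if a character occurs twice in source, the first occurrence wins.
    """
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) not in table:
            table[ord(ch)] = replace[i] if i < len(replace) else None
    return s.translate(table)


def substring_before(s, sep):
    """Text before the first occurrence of sep, or "" if sep is absent."""
    if sep == "":
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """Text after the first occurrence of sep, or "" if sep is absent."""
    if sep == "":
        return s
    _, found, tail = s.partition(sep)
    return tail if found else ""


print(normalize_space("  hello \n  there "))
print(translate("a;b+c", ";+", ","))
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
```

With the arguments from the slide, translate(., ";+", ",") turns every ';' into ',' and deletes every '+', because '+' has no counterpart in the replacement string.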

Core Function Library
- XPath defines a core set of functions and operators.
- All implementations of XPath must implement the core function library.
- Node-set functions
  - list/item[position() mod 2 = 1] selects all odd-numbered elements of a list
  - id("foo")/child::para[position()=5] selects the 5th para child of the element with the unique ID foo
- String functions
  - substring("12345", 0, 3) returns "12"
- Boolean functions
  - boolean true() returns true
- Number functions
  - number sum(node-set) returns the sum of the values of the nodes, each converted to a number
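The result of substring("12345", 0, 3) looks surprising until the rule is spelled out: character positions start at 1, both numeric arguments are rounded, and a character is kept when its position p satisfies round(start) <= p < round(start) + round(length). The sketch below models that rule in Python for finite arguments only; `substring` and `xpath_round` are hypothetical helper names.

```python
import math


def xpath_round(x):
    """Round to the nearest integer, with halves going towards positive infinity."""
    return math.floor(x + 0.5)


def substring(s, start, length=None):
    """Model of XPath 1.0 substring() for finite numeric arguments."""
    first = xpath_round(start)
    end = math.inf if length is None else first + xpath_round(length)
    # Positions are 1-based; position 0 does not exist, so a start of 0
    # simply uses up one unit of the requested length.
    return "".join(ch for pos, ch in enumerate(s, start=1) if first <= pos < end)


print(substring("12345", 0, 3))
print(substring("12345", 2, 3))
print(substring("12345", 1.5, 2.6))
```

With a start of 0 and a length of 3, the positions allowed are 0, 1 and 2, and only 1 and 2 exist, which gives "12".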

Example for XPath Queries
The bib document contains two book entries:
- Addison-Wesley; Serge Abiteboul, Rick Hull, Victor Vianu; Foundations of Databases; 1995
- Freeman; Jeffrey D. Ullman; Principles of Database and Knowledge Base Systems; 1998

Example summary
bib                                      matches a bib element
*                                        matches any element
/                                        matches the root node
/bib                                     matches a bib element under root
bib/paper                                matches a paper in bib
bib//paper                               matches a paper in bib, at any depth
//paper                                  matches a paper at any depth
paper|book                               matches a paper or a book
@price                                   matches a price attribute
bib/book/@price                          matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname    matches…
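Several of these paths can be tried with Python's standard `xml.etree.ElementTree`, and the ones it does not support (attribute results, '<' comparisons, '|') can be emulated in plain Python. The document below is a hypothetical stand-in: its prices, names and nesting are assumptions for illustration, not the slide's data. All paths are evaluated with bib as the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib document; prices and structure are invented for the sketch.
bib = ET.fromstring(
    "<bib>"
    '<book price="45"><author><lastname>Abiteboul</lastname></author></book>'
    '<book price="60"><author><lastname>Ullman</lastname></author></book>'
    "<paper><author><lastname>Hull</lastname></author></paper>"
    "<collection><paper/></collection>"
    "</bib>"
)

# bib/paper: paper children of bib.
papers = bib.findall("paper")

# bib//paper: paper elements at any depth below bib.
papers_any_depth = bib.findall(".//paper")

# paper|book, emulated: children of bib that are either, in document order.
papers_or_books = [e.tag for e in bib if e.tag in ("paper", "book")]

# bib/book/@price, emulated: read the attribute from each book.
prices = [b.get("price") for b in bib.findall("book")]

# bib/book[@price<"55"]/author/lastname, emulated: '<' compares numbers.
cheap_authors = [
    name.text
    for b in bib.findall("book")
    if float(b.get("price")) < 55
    for name in b.findall("author/lastname")
]

print(len(papers), len(papers_any_depth), papers_or_books, prices, cheap_authors)
```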

XPath 2.0
- Latest version: http://www.w3.org/TR/xpath20/
- W3C Working Draft, 22 August 2003
- Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages.

XPath 2.0 (2)
- XPath 2.0 is a much more powerful language that operates on a much larger domain of data types.
- A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents.
- The driving forces behind XPath 2.0 include not only the XPath 2.0 Requirements document but also many of the XML Query language requirements.
- XPath 2.0 is a strict syntactic subset of XQuery 1.0.

XPath 2.0 (3)
- XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc.
- In addition, a number of functions and operators are provided for processing and constructing these different data types.

XPath 2.0 (4)
- Everything is a sequence.
- Sequences are ordered.
- In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets.
- In XPath 2.0, the concept of the node-set has been generalized and extended.
- Sequences may contain simple-typed values as well as nodes.
- The “for” expression enables iteration over sequences.

XPath 2.0 (5)
- sum(for $x in /order/item return $x/price * $x/quantity)
- Conditional expression:
  if ($widget1/unit-cost < $widget2/unit-cost)
  then $widget1
  else $widget2
- Quantifiers:
  some $x in /students/student/name satisfies $x = "Fred"
  every $x in /students/student/name satisfies $x = "Fred"
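For readers who know Python, the XPath 2.0 constructs above map closely onto familiar idioms: a `for` expression is a generator expression, `if/then/else` is a conditional expression, and `some`/`every` correspond to `any`/`all`. The sketch below is an analogy in plain Python with hypothetical sample data; it does not run XPath 2.0, which the standard library does not support.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document.
order = ET.fromstring(
    "<order>"
    "<item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10</price><quantity>1</quantity></item>"
    "</order>"
)

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(
    float(item.findtext("price")) * float(item.findtext("quantity"))
    for item in order.findall("item")
)

# if (...) then $widget1 else $widget2, with hypothetical widgets
widget1 = {"name": "w1", "unit_cost": 3.0}
widget2 = {"name": "w2", "unit_cost": 2.0}
cheaper = widget1 if widget1["unit_cost"] < widget2["unit_cost"] else widget2

# some / every $x in ... satisfies $x = "Fred"
names = ["Fred", "Wilma"]  # hypothetical student names
some_fred = any(name == "Fred" for name in names)
every_fred = all(name == "Fred" for name in names)

print(total, cheaper["name"], some_fred, every_fred)
```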

XPath 2.0 (6)
- Intersections, differences, unions:
  - The except operator selects all of a given node-set, except for certain nodes: @* except @exc:foo
  - The intersect operator: $x intersect /foo/bar

Some Practice
- Try XPath Visualizer.
- You can download it from: http://www.vbxml.com/downloads/files/xpathvisualiserseptember.zip
- It can help you with:
  - Learning and playing with XPath expressions.
  - Composing and visually verifying the exact XPath expression when designing an XSLT stylesheet.
  - Obtaining the quantitative characteristics of an XML document: counts, sums, arithmetical and relational results, strings, substrings, etc.

Conclusion
- XPath provides a concise and intuitive way to address into XML documents.
- It is a standard part of the XSLT and XPointer specifications.
- Implementing XPath basically requires learning the abbreviated syntax of location path expressions and the functions of the core library.

References
- http://www.w3.org/TR/xpath
- http://www.w3.org/TR/xpath20/
- http://www.vbxml.com/xpathvisualizer/default.asp
- http://www.xml.com/pub/a/2002/03/20/xpath2.html
- XML in a Nutshell
