XPath, by Laouina Marouane. Outline: Introduction; Data Model; Expression (Patterns, Location Paths); Example; XPath 2.0; Practice.

XPath By Laouina Marouane

Outline
- Introduction
- Data Model
- Expression
  - Patterns
  - Location Paths
- Example
- XPath 2.0
- Practice
- Conclusion

What is XPath?
- A scheme for locating documents and identifying substructures within them.
- A language designed to be used by both XSL Transformations (XSLT) and XPointer.
- Provides common syntax and semantics for functionality shared between XSLT and XPointer.
- Primary purpose: to address parts of an XML document, and to provide basic facilities for manipulating strings, numbers and booleans.
- W3C Recommendation, November 16, 1999.
- Latest version:

Why XPath?
- Unique identifiers are not sufficient:
  - Assigning a unique identifier to every element is a burden.
  - The identity of an element may be unknown.
  - Identifiers cannot handle ranges of text.
  - It may be inconvenient to identify a large number of objects by listing their identifiers.

Introduction
- XPath uses a compact, string-based syntax rather than an XML element-based syntax.
- Operates on the abstract, logical structure of an XML document (a tree of nodes) rather than its surface syntax.
- Uses a path notation (like URLs) to navigate through this hierarchical tree structure, from which it got its name.
- A subset of it can be used for matching, i.e. testing whether or not a node matches a pattern.
- Models an XML document as a tree of nodes of types such as element, attribute and text.
- Supports namespaces: the name of a node is a pair consisting of a local part and a namespace URI.
- Example of an XPath expression: /bib/book/publisher

Data Model
- Treats an XML document as a logical tree.
- This tree consists of 7 types of node:
  - Root Node: the root of the document, not the document element
  - Element Nodes: one for each element in the document (may have unique IDs)
  - Attribute Nodes
  - Namespace Nodes
  - Processing Instruction Nodes
  - Comment Nodes
  - Text Nodes
- The tree structure is ordered and reads from top to bottom and left to right (document order).

Data Model
[Figure: node tree for the bib document, labelling the root, the root element (bib), a book element with publisher (Addison-Wesley) and author (Serge Abiteboul) children, a processing instruction, and a comment.]

Example
For this simple doc: <doc> Some emphasis here. Some more stuff. </doc>
Might be represented as: [Figure: node tree with the root node, the doc element, and text nodes.]

Expressions
- A text string used to select an element, attribute, processing instruction, or text.
- The primary syntactic construct in XPath.
- An expression is evaluated to yield an object, which has one of the following four basic types:
  1. node-set (an unordered collection of nodes without duplicates)
  2. boolean (true or false)
  3. number (a floating-point number)
  4. string (a sequence of UCS characters)

Element Context
- The meaning of an element can depend upon its context … …
- We want to search for, e.g., the title of a book, not the title of a person.
- XPath exploits the sequential and hierarchical context of XML to specify elements by their context (i.e. location in the hierarchy).
- Paths: title, book/title, person/title

Context
- Expression evaluation occurs with respect to a context.
- The context consists of:
  1. a node (the context node)
  2. a pair of non-zero positive integers (the context position and the context size)
  3. a set of variable bindings
  4. a function library
  5. the set of namespace declarations in scope for the expression

More on context types
- The context position is always less than or equal to the context size.
- The variable bindings consist of a mapping from variable names to variable values.
- The function library consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result.
- The namespace declarations consist of a mapping from prefixes to namespace URIs.

Patterns
- A pattern is an expression used not to find objects, but to establish whether a specific object matches certain criteria.
- Very important in the XSLT specification.
- The ' | ' symbol is used to specify alternative patterns for matching: note|warning|/book/intro

Location Paths
- One important kind of expression is a location path (a special case of an expression).
- The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path.
- Location paths can recursively contain expressions that are used to filter sets of nodes.
- A location path (the most important construct) describes a path from one point to another.
- Analogy: a set of street directions, e.g. “Second store on the left after the third light”.
- Two types of paths: relative and absolute.
- Composed of a series of steps (one or more) and optional predicates.

Relative Paths
- A relative location path consists of a sequence of one or more location steps separated by /.
- Each step selects a set of nodes relative to a context node; each node in that set is used as a context node for the following step.
- E.g. para will select children of the current node that are named ' para ': //Current node … … //Selected … //Not selected until note
- The verbose expression is child::para.
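As a rough illustration of how a relative path such as para depends on the context node, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The section and note elements and their text are made-up sample data, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: two para children plus one para nested in a note.
section = ET.fromstring(
    "<section>"
    "<para>first</para>"
    "<note><para>nested</para></note>"
    "<para>second</para>"
    "</section>"
)

# With <section> as the context node, "para" selects only its para children.
children = [p.text for p in section.findall("para")]

# The nested para is selected only once <note> becomes the context node.
note = section.find("note")
nested = [p.text for p in note.findall("para")]

print(children)  # ['first', 'second']
print(nested)    # ['nested']
```

The same path string gives different results because the element that findall is called on plays the role of the context node.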

Absolute Paths
- An absolute location path consists of / optionally followed by a relative location path.
- A / by itself selects the root node of the document containing the context node.

Location Steps
- A location step has three parts:
  1. an axis, which specifies the tree relationship between the nodes selected by the location step and the context node,
  2. a node test, which specifies the node type and expanded-name of the nodes selected by the location step, and
  3. zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step.

Location Steps parts explained
- Axes
  - 13 axes are defined in XPath:
    - ancestor, ancestor-or-self
    - attribute
    - child
    - descendant, descendant-or-self
    - following
    - preceding
    - following-sibling, preceding-sibling
    - namespace
    - parent
    - self
- Node test
  - Identifies the type of node. Evaluates to true/false.
  - Can be a name, or a function to evaluate/verify the type.
- Predicate
  - XPath boolean expressions in square brackets following the basis (axis and node test).

Location Steps in syntax
- The syntax for a location step is the axis name and node test separated by a double colon, followed by zero or more expressions, each in square brackets.
- For example, in child::para[position()=1], child is the name of the axis, para is the node test and [position()=1] is a predicate.

Abbreviated Syntax
- child:: can be omitted from a location step (child is the default axis): div/para is equivalent to child::div/child::para.
- attribute:: can be abbreviated to @.
- // is short for /descendant-or-self::node()/.
- A location step of . is short for self::node(), e.g. .//para is short for self::node()/descendant-or-self::node()/child::para.
- A location step of .. is short for parent::node().

Wildcards
- Sometimes we don't or can't know the names.
- Can use the wildcard ' * ' for any single element:
  - book/intro/title and book/chapter/title are matched by book/*/title (but so is book/appendix/title).
- The verbose form is child::*.
- Multiple asterisks can match several levels:
  - But we must know the exact level, and that inappropriate matches won't be made.
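A small sketch of the wildcard behaviour, again using Python's xml.etree.ElementTree (a partial XPath implementation). The library wrapper element, the section element and all titles are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a book whose parts each carry a title.
library = ET.fromstring(
    "<library><book>"
    "<intro><title>Preface</title></intro>"
    "<chapter><title>Axes</title>"
    "<section><title>Child axis</title></section></chapter>"
    "<appendix><title>Grammar</title></appendix>"
    "</book></library>"
)

# One asterisk stands for exactly one element level below book.
one_level = [t.text for t in library.findall("book/*/title")]

# Two asterisks reach exactly two levels down, and nothing shallower.
two_levels = [t.text for t in library.findall("book/*/*/title")]

print(one_level)   # ['Preface', 'Axes', 'Grammar']
print(two_levels)  # ['Child axis']
```

Note that the appendix title is matched as well, which is the "inappropriate match" risk mentioned on the slide.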

Descendants
- Rather than use a wildcard, recursively search through the descendants.
- chapter//para will go through the chapter hierarchy and select any para elements: //Starting node … … //Selected … //Selected
- Verbose form: child::chapter/descendant-or-self::node()/child::para
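The descendant search can be tried out with Python's xml.etree.ElementTree, which accepts the abbreviated // form (but not the verbose axis syntax). The document below is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: para elements at several depths inside chapter,
# plus one para outside the chapter.
book = ET.fromstring(
    "<book><chapter>"
    "<para>intro</para>"
    "<section><para>deep</para><note><para>deeper</para></note></section>"
    "</chapter><para>outside</para></book>"
)

# chapter//para selects every para below chapter, at any depth,
# in document order.
paras = [p.text for p in book.findall("chapter//para")]

print(paras)  # ['intro', 'deep', 'deeper']
```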

Ancestors
- To signify the parent of the context element: ' .. ', or in verbose form parent::node()
- To find all ' title ' elements that share the parent of the context node:
  - ../title
  - parent::node()/child::title
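A hedged sketch of the parent step in Python's xml.etree.ElementTree. One caveat of that module: elements do not store parent pointers, so .. only works for nodes below the element the search starts from. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: only the first chapter contains a para.
book = ET.fromstring(
    "<book>"
    "<title>XPath Notes</title>"
    "<chapter><title>Axes</title><para>text</para></chapter>"
    "<chapter><title>Empty</title></chapter>"
    "</book>"
)

# For every para, step up to its parent and select that parent's title,
# i.e. the title elements that share a parent with a para.
titles = [t.text for t in book.findall(".//para/../title")]

print(titles)  # ['Axes']
```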

Other Relationships
- May move around the siblings of the current context element:
  - preceding-sibling::
  - following-sibling::
[Figure: tree diagram marking the preceding-sibling::, following-sibling::, parent:: and child:: nodes relative to the context element.]

Other Relationships (2)
- Can access all ancestors and descendants of the current context element:
  - ancestor::
  - descendant::
- These axes don't select siblings.
[Figure: tree diagram marking the descendant:: and ancestor:: nodes.]

Other Relationships (3)
- Can access all ancestors and descendants of the current context element, including the context element itself:
  - ancestor-or-self::
  - descendant-or-self::
- These axes don't select siblings.
[Figure: tree diagram marking the descendant-or-self:: and ancestor-or-self:: nodes.]

Other Relationships (4)
- Can access all nodes that come entirely before or after the current context element in document order (preceding:: excludes ancestors; following:: excludes descendants):
  - preceding::
  - following::
- Can access attributes:
  - attribute::
[Figure: tree diagram marking the following::, preceding:: and attribute:: nodes.]

Predicate Filters
- Location paths are indiscriminate:
  - May get a list of items that are selected.
- A predicate filter is used to filter the list:
  - The filter is held between ' [ ] '.
- Simplest is the position() function predicate:
  - exon[position() = 1]    (1st exon)
  - intron[2]    (2nd intron)
- Can combine tests with ' and ' and ' or '.
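Positional predicates can be sketched with Python's xml.etree.ElementTree, which understands the abbreviated forms [n] and [last()] but not the full position() = n spelling. The transcript document and its text values are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: alternating exon and intron children.
transcript = ET.fromstring(
    "<transcript>"
    "<exon>e1</exon><intron>i1</intron>"
    "<exon>e2</exon><intron>i2</intron>"
    "<exon>e3</exon>"
    "</transcript>"
)

# Positions are 1-based and counted among the nodes the step selects.
first_exon = [e.text for e in transcript.findall("exon[1]")]
second_intron = [i.text for i in transcript.findall("intron[2]")]
last_exon = [e.text for e in transcript.findall("exon[last()]")]

print(first_exon, second_intron, last_exon)  # ['e1'] ['i2'] ['e3']
```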

Position Tests
- The last() function:
  - Locates the last item in the list (it returns the context size).
- The count() function:
  - Evaluates the number of items in a node-set.
  - child::transcript[count(child::intron) = 1]
- The id() function:
  - Selects elements by their unique identifier.
  - child::transcript[id("ENS0001")]
  - Note: used as a predicate like this, id("ENS0001") is true whenever some element in the document has that ID.

Attribute Tests
- Attributes can be selected.
- Elements can be selected dependent upon an attribute value.
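The slide's examples did not survive, so here is a minimal sketch of attribute tests using Python's xml.etree.ElementTree. The gene element, the status attribute and the ID ENS0002 are assumptions made for illustration. ElementTree supports attribute predicates, but reading an attribute value is done with Element.get rather than an @name step.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two transcripts, only one with a status attribute.
gene = ET.fromstring(
    '<gene>'
    '<transcript id="ENS0001" status="known"/>'
    '<transcript id="ENS0002"/>'
    '</gene>'
)

# [@status] keeps elements that have the attribute at all.
with_status = [t.get("id") for t in gene.findall("transcript[@status]")]

# [@id='...'] keeps elements whose attribute has a specific value.
by_id = gene.findall("transcript[@id='ENS0002']")

print(with_status)              # ['ENS0001']
print(len(by_id), by_id[0].attrib)  # 1 {'id': 'ENS0002'}
```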

Functions
Functions and node tests in XPath:
- text() = matches text nodes
- node() = matches any node of any type (elements and text, but also comments and processing instructions)
- name() = returns the name of the current node

Booleans
- A boolean can only have two values: true or false.
- The following expressions can be evaluated:
  - or
  - and
  - =, !=
  - <=, <, >=, >

Example
- Operations perform boolean tests on conditions:
  - exon[not(position() = 1)]
  - transcript[not(exon)]
  - intron[position() != last()]
  - exon[position() > 2]
  - exon[position() >= 3]
  - exon[position() = 1 or position() = last()]
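To make the role of the context position and context size in such predicates concrete, here is a plain-Python emulation (not an XPath engine). The helper name filter_by_position and the exon labels are hypothetical.

```python
def filter_by_position(nodes, predicate):
    """Keep nodes for which predicate(position, size) is true.

    position is 1-based, as in XPath; size plays the role of last().
    """
    size = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1) if predicate(pos, size)]


# Hypothetical list standing in for the exon children of one transcript.
exons = ["e1", "e2", "e3", "e4"]

# exon[not(position() = 1)]
not_first = filter_by_position(exons, lambda pos, last: not (pos == 1))

# exon[position() > 2]
after_second = filter_by_position(exons, lambda pos, last: pos > 2)

# exon[position() = 1 or position() = last()]
first_or_last = filter_by_position(exons, lambda pos, last: pos == 1 or pos == last)

print(not_first)      # ['e2', 'e3', 'e4']
print(after_second)   # ['e3', 'e4']
print(first_or_last)  # ['e1', 'e4']
```

Writing the last predicate as position() = 1 or last() would not do the same thing: a bare last() is a non-zero number, which converts to true, so every node would pass.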

Numbers
- A number represents a floating-point number.
- The numeric operators convert their operands to numbers.
- Operators include: +, -, *, div, mod
  - Since XML allows - in names, the - operator typically needs to be preceded by whitespace.
- Example: 5 mod 2 returns 1

Strings
- Strings consist of a sequence of zero or more characters.
- A character is defined in the XML Recommendation.

Example
- Strings can be tested for characters and substrings.
- For a note whose text is "hello there":
  - note[contains(text(), "hello")]
  - note[contains(., "hello")]
- The ' . ' is the current node; its string-value includes the text of all its descendants.

Example (2)
- starts-with(string, pattern)
  - note[starts-with(., "hello")]
- string(exp)
  - note[contains(., string(2))]
- substring-after(string, terminator)
- substring-before(string, terminator)
- substring(string, offset, length)

Example (3)
- normalize-space(string)
  - Removes leading and trailing whitespace and collapses internal runs of whitespace to a single space.
- translate(string, source, replace)
  - translate(., ";+", ",")
- concat(strings)
- string-length(string)
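The behaviour of translate and normalize-space can be sketched in plain Python. The helper names xpath_translate and normalize_space are hypothetical; they emulate the XPath 1.0 semantics and are not part of any XPath library.

```python
import re


def xpath_translate(s, source, replace):
    """Emulate translate(): map each char of source to the char at the same
    index in replace; chars of source with no counterpart are deleted."""
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) in table:
            continue  # the first occurrence in source wins
        table[ord(ch)] = replace[i] if i < len(replace) else None
    return s.translate(table)


def normalize_space(s):
    """Emulate normalize-space(): trim and collapse XML whitespace."""
    return " ".join(re.split(r"[ \t\r\n]+", s.strip(" \t\r\n")))


# translate(., ";+", ","): ';' becomes ',' and '+' is removed.
print(xpath_translate("a;b+c", ";+", ","))      # a,bc
print(normalize_space("  hello   there \n"))    # hello there
```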

Core Function Library
- XPath defines a core set of functions and operators.
- All implementations of XPath must implement the core function library.
- Node Set Functions
  - list/item[position() mod 2 = 1] selects all odd-numbered elements of a list
  - id("foo")/child::para[position()=5] selects the 5th para child of the element with the unique ID foo
- String Functions
  - substring("12345", 0, 3) returns "12"
- Boolean Functions
  - boolean true() returns true
- Number Functions
  - number sum(node-set) returns the sum of the nodes
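The result substring("12345", 0, 3) = "12" can look surprising. XPath 1.0 numbers characters from 1 and keeps those at positions p with round(start) <= p < round(start) + round(length). The plain-Python sketch below emulates that rule for finite arguments only (NaN and infinities are ignored); xpath_round and xpath_substring are hypothetical helper names.

```python
import math


def xpath_round(x):
    """XPath round(): nearest integer, with ties going towards +infinity."""
    return math.floor(x + 0.5)


def xpath_substring(s, start, length=None):
    """Emulate XPath 1.0 substring() for finite numeric arguments."""
    first = xpath_round(start)
    if length is None:
        return "".join(ch for pos, ch in enumerate(s, start=1) if pos >= first)
    end = first + xpath_round(length)
    return "".join(ch for pos, ch in enumerate(s, start=1) if first <= pos < end)


# Start 0 with length 3 covers positions 0, 1 and 2; position 0 does not exist.
print(xpath_substring("12345", 0, 3))  # 12
print(xpath_substring("12345", 2, 3))  # 234
```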

Example for XPath Queries
[Sample bib document; the XML markup was not preserved. It contains two book entries:]
- Addison-Wesley; Serge Abiteboul, Rick Hull, Victor Vianu; Foundations of Databases; 1995
- Freeman; Jeffrey D. Ullman; Principles of Database and Knowledge Base Systems; 1998
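A few queries against this data can be tried with Python's xml.etree.ElementTree. Because the slide's markup was lost, the XML below is a reconstruction under assumptions: bib, book, publisher and author appear elsewhere in the slides, while the title and year element names are guesses. ElementTree does not accept absolute paths on an element, so the queries are written relative to the bib element.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample document (element names partly guessed).
bib = ET.fromstring("""
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author>Rick Hull</author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book>
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
""")

# Counterpart of /bib/book/publisher, evaluated with bib as the context node.
publishers = [p.text for p in bib.findall("book/publisher")]

# Counterpart of //author: every author at any depth.
author_count = len(bib.findall(".//author"))

# A predicate on a child element's text selects one book's title.
freeman_titles = [t.text for t in bib.findall("book[publisher='Freeman']/title")]

print(publishers)      # ['Addison-Wesley', 'Freeman']
print(author_count)    # 4
print(freeman_titles)  # ['Principles of Database and Knowledge Base Systems']
```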

Example summary
bib               matches a bib element
*                 matches any element
/                 matches the root node
/bib              matches a bib element under root
bib/paper         matches a paper in bib
bib//paper        matches a paper in bib, at any depth
//paper           matches a paper at any depth
paper|book        matches a paper or a book
@price            matches a price attribute
bib/book/@price   matches a price attribute in book, in bib
…                 matches…

XPath 2.0
- Latest version:
- W3C Working Draft, 22 August 2003
- Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages.

XPath 2.0 (2)
- XPath 2.0 is a much more powerful language that operates on a much larger domain of data types.
- A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents.
- Driving forces behind XPath 2.0 include not only the XPath 2.0 Requirements document but also many of the XML Query language requirements.
- XPath 2.0 is a strict syntactic subset of XQuery 1.0.

XPath 2.0 (3)
- XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc.
- In addition, a number of functions and operators are provided for processing and constructing these different data types.

XPath 2.0 (4)
- Everything is a sequence.
- Sequences are ordered.
- In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets.
- In XPath 2.0, the concept of the node-set has been generalized and extended.
- Sequences may contain simple-typed values as well as nodes.
- The “for” expression enables iteration over sequences.

XPath 2.0 (5)
- For expression:
  - sum(for $x in /order/item return $x/price * $x/quantity)
- Conditional expression:
  - if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
- Quantifiers:
  - some $x in /students/student/name satisfies $x = "Fred"
  - every $x in /students/student/name satisfies $x = "Fred"
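Python's standard library has no XPath 2.0 engine, so the sketch below only mirrors the meaning of these expressions in ordinary Python: the for expression becomes a generator expression, and the some/every quantifiers become any and all. The order document, its prices and the list of names are invented sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical order with two items.
order = ET.fromstring(
    "<order>"
    "<item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10.00</price><quantity>1</quantity></item>"
    "</order>"
)

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(
    float(item.findtext("price")) * float(item.findtext("quantity"))
    for item in order.findall("item")
)

# Hypothetical values standing in for /students/student/name.
names = ["Fred", "Wilma"]
some_fred = any(name == "Fred" for name in names)   # some ... satisfies
every_fred = all(name == "Fred" for name in names)  # every ... satisfies

print(total)       # 20.0
print(some_fred)   # True
print(every_fred)  # False
```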

XPath 2.0 (6)
- Intersections, differences, unions:
  - The except operator selects all of a given node-set, except for certain nodes.
  - The intersect operator, e.g. $x intersect /foo/bar
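The meaning of intersect and except can be illustrated in plain Python by comparing nodes by identity and keeping document order. This is an emulation, not an XPath 2.0 implementation; the foo document, the baz element and the id values are made up, and same_node_in is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two bar children and one baz child.
foo = ET.fromstring('<foo><bar id="a"/><bar id="b"/><baz id="c"/></foo>')

x = foo.findall("*")       # stands in for $x: all children of foo
bars = foo.findall("bar")  # stands in for /foo/bar


def same_node_in(node, nodes):
    """True if this exact node object (not just an equal one) is in nodes."""
    return any(node is other for other in nodes)


# $x intersect /foo/bar: nodes present in both operands.
intersect = [n.get("id") for n in x if same_node_in(n, bars)]

# $x except /foo/bar: nodes of $x that are not in the second operand.
difference = [n.get("id") for n in x if not same_node_in(n, bars)]

print(intersect)   # ['a', 'b']
print(difference)  # ['c']
```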

Some Practice
- Try XPath Visualizer.
- You can download it from: mber.zip
- It can help you with:
  - Learning and playing with XPath expressions.
  - Composing and visually verifying the exact XPath expression when designing an XSLT stylesheet.
  - Obtaining the quantitative characteristics of an XML document: counts, sums, arithmetical and relational results, strings, substrings, etc.

Conclusion
- XPath provides a concise and intuitive way to address parts of XML documents.
- It is a standard part of the XSLT and XPointer specifications.
- Using XPath mainly requires learning the abbreviated syntax of location path expressions and the functions of the core library.
