1 XML Data Management XPath Principles Werner Nutt.

Slides:



Advertisements
Similar presentations
XML Data Management 8. XQuery Werner Nutt. Requirements for an XML Query Language David Maier, W3C XML Query Requirements: Closedness: output must be.
Advertisements

Bottom-up Evaluation of XPath Queries Stephanie H. Li Zhiping Zou.
Covering Indexes for XML Queries by Prakash Ramanan
Managing Data Exchange: XPath
XPath XML Path Language. Outline XML Path Language (XPath) Data Model Description Node values XPath expressions Relative expressions Simple subset of.
Transforming XML Part I Document Navigation with XPath John Arnett, MSc Standards Modeller Information and Statistics Division NHSScotland Tel:
The learning site: /xpath_syntax.asp xsl/xsl/slides.html.
XPath Eugenia Fernandez IUPUI. XML Path Language (XPath) a data model for representing an XML document as an abstract node tree a mechanism for addressing.
XML 6.6 XPath 6. What is XPath? XPath is a syntax used for selecting parts of an XML document The way XPath describes paths to elements is similar to.
2-Jun-15 XPath. 2 What is XPath? XPath is a syntax used for selecting parts of an XML document The way XPath describes paths to elements is similar to.
Lecture 13. The various node tests also work on this axis: eg node() This book has descendant-or- self nodes As expected, text nodes are included in the.
Lecture 13. The various node tests also work on this axis: eg node() This book has descendant-or- self nodes As expected, text nodes are included in the.
XPath s/xmljava/chapters/ch16.html.
Introduction to XLink Transparency No. 1 XML Information Set W3C Recommendation 24 October 2001 (1stEdition) 4 February 2004 (2ndEdition) Cheng-Chia Chen.
XPath Carissa Mills Jill Kerschbaum. What is XPath? n A language designed to be used by both XSL Transformations (XSLT) and XPointer. n Provides common.
Lecture 12. Default Processing in XSLT The default processing in XSLT is to process the XPath root node The default processing for various node types.
XML Language Family Detailed Examples Most information contained in these slide comes from: These slides are intended.
XPath Tao Wan March 04, What is XPath? n A language designed to be used by XSL Transformations (XSLT), Xlink, Xpointer and XML Query. n Primary.
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
Overview of XPath Author: Dan McCreary Date: October, 2008 Version: 0.2 with TEI Examples M D.
Introduction to XPath Bun Yue Professor, CS/CIS UHCL.
XP ATH - XML Path Language. W HAT IS XP ATH ? XPath, the XML Path Language, is a query language for selecting nodes from an XML document.query languagenodesXML.
SD2520 Databases using XML and JQuery
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 3 XPath (Based on Møller and Schwartzbach,
Xpath Query Evaluation. Goal Evaluating an Xpath query against a given document – To find all matches We will also consider the use of types Complexity.
Navigating XML. Overview  Xpath is a non-xml syntax to be used with XSLT and Xpointer. Its purpose according to the W3.org is  to address parts of an.
CSE3201/CSE4500 XPath. 2 XPath A locator for elements or attributes in an XML document. XPath expression gives direction.
Lecture 6 of Advanced Databases XML Schema, Querying & Transformation Instructor: Mr.Ahmed Al Astal.
TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.
1/17 ITApplications XML Module Session 7: Introduction to XPath.
Introduction to XPath Web Engineering, SS 2007 Tomáš Pitner.
XML DOCUMENTS & DATABASES. Summary of Introduction to XML HTML vs. XML HTML vs. XML Types of Data Types of Data Basics of XML Basics of XML XML Syntax,
CSE3201/CSE4500 Information Retrieval Systems
XSLT and XPath, by Dr. Khalil1 XSL, XSLT and XPath Dr. Awad Khalil Computer Science Department AUC.
1 XPath XPath became a W3C Recommendation 16. November 1999 XPath is a language for finding information in an XML document XPath is used to navigate through.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation An Introduction to XQuery.
XMLI Structure of XML Data Structure of XML Data XML Document Schema XML Document Schema XPATH XPATH.
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 6 XSLT (Based on Møller and Schwartzbach,
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
Lecture 22 XML querying. 2 Example 31.5 – XQuery FLWOR Expressions ‘=’ operator is a general comparison operator. XQuery also defines value comparison.
XPath Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
Processing of structured documents Spring 2003, Part 7 Helena Ahonen-Myka.
XPath. Why XPath? Common syntax, semantics for [XSLT] [XPointer][XSLT] [XPointer] Used to address parts of an XML document Provides basic facilities for.
XSLT part of XSL (Extensible Stylesheet Language) –includes also XPath and XSL Formatting Objects used to transform an XML document into: –another XML.
More on XSLT 07. More on XSLT © Aptech Limited XPath  In this first lesson, XPath, you will learn to:  Define and describe XPath.  Identify nodes according.
IS432: Semi-Structured Data Dr. Azeddine Chikh. 6. XML Path (XPath)
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Query Data Model Lecturer.
August Chapter 6 - XPath & XPointer Learning XML by Erik T. Ray Slides were developed by Jack Davis College of Information Science and Technology.
Database Systems Part VII: XML Querying Software School of Hunan University
XPath Aug ’10 – Dec ‘10. XPath   XML Path Language   Technology that allows to select a part or parts of an XML document to process   XPath was.
WPI, MOHAMED ELTABAKH PROCESSING AND QUERYING XML 1.
XPath Presented by Kushan Athukorala. 2 Agenda XPath XPath Terminology Selecting Nodes Predicates.
1 XML Data Management Extracting Data from XML: XPath Werner Nutt based on slides by Sara Cohen, Jerusalem.
XML Data Management XPath Exercises: Right or Wrong? Werner Nutt.
Session II Chapter 3 – Chapter 3 – XPath Patterns & Expressions Chapter 4 – XPath Functions Chapter 15 – XPath 2.0http://
University of Nottingham School of Computer Science & Information Technology Introduction to XML 2. XSLT Tim Brailsford.
CSE3201/CSE4500 XPath. 2 XPath A locator for items in XML document. XPath expression gives direction of navigation.
XPath --XML Path Language Motivation of XPath Data Model and Data Types Node Types Location Steps Functions XPath 2.0 Additional Functionality and its.
CSE 6331 © Leonidas Fegaras XQuery 1 XQuery Leonidas Fegaras.
1 XPath. 2 Agenda XPath Introduction XPath Nodes XPath Syntax XPath Operators XPath Q&A.
XML Data Management XPath Exercises: Right or Wrong? Werner Nutt.
1 The XPath Language. 2 XPath Expressions Flexible notation for navigating around trees A basic technology that is widely used uniqueness and scope in.
CITA 330 Section 5 XPath. XSL XSL (Extensible Stylesheet Language) is the standard language for writing stylesheets to transform XML documents among different.
Displaying Data with XSLT ©NIITeXtensible Markup Language/Lesson 6/Slide 1 of 45 Objectives In this lesson, you will learn to: * Perform conditional formatting.
5 Copyright © 2004, Oracle. All rights reserved. Navigating XML Documents by Using XPath.
Querying and Transforming XML Data
{ XML Technologies } BY: DR. M’HAMED MATAOUI
XML Path Language Andy Clark 17 Apr 2002.
XQuery Leonidas Fegaras.
XML DOCUMENTS & DATABASES
Presentation transcript:

1 XML Data Management XPath Principles Werner Nutt

XPath Expressions and the XPath Document Model XPath expressions are evaluated over documents XPath operates on an abstract document structure Documents are trees with several types of nodes, such as –element nodes –attribute nodes –text nodes There are other node types (namespaces, comments, etc.), which we ignore in this lecture

The Recipes Example (DTD) <!ATTLIST ingredient name CDATA #REQUIRED amount CDATA #IMPLIED unit CDATA #IMPLIED> <!ATTLIST nutrition calories CDATA #REQUIRED fat CDATA #REQUIRED>

Zuppa Inglese Warm up the milk in a sauce pan. In a large bowl beat the egg yolks with the sugar. Refrigerate for at least 4 hours. A Recipe Document

Document Nodes Are Ordered Document order Element e1 is “before” e2 if the opening tag of e1 occurs before the opening tag of e2 Element e is “before” its attributes (order on attributes is implementation dependent, but most often attributes are ordered according to their a occurrence) If e1 is “before” e2, then all attributes of e1 are “before” all attributes of e2 Reverse document order is document order backwards

Expressions There are two kinds of expressions, returning either a set of nodes (= “node sets”), or a value (i.e., number, string, boolean) Mechanism: specify node sets compute values from node sets use values in conditions that further constrain a node set

The Basic Mechanism for Specifying Node Sets: Location Steps A location step goes in some direction (i.e., along some “axis”) leads to a node with a certain property Properties can be specified by node tests and zero or more predicates. Node tests test for node types: e.g., element (“ * ”), text ( “ text() “ ), element or attribute names Syntax: :: [ ]...[ ]

Location Steps: Examples descendant::* all descendant elements following-sibling::ingredient all “ingredient” siblings following in document order following::text() all following text all attributes

Location Steps: Examples all “amount” attributes all descendant elements with name “ingredient” where the attribute amount has the value 1.5 descendant::ingredient[position()=2] the second descendant element with name “ingredient” descendant::*[self::ingredient][2] same as above

Steps Can be Combined to Paths A path has a starting point, which can be the root of the document tree: "/“ a “current node” Example: /descendant::recipe[1] “ / “ has two meanings: “the root” at the beginning of a path step concatenation

Semantics of Steps and Paths (1) A step leads from a node (the “context node”) to a set of nodes (which may be empty) For any node n, and axis α, there is the set of nodes reachable from n via α, denoted as R α (n) (The definition of reachable nodes for an axis α is more or less as one would expect. More later on.)

Semantics of Steps and Paths (2) Consider a step S = α :: [ ] If S starts from a node n, then it returns the set S(n) of all nodes in R α (n) that satisfy –the test and –the predicate The set S(n) is the context for each node in the set.

A path P is a sequence of steps P = S 1 /S 2 /…/S n A path defines a set of nodes P(n) as follows: If P consists of a single step S, then P(n) = S(n) If P is a combined path P = P 0 /S, then P(n) = Union of all S(n 0 ) where n 0 in P 0 (n) The context of a result node is determined by the last step Semantics of Steps and Paths (3)

childthe children of the context node descendantall descendants (children, children’s children,...) parentthe parent (empty if at the root) ancestor all ancestors from the parent to the root selfthe context node itself following-siblingsiblings to the right preceding-siblingsiblings to the left Axes in XPath

followingall following nodes in the document, excluding descendants precedingall preceding nodes in the document, excluding ancestors attributethe attributes of the context node namespacenamespace declarations in the context node descendant-or-selfthe union of descendant and self ancestor-or-selfthe union of ancestor and self Axes in XPath (cntd.)

What should be the meaning of /descendant::recipe[last()] /preceding::recipe[2] ? Forward axes: child, descendant, following-sibling, following Reverse axes: ancestor, preceding-sibling, preceding By default, the ordering of a node set is document order. If a node set has been obtained by a step along a reverse axis, it is in reverse document order. Ordering of Axes

Testing by node type: text()character data nodes comment()comment nodes processing-instruction() processing instruction nodes node()all nodes (not including attributes and namespace declarations) Testing by node name (for elements and attributes): recipenodes with the name “recipe” *any element (*) or attribute node Node Tests

What is the meaning of /descendant::text() ? /descendant::* ? /descendant::node() ? ? ? Node Tests (Exercises)

There are shorthands for moving along the descendant and the child axis // means/descendant-or-self::node()/ /recipes means/child::recipes /* means/child::* /node() means/child::node(). meansself::node().. meansparent::* Shorthands

What is the difference between //* and //. ? What is the result of –//*[2] ? –//self::*[2] ? –//*[2]/self::* ? //* means /descendant-or-self::node()/child::* This is different from /descendant::* ! Why? Shorthands: Exercises

Predicates are expressions of type boolean, although they do not always look like that... A predicate filters a node-set by evaluating the predicate expression on each node in the set with –that node as the context node, –the size of the node-set as the context size, and –the position of the node in the node-set wrt. the axis ordering as the context position. Predicates can be combined with and, or, and not() Expressions that are not boolean are cast to boolean Predicates

Casting of node sets to boolean –true, if the set is nonempty –false otherwise Example: means: ingredients having a unit attribute Casting of Node Sets (1)

Casting of nodes to string Every text node has a string as its content Every element node has a string content: –the strings occurring in the node and its descendants, –concatenated in document order Every attribute node has a string value (which may be empty) Casting of Node Sets (2)

Casting of nodes to number Interpret the string value as a number If not possible, value is NaN (= "not a number") Casting of node sets to string, number, or boolean Take the value of the first node wrt to document order Casting of Node Sets (3)

XPath has explicit casting functions for the value types boolean, string, and number Essentially, they work as one would expect. –Note: an integer in a predicate is interpreted as referring to the position of the context node Examples: –string(true) = "true“, string(0) = "0“ –number(false) = 0, number(true) = 1, number("123") = 123, number("sugar") = NaN Casting Between Values

–boolean(0) = false, boolean(2) = true, boolean(NaN) = false –boolean("") = false, boolean("false") = true What is the meaning of /descendant::recipe/title["Ricotta Pie"] ? And what about /descendant::recipe/title[self::*="Ricotta Pie"] ? Casting Between Values (cntd.)

Things become more complicated if a node set is involved in an equality: = means: “The contains some node that has the value after casting.” Similarly for the case = Equalities

Examples “Recipes where sugar is one of the ingredients” = "sugar"] “Recipes with some ingredient other than sugar” != "sugar"] “Recipes where sugar is the only ingredient” /descendant::recipe[ not != "sugar")]

What is the meaning of /descendant-or-self::* [descendant-or-self::node() = "Zuppa Inglese"] ? What of /descendant-or-self::* [descendant-or-self::* = "Zuppa Inglese"] ? And what about /descendant-or-self::node() [descendant-or-self::node() = "Zuppa Inglese"] ? Exercise

XPath has built-in functions returning numbers, e.g. position()the position of the context node in the current node set last()the number of elements in the current node set With numbers and numeric functions one can build up arithmetic expressions 2, 2 * 2, last() div 2, last() -1, last() - position() Arithmetic Expressions

A predicate that contains only a numeric expression, e.g., [last() -1], is a shorthand for a position predicate, e.g., [position() = last() -1]. Otherwise, numbers are cast to boolean. Exercise: What is the meaning of //*[2 and ingredient] ? Arithmetic Expressions

The aggregation functions count and sum are applied to node sets and return numbers. –The count result is always a number. –The sum result is only a number if every node in the argument set can be cast as a number. Aggregation functions in a predicate refer to the current node set. Functions min, max, and avg do not exist in XPath (but in XQuery) Aggregation Functions

What is the meaning of sum(/descendant::ingredient and ? //recipe[count(ingredient) > 5] ? //recipe[count(.//ingredient) > 5] ? Examples

An expression can be: a constant, e.g. "..." a function call: function(arguments) a boolean expression: or, and, =, !=,, = (standard precedence, all left associative) a numerical expression: +, -, *, div, mod a node-set expression: using location paths and “ | ” (set union) XPath Expressions: Summary

Expressions have a type: node-set (set of nodes), boolean (true or false), number (floating point), or string (text) Coercion/casting may occur at function arguments and when expressions are used as predicates. Functions are evaluated using the context. XPath Expressions: Summary

Node-set functions: –last() returns the context size –position() returns the context position –count(node-set)number of nodes in node-set –name(node-set)string representation of first node in node-set –id(ID)returns element with id ID String functions: –string(value)type cast to string –concat(string, string,...)string concatenation Core function library (1)

Boolean functions: –boolean(value)type cast to boolean –not(boolean) boolean negation –contains(string, substring)substring test –starts-with(string, prefix)prefix test Number functions: –number(value) type cast to number –sum(node-set) sum of number value of each node in node-set Core function library (2)

Write XPath queries that ask for the following over the Recipes document: –The titles of all recipes, returned as strings. –The titles of all recipes that use olive oil. –The titles of all recipes that do not use olive oil. –The amount of sugar needed for Zuppa Inglese. –The recipes that have an ingredient in common with Zuppa Inglese. Exercises

–The number of recipes in the document. –The last step in preparing Zuppa Inglese. –The average fat content per recipe. –The recipes with less than average fat content. –The titles of recipes that have no compound ingredients. –The titles of recipes where all top level ingredients are compound. –The titles of recipes that have only non-compound ingredients. Exercises (cntd.)

The Family DTD <!DOCTYPE family [ <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED> ]>

Family Exercise Return the children of Marge. Return the names of the children of Marge. Return the father of the children of Marge.