
1 XML Query languages--XPath

2 Objectives: Understand XPath, and be able to use XPath expressions to find fragments of an XML document. Understand tree patterns, which represent special XPath expressions.

3 XML Query Languages. XPath: core query language. Very limited, a glorified selection operator. Very useful, though: used in XML Schema, XSLT, XQuery, and many other XML standards. XQuery: W3C standard. Very powerful, fairly intuitive, SQL-style; will be discussed later.

4 Why Query XML? Need to extract parts of XML documents. Need to transform documents into different forms. Need to relate (join) parts of the same or different documents.

5 XPath: W3C standard (versions 1.0, 2.0, 3.0). XPath views an XML document as a tree.
– The root of the tree is a new node, which doesn’t correspond to anything in the document
– Internal nodes are elements
– Leaves are either attributes or text nodes

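6 Example document [XML markup not preserved in this transcript; the surviving text values are: Italian, G Lau, 2005, 30.00 (first book) and Harry Potter, Rowling, Loyed, 2005, 29.99 (second book)]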

7 Tree Representation
root
  bookstore
    book (@category = "cooking")
      title (@lang = "en"): "Italian"
      author: "G. Lau"
      year: "2005"
      price: "30"
    book (@category = "children")
      title (@lang = "en"): "Harry Potter"
      author: "Rowling"
      author: "Loyed"
      year: "2005"
      price: "29.99"
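The tree view can be reproduced with a short sketch in Python's standard `xml.etree.ElementTree`. The XML string below is an assumed reconstruction of the example document (its markup is not in the transcript), and `describe` is a hypothetical helper name. Note that ElementTree has no separate document-root node, so the output starts at `bookstore`.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the slides' example; not the original markup.
BOOKS_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Italian</title>
    <author>G. Lau</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>Rowling</author>
    <author>Loyed</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
""".strip()


def describe(node, depth=0):
    """Return one indented line per element: tag, attributes, then text."""
    attrs = "".join(f' @{k}="{v}"' for k, v in node.attrib.items())
    text = (node.text or "").strip()
    lines = ["  " * depth + node.tag + attrs + (f' "{text}"' if text else "")]
    for child in node:  # element children, in document order
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(ET.fromstring(BOOKS_XML))
print("\n".join(tree_lines))
```

Attributes are printed on the element that carries them, which mirrors the slide's point that attributes are unordered leaves attached to an element.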

8 Relationships between nodes. Parent-child: element children are ordered (1st, 2nd, …); attribute children are unordered. Ancestor-descendant. Sibling (nodes having the same parent).

9 XPath Basics: An XPath expression takes a document tree as input and returns a multi-set of nodes of the tree (the subtrees rooted at those nodes). doc("name.xml") corresponds to the root. doc("name.xml")/* returns all element children of the root. Here / represents the child axis and * represents "any element".
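A minimal sketch of the child step and the wildcard, using Python's standard `xml.etree.ElementTree` (which supports only a small XPath subset). Two assumptions: the XML string is a reconstruction of the bookstore example, and because ElementTree has no document-root node, the parsed top element stands in for the result of doc("Books.xml")/*.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bookstore example.
BOOKS_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Italian</title><author>G. Lau</author>
    <year>2005</year><price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title><author>Rowling</author>
    <author>Loyed</author><year>2005</year><price>29.99</price>
  </book>
</bookstore>
""".strip()

# fromstring returns the document element, i.e. the single node that
# doc("Books.xml")/* would select.
doc_element = ET.fromstring(BOOKS_XML)

# One more child step with the wildcard: doc("Books.xml")/*/*
grandchildren = doc_element.findall("*")

print(doc_element.tag)
print([e.tag for e in grandchildren])
```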

10 A few more examples
doc("Books.xml")/bookstore/book -- all book elements under bookstore
doc("Books.xml")/bookstore/book[1] -- the first book child of the bookstore element
doc("Books.xml")//book[3]/author[2] -- the 2nd author child of every book that is the 3rd book child of its parent
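These positional steps can be tried with Python's standard `xml.etree.ElementTree`, as a rough sketch. The XML is an assumed reconstruction of the example document; since it holds only two books, the last query uses book[2] where the slide uses book[3].

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bookstore example.
BOOKS_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Italian</title><author>G. Lau</author>
    <year>2005</year><price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title><author>Rowling</author>
    <author>Loyed</author><year>2005</year><price>29.99</price>
  </book>
</bookstore>
""".strip()

root = ET.fromstring(BOOKS_XML)  # the bookstore element

all_books = root.findall("book")      # /bookstore/book
first_book = root.findall("book[1]")  # /bookstore/book[1]

# //book[2]/author[2]: positions count among same-named siblings of one parent.
second_author = root.findall(".//book[2]/author[2]")

print(len(all_books))
print(first_book[0].find("title").text)
print([a.text for a in second_author])
```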

11 More examples
doc("Books.xml")//* -- all element descendants of the root
doc("Books.xml")//data(@lang) -- values of all @lang attributes
doc("Books.xml")//book[price<30] -- all books with a price less than 30
doc("Books.xml")//book[price] -- all book elements that have a price child
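A sketch of the same four queries in Python. ElementTree's XPath subset has no `data()` function and no `<` comparison, so those two are expressed in plain Python instead; the XML is an assumed reconstruction of the example document.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bookstore example.
BOOKS_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Italian</title><author>G. Lau</author>
    <year>2005</year><price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title><author>Rowling</author>
    <author>Loyed</author><year>2005</year><price>29.99</price>
  </book>
</bookstore>
""".strip()

root = ET.fromstring(BOOKS_XML)

# //*  : every element, including the document element itself
all_elements = list(root.iter())

# //data(@lang)  : attribute values, collected by hand
lang_values = [e.get("lang") for e in root.iter() if "lang" in e.attrib]

# //book[price<30]  : true if at least one price child is numerically below 30
cheap_books = [
    b for b in root.iter("book")
    if any(float(p.text) < 30 for p in b.findall("price"))
]

# //book[price]  : existence test, supported directly by ElementTree
priced_books = root.findall(".//book[price]")

print(len(all_elements), lang_values)
print([b.find("title").text for b in cheap_books], len(priced_books))
```

The first book's price is exactly 30.00, so only the second book passes the strict comparison.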

12 More examples
doc("Books.xml")//book[title/@lang='en'] -- all books that have a title child with a lang attribute whose value is 'en'
doc("Books.xml")//title[@lang] -- all title descendants of the root that have a lang attribute
doc("Books.xml")//book[.//@lang][price]/title -- titles of all books that have a lang attribute at or below them and a price child (inside a predicate, a path starting with // is evaluated from the document root, so the relative form .// is needed)
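A rough Python equivalent of these attribute predicates, assuming the reconstructed example document. ElementTree does not accept a path inside a predicate, so the first query is rewritten as "parents of matching title children of book", and the descendant attribute test uses a hypothetical helper `has_lang_at_or_below`.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bookstore example.
BOOKS_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Italian</title><author>G. Lau</author>
    <year>2005</year><price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title><author>Rowling</author>
    <author>Loyed</author><year>2005</year><price>29.99</price>
  </book>
</bookstore>
""".strip()

root = ET.fromstring(BOOKS_XML)

# //book[title/@lang='en'], written as a path that steps back up to the book
books_en = root.findall(".//book/title[@lang='en']/..")

# //title[@lang]
titles_with_lang = root.findall(".//title[@lang]")


def has_lang_at_or_below(elem):
    """True if elem or any of its descendants carries a lang attribute."""
    return any("lang" in e.attrib for e in elem.iter())


# titles of books with a lang attribute at or below them and a price child
titles = [
    t
    for b in root.findall(".//book[price]")
    if has_lang_at_or_below(b)
    for t in b.findall("title")
]

print([b.get("category") for b in books_en])
print([t.text for t in titles])
```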

13 More examples
doc("Books.xml")//book[.//author | price]/title -- titles of books that have an author descendant or a price child
doc("Books.xml")//book[.//author and price<30]/title
doc("Books.xml")//book[.//author][price<'30']/title -- titles of books that have an author descendant and a price less than 30
doc("Books.xml")//title/.. -- parent elements of all title elements
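The "or", "and" and parent-step queries can be sketched in Python as follows. The XML is an assumed reconstruction of the example document, and the two helper functions are hypothetical names; the boolean logic is done in Python because ElementTree's XPath subset has no `|`, `and` or `<` inside predicates. The author test is evaluated relative to each book, which is the reading the slide's descriptions give.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bookstore example.
BOOKS_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Italian</title><author>G. Lau</author>
    <year>2005</year><price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title><author>Rowling</author>
    <author>Loyed</author><year>2005</year><price>29.99</price>
  </book>
</bookstore>
""".strip()

root = ET.fromstring(BOOKS_XML)
books = root.findall(".//book")


def has_author_descendant(book):
    """Relative test: an author element somewhere below this book."""
    return book.find(".//author") is not None


def price_below(book, limit):
    """True if some price child of the book is numerically below limit."""
    return any(float(p.text) < limit for p in book.findall("price"))


# author descendant OR price child
either = [b.find("title").text for b in books
          if has_author_descendant(b) or b.find("price") is not None]

# author descendant AND price below 30
both = [b.find("title").text for b in books
        if has_author_descendant(b) and price_below(b, 30)]

# //title/..  : the parent step is supported directly
parents = root.findall(".//title/..")

print(either, both, [p.tag for p in parents])
```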

14 Axes: An XPath expression navigates the tree along various axes: child, attribute, descendant, self, parent, descendant-or-self, ancestor, ancestor-or-self, following-sibling, preceding-sibling, following, preceding.

15 Use of axes: /axis-name::nodeSelector goes along the named axis to find the matching nodes.
Examples:
doc("Books.xml")//@lang/parent::*
doc("Books.xml")//author[3]/following-sibling::*
Abbreviations: /child:: is written /; /descendant-or-self::node()/ is written //; parent::node() is written ..
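ElementTree has no axis syntax, so the sketch below imitates the parent and following-sibling axes with a child-to-parent map. The XML is an assumed reconstruction of the example document, `parent_of` and `following_siblings` are hypothetical names, and author[1] is used where the slide has author[3] because no book in the assumed document has three authors.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bookstore example.
BOOKS_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Italian</title><author>G. Lau</author>
    <year>2005</year><price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title><author>Rowling</author>
    <author>Loyed</author><year>2005</year><price>29.99</price>
  </book>
</bookstore>
""".strip()

root = ET.fromstring(BOOKS_XML)

# Elements do not store a parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def following_siblings(node):
    """Element siblings that come after node (following-sibling::*)."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


# //@lang/parent::*  : the elements that carry a lang attribute
lang_owners = [e for e in root.iter() if "lang" in e.attrib]

# //author[1]/following-sibling::*
first_authors = root.findall(".//author[1]")
after_first_author = {
    a.text: [s.tag for s in following_siblings(a)] for a in first_authors
}

print([e.tag for e in lang_owners])
print(after_first_author)
```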

16 Functions: XPath defines some useful functions that can be used in an expression, e.g., name(), node(), data(), text(), contains(), substring() (strictly, node() and text() are node tests rather than functions). Refer to http://www.w3.org/TR/xpath/#corelib

17 Examples. Compare:
doc("Books.xml")//book/*
doc("Books.xml")//book/node()
doc("Books.xml")//book/text()
doc("Books.xml")//book/name()
doc("Books.xml")//book/title/*
doc("Books.xml")//book/title/node()
doc("Books.xml")//book/title/text()
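The difference between `*`, `node()` and `text()` shows up most clearly on one book and its title. The sketch below assumes the reconstructed, indented example document; in ElementTree, an element's text nodes are spread over its `.text` and the `.tail` of each child, so they are gathered by hand.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bookstore example (indentation matters here).
BOOKS_XML = """
<bookstore>
  <book category="cooking">
    <title lang="en">Italian</title>
    <author>G. Lau</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>
""".strip()

root = ET.fromstring(BOOKS_XML)
book = root.find("book")
title = book.find("title")

# //book/*  : element children only
element_children = [c.tag for c in book]

# //book/text()  : text-node children; here only the indentation whitespace
book_text_nodes = [book.text] + [c.tail for c in book]

# //book/title/*  : a title has no element children
title_children = list(title)

# //book/title/text()  : the title's single text node
title_text = title.text

print(element_children)
print([repr(t) for t in book_text_nodes])
print(title_children, title_text)
```

node() would return the element children and the text nodes together, in document order.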

18 Tree Pattern Queries (TPQ): Tree patterns represent a special subclass of XPath expressions:
–No functions or value comparisons
–No disjunction (i.e., no '|')
–Attributes and elements that have text children are treated the same way (no distinction between them)
–Only child and descendant axes
–No ordering of siblings in the XML tree
Example: /bookstore/book[.//title//lang][price]/author

19 Definition of TPQ: A TPQ is a tree P such that
–Every node v is labelled with either a symbol in the label set Σ or the wildcard *
–Every edge is labelled / or //
–There is a selection node sn(P)
–The path from the root to sn(P) is called the selection path
Example: the tree pattern with nodes book, title, author, tel and name (selection node: author) represents the XPath expression /book[title]//author[.//tel][name]

20 Evaluating TP P over XML tree t: Returns a set, P(t), of subtrees of t. Each subtree in P(t) is generated by an embedding h of P in t, i.e., a mapping from the nodes of P to the nodes of t that is
–Root-preserving: h(root(P)) = root(t)
–Label-preserving: label(v) = label(h(v)) or label(v) = *
–Structure-preserving: a /-edge maps to an edge, and a //-edge maps to a path
The subtree of t rooted at h(sn(P)) is an answer in P(t).
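The embedding definition can be turned into a small, unoptimized evaluator. This is only a sketch under stated assumptions: `TNode`, `PNode`, `embeds` and `answers` are hypothetical names, the example tree is made up for illustration, and embeddings are not required to be injective (the definition above does not ask for that).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality and hashing
class TNode:
    label: str
    children: list = field(default_factory=list)


@dataclass(eq=False)
class PNode:
    label: str                                 # a symbol or the wildcard "*"
    edges: list = field(default_factory=list)  # (axis, PNode), axis "/" or "//"
    selected: bool = False                     # True for the selection node


def descendants(t):
    for child in t.children:
        yield child
        yield from descendants(child)


def targets(t, axis):
    """Nodes a pattern edge leaving t may map to: children or descendants."""
    return t.children if axis == "/" else descendants(t)


def embeds(p, t):
    """Can the sub-pattern rooted at p be embedded with p mapped to t?"""
    if p.label != "*" and p.label != t.label:  # label-preserving
        return False
    return all(any(embeds(q, u) for u in targets(t, axis))
               for axis, q in p.edges)         # structure-preserving


def on_selection_path(p):
    return p.selected or any(on_selection_path(q) for _, q in p.edges)


def answers(p, t):
    """Images of the selection node over all embeddings with p mapped to t."""
    if not embeds(p, t):
        return []
    if p.selected:
        return [t]
    found = []
    for axis, q in p.edges:
        if on_selection_path(q):
            for u in targets(t, axis):
                found.extend(answers(q, u))
    return list(dict.fromkeys(found))          # drop duplicates, keep order


# Hypothetical tree: a book with two authors, only the first has a tel.
a1 = TNode("author", [TNode("name"),
                      TNode("contact", [TNode("tel"), TNode("fax")])])
a2 = TNode("author", [TNode("name"), TNode("contact", [TNode("email")])])
t = TNode("book", [TNode("title"), TNode("authors", [a1, a2])])

# Pattern for /book[title]//author[.//tel][name], selection node author.
author_p = PNode("author", [("//", PNode("tel")), ("/", PNode("name"))],
                 selected=True)
P = PNode("book", [("/", PNode("title")), ("//", author_p)])

result = answers(P, t)  # root-preserving: root(P) is matched against root(t)
print([n.label for n in result])
```

Calling `answers` on the two roots enforces the root-preserving condition; trying every node of t as the starting point instead would give the twig-style matching without that condition.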

21 Evaluating TP P over XML tree t - example [Figure: an XML tree t with nodes book, title, authors, author, name, contact, tel, fax, email, and a tree pattern P with nodes book, title, author, tel, name]

22 Twig Patterns: Twig patterns are similar to TPQs, except that (1) when finding an embedding, there is no need for it to be root-preserving; (2) the output is a set of nodes corresponding to the set of nodes in the twig pattern.

23 Algorithms for efficient evaluation of TPQs: TPQs and twig patterns lie at the core of XPath. Many algorithms have been developed for evaluating such queries over the last 10 years (see optional readings). Nearly all algorithms utilize some encoding (a.k.a. labeling) of the XML tree that enables constant-time checking of ancestor-descendant or parent-child relationships.

24 Encoding of XML trees: Dewey code
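The slide gives only the name of the scheme. As commonly described, a Dewey code labels each node with its parent's code extended by the node's position among its siblings, so an ancestor's code is a proper prefix of a descendant's code. The sketch below assumes the root is labelled (1,) and uses hypothetical function names and a toy document.

```python
import xml.etree.ElementTree as ET


def dewey_encode(root):
    """Map each element to a tuple: the parent's code plus a sibling index."""
    codes = {}

    def visit(node, code):
        codes[node] = code
        for position, child in enumerate(node, start=1):
            visit(child, code + (position,))

    visit(root, (1,))  # assumption: the root gets the code (1,)
    return codes


def is_ancestor(a, b):
    """a is an ancestor of b iff a's code is a proper prefix of b's code."""
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    return is_ancestor(a, b) and len(b) == len(a) + 1


# Toy document: a has children b and d; b has child c.
root = ET.fromstring("<a><b><c/></b><d/></a>")
codes = dewey_encode(root)
by_tag = {node.tag: code for node, code in codes.items()}
print(by_tag)
```

With these labels, `is_ancestor(by_tag["a"], by_tag["c"])` holds while `is_parent(by_tag["a"], by_tag["c"])` does not, and both checks look only at the two codes, never at the tree.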

25 Encoding of XML trees, continued: Region code
–Each node is encoded with a 3-tuple (start, end, level)
–Node A is an ancestor of node B if and only if A.start < B.start and B.end < A.end
–Node A is the parent of node B if and only if it is an ancestor of B and A.level = B.level - 1

26 Dynamic encoding: When XML documents are updated, the code for each node may need to change. This can be expensive for large documents with frequent update operations. Dynamic encoding aims to reduce the number of nodes whose encoding needs to be changed.
–Search in Google for “dynamic labelling of XML” and you should see many publications
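One way to produce such region codes is a depth-first traversal with a single counter that ticks once when a node is entered and once when it is left. This is a sketch under that assumption (other numbering schemes satisfy the same two conditions); the function names and the toy document are illustrative.

```python
import xml.etree.ElementTree as ET


def region_encode(root):
    """Map each element to (start, end, level) using one DFS counter."""
    codes = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter          # tick on entering the node
        for child in node:
            visit(child, level + 1)
        counter += 1             # tick on leaving the node
        codes[node] = (start, counter, level)

    visit(root, 0)
    return codes


def is_ancestor(a, b):
    """A.start < B.start and B.end < A.end."""
    return a[0] < b[0] and b[1] < a[1]


def is_parent(a, b):
    """Ancestor, and exactly one level above."""
    return is_ancestor(a, b) and a[2] == b[2] - 1


# Toy document: a has children b and d; b has child c.
root = ET.fromstring("<a><b><c/></b><d/></a>")
codes = region_encode(root)
by_tag = {node.tag: code for node, code in codes.items()}
print(by_tag)
```

Both relationship checks are a fixed number of integer comparisons on the two codes, which is the constant-time property the evaluation algorithms rely on.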
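The update cost is easy to observe with a static scheme. The sketch below relabels a toy document with a simple counter-based region code after one insertion and counts how many existing nodes end up with a different code; the scheme, the function names and the document are assumptions made for illustration, not a dynamic labelling method.

```python
import xml.etree.ElementTree as ET

TOY_XML = "<a><b><c/></b><d/></a>"  # a has children b and d; b has child c


def region_encode(root):
    """Static region code: (start, end, level) from one DFS counter."""
    codes = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1
        codes[node] = (start, counter, level)

    visit(root, 0)
    return codes


def relabelled_after_insert(index):
    """Insert a new child of the root at index; count changed old codes."""
    root = ET.fromstring(TOY_XML)
    before = region_encode(root)
    root.insert(index, ET.Element("new"))
    after = region_encode(root)
    return sum(1 for node, code in before.items() if after[node] != code)


changed_front = relabelled_after_insert(0)  # new first child of the root
changed_back = relabelled_after_insert(2)   # new last child of the root
print(changed_front, changed_back)
```

Inserting at the front shifts the code of every existing node, while appending at the end only changes the root's end value; dynamic schemes aim to keep the number of changes small wherever the insertion happens.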

