1 XML Query Languages Yanlei Diao University of Massachusetts Amherst Slide content courtesy of Ramakrishnan & Gehrke, Donald Kossmann, and Gerome Miklau.

Slides:



Advertisements
Similar presentations
XML Data Management 8. XQuery Werner Nutt. Requirements for an XML Query Language David Maier, W3C XML Query Requirements: Closedness: output must be.
Advertisements

XML: Extensible Markup Language
Spring Part III: Introduction to XPath XML Path Language.
Web Data Management XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections.
XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
XPath Eugenia Fernandez IUPUI. XML Path Language (XPath) a data model for representing an XML document as an abstract node tree a mechanism for addressing.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
1 Part 3: Query Languages Managing XML and Semistructured Data.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
Friday, September 4 th, 2009 The Systems Group at ETH Zurich XML and Databases Exercise Session 6 courtesy of Ghislain Fourny/ETH © Department of Computer.
Querying XML (cont.). Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?
IS432: Semi-Structured Data Dr. Azeddine Chikh. 7. XQuery.
QSX (LN 3)1 Query Languages for XML XPath XQuery XSLT (not being covered today!) (Slides courtesy Wenfei Fan, Univ Edinburgh and Bell Labs)
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 357 Database Systems I Query Languages for XML.
1 COS 425: Database and Information Management Systems XML and information exchange.
Managing XML and Semistructured Data Lecture 6: XPath Prof. Dan Suciu Spring 2001.
XML May 1 st, XML for Representing Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” persons.
1 Introduction to Database Systems CSE 444 Lecture 11 Xpath/XQuery April 23, 2008.
1 Lecture 11: Xpath/XQuery Friday, October 20, 2006.
XPath Carissa Mills Jill Kerschbaum. What is XPath? n A language designed to be used by both XSL Transformations (XSLT) and XPointer. n Provides common.
XPath Tao Wan March 04, What is XPath? n A language designed to be used by XSL Transformations (XSLT), Xlink, Xpointer and XML Query. n Primary.
XML –Query Languages, Extracting from Relational Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria.
XML, XML Schema, XPath and XQuery Query Languages CS561 Slides collated from several sources, including D. Suciu at Univ. of Washington.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Xpath to XQuery February 23rd, Other Stuff HW 3 is out. Instructions for Phase 3 are out. Today: finish Xpath, start and finish Xquery. From Wednesday:
1 Lecture 16: Querying XML Data: XPath, XQuery Friday, February 11, 2005.
Querying XML February 12 th, Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will.
Processing of structured documents Spring 2003, Part 8 Helena Ahonen-Myka.
Introduction to XPath Bun Yue Professor, CS/CIS UHCL.
Introduction to XQuery Resources: Official URL: Short intros:
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives.
TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation An Introduction to XQuery.
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 6 XSLT (Based on Møller and Schwartzbach,
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
Lecture 22 XML querying. 2 Example 31.5 – XQuery FLWOR Expressions ‘=’ operator is a general comparison operator. XQuery also defines value comparison.
Introduction to XQuery Bun Yue Professor, CS/CIS UHCL.
Processing of structured documents Spring 2003, Part 7 Helena Ahonen-Myka.
XPath. Why XPath? Common syntax, semantics for [XSLT] [XPointer][XSLT] [XPointer] Used to address parts of an XML document Provides basic facilities for.
XSLT part of XSL (Extensible Stylesheet Language) –includes also XPath and XSL Formatting Objects used to transform an XML document into: –another XML.
Management of XML and Semistructured Data Lecture 5: Query Languages Wednesday, 4/1/2001.
August Chapter 6 - XPath & XPointer Learning XML by Erik T. Ray Slides were developed by Jack Davis College of Information Science and Technology.
Lecture 6: XML Query Languages Thursday, January 18, 2001.
Database Systems Part VII: XML Querying Software School of Hunan University
CSE 636 Data Integration Fall 2006 XML Query Languages XPath.
XML query. introduction An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever.
More XML: semantics, DTDs, XPATH February 18, 2004.
The Semistructured-Data Model Programming Languages for XML Spring 2011 Instructor: Hassan Khosravi.
1 XML Data Management XPath Principles Werner Nutt.
1 XQuery Slides From Dr. Suciu. 2 XQuery Based on Quilt, which is based on XML-QL Uses XPath to express more complex queries.
IS432 Semi-Structured Data Lecture 6: XQuery Dr. Gamal Al-Shorbagy.
IS432 Semi-Structured Data Lecture 4: XPath Dr. Gamal Al-Shorbagy.
XPath --XML Path Language Motivation of XPath Data Model and Data Types Node Types Location Steps Functions XPath 2.0 Additional Functionality and its.
CSE 6331 © Leonidas Fegaras XQuery 1 XQuery Leonidas Fegaras.
XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections and sorting 2.
Lecture 17: XPath and XQuery Wednesday, Nov. 7, 2001.
1 Lecture 12: XML, XPath, XQuery Friday, October 24, 2003.
1 The XPath Language. 2 XPath Expressions Flexible notation for navigating around trees A basic technology that is widely used uniqueness and scope in.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
5 Copyright © 2004, Oracle. All rights reserved. Navigating XML Documents by Using XPath.
SDPL 2005Notes 7: XQuery1 7 Querying XML n How to access different sources (DBs, docs) as XML? n XQuery, W3C XML Query Language –"work in progress", (last.
XML path expressions CSE 350 Fall 2003.
Querying and Transforming XML Data
Lecture 11: Xpath/XQuery
Lecture 12: XML, XPath, XQuery
XQuery Leonidas Fegaras.
Lecture 15: Querying XML Friday, October 27, 2000.
Lecture 11: XML and Semistructured Data
Presentation transcript:

1 XML Query Languages Yanlei Diao University of Massachusetts Amherst Slide content courtesy of Ramakrishnan & Gehrke, Donald Kossmann, and Gerome Miklau

2 Querying XML How do you query a directed graph? a tree?  The standard approach used by many XML, semistructured-data, and object query languages: Define some sort of a template describing traversals from the root of the directed graph  In XML, the basis of this template is called an XPath.  The complete query language is XQuery.

3 Querying XML Data  XPath = simple navigation through the tree  XQuery = the SQL of XML Selecting data  pattern matching on structural & path properties  typical selection conditions Constructing output, or transforming data  constructing new elements  restructuring  ordering  XSLT = recursive traversal will not discuss in class

4 XML Path Language (XPath) 1.0  XPath 1.0: navigates through hierarchical XML document structure applies selection conditions retrieves nodes  Input: a tree of nodes Types of node: Document, Element, Attribute, Text…  Output: an object of one of the four types Node set, Boolean, Number, String  An ordered language It can query in order-aware fashion. It always returns nodes in order.

5 Sample Data for Queries Addison-Wesley Serge Abiteboul Rick Hull Victor Vianu Foundations of Databases 1995 Freeman Jeffrey D. Ullman Principles of Database and Knowledge Base Systems 1998

6 Data Model for XPath bib book publisherauthor.. Addison-WesleySerge Abiteboul Document Node Root Element Node Text Node Price = 50 Attribute Node Ordered Tree: - ordered among elements - unordered among attributes

7 XPath /bib/book/year /bib/paper/year //author /bib//first-name //author/* /bib/book/author[firstname] /bib/book/author[firstname][address[.//zip][city]]/lastname child operator “/” descendant operator “//” wildcard operator “*” attribute operator nested path more nested paths and ‘.’ operator

8 XPath: Simple Expressions Result: Result: empty (there were no papers) /bib/book/year /bib/paper/year

9 XPath: Restricted Kleene Closure Result: Serge Abiteboul Rick Hull Victor Vianu Jeffrey D. Ullman Result: Rick //author /bib//first-name

10 Xpath: Text Nodes Result: Serge Abiteboul Victor Vianu Jeffrey D. Ullman Rick Hull doesn’t appear because he has firstname, lastname Functions in XPath: – text() = matches the text value – node() = matches any node (= *, or text()) – name() = returns the name of the current tag /bib/book/author/text()

11 Xpath: Wildcard Result: Rick Hull * Matches any element //author/*

12 Xpath: Attribute Nodes Result: means that price has to be an attribute

13 Xpath: Predicates Result: Rick Hull /bib/book/author[firstname]

14 Xpath: More Predicates Result: … … /bib/book/author[firstname][address[.//zip][city]]/lastname

15 XPath: Summary bib matches a bib element * matches any element / matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper | book matches a paper or a matches a price attribute price attribute in book, in bib matches…

16 Axes: More Complex Traversals  Thus far, we’ve seen XPath expressions that go down the tree  But we might want to go up, left, right, etc.  These are expressed with so-called axes. For details see XQuery 1.0…

17 XQuery 1.0  A programming language that can retrieve fragments from XML transform XML data to XML data in arbitrary ways  Declarative querying + functional programming  Currently being widely used for querying XML databases transforming XML messages in Web services routing XML messages in message brokering creating logical views in data integration…

18 Query Language and Data Model  A query language is “closed” w.r.t. its data model if input and output of a query conform to the model  SQL Set of tuples in, set of tuples out  XPath 1.0 A tree of nodes (well-formed XML) in, an object (node set, boolean, number, or string) out  XQuery 1.0 Sequence of items in, sequence of items out  Compositionality of a language Output of Query 1 can be used as input to Query 2

19 XQuery Data Model  Instance of the data model: a sequence composed of zero or more items Empty sequence often considered as the “null value”  Item: node or atomic value  Node: document | element | attribute | text | namespace | processing instruction | comment  Atomic value: an indivisible value, e.g., string, boolean, ID, IDREF, decimal, QName, URI,...

20 Atomic values  The values of the 19 atomic types available in XML Schema ( ) E.g., xs:integer, xs:boolean, xs:date  All the user defined derived atomic types E.g., myNS:ShoeSize derived from xs:integer  Untyped atomic values (non schema validated)  Atomic values carry both type and value (8, myNS:ShoeSize) is not the same as (8, xs:integer)

21 XML nodes  Types of nodes: document | element | attribute | text | namespaces | PI | comment  Every node has a unique node identifier In an instance of data model, every node is unique  Nodes have children and an optional parent conceptual “tree”  Nodes are ordered based of the topological order in the tree (“document order”)  Nodes can be compared directly by (i) identity, (ii) document order, or (iii) value after conversion from node to value

22 Sequences  Can be heterogeneous (nodes and atomic values) (, 3)  Can contain duplicates (by value and by identity) (1, 1, 1)  Are not necessarily ordered in document order  Nested sequences are automatically flattened ( 1, 2, (3, 4) ) = (1, 2, 3, 4)  Single items and singleton sequences are the same 1 = (1)

23 XQuery Expressions XQuery Expr := Constants | Variable | FunctionCalls | ComparisonExpr | ArithmeticExpr | LogicExpr | PathExpr | FLWRExpr | ConditionalExpr | QuantifiedExpr | TypeSwitchExpr | InstanceofExpr | CastExpr | UnionExpr | IntersectExceptExpr | ConstructorExpr | ValidateExpr Expressions can be nested with full generality ! Functional programming heritage.

24 Path Expressions  A path expression consists of a sequence of steps  A step contains (axis node test predicate*)  An axis controls the navigation direction in the tree Forward axes: (“attribute” | “child” | “descendant” | “self” | “descendant-or-self” | “following-sibling” | “following” ) “::” Backward axes: (“parent” | “ancestor” | “ancestor-or- self” | “preceding-sibling” | “preceding”) “::” Given a context node (the evaluation context), an axis returns a sequence of nodes

25 Path Expressions (cont’d)  An node test filters the sequence of nodes that an axis selects. Name test: e.g. publisher, * ( wildcard for name test), myNS:publisher, *:publisher, myNS:*, *:* Node kind test: e.g. node(), text(), comment(), … Type test: e.g. attribute(*, xs:integer)

26 Path Expressions (cont’d)  A predicate further filters the nodes selected by the axis and retained by the node test. axis :: node test [pred 1] [pred 2] … [pred n]  A predicate can be any expression, whose result is coerced to a boolean value; node retained if true. On attribute: descendant::toy[attribute::color = “red”] On text data: descendant::toy[child::text() = “pooh”] On position: child::chapter[position() = 2] On other elements: child::chapter[child::figure] context node axis node test pred 1 pred 2 pred n sequence of nodes

27 Semantics of Path Expressions  Semantics of step1 / step2 : Evaluate step1 => sequence of nodes (o.w. runtime error) For each node in this sequence: Bind the evaluation context to this node. Evaluate step2 with this binding => Case 1: sequence of nodes. (i) Eliminate duplicates by node identity; (ii) Sort by document order Case 2: sequence of atomic values. Concatenate the partial sequences.  Implicit iteration through steps of a path expression

28 Non-Abbreviated Syntax  Step in the non-abbreviated syntax: axis “::” node test ( “[“ predicate “]”)*  doc(“bibliography.xml”)/child::bib  $x/child::bib/child::book/attribute::year  $x/descendant::author  $x/child:chapter [child::figure]  $x/child::book/child::chapter[fn:position() = 5]/child::section[fn:position() = 2]  $x/child::para[attribute::type = “warning”] [fn:position() = 5]  $x/child::para [fn:position() = 5] [attribute::type = “warning”]

29 Abbreviated Syntax  Axis can be missing By default the child axis $x/child::person -> $x/person  Shorthand notations for common axes Descendent: $x/descendant-or-self::*/child::chapter -> $x//chapter Parent: $x/parent::* -> $x/.. Attribute: $x/attribute::year-> Self: $x/self::* -> $x/.  In a predicate, function fn:position() can be omitted Positional predicate: [fn:position() = 2] -> [2]

30 Examples of Abbreviated Syntax  doc(“bibliography.xml”)/bib  $x/bib/book/year  $x/chapter[figure]  $x//author  $x/book/chapter[5]/section[2]  = “warning”][5]  = “warning”]  Typical mistakes: $x/a/b[1] means $x/a/(b[1]) and not ($x/a/b)[1] $x//chapter[1]($x/descendant-or- self::*/child::chapter[1]) is NOT the same as $x/descendant::chapter[1].

31 Simple Iteration Expression  Syntax : for variable in expression1 return expression2  Example for $x in doc(“bib.xml”)/bib/book return $x/title  Semantics: bind the variable $x to each node returned by expression1 for each such binding, evaluate expression2 concatenate the resulting sequences nested sequences in the “for” clause are automatically flattened

32 FLWOR expressions  FLOWR is a high-level construct that supports iteration and binding of variables to intermediate results is useful for joins and restructuring data  Syntax: For-Let-Where-Order by-Return for $x in expression1 /* similar to FROM in SQL */ [ let $y := expression2 ] /* no analogy in SQL */ [ where expression3 ] /* similar to WHERE in SQL */ [ order by expression4 (ascending|descending)? (empty (greatest|least))? ] /* similar to ORDER-BY in SQL */ return expression4 /* similar to SELECT in SQL */

33 Example FLOWR Expression for $x in doc(“bib.xml”)/bib/book // iterate, bind each item to $x let $y := $x/author // no iteration, bind a sequence to $y where $x/title=“XML” // filter each tuple ($x, $y) order by descending // order tuples return count($y) // one result per surviving tuple The for clause iterates over all books, binding $x to each book in turn. For each binding of $x, the let clause binds $y to all authors of this book. The result of for and let clauses is a tuple stream in which each tuple contains a pair of bindings for $x and $y, i.e. ($x, $y). The where clause filters each tuple ($x, $y) by checking predicates. The order by clause orders surviving tuples. Atomization is implicitly applied to the order by expression to yield an atomic value for sorting. The return clause returns the count of $y for each surviving tuple.

34 FOR versus LET for $x in doc("bib.xml")/bib/book return $x for $x in doc("bib.xml")/bib/book return $x Returns:... let $x in document("bib.xml")/bib/book return $x let $x in document("bib.xml")/bib/book return $x Returns:...

35 Getting Distinct Values from FOR  Distinct values: the fn:distinct-values function eliminates duplicates in a sequence by value The for expression evaluates to a sequence of nodes fn:distinct-values converts it to a sequence of atomic values and removes duplicates for $a in distinct-values(doc(“bib.xml”)/book/author) return {$a} versus for $a in doc(“bib.xml”)/book/author return $a

36 More Examples of WHERE  Selections for $b in doc("bib.xml")/bib/book where $b/publisher = “Addison Wesley" and = "1998" return $b/title for $b in doc("bib.xml")/bib/book where empty($b/author) return $b/title for $b in doc("bib.xml")/bib/book where count($b/author) = 1 return $b/title Aggregates over a sequence: count, avg, sum, min, max

37 Value Comparison  Value comparison “ eq” : compares single values  “eq” applies atomization ( fn:data ( )) to each operand Given a sequence of nodes, fn:data ( ) returns an atomic value for each node which consists of: a string value, i.e., the concatenation of the string values of all its Text Node descendants in document order a type, e.g., xdt:untypedAtomic For each operand, “eq” uses the fn:data() result if it evaluates to a singleton sequence, o.w. runtime error. Jeffery Ullman  for $a in doc(“bib.xml”)/bib/book/author where $a eq “JefferyUllman” return $a/.. for $b in doc(“bib.xml”)/bib/book where $b/author eq “JefferyUllman” return $b/author

38 General Comparison  General comparison operators (=, !=,, =): existentially quantified comparisons, applied to operand sequences of any length  Atomization (fn:data()) is applied to each operand to get a sequence of atomic values  Comparison is true if one value from a sequence satisfies the comparison for $b in doc(“bib.xml”)/bib/book where $b/author = “JefferyUllman” return $b/author (1, 2) = (2, 3)?

39 Node Comparison  Node comparison by identity for $b in doc(“bib.xml”)/bib/book where $b/author[last eq “Ullman”] is $b/author[first eq “Jeffery”] return $b/title  Node comparison by document order for $b in doc(“bib.xml”)/bib/book where $b/author[. eq “JefferyUllman”] << $b/author[. eq “JenniferWidom”] return $b/title

40 String Operations  Functions for string matching fn:contains(xs:string, xs:string) fn:starts(ends)-with(xs:string, xs:string fn:substring-before(after)(xs:string, xs:string) fn:matches($input as xs:string, $pattern as xs:string) …  Again, atomization (fn:data()) is applied to each function parameter to get an atomic value. for $a in doc(“bib.xml”)//author where contains($a, “Ullman") return $a Jeffery Ullman Jeffery Ullman

41 Order By  Order by: applies atomization (fn:data( )) to the sort by expression to yield an atomic value for sorting runtime error if the sort by expression evaluates to a sequence of size > 1 for $a in doc(“bib.xml”)//author order by $a return ($a, $a/../title) for $b in doc(“bib.xml”)/bib/book order by $b/author[1] return {$b/title} {$b/author} Need “(“ “)” if return contains multiple items. Need “{“ “}” around paths inside element construction.

42 Joins in FOR, LET, WHERE  Joins for $b in doc(“bib.xml”)//book, $p in doc(“publishers.xml")//publisher where $b/publisher = $p/name return ($b/title, $p/name, $p/address) for $d in doc("depts.xml")/depts/deptno let $e := doc("emps.xml")/emps/emp[deptno = $d] where count($e) >= 10 order by avg($e/salary) descending return { $d, {count($e)}, {avg($e/salary)} } A tuple here is ($b, $p), a unique combination of bindings of $b, $p. A tuple here is ($b, $e)

43 Node Constructors  Node constructor : constructs new nodes elements, attributes, text, documents, processing instructions, comments element constructor is the most commonly used  Direct element constructor : a form of element constructor in which the element name is a constant Harold and the Purple Crayon Crockett Johnson

44 Element Construction in RETURN  Literal versus Evaluated Element Content for $p in doc(“report.xml”)//person return ( literal text content/* literal*/, { $person/name } /* {evaluated content} */, { $person/name/text()}, Yesterday {$person/name} visited us. )  Braces "{}" are used to delimit enclosed expressions, which will be evaluated and replaced by their value.  If return contains more than 1 item, use “()” and the comma operator.

45 More on Element Construction  The start tag of an element may contain attributes. The value of an attribute can be: a string of characters enclosed in quotes, e.g., evaluated using enclosed expressions specified in { } { for $b in doc("bib.xml")/bib/book where $b/publisher = "Addison-Wesley" and > 1991 return { $b/title } }  The largest “wrapping” tag creates well-formed XML.

46 Nested FLWOR { for $a in distinct-values(doc(“bib.xml”)/book/author) order by $a return {$a} { for $b in doc(“bib.xml”)/book[author = $a] order by $b/title return $b/title } } The nested FLOWR effectively implements “group books by author”. No Group By in XQuery!

47 If-Then-Else Expressions  For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. { for $b in doc("bib.xml")//book where count($b/author) > 0 return { $b/title } { for $a in $b/author[position()<=2] return $a } { if (count($b/author) > 2) then else () } }

48 XQuery Implementation  Open Source Saxon (Michael Kay) Galax (AT&T, Mary Fernandez)  Commercial IBM, Microsoft, Oracle (with DB products) BEA System (WebLogic Integration) Some freelancers  Visit:

49 Questions

50 XPath: More Predicates < 60] < 25] /bib/book[author/text()]

51 Axes: More Complex Traversals Thus far, we’ve seen XPath expressions that go down the tree  But we might want to go up, left, right, etc.  These are expressed with so-called axes : self::path-step child::path-stepparent::path-step descendant::path-stepancestor::path-step descendant-or-self::path-stepancestor-or-self::path-step preceding-sibling::path-stepfollowing-sibling::path-step preceding::path-stepfollowing::path-step  The previous XPaths we saw were in “abbreviated form”