A Quilt, not a Camel
Don Chamberlin, Jonathan Robie, Daniela Florescu
May 19, 2000

2 The Web Changes Everything
• All kinds of information can be made available everywhere, all the time
• XML is the leading candidate for a universal language for information interchange
• To realize its potential, XML needs a query language of comparable flexibility
• Several XML query languages have been proposed and/or implemented:
  • XPath, XQL, XML-QL, Lorel, YATL
  • Most are oriented toward a particular domain, such as semi-structured documents or databases

3 Goals of the Quilt Proposal
• Leverage the most effective features of several existing and proposed query languages
• Design a small, clean, implementable language
• Cover the functionality required by all the XML Query use cases in a single language
• Write queries that fit on a slide
• Design a quilt, not a camel
  • "Quilt" refers both to the origin of the language and to its intended use in knitting together heterogeneous data sources

4 Antecedents: XPath and XQL
• Closely related languages for navigating in a hierarchy
• A path expression is a series of steps
• Each step moves along an axis (children, ancestors, attributes, etc.) and may apply a predicate
• XPath has a well-defined abbreviated syntax:
    /book[title = "War and Peace"]/chapter[title = "War"]//figure[contains(caption, "Korea")]
• XQL adds some operators: BEFORE, AFTER, ...
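
To make "a path expression is a series of steps" concrete, the following Python sketch evaluates the abbreviated XPath example one step at a time with the standard-library `xml.etree.ElementTree` module. The sample document and the helper names (`child_step`, `descendant_step`, `text_of`) are hypothetical. Each step maps a list of context nodes to the nodes it reaches along one axis, filtered by a predicate; only the child and descendant axes are modeled, so this is an illustration, not an XPath implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document shaped to fit the slide's example path.
DOC = """
<book>
  <title>War and Peace</title>
  <chapter>
    <title>War</title>
    <section><figure><caption>Map of Korea</caption></figure></section>
    <figure><caption>Map of Russia</caption></figure>
  </chapter>
  <chapter>
    <title>Peace</title>
    <figure><caption>Korea again</caption></figure>
  </chapter>
</book>
"""

def text_of(node, child):
    """Text of the first child element named `child` (None if there is no such child)."""
    return node.findtext(child)

def child_step(nodes, name, pred=lambda n: True):
    """Child axis: children named `name` that satisfy `pred`, in document order."""
    return [c for n in nodes for c in n if c.tag == name and pred(c)]

def descendant_step(nodes, name, pred=lambda n: True):
    """Descendant axis (//): proper descendants named `name` that satisfy `pred`."""
    return [d for n in nodes for d in n.iter(name) if d is not n and pred(d)]

# Model the document node as a synthetic parent, so that /book is a child step.
document = ET.Element("#document")
document.append(ET.fromstring(DOC.strip()))

# /book[title = "War and Peace"]/chapter[title = "War"]//figure[contains(caption, "Korea")]
books = child_step([document], "book", lambda b: text_of(b, "title") == "War and Peace")
chapters = child_step(books, "chapter", lambda c: text_of(c, "title") == "War")
figures = descendant_step(chapters, "figure", lambda f: "Korea" in (text_of(f, "caption") or ""))
print([text_of(f, "caption") for f in figures])  # ['Map of Korea']
```

The figure in the "Peace" chapter is excluded by the second step's predicate, and the "Map of Russia" figure by the last one.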

5 Antecedent: XML-QL
• Proposed by Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu
• A WHERE clause binds variables according to a pattern; a CONSTRUCT clause generates the output document:
    WHERE $pname in "parts.xml",
          $sname in "supp.xml",
          in "sp.xml"
    CONSTRUCT $pname $sname

6 Antecedents: SQL and OQL
• SQL and OQL are database query languages
• SQL derives a table from other tables by a stylized series of clauses: SELECT-FROM-WHERE
• OQL is a functional language:
  • A query is an expression
  • Expressions can take several forms
  • Expressions can be nested and combined
  • SELECT-FROM-WHERE is one form of OQL expression

7 A First Look at Quilt
• "Find the description and average price of each red part that has at least 10 orders"
    FOR $p IN document("parts.xml")//part[color = "Red"]
    LET $o := document("orders.xml")//order[partno = $p/partno]
    WHERE count($o) >= 10
    RETURN $p/description, avg($o/price)
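
A rough Python equivalent of this query shows what each clause contributes. The sketch uses `xml.etree.ElementTree` and `statistics.mean` from the standard library; the documents are small hypothetical stand-ins for parts.xml and orders.xml, and it assumes each `part` has `partno`, `color`, and `description` children and each `order` has `partno` and `price`. It iterates over red `part` elements and matches orders through their `partno` child.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for parts.xml.
PARTS = """<parts>
  <part><partno>P1</partno><color>Red</color><description>Red gear</description></part>
  <part><partno>P2</partno><color>Blue</color><description>Blue gear</description></part>
  <part><partno>P3</partno><color>Red</color><description>Red bolt</description></part>
</parts>"""

def make_orders(counts):
    """Hypothetical stand-in for orders.xml: counts[partno] orders per part, priced 10, 11, 12, ..."""
    rows = [f"<order><partno>{pno}</partno><price>{10 + i}</price></order>"
            for pno, n in counts.items() for i in range(n)]
    return "<orders>" + "".join(rows) + "</orders>"

parts = ET.fromstring(PARTS)
orders = ET.fromstring(make_orders({"P1": 12, "P2": 20, "P3": 3}))

result = []
for p in parts.iter("part"):
    if p.findtext("color") != "Red":                  # FOR over //part[color = "Red"]
        continue
    o = [x for x in orders.iter("order")              # LET $o := all orders for this part
         if x.findtext("partno") == p.findtext("partno")]
    if len(o) >= 10:                                  # WHERE count($o) >= 10
        result.append((p.findtext("description"),     # RETURN description, avg(price)
                       mean(float(x.findtext("price")) for x in o)))
print(result)  # [('Red gear', 15.5)]
```

The blue part is rejected by the FOR predicate even though it has many orders, and the second red part is rejected by WHERE because it has only three.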

8 Quilt Expressions
• Like OQL, Quilt is a functional language: a query is an expression, and expressions can be composed
• Some types of Quilt expressions:
  • A path expression (using abbreviated XPath syntax):
      document("bids.xml")//bid[itemno="47"]/bid_amount
  • An expression using operators and functions:
      ($x + $y) * foo($z)
  • An element constructor:
      $u, $a
  • A "FLWR" expression

9 A FLWR Expression
• General form: FOR_clause LET_clause WHERE_clause RETURN_clause, that is,
    FOR ... LET ... WHERE ... RETURN ...
• A FLWR expression binds some variables, applies a predicate, and constructs a new result.

10 FOR Clause
    FOR variable IN expression, ...
• Each expression evaluates to a collection of nodes
• The FOR clause produces many binding-tuples from the Cartesian product of these collections
• In each tuple, the value of each variable is one node and its descendants
• The order of the tuples preserves document order unless some expression contains a non-order-preserving function such as distinct()
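
The Cartesian-product behavior of FOR can be sketched in Python with `itertools.product`. This minimal sketch assumes the collections are independent (no expression refers to a variable bound earlier in the same clause) and represents each binding-tuple as a dict; `for_clause` is a hypothetical helper name.

```python
from itertools import product

def for_clause(**collections):
    """FOR $x IN e1, $y IN e2, ...: one binding-tuple per combination, in order
    (the first variable varies slowest, so document order within each collection is kept)."""
    names = list(collections)
    return [dict(zip(names, combo)) for combo in product(*collections.values())]

tuples = for_clause(x=["a1", "a2"], y=["b1", "b2", "b3"])
print(len(tuples))   # 6
print(tuples[:2])    # [{'x': 'a1', 'y': 'b1'}, {'x': 'a1', 'y': 'b2'}]
```

Two collections of 2 and 3 nodes yield 2 × 3 = 6 tuples, each binding every variable to exactly one node.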

11 LET Clause
    LET variable := expression, ...
• A LET clause produces one binding for each variable (therefore the LET clause does not affect the number of binding-tuples)
• The variable is bound to the value of expression, which may contain many nodes
• Document order is preserved among the nodes in each bound collection, unless expression contains a non-order-preserving function such as distinct()
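
The difference between FOR and LET is easiest to see in code. In this hypothetical Python sketch, binding-tuples are dicts and `let_clause` adds a single binding per tuple that holds a whole (possibly empty) collection, so the number of tuples does not change. The order data is made up.

```python
def let_clause(tuples, name, expr):
    """LET $name := expr: extend each tuple with ONE binding, the whole collection
    expr(tuple) evaluates to. The number of tuples is unchanged."""
    return [{**t, name: expr(t)} for t in tuples]

orders = [("P1", 10), ("P1", 12), ("P2", 7)]              # hypothetical (partno, price) pairs
tuples = [{"p": "P1"}, {"p": "P2"}, {"p": "P3"}]           # e.g. produced by a FOR clause
tuples = let_clause(tuples, "o", lambda t: [o for o in orders if o[0] == t["p"]])
print(len(tuples))   # 3, the same as before the LET
print(tuples[2])     # {'p': 'P3', 'o': []}
```

A FOR over the same orders would instead produce one tuple per matching order, and P3 would produce no tuple at all.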

12 WHERE Clause
    WHERE boolean-expression
• Applies a predicate to the tuples of bound variables
• Retains only tuples that satisfy the predicate
• Preserves the order of tuples, if any
• May contain AND, OR, NOT
• Applies scalar conditions to scalar variables: $color = "Red"
• Applies set conditions to variables bound to sets: avg($emp/salary) > 10000

13 RETURN Clause
    RETURN expression
• Constructs the result of the FLWR expression
• Executed once for each tuple of bound variables
• Preserves the order of tuples, if any, or can impose a new order using a SORTBY clause
• Often uses an element constructor:
    $item/itemno, avg($b/bid_amount) SORTBY(itemno)
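
A minimal sketch of RETURN with SORTBY, assuming binding-tuples are dicts and results are plain Python dicts rather than constructed XML elements; `return_clause` and the field names are hypothetical. Because Python's `sorted` is stable, results with equal sort keys keep their original relative order.

```python
def return_clause(tuples, build, sortby=None):
    """RETURN expr [SORTBY key]: one result per tuple, in tuple order unless a key is given."""
    results = [build(t) for t in tuples]
    return sorted(results, key=sortby) if sortby is not None else results

tuples = [{"item": "I3", "bids": [5, 7]},
          {"item": "I1", "bids": [10]},
          {"item": "I2", "bids": [4, 6]}]
out = return_clause(
    tuples,
    lambda t: {"itemno": t["item"], "avg_bid": sum(t["bids"]) / len(t["bids"])},
    sortby=lambda r: r["itemno"],                   # SORTBY(itemno)
)
print(out)
# [{'itemno': 'I1', 'avg_bid': 10.0}, {'itemno': 'I2', 'avg_bid': 5.0}, {'itemno': 'I3', 'avg_bid': 6.0}]
```

Without `sortby`, the results would come back in tuple order (I3, I1, I2).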

14 Summary of FLWR Data Flow
• XML (an ordered forest of nodes) → FOR/LET → list of tuples of bound variables → WHERE → list of tuples of bound variables → RETURN → XML
• A list of tuples of bound variables looks like: ($x = value, $y = value, $z = value), ($x = value, $y = value, $z = value), ...
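
The whole data flow can be condensed into one hypothetical Python function, `flwr`: FOR variables produce tuples by Cartesian product, LET variables extend each tuple, WHERE filters, and RETURN maps each surviving tuple to a result. This is a sketch under simplifying assumptions (independent FOR collections, Python values instead of XML nodes), not a description of any Quilt implementation. The example applies the set condition avg($emp/salary) > 10000 to made-up salary data.

```python
from itertools import product

def flwr(for_vars, let_vars, where, ret):
    """FOR/LET -> list of binding-tuples -> WHERE (filter) -> RETURN (one result per tuple).
    for_vars: {name: collection}; let_vars: {name: function(tuple) -> collection}."""
    names = list(for_vars)
    tuples = [dict(zip(names, combo)) for combo in product(*for_vars.values())]
    for name, expr in let_vars.items():
        tuples = [{**t, name: expr(t)} for t in tuples]   # LET: one binding per tuple
    tuples = [t for t in tuples if where(t)]              # WHERE keeps tuple order
    return [ret(t) for t in tuples]                       # RETURN runs once per tuple

salaries = {"Sales": [9000, 9500], "R&D": [12000, 15000], "HR": []}   # hypothetical data

def avg(values):
    return sum(values) / len(values)

result = flwr(
    for_vars={"d": ["Sales", "R&D", "HR"]},
    let_vars={"emp": lambda t: salaries[t["d"]]},
    where=lambda t: len(t["emp"]) > 0 and avg(t["emp"]) > 10000,
    ret=lambda t: (t["d"], avg(t["emp"])),
)
print(result)  # [('R&D', 13500.0)]
```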

15 Simple Quilt queries
• "Find all the books published in 1998 by Penguin"
    FOR $b IN document("bib.xml")//book
    WHERE $b/year = "1998" AND $b/publisher = "Penguin"
    RETURN $b SORTBY(author, title)
• "Find titles of books that have no authors"
    FOR $b IN document("bib.xml")//book
    WHERE empty($b/author)
    RETURN $b/title SORTBY(.)

16 Nested queries
• "Invert the hierarchy from publishers inside books to books inside publishers"
    FOR $p IN distinct(//publisher)
    RETURN
        $p/text(),
        FOR $b IN //book[publisher = $p]
        RETURN $b/title, $b/price
        SORTBY(price DESCENDING)
    SORTBY(name)
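
The same inversion can be sketched in Python with `xml.etree.ElementTree`. The sample bibliography and the output element names (`listing`, `name`, `book`) are hypothetical, chosen for illustration. Publishers are deduplicated by their text value, as `distinct()` compares values, and prices are assumed to be numeric so that the descending sort is numeric.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml with publishers nested inside books.
BIB = """<bib>
  <book><title>A</title><publisher>Penguin</publisher><price>10</price></book>
  <book><title>B</title><publisher>Vintage</publisher><price>25</price></book>
  <book><title>C</title><publisher>Penguin</publisher><price>30</price></book>
</bib>"""

def invert(bib):
    """Books-inside-publishers view of a publishers-inside-books document."""
    out = ET.Element("listing")
    # Outer FOR $p IN distinct(//publisher) ... SORTBY(name)
    for name in sorted({p.text for p in bib.iter("publisher")}):
        pub = ET.SubElement(out, "publisher")
        ET.SubElement(pub, "name").text = name
        # Inner FOR $b IN //book[publisher = $p] ... SORTBY(price DESCENDING)
        books = [b for b in bib.iter("book") if b.findtext("publisher") == name]
        books.sort(key=lambda b: float(b.findtext("price")), reverse=True)
        for b in books:
            book = ET.SubElement(pub, "book")
            ET.SubElement(book, "title").text = b.findtext("title")
            ET.SubElement(book, "price").text = b.findtext("price")
    return out

print(ET.tostring(invert(ET.fromstring(BIB)), encoding="unicode"))
```

The inner loop runs once per publisher, which is how the nested FLWR expression regroups books under their publisher.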

17 Operators based on global ordering
    expr1 BEFORE expr2
    expr1 AFTER expr2
• Returns the nodes in expr1 that are before (after) some node in expr2
• "Find procedures where no anesthesia occurs before the first incision."
    FOR $proc IN //section[title = "Procedure"]
    WHERE empty($proc//anesthesia BEFORE ($proc//incision)[1])
    RETURN $proc
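
Assuming BEFORE and AFTER compare positions in document order (a pre-order traversal of the tree), "before some node in expr2" is equivalent to "before the last node of expr2", and "after some node" to "after the first node". The hypothetical Python helpers below use that observation with `xml.etree.ElementTree`, then run the anesthesia query on a made-up report.

```python
import xml.etree.ElementTree as ET

def before(e1, e2, pos):
    """expr1 BEFORE expr2: nodes of e1 that precede SOME node of e2 in document order."""
    if not e2:
        return []
    last = max(pos[n] for n in e2)
    return [n for n in e1 if pos[n] < last]

def after(e1, e2, pos):
    """expr1 AFTER expr2: nodes of e1 that follow SOME node of e2 in document order."""
    if not e2:
        return []
    first = min(pos[n] for n in e2)
    return [n for n in e1 if pos[n] > first]

# Hypothetical report: only the first procedure starts with an incision.
REPORT = """<report>
  <section><title>Procedure</title><incision/><anesthesia/><incision/></section>
  <section><title>Procedure</title><anesthesia/><incision/></section>
</report>"""

root = ET.fromstring(REPORT)
pos = {n: i for i, n in enumerate(root.iter())}    # document-order position of every node

selected = []
for i, proc in enumerate(root.iter("section")):
    if proc.findtext("title") != "Procedure":
        continue
    first_incision = list(proc.iter("incision"))[:1]                    # ($proc//incision)[1]
    if not before(list(proc.iter("anesthesia")), first_incision, pos):  # empty(... BEFORE ...)
        selected.append(i)
print(selected)  # [0]
```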

18 The FILTER Operator
    expression FILTER path-expression
• Returns the result of the first expression, "filtered" by the second expression
• The result is an "ordered forest" that preserves sequence and hierarchy
• [Diagram: a tree of A, B, and C nodes, and the forest of A and B nodes that remains after filtering]
    LET $x := /C
    $x FILTER //A | //B
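
The hierarchy-preserving behavior of FILTER can be sketched as a recursive walk: a node selected by the path is kept, with its kept descendants as its children, while an unselected node disappears and its kept descendants are promoted into its place. This Python sketch with `xml.etree.ElementTree` assumes the path expression can be reduced to a per-node predicate (true for `//A | //B`) and ignores text nodes; the sample tree is made up and is not the slide's diagram.

```python
import xml.etree.ElementTree as ET

def filter_forest(node, keep):
    """Sketch of `node FILTER path`, with the path given as a predicate `keep`.
    Returns an ordered forest (a list of new trees) in document order."""
    kept_children = [t for child in node for t in filter_forest(child, keep)]
    if keep(node):
        clone = ET.Element(node.tag, node.attrib)   # shallow copy of the selected node
        clone.extend(kept_children)                  # selected descendants become its children
        return [clone]
    return kept_children                             # unselected node: promote selected descendants

def shape(tree):
    """(tag, [child shapes]) summary used to display the result."""
    return (tree.tag, [shape(c) for c in tree])

x = ET.fromstring("<C><A><B/><C><A/></C></A><C><B><A/></B></C></C>")
forest = filter_forest(x, lambda n: n.tag in ("A", "B"))   # $x FILTER //A | //B
print([shape(t) for t in forest])
# [('A', [('B', []), ('A', [])]), ('B', [('A', [])])]
```

The inner A, originally a grandchild of the first A through a C node, becomes a direct child of that A once the C is filtered out.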

19 Projection (Filtering a document)
• "Generate a table of contents containing nested sections and their titles"
    document("cookbook.xml") FILTER //section | //section/title | //section/title/text()

20 Conditional Expressions
    IF expr1 THEN expr2 ELSE expr3
• "Make a list of holdings, ordered by title. For journals, include the editor; otherwise include the author."
    FOR $h IN //holding
    RETURN
        $h/title,
        IF $h/@type = "Journal" THEN $h/editor ELSE $h/author
    SORTBY(title)

21 Functions
• A query can define its own local functions
• If f is a scalar function, f(S) is defined as { f(s) : s ∈ S }
• Functions can be recursive
• "Compute the maximum depth of nested parts in the document named partlist.xml"
    FUNCTION depth($e)
    {
        IF empty($e/*) THEN 0 ELSE max(depth($e/*)) + 1
    }
    depth(document("partlist.xml") FILTER //part)
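
The recursive depth function translates directly into Python. In this sketch (standard-library `xml.etree.ElementTree`, made-up data), `depth` handles one node, and mapping it over a collection plays the role of the rule f(S) = { f(s) : s ∈ S }. The sample partlist.xml is hypothetical and contains only `part` elements below its root, so `FILTER //part` amounts to taking the root's children; a document with other element types would need a real FILTER step first.

```python
import xml.etree.ElementTree as ET

def depth(e):
    """depth($e): 0 for a node with no element children, else 1 + the deepest child."""
    children = list(e)                               # $e/*
    if not children:                                 # empty($e/*)
        return 0
    return max(depth(c) for c in children) + 1       # max(depth($e/*)) + 1

# Hypothetical partlist.xml: the first top-level part contains two levels of subparts.
PARTLIST = "<partlist><part><part><part/></part><part/></part><part/></partlist>"
top_level_parts = list(ET.fromstring(PARTLIST))      # stands in for FILTER //part here
print([depth(p) for p in top_level_parts])           # [2, 0]
```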

22 Quantified Expressions
    SOME var IN expr SATISFIES predicate
    EVERY var IN expr SATISFIES predicate
• Quantified expressions are a form of predicate (they return a Boolean)
• "Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph"
    FOR $b IN //book
    WHERE SOME $p IN $b//para SATISFIES
        contains($p, "Sailing") AND contains($p, "Windsurfing")
    RETURN $b/title
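
Quantified expressions correspond closely to Python's built-in `any` and `all`. The helpers `some`, `every`, and `contains` and the book data below are hypothetical; modeling `contains` as a case-sensitive substring test is an assumption.

```python
def some(collection, predicate):
    """SOME $v IN expr SATISFIES predicate: true if at least one member satisfies it."""
    return any(predicate(v) for v in collection)

def every(collection, predicate):
    """EVERY $v IN expr SATISFIES predicate: true if all members do (vacuously true if empty)."""
    return all(predicate(v) for v in collection)

def contains(text, s):
    """Hypothetical stand-in for contains(): a case-sensitive substring test."""
    return s in text

# Hypothetical books: title -> list of paragraph texts ($b//para).
books = {
    "Sea Sports": ["Sailing and Windsurfing in one paragraph.", "Only sailing here."],
    "Boats": ["Sailing.", "Windsurfing."],
}
titles = [title for title, paras in books.items()
          if some(paras, lambda p: contains(p, "Sailing") and contains(p, "Windsurfing"))]
print(titles)  # ['Sea Sports']
```

"Boats" mentions both words, but never in the same paragraph, so the quantifier rejects it.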

23 Variable Bindings
    LET variable := expression EVAL expression
• "For each book that is more expensive than average, list the title and the amount by which the book's price exceeds the average price"
    LET $a := avg(//book//price)
    EVAL
        FOR $b IN //book
        WHERE $b/price > $a
        RETURN $b/title, $b/price - $a

24 Relational Queries
• Example tables: PARTS (PNO, DESCRIP), SUPPLIERS (SNO, SNAME), CATALOG (SNO, PNO, PRICE)
• Tables can be represented by simple XML trees:
  • Table = root
  • Each row becomes a nested element
  • Each data value becomes a further nested element
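
The default mapping just described can be sketched with `xml.etree.ElementTree`. The function name `table_to_xml`, the lowercase element names, and the row element name `p_tuple` (borrowed from the queries on the following slides) are assumptions for illustration; values are converted to text with `str`.

```python
import xml.etree.ElementTree as ET

def table_to_xml(table, columns, rows, row_tag):
    """Default mapping sketch: the table becomes the root element, each row a nested
    element, and each data value a further nested element named after its column."""
    root = ET.Element(table)
    for row in rows:
        r = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(r, column).text = str(value)
    return root

parts = table_to_xml("parts", ["pno", "descrip"], [("P1", "Gear"), ("P2", "Bolt")], "p_tuple")
print(ET.tostring(parts, encoding="unicode"))
# <parts><p_tuple><pno>P1</pno><descrip>Gear</descrip></p_tuple><p_tuple><pno>P2</pno><descrip>Bolt</descrip></p_tuple></parts>
```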

25 SQL vs. Quilt
• "Find part numbers of gears, in numeric order"
• SQL:
    SELECT pno, descrip
    FROM parts AS p
    WHERE descrip LIKE '%Gear%'
    ORDER BY pno;
• Quilt:
    FOR $p IN document("parts.xml")//p_tuple
    WHERE contains($p/descrip, "Gear")
    RETURN $p/pno SORTBY(.)

26 GROUP BY and HAVING
• "Find part numbers and average prices for parts with 3 or more suppliers"
• SQL:
    SELECT pno, avg(price) AS avg_price
    FROM catalog AS c
    GROUP BY pno
    HAVING count(*) >= 3
    ORDER BY pno;
• Quilt:
    FOR $p IN distinct(document("parts.xml")//pno)
    LET $c := document("catalog.xml")//c_tuple[pno = $p]
    WHERE count($c) >= 3
    RETURN $p, avg($c/price)
    SORTBY(pno)
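
The Quilt version expresses grouping by iterating over distinct part numbers and binding each group with LET; SQL's HAVING becomes an ordinary WHERE. A Python sketch of the same pattern, using hypothetical in-memory catalog rows and `statistics.mean` in place of avg():

```python
from statistics import mean

parts = ["P1", "P2", "P3"]                        # hypothetical distinct(document("parts.xml")//pno)
catalog = [("S1", "P1", 10.0), ("S2", "P1", 12.0), ("S3", "P1", 14.0),   # (sno, pno, price)
           ("S1", "P2", 5.0), ("S2", "P2", 7.0)]

result = []
for p in sorted(set(parts)):                       # FOR $p IN distinct(...), SORTBY(pno)
    c = [row for row in catalog if row[1] == p]   # LET $c := ...//c_tuple[pno = $p]
    if len(c) >= 3:                                # WHERE count($c) >= 3 (SQL's HAVING)
        result.append((p, mean(price for _, _, price in c)))   # RETURN $p, avg($c/price)
print(result)  # [('P1', 12.0)]
```

P3 has no catalog rows at all, so its group is empty and the count test removes it, just as SQL's GROUP BY would never form that group.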

27 Inner Join
• "Return a 'flat' list of supplier names and their part descriptions"
• Quilt:
    FOR $c IN document("catalog.xml")//c_tuple,
        $p IN document("parts.xml")//p_tuple[pno = $c/pno],
        $s IN document("suppliers.xml")//s_tuple[sno = $c/sno]
    RETURN $s/sname, $p/descrip
    SORTBY(sname, descrip)

28 Outer Join
• "List names of all suppliers in alphabetic order; within each supplier, list the descriptions of parts it supplies (if any)"
• Quilt:
    FOR $s IN document("suppliers.xml")//s_tuple
    RETURN
        $s/sname,
        FOR $c IN document("catalog.xml")//c_tuple[sno = $s/sno],
            $p IN document("parts.xml")//p_tuple[pno = $c/pno]
        RETURN $p/descrip SORTBY(.)
    SORTBY(sname)
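
Because the inner FLWR expression is evaluated once per supplier, a supplier with no catalog entries still appears, with an empty list of parts; that is what makes this an outer join. A Python sketch over hypothetical in-memory relations (lists of dicts) produces the same shape of result.

```python
# Hypothetical relations in the shape of the PARTS, SUPPLIERS, and CATALOG tables.
suppliers = [{"sno": "S2", "sname": "Bolts Inc"}, {"sno": "S1", "sname": "Acme"},
             {"sno": "S3", "sname": "Idle Co"}]
parts = [{"pno": "P1", "descrip": "Gear"}, {"pno": "P2", "descrip": "Axle"}]
catalog = [{"sno": "S1", "pno": "P1"}, {"sno": "S1", "pno": "P2"}, {"sno": "S2", "pno": "P1"}]

def outer_join():
    out = []
    for s in sorted(suppliers, key=lambda s: s["sname"]):     # outer FOR ... SORTBY(sname)
        descrips = sorted(p["descrip"]                           # inner FOR ... SORTBY(.)
                          for c in catalog if c["sno"] == s["sno"]
                          for p in parts if p["pno"] == c["pno"])
        out.append((s["sname"], descrips))                       # empty list if no parts
    return out

print(outer_join())
# [('Acme', ['Axle', 'Gear']), ('Bolts Inc', ['Gear']), ('Idle Co', [])]
```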

29 Defining XML Views of Relations
• Use an SQL query to define the data you want to extract (in tabular form)
• Use a simple default mapping from tables to XML trees
• Use a Quilt query to compose the XML trees into a view with any desired structure
• Quilt queries against the view are composed with the Quilt query that defines the view

30 Quilt grammar (1)
• Queries and functions:
    query ::= function_defn* expr
    function_defn ::= 'FUNCTION' function_name '(' variable_list ')' '{' expr '}'
• Example of a function definition:
    FUNCTION spouse_age($x) { $x/spouse/age }
• Functions:
  • Core XML Query Language library: avg, contains, empty, ...
  • Domain-dependent library: e.g. area of a polygon
  • Local functions: e.g. spouse_age($x)

31 Quilt grammar (2)
• Expressions:
    expr ::= variable
           | constant
           | expr infix_operator expr
           | prefix_operator expr
           | function_name '(' expr_list? ')'
           | '(' expr ')'
           | expr '[' expr ']'
           | 'IF' expr 'THEN' expr 'ELSE' expr
           | 'LET' variable ':=' expr 'EVAL' expr
• Infix operators:
    + - * div mod = < <= > >= != | AND OR NOT UNION INTERSECT EXCEPT BEFORE AFTER
• Prefix operators:
    + - NOT

32 Quilt grammar (3)
• Expressions, continued:
    expr ::= path_expression
           | element_constructor
           | FLWR_expression
    element_constructor ::= start_tag expr_list? end_tag
    start_tag ::= '<' tag_name attributes? '>'
    attributes ::= ( attr_name '=' expr )+ | 'ATTRIBUTES' expr
    end_tag ::= '</' tag_name '>'
    tag_name ::= QName | variable
    attr_name ::= QName | variable

33 Quilt grammar (4)
• FLWR expressions:
    FLWR_expression ::= for_clause ( for_clause | let_clause )* where_clause? return_clause
    for_clause ::= 'FOR' variable 'IN' expr (',' variable 'IN' expr)*
    let_clause ::= 'LET' variable ':=' expr (',' variable ':=' expr)*
    where_clause ::= 'WHERE' expr
    return_clause ::= 'RETURN' expr

34 Quilt grammar (5)
• Second-order expressions:
    expr ::= expr 'FILTER' path_expression
           | quantifier variable 'IN' expr 'SATISFIES' expr
           | expr 'SORTBY' '(' expr order? (',' expr order?)* ')'
    quantifier ::= 'SOME' | 'EVERY'
    order ::= 'ASCENDING' | 'DESCENDING'

35 Comments on the Grammar
• In general, the correctness of a program/query is enforced by:
  • Syntactic rules (e.g. grammar)
  • Semantic rules (e.g. variable and function scope)
  • Type-checking rules (e.g. the expression in the WHERE clause must be of type Boolean)
• The Quilt grammar is quite permissive
  • It deals with only the first of the above items
• The Quilt grammar is just a beginning. Still to come:
  • Core function library
  • Type-checking rules
  • Formal semantic specification

36 Summary
• XML is a very versatile markup language
• Quilt is a query language designed to be as versatile as XML
• Quilt draws features from several other languages
• Quilt can pull together data from heterogeneous sources
• Quilt can help XML realize its potential as a universal language for data interchange

