IBM Almaden Research Center © 2006 IBM Corporation On the Path to Efficient XML Queries Andrey Balmin, Kevin Beyer, Fatma Özcan IBM Almaden Research Center.


1 On the Path to Efficient XML Queries
Andrey Balmin, Kevin Beyer, Fatma Özcan (IBM Almaden Research Center)
Matthias Nicola (IBM Silicon Valley Lab)

2 New languages = new abilities + new pitfalls
• XQuery
  – A new query language designed specifically for XML data
• SQL/XML
  – Added XML as a data type, including XQuery sequences
  – Added XQuery as a sublanguage

3 Purpose
• Teach users of the new XML query languages
• Teach the teachers
• Share our users’ experiences
• Influence languages

4 Focus
• Large databases with many moderate XML documents
• Schema flexibility is required
  – Many schemas in one collection
  – No schema validation used
  – Schemas with xs:any
  – Documents like Atom Syndication and RSS that allow any extension
• Therefore
  – Document filtering is primary concern
  – Limited type inference
  – Any data is possible

5 Index eligibility
An index I is eligible to answer predicate P of query Q if, for any collection of XML documents D, the following holds:
    Q(D) = Q(I(P, D))
where I(P, D) is the set of XML documents produced by probing index I with predicate P.
This is not as obvious as it is in relational databases.
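The definition can be checked on a toy collection. The following Python sketch is only a model of the idea, not DB2 behavior: the documents, the in-memory "index", and the function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Toy collection D: document id -> XML text (made-up data for illustration).
DOCS = {
    1: '<order><lineitem price="175"/><lineitem price="150"/></order>',
    2: '<order><lineitem price="20"/></order>',
    3: '<order><lineitem price="201"/></order>',
}

def build_index(docs):
    """Toy index on //lineitem/@price AS double: sorted (value, doc_id) pairs."""
    entries = []
    for doc_id, text in docs.items():
        for item in ET.fromstring(text).iter("lineitem"):
            entries.append((float(item.get("price")), doc_id))
    return sorted(entries)

def probe(index, docs, low):
    """I(P, D): the documents with at least one indexed value > low."""
    hits = {doc_id for value, doc_id in index if value > low}
    return {doc_id: docs[doc_id] for doc_id in hits}

def query(docs, low=100.0):
    """Q: ids of orders that contain a lineitem with @price > low."""
    result = []
    for doc_id, text in sorted(docs.items()):
        root = ET.fromstring(text)
        if any(float(li.get("price")) > low for li in root.iter("lineitem")):
            result.append(doc_id)
    return result

index = build_index(DOCS)
full_scan = query(DOCS)                        # Q(D)
via_index = query(probe(index, DOCS, 100.0))   # Q(I(P, D))
print(full_scan, via_index)
```

The index is eligible here because running Q over the pre-filtered documents gives the same answer as running Q over the whole collection.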

6 XML indexes in DB2
• Index a linear XPath pattern over a column as a particular data type
    CREATE INDEX index-name ON table(xml-column) USING 'pattern' AS type

    pattern   ::= namespace-decls? (( / | // ) axis? ( name-test | kind-test ))+
    axis      ::= @ | child:: | attribute:: | self:: | descendant:: | descendant-or-self::
    name-test ::= qname | * | ncname:* | *:ncname
    kind-test ::= node() | text() | comment() | processing-instruction()
    type      ::= varchar | double | date | timestamp

7 Query pattern ⊆ index pattern
    CREATE INDEX li_price ON orders(orddoc)
      USING XMLPATTERN '//lineitem/@price' AS double
• Can use the index: more restrictive
    for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
    return $i
• Cannot use the index: less restrictive
    for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@* > 100 ]
    return $i
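A small Python model shows why the less restrictive query cannot be answered from the index. The sample documents and the quantity attribute are assumptions; the point is only that the index on @price never sees values stored in other attributes.

```python
import xml.etree.ElementTree as ET

# Made-up documents: in order 2 only @quantity exceeds 100.
DOCS = {
    1: '<order><lineitem price="175" quantity="2"/></order>',
    2: '<order><lineitem price="20" quantity="500"/></order>',
}

def docs_in_price_index(docs, low):
    """Documents a //lineitem/@price index would return for '> low'."""
    return {d for d, text in docs.items()
            for li in ET.fromstring(text).iter("lineitem")
            if li.get("price") is not None and float(li.get("price")) > low}

def docs_matching_any_attribute(docs, low):
    """Documents matching lineitem/@* > low, evaluated without an index."""
    return {d for d, text in docs.items()
            for li in ET.fromstring(text).iter("lineitem")
            for value in li.attrib.values()
            if float(value) > low}

from_index = docs_in_price_index(DOCS, 100.0)
true_answer = docs_matching_any_attribute(DOCS, 100.0)
print(from_index, true_answer)
```

Probing the index would drop order 2, which the query must return, so pre-filtering with this index would change the result.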

8 Match index and query predicate data type
    CREATE INDEX li_price ON orders(orddoc)
      USING XMLPATTERN '//lineitem/@price' AS double
• Can use the index: numeric predicate and index
    for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
    return $i
• Cannot use the index: string predicate
    for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > "100" ]
    return $i
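The difference between the two predicates can be modeled in Python. The sample values are made up, and the sketch assumes that values which do not cast to double are simply left out of a double index; string comparison is modeled by Python's code point ordering.

```python
# Assumed sample values of @price as they might appear in untyped documents.
raw_prices = ["20", "99.50", "175", "N/A"]

def double_index(values):
    """Keys of an index declared AS double: values that do not cast are skipped."""
    keys = []
    for v in values:
        try:
            keys.append(float(v))
        except ValueError:
            pass  # "N/A" is not a number, so it gets no index entry
    return keys

# Numeric predicate: @price > 100
numeric_hits = [k for k in double_index(raw_prices) if k > 100]

# String predicate: @price > "100" compares character by character
string_hits = [v for v in raw_prices if v > "100"]

print(numeric_hits)
print(string_hits)
```

The string predicate qualifies values such as "20" and "N/A" that the numeric comparison rejects or that the double index does not contain at all, so the double index cannot answer it.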

9 Data Types for Joins
    CREATE INDEX o_custid ON orders(orddoc) USING XMLPATTERN '//custid' AS double
    CREATE INDEX c_custid ON customer(cdoc) USING XMLPATTERN '/customer/id' AS double
• Cannot use the indexes: unknown comparison type
    for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order
    for $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customer
    where $i/custid = $j/id
    return $i
• Can use the indexes: at least one cast required
    for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order
    for $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customer
    where $i/custid/xs:double(.) = $j/id/xs:double(.)
    return $i

10 SQL/XML Query Functions
• XMLQuery
  Scalar function that returns a (possibly empty) XQuery sequence for every row
• XMLExists
  Predicate that returns true iff the XQuery sequence produced is not empty
• XMLTable
  Produces a table with one row for each item in the row-producing XQuery sequence, and with one column per column-producing XQuery expression. The columns may be XQuery sequences or cast to simple SQL types.

11 XMLQuery does not filter rows (usually)
• Cannot use the index:
    SELECT XMLQuery('$order//lineitem[ @price > 100 ]' passing orddoc as "order")
    FROM orders
  Result: one sequence per row, including empty ones: (LI1, LI2), (), (LI3), ()
• Can use the index:
    VALUES (XMLQuery('db2-fn:xmlcolumn("ORDERS.ORDDOC")//lineitem[ @price > 100 ]'))
  Result: a single sequence: (LI1, LI2, LI3)
• Can use the index:
    db2-fn:xmlcolumn('ORDERS.ORDDOC')//lineitem[ @price > 100 ]
  Result: three items: LI1, LI2, LI3

12 XMLExists filters rows (usually)
• Can use the index
    SELECT ordid, orddoc FROM orders
    WHERE XMLExists('$order//lineitem[ @price > 100 ]' passing orddoc as "order")
• Cannot use the index: false exists (a sequence containing the boolean false is not empty)
    SELECT ordid, orddoc FROM orders
    WHERE XMLExists('$order//lineitem/@price > 100' passing orddoc as "order")
Need XMLTest, which uses XQuery’s Effective Boolean Value
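The "false exists" pitfall can be modeled with plain Python lists standing in for XQuery sequences. This is a simplified sketch: the price values are made up, and the effective boolean value function covers only the cases needed here.

```python
def xml_exists(sequence):
    """XMLExists as described above: true iff the sequence is not empty."""
    return len(sequence) > 0

def effective_boolean_value(sequence):
    """Simplified EBV: handles an empty sequence or a single boolean."""
    if not sequence:
        return False
    if len(sequence) == 1 and isinstance(sequence[0], bool):
        return sequence[0]
    return True  # stands in for a sequence that starts with a node

prices = [20.0, 35.0]  # assumed @price values of one order, none above 100

# $order//lineitem[ @price > 100 ] returns the qualifying items (none here)
filtered_items = [p for p in prices if p > 100]

# $order//lineitem/@price > 100 returns a single boolean
comparison_result = [any(p > 100 for p in prices)]

print(xml_exists(filtered_items))                  # row is filtered out
print(xml_exists(comparison_result))               # row qualifies anyway
print(effective_boolean_value(comparison_result))  # what a test on the value would give
```

With the predicate inside brackets the row is rejected; with the bare comparison the row always qualifies, so no index can reduce the set of rows.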

13 XMLQuery + XMLExists vs. XMLTable
• Can use the index
    SELECT ordid,
           XMLQuery('$order//lineitem[@price > 100]' passing orddoc as "order")
    FROM orders
    WHERE XMLExists('$order//lineitem[@price > 100]' passing orddoc as "order")
• XMLTable: more efficient and less redundant
    SELECT o.ordid, t.lineitem
    FROM orders o,
         XMLTable('$order//lineitem[@price > 100]' passing o.orddoc as "order"
                  COLUMNS "lineitem" XML BY REF PATH '.') as t(lineitem)

14 Predicates in XMLTable column expressions
• Can use the index
    SELECT o.ordid, t.lineitem
    FROM orders o,
         XMLTable('$order//lineitem[@price > 100]' passing o.orddoc as "order"
                  COLUMNS "lineitem" XML BY REF PATH '.') as t(lineitem)
  Result:
    ORDID  LINEITEM
    1      LI1
    1      LI2
    3      LI3
• Cannot use the index
    SELECT o.ordid, t.lineitem, t.price
    FROM orders o,
         XMLTable('$order//lineitem' passing o.orddoc as "order"
                  COLUMNS "lineitem" XML BY REF PATH '.',
                          "price" DECIMAL(6,3) PATH '@price[. > 100]') as t(lineitem, price)
  Result:
    ORDID  LINEITEM  PRICE
    1      LI4       null
    1      LI1       175
    1      LI2       150
    2      LI5       null
    2      LI6       null
    3      LI3       201
    3      LI7       null
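The two result shapes can be reproduced with a small Python model. The prices of the lineitems that do not qualify are assumptions; the slide only shows the three values above 100.

```python
# Assumed prices; only the values above 100 appear on the slide.
orders = {
    1: {"LI4": 50, "LI1": 175, "LI2": 150},
    2: {"LI5": 10, "LI6": 99},
    3: {"LI3": 201, "LI7": 5},
}

# Predicate in the row-producing expression:
# lineitems that do not qualify produce no row at all.
row_filtered = [(oid, li)
                for oid, items in orders.items()
                for li, price in items.items()
                if price > 100]

# Predicate in a column expression: every lineitem produces a row, and the
# column is NULL (None here) when '@price[. > 100]' is empty.
column_filtered = [(oid, li, price if price > 100 else None)
                   for oid, items in orders.items()
                   for li, price in items.items()]

for row in row_filtered:
    print(row)
for row in column_filtered:
    print(row)
```

Because the column predicate never removes a row, every document has to be read, which is why the index does not help in the second query.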

15 Joining XML Values in SQL/XML
• Can use index on product/id, but not p.id
    SELECT p.name, o.orddoc
    FROM products p, orders o
    WHERE XMLExists('$order//lineitem/product[ id eq $pid ]'
                    passing o.orddoc as "order", p.id as "pid")
• Can use index on p.id, but not product/id
    SELECT p.name, o.orddoc
    FROM products p, orders o
    WHERE p.id = XMLCast(XMLQuery('$order//lineitem/product/id'
                                  passing o.orddoc as "order") as VARCHAR(13))
Need to unify XQuery and SQL data types

16 Joining XML Values in SQL/XML
• Probably cannot use XML indexes: SQL types differ from XML
    SELECT c.name, o.orddoc
    FROM orders o, customer c
    WHERE XMLCast(XMLQuery('$order/order/custid' passing o.orddoc as "order") as DOUBLE)
        = XMLCast(XMLQuery('$cust/customer/id' passing c.cdoc as "cust") as DOUBLE)
• Can use XML indexes
    SELECT c.name, o.orddoc
    FROM orders o, customer c
    WHERE XMLExists('$order/order[ custid/xs:double(.) = $cust/customer/id/xs:double(.) ]'
                    passing o.orddoc as "order", c.cdoc as "cust")

17 XQuery Let Clauses
• Can use the index
    for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')
    for $item in $doc//lineitem[ @price > 100 ]
    return { $item }

    for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
    return $ord/lineitem[ @price > 100 ]
• Cannot use the index
    for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')
    let $item := $doc//lineitem[ @price > 100 ]
    return { $item }

    for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
    return { $ord/lineitem[ @price > 100 ] }
[Figure: result trees. R/L1, R/L2, R/L3 versus R/L1,L2, R, R/L3, R]
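The difference between iterating with for and binding with let can be modeled in Python. The order contents are made up to match the item names used on the slides, and a Python list stands in for each constructed result element.

```python
# Assumed collection: order id -> lineitem prices (order 4 has no lineitems).
orders = {1: [175, 150, 20], 2: [30, 40], 3: [201, 10], 4: []}

# "for $item in $doc//lineitem[ @price > 100 ]":
# one result per qualifying lineitem, so documents without one contribute nothing.
for_results = [[p] for prices in orders.values() for p in prices if p > 100]

# "let $item := $doc//lineitem[ @price > 100 ]":
# one result per document, even when the bound sequence is empty.
let_results = [[p for p in prices if p > 100] for prices in orders.values()]

print(for_results)
print(let_results)
```

The let version must produce an (empty) result for orders 2 and 4 as well, so every document has to be visited and an index on the price cannot be used to skip any of them.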

18 XQuery Let Clauses
• Can use the index
    for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
    where $ord/lineitem/@price > 100
    return { $ord/lineitem }
• Same as above
    for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
    let $price := $ord/lineitem/@price
    where $price > 100
    return { $ord/lineitem }
[Figure: result trees. R/L4,L1,L2 and R/L3,L7]

19 Context is everything
• $i is bound to the document node
    for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')
    return $i/order/lineitem
• $j is bound to the constructed element
    for $j in ( for $o in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
                return { $o/* } )
    return $j/my_order/lineitem

20 Remember the dot
• Produces a type error: no document node at root
    let $order := { db2-fn:xmlcolumn('ORDERS.ORDDOC')/order[custid > 1001] }
    return $order[ //customer/name ]
• Absolute path expressions are shorthand for fn:root(.) treat as document-node()
• Absolute path expressions are bad style

21 Construction and View Composition
• Want to rewrite this…
    let $view :=
      for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order/lineitem
      return { $i/@quantity, $i/product/@price, { $i/product/id/data(.) } }
    for $j in $view
    where $j/pid = '17'
    return $j/@price

22 Construction and View Composition
• … into this
    for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order/lineitem
    where $i/product/id/data(.) = '17'
    return $i/product/@price
• but…

23 Construction and View Composition
• Data type changed to untypedAtomic
  – id is string: comparison is now an error
  – id is long: comparison is now as double instead of long
• List types are concatenated
• Error for duplicate @price attributes lost
• New node identity lost
• Parent axis is broken
Any sequence should live in tree without change. Separate identity from construction?

24 Remember the namespaces
• Index definition and query must match namespaces
    CREATE INDEX li_price ON orders(orddoc)
      USING XMLPATTERN '//lineitem/@price' AS double
• Cannot use the index. Which is right?
    declare default element namespace "http://ournamespaces.com/order";
    for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
    return $i
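The mismatch can be reproduced with Python's ElementTree, which also treats an unqualified name and a namespace-qualified name as different. The sample document is an assumption; only the namespace URI comes from the slide.

```python
import xml.etree.ElementTree as ET

NS = "http://ournamespaces.com/order"

# Made-up document whose elements live in the default namespace.
doc = ET.fromstring(f'<order xmlns="{NS}"><lineitem price="175"/></order>')

# A path without a namespace, like the index pattern '//lineitem/@price'.
unqualified = doc.findall(".//lineitem")

# A path with the namespace, like the query under the default element namespace.
qualified = doc.findall(f".//{{{NS}}}lineitem")

print(len(unqualified), len(qualified))
print(qualified[0].get("price"))  # the attribute itself has no namespace
```

The unqualified path finds nothing in a namespaced document, so an index defined without the namespace holds no entries for the elements that the query asks about.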

25 Elements and text nodes differ
    CREATE INDEX PRICE_TEXT ON orders.orddoc
      USING XMLPATTERN '//price' AS varchar
• Cannot use the index
    for $ord in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order[ lineitem/price/text() = "99.50" ]
    return $ord
• Element might have more data than just text
    99.50 USD
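The gap between an element's value and its text node can be seen with Python's ElementTree. The markup below is an assumption: the child element name "currency" is hypothetical and only serves to give the price element more content than one text node.

```python
import xml.etree.ElementTree as ET

# Assumed markup; the child element name "currency" is hypothetical.
price = ET.fromstring("<price>99.50<currency>USD</currency></price>")

# The first text node child, comparable to price/text().
text_node = price.text

# All descendant text concatenated, comparable to the string value of the
# element, which is what an index on '//price' would see.
string_value = "".join(price.itertext())

print(text_node)
print(string_value)
```

The index entry for this element would not equal "99.50", so a lookup for that key could miss a document that the text() predicate matches.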

26 Attributes are shy
• No attributes
    //*
    //node()
• Only attributes
    //@*
    //attribute::node()
• Empty result due to “principal node kind”
    //@*/self::*

27 Between predicates are not obvious
• Might not be between: multiple prices
    lineitem[ price > 100 and price < 200 ]
• Between or error
    lineitem[ price gt 100 and price lt 200 ]
• Always between
    lineitem/price/data()[ . > 100 and . < 200 ]
• Between if not list type
    lineitem/price[ . > 100 and . < 200 ]
• Between if not list type
    lineitem[ @price > 100 and @price < 200 ]
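A Python model of the first three variants, assuming a lineitem with two price children (made-up values). General comparisons such as > are existential over a sequence, which any() models; the helper value_gt is a hypothetical stand-in for the gt value comparison.

```python
prices = [50.0, 250.0]  # assumed: one lineitem with two price children

# lineitem[ price > 100 and price < 200 ]
# Each general comparison is existential, so different prices can satisfy each half.
general = any(p > 100 for p in prices) and any(p < 200 for p in prices)

# lineitem/price/data()[ . > 100 and . < 200 ]
# Both tests are applied to the same single value.
per_value = [p for p in prices if 100 < p < 200]

def value_gt(sequence, bound):
    """Models 'price gt bound': more than one item is a type error."""
    if len(sequence) > 1:
        raise TypeError("value comparison needs at most one item")
    return [sequence[0] > bound] if sequence else []

print(general)    # the lineitem qualifies although no price is between 100 and 200
print(per_value)  # no single value is between 100 and 200
```

With a single price all variants agree; they diverge only when a lineitem carries several prices, which is possible whenever no schema rules it out.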

28 Conclusions
• Easy to make mistakes without schema constraints
• Many subtle differences in expressions
• Improve construction composition
• Unify SQL and XQuery type systems
• Add XMLTest to SQL/XML

