Published by Martha James. Modified over 9 years ago.
1
IBM Almaden Research Center © 2006 IBM Corporation
On the Path to Efficient XML Queries
Andrey Balmin, Kevin Beyer, Fatma Özcan (IBM Almaden Research Center)
Matthias Nicola (IBM Silicon Valley Lab)
2
New languages = new abilities + new pitfalls
XQuery
– A new query language designed specifically for XML data
SQL/XML
– Added XML as a data type, including XQuery sequences
– Added XQuery as a sublanguage
3
Purpose
Teach users of the new XML query languages
Teach the teachers
Share our users' experiences
Influence the languages
4
Focus
Large databases with many moderate-sized XML documents
Schema flexibility is required
– Many schemas in one collection
– No schema validation used
– Schemas with xs:any
– Documents like Atom Syndication and RSS that allow any extension
Therefore
– Document filtering is the primary concern
– Limited type inference
– Any data is possible
5
Index eligibility
An index I is eligible to answer predicate P of query Q if, for any collection of XML documents D, the following holds: Q(D) = Q(I(P, D)), where I(P, D) is the set of XML documents produced by probing index I with predicate P.
Eligibility is less obvious for XML than it is in relational databases.
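The definition can be exercised on a toy collection. The sketch below is only an illustration in Python, not DB2 behavior: the document layout and the names `query`, `probe_index`, and `probe_first_price_only` are hypothetical. Checking one collection can refute eligibility, but it cannot prove it, because the definition quantifies over every collection D.

```python
# Toy collection D: each document lists its lineitem prices (made-up data).
docs = [
    {"id": 1, "prices": [20.0, 175.0, 150.0]},
    {"id": 2, "prices": [50.0]},
    {"id": 3, "prices": [201.0]},
]

def query(collection):
    """Q: ids of documents that have some lineitem price > 100."""
    return [d["id"] for d in collection if any(p > 100 for p in d["prices"])]

def probe_index(collection):
    """I(P, D) for an index over every price; scanned here for brevity."""
    return [d for d in collection if any(p > 100 for p in d["prices"])]

def probe_first_price_only(collection):
    """A probe over an index that (hypothetically) covers only the first price."""
    return [d for d in collection if d["prices"] and d["prices"][0] > 100]

full_result = query(docs)
# Eligible on this collection: Q(D) == Q(I(P, D)).
eligible = query(probe_index(docs)) == full_result
# Not eligible: pre-filtering with the narrower index loses document 1.
ineligible = query(probe_first_price_only(docs)) == full_result
```

Here `eligible` is true and `ineligible` is false, because the narrower index drops a document that the query would have returned.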
6
XML indexes in DB2
Index a linear XPath pattern over a column as a particular data type
  CREATE INDEX index-name ON table(xml-column) USING 'pattern' AS type
  pattern ::= namespace-decls? (( / | // ) axis? ( name-test | kind-test ))+
  axis ::= @ | child:: | attribute:: | self:: | descendant:: | descendant-or-self::
  name-test ::= qname | * | ncname:* | *:ncname
  kind-test ::= node() | text() | comment() | processing-instruction()
  type ::= varchar | double | date | timestamp
7
Query pattern must be at least as restrictive as the index pattern
  CREATE INDEX li_price ON orders(orddoc) USING XMLPATTERN '//lineitem/@price' AS double
Can use the index: the query pattern is more restrictive
  for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
  return $i
Cannot use the index: the query pattern is less restrictive
  for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@* > 100 ]
  return $i
8
Match index and query predicate data type
  CREATE INDEX li_price ON orders(orddoc) USING XMLPATTERN '//lineitem/@price' AS double
Can use the index: numeric predicate and numeric index
  for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
  return $i
Cannot use the index: string predicate
  for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > "100" ]
  return $i
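The reason a double index cannot answer a string predicate is that the two comparisons select different values. A minimal Python sketch with made-up price strings shows this. It assumes a codepoint-style string ordering, which is how Python compares strings; an XQuery processor may use a different default collation.

```python
# Untyped attribute values as they might appear in documents (hypothetical data).
prices = ["99", "100", "1000", "20"]

# Numeric predicate, as in @price > 100: compare as doubles.
numeric_hits = [p for p in prices if float(p) > 100]

# String predicate, as in @price > "100": compare character by character.
string_hits = [p for p in prices if p > "100"]
```

`numeric_hits` contains only "1000", while `string_hits` contains "99", "1000", and "20", so keys ordered as doubles cannot serve the string comparison.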
9
Data Types for Joins
  CREATE INDEX o_custid ON orders(orddoc) USING XMLPATTERN '//custid' AS double
  CREATE INDEX c_custid ON customer(cdoc) USING XMLPATTERN '/customer/id' AS double
Cannot use the indexes: unknown comparison type
  for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order
  for $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customer
  where $i/custid = $j/id
  return $i
Can use the indexes: at least one cast required
  for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order
  for $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customer
  where $i/custid/xs:double(.) = $j/id/xs:double(.)
  return $i
10
SQL/XML Query Functions
XMLQuery
– Scalar function that returns a (possibly empty) XQuery sequence for every row
XMLExists
– Predicate that returns true iff the XQuery sequence produced is not empty
XMLTable
– Produces a table with one row for each item in the row-producing XQuery sequence, and one column per column-producing XQuery expression. The columns may be XQuery sequences or cast to simple SQL types.
11
XMLQuery does not filter rows (usually)
Cannot use the index:
  SELECT XMLQuery('$order//lineitem[ @price > 100 ]' passing orddoc as "order")
  FROM orders
  Result: one row per order, some of them empty: (LI1, LI2), (), (LI3), ()
Can use the index:
  VALUES (XMLQuery('db2-fn:xmlcolumn("ORDERS.ORDDOC")//lineitem[ @price > 100 ]'))
  Result: a single row: (LI1, LI2, LI3)
Can use the index:
  db2-fn:xmlcolumn('ORDERS.ORDDOC')//lineitem[ @price > 100 ]
  Result: one row per item: LI1, LI2, LI3
12
XMLExists filters rows (usually)
Can use the index
  SELECT ordid, orddoc FROM orders
  WHERE XMLExists('$order//lineitem[ @price > 100 ]' passing orddoc as "order")
Cannot use the index: false exists
  SELECT ordid, orddoc FROM orders
  WHERE XMLExists('$order//lineitem/@price > 100' passing orddoc as "order")
  The comparison returns a single boolean item, so the sequence is never empty, even when the value is false.
Need XMLTest, which uses XQuery's Effective Boolean Value
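The "false exists" pitfall can be modeled in a few lines of Python. This is a simplified sketch, not the SQL/XML implementation: XQuery sequences are modeled as Python lists, and `xml_exists` and `effective_boolean_value` are hypothetical helpers that cover only the cases used here.

```python
def xml_exists(sequence):
    """XMLExists semantics: true iff the sequence is not empty."""
    return len(sequence) > 0

def effective_boolean_value(sequence):
    """Simplified EBV: empty is false, a lone boolean is itself.
    Anything else is assumed to be a sequence of nodes, which is true."""
    if not sequence:
        return False
    if len(sequence) == 1 and isinstance(sequence[0], bool):
        return sequence[0]
    return True

prices = [50.0, 80.0]  # an order with no lineitem priced above 100

# $order//lineitem[ @price > 100 ]: a filter, so the result can be empty.
filtered = [p for p in prices if p > 100]

# $order//lineitem/@price > 100: a comparison, so the result is one boolean.
compared = [any(p > 100 for p in prices)]

filter_form_keeps_row = xml_exists(filtered)
comparison_form_keeps_row = xml_exists(compared)
comparison_form_ebv = effective_boolean_value(compared)
```

The filter form rejects the row, the comparison form keeps it because a sequence holding `False` is still non-empty, and an EBV-based test would reject it again.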
13
XMLQuery + XMLExists vs. XMLTable
Can use the index
  SELECT ordid, XMLQuery('$order//lineitem[@price > 100]' passing orddoc as "order")
  FROM orders
  WHERE XMLExists('$order//lineitem[@price > 100]' passing orddoc as "order")
XMLTable: more efficient and less redundant
  SELECT o.ordid, t.lineitem
  FROM orders o,
       XMLTable('$order//lineitem[@price > 100]' passing o.orddoc as "order"
                COLUMNS "lineitem" XML BY REF PATH '.') as t(lineitem)
14
Predicates in XMLTable column expressions
Can use the index
  SELECT o.ordid, t.lineitem
  FROM orders o,
       XMLTable('$order//lineitem[@price > 100]' passing o.orddoc as "order"
                COLUMNS "lineitem" XML BY REF PATH '.') as t(lineitem)
  Result:
    ordid  lineitem
    1      LI1
    1      LI2
    3      LI3
Cannot use the index: the column predicate yields null instead of removing the row
  SELECT o.ordid, t.lineitem, t.price
  FROM orders o,
       XMLTable('$order//lineitem' passing o.orddoc as "order"
                COLUMNS "lineitem" XML BY REF PATH '.',
                        "price" DECIMAL(6,3) PATH '@price[. > 100]') as t(lineitem, price)
  Result:
    ordid  lineitem  price
    1      LI4       null
    1      LI1       175
    1      LI2       150
    2      LI5       null
    2      LI6       null
    3      LI3       201
    3      LI7       null
15
Joining XML Values in SQL/XML
Can use index on product/id, but not p.id
  SELECT p.name, o.orddoc
  FROM products p, orders o
  WHERE XMLExists('$order//lineitem/product[ id eq $pid ]'
                  passing o.orddoc as "order", p.id as "pid")
Can use index on p.id, but not product/id
  SELECT p.name, o.orddoc
  FROM products p, orders o
  WHERE p.id = XMLCast( XMLQuery('$order//lineitem/product/id'
                                 passing o.orddoc as "order") as VARCHAR(13))
Need to unify XQuery and SQL data types
16
Joining XML Values in SQL/XML
Probably cannot use XML indexes: SQL types differ from XML
  SELECT c.name, o.orddoc
  FROM orders o, customer c
  WHERE XMLCast( XMLQuery('$order/order/custid' passing o.orddoc as "order") as DOUBLE)
      = XMLCast( XMLQuery('$cust/customer/id' passing c.cdoc as "cust") as DOUBLE)
Can use XML indexes
  SELECT c.name, o.orddoc
  FROM orders o, customer c
  WHERE XMLExists('$order/order[ custid/xs:double(.) = $cust/customer/id/xs:double(.) ]'
                  passing o.orddoc as "order", c.cdoc as "cust")
17
XQuery Let Clauses
Can use the index
  for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')
  for $item in $doc//lineitem[ @price > 100 ]
  return { $item }

  for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
  return $ord/lineitem[ @price > 100 ]
  Result (R is the constructed element): R/L1, R/L2, R/L3
Cannot use the index
  for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')
  let $item := $doc//lineitem[ @price > 100 ]
  return { $item }

  for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
  return { $ord/lineitem[ @price > 100 ] }
  Result (one R per document, possibly empty): R/L1,L2; R; R/L3; R
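The difference between the two forms can be sketched in Python. This is an analogy under assumptions, not XQuery: each inner list stands for the lineitem prices of one order document, the data is made up, and the tuple `("R", [...])` stands in for the constructed element.

```python
# Four order documents; the second and fourth have no lineitem above 100.
orders = [[175, 150, 20], [50, 60], [201, 30], []]

# "for $item in $doc//lineitem[...]": iterates over qualifying items, so
# documents without a match contribute nothing to the result.
for_results = [("R", [p]) for prices in orders for p in prices if p > 100]

# "let $item := $doc//lineitem[...]": binds the whole (possibly empty)
# sequence once per document, so every document yields a result.
let_results = [("R", [p for p in prices if p > 100]) for prices in orders]
```

`for_results` has three entries, while `let_results` has four, two of which wrap an empty sequence. Because the second form must return something for every document, pre-filtering documents through the index would change the answer.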
18
XQuery Let Clauses
Can use the index
  for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
  where $ord/lineitem/@price > 100
  return { $ord/lineitem }
Same as above
  for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
  let $price := $ord/lineitem/@price
  where $price > 100
  return { $ord/lineitem }
  Result (R is the constructed element): R/L4,L1,L2; R/L3,L7
19
Context is everything
$i is bound to the document node
  for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')
  return $i/order/lineitem
$j is bound to the <my_order> element
  for $j in ( for $o in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order
              return <my_order>{ $o/* }</my_order> )
  return $j/my_order/lineitem
20
Remember the dot
Produces a type error: no document node at root
  let $order := { db2-fn:xmlcolumn('ORDERS.ORDDOC')/order[custid > 1001] }
  return $order[ //customer/name ]
An absolute path expression is shorthand for a path that starts at fn:root(.) treat as document-node().
Absolute path expressions are bad style
21
Construction and View Composition
Want to rewrite this…
  let $view := for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order/lineitem
               return { $i/@quantity, $i/product/@price, <pid>{ $i/product/id/data(.) }</pid> }
  for $j in $view
  where $j/pid = '17'
  return $j/@price
22
Construction and View Composition
… into this
  for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order/lineitem
  where $i/product/id/data(.) = '17'
  return $i/product/@price
but…
23
Construction and View Composition
Data type changed to untypedAtomic
– id is string: comparison is now an error
– id is long: comparison is now as double instead of long
List types are concatenated
Error for duplicate @price attributes is lost
New node identity is lost
Parent axis is broken
Any sequence should live in a tree without change. Separate identity from construction?
24
Remember the namespaces
Index definition and query must match namespaces
  CREATE INDEX li_price ON orders(orddoc) USING XMLPATTERN '//lineitem/@price' AS double
Cannot use the index. Which is right?
  declare default element namespace "http://ournamespaces.com/order";
  for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')//order[ lineitem/@price > 100 ]
  return $i
25
Elements and text nodes differ
  CREATE INDEX PRICE_TEXT ON orders(orddoc) USING XMLPATTERN '//price' AS varchar
Cannot use the index
  for $ord in db2-fn:xmlcolumn("ORDERS.ORDDOC")/order[ lineitem/price/text() = "99.50" ]
  return $ord
An element might have more data than just its text: 99.50 USD
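The gap between an element's value and its text node can be seen with Python's standard `xml.etree.ElementTree`. The markup below is an assumption made for illustration: the nested `currency` element is hypothetical, chosen so that the element carries "99.50" and "USD" as on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical price element with a nested child after its text.
price = ET.fromstring("<price>99.50<currency>USD</currency></price>")

# What price/text() selects: only the element's own leading text node.
text_node = price.text

# The element's string value: all descendant text, concatenated.
string_value = "".join(price.itertext())
```

An index over `//price` would key on the string value `"99.50USD"`, while the query compares the text node `"99.50"`, so the two do not line up.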
26
Attributes are shy
No attributes
  //*
  //node()
Only attributes
  //@*
  //attribute::node()
Empty result due to "principal node kind"
  //@*/self::*
27
Between predicates are not obvious
Might not be between: multiple prices
  lineitem[ price > 100 and price < 200 ]
Between or error
  lineitem[ price gt 100 and price lt 200 ]
Always between
  lineitem/price/data()[ . > 100 and . < 200 ]
Between if not list type
  lineitem/price[ . > 100 and . < 200 ]
Between if not list type
  lineitem[ @price > 100 and @price < 200 ]
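The first two cases can be illustrated with a small Python model of XQuery comparisons over sequences. This is a sketch under assumptions: sequences are Python lists, and `general_gt`, `general_lt`, and `value_gt` are hypothetical helpers that mimic the existential semantics of `>` and `<` and the singleton requirement of `gt`.

```python
def general_gt(seq, x):
    """General comparison (>): true if any item satisfies it."""
    return any(v > x for v in seq)

def general_lt(seq, x):
    """General comparison (<): true if any item satisfies it."""
    return any(v < x for v in seq)

def value_gt(seq, x):
    """Value comparison (gt): an error for more than one item."""
    if len(seq) > 1:
        raise TypeError("value comparison requires at most one item")
    return bool(seq) and seq[0] > x

prices = [50, 250]  # a lineitem with two price children (made-up data)

# price > 100 and price < 200: each test may be satisfied by a different price.
loose_match = general_gt(prices, 100) and general_lt(prices, 200)

# . > 100 and . < 200 applied per price: one value must satisfy both tests.
strict_match = any(100 < v < 200 for v in prices)
```

`loose_match` is true even though neither price lies between 100 and 200, whereas `strict_match` is false. Calling `value_gt(prices, 100)` raises an error, which mirrors the "between or error" case.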
28
Conclusions
Easy to make mistakes without schema constraints
Many subtle differences in expressions
Improve construction composition
Unify SQL and XQuery type systems
Add XMLTest to SQL/XML