XQuery Implementation in a Relational Database System
Shankar Pal, Istvan Cseri, Oliver Seeliger, Michael Rys, Gideon Schaller, Wei Yu, Dragan Tomic, Adrian Baras, Brandon Berg, Denis Churin, Eugene Kogan
SQL Server, Microsoft Corp.

VLDB Sep 1, S. Pal et al.

Overview
- Background
  - XML support in SQL Server 2005
  - OrdPath labeling of XML nodes
  - XML indexes: PATH, VALUE, PROPERTY
- Main topic: XQuery compilation
  - Architecture
  - XML operators
  - Mapping XML operators to relational+ operators
- Conclusions

Background: XML Support in SQL Server 2005
- CREATE TABLE DOCS (ID int PRIMARY KEY, XDOC xml)
- XML stored in an internal, binary form ('blob')
- Optionally typed by a collection of XML schemas
  - Used for storage and query optimizations
- 3 of 5 methods on the XML data type:
  - query(): returns XML type
  - value(): returns scalar value
  - exist(): checks conditions on XML nodes
- XML indexing
- More information at

Background: XQuery embedded in SQL
- Retrieve section titles, wrapped in new elements:
    SELECT ID, XDOC.query('
        for $s in /BOOK/SECTION
        return {data($s/TITLE)}
    ')
    FROM DOCS

Background: XQuery, supported features
- XQuery clauses "for", "where", "return" and "order by"
- XPath axes: child, descendant, parent, attribute, self and descendant-or-self
- Functions: numeric, string, Boolean, nodes, context, sequences, aggregate, constructor, data accessor
- SQL Server extension functions to access SQL variable and column data within XQuery
- Numeric operators (+, -, *, div, mod)
- Value comparison operators (eq, ne, lt, gt, le, ge)
- General comparison operators (=, !=, <, >, <=, >=)

Background: ORDPATH Label of Nodes [SIGMOD04]
- Example tree with ORDPATH labels:
    BOOK 1
      Section 1.3
        Title 1.3.1
        Figure 1.3.3
      Section 1.5
- node1 precedes node2 in document order: ORDPATH(node1) < ORDPATH(node2)
- node1 is ancestor of node2: ORDPATH(node1) is a prefix of ORDPATH(node2)
  - ORDPATH(1.3) ≤ id < Descendant_Limit(1.3) = 1.4
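
The two ORDPATH properties on this slide can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: labels are modeled as tuples of integers, whereas the real ORDPATH encoding is a compressed binary format. All function names are hypothetical.

```python
from typing import Tuple

OrdPath = Tuple[int, ...]


def parse_ordpath(label: str) -> OrdPath:
    """Turn a dotted label such as '1.3.1' into a tuple of ints (simplified model)."""
    return tuple(int(part) for part in label.split("."))


def precedes(a: OrdPath, b: OrdPath) -> bool:
    """Document order: tuples compare lexicographically, and a prefix sorts first."""
    return a < b


def is_ancestor(a: OrdPath, b: OrdPath) -> bool:
    """a is an ancestor of b when a is a proper prefix of b."""
    return len(a) < len(b) and b[: len(a)] == a


def descendant_limit(a: OrdPath) -> OrdPath:
    """Smallest label that sorts after every descendant of a, e.g. 1.3 -> 1.4."""
    return a[:-1] + (a[-1] + 1,)


def in_subtree(node: OrdPath, root: OrdPath) -> bool:
    """Range test from the slide: ORDPATH(root) <= id < Descendant_Limit(root)."""
    return root <= node < descendant_limit(root)
```

The range test is what lets a subtree be fetched with a single ordered scan instead of a recursive walk.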

Background: Indexing XML column [VLDB 2004]
- Primary XML index on an XML column
  - Creates a B+-tree on the data model content of the XML nodes
  - Adds a column Path_ID for the reversed, encoded path from each XML node to the root of the XML tree
- OrdPath labeling scheme is used for XML nodes
  - Relative order of nodes
  - Document hierarchy

Background: XML example
    INSERT INTO myTable VALUES (7, ' Bad Bugs Tree frogs … ')

Background: Primary XML Index Entries

ID | ORDPATH | TAG         | NODETYPE      | VALUE        | PATH_ID
7  | 1       | 1 (Book)    | 10 (ns:bT)    | NULL         | #1
7  | 1.1     | 2 (ISBN)    | 2 (xs:string) | ' …'         | #2#1
7  | 1.3     | 3 (Section) | 11 (ns:sT)    | NULL         | #3#1
7  | 1.3.1   | 4 (Title)   | 2 (xs:string) | 'Bad Bugs'   | #4#3#1
7  | 1.3.3   | 5 (Figure)  | 12 (ns:fT)    | NULL         | #5#3#1
7  | 1.5     | 3 (Section) | 11 (ns:sT)    | NULL         | #3#1
7  |         | 4 (Title)   | 2 (xs:string) | 'Tree frogs' | #4#3#1
7  |         | 5 (Figure)  | 12 (ns:fT)    | NULL         | #5#3#1

- Clustering key: (ID, ORDPATH)
- Encoding of tags and types stored in system meta-data
- Additional details not shown
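
The PATH_ID column holds the path from a node back to the root, with each step replaced by its tag code. A minimal sketch of that encoding follows, assuming the tag codes implied by the PATH_ID values on this slide (Book = 1, ISBN = 2, Section = 3, Title = 4, Figure = 5). The function name and the dictionary are hypothetical, and the real encoding is an internal binary format rather than a `#`-separated string.

```python
from typing import Dict

# Tag codes inferred from the PATH_ID values shown on the slide (an assumption).
TAG_IDS: Dict[str, int] = {"Book": 1, "ISBN": 2, "Section": 3, "Title": 4, "Figure": 5}


def encode_path_id(path: str, tag_ids: Dict[str, int] = TAG_IDS) -> str:
    """Encode '/Book/Section/Title' as the reversed path '#4#3#1'."""
    # Drop empty steps from the leading '/', and the '@' marker on attribute steps.
    steps = [step.lstrip("@") for step in path.split("/") if step]
    return "".join(f"#{tag_ids[step]}" for step in reversed(steps))
```

Reversing the path puts the most selective step (the node's own tag) first, so a fully known path becomes an equality match and a path with a known suffix becomes a prefix match on PATH_ID.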

Background: Secondary XML indexes
- To speed up different classes of commonly occurring queries

Index    | Query class         | Key columns
PATH     | path-based queries  | PATH_ID, VALUE, ID, ORDPATH
VALUE    | value-based queries | VALUE, PATH_ID, ID, ORDPATH
PROPERTY | object properties   | ID, PATH_ID, VALUE, ORDPATH

- Statistics created on key columns of the primary and secondary XML indexes
  - Used for cost-based selection of secondary XML indexes

Background: Handling Types
- If the XML column is typed
  - Values are stored in the XML blob and XML indexes with appropriate typing
- Untyped XML
  - Values are stored as strings
  - Converted to appropriate types for operations
- SQL typed values stored in primary XML index
  - Most SQL types are compatible with XQuery types (e.g. integer)
  - Value comparisons on XML index columns suffice
  - Some types (e.g. xs:datetime) are stored in an internal format and processed specially

XQuery Processing Architecture
- Pipeline: XQuery expression -> XQuery Compiler -> XML algebra tree (XmlOp operators) -> XML Operator Mapper -> relational operator tree (relational+ operators) -> relational query processor
- XQuery Compiler
  - Parses the XQuery expression
  - Checks static type correctness
    - Type annotations
  - Applies static optimizations
    - Path collapsing
    - Rewrites using XML schemas
- XML Operator Mapper
  - Recursively traverses the XML algebra tree
  - Converts each XmlOp to a relational+ operator sub-tree
  - Mapping depends on the existence of a primary XML index

Examples of XML Operators
- XmlOp_Select
  - In: list of items, condition
  - Out: items satisfying the condition
- XmlOp_Path
  - In: simple paths, no predicates
  - Opt: path context to collapse paths
  - Out: eligible XML nodes
- XmlOp_Apply
  - In: two item lists
  - Out: one item list
  - Variable binding in "for" expression
- XmlOp_Construct
  - In: sub-nodes for element construction, otherwise value
  - Out: constructed node

XML Operator Mapping: Overview
[Figure labels: XML, PK, XQUERY, PK, REL+ tree, Primary XML Index, PATH Index, VALUE Index, PROPERTY Index, OrdPath, Special handling for SELECT * | XDOC]

New operators
- Some produce N rows from M (≠ N) rows
  - XML_Reader: streaming, pull-model XML parser
  - XML_Serializer: to serialize query result as XML
- Some are for efficiency
  - Contains: to evaluate XQuery contains()
  - TextAdd: to evaluate the XQuery function string()
  - Data: to evaluate the XQuery data() function
- Some are for specific needs
  - Check: validate XML during insertion or modification

XML Operator Mapping
Following categories:
- Mapping of XPath expressions
- Mapping of XQuery expressions
- Mapping of XQuery built-in functions

XPath Expressions
Two cases:
- Fully known, forward paths without branching after path collapsing
- Paths without branching that are not fully known after path collapsing
  - Segments of the path cannot be collapsed, or a path is split into multiple segments
  - Occurs most commonly for paths containing wildcard steps, //, self and parent axes
  - Evaluated using the LIKE operator on the XML index

Non-indexed XML, Full Path
- XML_Reader produces subtrees of Node table rows
  - Contains OrdPath
  - No PK or PATH_ID
- XML_Serialize reassembles those rows into the XML data type
  - To output the result
- XML operator tree:
    XmlOp_Path, PATH = "/BOOK/SECTION"
- Rel+ operator tree:
    XML_Serialize
      XML_Reader (XDOC, "/BOOK/SECTION")

Query Execution on XML Blob
    SELECT ID, XDOC.query('/BOOK/SECTION[2]') FROM DOCS
- XDOC column value in each row parsed at runtime
  - Parser is XmlReader (not DOM)
- Evaluate simple XPath (without branching) during parsing
  - Rest of processing done in memory using relational operators
  - // and * are also pushed into XML_Reader
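
Evaluating a simple, non-branching path while the document is being parsed can be illustrated with a pull-style parser that tracks the current element path. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse` as a stand-in for XML_Reader. The function name `match_path` is hypothetical, and the tests use a made-up document rather than the deck's own example.

```python
import io
import xml.etree.ElementTree as ET
from typing import List


def match_path(xml_text: str, path: str) -> List[str]:
    """Return the serialized subtrees whose root-to-node path equals `path`."""
    target = [step for step in path.split("/") if step]
    stack: List[str] = []      # tags of the currently open elements
    matches: List[str] = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            # At 'end' the element is complete, so its subtree can be emitted.
            if stack == target:
                matches.append(ET.tostring(elem, encoding="unicode"))
            stack.pop()
    return matches
```

An ordinal such as `[2]` then amounts to picking the second match, which corresponds to the "rest of processing" done on the parser's output.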

Sample query execution using Primary XML Index

ID | ORDPATH | TAG         | NODETYPE      | VALUE        | PATH_ID
7  | 1       | 1 (Book)    | 10 (ns:bT)    | NULL         | #1
7  | 1.1     | 2 (ISBN)    | 2 (xs:string) | ' …'         | #2#1
7  | 1.3     | 3 (Section) | 11 (ns:sT)    | NULL         | #3#1
7  | 1.3.1   | 4 (Title)   | 2 (xs:string) | 'Bad Bugs'   | #4#3#1
7  | 1.3.3   | 5 (Figure)  | 12 (ns:fT)    | NULL         | #5#3#1
7  | 1.5     | 3 (Section) | 11 (ns:sT)    | NULL         | #3#1
7  |         | 4 (Title)   | 2 (xs:string) | 'Tree frogs' | #4#3#1
7  |         | 5 (Figure)  | 12 (ns:fT)    | NULL         | #5#3#1

- Clustering key: (ID, ORDPATH)
- /Book/Section is mapped to PATH_ID #3#1 (by the XML Operator Mapper)

Indexed XML, Full Path
- XmlOp_Path mapped to SELECT
- GET(PXI): rows from the primary XML index
- Match PATH_ID
- Not shown: JOIN with base table on PK
- Operator tree:
    XML_Serialize
      Apply
        Select ($b): GET(PXI), Path_ID = #SECTION#BOOK
        Select (assemble subtree): GET(PXI), $b.OrdP ≤ OrdP < DL($b)
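
The plan on this slide has two steps: select the rows whose Path_ID matches the collapsed path, then, for each such row `$b`, collect the rows in the range `$b.OrdP ≤ OrdP < DL($b)`. The sketch below models both steps over an in-memory list. The row layout, the sample rows (including the ORDPATH of the second section's title) and all function names are assumptions for illustration, not the engine's operators.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    ordpath: Tuple[int, ...]
    tag: str
    path_id: str


# Hypothetical primary XML index rows, kept in ORDPATH order.
PXI: List[Row] = [
    Row((1,), "BOOK", "#BOOK"),
    Row((1, 3), "SECTION", "#SECTION#BOOK"),
    Row((1, 3, 1), "TITLE", "#TITLE#SECTION#BOOK"),
    Row((1, 3, 3), "FIGURE", "#FIGURE#SECTION#BOOK"),
    Row((1, 5), "SECTION", "#SECTION#BOOK"),
    Row((1, 5, 1), "TITLE", "#TITLE#SECTION#BOOK"),
]


def descendant_limit(ordpath: Tuple[int, ...]) -> Tuple[int, ...]:
    """DL($b): the first label past the subtree of $b, e.g. 1.3 -> 1.4."""
    return ordpath[:-1] + (ordpath[-1] + 1,)


def select_by_path(rows: List[Row], path_id: str) -> List[Row]:
    """Select ($b): rows whose Path_ID equals the collapsed path."""
    return [row for row in rows if row.path_id == path_id]


def assemble_subtree(rows: List[Row], root: Row) -> List[Row]:
    """Rows with root.ordpath <= ordpath < DL(root), i.e. the subtree of root."""
    limit = descendant_limit(root.ordpath)
    return [row for row in rows if root.ordpath <= row.ordpath < limit]


def eval_full_path(rows: List[Row], path_id: str) -> List[List[Row]]:
    """Apply: for each matching node, gather the rows of its subtree."""
    return [assemble_subtree(rows, b) for b in select_by_path(rows, path_id)]
```

In the engine both steps would be index seeks and range scans on the primary XML index; the list comprehensions here only mirror the filtering conditions.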

XML index: PATH

PATH_ID | VALUE        | ID | ORDPATH
#1      | NULL         | 7  | 1
#2#1    | ' …'         | 7  | 1.1
#3#1    | NULL         | 7  | 1.3
#3#1    | NULL         | 7  | 1.5
#4#3#1  | 'Bad Bugs'   |    |
#4#3#1  | 'Tree frogs' |    |
#5#3#1  | NULL         |    |
#5#3#1  | NULL         |    |

- Speeds up path evaluations
- Example: /Book/Section is mapped to #3#1

Indexed XML, Imprecise Paths
- /BOOK/SECTION//TITLE
- Matched using the LIKE operator on Path_ID
- Operator tree:
    XML_Serialize
      Apply
        Select ($s): GET(PXI), Path_ID LIKE #TITLE%#SECTION#BOOK
        Assemble subtree
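
Because Path_ID stores the path in reverse, the unknown middle of /BOOK/SECTION//TITLE turns into a single `%` wildcard between a known prefix and a known suffix. The sketch below models that match with a minimal LIKE that supports only `%`. The function `sql_like` is a hypothetical helper, and the symbolic Path_ID strings follow the slide's notation rather than the real encoding.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Minimal model of SQL LIKE supporting only the % wildcard."""
    # Escape each literal piece, then let % match any run of characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Pattern from the slide for /BOOK/SECTION//TITLE.
TITLE_UNDER_SECTION = "#TITLE%#SECTION#BOOK"
```

With purely symbolic strings a tag whose name merely starts with TITLE would also match; an actual encoding needs delimiters or fixed-width codes that rule this out, which this sketch does not model.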

Predicate Evaluation
    = "12"]
- Search value compared with the VALUE column in PXI
- Collapsed path
  - Induces index seeks
  - Reduces intermediate result size
- Parent check, Par($b)
  - Uses OrdPath
- Value conversion might be needed
- Operator tree labels:
  - XML_Serialize
  - Apply
  - Select: GET(PXI), SBN#BOOK & VALUE = "12" & Par($b)
  - Apply
  - Select ($b): GET(PXI), Path_ID = #BOOK
  - Assemble subtree

Ordinal Predicate
- /BOOK[n]
  - Adds a ranking column to the rows for the elements
  - Retrieves the nth node
- Special optimizations
  - [1]: TOP 1 ascending
  - [last()]: TOP 1 descending
- Avoids sorting when the input is sorted
  - Example: in XML_Serializer
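
The general ordinal case and its two shortcuts can be sketched as follows, again modeling ORDPATH labels as integer tuples. The function names are hypothetical, and `min`/`max` stand in for TOP 1 ascending and TOP 1 descending.

```python
from typing import List, Tuple

OrdPath = Tuple[int, ...]


def nth(nodes: List[OrdPath], n: int) -> List[OrdPath]:
    """General case [n]: rank nodes in document order and keep rank n."""
    ranked = enumerate(sorted(nodes), start=1)   # (rank, node) pairs
    return [node for rank, node in ranked if rank == n]


def first(nodes: List[OrdPath]) -> List[OrdPath]:
    """[1] as TOP 1 ascending: no ranking column is needed."""
    return [min(nodes)] if nodes else []


def last(nodes: List[OrdPath]) -> List[OrdPath]:
    """[last()] as TOP 1 descending."""
    return [max(nodes)] if nodes else []
```

When the input already arrives in ORDPATH order, the `sorted` call is redundant, which is the "avoids sorting" point on the slide.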

Error handling
- Static type errors at compilation time
  - Raised if an expression could fail at runtime due to a type safety violation
    - Addition of string to integer
    - Querying a non-existent node name in typed XML
    - Non-singleton in "eq"
  - Some can be fixed using an explicit cast or ordinal specification
- Dynamic errors are converted to the empty sequence
  - Yields correct results in predicates without negations
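
The effect of mapping a dynamic error to the empty sequence can be shown with a failed cast inside a comparison. This is only a sketch: `cast_to_int` and `any_ge` are hypothetical helpers, with Python lists standing in for XQuery sequences.

```python
from typing import List


def cast_to_int(value: str) -> List[int]:
    """Cast that yields the empty sequence instead of raising on bad input."""
    try:
        return [int(value)]
    except ValueError:
        return []   # dynamic error converted to the empty sequence


def any_ge(values: List[str], threshold: int) -> bool:
    """General comparison: true if some successfully cast value is >= threshold."""
    return any(n >= threshold for v in values for n in cast_to_int(v))
```

A value that fails to cast simply drops out, so a positive predicate still filters correctly. Under negation the same input would make `not any_ge(["abc"], 3)` true, which is why the slide restricts the claim to predicates without negations.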

"for" Iterator
    for $s in /BOOK//SECTION
    where >= 3
    return $s/TITLE
- XML operator for "for" is XmlOp_Apply
  - Maps to APPLY
  - Binds $s and iterates over the SECTION nodes
  - Determines its TITLE children
- Nested "for" and "for" with multiple bindings turn into nested APPLY
  - Each APPLY binds to a different variable
- Operator tree labels:
  - XML_Serialize, Assemble
  - Apply ($s), Apply
  - Select ($s): GET(PXI), Path_ID LIKE #SECTION%#BOOK
  - Exists, Select: GET(PXI), Path_ID LIKE #BK & VALUE >= 3 & Par($s)
  - Select: GET(PXI), Path_ID LIKE #TITLE#SECTION%#BOOK & Par($s)
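
An APPLY can be read as a nested loop in which the inner input is re-evaluated for every binding of the outer variable. The sketch below models `for $s in /BOOK//SECTION return $s/TITLE` that way. The "where" clause is left out, the index rows and their ORDPATH values are assumed for illustration, and every name in the code is hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    ordpath: Tuple[int, ...]
    path_id: str
    value: Optional[str]


# Hypothetical primary XML index rows.
PXI: List[Row] = [
    Row((1,), "#BOOK", None),
    Row((1, 3), "#SECTION#BOOK", None),
    Row((1, 3, 1), "#TITLE#SECTION#BOOK", "Bad Bugs"),
    Row((1, 5), "#SECTION#BOOK", None),
    Row((1, 5, 1), "#TITLE#SECTION#BOOK", "Tree frogs"),
]


def apply(outer: Iterable[Row],
          inner: Callable[[Row], Iterable[Row]]) -> Iterator[Tuple[Row, Row]]:
    """APPLY: evaluate `inner` once per outer row, with that row bound."""
    for bound in outer:
        for row in inner(bound):
            yield bound, row


def is_parent(parent: Row, child: Row) -> bool:
    """Par($s): the child's ORDPATH minus its last component equals the parent's."""
    return child.ordpath[:-1] == parent.ordpath


def sections(rows: List[Row]) -> List[Row]:
    """Outer input, modeling Path_ID LIKE '#SECTION%#BOOK' with prefix/suffix tests."""
    return [r for r in rows
            if r.path_id.startswith("#SECTION") and r.path_id.endswith("#BOOK")]


def titles_of(rows: List[Row]) -> Callable[[Row], List[Row]]:
    """Inner input: TITLE rows that are children of the bound section $s."""
    def inner(s: Row) -> List[Row]:
        return [r for r in rows
                if r.path_id.startswith("#TITLE#SECTION")
                and r.path_id.endswith("#BOOK")
                and is_parent(s, r)]
    return inner


def for_sections_return_titles(rows: List[Row]) -> List[Optional[str]]:
    return [title.value for _s, title in apply(sections(rows), titles_of(rows))]
```

A nested "for" would call `apply` again inside `inner`, binding a second variable, which matches the slide's note that each APPLY binds a different variable.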

XQuery "order by" and "where"
- Order by
  - Sorts rows based on the order-by expression
  - Adds a ranking column to these rows
  - Ranking column converted into OrdPath values
    - Yields the new order of the rows
    - Fits the rest of the query processing framework
- Where
  - Becomes SELECT on the input sequence
  - Filters rows satisfying the specified condition
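
Both clauses reduce to familiar relational steps, sketched below. The assumption here is that a rank `r` becomes the one-component label `(r,)`; the actual conversion of ranks into OrdPath values is not described on the slide, and the function names are hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def where(rows: List[T], condition: Callable[[T], bool]) -> List[T]:
    """'where' as a SELECT: keep the rows that satisfy the condition."""
    return [row for row in rows if condition(row)]


def order_by(rows: List[T], key: Callable[[T], object]) -> List[Tuple[Tuple[int, ...], T]]:
    """'order by': sort, rank, and expose the rank as a new ordering label."""
    ranked = sorted(rows, key=key)   # Python's sort is stable
    return [((rank,), row) for rank, row in enumerate(ranked, start=1)]
```

Because the new labels sort the same way as document-order labels do, downstream operators that expect ORDPATH-ordered input need no special case for reordered sequences.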

XQuery "return"
- Returns node sequences in document order
  - Uses OrdPath values and the XML_Serialize operator
- New element and sequence constructions
  - Merge constructed and existing nodes into a single sequence (SWITCH_UNION)

XQuery Functions & Operators
- Built-in functions and operators are mapped to relational functions and operators if possible
  - fn:count() -> count()
- Additional support for XQuery types, functions and operators that cannot be mapped directly
  - Intrinsics

Optimizations
- Exploiting ordered sets
  - Sorting information (OrdPath) made available to further relational operators
  - XML_Serialize is an example
- Using static type information
  - Eliminates CONVERT() in operations
  - Allows range scan on the VALUE index

Conclusions
- Built up infrastructure for a query processing framework
  - Other XQuery features (such as "let" and typeswitch) can be implemented
  - Data modification language
- Fits into the relational query processing framework
  - XQuery features can be implemented using relational+ operators
- Optimizations pose the biggest challenges
- More cost-based optimizations can be done
  - Enhanced costing model (e.g. choice of PXI)
  - Matching materialized views

Thank you!