Order-sensitive XML Query Processing over Relational Sources: An Algebraic Approach
Authors: Ling Wang, Song Wang, Brian Murphy and Elke A. Rundensteiner
Institute: Database Systems Research Group, Worcester Polytechnic Institute (WPI)
IDEAS’2005

Order in XML
Order is important to XML:
- Document order
- XML view can be ordered (OrderBy) …
- User query can be order-sensitive (OrderBy, position(), range() …)
Example document (record.xml), bands and their songs in document order:
  Misfits: She
  Back Street Boy: Bullet, We Are 138
  Project X: SXE Revenge, Shutdown
Example query:
  FOR $play in document("record.xml")/PLAY
  OrderBy $play/band
  RETURN $play[3]/SONG[range 1 to 2]/text()
Evaluation:
  1. Sort PLAY by its band's name
  2. Find the third PLAY
  3. Extract its first and second SONG
Result: SXE Revenge, Shutdown

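The three evaluation steps of the example query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list `plays` is a hypothetical in-memory stand-in for record.xml, and the function name is invented for this sketch; it is not part of the XSOT system.

```python
# Hypothetical stand-in for record.xml: (band, songs) pairs in document order.
plays = [
    ("Misfits", ["She"]),
    ("Back Street Boy", ["Bullet", "We Are 138"]),
    ("Project X", ["SXE Revenge", "Shutdown"]),
]


def third_play_first_two_songs(plays):
    """Mimic: OrderBy $play/band, then $play[3]/SONG[range 1 to 2]."""
    # 1. Sort PLAY by band name (sorted() is stable, so ties keep document order).
    ordered = sorted(plays, key=lambda play: play[0])
    # 2. Find the third PLAY (XQuery positions are 1-based, Python indexes are 0-based).
    if len(ordered) < 3:
        return []
    _, songs = ordered[2]
    # 3. Extract its first and second SONG.
    return songs[0:2]


print(third_play_first_two_songs(plays))
```

The point of the sketch is that both the sort and the two positional selections depend on order, which is exactly the information a plain relational table does not keep.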
Why XML-to-SQL?
XML is stored in relational databases to:
- provide reliable persistent storage
- exploit mature technologies
XML-to-SQL systems:
- SilkRoute (AT&T), XPERANTO (IBM), RAINBOW (WPI), Rolex (Bell Labs), Agora, MARS
- Oracle XML DB, Microsoft SQL Server 2000 SQLXML, IBM DB2 XML Extender

XML Views
- Support an XML view mechanism for XML data publishing
- Support queries (updates) over XML views
XML publishing scenario:
- The relational model is not order-sensitive
- Order in XML views over an RDB has no meaning
XML storage scenario:
- Order is essential!
- Order-preserving loading
  - XML document -> relational database
  - Implicit order in the XML document -> explicit order code in the RDB
- Order-restoring in extraction views
  - Explicit order code in the RDB -> implicit order in the XML view, through the view query
[Figure: XML views. Labels: XML, RDB, XML view, user query, order encoding]

Order-specific loading
- Loading strategies: inline, edge, …
- Order encoding methods: global, local, dewey, …

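The three order encoding methods named above can be illustrated with a small Python sketch. It follows the usual definitions (global: position in document order; local: position among siblings; dewey: the path of local positions from the root). The `Node` class, the `encode` function and the use of unique labels as dictionary keys are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str                      # assumed unique, used as dictionary key
    children: List["Node"] = field(default_factory=list)


def encode(root: Node) -> Dict[str, dict]:
    """Return {label: {"global": int, "local": int, "dewey": str}}."""
    codes: Dict[str, dict] = {}
    counter = 0

    def visit(node: Node, local: int, dewey: Tuple[int, ...]) -> None:
        nonlocal counter
        counter += 1                # preorder visit number = global order
        codes[node.label] = {
            "global": counter,
            "local": local,
            "dewey": ".".join(str(step) for step in dewey),
        }
        for i, child in enumerate(node.children, start=1):
            visit(child, i, dewey + (i,))

    visit(root, 1, (1,))
    return codes


doc = Node("record", [
    Node("Misfits", [Node("She")]),
    Node("Back Street Boy", [Node("Bullet"), Node("We Are 138")]),
    Node("Project X", [Node("SXE Revenge"), Node("Shutdown")]),
])
codes = encode(doc)
print(codes["We Are 138"])
```

A local code only has to be renumbered among siblings after an insertion, whereas a global code may force renumbering of everything that follows in the document; dewey sits in between.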
Example
XML schema (fragment):
  <xs:element name="PLAY" minOccurs="1" maxOccurs="unbounded">
  <xs:element name="SONG" type="xs:string" minOccurs="1"/>
XML document (text content in document order):
  Misfits: She
  Back Street Boy: Bullet, We Are 138
  Project X: SXE Revenge, Shutdown

Example
Relational database (inline loading + local order encoding)

RECORDLIST
IID  PID  POSITION
1    0    1

PLAY
IID  PID  POSITION  BAND_PCDATA
2    1    1         Misfits
3    1    2         Back Street Boy
4    1    3         Project X

SONG
IID  PID  POSITION  SONG_PCDATA
5    2    1         She
6    3    1         Bullet
7    3    2         We Are 138
8    4    1         SXE Revenge
9    4    2         Shutdown

View query:
  FOR $play IN document("dxv.xml")/PLAY/ROW
  ORDER BY $play/POSITION/text()
  RETURN $play/BAND_PCDATA
    FOR $song IN document("dxv.xml")/SONG/ROW[PID/text() = $play/IID/text()]
    ORDER BY $song/POSITION/text()
    RETURN $song/SONG_PCDATA/text()

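A minimal Python sketch of how the example document could be shredded into the three tables with a local order code. The IID numbering (record first, then all plays, then all songs) and the PID value 0 for the root are assumptions chosen to reproduce the values shown on the slide; the `shred` function is hypothetical and is not the loader used by the authors.

```python
# Hypothetical stand-in for the XML document: (band, songs) in document order.
record = [
    ("Misfits", ["She"]),
    ("Back Street Boy", ["Bullet", "We Are 138"]),
    ("Project X", ["SXE Revenge", "Shutdown"]),
]


def shred(record):
    """Return (RECORDLIST, PLAY, SONG) rows as tuples (IID, PID, POSITION[, PCDATA])."""
    recordlist_rows = [(1, 0, 1)]          # single root row, no parent (PID 0)
    play_rows, song_rows = [], []
    next_iid = 2

    # PLAY rows: POSITION is the local order among the children of the root.
    for pos, (band, _) in enumerate(record, start=1):
        play_rows.append((next_iid, 1, pos, band))
        next_iid += 1

    # SONG rows: POSITION restarts at 1 under every PLAY (local order encoding).
    for play_row, (_, songs) in zip(play_rows, record):
        for pos, title in enumerate(songs, start=1):
            song_rows.append((next_iid, play_row[0], pos, title))
            next_iid += 1

    return recordlist_rows, play_rows, song_rows


recordlist_rows, play_rows, song_rows = shred(record)
for row in song_rows:
    print(row)
```

Because POSITION is local, the view query has to sort PLAY rows and, within each play, SONG rows to restore document order.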
Motivation
Many loading and encoding combinations are possible: {inline, edge, …} × {local, global, dewey, …}
A hybrid of multiple loading and encoding methods may occur:
- Loading
  - Schema is available: inline
  - Schema is not available: edge
- Order encoding
  - Heavy update workload: dewey
  - Query workload: global
- Multiple XML documents are loaded into the RDB
Other loading and encoding methods may emerge in the future.
Conclusion: a general approach for XQuery-to-SQL translation is needed.

XSOT
XML-to-SQL Order-sensitive Translation (XSOT):
- Step 1: Encode the XML document with an explicit order code (order-exposing)
- Step 2: Load the XML into the relational database (order-preserving)
- Step 3: Extract the XML view from the relational database (order-restoring)
- Step 4: Query via the XML view with order predicates (order-sensitive)

XSOT Framework
[Figure: XSOT framework architecture, showing query flow and data flow. Components labeled: XQuery Parser, XAT Generator, View Composer, XAT Optimizer, SQL Generator, XML Generator, Mapping Manager, XML Source Wrapper, XQuery Engine, RDBMS (DB2, Oracle, SQL Server, Sybase). Loading side: schema generation, data loading, order encoding, default XML schema, default XML view. Query side: order-sensitive user query, user XAT, view XAT, composed XAT, optimized XAT, SQL, ordered tuple streams, order code comparison function, XML result.]

Running Example
User query: find the second song of each play
  FOR $record in document("record.xml")
  RETURN $record/PLAY/SONG[2]/text()
Result: We Are 138, Shutdown
Relational database (inline loading + local order encoding) and view query: the RECORDLIST, PLAY and SONG tables and the view query over dxv.xml from the earlier Example slide.

Order-sensitive XML Algebra Tree
XSOT methodology: an algebraic approach
- XML Algebra Tree (XAT)
- XAT operators
  - Select, CartesianProduct, ThetaJoin, LeftOuterJoin, Distinct, GroupBy, OrderBy
  - Source, Navigate, Combine, Tagger
- XAT order extension
  - Position()
  - Range()
- Composition of the view XAT and the user XAT

View Query XAT
View query:
  FOR $play IN document("dxv.xml")/PLAY/ROW
  ORDER BY $play/POSITION/text()
  RETURN $play/BAND_PCDATA
    FOR $song IN document("dxv.xml")/SONG/ROW[PID/text() = $play/IID/text()]
    ORDER BY $song/POSITION/text()
    RETURN $song/SONG_PCDATA/text()
[Figure: XAT of the view query. Operators shown:]
  Combine $dataPlayTag
  Tagger $dataSongTag $dataPlayTag
  GroupBy $play
  Combine $dataSongTag
  Navigate $song, SONG_PCDATA/text() $sData
  Tagger $sData $dataSongTag
  Navigate $song, POSITION/text() $sPos
  OrderBy $sPos
  GroupBy $play
  Source "dxv.xml" $S
  Navigate $S, SONG/ROW $song
  Navigate $song, PID/text() $sPID
  ThetaJoin $pIID=$sPID
  Source "dxv.xml" $P
  Navigate $P, PLAY/ROW $play
  Navigate $play, POSITION/text() $pPos
  OrderBy $pPos
  Tagger $dataPlayTag $record
  Navigate $play, IID/text() $pIID

User Query XAT
User query:
  FOR $record in document("record.xml")
  RETURN $record/PLAY/SONG[2]/text()
[Figure: XAT of the user query. Operators shown:]
  GroupBy $record, $uPlay
  Combine $uDataSongTag
  Tagger $uDataSongTag $result
  Tagger $uSongData $uDataSongTag
  Navigate $uRecord, PLAY $uPlay
  Navigate $uPlay, SONG $uSong
  Navigate $uSong, text() $uSongData
  Select $uNumPos=2
  Source "record.xml" $P
  POS $uSong $uNumPos

Composed XAT
[Figure: the user XAT (as on the User Query XAT slide) composed with the view XAT (as on the View Query XAT slide). The two trees are linked through the binding $P=$record: the user XAT's Source "record.xml" $P is matched with the view XAT's output $record.]

XAT Optimization – Order Explicit
Why?
- Order in the user XAT depends on the implicit order in the view
- This blocks further optimization: computation push-down

XAT Optimization – Order Explicit
[Figure: dependency between the user XAT and the view XAT]
- View XAT (construct SONG, construct PLAY): Tagger $sData $dataSongTag; GroupBy $play; Combine $dataSongTag; Tagger $dataSongTag $dataPlayTag
- User XAT (for each PLAY, sort SONGs, pick the second song): GroupBy $record, $uPlay; POS $uSong $uNumPos; Select $uNumPos=2
- The user XAT depends on the order produced by the view XAT
- It cannot be pushed down!
- It cannot be translated into SQL!

XAT Optimization – Order Explicit
Goal: convert the user query order
- FROM: implicit order in the XML view
- TO: explicit order-code column in the relational encoding
Rewrite: POS $uSong = POS $sPos
- Before
  - User portion XAT: POS $uSong $uNumPos; GroupBy $record, $uPlay
  - View portion XAT: Navigate $song, POSITION/text() $sPos; OrderBy $sPos; GroupBy $play
- After
  - User portion XAT: POS $sPos $uNumPos; GroupBy $play
  - View portion XAT: Navigate $song, POSITION/text() $sPos; OrderBy $sPos; GroupBy $play

SQL-oriented XAT optimization
Goal: optimize the XAT for efficient order-sensitive SQL generation
Rules:
- Computation push-down
  - Push as much as possible to the RDB
- Order pull-up
  - Sort as late as possible
  - Avoid re-sorting!
- Order-step rewrite
  - Match the RDB order template

Optimized XAT
Rewrites applied: computation push down, order pull up, OrderStep rewrite
[Figure: XAT during optimization. Operators shown:]
  Navigate $song, POSITION/text() $sPos
  OrderBy $sPos
  GroupBy $play
  Source "dxv.xml" $S
  Navigate $S, SONG/ROW $song
  Navigate $song, PID/text() $sPID
  ThetaJoin $pIID=$sPID
  Source "dxv.xml" $P
  Navigate $P, PLAY/ROW $play
  Navigate $play, POSITION/text() $pPos
  Navigate $play, IID/text() $pIID
  OrderBy $sPos, $pPos
  GroupBy $pPos
  Combine $uDataSongTag
  Tagger $uDataSongTag $result
  Tagger $sData $uDataSongTag
  Select $uNumPos=2
  POS $sPos $uNumPos
  Navigate $song, SONG_PCDATA/text() $sData
  OrderStep [$pPos], [$pPos, $sPos] $uNumPos

Optimized XAT
[Figure: final optimized XAT. Operators shown:]
  Navigate $song, POSITION/text() $sPos
  OrderStep [$pPos], [$pPos, $sPos] $uNumPos
  Source "dxv.xml" $S
  Navigate $S, SONG/ROW $song
  Navigate $song, PID/text() $sPID
  ThetaJoin $pIID=$sPID
  Source "dxv.xml" $P
  Navigate $P, PLAY/ROW $play
  Navigate $play, POSITION/text() $pPos
  Navigate $play, IID/text() $pIID
  OrderBy $sPos, $pPos
  Combine $uDataSongTag
  Tagger $uDataSongTag $result
  Tagger $sData $uDataSongTag
  Select $uNumPos=2
  Navigate $song, SONG_PCDATA/text() $sData

Order Template
Order template (SQL-99 standard; Oracle, DB2, …):
  TEMPLATE:  SELECT row_number() over ( ? ) $pos_func_binding FROM +
  PARTITION: partition by
  ORDERBY:   order by |
  TONUMBER:  to_number( )
  ELEMENT:   element name
  TABLE:     table name | TEMPLATE

Order-sensitive SQL generation
About push-down strategies:
- In general: push as much computation as possible into the relational engine
- In the order-sensitive scenario there is a tradeoff
Deep push:
- Push OrderStep into the relational engine
- The relational engine has to support the order template (SQL99)
  Q5 = SELECT Q2.sData
       FROM (SELECT Q1.pPos, Q1.sPos, Q1.sData,
                    row_number() OVER (PARTITION BY Q1.pPos ORDER BY Q1.sPos) uNumPos
             FROM (SELECT P.POSITION AS pPos, S.POSITION AS sPos, S.SONG_PCDATA AS sData
                   FROM PLAY P, SONG S
                   WHERE P.IID = S.PID) Q1) Q2
       WHERE Q2.uNumPos = 2
       ORDER BY Q2.pPos, Q2.sPos
[Figure: XAT left above SQL Q5, which delivers $sData. Operators shown: Tagger $sData $uDataSongTag; Combine $uDataSongTag; Tagger $uDataSongTag $result]

Order-sensitive SQL generation
Shallow push (otherwise):
- Leave OrderStep outside the RDB
- The relational engine is not required to support the order template (SQL99)
  Q1 = SELECT P.POSITION AS pPos, S.POSITION AS sPos, S.SONG_PCDATA AS sData
       FROM PLAY P, SONG S
       WHERE P.IID = S.PID
[Figure: XAT left above SQL Q1. Operators shown: OrderStep [$pPos], [$pPos, $sPos] $uNumPos; Select $uNumPos=2; OrderBy $sPos, $pPos; Tagger $sData $uDataSongTag; Combine $uDataSongTag; Tagger $uDataSongTag $result]

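With shallow push, the numbering done by row_number() in the deep-push query has to be computed over the tuples returned by Q1. The following Python sketch shows one way this could look. The row layout (pPos, sPos, sData), the function names and the sample rows are assumptions for illustration; this is not the OrderStep implementation of the Rainbow engine.

```python
from itertools import groupby

# Hypothetical result of Q1: (pPos, sPos, sData), deliberately unordered,
# since a relational engine gives no ordering guarantee without ORDER BY.
q1_rows = [
    (3, 2, "Shutdown"),
    (2, 1, "Bullet"),
    (1, 1, "She"),
    (3, 1, "SXE Revenge"),
    (2, 2, "We Are 138"),
]


def order_step(rows):
    """Partition by pPos, order by (pPos, sPos), and append a 1-based row number."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    numbered = []
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        for num_pos, (p_pos, s_pos, s_data) in enumerate(partition, start=1):
            numbered.append((p_pos, s_pos, s_data, num_pos))
    return numbered


def second_songs(rows):
    """Select $uNumPos = 2 on top of the numbered tuples."""
    return [s_data for (_, _, s_data, num_pos) in order_step(rows) if num_pos == 2]


print(second_songs(q1_rows))
```

The sort happens once, in the middleware, and the selection runs over the already numbered stream; the first play has only one song and therefore contributes nothing to the result.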
Deep Push vs. Shallow Push
- Low selectivity: the two strategies perform similarly
- High selectivity: shallow push is better
- Repeated sorting in deep push is expensive!

Experimental Study
[Figure: SQL execution time, global vs. local order encoding]

Discussion: Further SQL optimization
- General SQL optimization can be applied …
  - Cost-based SQL translation (SilkRoute)
  - Any other SQL optimization …
- When an order encoding is assumed …
  - SQL statements can be simplified by avoiding re-ordering
- When the relational database schema is known …
  - Schema-specific SQL optimization [KKN2002]

Related Work
- XQuery-to-SQL translation systems: XPERANTO, SilkRoute, …
- [TVB2002] I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and Querying Ordered XML Using a Relational Database System. In SIGMOD, 2002.
  - Three order encoding methods are utilized
  - Algorithms for translating ordered XPath expressions into SQL
  - But …
- [KKN2002] R. Krishnamurthy, R. Kaushik, and J. F. Naughton. Optimizing Fixed-Schema XML to SQL Query Translation. In VLDB, 2002.

Conclusion
- Proposed a general framework for order-sensitive XQuery-to-SQL translation (XSOT)
- Proposed an order-sensitive XML Algebra Tree (XAT)
- SQL-oriented order-sensitive XAT optimization
- Efficient generation of order-sensitive SQL statements, with optimization techniques
- Implementation using the Rainbow query engine
- Experiments to verify the generality and SQL performance

Rainbow XML Management System
- Rainbow website:
- Software download
Thank you!