1
WIDM 2002, DSRG, Worcester Polytechnic Institute
Honey, I Shrunk the XQuery! An XML Algebra Optimization Approach
Xin Zhang, Bradford Pielech and Elke A. Rundensteiner
2
XML and Relational
XML: a flexible and powerful way to:
1) Represent data on the web
2) Exchange data between applications
Relational Database:
1) Widely used to store business data
2) Efficient, reliable, secure
3) Provides standard querying (SQL)
XML + RDB: the look and feel of an XML query system combined with the maturity and technology support of an RDB.
3
Architecture
XAT: XML Algebra Tree
[Figure: architecture diagram. Components: XAT Generator, XAT Decorrelator, XAT Merger, XAT Optimizer, SQL Generator, XAT Executor, RDBMS. Labels: User XQuery, View XQuery, User XAT, View XAT, XAT, SQL, Tuples, XML Document, Virtual XML Document, User Query Results in XML.]
4
Goal: XQuery-level optimization
5
Running Example
Relational data exposed through dxv.xml:
book
Bid | Title
001 | TCP/IP Illustrated
002 | Data on the Web
prices
Bid | Price
001 | 65.95
002 | 34.95
View query (defines prices.xml):
FOR $book IN document("dxv.xml")/book/row,
    $prices IN document("dxv.xml")/prices/row
WHERE $book/bid = $prices/bid
RETURN $book/title, $prices/price
View result, values only:
TCP/IP Illustrated | 65.95
Data on the Web | 34.95
User query over the view:
FOR $t IN document("prices.xml")/book/title RETURN $t
User query result, values only:
TCP/IP Illustrated
Data on the Web
6
User Query to User XML Algebra Tree (XAT)
User query:
FOR $t IN document("prices.xml")/book/title RETURN $t
[Figure: user XAT, listed by node ID]
1: col3
2: T $t -> col3
3: Agg
6: R0, book/title -> $t
7: S "prices.xml" -> R0
7
View Query to View XML Algebra Tree (XAT)
View query:
FOR $book IN document("dxv.xml")/book/row,
    $prices IN document("dxv.xml")/prices/row
WHERE $book/bid = $prices/bid
RETURN $book/title, $prices/price
[Figure: view XAT, listed by node ID]
11: T col5 -> col4
12: Agg
22: T [col10][col12] -> col5
23: $book, title -> col10
25: $prices, price -> col12
26: col6=col7
31: (operator label not captured)
27: $book, bid -> col6
28: $prices, bid -> col7
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
20: R3, /prices/row -> $prices
21: S "dxv.xml" -> R3
8
Merged XML Algebra Tree (XAT)
[Figure: merged XAT, listed by node ID]
User query part:
1: col3
2: T $t -> col3
3: Agg
6: R0, book/title -> $t
7: col4, R0 (in the position of S "prices.xml")
View query part:
11: T col5 -> col4
12: Agg
22: T [col10][col12] -> col5
23: $book, title -> col10
25: $prices, price -> col12
26: col6=col7
31: (operator label not captured)
27: $book, bid -> col6
28: $prices, bid -> col7
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
20: R3, /prices/row -> $prices
21: S "dxv.xml" -> R3
9
Outline
1) XAT Optimization: XAT Rewrite, XAT Cleanup
2) Preliminary Evaluation
3) Related Work
4) Summary
10
XAT Rewrite
Query optimization at the logical level.
Goals: redundancy elimination, computation pushdown.
Technique: equivalence rewrite rules.
Heuristics:
1) Push down navigations
2) Remove construction of intermediate results
3) Combine multiple operators
11
Before Navigation Pushdown
[Figure: the merged XAT of user query (nodes 1, 2, 3, 6, 7) and view query (nodes 11 to 31) before rewriting; same tree as on the "Merged XML Algebra Tree (XAT)" slide.]
12
After Navigation Pushdown
[Figure: XAT after navigation pushdown, listed by node ID]
User query part:
1: col3
2: T $t -> col3
3: Agg
6: R0, book/title -> $t
View query part:
11: T col5 -> R0
12: Agg
22: T [col10][col12] -> col5
26: col6=col7
31: (operator label not captured)
23: $book, title -> col10
27: $book, bid -> col6
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
25: $prices, price -> col12
28: $prices, bid -> col7
20: R3, /prices/row -> $prices
21: S "dxv.xml" -> R3
13
After Tagger Cancel Out
[Figure: XAT after Tagger cancel-out, listed by node ID]
User query part:
1: col3
2: T $t -> col3
3: Agg
View query part:
31: JOIN col6=col7
23: $book, title -> $t
27: $book, bid -> col6
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
25: $prices, price -> col12
28: $prices, bid -> col7
20: R3, /prices/row -> $prices
21: S "dxv.xml" -> R3
14
Outline
1) XAT Optimization: XAT Rewrite, XAT Cleanup
2) Preliminary Evaluation
3) Related Work
4) Summary
15
XAT Cleanup
Why: the SQL engine cannot reduce redundancy in the XQuery.
How:
1) Data redundancy is removed by schema cleanup. Each operator produces, consumes, and modifies some columns; the minimum schema is then computed.
2) Tree redundancy is removed by unused-operator cutting: cutting matrix generation, required-columns analysis, operator cutting.
16
XAT Operator Properties
Produced: new columns generated by the operator. Examples: S, T (a further operator symbol was not captured).
Consumed: columns required by the operator. (Example operator symbols not captured.)
Modified: columns modified by the operator. (Example operator symbols not captured.)
17
Schema Computation
Node | Parent | Produced | Consumed | Old Schema
21 | 20 | {R3} | {} | {R3}
20 | 28 | {$prices} | {R3} | {R3, $prices}
28 | 25 | {col7} | {$prices} | {R3, $prices, col7}
25 | 31 | {col12} | {$prices} | {R3, $prices, col7, col12}
15 | 14 | {R1} | {} | {R1}
14 | 27 | {$book} | {R1} | {R1, $book}
27 | 23 | {col6} | {$book} | {R1, $book, col6}
23 | 31 | {$t} | {$book} | {R1, $book, col6, $t}
31 | 3 | {} | {col6, col7} | {R1, $book, col6, $t, R3, $prices, col7, col12}
3 | 2 | {} | | {R1, $book, col6, $t, R3, $prices, col7, col12}
2 | 1 | {col3} | {$t} | {col3, R1, $book, col6, $t, R3, $prices, col7, col12}
1 | | {} | |
[Figure: the XAT after Tagger cancel-out, nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21.]
18
Schema Computation
P = produced, C = consumed.
# | Parent | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | New Schema
1 | | C | . | . | . | . | . | . | . | . |
2 | 1 | P | C | . | . | . | . | . | . | . | {col3}
3 | 2 | . | . | . | . | . | . | . | . | . | {$t}
31* | 3 | . | . | C | C | . | . | . | . | . | {$t}
23 | 31 | . | P | . | . | C | . | . | . | . | {col6, $t}
27 | 23 | . | . | P | . | C | . | . | . | . | {$book, col6}
14 | 27 | . | . | . | . | P | C | . | . | . | {$book}
15 | 14 | . | . | . | . | . | P | . | . | . | {R1}
25 | 31 | . | . | . | . | . | . | P | C | . | {col7, col12}
28 | 25 | . | . | . | P | . | . | . | C | . | {$prices, col7}
20 | 28 | . | . | . | . | . | . | . | P | C | {$prices}
21 | 20 | . | . | . | . | . | . | . | . | P | {R3}
*We assume Join didn't modify $t. Otherwise, only node 25 will be deleted.
Intuition: don't keep anything that's not used later.
19
Schema Cleanup Result
Node | Original Schema | Minimum Schema
1 | {col3, R1, $book, col6, $t, R3, $prices, col7, col12} | {col3}
2 | {col3, R1, $book, col6, $t, R3, $prices, col7, col12} | {col3}
3 | {R1, $book, col6, $t, R3, $prices, col7, col12} | {$t}
31 | {R1, $book, col6, $t, R3, $prices, col7, col12} | {$t}
23 | {R1, $book, col6, $t} | {col6, $t}
27 | {R1, $book, col6} | {$book, col6}
14 | {R1, $book} | {$book}
15 | {R1} | {R1}
25 | {R3, $prices, col7, col12} | {col7, col12}
28 | {R3, $prices, col7} | {$prices, col7}
20 | {R3, $prices} | {$prices}
21 | {R3} | {R3}
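The schema cleanup can be sketched in a few lines of Python. This is only one reading that happens to reproduce the table on the Schema Cleanup Result slide, not necessarily the authors' algorithm. The dictionary `XAT` and the helper names (`original_schema`, `needed_above`, `minimum_schema`) are hypothetical. The Produced and Consumed sets are copied from the Schema Computation slides, except that the empty Consumed set for node 3 (Agg) is an assumption, since that cell is blank in the slides.

```python
# Hypothetical encoding of the example XAT: node -> (parent, produced, consumed).
# Sets are copied from the slides; the empty consumed set for node 3 (Agg)
# is an assumption because that cell is blank in the slides.
XAT = {
    1:  (None, set(),       {"col3"}),
    2:  (1,    {"col3"},    {"$t"}),
    3:  (2,    set(),       set()),
    31: (3,    set(),       {"col6", "col7"}),
    23: (31,   {"$t"},      {"$book"}),
    27: (23,   {"col6"},    {"$book"}),
    14: (27,   {"$book"},   {"R1"}),
    15: (14,   {"R1"},      set()),
    25: (31,   {"col12"},   {"$prices"}),
    28: (25,   {"col7"},    {"$prices"}),
    20: (28,   {"$prices"}, {"R3"}),
    21: (20,   {"R3"},      set()),
}


def children(node):
    """Return the nodes whose parent is `node`."""
    return [n for n, (parent, _, _) in XAT.items() if parent == node]


def original_schema(node):
    """Bottom-up: every column produced at or below this node."""
    schema = set(XAT[node][1])
    for child in children(node):
        schema |= original_schema(child)
    return schema


def needed_above(node):
    """Top-down: columns that this node's output must carry for operators above it."""
    parent = XAT[node][0]
    if parent is None:
        # For the root, the columns it consumes are the final result.
        return set(XAT[node][2])
    _, parent_produced, parent_consumed = XAT[parent]
    return parent_consumed | (needed_above(parent) - parent_produced)


def minimum_schema(node):
    """Columns the node produces itself plus the available columns still used later."""
    produced = XAT[node][1]
    return produced | (needed_above(node) & original_schema(node))


for node in XAT:
    print(node, sorted(original_schema(node)), sorted(minimum_schema(node)))
```

Under these assumptions the printed minimum schemas agree with the table, for example `{col6, $t}` for node 23 and `{col7, col12}` for node 25. A node's own produced columns stay in its minimum schema even when nothing above uses them, which is why removing whole operators is left to the cutting step.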
20
XAT Cleanup
1) Schema cleanup: each operator produces, consumes, and modifies some columns; the minimum schema is then computed.
2) Unused-operator cutting: cutting matrix generation, required-columns analysis, operator cutting.
21
Cutting Matrix
Purpose: get rid of the unused operators.
Equations (formulas not captured):
1) Propagation of modified
2) Propagation of required
3) Identification of cuttable nodes
22
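Matrix Computation
P = produced, C = consumed.
# | Parent | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1 | | C | . | . | . | . | . | . | . | . |
2 | 1 | P | C | . | . | . | . | . | . | . |
3 | 2 | - | - | - | - | - | - | - | - | - |
31* | 3 | . | . | C | C | . | . | . | . | . |
23 | 31 | . | P | . | . | C | . | . | . | . |
27 | 23 | . | . | P | . | C | . | . | . | . |
14 | 27 | . | . | . | . | P | C | . | . | . |
15 | 14 | . | . | . | . | . | P | . | . | . |
25 | 31 | . | . | . | . | . | . | P | C | . |
28 | 25 | . | . | . | P | . | . | . | C | . |
20 | 28 | . | . | . | . | . | . | . | P | C |
21 | 20 | . | . | . | . | . | . | . | . | P |
*We assume Join didn't modify $t. Otherwise, only node 25 will be deleted.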
23
Matrix Computation (Cont. 1)
P = produced, C = consumed, M = modified, R = required.
# | Parent | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1 | | R | R | . | . | R | R | . | . | . |
2 | 1 | P | C | . | . | . | . | . | . | . |
3 | 2 | - | M | - | - | - | - | - | - | - |
31* | 3 | . | . | C | C | . | . | . | . | . |
23 | 31 | . | P | . | . | C | . | . | . | . |
27 | 23 | . | . | P | . | C | . | . | . | . |
14 | 27 | . | . | . | . | P | C | . | . | . |
15 | 14 | . | . | . | . | . | P | . | . | . |
25 | 31 | . | . | . | . | . | . | P | C | . |
28 | 25 | . | . | . | P | . | . | . | C | . |
20 | 28 | . | . | . | . | . | . | . | P | C |
21 | 20 | . | . | . | . | . | . | . | . | P |
*We assume Join didn't modify $t. Otherwise, only node 25 will be deleted.
Intuition: give me only the required columns in order to get the final result.
24
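Matrix Computation (Cont. 2)
P = produced, C = consumed, M = modified, R = required, X = cut.
# | Parent | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1 | | R | R | . | . | R | R | . | . | . |
2 | 1 | P | C | . | . | . | . | . | . | . |
3 | 2 | - | M | - | - | - | - | - | - | - |
31* | 3 | . | . | C | C | . | . | . | . | . | X
23 | 31 | . | P | . | . | C | . | . | . | . |
27 | 23 | . | . | P | . | C | . | . | . | . | X
14 | 27 | . | . | . | . | P | C | . | . | . |
15 | 14 | . | . | . | . | . | P | . | . | . |
25 | 31 | . | . | . | . | . | . | P | C | . | X
28 | 25 | . | . | . | P | . | . | . | C | . | X
20 | 28 | . | . | . | . | . | . | . | P | C | X
21 | 20 | . | . | . | . | . | . | . | . | P | X
*We assume Join didn't modify $t. Otherwise, only node 25 will be deleted.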
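The required-columns analysis and operator cutting can be illustrated with a short Python sketch. The slide's equations were not captured, so this is an assumed rule that reproduces the X marks in the matrix, not the authors' formulas: walking the tree top-down, an operator is kept only if it produces or modifies a column that is already required, and a kept operator adds its consumed columns to the required set. The names `MATRIX` and `cut_unused` are hypothetical, and the row for node 31 follows the footnote's assumption that Join does not modify `$t`.

```python
# node -> (parent, produced, consumed, modified); marks copied from the matrix.
MATRIX = {
    1:  (None, set(),       {"col3"},         set()),
    2:  (1,    {"col3"},    {"$t"},           set()),
    3:  (2,    set(),       set(),            {"$t"}),
    31: (3,    set(),       {"col6", "col7"}, set()),  # footnote: Join leaves $t untouched
    23: (31,   {"$t"},      {"$book"},        set()),
    27: (23,   {"col6"},    {"$book"},        set()),
    14: (27,   {"$book"},   {"R1"},           set()),
    15: (14,   {"R1"},      set(),            set()),
    25: (31,   {"col12"},   {"$prices"},      set()),
    28: (25,   {"col7"},    {"$prices"},      set()),
    20: (28,   {"$prices"}, {"R3"},           set()),
    21: (20,   {"R3"},      set(),            set()),
}


def cut_unused(matrix):
    """Return (required columns, kept nodes, cut nodes) for a tree-shaped matrix."""
    root = next(n for n, row in matrix.items() if row[0] is None)
    required = set(matrix[root][2])  # the root's input is the final result
    kept, cut = {root}, set()
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child, (parent, produced, consumed, modified) in matrix.items():
            if parent != node:
                continue
            if (produced | modified) & required:
                kept.add(child)
                required |= consumed  # inputs of a kept operator become required
            else:
                cut.add(child)
            frontier.append(child)  # descendants are still inspected either way
    return required, kept, cut


required, kept, cut = cut_unused(MATRIX)
print(sorted(required), sorted(kept), sorted(cut))
```

On this example the sketch marks nodes 31, 27, 25, 28, 20 and 21 as cuttable and ends with four required columns (col3, $t, $book, R1), matching the four R marks in the root row. It only identifies cuttable nodes; reattaching the children of a cut node to its nearest kept ancestor is left out.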
25
XAT after Cutting
[Figure: the 12-node XAT (nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21) reduced to the following nodes]
1: col3
2: T $t -> col3
3: Agg
23: $book, title -> $t
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
26
SQL Generated
From the XAT before cutting:
SELECT "$book".title AS "$t", "$book".bid AS "col6", "$prices".price AS "col12", "$prices".bid AS "col7"
FROM book "$book", prices "$prices"
WHERE "col6" = "col7"
From the XAT after cutting (nodes 1, 2, 3, 23, 14, 15):
SELECT "$book".title AS "$t"
FROM book "$book"
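As a sanity check on the running example, the two statements can be compared with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table layout (`book(bid, title)`, `prices(bid, price)`) is inferred from the Running Example slide, the aliases `b` and `p` replace the quoted `"$book"` and `"$prices"` aliases, the join condition is written on the base columns, and an `ORDER BY` is added so the comparison is deterministic.

```python
import sqlite3

# In-memory database holding the two tables of the running example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (bid TEXT, title TEXT)")
conn.execute("CREATE TABLE prices (bid TEXT, price REAL)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("001", "TCP/IP Illustrated"), ("002", "Data on the Web")],
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("001", 65.95), ("002", 34.95)],
)

# Query shaped like the SQL generated before cutting: a join with four columns.
before = conn.execute(
    "SELECT b.title AS t, b.bid AS col6, p.price AS col12, p.bid AS col7 "
    "FROM book b, prices p WHERE b.bid = p.bid ORDER BY b.bid"
).fetchall()

# Query shaped like the SQL generated after cutting: one table, one column.
after = conn.execute("SELECT b.title AS t FROM book b ORDER BY b.bid").fetchall()

# The user query only asks for titles, so compare the title column.
titles_before = [row[0] for row in before]
titles_after = [row[0] for row in after]
print(titles_before == titles_after)
conn.close()
```

The two title lists coincide here because every book has exactly one matching price row. With unmatched or duplicated `bid` values the join would filter or repeat titles, so the equivalence depends on the assumption stated in the slides' footnote about the Join.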
27
Outline
1) XAT Optimization: XAT Rewrite, XAT Cleanup
2) Preliminary Evaluation
3) Related Work
4) Summary
28
Preliminary Evaluation
Experiment setup: XQuery over the Kweelt parser; PIII 800, 256 MB, Win 2k Pro.
Data setup: synthetic data, synthetic queries.
Query execution: native XML engine.
29
Performance Gain in Execution
[Figure: chart]
30
Query Engine Overhead
[Figure: overhead shown against the architecture components, with labels XAT Rewrite and XAT Cleanup.]
Total: 32,522 ms
31
Outline
1) XAT Optimization: XAT Rewrite, XAT Cleanup
2) Preliminary Evaluation
3) Related Work
4) Summary
32
Related Work
Rainbow: optimizes on the XAT (static analysis); algebra-level rewriting.
SQL optimization: algebra-based optimization; static analysis.
XQuery by views: optimize in SQL.
XPERANTO [VLDBJ2000]: XQGM vs. XAT; extension by UDFs for XML features.
SilkRoute [IEEE2001(24:2)]: generates SQL efficiently.
AGORA [VLDB2000]: syntax-level rewriting.
33
Summary
Efficient XQuery processing:
1) XML Algebra Tree (XAT)
2) XAT optimization: rewrite using equivalence rules; cleanup (schema cleanup, operator cutting)
3) Prototype system implementation
34
Questions? (Futures!)
http://davis.wpi.edu/dsrg/rainbow
https://sourceforge.net/projects/rainbow-engine/
Special thanks: Brian Murphy, Luping Ding, DSRG group.
35
[Figure: architecture diagram with XAT Generator, XAT Decorrelator, XAT Merger, XAT Optimizer, SQL Generator and XAT Executor, as on the Architecture slide.]
36
Schema Computation
Node | Parent | Produced | Consumed | Minimum Schema
1 | | {} | {col3} | {col3}
2 | 1 | {col3} | {$t} | {col3}
3 | 2 | {} | | {$t}
31 | 3 | {} | {col6, col7} | {$t}
23 | 31 | {$t} | {$book} | {col6, $t}
27 | 23 | {col6} | {$book} | {$book, col6}
14 | 27 | {$book} | {R1} | {$book}
15 | 14 | {R1} | {} | {R1}
25 | 31 | {col12} | {$prices} | {col7, col12}
28 | 25 | {col7} | {$prices} | {$prices, col7}
20 | 28 | {$prices} | {R3} | {$prices}
21 | 20 | {R3} | {} | {R3}
[Figure: the XAT after Tagger cancel-out, nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21.]
37
After Tagger Cancel Out
[Figure: XAT after Tagger cancel-out, listed by node ID]
User query part:
1: col3
2: T $t -> col3
3: Agg
View query part:
26: col6=col7
31: (operator label not captured)
23: $book, title -> $t
27: $book, bid -> col6
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
25: $prices, price -> col12
28: $prices, bid -> col7
20: R3, /prices/row -> $prices
21: S "dxv.xml" -> R3