1 XQuery to SQL by XML Algebra Tree Brad Pielech, Brian Murphy Thanks: Xin
2 Outline 1. Overview of Rainbow System 2. Process of translating XQuery -> SQL 3. XML Operators 4. Partial translation walkthrough with running example
3 Rainbow System
Complete XML SQL system
Uses some ideas from XPERANTO, Niagara, and other systems
Several main subsystems: Document Shredder; View Generator; Query Translation, Query Rewrite; Result Generation
Work in progress
4 Steps in Translation
1. User inputs an XQuery query
2. User query is converted into an XML Algebra Tree (XAT)
3. Database mapping query's XAT is generated
4. Queries are decorrelated
5. Trees are merged, unnecessary branches cut
5 Steps Continued
6. Computation Pushdown (presentation concludes here)
7. SQL Generation
8. Query Execution
9. Tagging of Results
6 What is the difference between the two queries?
The user query is executed over a view of the XML document and specifies what to return and how to return it.
The mapping query specifies how the view the user is querying "maps" to the database.
Therefore, the two queries must be combined into one in order to process the user's request correctly.
7 XAT Operators
Each XAT is composed of XAT operators, which are similar in concept to relational algebra operators.
The operator set is a combination of the operators in the Niagara and XPERANTO papers.
8 Set of Operators
SQL-like (9): Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), Cartesian Product.
XML-like (4): Tagger, Navigate, is(Element, Text), Aggregate.
Special: SQL, Function, Source, NameColumn, FOR.
9 SQL-like Operators (9)
Operator | Niagara | XPERANTO
Project | Expose | Project
Select | |
Theta Join | Join | Theta Join
Outer Join | N/A | Outer Join
Semi Join | N/A |
Groupby | Group | Groupby
Orderby | N/A | Orderby
Union | |
Outer Union | Union | Outer Union
10 XML-like Operators
Operator | Niagara | XPERANTO
Tagger* (pattern) | Vertex | Project: cr8(Elem, AttList, Att, XMLFragList)
Navigate (from, path) | Follow | Project: get(TagName, Attributes, Contents, AttName, AttValue), Unnest
Is | N/A | Select: is(Element, Text)
Aggregate | Group | AggXMLFrags
11 Special Operators
Operator | Niagara | XPERANTO | Description
SQL | N/A | Input | Denotes a SQL query.
Function | N/A | Function | Used to represent a recursive query.
Source | | Table, View | Identifies a data source.
NameColumn | Rename | N/A | Naming of columns.
FOR | N/A | | FOR iteration.
12 Sports XML Document
Boston Red Sox Nomar Shortstop Pedro Pitcher Manny Outfield … Fenway Park 33, …
<player name="Pedro" number="45" rookieYear="1991" />
<player name="Nomar" number="5" rookieYear="1997" />
<player name="Manny" number="24" rookieYear="1993" />
13 Example XQuery
Goal: list the names of all star players on the Boston Red Sox.
{
FOR $p IN document("sports.xml")/sports/organization
LET $a := $p/team/text()
WHERE $a = "Boston Red Sox"
RETURN $p/starPlayer/pname/text()
}
14 XAT Tree for Example Query
V1 := Aggregate
$pname := Navigate($p, starPlayer/pname/text())
Select($a = "Boston Red Sox")
$a := Navigate($p, team/text())
$p := Navigate("/", sports/organization)
Source("sports.xml")
Tagger( V1
Tagger( $pname
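A minimal Python sketch of how the lower part of this tree could be held in memory. The class name `XATOperator` and its fields are hypothetical and are not taken from Rainbow; only the operator names and their arguments come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class XATOperator:
    """One node of an XML Algebra Tree (hypothetical representation)."""
    kind: str                       # e.g. "Navigate", "Select", "Source"
    args: tuple = ()                # operator-specific arguments, as strings
    output: Optional[str] = None    # column produced by the operator, e.g. "$p"
    children: List["XATOperator"] = field(default_factory=list)

    def show(self, depth: int = 0) -> str:
        """Render the subtree top-down, one operator per line."""
        head = f"{self.output} := " if self.output else ""
        line = "  " * depth + f"{head}{self.kind}({', '.join(self.args)})"
        return "\n".join([line] + [c.show(depth + 1) for c in self.children])


# Build the chain leaf-first: Source at the bottom, Select at the top.
source = XATOperator("Source", ("sports.xml",))
nav_p = XATOperator("Navigate", ("/", "sports/organization"), "$p", [source])
nav_a = XATOperator("Navigate", ("$p", "team/text()"), "$a", [nav_p])
select = XATOperator("Select", ('$a = "Boston Red Sox"',), None, [nav_a])

print(select.show())
```

Each operator reads the columns produced below it, which is why the tree is built from the Source upward.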
15 RDBMS Tables of Sports Info
Organization
organizationID | teamName | stadiumName
1 | Boston Red Sox | Fenway Park
Stadium
stadiumID | sname | Capacity | yearBuilt | ticketHigh | ticketLow
1 | Fenway Park | 33, | | |
StarPlayer
starPlayerName | starPlayerPosition | organizationID
Nomar | ShortStop | 1
Pedro | Pitcher | 1
Manny | Outfield | 1
PlayerInfo
PlayerName | Number | rookieYear
Nomar | 5 | 1997
Pedro | |
Manny | |
16 Partial Default XML View 1 Boston Red Sox Fenway Park 1 Fenway Park …
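A minimal Python sketch of one way a default XML view could be derived from the tables: each table becomes an element, each tuple a row element, and each column a child of that row. The function name `default_view` and the root element name `default` are assumptions; the table/row/column nesting follows paths such as /Organization/row that the mapping query navigates.

```python
import xml.etree.ElementTree as ET


def default_view(tables):
    """Build <default><Table><row><column>value</column>...</row></Table></default>."""
    root = ET.Element("default")  # root element name is an assumption
    for table_name, (columns, rows) in tables.items():
        table_el = ET.SubElement(root, table_name)
        for row in rows:
            row_el = ET.SubElement(table_el, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
    return root


# Rows copied from the Organization and StarPlayer tables.
tables = {
    "Organization": (
        ["organizationID", "teamName", "stadiumName"],
        [(1, "Boston Red Sox", "Fenway Park")],
    ),
    "StarPlayer": (
        ["starPlayerName", "starPlayerPosition", "organizationID"],
        [("Nomar", "ShortStop", 1), ("Pedro", "Pitcher", 1), ("Manny", "Outfield", 1)],
    ),
}

view = default_view(tables)
team_names = [e.text for e in view.findall("Organization/row/teamName")]
print(team_names)
```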
17 Challenge Question I 1 Boston Red Sox Fenway Park … Nomar shortstop 1 … Boston Red Sox Nomar Shortstop Pedro Pitcher Manny Outfield What is the XQuery that converts the document on the left (default XML view) to the document on the right (user view)?
18 Mapping Query Part I
Create view invoice as (
B1: FOR $organization IN view("default")/Organization/row
RETURN $organization/teamName/text()
B2: FOR $starPlayer IN view("default")/StarPlayer/row
WHERE $starPlayer/organizationID = $organization/organizationID
RETURN $starPlayer/starPlayerName/text() $starPlayer/starPlayerPosition/text()
19 Mapping Query Part II
B3: FOR $stadium IN view("default")/Stadium/row
RETURN $stadium/sname/text() $stadium/capacity/text() $stadium/yearBuilt/text()
B4: FOR $player IN view("default")/PlayerInfo/row
RETURN
)
20 Cutting Mapping Query
The mapping query has data that is unused by the user query, so we can get rid of it.
B3 and B4 are completely removed.
Remove stadium from B1.
Remove position from B2.
21 Mapping Query XAT General Form $organization := Navigate("/",Organization/row) Source(“default.xml”) FOR $organization More Stuff Some Stuff Source(“default.xml”) FOR $starPlayer Some Stuff will be shown in Part I More Stuff in Part II B1 B2 $starPlayer := Navigate("/", StarPlayer/row)
22 Mapping Query XAT Part I B1 O := Tagger( All </sports) All = Aggregate Tagger( V0 </organization) V0 := Aggregate Tagger ( $tname ) $tname := Navigate($organization, teamName/text()) $starPlayer := Navigate("/", StarPlayer/row) Source("default.xml") FOR $starPlayer To: Part II Some Stuff FOR $organization
23 Mapping Query XAT Part II
Aggregate
$ID := Navigate($organization, organizationID)
Select($starPlayerID = $ID)
$starPlayerID := Navigate($starPlayer, organizationID)
$sname := Navigate($starPlayer, starPlayerName)
To: Part I
B2
Tagger( $sname </starPlayer)
More Stuff
24 Decorrelated Mapping XAT Part I Boston Red Sox Nomar Pedro Manny Tagger ( $tname O:= Tagger( All ) All = Aggregate Tagger( V0 </organization) V0 := Aggregate $tname := Navigate($organization, teamName/text()) From Part II
25 Decorrelated Mapping XAT Part II
Source("default.xml")
$organization := Navigate("/", Organization/row)
$starPlayer := Navigate("/", StarPlayer/row)
Cartesian Product
$ID := Navigate($organization, organizationID)
$starPlayerID := Navigate($starPlayer, organizationID)
Select($starPlayerID = $ID)
$sname := Navigate($starPlayer, starPlayerName)
To Part I
Aggregate
Tagger( $sname </starPlayer)
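A minimal Python sketch of the idea behind decorrelation, using the rows of the example tables: the nested FOR re-evaluates the inner block once per organization, while the decorrelated form builds a Cartesian Product once and filters it with a Select. The function names `correlated` and `decorrelated` are hypothetical, and the dictionaries stand in for tuples.

```python
from itertools import product

organizations = [{"organizationID": 1, "teamName": "Boston Red Sox"}]
star_players = [
    {"starPlayerName": "Nomar", "organizationID": 1},
    {"starPlayerName": "Pedro", "organizationID": 1},
    {"starPlayerName": "Manny", "organizationID": 1},
]


def correlated(orgs, players):
    """Nested FOR: the inner block runs again for every organization."""
    out = []
    for org in orgs:
        names = [p["starPlayerName"] for p in players
                 if p["organizationID"] == org["organizationID"]]
        out.append((org["teamName"], names))
    return out


def decorrelated(orgs, players):
    """Cartesian Product, then Select, then regroup per organization."""
    pairs = [(o, p) for o, p in product(orgs, players)
             if p["organizationID"] == o["organizationID"]]
    grouped = {}
    for o, p in pairs:
        grouped.setdefault(o["teamName"], []).append(p["starPlayerName"])
    return list(grouped.items())


print(correlated(organizations, star_players))
print(decorrelated(organizations, star_players))
```

In this sketch the two forms agree only when every organization has at least one star player; an organization with none would be dropped by the product-and-select form.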
26 Progress Report
1. User inputs an XQuery query
2. User query is converted into an XML Algebra Tree (XAT)
3. Database mapping query's XAT is generated
4. Queries are decorrelated
5. Trees are merged, unnecessary branches cut
27 XAT Merging
Input: User Query XAT + Mapping Query XAT
Output: Simplified composite XAT
Approach:
The Tagger from the top of the Mapping Query is linked to the bottom of the User Query.
The Source operator at the bottom of the User Query is deleted.
Push down navigation by using the commutative rules.
Cancel out the navigation operators by using the composition rules.
28 Combined XAT V1 := Aggregate $pname = Navigate($p, starPlayer/pname/text()) Select($a = "Boston Red Sox") $a := Navigate($p, team/text()) $p := Navigate(O, sports/organization) Tagger( V1 Tagger( $pname Tagger ( $tname O:= Tagger( All ) All = Aggregate Tagger( V0 </organization) V0 := Aggregate $tname := Navigate($organization, teamName/text()) Top of Mapping Query User Query Rest of Mapping Query
29 Computation Pushdown Part I
What is Pushdown?
After merging the two XATs, there may be redundancies in the larger tree. Example: the user query and the mapping query may navigate to the same thing.
The decorrelated query tree may be unorganized and inefficient.
Pushdown aims to eliminate these problems.
30 Computation Pushdown Part II
XPERANTO mentions pushdown as a means of pushing computation to the relational engine.
Niagara defines equivalence rules and specifies several different heuristics for using the rules.
31 XAT Pushdown Example Part I V1 := Aggregate $pname = Navigate($p, starPlayer/pname/text()) Select($a = "Boston Red Sox") $a := Navigate($p, team/text()) $p := Navigate(O, sports/organization) Tagger( V1 Tagger( $pname Tagger ( $tname O:= Tagger( All ) All = Aggregate Tagger( V0 </organization) V0 := Aggregate $tname := Navigate($organization, teamName/text()) Top of Mapping Query User Query Rest of Mapping Query
32 XAT Pushdown Example Part II V1 := Aggregate $pname = Navigate($p, starPlayer/pname/text()) Select($a = "Boston Red Sox") $a := Navigate($p, team/text()) $p := Navigate(O, sports/organization) Tagger( V1 Tagger( $pname Tagger ( $tname O:= Tagger( All ) All = Aggregate Tagger( V0 </organization) V0 := Aggregate $tname := Navigate($organization, teamName/text()) Top of Mapping Query User Query Rest of Mapping Query
33 XAT Pushdown Example Part III V1 := Aggregate $pname := Navigate($p, starPlayer/pname/text()) Select($a = "Boston Red Sox") $a := Navigate($p, team/text()) $p := Navigate(O, sports/organization) Tagger( V1 Tagger( $pname Source("default.xml") Cartesian Product $organization := Navigate("/", Organization/row) $starPlayer := Navigate("/", StarPlayer/row) $ID := Navigate($organization, organizationID) $starPlayerID := Navigate($starPlayer, organizationID) Select($starPlayerID = $ID) $sname := Navigate($starPlayer, starPlayerName) Source("default.xml") User Query Tagger( $sname </starPlayer)
34 XAT Pushdown Example Part IV V1 := Aggregate $pname := Navigate($p, starPlayer/pname/text()) Select($a = "Boston Red Sox") $a := Navigate($p, team/text()) $p := Navigate(O, sports/organization) Tagger( V1 Tagger( $pname Source("default.xml") Cartesian Product $organization := Navigate("/", Organization/row) $starPlayer := Navigate("/", StarPlayer/row) $ID := Navigate($organization, organizationID) $starPlayerID := Navigate($starPlayer, organizationID) Select($starPlayerID = $ID) $sname := Navigate($starPlayer, starPlayerName) Source("default.xml") User Query Tagger( $sname </starPlayer)
35 Challenge Questions II & III What are some of the heuristics we could use during Pushdown? What can / should we try to accomplish? What should the tree look like afterwards? How could we go about pushing things down? What would the algorithm be? How do we know if an operator can be pushed down? When do we stop pushing an operator down?
36 Computation Pushdown Part III
Goal: Tagger + SQL operators + XML operators
Use the equivalence rules repository to swap operators.
Step 1: Navigation Pushdown.
Cancel Mapping Query Taggers and corresponding Aggregates.
Delete redundant Navigates from User Query.
Rename columns in Mapping Query.
Step 2: SQL Computation Pushdown, by commutative and composition rules.
37 Equivalence Rules
Pair-wise rules that determine whether one operator (parent) may be pushed through another (child).
Navigate / Navigate: if the parent depends on the child, they may not be swapped.
Navigate / Join: Navigate is pushed to the side of the join that its entry point comes from.
And many, many more.
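A minimal Python sketch of the Navigate / Navigate rule, assuming that "depends on" means the parent's entry point is the column the child produces. The `Navigate` class and the `can_swap` function are hypothetical names; the three operators are the ones from the example user query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Navigate:
    output: str  # column produced, e.g. "$a"
    entry: str   # column (or "/") the path starts from
    path: str


def can_swap(parent: Navigate, child: Navigate) -> bool:
    """The parent may move below the child only if it does not read the child's output."""
    return parent.entry != child.output


nav_p = Navigate("$p", "/", "sports/organization")
nav_a = Navigate("$a", "$p", "team/text()")
nav_pname = Navigate("$pname", "$p", "starPlayer/pname/text()")

print(can_swap(nav_pname, nav_a))  # both start from $p, so they are independent
print(can_swap(nav_a, nav_p))      # $a is navigated from $p, so the order is fixed
```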
38 Pushdown Results
1. Push Navigates to the correct side of the Cartesian Product
2. Create a NameColumn operator that renames $tname into $a
3. Create a second NameColumn operator that renames $sname into $pname
4. Get rid of all Taggers and Aggregates from the Mapping Query, and of the Navigates that were crossed out from the User Query
5. Merge Select($starPlayerID = $ID) and the Cartesian Product into a Join
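A minimal Python sketch of the last pushdown result, merging a Select that sits directly on top of a Cartesian Product into a Join. Operators are modeled as plain tuples and the function name `merge_select_into_join` is hypothetical; the two branch labels are placeholders for the subtrees.

```python
def merge_select_into_join(op):
    """Rewrite Select(cond, CartesianProduct(l, r)) into Join(cond, l, r)."""
    if (
        isinstance(op, tuple)
        and op[0] == "Select"
        and isinstance(op[2], tuple)
        and op[2][0] == "CartesianProduct"
    ):
        _, condition, (_, left, right) = op
        return ("Join", condition, left, right)
    return op  # any other shape is left untouched


before = (
    "Select",
    "$starPlayerID = $ID",
    ("CartesianProduct", "organization branch", "starPlayer branch"),
)
after = merge_select_into_join(before)
print(after)
```

The rewrite only applies once the Navigates have been pushed to their sides of the product, so that the Select is the product's direct parent.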
39 XAT After Computation PushDown Part I V1 := Aggregate Select($a = "Boston Red Sox") Tagger( V1 Tagger( $pname NameColumn( $pname = $sname) NameColumn( $a = $tname) From Part II
40 XAT After Computation PushDown Part II
$starPlayerID := Navigate($starPlayer, organizationID)
$sname := Navigate($starPlayer, starPlayerName)
$starPlayer := Navigate("/", StarPlayer/row)
Source("default.xml")
$ID := Navigate($organization, organizationID)
Source("default.xml")
$organization := Navigate("/", Organization/row)
$tname := Navigate($organization, teamName/text())
Join on ($ID = $starPlayerID)
To Part I
41 Rest of the Process 1. Take the Combined XAT from the previous slide and generate a single SQL query. 2. Execute query on local RDBMS 3. Format result tuples according to Tagger 4. Return XML document to user
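A minimal Python sketch of what these remaining steps might look like for the running example, using SQLite as a stand-in for the local RDBMS. The SQL text is one plausible translation of the final XAT, not the output of Rainbow, and the element names `result` and `pname` used for tagging are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Load the two tables the final tree still refers to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Organization (organizationID INTEGER, teamName TEXT, stadiumName TEXT);
CREATE TABLE StarPlayer (starPlayerName TEXT, starPlayerPosition TEXT, organizationID INTEGER);
INSERT INTO Organization VALUES (1, 'Boston Red Sox', 'Fenway Park');
INSERT INTO StarPlayer VALUES ('Nomar', 'ShortStop', 1);
INSERT INTO StarPlayer VALUES ('Pedro', 'Pitcher', 1);
INSERT INTO StarPlayer VALUES ('Manny', 'Outfield', 1);
""")

# Step 1: a single SQL query covering the Join and the Select.
sql = """
SELECT s.starPlayerName
FROM Organization o
JOIN StarPlayer s ON o.organizationID = s.organizationID
WHERE o.teamName = 'Boston Red Sox'
ORDER BY s.rowid
"""

# Step 2: execute the query.
rows = conn.execute(sql).fetchall()

# Steps 3 and 4: wrap each result tuple in tags and return one document.
root = ET.Element("result")  # element names are assumptions
for (name,) in rows:
    ET.SubElement(root, "pname").text = name
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```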
42 Summary
1. Created XAT of the user query
2. Created XAT for mapping query
   1. Cut information unused by user query
   2. Decorrelated mapping query
3. Merged two queries into one larger XAT
4. Identified weaknesses in combined tree
5. Walked through pushdown steps
6. Displayed final, optimized tree
43 The End!!!