1 XQuery to SQL by XAT. Xin Zhang. Thanks: Brian, Mukesh, Maged, Lily, Elke.
2 Outline: A merged algebra, proposed on the basis of Niagara and XPERANTO. One thorough example of XQuery to SQL translation.
3 Data Model: An ordered table in two dimensions: tuple order and column order. Every cell has its own domain. Every column binds to one variable. The domain can be an SQL domain or an XML fragment, which can be a list of XML elements. Comparisons are done by value.
4 Data Model Examples: Table of XML fragments. Explicit naming, e.g. variable bindings. Implicit naming, e.g. XPath notation, which reduces the complexity of many internal variables. [Figure: example tables of XML fragments; the column names shown include $carrier, invoice_id, carrier, carrier_entry, carriers, //invoice/invoice/account_number and $rate.]
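The data model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `XATTable` and the helper `cell_value` are hypothetical, and the sample rows are invented. It shows an ordered table whose columns are named by variables, whose cells may hold SQL values or XML fragments, and whose cells are compared by value rather than by node identity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class XATTable:
    """Ordered table: both column order and tuple order are significant."""
    columns: list                              # column names, in column order
    rows: list = field(default_factory=list)   # tuples, in tuple order

    def column(self, name):
        """Return the cells bound to one column, preserving tuple order."""
        i = self.columns.index(name)
        return [row[i] for row in self.rows]


def cell_value(cell):
    """Value used for comparison: XML fragments are serialized, SQL values pass through."""
    if isinstance(cell, ET.Element):
        return ET.tostring(cell, encoding="unicode")
    if isinstance(cell, list):                 # a cell may hold a list of XML elements
        return [cell_value(c) for c in cell]
    return cell


# Hypothetical sample rows: one XML-fragment column and one SQL-domain column.
table = XATTable(columns=["$carrier", "invoice_id"])
table.rows.append((ET.fromstring("<carrier>Sprint</carrier>"), 1))
table.rows.append((ET.fromstring("<carrier>Sprint</carrier>"), 2))

first, second = table.column("$carrier")
same_by_value = cell_value(first) == cell_value(second)   # equal content
same_by_identity = first is second                        # distinct nodes
```

The two `<carrier>` cells are different nodes but compare equal by value, which is the comparison rule the slide states.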
5 Naming of Columns: Implicit: SQL operators, Navigate. Explicit (“name”): Variable binding: holds a set of values; the variable name ($name) is the name of a column. Rename: distinguishes, within one operator, the same “names” coming from different sources; abbreviates a very long “name”. New name for creation operators: needs to be used with those operators, e.g. Tagger.
6 Operators: SQL-like (9): Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), COp. XML-like (4): Tagger, Navigate, is(Element, Text), Aggregate. Special: SQL, Function, Source, Name, FOR.
7 SQL-like Operators (9)
Operator    | Niagara | XPERANTO
Project     | Expose  | Project
Select      |         |
Theta Join  | Join    | Theta Join
Outer Join  | N/A     | Outer Join
Semi Join   | N/A     |
Groupby     | Group   | Groupby
Orderby     | N/A     | Orderby
Union       |         |
Outer Union | Union   | Outer Union
COp         | N/A     | Correlated Join
8 XML-like Operators
Operator              | Niagara | XPERANTO
Tagger* (pattern)     | Vertex  | Project: cr8(Elem, AttList, Att, XMLFragList)
Navigate (from, path) | Follow  | Project: get(TagName, Attributes, Contents, AttName, AttValue), Unnest
Is                    | N/A     | Select: is(Element, Text)
Aggregate             | Group   | AggXMLFrags
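To make the two central XML-like operators concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The function names `navigate` and `tagger`, the dictionary-per-row representation and the sample document are assumptions made for illustration; they are not the operator definitions from Niagara or XPERANTO.

```python
import xml.etree.ElementTree as ET


def navigate(rows, from_col, path, out_col):
    """Navigate(from, path): follow `path` from the element held in `from_col`
    and emit one output row per matching node (unnesting)."""
    out = []
    for row in rows:
        for node in row[from_col].findall(path):
            out.append({**row, out_col: node})
    return out


def tagger(rows, tag, content_col, out_col):
    """Tagger(pattern): wrap the value of `content_col` in a new <tag> element."""
    out = []
    for row in rows:
        elem = ET.Element(tag)
        elem.text = str(row[content_col])
        out.append({**row, out_col: elem})
    return out


# Hypothetical document; only the element and attribute names follow the slides.
doc = ET.fromstring(
    "<invoice>"
    '<itemized_call no="1" rate="NIGHT"/>'
    '<itemized_call no="2" rate="DAY"/>'
    "</invoice>"
)

rows = navigate([{"/": doc}], "/", "itemized_call", "$itemized_call")
rows = [{**r, "rate": r["$itemized_call"].get("rate")} for r in rows]
rows = tagger(rows, "rate", "rate", "V1")
tagged = [ET.tostring(r["V1"], encoding="unicode") for r in rows]
```

Navigate turns one input row into two (one per `itemized_call`), and Tagger adds a new column holding the constructed element, so both operators fit the ordered-table data model.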
9 Special Operators
Operator | Niagara | XPERANTO    | Description
SQL      | N/A     | Input       | Denotes a SQL query.
Function | N/A     | Function    | Used to represent recursive queries.
Source   |         | Table, View | Identifies a data source.
Name     | Rename  | N/A         | Naming of columns.
FOR      | N/A     |             | FOR iteration.
10 Operator Specification: Description. Input specification. Output specification. Logic description. Illustrative example.
11 Naming Operator: Syntax: Name(“from_name”, “to_name”). Simplified syntax: to_name := from_name.
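As a small illustration of the naming operator, the sketch below treats a table schema as a list of column names and renames one of them in place. The function name `name` and the list representation are assumptions; the column names are taken from the later XAT slides.

```python
def name(columns, from_name, to_name):
    """Name(from_name, to_name): rename one column, keeping column order."""
    if from_name not in columns:
        raise KeyError(from_name)
    return [to_name if c == from_name else c for c in columns]


# Equivalent to the simplified syntax: $itemized_call := invoice/itemized_call:/
columns = ["invoice/itemized_call:/", "$rate"]
renamed = name(columns, "invoice/itemized_call:/", "$itemized_call")
```

Only the name changes; the position of the column and the cells it holds are untouched.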
12 Steps in Translation: XQuery to XML Algebra Tree. User view to XML Algebra Tree. View composition. Computation pushdown. Optimization.
13 Example of Telephone Bill
<!DOCTYPE invoice [
  <!ELEMENT invoice (account_number, bill_period, carrier+, itemized_call*, total)>
  <!ATTLIST itemized_call
    no            ID          #REQUIRED
    date          CDATA       #REQUIRED
    number_called CDATA       #REQUIRED
    time          CDATA       #REQUIRED
    rate          (NIGHT|DAY) #REQUIRED
    min           CDATA       #REQUIRED
    amount        CDATA       #REQUIRED>
]>
Sample values: Jun 9 - Jul 8, 2000; Sprint; $0.35
14 Example XQuery: User XQuery: { FOR $rate IN LET $itemized_call := WHERE LIKE ‘973%’ RETURN $rate count($itemized_call) } Counts the number of itemized calls in calling area 973, grouped by calling rate.
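The intent of the example query can be checked directly against a small document. The Python sketch below computes the result the slide describes (calls whose number starts with 973, counted per rate) without going through XQuery or XAT. The sample invoice is invented and carries only the attributes the query needs, so it would not validate against the full DTD; the function name `count_calls_by_rate` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; element and attribute names follow the DTD,
# the values are made up and the other #REQUIRED attributes are omitted.
INVOICE = """
<invoice>
  <itemized_call no="1" number_called="973-555-0101" rate="NIGHT"/>
  <itemized_call no="2" number_called="973-555-0102" rate="DAY"/>
  <itemized_call no="3" number_called="212-555-0103" rate="NIGHT"/>
  <itemized_call no="4" number_called="973-555-0104" rate="NIGHT"/>
</invoice>
"""


def count_calls_by_rate(xml_text, prefix="973"):
    """Count itemized calls whose number starts with `prefix`, grouped by rate."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for call in root.findall("itemized_call"):
        if call.get("number_called", "").startswith(prefix):
            counts[call.get("rate")] += 1
    return dict(counts)


result = count_calls_by_rate(INVOICE)
```

For this sample the result holds two NIGHT calls and one DAY call; the call to area 212 is filtered out.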
15 XQuery to XML Algebra Tree: Divide into query blocks. Convert each query block into an XML Algebra Tree (XAT). Identify correlated operators. Combine into one XML Algebra Tree. Query decorrelation.
16 Query Blocks: User XQuery: { FOR $rate IN LET $itemized_call := WHERE LIKE ‘973%’ RETURN $rate count($itemized_call) } B1: Constructs the summary from the result of B2. B2: Gets all the distinct rates and iterates through them. B3: Counts the itemized calls for a given rate. The block identification is arbitrary (wrong).
17 XAT of B1: B1 B2 XAT: Tagger( [V1] ) B2 [V2]: V2 is a name, not part of the pattern. Name(“Tagger( [V1] )”, “V2”)
18 XAT of B2 B3 { FOR $rate IN distinct(document(“invoice”) } B3 XAT: B3 Source(“invoice.xml”) Navigate(“/”, “$rate”) FOR($rate) Aggregate
19 XAT of B3 B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call WHERE $itemized_call LIKE ‘973%’ RETURN $rate count($itemized_call) XAT: Source(“invoice.xml”) Navigate(“/”, invoice/itemized_call) B2 = “$rate”) Name(“invoice/itemized_call:/”, “$itemized_call”)
20 XAT of B3 (Cont.) B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call WHERE $itemized_call LIKE ‘973%’ RETURN $rate count($itemized_call) XAT: like ‘973%’)
21 XAT of B3 (Cont.) B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call WHERE $itemized_call LIKE ‘973%’ RETURN $rate count($itemized_call) XAT: Tagger( [$rate] [count($itemized_call)] ) Select(count(“$itemized_call”)) B2 Name(“Tagger( [$rate] [count($itemized_call)] )”, “V1”)
22 Put it Together Select(count(“$itemized_call”)) like ‘973%’) Source(“invoice.xml”) Navigate(“/”, invoice/itemized_call) = “$rate”) Name(“Tagger( [V1] )”, “V2”) Source(“invoice.xml”) Navigate(“/”, Name(“invoice/itemized_call:/”, “$itemized_call”) B1 B2 B3 FOR($rate) “$rate”) Aggregate() Tagger( [$rate] [count($itemized_call)] ) Name(“Tagger( [$rate] [count($itemized_call)] )”, “V1”) Tagger( [V1] )
23 Syntax Sugar Select(count(“$itemized_call”)) like ‘973%’) Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) = “$rate”) V2 := Tagger( [V1] ) $rate := Source(“invoice.xml”) Navigate(“/”, B1 B2 B3 FOR($rate) Aggregate() V1:=Tagger( [$rate] [count($itemized_call)] )
24 Query Decorrelation for COp: Top-down approach over the XAT tree. Approach: Correlated Binding (CB). Op1[COp(CB, Op2)[Op3[Correlated Operator[A],B]]] is rewritten to Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]]. For example: a Correlated Join becomes an Outer Join with Groupby with Cartesian.
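The effect of this rewrite can be illustrated on plain Python lists. The sketch below is a simplification under stated assumptions: the correlated binding is an invoice id, the inner input is a list of (invoice_id, carrier) pairs, and the function names are hypothetical. It compares a correlated evaluation (inner block re-run per outer binding) with the decorrelated shape named on the slide: a selection over the Cartesian product, grouped by the binding, then outer-joined so that bindings without matches survive.

```python
from collections import defaultdict
from itertools import product


def correlated_join(outer, inner, matches):
    """COp: evaluate the inner block once for every outer binding."""
    return [(o, [i for i in inner if matches(o, i)]) for o in outer]


def decorrelated_join(outer, inner, matches):
    """Outer join with Groupby with Cartesian, evaluated set-at-a-time."""
    # Select over Cartesian[A, B]
    pairs = [(o, i) for o, i in product(outer, inner) if matches(o, i)]
    # Groupby(CB): collect the inner rows per correlated binding
    groups = defaultdict(list)
    for o, i in pairs:
        groups[o].append(i)
    # Outer join on CB: bindings with no match keep an empty group
    return [(o, groups.get(o, [])) for o in outer]


# Hypothetical data: invoice ids and (invoice_id, carrier) rows.
invoice_ids = [1, 2, 3]
carriers = [(1, "Sprint"), (1, "Carrier B"), (3, "Carrier C")]
same_invoice = lambda inv_id, carrier: carrier[0] == inv_id

nested = correlated_join(invoice_ids, carriers, same_invoice)
flat = decorrelated_join(invoice_ids, carriers, same_invoice)
```

Invoice 2 has no carrier rows; the outer join is what keeps it in the decorrelated result with an empty group, matching the correlated evaluation.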
25 Query Decorrelation for FOR: Top-down approach over the XAT tree. Approach: Correlated Binding (CB). Op1[FOR(CB)[Op2[Correlated Operator[A],B]]] is rewritten to Op1[Groupby(CB, Op2[]) [Operator[Cartesian[A,B]]]]. Differences: SQL decorrelation returns the outer query. XQuery decorrelation returns the inner query. CO returns both the outer and the inner query.
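For the FOR case, the running example (count calls per rate) gives a compact check. In this Python sketch, which uses invented data and hypothetical function names, `correlated` re-evaluates the count for each `$rate` binding, while `decorrelated` applies the rewritten shape: a selection over the Cartesian product of rates and calls, followed by a Groupby on the binding.

```python
from collections import defaultdict
from itertools import product

# Hypothetical inputs: the distinct rates (outer binding) and (rate, number) calls.
rates = ["NIGHT", "DAY"]
calls = [("NIGHT", "973-555-0101"), ("DAY", "973-555-0102"), ("NIGHT", "973-555-0104")]


def correlated(rates, calls):
    """FOR($rate): re-evaluate the inner count once per outer binding."""
    return {r: sum(1 for rate, _ in calls if rate == r) for r in rates}


def decorrelated(rates, calls):
    """Groupby($rate) over a selection over Cartesian(rates, calls)."""
    groups = defaultdict(int)
    for r, (rate, _) in product(rates, calls):
        if rate == r:                 # the correlated predicate, now a plain selection
            groups[r] += 1
    return dict(groups)


per_binding = correlated(rates, calls)
set_at_a_time = decorrelated(rates, calls)
```

Both evaluations agree on this data. The grouped form scans the inputs once instead of once per binding, which is what makes it a candidate for pushdown into SQL.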
26 FOR Decorrelation Example Source(“invoice.xml”) = “$rate”) …1 Source(“invoice.xml”) …3 B2 B3 FOR($rate) …2 Source(“invoice.xml”) = “$rate”) Groupby(“$rate”, ) Cartesian Source(“invoice.xml”) …3 B1 B2 B3 Aggregate …2 …1
27 Default XML View
invoice (id, account_number, bill_period, total): bill_period Jun 9 – Jun 8, 2000; total $0.35
carrier (invoice_id, carrier): invoice_id 1; carrier Sprint
itemized_call (invoice_id, no, date, number_called, time, rate, min, amount): 11JUN :17pmNIGHT; JUN :19amDAY; JUN :25pmNIGHT30.15
28 User Defined XML View Jun 9 - Jul 8, 2000 Sprint $ Jun 9 – Jun 8, 2000 $ Sprint 1 … …
29 User Defined XML View Cont. Create view invoice as ( FOR $invoice IN view(“default”)/invoice/row RETURN $invoice/account_number/text() $invoice/bill_period/text() FOR $carrier in view(“default”)/carrier/row WHERE $carrier/invoice_id = $invoice/id RETURN $carrier/carrier/text() FOR $itemized_call in view(“default”)/itemized_call/row WHERE $itemized_call/invoice_id = $invoice/id RETURN SORTBY $invoice/total/text() )
30 User Defined XML View Block Create view invoice as ( FOR $invoice IN view(“default”)/invoice/row RETURN $invoice/account_number/text() $invoice/bill_period/text() FOR $carrier in view(“default”)/carrier/row WHERE $carrier/invoice_id = $invoice/id RETURN $carrier/carrier/text() FOR $itemized_call in view(“default”)/itemized_call/row WHERE $itemized_call/invoice_id = $invoice/id RETURN SORTBY $invoice/total/text() ) B4 B5 B6
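What the view definition does, nesting the carrier and itemized_call rows of the default view under their invoice, can be sketched in Python. Everything here beyond the table and column names is an assumption: the row dictionaries, the sample values other than those on the slides, the function name `build_invoices`, and the choice to emit calls in input order (the sort key of SORTBY is not recoverable from the slide). The child order follows the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for view("default")/<table>/row.
invoice_rows = [{"id": 1, "account_number": "A-1",
                 "bill_period": "Jun 9 - Jul 8, 2000", "total": "$0.35"}]
carrier_rows = [{"invoice_id": 1, "carrier": "Sprint"}]
call_rows = [{"invoice_id": 1, "no": "1", "rate": "NIGHT"},
             {"invoice_id": 1, "no": "2", "rate": "DAY"},
             {"invoice_id": 2, "no": "3", "rate": "DAY"}]


def build_invoices(invoices, carriers, calls):
    """Build one <invoice> element per invoice row, nesting the related rows."""
    out = []
    for inv in invoices:
        root = ET.Element("invoice")
        ET.SubElement(root, "account_number").text = inv["account_number"]
        ET.SubElement(root, "bill_period").text = inv["bill_period"]
        for c in carriers:                       # WHERE $carrier/invoice_id = $invoice/id
            if c["invoice_id"] == inv["id"]:
                ET.SubElement(root, "carrier").text = c["carrier"]
        for call in calls:                       # WHERE $itemized_call/invoice_id = $invoice/id
            if call["invoice_id"] == inv["id"]:
                ET.SubElement(root, "itemized_call", no=call["no"], rate=call["rate"])
        ET.SubElement(root, "total").text = inv["total"]
        out.append(root)
    return out


invoices = build_invoices(invoice_rows, carrier_rows, call_rows)
child_tags = [child.tag for child in invoices[0]]
```

The two inner loops correspond to the nested FOR blocks of the view, each correlated with the outer `$invoice` through the invoice id.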
31 XML View XAT V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text() …[V3] [$invoice/total/text()] ) V3 := Tagger( Aggregate() Source(“default.xml”) $invoice := Navigate(“/”,invoice/row ) FOR($invoice/id) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, no/text()) Navigate($itemized_call, invoice_id) Select(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) B5 Navigate($itemized_call, amount/text()) …
32 3-Way Correlation …2 Source(“invoice.xml”) B4 FOR($invoice/id) …1 B5B6
33 3-Way Decorrelation …2 Source(“default.xml”) B4 JOIN($invoice/id) …1 B5 with CartesianB6 with Cartesian GB($invoice/id, …) …2 Source(“default.xml”)
34 View XAT After Decorrelation V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text() …[V3] [$invoice/total/text()] ) V3 := Tagger( Aggregate() Groupby($invoice/id, Aggregate()) Source(“default.xml”) $invoice := Navigate(“/”,invoice/row ) Join($invoice/id) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, no/text()) Navigate($itemized_call, invoice_id) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) B5 Navigate($itemized_call, amount/text()) … Groupby($invoice/id…) Source(“default.xml”) $invoice := Navigate(“/”,invoice/row ) Navigate(“$invoice”, id)
35 View Composition: Input: user query XAT + user view XAT. Output: simplified composite XAT. Approach: XAT cutting: remove unreferenced columns and operators. Push down navigation, using the commutative rules. Cancel out the navigation operators, using the composition rules.
36 XAT Cutting: Cut query blocks: the user query only requires itemized_call. B5 is cut, Invoice is cut, B4 is simplified, B6 is simplified. Cut columns: User query only used
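XAT cutting is essentially dead-column elimination, which can be sketched as a backward pass over the operator list. The Python below is a simplified model, not the algorithm from the slides: each operator is reduced to a hypothetical pair (output column, input columns), joins are ignored, and the column labels are invented for illustration.

```python
def cut(operators, required):
    """Keep only operators whose output is transitively needed by `required`.

    operators: list of (output_column, [input_columns]) in bottom-up order.
    """
    needed = set(required)
    kept = []
    for out, inputs in reversed(operators):   # walk from the top of the tree down
        if out in needed:
            kept.append((out, inputs))
            needed.update(inputs)             # its inputs are now needed too
    kept.reverse()
    return kept


# Hypothetical flattened view: sources first, navigations after them.
view = [
    ("$invoice", []),
    ("$invoice/total", ["$invoice"]),
    ("$carrier", []),
    ("$carrier/carrier", ["$carrier"]),
    ("$itemized_call", []),
    ("$itemized_call/rate", ["$itemized_call"]),
    ("$itemized_call/number_called", ["$itemized_call"]),
    ("$itemized_call/amount", ["$itemized_call"]),
]

kept = cut(view, ["$itemized_call/rate", "$itemized_call/number_called"])
kept_outputs = [out for out, _ in kept]
```

With a query that references only rate and number_called, the carrier block and the unused itemized_call columns drop out, which mirrors the "B5 is cut" and "cut columns" steps.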
37 View XAT After B5 is Cut. V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text()</bill_period[V3] [$invoice/total/text()] ) V3 := Tagger( Aggregate() Groupby($invoice/id, Aggregate()) Source(“default.xml”) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, no/text()) Navigate($itemized_call, invoice_id) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) Navigate($itemized_call, amount/text()) …
38 View After Columns are Cut. V4 := Tagger( [V3] ) V3 := Tagger( Aggregate() Groupby($invoice/id, Aggregate()) Source(“default.xml”) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, number_called/text()) Navigate($itemized_call, invoice_id) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) Navigate($itemized_call, rate/text())
39 Navigation Cancel Out: Navigation pushdown: based on transformation rules, e.g. the commutativity of navigation with other operators. Navigation + Tagger cancel-out: based on composition rules. The result of the cancellation is a “renaming”.
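The cancel-out rule says that navigating into an element a Tagger has just built gives back the Tagger's input, so the pair can be replaced by a renaming. A minimal Python sketch of that equivalence follows; the function names, the row dictionaries and the output column name `$r` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def tagger(row):
    """View side: build <itemized_call><rate>...</rate></itemized_call> from a row."""
    call = ET.Element("itemized_call")
    ET.SubElement(call, "rate").text = row["rate"]
    return call


def navigate(elem, path):
    """Query side: Navigate(elem, path/text())."""
    return elem.findtext(path)


rows = [{"rate": "NIGHT"}, {"rate": "DAY"}]

# Unoptimized plan: construct the XML, then navigate back into it.
via_xml = [{"$r": navigate(tagger(row), "rate")} for row in rows]

# After cancel-out: the Tagger/Navigate pair collapses to a column rename.
renamed = [{"$r": row["rate"]} for row in rows]
```

Both plans produce the same table, but the second never materializes an XML element, which is what lets the remaining operators be expressed in SQL.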
40 Query XAT Navi. Pushdown Select(count(“$itemized_call”)) like ‘973%’) Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) = “$rate”) B3 V1:=Tagger( [$rate] [count($itemized_call)] ) Select(count(“$itemized_call”)) like ‘973%’) Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) = “$rate”) V1:=Tagger( [$rate] [count($itemized_call)] )
41 Navi. Tagger Cancel Out Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) B3 …1 V4 := Tagger( [V3] ) V3 := Tagger( Aggregate() Groupby($invoice/id, Aggregate()) Navigate($itemized_call, number_called/text()) Navigate($itemized_call, rate/text()) …2
42 The Result of Cancel Out …1 := Navigate($itemized_call, rate/text()) := Navigate($itemized_call, number_called/text()) …2
43 Computation Pushdown: Goal: XAT to SQL operators + XML operators. Step 0: Navigation pushdown. Step 1: XML default view to SQL operators; renaming columns. Step 2: SQL computation pushdown, by commutative and composition rules, e.g. predicate pushdown.
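Predicate pushdown, the example given for step 2, relies on a selection commuting with a join when the predicate touches only one join input. The Python sketch below checks that on invented rows; `join`, `select` and the sample data are hypothetical stand-ins for the XAT operators.

```python
def join(left, right, on):
    """Theta join of two lists of row dictionaries."""
    return [{**l, **r} for l in left for r in right if on(l, r)]


def select(rows, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if pred(row)]


# Hypothetical rows for the two default-view tables used by the query.
invoice = [{"id": 1}, {"id": 2}]
itemized_call = [
    {"invoice_id": 1, "number_called": "973-555-0101", "rate": "NIGHT"},
    {"invoice_id": 1, "number_called": "212-555-0102", "rate": "DAY"},
    {"invoice_id": 2, "number_called": "973-555-0103", "rate": "DAY"},
]

on = lambda inv, call: inv["id"] == call["invoice_id"]
is_973 = lambda row: row["number_called"].startswith("973")

late = select(join(invoice, itemized_call, on), is_973)    # filter after the join
early = join(invoice, select(itemized_call, is_973), on)   # predicate pushed down
```

The pushed-down plan joins fewer rows yet returns the same result, because `is_973` reads only itemized_call columns.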
44 Navigation Pushdown. Source(“default.xml”) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, invoice_id) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) := Navigate($itemized_call, rate/text()) := Navigate($itemized_call, number_called/text()) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) := Navigate($itemized_call, rate/text()) := Navigate($itemized_call, number_called/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”)
45 XML Default View to SQL: Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) := Navigate($itemized_call, rate/text()) := Navigate($itemized_call, number_called/text()) Source(“itemized_call”) Project(rate, number_called) := rate := number_called … …
46 Computation Pushdown A SQL Block Select(count(“$itemized_call”)) like ‘973%’) = “$rate”) V1:=Tagger( [$rate] [count($itemized_call)] ) B3 like ‘973%’) Select(count(“$itemized_call”)) = “$rate”) V1:=Tagger( [$rate] [count($itemized_call)] ) A SQL Block
47 Result of the Transformation: Tagger( [V1] ) V1 := Aggregate Tagger( [rate] [count(*)] ) SQL: SELECT rate, count(*) FROM itemized_call, invoice WHERE number_called LIKE '973%' AND invoice.id = itemized_call.invoice_id GROUP BY rate
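The generated SQL can be tried against an in-memory SQLite database using Python's standard `sqlite3` module. The table layouts below follow the default-view column names but keep only the columns the query touches; the rows are invented, and the `ORDER BY rate` clause is an addition that makes the output order deterministic for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (id INTEGER PRIMARY KEY, account_number TEXT,
                      bill_period TEXT, total TEXT);
CREATE TABLE itemized_call (invoice_id INTEGER, number_called TEXT, rate TEXT);

-- Hypothetical sample rows
INSERT INTO invoice VALUES (1, 'A-1', 'Jun 9 - Jul 8, 2000', '$0.35');
INSERT INTO itemized_call VALUES
  (1, '973-555-0101', 'NIGHT'),
  (1, '212-555-0102', 'DAY'),
  (1, '973-555-0103', 'NIGHT'),
  (1, '973-555-0104', 'DAY');
""")

# The SQL from the slide, plus ORDER BY for a stable result order.
rows = conn.execute("""
SELECT rate, count(*)
FROM itemized_call, invoice
WHERE number_called LIKE '973%'
  AND invoice.id = itemized_call.invoice_id
GROUP BY rate
ORDER BY rate
""").fetchall()
conn.close()
```

Each result row supplies one rate and its count, which is exactly what the remaining Tagger operators need to build the XML output outside the database.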
48 Optimization: Efficient publishing of XML views: sorted outer union. Special Tagger implementation. A lot more!
49 Summary: XQuery to XAT: query block identification, query decorrelation. View composition: XAT cutting, navigation pushdown, navigation cancel-out. Computation pushdown: navigation pushdown, XML default view to SQL operators, computation pushdown. Optimization.