Rainbow XML-Query Processing Revisited: The Complete Story (Part I) Xin Zhang.


2 2 Motivation Experience from Past Imp. Solid foundation for research: order-sensitive query processing (Brian); update computation pushdown (Mukesh); query optimization (Brian & Brad); cost-based XML storage (Xin).

3 3 What are we going to do? XAT Data Model XAT Operators XAT Generation

4 4 TCP/IP Illustrated 65.95 TCP/IP Illustrated 69.95 Data on the Web 34.95 Data on the Web 39.95 Example* of XML Use Cases.

5 5 Example XQuery { for $t in distinct (document("prices.xml") //book/title) let $p := document("prices.xml") //book[title = $t]/price return min($p/text()) } In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute. 65.95 34.95
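The semantics of the example query can be sketched in Python. This is only an illustration of what the query computes, not the Rainbow implementation: the root element name `prices` and the exact nesting of the document are assumptions, since only the `book`, `title` and `price` names and the four values appear on the slides.

```python
import xml.etree.ElementTree as ET

# Assumed shape of prices.xml; the <prices> root is hypothetical.
PRICES_XML = """
<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>69.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>
"""

def min_prices(xml_text):
    """Return (title, minimum price) pairs, one per distinct title."""
    root = ET.fromstring(xml_text)
    books = root.findall(".//book")
    # for $t in distinct(//book/title): keep titles in first-seen order
    titles = list(dict.fromkeys(b.findtext("title") for b in books))
    result = []
    for t in titles:
        # let $p := //book[title = $t]/price
        prices = [float(b.findtext("price")) for b in books
                  if b.findtext("title") == t]
        # return min($p/text())
        result.append((t, min(prices)))
    return result

print(min_prices(PRICES_XML))
```

The two minimum values, 65.95 and 34.95, match the result shown on the slide.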

6 6 Four Kinds of Data Models Object-Relational storage. Special data type: sequence. Two kinds of tables: Flat table: requires ID assignment. Nested table: complicated operators. Two kinds of cells: References to DOM trees: require dereferencing. Values: waste space. They are all interchangeable.

7 7 Data Model An order-sensitive table. Every cell has its own domain, e.g., SQL domains, an XML node, or a collection. Every column denotes one variable ($v) or an internal variable (col n, R m). Comparisons are done by deep equality.

8 8 Do we need Schema Order? So far, none of the operators requires any schema order. Hence, we will not consider schema order in the data model.

9 9 Definition of Collection A collection has to have at least 2 objects. If we try to generate a collection of one object, the collection is reduced to that object and no collection is generated. Collections cannot be nested! A collection is an unnamed XML node.
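The collection rules can be sketched as a small constructor. This is a minimal sketch under assumptions: collections are modelled as Python tuples, `make_collection` is a hypothetical helper name, and splicing inner collections in place is one reading of "cannot be nested". The slide does not say what an empty collection looks like, so the empty tuple returned in that case is also an assumption.

```python
def make_collection(items):
    """Build a collection following the rules on the slide."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            # Collections cannot be nested: splice inner collections in place.
            flat.extend(item)
        else:
            flat.append(item)
    if len(flat) == 1:
        # A collection of one object reduces to the object itself.
        return flat[0]
    return tuple(flat)

single = make_collection(["34.95"])
nested = make_collection(["34.95", ("39.95", "65.95")])
print(single, nested)
```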

10 10 Data Model Examples Table of XML Fragments. Table Types: Regular Relations. Table with XML nodes. Table with a collection of XML nodes. 65.95 titleprice prices { 34.95, 39.95,...}

11 11 Column Names A relational column name: “price”. A generated column name: “col1”, “r1”. A variable binding: “$var”.

12 12 Where are we? XAT Data Model XAT Operators XAT Generation

13 13 XML Operators (5+2)
Operator; Sym.; Prms.; Output; Data; Description
Tagger; T; p; col; s; Tag s according to list pattern p.
Navigate; ; col1, path; col2; s; Navigate from column col1 of s through an XPath.
Aggregate; Agg; N/A; ; s; Make a collection for each column.
Composer; C; p; col; s; Construct an XML document from one s according to DOM pattern p.
XML Union; ∪X; col+; col; s; Union multiple columns into one.
XML Intersect; ∩X; col+; col; s; Intersect multiple columns into one.
XML Difference; -X; col+; col; s; Difference multiple columns into one.

14 14 Special Operators (7)
Operator; Sym.; Prms.; Output; Data; Description
SQL; SQL; stmt; col+; N/A; One SQL query statement stmt over multiple s.
Function; F; param+; col; s?; XML or user-defined function over zero or one data source with a list of parameters.
Source; S; desc; col+; N/A; Identify a data source by description desc. It could be an XML fragment, an XML document, or a relational table.
Name; ; col1, col2 or ns; ; s; Rename column col1 of source s to col2, or name s as ns.
FOR; FOR; col+; ; s, sq; The FOR operator iterates over s and executes subquery sq with variable binding columns col1..n.
IF_THEN_ELSE; IF; c; ; sq1, sq2; If condition c is true, execute subquery sq1, else execute subquery sq2.
Merge; M; ; ; s+; Merge multiple tables into one table.

15 15 SQL Operators (11)
Operator; Sym.; Prms.; Output; Data; Description
Project; ; col+; N/A; s; Project out multiple columns from subquery s.
Select; ; c; N/A; s; Filter subquery s by condition c.
Cartesian Product; ; ; N/A; s1, s2; Cartesian product of the results of two sources, s1 and s2.
Theta Join; ; c; N/A; ls, rs; Join two sources ls and rs under condition c.
Outer Join; ; c; N/A; ls, rs; Left (right) outer join of two sources ls and rs by condition c.
Groupby; ; col+; N/A; s, sq g; Make temporary groups by multiple columns from source s, evaluate subquery sq g for each group, then merge the evaluated results back.
Orderby; ; col+; N/A; s; Sort source s by multiple columns.
Union; ∪; ; N/A; s+; Union multiple sources together.
Outer Union; ∪O; ; N/A; s+; Outer union multiple sources together.
Difference; ; ; N/A; ls, rs; Difference between two sources.
Intersect; ∩; ; N/A; s+; Intersect multiple sources.
COp; COp; col+; N/A; s, sq; Correlated operator on columns col+. It executes sq for each tuple in source s.

16 16 Functions (Examples) Ref:
Type; Examples
String; concat, contains, lowercase, name, starts-with, subst, trim, uppercase, ...
Aggregation; avg, count, max, min, sum, ...
Sequence; exists, ...
Date and Time; date, ...
Context; last, position, ...
Node; shallow, ...
...
User Defined; The new function defined in the XQuery.

17 17 Expression Used in Select and Join operators. Arithmetic: negative, +, -, *, /, %. Boolean: NOT, OR, AND >, =, =, Terminals: String and Double Column Name

18 18 Pattern for Tagger List pattern only contains Strings and Column Names. DOM pattern is a tree.

19 19 Where are we? XAT Data Model XAT Operators XML (5): Tagger, Composer, Navigate, Aggregate, XML Union. XAT Generation

20 20 Tagger T p col (s) Consume: columns used in the pattern p. Produce: generate the new column col. Logic: One additional column is added with tagged information. Need to work with  operator to create nested structure. Order Handling: The tagged column is added to the end. The tuple order of the output table is same as table s. Requirement: The columns used in pattern p should be in table s.

21 21 Example: T [col1] col2 Col1 65.95 34.95 Col1 65.95 34.95
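The Tagger logic can be sketched over a table held as a list of dictionaries. This is an illustrative sketch, not the Rainbow code: the `Col` wrapper used to tell column names apart from literal strings in the list pattern, and the `<price>` tag in the usage example, are assumptions (the tags of the slide's example were lost).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Col:
    """Marks a pattern entry as a column name rather than a literal string."""
    name: str

def tagger(table, pattern, new_col):
    """Add new_col to every tuple, built from the list pattern."""
    out = []
    for row in table:
        # Literal strings are copied; Col entries are replaced by the cell value.
        text = "".join(str(row[p.name]) if isinstance(p, Col) else p
                       for p in pattern)
        # Dicts keep insertion order, so the tagged column lands at the end.
        out.append({**row, new_col: text})
    return out

s = [{"col1": "65.95"}, {"col1": "34.95"}]
tagged = tagger(s, ["<price>", Col("col1"), "</price>"], "col2")
print(tagged)
```

The output keeps the tuple order of `s`, and every column named in the pattern must already exist in `s`, otherwise the lookup raises `KeyError`.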

22 22 Composer C p col (s) Consume: columns used in the pattern p. Produce: the new column col with nested structure. Logic: Does not require another operator to create the nested structure. Order Handling: Tuple order is the same as the input. Requirement: Requires a special schema for the input subquery s: (id[1..n], type, att[1..m], value).

23 23 Navigate  col, path col’ (s) Consume: column col. Produce: new column col’. Logic: One additional column is added with the navigation result. Tuples are multiplied if the navigation returns more than one result. If the navigation result is empty, that tuple is dropped. Order Handling: The navigation column is added at the end. The tuple order of the output table follows the order of table s and, within each tuple, the navigation order. Requirement: N/A

24 24 Two types of Navigates Navigate Unnesting:  Unnests the parent-children relationship and duplicates the parent values for each child. Navigate Collection:  Nests the parent-children relationship: creates a collection of children but keeps the single parent.

25 25 Where to use two types Navigate Unnesting:  FOR binding. Navigate Collection:  LET binding.
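The difference between the two Navigate variants can be sketched as follows. This is a minimal sketch under assumptions: tables are lists of dictionaries, cells hold `xml.etree.ElementTree` elements, collections are tuples, and the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def navigate_unnest(table, col, path, new_col):
    """FOR-style navigation: one output tuple per child, parent duplicated."""
    out = []
    for row in table:
        # A tuple with no navigation result produces no output tuple.
        for child in row[col].findall(path):
            out.append({**row, new_col: child})
    return out

def navigate_collection(table, col, path, new_col):
    """LET-style navigation: one output tuple per parent, children collected."""
    return [{**row, new_col: tuple(row[col].findall(path))} for row in table]

DOC = ET.fromstring(
    "<prices>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>Data on the Web</title><price>34.95</price></book>"
    "</prices>"
)
table = [{"R1": DOC}]
unnested = navigate_unnest(table, "R1", "book", "col1")
collected = navigate_collection(table, "R1", "book", "col1")
print(len(unnested), len(collected))
```

With two `book` children, unnesting yields two tuples while the collection variant yields a single tuple holding both children. The sketch ignores the rule that a one-object collection reduces to the object.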

26 26 Collection Issues in ,  1) What happens if there is already a collection in the input table? It depends on the input table. If we navigate from the collection, see issue 2. If not, the original collection is kept as it is. 2) What happens if we navigate from a collection in the input table? Another collection is generated, but collections are never nested.

27 27 Navigation Steps in the Navigate operator. Attribute: @ Children: //, /child Text: text() Column Name: col1

28 28 Navigation Use Cases  a (... )  NULL  b (... ) ...  a (... ) ...  text() ( text() )  text()  a ({, } 

29 29 Example of  R1, book col1 R1Col1... TCP/IP Illustrated 65.95... TCP/IP Illustrated 69.95... Data on the Web 34.95... Data on the Web 39.95 R1............

30 30 Example of  R1, book col1 R1Col1............ { TCP/IP Illustrated 65.95, TCP/IP Illustrated 69.95, Data on the Web 34.95, Data on the Web 39.95 } R1............

31 31 Example of  col1, book col2 Col1 { TCP/IP Illustrated 65.95, TCP/IP Illustrated 69.95, Data on the Web 34.95, Data on the Web 39.95 } Col1col2 {...} TCP/IP Illustrated 65.95 {...} TCP/IP Illustrated 69.95 {...} Data on the Web 34.95 {...} Data on the Web 39.95

32 32 Example of  col1, title col2 Col1 { TCP/IP Illustrated 65.95, TCP/IP Illustrated 69.95, Data on the Web 34.95, Data on the Web 39.95 } Col1col2 {...,...,...,... } { TCP/IP Illustrated, TCP/IP Illustrated, Data on the Web, Data on the Web }

33 33 Aggregate Agg(s) Consume: nothing. Produce: nothing. Logic: Create a collection for each column. Order Handling: There is only one tuple. Requirement: N/A

34 34 Example of Agg(s) Col1 TCP/IP Illustrated 65.95 TCP/IP Illustrated 69.95 Data on the Web 34.95 Data on the Web 39.95 Col1 { TCP/IP Illustrated 65.95, TCP/IP Illustrated 69.95, Data on the Web 34.95, Data on the Web 39.95 }
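The Aggregate step can be sketched in a few lines. The list-of-dictionaries table and tuple collections are assumptions of this sketch, as is the choice to return an empty table for empty input.

```python
def aggregate(table):
    """Collapse a table into one tuple whose cells are per-column collections."""
    if not table:
        return []
    # One collection per column, preserving the input tuple order.
    return [{col: tuple(row[col] for row in table) for col in table[0]}]

s = [
    {"Col1": "TCP/IP Illustrated 65.95"},
    {"Col1": "TCP/IP Illustrated 69.95"},
    {"Col1": "Data on the Web 34.95"},
    {"Col1": "Data on the Web 39.95"},
]
agg = aggregate(s)
print(agg)
```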

35 35 XML Union  X col[1..n] col (s) Consume: columns col[1..n]. Produce: new column col. Logic: For every tuple with col[1..n], merge their results into one collection and put it into the new column col. Order Handling: N/A Requirement: N/A

36 36 Example:  X title, price result (s)
title; price; result
TCP/IP Illustrated; 65.95; { TCP/IP Illustrated, 65.95 }
TCP/IP Illustrated; 69.95; { TCP/IP Illustrated, 69.95 }
Data on the Web; 34.95; { Data on the Web, 34.95 }
Data on the Web; 39.95; { Data on the Web, 39.95 }
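The XML Union example can be reproduced with a short sketch. Tables as lists of dictionaries and collections as tuples are assumptions of this sketch, and `xml_union` is a hypothetical name.

```python
def xml_union(table, cols, new_col):
    """For every tuple, merge the cells of cols into one collection in new_col."""
    return [{**row, new_col: tuple(row[c] for c in cols)} for row in table]

s = [
    {"title": "TCP/IP Illustrated", "price": "65.95"},
    {"title": "TCP/IP Illustrated", "price": "69.95"},
    {"title": "Data on the Web", "price": "34.95"},
    {"title": "Data on the Web", "price": "39.95"},
]
unioned = xml_union(s, ["title", "price"], "result")
for row in unioned:
    print(row["result"])
```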

37 37 Where are we? XAT Data Model XAT Operators XML (5) Special (7):SQL, Function, Source, Name, FOR, IF, Merge. XAT Generation

38 38 SQL SQL stmt col[1..m] Consume: depends on the stmt. Produce: depends on the stmt. Logic: Execute stmt over multiple tables and output the result. It is assumed to be executed by an RDB engine. Usually, it is the operator right above the source (e.g., table) operator. Order Handling: The tuple order is undetermined. It can be re-established by an additional Orderby node. Requirement: N/A.

39 39 Function F param[1..m] col (s?) Consume: columns used in param[1..m]. Produce: new column col. Logic: Execute an XML or user-defined function on the data sources. Also used to represent a recursive query. Order Handling: The order can be re-established by Orderby nodes. Requirement: N/A.

40 40 Source s desc col[1..n] Consume: nothing. Produce: one new column col for XML sources; multiple columns for a table source. Logic: Identify one of the following sources: a view, an XML document, an XML fragment, or a table. Col[1..n] depends on the source description: it is one new column if the input is an XML source, otherwise it is the list of columns from the table source. Order Handling: Depends on the implementation. Keep the original tuple order as much as possible. Requirement: N/A.

41 41 Example of S “prices.xml” R1 R1 TCP/IP Illustrated 65.95 TCP/IP Illustrated 69.95 Data on the Web 34.95 Data on the Web 39.95

42 42 Name Column  col1, col2 (s) Consume: Column col1. Produce: Column col2. Logic: Rename col 1 in table s into col 2. Order Handling: Keep all the schema and tuple orders. Requirement: col 1 in table s.

43 43 Name  ns (s) Consume: Nothing. Produce: Nothing. Logic: name s to ns. Order Handling: Keep all the schema and tuple orders. Requirement: N/A.

44 44 FOR FOR col[1..n] (s, sq ) Consume: Nothing. Produce: Nothing. Logic: It is a FOR iteration operator. For each value in the columns col[1..n] of table s, evaluate the subquery sq. Very important for query decorrelation. Order Handling: Schema order is decided by sq. Tuple order is similar to the join operator without the left part. Requirement: N/A.

45 45 Merge M (s[1..n]) Consume: Nothing Produce: Nothing. Logic: Merge multiple tables into one table. Tuple order is very important in this operator. Order Handling: Tuple order same as the input. Requirement: s[1..n] have same number of tuples.

46 46 Example: M (title, price) titleprice TCP/IP Illustrated 65.95 TCP/IP Illustrated 69.95 Data on the Web 34.95 Data on the Web 39.95 title TCP/IP Illustrated Data on the Web price 65.95 69.95 34.95 39.95
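Merge pairs tuples by position, which is why tuple order matters and why the inputs must have the same number of tuples. A minimal sketch, assuming list-of-dictionaries tables with distinct column names and four tuples in each input (the input tables of the slide example are only partly legible):

```python
def merge(*tables):
    """Merge tables positionally: the i-th tuples of all inputs become one tuple."""
    sizes = {len(t) for t in tables}
    if len(sizes) > 1:
        raise ValueError("Merge requires inputs with the same number of tuples")
    # zip pairs the i-th tuple of each table; later tables add their columns.
    return [{k: v for row in rows for k, v in row.items()}
            for rows in zip(*tables)]

titles = [{"title": "TCP/IP Illustrated"}, {"title": "TCP/IP Illustrated"},
          {"title": "Data on the Web"}, {"title": "Data on the Web"}]
prices = [{"price": "65.95"}, {"price": "69.95"},
          {"price": "34.95"}, {"price": "39.95"}]
merged = merge(titles, prices)
print(merged)
```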

47 47 Where are we? XAT Data Model XAT Operators XML Operators (5). Special Operators (7). SQL Operators (11): Project, Select, Cartesian Product, Join (Theta, Outer), Groupby, Orderby, Union (Node, Outer), COp, Intersect, Difference. XAT Generation

48 48 Project  col[1..n] (s) Consume: All columns in DM(s). Produce: nothing. Logic: Keep only columns col[1..n] in DM(s). Order Handling: Keep original tuple order, the schema order is reordered as the col[1..n] in the project operator. Requirement: The col[1..n] should be in source s.

49 49 Select  c (s) Consume: columns used in condition expression c. Produce: nothing. Logic: Keep the tuples in s for which c is true. Order Handling: Keep original tuple order, keep original schema order. Requirement: Condition c should reference only the source s.

50 50 Theta Join  c (ls, rs) Consume: columns in the condition c. Produce: nothing. Logic: Join ls and rs together under condition c. Order Handling: The tuple order of the output table is the iteration of tuples in rs nested inside the iteration of tuples in ls, e.g., {,,, } Requirement: Condition c should relate to both tables ls and rs.

51 51 Left Outer Join  c (ls, rs) Consume: Columns in the condition c. Produce: Nothing. Logic: Join but keep all the tuples in ls. Order Handling: The tuple order of the output table is the iteration of tuples in rs nested inside the iteration of tuples in ls, e.g., {,,,, } Requirement: Condition c should relate to both tables ls and rs.
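The order handling of the Theta Join and the Left Outer Join can be sketched with nested loops, where ls drives the outer loop. This is a sketch under assumptions: list-of-dictionaries tables, a condition given as a Python function, and `None` standing in for null padding.

```python
def theta_join(ls, rs, cond):
    """Nested-loop join: ls drives the outer loop, so ls order dominates."""
    return [{**l, **r} for l in ls for r in rs if cond(l, r)]

def left_outer_join(ls, rs, cond):
    """Like theta_join, but every ls tuple survives, padded with None if unmatched."""
    rs_cols = list(rs[0]) if rs else []
    out = []
    for l in ls:
        matches = [{**l, **r} for r in rs if cond(l, r)]
        # An unmatched ls tuple keeps its position in the ls order.
        out.extend(matches or [{**l, **{c: None for c in rs_cols}}])
    return out

ls = [{"a": 1}, {"a": 2}, {"a": 3}]
rs = [{"b": 3}, {"b": 1}]

def same(l, r):
    return l["a"] == r["b"]

joined = theta_join(ls, rs, same)
outer = left_outer_join(ls, rs, same)
print(joined)
print(outer)
```

Even though rs lists 3 before 1, the output follows the order of ls.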

52 52 Right Outer Join  c (ls, rs) Consume: Columns in the condition c. Produce: Nothing. Logic: Join but keep all the tuples in rs. Order Handling: The tuple order of the output table is the iteration of tuples in ls nested inside the iteration of tuples in rs, e.g.,{,,,,, }; “null” is at the beginning of the output. Requirement: Condition c should relate to both tables ls and rs.

53 53 Left Semi Join  c (ls, rs) Consume: Columns in condition c. Produce: nothing. Logic: Join but only keep the columns in ls. Order Handling: The tuple order of the output table is the same as table ls. Requirement: Condition c should relate to both tables ls and rs.

54 54 Right Semi Join  c (ls, rs) Consume: Columns used in condition c. Produce: nothing. Logic: Join but only keep the columns in rs. Order Handling: The tuple order of the output table is the same as table rs. Requirement: Condition c should relate to both tables ls and rs.

55 55 Groupby  col[1..n] (s, sq) Consume: col[1..n]. Produce: nothing. Logic: Group DM(s) by col[1..n], then apply sq to each group. If sq generates a table instead of a single value, the generated table is treated as a collection. Order Handling: The tuple order of the output table is the same as table s. Requirement: Col[1..n] should be in table s.
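The Groupby logic can be sketched as follows. The assumptions are labeled: tables are lists of dictionaries, the subquery `sq` is a Python function over the group's tuples, the output column name `result` is hypothetical, and groups are emitted in order of first appearance in s, which is one reading of "same as table s".

```python
def groupby(table, cols, sq):
    """Group table by cols, apply sq to each group, merge results back."""
    groups = {}
    for row in table:
        key = tuple(row[c] for c in cols)
        # Dicts keep insertion order: groups appear in first-seen order.
        groups.setdefault(key, []).append(row)
    out = []
    for key, rows in groups.items():
        value = sq(rows)
        if isinstance(value, list):
            # A table-valued subquery result is treated as a collection.
            value = tuple(value)
        out.append({**dict(zip(cols, key)), "result": value})
    return out

s = [
    {"title": "TCP/IP Illustrated", "price": 65.95},
    {"title": "TCP/IP Illustrated", "price": 69.95},
    {"title": "Data on the Web", "price": 34.95},
    {"title": "Data on the Web", "price": 39.95},
]
minima = groupby(s, ["title"], lambda rows: min(r["price"] for r in rows))
collections = groupby(s, ["title"], lambda rows: [r["price"] for r in rows])
print(minima)
```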

56 56 Orderby  col[1..n] (s) Consume: col[1..n] Produce: nothing. Logic: Order s by col[1..n]. Order Handling: The tuple order of the output table is as specified. Requirement: Col[1..n] should be in table s.

57 57 Union  (s[1..n]) Consume: nothing Produce: nothing Logic: Same as SQL. Order Handling: The tuple order of the output table is in the order of table s[1..n]. Requirement: All tables s[1..n] have same schema.

58 58 Outer Union  O(s[1..n]) Consume: nothing Produce: nothing Logic: Same as SQL. Order Handling: The tuple order of the output table is in the order of table s[1..n]. Requirement: N/A.

59 59 Intersect  (s[1..n]) Consume: nothing Produce: nothing Logic: Same as SQL. Order Handling: The tuple order of the output table is in the order of table s[1..n]. Requirement: All tables s[1..n] have same schema.

60 60 Difference  (ls, rs) Consume: nothing Produce: nothing Logic: Same as SQL. Order Handling: The tuple order of the output table is in the order of table ls. Requirement: Tables ls and rs have same schema.

61 61 Full set of Operators XML (5): T, C, , Agg(),  X Special (7): SQL, F, S, , FOR, IF, M SQL (11): , , , , , , , , ,  O, COp, ,  Syntax Op ( ) :=Op( ) [ ]

62 62 Where are we? XAT Data Model XAT Operators XAT Generation

63 63 Operator used in the Generation. XML: T, , , Agg() Special: F, S, FOR, IF SQL: , , , ,  X, , 

64 64 How to translate FOR binding? FOR $x IN for-binding Inner-query use $x FOR($x) $x IN For-binding Inner-query use $x

65 65 How to translate LET binding? LET $x := let-binding Rest-of-query use $x $x := let-binding Rest-of-query use $x

66 66 What’s the difference between FOR and LET bindings? XQuery: FOR $x IN document(“x.xml”)/x versus LET $x := document(“x.xml”)/x. XAT: For-binding:  R1, x $x (s “x.xml” R1 ). Let-binding:  C R1,x col1 (s “x.xml” R1 ).

67 67 XML Parser Tree QuiltQuery( ElementConstruct(, FLWRExpression( Binding( ForBinding($t,distinct, Nav( FunDocument(“prices.xml”), Steps( LocationStep(//), LocationStep(book), LocationStep(title))), LetBinding($p,Nav( FunDocument(“prices.xml”), Steps( LocationStep(//), LocationStep(book, BinOpComp(=, Nav(CurrentNode, Steps( LocationStep(title))), Nav(Var($t), Steps(Text())))), LocationStep(price))))), ElementConstruct(, AttributeExpression(@title, Nav(Var($t), Steps(Text()))), ElementConstruct(, FunMin( Nav(Var($p, Steps(Text()))))))))) { for $t in distinct (document("prices.xml") //book/title) let $p := document("prices.xml") //book[title = $t]/price return min($p/text()) }

68 68 Parsed Tree (1) QuiltQuery( ElementConstruct(, FLWRExpression( Binding( ForBinding($t,distinct, Nav( FunDocument(“prices.xml”), Steps( LocationStep(//), LocationStep(book), LocationStep(title))), LetBinding($p,Nav( FunDocument(“prices.xml”), Steps( LocationStep(//), LocationStep(book, BinOpComp(=, Nav(CurrentNode, Steps( LocationStep(title))), Nav(Var($t), Steps(Text())))), LocationStep(price))))), ElementConstruct(, AttributeExpression(@title, Nav(Var($t), Steps(Text()))), ElementConstruct(, FunMin( Nav(Var($p, Steps(Text())))))))))

69 69 Parsed Tree (2) S “prices.xml” R1

70 70 Parsed Tree (3) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))

71 71 Parsed Tree (4) S ”prices.xml” R2 Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))

72 72 Parsed Tree (5)  R2,//book col2 (s ”prices.xml” R2 ) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))

73 73 Parsed Tree (6) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))  title col3 (  R2,//book col2 (s ”prices.xml” R2 ))

74 74 Parsed Tree (7) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 )))

75 75 Parsed Tree (8) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))  col3=col4 (  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 ))))

76 76 Parsed Tree (9) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))  col2,price $p (  col3=col4 (  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 )))))

77 77 Parsed Tree (10) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))  $t, text() col5 (  col2,price $p (  col3=col4 (  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 ))))))

78 78 Parsed Tree (11) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 )) Min col6 col7 (  $p, text() col6 (  $t, text() col5 (  col2,price $p (  col3=col4 (  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 )))))))

79 79 Parsed Tree (12) Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 )) T [col7] col8 ( Min col6 col7 (  $p, text() col6 (  $t, text() col5 (  col2,price $p (  col3=col4 (  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 ))))))))

80 80 Parsed Tree (13) Agg( FOR $t ( Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))), T [col7] col8 ( Min col6 col7 (  $p, text() col6 (  $t, text() col5 (  col2,price $p (  col3=col4 (  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 ))))))) ))

81 81 Parsed Tree (14) T col8 col9 ( Agg( FOR $t ( Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))), T [col7] col8 ( Min col6 col7 (  $p, text() col6 (  $t, text() col5 (  col2,price $p (  col3=col4 (  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 ) )

82 82 XAT Example T col8 col9 ( Agg( FOR $t ( Distinct col1 $t (  R1, //book/title col1 (S “prices.xml” R1 ))), T [col7] col8 ( Min col6 col7 (  $p, text() col6 (  $t, text() col5 (  col2,price $p (  col3=col4 (  $t, text() col4 (  col2,title col3 (  R2,//book col2 (s ”prices.xml” R2 ))))))))))))

83 83 XAT Example (Graph) Min col6 col7  $t, text() col5  $p, text() col6 s ”prices.xml” R2  R2,//book col2  $t, text() col4 T col8 col9 Distinct col1 $t S “prices.xml” R1  R1, //book/title col1  col2,title col3 FOR $t Agg T [col7] col8  col3=col4  col2,price $p

84 84 Discussion and Issues

85 85 Different Set of Operators After Parsing but before Decorrelation With FOR, no  / , no . After Decorrelation With  / , , , Distinct(), no FOR....

86 86 Equivalent Rewriting Rules Navigation Pushdown: swap navigation operators down. Computation Pushdown: swap SQL operators down. Groupby Operator Simplification: pull functions (subqueries) out of the Groupby function.

87 87 Issues (1)
Use subquery or subquery result? Both.
Do we really need cutting? Yes.
Do we need Binding and Expose? Binding yes. Expose no: we use navigate instead.
Why do we need to distinguish Bindings from Column Names? Because a binding is used in multiple places and is immutable, whereas a column name is used in one place.
Which data model is better? OR is better than R.
Bag semantics or set semantics? Bag.
Identify different sets of operators at different stages? TBD.
Do we need the collection in the ORDBMS? Yes.
What’s the type tree? Regular Expression Types.
Better notation for the algebra syntax? It’s too complex.
Do we really need to define the new column name? Yes.
Also, an XC (XML Calculus) is required. It can be derived from Datalog.

88 88 Issues (2)
How to handle Union in the XQuery? Union will be translated into XML Union.
How do we decorrelate an XQuery with Union? As usual, because the union does not generate branches, only a linear tree.
How to translate XML Union (Intersect and Difference) into the SQL Union (Intersect and Difference)? TBD.
Can we allow collections of collections? It looks like we don’t need that.

89 89 Entry Point Notation Format: : Examples: author.lastname:book,, lastname:author:book (multi-level entry point) Rules: author.lastname = /:author.lastname lastname:author.lastname = author.lastname text():lastname.text() = lastname.text()

90 90 Discussion of Entry Point/Column Name Entry Point is used to show the dependencies between different navigations. XPERANTO uses different column names to distinguish between different navigations, because its sources are relations. Niagara uses Entry Point to get rid of tedious column names and make the algebra look better, and it is also XML oriented. We use column names with a typing system, because we have both relations and XML fragments as sources, and because some operators in the middle of the XAT might generate new columns.

91 91 Column Name and Nested Operators In most cases, we can get rid of the column names by using nested operators. However, the data model is used to separate operators that would otherwise be directly nested, so that optimization can be done easily. Hence, we still need column names instead of nested operators to represent our algebra.

92 92 XML Calculus (XC) The idea of XC comes from extending Datalog. It can be used to prove the correctness of the rewriting rules. It can also be used to help with semantic analysis.

93 93 Type Tree Explains the type of each column name, in other words, the semantics of each column name. It is used by navigation pushdown to decide on cancellation, by order pushdown, and by other rewriting rules that require semantic checking. A type could be: an XML type, a relational table, a column, or a function’s return type. Each type carries a list of the column names of that type.

94 94 Naming a new Table/Column Name of the new table and column should be unique.

95 95 How to translate multiple LET bindings? If the two LET bindings come from different sources, a collection is generated for each LET binding. Until there is a FOR binding to iterate through the collections, we just keep the two collections.

96 96 How to Handle Multiple FOR? That’s handled in the decorrelation. Keep this in mind: FOR means “for each”; it uses . Hence, multiple FORs result in a Cartesian product. Otherwise, navigate means creating a collection!
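The claim that multiple FOR bindings amount to a Cartesian product can be checked with a short sketch. The helper name `for_bindings` and the sample values are hypothetical; `itertools.product` advances the rightmost binding fastest, which matches nested FOR loops.

```python
from itertools import product

def for_bindings(bindings):
    """Enumerate variable assignments for several FOR bindings."""
    names = list(bindings)
    # One output tuple per combination, outermost binding varying slowest.
    return [dict(zip(names, combo)) for combo in product(*bindings.values())]

tuples = for_bindings({"$x": [1, 2], "$y": ["a", "b"]})

# The same enumeration written as explicit nested FOR loops.
nested = [{"$x": x, "$y": y} for x in [1, 2] for y in ["a", "b"]]
print(tuples == nested)
```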

97 97 Trick in the XAT In XML algebra, we use an evaluation context that is a sequence of XML nodes in an XML data model, which is a forest. In relational algebra, we use an evaluation context that is a list of tuples. Hence, in the XAT generation, we try to convert the data model used by XML into the data model used by OR. That’s the tricky part!

98 98 Query Decorrelation for COp Top-down approach over the XAT tree. Approach, with Correlated Binding (CB): Op1[COp(CB, Op2)[Op3[Correlated Operator[A],B]]] → Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]] For example: Correlated Join → Outer Join with Groupby with Cartesian
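The equivalence that decorrelation relies on can be illustrated on the running min-price query. This sketch is not the rewrite rule itself and does not build XAT trees. It only compares a correlated evaluation, which re-runs the inner query per outer tuple, with a grouped evaluation followed by an outer join on the binding. The function names and the extra unmatched title are assumptions added for illustration.

```python
def correlated(titles, books):
    """COp style: evaluate the inner query once per outer binding."""
    return [(t, min((p for bt, p in books if bt == t), default=None))
            for t in titles]

def decorrelated(titles, books):
    """Group the inner source once, then outer-join it with the bindings."""
    groups = {}
    for bt, p in books:
        # Single scan of the inner source, grouped by the correlated column.
        groups.setdefault(bt, []).append(p)
    # Outer join from the binding side keeps titles without any match.
    return [(t, min(groups[t]) if t in groups else None) for t in titles]

books = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 69.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]
titles = ["TCP/IP Illustrated", "Data on the Web", "Unpriced"]
print(correlated(titles, books) == decorrelated(titles, books))
```

The outer join matters for the hypothetical "Unpriced" title: a plain join would drop it, while the correlated form still produces a tuple for it.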

