1 XQuery to XAT Xin Zhang. 2 Outline XAT Data Model. XAT Operator Design. XQuery Block Identification. Equivalent Rewriting Rules. Computation Pushdown.

Presentation transcript:

1 XQuery to XAT Xin Zhang

2 Outline. XAT Data Model. XAT Operator Design. XQuery Block Identification. Equivalent Rewriting Rules: Computation Pushdown, Navigation Pushdown, Groupby Operator Simplification.

3 Data Model. An ordered table in two dimensions: tuple order and column order. Every cell has its own domain, e.g., SQL domains or an XML fragment (which can be a list of XML elements). Every column binds to one variable. Comparisons are done by value. Note: when values are “handles”, the comparison is done by dereferencing the handles.
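
The data model above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `XATTable`, `Handle`, `deref`, and `cells_equal` are hypothetical and do not come from the slides, and an XML fragment is modelled simply as a list of element strings.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Handle:
    """Hypothetical reference to a value kept in an external store."""
    key: str


@dataclass
class XATTable:
    """Ordered table: both column order and tuple order are significant."""
    schema: List[str]  # one name per column, e.g. a variable such as "$carrier"
    rows: List[Tuple[Any, ...]] = field(default_factory=list)


def deref(value: Any, store: Dict[str, Any]) -> Any:
    """Resolve a handle to the value it points at; pass other values through."""
    return store[value.key] if isinstance(value, Handle) else value


def cells_equal(a: Any, b: Any, store: Dict[str, Any]) -> bool:
    """Compare two cells by value, dereferencing handles first."""
    return deref(a, store) == deref(b, store)


# A cell may hold an XML fragment, modelled here as a list of element strings.
store = {"h1": ["<carrier>Sprint</carrier>"]}
carriers = XATTable(
    schema=["invoice_id", "$carrier"],
    rows=[(1, Handle("h1")), (2, ["<carrier>Sprint</carrier>"])],
)
same = cells_equal(carriers.rows[0][1], carriers.rows[1][1], store)
```

Here the first row stores a handle and the second stores the fragment itself, and the comparison still succeeds because it is made on the dereferenced values.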

4 Data Model Examples. Table types: regular relations; tables with XML elements; tables with XML fragments; tables with variable bindings; tables with path navigation. [Figure: example tables of XML fragments. Surviving column labels: invoice_id, carrier, carrier_entry, carriers, $carrier, $rate, //invoice/invoice/account_number.]

5 Column Names. A column name can be: a string, “name”; a variable binding, “$var”; an operator with its parameters, “op(p1, p2, ..., pn)”; or an XPath with entry point notation: “/”, “/invoice”, “/invoice/book”; “invoice:/”, “book:invoice:/”.

6 Operators. SQL-like (9): Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), COp. XML-like (3): Tagger, Navigate, Aggregate (a Groupby without a by-column). Special (5): SQL, Function, Source, Name, FOR.

7 SQL-like Operators (9)
Operator | Syntax | Description
Project | Pi(col+)[s] | Project out multiple columns from source s.
Select | Theta(c)[s] | Filter source s by condition c.
Theta Join | Join(c)[l, r] | Join two sources l and r under condition c.
Outer Join | LOJ(c)[l, r], ROJ(c)[l, r] | Left (right) outer join of sources l and r by condition c.
Semi Join | LSJ(c)[l, r], RSJ(c)[l, r] | Left (right) semi join of sources l and r by condition c.
Groupby | GB(col+, F(col)+)[s] | Group source s by multiple columns and apply multiple aggregation functions F() to columns of s.
Orderby | OB(col+)[s] | Sort source s by multiple columns.
Union | U[s+] | Union multiple sources together.
Outer Union | OU[s+] | Outer union multiple sources together.
COp | COp(col+, Op)[s1, s2] | Correlated operator on columns col+; s1 is the outer query, s2 is the inner query.

8 XML-like Operators
Operator | Syntax | Description
Tagger | Tag(p)[s] | Tag source s by pattern p.
Navigate | Nav(path)[s] | Navigate from source s through an XPath.
Aggregate | Agg(F(col)+)[s] | Aggregate source s by multiple aggregate functions F() of columns of s.

9 Special Operators
Operator | Parameters | Description
SQL | SQL(stmt)[s+] | One SQL query statement stmt over multiple sources.
Function | F(param+)[s+] | User-defined function over multiple sources with multiple parameters.
Source | s(desc) | Identify a data source by description desc.
Name | Rho(col1, col2)[s], Rho(s2)[s1] | Rename column col1 of source s to col2; rename source s1 to s2.
FOR | FOR(col+)[s1, s2] | Iterate over source s1 and execute subquery s2 with the variable binding columns col1..n.

10 Project: Pi(col1..n)[s]. Input: table s. Output: table s. Logic: Same as SQL. Order Handling: Keeps the original tuple order; the schema is reordered to follow col1..n as listed in the project operator. Requirement: col1..n must be in source s.
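
A minimal sketch of the Project order handling, assuming a table is represented as a list of column names plus a list of tuples (that representation and the function name `project` are illustrative choices, not part of the slides):

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def project(cols: Sequence[str], schema: Sequence[str],
            rows: Sequence[Row]) -> Tuple[List[str], List[Row]]:
    """Pi(cols)[s]: keep tuple order, reorder the schema to follow cols."""
    missing = [c for c in cols if c not in schema]
    if missing:
        # Requirement from the slide: every projected column must be in s.
        raise KeyError(f"columns not in source: {missing}")
    idx = [schema.index(c) for c in cols]
    return list(cols), [tuple(row[i] for i in idx) for row in rows]


out_schema, out_rows = project(["c", "a"], ["a", "b", "c"], [(1, 2, 3), (4, 5, 6)])
```

The output columns follow the order given to the operator (`c` before `a`), while the two tuples stay in their original order.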

11 Select: Theta(c)[s]. Input: table s. Output: table s. Logic: Same as SQL. Order Handling: Keeps the original tuple order and the original schema order. Requirement: Condition c should reference only source s.

12 Theta Join: Join(c)[l, r]. Input: table l and table r. Output: One table (with a temporary table name). Logic: Same as SQL. Order Handling: The schema order of the output table is the columns of table l followed by the columns of table r. The tuple order of the output table is the iteration of tuples in r nested inside the iteration of tuples in l (for each tuple of l in order, the tuples of r are visited in order). Requirement: Condition c should relate to both tables l and r.
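
The order handling of the theta join corresponds to a plain nested loop with l as the outer loop. The sketch below assumes that reading of the slide, a (schema, rows) tuple as the table representation, and distinct column names in l and r; `theta_join` is a hypothetical name.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[Any, ...]
Table = Tuple[List[str], List[Row]]


def theta_join(cond: Callable[[Dict[str, Any]], bool],
               left: Table, right: Table) -> Table:
    """Join(c)[l, r]: schema is l's columns then r's; l drives the tuple order."""
    l_schema, l_rows = left
    r_schema, r_rows = right
    out_schema = l_schema + r_schema
    out_rows = []
    for l in l_rows:            # outer loop: tuples of l in order
        for r in r_rows:        # inner loop: tuples of r in order
            if cond(dict(zip(out_schema, l + r))):
                out_rows.append(l + r)
    return out_schema, out_rows


left_t = (["l_id", "l_val"], [(1, "a"), (2, "b")])
right_t = (["r_id", "r_val"], [(2, "x"), (1, "y"), (1, "z")])
joined = theta_join(lambda t: t["l_id"] == t["r_id"], left_t, right_t)
```

In the result, both matches for the first tuple of l appear before the match for the second tuple of l, even though the matching r tuple for `l_id = 2` comes first in r.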

13 Outer Join: LOJ(c)[l, r]. Input: table l and table r. Output: One table (with a temporary table name). Logic: Same as SQL. Order Handling: The schema order of the output table is the columns of table l followed by the columns of table r. The tuple order of the output table is the iteration of tuples in r nested inside the iteration of tuples in l. Requirement: Condition c should relate to both tables l and r.

14 Outer Join: ROJ(c)[l, r]. Input: table l and table r. Output: One table (with a temporary table name). Logic: Same as SQL (similar to LOJ). Order Handling: The schema order of the output table is the columns of table l followed by the columns of table r. The tuple order of the output table is the iteration of tuples in l nested inside the iteration of tuples in r; “null” is at the beginning of the output. Requirement: Condition c should relate to both tables l and r.
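
The two outer joins differ only in which table drives the iteration and on which side the NULL padding lands. The sketch below is one possible reading of the order rules above, with `None` standing in for NULL; the function names and the (schema, rows) representation are assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[Any, ...]
Table = Tuple[List[str], List[Row]]
Cond = Callable[[Dict[str, Any]], bool]


def left_outer_join(cond: Cond, left: Table, right: Table) -> Table:
    """LOJ(c)[l, r]: l drives the order; unmatched l tuples get NULLs for r."""
    l_schema, l_rows = left
    r_schema, r_rows = right
    schema = l_schema + r_schema
    out = []
    for l in l_rows:
        matches = [l + r for r in r_rows if cond(dict(zip(schema, l + r)))]
        out.extend(matches or [l + (None,) * len(r_schema)])
    return schema, out


def right_outer_join(cond: Cond, left: Table, right: Table) -> Table:
    """ROJ(c)[l, r]: r drives the order; unmatched r tuples get leading NULLs."""
    l_schema, l_rows = left
    r_schema, r_rows = right
    schema = l_schema + r_schema  # schema order is still l then r
    out = []
    for r in r_rows:
        matches = [l + r for l in l_rows if cond(dict(zip(schema, l + r)))]
        out.extend(matches or [(None,) * len(l_schema) + r])
    return schema, out


left_t = (["l_id"], [(1,), (2,)])
right_t = (["r_id"], [(2,), (3,)])
same_id = lambda t: t["l_id"] == t["r_id"]
loj = left_outer_join(same_id, left_t, right_t)
roj = right_outer_join(same_id, left_t, right_t)
```

Because the schema always lists l's columns first, the padding for an unmatched r tuple sits at the start of that output tuple.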

15 Semi Join: LSJ(c)[l, r]. Input: table l and table r. Output: table l. Logic: Same as SQL. Order Handling: The schema order of the output table is the same as table l. The tuple order of the output table is the same as table l. Requirement: Condition c should relate to both tables l and r.

16 Semi Join: RSJ(c)[l, r]. Input: table l and table r. Output: table r. Logic: Same as SQL. Order Handling: The schema order of the output table is the same as table r. The tuple order of the output table is the same as table r. Requirement: Condition c should relate to both tables l and r.
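
A semi join only filters one side, which is why schema and tuple order carry over unchanged. A small sketch of both variants, again assuming a (schema, rows) table representation and hypothetical function names:

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[Any, ...]
Table = Tuple[List[str], List[Row]]
Cond = Callable[[Dict[str, Any]], bool]


def left_semi_join(cond: Cond, left: Table, right: Table) -> Table:
    """LSJ(c)[l, r]: tuples of l that have at least one match in r."""
    l_schema, l_rows = left
    r_schema, r_rows = right
    schema = l_schema + r_schema  # used only to evaluate the condition
    kept = [l for l in l_rows
            if any(cond(dict(zip(schema, l + r))) for r in r_rows)]
    return l_schema, kept


def right_semi_join(cond: Cond, left: Table, right: Table) -> Table:
    """RSJ(c)[l, r]: tuples of r that have at least one match in l."""
    l_schema, l_rows = left
    r_schema, r_rows = right
    schema = l_schema + r_schema
    kept = [r for r in r_rows
            if any(cond(dict(zip(schema, l + r))) for l in l_rows)]
    return r_schema, kept


left_t = (["l_id"], [(3,), (1,), (2,)])
right_t = (["r_id"], [(1,), (1,), (3,), (4,)])
same_id = lambda t: t["l_id"] == t["r_id"]
lsj = left_semi_join(same_id, left_t, right_t)
rsj = right_semi_join(same_id, left_t, right_t)
```

Each surviving tuple appears exactly as often, and in the same position, as in its own input table; matches on the other side are never multiplied in.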

17 Groupby: GB(col1..n, F1..m(col))[s]. Input: table s. Output: table s. Logic: Same as SQL. Order Handling: The schema order of the output table is col1..n followed by F1..m(col). F1..m(col) can be nested operators, e.g., a subquery. The tuple order of the output table is the same as table s. Requirement: col1..n and every col used in F1..m must be in table s.

18 Groupby Example Input: S (a, b, c) Operator: GB (b, a, avg(c), count(c)) Output: S (b, a, “avg(c)”, “count(c)”)
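
The Groupby example can be reproduced with a short sketch. Two assumptions are made here: "same tuple order as table s" is read as "groups appear in order of their first tuple in s", and the sample rows are invented for illustration. The function name `groupby` is hypothetical.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]
Agg = Tuple[str, Callable[[List[Any]], Any], str]  # (output label, function, column)


def groupby(by_cols: Sequence[str], aggs: Sequence[Agg],
            schema: Sequence[str], rows: Sequence[Row]) -> Tuple[List[str], List[Row]]:
    """GB(by_cols, aggs)[s]: by-columns first, then one column per aggregate."""
    groups: Dict[Row, List[Dict[str, Any]]] = {}
    for row in rows:
        rec = dict(zip(schema, row))
        key = tuple(rec[c] for c in by_cols)
        groups.setdefault(key, []).append(rec)  # dicts keep first-seen order
    out_schema = list(by_cols) + [label for label, _, _ in aggs]
    out_rows = [
        key + tuple(fn([m[col] for m in members]) for _, fn, col in aggs)
        for key, members in groups.items()
    ]
    return out_schema, out_rows


# Input S(a, b, c) with made-up rows; operator GB(b, a, avg(c), count(c)).
s_schema = ["a", "b", "c"]
s_rows = [(1, "x", 10), (2, "y", 5), (1, "x", 20)]
gb_schema, gb_rows = groupby(
    ["b", "a"],
    [("avg(c)", mean, "c"), ("count(c)", len, "c")],
    s_schema, s_rows,
)
```

The output schema matches the slide: the by-columns in the order given to the operator, followed by columns literally named `avg(c)` and `count(c)`.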

19 Orderby: OB(col1..n)[s]. Input: table s. Output: table s. Logic: Same as SQL. Order Handling: The schema order of the output table is the same as table s. The tuple order of the output table is as specified. Requirement: col1..n must be in table s.

20 Union: U[s1..n]. Input: Multiple tables s1..n. Output: One table (with a temporary name). Logic: Same as SQL. Order Handling: The schema order of the output table is the same as table s1. The tuple order of the output table follows the order of tables s1..n. Requirement: All tables s1..n have the same schema.

21 Outer Union: OU[s1..n]. Input: Multiple tables s1..n. Output: One table (with a temporary name). Logic: Same as SQL. Order Handling: The schema order of the output table is not defined; it depends on the implementation. The schema order should be ensured by an additional projection node. The tuple order of the output table follows the order of tables s1..n. Requirement: N/A.
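
The difference between Union and Outer Union can be sketched as follows. The schema order chosen by `outer_union` (first-seen column order) is just one implementation choice, which is exactly why the slide recommends a projection afterwards; both function names and the (schema, rows) representation are assumptions.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, ...]
Table = Tuple[List[str], List[Row]]


def union(*tables: Table) -> Table:
    """U[s1..n]: all sources share one schema; tuples follow source order."""
    first_schema = tables[0][0]
    for schema, _ in tables:
        if schema != first_schema:
            raise ValueError("all sources must have the same schema")
    return list(first_schema), [row for _, rows in tables for row in rows]


def outer_union(*tables: Table) -> Table:
    """OU[s1..n]: schemas may differ; missing cells are filled with None."""
    schema: List[str] = []
    for s, _ in tables:
        for col in s:
            if col not in schema:
                schema.append(col)  # first-seen order: an implementation choice
    out = []
    for s, rows in tables:
        for row in rows:
            rec = dict(zip(s, row))
            out.append(tuple(rec.get(col) for col in schema))
    return schema, out


t1 = (["a", "b"], [(1, 2)])
t2 = (["a", "b"], [(5, 6)])
t3 = (["b", "c"], [(3, 4)])
u = union(t1, t2)
ou = outer_union(t1, t3)
```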

22 Tagger: Tag(p)[s]. Input: table s. Output: table s. Logic: One additional column is added with the tagged information. Pattern p has only one level. Order Handling: The tagged column is added at the end. The tuple order of the output table is the same as table s. Requirement: The columns used in pattern p must be in table s.
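
A rough sketch of the Tagger, assuming a one-level pattern can be described by a tag name plus the list of columns placed inside it. That pattern encoding, the tag name `result`, and the sample rows are all hypothetical; the slides do not show the concrete pattern syntax.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def tagger(new_col: str, tag: str, cols: Sequence[str],
           schema: Sequence[str], rows: Sequence[Row]) -> Tuple[List[str], List[Row]]:
    """Tag(p)[s]: append one column holding <tag>...</tag> built from cols."""
    missing = [c for c in cols if c not in schema]
    if missing:
        # Requirement from the slide: pattern columns must be in table s.
        raise KeyError(f"pattern columns not in source: {missing}")
    idx = [schema.index(c) for c in cols]
    out_rows = []
    for row in rows:  # tuple order is unchanged
        body = "".join(str(row[i]) for i in idx)
        out_rows.append(row + (f"<{tag}>{body}</{tag}>",))
    return list(schema) + [new_col], out_rows  # new column goes last


tag_schema, tag_rows = tagger(
    "V1", "result", ["$rate", "count($itemized_call)"],
    ["$rate", "count($itemized_call)"], [("DAY", 2), ("NIGHT", 1)],
)
```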

23 Navigate: Nav(path)[s]. Input: table s. Output: table s. Logic: One additional column is added with the navigation result. A tuple is replicated once per result if the navigation returns more than one result. If the navigation result is empty, NULL is put in the new column. Order Handling: The navigation column is added at the end. The tuple order of the output table follows table s and, within each source tuple, the navigation order. Requirement: N/A.
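
The Navigate logic (one output tuple per navigation result, NULL when nothing is found) can be illustrated with the standard library's `xml.etree.ElementTree`, whose `findall` supports a limited XPath subset. The function name `navigate`, its `entry_col` parameter, and the tiny sample document are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def navigate(entry_col: str, path: str, new_col: str,
             schema: Sequence[str], rows: Sequence[Row]) -> Tuple[List[str], List[Row]]:
    """Nav(path)[s]: append a column with the nodes reached from entry_col."""
    i = schema.index(entry_col)
    out_rows: List[Row] = []
    for row in rows:                      # source tuple order first
        results = row[i].findall(path) if row[i] is not None else []
        if not results:
            out_rows.append(row + (None,))            # empty result -> NULL
        else:
            # one output tuple per result, in navigation (document) order
            out_rows.extend(row + (node,) for node in results)
    return list(schema) + [new_col], out_rows


# Simplified sample document; the attribute set is cut down for brevity.
doc = ET.fromstring(
    '<invoice><itemized_call rate="DAY"/><itemized_call rate="NIGHT"/></invoice>'
)
nav_schema, nav_rows = navigate("T1", "itemized_call", "$itemized_call", ["T1"], [(doc,)])
rates = [r[1].get("rate") for r in nav_rows]
_, empty_rows = navigate("T1", "total", "$total", ["T1"], [(doc,)])
```

The single input tuple becomes two output tuples for `itemized_call`, and one tuple with `None` for the path that matches nothing.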

24 Aggregate: Agg(F1..m(col))[s]. Input: table s. Output: table s. Logic: Merge all tuples in the table into one and apply the functions to the given columns. If there are no functions, just merge all the content. Order Handling: The schema order of the output table is F1..m(col). There is only one tuple. Requirement: Every col used in F1..m must be in table s.
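
A sketch of Aggregate as a Groupby with no by-column. How "merge all the content" should look when no function is given is not specified on the slide; here each column is merged into a list, which is an assumption, as are the function name and the placeholder cell values.

```python
from typing import Any, Callable, List, Sequence, Tuple

Row = Tuple[Any, ...]
Agg = Tuple[str, Callable[[List[Any]], Any], str]  # (output label, function, column)


def aggregate(aggs: Sequence[Agg], schema: Sequence[str],
              rows: Sequence[Row]) -> Tuple[List[str], List[Row]]:
    """Agg(F(col)+)[s]: collapse the whole table into a single tuple."""
    if not aggs:
        # No functions: merge the content of every column into one list cell.
        merged = tuple([row[i] for row in rows] for i in range(len(schema)))
        return list(schema), [merged]
    out = tuple(fn([row[schema.index(col)] for row in rows]) for _, fn, col in aggs)
    return [label for label, _, _ in aggs], [out]


calls_schema = ["$itemized_call"]
calls_rows = [("call-1",), ("call-2",), ("call-3",)]  # placeholder cell values
counted = aggregate([("count($itemized_call)", len, "$itemized_call")],
                    calls_schema, calls_rows)
merged = aggregate([], calls_schema, calls_rows)
```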

25 SQL: SQL(stmt)[s1..n]. Input: Multiple tables s1..n. Output: Temporary table. Logic: Execute stmt over the multiple tables and output the result. It is assumed to be executed by an RDB engine. Usually it is the operator right above the source (e.g., table) operator. Order Handling: The schema order of the output table depends on the underlying implementation; it can be enforced by an additional projection node. The tuple order is not defined; it can be enforced by an additional orderby node. Requirement: N/A.

26 Function: F(param1..m)[s1..n]. Input: Multiple tables s1..n. Output: Temporary table. Logic: Execute a user-defined function on the data sources; it can also be used to represent a recursive query. Order Handling: Schema and tuple orders depend on the implementation. They can be enforced by projection and orderby nodes. Requirement: N/A.

27 Source: s(desc). Input: N/A. Output: A table with a given name. Logic: Identify one of the following sources: a view, an XML document, or a table. Order Handling: Depends on the implementation. Keep the original schema and tuple order as much as possible. Requirement: N/A.

28 Name: Rho(col1, col2)[s]. Input: table s. Output: table s. Logic: Rename col1 in table s to col2. Order Handling: Keep all schema and tuple orders. Requirement: col1 must be in table s.

29 Name: Rho(s2)[s1]. Input: table s1. Output: table s2. Logic: Rename table s1 to table s2. Order Handling: Keep all schema and tuple orders. Requirement: N/A.

30 Correlated Output: FOR(col+)[s1, s2]. Input: Tables s1 and s2. Output: Evaluation of subquery s2 for each tuple in subquery s1. Logic: It is a FOR iteration operator. For each value in the columns col+ of table s1, evaluate the subquery that generates table s2. Order Handling: The schema order is that of output table s2. The tuple order is similar to the join operator without the left part. Requirement: N/A.
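
The FOR operator can be pictured as a loop that re-evaluates the inner subquery once per outer tuple and concatenates the results. In this sketch the subquery is modelled as a Python callable that receives the variable bindings; that encoding, the name `for_each`, and the sample call list are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]
Table = Tuple[List[str], List[Row]]


def for_each(cols: Sequence[str], s1: Table,
             subquery: Callable[[Dict[str, Any]], Table]) -> Table:
    """FOR(cols)[s1, s2]: evaluate subquery s2 once per tuple of s1."""
    schema1, rows1 = s1
    idx = [schema1.index(c) for c in cols]
    out_schema: List[str] = []
    out_rows: List[Row] = []
    for row in rows1:                                   # outer order drives output
        bindings = {c: row[i] for c, i in zip(cols, idx)}
        sub_schema, sub_rows = subquery(bindings)
        out_schema = list(sub_schema)                   # schema comes from s2
        out_rows.extend(sub_rows)
    return out_schema, out_rows


# Made-up (rate, number_called) pairs standing in for itemized calls.
calls = [("DAY", "9730001"), ("NIGHT", "2120002"), ("DAY", "9730003")]


def count_973_calls(bindings: Dict[str, Any]) -> Table:
    """Inner subquery: count calls to area 973 at the bound rate."""
    n = sum(1 for rate, number in calls
            if rate == bindings["$rate"] and number.startswith("973"))
    return ["$rate", "count($itemized_call)"], [(bindings["$rate"], n)]


rates_table = (["$rate"], [("DAY",), ("NIGHT",)])
for_result = for_each(["$rate"], rates_table, count_973_calls)
```

The output keeps only the subquery's columns, ordered by the outer tuples, which matches the description "similar to the join operator without the left part".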

31 Steps in Translation. XQuery → XML Algebra Tree. User View → XML Algebra Tree. View Composition. Computation Pushdown. Optimization. Execution.

32 Example of Telephone Bill.
<!DOCTYPE invoice [
<!ELEMENT invoice (account_number, bill_period, carrier+, itemized_call*, total)>
<!ATTLIST itemized_call
  no ID #REQUIRED
  date CDATA #REQUIRED
  number_called CDATA #REQUIRED
  time CDATA #REQUIRED
  rate (NIGHT|DAY) #REQUIRED
  min CDATA #REQUIRED
  amount CDATA #REQUIRED>
]>
[Example document values: Jun 9 - Jul 8, 2000; Sprint; $0.35.]

33 Example XQuery. User XQuery: { FOR $rate IN LET $itemized_call := WHERE LIKE ‘973%’ RETURN $rate count($itemized_call) } The query counts the number of itemized_calls in calling area 973, grouped by calling rate.

34 XQuery → XML Algebra Tree. Translate XQuery into XAT by grammar. Convert each query block into XAT. Identify correlated operators. Identify query blocks. Query decorrelation.

35 XAT Graph Notation. Unordered graph. Nodes: operators with their parameters; if there is only one source name, it is omitted. Blocks (subqueries): the block name can be used as an alias for the table produced by that block. Terminals. [Figure: example node V3 := Tagger([V2]) and block B2.]

36 XAT Example. [Figure: XAT tree for the example query. Surviving node labels: T1 := Source(“invoice.xml”); $rate := Navigate(“/”, …); T2 := Source(“invoice.xml”); $itemized_call := Navigate(“/”, invoice/itemized_call); Select(… like ‘973%’); Select(… = “$rate”); Aggregate; V1 := Tagger([$rate] [count($itemized_call)]); FOR($rate); V2 := Tagger([V1]).]

37 XQuery Block Identification. Every query block has only one input point and one output point. Potential query block separation points: independent sources; correlated operators. Blocks are used for query optimization, e.g., cutting.

38 Identification of Blocks. [Figure: the XAT tree of the example query partitioned into blocks B1, B2, B3, and B4. Surviving node labels: T1 := Source(“invoice.xml”); $rate := Navigate(“/”, …); T2 := Source(“invoice.xml”); $itemized_call := Navigate(“/”, invoice/itemized_call); Select(… like ‘973%’); Select(… = “$rate”); Aggregate; V1 := Tagger([$rate] [count($itemized_call)]); FOR($rate); V3 := Tagger([V1]).]

39 XAT Block Tree. [Figure: tree of blocks B1, B2, B3, B4.]

40 Equivalent Rewriting Rules. Navigation Pushdown: swap the navigation operator down. Computation Pushdown: swap the SQL operator down. Groupby Operator Simplification: pull functions (subqueries) out of the Groupby function.