1 XQuery to SQL by XAT Xin Zhang Thanks: Brian, Mukesh, Maged, Lily, Elke.



2 Outline Merged algebra proposed, based on Niagara and XPERANTO. One thorough example of XQuery → SQL.

3 Data Model An ordered table in two dimensions: tuple order and column order. Every cell has its own domain. Every column binds to one variable. The domain can be: SQL domains; XML fragment, which can be a list of XML elements. Comparisons are done by value.
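As a rough illustration of this data model, an ordered table could be sketched in Python as below. The class name XATTable, its methods, and the sample cell values are hypothetical and only meant to show ordered columns bound to variables, cells holding either SQL-style values or XML fragments, and comparison by value.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class XATTable:
    """Ordered table: both column order and tuple order are significant."""
    columns: List[str]                        # each column binds to one variable
    rows: List[List[Any]] = field(default_factory=list)

    def append(self, row: List[Any]) -> None:
        if len(row) != len(self.columns):
            raise ValueError("row width must match the number of columns")
        self.rows.append(list(row))

    def column(self, name: str) -> List[Any]:
        idx = self.columns.index(name)
        return [row[idx] for row in self.rows]


# A cell may hold an SQL-style value or an XML fragment (here a list of
# strings standing in for a list of XML elements).
table = XATTable(columns=["$rate", "$itemized_call"])
table.append(["NIGHT", ["<itemized_call no='1'/>", "<itemized_call no='3'/>"]])
table.append(["DAY", ["<itemized_call no='2'/>"]])

# Comparison is by value: a separately built table with equal contents is equal.
same = XATTable(columns=list(table.columns), rows=[list(r) for r in table.rows])
```

The dataclass-generated equality compares column names and rows in order, which matches the "ordered in two dimensions" reading of the slide.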

4 Data Model Examples Table of XML fragments. Explicit naming, e.g. variable bindings. Implicit naming, e.g. XPath notations; reduces the complexity of many internal variables. [Figure: example tables of XML fragments; surviving labels: $carrier, invoice_id, carrier, carrier_entry, carriers, //invoice/invoice/account_number, $rate]

5 Naming of Columns Implicit: SQL operators, Navigate. Explicit (“name”): Variable binding: holds a set of values; the variable name ($name) is the name of a column. Rename: distinguishes, within one operator, the same “names” from different sources; abbreviates a very long “name”; creates a new name for creation operators and needs to be used with those operators, e.g. Tagger.

6 Operators SQL like (9): Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), COp. XML like (4): Tagger, Navigate, is(Element, Text), Aggregate. Special: SQL, Function, Source, Name, FOR

7 SQL like Operators (9)
Operator      Niagara   XPERANTO
Project       Expose    Project
Select
Theta Join    Join      Theta Join
Outer Join    N/A       Outer Join
Semi Join     N/A
Groupby       Group     Groupby
Orderby       N/A       Orderby
Union
Outer Union   Union     Outer Union
COp           N/A       Correlated Join

8 XML like Operators
Operator                Niagara   XPERANTO
Tagger* (pattern)       Vertex    Project: cr8(Elem, AttList, Att, XMLFragList)
Navigate (from, path)   Follow    Project: get(TagName, Attributes, Contents, AttName, AttValue), Unnest
Is                      N/A       Select: is(Element, Text)
Aggregate               Group     AggXMLFrags

9 Special Operators
Operator   Niagara   XPERANTO      Description
SQL        N/A       Input         Denotes a SQL query.
Function   N/A       Function      Used to represent a recursive query.
Source               Table, View   Identifies a data source.
Name       Rename    N/A           Naming of columns.
FOR        N/A                     FOR iteration.

10 Operator Specification Description Input Specification. Output Specification. Logic description. Illustrative Example

11 Naming Operator Syntax: Name(“from_name”, “to_name”) Simplified Syntax: to_name := from_name
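The Name operator can be pictured with a small Python sketch. The function name, the dictionary-per-row representation, and the sample values are assumptions made for illustration; the slide only gives the syntax Name(“from_name”, “to_name”).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def name(rows: List[Row], from_name: str, to_name: str) -> List[Row]:
    """Rename column `from_name` to `to_name`, keeping column and tuple order."""
    renamed = []
    for row in rows:
        if from_name not in row:
            raise KeyError(f"no column named {from_name!r}")
        # Dict comprehensions preserve insertion order, so column order is kept.
        renamed.append({(to_name if col == from_name else col): val
                        for col, val in row.items()})
    return renamed


rows = [{"invoice/itemized_call": "<itemized_call no='1'/>", "$rate": "NIGHT"}]

# Equivalent of the simplified syntax: $itemized_call := invoice/itemized_call
result = name(rows, "invoice/itemized_call", "$itemized_call")
```

The input rows are left untouched, so the operator behaves like a pure function over its input table.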

12 Steps in Translation XQuery → XML Algebra Tree. User View → XML Algebra Tree. View Composition. Computation Pushdown. Optimization.

13 Example of Telephone Bill
<!DOCTYPE invoice [
<!ELEMENT invoice (account_number, bill_period, carrier+, itemized_call*, total)>
<!ATTLIST itemized_call no ID #REQUIRED date CDATA #REQUIRED number_called CDATA #REQUIRED time CDATA #REQUIRED rate (NIGHT|DAY) #REQUIRED min CDATA #REQUIRED amount CDATA #REQUIRED>
]>
Sample instance values: Jun 9 - Jul 8, 2000; Sprint; $0.35

14 Example XQuery User XQuery: { FOR $rate IN LET $itemized_call := WHERE LIKE ‘973%’ RETURN $rate count($itemized_call) } Count number of itemized_calls in calling area 973 grouped by the calling rate.

15 XQuery → XML Algebra Tree Divide into query blocks. Convert each query block into an XML Algebra Tree (XAT). Identify correlated operators. Combine into one XML Algebra Tree. Query decorrelation.

16 Query Blocks User XQuery: { FOR $rate IN LET $itemized_call := WHERE LIKE ‘973%’ RETURN $rate count($itemized_call) } B1: Constructs the summary from the result of B2. B2: Gets all the distinct rates and iterates through them. B3: Counts the itemized calls for a given rate. The block identification is arbitrary (wrong).

17 XAT of B1 B1 B2 XAT: Tagger( [V1] ) B2 [V2] it is a name instead of a part of pattern. Name(“Tagger( [V1] )”, “V2”)

18 XAT of B2 B3 { FOR $rate IN distinct(document(“invoice”) } B3 XAT: B3 Source(“invoice.xml”) Navigate(“/”, “$rate”) FOR($rate) Aggregate

19 XAT of B3 B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call WHERE $itemized_call LIKE ‘973%’ RETURN $rate count($itemized_call) XAT: Source(“invoice.xml”) Navigate(“/”, invoice/itemized_call) B2 = “$rate”) Name(“invoice/itemized_call:/”, “$itemized_call”)

20 XAT of B3 (Cont.) B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call WHERE $itemized_call LIKE ‘973%’ RETURN $rate count($itemized_call) XAT: like ‘973%’)

21 XAT of B3 (Cont.) B4 LET $itemized_call := document(“invoice”) /invoice/itemized_call WHERE $itemized_call LIKE ‘973%’ RETURN $rate count($itemized_call) XAT: Tagger( [$rate] [count($itemized_call)] ) Select(count(“$itemized_call”)) B2 Name(“Tagger( [$rate] [count($itemized_call)] )”, “V1”)

22 Put it Together Select(count(“$itemized_call”)) like ‘973%’) Source(“invoice.xml”) Navigate(“/”, invoice/itemized_call) = “$rate”) Name(“Tagger( [V1] )”, “V2”) Source(“invoice.xml”) Navigate(“/”, Name(“invoice/itemized_call:/”, “$itemized_call”) B1 B2 B3 FOR($rate) “$rate”) Aggregate() Tagger( [$rate] [count($itemized_call)] ) Name(“Tagger( [$rate] [count($itemized_call)] )”, “V1”) Tagger( [V1] )

23 Syntax Sugar Select(count(“$itemized_call”)) like ‘973%’) Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) = “$rate”) V2 := Tagger( [V1] ) $rate := Source(“invoice.xml”) Navigate(“/”, B1 B2 B3 FOR($rate) Aggregate() V1:=Tagger( [$rate] [count($itemized_call)] )

24 Query Decorrelation for COp Top-down approach over the XAT tree. Approach: Correlated Binding (CB) Op1[COp(CB, Op2)[Op3[Correlated Operator[A],B]]] → Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]] For example: Correlated Join → Outer Join with Groupby with Cartesian.

25 Query Decorrelation for FOR Top-down approach over the XAT tree. Approach: Correlated Binding (CB) Op1[FOR(CB)[Op2[Correlated Operator[A],B]]] → Op1[Groupby(CB, Op2[]) [Operator[Cartesian[A,B]]]] Differences: SQL decorrelation returns the outer query. XQuery decorrelation returns the inner query. CO returns both the outer and the inner query.
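The effect of the FOR rewriting can be sketched in Python under some simplifying assumptions: the correlated operator is a selection on rate, Op2 is a count, and the sample rates and phone numbers are invented. The function names are hypothetical; the sketch only shows that grouping a filtered Cartesian product gives the same answer as re-evaluating the inner block once per binding.

```python
from itertools import groupby, product

# Hypothetical sample data: distinct rates (outer side) and calls (inner side).
rates = ["DAY", "NIGHT"]
calls = [
    {"rate": "NIGHT", "number_called": "973-555-0101"},
    {"rate": "DAY", "number_called": "973-555-0102"},
    {"rate": "NIGHT", "number_called": "212-555-0103"},
    {"rate": "NIGHT", "number_called": "973-555-0104"},
]


def correlated(rates, calls):
    """FOR iteration: the inner selection is re-evaluated for every rate."""
    out = []
    for rate in rates:
        matching = [c for c in calls if c["rate"] == rate]
        out.append((rate, len(matching)))
    return out


def decorrelated(rates, calls):
    """Cartesian product, then the formerly correlated selection, then Groupby."""
    pairs = [(r, c) for r, c in product(rates, calls) if c["rate"] == r]
    pairs.sort(key=lambda p: p[0])  # groupby needs its input sorted by the key
    return [(rate, len(list(group)))
            for rate, group in groupby(pairs, key=lambda p: p[0])]
```

Both functions agree here because every rate has at least one call. A rate without matching calls would vanish from the grouped form, which is presumably why the COp rule on the previous slide adds an outer join.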

26 FOR Decorrelation Example Source(“invoice.xml”) = “$rate”) …1 Source(“invoice.xml”) …3 B2 B3 FOR($rate) …2 Source(“invoice.xml”) = “$rate”) Groupby(“$rate”, ) Cartesian Source(“invoice.xml”) …3 B1 B2 B3 Aggregate …2 …1

27 Default XML View
invoice (id, account_number, bill_period, total): bill_period Jun 9 – Jun 8, 2000, total $0.35
carrier (invoice_id, carrier): 1, Sprint
itemized_call (invoice_id, no, date, number_called, time, rate, min, amount): three sample rows with rates NIGHT, DAY, NIGHT

28 User Defined XML View Jun 9 - Jul 8, 2000 Sprint $ Jun 9 – Jun 8, 2000 $ Sprint 1 … …

29 User Defined XML View Cont. Create view invoice as ( FOR $invoice IN view(“default”)/invoice/row RETURN $invoice/account_number/text() $invoice/bill_period/text() FOR $carrier in view(“default”)/carrier/row WHERE $carrier/invoice_id = $invoice/id RETURN $carrier/carrier/text() FOR $itemized_call in view(“default”)/itemized_call/row WHERE $itemized_call/invoice_id = $invoice/id RETURN SORTBY $invoice/total/text() )

30 User Defined XML View Block Create view invoice as ( FOR $invoice IN view(“default”)/invoice/row RETURN $invoice/account_number/text() $invoice/bill_period/text() FOR $carrier in view(“default”)/carrier/row WHERE $carrier/invoice_id = $invoice/id RETURN $carrier/carrier/text() FOR $itemized_call in view(“default”)/itemized_call/row WHERE $itemized_call/invoice_id = $invoice/id RETURN SORTBY $invoice/total/text() ) B4 B5 B6

31 XML View XAT V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text() …[V3] [$invoice/total/text()] ) V3 := Tagger( Aggregate() Source(“default..xml”) $invoice := Navigate(“/”,invoice/row ) FOR($invoice/id) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, no/text()) Navigate($itemized_call, invoice_id) Select(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) B5 Navigate($itemized_call, amount/text()) …

32 3-Way Correlation …2 Source(“invoice.xml”) B4 FOR($invoice/id) …1 B5B6

33 3-Way Decorrelation …2 Source(“default.xml”) B4 JOIN($invoice/id) …1 B5 with CartesianB6 with Cartesian GB($invoice/id, …) …2 Source(“default.xml”)

34 View XAT After Decorrelation V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text() …[V3] [$invoice/total/text()] ) V3 := Tagger( Aggregate() Groupby($invoice/id, Aggregate()) Source(“default..xml”) $invoice := Navigate(“/”,invoice/row ) Join($invoice/id) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, no/text()) Navigate($itemized_call, invoice_id) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) B5 Navigate($itemized_call, amount/text()) … Groupby($invoice/id…) Source(“default..xml”) $invoice := Navigate(“/”,invoice/row ) Navigate(“$invoice”, id)

35 View Composition Input: User Query XAT + User View XAT. Output: simplified composite XAT. Approach: XAT Cutting: remove unreferenced columns and operators. Pushdown Navigation: by using the commutative rules. Cancel out the navigation operators: by using the composition rules.

36 XAT Cutting Cut Query Blocks: the user query only requires itemized_call. B5 is cut, Invoice is cut, B4 is simplified, B6 is simplified. Cut Columns: User query only used
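One way to picture column cutting is a backward pass that keeps only operators whose output is still referenced. The sketch below is an assumption about how such a pass could look, not the Rainbow implementation; the (output, inputs) encoding of operators and the column names are hypothetical.

```python
def cut(operators, required):
    """Keep only operators whose output column is (transitively) referenced.

    `operators` is a bottom-up list of (output_column, input_columns) pairs.
    """
    needed = set(required)
    kept = []
    for output, inputs in reversed(operators):   # walk from the top of the tree
        if output in needed:
            kept.append((output, inputs))
            needed.update(inputs)                # inputs become required as well
    kept.reverse()                               # restore bottom-up order
    return kept


view_ops = [
    ("$invoice", []),
    ("$invoice/id", ["$invoice"]),
    ("$invoice/total", ["$invoice"]),
    ("$carrier", []),
    ("$itemized_call", []),
    ("$itemized_call/invoice_id", ["$itemized_call"]),
    ("$itemized_call/rate", ["$itemized_call"]),
    ("$itemized_call/amount", ["$itemized_call"]),
]

# Columns assumed to be referenced by the user query and the join predicate.
required = ["$itemized_call/rate", "$invoice/id", "$itemized_call/invoice_id"]
remaining = [output for output, _ in cut(view_ops, required)]
```

In this toy input the carrier branch and the total and amount navigations disappear, which mirrors the idea of cutting unreferenced blocks and columns.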

37 View XAT After B5 is Cut. V4 := Tagger( [$invoice/account_number/text()] [$invoice/bill_period/text()</bill_period[V3] [$invoice/total/text()] ) V3 := Tagger( Aggregate() Groupby($invoice/id, Aggregate()) Source(“default..xml”) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, no/text()) Navigate($itemized_call, invoice_id) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) Navigate($itemized_call, amount/text()) …

38 View After Columns are Cut. V4 := Tagger( [V3] ) V3 := Tagger( Aggregate() Groupby($invoice/id, Aggregate()) Source(“default..xml”) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, number_called/text()) Navigate($itemized_call, invoice_id) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) Navigate($itemized_call, rate/text())

39 Navigation Cancel Out Navigation Pushdown: based on some transformation rules, e.g. commutativity of navigation and other operators. Navigation + Tagger Cancel Out: composition rules. The cancellation result is a “renaming”.
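The cancel-out idea can be checked on a toy case in Python with the standard xml.etree.ElementTree module. The tagger and navigate helpers, the element pattern, and the column names are hypothetical; the point is only that navigating into an element that a Tagger has just built returns the value of the input column, so the pair can be replaced by a renaming.

```python
import xml.etree.ElementTree as ET


def tagger(row, tag, child_tags):
    """Build <tag> with one child element per (child tag, column) pair."""
    elem = ET.Element(tag)
    for child_tag, column in child_tags.items():
        ET.SubElement(elem, child_tag).text = row[column]
    return elem


def navigate(elem, path):
    """Return the text of the first node reached by `path` below `elem`."""
    return elem.findtext(path)


row = {"rate_col": "NIGHT", "number_col": "973-555-0101"}
call = tagger(row, "itemized_call",
              {"rate": "rate_col", "number_called": "number_col"})

# Navigating into the freshly tagged element ...
via_xml = navigate(call, "rate")
# ... yields what a plain renaming of the input column would give.
via_rename = row["rate_col"]
```

Skipping the construction and navigation steps avoids building XML that is immediately taken apart again.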

40 Query XAT Navi. Pushdown Select(count(“$itemized_call”)) like ‘973%’) Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) = “$rate”) B3 V1:=Tagger( [$rate] [count($itemized_call)] ) Select(count(“$itemized_call”)) like ‘973%’) Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) = “$rate”) V1:=Tagger( [$rate] [count($itemized_call)] )

41 Navi. Tagger Cancel Out Source(“invoice.xml”) $itemized_call := Navigate(“/”, invoice/itemized_call) B3 …1 V4 := Tagger( [V3] ) V3 := Tagger( Aggregate() Groupby($invoice/id, Aggregate()) Navigate($itemized_call, number_called/text()) Navigate($itemized_call, rate/text()) …2

42 The Result of Cancel Out …1 := Navigate($itemized_call, rate/text()) := Navigate($itemized_call, number_called/text()) …2

43 Computation Pushdown Goal: XAT → SQL operators + XML operators. Step 0: Navigation Pushdown. Step 1: XML Default View → SQL Operators (renaming columns). Step 2: SQL Computation Pushdown, by commutative and composition rules, e.g. predicate pushdown.
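Predicate pushdown relies on a selection commuting with a join when the predicate touches only one input. A minimal Python sketch of that rule follows; the join and select helpers and the sample rows are assumptions made for illustration.

```python
def join(left, right, left_key, right_key):
    """Nested-loop equi-join that merges matching rows."""
    return [{**l, **r} for l in left for r in right if l[left_key] == r[right_key]]


def select(rows, predicate):
    return [row for row in rows if predicate(row)]


invoices = [{"id": 1}, {"id": 2}]
calls = [
    {"invoice_id": 1, "number_called": "973-555-0101"},
    {"invoice_id": 1, "number_called": "212-555-0103"},
    {"invoice_id": 2, "number_called": "973-555-0104"},
]


def in_973(row):
    return row["number_called"].startswith("973")


# Selection applied after the join ...
late = select(join(invoices, calls, "id", "invoice_id"), in_973)
# ... and the same selection pushed below the join, onto the calls input.
early = join(invoices, select(calls, in_973), "id", "invoice_id")
```

Both plans return the same rows, while the pushed-down plan joins fewer tuples, which is what makes it attractive to hand the predicate to the SQL engine.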

44 Navigation Pushdown. Source(“default..xml”) $invoice := Navigate(“/”,invoice/row ) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) Navigate($itemized_call, invoice_id) Join(“$itemized_call/invoice_id”=“$invoice/id”) Navigate(“$invoice”, id) := Navigate($itemized_call, rate/text()) := Navigate($itemized_call, number_called/text()) Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) := Navigate($itemized_call, rate/text()) := Navigate($itemized_call, number_called/text()) Join(“$itemized_call/invoice_id”=“$invoice/id”)

45 XML Default View → SQL Source(“default.xml”) $itemized_call := Navigate(“/”, itemized_call/row) := Navigate($itemized_call, rate/text()) := Navigate($itemized_call, number_called/text()) Source(“itemized_call”) Project(rate, number_called) := rate := number_called … …

46 Computation Pushdown A SQL Block Select(count(“$itemized_call”)) like ‘973%’) = “$rate”) V1:=Tagger( [$rate] [count($itemized_call)] ) B3 like ‘973%’) Select(count(“$itemized_call”)) = “$rate”) V1:=Tagger( [$rate] [count($itemized_call)] ) A SQL Block

47 Result of the Transformation Tagger( [V1] ) V1 := Aggregate Tagger( [rate] [count(*)] ) SQL: SELECT rate, count(*) FROM itemized_call, invoice WHERE number_called LIKE '973%' AND invoice.id = itemized_call.invoice_id GROUP BY rate
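The generated SQL can be tried out with Python's built-in sqlite3 module. The sketch below assumes tables reduced to the columns the query touches and uses invented sample rows; an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE itemized_call (invoice_id INTEGER, number_called TEXT, rate TEXT);
""")
conn.execute("INSERT INTO invoice (id) VALUES (1)")
conn.executemany(
    "INSERT INTO itemized_call (invoice_id, number_called, rate) VALUES (?, ?, ?)",
    [
        (1, "973-555-0101", "NIGHT"),   # hypothetical sample calls
        (1, "973-555-0102", "DAY"),
        (1, "212-555-0103", "NIGHT"),
        (1, "973-555-0104", "NIGHT"),
    ],
)

rows = conn.execute("""
    SELECT rate, count(*)
    FROM itemized_call, invoice
    WHERE number_called LIKE '973%'
      AND invoice.id = itemized_call.invoice_id
    GROUP BY rate
    ORDER BY rate
""").fetchall()
conn.close()
```

Each returned (rate, count) pair is what the remaining Tagger would wrap into the XML result.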

48 Optimization Efficiently publishing XML views: Sorted Outer Union. Special Tagger implementation. A lot more!

49 Summary XQuery → XAT: Query Block Identification, Query Decorrelation. View Composition: XAT Cutting, Navigation Pushdown, Navigation Cancel Out. Computation Pushdown: Navigation Pushdown, XML Default View → SQL Operators, Computation Pushdown. Optimization.