02.19.2004 CS561 On Relational Support for XML Publishing: Beyond Sorting and Tagging. Surajit Chaudhuri, Raghav Kaushik, Jeffrey F. Naughton. Presented by: Conn Doherty


On Relational Support for XML Publishing: Beyond Sorting and Tagging
Surajit Chaudhuri, Raghav Kaushik, Jeffrey F. Naughton
Presented by: Conn Doherty

Outline
• Motivation & Observations
• XML
• Topic of Paper
• GApply Operator Approach
• Transformation Rules
• Experiments and Results
• Related Work
• Conclusions
• Future Problems

Motivation
• Does the need for efficient XML publishing bring any new requirements for relational query engines, or is sorting query results in the relational engine and tagging them in middleware sufficient?

Observations
• The mismatch between the XML data model and the relational model requires relational engines to be enhanced for efficiency
• Need support for relation-valued variables

XML
• Extensible Markup Language (rather a metalanguage or metametalanguage)
• Rapidly emerging as a standard for exchanging business data
• Substantial interest in publishing existing relational data as XML

Current XML Publishing
• Most focus has been on issues external to the RDBMS
  – Determining the class of XML views that can be defined
  – Languages used to specify the conversion from relational data to XML
  – Methods of composing XML queries with XML views
• Data warehousing has caused focus on similar issues internal to the RDBMS

Primary Topic of Paper
• Focus closely on the class of SQL queries that are typically generated by XML publishing applications
• Ask whether anything needs to be changed within the relational engine to evaluate these queries efficiently

YES!
• Differences in the XML and relational data models
  – cause awkward and inefficient translations of XML queries to relational SQL queries
• Main Issue
  – XML's hierarchical model makes it very convenient and natural to apply operators to subtrees

Part Supplier Example
• Part and Supplier Data Set
  – supplier(s_key, s_name)
  – partsupp(ps_suppkey, ps_partkey)
  – part(p_partkey, p_name, p_retailprice)

Part Supplier Example
• Query Q1: For each supplier element, return the names and retail prices of all parts supplied by that supplier, and also the overall average retail price of all parts supplied
• Example XML Document
    <suppliers>
      <supplier>
        <sname>S1</sname>
        <parts>
          <part><pname>P1</pname><retailprice>10</retailprice></part>
          <part><pname>P2</pname><retailprice>10</retailprice></part>
        </parts>
      </supplier>
      <supplier>
        <sname>S2</sname>
        <parts>
          <part><pname>P21</pname><retailprice>12</retailprice></part>
          <part><pname>P22</pname><retailprice>13</retailprice></part>
        </parts>
      </supplier>
    </suppliers>
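The "sorting and tagging" baseline that the talk questions can be sketched as follows: the relational engine returns rows sorted by supplier, and middleware wraps them in tags. This is only an illustrative sketch; the `rows` data and the `tag_suppliers` helper are assumptions made for the example, not code from the paper.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical sorted output of the relational engine:
# (s_name, p_name, p_retailprice), ordered by supplier name.
rows = [
    ("S1", "P1", 10),
    ("S1", "P2", 10),
    ("S2", "P21", 12),
    ("S2", "P22", 13),
]

def tag_suppliers(sorted_rows):
    """Build the <suppliers> document from rows already sorted by supplier."""
    root = ET.Element("suppliers")
    for s_name, group in groupby(sorted_rows, key=lambda r: r[0]):
        supplier = ET.SubElement(root, "supplier")
        ET.SubElement(supplier, "sname").text = s_name
        parts = ET.SubElement(supplier, "parts")
        for _, p_name, price in group:
            part = ET.SubElement(parts, "part")
            ET.SubElement(part, "pname").text = p_name
            ET.SubElement(part, "retailprice").text = str(price)
    return root

document = ET.tostring(tag_suppliers(rows), encoding="unicode")
print(document)
```

The tagger needs only one pass because the sort order keeps each supplier's rows adjacent, which is why sorting is done in the engine before tagging.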

Example Queries
• XQuery
    For $s in /doc(tpch.xml)/suppliers/supplier
    Return <ret>
      $s/s_suppkey
      <parts>
        For $p in $s/part
        Return <part> $p/p_name $p/p_retailprice </part>
      </parts>
      avg($s/part/p_retailprice)
    </ret>
• SQL
    (select ps_suppkey, p_name, p_retailprice, null
     from partsupp, part
     where ps_partkey = p_partkey
     union all
     select ps_suppkey, null, null, avg(p_retailprice)
     from partsupp, part
     where ps_partkey = p_partkey
     group by ps_suppkey)
    order by ps_suppkey
• The SQL (relational data model) formulation is hard to express and inefficient
  – SQL is unable to bind a variable to sets of tuples and execute subqueries on these sets
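To make the cost of the SQL formulation concrete, the query can be run against a toy database. The sketch below uses Python's built-in `sqlite3` module with made-up rows (assumptions for illustration); the column alias `avg_price` is also added only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table part(p_partkey integer, p_name text, p_retailprice real);
    create table partsupp(ps_suppkey integer, ps_partkey integer);
    insert into part values (1, 'P1', 10), (2, 'P2', 10), (3, 'P21', 12), (4, 'P22', 13);
    insert into partsupp values (1, 1), (1, 2), (2, 3), (2, 4);
""")

# Sorted outer union: detail rows and per-supplier averages share one schema,
# padded with nulls. Note that the partsupp/part join appears twice.
q1 = """
    select ps_suppkey, p_name, p_retailprice, null as avg_price
    from partsupp, part
    where ps_partkey = p_partkey
    union all
    select ps_suppkey, null, null, avg(p_retailprice)
    from partsupp, part
    where ps_partkey = p_partkey
    group by ps_suppkey
    order by ps_suppkey
"""
rows = conn.execute(q1).fetchall()
for row in rows:
    print(row)
```

Each supplier contributes its detail rows plus one null-padded average row, and the join is written (and, without extra optimizer work, evaluated) once per branch of the union.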

3-Angle Approach
• 1) New operator, GApply
  – Binds a variable to sets of tuples
  – Allows subqueries to be executed over the set of tuples (tmp relation) bound to a variable
• 2) Propose transformation rules to modify query plan trees with the GApply operator
• 3) Expose the GApply operator in SQL syntax

GApply Operator
• Syntax: GApply(GCols, PGQ)
  – GCols: grouping/partitioning columns
  – PGQ: per-group query
• Input tuple stream is partitioned on GCols
• PGQ is applied to each group
• Output is the union of the PGQ results taken over all groups

Terminology
• Outer tuple stream: input tuple stream
• Inner query: per-group query
• Outer child of GApply: root of outer query
• Inner child of GApply: root of inner query

PGQ Restrictions
• The PGQ may only operate on the temporary relation associated with the group of tuples
• This type of operator is also known as groupwise processing
• Operators allowed in PGQ: scan, select, project, distinct, apply, exists, union(all), groupby, aggregate, and orderby

Physical Implementation
• Two Phases:
  – Partitioning Phase
    • Implemented using sorting or hashing
  – Execution Phase
    • Performed in nested loop fashion
    • PGQ is evaluated on each group of tuples
  – Each group is a temporary relation bound to a relation-valued parameter $group
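The two phases can be sketched in a few lines of Python. This is a toy in-memory model, not the server implementation: the function names `partition_by_sort`, `partition_by_hash`, and `execute`, and the sample rows, are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import groupby

def partition_by_sort(rows, gcols):
    """Partitioning phase, sort-based: sort on GCols, then cut at key changes."""
    key = lambda row: tuple(row[c] for c in gcols)
    for _, group in groupby(sorted(rows, key=key), key=key):
        yield list(group)

def partition_by_hash(rows, gcols):
    """Partitioning phase, hash-based: bucket rows by their GCols values."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[c] for c in gcols)].append(row)
    yield from buckets.values()

def execute(groups, pgq):
    """Execution phase: nested loop, one PGQ evaluation per group."""
    output = []
    for group in groups:           # `group` plays the role of $group
        output.extend(pgq(group))  # union of the per-group results
    return output

rows = [
    {"ps_suppkey": 2, "p_retailprice": 12.0},
    {"ps_suppkey": 1, "p_retailprice": 10.0},
    {"ps_suppkey": 2, "p_retailprice": 13.0},
    {"ps_suppkey": 1, "p_retailprice": 10.0},
]

def average_price(group):
    """Example PGQ: one row holding the group's key and average price."""
    total = sum(r["p_retailprice"] for r in group)
    return [(group[0]["ps_suppkey"], total / len(group))]

by_sort = execute(partition_by_sort(rows, ["ps_suppkey"]), average_price)
by_hash = execute(partition_by_hash(rows, ["ps_suppkey"]), average_price)
print(by_sort, by_hash)
```

Both strategies produce the same set of groups; the sort-based one also yields them in key order, which matters when the output must later be tagged in document order.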

Implementation Diagram
[Figure: the outer child (outer query) feeds the partition phase; in the execution phase a nested loop (NL) evaluates the inner child (inner query) once per group, with each group bound to the temporary relation $group.]

Expose GApply in Syntax
• It is difficult for the parser and optimizer to determine when GApply applies
• Tests on Microsoft SQL Server 2000 with the GApply operator not exposed in syntax
  – The optimizer only sometimes identifies the need for GApply
  – In each case where it is used, GApply considerably speeds up performance

Proposed Syntax
• Proposed extension to SQL syntax
• SQL query performing groupwise processing:
  – select gapply(PGQ(x)) as … from … where … group by … : x
  – x is a relation-valued variable

Example Query in Syntax
• Query Q1:
  – select gapply(PGQ1(tmpSupp))
    from partsupp, part
    where ps_partkey = p_partkey
    group by ps_suppkey : tmpSupp
  – PGQ1(tmpSupp)
    • select p_name, p_retailprice, null from tmpSupp
      union all
      select null, null, avg(p_retailprice) from tmp
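A rough Python analogue of Q1 in the proposed syntax is sketched below. The `joined` rows stand in for the result of the partsupp/part join, and `gapply` and `pgq1` are hypothetical helpers written for this example; the proposed SQL extension itself is not executable in any stock engine.

```python
from itertools import groupby

# Hypothetical result of joining partsupp and part:
# (ps_suppkey, p_name, p_retailprice).
joined = [
    (1, "P1", 10.0), (1, "P2", 10.0),
    (2, "P21", 12.0), (2, "P22", 13.0),
]

def pgq1(tmp_supp):
    """Per-group query: detail rows, then one row with the group's average."""
    detail = [(p_name, price, None) for _, p_name, price in tmp_supp]
    average = sum(price for _, _, price in tmp_supp) / len(tmp_supp)
    return detail + [(None, None, average)]

def gapply(rows, key, pgq):
    """Group rows by key, bind each group to the PGQ's parameter, union results."""
    output = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        output.extend(pgq(list(group)))
    return output

q1_result = gapply(joined, key=lambda row: row[0], pgq=pgq1)
for row in q1_result:
    print(row)
```

In contrast to the union all formulation of Q1, the join result is produced once and each group is read by both branches of the per-group query.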

Transformation Rules
• Precise semantics of the operators
• Three categories
  – 1) Pushing Computation into the Outer Query
    • Placing Projections Before GApply
    • Placing Selections Before GApply
    • Converting GApply to groupby
  – 2) Group Selection
  – 3) Pushing GApply Below Joins
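As an illustration of the "Converting GApply to groupby" idea, consider a per-group query that computes nothing but an aggregate. The sketch below assumes that, in this simple case, a plain group-by gives the same answer without materializing each group; the exact preconditions of the rule are not stated on the slide, and all names and data here are hypothetical.

```python
from collections import defaultdict

# Hypothetical join result: (ps_suppkey, p_name, p_retailprice).
joined = [
    (1, "P1", 10.0), (1, "P2", 10.0),
    (2, "P21", 12.0), (2, "P22", 13.0),
]

def gapply(rows, pgq):
    """Toy GApply on the first column: materialize each group, then run the PGQ."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    output = []
    for group in groups.values():
        output.extend(pgq(group))
    return output

def avg_pgq(group):
    """PGQ consisting only of an aggregate over the group."""
    return [(group[0][0], sum(price for _, _, price in group) / len(group))]

via_gapply = gapply(joined, avg_pgq)

# Equivalent plain group-by: running sums and counts, no per-group relation.
totals = defaultdict(lambda: [0.0, 0])
for suppkey, _, price in joined:
    totals[suppkey][0] += price
    totals[suppkey][1] += 1
via_groupby = [(key, total / count) for key, (total, count) in totals.items()]

print(via_gapply, via_groupby)
```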

Rule 2
• Group Selection
  – Consider PGQs that return either the whole group (subtree) or nothing, based on a predicate
  – Two methods to evaluate
    • Join suppliers and parts, group by suppkey, check the selection predicate on each group, and return the group if it holds
    • Apply the selection first to get the qualifying suppkeys, then return the join for those keys
  – The second method will win if the predicate is highly selective

Rule 2 cont.
  – Example
      For $s in /doc(tpch.xml)/suppliers
        /supplier[/part/p_retailprice > 1000]
      Return $s
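The two evaluation methods for group selection can be compared on toy data. The sketch assumes the predicate is existential (a supplier qualifies if at least one of its parts costs more than the threshold); the rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical join of suppliers and parts: (suppkey, p_name, p_retailprice).
joined = [
    (1, "P1", 10.0), (1, "P2", 1500.0),
    (2, "P21", 12.0), (2, "P22", 13.0),
    (3, "P31", 2000.0),
]
THRESHOLD = 1000

def group_then_filter(rows):
    """Method 1: group by suppkey, test the predicate per group, keep whole groups."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    output = []
    for group in groups.values():
        if any(price > THRESHOLD for _, _, price in group):
            output.extend(group)
    return output

def select_keys_then_join(rows):
    """Method 2: find qualifying suppkeys first, then fetch only their rows."""
    keys = {suppkey for suppkey, _, price in rows if price > THRESHOLD}
    return [row for row in rows if row[0] in keys]

plan1 = group_then_filter(joined)
plan2 = select_keys_then_join(joined)
print(sorted(plan1) == sorted(plan2))
```

When few suppliers qualify, the second method avoids building groups that are discarded anyway, which matches the slide's remark about highly selective predicates.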

Integrating Rules in Optimizer
• None of the rules above can loop, so the optimizer terminates
• The optimizer must estimate the cost of the GApply operation

Preliminary Experiments
• Performance study
  – Find the efficacy of the GApply operator in speeding up queries
  – Understand the impact of each proposed transformation rule
• Microsoft SQL Server 2000
  – Supports GApply without syntax exposure
  – Control over GApply invocation is needed
    • Simulate the operation of GApply on the client side

Client Side Simulation of GApply
• Partition
  – Sorting
  – Hashing (simulation)
• Execute
  – Store the result of the outer query in a temporary table
  – For each distinct tmp group relation, evaluate the PGQ on that relation, then union all results
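A client-side simulation along these lines might look like the following sketch, which uses Python's built-in `sqlite3` module rather than SQL Server. The table contents, the `outer_result` table name, and the trick of selecting each group with a `where` filter on the key are assumptions for illustration, not details from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table part(p_partkey integer, p_name text, p_retailprice real);
    create table partsupp(ps_suppkey integer, ps_partkey integer);
    insert into part values (1, 'P1', 10), (2, 'P2', 10), (3, 'P21', 12), (4, 'P22', 13);
    insert into partsupp values (1, 1), (1, 2), (2, 3), (2, 4);

    -- Store the result of the outer query in a temporary table.
    create temp table outer_result as
        select ps_suppkey, p_name, p_retailprice
        from partsupp, part
        where ps_partkey = p_partkey;
""")

# Per-group query, evaluated once per group; the group is picked out by key.
PGQ = """
    select p_name, p_retailprice, null from outer_result where ps_suppkey = :key
    union all
    select null, null, avg(p_retailprice) from outer_result where ps_suppkey = :key
"""

results = []
keys = conn.execute(
    "select distinct ps_suppkey from outer_result order by ps_suppkey"
).fetchall()
for (key,) in keys:
    # Union the per-group results on the client.
    results.extend(conn.execute(PGQ, {"key": key}).fetchall())

for row in results:
    print(row)
```

Issuing one query per group adds round trips that a server-side operator would not pay, which is consistent with the expectation that a full server implementation runs faster.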

Estimate Running Time
• Measure both elapsed time and CPU time
• Operator trees have GApply as the topmost operator
• Real elapsed time is expected to be lower in a full server implementation

Setup
• Experimental Setup
  – TPC-H benchmark data
  – 5 GB database
  – Server
    • 1 GHz processor
    • 784 MB main memory
    • 512 MB buffer pool
  – Each query was run several times and the average taken

Results
• Effectiveness of GApply
  – Comparable whether partitioning is performed using sorting or hashing
  – Tested 4 queries representing a wide range of queries

GApply Effectiveness Results
  – Main conclusions:
    • GApply is a useful operator even for simple XQuery queries
    • Yields speedups of up to a factor of 2
    • The queries are representative of a wide class of queries
    • Q4 took 20% longer with the client-side implementation
    • Q1, Q2, and Q3 are expected to improve further with a server-side implementation (hash-based partitioning)

Results cont.
• Effectiveness of Optimization Rules
  – Tested the improvement obtained by firing each rule
  – Performance metric is elapsed time
  – Method:
    • Choose a relevant parameterized query
    • Vary the parameter and find the performance benefit for each value
    • Benefit ratio: elapsed time without the rule divided by elapsed time with the rule fired

Rule Effectiveness Example
• Query:
  – For $s in /doc(tpch.xml)/suppliers
      /supplier[/part/p_retailprice > x]
    Return $s
  – The parameter x determines the selectivity of the selection

Results cont.
• Effectiveness of Optimization Rules
  – Main conclusions:
    • The proposed rules can have a significant impact on the elapsed time of a query involving GApply
    • Some rules always lowered the cost of the query, while others sometimes lowered and sometimes increased it
    • The benefit of converting GApply to groupby is comparatively lower

Related Work
• XPERANTO Project
  – Concluded that pushing as much computation as possible into the relational engine is best
• SilkRoute Project
  – Language to specify the conversion between relational data and XML
• ROLEX Project
  – To avoid inefficient parsing in applications, the relational engine returns a navigable result tree
• Difference
  – This paper asks whether the whole process of XML publishing has any impact on the core relational operators (YES)

Conclusions
• The relational engine must provide support for binding a variable to sets of tuples
• The required support can be enabled through the GApply operator, with seamless integration into existing relational engines
• The operator should be exposed in the syntax
• Optimization rules are needed

Future Problems
• How should the modified syntax be exploited by algorithms that translate XML queries over XML views of relational data?
• Are any other changes needed to meet the requirements of XML publishing?
• What changes are needed in the optimizer if the relational database returns navigable results?

Other Papers
• D. Chatziantoniou and K. A. Ross. Querying multiple features of groups in relational databases. In VLDB.
  – Extension to SQL syntax with relational algebra implementation
• D. Chatziantoniou and K. A. Ross. Groupwise processing of relational queries. In VLDB.
  – Methods to identify group query components
• C. A. Galindo-Legaria and M. M. Joshi. Orthogonal optimization of subqueries and aggregation. In SIGMOD.
  – Introduction of the segmentApply operator and many transformation rules