University of Konstanz Advances in Database Query Processing Sahak Maloyan Avoiding Sorting and Grouping In Processing Queries Sahak Maloyan.

Presentation transcript:

Outline
– Motivation
– Simple Example
– Order Properties
– Grouping followed by ordering
– Order Property Optimization
– Performance Results
– Conclusion

Motivation
Previous presentation: Fundamental Techniques for Order Optimization
– Using FDs and selection predicates
– Determining order propagation from input to output
– Infer from ordering
Current presentation: aside from orderings, we also infer how relations are grouped (i.e., how records in relations are clustered according to the values of certain attributes)
– Infer from grouping
– Infer from secondary ordering

Motivation (cont.)
Inferred orderings
– Make it possible to avoid sorting when processing the ORDER BY clause of an SQL query
Inferred groupings
– Avoid sorting or hashing prior to computing aggregates for GROUP BY clauses
– Reduce the cost of projection with duplicate elimination
– Complete projection and duplicate elimination in a single pass
– Reduce the cost of evaluating selection queries of the form σ_{A=k}(R) in the absence of indexes or an ordering on A
Inference of secondary orderings and groupings
– Avoid unnecessary sorting or grouping over multiple attributes
– Infer new primary orderings or groupings (example follows)
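The selection case can be illustrated with a minimal sketch. It assumes the input is already grouped on A (equal values adjacent, in no particular order); the function name `select_eq` and the sample rows are hypothetical, not taken from the slides. Once the matching group has been read, the scan can stop early.

```python
def select_eq(rows, attr, k):
    """Evaluate sigma_{attr=k} over rows that are grouped on attr.

    Returns the matching rows and the number of rows inspected.
    Assumes equal values of attr are adjacent (grouped, not necessarily sorted).
    """
    out, scanned = [], 0
    for row in rows:
        scanned += 1
        if row[attr] == k:
            out.append(row)
        elif out:
            # The matching group has ended; no later row can match.
            break
    return out, scanned


# Hypothetical relation grouped (but not sorted) on A.
rows = [{"A": 3}, {"A": 3}, {"A": 1}, {"A": 1}, {"A": 2}, {"A": 2}]
matches, scanned = select_eq(rows, "A", 1)
print(len(matches), scanned)  # 2 5
```

Without the grouping guarantee, the scan would have to read all six rows.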

Simple Example
Benefits of inferring grouping and secondary ordering
TPC-H query:
SELECT c_custkey, COUNT(*)
FROM Customer, Supplier
WHERE c_nationkey = s_nationkey
GROUP BY c_custkey
The query asks how many suppliers could supply each customer directly, without having to go through customs.

Simple Example (cont.)
Postgres QEP of the query:
group c_custkey, count(*)
  sort c_custkey
    merge-join c_nationkey = s_nationkey
      sort c_nationkey
        table scan customer
      sort s_nationkey
        table scan supplier
The Postgres plan first sorts the join result on the grouping attribute c_custkey so that it can aggregate over groups in a single pass. But one-pass aggregation only requires the data to be grouped, not sorted!
– The sort-merge join result is sorted (and hence grouped) on c_nationkey.
– Output tuples in the same group with respect to c_nationkey are themselves grouped on the key of the outer relation (c_custkey).
– The join result therefore satisfies "c_nationkey^G → c_custkey^G". Because c_custkey is the key of Customer, it functionally determines c_nationkey, so the leading c_nationkey^G is superfluous and the result is already grouped on c_custkey => no sort needed.
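The claim that one-pass aggregation needs grouped rather than sorted input can be checked with a small sketch. The rows below are hypothetical join output, and `one_pass_count` is an illustrative name; `itertools.groupby` aggregates over runs of adjacent equal keys, which is exactly the grouping guarantee.

```python
from itertools import groupby


def one_pass_count(rows, key):
    """COUNT(*) per key in one pass; correct only if equal keys are adjacent."""
    return [(k, sum(1 for _ in run)) for k, run in groupby(rows, key=key)]


# Hypothetical join output: grouped on c_custkey, but not sorted on it.
joined = [
    {"c_custkey": 7, "s_suppkey": 1},
    {"c_custkey": 7, "s_suppkey": 2},
    {"c_custkey": 3, "s_suppkey": 1},
    {"c_custkey": 3, "s_suppkey": 2},
    {"c_custkey": 9, "s_suppkey": 5},
]
counts = one_pass_count(joined, key=lambda r: r["c_custkey"])
print(counts)  # [(7, 2), (3, 2), (9, 1)]
```

The counts are correct even though the keys appear in the order 7, 3, 9.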

Order Properties
Order properties have the form A_1^{α_1} → A_2^{α_2} → … → A_n^{α_n}, where
– each A_i is an attribute
– each α_i specifies either an ordering (α_i = O) or a grouping (α_i = G)
– A_1^{α_1} is the primary ordering or grouping and A_2^{α_2} the secondary
Order properties are formalized with an algebra of constructors: the empty ordering, the combination of orderings, and the basic orderings (order or group).

Grouping followed by ordering
Suppose that R = (A, B) consists of 10 tuples, t1, …, t10, and its physical representation satisfies the order property A^O → B^G. This situation is illustrated on the next slide.

Grouping followed by ordering (cont.)
An illustration of A^O → B^G
[Figure: tuples t1, …, t10 arranged in groups by A (A=1, A=2, A=3) and, within each A group, clustered by their B value]
– The primary ordering (A^O) says that the group of tuples with A=1 precedes the group of tuples with A=2, which precedes the group with A=3.
– The secondary ordering (B^G) says that within each group of tuples with like values of A, tuples are clustered together if they have the same value for B.
– For example, t1 can precede t2 or t2 can precede t1, but they must be adjacent.
Two example permutations that satisfy the order property:
t2, t1, t3, t10, t8, t9, t6, t7, t4, t5
t1, t2, t3, t9, t8, t10, t4, t5, t6, t7
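To make the definition concrete, here is a minimal sketch of a checker for order properties. It assumes ascending order for O components and represents a property as a list of (attribute, kind) pairs; the function name `satisfies` and the sample rows are hypothetical and do not reproduce the tuples in the slide's figure.

```python
def satisfies(rows, prop):
    """Check whether a list of dict rows satisfies an order property.

    prop is a list of (attribute, kind) pairs read left to right as the
    primary, secondary, ... component; kind is 'O' (ordered, ascending
    assumed) or 'G' (grouped).
    """
    if not prop or len(rows) <= 1:
        return True
    (attr, kind), rest = prop[0], prop[1:]
    # Split rows into maximal runs of adjacent equal values of attr.
    runs = []
    for row in rows:
        if runs and runs[-1][0] == row[attr]:
            runs[-1][1].append(row)
        else:
            runs.append((row[attr], [row]))
    values = [v for v, _ in runs]
    if len(set(values)) != len(values):
        return False  # some value occurs in two separate runs: not grouped
    if kind == "O" and values != sorted(values):
        return False  # grouped, but the groups are not in ascending order
    # Each run must satisfy the remaining (secondary, ...) components.
    return all(satisfies(run, rest) for _, run in runs)


rows = [
    {"A": 1, "B": 2}, {"A": 1, "B": 2}, {"A": 1, "B": 1},
    {"A": 2, "B": 3}, {"A": 2, "B": 1}, {"A": 2, "B": 1},
]
print(satisfies(rows, [("A", "O"), ("B", "G")]))  # True
print(satisfies(rows, [("A", "O"), ("B", "O")]))  # False
```

The second call fails because B is clustered within each A group but not sorted there.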

Computing with Order Properties (cont.)
The general properties have the form:
Shorthand:
Also, given and the shorthand: "o1 → o2" (concatenation of order properties) denotes:

Order Properties (cont.)
Identities
1. For any order property that holds of a physical relation R, all prefixes of that order property also hold of R.
2. An ordering on any attribute implies a grouping on that attribute.
3. If X functionally determines B, and all attributes in X appear (ordered or grouped) in an order property before B^α, then B^α is superfluous.

Order Properties (cont.)
Identities (cont.)
4. A special case of identity #3, covering the case where X consists of a single attribute.
5. The grouping of an attribute that is functionally determined by the attribute that follows it in the order property is superfluous.
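Identity #3 can be sketched as a simplification pass over an order property. This is an illustrative reading of the identity as worded on the slides, not the authors' implementation; the representation (a list of (attribute, kind) pairs, FDs as (set of attributes, attribute) pairs) and the name `drop_superfluous` are assumptions. Repeated attributes are treated as trivially determined by their earlier occurrence.

```python
def drop_superfluous(prop, fds):
    """Drop every B^alpha whose determinant X already appears earlier in prop.

    prop: list of (attribute, kind) pairs, kind in {'O', 'G'}.
    fds:  list of (lhs, rhs) pairs meaning lhs (a set of attributes) -> rhs.
    """
    seen, reduced = set(), []
    for attr, kind in prop:
        determined = attr in seen or any(
            rhs == attr and set(lhs) <= seen for lhs, rhs in fds
        )
        if not determined:
            reduced.append((attr, kind))
        # A dropped attribute is constant within the preceding groups,
        # so it may still serve as a determinant for later attributes.
        seen.add(attr)
    return reduced


# Hypothetical property; o_orderkey -> o_orderdate holds because o_orderkey is a key.
prop = [("o_custkey", "G"), ("o_custkey", "G"),
        ("o_orderkey", "G"), ("o_orderdate", "O")]
fds = [({"o_orderkey"}, "o_orderdate")]
print(drop_superfluous(prop, fds))
# [('o_custkey', 'G'), ('o_orderkey', 'G')]
```

Identity #5 is deliberately left out of this sketch, since it removes an attribute based on what follows it rather than what precedes it.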

Order Property Inference
Using the algebra of order properties and their formal definitions, we can derive inference rules that state how order properties propagate through relational operators, e.g., joins:

Order Property Optimization
Postgres Plan Operators Summarized
The data structures for all plan nodes in Postgres include the following fields:
– inp_1, …, inp_n: the fields contained in all input tuples to the node
– left: the left subtree of the node (set to Null for leaf nodes and Append)
– right: the right subtree of the node (set to Null for leaf nodes, unary operators and Append)

Order Property Optimization
Postgres Plan Operators Summarized (cont.)
Additional operator-specific fields provided by Postgres and used by our refinement algorithm

Order Property Optimization
Postgres Plan Operators Summarized (cont.)
Group: performs two passes over its input:
1. insert Null values between pairs of consecutive tuples with different values for attributes att_1, …, att_k
2. apply functions F_{k+1}, …, F_n to the collection of values of attributes att_{k+1}, …, att_n respectively, for each set of tuples separated by Nulls
Hash: builds a hash table over its input using a predetermined hash function over attribute att.
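The two passes of Group described above can be sketched as follows. This is a simplified model, not Postgres code: `None` stands in for the Null separator, rows are dicts, and the function names are hypothetical. Note that pass 1 only compares consecutive tuples, which is why the input must already be grouped.

```python
def mark_boundaries(rows, group_attrs):
    """Pass 1: insert None between consecutive rows that differ on group_attrs."""
    out, prev = [], None
    for row in rows:
        key = tuple(row[a] for a in group_attrs)
        if out and key != prev:
            out.append(None)
        out.append(row)
        prev = key
    return out


def aggregate(marked, group_attrs, aggs):
    """Pass 2: apply each aggregate function to every block between separators.

    aggs maps an attribute name to a function over the list of its values.
    """
    results, block = [], []
    for row in marked + [None]:  # trailing None flushes the last block
        if row is None:
            if block:
                res = {a: block[0][a] for a in group_attrs}
                for attr, fn in aggs.items():
                    res[attr] = fn([r[attr] for r in block])
                results.append(res)
                block = []
        else:
            block.append(row)
    return results


# Hypothetical input, grouped (not sorted) on k.
rows = [{"k": 2, "v": 1}, {"k": 2, "v": 4}, {"k": 1, "v": 10}]
marked = mark_boundaries(rows, ["k"])
print(aggregate(marked, ["k"], {"v": sum}))
# [{'k': 2, 'v': 5}, {'k': 1, 'v': 10}]
```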

Order Property Optimization
Postgres Plan Operators Summarized (cont.)
HJoin: performs a (non-order-preserving) simple hash equijoin (att_1 = att_2) with the relation produced by left as the probe relation, and the relation produced by right as the build relation.
Merge: performs a merge equijoin (att_1 = att_2) with the relation produced by left as the outer relation, and the relation produced by right as the inner relation.
NOP: has been added as a dummy plan operator that is temporarily made the root of a Postgres plan prior to its refinement.
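A minimal merge equijoin sketch shows why its output is ordered on the join attribute and, within each join-key group, grouped on the key of the outer relation. This is a textbook-style merge join written for illustration, not the Postgres Merge operator; the sample tables are hypothetical and both inputs are assumed to be sorted on their join attributes.

```python
def merge_join(outer, inner, oattr, iattr):
    """Merge equijoin of two inputs sorted ascending on oattr and iattr.

    For each outer row, all matching inner rows are emitted consecutively,
    so rows of one outer tuple are never interleaved with another's.
    """
    out, i = [], 0
    for o in outer:
        # Advance to the first inner row whose key is not smaller.
        while i < len(inner) and inner[i][iattr] < o[oattr]:
            i += 1
        # Emit all matches; i stays put so the next outer row with the
        # same key can rescan the same inner group.
        j = i
        while j < len(inner) and inner[j][iattr] == o[oattr]:
            out.append({**o, **inner[j]})
            j += 1
    return out


# Hypothetical inputs, each sorted on its nation key.
customer = [
    {"c_custkey": 5, "c_nationkey": 1},
    {"c_custkey": 2, "c_nationkey": 1},
    {"c_custkey": 9, "c_nationkey": 3},
]
supplier = [
    {"s_suppkey": 10, "s_nationkey": 1},
    {"s_suppkey": 11, "s_nationkey": 1},
    {"s_suppkey": 12, "s_nationkey": 3},
]
result = merge_join(customer, supplier, "c_nationkey", "s_nationkey")
print([r["c_custkey"] for r in result])  # [5, 5, 2, 2, 9]
```

The c_custkey values come out clustered (5, 5, 2, 2, 9) but not sorted, which is all a one-pass aggregate needs.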

Order Property Optimization
A Plan Refinement Algorithm
– Input: query plan tree generated by Postgres
– Output: an equivalent plan tree with unnecessary Sort operators (used either to order or group) removed
– Requires: 4 new attributes associated with every node in a query plan tree

Order Property Optimization
A Plan Refinement Algorithm (cont.)
New attributes of each node n:
– keys: a set of attribute sets that are guaranteed to be keys of inputs to n
– fds: a set of functional dependencies (attribute sets → attribute) that are guaranteed to hold of inputs to n
– req: a single order property that is required to hold of inputs either to n or some ancestor node of n for that node to execute
– sat: a set of order properties that are guaranteed to be satisfied by outputs of n

Order Property Optimization
A Plan Refinement Algorithm (cont.)
Idea:
– decorate the input plan with these new attributes
– remove any Sort operator whose child node produces a result that is guaranteed to satisfy an order property required by its parent node
Accomplished in 3 passes over the input plan

Order Property Optimization
A Plan Refinement Algorithm (cont.)
Refinement of the query plan:
NOP
  group c_custkey, count(*)
    sort c_custkey
      merge-join c_nationkey = s_nationkey
        sort c_nationkey
          table scan customer
        sort s_nationkey
          table scan supplier

Order Property Optimization
A Plan Refinement Algorithm (cont.)
Resulting query plan with Sort removed:
group c_custkey, count(*)
  merge-join c_nationkey = s_nationkey
    sort c_nationkey
      table scan customer
    sort s_nationkey
      table scan supplier

Order Property Optimization
A Plan Refinement Algorithm (cont.)
Pass 1: Functional Dependencies and Keys
– A bottom-up pass; FDs and keys are propagated upwards when inferred to hold of intermediate query results
Pass 2: Required Order Properties
– Top-down pass to propagate required order properties (req) downwards from the root of the tree
– Pseudocode of this pass is given in SetReq (next slide)
– New required order properties are generated by:
  – NOP: if its child is Sort, i.e., the original query includes ORDER BY
  – Group and Unique (whose input needs to be grouped)
  – Join operators (propagate 1 order from above into 2 below)
– All other nodes pass the required order properties they inherit from parent nodes to their child nodes, except for Hash and Append, which propagate the empty order property to their child nodes

Order Property Optimization
[Slide: SetReq pseudocode]

Order Property Optimization
A Plan Refinement Algorithm (cont.)
Pass 3: Sort Elimination
– A bottom-up pass of the query plan tree that determines what order properties are guaranteed to be satisfied by outputs of each node (sat), and that concurrently removes any Sort operator n for which n.left.sat ⇒ n.req
– Algorithm: InferSat (next slides)
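The sort-elimination test can be sketched in a few lines. This is a simplified illustration, not the InferSat algorithm from the slides: it assumes req and sat have already been filled in on every node (the real pass infers sat concurrently, using inference rules not shown here), and the `Node` class, `implies` and `eliminate_sorts` are hypothetical names. The implication check uses only identities #1 and #2: a required property is implied if it matches a prefix of a satisfied one, where an ordering may stand in for a grouping.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    op: str                                 # e.g. "Sort", "Merge", "Group"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    req: tuple = ()                         # required order property (assumed set by pass 2)
    sat: set = field(default_factory=set)   # order properties guaranteed on output (assumed given)


def implies(have, need):
    """True if order property `have` implies `need`.

    Both are tuples of (attribute, kind) pairs. `need` must match a prefix
    of `have`, and an 'O' in `have` may stand in for a 'G' in `need`.
    """
    if len(need) > len(have):
        return False
    return all(
        a == b and (x == y or (x == "O" and y == "G"))
        for (a, x), (b, y) in zip(have, need)
    )


def eliminate_sorts(n):
    """Bottom-up pass: return the subtree rooted at n with redundant Sorts removed."""
    if n is None:
        return None
    n.left, n.right = eliminate_sorts(n.left), eliminate_sorts(n.right)
    if n.op == "Sort" and n.left and any(implies(p, n.req) for p in n.left.sat):
        return n.left  # the child's output already satisfies the requirement
    return n


# Hypothetical decoration of the example plan: the merge join output is
# known to be grouped on c_custkey, which is all the Group node needs.
merge = Node("Merge", sat={
    (("c_nationkey", "O"), ("c_custkey", "G")),
    (("c_custkey", "G"),),
})
sort = Node("Sort", left=merge, req=(("c_custkey", "G"),))
plan = Node("Group", left=sort)
refined = eliminate_sorts(plan)
print(refined.op, refined.left.op)  # Group Merge
```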

Order Property Optimization
A Plan Refinement Algorithm (cont.)
[Slide: InferSat pseudocode]

Order Property Optimization
A Plan Refinement Algorithm (cont.)
[Slide: InferSat pseudocode (cont.)]

Example: TPC-D (now TPC-H) Query 3
select l.orderkey, sum(l.extendedprice * (1 - l.discount)) as rev, o.orderdate, o.shippriority
from customer, order, lineitem
where o.orderkey = l.orderkey
  and c.custkey = o.custkey
  and c.mktsegment = 'building'
  and o.orderdate < date(' ')
  and l.shipdate > date(' ')
group by l.orderkey, o.orderdate, o.shippriority
order by rev desc, o.orderdate

Example: TPC-D (now TPC-H) Query 3
Previous presentation:
– The optimized plan outperformed the original plan by a factor of ≈ 2
Now:
– Further improvements due to reasoning about groupings and secondary orderings

Example: TPC-D (now TPC-H) Query 3
Plan operators (from the slide figure): sort rev, o_orderdate; group by o_orderkey; sort o_orderkey; nested-loops o_orderkey = l_orderkey; index scan lineitem; merge-join c_custkey = o_custkey; sort c_custkey; table scan customer; sort o_custkey; table scan order
Derivation, bottom-up (R, S, T and U label intermediate results in the plan):
– O_{c_custkey^O}(R) => O_{c_custkey^G}(R)
– O_{o_custkey^O}(S) => O_{o_custkey^G}(S)
– Identity #5 => O_{o_custkey^G → o_orderkey^G}(S)
– MJ Rule => O_{c_custkey^G → c_custkey^G → o_custkey^G → o_orderkey^G}(T)
– and c_custkey = o_custkey => O_{o_custkey^G → o_custkey^G → o_custkey^G → o_orderkey^G}(T)
– Identity #4 => O_{o_custkey^G → o_orderkey^G}(T)
– Identity #5 => O_{o_orderkey^G}(T)
– NLJ Rule => O_{o_orderkey^G}(U)

Performance Results
TPC-D (now TPC-H) Results
Database:
– Customer table: 150,000 rows
– Supplier table: 10,000 rows
– Order table: 1,500,000 rows
– LineItem table: 6,000,000 rows
PC: 1 GHz Pentium III, Linux, with 512 MB RAM, 120 GB HDD

Performance Results
Experiment #1: our example
Postgres plan:
group c_custkey, count(*)
  sort c_custkey
    merge-join c_nationkey = s_nationkey
      sort c_nationkey
        table scan customer
      sort s_nationkey
        table scan supplier
Postgres Plan | Refined | Ratio
… sec | 487.9 sec | 13.08
N.B.: The merge join result is HUGE (60 million rows)

Performance Results
Experiment #2: TPC-H Query 3
Plan operators (from the slide figure): sort rev, o_orderdate; group by o_orderkey; sort o_orderkey; nested-loops o_orderkey = l_orderkey; index scan lineitem; merge-join c_custkey = o_custkey; sort c_custkey; table scan customer; sort o_custkey; table scan order
Postgres Plan | Refined | Ratio
… sec | … sec | 0.05
Tuples with the same value of o_orderkey were consecutive, which increased the likelihood of finding joining tuples from lineitem in the cache.

Performance Results
Experiment #2: TPC-H Query 3, with table scan on lineitem
Plan operators (from the slide figure): sort rev, o_orderdate; group by o_orderkey; sort o_orderkey; nested-loops o_orderkey = l_orderkey; table scan lineitem; merge-join c_custkey = o_custkey; sort c_custkey; table scan customer; sort o_custkey; table scan order
Postgres Plan | Refined | Ratio
… sec | 113.3 sec | 1.07

Cost of additional optimization
How much do we pay for plan refinement?
We pay most when it actually pays off! (Queries Q1, Q5, Q10: no refinement)

Conclusion
– Formal approach to order optimization that integrates both orderings and groupings within the same comprehensive framework
– Also considered secondary orderings and groupings
– By inferring secondary orderings and groupings, it is possible to avoid unnecessary sorting or grouping over multiple attributes
– Use secondary orderings known of an operator's input to infer primary orderings of its output