Published by Alexia Peters. Modified over 9 years ago.
1
University of Konstanz, Advances in Database Query Processing, Sahak Maloyan
Avoiding Sorting and Grouping in Processing Queries
Sahak Maloyan
2
Outline
– Motivation
– Simple Example
– Order Properties
– Grouping followed by ordering
– Order Property Optimization
– Performance Results
– Conclusion
3
Motivation
Previous presentation: Fundamental Techniques for Order Optimization
– using FDs and selection predicates
– determining how orderings propagate from input to output
– inferences from orderings
Current presentation:
– aside from orderings, we also infer how relations are grouped (i.e., how the records of a relation are clustered according to the values of certain attributes)
– inferences from groupings
– inferences from secondary orderings
4
Motivation (cont.)
Inferred orderings
– make it possible to avoid sorting when preprocessing the ORDER BY clauses of SQL queries
Inferred groupings
– avoid sorting or hashing prior to computing aggregates for GROUP BY clauses
– reduce the cost of projection with duplicate elimination: projection and duplicate elimination can be completed in a single pass
– reduce the cost of evaluating selection queries of the form σ_{A=k}(R) in the absence of indexes or an ordering on A
Inference of secondary orderings and groupings
– avoid unnecessary sorting or grouping over multiple attributes
– infer new primary orderings or groupings (example follows)
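Two of the grouping benefits above (single-pass duplicate elimination and cheaper selection) can be illustrated with a minimal Python sketch. The helper names dedup_grouped and select_grouped are hypothetical, and both assume the input is already grouped (not necessarily sorted) on the relevant attribute: duplicate elimination then only has to compare each value with its predecessor, and σ_{A=k}(R) can stop reading R as soon as the group for k ends.

```python
from typing import Hashable, Iterable, Iterator, List, Tuple


def dedup_grouped(values: Iterable[Hashable]) -> Iterator[Hashable]:
    """One-pass duplicate elimination over input that is GROUPED on the value.

    Equal values are adjacent, so comparing with the previous value suffices;
    no sort and no hash table are needed.
    """
    first = True
    prev = None
    for v in values:
        if first or v != prev:
            yield v
        first = False
        prev = v


def select_grouped(rows: Iterable[Tuple], key_index: int, k) -> List[Tuple]:
    """sigma_{A=k}(R) over rows grouped on attribute A (at position key_index).

    Because all rows with A = k are contiguous, the scan can stop at the
    first non-matching row after the k-group has started.
    """
    out: List[Tuple] = []
    for row in rows:
        if row[key_index] == k:
            out.append(row)
        elif out:
            break  # the k-group has ended; the rest of R cannot match
    return out


# Grouped but unsorted input: 3s, then 1s, then 2.
print(list(dedup_grouped([3, 3, 1, 1, 1, 2])))
print(select_grouped([(3, "a"), (3, "b"), (1, "c"), (1, "d"), (2, "e")], 0, 1))
```

Neither helper needs an ordering on A; they only rely on equal values being adjacent.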
5
Simple Example
Benefits of inferring grouping and secondary ordering
TPC-H query:
SELECT c_custkey, COUNT(*)
FROM Customer, Supplier
WHERE c_nationkey = s_nationkey
GROUP BY c_custkey
How many suppliers could supply each customer directly, without having to go through customs?
6
Simple Example (cont.)
Postgres QEP of the query:
group c_custkey, count(*)
  sort c_custkey
    merge-join c_nationkey = s_nationkey
      sort c_nationkey
        table scan customer
      sort s_nationkey
        table scan supplier
The Postgres plan first sorts the join result on the grouping attribute c_custkey so that it can aggregate over groups in a single pass. But one-pass aggregation only requires the data to be grouped, not sorted!
The sort-merge join result is sorted (and hence grouped) on c_nationkey, and within each c_nationkey group the output tuples are themselves grouped on the key of the outer relation (c_custkey): c_nationkey^G → c_custkey^G ⇒ no sort is needed.
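To make the argument concrete, here is a small Python simulation. It is a sketch with hypothetical data and function names, not the Postgres executor: a merge join on nationkey that, for each outer (customer) tuple, emits all supplier tuples of the same nation. Its output comes out grouped, though not sorted, on c_custkey, so a one-pass aggregation over adjacent equal keys (itertools.groupby) computes the counts without the extra sort.

```python
from itertools import groupby
from typing import List, Tuple

Customer = Tuple[int, int]      # (c_custkey, c_nationkey)
Supplier = Tuple[int, int]      # (s_suppkey, s_nationkey)
Joined = Tuple[int, int, int]   # (c_custkey, nationkey, s_suppkey)


def merge_join(outer: List[Customer], inner: List[Supplier]) -> List[Joined]:
    """Merge equijoin on nationkey; both inputs must be sorted on nationkey.

    For each outer tuple, all inner tuples of the same nation are emitted
    together, so outputs of one customer are adjacent.
    """
    out: List[Joined] = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        nat_o, nat_i = outer[i][1], inner[j][1]
        if nat_o < nat_i:
            i += 1
        elif nat_o > nat_i:
            j += 1
        else:
            j_end = j
            while j_end < len(inner) and inner[j_end][1] == nat_o:
                j_end += 1  # extent of the inner group for this nation
            while i < len(outer) and outer[i][1] == nat_o:
                for supp in inner[j:j_end]:
                    out.append((outer[i][0], nat_o, supp[0]))
                i += 1
            j = j_end
    return out


def count_per_customer(joined: List[Joined]) -> List[Tuple[int, int]]:
    """One-pass COUNT(*) per c_custkey.

    groupby merges only ADJACENT equal keys, so this yields one row per
    customer only if the input is grouped on c_custkey.
    """
    return [(ck, sum(1 for _ in grp)) for ck, grp in groupby(joined, key=lambda t: t[0])]


# Hypothetical data, sorted on nationkey; c_custkey itself is NOT sorted.
customers = [(7, 1), (2, 1), (5, 2), (1, 3)]
suppliers = [(10, 1), (11, 1), (12, 3)]
joined = merge_join(customers, suppliers)
counts = count_per_customer(joined)
print(counts)
```

Customer 5 (nation 2) has no supplier in its nation and therefore produces no row, as in the SQL inner join.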
7
Order Properties
Order properties have the form $A_1^{\alpha_1} \rightarrow A_2^{\alpha_2} \rightarrow \dots \rightarrow A_n^{\alpha_n}$, where each $A_i$ is an attribute and each $\alpha_i$ specifies either an ordering ($\alpha_i = O$) or a grouping ($\alpha_i = G$). $A_1^{\alpha_1}$ is the primary ordering or grouping, and $A_2^{\alpha_2}$ is the secondary one.
Order properties are formalized with an algebra of constructors, following the signatures given below:
– the empty ordering
– the combination of orderings
– basic orderings: order or group
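One way to encode this algebra in Python, offered as an illustrative sketch only: the class OrderProperty, the constructors O and G, the constant EMPTY and the method then are hypothetical names standing for the basic orderings, the empty ordering and the combination constructor (→).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OrderProperty:
    """A_1^{a_1} -> ... -> A_n^{a_n}, stored as (attribute, 'O' | 'G') pairs."""
    items: Tuple[Tuple[str, str], ...] = ()

    def then(self, other: "OrderProperty") -> "OrderProperty":
        """Combination constructor (->): concatenation of two order properties."""
        return OrderProperty(self.items + other.items)

    def __str__(self) -> str:
        if not self.items:
            return "ε"
        return " → ".join(f"{attr}^{kind}" for attr, kind in self.items)


EMPTY = OrderProperty()  # the empty ordering


def O(attr: str) -> OrderProperty:
    """Basic ordering on attr."""
    return OrderProperty(((attr, "O"),))


def G(attr: str) -> OrderProperty:
    """Basic grouping on attr."""
    return OrderProperty(((attr, "G"),))


print(O("A").then(G("B")))
```

For example, O("A").then(G("B")) builds A^O → B^G, and EMPTY acts as a neutral element for then.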
8
Grouping followed by ordering
Suppose that R = (A, B) consists of 10 tuples, t1, …, t10, and that its physical representation satisfies the order property A^O → B^G. This situation is illustrated on the next slide.
9
Grouping followed by ordering (cont.)
[Figure: an illustration of A^O → B^G. The tuples with A = 1 (t1, t2, t3) precede those with A = 2 (t8, t9, t10), which precede those with A = 3 (t4, t5, t6, t7); within each A group, tuples with the same B value are adjacent.]
The primary ordering (A^O) says that the group of tuples with A = 1 precedes the group with A = 2, which precedes the group with A = 3.
The secondary grouping (B^G) says that within each group of tuples with like values of A, tuples are clustered together if they have the same value of B. For example, t1 can precede t2 or t2 can precede t1, but they must be adjacent.
Two example permutations that satisfy the order property:
t2, t1, t3, t10, t8, t9, t6, t7, t4, t5
t1, t2, t3, t9, t8, t10, t4, t5, t6, t7
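The meaning of A^O → B^G can be checked mechanically. The sketch below is a hypothetical checker, satisfies, for order properties given as (attribute, "O" | "G") pairs. The first attribute must form contiguous runs (in ascending order for O), and the remaining property must hold within each run. The A and B values assigned to t1, …, t10 are assumptions chosen to be consistent with the figure.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]
OP = Sequence[Tuple[str, str]]  # [(attribute, "O" | "G"), ...]


def satisfies(rows: Sequence[Row], op: OP) -> bool:
    """Check A_1^a1 -> ... -> A_n^an against a physical sequence of rows."""
    if not op:
        return True  # the empty order property holds of any sequence
    attr, kind = op[0]
    runs = [list(g) for _, g in groupby(rows, key=lambda r: r[attr])]
    values = [run[0][attr] for run in runs]  # one entry per run of equal values
    if len(set(values)) != len(values):
        return False  # some value is split over two runs: not grouped
    if kind == "O" and values != sorted(values):
        return False  # groups are not in ascending order
    # the rest of the property must hold within each group of the first attribute
    return all(satisfies(run, op[1:]) for run in runs)


# Hypothetical A/B values, consistent with the figure.
R: Dict[str, Row] = {
    "t1": {"A": 1, "B": 1}, "t2": {"A": 1, "B": 1}, "t3": {"A": 1, "B": 2},
    "t8": {"A": 2, "B": 1}, "t9": {"A": 2, "B": 2}, "t10": {"A": 2, "B": 3},
    "t4": {"A": 3, "B": 1}, "t5": {"A": 3, "B": 1},
    "t6": {"A": 3, "B": 2}, "t7": {"A": 3, "B": 2},
}
A_O_B_G = [("A", "O"), ("B", "G")]


def rows(names: str) -> List[Row]:
    return [R[n] for n in names.split()]


print(satisfies(rows("t2 t1 t3 t10 t8 t9 t6 t7 t4 t5"), A_O_B_G))
```

Separating t1 from t2 inside the A = 1 group, or moving an A = 2 tuple to the front, makes the check fail.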
10
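Computing with Order Properties (cont.)
The general properties have the form $A_1^{\alpha_1} \rightarrow A_2^{\alpha_2} \rightarrow \dots \rightarrow A_n^{\alpha_n}$.
Also, given $o_1 = A_1^{\alpha_1} \rightarrow \dots \rightarrow A_n^{\alpha_n}$ and $o_2 = B_1^{\beta_1} \rightarrow \dots \rightarrow B_m^{\beta_m}$, the shorthand $o_1 \rightarrow o_2$ (concatenation of order properties) denotes $A_1^{\alpha_1} \rightarrow \dots \rightarrow A_n^{\alpha_n} \rightarrow B_1^{\beta_1} \rightarrow \dots \rightarrow B_m^{\beta_m}$.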
11
Order Properties (cont.)
Identities
#1: For any order property that holds of a physical relation R, all prefixes of that order property also hold of R.
#2: An ordering on any attribute implies a grouping on that attribute.
#3: If X functionally determines B, and an order property includes all attributes in X (ordered or grouped) before B^α, then B^α is superfluous.
12
Order Properties (cont.)
Identities (cont.)
#4: A special case of identity #3, covering the case where X consists of a single attribute.
#5: The grouping of an attribute that is functionally determined by the attribute that follows it in the order property is superfluous.
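Identity #3 (with identity #4 as its single-attribute case) suggests a simple normalization: walk the order property from left to right and drop every attribute that is already determined, via the FDs, by the attributes kept before it. Below is a minimal Python sketch, assuming FDs are given as (frozenset X, B) pairs; closure and drop_superfluous are hypothetical names.

```python
from typing import FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], str]  # X -> B
OP = List[Tuple[str, str]]       # [(attribute, "O" | "G"), ...]


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs (standard fixpoint)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def drop_superfluous(op: OP, fds: List[FD]) -> OP:
    """Identity #3: drop B^alpha when the attributes kept before it determine B.

    A repeated attribute is the trivial case A -> A (an instance of identity #4).
    """
    kept: OP = []
    seen: Set[str] = set()
    for attr, kind in op:
        if attr in closure(seen, fds):
            continue  # B is constant within each group of the earlier attributes
        kept.append((attr, kind))
        seen.add(attr)
    return kept


fds = [(frozenset({"c_custkey"}), "c_name")]
print(drop_superfluous([("c_custkey", "G"), ("c_name", "O"), ("o_orderkey", "G")], fds))
```

Using the FD closure rather than single FDs also removes attributes that are determined only transitively.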
13
Order Property Inference
Using the algebra of order properties and their formal definitions, we can derive inference rules that state how order properties propagate through relational operators such as joins.
14
Order Property Optimization
Postgres Plan Operators Summarized
The data structures for all plan nodes in Postgres include the following fields:
– inp1, …, inpn: the fields contained in all input tuples to the node
– left: the left subtree of the node (set to Null for leaf nodes and Append)
– right: the right subtree of the node (set to Null for leaf nodes, unary operators and Append)
15
Order Property Optimization
Postgres Plan Operators Summarized (cont.)
Additional operator-specific fields provided by Postgres and used by our refinement algorithm
16
Order Property Optimization
Postgres Plan Operators Summarized (cont.)
– Group: performs two passes over its input:
  1. insert Null values between pairs of consecutive tuples that have different values for attributes att1, …, attk;
  2. for each set of tuples separated by Nulls, apply functions Fk+1, …, Fn to the collections of values of attributes attk+1, …, attn, respectively.
– Hash: builds a hash table over its input using a predetermined hash function over attribute att.
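A minimal Python sketch of the two passes of Group as summarized above, with None playing the role of the Null separators. The function name group_op and its parameters are hypothetical. The input is assumed to be grouped on its first k attributes, which is exactly the order property the refinement algorithm must guarantee before Group executes.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def group_op(rows: Sequence[Tuple], k: int,
             funcs: Sequence[Callable[[List[Any]], Any]]) -> List[Tuple]:
    """Two-pass Group over input grouped on att_1..att_k.

    funcs holds F_{k+1}..F_n, one aggregate per remaining attribute.
    """
    # Pass 1: insert a None separator wherever the grouping attributes change.
    marked: List[Optional[Tuple]] = []
    for i, row in enumerate(rows):
        if i > 0 and row[:k] != rows[i - 1][:k]:
            marked.append(None)
        marked.append(row)

    # Pass 2: apply F_{k+1}..F_n to each run of tuples between separators.
    result: List[Tuple] = []
    segment: List[Tuple] = []
    for item in marked + [None]:  # a trailing None flushes the last segment
        if item is None:
            if segment:
                cols = list(zip(*segment))  # column-wise view of the segment
                aggs = tuple(f(list(cols[k + j])) for j, f in enumerate(funcs))
                result.append(segment[0][:k] + aggs)
            segment = []
        else:
            segment.append(item)
    return result


# (c_custkey, s_suppkey) pairs grouped, not sorted, on c_custkey; count per group.
print(group_op([(7, 10), (7, 11), (2, 10), (1, 12)], 1, [len]))
```

If the input were not grouped, the same key could appear in two segments and produce two output rows, which is why Group's input requirement matters.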
17
Order Property Optimization
– HJoin: performs a (non-order-preserving) simple hash equijoin (att1 = att2) with the relation produced by left as the probe relation and the relation produced by right as the build relation.
– Merge: performs a merge equijoin (att1 = att2) with the relation produced by left as the outer relation and the relation produced by right as the inner relation.
– NOP: a dummy plan operator that has been added so that it can temporarily be made the root of a Postgres plan prior to its refinement.
18
Order Property Optimization
A Plan Refinement Algorithm
– Input: a query plan tree generated by Postgres
– Output: an equivalent plan tree with unnecessary Sort operators (used either to order or to group) removed
– Requires: 4 new attributes associated with every node in a query plan tree
19
Order Property Optimization
A Plan Refinement Algorithm (cont.): New Attributes
For each node n:
– keys: a set of attribute sets that are guaranteed to be keys of the inputs to n
– fds: a set of functional dependencies (attribute sets → attribute) that are guaranteed to hold of the inputs to n
– req: a single order property that is required to hold of the inputs to n, either for n itself or for some ancestor node of n to execute
– sat: a set of order properties that are guaranteed to be satisfied by the outputs of n
20
Order Property Optimization
A Plan Refinement Algorithm (cont.)
Idea:
– decorate the input plan with these new attributes
– remove any Sort operator whose child node produces a result that is guaranteed to satisfy the order property required by its parent node
This is accomplished in 3 passes over the input plan.
21
Order Property Optimization
A Plan Refinement Algorithm (cont.)
Refinement of the query plan:
NOP
  group c_custkey, count(*)
    sort c_custkey
      merge-join c_nationkey = s_nationkey
        sort c_nationkey
          table scan customer
        sort s_nationkey
          table scan supplier
22
Order Property Optimization
A Plan Refinement Algorithm (cont.)
Resulting query plan with the Sort removed:
group c_custkey, count(*)
  merge-join c_nationkey = s_nationkey
    sort c_nationkey
      table scan customer
    sort s_nationkey
      table scan supplier
23
Order Property Optimization
A Plan Refinement Algorithm (cont.)
Pass 1: Functional Dependencies and Keys
– A bottom-up pass: FDs and keys are propagated upwards whenever they are inferred to hold of intermediate query results
Pass 2: Required Order Properties
– A top-down pass that propagates required order properties (req) downwards from the root of the tree
– Pseudocode for this pass is given in SetReq (next slide)
– New required order properties are generated by:
  – NOP, if its child is a Sort, i.e., the original query includes ORDER BY
  – Group and Unique (whose inputs need to be grouped)
  – join operators (which propagate one order property from above into two below)
– All other nodes pass the required order properties they inherit from their parent nodes to their child nodes, except for Hash and Append, which propagate the empty order property to their child nodes
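A simplified Python sketch of the top-down SetReq pass. The Node class, its attrs field and the operator names are assumptions for illustration. The join case is simplified: a Merge node here just requires each input to be ordered on its join attribute, rather than splitting an inherited requirement into two.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

OP = Tuple[Tuple[str, str], ...]  # order property: ((attribute, "O" | "G"), ...)


@dataclass
class Node:
    op: str                                         # "NOP", "Sort", "Group", "Unique", "Merge", "Hash", "Append", "Scan"
    attrs: List[str] = field(default_factory=list)  # hypothetical: sort keys, grouping attrs, or [att1, att2]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    req: OP = ()


def set_req(n: Optional[Node], inherited: OP = ()) -> None:
    """Top-down pass: record the requirement passed down to n, then recurse."""
    if n is None:
        return
    n.req = inherited
    if n.op == "NOP":
        # ORDER BY present: the root's child is a Sort whose order is required
        child = n.left
        if child is not None and child.op == "Sort":
            set_req(child, tuple((a, "O") for a in child.attrs))
        else:
            set_req(child, ())
    elif n.op in ("Group", "Unique"):
        # inputs must be grouped on the grouping attributes
        set_req(n.left, tuple((a, "G") for a in n.attrs))
    elif n.op == "Merge":
        # simplification: each input must be ordered on its join attribute
        set_req(n.left, ((n.attrs[0], "O"),))
        set_req(n.right, ((n.attrs[1], "O"),))
    elif n.op in ("Hash", "Append"):
        set_req(n.left, ())
        set_req(n.right, ())
    else:
        # all other nodes pass the inherited requirement through unchanged
        set_req(n.left, inherited)
        set_req(n.right, inherited)


# The Simple Example plan, rooted at NOP
scan_c, scan_s = Node("Scan"), Node("Scan")
sort_cn = Node("Sort", ["c_nationkey"], scan_c)
sort_sn = Node("Sort", ["s_nationkey"], scan_s)
merge = Node("Merge", ["c_nationkey", "s_nationkey"], sort_cn, sort_sn)
sort_ck = Node("Sort", ["c_custkey"], merge)
group = Node("Group", ["c_custkey"], sort_ck)
nop = Node("NOP", [], group)
set_req(nop)
print(sort_ck.req, sort_cn.req, sort_sn.req)
```

On this plan the Sort on c_custkey inherits c_custkey^G from Group, while the two Sorts below the merge join receive their join attributes' orderings.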
24
Order Property Optimization
25
Order Property Optimization
A Plan Refinement Algorithm (cont.)
Pass 3: Sort Elimination
– A bottom-up pass of the query plan tree that determines which order properties are guaranteed to be satisfied by the outputs of each node (sat), and that concurrently removes any Sort operator n for which n.left.sat ⇒ n.req
– Algorithm: InferSat (next slides)
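A simplified Python sketch of Pass 3, written in the spirit of InferSat but not as its actual pseudocode. Assumptions: only Scan, Sort, Merge, Group and NOP nodes; a merge join that, for each outer tuple, emits its matches in inner order, so its output satisfies att1^O → (outer key)^G → (inner property); and an implication test built only from identities #1, #2 and #5. All names, including the key field, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set, Tuple

OP = Tuple[Tuple[str, str], ...]  # ((attribute, "O" | "G"), ...)
FD = Tuple[FrozenSet[str], str]   # X -> B


@dataclass
class Node:
    op: str                                         # "Scan", "Sort", "Merge", "Group", "NOP"
    attrs: List[str] = field(default_factory=list)  # sort keys, or [att1, att2] for Merge
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    key: Tuple[str, ...] = ()                       # hypothetical: a key of this node's output
    req: OP = ()                                    # assumed filled in by the top-down pass
    sat: Set[OP] = field(default_factory=set)


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def prefix_implies(p: OP, q: OP) -> bool:
    """Identities #1 (prefixes hold) and #2 (an ordering implies a grouping)."""
    return len(q) <= len(p) and all(
        pa == qa and (pk == qk or pk == "O") for (pa, pk), (qa, qk) in zip(p, q))


def weaken(p: OP, fds: List[FD]) -> OP:
    """Turn every O into G (#2), then drop A^G when the next attribute determines A (#5)."""
    out = [(a, "G") for a, _ in p]
    i = 0
    while i + 1 < len(out):
        if out[i][0] in closure({out[i + 1][0]}, fds):
            del out[i]
            i = max(i - 1, 0)
        else:
            i += 1
    return tuple(out)


def implies(p: OP, q: OP, fds: List[FD]) -> bool:
    return prefix_implies(p, q) or prefix_implies(weaken(p, fds), q)


def infer_sat(n: Optional[Node], fds: List[FD]) -> Optional[Node]:
    """Bottom-up: compute n.sat and splice out any Sort whose input already satisfies n.req."""
    if n is None:
        return None
    n.left = infer_sat(n.left, fds)
    n.right = infer_sat(n.right, fds)
    if n.op == "Sort":
        child = n.left
        if child is not None and any(implies(s, n.req, fds) for s in child.sat):
            return child  # redundant Sort: replace it by its child
        n.sat = {tuple((a, "O") for a in n.attrs)}
        n.key = child.key if child is not None else ()
    elif n.op == "Merge" and n.left is not None and n.right is not None:
        outer_key = tuple((a, "G") for a in n.left.key)
        n.sat = {((n.attrs[0], "O"),) + outer_key + s for s in n.right.sat}
    else:
        n.sat = {()}  # only the empty order property is guaranteed
    return n


# Simple Example plan; c_custkey is a key of customer, so c_custkey -> c_nationkey
fds: List[FD] = [(frozenset({"c_custkey"}), "c_nationkey")]
scan_c = Node("Scan", key=("c_custkey",))
scan_s = Node("Scan", key=("s_suppkey",))
sort_cn = Node("Sort", ["c_nationkey"], scan_c, req=(("c_nationkey", "O"),))
sort_sn = Node("Sort", ["s_nationkey"], scan_s, req=(("s_nationkey", "O"),))
merge = Node("Merge", ["c_nationkey", "s_nationkey"], sort_cn, sort_sn)
sort_ck = Node("Sort", ["c_custkey"], merge, req=(("c_custkey", "G"),))
group = Node("Group", ["c_custkey"], sort_ck)
nop = Node("NOP", [], group)
infer_sat(nop, fds)
print(group.left is merge)
```

On the Simple Example plan the Sort on c_custkey is spliced out, because c_nationkey^O → c_custkey^G weakens to c_custkey^G, while the two Sorts below the merge join are kept.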
26
Order Property Optimization
A Plan Refinement Algorithm (cont.)
InferSat
27
Order Property Optimization
A Plan Refinement Algorithm (cont.)
InferSat (cont.)
28
Example: TPC-D (now TPC-H) Query 3
TPC-D Query 3:
select l.orderkey, sum(l.extendedprice * (1 - l.discount)) as rev, o.orderdate, o.shippriority
from customer, order, lineitem
where o.orderkey = l.orderkey
  and c.custkey = o.custkey
  and c.mktsegment = 'building'
  and o.orderdate < date('1998-11-30')
  and l.shipdate > date('1998-11-30')
group by l.orderkey, o.orderdate, o.shippriority
order by rev desc, o.orderdate
29
Example: TPC-D (now TPC-H) Query 3
Previous presentation:
– the optimized plan outperformed the original plan by a factor of 2
Now:
– further improvements due to reasoning about groupings and secondary orderings
30
Example: TPC-D (now TPC-H) Query 3
Plan (R, S, T and U name intermediate results):
sort rev, o_orderdate
  group by o_orderkey
    sort o_orderkey
      nested-loops o_orderkey = l_orderkey (output U)
        merge-join c_custkey = o_custkey (output T)
          sort c_custkey (output R)
            table scan customer
          sort o_custkey (output S)
            table scan order
        index scan lineitem
Derivation, bottom-up:
R: c_custkey^O ⇒ c_custkey^G
S: o_custkey^O ⇒ o_custkey^G ⇒ (Identity #5) o_custkey^G → o_orderkey^G
T: (MJ rule) c_custkey^G → c_custkey^G → o_custkey^G → o_orderkey^G
   and c_custkey = o_custkey ⇒ o_custkey^G → o_custkey^G → o_custkey^G → o_orderkey^G
   ⇒ (Identity #4) o_custkey^G → o_orderkey^G
   ⇒ (Identity #5) o_orderkey^G
U: (NLJ rule) ⇒ o_orderkey^G
The input to "group by o_orderkey" is therefore already grouped on o_orderkey, so "sort o_orderkey" is unnecessary.
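The chain of rewrites for T can be replayed mechanically. This Python sketch uses hypothetical helper names; FDs are restricted to single attributes, given as pairs (B, A) meaning B → A, and identity #5 is applied in the direction A^G → B^G ⇒ B^G, which is the direction the derivation needs.

```python
from typing import List, Set, Tuple

OP = List[Tuple[str, str]]  # [(attribute, "O" | "G"), ...]


def substitute(op: OP, old: str, new: str) -> OP:
    """Join predicate old = new: both columns hold equal values in the output."""
    return [(new if a == old else a, k) for a, k in op]


def identity4(op: OP) -> OP:
    """Drop an attribute that already appears earlier (X = {A} determines A)."""
    seen: Set[str] = set()
    out: OP = []
    for a, k in op:
        if a not in seen:
            out.append((a, k))
            seen.add(a)
    return out


def identity5(op: OP, fds: Set[Tuple[str, str]]) -> OP:
    """Drop A^G when it is immediately followed by B^G and B -> A."""
    out = list(op)
    i = 0
    while i + 1 < len(out):
        (a, ka), (b, kb) = out[i], out[i + 1]
        if ka == kb == "G" and (b, a) in fds:
            del out[i]
            i = max(i - 1, 0)  # re-check the pair that now starts at i
        else:
            i += 1
    return out


# Output of the MJ rule for T, as on the slide
T = [("c_custkey", "G"), ("c_custkey", "G"), ("o_custkey", "G"), ("o_orderkey", "G")]
fds = {("o_orderkey", "o_custkey")}  # o_orderkey is the key of order
step1 = substitute(T, "c_custkey", "o_custkey")
step2 = identity4(step1)
step3 = identity5(step2, fds)
print(step3)
```

The final step leaves only o_orderkey^G, matching the property the slide derives for T.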
31
Performance Results
TPC-D (now TPC-H) results
Database:
– Customer table: 150,000 rows
– Supplier table: 10,000 rows
– Order table: 1,500,000 rows
– LineItem table: 6,000,000 rows
PC: 1 GHz Pentium III, Linux, 512 MB RAM, 120 GB HDD
32
Performance Results
Experiment #1: our example (the Simple Example query and its Postgres plan)
Postgres Plan | Refined | Ratio
6384.9 sec | 487.9 sec | 13.08
N.B.: The merge join result is HUGE (60 million rows).
33
Performance Results
Experiment #2: TPC-H Query 3 (the Query 3 plan, with an index scan on lineitem)
Postgres Plan | Refined | Ratio
126.8 sec | 2729.9 sec | 0.05
Here the refined plan is slower: in the original plan, the sort on o_orderkey made tuples with adjacent o_orderkey values consecutive, which increased the likelihood of finding the joining tuples from lineitem in the cache.
34
Performance Results
Experiment #2: TPC-H Query 3 with a table scan on lineitem (the same plan, but lineitem is read with a table scan instead of an index scan)
Postgres Plan | Refined | Ratio
121.4 sec | 113.3 sec | 1.07
35
Cost of additional optimization
How much do we pay for plan refinement? We pay the most when it actually pays off! (Queries Q1, Q5, Q10: no refinement.)
36
Conclusion
– A formal approach to order optimization that integrates both orderings and groupings within the same comprehensive framework
– Also considers secondary orderings and groupings
– By inferring secondary orderings and groupings, it is possible to avoid unnecessary sorting or grouping over multiple attributes
– Secondary orderings known of an operator's input can be used to infer primary orderings of its output