Reducing Order Enforcement Cost in Complex Query Plans Ravindra Guravannavar and S. Sudarshan (To appear in ICDE 2007)



2 Background
- Sort-based query processing algorithms
  - Sort-merge join (also union/intersection)
  - Sort-based grouping and duplicate elimination
  - Explicit ORDER BY
- Notion of interesting sort orders (System R)
  - Find and remember the best plan for each sort order that may be useful
- Optimization goal in Volcano: (expr, sort-order)

3 The Problem
- Interesting orders can be too many!
  - Factorial in the number of attributes involved
- Plan cost can vary substantially with the choice of interesting order
  - Clustering and covering indices
  - Other operators in the input sub-expressions
  - Possibility of partial sorting
[Figure: Group By on $\{a_2, a_4, a_5, \ldots\}$ over a join of R and S on $R.a_1 = S.a_1$ and $R.a_2 = S.a_2$ … $R.a_n = S.a_n$]

4 Motivation
- Joins in data integration and decision support involve a large number of attributes
- Increasing use of covering indices
  - Several alternative sort orders
  - Partial sorting
- Query patterns
  - Attributes common to multiple operators
- Known techniques
  - Work only for unary operators like group-by

5 Outline of the Talk
- Partial sorting
  - Changes to external sort
  - Optimizer changes to handle partial sort orders
- Interesting orders for a join tree: a special case
  - Problem is NP-hard
  - A 2-approximation for the special case
- The general problem
  - Notion of favorable orders
  - Plan generation using favorable orders
  - Post-optimization phase
- Experimental results

6 Exploiting Partial Sort Orders
- Sort on $(a_1, a_2)$ given $(a_1)$
- Standard external sort
  - Cost is independent of the input sort order
- Replacement selection
  - Produces a single run but incurs I/O
- Both methods break the pipeline: the first output tuple appears only after all input has been read
[Figure: join of R and S on $R.a_1 = S.a_1$ and $R.a_2 = S.a_2$; C. Index on $(R.a_1)$; order labels $(a_1)$, $(a_1, a_2)$, $()$, $(a_1, a_2)$]

7 A Minor Change to External Sorting
- Multiple partial sort segments
  - Hold only one segment at any given time
  - When a new segment starts, sort the current segment and output it
- No run generation I/O if each segment fits in memory
- Early output (good for Top-K)
- Reduced comparisons: $O(n \log(n/k))$ vs. $O(n \log n)$, where $k$ = number of segments
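The segment-at-a-time idea on this slide can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not the authors' implementation: the name `partial_sort` and the use of column indices are hypothetical, and every segment is assumed to fit in memory.

```python
from itertools import groupby
from operator import itemgetter


def partial_sort(rows, prefix_cols, suffix_cols):
    """Yield rows sorted on prefix_cols followed by suffix_cols.

    Assumes `rows` already arrives sorted on prefix_cols, so each run of
    equal prefix values forms one partial sort segment.
    """
    prefix_key = itemgetter(*prefix_cols)
    suffix_key = itemgetter(*suffix_cols)
    for _, segment in groupby(rows, key=prefix_key):
        # Only the current segment is materialized; it is sorted and
        # emitted before the rest of the input is read (early output).
        yield from sorted(segment, key=suffix_key)


# Input sorted on column 0 only; the required order is (column 0, column 1).
rows = [(1, 9), (1, 3), (2, 7), (2, 2), (2, 5)]
print(list(partial_sort(rows, [0], [1])))
```

Because the function is a generator, the first output tuple becomes available once the first segment has been read, which is the property the slide highlights for Top-K queries.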

8 Optimizer Changes to Handle Partial Sort Orders
Cost model for partial sort:
- Let the input order be $o_1$ and the required (output) order be $o_2$
- Let $o_s$ = longest common prefix of $o_1$ and $o_2$
- Let $o_r = o_2 - o_s$ (i.e., $o_s + o_r = o_2$)
- $A(o)$ = attribute set of order $o$
- $\epsilon$ = empty (no) sort order
- $coe(e, o_1, o_2) = D(e, A(o_s)) \times coe(e', \epsilon, o_r)$, where $e' = \sigma_p(e)$ and $p$ equates $A(o_s)$ to a constant

9 Optimizer Changes to Handle Partial Sort Orders
Cost model for partial sort:
- $coe(e, o_1, o_2) = D(e, A(o_s)) \times coe(e', \epsilon, o_r)$, where $e' = \sigma_p(e)$ and $p$ equates $A(o_s)$ to a constant
- Example: $o_1 = (a, b)$ and $o_2 = (a, c)$ give $o_s = (a)$, $o_r = (c)$, and $e' = \sigma_{a=k}(e)$
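A small Python sketch may help make the cost formula concrete. It rests on assumptions that go beyond the slides: $D(e, A(o_s))$ is read as the number of distinct values of the prefix attributes, values are assumed to be spread evenly across segments, and `full_sort_cost` is a hypothetical stand-in (an $n \log_2 n$ comparison count) for $coe(e, \epsilon, o)$. All function names are illustrative.

```python
import math


def longest_common_prefix(o1, o2):
    """Return o_s, the longest common prefix of two sort orders (tuples)."""
    prefix = []
    for a, b in zip(o1, o2):
        if a != b:
            break
        prefix.append(a)
    return tuple(prefix)


def full_sort_cost(num_rows):
    """Hypothetical stand-in for coe(e, eps, o): n * log2(n) comparisons."""
    return num_rows * math.log2(num_rows) if num_rows > 1 else 0.0


def partial_sort_cost(num_rows, distinct_prefix_values, o1, o2):
    """coe(e, o1, o2) = D(e, A(o_s)) * coe(e', eps, o_r)."""
    o_s = longest_common_prefix(o1, o2)
    o_r = o2[len(o_s):]
    if not o_r:
        return 0.0  # the input order already satisfies the required order
    if not o_s:
        return full_sort_cost(num_rows)  # no common prefix to exploit
    # e' is one segment: the rows of e with A(o_s) fixed to a constant.
    # Assumption: rows are distributed uniformly over the segments.
    segment_rows = num_rows / distinct_prefix_values
    return distinct_prefix_values * full_sort_cost(segment_rows)


# The slide's example orders: o1 = (a, b), o2 = (a, c).
print(longest_common_prefix(("a", "b"), ("a", "c")))
print(partial_sort_cost(10_000_000, 10_000, ("a", "b"), ("a", "c")))
print(full_sort_cost(10_000_000))
```

Under these assumptions the partial sort performs $D$ small sorts instead of one large one, so its estimated cost is strictly lower than the full sort whenever a non-empty common prefix exists.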

10 Flexible Order Requirements
- Most operators have interest in any order on the attributes involved
  - Merge-Join, Merge-Union, Group By, Duplicate Elimination
- Binary operators demand the same order from both inputs
[Figure: operator tree labeled G {a1, a2}, {a1,a2,a3,a4}, {a4,a7}, {a3,a5,a6}]

11 Finding Optimal is NP-Hard
- A special case:
  - All relations/intermediate results are of the same size
  - All attribute cardinalities are the same
- We try to maximize the length of common prefixes: maximize $LCP(p_i, p_j)$
- Reduction from the graph layout problem SUM-CUT
- Optimal algorithm for paths and a 2-approximation for binary trees

12 A 2-Approximation Algorithm
- Optimal algorithm for paths $s_1, s_2, s_3, \ldots, s_{n-1}, s_n$:
  $OPT(i, j) = \max_{i \le k < j} \{ OPT(i, k) + OPT(k+1, j) + c(i, j) \}$
- 2-approximation for binary trees
  - $OPT \le OPT\text{-}EVEN + OPT\text{-}ODD$
  - Take the one with the higher benefit
[Figure panels: Even levels, Odd levels]
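The path recurrence can be sketched as a memoized dynamic program. The slide does not define $c(i, j)$; the sketch below assumes it is the number of attributes common to every set $s_i, \ldots, s_j$, and it assumes the benefit being maximized is the total common-prefix length between adjacent nodes on the path. The function name is hypothetical, and indices are 0-based rather than 1-based as on the slide.

```python
from functools import lru_cache


def best_path_benefit(attr_sets):
    """Maximum total common-prefix length between adjacent nodes of a path.

    Assumption: c(i, j) = number of attributes shared by all of s_i..s_j.
    """
    sets = [frozenset(s) for s in attr_sets]
    if not sets:
        return 0

    def c(i, j):
        return len(frozenset.intersection(*sets[i:j + 1]))

    @lru_cache(maxsize=None)
    def opt(i, j):
        if i == j:
            return 0  # a single node has no neighbor to share a prefix with
        # OPT(i, j) = max over i <= k < j of OPT(i, k) + OPT(k+1, j) + c(i, j)
        return max(opt(i, k) + opt(k + 1, j) for k in range(i, j)) + c(i, j)

    return opt(0, len(sets) - 1)


# Three nodes on a path: (a,b), (a,b,c), (a,c) share prefixes of length 2 and 1.
print(best_path_benefit([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]))  # 3
```

For the three-node example, choosing the orders (a,b), (a,b,c), (a,c) gives common prefixes of length 2 and 1, which matches the value of 3 returned by the recurrence.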

13 General Case
- Logical plan space for inputs not expanded (i.e., join order not fixed)
- Varying sizes of relations and intermediate results
- Not all orders on base relations have the same cost (due to clustering and covering indices)

14 Overview of the Approach
- Identify a small set of favorable orders
  - Orders that are relatively inexpensive
  - Should not require expanding the input plan space
- Plan generation (Phase-1)
  - Deduce the interesting orders from the favorable orders
  - Try each interesting order, retain the best
- Plan refinement (Phase-2)
  - Use the 2-approximation algorithm to refine the sort orders further

15 Favorable Orders
- Benefit of an order: $benefit(o, e) = cbp(e, \epsilon) + coe(e, \epsilon, o) - cbp(e, o)$
- Positive benefit: the order can be obtained at a cost less than the full sort of the unordered result (e.g., the clustering order)
- Favorable orders: $ford(e) = \{ o : benefit(o, e) > 0 \}$
  - Can be a huge set
  - E.g., every order having the clustering order as its prefix is a favorable order
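The benefit test can be illustrated with a toy cost table. All numbers below are hypothetical and chosen only to show the arithmetic: a relation clustered on (a1, a2) is assumed to cost 100 units to scan in either unordered or clustering order, and a full sort is assumed to cost 500 units.

```python
def benefit(order, cbp, coe_from_unordered):
    """benefit(o, e) = cbp(e, eps) + coe(e, eps, o) - cbp(e, o)."""
    return cbp[()] + coe_from_unordered[order] - cbp[order]


# Hypothetical costs (not from the slides) for a relation clustered on (a1, a2).
SCAN, SORT = 100, 500
cbp = {
    (): SCAN,                  # best plan with no order requirement
    ("a1", "a2"): SCAN,        # clustering order comes for free with the scan
    ("a3",): SCAN + SORT,      # any other order needs a full sort
}
coe_from_unordered = {("a1", "a2"): SORT, ("a3",): SORT}

favorable = [o for o in coe_from_unordered
             if benefit(o, cbp, coe_from_unordered) > 0]
print(favorable)
```

With these numbers the clustering order has a benefit of 500 and is favorable, while (a3) has a benefit of 0 because producing it costs exactly as much as scanning and sorting.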

16 Minimal Favorable Orders
A favorable order $o$ that satisfies (where $o' < o$ means $o'$ is a proper prefix of $o$):
1. $\nexists\, o' < o$ s.t. $cbp(e, o') + coe(e, o', o) = cbp(e, o)$
2. $\nexists\, o'$ s.t. $o < o'$ and $cbp(e, o') = cbp(e, o)$
- E.g., relation R with a clustering index on $(a_1, a_2)$
  - $(a_1, a_2)$ is a minimal favorable order
  - $(a_1)$ and $(a_1, a_2, a_3)$ are not
- $ford\text{-}min(e)$: set of all minimal favorable orders for expression $e$
- For base relations, the size of ford-min is limited to the number of covering indices

17 Computing Favorable Orders: Issues
- Defined in terms of the cost of the best plan
  - Need them before optimizing input sub-expressions
- Even ford-min can get prohibitively large for join and group-by expressions
  - ford-min contains every permutation of the join attributes
[Figure labels: R, S, J1, J2]

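18 Heuristics for Computing ford-min
- $e = R$: $\{ o : o \text{ is a clustering or covering index order} \}$
- $e = \sigma_p(e_1)$: $\{ o : o \in ford\text{-}min(e_1) \}$
- $e = \Pi_L(e_1)$: $\{ o : o' \in ford\text{-}min(e_1) \text{ and } o = o' \wedge L \}$
  - E.g., for $\Pi_{a,b}(e_1)$ with $ford\text{-}min(e_1) = \{(a, c, b)\}$, $ford\text{-}min(e) = \{(a)\}$
- $e = e_1 \bowtie e_2$: let $T = ford\text{-}min(e_1) \cup ford\text{-}min(e_2)$; the result is $T \cup \{ o : o' \in T \text{ and } o = (o' \wedge S) + permute(S - A(o' \wedge S)) \}$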

19 Heuristics for Computing ford-min
Example: S = {a,b,c,d}; the inputs have ford-min = {(a,b,e), (b)} and ford-min = {(a)}, so T = {(a,b,e), (b), (a)}

Input F.Order (o)   o ^ {a,b,c,d}   Extended Order
(a,b,e)             (a,b)           (a,b,c,d)
(b)                 (b)             (b,a,c,d)
(a)                 (a)
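The join heuristic can be sketched as follows, assuming `o ^ S` denotes the longest prefix of order o whose attributes all belong to S and that S is the set of join attributes. The slides allow any permutation of the remaining attributes; this sketch appends them in sorted order purely for reproducibility. The function names are hypothetical.

```python
def prefix_within(order, attrs):
    """o ^ S: longest prefix of `order` whose attributes all lie in `attrs`."""
    prefix = []
    for a in order:
        if a not in attrs:
            break
        prefix.append(a)
    return tuple(prefix)


def extend_order(order, join_attrs):
    """Return (o ^ S) followed by the attributes of S not yet covered.

    The remainder is appended in sorted order, which is one arbitrary
    choice among the permutations the heuristic permits.
    """
    prefix = prefix_within(order, join_attrs)
    rest = sorted(set(join_attrs) - set(prefix))
    return prefix + tuple(rest)


S = {"a", "b", "c", "d"}
T = [("a", "b", "e"), ("b",), ("a",)]
for o in T:
    print(o, prefix_within(o, S), extend_order(o, S))
```

For the first two input orders this produces (a,b,c,d) and (b,a,c,d), matching the extended orders listed on the slide.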

20 Plan Generation (Phase-1)
- Form the set I of interesting orders to try
  - Collect the input favorable orders and the required output order
  - Take the LCP with the set of join attributes
  - Extend the orders (arbitrarily) to include the remaining attributes
- For each order o in I, generate optimization sub-goals for the input sub-expressions

21 Plan Refinement (Phase-2)
- Identify the suffix that can be freely reordered
- Use the 2-approximation algorithm to reorder the suffix
[Figure: join tree over R1, R2, R3, R4, each with order (a); node labels (a,b,c,h), (a,d,h), (a,e,h), {a,d,h}, {a,e,h}, (a,h,e), (a,h,b,c), (a,h,d)]

22 Experiments
1. Benefits of exploiting partial sort orders
2. Evaluate the plans produced by our optimizer extensions

Systems compared   PostgreSQL 8.1.3, SQL Server 2005, DB2 8.2, PYRO
Test machine       Intel P4 (HT) PC, 512 MB
Dataset            TPC-H 1 GB and synthetic
Queries            Synthetic and from a real application

23 Experiment 1
  SELECT suppkey, partkey
  FROM lineitem
  ORDER BY suppkey, partkey;
Sort orders: (suppkey), (suppkey, partkey)

24 Experiment 2
- R(c1, c2, c3), 10 M records, card(c1) = 10,000
- Sort orders: (c1), (c1, c2)

25 Experiment 3

26 Experiment 4
Parts running out of stock:
  SELECT ps_suppkey, ps_partkey, ps_availqty, sum(l_quantity) AS total_required
  FROM partsupp, lineitem
  WHERE ps_suppkey = l_suppkey AND ps_partkey = l_partkey AND l_linestatus = 'O'
  GROUP BY ps_partkey, ps_suppkey, ps_availqty
  HAVING sum(l_quantity) > ps_availqty
  ORDER BY ps_partkey;

27 Experiment 4 - Plans
[Figures: Merge-Join Plan on SYS1 and SYS2; Plan Generated by PYRO-O]

28 Experiment 4 & 5 - Timings

29 Experiments with Variants of PYRO
- PYRO: baseline PYRO
- PYRO-O-: no partial sort
- PYRO-P: Postgres heuristic
- PYRO-O: our approach
- PYRO-E: exhaustive

30 Optimization Overheads

31 Questions?