RankSQL: Query Algebra and Optimization for Relational Top-k Queries

Slides:



Advertisements
Similar presentations
Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
Advertisements

Choosing an Order for Joins
Optimizing Join Enumeration in Transformation-based Query Optimizers ANIL SHANBHAG, S. SUDARSHAN IIT BOMBAY VLDB 2014
Supporting top-k join queries in relational databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by Rebecca M. Atchley Thursday, April.
RankSQL: Supporting Ranking Queries in RDBMS Chengkai Li (UIUC) Mohamed A. Soliman (Univ. of Waterloo) Kevin Chen-Chuan Chang (UIUC) Ihab F. Ilyas (Univ.
1 RankSQL: Query Algebra and Optimization for Relational Top-k Queries Chengkai Li (UIUC) joint work with Kevin Chen-Chuan Chang (UIUC) Ihab F. Ilyas (U.
Effective Keyword Based Selection of Relational Databases Bei Yu, Guoliang Li, Karen Sollins, Anthony K.H Tung.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
The Volcano/Cascades Query Optimization Framework
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Efficient Query Evaluation on Probabilistic Databases
SUPPORTING TOP-K QUERIES IN RELATIONAL DATABASES. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, MARCH 2004 Sowmya Muniraju.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
Query Rewrite: Predicate Pushdown (through grouping) Select bid, Max(age) From Reserves R, Sailors S Where R.sid=S.sid GroupBy bid Having Max(age) > 40.
Cs44321 CS4432: Database Systems II Query Optimizer – Cost Based Optimization.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
Query Processing Presented by Aung S. Win.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
Access Path Selection in a Relational Database Management System Selinger et al.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.
Advanced Databases: Lecture 8 Query Optimization (III) 1 Query Optimization Advanced Databases By Dr. Akhtar Ali.
Database Management 9. course. Execution of queries.
Michael Cafarella Alon HalevyNodira Khoussainova University of Washington Google, incUniversity of Washington Data Integration for Relational Web.
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
The Volcano Query Optimization Framework S. Sudarshan (based on description in Prasan Roy’s thesis Chapter 2)
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Richa Varshney.
1University of Texas at Arlington.  Introduction  Motivation  Requirements  Paper’s Contribution.  Related Work  Overview of Ripple Join  Rank.
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
1 Choosing an Order for Joins. 2 What is the best way to join n relations? SELECT … FROM A, B, C, D WHERE A.x = B.y AND C.z = D.z Hash-Join Sort-JoinIndex-Join.
Cost Estimation For each plan considered, must estimate cost: –Must estimate cost of each operation in plan tree. Depends on input cardinalities. –Must.
Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.
Chapter 13: Query Processing
1 Chengkai Li Kevin-Chen-Chuan Chang Ihab Ilyas Sumin Song Presented by: Mariam John CSE /20/2006 RankSQL: Query Algebra and Optimization for Relational.
CPS216: Advanced Database Systems Notes 02:Query Processing (Overview) Shivnath Babu.
Chiu Luk CS257 Database Systems Principles Spring 2009
Chapter 14: Query Optimization
Ritu CHaturvedi Some figures are adapted from T. COnnolly
Query Optimization Heuristic Optimization
Database System Implementation CSE 507
UNIT 11 Query Optimization
Database Management System
Efficient Join Query Evaluation in a Parallel Database System
Seung-won Hwang, Kevin Chen-Chuan Chang
Supporting Ad-Hoc Ranking Aggregates
A paper on Join Synopses for Approximate Query Answering
RankSQL: Query Algebra and Optimization for Relational Top-k Queries
Prepared by : Ankit Patel (226)
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Chapter 12: Query Processing
Overview of Query Optimization
Lecture 26: Query Optimization
Query Optimization Overview
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Relational Query Optimization
DATABASE HISTOGRAMS E0 261 Jayant Haritsa
Probabilistic Databases
Relational Query Optimization (this time we really mean it)
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Completing the Physical-Query-Plan and Chapter 16 Summary ( )
Relational Query Optimization
Relational Query Optimization
Presentation transcript:

RankSQL: Query Algebra and Optimization for Relational Top-k Queries AUTHORS: Chengkai Li Kevin Chen-Chuan Chang Ihab F. Ilyas Sumin Song Presenter: Roman Yarovoy October 3, 2007

Before RankSQL Ranking (top-k) queries: Query result is sorted by rank and limited to top k results. Support for ranking was lacking from RDBMS. Previously, isolated cases of top-k query processing were studied. No way to integrate top-k operations with other relational operations.

Previous (traditional) approach Query processing without ranking support: Evaluate select-project-join (SPJ) query and materialize the result. Sort the result according to a given ranking function. Take only top k tuples. Associated problems: No interest in total order of all the results. Evaluating ranking function(s) can be expensive.

Key contribution Li et al. proposed: Extending relational algebra to support ranking as a first-class database construct. Consequence: Rank-aware relational query engine  Rank-aware query optimization.

Top-k query: Example 1 R T TID a1 a2 p1 p2 p3 r1 27 50 0.5 0.3 0.25 r2 17 60 0.6 0.65 r3 47 90 0.7 0.95 r4 87 10 0.8 0.1 0.35 r5 70 0.9 0.75 TID b1 b2 p4 p5 t1 47 55 0.5 0.1 t2 66 65 0.7 t3 27 15 0.8 t4 99 95 0.2

WHERE r.a1=t.b1 AND r.a2>t.b2 ORDER-BY p1+p2+p3+p4+p5 LIMIT 2 Example 1 (cont’d) SELECT * FROM R r, T t WHERE r.a1=t.b1 AND r.a2>t.b2 ORDER-BY p1+p2+p3+p4+p5 LIMIT 2 (where F = p1+p2+p3+p4+p5) TID a1 a2 b1 b2 F r3/t1 47 90 55 2.95 r1/t3 27 50 15 2.35 r5/t1 70

Rank-relational algebra There was no way to express such query in relational algebra. Extend relational algebra by adding rank as a first-class operation. Based on the observations of first-class constructs (eg. selection), two requirements are needed to support ranking: Splitting – Predicate-by-predicate rank evaluation. Interleaving – Swapping rank operator with other operators (i.e. ranking is not only applied after filtering).

Ranking Principle Def: Given a ranking function F and a set of evaluated predicates P={p1, p2, … , pn}, maximal-possible score of a tuple t is defined as: Ranking Principle: If FP[t1] > FP[t2], then t1 must be ranked before t2.

Rank-Relation Def: For monotonic scoring function F(p1, …, pn) and a subset P of {p1, …, pn}, a relation R augmented with ranking induced by P is called a rank-relation, denoted by RP. Implicit attribute of RP is the score of tuple t, that is FP[t]. Order relationship of RP : For all t1, t2 Є RP : t1 < RP t2 ↔ FP[t1] < FP[t2]

Operators of rank-relations Rank (or μ) operator “adds” a predicate p to set P. i.e. μp(RP) ≡ R P U{p}. Example 2: μp1(R{p2}) ≡ R{p1, p2}, where F=∑(p1, p2, p3). TID a1 a2 p1 p2 p3 F{p1, p2} r3 47 90 0.7 0.95 2.4 r2 17 60 0.6 0.5 0.65 2.1 r5 70 0.9 0.1 0.75 2.0 r4 87 10 0.8 0.35 1.9 r1 27 50 0.3 0.25 1.8

Extended operators

Example 3: Extended Join πa1,a2,b2(σc (R{p1, p2 p3} JOIN T{p4, p5})) SELECT r.a1, r.a2, t.b1 FROM R r, T t WHERE c ORDER-BY F LIMIT 2 (F = ∑ P and c = r.a1+r.a2 < t.b1) TID a1 a2 b2 F {p1, p2, p3, p4, p5} r2/t4 17 60 99 2.45 r4/t4 87 10 1.95 r1/t4 27 50 1.75

Extended operators (cont’d) Note: Cartesian product is defined similarly to join, but not discussed in the paper. Projection operator π has not changed. Computation is based on both Boolean and ranking logical properties. Perform Boolean operations and maintain the order induced by all given ranking predicates.

Equivalence relations In the extended rank-relational model, ranking is a first-class construct. Can derive algebraic equivalences from the definitions of operators (Proofs are omitted). Example 4: σc(RP) ≡ (σcR)P RP1 ∩ TP2 ≡ (R ∩ T)P1 U P2 Thus, we can interleave the rank operator with other operators (i.e. push μ down across operators).

Equivalence relations (cont’d)

Equivalence relations (cont’d) Note: Proposition 1 states that ranking can be done in stages (i.e. one predicate at the time). By Propositions 2, 3, and 4, the relations hold commutative and associative laws. By Propositions 4 and 5, μ can be swapped with other operators.

Incremental execution Blocking operators (eg. sort) lead to materialization of intermediate results. Goal: To avoid materialization and implement a pipelining execution strategy. We want to split rank computation into stages and to reduce the number of tuples considered in the upcoming stages. We can output (i.e. advance to the next stage) a tuple t, whenever t has a score which is greater or equal to the score of any future tuple t′′ .

Incremental execution (cont’d) Apply μp to RP and maintain priority queue ordered by P U{p}. Let X = set of tuples from preceding stage. Draw t′ from X. If FP U{p}[t] ≥ FP[t′] and FP[t′] ≥ FP[t′′] for any future t′′ drawn from x, then FP U{p}[t] ≥ FP U{p}[t′′] and t can be output (proceed to next stage).

Example 5: Top 2 of W idxScanp6(W) μp7 μp8 Given F = AVG(p6, p7, p8) 3 TID x p6 p7 p8 F w1 3 0.9 0.8 0.2 29/30 w2 7 0.7 0.1 14/15 w3 5 0.6 9/10 w4 1 0.5 0.4 5/6 TID x F w1 3 9/10 w2 7 5/6 w3 5 23/30 w4 1 19/30 TID x F w1 3 19/30 w4 1 3/5 w2 7 8/15 w3 5 7/15

Different evaluation plans There exist algorithms to implement rank-aware operators as well as incremental evaluation. Efficiency of query evaluation will now depend not only on the regular operators, but also on the rank-aware operators. Due to algebraic equivalence laws, we can define additional evaluation plans. Hence, we want a query optimizer to take additional execution plans into consideration.

Rank-aware optimizer Extended algebra  Extended search space. Impact on enumeration algorithm: Li et al. designed a 2-dimension enumeration algorithm: Dimension 1 = Join size, Dimension 2 = Ranking predicates. The algorithm is exponential in both dimensions. Heuristics applied to reduce search space. Impact on cost model: For ranking queries, it is more difficult to estimate the query cardinality of the intermediate results, whose accuracy is the core of the cost model. Authors proposed to estimate cardinality by randomly sampling tuples. Def: The term cardinality of a query refers to the number of rows in the query result whereas selectivity refers to the probability of a row being selected.

Critique Erroneous examples. No example of “tie-breaking” function. Bad explanation of incremental evaluation.

Future research directions Cardinality estimation: New/improved techniques for random sampling over joins. Dynamically determined/chosen k. Exploring physical properties of rank-aware execution plans.