SUPPORTING TOP-K QUERIES IN RELATIONAL DATABASES. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, MARCH 2004 Sowmya Muniraju.


SUPPORTING TOP-K QUERIES IN RELATIONAL DATABASES. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, MARCH 2004  Presented by: Sowmya Muniraju  Proposed by: Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid

Outline  Introduction  Existing join strategies  Contributions  Related Work  Introduction to New Rank join algorithm  Overview of Ripple Joins  New Rank join algorithm  Physical Rank Join Operators  HRJN  HRJN*  Performance Evaluation  Conclusion

Introduction  Relational databases need support for ranking.  Attributes are spread across multiple relations, hence the need for ranking on join queries.  Users are mostly interested in the top few results.  The result set should be ordered by a scoring function.

Existing Join strategies  Sort-merge join  Relations are sorted on the join columns.  Nested-loop join  Tuples of the outer relation are joined with tuples of the inner relation.  Hash join  Two phases: build and probe.  Build a hash table for the smaller of the two relations.  Probe this hash table with the hash value of each tuple in the other relation.
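The build and probe phases of the hash join can be sketched as follows. This is a minimal in-memory illustration, not the presenters' code: the function name `hash_join`, the dict-based rows and the key names are assumptions made for the example.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Equi-join two lists of dict rows with a build phase and a probe phase."""
    # Build on the smaller input so the hash table stays small.
    swapped = len(left) > len(right)
    build, probe = (right, left) if swapped else (left, right)
    build_key, probe_key = (right_key, left_key) if swapped else (left_key, right_key)

    # Build phase: hash every row of the smaller relation on its join key.
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)

    # Probe phase: look up each row of the other relation in the hash table.
    results = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            # Always emit (left_row, right_row), whichever side was built.
            results.append((row, match) if swapped else (match, row))
    return results


# Hypothetical data for illustration only.
left_rows = [{"id": 1, "a": 1}, {"id": 2, "a": 2}]
right_rows = [{"id": 10, "a": 2}, {"id": 11, "a": 2}, {"id": 12, "a": 3}]
pairs = hash_join(left_rows, right_rows, "a", "a")
print([(l["id"], r["id"]) for l, r in pairs])  # [(2, 10), (2, 11)]
```

Note that the output follows the order of the probe input only, which is one reason a separate sort is needed afterwards for ranking.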

Top-k using existing join strategies  Given a query, how do we get the top-k results? SELECT A.1, B.2 FROM A, B, C WHERE A.1 = B.1 AND B.2 = C.2 ORDER BY ( 0.3 * A.1 + 0.7 * B.2 ) STOP AFTER 5; Problems? 1. Sorting is a blocking operation. 2. Sorting is expensive and is done three times.

Order limitations on existing joins  Sort-merge join  Sorting is done on the join columns, NOT on the columns that participate in the scoring function.  Nested-loop join  Only the order of the outer relation is maintained.  Hash join  The orders of both inputs are lost after the join when the hash tables do not fit in memory. Common characteristic of these joins: the join is decoupled from the sort.

Contributions  Proposed a new rank join algorithm.  Implemented this algorithm in practical pipelined rank-join operators based on ripple join.  Proposed a score-guided join strategy that reduces the number of tuples to be evaluated to get the desired results.

Desired Result  SELECT A.1, B.2 FROM A, B, C WHERE A.1 = B.1 AND B.2 = C.2 ORDER BY ( 0.3 * A.1 + 0.7 * B.2 ) STOP AFTER 5;  [Figure: query plan using existing join strategies vs. the desired plan using rank join]

Related Work  This problem is closely related to top-k selection queries.  There, the scoring function is applied to m attributes of the same relation.  Related algorithms: Threshold Algorithm (TA), No-Random-Access Algorithm (NRA), J*, A*

Introduction: New Rank Join Algorithm  Tuples are retrieved in ranked order so that the ranking is preserved.  Produces the first ranked join results as quickly as possible.  Uses a monotonic ranking function.  Based on the idea of ripple join.  Integrates with existing physical query engines.  Variations: HRJN, HRJN*

Overview of Ripple Joins  A previously unseen random tuple from one relation is joined with the previously seen tuples from the other relation.  Variations of Ripple Joins  Block  Hash

Rank Join Algorithm

Example  Join condition: L.A = R.A  Scoring function: L.B + R.B  K = 2
L (Id, A, B): (1,1,5) (2,2,4) (3,2,3) (4,3,2)
R (Id, A, B): (1,3,5) (2,1,4) (3,2,3) (4,2,2)
Threshold (T): Left_threshold = f( L_top, R_bottom ), Right_threshold = f( R_top, L_bottom ), T = Max( Left_threshold, Right_threshold ). After the first tuple from each input, T = 10.
L1, R1 is not a valid join.
L1, R2 is a valid join: [ (1,1,5) (2,1,4) ] = 9
L3, R3 | L2, R3 are valid joins: [ (2,2,4) (3,2,3) ] = 7, [ (3,2,3) (3,2,3) ] = 6
L4, R1 | L2, R4 | L3, R4 are valid joins: [ (4,3,2) (1,3,5) ] = 7, [ (2,2,4) (4,2,2) ] = 6, [ (3,2,3) (4,2,2) ] = 5
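The threshold values in this example can be worked out step by step. The sketch below assumes one tuple is read from each input per round and that the B scores of L and R are 5, 4, 3, 2, as implied by the join results listed on the slide. The function name `threshold_trace` is hypothetical.

```python
def threshold_trace(left_scores, right_scores, f=lambda lb, rb: lb + rb):
    """Return T after each round in which one tuple is read from each input.

    Both score lists are assumed to be sorted in descending order.
    """
    trace = []
    for depth in range(1, min(len(left_scores), len(right_scores)) + 1):
        l_top, l_bottom = left_scores[0], left_scores[depth - 1]
        r_top, r_bottom = right_scores[0], right_scores[depth - 1]
        left_threshold = f(l_top, r_bottom)    # f(L_top, R_bottom)
        right_threshold = f(l_bottom, r_top)   # f(R_top, L_bottom)
        trace.append(max(left_threshold, right_threshold))
    return trace


print(threshold_trace([5, 4, 3, 2], [5, 4, 3, 2]))  # [10, 9, 8, 7]
```

Under these assumptions the join result with score 9 can be reported once T drops to 9 (second round), and a result with score 7 only once T drops to 7 (fourth round), which gives K = 2.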

Hash Rank Join Operator (HRJN)  Variant of the symmetric hash join algorithm.  Data structures  Hash table for each input.  Priority queue: holds valid join combinations along with their scores.  Methods implemented  Open: initializes the operator and prepares its internal state.  GetNext: returns the next ranked join result upon each call.  Close: terminates the operator and performs the necessary clean-up.

Open(L, R, C, f)  L = Left input  R = Right input  C = Join condition  f = Monotonic scoring function

GetNext()  Output: Next ranked join result
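The Open / GetNext / Close interface and the two data structures (one hash table per input, one priority queue) might fit together roughly as below. This is a sketch under stated assumptions, not the paper's implementation: the join condition C is restricted to equality on a single key, inputs are Python iterables of (id, key, score) tuples already sorted by descending score, inputs are read alternately, and the class and method names are illustrative.

```python
import heapq
from collections import defaultdict


class HRJN:
    """Pull-based rank-join operator over two inputs sorted by descending score."""

    def open(self, left, right, f):
        self.inputs = [iter(left), iter(right)]
        self.f = f                                             # monotonic scoring function
        self.tables = [defaultdict(list), defaultdict(list)]   # one hash table per input
        self.top = [None, None]                                # first score seen per input
        self.bottom = [None, None]                             # last score seen per input
        self.done = [False, False]
        self.queue = []                                        # max-heap via negated scores
        self.turn = 0                                          # 0 = left, 1 = right

    def _threshold(self):
        if None in self.top:
            return float("inf")                                # one input not read yet
        left_t = self.f(self.top[0], self.bottom[1])           # f(L_top, R_bottom)
        right_t = self.f(self.bottom[0], self.top[1])          # f(R_top, L_bottom)
        return max(left_t, right_t)

    def _fetch(self):
        # Alternate between inputs, skipping one that is exhausted.
        side = self.turn if not self.done[self.turn] else 1 - self.turn
        self.turn = 1 - side
        tup = next(self.inputs[side], None)
        if tup is None:
            self.done[side] = True
            return
        _, key, score = tup
        if self.top[side] is None:
            self.top[side] = score
        self.bottom[side] = score
        self.tables[side][key].append(tup)
        # Join the new tuple with the tuples already seen on the other side.
        for other in self.tables[1 - side].get(key, []):
            l, r = (tup, other) if side == 0 else (other, tup)
            heapq.heappush(self.queue, (-self.f(l[2], r[2]), l, r))

    def get_next(self):
        while True:
            exhausted = all(self.done)
            if self.queue and (exhausted or -self.queue[0][0] >= self._threshold()):
                neg_score, l, r = heapq.heappop(self.queue)
                return -neg_score, l, r
            if exhausted:
                return None                                    # no more join results
            self._fetch()

    def close(self):
        self.queue = []
        self.tables = [defaultdict(list), defaultdict(list)]


# Tuples as implied by the join results in the earlier example slide.
L = [(1, 1, 5), (2, 2, 4), (3, 2, 3), (4, 3, 2)]
R = [(1, 3, 5), (2, 1, 4), (3, 2, 3), (4, 2, 2)]
op = HRJN()
op.open(L, R, lambda lb, rb: lb + rb)
first, second = op.get_next(), op.get_next()
op.close()
print(first[0], second[0])  # 9 7
```

Keeping the threshold conservative after one input runs dry (still taking the maximum of both terms) is a simplification; it delays some results but does not change their order.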

Local Ranking Problem  Problem: unbalanced retrieval rates of the left and right inputs.  Solution: use the concept of block ripple join.

Example 2  Join condition: L.A = R.A  Scoring function: L.B + R.B  K = 2  Threshold (T): Left_threshold = f( L_top, R_bottom ), Right_threshold = f( R_top, L_bottom ), T = Max( Left_threshold, Right_threshold ) = 10  No valid joins  L4, R1 is a valid join: [ (4,3,2) (1,3,5) ] = 7

HRJN*: Score-Guided Join Strategy  T1 = f( L_top, R_bottom )  T2 = f( R_top, L_bottom )  If T1 > T2, retrieve the next tuple from input R; otherwise retrieve it from input L.
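The input-selection rule on this slide is small enough to write down directly. The sketch below is illustrative only: `choose_input` is a hypothetical name, the scoring function defaults to a sum, and the sample scores are made up.

```python
def choose_input(l_top, l_bottom, r_top, r_bottom, f=lambda lb, rb: lb + rb):
    """Pick the input to read next under the score-guided strategy.

    f takes (left score, right score) and is assumed to be monotonic.
    """
    t1 = f(l_top, r_bottom)   # T1 = f(L_top, R_bottom)
    t2 = f(l_bottom, r_top)   # T2 = f(R_top, L_bottom)
    # Reading from R lowers R_bottom and therefore T1; reading from L lowers T2.
    return "R" if t1 > t2 else "L"


print(choose_input(l_top=100, l_bottom=50, r_top=10, r_bottom=9))  # R
print(choose_input(l_top=5, l_bottom=4, r_top=5, r_bottom=4))      # L
```

The idea is to always read from the input whose threshold term is currently the larger one, since only that term keeps T high.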

Exploiting available indexes  Generalize rank join to use random access if available.  Two cases:  An index on the join attribute(s) of one input.  An index on the join attribute(s) of each input.  Problem: duplicates can be produced, because an index contains all tuples, both seen and not yet seen.

Exploiting Indexes: On-the-fly duplicate elimination  Tables: L (Id, A, B), R (Id, A, B)  Scoring function: L.B + R.B  Index available on R  [ (1,1,100) (2,1,9) ] = 109  [ (2,2,50) (3,2,8) ] = 58  [ (2,2,50) (4,2,5) ] = 55  Any join result not yet produced cannot have a combined score greater than f( L_bottom, R_bottom ) = 59.

Exploiting Indexes: Faster Termination  Tables: L (Id, A, B), R (Id, A, B)  Join condition: L.A = R.A  Scoring function: L.B + R.B  Index available on R  Reduce L_top to L_bottom, i.e. L_top = L_bottom.  Previously, T = Max( 109, 60 ) = 109  After reducing L_top, T = Max( 59, 60 ) = 60

Performance Evaluation: Top-k join operators  [Figure]  M = 4  Selectivity = 0.2%

Effect of selectivity  [Figure]  M = 4  K = 50

Effect of pipelining  [Figure]  Selectivity = 0.2%  K = 50

Conclusion  Supported top-k join queries using the new rank join algorithm.  The algorithm uses the ranking on the input relations to produce ranked join results on a combined score.  The ranking is performed progressively during the join operation.  The HRJN and HRJN* operators implement the new algorithm.  A generalization of this algorithm utilizes available indexes for faster termination.

References  Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid, “Supporting Top-k Join Queries in Relational Databases”, March 2004  Jing Chen: DIBR Spring 2005, CSE - UT Arlington

THANK YOU