Supporting Top-k join Queries in Relational Databases By:Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Calvin R Noronha (1000578539)

Slides:



Advertisements
Similar presentations
Chapter 13: Query Processing
Advertisements

Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
 Introduction  Views  Related Work  Preliminaries  Problems Discussed  Algorithm LPTA  View Selection Problem  Experimental Results.
Supporting top-k join queries in relational databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by Rebecca M. Atchley Thursday, April.
Chapter 15 Algorithms for Query Processing and Optimization Copyright © 2004 Pearson Education, Inc.
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Query Optimization Goal: Declarative SQL query
SUPPORTING TOP-K QUERIES IN RELATIONAL DATABASES. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, MARCH 2004 Sowmya Muniraju.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
1 Relational Query Optimization Module 5, Lecture 2.
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
Query Processing (overview)
ICS (072)Query Processing and Optimization 1 Chapter 15 Algorithms for Query Processing and Optimization ICS 424 Advanced Database Systems Dr.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Evaluation of Relational Operations. Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Database Management 9. course. Execution of queries.
Ripple Joins for Online Aggregation by Peter J. Haas and Joseph M. Hellerstein published in June 1999 presented by Ronda Hilton.
Querying Structured Text in an XML Database By Xuemei Luo.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
Copyright © Curt Hill Query Evaluation Translating a query into action.
Online aggregation Joseph M. Hellerstein University of California, Berkley Peter J. Haas IBM Research Division Helen J. Wang University of California,
Ripple Joins for Online Aggregation by Peter J. Haas and Joseph M. Hellerstein published in June 1999 presented by Nag Prajval B.C.
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Richa Varshney.
1University of Texas at Arlington.  Introduction  Motivation  Requirements  Paper’s Contribution.  Related Work  Overview of Ripple Join  Rank.
Joseph M. Hellerstein Peter J. Haas Helen J. Wang Presented by: Calvin R Noronha ( ) Deepak Anand ( ) By:
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
CS4432: Database Systems II Query Processing- Part 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
Query Processing CS 405G Introduction to Database Systems.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Lecture 3 - Query Processing (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Introduction to Database Systems1 External Sorting Query Processing: Topic 0.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Implementation of Database Systems, Jarek Gryz1 Evaluation of Relational Operations Chapter 12, Part A.
By: Peter J. Haas and Joseph M. Hellerstein published in June 1999 : Presented By: Sthuti Kripanidhi 9/28/20101 CSE Data Exploration.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Evaluation of Relational Operations Chapter 14, Part A (Joins)
Fan Qi Database Lab 1, com1 #01-08 CS3223 Tutorial 6.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2007.
Execution Plans Detail From Zero to Hero İsmail Adar.
Database Management System
External Sorting Chapter 13
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
Ripple Joins for Online Aggregation
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Introduction to Query Optimization
Evaluation of Relational Operations
Chapter 15 QUERY EXECUTION.
Introduction to Database Systems
File Processing : Query Processing
Rank Aggregation.
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
External Sorting Chapter 13
Selected Topics: External Sorting, Join Algorithms, …
Lecture 2- Query Processing (continued)
Implementation of Relational Operations
Relational Query Optimization
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
External Sorting Chapter 13
Query Specific Ranking
Probabilistic Ranking of Database Query Results
Presentation transcript:

Supporting Top-k join Queries in Relational Databases By:Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Calvin R Noronha ( ) Deepak Anand ( )

AGENDA Introduction Motivation Requirements Overview of Ripple Join Rank Join Algorithm Example HRJN HRJN* Performance Conclusion References

Introduction What are top-k queries ? Ranking queries which order results on some computed score What are top-k join queries ? Typically these queries involve joins Usually, users are interested in the top-k join results.

Introduction.. Searches are performed using multiple features Every feature produces a different ranking for the query Join and aggregate the individual feature rankings for a global ranking. Answer to a Top-k join query is an ordered set of join results according to some provided function that combines the orders on each input

Example Find location for a house such that the combination of the cost of the house and 5 years tuition at a nearby school is minimal.

Motivation Existing join operators decouple join and sorting (ranking) of results. Sorting is expensive and is a blocking operation. Sort-merge joins (MGJN) only preserves order of joined column data Nested-loop joins (NLJN) only orders on the outer relations are preserved through the join Hash join (HSJN) doesn’t preserve order if hash tables do not fit in memory

Example Ranking Query SELECT A.1, B.2 FROM A, B, C WHERE A.1 = B.1 and B.2 = C.2 ORDER BY (0.3*A *B.2) STOP AFTER 5

Requirements Perform basic join operation Conform with current query operator interface Use the orderings of its inputs Produce top ranked join results ASAP Adapt quickly to input fluctuations

Overview of Ripple Join JOIN : L.A = R.A, L and R are descending, ordered by B We get a tuple from L and a tuple from R (L1(1,1,5) R1(1,3,5)) No valid join result L R

Ripple Join.. We get a second tuple from L and a second tuple from R and join with prior tuples, creating all possible combinations (L2,R2) {(2,2,4),(2,1,4)} L R JOIN : L.A = R.A, L and R are descending, ordered by B

Ripple Join.. (L2,R2) {(2,2,4),(2,1,4)} is an invalid join result! (L2,R1) {2,2,4), (1,3,5)} is an invalid join result! (L1,R2) {(1,1,5), (2,1,4)} is a valid join result! L R JOIN : L.A = R.A, L and R are descending, ordered by B

Variations of the Ripple Join Rectangular version – obtain tuples from one source at a higher rate than from the other source Block Ripple Join– obtain data, b tuples at a time, for classic ripple join b = 1 Hash Ripple Join –in memory, maintain hash tables of the samples obtained so far

Rank-Join Algorithm 1. Generate new valid join combinations 2. Compute score for each valid combination 3. For each incoming input, calculate the threshold value T 4. Store top k in priority queue Lk 5. Halt when lowest value of queue, scorek ≥ T

A Rank-Join Algorithm Example Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Initial Input: (1). Get a valid join combination using some join strategy Ripple Select (L1, R1) => Not a valid join result Next input: (1). Get a valid join combination using some join strategy Ripple Select (L2, R2) (L2, R2), (L2, R1), (L1, R2) => (L1, R2) is a valid join result Join Condition Score k

Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (2). Compute the score (J) for the result J1(L1, R2) => L.B + R.B = = 9 (3). Compute a threshold score T = Max ( Last L.B + First R.B, First L.B + Last R.B ) For Ripple Selection (L2, R2) => T = Max ( L2.B + R1.B, L1.B + R2.B ) = Max (4+5, 5+4) = 9 Example Continued..

J1 = 9 T = 9 (4). J1 >= T, so report J1 in top-k results (i.e. add it to list L k ) Since we need top 2 (k=2), continue until k=2 and Min(J1, J2, …Jk) >= T Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

Example Continued.. Next input: 1). Get a valid join combination using some join strategy Ripple Select (L3, R3) (L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3) => (L3, R3), (L2, R3) are valid join results (2). Compute the scores (J) for the results J2(L2, R3) = = 7J3(L3, R3) = 3 + 3= 6 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

Example Continued.. (3). Calculate a NEW threshold T T = Max ( Last L.B + First R.B, First L.B + Last R.B ) = Max ( L3.B + R1.B, L1.B + R3.B ) = Max(3 + 5, 5 + 3) = 8 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

Example Continued.. T = 8 J1(L1,R2) = 9 reported J2( L2, R3) = 7 J3(L3, R3) = 6 Note, J’s are in descending order (4). Min (J) = 6 < T so continue Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

Example Continued.. Next input: (1). Get a valid join combination using some join strategy Ripple Select ( L4, R4) => (L4, R1), (L2, R4), (L3, R4) are valid join results (2). Compute the scores (J) for the results J(L4, R4) = 7, J(L2, R4) = 6, J(L3, R4) = 5 (3). Calculate a NEW threshold T T = Max( L4.B+R1.B, L1.B + R4.B ) = Max( 7, 7 ) = 7 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

Example Continued.. T= 7 J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7, J3(L3, R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5 (4). Min(J1, J2) = 7 >= T (k = 2), so report J2 and STOP Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

Rank-Join Continued … Join strategy is very crucial Recommended strategy = Ripple Join Alternates between tuples of the two relations Flexible in the way it sweeps out (rectangular, etc) Retains ordering in considering samples Variant of Rank-Join Hash Rank Join (HRJN) Block Ripple Join

Hash Rank Join (HRJN) Operator Built on idea of hash ripple join Inputs are stored in two hash tables Maintains highest (first) and lowest (last selected) objects from each relation Results are added to a priority queue Advantages: Smaller space requirement Can be pipelined

HRJN * Score-Guided Strategy Consider L = 100, 50, 25, 10…. R= 10, 9, 8, 5.. For 3 tuples from each input, T = max(108,35) = 108 T1= 108, T2 =35 For 4 tuples from R and 2 tuples from L, T = max(105,60) = 105 T1= 105, T2 =60 If T1 > T2, more inputs should be taken from R and reduce T1. Therefore value of Threshold T will reduce => faster reporting of ranked join results.

Performance

Conclusion Integrates well with query plans and supports top – k queries Produces results as fast as possible Minimizes space requirements Eliminates needs for sorting after join

References “Supporting top-k join queries in relational databases” - Ihab Ilyas, Walid Aref, Ahmed Elmagarmid (2004) Jing Chen : CSE6392 Spring 2005, CSE-UT Arlington Spring2005/DBIR/slides/top- k_join.ppt Spring2005/DBIR/slides/top- k_join.ppt Zubin Joseph : CSE6392 Spring 2006, CSE-UT Arlington ration/Zubin_Supporting_Top_k_join_Queries.ppt ration/Zubin_Supporting_Top_k_join_Queries.ppt

TIME TO ASK QUESTIONS 