1University of Texas at Arlington.  Introduction  Motivation  Requirements  Paper’s Contribution.  Related Work  Overview of Ripple Join  Rank.

Slides:

Advertisements

Similar presentations

Evaluating Window Joins over Unbounded Streams Author: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas University of Wisconsin-Madison CS Dept. Presenter:

Advertisements

Supporting Top-k join Queries in Relational Databases By:Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Calvin R Noronha ( )

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.

 Introduction  Views  Related Work  Preliminaries  Problems Discussed  Algorithm LPTA  View Selection Problem  Experimental Results.

Supporting top-k join queries in relational databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by Rebecca M. Atchley Thursday, April.

SUPPORTING TOP-K QUERIES IN RELATIONAL DATABASES. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, MARCH 2004 Sowmya Muniraju.

Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.

1 Relational Query Optimization Module 5, Lecture 2.

Evaluating Window Joins Over Unbounded Streams By Nishant Mehta and Abhishek Kumar.

1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different.

Query Processing (overview)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.

1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,

Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.

Depth Estimation for Ranking Query Optimization Karl Schnaitter, UC Santa Cruz Joshua Spiegel, BEA Systems, Inc. Neoklis Polyzotis, UC Santa Cruz.

1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.

Query Evaluation and Optimization Main Steps 1.Translate into RA: select/project/join 2.Greedy optimization of RA: by pushing selection and projection.

Minimal Probing: Supporting Expensive Predicates for Top-k Queries Kevin C. Chang Seung-won Hwang Univ. of Illinois at Urbana-Champaign.

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.

Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.

Advanced Databases: Lecture 8 Query Optimization (III) 1 Query Optimization Advanced Databases By Dr. Akhtar Ali.

Database Management 9. course. Execution of queries.

1 Evaluating top-k Queries over Web-Accessible Databases Paper By: Amelie Marian, Nicolas Bruno, Luis Gravano Presented By Bhushan Chaudhari University.

Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.

Copyright © Curt Hill Query Evaluation Translating a query into action.

Online aggregation Joseph M. Hellerstein University of California, Berkley Peter J. Haas IBM Research Division Helen J. Wang University of California,

Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Richa Varshney.

All right reserved by Xuehua Shen 1 Optimal Aggregation Algorithms for Middleware Ronald Fagin, Amnon Lotem, Moni Naor (PODS01)

Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington.

CS4432: Database Systems II Query Processing- Part 3 1.

Presented by Suresh Barukula 2011csz  Top-k query processing means finding k- objects, that have highest overall grades.  A query in multimedia.

Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.

Multi pass algorithms. Nested-Loop joins Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in S DO FOR each tuple r in R DO IF r and s join to.

CS4432: Database Systems II Query Processing- Part 2.

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.

CSCE Database Systems Chapter 15: Query Execution 1.

Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor.

Query Processing CS 405G Introduction to Database Systems.

CS 440 Database Management Systems Lecture 5: Query Processing 1.

03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.

Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.

File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.

Introduction to Database Systems1 External Sorting Query Processing: Topic 0.

Query Processing – Implementing Set Operations and Joins Chap. 19.

Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.

CS 540 Database Management Systems

By: Peter J. Haas and Joseph M. Hellerstein published in June 1999 : Presented By: Sthuti Kripanidhi 9/28/20101 CSE Data Exploration.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.

Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.

CS4432: Database Systems II Query Processing- Part 1 1.

1 VLDB, Background What is important for the user.

1 Chengkai Li Kevin-Chen-Chuan Chang Ihab Ilyas Sumin Song Presented by: Mariam John CSE /20/2006 RankSQL: Query Algebra and Optimization for Relational.

RankSQL: Query Algebra and Optimization for Relational Top-k Queries

CS 440 Database Management Systems

Database Management System

External Sorting Chapter 13

Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.

Introduction to Query Optimization

Overview of Query Optimization

Chapter 15 QUERY EXECUTION.

Introduction to Database Systems

Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016

Laks V.S. Lakshmanan Depf. of CS UBC

External Sorting Chapter 13

Selected Topics: External Sorting, Join Algorithms, …

One-Pass Algorithms for Database Operations (15.2)

Implementation of Relational Operations

External Sorting Chapter 13

Query Specific Ranking

Presentation transcript:

1University of Texas at Arlington

 Introduction  Motivation  Requirements  Paper’s Contribution.  Related Work  Overview of Ripple Join  Rank Join Algorithm ◦ Example with different strategies. ◦ HRJN ◦ HRJN* Generalizing the rank join.  Performance  Observations  References 2University of Texas at Arlington

 Ranking queries(top-k queries). Aim to provide only the top-k results that are ordered on some computed score  Top-k join queries.  Queries which involve joins. University of Texas at Arlington3

 video database system (VDBMS)VDBMS University of Texas at Arlington4 Ranking wrt edge orientation Ranking wrt color histogram Ranking wrt texture SIMILARITY MATCHING

University of Texas at Arlington5 C1: r.cuisine=“italian” C2:h.price+r.price<100 C3: r.area=m.area Hotel(h) person Restaurant (r) Museum (m)

 Predicates: p1: cheap (h:price) p2: close (h:addr; r:addr) p3: related (m:collection, \dinosaur")  Scoring function : p1+p2+p3  Query: University of Texas at Arlington6 SELECT FROM Hotel h, Restaurant r, Museum m WHERE c1 AND c2 AND c3 ORDER BY p1 + p2 + p3 LIMIT k prohibitively expensive

 MGJN : In Sort-merge joins only the order on the join column can be preserved.  NLJN: In Nested-loop joins only orders on the outer relations are preserved through the join.  HSJN : In Hash join orders from both inputs are usually destroyed after the join, when hash tables do not fit in memory. University of Texas at Arlington7

SELECT A.1, B.2 FROM A, B, C WHERE A.1 = B.1 and B.2 = C.2 ORDER BY (0.3*A *B.2) STOP AFTER 5 University of Texas at Arlington8

9 What are the problems in this??

We need the new rank-join operator to :  Perform basic join operation.  Conform with current query operator interface.  Make use of the individual orders of its inputs.  Produce the first ranked join results.  Adapt to input fluctuations. University of Texas at Arlington10

 Proposes a new Rank-Join algorithm.  Analyzes the I/O cost of the algorithm along with proof of its optimality.  Implements the proposed algorithm in pipelined rank-join operators (based on ripple join).  Proposes an optimal join strategy  Provides optimization mechanism to determine best order to perform the rank-join operations.  Evaluates performance and compares other approaches University of Texas at Arlington11

University of Texas at Arlington12

 Fagin et al. introduced the 1 st set of algorithms to answer ranking queries  The TA Algorithm  The NRA Algorithm  The J* Algorithm  The NRA-RJ Algorithm  Importance-based join processing University of Texas at Arlington13

University of Texas at Arlington14 JOIN : L.A = R.A L and R are descending, ordered by B We get a tuple from L and a tuple from R (L1(1,1,5) R1(1,3,5)) No valid join result L R

University of Texas at Arlington 15 JOIN: L.A = R.A We get a second tuple from L and a second tuple from R and join with prior tuples, creating all possible combinations (L2,R2) {(2,2,4),(2,1,4)} L R

University of Texas at Arlington 16 JOIN: L.A = R.A (L2,R2) {(2,2,4),(2,1,4)} (L2,R1) {2,2,4), (1,3,5)} (L1,R2) {(1,1,5), (2,1,4)} is a valid join result! L R

University of Texas at Arlington17 VARIATIONS: Rectangle – obtain tuples from one source at a higher rate than from the other source Block – obtain data b tuples at a time, for classic ripple join b = 1 Hash –in memory, maintain hash tables of the samples obtained so far Faster IO Degrades to block ripple join when hash tables exceed memory size

1. Generate new valid join combinations 2. Compute score for each valid combination 3. For each incoming input, calculate the threshold value T 4. Store top k in priority queue L k 5. Halt when lowest value of queue, score k ≥ T University of Texas at Arlington18

University of Texas at Arlington19 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Initial Input: (1). Get a valid join combination using some join strategy Ripple Select (L1, R1) => Not a valid join result Next input: (1). Get a valid join combination using some join strategy Ripple Select (L2, R2) (L2, R2), (L2, R1), (L1, R2) => (L1, R2) is a valid join result

University of Texas at Arlington20 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 ( 2). Compute the score (J) for the result J1(L1, R2) => L.B + R.B = = 9 (3). Compute a threshold score T = Max ( Last L.B + First R.B, First L.B + Last R.B ) For Ripple Selection (L2, R2) => T = Max ( L2.B + R1.B, L1.B + R2.B ) = Max (4+5, 5+4) = 9 J1 = 9 T = 9

University of Texas at Arlington21 (4). J1 >= T, so report J1 in top-k results (i.e. add it to list Lk ) Since we need top 2 (k=2), continue until k=2 and Min(J1, J2, …Jk) >= T Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

University of Texas at Arlington22 Next input: 1). Get a valid join combination using some join strategy Ripple Select (L3, R3) (L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3) => (L3, R3), (L2, R3) are valid join results (2). Compute the scores (J) for the results J2(L2, R3) = = 7J3(L3, R3) = 3 + 3= 6 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

University of Texas at Arlington23 (3). Calculate a NEW threshold T T = Max ( Last L.B + First R.B, First L.B + Last R.B ) = Max ( L3.B + R1.B, L1.B + R3.B )= Max(3 + 5, 5 + 3) =8 T = 8 J1(L1,R2) = 9 reported J2( L2, R3) = 7 J3(L3, R3) = 6 Note, J’s are in descending order (4). Min (J) = 6 < T so continue Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

University of Texas at Arlington24 Next input: (1). Get a valid join combination using some join strategy Ripple Select ( L4, R4) => (L4, R1), (L2, R4), (L3, R4) are valid join results (2). Compute the scores (J) for the results J(L4, R4) = 7, J(L2, R4) = 6, J(L3, R4) = 5 (3). Calculate a NEW threshold T T = Max( L4.B+R1.B, L1.B + R4.B ) = Max( 7, 7 ) = 7 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

University of Texas at Arlington25 T= 7 J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7, J3(L3, R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5 (4). Min(J1, J2) = 7 >= T (k = 2), so report J2 and STOP Comment: When reach all records, T does not need to be calculated, unless, calculate T first, and compare each J(i) with T immediately. Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2

University of Texas at Arlington26 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Instead of Select Ripple, Select Rectangle Ripple… Obtain all tuples in L and only 1st tuple in R Initial Input (1). Get a valid join combination using some join strategy Rectangle Ripple (L1 to L4, R1) => (L4, R1) is a valid join result (2). Compute the score (J) for the result J1(L4, R1) = = 7 (3). Calculate a NEW threshold T T = Max( L4.B+R1.B, L1.B + R1.B ) = Max( 7, 10 ) = 10

University of Texas at Arlington27 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (4). Min(J1, J2) = 7 < T = 10 (k = 2), so just continue Next Input (1). Get a valid join combination using some join strategy Rectangle Ripple (L1 to L4, R2) => (L1, R2) is a new valid join result (2). Compute the score (J) for the result J2(L1, R2) = 9 (3). Calculate a NEW threshold T T = Max( L1.B+R1.B, L1.B + R2.B ) = Max( 10, 9 ) = 10

University of Texas at Arlington28 Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (4). Min(J1, J2) = 9 < T = 10 (k = 2), so continue J2 cannot be reported because of threshold = 10, but this was the top- ranked join result in prior strategy… This suggests using join strategies that reduce the threshold value as quickly as possible to be able to report top-ranked join results early on.

 Built on idea of hash ripple join.  Inputs are stored in two hash tables  A queue holds ordered join results  Ltop, Rtop, Lbottom, Rbottom are used to calculate T  Advantages: ◦ Smaller space requirement ◦ Can be pipelined University of Texas at Arlington29

University of Texas at Arlington30 1) Buffer problem 2) Local Ranking Problem Problems in Implementation:

University of Texas at Arlington31 Use Block Ripple Join to Solve Local Ranking Problem – - Set p = 2 Solution:

Score-Guided Strategy  Consider L = 100, 50, 25, 10…. R= 10, 9, 8, 5..  For 3 tuples from each input,  T = max(108,35) = 108 T1= 108, T2 =35  For 4 tuples from R and 2 tuples from L,  T = max(105,60) = 105 T1= 105, T2 =60 If T1 > T2, more inputs should be taken from R and reduce T1. Therefore value of Threshold T will reduce => faster reporting of ranked join results. University of Texas at Arlington32

 Using indexes 1)an index on only one of the two inputs 2)an index on each of the two inputs.  Eliminate duplications  Faster termination University of Texas at Arlington33

University of Texas at Arlington34

University of Texas at Arlington35

University of Texas at Arlington36

University of Texas at Arlington37 ◦ Integrates well with query plans and supports top – k queries ◦ Produces results as fast as possible ◦ Minimizes space requirements ◦ Eliminates needs for sorting after join

 New join-rank algorithm  Hash Rank Join  HRJN*  General rank-join algorithm  Experiments Result University of Texas at Arlington38

University of Texas at Arlington39  “Supporting top-k join queries in relational databases” - Ihab Ilyas, Walid Aref, Ahmed Elmagarmid   ng%20top- k%20queries%20in%20Relational%20databases.pptxhttp://crystal.uta.edu/~cse6339/Slides09/Final%20Supporti ng%20top- k%20queries%20in%20Relational%20databases.pptx

University of Texas at Arlington40