Ripple Joins for Online Aggregation

Peter J. Haas, Joseph M. Hellerstein
Joseph, Z.M. (CSE, UTA), 2/16/2006

Ripple Joins: Introduction
A follow-up to Online Aggregation. It extends online aggregation to a family of join algorithms, which allows online aggregation to be used on multi-table queries.

Ripple Joins: Introduction
Targets queries of the form:
SELECT op(expression) FROM R1, R2, …, RK WHERE predicate GROUP BY columns;
where op is an aggregate such as COUNT, SUM, or AVG. Running estimates can be computed from the statistical properties of the data seen so far, and the user can control how often these estimates are updated.
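A minimal single-table sketch of a running estimate, assuming tuples arrive in random order so that every prefix is a random sample. The function name `running_avg` and the `report_every` parameter (standing in for the user-controlled update frequency) are hypothetical, chosen for illustration.

```python
import random
from typing import Iterable, Iterator, Tuple

def running_avg(values: Iterable[float], report_every: int) -> Iterator[Tuple[int, float]]:
    """Yield (tuples_seen, current AVG estimate) every `report_every` tuples.

    Assumes `values` arrive in random order, so each prefix is a random sample
    and the running mean is a reasonable estimate of the final AVG.
    """
    total = 0.0
    n = 0
    for v in values:
        total += v
        n += 1
        if n % report_every == 0:   # user-controlled refresh frequency
            yield n, total / n
    if n % report_every != 0:       # final report once every tuple has been seen
        yield n, total / n

data = list(range(1, 101))   # hypothetical column values 1..100
random.seed(0)
random.shuffle(data)         # simulate random-order retrieval
for seen, est in running_avg(data, report_every=25):
    print(seen, round(est, 2))
```

The last report is always the exact answer (50.5 here); the earlier ones are estimates whose quality depends on the random order.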

Ripple Join vs. Online Nested Loop
Problems with the online nested-loop join: if the inner table is large, there is a long time between updates, and the confidence interval may not narrow quickly enough. A ripple join avoids scanning an entire relation before the estimate can be refined.

Ripple Join: Operation
Consider a ripple join of relations R and S. At each step, select a random tuple r from R and join it with all previously selected S tuples; then select a random tuple s from S and join it with all previously selected R tuples (including r).
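The two alternating steps can be sketched in Python as follows. This is an in-memory illustration, not the authors' iterator: random-order retrieval is simulated by shuffling, and the name `square_ripple_join` is hypothetical.

```python
import random
from typing import Any, Callable, Iterator, List, Sequence, Tuple

def square_ripple_join(r_tuples: Sequence[Any], s_tuples: Sequence[Any],
                       predicate: Callable[[Any, Any], bool],
                       seed: int = 0) -> Iterator[Tuple[Any, Any]]:
    """Sketch of a square (1:1) two-table ripple join.

    Each step draws one new R tuple and joins it with every S tuple drawn so far,
    then draws one new S tuple and joins it with every R tuple drawn so far
    (including the one just drawn). Every (r, s) pair is examined exactly once.
    """
    rng = random.Random(seed)
    r_order, s_order = list(r_tuples), list(s_tuples)
    rng.shuffle(r_order)  # simulate random-order retrieval
    rng.shuffle(s_order)
    seen_r: List[Any] = []
    seen_s: List[Any] = []
    for step in range(max(len(r_order), len(s_order))):
        if step < len(r_order):
            r = r_order[step]
            seen_r.append(r)
            for s in seen_s:          # new r vs. all earlier s
                if predicate(r, s):
                    yield r, s
        if step < len(s_order):
            s = s_order[step]
            seen_s.append(s)
            for r in seen_r:          # new s vs. all r seen so far, incl. this step's r
                if predicate(r, s):
                    yield r, s

print(sorted(square_ripple_join([1, 2, 3], [2, 3, 4], lambda r, s: r == s)))
```

A pair is emitted when the later of its two tuples is drawn, which is why no pair is missed or repeated.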

Ripple Join: Square Two-Table Join
[Figure: square ripple over the R × S grid at N = 1, 2, 3, 4; each X marks a joined (r, s) pair, and after N steps the joined pairs form an N × N square.]

Ripple Join: Operation
This is like a nested-loop join, but it alternates between sampling and scanning from either relation. The aspect ratio need not be 1:1: selecting more samples per step from one table leads to a rectangular ripple. The aspect ratio is configurable by the user.
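The growth of a rectangular ripple can be worked out directly. Assuming aspect ratio β_R:β_S and relations large enough that neither runs out, step n enlarges the sampled rectangle from (n−1)β_R × (n−1)β_S to nβ_R × nβ_S tuples, so it joins β_R·β_S·(2n − 1) new pairs. The helper name `ripple_schedule` is hypothetical.

```python
from typing import List, Tuple

def ripple_schedule(beta_r: int, beta_s: int, steps: int) -> List[Tuple[int, int, int, int]]:
    """Return (step, R tuples seen, S tuples seen, new pairs joined) for each step.

    Assumes neither relation is exhausted during the given number of steps.
    """
    rows = []
    for n in range(1, steps + 1):
        r_seen, s_seen = n * beta_r, n * beta_s
        prev_pairs = ((n - 1) * beta_r) * ((n - 1) * beta_s)
        rows.append((n, r_seen, s_seen, r_seen * s_seen - prev_pairs))
    return rows

for row in ripple_schedule(2, 1, 3):
    print(row)
```

With β_R = β_S = 1 this reduces to the square ripple, whose new-pair counts are the odd numbers 1, 3, 5, 7, ….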

Enhanced Ripple Join Iterator: Rectangular
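A rectangular ripple requires special handling by the iterator to ensure that the ripple grows correctly, that is, that each side advances by its share of the aspect ratio at every step.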
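One way to get that bookkeeping right is sketched below as a Python iterator class. It is an illustration under stated assumptions, not the paper's iterator: inputs live in memory, random order is simulated by shuffling, and the class name `RectangularRippleJoin` and its parameters are hypothetical.

```python
import random
from collections import deque
from typing import Any, Callable, Deque, List, Sequence, Tuple

class RectangularRippleJoin:
    """Iterator sketch for a two-table ripple join with aspect ratio beta_r:beta_s.

    Each step draws up to beta_r new R tuples, then up to beta_s new S tuples.
    A newly drawn tuple is joined with every tuple already drawn from the other
    relation, so each (r, s) pair is examined exactly once.
    """

    def __init__(self, r_tuples: Sequence[Any], s_tuples: Sequence[Any],
                 predicate: Callable[[Any, Any], bool],
                 beta_r: int = 1, beta_s: int = 1, seed: int = 0) -> None:
        rng = random.Random(seed)
        self._r, self._s = list(r_tuples), list(s_tuples)
        rng.shuffle(self._r)  # simulate random-order retrieval
        rng.shuffle(self._s)
        self._pred = predicate
        self._beta_r, self._beta_s = beta_r, beta_s
        self._seen_r: List[Any] = []
        self._seen_s: List[Any] = []
        self._pending: Deque[Tuple[Any, Any]] = deque()  # results not yet returned
        self.steps = 0  # completed ripple steps

    def __iter__(self) -> "RectangularRippleJoin":
        return self

    def _draw_r(self) -> None:
        r = self._r[len(self._seen_r)]
        self._seen_r.append(r)
        self._pending.extend((r, s) for s in self._seen_s if self._pred(r, s))

    def _draw_s(self) -> None:
        s = self._s[len(self._seen_s)]
        self._seen_s.append(s)
        self._pending.extend((r, s) for r in self._seen_r if self._pred(r, s))

    def _step(self) -> bool:
        """Advance one ripple step; return False once both inputs are exhausted."""
        drew = False
        for _ in range(self._beta_r):
            if len(self._seen_r) < len(self._r):
                self._draw_r()
                drew = True
        for _ in range(self._beta_s):
            if len(self._seen_s) < len(self._s):
                self._draw_s()
                drew = True
        if drew:
            self.steps += 1
        return drew

    def __next__(self) -> Tuple[Any, Any]:
        while not self._pending:          # a step may produce no matches
            if not self._step():
                raise StopIteration
        return self._pending.popleft()
```

Buffering results in `_pending` lets `__next__` return one pair at a time while the ripple itself still advances in whole β_R × β_S steps.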

Pipelined Ripple Join
Can easily be pipelined for multiple binary joins. However, a three-table join cannot be done naively as two binary ripple joins; the authors recommend additional steps so that the pipeline builds such K-dimensional hyper-rectangles correctly.
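To show the hyper-rectangle growth a K-table ripple join must produce, here is a direct, non-pipelined Python sketch (not the authors' pipelined iterator). It assumes a square ripple with one tuple per relation per step, in-memory inputs shuffled to simulate random order, and a predicate over the whole K-tuple; the name `kway_ripple_join` is hypothetical.

```python
import itertools
import random
from typing import Any, Callable, Iterator, List, Sequence, Tuple

def kway_ripple_join(relations: Sequence[Sequence[Any]],
                     predicate: Callable[[Tuple[Any, ...]], bool],
                     seed: int = 0) -> Iterator[Tuple[Any, ...]]:
    """Sketch of a square K-table ripple join.

    When a new tuple arrives from relation i, it is combined with every tuple
    already drawn from all other relations, so the sampled region grows as a
    K-dimensional hyper-rectangle and each combination is examined exactly once
    (when its last-arriving tuple is drawn).
    """
    rng = random.Random(seed)
    orders = [list(rel) for rel in relations]
    for order in orders:
        rng.shuffle(order)  # simulate random-order retrieval
    seen: List[List[Any]] = [[] for _ in orders]
    for step in range(max(len(o) for o in orders)):
        for i, order in enumerate(orders):
            if step >= len(order):
                continue
            t = order[step]
            # Fix the new tuple in position i; take everything seen elsewhere.
            axes = [[t] if j == i else seen[j] for j in range(len(orders))]
            for combo in itertools.product(*axes):
                if predicate(combo):
                    yield combo
            seen[i].append(t)

print(sorted(kway_ripple_join([[1, 2], [2, 3], [2, 4]], lambda c: c[0] == c[1] == c[2])))
```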

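Block Ripple Join
Takes disk blocks of R and S in turn, rather than individual tuples.
Read a disk block of R and join it with the previously read S blocks, then evict it from memory. Read a block of S and join it with the previously read R blocks. The ripple grows exactly as in the tuple-at-a-time version, except that each layer is thicker. This saves I/O because a whole block, rather than a single tuple, is retrieved at a time.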
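A square block ripple join can be sketched in Python as below. Assumptions: "disk blocks" are simulated as fixed-size in-memory chunks, the stored order is already random (so blocks are read sequentially), and eviction and re-reading are not modeled. The names `blocks`, `block_ripple_join`, and `block_size` are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Iterator, List, Sequence, Tuple

def blocks(rows: Sequence[Any], block_size: int) -> List[List[Any]]:
    """Split a relation into fixed-size simulated disk blocks (the last may be short)."""
    return [list(rows[i:i + block_size]) for i in range(0, len(rows), block_size)]

def block_ripple_join(r_rows: Sequence[Any], s_rows: Sequence[Any],
                      predicate: Callable[[Any, Any], bool],
                      block_size: int) -> Iterator[Tuple[Any, Any]]:
    """Square block ripple join: alternate whole blocks of R and S."""
    r_blocks, s_blocks = blocks(r_rows, block_size), blocks(s_rows, block_size)
    seen_r: List[List[Any]] = []
    seen_s: List[List[Any]] = []
    for step in range(max(len(r_blocks), len(s_blocks))):
        if step < len(r_blocks):
            new_r = r_blocks[step]
            seen_r.append(new_r)
            for s_block in seen_s:            # new R block vs. all earlier S blocks
                for r, s in product(new_r, s_block):
                    if predicate(r, s):
                        yield r, s
        if step < len(s_blocks):
            new_s = s_blocks[step]
            seen_s.append(new_s)
            for r_block in seen_r:            # new S block vs. all R blocks read so far
                for r, s in product(r_block, new_s):
                    if predicate(r, s):
                        yield r, s
```

Compared with the tuple-at-a-time version, each step joins a block-by-block rectangle, which is the "thicker" growth described above.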

Further Variations of Ripple Joins
Index Ripple Join: identical to an index-enhanced nested-loop join.
Hash Ripple Join: used only for equijoins.
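For the equijoin case, a hash ripple join can be sketched as below: instead of scanning all previously seen tuples of the other relation, each new tuple probes a hash table built on them and is then added to its own relation's table. The sketch assumes in-memory inputs already in random order, and the names `hash_ripple_join`, `r_key`, and `s_key` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Hashable, Iterator, List, Sequence, Tuple

def hash_ripple_join(r_rows: Sequence[Any], s_rows: Sequence[Any],
                     r_key: Callable[[Any], Hashable],
                     s_key: Callable[[Any], Hashable]) -> Iterator[Tuple[Any, Any]]:
    """Square hash ripple join for the equijoin r_key(r) == s_key(s)."""
    r_table: DefaultDict[Hashable, List[Any]] = defaultdict(list)
    s_table: DefaultDict[Hashable, List[Any]] = defaultdict(list)
    for step in range(max(len(r_rows), len(s_rows))):
        if step < len(r_rows):
            r = r_rows[step]
            k = r_key(r)
            r_table[k].append(r)
            for s in s_table.get(k, []):   # probe S tuples seen so far
                yield r, s
        if step < len(s_rows):
            s = s_rows[step]
            k = s_key(s)
            s_table[k].append(s)
            for r in r_table.get(k, []):   # probe R tuples seen so far, incl. this step's r
                yield r, s
```

Using `.get` rather than indexing avoids creating empty entries in the `defaultdict` for keys that have no matches yet.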

Statistics
As with online aggregation, ripple joins allow continuously updated running estimates. The estimator (e.g., for SUM or COUNT) is unbiased and consistent, while the running average is biased but consistent. Ripple joins are capable of giving tight confidence intervals, and the variance of the estimator can also be calculated.
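The point estimators can be illustrated as follows, assuming each relation is sampled uniformly without replacement and independently. Each sampled (r, s) pair then stands for |R|·|S| / (n_R·n_S) pairs of the full cross product, so scaling the sample totals gives unbiased SUM and COUNT estimates, and AVG is their ratio. Confidence intervals and the variance formula are not shown; the function name `ripple_estimates` is hypothetical.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

def ripple_estimates(sample_r: Sequence[Any], sample_s: Sequence[Any],
                     size_r: int, size_s: int,
                     predicate: Callable[[Any, Any], bool],
                     expr: Callable[[Any, Any], float]) -> Tuple[float, float, Optional[float]]:
    """Return (SUM estimate, COUNT estimate, AVG estimate) from the sampled tuples.

    SUM and COUNT scale the sample totals up to the full cross product (unbiased);
    AVG is SUM/COUNT, where the scale factor cancels (biased but consistent).
    """
    n_r, n_s = len(sample_r), len(sample_s)
    total, matches = 0.0, 0
    for r in sample_r:
        for s in sample_s:
            if predicate(r, s):
                total += expr(r, s)
                matches += 1
    scale = (size_r * size_s) / (n_r * n_s)
    avg = total / matches if matches else None
    return total * scale, matches * scale, avg
```

Recomputing over the whole sample is quadratic; an actual ripple join would maintain `total` and `matches` incrementally as each new pair is joined. Once the samples cover both relations, the scale factor is 1 and the estimates equal the exact answers.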

Optimization and Design
The aspect ratios can be chosen. The animation speed determines how quickly the rectangles are swept out. The aim is to keep the rate of updates high while making the confidence interval shrink as fast as possible.
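As a toy illustration of this trade-off, the sketch below brute-forces integer aspect ratios under an assumed cost model: the variance of the running estimate is taken to behave like d_R/β_R + d_S/β_S per step, for per-relation constants d_R and d_S, and the work per step is taken to grow with β_R·β_S, which must stay within a `budget` fixed by the animation speed. Both assumptions are simplifications for illustration, not the paper's exact optimization, and `choose_aspect_ratio` is a hypothetical name.

```python
from typing import Tuple

def choose_aspect_ratio(d_r: float, d_s: float, budget: int) -> Tuple[int, int]:
    """Return the integer (beta_r, beta_s) with beta_r * beta_s <= budget that
    minimizes the assumed per-step variance d_r / beta_r + d_s / beta_s."""
    best = (1, 1)
    best_var = d_r + d_s
    for beta_r in range(1, budget + 1):
        for beta_s in range(1, budget // beta_r + 1):   # keeps the product within budget
            var = d_r / beta_r + d_s / beta_s
            if var < best_var:
                best, best_var = (beta_r, beta_s), var
    return best

print(choose_aspect_ratio(9.0, 1.0, 9))
```

Under this model, the relation that contributes more variance (larger d) is sampled more heavily, which is what produces a non-square, rectangular ripple.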

Conclusion
Gives users visible progress of the query as it zones in on the average. A useful UI enhancement. Achieves a reasonable answer up to two orders of magnitude faster than conventional offline techniques. Sublinear confidence interval guarantee. Prototypes in Informix and IBM DB2.

References
Haas & Hellerstein, "Ripple Joins for Online Aggregation" (SIGMOD '99)
Haas & Hellerstein, "Online Query Processing: A Tutorial"
Elmasri & Navathe, "Fundamentals of Database Systems"
