Ripple Joins for Online Aggregation

Presentation transcript:

Ripple Joins for Online Aggregation. Peter J. Haas and Joseph M. Hellerstein. Presented by Joseph, Z.M. (CSE, UTA), 2/16/2006.

Ripple Joins: Introduction. A follow-up to Online Aggregation. Extends online aggregation with a family of join algorithms. Allows online aggregation to be used on multi-table queries.

Ripple Joins: Introduction. Targets queries of the form: SELECT op(expression) FROM R1, R2, …, RK WHERE predicate GROUP BY columns; Running estimates can be calculated from the statistical properties of the data already seen. The user can control how frequently the estimate is updated.

Ripple Join vs. Online Nested Loop. Problems with the online nested-loop join: if one table is large, the time between updates is long, and the confidence interval may not narrow enough. The ripple join avoids a complete relation scan between updates.

Ripple Join: Operation. Assume a ripple join of relations R and S. Select a random tuple r from R and join it with the previously selected S tuples. Then select a random tuple s from S and join it with the R tuples selected so far. Repeat.
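The steps above can be sketched in Python. This is a minimal in-memory illustration, not the authors' iterator: the function name `square_ripple_join`, the `predicate` callback, and the use of a one-time shuffle to stand in for random retrieval are all assumptions made for the example.

```python
import random
from typing import Callable, Iterator, List, Tuple


def square_ripple_join(
    r_rows: List[tuple],
    s_rows: List[tuple],
    predicate: Callable[[tuple, tuple], bool],
    seed: int = 0,
) -> Iterator[Tuple[tuple, tuple]]:
    """Yield matching (r, s) pairs in square ripple order (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: shuffling once and reading in order stands in for
    # drawing random tuples without replacement.
    r_perm = list(r_rows)
    s_perm = list(s_rows)
    rng.shuffle(r_perm)
    rng.shuffle(s_perm)

    seen_r: List[tuple] = []
    seen_s: List[tuple] = []
    for step in range(max(len(r_perm), len(s_perm))):
        if step < len(r_perm):
            r = r_perm[step]
            # The new R tuple joins with the previously selected S tuples.
            for s in seen_s:
                if predicate(r, s):
                    yield (r, s)
            seen_r.append(r)
        if step < len(s_perm):
            s = s_perm[step]
            # The new S tuple joins with every R tuple seen so far,
            # including the one just drawn, which fills the corner cell.
            for r in seen_r:
                if predicate(r, s):
                    yield (r, s)
            seen_s.append(s)
```

Every (r, s) pair is examined exactly once, so running the generator to completion produces the same result set as a nested-loop join; the difference is only the order in which pairs are visited.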

Ripple Join: Square Two-Table Join. [Figure: animation of a grid over the sampled tuples of R and S; X marks show the pairs examined after n = 1, 2, 3, and 4 sampling steps.]

Ripple Join: Operation. Thus it is like a nested-loop join, but it alternates between sampling and scanning from either relation. It can have various (non-unit) aspect ratios: selecting more samples from one table leads to a rectangular ripple. The aspect ratio is configurable by the user.
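A rectangular ripple can be sketched by drawing a different number of new tuples from each relation per step. The names `rectangular_ripple_join`, `beta_r`, and `beta_s` are hypothetical and chosen for this illustration; the sketch works in memory and does not model the iterator interface.

```python
import random
from typing import Callable, Iterator, List, Tuple


def rectangular_ripple_join(
    r_rows: List[tuple],
    s_rows: List[tuple],
    predicate: Callable[[tuple, tuple], bool],
    beta_r: int = 1,
    beta_s: int = 1,
    seed: int = 0,
) -> Iterator[Tuple[tuple, tuple]]:
    """Yield matching pairs; each step adds beta_r R tuples and beta_s S tuples."""
    if beta_r < 1 or beta_s < 1:
        raise ValueError("aspect ratio parameters must be at least 1")
    rng = random.Random(seed)
    r_perm = list(r_rows)
    s_perm = list(s_rows)
    rng.shuffle(r_perm)  # assumption: shuffle stands in for random retrieval
    rng.shuffle(s_perm)

    seen_r: List[tuple] = []
    seen_s: List[tuple] = []
    r_pos = s_pos = 0
    while r_pos < len(r_perm) or s_pos < len(s_perm):
        # Extend the rectangle along the R side.
        new_r = r_perm[r_pos:r_pos + beta_r]
        r_pos += len(new_r)
        for r in new_r:
            for s in seen_s:
                if predicate(r, s):
                    yield (r, s)
        seen_r.extend(new_r)
        # Extend the rectangle along the S side, covering the new R tuples too.
        new_s = s_perm[s_pos:s_pos + beta_s]
        s_pos += len(new_s)
        for s in new_s:
            for r in seen_r:
                if predicate(r, s):
                    yield (r, s)
        seen_s.extend(new_s)
```

With `beta_r = beta_s = 1` this reduces to the square case; a larger value on one side samples that relation faster.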

Enhanced Ripple Join Iterator: Rectangular. Requires special handling by the iterator to ensure that the ripple grows correctly.

Pipelined Ripple Join. Can easily be pipelined for multiple binary joins. Cannot do three-table joins as two binary ripple joins. The authors recommend additional steps to handle building such K-dimensional hyper-rectangles.

Block Ripple Join. Takes disk blocks of R and S in turn (not single tuples). Read a disk block of R, join it against the old S tuples, then evict it from memory. Read a block of S and join it with the older R tuples. Exactly the same growth pattern as the normal ripple join, except that each layer is thicker. Saves I/O, since each step processes a whole block at a time.

Further Variations of Ripple Joins. Index Ripple Join: identical to the index-enhanced nested-loop join. Hash Ripple Join: used only for equijoins.
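A hash ripple join for an equijoin can be sketched as follows. This is an illustrative in-memory version under stated assumptions: the function name `hash_ripple_join` and the key-extractor parameters `r_key` and `s_key` are hypothetical, and both hash tables are assumed to fit in memory.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterator, List, Tuple


def hash_ripple_join(
    r_rows: List[tuple],
    s_rows: List[tuple],
    r_key: Callable[[tuple], Hashable],
    s_key: Callable[[tuple], Hashable],
    seed: int = 0,
) -> Iterator[Tuple[tuple, tuple]]:
    """Yield (r, s) pairs with equal join keys, probing hash tables of seen tuples."""
    rng = random.Random(seed)
    r_perm = list(r_rows)
    s_perm = list(s_rows)
    rng.shuffle(r_perm)  # assumption: shuffle stands in for random retrieval
    rng.shuffle(s_perm)

    # One hash table per relation, holding the tuples seen so far.
    r_table: Dict[Hashable, List[tuple]] = defaultdict(list)
    s_table: Dict[Hashable, List[tuple]] = defaultdict(list)
    for step in range(max(len(r_perm), len(s_perm))):
        if step < len(r_perm):
            r = r_perm[step]
            # Probe only the S tuples that share r's key instead of scanning all.
            for s in s_table.get(r_key(r), ()):
                yield (r, s)
            r_table[r_key(r)].append(r)
        if step < len(s_perm):
            s = s_perm[step]
            for r in r_table.get(s_key(s), ()):
                yield (r, s)
            s_table[s_key(s)].append(s)
```

The probe replaces the scan over previously seen tuples, which is why this variant applies only when the join predicate is an equality on the keys.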

Statistics. As with online aggregation, ripple joins allow continuously updated running estimates. The SUM and COUNT estimators are unbiased and consistent. The running average (AVG) is biased but consistent. Capable of giving tight confidence intervals. The variance of the estimate can also be calculated.
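As a rough illustration of a running estimate, the sketch below scales the partial SUM over the pairs seen so far by |R|·|S| / n², where n is the number of tuples sampled from each relation. The function name `running_sum_estimates` and its parameters are hypothetical; the sketch covers only the square case, stops when the smaller relation is exhausted, and does not compute the confidence intervals or variance.

```python
import random
from typing import Callable, Iterator, List, Tuple


def running_sum_estimates(
    r_rows: List[tuple],
    s_rows: List[tuple],
    predicate: Callable[[tuple, tuple], bool],
    expression: Callable[[tuple, tuple], float],
    seed: int = 0,
) -> Iterator[Tuple[int, float]]:
    """Yield (n, estimate) after each square ripple step for SUM(expression)."""
    rng = random.Random(seed)
    r_perm = list(r_rows)
    s_perm = list(s_rows)
    rng.shuffle(r_perm)  # assumption: shuffle stands in for random sampling
    rng.shuffle(s_perm)

    total = 0.0
    for k in range(min(len(r_perm), len(s_perm))):
        r_new, s_new = r_perm[k], s_perm[k]
        # New row of the square: r_new against the earlier S tuples.
        for s in s_perm[:k]:
            if predicate(r_new, s):
                total += expression(r_new, s)
        # New column: s_new against all R tuples sampled so far.
        for r in r_perm[:k + 1]:
            if predicate(r, s_new):
                total += expression(r, s_new)
        n = k + 1
        # Scale the partial sum up from the n x n sample to the full cross product.
        scale = (len(r_perm) * len(s_perm)) / (n * n)
        yield n, scale * total
```

When both relations have the same size, the final estimate uses the whole cross product and therefore equals the exact SUM.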

Optimization and Design. The aspect ratios can be chosen. Animation speed: the rate at which rectangles are swept out. The aim is to maximize the rate of updates while making the confidence interval narrow as fast as possible.

Conclusion. Gives users visible progress of the query as it homes in on the average. A useful UI enhancement. Reaches a reasonable answer up to two orders of magnitude faster than normal offline techniques. Sublinear confidence interval guarantee. Prototypes in Informix, IBM DB2.

References. Haas & Hellerstein, “Ripple Joins for Online Aggregation” (SIGMOD ’99). Haas & Hellerstein, “Online Query Processing: A Tutorial”. Elmasri & Navathe, “Fundamentals of Database Systems”.