Query Execution 15.5 Two-pass Algorithms based on Hashing By Swathi Vegesna.

Slides:



Advertisements
Similar presentations
Two-Pass Algorithms Based on Sorting
Advertisements

1 Lecture 23: Query Execution Friday, March 4, 2005.
15.8 Algorithms using more than two passes Presented By: Seungbeom Ma (ID 125) Professor: Dr. T. Y. Lin Computer Science Department San Jose State University.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Implementation of Other Relational Algebra Operators, R. Ramakrishnan and J. Gehrke1 Implementation of other Relational Algebra Operators Chapter 12.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Bhargav Vadher (208) APRIL 9 th, 2008 Submittetd To: Dr. T Y Lin Computer Science Department San Jose State University.
Dr. Kalpakis CMSC 661, Principles of Database Systems Query Execution [15]
Completing the Physical-Query-Plan. Query compiler so far Parsed the query. Converted it to an initial logical query plan. Improved that logical query.
COMP 451/651 Optimizing Performance
Greedy Algo. for Selecting a Join Order The "greediness" is based on the idea that we want to keep the intermediate relations as small as possible at each.
Nested-Loop joins “one-and-a-half” pass method, since one relation will be read just once. Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in.
Lecture 24: Query Execution Monday, November 20, 2000.
ONE PASS ALGORITHM PRESENTED BY: PRADHYUMAN RAOL ID : 114 Instructor: Dr T.Y. LIN.
1 Query Processing Two-Pass Algorithms Source: our textbook.
Parallel Algorithms for Relational Operations. Many processors...and disks There is a collection of processors. –Often the number of processors p is large,
One Pass Algorithm Presented By: Presented By: Farzana Forhad Farzana Forhad ID : 107.
ONE PASS ALGORITHM PRESENTED BY: PRADHYUMAN RAOL ID : 114 Instructor: Dr T.Y. LIN.
1 Lecture 22: Query Execution Wednesday, March 2, 2005.
15.5 Two-Pass Algorithms Based on Hashing 115 ChenKuang Yang.
Query Execution :Nested-Loop Joins Rohit Deshmukh ID 120 CS-257 Rohit Deshmukh ID 120 CS-257.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Compiler: 16.7 Completing the Physical Query-Plan CS257 Spring 2009 Professor Tsau Lin Student: Suntorn Sae-Eung ID: 212.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
Parallel Algorithms for Relational Operations Class ID: 21 Name: Shujia Zhang.
15.3 Nested-Loop Joins - Medha Pradhan - ID: CS 257 Section 2 - Spring 2008.
Sorting and Query Processing Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 29, 2005.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
CSE 444: Lecture 24 Query Execution Monday, March 7, 2005.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 242 Database Systems II Query Execution.
CPS216: Advanced Database Systems Notes 06:Query Execution (Sort and Join operators) Shivnath Babu.
CSCE Database Systems Chapter 15: Query Execution 1.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
CPS216: Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
CPS216: Advanced Database Systems Notes 07:Query Execution (Sort and Join operators) Shivnath Babu.
CS4432: Database Systems II Query Processing- Part 3 1.
CS411 Database Systems Kazuhiro Minami 11: Query Execution.
Chapter 6 Query Execution. Query Query Compilation (Chapter 7 ) query plan Query execution metadata ( Chapter 6 ) data the major parts Of the query processor.
CS 257 Chapter – 15.9 Summary of Query Execution Database Systems: The Complete Book Krishna Vellanki 124.
Lecture 24 Query Execution Monday, November 28, 2005.
Multi pass algorithms. Nested-Loop joins Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in S DO FOR each tuple r in R DO IF r and s join to.
CS4432: Database Systems II Query Processing- Part 2.
CPS216: Advanced Database Systems Notes 07:Query Execution (Sort and Join operators) Shivnath Babu.
CSCE Database Systems Chapter 15: Query Execution 1.
Query Processing CS 405G Introduction to Database Systems.
Lecture 17: Query Execution Tuesday, February 28, 2001.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
CS 540 Database Management Systems
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Processing Spring 2016.
1 Lecture 23: Query Execution Monday, November 26, 2001.
Query Processing COMP3017 Advanced Databases Nicholas Gibbins
CS4432: Database Systems II Query Processing- Part 1 1.
Two-Pass Algorithms Based on Sorting
15.5 Two-Pass Algorithms Based on Hashing
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
Query Execution Two-pass Algorithms based on Hashing
(Two-Pass Algorithms)
One-Pass Algorithms for Database Operations (15.2)
Lecture 24: Query Execution
Lecture 23: Query Execution
Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
Lecture 22: Query Execution
Lecture 22: Query Execution
CPSC-608 Database Systems
Lecture 11: B+ Trees and Query Execution
Lecture 22: Friday, November 22, 2002.
Lecture 24: Query Execution
Lecture 20: Query Execution
Presentation transcript:

Query Execution 15.5 Two-pass Algorithms based on Hashing By Swathi Vegesna

Introduction Hashing is done if the data is too big to store in main memory buffers. – Hash all the tuples of the argument(s) using an appropriate hash key. – For all the common operations, there is a way to select the hash key so all the tuples that need to be considered together when we perform the operation have the same hash value. – This reduces the size of the operand(s) by a factor equal to the number of buckets.

Partitioning Relations by Hashing Algorithm: initialize M-1 buckets using M-1 empty buffers; FOR each block b of relation R DO BEGIN read block b into the Mth buffer; FOR each tuple t in b DO BEGIN IF the buffer for bucket h(t) has no room for t THEN BEGIN copy the buffer t o disk; initialize a new empty block in that buffer; END; copy t to the buffer for bucket h(t); END ; FOR each bucket DO IF the buffer for this bucket is not empty THEN write the buffer to disk;

Duplicate Elimination For the operation δ(R) hash R to M-1 Buckets. (Note that two copies of the same tuple t will hash to the same bucket) Do duplicate elimination on each bucket R i independently, using one-pass algorithm The result is the union of δ(R i ), where R i is the portion of R that hashes to the ith bucket

Requirements Number of disk I/O's: 3*B(R) – B(R) < M(M-1), only then the two-pass, hash-based algorithm will work In order for this to work, we need: – hash function h evenly distributes the tuples among the buckets – each bucket R i fits in main memory (to allow the one-pass algorithm) – i.e., B(R) ≤ M 2

Grouping and Aggregation Hash all the tuples of relation R to M-1 buckets, using a hash function that depends only on the grouping attributes (Note: all tuples in the same group end up in the same bucket) Use the one-pass algorithm to process each bucket independently Uses 3*B(R) disk I/O's, requires B(R) ≤ M 2

Union, Intersection, and Difference For binary operation we use the same hash function to hash tuples of both arguments. R U S we hash both R and S to M-1 R ∩ S we hash both R and S to 2(M-1) R-S we hash both R and S to 2(M-1) Requires 3(B(R)+B(S)) disk I/O’s. Two pass hash based algorithm requires min(B(R)+B(S))≤ M2

Hash-Join Algorithm Use same hash function for both relations; hash function should depend only on the join attributes Hash R to M-1 buckets R 1, R 2, …, R M-1 Hash S to M-1 buckets S 1, S 2, …, S M-1 Do one-pass join of R i and S i, for all i 3*(B(R) + B(S)) disk I/O's; min(B(R),B(S)) ≤ M 2

Sort based Vs Hash based For binary operations, hash-based only limits size to min of arguments, not sum Sort-based can produce output in sorted order, which can be helpful Hash-based depends on buckets being of equal size Sort-based algorithms can experience reduced rotational latency or seek time