CPSC-608 Database Systems


CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #30

Algorithms Implementing Relational Algebraic Operations
Projection and selection: π, σ
Set/bag operations: ∪S, ∩S, −S, ∪B, ∩B, −B
Join operations: ×, ⋈C, ⋈
Extended operations: γ, δ, τ, table-scan

Algorithms Implementing Relational Algebraic Operations: Quick Review
What we did:
Operations requiring almost no space: π, σ, ∪B, table-scan
  Memory: M = 2
  Cost: π(R), σ(R), table-scan(R): B(R); ∪B(R, S): B(R) + B(S)
One-pass algorithms: γ, δ, τ, ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
  Unary: Memory: M ≥ B(R); Cost: B(R)
  Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)
Nested-loop algorithms for binary operations: ∪S, ∩S, −S, ×, ⋈C, ⋈
  Memory: M ≥ 2; Cost: B(R)*B(S)/M + B(R)

Two-pass algorithms
Condition: large relations that cannot fit into the main memory M, but are not extremely large.
Basic ideas:
Break the relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk;
Apply the operation by "merging" the blocks from the smaller and more organized pieces.
Two main techniques: sort-based; hash-based.

Two-pass sort-based algorithms
General framework:
1. Repeat (Phase 1: making sorted sublists)
   Fill the main memory with tuples of a relation;
   Make them a sorted sublist;
   Write the sorted sublist back to disk.
2. Repeat (Phase 2: merging)
   Bring in a block from each of the sorted sublists;
   Apply the operation by "merging" the sorted blocks.

Unary operations: γ(R), δ(R), τ(R)
Summary: Memory: M ≥ √B(R); Cost: 3B(R)
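The two phases can be sketched for τ(R) as follows. This is a minimal in-memory simulation, not the course's implementation: the relation is assumed to be a list of blocks of integers, `two_pass_sort` and `relation_blocks` are hypothetical names, and the output buffer is ignored so that the bound matches the M ≥ √B(R) summary.

```python
import heapq
from typing import List, Sequence


def two_pass_sort(blocks: Sequence[Sequence[int]], m: int) -> List[int]:
    """Sort a relation stored as blocks, using at most m blocks of memory."""
    # Phase 1: fill memory with up to m blocks, sort them, and keep the
    # result as one sorted sublist (standing in for a write back to disk).
    sublists: List[List[int]] = []
    for start in range(0, len(blocks), m):
        chunk = [t for block in blocks[start:start + m] for t in block]
        sublists.append(sorted(chunk))

    # Phase 2 holds one block per sublist, so at most m sublists are allowed.
    # (Assumption: the output buffer is not counted, as in the slide summary.)
    if len(sublists) > m:
        raise ValueError("relation too large for two passes with this memory")

    # Merge: repeatedly output the smallest remaining head element.
    return list(heapq.merge(*sublists))


relation_blocks = [[9, 4], [7, 1], [8, 2], [6, 3], [5, 0]]  # B(R) = 5 blocks
print(two_pass_sort(relation_blocks, m=3))
```

With B(R) = 5 and m = 3 there are two sublists, so the merge fits in memory; with m = 2 there would be three sublists and the sketch raises an error, which mirrors the M ≥ √B(R) condition.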

Binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
[Figure: main memory and the relations R and S on disk, shown during Phase I, at the end of Phase I, and during Phase II.]

R ∪S S (Phase II):
REPEAT
  IF Rmin = Smin
  THEN send one copy to output; delete both;
  ELSE send the smaller to output, and delete it.
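The merge step for set union can be sketched as below. It assumes each relation is already available as one sorted, duplicate-free stream (for example, the merged heads of its sorted sublists); `merge_union` is a hypothetical name, and Python lists stand in for the block buffers.

```python
from typing import List


def merge_union(r: List[int], s: List[int]) -> List[int]:
    """Set union of two sorted, duplicate-free lists by comparing minima."""
    out: List[int] = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] == s[j]:
            # Rmin = Smin: send one copy to output, delete both.
            out.append(r[i])
            i += 1
            j += 1
        elif r[i] < s[j]:
            # Send the smaller to output and delete it.
            out.append(r[i])
            i += 1
        else:
            out.append(s[j])
            j += 1
    # One input is exhausted; everything left in the other belongs to the union.
    out.extend(r[i:])
    out.extend(s[j:])
    return out


print(merge_union([1, 3, 5, 7], [2, 3, 6, 7, 9]))  # [1, 2, 3, 5, 6, 7, 9]
```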

R ∩S S (Phase II):
REPEAT
  IF Rmin = Smin
  THEN send one copy to output; delete both;
  ELSE delete the smaller.
The same algorithm works for R ∩B S!
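A minimal sketch of the intersection merge, under the same assumption that each relation is presented as one sorted stream. `merge_intersect` is a hypothetical name. "Delete both" is read here as deleting one copy from each side, which is what makes the same loop valid for bags.

```python
from typing import List


def merge_intersect(r: List[int], s: List[int]) -> List[int]:
    """Intersection of two sorted lists; works for sets and for bags."""
    out: List[int] = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] == s[j]:
            # Rmin = Smin: send one copy to output, delete one copy from each.
            out.append(r[i])
            i += 1
            j += 1
        elif r[i] < s[j]:
            i += 1  # delete the smaller
        else:
            j += 1
    return out


print(merge_intersect([1, 3, 5], [3, 4, 5]))              # set case: [3, 5]
print(merge_intersect([1, 2, 2, 2, 5], [2, 2, 3, 5, 5]))  # bag case: [2, 2, 5]
```

In the bag case each value appears in the output as many times as the smaller of its two multiplicities.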

R −S S (Phase II):
REPEAT
  IF Rmin = Smin
  THEN delete both;
  ELSE IF Rmin > Smin
  THEN delete Smin;
  ELSE \\ Rmin < Smin
    send Rmin to output, and delete Rmin.
The same algorithm works for R −B S!
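The difference merge can be sketched in the same style. `merge_difference` is a hypothetical name, and the inputs are again assumed to be single sorted streams. The final step (copying what remains of R once S is exhausted) is not spelled out on the slide; it is added here because it follows from the definition of difference.

```python
from typing import List


def merge_difference(r: List[int], s: List[int]) -> List[int]:
    """R minus S for two sorted lists; works for sets and for bags."""
    out: List[int] = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] == s[j]:
            # Rmin = Smin: delete both (one copy from each side).
            i += 1
            j += 1
        elif r[i] > s[j]:
            j += 1  # delete Smin
        else:
            # Rmin < Smin: no matching S tuple can follow, so output Rmin.
            out.append(r[i])
            i += 1
    # S is exhausted: every remaining R tuple survives.
    out.extend(r[i:])
    return out


print(merge_difference([1, 2, 2, 2, 5, 8], [2, 2, 3, 5]))  # [1, 2, 8]
```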

R × S and R ⋈C S:
Sorted sublists do not seem to help in computing R × S: each tuple of R should be joined with all tuples of S.
The same is true for R ⋈C S.
Simply use the nested-loop algorithm: Memory: M ≥ 2; Cost: B(R)B(S)/M + B(R)

R ⋈ S:
How about R ⋈ S?
In extreme cases (a large number of tuples are joinable), the sort-based algorithm does not work for ⋈. On the other hand, most practical and interesting cases are not that extreme.

R ⋈ S (Phase II):
\\ sublists are sorted by the join attributes.
REPEAT
  IF Rmin = Smin
  THEN collect all Rmin-tuples and all Smin-tuples, and send their join to the output;
    delete all Rmin-tuples and Smin-tuples;
  ELSE delete the smaller;
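The join merge can be sketched as follows. The tuple layout `(join key, payload)` and the name `merge_join` are assumptions for illustration, and both inputs are taken to be already sorted by the join key. Collecting all tuples with the current key is the step that needs them to fit in memory, which is why the extreme case with very many joinable tuples breaks the method.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload): a hypothetical tuple layout


def merge_join(r: List[Row], s: List[Row]) -> List[Tuple[int, str, str]]:
    """Join two lists that are sorted by their join key (first field)."""
    out: List[Tuple[int, str, str]] = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1  # delete the smaller
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Collect all R tuples and all S tuples that carry this key.
            i_end = i
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            # Send their join to the output, then delete both groups.
            for _, a in r[i:i_end]:
                for _, b in s[j:j_end]:
                    out.append((key, a, b))
            i, j = i_end, j_end
    return out


r_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s_rows = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r_rows, s_rows))
```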

Summary (binary operations): Memory: M ≥ √(B(R) + B(S)); Cost: 3(B(R) + B(S))
Comments:
1. Applicable unless the relations are extremely large; in that case we can extend the method to more than two passes;
2. The output is sorted.
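A small calculation makes the summary concrete and compares it with the nested-loop cost quoted earlier for R × S. The function names and the block counts are made up for illustration; the formulas are the ones from the slides, and costs are counted in block I/Os.

```python
from typing import Tuple


def sort_based_binary(b_r: int, b_s: int, m: int) -> Tuple[bool, int]:
    """Return (feasible, cost) for a two-pass sort-based binary operation."""
    feasible = m * m >= b_r + b_s  # same as M >= sqrt(B(R) + B(S))
    cost = 3 * (b_r + b_s)         # 3(B(R) + B(S)) block I/Os
    return feasible, cost


def nested_loop_cost(b_r: int, b_s: int, m: int) -> float:
    """Nested-loop cost B(R)B(S)/M + B(R), as given on the slides."""
    return b_r * b_s / m + b_r


# Hypothetical sizes: B(R) = 1000 blocks, B(S) = 500 blocks, M = 100 blocks.
print(sort_based_binary(1000, 500, 100))  # (True, 4500)
print(nested_loop_cost(1000, 500, 100))   # 6000.0
```

With these sizes the two-pass method is feasible for any M of at least 39 blocks, since 39² = 1521 ≥ 1500 while 38² = 1444 falls short.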

Two-pass algorithms: the two main techniques are sort-based and hash-based.
Two-pass sort-based algorithms: γ, δ, τ, ∪S, ∩S, −S, ∩B, −B
Unary: Memory: M ≥ √B(R); Cost: 3B(R)
Binary: Memory: M ≥ √(B(R) + B(S)); Cost: 3(B(R) + B(S))

Two-pass hash-based algorithms
General framework:
1. (Phase 1: making hash buckets)
   Hash tuples into M buckets (using one block for each bucket);
   Write the buckets back to disk.
2. (Phase 2: bucketwise operation)
   Apply the operation based on buckets.
[Figure: Phase I, tuples of R pass through main memory and are hashed into buckets on disk.]
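Phase 1 can be sketched as a simple partitioning step. This is a simplification: a real implementation would keep one block-sized buffer per bucket and flush it to disk when full, whereas this sketch keeps whole buckets as Python lists. The name `partition` and the use of Python's built-in `hash` are assumptions.

```python
from typing import Hashable, Iterable, List


def partition(tuples: Iterable[Hashable], m: int) -> List[List[Hashable]]:
    """Hash every tuple into one of m buckets (Phase 1)."""
    buckets: List[List[Hashable]] = [[] for _ in range(m)]
    for t in tuples:
        # Equal tuples have equal hashes, so they always share a bucket.
        buckets[hash(t) % m].append(t)
    return buckets


buckets = partition([10, 3, 7, 3, 12, 7, 4], m=4)
for number, bucket in enumerate(buckets):
    print(number, bucket)
```

The property Phase 2 relies on is that all copies of a value end up in the same bucket, so each bucket can be processed on its own.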

One-pass algorithms (recalled for use on each bucket): γ, δ, τ, ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
Unary: Memory: M ≥ B(R); Cost: B(R)
Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)

Unary operations: γ(R), δ(R), τ(R)
[Figure: in Phase I, R is hashed through main memory into Bucket 1, Bucket 2, ..., Bucket M on disk; in Phase II, the buckets are read back into main memory one bucket at a time.]

τ(R):
The hash-based algorithm does not apply to sorting.

δ(R):
\\ all duplicates are in the same bucket.
Call the one-pass algorithm on each bucket, one bucket at a time.
Assumed condition: M is large enough to hold an entire bucket.
This also works for γ(R) if the hash is based on the grouping attributes.
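Both phases together can be sketched for δ(R). `hash_distinct` is a hypothetical name, the buckets are kept as in-memory lists rather than written to disk, and the per-bucket `set` stands in for the one-pass duplicate-elimination algorithm.

```python
from typing import Hashable, Iterable, List


def hash_distinct(tuples: Iterable[Hashable], m: int) -> List[Hashable]:
    """Two-pass hash-based duplicate elimination."""
    # Phase 1: hash tuples into m buckets; duplicates share a bucket.
    buckets: List[List[Hashable]] = [[] for _ in range(m)]
    for t in tuples:
        buckets[hash(t) % m].append(t)

    # Phase 2: run one-pass duplicate elimination on one bucket at a time.
    # (Assumed condition from the slides: each bucket fits in memory.)
    out: List[Hashable] = []
    for bucket in buckets:
        seen = set()
        for t in bucket:
            if t not in seen:
                seen.add(t)
                out.append(t)
    return out


print(sorted(hash_distinct([10, 3, 7, 3, 12, 7, 4], m=4)))  # [3, 4, 7, 10, 12]
```

Unlike the sort-based version, the result comes out in bucket order rather than sorted order, which is why the example sorts it before printing.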

Unary operations: γ(R), δ(R)
Assumed condition: M is large enough to hold an entire bucket (assume a good hash function).
Summary: Memory: M ≥ √B(R); Cost: 3B(R)
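The summary figures can be traced back to the assumed condition with a short derivation. It is a sketch under the slides' assumptions: M buckets, a hash function that spreads R evenly, and the usual convention that writing the final output is not counted.

```latex
% A good hash function gives buckets of roughly equal size:
\text{size of one bucket} \approx \frac{B(R)}{M}
% Phase 2 must hold an entire bucket in memory:
\frac{B(R)}{M} \le M
\;\Longrightarrow\; M^2 \ge B(R)
\;\Longrightarrow\; M \ge \sqrt{B(R)}
% Cost in block I/Os: read R, write the buckets, read the buckets back:
\text{Cost} = B(R) + B(R) + B(R) = 3B(R)
```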