CPSC-608 Database Systems

Presentation transcript:

CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #31

Algorithms Implementing Relational Algebraic Operations
Projection and selection: π, σ
Set/bag operations: ∪S, ∩S, −S, ∪B, ∩B, −B
Join operations: ×, ⋈C, ⋈
Extended operations: γ, δ, τ, table-scan

Two-pass algorithms
Condition: large relations that cannot fit into the main memory M, but are not extremely large.
Basic ideas:
Break the relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk;
Apply the operation based on "merging" the blocks from the smaller and more organized pieces.
Two main techniques: sort-based; hash-based.

Two-pass hash-based algorithms
General framework:
1. (Phase 1: making hash buckets) Hash the tuples into M buckets (using one block for each bucket); write the buckets back to disk.
2. (Phase 2: bucketwise operation) Apply the operation bucket by bucket.

For comparison, the one-pass algorithms for γ, δ, τ, ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈:
Unary: Memory: M ≥ B(R); Cost: B(R)
Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)

Two-pass hash-based algorithms: unary operations γ(R), δ(R)
Phase I: hash the tuples of R into M buckets and write the buckets back to disk.
Phase II: read one bucket at a time into the main memory and apply the one-pass algorithm to it.
Assumed condition: M is large enough to hold an entire bucket (assuming a good hash function, each bucket has about B(R)/M blocks, so this requires B(R)/M ≤ M).
Summary: Memory: M ≥ √B(R); Cost: 3B(R) (read R, write the buckets, read the buckets back)
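
The two phases for δ(R) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the "disk" is modeled by in-memory lists, and the function name `two_pass_hash_distinct` and its `num_buckets` parameter are hypothetical, not taken from the slides.

```python
from collections import defaultdict

def two_pass_hash_distinct(tuples, num_buckets):
    """Sketch of two-pass hash-based duplicate elimination (delta).

    Buckets stand in for the bucket files that Phase 1 writes to disk.
    """
    # Phase 1: route every tuple to a bucket by its hash value.
    buckets = defaultdict(list)
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)

    # Phase 2: handle one bucket at a time with the one-pass algorithm
    # (an in-memory set). Equal tuples always hash to the same bucket,
    # so no duplicate can survive across buckets.
    result = []
    for i in range(num_buckets):
        seen = set()
        for t in buckets.get(i, []):
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result

rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
distinct_rows = two_pass_hash_distinct(rows, num_buckets=3)
```

The output order depends on the hash values, which matches the later comment that the output of hash-based algorithms is not sorted.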

Two-pass hash-based algorithms: binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
Phase I: hash R into M buckets and S into M buckets, using the same hash function for both, and write all buckets back to disk.
Phase II: for each bucket index i, process the Ri-bucket together with the Si-bucket. Again we can apply the one-pass algorithm if the smaller of the two buckets fits in M.
Assumed condition: M is large enough to hold the smaller bucket.
The hash-based algorithm does not work for × and ⋈C, because tuples that must be combined can land in different buckets. This leaves ∪S, ∩S, −S, ∩B, −B, ⋈.

R ∪S S \\ the same works for ∩S, −S, ∩B, −B
FOR each bucket index i DO
    call the one-pass algorithm on the Ri-bucket and the Si-bucket

R ⋈ S \\ hash based on the join attributes
FOR each bucket index i DO
    call the one-pass algorithm on the Ri-bucket and the Si-bucket

Summary: Memory: M ≥ √B(Rsmall); Cost: 3(B(R) + B(S))
Comments: Memory use is better than sort-based; the output is not sorted; requires a good hash function.
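
A minimal Python sketch of the bucketwise join R ⋈ S described above. It assumes tuples are Python tuples joined on one field each, models the on-disk buckets as lists, and always builds the in-memory table on the S-bucket (that is, it assumes S is the smaller relation). The names `partition` and `two_pass_hash_join` are hypothetical.

```python
from collections import defaultdict

def partition(relation, key_index, num_buckets):
    """Phase 1: hash each tuple on its join attribute into a bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for t in relation:
        buckets[hash(t[key_index]) % num_buckets].append(t)
    return buckets

def two_pass_hash_join(r, s, r_key, s_key, num_buckets):
    """Sketch of a two-pass hash join; lists stand in for disk buckets."""
    r_buckets = partition(r, r_key, num_buckets)
    s_buckets = partition(s, s_key, num_buckets)

    out = []
    # Phase 2: matching tuples share a bucket index, so each pair
    # (R_i, S_i) can be joined independently with the one-pass algorithm.
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        table = defaultdict(list)      # in-memory table on the S-bucket
        for s_t in s_bucket:
            table[s_t[s_key]].append(s_t)
        for r_t in r_bucket:           # stream the R-bucket past it
            for s_t in table.get(r_t[r_key], []):
                out.append(r_t + s_t)
    return out

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r"), (4, "t")]
joined = two_pass_hash_join(r, s, r_key=0, s_key=0, num_buckets=4)
```

Both relations must be partitioned with the same hash function on the join attributes; otherwise matching tuples could end up under different bucket indices.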

Two-pass hash-based algorithms: a trick to save some disk I/O's
1. Use fewer buckets to leave some free M space;
2. Use the free M space to hold some buckets so they are not written back to disk (so saving disk I/O's);
3. Operations on these buckets are performed directly.

Example: M = 5, keep k = 2 buckets in M, only write D = M − k = 3 buckets back to disk.
Phase I on S: the k S-buckets kept in M are not written back to disk; buckets 1, ..., D are written to disk.
Phase I on R: R-tuples that hash to buckets 1, ..., D are written to disk; R-tuples that hash to the k in-memory buckets are operated on directly with the S-tuples held there.
End of Phase I: only D pairs of buckets are left for Phase II.
So each block of tuples in these k = 2 buckets saves two disk I/O's (one write in Phase I and one read in Phase II).
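
The trick can be illustrated with a small Python simulation. It is a sketch under stated assumptions: the join is on field 0, lists stand in for disk buckets, savings are counted in tuples rather than blocks, and the names `hybrid_hash_join`, `num_buckets`, and `k` are hypothetical.

```python
from collections import defaultdict

def hybrid_hash_join(r, s, num_buckets, k):
    """Join r and s on field 0, keeping S-buckets 0..k-1 in memory.

    Returns (joined tuples, number of tuples spilled to 'disk').
    """
    resident = defaultdict(list)               # S-tuples of the k kept buckets
    s_disk = [[] for _ in range(num_buckets)]  # spilled S-buckets
    r_disk = [[] for _ in range(num_buckets)]  # spilled R-buckets
    spilled = 0
    out = []

    # Phase I on S: keep k buckets in memory, spill the other D buckets.
    for t in s:
        b = hash(t[0]) % num_buckets
        if b < k:
            resident[t[0]].append(t)
        else:
            s_disk[b].append(t)
            spilled += 1

    # Phase I on R: tuples of resident buckets are joined immediately.
    for t in r:
        b = hash(t[0]) % num_buckets
        if b < k:
            out.extend(t + m for m in resident.get(t[0], []))
        else:
            r_disk[b].append(t)
            spilled += 1

    # Phase II: only the D = num_buckets - k spilled bucket pairs remain.
    for b in range(k, num_buckets):
        table = defaultdict(list)
        for t in s_disk[b]:
            table[t[0]].append(t)
        for t in r_disk[b]:
            out.extend(t + m for m in table.get(t[0], []))
    return out, spilled

r = [(i, "r") for i in range(10)]
s = [(i, "s") for i in range(0, 10, 2)]
with_trick, spilled_with = hybrid_hash_join(r, s, num_buckets=5, k=2)
without_trick, spilled_without = hybrid_hash_join(r, s, num_buckets=5, k=0)
```

With 5 buckets and k = 2, as in the slide example, fewer tuples are spilled than with k = 0 while the join result is the same; every tuple that is not spilled avoids one write and one read.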

Two-pass hash-based algorithms: how many (k) buckets should be left in M?
1. The number itself does not matter. More critical is the amount of M space used for holding these buckets;
2. Larger k → smaller buckets → more buckets → more bucket blocks in M → less M space for holding buckets;
3. So k should be as small as possible: k = 1.
Each block of tuples in this bucket saves two disk I/O's.

Two-pass hash-based algorithms: how many disk I/O's have we saved?
1. Let k = 1, bucket size = M/2;
2. #buckets = M/2 + 1 ≈ M/2 (so B(S) ≈ M²/4);
3. S saves 2·(M/2) = M = B(S)·(M/B(S)) disk I/O's;
4. R saves 2B(R)/(M/2) = B(R)·(M/B(S)) disk I/O's;
5. Original cost without these savings: 3(B(R) + B(S));
6. So now the cost is (3 − M/B(S))(B(R) + B(S)) (with a memory requirement M ≥ 2√B(S)).
The trick can also be applied to unary operations, reducing the cost to 3B(R) − M.
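
As a quick numeric check of the cost formulas above, the following Python sketch evaluates both costs. The function names and the sample sizes (B(R) = 20000, B(S) = 10000, M = 200) are made-up illustration values, chosen so that B(S) = M²/4 as the derivation assumes.

```python
import math

def two_pass_hash_cost(b_r, b_s):
    """Plain two-pass hash cost: 3(B(R) + B(S)) disk I/O's."""
    return 3 * (b_r + b_s)

def hybrid_hash_cost(b_r, b_s, m):
    """Cost with one bucket kept in memory: (3 - M/B(S))(B(R) + B(S)).

    Follows the slide derivation, which assumes B(S) is about M^2/4
    and requires M >= 2*sqrt(B(S)).
    """
    if m < 2 * math.sqrt(b_s):
        raise ValueError("memory requirement M >= 2*sqrt(B(S)) not met")
    return 3 * (b_r + b_s) - m * (b_r + b_s) / b_s

plain = two_pass_hash_cost(20000, 10000)
hybrid = hybrid_hash_cost(20000, 10000, 200)
saved = plain - hybrid
# S saves M = 200 and R saves 2B(R)/(M/2) = 400, so 600 in total.
```

The saving is small relative to the total cost here because M is much smaller than B(S); the factor M/B(S) grows as more memory becomes available.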

Two-pass hash-based algorithms
How many disk I/Os have we saved? In general:
1. Let bucket size = c·M, where 0 < c < 1.
   #buckets = (1 − c)M + 1 ≈ (1 − c)M, so B(S) = c(1 − c)M²;
   c is the larger root of c² − c + B(S)/M² = 0.
2. S saves 2c·M = 2c·B(S)·(M/B(S)) disk I/Os.
3. R saves 2B(R)/((1 − c)M) = 2c·B(R)·(M/B(S)) disk I/Os.
4. Original cost without these savings: 3(B(R) + B(S)).
5. So the cost is now (3 − 2c·M/B(S))(B(R) + B(S)), with a memory requirement M ≥ √(B(S)/(c(1 − c))).
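The general cost formula can be evaluated numerically. The sketch below solves c² − c + B(S)/M² = 0 for its larger root and substitutes it into (3 − 2c·M/B(S))(B(R) + B(S)). The function names are hypothetical, and the sketch assumes 2√B(S) ≤ M < B(S); for larger M a one-pass algorithm applies instead and the formula is no longer meaningful.

```python
import math

def hybrid_fraction(B_S, M):
    """Larger root c of c^2 - c + B(S)/M^2 = 0 (bucket size is c*M)."""
    disc = 1 - 4 * B_S / M**2
    if disc < 0:
        # Corresponds to M < 2*sqrt(B(S)): the memory requirement is not met.
        raise ValueError("M is below 2*sqrt(B(S))")
    return (1 + math.sqrt(disc)) / 2

def hybrid_hash_cost(B_R, B_S, M):
    """Disk I/O cost (3 - 2c*M/B(S)) * (B(R) + B(S)) from the slide."""
    c = hybrid_fraction(B_S, M)
    return (3 - 2 * c * M / B_S) * (B_R + B_S)
```

When M = 2√B(S) the discriminant is zero, c = 1/2, and the cost reduces to (3 − M/B(S))(B(R) + B(S)), the special case with bucket size M/2.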

Algorithms Implementing Relational Algebraic Operations: Summary

Operations requiring almost no space: π, σ, ∪_B, table-scan
  Memory: M = 2
  Cost: π(R), σ(R), table-scan(R): B(R); ∪_B(R, S): B(R) + B(S)

One-pass algorithms: γ, δ, τ, ∪_S, ∩_S, −_S, ∩_B, −_B, ×, ⋈_C, ⋈
  Unary: Memory: M ≥ B(R); Cost: B(R)
  Binary: Memory: M ≥ B(R_small); Cost: B(R_small) + B(R_large)

Nested-loop algorithms (for binary operations): ∪_S, ∩_S, −_S, ×, ⋈_C, ⋈
  Memory: M ≥ 2; Cost: B(R)·B(S)/M + B(R)

Two-pass sort-based algorithms: γ, δ, τ, ∪_S, ∩_S, −_S, ∩_B, −_B
  Unary: Memory: M ≥ √B(R); Cost: 3B(R)
  Binary: Memory: M ≥ √(B(R) + B(S)); Cost: 3(B(R) + B(S))

Two-pass hash-based algorithms: γ, δ, ∪_S, ∩_S, −_S, ∩_B, −_B
  Unary: Memory: M ≥ √B(R); Cost: 3B(R)
  Binary: Memory: M ≥ √B(R_small); Cost: 3(B(R_small) + B(R_large))

Two-pass hybrid hash-based algorithms: γ, δ, ∪_S, ∩_S, −_S, ∩_B, −_B
  Unary: Memory: M ≥ 2√B(R); Cost: 3B(R) − M
  Binary: Memory: M ≥ √(B(R_small)/(c(1 − c))); Cost: (3 − 2c·M/B(R_small))·(B(R_small) + B(R_large))
  (c is the larger root of c² − c + B(R_small)/M² = 0)

All operations: π, σ, ∪_S, ∩_S, −_S, ∪_B, ∩_B, −_B, γ, δ, τ, table-scan, ×, ⋈_C, ⋈
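The summary can be read as a lookup table. As a rough sketch, the function below lists, for a binary operation, every algorithm whose memory requirement from the summary is met, together with its cost in disk I/Os. The function name and dictionary keys are made up for the example, the hybrid variant is left out, and the sketch does not check whether a given algorithm applies to a particular operator.

```python
import math

def binary_op_options(B_R, B_S, M):
    """Map algorithm name -> disk I/O cost, for algorithms that fit in M."""
    small = min(B_R, B_S)
    total = B_R + B_S
    options = {}
    if M >= small:                      # one-pass: smaller relation fits in memory
        options["one-pass"] = total
    if M >= 2:                          # nested-loop
        options["nested-loop"] = B_R * B_S / M + B_R
    if M >= math.sqrt(total):           # two-pass sort-based
        options["two-pass sort"] = 3 * total
    if M >= math.sqrt(small):           # two-pass hash-based
        options["two-pass hash"] = 3 * total
    return options

def cheapest(options):
    """Name of the lowest-cost option (first one on ties)."""
    return min(options, key=options.get)
```

For example, `cheapest(binary_op_options(B_R, B_S, M))` picks the one-pass algorithm as soon as the smaller relation fits in memory, since its cost B(R) + B(S) is below every two-pass cost.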

Query Optimization

An input database program P:
    SELECT c FROM S(a,b), T(b,c) WHERE S.b = T.b AND a>4;

Prepare a collection C of efficient algorithms for operations in relational algebra.

Processing steps:
1. parser: P → parse tree
   [Figure: parse tree of P, with nodes <statement>, <select-statement>, <select-list>, <select-sublist>, <column-name>, <tbl-list>, <tbl-name>, <search-condition>, <b-term>, <b-factor>, <b-primary>, <comp-pred>, <exp>, <co-op>, <term>, <factor>, <integer>]
2. preprocessing (view processing, semantic checking): parse tree → parse tree
3. parse tree-lqp convertor: parse tree → logic query plan
   π_c(σ_{S.b=T.b AND a>4}(S(a,b) × T(b,c)))
4. apply logic laws (push selections, group joins) to reduce the size of intermediate results: logic query plan → logic query plan
   Optimization via logic and size.
5. lqp-pqp convertor (choices of algorithms, data structures, and computational modes): logic query plan → physical query plan
   ScanTable(S(a,b)), ScanTable(T(b,c)) → Alg-CrossProd → Select(S.b=T.b & a>4) → Project(c) → output
   Optimization via algorithms and cost.
6. physical query plan → machine executable code

Take care of issues in optimization and security.
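The effect of pushing selections can be seen on a toy instance. The sketch below evaluates the example query in two ways: as the direct plan π_c(σ_{S.b=T.b AND a>4}(S × T)), and as a rewritten plan that applies a>4 to S first and then joins on b. The relation contents, the function names, and the choice of a hash index for the join are all assumptions made for illustration; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical sample data: S holds (a, b) tuples, T holds (b, c) tuples.
S_EXAMPLE = [(1, 10), (5, 10), (6, 20), (7, 30), (3, 20)]
T_EXAMPLE = [(10, "x"), (20, "y"), (20, "z"), (40, "w")]

def naive_plan(S, T):
    """Cross product first, then selection, then projection on c."""
    product = [(s, t) for s in S for t in T]
    selected = [(s, t) for (s, t) in product if s[1] == t[0] and s[0] > 4]
    result = [t[1] for (_, t) in selected]
    return result, len(product)          # size of the largest intermediate result

def pushed_plan(S, T):
    """Selection a>4 pushed below the join; join on b via a hash index on T."""
    filtered = [s for s in S if s[0] > 4]
    index = defaultdict(list)
    for t in T:
        index[t[0]].append(t)
    joined = [(s, t) for s in filtered for t in index.get(s[1], [])]
    result = [t[1] for (_, t) in joined]
    return result, max(len(filtered), len(joined))
```

Both plans return the same bag of c values, but the rewritten plan never materializes the full product, which is the sense in which applying logic laws reduces the size of intermediate results.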
