CPSC-608 Database Systems
Fall 2018
Instructor: Jianer Chen
Office: HRBB 315C
Phone: 845-4259
Email: chen@cse.tamu.edu
Notes #28
Algorithms Implementing Relational Algebraic Operations
Projection and selection: π, σ
Set/bag operations: ∪S, ∩S, −S, ∪B, ∩B, −B (subscript S: set version; subscript B: bag version)
Join operations: ×, ⋈C
Extended operations: γ, δ, τ, table-scan
Operations based on tuples: π, σ, ∪B, table-scan
Operations based on the entire relation: ∪S, ∩S, −S, ∩B, −B, γ, δ, τ, ×, ⋈C
Unary operations: π, σ, γ, δ, τ, table-scan
Binary operations: ∪S, ∩S, −S, ∪B, ∩B, −B, ×, ⋈C
Important Facts
-- Data (relations) are stored on disk.
-- Disk I/Os are time-consuming.
-- Relations are too large to fit into main memory.
-- Different algorithms are needed when the (assigned/available) main memory buffer size is different.
[Figure: DBMS architecture. The administrator works through the DDL language and DDL compiler; the database programmer works through the DML (query) language, DML compiler, and query execution engine. Supporting components: transaction manager (concurrency control with the lock table, logging & recovery), index/file manager, buffer manager, main memory buffers, and secondary storage (disks) holding the database in tables (relations), e.g., a "graduate database".]
Disks: slow (read/write: 1~40 milliseconds); large capacity (hundreds of gigabytes); non-volatile.
Main memory: fast (read/write: 10~100 nanoseconds); small capacity (gigabytes); volatile.
Disks are about 10^5~10^6 times slower than main memory.
Disk I/O Model of Computation
Dominance of I/O cost: if a block needs to be moved between disk and main memory, the time taken to perform the read/write is much larger than the time likely to be spent manipulating that data in main memory. Therefore the number of disk block reads/writes is a good approximation of the cost of the entire computation.
Assume:
-- inputs are on disk (so they must be read in);
-- but the output is not written back to disk (it may not have to be; the output size is hard to estimate, and it does not depend on the algorithm adopted).
Parameters for algorithm complexity
R: a relation
B(R): # of blocks containing tuples of R
T(R): # of tuples in R
V(R, A): # of distinct values of attribute A in R
M: # of usable main memory blocks
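To make the parameters concrete, here is a minimal sketch computing T(R), B(R), and V(R, A) for a toy relation. It assumes, purely for illustration, a hypothetical `Relation` class, a fixed number of tuples per block (`tuples_per_block`), and an attribute identified by its column position; real systems pack blocks by bytes, not tuple counts.

```python
from dataclasses import dataclass
from math import ceil

@dataclass
class Relation:
    tuples: list           # the rows of R, as Python tuples
    tuples_per_block: int  # hypothetical fixed blocking factor

    def T(self) -> int:
        """T(R): number of tuples in R."""
        return len(self.tuples)

    def B(self) -> int:
        """B(R): number of blocks needed to hold the tuples of R."""
        return ceil(len(self.tuples) / self.tuples_per_block)

    def V(self, attr_index: int) -> int:
        """V(R, A): number of distinct values in column attr_index."""
        return len({t[attr_index] for t in self.tuples})

R = Relation(tuples=[(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a")], tuples_per_block=2)
print(R.T(), R.B(), R.V(1))  # 5 3 3
```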
A Remark on Main Memory Size M
[Figure: a logical query tree with operators ×, ∩, π, σ over relations A, B, C, D, E, F, G, and the corresponding physical plan with scan(A) through scan(G), an index-scan, and operator labels J2P, J1P, CJ, I1P.]
Operations requiring (almost) no M
Tuple-based operations: π, σ, ∪B, table-scan
General framework: read in a block; process it; send the results to the output.
[Figure: disk → main memory (process) → output]
Memory: M = 2 (one input block and one output block)
Cost:
-- π(R), σ(R), table-scan(R): B(R)
-- ∪B(R, S): B(R) + B(S)
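The block-at-a-time framework can be sketched as follows. This is a toy model, not a DBMS API: a relation is assumed to be a Python list of blocks (each a list of tuples), and the hypothetical `Disk` class simply counts block reads, which is the cost measure of the disk I/O model. Results are collected in a list only for illustration; per the model, output is not written back to disk.

```python
from typing import Callable, Iterator, List

Block = List[tuple]   # a disk block holding a few tuples
Table = List[Block]   # a relation stored on disk as a sequence of blocks

class Disk:
    """Toy disk that counts block reads (the I/O cost in the disk I/O model)."""

    def __init__(self) -> None:
        self.reads = 0

    def scan(self, relation: Table) -> Iterator[Block]:
        """Table-scan: deliver one block at a time into the single input buffer."""
        for block in relation:
            self.reads += 1
            yield block

def select_project(disk: Disk, relation: Table,
                   pred: Callable[[tuple], bool],
                   proj: Callable[[tuple], tuple]) -> List[tuple]:
    """pi(sigma(R)) using one input block; costs B(R) reads."""
    out: List[tuple] = []  # collected for illustration; a real output block is handed on when full
    for block in disk.scan(relation):
        for t in block:
            if pred(t):
                out.append(proj(t))
    return out

def bag_union(disk: Disk, r: Table, s: Table) -> List[tuple]:
    """Bag union: pass through every tuple of R, then of S; costs B(R) + B(S) reads."""
    out: List[tuple] = []
    for rel in (r, s):
        for block in disk.scan(rel):
            out.extend(block)
    return out

R = [[(1, "x"), (2, "y")], [(3, "x")]]  # B(R) = 2
S = [[(1, "x")]]                         # B(S) = 1
d = Disk()
print(select_project(d, R, lambda t: t[1] == "x", lambda t: (t[0],)), d.reads)  # [(1,), (3,)] 2
```

Only one block of each input is in memory at any moment, which is why M = 2 suffices regardless of B(R).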
Note: σA=c(R) can be done with cost B(R)/V(R, A) if R has an index on A.
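A worked instance of the index-based estimate, with hypothetical numbers. The formula B(R)/V(R, A) is the usual estimate when the index on A is clustering (tuples with the same A value are stored together) and the V(R, A) values are evenly distributed; these assumptions are not stated on the slide, and the cost of reading the index blocks themselves is ignored.

```latex
% Hypothetical: B(R) = 1000 blocks, V(R, A) = 50 distinct values of A
\mathrm{cost}\bigl(\sigma_{A=c}(R)\bigr) \approx \frac{B(R)}{V(R,A)} = \frac{1000}{50} = 20 \text{ block I/Os},
\qquad \text{versus } B(R) = 1000 \text{ for a table-scan.}
```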
One-pass algorithms
Condition: the main memory M is sufficiently large.
General framework:
1. Read in an entire relation R;
2. Process R;
3. If the operation is binary, read in the other relation S block by block;
4. Send the results to an output block.
Unary operations γ(R), δ(R), τ(R):
Read the entire relation R into main memory, then process it with an efficient main memory algorithm (e.g., sort R).
[Figure: R is read from disk into main memory and processed there.]
Summary: Memory: M ≥ B(R); Cost: B(R).
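A minimal sketch of the one-pass unary operations, assuming R is given as a Python list of blocks of tuples. The function names are hypothetical. δ follows the slide's hint (sort R, then drop adjacent duplicates); γ is shown with SUM and, as a swapped-in alternative to sorting, groups with an in-memory dictionary. Each function reads R exactly once (B(R) block reads) and refuses to run when M < B(R).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Block = List[tuple]

def read_all(blocks: List[Block], M: int) -> List[tuple]:
    """Read all of R into memory: B(R) block reads; one-pass needs M >= B(R)."""
    if len(blocks) > M:
        raise MemoryError(f"one-pass needs M >= B(R) = {len(blocks)}, but M = {M}")
    return [t for block in blocks for t in block]

def one_pass_delta(blocks: List[Block], M: int) -> List[tuple]:
    """delta(R): sort R in memory, keep one tuple from each run of equal tuples."""
    out: List[tuple] = []
    for t in sorted(read_all(blocks, M)):
        if not out or out[-1] != t:
            out.append(t)
    return out

def one_pass_tau(blocks: List[Block], M: int,
                 key: Optional[Callable[[tuple], object]] = None) -> List[tuple]:
    """tau(R): sort R in memory (stable sort on the given key)."""
    return sorted(read_all(blocks, M), key=key)

def one_pass_gamma_sum(blocks: List[Block], M: int,
                       group_idx: int, value_idx: int) -> Dict[object, int]:
    """gamma(R) with SUM: accumulate a running total per group value."""
    totals: Dict[object, int] = defaultdict(int)
    for t in read_all(blocks, M):
        totals[t[group_idx]] += t[value_idx]
    return dict(totals)

R = [[(1, "a", 5), (2, "b", 3)], [(1, "a", 5), (3, "a", 2)]]  # B(R) = 2
print(one_pass_delta(R, M=2))                                 # [(1, 'a', 5), (2, 'b', 3), (3, 'a', 2)]
print(one_pass_gamma_sum(R, M=2, group_idx=1, value_idx=2))   # {'a': 12, 'b': 3}
```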
Binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C
General framework:
1. Read in the entire smaller relation Rsmall;
2. Process Rsmall: build an efficient main memory data structure for it (e.g., sort Rsmall);
3. Read in the other relation Rlarge block by block;
4. Send the results to an output block.
[Figure: Rsmall is held entirely in main memory; Rlarge is read from disk one block at a time and processed against Rsmall.]
Rlarge ∪S Rsmall:
1. Sort Rsmall;
2. FOR each tuple t in Rlarge DO IF t is not in Rsmall THEN send t to the output;
3. Send Rsmall to the output.
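A sketch of the set-union steps, assuming both inputs are sets (no duplicate tuples) given as Python lists of blocks; the helper `in_sorted` and the function name are hypothetical. The "efficient data structure" here is a sorted list searched with `bisect_left`; a hash set would serve equally well. Each input block is read once, so the cost is B(Rsmall) + B(Rlarge).

```python
from bisect import bisect_left
from typing import List

Block = List[tuple]

def in_sorted(small: List[tuple], t: tuple) -> bool:
    """Binary search for t in the sorted in-memory copy of Rsmall."""
    i = bisect_left(small, t)
    return i < len(small) and small[i] == t

def one_pass_set_union(large_blocks: List[Block], small_blocks: List[Block]) -> List[tuple]:
    """Rlarge union Rsmall (set semantics)."""
    small = sorted(t for b in small_blocks for t in b)  # 1. read in and sort Rsmall
    out: List[tuple] = []
    for block in large_blocks:                           # 2. stream Rlarge block by block
        for t in block:
            if not in_sorted(small, t):
                out.append(t)
    out.extend(small)                                    # 3. send Rsmall to the output
    return out

large = [[(1,), (2,)], [(3,)]]
small = [[(2,)], [(3,), (5,)]]
print(one_pass_set_union(large, small))  # [(1,), (2,), (3,), (5,)]
```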
Rlarge ∩S Rsmall:
1. Sort Rsmall;
2. FOR each tuple t in Rlarge DO IF t is in Rsmall THEN send t to the output.
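The intersection steps, sketched under the same assumptions (set inputs as Python lists of blocks, a sorted in-memory Rsmall searched with `bisect_left`, hypothetical function names). Only tuples of Rlarge that are found in Rsmall are emitted.

```python
from bisect import bisect_left
from typing import List

Block = List[tuple]

def in_sorted(small: List[tuple], t: tuple) -> bool:
    """Binary search for t in the sorted in-memory copy of Rsmall."""
    i = bisect_left(small, t)
    return i < len(small) and small[i] == t

def one_pass_set_intersection(large_blocks: List[Block], small_blocks: List[Block]) -> List[tuple]:
    """Rlarge intersect Rsmall (set semantics)."""
    small = sorted(t for b in small_blocks for t in b)  # 1. read in and sort Rsmall
    out: List[tuple] = []
    for block in large_blocks:                           # 2. stream Rlarge block by block
        for t in block:
            if in_sorted(small, t):
                out.append(t)
    return out

print(one_pass_set_intersection([[(1,), (2,)], [(3,)]], [[(2,)], [(3,), (5,)]]))  # [(2,), (3,)]
```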
Note: −S is not commutative (in general Rlarge −S Rsmall ≠ Rsmall −S Rlarge), so the two orders need different algorithms.
Rlarge −S Rsmall:
1. Sort Rsmall;
2. FOR each tuple t in Rlarge DO IF t is not in Rsmall THEN send t to the output.
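A sketch of Rlarge −S Rsmall under the same assumptions as the other one-pass sketches (set inputs as Python lists of blocks, sorted in-memory Rsmall, hypothetical names). Tuples of Rlarge are emitted as they stream in, provided they do not appear in Rsmall.

```python
from bisect import bisect_left
from typing import List

Block = List[tuple]

def in_sorted(small: List[tuple], t: tuple) -> bool:
    """Binary search for t in the sorted in-memory copy of Rsmall."""
    i = bisect_left(small, t)
    return i < len(small) and small[i] == t

def one_pass_large_minus_small(large_blocks: List[Block], small_blocks: List[Block]) -> List[tuple]:
    """Rlarge - Rsmall (set semantics)."""
    small = sorted(t for b in small_blocks for t in b)  # 1. read in and sort Rsmall
    out: List[tuple] = []
    for block in large_blocks:                           # 2. stream Rlarge block by block
        for t in block:
            if not in_sorted(small, t):
                out.append(t)
    return out

print(one_pass_large_minus_small([[(1,), (2,)], [(3,)]], [[(2,)], [(3,), (5,)]]))  # [(1,)]
```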
Rsmall −S Rlarge:
1. Sort Rsmall;
2. FOR each tuple t in Rlarge DO IF t is in Rsmall THEN remove t from Rsmall;
3. Send the remaining tuples of Rsmall to the output.
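A sketch of Rsmall −S Rlarge, assuming set inputs as Python lists of blocks and hypothetical names. Here nothing can be emitted until all of Rlarge has been seen. Instead of physically deleting from the sorted array, this version marks matched positions as removed (a choice made for the sketch, to avoid shifting elements), then outputs the unmarked tuples.

```python
from bisect import bisect_left
from typing import List

Block = List[tuple]

def one_pass_small_minus_large(small_blocks: List[Block], large_blocks: List[Block]) -> List[tuple]:
    """Rsmall - Rlarge (set semantics)."""
    small = sorted(t for b in small_blocks for t in b)  # 1. read in and sort Rsmall
    alive = [True] * len(small)                          # "removed" flags for Rsmall tuples
    for block in large_blocks:                           # 2. stream Rlarge block by block
        for t in block:
            i = bisect_left(small, t)
            if i < len(small) and small[i] == t:
                alive[i] = False                         # remove t from Rsmall
    return [t for t, keep in zip(small, alive) if keep]  # 3. send what is left of Rsmall

print(one_pass_small_minus_large([[(2,)], [(3,), (5,)]], [[(1,), (2,)], [(3,)]]))  # [(5,)]
```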