CS4432: Database Systems II
Query Processing - Part 1
Example
Data: relation R(A, B, C), relation S(C, D, E)
Query:
SELECT B, D
FROM R, S
WHERE R.A = “c” AND S.E = 2 AND R.C = S.C
Relational Algebra – Possible Query Plans
Plan 1: Π_{B,D} [ σ_{R.A = “c” ∧ S.E = 2 ∧ R.C = S.C} (R × S) ]
(The same plan can also be drawn as an operator tree.)
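Plan 1 can be read almost literally as code. The sketch below builds the cross product R × S, applies the three conditions, and then projects B and D. The sample tuples are made up for illustration and are not the data shown on the slides.

```python
from itertools import product

# Hypothetical sample data (not from the slides): R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("c", 2, 10), ("c", 5, 20), ("d", 3, 30)]
S = [(10, "x", 2), (20, "y", 2), (20, "z", 3), (30, "w", 2)]

def plan1(r_rows, s_rows):
    """Cross product, then selection, then projection on (B, D)."""
    result = []
    for (a, b, r_c), (s_c, d, e) in product(r_rows, s_rows):  # R x S
        if a == "c" and e == 2 and r_c == s_c:                # selection
            result.append((b, d))                             # projection
    return result

print(plan1(R, S))
```

The loop touches every one of the |R| × |S| pairs, which is why this plan is usually the most expensive of the alternatives.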
Relational Algebra – Possible Query Plans
Plan 2 (plan-tree figure): uses a natural join of R and S; C is the common column.
Relational Algebra – Possible Query Plans
Plan 3 (natural join). Assume indexes on R.A and S.C.
(1) Use the R.A index to select R tuples with R.A = “c”
(2) For each R.C value found, use the S.C index to find matching S tuples
(3) Eliminate S tuples with S.E ≠ 2
(4) Join matching R, S tuples, project the B, D attributes, and place them in the result
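The four steps of Plan 3 can be sketched as follows. This is only an illustration: the dictionaries built by `build_index` stand in for real database indexes, and the sample tuples are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data (not from the slides): R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("c", 2, 10), ("c", 5, 20), ("d", 3, 30)]
S = [(10, "x", 2), (20, "y", 2), (20, "z", 3), (30, "w", 2)]

def build_index(rows, col):
    """Stand-in for a database index: maps a column value to its tuples."""
    index = defaultdict(list)
    for row in rows:
        index[row[col]].append(row)
    return index

def plan3(r_index_on_a, s_index_on_c):
    result = []
    for (_a, b, c) in r_index_on_a.get("c", []):    # (1) R tuples with A = "c"
        for (_c, d, e) in s_index_on_c.get(c, []):  # (2) S tuples with matching C
            if e != 2:                              # (3) drop S tuples with E != 2
                continue
            result.append((b, d))                   # (4) join and project B, D
    return result

r_idx = build_index(R, 0)  # index on R.A
s_idx = build_index(S, 0)  # index on S.C
print(plan3(r_idx, s_idx))
```

Only the R tuples with A = “c” and the S tuples that share their C value are ever touched, in contrast to a plan that scans the full cross product.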
Plan 3 (figure): look up R.A = “c” → check E = 2? → output
Query Compilation, Optimization and Execution
Overview of Query Execution
SQL Query → Compile → Optimize → Execute
Example
Query: Find the movies with stars born in 1960
SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);
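To see what the example query returns, it can be run against a small in-memory SQLite database. The table definitions below contain only the columns the query mentions, and the rows and the date format are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed minimal schemas: only the columns used by the query.
conn.executescript("""
    CREATE TABLE StarsIn (title TEXT, starName TEXT);
    CREATE TABLE MovieStar (name TEXT, birthdate TEXT);
""")

# Hypothetical rows; birthdate is stored as text ending in the year.
conn.executemany("INSERT INTO StarsIn VALUES (?, ?)", [
    ("Movie One", "Star A"),
    ("Movie Two", "Star B"),
    ("Movie Three", "Star A"),
])
conn.executemany("INSERT INTO MovieStar VALUES (?, ?)", [
    ("Star A", "03/14/1960"),
    ("Star B", "07/01/1975"),
])

query = """
    SELECT title
    FROM StarsIn
    WHERE starName IN (
        SELECT name FROM MovieStar WHERE birthdate LIKE '%1960'
    )
"""
titles = sorted(row[0] for row in conn.execute(query))
print(titles)
```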
Step 1: Generate Parse Tree
Step 2: Relational Algebra & Logical Plan
Figures: Expression Tree; Logical Query Plan
The expression tree is midway between a parse tree and relational algebra.
Step 3: Optimize & Create Several Logical Plans
Figures: Plan 1; Plan 2
Question: Push project to StarsIn?
Step 4: Estimate the Sizes
This is done for each plan.
Step 5: Consider Physical Plans
A physical plan specifies how each operator will execute (which algorithm)
– E.g., a join can be nested-loop, hash-based, merge-based, or sort-based
Each logical plan maps to multiple physical plans
Figure: Logical Plan → One Physical Plan
Step 6: Estimate the Cost
This is done for each physical plan. Select the cheapest to execute.
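A minimal sketch of Steps 5 and 6 for a single join: list the candidate physical algorithms, attach an estimated I/O cost to each, and keep the cheapest. The cost formulas and memory conditions are common textbook approximations rather than anything stated on these slides. The function name is hypothetical, and S is assumed to be the smaller input.

```python
import math

def estimate_join_costs(b_r, b_s, m):
    """Approximate I/O cost of each candidate join algorithm.

    b_r, b_s: number of blocks of R and S (S assumed smaller).
    m: number of memory buffers available.
    """
    costs = {}
    if b_s <= m - 1:
        # S fits in memory: read each relation once.
        costs["one-pass"] = b_r + b_s
    # Block nested-loop: read S in chunks of m - 1 blocks, scan R per chunk.
    costs["block nested-loop"] = b_s + math.ceil(b_s / (m - 1)) * b_r
    if b_s <= m * m:
        # Two-pass hash join (approximate memory condition):
        # partition both inputs, write them out, read them back.
        costs["two-pass hash"] = 3 * (b_r + b_s)
    return costs

costs = estimate_join_costs(b_r=1000, b_s=500, m=101)
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

With these inputs S does not fit in memory, so the one-pass algorithm is not a candidate and the two-pass hash join has the lowest estimate.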
Evaluating Relational Operators
Common Techniques
Algorithms for evaluating relational operators use some simple ideas extensively:
– Indexing: Can use WHERE conditions to retrieve a small set of tuples (selections, joins)
– Iteration: Sometimes it is faster to scan all tuples even if there is an index. (And sometimes we can scan the data entries in an index instead of the table itself.)
– Partitioning: By using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.
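The partitioning idea can be illustrated with an equi-join. Both inputs are hashed on the join attribute into k buckets, and each pair of matching buckets is then joined separately. This is a simplified in-memory sketch: a real system would write the buckets to disk. The function names and sample tuples are hypothetical.

```python
def hash_partition(rows, key, k):
    """Split rows into k buckets by hashing the column at position `key`."""
    buckets = [[] for _ in range(k)]
    for row in rows:
        buckets[hash(row[key]) % k].append(row)
    return buckets

def partitioned_join(r_rows, s_rows, r_key, s_key, k=4):
    """Join R and S bucket by bucket instead of as one large join."""
    out = []
    r_parts = hash_partition(r_rows, r_key, k)
    s_parts = hash_partition(s_rows, s_key, k)
    for r_part, s_part in zip(r_parts, s_parts):
        # Tuples with equal join values always land in the same bucket index,
        # so only same-numbered buckets need to be compared.
        for r in r_part:
            for s in s_part:
                if r[r_key] == s[s_key]:
                    out.append(r + s)
    return out

# Hypothetical sample data: R(A, B, C) and S(C, D, E), joined on C.
R = [("a", 1, 10), ("c", 2, 10), ("c", 5, 20), ("d", 3, 30)]
S = [(10, "x", 2), (20, "y", 2), (20, "z", 3), (30, "w", 2)]
print(sorted(partitioned_join(R, S, r_key=2, s_key=0)))
```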
Another Categorization
One-Pass Algorithms
– Need one pass over the input relation(s)
– Put limitations on the size of the inputs vs. memory
Two-Pass Algorithms
– Need two passes over the input relation(s)
– Put limitations on the size of the inputs vs. memory
Multi-Pass Algorithms
– Scale to any size and may need several passes over the input relation(s)
Common Statistics over Relation R
B(R): # of blocks to hold all R tuples
T(R): # of tuples in R
S(R): # of bytes in each of R’s tuples
V(R, A): # of distinct values in attribute R.A
M: # of memory buffers available
R is “clustered”: R’s tuples are packed into blocks; accessing R requires B(R) I/Os
R is “not clustered”: R’s tuples are distributed over the blocks; accessing R requires T(R) I/Os
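As a small worked example of these statistics, the sketch below derives B(R) from T(R) and S(R) and compares the scan cost of a clustered and a non-clustered relation. The class name, the block size, and the tuple counts are assumptions for illustration. Tuples are assumed to be fixed-size, to not span blocks, and block headers are ignored.

```python
import math
from dataclasses import dataclass

@dataclass
class RelationStats:
    tuples: int        # T(R)
    tuple_bytes: int   # S(R)
    clustered: bool

    def blocks(self, block_bytes):
        """B(R) under the simplifying assumptions stated above."""
        tuples_per_block = block_bytes // self.tuple_bytes
        return math.ceil(self.tuples / tuples_per_block)

    def scan_cost(self, block_bytes):
        """I/Os to read all of R: B(R) if clustered, T(R) otherwise."""
        return self.blocks(block_bytes) if self.clustered else self.tuples

# Hypothetical relation: 10,000 tuples of 100 bytes, 4096-byte blocks.
packed = RelationStats(tuples=10_000, tuple_bytes=100, clustered=True)
scattered = RelationStats(tuples=10_000, tuple_bytes=100, clustered=False)
print(packed.scan_cost(4096), scattered.scan_cost(4096))
```

Here 40 tuples fit in a block, so the clustered scan needs 250 I/Os while the non-clustered scan may need one I/O per tuple.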
Example: Join (R,S)
One Pass, Iteration (assume S is smaller than R)
Open():
    read S into memory
GetNext():
    for b in blocks of R:
        for t in tuples of b:
            if t matches tuple s:
                return join(t, s)
    return NotFound
Close():
    clean memory
Key Metrics:
– M >= B(S) + 1
I/O Cost:
– B(S) + B(R)
Notes:
– Can use prefetching for R
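The Open/GetNext/Close interface above can be sketched as a small Python class. Several details are assumptions made for illustration: blocks are modeled as lists of tuples, S is held in a dictionary keyed on the join attribute (the slide does not say how matches are found), and `None` plays the role of NotFound.

```python
class OnePassJoin:
    """One-pass equi-join: S is loaded into memory, R is streamed block by block."""

    def __init__(self, r_blocks, s_blocks, r_key, s_key):
        self.r_blocks = r_blocks
        self.s_blocks = s_blocks
        self.r_key = r_key
        self.s_key = s_key
        self.s_table = None
        self._results = None

    def open(self):
        # Read all of S into memory: B(S) I/Os, needs about B(S) buffers.
        self.s_table = {}
        for block in self.s_blocks:
            for s in block:
                self.s_table.setdefault(s[self.s_key], []).append(s)
        self._results = self._generate()

    def _generate(self):
        # Stream R one block at a time: B(R) I/Os, needs one buffer.
        for block in self.r_blocks:
            for t in block:
                for s in self.s_table.get(t[self.r_key], []):
                    yield t + s

    def get_next(self):
        # Return the next joined tuple, or None when the input is exhausted.
        return next(self._results, None)

    def close(self):
        # Release the in-memory copy of S.
        self.s_table = None
        self._results = None

# Hypothetical data: R(A, B, C) in two blocks, S(C, D, E) in one block.
r_blocks = [[("a", 1, 10), ("c", 2, 10)], [("c", 5, 20), ("d", 3, 30)]]
s_blocks = [[(10, "x", 2), (20, "y", 2)]]

join = OnePassJoin(r_blocks, s_blocks, r_key=2, s_key=0)
join.open()
rows = []
while (row := join.get_next()) is not None:
    rows.append(row)
join.close()
print(rows)
```

Because each call to `get_next` resumes where the previous one stopped, the consumer can pull results one tuple at a time without the join materializing its full output.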