CSE 544: Query Execution Wednesday, 5/12/2004.

Architecture of a Database Engine
SQL query → Parse Query → Select Logical Plan → Select Physical Plan → Query Execution
Parse Query produces the logical plan.
Query optimization = Select Logical Plan + Select Physical Plan; it produces the physical plan.

Logical Algebra Operators
Union, intersection, difference
Selection σ
Projection π
Join ⋈
Duplicate elimination δ
Grouping γ
Sorting τ

Logical Query Plan
SELECT city, count(*) FROM sales GROUP BY city HAVING sum(price) > 100
Plan, from the leaf to the root:
sales(product, city, price)
T1(city, p, c) = γ city, sum(price)→p, count(*)→c (sales)
T2(city, p, c) = σ p > 100 (T1)
T3(city, c) = π city, c (T2)
T1, T2, T3 = temporary tables
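The three temporary tables can be traced with a small in-memory sketch. It is only an illustration: the helper names `group_by_city` and `run_plan` and the sample rows are assumptions, not part of the slides.

```python
from collections import defaultdict

def group_by_city(sales):
    """Grouping step: sales(product, city, price) -> T1(city, p, c)."""
    totals = defaultdict(lambda: [0, 0])  # city -> [sum(price), count(*)]
    for _product, city, price in sales:
        totals[city][0] += price
        totals[city][1] += 1
    return [(city, p, c) for city, (p, c) in totals.items()]

def run_plan(sales):
    t1 = group_by_city(sales)                 # T1(city, p, c)
    t2 = [row for row in t1 if row[1] > 100]  # T2: keep groups with p > 100
    t3 = [(city, c) for city, _p, c in t2]    # T3: project onto (city, c)
    return t3

# Hypothetical sample data, chosen only to exercise the plan.
sales = [("pen", "Seattle", 80), ("ink", "Seattle", 40), ("pad", "Boston", 30)]
print(run_plan(sales))
```

Only Seattle survives the selection, because its summed price is 120 while Boston's is 30.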

Logical Query Plan
SELECT P.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city='seattle' AND Q.age > 25
Plan, from the root to the leaves:
π buyer
σ city='seattle' AND age>25
⋈ buyer=name
Purchase, Person

Physical Query Plan
SELECT P.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city='seattle' AND Q.age > 25
Same tree as the logical plan (π buyer; σ city='seattle' AND age>25; ⋈ buyer=name; Purchase, Person), annotated with implementation choices: the join uses simple nested loops, and the leaves are read with a table scan and an index scan.
Query Plan:
logical tree
implementation choice at every node
scheduling of operations
Some operators are from relational algebra, and others (e.g., scan) are not.

More Complex Plans
SELECT Q.name FROM Person Q WHERE Q.age > 25 and not exists (SELECT * FROM Purchase P WHERE P.buyer = Q.name and P.price > 100)
Plan: π name (σ age>25 (Person)) − π name (σ price>100 (Purchase) ⋈ buyer=name Person)

Question in Class
Logical operator: Product(pname, cname) ⋈ Company(cname, city)
Propose three physical operators for the join, assuming the tables are in main memory:

Question in Class
Product(pname, cname) ⋈ Company(cname, city)
products 1000 companies
What is the cost?
Nested loop join time =
Sort and merge = merge-join time =
Hash join time =

Cost Parameters
In database systems the data is on disks
Cost = total number of I/Os
Parameters:
B(R) = # of blocks for relation R
T(R) = # of tuples in relation R
V(R, a) = # of distinct values of attribute a

Cost Parameters
Clustered table R:
Blocks consist only of records from this table
B(R) ≈ T(R) / blockSize
Unclustered table R:
Its records are placed on blocks with other tables
When R is unclustered: B(R) ≈ T(R)
When a is a key, V(R,a) = T(R)
When a is not a key, V(R,a) can be anything up to T(R)

Pipelining
Join tree: ((R ⋈ S) ⋈ T) ⋈ U, evaluated as a pipeline
Need to read R, S, T, U
No need to write

Blocking
Join tree: ((R ⋈ S) ⋈ T) ⋈ U, with intermediate results V1 and V2
Now we need to write V1 and V2

Complex Operator Trees
[Figure: a bushy tree of ⋈ operators over the relations R, S, T, I, V, X, Y, Z]

Cost
Cost of an operation = number of disk I/Os needed to:
read the operands
compute the result
Cost of writing the result to disk is not included; need to account for it separately if the operator is blocking
Answer: 3B

Scanning a Table
Clustered relation:
Result may be unsorted: B(R)
Result needs to be sorted: 3B(R)
Unclustered relation:
Unsorted: T(R)
Sorted: T(R) + 2B(R)

One-Pass Algorithms
Selection σ(R), projection π(R)
Both are tuple-at-a-time algorithms
Cost: B(R)
[Figure: input buffer → unary operator → output buffer]

One-pass Algorithms
Hash join: R ⋈ S
Scan S, build buckets in main memory
Then scan R and join
Cost: B(R) + B(S)
Assumption: B(S) <= M
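A minimal in-memory sketch of the one-pass hash join described above, assuming tuples are Python tuples and the join attribute is identified by position. The function name `one_pass_hash_join` and the sample rows are illustrative assumptions.

```python
from collections import defaultdict

def one_pass_hash_join(r, s, r_key, s_key):
    """Join r and s on r[r_key] == s[s_key]; s is assumed to fit in memory."""
    buckets = defaultdict(list)
    for s_tuple in s:                      # scan S once, build buckets
        buckets[s_tuple[s_key]].append(s_tuple)
    result = []
    for r_tuple in r:                      # scan R once, probe the buckets
        for s_tuple in buckets.get(r_tuple[r_key], []):
            result.append(r_tuple + s_tuple)
    return result

# Hypothetical rows for Product(pname, cname) and Company(cname, city).
product = [("gizmo", "acme"), ("widget", "globex")]
company = [("acme", "Seattle")]
print(one_pass_hash_join(product, company, r_key=1, s_key=0))
```

Each relation is read exactly once, which is where the cost B(R) + B(S) comes from.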

One-pass Algorithms
Duplicate elimination δ(R)
Need to keep tuples in memory
When new tuple arrives, need to compare it with previously seen tuples
Balanced search tree, or hash table
Cost: B(R)
Assumption: B(δ(R)) <= M

Question in Class
Grouping: Product(name, department, quantity)
γ department, sum(quantity) (Product) → Answer(department, sum)
Question: how do you compute it in main memory?
Answer:

One-pass Algorithms
Grouping: γ department, sum(quantity) (R)
Need to store all departments in memory
Also store the sum(quantity) for each department
Balanced search tree or hash table
Cost: B(R)
Assumption: number of departments fits in memory

One-pass Algorithms
Binary operations: R ∩ S, R ∪ S, R – S
Assumption: min(B(R), B(S)) <= M
Scan one table first, then the next, eliminate duplicates
Cost: B(R)+B(S)

Question in Class
What do we do in each of these cases: R ∩ S, R ∪ S, R – S
H ← emptyHashTable
/* scan R */
For each x in R do insert(H, x)
/* scan S */
For each y in S do _____________________
/* collect result */
For each z in H do output(z)
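One possible way to fill in the blank, sketched in Python under set semantics. This is an assumption about the intended answer, not the answer given in class. The hash table `h` maps each tuple of R to a flag recording whether it was also seen in S; the name `hash_set_op` is hypothetical.

```python
def hash_set_op(r, s, op):
    """Compute R op S with one scan of each input; op is a string."""
    h = {x: False for x in r}          # scan R: insert(H, x)
    for y in s:                        # scan S
        if op == "union":
            h.setdefault(y, False)     # insert y if it is not there yet
        elif op == "difference":
            h.pop(y, None)             # delete y if present
        elif op == "intersection":
            if y in h:
                h[y] = True            # mark y as present in both inputs
        else:
            raise ValueError(f"unknown operation: {op}")
    if op == "intersection":           # collect result: only marked tuples
        return [z for z, in_both in h.items() if in_both]
    return list(h)                     # collect result: everything left in H

r, s = [1, 2, 3], [3, 4]
print(hash_set_op(r, s, "union"), hash_set_op(r, s, "intersection"),
      hash_set_op(r, s, "difference"))
```

Union and difference fit the slide's skeleton directly (insert or delete, then output all of H); intersection needs the extra flag because H must remember which tuples of R were matched.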

Nested Loop Joins
Tuple-based nested loop R ⋈ S
for each tuple r in R do
  for each tuple s in S do
    if r and s join then output (r,s)
Cost: T(R) B(S) when S is clustered
Cost: T(R) T(S) when S is unclustered

Nested Loop Joins
We can be much more clever
Question: how would you compute the join in the following cases? What is the cost?
B(R) = 1000, B(S) = 2, M = 4
B(R) = 1000, B(S) = 3, M = 4
B(R) = 1000, B(S) = 6, M = 4

Nested Loop Joins
Block-based Nested Loop Join
for each (M-2) blocks bs of S do
  for each block br of R do
    for each tuple s in bs do
      for each tuple r in br do
        if "r and s join" then output(r,s)

Nested Loop Joins
[Figure: block-based nested loop join of R and S. Main memory holds a hash table for a block of S (M-2 pages), an input buffer for R, and an output buffer that collects the join result.]

Nested Loop Joins
Block-based Nested Loop Join Cost:
Read S once: cost B(S)
Outer loop runs B(S)/(M-2) times, and each time need to read R: costs B(S)B(R)/(M-2)
Total cost: B(S) + B(S)B(R)/(M-2)
Notice: it is better to iterate over the smaller relation first
R ⋈ S: R = outer relation, S = inner relation
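The cost formula can be evaluated with a few lines of Python. This sketch rounds the number of outer iterations up to a whole number, which is an assumption the slide's formula leaves implicit. The function name is hypothetical, and the three parameter settings are the ones from the earlier question; the slides do not state whether these are the intended classroom answers.

```python
import math

def block_nested_loop_cost(b_r, b_s, m):
    """I/O cost when S is read in chunks of (m - 2) blocks and R is rescanned per chunk."""
    if m < 3:
        raise ValueError("need at least 3 buffers: chunk of S, input for R, output")
    outer_iterations = math.ceil(b_s / (m - 2))  # rounded up (assumption)
    return b_s + outer_iterations * b_r

for b_s in (2, 3, 6):
    print(b_s, block_nested_loop_cost(b_r=1000, b_s=b_s, m=4))
```

With B(S) = 2 and M = 4 the whole of S fits in the M-2 chunk, so R is scanned only once and the cost matches the one-pass cost B(R) + B(S).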

Two Pass Algorithms Based on Hashing
Idea: partition a relation R into buckets, on disk
Each bucket has size approx. B(R)/M
[Figure: relation R is read from disk through an input buffer; hash function h routes each tuple to one of M-1 output buffers; the M-1 partitions are written back to disk. M main memory buffers in total.]
Does each bucket fit in main memory?
Yes if B(R)/M <= M, i.e. B(R) <= M^2

Hash Based Algorithms for δ
Recall: δ(R) = duplicate elimination
Step 1. Partition R into buckets
Step 2. Apply δ to each bucket (may read in main memory)
Cost: 3B(R)
Assumption: B(R) <= M^2

Hash Based Algorithms for γ
Recall: γ(R) = grouping and aggregation
Step 1. Partition R into buckets
Step 2. Apply γ to each bucket (may read in main memory)
Cost: 3B(R)
Assumption: B(R) <= M^2

Partitioned Hash Join
R ⋈ S
Step 1: Hash S into M buckets; send all buckets to disk
Step 2: Hash R into M buckets; send all buckets to disk
Step 3: Join every pair of buckets

Hash-Join
Partition both relations using hash fn h: R tuples in partition i will only match S tuples in partition i.
[Figure: partitioning phase. With B main memory buffers, the original relation is read from disk through an input buffer and hashed by h into M-1 partitions, which are written to disk.]
Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, search for matches.
[Figure: joining phase. Main memory holds a hash table for partition Si (< M-1 pages) built with hash fn h2, an input buffer for Ri, and an output buffer for the join result.]

Partitioned Hash Join
Cost: 3B(R) + 3B(S)
Assumption: min(B(R), B(S)) <= M^2
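The partitioning idea can be simulated in memory. In this sketch Python lists stand in for the on-disk buckets, the built-in `hash` plays the role of h, and a dictionary plays the role of the per-bucket hash table. All names and the default of four partitions are illustrative assumptions.

```python
from collections import defaultdict

def partition(relation, key, k):
    """Split a relation into k buckets by hashing the join attribute."""
    parts = [[] for _ in range(k)]
    for t in relation:
        parts[hash(t[key]) % k].append(t)
    return parts

def partitioned_hash_join(r, s, r_key, s_key, k=4):
    r_parts = partition(r, r_key, k)   # would be written to disk
    s_parts = partition(s, s_key, k)   # would be written to disk
    out = []
    for r_i, s_i in zip(r_parts, s_parts):   # join matching bucket pairs only
        table = defaultdict(list)
        for s_t in s_i:
            table[s_t[s_key]].append(s_t)
        for r_t in r_i:
            for s_t in table.get(r_t[r_key], []):
                out.append(r_t + s_t)
    return out

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
print(sorted(partitioned_hash_join(r, s, 0, 0)))
```

Tuples with equal join values always hash to the same bucket index, so joining bucket i of R with bucket i of S finds every match.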

Hybrid Hash Join Algorithm
Partition S into k buckets
t buckets S1, …, St stay in memory
k-t buckets St+1, …, Sk go to disk
Partition R into k buckets
First t buckets join immediately with S
Remaining k-t buckets go to disk
Finally, join k-t pairs of buckets: (Rt+1,St+1), (Rt+2,St+2), …, (Rk,Sk)

Hybrid Join Algorithm
How to choose k and t?
Choose k large but s.t. k <= M
Choose t/k large but s.t. t/k * B(S) <= M
Moreover: t/k * B(S) + k-t <= M
Assuming t/k * B(S) >> k-t: t/k = M/B(S)

Hybrid Join Algorithm
How many I/Os?
Cost of partitioned hash join: 3B(R) + 3B(S)
Hybrid join saves 2 I/Os for a t/k fraction of buckets
Hybrid join saves 2t/k (B(R) + B(S)) I/Os
Cost: (3 - 2t/k)(B(R) + B(S)) = (3 - 2M/B(S))(B(R) + B(S))

Hybrid Join Algorithm Question in class: what is the real advantage of the hybrid algorithm ?

External Sorting
Problem: Sort a file of size B with memory M
Where we need this:
ORDER BY in SQL queries
Several physical operators
Bulk loading of B+-tree indexes
Will discuss only 2-pass sorting, for when B < M^2

External Merge-Sort: Step 1
Phase one: load M bytes in memory, sort
[Figure: the file on disk is read into main memory M bytes at a time; each sorted chunk is written back to disk as a run of length M bytes.]

External Merge-Sort: Step 2
Merge M – 1 runs into a new run
Result: runs of length M (M – 1) ≈ M^2
[Figure: runs on disk are read through input buffers in main memory, merged into an output buffer, and written back to disk.]
If B <= M^2 then we are done

Cost of External Merge Sort
Read+write+read = 3B(R)
Assumption: B(R) <= M^2
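The two steps can be sketched in memory, with a list of blocks standing in for the file on disk. This is an illustration only: memory is measured in blocks rather than bytes, `two_pass_sort` is a hypothetical name, and `heapq.merge` from the standard library performs the multi-way merge.

```python
import heapq

def two_pass_sort(blocks, m):
    """blocks: list of blocks (lists of keys); m: memory size in blocks."""
    # Step 1: load m blocks at a time, sort them, and emit one run per chunk.
    runs = []
    for i in range(0, len(blocks), m):
        chunk = [key for block in blocks[i:i + m] for key in block]
        runs.append(sorted(chunk))
    # Step 2: merge all runs at once; this needs one input buffer per run.
    if len(runs) > m - 1:
        raise ValueError("too many runs for a single merge pass")
    return list(heapq.merge(*runs))

blocks = [[5, 1], [9, 3], [2, 8], [7, 4]]
print(two_pass_sort(blocks, m=3))
```

With four blocks and m = 3, step 1 produces two runs, which step 2 merges in a single pass.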

Initial Run Creation
Method 1: using some internal sorting
Initial runs of length M
Good processor cache locality and I/O behavior
Method 2: "with replacement"
Initial runs > M (expected size: 2M)
Poor processor cache locality and I/O behavior

Duplicate Elimination
Duplicate elimination δ(R)
Idea: do a two step merge sort, but change one of the steps
Question in class: which step needs to be changed and how?
Step 2: merge M-1 runs, but include each tuple only once (cost B(R))
Cost = 3B(R)
Assumption: B(δ(R)) <= M^2

Grouping
Grouping: γ a, sum(b) (R)
Same as before: sort, then compute the sum(b) for each group of a's
Total cost: 3B(R)
Assumption: B(R) <= M^2

Merge-Join
Join R ⋈ S
Step 1a: initial runs for R
Step 1b: initial runs for S
Step 2: merge and join

Merge-Join
M1 = B(R)/M runs for R
M2 = B(S)/M runs for S
[Figure: the runs of R and S on disk are read through input buffers in main memory, merged and joined, and the result goes through an output buffer back to disk.]
If B <= M^2 then we are done

Two-Pass Algorithms Based on Sorting
Join R ⋈ S
If the number of tuples in R matching those in S is small (or vice versa) we can compute the join during the merge phase
Total cost: 3B(R)+3B(S)
Assumption: B(R) + B(S) <= M^2

Assumption: multi-way merge sort needs only two passes
Assumption: B(R) <= M^2
Cost for sorting: 3B(R)

Duplicate elimination δ(R)
Trivial idea: sort first, then eliminate duplicates
Step 1: sort chunks of size M, write: cost 2B(R)
Step 2: merge M-1 runs, but include each tuple only once: cost B(R)
Total cost: 3B(R)
Assumption: B(R) <= M^2

R ∪ S
Complete the program in class:
x = first(R)
y = first(S)
While (_______________) do {
  case x < y: output(x); x = next(R)
  case x = y:
  case x > y:
}

R ∩ S
Complete the program in class:
x = first(R)
y = first(S)
While (_______________) do {
  case x < y:
  case x = y:
  case x > y:
}

R - S
Complete the program in class:
x = first(R)
y = first(S)
While (_______________) do {
  case x < y:
  case x = y:
  case x > y:
}

Binary operations: R ∪ S, R ∩ S, R – S
Idea: sort R, sort S, then do the right thing
A closer look:
Step 1: split R into runs of size M, then split S into runs of size M. Cost: 2B(R) + 2B(S)
Step 2: merge M/2 runs from R; merge M/2 runs from S; output a tuple on a case-by-case basis
Total cost: 3B(R)+3B(S)
Assumption: B(R)+B(S) <= M^2
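One way the "case-by-case" merge step could look, sketched in Python for inputs that are already sorted and duplicate-free. This is an assumed completion of the three programs, not the solution given in class, and the name `merge_set_op` is hypothetical.

```python
def merge_set_op(r, s, op):
    """Merge two sorted, duplicate-free lists; op selects which cases emit output."""
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        x, y = r[i], s[j]
        if x < y:                       # x occurs only in R
            if op in ("union", "difference"):
                out.append(x)
            i += 1
        elif x == y:                    # x occurs in both inputs
            if op in ("union", "intersection"):
                out.append(x)
            i += 1
            j += 1
        else:                           # y occurs only in S
            if op == "union":
                out.append(y)
            j += 1
    if op in ("union", "difference"):   # leftover tuples of R
        out.extend(r[i:])
    if op == "union":                   # leftover tuples of S
        out.extend(s[j:])
    return out

r, s = [1, 3, 5, 7], [3, 4, 7, 9]
for op in ("union", "intersection", "difference"):
    print(op, merge_set_op(r, s, op))
```

The three operations differ only in which of the three cases produce output and in how the leftover tuples are handled once one input is exhausted.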

R ⋈ R.A=S.B S
R(A,C) sorted on A, S(B,D) sorted on B
Complete the program in class:
x = first(R)
y = first(S)
While (_______________) do {
  case x.A < y.B:
  case x.A = y.B:
  case x.A > y.B:
}

Join R ⋈ S
Start by sorting both R and S on the join attribute:
Cost: 4B(R)+4B(S) (because need to write to disk)
Read both relations in sorted order, match tuples
Cost: B(R)+B(S)
Difficulty: many tuples in R may match many in S
If at least one set of tuples fits in M, we are OK
Otherwise need nested loop, higher cost
Total cost: 5B(R)+5B(S)
Assumption: B(R) <= M^2, B(S) <= M^2
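The matching step, including the case where several tuples on each side share a join value, can be sketched as follows. The sketch assumes both inputs are already sorted on the join attribute and that each group of equal-valued S tuples fits in memory; `merge_join` and the sample rows are illustrative.

```python
def merge_join(r, s, r_key=0, s_key=0):
    """Join two lists of tuples that are sorted on their join attributes."""
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][r_key], s[j][s_key]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of S tuples with this join value (held in memory).
            j_end = j
            while j_end < len(s) and s[j_end][s_key] == a:
                j_end += 1
            # Pair every R tuple with this value against the whole S group.
            while i < len(r) and r[i][r_key] == a:
                out.extend(r[i] + s_t for s_t in s[j:j_end])
                i += 1
            j = j_end
    return out

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s))
```

If a group of equal-valued tuples did not fit in memory, the inner pairing would have to fall back to a nested loop over disk blocks, which is the higher-cost case mentioned above.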

Index Based Algorithms
Recall that in a clustered index all tuples with the same value of the key are clustered on as few blocks as possible
Note: the book uses another term: "clustering index". The difference is minor.
[Figure: tuples with the same key value a stored next to each other on a few blocks]

Index Based Selection
Selection on equality: σ a=v (R)
Clustered index on a: cost B(R)/V(R,a)
Unclustered index on a: cost T(R)/V(R,a)

Index Based Selection
Example: B(R) = 2000, T(R) = 100,000, V(R, a) = 20. Cost of σ a=v (R) = ?
Table scan (assuming R is clustered): B(R) = 2,000 I/Os
Index based selection:
If index is clustered: B(R)/V(R,a) = 100 I/Os
If index is unclustered: T(R)/V(R,a) = 5,000 I/Os
Lesson: don't build unclustered indexes when V(R,a) is small!
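The arithmetic of this example can be checked with a few helper functions. The function names are assumptions made for this sketch; the formulas and the statistics are the ones on the slide.

```python
def table_scan_cost(b_r):
    """Scan a clustered relation: one I/O per block."""
    return b_r

def clustered_index_selection_cost(b_r, v_r_a):
    """Matching tuples sit on adjacent blocks: B(R) / V(R, a)."""
    return b_r / v_r_a

def unclustered_index_selection_cost(t_r, v_r_a):
    """Each matching tuple may cost one I/O: T(R) / V(R, a)."""
    return t_r / v_r_a

b_r, t_r, v_r_a = 2000, 100_000, 20
print(table_scan_cost(b_r))                          # 2,000 I/Os
print(clustered_index_selection_cost(b_r, v_r_a))    # 100 I/Os
print(unclustered_index_selection_cost(t_r, v_r_a))  # 5,000 I/Os
```

The unclustered index costs more than a full table scan here, which is the point of the lesson: with few distinct values, each value matches many tuples scattered over many blocks.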

Index Based Join
R ⋈ S
Assume S has an index on the join attribute
Iterate over R, for each tuple fetch corresponding tuple(s) from S
Assume R is clustered. Cost:
If index is clustered: B(R) + T(R)B(S)/V(S,a)
If index is unclustered: B(R) + T(R)T(S)/V(S,a)

Index Based Join
Assume both R and S have a sorted index (B+ tree) on the join attribute
Then perform a merge join, called zig-zag join
Cost: B(R) + B(S)

Summary of External Join Algorithms
Block Nested Loop Join: B(S) + B(R)*B(S)/M
Partitioned Hash Join: 3B(R)+3B(S), assuming min(B(R),B(S)) <= M^2
Merge Join: 3B(R)+3B(S), assuming B(R)+B(S) <= M^2
Index Join: B(R) + T(R)B(S)/V(S,a), assuming…

Example
Product(pname, maker), Company(cname, city)
Select Product.pname From Product, Company Where Product.maker=Company.cname and Company.city = "Seattle"
How do we execute this query?

Example
Product(pname, maker), Company(cname, city)
Assume:
Clustered index: Product.pname, Company.cname
Unclustered index: Product.maker, Company.city

Logical Plan:
Product(pname,maker) ⋈ maker=cname σ city="Seattle" (Company(cname,city))

Physical plan 1:
Index-based selection for σ city="Seattle" on Company(cname,city)
Index-based join for ⋈ cname=maker with Product(pname,maker)

Physical plans 2a and 2b:
Merge-join for ⋈ maker=cname
Company(cname,city): index scan, then σ city="Seattle"
Product(pname,maker): scan and sort (2a) or index scan (2b)
Which one is better??

Physical plan 1:
Index-based selection for σ city="Seattle" on Company(cname,city): T(Company) / V(Company, city)
Index-based join for ⋈ cname=maker with Product(pname,maker): × T(Product) / V(Product, maker)
Total cost: T(Company) / V(Company, city) × T(Product) / V(Product, maker)

Physical plans 2a and 2b:
Merge-join for ⋈ maker=cname
No extra cost (why?)
Company(cname,city): table scan, then σ city="Seattle": B(Company)
Product(pname,maker): scan and sort (2a): 3B(Product); index scan (2b): T(Product)
Total cost:
(2a): 3B(Product) + B(Company)
(2b): T(Product) + B(Company)

Which one is better??
Plan 1: T(Company)/V(Company,city) × T(Product)/V(Product,maker)
Plan 2a: B(Company) + 3B(Product)
Plan 2b: B(Company) + T(Product)
It depends on the data!!

Example
T(Company) = 5, B(Company) = M = 100
T(Product) = 100, B(Product) = 1,000
We may assume V(Product, maker) ≈ T(Company) (why?)
Case 1: V(Company, city) ≈ T(Company), V(Company,city) = 2,000
Case 2: V(Company, city) << T(Company), V(Company,city) = 20

Which Plan is Best?
Plan 1: T(Company)/V(Company,city) × T(Product)/V(Product,maker)
Plan 2a: B(Company) + 3B(Product)
Plan 2b: B(Company) + T(Product)
Case 1:
Case 2:
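The three cost formulas can be compared mechanically once the statistics are known. The sketch below plugs in hypothetical statistics, which are assumptions for illustration and not necessarily the values used in class, and reports the cheapest plan for a "many distinct cities" case and a "few distinct cities" case. The names `plan_costs` and `best_plan` are made up for this example.

```python
def plan_costs(t_company, b_company, v_company_city,
               t_product, b_product, v_product_maker):
    """Evaluate the three cost formulas from the slides."""
    return {
        "plan 1": (t_company / v_company_city) * (t_product / v_product_maker),
        "plan 2a": b_company + 3 * b_product,
        "plan 2b": b_company + t_product,
    }

def best_plan(costs):
    """Name of the plan with the smallest estimated cost."""
    return min(costs, key=costs.get)

# Hypothetical statistics (assumptions, chosen only to show both outcomes).
stats = dict(t_company=5000, b_company=500,
             t_product=100_000, b_product=1000, v_product_maker=5000)

many_cities = plan_costs(v_company_city=5000, **stats)  # V(Company, city) close to T(Company)
few_cities = plan_costs(v_company_city=20, **stats)     # V(Company, city) << T(Company)
print(best_plan(many_cities), many_cities)
print(best_plan(few_cities), few_cities)
```

With many distinct cities the selection returns very few companies and the index-based plan wins; with few distinct cities it returns many, and the merge-join plan with sorting becomes cheaper.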

Lessons
Need to consider several physical plans, even for one simple logical plan
No magic "best" plan: it depends on the data
In order to make the right choice, need to have statistics over the data: the B's, the T's, the V's
