Published by Robert Gardner. Modified over 9 years ago.
1
L4.2.2. Distributed Query Optimization Algorithms
- System R and R*
- Hill Climbing and SDD-1
2
System R (Centralized) Algorithm
- Simple (single-relation) queries are executed according to the best access path.
- Joins are executed as follows:
  - Determine the possible orderings of joins
  - Determine the cost of each ordering
  - Choose the join ordering with the minimal cost
- For joins, two join methods are considered:
  - Nested loops
  - Merge join
3
System R Algorithm - Example
- Query: names of employees working on the CAD/CAM project.
- Assume that EMP has an index on ENO, ASG has an index on PNO, and PROJ has an index on PNO and an index on PNAME.
4
System R Algorithm - Example
- Choose the best access path to each relation:
  - EMP: sequential scan (no selection on EMP)
  - ASG: sequential scan (no selection on ASG)
  - PROJ: index on PNAME (there is a selection on PROJ based on PNAME)
- Determine the best join ordering among:
  - EMP ⋈ ASG ⋈ PROJ
  - ASG ⋈ PROJ ⋈ EMP
  - PROJ ⋈ ASG ⋈ EMP
  - ASG ⋈ EMP ⋈ PROJ
  - EMP × PROJ ⋈ ASG
  - PROJ × EMP ⋈ ASG
- Select the best ordering based on the join costs evaluated according to the two join methods.
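The enumeration of orderings can be sketched in Python. The join graph is taken from the example (EMP and ASG join on ENO, ASG and PROJ join on PNO); the function names are hypothetical. The sketch only separates orderings that need a Cartesian product from those that use join predicates throughout. It does not estimate costs, which a real optimizer would do for each remaining ordering and each join method.

```python
from itertools import permutations

# Join predicates from the example: EMP.ENO = ASG.ENO and ASG.PNO = PROJ.PNO.
JOIN_EDGES = {frozenset(("EMP", "ASG")), frozenset(("ASG", "PROJ"))}


def needs_cartesian_product(order):
    """Return True if some relation is added without a join predicate
    linking it to any relation already in the ordering."""
    joined = [order[0]]
    for rel in order[1:]:
        if not any(frozenset((rel, prev)) in JOIN_EDGES for prev in joined):
            return True
        joined.append(rel)
    return False


def classify_orderings(relations):
    """Split all orderings into predicate-connected and Cartesian ones."""
    connected, cartesian = [], []
    for order in permutations(relations):
        if needs_cartesian_product(order):
            cartesian.append(order)
        else:
            connected.append(order)
    return connected, cartesian


connected, cartesian = classify_orderings(["EMP", "ASG", "PROJ"])
print("join-only orderings:", connected)
print("orderings with a Cartesian product:", cartesian)
```

The two orderings that begin with EMP × PROJ or PROJ × EMP are the ones flagged as Cartesian; the other four use only join predicates.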
5
System R Example (cont'd)
[Figure: search tree of join alternatives. Level 1: EMP, ASG, PROJ. Level 2: EMP ⋈ ASG, EMP × PROJ, ASG ⋈ EMP, ASG ⋈ PROJ, PROJ ⋈ ASG, PROJ × EMP. Level 3: (ASG ⋈ EMP) ⋈ PROJ, (PROJ ⋈ ASG) ⋈ EMP.]
- The best total join order is one of:
  - (ASG ⋈ EMP) ⋈ PROJ
  - (PROJ ⋈ ASG) ⋈ EMP
6
System R Algorithm
- (PROJ ⋈ ASG) ⋈ EMP has a useful index on the select attribute and direct access to the join attributes of ASG and EMP.
- Final plan:
  - select PROJ using the index on PNAME
  - then join with ASG using the index on PNO
  - then join with EMP using the index on ENO
7
System R* Distributed Query Optimization
- Total-cost minimization: the cost function includes local processing as well as transmission.
- Algorithm:
  - For each relation in the query tree, find the best access path.
  - For the join of n relations, find the optimal join order strategy.
  - Each local site optimizes its local query processing.
8
Data Transfer Strategies
- Ship-whole: the entire relation is shipped and stored as a temporary relation. If the merge join algorithm is used, no temporary storage is needed and the join can be done in pipeline mode.
- Fetch-as-needed: for each tuple of the outer relation, only the matching tuples of the inner relation are fetched. This is equivalent to a semijoin of the inner relation with the outer relation tuple.
9
Join Strategy 1
- Join of outer relation R with inner relation S. Let LC be the local processing cost, CC the data transfer cost, and s the average number of tuples of S that match one tuple of R.
- Strategy 1: ship the entire outer relation to the site of the inner relation.
  TC = LC(get R) + CC(size(R)) + LC(get s tuples from S) * card(R)
10
Join Strategy 2
- Strategy 2: ship the entire inner relation to the site of the outer relation.
  TC = LC(get S) + CC(size(S)) + LC(store S) + LC(get R) + LC(get s tuples from S) * card(R)
11
Join Strategy 3
- Strategy 3: fetch tuples of the inner relation as needed for each tuple of the outer relation. Here A is the join attribute, len(A) the length of its value, and len(S) the length of a tuple of S.
  TC = LC(get R) + CC(len(A)) * card(R) + LC(get s tuples from S) * card(R) + CC(s * len(S)) * card(R)
12
Join Strategy 4
- Strategy 4: move both relations to a third site and join them there.
  TC = LC(get R) + LC(get S) + CC(size(S)) + LC(store S) + CC(size(R)) + LC(get s tuples from S) * card(R)
- Conceptually, the algorithm does an exhaustive search among all alternatives and selects the one that minimizes the total cost.
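A minimal sketch of how the four total-cost formulas could be compared. The unit costs (`LC_PER_TUPLE`, `CC_PER_MSG`, `CC_PER_BYTE`), the linear shapes of `lc` and `cc`, and the sample relation statistics are all assumptions made for illustration; they are not values from the slides.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    card: int       # number of tuples
    tuple_len: int  # bytes per tuple

    @property
    def size(self) -> int:
        return self.card * self.tuple_len


# Hypothetical unit costs (assumptions, not from the slides).
LC_PER_TUPLE = 1.0  # local cost to read or store one tuple
CC_PER_MSG = 10.0   # fixed cost of one message
CC_PER_BYTE = 0.1   # transfer cost per byte


def lc(tuples):
    """Local processing cost for a number of tuples."""
    return LC_PER_TUPLE * tuples


def cc(nbytes):
    """Communication cost of one message carrying nbytes."""
    return CC_PER_MSG + CC_PER_BYTE * nbytes


def strategy_costs(r, s, s_match, len_a):
    """Total cost of strategies 1-4 for outer R and inner S.

    s_match is the average number of S tuples matching one R tuple,
    len_a the length of the join attribute value.
    """
    get_r = lc(r.card)
    get_s = lc(s.card)
    store_s = lc(s.card)
    probe = lc(s_match) * r.card  # LC(get s tuples from S) * card(R)
    return {
        "strategy 1": get_r + cc(r.size) + probe,
        "strategy 2": get_s + cc(s.size) + store_s + get_r + probe,
        "strategy 3": get_r + cc(len_a) * r.card + probe
        + cc(s_match * s.tuple_len) * r.card,
        "strategy 4": get_r + get_s + cc(s.size) + store_s + cc(r.size) + probe,
    }


costs = strategy_costs(Relation(100, 50), Relation(1000, 40), s_match=2, len_a=4)
best = min(costs, key=costs.get)
print(best, costs)
```

With a small outer relation and a large inner relation, as in the sample call, shipping the outer relation is the cheapest option under these assumed unit costs. Different statistics or unit costs can change the winner, which is why all alternatives are costed.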
13
Hill Climbing Algorithm
- Inputs: query graph, locations of relations, and relation statistics.
- Initial solution: for each candidate result site, compute the cost of sending all relations to it. The least costly alternative is denoted ES0, and its site is the chosen site.
- Split ES0 into:
  - ES1: ship one relation of a join to the site of the other relation
  - ES2: join these two relations locally and transmit the result to the chosen site
- If cost(ES1) + cost(ES2) + LC > cost(ES0), select ES0; otherwise select ES1 and ES2.
- The process is applied recursively to ES1 and ES2 until no further benefit is obtained.
14
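Hill Climbing Algorithm - Example
[Figure: query graph. Result attribute SAL; selection PNAME = "CAD/CAM" on PROJ; join edges PROJ-ASG on PNO, ASG-EMP on ENO, EMP-PAY on TITLE.]
- Ignore the local processing cost.
- The tuple length is 1 for all relations.
- Relation placement (size in parentheses): Site 1: EMP(8); Site 2: PAY(4); Site 3: PROJ(1); Site 4: ASG(10).
- ES0: ship EMP, PAY, and PROJ to Site 4; cost = 8 + 4 + 1 = 13.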
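The choice of the initial solution ES0 can be checked with a short sketch. Relation sizes and sites come from the example; the function name `initial_solution` is hypothetical. Transfer cost is taken to be the number of tuples shipped, since the tuple length is 1 and local processing is ignored.

```python
# (site, size) for each relation, from the example.
RELATIONS = {"EMP": (1, 8), "PAY": (2, 4), "PROJ": (3, 1), "ASG": (4, 10)}


def initial_solution(relations):
    """Return (site, cost) of the cheapest plan that ships every
    relation to a single candidate result site."""
    candidates = {}
    for site in {loc for loc, _ in relations.values()}:
        # Everything not already stored at `site` must be shipped there.
        candidates[site] = sum(
            size for loc, size in relations.values() if loc != site
        )
    best_site = min(candidates, key=candidates.get)
    return best_site, candidates[best_site]


site, cost = initial_solution(RELATIONS)
print(f"ES0: ship all relations to site {site}, cost = {cost}")
```

The four candidate sites cost 15, 19, 22, and 13 respectively, so Site 4 (which holds ASG, the largest relation) is chosen with cost 13.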
15
HCA - Example
[Figure: two candidate refinements of ES0 over Site 1 EMP(8), Site 2 PAY(4), Site 3 PROJ(1), Site 4 ASG(10). Each splits ES0 into ES1, ES2, and ES3 around the join on TITLE. Solution 1 and Solution 2 are compared with ES0 (cost = 13).]
- ES0 is the best strategy.
16
Hill Climbing Algorithm - Comments
- Greedy algorithm: it determines an initial feasible solution and iteratively tries to improve it.
- If there are local minima, it may not find the global minimum.
- If the optimal solution has a high initial cost, it will not be found, since it will not be chosen as the initial feasible solution.
[Figure: alternative strategy over Site 1 EMP(8), Site 2 PAY(4), Site 3 PROJ(1), Site 4 ASG(10); the cost value is missing.]
17
SDD-1 Algorithm
- The SDD-1 algorithm generalizes the hill-climbing algorithm to determine an ordering of beneficial semijoins. It uses statistics on the database, called database profiles.
- Cost of a semijoin (R SJ_A S denotes the semijoin of R by S on attribute A):
  Cost(R SJ_A S) = C_MSG + C_TR * size(Π_A(S))
- Benefit is the cost of transferring the irrelevant tuples of R that the semijoin eliminates:
  Benefit(R SJ_A S) = (1 - SF_SJ(S.A)) * size(R) * C_TR
- A semijoin is beneficial if cost < benefit.
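The cost and benefit formulas translate directly into code. This is a sketch only: the function names are hypothetical, the defaults C_MSG = 0 and C_TR = 1 are assumptions chosen to keep the arithmetic simple, and the numbers in the sample call are invented for illustration.

```python
def semijoin_cost(size_proj_a_s, c_msg=0.0, c_tr=1.0):
    """Cost(R SJ_A S) = C_MSG + C_TR * size(PI_A(S))."""
    return c_msg + c_tr * size_proj_a_s


def semijoin_benefit(sf_sj_s_a, size_r, c_tr=1.0):
    """Benefit(R SJ_A S) = (1 - SF_SJ(S.A)) * size(R) * C_TR."""
    return (1 - sf_sj_s_a) * size_r * c_tr


def is_beneficial(sf_sj_s_a, size_r, size_proj_a_s, c_msg=0.0, c_tr=1.0):
    """A semijoin is beneficial if its cost is lower than its benefit."""
    cost = semijoin_cost(size_proj_a_s, c_msg, c_tr)
    benefit = semijoin_benefit(sf_sj_s_a, size_r, c_tr)
    return cost < benefit


# Illustrative numbers (not from the slides): R has size 1000, the
# projection of S on A has size 100, and the selectivity factor is 0.25.
print(semijoin_cost(100), semijoin_benefit(0.25, 1000))
print(is_beneficial(0.25, 1000, 100))
```

A selectivity factor close to 1 means the semijoin removes almost nothing from R, so the benefit approaches 0 and the semijoin is not worth its transfer cost.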
18
SDD-1: The Algorithm
- Initialization phase: generate all beneficial semijoins and an execution strategy that includes only local processing.
- Select the most beneficial semijoin; update the statistics and select the new beneficial semijoins.
- Repeat the previous step until no beneficial semijoins are left.
- Assembly site selection: choose the site at which the final local operations are performed.
- Postoptimization: remove unnecessary semijoins.
19
SDD-1 - Example
  SELECT *
  FROM EMP, ASG, PROJ
  WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO
- Site 1: EMP; Site 2: ASG; Site 3: PROJ
- Join graph: EMP-ASG on ENO, ASG-PROJ on PNO
20
SDD-1 - First Iteration
- SJ1: ASG SJ EMP; benefit = (1 - 0.3) * 3000 = 2100; cost = 120
- SJ2: ASG SJ PROJ; benefit = (1 - 0.4) * 3000 = 1800; cost = 200
- SJ3: EMP SJ ASG; benefit = (1 - 0.8) * 1500 = 300; cost = 400
- SJ4: PROJ SJ ASG; benefit = 0; cost = 400
- SJ1 is selected.
- The size of ASG is reduced to 3000 * 0.3 = 900; ASG' = ASG SJ EMP.
- The semijoin selectivity factors of ASG' are reduced; they are approximated by:
  - SF_SJ(ASG'.ENO) = 0.8 * 0.3 = 0.24
  - SF_SJ(ASG'.PNO) = 1.0 * 0.3 = 0.3
  - size(Π_ENO(ASG')) = 400 * 0.3 = 120
  - size(Π_PNO(ASG')) = 400 * 0.3 = 120
21
SDD-1 - Second and Third Iterations
Second iteration
- SJ2: ASG' SJ PROJ; benefit = (1 - 0.4) * 900 = 540; cost = 200
- SJ3: EMP SJ ASG'; benefit = (1 - 0.24) * 1500 = 1140; cost = 120
- SJ4: PROJ SJ ASG'; benefit = (1 - 0.3) * 2000 = 1400; cost = 120
- → SJ4 is selected: PROJ' = PROJ SJ ASG', size(PROJ') = 2000 * 0.3 = 600, SF_SJ(PROJ'.PNO) = 0.4 * 0.3 = 0.12, size(Π_PNO(PROJ')) = 200 * 0.3 = 60
Third iteration
- SJ2: ASG' SJ PROJ'; benefit = (1 - 0.12) * 900 = 792; cost = 60
- SJ3: EMP SJ ASG'; benefit = (1 - 0.24) * 1500 = 1140; cost = 120
- → SJ3 is selected; it reduces the size of EMP to 1500 * 0.24 = 360
- → Finally SJ2 is selected, which reduces the size of ASG' to 900 * 0.12 = 108
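The iterations can be replayed with a small simulation. This is a sketch under several assumptions: C_MSG = 0 and C_TR = 1 (which is what the quoted costs imply), each candidate semijoin is applied at most once, the "most beneficial" semijoin is the one with the largest benefit minus cost, and the statistics of a reduced relation are all scaled by the selectivity factor of the reducer. The database profile is read off the figures quoted in the example; the variable names are hypothetical. `Fraction` is used so that the arithmetic is exact.

```python
from fractions import Fraction as F

# Relation sizes, and per join attribute the semijoin selectivity
# factor ("sf") and the size of the projected attribute ("size").
size = {"EMP": F(1500), "ASG": F(3000), "PROJ": F(2000)}
attr = {
    ("EMP", "ENO"): {"sf": F("0.3"), "size": F(120)},
    ("ASG", "ENO"): {"sf": F("0.8"), "size": F(400)},
    ("ASG", "PNO"): {"sf": F("1.0"), "size": F(400)},
    ("PROJ", "PNO"): {"sf": F("0.4"), "size": F(200)},
}
# Candidates as (name, reduced relation R, attribute A, reducer S).
candidates = [
    ("SJ1", "ASG", "ENO", "EMP"),
    ("SJ2", "ASG", "PNO", "PROJ"),
    ("SJ3", "EMP", "ENO", "ASG"),
    ("SJ4", "PROJ", "PNO", "ASG"),
]


def evaluate(sj):
    """Return (benefit, cost) of a candidate under C_MSG = 0, C_TR = 1."""
    _, r, a, s = sj
    benefit = (1 - attr[(s, a)]["sf"]) * size[r]
    cost = attr[(s, a)]["size"]
    return benefit, cost


order = []
while candidates:
    scored = [(sj, *evaluate(sj)) for sj in candidates]
    beneficial = [(sj, b, c) for sj, b, c in scored if c < b]
    if not beneficial:
        break
    # Pick the semijoin with the largest net gain (benefit - cost).
    sj, _, _ = max(beneficial, key=lambda t: t[1] - t[2])
    name, r, a, s = sj
    sf = attr[(s, a)]["sf"]
    # Apply the semijoin: shrink R and rescale its attribute statistics.
    size[r] *= sf
    for (rel, _), stats in attr.items():
        if rel == r:
            stats["sf"] *= sf
            stats["size"] *= sf
    candidates.remove(sj)
    order.append(name)

print(order, {rel: int(v) for rel, v in size.items()})
```

Under these assumptions the simulation selects SJ1, SJ4, SJ3, and SJ2 in that order and ends with sizes 360 for EMP, 108 for ASG, and 600 for PROJ, matching the figures in the example.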
22
Local Optimization
- Each site optimizes the plan to be executed at that site.
- This is a centralized query optimization problem.
23
SDD-1 - Assembly Site Selection
- After reduction:
  - EMP is at Site 1 with size 360
  - ASG is at Site 2 with size 108
  - PROJ is at Site 3 with size 600
- → Site 3 is chosen as the assembly site.
- SJ4 is removed in postoptimization.
- Final strategy:
  - (ASG SJ EMP) SJ PROJ → Site 3
  - (EMP SJ ASG) → Site 3
  - join at Site 3
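A minimal sketch of the site choice, assuming the rule is to pick the site that already stores the largest amount of data after reduction, so that the least data has to be shipped. The function name `choose_assembly_site` is hypothetical; the sizes are the reduced sizes from the example.

```python
# site -> (relation, size after reduction), from the example.
reduced = {1: ("EMP", 360), 2: ("ASG", 108), 3: ("PROJ", 600)}


def choose_assembly_site(relations_by_site):
    """Return (site, shipped), where site holds the most data and
    shipped is the total size that must be sent to it."""
    site = max(relations_by_site, key=lambda s: relations_by_site[s][1])
    shipped = sum(
        sz for s, (_, sz) in relations_by_site.items() if s != site
    )
    return site, shipped


site, shipped = choose_assembly_site(reduced)
print(f"assembly site = {site}, data shipped = {shipped}")
```

Once Site 3 is the assembly site, PROJ never leaves it, so reducing PROJ saves no transfer cost. This is consistent with SJ4 being removed in postoptimization.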