Download presentation
Presentation is loading. Please wait.
Published byFrederica Craig Modified over 9 years ago
1
Query Processor A query processor is a module in the DBMS that performs the tasks to process, to optimize, and to generate execution strategy for a high-level query For a DDBMS, the QP also does data localization for the query based on the fragmentation scheme and generates the execution strategy that incorporates the communication operations involved in processing the query
2
Query Optimizer Queries expressed in SQL can have multiple equivalent relational algebra query expressions The distributed query optimizer must select the ordering of relational algebra operations, sites to process data, and possibly the way data should be transferred. This makes distributed query processing significantly more difficult
3
Complexity of Relational Algebra Operations The relational algebra is used to express the output of the query. The complexity of relational algebra operations play a role in defining some of the principles of query optimization. All complexity measures are based on the cardinality of the relation Operations Complexity Select, Project (w/o duplicate elimination)O(n) Project (with duplicate elimination), GroupO(n logn) Join, Semi-join, Division, Set OperatorsO(n logn) Cartesian ProductO(n 2 )
4
Characteristics of Query Processors Languages Input language can be relational algebra or calculus; output language is relational algebra (annotated with communication primitives). The query processor must efficiently map input language to output language Types of Optimization The output language specification represents the execution strategy. There can be many such strategies, the best one can be selected through exhaustive search, or by applying heuristic (minimize size of intermediate relations). For distributed databases semijoins can be applied to reduce data transfer.
5
When to Optimize Static: done before executing the query (at compilation time), cost of optimization amortized over multiple executions, mostly based on exhaustive search. Since sizes of intermediate relations need to be estimated, it can result in sub-optimal strategies. Dynamic: done at run time; every time the query is executed, can make use of exact sizes of intermediate relations, expensive, based on heuristics Hybrid: mixes static and dynamic approaches; the approach is mainly static, but dynamic query optimization may take place when high difference between predicted and actual sizes are detected
6
Characteristics of Query Processors Statistics fragment cardinality and size size and number of distinct values for each attribute. detailed histograms of attribute values for better selectivity estimation. Decision Sites one site or several sites participate in selection of strategy Exploitation of network topology wide area network communication cost local area network parallel execution
7
Characteristics of Query Processors Exploitation of replicated fragments larger number of possible strategies Use of Semijoins reduce size of data transfer increase # of messages and local processing good for fast or slow networks?
8
Layers of Query Processing QUERY DECOMPOSITION DATA LOCALIZATION GLOBAL OPTIMIZATION LOCAL OPTIMIZATION FRAGMENT SCHEMA STATISTICS ON FRAGMENTS LOCAL SCHEMA GLOBAL SCHEMA Calculus Query on Distributed Relations Algebra Query on Distributed Relations Fragment Query Optimized Fragment Query With Communication Operations Optimized Local Queries CONTROL SITE LOCAL SITE
9
Query Decomposition Normalization The calculus query is written in a normalized form (CNF or DNF) for subsequent manipulation Analysis The query is analyzed for semantic correctness Simplification Redundant predicates are eliminated to obtain simplified queries Restructuring The calculus query is translated to optimal algebraic query representation
10
Query Decomposition: Normalization Lexical and syntactic analysis check validity check for attributes and relations type checking on the qualification There are two possible forms of representing the predicates in query qualification: Conjunctive Normal Form (CNF) or Disjunctive Normal Form (DNF) CNF: (p 11 p 12 ... p 1n ) ... (p m1 p m2 ... p mn ) DNF: (p 11 p 12 ... p 1n ) ... (p m1 p m2 ... p mn ) OR's mapped into union AND's mapped into join or selection
11
Query Decomposition: Analysis Queries are rejected because the attributes or relations are not defined in the global schema; or operations used in qualifiers are semantically incorrect For only those queries that do not use disjunction or negation semantic correctness can be determined by using query graph One node of the query graph represents result sites, others operand relations, edge between nodes operand nodes represent joins, and edge between operand node and result node represents project
12
Query Graph and Join Graph SELECT Ename, Resp FROM E, G, J WHERE E. ENo = G. ENO AND G.JNO = J.JNO AND JNAME = ``CAD'' AND DUR >= 36 AND Title = ``Prog'' EMPResultASG G.JNO = J.JNO E. ENo = G. ENO Resp Ename DUR >= 36 JNAME = ``CAD'' Title = ``Prog'' EMPPROJASG G.JNO = J.JNO E. ENo = G. ENO PROJ
13
Disconnected Query Graph Semantically incorrect conjunctive multivariable query without negation have query graphs which are not connected SELECT Ename, Resp FROM E, G, J WHERE E. ENo = G. ENO AND JNAME = ``CAD'' AND DUR >= 36 AND Title = ``Prog'' EMPResultASG E. ENo = G. ENO Resp Ename DUR >= 36JNAME = ``CAD'' Title = ``Prog'' PROJ
14
Simplification: Eliminating Redundancy Elimination of redundant predicates using well known idempotency rules: p p = p;p p = p; p true = true; p false = p; p true = p; p false = false; p 1 (p 1 p 2 ) = p 1 ; p 1 (p 1 p 2 ) = p 1 Such redundant predicates arise when user query is enriched with several predicates to incorporate view relation correspondence, and ensure semantic integrity and security
15
Eliminating Redundancy-- An Example SELECT TITLE FROM E WHERE (NOT (TITLE = ``Programmer'') AND (TITLE = ``Programmer'' OR TITLE = ``Elec.Engr'') AND NOT (TITLE = ``Elec.Engr'')) OR ENAME = ``J.Doe''; SELECT TITLE FROM E WHERE ENAME = ``J.Doe'';
16
Eliminating Redundancy-- An Example p1 = p2 = p3 = The disjunctive normal form of the query is = ( ¬ p1 p1 ¬ p2) ( ¬ p1 p2 ¬ p2) p3 = (false ¬ p2) ( ¬ p1 false) Ú p3 = false false p3 = p3 Let the query qualification is ( ¬ p1 (p1 p2) ¬ p2) p3
17
Query Decomposition: Rewriting Rewriting calculus query in relational algebra; straightforward transformation from relational calculus to relational algebra, and restructuring relational algebra expression to improve performance
18
Rewriting -- Transformation Rules (I) Commutativity of binary operations: R S S R R S S R Associativity of binary operations: (R S) T R ( S T ) Idempotence of unary operations: grouping of projections and selections A’ ( A’’ (R )) A’ (R ) for A’ A’’ A p1(A1) ( p2(A2) (R )) p1(A1) p2(A2) (R ) R S S R (R S) T R (S T)
19
Rewriting -- Transformation Rules (II) Commuting selection with projection A1, …, An ( p (Ap) (R )) A1, …, An ( p (Ap) ( A1, …, An, Ap (R ))) Commuting selection with binary operations p (Ai) (R S) ( p (Ai) (R)) S p (Ai) (R S) ( p (Ai) (R)) S p (Ai) (R S) p (Ai) (R) p (Ai) (S) Commuting projection with binary operations C (R S) A (R) B (S) C = A B C (R S) C (R) C (S) C (R S) C (R) C (S)
20
An SQL Query and Its Query Tree ASGEMP ENAME (ENAME<>“J.DOE” ) (JNAME=“CAD/CAM” ) (Dur=12 Dur=24) PROJ SELECT Ename FROM J, G, E WHERE G.Eno=E.ENo AND G.JNo = J.JNo AND ENAME <> `J.Doe' AND JName = `CAD' AND (Dur=12 or Dur=24 ) JNO ENO
21
Query Decomposition: Rewriting ENAME JNO JNO, ENAME JNO, ENO ENO, ENAME Dur=12 Dur=24 JNAME=“CAD/CAM” ENAME<>“J.DOE” PROJASGEMP ENO JNO
22
Data Localization Input: Algebraic query on distributed relations Determine which fragments are involved Localization program substitute for each global query its materialization program optimize
23
Data Localization-- An Example PROJ ASG1 EMP1 ENAME Dur=12 Dur=24 JNAME=“CAD/CAM” ENAME<>“J.DOE” ENO JNO EMP is fragmented into EMP1 = ENO “E3” (EMP) EMP2 = “E3” < ENO “E6” (EMP) EMP3 = ENO >“E6” (EMP) ASG is fragmented into ASG1 = ENO “E3” (ASG) ASG2 = ENO >“E3” (ASG) EMP1 ASG2 ASG1
24
Reduction with Selection EMP is fragmented into EMP1 = ENO “E3” (EMP) EMP2 = “E3” < ENO “E6” (EMP) EMP3 = ENO >“E6” (EMP) SELECT * FROM EMP WHERE ENO=“E5” EMP1EMP2EMP3 ENO=“E5” EMP2 ENO=“E5” EMP ENO=“E5” Given Relation R, F R ={R 1, R 2, …, R n } where R j = pj (R) pj (R j ) = if x R: (p i (x) p j (x))
25
Reduction with join EMP is fragmented into EMP1 = ENO “E3” (EMP) EMP2 = “E3” < ENO “E6” (EMP) EMP3 = ENO >“E6” (EMP) ASG is fragmented into ASG1 = ENO “E3” (ASG) ASG2 = ENO >“E3” (ASG) ASG1 EMP1 ENO EMP1 ASG2 ASG1 SELECT * FROM EMP, ASG WHERE EMP.ENO=ASG.ENO ENO ASG EMP
26
ASG1 EMP1 ENO EMP2EMP3 ASG2 ASG1 Reduction with Join (I) (R1 R2) S (R1 S) (R2 S) ASG1EMP1 ENO ASG1EMP2 ENO ASG2EMP2 ENO ASG1EMP3 ENO ASG2EMP3 ENO ASG2EMP1 ENO
27
Reduction with Join (II) ASG1 EMP1 ENO ASG2 EMP2 ENO ASG2 EMP3 ENO Given R i = pi (R) and R j = pj (R) R i Rj = if x R i, y R j : (p i (x) p j (y)) Reduction with join 1. Distribute join over union 2. Eliminate unnecessary work
28
Reduction for VF Find useless intermediate relations Relation R defined over attributes A = {A1, A2, …, An} vertically fragmented as R i = A’ (R) where A’ A K,D (R i ) is useless if the set of projection attributes D is not in A’ EMP1= ENO,ENAME (EMP) EMP2= ENO,TITLE (EMP) SELECT ENAME FROM EMP EMP2 EMP1 ENO ENAME EMP1 ENAME
29
Reduction for DHF Distribute joins over union Apply the join reduction for horizontal fragmentation EMP1: TITLE=“Programmer” (EMP) EMP2: TITLE “Programmer” (EMP) ASG1: ASG ENO EMP1 ASG2: ASG ENO EMP2 SELECT * FROM EMP, ASG WHERE ASG.ENO = EMP.ENO AND EMP.TITLE = “Mech. Eng.” ASG1 EMP1 ENO EMP2 ASG2 ASG1 TITLE=“MECH. Eng.”
30
Reduction for DHF (II) ASG1 EMP2 TITLE=“Mech. Eng.” ENO ASG1 ASG2 EMP2 TITLE=“Mech. Eng.” ENO ASG1 ASG2 EMP2 TITLE=“Mech. Eng.” ENO ASG1 ENO EMP2 ASG2 ASG1 TITLE=“Mech. Eng.” Selection first Joins over union
31
Reduction for HF Remove empty relations generated by contradicting selection on horizontal fragments; Remove useless relations generated by projections on vertical fragments; Distribute joins over unions in order to isolate and remove useless joins
32
Reduction for HF --An Example EMP1 = ENO “E4” ( ENO,ENAME (EMP)) EMP2 = ENO>“E4” ( ENO,ENAME (EMP)) EMP3 = ENO,TITLE (EMP) QUERY SELECT ENAME FROM EMP WHERE ENO = “E5” ASG1 ENO EMP3 EMP2 EMP1 ENO=“E5” ENAME EMP2 ENO=“E5” ENAME
33
Why Optimization – An Example Query Select ename From EMP e, ASG g Where e.Eno = g. Eno And resp = ‘‘manager’’ EMP(eno, ename, title) ASG(eno, jno, resp, dur) Find the name of the employees who are managing a project? ASG EMP ASG resp=”manager” EMP.Eno=ASG.Eno Ename Database SQL Query RA tree
34
Example - Strategies EMP1 = ENO <= 100 (EMP) at site 1 EMP2 = ENO > 100 (EMP) at site 2 ASG1 = ENO <= 100 (ASG) at site 3 ASG2 = ENO > 100 (ASG) at site 4 Fragment Schema Query site: Site 5 ENO ASG1 resp=“manager ” EMP1 ENO ASG2 resp=“manager ” EMP2 Site 5 ASG1 resp=“manager ” EMP1 ENO ASG2 EMP2 Plan A Plan B ASG1’ASG2’
35
Example – DB Statistics & Costs Database Statistics EMP has 400 tuples, ASG has 1000 tuples, there are 20 managers in G the data is uniformly distributed among sites. ASG and EMP are locally clustered on attributes RESP and ENO, respectively Costs tuple access t acc = 1 unit, tuple transfer t trans = 10 units,
36
Costs for Example Plan The cost of Plan A: Produce ASG’ = 20 t acc =20 (processing locally) Transfer ASG’ = 20 *t trans =200(transfer to EMP site) Produce EMP’ = (10+10) * t acc * 2 = 40(join at the EMP site) Transfer EMP’ = 20 * t trans =200(send to Site 5) Total cost = 460 The cost of Plan B: Transfer EMP = 400 * t trans = 4,000(send EMP to Site 5) Transfer ASG = 1000 * t trans = 10,000(send ASG to Site 5) Produce ASG’ = 1000 * t acc = 1,000(selection at Site 5) Join EMP and ASG’ = 400 * 20 * t acc = 8,000 (join at Site 5) Total cost = 23,000
37
Query Optimization Problems in query optimization Determining the physical copies of the fragments upon which to execute the fragment query expressions (also known as materialization) Selecting the order of execution of operations Selecting the method for executing each operation The above problems are not independent, for instance, the choice of the best materialization for a query depends on the order in which operations are executed. But they are treated as independent. Further, We bypass (1) by taking materialization for granted We bypass (3) by clustering all operations at the same site as a local database system dependent problem
38
Query Optimization - Objectives The selection of alternative query execution strategies is made based on predetermined objectives Two main objectives: minimize the total processing time (total cost) –network and computers at nodes do not get loaded. –Response time cannot be guaranteed minimize the response time –allocation must facilitate parallel execution of the query –but throughput may decrease and cost can be higher than total cost Total processing time (cost) is the sum of all the time (cost) incurred in executing the query (CPU, I/O, data transfer) Response time is the elapsed time from the initiation till the completion of the query
39
Optimization Algorithms – The Issues Cost model cost components weights for each components costs for primitive operations Search space The set of equivalent algebra expressions (query trees) Search strategies How do we move inside the search space Exhaustive search, heuristics, …
40
Cost Models The cost measures are: I/O and CPU for centralized DBMSs and I/O, CPU and data transfer costs for DDBMS Total cost = CPU cost + I/O cost + communication cost CPU cost: C cpu * #insts I/O cost:C i/o * #i/os Communication CostC msg *#msgs + C tr *#bytes –C cpu, C i/o, C tr and C msg are all assumed to be constants. Response time = sum (sequential operations) C cpu *s_#insts C i/o *s_#i/os C msg *s_#msg + c tr *s_#bytes –S_x stands for maximum number of sequential x’s that need to be executed to process the query
41
Intermediate Result Size The size of the intermediate relations produced during the execution facilitates the selection of the execution strategy This is useful in selecting an execution strategy that reduces data transfer The sizes of intermediate relations need to be estimated based on cardinalities of relations and lengths of attributes R{A 1, A 2,..., A n } fragmented as R 1,R 2,…, R n the statistical data collected typically are len(A i ), length of attribute A i in bytes min(A i ) and max(A i ) for ordered domains card(dom(A i )) unique values in dom[A i ] Number of tuples in each fragment card(R j )
42
Intermediate Size Estimation Join selectivity factor SF j (r,s) = card(r * s) / card(r) * card(s) Selecton selectivity factor SF S (F) = card( f (r)) / card(r) size(r) = card(r) * len(r) Cardinality of intermediate relations SF S (A = value) = 1/card(dom(A)) SF S (A > value) = max(A) - value/max(A)-min(A) SF S (A < value) = value - min(A)/max(A)-min(A) Sf s (p(A i ) p(A j )) = sf s (p(A i )) * sf s (p(A j )) Sf s (p(A i ) p(A j )) = sf s (p(A i )) + sf s (p(A j )) - sf s (p(A i )) * sf s (p(A j )) SF S (A {values}) = SF S (A = value) * card(values)
43
Intermediate Size Estimation (II) Projection card( a (r)) = card(r) Cartesian product card(r X S) = card(r) * card(s) Join card(R A=B S) = card(s); if A is key in R, B is foreign key in S card(R A=B S) = SF J (R,S) * card(r) * card(s) Union Upper bound = card(r) + card(s) Lower bound = max{card(r), card(s)}
44
Cost of Processing Primitive Operations Selection Projection Union Join nested-loops sort-merge hash-based For distributed join, semi-join is proposed to perform joins
45
Semi-join R S R’= A (R) S’ = R’ S S’ R S’ R S Amount of data transferred: |R’| + |S’| 1. join is replaced with a project; followed by semi-join; and then join 2. the project and join operations are done at one site, and semi-join at another site 3.amount of data transferred: |R’| + |S’|
46
Semi-join versus Join using sem-ijoin increases local processing costs because a relation must be scanned twice (join, project) For joining intermediate relations produced during sem-ijoin one cannot exploit indices on the base relations Sem-ijoin may not be good when communication costs are low
47
Search Space Search space is characterized by alternative execution plans Most optimizers focus on join trees For N relations, there are O(N!) equivalent join trees SELECT ENAME, RESP FROM EMP, ASG, PROJ WHERE EMP.ENO=ASG.ENO AND ASG.PNO=PROJ.PNO ENO ASG EMP PNO PROJ ENO ASG EMP PNO PROJ ASG EMP PNO,EN O PROJ
48
Restricting Search Space O(N!) is large Considering join methods, the search space is even bigger Restrict by means of heuristics Ignore cartisian product … Restrict the shape of the join tree Only consider deep trees …. R1R1 R2R2 R3R3 R1R1 R2R2 R3R3 R4R4 R4R4 R1R1 R2R2 R3R3 R4R4 deep tree Left-deep tree bushy tree
49
Search Strategy How to move in the search space to find the optimal plan Deterministic Start from base relations and build plans by adding relations at each step Dynamic programming: breadth-first Greedy: depth-first Randomized Search for the optimal one around a particular starting point –simulated annealing –iterative improvement
50
Search Strategies -- Example R1R1 R2R2 R3R3 R4R4 R1R1 R2R2 R1R1 R2R2 R3R3 R1R1 R3R3 R4R4 R2R2 R1R1 R3R3 R2R2 R4R4 R1R1 R2R2 R3R3 R4R4 Deterministic Randomized
51
Distributed Query Optimization Algorithms System R and R* Hill Climbing and SDD-1
52
System R (Centralized) Algorithm Simple (one relation) queries are executed according to the best access path. Execute joins Determine the possible ordering of joins Determine the cost of each ordering Choose the join ordering with the minimal cost For joins, two join methods are considered: Nested loops Merge join
53
System R Algorithm -- Example Names of employees working on the CAD/CAM project Assume EMP has an index on ENO, ASG has an index on PNO, PROJ has an index on PNO and an index on PNAME
54
System R Algorithm -- Example Choose the best access paths to each relation EMP: sequential scan (no selection on EMP) ASG: sequential scan (no selection on ASG) PROJ: index on PNAME (there is a selection on PROJ based on PNAME) Determine the best join ordering EMP ASG PROJ ASG PROJ EMP PROJ ASG EMP ASG EMP PROJ EMP PROJ ASG PROJ EMP ASG Select the best ordering based on the join costs evaluated according to the two methods
55
System R Example (cont'd) Best total join order is one of EMPASG PROJ EMP ASGASG EMPPROJ × ASGASG PROJEMP × PROJ (ASG EMP) PROJ (PROJ ASG) EMP PROJ ASG (ASG EMP) PROJ (PROJ ASG) EMP
56
System R Algorithm (PROJ ASG) EMP has a useful index on the select attribute and direct access to the join attributes of ASG and EMP. Final plan: select PROJ using index on PNAME then join with ASG using index on PNO then join with EMP using index on ENO
57
System R* Distributed Query Optimization Total-cost minimization. Cost function includes local processing as well as transmission. Algorithm For each relation in query tree find the best access path For the join of n relations find the optimal join order strategy each local site optimizes the local query processing
58
Data Transfer Strategies Ship-whole. entire relation is shipped and stored as temporary relation, merge join algorithm is used, done in pipeline mode Fetch-as-needed. this method is equivalent to semijoin of the inner relation with the outer relation tuple
59
Join Strategy 1 External relation R with internal relation S, let LC be local processing cost, CC be data transfer cost, let average number of tuples of S that match one tuple of R be s Strategy 1. Ship the entire outer relation to the site of internal relation TC = LC(get R) + CC(size(R)) + LC(get s tuples from S)*card(R)
60
Join Strategy 2 Ship the entire inner relation to the site of the outer relation TC = LC(get S) + CC(size(S)) + LC(store S) + LC(get R) + LC(get s tuples from S)*card(R)
61
Join Strategy 3 Fetch tuples of the inner relation for each tuple of the outer relation TC = LC(get R) + CC(len(A)) * card(R) + LC(get s tuples from S) * card(R) + CC(s*len(S))*card(R)
62
Join Strategy 4 Move both relations to 3rd site and join there TC = LC(get R) + LC(get S) + CC(size(S)) + LC(store S) + CC(size(R)) + LC(get s tuples from S)*card(R) Conceptually, the algorithm does an exhaustive search among all alternatives and selects one that minimizes total cost
63
Hill Climbing Algorithm - Algorithm Inputs query graph, locations of relations, and relation statistics Initial solution the least costly among all when the relations are sent to a candidate result site denoted by ES 0, and the site as chosen site Splits ES 0 into ES 1 : ship one relation of join to the site of other relation ES 2 : these two relations are joined locally and the result is transmitted to the chosen site If cost(ES 1 ) + cost(ES 2 ) + LC > cost (ES 0 ) select ES 0, else select ES 1 and ES 2. The process can be recursively applied to ES 1 and ES 2 till no more benefit occurs
64
Hill Climbing Algorithm - Example SAL PNAME=“CAD/CAM” PROJ ASG EMP PNO TITLE ENO PAY Ignore the local processing cost Length of tuples is 1 for all relation Site1 EMP(8) Site2 PAY(4) Site3 PROJ(1) Site4 ASG(10) ES 0 Cost = 13 8 4 1
65
HCA - Example Site1 EMP(8) Site2 PAY(4 ) Site3 PROJ(1) Site4 ASG(10) ? ? ? TITLE ES 1 ES 2 ES 3 Site1 EMP(8) Site2 PAY(4) Site3 PROJ(1) Site4 ASG(10) Site1 EMP(8) Site2 PAY(4) Site3 PROJ(1) Site4 ASG(10) ES 0 Cost = 13 8 4 1 Solution 1 Cost = Solution 2 Cost = ES 1 ES 2 ES 3 ESo is the “BEST”
66
Hill Climbing Algorithm - Comments Greedy algorithm: determines an initial feasible solution and iteratively tries to improve it. If there are local minimas, it may not find the global minima If the optimal solution has a high initial cost, it won’t be found since it won’t be chosen as the initial feasible solution. Site1 EMP(8) Site2 PAY(4) Site3 PROJ(1) Site4 ASG(10) COST =
67
SDD-1 Algorithm SDD-1 algorithm generalized the hill-climbing algorithm to determine ordering of beneficial semijoins; and uses statistics on the database, called database profiles. Cost of semijoin: Cost (R SJ A S) = C MSG + C TR *size( A (S)) Benefit is the cost of transferring irrelevant tuple Benefit(R SJ A S) = (1-SF SJ (S.A)) * size(R) * C TR A semijoin is beneficial if cost < benefit.
68
SDD-1: The Algorithm initialization phase generates all beneficial semijoins, and an execution strategy that includes only local processing most beneficial semijoin is selected; statistics are modified and new beneficial semijoins are selected the above step is done until no more beneficial joins are left assembly site selection to perform local operations postoptimization removes unnecessary semijoins
69
SDD1 - Example SELECT * FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO Site 1 EMP Site 2 ASG Site 3 PROJ ENO PNO
70
SDD1 - First Iteration SJ1: ASG SJ EMP benefit = (1-0.3)*3000 = 2100; cost = 120 SJ2: ASG SJ PROJ benefit = (1-0.4)*3000 = 1800 cost = 200 SJ3: EMP SJ ASG benefit = (1-0.8)*1500 = 300; cost = 400 SJ4: PROJ SJ ASG benefit = 0; cost = 400 SJ1 is selected ASG size is reduced to 3000*0.3=900 ASG’ = ASG SJ EMP Semijoin selectivity factor is reduced; it is approximated by SF SJ (G.ENO)= 0.8*0.3 = 0.24
71
SDD-1 - Second & Third Iterations Second iteration SJ2: ASG’ SJ PROJ benefit=(1-0.4)*900=540 cost=200; SJ3: EMP SJ ASG’; benefit=(1-0.24)*1500=1140 cost=400 è SJ3 is selected EMP’ = EMP SJ ASG size(EMP’) = 1500*0.24 = 360 Third Iteration SJ2: ASG’ SJ PROJ benefit=(1-0.4)*900=540 cost=200; è it is selected reduces size of G further to 900*0.4=360
72
Local Optimization Each site optimizes the plan to be executed at the site A centralized query optimization problem
73
SDD-1 - Assembly Site Selection After reduction EMP is at site 1 with size 360 ASG is at site 2 with size 360 PROJ is at site 3 with size 2000 è Site 3 is chosen as assembly site no semijoins reduced in post optimization. Site1 EMP Site3 PROJ Site2 ASG (ASG SJ EMP) SJ PROJ site 3 (EMP SJ ASG) site 3 join at site 3
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.