L4: Query Optimization (1)
L4: Query Processing and Optimization
- 4.1 Query Processing: Query Decomposition, Data Localization
- 4.1 Query Optimization


Query Processing
- Any high-level query (SQL) on a database must be processed, optimized, and executed by the DBMS.
- The high-level query is scanned and parsed to check for syntactic correctness.
- An internal representation of the query is created, either a query tree or a query graph.
- The DBMS then devises an execution strategy for retrieving the result of the query. (An execution strategy is a plan for executing the query by accessing the data and storing the intermediate results.)
- The process of choosing one of the many possible execution strategies is known as query optimization.

Query Processor
- A query processor is the DBMS module that processes and optimizes a high-level query and generates an execution strategy for it.
- In a DDBMS, the query processor also performs data localization based on the fragmentation scheme, and it generates an execution strategy that includes the communication operations needed to process the query.

Query Optimizer
- A query expressed in SQL can have multiple equivalent relational algebra expressions.
- The distributed query optimizer must select the ordering of relational algebra operations, the sites at which to process data, and possibly the way data should be transferred. This makes distributed query processing significantly more difficult than centralized query processing.

Complexity of Relational Algebra Operations
- Relational algebra is used to express the output of the query. The complexity of relational algebra operations plays a role in defining some of the principles of query optimization. All complexity measures are based on the cardinality n of the relation.

  Operation                                          Complexity
  Select, Project (without duplicate elimination)    O(n)
  Project (with duplicate elimination), Group        O(n log n)
  Join, Semi-join, Division, Set operators           O(n log n)
  Cartesian product                                  O(n^2)

- This table is given in the book (p. 194). It is oversimplified.

Characteristics of Query Processors
- Statistics
  - fragment cardinality and size
  - size and number of distinct values of each attribute
  - detailed histograms of attribute values for better selectivity estimation
- Decision sites
  - one site or several sites participate in the selection of the strategy
- Exploitation of network topology
  - wide area network: communication cost
  - local area network: parallel execution

Characteristics of Query Processors
- Exploitation of replicated fragments
  - larger number of possible strategies
- Use of semijoins
  - reduce the size of data transfer
  - increase the number of messages and the local processing
  - good for fast or slow networks?

Layers of Query Processing
Calculus query on distributed relations
  -> QUERY DECOMPOSITION [control site; uses the global schema]
Algebraic query on distributed relations
  -> DATA LOCALIZATION [control site; uses the fragment schema]
Fragment query
  -> GLOBAL OPTIMIZATION [control site; uses statistics on fragments]
Optimized fragment query with communication operations
  -> LOCAL OPTIMIZATION [local sites; uses the local schemas]
Optimized local queries

Query Decomposition
- Normalization
  - Convert from a general language (SQL) to a "standard" form (e.g., relational algebra).
  - The query qualification is written in a normalized form (CNF or DNF) for subsequent manipulation.
- Analysis
  - The query is analyzed for syntactic and semantic correctness.
- Simplification
  - Redundant predicates are eliminated to obtain simplified queries.
- Restructuring
  - The calculus query is translated into an algebraic query, which is then restructured to improve performance.

Query Decomposition: Normalization
- There are two possible forms of representing the predicates in the query qualification: Conjunctive Normal Form (CNF) or Disjunctive Normal Form (DNF).
  - CNF: $(p_{11} \lor p_{12} \lor \dots \lor p_{1n}) \land \dots \land (p_{m1} \lor p_{m2} \lor \dots \lor p_{mn})$
  - DNF: $(p_{11} \land p_{12} \land \dots \land p_{1n}) \lor \dots \lor (p_{m1} \land p_{m2} \land \dots \land p_{mn})$
  - ORs are mapped into unions.
  - ANDs are mapped into joins or selections.
- Lexical and syntactic analysis
  - check validity
  - check for attributes and relations
  - type checking on the qualification

Example
SELECT A, C
FROM R, S
WHERE (R.B = 1 AND S.D = 2) OR (R.C > 3 AND S.D = 2)
- Conjunctive normal form of the qualification: $(R.B = 1 \lor R.C > 3) \land S.D = 2$
- Query tree: $\Pi_{A,C}(\sigma_{(R.B = 1 \lor R.C > 3) \land S.D = 2}(R \times S))$

Query Decomposition: Analysis
- Queries are rejected because
  - the attributes or relations are not defined in the global schema, or
  - the operations used in the qualification are semantically incorrect.
- Semantic correctness can be determined with a query graph only for queries that use neither disjunction nor negation.
- In the query graph, one node represents the result relation and the other nodes represent operand relations. An edge between two operand nodes represents a join; an edge between an operand node and the result node represents a projection.

Analysis: Detect Invalid Expressions
- E.g., SELECT * FROM R WHERE R.A = 3 is invalid if R does not have an attribute A.

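Query Graph and Join Graph
SELECT Ename, Resp
FROM E, G, J
WHERE E.ENo = G.ENo AND G.JNo = J.JNo AND JNAME = 'CAD' AND DUR >= 36 AND Title = 'Prog'
- Query graph (E, G, and J denote EMP, ASG, and PROJ): nodes EMP, ASG, PROJ, and RESULT.
  - Join edges: EMP to ASG (E.ENo = G.ENo) and ASG to PROJ (G.JNo = J.JNo).
  - Projection edges: EMP to RESULT (Ename) and ASG to RESULT (Resp).
  - Selection predicates: Title = 'Prog' on EMP, DUR >= 36 on ASG, JNAME = 'CAD' on PROJ.
- Join graph: only the operand nodes EMP, ASG, and PROJ, with the two join edges.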

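Disconnected Query Graph
- A conjunctive multivariable query without negation is semantically incorrect if its query graph is not connected. In the query below, the join predicate G.JNo = J.JNo is missing, so the PROJ node is disconnected from the rest of the graph.
SELECT Ename, Resp
FROM E, G, J
WHERE E.ENo = G.ENo AND JNAME = 'CAD' AND DUR >= 36 AND Title = 'Prog'
- Query graph: join edge EMP to ASG (E.ENo = G.ENo); projection edges EMP to RESULT (Ename) and ASG to RESULT (Resp); selection predicates Title = 'Prog' on EMP, DUR >= 36 on ASG, and JNAME = 'CAD' on PROJ, which has no edge.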
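The connectivity test above can be sketched as a breadth-first search over the query graph. This is a minimal sketch, assuming the graph is given as a list of node names plus a list of undirected edges. The helper `is_connected` and the edge lists are illustrative and are built from the two example queries. Selection predicates only label a node, so they add no edges.

```python
from collections import deque


def is_connected(nodes, edges):
    """Return True if every node of the (undirected) query graph is reachable."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return len(seen) == len(nodes)


NODES = ["EMP", "ASG", "PROJ", "RESULT"]

# First query: both join edges plus the projection edges to RESULT.
full_query = [("EMP", "ASG"), ("ASG", "PROJ"), ("EMP", "RESULT"), ("ASG", "RESULT")]
# Second query: the join predicate G.JNo = J.JNo is missing.
missing_join = [("EMP", "ASG"), ("EMP", "RESULT"), ("ASG", "RESULT")]

print(is_connected(NODES, full_query))    # True
print(is_connected(NODES, missing_join))  # False: PROJ is isolated
```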

Simplification: Eliminating Redundancy
- Redundant predicates are eliminated using well-known idempotency rules:
  $p \land p = p$; $p \lor p = p$; $p \lor \text{true} = \text{true}$; $p \lor \text{false} = p$; $p \land \text{true} = p$; $p \land \text{false} = \text{false}$; $p_1 \land (p_1 \lor p_2) = p_1$; $p_1 \lor (p_1 \land p_2) = p_1$
- Such redundant predicates arise when a user query is enriched with several predicates to incorporate view-relation correspondence and to ensure semantic integrity and security.

Eliminating Redundancy: An Example
SELECT TITLE
FROM E
WHERE (NOT (TITLE = 'Programmer')
       AND (TITLE = 'Programmer' OR TITLE = 'Elec.Engr')
       AND NOT (TITLE = 'Elec.Engr'))
   OR ENAME = 'J.Doe';
simplifies to
SELECT TITLE
FROM E
WHERE ENAME = 'J.Doe';

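Eliminating Redundancy: An Example
- Let p1 = (TITLE = 'Programmer'), p2 = (TITLE = 'Elec.Engr'), and p3 = (ENAME = 'J.Doe').
- The query qualification is $(\lnot p_1 \land (p_1 \lor p_2) \land \lnot p_2) \lor p_3$.
- Its disjunctive normal form simplifies, using $p \land \lnot p = \text{false}$, as follows:
  $(\lnot p_1 \land p_1 \land \lnot p_2) \lor (\lnot p_1 \land p_2 \land \lnot p_2) \lor p_3$
  $= (\text{false} \land \lnot p_2) \lor (\lnot p_1 \land \text{false}) \lor p_3$
  $= \text{false} \lor \text{false} \lor p_3 = p_3$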
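Because the qualification involves only three atomic predicates, the simplification can be double-checked by brute force over all eight truth assignments. This is a minimal sketch. It treats p1, p2, and p3 as independent Boolean variables, which is stricter than needed, because TITLE cannot in fact equal both values at once. The function names are illustrative.

```python
from itertools import product


def original(p1, p2, p3):
    # (NOT p1 AND (p1 OR p2) AND NOT p2) OR p3
    return (not p1 and (p1 or p2) and not p2) or p3


def simplified(p1, p2, p3):
    return p3


def equivalent(f, g, arity):
    """Compare two Boolean predicates over every truth assignment."""
    return all(f(*v) == g(*v) for v in product((False, True), repeat=arity))


print(equivalent(original, simplified, 3))  # True
```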

Query Decomposition: Rewriting
- Rewriting the calculus query in relational algebra involves
  - a straightforward transformation from relational calculus to relational algebra, and
  - restructuring the relational algebra expression to improve performance.

Rewriting: Transformation Rules (I)
- Commutativity of binary operations:
  $R \times S \equiv S \times R$; $R \cup S \equiv S \cup R$; $R \bowtie S \equiv S \bowtie R$
- Associativity of binary operations:
  $(R \times S) \times T \equiv R \times (S \times T)$; $(R \bowtie S) \bowtie T \equiv R \bowtie (S \bowtie T)$
- Idempotence of unary operations (grouping of projections and selections):
  $\Pi_{A'}(\Pi_{A''}(R)) \equiv \Pi_{A'}(R)$ for $A' \subseteq A'' \subseteq A$
  $\sigma_{p_1(A_1)}(\sigma_{p_2(A_2)}(R)) \equiv \sigma_{p_1(A_1) \land p_2(A_2)}(R)$

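Rewriting: Transformation Rules (II)
- Commuting selection with projection:
  $\Pi_{A_1,\dots,A_n}(\sigma_{p(A_p)}(R)) \equiv \Pi_{A_1,\dots,A_n}(\sigma_{p(A_p)}(\Pi_{A_1,\dots,A_n,A_p}(R)))$
- Commuting selection with binary operations, where $A_i$ is an attribute of $R$:
  $\sigma_{p(A_i)}(R \times S) \equiv \sigma_{p(A_i)}(R) \times S$
  $\sigma_{p(A_i)}(R \bowtie S) \equiv \sigma_{p(A_i)}(R) \bowtie S$
  $\sigma_{p(A_i)}(R \cup S) \equiv \sigma_{p(A_i)}(R) \cup \sigma_{p(A_i)}(S)$
- Commuting projection with binary operations, where $C = A \cup B$, $A$ is a set of attributes of $R$, and $B$ is a set of attributes of $S$:
  $\Pi_C(R \times S) \equiv \Pi_A(R) \times \Pi_B(S)$
  $\Pi_C(R \bowtie S) \equiv \Pi_A(R) \bowtie \Pi_B(S)$ (the join attributes must be kept in $A$ and $B$)
  $\Pi_C(R \cup S) \equiv \Pi_C(R) \cup \Pi_C(S)$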
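The rule for commuting selection with a join can be checked on a toy instance. This is a minimal sketch, assuming relations are lists of Python dicts and the join is an equi-join on one common attribute. The relation contents are hypothetical. Both evaluation orders return the same tuples. The pushed-down order filters EMP before the join and so builds a smaller intermediate result.

```python
def select(relation, predicate):
    """sigma_p(relation) over a list of dict tuples."""
    return [t for t in relation if predicate(t)]


def join(r, s, attr):
    """Equi-join r and s on the common attribute attr."""
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]


def is_programmer(t):
    # p(A_i) with A_i = TITLE, an attribute of EMP only
    return t["TITLE"] == "Programmer"


EMP = [{"ENO": "E1", "TITLE": "Elect. Eng."}, {"ENO": "E2", "TITLE": "Programmer"}]
ASG = [{"ENO": "E1", "DUR": 12}, {"ENO": "E2", "DUR": 24}, {"ENO": "E2", "DUR": 36}]

late = select(join(EMP, ASG, "ENO"), is_programmer)   # sigma_p(EMP join ASG)
early = join(select(EMP, is_programmer), ASG, "ENO")  # sigma_p(EMP) join ASG
print(late == early)  # True
```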

An SQL Query and Its Query Tree
SELECT Ename
FROM J, G, E
WHERE G.Eno = E.ENo AND G.JNo = J.JNo AND ENAME <> 'J.Doe' AND JName = 'CAD/CAM' AND (Dur = 12 OR Dur = 24)
- Query tree (J, G, and E denote PROJ, ASG, and EMP):
  $\Pi_{ENAME}(\sigma_{DUR=12 \lor DUR=24}(\sigma_{JNAME=\text{"CAD/CAM"}}(\sigma_{ENAME \neq \text{"J.DOE"}}(PROJ \bowtie_{JNO} (ASG \bowtie_{ENO} EMP)))))$

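Query Decomposition: Rewriting
- Rewritten query tree, with selections and projections pushed toward the leaves:
  $\Pi_{ENAME}(\Pi_{JNO}(\sigma_{JNAME=\text{"CAD/CAM"}}(PROJ)) \bowtie_{JNO} \Pi_{JNO,ENAME}(\Pi_{JNO,ENO}(\sigma_{DUR=12 \lor DUR=24}(ASG)) \bowtie_{ENO} \Pi_{ENO,ENAME}(\sigma_{ENAME \neq \text{"J.DOE"}}(EMP))))$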

Data Localization
Input: an algebraic query on distributed relations
- Determine which fragments are involved.
- Localization program:
  - substitute for each global relation its materialization program (the reconstruction of the relation from its fragments);
  - optimize.

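Data Localization: An Example
- EMP is fragmented into
  $EMP_1 = \sigma_{ENO \le \text{"E3"}}(EMP)$, $EMP_2 = \sigma_{\text{"E3"} < ENO \le \text{"E6"}}(EMP)$, $EMP_3 = \sigma_{ENO > \text{"E6"}}(EMP)$
- ASG is fragmented into
  $ASG_1 = \sigma_{ENO \le \text{"E3"}}(ASG)$, $ASG_2 = \sigma_{ENO > \text{"E3"}}(ASG)$
- The localized query replaces EMP by $EMP_1 \cup EMP_2 \cup EMP_3$ and ASG by $ASG_1 \cup ASG_2$ in the query tree:
  $\Pi_{ENAME}(\sigma_{DUR=12 \lor DUR=24}(\sigma_{JNAME=\text{"CAD/CAM"}}(\sigma_{ENAME \neq \text{"J.DOE"}}(PROJ \bowtie_{JNO} ((ASG_1 \cup ASG_2) \bowtie_{ENO} (EMP_1 \cup EMP_2 \cup EMP_3))))))$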

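Reduction with Selection
- EMP is fragmented into $EMP_1 = \sigma_{ENO \le \text{"E3"}}(EMP)$, $EMP_2 = \sigma_{\text{"E3"} < ENO \le \text{"E6"}}(EMP)$, $EMP_3 = \sigma_{ENO > \text{"E6"}}(EMP)$.
- Query: SELECT * FROM EMP WHERE ENO = 'E5'
- The localized query $\sigma_{ENO=\text{"E5"}}(EMP_1 \cup EMP_2 \cup EMP_3)$ reduces to $\sigma_{ENO=\text{"E5"}}(EMP_2)$.
- Rule: given a relation $R$ with horizontal fragmentation $F_R = \{R_1, R_2, \dots, R_n\}$, where $R_j = \sigma_{p_j}(R)$:
  $\sigma_{p_i}(R_j) = \emptyset$ if $\forall x \in R: \lnot(p_i(x) \land p_j(x))$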
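For range-defined fragments, the contradiction test in the rule reduces to a range check. This is a minimal sketch under two assumptions that are not in the slides. First, ENO values "Ek" are encoded as the integer k, because plain string comparison would order "E10" before "E3". Second, each fragment predicate is stored as a half-open range (low, high]. The name `fragments_for_equality` is hypothetical.

```python
import math

# Hypothetical encoding: ENO "Ek" -> k; each horizontal fragment of EMP is
# described by the half-open range (low, high] of its defining predicate.
EMP_FRAGMENTS = {
    "EMP1": (-math.inf, 3),  # ENO <= "E3"
    "EMP2": (3, 6),          # "E3" < ENO <= "E6"
    "EMP3": (6, math.inf),   # ENO > "E6"
}


def fragments_for_equality(fragments, value):
    """Keep the fragments whose predicate can hold together with ENO = value.

    A fragment is dropped when its range excludes the value, i.e. when its
    defining predicate contradicts the selection predicate.
    """
    return [name for name, (low, high) in fragments.items() if low < value <= high]


print(fragments_for_equality(EMP_FRAGMENTS, 5))  # ['EMP2']
```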

Reduction with Join
- EMP is fragmented into $EMP_1 = \sigma_{ENO \le \text{"E3"}}(EMP)$, $EMP_2 = \sigma_{\text{"E3"} < ENO \le \text{"E6"}}(EMP)$, $EMP_3 = \sigma_{ENO > \text{"E6"}}(EMP)$.
- ASG is fragmented into $ASG_1 = \sigma_{ENO \le \text{"E3"}}(ASG)$, $ASG_2 = \sigma_{ENO > \text{"E3"}}(ASG)$.
- Query: SELECT * FROM EMP, ASG WHERE EMP.ENO = ASG.ENO
- Localized query: $(EMP_1 \cup EMP_2 \cup EMP_3) \bowtie_{ENO} (ASG_1 \cup ASG_2)$

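Reduction with Join (I)
- Distribute joins over unions: $(R_1 \cup R_2) \bowtie S \equiv (R_1 \bowtie S) \cup (R_2 \bowtie S)$
- Applied to $(EMP_1 \cup EMP_2 \cup EMP_3) \bowtie_{ENO} (ASG_1 \cup ASG_2)$, this gives the union of six fragment joins:
  $(ASG_1 \bowtie_{ENO} EMP_1) \cup (ASG_1 \bowtie_{ENO} EMP_2) \cup (ASG_2 \bowtie_{ENO} EMP_2) \cup (ASG_1 \bowtie_{ENO} EMP_3) \cup (ASG_2 \bowtie_{ENO} EMP_3) \cup (ASG_2 \bowtie_{ENO} EMP_1)$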

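Reduction with Join (II)
- After eliminating the empty joins: $(ASG_1 \bowtie_{ENO} EMP_1) \cup (ASG_2 \bowtie_{ENO} EMP_2) \cup (ASG_2 \bowtie_{ENO} EMP_3)$
- Rule: given fragments $R_i = \sigma_{p_i}(R)$ and $S_j = \sigma_{p_j}(S)$ defined on the join attribute, $R_i \bowtie S_j = \emptyset$ if $\forall x \in R_i, \forall y \in S_j: \lnot(p_i(x) \land p_j(y))$
- Reduction with join:
  1. Distribute joins over unions.
  2. Eliminate unnecessary work (joins of fragments whose predicates contradict each other).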
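The two steps can be sketched together. The first step enumerates every fragment pair, which distributes the join over the unions. The second keeps only the pairs whose ENO ranges overlap. This is a sketch under assumptions that are not in the slides: ENO values "Ek" are encoded as the integer k, and each fragment predicate is stored as a half-open range (low, high]. The helper names are hypothetical.

```python
import math
from itertools import product

# Hypothetical encoding: ENO "Ek" -> k; fragments as half-open (low, high] ranges.
EMP_FRAGMENTS = {"EMP1": (-math.inf, 3), "EMP2": (3, 6), "EMP3": (6, math.inf)}
ASG_FRAGMENTS = {"ASG1": (-math.inf, 3), "ASG2": (3, math.inf)}


def ranges_overlap(r1, r2):
    """Two half-open ranges (low, high] share a value iff max(lows) < min(highs)."""
    return max(r1[0], r2[0]) < min(r1[1], r2[1])


def useful_joins(left, right):
    """Distribute the join over the unions (all fragment pairs), then drop the
    pairs whose predicates on the join attribute contradict each other."""
    return [(a, b) for a, b in product(left, right) if ranges_overlap(left[a], right[b])]


print(useful_joins(ASG_FRAGMENTS, EMP_FRAGMENTS))
# [('ASG1', 'EMP1'), ('ASG2', 'EMP2'), ('ASG2', 'EMP3')]
```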

Reduction for VF (Vertical Fragmentation)
- Find useless (though not empty) intermediate relations.
- A relation $R$ defined over attributes $A = \{A_1, A_2, \dots, A_n\}$ is vertically fragmented as $R_i = \Pi_{A'}(R)$, where $A' \subseteq A$. Then $\Pi_{D,K}(R_i)$, where $K$ is the key, is useless if the set of projection attributes $D$ is not in $A'$.
- Example: $EMP_1 = \Pi_{ENO,ENAME}(EMP)$, $EMP_2 = \Pi_{ENO,TITLE}(EMP)$
  Query: SELECT ENAME FROM EMP
  The localized query $\Pi_{ENAME}(EMP_1 \bowtie_{ENO} EMP_2)$ reduces to $\Pi_{ENAME}(EMP_1)$.
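The usefulness test for vertical fragments reduces to set intersection on attribute names. This is a minimal sketch, assuming every vertical fragment repeats the key K and contains all tuples. The helper `useful_fragments` is hypothetical.

```python
# Vertical fragments of EMP from the example: fragment name -> attribute set.
EMP_VERTICAL = {"EMP1": {"ENO", "ENAME"}, "EMP2": {"ENO", "TITLE"}}
KEY = {"ENO"}


def useful_fragments(fragments, projection, key):
    """Return the vertical fragments that contribute a projected non-key attribute.

    A fragment R_i = Pi_{A'}(R) is useless for Pi_D(R) when D and A' share no
    attribute other than the key K (which every fragment repeats).
    """
    needed = set(projection) - key
    if not needed:
        # Only key attributes are requested: any single fragment suffices.
        return [next(iter(fragments))]
    return [name for name, attrs in fragments.items() if attrs & needed]


print(useful_fragments(EMP_VERTICAL, ["ENAME"], KEY))           # ['EMP1']
print(useful_fragments(EMP_VERTICAL, ["ENAME", "TITLE"], KEY))  # ['EMP1', 'EMP2']
```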

Reduction for DHF (Derived Horizontal Fragmentation)
- Distribute joins over unions.
- Apply the join reduction for horizontal fragmentation.
- Example:
  $EMP_1 = \sigma_{TITLE=\text{"Programmer"}}(EMP)$, $EMP_2 = \sigma_{TITLE \neq \text{"Programmer"}}(EMP)$
  $ASG_1 = ASG \ltimes_{ENO} EMP_1$, $ASG_2 = ASG \ltimes_{ENO} EMP_2$
  Query: SELECT * FROM EMP, ASG WHERE ASG.ENO = EMP.ENO AND EMP.TITLE = 'Mech. Eng.'
  Localized query: $\sigma_{TITLE=\text{"Mech. Eng."}}(EMP_1 \cup EMP_2) \bowtie_{ENO} (ASG_1 \cup ASG_2)$

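Reduction for DHF (II)
- Selection first: $\sigma_{TITLE=\text{"Mech. Eng."}}(EMP_1)$ is empty because it contradicts the predicate defining $EMP_1$, so the query becomes $\sigma_{TITLE=\text{"Mech. Eng."}}(EMP_2) \bowtie_{ENO} (ASG_1 \cup ASG_2)$.
- Joins over union: $(\sigma_{TITLE=\text{"Mech. Eng."}}(EMP_2) \bowtie_{ENO} ASG_1) \cup (\sigma_{TITLE=\text{"Mech. Eng."}}(EMP_2) \bowtie_{ENO} ASG_2)$
- Since $ASG_1$ contains only assignments of programmers (tuples matching $EMP_1$), its join with $EMP_2$ is empty, and the reduced query is $\sigma_{TITLE=\text{"Mech. Eng."}}(EMP_2) \bowtie_{ENO} ASG_2$.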

Reduction for Hybrid Fragmentation
- Remove empty relations generated by contradicting selections on horizontal fragments.
- Remove useless relations generated by projections on vertical fragments.
- Distribute joins over unions in order to isolate and remove useless joins.

Reduction for Hybrid Fragmentation: An Example
- $EMP_1 = \sigma_{ENO \le \text{"E4"}}(\Pi_{ENO,ENAME}(EMP))$
- $EMP_2 = \sigma_{ENO > \text{"E4"}}(\Pi_{ENO,ENAME}(EMP))$
- $EMP_3 = \Pi_{ENO,TITLE}(EMP)$
- Query: SELECT ENAME FROM EMP WHERE ENO = 'E5'
- The localized query $\Pi_{ENAME}(\sigma_{ENO=\text{"E5"}}((EMP_1 \cup EMP_2) \bowtie_{ENO} EMP_3))$ reduces to $\Pi_{ENAME}(\sigma_{ENO=\text{"E5"}}(EMP_2))$.

Why Optimization: An Example
- Schema: EMP(eno, ename, title), ASG(eno, jno, resp, dur)
- Query: find the names of the employees who are managing a project.
  SELECT ename
  FROM EMP e, ASG g
  WHERE e.eno = g.eno AND resp = 'manager'
- Relational algebra tree: $\Pi_{ename}(EMP \bowtie_{EMP.eno = ASG.eno} \sigma_{resp=\text{"manager"}}(ASG))$

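Example: Strategies
- Fragment schema:
  $EMP_1 = \sigma_{ENO \le 100}(EMP)$ at site 1
  $EMP_2 = \sigma_{ENO > 100}(EMP)$ at site 2
  $ASG_1 = \sigma_{ENO \le 100}(ASG)$ at site 3
  $ASG_2 = \sigma_{ENO > 100}(ASG)$ at site 4
- Query site: site 5
- Plan A: sites 3 and 4 compute $ASG_1' = \sigma_{resp=\text{"manager"}}(ASG_1)$ and $ASG_2' = \sigma_{resp=\text{"manager"}}(ASG_2)$ and send them to sites 1 and 2. Sites 1 and 2 then compute $EMP_1' = EMP_1 \bowtie_{ENO} ASG_1'$ and $EMP_2' = EMP_2 \bowtie_{ENO} ASG_2'$ and send them to site 5, which returns $EMP_1' \cup EMP_2'$.
- Plan B: all fragments are sent to site 5, which computes $(EMP_1 \cup EMP_2) \bowtie_{ENO} \sigma_{resp=\text{"manager"}}(ASG_1 \cup ASG_2)$.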

Example: DB Statistics and Costs
Database statistics
- EMP has 400 tuples.
- ASG has 1000 tuples.
- There are 20 managers in ASG.
- The data is uniformly distributed among sites.
- ASG and EMP are locally clustered on attributes RESP and ENO, respectively.
Costs
- tuple access: $t_{acc} = 1$ unit
- tuple transfer: $t_{trans} = 10$ units

Costs for the Example Plans
- Cost of Plan A:
  - Produce ASG': $20 \cdot t_{acc} = 20$ (processed locally)
  - Transfer ASG': $20 \cdot t_{trans} = 200$ (to the EMP sites)
  - Produce EMP': $(10 + 10) \cdot t_{acc} \cdot 2 = 40$ (join at the EMP sites)
  - Transfer EMP': $20 \cdot t_{trans} = 200$ (to site 5)
  - Total cost = 460
- Cost of Plan B:
  - Transfer EMP: $400 \cdot t_{trans} = 4{,}000$ (to site 5)
  - Transfer ASG: $1000 \cdot t_{trans} = 10{,}000$ (to site 5)
  - Produce ASG': $1000 \cdot t_{acc} = 1{,}000$ (selection at site 5)
  - Join EMP and ASG': $400 \cdot 20 \cdot t_{acc} = 8{,}000$ (join at site 5)
  - Total cost = 23,000

Query Optimization
- Problems in query optimization:
  1. Determining the physical copies of the fragments on which to execute the fragment query expressions (also known as materialization)
  2. Selecting the order of execution of operations
  3. Selecting the method for executing each operation
- These problems are not independent; for instance, the best materialization for a query depends on the order in which operations are executed. Nevertheless, they are treated as independent. Further,
  - we bypass (1) by taking materialization for granted, and
  - we bypass (3) by clustering all operations executed at the same site and treating their execution as a problem for the local database system.

Query Optimization: Objectives
- The selection of alternative query execution strategies is made based on predetermined objectives.
- Two main objectives:
  - Minimize the total processing time (total cost)
    - the network and the computers at the nodes do not get overloaded
    - the response time cannot be guaranteed
  - Minimize the response time
    - the allocation must facilitate parallel execution of the query
    - but throughput may decrease, and the total cost can be higher than under the first objective
- Total processing time (cost) is the sum of all the times (costs) incurred in executing the query (CPU, I/O, data transfer).
- Response time is the elapsed time from the initiation until the completion of the query.

Optimization Algorithms: The Issues
- Cost model
  - cost components
  - weights for each component
  - costs of primitive operations
- Search space
  - the set of equivalent algebra expressions (query trees)
- Search strategies
  - how to move inside the search space
  - exhaustive search, heuristics, ...

Cost Models
- The cost measures are I/O and CPU costs for centralized DBMSs, and I/O, CPU, and data transfer costs for DDBMSs.
- Total cost = CPU cost + I/O cost + communication cost
  - CPU cost = $C_{cpu} \cdot \#insts$
  - I/O cost = $C_{I/O} \cdot \#I/Os$
  - Communication cost = $C_{msg} \cdot \#msgs + C_{tr} \cdot \#bytes$
  - $C_{cpu}$, $C_{I/O}$, $C_{msg}$, and $C_{tr}$ are all assumed to be constants.
- Response time = sum of the costs of the sequential operations
  $= C_{cpu} \cdot s\_\#insts + C_{I/O} \cdot s\_\#I/Os + C_{msg} \cdot s\_\#msgs + C_{tr} \cdot s\_\#bytes$
  - $s\_\#x$ stands for the maximum number of x's that must be executed sequentially to process the query.
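The difference between total cost and response time shows up clearly in the communication term alone. This is a minimal sketch under hypothetical assumptions. Two sites send 100 and 300 bytes to a third site, and the two transfers run in parallel. The constants $C_{msg} = 10$ and $C_{tr} = 1$ are made up for illustration. Total cost pays for both messages. Response time only waits for the longer one.

```python
def total_communication_cost(c_msg, c_tr, transfers):
    """Total cost: every message and every byte is paid for."""
    return sum(c_msg + c_tr * nbytes for nbytes in transfers)


def response_communication_time(c_msg, c_tr, transfers):
    """Response time, assuming all transfers run in parallel: only the longest counts."""
    return max(c_msg + c_tr * nbytes for nbytes in transfers)


# Hypothetical case: two sites send 100 and 300 bytes to a third site in parallel.
C_MSG, C_TR = 10, 1
sends = [100, 300]
print(total_communication_cost(C_MSG, C_TR, sends))     # 420
print(response_communication_time(C_MSG, C_TR, sends))  # 310
```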

Intermediate Result Size
- The sizes of the intermediate relations produced during execution guide the selection of the execution strategy.
- This is useful in selecting an execution strategy that reduces data transfer.
- The sizes of intermediate relations need to be estimated from the cardinalities of relations and the lengths of attributes.
- For a relation $R(A_1, A_2, \dots, A_n)$ fragmented as $R_1, R_2, \dots, R_r$, the statistics typically collected are
  - $len(A_i)$: length of attribute $A_i$ in bytes
  - $min(A_i)$ and $max(A_i)$ for ordered domains
  - $card(dom(A_i))$: number of unique values in $dom(A_i)$
  - $card(R_j)$: number of tuples in each fragment $R_j$

Intermediate Size Estimation
- Join selectivity factor: $SF_J(R, S) = \dfrac{card(R \bowtie S)}{card(R) \cdot card(S)}$
- Selection selectivity factor: $SF_S(F) = \dfrac{card(\sigma_F(R))}{card(R)}$
- $size(R) = card(R) \cdot len(R)$
- Selectivity factors used to estimate the cardinality of intermediate relations:
  - $SF_S(A = value) = \dfrac{1}{card(dom(A))}$
  - $SF_S(A > value) = \dfrac{max(A) - value}{max(A) - min(A)}$
  - $SF_S(A < value) = \dfrac{value - min(A)}{max(A) - min(A)}$
  - $SF_S(p(A_i) \land p(A_j)) = SF_S(p(A_i)) \cdot SF_S(p(A_j))$
  - $SF_S(p(A_i) \lor p(A_j)) = SF_S(p(A_i)) + SF_S(p(A_j)) - SF_S(p(A_i)) \cdot SF_S(p(A_j))$
  - $SF_S(A \in \{values\}) = SF_S(A = value) \cdot card(\{values\})$
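The selectivity formulas can be transcribed directly into functions. This is a minimal sketch that inherits the usual assumptions behind these formulas: values are uniformly distributed and predicates are independent. The statistics in the usage line are hypothetical: card(R) = 1000, attribute A has 10 distinct values, and attribute B ranges over [0, 100].

```python
def sf_eq(distinct_values):
    """SF_S(A = value) = 1 / card(dom(A))."""
    return 1 / distinct_values


def sf_gt(value, lo, hi):
    """SF_S(A > value) = (max(A) - value) / (max(A) - min(A))."""
    return (hi - value) / (hi - lo)


def sf_lt(value, lo, hi):
    """SF_S(A < value) = (value - min(A)) / (max(A) - min(A))."""
    return (value - lo) / (hi - lo)


def sf_and(sf1, sf2):
    return sf1 * sf2  # assumes the two predicates are independent


def sf_or(sf1, sf2):
    return sf1 + sf2 - sf1 * sf2


def sf_in(distinct_values, n_values):
    """SF_S(A in {values}) = SF_S(A = value) * card({values})."""
    return sf_eq(distinct_values) * n_values


# Hypothetical statistics: card(R) = 1000, card(dom(A)) = 10, B in [0, 100].
card_r = 1000
sf = sf_or(sf_eq(10), sf_gt(75, 0, 100))  # A = a OR B > 75
print(round(card_r * sf))  # 325
```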

Intermediate Size Estimation (II)
- Projection (without duplicate elimination): $card(\Pi_A(R)) = card(R)$
- Cartesian product: $card(R \times S) = card(R) \cdot card(S)$
- Join:
  - $card(R \bowtie_{A=B} S) = card(S)$ if $A$ is a key of $R$ and $B$ is a foreign key of $S$ referencing it
  - in general, $card(R \bowtie_{A=B} S) = SF_J(R, S) \cdot card(R) \cdot card(S)$
- Union:
  - upper bound: $card(R \cup S) \le card(R) + card(S)$
  - lower bound: $card(R \cup S) \ge \max\{card(R), card(S)\}$

Cost of Processing Primitive Operations
- Selection
- Projection
- Union
- Join
  - nested loops
  - sort-merge
  - hash-based
- For distributed joins, the semijoin has been proposed as a way to reduce the amount of data transferred.
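Two of the join methods listed above, nested loops and hash-based, can be compared on a toy instance. Sort-merge is not shown. This is a minimal in-memory sketch, assuming tuples are Python dicts and the join is an equi-join on one attribute. The relation contents are hypothetical. Nested loops compares every pair of tuples. The hash join makes one pass to build a table and one pass to probe it.

```python
from collections import defaultdict


def nested_loop_join(r, s, attr):
    """Compare every tuple of r with every tuple of s: O(card(r) * card(s))."""
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]


def hash_join(r, s, attr):
    """Build a hash table on r, then probe it once per tuple of s."""
    table = defaultdict(list)
    for x in r:  # build phase
        table[x[attr]].append(x)
    return [{**x, **y} for y in s for x in table.get(y[attr], [])]  # probe phase


EMP = [{"ENO": "E1", "ENAME": "J. Doe"}, {"ENO": "E2", "ENAME": "M. Smith"}]
ASG = [{"ENO": "E1", "JNO": "P1"}, {"ENO": "E2", "JNO": "P1"}, {"ENO": "E2", "JNO": "P2"}]

nl = nested_loop_join(EMP, ASG, "ENO")
hj = hash_join(EMP, ASG, "ENO")
print(len(hj), all(t in nl for t in hj))  # 3 True
```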

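Semi-join
- To compute $R \bowtie_A S$ when $R$ and $S$ are at different sites:
  - at the site of R: $R' = \Pi_A(R)$, which is sent to the site of S;
  - at the site of S: $S' = R' \bowtie_A S$ (that is, $S \ltimes_A R$), which is sent back to the site of R;
  - at the site of R: $R \bowtie_A S'$, which equals $R \bowtie_A S$.
- In summary:
  1. the join is replaced by a projection, followed by a semijoin, and then a join;
  2. the projection and the final join are done at one site, and the semijoin at the other site;
  3. the amount of data transferred is $|R'| + |S'|$.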
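The semijoin program can be simulated in memory. This is a minimal sketch that counts shipped items instead of bytes: the join-attribute values in R' plus the tuples in S'. Relations are lists of dicts, the two "sites" are only comments, and the data is hypothetical. Only 2 of the 6 ASG tuples match, so the program ships 4 items instead of 6 ASG tuples.

```python
def semijoin_program(r, s, attr):
    """Compute R join S when R and S are stored at different sites.

    Returns the join result and the number of items shipped between the
    sites: the join-attribute values of R' plus the tuples of S'.
    """
    r_prime = {t[attr] for t in r}                  # at R's site: R' = Pi_A(R)
    s_prime = [t for t in s if t[attr] in r_prime]  # at S's site: S' = R' join S
    result = [{**x, **y} for x in r for y in s_prime if x[attr] == y[attr]]  # at R's site
    return result, len(r_prime) + len(s_prime)


EMP = [{"ENO": "E1", "ENAME": "J. Doe"}, {"ENO": "E2", "ENAME": "M. Smith"}]
ASG = [{"ENO": f"E{k}", "JNO": "P1"} for k in range(1, 7)]  # E1 ... E6

result, shipped = semijoin_program(EMP, ASG, "ENO")
print(len(result), shipped)  # 2 4
```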

Semi-join versus Join
- Using semijoins increases local processing costs, because a relation must be scanned twice (once for the projection and once for the join).
- When joining the intermediate relations produced by a semijoin, indices on the base relations cannot be exploited.
- Semijoins may not pay off when communication costs are low.

Search Space
- The search space is characterized by alternative execution plans.
- Most optimizers focus on join trees.
- For N relations, there are O(N!) equivalent join trees.
- Example:
  SELECT ENAME, RESP
  FROM EMP, ASG, PROJ
  WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO
  Three of its equivalent join trees:
  $(EMP \bowtie_{ENO} ASG) \bowtie_{PNO} PROJ$
  $(PROJ \bowtie_{PNO} ASG) \bowtie_{ENO} EMP$
  $(EMP \times PROJ) \bowtie_{ENO,PNO} ASG$

Restricting the Search Space
- O(N!) is large.
- When join methods are also considered, the search space is even bigger.
- Restrict it by means of heuristics
  - e.g., ignore Cartesian products
  - ...
- Restrict the shape of the join tree
  - e.g., only consider deep trees
  - ...
[Figure: join trees over R1 to R4 illustrating a deep tree, a left-deep tree, and a bushy tree]
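The effect of restricting the tree shape can be quantified by counting trees. This is a minimal sketch using standard combinatorics, not a formula from the slides. It counts join orders with the two inputs of a join distinguished, and it includes Cartesian products. Left-deep trees correspond to permutations, giving N! of them. All binary trees with N labelled leaves number N! times the Catalan number C(N-1), which equals (2N-2)!/(N-1)!.

```python
from math import comb, factorial


def left_deep_trees(n):
    """Left-deep (linear) join trees over n relations: one per permutation."""
    return factorial(n)


def all_join_trees(n):
    """All binary join trees with n labelled leaves (bushy trees included):
    n! leaf orders times Catalan(n - 1) tree shapes = (2n - 2)! / (n - 1)!."""
    catalan = comb(2 * (n - 1), n - 1) // n
    return factorial(n) * catalan


for n in (2, 3, 4, 5):
    print(n, left_deep_trees(n), all_join_trees(n))
# 2 2 2
# 3 6 12
# 4 24 120
# 5 120 1680
```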

Search Strategy
- How to move in the search space to find the optimal plan
- Deterministic
  - start from the base relations and build plans by adding one relation at each step
  - dynamic programming: breadth-first
  - greedy: depth-first
- Randomized
  - search for the optimal plan around a particular starting point
    - simulated annealing
    - iterative improvement
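A deterministic greedy strategy can be sketched as follows. The search starts from the cheapest pair of relations and then, at each step, adds the relation that yields the smallest estimated intermediate result. This is one common greedy criterion and is not necessarily the one any particular optimizer uses. The sketch assumes the size estimate card(R) * card(S) * SF_J with independent join predicates. Pairs without a join predicate are treated as Cartesian products. All statistics are hypothetical.

```python
from itertools import combinations


def greedy_join_order(card, sel):
    """Greedy (depth-first) construction of a left-deep join order.

    card: relation name -> cardinality.
    sel:  frozenset({r, s}) -> join selectivity; pairs without a join
          predicate are treated as Cartesian products (selectivity 1).
    """
    def joined_size(size, joined, r):
        # Estimated size after joining relation r to the current intermediate result.
        factor = 1.0
        for s in joined:
            factor *= sel.get(frozenset((r, s)), 1.0)
        return size * card[r] * factor

    first = min(combinations(card, 2),
                key=lambda p: joined_size(card[p[0]], [p[0]], p[1]))
    order = list(first)
    size = joined_size(card[first[0]], [first[0]], first[1])
    remaining = sorted(set(card) - set(first))
    while remaining:
        nxt = min(remaining, key=lambda r: joined_size(size, order, r))
        size = joined_size(size, order, nxt)
        order.append(nxt)
        remaining.remove(nxt)
    return order, size


# Hypothetical statistics: PROJ has already been reduced by a selection to 2 tuples.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 2}
SEL = {frozenset(("EMP", "ASG")): 1 / 400, frozenset(("ASG", "PROJ")): 1 / 20}

order, size = greedy_join_order(CARD, SEL)
print(order, round(size))  # ['ASG', 'PROJ', 'EMP'] 100
```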

Search Strategies: Example
[Figure: a deterministic strategy building the join tree over R1 to R4 one relation at a time, and a randomized strategy moving from one complete join tree over R1 to R4 to another]

INGRES CQO
- INGRES uses a dynamic query optimization technique that recursively breaks up a calculus query (SQL) into smaller, manageable queries.
- A multivariable query is first decomposed into a sequence of queries that have a unique variable in common.
- Each monovariable query is processed by optimizing the access to a single relation.
- The algorithm first executes the unary operations, and then orders the binary operations so as to minimize the sizes of intermediate results.

INGRES CQO Algorithm: Detachment
- The query
  SELECT Q.B, R.C, T.D
  FROM O, Q, R, T
  WHERE p1(O.X) AND p2(O.X, Q.W, R.U, T.V);
  is detached into the subqueries
  SELECT O.X INTO O'
  FROM O
  WHERE p1(O.X);
  and
  SELECT Q.B, R.C, T.D
  FROM O', Q, R, T
  WHERE p2(O'.X, Q.W, R.U, T.V);

INGRES CQO Algorithm: Substitution
- An n-variable query that cannot be detached is replaced by a set of (n-1)-variable queries using tuple substitution.
- Given a relation R(v), the query q(v, x, y, w) is replaced by the set of queries $\{q'(t, x, y, w) \mid t \in R\}$.
- After multiple substitutions, a set of monovariable queries is generated, and these queries are executed in a pipelined fashion.

INGRES CQO Algorithm: Example
- Schema: E(ENO, ENAME, TITLE), G(ENO, JNO, RESP, DUR), J(JNO, JNAME, BUDGET)
- q1: SELECT ENAME FROM E, G, J WHERE E.ENO = G.ENO AND G.JNO = J.JNO AND JNAME = 'CAD';
- Detachment:
  - q11: SELECT JNO INTO JVAR FROM J WHERE JNAME = 'CAD';
  - q': SELECT ENAME FROM E, G, JVAR WHERE E.ENO = G.ENO AND G.JNO = JVAR.JNO;
- q' is further detached into
  - q12: SELECT G.ENO INTO GVAR FROM G, JVAR WHERE G.JNO = JVAR.JNO;
  - q13: SELECT E.ENAME FROM E, GVAR WHERE E.ENO = GVAR.ENO;

INGRES CQO Algorithm: Example
- Substitution
  - The order of processing is q11 -> q12 -> q13.
  - q12 is replaced by the set of queries {q12_t: SELECT G.ENO INTO GVAR FROM G WHERE G.JNO = t.JNO | t ∈ JVAR}
  - q13 is replaced by the set of queries {q13_t: SELECT ENAME FROM E WHERE E.ENO = t.ENO | t ∈ GVAR}
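The detachment and substitution steps for q1 can be simulated on a tiny in-memory instance. This is a minimal sketch. The tuples of E, G, and J are hypothetical, and lists of Python tuples stand in for relations. Each loop iteration plays the role of one monovariable query q12_t or q13_t. The final result matches the result of evaluating q1 directly as a three-way join.

```python
# Hypothetical toy instances of E(ENO, ENAME, TITLE), G(ENO, JNO, RESP, DUR), J(JNO, JNAME, BUDGET).
E = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."), ("E3", "A. Lee", "Mech. Eng.")]
G = [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24), ("E3", "P2", "Engineer", 36)]
J = [("P1", "CAD", 150000), ("P2", "Database", 135000)]

# q11 (detached, monovariable): SELECT JNO INTO JVAR FROM J WHERE JNAME = 'CAD'
JVAR = [jno for (jno, jname, _budget) in J if jname == "CAD"]

# q12 by tuple substitution: one monovariable query on G per tuple t in JVAR.
GVAR = []
for t_jno in JVAR:
    GVAR.extend(eno for (eno, jno, _resp, _dur) in G if jno == t_jno)

# q13 by tuple substitution: one monovariable query on E per tuple t in GVAR.
result = []
for t_eno in GVAR:
    result.extend(ename for (eno, ename, _title) in E if eno == t_eno)

print(result)  # ['J. Doe', 'M. Smith']
```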

Distributed INGRES Query Optimization Algorithm
- Let there be n relations $R_1, R_2, \dots, R_n$ involved in an n-variable query. $R_i^j$ denotes the fragment of $R_i$ stored at site j (there are m sites), and $CC_k(\#bytes)$ is the cost of sending #bytes to k sites.
- Broadcast network: $CC_k(\#bytes) = CC_1(\#bytes)$
  - if $\max_{1 \le j \le m} \sum_{i=1}^{n} size(R_i^j) > \max_{1 \le i \le n} size(R_i)$, then the processing site is the site j that holds the largest amount of data;
  - else $R_p$ is the largest relation, and the sites of $R_p$ are the processing sites.

Distributed INGRES Query Optimization Algorithm
- Point-to-point network: $CC_k(\#bytes) = k \cdot CC_1(\#bytes)$
- The choice of $R_p$ that minimizes data transfer is the largest relation; $R_p$ is partitioned to increase parallelism. Let the sites be ordered by decreasing amount of useful data for the query, so that $\sum_{i=1}^{n} size(R_i^j) > \sum_{i=1}^{n} size(R_i^{j+1})$.
- Then the number k of sites at which processing is done is given by:
  - if $\sum_{i \ne p} (size(R_i) - size(R_i^1)) > size(R_p^1)$, then k = 1;
  - else k is the largest j such that $\sum_{i \ne p} (size(R_i) - size(R_i^j)) \le size(R_p^j)$.
- This rule chooses a site as a processing site only if the amount of data it receives is smaller than the amount of data it would send out if it were not a processing site. Step 3.3 transfers all the fragments to their processing sites. In Step 3.4, MVQ' is executed.

Distributed INGRES Query Optimization Example
- Consider $J \bowtie_{JNO} G$, where J and G are fragmented. Assume the following allocation and sizes of fragments:

          Site 1   Site 2   Site 3   Site 4   Total
  J        1000     1000     1000     1000    4000
  G           -        -     2000        -    2000
  Total    1000     1000     3000     1000

- Point-to-point network: send each $J_i$ to site 3.
- Broadcast network: broadcast G to sites 1, 2, and 4.
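Both site-selection rules can be checked against the example table. This is a minimal sketch of the broadcast rule and the point-to-point rule as stated on the two preceding slides. Fragment sizes are kept in a nested dict, and the helper names are hypothetical. The sketch chooses only the processing sites; it does not perform the fragment transfers or the query execution.

```python
# Fragment sizes from the example table: SIZES[relation][site] (absent = no fragment).
SIZES = {
    "J": {1: 1000, 2: 1000, 3: 1000, 4: 1000},
    "G": {3: 2000},
}
SITES = [1, 2, 3, 4]


def frag(rel, site):
    """size(R_rel^site), or 0 if the relation has no fragment at that site."""
    return SIZES[rel].get(site, 0)


def total(rel):
    """size(R_rel)."""
    return sum(SIZES[rel].values())


def site_data(site):
    """Useful data stored at a site: the sum over i of size(R_i^site)."""
    return sum(frag(rel, site) for rel in SIZES)


def broadcast_sites():
    largest = max(SIZES, key=total)  # R_p
    richest = max(SITES, key=site_data)
    if site_data(richest) > total(largest):
        return [richest]  # a single processing site
    return [s for s in SITES if frag(largest, s) > 0]  # the sites of R_p


def point_to_point_sites():
    p = max(SIZES, key=total)  # R_p, the largest relation
    ordered = sorted(SITES, key=site_data, reverse=True)

    def received(site):
        # data of the other relations that would have to be shipped to this site
        return sum(total(r) - frag(r, site) for r in SIZES if r != p)

    if received(ordered[0]) > frag(p, ordered[0]):
        k = 1
    else:
        k = max(j for j, site in enumerate(ordered, start=1)
                if received(site) <= frag(p, site))
    return ordered[:k]


print(broadcast_sites())       # [1, 2, 3, 4]: broadcast G to sites 1, 2 and 4
print(point_to_point_sites())  # [3]: send each J_i to site 3
```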

