Session 10: QUERY OPTIMIZATION
Course: M0184 / Pengolahan Data Distribusi (Distributed Data Processing)
Year: 2005
Version:
OBJECTIVE
Query optimization process
Distributed query optimization algorithms
DISTRIBUTED QUERY OPTIMIZATION ALGORITHMS
Algorithm | Optm. Timing | Obj. Function | Optm. Factors | Network | Semi Joins
DIST. INGRES | Dynamic | Resp. Time or Total Cost | Msg Size, Proc. Cost | General / Broadcast | No
R* | Static | Total Cost | #Msg, Msg Size, IO, CPU | General / Local | No
SDD-1 | Static | | Msg Size | General | Yes
Source: Principles of Distributed Database Systems
DISTRIBUTED INGRES Algorithm
Consists of dynamically optimizing the processing strategy of a given query.
The objective function of the algorithm is to minimize a combination of response time and communication cost.
Both general and broadcast networks are considered. On a broadcast network, a data unit can be transmitted from one site to all the other sites in a single transfer.
The algorithm is executed by the site where the query is initiated, called the master site.
R* Algorithm
Uses a compilation approach in which an exhaustive search of all alternative strategies is performed in order to choose the one with the least cost.
Query compilation is a distributed task in R*, coordinated by the master site, where the query is initiated.
To join two relations, there are three candidate sites: the site of the first relation, the site of the second relation, and a third site.
There are two methods for intersite data transfer: ship-whole and fetch-as-needed.
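The two transfer methods can be contrasted with a small sketch. This is an illustration under simplifying assumptions, not R*'s actual cost model: relations are lists of dicts, cost is counted only in messages and tuples moved, and the names `ship_whole`, `fetch_as_needed`, and `tuples_per_message` are hypothetical. Ship-whole sends the entire inner relation to the join site; fetch-as-needed sends one request per outer tuple and receives only the matching inner tuples.

```python
import math


def ship_whole(inner, tuples_per_message=100):
    """Ship the entire inner relation to the join site.

    Assumption: a message carries up to `tuples_per_message` tuples.
    """
    return {
        "messages": math.ceil(len(inner) / tuples_per_message),
        "tuples_moved": len(inner),
    }


def fetch_as_needed(outer, inner, attr):
    """For each outer tuple, request only the matching inner tuples.

    Assumption: every outer tuple costs one request and one reply,
    even when the reply carries no matching tuples.
    """
    messages = 0
    tuples_moved = 0
    for o in outer:
        matches = [i for i in inner if i[attr] == o[attr]]
        messages += 2  # one request + one reply
        tuples_moved += len(matches)
    return {"messages": messages, "tuples_moved": tuples_moved}


outer = [{"dept": 1}, {"dept": 2}]
inner = [
    {"dept": 1, "name": "a"},
    {"dept": 3, "name": "b"},
    {"dept": 3, "name": "c"},
    {"dept": 2, "name": "d"},
    {"dept": 4, "name": "e"},
]
print(ship_whole(inner))                     # {'messages': 1, 'tuples_moved': 5}
print(fetch_as_needed(outer, inner, "dept"))  # {'messages': 4, 'tuples_moved': 2}
```

In this toy model, fetch-as-needed moves fewer tuples but exchanges more messages, which is the trade-off an optimizer has to weigh for a given query.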
SDD-1 Algorithm
Refinements of an initial feasible solution are recursively computed until no more cost improvement can be made.
It is devised for wide area point-to-point networks.
The cost of transferring the result to the final site is ignored.
Method of Performing JOIN
Of the basic relational operations (SELECT, JOIN, PROJECT), the most expensive in terms of both time and money is JOIN.
It causes more page swaps than the other operations.
The frequency of page swaps is often used as the cost measure when modeling these systems.
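SEMI-JOIN Method
Symbol: $\ltimes$, where $R_1 \ltimes R_2$ denotes the semi-join of relation R1 and relation R2.
$R_1 \ltimes R_2 = \pi_{1}(R_1 \bowtie R_2)$ ………… (1.1)
$R_1 \ltimes R_2 = R_1 \bowtie (\pi R_2)$ ………… (1.2)
Here $\pi_{1}$ projects onto the attributes of R1, and $\pi R_2$ is the projection of R2 onto the JOIN attributes.
The second form has a potential advantage over the first if R1 and R2 are at different sites A and B, and the full JOIN between R1 and R2 is to be executed at a third site C (refer to the SDD-1 algorithm).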
SEMI-JOIN Method Cont'd
Procedure SEMI-JOIN
1. Project the JOIN attributes from R2 at B (= πR2), after applying any required SELECTIONs
2. Transmit πR2 to A
3. Compute the semi-join of R1 at A
4. Move the result to C
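The semi-join procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming relations are in-memory lists of dicts and "transmitting" is just passing a value between functions; the helper names `project`, `semi_join`, and `join`, and the sample data, are hypothetical. The last check confirms on the sample data that joining R1 with the projection of R2 gives the same result as projecting the full join onto R1's attributes.

```python
def project(relation, attrs):
    """Duplicate-free projection of a relation onto `attrs`."""
    seen = set()
    out = []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def semi_join(r1, projected_r2, attrs):
    """Keep the tuples of r1 whose join values appear in the projection of r2."""
    keys = {tuple(t[a] for a in attrs) for t in projected_r2}
    return [t for t in r1 if tuple(t[a] for a in attrs) in keys]


def join(r1, r2, attrs):
    """Full equi-join, used here only to check the semi-join result."""
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[x] == b[x] for x in attrs)
    ]


# R1 lives at site A, R2 at site B (sample data is made up).
r1_at_a = [
    {"emp": "e1", "dept": 10},
    {"emp": "e2", "dept": 20},
    {"emp": "e3", "dept": 30},
]
r2_at_b = [
    {"dept": 10, "city": "X"},
    {"dept": 10, "city": "Y"},
    {"dept": 30, "city": "Z"},
]

pi_r2 = project(r2_at_b, ["dept"])             # step 1: project at B
sent_to_a = pi_r2                              # step 2: transmit to A (2 tuples, not 3)
reduced = semi_join(r1_at_a, sent_to_a, ["dept"])  # step 3: semi-join at A
sent_to_c = reduced                            # step 4: move the result to C

# Both formulations of the semi-join agree on this data.
assert reduced == project(join(r1_at_a, r2_at_b, ["dept"]), ["emp", "dept"])
print(reduced)  # [{'emp': 'e1', 'dept': 10}, {'emp': 'e3', 'dept': 30}]
```

Only the two distinct `dept` values cross the network from B to A, and only the two matching R1 tuples travel on to C, which is where the saving over shipping whole relations comes from.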
Non SEMI-JOIN Methods
In R*, the execution strategy is generated at compilation time, so R* uses a static optimization algorithm.
Queries can be compiled when the system is slack, and the optimal access plans for their execution are determined then.
The nested-loop and merge-scan methods are used for joins in R*.
The nested-loop method scans one relation, called the inner relation, for each tuple of the other relation, called the outer relation, looking for tuples with a matching JOIN value.
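The two join methods can be sketched as follows. This is a textbook-style illustration, not R*'s implementation: relations are assumed to be in-memory lists of dicts joined on a single attribute, and the function names `nested_loop_join` and `merge_scan_join` are hypothetical. The merge-scan version first sorts both inputs on the join attribute and then advances through them in step.

```python
def nested_loop_join(outer, inner, attr):
    """For each outer tuple, scan the whole inner relation for matches."""
    result = []
    for o in outer:
        for i in inner:
            if o[attr] == i[attr]:
                result.append({**o, **i})
    return result


def merge_scan_join(r1, r2, attr):
    """Sort both relations on the join attribute, then merge them."""
    a = sorted(r1, key=lambda t: t[attr])
    b = sorted(r2, key=lambda t: t[attr])
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][attr] < b[j][attr]:
            i += 1
        elif a[i][attr] > b[j][attr]:
            j += 1
        else:
            value = a[i][attr]
            # Find the run of tuples in b that share this join value.
            j_end = j
            while j_end < len(b) and b[j_end][attr] == value:
                j_end += 1
            # Pair every tuple of a with that value against the whole run.
            while i < len(a) and a[i][attr] == value:
                for k in range(j, j_end):
                    result.append({**a[i], **b[k]})
                i += 1
            j = j_end
    return result


emp = [{"emp": "e1", "dept": 10}, {"emp": "e2", "dept": 20}, {"emp": "e3", "dept": 10}]
loc = [{"dept": 10, "city": "X"}, {"dept": 30, "city": "Z"}, {"dept": 10, "city": "Y"}]

nl = nested_loop_join(emp, loc, "dept")
ms = merge_scan_join(emp, loc, "dept")

# Both methods return the same tuples, possibly in a different order.
canonical = lambda rel: sorted(sorted(t.items()) for t in rel)
assert canonical(nl) == canonical(ms)
print(len(nl))  # 4
```

The nested loop compares every pair of tuples, while the merge scan pays for two sorts and then reads each input once, so which one is cheaper depends on relation sizes and on whether the inputs are already sorted.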
Non SEMI-JOIN Methods Cont'd
Preceding query execution, four stages of "query preparation" are completed in R*:
1. Analyze the SQL query and perform object name resolution
2. Look up the catalog to check authorization and view integration
3. Plan the minimum cost access strategy for the global query
4. Generate the access modules and store them at a master site
Specific Issues for Modern Query Optimization
Is a static or a dynamic algorithm used?
Are independence and uniformity assumptions made?
What JOIN method is used?
Is account taken of the effects of other system modules on optimization?
Is simultaneous query traffic taken into account?
Is the search for an optimum exhaustive?
Is any semantic query transformation carried out?
Is the influence of heterogeneity considered?