1
Distributed Database Management Systems
Lecture 30
2
In the previous lecture
Locking-based concurrency control (CC)
Timestamp-ordering-based CC
Concluded transaction management (TM)
3
In this lecture
Basic concepts of query optimization
Query processing (QP) in centralized and distributed DBs
4
Introduction
SQL is one of the success factors of RDBMSs
The query processor transforms complex queries into concise and simple ones
5
Query processing is a critical performance issue
QP is a complex problem, especially in a DDBS environment
6
The main function of QP is to transform an SQL query into an equivalent relational algebra query (a lower-level language)
The transformation must achieve both correctness and efficiency
7
Correctness is straightforward, since well-defined transformation rules exist
An SQL query can have many equivalent relational algebra expressions, so efficiency depends on which one is chosen
8
Consider the tables
EMP(eNo, eName, title)
ASG(eNo, pNo, resp, dur)
PROJ(pNo, pName, budget, loc)
Query: Get the names of employees who are managing a project
9
SELECT eName
FROM EMP, ASG
WHERE EMP.eNo = ASG.eNo
AND resp = 'Manager'
10
π_eName(σ_{resp = 'Manager' ∧ EMP.eNo = ASG.eNo}(EMP × ASG))
π_eName(EMP ⋈_eNo σ_{resp = 'Manager'}(ASG))
The second expression needs fewer computing resources, since it avoids the Cartesian product
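The gap between the two expressions can be seen on toy data. The sketch below evaluates both plans over small in-memory tables and reports the size of the intermediate result each one builds. The rows and the function names `plan_cartesian` and `plan_select_first` are invented for illustration; they are not from the lecture.

```python
# Toy instances of EMP and ASG; all rows are made up for illustration.
EMP = [
    {"eNo": "E1", "eName": "A. Khan", "title": "Engineer"},
    {"eNo": "E2", "eName": "B. Ali", "title": "Analyst"},
    {"eNo": "E3", "eName": "C. Raza", "title": "Engineer"},
    {"eNo": "E4", "eName": "D. Shah", "title": "Programmer"},
]
ASG = [
    {"eNo": "E1", "pNo": "P1", "resp": "Manager", "dur": 12},
    {"eNo": "E2", "pNo": "P1", "resp": "Analyst", "dur": 24},
    {"eNo": "E2", "pNo": "P2", "resp": "Analyst", "dur": 6},
    {"eNo": "E3", "pNo": "P3", "resp": "Engineer", "dur": 10},
    {"eNo": "E4", "pNo": "P2", "resp": "Manager", "dur": 48},
]


def plan_cartesian(emp, asg):
    """Cartesian product first, then selection, then projection on eName."""
    product = [(e, a) for e in emp for a in asg]
    names = {
        e["eName"]
        for e, a in product
        if a["resp"] == "Manager" and e["eNo"] == a["eNo"]
    }
    return names, len(product)


def plan_select_first(emp, asg):
    """Selection on ASG first, then join on eNo, then projection on eName."""
    managers = [a for a in asg if a["resp"] == "Manager"]
    names = {e["eName"] for e in emp for a in managers if e["eNo"] == a["eNo"]}
    return names, len(managers)


names_1, intermediate_1 = plan_cartesian(EMP, ASG)
names_2, intermediate_2 = plan_select_first(EMP, ASG)
print(names_1 == names_2, intermediate_1, intermediate_2)
```

Both plans return the same set of names, but the first materializes every EMP-ASG pair before filtering, while the second only carries the selected ASG tuples into the join.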
11
Centralized QP: choose the best query execution plan
Distributed QP is more complex: it also involves selecting the sites at which the query is executed
12
Same query in a DDBS
Suppose EMP and ASG are horizontally fragmented (HF) as
EMP1 = σ_{eNo ≤ 'E3'}(EMP)
EMP2 = σ_{eNo > 'E3'}(EMP)
ASG1 = σ_{eNo ≤ 'E3'}(ASG)
ASG2 = σ_{eNo > 'E3'}(ASG)
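A minimal sketch of this horizontal fragmentation, assuming toy rows and a hypothetical helper `fragment_by_eno`. It splits a relation on the predicate eNo ≤ 'E3' and checks that the two fragments are disjoint and that their union gives back the original relation. Plain string comparison is used for eNo, which only behaves as intended for single-digit identifiers such as 'E1' to 'E9'.

```python
# Toy EMP instance; rows are invented for illustration.
EMP = [
    {"eNo": "E1", "eName": "A. Khan"},
    {"eNo": "E2", "eName": "B. Ali"},
    {"eNo": "E3", "eName": "C. Raza"},
    {"eNo": "E4", "eName": "D. Shah"},
    {"eNo": "E5", "eName": "E. Noor"},
]


def fragment_by_eno(relation, split="E3"):
    """Split a relation into (eNo <= split, eNo > split) fragments.

    Assumes eNo values compare correctly as strings (single-digit ids).
    """
    low = [t for t in relation if t["eNo"] <= split]
    high = [t for t in relation if t["eNo"] > split]
    return low, high


EMP1, EMP2 = fragment_by_eno(EMP)

# Reconstruction: the union of the fragments is the original relation.
reconstructed = sorted(EMP1 + EMP2, key=lambda t: t["eNo"])
print(len(EMP1), len(EMP2), reconstructed == EMP)
```

The same helper would be applied to ASG to obtain ASG1 and ASG2, since both relations are fragmented on the same eNo predicate.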
13
Further suppose fragments ASG1, ASG2, EMP1 and EMP2 are stored at sites 1, 2, 3 and 4 respectively, and the result is expected at site 5
14
[Figure: Strategy 1]
Site 1: ASG1' = σ_{resp = 'Manager'}(ASG1), sent to site 3
Site 2: ASG2' = σ_{resp = 'Manager'}(ASG2), sent to site 4
Site 3: EMP1' = EMP1 ⋈_eNo ASG1', sent to site 5
Site 4: EMP2' = EMP2 ⋈_eNo ASG2', sent to site 5
Site 5: result = EMP1' ∪ EMP2'
15
[Figure: Strategy 2]
Sites 1, 2, 3 and 4 send ASG1, ASG2, EMP1 and EMP2 to site 5
Site 5: result = (EMP1 ∪ EMP2) ⋈_eNo σ_{resp = 'Manager'}(ASG1 ∪ ASG2)
16
Let's assume
size(EMP) = 400
size(ASG) = 1000
tuple access cost = 1 unit
tuple transfer cost = 10 units
There are 20 managers
Data is distributed evenly across all sites
17
Strategy 1
Produce ASG': 20 * 1 = 20
Transfer ASG' to the sites of EMP: 20 * 10 = 200
Produce EMP': (10 + 10) * 1 * 2 = 40
Transfer EMP' to the result site: 20 * 10 = 200
Total: 460
18
Strategy 2
Transfer EMP to site 5: 400 * 10 = 4000
Transfer ASG to site 5: 1000 * 10 = 10000
Produce ASG' by selecting from ASG: 1000 * 1 = 1000
Join EMP and ASG': 8000
Total: 23000
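The totals for both strategies can be re-derived from the stated assumptions. The sketch below is only arithmetic over the slide's numbers. The constant and function names are invented here, and the breakdown of the join cost in Strategy 2 as 400 * 20 tuple accesses is an assumption (a nested-loop style join of EMP with the 20 selected ASG tuples) chosen because it reproduces the slide's figure of 8000.

```python
TUPLE_ACCESS = 1      # cost units per tuple access (from the slides)
TUPLE_TRANSFER = 10   # cost units per tuple transferred (from the slides)
SIZE_EMP = 400
SIZE_ASG = 1000
MANAGERS = 20         # ASG tuples with resp = 'Manager'


def strategy_1_cost():
    """Select and join at the fragment sites, ship only the results."""
    produce_asg = MANAGERS * TUPLE_ACCESS
    transfer_asg = MANAGERS * TUPLE_TRANSFER
    # (10 + 10) * 1 * 2, written exactly as on the slide.
    produce_emp = (MANAGERS // 2 + MANAGERS // 2) * TUPLE_ACCESS * 2
    transfer_emp = MANAGERS * TUPLE_TRANSFER
    return produce_asg + transfer_asg + produce_emp + transfer_emp


def strategy_2_cost():
    """Ship all fragments to the result site and do everything there."""
    transfer_emp = SIZE_EMP * TUPLE_TRANSFER
    transfer_asg = SIZE_ASG * TUPLE_TRANSFER
    select_asg = SIZE_ASG * TUPLE_ACCESS
    # Assumption: each EMP tuple is compared with each selected ASG tuple.
    join = SIZE_EMP * MANAGERS * TUPLE_ACCESS
    return transfer_emp + transfer_asg + select_asg + join


cost_1 = strategy_1_cost()
cost_2 = strategy_2_cost()
print(cost_1, cost_2, cost_2 // cost_1)
```

Under these assumptions Strategy 2 costs 50 times as much as Strategy 1, mostly because it ships both full relations across the network.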
19
Query Optimization
An important aspect of QP
Minimize resource consumption
Total cost = I/O cost + CPU cost + communication cost
Only the first two apply in a centralized DB
20
Communication cost dominates in a WAN
It is less dominant in LANs, so the total cost should be considered there
QO can also aim to maximize throughput
21
Operators' Complexity
Select, Project (without duplicate elimination): O(n)
Project (with duplicate elimination), Group: O(n log n)
Join, Semi-join, Division, Set operators: O(n log n)
Cartesian product: O(n²)
22
Characterization of Query Processors
23
Types of Optimization
Exhaustive search: compute the cost of each strategy to find the optimal one
May be very costly when there are many alternatives and more fragments
Heuristics
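The two approaches can be contrasted on a small join-ordering problem. This is a sketch under stated assumptions, not the lecture's algorithm: the cardinality of PROJ, the join selectivity factors, the cost model (sum of intermediate result sizes) and the greedy rule are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Cardinalities: EMP and ASG follow the slides; PROJ is a made-up value.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}

# Hypothetical join selectivity factors. Pairs not listed share no join
# attribute, so joining them degenerates to a Cartesian product (factor 1).
SEL = {
    frozenset(("EMP", "ASG")): Fraction(1, 400),
    frozenset(("ASG", "PROJ")): Fraction(1, 100),
}


def plan_cost(order):
    """Cost of a left-deep join order = sum of intermediate result sizes."""
    joined = [order[0]]
    size = Fraction(CARD[order[0]])
    cost = Fraction(0)
    for rel in order[1:]:
        sel = Fraction(1)
        for prev in joined:
            sel *= SEL.get(frozenset((prev, rel)), Fraction(1))
        size = size * CARD[rel] * sel
        cost += size
        joined.append(rel)
    return cost


def exhaustive(relations):
    """Cost every permutation and keep the cheapest (n! candidates)."""
    return min(permutations(relations), key=plan_cost)


def greedy(relations):
    """Heuristic: start with the smallest relation, then repeatedly add
    the relation that yields the cheapest partial plan."""
    remaining = sorted(relations)
    order = [min(remaining, key=CARD.get)]
    remaining.remove(order[0])
    while remaining:
        nxt = min(remaining, key=lambda r: plan_cost(order + [r]))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


best = exhaustive(CARD)
quick = greedy(CARD)
worst = max(permutations(CARD), key=plan_cost)
print(best, plan_cost(best), quick, plan_cost(quick), plan_cost(worst))
```

On this input the heuristic happens to reach the same cost as the exhaustive search while costing fewer candidate plans; in general a heuristic gives no such guarantee, which is the trade-off the slide points at.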
24
Optimization Timing
Static: during compilation
Sizes of intermediate tables are not always known
Cost is justified by repeated execution
Dynamic: during execution
Sizes of intermediate tables are known
Re-optimization may be required
25
Statistics
Relation/Fragment: cardinality, size of a tuple, fraction of tuples participating in a join with another relation
Attribute: cardinality of the domain, actual number of distinct values
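A rough sketch of how an optimizer might turn these statistics into size estimates. The class `RelationStats`, the estimator functions, the tuple size of 40 bytes, the figure of 50 distinct resp values and the join fraction of 1/20 are all assumptions made for illustration. The selection estimate relies on the common uniform-distribution assumption, which real data often violates.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class RelationStats:
    """Per-relation (or per-fragment) statistics, as listed on the slide."""
    cardinality: int                    # number of tuples
    tuple_size: int                     # bytes per tuple (hypothetical)
    distinct_values: dict = field(default_factory=dict)  # attribute -> count


def estimate_equality_selection(stats, attribute):
    """Estimated tuples matching attribute = constant, assuming values
    are uniformly distributed over the distinct values."""
    return Fraction(stats.cardinality, stats.distinct_values[attribute])


def estimate_join_participants(stats, join_fraction):
    """Estimated tuples of a relation that take part in a join, given the
    fraction of its tuples that participate."""
    return stats.cardinality * join_fraction


def size_in_bytes(tuples, stats):
    """Data volume to transfer for a given number of tuples."""
    return tuples * stats.tuple_size


# Hypothetical statistics: 50 distinct resp values, 40-byte ASG tuples.
asg_stats = RelationStats(1000, 40, {"resp": 50})
emp_stats = RelationStats(400, 60)

managers = estimate_equality_selection(asg_stats, "resp")
emp_in_join = estimate_join_participants(emp_stats, Fraction(1, 20))
print(managers, emp_in_join, size_in_bytes(managers, asg_stats))
```

With these made-up statistics the selection estimate is 20 tuples, which matches the 20 managers assumed earlier; that agreement comes from how the numbers were chosen here and is not a property of the estimator.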
26
Decision Sites
Centralized: simple, but needs knowledge about the entire distributed database
Distributed: sites cooperate to determine the schedule; each needs only local information
Hybrid: one site determines the global schedule, and each site optimizes its local subqueries
27
Other factors:
Network topology
Replicated fragments
Use of semijoins
28
[Figure: layers of distributed query processing]
SQL query on distributed relations
QUERY DECOMPOSITION (uses GLOBAL SCHEMA)
Algebraic query on distributed relations
DATA LOCALIZATION (uses FRAGMENT SCHEMA)
Fragment query
GLOBAL OPTIMIZATION (uses STATS ON FRAGMENTS)
Optimized fragment query with communication operations
LOCAL OPTIMIZATION
Optimized local query