Download presentation
Presentation is loading. Please wait.
Published bySally Elmes Modified over 9 years ago
1
Distributed DBMSPage 7-9. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database Design Distributed Query Processing à Query Processing Methodology à Distributed Query Optimization Distributed Transaction Management (Extensive) n Building Distributed Database Systems (RAID) Mobile Database Systems Privacy, Trust, and Authentication Peer to Peer Systems
2
Distributed DBMSPage 7-9. 2© 1998 M. Tamer Özsu & Patrick Valduriez Query Processing high level user query query processor low level data manipulation commands
3
Distributed DBMSPage 7-9. 3© 1998 M. Tamer Özsu & Patrick Valduriez Query Processing Components Query language that is used à SQL: “intergalactic dataspeak” Query execution methodology à The steps that one goes through in executing high- level (declarative) user queries. Query optimization à How do we determine the “best” execution plan?
4
Distributed DBMSPage 7-9. 4© 1998 M. Tamer Özsu & Patrick Valduriez SELECTENAME Project FROMEMP,ASG Select WHEREEMP.ENO = ASG.ENO Join ANDDUR > 37 Strategy 1 ENAME ( DUR>37 EMP.ENO=ASG.ENO (EMP ASG)) Strategy 2 ENAME (EMP ENO ( DUR>37 (ASG))) Strategy 2 avoids Cartesian product, so is “better” Selecting Alternatives
5
Distributed DBMSPage 7-9. 5© 1998 M. Tamer Özsu & Patrick Valduriez What is the Problem? Site 1Site 2Site 3Site 4Site 5 EMP 1 = ENO≤“E3” (EMP)EMP 2 = ENO>“E3” (EMP) ASG 2 = ENO>“E3” (ASG) ASG 1 = ENO≤“E3” (ASG) Result Site 5 Site 1Site 2Site 3Site 4 ASG 1 EMP 1 EMP 2 ASG 2 result 2 =(EMP 1 EMP 2 ) ENO DUR>37 (ASG 1 ASG 1 ) Site 4 result = EMP 1 ’ EMP 2 ’ Site 3 Site 1Site 2 EMP 2 ’ =EMP 2 ENO ASG 2 ’ EMP 1 ’ =EMP 1 ENO ASG 1 ’ ASG 1 ’ = DUR>37 (ASG 1 )ASG 2 ’ = DUR>37 (ASG 2 ) Site 5 ASG 2 ’ ASG 1 ’ EMP 1 ’ EMP 2 ’
6
Distributed DBMSPage 7-9. 6© 1998 M. Tamer Özsu & Patrick Valduriez Assume: à size (EMP) = 400, size (ASG) = 1000 à tuple access cost = 1 unit; tuple transfer cost = 10 units Strategy 1 produce ASG': (10+10) tuple access cost 20 transfer ASG' to the sites of EMP: (10+10) tuple transfer cost 200 produce EMP': (10+10) tuple access cost 2 40 transfer EMP' to result site: (10+10) tuple transfer cost 200 Total cost 460 Strategy 2 transfer EMP to site 5:400 tuple transfer cost 4,000 transfer ASG to site 5 :1000 tuple transfer cost 10,000 produce ASG':1000 tuple access cost 1,000 join EMP and ASG':400 20 tuple access cost 8,000 Total cost23,000 Cost of Alternatives
7
Distributed DBMSPage 7-9. 7© 1998 M. Tamer Özsu & Patrick Valduriez Minimize a cost function I/O cost + CPU cost + communication cost These might have different weights in different distributed environments Wide area networks à communication cost will dominate (80 – 200 ms) low bandwidth low speed high protocol overhead à most algorithms ignore all other cost components Local area networks à communication cost not that dominant (1 – 5 ms) à total cost function should be considered Can also maximize throughput Query Optimization Objectives
8
Distributed DBMSPage 7-9. 8© 1998 M. Tamer Özsu & Patrick Valduriez Assume à relations of cardinality n à sequential scan Complexity of Relational Operations OperationComplexity Select Project (without duplicate elimination) O( n ) Project (with duplicate elimination) Group O( n log n ) Join Semi-join Division Set Operators O( n log n ) Cartesian ProductO( n 2 )
9
Distributed DBMSPage 7-9. 9© 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Types of Optimizers Exhaustive search à cost-based à optimal à combinatorial complexity in the number of relations Heuristics à not optimal à regroup common sub-expressions à perform selection, projection first à replace a join by a series of semijoins à reorder operations to reduce intermediate relation size à optimize individual operations
10
Distributed DBMSPage 7-9. 10© 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Optimization Granularity Single query at a time à cannot use common intermediate results Multiple queries at a time à efficient if many similar queries à decision space is much larger
11
Distributed DBMSPage 7-9. 11© 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Optimization Timing Static compilation optimize prior to the execution difficult to estimate the size of the intermediate results error propagation à can amortize over many executions à R* Dynamic à run time optimization à exact information on the intermediate relation sizes à have to reoptimize for multiple executions à Distributed INGRES Hybrid à compile using a static algorithm à if the error in estimate sizes > threshold, reoptimize at run time à MERMAID
12
Distributed DBMSPage 7-9. 12© 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Statistics Relation à cardinality à size of a tuple à fraction of tuples participating in a join with another relation Attribute à cardinality of domain à actual number of distinct values Common assumptions à independence between different attribute values à uniform distribution of attribute values within their domain
13
Distributed DBMSPage 7-9. 13© 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Decision Sites Centralized à single site determines the “best” schedule à simple à need knowledge about the entire distributed database Distributed à cooperation among sites to determine the schedule à need only local information à cost of cooperation Hybrid à one site determines the global schedule à each site optimizes the local subqueries
14
Distributed DBMSPage 7-9. 14© 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Network Topology Wide area networks (WAN) – point-to-point à characteristics low bandwidth low speed high protocol overhead à communication cost will dominate; ignore all other cost factors à global schedule to minimize communication cost à local schedules according to centralized query optimization Local area networks (LAN) à communication cost not that dominant à total cost function should be considered à broadcasting can be exploited (joins) à special algorithms exist for star networks
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.