1 Advanced Database Systems: DBS CB, 2nd Edition
Parallel and Distributed Databases, Ch. 20
2 Outline
Parallel Databases
Distributed Databases Overview
Distributed Query Processing
Replicated Data Algorithms
Distributed Locking
Distributed Commit Protocol (2PC)
Reliable Distributed Database Management
Peer-2-Peer Systems
Summary
3 Parallel Databases
4 Models of Parallelism
Pipelining. Example: T1 = SELECT * FROM A WHERE cond; T2 = JOIN of T1 and B (B with an index)
Concurrent operations. Example: SELECT * FROM A WHERE cond; run as parallel selections (A where A.x < 10, A where 10 <= A.x < 20, A where 20 <= A.x) whose results are merged; data location is important...
5 Goals: improve performance by executing multiple operations in parallel
Want to scale transactions per second
Want to scale large analytics queries (complexity)
Key benefit: cheaper to scale than relying on a single, increasingly powerful processor
Key challenge: ensure overhead and contention do not kill performance
6 Speedup: more processors => higher speed
Scale-up: more processors can process more data (transaction scale-up vs. batch scale-up)
Challenges to speedup and scale-up:
Startup cost: cost of starting an operation on many processors
Interference: contention for resources between processors
Skew: the slowest step becomes the bottleneck
7 Linear vs. non-linear speedup; linear vs. non-linear scale-up
8 Architectures for Parallel Databases:
Shared memory
Shared disk (Oracle RAC)
Shared nothing (Teradata, HP, Vertica, etc.)
9 Taxonomy for Parallel Query Evaluation:
Inter-query parallelism: each query runs on one processor
Inter-operator parallelism: a query runs on multiple processors, but each operator runs on one processor
Intra-operator parallelism: an operator runs on multiple processors
Intra-operator parallelism is the most scalable
10 Vertical Data Partitioning:
Relation R is split into P projected chunks (sets of attributes) R_0, ..., R_{P-1}, stored at P nodes
Reconstruct the relation by joining on tid, then evaluate the query
11 Horizontal Data Partitioning:
Relation R is split into P chunks R_0, ..., R_{P-1}, stored at P nodes
Round robin: tuple t_i goes to chunk (i mod P). Good for scanning the whole relation; not good for range queries
Hash-based partitioning on attribute A: tuple t goes to chunk h(t.A) mod P. Good for point queries and for joins; not good for range queries or for point queries on attributes other than A
Range-based partitioning on attribute A: tuple t goes to chunk i if v_{i-1} < t.A < v_i. Good for some range queries; skewed data leads to skewed execution
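As a concrete illustration (not from the original slides), here is a minimal Python sketch of the three partitioning schemes; the function names, the sample relation, and the boundary handling for range partitioning are illustrative assumptions:

```python
import hashlib

def round_robin_chunk(i, P):
    """Round robin: tuple number i goes to chunk (i mod P)."""
    return i % P

def hash_chunk(t, attr, P):
    """Hash partitioning on attribute attr: chunk h(t.attr) mod P."""
    h = int(hashlib.md5(str(t[attr]).encode()).hexdigest(), 16)
    return h % P

def range_chunk(t, attr, boundaries):
    """Range partitioning: chunk i determined by which boundary interval t.attr falls into."""
    v = t[attr]
    for i, b in enumerate(boundaries):
        if v <= b:
            return i
    return len(boundaries)  # last chunk holds everything above the top boundary

# Example: partition a few tuples of R(A, B) over P = 3 nodes.
R = [{"A": 5, "B": 10}, {"A": 17, "B": 3}, {"A": 42, "B": 7}]
P = 3
print([round_robin_chunk(i, P) for i, _ in enumerate(R)])
print([hash_chunk(t, "A", P) for t in R])
print([range_chunk(t, "A", [10, 30]) for t in R])
```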
12 Cost with Parallelism:
Compute σ_{A=v}(R) or σ_{v1<A<v2}(R)
On a conventional database: cost = B(R)
What is the cost on a parallel database with P processors? B(R)/P in all cases
However, different processors do the work:
Round-robin partitioning: all servers do the work
Hash partitioning: one server for σ_{A=v}(R), all servers for σ_{v1<A<v2}(R)
Range partitioning: one server only
13 Parallel Group-By: compute γ_{A, sum(B)}(R)
Step 1: server i partitions its chunk R_i using a hash function h(t.A) mod P into R_{i,0}, R_{i,1}, ..., R_{i,P-1}
Step 2: server i sends partition R_{i,j} to server j
Step 3: server j computes γ_{A, sum(B)} on R_{0,j}, R_{1,j}, ..., R_{P-1,j}
Step 4: combine the results from all servers
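A toy single-process simulation of these steps (not the slides' code; lists and dictionaries stand in for servers and messages, and Python's built-in hash() stands in for h):

```python
from collections import defaultdict

def parallel_group_by(chunks, P):
    """Simulate gamma_{A, sum(B)}(R) over chunks R_0..R_{P-1}, one per server."""
    # Steps 1 and 2: each server i hash-partitions its chunk on A and "sends"
    # partition R_{i,j} to server j.
    inboxes = [[] for _ in range(P)]
    for Ri in chunks:
        for t in Ri:
            j = hash(t["A"]) % P
            inboxes[j].append(t)
    # Step 3: server j groups and sums what it received.
    results = []
    for j in range(P):
        sums = defaultdict(int)
        for t in inboxes[j]:
            sums[t["A"]] += t["B"]
        results.append(dict(sums))
    # Step 4: the answer is the union of the per-server results
    # (groups are disjoint because of the hash partitioning).
    answer = {}
    for r in results:
        answer.update(r)
    return answer

chunks = [[{"A": "x", "B": 1}, {"A": "y", "B": 2}],
          [{"A": "x", "B": 3}, {"A": "z", "B": 4}]]
print(parallel_group_by(chunks, P=2))  # {'x': 4, 'y': 2, 'z': 4}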
14 Parallel Join (on the common join attribute A):
Step 1: each server i in [0,k] partitions its chunk R_i using a hash function h(t.A) mod P into R_{i,0}, R_{i,1}, ..., R_{i,P-1}; each server j in [k+1,P] partitions its chunk S_j the same way into S_{j,0}, S_{j,1}, ..., S_{j,P-1}
Step 2: server i sends partition R_{i,u} to server u, and server j sends partition S_{j,u} to server u
Step 3: server u computes the join of the R partitions it received with the S partitions it received
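A minimal simulation of the same idea, under the same assumptions as the group-by sketch above (in-memory lists as servers, hash() as the partitioning function):

```python
from collections import defaultdict

def parallel_hash_join(R_chunks, S_chunks, P):
    """Simulate the parallel join of R and S on the shared attribute A."""
    # Steps 1 and 2: every R-server and S-server hash-partitions its chunk on A
    # and "sends" partition u to join server u.
    R_in, S_in = defaultdict(list), defaultdict(list)
    for Ri in R_chunks:
        for t in Ri:
            R_in[hash(t["A"]) % P].append(t)
    for Sj in S_chunks:
        for t in Sj:
            S_in[hash(t["A"]) % P].append(t)
    # Step 3: each join server u joins everything it received.
    out = []
    for u in range(P):
        by_key = defaultdict(list)
        for r in R_in[u]:
            by_key[r["A"]].append(r)
        for s in S_in[u]:
            for r in by_key[s["A"]]:
                out.append({**r, **s})
    return out

R_chunks = [[{"A": 1, "B": "r1"}], [{"A": 2, "B": "r2"}]]
S_chunks = [[{"A": 1, "C": "s1"}, {"A": 2, "C": "s2"}]]
print(parallel_hash_join(R_chunks, S_chunks, P=2))
```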
15 Parallel Dataflow Implementation:
Use relational operators unchanged
Add special split and merge operators to handle data routing, buffering, and flow control
Example: the exchange operator
Inserted between consecutive operators in the query plan; can act as either a producer or a consumer
Producer pulls data from the operator below and sends it to n consumers; the producer acts as the driver for the operators below it in the query plan
Consumer buffers input data from n producers and makes it available to the operator above through a getNext() interface
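A very simplified sketch of an exchange-style operator, assuming child operators are plain Python iterators and ignoring real buffering and flow control (class and parameter names are illustrative):

```python
class Exchange:
    """Toy exchange operator: the producer side pulls from child iterators,
    hash-routes tuples to n_out buffers, and the consumer side exposes each
    buffer through a getNext()-style iterator."""
    def __init__(self, children, n_out, route_attr):
        self.buffers = [[] for _ in range(n_out)]
        for child in children:                 # producer: drive the operators below
            for t in child:
                self.buffers[hash(t[route_attr]) % n_out].append(t)

    def consumer(self, i):
        """getNext() interface for consumer i: a plain iterator over its buffer."""
        return iter(self.buffers[i])

# Example: route a scan's output to 2 downstream consumers on attribute "A".
scan = iter([{"A": 1}, {"A": 2}, {"A": 3}, {"A": 4}])
ex = Exchange([scan], n_out=2, route_attr="A")
print(list(ex.consumer(0)), list(ex.consumer(1)))
```

A real exchange operator streams tuples with back-pressure rather than materializing them; the sketch only shows the routing role it plays between producers and consumers.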
16 Distributed Databases
17 Approaches:
Top-down: design one database, then split it and allocate the fragments to sites; the split/allocation design can be an issue...
Multi-DBs, or bottom-up: no such design issues
Overview:
Data is stored at several nodes, each possibly managed by a DBMS that can run independently
Distributed data independence: users should not have to know where relevant data is located
Distributed transaction atomicity: users should be able to write Xacts accessing multiple nodes just like local Xacts
18 Types of Distributed Databases:
Homogeneous: all nodes run the same type of DBMS (Oracle, DB2, etc.)
Heterogeneous: different nodes run different types of DBMS
Architecture:
Client-Server: the client ships a query to a single node, and the query is processed entirely at that node
Collaborating servers: a query can span multiple nodes
19 Distributed Catalog Management:
Must keep track of how data is distributed across nodes
Must be able to name each replica of each fragment
Node catalog: describes all objects (fragments, replicas) at a node and keeps track of replicas of relations created at this node
20 Distributed Query Optimization:
Cost-based approach: consider all plans, pick the cheapest; similar to centralized optimization
Difference 1: communication costs must be considered
Difference 2: local site autonomy must be respected
Difference 3: new distributed join methods
The query node constructs a global plan, with suggested local plans describing the processing at each node; a node can improve its local plan's execution
21 Distributed Join Problem:
Suppose you want to compute R(A,B) ⋈ S(B,C), where R and S are stored at different nodes
The obvious solution is to copy R to the node of S (or vice versa) and compute the join there
In some scenarios this is not viable because of the communication overhead
Use semijoin reductions!
22 Semijoin Reductions (for R(X,Y) ⋈ S(Y,Z)):
Ship only the relevant part of each relation to the other node(s)
R ⋉ S = R ⋈ π_Y(S); that is, project S onto the common attribute Y and ship π_Y(S) to the node of R
The node of R computes the semijoin R ⋉ S to eliminate dangling tuples of R and ships the result back to the node of S
The node of S computes the true join
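A minimal sketch of the reduction, assuming R and S are lists of dictionaries and ignoring actual network transfer (only the commented "ship" steps would cross the network):

```python
def semijoin_reduction_join(R, S):
    """Semijoin reduction for R(X,Y) join S(Y,Z) stored at different nodes.
    Only the Y-projection of S and the reduced R travel between nodes."""
    # Node of S: project S onto the common attribute Y and ship it to the node of R.
    proj_Y = {s["Y"] for s in S}
    # Node of R: compute the semijoin, eliminating dangling tuples of R,
    # then ship the reduced R back to the node of S.
    R_reduced = [r for r in R if r["Y"] in proj_Y]
    # Node of S: compute the true join locally.
    return [{**r, **s} for r in R_reduced for s in S if r["Y"] == s["Y"]]

R = [{"X": 1, "Y": "a"}, {"X": 2, "Y": "b"}, {"X": 3, "Y": "zzz"}]
S = [{"Y": "a", "Z": 10}, {"Y": "b", "Z": 20}]
print(semijoin_reduction_join(R, S))  # the tuple with Y='zzz' never crosses the network
```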
23 Replication:
Gives increased availability and faster query evaluation
Synchronous vs. asynchronous replication; they differ in how current the copies are
Synchronous replication: all copies of a modified relation (fragment) must be updated before the modifying Xact commits
Asynchronous replication: copies of a fragment are only periodically updated; different copies may get out of sync in the meantime
24 Synchronous Replication – Basic Solution for CC:
Treat each copy as an independent data item
(Diagram: object X has copies X1, X2, X3, each with its own lock manager, accessed by transactions Tx_i, Tx_j, Tx_k.)
25 Synchronous Replication – Basic Solution for CC, Read(X):
Get shared locks on X1, X2, and X3
Read any one of X1, X2, X3
At the end of the transaction, release the X1, X2, X3 locks
26 Synchronous Replication – Basic Solution for CC, Write(X):
Get exclusive locks on X1, X2, and X3
Write the new value into X1, X2, and X3
At the end of the transaction, release the X1, X2, X3 locks
27 Synchronous Replication – Problem (Low Availability):
Correctness is OK: 2PL gives serializability, and 2PC gives atomic transactions
(Diagram: with copies X1, X2, X3, if the node holding one copy is down, X cannot be accessed at all.)
28 Synchronous Replication – Enhanced Algorithms:
Voting: a Xact must write a majority of copies to modify an object, and must read enough copies to be sure of seeing at least one most recent copy
Read-any write-all: reads are fast, and writes are slower than with voting
The choice between the two models affects which locks must be set
Before an update Xact can commit, it must obtain locks on all modified copies (an expensive protocol with many messages)
As a result, asynchronous replication is widely used
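One common way to choose voting quorums so that reads always see a current copy is sketched below; the exact rule is an assumption for illustration, not the slides' definition:

```python
def quorum_sizes(n_copies):
    """Pick read/write quorums for voting: writes cover a majority of copies,
    and read + write quorums overlap, so every read sees at least one current copy."""
    w = n_copies // 2 + 1          # write a majority of copies
    r = n_copies - w + 1           # read enough copies to intersect every write quorum
    return r, w

for n in (3, 4, 5):
    r, w = quorum_sizes(n)
    assert r + w > n and 2 * w > n   # the two quorum-intersection conditions
    print(f"{n} copies: read quorum {r}, write quorum {w}")
```

Read-any write-all is the special case r = 1, w = n, which is why its reads are fast and its writes are the slowest.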
29 Asynchronous Replication:
Allows the modifying Xact to commit before all copies have been changed; readers look at just one copy
Two approaches:
Primary site: exactly one copy of a relation is designated the primary/master copy; other replicas cannot be updated directly. How are updates propagated to the secondary copies? In two steps: (1) log-based capture (from the recovery log), (2) apply the captured snapshot at the secondary sites periodically
Peer-to-peer replication: more than one copy can be a master for the object; changes to a master must somehow be propagated to the other copies, and conflicting updates to two master copies must somehow be resolved. Best used in scenarios where conflicts do not arise frequently!
30 Data Warehousing and Replication:
A major trend: building giant "warehouses" with data from many sources to enable complex decision-support queries (DSS and BI) over data from across the enterprise
A warehouse can be viewed as an instance of asynchronous replication from the different data sources (each typically controlled by an RDBMS) in the enterprise
ETL (Extract-Transform-Load) is typically used to update the warehouse
31 Distributed Locking: how do we manage locks for objects across many nodes?
Centralized: one node does all the locking; vulnerable to single-node failure
Primary copy: all locking for a given object is done at the primary-copy node for that object
Fully distributed: locking for a copy is done at the node where that copy is stored
32 Distributed Deadlock Detection:
Each site maintains a local waits-for graph
A global deadlock may exist even if no local graph contains a cycle
Three solutions:
Centralized: send all local graphs to one site
Hierarchical: organize nodes into a hierarchy and send local graphs to the parent in the hierarchy
Timeout: abort the Xact if it waits too long (the most common approach)
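To illustrate the centralized solution (an illustrative sketch, not the slides' algorithm), the site that collects the local graphs can union them and run a cycle check:

```python
def find_cycle(waits_for):
    """Detect a cycle in a global waits-for graph {Xact: set of Xacts it waits for},
    built by unioning the local graphs sent to one site."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in waits_for}

    def dfs(t, path):
        color[t] = GREY
        for u in waits_for.get(t, ()):
            if color.get(u, WHITE) == GREY:
                return path + [u]            # back edge: deadlock cycle found
            if color.get(u, WHITE) == WHITE:
                found = dfs(u, path + [u])
                if found:
                    return found
        color[t] = BLACK
        return None

    for t in waits_for:
        if color[t] == WHITE:
            cycle = dfs(t, [t])
            if cycle:
                return cycle
    return None

# Local graphs from two sites; neither has a cycle, but their union does.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}
global_graph = {}
for g in (site1, site2):
    for t, waits in g.items():
        global_graph.setdefault(t, set()).update(waits)
print(find_cycle(global_graph))  # ['T1', 'T2', 'T1']
```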
33 Distributed Recovery – new issues:
New kinds of failures, e.g., links and remote nodes
If sub-transactions execute at different nodes, all or none must commit!
A log is maintained at each site, and commit-protocol actions are additionally logged
We need 2PC!
34 Two-Phase Commit (2PC):
The node at which the Xact originated is the coordinator; the other nodes at which it executes are subordinates
When a Xact wants to commit:
The coordinator sends a prepare msg to each subordinate
Each subordinate force-writes an abort or prepare log record and then sends a No or Yes msg to the coordinator
If the coordinator gets a unanimous Yes vote, it force-writes a commit log record and sends a commit msg to all subordinates; else it force-writes an abort log record and sends an abort msg
Subordinates force-write an abort/commit log record based on the msg they receive, then send back an ack msg to the coordinator
The coordinator writes an end log record after getting all acks
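A skeleton of this message/logging flow, with Python lists standing in for force-written logs and message channels (a sketch only; real 2PC needs actual force-writes, timeouts, and retransmission):

```python
def two_phase_commit(coordinator_log, subordinate_logs, votes):
    """Follow the slide's flow: prepare -> votes -> commit/abort -> acks -> end."""
    # Phase 1: coordinator sends prepare; each subordinate force-writes
    # prepare (Yes) or abort (No) and votes.
    for sub, vote in votes.items():
        subordinate_logs[sub].append("prepare" if vote == "yes" else "abort")
    # Phase 2: unanimous Yes -> force-write commit and send commit; else abort.
    decision = "commit" if all(v == "yes" for v in votes.values()) else "abort"
    coordinator_log.append(decision)
    for sub, vote in votes.items():
        if vote == "yes":                      # No-voters have already aborted
            subordinate_logs[sub].append(decision)
    # Subordinates ack; coordinator writes the end record after all acks.
    coordinator_log.append("end")
    return decision

coord_log = []
votes = {"node1": "yes", "node2": "yes"}
sub_logs = {s: [] for s in votes}
print(two_phase_commit(coord_log, sub_logs, votes), coord_log, sub_logs)
```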
35 Observations on 2PC:
Two rounds of communication: first for voting, then to terminate
Any node can decide to abort a Xact
Every msg reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log
All commit-protocol log records for a Xact contain the Xact-id and the coordinator-id; the coordinator's abort/commit record also includes the ids of all subordinates
36 Observations on 2PC (contd.):
Ack msgs are used to let the coordinator know when it can "forget" a Xact; until it receives all acks, it must keep T in the Xact table
If the coordinator fails after sending prepare msgs but before writing a commit/abort log record, it aborts the Xact when it comes back up
If a subtransaction does no updates, its commit or abort status is irrelevant
37 2PC: Restart After a Failure at a Node:
If we have a commit or abort log record for Xact T but no end record, redo/undo T; if this site is the coordinator for T, keep sending commit/abort msgs to the subordinates until acks are received
If we have a prepare log record for Xact T but no commit/abort record, this site is a subordinate for T: repeatedly contact the coordinator to find the status of T, then write a commit/abort log record, redo/undo T, and write an end log record
If we have no prepare log record for T at all, unilaterally abort and undo T; this site may be the coordinator, in which case subordinates may still send msgs
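These restart rules reduce to a simple case analysis over the commit-protocol records found in the local log, sketched below (record names are illustrative):

```python
def restart_action(log_records):
    """Map a node's commit-protocol log records for Xact T to the slide's restart rule."""
    recs = set(log_records)
    if "end" in recs:
        return "nothing to do"
    if "commit" in recs or "abort" in recs:
        return "redo/undo T; if coordinator, resend commit/abort until all acks arrive"
    if "prepare" in recs:
        return "subordinate: ask coordinator for T's status, log commit/abort, redo/undo T, log end"
    return "no prepare record: unilaterally abort and undo T"

for log in (["prepare", "commit"], ["prepare"], []):
    print(log, "->", restart_action(log))
```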
38 2PC: Blocking:
If the coordinator for Xact T fails, subordinates that have voted Yes cannot decide whether to commit or abort T until the coordinator recovers: T is blocked
Even if all subordinates know each other (extra overhead in the prepare msg), they remain blocked unless one of them voted No
39 2PC: Communication Link and Remote Node Failures:
If a remote site does not respond during the commit protocol for Xact T, either because the site failed or the link failed:
If the current site is the coordinator for T, it should abort T
If the current site is a subordinate and has not yet voted Yes, it should abort T
If the current site is a subordinate and has voted Yes, it is blocked until the coordinator responds
40 2PC with Presumed Abort:
When the coordinator aborts T, it undoes T and removes it from the Xact table immediately
It doesn't wait for acks; it "presumes abort" if a Xact is not in the Xact table, and the names of the subordinates are not recorded in the abort log record
Subordinates do not send acks on abort
If a subxact does no updates, it responds to the prepare msg with Reader instead of Yes/No; the coordinator subsequently ignores Readers
If all subxacts are Readers, the second phase is not needed
41 Variants of 2PC: Linear and Hierarchical
(Diagram: nodes arranged in a chain or in a tree rooted at the coordinator, with "ok" and "commit" messages flowing along it.)
42 Variants of 2PC: Distributed
Nodes broadcast all messages, so every node knows when to commit
43 3PC = Non-Blocking Commit:
Assumption: a failed node stays down forever
Key idea: before committing, the coordinator tells the participants that everyone is ok
44 3PC – Recovery Rules (Termination Protocol):
The surviving nodes try to complete the transaction based on their current states
Goal: if any dead node committed or aborted, the survivors must not contradict it! Otherwise, the survivors can do as they please...
45 3PC – Recovery Rules (Termination Protocol):
Let {S_1, S_2, ..., S_n} be the states of the survivor sites:
If one or more S_i = COMMIT, then COMMIT T
If one or more S_i = ABORT, then ABORT T
If one or more S_i = PREPARE, T could not have aborted, so COMMIT T
If no S_i = PREPARE (or COMMIT), T could not have committed, so ABORT T
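The termination decision is a pure function of the survivor states; a minimal sketch (the placeholder state name UNCERTAIN for "neither prepared nor decided" is an assumption, not the slides' terminology):

```python
def three_pc_termination(survivor_states):
    """Apply the slide's termination rules to the states reported by survivors."""
    states = set(survivor_states)
    if "COMMIT" in states:
        return "COMMIT"          # some survivor already committed
    if "ABORT" in states:
        return "ABORT"           # some survivor already aborted
    if "PREPARE" in states:
        return "COMMIT"          # T could not have aborted
    return "ABORT"               # no PREPARE/COMMIT seen: T could not have committed

print(three_pc_termination(["UNCERTAIN", "PREPARE"]))   # COMMIT
print(three_pc_termination(["UNCERTAIN", "UNCERTAIN"])) # ABORT
```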
46 Reliability:
Correctness: serializability, atomicity, persistence
Availability
Failure types: processor failure, storage failure, and network failure; multiple failures are possible
Scenarios follow
47 Failure Models: we cannot protect against everything
Unlikely failures (e.g., flooding in the Sahara)
Failures too expensive to protect against (e.g., earthquakes)
Failures we know how to protect against (e.g., with message sequence numbers; stable storage)
(Diagram: events classified as desired/undesired and expected/unexpected.)
48 Node Models: Fail-Stop Nodes
On failure, volatile memory is lost, but stable storage is ok
(Timeline: perfect operation, then halted, then recovery, then perfect operation again.)
49 Node Models: Byzantine Nodes
At any given time, at most some fraction f of the nodes have failed (typically f < 1/2 or f < 1/3)
50 Network Models: Reliable Network
In-order messages, no spontaneous messages, and a timeout TD: if no ack arrives within TD seconds, the destination is down (not just paused)
I.e., no lost messages, except for node failures
51 Network Models: Variation of the Reliable Network – Persistent Messages
If the destination is down, the network will eventually deliver the message
Simplifies node recovery, but leads to inefficiencies (hides too much); not considered here
52 Network Models: Partitioned Network
In-order messages, no spontaneous messages, but no timeout: nodes can have different views of which nodes have failed
53 Scenarios:
(1) Reliable network, fail-stop nodes, no data replication
(2) Reliable network, fail-stop nodes, data replication
(3) Partitionable network, fail-stop nodes
54 Scenario 1: Reliable Network / Fail-Stop Nodes / No Data Replication
Basic idea: node P controls item X
A single control point simplifies concurrency control and recovery
Not an availability hit: if P is down, X is unavailable anyway!
"P controls X" means P does concurrency control for X and P does recovery for X
55 Scenario 1 (contd.): Say transaction T wants to access X
(Diagram: process P_T sends a request to node P, whose local DBMS has a lock manager, a log, and item X; P_T is the process that represents T at this node.)
56 Scenario 1 (contd.): Process Model – Cohorts
(Diagram: the user talks to cohort processes T1, T2, T3, each attached to a local DBMS; arrows show process spawning, communication, and data access.)
57 Scenario 1 (contd.): Process Model – Transaction Servers (Managers)
(Diagram: the user talks to a transaction manager at each node, each attached to a local DBMS; arrows show process spawning, communication, and data access.)
58 Scenario 1 (contd.): Process Model Summary
Cohorts: application code is responsible for remote access
Transaction manager: the "system" handles distribution and remote access
59 Peer-2-Peer Systems
60 A distributed application where nodes are:
Autonomous
Very loosely coupled
Equal in role or functionality
Sharing and exchanging resources with each other
61 Related Terms:
File sharing, e.g., Napster, Gnutella, Kazaa, eDonkey, BitTorrent, FreeNet, LimeWire, Morpheus
Grid computing
Autonomic computing
62 Search in a P2P System:
(Diagram: peers hold resources R_{1,1}, R_{1,2}, ...; R_{2,1}, R_{2,2}, ...; R_{3,1}, R_{3,2}, ...; a query "Who has X?" is sent out to the peers.)
63 Search in a P2P System (contd.):
(Diagram: peers that have X send answers back to the querying node.)
64 Search in a P2P System (contd.):
(Diagram: the querying node then requests the resource from an answering peer and receives it.)
65 Distributed Lookup:
Have <k, v> pairs; given k, find the matching values
(Example: K/V pairs (1,a), (1,b), (7,c), (4,d), (3,a), (4,a) spread over nodes; lookup(4) = {a, d}.)
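A simplified, DHT-flavored sketch of distributed lookup (an illustration only: class, method names, and the hash-based placement are assumptions, not the slides' protocol):

```python
import hashlib
from collections import defaultdict

class DistributedKV:
    """Toy distributed lookup: <k, v> pairs are spread over n nodes by hashing
    the key, so lookup(k) only has to ask the one node responsible for k."""
    def __init__(self, n_nodes):
        self.nodes = [defaultdict(set) for _ in range(n_nodes)]

    def _node_for(self, k):
        return int(hashlib.sha1(str(k).encode()).hexdigest(), 16) % len(self.nodes)

    def put(self, k, v):
        self.nodes[self._node_for(k)][k].add(v)

    def lookup(self, k):
        return self.nodes[self._node_for(k)][k]

kv = DistributedKV(n_nodes=4)
for k, v in [(1, "a"), (1, "b"), (7, "c"), (4, "d"), (3, "a"), (4, "a")]:
    kv.put(k, v)
print(sorted(kv.lookup(4)))  # ['a', 'd']
```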
66 Data Distributed Over Nodes:
N nodes, each holding some data
Notation: X.func(params) means an RPC of procedure func(params) at node X; X.A means send a message to X to get the value of A; if X is omitted, we refer to a local procedure or data structure
(Example: node Y, holding its own data, executes B := X.A and A := X.f(B) against node X.)
67 Summary
68 Parallel DBMSs are designed for scalable performance; relational operators are very well suited for parallel execution (pipelined and partitioned parallelism)
Distributed DBMSs offer site autonomy and distributed administration; storage and catalog techniques, concurrency control, and recovery issues must all be revisited
69 END