1 Advanced Database Topics Copyright © Ellis Cohen 2002-2005 Concurrency Control for Distributed Databases These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see http://www.openlineconsult.com/db

4 Copyright © Ellis Cohen, 2002-2005 Sub-query Distribution Suppose a coordinator wants to execute the query that lists the project managed by the highest paid employee: SELECT * FROM Projs WHERE pmgr = (SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps)) If subordinate S1 holds the Projs table, and subordinate S2 holds the Emps table, then the coordinator will request S2 to execute the sub-query SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps). It will get the result back (let's call it result), and then request S1 to execute (and return the results of) the sub-query SELECT * FROM Projs WHERE pmgr = result
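The two-step flow above can be sketched in Python. This is only an illustration, not how any particular DDBMS does it: two in-memory SQLite databases stand in for the subordinates S1 and S2, and the sample rows, the Projs/Emps column layout, and the helper names make_site and run_distributed_query are all assumptions made for the example.

```python
import sqlite3

def make_site(ddl, rows_sql):
    """Create an in-memory database standing in for one subordinate site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executescript(rows_sql)
    return conn

# Hypothetical sample data: S1 holds Projs, S2 holds Emps.
s1 = make_site(
    "CREATE TABLE Projs (pno INTEGER, pname TEXT, pmgr INTEGER)",
    "INSERT INTO Projs VALUES (1, 'Billing', 10);"
    "INSERT INTO Projs VALUES (2, 'Search', 20);",
)
s2 = make_site(
    "CREATE TABLE Emps (empno INTEGER, sal INTEGER)",
    "INSERT INTO Emps VALUES (10, 5000);"
    "INSERT INTO Emps VALUES (20, 9000);",
)

def run_distributed_query(s1, s2):
    # Step 1: the coordinator asks S2 for the highest paid employee's number.
    row = s2.execute(
        "SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps)"
    ).fetchone()
    if row is None:
        return []
    result = row[0]
    # Step 2: the coordinator ships that value to S1 as a query parameter.
    return s1.execute("SELECT * FROM Projs WHERE pmgr = ?", (result,)).fetchall()

projects = run_distributed_query(s1, s2)
```

With the sample rows above, employee 20 has the highest salary, so only the project managed by employee 20 is returned.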

5 Copyright © Ellis Cohen, 2002-2005 Sub-transactions Imagine a coordinator C has started a transaction TC, and is executing a query as part of TC. –The coordinator divides the query up into sub-queries, which it sends to various subordinates. –It labels each subquery with TC, the identity of the main transaction. When a subordinate S is passed a sub-query –If it has not yet seen the label TC, it creates a local transaction TS (called a sub-transaction), and associates TS with TC. –If it has seen TC before, it looks up the corresponding TS. In either case, S runs the sub-query as part of the local sub-transaction TS
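The lookup a subordinate performs when it receives a labeled sub-query can be sketched as a small map from main-transaction label to local sub-transaction. The Subordinate class, its method names, and the identifier format are hypothetical; a real subordinate would start an actual local transaction rather than mint a string.

```python
import itertools

class Subordinate:
    def __init__(self, name):
        self.name = name
        self._sub_txns = {}              # main transaction label -> local sub-transaction id
        self._ids = itertools.count(1)

    def sub_transaction_for(self, tc_label):
        # First time this label is seen: create a local sub-transaction TS.
        if tc_label not in self._sub_txns:
            self._sub_txns[tc_label] = f"{self.name}-T{next(self._ids)}"
        # Otherwise: look up the TS already associated with TC.
        return self._sub_txns[tc_label]

    def run_subquery(self, tc_label, subquery):
        ts = self.sub_transaction_for(tc_label)
        # Placeholder for actually executing the sub-query inside TS.
        return (ts, subquery)

site = Subordinate("S1")
first = site.run_subquery("TC", "SELECT 1")
second = site.run_subquery("TC", "SELECT 2")
other = site.run_subquery("TC2", "SELECT 3")
```

Both sub-queries labeled TC run under the same local sub-transaction, while the sub-query labeled TC2 gets a new one.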

6 Copyright © Ellis Cohen, 2002-2005 Centralized Locking Each query & commit funneled through central Lock Manager site which maintains all locks Evaluation: Only supports table-level granularity (but predicate locks could achieve the effect of row-level granularity) Cost Issue: Requires extra communication for each query Reliability Issue: single point of failure; crash of Lock Manager requires abort of all transactions + election of new Lock Manager Scalability Issue: Lock Manager is bottleneck Note: Depending upon pattern of communication, address both reliability & scalability via hierarchy of lock managers

7 Copyright © Ellis Cohen, 2002-2005 Distributed Deadlock Prevention Each subordinate –Locks its own DB objects –Can make WAIT/WOUND/DIE decisions locally (requires transaction properties - e.g. timestamp, priority - passed with each sub-query) WOUND or DIE –Aborts local sub-transaction –Notifies coordinator who aborts main transaction (if not already aborted) & informs other subordinates (and, if hierarchical, notifies its parent coordinator) Consider two transactions, T1 and T2, managed by different coordinators C1 and C2, that both try to lock the same resource. If T1's clock is set a year in the past, will it ever be wounded or die?
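The local decision a subordinate makes can be sketched with the two classic timestamp schemes, Wait-Die and Wound-Wait. The sketch assumes that a smaller timestamp means an older transaction and that timestamps are unique; the function names are illustrative only.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester waits; a younger requester dies (aborts)."""
    return "WAIT" if requester_ts < holder_ts else "DIE"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds (aborts) the holder; a younger one waits."""
    return "WOUND" if requester_ts < holder_ts else "WAIT"

# T1 carries a timestamp far in the past, so it always looks older than T2.
T1_TS, T2_TS = 1, 1_000_000
decisions = {
    "wait_die: T1 requests lock held by T2": wait_die(T1_TS, T2_TS),
    "wait_die: T2 requests lock held by T1": wait_die(T2_TS, T1_TS),
    "wound_wait: T1 requests lock held by T2": wound_wait(T1_TS, T2_TS),
    "wound_wait: T2 requests lock held by T1": wound_wait(T2_TS, T1_TS),
}
```

In all four cases the transaction that gets aborted, if any, is T2: under both schemes the transaction with the smaller timestamp is never the victim, which is worth keeping in mind for the question about a clock set a year in the past.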

8 Copyright © Ellis Cohen, 2002-2005 Local & Global WFG's Consider: T1 locks A at site S1, requests B at site S2. T2 locks B at site S2, requests A at site S1. Local WFGs (Wait-For Graphs): S1 knows: T2 → T1. S2 knows: T1 → T2. Need to build Global WFG to discover cycle T1 → T2 → T1.
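A minimal sketch of why the global graph is needed: neither local WFG contains a cycle, but their union does. Each graph is represented here as a dict mapping a waiting transaction to the set of transactions it waits for; that representation and the names merge_wfgs and find_cycle are assumptions for this example.

```python
def merge_wfgs(*local_wfgs):
    """Union several local wait-for graphs into one global graph."""
    merged = {}
    for wfg in local_wfgs:
        for waiter, holders in wfg.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged

def find_cycle(wfg):
    """Return one cycle as a list of transactions, or None if the graph is acyclic."""
    visiting, done = [], set()

    def visit(t):
        if t in visiting:                      # back edge: found a cycle
            return visiting[visiting.index(t):] + [t]
        if t in done:
            return None
        visiting.append(t)
        for nxt in sorted(wfg.get(t, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(t)
        return None

    for t in sorted(wfg):
        cycle = visit(t)
        if cycle:
            return cycle
    return None

s1_wfg = {"T2": {"T1"}}   # at S1, T2 waits for T1 (which holds A)
s2_wfg = {"T1": {"T2"}}   # at S2, T1 waits for T2 (which holds B)
global_cycle = find_cycle(merge_wfgs(s1_wfg, s2_wfg))
```

Calling find_cycle on s1_wfg or s2_wfg alone finds nothing; only the merged graph exposes the deadlock.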

9 Copyright © Ellis Cohen, 2002-2005 DDBMS Deadlock Detection Timeout-based Deadlock Detection (Oracle) Subordinate detects local deadlocks via local WFG Use timeouts to detect global deadlocks Centralized Deadlock Detection Each subordinate sends local WFG to central site regularly which informs coordinator of deadlock Can also do this hierarchically Phantom Deadlock Problem Suppose central site detects deadlock between T1 and T2, and chooses to tell T1's coordinator to abort In the meantime, T2 is aborted for some other reason (e.g. T2's coordinator crashes) How could phantom deadlocks be avoided?

10 Copyright © Ellis Cohen, 2002-2005 Distributed Deadlock Detection Path Pushing Algorithm When coordinator makes a subquery for transaction T, pass along sites at which T has already acquired locks If subquery causes wait, and deadlock can't be detected locally, send (own & propagated) knowledge about path to sites at which T has acquired locks, as well as other [higher numbered] waiting sites you know about

12 Copyright © Ellis Cohen, 2002-2005 Distributed System Failures Site failures Site crashes or is unable to respond to messages Link failures Messages may be undeliverable, lost, or garbled, so understandable response is not received Link failures can cause network partition; some sites become unreachable from other sites Failure detection Usually via timeouts (time it takes for remote site to respond to message exceeds threshold) If failure is suspected (a message timed out), a ping message can be sent to site; if ping response is received, timeout period can be extended (but not indefinitely)

13 Copyright © Ellis Cohen, 2002-2005 Distributed Algorithms Because of failures, distributed algorithms are complicated. In designing distributed algorithms, we need to work out The messages that need to go back and forth between nodes, and how a node responds to each message, to accomplish the algorithm How to handle timeouts: what to do when a node expects a message, but doesn't receive it in a reasonable time How to handle recovery: what a node does on recovering, if it crashed while it was participating in the distributed algorithm

14 Copyright © Ellis Cohen, 2002-2005 Aborting Distributed Transactions To explore distributed algorithms, we'll consider distributed abort: How a coordinator gets all the subordinates to abort a transaction. [Diagram: the Coordinator sends ABORT to each Subordinate; each Subordinate replies with ABORT-ACK.] First, what could make a coordinator start an ABORT?

15 Copyright © Ellis Cohen, 2002-2005 Causes of Distributed Abort Subordinate Raise error in executing a sub-query Crashes (or appears to) Coordinator Raise error in executing local sub-query Crashes (or appears to) Told to ROLLBACK (by application) Told to ABORT (e.g. deadlock detection)

16 Copyright © Ellis Cohen, 2002-2005 Standard Abort Protocol COORDINATOR (Abort) (when it decides / is told to abort) Force Abort to log (with list of subordinates) Send ABORT to each Subordinate Aborts main transaction SUBORDINATE (Abort) (when it receives an ABORT message) Force Abort to Log (unless already aborted) Send ABORT-ACK to coordinator Abort own subtransaction (unless already aborted) COORDINATOR (AbortComplete) (when it receives all ABORT-ACK back) Write AllAbortsDone to log Suppose it doesn't receive all ACKS back? Is ABORT-ACK even necessary?

17 Copyright © Ellis Cohen, 2002-2005 Timeouts SUBORDINATE (Waiting) Subordinates at any time can send an INQUIRE message to the coordinator. If response is –ACTIVE → wait some more –ABORT → Do standard Abort action –none → decide whether to abort or to wait some more COORDINATOR (waiting for ABORT-ACK) Regularly keep sending ABORT & wait for ABORT-ACK
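The coordinator's side of the abort protocol, including the resend-on-timeout rule, can be sketched as a small in-memory simulation. Everything here is a simplification under stated assumptions: message loss is modeled by a hypothetical drop_first counter, a "forced" log write is just a list append, and max_rounds stands in for however long a real coordinator keeps retrying.

```python
class AbortSubordinate:
    def __init__(self, name, drop_first=0):
        self.name = name
        self.log = []
        self.aborted = False
        self._drop = drop_first       # number of ABORT messages lost before one arrives

    def receive_abort(self):
        if self._drop > 0:
            self._drop -= 1
            return None               # message lost: the coordinator sees a timeout
        if not self.aborted:
            self.log.append("Abort")  # force Abort to log (unless already aborted)
            self.aborted = True       # abort own subtransaction
        return "ABORT-ACK"

def coordinator_abort(subs, max_rounds=5):
    # Force Abort to the log together with the list of subordinates.
    log = [("Abort", [s.name for s in subs])]
    pending = list(subs)
    for _ in range(max_rounds):
        # Resend ABORT to every subordinate that has not acknowledged yet.
        pending = [s for s in pending if s.receive_abort() != "ABORT-ACK"]
        if not pending:
            log.append("AllAbortsDone")
            break
    return log

subs = [AbortSubordinate("S1"), AbortSubordinate("S2", drop_first=2)]
coordinator_log = coordinator_abort(subs)
```

S2 only receives the third ABORT, so the coordinator writes AllAbortsDone after three rounds. Because a subordinate logs Abort only once, resent ABORT messages are harmless.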

18 Copyright © Ellis Cohen, 2002-2005 Recovery COORDINATOR (on discovering Abort T in log, without corresponding AllAbortsDone) Send ABORTs to all subordinates (in Abort entry) (on discovering Start T in log, without corresponding Commit or Abort) Subordinates are unknown: Answer INQUIREs. SUBORDINATE (on discovering Abort T in Log) Send ABORT-ACK to coordinator (on discovering Start T in Log, but no corresponding Commit or Abort) Send ABORT to coordinator (directs coordinator to abort transaction) Force ABORT to log Abort own subtransaction Are ABORT-ACK & AllAbortsDone necessary?

19 Copyright © Ellis Cohen, 2002-2005 ABORT-ACK & AllAbortsDone The ABORT-ACK message and the AllAbortsDone log entry are not completely necessary. That's because subordinates can abort on their own (for any reason, but especially) if they don't hear from the coordinator. ACKs and completion log entries are much more crucial when we talk about commit

21 Copyright © Ellis Cohen, 2002-2005 Atomic Commit Protocols Distributed Atomic Commit Protocols ensure atomicity & durability in distributed environments –A transaction which executes at multiple sites must either be committed at all sites or aborted at all sites –Not acceptable to have a transaction committed at one site and aborted at another 2 Phase Commit (2PC) Industry Standard Protocol 3 Phase Commit (3PC) Extension of 2PC which reduces blocking when coordinator failures occur during the protocol

22 Copyright © Ellis Cohen, 2002-2005 2PC Motivation Suppose a transaction coordinator, with subordinates S1 and S2, is ready to commit (in particular, all subqueries have finished successfully) Coordinator sends COMMIT messages for the transaction to S1 and S2. S1 commits its local subtransaction. S2 crashes just before receiving the COMMIT message (and before writing any local subtransaction state to stable storage) -- i.e. S2 aborts. Problem Need a way to ensure that once the coordinator has decided to commit & has started to send COMMIT messages, a subordinate crash does not cause that subtransaction to abort

24 Copyright © Ellis Cohen, 2002-2005 2PC Approach PREPARE Phase: Coordinator sends PREPARE message to each subordinate Each subordinate prepares to commit by ensuring that the sub-transaction can be made locally durable (e.g. by forcing out log entries, including the Prepare log entry) Once the subordinate has prepared it can commit even after it crashes, and it is not allowed to abort unless it knows the coordinator aborted the transaction COMMIT Phase: Coordinator sends COMMIT only after all subordinates are prepared. The transaction is unalterably committed when the Commit entry is forced to the coordinator's log (because if it crashes, it can complete the commit on recovery)

25 Copyright © Ellis Cohen, 2002-2005 Prepare Phase COORDINATOR (Prepare) (when it decides / is told to commit) Force out log (with Prepare entry containing list of subordinates) Send PREPARE to each Subordinate (with list of subordinates) SUBORDINATE (Prepare) (when it receives a PREPARE message) Decides whether it can commit (NO only if it is already aborting or it uses optimistic concurrency and local validation fails) NO → Force Abort to Log (unless already aborted); Send NO to coordinator; Abort own subtransaction (unless already aborted) YES → Force out Log with Prepare entry; Send YES to coordinator

26 Copyright © Ellis Cohen, 2002-2005 Period of Uncertainty Once a subordinate answers YES to PREPARE The subordinate cannot unilaterally decide whether to commit or abort The subtransaction enters a period of uncertainty, not knowing whether the main transaction will ultimately commit or abort The subordinate must wait until the coordinator tells it which to do

27 Copyright © Ellis Cohen, 2002-2005 Coordinator Commit Phase The coordinator waits for all subordinates to respond If any subordinate responds NO, or does not respond within the timeout period (possibly after sending PREPARE again), the coordinator –Forces Abort to the log –Sends ABORT to each subordinate that did not respond with a NO –Aborts the main transaction If all subordinates respond YES within the timeout period, the coordinator –Forces Commit to the log This is the moment at which the transaction is durably committed –Sends COMMIT to each subordinate –Commits the main transaction
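The coordinator's decision rule can be sketched as a pure function of the votes it collected. The function name two_phase_commit and the encoding of a timeout as None are assumptions for this example; log forcing is modeled as appending to a list.

```python
def two_phase_commit(votes):
    """votes maps each subordinate to 'YES', 'NO', or None (no response before timeout)."""
    log = ["Prepare"]
    if all(v == "YES" for v in votes.values()):
        log.append("Commit")          # commit point: the decision is now durable
        decision = "COMMIT"
        messages = {s: "COMMIT" for s in votes}
    else:
        log.append("Abort")
        decision = "ABORT"
        # Subordinates that voted NO have already aborted, so they need no ABORT.
        messages = {s: "ABORT" for s, v in votes.items() if v != "NO"}
    return decision, log, messages

all_yes = two_phase_commit({"S1": "YES", "S2": "YES"})
one_no = two_phase_commit({"S1": "YES", "S2": "NO", "S3": None})
```

In the second call S2 voted NO and S3 timed out, so the coordinator aborts and sends ABORT only to S1 and S3.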

28 Copyright © Ellis Cohen, 2002-2005 Subordinate Commit Phase SUBORDINATE (receiving ABORT) –Force Abort to log –Abort own subtransaction SUBORDINATE (receiving COMMIT) –Force Commit to log –Send COMMIT-ACK back to Coordinator COORDINATOR (receiving all COMMIT-ACKs) –Writes CommitComplete to Log –If it times out waiting for a COMMIT-ACK from a subordinate, it will keep sending COMMITs

29 Copyright © Ellis Cohen, 2002-2005 Subordinate Timeouts SUBORDINATE (waiting for Prepare/Abort) Send an INQUIRE message to the coordinator. If response is –ACTIVE → wait some more –ABORT → Do standard Abort action –PREPARING → Do standard Prepare action –none → decide whether to abort or to wait some more SUBORDINATE (after Prepare) Send an INQUIRE message to the coordinator. If response is –PREPARING → continue to wait –ABORT → Do standard Abort action –COMMIT → Do standard Commit action –none → Cannot make a unilateral decision! Must either wait or find out the transaction disposition in some other way (e.g. by using a Termination Protocol)

30 Copyright © Ellis Cohen, 2002-2005 Recovery COORDINATOR (on discovering Commit T in log, without corresponding CommitComplete) Send COMMIT to all subordinates. (on discovering Prepare T in log, without corresponding Commit or Abort) Send ABORTs to all subordinates SUBORDINATE (on discovering Commit T in Log) Send COMMIT-ACK to coordinator (on discovering Prepare T in Log) Send YES to coordinator (on discovering Start T in Log, but no corresponding Commit or Abort) Send ABORT to coordinator (directs coordinator to abort transaction) Force ABORT to log Abort own subtransaction

31 Copyright © Ellis Cohen, 2002-2005 Termination Protocol Motivation A subordinate can get stuck in a period of uncertainty if –The subordinate has already prepared –Either (a) the coordinator crashed or (b) the coordinator & subordinate became disconnected before the coordinator could send ABORT or COMMIT to the subordinate. However, –Maybe the coordinator did get an ABORT or COMMIT message off to another subordinate. –The subordinate might be able to proceed if it could check with the other subordinates!

32 Copyright © Ellis Cohen, 2002-2005 Termination Protocol Along with PREPARE message, each subordinate gets a list of other subordinates If coordinator does not respond to INQUIRE, it sends INQUIRE to (some or all of) the other subordinates. Other subordinates respond –COMMIT - if it received COMMIT from coordinator –ABORT - if it aborted -- e.g. it received ABORT from coordinator, or it responded NO to PREPARE, or didn't receive PREPARE, and chooses to abort –UNCERTAIN - otherwise Subordinate commits or aborts if COMMIT or ABORT is received from any other subordinate, else it remains uncertain (occasionally keep trying INQUIREs to coordinator & other subordinates) Blocking problem: If all responses are UNCERTAIN or time out, a subordinate may have to wait for coordinator recovery or network repair
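The rule an uncertain subordinate applies to the answers it collects from its peers can be sketched as follows. The function name resolve_uncertain and the use of None for a timed-out INQUIRE are assumptions made for the example.

```python
def resolve_uncertain(peer_responses):
    """peer_responses: iterable of 'COMMIT', 'ABORT', 'UNCERTAIN', or None (timed out)."""
    responses = list(peer_responses)
    # Any definite answer from a peer settles the outcome. A correct 2PC run
    # never produces both COMMIT and ABORT for the same transaction.
    if "COMMIT" in responses:
        return "COMMIT"
    if "ABORT" in responses:
        return "ABORT"
    # Blocking case: keep retrying INQUIREs to the coordinator and the peers.
    return "UNCERTAIN"

outcomes = [
    resolve_uncertain(["UNCERTAIN", "COMMIT"]),
    resolve_uncertain([None, "ABORT", "UNCERTAIN"]),
    resolve_uncertain(["UNCERTAIN", None]),
]
```

The last case is the blocking problem: with only UNCERTAIN answers and timeouts, the subordinate cannot decide.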

33 Copyright © Ellis Cohen, 2002-2005 3PC Motivation If a subordinate is uncertain, and every subordinate it can communicate with is uncertain, they ALL MUST WAIT. With 3PC, if the group of communicating subordinates is a [weighted] majority of the participants, they can always proceed!

34 Copyright © Ellis Cohen, 2002-2005 3PC Extends 2PC to 3 phases: PREPARE, PRECOMMIT, COMMIT A subordinate is uncertain after sending YES and before getting back PRECOMMIT A [weighted] minority partition of subordinates must wait for network repair. A [weighted] majority partition of the subordinates Aborts if all are uncertain Else if at least one has received PRECOMMIT, uses an election protocol to elect a new coordinator if necessary (e.g. the one with the highest IP address), who then continues with the protocol A coordinator (original or elected) sends COMMIT when it gets PRECOMMIT-ACKS from a [weighted] majority of the subordinates
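The partition rule can be sketched as a function over the subordinates that can still reach one another. This is a simplified reading of the rule, not a full 3PC implementation: it assumes every reachable subordinate is either UNCERTAIN or PRECOMMITTED, that total_weight is the combined weight of all participants, and that exactly half the weight does not count as a majority. The names are hypothetical.

```python
def partition_action(states, weights, total_weight):
    """states maps each reachable subordinate to 'UNCERTAIN' or 'PRECOMMITTED'."""
    reachable_weight = sum(weights[s] for s in states)
    if 2 * reachable_weight <= total_weight:
        return "WAIT"                  # minority partition: wait for network repair
    if all(st == "UNCERTAIN" for st in states.values()):
        return "ABORT"                 # majority, and nobody has seen PRECOMMIT
    return "ELECT_AND_CONTINUE"        # majority, and at least one saw PRECOMMIT

weights = {"S1": 1, "S2": 1, "S3": 1}
actions = [
    partition_action({"S1": "UNCERTAIN"}, weights, 3),
    partition_action({"S1": "UNCERTAIN", "S2": "UNCERTAIN"}, weights, 3),
    partition_action({"S1": "UNCERTAIN", "S2": "PRECOMMITTED"}, weights, 3),
]
```

With three equally weighted subordinates, a single isolated site must wait, while any two sites that can talk to each other can decide.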

36 Copyright © Ellis Cohen, 2002-2005 Optimistic Concurrency Control Assumes (optimistically) that a transaction will not have conflicts with other transactions, avoiding the overhead of locks. Cache-Based: Reads all possible data from and writes all data to its client cache. Validation-Based: When the transaction commits, writes all changes back to the DB server, but only after validating that the data it used during the transaction is still up-to-date.

37 Copyright © Ellis Cohen, 2002-2005 Distributed Validation Consider a distributed DB which uses server-managed client caches. [Diagram: client S; server A holds TblA and A's cache for S; server B holds TblB and B's cache for S.] When S commits, A & B will both receive PREPARE messages. They will each locally do validation for their respective subtransactions, and only respond YES if validation succeeds. Note: With a client-side cache, S would need, as part of PREPARE, to pass back to A & B the timestamps of the data items read from A and B respectively. How can this be supported if cross-DB query processing (e.g. joins) is done at other nodes, and only the final results are passed back to S?

38 Copyright © Ellis Cohen, 2002-2005 Distributed Ordering Problem What if S and T want to commit at the same time, A receives S's PREPARE message first, and B receives T's PREPARE message first? → The result can be non-serializable. [Diagram: clients S and T; server A holds TblA with A's cache for S and A's cache for T; server B holds TblB with B's cache for S and B's cache for T.]

39 Copyright © Ellis Cohen, 2002-2005 Non-Serializable Result S: 1) UPDATE AT SET a2 = a2 + 100 2) UPDATE BT SET b2 = b1 3) COMMIT T: 1) UPDATE BT SET b2 = b2 + 100 2) UPDATE AT SET a2 = a1 3) COMMIT Assume a1=1 a2=2 b1=3 b2=4 There are two possible serial schedules: S T → a2=1 b2=103; T S → a2=101 b2=3 But suppose S & T execute in parallel, and send PREPAREs to A and B in parallel If A gets PREPAREs & validates T after S, no R/W conflicts and both validations succeed → a2=1 If B gets PREPAREs & validates S after T, no R/W conflicts and both validations succeed → b2=3 When using Distributed Optimistic Concurrency Control subordinates cannot independently order commits!
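The arithmetic in this example can be checked with a short simulation. It is a sketch under simplifying assumptions: each transaction computes its writes from the initial values (its cache), and each site simply applies the writes in the order in which it validated the transactions. The function names are hypothetical.

```python
initial = {"a1": 1, "a2": 2, "b1": 3, "b2": 4}

def run_serial(order):
    """Run S and T one after the other in the given order, e.g. 'ST' or 'TS'."""
    db = dict(initial)
    for txn in order:
        if txn == "S":
            db["a2"] = db["a2"] + 100
            db["b2"] = db["b1"]
        else:  # T
            db["b2"] = db["b2"] + 100
            db["a2"] = db["a1"]
    return db["a2"], db["b2"]

def run_parallel():
    """S and T both work from cached initial values; A and B order commits differently."""
    s_writes = {"a2": initial["a2"] + 100, "b2": initial["b1"]}
    t_writes = {"b2": initial["b2"] + 100, "a2": initial["a1"]}
    site_a = {"a2": initial["a2"]}
    site_b = {"b2": initial["b2"]}
    for writes in (s_writes, t_writes):   # site A validates S, then T
        site_a["a2"] = writes["a2"]
    for writes in (t_writes, s_writes):   # site B validates T, then S
        site_b["b2"] = writes["b2"]
    return site_a["a2"], site_b["b2"]

serial_st = run_serial("ST")
serial_ts = run_serial("TS")
parallel = run_parallel()
```

The parallel outcome (a2=1, b2=3) matches neither serial schedule, which is exactly the non-serializable result described above.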

40 Copyright © Ellis Cohen, 2002-2005 Timestamped Cache Checking Suppose all sites have access to the same global clock, and when S and T want to commit, they pass the current global time as part of their PREPARE messages (the PrepareTime). [Diagram: clients S and T; server A holds TblA, A's cache for S, and A's cache for T.] Suppose T sends PREPARE to A after S does, and suppose A receives them in the same order. When A receives T's PREPARE, its PrepareTime is larger than every PrepareTime A has already received, including S's. A can do Timestamped Cache Checking: For every local data item that T read and that is in A's cache for T, check whether the cached version is the latest one (compare its read timestamp in the cache to the local DB's timestamp for it)
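The per-item comparison can be sketched as a single check over the cache. The data layout is an assumption: the cache maps each item T read to the timestamp at which it was read, the local DB maps each item to the timestamp of its latest write, and a cached item counts as current when no write is newer than the read.

```python
def cache_is_current(cache_for_txn, db_write_timestamps):
    """Return True if every cached item is still the latest version in the local DB."""
    return all(
        db_write_timestamps[item] <= read_ts
        for item, read_ts in cache_for_txn.items()
    )

db_write_timestamps = {"a1": 5, "a2": 12}     # hypothetical latest write times at A
fresh_cache = {"a1": 8}                       # T read a1 at time 8, after its last write
stale_cache = {"a1": 8, "a2": 10}             # a2 was rewritten at 12, after T read it

fresh_ok = cache_is_current(fresh_cache, db_write_timestamps)
stale_ok = cache_is_current(stale_cache, db_write_timestamps)
```

A would answer YES to T's PREPARE only in the first case.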

41 Copyright © Ellis Cohen, 2002-2005 Out of Order Prepares [Diagram: clients S and T; server A holds TblA, A's cache for S, and A's cache for T.] Suppose T sends PREPARE to A after S does, but A receives them in the opposite order. A receives PREPARE for T first, validates it, and responds YES, and then receives PREPARE for S, with an earlier PrepareTime. Problem: If S wrote something that T read, T read the wrong version of it; T should have read the version that S wrote. Too late to fail validation for T, but we can fail validation for S. Problem: If T wrote (and committed) something that S read, S read the wrong version of it. S should have read the data before T persisted it! Also fail validation for S in this case. T already committed, but S should have committed first. These checks must be done in addition to Timestamped Cache Checking
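The two extra checks for a late-arriving PREPARE can be sketched as a validator that remembers the read and write sets of transactions it has already validated. The Prepare record, the Validator class, and the use of integer PrepareTimes are assumptions for this example, and Timestamped Cache Checking itself is deliberately left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prepare:
    txn: str
    prepare_time: int
    reads: frozenset
    writes: frozenset

class Validator:
    def __init__(self):
        self.validated = []   # PREPAREs already answered with YES

    def validate(self, p):
        for q in self.validated:
            # q was validated first although its PrepareTime is later than p's.
            if q.prepare_time > p.prepare_time:
                # p wrote something q read, or q wrote something p read: fail p.
                if (p.writes & q.reads) or (q.writes & p.reads):
                    return "NO"
        self.validated.append(p)
        return "YES"

site_a = Validator()
t_vote = site_a.validate(Prepare("T", 20, frozenset({"x"}), frozenset({"y"})))
s_vote = site_a.validate(Prepare("S", 10, frozenset(), frozenset({"x"})))
u_vote = site_a.validate(Prepare("U", 15, frozenset({"z"}), frozenset({"w"})))
```

T arrives first and is validated. S carries an earlier PrepareTime and wrote x, which T read, so S fails validation. U is also out of order but touches unrelated items, so it passes.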

42 Copyright © Ellis Cohen, 2002-2005 Loosely Synchronized Clocks In fact, distributed systems generally do not all have access to a global time. Instead, they use a Distributed Time Service, which sends time messages between sites, and ensures that all clocks stay reasonably close to one another. Increasing clock skew Will, at worst, cause the algorithm described to fail more validations unnecessarily (since more PREPAREs will appear to be received out of order), but Will not cause validation to incorrectly succeed. Are out of order PREPAREs a problem for Timestamp-Based or Read-Consistent concurrency control?

43 Copyright © Ellis Cohen, 2002-2005 Timestamp-Based Concurrency Ordering does not affect the Timestamp-Based Concurrency Control Algorithm Ordering already taken into account Data items are marked with read times as well as their write times. Timestamp-based checks effectively already do the appropriate validation based on order. Increasing clock skew Simply causes more timestamp-based checks to fail
