CSC 8320 Advanced Operating Systems Xueting Liao

Distributed Commit

Basic Transaction Operations
A transaction is a sequence of operations that is treated as a single unit. It is built from a few primitives:
Begin transaction: mark the start of a transaction
End transaction: mark the end of a transaction; no more tasks follow
Commit transaction: make the results permanent
Abort transaction: kill the transaction and restore the old values
Read/write/compute data (modify files or objects)
– The data must be restored if the transaction is aborted

Example
Book a flight from Atlanta, GA to Guangzhou, China. No non-stop flights are available:
Transaction begin
1. Reserve a seat from Atlanta to Detroit (ATL -> DTW)
2. Reserve a seat from Detroit to Shanghai (DTW -> PVG)
3. Reserve a seat from Shanghai to Guangzhou (PVG -> CAN)
Transaction end
If no seats are available on the PVG -> CAN leg of the journey, the transaction is aborted and reservations 1 and 2 are undone.
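The booking example can be sketched in a few lines of Python. This is only an illustration under assumptions: the `seats` dictionary, the `book_itinerary` function, and the `SeatUnavailable` exception are hypothetical names, and a real reservation system would keep its undo information in a persistent log rather than in a local list.

```python
class SeatUnavailable(Exception):
    """Raised when a leg has no seats left (hypothetical error type)."""


def book_itinerary(seats, legs):
    """Reserve one seat on every leg, or none at all.

    `seats` maps a leg name to its remaining seat count (assumed data model).
    """
    reserved = []  # undo list: legs reserved so far in this transaction
    try:
        for leg in legs:
            if seats[leg] == 0:
                raise SeatUnavailable(leg)
            seats[leg] -= 1
            reserved.append(leg)
    except SeatUnavailable:
        # Abort: restore the old values, newest reservation first.
        for leg in reversed(reserved):
            seats[leg] += 1
        return False
    return True  # Commit: the reservations stay in place.


seats = {"ATL->DTW": 2, "DTW->PVG": 1, "PVG->CAN": 0}
ok = book_itinerary(seats, ["ATL->DTW", "DTW->PVG", "PVG->CAN"])
```

Because the last leg is sold out, `ok` is `False` and the seat counts for the first two legs are back at their original values.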

Distributed Commit Problem
Some applications perform operations on multiple databases
– Example: transferring funds between two bank accounts means debiting one account and crediting another
We would like a guarantee that either all the databases get updated or none does
Distributed commit problem (all-or-none semantics):
– An operation is committed only when all participants can perform it
– Once a commit decision is reached, it holds even if some participants fail and later recover

Properties of transactions
Atomicity: all-or-none; if the transaction fails, no changes are applied to the database
Consistency: the database integrity constraints are never violated
Isolation: the partial results of incomplete transactions are hidden
Durability: the effects of committed transactions are permanent
In more detail:
Atomic – The transaction happens as a single indivisible action. Everything succeeds or else the entire transaction is rolled back. Others do not see intermediate results.
Consistent – A transaction cannot leave the database in an inconsistent state. E.g., the total amount of money in all accounts must be the same before and after a “transfer funds” transaction.
Isolated (Serializable) – Transactions cannot interfere with each other or see intermediate results. If transactions run at the same time, the final result must be the same as if they had executed in some serial order.
Durable – Once a transaction commits, the results are made permanent. Failures after a commit do not cause the results to revert.

Distributed Transactions
A distributed transaction updates data on two or more systems
Each computer runs a transaction manager, which
– Is responsible for subtransactions on that system
– Communicates with other transaction managers
– Performs prepare, commit, and abort calls for subtransactions
Every subtransaction must agree to commit changes before the overall transaction can complete

One-Phase Commit
A coordinator tells all other processes (participants) whether or not to perform the operation in question
Problem: if a participant cannot perform the operation, it has no way to tell the coordinator
This violates the all-or-none rule!

Two-Phase Commit (2PC) Overview
Assumes a coordinator that initiates the commit/abort
Each participant votes on whether it is ready to commit
Until the commit actually occurs, the update is considered temporary (placed in a temp area)
– The participant is permitted to discard a pending update
Until it has voted “ok”, a participant can still abort
The coordinator decides the outcome and informs all participants

Two-Phase Commit
Operates in rounds
The coordinator assigns a unique identifier to each protocol run
– How? With logical clocks: the run identifier can combine the process ID and the value of the logical clock
Messages carry the identifier of the protocol run they are part of
Since many messages must be stored, garbage collection must be performed; the challenge is to determine when it is safe to remove the information
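One possible way to build such a run identifier is sketched below, assuming a Lamport-style logical clock. The names `RunId`, `LamportClock`, and `RunIdSource` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class RunId:
    clock: int       # logical clock value when the run was started
    process_id: int  # distinguishes runs started by different coordinators


class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def observe(self, remote_time):
        """Merge a timestamp carried by an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


class RunIdSource:
    """Hands out identifiers for the protocol runs of one coordinator."""

    def __init__(self, process_id):
        self.process_id = process_id
        self.clock = LamportClock()

    def new_run(self):
        return RunId(self.clock.tick(), self.process_id)
```

Two runs started by the same coordinator differ in the clock value, and runs started by different coordinators differ in the process ID, so every identifier is unique as long as process IDs are.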

Participant States
Initial state: pi is not aware that the protocol has started; the state ends when pi receives ready_to_commit and is ready to send its Ok
Prepared to commit: pi has sent its Ok, saved the update in the temp area, and waits for the final decision (Commit or Abort) from the coordinator
Commit or abort state: pi knows the final decision and must execute it
When a participant enters the prepared state, it contacts the coordinator to start the commit protocol to commit the entire transaction

2PC State Transition
[Figure: (a) The finite state machine for the coordinator in 2PC. (b) The finite state machine for a participant.]
A timeout mechanism is used for both the coordinator and the participants: the coordinator can block in WAIT, and a participant can block in INIT (or in READY while it waits for the decision)
The first phase is the voting phase and consists of steps 1 and 2. The second phase is the decision phase and consists of steps 3 and 4.
Several problems arise when this basic 2PC protocol is used in a system where failures occur. First, note that the coordinator as well as the participants have states in which they block waiting for incoming messages. Consequently, the protocol can easily fail when a process crashes, because other processes may wait indefinitely for a message from that process.
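Since the figure itself is not reproduced in this transcript, the two state machines can be approximated as transition tables. The sketch below follows the usual textbook formulation of 2PC; the event names (`commit_requested`, `vote_request_ok`, and so on) and the `step` helper are assumptions made for this illustration, not taken from the slides.

```python
# (state, event) -> (next state, message to send)
COORDINATOR = {
    ("INIT", "commit_requested"): ("WAIT", "vote-request"),
    ("WAIT", "all_vote_commit"): ("COMMIT", "global-commit"),
    ("WAIT", "any_vote_abort"): ("ABORT", "global-abort"),
    ("WAIT", "timeout"): ("ABORT", "global-abort"),
}

PARTICIPANT = {
    ("INIT", "vote_request_ok"): ("READY", "vote-commit"),
    ("INIT", "vote_request_refused"): ("ABORT", "vote-abort"),
    ("INIT", "timeout"): ("ABORT", "vote-abort"),
    ("READY", "global-commit"): ("COMMIT", "ack"),
    ("READY", "global-abort"): ("ABORT", "ack"),
    # No ("READY", "timeout") entry: a participant that has voted to
    # commit cannot decide on its own and stays blocked.
}


def step(table, state, event):
    """Return (next_state, message); stay put if no transition applies."""
    return table.get((state, event), (state, None))
```

The missing timeout transition out of READY is the blocking problem in miniature: `step(PARTICIPANT, "READY", "timeout")` leaves the participant in READY with nothing to send.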

Problems
2PC assumes any failed system will eventually recover
If a node agreed to commit and then crashed, it must be willing and able to commit on recovery
Each system uses a transaction log. The write-ahead log is used to
– Keep track of where the system is in the protocol (and what it agreed to)
– Store the values needed to commit or abort (roll back)
This enables fail-recover behavior
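A write-ahead log of this kind can be sketched as an append-only file that is flushed to disk before the node acts on a record. The class name `WriteAheadLog`, the JSON-lines layout, and the record strings are assumptions made for this example; production logs also deal with torn writes and checksums, which are left out here.

```python
import json
import os


class WriteAheadLog:
    """Append-only log of protocol records, one JSON object per line."""

    def __init__(self, path):
        self.path = path

    def append(self, run_id, record):
        """Persist a record before the node acts on it."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"run": run_id, "record": record}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the entry to stable storage

    def last_records(self):
        """Replay the log and return the latest record of each protocol run."""
        state = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    state[entry["run"]] = entry["record"]
        return state
```

After a crash, a run whose latest record is a vote rather than a decision is still pending, so the recovering node knows it must ask for the outcome instead of deciding alone.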

2PC: Overcoming Coordinator Failures
Multicast: vote-request
Collect replies/votes
– All vote-commit => log ‘commit’ to the ‘outcomes’ table and send commit
– Else => log ‘abort’ and send abort
Collect acknowledgments
Garbage-collect protocol outcome information

2PC: Overcoming Participant Failures
vote-request => log its vote and send vote-(commit/abort)
commit => make changes permanent, send acknowledgment
abort => delete temp area
After failure:
– For each pending protocol: contact the coordinator (or other participants) to learn the outcome
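The coordinator and participant rules above can be put together in a small in-memory simulation. This is a sketch under assumptions: direct method calls stand in for messages, Python lists and dictionaries stand in for persistent logs, and the names `Participant`, `two_phase_commit`, and `outcomes` are hypothetical. Failures and timeouts are not modeled.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []     # stands in for the persistent log
        self.temp = None  # pending update (temp area)
        self.data = None  # permanent state

    def on_vote_request(self, update):
        if self.can_commit:
            self.temp = update
            self.log.append("vote-commit")
            return "vote-commit"
        self.log.append("vote-abort")
        return "vote-abort"

    def on_decision(self, decision):
        if decision == "commit":
            self.data = self.temp  # make the change permanent
        self.temp = None           # the temp area is cleared either way
        self.log.append(decision)
        return "ack"


def two_phase_commit(run_id, participants, update, outcomes):
    # Phase 1: multicast vote-request and collect the votes.
    votes = [p.on_vote_request(update) for p in participants]
    decision = "commit" if all(v == "vote-commit" for v in votes) else "abort"
    outcomes[run_id] = decision  # log the outcome before announcing it
    # Phase 2: send the decision and collect acknowledgments.
    acks = [p.on_decision(decision) for p in participants]
    if all(a == "ack" for a in acks):
        del outcomes[run_id]     # everyone knows: safe to garbage-collect
    return decision
```

With all participants able to commit the function returns `"commit"`; a single participant created with `can_commit=False` turns the outcome into `"abort"` for everyone.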

What if coordinator fails?
If the coordinator crashed during the first phase (WAIT):
– some participants will be ready to commit
– others will not be able to (they voted to abort)
– other processes may not know what the state is
If the coordinator crashed during its decision or before it finished sending it out:
– some processes will be in the READY state
– some others will know the outcome

2PC: Improvement for Overcoming Coordinator Failures
Multicast: vote-request
Collect replies/votes
– All vote-commit => log ‘commit’ to the ‘outcomes’ table, wait until it is safe on persistent storage, and send commit
– Else => log ‘abort’ and send abort
Collect acknowledgments
Garbage-collect protocol outcome information
After failure:
For each pending protocol in the outcomes table
– Possibly re-transmit VOTE_REQUEST if in WAIT
– Send outcome (commit or abort)
– Wait for acknowledgments
– Garbage-collect outcome information
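The "after failure" steps for the coordinator might look roughly like the following. The layout of the `outcomes` table and the `send` callback are assumptions made for this sketch; `send` is expected to return `True` once the participant has acknowledged the message.

```python
def recover_coordinator(outcomes, send):
    """Resume every protocol run still listed in the outcomes table.

    outcomes: run_id -> {"state": "WAIT" | "commit" | "abort",
                         "participants": [...]}   (hypothetical layout)
    send(participant, run_id, message) -> True if acknowledged.
    """
    for run_id in list(outcomes):
        entry = outcomes[run_id]
        if entry["state"] == "WAIT":
            # No decision was logged before the crash: ask for the votes again.
            for p in entry["participants"]:
                send(p, run_id, "vote-request")
            continue
        # A decision was logged: repeat it to every participant.
        acked = [send(p, run_id, entry["state"]) for p in entry["participants"]]
        if all(acked):
            del outcomes[run_id]  # all acknowledged: safe to garbage-collect
```

An entry is removed only when every participant has acknowledged the outcome, which is the point at which the information is safe to discard; runs with unreachable participants stay in the table for a later retry.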

2PC: Improvement for Overcoming Participant Failures
Participant: first time message received
– Vote-request: save to temp area and reply with its vote (commit/abort)
– Global_commit: make changes permanent
– Global_abort: delete temp area
Message is a duplicate (recovering coordinator)
– Send acknowledgment
After failure:
– For each pending protocol: contact the coordinator to learn the outcome
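Handling duplicates means the participant has to remember what it already did for each protocol run. A minimal sketch, assuming in-memory dictionaries in place of a persistent log and a hypothetical `IdempotentParticipant` class:

```python
class IdempotentParticipant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.temp = {}   # run_id -> pending update (temp area)
        self.data = {}   # run_id -> committed update
        self.votes = {}  # run_id -> vote already sent
        self.done = {}   # run_id -> decision already applied

    def handle(self, run_id, message, update=None):
        if message == "vote-request":
            if run_id not in self.votes:  # first time this request is seen
                if self.can_commit:
                    self.temp[run_id] = update
                    self.votes[run_id] = "vote-commit"
                else:
                    self.votes[run_id] = "vote-abort"
            return self.votes[run_id]     # a duplicate gets the same vote
        if message not in ("global-commit", "global-abort"):
            raise ValueError(f"unknown message: {message}")
        if run_id in self.done:
            return "ack"  # duplicate from a recovering coordinator
        if message == "global-commit":
            # Assumes this participant voted to commit, as 2PC guarantees.
            self.data[run_id] = self.temp.pop(run_id)
        else:
            self.temp.pop(run_id, None)
        self.done[run_id] = message
        return "ack"
```

Receiving `global-commit` twice applies the update once and acknowledges both times, so a coordinator that re-sends its outcome after recovery does no harm.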

Problems
The crash of the coordinator may block participants from reaching a final decision until it recovers
When a participant gets a commit/abort message, it does not know whether every other participant was informed of the result
In the real world, distributed systems have many servers, and both servers and links can fail. How can these problems be overcome? The three-phase commit protocol, introduced next, is one answer.

Three-Phase Commit Protocol
Same setup as the two-phase commit protocol:
– Coordinator & participants
Add timeouts to each phase that result in an abort
Propagate the result of the commit/abort vote to each participant before telling them to act on it
– This allows the state to be recovered if any participant dies
Introduces an additional round of communication (and the associated delay) with a prepare-to-commit state, to ensure that the state of the system can always be deduced by a subset of live participants that can communicate with each other
Before the commit, the coordinator tells all participants that everyone sent Ok (ready_commit)

3PC: Coordinator
Coordinator:
Multicast: vote-request
Collect votes/replies
– All commit => log ‘precommit’ and send precommit
– Else => log ‘abort’ and send abort
Collect acks from non-failed participants
– All ‘ready-commit’ => log ‘commit’ and send global-commit
Collect acknowledgments
Garbage-collect protocol outcome information
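The coordinator side of 3PC can be sketched as follows. Participants are modeled as plain callables that take a message and return a reply, with `None` standing for a failed or timed-out participant; `make_participant` and `three_phase_coordinator` are hypothetical names. The final fallback to abort is an assumption of this sketch, since the steps above do not say what happens when a live participant returns something other than ready-commit.

```python
def make_participant(vote="vote-commit", alive_after_vote=True):
    """Build a toy participant: a callable that replies to protocol messages."""
    seen = []

    def handle(message):
        seen.append(message)
        if message == "vote-request":
            return vote
        if message == "precommit":
            return "ready-commit" if alive_after_vote else None
        return "ack"

    handle.seen = seen  # exposed so the message trace can be inspected
    return handle


def three_phase_coordinator(participants, log):
    # Phase 1: voting. A missing vote (timeout) counts as an abort.
    votes = [p("vote-request") for p in participants]
    if not all(v == "vote-commit" for v in votes):
        log.append("abort")
        for p in participants:
            p("abort")
        return "abort"
    # Phase 2: tell everyone that all votes were positive.
    log.append("precommit")
    acks = [p("precommit") for p in participants]
    live_acks = [a for a in acks if a is not None]  # non-failed participants
    if live_acks and all(a == "ready-commit" for a in live_acks):
        # Phase 3: act on the decision.
        log.append("commit")
        for p in participants:
            p("global-commit")
        return "commit"
    log.append("abort")  # assumption: fall back to abort otherwise
    for p in participants:
        p("abort")
    return "abort"
```

In the successful case the log reads `["precommit", "commit"]`: the precommit record is what lets the surviving processes work out the outcome if the coordinator crashes between the last two phases.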

3PC: Participant
Participant: logs state on each message
Vote-request: save to temp area and reply vote-(commit/abort)
Precommit: enter precommit state, send ack (ready-commit)
Commit: make changes permanent
Abort: delete temp area
After failure:
Collect participant state information
– All precommit or any committed: push the commit forward
– Else: fall back to abort
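The recovery rule at the end of the list is small enough to write down directly. In this sketch, `states` holds the states reported by the participants that can currently be reached; the function name `terminate` and the state strings are assumptions made for the example.

```python
def terminate(states):
    """Decide the outcome from the states of the reachable participants."""
    if not states:
        raise ValueError("no participant state available")
    if "commit" in states or all(s == "precommit" for s in states):
        return "commit"  # push the commit forward
    return "abort"       # otherwise fall back to abort


decision_a = terminate(["precommit", "precommit"])
decision_b = terminate(["ready", "precommit"])
decision_c = terminate(["precommit", "commit"])
```

Here `decision_a` and `decision_c` are `"commit"`, while `decision_b` is `"abort"` because one participant has not yet reached the precommit state and nobody has committed.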

3PC Weakness
May have problems when the network gets partitioned
Handles a recovering coordinator poorly: when a crashed coordinator recovers, it needs to find out that another process took over and stay quiet. Otherwise it will disrupt the protocol, leading to an inconsistent state

Recent research
Calvin: fast distributed transactions for partitioned database systems
– Thomson, A., Diamond, T., Weng, S.C., Ren, K., Shao, P. and Abadi, D.J., 2012, May. Calvin: fast distributed transactions for partitioned database systems. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (pp. 1-12). ACM.
No compromises: distributed transactions with consistency, availability, and performance
– Dragojević, A., Narayanan, D., Nightingale, E.B., Renzelmann, M., Shamis, A., Badam, A. and Castro, M., 2015, October. No compromises: distributed transactions with consistency, availability, and performance. In Proceedings of the 25th Symposium on Operating Systems Principles (pp ). ACM.

Future work
Extending the capabilities of the database to large scale-out clusters
Instantaneous restart using persistent RAM technologies such as Phase Change Memory
Continued optimization for high-performance network technologies such as InfiniBand
Optimistic, lockless algorithms for maximal scaling on very large SMP systems
All these technologies will bring challenges to the commit protocols.

Thank you!
