Commitment and Mutual Exclusion CS 188 Distributed Systems February 18, 2015

Introduction: Many distributed systems require that participants agree on something, such as changes to important data, the status of a computation, or what to do next. Reaching agreement in a general distributed system is challenging.

Commitment Reaching agreement in a distributed system is extremely important Usually impossible to control a system’s behavior without agreement One approach to agreement is to get all participants to prepare to agree Then, once prepared, to take the action

Challenges to Commitment There are challenges to ensuring that commitment occurs Different nodes’ actions aren’t synchronous Communication only via messages Other actions can intervene Failures can occur

For Example, An optimistically replicated file system like Ficus We want to be able to add replicas of a volume Which is a lot easier to do if all nodes hosting existing replicas agree

The Scenario [diagram: nodes A, B, and C hold replicas of a volume with matching version vectors; node D says "I want a replica, too!", but we need a version vector element for the new replica]

So What’s the Problem? A and C don’t know about the new replica But they can learn about it as soon as they contact B So why is there any difficulty?

One Problem [diagram: two new replicas, D and E, are added independently and each then applies updates; the result is different updates with the same version vector]
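To make the failure mode concrete, here is a small illustrative sketch (the helper names are made up, not from the slides): two sites each add "the" new replica without agreeing on its identity, apply different updates, and end up with identical version vectors.

```python
# Illustrative sketch only: without agreement on the new replica's identity,
# two sites each append "the" new vector element and bump it, so two
# different updates end up with an identical version vector.

base = (5, 7, 3)                  # version vector shared by replicas A, B, C

def add_new_element(vector):
    return vector + (0,)          # both sites think they created the new slot

def update_at_new_replica(vector):
    return vector[:-1] + (vector[-1] + 1,)

at_d = update_at_new_replica(add_new_element(base))  # replica D's update
at_e = update_at_new_replica(add_new_element(base))  # replica E's different update

print(at_d)            # (5, 7, 3, 1)
print(at_e)            # (5, 7, 3, 1)
print(at_d == at_e)    # True: different updates, same version vector
```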

And It Can Be a Lot Worse What if replicas are being added and dropped frequently? How will we keep track of which ones are live and which ones are which? It can get very confusing

But That’s Not What I Want To Do, Anyway A common answer from system designers They don’t care about the odd corner cases They don’t expect them to happen So why pay a lot to handle them right? Sometimes a reasonable answer . . .

Why You Should Care If you allow a system to behave a certain way Even if you don’t think it ever will And your system is widely deployed and used Sooner or later that improbable thing will happen And who knows what happens next?

The Basic Solution Use a commitment protocol To ensure that all participating nodes understand what’s happening And agree to it Handles issues of concurrency and failures

Transactions: A mechanism to achieve commitment by ensuring atomicity (also consistency, isolation, and durability). Very important in the database community. A transaction is a set of asynchronous request/reply communications; either all of the set complete or none do.

Transactions and ACID Properties ACID - Atomicity, Consistency, Isolation, and Durability Atomicity - all happen or none Consistency - Outcome equivalent to some serial ordering of actions Isolation - Partial results are invisible outside the transaction Durability - Committed transactions survive crashes and other failures

Achieving the ACID Properties In distributed environment, use two-phase commit protocol A unanimous voting protocol Do something if all participants agree it should be done Essentially, hold on to results of a transaction until all participants agree

Basics of Two-Phase Commit: Run at the end of all application actions in a transaction. Must end in a commit or abort decision, must work despite delays and failures, and requires access to stable storage. Usually started by a coordinator, but the coordinator has no more power than any other participant.

The Two Phases Phase one: prepare to commit All participants are informed that they should get ready to commit All agree to do so Phase two: commitment Actually commit all effects of the transaction

Outline of Two-Phase Commit Protocol
1. Coordinator writes prepare to its local stable log
2. Coordinator sends a prepare message to all other participants
3. Each participant either prepares or aborts, writing its choice to its local log
4. Each participant sends its choice to the coordinator

The Two-Phase Commit Protocol, continued
5. The coordinator collects all votes
6. If all participants vote to commit, the coordinator writes commit to its log
7. If any participant votes to abort, the coordinator writes abort to its log
8. The coordinator sends its decision to all others

The Two-Phase Commit Protocol, concluded
9. If the other participants receive a commit message, they write commit to their logs and release transaction resources
10. If the other participants receive an abort message, they write abort to their logs and release transaction resources
11. They return an acknowledgement to the coordinator
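A happy-path sketch of the protocol outlined above, simulated in a single Python process; the lists stand in for stable logs and direct method calls stand in for messages, which are simplifications rather than the real machinery.

```python
# Sketch of two-phase commit following the numbered steps above.
# Lists stand in for stable logs; direct method calls stand in for messages.

class Participant:
    def __init__(self, vote_yes=True):
        self.log = []
        self.vote_yes = vote_yes

    def prepare(self):                       # steps 3-4: prepare or abort, log the choice
        self.log.append("prepared" if self.vote_yes else "abort")
        return self.vote_yes

    def finish(self, decision):              # steps 9-11: record the decision,
        self.log.append(decision)            # release resources, acknowledge
        return "ack"

def two_phase_commit(participants):
    coordinator_log = ["prepare"]                        # step 1
    votes = [p.prepare() for p in participants]          # steps 2-4
    decision = "commit" if all(votes) else "abort"       # steps 5-7
    coordinator_log.append(decision)
    acks = [p.finish(decision) for p in participants]    # steps 8-11
    return decision, coordinator_log, acks

print(two_phase_commit([Participant(), Participant(), Participant()]))
```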

A Two-Phase Commit Example [diagram: Node 1 is the coordinator; it sends prepare to Nodes 2, 3, and 4, each replies prepared, and since all voted yes the coordinator sends commit and every node records committed]

What About the Abort Case? Same as commit, except not everyone voted yes Instead of committing, send aborts And abort locally at coordinator On receipt of an abort message, undo everything

Overheads of Two-Phase Commit For n participants, 4*(n-1) messages Each participant (except coordinator) gets a prepare and a commit message Each participant (except coordinator) sends a prepared and a committed message Can optimize committed messages away With potential cost of serious latencies in clearing log records
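For instance, with n = 5 participants (one coordinator plus four others), that is 4*(5-1) = 16 messages: four prepares, four prepared votes, four commits, and four committed acknowledgements.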

Two-Phase Commit and Failures Two-phase commit behaves well in the face of all single node failures May not be able to commit But will cleanly commit or abort And, if anyone commits, eventually everyone will Assumes fail-stop failures

Some Failure Examples: Example 1 [diagram: the coordinator (Node 1) fails after sending prepare, and not all participants get the prepare; Nodes 2, 3, and 4 consult on timeout and abort]

Some Failure Examples: Example 2 [diagram: a participant (Node 4) fails before it replied to the prepare; Node 1 never got a response from Node 4, so the coordinator aborts]

Some Failure Examples: Example 3 [diagram: a participant (Node 4) fails after replying prepared; all voted yes, so the others commit, but Node 1 never got the committed message from Node 4. What happens if Node 4 recovers? It consults its log, notices it was prepared, queries the commit status, and then commits]

Handling Failures Non-failed nodes still recover if some participants failed The coordinator can determine what other nodes did Did we commit or did we not? If the coordinator failed, a new coordinator can be elected And can determine state of commit Except . . .

An Issue With Two-Phase Commit: What if both the coordinator and another node fail during the commit phase? There are two possibilities: the other failed node committed, or the other failed node did not commit.

Possibility 1 [diagram: the coordinator (Node 1) and Node 2 fail during the commit phase; the commit had reached Node 2, so the failed participant committed]

Possibility 2 [diagram: the coordinator (Node 1) and Node 2 fail during the commit phase; Node 2 had only seen the prepare, so the failed participant did not commit]

What Do the Other Nodes Do? Here's what they see, in both cases: [diagram: the surviving nodes have seen exactly the same messages in either possibility] But what happened at the failed nodes? This? Or this?

Why Does It Matter? Well, why? Consider, for each case, what would have happened if node 2 hadn’t failed

Handling the Problem Go to three phases instead of two Third phase provides the necessary information to distinguish the cases So if this two node failure occurs, other nodes can tell what happened

Three Phase Commit [protocol state diagram: the coordinator sends canCommit and waits; a nak or timeout leads to abort, and once all acks arrive it sends startCommit. It again waits for acks (nak or timeout leading to abort), then sends Commit. Each participant receives canCommit, startCommit, and Commit in turn, sending an ack for each; a timeout while waiting leads to abort]

Why Three Phases? The first phase tells everyone a commit is in progress. The second phase ensures that everyone knows that everyone else was told; there is no chance that only some were told. The third phase actually performs the commit. Three phases ensure that a failure of the coordinator plus another participant is not ambiguous.
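A happy-path sketch of the coordinator's side of the three phases; send_all, wait_for_all_acks, and log are hypothetical stubs, and the recovery logic that actually exploits the extra phase is only hinted at in comments.

```python
# Happy-path sketch of a three-phase-commit coordinator.
# send_all, wait_for_all_acks, and log are assumed messaging/stable-log stubs.

def three_phase_commit(send_all, wait_for_all_acks, log):
    send_all("canCommit")                 # phase 1: can everyone commit?
    if not wait_for_all_acks():
        log("abort"); send_all("abort")
        return "abort"

    log("startCommit")                    # phase 2: everyone learns that
    send_all("startCommit")               # everyone else said yes
    wait_for_all_acks()                   # survivors holding startCommit
                                          # records know it is safe to commit

    log("commit")                         # phase 3: actually commit
    send_all("doCommit")
    wait_for_all_acks()
    return "commit"
```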

How Does This Work? [diagram: after the failure, the surviving Nodes 3 and 4 hold startCommit records] These status records tell us more than a prepare record did: a startCommit means Node 2 ACKed the canCommit message, so Node 1 knew all participants did a canCommit. So it's safe to commit at Nodes 3 and 4.

Overhead of Three Phase Commit For n participants, 6*(n-1) messages Each participant (except coordinator) gets a canCommit, startCommit, and a doCommit message Each participant (except coordinator) ACKed each of those messages Again, the final ACK can be optimized away But coordinator can’t delete record till it knows of all ACKs

Distributed Mutual Exclusion Another common problem in synchronizing distributed systems One-way communications can use simple synchronization Built into the paradigm Or handled at the shared server More general communications require more complex synchronization To ensure multiple simultaneously running processes interact properly

Synchronization and Mutual Exclusion Mutual exclusion ensures that only one of a set of participants uses a resource At any given moment In certain cases, that’s all the synchronization required In other cases, more synchronization can be built on top of mutual exclusion

The Basic Mutual Exclusion Problem n independent participants are sharing a resource In distributed case, each participant on a different node At any moment, only one participant can use the resource Must avoid deadlock, ensure fairness, and use few resources

Mutual Exclusion Approaches Contention-based Controlled

Contention-Based Mutual Exclusion: Each process freely and equally competes for the resource. Some algorithm is used to evaluate a request-resolution criterion; timestamps, priorities, and voting are ways to resolve conflicting requests. The problem assumes everyone cooperates and follows the rules.

Timestamp Schemes Whoever asked first should get the resource Runs into obvious problems of distributed clocks Usually handled with logical clocks, not physical clocks

Lamport’s Mutual Exclusion Algorithm Uses Lamport clocks With total order Assumes N processes Any pair can communicate directly Assumes reliable, in-order delivery of messages Though arbitrary message delays allowable
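A minimal sketch of a Lamport clock with the total order the slide assumes; using a (counter, pid) pair to break ties is an illustrative choice, not something the slide specifies.

```python
# Minimal Lamport clock; ties are broken by process id to give a total order.

class LamportClock:
    def __init__(self, pid):
        self.pid = pid
        self.counter = 0

    def tick(self):                           # local event or message send
        self.counter += 1
        return (self.counter, self.pid)

    def on_receive(self, sender_counter):     # advance past the sender's clock
        self.counter = max(self.counter, sender_counter) + 1
        return (self.counter, self.pid)

a, b = LamportClock(pid=1), LamportClock(pid=2)
ta = a.tick()                  # (1, 1)
tb = b.on_receive(ta[0])       # (2, 2): later than the send in the total order
print(ta < tb)                 # True: tuples compare counter first, then pid
```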

Outline of Lamport’s Algorithm Each process keeps a queue of requests When process wants the resource, it adds request to local queue, in order Sends REQUEST to all other processes All other processes send REPLY msgs When done with resource, process sends RELEASE msg to all others Lamport timestamps on all msgs

When Does Someone Get the Resource? A requesting process gets the resource when:
1) It has received replies from all other processes
2) Its request is at the top of its queue
3) A RELEASE message was received
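A minimal single-process sketch of one participant in the algorithm, following the conditions above; send is a hypothetical transport stub, and no real networking or failure handling is included.

```python
# Sketch of one process in Lamport's mutual exclusion algorithm.
# send(dest, msg) is an assumed transport stub; timestamps are (counter, pid).
import heapq

class LamportMutex:
    def __init__(self, pid, peers, send):
        self.pid, self.peers, self.send = pid, set(peers), send
        self.clock = 0
        self.queue = []            # heap of (counter, pid) requests, in order
        self.replies = set()
        self.my_request = None

    def request(self):             # add to local queue, REQUEST everyone else
        self.clock += 1
        self.my_request = (self.clock, self.pid)
        heapq.heappush(self.queue, self.my_request)
        self.replies = set()
        for p in self.peers:
            self.send(p, ("REQUEST", self.my_request, self.pid))

    def on_message(self, kind, stamp, sender):
        self.clock = max(self.clock, stamp[0]) + 1
        if kind == "REQUEST":
            heapq.heappush(self.queue, stamp)
            self.send(sender, ("REPLY", (self.clock, self.pid), self.pid))
        elif kind == "REPLY":
            self.replies.add(sender)
        elif kind == "RELEASE":    # a RELEASE removes the sender's request
            self.queue = [r for r in self.queue if r[1] != sender]
            heapq.heapify(self.queue)

    def may_enter(self):           # replies from all others, request at head
        return (self.replies == self.peers
                and self.queue and self.queue[0] == self.my_request)

    def release(self):             # drop own request, tell everyone else
        self.queue = [r for r in self.queue if r != self.my_request]
        heapq.heapify(self.queue)
        for p in self.peers:
            self.send(p, ("RELEASE", (self.clock, self.pid), self.pid))
```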

Lamport's Algorithm At Work [diagram: B requests the resource; the request is queued at every process behind an earlier request from A; when A's RELEASE arrives, B's request is at the top of the queue and B receives the resource]

Dealing With Multiple Requests [diagram: B and C both request the resource and their messages cross; every process orders the requests by Lamport timestamp, so when A releases the resource, B's earlier request is at the top of every queue and B receives the resource]

Complexity of Lamport Algorithm For N participants, 3*(N-1) messages per completion of the critical section Requester sends N-1 REQUEST messages N-1 other processes each REPLY When requester relinquishes critical section, sends N-1 RELEASE messages

A Problem With Lamport Algorithm One slow/failed process can prevent anyone from getting the resource Since no process can claim the resource unless it knows all other processes have seen its request

Voting Schemes Processes vote on who should get the shared resource next Can work even if one process fails Or even if a minority of processes fail Variants can allow weighted voting

Basics of Voting Algorithms Process needing shared resource sends a REQUEST to all other processes Each process receiving a request checks if it has already voted for someone else If not, it votes for the requester By replying

Obtaining the Shared Resource In Voting Schemes When a requester gets replies from a majority of voters, it gets the resource Since any voting process only replies to one requester, only one requester can get a majority When done with the resource, send a RELEASE message to all who voted for this process
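A minimal simulation of the voting idea (class and method names are illustrative): each voter grants at most one vote at a time, so at most one requester can hold a majority. For simplicity this sketch just returns partial votes when acquisition fails, rather than the vote-changing scheme described on a later slide.

```python
# Illustrative voting sketch: each voter votes for at most one requester at a
# time, so only one requester can hold a majority of votes.

class Voter:
    def __init__(self):
        self.voted_for = None

    def on_request(self, requester):      # reply (vote) only if not already committed
        if self.voted_for is None:
            self.voted_for = requester
            return True
        return False

    def on_release(self, requester):      # the vote returns when the resource is freed
        if self.voted_for == requester:
            self.voted_for = None

def try_acquire(requester, voters):
    granted = [v for v in voters if v.on_request(requester)]
    if len(granted) > len(voters) // 2:   # a strict majority wins the resource
        return True
    for v in granted:                     # give back partial votes; without this,
        v.on_release(requester)           # competing requests can deadlock
    return False
```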

Avoiding Deadlock If more than two processes request resource, sometimes no one wins Effectively a deadlock condition Can be fixed by allowing processes to change their votes Requires permission from the process that originally got the vote

Complexity of Voting Schemes for Mutual Exclusion O(N) messages per use of the resource, for reasons similar to the Lamport discussion Use of quorums can reduce this to O(SQRT(N))

Token Based Mutual Exclusion Maintain a token shared by all processes needing the resource Current holder of the token has access to resource To gain access to resource, must obtain token

Obtaining the Token Typically done by asking for it through some topology of the processes Ring Tree Broadcast

Ring Topologies for Tokens The token circulates along a pre-defined logical ring of processes As token arrives, if local process wants the resource the token is held Once finished, the token is passed on Good for high loads, high overhead for low loads
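A tiny simulation of one circulation of the token around the ring; the names and callbacks are illustrative stand-ins for real processes and application state.

```python
# One trip of the token around a logical ring: each process holds the token
# only while it uses the resource, then passes it to its ring successor.

def token_ring_round(ring, wants_resource, use_resource):
    for process in ring:               # the token visits processes in ring order
        if wants_resource(process):
            use_resource(process)      # hold the token for the critical section
        # otherwise pass the token straight on

token_ring_round(["A", "B", "C", "D"],
                 wants_resource=lambda p: p == "B",
                 use_resource=lambda p: print(p, "enters the critical section"))
```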

A Token Ring

Tree Topologies Only pass token when needed Use a tree structure to pass requests from requesting process to current token holder When token passed, re-arrange the tree to put new token holder at root
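A heavily simplified sketch of the tree idea, assuming one outstanding request at a time and no failures (in the spirit of Raymond's tree algorithm, but not a faithful implementation): each node keeps a pointer toward the token, and pointers are reversed as the token moves so that the new holder ends up at the root.

```python
# Simplified tree-based token passing: each node's `toward_token` pointer leads
# to the current holder; a request walks that path, and pointers are reversed as
# the token comes back so the requester ends up at the root.
# One request at a time, no queues, no failures; a sketch, not the full algorithm.

class TreeNode:
    def __init__(self, name, toward_token=None):
        self.name = name
        self.toward_token = toward_token        # None means "I hold the token"
        self.has_token = toward_token is None

    def acquire_token(self):
        if self.has_token:
            return
        parent = self.toward_token
        parent.acquire_token()                  # pull the token to our parent first
        parent.has_token = False                # then take it from the parent,
        parent.toward_token = self              # reversing the pointer
        self.has_token = True
        self.toward_token = None                # we are the new root

root = TreeNode("A")
b = TreeNode("B", toward_token=root)
c = TreeNode("C", toward_token=b)
c.acquire_token()
print(c.has_token, b.toward_token.name, root.toward_token.name)  # True C B
```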

Broadcast Topologies When a process wants the token, it sends a request to all other processes If current token holder isn’t using it, it sends the token to requester If the token is in use, its holder adds the request to the queue Use timestamp scheme to order the queue
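A single-process sketch of the broadcast approach (illustrative names, direct calls in place of messages); the slide orders the holder's queue by timestamp, while this sketch just uses FIFO order for brevity.

```python
# Broadcast-style token requests, simulated with direct calls.  The holder's
# queue here is simple FIFO; a real scheme would order it by timestamp.
from collections import deque

class Node:
    def __init__(self, name, has_token=False):
        self.name = name
        self.peers = []
        self.has_token = has_token
        self.in_use = False
        self.waiting = deque()             # requests queued at the token holder

    def request_resource(self):
        if self.has_token:
            self.in_use = True
            return
        for p in self.peers:               # "broadcast" the request
            p.on_request(self)

    def on_request(self, requester):
        if self.has_token and not self.in_use:
            self._pass_token(requester)    # idle holder hands the token over
        elif self.has_token:
            self.waiting.append(requester) # busy holder queues the request

    def release_resource(self):
        self.in_use = False
        if self.waiting:
            self._pass_token(self.waiting.popleft())

    def _pass_token(self, requester):
        requester.waiting, self.waiting = self.waiting, deque()
        self.has_token, requester.has_token = False, True
        requester.in_use = True
```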

A Common Problem With Token Schemes What happens if the token-holder fails? Could keep token in stable storage But still unavailable until token-holder recovers Could create new token Must be careful not to end up with two tokens, though Typically by running voting algorithm