Coordination and Agreement, Multicast, and Message Ordering.

Slides:



Advertisements
Similar presentations
CS542 Topics in Distributed Systems Diganta Goswami.
Advertisements

CS 542: Topics in Distributed Systems Diganta Goswami.
CS 5204 – Operating Systems1 Paxos Student Presentation by Jeremy Trimble.
Token-Dased DMX Algorithms n LeLann’s token ring n Suzuki-Kasami’s broadcast n Raymond’s tree.
Failure Detection The ping-ack failure detector in a synchronous system satisfies – A: completeness – B: accuracy – C: neither – D: both.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
Consensus Hao Li.
Page 1 Mutual Exclusion* Distributed Systems *referred to slides by Prof. Paul Krzyzanowski at Rutgers University and Prof. Mary Ellen Weisskopf at University.
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture IX: Coordination And Agreement.
CS 582 / CMPE 481 Distributed Systems
Distributed Systems Fall 2009 Logical time, global states, and debugging.
Distributed Systems Fall 2009 Coordination and agreement, Multicast, and Message ordering.
Eddie Bortnikov & Aran Bergman, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Recitation.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Coordination in Distributed Systems Lecture # 8. Coordination Anecdotes  Decentralized, no coordination  Aloha ~ 18%  Some coordinating Master  Slotted.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 19: Paxos All slides © IG.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Vector timestamps Global state –Distributed Snapshot Election algorithms.
State Machines CS 614 Thursday, Feb 21, 2002 Bill McCloskey.
Election Algorithms. Topics r Issues r Detecting Failures r Bully algorithm r Ring algorithm.
Paxos Made Simple Jinghe Zhang. Introduction Lock is the easiest way to manage concurrency Mutex and semaphore. Read and write locks. In distributed system:
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Mutual Exclusion Steve Ko Computer Sciences and Engineering University at Buffalo.
Bringing Paxos Consensus in Multi-agent Systems Andrei Mocanu Costin Bădică University of Craiova.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Mutual Exclusion Steve Ko Computer Sciences and Engineering University at Buffalo.
Consensus and Its Impossibility in Asynchronous Systems.
Time and Coordination March 13, Time and Coordination What is time? :-)  Issue: How do you coordinate distributed computers if there is no global.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Vector timestamps Global state –Distributed Snapshot Election algorithms –Bully algorithm.
CS425 /CSE424/ECE428 – Distributed Systems – Fall 2011 Material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra, K. Nahrstedt, N. Vaidya.
Coordination and Agreement. Topics Distributed Mutual Exclusion Leader Election.
November 2005Distributed systems: distributed algorithms 1 Distributed Systems: Distributed algorithms.
Presenter: Long Ma Advisor: Dr. Zhang 4.5 DISTRIBUTED MUTUAL EXCLUSION.
Vector Clock Each process maintains an array of clocks –vc.j.k denotes the knowledge that j has about the clock of k –vc.j.j, thus, denotes the clock of.
Distributed Systems Fall 2010 Logical time, global states, and debugging.
CSE 60641: Operating Systems Implementing Fault-Tolerant Services Using the State Machine Approach: a tutorial Fred B. Schneider, ACM Computing Surveys.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Mutual Exclusion & Leader Election Steve Ko Computer Sciences and Engineering University.
Exercises for Chapter 15: COORDINATION AND AGREEMENT From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 3, © Addison-Wesley.
SysRép / 2.5A. SchiperEté The consensus problem.
Distributed Systems Topic 5: Time, Coordination and Agreement
Lecture 10: Coordination and Agreement (Chap 12) Haibin Zhu, PhD. Assistant Professor Department of Computer Science Nipissing University © 2002.
Page 1 Mutual Exclusion & Election Algorithms Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content.
Lecture 12-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2012 Indranil Gupta (Indy) October 4, 2012 Lecture 12 Mutual Exclusion.
Lecture 7- 1 CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 7 Distributed Mutual Exclusion Section 12.2 Klara Nahrstedt.
From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design Edition 5, © Addison-Wesley 2012 Slides for Chapter 15: Coordination.
CIS 825 Review session. P1: Assume that processes are arranged in a ring topology. Consider the following modification of the Lamport’s mutual exclusion.
Mutual Exclusion Algorithms. Topics r Defining mutual exclusion r A centralized approach r A distributed approach r An approach assuming an organization.
CSE 486/586 CSE 486/586 Distributed Systems Leader Election Steve Ko Computer Sciences and Engineering University at Buffalo.
CSC 8420 Advanced Operating Systems Georgia State University Yi Pan Transactions are communications with ACID property: Atomicity: all or nothing Consistency:
Lecture 11: Coordination and Agreement Central server for mutual exclusion Election – getting a number of processes to agree which is “in charge” CDK4:
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Oct 1, 2015 Lecture 12: Mutual Exclusion All slides © IG.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Paxos Steve Ko Computer Sciences and Engineering University at Buffalo.
Slides for Chapter 11: Coordination and Agreement From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 3, © Addison-Wesley.
Exercises for Chapter 11: COORDINATION AND AGREEMENT
Coordination and Agreement
The consensus problem in distributed systems
Distributed Systems – Paxos
Coordination and Agreement
Distributed Systems, Consensus and Replicated State Machines
EEC 688/788 Secure and Dependable Computing
Outline Distributed Mutual Exclusion Introduction Performance measures
Consensus, FLP, and Paxos
CSE 486/586 Distributed Systems Mutual Exclusion
EEC 688/788 Secure and Dependable Computing
Lecture 10: Coordination and Agreement
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Lecture 11: Coordination and Agreement
CSE 486/586 Distributed Systems Mutual Exclusion
Presentation transcript:

Coordination and Agreement, Multicast, and Message Ordering

Fall 20105DV0203 Outline Coordination –Distributed mutual exclusion –Election algorithms Multicast Message ordering Agreement –Byzantine generals problem –Paxos algorithm Summary

Coordination and Agreement Processes often need to coordinate their actions and/or reach an agreement –Which process gets to access a shared resource? –Has the master crashed? Elect a new one! Failure detection – how can we know that a node has failed (e.g. crashed)? Fall 20105DV0204

Failure detection Keepalive messages every X time units –No message: the process has failed –Crashing immediately after sending? –Message delays, re-routing, etc. Unreliable failure detection –Unsuspected vs. Suspected Reliable failure detection –Unsuspected vs. Failed –Synchronous systems Fall 20105DV0205

Distributed mutual exclusion ME1 (safety): at most 1 process may enter the critical section at a time ME2 (liveness): requests to enter the critical section eventually succeed ME3 ( → ordering): if a request to enter the critical section happened- before another, then access is granted according to that order Fall 20105DV0206

Mutex algorithms Central server –Simple and error-prone! Ring-based algorithm –Also simple, but not single point of failure Ricart and Agrawala –Multicast request, access when all have replied (ordered using logical clocks) Maekawa's voting algorithm –Ask only a subset for access: works if subsets are overlapping Fall 20105DV0207

Mutex: Central server Send request to server, oldest process in queue gets access, return token when done Safety? –Yes! Liveness? –Yes (as long as server does not crash) → ordering? –No! Why not? Fall 20105DV0208

Mutex: Ring-based Token is passed around a ring of processes –Want access? Wait until token comes, and claim it (then pass the token along) Safety? Liveness? –Yes (assuming no crashes) → ordering? –Not even close Fall 20105DV0209

Mutex: Ricart and Agrawala Each process –Has unique process ID –Maintains a logical (Lamport) clock –Has state ∈ {wanted, held, released} Requests are multicasted to group –Contain process ID and clock value Lowest clock value gets access first –Equal values? Check process ID! Fall 20105DV02010

Mutex: Ricart and Agrawala Have access or want it yourself, and your is lower than incoming request? Queue the request! Else, reply to it Safety? Liveness? → ordering? –Yes! Improve performance –If process wants to re-enter critical section, and no new requests have been made, just do it! Fall 20105DV02011

Mutex: Maekawa’s voting Optimization: only ask subset of processes for entry –Works as long as subsets overlap –Vote only in one election at a time! Safety? –Yes! Liveness? → ordering? –No, deadlocks can happen! But, enhance using vector clocks, and we get both! Fall 20105DV02012

Comparison of mutex algos Study the algorithms yourselves, including their performance –This is an excellent question for exams (just sayin') Message loss? –None of the algorithms handle this Crashing processes? –Ring? No! Others? Only if the process is unimportant (not server, in voting set) Fall 20105DV02013

Election algorithms Ring-based –You may be familiar with this one! Bully –Very specific requirements, not generally applicable Fall 20105DV02014

Election requirements E1 (safety): A participant has elected = False or elected = P, where P is chosen as the non- crashed process with the highest identifier E2 (liveness): All processes participate and eventually set elected to non-False or crash Fall 20105DV02015

Election: Ring-based Processes have unique identifiers During election, pass max(own ID, incoming ID) to next process –If a process receives own ID, it must have been highest and may send that it has been elected Safety? Liveness? –Yes! Tolerates no failures (limited use) Fall 20105DV02016

Election: Bully algorithm Requires –Synchronous system –All processes know of each other –Reliable failure detectors –Reliable message delivery Allows –Crashing processes Safety? Not if process IDs can be reused! Liveness? Yes (if message delivery is reliable) Fall 20105DV02017

Election algorithms Study these algorithms on your own, and their properties –Again, excellent exam material... Want to read more about non- trivial election algorithms? – Fall 20105DV02018

Multicast Receive vs. Deliver –Receive: message has arrived and will be processed –Deliver: message is allowed to reach upper layer Closed vs. open groups –(Dis-)allow external processes to send to group Unreliable (basic) multicast –Send (unicast) to each other process! What if sender fails halfway through? Fall 20105DV02019

Reliable multicast Integrity: messages delivered at most once Validity: if a correct process multicasts message m, it will eventually deliver m Agreement: if a correct process delivers m, then all correct processes will eventually deliver m Fall 20105DV02020

Reliable multicast Use basic multicast to send to all (including self) –When basic multicast delivers, check if message has been received before If it has, do nothing further If not, and sender is not own process: –Basic multicast message to others Deliver message to upper layer Fall 20105DV02021

Reliable multicast Integrity? Validity? Agreement? –Yes! Insane amounts of traffic? –Yes! Every message is sent sizeof(group) to each process! –A single message will be sent 100 times if we just have 10 processes Fall 20105DV02022

Reliable multicast over IP multicast IP multicast requires hardware support –IP, however, is unreliable Study algorithm on your own –Perfect exam material... Fall 20105DV02023

Message ordering FIFO Total Causal Hybrids such as Causal-Total Fall 20105DV02024

FIFO ordering Intuition: messages from a process should be delivered in the order in which they were sent Solution: let sending process number the messages and hold back those that have been received out of order Fall 20105DV02025

Total ordering Intuition: messages from all processes should get a groupwide ordering number, so all processes can deliver in a single order! –Mental pitfall: the order itself does not have to make any sense, as long as all processes abide by it! Fall 20105DV02026

Implementing total ordering Sequencer –Simple –Central server (= single point of failure) ISIS-algorithm –Not as simple –Distributed –Study on your own! Fall 20105DV02027

Total ordering via sequencer Sequencer is logically external to the group Messages are sent to all members and the Sequencer –Initially, have no “ordering” number Sequencer maps message identifiers to ordering numbers –Multicasts mapping to group Once a message has an ordering number, it can be delivered according to that number Fall 20105DV02028

Total ordering contd. Note, again, that the ordering is completely up to the sequencer –It could collect all messages for half an hour and then assign numbers according to how many “a”:s there are in the message While annoying to use, this is still a total order, and all processes will have to follow it! Fall 20105DV02029

Causal ordering Intuition: captures causal (cause and effect) relationships via happened-before ordering Using vector clocks, ensures that replies are delivered after the message that they are replying to Example: USENET replication Fall 20105DV02030

Hybrid orderings Causal order is not unique –Concurrent messages...neither is FIFO –FIFO only guarantees per process, not inter-process Total order only guarantees a unique order –Combine with others to get stronger delivery semantics! Fall 20105DV02031

Message ordering You will need to get familiar with these orderings during the assignment –Read the book carefully Avoid mistakes Adhere to definitions Get a few very nice hints... Fall 20105DV02032

Agreement Consensus –Any process can suggest a value, all have to reach consensus on which is correct Byzantine generals problem –One process can suggest a value, others have to agree on what that value was Fall 20105DV02033

Byzantine Generals problem Commander –Issues order to attack or retreat Lieutenants –Decide what to do –Coordinate their actions by repeating the order from the commander to each other One or more processes (commander or lieutenants) may be “treacherous” and report wrong/inconsistent values! Fall 20105DV02034

Byzantine Generals problem Synchronous system! Impossible with three processes –See Figure in the book, © Pearson Education 2005 Possible with N ≥ 3f + 1 processes, where f is amount of treacherous ones –See Figure in the book, © Pearson Education 2005 Fall 20105DV02035

Byzantine Generals problem Asynchronous system – bad! –No timing constraints Fischer's impossibility result –Even with just one crashing process, we cannot reach consensus in an asynchronous system –This also applies to total ordering of messages and reliable multicast... –Still, we manage to do quite well in practice – how can that be? Fall 20105DV02036

Impossibility work-arounds Mask the faults –Use persistent storage and allow process restarts Use failure detectors –No reliable detectors: but good enough (agree that process is crashed if too long time has passed) –Eventually weak failure detector Randomization –Great paper (which is required reading) on the literature page! Fall 20105DV02037

Paxos algorithm Properties: –Non-triviality: only proposed values can be learned –Consistency: at most one value can be learned (two learners cannot learn different values) –Liveness (C;L): if a value C has been proposed, then eventually learner L will learn some value Asynchronous Fall 20105DV02038

Paxos algorithm contd. Client –Initiates with a request, expects response Acceptor –Collected into Quorums (groups) Messages must be sent to a whole quorum Messages are ignored unless they come from the whole quorum Proposer –Handles request on behalf of Client Learner –Acts if Acceptors agree on a value Leader –Distinguished Proposer, handles protocol Fall 20105DV02039

Paxos Quorums A quorum contains a majority of available Acceptors Several subsets of Acceptors such that any pair of Quorums always have some overlap Fall 20105DV02040

Paxos deployment Each Client reqest starts a new instance Each instance has two successful phases per round –If a phase is unsuccessful, start a new round Output from an instance is a single agreed-upon value Fall 20105DV02041

Paxos phases Prepare (1a) –Leader proposes number N Promise (1b) –If N is greater than any previous proposal, Acceptors in some quorum promise to reject proposals less than N –Acceptors respond with the last (highest) value they have accepted Accept (2a) –Leader chooses value from response set, sends value to some quorum of Acceptors Accepted (2b) –If Acceptors have not promised to do otherwise (promised to another concurrent proposal > N), they accept the value to Learners and the Leader Fall 20105DV02042

Paxos further reading Leslie Lamport Paxos Made Simple ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001) –PDF herePDF –Well worth your time… Fall 20105DV02043

Summary Failure detection –Hard to get right in async. systems Mutual exclusion Election algorithms –Seems like a simple problem, but non- trivial solutions are... non-trivial Multicast, reliable and unreliable Message ordering –Very important for the assignment Fall 20105DV02044

Summary Byzantine generals problem Fisher's impossibility result –Avoiding it by various techniques Paxos algorithm Fall 20105DV02045

Next lecture Replication –Group communication –Fault-tolerant services Active and Passive replication Fall 20105DV02046