CS 5204 – Operating Systems1 Paxos Student Presentation by Jeremy Trimble
Paxos CS 5204 – Operating Systems2 Overview Distributed State Machine Distributed Consensus Fault-tolerance Requirements/Assumptions Paxos Algorithm Concepts The Algorithm Itself Examples Multi-Paxos Variations of Paxos
Paxos CS 5204 – Operating Systems3 Distributed State Machine Fault-tolerance through replication. Need to ensure that replicas remain consistent. Replicas must process requests in the same order.
Paxos CS 5204 – Operating Systems4 The Distributed Consensus Problem In a distributed system, how can we: Select a single action among many options? How can this be done in a fault-tolerant way? Simple solution: A single node acts as the “decider.” But this is not fault tolerant. (What if the decider fails?) A better solution: Paxos
Paxos CS 5204 – Operating Systems5 Fault-Tolerant Consensus Requirements Safety Only a value that has been proposed may be chosen. Only a single value is chosen. A process never learns that a value has been chosen until it actually has been. Goals Liveness FLP Impossibility Proof (1985)
Paxos CS 5204 – Operating Systems6 Assumptions Failures “Fail Stop” assumption When a node fails, it ceases to function entirely. May resume normal operation when restarted. Messages May be lost. May be duplicated. May be delayed (and thus reordered). May not be corrupt. Stable Storage
Paxos CS 5204 – Operating Systems7 Proposal An alternative proposed by a proposer. Consists of a unique number and a proposed value. ( 42, B ) We say a value is chosen when consensus is reached on that value. Paxos Terms Proposer Suggests values for consideration by Acceptors. Advocates for a client. Acceptor Considers the values proposed by proposers. Renders an accept/reject decision. Learner Learns the chosen value. In practice, each node will usually play all three roles. A1 P1
Paxos CS 5204 – Operating Systems8 Strong Majority “Strong Majority” / “Quorum” A set of acceptors consisting of more than half of all acceptors. Any two quorums have a nonempty intersection. Helps avoid “split-brain” problem. Acceptors decisions are not in agreement. Common node acts as “tie-breaker.” In a system with 2F+1 acceptors, F acceptors can fail and we'll be OK. Quorums in a system with seven acceptors. A1A6 A4 A5 A7 A3 A2
Paxos CS 5204 – Operating Systems9 Consensus time (N 5, V 3 ) (N 3, V 3 ) (N 2, V 2 ) (N 4, V 1 ) (N 1, V 1 ) (N 6, V 3 ) consensus reached, V 3 chosen (N 7, V 3 ) Values proposed by proposers are constrained so that once consensus has been reached, all future proposals will carry the chosen value. P2c. For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either: (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S. (N 2, V 2 ) A1 A2 A3 A4 A5
Paxos CS 5204 – Operating Systems10 Phase 1a:“Prepare” Select proposal number* N and send a prepare(N) request to a quorum of acceptors. Phase 1b:“Promise” If N > number of any previous promises or acceptances, * promise to never accept any future proposal less than N, - send a promise(N, U) response (where U is the highest-numbered proposal accepted so far (if any)) Phase 2a:“Accept!” If proposer received promise responses from a quorum, - send an accept(N, W) request to those acceptors (where W is the value of the highest-numbered proposal among the promise responses, or any value if no promise contained a proposal) Phase 2b:“Accepted” If N >= number of any previous promise, * accept the proposal - send an accepted notification to the learner Acceptor Proposer Basic Paxos Algorithm * = record to stable storage
Paxos CS 5204 – Operating Systems11 A1 P1 A2A3 time start promise(1) promise(1, -) prepare(1) accept(1, A) prepare(1) accepted(1, A)
Paxos CS 5204 – Operating Systems12 A1 P1 A2A3 time P2 prepare(1) accepted(1, A) prepare(1) prepare(2) promise(2, -) promise(2, (1,A)) prepare(1) accept(2, A) accepted(2, I) accepted(2, A) continued...
Paxos CS 5204 – Operating Systems13 A1 P1 A2A3 time promise(1) prepare(1) prepare(2) accepted(1, C) P2 prepare(1) accept(1, C) prepare(1) promise(2, -) prepare(1) accept(2, D) accepted(1, I) accepted(2, D) start
Paxos CS 5204 – Operating Systems14 A1 P1 A2A3 time promise(1) prepare(1) prepare(2) accepted(1, F) P2 prepare(1) accept(1, F) prepare(1) promise(2, -) prepare(1) prepare(3)... prepare(1) prepare(4) prepare(1) prepare(2)... start
Paxos CS 5204 – Operating Systems15 Other Considerations Liveness Can't be guaranteed in general. Distinguished Proposer All proposals are funneled through one node. Can re-elect on failure. Learning the Chosen Value Acceptors notify some set of learners upon acceptance. Distinguished Learner A node may play the role of both distinguished proposer and distinguished learner – we call such a node the master.
Paxos CS 5204 – Operating Systems16 Multi-Paxos A single instance of Paxos yields a single chosen value. To form a sequence of chosen values, simply apply Paxos iteratively. To distinguish, include an instance number in messages. Facilitates replication of a state machine. “time” instance chosen valueSP???O
Paxos CS 5204 – Operating Systems17 Paxos Variations Cheap Paxos Reconfiguration Eject failed acceptors. Fault-tolerant with only F+1 nodes (vs 2F+1). Failures must not happen too quickly. Fast Paxos Clients send accept messages to acceptors. Master is responsible for breaking ties. Reduces message traffic. Byzantine Paxos Arbitrary failures – lying, collusion, fabricated messages, selective non-participation. Adds an extra “verify” phase to the algorithm.
Paxos CS 5204 – Operating Systems18 Conclusion State-Machine Replication Distributed Consensus Basic Paxos Examples Optimizations Variations
Paxos CS 5204 – Operating Systems19 Questions?
Paxos CS 5204 – Operating Systems20 References Paxos Made Simple Paxos Made Live Wikipedia – Paxos Algorithm The Byzantine Generals Problem Impossibility of distributed consensus with one faulty process