Distributed Consensus Paxos

Presentation transcript:

Distributed Consensus Paxos
Ethan Cecchetti, October 18, 2016, CS6410
Some structure taken from Robert Burgess’s 2009 slides on this topic

State Machine Replication (SMR)
View a server as a state machine. To replicate the server:
  Replicate the initial state
  Replicate each transition
[Figure: Server 1 and Server 2 each start in state S0; a Client issues commands a and b; states S1 and S2 are shown.]
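A minimal sketch of the idea on this slide, assuming a toy key-value store as the state machine. The names Replica, apply, and replicate are hypothetical and not from the slides. It only illustrates that replicas which start from the same state and apply the same deterministic transitions in the same order end in the same state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic state machine; here, a toy key-value store (an assumption)."""
    state: dict = field(default_factory=dict)  # the replicated initial state

    def apply(self, command):
        """Apply one transition. It must be deterministic for SMR to work."""
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "del":
            self.state.pop(key, None)


def replicate(commands, n_replicas=2):
    """Feed the same commands, in the same order, to every replica."""
    replicas = [Replica() for _ in range(n_replicas)]
    for command in commands:
        for replica in replicas:
            replica.apply(command)
    return replicas


log = [("set", "x", 1), ("set", "y", 2), ("del", "x", None)]
server1, server2 = replicate(log)
print(server1.state == server2.state)  # both replicas end in the same state
```

The hard part, which the rest of the deck addresses, is getting every replica to agree on the order of the commands when servers and messages can fail.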

Paxos: Fault-Tolerant SMR
Devised by Leslie Lamport, originally in 1989.
Written as “The Part-Time Parliament”.
Abstract: “Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers. The Paxon parliament’s protocol provides a new way of implementing the state-machine approach to the design of distributed systems.”
Rejected as unimportant and too confusing.

Paxos: The Lost Manuscript
Finally published in 1998 after it was put into use.
Published as a “lost manuscript” with notes from Keith Marzullo:
“This submission was recently discovered behind a filing cabinet in the TOCS editorial office. Despite its age, the editor-in-chief felt that it was worth publishing. Because the author is currently doing field work in the Greek isles and cannot be reached, I was asked to prepare it for publication.”
“Paxos Made Simple” simplified the explanation… a bit too much.
Abstract: “The Paxos algorithm, when presented in plain English, is very simple.”

Paxos Made Moderately Complex
Robbert van Renesse and Deniz Altinbuken (Cornell University), ACM Computing Surveys, 2015
“The Part-Time Parliament” was too confusing.
“Paxos Made Simple” was overly simplified.
Better to make it moderately complex! Much easier to understand.

Paxos Structure
[Figure from James Mickens, “The Saddest Moment,” ;login: logout, May 2013.]

Paxos Structure
[Figure: Proposers, Acceptors, Learners]

Moderate Complexity: Notation
[Figure from van Renesse and Altinbuken 2015. Two annotations on its components survive:]
  Function as proposers and learners without persistent storage
  Store data and propose to proposers

Single-Decree Synod
Decides on one command. The system is divided into proposers and acceptors.
Initially the proposer has b = 0 and each Acceptor i has b' = 0.
The protocol executes in phases:
The proposer proposes a ballot b.
  Proposer: b = b + 1; send (p1a, b)
Acceptor i responds with (b', ci).
  Acceptor i: if (b' < b) b' = b; send (p1b, b', ci)
If b' > b, the proposer updates b and aborts. Otherwise it waits for a majority of acceptors and requests the received ci with the highest ballot number.
  Proposer: if (b' > b) { b = b'; abort }; on a majority: c = b-max(ci); send (p2a, b, c)
If b' has not changed, Acceptor i accepts.
  Acceptor i: if (b' == b) accept (b', c); send (p2b, b', c)
A learner learns c if it receives the same (p2b, b', c) from a majority of acceptors.
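The phases above can be sketched as a small in-memory simulation. This is a sketch under several assumptions, not the deck's or the paper's implementation. Messages are plain method calls, so nothing is lost or delayed and every acceptor always replies. Ballots are (round, proposer_id) tuples so that two proposers never share a ballot, where the slide uses a bare integer b. The names Acceptor, on_p1a, on_p2a, and propose are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.ballot = (0, 0)   # b' on the slide: highest ballot seen so far
        self.accepted = None   # (ballot, command) most recently accepted, if any

    def on_p1a(self, b):
        if self.ballot < b:
            self.ballot = b
        return self.ballot, self.accepted      # the p1b reply (b', ci)

    def on_p2a(self, b, c):
        if self.ballot == b:                   # b' has not changed: accept
            self.accepted = (b, c)
        return self.ballot, c                  # the p2b reply (b', c)


def propose(proposer_id, round_no, command, acceptors):
    """Run both phases once. Returns the chosen command, or None if preempted."""
    b = (round_no, proposer_id)
    majority = len(acceptors) // 2 + 1

    # Phase 1: propose ballot b and collect (b', ci) from the acceptors.
    p1b = [a.on_p1a(b) for a in acceptors]
    if any(b_prime > b for b_prime, _ in p1b):
        return None                            # b' > b: abort
    promised = [acc for b_prime, acc in p1b if b_prime == b]
    if len(promised) < majority:
        return None
    # Pick the previously accepted command with the highest ballot, if any.
    prior = [acc for acc in promised if acc is not None]
    c = max(prior, key=lambda acc: acc[0])[1] if prior else command

    # Phase 2: ask the acceptors to accept (b, c) and count matching p2b replies.
    p2b = [a.on_p2a(b, c) for a in acceptors]
    votes = sum(1 for b_prime, _ in p2b if b_prime == b)
    return c if votes >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(1, 1, "a", acceptors)    # "a" is chosen
second = propose(2, 2, "b", acceptors)   # a later ballot must re-propose "a"
stale = propose(1, 1, "c", acceptors)    # an old ballot is preempted
print(first, second, stale)
```

The second call shows why the proposer must adopt the ci with the highest ballot: once "a" has been accepted by a majority, a later proposer cannot replace it with "b".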

Optimizations: Distinguished Learner
[Figure: Proposers, Acceptors, Distinguished Learner, Other Learners]

Optimizations: Distinguished Proposer
[Figure: Other Proposers, Distinguished Proposer, Acceptors, Learners]

What can go wrong?
A bunch of preemption: if two proposers keep preempting each other, no decision will be made.
Too many faults:
  Liveness requires a majority of acceptors, one proposer, and one learner.
  Correctness requires one learner.

Deciding on Multiple Commands
Run the Synod protocol for multiple slots: Slot 1 holds c1, Slot 2 holds c2, Slot 3 holds c3.
Sequential separate runs: slow.
Parallel separate runs: broken (no ordering).
One run with multiple slots: Multi-decree Synod!

Paxos with Multi-Decree Synod
Like single-decree Synod with one key difference: every proposal contains both a ballot and a slot number.
Each slot is decided independently.
On preemption (if (b' > b) {b = b'; abort;}), the proposer aborts active proposals for all slots.
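A rough sketch of the difference from the single-decree case, under the same simplifying assumptions as a toy simulation: direct method calls instead of messages, no failures, and (round, proposer_id) tuples as ballots. The class and method names (MultiAcceptor, Proposer, phase1, phase2, submit) are hypothetical. The acceptor keeps one ballot for all slots but one accepted command per slot, and a single preemption clears every active proposal.

```python
class MultiAcceptor:
    def __init__(self):
        self.ballot = (0, 0)   # b': one ballot shared by every slot
        self.accepted = {}     # slot -> (ballot, command)

    def on_p1a(self, b):
        if self.ballot < b:
            self.ballot = b
        return self.ballot, dict(self.accepted)

    def on_p2a(self, b, slot, c):
        if self.ballot == b:
            self.accepted[slot] = (b, c)
        return self.ballot


class Proposer:
    def __init__(self, proposer_id, acceptors):
        self.proposer_id, self.round = proposer_id, 0
        self.acceptors = acceptors
        self.active = {}       # slot -> command, proposals in flight
        self.decided = {}      # slot -> command

    def ballot(self):
        return (self.round, self.proposer_id)

    def preempted(self, ballots):
        """if (b' > b) {b = b'; abort;} applied to every active slot at once."""
        highest = max(ballots)
        if highest > self.ballot():
            self.round = highest[0]
            self.active.clear()
            return True
        return False

    def phase1(self):
        self.round += 1
        replies = [a.on_p1a(self.ballot()) for a in self.acceptors]
        if self.preempted([b for b, _ in replies]):
            return False
        best = {}              # per slot, the command accepted under the highest ballot
        for _, accepted in replies:
            for slot, (b, c) in accepted.items():
                if slot not in best or best[slot][0] < b:
                    best[slot] = (b, c)
        self.active.update({slot: c for slot, (_, c) in best.items()})
        return True

    def submit(self, slot, c):
        self.active.setdefault(slot, c)   # an inherited command wins over a new one

    def phase2(self, slot):
        b, c = self.ballot(), self.active[slot]
        replies = [a.on_p2a(b, slot, c) for a in self.acceptors]
        if self.preempted(replies):
            return False
        if sum(r == b for r in replies) > len(self.acceptors) // 2:
            self.decided[slot] = self.active.pop(slot)
            return True
        return False


acceptors = [MultiAcceptor() for _ in range(3)]
p1, p2 = Proposer(1, acceptors), Proposer(2, acceptors)
p1.phase1()
for slot, c in [(1, "a"), (2, "b"), (3, "c")]:
    p1.submit(slot, c)
p1.phase2(1)                   # slot 1 is decided on its own
p2.phase1()                    # a higher ballot arrives
aborted = not p1.phase2(2)     # p1 is preempted; slots 2 and 3 are both dropped
p2.submit(1, "z")
p2.phase2(1)                   # p2 can only re-decide "a" for slot 1
print(p1.decided, p1.active, p2.decided)
```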

Moderate Complexity: Leaders
Leader functionality is split into pieces:
Scouts perform the proposal function for a ballot number. While a scout is outstanding, do nothing.
Commanders perform commit requests. If a majority of acceptors accept, the commander reports a decision.
Both can be preempted by a higher ballot number, which causes all commanders and scouts to shut down and a new scout to be spawned.

Moderate Complexity: Optimizations
Distinguished Leader: provides both distinguished proposer and distinguished learner.
Garbage Collection: each acceptor has to store every previous decision. Once f + 1 have all decisions up to slot s, there is no need to store s or earlier.
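The garbage-collection rule can be illustrated with a small calculation. This sketch assumes each participant reports the highest slot up to which it holds every decision, and that slots are numbered from 1. The function names gc_point and prune are hypothetical. The collectable slot s is then the (f + 1)-th largest reported value.

```python
def gc_point(decided_up_to, f):
    """Largest slot s such that at least f + 1 participants hold all decisions up to s.

    Returns 0 (nothing collectable, assuming slots start at 1) if fewer than
    f + 1 participants have reported.
    """
    if len(decided_up_to) < f + 1:
        return 0
    return sorted(decided_up_to, reverse=True)[f]


def prune(accepted, s):
    """Drop stored entries for slot s and earlier."""
    return {slot: value for slot, value in accepted.items() if slot > s}


progress = [10, 7, 9, 3, 8]            # per-participant contiguous decided prefix
s = gc_point(progress, f=2)            # three participants (10, 9, 8) cover slot 8
accepted = {slot: f"c{slot}" for slot in range(1, 11)}
kept = prune(accepted, s)
print(s, sorted(kept))
```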

Paxos Questions?

CORFU: A Distributed Shared Log
Mahesh Balakrishnan†, Dahlia Malkhi†, John Davis†, Vijayan Prabhakaran†, Michael Wei‡, and Ted Wobber†
†Microsoft Research, ‡University of California, San Diego
TOCS 2013
Distributed log designed for high throughput and strong consistency.
Breaks the log across multiple servers.
“Write once” semantics ensure serializability of writes.

CORFU: Conflicts
What happens on concurrent writes? The first write wins and the rest must retry.
Retrying repeatedly is very slow, so use a sequencer to get write locations first.
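A toy, single-process sketch of the two ideas on this slide: write-once positions, where the first write wins, and a sequencer that hands out positions so writers do not collide. It is not the CORFU API. The names WriteOnceLog, Sequencer, AlreadyWritten, and append are hypothetical, and the real system spreads the log across storage servers.

```python
import itertools


class AlreadyWritten(Exception):
    """Raised when a position has already been written (the first write won)."""


class WriteOnceLog:
    def __init__(self):
        self.cells = {}                    # position -> data

    def write(self, position, data):
        if position in self.cells:
            raise AlreadyWritten(position)
        self.cells[position] = data


class Sequencer:
    """Hands out each log position exactly once."""
    def __init__(self):
        self._counter = itertools.count()

    def next_position(self):
        return next(self._counter)


def append(log, sequencer, data):
    """Reserve a position first, then write to it."""
    position = sequencer.next_position()
    log.write(position, data)
    return position


# Without a sequencer, two writers racing for position 0 conflict.
log = WriteOnceLog()
log.write(0, "from writer A")
try:
    log.write(0, "from writer B")
    conflict = False
except AlreadyWritten:
    conflict = True                        # writer B must retry elsewhere

# With a sequencer, each append gets its own position.
shared = WriteOnceLog()
sequencer = Sequencer()
positions = [append(shared, sequencer, f"entry {i}") for i in range(3)]
print(conflict, positions)
```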

CORFU: Holes and fill
What if a writer fails between getting a location and writing? Hole in the log!
Holes can block applications that require complete logs (e.g. SMR).
Provide a fill command to fill holes with junk.
Anyone can call fill.
If a writer was just slow, it will have to retry.
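The hole scenario can be sketched with a toy write-once log. This is an illustration under assumptions, not CORFU's interface. Log, fill, complete_prefix, and the JUNK marker are hypothetical names, and the positions are hard-coded here where a real client would obtain them from the sequencer.

```python
JUNK = "<junk>"   # hypothetical marker for a filled hole


class Log:
    def __init__(self):
        self.cells = {}                      # position -> data

    def write(self, position, data):
        """Write-once: returns False if the position is already taken."""
        if position in self.cells:
            return False
        self.cells[position] = data
        return True

    def fill(self, position):
        """Anyone may fill a hole with junk; it loses to an earlier real write."""
        return self.write(position, JUNK)

    def complete_prefix(self):
        """Number of contiguous written positions starting from 0."""
        n = 0
        while n in self.cells:
            n += 1
        return n


log = Log()
log.write(0, "a")
log.write(2, "c")                  # position 1 was reserved by a writer that stalled
blocked_at = log.complete_prefix() # a reader needing a complete log stops at the hole

log.fill(1)                        # any client can patch the hole
unblocked_at = log.complete_prefix()

late = log.write(1, "b")           # the slow writer finds its position taken
retried = log.write(3, "b")        # and retries at a fresh position
print(blocked_at, unblocked_at, late, retried)
```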

CORFU: Replication
Shards can be replicated however we want.
Chain replication is good for low replication factors (2-5).
On failure, a replacement server can take writes immediately.
Copying over the old log can happen in the background.

Thank You!