The Raft Consensus Algorithm

The Raft Consensus Algorithm
Diego Ongaro and John Ousterhout
Stanford University
http://raftconsensus.github.io

What is Consensus?
- Agreement on shared state (single system image)
- Recovers from server failures autonomously
  - Minority of servers fail: no problem (see the quorum sketch below)
  - Majority fail: lose availability, retain consistency
- Key to building consistent storage systems
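To make the majority rule concrete, here is a minimal sketch (not from the slides) of the quorum arithmetic: with 2f+1 servers, any two majorities overlap in at least one server, so the cluster tolerates f failures.

```go
package main

import "fmt"

// quorum returns the minimum number of servers that must agree;
// any two quorums of this size overlap in at least one server.
func quorum(clusterSize int) int {
	return clusterSize/2 + 1
}

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("%d servers: quorum of %d, tolerates %d failures\n",
			n, quorum(n), n-quorum(n))
	}
}
```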

Inside a Consistent System
- TODO: eliminate the single point of failure
- Options:
  - An ad hoc algorithm ("This case is rare and typically occurs as a result of a network partition with replication lag.")
  - OR a consensus algorithm (built-in or library): Paxos, Raft, …
  - OR a consensus service: ZooKeeper, etcd, consul, … (see the etcd sketch below)
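As an illustration of the consensus-service option, the sketch below uses the etcd Go client (go.etcd.io/etcd/client/v3). The endpoint and key names are invented for the example, and error handling is kept minimal; this is not from the slides.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to an etcd cluster (endpoint is illustrative).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Writes go through etcd's internal Raft log before being applied,
	// so the application never runs its own consensus protocol.
	if _, err := cli.Put(ctx, "config/primary", "server-1"); err != nil {
		panic(err)
	}
	resp, err := cli.Get(ctx, "config/primary")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```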

Replicated State Machines
[Figure: clients send commands to servers; each server has a consensus module, a log (x←3, y←2, x←1, z←6), and a state machine (x=1, y=2, z=6)]
- Replicated log ⇒ replicated state machine
  - All servers execute the same commands in the same order (sketched below)
- Consensus module ensures proper log replication
- System makes progress as long as any majority of servers are up
- Failure model: fail-stop (not Byzantine), delayed/lost messages
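A minimal sketch of the state-machine side, assuming a key/value store with integer values (the types are illustrative, not from any particular implementation): every replica applies the same committed commands in log order, so all replicas converge on the same state — here x=1, y=2, z=6, matching the slide's log.

```go
package main

import "fmt"

// Command is one client operation recorded in the replicated log.
type Command struct {
	Key   string
	Value int
}

// StateMachine is a deterministic key/value store.
type StateMachine struct {
	state map[string]int
}

// Apply executes one committed command. Because every replica applies
// the same commands in the same log order, all replicas converge.
func (sm *StateMachine) Apply(cmd Command) {
	sm.state[cmd.Key] = cmd.Value
}

func main() {
	log := []Command{{"x", 3}, {"y", 2}, {"x", 1}, {"z", 6}}
	sm := &StateMachine{state: make(map[string]int)}
	for _, cmd := range log {
		sm.Apply(cmd)
	}
	fmt.Println(sm.state) // map[x:1 y:2 z:6]
}
```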

How Is Consensus Used?
- Top-level system configuration
- Replicate entire database state

Paxos Protocol
- Leslie Lamport, 1989
- Nearly synonymous with consensus
- "The dirty little secret of the NSDI community is that at most five people really, truly understand every part of Paxos ;-)." – NSDI reviewer
- "There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system… the final system will be based on an unproven protocol." – Chubby authors

Raft’s Design for Understandability
- We wanted the best algorithm for building real systems
  - Must be correct, complete, and perform well
  - Must also be understandable
- "What would be easier to understand or explain?"
  - Fundamentally different decomposition than Paxos
  - Less complexity in the state space
  - Less mechanism

Raft User Study
[Figures: quiz grades and survey results comparing participants' understanding of Raft and Paxos]

Raft Overview
1. Leader election
   - Select one of the servers to act as cluster leader
   - Detect crashes, choose a new leader
2. Log replication (normal operation)
   - Leader takes commands from clients, appends them to its log
   - Leader replicates its log to the other servers, overwriting inconsistencies (the RPC shapes are sketched below)
3. Safety
   - Only a server with an up-to-date log can become leader
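These two phases map onto Raft's two RPCs, RequestVote (leader election) and AppendEntries (log replication and heartbeats). The field names below follow the Raft paper; the Go struct layout itself is just a sketch.

```go
package raft

// LogEntry holds one client command tagged with the term in which the
// leader received it.
type LogEntry struct {
	Term    int
	Command []byte
}

// RequestVoteArgs is sent by candidates during leader election.
type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateID  int
	LastLogIndex int // used for the up-to-date-log safety check
	LastLogTerm  int
}

// AppendEntriesArgs is sent by the leader to replicate its log; an
// empty Entries slice serves as a heartbeat.
type AppendEntriesArgs struct {
	Term         int // leader's term
	LeaderID     int
	PrevLogIndex int // index of the entry immediately preceding the new ones
	PrevLogTerm  int
	Entries      []LogEntry
	LeaderCommit int // leader's commit index
}
```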

RaftScope Visualization

Core Raft Review
1. Leader election
   - Heartbeats and timeouts to detect crashes
   - Randomized timeouts to avoid split votes
   - Majority voting to guarantee at most one leader per term
2. Log replication (normal operation)
   - Leader takes commands from clients, appends them to its log
   - Leader replicates its log to the other servers (overwriting inconsistencies)
   - Built-in consistency check simplifies how logs may differ (sketched below)
3. Safety
   - Only elect leaders with all committed entries in their logs
   - New leader defers committing entries from prior terms
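A sketch of that consistency check on the follower side, reusing the LogEntry and AppendEntriesArgs types from the earlier sketch. It uses Go's 0-based slice indexing, with PrevLogIndex = -1 meaning the leader is sending from the start of the log (the paper uses 1-based indices); term checks and commit-index handling are omitted for brevity.

```go
// Follower holds just the piece of per-server state needed here.
type Follower struct {
	log []LogEntry
}

// handleAppendEntries accepts the leader's entries only if the follower's
// log matches at PrevLogIndex; on success it overwrites any conflicting
// suffix, which is how the leader repairs divergent follower logs.
func (f *Follower) handleAppendEntries(args AppendEntriesArgs) bool {
	if args.PrevLogIndex >= 0 {
		if args.PrevLogIndex >= len(f.log) ||
			f.log[args.PrevLogIndex].Term != args.PrevLogTerm {
			return false // leader retries with an earlier PrevLogIndex
		}
	}
	// Truncate any inconsistent tail and append the leader's entries.
	f.log = append(f.log[:args.PrevLogIndex+1], args.Entries...)
	return true
}
```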

Randomized Timeouts
- How much randomization is needed to avoid split votes?
- Conservatively, use a random range ~10x network latency (one way to pick this is sketched below)
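A sketch of one way to pick such a timeout, assuming a measured base round-trip time; the concrete numbers are illustrative, not prescribed by the slides.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// electionTimeout picks a timeout uniformly from [10*rtt, 20*rtt), so the
// randomized range spans ~10x the network latency and servers rarely
// time out at the same moment.
func electionTimeout(rtt time.Duration) time.Duration {
	base := 10 * rtt
	return base + time.Duration(rand.Int63n(int64(base)))
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(electionTimeout(15 * time.Millisecond))
	}
}
```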

Raft Implementations (stale as of September 2014)

  Name             Language  Author
  go-raft          Go        Ben Johnson (Sky) and Xiang Li (CoreOS)
  kanaka/raft.js   JS        Joel Martin
  hashicorp/raft   Go        Armon Dadgar (HashiCorp)
  rafter           Erlang    Andrew Stone (Basho)
  ckite            Scala     Pablo Medina
  kontiki          Haskell   Nicolas Trangez
  LogCabin         C++       Diego Ongaro (Stanford)
  akka-raft        Scala     Konrad Malawski
  floss            Ruby      Alexander Flatten
  CRaft            C         Willem-Hendrik Thiart
  barge            Java      Dave Rusek
  harryw/raft      Ruby      Harry Wilkinson
  py-raft          Python    Toby Burress
  …

Facebook HydraBase Example
https://code.facebook.com/posts/321111638043166/hydrabase-the-evolution-of-hbase-facebook/

Conclusions
- Consensus is widely regarded as difficult
- Raft was designed for understandability
  - Easier to teach in classrooms
  - Better foundation for building practical systems
- The paper/thesis covers much more:
  - Cluster membership changes (simpler in the thesis)
  - Log compaction (expanded in the tech report/thesis)
  - Client interaction (expanded in the tech report/thesis)
  - Evaluation (thesis)

Questions?
raftconsensus.github.io