© Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de HP: Hybrid Paxos for WANs Dan Dobre, Matthias.

Presentation transcript:

HP: Hybrid Paxos for WANs
Dan Dobre, Matthias Majuntke, Marco Serafini and Neeraj Suri
TU Darmstadt, Germany
© Neeraj Suri, EU-NSF ICT March 2006, Dependable Embedded Systems & SW Group

EDCC, Valencia, January 24, 2016, Matthias Majuntke

Resilience of Critical Services
[Figure: a single server (clients send a request, no reply) versus SMR with n ≥ 2t+1 replicas (clients send a request and get a reply)]
• Safety Critical Systems
  - Resilience against catastrophic failures
• State Machine Replication
  - Illusion of a single server that never fails
• Wide Area Replication
  - Large and unpredictable delays in WANs
  - Latency-optimal protocol

Which Consensus Protocol
• State Machine Replication (SMR)
  - Clients propose commands to replicas
  - Agreement on sequence of commands → replicas are in consistent state when executing command sequence
  - Consensus protocol needed
• Latency-optimal protocols
  - Latency: number of message delays between when a client proposes a command and when the command is learned by a learner (to be executed)
• Two Protocols by Lamport
  - Classic Paxos (CP): Client → Leader → Acceptors → Client
    - 3 message delays (during normal operation)
    - Majority quorum for recovery
  - Fast Paxos (FP): Client → Acceptors → Client
    - 2 message delays (during normal operation)
    - 2 + 4 message delays in presence of collisions
    - Larger quorum for recovery

Paxos vs. Fast Paxos
• Compared Latency
  - “Planetlab” experiments
  - Simulation of the CP and FP message patterns (different topologies)
• FP not always faster than CP
  - Some clients prefer CP, some FP
  - A single crash can turn the setting around
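A rough latency model helps to see why FP is not always faster than CP. The sketch below is an illustration only: the function names and all delay values are hypothetical and are not taken from the PlanetLab experiments. It assumes n = 3 acceptors, a classic quorum of 2 and a fast quorum of 3, and it ignores collisions and processing time.

```python
def cp_latency(client_to_leader, leader_to_acc, acc_to_client, quorum):
    """Client -> Leader -> Acceptors -> Client.

    The client learns once the quorum-th fastest acceptor reply arrives.
    """
    paths = sorted(client_to_leader + l + a
                   for l, a in zip(leader_to_acc, acc_to_client))
    return paths[quorum - 1]


def fp_latency(client_to_acc, acc_to_client, fast_quorum):
    """Client -> Acceptors -> Client, but the larger fast quorum must reply."""
    paths = sorted(c + a for c, a in zip(client_to_acc, acc_to_client))
    return paths[fast_quorum - 1]


# Hypothetical one-way delays in ms. The leader is co-located with acceptor 0.

# Client A: close to the leader, far from acceptor 2.
cp_a = cp_latency(10, [0, 10, 110], [10, 15, 120], quorum=2)
fp_a = fp_latency([10, 15, 120], [10, 15, 120], fast_quorum=3)

# Client B: far from the leader, close to acceptors 1 and 2.
cp_b = cp_latency(40, [0, 40, 40], [40, 20, 20], quorum=2)
fp_b = fp_latency([40, 20, 20], [40, 20, 20], fast_quorum=3)

print("client A: CP", cp_a, "ms, FP", fp_a, "ms")  # CP wins for client A
print("client B: CP", cp_b, "ms, FP", fp_b, "ms")  # FP wins for client B
```

In this model FP has to wait for its slowest fast-quorum member, so one distant acceptor is enough to make CP the better choice for a given client.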

Motivation for a Hybrid Protocol
• No clear winner between CP and FP
  - With respect to latency
• Hybrid Protocol: Hybrid Paxos (HP)
  - Runs CP and FP in parallel
  - Chooses quickest outcome of two protocols
  - Implements Generalized Consensus
    - Commuting commands may be chosen in any order
  - Does not negatively affect throughput
    - FP mode switched off when not beneficial

Outline of the Talk
• Contribution
• System Model
• Background on Paxos and Generalized Consensus
• Hybrid Paxos protocol
• Evaluation
• Discussion
• Conclusion

Contribution
• Hybrid Paxos (HP)
  - CP with additional “fast mode”
  - Fast learning in absence of collisions
  - 3 msg delays as CP in presence of collisions
  - Latency optimal
  - 2f+1 servers, f may crash (optimal)
  - Linear number of messages (optimal)
• First efficient implementation of Generalized Consensus
• Experiments using Emulab
  - HP reaches theoretical minimum of latency
  - HP does not negatively affect throughput

System Model
[Figure: clients and servers]
• Distributed System
  - n servers
  - Any number of clients (may crash)
  - Communication via reliable FIFO channels
• Crash-stop model
  - At most a minority of servers fails (n ≥ 2f+1), f = number of crashes
• Asynchrony
  - Ω failure detector (eventually outputs same correct leader)
• Generalized Consensus
  - Command History
    - Equivalence class of command sequences
    - Sequences c1 and c2 are equivalent iff executing them produces same outputs and state
    - → commuting commands

Background on Generalized Consensus
• Protocol operates on command history = equivalence class of command sequences
• Terms on histories
  - Prefix relation ⊑ on histories
  - glb of histories (largest common prefix, intersection)
  - lub of histories (smallest common extension, union)
  - h and h′ are compatible iff there exists g with h ⊑ g and h′ ⊑ g
• Definition of Generalized Consensus
  - Consistency: every two learned histories are compatible.
  - Nontriviality: if a history is chosen, then all contained commands have been proposed.
  - Conservatism: if history h is learned, then h was chosen.
  - Progress: if command c is proposed, eventually a history containing c is learned.
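The terms on histories can be made concrete with a small sketch. It rests on assumptions that are not part of the talk: a history is represented by one sequence of unique command ids, and two commands commute only if both are deposits (ids starting with "d"). The function names are hypothetical.

```python
def commute(a, b):
    """Assumption: only deposits ("d...") commute with each other."""
    return a.startswith("d") and b.startswith("d")


def is_prefix(h, g):
    """True iff history h is a prefix of history g.

    Each command of h must be movable to the front of what is left of g
    by swapping it past commuting commands only.
    """
    rest = list(g)
    for c in h:
        if c not in rest:
            return False
        i = rest.index(c)
        if not all(commute(c, d) for d in rest[:i]):
            return False
        del rest[i]
    return True


def equivalent(h, g):
    """Two sequences represent the same history."""
    return len(h) == len(g) and is_prefix(h, g)


def compatible(h, g):
    """True iff h and g have a common extension.

    The candidate is h followed by the commands of g that h lacks; if g is
    not a prefix of this candidate, no common extension exists.
    """
    candidate = list(h) + [c for c in g if c not in h]
    return is_prefix(g, candidate)


print(equivalent(["d1", "d2"], ["d2", "d1"]))  # deposits may be reordered
print(compatible(["d1"], ["d2"]))              # common extension d1, d2
print(compatible(["d1", "w1"], ["d2"]))        # w1 and d2 conflict
```

The last call shows a collision: one history already orders the withdraw w1 without d2, the other starts with d2, and no history extends both.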

Background on Paxos Family
• Following holds for CP, FP, and HP
  - Clients are proposers and learners
  - Servers are acceptors
  - Cooperate to choose single command history
  - Acceptors query Ω and elect leader among them
  - Unique leader needed for progress only
• Paxos* protocols operate in rounds
  - Each leader is preassigned a set of round numbers
• Operation modes
  - Recovery, to change rounds (must ensure consistency)
  - Normal operation
• Quorums of acceptors
  - CP: any two quorums intersect
  - FP: requires larger fast quorums
    - Intersection of quorum and fast quorum FQ is larger than n-|FQ|
[Figure: quorum intersection, with labels |FQ|, n-|FQ| and n-|FQ|+1]
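The fast quorum condition can be turned into a size calculation. The sketch below assumes majority quorums of size ⌊n/2⌋+1 for CP and searches for the smallest |FQ| such that even the worst-case intersection of a quorum and a fast quorum exceeds n-|FQ|. The function names are hypothetical.

```python
def classic_quorum(n):
    """Smallest majority: any two such quorums intersect."""
    return n // 2 + 1


def min_fast_quorum(n, q=None):
    """Smallest |FQ| whose intersection with every quorum of size q
    is larger than n - |FQ|."""
    q = classic_quorum(n) if q is None else q
    for fq in range(q, n + 1):
        worst_case_intersection = q + fq - n
        if worst_case_intersection > n - fq:
            return fq
    return None


for n in (3, 5, 7, 9):
    print(n, classic_quorum(n), min_fast_quorum(n))
```

For n = 2f+1 this means that with f = 1 a fast quorum must contain all 3 acceptors, and with f = 2 it must contain 4 of 5.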

CP and FP Message Patterns
[Figure: three message-pattern diagrams over client (cl), leader (ld) and acceptors (acc): Recovery (all protocols), with Phase 1 (1a, 1b) and Phase 2 (2a, 2b); Normal Operation of CP; Normal Operation of FP, with fast mode and recovery from collision. Message labels: propose, 1a, 1b, 2a, 2b, 2bfast, chosen.]

Ideas behind Message Patterns
• Normal Operation CP
  - Client sends proposal (command) to leader
  - Leader appends command to history and sends history to acceptors (2a)
  - Acceptors accept history as local history
  - Acceptors send history back to client (2b)
• Normal Operation FP
  - Client sends proposal to acceptors
  - Acceptors append commands to local fast history (optimistic)
  - Acceptors send history back to client (and leader) (2bfast)
  - Collision recovery triggered by leader
• Recovery (to start a new round)
  - Phase 1: initialized by new leader (1a)
  - Acceptors send local histories to leader (1b)
  - Leader determines chosen history
  - Phase 2: leader synchronizes acceptors to chosen history (2a)
  - Reply to clients (2b)
[Callout: Core of protocol]

Combining the two protocols
[Figure: HP message pattern between cl, ld and acc (propose, 2a, 2b, 2bfast, chosen), shown next to the CP and FP patterns]
• Execute CP and FP pattern in parallel
  - CP with additional FP mode
  - Acceptors locally maintain fast and classic history
    - History from ld as classic history
    - Commands from cl appended to fast history
• No naïve combination
  - Clients learn either by receiving
    - Quorum of equal 2b messages (learn)
    - Fast Quorum of equal 2bfast messages and one 2b message (hybrid learn)
[Callout: Needed also in FP for speculative execution]
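The two learning rules can be sketched as a client-side check. This is a simplification under stated assumptions, not the authors' implementation: each message is reduced to a hashable history digest, "equal" means equal digests, and the single 2b message required for a hybrid learn is assumed to carry the same history as the 2bfast messages. The function name and message layout are hypothetical.

```python
from collections import Counter


def try_learn(two_b, two_b_fast, quorum, fast_quorum):
    """Return ("learn", digest), ("hybrid learn", digest) or None.

    two_b and two_b_fast map an acceptor id to the history digest carried
    by the 2b or 2bfast message received from that acceptor.
    """
    classic_votes = Counter(two_b.values())
    for digest, votes in classic_votes.items():
        if votes >= quorum:
            return ("learn", digest)

    fast_votes = Counter(two_b_fast.values())
    for digest, votes in fast_votes.items():
        # Assumption: the one 2b message must match the fast-quorum history.
        if votes >= fast_quorum and classic_votes[digest] >= 1:
            return ("hybrid learn", digest)
    return None


# n = 3 acceptors, quorum of 2, fast quorum of 3 (hypothetical digests).
fast = {"a1": "H2", "a2": "H2", "a3": "H2"}
print(try_learn({"a1": "H1", "a2": "H1"}, {}, 2, 3))
print(try_learn({"a1": "H2"}, fast, 2, 3))
print(try_learn({}, fast, 2, 3))
```

The third call returns None: a fast quorum of 2bfast messages alone is not enough, which is what distinguishes the hybrid rule from a naïve combination.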

Hybrid Recovery
• Same message pattern
• Acceptors maintain separate histories
  - Classic history
  - Fast history
• Leader performs CP-like and FP-like recoveries in parallel
  - Determines history fh from FP recovery
  - Determines history h from CP recovery
• Problem: h and fh might be incompatible (no common extension)
  - Determine largest prefix pfh of fh which is compatible with h
  - Pick lub of pfh and h (smallest common extension)
• Why is this correct (sufficient for Consistency)?
  - To show: any history lh learned by hybrid learn is a prefix of pfh
  - lh ⊑ fh, and all prefixes of fh compatible with h are prefixes of pfh
  - Sufficient to show: lh compatible with h
  - By hybrid learning: some acceptor holds lh as classic history
  - → lh and h have been sent by leader
  - → lh and h are compatible
[Callouts: Neither h nor fh sufficient. Goal: lub of h and fh]
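The recovery rule (largest prefix pfh of fh compatible with h, then the lub of pfh and h) can be sketched as follows. The representation is an assumption made for illustration: a history is one sequence of unique command ids, and only deposits (ids starting with "d") commute. All function names are hypothetical, and the greedy search relies on the claim above that a largest compatible prefix exists.

```python
def commute(a, b):
    """Assumption: only deposits ("d...") commute with each other."""
    return a.startswith("d") and b.startswith("d")


def is_prefix(h, g):
    """True iff history h is a prefix of history g."""
    rest = list(g)
    for c in h:
        if c not in rest:
            return False
        i = rest.index(c)
        if not all(commute(c, d) for d in rest[:i]):
            return False
        del rest[i]
    return True


def lub(h, g):
    """Smallest common extension of h and g, or None if incompatible."""
    candidate = list(h) + [c for c in g if c not in h]
    return candidate if is_prefix(g, candidate) else None


def largest_compatible_prefix(fh, h):
    """Greedily grow a prefix of fh while it stays compatible with h."""
    pfh = []
    for c in fh:
        trial = pfh + [c]
        if is_prefix(trial, fh) and lub(h, trial) is not None:
            pfh = trial
    return pfh


def hybrid_recover(h, fh):
    """History the leader picks from classic history h and fast history fh."""
    pfh = largest_compatible_prefix(fh, h)
    return lub(h, pfh)


h, fh = ["d1"], ["d2", "w1"]
print(lub(h, fh))             # None: h and fh are incompatible
print(hybrid_recover(h, fh))  # ['d1', 'd2']
```

In this example the result is neither h nor fh: the deposit d2 from the fast history is kept, while the conflicting withdraw w1 is dropped.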

Implementation Optimization
• Optimization 1 (msg complexity)
  - Leader does not send entire history to acceptors (2a)
  - FIFO channels
• Optimization 2 (execution)
  - Implementing state machine at servers
  - Only leader executes commands (speculatively)
  - Prevents rollbacks at acceptors
  - Clients receive history digests + result
• Optimization 3 (latency)
  - Diverging fast and classic histories during normal mode prevents hybrid learning
  - Periodically acceptors locally align fh to h (as in hybrid recovery)
• Optimization 4 (throughput)
  - FP mode switched off during high load
  - Leader monitors load
[Callout: Also true for FP]

Evaluation
• Experimental setting
  - Banking system, two operations: deposit and withdraw
  - deposit operations are commutable (Generalized Consensus)
• Emulab test bed
  - 20 ms link delay between client and servers, 100 Mbps
  - Topology similar to “Europe” topology from beginning of presentation
  - Servers: 600 MHz PC, Fedora 6
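Why deposits commute while withdraws do not can be shown with a toy state machine. This is a sketch under assumptions, not the banking system used in the experiments: a deposit always answers "ok" (it does not return the balance), a withdraw is rejected when funds are insufficient, and command ids and amounts are hypothetical.

```python
def execute(commands, balance=0):
    """Run (id, op, amount) commands; return final state and per-command outputs."""
    outputs = {}
    for cmd_id, op, amount in commands:
        if op == "deposit":
            balance += amount
            outputs[cmd_id] = "ok"
        elif op == "withdraw":
            if amount <= balance:
                balance -= amount
                outputs[cmd_id] = "ok"
            else:
                outputs[cmd_id] = "insufficient funds"
        else:
            raise ValueError(f"unknown operation: {op}")
    return balance, outputs


def same_history(seq_a, seq_b):
    """Sequences are equivalent iff they produce the same outputs and state."""
    return execute(seq_a) == execute(seq_b)


d1 = ("c1", "deposit", 10)
d2 = ("c2", "deposit", 5)
w1 = ("c3", "withdraw", 12)

print(same_history([d1, d2], [d2, d1]))          # True: deposits commute
print(same_history([d1, w1, d2], [d1, d2, w1]))  # False: withdraw depends on order
```

A higher withdraw rate therefore means more pairs of commands whose order matters, which is why the withdraw rate acts as the collision probability in the measurements.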

Latency
[Figure: Latency of HP with varying withdraw rate (= probability of collisions)]
[Figure: Latency vs. throughput (with and without batching)]

Throughput
[Figure: Throughput with increasing number of clients]
[Figure: Throughput with increasing f]

Related Work
• [Lamport: ACM TOCS 1998] The Part-Time Parliament
• [Lamport: Dist. Comp. 2006] Fast Paxos
• [Lamport: TR2005] Generalized Consensus and Paxos
• [Dobre, Suri: DSN2006] One-step Consensus with Zero-degradation
• [Charron-Bost, Schiper: PRDC2006] Improving Fast Paxos: Being Optimal with no Overhead
  - Minimum latency of FP and CP only in failure-free runs
• [Camargos, Schmidt, Pedone: NCA2008] Multicoordinated Agreement Protocols for Higher Availability
  - Improved availability of CP by multiple leaders; collision resolution required
• [Zielinski: DISC2005] Optimistic Generic Broadcast
  - Parallel execution of CP and FP; not resilience optimal; quadratic msg complexity
• [Mao, Junqueira, Marzullo: OSDI2008] Mencius: Building Efficient Replicated State Machines for WANs
  - Based on CP; partition consensus instances among several leaders (throughput)
  - Each client has LAN connection to one leader (latency)
  - Perfect failure detector needed

Discussion
• Comparison to CP
  - Implements CP
  - Never worse than CP
  - FP mode switched off when leader is highly loaded
• Comparison to FP
  - HP and FP need 2 msg delays in absence of collisions
  - HP needs 3, FP needs 6 msg delays in presence of collisions
  - Experiments: collision rate grows faster than server utilization rate
    - Servers underutilized when hybrid learning rate below 50%
    - FP would spend >50% of the time recovering from collisions
• Optimizations
  - Batching possible
  - Increasing throughput by an order of magnitude
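The message-delay figures above give a simple back-of-the-envelope comparison. The sketch assumes that each command independently hits a collision with probability p and that load plays no role; p and the function name are assumptions made for illustration, not quantities from the experiments.

```python
def expected_delays(p, fast=2, on_collision=3):
    """Expected message delays when a fraction p of commands collide."""
    return (1 - p) * fast + p * on_collision


CP_DELAYS = 3  # CP always needs 3 message delays in normal operation

for p in (0.0, 0.25, 0.5, 1.0):
    hp = expected_delays(p, on_collision=3)  # HP falls back to the CP path
    fp = expected_delays(p, on_collision=6)  # FP needs 2 + 4 on a collision
    print(f"p={p}: HP {hp}, FP {fp}, CP {CP_DELAYS}")
```

Under this model HP stays at or below the 3 delays of CP for every p, while FP already matches CP at p = 0.25 and is slower beyond that.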

Summary
• HP: Hybrid Paxos
  - Idea: add fast learning to Paxos
  - Generalized Consensus protocol
  - First protocol with 2 msg delays in absence of collisions and 3 msg delays otherwise
  - Optimal latency, resilience and number of messages
• Generalized Consensus is a practical approach for WAN replication
  - HP can outperform state-of-the-art protocols
HP is a Generalized Consensus protocol that features minimal latency and maximum throughput in most situations!

Thank you for your attention! Questions?