1 HP: Hybrid Paxos for WANs
Dan Dobre, Matthias Majuntke, Marco Serafini and Neeraj Suri
{dan,majuntke,marco,suri}@cs.tu-darmstadt.de
TU Darmstadt, Germany
© Neeraj Suri, EU-NSF ICT March 2006, Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de

2 Resilience of Critical Services
EDCC, Valencia, January 24, 2016, Matthias Majuntke
[Figure: a single server (clients send a request, get no reply) versus SMR with n ≥ 2t+1 replicas (clients send a request, get a reply)]
- Safety Critical Systems
  - Resilience against catastrophic failures
- State Machine Replication
  - Illusion of a single server that never fails
- Wide Area Replication
  - Large and unpredictable delays in WANs
  - → latency-optimal protocol

3 Which Consensus Protocol
- State Machine Replication (SMR)
  - Clients propose commands to replicas
  - Agreement on sequence of commands → replicas are in a consistent state when executing the command sequence
  - Consensus protocol needed
- Latency-optimal protocols
  - Latency: number of message delays between when a client proposes a command and when the command is learned by a learner (to be executed)
- Two protocols by Lamport
  - Classic Paxos (CP): Client → Leader → Acceptors → Client
    - 3 message delays (during normal operation)
    - Majority quorum for recovery
  - Fast Paxos (FP): Client → Acceptors → Client
    - 2 message delays (during normal operation)
    - 2 + 4 message delays in presence of collisions
    - Larger quorum for recovery

4 Paxos vs. Fast Paxos
- Compared latency
  - "PlanetLab" experiments
  - Simulation of the CP and FP message patterns (different topologies)
- FP is not always faster than CP
- Some clients prefer CP, some FP
- A single crash can turn the setting around

5 Motivation for a Hybrid Protocol
- No clear winner between CP and FP
  - With respect to latency
- Hybrid Protocol: Hybrid Paxos (HP)
  - Runs CP and FP in parallel
  - Chooses quickest outcome of the two protocols
  - Implements Generalized Consensus
    - Commuting commands may be chosen in any order
  - Does not negatively affect throughput
    - FP mode switched off when not beneficial

6 Outline of the Talk
- Contribution
- System Model
- Background on Paxos and Generalized Consensus
- Hybrid Paxos protocol
- Evaluation
- Discussion
- Conclusion

7 Contribution
- Hybrid Paxos (HP)
  - CP with additional "fast mode"
  - Fast learning in absence of collisions
  - 3 msg delays, as in CP, in presence of collisions
  - Latency optimal
  - 2f+1 servers, f may crash (optimal)
  - Linear number of messages (optimal)
- First efficient implementation of Generalized Consensus
- Experiments using Emulab
  - HP reaches theoretical minimum of latency
  - HP does not negatively affect throughput

8 System Model
- Distributed System
  - n servers
  - Any number of clients (may crash)
  - Communication via reliable FIFO channels
- Crash-stop model
  - At most a minority of servers fail (n ≥ 2f+1), f = number of crashes
- Asynchrony
  - Ω failure detector (eventually outputs the same correct leader)
- Generalized Consensus
  - Command history
  - Equivalence class of command sequences
  - Sequences c1 and c2 are equivalent iff executing them produces the same outputs and state
  - Commuting commands
[Figure: clients and servers]

9 Background on Generalized Consensus
- Protocol operates on command history = equivalence class of command sequences
- Terms on histories
  - Prefix relation ⊑ on histories
  - glb of histories (largest common prefix, intersection)
  - lub of histories (smallest common extension, union)
  - h and h' are compatible iff there exists g such that h ⊑ g and h' ⊑ g
- Definition of Generalized Consensus
  - Consistency: every two learned histories are compatible.
  - Nontriviality: if a history is chosen, then all commands it contains have been proposed.
  - Conservatism: if history h is learned, then h was chosen.
  - Progress: if command c is proposed, eventually a history containing c is learned.

10 Background on Paxos Family
- The following holds for CP, FP, and HP
  - Clients are proposers and learners
  - Servers are acceptors
  - Cooperate to choose a single command history
  - Acceptors query Ω and elect a leader among them
  - Unique leader needed for progress only
- Paxos* protocols operate in rounds
  - Each leader is preassigned a set of round numbers
- Operation modes
  - Recovery, to change rounds (must ensure consistency)
  - Normal operation
- Quorums of acceptors
  - CP: any two quorums intersect
  - FP: requires larger fast quorums
    - The intersection of a quorum and a fast quorum FQ is larger than n-|FQ|
[Figure labels: |FQ|, n-|FQ|, n-|FQ|+1]
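The quorum condition on this slide can be turned into concrete sizes. The following is a minimal sketch, assuming classic quorums are simple majorities and reading the condition in the worst case, where a quorum Q and a fast quorum FQ overlap in only |Q| + |FQ| - n acceptors. The function names are illustrative and not taken from the talk.

```python
def classic_quorum(n: int) -> int:
    """Size of a majority quorum among n acceptors (assumed quorum system)."""
    return n // 2 + 1


def min_fast_quorum(n: int) -> int:
    """Smallest fast-quorum size fq such that, even in the worst case, a
    classic quorum and a fast quorum share more than n - fq acceptors."""
    q = classic_quorum(n)
    for fq in range(q, n + 1):
        worst_case_overlap = q + fq - n  # pigeonhole lower bound on |Q ∩ FQ|
        if worst_case_overlap > n - fq:
            return fq
    raise ValueError("n must be at least 1")


for n in (3, 5, 7):
    print(n, classic_quorum(n), min_fast_quorum(n))
```

Under these assumptions, 5 acceptors give a classic quorum of 3 and a fast quorum of 4, which illustrates why FP needs "larger" quorums than CP.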

11 CP and FP Message Patterns
[Figure: message diagrams over client (cl), leader (ld) and acceptors (acc) for three patterns. Recovery (all protocols): Phase 1 (1a, 1b) and Phase 2 (2a, 2b). Normal Operation of CP: 2a, 2b. Normal Operation of FP: fast mode (2bfast) and recovery from collision (1a, 1b, 2a, 2b). Events marked: propose, chosen.]

12 Ideas behind Message Patterns
- Normal Operation CP
  - Client sends proposal (command) to leader
  - Leader appends command to history and sends history to acceptors (2a)
  - Acceptors accept history as local history
  - Acceptors send history back to client (2b)
- Normal Operation FP
  - Client sends proposal to acceptors
  - Acceptors append commands to local fast history (optimistic)
  - Acceptors send history back to client (and leader) (2bfast)
  - Collision recovery triggered by leader
- Recovery (to start a new round)
  - Phase 1: initiated by new leader (1a)
  - Acceptors send local histories to leader (1b)
  - Leader determines chosen history
  - Phase 2: leader synchronizes acceptors to chosen history (2a)
  - Reply to clients (2b)
[Callout: Core of protocol]

13 Combining the two protocols
[Figure: message diagram over cl, ld, acc showing propose, 2a, 2b, 2bfast, chosen; panels labeled CP, FP, HP]
- Execute CP and FP pattern in parallel
  - CP with additional FP mode
- Acceptors locally maintain fast and classic history
  - History from ld as classic history
  - Commands from cl appended to fast history
- No naïve combination
- Clients learn either by receiving
  - Quorum of equal 2b messages (learn)
  - Fast Quorum of equal 2bfast messages and one 2b message (hybrid learn)
[Callout: Needed also in FP for speculative execution]
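The two learning rules on this slide can be sketched as a client-side check. This is a simplified illustration, not the authors' implementation: it assumes each message carries a whole history as a tuple, that "equal" means tuple equality, and that the single 2b message in a hybrid learn carries the same history as the fast quorum. The function name `try_learn` and the message representation are hypothetical.

```python
from collections import Counter


def try_learn(msgs_2b, msgs_2bfast, quorum, fast_quorum):
    """Return (rule, history) if the client can learn, else None.

    msgs_2b / msgs_2bfast map an acceptor id to the history (a tuple of
    commands) carried by that acceptor's 2b / 2bfast message.
    """
    classic = Counter(msgs_2b.values())
    # Rule 1 (learn): a quorum of equal 2b messages.
    for history, count in classic.items():
        if count >= quorum:
            return ("learn", history)
    # Rule 2 (hybrid learn): a fast quorum of equal 2bfast messages plus
    # at least one 2b message with the same history (assumed to be equal).
    fast = Counter(msgs_2bfast.values())
    for history, count in fast.items():
        if count >= fast_quorum and classic[history] >= 1:
            return ("hybrid learn", history)
    return None


# Example with 5 acceptors, classic quorum 3, fast quorum 4 (assumed sizes).
h = ("deposit-1",)
print(try_learn({0: h}, {0: h, 1: h, 2: h, 3: h}, quorum=3, fast_quorum=4))
print(try_learn({}, {0: h, 1: h, 2: h, 3: h}, quorum=3, fast_quorum=4))
```

In this sketch a fast quorum of 2bfast messages alone is not enough: the second call returns `None` because no 2b message from the leader's classic path has arrived yet.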

14 Hybrid Recovery
- Same message pattern
- Acceptors maintain separate histories
  - Classic history
  - Fast history
- Leader performs CP-like and FP-like recoveries in parallel
  - Determines history fh from FP recovery
  - Determines history h from CP recovery
  - Problem: h and fh might be incompatible (no common extension)
  - Determine the largest prefix pfh of fh that is compatible with h
  - Pick lub of pfh and h (smallest common extension)
- Why is this correct (sufficient for Consistency)?
  - To show: any history lh learned by hybrid learn is a prefix of pfh.
  - lh ⊑ fh, and all prefixes of fh compatible with h are prefixes of pfh
  - Sufficient to show: lh is compatible with h
  - By hybrid learning: some acceptor holds lh as classic history
  - lh and h have been sent by the leader
  - lh and h are compatible
[Callouts: Neither h nor fh sufficient. Goal: lub of h and fh]
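The recovery rule above (largest prefix pfh of fh compatible with h, then lub of pfh and h) can be sketched on a toy model of histories. The model is an assumption made for illustration, not the authors' implementation: a history is a Python list of unique command names, two lists denote the same history if one can be turned into the other by swapping adjacent commuting commands, and `conflicts` encodes a hypothetical rule in which deposits ("d…") commute while withdrawals ("w…") conflict with everything.

```python
def conflicts(a, b):
    # Assumed conflict relation: any pair involving a withdrawal conflicts.
    return a != b and (a.startswith("w") or b.startswith("w"))


def is_prefix(u, g):
    """True if history u is a prefix of history g (up to commuting swaps)."""
    pu = {c: i for i, c in enumerate(u)}
    pg = {c: i for i, c in enumerate(g)}
    if not set(pu) <= set(pg):
        return False
    for a in u:
        for b in g:
            if not conflicts(a, b):
                continue
            if b in pu:
                # Conflicting commands must appear in the same order.
                if (pu[a] < pu[b]) != (pg[a] < pg[b]):
                    return False
            elif pg[b] < pg[a]:
                # A conflicting command outside u must not precede a in g.
                return False
    return True


def compatible(u, v):
    """True if u and v have a common extension."""
    candidate = u + [c for c in v if c not in u]
    return is_prefix(v, candidate)  # u is a prefix of candidate by construction


def largest_compatible_prefix(fh, h):
    """Greedily build the largest prefix of fh that is compatible with h."""
    pfh = []
    for i, c in enumerate(fh):
        # c may only be kept if everything it depends on in fh was kept.
        closed = all(d in pfh for d in fh[:i] if conflicts(c, d))
        if closed and compatible(pfh + [c], h):
            pfh.append(c)
    return pfh


def hybrid_recover(h, fh):
    """lub of h and the largest prefix of fh compatible with h."""
    pfh = largest_compatible_prefix(fh, h)
    return h + [c for c in pfh if c not in h]


h = ["d1", "d2"]               # classic history from CP-like recovery
fh = ["d2", "d3", "w1", "d1"]  # fast history from FP-like recovery
print(compatible(fh, h))       # False: fh orders w1 before d1
print(largest_compatible_prefix(fh, h))
print(hybrid_recover(h, fh))
```

In this example h and fh are incompatible because fh places the withdrawal w1 before d1, while h contains d1 but not w1. The sketch keeps the deposits d2 and d3 from fh and extends h with d3, leaving w1 to be proposed again.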

15 Implementation Optimization
- Optimization 1 (msg complexity)
  - Leader does not send entire history to acceptors (2a)
  - FIFO channels
- Optimization 2 (execution)
  - Implementing state machine at servers
  - Only leader executes commands (speculatively)
  - Prevents rollbacks at acceptors
  - Clients receive history digests + result
- Optimization 3 (latency)
  - Diverging fast and classic histories during normal mode prevent hybrid learning
  - Periodically, acceptors locally align fh to h (as in hybrid recovery)
- Optimization 4 (throughput)
  - FP mode switched off during high load
  - Leader monitors load
[Callout: Also true for FP]

16 Evaluation
- Experimental setting
  - Banking system with two operations, deposit and withdraw
  - Deposit operations commute (Generalized Consensus)
- Emulab test bed
  - 20 ms link delay between client and servers, 100 Mbps
  - Topology similar to the "Europe" topology from the beginning of the presentation
  - Servers: 600 MHz PC, Fedora 6
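The banking workload illustrates why deposits commute while withdrawals can collide. Below is a minimal sketch of a single-account state machine; the rule that a withdrawal is rejected when funds are insufficient is an assumption (the slides do not specify withdraw semantics), and the names `execute` and `equivalent` are illustrative. Two command sequences are treated as equivalent here if they yield the same final balance and the same per-command outputs from a given starting balance.

```python
def execute(history, balance=0):
    """Run (op, amount) commands and return (final balance, outputs)."""
    outputs = {}
    for cmd in history:
        op, amount = cmd
        if op == "deposit":
            balance += amount
            outputs[cmd] = "ok"
        elif op == "withdraw" and amount <= balance:
            balance -= amount
            outputs[cmd] = "ok"
        else:
            # Assumed semantics: a withdrawal without enough funds is rejected.
            outputs[cmd] = "rejected"
    return balance, outputs


def equivalent(h1, h2, balance=0):
    """Same state and same outputs when started from the given balance."""
    return execute(h1, balance) == execute(h2, balance)


d10, d20, w10 = ("deposit", 10), ("deposit", 20), ("withdraw", 10)
print(equivalent([d10, d20], [d20, d10]))  # deposits in either order
print(equivalent([d10, w10], [w10, d10]))  # deposit vs. withdraw order
```

The check is made for one starting balance only; a full equivalence argument would have to hold for every reachable state.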

17 Latency
- Latency of HP with varying withdraw rate (= probability of collisions)
- Latency vs. throughput (with and without batching)

18 Throughput
- Throughput with increasing number of clients
- Throughput with increasing f

19 Related Work
- [Lamport: ACM TOCS 1998] The Part-Time Parliament
- [Lamport: Dist. Comp. 2006] Fast Paxos
- [Lamport: TR2005] Generalized Consensus and Paxos
- [Dobre, Suri: DSN2006] One-step Consensus with Zero-Degradation
- [Charron-Bost, Schiper: PRDC2006] Improving Fast Paxos: Being Optimistic with no Overhead
  - Minimum latency of FP and CP only in failure-free runs
- [Camargos, Schmidt, Pedone: NCA2008] Multicoordinated Agreement Protocols for Higher Availability
  - Improved availability of CP by multiple leaders; collision resolution required
- [Zielinski: DISC2005] Optimistic Generic Broadcast
  - Parallel execution of CP and FP; not resilience optimal; quadratic msg complexity
- [Mao, Junqueira, Marzullo: OSDI2008] Mencius: Building Efficient Replicated State Machines for WANs
  - Based on CP; partitions consensus instances among several leaders (throughput)
  - Each client has a LAN connection to one leader (latency)
  - Perfect failure detector needed

20 Discussion
- Comparison to CP
  - Implements CP
  - Never worse than CP
  - FP mode switched off when leader is highly loaded
- Comparison to FP
  - HP and FP need 2 msg delays in absence of collisions
  - HP needs 3, FP needs 6 msg delays in presence of collisions
  - Experiments: collision rate grows faster than server utilization rate
    - Servers underutilized when hybrid learning rate below 50%
    - FP would spend >50% of the time recovering from collisions
- Optimizations
  - Batching possible
  - Increasing throughput by an order of magnitude
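The message-delay figures on this slide allow a back-of-the-envelope comparison. The sketch below assumes each command independently hits a collision with probability p and counts only message delays (2 or 6 for FP, 2 or 3 for HP, always 3 for CP); it ignores real link latencies, topology, and load, so it is an illustration rather than a model of the measured results.

```python
def expected_delays(p: float) -> dict:
    """Expected message delays per command for collision probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return {
        "CP": 3.0,                    # leader orders commands, always 3
        "FP": 2 * (1 - p) + 6 * p,    # 2 without collision, 2 + 4 with
        "HP": 2 * (1 - p) + 3 * p,    # 2 without collision, 3 with
    }


for p in (0.0, 0.25, 0.5, 1.0):
    print(p, expected_delays(p))
```

Under these assumptions FP matches CP at p = 0.25 and is worse beyond that, while HP stays at or below 3 message delays for every p, which is consistent with the "never worse than CP" claim.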

21 Summary
- HP: Hybrid Paxos
  - Idea: add fast learning to Paxos
  - Generalized Consensus protocol
  - First protocol with 2 msg delays in absence of collisions and 3 msg delays otherwise
  - Optimal latency, resilience and number of messages
- Generalized Consensus is a practical approach for WAN replication
- HP can outperform state-of-the-art protocols
HP is a Generalized Consensus protocol that features minimal latency and maximum throughput in most situations!

22 Thank you for your attention! Questions?