
Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loosely Synchronized Physical Clocks
Jiaqing Du, Daniele Sciascia, Sameh Elnikety, Willy Zwaenepoel, Fernando Pedone
EPFL, University of Lugano, Microsoft Research

Replicated State Machines (RSM)
Strong consistency
– Execute the same commands in the same order
– Reach the same state from the same initial state
Fault tolerance
– Store data at multiple replicas
– Failure masking / fast failover
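The strong-consistency point rests on determinism: replicas that apply the same command log in the same order must reach the same state. A minimal sketch in Python (names and structure are illustrative, not from the talk):

```python
# Minimal sketch of a deterministic state machine: replicas that apply
# the same command log in the same order reach the same state.
# All names here are illustrative, not from the Clock-RSM paper.

class KVStateMachine:
    def __init__(self):
        self.state = {}

    def apply(self, cmd):
        # Commands must be deterministic: no clocks, randomness, or I/O.
        op, key, value = cmd
        if op == "put":
            self.state[key] = value
        return self.state.get(key)

log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3)]
r1, r2 = KVStateMachine(), KVStateMachine()
for cmd in log:
    r1.apply(cmd)
    r2.apply(cmd)
assert r1.state == r2.state  # same order, same state
```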

Geo-Replication
[Figure: replicas deployed in data centers around the world]
High latency among replicas
Messaging dominates replication latency

Leader-Based Protocols
Order commands at a leader replica
Require extra ordering messages at followers
[Figure: a follower forwards the client request to the leader for ordering, replication follows, then the follower replies to the client]
High latency for geo-replication

Clock-RSM
Orders commands using physical clocks
Overlaps ordering and replication
[Figure: any replica handles a client request directly; ordering and replication proceed as one combined step between client request and client reply]
Low latency for geo-replication

Outline
– Clock-RSM
– Comparison with Paxos
– Evaluation
– Conclusion

Properties and Assumptions
Provides linearizability
Tolerates failure of a minority of replicas
Assumptions
– Asynchronous FIFO channels
– Non-Byzantine faults
– Loosely synchronized physical clocks

Protocol Overview
[Figure: each replica timestamps incoming commands with its local clock (cmd1.ts = Clock(), cmd2.ts = Clock()) and replicates them with Prep/PrepOK; every replica ends up with cmd1 and cmd2 in the same timestamp order]

Major Message Steps
Prep: ask everyone to log a command
PrepOK: tell everyone after logging a command
[Figure: R0 assigns cmd1.ts = 24 and broadcasts Prep; the other replicas log cmd1 and broadcast PrepOK; meanwhile R4 assigns cmd2.ts = 23; R0 must decide when cmd1 is committed]
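A rough sketch of these two message steps in Python (all names and data structures are assumptions for illustration, not the paper's pseudocode):

```python
import time

class Replica:
    """Illustrative sketch of the Prep/PrepOK steps. Transport, failure
    handling, and the commit wait (next slides) are omitted."""

    def __init__(self, rid, peers):
        self.rid = rid
        self.peers = peers    # handles to the other replicas
        self.log = {}         # (ts, origin) -> command
        self.acks = {}        # (ts, origin) -> set of acknowledging replicas

    def clock(self):
        # Loosely synchronized physical clock (e.g., NTP-disciplined).
        return time.time()

    def on_client_request(self, cmd):
        ts = self.clock()                  # the timestamp orders commands
        self.log[(ts, self.rid)] = cmd     # ties broken by replica id
        for p in self.peers:
            p.on_prep(ts, self.rid, cmd)   # Prep: ask everyone to log it

    def on_prep(self, ts, origin, cmd):
        self.log[(ts, origin)] = cmd       # log it, then tell everyone
        for p in self.peers:
            p.on_prep_ok(ts, origin, self.rid)

    def on_prep_ok(self, ts, origin, sender):
        self.acks.setdefault((ts, origin), set()).add(sender)
```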

Commit Conditions
A command is committed if
– It is replicated by a majority
– All commands ordered before it are committed
Wait until three conditions hold (a combined check is sketched after the C3 slide below)
– C1: Majority replication
– C2: Stable order
– C3: Prefix replication

C1: Majority Replication
More than half of the replicas log cmd1
[Figure: R0 broadcasts Prep for cmd1 (ts = 24); after PrepOKs from R1 and R2, cmd1 is replicated by R0, R1, R2]
Latency: 1 RTT between R0 and the closest majority

C2: Stable Order
A replica knows all commands ordered before cmd1
– It has received a greater timestamp from every other replica (carried by Prep, PrepOK, or ClockTime messages)
[Figure: cmd1.ts = 24; once R0 hears a timestamp greater than 24 from every peer, cmd1 is stable at R0]
Latency: 0.5 RTT between R0 and the farthest peer

C3: Prefix Replication
All commands ordered before cmd1 are replicated by a majority
[Figure: cmd2.ts = 23 (from R4) is ordered before cmd1.ts = 24 (from R0); cmd1 commits only after cmd2 is replicated by a majority, here R1, R2, R3]
Latency: 1 RTT (R4 to a majority, plus majority to R0)
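Putting the three conditions together, a hedged sketch of the commit check as it might run at a command's origin replica (the bookkeeping structures are assumptions for illustration):

```python
# Sketch of the C1-C3 commit check at a command's origin replica.
# logged_by[c] = set of replicas known to have logged command c;
# max_ts_from[r] = largest timestamp seen from replica r (via Prep,
# PrepOK, or ClockTime); commands are keyed by (ts, origin).

def is_committed(cmd, replicas, logged_by, max_ts_from, earlier_cmds):
    majority = len(replicas) // 2 + 1
    ts, origin = cmd

    # C1: majority replication -- cmd itself is logged by a majority.
    if len(logged_by[cmd]) < majority:
        return False

    # C2: stable order -- every other replica has sent a timestamp
    # greater than cmd's, so no earlier-ordered command can still arrive.
    if any(max_ts_from[r] <= ts for r in replicas if r != origin):
        return False

    # C3: prefix replication -- everything ordered before cmd is also
    # replicated by a majority.
    return all(len(logged_by[c]) >= majority for c in earlier_cmds)
```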

Overlapping Steps
[Figure: once R0 logs cmd1 (ts = 24) and broadcasts Prep, majority replication, stable order, and prefix replication all proceed in parallel]
Latency of cmd1: about 1 RTT to the majority

Commit Latency

Step                   Latency
Majority replication   1 RTT (majority1)
Stable order           0.5 RTT (farthest)
Prefix replication     1 RTT (majority2)

Overall latency = MAX{ 1 RTT (majority1), 0.5 RTT (farthest), 1 RTT (majority2) }
If 0.5 RTT (farthest) < 1 RTT (majority), then overall latency ≈ 1 RTT (majority).
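The MAX formula can be evaluated directly from an RTT matrix. A back-of-the-envelope sketch with invented numbers (not the EC2 measurements from the talk), approximating the majority2 term by majority1 from the origin's view:

```python
# Back-of-the-envelope evaluation of the commit-latency formula at one
# replica, from a symmetric RTT matrix. The numbers are invented for
# illustration; they are not the measurements from the evaluation.

def clock_rsm_latency(rtt, origin):
    n = len(rtt)
    peers = sorted(rtt[origin][j] for j in range(n) if j != origin)
    majority1 = peers[n // 2 - 1]   # 1 RTT to the closest majority
    farthest = peers[-1] / 2        # 0.5 RTT to the farthest peer
    # The majority2 term depends on other origins' paths; approximating
    # it by majority1 keeps the sketch to one replica's view.
    return max(majority1, farthest, majority1)

# Hypothetical RTTs (ms) among 5 replicas:
rtt = [
    [0, 80, 150, 180, 120],
    [80, 0, 90, 240, 170],
    [150, 90, 0, 280, 260],
    [180, 240, 280, 0, 90],
    [120, 170, 260, 90, 0],
]
print(clock_rsm_latency(rtt, 0))  # max(120, 90, 120) = 120 ms
```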

Topology Examples
[Figure: two five-replica topologies; for R0's client requests, each marks R0's closest majority (majority1) and farthest peer]

Outline
– Clock-RSM
– Comparison with Paxos
– Evaluation
– Conclusion

Paxos 1: Multi-Paxos
A single leader orders commands
– Logical clock: 0, 1, 2, 3, ...
[Figure: a follower forwards the client request to the leader R2; the leader runs Prep/PrepOK with a majority, then sends Commit, and the follower replies to the client]
Latency at followers: 2 RTTs (leader & majority)

Paxos 2: Paxos-bcast
Every replica broadcasts PrepOK
– Trades message complexity for latency
[Figure: a follower forwards the client request to the leader R2; the leader broadcasts Prep and every replica broadcasts PrepOK, so followers learn the outcome without a separate Commit]
Latency at followers: 1.5 RTTs (leader & majority)

Clock-RSM vs. Paxos
With realistic topologies, Clock-RSM has
– Lower latency at Paxos follower replicas
– Similar or slightly higher latency at the Paxos leader

Protocol      Latency
Clock-RSM     All replicas: 1 RTT (majority), if 0.5 RTT (farthest) < 1 RTT (majority)
Paxos-bcast   Leader: 1 RTT (majority); Follower: 1.5 RTTs (leader & majority)
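To make the table concrete, both rows can be evaluated on the same invented RTT matrix used above (illustrative assumptions only; the 1.5 RTTs decompose as 0.5 RTT to forward to the leader plus the leader's 1-RTT majority round in the symmetric case):

```python
# Paxos-bcast latency per the table's formulas, on an invented RTT
# matrix (same hypothetical numbers as the Clock-RSM sketch above).

def paxos_bcast_latency(rtt, replica, leader):
    n = len(rtt)
    peers = sorted(rtt[leader][j] for j in range(n) if j != leader)
    majority_round = peers[n // 2 - 1]   # leader's 1 RTT to a majority
    if replica == leader:
        return majority_round            # leader: 1 RTT (majority)
    # Follower: 0.5 RTT to forward to the leader, then the majority
    # round -- the table's "1.5 RTTs" in the symmetric case.
    return rtt[replica][leader] / 2 + majority_round

rtt = [
    [0, 80, 150, 180, 120],
    [80, 0, 90, 240, 170],
    [150, 90, 0, 280, 260],
    [180, 240, 280, 0, 90],
    [120, 170, 260, 90, 0],
]
for r in range(5):
    print(r, paxos_bcast_latency(rtt, r, leader=1))
# Compare with clock_rsm_latency(rtt, r) from the earlier sketch.
```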

Outline
– Clock-RSM
– Comparison with Paxos
– Evaluation
– Conclusion

Experiment Setup
Replicated key-value store
Deployed on Amazon EC2
– California (CA), Virginia (VA), Ireland (IR), Singapore (SG), Japan (JP)

Latency (1/2)
All replicas serve client requests

Overlapping vs. Separate Steps
[Figure: per-step latency at each site (CA, VA (leader), IR, SG, JP)]
Clock-RSM latency: max of the three steps
Paxos-bcast latency: sum of the three steps

Latency (2/2)
The Paxos leader is moved to CA

Throughput
Five replicas on a local cluster
Message batching is key to high throughput

Also in the Paper
– A reconfiguration protocol
– Comparison with Mencius
– Latency analysis of the protocols

Conclusion
Clock-RSM: low-latency geo-replication
– Uses loosely synchronized physical clocks
– Overlaps ordering and replication
Leader-based protocols can incur high latency