Fault-Tolerant SemiFast Implementations of Atomic Read/Write Registers


Fault-Tolerant SemiFast Implementations of Atomic Read/Write Registers
Nicolas Nicolaou, University of Connecticut
Joint work with: C. Georgiou, University of Cyprus; A. A. Shvartsman, University of Connecticut
11/28/2019

What is an Atomic R/W Register?
[Diagram: concurrent Write(7), Write(0), and Read operations on a shared register.]
The goal of this work is to investigate efficient implementations of atomic read/write registers. The read/write property imposes that only read or write operations can be performed on the register: a write operation writes a value to the register, and a read operation returns the value written. The register is atomic if all operations performed on it can be ordered in a sequential manner. Achieving atomic consistency on a single register is relatively easy, since only one operation can access the register at each time unit. However, a single copy of the register constitutes a single point of failure and makes the system very vulnerable. To increase availability and fault tolerance we consider a distributed read/write register, where multiple copies of the register are replicated among a set of processes. Some of these replicas might fail, in our case by crashing. The challenge is to maintain atomicity even though, in the distributed environment, the register might be accessed concurrently by more than a single process. Essentially, we need to order the operations on the distributed register so that they appear to happen sequentially.

Prior Results
Attiya et al. 1995, Single Writer Multiple Reader (SWMR) model where fewer than half of the processes may crash:
- Pairs <value, tag> are used for ordering operations
- Writer increases the tag and sends <value, tag> to a majority
- Reader, Phase 1: obtains the maximum tag from a majority; Phase 2: propagates the tag to a majority and then returns the value associated with that tag
Lynch, Shvartsman 1997 and Englert, Shvartsman 2000 extend the above result to MWMR:
- Quorums instead of majorities
- 2-round protocols for read and write operations
Here we present some prior results in this area. Attiya et al. presented the first implementation of a read/write register in the SWMR model, assuming that a majority of the processes holding replicas do not crash. They use <value, tag> pairs (something we are going to use as well) to order the write operations. The protocols for the write and read operations are simple. The writer sends the value it wants to write, with a new tag, to a majority of the processes. For a read operation, the reader obtains the <value, tag> pairs from a majority of the processes, detects the highest tag among them, and, before returning the associated value, propagates the <maxTag, value> pair to a majority. A generalization of this first work suggested the use of quorums instead of majorities. Note that both approaches use 2 communication rounds (2 phases) for a read operation.
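The majority-based protocol above can be sketched as a single-process Python simulation, with message exchanges replaced by direct calls to a random majority of server objects. All names here (Server, abd_write, abd_read) are illustrative; this is a sketch of the technique, not the authors' code.

```python
import random

class Server:
    """One replica: keeps the highest <tag, value> pair it has seen."""
    def __init__(self):
        self.tag, self.value = 0, None

    def update(self, tag, value):
        if tag > self.tag:
            self.tag, self.value = tag, value

    def query(self):
        return self.tag, self.value

def some_majority(servers):
    # Any majority may answer first; pick a random one to model asynchrony.
    return random.sample(servers, len(servers) // 2 + 1)

def abd_write(servers, writer_tag, value):
    # Single writer: increase the tag and send <value, tag> to a majority.
    new_tag = writer_tag + 1
    for s in some_majority(servers):
        s.update(new_tag, value)
    return new_tag  # the writer remembers its latest tag locally

def abd_read(servers):
    # Phase 1: obtain the maximum tag from a majority.
    replies = [s.query() for s in some_majority(servers)]
    max_tag, value = max(replies, key=lambda r: r[0])
    # Phase 2: propagate <maxTag, value> to a majority before returning.
    for s in some_majority(servers):
        s.update(max_tag, value)
    return value
```

Because any two majorities intersect, phase 1 of a read always meets at least one server holding the latest complete write, and the phase 2 write-back guarantees that a later read cannot return an older value.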

Fast Implementations
Dutta, Guerraoui, Levy, Chakraborty 2004:
- SWMR model
- Single communication round for all write and read operations
- Requires R < (S/t) - 2, where R: # readers, S: # servers, t: max # server failures
- Not applicable to MWMR
A result of Dutta et al., presented at PODC 2004, showed that fast implementations, where both write and read operations perform a single communication round, are possible in the SWMR model. The main disadvantage of this work is that it introduces a strict constraint on the number of reader participants in the system. In particular, the number of readers is inversely proportional to the number of failures tolerated; as a consequence, the number of readers must be strictly less than the number of servers (replicas) if there is at least one failure in the system. The authors of that work posed the question of whether one can obtain semifast implementations, with fast reads or fast writes, that relax the bound on the number of readers. This is the question we try to answer in this work, and we provide some analysis of our results.
Question: Can one introduce semifast implementations (with fast reads or fast writes) to relax the bound on the number of readers?
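To see how restrictive the R < (S/t) - 2 bound is, one can tabulate it for a fixed, purely hypothetical configuration (the helper name is illustrative):

```python
def max_fast_readers(S, t):
    """Largest R satisfying the strict inequality R < S/t - 2."""
    bound = S / t - 2
    # If the bound is an exact integer, R must be strictly below it.
    return int(bound) - 1 if bound == int(bound) else int(bound)

# With S = 20 servers, the admissible number of readers shrinks quickly
# as the number of tolerated server crashes t grows.
for t in range(1, 6):
    print(f"t={t}: at most {max_fast_readers(20, t)} readers")
```

With 20 servers, tolerating a single crash already caps the system at 17 readers, and tolerating 5 crashes leaves room for only 1 reader.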

Our Contributions
- Formally define semifast implementations
- Develop a semifast implementation, based on the fast implementation of Dutta et al. 2004; introduce the notion of virtual nodes
- Bound on the number of virtual nodes: V < (S/t) - 2
- Show that no semifast implementations are possible for MWMR, even allowing n communication rounds for reads
- Simulation results: a small percentage of read operations requires a second communication round

Semifast Implementations
Def. An implementation I is semifast if it satisfies the following properties (informally):
- All writes are fast
- All complete read operations perform one or two communication rounds
- If a read operation ρ1 performs two communication rounds, then all read operations that precede or succeed ρ1 and return the same* value as ρ1 are fast
- There exists some execution of I which contains only fast read and write operations
* Assuming all written values are unique

Simulation Results
- NS2 simulator
- Only 10% of read operations need to perform a 2nd communication round
- Stochastic environment and fixed-interval environment

Conclusions
- Definition of semifast implementations: only one complete read operation has to perform 2 communication rounds for every write operation
- #Virtual nodes < (S/t) - 2
- No semifast implementation is possible for the MWMR model

References
- P. Dutta, R. Guerraoui, R. R. Levy, and A. Chakraborty. How fast can a distributed atomic read be? In Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing (PODC 2004), pp. 236-245, ACM Press, 2004.
- S. Dolev, S. Gilbert, N. A. Lynch, A. A. Shvartsman, and J. L. Welch. GeoQuorums: Implementing atomic memory in mobile ad hoc networks. Technical Report LCS-TR-900, MIT, 2003.
- N. Lynch and A. Shvartsman. RAMBO: A reconfigurable atomic memory service for dynamic networks. In Proceedings of the 16th International Symposium on Distributed Computing, pp. 173-190, 2002.
- H. Attiya, A. Bar-Noy, and D. Dolev. Sharing memory robustly in message-passing systems. Journal of the ACM, January 1995.
- B. Englert and A. A. Shvartsman. Graceful quorum reconfiguration in a robust emulation of shared memory. In International Conference on Distributed Computing Systems, pp. 454-463, 2000.
- N. A. Lynch and A. A. Shvartsman. Robust emulation of shared memory using dynamic quorum-acknowledged broadcasts. In Symposium on Fault-Tolerant Computing, pp. 272-281, 1997.

Questions?

Atomicity [Lynch96]
Definition: read and write operations take an arbitrary time to complete, but each operation appears to take effect at some point between its invocation and its response (its serialization point), producing a sequential trace.
[Diagram: valid executions, where a read concurrent with write(8) may return either 0 or 8; and an invalid execution, where a read returns 0 after an earlier read has already returned 8.]

Definitions
- Each process invokes one operation at a time.
- Each operation consists of an invocation step and a matching response step.
- Incomplete operation: no matching response for the invocation.
- A complete operation op1 precedes op2 if the response for op1 precedes the invocation of op2.
- If op is a read we write "rd"; if op is a write we write "wr".

Definitions (Cont.)
An algorithm implements a register if it satisfies the termination and atomicity properties.
Termination: every operation by a correct process completes.
Atomicity (SWMR, wr_k: the k-th write):
- If rd returns x, then there is a wr_k such that val_k = x
- If wr_k precedes rd and rd returns val_j, then j ≥ k
- If rd returns val_k, then wr_k precedes or is concurrent with rd
- If rd1 returns val_k and a succeeding rd2 returns val_j, then j ≥ k
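On a trace of complete, non-overlapping operations, two of these conditions (a read never returns a value older than a preceding write, and successive reads never go backwards in write order) can be checked mechanically. A hypothetical helper, purely for illustration; concurrency, which the remaining conditions need, is not modeled:

```python
def check_swmr_atomicity(trace):
    """Check two SWMR atomicity conditions on a sequential trace.

    trace: list of ('wr', k) or ('rd', k) in real-time order, where k is the
    index of the write whose value val_k the operation writes or returns
    (k = 0 denotes the initial value). Operations are assumed complete and
    non-overlapping.
    """
    last_write = 0  # index of the latest preceding write
    last_read = 0   # index returned by the most recent read
    for op, k in trace:
        if op == 'wr':
            last_write = k
        else:
            if k < last_write:   # read older than a preceding write
                return False
            if k < last_read:    # new/old inversion between reads
                return False
            last_read = k
    return True
```

For example, an execution in which a read returns val_0 after another read has already returned val_1 is rejected, matching the invalid execution in the atomicity slide.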

Atomic vs Shared Register
Sequential register:
- Accessible from a single process
- Write(v): stores the value v and returns OK
- Read(): returns the last value stored
Atomic register:
- A distributed data structure
- Accessed by multiple processes concurrently
- Behaves like a sequential register (recall atomicity)
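The sequential interface above is trivial to realize within one process (a minimal sketch; the class name is illustrative):

```python
class SequentialRegister:
    """A register accessed by a single process: reads return the last write."""
    def __init__(self, initial=0):
        self._value = initial

    def write(self, v):
        self._value = v
        return "OK"

    def read(self):
        return self._value
```

The hard part, addressed by the algorithms in this talk, is making a replicated, concurrently accessed register behave indistinguishably from this one.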

Atomic vs Shared Register (Graphical)
[Diagram: a sequential register applying Write(8) and then Read() returning 8, contrasted with an atomic register accessed concurrently, where Write(8), Read1() returning 8, and an overlapping Read2() returning 0 must still be serializable.]

When is a SemiFast Implementation Impossible?
When V ≥ (S/t) - 2:
- If V ≥ (S/t) - 1, then there is no fast implementation, even in the case of a skip-free write operation (violates the non-triviality property, Property 3).
- If V = (S/t) - 2, then there is an execution where 2 complete read operations need to perform 2 communication rounds (violates Property 1).
When V = (S/t) - 2:
- There exists an execution where 2 read operations return the same value and both perform 2 communication rounds (violates Property 2).

No Semifast Implementation for the MWMR Model
Proof sketch:
- Split multiple-round operations into read phases and write phases.
- Show that as soon as an operation performs a write phase, it cannot change its return value.
- Show a construction with W=2, R=2, and t=1 in which atomicity is violated.

Challenge
How fast can a general implementation of an atomic register be?
- Dynamic environments (mobility)
- Hybrid implementations, where some read and write operations perform multiple round trips: what is the communication overhead of such implementations?
- Quorum-based algorithms: how fast can they be?