BREWER’S CONJECTURE AND THE FEASIBILITY OF CONSISTENT, AVAILABLE, PARTITION-TOLERANT WEB SERVICES. Paper by Seth Gilbert and Nancy Lynch; conjecture by Eric Brewer. Presented by Kfir Lev-Ari

INTRODUCTION Brewer’s Conjecture (stated at PODC 2000): it is impossible for a web service to provide all three of the following guarantees: o Consistency o Availability o Partition tolerance

STORY TIME The story of 2 servers 1 CAP

MOTIVATION (1) And you have a brilliant idea! You’ll create a web service named:

MOTIVATION (2) You’ll give your users the following API: SetValue(Key, Value) GetValue(Key) And you’ll promise them two basic things: 1. To be available 24/7 2. GetValue will return the last value that was set for a given key.
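To make the promise concrete, here is a minimal single-node sketch of such a service in Python. The class and method names are illustrative assumptions (the slides only name the two API calls); with a single copy of the data, both promises hold trivially, which is exactly what distribution will break.

```python
# Minimal, single-node sketch of the promised key-value API.
# Class and method names are illustrative; the paper defines no API.

class KeyValueService:
    def __init__(self):
        self._store = {}

    def set_value(self, key, value):
        """Store `value` under `key`, overwriting any previous value."""
        self._store[key] = value

    def get_value(self, key):
        """Return the last value set for `key` (None if never set)."""
        return self._store.get(key)

# With one copy of the data, consistency and availability are trivial.
svc = KeyValueService()
svc.set_value("x", 1)
assert svc.get_value("x") == 1
```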

MOTIVATION (3)

MOTIVATION (4)

MOTIVATION (5)

MOTIVATION (6)

MOTIVATION (7)

FORMAL MODEL AND THEOREM

FORMAL MODEL (1) Atomic / Linearizable Consistency (of a web service) – o There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. o In other words, each server returns the right response to each request. o Equivalent to having a single up-to-date copy of the data.
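To make the definition tangible, here is a small brute-force atomicity check for a single register in Python. The operation tuples and the checker are illustrative assumptions, not anything defined in the paper; the code simply searches for a total order that respects real time and ordinary read/write semantics.

```python
# Brute-force sketch: is a history of single-register operations atomic
# (linearizable)? Purely illustrative; not from the paper.
from itertools import permutations

def linearizable(history, initial=None):
    """history: list of (start, end, kind, value) with kind 'w' or 'r'."""
    for order in permutations(history):
        # Real-time constraint: an operation that finished before another
        # one started must not be placed after it in the total order.
        if any(b[1] < a[0] for i, a in enumerate(order) for b in order[i + 1:]):
            continue
        # Register semantics: every read returns the latest preceding write.
        current = initial
        for _, _, kind, value in order:
            if kind == 'w':
                current = value
            elif value != current:
                break
        else:
            return True   # found a total order that explains every read
    return False

# write(1) completes, so a later read must return 1, not the initial value.
assert linearizable([(0, 1, 'w', 1), (2, 3, 'r', 1)])
assert not linearizable([(0, 1, 'w', 1), (2, 3, 'r', None)])
```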

FORMAL MODEL (2) Availability (of a web service) – o Every request received by a non-failing node in the system must result in a response. o In other words – any algorithm used by the service must eventually terminate. o Note that there is no bound on how long the algorithm may run before terminating, and therefore the theorem allows unbounded computation. o On the other hand, even when severe network failures occur, every request must terminate.

FORMAL MODEL (3) Partition Tolerance (of a web service) – o When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost. o Note that unlike the previous two requirements, partition tolerance is really a statement about the underlying system rather than the service itself: it is the communication among the servers that is unreliable.

NOTE – THIS CAP ISN’T MADE OF ACID The ACID (Atomicity, Consistency, Isolation, Durability) properties focus on consistency and are the traditional approach of databases. The CAP properties describe a desirable networked shared-data system. In ACID, C means that a transaction preserves all the database rules, such as unique keys. (ACID consistency cannot be maintained across partitions.) The C in CAP refers only to single-copy consistency (a request/response operation sequence), a strict subset of ACID consistency.

ASYNCHRONOUS NETWORKS (1) Theorem 1 It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties: Availability Atomic consistency in all fair executions (including those in which messages are lost).

ASYNCHRONOUS NETWORKS (2) Proof: We prove this by contradiction. Assume an algorithm A exists that meets the three criteria: atomicity, availability, and partition tolerance. We construct an execution of A in which there exists a request that returns an inconsistent response. Assume that the network consists of at least two nodes. Thus it can be divided into two disjoint, non-empty sets: {G1, G2}. The basic idea of the proof is to assume that all messages between G1 and G2 are lost. If a write occurs in G1, and later a read occurs in G2, then the read operation cannot return the results of the earlier write operation.

ASYNCHRONOUS NETWORKS (3) The good scenario: 1. A writes a new value of V, which we’ll call V1. 2. Then a message (M) is passed from N1 to N2, which updates the copy of V there. 3. Now any read by B of V will return V1. If the network partitions (that is, messages from N1 to N2 are not delivered), then N2 contains an inconsistent value of V when step (3) occurs.
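The same scenario can be written as a toy Python sketch; the Replica class, the partitioned flag, and the value names are illustrative assumptions used only to make the two outcomes of step (3) concrete.

```python
# Toy sketch of the execution in the proof: two replicas N1 and N2, a write
# applied at N1, and a read served by N2. Names are illustrative only.

class Replica:
    def __init__(self, value=None):
        self.value = value

def execute(partitioned):
    n1, n2 = Replica(), Replica()
    n1.value = "V1"                 # step 1: A writes the new value V1 at N1
    if not partitioned:
        n2.value = n1.value         # step 2: the update message M reaches N2
    return n2.value                 # step 3: B reads V at N2

assert execute(partitioned=False) == "V1"   # the good scenario
assert execute(partitioned=True) is None    # partition: B reads a stale value
```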

ASYNCHRONOUS NETWORKS (4) Corollary 1.1 It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties: Availability, in all fair executions. Atomic consistency, in fair executions in which no messages are lost.

ASYNCHRONOUS NETWORKS (5) Proof: The main idea is that in the asynchronous model an algorithm has no way of determining whether a message has been lost, or has been arbitrarily delayed in the transmission channel. Therefore if there existed an algorithm that guaranteed atomic consistency in executions in which no messages were lost, then there would exist an algorithm that guaranteed atomic consistency in all executions. This would violate Theorem 1.

FROM A TRANSACTIONAL PERSPECTIVE Say we have a transaction called α: in α1, A writes new values of V, and in α2, B reads values of V. On a local system this would be easily handled by a database with some simple locking, isolating any attempt to read in α2 until α1 completes safely. In the distributed model, though, with nodes N1 and N2 to worry about, the intermediate synchronizing message also has to complete. Unless we can control when α2 happens, we can never guarantee it will see the same data values α1 writes. All methods to add control (blocking, isolation, centralized management, etc.) will impact either partition tolerance or the availability of α1 (A) and/or α2 (B).

SOLUTIONS IN THE ASYNCHRONOUS MODEL (1) “2 of 3”: o CP (Atomic consistency, Partition Tolerant): The trivial system that ignores all requests is CP; a more useful, stronger liveness criterion is to respond whenever all messages are delivered. Many distributed databases provide this type of guarantee, especially algorithms based on distributed locking or quorums: if certain failure patterns occur, then the liveness condition is weakened and the service no longer returns responses; if there are no failures, then liveness is guaranteed (see the quorum sketch below). o CA (Atomic consistency, Available): Systems that run on intranets and LANs are an example of these types of algorithms. o AP (Available, Partition Tolerant): Web caches are one example of a weakly consistent network.
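As an illustration of the CP bullet above, here is a rough majority-quorum register in Python. It is a sketch under assumed names (QuorumRegister, reachable, and so on), not an algorithm from the paper: it keeps reads and writes consistent by always touching a majority, and simply refuses to answer when a partition leaves it without one.

```python
# Rough sketch of a majority-quorum register: consistent and partition
# tolerant, but unavailable when fewer than a majority of replicas respond.

class QuorumError(Exception):
    """Raised instead of answering when no majority of replicas is reachable."""

class QuorumRegister:
    def __init__(self, replicas):
        self.replicas = replicas          # name -> (version, value)
        self.reachable = set(replicas)    # shrinks when a partition occurs

    def _quorum(self):
        needed = len(self.replicas) // 2 + 1
        if len(self.reachable) < needed:
            raise QuorumError("no majority reachable; refusing to respond")
        return list(self.reachable)[:needed]

    def write(self, value):
        version = 1 + max(v for v, _ in self.replicas.values())
        for name in self._quorum():
            self.replicas[name] = (version, value)

    def read(self):
        # Any two majorities overlap, so the newest version is always seen.
        return max(self.replicas[name] for name in self._quorum())[1]

r = QuorumRegister({"a": (0, None), "b": (0, None), "c": (0, None)})
r.write("v1")
assert r.read() == "v1"        # consistent while a majority is reachable
r.reachable = {"a"}            # a partition isolates a minority
try:
    r.read()
except QuorumError:
    pass                       # availability is sacrificed, consistency is not
```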

SOLUTIONS IN THE ASYNCHRONOUS MODEL (2)

CIRCUMVENT THE IMPOSSIBILITY? (1)

CIRCUMVENT THE IMPOSSIBILITY? (2) Theorem 2 It is impossible in the partially synchronous network model to implement a read/write data object that guarantees the following properties: Availability Atomic consistency in all executions (even those in which messages are lost)

CIRCUMVENT THE IMPOSSIBILITY? (3) Proof: The same methodology as in Theorem 1 is used. We divide the network into two components {G1, G2} and construct an admissible execution in which a write happens in one component, followed by a read operation in the other component. This read operation can be shown to return inconsistent data.

CIRCUMVENT THE IMPOSSIBILITY? (4) (Reminder from the asynchronous model) Corollary 1.1: It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties: Availability, in all fair executions; Atomic consistency, in fair executions in which no messages are lost. In the partially synchronous model the analogue of Corollary 1.1 does not hold: the proof of that corollary depends on nodes being unaware of when a message is lost. There are partially synchronous algorithms that will: 1. return atomic data when all messages in an execution are delivered (i.e., there are no partitions), and 2. return inconsistent data only when messages are lost.

CIRCUMVENT THE IMPOSSIBILITY? (5)

CIRCUMVENT THE IMPOSSIBILITY? (6)

CAP OF CONFUSION

WHY IS “2 OF 3” MISLEADING? (1) I. Partitions are rare, and there is little reason to forfeit C or A when the system is not partitioned. II. The choice between C and A can occur many times within the same system at very fine granularity. Not only can subsystems make different choices, but the choice can change according to the operation or even the specific data or user involved. III. All three properties are more continuous than binary: 1. Availability is obviously continuous from 0% to 100%. 2. There are many levels of consistency. 3. Partitions have nuances, including disagreement within the system about whether a partition exists.

WHY IS “2 OF 3” MISLEADING? (2)

CAP-LATENCY CONNECTION The essence of CAP takes place during a timeout, a period when the program must make a fundamental decision – the partition decision: cancel the operation and thus decrease availability, or proceed with the operation and thus risk inconsistency. In its classic interpretation, the CAP theorem ignores latency, although in practice latency and partitions are deeply related: pragmatically, a partition is a time bound on communication. Failing to achieve consistency within the time bound (due to high latency) implies a partition and thus a choice between C and A for this operation. In addition, some systems (for example, Yahoo’s PNUTS) give up consistency not to improve availability, but to achieve lower latency.
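A small Python sketch of that timeout-driven decision follows. The helper name, the timeout value, and the return strings are illustrative assumptions; the commented-out line marks the alternative branch an AP-leaning system would take instead.

```python
# Sketch of the "partition decision": after a timeout the caller must either
# cancel (stay consistent, lose availability) or proceed (stay available,
# risk inconsistency). All names and the timeout value are illustrative.
import queue, threading, time

def replicate_with_timeout(send_to_replica, value, timeout_s=0.05):
    done = queue.Queue()
    threading.Thread(target=lambda: done.put(send_to_replica(value)),
                     daemon=True).start()
    try:
        done.get(timeout=timeout_s)   # replica acknowledged in time
        return "committed"
    except queue.Empty:
        # The partition decision: keep exactly one of the two lines below.
        return "cancelled"            # CP-style: refuse, stay consistent
        # return "committed-locally"  # AP-style: proceed, risk inconsistency

replicate_with_timeout(lambda v: True, "x")            # -> "committed"
replicate_with_timeout(lambda v: time.sleep(1), "x")   # -> "cancelled"
```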

MORE PROBLEMS WITH CAP? There is no real use for CP systems that are never available, so the real meaning of CP is that availability is sacrificed only when there is a network partition. In practice, this makes the roles of A and C in CAP asymmetric: systems that sacrifice consistency (AP systems) tend to do so all the time, not just when there is a network partition. Is there any practical difference between CA and CP systems? As written above, a CP system sacrifices availability when there is a network partition. A CA system is not tolerant of network partitions, so it won’t be available if there is a partition. Practically speaking, then, CA and CP are identical. The only real question is: what are you going to give up during a partition, C or A? “2 out of 3” is just confusing.

THE BIG (THEORETICAL) PICTURE

WHAT’S THE REAL STORY HERE? (1)

WHAT’S THE REAL STORY HERE? (2) The replicated state machine paradigm is one of the most common approaches for building reliable distributed services. This paradigm achieves availability by replicating the service across a set of servers; the servers then agree (i.e., run consensus) on every operation performed by the service. The impossibility of fault-tolerant consensus, proved by Fischer, Lynch, and Paterson in 1985, implies that services built according to the replicated state machine paradigm cannot achieve both availability and consistency in an asynchronous network.
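The following compressed Python sketch shows the shape of the paradigm; the shared agreed_log list stands in for a real consensus protocol, and all class and variable names are illustrative assumptions. Filling that log is the agreement step that cannot be both available and consistent in an asynchronous, partitionable network.

```python
# Sketch of the replicated state machine paradigm: every replica applies the
# same agreed sequence of operations, so all replicas expose the same state.
# The shared `agreed_log` stands in for a real consensus protocol.

class Replica:
    def __init__(self):
        self.state = {}
        self.applied = 0

    def catch_up(self, agreed_log):
        # Apply, in order, every operation the replicas have agreed on.
        for key, value in agreed_log[self.applied:]:
            self.state[key] = value
            self.applied += 1

agreed_log = [("x", 1), ("y", 2)]          # output of the consensus step
replicas = [Replica(), Replica(), Replica()]
for r in replicas:
    r.catch_up(agreed_log)
assert all(r.state == {"x": 1, "y": 2} for r in replicas)
```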

CONCLUSION We have shown that it is impossible to reliably provide atomic, consistent data when there are partitions in the network. It is feasible, however, to achieve any two of the three properties: consistency, availability, and partition tolerance. In an asynchronous model, when no clocks are available, the impossibility result is fairly strong: it is impossible to provide consistent data, even allowing stale data to be returned when messages are lost. However, in partially synchronous models it is possible to achieve a practical compromise between consistency and availability.

REFERENCES
1. Seth Gilbert and Nancy Lynch, “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services,” ACM SIGACT News, June 2002.
2. Eric Brewer, “CAP Twelve Years Later: How the ‘Rules’ Have Changed,” IEEE Computer, vol. 45, no. 2, Feb. 2012.
3. Seth Gilbert and Nancy Lynch, “Perspectives on the CAP Theorem,” IEEE Computer, vol. 45, no. 2, Feb. 2012.

APPENDIX A – PROOF OF THEOREM 1

APPENDIX B – PROOF OF COROLLARY 1.1

APPENDIX C – PROOF OF THEOREM 2

BACKUP SLIDES

WEAKER CONSISTENCY CONDITIONS (1) While it is useful to guarantee that atomic data will be returned in executions in which all messages are delivered, it is equally important to specify what happens in executions in which some of the messages are lost. We discuss a possible weaker consistency condition that allows stale data to be returned when there are partitions, yet places formal requirements on the quality of the stale data returned. This consistency guarantee requires availability and atomic consistency in executions in which no messages are lost, and is therefore impossible to guarantee in the asynchronous model as a result of Corollary 1.1. In the partially synchronous model it often makes sense to base guarantees on how long an algorithm has had to rectify a situation. This consistency model ensures that if messages are delivered, then eventually some notion of atomicity is restored.

WEAKER CONSISTENCY CONDITIONS (2) In an atomic execution, we define a partial order of the read and write operations and then require that if one operation begins after another one ends, the former does not precede the latter in the partial order. We define a weaker guarantee, t-Connected Consistency, which defines a partial order in a similar manner, but only requires that one operation not precede another if there is an interval between the operations in which all messages are delivered.

WEAKER CONSISTENCY CONDITIONS (3) A timed execution α of a read-write object is t-Connected Consistent if two criteria hold. First, in executions in which no messages are lost, the execution is atomic. Second, in executions in which messages are lost, there exists a partial order P on the operations in α such that: 1. P orders all write operations, and orders all read operations with respect to the write operations. 2. The value returned by every read operation is exactly the one written by the previous write operation in P, or the initial value if there is no such previous write in P. 3. The order in P is consistent with the order of read and write requests submitted at each node. 4. Assume that there exists an interval of time longer than t in which no messages are lost; further, assume an operation θ completes before the interval begins and another operation φ begins after the interval ends. Then φ does not precede θ in the partial order P.

WEAKER CONSISTENCY CONDITIONS (4) t-Connected Consistency: o This guarantee allows for some stale data when messages are lost, but provides a time limit on how long it takes for consistency to return once the partition heals. o This definition can be generalized to provide consistency guarantees when only some of the nodes are connected and when connections are available only some of the time.

WEAKER CONSISTENCY CONDITIONS (5) A variant of the “centralized algorithm” is t-Connected Consistent. Assume node C is the centralized node. The algorithm behaves as follows: Read at node A: A sends a request to C for the most recent value. If A receives a response from C within time 2 ∗ tmsg + tlocal, it saves the value and returns it to the client. Otherwise, A concludes that a message was lost and it returns the value with the highest sequence number that has ever been received from C, or the initial value if no value has yet been received from C. (When a client read request occurs at C, it acts like any other node, sending messages to itself.)

WEAKER CONSISTENCY CONDITIONS (6) Write at A: A sends a message to C with the new value. A waits 2 ∗ tmsg + tlocal, or until it receives an acknowledgement from C, and then sends an acknowledgement to the client. At this point, either C has learned of the new value, or a message was lost, or both events occurred. If A concludes that a message was lost, it periodically retransmits the value to C (along with all values lost during earlier write operations) until it receives an acknowledgement from C. (As in the case of read operations, when a client write request occurs at C, it acts like any other node, sending messages to itself.) New value is received at C: C serializes the write requests that it hears about by assigning them consecutive integer tags. Periodically, C broadcasts the latest value and sequence number to all other nodes.
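A rough Python sketch of node A's side of these rules follows; ask_c and tell_c stand in for the messages exchanged with C and may fail to return within the waiting period (modeled as None / False), and every name and constant here is an illustrative assumption rather than the paper's notation.

```python
# Rough sketch of node A's behaviour in the modified centralized algorithm.
# `ask_c` / `tell_c` stand in for the network and model a lost or late
# message by returning None / False after the waiting period 2*tmsg + tlocal.

class NodeA:
    def __init__(self, ask_c, tell_c, initial=None):
        self.ask_c = ask_c              # request (seq, value) from C, or None
        self.tell_c = tell_c            # send a write to C; True iff acked
        self.best = (0, initial)        # highest (seq, value) seen from C
        self.unacked = []               # writes C has not yet acknowledged

    def read(self):
        reply = self.ask_c()            # waits at most 2*tmsg + tlocal
        if reply is not None:
            self.best = max(self.best, reply)
            return reply[1]
        return self.best[1]             # fall back to the freshest known value

    def write(self, value):
        if not self.tell_c(value):      # no ack in time: assume a loss
            self.unacked.append(value)  # retransmit later, with older losses
        return "ack"                    # acknowledge the client either way
```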

WEAKER CONSISTENCY CONDITIONS (7)

WEAKER CONSISTENCY CONDITIONS (8) Theorem 4: The modified centralized algorithm is t-Connected Consistent.