Conflict-free Replicated Data Types
Marc Shapiro, Nuno Preguiça, Carlos Baquero and Marek Zawirski
Presented by: Ron Zisman

Motivation
- Replication and consistency: essential features of large distributed systems such as the web, P2P, and cloud computing
- Lots of replicas: great for fault tolerance and read latency
- But problematic when updates occur: slow synchronization, or conflicts when there is no synchronization

Motivation
- We look for an approach that:
  - supports replication
  - guarantees eventual consistency
  - is fast and simple
- Conflict-free objects = no synchronization whatsoever
- Is this practical?

Contributions
Theory: Strong Eventual Consistency (SEC)
- A solution to the CAP problem
- Formal definitions
- Two sufficient conditions
- Strong equivalence between the two
- Incomparable to sequential consistency
Practice: CRDTs = Convergent or Commutative Replicated Data Types
- Counters
- Set
- Directed graph

Strong Consistency
Ideal consistency: all replicas know about the update immediately after it executes
- Preclude conflicts
- Replicas update in the same total order
- Any deterministic object
- Consensus
- Serialization bottleneck
- Tolerates < n/2 faults
- Correct, but doesn't scale

Eventual Consistency
- Update local and propagate
  - No foreground synch
  - Eventual, reliable delivery
- On conflict
  - Arbitrate
  - Roll back
- Consensus moved to the background
- Better performance, but still complex

Strong Eventual Consistency
- Update local and propagate
  - No synch
  - Eventual, reliable delivery
- No conflict: deterministic outcome of concurrent updates
- No consensus: tolerates ≤ n-1 faults
- Solves the CAP problem

Definition of EC
- Eventual delivery: an update delivered at some correct replica is eventually delivered to all correct replicas
- Termination: all method executions terminate
- Convergence: correct replicas that have delivered the same updates eventually reach equivalent state
- Doesn't preclude roll backs and reconciling

Definition of SEC
- Eventual delivery: an update delivered at some correct replica is eventually delivered to all correct replicas
- Termination: all method executions terminate
- Strong Convergence: correct replicas that have delivered the same updates have equivalent state

System model
- System of non-Byzantine processes interconnected by an asynchronous network
- Partition tolerance and recovery
- What are the two simple conditions that guarantee strong convergence?

Query
- Client sends the query to any of the replicas
- Local at source replica
- Evaluate synchronously, no side effects

State-based approach
- payload (with its initial state)
- query
- update
- merge
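
As a concrete illustration of this interface, here is a minimal Python sketch of a state-based CRDT, using a grow-only set (G-Set); class and method names are illustrative, not taken from the paper.

    class GSet:
        """State-based (convergent) grow-only set: the simplest CRDT sketch."""

        def __init__(self):
            self.payload = set()       # payload with its initial state (the empty set)

        def lookup(self, e):           # query: evaluated locally, no side effects
            return e in self.payload

        def add(self, e):              # update: only ever grows the payload
            self.payload.add(e)

        def merge(self, other):        # merge: set union is the least upper bound
            self.payload |= other.payload

Each replica applies add locally and occasionally ships its whole payload to the others; merging received states in any order, any number of times, yields the same set.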

State-based replication

Semi-lattice

If:
- the payload type forms a semi-lattice
- updates are increasing
- merge computes the Least Upper Bound
then replicas converge to the LUB of the last values
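
A quick sanity check of these conditions for the grow-only set sketched above: sets under union form a semi-lattice, add only moves the state upward, and union is the least upper bound. The snippet below is an illustrative check on plain Python sets, not code from the paper.

    def merge(a, b):
        return a | b            # union = least upper bound of two G-Set payloads

    x, y, z = {"a"}, {"b"}, {"a", "c"}
    assert merge(x, x) == x                                 # idempotent
    assert merge(x, y) == merge(y, x)                       # commutative
    assert merge(merge(x, y), z) == merge(x, merge(y, z))   # associative
    assert x <= merge(x, y) and y <= merge(x, y)            # merge is an upper bound
    assert x <= x | {"d"}                                   # updates (add) are increasing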

Operation-based approach
- payload (with its initial state)
- query
- prepare-update
- effect-update
- delivery precondition

Operation-based replication
- Local at source:
  - precondition, compute
  - broadcast to all replicas
- Eventually, at all replicas:
  - downstream precondition
  - assign local replica

If:
- Liveness: all replicas execute all operations in delivery order where the downstream precondition (P) is true
- Safety: concurrent operations all commute
then replicas converge
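
As an illustration of commuting operations, here is a minimal sketch of an operation-based counter (names are mine, not the paper's): prepare produces an "increment" operation at the source, and the downstream effect adds it to the local value, so concurrent increments commute.

    class OpCounter:
        """Operation-based counter: increments commute, so the relative delivery
        order of concurrent operations does not matter."""

        def __init__(self):
            self.value = 0                    # payload with its initial state

        def read(self):                       # query
            return self.value

        def increment(self, amount=1):        # prepare-update at the source
            op = ("inc", amount)              # the operation to broadcast downstream
            self.apply(op)                    # source applies its own operation too
            return op

        def apply(self, op):                  # effect-update, executed at every replica
            kind, amount = op
            if kind == "inc":
                self.value += amount          # addition commutes with addition

This sketch assumes the op-based delivery model from the paper: a reliable broadcast delivers each operation exactly once at every replica; given that, any interleaving of concurrent increments yields the same value.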

- A state-based object can emulate an operation-based object, and vice versa
- Use state-based reasoning and then convert to operation-based for better efficiency

Comparison
State-based:
- Update ≠ merge operation
- Simple data types
- State includes preceding updates; no separate historical information
- Inefficient if payload is large
- File systems (NFS, Dynamo)
Operation-based:
- Update operation
- Higher level, more complex
- More powerful, more constraining
- Small messages
- Collaborative editing (Treedoc), Bayou, PNUTS
State-based or op-based, as convenient

SEC is incomparable to sequential consistency

Example CRDTs
- Multi-master counter
- Observed-Remove Set
- Directed Graph

Multi-master counter
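
Below is a minimal sketch of a state-based multi-master (increment-only) counter of the kind the talk refers to: it keeps one non-decreasing slot per replica, so merge can take the entry-wise maximum. Replica identifiers and method names are illustrative.

    class GCounter:
        """State-based multi-master counter: one non-decreasing slot per replica."""

        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}                   # replica id -> increments seen from it

        def value(self):                       # query: total across all replicas
            return sum(self.counts.values())

        def increment(self, amount=1):         # update: only touches this replica's slot
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

        def merge(self, other):                # merge: entry-wise max is the LUB
            for rid, n in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), n)

A counter that also supports decrement can be built from two of these, one counting increments and one counting decrements.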

Set design alternatives
- Sequential specification:
  - {true} add(e) {e ∈ S}
  - {true} remove(e) {e ∉ S}
- Concurrent: {true} add(e) ║ remove(e) {???}
  - linearizable?
  - error state?
  - last writer wins? add wins? remove wins?

Observed-Remove Set
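
As an illustration of the OR-Set idea (add wins over a concurrent remove), here is a minimal sketch: each add creates a fresh unique tag, and remove only deletes the tags it has observed, so an element added concurrently with a remove survives. Names and the use of uuid4 for tags are my choices, not the paper's.

    import uuid

    class ORSet:
        """Observed-Remove Set: remove affects only the add-tags it has observed."""

        def __init__(self):
            self.entries = {}                       # element -> set of unique tags

        def lookup(self, e):                        # query
            return bool(self.entries.get(e))

        def add(self, e):                           # each add gets a fresh, unique tag
            tag = uuid.uuid4()
            self.entries.setdefault(e, set()).add(tag)
            return ("add", e, {tag})                # operation to propagate downstream

        def remove(self, e):                        # remove only the observed tags
            observed = set(self.entries.get(e, set()))
            self.entries[e] = set()
            return ("remove", e, observed)

        def apply(self, op):                        # downstream effect at other replicas
            kind, e, tags = op
            if kind == "add":
                self.entries.setdefault(e, set()).update(tags)
            else:
                self.entries[e] = self.entries.get(e, set()) - tags

If replica A removes e while replica B concurrently adds e, B's fresh tag is not in A's observed set, so once both operations are delivered everywhere, e is still present: add wins.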

OR-Set + Snapshot

Sharded OR-Set
- Very large objects: independent shards
  - Static: hash; dynamic: consensus
- Statically-Sharded CRDT
  - Each shard is a CRDT
  - Update: single shard
  - No cross-object invariants
  - A combination of independent CRDTs remains a CRDT
- Statically-Sharded OR-Set
  - Combination of smaller OR-Sets
  - Consistent snapshots: clock across shards
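
A minimal sketch of the statically-sharded idea, assuming the ORSet class sketched above: elements are hashed to a fixed number of shards, each shard is itself a CRDT, and every update touches exactly one shard. The hashing scheme and names are illustrative.

    import hashlib

    class ShardedORSet:
        """Statically-sharded OR-Set: a fixed array of independent OR-Set shards."""

        def __init__(self, num_shards=16):
            self.shards = [ORSet() for _ in range(num_shards)]

        def _shard(self, e):
            # Static placement: a stable hash of the element picks the shard,
            # so every replica routes the same element to the same shard.
            digest = hashlib.sha1(str(e).encode("utf-8")).hexdigest()
            return self.shards[int(digest, 16) % len(self.shards)]

        def lookup(self, e):
            return self._shard(e).lookup(e)

        def add(self, e):                      # every update touches a single shard
            return self._shard(e).add(e)

        def remove(self, e):
            return self._shard(e).remove(e)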

Directed Graph – Motivation
- Design a web search engine: compute page rank over a directed graph
- Efficiency and scalability
  - Asynchronous processing
  - Responsiveness
  - Incremental processing, as fast as each page is crawled
- Operations
  - Find new pages: add vertex
  - Parse page links: add/remove arc
  - Add URLs of linked pages to be crawled: add vertex
  - Deleted pages: remove vertex (lookup masks incident arcs)
  - Broken links allowed: add arc works even if the tail vertex doesn't exist

Graph design alternatives

Directed Graph (op-based)
- Payload: OR-Set V (vertices), OR-Set A (arcs)
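
Below is a minimal sketch of this design, assuming the ORSet class from above as the payload for both vertices and arcs: arcs may be added even if their endpoints do not exist yet (broken links), and arc lookup masks arcs whose endpoints are not (or no longer) in V. Method names are illustrative.

    class ORGraph:
        """Directed graph built from two OR-Sets: V for vertices, A for arcs."""

        def __init__(self):
            self.V = ORSet()                     # vertices
            self.A = ORSet()                     # arcs, stored as (tail, head) pairs

        def add_vertex(self, v):
            return self.V.add(v)

        def remove_vertex(self, v):              # incident arcs stay in A...
            return self.V.remove(v)

        def add_arc(self, tail, head):           # allowed even if tail/head are missing
            return self.A.add((tail, head))

        def remove_arc(self, tail, head):
            return self.A.remove((tail, head))

        def lookup_vertex(self, v):
            return self.V.lookup(v)

        def lookup_arc(self, tail, head):
            # ...but lookup masks arcs whose endpoints are not currently vertices.
            return (self.A.lookup((tail, head))
                    and self.V.lookup(tail) and self.V.lookup(head))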

Summary
- Principled approach: Strong Eventual Consistency
- Two sufficient conditions:
  - State: monotonic semi-lattice
  - Operation: commutativity
- Useful CRDTs: multi-master counter, OR-Set, Directed Graph

Future Work
Theory
- Class of computations accomplished by CRDTs
- Complexity classes of CRDTs
- Classes of invariants supported by a CRDT
- CRDTs and self-stabilization, aggregation, and so on
Practice
- Library implementation of CRDTs
- Supporting non-critical synchronous operations (committing a state, global reset, etc.)
- Sharding

Extras: MV-Register and the Shopping Cart Anomaly
- MV-Register ≈ LWW-Set Register
- Payload = { (value, versionVector) }
- assign: overwrite value, vv++
- merge: union of every element in each input set that is not dominated by an element in the other input set
- A more recent assignment overwrites an older one
- Concurrent assignments are merged by union (VC merge)
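
A minimal sketch of this merge rule, with version vectors kept as plain dicts; the domination test and all names are illustrative. assign replaces the local payload with a single entry whose vector dominates everything seen so far; merge keeps exactly the entries of either side that are not strictly dominated by an entry of the other side.

    def dominates(vv_a, vv_b):
        """True when version vector vv_a is component-wise >= vv_b."""
        return all(vv_a.get(r, 0) >= n for r, n in vv_b.items())

    class MVRegister:
        """Multi-value register: payload is a set of (value, version vector) pairs."""

        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.payload = []                    # list of (value, version vector) pairs

        def read(self):                          # query: all currently concurrent values
            return [value for value, _ in self.payload]

        def assign(self, value):                 # assign: overwrite value, vv++
            vv = {}
            for _, old_vv in self.payload:       # new vector dominates everything seen
                for r, n in old_vv.items():
                    vv[r] = max(vv.get(r, 0), n)
            vv[self.replica_id] = vv.get(self.replica_id, 0) + 1
            self.payload = [(value, vv)]

        def merge(self, other):
            # Keep every entry of either side that is not strictly dominated
            # by some entry of the other side.
            mine = [(v, vv) for v, vv in self.payload
                    if not any(dominates(ovv, vv) and ovv != vv
                               for _, ovv in other.payload)]
            theirs = [(v, vv) for v, vv in other.payload
                      if (v, vv) not in mine
                      and not any(dominates(mvv, vv) and mvv != vv
                                  for _, mvv in self.payload)]
            self.payload = mine + theirs

With this rule, two causally unrelated assignments both survive a merge (the register is multi-valued), while a later assignment whose vector dominates an older one replaces it.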

Extras: MV-Register and the Shopping Cart Anomaly
- Shopping cart anomaly: a deleted element reappears
- MV-Register does not behave like a set
- Assignment is not an alternative to proper add and remove operations

The problem with eventual consistency jokes is that you can't tell who doesn't get it from who hasn't gotten it.