1 Conflict-free Replicated Data Types MARC SHAPIRO, NUNO PREGUIÇA, CARLOS BAQUERO AND MAREK ZAWIRSKI Presented by: Ron Zisman

2 Motivation
- Replication and consistency are essential features of large distributed systems such as the web, peer-to-peer networks, and cloud computing.
- Many replicas:
  - great for fault tolerance and read latency
  - problematic when updates occur: synchronization is slow, and skipping synchronization causes conflicts

3 Motivation
- We look for an approach that:
  - supports replication
  - guarantees eventual consistency
  - is fast and simple
- Conflict-free objects = no synchronization whatsoever
- Is this practical?

4 Contributions
Theory: Strong Eventual Consistency (SEC)
- A solution to the CAP problem
- Formal definitions
- Two sufficient conditions
- Strong equivalence between the two
- Incomparable to sequential consistency
Practice: CRDTs = Convergent or Commutative Replicated Data Types
- Counters
- Set
- Directed graph

5 Strong Consistency
Ideal consistency: all replicas know about the update immediately after it executes.
- Precludes conflicts: replicas apply updates in the same total order
- Works for any deterministic object
- Requires consensus: serialization bottleneck, tolerates < n/2 faults
- Correct, but doesn't scale

10 Eventual Consistency
- Update locally and propagate:
  - no foreground synchronization
  - eventual, reliable delivery
- On conflict: arbitrate, roll back
- Consensus moved to the background
- Better performance, but still complex

17 Strong Eventual Consistency
- Update locally and propagate:
  - no synchronization
  - eventual, reliable delivery
- No conflict: deterministic outcome of concurrent updates
- No consensus needed: tolerates up to n-1 faults
- Solves the CAP problem

22 Definition of EC
- Eventual delivery: an update delivered at some correct replica is eventually delivered to all correct replicas.
- Termination: all method executions terminate.
- Convergence: correct replicas that have delivered the same updates eventually reach equivalent state.
- Doesn't preclude rollbacks and reconciling.

23 Definition of SEC
- Eventual delivery: an update delivered at some correct replica is eventually delivered to all correct replicas.
- Termination: all method executions terminate.
- Strong convergence: correct replicas that have delivered the same updates have equivalent state.

24 System model
- A system of non-Byzantine processes interconnected by an asynchronous network.
- Partition tolerance and recovery.
- What are the two simple conditions that guarantee strong convergence?

25 Query
- Client sends the query to any of the replicas.
- Local at the source replica: evaluated synchronously, no side effects.

28 State-based approach
- payload type, initial state
- query
- update
- merge

29 State-based replication

34 Semi-lattice

35 If:
- the payload type forms a semi-lattice,
- updates are increasing, and
- merge computes the Least Upper Bound (LUB),
then replicas converge to the LUB of the last values.
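
The three conditions above can be sketched with the simplest state-based CRDT, a grow-only set: its states under set union form a semi-lattice, add() only grows the state, and merge() is union, which is exactly the LUB. This is an illustrative sketch; the class and method names are mine, not the paper's.

```python
# Minimal state-based CRDT sketch: a grow-only set.
# States ordered by set inclusion form a semi-lattice;
# add() is increasing; merge() computes the LUB (union).

class GSet:
    def __init__(self):
        self.payload = set()        # payload: a set, initially empty

    def add(self, element):         # update: increasing (never shrinks state)
        self.payload.add(element)

    def lookup(self, element):      # query: no side effects
        return element in self.payload

    def merge(self, other):         # merge: LUB = set union
        merged = GSet()
        merged.payload = self.payload | other.payload
        return merged

# Union is commutative, associative, and idempotent, so replicas
# converge regardless of the order in which states are merged.
a, b = GSet(), GSet()
a.add("x"); b.add("y")
print(sorted(a.merge(b).payload))   # ['x', 'y']
print(sorted(b.merge(a).payload))   # ['x', 'y'] (same LUB)
```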

36 Operation-based approach
- payload type, initial state
- query
- prepare-update
- effect-update
- delivery precondition

37 Operation-based replication
- Local at source: check the precondition, compute the operation, broadcast to all replicas.
- Eventually, at all replicas: check the downstream precondition, apply to the local replica.

40 If:
- Liveness: all replicas execute all operations in delivery order where the downstream precondition (P) is true, and
- Safety: concurrent operations all commute,
then replicas converge.
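
The safety condition can be sketched with an op-based counter: the only downstream effect is addition, and additions commute, so any delivery order of concurrent operations yields the same state everywhere. An illustrative sketch with my own naming of the prepare/effect split:

```python
# Op-based CRDT sketch: a counter whose operations all commute.

class OpCounter:
    def __init__(self):
        self.value = 0                  # payload: integer, initially 0

    def prepare_increment(self, k=1):
        # prepare-update at the source: build the operation to broadcast
        return ("add", k)

    def prepare_decrement(self, k=1):
        return ("add", -k)

    def effect(self, op):
        # effect-update, executed at every replica on delivery
        _, k = op
        self.value += k

# Broadcast the same concurrent operations to two replicas,
# delivered in different orders:
src = OpCounter()
ops = [src.prepare_increment(3), src.prepare_decrement(1), src.prepare_increment(5)]
r1, r2 = OpCounter(), OpCounter()
for op in ops:
    r1.effect(op)
for op in reversed(ops):                # a different delivery order
    r2.effect(op)
print(r1.value, r2.value)               # 7 7 -- addition commutes
```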

41
- A state-based object can emulate an operation-based object, and vice versa.
- Use state-based reasoning, then convert to op-based for better efficiency.

42 Comparison
State-based:
- update ≠ merge operation
- simple data types
- state includes preceding updates; no separate historical information
- inefficient if the payload is large
- examples: file systems (NFS), Dynamo
Operation-based:
- update operation
- higher level, more complex
- more powerful, more constraining
- small messages
- examples: collaborative editing (Treedoc), Bayou, PNUTS
Use state-based or op-based, as convenient.

43 SEC is incomparable to sequential consistency

45 Example CRDTs
- Multi-master counter
- Observed-Remove Set
- Directed Graph

46 Multi-master counter
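
The multi-master counter can be sketched as a state-based CRDT: each replica owns one slot of a vector of counts, an increment touches only the owner's slot (so the vector is monotonically increasing), and merge takes the element-wise maximum, which is the LUB. Class and method names below are my own illustrative choices:

```python
class MultiMasterCounter:
    def __init__(self, replica_id, n_replicas):
        self.i = replica_id
        self.p = [0] * n_replicas       # one non-decreasing slot per replica

    def increment(self):
        self.p[self.i] += 1             # update only our own slot

    def value(self):
        return sum(self.p)              # query: total across all replicas

    def merge(self, other):
        # LUB: element-wise max; commutative, associative, idempotent
        merged = MultiMasterCounter(self.i, len(self.p))
        merged.p = [max(a, b) for a, b in zip(self.p, other.p)]
        return merged

r0 = MultiMasterCounter(0, 2)
r1 = MultiMasterCounter(1, 2)
r0.increment(); r0.increment()          # two increments at replica 0
r1.increment()                          # one increment at replica 1
print(r0.merge(r1).value())             # 3
```

Taking the max per slot (rather than summing states) makes merge idempotent: merging the same state twice does not double-count.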

48 Set design alternatives
- Sequential specification:
  - {true} add(e) {e ∈ S}
  - {true} remove(e) {e ∉ S}
- Concurrent: {true} add(e) ║ remove(e) {???}
  - linearizable? error state? last writer wins? add wins? remove wins?

49 Observed-Remove Set
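
The Observed-Remove Set resolves concurrent add(e) ║ remove(e) in favor of add: each add tags the element with a fresh unique identifier, and a remove deletes only the (element, tag) pairs it has observed, so a concurrent add's tag survives. A minimal op-based sketch; the naming and structure are mine:

```python
import uuid

class ORSet:
    def __init__(self):
        self.pairs = set()                  # payload: {(element, unique_tag)}

    def lookup(self, e):
        return any(elem == e for elem, _ in self.pairs)

    def prepare_add(self, e):
        # source: attach a globally unique tag to this add
        return ("add", e, uuid.uuid4().hex)

    def prepare_remove(self, e):
        # source: collect only the (e, tag) pairs observed locally
        observed = {(elem, t) for elem, t in self.pairs if elem == e}
        return ("remove", observed)

    def effect(self, op):
        # downstream: adds/removes of distinct tags commute
        if op[0] == "add":
            _, e, tag = op
            self.pairs.add((e, tag))
        else:
            self.pairs -= op[1]             # remove only the observed pairs

# Concurrent add and remove of "x": the remove was prepared before it
# could observe the re-add's fresh tag, so the add wins.
r = ORSet()
r.effect(r.prepare_add("x"))
rem = r.prepare_remove("x")                 # observes only the first tag
r.effect(r.prepare_add("x"))                # concurrent re-add, fresh tag
r.effect(rem)
print(r.lookup("x"))                        # True -- add wins
```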

57 OR-Set + Snapshot

58 Sharded OR-Set
- Very large objects: split into independent shards
  - placement: static (hash) or dynamic (consensus)
- Statically-sharded CRDT:
  - each shard is a CRDT
  - an update touches a single shard
  - no cross-object invariants
  - a combination of independent CRDTs remains a CRDT
- Statically-sharded OR-Set:
  - a combination of smaller OR-Sets
  - consistent snapshots: clock crosses shards
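
Static sharding can be sketched as a thin router: each element hashes to exactly one shard, so every update touches a single shard. For brevity the shards below are plain Python sets; in the design above each shard would itself be an OR-Set CRDT, and since the shards are independent, the combination remains a CRDT. All names here are mine:

```python
import hashlib

class ShardedSet:
    def __init__(self, k=4):
        self.shards = [set() for _ in range(k)]

    def _shard(self, e):
        # static placement: a stable hash of the element picks the shard
        h = hashlib.sha256(str(e).encode()).hexdigest()
        return self.shards[int(h, 16) % len(self.shards)]

    def add(self, e):
        self._shard(e).add(e)       # update: single shard only

    def remove(self, e):
        self._shard(e).discard(e)

    def lookup(self, e):
        return e in self._shard(e)

s = ShardedSet()
s.add("a"); s.add("b"); s.remove("a")
print(s.lookup("a"), s.lookup("b"))   # False True
```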

59 Directed Graph – Motivation
- Design a web search engine: compute page rank over a directed graph.
- Efficiency and scalability: asynchronous processing.
- Responsiveness: incremental processing, as fast as each page is crawled.
- Operations:
  - find new pages: add vertex
  - parse page links: add/remove arc
  - add URLs of linked pages to be crawled: add vertex
  - deleted pages: remove vertex (lookup masks incident arcs)
  - broken links allowed: add arc works even if the tail vertex doesn't exist

60 Graph design alternatives

61 Directed Graph (op-based). Payload: OR-Set V (vertices), OR-Set A (arcs)
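
The payload above can be sketched as two sets with a masking lookup: arcs may be added before their endpoints exist (broken links), and removing a vertex does not delete its incident arcs, it merely hides them from lookup. Plain sets stand in for the two OR-Sets to keep the sketch short; the method names are mine:

```python
class Graph:
    def __init__(self):
        self.V = set()              # vertices (an OR-Set in the real CRDT)
        self.A = set()              # arcs     (an OR-Set in the real CRDT)

    def add_vertex(self, v):
        self.V.add(v)

    def remove_vertex(self, v):
        # incident arcs stay in A; lookup_arc masks them
        self.V.discard(v)

    def add_arc(self, u, v):
        # allowed even if the tail vertex doesn't exist (broken links)
        self.A.add((u, v))

    def lookup_arc(self, u, v):
        # an arc is visible only while both endpoints exist
        return (u, v) in self.A and u in self.V and v in self.V

g = Graph()
g.add_vertex("p1"); g.add_vertex("p2")
g.add_arc("p1", "p2")
g.add_arc("p1", "p3")               # broken link: "p3" not crawled yet
print(g.lookup_arc("p1", "p2"))     # True
print(g.lookup_arc("p1", "p3"))     # False -- masked until p3 is added
g.remove_vertex("p2")
print(g.lookup_arc("p1", "p2"))     # False -- incident arc now masked
```

Masking instead of deleting keeps remove-vertex and add-arc commutative: the arc reappears automatically if the vertex is added again.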

63 Summary
- Principled approach: Strong Eventual Consistency
- Two sufficient conditions:
  - state: monotonic semi-lattice
  - operation: commutativity
- Useful CRDTs: multi-master counter, OR-Set, directed graph

64 Future Work
- Theory:
  - class of computations accomplished by CRDTs
  - complexity classes of CRDTs
  - classes of invariants supported by a CRDT
  - CRDTs and self-stabilization, aggregation, and so on
- Practice:
  - library implementation of CRDTs
  - supporting non-critical synchronous operations (committing a state, global reset, etc.)
  - sharding

65 Extras: MV-Register and the Shopping Cart Anomaly
- MV-Register ≈ LWW-Set Register
- Payload = { (value, versionVector) }
- assign: overwrite the value, vv++
- merge: union of every element in each input set that is not dominated by an element in the other input set
- A more recent assignment overwrites an older one
- Concurrent assignments are merged by union (VC merge)
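
The assign and merge rules above can be sketched directly: the payload is a set of (value, version-vector) pairs, assign produces a pair whose vector dominates everything seen locally, and merge keeps every pair not dominated by a pair from the other side, so concurrent assignments survive side by side. An illustrative sketch with my own naming:

```python
def dominates(a, b):
    # version vector a dominates b: a >= b component-wise and a != b
    keys = set(a) | set(b)
    return all(a.get(k, 0) >= b.get(k, 0) for k in keys) and a != b

class MVRegister:
    def __init__(self, replica_id):
        self.rid = replica_id
        self.payload = []                   # list of (value, vv) pairs

    def assign(self, value):
        # new vv: pointwise max over everything seen locally, then vv++
        vv = {}
        for _, old in self.payload:
            for k, n in old.items():
                vv[k] = max(vv.get(k, 0), n)
        vv[self.rid] = vv.get(self.rid, 0) + 1
        self.payload = [(value, vv)]

    def values(self):
        return sorted(v for v, _ in self.payload)

    def merge(self, other):
        merged = MVRegister(self.rid)
        keep = []
        for v, vv in self.payload:
            if not any(dominates(ovv, vv) for _, ovv in other.payload):
                keep.append((v, vv))
        for v, vv in other.payload:
            if not any(dominates(svv, vv) for _, svv in self.payload):
                keep.append((v, vv))
        # drop exact duplicates (pairs shared by both sides)
        merged.payload = [p for i, p in enumerate(keep) if p not in keep[:i]]
        return merged

r1, r2 = MVRegister("r1"), MVRegister("r2")
r1.assign("a"); r2.assign("b")              # concurrent assignments
print(r1.merge(r2).values())                # ['a', 'b'] -- both survive
r3 = r1.merge(r2)
r3.assign("c")                              # later assignment dominates both
print(r3.merge(r1).values())                # ['c']
```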

66 Extras: MV-Register and the Shopping Cart Anomaly
- Shopping cart anomaly: a deleted element reappears
- The MV-Register does not behave like a set
- Assignment is not an alternative to proper add and remove operations

67 The problem with eventual consistency jokes is that you can't tell who doesn't get it from who hasn't gotten it. 67

