Distributed Programming and Consistency: Principles and Practice — Peter Alvaro, Neil Conway, Joseph M. Hellerstein, UC Berkeley

Part I: Principles

Motivation

Why are distributed systems hard? Uncertainty In communication

Why are distributed systems hard? We attack at dawn ? ?

Why are distributed systems hard? Wait for my signal Then attack!

Why are distributed systems hard? Attack! No, WAIT! ?

Distributed systems are easier when messages are

Reorderable

Distributed systems are easier when messages are Reorderable Retryable

Distributed systems are easier when messages are Reorderable Retryable Retraction-free

(notes) Point to make: convergent objects are NOT retraction-free.

Context: replicated distributed systems Distributed: connected (but not always, or well) Replicated: redundant data

Context: replicated distributed systems Running example: a key-value store put(“dinner”, “pizza”) Dinner = pizza get(“dinner”) “pizza”

Context: replicated distributed systems Distributed → replication is desirable Distributed → consistency is expensive

Consistency? Definitions abound: pick one, but try to be consistent…

What isn’t consistent? Replication anomalies: Read anomalies (staleness) Write divergence (concurrent updates)

Anomalies Stale reads put(“dinner”, “pasta”) Dinner = pizza get(“dinner”) “pizza” Dinner = pasta ?

Anomalies Write conflicts put(“dessert”, “cake”) Dessert = cake Dessert = fruit put(“dessert”, “fruit”) ?
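The write-conflict anomaly on this slide can be reproduced in a few lines. This is a minimal Python sketch (names are mine, not from the deck): two key-value replicas accept concurrent writes for the same key and simply diverge, because nothing reconciles them.

```python
# Minimal sketch: two KV-store replicas accept concurrent writes for
# the same key and diverge, since nothing reconciles them.
class Replica:
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

r1, r2 = Replica(), Replica()
r1.put("dessert", "cake")   # client A writes at replica 1
r2.put("dessert", "fruit")  # client B writes concurrently at replica 2

# The replicas now disagree; without a merge policy, which value
# "wins" is undefined.
divergent = r1.get("dessert") != r2.get("dessert")
```

Everything that follows in the deck is, one way or another, a strategy for ruling out or resolving exactly this situation.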

Consistency Anomalies witness replication. A consistent replicated datastore rules out (some) replication anomalies.

Consistency models Strong consistency Eventual consistency Weaker models

Strong consistency AKA ``single copy’’ consistency Replication is transparent; no witnesses of replication

Strong consistency Reads that could witness replication block Concurrent writes take turns

Strong consistency Some strategies:
– Single master with synchronous replication – writes are totally ordered, reads see latest values
– Quorum systems – a (majority) ensemble simulates a single master
– ``State machine replication’’ – use consensus to establish a total order over reads and writes system-wide

Strong consistency Drawbacks: Latency, availability, partition tolerance

Eventual Consistency Tolerate stale reads and concurrent writes Ensure that eventually* all replicas converge (*when activity has ceased and all messages have been delivered to all replicas)

Eventual Consistency Strategies: Establish a total update order off the critical path (e.g., Bayou). – Epidemic (gossip-based) replication – Tentatively apply, then possibly retract, updates as the order is learned

Eventual Consistency Strategies: Deliver updates according to a ``cheap’’ order (e.g. causal). – Break ties with timestamps, merge functions hmmmm

Eventual Consistency Strategies: Constrain the application so that updates are reorderable Won’t always work. When will it work?

Eventual consistency – more definitions
Convergence – conflicting writes are uniformly resolved; reads eventually return current data (state-centric)
Confluence – a program has deterministic executions; output is a function of input (program-centric)

Eventual consistency – more definitions Confluence is a strong correctness criterion – Not all programs are meant to be deterministic But it’s a nice property – E.g., for replay-based debugging – E.g., because the meaning of a program is its output (not its ``traces’’) Confluent => convergent

Eventual consistency – more definitions Confluent => convergent Deterministic executions imply replica agreement

Eventual consistency – more definitions But convergent state does not imply deterministic executions

(peter notes) EC systems focus only on controlling write anomalies (stale reads are always fair game, though session guarantees may restrict which read anomalies can happen). EC systems are convergent – eventually there are no divergent states. Deterministic => convergent (but not the other way). Determinism is compositional: two deterministic systems glued together make a deterministic system. Convergence is not compositional: two convergent systems glued together do not necessarily make a convergent system (e.g., if the glue is nonmonotonic).

(peter notes) Guarded asynchrony – need to carefully explain the significance of this Essentially, confluent programs cannot allow one-shot queries on changing state (even monotonically changing) One-shot queries must be converted into subscriptions to a stream of updates – That way, we are guaranteed to see the last update to a given lattice

(joe notes) There’s stuff you can do in the storage layer… or you can pop up. Layer vs. language. Sequential emulation at the storage layer. CRDTs – a state-centric attempt to achieve relaxed ordering (object-by-object). Then Bloom. The programming model should match the computation model.

Distributed design patterns for eventual consistency

ACID 2.0 The classic ACID has the goal to make the application perceive that there is exactly one computer and it is doing nothing else while this transaction is being processed. Consider the new ACID (or ACID2.0). The letters stand for: Associative, Commutative, Idempotent, and Distributed. The goal for ACID2.0 is to succeed if the pieces of the work happen:  At least once,  Anywhere in the system,  In any order. - Pat Helland, Building on quicksand

ACID 2.0 Associative -- operations can be ``eagerly’’ processed Commutative – operations can be reordered Idempotent – retry is always an option Distributed – (needed a ``D’’)

ACID 2.0 Instead of low-level reads and writes programmers use an abstract vocabulary of reorderable, retryable actions: Retry – a mechanism to ensure that all messages are delivered Reorderability -- ensures that all replicas converge
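The ACID 2.0 claim above can be made concrete with a toy example. This Python sketch (not from the deck) uses set union as the update primitive; because union is associative, commutative, and idempotent, any delivery order and any number of retries produce the same replica state.

```python
# Sketch: union-based updates are reorderable and retryable, so
# delivery order and duplicate delivery cannot cause divergence.
def apply_updates(state, updates):
    for u in updates:
        state = state | {u}   # union: associative, commutative, idempotent
    return state

ops = ["a", "b", "c"]
base = apply_updates(set(), ops)
assert apply_updates(set(), ops[::-1]) == base   # reordered delivery
assert apply_updates(set(), ops + ops) == base   # duplicated (retried) delivery
```

Contrast this with read-modify-write updates (e.g., `x = x + 1` over last-writer-wins registers), where reordering and retry both change the outcome.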

Putting ACID 2.0 into practice

1. CRDTs – a state-based approach – keep distributed state in data structures providing only ACI methods
2. Disorderly programming – a language-based approach – encourage structuring computation using reorderable statements and data

Formalizing ACID 2.0 ACI are precisely the properties that define the LUB operation in a join semilattice If states form a lattice, we can always merge states using the LUB.
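A quick illustration of this formal point (my own sketch): any least-upper-bound operation over a join semilattice, such as `max` over the integers, automatically satisfies the ACI laws.

```python
# Sketch: merge = least upper bound of a join semilattice is
# automatically Associative, Commutative, and Idempotent.
# max over integers is one such LUB.
merge = max

a, b, c = 3, 7, 7
assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associative
assert merge(a, b) == merge(b, a)                      # commutative
assert merge(a, a) == a                                # idempotent
```

The same laws hold for set union, boolean OR, and elementwise merges of vectors of lattice values, which is exactly what the CRDTs below exploit.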

C(v)RDTs Convergent Replicated Data Types Idea: represent state as a join semilattice. Provide an ACI merge function

CRDTs Data structures:
1. Grow-only set (G-Set) – trivial: merge is union, and union is commutative
2. 2P-Set – two G-Sets, one for adds, the other for tombstones – idiosyncrasy: you can only add/delete a given element once
3. Counters – tricky! Keep a vector clock with an entry for each replica; replica i increments VC[i], and the value is the sum of all VC entries
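Illustrative Python sketches of the first and third structures (class and method names are mine): a grow-only set whose merge is union, and a vector-clock-style counter whose merge is an elementwise max.

```python
# Sketch of two CRDTs: a grow-only set and a grow-only counter.
class GSet:
    """Grow-only set: merge is set union."""
    def __init__(self):
        self.elems = set()

    def add(self, x):
        self.elems.add(x)

    def merge(self, other):
        self.elems |= other.elems

class GCounter:
    """Counter: one slot per replica; replica i bumps vc[i].
    Merge is the elementwise max; value is the sum of all slots."""
    def __init__(self, replica_id, n_replicas):
        self.i = replica_id
        self.vc = [0] * n_replicas

    def incr(self):
        self.vc[self.i] += 1

    def value(self):
        return sum(self.vc)

    def merge(self, other):
        self.vc = [max(a, b) for a, b in zip(self.vc, other.vc)]

# Two counter replicas converge after exchanging state in either order.
c0, c1 = GCounter(0, 2), GCounter(1, 2)
c0.incr(); c0.incr(); c1.incr()
c0.merge(c1); c1.merge(c0)
assert c0.value() == c1.value() == 3
```

Summing per-replica slots rather than merging a single integer is what makes increments from different replicas commute instead of clobbering each other.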

CRDTs Difficulties: Convergent objects alone are not strong enough to build confluent systems.

Asynchronous messaging You never really know

Asynchronous messaging AB C send AB C

Monotonic Logic The more you know, the more you know.

Monotonic Logic AB C E D A C E select/ filter

Monotonic Logic AB C project / map f(A)f(B) f(C)

Monotonic Logic A B C D E B D B D join / compose F

Monotonic Logic is order-insensitive AB C E D A C E

Monotonic Logic is pipelineable AB C E D A C E
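The order-insensitivity and pipelineability of the slides above can be shown in a few lines. This sketch (mine, not from the deck) runs a monotone filter-then-map pipeline over a stream: the output set is the same for any arrival order, and each element can be emitted as soon as it arrives, with no retractions.

```python
# Sketch: a monotone pipeline (select/filter then project/map) yields
# the same output set for any arrival order of its input, so it can
# run incrementally without ever retracting an answer.
def run(stream):
    out = set()
    for x in stream:
        if x % 2 == 0:        # select / filter
            out.add(x * 10)   # project / map
    return out

assert run([1, 2, 3, 4]) == run([4, 3, 2, 1]) == {20, 40}
```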

Nonmonotonic Logic When do you know for sure?

Nonmonotonic Logic A B C D E B D set minus A B C D E Retraction! X Y Z

Nonmonotonic logic is order-sensitive A B C D E B D set minus A C E

Nonmonotonic logic is blocking A set minus A ? A?

Nonmonotonic logic is blocking A set minus A ? ``Sealed’’
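The contrast with the monotone case can be sketched directly (my own example): set difference is nonmonotonic, so an answer emitted before the right-hand input is ``sealed’’ may need to be retracted when more elements arrive.

```python
# Sketch: set difference is nonmonotonic. An answer emitted before the
# right-hand input is sealed may have to be retracted later.
def minus(left, right):
    return left - right

left = {"A", "B", "C"}
early = minus(left, {"B"})       # emits {"A", "C"}
late = minus(left, {"B", "C"})   # "C" arrives later: part of the early answer was wrong
retracted = early - late
assert retracted == {"C"}
```

This is why nonmonotonic operators must block until their inputs are sealed, which in a distributed setting means coordinating.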

CALM Analysis Asynchrony => loss of order Nonmonotonicity => order-sensitivity Asynchrony ; Nonmonotonicity => Inconsistency […]

CALM Analysis Asynchrony => loss of order Nonmonotonicity => order-sensitivity Asynchrony ; Nonmonotonicity => Inconsistency ? ``Point of Order’’ […]

Disorderly programming An aside about logic programming: In (classical) logic, theories are Associative and commutative – Consequences are the same regardless of the order in which we make deductions Idempotent – Axioms can be reiterated freely

Disorderly programming An aside about logic programming: In (classical) logic, theories are Associative, Commutative, and Idempotent because Knowledge is monotonic: The more you know, the more you know

Disorderly programming An aside about logic programming: It is challenging to even talk about order in logic programming languages [dedalus]. Yet we can build …

Disorderly programming Idea: embody the ACID 2.0 design patterns in how we structure distributed programs. Disorderly data: unordered relations Disorderly code: specify how data changes over time

Bloom

Bloom Rules

multicast <~ (message * members).pairs do |mes, mem|
  [mem.address, mes.id, mes.payload]
end

Operational model

Time
Set (Union) – “growth”: larger sets
Integer (Max) – “growth”: larger numbers
Boolean (Or) – “growth”: false → true

Time
Set (merge = Union), Integer (merge = Max), Boolean (merge = Or)
size() – a monotone function from set → max
>= 5 – a monotone function from max → boolean
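The set → max → boolean pipeline on this slide can be sketched in Python (my own illustration): a growing set maps monotonically to a growing count, which maps monotonically to a threshold boolean that flips false → true exactly once and then stays true.

```python
# Sketch: composing monotone maps across lattices. The threshold
# boolean, once true, can never revert to false.
history = []
s = set()
for v in ["a", "b", "c", "d", "e", "f"]:
    s.add(v)                    # the set only grows (lset)
    cnt = len(s)                # monotone map: set -> max (lmax)
    history.append(cnt >= 5)    # monotone map: max -> boolean (lbool)

assert history == [False, False, False, False, True, True]
```

This one-way flip is what makes it safe to act on the threshold (e.g., declare a quorum) without coordination.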

Builtin Lattices

Name  | Description                     | ⊥         | a ⊔ b    | Sample monotone functions
lbool | Threshold test                  | false     | a ∨ b    | when_true() → v
lmax  | Increasing number               | −∞        | max(a,b) | gt(n) → lbool, +(n) → lmax, -(n) → lmax
lmin  | Decreasing number               | +∞        | min(a,b) | lt(n) → lbool
lset  | Set of values                   | ∅         | a ∪ b    | intersect(lset) → lset, product(lset) → lset, contains?(v) → lbool, size() → lmax
lpset | Non-negative set                | ∅         | a ∪ b    | sum() → lmax
lbag  | Multiset of values              | ∅         | a ∪ b    | mult(v) → lmax, +(lbag) → lbag
lmap  | Map from keys to lattice values | empty map |          | at(v) → any-lat, intersect(lmap) → lmap

Quorum Vote in Bloom^L

QUORUM_SIZE = 5
RESULT_ADDR = "example.org"

class QuorumVote          # annotated Ruby class
  include Bud

  state do                # program state
    channel :vote_chn, [:@addr, :voter_id]   # communication interfaces
    channel :result_chn, [:@addr]
    lset  :votes          # lattice state declarations
    lmax  :vote_cnt
    lbool :got_quorum
  end

  bloom do                # program logic
    votes      <= vote_chn {|v| v.voter_id}        # accumulate votes into set (merge = set union)
    vote_cnt   <= votes.size                       # map set → max
    got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE)      # threshold test: map max → bool
    result_chn <~ got_quorum.when_true { [RESULT_ADDR] }
  end
end

Convergence – a 2PSet
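A Python sketch of the 2P-Set (names are mine): an "added" G-Set plus a "removed" tombstone G-Set, merged pairwise by union, so replicas converge regardless of merge order. The idiosyncrasy from the earlier slide is visible here: once an element is tombstoned it can never come back.

```python
# Sketch of a 2P-Set: merge is pairwise union of the add-set and the
# tombstone set; an element is present iff added and not tombstoned.
class TwoPSet:
    def __init__(self):
        self.added = set()
        self.removed = set()

    def add(self, x):
        self.added.add(x)

    def remove(self, x):
        if x in self.added:
            self.removed.add(x)   # tombstone: permanent

    def contains(self, x):
        return x in self.added and x not in self.removed

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed

a, b = TwoPSet(), TwoPSet()
a.add("x")
b.add("x"); b.remove("x"); b.add("y")
a.merge(b); b.merge(a)
assert a.contains("x") == b.contains("x") == False   # remove wins everywhere
assert a.contains("y") and b.contains("y")
```

Because both component sets only grow, the merged state is the same whichever replica merges first: the object converges, even though (per the earlier note) convergent objects alone do not guarantee a confluent system.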

The difficulty with queries ``The work of multiple transactions can interleave as long as they are doing the commutative operations. If any transaction dares to READ the value, that does not commute, is annoying, and stops other concurrent work.’’ -- Pat Helland