Logic and Lattices for Distributed Programming
Neil Conway, UC Berkeley
Joint work with: Peter Alvaro, Peter Bailis, David Maier, Bill Marczak, Joe Hellerstein, Sriram Srinivasan
Basho Chats #004, June 27, 2012
Programming
Distributed Programming
Dealing with Disorder
Introduce order
– Paxos, Zookeeper, Two-Phase Commit, …
– “Strong Consistency”
Tolerate disorder
– Correct behavior in the face of many possible network orders
– Typical goal: replicas converge to same final state
– “Eventual Consistency”
Eventual Consistency: popular, but hard to program
Help developers build reliable programs on top of eventual consistency
This Talk
1. Theory
– CRDTs, Lattices, and CALM
2. Practice
– Programming with Lattices
– Case Study: KVS
Figure: Client 0 and Client 1 each read Students {Alice, Bob}. One client writes Students {Alice, Bob, Dave}, the other writes Students {Alice, Bob, Carol}, and the two replicas now disagree. How to resolve?
Problem: Replicas perceive different event orders
Goal: Same final state at all replicas
Solution: Commutative operations (“merge functions”)
Figure: Client 0 and Client 1 again; with Merge = Set Union, both replicas converge to Students {Alice, Bob, Carol, Dave}.
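A minimal sketch in plain Ruby (not Bud), assuming each replica's state is a Ruby `Set` and the merge function is set union. Whichever direction the replicas exchange state during synchronization, they end up identical. The method name `merge_sets` is hypothetical.

```ruby
require 'set'

# Merge function for a set-valued replica: set union.
def merge_sets(a, b)
  a | b
end

initial = Set["Alice", "Bob"]
# Each client's write lands on a different replica first.
replica1 = merge_sets(initial, Set["Alice", "Bob", "Dave"])
replica2 = merge_sets(initial, Set["Alice", "Bob", "Carol"])

# Replica synchronization: each side merges the other's state.
synced1 = merge_sets(replica1, replica2)
synced2 = merge_sets(replica2, replica1)

puts synced1.to_a.sort.inspect
puts synced1 == synced2 # true: union is commutative
```

Because union is also idempotent, re-sending the same state during synchronization changes nothing.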
Commutative Operations
Used by Dynamo, Riak, Bayou, etc.
Formalized as CRDTs: Convergent and Commutative Replicated Data Types
– Shapiro et al., INRIA
– Based on join semilattices
– Commutative, associative, idempotent
Practical libraries: Statebox, Knockbox
Figure: three lattices growing over time:
– Set (merge = union): “growth” means larger sets
– Integer (merge = max): “growth” means larger numbers
– Boolean (merge = or): “growth” means false → true
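The integer and boolean lattices can be sketched the same way. The classes `MaxLattice` and `BoolLattice` below are hypothetical plain-Ruby stand-ins (not Bud's `lmax`/`lbool`): each wraps a value, starts at its least element, and defines `merge` as the lattice join, so folding updates in any order only ever moves the value “up”.

```ruby
# Hypothetical minimal lattice classes (plain Ruby, not the Bud API).
class MaxLattice
  attr_reader :value

  def initialize(value = -Float::INFINITY) # least element: -infinity
    @value = value
  end

  # Join = max: the result is never smaller than either input.
  def merge(other)
    MaxLattice.new([value, other.value].max)
  end
end

class BoolLattice
  attr_reader :value

  def initialize(value = false) # least element: false
    @value = value
  end

  # Join = or: once true, the value stays true.
  def merge(other)
    BoolLattice.new(value || other.value)
  end
end

counts = [3, 7, 5].map { |n| MaxLattice.new(n) }
flags  = [false, true, false].map { |b| BoolLattice.new(b) }

max_state  = counts.reduce(MaxLattice.new) { |acc, x| acc.merge(x) }
bool_state = flags.reduce(BoolLattice.new) { |acc, x| acc.merge(x) }

puts max_state.value  # 7
puts bool_state.value # true
```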
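Figure: Client 0 and Client 1 start with Students {Alice, Bob, Carol, Dave} and Teams { }. One client reads Students {Alice, Bob, Carol, Dave} and Teams { }, then writes two teams to Teams. Concurrently, the other client removes Dave, leaving Students {Alice, Bob, Carol}. After replica synchronization: Students {Alice, Bob, Carol}, and Teams still holds the two teams built from the four-student roster.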
Figure: Same starting state, different order. Dave is removed first, so the other client reads Students {Alice, Bob, Carol} and Teams { }. After replica synchronization: Students {Alice, Bob, Carol}, Teams { }. Nondeterministic outcome!
Possible Solution: Wrap both replicated values in a single complex CRDT
Goal: Compose larger application using “safe” mappings between simple lattices
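Figure: lattices connected by monotone functions over time:
– Set (merge = union) → Integer (merge = max) via size(), a monotone function from set to max
– Integer (merge = max) → Boolean (merge = or) via the threshold test >= 5, a monotone function from max to boolean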
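A small plain-Ruby trace of that pipeline, assuming a hypothetical stream of elements (including a duplicate): as the set grows, `size_of` never decreases and the threshold test, once true, never reverts. The helper names are illustrative only.

```ruby
require 'set'

THRESHOLD = 5 # the "size() >= 5" test from the slide

# Set (merge = union) -> Integer (merge = max): size never shrinks as the set grows.
def size_of(set)
  set.size
end

# Integer (merge = max) -> Boolean (merge = or): once true, stays true.
def at_least_threshold(n)
  n >= THRESHOLD
end

state = Set.new
trace = []
%w[a b c b d e f].each do |elem| # "b" arrives twice; union absorbs the duplicate
  state |= [elem]
  trace << [size_of(state), at_least_threshold(size_of(state))]
end

trace.each { |size, ok| puts "size=#{size} quorum=#{ok}" }
```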
Monotonicity in Practice
“The more you know, the more you know”
Never retract previous outputs (“mistake-free”)
Typical patterns:
– immutable data
– accumulate knowledge over time
– threshold tests (“if” w/o “else”)
Monotonicity and Determinism
Agents only accumulate knowledge over time; they never forget
Monotone: different learning order, same final outcome
Result: Program is deterministic!
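A brute-force illustration in plain Ruby, assuming three hypothetical vote messages delivered in every possible order. The monotone rule (accumulate into a set, then a threshold test) reaches the same outcome for all six orders, while an order-sensitive, non-monotone rule (“remember only the first vote”) produces a different outcome depending on delivery order.

```ruby
require 'set'

votes = %w[alice bob carol]

# Monotone: accumulate all votes (set union), then apply a threshold test.
def monotone_outcome(order)
  seen = Set.new
  order.each { |v| seen << v }
  seen.size >= 3
end

# Order-sensitive (non-monotone): keep only the first vote delivered.
def first_vote(order)
  order.first
end

orders = votes.permutation.to_a
monotone_results = orders.map { |o| monotone_outcome(o) }.uniq
first_results = orders.map { |o| first_vote(o) }.uniq

puts monotone_results.inspect   # a single outcome for every delivery order
puts first_results.sort.inspect # one outcome per possible first message
```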
A program is confluent if it produces the same results regardless of network nondeterminism
Consistency As Logical Monotonicity (CALM)
CALM Analysis
1. All monotone programs are confluent
2. Simple syntactic test for monotonicity
Result: Simple static analysis for eventual consistency
Handling Non-Monotonicity
… is not the focus of this talk
Basic choices:
1. Nodes agree on an event order using a coordination protocol (e.g., Paxos)
2. Allow non-deterministic outcomes
– If needed, compensate and apologize
Putting It Into Practice
What we’d like:
– Collection of agents
– No shared state (→ message passing)
– Computation over arbitrary lattices
Bloom
– Organization: Collection of agents
– Communication: Message passing
– State: Relations (sets)
– Computation: Relational rules over sets (Datalog, SQL)
Bloom vs. Bloom^L
– Organization: Collection of agents (both)
– Communication: Message passing (both)
– State: Relations (sets) in Bloom; Lattices in Bloom^L
– Computation: Relational rules over sets (Datalog, SQL) in Bloom; Functions over lattices in Bloom^L
Quorum Vote in Bloom^L
  require 'bud'

  QUORUM_SIZE = 5
  RESULT_ADDR = "example.org"

  class QuorumVote             # Annotated Ruby class
    include Bud

    state do                   # Program state
      # Communication interfaces
      channel :vote_chn, [:@addr, :voter_id]
      channel :result_chn, [:@addr]
      # Lattice state declarations
      lset  :votes
      lmax  :vote_cnt
      lbool :got_quorum
    end

    bloom do                   # Program logic
      # Accumulate votes into set (<= uses the set lattice's merge function)
      votes      <= vote_chn {|v| v.voter_id}
      vote_cnt   <= votes.size                      # Map set → max
      got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE)     # Map max → bool
      result_chn <~ got_quorum.when_true { [RESULT_ADDR] } # Threshold test on bool
    end
  end
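To see why duplicated and reordered votes are harmless, here is a hypothetical plain-Ruby stand-in for the QuorumVote program (it does not use Bud; `QuorumVoteSim` and `deliver` are illustrative names). Each delivery re-applies the three monotone rules; in every delivery order, `got_quorum` flips to true once five distinct voters are seen and never flips back.

```ruby
require 'set'

QUORUM_SIZE = 5

# Hypothetical plain-Ruby stand-in for the Bloom^L QuorumVote program (not Bud).
class QuorumVoteSim
  def initialize
    @votes = Set.new    # like lset: merge = union
    @vote_cnt = 0       # like lmax: merge = max
    @got_quorum = false # like lbool: merge = or
  end

  # Deliver one vote message, re-apply the monotone rules, return got_quorum.
  def deliver(voter_id)
    @votes |= [voter_id]                     # votes <= vote_chn
    @vote_cnt = [@vote_cnt, @votes.size].max # vote_cnt <= votes.size
    @got_quorum ||= @vote_cnt >= QUORUM_SIZE # got_quorum <= vote_cnt.gt_eq(...)
    @got_quorum
  end
end

# Duplicated and reordered deliveries of the same five distinct votes.
base = %w[v1 v2 v3 v3 v4 v1 v5 v2]
orders = [base, base.reverse, base.shuffle(random: Random.new(42))]

histories = orders.map do |order|
  sim = QuorumVoteSim.new
  order.map { |v| sim.deliver(v) }
end

histories.each { |h| puts h.inspect }
```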
Builtin Lattices
Name: description; ⊥; a ⊔ b; sample monotone functions
– lbool: threshold test; ⊥ = false; a ∨ b; when_true() → v
– lmax: increasing number; ⊥ = −∞; max(a, b); gt(n) → lbool, +(n) → lmax, -(n) → lmax
– lmin: decreasing number; ⊥ = ∞; min(a, b); lt(n) → lbool
– lset: set of values; ⊥ = ∅; a ∪ b; intersect(lset) → lset, product(lset) → lset, contains?(v) → lbool, size() → lmax
– lpset: non-negative set; ⊥ = ∅; a ∪ b; sum() → lmax
– lbag: multiset of values; ⊥ = ∅; a ∪ b; mult(v) → lmax, +(lbag) → lbag
– lmap: map from keys to lattice values; ⊥ = empty map; key-wise merge; at(v) → any-lat, intersect(lmap) → lmap
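The restriction of lpset to non-negative numbers is presumably what makes sum() → lmax monotone: growing the set by union can only add terms that are ≥ 0. A plain-Ruby check on one example (using `Set#sum`, not Bud's lpset) shows the property holding, and failing once a negative element is allowed.

```ruby
require 'set'

# sum() over a set of non-negative numbers never decreases as the set grows.
def sum_of(set)
  set.sum
end

small = Set[1, 2]
large = small | Set[4] # small is a subset of large
puts sum_of(small) <= sum_of(large) # true

# With a negative element, growing the set can shrink the sum,
# so sum() would not be a monotone function into lmax.
neg_small = Set[1, 2]
neg_large = neg_small | Set[-5]
puts sum_of(neg_small) <= sum_of(neg_large) # false
```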
Case Study
Goal: Provably eventually consistent key-value store (KVS)
Assumption: Map keys to lattice values (i.e., values do not decrease)
Solution: Use a map lattice
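A minimal sketch of a map-lattice merge in plain Ruby, assuming values are sets merged by union (the keys and the `merge_maps` helper are hypothetical). The merged map contains the union of both key sets; for a key present on both replicas, the two values are combined with the value lattice's own join, so neither replica loses a key or shrinks a value.

```ruby
require 'set'

# Map-lattice merge: union of keys; shared keys merged with the value lattice's join.
def merge_maps(a, b, join)
  a.merge(b) { |_key, va, vb| join.call(va, vb) }
end

SET_JOIN = ->(x, y) { x | y } # value lattice: sets under union

replica1 = { "cart:1" => Set["apple"], "cart:2" => Set["pear"] }
replica2 = { "cart:1" => Set["milk"], "cart:3" => Set["tea"] }

r1 = merge_maps(replica1, replica2, SET_JOIN)
r2 = merge_maps(replica2, replica1, SET_JOIN)

puts r1 == r2 # true: same state whichever replica merges first
puts r1["cart:1"].to_a.sort.inspect
```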
Figure (Replica 1 and Replica 2 over time), in four steps:
1. Nested lattice value
2. Add new K/V pair
3. “Grow” value in extant K/V pair
4. Replica synchronization
Goal: Provably eventually consistent KVS that stores arbitrary values
Solution: Assign a version to each key-value pair
– Each replica stores increasing versions, not increasing values
Object Versions in Dynamo/Riak
1. Each KV pair has a vector clock version
2. Given two versions of a KV pair, prefer the one with the strictly greater version
3. If versions are incomparable, invoke user-defined merge function
Vector Clock: Map from node IDs → logical clocks
Solution: Use a map lattice
Logical Clock: Increasing counter
Solution: Use an increasing-int lattice
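A plain-Ruby sketch of that composition, assuming node IDs are strings and missing entries count as 0 (`VectorClock` is a hypothetical class, not a Bud or Riak API). Merge is the map-lattice join with max on each entry; `<=` is the induced partial order, under which two concurrent clocks are incomparable.

```ruby
# Vector clock as a map lattice from node IDs to increasing integers.
class VectorClock
  attr_reader :clocks

  def initialize(clocks = {})
    @clocks = clocks
  end

  # Local event at +node+: bump that node's logical clock.
  def tick(node)
    VectorClock.new(clocks.merge({ node => clocks.fetch(node, 0) + 1 }))
  end

  # Join: key-wise max (a map lattice whose values are max lattices).
  def merge(other)
    VectorClock.new(clocks.merge(other.clocks) { |_node, a, b| [a, b].max })
  end

  # Partial order: self <= other iff every entry of self is <= other's entry.
  def <=(other)
    clocks.all? { |node, c| c <= other.clocks.fetch(node, 0) }
  end

  def concurrent?(other)
    !(self <= other) && !(other <= self)
  end
end

v0 = VectorClock.new
a = v0.tick("r1") # write at replica 1
b = v0.tick("r2") # concurrent write at replica 2
m = a.merge(b)

puts a.concurrent?(b)      # neither clock dominates
puts(a <= m && b <= m)     # the merge is an upper bound of both
puts m.clocks.inspect
```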
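Version-Value Pairs
Pair merge(Pair o) {
  if self.fst > o.fst: self
  elsif self.fst < o.fst: o
  else new Pair(self.fst.merge(o.fst), self.snd.merge(o.snd))
}
(fst is the version and snd the value; > and < are the version lattice's strict partial order, so the else branch covers both equal and incomparable versions.)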
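The pseudocode above translated into runnable Ruby as a sketch under stated assumptions: versions are vector clocks stored as plain Hashes (node ID => counter, missing entries read as 0), and the user's value lattice is a `Set` under union. The names `VC` and `Pair` are hypothetical.

```ruby
require 'set'

# Vector-clock helpers over plain Hashes (hypothetical).
module VC
  # a <= b in the vector-clock partial order.
  def self.leq(a, b)
    a.all? { |node, c| c <= b.fetch(node, 0) }
  end

  # Strictly greater: b <= a but not a <= b.
  def self.gt(a, b)
    leq(b, a) && !leq(a, b)
  end

  # Join: key-wise max.
  def self.merge(a, b)
    a.merge(b) { |_node, x, y| [x, y].max }
  end
end

Pair = Struct.new(:fst, :snd) do
  def merge(o)
    if VC.gt(fst, o.fst)      # our version strictly greater: keep ours
      self
    elsif VC.gt(o.fst, fst)   # their version strictly greater: take theirs
      o
    else                      # equal or concurrent: merge both halves
      Pair.new(VC.merge(fst, o.fst), snd | o.snd)
    end
  end
end

old   = Pair.new({ "r1" => 1 }, Set["v1"])
newer = Pair.new({ "r1" => 2 }, Set["v2"])            # version increase, not value increase
c1    = Pair.new({ "r1" => 2, "r2" => 1 }, Set["x"])
c2    = Pair.new({ "r1" => 3 }, Set["y"])             # concurrent with c1

puts old.merge(newer).snd.to_a.inspect
merged = c1.merge(c2)
puts merged.snd.to_a.sort.inspect
```

Note that the newer version replaces the older one outright even though `Set["v1"]` is not a subset of `Set["v2"]`: the pair grows through its version, not its value.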
Figure (Replica 1 and Replica 2 over time), step by step:
1. Version increase; NOT value increase
2. R1’s version replaces R2’s version
3. New R2
4. Concurrent writes!
5. Merge VC (automatically), value merge via user’s lattice (as in Dynamo)
Lattice Composition in KVS
Conclusion
Dealing with EC: many event orders → order-independent (disorderly) programs
– Lattices: disorderly state
– Monotone functions: disorderly computation
– Monotone Bloom^L: lattices + monotone functions for safe distributed programming
Questions Welcome
Please try Bloom!
Or: gem install bud
Backup Slides
Lattices
$\langle S, \sqcup, \bot \rangle$ is a bounded join semilattice iff:
– $S$ is a partially ordered set
– $\sqcup$ is a binary operator (“least upper bound”)
  – For all $x, y \in S$, $x \sqcup y = z$ where $x \le_S z$, $y \le_S z$, and there is no $z' \in S$ with $z' \ne z$, $x \le_S z'$, $y \le_S z'$, and $z' \le_S z$
  – Associative, commutative, and idempotent
– $\bot$ is the “least” element in $S$ ($\forall x \in S: \bot \sqcup x = x$)
Example: increasing integers
– $S = \mathbb{Z} \cup \{-\infty\}$, $\sqcup = \max$, $\bot = -\infty$
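The laws in this definition can be spot-checked for the increasing-integers example. The plain-Ruby sketch below uses `-Float::INFINITY` for the least element and a small hypothetical sample of values; it checks the least-upper-bound condition, associativity, commutativity, idempotence, and the bottom law on that sample only (a test, not a proof).

```ruby
# Spot-check the semilattice laws for (integers plus -infinity, max, -infinity).
BOTTOM = -Float::INFINITY
join = ->(x, y) { [x, y].max }
samples = [BOTTOM, -3, 0, 2, 7]

# x join y is an upper bound of x and y, and no smaller upper bound exists.
least_upper_bound = samples.product(samples).all? do |x, y|
  z = join.(x, y)
  x <= z && y <= z && samples.none? { |w| x <= w && y <= w && w < z }
end
associative = samples.product(samples, samples).all? do |x, y, z|
  join.(join.(x, y), z) == join.(x, join.(y, z))
end
commutative = samples.product(samples).all? { |x, y| join.(x, y) == join.(y, x) }
idempotent  = samples.all? { |x| join.(x, x) == x }
bottom_ok   = samples.all? { |x| join.(BOTTOM, x) == x }

puts [least_upper_bound, associative, commutative, idempotent, bottom_ok].inspect
```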
Monotone Functions
$f : S \to T$ is a monotone function iff $\forall a, b \in S : a \le_S b \Rightarrow f(a) \le_T f(b)$
Example: size(Set) → Increasing-Int
– size({A, B}) = 2
– size({A, B, C}) = 3
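The definition can be checked exhaustively over a tiny universe. This plain-Ruby sketch enumerates every pair of subsets of {A, B, C}, uses ⊆ as the set order, and tests whether a ⊆ b implies f(a) ≤ f(b). `size` passes; a hypothetical negated threshold (“fewer than 2 members”, encoded as 1/0 so that false < true) fails, which is why threshold tests are written as “if” without “else”.

```ruby
require 'set'

universe = %w[A B C]
# All 8 subsets of {A, B, C}.
subsets = (0..universe.size).flat_map { |k| universe.combination(k).map(&:to_set) }

# f is monotone iff a ⊆ b implies f(a) <= f(b), for every pair checked.
def monotone?(subsets, f)
  subsets.product(subsets).all? do |a, b|
    !a.subset?(b) || f.call(a) <= f.call(b)
  end
end

size_fn   = ->(s) { s.size }
below_two = ->(s) { s.size < 2 ? 1 : 0 } # true (1) for {} but false (0) for {A, B}

puts monotone?(subsets, size_fn)   # true
puts monotone?(subsets, below_two) # false
```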
From Datalog → Lattices
Datalog (Bloom) vs. Bloom^L:
– State: Relations vs. Lattices
– Example values: [[“red”, 1], [“green”, 2]] vs. set: [“red”, “green”]; map: {“red” => 1, “green” => 2}; counter: 5; condition: false
– Computation: Rules over relations vs. Functions over lattices
– Monotone computation: Monotone rules vs. Monotone functions
– Program semantics: Fixpoint of rules (stratified semantics) vs. Fixpoint of functions (stratified semantics)
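To make “fixpoint” concrete, here is a naive evaluation sketch in plain Ruby: repeatedly apply one monotone rule (transitive reachability, a classic Datalog example) to set-valued state until nothing new is derived. The edge data and `step` helper are hypothetical, and this is not how Bud evaluates programs; it only illustrates the idea.

```ruby
require 'set'

edges = Set[["a", "b"], ["b", "c"], ["c", "d"]]

# One application of the rule: path(x, z) :- path(x, y), edge(y, z).
def step(paths, edges)
  derived = paths.flat_map do |(x, y)|
    edges.select { |(y2, _z)| y2 == y }.map { |(_y2, z)| [x, z] }
  end
  paths | derived # merge = union, so the state only grows
end

paths = edges
loop do
  nxt = step(paths, edges)
  break if nxt == paths # fixpoint: applying the rule adds nothing new
  paths = nxt
end

puts paths.to_a.sort.inspect
```

Because each step only adds facts and the universe of possible pairs is finite, the loop must terminate.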
Bloom Operational Model
Quorum Vote in Bloom
  require 'bud'

  QUORUM_SIZE = 5
  RESULT_ADDR = "example.org"

  class QuorumVote
    include Bud

    state do
      channel :vote_chn, [:@addr, :voter_id]   # Communication
      channel :result_chn, [:@addr]
      table   :votes, [:voter_id]              # Persistent storage
      scratch :cnt, [] => [:cnt]               # Transient storage
    end

    bloom do
      votes      <= vote_chn {|v| [v.voter_id]}          # Accumulate votes
      cnt        <= votes.group(nil, count(:voter_id))   # Not (set) monotonic!
      # Send message when quorum reached
      result_chn <~ cnt {|c| [RESULT_ADDR] if c.cnt >= QUORUM_SIZE}
    end
  end
Current Status
Writeups
– Bloom^L: UCB Tech Report
– Bloom/CALM: CIDR’11, website
Lattice Runtime
– Available as a git branch
– To be merged soon-ish
Examples, Case Studies
– KVS
– Shopping carts
– Causal delivery
– Under development: MDCC, concurrent editing