
1 Declarative Distributed Programming with Dedalus and Bloom Peter Alvaro, Neil Conway UC Berkeley

2 This Talk 1. Background – BOOM Analytics 2. Theory – Dedalus – CALM 3. Practice – Bloom – Lattices

3 Berkeley Orders of Magnitude Vision: Can we build small programs for large distributed systems? Approach: Language → system design; System → language design

4 Initial Language: Overlog Data-centric programming – Uniform representation of system state High-level, declarative query language – Distributed variant of Datalog Express systems as queries

5 BOOM Analytics Goal: “Big Data” stack – API-compliant – Competitive performance System: [EuroSys’10] – Distributed file system (HDFS-compatible) – Hadoop job scheduler

6 What Worked Well Concise, declarative implementation – 10-20x more concise than Java (LOCs) – Similar performance (within 10-20%) Separation of policy and mechanism Ease of evolution 1. High availability (failover + Paxos) 2. Scalability (hash partitioned FS master) 3. Monitoring as an aspect

7 What Worked Poorly Unclear semantics – “Correct” semantics defined by interpreter behavior In particular, 1. change (e.g., state update) 2. uncertainty (e.g., async communication)

8 Temporal Ambiguity Goal: Increment a counter upon a “request” message; send a response message with the value of the counter. counter(“hostname”,0). counter(To,X+1) :- counter(To,X), req(To,_). response(@From,X) :- counter(@To,X), req(@To,From). When is counter incremented? What does response contain?

9 Implicit Communication Implicit communication was the wrong abstraction for systems programming. – Hard to reason about partial failure Example: we never used distributed joins in the file system! path(@S,D) :- link(@S,Z), path(@Z,D).

10 Received Wisdom We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure. Jim Waldo et al., A Note on Distributed Computing (1994)

11 Dedalus (it’s about time) Explicitly represent logical time as an attribute of all knowledge “Time is a device that was invented to keep everything from happening at once.” (Unattributed)

12 Dedalus: Syntax Datalog + temporal modifiers 1. Instantaneous (deduction) 2. Deferred (sequencing) True at successor time 3. Asynchronous (communication) True at nondeterministic future time

13 Dedalus: Syntax (1) Deductive rule: (Plain Datalog) (2) Inductive rule: (Constraint across “next” timestep) (3) Async rule: (Constraint across arbitrary timesteps) p(A,B,S) :- q(A,B,T), T=S. p(A,B,S) :- q(A,B,T), S=T+1. p(A,B,S) :- q(A,B,T), time(S), choose((A,B,T), (S)). Logical time

14 Syntax Sugar (1) Deductive rule: (Plain Datalog) (2) Inductive rule: (Constraint across “next” timestep) (3) Async rule: (Constraint across arbitrary timesteps) p(A,B) :- q(A,B). p(A,B)@next :- q(A,B). p(A,B)@async:- q(A,B).

15 State Update p(A, B)@next :- p(A, B), notin p_del(A, B). Example Trace: p(1, 2)@101; p(1, 3)@102; p_del(1, 3)@300. Resulting timeline: p(1, 2) holds at 101 and at every later time; p(1, 3) holds from 102 through 300; because p_del(1, 3) holds at 300, p(1, 3) is not carried forward to 301.
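
Not from the slides: a minimal Python sketch (my own illustration) of how the inductive persistence rule plays out on this trace. The relation names and timestamps come from the slide; the tiny step-by-step evaluator is an assumption, not the Dedalus runtime.

# Simulate p(A,B)@next :- p(A,B), notin p_del(A,B) on the example trace.
base_facts = {
    ("p", (1, 2), 101),
    ("p", (1, 3), 102),
    ("p_del", (1, 3), 300),
}

def run(base, start=101, end=302):
    facts = set(base)
    for t in range(start, end):
        # Inductive rule: p(A,B) carries over to t+1 unless p_del(A,B) holds at t.
        for (rel, args, time) in list(facts):
            if rel == "p" and time == t and ("p_del", args, t) not in facts:
                facts.add(("p", args, t + 1))
    return facts

facts = run(base_facts)
for t in (101, 102, 300, 301):
    print(t, sorted(args for (rel, args, time) in facts if rel == "p" and time == t))
# 101 [(1, 2)]
# 102 [(1, 2), (1, 3)]
# 300 [(1, 2), (1, 3)]
# 301 [(1, 2)]   <- p(1, 3) stops persisting because p_del(1, 3) held at 300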

16 Logic and time Key relationships: Atomicity Mutual exclusion Sequentiality Overlog: Relationships among facts Dedalus: Also, relationships between states

17 Change and Asynchrony Overlog: counter(“hostname”,0). counter(To,X+1) :- counter(To,X), req(To,_). response(@From,X) :- counter(@To,X), req(@To,From). Dedalus: counter(“hostname”,0). counter(To,X+1)@next :- counter(To,X), req(To,_). counter(To,X)@next :- counter(To,X), notin req(To,_). response(@From,X)@async :- counter(@To,X), req(@To,From). The increment is deferred (@next), the response carries the pre-increment value, and the delivery time is non-deterministic (@async).
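
As an illustration (not part of the deck), here is a small Python simulation of the Dedalus version above. The delivery delay for the @async response is drawn at random to model the nondeterministic choice; relation and node names follow the slide, everything else is assumed.

import random

def simulate(delivery_delay):
    counter = {0: 0}            # timestep -> counter value at "hostname"
    requests = {0: ["node2"]}   # timestep -> requesters seen at that step
    responses = {}              # delivery timestep -> (requester, value carried)
    for t in range(0, 10 + delivery_delay):
        cur, reqs = counter[t], requests.get(t, [])
        # @async rule: the response carries the counter value as of time t,
        # but is delivered at a nondeterministic future time.
        for frm in reqs:
            responses[t + delivery_delay] = (frm, cur)
        # @next rules: increment if a request arrived at t, otherwise persist.
        counter[t + 1] = cur + 1 if reqs else cur
    return responses

print(simulate(delivery_delay=random.randint(1, 100)))
# Always maps some future timestep to ('node2', 0): the pre-increment value.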

18 Dedalus: Semantics Goal: declarative semantics – Reason about a program’s meaning rather than its executions Approach: model theory

19 Minimal Models A negation-free (monotonic) Datalog program has a unique minimal model. Model: no “missing” facts. Minimal: no “extra” facts. Unique: the program has a single meaning.
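
A minimal Python sketch (my illustration, not from the talk) of why negation-free rules have a unique minimal model: naive bottom-up evaluation adds facts until a fixpoint, and the result is the same no matter in which order the rules fire. The link/path program is the classic transitive-closure example.

# Rules: path(X,Y) :- link(X,Y).   path(X,Z) :- link(X,Y), path(Y,Z).
link = {("a", "b"), ("b", "c"), ("c", "d")}

def minimal_model(link):
    path = set()
    while True:
        new = set(link)
        new |= {(x, z) for (x, y) in link for (y2, z) in path if y == y2}
        if new <= path:          # fixpoint: nothing new can be derived
            return path
        path |= new

print(sorted(minimal_model(link)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]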

20 Stable Models The consequences of async rules hold at a nondeterministic future time – Captured by the choice construct Greco and Zaniolo (1998) – Each choice leads to a distinct model Intuition: A stable model is an execution trace

21 Traces and models counter(To, X+1)@next :- counter(To, X), request(To, _). counter(To, X)@next :- counter(To, X), notin request(To, _). response(@From, X)@async :- counter(@To, X), request(@To, From). response(From, X)@next :- response(From, X). Persistence rules lead to infinitely large models Async rules lead to infinitely many models

22 An Execution counter(To, X+1)@next :- counter(To, X), request(To, _). counter(To, X)@next :- counter(To, X), notin request(To, _). response(@From, X)@async :- counter(@To, X), request(@To, From). response(From, X)@next :- response(From, X). counter(“node1”, 0)@0. request(“node1”, “node2”)@0.

23 A Stable Model counter(“node1”, 0)@0. request(“node1”, “node2”)@0. counter(“node1”, 1)@1. counter(“node1”,1)@2. […] response(“node2”, 0)@100. counter(“node1”, 1)@101. counter(“node1”, 1)@102. response(“node2”, 0)@101. response(“node2”, 0)@102. […] A stable model for choice = 100 counter(To, X+1)@next :- counter(To, X), request(To, _). counter(To, X)@next :- counter(To, X), notin request(To, _). response(@From, X)@async :- counter(@To, X), request(@To, From). response(From, X)@next :- response(From, X).

24 Ultimate Models A stable model characterizes an execution – Many of these models are not “interestingly” different Wanted: a characterization of outcomes – An ultimate model contains exactly those facts that are “eventually always true”

25 Traces and models counter(To,X+1)@next :- counter(To,X), request(To,_). counter(To,X)@next :- counter(To,X), notin request(To,_). response(@From,X)@async :- counter(@To,X), request(@To,From). response(From,X)@next :- response(From,X). counter(“node1”, 0)@0. request(“node1”, “node2”)@0. counter(“node1”, 1)@1. counter(“node1”, 1)@2. […] response(“node2”, 0)@100. counter(“node1”, 1)@101. response(“node2”, 0)@101. […] counter(“node1”, 1)@102. response(“node2”, 0)@102.

26 Traces and models counter(To,X+1)@next :- counter(To,X), request(To,_). counter(To,X)@next :- counter(To,X), notin request(To,_). response(@From,X)@async :- counter(@To,X), request(@To,From). response(From,X)@next :- response(From,X). counter(“node1”, 1). response(“node2”, 0). Ultimate Model

27 Confluence This program has a unique ultimate model – In fact, all negation-free Dedalus programs have a unique ultimate model [DL2’12] We call such programs confluent: same program outcome, regardless of network non-determinism

28 The Bloom Programming Language

29

30 Lessons from Dedalus 1. Clear program semantics is essential 2. Avoid implicit communication 3. Confluence seems promising

31 Lessons From Building Systems 1. Syntax matters! – Datalog syntax is cryptic and foreign 2. Adopt, don’t reinvent – DSL > standalone language – Use host language’s type system (E. Meijer) 3. Modularity is important – Scoping – Encapsulation

32 Bloom Operational Model

33 Bloom Rule Syntax Merge operators: <= (now), <+ (next), <- (delete at next), <~ (async). Collection types: table (persistent state), scratch (transient state), channel (network transient). Collection methods: map, flat_map, reduce, group, join, outerjoin, empty?, include?. The operators cover local computation (<=), state update (<+, <-), and asynchronous message passing (<~).

34 Example: Quorum Vote

QUORUM_SIZE = 5
RESULT_ADDR = "example.org"
class QuorumVote
  include Bud
  state do
    channel :vote_chn, [:@addr, :voter_id]
    channel :result_chn, [:@addr]
    table :votes, [:voter_id]
    scratch :cnt, [] => [:cnt]
  end
  bloom do
    votes <= vote_chn {|v| [v.voter_id]}
    cnt <= votes.group(nil, count(:voter_id))
    result_chn <~ cnt {|c| [RESULT_ADDR] if c.cnt >= QUORUM_SIZE}
  end
end

Slide callouts: the channels are the communication interfaces; the table and scratch hold coordinator state; the first rule accumulates votes, the second counts them, and the third sends a message (asynchronously) when a quorum is reached. The state block declares Bloom state, the bloom block holds Bloom logic, and the whole thing is an ordinary Ruby class definition.

35 Question: How does confluence relate to practical problems of distributed consistency?

36 Common Technique: Replicate state at multiple sites, for: Fault tolerance Reduced latency Read throughput

37 Problem: Different replicas might observe events in different orders … and then reach different conclusions!

38 Alternative #1: Enforce consistent event order at all nodes (“Strong Consistency”)

39 Alternative #1: Enforce consistent event order at all nodes (“Strong Consistency”) Problems: Availability CAP Theorem Latency

40 Alternative #2: Achieve correct results for any network order (“Weak Consistency”)

41 Alternative #2: Achieve correct results for any network order (“Weak Consistency”) Concerns: Writing order-independent programs is hard!

42 Challenge: How can we make it easier to write order-independent programs?

43 Order-Independent Programs Alternative #1: – Start with a conventional language – Reason about when order can be relaxed This is hard! Especially for large programs.

44 Taking Order For Granted Data: an (ordered) array of bytes. Compute: an (ordered) sequence of instructions. Writing order-sensitive programs is too easy!

45 Order-Independent Programs Alternative #1: – Start with a conventional language – Reason about when order can be relaxed This is hard! Especially for large programs. Alternative #2: – Start with an order-independent language – Add order explicitly, only when necessary – “Disorderly Programming”

46 (Leading) Question: So, where might we find a nice order-independent programming language? Recall: All monotone Dedalus programs are confluent.

47 Monotonic Logic: as the input set grows, the output set does not shrink; order independent; e.g., map, filter, join, union, intersection. Non-Monotonic Logic: new inputs might invalidate previous outputs; order sensitive; e.g., aggregation, negation.
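
A small Python sketch (illustrative data, not from the slides) of this contrast: accumulating a set and taking its union is monotone, so every delivery order ends in the same state, while a query involving negation over a set that may still grow can give different answers in different orders.

from itertools import permutations

events = [("add", "x"), ("add", "y"), ("remove", "y"), ("query", None)]

def run(order):
    adds, removes, answer = set(), set(), None
    for (op, item) in order:
        if op == "add":
            adds.add(item)            # union: monotone
        elif op == "remove":
            removes.add(item)
        else:
            # Negation over sets that may still grow: non-monotone.
            answer = frozenset(adds - removes)
    return frozenset(adds), answer

results = {run(p) for p in permutations(events)}
print({mono for (mono, _) in results})  # one outcome: frozenset({'x', 'y'})
print({ans for (_, ans) in results})    # several outcomes, depending on order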

48 Consistency As Logical Monotonicity CALM Analysis [CIDR’11] 1. Monotone programs are deterministic (confluent) [Ameloot’11, Marczak’12] 2. Simple syntactic test for monotonicity Result: Whole-program static analysis for eventual convergence
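
The flavor of that syntactic test, as a toy Python sketch (my own illustration, not Bud's actual analyzer): treat map/filter/join/union as monotone, and flag any rule that uses negation or aggregation as a point where coordination may be needed.

NON_MONOTONE = {"notin", "group", "count", "sum", "min", "max"}

def monotone_rule(ops):
    return not (set(ops) & NON_MONOTONE)

def calm_report(program):
    suspects = sorted(name for name, ops in program.items() if not monotone_rule(ops))
    if not suspects:
        return "monotone: confluent without coordination"
    return "coordination may be needed at: " + ", ".join(suspects)

prog_a = {"r1": ["join", "union"], "r2": ["map", "filter"]}    # all monotone
prog_b = {"r1": ["join", "union"], "r2": ["group", "count"]}   # uses aggregation
print(calm_report(prog_a))   # monotone: confluent without coordination
print(calm_report(prog_b))   # coordination may be needed at: r2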

49 Case Study

50 Scenario

51

52

53

54 Questions 1. Will cart replicas eventually converge? – “Eventual Consistency” 2. What will client observe on checkout? – Goal: checkout reflects all session actions 3. To achieve #1 and #2, how much additional coordination is required?

55 Design #1: Mutable State
Add(item x, count c):
  if kvs[x] exists: old = kvs[x]; kvs.delete(x)
  else: old = 0
  kvs[x] = old + c
Remove(item x, count c):
  if kvs[x] exists: old = kvs[x]; kvs.delete(x)
  if old > c: kvs[x] = old - c
Non-monotonic!

56 CALM Analysis Conclusion: Every operation might require coordination! Non-monotonic!

57 Subtle Bug
Add(item x, count c):
  if kvs[x] exists: old = kvs[x]; kvs.delete(x)
  else: old = 0
  kvs[x] = old + c
Remove(item x, count c):
  if kvs[x] exists: old = kvs[x]; kvs.delete(x)
  if old > c: kvs[x] = old - c
What if remove before add?

58 Design #2: “Disorderly” Add(item x, count c): Append x,c to add_log Remove(item x, count c): Append x,c to del_log Checkout(): Group add_log by item ID; sum counts. Group del_log by item ID; sum counts. For each item, subtract deletions from additions. Non-monotonic!
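
To make the contrast concrete, here is a small Python sketch (made-up operations, not from the slides): Design #1 replicas apply operations in whatever order the network delivers them and can diverge, while Design #2 replicas only accumulate log entries, so every delivery order converges and only checkout has to aggregate.

from itertools import permutations

ops = [("add", "book", 2), ("remove", "book", 1), ("add", "cd", 1)]

def design1(order):                      # mutable key/value state: order-sensitive
    kvs = {}
    for (op, x, c) in order:
        old = kvs.pop(x, 0)
        if op == "add":
            kvs[x] = old + c
        elif old > c:                    # remove-before-add silently loses the removal
            kvs[x] = old - c
    return tuple(sorted(kvs.items()))

def design2(order):                      # "disorderly": accumulate logs, aggregate at checkout
    add_log, del_log = [], []
    for (op, x, c) in order:
        (add_log if op == "add" else del_log).append((x, c))
    total = lambda log, item: sum(c for (x, c) in log if x == item)
    items = {x for (x, _) in add_log}
    return tuple(sorted((x, total(add_log, x) - total(del_log, x)) for x in items))

print({design1(p) for p in permutations(ops)})   # more than one outcome: replicas diverge
print({design2(p) for p in permutations(ops)})   # single outcome: book=1, cd=1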

59 CALM Analysis Conclusion: Replication is safe; might need to coordinate on checkout Monotonic

60 Takeaways Major difference in coordination cost! – Coordinate once per operation vs. coordinate once per checkout Disorderly accumulation when possible – Monotone growth ⇒ confluent “Disorderly”: common design in practice! – e.g., Amazon Dynamo

61 Generalizing Monotonicity Monotone logic: growing sets over time – Partial order: set containment In practice, other kinds of growth: – Version numbers, timestamps – “In-progress” → committed/aborted – Directories, sequences, …

62 Example: Quorum Vote (not set-wise monotonic!)

QUORUM_SIZE = 5
RESULT_ADDR = "example.org"
class QuorumVote
  include Bud
  state do
    channel :vote_chn, [:@addr, :voter_id]
    channel :result_chn, [:@addr]
    table :votes, [:voter_id]
    scratch :cnt, [] => [:cnt]
  end
  bloom do
    votes <= vote_chn {|v| [v.voter_id]}
    cnt <= votes.group(nil, count(:voter_id))
    result_chn <~ cnt {|c| [RESULT_ADDR] if c.cnt >= QUORUM_SIZE}
  end
end

63 Challenge: Extend monotone logic to allow other kinds of “growth”

64 ⟨S, ⊔, ⊥⟩ is a bounded join semilattice iff: – S is a set – ⊔ is a binary operator (the “least upper bound”), which induces a partial order on S: x ≤S y iff x ⊔ y = y – ⊔ is Associative, Commutative, and Idempotent (“ACID 2.0”) – Informally, the LUB is the “merge function” for S – ⊥ is the “least” element in S: ∀x ∈ S, ⊥ ⊔ x = x

65 Example lattices, each growing over time: Set (⊔ = union), Increasing Int (⊔ = max), Boolean (⊔ = or).

66 f : S → T is a monotone function iff: ∀a, b ∈ S : a ≤S b ⇒ f(a) ≤T f(b)

67 Composing them over time: Set (⊔ = union) → Increasing Int (⊔ = max) via the monotone function size(); Increasing Int → Boolean (⊔ = or) via the monotone threshold test size() >= 3.
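
A minimal Python sketch of these three lattices and the two monotone functions between them (my illustration; Bud's actual lattice library is Ruby, and the class and method names here only mirror it):

class LSet:                                   # merge = set union
    def __init__(self, xs=frozenset()): self.v = frozenset(xs)
    def merge(self, other): return LSet(self.v | other.v)
    def size(self): return LMax(len(self.v))  # monotone: set -> increasing int

class LMax:                                   # merge = max
    def __init__(self, n=float("-inf")): self.v = n
    def merge(self, other): return LMax(max(self.v, other.v))
    def gt_eq(self, k): return LBool(self.v >= k)  # monotone: increasing int -> bool

class LBool:                                  # merge = logical or
    def __init__(self, b=False): self.v = b
    def merge(self, other): return LBool(self.v or other.v)

votes = LSet()
for batch in [{"a"}, {"b", "a"}, {"c"}]:      # merges may arrive in any order
    votes = votes.merge(LSet(batch))
print(votes.size().gt_eq(3).v)                # True once 3 distinct voters are seen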

68 Quorum Vote with Lattices

QUORUM_SIZE = 5
RESULT_ADDR = "example.org"
class QuorumVote
  include Bud
  state do
    channel :vote_chn, [:@addr, :voter_id]
    channel :result_chn, [:@addr]
    lset :votes
    lmax :vote_cnt
    lbool :got_quorum
  end
  bloom do
    votes <= vote_chn {|v| v.voter_id}
    vote_cnt <= votes.size
    got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE)
    result_chn <~ got_quorum.when_true { [RESULT_ADDR] }
  end
end

Slide callouts: the lattice state declarations are the program state and the bloom block is the program logic; the first rule accumulates votes into a set, merging new votes with stored votes via the set LUB; votes.size is a monotone function from set to max (merged via the lmax LUB); gt_eq is a monotone function from max to bool; and when_true is a (monotone) threshold test on the bool.

69 Conclusions Interplay between language and system design Key question: what should be explicit? – Initial answer: asynchrony, state update – Refined answer: order Disorderly programming for disorderly networks

70 Thank You! Queries welcome. gem install bud http://www.bloom-lang.net Emily Andrews Peter Bailis William Marczak David Maier Tyson Condie Joseph M. Hellerstein Rusty Sears Sriram Srinivasan Collaborators:

71 Extra slides

72 Ongoing Work 1. Lattices – Concurrent editing – Distributed garbage collection 2. Confluence and concurrency control – Support for “controlled non-determinism” – Program analysis for serializability? 3. Safe composition of monotone and non-monotone code

73 Overlog “Our intellectual powers are rather geared to master static relations and […] our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.” Edsger W. Dijkstra

74 (Themes) Disorderly / order-light programming (Understanding, simplifying) the relation between program syntax and outcomes Determinism (in asynchronous executions) as a correctness criterion Coordination – theoretical basis and mechanisms – What programs require coordination? – How can we coordinate them efficiently?

75 Traces and models counter(X+1)@next :- counter(X), request(_, _). counter(X)@next :- counter(X), notin request(_, _). response(@From, X)@async :- counter(X), request(To, From). response(From, X)@next :- response(From, X). counter(0)@0. request(“node1”, “node2”)@0.

76 Traces and models -- 0 counter(0+1)@1 :- counter(0)@0, request(“node1”, “node2”)@0. counter(X)@next :- counter(X), notin request(_, _). response(“node2”, 0)@100 :- counter(0)@0, request(“node1”, “node2”)@0. response(From, X)@next :- response(From, X). counter(0)@0. request(“node1”, “node2”)@0. counter(1)@1. […] response(“node2”, 0)@100.

77 Traces and models -- 1 counter(X+1)@next :- counter(X), request(To, From). counter(1)@?+1 :- counter(1)@?, notin request(_, _)@?. response(From, X)@async :- counter(X), request(To, From). response(“node2”, 0)@101:- response(“node2”, 0)@100. counter(0)@0. request(“node1”, “node2”)@0. counter(1)@1. counter(1)@2. […]

78 Traces and models -- 100 counter(X+1)@next :- counter(X), request(To, From). counter(1)@101 :- counter(1)@100, notin request(_, _)@100. response(From, X)@async :- counter(X), request(To, From). response(“node2”, 0)@101:- response(“node2”, 0)@100. counter(0)@0. request(“node1”, “node2”)@0. counter(1)@1. counter(1)@2. […] response(“node2”, 0)@100. counter(1)@101. response(“node2”, 0)@101.

79 Traces and models – 101+ counter(X+1)@next :- counter(X), request(To, From). counter(1)@102 :- counter(1)@101, notin request(_, _)@101. response(From, X)@async :- counter(X), request(To, From). response(“node2”, 0)@102:- response(“node2”, 0)@101. counter(0)@0. request(“node1”, “node2”)@0. counter(1)@1. counter(1)@2. […] response(“node2”, 0)@100. counter(1)@101. counter(1)@102. response(“node2”, 0)@101. response(“node2”, 0)@102. […] A stable model for choice = 100

80 Traces and models counter(X+1)@next :- counter(X), request(_, _). counter(X)@next :- counter(X), notin request(_, _). response(@From, X)@async :- counter(X), request(To, From). response(From, X)@next :- response(From, X). counter(0)@0. request(“node1”, “node2”)@0. Stable models: { counter(0)@0, counter(1)@1, counter(1)@2, […] response(“node2”, 0)@k, response(“node2”, 0)@k+1, […] }

81 Studying confluence in Dedalus q(#L, X)@async <- e(X), replica(L). p(X) <- q(_, X). p(X)@next <- p(X). Bob Carol q(Bob,1)@1 q(Bob, 2)@2 e(1). e(2). replica(Bob). replica(Carol). q(Carol, 2)@1 q(Carol, 1)@2 p(1), p(2) Alice UUM (unique ultimate model)

82 Studying confluence in Dedalus q(#L, X)@async <- e(X), replica(L). r(#L, X)@async <- f(X), replica(L). p(X) <- q(_, X), r(_, X). p(X)@next <- p(X). Bob Carol q(Bob,1)@1 r(Bob, 1)@2 e(1). f(1). replica(Bob). replica(Carol). q(Carol, 1)@1 r(Carol, 1)@1 { } p(1) Alice Multiple ultimate models

83 Studying confluence in Dedalus Bob Carol q(Bob,1)@1 r(Bob, 1)@2 e(1). f(1). replica(Bob). replica(Carol). r(Carol, 1)@1 q(Carol, 1)@2 p(1) { } Alice Multiple ultimate models q(#L, X)@async <- e(X), replica(L). r(#L, X)@async <- f(X), replica(L). p(X) <- q(_, X), r(_, X). p(X)@next <- p(X). q(L, X)@next <- q(L, X).

84 Studying confluence in Dedalus Bob Carol q(Bob,1)@1 r(Bob, 1)@2 e(1). f(1). replica(Bob). replica(Carol). r(Carol, 1)@1 q(Carol, 1)@2 p(1) Alice UUM q(#L, X)@async <- e(X), replica(L). r(#L, X)@async <- f(X), replica(L). p(X) <- q(_, X), r(_, X). p(X)@next <- p(X). q(L, X)@next <- q(L, X). r(L, X)@next <- r(L, X).

85 Studying confluence in Dedalus Bob Carol q(Bob,1)@1 r(Bob, 1)@2 e(1). f(1). replica(Bob). replica(Carol). r(Carol, 1)@1 q(Carol, 1)@2 p(1) { } Alice q(#L, X)@async <- e(X), replica(L). r(#L, X)@async <- f(X), replica(L). p(X) <- q(_, X), NOT r(_, X). p(X)@next <- p(X). q(L, X)@next <- q(L, X). r(L, X)@next <- r(L, X). Multiple ultimate models

86 CALM – Consistency as logical monotonicity Logically monotonic => confluent Consequence: a (conservative) static analysis for eventual consistency Practical implications: – Language support for weakly-consistent, coordination-free distributed systems!

87 Does CALM help? Is the monotonic subset of Dedalus sufficiently expressive / convenient to implement distributed systems?

88 Coordination CALM’s complement: – Nonmonotonic => order-sensitive – Ensuring deterministic outcomes may require controlling order. We could constrain the order of – Data E.g., via ordered delivery – Computation E.g., via evaluation barriers

89 Coordination mechanisms Bob Carol r(Bob,1)@1 e(1). f(1). replica(Bob). replica(Carol). { } Alice Approach 1: Deliver the q() and r() tuples in the same total order to all replicas. q(Bob, 1)@2 r(Bob,1)@1 q(Bob, 1)@2 { }

90 Coordination mechanisms Bob Carol q(Bob,1)@1 e(1). f(1). replica(Bob). replica(Carol). { } Alice p(X) <- q(_, X), NOT r(_, X). r(Bob, 1)@2 r(Bob,1)@1 q(Bob, 1)@2 { } Approach 2: Do not evaluate “NOT r(X)” until its contents are completely determined.
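
As an illustration (not from the slides), a tiny Python sketch of Approach 2 under one extra assumption: the producer of r() sends an explicit “seal” message after its last r tuple, and channels are FIFO per sender. The replica holds back the non-monotone rule p(X) <- q(_, X), NOT r(_, X) until r is sealed, so both delivery orders below agree on p.

def replica(messages):
    q, r, sealed_r, p = set(), set(), False, set()
    for kind, payload in messages:
        if kind == "q":
            q.add(payload)
        elif kind == "r":
            r.add(payload)
        elif kind == "seal":
            sealed_r = True              # assumption: no more r tuples will arrive
        if sealed_r:
            # Safe to apply the negation: r is fully determined, and p only grows
            # as more q tuples arrive (monotone in q).
            p |= q - r
    return frozenset(p)

m1 = [("q", 1), ("q", 2), ("r", 2), ("seal", None)]
m2 = [("r", 2), ("seal", None), ("q", 2), ("q", 1)]
print(replica(m1), replica(m2))          # both frozenset({1})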

91 Ordered delivery vs. stratification (Differences) Stratified evaluation – Unique outcome across all executions – Finite inputs – Communication between producers and consumers Ordered delivery – Different outcomes in different runs – No restriction on inputs – Multiple producers and consumers => need distributed consensus

92 Ordered delivery vs. stratification (Similarities) Stratified evaluation – Control order of evaluation at a coarse grain, table by table – Order is given by program syntax Ordered delivery – Fine-grained order of evaluation, row by row – Order is nondeterministically chosen by an oracle (e.g., Paxos) – Analogy: Assign a stratum to each tuple. Ensure that all replicas see the same stratum assignments

