Blazes: coordination analysis for distributed programs
Peter Alvaro, Neil Conway, Joseph M. Hellerstein (UC Berkeley); David Maier (Portland State)
Distributed systems are hard: asynchrony and partial failure.
Asynchrony isn't that hard. Amelioration: logical timestamps, deterministic interleaving.
Partial failure isn't that hard. Amelioration: replication, replay.
Asynchrony * partial failure is hard². Logical timestamps, deterministic interleaving, replication, and replay no longer suffice on their own.
Today: consistency criteria for fault-tolerant distributed systems. Blazes: analysis and enforcement.
This talk is all setup. Frame of mind:
1. Dataflow: a model of distributed computation
2. Anomalies: what can go wrong?
3. Remediation strategies
   1. Component properties
   2. Delivery mechanisms
Framework: Blazes – coordination analysis and synthesis
Little boxes: the dataflow model. A generalization of distributed services: components interact via asynchronous calls (streams).
Components: input interfaces and an output interface.
Streams: message delivery order is nondeterministic.
Example: a join operator with input streams R and S and output stream T.
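To make the example concrete, here is a minimal Ruby sketch (illustrative, not from the talk; class and method names are assumed) of a symmetric hash join over two asynchronous input streams R and S producing output stream T. Whatever order the R and S tuples arrive in, the set of output tuples is the same.

  # Symmetric hash join: each arrival probes the other side's state.
  class JoinOperator
    def initialize
      @r = Hash.new { |h, k| h[k] = [] }   # R tuples indexed by join key
      @s = Hash.new { |h, k| h[k] = [] }   # S tuples indexed by join key
      @t = []                              # output stream T
    end

    def insert_r(key, val)
      @r[key] << val
      @s[key].each { |sv| @t << [key, val, sv] }   # probe the S side
    end

    def insert_s(key, val)
      @s[key] << val
      @r[key].each { |rv| @t << [key, rv, val] }   # probe the R side
    end

    def output
      @t   # contents are order-independent; only their ordering varies by run
    end
  end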
Example: a key/value store with inputs put and get and output response.
Example: a pub/sub service with inputs publish and subscribe and output deliver.
Logical dataflow: the "software architecture". [Diagram: a data source and a client connected through Service X, which contains a filter and a cache; streams labeled a, b, c.]
Dataflow is compositional: components are recursively defined. [Diagram: Service X decomposed into a filter and an aggregator between the data source and the client.]
Dataflow exhibits self-similarity
[Diagram: a real-world dataflow spanning a DB, HDFS, Hadoop (Index, Combine), static HTTP, and applications App1/App2 (Buy, Content); user requests flow in, App1 and App2 answers flow out.]
Physical dataflow: the "system architecture". [Diagram: the logical dataflow (data source, Service X with filter and aggregator, client; streams a, b, c) mapped onto physical nodes.]
What could go wrong?
Cross-run nondeterminism: replays are nondeterministic; different runs of the same dataflow can produce different stream contents. [Diagram: Run 1 and Run 2 of the data source / Service X (filter, aggregator) / client dataflow yield different streams a, b, c.]
Cross-instance nondeterminism: transient replica disagreement. [Diagram: two replicas of Service X serving the same client temporarily disagree.]
Divergence: permanent replica disagreement. [Diagram: two replicas of Service X remain in different states indefinitely.]
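As an illustration (assumed example, not from the talk), two replicas of an order-sensitive, last-writer-wins store that receive the same puts in different orders end up permanently disagreeing:

  # Overwriting store: the final value depends on delivery order.
  class LwwStore
    def initialize; @data = {}; end
    def put(key, val); @data[key] = val; end   # overwrite: order matters
    def state; @data; end
  end

  a = LwwStore.new
  b = LwwStore.new
  a.put(:x, 1); a.put(:x, 2)   # replica A sees put(1) then put(2)
  b.put(:x, 2); b.put(:x, 1)   # replica B sees the same puts reordered
  a.state  # => {:x=>2}
  b.state  # => {:x=>1}  -- permanent disagreement: divergence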
Hazards: for each stream in the dataflow, does the outcome depend on delivery order? On stream contents? [Diagram: streams a, b, c annotated with "Order?" and "Contents?"]
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
Component properties
Convergence: component replicas receiving the same messages reach the same state. Rules out divergence.
Convergence: a convergent data structure (e.g., a Set CRDT) exposes insert and read. Commutativity, associativity, and idempotence make it tolerant to reordering, batching, and retry/duplication.
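A minimal sketch of such a structure, assuming a grow-only set (G-Set) with insert-only updates; class names are illustrative:

  require 'set'

  # Insert is commutative, associative, and idempotent, so replicas that
  # receive the same inserts -- in any order, batched, or duplicated by
  # retries -- converge to the same state.
  class GSet
    def initialize; @items = Set.new; end
    def insert(x); @items.add(x); self; end               # idempotent, commutative
    def merge(other); @items |= other.to_set; self; end   # apply a batch of inserts
    def read; @items.dup; end
    def to_set; @items; end
  end

  a = GSet.new
  b = GSet.new
  [1, 2, 3].each    { |x| a.insert(x) }   # one delivery order
  [3, 1, 2, 2].each { |x| b.insert(x) }   # reordered and duplicated
  a.read == b.read  # => true: same messages, same state (convergence)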
Convergence isn't compositional. A convergent component guarantees identical final state given identical input contents, but downstream consumers may read transient states that differ across runs or replicas. [Diagram: a convergent component between the data source and the client.]
Component properties
Convergence: component replicas receiving the same messages reach the same state. Rules out divergence.
Confluence: output streams have deterministic contents. Rules out all stream anomalies.
Confluent implies convergent.
Confluence: output set = f(input set). The same input set yields the same output set, regardless of arrival order or batching.
Confluence is compositional: output set = (f ∘ g)(input set).
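A small illustration (illustrative Ruby, not from the talk), treating confluent components as functions over input sets:

  require 'set'

  # Each component computes a function of its input *set*, independent of
  # arrival order; composing two such components is again order-independent.
  g = ->(inputs) { inputs.select(&:even?).to_set }     # a filter: confluent
  f = ->(inputs) { inputs.map { |x| x * 10 }.to_set }  # a map: confluent
  pipeline = ->(inputs) { f.call(g.call(inputs)) }     # f composed with g

  run1 = pipeline.call([1, 2, 3, 4].to_set)   # one delivery order / batching
  run2 = pipeline.call([4, 3, 2, 1].to_set)   # another
  run1 == run2  # => true: deterministic output contents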
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
2. Constrain message delivery orders
   1. Ordering
Ordering – global coordination: imposing a total delivery order makes even order-sensitive components produce deterministic outputs.
Ordering – global coordination is expensive: "The first principle of successful scalability is to batter the consistency mechanisms down to a minimum." – James Hamilton
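One way to picture the cost, in a sketch of global ordering via a single sequencer (illustrative only, not a protocol from the talk): every delivery is stamped by the sequencer, and replicas apply messages strictly in stamp order.

  # Central sequencer: every message pays a round trip through it.
  class Sequencer
    def initialize; @next = 0; end
    def stamp(msg); n = @next; @next += 1; [n, msg]; end
  end

  # Replicas buffer out-of-order deliveries and apply in sequence order,
  # so all replicas process the same messages in the same total order.
  class OrderedReplica
    def initialize; @buffer = {}; @applied = 0; @state = []; end
    def deliver(seq, msg)
      @buffer[seq] = msg
      while @buffer.key?(@applied)
        @state << @buffer.delete(@applied)
        @applied += 1
      end
    end
    def state; @state; end
  end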
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
2. Constrain message delivery orders
   1. Ordering
   2. Barriers and sealing
Barriers – local coordination: a barrier on a single component's inputs yields deterministic outputs from an order-sensitive component without ordering the entire dataflow. [Diagram: the barrier placed between the data source and the order-sensitive component.]
Sealing – continuous barriers
Do partitions of (infinite) input streams "end"?
Can components produce deterministic results given "complete" input partitions?
Sealing: partition barriers for infinite streams.
Sealing – continuous barriers
Finite partitions of infinite inputs are common
...in distributed systems: sessions, transactions, epochs/views
...and in applications: auctions, chats, shopping carts
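A sketch of sealing as a per-partition barrier (illustrative names, not from the talk): inputs for a partition are buffered until a seal message marks that partition complete, and only then is the partition evaluated, so its output is a deterministic function of the partition's contents.

  class SealedAggregator
    def initialize(&reducer)
      @pending = Hash.new { |h, k| h[k] = [] }   # buffered inputs per partition
      @results = {}
      @reducer = reducer
    end

    def deliver(partition, msg)
      raise "partition already sealed" if @results.key?(partition)
      @pending[partition] << msg
    end

    def seal(partition)   # the partition barrier: no more inputs will arrive
      @results[partition] = @reducer.call(@pending.delete(partition) || [])
    end

    def result(partition)
      @results[partition]   # nil until the partition is sealed
    end
  end

  # e.g., count bids per auction once the auction closes:
  bids = SealedAggregator.new { |msgs| msgs.size }
  bids.deliver(:auction42, "bid A"); bids.deliver(:auction42, "bid B")
  bids.seal(:auction42)
  bids.result(:auction42)  # => 2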
Blazes: consistency analysis + coordination selection
Blazes: Mode 1: Grey boxes
Grey boxes. Example: pub/sub (x = publish, y = subscribe, z = deliver). Deterministic but unordered.

  Severity  Label    Confluent  Stateless
  1         CR       yes        yes
  2         CW       yes        no
  3         OR_gate  no         yes
  4         OW_gate  no         no

Path annotations: x -> z : CW; y -> z : CW
Grey boxes. Example: key/value store (x = put, y = get, z = response). Deterministic but unordered. (Severity/label table as above.)

Path annotations: x -> z : OW_key; y -> z : OR
Label propagation – confluent composition: CW upstream of CR propagates as CW; outputs are deterministic.
Label propagation – unsafe composition: OW upstream of CR taints the downstream outputs; the edge between them is an interposition point where coordination must be injected.
Label propagation – sealing: when the non-confluent label is partitioned (OW_key), sealing the stream on that key (Seal(key=x)) restores deterministic outputs without global ordering.
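A much-simplified sketch of this analysis (illustrative only, not the Blazes implementation): map each path's label, per the severity table above, to the coordination it needs.

  SEVERITY = { "CR" => 1, "CW" => 2, "OR" => 3, "OW" => 4 }

  def strategy_for(label)
    sev = SEVERITY.fetch(label.sub(/_.*/, ""))   # e.g. "OW_key" -> "OW"
    if sev <= 2
      :none                    # confluent path: no coordination needed
    elsif label.include?("_")
      :seal_on_partition_key   # non-confluent but partitioned: sealing suffices
    else
      :global_ordering         # otherwise fall back to ordering deliveries
    end
  end

  strategy_for("CW")       # => :none
  strategy_for("OW_key")   # => :seal_on_partition_key
  strategy_for("OW")       # => :global_ordering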
Blazes: Mode 2: White boxes
White boxes: the key/value store in Bloom.

  module KVS
    state do
      interface input, :put, [:key, :val]
      interface input, :get, [:ident, :key]
      interface output, :response, [:response_id, :key, :val]
      table :log, [:key, :val]
    end

    bloom do
      log <+ put
      log <- (log * put).lefts(:key => :key)   # drop the old entry for the key: negation (order-sensitive)
      response <= (log * get).pairs(:key => :key) do |s, l|
        [l.ident, s.key, s.val]
      end
    end
  end

Labels inferred: put -> response: OW_key; get -> response: OR_key. The negation makes the component order-sensitive; its state is partitioned by :key.
White boxes: the pub/sub service in Bloom.

  module PubSub
    state do
      interface input, :publish, [:key, :val]
      interface input, :subscribe, [:ident, :key]
      interface output, :response, [:response_id, :key, :val]
      table :log, [:key, :val]
      table :sub_log, [:ident, :key]
    end

    bloom do
      log <= publish
      sub_log <= subscribe
      response <= (log * sub_log).pairs(:key => :key) do |s, l|
        [l.ident, s.key, s.val]
      end
    end
  end

Labels inferred: publish -> response: CW; subscribe -> response: CR.
The Blazes frame of mind:
Asynchronous dataflow model
Focus on consistency of data in motion
– Component semantics
– Delivery mechanisms and costs
Automatic, minimal coordination
Queries?