Dryad. Some slides adapted from those of Yuan Yu and Michael Isard.
What we’ve learnt so far. Basic distributed systems concepts: consistency (linearizability/strict serializability, causal, eventual). How to tolerate machine failure? Paxos, replication, transactions. What are distributed systems good for? Storage: fault tolerance, increased storage/serving capacity. Computation: large-scale distributed computation (today’s topic).
Recap: MapReduce API. Input data is partitioned into M splits. Map: extract information on each split, map(k, v) → <k', v'>*; each Map produces R partitions. Shuffle and sort: bring the matching partition from each of the M maps to the same reducer. Reduce: aggregate, summarize, reduce(k', <v'>*) → <k', v'>*. Output is in R result files.
Example: Pseudo-code
    Map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

    Reduce(String key, Iterator intermediate_values):
      // key: a word, same for input and output
      // intermediate_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));
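The pseudo-code above can be made runnable on a single machine. The following is a minimal sketch, not the MapReduce runtime: `run_mapreduce` and `num_reducers` are hypothetical names introduced here, and the "shuffle" is just an in-memory dictionary per partition.

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    # input_key: document name; input_value: document contents
    for word in input_value.split():
        yield word, "1"

def reduce_fn(key, intermediate_values):
    # key: a word; intermediate_values: counts encoded as strings
    yield key, str(sum(int(v) for v in intermediate_values))

def run_mapreduce(documents, num_reducers=2):
    # Map phase: every intermediate pair goes to one of R partitions by key hash.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            partitions[hash(k) % num_reducers][k].append(v)
    # Shuffle/sort and reduce phase: each partition is reduced independently.
    results = {}
    for part in partitions:
        for k in sorted(part):
            for out_k, out_v in reduce_fn(k, part[k]):
                results[out_k] = out_v
    return results

counts = run_mapreduce({"doc1": "x y x", "doc2": "y z"})
```

Because all values for one key land in the same partition, each reducer sees the complete list of counts for its words.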
Parallel MapReduce Runtime [Figure: input data feeds four Map tasks coordinated by a Master; map output is shuffled to three Reduce tasks, which write the partitioned output]
MapReduce started “big data”. Published in 2004; open source clone Hadoop in 2006. It became possible to analyze big data (10s, 100s, 1000s of terabytes).
Limitations of the original MapReduce: no joins. Many computations require several MapReduce jobs chained together. Best suited for processing unstructured text on disk.
Dryad. Similar goals to MapReduce: focus on throughput, not latency; automatic scheduling, distribution, fault tolerance. Computations are expressed as a dataflow graph: vertices are computations, edges are communication channels, and each vertex can have several input and output edges.
WordCount in Dryad [Figure: dataflow graph with Count, MergeSort, and Distribute stages; each stage is labeled Word:n]
Why use a dataflow graph? Many programs can be represented as a parallel dataflow graph, and Dryad will run them for you.
Job = Directed Acyclic Graph [Figure labels, bottom to top: Inputs, Processing vertices, Channels (file, pipe, shared memory), Outputs] A Dryad application is composed of a collection of processing vertices (processes). The vertices communicate with each other through channels. The vertices and channels must always form a directed acyclic graph.
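One way to picture the acyclicity requirement is to store a job as a mapping from each vertex to the vertices it reads from, and ask for a topological order. This is only an illustrative sketch using Python's standard `graphlib` module; the vertex names and the `execution_order` helper are hypothetical and not part of Dryad.

```python
from graphlib import TopologicalSorter, CycleError

def execution_order(reads_from):
    """reads_from maps each vertex to the set of vertices feeding it.

    Returns one valid order in which the vertices could run, or raises
    ValueError if the channels contain a cycle (so the job is not a DAG).
    """
    try:
        return list(TopologicalSorter(reads_from).static_order())
    except CycleError as err:
        raise ValueError("job graph contains a cycle") from err

# Hypothetical two-input job: two count vertices feed one merge vertex.
job = {
    "count0": {"in0"},
    "count1": {"in1"},
    "merge": {"count0", "count1"},
}
order = execution_order(job)
```

Any order returned places every vertex after all of its inputs, which is the property a scheduler needs before it can start a vertex.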
Dryad Scheduling. All jobs are scheduled by a central job manager. Scheduling rules: a vertex can run anywhere once all its inputs are ready; prefer executing a vertex near its inputs. Fault tolerance: if A fails, run it again; if A’s inputs are gone, run upstream vertices again (recursively); if A is slow, run another copy elsewhere and use output from whichever finishes first.
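The recursive re-execution rule can be sketched in a few lines. This is a toy, single-process model under stated assumptions: `JobManager`, `ensure`, and `lose` are hypothetical names, stored outputs stand in for local files, and locality and straggler handling are left out.

```python
class JobManager:
    def __init__(self, inputs_of, fn):
        self.inputs_of = inputs_of  # vertex -> list of upstream vertices
        self.fn = fn                # vertex -> function(list of input values)
        self.outputs = {}           # vertex -> stored output (stand-in for a local file)
        self.runs = []              # log of vertex executions, in order

    def ensure(self, v):
        # Return v's output, re-running v and any upstream vertex whose
        # output is missing (recursively), as in the fault-tolerance rules.
        if v not in self.outputs:
            args = [self.ensure(u) for u in self.inputs_of[v]]
            self.runs.append(v)
            self.outputs[v] = self.fn[v](args)
        return self.outputs[v]

    def lose(self, v):
        # Simulate losing the machine that held v's output.
        self.outputs.pop(v, None)

# Hypothetical three-vertex chain: a produces data, b sorts it, c sums it.
jm = JobManager(
    inputs_of={"a": [], "b": ["a"], "c": ["b"]},
    fn={"a": lambda xs: [3, 1, 2],
        "b": lambda xs: sorted(xs[0]),
        "c": lambda xs: sum(xs[0])},
)
first = jm.ensure("c")
jm.lose("b")
jm.lose("c")
second = jm.ensure("c")
```

After the simulated failure only `b` and `c` run again, because the output of `a` is still available.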
Advantages of DAG over MapReduce: big jobs are more efficient with Dryad. MapReduce: a big job runs as >=1 MR stages, and the reducers of each stage write to replicated storage (output of reduce: 2 network copies, 3 disks). Dryad: each job is represented as a DAG, and intermediate vertices write to local files.
Advantages of DAG over MapReduce: Dryad provides explicit join. MapReduce (circa 2004-2007): a mapper/reducer reads from shared table(s) as a substitute for join. Dryad: an explicit join combines inputs of different types. E.g. most expensive product bought by a customer, PageRank computation.
Dryad example: the usefulness of join. SkyServer query: find neighboring stars with similar colors. Table U: (objId, color), 11.8GB. Table N: (objId, neighborId), 41.8GB. The query contains 2 joins: (1) join U+N to find T = (U.color, N.neighborID) where U.objID = N.objID; (2) join U+T to find U.objID where U.objID = T.neighborID and U.color ≈ T.color.
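The two joins can be written out on small in-memory tables to check the logic. This is a sketch on made-up rows, not the Dryad job: the function name, the sample data, and the threshold `d` for "similar color" are assumptions for illustration.

```python
def neighbors_with_similar_color(U, N, d):
    # U: list of (objId, color); N: list of (objId, neighborId)
    color_of = dict(U)
    # Join 1 (U+N on objId): T holds (U.color, N.neighborId)
    T = [(color_of[obj], nb) for obj, nb in N if obj in color_of]
    # Join 2 (U+T on U.objId = T.neighborId), keeping similar colors only;
    # the set gives the "distinct" step.
    result = {nb for color, nb in T
              if nb in color_of and abs(color_of[nb] - color) < d}
    return sorted(result)

U = [(1, 0.50), (2, 0.52), (3, 0.90)]
N = [(1, 2), (1, 3), (2, 1), (3, 1)]
similar = neighbors_with_similar_color(U, N, d=0.05)
```

In the distributed version each join needs its inputs partitioned by the join key, which is why the intermediate table is re-partitioned by neighborId before the second join.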
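SkyServer query [Figure: Dryad job graph with vertex stages X, Y, H, S, M, D over inputs U and N; stage widths n and 4n] First join: select u.color, n.neighborid from u join n where u.objid = n.objid. Inputs: u (objid, color) and n (objid, neighborid), [partition by objid].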
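SkyServer query, continued [Figure: same job graph, stages X, Y, H, S, M, D over inputs U and N] The first join's output (u.color, n.neighborid) is stored in <temp>: [re-partition by n.neighborid], [order by n.neighborid]. Second join: select u.objid from u join <temp> where u.objid = <temp>.neighborid and |u.color - <temp>.color| < d. Final steps: [distinct], [merge outputs].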
Another example: how Dryad optimizes the DAG automatically. Example application: compute a query histogram. Input: log file (n partitions). Steps: extract queries from the log partitions; re-partition by query into k buckets; compute the histogram within each bucket.
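The three steps can be sketched sequentially before looking at how they are laid out as a graph. The log format (`timestamp<TAB>query`) and the function name are assumptions made for this example; the original does not specify them.

```python
from itertools import groupby

def query_histogram(log_partitions, k):
    # Extract queries from each of the n log partitions and
    # hash-distribute them into k buckets.
    buckets = [[] for _ in range(k)]
    for partition in log_partitions:
        for line in partition:
            query = line.split("\t", 1)[1]  # assumed format: timestamp<TAB>query
            buckets[hash(query) % k].append(query)
    # Within each bucket: sort, then count runs of equal queries.
    histogram = {}
    for bucket in buckets:
        bucket.sort()
        for query, group in groupby(bucket):
            histogram[query] = sum(1 for _ in group)
    return histogram

logs = [["1\tdryad", "2\tpaxos"], ["3\tdryad"]]
hist = query_histogram(logs, k=2)
```

Since equal queries always hash to the same bucket, the per-bucket counts are already the final counts and the buckets can be processed in parallel.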
Naïve histogram topology. Legend: P parse lines, D hash distribute, S quicksort, C count occurrences, MS merge sort. [Figure: n Q vertices feed k R vertices; each Q is P►S►D, each R is MS►C]
Efficient histogram topology. Legend: P parse lines, D hash distribute, S quicksort, C count occurrences, MS merge sort, M non-deterministic merge. [Figure: n inputs feed Q' vertices, which feed T vertices, which feed k R vertices; each Q' is M►P►S►C, each T is MS►C►D, each R is MS►C]
[Animation, six frames: dynamic refinement of the efficient histogram graph. Each R is MS►C, each T is MS►C►D, each Q’ is M►P►S►C. Three R vertices are present throughout; the Q’ vertices grow from one to four, then the T vertices grow from one to two. Legend: P parse lines, D hash distribute, S quicksort, MS merge sort, C count occurrences, M non-deterministic merge.]
Final histogram refinement: 99,713 input partitions (10.2 TB) ► 10,405 Q' vertices (154 GB out) ► 217 T vertices (118 GB out) ► 450 R vertices (33.4 GB out). Totals: 1,800 computers, 43,171 vertices, 11,072 processes, 11.5 minutes.
What’s after Dryad? Dryad DAGs are tedious for humans to build by hand. DryadLINQ [OSDI’08]: LINQ provides constructs to manipulate sets and data sequences (select, selectMany, groupBy, etc.), and DryadLINQ compiles LINQ constructs into Dryad DAGs. Unfortunately, DryadLINQ is not open source ...