
1 Dryad
Some slides adapted from those of Yuan Yu and Michael Isard

2 What we’ve learnt so far
Basic distributed systems concepts:
- Consistency (linearizability/strict serializability, causal, eventual)
- How to tolerate machine failures: Paxos, replication, transactions
What are distributed systems good for?
- Storage: fault tolerance, increased storage/serving capacity
- Computation: large-scale distributed computation (today's topic)

3 Recap: MapReduce API
Input data is partitioned into M splits
Map: extract information from each split
- map(k, v) → <k', v'>*
- Each Map produces R partitions
Shuffle and sort: bring the matching partition from each of the M map tasks to the same reducer
Reduce: aggregate, summarize
- reduce(k', <v'>*) → <k', v'>*
Output is in R result files

4 Example: Pseudo-code

Map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

Reduce(String key, Iterator intermediate_values):
  // key: a word, same for input and output
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
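To make the pseudo-code concrete, here is a minimal single-process sketch in Python; the names (map_fn, reduce_fn, run) and the in-memory "shuffle" are illustrative stand-ins, not part of any MapReduce API:

from collections import defaultdict

def map_fn(doc_name, doc_contents):
    # Emit (word, 1) for every word in the document.
    for word in doc_contents.split():
        yield word, 1

def reduce_fn(word, counts):
    # Sum all partial counts for one word.
    return word, sum(counts)

def run(documents, R=3):
    # Shuffle: hash each key into one of R partitions, grouping values by key;
    # this mimics how intermediate pairs are routed to the R reducers.
    partitions = [defaultdict(list) for _ in range(R)]
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            partitions[hash(k) % R][k].append(v)
    # Reduce each partition independently; together they form the R result files.
    return dict(reduce_fn(k, vs) for part in partitions for k, vs in part.items())

print(run({"doc1": "a b a", "doc2": "b c"}))  # {'a': 2, 'b': 2, 'c': 1}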

5 Parallel MapReduce Runtime
[Diagram: input data is split across Map workers coordinated by a Master; each Reduce worker pulls its partition from every Map's output (shuffle) and writes the partitioned output.]

6 MapReduce started “big data”
Published in 2004
Open-source clone Hadoop followed in 2006
It became possible to analyze big data: tens, hundreds, even thousands of terabytes

7 Limitations of the original MapReduce
No joins
Many computations require many MapReduce jobs chained together
Best suited for processing unstructured text on disk

8 Dryad
Similar goals as MapReduce:
- Focus on throughput, not latency
- Automatic scheduling, distribution, fault tolerance
Computations expressed as a dataflow graph:
- Vertices are computations
- Edges are communication channels
- Each vertex has several input and output edges

9 WordCount in Dryad
[Diagram: WordCount as a Dryad dataflow of Count, Distribute, and MergeSort vertices, with each channel carrying (Word, n) records.]

10 Why use a dataflow graph?
Many programs can be represented as a parallel dataflow graph
Dryad will run them for you

11 Job = Directed Acyclic Graph
[Diagram: inputs at the bottom, processing vertices in the middle, outputs at the top, connected by channels.]
A Dryad application is composed of a collection of processing vertices (processes). The vertices communicate with each other through channels (files, pipes, or shared memory). The vertices and channels must always compose into a directed acyclic graph.
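As an illustration of the job structure (this is not Dryad's actual API; Vertex, Channel, and connect are hypothetical names), a graph of vertices and typed channels can be modeled like this in Python:

from dataclasses import dataclass, field

@dataclass
class Vertex:
    name: str                                    # the computation, e.g. "parse"
    inputs: list = field(default_factory=list)   # incoming channels
    outputs: list = field(default_factory=list)  # outgoing channels

@dataclass
class Channel:
    src: Vertex
    dst: Vertex
    kind: str = "file"                           # "file", "pipe", or "shared_memory"

def connect(src, dst, kind="file"):
    # Wire src -> dst; a real system would also check the graph stays acyclic.
    ch = Channel(src, dst, kind)
    src.outputs.append(ch)
    dst.inputs.append(ch)
    return ch

# A tiny job: two input readers feeding a join vertex.
u, n, join = Vertex("read_U"), Vertex("read_N"), Vertex("join")
connect(u, join)
connect(n, join, kind="pipe")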

12 Dryad Scheduling
All jobs are scheduled by a central job manager
Scheduling rules:
- A vertex can run anywhere once all of its inputs are ready
- Prefer executing a vertex near its inputs
Fault tolerance:
- If A fails, run it again
- If A's inputs are gone, run the upstream vertices again (recursively)
- If A is slow, run another copy elsewhere and use the output from whichever finishes first
(A toy version of these rules is sketched below.)
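A minimal sketch, assuming a job graph given as a dependency map and an execute callback (both hypothetical); it captures only "run when all inputs are ready" plus retry-on-failure, leaving out locality and duplicate execution:

def run_job(graph, execute):
    # graph: {vertex: [upstream vertices it depends on]}
    # execute(v): runs vertex v, returns True on success, False on failure
    done = set()
    while len(done) < len(graph):
        # A vertex is runnable once every upstream vertex has finished.
        ready = [v for v, deps in graph.items()
                 if v not in done and all(d in done for d in deps)]
        for v in ready:
            if execute(v):
                done.add(v)
            # On failure, v simply stays ready and is retried on the next pass.
            # Real Dryad also re-runs upstream vertices whose outputs were lost,
            # and races a duplicate copy of slow vertices.

# Example: a diamond-shaped job.
g = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
run_job(g, lambda v: print("ran", v) or True)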

13 Advantages of DAG over MapReduce
Big jobs are more efficient with Dryad
MapReduce: a big job runs as one or more MR stages
- The reducers of each stage write to replicated storage
- Output of a reduce costs 2 network copies and 3 disk writes
Dryad: each job is represented as a single DAG
- Intermediate vertices write to local files

14 Advantages of DAG over MapReduce
Dryad provides explicit joins
MapReduce (circa 2004): a mapper/reducer reads from shared table(s) as a substitute for a join
Dryad: an explicit join combines inputs of different types
E.g. the most expensive product bought by each customer, or a PageRank computation

15 Dryad example: the usefulness of join
SkyServer query: find neighboring stars with similar colors
Table U: (objID, color), 11.8 GB
Table N: (objID, neighborID), 41.8 GB
The query contains 2 joins (see the Python sketch below):
- Join U and N to build T = (U.color, N.neighborID) where U.objID = N.objID
- Join U and T to find U.objID where U.objID = T.neighborID and U.color ≈ T.color
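A hedged in-memory sketch of the two joins in Python (the function name and the similarity threshold d are illustrative; real tables would be partitioned across machines, not Python lists):

def skyserver_query(U, N, d):
    # U: list of (objid, color); N: list of (objid, neighborid)
    color = dict(U)                                  # objid -> color lookup
    # Join 1: U join N on objID, keeping (color, neighborID) pairs.
    T = [(c, nid) for objid, nid in N
         if (c := color.get(objid)) is not None]
    # Join 2: U join T on objID = neighborID, keeping similar-colored pairs.
    return {nid for c, nid in T
            if nid in color and abs(color[nid] - c) < d}

# Example: object 1 and its neighbor 2 have colors within d = 0.5.
print(skyserver_query(U=[(1, 0.9), (2, 1.0)], N=[(1, 2)], d=0.5))  # {2}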

16 SkyServer query: first join
[Diagram: the Dryad execution graph for this join, built from D, M, S, Y, H, and X vertices over inputs U and N; n and 4n denote vertex multiplicities.]
select u.color, n.neighborID
from U join N
where u.objID = n.objID
U: (objID, color); N: (objID, neighborID) [partitioned by objID]

17 SkyServer query: second join
[Diagram: the execution graph continues; the (u.color, n.neighborID) stream is re-partitioned by n.neighborID and ordered by n.neighborID, then outputs are merged and duplicates removed ([distinct]).]
select u.objID
from U join <temp>
where u.objID = <temp>.neighborID and |u.color - <temp>.color| < d

18 Another example: how Dryad optimizes the DAG automatically
Example application: compute a query histogram
Input: a log file (n partitions)
- Extract queries from the log partitions
- Re-partition by query into k buckets
- Compute the histogram within each bucket
(A naive sketch of this plan follows; the topologies on the next slides refine it.)
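A minimal Python sketch of that three-step plan (the tab-separated log layout with the query in the first field is an assumption for illustration):

from collections import Counter

def naive_histogram(log_partitions, k):
    # Steps 1 + 2: parse each partition and hash-distribute queries into k buckets.
    buckets = [[] for _ in range(k)]
    for part in log_partitions:
        for line in part:
            query = line.split("\t")[0]        # assumed log format: query\t...
            buckets[hash(query) % k].append(query)
    # Step 3: count occurrences within each bucket.
    return [Counter(b) for b in buckets]

logs = [["foo\t1", "bar\t2"], ["foo\t3"]]
print(naive_histogram(logs, k=2))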

19 Naïve histogram topology
[Diagram: n Q vertices feed k R vertices. Legend: P = parse lines, D = hash distribute, S = quicksort, C = count occurrences, MS = merge sort. Each Q is P►S►D; each R is MS►C.]

20 Efficient histogram topology
[Diagram: n inputs feed Q' vertices, which feed k T vertices, which feed k R vertices. Legend: P = parse lines, D = hash distribute, S = quicksort, C = count occurrences, MS = merge sort, M = non-deterministic merge. Each Q' is M►P►S►C; each T is MS►C►D; each R is MS►C.]
(The payoff is local pre-aggregation; see the sketch below.)
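The win over the naive plan is that each Q' counts its own inputs before anything crosses the network, so only (query, partial_count) pairs are shipped. A minimal sketch of that idea (not Dryad's actual operators):

from collections import Counter

def efficient_histogram(log_partitions, k):
    buckets = [Counter() for _ in range(k)]
    for part in log_partitions:
        # Q'-style local pre-aggregation: count within the partition first.
        local = Counter(line.split("\t")[0] for line in part)
        # Ship only the much smaller (query, partial_count) pairs, then
        # merge the partial counts per bucket (the job of the T and R stages).
        for query, count in local.items():
            buckets[hash(query) % k][query] += count
    return buckets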

21-26 Run-time graph refinement (animation)
[Six animation frames of the efficient topology as the job manager refines the graph at run time: Q' vertices (M►P►S►C) appear as inputs arrive, then T vertices (MS►C►D) are inserted to aggregate them, feeding the k R vertices (MS►C). Legend: P = parse lines, D = hash distribute, S = quicksort, C = count occurrences, MS = merge sort, M = non-deterministic merge.]

27 Final histogram refinement
[Diagram: the final refined graph, annotated with per-stage vertex counts (450, 217, 10,405, and 99,713 across the Q', T, and R stages) and data volumes (33.4 GB, 118 GB, 154 GB, 10.2 TB).]
Scale: 1,800 computers, 43,171 vertices, 11,072 processes, 11.5 minutes total run time.

28 What’s after Dryad?
Dryad DAGs are tedious for humans to work with
DryadLINQ [OSDI’08]:
- LINQ provides constructs to manipulate sets and data sequences: select, selectMany, groupBy, etc.
- DryadLINQ compiles LINQ constructs into Dryad DAGs
Unfortunately, DryadLINQ was not open-sourced ...
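For a feel of the style, here is a rough Python analogue of a LINQ-style word count (DryadLINQ itself is C#; the operator correspondence in the comments is only an analogy):

from itertools import groupby

lines = ["a b a", "b c"]
# selectMany: flatten each line into its words.
words = [w for line in lines for w in line.split()]
# groupBy + select: group equal words, then map each group to (word, count).
counts = [(word, len(list(group))) for word, group in groupby(sorted(words))]
print(counts)  # [('a', 2), ('b', 2), ('c', 1)]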

