Cluster Computing with Dryad
Mihai Budiu, MSR-SVC
LiveLabs, March 2008
Goal
The Dryad Project
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23,
Dryad Design | Implementation | Policies as Plug-ins | Building on Dryad
Design Space
[Figure: design space with axes throughput and latency, locating the Internet, private data centers, data-parallel systems, and shared-memory systems.]
Data Partitioning
[Figure: DATA split into partitions, shown relative to the size of RAM.]
2-D Piping
Unix pipes: 1-D
  grep | sed | sort | awk | perl
Dryad: 2-D (grep^1000 means 1,000 parallel grep processes, and so on)
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
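To give a sense of scale, the Python sketch below counts the vertices and channels of a 2-D pipeline. The slide does not say how adjacent stages are wired, so the sketch assumes an all-to-all connection (every vertex of one stage feeds every vertex of the next); `pipeline_size` is a hypothetical helper, not part of Dryad.

```python
from typing import List, Tuple

# Hypothetical representation of a 2-D pipeline stage: (program name, number of copies).
Stage = Tuple[str, int]

def pipeline_size(stages: List[Stage]) -> Tuple[int, int]:
    """Return (vertices, channels), assuming all-to-all links between adjacent stages."""
    vertices = sum(copies for _, copies in stages)
    # Under the all-to-all assumption, stages with a and b copies are joined by a * b channels.
    channels = sum(a * b for (_, a), (_, b) in zip(stages, stages[1:]))
    return vertices, channels

if __name__ == "__main__":
    unix = [("grep", 1), ("sed", 1), ("sort", 1), ("awk", 1), ("perl", 1)]
    dryad = [("grep", 1000), ("sed", 500), ("sort", 1000), ("awk", 500), ("perl", 50)]
    print(pipeline_size(unix))   # (5, 4)
    print(pipeline_size(dryad))  # (3050, 1525000)
```

For the Dryad pipeline above this gives 3,050 vertices and, under the all-to-all assumption, 1,525,000 channels.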
Dryad = Execution Layer
Job (Application) | Pipeline
Dryad | Shell
Cluster | Machine
(Dryad runs a job on a cluster the way a shell runs a pipeline on one machine.)
Dryad Design | Implementation | Policies as Plug-ins | Building on Dryad
Virtualized 2-D Pipelines
2-D DAG, multi-machine, virtualized
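Dryad Job Structure
grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
[Figure: input files feed vertices (processes) running grep, sed, sort, awk, and perl; the vertices running the same program form a stage, channels connect the vertices, and the last stage writes the output files.]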
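When channels are files, a vertex cannot start until all of its inputs are complete, so the job DAG fixes a partial order on execution. The Python sketch below groups the vertices of a small, hypothetical grep^2 | sed^2 | sort^1 job into waves using Kahn's topological-sort algorithm; it illustrates the dependency rule only and is not Dryad's scheduler.

```python
from collections import defaultdict
from typing import Dict, List

def execution_waves(edges: Dict[str, List[str]]) -> List[List[str]]:
    """Group DAG vertices into waves; each wave depends only on earlier waves.

    `edges` maps a vertex to the vertices that read its output channels.
    Raises ValueError if the graph has a cycle (a job graph must be acyclic).
    """
    indegree: Dict[str, int] = defaultdict(int)
    vertices = set(edges)
    for dsts in edges.values():
        for dst in dsts:
            indegree[dst] += 1
            vertices.add(dst)
    ready = sorted(v for v in vertices if indegree[v] == 0)  # e.g. readers of input files
    waves: List[List[str]] = []
    scheduled = 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        nxt: List[str] = []
        for v in ready:
            for dst in edges.get(v, []):
                indegree[dst] -= 1  # one more input channel is complete
                if indegree[dst] == 0:
                    nxt.append(dst)
        ready = sorted(nxt)
    if scheduled != len(vertices):
        raise ValueError("job graph contains a cycle")
    return waves

if __name__ == "__main__":
    job = {"grep0": ["sed0", "sed1"], "grep1": ["sed0", "sed1"],
           "sed0": ["sort0"], "sed1": ["sort0"]}
    print(execution_waves(job))  # [['grep0', 'grep1'], ['sed0', 'sed1'], ['sort0']]
```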
Channels
[Figure: vertex X writes items into a channel that vertex M reads.]
Finite streams of items:
- distributed filesystem files (persistent)
- SMB/NTFS files (temporary)
- TCP pipes (inter-machine)
- memory FIFOs (intra-machine)
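Because every channel carries a finite stream of items, vertex code can be written against one small interface whatever the transport. The Python sketch below assumes a hypothetical `Channel` interface with two implementations: an in-memory FIFO (the intra-machine case) and a temporary-file channel standing in for the file-based cases; TCP pipes are omitted, and none of these names are Dryad's API.

```python
import os
import queue
import tempfile
from abc import ABC, abstractmethod
from typing import Iterator

class Channel(ABC):
    """Hypothetical channel: the writer appends items and closes; the reader drains a finite stream."""

    @abstractmethod
    def write(self, item: str) -> None: ...

    @abstractmethod
    def close(self) -> None: ...

    @abstractmethod
    def read_all(self) -> Iterator[str]: ...

class MemoryFifo(Channel):
    """Intra-machine channel backed by a thread-safe in-memory queue."""
    _END = object()  # sentinel marking the end of the finite stream

    def __init__(self) -> None:
        self._q: "queue.Queue[object]" = queue.Queue()

    def write(self, item: str) -> None:
        self._q.put(item)

    def close(self) -> None:
        self._q.put(self._END)

    def read_all(self) -> Iterator[str]:
        while True:
            item = self._q.get()
            if item is self._END:
                return
            yield item  # type: ignore[misc]

class FileChannel(Channel):
    """File-backed channel: one item per line (items must not contain newlines).

    The file outlives the writer, so it can be re-read, e.g. if a reader fails;
    the caller is responsible for deleting `path`.
    """

    def __init__(self) -> None:
        fd, self.path = tempfile.mkstemp(suffix=".chan")
        self._f = os.fdopen(fd, "w", encoding="utf-8")

    def write(self, item: str) -> None:
        self._f.write(item + "\n")

    def close(self) -> None:
        self._f.close()

    def read_all(self) -> Iterator[str]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")

def upper_vertex(inp: Channel, out: Channel) -> None:
    """A trivial vertex: the same code runs unchanged over any channel implementation."""
    for item in inp.read_all():
        out.write(item.upper())
    out.close()
```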
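Architecture
[Figure: the job manager (control plane) sends the job schedule to the cluster, where a name server (NS) and per-machine daemons (PD) start the vertices (V); vertices exchange data through files, TCP, and FIFOs over the network (data plane).]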
Staging
1. Build (JM code, vertex code)
2. Send .exe
3. Start JM
4. Query cluster resources (cluster services)
5. Generate graph
6. Initialize vertices
7. Serialize vertices
8. Monitor vertex execution
Fault Tolerance
Dryad Design | Implementation | Policies and Resource Management | Building on Dryad
Policy Managers
[Figure: the job manager holds a stage manager for stage R, a stage manager for stage X, and a connection manager for the R-X connection between them.]
Duplicate Execution Manager
[Figure: vertices X[0] to X[3] of one stage; some have completed, one is slow, and a duplicate of the slow vertex is started.]
Duplication policy = f(running times, data volumes)
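The slide gives the duplication policy only as a function of running times and data volumes. One plausible instance, sketched below in Python, estimates a typical processing rate (seconds per input byte) from the completed vertices of the stage and flags a running vertex whose elapsed time exceeds a multiple of what that rate predicts for its input size. The `slack` and `min_completed` parameters are hypothetical, and this is an illustrative policy, not the one Dryad uses.

```python
from statistics import median
from typing import Dict, List, Tuple

def vertices_to_duplicate(
    completed: List[Tuple[float, int]],     # (running time in s, input bytes) of finished vertices
    running: Dict[str, Tuple[float, int]],  # vertex name -> (elapsed time in s, input bytes)
    slack: float = 3.0,                     # hypothetical tolerance factor
    min_completed: int = 3,                 # wait for a few samples before judging any vertex slow
) -> List[str]:
    """Return the running vertices that look slow enough to start a duplicate copy."""
    if len(completed) < min_completed:
        return []
    # Seconds per byte of a typical completed vertex; the median resists outliers.
    rate = median(t / max(b, 1) for t, b in completed)
    slow = []
    for name, (elapsed, nbytes) in sorted(running.items()):
        expected = rate * max(nbytes, 1)  # predicted running time for this much input
        if elapsed > slack * expected:
            slow.append(name)
    return slow
```

Normalizing by data volume keeps a vertex that simply received more input from being mistaken for a slow one.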
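Aggregation Manager
[Figure: static vs. dynamic aggregation. Source vertices S, labeled with their rack numbers (#1, #2, #3), feed a final vertex T; in the dynamic version, aggregation vertices A are inserted to combine the sources from each rack.]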
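Read as a graph rewrite, the dynamic half of the figure inserts one aggregation vertex per rack once the job manager knows where the source vertices ran, so that each rack sends one combined stream to T instead of one stream per source. The Python sketch below assumes each source is tagged with its rack number and that the aggregation is associative and commutative (for example a sum), which is what makes the rewrite safe; the `A_<rack>` vertex names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def dynamic_aggregation(sources: Dict[str, int], sink: str = "T") -> List[Edge]:
    """Insert one aggregator A_<rack> per rack between the sources and the sink.

    `sources` maps a source vertex name to the rack it ran on.
    """
    by_rack: Dict[int, List[str]] = defaultdict(list)
    for name, rack in sources.items():
        by_rack[rack].append(name)
    edges: List[Edge] = []
    for rack in sorted(by_rack):
        agg = f"A_{rack}"
        edges.extend((s, agg) for s in sorted(by_rack[rack]))  # intra-rack edges
        edges.append((agg, sink))  # a single cross-rack edge per rack
    return edges
```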
Data Distribution (Group By)
[Figure: m source vertices, each connected to all n destination vertices, giving m x n channels.]
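For a group-by, each of the m sources splits its output by a hash of the key into n buckets, one channel per destination; destination j then reads bucket j from every source, so all records with the same key meet at one vertex. The Python sketch below uses `zlib.crc32` as a stand-in hash (an assumption, chosen because Python's built-in string `hash` varies between runs); the function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (key, value)

def bucket(key: str, n: int) -> int:
    """Deterministic hash partitioning of a key into one of n destinations."""
    return zlib.crc32(key.encode("utf-8")) % n

def distribute(sources: List[List[Record]], n: int) -> List[List[List[Record]]]:
    """Return channels[i][j]: the records source i sends to destination j (m x n channels)."""
    channels: List[List[List[Record]]] = [[[] for _ in range(n)] for _ in sources]
    for i, records in enumerate(sources):
        for key, value in records:
            channels[i][bucket(key, n)].append((key, value))
    return channels

def group_by(channels: List[List[List[Record]]], j: int) -> Dict[str, List[int]]:
    """Destination j reads its channel from every source and groups values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for per_source in channels:
        for key, value in per_source[j]:
            groups[key].append(value)
    return dict(groups)
```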
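Range-Distribution Manager
[Figure: static vs. dynamic range distribution over keys in [0-100). Statically the split point is unknown ([0-?), [?-100)); dynamically a histogram vertex (Hist) computes the ranges [0-30) and [30-100), and distributor vertices D send each range to its own T vertex.]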
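The "?" in the static plan is a split point that can only be chosen after seeing the data. The Python sketch below assumes the histogram step picks split points at evenly spaced quantiles of a key sample, so each destination receives a similar amount of data, and then routes each key by binary search; the sample size and the function names are hypothetical.

```python
import bisect
import random
from typing import List, Sequence

def split_points(sample: Sequence[int], parts: int) -> List[int]:
    """Pick parts-1 boundaries at evenly spaced quantiles of a (non-empty) key sample."""
    s = sorted(sample)
    return [s[(k * len(s)) // parts] for k in range(1, parts)]

def destination(key: int, bounds: List[int]) -> int:
    """Index of the range holding key: [.., bounds[0]), [bounds[0], bounds[1]), ..."""
    return bisect.bisect_right(bounds, key)

if __name__ == "__main__":
    rng = random.Random(0)
    # Skewed keys in [0, 100): small values are far more common than large ones.
    data = [min(99, int(rng.expovariate(1 / 15))) for _ in range(10_000)]
    bounds = split_points(rng.sample(data, 200), 2)  # the histogram sees only a sample
    counts = [0] * (len(bounds) + 1)
    for key in data:
        counts[destination(key, bounds)] += 1
    print("split point(s):", bounds, "records per destination:", counts)
```

With a split point of 30, `destination` sends keys in [0-30) to destination 0 and keys in [30-100) to destination 1, matching the ranges in the figure.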
Goal: Declarative Programming
[Figure: a small static graph of S, X, and T vertices is expanded dynamically into a larger graph with many S, X, and T vertices.]
Dryad Design | Implementation | Policies as Plug-ins | Building on Dryad
Software Stack
[Figure: software stack built on Windows Server and Dryad. Components shown: Windows Server, Cluster Services, Distributed Filesystem, CIFS/NTFS, Dryad, Distributed Shell, PSQL, DryadLINQ, SSIS, SQL Server, Perl, C++, C#, legacy code (sed, awk, grep, etc.), Queries, Vectors, Machine Learning, and job queueing and monitoring.]
SkyServer Query
select distinct U.ObjID into results
from photoPrimary U, neighbors N, photoPrimary L
where U.ObjID = N.ObjID and L.ObjID = N.NeighborObjID
  and U.ObjID < L.ObjID
  and abs((U.u-U.g)-(L.u-L.g))<0.05
  and abs((U.g-U.r)-(L.g-L.r))<0.05
  and abs((U.r-U.i)-(L.r-L.i))<0.05
  and abs((U.i-U.z)-(L.i-L.z))<0.05
SkyServer Q18 Performance
[Figure: speed-up (times) vs. number of computers for Dryad In-Memory, Dryad Two-pass, and SQLServer 2005.]
DryadLINQ
- Declarative programming
- Integration with Visual Studio
- Integration with .Net
- Type safety
- Automatic serialization
- Job graph optimizations (static, dynamic)
- Conciseness
LINQ
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
DryadLINQ = LINQ + Dryad
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
[Figure: the C# query over the data collection is compiled into a query plan (a Dryad job) and C# vertex code; running the job over the data produces the results collection.]
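The `where` and `select` clauses of this query each look at one record at a time, so the same vertex code can run independently on every partition of `collection` and the outputs can simply be concatenated. The Python sketch below illustrates that property; `is_legal` and `hash_key` are hypothetical stand-ins for `IsLegal` and `Hash`, which the slide leaves abstract, and the sketch says nothing about how DryadLINQ actually generates vertex code.

```python
import hashlib
from typing import Iterable, List, Tuple

Record = Tuple[str, int]  # (key, value), standing in for the elements of `collection`

def is_legal(key: str) -> bool:
    # Hypothetical stand-in for IsLegal(Key): reject empty keys.
    return len(key) > 0

def hash_key(key: str) -> str:
    # Hypothetical stand-in for Hash(Key): a hex digest of the key.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def query(collection: Iterable[Record]) -> List[Tuple[str, int]]:
    # where IsLegal(c.key) select new { hash = Hash(c.key), c.value }
    return [(hash_key(k), v) for k, v in collection if is_legal(k)]

def run_partitioned(partitions: List[List[Record]]) -> List[Tuple[str, int]]:
    # Each partition is processed by its own (here sequential) "vertex";
    # concatenating the per-partition outputs gives the same answer as one big query.
    out: List[Tuple[str, int]] = []
    for part in partitions:
        out.extend(query(part))
    return out
```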
Sort & Map-Reduce in DryadLINQ
[Figure: a sampling vertex (Sampl) splits the key space [0-100) into [0-30) and [30-100); distributor vertices D send each range to its own Sort vertex S.]
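PLINQ
public static IEnumerable<TSource> DryadSort<TSource, TKey>(
    IEnumerable<TSource> source, Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer, bool isDescending)
{
    // Parallel sort on the local cores; isDescending selects the order.
    return isDescending
        ? source.AsParallel().OrderByDescending(keySelector, comparer)
        : source.AsParallel().OrderBy(keySelector, comparer);
}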
Machine Learning in DryadLINQ
[Figure: layers from the bottom: Dryad, DryadLINQ, the Large Vector library, and on top machine learning and data analysis.]
Very Large Vector Library
[Figure: PartitionedVector<T>, a vector of T values split into partitions, and Scalar<T>, a single T value.]
Operations on Large Vectors: Map 1
[Figure: f maps every T element of a partitioned vector to a U element, partition by partition.]
Map preserves partitioning.
Map 2 (Pairwise)
[Figure: f combines corresponding partitions of a vector of T and a vector of U into a vector of V.]
Map 3 (Vector-Scalar)
[Figure: f combines every partition of a vector with a single scalar value, producing a vector of V.]
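Reduce (Fold)
[Figure: f folds the elements of each partition into a partial result U, and f then folds the partial results into a single U.]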
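The four operations above can be mimicked with a list of partitions. The Python sketch below is a hypothetical `PartitionedVector` with one method per slide; it assumes the pairwise map is applied to two identically partitioned vectors and that the fold function `f` is associative. It mirrors the figures, not the actual Very Large Vector library API.

```python
from functools import reduce
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")

class PartitionedVector(Generic[T]):
    """Hypothetical vector stored as a list of partitions (each a list of elements)."""

    def __init__(self, partitions: List[List[T]]) -> None:
        self.partitions = partitions

    def map(self, f: Callable[[T], U]) -> "PartitionedVector[U]":
        # Map 1: apply f element-wise; partition boundaries are preserved.
        return PartitionedVector([[f(x) for x in p] for p in self.partitions])

    def pairwise(self, other: "PartitionedVector[U]",
                 f: Callable[[T, U], V]) -> "PartitionedVector[V]":
        # Map 2: combine corresponding elements of two identically partitioned vectors.
        if [len(p) for p in self.partitions] != [len(q) for q in other.partitions]:
            raise ValueError("vectors must have identical partitioning")
        return PartitionedVector([[f(x, y) for x, y in zip(p, q)]
                                  for p, q in zip(self.partitions, other.partitions)])

    def with_scalar(self, scalar: U, f: Callable[[T, U], V]) -> "PartitionedVector[V]":
        # Map 3: combine every element with one scalar (sent to every partition).
        return PartitionedVector([[f(x, scalar) for x in p] for p in self.partitions])

    def fold(self, f: Callable[[T, T], T]) -> T:
        # Reduce: fold inside each partition (in parallel on a cluster), then fold
        # the partial results; raises TypeError if every partition is empty.
        partials = [reduce(f, p) for p in self.partitions if p]
        return reduce(f, partials)
```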
Linear Algebra
[Figure: the element types T, U, and V instantiated with linear-algebra types.]
Linear Regression
Data: pairs $(x_t, y_t)$, $t = 1, \dots, n$
Find $A$ such that $A x_t \approx y_t$ (in the least-squares sense)
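Analytic Solution
$A = \left(\sum_t y_t x_t^T\right)\left(\sum_t x_t x_t^T\right)^{-1}$
[Figure: for each partition X[i], Y[i] (i = 0, 1, 2), a Map computes X×X^T and Y×X^T; a Reduce (Σ) adds the partial sums; the inverse of ΣX×X^T is multiplied with ΣY×X^T to give A.]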
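Linear Regression Code
Vectors x = input(0), y = input(1);                     // partitioned x_t and y_t
Matrices xx = x.PairwiseOuterProduct(x);                // x_t x_t^T for every t (map)
OneMatrix xxs = xx.Sum();                               // sum over t (reduce)
Matrices yx = y.PairwiseOuterProduct(x);                // y_t x_t^T for every t (map)
OneMatrix yxs = yx.Sum();                               // sum over t (reduce)
OneMatrix xxinv = xxs.Map(a => a.Inverse());            // (sum x_t x_t^T)^-1
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));  // A = (sum y_t x_t^T)(sum x_t x_t^T)^-1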
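The same computation can be traced in plain Python. In this sketch (an illustration, not the library's implementation) each partition holds (x_t, y_t) pairs; the map step computes per-partition sums of x_t x_t^T and y_t x_t^T, the reduce step adds them, and A is the product with the inverse. The helper functions, including the small Gauss-Jordan inverse, are hypothetical and written out only to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[List[float], List[float]]  # (x_t, y_t)

def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    return [[a * b for b in v] for a in u]

def add(p: Matrix, q: Matrix) -> Matrix:
    return [[a + b for a, b in zip(r, s)] for r, s in zip(p, q)]

def matmul(p: Matrix, q: Matrix) -> Matrix:
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting (assumes m is invertible)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def partition_sums(part: List[Pair]) -> Tuple[Matrix, Matrix]:
    """Map step: sums of x x^T and y x^T over one non-empty partition."""
    x0, y0 = part[0]
    xx = [[0.0] * len(x0) for _ in x0]
    yx = [[0.0] * len(x0) for _ in y0]
    for x, y in part:
        xx = add(xx, outer(x, x))
        yx = add(yx, outer(y, x))
    return xx, yx

def regress(partitions: List[List[Pair]]) -> Matrix:
    """Reduce step: add partial sums, then A = (sum y x^T)(sum x x^T)^-1."""
    sums = [partition_sums(p) for p in partitions]
    xxs, yxs = sums[0]
    for xx, yx in sums[1:]:
        xxs, yxs = add(xxs, xx), add(yxs, yx)
    return matmul(yxs, inverse(xxs))
```

Only the small per-partition matrices travel to the reduce step, never the raw data, which is what lets the map run where each partition lives.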
Expectation Maximization (Gaussians)
[Figure: 3 iterations shown.]
Conclusions
- Dryad = distributed execution environment
- Application-independent (semantics oblivious)
- Supports a rich software ecosystem
  - Relational algebra
  - Map-reduce
  - LINQ
  - Etc.
- DryadLINQ = a Dryad provider for LINQ
- This is only the beginning!
Backup Slides
Many similarities
Map-Reduce | Dryad
Exe + app. model | Execution layer
Map+sort+reduce | Job = arbitrary DAG
Few policies | Plug-in policies
Program = map+reduce | Program = graph gen.
Simple | Complex ( features)
Mature (> 4 years) | New (< 2 years)
Widely deployed | Still growing
Hadoop | Internal
Small Cluster Support
[Figure: several Sort and Merge vertices combined into grouping vertices and connected by fast channels.]
SkyServer DB query
- Took SQL plan
- Manually coded in Dryad
- Manually partitioned data
Inputs: u: objid, color; n: objid, neighborobjid [partition by objid]
1. select u.color, n.neighborobjid from u join n where u.objid = n.objid
2. (u.color, n.neighborobjid) [re-partition by n.neighborobjid] [order by n.neighborobjid] [distinct] [merge outputs]
3. select u.objid from u join where u.objid =.neighborobjid and |u.color -.color| < d
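The three steps of this plan can be traced in a single-process Python sketch. It assumes that the unnamed input of step 3 is the output of step 2, that `color` is the tuple of the four colour differences compared in the SQL query, and that `|u.color - .color| < d` means every component differs by less than d; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Color = Tuple[float, ...]  # assumed: (u-g, g-r, r-i, i-z) colour differences

def similar_colors(a: Color, b: Color, d: float) -> bool:
    # Assumed reading of |u.color - .color| < d: every component within d.
    return all(abs(x - y) < d for x, y in zip(a, b))

def skyserver_plan(u: Dict[int, Color], n: List[Tuple[int, int]], d: float) -> Set[int]:
    # Step 1 (data partitioned by objid): join u with n on objid
    # -> (u.color, n.neighborobjid) for every neighbour pair.
    step1 = [(u[obj], nbr) for obj, nbr in n if obj in u]
    # Step 2: re-partition and sort by neighborobjid, dropping duplicates.
    step2 = sorted(set(step1), key=lambda row: row[1])
    # Step 3: join with u on neighborobjid and keep pairs with similar colours.
    return {nbr for color, nbr in step2 if nbr in u and similar_colors(u[nbr], color, d)}
```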
Optimization
[Figure: query job graph with vertices D, M, S, Y, X, U, and N.]
Query histogram computation
- Input: log file (n partitions)
- Extract queries from log partitions
- Re-partition by hash of query (k buckets)
- Compute histogram within each bucket
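These steps map directly onto code. The single-process Python sketch below assumes a hypothetical tab-separated log format `timestamp<TAB>query` (the real format is not given) and uses `zlib.crc32` as the bucket hash. Each bucket is sorted and runs of equal queries are counted, the work done by the S (quicksort) and C (count occurrences) vertices in the topologies that follow.

```python
import zlib
from itertools import groupby
from typing import Dict, List

def extract_query(line: str) -> str:
    # Hypothetical log format "<timestamp>\t<query>".
    return line.rstrip("\n").split("\t", 1)[1]

def query_histogram(log_partitions: List[List[str]], k: int) -> Dict[str, int]:
    # Extract queries from every log partition and re-partition them into k hash buckets.
    buckets: List[List[str]] = [[] for _ in range(k)]
    for part in log_partitions:
        for line in part:
            q = extract_query(line)
            buckets[zlib.crc32(q.encode("utf-8")) % k].append(q)
    # Compute the histogram within each bucket: sort, then count runs of equal queries.
    # Equal queries always share a bucket, so each bucket's counts are already final.
    histogram: Dict[str, int] = {}
    for b in buckets:
        for q, run in groupby(sorted(b)):
            histogram[q] = sum(1 for _ in run)
    return histogram
```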
Naïve histogram topology
P: parse lines; D: hash distribute; S: quicksort; C: count occurrences; MS: merge sort
[Figure: naïve job graph built from these vertex types.]
Efficient histogram topology
P: parse lines; D: hash distribute; S: quicksort; C: count occurrences; MS: merge sort; M: non-deterministic merge
[Figure: job graph of composite vertices Q', T, and R, with stage sizes n and k. Each Q' is M, P, S, C; each T is MS, C, D; each R is MS, C.]
Final histogram refinement
- 1,800 computers
- 43,171 vertices
- 11,072 processes
- 11.5 minutes