1
Cluster Computing with DryadLINQ. Mihai Budiu, Microsoft Research, Silicon Valley. Cloud computing: Infrastructure, Services, and Applications, UC Berkeley, March 4, 2009
2
Goal 2
3
Design Space 3: throughput vs. latency on one axis, Internet vs. private data center on the other; Dryad targets data-parallel, high-throughput computing in the private data center, as opposed to shared-memory, low-latency systems.
4
Data-Parallel Computation 4: the ecosystem by layer (language / execution / storage):
- Parallel databases: SQL / parallel DBMS / SQL Server
- Google: Sawzall / MapReduce / GFS, BigTable
- Open source: Pig, Hive / Hadoop / HDFS, S3
- Microsoft: DryadLINQ, Scope / Dryad / Cosmos, Azure, SQL Server
5
Software Stack 5: Windows Server machines run cluster services, the distributed file system (Cosmos), and Dryad; above Dryad sit DryadLINQ, the distributed shell, PSQL, SSIS, Scope, machine learning (.NET), distributed data structures, graphs, and data mining applications, alongside legacy code (C++, NTFS, SQL Server); the stack has been ported to Azure (XCompute, XStore), Windows HPC, and SQL Server, with applications such as log parsing.
6
Introduction Dryad DryadLINQ Conclusions 6
7
Dryad 7: continuously deployed since 2006; running on >>10^4 machines; sifting through >10 PB of data daily; runs on clusters of >3,000 machines; handles jobs with >10^5 processes each; platform for a rich software ecosystem; used by >>100 developers; written at Microsoft Research, Silicon Valley.
8
Dryad = Execution Layer 8: Job (application) : Dryad : Cluster ≈ Pipeline : Shell : Machine.
9
2-D Piping 9. Unix pipes are 1-D: grep | sed | sort | awk | perl. Dryad is 2-D: grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50, where the exponent is the number of replicas of each stage.
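The 2-D idea can be sketched in a few lines of Python. This is a toy model, not the Dryad API: `stage` and the sample partitions are invented names, and "grep" and "sort" are stand-ins for real vertex programs.

```python
# Toy model of 2-D piping: each logical pipeline stage is replicated
# across several data partitions, unlike a 1-D Unix pipe.

def stage(func, partitions):
    """Run one logical stage: apply func independently to every partition."""
    return [func(p) for p in partitions]

# 4 partitions of input lines; a "grep"-like stage, then a "sort"-like stage
partitions = [["b", "a"], ["d", "c"], ["a", "x"], ["z", "a"]]
grepped = stage(lambda part: [line for line in part if "a" in line], partitions)
sorted_parts = stage(sorted, grepped)
print(sorted_parts)  # each partition is filtered and sorted independently
```

In real Dryad the partitions live on different machines and each `func` invocation is a separate process (vertex).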
10
Virtualized 2-D Pipelines 10
14
Virtualized 2-D Pipelines 14: the job is a 2-D DAG, runs across multiple machines, and is virtualized (more vertices than machines).
15
Dryad Job Structure 15: input files feed vertices (processes) such as grep, sed, sort, awk, and perl, connected by channels and producing output files; vertices with the same program form a stage.
16
Channels 16: finite streams of items, implemented as distributed filesystem files (persistent), SMB/NTFS files (temporary), TCP pipes (inter-machine), or memory FIFOs (intra-machine).
17
Dryad System Architecture 17: the job manager holds the job schedule (control plane); vertices (V) run on cluster machines, exchanging data through files, TCP, and FIFOs (data plane); a name server (NS) and per-machine process daemons (PD) manage the cluster.
18
Fault Tolerance
19
Policy Managers 19: each stage has a stage manager (an R manager for stage R, an X manager for stage X), and each connection has a connection manager (the R-X manager); all run inside the Job Manager.
20
Dynamic Graph Rewriting 20: vertices X[0], X[1], X[3] have completed; the slow vertex X[2] gets a duplicate vertex X'[2]. Duplication policy = f(running times, data volumes).
21
Cluster network topology 21: machines in racks, each rack with a top-of-rack switch, racks joined by a top-level switch.
22
Dynamic Aggregation 22: the static plan has sources (S) feeding a consumer (T); at run time, aggregation vertices (A) are inserted per rack (#1, #2, #3), so data is combined close to where it is produced before crossing the network.
23
Policy vs. Mechanism 23. Built-in mechanisms: scheduling, graph rewriting, fault tolerance, statistics and reporting. Application-level policies: the most complex are in C++ code, invoked with upcalls; they need good default implementations, and DryadLINQ provides a comprehensive set.
24
Introduction Dryad DryadLINQ Conclusions 24
25
LINQ 25 Dryad => DryadLINQ
26
LINQ = .NET + queries 26

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
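A rough Python analogue of this query may help readers who don't know C#. The names (`is_legal`, `hash_key`, the sample records) are invented for illustration; LINQ itself is the C# above.

```python
# Filter a collection with a predicate, then project each item,
# mirroring the where/select query in the C# snippet.
collection = [{"key": 3, "value": "x"}, {"key": -1, "value": "y"}]

def is_legal(key):      # stands in for IsLegal(Key)
    return key >= 0

def hash_key(key):      # stands in for Hash(Key)
    return f"h{key}"

results = [{"hash": hash_key(c["key"]), "value": c["value"]}
           for c in collection if is_legal(c["key"])]
print(results)  # [{'hash': 'h3', 'value': 'x'}]
```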
27
Collections and Iterators 27

class Collection<T> : IEnumerable<T>;

public interface IEnumerable<T> {
    IEnumerator<T> GetEnumerator();
}
public interface IEnumerator<T> {
    T Current { get; }
    bool MoveNext();
    void Reset();
}
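The IEnumerable/IEnumerator pair corresponds directly to Python's iterator protocol, which may be a useful mental model: `__iter__` plays the role of GetEnumerator, and `__next__` combines MoveNext and Current. A minimal sketch (class name invented):

```python
# A collection exposing the iterator protocol, analogous to
# IEnumerable<T> returning an IEnumerator<T>.
class Collection:
    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):            # GetEnumerator()
        return iter(self._items)   # the returned iterator is the IEnumerator

c = Collection([1, 2, 3])
it = iter(c)
print(next(it))   # MoveNext() + Current -> 1
print(list(c))    # [1, 2, 3]
```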
28
DryadLINQ Data Model 28: a collection is split into partitions; each partition holds .NET objects.
29
DryadLINQ = LINQ + Dryad 29

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

The C# query over the collection is compiled into a query plan (a Dryad job); the lambdas become C# vertex code, and the data flows through the plan to produce the results.
30
Demo 30
31
Example: Histogram 31

public static IQueryable<Pair> Histogram(
    IQueryable<LineRecord> input, int k)
{
    var words = input.SelectMany(x => x.line.Split(' '));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);
    return top;
}

"A line of words of wisdom"
=> ["A", "line", "of", "words", "of", "wisdom"]                    (SelectMany)
=> [["A"], ["line"], ["of", "of"], ["words"], ["wisdom"]]          (GroupBy)
=> [{"A",1}, {"line",1}, {"of",2}, {"words",1}, {"wisdom",1}]      (Select)
=> [{"of",2}, {"A",1}, {"line",1}, {"words",1}, {"wisdom",1}]      (OrderByDescending)
=> [{"of",2}, {"A",1}, {"line",1}]                                 (Take 3)
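The same pipeline can be sketched in Python using the standard library; `histogram` is an invented name, and `Counter.most_common` collapses the GroupBy/Select/OrderBy/Take steps into one call.

```python
from collections import Counter

def histogram(lines, k):
    """Python analogue of the Histogram query:
    SelectMany -> GroupBy -> Select(count) -> OrderByDescending -> Take(k)."""
    words = [w for line in lines for w in line.split(" ")]   # SelectMany
    counts = Counter(words)                                  # GroupBy + Count
    return counts.most_common(k)                             # OrderBy + Take

top3 = histogram(["A line of words of wisdom"], 3)
print(top3)  # ('of', 2) first; ties among count-1 words in first-seen order
```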
32
Histogram Plan 32: per input partition: SelectMany, Sort, GroupBy+Select, HashDistribute; after the exchange: MergeSort, GroupBy, Select, Sort, Take; finally: MergeSort, Take.
33
Map-Reduce in DryadLINQ 33

public static IQueryable<S> MapReduce<T, M, K, S>(
    this IQueryable<T> input,
    Expression<Func<T, IEnumerable<M>>> mapper,
    Expression<Func<M, K>> keySelector,
    Expression<Func<IGrouping<K, M>, S>> reducer)
{
    var map = input.SelectMany(mapper);
    var group = map.GroupBy(keySelector);
    var result = group.Select(reducer);
    return result;
}
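A Python sketch of the same three-step operator (SelectMany, GroupBy, Select), with invented names, shows how little machinery MapReduce needs once grouping is a primitive:

```python
from itertools import groupby

def map_reduce(input_items, mapper, key_selector, reducer):
    """Analogue of the MapReduce operator above:
    SelectMany(mapper) -> GroupBy(key_selector) -> Select(reducer)."""
    mapped = [m for item in input_items for m in mapper(item)]  # SelectMany
    mapped.sort(key=key_selector)            # itertools.groupby needs sorted input
    return [reducer(k, list(g))              # Select over each group
            for k, g in groupby(mapped, key=key_selector)]

out = map_reduce(["a b", "b c"],
                 mapper=str.split,
                 key_selector=lambda w: w,
                 reducer=lambda k, g: (k, len(g)))
print(out)  # [('a', 1), ('b', 2), ('c', 1)]
```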
34
Map-Reduce Plan 34: the static plan is map (M), sort (Q), groupby (G1), reduce (R), distribute (D), then mergesort (MS), groupby (G2), reduce (R), and the consumer (X). The dynamic plan inserts partial aggregation: per-machine and per-rack mergesort/groupby/reduce stages run before the final reduce, so the network carries partially reduced data.
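The partial-aggregation idea in the dynamic plan can be sketched concretely. This hedged example (the rack grouping and word-count reducer are invented) works because word counts are associative: partial and final reduction use the same merge.

```python
from collections import Counter

# Reduce within each "rack" first, then combine the partial results,
# so less data crosses the top-level network switch.
racks = [["a b a", "b"], ["a c"]]          # input lines grouped by rack
partials = [Counter(w for line in rack for w in line.split())
            for rack in racks]             # per-rack (partial) reduce
final = sum(partials, Counter())           # merge partials at the consumer
print(final)  # Counter({'a': 3, 'b': 2, 'c': 1})
```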
35
Distributed Sorting Plan 35: the static plan samples the data (DS), builds a histogram (H), range-distributes (D), and mergesorts (M, S); dynamically, the number of sampling and merge vertices is chosen to fit the data.
36
Expectation Maximization 36: 160 lines of DryadLINQ code; the job graph shows 3 iterations.
37
Probabilistic Index Maps 37: learning features from collections of images.
38
Language Summary 38: Where, Select, GroupBy, OrderBy, Aggregate, Join, Apply, Materialize.
39
LINQ System Architecture 39: on the local machine, a .NET program (C#, VB, F#, etc.) issues queries over objects; a LINQ provider binds them to an execution engine: LINQ-to-objects, PLINQ, LINQ-to-SQL, LINQ-to-XML, LINQ-to-WS, DryadLINQ, or your own (e.g. for Flickr or Oracle).
40
The DryadLINQ Provider 40: on the client machine, DryadLINQ turns a .NET query expression into a distributed query plan plus vertex code and invokes the Dryad job manager (JM) in the data center; Dryad executes the job over the input tables and writes output tables; ToCollection/foreach pull the results back as .NET objects.
41
Combining Query Providers 41: providers compose: the .NET program uses the DryadLINQ provider for the cluster, while each machine's sub-query can run under LINQ-to-objects, PLINQ, or SQL Server.
42
Using PLINQ 42: DryadLINQ executes the distributed query; the local query on each machine runs under PLINQ to use all cores.
43
Using LINQ to SQL Server 43: DryadLINQ runs the distributed query; sub-queries execute via LINQ-to-SQL against the SQL Server instances holding the data.
44
Using LINQ-to-objects 44: the same query runs under LINQ-to-objects on the local machine for debugging, and under DryadLINQ on the cluster for production.
45
Introduction Dryad DryadLINQ Conclusions 45
46
Lessons Learned (1) 46: What worked well?
– Complete separation of storage / execution / language
– Using LINQ + .NET (language integration)
– Strong typing for data
– Allowing flexible and powerful policies
– Centralized job manager: no replication, no consensus, no checkpointing
– Porting (HPC, Cosmos, Azure, SQL Server)
– Technology transfer (done at the right time)
47
Lessons Learned (2) 47: What worked less well?
– Error handling and propagation
– Distributed (randomized) resource allocation
– TCP pipe channels
– Hierarchical dataflow graphs (each vertex = small graph)
– Forking the source tree
48
Lessons Learned (3) 48: Tricks of the trade
– Asynchronous operations hide latency
– Management through distributed state machines
– Logging state transitions for debugging
– Complete separation of data and control
– Leases clean up after themselves
– Understand scaling factors: O(machines) < O(vertices) < O(edges)
– Don't fix a broken API, re-design it
– Compression trades off bandwidth for CPU
– Managed code increases productivity by 10x
49
Ongoing Dryad/DryadLINQ Research Performance modeling Scheduling and resource allocation Profiling and performance debugging Incremental computation Hardware acceleration High-level programming abstractions Many domain-specific applications 49
50
Sample applications written using DryadLINQ 50 (application: class):
- Distributed linear algebra: Numerical
- Accelerated PageRank computation: Web graph
- Privacy-preserving query language: Data mining
- Expectation maximization for a mixture of Gaussians: Clustering
- K-means: Clustering
- Linear regression: Statistics
- Probabilistic Index Maps: Image processing
- Principal component analysis: Data mining
- Probabilistic Latent Semantic Indexing: Data mining
- Performance analysis and visualization: Debugging
- Road network shortest-path preprocessing: Graph
- Botnet detection: Data mining
- Epitome computation: Image processing
- Neural network training: Statistics
- Parallel machine learning framework infer.net: Machine learning
- Distributed query caching: Optimization
- Image indexing: Image processing
- Web indexing structure: Web graph
51
Conclusions 51
52
"What's the point if I can't have it?" Glad you asked: we're offering Dryad + DryadLINQ to academic partners. Dryad ships in binary form, DryadLINQ in source. Requires signing a 3-page licensing agreement. 52
53
Backup Slides 53
54
DryadLINQ 54: declarative programming; integration with Visual Studio; integration with .NET; type safety; automatic serialization; job graph optimizations (static and dynamic); conciseness.
55
What does DryadLINQ do? 55

// User code:
public struct Data {
    ...
    public static int Compare(Data left, Data right);
}

Data g = new Data();
var result = table.Where(s => Data.Compare(s, g) < 0);

// Generated data serialization:
public static void Read(this DryadBinaryReader reader, out Data obj);
public static int Write(this DryadBinaryWriter writer, Data obj);

// Generated data factory:
public class DryadFactoryType__0 : LinqToDryad.DryadFactory<Data>

// Generated vertex code (channel reader/writer, LINQ code, context serialization):
DryadVertexEnv denv = new DryadVertexEnv(args);
var dwriter__2 = denv.MakeWriter(FactoryType__0);
var dreader__3 = denv.MakeReader(FactoryType__0);
var source__4 = DryadLinqVertex.Where(dreader__3,
    s => (Data.Compare(s, ((Data)DryadLinqObjectStore.Get(0))) < ((System.Int32)(0))),
    false);
dwriter__2.WriteItemSequence(source__4);
56
Range-Distribution Manager 56: the static plan distributes keys in [0-100) at a split point unknown at compile time ([0-?), [?-100)); at run time a histogram (Hist) of the data chooses the ranges, e.g. [0-30) and [30-100), and the distribute (D) and merge (S, T) vertices are configured accordingly.
57
Staging 57: 1. Build. 2. Send .exe. 3. Start JM (job manager). 4. Query cluster resources (cluster services). 5. Generate graph. 6. Initialize vertices. 7. Serialize vertices. 8. Monitor vertex execution. (JM code and vertex code are shipped to the cluster.)
58
Bibliography 58

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007.

DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. Symposium on Operating System Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008.

SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. Very Large Databases Conference (VLDB), Auckland, New Zealand, August 23-28, 2008.

Hunting for Problems with Artemis. Gabriela F. Creţu-Ciocârlie, Mihai Budiu, and Moises Goldszmidt. USENIX Workshop on the Analysis of System Logs (WASL), San Diego, CA, December 7, 2008.
59
Data Partitioning 59: the data is larger than any one machine's RAM, so collections are split into partitions spread across machines.
60
Linear Algebra & Machine Learning in DryadLINQ 60: a layered stack: Dryad, DryadLINQ, a large-vector library, machine learning, data analysis.
61
Operations on Large Vectors: Map 1 61: apply f : T -> U elementwise to a large vector of T, producing a vector of U; Map preserves the partitioning.
62
Map 2 (Pairwise) 62: apply f : T × U -> V pointwise to two large vectors, producing a vector of V.
63
Map 3 (Vector-Scalar) 63: apply f : T × U -> V to each element of a vector of T and a single scalar U, producing a vector of V.
64
Reduce (Fold) 64: fold f within each partition in parallel, then fold the per-partition results into a single U.
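The two-level fold can be sketched in Python; `distributed_fold` is an invented name, and the inner folds that Dryad would run in parallel are sequential here.

```python
from functools import reduce

def distributed_fold(f, partitions):
    """Fold f within each partition (parallel in Dryad), then fold the
    per-partition results into one value."""
    per_partition = [reduce(f, p) for p in partitions]  # one fold per partition
    return reduce(f, per_partition)                     # final combine

total = distributed_fold(lambda a, b: a + b, [[1, 2], [3, 4], [5]])
print(total)  # 15
```

This only gives the same answer as a single fold when f is associative, which is why the vector library restricts Reduce to such operators.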
65
Linear Algebra 65: with T, U, V ranging over scalars, vectors, and matrices, the Map and Reduce primitives yield a distributed linear algebra library.
66
Linear Regression 66: given data pairs (x_t, y_t) for t = 1..n, find the matrix A that minimizes sum_t ||A·x_t - y_t||^2.
67
Analytic Solution 67: A = (Σ_t y_t·x_t^T) · (Σ_t x_t·x_t^T)^-1. The outer products Y×X^T and X×X^T over the partitions X[0], X[1], X[2] and Y[0], Y[1], Y[2] are a Map; the sums Σ are a Reduce.
68
Linear Regression Code 68

Vectors x = input(0), y = input(1);
Matrices xx = x.Map(x, (a, b) => a.OuterProd(b));
OneMatrix xxs = xx.Sum();
Matrices yx = y.Map(x, (a, b) => a.OuterProd(b));
OneMatrix yxs = yx.Sum();
OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Mult(b));
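For intuition, here is a scalar Python sketch of the same computation, with plain numbers standing in for the distributed vectors and matrices; the variable names mirror the code above but are otherwise invented.

```python
# Scalar analytic solution: A = (sum y*x) * (sum x*x)^-1.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # generated by y = 2*x, so A should come out as 2

xxs = sum(a * a for a in xs)               # xx.Sum()
yxs = sum(b * a for a, b in zip(xs, ys))   # yx.Sum()
xxinv = 1.0 / xxs                          # xxs.Map(Inverse)
A = yxs * xxinv                            # yxs.Map(xxinv, Mult)
print(A)  # 2.0
```

In the distributed version the sums are Reduce steps over partitions and the products are Maps; the structure is identical.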