DryadLINQ A System for General-Purpose Distributed Data-Parallel Computing Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep.


1 DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing. Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey (Microsoft Research Silicon Valley). Presented by: TD (Tathagata Das)

2 Designing a general-purpose language for writing distributed data-parallel programs for a compute cluster
– General purpose
– Single-thread abstraction
– Familiar language / environment

3 [Figure: Shell script / Shell / Machine ≈ ??? / Dryad / Cluster] Dryad = Execution Engine

4 Nebula – limited to existing binaries
Scope – SQL-ish, not general purpose
Can we do better?
– Can we get the general-purposeness of C#/Java and the conciseness of SQL?
– And at the same time, be efficient too?
Can I have my cake and eat it too?

5 Language Integrated Query (LINQ)

6 The creamy goodness of SQL-like queries within a declarative programming model. Basic abstraction: collections. “All the world’s a collection, and all the men and women merely iterate on collections” (implied by Shakespeare)

7 Collections, Iterators and LINQ
IEnumerable<T> + LINQ => IEnumerable<T>
Imperative loop:
    List<int> result = new List<int>();
    foreach (int num in numbers) { if (num % 2 == 0) result.Add(num); }
    result.Sort();
Equivalent LINQ query:
    using System.Linq;
    var result = from num in numbers where num % 2 == 0 orderby num select num;

8 Syntactic sweetness of LINQ
Query style:
    var result = from num in numbers where num % 2 == 0 orderby num select num;
Method style:
    var result = numbers.Where(num => num % 2 == 0).OrderBy(n => n);
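The three forms on slides 7 and 8 can be checked against each other with plain LINQ-to-Objects. This is a minimal sketch, not code from the talk: the class name `EvenNumbers` and the sample input are assumptions made for illustration. The C# compiler translates query style into the same method calls used in method style, so both should agree with the hand-written loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvenNumbers
{
    // Imperative version: explicit loop, then an in-place sort.
    public static List<int> WithLoop(IEnumerable<int> numbers)
    {
        var result = new List<int>();
        foreach (int num in numbers)
        {
            if (num % 2 == 0) result.Add(num);
        }
        result.Sort();
        return result;
    }

    // Query style: SQL-like keywords.
    public static List<int> QueryStyle(IEnumerable<int> numbers) =>
        (from num in numbers
         where num % 2 == 0
         orderby num
         select num).ToList();

    // Method style: extension methods taking lambdas.
    public static List<int> MethodStyle(IEnumerable<int> numbers) =>
        numbers.Where(num => num % 2 == 0).OrderBy(n => n).ToList();

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 8, 2 };
        Console.WriteLine(string.Join(" ", QueryStyle(numbers))); // prints "2 4 8"
    }
}
```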

9 LINQ Functionality
Select / SelectMany → Map (1-to-1 / 1-to-many)
Where → Filter
GroupBy → Reduce
OrderBy → Sort
Join → Join
Union / Intersect / Except → Set operations
…
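A small tour of the operators listed on slide 9, run on in-memory arrays. This is only a sketch: the class `OperatorTour`, its helper `Show` and the sample data are assumptions made for illustration, and the operators are the standard `System.Linq` ones rather than anything DryadLINQ-specific. The trailing comments give the space-joined output of each operator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorTour
{
    // Hypothetical helper: render a sequence as space-separated text.
    static string Show<T>(IEnumerable<T> items) => string.Join(" ", items);

    public static Dictionary<string, string> Run()
    {
        int[] a = { 1, 2, 3, 4 };
        int[] b = { 3, 4, 5 };
        string[] lines = { "x y", "z" };

        return new Dictionary<string, string>
        {
            ["Select"] = Show(a.Select(n => n * 2)),                         // map, 1-to-1: 2 4 6 8
            ["SelectMany"] = Show(lines.SelectMany(l => l.Split(' '))),      // map, 1-to-many: x y z
            ["Where"] = Show(a.Where(n => n % 2 == 0)),                      // filter: 2 4
            ["GroupBy"] = Show(a.GroupBy(n => n % 2).Select(g => g.Sum())),  // reduce per key: 4 6
            ["OrderBy"] = Show(a.OrderByDescending(n => n)),                 // sort: 4 3 2 1
            ["Join"] = Show(a.Join(b, x => x, y => y, (x, y) => x + y)),     // join on equal values: 6 8
            ["Union"] = Show(a.Union(b)),                                    // 1 2 3 4 5
            ["Intersect"] = Show(a.Intersect(b)),                            // 3 4
            ["Except"] = Show(a.Except(b)),                                  // 1 2
        };
    }

    public static void Main()
    {
        foreach (var entry in Run())
            Console.WriteLine($"{entry.Key}: {entry.Value}");
    }
}
```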

10 LINQ Providers
Data sources: SQL, XML, …, Google, Wikipedia, Twitter
Operators: Select / SelectMany, Where, GroupBy, OrderBy, Join, Union / Intersect / Except, …

11 LINQ System Architecture [Figure labels: .NET Program, Query Objects, LINQ Provider Interface, LINQ-to-SQL, LINQ-to-XML, PLINQ, DryadLINQ]

12 Parallel Collections [Figure: a collection split into partitions] Simplest example: a GFS/HDFS file
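The idea of a partitioned collection can be imitated on one machine. This is a minimal sketch under stated assumptions: `PartitionSketch`, `Split` and `LengthsPerPartition` are hypothetical names, and in-memory lists stand in for the partitions of a distributed file. It shows that a per-record query can run on each partition independently and the partial results can simply be concatenated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionSketch
{
    // Hypothetical helper: cut a sequence into `parts` contiguous partitions.
    public static List<List<T>> Split<T>(IEnumerable<T> items, int parts)
    {
        var list = items.ToList();
        int size = (list.Count + parts - 1) / parts; // ceiling division
        return Enumerable.Range(0, parts)
                         .Select(p => list.Skip(p * size).Take(size).ToList())
                         .ToList();
    }

    // Run the same per-record query on every partition, then concatenate.
    public static List<int> LengthsPerPartition(List<List<string>> partitions) =>
        partitions.SelectMany(part => part.Select(line => line.Length)).ToList();

    public static void Main()
    {
        string[] lines = { "a", "bb", "ccc", "dddd", "eeeee" };
        var partitions = Split(lines, 2);
        Console.WriteLine(string.Join(" ", LengthsPerPartition(partitions))); // prints "1 2 3 4 5"
    }
}
```

Because `Select` looks at one record at a time, no data has to move between partitions; operators such as `GroupBy` are the ones that need repartitioning.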

13 Dryad + LINQ = DryadLINQ
    string uri = @"file://\\machine\directory\input.pt";
    PartitionedTable<LineRecord> input = PartitionedTable.Get<LineRecord>(uri);
    var lengths = input.Select(line => line.ToString().Length);

14 Word Count with DryadLINQ
    string uri = @"file://\\machine\directory\input.pt";
    PartitionedTable<LineRecord> input = PartitionedTable.Get<LineRecord>(uri);
    string separator = ",";
    var words = input.SelectMany(x => SplitLineRecord(separator));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);
    top.ToDryadPartitionedTable("matching.pt");
[Figure: Execution Plan Graph, Get → SM → G → S → O → Take]
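The same query shape can be tried locally with LINQ-to-Objects. This is a sketch, not the DryadLINQ program: a plain `IEnumerable<string>` stands in for the partitioned table, a value tuple stands in for `Pair`, and `string.Split` stands in for the slide's `SplitLineRecord`. The class and method names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSketch
{
    // Returns the k most frequent words, most frequent first.
    public static List<(string Word, int Count)> TopWords(
        IEnumerable<string> lines, char separator, int k)
    {
        var words = lines.SelectMany(line => line.Split(separator));      // SM
        var groups = words.GroupBy(w => w);                               // G
        var counts = groups.Select(g => (Word: g.Key, Count: g.Count())); // S
        var ordered = counts.OrderByDescending(c => c.Count);             // O
        return ordered.Take(k).ToList();                                  // Take
    }

    public static void Main()
    {
        string[] lines = { "a,b,a", "c,a,b" };
        foreach (var (word, count) in TopWords(lines, ',', 2))
            Console.WriteLine($"{word}: {count}");
    }
}
```

Nothing runs until `ToList` enumerates the chain, which is the property that lets a provider see a whole query before deciding how to execute it.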

15 DryadLINQ Word Count → Dryad [Figure: the Execution Plan Graph (SM, G, S, O) is expanded into a Data Flow Graph and then a Distributed Data Flow Graph with replicated SM, D, MS, G and S vertices]

16 DryadLINQ Architecture [1] [Figure labels: Client machine, .NET Programs, Query Expr, DryadLINQ, Distributed Query Plan, Query, Vertex code, Context, Cluster, Dryad JM, Dryad Execution, Input Tables, Output Tables]

17 DryadLINQ Code Generation
    string uri = @"file://\\machine\directory\input.pt";
    PartitionedTable<LineRecord> input = PartitionedTable.Get<LineRecord>(uri);
    string separator = ",";
    var words = input.SelectMany(x => SplitLineRecord(separator));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);
    top.ToDryadPartitionedTable("matching.pt");
Conversion of subexpressions to code for Dryad vertices must handle:
1. Local variables
2. Local libraries and functions
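Why local variables matter can be seen with standard .NET expression trees, the representation a LINQ provider receives instead of compiled code. This is only a sketch: `CaptureSketch` and `Inspect` are hypothetical names, and nothing here reproduces DryadLINQ's actual code generator. It shows that a lambda referring to the local `separator` carries a reference to that variable, whose value has to be evaluated on the client before the code can run elsewhere.

```csharp
using System;
using System.Linq.Expressions;

public static class CaptureSketch
{
    // Returns the called method's name, the captured variable's name,
    // and that variable's current value.
    public static (string Method, string Captured, string Value) Inspect(
        Expression<Func<string, bool>> query)
    {
        var call = (MethodCallExpression)query.Body;         // x.Contains(...)
        var captured = (MemberExpression)call.Arguments[0];  // closure field for the local
        // Evaluate just the captured sub-expression on the "client".
        string value = Expression.Lambda<Func<string>>(captured).Compile()();
        return (call.Method.Name, captured.Member.Name, value);
    }

    public static void Main()
    {
        string separator = ",";
        var info = Inspect(x => x.Contains(separator));
        Console.WriteLine($"{info.Method} uses local '{info.Captured}' = \"{info.Value}\"");
    }
}
```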

18 DryadLINQ Architecture [2] [Figure labels: Client machine, .NET Programs, Query Expr, DryadLINQ, Distributed Query Plan, Invoke Query, Vertex code, Context, Cluster, Dryad JM, Dryad Execution, Input Tables, Output Tables, Output PartitionedTable, Results, .NET Objects]

19 Combining with LINQ-to-SQL [Figure labels: Query, DryadLINQ, Subquery, LINQ-to-SQL]

20 DryadLINQ Optimizations
Some are similar to existing DB optimizations
– Eliminate redundant partitioning steps
– Aggregation steps moved up the graph, before partitioning steps
Existing Dryad optimizations as well
– Dynamic reconfiguration of aggregation trees
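Moving aggregation before partitioning can be illustrated with word counts. This is a minimal single-machine sketch, not the optimizer itself: `PartialAggregation` and its two methods are hypothetical names, and nested arrays stand in for partitions. Counting inside each partition first means only (word, count) pairs, rather than every word, would need to be repartitioned, and the final result is unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartialAggregation
{
    // Baseline: bring every word together, then group and count.
    public static Dictionary<string, int> CountGlobally(
        IEnumerable<IEnumerable<string>> partitions) =>
        partitions.SelectMany(p => p)
                  .GroupBy(w => w)
                  .ToDictionary(g => g.Key, g => g.Count());

    // Optimized shape: count inside each partition, then merge the partial counts.
    public static Dictionary<string, int> CountWithLocalPreAggregation(
        IEnumerable<IEnumerable<string>> partitions) =>
        partitions
            .Select(p => p.GroupBy(w => w)
                          .Select(g => (Word: g.Key, Count: g.Count()))) // local aggregation
            .SelectMany(partial => partial)  // only (word, count) pairs move
            .GroupBy(pc => pc.Word)
            .ToDictionary(g => g.Key, g => g.Sum(pc => pc.Count));

    public static void Main()
    {
        string[][] partitions = { new[] { "a", "b", "a" }, new[] { "a", "c" } };
        foreach (var entry in CountWithLocalPreAggregation(partitions))
            Console.WriteLine($"{entry.Key}: {entry.Value}");
    }
}
```

The rewrite is valid here because counting is associative: partial counts can be summed in any grouping.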

21 Thoughts [1]
– Easy to read, though reads more like a PL paper
– What are the system contributions that are different from Dryad?
– Does the high level abstraction provide any extra information that allow

22 Thoughts [2]
Interesting anecdote…
– DryadLINQ is inefficient for random-access workloads, but for some workloads it outperformed systems customized for random access
– HDD performance characteristics are such that a sequential read (even if you discard 99% of the data) is better than small random accesses

23 Thoughts [3] How different is FlumeJava from this?

