Published by Rosalind McBride, modified over 9 years ago
1
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey
Microsoft Research Silicon Valley
Presented by: TD (Tathagata Das)
2
Goal: design a general-purpose language for writing distributed data-parallel programs for a compute cluster.
Requirements: general purpose; a single-thread abstraction; a familiar language and environment.
3
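Analogy: ??? runs on Dryad, which runs on a cluster, just as a shell script runs on a shell, which runs on a machine.
Dryad = the execution engine. What fills the ??? layer, i.e. the language in which programs are written?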
4
Nebula – limited to existing binaries
Scope – SQL-like, not general purpose
Can we do better?
– Can we get the general-purpose power of C#/Java and the conciseness of SQL?
– And, at the same time, be efficient?
Can we have our cake and eat it too?
5
Language Integrated Query (LINQ)
6
The creamy goodness of SQL-like queries within a declarative programming model.
Basic abstraction: collections.
"All the world's a collection, and all the men and women merely iterate on collections." (implied by Shakespeare)
7
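Collections, Iterators and LINQ
IEnumerable<T> + LINQ => IEnumerable<T>

    using System.Collections.Generic;
    using System.Linq;

    // With LINQ: the even numbers, sorted ascending
    var result = from num in numbers
                 where num % 2 == 0
                 orderby num
                 select num;

    // Without LINQ: the equivalent explicit loop
    List<int> evens = new List<int>();
    foreach (int num in numbers) {
        if (num % 2 == 0) evens.Add(num);
    }
    evens.Sort();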
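The point that a LINQ query over an `IEnumerable<T>` yields another `IEnumerable<T>` can be seen in a small C# sketch (top-level statements, C# 9 or later). The `numbers` list is hypothetical sample data. The sketch also shows LINQ-to-Objects' deferred execution: building the query does no work, so an element added to the source before enumeration still shows up in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input; the slide does not say where `numbers` comes from.
var numbers = new List<int> { 7, 4, 1, 8, 2, 5 };

// Building the query does not run it: `evens` is an IEnumerable<int>
// that describes the computation (deferred execution).
IEnumerable<int> evens = from num in numbers
                         where num % 2 == 0
                         orderby num
                         select num;

// The result is again an IEnumerable<int>, so it composes with further operators.
IEnumerable<int> doubled = evens.Select(n => n * 2);

// Change the source after building the query but before enumerating it.
numbers.Add(6);

// Enumeration happens here, so the newly added 6 is included.
List<int> snapshot = evens.ToList();
List<int> doubledSnapshot = doubled.ToList();

Console.WriteLine(string.Join(", ", snapshot));        // 2, 4, 6, 8
Console.WriteLine(string.Join(", ", doubledSnapshot)); // 4, 8, 12, 16
```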
8
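Syntactical sweetness of LINQ
The compiler translates query style into method style, so the two forms below are equivalent:

    // Query style
    var result = from num in numbers where num % 2 == 0 orderby num select num;

    // Method style
    var result = numbers.Where(num => num % 2 == 0).OrderBy(n => n);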
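A quick check, on hypothetical sample data, that the two styles compute the same sequence; this is a plain LINQ-to-Objects sketch with top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
int[] numbers = { 9, 2, 7, 4, 6, 3 };

// Query style: the compiler rewrites this into Where/OrderBy calls.
var queryStyle = from num in numbers
                 where num % 2 == 0
                 orderby num
                 select num;

// Method style: the explicit extension-method form.
var methodStyle = numbers.Where(num => num % 2 == 0).OrderBy(n => n);

bool same = queryStyle.SequenceEqual(methodStyle);
Console.WriteLine(string.Join(", ", methodStyle)); // 2, 4, 6
Console.WriteLine(same);                           // True
```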
9
LINQ Functionality
LINQ operator → data-parallel equivalent
Select / SelectMany → Map (1-to-1 / 1-to-many)
Where → Filter
GroupBy → Reduce
OrderBy → Sort
Join → Join
Union / Intersect / Except → Set operations
…
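The operator-to-primitive mapping above can be sketched in a few lines of LINQ-to-Objects. All data here is made up for illustration, and `GroupBy` is followed by a count so that it plays the role of a reduce.

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
string[] lines = { "a b", "b c c" };
var prices = new[] { (Fruit: "apple", Price: 3), (Fruit: "pear", Price: 5) };
var stock = new[] { (Fruit: "apple", Qty: 10), (Fruit: "plum", Qty: 2) };

// Map, 1-to-1 (Select) and 1-to-many (SelectMany).
var lengths = lines.Select(l => l.Length).ToArray();          // [3, 5]
var words = lines.SelectMany(l => l.Split(' ')).ToArray();    // [a, b, b, c, c]

// Filter (Where).
var notA = words.Where(w => w != "a").ToArray();              // [b, b, c, c]

// Reduce: group by key, then aggregate each group (GroupBy + Count).
var counts = words.GroupBy(w => w)
                  .Select(g => (Word: g.Key, Count: g.Count()))
                  .ToArray();                                 // [(a,1), (b,2), (c,2)]

// Sort (OrderByDescending, ties broken by word).
var byCount = counts.OrderByDescending(c => c.Count).ThenBy(c => c.Word).ToArray();

// Join on a key.
var joined = prices.Join(stock, p => p.Fruit, s => s.Fruit,
                         (p, s) => (p.Fruit, Value: p.Price * s.Qty)).ToArray(); // [(apple, 30)]

// Set operations.
var union = new[] { 1, 2, 3 }.Union(new[] { 3, 4 }).ToArray();          // [1, 2, 3, 4]
var intersect = new[] { 1, 2, 3 }.Intersect(new[] { 2, 3, 4 }).ToArray(); // [2, 3]
var except = new[] { 1, 2, 3 }.Except(new[] { 2 }).ToArray();           // [1, 3]

Console.WriteLine(string.Join(" ", byCount.Select(c => $"{c.Word}:{c.Count}")));
```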
10
LINQ Providers
The same operators (Select / SelectMany, Where, GroupBy, OrderBy, Join, Union / Intersect / Except, …) are implemented by providers for many data sources: SQL, XML, …, Google, Wikipedia, Twitter.
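One way to see the provider idea on a single machine, as a sketch: the same operator chain runs once over a plain array (LINQ-to-Objects, lambdas compiled to delegates) and once through `AsQueryable()`, whose built-in in-memory provider stands in here for a remote provider such as LINQ-to-SQL or DryadLINQ. The sample data is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
int[] data = { 5, 10, 15, 20 };

// LINQ-to-Objects: operators are extension methods on IEnumerable<T>,
// and each lambda is compiled to an ordinary delegate.
IEnumerable<int> viaEnumerable = data.Where(n => n > 8).Select(n => n / 5);

// IQueryable<T>: the same operator names, but the lambdas are captured as
// expression trees and handed to a query provider. AsQueryable() wraps the
// array in the framework's in-memory provider.
IQueryable<int> viaQueryable = data.AsQueryable().Where(n => n > 8).Select(n => n / 5);

Console.WriteLine(viaQueryable.Provider.GetType().Name);
Console.WriteLine(string.Join(", ", viaEnumerable));          // 2, 3, 4
Console.WriteLine(viaEnumerable.SequenceEqual(viaQueryable)); // True
```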
11
LINQ System Architecture
A .NET program builds query objects and hands them, through the LINQ provider interface, to a provider such as LINQ-to-SQL, LINQ-to-XML, PLINQ, or DryadLINQ.
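The "query objects" a provider receives are expression trees. This sketch uses only the standard `System.Linq.Expressions` API: it builds an `IQueryable` query and inspects its `Expression` property the way a provider would before translating it. It is an illustration of the interface, not DryadLINQ's own compiler.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical sample data.
int[] numbers = { 1, 2, 3, 4 };

// Compose a query without running it; the result is a query object.
IQueryable<int> query = numbers.AsQueryable()
                               .Where(n => n % 2 == 0)
                               .Select(n => n * 10);

// A provider receives query.Expression: a tree of method calls that it can
// inspect and translate into its own execution format.
var outer = (MethodCallExpression)query.Expression;   // the Select(...) call
var inner = (MethodCallExpression)outer.Arguments[0]; // the Where(...) call

Console.WriteLine(outer.Method.Name); // Select
Console.WriteLine(inner.Method.Name); // Where

// The predicate survives as data (a quoted lambda), not as opaque compiled code.
var predicate = (LambdaExpression)((UnaryExpression)inner.Arguments[1]).Operand;
Console.WriteLine(predicate.Body.NodeType); // Equal
```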
12
Parallel Collections
A parallel collection is a collection divided into partitions.
Simplest example: a GFS/HDFS file, stored as many chunks spread over machines.
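A minimal in-memory sketch of a partitioned collection, assuming a hypothetical `ToPartitions` helper that cuts a list into contiguous chunks (roughly how a file is split into blocks). It checks that applying an element-wise operator partition by partition and concatenating the outputs in order matches the sequential result, which is what allows such operators to run on many machines at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: split a sequence into at most `count` partitions of
// contiguous elements, roughly like a file split into fixed-size blocks.
List<List<T>> ToPartitions<T>(IReadOnlyList<T> source, int count)
{
    var partitions = new List<List<T>>();
    int size = (source.Count + count - 1) / count; // ceiling division
    for (int start = 0; start < source.Count; start += size)
        partitions.Add(source.Skip(start).Take(size).ToList());
    return partitions;
}

var records = Enumerable.Range(1, 10).ToList();
var parts = ToPartitions(records, 3); // [1..4], [5..8], [9, 10]

// An element-wise operator (Select) needs only its own partition, so each
// partition could be processed independently; here they run one after another.
var perPartition = parts.Select(p => p.Select(x => x * x).ToList()).ToList();
var combined = perPartition.SelectMany(p => p).ToList();

bool matches = combined.SequenceEqual(records.Select(x => x * x));
Console.WriteLine($"{parts.Count} partitions, matches sequential: {matches}");
```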
13
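Dryad + LINQ = DryadLINQ

    string uri = @"file://\\machine\directory\input.pt";
    // Open the partitioned input table; each record is a line of text
    PartitionedTable<LineRecord> input = PartitionedTable.Get<LineRecord>(uri);
    // An ordinary LINQ query; DryadLINQ executes it on the cluster
    var lengths = input.Select(line => line.ToString().Length);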
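DryadLINQ runs the `Select` above across a cluster. As a single-machine analogue only, PLINQ (listed among the LINQ providers on the architecture slide) runs the same operator on several threads of one machine. The lines below are hypothetical stand-ins for the records of `input.pt`.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the input records.
string[] lines = { "alpha", "be", "gamma ray", "" };

// The slide's Select, first with LINQ-to-Objects...
var sequential = lines.Select(line => line.Length).ToArray();

// ...then with PLINQ, which splits the input among worker threads.
// AsOrdered() keeps the results in input order.
var parallel = lines.AsParallel().AsOrdered().Select(line => line.Length).ToArray();

Console.WriteLine(string.Join(", ", parallel)); // 5, 2, 9, 0
```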
14
Word Count with DryadLINQ

    string uri = @"file://\\machine\directory\input.pt";
    PartitionedTable<LineRecord> input = PartitionedTable.Get<LineRecord>(uri);
    string separator = ",";
    // SplitLineRecord: user-defined helper (not shown) that splits a record x into words
    var words = input.SelectMany(x => SplitLineRecord(x, separator));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);   // k: number of most frequent words to keep
    top.ToDryadPartitionedTable("matching.pt");

[Figure: Execution Plan Graph: Get → SelectMany (SM) → GroupBy (G) → Select (S) → OrderByDescending (O) → Take]
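The same word-count pipeline can be tried locally with LINQ-to-Objects. In this sketch an in-memory array stands in for the `PartitionedTable`, `string.Split` stands in for the slide's `SplitLineRecord` helper, and a named tuple replaces `Pair`; the input and `k = 2` are made up.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the slide's input table and parameters.
string[] input = { "a,b,a", "c,a,b", "d" };
string separator = ",";
int k = 2;

var words = input.SelectMany(line => line.Split(new[] { separator }, StringSplitOptions.None));
var groups = words.GroupBy(x => x);
var counts = groups.Select(x => (Word: x.Key, Count: x.Count()));
var ordered = counts.OrderByDescending(x => x.Count);
var top = ordered.Take(k).ToList();

foreach (var (word, count) in top)
    Console.WriteLine($"{word}: {count}"); // a: 3, then b: 2
```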
15
DryadLINQ Word Count
[Figure: the word-count query shown three ways: as an Execution Plan Graph (SM → G → S → O), as a Data Flow Graph, and as a Distributed Data Flow Graph in which the vertices are replicated per partition and D and MS vertices are inserted between stages.]
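An in-memory simulation of the distributed data flow, under simplifying assumptions: two hypothetical input partitions, two reducer partitions, and a hash-based distribute step (D) that sends every occurrence of a word to the same partition before grouping. DryadLINQ's actual plan (its merge vertices, any partial aggregation, and the final OrderBy/Take) may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input already split into 2 partitions (e.g. 2 file chunks).
var inputPartitions = new List<string[]>
{
    new[] { "a b a", "c" },
    new[] { "b a", "d c" },
};
int reducers = 2;

// Stage 1, per input partition: SelectMany splits lines into words.
var wordPartitions = inputPartitions
    .Select(p => p.SelectMany(line => line.Split(' ')).ToList())
    .ToList();

// Distribute (D): route each word to partition hash(word) % reducers, so all
// copies of a word meet in one partition. string.GetHashCode is randomized per
// process in .NET Core; that is fine within one process, but a real cluster
// would need a stable hash.
var reducerInputs = Enumerable.Range(0, reducers).Select(_ => new List<string>()).ToList();
foreach (var part in wordPartitions)
    foreach (var word in part)
        reducerInputs[(word.GetHashCode() & int.MaxValue) % reducers].Add(word);

// Stage 2, per reducer partition: GroupBy, then Select the count.
var reducerOutputs = reducerInputs
    .Select(p => p.GroupBy(w => w).Select(g => (Word: g.Key, Count: g.Count())).ToList())
    .ToList();

// Merge the partition outputs and sort (count descending, then word) for a deterministic result.
var final = reducerOutputs.SelectMany(p => p)
    .OrderByDescending(x => x.Count).ThenBy(x => x.Word, StringComparer.Ordinal)
    .ToList();

foreach (var (word, count) in final)
    Console.WriteLine($"{word}: {count}");
```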
16
DryadLINQ Architecture [1]
[Figure: on the client machine, a .NET program issues a query expression; DryadLINQ turns it into a distributed query plan plus vertex code and context, which the Dryad job manager (JM) executes on the cluster, reading input tables and writing output tables.]
17
DryadLINQ Code Generation

    string uri = @"file://\\machine\directory\input.pt";
    PartitionedTable<LineRecord> input = PartitionedTable.Get<LineRecord>(uri);
    string separator = ",";
    // SplitLineRecord: user-defined helper (not shown) that splits a record x into words
    var words = input.SelectMany(x => SplitLineRecord(x, separator));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);
    top.ToDryadPartitionedTable("matching.pt");

Converting subexpressions into code for Dryad vertices must also handle:
1. Local variables (here separator and k)
2. Local libraries and functions (here SplitLineRecord)
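Why local variables and functions need special handling is visible in the expression tree itself. In this sketch (standard `System.Linq.Expressions` only, not DryadLINQ's code generator), a lambda that uses two locals is captured as an expression tree. The locals appear as fields of a compiler-generated closure object, whose current values would have to be serialized and shipped with the vertex code, and each method call marks code whose library must also be available on the vertex. The walker handles only the node types this particular predicate contains.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Local variables used inside the query lambda, like `separator` and `k` on the slide.
string separator = ",";
int minLength = 3;

// Assigning the lambda to Expression<...> makes the compiler emit an expression
// tree (what IQueryable providers receive) instead of an opaque delegate.
Expression<Func<string, bool>> predicate =
    line => line.Length >= minLength && line.Contains(separator);

var captured = new Dictionary<string, object?>();
var calledMethods = new List<string>();

// Walk only the node types that occur in this predicate (not a general visitor).
void Walk(Expression? e)
{
    switch (e)
    {
        case null:
            return;
        case LambdaExpression lambda:
            Walk(lambda.Body);
            break;
        case BinaryExpression binary:
            Walk(binary.Left);
            Walk(binary.Right);
            break;
        case MethodCallExpression call:
            // Code the vertex must be able to run; its library has to be shipped too.
            calledMethods.Add(call.Method.Name);
            Walk(call.Object);
            foreach (var arg in call.Arguments) Walk(arg);
            break;
        case MemberExpression { Expression: ConstantExpression closure, Member: FieldInfo field }:
            // A captured local compiles to a field of a closure object held in a constant node.
            captured[field.Name] = field.GetValue(closure.Value);
            break;
        case MemberExpression member:
            Walk(member.Expression);
            break;
    }
}

Walk(predicate);
foreach (var pair in captured) Console.WriteLine($"captured {pair.Key} = {pair.Value}");
Console.WriteLine("calls: " + string.Join(", ", calledMethods));
```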
18
DryadLINQ Architecture [2]
[Figure: the same components as Architecture [1], now with the return path: the program invokes the query, Dryad execution writes output PartitionedTables, and the results flow back to the client program as .NET objects.]
19
Combining with LINQ-to-SQL
[Figure: a query handled by DryadLINQ, with a subquery delegated to LINQ-to-SQL.]
20
DryadLINQ Optimizations
Some are similar to existing database optimizations:
– Eliminate redundant partitioning steps
– Move aggregation steps up the graph, before partitioning steps
Existing Dryad optimizations as well:
– Dynamic reconfiguration of aggregation trees
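The "aggregation before partitioning" optimization can be illustrated with a made-up word-count example: counting locally in each partition first means fewer records cross the network, and the final counts are unchanged because partial counts can simply be summed. This is a sketch of the idea, not DryadLINQ's optimizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical word partitions (the output of SelectMany on each input partition).
var partitions = new List<string[]>
{
    new[] { "a", "b", "a", "a" },
    new[] { "b", "a", "b", "c" },
};

// Without early aggregation: every word record is shipped to the partition
// that owns its key and counted there.
int shippedNaive = partitions.Sum(p => p.Length);
var naiveCounts = partitions.SelectMany(p => p)
    .GroupBy(w => w)
    .ToDictionary(g => g.Key, g => g.Count());

// With early (partial) aggregation: each partition counts locally first, so only
// one (word, count) record per distinct word per partition is shipped.
var partialCounts = partitions
    .Select(p => p.GroupBy(w => w).Select(g => (Word: g.Key, Count: g.Count())).ToList())
    .ToList();
int shippedPartial = partialCounts.Sum(p => p.Count);

// The final step sums the partial counts; this works because counting
// decomposes into partial sums that can be added in any order.
var finalCounts = partialCounts.SelectMany(p => p)
    .GroupBy(x => x.Word)
    .ToDictionary(g => g.Key, g => g.Sum(x => x.Count));

Console.WriteLine($"records shipped: naive {shippedNaive}, with partial aggregation {shippedPartial}");
```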
21
Thoughts [1]
Easy to read, though it reads more like a PL paper.
What are the system contributions that differ from Dryad?
Does the high-level abstraction provide any extra information that allows…
22
Thoughts [2]
Interesting anecdote: DryadLINQ is inefficient for random-access workloads, yet on some of them it outperformed systems customized for random access.
The reason is HDD performance characteristics: a sequential read (even one that discards 99% of the data) can be faster than many small random accesses.
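The HDD argument can be made concrete with back-of-the-envelope arithmetic. All disk parameters below are assumed, typical orders of magnitude for a spinning disk, not figures from the paper. With these numbers a full sequential scan takes 10 s, while fetching 1% of the data in 4 KB random reads takes about 25 s; random access only wins when the query touches less than about 0.4% of the data.

```csharp
using System;

// Assumed disk parameters (typical orders of magnitude, not from the paper).
double datasetBytes = 1e9;   // 1 GB of input
double bandwidth = 100e6;    // 100 MB/s sequential transfer rate
double seekSeconds = 0.010;  // 10 ms per random access (seek + rotation)
double recordBytes = 4e3;    // 4 KB read per random access
double fraction = 0.01;      // the query needs 1% of the data

// Option 1: scan everything sequentially and discard what is not needed.
double sequentialSeconds = datasetBytes / bandwidth;

// Option 2: fetch only the needed records with random reads.
double randomReads = datasetBytes * fraction / recordBytes;
double randomSeconds = randomReads * (seekSeconds + recordBytes / bandwidth);

// Fraction f at which both take equal time:
// f * D / r * (seek + r / B) = D / B  =>  f = r / (B * seek + r)
double breakEven = recordBytes / (bandwidth * seekSeconds + recordBytes);

Console.WriteLine($"sequential scan: {sequentialSeconds:F1} s");
Console.WriteLine($"random reads:    {randomSeconds:F1} s");
Console.WriteLine($"break-even fraction: {breakEven:P2}");
```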
23
Thoughts [3] How different is FlumeJava from this?