Published by Paulina Anderson
Modified over 9 years ago
2
Definition
DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data-parallel applications that run on large PC clusters. The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for every programmer. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and .NET Language Integrated Query (LINQ).
3
The Evolution of DryadLINQ
Dryad had its roots in an idea developed by Michael Isard in October 2004.
Yuan Yu recognized the potential of LINQ to serve as the front-end programming tool for Dryad, and started the DryadLINQ project in September 2006.
By early 2008, the Dryad/DryadLINQ combination was made available within Microsoft.
The DryadLINQ research paper won a best-paper award at the eighth USENIX Symposium on Operating Systems Design and Implementation (OSDI 2008).
4
Current Status
Works with any LINQ-enabled language – C#, VB, F#, IronPython, …
Works with multiple storage systems – NTFS, SQL, Windows Azure, Cosmos DFS
Released internally within Microsoft – used on a variety of applications
External academic release announced at PDC – DryadLINQ in source, Dryad in binary – UW, UCSD, Indiana, ETH, Cambridge, …
5
Dryad Definition
Dryad is an infrastructure that allows a programmer to use the resources of a computer cluster or a data center to run data-parallel programs. A Dryad programmer can use thousands of machines, each with multiple processors or cores, without knowing anything about concurrent programming.
6
The Structure of Dryad Jobs
7
Dryad services
An API to create distributed applications (jobs) by specifying which processes must be executed and the communication channels linking them.
Scheduling of the processes on the cluster machines.
Fault tolerance through re-execution of processes after transient failures.
Monitoring of the computation and collection of statistics.
Job visualization.
An API for run-time resource management policies.
Support for efficient bulk data transfer between processes.
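Describing a job as processes connected by channels implies a dependency graph that the scheduler must respect. The sketch below is a hypothetical model, not the Dryad API: `JobGraphSketch`, `ScheduleOrder`, and the `(From, To)` channel tuples are names invented for illustration. It uses Kahn's topological sort to find an order in which each process starts only after the producers of its input channels.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a Dryad-style job, not the Dryad API: vertices are
// named processes and each (From, To) pair is a channel from producer to consumer.
public static class JobGraphSketch
{
    // Kahn's topological sort: a vertex becomes ready once every vertex feeding
    // one of its input channels has been scheduled.
    public static List<string> ScheduleOrder(
        IEnumerable<string> vertices,
        IEnumerable<(string From, string To)> channels)
    {
        var inDegree = vertices.ToDictionary(v => v, _ => 0);
        var consumers = inDegree.Keys.ToDictionary(v => v, _ => new List<string>());
        foreach (var (from, to) in channels)
        {
            consumers[from].Add(to);
            inDegree[to]++;
        }

        var ready = new Queue<string>(inDegree.Where(kv => kv.Value == 0).Select(kv => kv.Key));
        var order = new List<string>();
        while (ready.Count > 0)
        {
            var v = ready.Dequeue();
            order.Add(v);
            foreach (var c in consumers[v])
                if (--inDegree[c] == 0) ready.Enqueue(c);
        }

        // Dryad job graphs are acyclic; a vertex left over means the graph had a cycle.
        if (order.Count != inDegree.Count)
            throw new InvalidOperationException("job graph contains a cycle");
        return order;
    }
}
```

For example, vertices `read1`, `read2`, `merge`, `write` with channels read1 -> merge, read2 -> merge, merge -> write yield an order that places `merge` after both readers and `write` last. A real scheduler would also start independent ready vertices in parallel and re-run failed ones; this sketch only computes a valid order.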
8
Software Stack (figure). Components shown: applications (Image Processing, Machine Learning, Graph Analysis, Data Mining, other applications); DryadLINQ and other languages; Dryad; Cluster Services and the Azure Platform; storage (CIFS/NTFS, SQL Servers, Cosmos DFS); Windows Server.
9
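Dryad System Architecture (figure): the job manager (JM) on the control plane schedules the job on the cluster, working with a name server (NS) and per-machine process daemons (PD); vertices (V) exchange data on the data plane through files, TCP, FIFOs, and the network.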
10
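LINQ Framework (figure): a .NET program (C#, VB, F#, etc.) builds query objects, which are passed through the LINQ provider interface to one of several execution engines: LINQ-to-SQL, LINQ-to-XML, PLINQ, and DryadLINQ. Scalability ranges from single-core execution on the local machine, to multi-core (PLINQ), to a cluster (DryadLINQ). The framework is extremely open and extensible.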
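To make the "same query, different execution engine" idea concrete, here is a small sketch that uses only standard .NET APIs: the identical Where/Select/Sum chain runs sequentially on LINQ-to-Objects or, after `AsParallel()`, on PLINQ's multi-core operators. `SameQuerySketch` and `SumOfEvenSquares` are illustrative names, and DryadLINQ itself is not involved.

```csharp
using System.Linq;

public static class SameQuerySketch
{
    // The same operator chain runs sequentially with LINQ-to-Objects or on all
    // cores with PLINQ; only the AsParallel() call differs.
    public static long SumOfEvenSquares(int n, bool parallel)
    {
        var source = Enumerable.Range(1, n);
        return parallel
            ? source.AsParallel().Where(x => x % 2 == 0).Select(x => (long)x * x).Sum()
            : source.Where(x => x % 2 == 0).Select(x => (long)x * x).Sum();
    }
}
```

Both variants return 220 for `n = 10` (4 + 16 + 36 + 64 + 100). DryadLINQ pushes the same principle one level further, from the cores of one machine to the machines of a cluster.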
11
DryadLINQ Operators
DryadLINQ provides three kinds of operators:
Operators present in LINQ, which DryadLINQ implements.
Adaptations of LINQ operators that return scalar values (i.e., not IQueryable), modified to return an IQueryable instead. For example, Count returns an integer, while CountAsQueryable returns an IQueryable whose only element is that integer. The AsQueryable variants can be chained to build complex queries, whereas the scalar variants would force the query to be broken into small sub-queries, which can reduce efficiency.
New operators that exist only in DryadLINQ. These cannot be synthesized efficiently from compositions of primitive LINQ operators, and they can substantially improve query performance in a distributed execution environment such as Dryad.
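The benefit of the AsQueryable variants is that a scalar result stays inside the query and can feed further operators. The following local sketch assumes nothing about DryadLINQ's actual implementation: `LocalCountAsQueryable` is a hypothetical stand-in that wraps Count in a one-element `IQueryable<int>` whose value is only computed when the query is enumerated.

```csharp
using System.Linq;

public static class CountAsQueryableSketch
{
    // Hypothetical local stand-in for CountAsQueryable (not the DryadLINQ code):
    // instead of returning an int immediately, it returns a one-element
    // IQueryable<int> whose single value is computed only on enumeration.
    public static IQueryable<int> LocalCountAsQueryable<T>(this IQueryable<T> source) =>
        new[] { 0 }.AsQueryable().Select(_ => source.Count());
}
```

For instance, `data.AsQueryable().Where(x => x > 2).LocalCountAsQueryable().Select(n => n * 10)` remains one composable query; with `data = {1, 2, 3, 4, 5}` its only element is 30.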
12
Combining with LINQ-to-SQL (figure): a DryadLINQ query containing subqueries that are executed by LINQ-to-SQL.
13
DryadLINQ and LINQ
C# and LINQ data objects become distributed partitioned files.
LINQ queries become distributed Dryad jobs.
C# methods become code running on the vertices of a Dryad job.
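One common way to spread a data set across the machines of a cluster is hash partitioning, where a record's key decides which partition stores it. The sketch below is illustrative, not DryadLINQ's storage code; `PartitionSketch` and `HashPartition` are hypothetical names, and each `List<T>` stands in for one partition of a distributed file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: split a data set into N partitions by hashing a key,
// the way a distributed partitioned file spreads records across machines.
public static class PartitionSketch
{
    public static List<T>[] HashPartition<T, K>(IEnumerable<T> items, Func<T, K> key, int partitions)
    {
        var result = Enumerable.Range(0, partitions).Select(_ => new List<T>()).ToArray();
        foreach (var item in items)
        {
            // Mask the sign bit so the index is non-negative even for negative hash codes.
            int p = (EqualityComparer<K>.Default.GetHashCode(key(item)) & int.MaxValue) % partitions;
            result[p].Add(item);
        }
        return result;
    }
}
```

With integer keys 0 through 9 and three partitions, partition 1 receives 1, 4, and 7. Note that `string.GetHashCode` is randomized per process on .NET Core, so a real multi-machine system would need a deterministic hash function.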
14
DryadLINQ representation
15
DryadLINQ Features
Declarative programming: computations are expressed in a high-level language similar to SQL.
Automatic parallelization: from sequential declarative code, the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. To exploit multi-core parallelism on each machine, DryadLINQ relies on the PLINQ parallelization framework.
Integration with Visual Studio: DryadLINQ programmers can use the full Visual Studio tool set: IntelliSense, code refactoring, integrated debugging, build, and source code management.
Integration with .NET: all .NET libraries are available, as are Visual Basic and dynamic languages.
Type safety: distributed computations are statically type-checked.
Automatic serialization: the data transport mechanisms automatically handle all .NET object types.
Job graph optimizations
– static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
– dynamic: run-time query plan optimizations automatically adapt the plan to the statistics of the data set being processed.
Conciseness: the following code is a complete implementation of the Map-Reduce computation framework in DryadLINQ:
    public static IQueryable<R> MapReduce<S, M, K, R>(this IQueryable<S> source,
        Expression<Func<S, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<K, IEnumerable<M>, R>> reducer)
    {
        return source.SelectMany(mapper).GroupBy(keySelector, reducer);
    }
16
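DryadLINQ System Architecture (figure): on the client machine, a .NET program invokes a query; DryadLINQ turns the query expression into a distributed query plan plus vertex code, which Dryad executes in the data center under a job manager (JM), reading input tables and writing output tables. ToTable triggers execution, and the program then reads the results as .NET objects with foreach over the returned DryadTable.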
17
A query provider translates the query held by an IQueryable object into a format suitable for a remote execution engine and ships it there for execution. It also transforms the remote results back into C# objects. DryadLINQ is one such provider, interfacing with the Dryad remote execution framework.
(Figure: the local program invokes a query; the query provider transforms the query object for remote execution and transforms the resulting data back into C# objects for the program, in steps numbered (1) to (12).)
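A provider never sees the C# source of a query; it receives `IQueryable.Expression`, an expression tree. As an illustration of the kind of traversal a provider performs before translating a query (this is not DryadLINQ code), the hypothetical `OperatorLister` below uses the standard `ExpressionVisitor` class to list the LINQ operators of a query in data-flow order.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: walks IQueryable.Expression and records each operator call.
public sealed class OperatorLister : ExpressionVisitor
{
    private readonly List<string> _names = new List<string>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        _names.Add(node.Method.Name);      // the outermost operator is seen first
        return base.VisitMethodCall(node); // then recurse into its source and arguments
    }

    public static List<string> OperatorsInOrder(IQueryable query)
    {
        var lister = new OperatorLister();
        lister.Visit(query.Expression);
        lister._names.Reverse();           // report in data-flow (source-first) order
        return lister._names;
    }
}
```

For `new[] {1, 2, 3, 4}.AsQueryable().Where(x => x > 2).Select(x => x * 10)`, it returns `Where`, then `Select`. A real provider would go on to map each operator to an execution plan instead of just naming it.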
18
Execution stages of a Dryad Job
19
Partitioned File Structure
20
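Reductions (Aggregations)
    var result = input.Aggregate((x, y) => x + y);

    [Associative]
    static int Add(int x, int y) { return x + y; }

    var sum = input.Aggregate((x, y) => Add(x, y));
Marking the reduction function [Associative] allows DryadLINQ to aggregate each partition locally and then combine the partial results.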
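Associativity is what makes a two-stage reduction safe: each partition is reduced locally, and the partial results are reduced again. The sketch below (with the hypothetical name `TwoLevelAggregate`) models partitions as in-memory sequences and is not DryadLINQ's implementation; it shows that for an associative function the two-stage result equals a single sequential `Aggregate`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartialAggregationSketch
{
    // Stage 1 reduces each partition on its own; stage 2 reduces the partial results.
    // This equals one sequential Aggregate only if `combine` is associative.
    public static T TwoLevelAggregate<T>(IEnumerable<IEnumerable<T>> partitions, Func<T, T, T> combine) =>
        partitions
            .Where(p => p.Any())                 // skip empty partitions (Aggregate needs at least one element)
            .Select(p => p.Aggregate(combine))   // stage 1: per-partition partial result
            .Aggregate(combine);                 // stage 2: combine the partial results
}
```

With partitions {1, 2, 3}, {4, 5}, and an empty one, addition gives 15 either way. Subtraction, which is not associative, gives (1 - 2 - 3) - (4 - 5) = -3 in two stages but 1 - 2 - 3 - 4 - 5 = -13 sequentially, which is why the annotation matters.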
21
Apply
The delegate passed to Select receives each element individually, whereas the delegate passed to Apply receives the whole stream of elements.
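Apply is useful when a computation needs context beyond a single element. The sketch below is a local, hypothetical analogue (`ApplyLocal` is an invented name, not the DryadLINQ method) in which the delegate receives the whole sequence. `PairwiseSums`, which adds each element to its predecessor, cannot be written as a per-element Select delegate because it needs the previous element.

```csharp
using System;
using System.Collections.Generic;

public static class ApplySketch
{
    // Hypothetical analogue of Apply: the delegate gets the entire input sequence.
    public static IEnumerable<R> ApplyLocal<T, R>(this IEnumerable<T> source,
                                                   Func<IEnumerable<T>, IEnumerable<R>> f) => f(source);

    // A stream-level computation: each output is an element plus the one before it.
    public static IEnumerable<int> PairwiseSums(IEnumerable<int> xs)
    {
        bool first = true;
        int prev = 0;
        foreach (var x in xs)
        {
            if (!first) yield return prev + x;
            prev = x;
            first = false;
        }
    }
}
```

For example, `new[] {1, 2, 3, 4}.ApplyLocal<int, int>(ApplySketch.PairwiseSums)` yields 3, 5, 7.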
22
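MapReduce in DryadLINQ
    public static IQueryable<R> MapReduce<T, M, K, R>(
        this IQueryable<T> source,                                  // sequence of Ts
        Expression<Func<T, IEnumerable<M>>> mapper,                 // T -> Ms
        Expression<Func<M, K>> keySelector,                         // M -> K
        Expression<Func<IGrouping<K, M>, IEnumerable<R>>> reducer)  // (K, Ms) -> Rs
    {
        var map = source.SelectMany(mapper);
        var group = map.GroupBy(keySelector);
        var result = group.SelectMany(reducer);
        return result; // sequence of Rs
    }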
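As a usage illustration, word count fits the same three steps: SelectMany splits lines into words (map), GroupBy groups identical words (key selection), and a reducer turns each group into a (word, count) pair. This sketch runs the pipeline locally on LINQ-to-Objects via `AsQueryable()`; `WordCountSketch` is an illustrative name. Because the reducer emits exactly one pair per group, Select stands in for the SelectMany of the generic version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSketch
{
    public static Dictionary<string, int> Count(IEnumerable<string> lines)
    {
        var map = lines.AsQueryable()
                       .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)); // map: line -> words
        var group = map.GroupBy(word => word);                                                                // key: the word itself
        var result = group.Select(g => new KeyValuePair<string, int>(g.Key, g.Count()));                      // reduce: one pair per group
        return result.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

For the lines "a b a", "b c", and an empty line, the result maps a to 2, b to 2, and c to 1.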
23
Map-Reduce Plan (when reduce is combiner-enabled) (figure): each map vertex runs map (M), sort (Q), groupby (G1), and combine (C), then distributes (D) its output by key. An intermediate dynamic-aggregation stage merge-sorts (MS), groups (G2), and reduces (R) partial results, and each final reduce vertex again merge-sorts, groups, and reduces.
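The combiner-enabled plan can be simulated on one machine to see why it gives the right answer: each map-side partition produces partial counts, each partial count is routed to one reducer by hashing its key, and each reducer sums the partial counts it receives. This is a hypothetical sketch (`CombinerPlanSketch` is an invented name) that uses hash-based grouping in place of the sort and merge-sort steps of the plan and omits the dynamic-aggregation stage.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CombinerPlanSketch
{
    public static Dictionary<string, int> Run(IEnumerable<IEnumerable<string>> mapPartitions, int reducers)
    {
        // Map side, "combine": each partition turns its words into (word, partial count) pairs.
        var partials = mapPartitions.Select(words =>
            words.GroupBy(w => w).Select(g => (Word: g.Key, Count: g.Count())));

        // "Distribute": each partial count goes to reducer hash(word) % reducers.
        var inboxes = Enumerable.Range(0, reducers)
                                .Select(_ => new List<(string Word, int Count)>())
                                .ToArray();
        foreach (var partial in partials)
            foreach (var item in partial)
                inboxes[(item.Word.GetHashCode() & int.MaxValue) % reducers].Add(item);

        // Reduce side: sum partial counts per word; reducers own disjoint sets of words.
        return inboxes
            .SelectMany(inbox => inbox.GroupBy(e => e.Word)
                                      .Select(g => (Word: g.Key, Total: g.Sum(e => e.Count))))
            .ToDictionary(t => t.Word, t => t.Total);
    }
}
```

With map partitions {a, b, a} and {b, c, a} and two reducers, the result maps a to 3, b to 2, and c to 1. Combining before distributing means only one record per distinct word per partition crosses the network.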
25
An Example: PageRank
PageRank ranks web pages by propagating scores along the hyperlink structure. Each iteration can be expressed as an SQL-like query:
1. Join edges with ranks.
2. Distribute ranks on edges.
3. GroupBy edge destination.
4. Aggregate into ranks.
5. Repeat.
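The five steps map directly onto LINQ operators. In this local sketch, `PageRankSketch`, `Page`, `Iterate`, and `Run` are hypothetical names; each page lists its outgoing links. For simplicity the sketch assumes there is no damping factor and that every page has at least one incoming link (pages with no incoming links would drop out after the GroupBy).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PageRankSketch
{
    public sealed record Page(string Name, string[] Links);

    // One iteration, without a damping factor (a simplifying assumption).
    public static Dictionary<string, double> Iterate(IEnumerable<Page> pages, IReadOnlyDictionary<string, double> ranks) =>
        pages
            .Join(ranks, p => p.Name, r => r.Key, (p, r) => (Src: p, Rank: r.Value))        // 1. join edges with ranks
            .SelectMany(x => x.Src.Links.Select(dst => (Dst: dst, Share: x.Rank / x.Src.Links.Length))) // 2. distribute
            .GroupBy(e => e.Dst, e => e.Share)                                                // 3. group by destination
            .ToDictionary(g => g.Key, g => g.Sum());                                          // 4. aggregate into ranks

    // 5. repeat for a fixed number of iterations.
    public static Dictionary<string, double> Run(IEnumerable<Page> pages, Dictionary<string, double> ranks, int iterations)
    {
        for (int i = 0; i < iterations; i++) ranks = Iterate(pages, ranks);
        return ranks;
    }
}
```

For pages A -> {B, C}, B -> {C}, C -> {A} starting at 1/3 each, one iteration gives A = 1/3, B = 1/6, C = 1/2, and a second gives A = 1/2, B = 1/6, C = 1/3.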
26
Multi-Iteration PageRank (figure): the pages and ranks data sets flow through Iteration 1, Iteration 2, and Iteration 3, with stages connected by in-memory FIFO channels.
27
Dryad Enters the Market
A big step is coming, as Dryad and DryadLINQ become fully productized as part of the Microsoft HPC Server suite. They will be integrated with Microsoft SQL Server and Windows Azure to give customers, from academia to the business community, a new, powerful computing tool.
By offering an easy-to-use but powerful data-intensive computing tool, it benefits a whole new set of Microsoft customers.
28
Windows Azure and DryadLINQ
Windows Azure is a platform for building scalable, highly reliable, multi-tiered web service applications. It is hosted in Microsoft's large data centers in the United States, Europe, and Asia. Windows Azure has both compute and data resources; the compute resources are designed to let applications scale across thousands of servers and data resources. No port of Hadoop or Dryad/DryadLINQ is currently available for it. However, Windows Azure is an excellent platform for experimenting with new variations on large-scale map-reduce algorithms, as these patterns are easily coded as networks of worker roles.
29
“We’re convinced that we will delight our customers, both with the pure capability of the system, as well as its ease of use. What I really like about Dryad is that it is not just about handling a problem in a better way, it is also about new possibilities in computing that you couldn’t imagine before.”
30
Resources
http://research.microsoft.com/en-us/projects/dryadlinq/
http://research.microsoft.com/en-us/projects/dryad/
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language – Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey
Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations – Yuan Yu, Pradeep Kumar Gunda, Michael Isard
Distributed Data-Parallel Computing Using a High-Level Programming Language – Michael Isard, Yuan Yu
Some sample programs written in DryadLINQ – Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey, Frank McSherry, Kannan Achan
http://blogs.msdn.com/b/dryad/archive/2009/11/24/what-is-dryad.aspx
http://research.microsoft.com/en-us/projects/azure/faq.aspx
http://research.microsoft.com/en-us/news/features/dryad-012611.aspx