1 Dryad and DryadLINQ

2 Dryad and DryadLINQ: Dryad provides automatic distributed execution; DryadLINQ provides automatic query plan generation.

3 Dryad: A general-purpose execution environment for distributed, data-parallel applications. It focuses on simplicity, reliability, scalability, and efficiency, not on low latency or unreliable networks. It automatically manages scheduling, distribution, and fault tolerance, and it exploits data parallelism.

4 Dryad: Computations are expressed as a directed acyclic graph (DAG) – Each vertex runs a sequential program – Edges are communication channels – Each vertex can have several input and output edges – Data transport mechanisms: files, TCP pipes, and shared-memory FIFOs
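The following is a minimal single-process sketch of this model. It assumes vertices are plain Python functions and channels are in-memory lists. Real Dryad vertices run as separate processes connected by files, TCP pipes, or shared-memory FIFOs. The job runs each vertex once all of its inputs are ready. `run_job` and the vertex names are hypothetical, and the code requires Python 3.9+ for `graphlib`.

```python
import heapq
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# A vertex takes one list per incoming edge and returns one output list.
Vertex = Callable[[List[list]], list]

def run_job(vertices: Dict[str, Vertex],
            edges: List[Tuple[str, str]],
            inputs: Dict[str, list]) -> Dict[str, list]:
    """Run every vertex of a DAG in dependency order and return all channel contents."""
    preds: Dict[str, List[str]] = {v: [] for v in list(vertices) + list(inputs)}
    for src, dst in edges:
        preds[dst].append(src)
    channels: Dict[str, list] = dict(inputs)  # job inputs are pre-filled channels
    # static_order() yields each vertex only after all of its predecessors;
    # it raises graphlib.CycleError if the graph is not acyclic.
    for v in TopologicalSorter(preds).static_order():
        if v not in inputs:
            channels[v] = vertices[v]([channels[p] for p in preds[v]])
    return channels

# Two independent sort vertices feeding one merge vertex.
job = {
    "sort_a": lambda ins: sorted(ins[0]),
    "sort_b": lambda ins: sorted(ins[0]),
    "merge": lambda ins: list(heapq.merge(*ins)),
}
edges = [("in_a", "sort_a"), ("in_b", "sort_b"),
         ("sort_a", "merge"), ("sort_b", "merge")]
result = run_job(job, edges, {"in_a": [3, 1, 2], "in_b": [5, 4]})
print(result["merge"])  # [1, 2, 3, 4, 5]
```

The two sort vertices have no edge between them, so a distributed engine would be free to run them on different computers at the same time.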

5 Job = Directed Acyclic Graph [Figure: inputs feed processing vertices, which are connected by channels (file, pipe, shared memory) and produce outputs]

6 Dryad vs. MapReduce and Parallel Databases: Dryad gives the developer more control than MapReduce, which aims at simplicity at the expense of generality and performance. In a parallel database the computation graph is implicit, whereas a Dryad developer specifies it explicitly.

7 Dryad System Architecture – Job manager: coordinates the job and constructs the graph – Name server: exposes the available computers along with their position in the network topology – Daemons: run on each computer in the cluster

8 Communication

9 Job (Graph) Construction: Graphs are described with graph operators, implemented in C++, that build larger graphs from simpler subgraphs.
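The Dryad paper builds graphs with four composition operators:

- cloning a graph k times (A^k)
- pointwise composition (A >= B)
- complete bipartite composition (A >> B)
- merging (A || B)

The Python sketch below only imitates that style; it is not Dryad's C++ API. `Graph` is a hypothetical class, and `|` stands in for `||`, which Python cannot overload. Two behaviors are simplifying assumptions of this sketch: `>=` wires outputs to inputs round-robin when their counts differ, and merge simply unions vertices and edges.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Graph:
    """Hypothetical stand-in for a Dryad graph: vertices, edges, ordered inputs/outputs."""
    vertices: Set[str]
    edges: List[Tuple[str, str]]
    inputs: List[str]
    outputs: List[str]

    @staticmethod
    def vertex(label: str) -> "Graph":
        return Graph({label}, [], [label], [label])

    def _renamed(self, suffix: int) -> "Graph":
        def r(v: str) -> str:
            return f"{v}#{suffix}"
        return Graph({r(v) for v in self.vertices},
                     [(r(a), r(b)) for a, b in self.edges],
                     [r(v) for v in self.inputs], [r(v) for v in self.outputs])

    def __xor__(self, k: int) -> "Graph":
        """A ^ k: k independent clones of A placed side by side."""
        copies = [self._renamed(i) for i in range(k)]
        return Graph(set().union(*(c.vertices for c in copies)),
                     [e for c in copies for e in c.edges],
                     [v for c in copies for v in c.inputs],
                     [v for c in copies for v in c.outputs])

    def _join(self, other: "Graph", new_edges) -> "Graph":
        return Graph(self.vertices | other.vertices,
                     self.edges + other.edges + list(new_edges),
                     self.inputs, other.outputs)

    def __ge__(self, other: "Graph") -> "Graph":
        """A >= B: output i of A feeds input i of B (round-robin if counts differ)."""
        n = max(len(self.outputs), len(other.inputs))
        return self._join(other, ((self.outputs[i % len(self.outputs)],
                                   other.inputs[i % len(other.inputs)]) for i in range(n)))

    def __rshift__(self, other: "Graph") -> "Graph":
        """A >> B: every output of A feeds every input of B."""
        return self._join(other, ((o, i) for o in self.outputs for i in other.inputs))

    def __or__(self, other: "Graph") -> "Graph":
        """A | B (Dryad's A || B): union of both graphs; shared vertex names merge."""
        return Graph(self.vertices | other.vertices,
                     self.edges + [e for e in other.edges if e not in self.edges],
                     self.inputs + [v for v in other.inputs if v not in self.inputs],
                     self.outputs + [v for v in other.outputs if v not in self.outputs])

mapreduce_like = (Graph.vertex("M") ^ 3) >> (Graph.vertex("R") ^ 2)
pipeline = (Graph.vertex("A") ^ 2) >= (Graph.vertex("B") ^ 2)
print(len(mapreduce_like.edges))  # 6
print(pipeline.edges)  # [('A#0', 'B#0'), ('A#1', 'B#1')]
```

In Python every composed subexpression needs parentheses for two reasons. First, `^` binds more loosely than `>>`. Second, `a >= b >= c` is parsed as a chained comparison.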

10 Job Execution: The job manager is not currently fault tolerant. Vertices may be scheduled multiple times due to failures – Each execution is versioned – An execution record is kept, including the versions of the incoming vertices – Outputs are uniquely named (versioned) – Final outputs are selected if the job completes – Non-file communication (TCP pipes, shared-memory FIFOs) may cascade failures. Vertices can specify hard constraints or preferences for the set of computers they run on. Scheduling is greedy and assumes only one job is running.
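The versioning scheme can be sketched as follows. The sketch assumes file-like channels whose outputs persist after a vertex finishes, so the cascading failures of pipes and FIFOs are not modeled. Failures are injected deterministically through a hypothetical `fail_on` set. `VersionedRunner` and its methods are illustrative names, not part of Dryad.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

@dataclass
class ExecutionRecord:
    vertex: str
    version: int
    inputs_used: Dict[str, str]  # upstream vertex -> exact (versioned) output consumed
    output_name: str
    succeeded: bool

class VersionedRunner:
    """Hypothetical sketch: every attempt gets a new version and a unique output name."""

    def __init__(self, fail_on: Set[Tuple[str, int]], max_attempts: int = 5):
        self.fail_on = fail_on              # injected failures as (vertex, version)
        self.max_attempts = max_attempts
        self.store: Dict[str, list] = {}    # all outputs, keyed by unique name
        self.records: List[ExecutionRecord] = []
        self.selected: Dict[str, str] = {}  # vertex -> output chosen for consumers

    def add_input(self, name: str, data: list) -> None:
        self.store[f"{name}.v1"] = data
        self.selected[name] = f"{name}.v1"

    def run(self, vertex: str, fn: Callable[[List[list]], list], upstream: List[str]) -> str:
        used = {u: self.selected[u] for u in upstream}
        for version in range(1, self.max_attempts + 1):
            out = f"{vertex}.v{version}"
            ok = (vertex, version) not in self.fail_on
            if ok:
                self.store[out] = fn([self.store[used[u]] for u in upstream])
                self.selected[vertex] = out
            self.records.append(ExecutionRecord(vertex, version, used, out, ok))
            if ok:
                return out
        raise RuntimeError(f"{vertex} failed {self.max_attempts} times")

runner = VersionedRunner(fail_on={("sort", 1), ("sort", 2)})
runner.add_input("src", [3, 1, 2])
runner.run("sort", lambda ins: sorted(ins[0]), ["src"])
runner.run("double", lambda ins: [2 * x for x in ins[0]], ["sort"])
print([(r.output_name, r.succeeded) for r in runner.records])
# [('sort.v1', False), ('sort.v2', False), ('sort.v3', True), ('double.v1', True)]
print(runner.selected["double"], runner.store[runner.selected["double"]])
# double.v1 [2, 4, 6]
```

Because each attempt writes under its own name, a retried vertex never overwrites a sibling attempt's output. The execution record shows exactly which upstream version (`sort.v3`) each downstream run consumed.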

11 Policy Managers [Figure: the job manager delegates to a manager per stage (an R manager for stage R, an X manager for stage X) and to an R-X manager for the connection between stages R and X]

12 Cluster network topology [Figure: computers grouped into racks; each rack connects through a top-of-rack switch to a top-level switch]
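Slide 10's greedy, single-job scheduling with hard constraints and preferences can be combined with this rack topology in a small sketch. Each vertex is placed by the first rule that applies:

1. a preferred computer, if one is free
2. a free computer in the same rack as a preferred one
3. any allowed free computer

If none is available, the vertex waits (`None`). This fallback order, `VertexRequest`, and `greedy_schedule` are assumptions for illustration, not Dryad's actual policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class VertexRequest:
    name: str
    preferred: List[str] = field(default_factory=list)  # e.g. computers holding its input
    allowed: Optional[Set[str]] = None                  # hard constraint; None means any

def greedy_schedule(requests: List[VertexRequest],
                    rack_of: Dict[str, str],
                    free: List[str]) -> Dict[str, Optional[str]]:
    """Place ready vertices one at a time, assuming this is the only job."""
    free = list(free)  # copy so the caller's list is not modified
    placement: Dict[str, Optional[str]] = {}
    for req in requests:
        candidates = [c for c in free if req.allowed is None or c in req.allowed]
        near = {rack_of[p] for p in req.preferred}
        choice = (next((c for c in candidates if c in req.preferred), None)
                  or next((c for c in candidates if rack_of[c] in near), None)
                  or (candidates[0] if candidates else None))
        placement[req.name] = choice  # None: the vertex waits for a computer
        if choice is not None:
            free.remove(choice)
    return placement

rack_of = {"m1": "rack1", "m2": "rack1", "m3": "rack2", "m4": "rack2"}
requests = [VertexRequest("read0", preferred=["m3"]),
            VertexRequest("read1", preferred=["m3"]),
            VertexRequest("agg", allowed={"m1"}),
            VertexRequest("other"),
            VertexRequest("late", allowed={"m1"})]
print(greedy_schedule(requests, rack_of, ["m1", "m2", "m3", "m4"]))
# {'read0': 'm3', 'read1': 'm4', 'agg': 'm1', 'other': 'm2', 'late': None}
```

`read1` cannot get its preferred computer, so it lands on another machine in the same rack. Its input then crosses only the top-of-rack switch rather than the top-level switch.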

13 Run-time Graph Refinement

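14 Dynamic Aggregation [Figure: in the static graph, every source vertex S feeds the aggregator T directly; in the dynamic graph, aggregator vertices A are inserted per rack (#1, #2, #3) so each rack's sources are combined locally before reaching T]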
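The effect of dynamic aggregation can be sketched by building both graph shapes for the same sources and comparing T's fan-in. This sketch rewrites the graph up front from a known rack placement, whereas Dryad refines the graph at run time as vertices complete. It also assumes the aggregation is associative and commutative (a sum here), so partial results can be combined per rack. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def static_graph(rack_of: Dict[str, str]) -> List[Edge]:
    """Every source vertex S sends its partial result straight to T."""
    return [(s, "T") for s in rack_of]

def dynamic_graph(rack_of: Dict[str, str]) -> List[Edge]:
    """Insert one aggregator A per rack; sources feed their rack's A, each A feeds T."""
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for s, rack in rack_of.items():
        by_rack[rack].append(s)
    edges: List[Edge] = []
    for rack, members in sorted(by_rack.items()):
        agg = f"A{rack}"
        edges += [(s, agg) for s in members]
        edges.append((agg, "T"))
    return edges

def evaluate(edges: List[Edge], leaf_values: Dict[str, int], node: str = "T") -> int:
    """Compute a node's value bottom-up by summing the values of its inputs."""
    children = [src for src, dst in edges if dst == node]
    if not children:
        return leaf_values[node]
    return sum(evaluate(edges, leaf_values, c) for c in children)

def fan_in(edges: List[Edge], node: str = "T") -> int:
    return sum(1 for _, dst in edges if dst == node)

rack_of = {"S1": "#1", "S2": "#1", "S3": "#2", "S4": "#2", "S5": "#3", "S6": "#3"}
values = {"S1": 1, "S2": 2, "S3": 3, "S4": 4, "S5": 5, "S6": 6}
static, dynamic = static_graph(rack_of), dynamic_graph(rack_of)
print(fan_in(static), fan_in(dynamic))                      # 6 3
print(evaluate(static, values), evaluate(dynamic, values))  # 21 21
```

Both shapes compute the same total. In the dynamic graph, T receives one input per rack instead of one per source.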

15 Fault Tolerance

16 SkyServer DB Query: A 3-way join to find the gravitational lens effect. Table U: (objId, color), 11.8 GB. Table N: (objId, neighborId), 41.8 GB. Find neighboring stars with similar colors: – Join U and N to find T = (U.color, N.neighborId) where U.objId = N.objId – Join U and T to find U.objId where U.objId = T.neighborId and U.color ≈ T.color
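On toy data, the two joins reduce to a pair of hash joins. In this sketch each color is a single float, and "similar" is assumed to mean |U.color - T.color| < d. The real SkyServer data, table sizes, and threshold are not reproduced, and `find_similar_neighbors` is a hypothetical name.

```python
from typing import List, Set, Tuple

def find_similar_neighbors(u: List[Tuple[int, float]],
                           n: List[Tuple[int, int]],
                           d: float) -> Set[int]:
    """Return U.objIds that neighbor an object with a similar color."""
    color = dict(u)  # hash table on U: objId -> color
    # Join 1: T = (U.color, N.neighborId) where U.objId = N.objId
    t = [(color[obj], nbr) for obj, nbr in n if obj in color]
    # Join 2: U.objId where U.objId = T.neighborId and |U.color - T.color| < d
    return {nbr for c, nbr in t if nbr in color and abs(color[nbr] - c) < d}

u = [(1, 0.50), (2, 0.52), (3, 0.90), (4, 0.91)]  # (objId, color)
n = [(1, 2), (2, 1), (1, 3), (3, 4)]              # (objId, neighborId)
print(sorted(find_similar_neighbors(u, n, d=0.05)))  # [1, 2, 4]
```

Object 3 is excluded: its only incoming neighbor pair comes from object 1, whose color differs by 0.40.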

17 SkyServer DB query: Took the SQL plan, manually coded it in Dryad, and manually partitioned the data. Plan steps – u: (objid, color); n: (objid, neighborobjid) – [partition by objid] – select u.color, n.neighborobjid from u join n where u.objid = n.objid, yielding (u.color, n.neighborobjid) – [re-partition by n.neighborobjid] – [order by n.neighborobjid] – [distinct] – [merge outputs] – select u.objid from u join T where u.objid = T.neighborobjid and |u.color - T.color| < d
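The partitioning in this plan can be simulated in one process:

1. Hash-partition both tables by objid.
2. Join each pair of co-located partitions.
3. Re-partition the intermediate rows by neighborobjid.
4. Sort, deduplicate, and join each bucket with the matching partition of u.
5. Merge the outputs.

The second join is on u.objid = neighborobjid, so applying the same hash (key % p) to both keys keeps matching rows in the same partition. The partition count `P`, the toy data, and single-float colors are assumptions of this sketch.

```python
from typing import Callable, List, Tuple

P = 3  # hypothetical number of partitions (computers)

def partition(rows: list, key: Callable, p: int) -> List[list]:
    """Hash-partition rows into p buckets by an integer key."""
    parts: List[list] = [[] for _ in range(p)]
    for row in rows:
        parts[key(row) % p].append(row)
    return parts

def run_plan(u: List[Tuple[int, float]], n: List[Tuple[int, int]],
             d: float, p: int = P) -> List[int]:
    u_parts = partition(u, lambda r: r[0], p)  # [partition by objid]
    n_parts = partition(n, lambda r: r[0], p)
    buckets: List[list] = [[] for _ in range(p)]
    for up, np_ in zip(u_parts, n_parts):
        color = dict(up)
        # select u.color, n.neighborobjid from u join n where u.objid = n.objid
        t = [(color[obj], nbr) for obj, nbr in np_ if obj in color]
        # [re-partition by n.neighborobjid]: each partition sends rows to every bucket
        for i, part in enumerate(partition(t, lambda r: r[1], p)):
            buckets[i].extend(part)
    outputs: List[List[int]] = []
    for i in range(p):
        # [order by n.neighborobjid], then [distinct]
        t_sorted = sorted(set(buckets[i]), key=lambda r: (r[1], r[0]))
        color = dict(u_parts[i])  # the u rows whose objid hashes to bucket i
        outputs.append([nbr for c, nbr in t_sorted
                        if nbr in color and abs(color[nbr] - c) < d])
    # [merge outputs]: concatenate the per-partition results
    return [objid for out in outputs for objid in out]

u = [(1, 0.50), (2, 0.52), (3, 0.90), (4, 0.91)]  # (objid, color)
n = [(1, 2), (2, 1), (1, 3), (3, 4)]              # (objid, neighborobjid)
print(sorted(run_plan(u, n, d=0.05)))  # [1, 2, 4]
```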

18 Optimization [Figure: Dryad graph for the query, with vertex stages labeled D, M, S, Y, X, U, and N]

19 [Figure: the same query graph as slide 18]

20 [Figure: speed-up (0 to 16) versus number of computers (0 to 10) for Dryad In-Memory, Dryad Two-pass, and SQLServer 2005]

21 High-Level Programming Languages – Nebula: limited to existing binaries – SSIS: SQL Server workflow engine, run in distributed fashion – DryadLINQ: supports both imperative and declarative operations on datasets

22 Dryad/DryadLINQ: Decoupling of Dryad and DryadLINQ – Dryad: execution engine (given a DAG, handles scheduling and fault tolerance) – DryadLINQ: programming model (given a query, generates the DAG)

23 DryadLINQ: Exploits LINQ (relational queries integrated into C#) to provide a hybrid of imperative and declarative programming. LINQ's design makes computations easy to express while giving the runtime leeway in how to implement them. – A DryadLINQ program is a sequential program composed of LINQ expressions – The expressions perform side-effect-free transformations on datasets – Programs are written and debugged using .NET development tools – More general than distributed SQL – Programs can be automatically optimized and efficiently executed on a large cluster
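The "runtime leeway" idea can be illustrated with deferred execution. Query operators only record an expression, and the runtime decides later how to run it. LINQ's real operators are C# methods such as Where and Select that build expression trees. The `Query` class below is a hypothetical Python stand-in that mimics only the deferred, side-effect-free part.

```python
from typing import Any, Callable, Iterable, List, Tuple

class Query:
    """Hypothetical LINQ-style builder: calls record operators instead of running them."""

    def __init__(self, source: Iterable[Any], ops: Tuple[Tuple[str, Callable], ...] = ()):
        self.source = source
        self.ops = ops  # the recorded expression: one (kind, function) pair per operator

    def where(self, pred: Callable[[Any], bool]) -> "Query":
        return Query(self.source, self.ops + (("where", pred),))

    def select(self, fn: Callable[[Any], Any]) -> "Query":
        return Query(self.source, self.ops + (("select", fn),))

    def _run(self, items: Iterable[Any]) -> List[Any]:
        for kind, f in self.ops:
            items = filter(f, items) if kind == "where" else map(f, items)
        return list(items)

    def to_list(self, partitions: int = 1) -> List[Any]:
        """Execute the recorded query, optionally split into contiguous partitions.

        Because the operators are side-effect free, running the same plan on
        each partition independently and concatenating gives the same answer."""
        data = list(self.source)
        size = max(1, -(-len(data) // partitions))  # ceiling division
        out: List[Any] = []
        for k in range(partitions):
            out.extend(self._run(data[k * size:(k + 1) * size]))
        return out

q = Query(range(10)).where(lambda x: x % 2 == 0).select(lambda x: x * x)
print(q.to_list())              # [0, 4, 16, 36, 64]
print(q.to_list(partitions=3))  # [0, 4, 16, 36, 64]
```

Nothing runs until `to_list` is called. At that point the runtime, not the program, chooses whether to execute sequentially or across partitions.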

24 DryadLINQ: Serialization for Dryad is provided by higher-level software layers such as DryadLINQ. DryadLINQ preserves the LINQ programming model and defines new operators and datatypes for data-parallel programming.

25 DryadLINQ Architecture

26 DryadLINQ Data Model [Figure: a partitioned table made of partitions, each holding .NET objects] The data model is a distributed implementation of LINQ collections. Each dataset is split into disjoint partitions distributed across the cluster. A partitioned table exposes metadata: type, partitioning, compression scheme, serialization, etc.
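The data model can be sketched as one logical collection stored as disjoint partitions plus a metadata dictionary. The class name, the round-robin placement, and the metadata keys are hypothetical; the sketch only shows that callers iterate over a partitioned table as if it were an ordinary collection.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Generic, Iterator, List, TypeVar

T = TypeVar("T")

@dataclass
class PartitionedTable(Generic[T]):
    """Hypothetical model: one logical collection stored as disjoint partitions."""
    partitions: List[List[T]]
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_records(cls, records: List[T], n: int, **metadata: Any) -> "PartitionedTable[T]":
        # Round-robin placement: each record lands in exactly one partition.
        parts: List[List[T]] = [records[i::n] for i in range(n)]
        return cls(parts, {"partition_count": n, "partitioning": "round-robin", **metadata})

    def __iter__(self) -> Iterator[T]:
        # Callers see a single collection, as with an ordinary LINQ collection.
        for part in self.partitions:
            yield from part

    def __len__(self) -> int:
        return sum(len(p) for p in self.partitions)

t = PartitionedTable.from_records(list(range(10)), 3, type="int", compression=None)
print(t.partitions)                # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(len(t), sum(t))              # 10 45
print(t.metadata["partitioning"])  # round-robin
```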

27 DryadLINQ Constructs: Expressions must be side-effect free. The programmer can specify annotations (hints) to guide optimization. Operators – Hash Partition – Range Partition – Apply: allows arbitrary streaming computations – Fork: takes a single input and generates multiple output datasets
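The following are rough single-process equivalents of these four operators. The Python names and signatures are hypothetical; DryadLINQ's own operators are C# methods whose exact signatures are not reproduced here.

```python
from bisect import bisect_right
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

def hash_partition(records: Iterable, key: Callable, n: int) -> List[list]:
    """Send each record to partition hash(key) % n.
    (Python randomizes str hashes per process; small ints hash to themselves.)"""
    parts: List[list] = [[] for _ in range(n)]
    for r in records:
        parts[hash(key(r)) % n].append(r)
    return parts

def range_partition(records: Iterable, key: Callable, boundaries: Sequence) -> List[list]:
    """Partition i holds keys with boundaries[i-1] <= key < boundaries[i]."""
    parts: List[list] = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect_right(boundaries, key(r))].append(r)
    return parts

def apply(partition: Iterable, fn: Callable[[Iterator], Iterator]) -> list:
    """Run an arbitrary streaming function over a whole partition."""
    return list(fn(iter(partition)))

def fork(records: Iterable, route: Callable[[object], int], outputs: int) -> Tuple[list, ...]:
    """One pass over the input, producing several output datasets."""
    outs: Tuple[list, ...] = tuple([] for _ in range(outputs))
    for r in records:
        outs[route(r)].append(r)
    return outs

def running_total(items: Iterator[int]) -> Iterator[int]:
    # Keeps state across records, which a per-record Select cannot express.
    total = 0
    for x in items:
        total += x
        yield total

print(hash_partition(range(6), key=lambda x: x, n=2))           # [[0, 2, 4], [1, 3, 5]]
print(range_partition([5, 10, 15, 25], lambda x: x, [10, 20]))  # [[5], [10, 15], [25]]
print(apply([1, 2, 3, 4], running_total))                       # [1, 3, 6, 10]
print(fork([1, 2, 3, 4, 5], route=lambda x: x % 2, outputs=2))  # ([2, 4], [1, 3, 5])
```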

28 System Implementation – Execution Plan Graph (EPG): DryadLINQ starts by converting the raw LINQ expressions into an EPG – DryadLINQ optimizations: static optimizations and dynamic optimizations – Code generation: uses dynamic code generation to automatically synthesize the LINQ code to be run at each Dryad vertex
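One static optimization the DryadLINQ paper describes is pipelining: consecutive operators that need no data exchange run together in a single vertex. The sketch below greedily cuts a linear plan into such stages. `Op`, `pipeline_stages`, and the choice of which operators need an exchange are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Op:
    name: str
    needs_exchange: bool = False  # True if the operator needs data from other partitions

def pipeline_stages(plan: List[Op]) -> List[List[str]]:
    """Fuse runs of operators into stages; each stage could run as one vertex."""
    stages: List[List[str]] = []
    for op in plan:
        if not stages or op.needs_exchange:
            stages.append([op.name])    # data must be repartitioned: start a new stage
        else:
            stages[-1].append(op.name)  # same partition: pipeline into the current stage
    return stages

plan = [Op("Where"), Op("Select"), Op("GroupBy", needs_exchange=True),
        Op("Select"), Op("OrderBy", needs_exchange=True)]
print(pipeline_stages(plan))
# [['Where', 'Select'], ['GroupBy', 'Select'], ['OrderBy']]
```

Each resulting stage is a unit for which code would be generated and run at a Dryad vertex. Five operators become three vertex stages.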

29 Conclusions – Goal: use a compute cluster as if it were a single computer – Dryad/DryadLINQ represent a significant step toward this goal – Reaching it requires close collaboration across many fields of computing, including – Distributed systems – Distributed and parallel databases – Programming language design and analysis

30 References – Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly, March 2007) – DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language (Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey, December 2008)

31 Thank you


Download ppt "Dryad and DryaLINQ. Dryad and DryadLINQ Dryad provides automatic distributed execution DryadLINQ provides automatic query plan generation Dryad provides."

Similar presentations


Ads by Google