Slide 1: Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly (Microsoft Research, Silicon Valley)
Presented by: Thomas Hummel
Slide 2: Agenda
- Introduction
- System Overview
- Dryad Graph
- Program Development
- Program Execution
- Experimental Results
- Future Work
Slide 3: Introduction
- Problem: how can efficient distributed programs be written easily?
- Environment: parallel processors, high-speed links, a single administered domain
- Goal: let developers ignore low-level issues
Slide 4: Introduction
- Parallel execution enables faster execution
- Parallelism can be specified automatically or manually
- Existing examples: GPU shaders, distributed databases, MapReduce
Slide 5: Introduction
- Graph model: vertices are programs, edges are communication links
- Forces a parallelism mindset on the developer
- A necessary abstraction
Slide 6: Introduction
- GPU shaders: low level, hardware specific
- MapReduce: simplicity is paramount, performance is sacrificed
- Databases: implicit communication, optimized through relational algebra
Slide 7: Introduction
- Dryad gives fine-grained control over communication and supports multiple input/output sets
- Developers must consider resources
- Execution engine runs a DAG of programs: outputs are directed to inputs, no recursion
Slide 8: System Overview
- A Dryad job is a DAG: each vertex is a program, data is passed along the edges
- Message (channel item) structure is user defined
- Channels: shared memory, TCP, or files
Slide 10: System Overview
- System organization: job manager, name server, daemons (worker nodes)
Slide 11: Dryad Graph
- Graph description language is "embedded" in C++
- Graphs are built by combining sub-graphs
- Each vertex program inherits from a C++ base class, supplying a program name and a program factory
Slide 12: Dryad Graph
- Vertex creation: a vertex program inherits from a C++ base class and registers a program name and a program factory
- A single vertex is itself a graph
- The factory is called to create each vertex; program-specific arguments are then applied (sketch below)
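A minimal sketch of what vertex creation could look like in practice: a vertex program exposes a name and a factory, the factory is called once per vertex, and program-specific arguments are applied afterwards. The class and function names here (GrepVertex, Factory, AddArgument, MakeVertexSet) are hypothetical stand-ins, not the actual Dryad API.

```cpp
// Hypothetical sketch of vertex creation in a Dryad-style graph builder.
#include <memory>
#include <string>
#include <vector>

// A vertex program: a sequential piece of code with a name and a factory.
class GrepVertex /* : public VertexProgramBase (illustrative) */ {
public:
    static const char* Name() { return "GrepVertex"; }
    static std::unique_ptr<GrepVertex> Factory() {
        return std::make_unique<GrepVertex>();
    }
    void AddArgument(const std::string& arg) { args_.push_back(arg); }
private:
    std::vector<std::string> args_;
};

// Build a set of identical vertices; each vertex is itself a one-node graph.
std::vector<std::unique_ptr<GrepVertex>> MakeVertexSet(int n,
                                                       const std::string& pattern) {
    std::vector<std::unique_ptr<GrepVertex>> vertices;
    for (int i = 0; i < n; ++i) {
        auto v = GrepVertex::Factory();  // factory called once per vertex
        v->AddArgument(pattern);         // program-specific argument applied
        vertices.push_back(std::move(v));
    }
    return vertices;
}
```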
Slide 13: Dryad Graph
- Edge creation: a composition (combine) operation over two graphs
- Varying assignment methods determine how outputs connect to inputs (sketch below)
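Two common assignment methods when composing graphs are pointwise (1:1) connection and complete bipartite connection. The sketch below illustrates both with hypothetical helper functions; it is not the Dryad graph-builder API.

```cpp
// Hypothetical sketch: two assignment methods for connecting the outputs of
// one sub-graph to the inputs of another. The types are illustrative only.
#include <cassert>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (output index of A, input index of B)

// Pointwise assignment: output i of A connects to input i of B.
std::vector<Edge> PointwiseConnect(int aOutputs, int bInputs) {
    assert(aOutputs == bInputs);
    std::vector<Edge> edges;
    for (int i = 0; i < aOutputs; ++i) edges.push_back({i, i});
    return edges;
}

// Complete bipartite assignment: every output of A connects to every input of B.
std::vector<Edge> BipartiteConnect(int aOutputs, int bInputs) {
    std::vector<Edge> edges;
    for (int i = 0; i < aOutputs; ++i)
        for (int j = 0; j < bInputs; ++j) edges.push_back({i, j});
    return edges;
}
```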
Slide 14: Dryad Graph (figure only)
Slide 15: Dryad Graph
- Communication channels: file I/O by default; TCP and shared memory also available
- Pitfall: shared-memory channels require the connected vertices to run in the same process
- Deadlock avoidance comes from the DAG architecture
Slide 16: Program Development
- Vertex programs are developed from C++ base classes
- Status and errors are reported to the job manager
- A standard "Main" method receives channel readers/writers via its argument list (sketch below)
- Legacy programs can be used through a C++ wrapper
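A minimal sketch of a vertex program built around a standard "Main" method that receives channel readers and writers through its argument lists and returns a status for the job manager. ChannelReader, ChannelWriter, VertexStatus, and UpperCaseVertex are illustrative stand-ins, not the real Dryad base classes.

```cpp
// Hypothetical sketch of a vertex program with a standard "Main" method.
#include <cctype>
#include <optional>
#include <string>
#include <vector>

struct ChannelReader {
    std::vector<std::string> items;  // stand-in for a real input channel
    size_t next = 0;
    std::optional<std::string> ReadItem() {
        if (next >= items.size()) return std::nullopt;
        return items[next++];
    }
};

struct ChannelWriter {
    std::vector<std::string> items;  // stand-in for a real output channel
    void WriteItem(const std::string& item) { items.push_back(item); }
};

enum class VertexStatus { Ok, Failed };

class UpperCaseVertex /* : public DryadVertexBase (illustrative) */ {
public:
    // Readers and writers are supplied by the runtime via the argument lists.
    VertexStatus Main(std::vector<ChannelReader*>& inputs,
                      std::vector<ChannelWriter*>& outputs) {
        for (ChannelReader* in : inputs) {
            while (auto item = in->ReadItem()) {  // read until the channel ends
                std::string upper = *item;
                for (char& c : upper) c = std::toupper(static_cast<unsigned char>(c));
                for (ChannelWriter* out : outputs) out->WriteItem(upper);
            }
        }
        return VertexStatus::Ok;  // status is reported back to the job manager
    }
};
```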
Slide 17: Program Development
- Pipelined execution, while assuming sequential vertex code
- Event-based programming: channels are asynchronous
- A thread pool is optimized for the vertices
Slide 18: Program Execution
- Job manager: the job ends if the JM machine fails; different schemes could avoid this
- A versioning system tracks execution instances
- A vertex starts executing once all of its input channels are ready
- The user can specify the execution machine
- Vertices can be re-run on failures; the job ends after all vertices have run (sketch below)
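A minimal sketch of the scheduling rule described on this slide: a vertex becomes runnable once all of its input channels are ready, a failed vertex is retried, and the job finishes when every vertex has run. The Vertex type and RunOnWorker function are hypothetical; the real job manager also tracks versions per execution instance and bounds retries.

```cpp
// Sketch of the job manager's scheduling rule, with illustrative types.
#include <vector>

struct Vertex {
    std::vector<int> inputs;  // indices of upstream vertices
    int version = 0;          // incremented on each execution attempt
    bool completed = false;
};

bool RunOnWorker(Vertex& v) { ++v.version; return true; }  // stand-in for dispatch

void RunJob(std::vector<Vertex>& dag) {
    bool progress = true;
    while (progress) {
        progress = false;
        for (Vertex& v : dag) {
            if (v.completed) continue;
            bool inputsReady = true;
            for (int u : v.inputs) inputsReady = inputsReady && dag[u].completed;
            if (!inputsReady) continue;   // start only when all input channels are ready
            if (RunOnWorker(v)) {
                v.completed = true;       // success: downstream vertices may now start
            }
            progress = true;              // a failed vertex is retried on the next pass
                                          // (a real scheduler would bound retries)
        }
    }
    // The job ends only once every vertex has completed.
}
```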
Slide 19: Program Execution
- Fault tolerance: a failed vertex is re-run, and its channels (files) are re-created
- A failure on a TCP or shared-memory channel causes all connected vertices to fail
- Staged execution allows intermediate error checking (sketch below)
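A minimal sketch of the failure-handling policy described here, under the assumption of illustrative types: re-running a failed vertex re-creates its output files, while a failure on a TCP or shared-memory channel forces the vertices on both ends of the channel to be re-run.

```cpp
// Sketch of failure handling for the different channel kinds (illustrative types).
#include <vector>

enum class ChannelKind { File, Tcp, SharedMemory };

struct Channel { ChannelKind kind; int producer; int consumer; };

struct FailurePlan {
    std::vector<int> verticesToRerun;
    std::vector<int> filesToRecreate;  // file channels rewritten when the producer re-runs
};

FailurePlan OnVertexFailure(int failedVertex, const std::vector<Channel>& channels) {
    FailurePlan plan;
    plan.verticesToRerun.push_back(failedVertex);
    for (size_t i = 0; i < channels.size(); ++i) {
        const Channel& c = channels[i];
        if (c.producer != failedVertex && c.consumer != failedVertex) continue;
        if (c.kind == ChannelKind::File) {
            if (c.producer == failedVertex)
                plan.filesToRecreate.push_back(static_cast<int>(i));
        } else {
            // TCP / shared-memory data cannot be replayed: the vertex on the
            // other end of the channel must be re-run as well.
            int other = (c.producer == failedVertex) ? c.consumer : c.producer;
            plan.verticesToRerun.push_back(other);
        }
    }
    return plan;
}
```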
Slide 20: Experimental Results
- SQL operation: 10-computer cluster with gigabit connections
- Data mining operation: 1,800-computer cluster, 10 TB data set, 11-minute execution time
Slide 21: Future Work
- Nebula scripting language: an additional layer of abstraction
- SSIS (SQL Server Integration Services) integration
- Distributed SQL queries with a query optimizer