NReduce: A Distributed Virtual Machine for Parallel Graph Reduction

Peter Kelly, Paul Coddington, Andrew Wendelborn
Distributed and High Performance Computing Group
School of Computer Science, The University of Adelaide

Introduction

Distributed computing middleware
– Utilises a set of machines to execute a collection of jobs
– Can be based on desktops, a cluster, or the Internet

Job organisation is usually either
– Independent jobs (task farming)
– A dependency graph (workflow)

Advantages of workflow languages
– Implicit parallelism
– High-level programming model that hides execution details
– Easy access to many different services

Disadvantages of workflow languages
– Limited feature set: poor support for control structures, data manipulation, and other forms of computation
– As a result, workflow developers sometimes have to create additional services or components in other languages to do relatively simple things (e.g. shims for data conversion)

Workflow execution

Related work

Job scheduling (native programs)
– Condor, Xgrid, Sun Grid Engine, Chimera
Java/.NET-based
– OptimalGrid, Alchemi, PAGIS
Workflow/web service composition
– Triana, Taverna, Kepler, BPEL, WSIPL
Parallel functional programming
– (v,G)-machine, GUM, GHC+SMP
Message passing libraries & languages
– MPI, PVM, Erlang

Problem statement & approach Want to place more of the application’s logic in the workflow, rather than other services Useful for several types of programming constructs –Data structure access + manipulation –Functional operators: map, filter, reduction –Complex application logic Solution - NReduce, our distributed virtual machine –Implements a simple, Turing complete language for specifying workflows and computation –Based on existing techniques from the parallel functional programming community

Execution model

Workflows often use data flow graphs to specify dependencies. NReduce uses graph reduction, a similar model and a well-known technique for implementing functional languages
– Also based on data dependencies
– Supports higher-order functions, lazy evaluation, and parallelism
– Can be efficiently compiled to native code using existing techniques

Reduction proceeds step by step, each redex being rewritten with its result:
f(g(x),h(y)) → f(r1,h(y)) → f(r1,r2) → r3
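The sequence above can be sketched in Haskell, whose runtime also evaluates programs by graph reduction; f, g and h here are arbitrary arithmetic stand-ins, and Debug.Trace shows each redex being rewritten exactly once.

    import Debug.Trace (trace)

    g, h :: Int -> Int
    g x = trace "reduce: g x => r1" (x + 1)
    h y = trace "reduce: h y => r2" (y * 2)

    f :: Int -> Int -> Int
    f r1 r2 = trace "reduce: f r1 r2 => r3" (r1 + r2)

    main :: IO ()
    main = print (f (g 10) (h 20))  -- builds the graph f (g x) (h y), then reduces it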

Distributed graph reduction

The same reduction with the graph split across machines: x and y are evaluated remotely and linked back into the graph via remote references
x=A(...)  y=B(...)  f(g(x),h(y))
→ x=r1  y=r2  f(g(x),h(y))
→ f(r3,r4)
→ r5

Nodes and tasks

The distributed virtual machine consists of multiple nodes
– Organised as a P2P network, with no central point of control
– Nodes may be running different operating systems

Each node contains multiple threads
– Task threads (which perform graph reduction)
– An I/O thread, garbage collector, etc.

Threads communicate using a message passing layer (sketched below)
– Provides asynchronous, one-way messaging
– Similar to MPI, but supports socket connections and dynamic node joins/departures

A process is a group of cooperating task threads
– A distributed heap is maintained across the tasks
– Each task reduces its own graph segment
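A minimal sketch of the asynchronous, one-way messaging style described above, using an in-process channel in place of the VM's socket-based layer; the Message type and task ids are hypothetical.

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

    data Message = WorkRequest Int   -- requesting task's id
                 | Shutdown

    taskThread :: Int -> Chan Message -> IO ()
    taskThread me inbox = do
      msg <- readChan inbox          -- block until a message arrives
      case msg of
        WorkRequest from -> do
          putStrLn ("task " ++ show me ++ ": work request from task " ++ show from)
          taskThread me inbox
        Shutdown -> return ()

    main :: IO ()
    main = do
      inbox <- newChan
      _ <- forkIO (taskThread 1 inbox)
      writeChan inbox (WorkRequest 2)  -- one-way send: the sender never waits
      writeChan inbox Shutdown
      threadDelay 100000               -- give the task thread time to drain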

Nodes and tasks

Input language - ELC

A very simple functional programming language: an extended version of the lambda calculus, with
– Arithmetic and conditional operations
– Cons lists and strings
– Letrec expressions and top-level functions

Other functionality
– Files & network connections, exposed as lists
– Efficient list storage, based on arrays

Intended as an intermediate language to be targeted by other compilers, e.g.
– XPath/XSLT/XQuery (currently being investigated)
– Potentially other workflow languages
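The slides do not show ELC's concrete syntax, so the fragment below uses Haskell to sketch the advertised feature set: top-level functions, a letrec-style local binding, cons lists, and arithmetic/conditional operations.

    countEvens :: [Int] -> Int
    countEvens xs = go (filter even xs)
      where                        -- plays the role of an ELC letrec
        go []     = 0
        go (_:ys) = 1 + go ys

    main :: IO ()
    main = print (countEvens [1 .. 10])  -- prints 5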

Execution

Abstract instruction set based on the (v,G)-machine (Augustsson & Johnsson)
– Hides parallelism and distribution from the programmer (a toy illustration follows below)

Bytecode interpreter
– Similar to a traditional interpreter, but based on a call graph instead of a call stack (for concurrency)

Native code engine
– JIT compilation on process startup; targets x86
– Significantly faster than the interpreter (though not as fast as C)

Runtime support
– Distributed garbage collector
– Message handler
– Built-in functions
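The toy reducer below illustrates graph reduction with updates: once a node is reduced, it is overwritten with its value so that shared subgraphs are computed only once. It shows the idea only; it is not the (v,G)-machine instruction set the slide refers to.

    import Data.IORef

    data Node = Val Int
              | Add (IORef Node) (IORef Node)  -- an application of '+'

    eval :: IORef Node -> IO Int
    eval ref = do
      node <- readIORef ref
      case node of
        Val v   -> return v
        Add a b -> do
          v <- (+) <$> eval a <*> eval b
          writeIORef ref (Val v)   -- the update step: share the result
          return v

    main :: IO ()
    main = do
      a    <- newIORef (Val 2)
      b    <- newIORef (Val 3)
      root <- newIORef (Add a b)
      print =<< eval root          -- prints 5; root now holds Val 5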

Frame management

Frame = function call
– Like a stack frame or function activation record
– Organised as a graph; more than one may be active at a time
– Frames are scheduled like processes in an OS

Each frame is in one of four states (modelled in the sketch below)
– Running: a candidate for execution by the processor; scheduled non-preemptively
– Blocked: waiting on an I/O request or the result of another call
– New: a function call that has not yet begun evaluation (a "suspended evaluation" or "future")
– Sparked: a function call that has not yet started, but is known to be needed in future; may be migrated to other machines for load balancing purposes

Advantage: when one function call blocks, others may continue; there is no need to manually create threads
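A sketch of the four frame states as a plain data type, with the transition taken when a blocked frame's dependency completes; the state names follow the slide, while resume is a hypothetical helper.

    data FrameState
      = Running   -- candidate for execution; scheduled non-preemptively
      | Blocked   -- waiting on I/O or the result of another call
      | New       -- suspended evaluation ("future"), not yet demanded
      | Sparked   -- known to be needed; may migrate for load balancing
      deriving (Eq, Show)

    -- When the value a frame was waiting on arrives, it becomes runnable.
    resume :: FrameState -> FrameState
    resume Blocked = Running
    resume s       = s

    main :: IO ()
    main = print (map resume [Running, Blocked, New, Sparked])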

Work distribution

Dynamic load distribution
– Much simpler and more practical than static scheduling
– Caters for nodes with different processor speeds and other running tasks
– Based on the approach of GUM (Trinder et al.)

Work assigned to each task is based on the set of sparked frames available in the process. Idle tasks send out work requests, asking to be given frames that they can start executing (sketched below)
– When a request reaches a task with sparked frames, some of those frames are migrated to the requesting task
– Graph pointers are updated with remote references
– The idle task begins executing the new frames
– Further work requests are postponed until the task is idle again
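A sketch of how a work request might be served: the task holding sparks splits off part of its spark pool for the requester. Plain lists stand in for the real spark pool, and the splitting policy (half) is an assumption, not taken from the slides.

    type Frame = String   -- placeholder for a sparked function call

    -- Returns (frames migrated to the requester, frames kept locally).
    handleWorkRequest :: [Frame] -> ([Frame], [Frame])
    handleWorkRequest sparks = splitAt (length sparks `div` 2) sparks

    main :: IO ()
    main = print (handleWorkRequest ["g x", "h y", "k z", "m w"])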

Work distribution

Parallelism

NReduce uses strictness analysis to automatically detect parallelism (see the sketch below)
– Many parallel functional languages instead rely on manual annotations, to avoid the costs of excessive parallelism and sparking
– We trade off a small amount of performance to gain automatic parallelism; this is acceptable for workflow languages, which delegate most compute-intensive work to services
– Automatic detection is necessary for higher-level languages with no explicit support for parallelism (e.g. XSLT)

Manual annotations are also supported for certain cases

Sparking is heavily optimised
– One field assignment on spark, one field check on eval

The side-effect-free programming model means no explicit synchronisation primitives are needed
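For contrast, the manual style the first bullet refers to: in GHC, sparks are introduced explicitly with par/pseq (from the 'parallel' package, compiled with -threaded). NReduce's strictness analysis inserts the equivalent sparks automatically wherever both arguments are known to be needed.

    import Control.Parallel (par, pseq)

    -- (+) needs both arguments, so sparking the first is always safe;
    -- a strictness analyser can discover this without annotations.
    parAdd :: Int -> Int -> Int
    parAdd x y = x `par` (y `pseq` (x + y))

    main :: IO ()
    main = print (parAdd (sum [1 .. 2000000]) (sum [1 .. 3000000]))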

Streams

Processes may establish TCP connections
– Useful for accessing external services, e.g. web services

Connections are exposed as data (sketched below)
– Input stream: a list of bytes received from the other machine
– Output stream: a list of bytes (generated by the program) sent to the other machine

Parallel execution simplifies the handling of multiple connections
– When multiple function calls are active, each may block and unblock independently
– Blocking reads or writes on a connection affect only the function call that uses it, not the entire process
– This is important for workflows that invoke multiple service operations in parallel
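A sketch of "connection exposed as data", using Haskell's lazy hGetContents over a socket: the consumer walks an ordinary list of characters while the runtime performs the blocking reads. This assumes the 'network' package; the host and request are arbitrary examples, not part of NReduce.

    import Network.Socket
    import System.IO

    main :: IO ()
    main = do
      addr:_ <- getAddrInfo Nothing (Just "example.org") (Just "80")
      sock   <- socket (addrFamily addr) Stream defaultProtocol
      connect sock (addrAddress addr)
      h <- socketToHandle sock ReadWriteMode
      hPutStr h "GET / HTTP/1.0\r\nHost: example.org\r\n\r\n"   -- output stream
      hFlush h
      reply <- hGetContents h                 -- input stream as a lazy list
      putStrLn (takeWhile (/= '\r') reply)    -- consume only the status line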

Performance: Sequential

[Chart: sequential runtimes on the nfib 36 and mandelbrot benchmarks for nreduce (native), nreduce (interpreted), C, Java, Python and Perl, with annotations 8.5×C and 2.4×C.]

Performance: Parallel

Current & future work Performance evaluation –Currently testing the VM in a range of scenarios –Comparison with other functional & workflow languages –Use in context of XSLT + web service composition Optimisations –Garbage collection –Work distribution Fault tolerance –Handle node failure within a process by recomputing lost portions of the graph where possible Background utilisation –Suspend node & migrate work when running on workstation and user becomes active

Conclusion

Our goal is to support complex workflow applications and parallel computation.

NReduce is a distributed virtual machine which
– Implements a functional programming model
– Supports transparent parallelism & distribution
– Enables concurrent access to external services
– Is based on a P2P model
– Runs across multiple platforms

A different approach to workflow engine construction
– More like a traditional programming language implementation, but using ideas from distributed computing
– Provides a powerful and flexible mechanism for writing distributed applications