IPDPS 2009
Dynamic High-level Scripting in Parallel Applications
Filippo Gioachin, Laxmikant V. Kalé
Department of Computer Science, University of Illinois at Urbana-Champaign

Outline
● Overview
  – Motivations
  – Charm++ RTS
● Scripting interface
  – Execution flow
  – Cross communication
● Case studies
  – CharmDebug (parallel debugger)
  – Salsa (particle analysis tool)
● Future work

Motivations
● Need for extra functionality at runtime
  – Steering the computation
  – Analyzing data
  – Adding correctness checks while debugging
● Long-running applications
  – Recompiling the code is time consuming (when the source is available at all)
  – Waiting for the application to re-execute is costly
● Useful to upload scripts that perform procedures not foreseen when the application was written

Execution flow
[Diagram: message exchange between client and server. After registration on the server, the client sends Execute(script) and receives an interpreter ID; Print(ID) returns the script's output (e.g. "Hello World"); IsFinished?(ID) returns Yes/No.]
Execute can:
● Create a new Python interpreter
● Wait for termination of the script before returning the ID
(a client-side sketch of this exchange follows below)
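To make the exchange concrete, here is a minimal client-side sketch in Python of the Execute / IsFinished / Print cycle shown in the diagram. The ccs_request(tag, payload) helper is hypothetical and stands in for the actual CCS transport (the real client classes appear in the Java snippet later); only the message sequence comes from the slide.

    import time

    def run_script(ccs_request, script_text):
        # Submit the script; the server replies with an interpreter ID.
        interpreter_id = ccs_request("Execute", script_text)
        # Poll until the server reports that the script has terminated.
        while ccs_request("IsFinished?", interpreter_id) != "Yes":
            time.sleep(0.1)
        # Retrieve whatever the script printed (e.g. "Hello World").
        return ccs_request("Print", interpreter_id)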

Charm++ Overview
● Middleware written in C++
● User decomposes work among objects (chares)
● System maps chares to processors
  – automatic load balancing
  – communication optimizations
● Chares communicate through asynchronous messages
[Figure: user view (chares exchanging messages) vs. system view (chares mapped onto processors)]

Charm++ RTS
[Figure: Python modules embedded in the Charm++ runtime system on top of the Converse Client-Server (CCS) layer, reachable by an external client.]

Interface overhead
● Single Python interpreter
  – Creation of one Python interpreter: 40~50 ms
  – Other overhead: 1~2 ms
  – Independent of the number of processors
● Multiple Python interpreters
[Plot: overhead measured on Abe, an NCSA Linux cluster (dual-socket quad-core Intel Clovertown, 2.33 GHz)]

Registration on the server
● In the code: the Charm Interface (.ci) file declares a chare collection type called "MyArray", indexable by a 1-dimensional index:

    module MyModule {
      array [1D] [python] MyArray {
        entry MyArray();
        .....
      }
    }

● At runtime, create a new chare collection of type MyArray; CCS requests with tag "pyCode" are then treated as Python requests bound to the just-created collection:

    arrayProxy = CProxy_MyArray::ckNew(elem);
    arrayProxy.registerPython("pyCode");

Usage on the client (Java snippet from CharmDebug)

    PythonExecute code = new PythonExecute(input.getText(), input.getMethod(),
            new PythonIteratorGroup(input.getChareGroup()), false, true, 0);
    code.setKeepPrint(true);
    code.setWait(true);
    code.setPersistent(true);
    if (interpreterHandle > 0) code.setInterpreter(interpreterHandle);
    byte[] reply = server.sendCcsRequestBytes("CpdPythonGroup", code.pack(), 0, true);
    if (reply.length == 0) {
      System.out.println("The python module was not linked in the application");
      return;
    }
    interpreterHandle = CcsServer.readInt(reply, 0);

    PythonFinished finished = new PythonFinished(interpreterHandle, true);
    byte[] finishReply = server.sendCcsRequestBytes("CpdPythonGroup", finished.pack(), 0, true);

    PythonPrint print = new PythonPrint(interpreterHandle, false);
    byte[] output = server.sendCcsRequestBytes("CpdPythonGroup", print.pack(), 0, true);
    System.out.println("Python printed: " + new String(output));

Communication (1): low-level
● Always available to Python scripts
  – ck.mype()
  – ck.numpes()
  – ck.myindex()
● Additionally implementable in the server code
  – Python side: ck.read(what), ck.write(what, where)
  – C++ (server code) side:
        PyObject* read(PyObject*);
        void write(PyObject*, PyObject*);
(an example script using these calls is sketched below)
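As an illustration of the low-level interface, the following is a minimal sketch of a script that could be uploaded to a running application. The ck.* calls are the ones listed on the slide; the ("counter", 0) key and the doubling operation are hypothetical and depend on what the application's read()/write() callbacks actually expose.

    # Runs inside the application's embedded interpreter, where the runtime
    # provides the ck module. The "counter" key is a hypothetical name: its
    # meaning depends on the read()/write() callbacks implemented in the
    # server code.
    if ck.mype() == 0:
        print("script running on %d processors" % ck.numpes())
    value = ck.read(("counter", 0))        # query a value from the application
    ck.write(("counter", 0), value * 2)    # write a modified value back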

Communication (2): high-level
● Allow other functions to be called by Python
  – Accessible through the "charm" module
● These functions can suspend and perform parallel operations

  Method definition (.ci):

    module MyModule {
      array [1D] [python] MyArray {
        entry MyArray();
        entry [python] void run(int);
        entry [python] void jump(int);
        .....
      }
    }

  Method implementation (.C):

    void run(int handle) {
      PyObject *args = pythonGetArg(handle);
      /* use args with Python/C API */
      thisProxy.performAction(...parameters..., CkCallbackPython(msg));
      int *value = (int*)msg->getData();
      pythonReturn(handle, Py_BuildValue("i", *value));
    }

(a Python-side sketch follows below)
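On the Python side, a script might drive such an entry method as sketched here. This is an assumption-laden sketch: the slide only states that [python] entry methods are reachable through the "charm" module, so the exact charm.run(...) calling convention, the argument value, and the use of the return value are illustrative rather than the definitive API.

    # Runs inside the embedded interpreter. It is assumed that the [python]
    # entry method "run" declared above is exposed to the script through the
    # "charm" module; the argument (10) is illustrative only.
    result = charm.run(10)   # suspends until the parallel operation completes
    print("run() returned: %s" % str(result))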

High-level call overhead
[Plot: roughly 10 μs/call, measured on Abe using 1 processor (to avoid interference)]
● This timing depends on the Python/C API

Communication (3): iterative
● Example: double the mass of all particles with high velocity
● Server-side hooks:
  – int buildIterator(PyObject*& data, void* iter);
  – int nextIteratorUpdate(PyObject*& data, PyObject* result, void* iter);

  Using low-level communication:

    size = ck.read(("numparticles", 0))
    for i in range(0, size):
        vel = ck.read(("velocity", i))
        mass = ck.read(("mass", i))
        mass = mass * 2
        if (vel > 1):
            ck.write(("mass", i), mass)

  Using iterative communication:

    def increase(p):
        if (p.velocity > 1):
            p.mass = p.mass * 2

Iterative interface overhead
● Each element performs work for 10 μs
● The overhead to iterate over the elements is zero

CharmDebug: overview
[Diagram: the CharmDebug Java GUI on the local machine connects through a firewall, via CCS (Converse Client-Server), to the parallel application on the remote machine, where the CharmDebug runtime component and GDB sit alongside the application.]

CharmDebug: introspection

Salsa: cosmological data analysis
● Write your own piece of Python script (a sketch of such a script follows below), or
● Use the graphical tool, which internally issues Python scripts to perform the task
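For illustration, here is a minimal sketch of the kind of hand-written analysis script a Salsa user might upload: it computes the average mass of fast-moving particles using the low-level ck.read interface. The field names ("numparticles", "velocity", "mass") are borrowed from the earlier iterative-communication example and may not match Salsa's actual data model.

    # Compute the average mass of fast-moving particles on the local chare.
    # Field names are borrowed from the earlier example and are assumptions.
    total = 0.0
    count = 0
    size = ck.read(("numparticles", 0))
    for i in range(0, size):
        if ck.read(("velocity", i)) > 1:
            total += ck.read(("mass", i))
            count += 1
    if count > 0:
        print("average mass of fast particles: %f" % (total / count))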

5-point 2D Jacobi
● Matrix size: size × size
● Running on 32 processors
● Python interface used through CharmDebug
● Execution time reported in ms

Conclusions
● Interface to dynamically upload Python scripts into a running parallel application
● The scripts can interact with the application
  – Low-level
  – High-level (initiate parallel operations)
  – Iterative mode
● The overhead of the interface is minimal
  – Negligible for human interactivity (Salsa)
  – Low enough to allow non-human interactivity (CharmDebug)

Future work
● Handling errors more effectively
  – Currently mostly left to the programmer
● Define a good atomicity of operations
  – Checkpoint/rollback for automatic recovery
● Extend to MPI and other languages built on top of Charm++
  – When any MPI routine is called
  – When a specific routine is called (e.g. MPI_Python)
● Export into the MPI standard
  – Add the capability for MPI to receive external messages
  – Execute Python scripts upon message reception

Questions?
Thank you