IPDPS 2009 Dynamic High-level Scripting in Parallel Applications Filippo Gioachin Laxmikant V. Kalé Department of Computer Science University of Illinois at Urbana-Champaign
IPDPS Filippo Gioachin - UIUC Outline ● Overview – Motivations – Charm++ RTS ● Scripting interface – Execution flow – Cross communication ● Case Studies – CharmDebug (parallel debugger) – Salsa (particle analysis tool) ● Future work
Motivations ● Need for extra functionality at runtime – Steering the computation – Analyzing data – Adding correctness checks while debugging ● Long-running applications – Time-consuming to recompile the code (if the source is available at all) – Need to wait for the application to re-execute ● Useful to upload scripts that perform procedures not foreseen at compile time
Execution flow (sequence diagram: Client and Server, after Registration)
● Client sends Execute; the server returns an ID
● Client sends Print(ID); the server returns “Hello World”
● Client sends IsFinished?(ID); the server returns Yes/No
● Execute can: – Create a new Python interpreter – Wait for termination of the script before returning the ID
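The request/reply flow above can be sketched in plain Python. Everything below (the InterpreterServer class and the emit() hook) is a hypothetical in-process stand-in for the real CCS-based server, shown only to make the Execute / Print(ID) / IsFinished(ID) handshake concrete.

```python
# Hypothetical, in-process sketch of the Execute / Print / IsFinished flow.
# InterpreterServer and emit() are illustrative stand-ins, not Charm++ APIs.
class InterpreterServer:
    def __init__(self):
        self._next_id = 1      # next interpreter handle (ID) to hand out
        self._output = {}      # buffered script output, keyed by ID
        self._done = {}        # completion flag, keyed by ID

    def execute(self, script):
        """Run a script, buffer what it prints via emit(), return its ID."""
        handle = self._next_id
        self._next_id += 1
        buffer = []
        exec(script, {"emit": buffer.append})   # toy "interpreter"
        self._output[handle] = "".join(buffer)
        self._done[handle] = True
        return handle

    def print_output(self, handle):
        """Print(ID): return and clear the buffered output."""
        return self._output.pop(handle, "")

    def is_finished(self, handle):
        """IsFinished?(ID): Yes/No."""
        return self._done.get(handle, False)

server = InterpreterServer()
handle = server.execute('emit("Hello World")')
```

In the real system, Execute can also create a fresh interpreter, or block until the script terminates before returning the ID.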
Charm++ Overview ● Middleware written in C++ ● The user decomposes work among objects (chares) ● The system maps chares to processors – automatic load balancing – communication optimizations ● Chares communicate through asynchronous messages (figure: system view vs. user view of the chare decomposition)
Charm++ RTS (figure: Python modules hosted inside the Charm++ runtime system; an external client connects to them through the Converse client-server (CCS) layer)
Interface overhead ● Single Python interpreter – Creation of one Python interpreter: 40~50 ms – Other overhead: 1~2 ms – Independent of the number of processors ● Multiple Python interpreters (measured on Abe: NCSA Linux cluster, dual-socket quad-core Intel Clovertown, 2.33 GHz)
Registration on the server
● In the code: a Charm Interface (.ci) file declares a chare collection type called “MyArray”, indexable by a 1-dimensional index:

module MyModule {
    array [1D] [python] MyArray {
        entry MyArray();
        .....
    }
}

● At runtime: create a new chare collection of type MyArray; CCS requests with tag “pyCode” are then treated as Python requests bound to the just-created collection:

arrayProxy = CProxy_MyArray::ckNew(elem);
arrayProxy.registerPython("pyCode");
Usage on the client (Java snippet from CharmDebug):

PythonExecute code = new PythonExecute(input.getText(), input.getMethod(),
        new PythonIteratorGroup(input.getChareGroup()), false, true, 0);
code.setKeepPrint(true);
code.setWait(true);
code.setPersistent(true);
if (interpreterHandle > 0) code.setInterpreter(interpreterHandle);
byte[] reply = server.sendCcsRequestBytes("CpdPythonGroup", code.pack(), 0, true);
if (reply.length == 0) {
    System.out.println("The python module was not linked in the application");
    return;
}
interpreterHandle = CcsServer.readInt(reply, 0);
PythonFinished finished = new PythonFinished(interpreterHandle, true);
byte[] finishReply = server.sendCcsRequestBytes("CpdPythonGroup", finished.pack(), 0, true);
PythonPrint print = new PythonPrint(interpreterHandle, false);
byte[] output = server.sendCcsRequestBytes("CpdPythonGroup", print.pack(), 0, true);
System.out.println("Python printed: " + new String(output));
Communication (1): low-level ● Always available to Python scripts: – ck.mype() – ck.numpes() – ck.myindex() ● Additionally implementable in the server code: – ck.read(what) – ck.write(what, where)

C++ (server code):
PyObject* read(PyObject*);
void write(PyObject*, PyObject*);
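To make the Python-side calling convention concrete, here is a toy, dictionary-backed stand-in for the ck object an uploaded script sees. The Ck class and its store are illustrative assumptions; the real read/write are supplied by the application's C++ server code shown above.

```python
# Toy stand-in for the "ck" object exposed to uploaded scripts.
# The real ck.read/ck.write are implemented in the application's C++ server
# code; this dictionary-backed version only mimics the Python-side interface.
class Ck:
    def __init__(self, store, mype=0, numpes=1):
        self._store = store        # pretend application data
        self._mype = mype
        self._numpes = numpes

    def mype(self):
        return self._mype          # rank of this processor

    def numpes(self):
        return self._numpes        # total number of processors

    def read(self, what):
        return self._store[what]   # e.g. what = ("mass", i)

    def write(self, what, value):
        self._store[what] = value

ck = Ck({("mass", 0): 1.5})
ck.write(("mass", 0), ck.read(("mass", 0)) * 2)   # double the stored mass
```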
Communication (2): high-level ● Allow other functions to be called by Python – Accessible through the “charm” module ● These functions can suspend and perform parallel operations

Method definition (.ci):
module MyModule {
    array [1D] [python] MyArray {
        entry MyArray();
        entry [python] void run(int);
        entry [python] void jump(int);
        .....
    }
}

Method implementation (.C):
void run(int handle) {
    PyObject *args = pythonGetArg(handle);
    /* use args with Python/C API */
    thisProxy.performAction(...parameters..., CkCallbackPython(msg));
    int *value = (int*)msg->getData();
    pythonReturn(handle, Py_BuildValue("i", *value));
}
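A minimal sketch of what this path gives the script: calling run() through the charm module suspends until the parallel operation delivers its result. The CharmModule class and its chunk list are illustrative assumptions; in the real system, run() is the [python] entry method above, which broadcasts performAction() and suspends on a CkCallbackPython. Here the "reduction" is computed directly and synchronously.

```python
# Illustrative sketch only: run() stands in for a [python] entry method that
# would broadcast performAction() across the chares and suspend until the
# result arrives. Here the fake parallel reduction is computed in-line.
class CharmModule:
    def __init__(self, chunks):
        self._chunks = chunks            # stand-in for data spread over chares

    def run(self, threshold):
        # Fake parallel reduction: sum all chunk values >= threshold.
        return sum(c for c in self._chunks if c >= threshold)

charm = CharmModule([1, 2, 3, 4])
total = charm.run(3)                     # the script would block here
```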
High-level call overhead ● Approximately 10 μs per call ● This timing depends on the Python/C API ● Measured on Abe, 1 processor (to avoid interference)
Communication (3): iterative ● Example: double the mass of all particles with high velocity

Using low-level communication:
size = ck.read(("numparticles", 0))
for i in range(0, size):
    vel = ck.read(("velocity", i))
    mass = ck.read(("mass", i))
    mass = mass * 2
    if vel > 1:
        ck.write(("mass", i), mass)

Using iterative communication:
def increase(p):
    if p.velocity > 1:
        p.mass = p.mass * 2

Server-side iterator hooks:
int buildIterator(PyObject*& data, void* iter);
int nextIteratorUpdate(PyObject*& data, PyObject* result, void* iter);
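Conceptually, the iterative mode lets the runtime walk the server-side elements (via buildIterator / nextIteratorUpdate) and apply the user's function to each one, so the script issues no per-element read/write requests. The Particle class below is a hypothetical element type, used only to run the increase() function above; real elements live inside the application.

```python
# Hypothetical element type: the real elements live inside the application
# and are presented to Python through buildIterator / nextIteratorUpdate.
class Particle:
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity

def increase(p):
    if p.velocity > 1:
        p.mass = p.mass * 2   # double the mass of fast particles

particles = [Particle(1.0, 0.5), Particle(1.0, 2.0)]
for p in particles:           # in the real system this loop is driven by the runtime
    increase(p)
```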
Iterative interface overhead ● Each element performs work for 10 μs ● The measured overhead of iterating over the elements is negligible
CharmDebug: overview (figure: the CharmDebug Java GUI on the local machine talks across a firewall, via CCS (Converse Client-Server), to the CharmDebug component inside the parallel application on the remote machine; GDB attaches to the application processes)
CharmDebug: introspection
Salsa: cosmological data analysis ● Write your own piece of Python script, or ● Use the graphical tool, which internally issues Python scripts to perform the task
5-point 2D Jacobi ● Matrix size: size × size ● Running on 32 processors ● Python interface used through CharmDebug ● Execution times in ms
Conclusions ● Interface to dynamically upload Python scripts into a running parallel application ● The scripts can interact with the application – Low-level – High-level (initiate parallel operations) – Iterative mode ● The overhead of the interface is minimal – Negligible for human interactivity (Salsa) – Low enough to allow non-human interactivity (CharmDebug)
Future work ● Handle errors more effectively – Currently mostly left to the programmer ● Define a good atomicity of operations – Checkpoint/rollback for automatic recovery ● Extend to MPI and other languages built on top of Charm++ – Trigger when any MPI routine is called, or – when a specific routine is called (e.g., MPI_Python) ● Export into the MPI standard – Add the capability for MPI to receive external messages – Execute Python scripts upon receiving them
Questions? Thank you