Steering Massively Parallel Applications Under Python
Patrick Miller, Lawrence Livermore National Laboratory

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.

Characterization of Scientific Simulation Codes
Big!
–Many options
–Many developers
Long lived
–Years to develop
–20+ year life
Used as a tool for understanding
–Used in unanticipated ways

Steering
Steering uses an interpreted language as the interface to “compiled assets”
–E.g., Mathematica
A rich command structure lets users “program” within the simulation paradigm
Exploit interactivity

Why Steering?
Expect the unexpected
Let end-users participate in solutions to their problems
Users can set, calculate with, and control low-level details
Developers can test and debug at a higher level
Users can write their own instrumentation or even full-blown physics packages
It works!
–Mathematica, SCIRun, BASIS, IDL, Tecolote (LANL), Perl, Python!!!!

Why Python?
Objects everywhere
Loose enough to be practical
Small enough to be safe
–The tiny core could be maintained by one person
Open source for the core, AND many useful bits & pieces already exist

The big picture
C++ and FORTRAN compiled assets for speed
Python for flexibility
Small, nimble objects are better than monolithic objects
No "main" - the control loop(s) are in Python
–Python coordinates the actions of C++ objects
–Python is the integration “glue”

Putting it together - Pyffle
The biggest issue is always getting C++ to "talk with" Python (wrapping)
SWIG, PyFORT, CXX, Boost.Python, PYFFLE
Our big requirements are:
–support for inheritance
–support for Python-like interfaces
–tolerance of templating<>

The shadow layer
Pyffle generates constructor functions - NOT Python classes
We build extension classes in Python that "shadow" the C++ object to:
–Allow Python-style inheritance
–Add functionality
–Enforce consistency
–Modify the interface
–...
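To make the pattern concrete, here is a minimal sketch of the shadow idiom in plain Python. Every name below (Mesh, Mesh_create, rawZoneCount) is a hypothetical stand-in, not the actual Kull/Pyffle API; in the real code the constructor function would come from a Pyffle-generated extension module backed by C++.

# Minimal sketch of the shadow idiom; all names are hypothetical.
class _CxxMesh:
    """Stand-in for the opaque object a Pyffle constructor returns."""
    def rawZoneCount(self):
        return 100

def Mesh_create():
    # A real Pyffle module would return a C++-backed object here
    return _CxxMesh()

class Mesh:
    """Python shadow: adds inheritance, functionality, consistency."""
    def __init__(self):
        self._handle = Mesh_create()
    def __getattr__(self, name):
        # Forward anything we don't override to the C++ object
        return getattr(self._handle, name)
    def zoneCount(self):
        # Modified interface, with an enforced consistency check
        n = self._handle.rawZoneCount()
        assert n >= 0, "inconsistent zone count"
        return n

m = Mesh()
print m.zoneCount()    # 100, through the shadow's checked interface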

Parallelism
Kull uses SPMD-style computation with an MPI-enabled Python coordinating
–Most parallelism is directly controlled "under the covers" by C++ (MPI and/or threads)
–Started with a version with limited MPI support (barrier, allreduce)
–New pyMPI supports a majority of MPI standard calls (comm, bcast, reduces, sync & async send/recv, ...)

Basic MPI in Python
mpi.bcast(value)
mpi.reduce(item, operation, root?)
mpi.barrier()
mpi.rank and mpi.nprocs
mpi.send(value, destination, tag?)
mpi.recv(sender, tag?)
mpi.sendrecv(msg, dest, src?, tag?, rtag?)
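As a quick illustration of the point-to-point calls (not from the original deck): a two-rank ping-pong. It assumes pyMPI's convention that mpi.recv returns a (message, status) pair; run it under pyMPI with at least two processes.

import mpi

if mpi.rank == 0:
    mpi.send("ping", 1)            # message to rank 1, default tag
    msg, status = mpi.recv(1)      # wait for the reply from rank 1
    print "rank 0 received", msg
elif mpi.rank == 1:
    msg, status = mpi.recv(0)      # wait for rank 0's message
    mpi.send("pong", 0)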

MPI calcPi
import mpi

def f(a):
    return 4.0 / (1.0 + a*a)

def integrate(rectangles, function):
    n = mpi.bcast(rectangles)    # root's rectangle count reaches all ranks
    h = 1.0 / n
    sum = 0.0
    # Each rank integrates a strided subset of the rectangles
    for i in range(mpi.rank + 1, n + 1, mpi.procs):
        sum = sum + function(h * (i - 0.5))    # midpoint rule
    myAnswer = h * sum
    # Combine the partial sums across all ranks
    answer = mpi.reduce(myAnswer, mpi.SUM)
    return answer

Making it fast
Where we have very generic base classes in the code (e.g., equation-of-state), we have written Pythonized descendant classes that invoke arbitrary user-written Python functions
The C++ component doesn't need to know it's invoking Python
There is a speed issue :-(
But there is hope!
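A minimal sketch of that callback pattern, with every name hypothetical (in the real code the base class would be a Pyffle-wrapped C++ equation-of-state, not pure Python):

class EquationOfState:
    """Stand-in for a very generic compiled base class."""
    def pressure(self, density, energy):
        raise NotImplementedError

class PythonEOS(EquationOfState):
    """Pythonized descendant that forwards to a user-written function."""
    def __init__(self, pressureFunction):
        self.pressureFunction = pressureFunction
    def pressure(self, density, energy):
        # The caller (C++ in the real code) never knows this is Python
        return self.pressureFunction(density, energy)

# A user plugs in an ideal-gas law, p = (gamma - 1) * rho * e
eos = PythonEOS(lambda rho, e: (1.4 - 1.0) * rho * e)
print eos.pressure(1.0, 2.5)    # prints 1.0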

PyCOD - Compile on demand!
Builds accelerated Python functions AND C++ functions
Input is a Python function object and a type signature to compile to
Dynamically loads accelerated versions
Caches compiled code (matched against bytecode) to eliminate redundant compiles
Early results are encouraging (20X to 50X speedup)

PyCOD example
import compiler    # PyCOD's compile-on-demand module
from types import *

def sumRange(first, last):
    sum = 0
    for i in xrange(first, last + 1):
        sum = sum + i
    return sum

# Type signature to compile to: three IntTypes
# (presumably the two arguments and the result)
signature = (IntType, IntType, IntType)

compiledSumRange, rawFunctionPointer = \
    compiler.accelerate(sumRange, signature)

print "Compiled result:", compiledSumRange(1, 100)

Summary
MPI-enabled Python is a reality
–We have run long-running, 1500-processor simulations fully under a Python core
PYFFLE provides a reasonable conduit for developers to build capability in C++ and easily integrate that capability into Python