© 2009 IBM Corporation Parallel Programming with X10/APGAS IBM UPC and X10 teams

APGAS Realization

• Through languages
  – Asynchronous Co-Array Fortran
  – Extension of CAF with asyncs
  – Asynchronous UPC (AUPC)
  – Proper extension of UPC with asyncs
  – X10 (already asynchronous)
  – Extension of sequential Java
• Language runtimes share common APGAS runtime through an APGAS library in C, Fortran, Java (co-habiting with MPI)
  – Implements PGAS
  – Remote references
  – Global data-structures
  – Implements inter-place messaging
  – Optimizes inlineable asyncs
  – Implements global and/or collective operations
  – Implements intra-place concurrency
  – Atomic operations
  – Algorithmic scheduler
• Libraries reduce cost of adoption, languages offer enhanced productivity benefits
• XL UPC status: on path to IBM supported product in 2011

APGAS Advantages

• Programming model is still based on shared memory.
  – Familiar to many programmers.
• Place hierarchies provide a way to deal with heterogeneity.
  – Async data transfers between places are not an ad-hoc artifact of the Cell.
• Asyncs offer an elegant framework subsuming multi-core parallelism and messaging.
• There are many opportunities for compiler optimizations
  – E.g. communication aggregation.
  – So the programmer can write more abstractly and still get good performance
• There are many opportunities for static checking for concurrency/distribution design errors.
• Programming model is implementable on a variety of hardware architectures
  – Leads to better application portability.
  – There are many opportunities for hardware optimizations based on APGAS

X10 Project Status

• X10 is an APGAS language in the Java family of languages
• X10 is an open source project (Eclipse Public License)
  – Documentation, releases, implementation source code, benchmarks, etc. all publicly available at http://x10-lang.org
• X10 and X10DT 2.0 Just Released!
  – Added structs for improved space/time efficiency
  – More flexible distributed object model (global fields/methods)
  – Static checking of place types (locality constraints)
  – X10DT 2.0 supports X10 C++ backend
  – X10 2.0 used in 2009 HPC Challenge (Class 2) submission
• X10 2.0 Platforms
  – Java-backend (compile X10 to Java)
      Runs on any Java 5 JVM
      Single process implementation (all places in one JVM)
  – C++ backend (compile X10 to C++)
      AIX, Linux, cygwin, MacOS, Solaris
      PowerPC, x86, x86_64, sparc
      Multi-process implementation (one place per process)
      Uses common APGAS runtime
• X10 Innovation Grants
  – http://www.ibm.com/developerworks/university/innovation/
  – Program to support academic research and curricular development activities in the area of computing at scale on cloud computing platforms based on the X10 programming language.

Asynchronous PGAS Programming Model

• A programming model provides an abstraction of the architecture that enables programmers to express their solutions in a manner relevant to their domain
  – Mathematicians write equations
  – MBAs write business logic
• Compilers, language runtimes, libraries, and operating systems implement the programming model, bridging the gap to the hardware.
• Development and performance tools provide the surrounding ecosystem for a programming model and its implementation.
• The evolution of programming models impacts
  – Design methodologies
  – Operating systems
  – Programming environments

[Figure: "Programming Models: Bridging the Gap Between Programmer and Hardware". Diagram labels: Programming Model; Compilers, Runtimes, Libraries, Operating Systems; Programming Models; Design Methodologies; Operating Systems; Programming Environments.]

Two basic ideas: Places and Asynchrony

• Fine grained concurrency
  – async S
• Atomicity
  – atomic S
  – when (c) S
• Global data-structures
  – points, regions, distributions, arrays
• Place-shifting operations
  – at (P) S
• Ordering
  – finish S
  – clock

Performance results: Power5+ cluster

IBM Poughkeepsie Benchmark Center: 32 Power5+ nodes; 16 SMT 2x processors/node; 64 GB/node; 1.9 GHz; HPS switch, 2 GBytes/s/link

X10
nodes   LU (GFlop/s)   RA (MUP/s)   Stream (GBytes/s)   FFT (GFlops/s)
4       354            6.34         325.7               23.67
8       666            12.31        650.5               40.62
16      1268           23.02        1287.8              65.92
32      -              43.1         2601.5              -

UPC
nodes   LU (GFlop/s)   RA (MUP/s)   Stream (GBytes/s)   FFT (GFlops/s)
4       379            5.5          140                 7.9
8       747            10.8         256                 13
16      1442           21.5         523                 26.3
32      2333           43.3         1224                39.8

[Charts: HPL perf. comparison (GFlop/s); FFT perf. comparison]

Performance results – Blue Gene/P

IBM TJ Watson Res. Ctr., WatsonShaheen: 4 racks Blue Gene/P; 1024 nodes/rack; 4 CPUs/node; 850 MHz; 4 Gbytes/node RAM; 16 x 16 x 16 torus

X10
nodes   LU (GFlop/s)   RA (GUP/s)   Stream (GBytes/s)   FFT (GFlops/s)
32      117            0.042        141                 -
1024    3893           1.05         4516                -
2048    -              -            -                   -
4096    -              -            -                   -

UPC
nodes   LU (GFlop/s)   RA (GUP/s)   Stream (GBytes/s)   FFT (GFlops/s)
32      242            0.04         168                 6.4
1024    7744           1.27         5376                156
2048    15538          2.54         -                   -
4096    28062          5.04         -                   -

[Charts: HPL perf. comparison; FFT perf. comparison]
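The constructs listed under "Two basic ideas: Places and Asynchrony" can be approximated in plain Java to make their intent concrete. The sketch below is only an analogy under stated assumptions, not X10 code and not the APGAS runtime: the class `FinishAsyncSketch`, the helper `Finish`, and the method `sumAcrossPlaces` are hypothetical names invented for illustration. Each "place" is simulated by a single-threaded executor inside one JVM, `asyncAt` stands in for `at (P) async S`, `await` stands in for reaching the end of `finish S`, and an `AtomicLong` update stands in for `atomic S`. The `when (c) S` and `clock` constructs are not modelled.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative analogy only: plain Java threads standing in for X10 activities.
class FinishAsyncSketch {

    /** Hypothetical helper playing the role of one `finish` scope. */
    static final class Finish {
        // One registered party for the parent activity that will wait.
        private final Phaser pending = new Phaser(1);

        /** Rough analogue of `at (P) async S`: run body on the given place's executor. */
        void asyncAt(ExecutorService place, Runnable body) {
            pending.register();                    // count the activity before it starts
            place.execute(() -> {
                try {
                    body.run();
                } finally {
                    pending.arriveAndDeregister(); // activity has terminated
                }
            });
        }

        /** Rough analogue of the end of `finish S`: block until all spawned activities end. */
        void await() {
            pending.arriveAndAwaitAdvance();
        }
    }

    /** Sums 1..(places * perPlace), one activity per simulated place. */
    static long sumAcrossPlaces(int places, int perPlace) {
        ExecutorService[] pool = new ExecutorService[places];
        for (int p = 0; p < places; p++) {
            pool[p] = Executors.newSingleThreadExecutor(); // one thread per simulated place
        }
        AtomicLong total = new AtomicLong();               // stands in for an `atomic` update
        try {
            Finish finish = new Finish();
            for (int p = 0; p < places; p++) {
                final long lo = (long) p * perPlace + 1;
                final long hi = lo + perPlace - 1;
                finish.asyncAt(pool[p], () -> {
                    long local = 0;                        // place-local partial sum
                    for (long i = lo; i <= hi; i++) {
                        local += i;
                    }
                    total.addAndGet(local);
                });
            }
            finish.await();                                // all activities are done here
        } finally {
            for (ExecutorService e : pool) {
                e.shutdown();
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumAcrossPlaces(4, 1000));      // prints 8002000
    }
}
```

Because activities register with the `Phaser` before they start, an activity that spawns further work through the same `Finish` object is also waited for, which loosely mirrors how `finish` covers transitively spawned asyncs.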
