© 2009 IBM Corporation — Parallel Programming with X10/APGAS — IBM UPC and X10 teams
Presentation transcript:

APGAS Realization

- Through languages:
  – Asynchronous Co-Array Fortran: extension of CAF with asyncs
  – Asynchronous UPC (AUPC): proper extension of UPC with asyncs
  – X10 (already asynchronous): extension of sequential Java
- Language runtimes share a common APGAS runtime through an APGAS library in C, Fortran, and Java (co-habiting with MPI):
  – Implements PGAS: remote references, global data structures
  – Implements inter-place messaging; optimizes inlineable asyncs
  – Implements global and/or collective operations
  – Implements intra-place concurrency: atomic operations, algorithmic scheduler
- Libraries reduce the cost of adoption; languages offer enhanced productivity benefits
- XL UPC status: on path to an IBM-supported product in 2011

APGAS Advantages

- The programming model is still based on shared memory
  – Familiar to many programmers
- Place hierarchies provide a way to deal with heterogeneity
  – Async data transfers between places are not an ad-hoc artifact of the Cell
- Asyncs offer an elegant framework subsuming multi-core parallelism and messaging
- There are many opportunities for compiler optimizations, e.g. communication aggregation
  – So the programmer can write more abstractly and still get good performance
- There are many opportunities for static checking for concurrency/distribution design errors
- The programming model is implementable on a variety of hardware architectures
  – Leads to better application portability
  – There are many opportunities for hardware optimizations based on APGAS

X10 Project Status

- X10 is an APGAS language in the Java family of languages
- X10 is an open source project (Eclipse Public License); documentation, releases, implementation source code, benchmarks, etc. are all publicly available
- X10 and X10DT 2.0 just released!
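The claim that asyncs subsume both multi-core parallelism and messaging can be made concrete with a minimal sketch in X10-style (2.x) syntax; the class name and printed text are illustrative assumptions, not taken from the slides:

```
// Sketch: one construct, two uses — "at (p) async" spawns an activity at a
// remote place (messaging), while a plain "async" spawns one locally
// (multi-core parallelism). "finish" waits for all of them.
public class HelloAPGAS {
    public static def main(args:Rail[String]) {
        finish for (p in Place.places()) {
            at (p) async {
                Console.OUT.println("Hello from place " + here.id);
            }
        }
        // finish guarantees all activities, local and remote, have terminated
    }
}
```

The same finish/async pair structures both shared-memory and distributed parallelism, which is the productivity argument the slides make for APGAS.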
Highlights of the 2.0 release:
- Added structs for improved space/time efficiency
- More flexible distributed object model (global fields/methods)
- Static checking of place types (locality constraints)
- X10DT 2.0 supports the X10 C++ backend
- X10 used in the 2009 HPC Challenge (Class 2) submission

X10 Platforms

- Java backend (compiles X10 to Java)
  – Runs on any Java 5 JVM
  – Single-process implementation (all places in one JVM)
- C++ backend (compiles X10 to C++)
  – AIX, Linux, cygwin, MacOS, Solaris
  – PowerPC, x86, x86_64, sparc
  – Multi-process implementation (one place per process)
  – Uses the common APGAS runtime

X10 Innovation Grants

- Program to support academic research and curricular development activities in the area of computing at scale on cloud computing platforms based on the X10 programming language.

Programming Models: Bridging the Gap Between Programmer and Hardware

- A programming model provides an abstraction of the architecture that enables programmers to express their solutions in a manner relevant to their domain
  – Mathematicians write equations
  – MBAs write business logic
- Compilers, language runtimes, libraries, and operating systems implement the programming model, bridging the gap to the hardware
- Development and performance tools provide the surrounding ecosystem for a programming model and its implementation
- The evolution of programming models impacts:
  – Design methodologies
  – Operating systems
  – Programming environments

Asynchronous PGAS Programming Model — Two basic ideas: Places and Asynchrony

- Fine-grained concurrency: async S
- Atomicity: atomic S; when (c) S
- Global data structures: points, regions, distributions, arrays
- Place-shifting operations: at (P) S
- Ordering: finish S; clock

Performance results: Power5+ cluster (IBM Poughkeepsie Benchmark Center)

[Tables: X10 vs. UPC on LU (GFlop/s), RA (MUP/s), Stream (GBytes/s), and FFT (GFlops/s) by node count, with HPL and FFT performance comparison charts; the numeric values are not recoverable from this transcript.]

- 32 Power5+ nodes, 16 SMT 2x processors/node
- 64 GB/node; 1.9 GHz
- HPS switch, 2 GBytes/s/link

Performance results: Blue Gene/P (IBM TJ Watson Research Center, Watson/Shaheen)

[Tables: X10 vs. UPC on LU (GFlop/s), RA (GUP/s), Stream (GBytes/s), and FFT (GFlops/s) by node count, with HPL and FFT performance comparison charts; the numeric values are not recoverable from this transcript.]

- 4 racks of Blue Gene/P, 1024 nodes/rack
- 4 CPUs/node; 850 MHz
- 4 GBytes/node RAM
- 16 x 16 x 16 torus
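A hedged sketch of how the core constructs listed under "Two basic ideas" compose in practice (X10-style syntax; the Counter class and its method names are illustrative, not from the slides):

```
// Sketch: "finish" waits for all spawned activities; "atomic" gives
// mutual exclusion among activities running within the same place.
class Counter {
    var count:Long = 0;

    def incrementAll(n:Long):Long {
        finish for (i in 1..n) async {  // spawn n concurrent activities
            atomic count++;             // intra-place atomic update
        }
        return count;                   // equals n once finish returns
    }
}
```

The conditional form when (c) S extends atomic by blocking the activity until condition c holds, which is the slides' mechanism for intra-place coordination.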