
Presented by
High Productivity Language Systems: Next-Generation Petascale Programming
Aniruddha G. Shet, Wael R. Elwasif, David E. Bernholdt, and Robert J. Harrison
Computer Science and Mathematics Division, Oak Ridge National Laboratory

2 Elwasif_HPCS_SC07 Revolutionary approach to large-scale parallel programming
 Million-way concurrency (and more) will be required on coming HPC systems.
 The current “Fortran+MPI+OpenMP” model will not scale.
 New languages from the DARPA HPCS program point the way toward the next-generation programming environment.
 Emphasis on performance and productivity.
 Not SPMD:
   Lightweight “threads,” LOTS of them
   Different approaches to locality awareness/management
 High-level (sequential) language constructs:
   Rich array data types (part of the base languages)
   Strongly typed, object-oriented base design
   Extensible language model
   Generic programming
 Candidate languages:
   Chapel (Cray)
   Fortress (Sun)
   X10 (IBM)
 Based on joint work with Argonne National Laboratory, Lawrence Berkeley National Laboratory, Rice University, and the DARPA HPCS program.

3 Elwasif_HPCS_SC07 Concurrency: The next generation
 Single initial thread of control
 Parallelism through language constructs
 True global view of memory, one-sided access model
 Support for task and data parallelism
 “Threads” grouped by “memory locality”
 Extensible, rich distributed-array capability
 Advanced concurrency constructs:
   Parallel loops
   Generator-based looping and distributions
   Local and remote futures
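These constructs have rough analogues in any threaded language. A minimal Python sketch of a parallel loop plus a future, using concurrent.futures as a stand-in for the HPCS runtimes (this is an analogy, not Chapel/Fortress/X10 syntax):

```python
from concurrent.futures import ThreadPoolExecutor

def work(i):
    # Stand-in for the body of one lightweight "thread".
    return i * i

with ThreadPoolExecutor() as pool:
    # Parallel loop: one logical task per index, scheduled by the runtime.
    results = list(pool.map(work, range(8)))

    # A future: launch an asynchronous computation now, block only
    # when the value is actually needed.
    fut = pool.submit(sum, results)
    total = fut.result()

print(results, total)
```

The single initial thread of control is the main script; parallelism appears only through the explicit constructs, as on the slide.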

4 Elwasif_HPCS_SC07 What about productivity?
 Index sets/regions for arrays
 “Array language” (Chapel, X10)
 Safe(r) and more powerful language constructs:
   Atomic sections vs. locks
   Sync variables and futures
   Clocks (X10)
 Type inference
 Leverage advanced IDE capabilities
 Units and dimensions (Fortress)
 Component management, testing, contracts (Fortress)
 Math/science-based presentation (Fortress)
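A “sync variable” is single-assignment: reads block until the one write occurs. A plain Future can mimic this in Python; a loose analogy only, not the HPCS semantics:

```python
from concurrent.futures import Future
import threading

sv = Future()  # empty "sync variable": no value yet

def producer():
    sv.set_result(42)  # the one and only write

t = threading.Thread(target=producer)
t.start()
value = sv.result()  # this read blocks until the write happens
t.join()
```

The productivity claim is that such dataflow synchronization replaces many error-prone explicit lock acquire/release pairings.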

5 Elwasif_HPCS_SC07 Exploring new languages: Quantum chemistry
 Fock matrix construction is a key kernel.
 Used in pharmaceutical and materials design, understanding combustion and catalysis, and many other areas.
 The scalable algorithm is irregular in both data and work distribution.
 Cannot be expressed efficiently using MPI.

F_{\mu\nu} = \sum_{\lambda\sigma} D_{\lambda\sigma} \left[ 2(\mu\nu|\lambda\sigma) - (\mu\lambda|\nu\sigma) \right]

[Figure: D and F are global-view distributed arrays; CPUs 0 through P-1 pull integral blocks (μν|λσ) from a work pool into task-local working blocks.]
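The Fock contraction on this slide can be written densely with NumPy (random data, no distribution, and no exploitation of permutational symmetry; the index convention eri[m, n, l, s] = (mn|ls) is this sketch's assumption):

```python
import numpy as np

n = 4                                      # basis size (illustrative)
rng = np.random.default_rng(0)
D = rng.random((n, n))
D = (D + D.T) / 2                          # symmetric density matrix
eri = rng.random((n, n, n, n))             # two-electron integrals (mn|ls)

# F_mn = sum_ls D_ls [ 2 (mn|ls) - (ml|ns) ]
F = 2 * np.einsum('mnls,ls->mn', eri, D) - np.einsum('mlns,ls->mn', eri, D)
```

The scalable version blocks eri by index ranges and distributes those blocks across processes, which is where the irregularity the slide mentions comes from.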

6 Elwasif_HPCS_SC07 Load balancing approaches for Fock matrix build

Load balancing approach | Chapel (Cray) | Fortress (Sun) | X10 (IBM)
Static, program managed | Unstructured computations + locality control | Explicit threads + locality control | Asynchronous activities + locality control
Dynamic, language (runtime) managed | Iterators + forall loops | Multigenerator for loops | Not currently specified
Dynamic, program managed: task pool | Synchronization variables | Abortable atomic expressions | Conditional atomic sections + futures
Dynamic, program managed: shared counter | Synchronization variables | Atomic expressions | Unconditional atomic sections + futures
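The “shared counter” scheme in the last row can be sketched in Python: workers atomically fetch-and-increment a global counter to claim the next task until the pool is exhausted. A lock plays the role of the languages' atomic sections, and task() is a hypothetical stand-in for computing one integral block:

```python
import threading

NTASKS = 100
counter = 0
lock = threading.Lock()
results = [0] * NTASKS

def task(i):
    return i + 1  # stand-in for one integral-block computation

def worker():
    global counter
    while True:
        with lock:              # "atomic section": fetch-and-increment
            i = counter
            counter += 1
        if i >= NTASKS:
            return              # work pool exhausted
        results[i] = task(i)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each worker grabs the next index on demand, uneven task costs balance automatically; this is the dynamic, program-managed behavior the table describes.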

7 Elwasif_HPCS_SC07 Parallelism and global-view data in Fock matrix build

Operations | Chapel (Cray) | Fortress (Sun) | X10 (IBM)
Mixed data and task parallelism | Cobegin (task) + domain iterator (data) | Tuple (task) + for loop (data) | Finish async (task) + ateach (data)
Global-view array operations: initialization | Array initialization expressions | Comprehensions / function expressions | Array initialization functions
Global-view array operations: arithmetic | Array promotions of scalar operators (+, *) | Fortress library operators (+, juxtaposition) | Array class methods (add, scale)
Global-view array operations: sub-array | Slicing | Array factory functions (subarray) | Restriction
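The three global-view array operations in the table have direct dense NumPy analogues (single address space here; the HPCS languages apply the same idioms over distributed data):

```python
import numpy as np

# Initialization: build an array from an index expression.
A = np.fromfunction(lambda i, j: i + j, (4, 4))

# Arithmetic: scalar operators promoted over whole arrays.
B = 2 * A + 1

# Sub-array: slicing selects a block without element-by-element copying.
block = B[1:3, 1:3]
```

The point of the table is that all three languages let the programmer write at this whole-array level rather than looping over local pieces.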

8 Elwasif_HPCS_SC07 Tradeoffs in HPLS language design
 Emphasis on parallel safety (X10) vs. expressivity (Chapel, Fortress)
 Locality control and awareness:
   X10: explicit placement and access
   Chapel: user-controlled placement, transparent access
   Fortress: placement “guidance” only; local/remote access blurry (data may move!)
   What about mental performance models?
 Programming language representation:
   Fortress: allows math-like representation
   Chapel, X10: traditional programming language front end
   How much do developers gain from mathematical representation?
 Productivity/performance tradeoff:
   Different users have different “sweet spots”

9 Elwasif_HPCS_SC07 Remaining challenges
 (Parallel) I/O model
 Interoperability with (existing) languages and programming models
 Better (preferably portable) performance models and scalable memory models
   Especially for machines with 1M+ processors
 Other considerations:
   Viable gradual adoption strategy
   Building a complete development ecosystem

10 Elwasif_HPCS_SC07 Contacts
 Aniruddha G. Shet, Computer Science Research Group, Computer Science and Mathematics Division, (865)
 Wael R. Elwasif, Computer Science Research Group, Computer Science and Mathematics Division, (865)
 David E. Bernholdt, Computer Science Research Group, Computer Science and Mathematics Division, (865)
 Robert J. Harrison, Computational Chemical Sciences, Computer Science and Mathematics Division, (865)