Center for Programming Models for Scalable Parallel Computing: Project Meeting Report
Libraries, Languages, and Execution Models for Terascale Applications

Presentation transcript:

Center for Programming Models for Scalable Parallel Computing: Project Meeting Report
Libraries, Languages, and Execution Models for Terascale Applications
William D. Gropp, Argonne National Laboratory

Slide 2: Participants

Coordinating Principal Investigator:
- Ewing Lusk, Argonne National Laboratory

Co-Principal Investigators (Laboratories):
- William Gropp, Argonne National Laboratory
- Ricky Kendall, Ames Laboratory
- Jarek Nieplocha, Pacific Northwest National Laboratory

Co-Principal Investigators (Universities):
- Barbara Chapman, University of Houston
- Guang Gao, University of Delaware
- John Mellor-Crummey, Rice University
- Robert Numrich, University of Minnesota
- Dhabaleswar Panda, Ohio State University
- Thomas Sterling, California Institute of Technology
- Marianne Winslett, University of Illinois
- Katherine Yelick, University of California, Berkeley

Slide 3: Problem Statement

Problem: Current programming models have enabled the development of scalable applications on today's large-scale computers, but the application development process itself remains complex, lengthy, and expensive, obstructing progress in scientific application development.

Solution: Facilitate application development by providing standard libraries, convenient parallel programming languages, and advanced programming models targeted at petaflops machines.

Goal: An array of attractive options for the convenient development of scalable, efficient scientific applications for terascale computers.

Slide 4: A Three-Pronged Approach to Next-Generation Programming Models

Extensions to existing library-based models:
- MPI and MPI-2, with extensions
- Global Arrays and extensions
- Portable SHMEM

Robust implementations of language-based models:
- UPC
- Co-Array Fortran
- Titanium
- OpenMP optimizations

Advanced models for advanced architectures:
- Multithreaded and PIM-based machines, Gilgamesh, etc.

Slide 5: Relationships Among the Parts

[Layered diagram, roughly as follows:]
- Application
- Programming models: Message Passing, Remote Memory, Shared Memory, Mixed Models, Language Extensions, New Models
- Model instances: MPI, MPI-2, GA, GPSHMEM, UPC, CAF, Titanium, OpenMP, OpenMP + MPI, EARTH
- Implementation substrate: Common Runtime (ARMCI, ADI-3), Open64 Compiler, Panda Parallel I/O, CAF Packages/Modules, HDF-5
- Communication firmware: VIA, Myrinet, Infiniband, MPP Switches

Slide 6: Libraries

Libraries for the remote memory access model (a minimal one-sided example follows this list):
- MPI and MPI-2
- Global Arrays: GA combines a higher-level model with efficiency, for application convenience
- GP-SHMEM: the popular Cray T3E model made portable
- Co-Array Fortran library: an object-based scientific library, written in CAF
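To make the remote memory access model concrete, here is a minimal sketch in C using standard MPI-2 one-sided operations (a generic illustration, not code from the project): each process exposes an integer as a window, and rank 0 writes directly into rank 1's memory with MPI_Put, with no matching receive on the target. Run with at least two processes.

    /* minimal MPI-2 one-sided (remote memory access) sketch */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, buf = 0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* expose one int per process for remote access */
        MPI_Win_create(&buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        if (rank == 0) {
            int val = 42;
            /* write directly into rank 1's window; no receive is posted */
            MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        }
        MPI_Win_fence(0, win);   /* completes the Put on both sides */

        if (rank == 1)
            printf("rank 1 holds %d after the remote put\n", buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }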

Slide 7: Languages

Three languages providing a software global address space (suitable for distributed memory) and parallelism (a small UPC sketch follows this list):
- CAF (Co-Array Fortran)
- UPC (Unified Parallel C)
- Titanium (parallel Java)

One language for shared memory:
- Scalable OpenMP

The Open64 compiler infrastructure:
- An industrial-strength compiler for C, Fortran 9x, and C++
- Used in the above projects
- A contribution to the community
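As an illustration of the global-address-space style these languages share, here is a minimal UPC sketch (an assumed example for illustration, not project code): a shared array is distributed cyclically across threads, each thread initializes only the elements it owns, and thread 0 then reads remote elements through ordinary array indexing.

    /* Minimal UPC sketch: a shared array spread across threads.
       Compile with a UPC compiler, e.g. Berkeley upcc. */
    #include <upc_relaxed.h>
    #include <stdio.h>

    #define PER_THREAD 8
    shared int v[PER_THREAD * THREADS];  /* cyclically distributed */

    int main(void) {
        int i;

        /* each thread touches only the elements with affinity to it */
        upc_forall(i = 0; i < PER_THREAD * THREADS; i++; &v[i])
            v[i] = i;

        upc_barrier;

        if (MYTHREAD == 0) {
            long sum = 0;
            for (i = 0; i < PER_THREAD * THREADS; i++)
                sum += v[i];  /* remote reads look like local reads */
            printf("sum = %ld\n", sum);
        }
        return 0;
    }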

Slide 8: Cross-Project Infrastructure

Runtime communication approaches, exploiting NICs in support of parallel programming models:
- ARMCI
- GASNet

I/O:
- Active buffering in Panda
- MPI-IO and parallel file systems: integrating active buffering into the ROMIO implementation of MPI-IO (a minimal MPI-IO example follows this list)
- Scalable I/O for parallel languages: UPC and CAF I/O
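For context on the MPI-IO layer, here is a minimal collective-write sketch in C (a generic illustration; the file name and block size are arbitrary assumptions): each rank writes a disjoint block of a shared file in one collective call, which is exactly the point at which an implementation such as ROMIO can apply aggregation and buffering optimizations.

    /* minimal MPI-IO collective write sketch */
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        double local[4] = {0.0, 1.0, 2.0, 3.0};
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* each rank writes its block at a disjoint offset; the collective
           call lets the library aggregate and buffer the I/O */
        MPI_Offset offset = (MPI_Offset)rank * sizeof(local);
        MPI_File_write_at_all(fh, offset, local, 4, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }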

Slide 9: New Programming Models

Defining a new execution model:
- Semantics first
- Define for performance: must provide the enormous benefit Bill Camp mentioned
- Define to support the best algorithms in support of applications
- Define for likely HPC hardware, including:
  - Many (zillions of) processors
  - A deep memory hierarchy
  - Some hardware support for the programming model
- Likely to have some kind of precisely relaxed memory consistency model, a common feature of all of the high-performance libraries and languages in this project (even OpenMP)

Experiments with new concepts such as percolation (moving the program to the data instead of the data to the program)

Slide 10: Connections With Other Programs

- Applications from SciDAC, NSF/PACI, etc.
- DARPA HPCS Program:
  - John Mellor-Crummey (Rice) for HP
  - Bob Numrich (UMN) for SGI
  - Thomas Sterling (JPL/Caltech) for Cray
  - Kathy Yelick (Berkeley) for Sun
  - Guang Gao (U Delaware) for IBM
- ANL is a member of the Cray Affiliates program
- The Open64 community
- OpenMP (U Houston formed a company to join the ARB, since only companies can be members)
- IBM Blue Gene/L and QCDOC
- More...