A Multi-platform Co-array Fortran Compiler for High-Performance Computing
John Mellor-Crummey, Yuri Dotsenko, Cristian Coarfa (Rice University)

Presentation transcript:

Co-Array Fortran Language

SPMD process images
- the number of images is fixed during execution
- images operate asynchronously

Both private and shared data
- real a(20,20)        private: a 20x20 array in each image
- real a(20,20)[*]     shared: a 20x20 co-array, one 20x20 array in each image

Simple one-sided shared-memory communication
- x(:,j:j+2) = a(r,:)[p:p+2]   copies row r of a on images p:p+2 into local columns j:j+2 of x

Flexible synchronization
- sync_team(team [, wait])
  team = a vector of process ids to synchronize with
  wait = a vector of processes to wait for (a subset of team)

Pointers and dynamic allocation

Parallel I/O

Co-array communication example:

    integer A(10,10)[*]
    if (me .eq. 1) then
      A(1:3,1:5)[me+1] = A(1:3,1:5)[me]
    end if

[Figure: each image 1..N holds its own copy of A(10,10); image 1 copies a 3x5 block of its A into image 2's A]

[Figures: performance results on IA64+Myrinet 2000, on Alpha+Quadrics*, and on SGI Altix 3000**]
* For NAS BT and CG the base case is synthetic, so the first measurable point has efficiency 1.0.
** Preliminary results on a loaded system, with other users competing for the memory bandwidth.

Implementation Status
- Working prototype for the core language features
- The current compiler performs no optimization: each co-array access is translated into a get/put operation at the same point in the code
- Code generation uses the widely portable ARMCI communication library
- Front end based on the production-quality Open64 front end, modified to support source-to-source compilation
- Source-to-source code generation for wide portability
- An open-source compiler will be available

Research Focus
- Enhancements to the Co-Array Fortran model: point-to-point one-way synchronization, hints for matching synchronization events, collective operation intrinsics, split-phase primitives
- Synchronization strength reduction
- Communication vectorization
- Platform-driven communication optimizations: transform one-sided communication into two-sided and collective communication where useful; generate both fine-grain loads/stores and calls to communication libraries as necessary; generate multi-model code for hierarchical architectures; convert gets into puts
- Compiler-directed parallel I/O (with UIUC)
- Interoperability with other parallel programming models

PUT Translation Example

Co-Array Fortran source:

    ...
    real(8) a(0:N+1,0:N+1)[*]
    me = this_image()
    ...
    ! ghost cell update
    a(1:N,N+1)[left(me)] = a(1:N,0)
    ...

Generated code:

    type CafHandleReal8
      integer :: handle
      real(8), pointer :: ptr(:,:)
    end type
    type(CafHandleReal8) a_caf
    ...
    allocate( cafBuffer_1%ptr(1:N,0:0) )
    cafBuffer_2%ptr => a_caf%ptr(1:N,N+1:N+1)
    cafBuffer_1%ptr = a_caf%ptr(1:N,0)
    call CafArmciPutS(a_caf%handle, left(me), cafBuffer_1, cafBuffer_2)
    deallocate( cafBuffer_1%ptr )
    ...
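To tie these pieces together, the ghost-cell update above can be written as a complete program. The following is a minimal sketch of a one-dimensional halo exchange in the Co-Array Fortran dialect described here; the program name, the value of N, and the periodic-neighbor computation are illustrative assumptions, not taken from the poster.

    program halo_exchange
      implicit none
      integer, parameter :: N = 64
      real(8) :: a(0:N+1)[*]              ! co-array: one copy of a per image
      integer :: me, np, left, right

      me = this_image()                   ! index of this image, 1..np
      np = num_images()                   ! total number of images
      left  = merge(np, me - 1, me == 1)  ! periodic neighbors
      right = merge(1, me + 1, me == np)

      a(1:N) = real(me, 8)                ! fill the interior cells

      call sync_all()                     ! every image has initialized a
      a(N+1)[left] = a(1)                 ! one-sided PUT into left neighbor's ghost cell
      a(0)[right]  = a(N)                 ! one-sided PUT into right neighbor's ghost cell
      call sync_all()                     ! ghost cells are now valid on all images

      ! ... stencil computation over a(1:N) using a(0) and a(N+1) ...
    end program halo_exchange

The two sync_all calls are the minimal synchronization that makes the one-sided writes safe: the first ensures no image's ghost cells are overwritten before it has finished initializing, and the second ensures no image reads its ghost cells before the remote writes have completed.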
Programming Models for High-Performance Computing

Goal: simple and expressive models for high-performance programming, based on extensions to widely used languages
- Performance: users control data and computation partitioning
- Portability: the same language serves SMPs, MPPs, and clusters
- Programmability: a global address space keeps programs simple

MPI
- Portable and widely used
- The programmer has explicit control over data locality and communication
- Using MPI can be difficult and error prone
- Most of the burden of communication optimization falls on application developers; compiler support is underutilized

HPF
- The compiler is responsible for communication and data locality
- Annotated sequential code (semi-automatic parallelization)
- Requires heroic compiler technology
- The model limits the application paradigms it can express: extensions to the standard are required to support irregular computation

Co-Array Fortran
- A sensible alternative to these two extremes
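For contrast with the one-sided Co-Array Fortran sketch earlier, the same periodic boundary exchange can be written in MPI's two-sided model. This is a hedged sketch using standard MPI-1 calls; the ring topology, buffer layout, and tags mirror the earlier sketch and are assumptions for illustration, not code from the poster.

    program halo_mpi
      implicit none
      include 'mpif.h'
      integer, parameter :: N = 64
      real(8) :: a(0:N+1)                 ! private array; ghost cells at 0 and N+1
      integer :: me, np, left, right, ierr
      integer :: status(MPI_STATUS_SIZE)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)   ! ranks are 0-based here
      call MPI_Comm_size(MPI_COMM_WORLD, np, ierr)
      left  = mod(me - 1 + np, np)        ! periodic neighbors
      right = mod(me + 1, np)

      a(1:N) = real(me, 8)

      ! each one-sided PUT becomes a matched send/receive pair:
      ! my a(1) goes to left's a(N+1); right's a(1) arrives in my a(N+1)
      call MPI_Sendrecv(a(1), 1, MPI_DOUBLE_PRECISION, left, 0,    &
                        a(N+1), 1, MPI_DOUBLE_PRECISION, right, 0, &
                        MPI_COMM_WORLD, status, ierr)
      ! my a(N) goes to right's a(0); left's a(N) arrives in my a(0)
      call MPI_Sendrecv(a(N), 1, MPI_DOUBLE_PRECISION, right, 1,   &
                        a(0), 1, MPI_DOUBLE_PRECISION, left, 1,    &
                        MPI_COMM_WORLD, status, ierr)

      call MPI_Finalize(ierr)
    end program halo_mpi

Every remote assignment in the co-array version becomes an explicitly matched send/receive pair here, which is exactly the communication bookkeeping the poster argues falls on the application developer under MPI.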