1
Co-Array Fortran
Open-source compilers and tools for scalable global address space computing
John Mellor-Crummey
Rice University
2
Center for Programming Models for Scalable Parallel Computing, Review, March 13, 2003
Outline
  Co-array Fortran
    language overview
    CAF compiler status and preliminary results
    language and compiler research issues
    interactions
  OpenMP
    compiler and runtime strategies for improving scalability
    Dragon tool
    hybrid MPI + OpenMP
  Open64 infrastructure
    source-to-source and source-to-object code infrastructure
3
Co-Array Fortran (CAF)
  Explicitly parallel extension of Fortran 90/95 (Numrich & Reid)
  Global address space SPMD parallel programming model
    one-sided communication
  Simple, two-level model that supports locality management
    local vs. remote memory
  Programmer control over performance-critical decisions
    data partitioning
    communication
  Suitable for mapping to a range of parallel architectures
    shared memory, message passing, hybrid, PIM
  Much in common with UPC
4
CAF Programming Model Features
  SPMD process images
    fixed number of images during execution
    images operate asynchronously
  Both private and shared data
    real y(20, 20)        a private 20x20 array in each image
    real y(20, 20) [*]    a shared 20x20 array in each image
  Simple one-sided shared-memory communication
    x(:, j:j+2) = y(r, :) [p:p+2]    copy row r from images p:p+2 into local columns j:j+2
  Flexible synchronization
    sync_team(notify [,wait])
      notify = a vector of process ids to signal
      wait = a vector of process ids to wait for
  Pointers and (perhaps asymmetric) dynamic allocation
  Parallel I/O
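The private/shared distinction and the one-sided copy listed on this slide can be sketched in standard Fortran 2008 coarray syntax, which postdates this 2003 review and differs from the CAF dialect shown here. Fortran 2008 requires a scalar image index, so the triplet co-subscript [p:p+2] is written as a loop below. The program name and the values chosen for r, j and p are hypothetical, and the private array is named x only to match the copy statement. This is a minimal sketch under those assumptions, not the compiler's generated code.

```fortran
program caf_features
  implicit none
  real :: x(20, 20)        ! private: one copy per image, not remotely accessible
  real :: y(20, 20)[*]     ! co-array: one copy per image, readable by other images
  integer :: me, n, p, r, j, k

  me = this_image()
  n  = num_images()

  y = real(me)             ! fill the local part of the co-array
  x = 0.0
  sync all                 ! every image has initialized y before anyone reads it

  r = 1                    ! hypothetical row to fetch
  j = 1                    ! hypothetical first destination column
  p = 1                    ! hypothetical first source image

  ! One-sided get: copy row r of y from images p, p+1, p+2 into local
  ! columns j, j+1, j+2. The upper bound is clamped so the sketch also
  ! runs with fewer than three images.
  do k = 0, min(2, n - p)
     x(:, j + k) = y(r, :)[p + k]
  end do

  sync all
  if (me == 1) print *, 'x(1, 1:3) on image 1 =', x(1, 1:3)
end program caf_features
```

With GNU Fortran this should build for a single image with `gfortran -fcoarray=single`; running on several images needs a coarray runtime such as OpenCoarrays.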
5
One-sided Communication with Co-Arrays
  integer a(10,20)[*]
  if (this_image() > 1) a(1:5,1:10) = a(1:5,1:10)[this_image()-1]
  [Figure: a(10,20) on image 1, image 2, ..., image N]
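The slide shows only the copy statement. A complete program also has to order the remote read against the neighbour's own update of the same section. The sketch below, in standard Fortran 2008 syntax, does that with a private temporary and two barriers; the temporary tmp, the initial values and the program name are assumptions added for illustration.

```fortran
program shift_from_left
  implicit none
  integer :: a(10, 20)[*]   ! co-array from the slide
  integer :: tmp(5, 10)     ! private staging buffer (an assumption of this sketch)
  integer :: me

  me = this_image()
  a = me                    ! hypothetical initial data: each image stores its own index
  sync all                  ! all images have written a before any neighbour reads it

  ! One-sided get from the left neighbour into private storage.
  if (me > 1) tmp = a(1:5, 1:10)[me - 1]

  sync all                  ! all remote reads are complete before anyone overwrites a

  if (me > 1) a(1:5, 1:10) = tmp

  print *, 'image', me, ': a(1,1) =', a(1, 1), ' a(10,20) =', a(10, 20)
end program shift_from_left
```

Without the temporary, image k could be reading a(1:5,1:10) on image k-1 while image k-1 is overwriting that same section with data from image k-2.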
6
Finite Element Example (Numrich)
  subroutine assemble(start, prin, ghost, neib, x)
    integer :: start(:), prin(:), ghost(:), neib(:), k1, k2, p
    real :: x(:) [*]
    call sync_all(neib)
    do p = 1, size(neib)   ! Add contributions from ghost regions
      k1 = start(p); k2 = start(p+1)-1
      x(prin(k1:k2)) = x(prin(k1:k2)) + x(ghost(k1:k2)) [neib(p)]
    enddo
    call sync_all(neib)
    do p = 1, size(neib)   ! Update the ghosts
      k1 = start(p); k2 = start(p+1)-1
      x(ghost(k1:k2)) [neib(p)] = x(prin(k1:k2))
    enddo
    call sync_all
  end subroutine assemble
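The neighbour-only synchronization in this example, a barrier restricted to the images listed in neib, has a close relative in standard Fortran 2008: the `sync images` statement. The sketch below applies it to a much simpler exchange than the finite element assembly. The ring topology, the variable names and the data are all hypothetical.

```fortran
program ring_exchange
  implicit none
  real :: edge[*]                 ! value this image exposes to its neighbours
  real :: from_left, from_right   ! private copies of the neighbours' values
  integer :: me, n, left, right
  integer, allocatable :: neib(:)

  me = this_image()
  n  = num_images()
  left  = merge(n, me - 1, me == 1)   ! ring neighbours (hypothetical topology)
  right = merge(1, me + 1, me == n)

  ! sync images does not allow repeated image indices, so collapse
  ! duplicates (left == right when there are one or two images).
  if (left == right) then
     neib = [left]
  else
     neib = [left, right]
  end if

  edge = real(me)
  sync images (neib)      ! neighbours have published edge before it is read

  from_left  = edge[left]
  from_right = edge[right]

  sync images (neib)      ! neighbours have finished reading before edge changes
  edge = from_left + from_right

  print *, 'image', me, 'edge =', edge
end program ring_exchange
```

Each image waits only for its own neighbours, so images far apart in the ring never synchronize with each other directly.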
7
Portable CAF Compiler
  Compile CAF to Fortran 90 + runtime support library
    source-to-source code generation for wide portability
    expect best performance by leveraging vendor F90 compiler
  Co-arrays
    access data in generated code using F90 pointers
    allocate storage with dope vector initialization outside F90
  Porting to a new compiler / architecture
    synthesize compatible dope vectors for co-array storage
    tailor communication to architecture
8
CAF Compiler Status
  Near production-quality F90 front end from Open64
  Working prototype for a CAF subset
    allocate co-arrays using static constructor-like strategy
    co-array access
      remote data access uses ARMCI get/put
      process local data access uses load/store
    sync_all, sync_team synchronization
    multi-dimensional array section operations
  Successfully compiled and executed NAS MG
    platforms: SGI Origin, IA64 Myrinet
    performance similar to hand-coded MPI
9
NAS MG Efficiency (Class C), IA64/Myrinet 2000
  [Figure: efficiency chart]
10
CAF Compiler Coming Attractions
  Co-arrays as procedure arguments
  Triplet notation for co-dimensions
  Co-arrays of user-defined types
    types can contain pointers
  Dynamic allocation of co-arrays
  Compiler support for parallel I/O
11
CAF Language Research Issues
  Synchronization
    locks instead of critical sections
    split-phase primitives
    sync_team/sync_all semantics
      can require pairwise notification
      may need synchronization matching hints to enable optimization
  Language support for efficient reductions
    manually coded reductions unlikely to yield portable performance
  Memory consistency model for co-array data
  Controlling process-to-processor mapping
  Support for hierarchical locality domains
    support work sharing on SMPs?
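To illustrate the reduction issue raised on this slide, the sketch below contrasts a hand-coded global sum with a language-level collective. It uses standard Fortran syntax that postdates this review: coarrays from Fortran 2008 and the `co_sum` collective subroutine from Fortran 2018. The variable names and the data are hypothetical.

```fortran
program reduction_sketch
  implicit none
  real :: partial[*]            ! one contribution per image, remotely readable
  real :: total_manual, total_collective
  integer :: i

  partial = real(this_image())  ! hypothetical per-image contribution
  sync all                      ! all contributions are published before reading

  ! Hand-coded reduction: every image fetches every image's value, which
  ! costs num_images() remote gets per image and bakes in one fixed
  ! communication pattern regardless of the platform.
  total_manual = 0.0
  do i = 1, num_images()
     total_manual = total_manual + partial[i]
  end do

  ! Language-level reduction: the runtime is free to pick a tree, a
  ! hardware collective, or whatever suits the machine.
  total_collective = partial
  call co_sum(total_collective)

  if (this_image() == 1) then
     print *, 'manual sum     =', total_manual
     print *, 'collective sum =', total_collective
  end if
end program reduction_sketch
```

The two results may differ in the last bits because the order of floating-point additions is not fixed for the collective.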
12
CAF Compiler Research Issues
  Aim for performance transparency
  Compiler optimization of communication and I/O
    multi-mode communication: direct load/store + RDMA
    combine synchronization with communication
      put/get with flag
    one-sided two-sided communication
    transform from get to put communication
    exploit split-phase communication and synchronization
    communication vectorization
    latency hiding for communication and parallel I/O
    platform-tailored optimization
    synchronization strength reduction
  Interoperability with other parallel programming models
  Optimizations to improve node performance
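Communication vectorization, one of the optimizations listed on this slide, can be pictured at source level as turning a loop of element-wise remote reads into a single array-section transfer. The sketch below writes both forms by hand in standard Fortran 2008 coarray syntax. The array names, the size n and the ring neighbour are hypothetical, and whether a given compiler performs this transformation automatically is not something the sketch claims.

```fortran
program vectorize_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: src(n)[*]                 ! co-array read remotely below
  real :: dst_fine(n), dst_bulk(n)  ! private destinations
  integer :: i, left

  left = this_image() - 1
  if (left < 1) left = num_images() ! ring neighbour (hypothetical topology)

  src = real(this_image())
  sync all                          ! src is fully written before any remote read

  ! Fine-grained form: one remote get per element, n messages in the
  ! worst case on a message-passing platform.
  do i = 1, n
     dst_fine(i) = src(i)[left]
  end do

  ! Vectorized form: a single array-section get moves the same data.
  dst_bulk = src(:)[left]

  if (any(dst_fine /= dst_bulk)) error stop 'vectorized copy differs'
  if (this_image() == 1) print *, 'both forms fetched the same data'
end program vectorize_sketch
```

Both forms read the same unmodified data after the barrier, so they must agree exactly; the difference is only in how many transfers the runtime issues.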
13
CAF Interactions
  Working with CAF code from Numrich and Wallcraft (NRL)
  Refining ARMCI synchronization with Nieplocha
  Designing parallel I/O for CAF with UIUC
  Exploring language design with Numrich and Nieplocha
  Coordinating with Rasmussen (LANL) on Fortran 90 array dope vector interface library
  Planning a fall CAF workshop at PSC
    coordinating with Ralph Roskies, Sergiu Sanielevici
    encouragement from Rich Hirsch, Fred Johnson