November 9, 2000 PDCS-2000 A Generalized Portable SHMEM Library Krzysztof Parzyszek Ames Laboratory Jarek Nieplocha Pacific Northwest National Laboratory.


1 A Generalized Portable SHMEM Library. PDCS-2000, November 9, 2000. Krzysztof Parzyszek (Ames Laboratory), Jarek Nieplocha (Pacific Northwest National Laboratory), Ricky Kendall (Ames Laboratory)

2 Pacific Northwest National Laboratory Ames Laboratory PDCS-2000 Overview
- Introduction
  - global address space programming model
  - one-sided communication
- Cray SHMEM
- GPSHMEM: Generalized Portable SHMEM
- Implementation Approach
- Experimental Results
- Conclusions

3 Global Address Space and 1-Sided Communication
- Global address space: the collection of address spaces of processes in a parallel job
  - global address: (address, pid), e.g. (0xf5670, P0), (0xf32674, P5)
- Communication model
  - message passing: P0 sends, P1 receives
  - one-sided communication: P0 puts to P1
- Hardware examples: Cray T3E, Fujitsu VPP5000
- Language support: Co-Array Fortran, UPC

4 Motivation: global address space versus other programming models

5 One-sided communication interfaces
- First commercial implementation: SHMEM on the Cray T3D
  - put, get, scatter, gather, atomic swap
  - memory consistency issues (solved on the T3E)
  - maps well to the Cray T3E hardware, giving excellent application performance
- Vendor-specific interfaces
  - IBM LAPI, Fujitsu MPlib, NEC Parlib/CJ, Hitachi RDMA, Quadrics Elan
- Portable interfaces
  - MPI-2 1-sided (related but rather restrictive model)
  - ARMCI one-sided communication library
  - SHMEM (some platforms)
  - GPSHMEM: first fully portable implementation of SHMEM

6 History of SHMEM
- Introduced on the Cray T3D in 1993
  - one-sided operations: put, get, scatter, gather, atomic swap
  - collective operations: synchronization, reduction
  - cache not coherent w.r.t. SHMEM operations (problem solved on the T3E)
  - highest level of performance on any MPP at that time
- Increased availability
  - SGI, after purchasing Cray, ported SHMEM to IRIX systems and Cray vector systems
    - but not always with full functionality (no atomic ops on vector systems like the Cray J90)
    - extensions to match more datatypes (the SHMEM API is datatype oriented)
  - HPVM project led by Andrew Chien (UIUC/UCSD)
    - ported and extended a subset of SHMEM
    - on top of Fast Messages for Linux (later dropped) and Windows clusters
  - Quadrics/Compaq port to Elan
    - available on Linux and Tru64 clusters with the QSW switch
  - subset on top of LAPI for the IBM SP
    - internal porting tool by the IBM ACTS group at Watson

7 Characteristics of SHMEM
- Memory addressability
  - symmetric objects
  - stack, heap allocation on the T3D
  - Cray memory allocation routine shmalloc
- Ordering of operations
  - ordered in the original version on the T3D
  - out-of-order on the T3E
    - adaptive routing, added shmem_quiet
- Progress rules
  - fully one-sided, no explicit or implicit polling by remote node
  - much simpler model than MPI-2 1-sided
    - no redundant locking or remote process cooperation
[Figure: P1 calls shmem_put(a,b,n,0); a is a symmetric object present on both P0 and P1, b is a buffer on P1]
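The symmetric-object idea on this slide can be illustrated without any SHMEM library. The following is a minimal single-process sketch, assuming a simulated heap per process; every name in it (`sim_heap`, `sim_symmetric_alloc`, `sim_put`, `sim_addr`, `SIM_NPES`, `SIM_HEAP_BYTES`) is hypothetical and is not part of Cray SHMEM or GPSHMEM. It only shows why a symmetric object can be addressed on a remote process by the same offset, and that a put involves the caller alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_NPES 2          /* hypothetical: number of simulated processes */
#define SIM_HEAP_BYTES 256  /* hypothetical: size of each simulated heap */

/* One heap per simulated process; a symmetric object lives at the same
 * offset in every heap. Static storage starts out zeroed. */
static unsigned char sim_heap[SIM_NPES][SIM_HEAP_BYTES];
static size_t sim_next_offset = 0;

/* Hypothetical stand-in for a collective allocation: reserves the same
 * offset on every simulated process and returns it. */
static size_t sim_symmetric_alloc(size_t nbytes)
{
    assert(sim_next_offset + nbytes <= SIM_HEAP_BYTES);
    size_t offset = sim_next_offset;
    sim_next_offset += nbytes;
    return offset;
}

/* Hypothetical one-sided put: copy nbytes from a local buffer into the
 * symmetric object at `offset` on process `pe`. Only the caller acts;
 * the target process executes no matching receive. */
static void sim_put(size_t offset, const void *src, size_t nbytes, int pe)
{
    assert(pe >= 0 && pe < SIM_NPES);
    assert(offset + nbytes <= SIM_HEAP_BYTES);
    memcpy(&sim_heap[pe][offset], src, nbytes);
}

/* Address of a simulated process's copy of a symmetric object. */
static const void *sim_addr(size_t offset, int pe)
{
    assert(pe >= 0 && pe < SIM_NPES);
    return &sim_heap[pe][offset];
}
```

In this sketch, allocating `a` once and calling `sim_put(a, b, sizeof b, 0)` changes only process 0's copy of `a`; process 1's copy at the same offset is untouched.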

8 GPSHMEM
- Full interface of the Cray T3D SHMEM version
- Ordering of operations
- Portability restriction: must use shmalloc for memory allocation
- Extensions for block strided data transfers
  - the original Cray strided interface involved single elements
  - GPSHMEM: shmem_strided_get(prem, ploc, rstride, lstride, nbytes, nblock, proc)
[Figure: Cray SHMEM shmem_iget compared with GPSHMEM shmem_strided_get; labels prem, ploc, nblock, lstride, nbytes]
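To make the block strided extension concrete, here is a hedged sketch of the copy pattern the argument list suggests, written as a plain in-memory routine. The function name `sim_strided_get` is hypothetical, and two points are assumptions the slide does not confirm: that strides are given in bytes, and that they are measured between the starts of consecutive blocks. No communication takes place; `proc` is kept only to mirror the slide's signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of block strided get semantics within one address space.
 * Assumed (not confirmed by the slides): strides are in bytes and run
 * from the start of one block to the start of the next. */
static void sim_strided_get(const void *prem, void *ploc,
                            size_t rstride, size_t lstride,
                            size_t nbytes, size_t nblock, int proc)
{
    const unsigned char *src = prem;  /* "remote" source, simulated locally */
    unsigned char *dst = ploc;        /* local destination */
    (void)proc;                       /* unused in this simulation */

    /* Destination blocks must not overlap one another. */
    assert(nblock <= 1 || lstride >= nbytes);

    /* Copy nblock blocks of nbytes each, stepping by the two strides. */
    for (size_t i = 0; i < nblock; i++)
        memcpy(dst + i * lstride, src + i * rstride, nbytes);
}
```

Under these assumptions, fetching a 3x2 sub-block of a row-major 4x6 `int` matrix into a contiguous 3x2 array is one call with `nblock = 3`, `nbytes = 2 * sizeof(int)`, `rstride = 6 * sizeof(int)` and `lstride = 2 * sizeof(int)`, whereas an interface that strides over single elements needs one call per row or per column.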

9 GPSHMEM implementation approach
[Figure: layer diagram. Labels: SHMEM interfaces; one-sided operations (ARMCI); collective operations (message-passing library: MPI, PVM); run-time support; platform-specific communication interfaces (active messages, RMC, threads, shared memory)]

10 ARMCI portable 1-sided communication library
- Functionality
  - put, get, accumulate (also with noncontiguous interfaces)
  - atomic read-modify-write, mutexes and locks
  - memory allocation operations
- Characteristics
  - simple progress rules: truly one-sided
  - operations ordered w.r.t. target (ease of use)
  - compatible with message-passing libraries (MPI, PVM)
  - low-level system, no Fortran API
- Portability
  - MPPs: Cray T3E, Fujitsu VPP, IBM SP (uses vendors' 1-sided ops)
  - clusters of Unix and Windows systems (Myrinet, VIA, TCP/IP)
  - large servers with shared memory: SGI, Sun, Cray SV1, HP

11 Multiprotocols in ARMCI (IBM SP example)
[Figure: protocols used within and between SMP nodes. Labels: process/thread synchronization; Active Messages; threads; shared memory; remote memory copy between nodes; SMP]
- AMs used for noncontiguous transfers and atomic operations
- ARMCI_Malloc() places all user's data in shared memory!

12 Experience
- Performance studies
  - GPSHMEM overhead over SHMEM on the Cray T3E
  - comparison to MPI-2 1-sided on the Fujitsu VX-4
- Applications (see paper)
  - matrix multiplication on a Linux cluster
  - porting Cray T3E codes

13 GPSHMEM Overhead on the T3E
- Approach
  - renamed GPSHMEM calls to avoid conflict with Cray SHMEM
  - collected latency and bandwidth numbers
- Overhead
  - shmem_put: 3.5 µs
  - shmem_get: 3 µs
  - bandwidth is the same since GPSHMEM and ARMCI do not add extra memory copies
- Discussion
  - the overhead includes GPSHMEM and ARMCI
  - reflects address conversion
    - searching table of addresses for allocated objects
    - can be avoided when addresses are identical
[Figure: software layers labeled ARMCI, GPSHMEM, Cray SHMEM]
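The address conversion step mentioned under Discussion can be pictured as a lookup in a table of allocated objects. The sketch below is an illustration under assumptions, not the GPSHMEM or ARMCI data structure: the types `sim_obj` and `sim_table`, the function `sim_translate`, and the limits `SIM_NPES` and `SIM_MAX_OBJS` are all hypothetical. It assumes each entry records an object's size and its base address on every process, and uses a linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_NPES 4      /* hypothetical process count */
#define SIM_MAX_OBJS 8  /* hypothetical table capacity */

/* One entry per allocated object: its size and its base address on
 * every process (bases may differ from process to process). */
struct sim_obj {
    size_t    nbytes;
    uintptr_t base[SIM_NPES];
};

struct sim_table {
    int            count;
    struct sim_obj objs[SIM_MAX_OBJS];
};

/* Translate `addr`, valid on process `me`, into the matching address on
 * process `pe`. Returns 0 if addr lies in no registered object. */
static uintptr_t sim_translate(const struct sim_table *t,
                               uintptr_t addr, int me, int pe)
{
    assert(me >= 0 && me < SIM_NPES && pe >= 0 && pe < SIM_NPES);
    for (int i = 0; i < t->count; i++) {
        const struct sim_obj *o = &t->objs[i];
        /* Is addr inside this object's range on the calling process? */
        if (addr >= o->base[me] && addr - o->base[me] < o->nbytes)
            return o->base[pe] + (addr - o->base[me]);
    }
    return 0;
}
```

When an object has the same base address on every process, the translated address equals the input, which is the case the slide notes can skip the search.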

14 Performance of GPSHMEM and MPI-2 on the Fujitsu VX-4

15 Conclusions
- Described a fully portable implementation of a SHMEM-like library
  - SHMEM becomes a viable alternative to MPI-2 1-sided
  - good performance, closely tied to ARMCI
  - offers potentially wide portability to other tools based on SHMEM
    - e.g. Co-Array Fortran
- Cray SHMEM API is incomplete for strided data structures
  - extensions for block strided transfers improve performance
- More work with applications is needed to drive future extensions and development
- Code availability: rickyk@ameslab.gov

