Slide 1: Deep Computing Messaging Framework
Lightweight Communication for Petascale Supercomputing
Supercomputing 2008
Michael Blocksome, blocksom@us.ibm.com
© 2008 IBM Corporation
Slide 2: DCMF Open Source Community
Open source community established January 2008
Wiki
– http://dcmf.anl-external.org/wiki
Mailing list
– dcmf@lists.anl-external.org
Git source repository
– helpful git resources on the wiki
– git clone http://dcmf.anl-external.org/dcmf.git/
Slide 3: Design Goals
Scalable to millions of tasks
Efficient on low-frequency embedded cores
– inlined systems programming interface (SPI)
Supports many programming paradigms
– active messages
– multiple contexts
– multiple levels of application interfaces
Structured component design
– extensible to new architectures
– software architecture for multiple networks
– open source runtime with external contributions
Separate library for optimized collectives
– hardware acceleration
– software collectives
Slide 4: IBM® Blue Gene®/P Messaging Software Stack
[Diagram: layered messaging stack]
Application layer: applications (e.g. QCD), MPICH2 (via the dcmfd ADI), Berkeley UPC (via GASNet), Global Arrays (via ARMCI), Charm++, and direct DCMF applications
Library portability layer: DCMF public API, DCMF (C++), and CCMI, over the DMA systems programming interface (SPI)
Hardware: BG/P network hardware
Legend distinguishes IBM supported software from externally supported software.
Slide 5: Direct DCMF Application Programming
dcmf.h – core interface
– point-to-point and utilities
– all functions implemented
Collectives interface(s)
– may or may not be implemented
– check the return value on register!
Collective Component Messaging Interface (CCMI)
– high-level collectives library
– uses the multisend interface
– extensible to new collectives
[Diagram: application and adaptor sit above dcmf.h (all point-to-point), dcmf_collectives.h / CCMI (high-level collectives), dcmf_globalcollectives.h (global collectives), and dcmf_multisend.h (multisend collectives); DCMF protocols, devices, and sysdep layers in the messager sit above the BG/P hardware]
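The "check the return value on register" rule can be shown with a minimal sketch. Only dcmf.h, dcmf_collectives.h, DCMF_Messager_advance(), and the DCMF_UNIMPL return code come from these charts; every other name, type, and signature below is an assumption for illustration, not the confirmed API:

    /* Hypothetical sketch: all point-to-point entry points in dcmf.h are
     * implemented, but a collectives register call may return DCMF_UNIMPL,
     * so the return value must be checked before the protocol is used.
     * Types and signatures here are assumed for illustration. */
    #include <dcmf.h>
    #include <dcmf_collectives.h>
    #include <cstdio>

    int main()
    {
        DCMF_Messager_initialize();                 /* assumed init call */

        DCMF_CollectiveProtocol_t broadcast;        /* assumed handle type */
        DCMF_Broadcast_Configuration_t config = {}; /* assumed config type */
        if (DCMF_Broadcast_register(&broadcast, &config) != DCMF_SUCCESS) {
            /* Collective not implemented on this target: fall back to an
             * algorithm layered over the always-present point-to-point. */
            std::fprintf(stderr, "broadcast unimplemented, using fallback\n");
        }

        DCMF_Messager_advance();                    /* poll for progress */
        DCMF_Messager_finalize();                   /* assumed teardown call */
        return 0;
    }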
Slide 6: DCMF Blue Gene/P Performance
Point-to-point latency:
  DCMF Eager one-way: 1.6 µs
  MPI Eager one-way: 2.4 µs
  MPI Rendezvous one-way: 5.6 µs
  DCMF Put: 0.9 µs
  DCMF Get: 1.6 µs
  ARMCI blocking put: 2.0 µs
  ARMCI blocking get: 3.3 µs
Collectives on 512 nodes (SMP):
  MPI Barrier: 1.3 µs
  MPI Allreduce (int sum): 4.3 µs
  MPI Broadcast: 4.3 µs
  MPI Allreduce throughput: 817 MB/sec
  MPI Bcast throughput: 2.0 GB/sec
MPI achieves 4300 MB/sec (96% of peak) for torus near-neighbor communication on 6 links.
Barriers are accelerated via the global interrupt network; allreduce and broadcast operations are accelerated via the collective network. Large broadcasts take advantage of the 6 edge-disjoint routes on a 3D torus.
Slide 7: Why Use DCMF?
Scales on BG/P to millions of tasks
– high efficiency, low overhead
Open source
– active community support
Easy to port applications and libraries to the DCMF interface
Unique features of DCMF
– see the next chart
Slide 8: Feature Comparison (to the best of our knowledge)

Feature                                        MX    VERBS      LAPI  ELAN      DCMF
Multiple Contexts                              N     Y          Y     Y         Y
Active Messages                                N     N[1]       Y     Y         Y
One-sided calls                                N     Y          Y     Y         Y
Strided or Vector calls                        N[1]  N[1]       Y     Y         N[2]
Multi-send calls                               N[1]  N[1]       N[1]  N[1]      Y
Message Ordering and Consistency               N     N          N     N         Y
Device interface for many different networks   N     Y (C API)  N     N         Y[3] (C++)
Topology Awareness                             N     N          N     N         Y
Architecture Neutral                           N     Y          Y     N         Y
Non-blocking optimized collectives             N[1]  N[1]       N[1]  Blocking  Y

[1] This feature can be implemented in software on top of the provided set of features in this API, at possibly lower efficiency.
[2] Non-contiguous transfer operation to be added.
[3] Device-level programming is available at the protocol level, not the API.
Slide 9: DCMF C API Features
Multiple context registration
– supports multiple, concurrent communication paradigms
Memory consistency
– one-sided communication APIs like UPC and ARMCI need optimized support for memory consistency levels
Active messaging
– good match for Charm++ and other active-message runtimes
– MPI can be easily supported
Multisend protocols
– amortize startup across many messages sent together
Topology awareness
Optimized protocols
See dcmf.h
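To make the active-messaging model concrete, here is a hedged sketch of posting a send and polling for completion. DCMF_Send() and DCMF_Messager_advance() are named in these charts; the request, callback, and consistency types and the exact DCMF_Send() argument list are assumptions:

    /* Hypothetical active-message send: the remote side runs a receive
     * handler registered at protocol setup; the local callback fires when
     * the send buffer may be reused. Signatures are assumed. */
    #include <dcmf.h>

    static volatile int send_complete = 0;

    /* Local completion callback (assumed DCMF_Callback_t shape). */
    static void on_send_done(void *clientdata, DCMF_Error_t * /*error*/)
    {
        *static_cast<volatile int *>(clientdata) = 1;
    }

    void send_active_message(DCMF_Protocol_t *send_protocol, size_t peer,
                             char *buffer, size_t bytes)
    {
        DCMF_Request_t request;                           /* assumed type */
        DCMF_Callback_t done = { on_send_done, (void *)&send_complete };

        /* Post the message; no matching receive is required on the peer
         * (active-message semantics). Argument list is assumed. */
        DCMF_Send(send_protocol, &request, done, DCMF_MATCH_CONSISTENCY,
                  peer, bytes, buffer, /*msginfo=*/NULL, /*count=*/0);

        while (!send_complete)
            DCMF_Messager_advance();  /* explicitly poll the devices */
    }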
Slide 10: Extending DCMF to Other Architectures
Copy the "Linux® sockets" messager and its build options
– contains the sockets device and a DCMF_Send() protocol
– implements the core API; returns DCMF_UNIMPL for collectives
A new architecture only needs to implement DCMF_Send()
– the sockets device enables DCMF on Linux clusters
– the shmem device enables DCMF on multi-core systems
DCMF provides default point-to-point implementations layered over send
– DCMF_Put()
– DCMF_Get()
– DCMF_Control()
Selectively implement architecture devices and optimized protocols
– assign to DCMF_USER0_SEND_PROTOCOL (for example) to test, as sketched below
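A hedged sketch of the test step from the last bullet: registering an experimental send path under the spare DCMF_USER0_SEND_PROTOCOL slot. DCMF_Send(), DCMF_Put(), DCMF_Get(), DCMF_Control(), and DCMF_USER0_SEND_PROTOCOL are named on this chart; the registration types, fields, and signature are assumptions:

    /* Hypothetical porting test: wire a new device's send implementation
     * into the spare user protocol slot so it can be exercised alongside
     * the defaults. Types, fields, and signatures are assumed. */
    #include <dcmf.h>

    void register_experimental_send()
    {
        DCMF_Protocol_t protocol;                   /* assumed handle type */
        DCMF_Send_Configuration_t config = {};      /* assumed config type */
        config.protocol = DCMF_USER0_SEND_PROTOCOL; /* assumed field */
        /* ... receive-callback fields for the new device elided ... */

        if (DCMF_Send_register(&protocol, &config) != DCMF_SUCCESS) {
            /* The new device is not functional yet; DCMF_Put(), DCMF_Get(),
             * and DCMF_Control() still work through the default
             * implementations layered over the existing send path. */
        }
    }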
Slide 11: Upcoming Features* (*nothing promised)
Common Device Interface (CDI)
– POSIX shared memory
– sockets
– InfiniBand
Multi-channel advance
– a thread may advance a "slice" of the messaging devices
– dedicated threads result in uncontested locks for high-level communication libraries
Add a blocking advance API (see the sketch after this list)
– eliminates explicit processor polls on supported hardware
– may degrade to a regular DCMF_Messager_advance() on unsupported hardware
Extend the API to access Blue Gene® features in a portable manner
– network and device structures
– replace the hardware struct with key-value pairs
Noncontiguous point-to-point one-sided
– an iterator can be used to implement all other interfaces (strided, vector, etc.)
One-sided "on the fly" (ad hoc) collectives
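The blocking-advance item contrasts with today's explicit polling. The sketch below shows the current poll loop using DCMF_Messager_advance() from these charts; the blocking variant's name is hypothetical, derived only from this roadmap bullet:

    #include <dcmf.h>

    extern volatile int transfer_done;  /* set by a completion callback */

    void wait_for_transfer()
    {
        /* Today: spin on the advance call, which occupies a core polling. */
        while (!transfer_done)
            DCMF_Messager_advance();

        /* Proposed: a blocking advance (hypothetical name) that sleeps
         * until the network needs attention on supported hardware, and
         * degrades to a plain DCMF_Messager_advance() poll elsewhere:
         *
         *   while (!transfer_done)
         *       DCMF_Messager_advance_blocking();
         */
    }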
Slide 12: DCMF Device Abstraction
At the core of DCMF is a "device", with a packet API abstraction and a DMA API abstraction.
In principle the functions are virtual; in practice the methods are inlined for performance
– Barton-Nackman C++ templates (sketched below)
Common Device Interface (CDI)
– implement this interface and you get all of DCMF "for free"
– good for rapid prototypes
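The Barton-Nackman template technique behind the device layer can be shown with a generic sketch. This models the compile-time dispatch idea only; the class and method names are invented and are not the real DCMF hierarchy:

    // Illustrative CRTP ("Barton-Nackman") sketch of a device interface:
    // calls resolve at compile time and inline, with no virtual dispatch.
    #include <cstddef>

    template <class T>
    class PacketDevice {
    public:
        // Statically dispatch to the concrete device; typically inlined.
        int writePacket(const void *payload, std::size_t bytes) {
            return static_cast<T *>(this)->writePacketImpl(payload, bytes);
        }
        int advance() {
            return static_cast<T *>(this)->advanceImpl();
        }
    };

    class ShmemDevice : public PacketDevice<ShmemDevice> {
    public:
        int writePacketImpl(const void *payload, std::size_t bytes) {
            /* copy the payload into a shared-memory FIFO ... */
            (void)payload; (void)bytes;
            return 0;
        }
        int advanceImpl() { /* poll the FIFO for arrivals ... */ return 0; }
    };

    // Protocol code written against PacketDevice<T> works with any device
    // implementing the interface, without virtual-function overhead.
    template <class T>
    int sendPacket(PacketDevice<T> &dev, const void *p, std::size_t n) {
        return dev.writePacket(p, n);
    }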
Slide 13: Current DCMF Devices
Blue Gene/P
– DMA / 3-D torus network
– collective network
– global interrupt network
– lockbox / memory atomics
Generic
– sockets (hybrid compatible)
– shared memory (hybrid compatible)
– InfiniBand (hybrid compatible)
Slide 14: Other DCMF Projects
IBM – Roadrunner
Argonne National Laboratory – MPICH2, ZeptoOS
Pacific Northwest National Laboratory – Global Arrays / ARMCI
Berkeley – UPC / GASNet
University of Illinois at Urbana-Champaign – Charm++
Slide 15: Open Source Project Ideas (in no particular order)
Store-and-forward protocols
Stream API
Channel combining; message striping across devices
Extend to other process managers (Open MPI, etc.)
Extend to other platforms (OS X, BSD, Windows, ...?)
DCMF functional and performance test suite
Scalability improvements for sockets and InfiniBand
Combined shmem/sockets messager
GPU device? Hybrid model?
Shared-memory collectives
Slide 16: How Can We Be a More Effective Open Source Project?
How can we improve the open source experience?
Specific needs and directions?
Missing features?
Slide 17: Additional Charts
DCMF on Linux Clusters
DCMF on InfiniBand
Slide 18: DCMF on Linux Clusters
Slide 19: DCMF on Linux Clusters
Build instructions on the wiki
– http://dcmf.anl-external.org/wiki/index.php/Building_DCMF_for_Linux
Test environment for application developers
– evaluate the DCMF API and runtime
– port applications to DCMF before reserving time on Blue Gene/P
Uses MPICH2 PMI for job launch and management
– needs a pluggable job launch and sysdep extension to remove the MPICH2 dependency
Implemented devices
– sockets device
– shmem device
Slide 20: DCMF Sockets Device
Standard sockets syscalls are implemented on many architectures
Uses the "packet" CDI
– a new "stream" CDI may provide better performance
Current design is not scalable
– primarily a development and porting platform
Can be used to initialize other devices that require synchronization
Slide 21: DCMF Shmem Device
Uses the "packet" CDI
Only point-to-point send
Thread safe; allows multiple threads to post messages to the device
No collectives
Slide 22: DCMF on InfiniBand
Slide 23: DCMF InfiniBand Motivations
Optimize for low-power processors and large "fat" nodes
InfiniBand project lead: Charles Archer
– communicate via the DCMF mailing list
Slide 24: DCMF InfiniBand Device
Implements the "rdma" CDI version
– direct RDMA
– memregions
Implements the "packet" CDI version
– "eager"-style sends
rdma CDI design
– SRQ: scalable, but worst latency
packet CDI design
– per-destination RDMA with send/recv
– per-destination RDMA with direct DMA: best latency
Slide 25: DCMF InfiniBand – Future Work
Remove artificial limits on scalability (currently 32 nodes)
Implement memregion caching
Multiple adaptor support (?)
Switch management routines (?)
Multiple network implementations
– SRQ and "per destination"
Async progress through IB events