CC-MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters
Amit Karwande, Xin Yuan
Department of Computer Science, Florida State University
David K. Lowenthal
Department of Computer Science, University of Georgia
Outline:
Motivation
Related work
CC-MPI
–One-to-many(all) communications
–Many(all)-to-many(all) communications
Performance study
Conclusion
Motivation:
–Traditional communication libraries (e.g., MPI) hide network details and provide a simple API.
»Advantage: user friendly.
»Limitation: the opportunity for communication optimization is limited.
–Optimizations can be done either in the compiler or in the library:
»Architecture-independent optimizations in the compiler.
»Architecture-dependent optimizations in the library, but such optimizations can only be done for a single routine.
Compiled Communication:
–At compile time, use both the application's communication information and the network architecture information to perform communication optimizations:
»Static management of network resources.
»Compiler-directed architecture-dependent optimizations.
»Architecture-dependent optimizations across communication patterns.
–To apply the compiled communication technique to MPI programs:
»The library must closely match the MPI library.
»The library must support the optimizations used in compiled communication by exposing network details and by providing different implementations of each routine so that the user can choose the best one.
–This work focuses on the compiled communication capable library itself.
Related Work:
–Compiler-directed architecture-dependent optimization [Hinrichs94]
–Compiled communication [Bromley91, Cappello95, Kumar92, Yuan03]
–MPI optimizations [Ogawa96, Lauria97, Tang00, Kielmann99]
CC-MPI:
–Optimizes one-to-all, one-to-many, all-to-all, and many-to-many communications.
–Targets Ethernet switched clusters.
–Basic idea:
»Separate network control routines from data transmission routines.
»Provide multiple implementations of each MPI routine.
One-to-many(all) communications:
–Multicast-based implementations.
»Reliable multicast (IP multicast is unreliable): use a simple ACK-based protocol (a sketch follows this list).
–Group management:
»A group must be created before any communication can be performed.
»There are 2^n potential groups for n members, and the hardware limits the number of simultaneous groups.
»CC-MPI supports three group management schemes: static, dynamic, and compiler-assisted.
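A minimal sketch of the ACK-based reliable multicast idea, assuming hypothetical helpers mcast_send() and recv_ack() (the slides do not show CC-MPI's actual protocol, so packet framing, sequence numbers, and duplicate-ACK handling are omitted):

/* Sketch only: the root multicasts a packet over unreliable IP
 * multicast, then waits for a unicast ACK from every group member,
 * retransmitting on timeout.  mcast_send() and recv_ack() are
 * hypothetical helpers, not CC-MPI routines. */
void reliable_mcast_send(const void *buf, int len, int nmembers)
{
    int acked = 0;
    mcast_send(buf, len);                /* unreliable IP multicast */
    while (acked < nmembers) {
        if (recv_ack(ACK_TIMEOUT) == 0)  /* one member acknowledged */
            acked++;
        else
            mcast_send(buf, len);        /* timeout: retransmit     */
    }
}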
Static group management:
–Associate a multicast group with each communicator statically.
»MPI_Bcast: send one reliable multicast message to the group.
»MPI_Scatter: aggregate the messages for all nodes and send the aggregated message to the group; each receiver extracts its own portion.
»MPI_Scatterv: two broadcasts, one for the layout of the data and one for the data itself (see the sketch below).
–Problem: for one-to-many communications, nodes that are not part of the communication must still participate in the reliable multicast protocol.
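The following sketch illustrates the two-broadcast MPI_Scatterv strategy, with standard MPI_Bcast standing in for CC-MPI's reliable multicast; the function name and the packed-buffer layout are assumptions for illustration:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: scatterv as two broadcasts to a static group.
 * Broadcast 1 carries the layout (counts and displacements);
 * broadcast 2 carries the aggregated data, from which each
 * receiver extracts its own slice.  Assumes displacements are
 * in ascending order. */
void scatterv_two_bcasts(const char *sendbuf, const int *counts,
                         const int *displs, char *recvbuf,
                         int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int *layout = malloc(2 * size * sizeof(int));
    if (rank == root) {
        memcpy(layout, counts, size * sizeof(int));
        memcpy(layout + size, displs, size * sizeof(int));
    }
    MPI_Bcast(layout, 2 * size, MPI_INT, root, comm);    /* layout */

    int total = layout[2 * size - 1] + layout[size - 1]; /* last displ + count */
    char *agg = malloc(total);
    if (rank == root)
        memcpy(agg, sendbuf, total);
    MPI_Bcast(agg, total, MPI_CHAR, root, comm);         /* data   */

    /* Each receiver extracts its own portion. */
    memcpy(recvbuf, agg + layout[size + rank], layout[rank]);
    free(agg);
    free(layout);
}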
Dynamic group management:
–Dynamically create a new group for each one-to-many communication.
–May introduce too much group management overhead.
Compiler-assisted group management:
–Extend the MPI API so that users can manage multicast groups directly. For example, MPI_Scatterv becomes three routines:
»MPI_Scatterv_open_group
»MPI_Scatterv_data_movement
»MPI_Scatterv_close_group
–Users may move, merge, and delete the control routines when additional information is available.
An example:
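A hedged sketch of such an example: the group-management calls are hoisted out of a loop so the group is created and torn down once rather than per iteration (the argument lists are assumptions, since only the routine names appear above):

/* Sketch (assumed signatures): network control is hoisted out of
 * the loop, so the multicast group is created and destroyed once;
 * only the data movement repeats. */
MPI_Scatterv_open_group(scounts, displs, MPI_CHAR, root, comm);
for (i = 0; i < NITER; i++) {
    compute_step(sendbuf, i);   /* produce this iteration's data */
    MPI_Scatterv_data_movement(sendbuf, scounts, displs, MPI_CHAR,
                               recvbuf, rcount, MPI_CHAR, root, comm);
}
MPI_Scatterv_close_group(root, comm);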
All(many)-to-all(many) communications:
–MPI_Alltoall, MPI_Alltoallv, MPI_Allgather, etc.
–Multicast-based implementations may not be efficient.
–We need to distinguish between communications with small messages and those with large messages:
»Small messages: each node sends as fast as it can.
»Large messages: use some mechanism to reduce contention, namely phased communication [Hinrichs94]: partition the all-to-all communication into phases such that there is no network contention within a phase, and use barriers to separate the phases so that they do not interfere with each other.
–Phased communication for all-to-all communication has been well studied for many topologies (a sketch follows).
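A classic contention-free schedule for all-to-all on n nodes uses n-1 phases in which rank r sends to (r+i) mod n and receives from (r-i+n) mod n in phase i; a minimal sketch, assuming one fixed-size block per peer:

#include <mpi.h>
#include <string.h>

/* Sketch: phased all-to-all for large messages.  Every node sends
 * and receives exactly one message per phase, and the barrier keeps
 * consecutive phases from interfering. */
void phased_alltoall(const char *sendbuf, char *recvbuf,
                     int blocksize, MPI_Comm comm)
{
    int rank, n, i;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &n);

    /* Phase 0: the local block needs no network traffic. */
    memcpy(recvbuf + rank * blocksize,
           sendbuf + rank * blocksize, blocksize);

    for (i = 1; i < n; i++) {
        int dst = (rank + i) % n;
        int src = (rank - i + n) % n;
        MPI_Sendrecv((char *)(sendbuf + dst * blocksize), blocksize,
                     MPI_CHAR, dst, 0,
                     recvbuf + src * blocksize, blocksize,
                     MPI_CHAR, src, 0, comm, MPI_STATUS_IGNORE);
        MPI_Barrier(comm);   /* separate phases */
    }
}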
Phased communication for many-to-many communication (MPI_Alltoallv):
–All nodes must know the communication pattern:
»Use MPI_Allgather before anything else is done, or
»Assume the compiler has the information and has stored it in local data structures.
–Communication scheduling (a greedy sketch follows):
»Greedy scheduling
»All-to-all based scheduling
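A hedged sketch of greedy scheduling: messages are repeatedly packed into the current phase as long as neither endpoint is already busy, so no node sends or receives twice within a phase (the message-list representation is an assumption):

#include <stdlib.h>

/* Sketch: greedily partition an arbitrary message pattern into
 * contention-free phases.  msgs[k][0] is the sender and msgs[k][1]
 * the receiver of message k; on return, phase_of[k] holds message
 * k's phase, and the number of phases used is returned. */
int greedy_schedule(int nmsgs, const int msgs[][2], int nnodes,
                    int phase_of[])
{
    int remaining = nmsgs, phase = 0, k;
    char *assigned = calloc(nmsgs, 1);

    while (remaining > 0) {
        char *sbusy = calloc(nnodes, 1);  /* senders used this phase   */
        char *rbusy = calloc(nnodes, 1);  /* receivers used this phase */
        for (k = 0; k < nmsgs; k++) {
            int s = msgs[k][0], d = msgs[k][1];
            if (assigned[k] || sbusy[s] || rbusy[d])
                continue;                 /* endpoint already claimed  */
            sbusy[s] = rbusy[d] = 1;
            phase_of[k] = phase;
            assigned[k] = 1;
            remaining--;
        }
        free(sbusy);
        free(rbusy);
        phase++;   /* each pass schedules at least one message */
    }
    free(assigned);
    return phase;
}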
CC-MPI supports four methods for MPI_Alltoallv:
–All nodes send as fast as possible.
–Phased communication, level 1 (see the pattern-exchange sketch below):
»MPI_Allgather for the pattern information
»Communication scheduling
»The actual phased communication
–Phased communication, level 2 (pattern is known):
»Communication scheduling
»The actual phased communication
–Phased communication, level 3 (phases are known):
»The actual phased communication
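For level 1, every node can learn the full pattern by allgathering its per-destination send counts before scheduling; a minimal sketch, with the array naming assumed:

#include <mpi.h>

/* Sketch: level-1 pattern exchange.  my_counts has one entry per
 * destination rank; afterwards pattern[r*n + d] gives the amount
 * rank r sends to rank d, so all nodes can run the same scheduler
 * (e.g., greedy_schedule above) and agree on the phases. */
void gather_pattern(const int *my_counts, int *pattern, MPI_Comm comm)
{
    int n;
    MPI_Comm_size(comm, &n);
    MPI_Allgather((void *)my_counts, n, MPI_INT,
                  pattern, n, MPI_INT, comm);
}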
Performance Study:
–Environment: 29 nodes (Pentium III 650MHz) connected by a 100Mbps Ethernet switch.
–LAM/MPI version 6.5.4 in c2c mode.
–MPICH version 1.2.4 with the ch_p4 device.
Evaluation of individual routines:
MPI_Bcast:
MPI_Scatter:
MPI_Scatterv (5 to 5 out of 29 nodes):
MPI_Allgather (16 nodes):
MPI_Alltoall (16 nodes):
MPI_Alltoallv (alltoall pattern on 16 nodes):
MPI_Alltoallv (random pattern):
Benchmark Program (IS):
Benchmark Program (FT):
CC-MPI for software DSM (a synthetic application):
Conclusion:
–We developed a compiled communication capable MPI prototype.
–We demonstrated that giving users more control over communication can yield significant performance improvements.
–Compiler support is needed for this model to be successful.
http://www.cs.fsu.edu/~xyuan/CCMPI