1
Benjamin Perry and Martin Swany
University of Delaware, Computer and Information Sciences
2
Background
The problem
The solution
The results
Conclusions and future work
3
MPI programs communicate via MPI data types
MPI data types are usually modeled after native data types
Payloads are often arrays of MPI data types
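For illustration, a minimal sketch of a native C struct and a matching MPI derived type built with MPI_Type_create_struct (the struct and field names are hypothetical):

#include <mpi.h>
#include <stddef.h>

/* Hypothetical native type; the MPI type below mirrors its layout. */
typedef struct {
    double position[3];
    double velocity[3];
    int    id;
} Particle;

void build_particle_type(MPI_Datatype *newtype) {
    int          blocklens[3] = {3, 3, 1};
    MPI_Aint     displs[3]    = {offsetof(Particle, position),
                                 offsetof(Particle, velocity),
                                 offsetof(Particle, id)};
    MPI_Datatype types[3]     = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT};

    MPI_Type_create_struct(3, blocklens, displs, types, newtype);
    MPI_Type_commit(newtype);
    /* Payloads are typically arrays: MPI_Send(buf, n, *newtype, ...) */
}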
4
The sending MPI library packs the payload into a contiguous block
The receiving MPI library unpacks the payload into its original form
Non-contiguous blocks incur a copy penalty
SPMD programs, particularly in homogeneous environments, can use optimized packing
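The copy penalty is easiest to see with the explicit pack API; for non-contiguous types the library performs an equivalent staging copy internally. A minimal sketch, assuming an already-committed datatype handle (names hypothetical):

#include <mpi.h>
#include <stdlib.h>

/* Conceptual sketch of the copy penalty: packing a non-contiguous
 * datatype copies every element into a staging buffer before it
 * hits the wire, and copies it out again on the receiver. */
void send_packed(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm) {
    int size, pos = 0;
    MPI_Pack_size(count, type, comm, &size);   /* staging buffer size */
    void *staging = malloc(size);
    MPI_Pack(buf, count, type, staging, size, &pos, comm);  /* the copy */
    MPI_Send(staging, pos, MPI_PACKED, dest, tag, comm);
    free(staging);
}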
5
Background
The problem
The solution
The results
Conclusions and future work
6
Users model MPI types after native types
Some fields do not need to be transmitted
Users often replace dead fields with a gap in the MPI type, keeping it aligned with the native type
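A sketch of the scenario (names hypothetical): one field is never sent, so the user skips it in the MPI type, leaving a hole in the displacements:

#include <mpi.h>
#include <stddef.h>

/* 'scratch' is never transmitted, so the user omits it from the
 * MPI type, which keeps the remaining displacements aligned with
 * the native struct but leaves a gap. */
typedef struct {
    double x, y;      /* transmitted */
    double scratch;   /* dead field: local use only */
    int    id;        /* transmitted */
} Cell;

void build_cell_type(MPI_Datatype *newtype) {
    int          blocklens[2] = {2, 1};
    MPI_Aint     displs[2]    = {offsetof(Cell, x), offsetof(Cell, id)};
    MPI_Datatype types[2]     = {MPI_DOUBLE, MPI_INT};

    /* The hole between displs[0] + 2*sizeof(double) and displs[1]
     * makes the type non-contiguous: packing now requires a copy. */
    MPI_Type_create_struct(2, blocklens, displs, types, newtype);
    MPI_Type_commit(newtype);
}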
7
Smaller payload, but the MPI type is non-contiguous
◦ Copy penalty during packing and unpacking
Multi-core machines and high-performance networks feel the cost, depending on payload size
Multi-core machines are becoming ubiquitous
◦ SPMD applications are ideal for these platforms
8
Background
The problem
The solution
The results
Conclusions and future work
9
Applies only to SPMD applications
Static analysis to locate MPI data types
◦ MPI_Type_struct()
Build an internal representation of the MPI data type
◦ The MPI data type is defined via a library call at runtime
◦ Parameters indicate base types, consecutive instances, and displacements
◦ Def/use analysis recovers the static definition (see the sketch below)
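A sketch of the code pattern the pass matches; MPI_Type_struct is the legacy spelling of MPI_Type_create_struct, and all names and values here are hypothetical:

#include <mpi.h>

/* The pass pattern-matches calls like this one and follows def/use
 * chains on each argument array back to its static initializer,
 * recovering base types, block lengths (consecutive instances),
 * and displacements at compile time. */
void define_type(MPI_Datatype *newtype) {
    int          blocklens[2] = {2, 1};   /* consecutive instances */
    MPI_Aint     displs[2]    = {0, 24};  /* byte offsets: note the hole */
    MPI_Datatype types[2]     = {MPI_DOUBLE, MPI_INT};

    MPI_Type_struct(2, blocklens, displs, types, newtype);
    MPI_Type_commit(newtype);
}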
10
Look for gaps in the displacement array
◦ Expected block end: size of each base type multiplied by its number of consecutive instances (see the sketch below)
Match the MPI type to a native type
◦ Analyze the types of the payload
◦ The MPI type must be a subset of the native data structure
◦ All sends and receives with the MPI type handle must also share the same base types
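The pass evaluates this test over compile-time constants; a runtime sketch of the same logic, with a hypothetical helper name:

#include <mpi.h>

/* After each block, the next displacement should start where the
 * previous block ends; anything later means the MPI type skips
 * bytes of the native type. */
int has_gap(int count, const int blocklens[],
            const MPI_Aint displs[], const MPI_Datatype types[]) {
    for (int i = 0; i + 1 < count; i++) {
        int size;
        MPI_Type_size(types[i], &size);
        MPI_Aint end = displs[i] + (MPI_Aint)blocklens[i] * size;
        if (displs[i + 1] > end)
            return 1;   /* hole found: type is non-contiguous */
    }
    return 0;
}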
11
Perform the transformation on the MPI type and the native type
◦ Adjust the parameters in MPI_Type_struct
◦ Relocate non-transmitted fields to the bottom of the type (sketched below)
End goal: improve the library's packing performance for large arrays
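A sketch of the result of the transformation applied to the Cell example above (names hypothetical):

#include <mpi.h>
#include <stddef.h>

/* After the transformation: the dead field is moved to the bottom
 * of the native struct, so the transmitted fields form a contiguous
 * prefix and the MPI type packs without a copy. */
typedef struct {
    double x, y;      /* transmitted */
    int    id;        /* transmitted */
    double scratch;   /* relocated dead field */
} CellOpt;

void build_cellopt_type(MPI_Datatype *newtype) {
    int          blocklens[2] = {2, 1};
    MPI_Aint     displs[2]    = {offsetof(CellOpt, x), offsetof(CellOpt, id)};
    MPI_Datatype types[2]     = {MPI_DOUBLE, MPI_INT};

    MPI_Type_create_struct(2, blocklens, displs, types, newtype);
    MPI_Type_commit(newtype);  /* blocks are now back-to-back */
}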
12
Safety checks (examples follow)
◦ No casts to another type
◦ No address-of, except for computing displacements
◦ No non-local types
Profitability checks
◦ Sends / receives within loops
◦ Large arrays of MPI types in sends / receives
◦ Cost of cache misses and locality changes when the native type is used in loops
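Hypothetical examples of uses that fail the safety check; both make the native layout observable, so fields cannot be reordered:

/* Patterns that inhibit the transformation (names hypothetical). */
typedef struct { double x, y, scratch; int id; } Cell;

void unsafe_uses(Cell *c) {
    char   *raw = (char *)c;     /* cast to another type: code may
                                    index the struct by raw bytes   */
    double *p   = &c->scratch;   /* address-of a field escapes
                                    (allowed only when it merely
                                    computes a displacement)        */
    (void)raw; (void)p;
}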
13
Background
The problem
The solution
The results
Conclusions and future work
14
LLVM compiler pass
Open MPI
Intel Core 2 quad-core, 2.4 GHz
Ubuntu
Control: sending the unoptimized data type with a gap, using payloads of various sizes
Tested: rearranging the gap in the MPI type and native type, using payloads of various sizes
17
Background
The problem
The solution
The results
Conclusions and future work
18
MPI data types are modeled after native data types
Users introduce gaps, making data non-contiguous and costly to pack on fast networks
Discover this scenario at compile time
Fix it if safe and profitable
Greatly improves multi-core performance; InfiniBand also receives a boost
19
Data type fission with user-injected gaps
◦ Separate transmitted fields from non-transmitted fields
◦ Completely eliminates the data copy during packing (sketched below)
Data type fission with unused fields
◦ Perform analysis on the receiving end to see which fields are actually used
◦ Cull unused fields from the data type; then perform fission
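A sketch of what fission might look like at the source level; all names are hypothetical, and the committed cell_tx_type is assumed to describe CellTx exactly, with extent sizeof(CellTx):

#include <mpi.h>

/* Fission: transmitted and non-transmitted fields are split into
 * two structs held in parallel arrays, so the send buffer is fully
 * contiguous and packing involves no staging copy at all. */
typedef struct { double x, y; int id; } CellTx;    /* transmitted */
typedef struct { double scratch; }      CellLocal; /* local only  */

void send_cells(const CellTx *tx, int n, MPI_Datatype cell_tx_type,
                int dest, int tag, MPI_Comm comm) {
    /* The CellLocal array never touches the wire. */
    MPI_Send(tx, n, cell_tx_type, dest, tag, comm);
}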
20
Questions?