1
FT-MPI Survey Alan & Nathan
2
FT-MPI – Overview
Background
Architecture
Modes
Drawbacks
Commercial Applications
3
FT-MPI – Background [1]
FT-MPI is a full MPI 1.2 implementation
C and Fortran interfaces
Provides process-level fault tolerance
Able to survive n-1 process crashes in an n-process job
Does not recover data on the crashed node
4
FT-MPI – Architecture
[Diagram: MPI Library, HARNESS, and FT-MPI (dynamic process management and fault tolerance)]
5
FT-MPI – Background [1] (HARNESS)
Heterogeneous Adaptive Reconfigurable Networked SyStem
Underlying framework to provide highly dynamic and fault-tolerant high-performance computing
FT-MPI is a HARNESS MPI API
6
FT-MPI – Background [3]
Communicators
  MPI communicator: {valid, invalid}
  FT-MPI communicator: {OK, Problem, Failed}
    Problem = detected, recover, recovered
Processes
  MPI: {OK, Failed}
  FT-MPI: {OK, Unavailable, Joining, Failed}
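The state sets on this slide can be written down as plain C enumerations. The sketch below is only an illustration: the type and function names are hypothetical and are not FT-MPI identifiers, and the rule that maps process states to a communicator state is an assumption made for the example, not something the slide states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the state labels come from the slide. */
enum comm_state    { COMM_OK, COMM_PROBLEM, COMM_FAILED };
enum problem_phase { PHASE_DETECTED, PHASE_RECOVER, PHASE_RECOVERED };
enum proc_state    { PROC_OK, PROC_UNAVAILABLE, PROC_JOINING, PROC_FAILED };

/* Count how many of the n processes are currently in state s. */
size_t count_in_state(const enum proc_state *procs, size_t n, enum proc_state s) {
    size_t count = 0;
    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (procs[i] == s) count++;
    }
    return count;
}

/* Assumed rule, for illustration only: the communicator is reported as OK
   while every process is OK, and as Problem otherwise. */
enum comm_state summarize_comm(const enum proc_state *procs, size_t n) {
    return count_in_state(procs, n, PROC_OK) == n ? COMM_OK : COMM_PROBLEM;
}
```

The point of the extra process states is visible here: plain MPI can only distinguish OK from Failed, whereas the FT-MPI set also names the intermediate Unavailable and Joining states.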
7
FT-MPI – Architecture [4]
MPI 1.2 API/MPI objects
Highly tuned
Tuned collective routines
OS interaction via Hlib
Startup, recovery, and shutdown
Inter-node communication
8
FT-MPI – Architecture [3]
MPI collective operations tuning
  Broadcast
  Gather
Three options for buffering
Derived Data Types (DDT)
  Zero padding
  Minimal padding
  Re-ordering pack – encoding/decoding
9
FT-MPI – Architecture [3]
DDT and Buffer Management
Reorders data and compresses
10-19% improvement for small messages (~12 KB)*
78-81% improvement for large messages (~95 KB)*
*Compared to MPICH (1.3.1) on a 93-element DDT
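Why reordering can shrink a buffer is easiest to see with ordinary C struct layout. The sketch below is an analogy under stated assumptions, not FT-MPI's DDT code: the two struct types and the helper functions are hypothetical, and the exact byte counts depend on the platform ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record in declaration order: small and large members interleaved,
   so the compiler typically inserts padding before value and before count. */
struct sample_declared {
    char   tag;
    double value;
    char   flag;
    int    count;
};

/* Same members sorted by decreasing alignment requirement. */
struct sample_reordered {
    double value;
    int    count;
    char   tag;
    char   flag;
};

/* Bytes that carry data: the sum of the member sizes. */
size_t payload_bytes(void) {
    return sizeof(double) + sizeof(int) + 2 * sizeof(char);
}

/* Padding is whatever the struct occupies beyond its payload. */
size_t padding_declared(void) {
    assert(sizeof(struct sample_declared) >= payload_bytes());
    return sizeof(struct sample_declared) - payload_bytes();
}

size_t padding_reordered(void) {
    assert(sizeof(struct sample_reordered) >= payload_bytes());
    return sizeof(struct sample_reordered) - payload_bytes();
}
```

On a typical 64-bit ABI the declared layout occupies 24 bytes and the reordered one 16, for the same 14 bytes of payload. A datatype engine that reorders before packing sends fewer padding bytes in the same way.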
10
FT-MPI – Architecture [3]
HARNESS Kernel allows dynamic code insertion both directly and indirectly
Crucial impact of this:
  Spawn and Notify Service
    Remote processes
  Naming Service
  Distributed Replicated Database (DRD)
    System state & metadata
11
FT-MPI – Modes
Provides 4 error modes [1]
  Abort – Quit if a process crashes
  Blank – Continue execution with missing data
  Shrink – Continue running and shrink the number of nodes
  Rebuild – Restart the crashed process
Message modes [3]
  NOP – No operations on error
  CONT – All other operations continue
12
FT-MPI – Abort Architecture
[Diagram: initial configuration of 4 ranks (Rank 0 to Rank 3) exchanging Scatter/Gather; Rank 0, Rank 1 and Rank 3 are labelled Abort. Abort configuration: gracefully shut down all ranks.]
13
FT-MPI – Shrink Architecture
[Diagram: initial configuration of 4 ranks (Rank 0 to Rank 3) exchanging Scatter/Gather, with labels MyRank = 1 and MyRank = 2; execution continues. Shrunk configuration: 3 ranks, with Rank 3 relabelled Rank 2.]
14
FT-MPI – Blank Architecture
[Diagram: initial configuration of 4 ranks, MPI_COMM_SIZE = 4; Rank 0 to Rank 3 exchange Scatter/Gather, one rank becomes Blank, and execution continues. Blank configuration: 3 valid ranks, MPI_COMM_SIZE = 4.]
15
FT-MPI – Rebuild Architecture
[Diagram: Rank 0 to Rank 3 exchange Scatter/Gather, with labels Blank and MyRank = 2; Rank 2 is restarted and execution continues. Blank configuration: 3 valid ranks, MPI_COMM_SIZE = 4. Rebuilt configuration: 4 ranks, MPI_COMM_SIZE = 4.]
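The effect of the four error modes on the rank table can be sketched without MPI at all. The code below is a simulation under stated assumptions: `error_mode`, `SLOT_EMPTY` and `apply_failure` are hypothetical names, process ids are arbitrary integers, and the only behaviour modelled is what the mode diagrams show (size and rank numbering after one failure).

```c
#include <assert.h>

/* Hypothetical names for the four error modes described on the slides. */
enum error_mode { MODE_ABORT, MODE_BLANK, MODE_SHRINK, MODE_REBUILD };

#define SLOT_EMPTY (-1)  /* marks a rank with no live process behind it */

/* procs[r] holds the process id serving rank r. Applies the failure of
   failed_rank under the given mode and returns the new communicator size.
   replacement_id is only used by MODE_REBUILD. */
int apply_failure(enum error_mode mode, int *procs, int size,
                  int failed_rank, int replacement_id) {
    assert(size > 0 && failed_rank >= 0 && failed_rank < size);
    switch (mode) {
    case MODE_ABORT:    /* every rank shuts down */
        for (int r = 0; r < size; r++) procs[r] = SLOT_EMPTY;
        return 0;
    case MODE_BLANK:    /* leave a hole; size is unchanged */
        procs[failed_rank] = SLOT_EMPTY;
        return size;
    case MODE_SHRINK:   /* close the gap; higher ranks move down by one */
        for (int r = failed_rank; r < size - 1; r++) procs[r] = procs[r + 1];
        procs[size - 1] = SLOT_EMPTY;
        return size - 1;
    case MODE_REBUILD:  /* a restarted process takes over the failed rank */
        procs[failed_rank] = replacement_id;
        return size;
    }
    return size;
}
```

Starting from processes {100, 101, 102, 103} with rank 1 failing, Shrink returns size 3 with 102 and 103 now at ranks 1 and 2, Blank returns size 4 with an empty slot at rank 1, and Rebuild returns size 4 with the replacement process at rank 1.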
16
FT-MPI – Example Code
#include "mpi.h"

/* Sketch from the slide: array, max, last_complete, oldcomm and newcomm
   are assumed to be declared elsewhere. */
void wave2D(int argc, char **argv) {
    int rc, t;

    // start MPI
    rc = MPI_Init(&argc, &argv);
    if (rc == MPI_ERR_OTHER) {
        // handle error and restart
    }

    // compute array for t = 0 & t = 1

    // compute t = 2+
    for (t = 2; t < max; t++) {
        rc = MPI_Allgather(array, …);
        if (rc == MPI_ERR_OTHER) {
            // restart lost node
            MPI_Comm_dup(oldcomm, &newcomm);
            // redistribute data to crashed node
            // revert to last completed t
            t = last_complete;
        }
        // calculate array
    }
}
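The "revert to last completed t" step implies that the application keeps its own checkpoint, since FT-MPI does not recover user data. A minimal MPI-free sketch of that pattern follows. Everything in it is an assumption made for illustration: the `checkpoint` struct, the grid size `N`, the trivial update rule, and the `fail_at` parameter that simulates a single detected failure.

```c
#include <assert.h>
#include <string.h>

#define N 8  /* hypothetical grid size */

struct checkpoint {
    int    step;     /* last completed time step */
    double grid[N];  /* copy of the state at that step */
};

void save_checkpoint(struct checkpoint *cp, int step, const double *grid) {
    cp->step = step;
    memcpy(cp->grid, grid, sizeof cp->grid);
}

/* Copies the saved state back and returns the step it belongs to. */
int restore_checkpoint(const struct checkpoint *cp, double *grid) {
    memcpy(grid, cp->grid, sizeof cp->grid);
    return cp->step;
}

/* Advances the grid through steps 1..max_step. fail_at simulates one failure
   detected right after computing that step (pass -1 for no failure).
   Returns the last completed step. */
int run_with_rollback(double *grid, int max_step, int fail_at) {
    struct checkpoint cp;
    int failed_once = 0;
    assert(max_step >= 0);
    save_checkpoint(&cp, 0, grid);
    for (int t = 1; t <= max_step; t++) {
        for (int i = 0; i < N; i++) grid[i] += 1.0;  /* stand-in for the real update */
        if (t == fail_at && !failed_once) {
            failed_once = 1;
            t = restore_checkpoint(&cp, grid);  /* roll back; the loop resumes at step + 1 */
            continue;
        }
        save_checkpoint(&cp, t, grid);
    }
    return cp.step;
}
```

With a zero-initialised grid, `run_with_rollback(grid, 5, 3)` discards the failed step 3, recomputes it from the step-2 checkpoint, and finishes with every cell equal to 5.0, the same result as a run without a failure.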
17
FT-MPI – Tool [1]
COMINFO
18
FT-MPI – Drawbacks
Modifies MPI (standard) semantics (Gropp and Lusk)
  Communicator ranks can enter undefined states
  Not realistic for writing production applications
What FT-MPI does not do (Gabriel et al., EuroPVM/MPI 2003)
  Recover user data (e.g., automatic checkpointing)
  Provide transparent fault tolerance
  Minimal checkpoint support: checkpoints must be manually coded
Known bugs and problems (2006)
  MPI_Ssend, MPI_Issend and MPI_Ssend_init do not provide a real synchronous send mode
  MPI_Cancel on Isend operations might fail where the MPI standard requires the cancel operation to succeed
  Mixed-mode communication for heterogeneous platforms requires XDR format
19
FT-MPI – Drawbacks – Based on Outdated Tech
2001
  Single-core CPUs
  Low-speed Internet links
  No/little virtualization
  Limited redundancy, no RAS
Today and Future
  High-speed information highway
  100x CPU perf.
  1000x node density
  Disaggregated architecture
  100/400 GbE
  Failure rates 100x greater
  NVMeOF (Non-Volatile Memory Express over Fabrics)
  SDN (Software-Defined Networking)
References:
20
Comparative FT Approaches for MPI [4]
Reference: Gabriel et al., Fault Tolerant MPI presentation, EuroPVM/MPI 2003
21
FT-MPI – Commercial Applications
No commercial applications are based directly on FT-MPI
No development has been done on FT-MPI since it was integrated into Open MPI (~2006)
Fault tolerance for Open MPI is based on FT-MPI, with extended functionality and support
Commercial implementations of Open MPI include:
  IBM
  Oracle
  Fujitsu
  ISV – library/application that sits on top of Open MPI
References: George Bosilca, Jack Dongarra, Innovative Computing Laboratory, University of Tennessee
22
References
Dongarra, J. (n.d.). FT-MPI. Retrieved November 22, 2016, from
Gabriel, E., Fagg, G., Bukovsky, A., Angskun, T., & Dongarra, J. (n.d.). A Fault-Tolerant Communication Library for Grid Environments (pp. 1-10, Tech.).
Fagg, G., Bukovsky, A., & Dongarra, J. (2001). HARNESS and fault tolerant MPI (Vol. 27, Parallel Computing, pp , Rep.).
Gabriel, E., Fagg, G., Angskun, T., Bosilca, G., Bukovsky, A., & Dongarra, J. (2003). Fault Tolerant MPI. Retrieved December 01, 2016, from
Bosilca, George, et al. "MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes." Supercomputing, ACM/IEEE 2002 Conference. IEEE, 2002.
Gropp, William, and Ewing Lusk. "Fault tolerance in message passing interface programs." International Journal of High Performance Computing Applications 18.3 (2004):
Lemarinier, Pierre, et al. "Improved message logging versus improved coordinated checkpointing for fault tolerant MPI." Cluster Computing, 2004 IEEE International Conference on. IEEE, 2004.