Slide 1: Connecting HPIO Capabilities with Domain Specific Needs
Rob Ross, MCS Division, Argonne National Laboratory, rross@mcs.anl.gov
Slide 2: I/O in an HPC system
Many cooperating tasks sharing I/O resources
Relying on parallelism of hardware and software for performance
[Figure: clients running applications (100s-1000s) connect through a storage or system network to I/O devices or servers (10s-100s); the layers shown are the application, I/O system software, and storage hardware]
Slide 3: Motivation
HPC applications increasingly rely on I/O subsystems
– Large input datasets, checkpointing, visualization
Applications continue to be scaled up, putting more pressure on I/O subsystems
Application programmers want interfaces that match the domain
– Multidimensional arrays, typed data, portable formats
Two issues must be resolved by the I/O system
– Very high performance requirements
– The gap between application abstractions and hardware abstractions
Slide 4: I/O history in a nutshell
I/O hardware has lagged, and continues to lag, behind all other system components
I/O software has matured more slowly than other components (e.g., message-passing libraries)
– Parallel file systems (PFSs) are not enough
This combination has led to poor I/O performance on most HPC platforms
Only in a few instances have I/O libraries presented abstractions matching application needs
Slide 5: Evolution of I/O software
The goal is convenience and performance for HPC
Capabilities have emerged slowly
Parallel high-level libraries may bring together good abstractions and performance
Timeline: local disk and POSIX → remote access (NFS, FC) → serial high-level libraries → parallel file systems → MPI-IO → parallel high-level libraries
(Not to scale, or necessarily in the right order…)
Slide 6: I/O software stacks
Myriad I/O components are converging into layered solutions
Insulate applications from eccentric MPI-IO and PFS details
Maintain (most of) the I/O performance
– Some HLL features do cost performance
Stack (top to bottom): Application, High-level I/O Library, MPI-IO Library, Parallel File System, I/O Hardware
Slide 7: Role of parallel file systems
Manage storage hardware
– Lots of independent components
– Must present a single view
– Provide fault tolerance
Focus on concurrent, independent access
– Difficult to pass knowledge of collectives to the PFS
Scale to many clients
– Probably means removing all shared state
– Lock-free approaches
Publish an interface that MPI-IO can use effectively
– Not POSIX
Slide 8: Role of MPI-IO implementations
Facilitate concurrent access by groups of processes
– Understanding of the programming model
Provide hooks for tuning the PFS
– MPI_Info as the interface to PFS tuning parameters (see the sketch after this slide)
Expose a fairly generic interface
– Good for building other libraries
Leverage MPI-IO semantics
– Aggregation of I/O operations
Hide unimportant details of the parallel file system
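A minimal sketch of the MPI_Info hint mechanism, assuming a hypothetical file name and hint values; the hint keys shown ("striping_factor", "cb_buffer_size") are common ROMIO-style hints, and each MPI-IO implementation and file system defines its own set, ignoring keys it does not understand.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Hints are (key, value) strings passed through MPI-IO to the
     * parallel file system; unrecognized hints are silently ignored. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "16");     /* assumed: number of I/O servers */
    MPI_Info_set(info, "cb_buffer_size", "8388608"); /* assumed: collective buffer size */

    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",  /* hypothetical file */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... collective reads and writes would go here ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}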
Slide 9: Role of high-level libraries
Provide an appropriate abstraction for the domain
– Multidimensional, typed datasets
– Attributes
– Consistency semantics that match usage
– Portable format
Maintain the scalability of MPI-IO
– Map data abstractions to datatypes
– Encourage collective I/O (see the sketch after this slide)
Implement optimizations that MPI-IO cannot (e.g., header caching)
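As a rough illustration of this mapping (not taken from the slides), the Parallel netCDF fragment below defines a typed, two-dimensional variable and writes each process's slab with a collective call; the file name, variable name, dimensions, and decomposition are all assumptions.

#include <mpi.h>
#include <pnetcdf.h>

/* Each process writes one slab of a global 2-D float variable.
 * Names and sizes here are illustrative only. */
void write_field(MPI_Comm comm, const float *local_data,
                 MPI_Offset nx_global, MPI_Offset ny_global,
                 MPI_Offset x_start, MPI_Offset nx_local)
{
    int ncid, dimids[2], varid;

    /* Collective create; an MPI_Info object could carry PFS hints. */
    ncmpi_create(comm, "field.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);

    /* Describe the global, typed, multidimensional dataset. */
    ncmpi_def_dim(ncid, "x", nx_global, &dimids[0]);
    ncmpi_def_dim(ncid, "y", ny_global, &dimids[1]);
    ncmpi_def_var(ncid, "density", NC_FLOAT, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* Each process names its own subarray; the _all suffix makes the
     * call collective, letting the library hand MPI-IO one large,
     * well-described request instead of many small ones. */
    MPI_Offset start[2] = { x_start, 0 };
    MPI_Offset count[2] = { nx_local, ny_global };
    ncmpi_put_vara_float_all(ncid, varid, start, count, local_data);

    ncmpi_close(ncid);
}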
Slide 10: Example: ASCI/Alliance FLASH
FLASH is an astrophysics simulation code from the ASCI/Alliance Center for Astrophysical Thermonuclear Flashes
Fluid dynamics code using adaptive mesh refinement (AMR)
Runs on systems with thousands of nodes
Three layers of I/O software between the application and the I/O hardware
I/O stack on an example system (ASCI White Frost): ASCI FLASH → Parallel netCDF → IBM MPI-IO → GPFS → Storage
Slide 11: FLASH data and I/O
3D AMR blocks
– 16^3 elements per block
– 24 variables per element
– Perimeter of ghost cells
Checkpoint writes all variables
– No ghost cells
– One variable at a time (noncontiguous; see the sketch after this slide)
Visualization output is a subset of variables
Portability of data is desirable
– Postprocessing on a separate platform
[Figure: one AMR block, highlighting a ghost cell and an element holding 24 variables]
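As an illustration of describing such noncontiguous accesses (not FLASH's actual code), an MPI subarray datatype can select one variable of a block while skipping the ghost-cell perimeter and the other 23 variables; the memory layout, double-precision variables, and a ghost depth of 4 are assumptions.

#include <mpi.h>

/* Build a datatype selecting one variable from a 16^3 AMR block.
 * Assumed layout: block[z][y][x][var], 24 doubles per element,
 * 4 ghost cells on every side. */
MPI_Datatype one_variable_type(void)
{
    MPI_Datatype elem, subarray;

    /* One double, resized so its extent spans all 24 variables of an
     * element; the subarray then strides over whole elements. */
    MPI_Type_create_resized(MPI_DOUBLE, 0,
                            (MPI_Aint)(24 * sizeof(double)), &elem);

    int sizes[3]    = { 24, 24, 24 };  /* 16 interior + 2*4 ghost cells per side */
    int subsizes[3] = { 16, 16, 16 };  /* interior elements only */
    int starts[3]   = { 4, 4, 4 };     /* skip the ghost perimeter */

    MPI_Type_create_subarray(3, sizes, subsizes, starts,
                             MPI_ORDER_C, elem, &subarray);
    MPI_Type_commit(&subarray);
    MPI_Type_free(&elem);

    /* To handle variable k, pass a buffer pointer offset by k doubles
     * along with this datatype. */
    return subarray;
}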
Slide 12: Tying it all together
FLASH tells PnetCDF that all its processes want to write out regions of variables and store them in a portable format
PnetCDF performs data conversion and calls the appropriate MPI-IO collectives
MPI-IO optimizes the write to GPFS using data shipping and I/O agents (see the sketch after this slide)
GPFS moves data from the agents to the storage resources, stores the data, and maintains file metadata
In this case, PnetCDF is a better match to the application
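Underneath the PnetCDF call, the collective MPI-IO path looks roughly like the fragment below; this is a simplified sketch, not the IBM MPI-IO or GPFS implementation, and the contiguous-slab decomposition is an assumption.

#include <mpi.h>

/* Sketch: each process writes 'count' doubles at a process-specific
 * element offset through the collective write path. */
void collective_write(MPI_Comm comm, const char *path,
                      const double *buf, int count, MPI_Offset offset)
{
    MPI_File fh;

    MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* The file view tells MPI-IO which part of the file this process
     * touches; with derived datatypes it can be noncontiguous. */
    MPI_File_set_view(fh, offset * (MPI_Offset)sizeof(double),
                      MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);

    /* Collective: all processes call together, so the implementation
     * can aggregate the requests onto a few I/O agents (data shipping
     * or two-phase collective buffering). */
    MPI_File_write_all(fh, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
}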
Slide 13: Future of I/O system software
More layers in the I/O stack
– Better match the application's view of data
– Map this view to PnetCDF or similar
– Maintain collectives and rich descriptions
More high-level libraries using MPI-IO
– PnetCDF and HDF5 are great starts
– These should be considered mandatory I/O system software on our machines
Focusing component implementations on their roles
– Less general-purpose file systems: the scalability and APIs of existing PFSs aren't up to the workloads and scales
– More aggressive MPI-IO implementations: lots can be done if we're not busy working around broken PFSs
– More aggressive high-level library optimization: they know the most about what is going on
Stack (top to bottom): Application, Domain Specific I/O Library, High-level I/O Library, MPI-IO Library, Parallel File System, I/O Hardware
Slide 14: Future
Creation and adoption of parallel high-level I/O libraries should make things easier for everyone
– New domains may need new libraries or new middleware
– HLLs that target database backends seem obvious; someone is probably already doing this
Further evolution of the components is necessary to get the best performance
– Tuning/extending file systems for HPC (e.g., user metadata storage, better APIs)
Aggregation, collective I/O, and leveraging semantics are even more important at larger scale
– Reliability too, especially for kernel FS components
Potential hardware changes (MEMS, active disks) are complementary