I/O on Clusters Rajeev Thakur Argonne National Laboratory.


1 I/O on Clusters Rajeev Thakur Argonne National Laboratory

2 State of Affairs
- I/O has been recognized as a problem for several years
- Still no good answer to the question: what should I use for I/O on my cluster?
- Partial solutions exist for parts of the problem
- No mainstream solution that you can use blindfolded

3 What are the Requirements?
Two distinct requirements:
- Home directories globally visible across the cluster
  - For executables, parameter files, small output
  - Reliable; need to be backed up
  - Less need for concurrent writes to the same file
- Parallel I/O
  - For large inputs, large outputs
  - High-bandwidth concurrent writes and reads to the same file
  - Performance critical
- Even some vendors are not clear about these requirements

4 What are the Current Solutions?
- Can one file system be used for both? In theory yes, in practice no: the two need physical separation for performance
- Home directories
  - Traditionally NFS; works at small scale
  - An open question at large scale. Some NFS-GFS combination?

5 What are the Current Solutions? (contd)
Parallel I/O:
- IBM GPFS
  - Really good paper on it at FAST '02: http://www.usenix.org/publications/library/proceedings/fast02/full_papers/schmuck/schmuck.pdf
  - I haven't used it myself, and I haven't heard how well it works on large Linux clusters
- PVFS (Argonne, Clemson)
  - Fast: measured > 2 GB/s bandwidth
  - Needs more work in the areas of reliability and management tools
- PVFS2 (Argonne, Clemson)
  - Under development
  - Improved performance; adds reliability and management tools
- Lustre
  - Under development

6 Using Parallel I/O
Given a good parallel file system and sufficient I/O hardware, what can users or libraries do to get good performance?
- Wherever possible, make large concurrent I/O requests
- If that is not possible, make a single collective request for noncontiguous data instead of many small requests (using MPI-IO, for example)

7 Example: Distributed Array Access
[Figure: a 2D array distributed in blocks among four processes P0, P1, P2, P3; the file contains the global array in row-major order.]

8 Don't Do This
Each process makes one independent read request for each row in its local array (as in Unix):

    MPI_File_open(..., file, ..., &fh);
    for (i = 0; i < n_local_rows; i++) {
        MPI_File_seek(fh, ...);
        MPI_File_read(fh, &(A[i][0]), ...);
    }
    MPI_File_close(&fh);

9 Do This!
Each process defines a noncontiguous file view and calls a collective I/O function:

    MPI_Type_create_subarray(..., &subarray, ...);
    MPI_Type_commit(&subarray);
    MPI_File_open(MPI_COMM_WORLD, file, ..., &fh);
    MPI_File_set_view(fh, ..., subarray, ...);
    MPI_File_read_all(fh, A, ...);
    MPI_File_close(&fh);

10 High-Level I/O Libraries
- Libraries have an even greater responsibility to do it right!
- Need the right API in the first place: collective, noncontiguous
- Use MPI-IO the right way
- Minimize small metadata accesses and updates
- Example: the Parallel NetCDF library being developed at Argonne and Northwestern

11 What is Needed…
In addition to performance, we need software that is:
- Self-recovering
- Self-optimizing
It's time we wrote software that is "smart"

