The Parallel and Grid I/O Perspective
University of Chicago, Department of Energy


1 The Parallel and Grid I/O Perspective
- MPI, MPI-IO, NetCDF, and HDF5 are in common use
- Multi-TB datasets also common
- Testbeds needed for software at scale

2 Topics for Discussion
- NetCDF
  - Other applications (leverage)
  - Read (parallel analysis tools)
- PVFS opportunities
  - TB in your office
- Scalability Testbed
  - Where do you test at scale?
- Application log files
  - Real app log files at > 1GB
- Other
  - Can we quantify apps' needs?

3 PVFS Peak Write Performance
- Using compute nodes for storage in these tests
- Peak at around 25-30 Mbytes/sec per I/O server
- Clients cannot maintain this to disk
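As a rough illustration of what the per-server figure implies, the sketch below estimates an aggregate write rate and the time to write a dataset. It assumes, optimistically, that throughput scales linearly with the number of I/O servers. The server count (16) and the dataset size (1 TB) are hypothetical inputs, not measurements from these tests.

```python
def aggregate_mb_per_s(num_servers, per_server_mb_per_s):
    """Ideal aggregate bandwidth, assuming perfect linear scaling (an assumption)."""
    return num_servers * per_server_mb_per_s


def seconds_to_write(dataset_mb, num_servers, per_server_mb_per_s):
    """Time to write dataset_mb megabytes at the ideal aggregate rate."""
    return dataset_mb / aggregate_mb_per_s(num_servers, per_server_mb_per_s)


if __name__ == "__main__":
    one_tb_in_mb = 1_000_000  # 1 TB expressed in MB (decimal units)
    for rate in (25, 30):     # per-server range quoted on the slide
        secs = seconds_to_write(one_tb_in_mb, num_servers=16,
                                per_server_mb_per_s=rate)
        print(f"16 servers at {rate} MB/s each: {secs / 3600:.1f} h per TB")
```

Real systems rarely scale this cleanly, so the result is best read as a lower bound on write time.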

4 Performance Visualization with Jumpshot
- For detailed analysis of parallel program behavior, timestamped events are collected into a log file during the run.
- A separate display program (Jumpshot) aids the user in conducting a post-mortem analysis of program behavior.
- Log files can become large (>1GB), making it impossible to inspect the entire program at once.
- The FLASH Project motivated an indexed file format (SLOG) that uses a preview to select a time of interest and quickly display an interval.
- We collaborated with IBM and LLNL to collect SLOG files directly from AIX trace records and display traces from multithreaded programs.
[Diagram labels: Logfile, Jumpshot, Processes, Display]
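The idea of an indexed log with a preview can be sketched in a few lines. The example below is a toy, in-memory stand-in and does not reproduce the actual SLOG format: the class name `IndexedLog`, the fixed `frame_size`, and the event layout are all assumptions made for illustration. Events are grouped into fixed-size frames, the index records each frame's first timestamp, and an interval query inspects only the frames that can overlap the requested time range.

```python
import bisect


class IndexedLog:
    """Toy in-memory stand-in for a time-indexed event log (not the SLOG format)."""

    def __init__(self, events, frame_size=4):
        # events: iterable of (timestamp, description) pairs
        self.events = sorted(events)
        self.frame_size = frame_size
        # Index: first timestamp of each fixed-size frame of events.
        self.frame_starts = [
            self.events[i][0] for i in range(0, len(self.events), frame_size)
        ]

    def preview(self):
        """Coarse summary: (frame start time, event count) for every frame."""
        size = self.frame_size
        return [
            (start, len(self.events[i * size:(i + 1) * size]))
            for i, start in enumerate(self.frame_starts)
        ]

    def interval(self, t0, t1):
        """Return events with t0 <= timestamp <= t1, scanning only candidate frames."""
        # The frame just before the first frame starting at or after t0
        # may still contain events at t0, so begin there.
        first = max(bisect.bisect_left(self.frame_starts, t0) - 1, 0)
        # Frames whose first timestamp is after t1 cannot contain matches.
        last = bisect.bisect_right(self.frame_starts, t1)
        lo = first * self.frame_size
        hi = min(last * self.frame_size, len(self.events))
        return [e for e in self.events[lo:hi] if t0 <= e[0] <= t1]


if __name__ == "__main__":
    log = IndexedLog([(t, f"event {t}") for t in range(10)], frame_size=4)
    print(log.preview())
    print(log.interval(3, 5))
```

In an on-disk format the same index would let a viewer seek directly to the relevant frames instead of reading the whole file.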

5 Chiba City Scalability Testbed
http://www.mcs.anl.gov/chiba/

6 Notes
- FAQ on Parallel I/O
  - Include performance graphs, tutorial links
- Interaction with P2 (Data Mining and Access Pattern Discovery)
  - Parallel NetCDF (P2 as an application group)
  - Managing datasets of NetCDF files
  - Collect log files of application I/O
- Explore use of WAN FTP for Grid I/O
  - Remote I/O through MPI-IO interface
- PVFS Clusters for TB dataset experimentation
- Close with John Drake on parallel NetCDF for Climate

7 SC02 Demo
- Use Parallel NetCDF over MPI-IO over PVFS to access dataset
  - Extract time series from collection of files
- Parallel reads as well as writes
  - New feature: handle dynamically changing datasets
- Observe progress of running application
- Perform data analysis and visualization
  - Contrast with nonparallel approach
- Prototype on Chiba scalability testbed at ANL
- Bonus: collect log files of I/O behavior and show analysis and visualizations of log files
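One common way to organize a parallel read of a time series is to give each process a contiguous block of time steps. In a parallel I/O program, the resulting (start, count) pair would typically be handed to a collective read call. The sketch below shows only that index arithmetic. The function name `block_range` and the even block split are illustrative assumptions, not the decomposition used in the demo.

```python
def block_range(num_steps, num_procs, rank):
    """Return (start, count) of the contiguous block of time steps for `rank`.

    The first (num_steps % num_procs) ranks get one extra step, so block
    sizes differ by at most one.
    """
    base, extra = divmod(num_steps, num_procs)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


if __name__ == "__main__":
    # Hypothetical case: 10 time steps shared among 4 processes.
    for rank in range(4):
        print(rank, block_range(10, 4, rank))
```

Contiguous blocks tend to suit file layouts where time is the slowest-varying dimension, since each process then touches one contiguous region of the file.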

8 Demo Steps
1. Select variable from collection of files, write a new NetCDF file
   - Illustrates fast I/O
   - (address open performance for collections of files)
2. Perform PCA
   - Illustrates algorithmically efficient methods
3. Visualize at each time step
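The slides do not say how the PCA in step 2 is computed. As a minimal sketch of the idea, the example below estimates only the leading principal component using power iteration on the sample covariance matrix. Power iteration is a simple technique swapped in here for illustration, and the function name, starting vector, and iteration count are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the leading principal component of `rows` by power iteration.

    rows: list of equal-length lists, one observation per row (at least two).
    Returns a unit-length direction vector.
    """
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Assumed starting vector; power iteration fails if it happens to be
    # orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v; keep the current unit vector
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    # Hypothetical data lying exactly on the line y = 2x.
    data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    print(first_principal_component(data))
```

For large datasets a library routine based on the singular value decomposition would be the more usual choice.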

9 Vision for the Future
- Databases and parallel I/O integration
- Data representations for standard file formats that provide better performance for typical access patterns (post NetCDF/HDF)
- Transparent parallel I/O to/from everywhere (grid transparent, file system hierarchy transparent)

