
PIDX PIDX - a parallel API to capture the data models used by HPC application and write it out in an IDX format. PIDX enables simulations to write out.


1 PIDX
PIDX is a parallel API that captures the data models used by HPC applications and writes them out in the IDX format.
PIDX enables simulations to write out IDX data directly in parallel, which allows:
– Real-time interactive visualization and analysis of data.
– Monitoring the health of a simulation, which can also assist in steering it.
Usage
The S3D combustion application is used to demonstrate the efficacy of PIDX for a real-world scientific simulation.

2 PIDX I/O phases
Describe data model
Create an IDX block bitmap
– The bitmap indicates which IDX blocks must be populated in order to store an arbitrary N-dimensional dataset.
Create underlying file and directory hierarchy
– The IDX file and directory hierarchy is created by the rank 0 process in the application before any I/O is performed.
Perform HZ encoding
– The HZ encoding step is performed independently on each process.
– In order to minimize memory access complexity, all samples are copied into intermediate buffers in a linear Z ordering.
Aggregate data
Write data to storage
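The Z ordering and HZ encoding named above can be sketched in a few lines of C. This is a minimal illustration, not PIDX's implementation: the function names `z_index_3d` and `hz_from_z` are hypothetical, the x/y/z bit interleaving order is an assumption (IDX files can specify their own interleaving pattern), and the HZ step follows the commonly described rule of setting a guard bit above the Z index and shifting right until the lowest set bit has been dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interleave the low bits of (x, y, z) into a Z-order
 * (Morton) index. Bit i of x lands at bit 3*i, y at 3*i+1, z at 3*i+2.
 * The interleaving order is an assumption made for this sketch. */
uint64_t z_index_3d(uint32_t x, uint32_t y, uint32_t z, unsigned bits_per_axis)
{
    assert(bits_per_axis <= 21);            /* 3 * 21 = 63 bits fit in 64 */
    uint64_t idx = 0;
    for (unsigned i = 0; i < bits_per_axis; ++i) {
        idx |= (uint64_t)((x >> i) & 1u) << (3 * i);
        idx |= (uint64_t)((y >> i) & 1u) << (3 * i + 1);
        idx |= (uint64_t)((z >> i) & 1u) << (3 * i + 2);
    }
    return idx;
}

/* Hypothetical helper: convert a Z index of total_bits bits into a
 * hierarchical (HZ) index. Samples with more trailing zeros in their Z index
 * belong to coarser levels and receive smaller HZ indices. */
uint64_t hz_from_z(uint64_t z, unsigned total_bits)
{
    assert(total_bits < 64);
    assert(z < ((uint64_t)1 << total_bits));
    uint64_t v = z | ((uint64_t)1 << total_bits);  /* guard bit above the index */
    while ((v & 1u) == 0) {
        v >>= 1;                                   /* drop trailing zeros */
    }
    return v >> 1;                                 /* drop the lowest set bit */
}
```

For a 2-bit Z index, `hz_from_z` maps Z indices 0, 2, 1, 3 to HZ indices 0, 1, 2, 3, so the coarsest samples come first and each finer level follows as a contiguous range.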

3 HPC data models with PIDX
/* define variables across all processes */
var1 = PIDX_variable_global_define("var1", samples, datatype);
/* add local variables to the dataset */
PIDX_variable_local_add(dataset, var1, global_index, count);
/* describe memory layout */
PIDX_variable_local_layout(dataset, var1, memory_address, datatype);
/* write all data */
PIDX_write(dataset);
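To make the `global_index` and `count` arguments above concrete, the sketch below shows one way a process's local block can be described as an offset and extent within the global volume. It is an illustration only: the `local_box` struct and both helper functions are hypothetical, not part of PIDX, and the row-major layout with x varying fastest is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one process's block of a 3D dataset.
 * `offset` plays the role of global_index and `count` the role of count
 * in the slide's calls; the struct itself is not a PIDX type. */
typedef struct {
    size_t global_dims[3];  /* size of the whole dataset */
    size_t offset[3];       /* where the local block starts */
    size_t count[3];        /* extent of the local block */
} local_box;

/* Number of samples owned by this process. */
size_t box_num_samples(const local_box *b)
{
    return b->count[0] * b->count[1] * b->count[2];
}

/* Row-major position (x fastest, an assumption) in the global volume of the
 * local sample at (i, j, k). */
size_t box_global_linear(const local_box *b, size_t i, size_t j, size_t k)
{
    assert(i < b->count[0] && j < b->count[1] && k < b->count[2]);
    size_t gx = b->offset[0] + i;
    size_t gy = b->offset[1] + j;
    size_t gz = b->offset[2] + k;
    assert(gx < b->global_dims[0] && gy < b->global_dims[1] && gz < b->global_dims[2]);
    return gx + b->global_dims[0] * (gy + b->global_dims[1] * gz);
}
```

With a 128^3 dataset and a 64^3 block starting at (64, 0, 0), the block holds 262144 samples and its first sample sits at global linear position 64.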

4 Aggregation Phases
Separate I/O by each process leads to a large number of small accesses to each file.
Each process uses RMA to transmit each contiguous data segment to an intermediate aggregator.
The aggregator process then performs a single large I/O operation.
Noncontiguous memory is bundled into a single MPI indexed datatype, which reduces the number of small network messages.
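The bundling step can be illustrated without MPI itself. The sketch below takes a sorted list of element offsets and coalesces it into (displacement, block length) pairs, which is the form of input that `MPI_Type_indexed` takes through its block-length and displacement arrays. The function name `coalesce_segments` is hypothetical and this is not PIDX's code; it only shows why one indexed datatype can stand in for many per-segment transfers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: merge a strictly increasing list of element offsets
 * into contiguous segments. On return, displs[s] is the first offset of
 * segment s and blocklens[s] its length; both output arrays must have room
 * for n entries. Returns the number of segments found. */
size_t coalesce_segments(const int *offsets, size_t n, int *blocklens, int *displs)
{
    size_t nseg = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(i == 0 || offsets[i] > offsets[i - 1]);  /* input must be sorted */
        if (nseg > 0 && offsets[i] == displs[nseg - 1] + blocklens[nseg - 1]) {
            blocklens[nseg - 1] += 1;    /* extends the current segment */
        } else {
            displs[nseg] = offsets[i];   /* starts a new segment */
            blocklens[nseg] = 1;
            nseg += 1;
        }
    }
    return nseg;
}
```

Offsets 0, 1, 2, 3, 8, 9, 16 collapse into three segments with displacements 0, 8, 16 and lengths 4, 2, 1. Sending them one segment at a time would take three transfers, whereas a single indexed datatype built from these arrays describes all of them in one.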

5 Throughput comparison of all the versions of the API (Aggregation Strategy)
EXPERIMENT SETUP
Each process writes out a 64^3 sub-volume with 4 variables.
PERFORMANCE RESULTS
At 256 processes, we achieve up to an 18-fold speedup, and at 2048 processes, up to a 30-fold speedup over a scheme with no aggregation.
The aggregation strategy that used MPI datatypes yielded a 20% improvement over the strategy that issued a separate MPI_Put() for each contiguous region.

6 Performance Evaluation With S3D
EXPERIMENT SETUP
In each run, S3D I/O wrote out 10 time-steps, with each process contributing a 32 MiB data set.
PERFORMANCE RESULTS
At 8192 processes, PIDX achieves a maximum I/O throughput of 18 GiB/s (90% of the IOR throughput).
IOR and Fortran I/O achieve similar throughput at all process counts.
Fortran I/O in S3D behaves similarly to the IOR test case, with each process populating a unique output file.

7 Impact of PIDX file parameters on Lustre
EXPERIMENT SETUP
Procs: 256 to 4K.
Proc size: 64^3 (doubles); 512 MiB (256 procs) and 4 GiB (4K procs).
Elements per block: 2^15 to 2^18.
Blocks per file: 128, 256 and 512.
PERFORMANCE RESULTS
There is a noticeable speedup as the number of files increases:
– The number of aggregators is increased.
– The Lustre file system performs better as data is distributed across a larger number of files.
The design is flexible enough to be tuned to generate a small number of large shared files or a large number of files, depending on which is optimal for the target system.
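How the two file parameters determine the number of output files can be sketched with simple arithmetic. The helpers below are hypothetical and assume a fully populated dataset, counting the samples of a single variable; they are an illustration of the relationship, not PIDX's own bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of blocks needed when each block holds
 * 2^bits_per_block samples (rounded up). */
uint64_t idx_num_blocks(uint64_t total_samples, unsigned bits_per_block)
{
    assert(bits_per_block < 64);
    uint64_t per_block = (uint64_t)1 << bits_per_block;
    return (total_samples + per_block - 1) / per_block;
}

/* Hypothetical helper: number of files needed when each file holds
 * blocks_per_file blocks (rounded up). */
uint64_t idx_num_files(uint64_t total_samples, unsigned bits_per_block,
                       uint64_t blocks_per_file)
{
    assert(blocks_per_file > 0);
    uint64_t blocks = idx_num_blocks(total_samples, bits_per_block);
    return (blocks + blocks_per_file - 1) / blocks_per_file;
}
```

For the 256-process case, 256 x 64^3 = 2^26 samples. With 2^15 elements per block and 128 blocks per file this gives 2048 blocks in 16 files; with 2^18 elements per block and 512 blocks per file the same data fits in a single file. Smaller blocks and fewer blocks per file therefore push the output toward many files, and larger values toward a few large shared files.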

8 Time taken by the various PIDX I/O components

