Presentation transcript:

PIDX
PIDX is a parallel API that captures the data models used by HPC applications and writes them out in the IDX format. PIDX enables simulations to write IDX data directly in parallel, which allows:
– Real-time interactive visualization and analysis of the data.
– Monitoring the health of the simulation, which can also assist in steering it.
Usage: the S3D combustion application is used to demonstrate the efficacy of PIDX for a real-world scientific simulation.

PIDX I/O phases
Describe the data model.
Create an IDX block bitmap.
– The bitmap indicates which IDX blocks must be populated in order to store an arbitrary N-dimensional dataset.
Create the underlying file and directory hierarchy.
– The IDX file and directory hierarchy is created by the rank 0 process of the application before any I/O is performed.
Perform HZ encoding.
– The HZ encoding step is performed independently on each process.
– To minimize memory access complexity, all samples are first copied into intermediate buffers in a linear Z ordering (illustrated in the sketch after this list).
Aggregate data.
Write data to storage.
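
The Z ordering mentioned in the HZ encoding step can be illustrated with a plain Morton (Z-order) reordering of a cubic, power-of-two sub-volume. This is only an illustration of the ordering itself: PIDX's actual HZ encoding additionally organizes samples into hierarchical resolution levels, and the helper names used here (morton3d, reorder_to_z) are hypothetical, not part of the PIDX API.

#include <stdint.h>
#include <stddef.h>

/* Interleave the bits of (x, y, z) to form a 3-D Morton (Z-order) index. */
static uint64_t morton3d(uint32_t x, uint32_t y, uint32_t z)
{
    uint64_t m = 0;
    for (int b = 0; b < 21; b++) {            /* 21 bits per axis fit in 63 bits */
        m |= ((uint64_t)((x >> b) & 1)) << (3 * b);
        m |= ((uint64_t)((y >> b) & 1)) << (3 * b + 1);
        m |= ((uint64_t)((z >> b) & 1)) << (3 * b + 2);
    }
    return m;
}

/* Copy a row-major n*n*n block (n a power of two, e.g. 64) into a Z-ordered
 * intermediate buffer; for such blocks the Morton index is a bijection onto
 * [0, n^3), so the destination buffer is filled densely. */
static void reorder_to_z(const double *row_major, double *z_ordered, uint32_t n)
{
    for (uint32_t z = 0; z < n; z++)
        for (uint32_t y = 0; y < n; y++)
            for (uint32_t x = 0; x < n; x++)
                z_ordered[morton3d(x, y, z)] =
                    row_major[((size_t)z * n + y) * n + x];
}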

HPC data models with PIDX

/* define variables across all processes */
var1 = PIDX_variable_global_define("var1", samples, datatype);

/* add local variables to the dataset */
PIDX_variable_local_add(dataset, var1, global_index, count);

/* describe memory layout */
PIDX_variable_local_layout(dataset, var1, memory_address, datatype);

/* write all data */
PIDX_write(dataset);

Aggregation Phases
Separate I/O by each process leads to a large number of small accesses to each file.
PIDX therefore uses MPI RMA to transmit each contiguous data segment to an intermediate aggregator process (see the sketch below).
– The aggregator process performs one single large I/O operation.
– Noncontiguous memory regions are bundled into a single MPI indexed datatype, which reduces the number of small network messages.
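
A minimal sketch of this RMA-based aggregation, assuming a single aggregator (rank 0) and that the origin-side block lengths, displacements, and target offset (blocklens, displs, agg_offset_doubles) have been computed elsewhere; these names and the one-aggregator layout are illustrative assumptions, not PIDX's actual implementation.

#include <mpi.h>

/* Each writer pushes its noncontiguous pieces to the aggregator with one
 * MPI_Put, using an indexed datatype on the origin side so all pieces travel
 * in a single RMA operation instead of one put per contiguous region. */
static void aggregate_to_rank0(const double *local_buf, int nblocks,
                               const int *blocklens, const int *displs,
                               MPI_Aint agg_offset_doubles,
                               MPI_Aint agg_buf_doubles)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The aggregator exposes its receive buffer as an RMA window. */
    double *agg_buf = NULL;
    if (rank == 0)
        MPI_Alloc_mem(agg_buf_doubles * (MPI_Aint)sizeof(double),
                      MPI_INFO_NULL, &agg_buf);

    MPI_Win win;
    MPI_Win_create(agg_buf,
                   rank == 0 ? agg_buf_doubles * (MPI_Aint)sizeof(double) : 0,
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Bundle the noncontiguous origin memory into one indexed datatype. */
    MPI_Datatype itype;
    MPI_Type_indexed(nblocks, blocklens, displs, MPI_DOUBLE, &itype);
    MPI_Type_commit(&itype);

    int total = 0;
    for (int i = 0; i < nblocks; i++)
        total += blocklens[i];

    MPI_Win_fence(0, win);
    if (rank != 0)
        MPI_Put(local_buf, 1, itype, /* target rank */ 0,
                agg_offset_doubles, total, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);   /* the aggregator now holds the data and can issue
                                one large contiguous write to storage */

    MPI_Type_free(&itype);
    MPI_Win_free(&win);
    if (rank == 0)
        MPI_Free_mem(agg_buf);
}

As the later Lustre slide notes, increasing the number of IDX files also increases the number of aggregators used.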

Throughput comparison of all versions of the API (aggregation strategy)
EXPERIMENT SETUP
Each process writes out a 64^3 sub-volume with 4 variables.
PERFORMANCE RESULTS
At 256 processes, we achieve up to an 18-fold speedup, and at 2048 processes up to a 30-fold speedup, over a scheme with no aggregation.
The aggregation strategy that utilized MPI datatypes yielded a 20% improvement over the strategy that issued a separate MPI_Put() for each contiguous region.

Performance Evaluation with S3D
EXPERIMENT SETUP
In each run, S3D I/O wrote out 10 time-steps, with each process contributing a 32 MiB data set.
PERFORMANCE RESULTS
At 8192 processes, PIDX achieves a maximum I/O throughput of 18 GiB/s (90% of the IOR throughput).
IOR and Fortran I/O achieve similar throughput for all process counts; Fortran I/O in S3D behaves similarly to the IOR test case, with each process populating a unique output file.

Impact of PIDX file parameters on Lustre
EXPERIMENT SETUP
Processes: 256 to 4K.
Per-process size: 64^3 doubles; 512 MiB (256 procs) and 4 GiB (4K procs).
Elements per block: 2^15 to 2^18.
Blocks per file: 128, 256, and 512.
PERFORMANCE RESULTS
As the number of files increases, there is a noticeable speedup:
– The number of aggregators is increased.
– The Lustre file system performs better as data is distributed across a larger number of files.
The design is flexible enough to be tuned to generate a small number of large shared files or a large number of smaller files, depending on which is optimal for the target system.
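
As a rough back-of-the-envelope illustration of how these two parameters control the file count (an assumption-laden estimate: each sample is stored in exactly one IDX block, each binary file holds the configured number of blocks, and the few extra blocks for coarse resolution levels are ignored), the number of files per variable is approximately (samples per variable) / (elements per block × blocks per file). At 4K processes each contributing 64^3 doubles, a variable has 2^30 samples, so the sweep spans from roughly 2^30 / (2^15 × 128) = 256 files down to 2^30 / (2^18 × 512) = 8 files.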

Time taken by the various PIDX I/O components