SAN DIEGO SUPERCOMPUTER CENTER

HDF5/SRB Integration
July 10, 2006

Mike Wan, SRB, SDSC
Peter Cao, HDF, NCSA

Sponsored by CIP/NLADR, NSF PACI Project in Support of NCSA-SDSC Collaboration
Current Status
- Present the work at TeraGrid '06
- Publish HDF5 AIP documents
  - White paper:
  - HDF5 METS template:
- Finish h5ingest command line tool
  - Create HDF5 METS template file
  - Validate HDF5 METS document
- Set up a demo server to support SCEC files
Current Status
- Work on test suite and bug fixes
  - Add code to separate HDF5 I/O time and SRB time
  - Test large files and datasets (>2 GB)
  - Fix bug in the SRB client handler
- Work on performance improvement
  - Implement a fairly large set of changes that improve performance by transferring raw data as a byte stream
  - Need to test on large files
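One way to separate HDF5 I/O time from SRB time is to accumulate wall-clock time per category around each phase of a request. The sketch below is only an illustration in Python, not the project's code: `read_raw_data` and `send_to_client` are hypothetical stand-ins for the HDF5 read and the SRB transfer, and the `timings` dictionary is an assumed bookkeeping structure.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per category (assumed bookkeeping structure).
timings = {"hdf5_io": 0.0, "srb_transfer": 0.0}


@contextmanager
def timed(category):
    """Add the elapsed time of the enclosed block to timings[category]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[category] += time.perf_counter() - start


def read_raw_data(nbytes):
    # Hypothetical stand-in for reading raw dataset bytes through HDF5.
    return bytes(nbytes)


def send_to_client(payload):
    # Hypothetical stand-in for the SRB transfer; returns the byte count sent.
    return len(payload)


def serve_request(nbytes):
    """Handle one request, timing the two phases separately."""
    with timed("hdf5_io"):
        payload = read_raw_data(nbytes)
    with timed("srb_transfer"):
        sent = send_to_client(payload)
    return sent


sent = serve_request(1024)
print(sent, timings)
```

Keeping the two totals apart makes it possible to tell whether a slow request is dominated by file access or by the network transfer.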
Next Month
- Run more performance tests for transferring raw data
- Add more features to HDFView for SRB support
- Integrate the software into the SRB configuration and distribution
Potential SAC Projects
- SDSC ENZO project
  - Enzo, a 3D cosmological hydrodynamics code that simulates the formation and destruction of massive stars
  - HDF5 is used as the file format and for parallel file I/O access
- FLASH Program
  - The UC/DOE collaboration on creating three-dimensional, virtual reality projections of cosmic explosions
  - HDF5 is used for storing the data and for high I/O access
- SCEC Terascale Earthquake Simulations
  - Over 100 TB of data per year
  - Collections at SRB: 2.6 million files, 114 terabytes
TeraShake Surface Seismograms
- 4D array (1.2 TB)
  - Time (22,728)
  - Horizontal (3,000)
  - Vertical (1,500)
  - Vector component (3)
- Each file: 22,728 x 3,000 x 5 x 1 values (1,363,680,000 bytes)
- TeraShake scenario: 900 files
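The sizes on this slide can be checked with a few lines of arithmetic. The sketch below assumes 4-byte values (the example HDF5 file is described as 32-bit float) and assumes the 900 files come from splitting the vertical axis into slabs of 5 for each of the 3 vector components; the variable names are illustrative only.

```python
BYTES_PER_VALUE = 4  # assumption: 32-bit floats

time_steps, horizontal, vertical, components = 22_728, 3_000, 1_500, 3

# Whole 4D array.
total_bytes = time_steps * horizontal * vertical * components * BYTES_PER_VALUE

# One file holds 5 vertical levels of a single vector component.
file_bytes = time_steps * horizontal * 5 * 1 * BYTES_PER_VALUE

# Assumed split: vertical slabs of 5, one file per slab and component.
file_count = (vertical // 5) * components

print(total_bytes)  # 1227312000000, about 1.2 TB
print(file_bytes)   # 1363680000
print(file_count)   # 900
```

Under these assumptions the per-file size times the file count reproduces the size of the full array.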
Example HDF5 File
- xhist00001hpss-scec
- xhist00002hpss-scec
- xhist00003hpss-scec
- xhist00004hpss-scec
- xhist00005hpss-scec
- HDF5 file: 32-bit float, 22,728 x 3,000 x 25
File on SRB server
Select a Subset
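To give a sense of what a subset selection saves, the sketch below compares the payload of a full read of the 22,728 x 3,000 x 25 example dataset with that of a small contiguous block. It is a minimal illustration, assuming 32-bit values and a stride-1 selection described by a start offset and a count per dimension; `hyperslab_bytes` and the chosen block are hypothetical and not part of the HDF5/SRB software.

```python
from math import prod


def hyperslab_bytes(shape, start, count, bytes_per_value=4):
    """Return the payload size in bytes of a contiguous block selection."""
    if not (len(shape) == len(start) == len(count)):
        raise ValueError("shape, start and count must have the same rank")
    for dim, s, c in zip(shape, start, count):
        if s < 0 or c < 1 or s + c > dim:
            raise ValueError("selection falls outside the dataset")
    return prod(count) * bytes_per_value


shape = (22_728, 3_000, 25)

# Reading the whole dataset.
full = hyperslab_bytes(shape, start=(0, 0, 0), count=shape)

# Hypothetical subset: 1,000 time steps, 100 horizontal points, 1 vertical level.
subset = hyperslab_bytes(shape, start=(0, 1_000, 0), count=(1_000, 100, 1))

print(full)    # 6818400000
print(subset)  # 400000
```

If only the selected block is transferred, the client receives a few hundred kilobytes instead of several gigabytes.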
HDFView