The HDF Group: A Brief Introduction to HDF5. Quincey Koziol, Director of Core Software and HPC, The HDF Group. March 5, 2015.

Presentation transcript:

A Brief Introduction to HDF5
Quincey Koziol, Director of Core Software and HPC, The HDF Group
March 5, 2015, HPC Oil & Gas Workshop

Why use HDF5?
- Challenging data: application data that pushes the limits of traditional solutions.
- Software solutions:
  - For very large and/or complex data
  - With very fast access requirements
  - Easily share data across platforms
  - Use different programming languages and OSs.
  - Take advantage of the tools that understand HDF5.
  - Enable long-term preservation of data.

HDF5 is like …

What is HDF5?
HDF5 == Hierarchical Data Format, v5
- A flexible data model
  - Structures for data organization and specification
- Open source software
  - Implements the data model
- Portable file format
  - Designed for high volume or complex data

HDF5 Data Model
- Groups: provide structure among objects
- Datasets: where the primary data goes
  - Data arrays
  - Rich set of datatype options
  - Flexible, efficient storage and I/O
- Attributes: for metadata
Everything else is built essentially from these parts.
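The three building blocks on this slide can be pictured with a small in-memory model. This is only a sketch in plain Python and does not use the HDF5 library: the `Group`, `Dataset`, and `lookup` names, the path `/survey/line_001/traces`, and the attribute values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Union


@dataclass
class Dataset:
    """Toy stand-in for a dataset: a typed array plus attributes (metadata)."""
    shape: Tuple[int, ...]
    dtype: str
    data: List[Any]
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """Toy stand-in for a group: named links to other groups or datasets."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def lookup(self, path: str) -> Union["Group", Dataset]:
        """Resolve a slash-separated path relative to this group."""
        node = self
        for name in filter(None, path.split("/")):
            if not isinstance(node, Group):
                raise KeyError(f"{name!r} is below a dataset, not a group")
            node = node.members[name]
        return node


# Build a small hierarchy: /survey/line_001/traces (hypothetical names).
traces = Dataset(shape=(2, 3), dtype="float32",
                 data=[[0.0, 0.1, 0.2], [1.0, 1.1, 1.2]],
                 attrs={"units": "mV"})
root = Group()
root.members["survey"] = Group(attrs={"operator": "example"})
root.members["survey"].members["line_001"] = Group()
root.members["survey"].members["line_001"].members["traces"] = traces

found = root.lookup("/survey/line_001/traces")
print(found.shape, found.dtype, found.attrs["units"])
```

Groups give the hierarchy, the dataset holds the array and its datatype, and attributes carry small pieces of metadata on either kind of object.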

HDF5 Software
HDF5 home page:

Useful Tools For New Users
- h5dump, h5ls: tools to “dump” or list the contents of an HDF5 file
- HDFView: Java browser for HDF5 files
- HDF5 Examples (C, Fortran, Java, Python, Matlab)
- h5cc, h5c++, h5fc: scripts to compile applications

Recent HPC Success Story
- Performance results on Blue NCSA
- I/O kernel of a DOE plasma physics application
- Running on 298,048 cores
- ~10 trillion particles
- Single 291 TB HDF5 file
- Achieved 52 GB/s
  - ~50% of the peak performance
- Using 1 GB stripe size and 160 Lustre OSTs
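A quick back-of-the-envelope check helps put these figures in proportion. The sketch below uses only the numbers quoted on the slide. It assumes decimal units (1 TB = 1000 GB = 1e12 bytes) and an even spread of I/O over all cores and OSTs, neither of which the slide states.

```python
# Figures quoted on the slide.
file_size_tb = 291        # single HDF5 file, in TB
throughput_gb_s = 52      # achieved aggregate I/O rate, in GB/s
particles = 10e12         # "~10 trillion particles"
cores = 298_048
osts = 160
fraction_of_peak = 0.5    # "~50% of the peak performance"

# Assumption: decimal units (1 TB = 1000 GB = 1e12 bytes).
file_size_gb = file_size_tb * 1000

# Time to move the whole file at the achieved rate.
transfer_time_s = file_size_gb / throughput_gb_s

# Average storage per particle, if the file holds only particle data.
bytes_per_particle = file_size_tb * 1e12 / particles

# Assumption: load is spread evenly over all OSTs and all cores.
per_ost_gb_s = throughput_gb_s / osts
per_core_mb_s = throughput_gb_s * 1000 / cores

# Peak rate implied by "~50% of peak".
implied_peak_gb_s = throughput_gb_s / fraction_of_peak

print(f"transfer time: {transfer_time_s / 60:.0f} minutes")
print(f"bytes per particle: {bytes_per_particle:.1f}")
print(f"per OST: {per_ost_gb_s:.3f} GB/s, per core: {per_core_mb_s:.3f} MB/s")
print(f"implied peak: {implied_peak_gb_s:.0f} GB/s")
```

Under these assumptions the file takes roughly an hour and a half to move at 52 GB/s, and each particle averages about 29 bytes.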

HDF5 in Oil & Gas
- RESQML: standard for reservoir data (Energistics)
  standards/current-standards
- H5EM-TS: exchange standard for field EM data (EMGS, Statoil, Interaction)
  ftp://fileformats.emgs.com/H5EM-TS_1.0/documentation/H5EM-TS_information_sheet.pdf

HDF5 in Oil & Gas
- TEMHDF: exchange standard for MetalMapper and other EMI data
  ftp://geom.geometrics.com/pub/Data/TEM2H5_Deliverables/TEM2HDF_RefManual.pdf
- PH5: archival format for active source seismic data (moving away from SEG-Y, to HDF5)
- Petrel: E&P workflow and visualization
  Pages/petrel.aspx

HDF5 in Oil & Gas
- Globe Claritas: HDF5 is the format for their seismic processing software
  - SEG-Y vs. HDF5 whitepaper: 303/55223/file/HDF5%20For%20Seismic%20Reflection%20Datasets.pdf
  - News release: est-Release
  - PDF data sheet: 39/47774/file/Claritas%20HDF5.pdf
  - Powerpoint: start-guide-to-using-hdf5-in-globe-claritas

Where We’ll Be Soon: HDF
- Beta release: Fall 2015
- Major features:
  - Single-Writer/Multiple-Reader (SWMR)
  - Virtual Datasets
  - Improved scalability of chunked datasets
  - Parallel I/O performance and capabilities

Other Items of Interest
- We’re not planning to change current multi-threaded concurrency behavior
- HDF5 Excel Add-in: HEXAD
- REST-based service for HDF5 data
- HDF Compass visualization package

Thank You!
Questions & Comments?

The HDF Group Services
- Helpdesk and Mailing Lists: available to all users as a first level of support
- Priority Support: rapid issue resolution and advice
- Consulting: needs assessment, troubleshooting, design reviews, etc.
- Training: tutorials and hands-on practical experience
- Enterprise Support: coordinate HDF activities across departments
- Special Projects:
  - Adapting customer applications to HDF
  - New features and tools
  - Research and development

HDF Planned Features: SWMR
- Improves HDF5 for data acquisition:
  - Allows simultaneous data gathering and monitoring/analysis
  - Focused on storing data sequences for high-speed data sources
- Supports ‘Ordered Updates’ to file:
  - Crash-proofs accessing HDF5 file
  - Possibly uses small amount of extra space
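One way to picture why ordering the updates matters is a toy append-only log: the writer always stores new data first and only afterwards publishes the count that readers rely on. This is a conceptual sketch in plain Python, not HDF5's SWMR implementation. The `AppendOnlyLog` class and its methods are hypothetical.

```python
class AppendOnlyLog:
    """Toy shared 'file': a data area plus a published element count."""

    def __init__(self) -> None:
        self.data = []        # stands in for the file's raw data area
        self.published = 0    # stands in for metadata that readers trust

    # Writer side: two separate steps, always in this order.
    def write_data(self, values) -> None:
        self.data.extend(values)          # step 1: data reaches storage

    def publish(self) -> None:
        self.published = len(self.data)   # step 2: metadata now refers to it

    # Reader side: only ever looks at the published prefix.
    def read(self) -> list:
        return self.data[: self.published]


log = AppendOnlyLog()
log.write_data([1, 2, 3])
log.publish()

log.write_data([4, 5])     # writer is mid-update (or crashes at this point)
snapshot = log.read()      # reader still sees a consistent prefix
log.publish()

print(snapshot, log.read())
```

Because metadata never points at data that has not been written, a reader (or a reader after a writer crash) sees an older but consistent view. The unpublished tail is the kind of "small amount of extra space" such a scheme can cost.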

HDF Planned Features: Virtual Object Layer (VOL)
- Provides the HDF5 data model and API, but allows different underlying storage mechanisms
- Intercepts all HDF5 API calls that can touch the data on disk and routes them to a VOL plugin
- Possibly SEG-Y VOL plugin?
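The routing idea can be sketched as a front-end API that forwards every storage-touching call to a pluggable back end. This is a plain-Python illustration of the pattern only. `StorageConnector`, `InMemoryConnector`, `LoggingConnector`, and `File` are hypothetical names and are not the HDF5 VOL interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class StorageConnector(ABC):
    """Hypothetical plugin interface (not the real HDF5 VOL API)."""

    @abstractmethod
    def dataset_write(self, path: str, values: List[float]) -> None: ...

    @abstractmethod
    def dataset_read(self, path: str) -> List[float]: ...


class InMemoryConnector(StorageConnector):
    """Back end that keeps datasets in a dictionary instead of a file."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def dataset_write(self, path: str, values: List[float]) -> None:
        self._store[path] = list(values)

    def dataset_read(self, path: str) -> List[float]:
        return list(self._store[path])


class LoggingConnector(StorageConnector):
    """Wraps another connector and records each call that passes through."""

    def __init__(self, inner: StorageConnector) -> None:
        self.inner = inner
        self.calls: List[str] = []

    def dataset_write(self, path: str, values: List[float]) -> None:
        self.calls.append(f"write {path}")
        self.inner.dataset_write(path, values)

    def dataset_read(self, path: str) -> List[float]:
        self.calls.append(f"read {path}")
        return self.inner.dataset_read(path)


class File:
    """Front end the application sees; the storage mechanism is pluggable."""

    def __init__(self, connector: StorageConnector) -> None:
        self._connector = connector

    def write(self, path: str, values: List[float]) -> None:
        self._connector.dataset_write(path, values)

    def read(self, path: str) -> List[float]:
        return self._connector.dataset_read(path)


logged = LoggingConnector(InMemoryConnector())
f = File(logged)
f.write("/traces", [1.0, 2.0])
values = f.read("/traces")
print(values, logged.calls)
```

Application code only talks to `File`, so swapping the connector changes where the data lives without changing the calling code. A SEG-Y back end, as the slide speculates, would be one more connector in this picture.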

HDF Planned Features: ‘Virtual’ Datasets
- Can “stitch together” multiple ‘source’ datasets into a single ‘virtual’ dataset
- Supports unlimited dimensions in both source and virtual datasets
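The "stitching" idea can be illustrated with a toy one-dimensional case, where a virtual index is mapped to a source array and a position inside it. This is a plain-Python sketch with fixed-size sources, not the HDF5 virtual dataset API. The `VirtualDataset1D` class is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


class VirtualDataset1D:
    """Toy 1-D 'virtual' dataset that presents several sources as one array."""

    def __init__(self, sources: Sequence[Sequence[float]]) -> None:
        self.sources = list(sources)
        # starts[i] is the virtual index at which source i begins.
        self.starts: List[int] = []
        total = 0
        for src in self.sources:
            self.starts.append(total)
            total += len(src)
        self.length = total

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> float:
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Find the last source whose start is <= index, then index into it.
        which = bisect_right(self.starts, index) - 1
        return self.sources[which][index - self.starts[which]]


vds = VirtualDataset1D([[10, 11], [20], [30, 31, 32]])
print(len(vds), vds[2], vds[5])
```

No data is copied: the virtual dataset only records where each source begins and redirects every access to the source that holds the element.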

HDF Planned Features: Chunk Imp.

Dataset type | Index type | Space improvements | Speed improvements
No unlimited dimensions, no I/O filters, no missing chunks | “implicit” (no actual chunk index) | Same storage space as contiguous dataset storage (no index) | Constant time lookups; faster parallel I/O
No unlimited dimensions | “fixed sized” smaller chunk index | Smaller index overhead | Constant time lookups
1 unlimited dimension | “extensible array” | Smaller index overhead | Constant time lookups and appends
2+ unlimited dimensions | Improved B-tree* | Smaller index overhead | Faster
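The "implicit" case needs no index because, with fixed dimensions, no filters, and no missing chunks, every chunk has the same size and a chunk's position can be computed directly. The sketch below shows that arithmetic in plain Python. It assumes chunks are laid out back to back in row-major order starting at a base address, which is an illustrative assumption and not a description of the HDF5 file format. The function names are hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def chunk_grid(dims: Sequence[int], chunk: Sequence[int]) -> Tuple[int, ...]:
    """Number of chunks along each dimension (edge chunks count as whole)."""
    return tuple(-(-d // c) for d, c in zip(dims, chunk))


def implicit_chunk_offset(coords: Sequence[int], dims: Sequence[int],
                          chunk: Sequence[int], elem_size: int,
                          base: int = 0) -> int:
    """Byte offset of the chunk holding element `coords`, with no index lookup."""
    grid = chunk_grid(dims, chunk)
    chunk_coords = [x // c for x, c in zip(coords, chunk)]
    linear = 0
    for k, n in zip(chunk_coords, grid):   # row-major order over the chunk grid
        linear = linear * n + k
    return base + linear * prod(chunk) * elem_size


# 100 x 100 dataset of 4-byte elements, stored in 10 x 25 chunks.
offset = implicit_chunk_offset(coords=(35, 60), dims=(100, 100),
                               chunk=(10, 25), elem_size=4, base=2048)
print(chunk_grid((100, 100), (10, 25)), offset)
```

The cost is a handful of integer operations regardless of how many chunks exist, which is what "constant time lookups" means here. Once chunks can be compressed, missing, or appended, their positions are no longer computable and some index structure is needed.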

HDF Planned Features: HPC
- Continue to improve our use of MPI and parallel file system features
  - Remove ‘truncate’ operation on file close, etc.
- Reduce # of I/O accesses for metadata access
  - Collective read/write of metadata
- Multi-dataset collective I/O
- Support for compression in parallel
  - Collective access mode only
- Possibly support Single-Writer/Multiple-Reader (SWMR) access in parallel

HDF5 Roadmap
- Concurrency
- Single-Writer/Multiple-Reader (SWMR)
- Internal threading
- Virtual Object Layer (VOL)
- Data Analysis
- Query / View / Index APIs
- Native HDF5 client/server
- Performance
- Scalable chunk indices
- Metadata aggregation and page buffering
- Asynchronous I/O
- Variable-length records
- Fault tolerance
- Parallel I/O
- I/O autotuning
“The best way to predict the future is to invent it.” – Alan Kay

Where We’re Not Going
- We’re not changing multi-threaded concurrency support
  - Keep “global lock” on library
  - Will focus on asynchronous I/O instead
  - Will be using threads internally though

Codename “HEXAD”
- HDF5 Excel Add-in: HEXAD
- Lets you do the usual things, including:
  - Display content (file structure, detailed object info)
  - Create/read/write datasets
  - Create/read/update attributes
- Plenty of ideas for bells & whistles
  - HDF5 Image & PyTables support, etc.
- Send in your Must Have/Nice To Have list!
- Stay tuned for the beta program

HDF Server
- REST-based service for HDF5 data
- Reference implementation for REST API
- Developed in Python using the Tornado framework
- Supports read/write operations
- Clients can be Python/C/Fortran or a web page
- Let us know what specific features you’d like to see.

HDF Compass
- “Simple” Python HDF5 viewer application
- Cross platform (Windows/Mac/Linux)
- Native look and feel
- Can display extremely large HDF5 files
- View HDF5 files and OpenDAP resources
- Plugin model enables different file formats/remote resources to be supported
- Community-based development model

Brief History of HDF
- 1987: At NCSA (University of Illinois), a task force forms to create an architecture-independent file format and library, which becomes HDF
- Early 1990’s: NASA adopts HDF for Earth Observing System project
- 1996: DOE collaborates with the HDF group (at NCSA) to create “Big HDF”, which becomes HDF5
- HDF5 released, with support from DOE, NASA & NCSA
- 2006: The HDF Group spins out of University of Illinois as non-profit corporation

The HDF Group
- Established at University of Illinois’ National Center for Supercomputing Applications
- 8 years as independent non-profit company: “The HDF Group”
- The HDF Group owns HDF4 and HDF5
- HDF4 & HDF5 formats, libraries, and tools are open source and freely available with BSD-style license