9/17/2015The HDF Group1 HDF Update Mike Folk The HDF Group HDF and HDF-EOS Workshop XI November 7, 2007
Outline
  What is The HDF Group?
  HDF Software Update
  Other Activities of Interest
What is The HDF Group (THG)?
THG, the Company
  Spun off from the University of Illinois in July 2006
  Non-profit
  20+ scientific, technology, and professional staff
  Intellectual property:
    THG owns HDF4 and HDF5
    HDF formats and libraries to remain open
    Libraries have a BSD-type license
  Continue ties to U of I and NCSA
The mission of The HDF Group is to ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.
Goals
  Maintain and evolve HDF for sponsors and communities that depend on it
  Do consulting, training, tuning, development, research
  Sustain The HDF Group for the long term to assure data access over time
THG Services
  Helpdesk and Mailing Lists
    Available to all users as a first level of support
  Standard Support
    Rapid issue resolution support
  Consulting
    Needs assessment, troubleshooting, design reviews, etc.
  Enterprise Support
    Coordinating HDF activities across divisions
  Special Projects
    Adapting customer applications to HDF
    New features and tools, with changes normally incorporated into the open source product
  Research and Development
  Training
    Tutorials and hands-on practical experience
HDF Software Update
HDF4 update
HDF 4.2r2
  Released in October
New features and changes
  New APIs added to the SD and GR interfaces:
    SDreset_maxopenfiles, SDget_maxopenfiles: Modify and report the maximum allowable number of open files
    SDget_numopenfiles: Gets number of open files
    SDgetcompinfo, GRgetcompinfo: Get compression info
    SDgetfilename: Retrieves name of file, given its ID
    SDgetnamelen: Retrieves length of object name, given its ID
  SZIP compression
    Can now be invoked from the Fortran API
    Now available for raster images via the GR interface
  SDS and Vgroup names no longer limited to 64 characters
New features and changes
  HDF configuration changes
    --enable-netcdf flag introduced
    Autotools versions updated
  Many bug fixes made to hrepack and hdiff
  See RELEASE.txt for a full list of changes
Platforms to drop/add next release
  Drop
    Windows XP with MSVC
    Linux 2.4
    IRIX
    SunOS 5.8, 5.9
  Add
    Windows 64-bit (32 and 64-bit binaries)
Platforms tested
  Systems
    AIX 5.3 (32-bit, 64-bit)
    FreeBSD 6.2 (32-bit, 64-bit)*
    HP-UX B (32-bit, 64-bit)*
    IRIX64 v6.5 (32-bit, 64-bit)
    Linux 2.4, 2.6*
    Linux ia64
    Linux x86_64
    SunOS 5.8, 5.10* (32-bit, 64-bit)
    SunOS 5.10 on Intel
    Windows XP, Vista
    Mac OS X Intel*
  Compilers
    IBM C and Fortran compilers
    GNU gcc 3.4* and GNU Fortran
    HPUX C and Fortran compilers
    GNU gcc 3.4 and 4.*
    Intel C and Fortran versions 9.1 and
    SUN WorkShop C and Fortran
    Visual Studio .NET and 2005 and Intel Fortran
    Visual Studio 2005 (no Fortran)
    GNU gcc with gfortran and g95
  * New platforms
  For detailed info, see RELEASE.txt
HDF5 Update
HDF
HDF release
  Primarily a bug-fix release
  Some tool changes (see later slide)
Platforms dropped
  Operating systems
    AIX 5.3
    Solaris 2.8 and 2.9
    OSF1
    Windows XP with MSVC
  Compilers
    PGI 6.5-*
Platforms added
  Systems
    Alpha OpenVMS
    Mac OS X 10.4 (Intel)
    Solaris 2.* on Intel
    Cray XT3
    Windows 64-bit (32 and 64-bit)
    BG/L
  Compilers
    PGI V. 7.*
    Intel 10.*
    MPICH
    MPICH2
HDF5 1.8
HDF5 1.8 – new library features
  Datatype and dataspace features
    Create datatype from text description
    Integer to float conversions during I/O
    Compact storage for N-bit datatypes
    Offset+size storage filter, saving space
    “Null” dataspace – datasets with no elements
    Data transformation filter
HDF5 1.8 – new library features
  Group improvements
    Creation order access
    Compact groups – small groups take less space
    Large group storage improvements
    Intermediate group creation
  Link improvements
    Unicode names allowed
    External links – to objects in another file
    User defined links – create own kinds of links
HDF5 1.8 – new library features
  Attribute improvements
    Improved storage for large number of attributes
    Iterate or look up by creation order
    Unicode names allowed
  Support for Unicode UTF-8 character set
  Shared header information, possibly saving space
  Metadata cache improvements – faster I/O on files with many objects
  Better UNIX/Linux portability
HDF5 1.8 – new APIs
  New extendible error-handling API
  New APIs to copy objects between files quickly
  Dimension scale model and API
  “HDFpacket” API, to read/write packets efficiently
HDF5 1.8 – Backward and Forward Compatibility
HDF5 1.8 and 1.6
  Differences between 1.8 and 1.6.x
    Some file format changes
    Several new routines added
    Old APIs deprecated – may be removed in later release
  Consequences
    Applications requiring 1.8 format changes will generate objects that cannot be read by the 1.6 library
    To exploit 1.8 changes, applications need to be rewritten
“The art of progress is to preserve order amid change, and to preserve change amid order.” Alfred North Whitehead
Principle of Maximum File Format Compatibility
  Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information.
  Assures older library versions are forward compatible whenever possible:
    Objects in new files can be read with old versions of the library, if the objects are “known” to the old libraries.
  New versions of the library can always read objects in files written with older versions.
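The principle can be pictured with a toy model. This is only a sketch, not how the HDF5 library is implemented: the feature flags, the two version numbers, and both function names are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical feature flags; these are NOT HDF5 identifiers. */
enum {
    FEAT_BASIC          = 0,       /* needs nothing beyond the old format */
    FEAT_CREATION_ORDER = 1 << 0,  /* assumed to need the new format */
    FEAT_EXTERNAL_LINK  = 1 << 1   /* assumed to need the new format */
};

/* Hypothetical format versions for the toy model. */
enum { FORMAT_OLD = 1, FORMAT_NEW = 2 };

/* Pick the earliest format version able to describe the object,
 * unless the caller explicitly asks for the latest one. */
int choose_format_version(unsigned features, int force_latest)
{
    if (force_latest)
        return FORMAT_NEW;
    return (features != 0) ? FORMAT_NEW : FORMAT_OLD;
}

/* In this model an old library "knows" only objects in the old format. */
int old_library_can_read(int object_version)
{
    assert(object_version == FORMAT_OLD || object_version == FORMAT_NEW);
    return object_version == FORMAT_OLD;
}
```

In this model an object with no new features stays readable by the old library, while an object that uses an external link (or a caller that forces the latest format) does not. In the HDF5 1.8 library itself, the "instructed otherwise" switch is the file access property set with H5Pset_libver_bounds; the sketch does not call it.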
HDF5 Forward Compatibility
  Format: Can old libraries access files made by new library?
    Old library versions will read all objects in a file created by a newer library if objects are known to the old library
  API: Can old applications link with the new library?
    Applications written to work with an older version of the library will compile, link and run as expected with a newer version
HDF5 Backward Compatibility
  File Format: Can new library access files made by old library?
    Newer version of the library will always read files created with an older version
  Library APIs: Can new applications link with the older libraries?
    Application written for the newer version will compile and link with the older library unless new features are used
HDF5 Compatibility information
  Compatibility info: ADGuide/CompatFormat180.html
  (Note: URL will change when HDF5 1.8 is officially released.)
Command Line Tools
New features for existing tools
  -V option for all tools
    Prints HDF5 library version number used by tool
  h5repack: -L option
    Use latest version of file format to create objects
  h5dump: dumps groups/attributes in creation or name order
    -q Q, --sort_by=Q      Sort groups and attributes by index Q
    -z Z, --sort_order=Z   Sort groups and attributes by order Z
New command line tools
  h5mkgrp
    Creates new groups and group hierarchies in an HDF5 file
  h5stat
    Provides statistics regarding the file, such as number of objects per group, sizes of datasets, amount of free space in file
  h5copy
    Copies objects within a file or across files
  h5check
    Verifies an HDF5 file against the defined HDF5 File Format Specification
    Completed for 1.6. In progress for 1.8
Tool work in the pipeline
  Export numeric data formatted in several different ways (such as MS Excel, XML, etc.)
  Import ASCII data that conforms to a certain format
  Use a common text format for h5import and h5dump
  Support NaN in tools such as h5diff. Challenges:
    NaN is platform specific
    NaN can have different values on the same machine
    Checking for NaN can be a performance hit
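The NaN challenges can be seen in a few lines of standard C. This is a sketch assuming IEEE 754 64-bit doubles; double_bits, nan_with_payload, and values_match are hypothetical helpers, not h5diff code.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a double. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Build a quiet NaN carrying a chosen payload (IEEE 754 binary64 assumed). */
double nan_with_payload(uint64_t payload)
{
    uint64_t bits = 0x7ff8000000000000ULL | (payload & 0x0007ffffffffffffULL);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical diff rule: two values match if they compare equal,
 * or if both are NaN regardless of their bit patterns. */
int values_match(double a, double b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b;
}
```

Two NaNs built with different payloads have different bit patterns and never compare equal with ==, so a diff tool needs an explicit rule such as values_match. That rule adds an isnan test to every element comparison, which is where the performance concern comes from.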
HDF Java Products
HDF5 Java is Growing Up
HDFView changes
  HDFView 2.4 released
  Many new features, such as
    Support for compound datatypes of 2D+ arrays
    Support for "filtering fill value" in Image Viewer
    Effective handling of large 3D images
    Support large fonts in GUI components
    New autogain algorithm for image Brightness/Contrast
  New platforms
    Mac Intel
    Linux 64-bit AMD
    Solaris 64-bit
Other Java products
  36 new enhancements and 44 bugs fixed
  Test suite (using JUnit testing framework)
    Tests all public methods in the object package
    Added “make check” to run the test suite
  Enhanced documentation
    All public methods in the object package are fully documented
Future work for Java
  Update HDF5 JNI APIs for HDF5 1.8 release
  Release HDFView with bug fixes/new features with HDF5 1.8 release
  Port HDF5-SRB model to HDF5-iRODS model
  Writing capability for HDF5-iRODS model
Other Activities of Interest
New THG Website
HDF Performance Framework
Goals
  A framework for performance regression testing
  A tool for
    Testing on multiple platforms
    Testing different versions
    Long term regression testing
    Assistance in debugging
Solution
  [Diagram labels: A User’s Benchmark, Performance Library, Database, Web Server, cron, www, HDF5 1.6, HDF5 1.8, PHP, Graph/Text]
Sample Usage
  Benchmark code:
    H5Perf_startTimer(&time);
    for (i = 0; i < 1000; i++) {
        H5Gcreate(fileid, group_name, (size_t)0);  // Add groups
    }
    H5Perf_endTimer(&time);
    H5Perf_addInstance(db_host, date, time);
  cron entry:
    * * * /home/local/hyoklee/src/chicago/test-perf-hdfdap-3.sh
  Recorded result (columns: Timestamp, Instance Name, Version, Platform, Time):
    | | :51:14 | groups | creating empty groups | | hdfdap | | 4384 |
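A minimal sketch of the start/stop timing pattern used in the sample. The H5Perf_* signatures are not given on the slide, so perf_timer, perf_start, perf_end, workload, and time_workload are hypothetical stand-ins, and a plain summing loop replaces the 1000 H5Gcreate calls so the example builds without HDF5.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical timer; not the framework's H5Perf_* API. */
typedef struct {
    clock_t start;        /* CPU clock at perf_start() */
    double  elapsed_sec;  /* filled in by perf_end() */
} perf_timer;

void perf_start(perf_timer *t)
{
    t->start = clock();
    t->elapsed_sec = 0.0;
}

void perf_end(perf_timer *t)
{
    t->elapsed_sec = (double)(clock() - t->start) / CLOCKS_PER_SEC;
}

/* Stand-in workload: sums 0..n-1 instead of creating n groups. */
unsigned long workload(int n)
{
    unsigned long sum = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        sum += (unsigned long)i;
    return sum;
}

/* Keeps the compiler from discarding the timed work. */
static volatile unsigned long sink;

/* Time one run of the workload and return the elapsed CPU seconds. */
double time_workload(int n)
{
    perf_timer t;
    perf_start(&t);
    sink = workload(n);
    perf_end(&t);
    return t.elapsed_sec;
}
```

The value returned by time_workload is what a framework of this kind would store, together with a timestamp, version, and platform, so that runs against different library versions can be compared over time.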
Improved Crash Survivability in the HDF5 Library
Crash Survivability in HDF5
  Problem:
    Data in HDF5 files susceptible to corruption in the event of an application or system crash.
    Corruption possible if structural metadata is being written when the crash occurs.
  Initial Objective:
    Guarantee an HDF5 file with consistent metadata can be reconstructed in the event of a crash.
    No guarantee on state of raw data – contains whatever made it to disk prior to crash.
Crash Survivability in HDF5
  Approach: Metadata Journaling
    When a piece of metadata is modified and in a consistent state, make a journal note.
    If the application crashes, a recovery program can replay the journal by applying in order all metadata writes until the end of the last completed transaction written to the journal file.
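The replay rule can be sketched with an in-memory toy journal. This is an illustration under stated assumptions, not the HDF5 journal format: the "metadata" is a small int array, and journal_entry, journal_replay, and demo_replay are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum { META_SLOTS = 8 };  /* the toy "metadata" is an array of ints */

typedef enum { J_WRITE, J_COMMIT } journal_kind;

typedef struct {
    journal_kind kind;
    int slot;   /* metadata slot targeted by a J_WRITE */
    int value;  /* new value for that slot */
} journal_entry;

/* Apply journaled writes in order, stopping at the end of the last
 * completed (committed) transaction. Returns the number of journal
 * entries consumed, commit markers included. */
size_t journal_replay(const journal_entry *j, size_t n, int meta[META_SLOTS])
{
    size_t last_commit = 0;  /* entries before this index are safe to apply */
    for (size_t i = 0; i < n; i++)
        if (j[i].kind == J_COMMIT)
            last_commit = i + 1;

    for (size_t i = 0; i < last_commit; i++) {
        if (j[i].kind == J_WRITE) {
            assert(j[i].slot >= 0 && j[i].slot < META_SLOTS);
            meta[j[i].slot] = j[i].value;
        }
    }
    return last_commit;
}

/* Example: one committed transaction, then a write cut off by a crash. */
int demo_replay(int slot)
{
    static const journal_entry journal[] = {
        { J_WRITE, 0, 10 }, { J_WRITE, 1, 20 }, { J_COMMIT, 0, 0 },
        { J_WRITE, 1, 99 }  /* no commit follows: transaction was in flight */
    };
    int meta[META_SLOTS] = { 0 };
    assert(slot >= 0 && slot < META_SLOTS);
    journal_replay(journal, sizeof journal / sizeof journal[0], meta);
    return meta[slot];
}
```

After replay, slot 1 holds 20 rather than 99: the write that was in flight at the crash is discarded, so the recovered metadata reflects the last consistent state.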
Faster HDF5 Data Appends
Fast Data Appends
  Problem: Metadata operations limit the rate at which HDF5 can append data to datasets.
  Solution: new data structure for indexing chunks:
    Allows constant time extend, shrink and lookup of chunks in datasets with a single unlimited dimension
    Number of metadata I/O operations to append to a dataset is independent of the number of chunks
    Allows single-writer/multiple-reader access
  Details at: ChunkIndex/SkipListChunkIndex.html
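Why a single unlimited dimension makes constant-time lookup possible can be shown with a flat, growable array indexed by chunk number. This is an in-memory illustration only; it is not the data structure described at the link, and it says nothing about on-disk metadata I/O or single-writer/multiple-reader access. All names and the chunk addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t *addr;      /* file address of each chunk, indexed by chunk number */
    size_t    count;     /* chunks in use */
    size_t    capacity;  /* allocated slots */
} chunk_index;

/* Append one chunk address. Capacity doubling makes this amortized
 * constant time. Returns 0 on success, -1 if allocation fails. */
int chunk_index_extend(chunk_index *ix, uint64_t chunk_addr)
{
    if (ix->count == ix->capacity) {
        size_t new_cap = ix->capacity ? ix->capacity * 2 : 4;
        uint64_t *p = realloc(ix->addr, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        ix->addr = p;
        ix->capacity = new_cap;
    }
    ix->addr[ix->count++] = chunk_addr;
    return 0;
}

/* Drop the last chunk; constant time. */
void chunk_index_shrink(chunk_index *ix)
{
    assert(ix->count > 0);
    ix->count--;
}

/* Map an element offset along the unlimited dimension to its chunk address. */
uint64_t chunk_index_lookup(const chunk_index *ix, uint64_t elem, uint64_t chunk_len)
{
    uint64_t n;
    assert(chunk_len > 0);
    n = elem / chunk_len;  /* chunk number: one division, no search */
    assert(n < ix->count);
    return ix->addr[n];
}

/* Example: 10 chunks of 100 elements at made-up addresses, then one shrink. */
uint64_t demo_lookup(uint64_t elem)
{
    chunk_index ix = { NULL, 0, 0 };
    uint64_t a;
    for (uint64_t c = 0; c < 10; c++)
        if (chunk_index_extend(&ix, 4096 + c * 512) != 0)
            abort();
    chunk_index_shrink(&ix);  /* 9 chunks remain */
    a = chunk_index_lookup(&ix, elem, 100);
    free(ix.addr);
    return a;
}
```

Because chunks along one growing dimension are numbered consecutively, the chunk number is just the element offset divided by the chunk length, and the cost of a lookup does not depend on how many chunks exist.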
netCDF-4
netCDF-4 Project
  Enhanced NetCDF-4 Interface to HDF5
  Combine features of netCDF and HDF5
    Take advantage of their separate strengths
  Collaboration between NCSA, THG, Unidata
  Currently in beta release
  Will be released after HDF5 1.8
NetCDF-4 Architecture
  [Diagram labels: HDF5 Library, netCDF-4 Library, netCDF-3 Interface, netCDF-3 applications, netCDF-4 applications, HDF5 applications, netCDF files, netCDF-4 HDF5 files, HDF5 files]
  Supports access to netCDF files and HDF5 files created through the netCDF-4 interface
HDF5 OPeNDAP Project
Project description
  Investigate integrated DAP-aware HDF5 library that can provide seamless access to both local and remote data
  A NASA ROSES NRA project
  See Kent Yang’s talk and poster
NOAA – Science Data Stewardship
  Use HDF5 Archival Information Package (AIP) to archive HDF-EOS2 data
  A collaboration between NSIDC and THG
  See Ruth Duerr and Kent Yang’s poster
HDF5 and .NET Framework
Why .NET?
  The Microsoft .NET framework is used by most new applications created for Windows.
    Makes it easier to develop applications
    Reduces application vulnerability to security threats
    Supports development in multiple programming languages, in particular C#.
  Increased level of interest in .NET from users of HDF5.
HDF and .NET Status
  Received funding to implement prototype .NET wrapper API for Windows XP
    Based on HDF5 C API
    Focus on C# binding
    Functionality limited to subset of API routines
  If funded, we would like to move beyond the prototype to
    Create .NET wrappers for all HDF C functions
    Offer full support for .NET wrappers with HDF5 1.8
Bioinformatics
  Managing genomic data
  caacaagccaaaactcgtacaaCgagatatctcttggaaaaactgctcacaatattgacgtacaaggttgttcatgaaactttcggtaAcaatcgttgacattgcgacctaatacagcccagcaagcagaat
Electron tomography
  25-80Å resolution
  4k x 4k x 500 images now
  8k x 8k x 1k images soon (256 GB)
Next Generation DNA Sequencing
  Next Gen Sequencing platforms produce ~1500 X more data than CE (Sanger)
  A single Next Gen instrument can produce 20 times more data in a single run than a day’s operation of a genome center with 100 CE instruments
An on Sept 21…
  “… A little background, we're doing genetic association studies, these result in large 2-d matrices (40K x 1M before applying threshholds). Each of the cells in this matrix has ~10 numerical statistics (e.g. some sort of pvalue)… ”
  40K x 1M x 10 x 4 = 1,600,000,000,000 (1.6 TB)
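The size estimate can be checked with 64-bit integer arithmetic. In this sketch the final factor of 4 is read as 4 bytes per stored value, which is an assumption; matrix_bytes and mul_checked are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 64-bit sizes, asserting that the product does not overflow. */
static uint64_t mul_checked(uint64_t a, uint64_t b)
{
    assert(a == 0 || b <= UINT64_MAX / a);
    return a * b;
}

/* Bytes for a dense rows x cols matrix holding `stats` values per cell. */
uint64_t matrix_bytes(uint64_t rows, uint64_t cols,
                      uint64_t stats, uint64_t bytes_per_value)
{
    return mul_checked(mul_checked(rows, cols),
                       mul_checked(stats, bytes_per_value));
}
```

matrix_bytes(40000, 1000000, 10, 4) returns 1,600,000,000,000 bytes, which is 1.6 TB in decimal units and matches the figure on the slide. A 32-bit integer would overflow long before that, so sizes at this scale need a 64-bit type.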
Product Data STEP
Product data
  HDF5 proposed to ISO as binary representation for product data representation and exchange
  Would be a binary option to the STEP format
  ISO/NWI-CD , STEP Part 26
SQL Server and HDF5
  THG discussing possible project with Microsoft
  Microsoft envisions a dream environment for scientists that would encompass both computing and data management
  Possible SQL Server solution
    Combine RDBMS and scientific analysis tools in a single integrated system
    Use HDF5 to manage scientific objects not handled well by a traditional database
HDF5 in SQL Server
  [Diagram labels: Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; Visualization; Libraries (MATLAB, …); HDF5 files; Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; HDF5 type; HDF5 Index; HDF5 FS blob; HDF5 TVFs; .NET Languages with Language Integrated Query; SQL Server]
Thank You All and Thank You NASA!
Acknowledgement
  This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
Questions/comments?
Information Sources
  HDF website
  HDF5 Information Center
  HDF Helpdesk
  HDF users mailing list
  coming soon: