1 of 14 Substituting HDF5 tools with Python/H5py scripts Daniel Kahn Science Systems and Applications Inc. HDF HDF-EOS Workshop XIV, 28 Sep. 2010.


2 of 14 What are HDF5 tools?
HDF5 tools are command line programs distributed with the HDF5 library. They allow users to manipulate HDF5 files.
h5dump: Dump HDF5 data as ASCII text.
h5import: Convert non-HDF5 data to HDF5.
h5diff: Show differences between HDF5 files.
h5copy: Copy objects between HDF5 files.
h5repack: Copy an entire file while changing storage properties of HDF5 objects.
h5edit: (proposed) Add attributes to HDF5 objects.
HDF5 tools have a long history as the first (and for a long time the only) way to manipulate HDF5 files conveniently, i.e. without writing a C or Java program and without buying expensive commercial software such as IDL or Matlab.

3 of 14 The tools can be characterized as having three parts:
Text Processing: Evaluate command arguments, process input text files, match group names.
Tree Walking: Search the HDF5 file hierarchy for objects by name.
Object Level Operations: Operate on the objects: copy, diff, repack, etc.
The tools are simple to use and convenient, as they are distributed with the HDF5 library.

4 of 14 Disadvantages of HDF5 tools:
The command line arguments limit tool capability.
Development time for designing and implementing new features is long (weeks to months): use cases must be evaluated, a solution proposed in an RFC, the proposal implemented, and the new code distributed in the next release.
Adding new features with a command line syntax that is both readable and does not break the legacy syntax becomes difficult.

5 of 14 Here's an example from the HDF documentation:
    h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/array" -d "/array"
But suppose we had multiple datasets named arrayNNN, where each N is a digit 0 to 9. We'd like to write something like:
    h5copy -v -i "test1.h5" -o "test1.out.h5" -s "/array\d{3}"
so that \d{3} would match all such objects. Extending the tool syntax to meet this use case, and then again for the next use case, would be a never-ending game of catch-up. A more flexible substitute is desirable...
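
The pattern-based copy wished for above might look something like this in Python with h5py. This is a minimal sketch, not a replacement for h5copy: it assumes the datasets sit directly in the root group, `copy_matching` is a hypothetical helper name, and `src` and `dst` are assumed to be already-open h5py files.

```python
import re


def copy_matching(src, dst, pattern):
    """Copy every top-level object of `src` whose name fully matches `pattern`.

    `src` and `dst` are assumed to be open h5py.File (or h5py.Group) objects;
    `dst` must be writable. Returns the list of names that were copied.
    """
    regex = re.compile(pattern)
    copied = []
    for name in src:  # iterating a group yields the names of its members
        if regex.fullmatch(name):
            # Group.copy duplicates the object, with its attributes, into dst
            src.copy(name, dst, name=name)
            copied.append(name)
    return copied


# Usage sketch (file names taken from the h5copy example above):
#   import h5py
#   with h5py.File("test1.h5", "r") as src, h5py.File("test1.out.h5", "a") as dst:
#       copy_matching(src, dst, r"array\d{3}")
```

Because the selection logic is ordinary Python, the next use case (a different pattern, a size filter, a renaming rule) is a change to one line of the script rather than a new command line option.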

6 of 14...Python?

7 of 14 What is Python?
Python is a programming language.
Unlike Perl, it supports native floating point numbers.
It has scientific array support in the style of IDL or Matlab (numpy module). Array operations can be programmed using normal arithmetic operators.
It has access to the HDF5 library (Andrew Collette's h5py module).
It features dynamic binding of variables, like Perl, shell scripts, IDL, and Matlab, but unlike C or Fortran.
Python is currently the only programming language in widespread use to have all these features. They are essential to the success of the language for easy HDF5 file manipulation.

8 of 14 Real world experience: Learning Python and h5py is quick.
In the summer of 2010 SSAI hired a summer intern. Equipped with some Perl programming experience, the intern was able to come up to speed on Python, HDF5, h5py, and numpy within one to two weeks and, over the summer, develop a specialized file/dataset merging tool and a dataset conversion tool.
Python and h5py are the best way to introduce HDF5 because they allow the user to concentrate on the H in HDF5, rather than the C API syntax.

9 of 14 Python is well suited to HDF5
Python is well suited to HDF5 because numpy array objects carry the dimensionality, extent, and element data type information, just as HDF5 datasets do. The object-oriented nature of Python allows these objects to be manipulated at a high level. C, by contrast, lacks a scientific array object and the ability to define object methods.
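
As a small illustration of that point, the sketch below builds a numpy array and reads back the metadata it carries. The variable names are illustrative only; the array happens to have the same shape and element type as the dataset written in the example on the next slide.

```python
import numpy as np

# A 4 x 6 array of 32-bit integers holding the values 1..24
data = np.arange(1, 25, dtype="int32").reshape(4, 6)

# The array object itself knows its dimensionality, extent, and element type
print(data.ndim)   # 2
print(data.shape)  # (4, 6)
print(data.dtype)  # int32

# Array operations use normal arithmetic operators, applied element by element
scaled = data * 0.5 + 1.0
print(scaled.dtype)   # float64
print(scaled[0, 0])   # 1.5
```

An h5py dataset exposes `shape` and `dtype` attributes in the same way, which is why moving data between a numpy array and an HDF5 dataset needs no separate bookkeeping of dimensions or types.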

10 of 14 Example: Creating and Writing a Dataset to a New File
Python:
    import h5py
    import numpy
    TestData = numpy.array(range(1,25),dtype='int32').reshape(4,6)
    h5py.File("WrittenByH5PY.h5","w")['/TestDataset'] = TestData
Compare to C version:
    #include "hdf5.h"
    int main() {
        hid_t file_id, dataspace_id, dataset_id; /* identifiers */
        herr_t status;
        hsize_t dims[2];
        const int FirstIndex = 4, SecondIndex = 6;
        int i, j, dset_data[4][6];
        for (i = 0; i < 4; i++) /* Initialize the dataset. */
            for (j = 0; j < 6; j++)
                dset_data[i][j] = i * 6 + j + 1;
        dims[0] = FirstIndex;
        dims[1] = SecondIndex;
        /* Create a new file. */
        file_id = H5Fcreate("WrittenByC.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        dataspace_id = H5Screate_simple(2, dims, NULL);
        dataset_id = H5Dcreate(file_id, "/TestDataset", H5T_STD_I32LE, dataspace_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        /* Write the dataset. */
        status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
        status = H5Dclose(dataset_id); /* Close the dataset. */
        status = H5Sclose(dataspace_id); /* Close the dataspace. */
        status = H5Fclose(file_id); /* Close the file. */
        return 0;
    }

11 of 14 And here's the output:
    h5dump WrittenByH5PY.h5
    HDF5 "WrittenByH5PY.h5" {
    GROUP "/" {
       DATASET "TestDataset" {
          DATATYPE H5T_STD_I32LE
          DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
          DATA {
          (0,0): 1, 2, 3, 4, 5, 6,
          (1,0): 7, 8, 9, 10, 11, 12,
          (2,0): 13, 14, 15, 16, 17, 18,
          (3,0): 19, 20, 21, 22, 23, 24
          }
       }
    }
    }

12 of 14 Python and the Three Pillars of HDF5 Tools
Python is well suited to Text Processing. Python has a wide range of string manipulation functions, an easy-to-use regular expression module, and list and dictionary (hash table) objects. No segmentation faults!
Python is well suited to Tree Walking. Recursive functions and loops over lists are easy to write.
Object Level Operations... Not so much. Object Level Operations (e.g. copy, diff) are challenging to write efficiently and should be provided as part of the API by the HDF Group, for example H5Ocopy. API functions are available to the Python programmer via h5py.
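
The first two pillars can be combined in a few lines. The sketch below is one possible approach, not the method used by the HDF5 tools: rather than a hand-written recursive function it relies on h5py's own walker, `visititems`, and `find_objects` is a hypothetical helper name. The `group` argument is assumed to be an open h5py file or group.

```python
import re


def find_objects(group, pattern):
    """Search the hierarchy below `group` for objects whose own name matches.

    `group` is assumed to be an open h5py.File or h5py.Group. Returns a list
    of (path, kind) pairs, where kind is the object's class name, for example
    "Dataset" or "Group".
    """
    regex = re.compile(pattern)
    matches = []

    def visitor(path, obj):
        # `path` is relative to `group`; its last component is the object's name
        own_name = path.rsplit("/", 1)[-1]
        if regex.fullmatch(own_name):
            matches.append((path, type(obj).__name__))
        return None  # returning None tells visititems to keep walking

    group.visititems(visitor)
    return matches


# Usage sketch (the file name is hypothetical):
#   import h5py
#   with h5py.File("test1.h5", "r") as f:
#       for path, kind in find_objects(f, r"array\d{3}"):
#           print(kind, path)
```

The regular expression handles the text processing and `visititems` handles the tree walking; only the object level operation applied to each match still depends on what the HDF5 API provides.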

13 of 14 Why use Python to substitute HDF5 tools?
Python is available now.
Python is a full programming language. It can accomplish tasks which HDF5 tools cannot.
Some HDF5 tools are still under development as new use cases are presented. For example, users have requested a tool to add attributes to HDF5 files. Such a capability already exists with h5py:
    python -c "import h5py ; fid = h5py.File('FileForAttributeAddition.h5','r+') ; fid['/TestDataset'].attrs['CmdLine1'] = 'NewValue' ; fid.close()"
It's a little ugly, but it is available today.
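
The same attribute addition reads more comfortably as a short script. This is a sketch under assumptions: `add_attributes` is a hypothetical helper name, and `h5file` is assumed to be an h5py file opened in a writable mode such as "r+".

```python
def add_attributes(h5file, object_path, attributes):
    """Attach each name/value pair in `attributes` to the object at `object_path`.

    `h5file` is assumed to be an h5py.File opened in a writable mode
    ("r+" or "a"). Raises KeyError if the object does not exist.
    """
    if object_path not in h5file:
        raise KeyError(f"no object named {object_path!r} in the file")
    target = h5file[object_path]
    for name, value in attributes.items():
        # Assigning through .attrs creates the attribute or overwrites it
        target.attrs[name] = value


# Usage sketch, mirroring the one-liner above:
#   import h5py
#   with h5py.File("FileForAttributeAddition.h5", "r+") as fid:
#       add_attributes(fid, "/TestDataset", {"CmdLine1": "NewValue"})
```

Opening the file in a `with` block closes it even if the assignment fails, which the one-liner does not guarantee.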

14 of 14 Recommendations:
The HDF Group should avoid complex enhancements to tools where Python/h5py could be used instead.
The HDF Group should concentrate on providing efficient API functions for object level tasks: object copy, dataset difference, etc.
Users should consider Python and h5py to accomplish their HDF5 file manipulation projects.
An easily searched contributed application repository on the HDF Group website with user ratings would be very helpful.