1 Overview of HDF5 HDF Summit Boeing Seattle The HDF Group (THG) September 19, 2006.

Presentation transcript:

1 Overview of HDF5 HDF Summit Boeing Seattle The HDF Group (THG) September 19, 2006

2 Topics: What is HDF? Sample uses of HDF. THG the Company.

3 What is HDF?

4 Answering big questions … Matter & the universe. Weather and climate (figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002). Life and nature.

5 involves big data …

6 varied data… caacaagccaaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtgg cgagatatctcttggaaaaactttcaagagcaactcaatcaactttctcgagcattgctt gctcacaatattgacgtacaagataaaatcgccatttttgcccataatatggaacgttgg gttgttcatgaaactttcggtatcaaagatggtttaatgaccactgttcacgcaacgact acaatcgttgacattgcgaccttacaaattcgagcaatcacagtgcctatttacgcaacc aatacagcccagcaagcagaatttatcctaaatcacgccgatgtaaaaattctcttcgtc ggcgatcaagagcaatacgatcaaacattggaaattgctcatcattgtccaaaattacaa aaaattgtagcaatgaaatccaccattcaattacaacaagatcctctttcttgcacttgg

7 and complex relationships… (figure labels: Contig Summaries Discrepancies Contig Qualities Coverage Depth Read quality Aligned bases Contig Reads Percent match SNP Score Trace)

8 on big computers…

9 and on little computers.

10 How do we… Describe the data? Read it? Store it? Find it? Share it? Mine it? Move it into, out of, and between computers and repositories?

11 HDF is: a file format for managing any kind of data; software to store and access data in the format; suited especially to large or complex data collections; suited for every size of system; platform independent (runs almost anywhere); open (both file formats and software).

12 HDF solution: I/O software & tools; common data models; standard APIs; scientific data file format; efficient storage, I/O.

13 An HDF file is a container… into which you can put your data objects. (Figure: a file holding a palette and a table with columns lat, lon, temp.)

14 HDF structures for organizing objects in files. (Figure: groups “/” (root) and “/foo” organizing raster images, a palette, a 2-D array, a 3-D array, and a table with columns lat, lon, temp.)

15 HDF5 data model. HDF5 file: container for data objects. Primary objects: groups, datasets. Additional ways to organize data: attributes, sharable objects, storage and access properties. Everything else is built from these parts.
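
The data model on this slide can be pictured with a small in-memory sketch. This is a toy model written with the Python standard library only, not the HDF5 library or its API; the class names, the lookup helper, the sample values, and the "units" attribute are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Dataset:
    """A named array of values plus descriptive attributes (toy stand-in)."""
    data: List[float]
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Group:
    """A container of named members, each either a Group or a Dataset."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, str] = field(default_factory=dict)

    def lookup(self, path: str) -> Union["Group", Dataset]:
        """Resolve a slash-separated path such as '/foo/temp' from this group."""
        node: Union[Group, Dataset] = self
        for name in filter(None, path.split("/")):
            if not isinstance(node, Group):
                raise KeyError(f"{name!r}: parent is not a group")
            node = node.members[name]
        return node


# The "file" is modeled as nothing more than the root group "/".
root = Group()
root.members["foo"] = Group(attrs={"description": "example group"})
# Hypothetical sample values; the attribute describes the data it is attached to.
root.members["foo"].members["temp"] = Dataset(
    data=[3.1, 4.2, 3.6], attrs={"units": "degrees"}
)

temp = root.lookup("/foo/temp")
print(temp.data, temp.attrs["units"])
```

The point of the sketch is the shape of the model: groups nest like directories, datasets hold the values, and attributes travel with the object they describe.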

16 Mesh Example, in HDFView

17 HDF5 Software. Tools & Applications; HDF File; HDF I/O Library.

18 Goals of HDF5 Library: flexible API to support a wide range of operations on data; high performance access in serial and parallel computing environments; compatibility with common data models and programming languages.

19 Features: ability to create complex data structures; complex subsetting; efficient storage; flexible I/O (parallel, remote, etc.); ability to transform data during I/O; support for key language models (OO compatible; C & Fortran primarily; also Java, C++).
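
One common form of the "complex subsetting" listed above is a hyperslab selection, which picks regularly spaced blocks of elements described by a start, stride, count, and block size. The sketch below computes the selected indices along one dimension in plain Python; it does not call the HDF5 library. The function name and the sample data are hypothetical, and rejecting overlapping blocks is a choice made for this sketch.

```python
from typing import List


def hyperslab_indices(start: int, stride: int, count: int, block: int) -> List[int]:
    """Indices selected along one dimension: `count` blocks of `block`
    consecutive elements, with block origins `stride` apart, from `start`."""
    if block > stride:
        # Assumption of this sketch: refuse selections whose blocks overlap.
        raise ValueError("blocks would overlap: block must not exceed stride")
    return [start + i * stride + j for i in range(count) for j in range(block)]


data = list(range(100, 120))  # stand-in for a 1-D dataset of 20 values
picked = hyperslab_indices(start=1, stride=4, count=3, block=2)
subset = [data[k] for k in picked]
print(picked)  # [1, 2, 5, 6, 9, 10]
print(subset)  # [101, 102, 105, 106, 109, 110]
```

Describing a selection by four numbers per dimension, rather than by an explicit index list, is what lets a library read only the requested elements from a large dataset.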

20 Sample uses of HDF

21 1. NASA Earth Observing System (EOS). Aqua (6/01). Aura: TES, HIRDLS, MLS, OMI. Terra: CERES, MISR, MODIS, MOPITT. Aqua: CERES, MODIS, AMSR.

22 2. Advanced Simulation & Computing (ASC) Question: How do we maintain a nuclear stockpile in the absence of testing?

23 Answer: Very large simulations on very large computers

24 ASC data requirements: large datasets (> a terabyte); good I/O performance on massively parallel systems; complex data and extensive metadata.


26 3. Bioinformatics -- Managing genomic data caacaagccaaaactcgtacaaCgagatatctcttggaaaaactgctcacaatattgacgtacaaggttgttcatgaaactttcggtaAcaatcgttgacattgcgacctaatacagcccagcaagcagaat

27 DNA sequencing workflows: diverse formats; highly redundant data; repeated file processing; disconnected programs; non-scalable storage; lack of persistence.

28 Multiple levels and relationships Contig Summaries Discrepancies Contig Qualities Coverage Depth Read quality Aligned bases Contig Reads Percent match SNP Score Trace

29 HDF5 as binary format for bioinformatics

30 4. Flight test data --

31 4. Boeing flight test

32 Flight test data requirements: fast data acquisition from 1000s of sources; wide variety of data types; active archive; standardization for data/software exchange; special features.

33 Summary of Reasons to use HDF5

34 Reasons to use HDF5 … … data is: large, made of many objects, complex, heterogeneous, esoteric. … data access includes: random access, parallel I/O, fast access, partial I/O. … computing, networking, data environments require: special platforms, multiple platforms, portability, efficient storage. … or if you need: low cost, a standard format, special API, available tools.

35 THG the Company

36 What is the HDF Group? 18 years at the National Center for Supercomputing Applications (NCSA) at University of Illinois. Recent spin-off from U of I. Non-profit 501(c)(3). 17 scientific, technology, and professional staff. 5 students. 2+ million product users world-wide. Cross industry sectors and disciplines.

37 THG mission To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.

38 Business model. Non-profit: mission driven. Intellectual property: U of I plans to assign ownership to THG. The HDF formats will remain free, and HDF software will remain open source. Continue close ties to U of I and NCSA.

39 Income-generating activities: major client support; targeted HDF development; grant-supported R&D; consulting.

40 Thank you

41 HDF Information: HDF Information Center; HDF Help address; HDF users mailing list.

42 Sustainability: maintain big clients; find and enlist new big clients; broaden base of applications; build “net assets” through marketing, cost structure.

43 Cost model includes: cost of the work; F&A overhead; R&D tax; sustainability tax. Other options?