Overview of HDF5. HDF Summit, Boeing, Seattle. The HDF Group (THG), September 19, 2006.

Presentation transcript:

1 1 Overview of HDF5 HDF Summit Boeing Seattle The HDF Group (THG) September 19, 2006

2 2 Topics What is HDF? Sample uses of HDF THG the Company

3 3 What is HDF?

4 4 Answering big questions … Matter & the universe; Weather and climate; Life and nature [Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002]

5 5 involves big data …

6 6 varied data… caacaagccaaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtgg cgagatatctcttggaaaaactttcaagagcaactcaatcaactttctcgagcattgctt gctcacaatattgacgtacaagataaaatcgccatttttgcccataatatggaacgttgg gttgttcatgaaactttcggtatcaaagatggtttaatgaccactgttcacgcaacgact acaatcgttgacattgcgaccttacaaattcgagcaatcacagtgcctatttacgcaacc aatacagcccagcaagcagaatttatcctaaatcacgccgatgtaaaaattctcttcgtc ggcgatcaagagcaatacgatcaaacattggaaattgctcatcattgtccaaaattacaa aaaattgtagcaatgaaatccaccattcaattacaacaagatcctctttcttgcacttgg

7 7 and complex relationships… [Diagram labels: Contig Summaries Discrepancies Contig Qualities Coverage Depth Read quality Aligned bases Contig Reads Percent match SNP Score Trace]

8 8 on big computers…

9 9 and on little computers.

10 10 How do we… Describe the data? Read it? Store it? Find it? Share it? Mine it? Move it into, out of, and between computers and repositories?

11 11 HDF is: a file format for managing any kind of data; software to store and access data in the format; suited especially to large or complex data collections; suited for every size of system; platform independent – runs almost anywhere; open – both file formats and software

12 12 HDF solution: I/O software & tools; common data models; standard APIs; scientific data file format; efficient storage, I/O

13 13 An HDF file is a container… into which you can put your data objects. [Objects shown: palette, table]

lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6

14 14 HDF structures for organizing objects in files [Diagram labels: “/” (root), “/foo”, palette, Raster image, 3-D array, 2-D array, Raster image, Table]

lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6

15 15 HDF5 data model. HDF5 file: container for data objects. Primary objects: groups; datasets. Additional ways to organize data: attributes; sharable objects; storage and access properties. Everything else is built from these parts.
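The data model on this slide (a file holding groups and datasets, each of which can carry attributes) can be illustrated with a small toy model. This is a minimal sketch in plain Python, not the HDF5 library API: the class names `Group` and `Dataset`, the methods `create_group`, `create_dataset`, and `lookup`, and the dataset name `table` are all hypothetical and chosen only for illustration. The "/" and "/foo" paths and the lat/lon/temp rows come from the earlier slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Dataset:
    """Toy dataset: a list of values plus descriptive attributes."""
    data: List[Any]
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """Toy group: named members (groups or datasets) plus attributes."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def create_group(self, name: str) -> "Group":
        group = Group()
        self.members[name] = group
        return group

    def create_dataset(self, name: str, data: List[Any]) -> Dataset:
        dataset = Dataset(data=data)
        self.members[name] = dataset
        return dataset

    def lookup(self, path: str) -> Union["Group", Dataset]:
        """Resolve a slash-separated path such as "/foo/table"."""
        node = self
        for part in path.strip("/").split("/"):
            if part == "":
                continue  # "/" alone resolves to the starting group
            if not isinstance(node, Group):
                raise KeyError(f"{part!r} lies below a dataset, not a group")
            node = node.members[part]
        return node


root = Group()                   # plays the role of "/" (root)
foo = root.create_group("foo")   # plays the role of "/foo"
table = foo.create_dataset(
    "table",                     # hypothetical dataset name
    [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],  # lat, lon, temp rows
)
table.attrs["columns"] = ("lat", "lon", "temp")

print(root.lookup("/foo/table").data[1])
```

The point of the sketch is only the shape of the model: objects are reached by path through nested groups, and attributes travel with the object they describe.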

16 16 Mesh Example, in HDFView

17 17 HDF5 Software Tools & Applications HDF File HDF I/O Library

18 18 Goals of HDF5 Library: flexible API to support a wide range of operations on data; high performance access in serial and parallel computing environments; compatibility with common data models and programming languages

19 19 Features: ability to create complex data structures; complex subsetting; efficient storage; flexible I/O (parallel, remote, etc.); ability to transform data during I/O; support for key language models (OO compatible; C & Fortran primarily; also Java, C++)
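To make "complex subsetting" concrete: HDF5 describes regular selections as hyperslabs, given per dimension by a start, stride, count, and block. The slide does not go into this, so the following is only a minimal sketch of the idea in plain Python on a nested list, not a call into the HDF5 library. The helper names `hyperslab_indices` and `select_2d` and the sample grid are hypothetical.

```python
from typing import List, Sequence


def hyperslab_indices(start: int, stride: int, count: int, block: int) -> List[int]:
    """Indices selected along one dimension: `count` blocks of `block`
    consecutive elements, with block starts `stride` apart."""
    return [start + i * stride + j for i in range(count) for j in range(block)]


def select_2d(array: Sequence[Sequence[int]],
              rows: Sequence[int],
              cols: Sequence[int]) -> List[List[int]]:
    """Pick the elements at the intersection of the given rows and columns."""
    return [[array[r][c] for c in cols] for r in rows]


# A 4 x 6 grid whose values encode their position (10 * row + column).
grid = [[10 * r + c for c in range(6)] for r in range(4)]

rows = hyperslab_indices(start=0, stride=2, count=2, block=1)  # rows 0 and 2
cols = hyperslab_indices(start=1, stride=3, count=2, block=2)  # cols 1, 2, 4, 5

subset = select_2d(grid, rows, cols)
print(subset)  # [[1, 2, 4, 5], [21, 22, 24, 25]]
```

A real library applies such a selection during I/O, so only the selected elements are read or written rather than the whole array.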

20 20 Sample uses of HDF

21 21 1. NASA Earth Observing System (EOS). Aqua (6/01). Aura: TES, HIRDLS, MLS, OMI. Terra: CERES, MISR, MODIS, MOPITT. Aqua: CERES, MODIS, AMSR.

22 22 2. Advanced Simulation & Computing (ASC) Question: How do we maintain a nuclear stockpile in the absence of testing?

23 23 Answer: Very large simulations on very large computers

24 24 ASC data requirements: large datasets (> a terabyte); good I/O performance on massively parallel systems; complex data and extensive metadata

25 25

26 26 3. Bioinformatics -- Managing genomic data caacaagccaaaactcgtacaaCgagatatctcttggaaaaactgctcacaatattgacgtacaaggttgttcatgaaactttcggtaAcaatcgttgacattgcgacctaatacagcccagcaagcagaat

27 27 DNA sequencing workflows: diverse formats; highly redundant data; repeated file processing; disconnected programs; non-scalable storage; lack of persistence

28 28 Multiple levels and relationships [Diagram labels: Contig Summaries Discrepancies Contig Qualities Coverage Depth Read quality Aligned bases Contig Reads Percent match SNP Score Trace]

29 29 HDF5 as binary format for bioinformatics

30 30 4. Flight test data --

31 31 4. Boeing flight test

32 32 Flight test data requirements: fast data acquisition from 1000s of sources; wide variety of data types; active archive; standardization for data/software exchange; special features

33 33 Summary of Reasons to use HDF5

34 34 Reasons to use HDF5 … … data is: large; made of many objects; complex; heterogeneous; esoteric. … data access includes: random access; parallel I/O; fast access; partial I/O. … computing, networking, data environments require: special platforms; multiple platforms; portability; efficient storage. … or if you need: low cost; a standard format; special API; available tools.

35 35 THG the Company

36 36 What is the HDF Group? 18 years at the National Center for Supercomputing Applications (NCSA) at the University of Illinois; recent spin-off from U of I; non-profit 501(c)(3); 17 scientific, technology, and professional staff; 5 students; 2+ million product users world-wide; cross industry sectors and disciplines

37 37 THG mission To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.

38 38 Business model. Non-profit: mission driven. Intellectual property: U of I plans to assign ownership to THG. The HDF formats will remain free, and HDF software will remain open source. Continue close ties to U of I and NCSA.

39 39 Income-generating activities: major client support; targeted HDF development; grant-supported R&D; consulting

40 40 Thank you

41 41 HDF Information. HDF Information Center: http://hdfgroup.org/ HDF Help email address: hdfhelp@hdfgroup.org HDF users mailing list: hdfnews@hdfgroup.org

42 42 Sustainability: maintain big clients; find and enlist new big clients; broaden base of applications; build “net assets” through marketing, cost structure

43 43 Cost model includes: cost of the work; F&A overhead; R&D tax; sustainability tax. Other options?

