Published by Madlyn Tate. Modified over 9 years ago.
1 Overview of HDF5
  HDF Summit, Boeing, Seattle
  The HDF Group (THG)
  September 19, 2006
2 Topics
  What is HDF?
  Sample uses of HDF
  THG the Company
3 What is HDF?
4 Answering big questions …
  Matter & the universe
  Weather and climate
  Life and nature
  [Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002]
5 involves big data …
6 varied data… caacaagccaaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtgg cgagatatctcttggaaaaactttcaagagcaactcaatcaactttctcgagcattgctt gctcacaatattgacgtacaagataaaatcgccatttttgcccataatatggaacgttgg gttgttcatgaaactttcggtatcaaagatggtttaatgaccactgttcacgcaacgact acaatcgttgacattgcgaccttacaaattcgagcaatcacagtgcctatttacgcaacc aatacagcccagcaagcagaatttatcctaaatcacgccgatgtaaaaattctcttcgtc ggcgatcaagagcaatacgatcaaacattggaaattgctcatcattgtccaaaattacaa aaaattgtagcaatgaaatccaccattcaattacaacaagatcctctttcttgcacttgg
7 and complex relationships…
  [Diagram labels: Contig Summaries Discrepancies Contig Qualities Coverage Depth Read quality Aligned bases Contig Reads Percent match SNP Score Trace]
8 on big computers…
9 and on little computers.
10 How do we…
  Describe the data?
  Read it?
  Store it?
  Find it?
  Share it?
  Mine it?
  Move it into, out of, and between computers and repositories?
11 HDF is
  A file format for managing any kind of data
  Software to store and access data in the format
  Suited especially to large or complex data collections
  Suited for every size of system
  Platform independent – runs almost anywhere
  Open – both file formats and software
12 HDF solution
  I/O software & tools
  Common data models
  Standard APIs
  Scientific data file format
  Efficient storage, I/O
13 An HDF file is a container… …into which you can put your data objects.
  [Example objects shown: a palette and a table]
  lat | lon | temp
  ----|-----|-----
   12 |  23 |  3.1
   15 |  24 |  4.2
   17 |  21 |  3.6
14 HDF structures for organizing objects in files
  [Diagram: groups “/” (root) and “/foo”; objects: palette, raster image (twice), 3-D array, 2-D array, table]
  lat | lon | temp
  ----|-----|-----
   12 |  23 |  3.1
   15 |  24 |  4.2
   17 |  21 |  3.6
15 HDF5 data model
  HDF5 file – container for data objects
  Primary Objects
    Groups
    Datasets
  Additional ways to organize data
    Attributes
    Sharable objects
    Storage and access properties
  Everything else is built from these parts.
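The group/dataset/attribute structure on the data-model slide can be pictured with a toy in-memory model. The sketch below is plain Python and does not use the HDF5 library or its API; the names Group, Dataset, create_group, create_dataset, and lookup are assumptions made for illustration only. The sample values are the lat/lon/temp table and the “/foo” group from the container slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Dataset:
    """Toy stand-in for a dataset: array values plus attributes."""
    data: List[List[float]]
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Group:
    """Toy stand-in for a group: named members plus attributes."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, object] = field(default_factory=dict)

    def create_group(self, name: str) -> "Group":
        group = Group()
        self.members[name] = group
        return group

    def create_dataset(self, name: str, data: List[List[float]]) -> Dataset:
        dataset = Dataset(data=data)
        self.members[name] = dataset
        return dataset

    def lookup(self, path: str) -> Union["Group", Dataset]:
        """Resolve a slash-separated path, starting from this group."""
        node: Union[Group, Dataset] = self
        for part in path.strip("/").split("/"):
            if part:  # skip the empty piece produced by "/" alone
                node = node.members[part]
        return node


# Build the hierarchy sketched on the slides: "/" (root) and "/foo".
root = Group()
foo = root.create_group("foo")
table = foo.create_dataset(
    "table", [[12, 23, 3.1], [15, 24, 4.2], [17, 21, 3.6]]
)
table.attrs["columns"] = ["lat", "lon", "temp"]  # attribute describing the data

print(root.lookup("/foo/table").data[1])  # second row of the table
```

Keeping attributes on both groups and datasets mirrors the slide's point that metadata travels with the object it describes. A real file would be built through the HDF5 library instead.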
16 Mesh Example, in HDFView
17 HDF5 Software
  Tools & Applications
  HDF File
  HDF I/O Library
18 Goals of HDF5 Library
  Flexible API to support a wide range of operations on data
  High performance access in serial and parallel computing environments
  Compatibility with common data models and programming languages
19 Features
  Ability to create complex data structures
  Complex subsetting
  Efficient storage
  Flexible I/O (parallel, remote, etc.)
  Ability to transform data during I/O
  Support for key language models
  OO compatible
  C & Fortran primarily
  Also Java, C++
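As one illustration of the subsetting feature: HDF5 describes a regular subset of an array as a hyperslab, given per dimension by a start, stride, count, and block. The sketch below imitates that selection rule on a nested Python list. It is a plain-Python illustration under that assumption, it does not call the HDF5 library, and the function names and the sample grid are hypothetical.

```python
from typing import List, Sequence


def axis_indices(start: int, stride: int, count: int, block: int) -> List[int]:
    """Indices picked along one axis: `count` blocks of `block` elements,
    with consecutive blocks beginning `stride` elements apart."""
    return [start + i * stride + j for i in range(count) for j in range(block)]


def hyperslab(
    array: Sequence[Sequence[int]],
    start: Sequence[int],
    stride: Sequence[int],
    count: Sequence[int],
    block: Sequence[int],
) -> List[List[int]]:
    """Select a regular 2-D subset of a nested list (rows, then columns)."""
    rows = axis_indices(start[0], stride[0], count[0], block[0])
    cols = axis_indices(start[1], stride[1], count[1], block[1])
    return [[array[r][c] for c in cols] for r in rows]


# Sample 6 x 8 grid where each value encodes its position as 10*row + column.
grid = [[10 * r + c for c in range(8)] for r in range(6)]

# Rows 1 and 3; columns in two 2-wide blocks that start at 2 and at 5.
subset = hyperslab(grid, start=(1, 2), stride=(2, 3), count=(2, 2), block=(1, 2))
print(subset)  # [[12, 13, 15, 16], [32, 33, 35, 36]]
```

Describing the subset by four small tuples, rather than by a list of every coordinate, is what lets a library read only the needed parts of a large array.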
20 Sample uses of HDF
21 1. NASA Earth Observing System (EOS)
  Aura: TES, HIRDLS, MLS, OMI
  Terra: CERES, MISR, MODIS, MOPITT
  Aqua: CERES, MODIS, AMSR
  [Figure label: Aqua (6/01)]
22 2. Advanced Simulation & Computing (ASC) Question: How do we maintain a nuclear stockpile in the absence of testing?
23 Answer: Very large simulations on very large computers
24 ASC Data requirements
  Large datasets (> a terabyte)
  Good I/O performance on massively parallel systems
  Complex data and extensive metadata
26 3. Bioinformatics -- Managing genomic data
  caacaagccaaaactcgtacaaCgagatatctcttggaaaaactgctcacaatattgacgtacaaggttgttcatgaaactttcggtaAcaatcgttgacattgcgacctaatacagcccagcaagcagaat
27 DNA sequencing workflows
  Diverse formats
  Highly redundant data
  Repeated file processing
  Disconnected programs
  Non-scalable storage
  Lack of persistence
28 Multiple levels and relationships
  [Diagram labels: Contig Summaries Discrepancies Contig Qualities Coverage Depth Read quality Aligned bases Contig Reads Percent match SNP Score Trace]
29 HDF5 as binary format for bioinformatics
30 4. Flight test data --
31 4. Boeing flight test
32 Flight test data requirements
  Fast data acquisition from 1000s of sources
  Wide variety of data types
  Active archive
  Standardization for data/software exchange
  Special features
33 Summary of Reasons to use HDF5
34 Reasons to use HDF5 …
  … data is
    large
    made of many objects
    complex
    heterogeneous
    esoteric
  … data access includes
    random access
    parallel I/O
    fast access
    partial I/O
  … computing, networking, data environments require
    special platforms
    multiple platforms
    portability
    efficient storage
  … or if you need
    low cost
    a standard format
    special API
    available tools
35 THG the Company
36 What is the HDF Group?
  18 years at the National Center for Supercomputing Applications (NCSA) at the University of Illinois
  Recent spin-off from U of I
  Non-profit 501(c)(3)
  17 scientific, technology, and professional staff
  5 students
  2+ million product users world-wide
  Cross industry sectors and disciplines
37 THG mission To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.
38 Business model
  Non-profit: mission driven
  Intellectual property: U of I plans to assign ownership to THG
  The HDF formats will remain free, and HDF software will remain open source.
  Continue close ties to U of I and NCSA.
39 Income-generating activities
  Major client support
  Targeted HDF development
  Grant-supported R&D
  Consulting
40 Thank you
41 HDF Information
  HDF Information Center: http://hdfgroup.org/
  HDF Help email address: hdfhelp@hdfgroup.org
  HDF users mailing list: hdfnews@hdfgroup.org
42 Sustainability
  Maintain big clients
  Find and enlist new big clients
  Broaden base of applications
  Build “net assets” through marketing, cost structure
43 Cost model includes
  Cost of the work
  F&A overhead
  R&D tax
  Sustainability tax
  Other options?