HDF5 Tutorial @ICALEPCS2017 October 8, 2017 Elena Pourmal Copyright 2016, The HDF Group.

Presentation transcript:


Q1: How would you describe your knowledge of HDF5?

Q3: If you selected HDF5 features above, what would you like to learn more about?

Goals of today's presentations
- Help new users get started with HDF5
- Answer your questions as we go through the material
- Help everyone avoid major HDF5 pitfalls

The HDF Group and HDF5

Who is The HDF Group?
- The HDF Group has developed open source solutions for Big Data challenges for nearly 30 years
- Small company (~40 employees) with a focus on High Performance Computing and scientific data
- Offices in Champaign, IL and Boulder, CO
- Our flagship platform, HDF5, is at the heart of our open source ecosystem
- Tens of thousands use HDF5 every day, and many build their own solutions on it (800+ projects on GitHub)
- "De facto standard for scientific computing", integrated into every major analytics and visualization tool

What does The HDF Group do?
Products
- HDF5 (Open Source) + "Enterprise" (Future)
- Connectors: ODBC + Cloud (Beta)
- Add-Ons: compression + VOL plugins + VFD plugins
Support
- Support Packages (Basic, Professional, Premier, Customized)
- Support for h5py + PyTables + pandas (NEW)
- Training
Consulting
- HDF5: new functionality + performance tuning for specific platforms
- General HPC software engineering with scientific expertise

Our Industries
- Financial Services
- Oil and Gas
- Aerospace
- Automotive
- Medical & Biotech
- Silicon Manufacturing
- Electronics Instrument
- Government
- Defense & National Security
- Academic Research

Why Use HDF5?
- I/O library and tools optimized for scale and speed
- Self-documenting container optimized for scientific data
- Users who need both features

TRILLION-PARTICLE SIMULATION
Lawrence Berkeley National Laboratory (LBNL)
Complex collisions of particles that light up the aurora borealis can fracture Earth's magnetic shield and wreak havoc on electronics, power grids, and space satellites. Visualizations of trillion-particle datasets, made possible with HDF5, are helping scientists decipher how.
The simulation ran on the NERSC Cray XE6 using 120,000 cores:
- 80% of computing resources
- 90% of available memory
- 50% of the Lustre scratch system
- 10 one-trillion-particle dumps of 30-42 TB written to HDF5 files
- Sustained ~27 GB/sec; 350 TB total in HDF5

EARTH OBSERVING SYSTEM (NASA)
- Delivers 6,700 different data products to 12 data archive centers
- Nearly 16 terabytes per day are redistributed to more than 1.7 million end users worldwide

When we say 'HDF5'… we usually mean one of the following:
1. A Data Model that organizes array variables in a hierarchical structure
2. A Library that maps/manages model instances in storage contexts (core, FS, net, obj. store)
3. A self-describing "file" Format for serializing model instances into single or multi-file layouts
4. The technology stack that includes 1.-3.
5. A domain-specific format implemented on top of 4. (HDF5 as a Universal File Format)
6. An Ecosystem (language bindings, 3rd party apps., standards)
Open source
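
A minimal sketch of the data model, library, and self-describing format working together, using h5py (one of the language bindings in the ecosystem). The file name, group path, dataset name, and attribute are hypothetical, chosen only for illustration; the file lives in memory (h5py's "core" driver) so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory file: driver="core" with backing_store=False never touches disk.
# "example.h5" is a hypothetical name used only as a label.
f = h5py.File("example.h5", "w", driver="core", backing_store=False)

# Groups form the hierarchy, much like directories in a file system.
run = f.create_group("experiment/run_001")

# A dataset is a typed, multidimensional array stored at a path.
values = np.arange(12, dtype="f8").reshape(3, 4)
dset = run.create_dataset("temperature", data=values)

# Attributes attach small pieces of metadata to any object.
dset.attrs["units"] = "K"

# The container describes itself: walk it without prior knowledge.
names = []
f.visit(names.append)

# Read the description back through the path.
found = f["experiment/run_001/temperature"]
shape, dtype_name, units = found.shape, found.dtype.name, found.attrs["units"]
f.close()

print(names)
print(shape, dtype_name, units)
```

The same calls work on an ordinary file on disk if the `driver` and `backing_store` arguments are dropped.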

Why is this concept so different and useful?
- Support for multidimensional data of complex types
- Data and metadata in one place streamlines data lifecycle and workflow
- Portable between different storage: FS, Object Store, fast memory and slow memory (backing store)
- Pluggable data transformation for compression, integrity, encryption, etc.
- High-performance I/O
- Large ecosystem (800+ GitHub projects, e.g., h5py, PyTables, pandas)
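
Several of these points can be illustrated with a short h5py sketch: a multidimensional dataset, metadata stored next to the data, and a pluggable transformation (here the built-in gzip filter) applied per chunk. The dataset path, chunk shape, compression level, and attribute value are assumptions made for the example, not recommendations from the tutorial.

```python
import h5py
import numpy as np

# Hypothetical 3-D data: a 64 x 64 image with 3 channels.
image = np.arange(64 * 64 * 3, dtype="u2").reshape(64, 64, 3)

# In-memory file (driver="core", backing_store=False); nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "frames/image",
        data=image,
        chunks=(16, 16, 3),      # filters operate on chunks
        compression="gzip",      # built-in deflate filter
        compression_opts=4,      # gzip level, chosen arbitrarily here
    )
    # Metadata travels with the data it describes.
    dset.attrs["instrument"] = "hypothetical detector"

    # Filter settings are recorded in the file and applied transparently.
    compression = dset.compression
    chunks = dset.chunks

    # Partial I/O: read one tile without loading the whole array.
    tile = dset[0:16, 0:16, :]
    roundtrip_ok = bool(np.array_equal(dset[...], image))

print(compression, chunks, tile.shape, roundtrip_ok)
```

Readers do not need to know which filter was used; the library decompresses chunks as they are read.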

What isn't HDF5?
- Algorithm or Analytics Tool: we provide the data, users provide their "secret sauce"
- Shrink-Wrapped Service: HDF5 is an SDK for developers to embed into their own solutions
- Fully-Featured Database: HDF5 eliminates anything that slows down I/O performance

Questions?