Open-source Scientific Computing and Data Analytics using HDF


1 Open-source Scientific Computing and Data Analytics using HDF
July 24, 2017, ESIP Summer Meeting. Aashish Chaudhary, Technical Leader, with Patrick O’Leary, Dr. Rama Nemani (NASA), Chris Harris, Chris Kotfila, Doruk Aztek, and Andrew Michaelis (NASA)

2 What We Do at Kitware: Open source and open data are strongly encouraged and practiced at Kitware.

3 It started with VTK

4 Parallel Processing and Rendering - ParaView

5 Computer Vision: Images, Video, Point Clouds
Capabilities: Recognition by Function, Content-based Retrieval, Event & Activity Recognition, Anomaly Detection, 3D Extraction and Compression, Detection & Tracking
Representative projects:
- Object and Building Recognition by Function (DARPA)
- Human Activity Detection (Army RRTO, CTTSO) and Tracking in Wide-Area Video (AFRL)
- Content-based Video Retrieval by Actions (DARPA VIRAT)
- Complex Event Recognition in Internet Videos (IARPA Aladdin)
- Wide-area Motion Imagery Threat Detection and Nodal Analysis (DARPA PerSEAS)
- Normalcy Modeling and Anomaly Detection (DARPA PANDA and PerSEAS)
- Football Play Recognition (DARPA CARVE)
- 3D Model-based Video Compression (DARPA GRID) and Super-resolved 3D Reconstruction (DARPA Super 3D)

6 Medical Computing
- Longitudinal and population shape analysis
- Surgical guidance and simulation
- Interactive medical applications and visualizations
- Vascular analysis
- Digital pathology
- Orthopedic analysis
- Quantitative imaging
- Electronic health records

7 Community Adaptation

8 HDF at Kitware
- Climate Community: Network Common Data Form (NetCDF); most projects use NetCDF4 (see the sketch after this list).
- High Performance Computing: Extensible Data Model and Format (XDMF), developed to exchange scientific data between HPC codes and tools; heavy data is stored using HDF5.
- Medical and Vision Communities: leading-edge algorithms for registering and segmenting multidimensional data.
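A side note on the formats above: a NetCDF4 file is an HDF5 container underneath, so generic HDF5 tooling can inspect it. A minimal sketch in Python; the file name is a placeholder, not one of our datasets.

```python
# Walk a NetCDF4 file with plain h5py, since NetCDF4 is stored as HDF5.
import h5py

with h5py.File("climate_output.nc4", "r") as f:   # hypothetical NetCDF4 file
    def describe(name, obj):
        # Report every dataset, its shape, and its element type.
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(describe)
```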

9 ACME
The Accelerated Climate Modeling for Energy (ACME) project is sponsored by the Earth System Modeling (ESM) program (Biological and Environmental Research) and brings together eight national laboratories and six partner institutions to develop and apply the most complete, leading-edge climate and Earth system models to challenging climate-change research imperatives.
- Most commonly used data format: NetCDF4
- Data streaming using OPeNDAP (sketched below)
- Python interfaces for most of the tools
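A hedged sketch of that access pattern using the netCDF4 Python library; the OPeNDAP URL and variable name are hypothetical, and only the requested slice is transferred when the library is built with DAP support.

```python
# Open a remote NetCDF4 dataset over OPeNDAP and read a small slice.
from netCDF4 import Dataset

url = "http://example.org/opendap/acme/surface_temperature.nc"  # hypothetical endpoint
ds = Dataset(url)                 # opens the remote dataset lazily
tas = ds.variables["tas"]         # hypothetical variable name
subset = tas[0, :10, :10]         # only this slice is fetched over the network
print(subset.shape)
ds.close()
```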

10 OpenNEX NEX is a platform for scientific collaboration, knowledge sharing and research for the Earth science community Global Daily Downscaled Projections (NEX- GDDP, NetCDF4) MODIS-Land and Atmosphere (HDF)

11 Data Processing and Web Visualization
Gaia

12 Data Processing and Web Visualization
Pure JS? We are using HDF to drive web-based big data analysis. As expected, we are using ParaView to read and write files for ParaViewWeb processing, and we are using hdf5.node for more D3-like investigation tools. It would be wonderful if you could help the hdf5.node project along and perhaps create a pure JS implementation.
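For illustration only, a hypothetical Python counterpart of such a service: a small Flask endpoint that slices an HDF5 file with h5py and returns JSON for a D3-style front end. The file name, dataset path, and route are assumptions, not the actual ParaViewWeb or hdf5.node setup.

```python
# Serve one row of an HDF5 dataset as JSON for a browser-based plot.
import h5py
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/timeseries/<int:row>")
def timeseries(row):
    with h5py.File("analysis.h5", "r") as f:     # hypothetical data file
        values = f["/derived/means"][row, :]     # hypothetical dataset path
    return jsonify(values=values.tolist())

if __name__ == "__main__":
    app.run()
```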

13 HDF5 File Organization
HDF5 layout for big data: derived statistical data is saved in groups so that it does not have to be recomputed for value-adding visualizations.
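A minimal h5py sketch of this kind of layout; the group and dataset names are assumptions. Raw data lives in one group and derived statistics in another, so visualizations can reuse the statistics without recomputation.

```python
# Write raw data and precomputed statistics into separate HDF5 groups.
import h5py
import numpy as np

data = np.random.rand(365, 180, 360)   # placeholder raw field (time, lat, lon)

with h5py.File("layout_example.h5", "w") as f:
    f.create_dataset("/raw/temperature", data=data, chunks=True, compression="gzip")
    stats = f.create_group("/derived/temperature")
    stats.create_dataset("mean", data=data.mean(axis=0))   # per-cell mean over time
    stats.create_dataset("std", data=data.std(axis=0))     # per-cell spread over time
```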

14 Preprocessing, Simulation, Postprocessing
HDF5 is commonly used by simulators leveraging our HPCCloud infrastructure. Its performance is fairly well studied on traditional HPC resources, but writing files on the AWS-based HPC clusters that HPCCloud creates on the fly is probably not optimal. The first graph shows the speedup from 4 to 64 processors in the cloud; the second shows that writing the output to cloud storage (in red) dominates the performance bottlenecks.
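For reference, a sketch of the collective write pattern being measured, assuming an h5py build with parallel HDF5 (MPI) support; the file name and sizes are placeholders.

```python
# Each MPI rank writes its own slab of a shared HDF5 dataset.
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.rank, comm.size
n_local = 1_000_000                     # rows written per rank (placeholder)

with h5py.File("output.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("result", (size * n_local,), dtype="f8")
    # This write phase is where the profiled bottleneck appears on cloud storage.
    dset[rank * n_local:(rank + 1) * n_local] = np.random.rand(n_local)
```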

15 Layered architecture of HPCCloud providing access to traditional and virtual cloud HPC resources.

16 Possible Improvements
Streaming and big data analytics:
- Any useful ingestion of HDF data into a cluster requires an ETL pipeline (see the sketch after this list).
- For some tools the computation cannot move close to the data; streaming support is necessary in such cases.
- Optimal read/write on cloud storage.
Web support:
- More tools and projects are moving to support web-enabled data analysis and visualization.
- A pure JS implementation, if possible.
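A hedged sketch of the ETL/streaming idea in Python: read an HDF5 dataset block by block so cluster-side tools never need the whole file in memory. The file and dataset names are assumptions.

```python
# Stream an HDF5 dataset in row blocks for downstream ingestion.
import h5py

def stream_blocks(path, dataset, block=1024):
    """Yield row blocks of an HDF5 dataset without loading it whole."""
    with h5py.File(path, "r") as f:
        dset = f[dataset]
        for start in range(0, dset.shape[0], block):
            yield dset[start:start + block]

for rows in stream_blocks("observations.h5", "/raw/records"):
    pass  # hand each block to the cluster-side analytics pipeline
```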

17 Summary
HDF is a widely used data format for scientific computing, climate/geospatial visualization, and other domains at Kitware. Recently we have started using HDF for information visualization as well. We look forward to broader HDF usage in cloud and web environments. Kitware is always looking for strong open-source collaborations and is committed to pushing open-source scientific computing to the next level.

18 Information Aashish Chaudhary: aashish.chaudhary@kitware.com
LinkedIn:
Kitware:
NASA-NEX:
Kitware-AIST:
HPC Cloud:
HPCCloud GitHub:

