Download presentation
Presentation is loading. Please wait.
Published byIsaac Cummings Modified over 9 years ago
1
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. Trilinos I/O Support (TRIOS) Proposed I/O Software for FY11 Approved for General Release SAND 2010-7732P November 2010 Ron Oldfield and Gregory Sjaardema Sandia National Laboratories
2
Trios Overview and Goals Parallel I/O Support for required I/O libraries –Exodus, Nemesis, IOSS –TPLs: netCDF, HDF5 –POC: Greg Sjaardema Co-Design Vehicle for I/O Research –Provides “brand name” and common distribution point for rapid deployment and testing –Planned Software Products for Trios Scalable I/O Libraries for sparse and dense matrices (from NGC project) In-Transit Data Services (from CSSE/SIO Research) –Network Scalable Service Interface (Nessie) –netCDF caching/staging service –Fragment detection and tracking for CTH HPC Database services (from CSRF: HPC System Support for Informatics) –SQL Services –Rapid Ingest Capabilities –POC: Ron Oldfield November 2010 2Trios Software Review
3
Impact of Efficient I/O Libraries Scaling Challenges for Trilinos-Based Document Clustering Strong scaling exposes weaknesses –Original methods for loading were not designed for production use. Improvements to enable large scale –Sparse Matrix Reads Keep track of mapping Parallel I/O –Dense Matrix Reads Convert to binary format Parallel I/O Data ordering –Memory efficient algorithms Multi-pass dense-matrix multiply for cosine-similarity allows calculation of full dataset Previous version could not cluster 400K docs (one matrix had to be resident in memory on each process) Early Performance Results: Bible Dataset NSSI only November 2010 3Trios Software Review
4
Multilingual Document Clustering Performance on JaguarPF With I/O improvements, application is compute bound even at large scale November 2010 4Trios Software Review
5
In-Transit Data Services Data Processing Between Simulation and Storage Motivation –Improve “effective” I/O performance by injecting data services between app and storage –Leverage available compute/service nodes Network Scalable Service Interface (Nessie) –Developed for the Lightweight FS Project –Framework for HPC client/server development –Designed for scalable data movement –Asynchronous RPC-like API Examples –Preprocessing for seismic imaging (ca.1995) –netCDF staging service –CTH particle detection/tracking –SQL Proxy for HPC/Database Integration –Sparse-matrix viz, real-time network analysis Client Application (compute nodes) I/O Service (compute/service nodes) Raw Data Processed Data Lustre File System Cache/aggregate /process Visualization Client November 2010 5Trios Software Review
6
In-Transit Data Services NetCDF Staging Service Motivation –Synchronous I/O libraries require app to wait until data is on storage device –Most of application I/O is “bursty” –Not enough cache on compute nodes for typical async I/O approaches –NetCDF is basis of important I/O libs at Sandia (no code changes required) NetCDF Caching Service –Service aggregates/caches data and pushes data to storage –Async I/O allows overlap of I/O and computation Client Application (compute nodes) NetCDF Service (compute nodes) NetCDF requests Processed Data Lustre File System Cache/aggregate November 2010 6Trios Software Review
7
In-Transit Data Services CTH Fragment Detection Motivation –Fragment detection requires data from every time step (I/O intensive) –Detection process takes 30% of time-step calculation (scaling issues) –Integrating detection software with CTH is intrusive on developer CTH fragment detection service –Extra compute nodes provide in-line processing (overlap fragment detection with time step calculation) –Only output fragments to storage (reduce I/O) –Non-intrusive Looks like normal I/O (spyplot interface) Can be configured out-of-band Status –Developing client/server stubs –Developing Paraview proxy service CTH (compute nodes) Fragment-Detection Service (compute nodes) Raw Data Fragment Data Lustre File System Detect Fragments Visualization Client spyplot Fragment detection service provides on-the-fly data analysis with no modifications to CTH. November 2010 7Trios Software Review
8
Database Service Provides Fast Access to Remote Database from HPC Application Database Service Features –Provides “bridge” between parallel apps and external DWA –Runs on Cray XT/XMT network nodes –Applications communicate with DB service using Nessie (over Portals or IB) –Service-level access to Netezza through standard interface (e.g., ODBC, nzload) Client Application on XT/XMT (compute nodes) Database Service (service node) Portals ODBC/NZL oad © Netezza Corporation Storage Arrays November 2010 8Trios Software Review
9
Motivating Application: NISAC/N-ABLE Modeling Economic Security Model economic impact of disruptions in infrastructure –Changes in U.S. Border Security technologies –Terrorist acts on commodity futures markets –Transportation disruptions on regional agriculture and food supply chains –Optimized military supply chains –Electric power and rail transportation disruptions on chemical supply chains Compute and data challenges –Models economy to the level of the individual firm –Model transactions from 10s of millions of companies –Simulation data ingested into DB for analysis –DB ingest is bottleneck (10x time to simulate data) –Time to solution is critical… want answers in hours November 2010 9Trios Software Review
10
Performance-Based Motivation for Database Service ODBC is the only available interface for remote access. It’s the interface and protocols, not the network that’s the bottleneck Service can use NZLoad from the Netezza host Bytes Written Time (sec) Database Ingest Performance November 2010 10Trios Software Review
11
Summary Trios is a perfect vehicle for I/O co-design Rapid deployment for efficient production-quality I/O libraries –Exodus, Nemesis, IOSS (from Seacas) –Sparse/Dense Matrix I/O (from NGC) Research vehicle for in-transit data services –Goal is to make efficient use of platform to reduce burden on file system –Already demonstrated value for Seismic (Salvo) –Enables exploration of new functionality for Trilinos codes on HPC systems netCDF staging to manage bursts of I/O In-transit fragment detection/tracking: reduce storage system requirements Integration with Data Warehouse Appliances Interactive observation/control of HPC application November 2010 11Trios Software Review
12
Trios Software Products Planned Timeline for Integration and Release Seacas –Exodus, Nemesis, IOSS (in Trilinos release 10.6) –Plans to move out of Trios subdirectory to a new package named “Seacas” Sparse/Dense matrix I/O libraries –Installed and tested: November 30, 2011 Research software –Network-Scalable Service Interface (Nessie) Installed and testedJan 1, 2011 I’m not sure how to test services using the Trilinos testing framework –netCDF staging service Installed with basic tests: Feb 1, 2011 Link with Exodus and evaluate performance: March 2011 – CTH tracking service In development, need working demo for ASC level III milestone: Summer 2011 Type of release depends on copyright resolution: currently for internal release only November 2010 12Trios Software Review
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.