Data Management Needs for Nuclear-Astrophysical Simulation at the Ultrascale
F. Douglas Swesty, DOE Office of Science Data Management Workshop, SLAC, March 16-18

Presentation transcript:

Data Management Needs for Nuclear-Astrophysical Simulation at the Ultrascale
F. Douglas Swesty, DOE Office of Science Data Management Workshop, SLAC, March 16-18

Characteristics of Nuclear Astrophysical Simulation Data
Origin:
–Usually from hydrodynamic, MHD, or radiation-transport components of a simulation
–Supernova models, neutron-star mergers, etc.
Disk Access Patterns:
–Data written & read primarily from structured or block-structured AMR grids
–Unstructured grid or particle data is possible
–Writes & reads done via parallelized I/O, usually MPI-I/O (see the sketch after this slide)
–Large number of processes (>= 1024)
–Post-run analysis & viz requires accessing lengthy sequences of large files
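The write path described above is, at its core, collective MPI-I/O from every process into a shared checkpoint file. The following is a minimal C sketch of that pattern under stated assumptions (the file name, a 256x256 block of doubles per rank, and a rank-ordered contiguous layout are illustrative); it is not the project's actual I/O code.

/* Minimal sketch of the collective MPI-I/O write pattern described
 * above: each rank writes its contiguous block of one structured-grid
 * variable into a single shared file.  Block size, file name, and the
 * double-precision field are illustrative assumptions. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const MPI_Offset nlocal = 256 * 256;             /* local zones per rank (assumed) */
    double *buf = malloc(nlocal * sizeof(double));   /* stand-in field data */
    for (MPI_Offset i = 0; i < nlocal; i++) buf[i] = 0.0;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank-ordered contiguous layout; the collective call lets the
     * MPI-I/O layer aggregate requests across >= 1024 processes. */
    MPI_Offset offset = (MPI_Offset)rank * nlocal * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, (int)nlocal, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

In a real AMR code each rank would carry its own block extents rather than a fixed offset, but the collective open/write/close structure is the same.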

Characteristics of Nuclear Astrophysical Simulation Data
File Sizes:
–Large for both checkpointing and viz dumps
–Currently ~1 Gbyte
–Near future ~10s of Gbytes
File Abundances:
–Many (typically 1000s) of files from a single batch job
–Especially true for long-timescale problems
File Access Frequency:
–Low
–Perhaps only once per run
–Probably will be post-processed/analyzed on a non-ultrascale platform

Problem Sizes
A typical 2-D multi-group Boltzmann transport simulation:
256x256 (# spatial points) x 50 (# energy bins) x 16x16 (# angular points) x (6 + 8) (# neutrino species + # scratch vectors) x 8 (bytes/variable) = ~100 Gbytes
A 3-D multi-group flux-limited diffusion model or 3-D hydro model checkpoint at 1024x1024x1024 resolution will be comparable in size.
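The ~100 Gbyte figure can be reproduced by multiplying out the factors on the slide. Below is a trivial C check; the variable names are illustrative, chosen only to mirror the labels above, and are not from the simulation code.

/* Back-of-the-envelope check of the 2-D Boltzmann checkpoint size:
 * spatial points x energy bins x angular points x
 * (neutrino species + scratch vectors) x bytes per variable. */
#include <stdio.h>

int main(void)
{
    const double spatial = 256.0 * 256.0;   /* # spatial points   */
    const double energy  = 50.0;            /* # energy bins      */
    const double angular = 16.0 * 16.0;     /* # angular points   */
    const double species = 6.0;             /* # neutrino species */
    const double scratch = 8.0;             /* # scratch vectors  */
    const double bytes   = 8.0;             /* bytes per variable */

    double total = spatial * energy * angular * (species + scratch) * bytes;
    printf("checkpoint size: %.2e bytes (~%.0f Gbytes)\n", total, total / 1.0e9);
    return 0;
}

The output is about 9.4e10 bytes, i.e. the roughly 100 Gbytes quoted above.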

Distributed Set of Computing and Analysis Sites
Sites: NERSC, ORNL, Indiana U., Stony Brook, NC State, UC San Diego
Long round-trip time between sites:
–Approx. 75 msec for NERSC to Stony Brook
–Bad for interactive visualization & analysis of data
–Must use store & forward capabilities of logistical networking

Networking Challenges
Movement of large data sets between compute & user sites
–Needed for post-run analysis, reconstruction, and visualization
A typical 2-D multi-group Boltzmann transport simulation:
256x256 (# spatial points) x 50 (# energy bins) x 16x16 (# angular points) x 6 (# neutrino species) x 100 (# checkpoint files) x 8 (bytes/variable) = ~4 Tbytes
A 3-D multi-group flux-limited diffusion model may produce a Terabyte of data.
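The ~4 Tbyte per-run volume is simply the checkpoint formula times the number of checkpoint files. The sketch below multiplies it out and adds a rough transfer-time estimate; the 1 Gbit/s wide-area rate is an illustrative assumption, not a figure from the talk.

/* Per-run data volume that must move off-site, plus a rough
 * transfer-time estimate at an assumed 1 Gbit/s WAN rate. */
#include <stdio.h>

int main(void)
{
    const double per_checkpoint = 256.0 * 256.0   /* # spatial points   */
                                * 50.0            /* # energy bins      */
                                * 16.0 * 16.0     /* # angular points   */
                                * 6.0             /* # neutrino species */
                                * 8.0;            /* bytes per variable */
    const double nfiles = 100.0;                  /* # checkpoint files */
    const double total  = per_checkpoint * nfiles;

    const double link = 1.0e9 / 8.0;              /* assumed 1 Gbit/s, in bytes/s */
    printf("run volume: ~%.1f Tbytes, ~%.1f hours at 1 Gbit/s\n",
           total / 1.0e12, total / link / 3600.0);
    return 0;
}

At that assumed rate a single run's output occupies the link for roughly nine hours, which is why scheduled transfers and store & forward depots matter.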

Data Management Needs of Ultrascale Nuclear Astrophysical Simulation Projects
Parallel I/O
-Support for portable data formats in data storage and data management products & tools
–Vendors need to help the developers of these formats
–Support HDF5 & netCDF specifically
-Support parallel HDF5 & netCDF interfaces for vendor-specific MPI-I/O implementations
-Support for parallel HDF5 & netCDF on vendor-specific parallel filesystems (see the sketch after this slide)
-I/O must scale to large numbers of processors (> 2048)
-Asynchronous I/O support via MPI-2 is highly desirable
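To make the parallel HDF5 request above concrete, here is a minimal sketch of a collective write through HDF5's MPI-IO file driver. The file name, dataset name ("rho"), and per-rank block size are assumptions for illustration; this is not the project's I/O layer.

/* Minimal parallel HDF5 sketch: open a file with the MPI-IO driver
 * and have every rank write its hyperslab of one dataset collectively.
 * Names and sizes are illustrative assumptions. */
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const hsize_t nlocal = 65536;      /* zones per rank (assumed) */
    static double buf[65536];          /* stand-in field data, zero-initialized */

    /* File access property list: route all I/O through MPI-IO. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("checkpoint.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One global 1-D dataset; each rank owns a contiguous hyperslab. */
    hsize_t gdims[1] = { nlocal * (hsize_t)nprocs };
    hid_t fspace = H5Screate_simple(1, gdims, NULL);
    hid_t dset = H5Dcreate(file, "rho", H5T_NATIVE_DOUBLE, fspace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t start[1] = { nlocal * (hsize_t)rank };
    hsize_t count[1] = { nlocal };
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t mspace = H5Screate_simple(1, count, NULL);

    /* Collective transfer mode, mirroring the raw MPI-I/O pattern. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(mspace); H5Dclose(dset);
    H5Sclose(fspace); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}

Parallel netCDF offers an analogous path (a file created on an MPI communicator with collective put calls), and either library ultimately relies on the vendor's MPI-I/O and parallel filesystem behaving well at these process counts.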

Data Management Needs of Ultrascale Nuclear Astrophysical Simulation Projects
Seamless file access across scratch & tertiary file storage
–Tertiary storage files accessible via Unix paths directly from the OS
Enable easy Tbyte data transfer between select sites
–Vital for post-run analysis & visualization
–Integration of storage with Logistical Networking (LBONE) depots
–Increase transfer throughput by use of dedicated non-TCP networks to handle scheduled transfers?
–Automated data migration between sites
Ability to handle large numbers of large files from a single batch job
–Need lots of scratch space on Ultrascale platforms
–Automate migration of files from scratch to tertiary storage
–Viz & analysis tools need to be able to handle long time-sequences of files from a simulation