Parallel IO in the Community Earth System Model
Jim Edwards, John Dennis (NCAR), Ray Loy (ANL), Pat Worley (ORNL)
CESM component diagram: the CPL7 coupler connects CAM (atmospheric model), CLM (land model), POP2 (ocean model), CICE (sea ice model), and CISM (land ice model).
Some CESM 1.1 capabilities:
- Ensemble configurations with multiple instances of each component
- Highly scalable capability, proven to 100K+ tasks
- Regionally refined grids
- Data assimilation with DART
Prior to PIO:
- Each model component was independent, with its own IO interface
- Mix of file formats: NetCDF, binary (POSIX), binary (Fortran)
- Gather/scatter method to interface with serial IO
Steps toward PIO: converge on a single file format. NetCDF selected:
- Self-describing
- Lossless, with lossy capability (NetCDF4 only)
- Works with the current postprocessing tool chain
Extension to parallel:
- Reduce the single-task memory profile
- Maintain a single-file, decomposition-independent format
- Performance (secondary issue)
Parallel IO from all compute tasks is not the best strategy: data rearrangement becomes complicated, leading to numerous small and inefficient IO operations, and MPI-IO aggregation alone cannot overcome this problem.
Parallel I/O library (PIO) goals:
- Reduce per-MPI-task memory usage
- Easy to use
- Improve performance
- Write/read a single file from a parallel application
- Multiple backend libraries: MPI-IO, NetCDF3, NetCDF4, pNetCDF, NetCDF+VDC
- Meta-IO library: potential interface to other general libraries
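The backend is selected when a file is created, so switching formats does not change the rest of the write path. A minimal sketch, assuming the PIO 1.x Fortran API (the iotype constant names follow that version; argument lists are abridged):

   ! Sketch: the same write path works with any backend; only the iotype changes.
   ! Assumes the PIO 1.x Fortran API.
   use pio
   type(iosystem_desc_t) :: iosys     ! set up earlier by PIO_init
   type(file_desc_t)     :: file
   integer               :: ierr

   ! Serial NetCDF3 written through the root I/O task:
   ierr = PIO_createfile(iosys, file, PIO_iotype_netcdf, 'history.nc')
   ! ... or parallel NetCDF (pNetCDF):
   ! ierr = PIO_createfile(iosys, file, PIO_iotype_pnetcdf, 'history.nc')
   ! ... or parallel NetCDF4/HDF5:
   ! ierr = PIO_createfile(iosys, file, PIO_iotype_netcdf4p, 'history.nc')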
PIO architecture diagram: the CESM components (CPL7 coupler, CAM, CLM, CICE, POP2, CISM) write through PIO, which targets the backend libraries netCDF3, pNetCDF, netCDF4/HDF5, MPI-IO, and VDC.
PIO design principles:
- Separation of concerns
- Separate computational and I/O decompositions
- Flexible user-level rearrangement
- Encapsulate expert knowledge
Separation of concerns: what versus how.
- Concern of the user: what to write/read to/from disk? e.g., "I want to write T, V, PS."
- Concern of the library developer: how to efficiently access the disk? e.g., "How do I construct I/O operations so that write bandwidth is maximized?"
- Improves ease of use, improves robustness, enables better reuse
Separate computational and I/O decompositions: rearrangement between the computational and I/O decompositions.
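In code, a component describes only its computational decomposition, i.e. which elements of the global array each task owns; PIO derives the rearrangement to the I/O decomposition internally. A minimal sketch, assuming the PIO 1.x Fortran API; the contiguous block layout and the variables nlocal and my_offset are illustrative assumptions:

   ! Sketch: describe the computational decomposition once; PIO builds the
   ! rearrangement to the I/O decomposition behind the scenes.
   ! Assumes the PIO 1.x Fortran API; the block layout is illustrative only.
   use pio
   type(iosystem_desc_t) :: iosys              ! from PIO_init
   type(io_desc_t)       :: iodesc
   integer :: gdims(3) = (/ 3600, 2400, 40 /)  ! global POP-sized array
   integer, allocatable  :: compdof(:)         ! global index owned by each local element
   integer :: i, nlocal, my_offset             ! set by the model's own decomposition

   allocate(compdof(nlocal))
   do i = 1, nlocal
      compdof(i) = my_offset + i               ! 1-based global index; 0 would mark a hole
   end do
   call PIO_initdecomp(iosys, PIO_double, gdims, compdof, iodesc)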
Flexible user-level rearrangement: a single technical solution is not suitable for the entire user community.
- User A: Linux cluster, 32-core job, 200 MB files, NFS file system
- User B: Cray XE6, 115,000-core job, 100 GB files, Lustre file system
Different compute environments require different technical solutions.
Writing distributed data (I) (diagram: computational decomposition, rearrangement, I/O decomposition):
+ Maximizes the size of individual I/O operations to disk
- Non-scalable user-space buffering
- Very large fan-in: large MPI buffer allocations
Correct solution for User A.
Writing distributed data (II) (diagram: computational decomposition, rearrangement, I/O decomposition):
+ Scalable user-space memory
+ Relatively large individual I/O operations to disk
- Very large fan-in: large MPI buffer allocations
Writing distributed data (III) (diagram: computational decomposition, rearrangement, I/O decomposition):
+ Scalable user-space memory
+ Smaller fan-in -> modest MPI buffer allocations
- Smaller individual I/O operations to disk
Correct solution for User B (see the sketch below).
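The fan-in and buffering trade-offs above are run-time choices: the number of I/O tasks, their stride across the compute tasks, and the rearrangement strategy are set when the I/O subsystem is initialized, so User A and User B can run the same code. A minimal sketch, assuming the PIO 1.x PIO_init interface; the stride of 64 for User B is an illustrative value, not a recommendation:

   ! Sketch: the I/O-task layout is a run-time choice, not a code change.
   ! Assumes the PIO 1.x PIO_init interface; values are illustrative only.
   use pio
   use mpi
   type(iosystem_desc_t) :: iosys
   integer :: my_rank, ntasks, ierr

   call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, ntasks, ierr)

   ! User A (small cluster, NFS): funnel I/O through a single task.
   call PIO_init(my_rank, MPI_COMM_WORLD, 1, 0, 1, PIO_rearr_box, iosys)

   ! User B (large Cray + Lustre): e.g. one I/O task per 64 compute tasks,
   ! keeping both the fan-in and the per-task MPI buffers modest.
   ! call PIO_init(my_rank, MPI_COMM_WORLD, ntasks/64, 0, 64, PIO_rearr_box, iosys)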
Encapsulate expert knowledge:
- Flow-control algorithm: minimize message-passing traffic at the MPI-IO layer; load-balance disk traffic over all I/O nodes
- Cray XT5/XE6 + Lustre file system: match the size of I/O operations to the stripe size
- IBM Blue Gene/{L,P} + GPFS file system: utilizes Blue Gene-specific topology information
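To illustrate the flow-control idea (not PIO's actual internal code): the receiving I/O task invites only a bounded number of compute tasks to send at once, so MPI buffer usage stays modest regardless of the fan-in. A hypothetical Fortran/MPI sketch; nsenders, sender_rank, block_len, the tags, and comm are assumed to be set up elsewhere:

   ! Sketch of handshaking flow control on an I/O task (illustration only;
   ! PIO's internal rearranger differs in detail).
   ! nsenders, sender_rank, block_len, GO_TAG, DATA_TAG, and comm are assumed
   ! to be set up elsewhere.
   use mpi
   integer, parameter :: max_pend = 64            ! bound on outstanding senders
   integer :: isend, npend, ierr, token
   integer :: status(MPI_STATUS_SIZE)
   real(kind=8), allocatable :: block(:)          ! allocated to block_len elsewhere

   isend = 0
   npend = 0
   token = 0
   do while (isend < nsenders .or. npend > 0)
      ! Invite more compute tasks to send, but never more than max_pend at once.
      do while (isend < nsenders .and. npend < max_pend)
         isend = isend + 1
         call MPI_Send(token, 1, MPI_INTEGER, sender_rank(isend), GO_TAG, comm, ierr)
         npend = npend + 1
      end do
      ! Receive one block from whichever invited task is ready.
      call MPI_Recv(block, block_len, MPI_REAL8, MPI_ANY_SOURCE, DATA_TAG, comm, status, ierr)
      npend = npend - 1
      ! ... copy the received block into the I/O buffer at its offset ...
   end do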
Experimental setup: did we achieve our design goals?
- Impact of PIO features: flow control; varying the number of IO tasks; different general I/O backends
- Read/write a 3D POP-sized variable [3600x2400x40]
- 10 files, 10 variables per file [max bandwidth reported]
- Kraken (Cray XT5) + Lustre file system, using 16 of 336 OSTs
Performance figures: 3D POP arrays [3600x2400x40].
PIOVDC: parallel output to a VAPOR Data Collection (VDC), a wavelet-based, gridded data format supporting both progressive access and efficient data subsetting.
- Progressive access: data may be read back at different levels of detail, letting the application trade off speed and accuracy. Think Google Earth: less detail when the viewer is far away, progressively more detail as the viewer zooms in. Enables rapid (interactive) exploration and hypothesis testing that can subsequently be validated with full-fidelity data as needed.
- Subsetting: arrays are decomposed into smaller blocks, which significantly improves extraction of arbitrarily oriented subarrays.
- Wavelet transform: similar to the Fourier transform; computationally efficient, O(n); the basis for many multimedia compression technologies (e.g., MPEG-4, JPEG 2000). A toy example follows.
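As a toy illustration of the wavelet idea behind VDC (VAPOR uses higher-order wavelets and 3-D blocks, not this one-level Haar transform): a single O(n) pass splits the data into a coarse approximation, which supports progressive low-detail reads, and detail coefficients, which can be truncated or prioritized for compression.

   ! Toy example: one level of a Haar wavelet transform (illustration only).
   subroutine haar_level(x, n, coarse, detail)
      integer, intent(in)       :: n            ! length of x, assumed even
      real(kind=8), intent(in)  :: x(n)
      real(kind=8), intent(out) :: coarse(n/2)  ! low-pass: coarse, progressive view
      real(kind=8), intent(out) :: detail(n/2)  ! high-pass: coefficients that can be
                                                ! truncated/prioritized for compression
      integer :: i
      do i = 1, n/2                             ! single pass over the data: O(n)
         coarse(i) = (x(2*i-1) + x(2*i)) / sqrt(2.0d0)
         detail(i) = (x(2*i-1) - x(2*i)) / sqrt(2.0d0)
      end do
   end subroutine haar_level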
Other PIO users:
- Earth System Modeling Framework (ESMF)
- Model for Prediction Across Scales (MPAS)
- Geophysical High Order Suite for Turbulence (GHOST)
- Data Assimilation Research Testbed (DART)
Figure: write performance on BG/L.
Figure: read performance on BG/L.
Coefficient prioritization (VDC2): 100:1 compression with coefficient prioritization. 1024^3 Taylor-Green turbulence (enstrophy field) [P. Mininni, 2006]. Figure panels: no compression vs. coefficient prioritization (VDC2).
4096^3 homogeneous turbulence simulation: volume rendering of the original enstrophy field and the 800:1 compressed field. Original: 275 GB/field; 800:1 compressed: 0.34 GB/field. Data provided by P.K. Yeung at Georgia Tech and Diego Donzis at Texas A&M.
F90 code generation with genf90.pl. Template (tmp.F90.in):
   interface PIO_write_darray
   ! TYPE real,int
   ! DIMS 1,2,3
      module procedure write_darray_{DIMS}d_{TYPE}
   end interface
# 1 "tmp.F90.in" interface PIO_write_darray module procedure dosomething_1d_real module procedure dosomething_2d_real module procedure dosomething_3d_real module procedure dosomething_1d_int module procedure dosomething_2d_int module procedure dosomething_3d_int end interface
PIO is open source: http://code.google.com/p/parallelio/
Documentation (generated with Doxygen): http://web.ncar.teragrid.org/~dennis/pio_doc/html/
Thank you
Existing I/O libraries:
- netCDF3: serial; easy to implement; limited flexibility
- HDF5: serial and parallel; very flexible; difficult to implement; difficult to achieve good performance
- netCDF4: based on HDF5
Existing I/O libraries (con’t) Parallel-netCDF Parallel Easy to implement Limited flexibility Difficult to achieve good performance MPI-IO Very difficult to implement Very flexible ADIOS Serial and parallel BP file format Easy to achieve good performance All other file formats