John Dennis, Dave Brown, Kevin Paul, Sheri Mickelson

Presentation transcript:


• Post-processing consumes a surprisingly large fraction of simulation time for high-resolution runs
• Post-processing analysis is not typically parallelized
• Can we parallelize post-processing using existing software?
  ◦ Python
  ◦ MPI
  ◦ pyNGL: Python interface to NCL graphics
  ◦ pyNIO: Python interface to NCL I/O library

• Conversion of time-slice to time-series
• Time-slice
  ◦ Generated by the CESM component model
  ◦ All variables for a particular time-slice in one file
• Time-series
  ◦ Form used for some post-processing and CMIP
  ◦ Single variables over a range of model time
• Single most expensive post-processing step for CMIP5 submission
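The conversion described above can be sketched in plain Python, assuming each time-slice file is modeled as a dictionary from variable name to that variable's value at one time step. The function name `to_time_series` and the sample values are hypothetical; real files would be read through a netCDF library such as pyNIO rather than built by hand.

```python
def to_time_series(time_slices):
    """Regroup a list of time-slices into one series per variable.

    time_slices: list of dicts, one per model time step, each mapping
    variable name -> value at that step (a stand-in for one file).
    Returns a dict mapping variable name -> list of values in time order.
    """
    series = {}
    for step in time_slices:
        for name, value in step.items():
            # Append this step's value to the variable's own series.
            series.setdefault(name, []).append(value)
    return series


# Hypothetical data: three monthly time-slices holding two variables each.
slices = [
    {"T": 280.1, "PS": 1012.0},
    {"T": 281.4, "PS": 1009.5},
    {"T": 283.0, "PS": 1011.2},
]
series = to_time_series(slices)
```

Each key of `series` corresponds to what would become one time-series file: a single variable over the whole range of model time.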

• Convert 10 years of monthly time-slice files into time-series files
• Different methods:
  ◦ netCDF Operators (NCO)
  ◦ NCAR Command Language (NCL)
  ◦ Python using pyNIO (NCL I/O library)
  ◦ Climate Data Operators (CDO)
  ◦ ncReshaper-prototype (Fortran + PIO)

Table columns: dataset, # of 2D vars, # of 3D vars, input total size (Gbytes)
Table rows (datasets): CAMFV, CAMSE, CICE, CAMSE, CLM, CLM, CICE, POP, POP

[Figure; annotations: "14 hours!", "5 hours"]


• Data-parallelism:
  ◦ Divide a single variable across multiple ranks
  ◦ Parallelism used by large simulation codes: CESM, WRF, etc.
  ◦ Approach used by ncReshaper-prototype code
• Task-parallelism:
  ◦ Divide independent tasks across multiple ranks
  ◦ Climate models output a large number of different variables
    • T, U, V, W, PS, etc.
  ◦ Approach used by Python + MPI code
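The difference between the two decompositions can be shown with a small sketch. Both helper names (`split_data`, `split_tasks`) and the sample values are hypothetical, and the rank count is passed in as a plain integer instead of being taken from an MPI communicator.

```python
def split_data(values, size):
    """Data-parallelism: cut one variable's values into `size` contiguous chunks."""
    base, extra = divmod(len(values), size)
    chunks, start = [], 0
    for rank in range(size):
        # The first `extra` ranks take one additional element each.
        stop = start + base + (1 if rank < extra else 0)
        chunks.append(values[start:stop])
        start = stop
    return chunks


def split_tasks(names, size):
    """Task-parallelism: deal whole variables out to ranks round-robin."""
    return [names[rank::size] for rank in range(size)]


# Hypothetical example with 2 ranks.
temperature = [280.1, 281.4, 283.0, 284.2, 282.7]
data_chunks = split_data(temperature, 2)
task_lists = split_tasks(["T", "U", "V", "W", "PS"], 2)
```

In the data-parallel case every rank works on a piece of the same variable; in the task-parallel case every rank works on complete variables of its own.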

• Create a dictionary that describes which tasks need to be performed
• Partition the dictionary across MPI ranks
• The utility module ‘parUtils.py’ is the only difference between parallel and serial execution

import parUtils as par
…
rank = par.GetRank()
# construct global dictionary ‘varsTimeseries’ for all variables
varsTimeseries = ConstructDict()
…
# Partition dictionary into local piece
lvars = par.Partition(varsTimeseries)
# Iterate over all variables assigned to this MPI rank
for k, v in lvars.items():
    …
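The slides do not show the body of `par.Partition`. One way such a helper could work, offered only as a sketch, is round-robin assignment over sorted keys. Here `partition`, its `rank` and `size` parameters, and the sample dictionary are all hypothetical; in an MPI run the rank and size would come from the communicator instead of being passed in by hand.

```python
def partition(tasks, rank, size):
    """Return the part of `tasks` owned by `rank` out of `size` ranks.

    Keys are sorted so that every rank computes the same assignment
    without communicating, then dealt out round-robin.
    """
    keys = sorted(tasks)
    return {k: tasks[k] for k in keys[rank::size]}


# Hypothetical global dictionary: variable name -> output file name.
vars_timeseries = {
    "T": "T.nc", "U": "U.nc", "V": "V.nc", "W": "W.nc", "PS": "PS.nc",
}

# With size == 1 this is the serial case: rank 0 owns everything.
serial = partition(vars_timeseries, rank=0, size=1)

# With 3 ranks, each rank gets a disjoint subset of the variables.
local = [partition(vars_timeseries, rank=r, size=3) for r in range(3)]
```

Because the serial run is just the one-rank case, the same script can serve both modes, which matches the point that only the utility module differs.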

[Figure; panel titles: task-parallelism, data-parallelism]


[Figure; annotations: 7.9x speedup (3 nodes), 35x speedup (13 nodes)]

• Large amounts of “easy parallelism” are present in post-processing operations
• Single-source Python scripts can be written to achieve task-parallel execution
• Speedups of 8x to 35x are possible
• Need the ability to exploit both task and data parallelism
• Exploring broader use within the CESM workflow
• Expose entire NCL capability to Python?
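One reason both kinds of parallelism are needed: with task-parallelism alone, a run can finish no sooner than its single longest task. A minimal sketch of that bound follows, using made-up per-variable costs; the numbers and the function name `max_task_speedup` are hypothetical and are not measurements from the slides.

```python
def max_task_speedup(costs):
    """Upper bound on task-parallel speedup: total work / longest task."""
    return sum(costs.values()) / max(costs.values())


# Hypothetical per-variable conversion times in seconds; one large 3D
# variable dominates the smaller ones.
costs = {"T": 120.0, "U": 120.0, "PS": 10.0, "TS": 10.0, "BIG3D": 240.0}
bound = max_task_speedup(costs)
```

No number of ranks can beat this bound unless the longest task is itself split across ranks, which is data-parallelism.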