CCSM cpl6 Design and Interfaces
Tony Craig, Brian Kauffman, Tom Bettge (National Center for Atmospheric Research)
Robert Jacob, Jay Larson, Everest Ong (Argonne National Laboratory)
Chris Ding, Helen He (Lawrence Berkeley National Laboratory)
ESMF Workshop, May 14, 2003, GFDL
CCSM2 “Hub and Spoke” System
[Diagram: the coupler (cpl) at the hub, connected to atm, ocn, ice and lnd]
Each component is a separate executable.
Each component runs on its own set of hardware processors.
All communications go through the coupler.
Coupler
–communicates with all components
–maps (interpolates) data
–merges fields
–computes some fluxes
–has diagnostic, history, and restart capability
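To make the coupler's “merges fields” task concrete, here is a minimal Python sketch (Python is used only for illustration; cpl6 itself is F90). It assumes the surface components' fluxes have already been mapped onto one common grid and that each cell carries ocn/ice/lnd area fractions summing to one. The function name merge_fields is hypothetical and is not a cpl6 routine.

```python
from typing import Dict, List


def merge_fields(fields: Dict[str, List[float]],
                 fractions: Dict[str, List[float]],
                 tol: float = 1e-12) -> List[float]:
    """Fraction-weighted merge of one flux supplied by several surface
    components, all already mapped to the same grid (assumption)."""
    names = list(fields)
    npts = len(fields[names[0]])
    merged = [0.0] * npts
    for i in range(npts):
        # Assumption: the surface fractions in each cell cover the whole cell.
        total = sum(fractions[n][i] for n in names)
        if abs(total - 1.0) > tol:
            raise ValueError(f"fractions at point {i} sum to {total}, not 1")
        # Weight each component's contribution by its fraction of the cell.
        merged[i] = sum(fractions[n][i] * fields[n][i] for n in names)
    return merged
```

For a cell that is half open ocean and half sea ice, the merged flux is simply the average of the two contributions.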
CCSM2 “Hub and Spoke” System – Multiple Executables
Inherited from CSM1.0 (the Cray OS could dynamically load-balance a multi-executable system).
Allows constructing a coupled system with a minimum of modification to each model’s source code; coupling is achieved by a handful of subroutine calls.
Each model’s working group can continue to use and develop its model in a standalone mode that is closely related to the coupled version.
Disadvantages:
–Startup of multiple executables and control of thread/processor counts are highly system dependent.
–Running a “standalone-equivalent” configuration (one active model and 3 data models) still requires 5 executables (4 models plus the coupler).
CCSM2 “Hub and Spoke” System – The Hub
Individual models talk only to the hub and have no idea how many or which models are in the coupled system, so models are easy to swap in and out.
CCSM is a multi-model system: 13 different models, with multiple combinations allowed (set at compile time).
Global diagnostics and conservation checks can be performed at the hub (does heat out = heat in globally?).
The hub is a natural place for calculations that don’t clearly belong in any one of the individual models (air-sea fluxes, mappings, inter-model accumulation).
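The hub's global conservation check (“does heat out = heat in globally?”) amounts to comparing area-weighted global integrals of a flux on the sending and receiving grids. The sketch below rests on assumptions of our own: fluxes and cell areas are plain arrays in consistent units, the tolerance is arbitrary, and the function names are hypothetical rather than cpl6 routines. On a distributed-memory coupler each sum would also need a global reduction across processors, which is omitted here.

```python
import math
from typing import Sequence


def global_integral(flux: Sequence[float], area: Sequence[float]) -> float:
    """Area-weighted global sum of a flux field, e.g. W/m^2 times m^2 gives W."""
    if len(flux) != len(area):
        raise ValueError("flux and area must have the same length")
    # fsum limits round-off when summing many cell contributions.
    return math.fsum(f * a for f, a in zip(flux, area))


def heat_is_conserved(sent_flux: Sequence[float], sent_area: Sequence[float],
                      recv_flux: Sequence[float], recv_area: Sequence[float],
                      rel_tol: float = 1e-10) -> bool:
    """Compare the global integral leaving one model's grid with the
    global integral arriving on another model's grid."""
    heat_out = global_integral(sent_flux, sent_area)
    heat_in = global_integral(recv_flux, recv_area)
    return math.isclose(heat_out, heat_in, rel_tol=rel_tol, abs_tol=0.0)
```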
Design Issues for CCSM2.1/cpl6
Address shortcomings in cpl5:
–Alleviate a potential bottleneck by moving to distributed-memory parallelism
–Do MxN data transfers
–Generalize the model interface and the coupler functionality
–Simplify the process of extending the coupled system
Keep the multiple-executable execution mode.
Keep the hub-and-spoke design.
Simplify the coupling interface in components, but keep a similar level and location of source modification.
(Project start: June 2000)
Design Issues for CCSM2.1/cpl6
All the models are F90, so make cpl6 and all supporting software (MCT) F90 to avoid inter-language issues.
Must run on many platforms.
Cpl6 requirements document:
cpl6 Design
[Diagram: layered design with cpl6 at the top and MCT* and MPH** below]
* Model Coupling Toolkit
** Multi-Component Handshaking Library
The cpl6 high level is designed specifically for CCSM; the lower levels have general coupling capabilities.
The cpl6 design has moved the abstracted, parallel communication software into the lower layers.
cpl6 Design: Another View of CCSM
[Diagram: atm, lnd, ice, ocn and cpl on separate hardware processors, joined through a coupling interface layer]
In cpl5, MPI was the coupling interface.
In cpl6, the “coupler” is implicitly hooked to each component via the coupling interfaces:
–Components are unaware of the coupling method
–Coupling work can be carried out on component processors
–A separate coupler is no longer strictly required
Cpl6-Model Interface Modules
cpl_fields_mod
–All models “use cpl_fields_mod”
–Provides common field names and indices to the entire system
–Differentiates states and fluxes
–Naming convention allows automatic routing of data between components for “simple” fields
cpl_interface_mod
–All models “use cpl_interface_mod”
–Simple interfaces, simple arguments (6 subroutines)
–Components pass simple Fortran arrays to the interface routines, which then load them into cpl6 data types
–Components don’t know about MCT, cpl6 data types, or the underlying communication method
–The coupler operates directly on cpl6 data types and passes them to cpl_interface routines; the source or target is identified by arguments
–Replaces the model-specific comm modules of cpl5
–Extensible
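One way to picture how a shared naming convention enables automatic routing: if every component draws its field names from one common module, a field can be passed along whenever exactly one other component exports a field with the same name. The Python sketch below uses hypothetical names such as Sa_tbot and Faxa_rain and a hypothetical rule that names beginning with S are states and names beginning with F are fluxes; the actual cpl_fields_mod names and rules may differ, and in cpl6 the data would travel through the coupler rather than directly between models.

```python
from typing import Dict, List


def field_kind(name: str) -> str:
    """Hypothetical convention: a leading 'S' marks a state, a leading 'F' a flux."""
    if name.startswith("S"):
        return "state"
    if name.startswith("F"):
        return "flux"
    return "other"


def auto_route(exports: Dict[str, List[str]],
               imports: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """For every importing component, map each wanted field to the single
    other component that exports a field with exactly the same name."""
    routes: Dict[str, Dict[str, str]] = {}
    for dst, wanted in imports.items():
        table: Dict[str, str] = {}
        for name in wanted:
            sources = [src for src, have in exports.items()
                       if src != dst and name in have]
            if len(sources) == 1:  # unambiguous match: route automatically
                table[name] = sources[0]
        routes[dst] = table  # unmatched fields would need explicit handling
    return routes
```

Fields with no match, or with more than one candidate source, are deliberately left out of the table; in this sketch they stand for the non-“simple” fields that need hand-written treatment.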
Basic cpl6 Data Types
Contract
–Bundle, Infobuffer, Router
Infobuffer
–Non-gridded data: integers and reals (error codes, date, time of day, orbital parameters)
Bundle
–Fundamental cpl6 storage data type for gridded data
–Name, Domain, Attribute Vector, Counter
Domain
–cpl6 grid data type
–Name, Attribute Vector of grid data (lats, lons), GSMap (decomposition)
Map
–Name, Smat, Domains, Rearranger
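The relationship between a Bundle, its Domain and a Map can be sketched with Python dataclasses. This is a simplified model under stated assumptions: a dict of named float lists stands in for the MCT attribute vector, the GSMap and Rearranger are omitted because everything here is serial, the Counter is treated as a simple accumulation count, and the Smat is a list of hypothetical (destination row, source column, weight) triples. Applying the Map is then a sparse matrix-vector product for each field.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Domain:
    name: str
    lats: List[float]
    lons: List[float]

    @property
    def npts(self) -> int:
        return len(self.lats)


@dataclass
class Bundle:
    name: str
    domain: Domain
    fields: Dict[str, List[float]]  # stand-in for an attribute vector
    counter: int = 0                # assumed: number of accumulated samples


@dataclass
class Map:
    name: str
    src: Domain
    dst: Domain
    smat: List[Tuple[int, int, float]] = field(default_factory=list)

    def apply(self, b: Bundle, out_name: str) -> Bundle:
        """Map every field of b from the source to the destination domain."""
        if b.domain.name != self.src.name:
            raise ValueError(f"bundle {b.name} is not on domain {self.src.name}")
        out = {k: [0.0] * self.dst.npts for k in b.fields}
        # Sparse matrix-vector product: y[row] += weight * x[col] per field.
        for row, col, weight in self.smat:
            for k, values in b.fields.items():
                out[k][row] += weight * values[col]
        return Bundle(out_name, self.dst, out)
```

Keeping the weights in a Map separate from the data means one precomputed Smat can be reused for every Bundle on the same pair of domains.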
MCT Data Types Used by Cpl6
Attribute Vector
–Fundamental data storage type
–2D integer and real arrays (field, grid point)
–Strings for field names
Global Seg Map
–Decomposition information
Router
–M-to-N inter-model communication information
Rearranger
–Local intra-model communication information
Smat
–Scattered mapping matrix data
MCT design note: the motivation for writing MCT was partly to handle the data-transfer issues raised in converting CCSM’s coupler to distributed memory, while retaining the ability to “hook up” easily to a gridded component model with unknown internal data structures and decomposition.
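The M-to-N Router can be understood as the intersection of two decompositions. In the sketch below each Global Seg Map is modeled as a list of (global start, length, owning rank) segments, an assumed representation rather than MCT's actual data layout. Intersecting every source segment with every destination segment tells each source rank which global index ranges to send to each destination rank. The brute-force double loop is chosen for clarity, not efficiency.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int, int]  # (global_start, length, owning_rank)


def build_router(src: List[Segment],
                 dst: List[Segment]) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """Return {(src_rank, dst_rank): [(start, length), ...]} listing the
    global index ranges each source rank must send to each destination rank."""
    routes: Dict[Tuple[int, int], List[Tuple[int, int]]] = {}
    for s_start, s_len, s_rank in src:
        s_end = s_start + s_len
        for d_start, d_len, d_rank in dst:
            # Overlap of the half-open ranges [s_start, s_end) and [d_start, d_end).
            lo = max(s_start, d_start)
            hi = min(s_end, d_start + d_len)
            if lo < hi:
                routes.setdefault((s_rank, d_rank), []).append((lo, hi - lo))
    return routes
```

For example, 12 points split over 2 source ranks (6 each) and 3 destination ranks (4 each) give four messages: rank 0 sends 4 points to rank 0 and 2 to rank 1, while rank 1 sends 2 points to rank 1 and 4 to rank 2.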
cpl6 Summary
CCSM production coupler as of March 2003
Duplicates the cpl5 “science”
Fully parallel, distributed-memory coupler
Has M-to-N communication between components
Coupling interfaces abstract the communication method away from components
Very usable, flexible, and extensible coupler
Good communication and overall performance; scales well to more processors and higher resolutions
Cpl6 has been tested and works only in concurrent, multiple-executable mode; we plan to start exploring a concurrent, single-executable mode.
CCSM2/CPL6 Answers
System requirements: Linux/Unix, MPI. Components cannot spawn other components.
Programming language: F90, some C. Not language-neutral.
Component abstraction: Contract, cpl_interface subroutines, cpl_fields names.
Data is somewhat self-describing: one can inquire about the number of attributes and their names, and each vector of reals or ints has a character string associated with it. The data structure is extensible.
Data is always copied between components (there is no choice in the multiple-executable configuration).
Two component registries: one in MPH, one in MCT.
Components do not affect other components.
Cpl6 assumes five named components. A “component” is either a physical model (atm, ocn, ice, lnd) or the coupler. Components are otherwise indistinguishable.
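The “somewhat self-describing” data can be illustrated with a toy attribute vector in which every row of a 2-D real or integer array carries a name string, so a caller can ask how many attributes exist, look one up by name, and append a new one. This Python class is a hypothetical sketch: the colon-separated name list is an assumption chosen for compactness, and the method names are not MCT's API.

```python
from typing import List


class AttrVect:
    """Toy attribute vector: real and integer fields stored as 2-D arrays
    indexed [field][grid point], each field tagged with a name string."""

    def __init__(self, real_names: str, int_names: str, npts: int) -> None:
        # Assumed format: names separated by colons, e.g. "t:u:v".
        self.rnames: List[str] = [n for n in real_names.split(":") if n]
        self.inames: List[str] = [n for n in int_names.split(":") if n]
        self.npts = npts
        self.rattr = [[0.0] * npts for _ in self.rnames]
        self.iattr = [[0] * npts for _ in self.inames]

    def n_real(self) -> int:
        return len(self.rnames)

    def n_int(self) -> int:
        return len(self.inames)

    def index_real(self, name: str) -> int:
        return self.rnames.index(name)  # raises ValueError if absent

    def real(self, name: str) -> List[float]:
        """Return the (mutable) row holding the named real field."""
        return self.rattr[self.index_real(name)]

    def add_real(self, name: str) -> None:
        """Extend the vector with a new, zero-filled real field."""
        if name in self.rnames:
            raise ValueError(f"field {name} already present")
        self.rnames.append(name)
        self.rattr.append([0.0] * self.npts)
```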
More CCSM2/CPL6 Answers
Components can be internally parallel (MPI, threads, hybrid), can run concurrently, and can support multiple executables.
Components have some specific functions: e.g. the atm component must send the atmosphere state. But many different programs can “stand in” for the atmosphere.
No virtualization of process/thread/CPU. There is at least one MPI process for each component.
Components cannot come and go during execution.
Compute resources cannot be acquired or released during execution (MPI needs to do that first!).
Complicated, model-dependent initialization phase.
High-level control syntax is the same for serial and parallel runs.
Components must be in F90 or provide an F90 interface layer to cpl6 and use the cpl_interface routines.
Yet More CCSM2/CPL6 Answers
Each component is responsible for saving/restoring its own internal state. The coupler will send a signal to save state. The coupler saves additional state for exact restart of the coupled system.
Bringing in a new component to replace a current one involves:
–adding the cpl6 modules to the include path and linking to the mph/mpeu/mct libraries
–adding cpl_interface calls at the appropriate places
–loading simple Fortran array arguments
Bringing in a “6th” component will require minor changes to cpl6 (and to the coupler’s main).
Target users of cpl6, of two types:
–Coupler writer: whoever writes the coupler’s main.F90
–Model integrator: the person(s) charged with integrating a given component model into CCSM
Target component authors: Earth System scientists and their students/postdocs/programming staff who develop numerical models of parts of the Earth’s climate system.
END