Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss

Operational COSMO Demonstrator (OPCODE)
André Walser and Oliver Fuhrer, MeteoSwiss
COSMO-GM, Rome, 5-9 September 2011
Project overview
- Additional proposal to the Swiss HP2C initiative to build an "OPerational COSMO DEmonstrator" (OPCODE)
- Project proposal accepted by end of May 2011
- Project runs from 1 June 2011 until end of 2012
- Project resources:
  - second contract with the IT company SCS to continue the collaboration until the end of the project
  - new positions at MeteoSwiss for about 1 year
  - Swiss HPC center CSCS
  - C2SM (collaboration with ETH Zurich and others)
Main goals
- Leverage the research results of the ongoing HP2C COSMO project
- Prototype implementation of the COSMO production suite of MeteoSwiss making aggressive use of GPU technology
- Get MeteoSwiss ready to buy GPU-based hardware for the 2015 production machine
- Same time-to-solution on substantially cheaper hardware: Cray XT4 (3 cabinets) vs. GPU-based hardware (a few rack units)
GPU perspectives
- GFLOPS per watt is expected to increase strongly over the next years
Workflow on demonstrator
COSMO-7 / COSMO-2 suite: current production scheme
[Timeline chart, elapsed time in minutes: COSMO-7 and COSMO-2 3 h assimilation (21 UTC), 0-24 h forecast (00 UTC) with time-critical (TC) products, 25-72 h forecast (00 UTC) with TC products]
- Time-critical post-processing takes about 15 minutes longer than the forecasts for both COSMO-2 and COSMO-7
- The current bottleneck is the post-processing tool fieldextra
- The entire suite has to be optimized for the demonstrator
Two work packages
- Work package A: porting the remaining parts of the operational COSMO of MeteoSwiss to the demonstrator
- Work package B: porting the suite to the demonstrator, optimizing it, and operating it
Work package A
- To use the full speed-up, data has to remain on the GPU within a time step and is sent to the CPU for I/O only (see the sketch below)
- COSMO workflow: Input, Physics (HP2C), Dynamics (HP2C), Assimilation, Boundary Conditions, Diagnostics, Output
- Physics and dynamics are already covered by the HP2C code; what is still missing for a full GPU implementation?
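A minimal CUDA sketch of this data flow (not OPCODE or COSMO code; the single field, the two placeholder kernels and the step/output counts are assumptions for illustration): all fields stay resident in GPU memory across the time loop, and a device-to-host copy is issued only when output is due.

```cpp
// Sketch: GPU-resident fields, host copy only for I/O.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void dynamics_step(float* f, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) f[i] += 0.1f;           // stand-in for the dynamical core
}

__global__ void physics_step(float* f, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) f[i] *= 0.999f;         // stand-in for the physics parameterizations
}

int main() {
    const int n = 1 << 20, nsteps = 100, nout = 25;    // illustrative sizes
    std::vector<float> host(n, 0.0f);

    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(256), grid((n + block.x - 1) / block.x);
    for (int step = 1; step <= nsteps; ++step) {
        // All computation stays on the device; no transfer inside the step.
        dynamics_step<<<grid, block>>>(dev, n);
        physics_step<<<grid, block>>>(dev, n);

        if (step % nout == 0) {
            // Copy back to the CPU only when output (I/O) is due.
            cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
            std::printf("output at step %d, first value %f\n", step, host[0]);
        }
    }
    cudaFree(dev);
    return 0;
}
```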
Work package A tasks
- Task A1. Dynamical core: complete/update the HP2C code (SCS)
Task A2: Inter-/intra-GPU parallelization
- COSMO requires a communication library providing halo updates as well as several other communication patterns (e.g. global reduce, gather, scatter)
- e.g. peer-to-peer communication between GPUs (see the sketch below)
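As an illustration of the peer-to-peer case, the following CUDA sketch exchanges one halo between two GPUs in the same node with cudaMemcpyPeer. The subdomain layout, field sizes and halo width are assumptions; the actual OPCODE communication library (including global reductions, gathers and scatters) is not shown here.

```cpp
// Sketch: intra-node halo exchange between two GPUs via peer-to-peer copies.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev < 2) { std::printf("this sketch needs two GPUs\n"); return 0; }

    const int nx = 128, ny = 64, halo = 3;              // illustrative subdomain
    const size_t rowBytes = nx * sizeof(float);
    const size_t fieldBytes = nx * (ny + 2 * halo) * sizeof(float);

    // One subdomain per device: [bottom halo | interior | top halo].
    float *d0 = nullptr, *d1 = nullptr;
    cudaSetDevice(0); cudaMalloc(&d0, fieldBytes);
    cudaSetDevice(1); cudaMalloc(&d1, fieldBytes);

    // Enable direct peer access in both directions where the hardware allows it;
    // cudaMemcpyPeer falls back to staging through the host otherwise.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (can01) { cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0); }
    if (can10) { cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0); }

    // Halo update: top interior rows of GPU 0 -> bottom halo of GPU 1,
    // and bottom interior rows of GPU 1 -> top halo of GPU 0.
    float* src0 = d0 + nx * ny;                 // last 'halo' interior rows on GPU 0
    float* dst1 = d1;                           // bottom halo region on GPU 1
    cudaMemcpyPeer(dst1, 1, src0, 0, halo * rowBytes);

    float* src1 = d1 + nx * halo;               // first 'halo' interior rows on GPU 1
    float* dst0 = d0 + nx * (ny + halo);        // top halo region on GPU 0
    cudaMemcpyPeer(dst0, 0, src1, 1, halo * rowBytes);

    cudaDeviceSynchronize();
    cudaSetDevice(0); cudaFree(d0);
    cudaSetDevice(1); cudaFree(d1);
    return 0;
}
```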
Task A4: Data assimilation, porting to GPU
- The assimilation part is a huge code!
Work package A: all tasks and responsible partners
- A1. Dynamical core: complete/update the HP2C code (SCS)
- A2. Inter-GPU parallelization: library for halo updates, global reductions, scatters, gathers (CSCS)
- A3. Interoperability C++/CUDA/Fortran: common compile system, Unified Virtual Addressing (SCS), see the sketch below
- A4. Data assimilation: porting to GPU (MeteoSwiss)
- A5. I/O: software layer controlling the copying of fields from CPU to GPU and vice versa for I/O (C2SM?)
- A6. Porting other code parts (boundary conditions, diagnostics) to GPU (SCS)
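For task A3, a common interoperability pattern is to expose each GPU kernel through a C-linkage wrapper that the Fortran parts of COSMO can call via ISO_C_BINDING. The CUDA sketch below shows only the C++/CUDA side; the routine name, its arguments and the placeholder kernel are illustrative assumptions, not the OPCODE interface.

```cpp
// Sketch: CUDA kernel exposed with C linkage so Fortran can bind to it.
#include <cuda_runtime.h>

__global__ void saturation_adjust(double* t, double* qv, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        t[i]  += 0.01 * qv[i];   // placeholder computation standing in for a physics kernel
        qv[i] *= 0.99;
    }
}

// C linkage gives a predictable symbol name for the Fortran interface
// (bind(c) with device pointers passed as type(c_ptr)). With Unified
// Virtual Addressing, host and device allocations share one virtual
// address space, so device pointers can be passed through Fortran as
// plain C pointers and the runtime can tell where they point.
extern "C" void saturation_adjust_gpu(double* t_dev, double* qv_dev, int n) {
    dim3 block(128);
    dim3 grid((n + block.x - 1) / block.x);
    saturation_adjust<<<grid, block>>>(t_dev, qv_dev, n);
    cudaDeviceSynchronize();
}
```

A common compile system then only needs to link the Fortran objects against the nvcc-compiled objects.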
Work package B: all tasks and responsible partners
- B1. Hardware (CSCS)
- B2. System software (CSCS)
- B3. COSMO package: porting and optimization of the steering scripts (MeteoSwiss)
- B4. Post-processing: parallelization of the post-processing tools, additional work in fieldextra, partly paid by "COSMO license money" (MeteoSwiss)
- B5. Setup and testing (MeteoSwiss)
Organization
- 1.7 FTE: SCS, CSCS, C2SM
- 0.9 FTE: new, 1 year, still open
- 1.9 FTE: new at MeteoSwiss
- 15 months: CSCS
Schedule
Thank you!