PT Evaluation of the Dycore Parallel Phase (EDP2)

PT Evaluation of the Dycore Parallel Phase (EDP2)
X. Lapillonne, M. Baldauf, P. Spörri, O. Fuhrer, C. Barbu, C. Osuna, U. Schättler, A. Walser

Main goal
Two implementations of the RK dynamical core coexist in COSMO:
- Fortran
- C++, based on the STELLA library (required to run on GPU); the C++ version will soon be in the official version
Evaluate the cost and give a recommendation to the STC regarding the parallel phase during which the two implementations coexist.
(Stop short presentation at this slide.)

Cost evaluation
Change types and costs, based on previous years:
- 20 minor changes (adapting configuration variables, defaults, or changing a line in the computation): 0.005 FTE per change, 0.1 FTE in total
- 5 medium changes (changes that require new stencils but fit into the existing context): 0.02 FTE per change
- 1 major change (e.g. porting a new fast-waves solver): 0.2 FTE
- General maintenance: keeping up with recent compilers, investigation of performance issues
Total maintenance effort: 0.5 FTE per year
- 0.3 FTE for basic maintenance and small to medium changes
- 0.2 FTE for integrating major developments
These figures are based on previous years and may vary now that the focus switches to ICON and the C++ code is distributed (more small changes, fewer large ones).
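As a sanity check on these figures, a minimal sketch of the arithmetic; the ~0.1 FTE share for general maintenance is an assumption chosen so the parts add up to the stated 0.5 FTE total, it is not given explicitly on the slide:

```cpp
// Reconcile the per-change costs with the stated 0.5 FTE/year total.
#include <cstdio>

int main() {
    const double minor  = 20 * 0.005;  // 0.1 FTE (minor changes)
    const double medium =  5 * 0.02;   // 0.1 FTE (medium changes)
    const double major  =  1 * 0.2;    // 0.2 FTE (major change)
    const double general_maintenance = 0.1;  // assumed remainder
    std::printf("Total: %.1f FTE/year\n",
                minor + medium + major + general_maintenance);  // prints 0.5
    return 0;
}
```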

Impact and evaluation
Impact on users/developers: need for additional communication between the main dycore developers and the C++ dycore maintainer.
Performance of the C++/STELLA dycore on the latest architectures compared with the Fortran dynamics: about 3x faster on GPU compared to the original code on CPU.
- CPU system (9x Haswell sockets, 96x compute + 4x I/O nodes, GCC 4.9):
  - F90 dycore: 316 s (double precision), 169 s (single precision)
  - C++ dycore: 214 s (double precision), 152 s (single precision)
- GPU system (1x Haswell + 8x K80 sockets, 8x compute + 2x I/O nodes, GCC 4.9, CUDA 7.0):
  - C++ dycore: 101 s (double precision), 65 s (single precision)
Consequences and experience of using the C++ dynamics for operational weather prediction: no adverse consequences in terms of maintenance, stability or complexity for operations.
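The quoted ~3x speedup follows directly from the timings above; a minimal sketch of the arithmetic, using the numbers from the slide:

```cpp
// GPU (C++/STELLA) versus CPU (Fortran) dycore speedup from the quoted timings.
#include <cstdio>

int main() {
    const double f90_cpu_dp = 316.0, cpp_gpu_dp = 101.0;  // seconds, double precision
    const double f90_cpu_sp = 169.0, cpp_gpu_sp =  65.0;  // seconds, single precision
    std::printf("DP speedup: %.1fx\n", f90_cpu_dp / cpp_gpu_dp);  // ~3.1x
    std::printf("SP speedup: %.1fx\n", f90_cpu_sp / cpp_gpu_sp);  // ~2.6x
    return 0;
}
```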

Developer experience (using STELLA)
C++ dycore source code maintainer (P. Spörri, MCH): nice and efficient to write single-source, performance-portable HPC code.
C++ dycore users:
- Experience at DWD: very different from higher-level programming languages (like Fortran, C++, Python, ...), permanent support needed for development. => Seen as a large obstacle for model development.
- Experience at IMGW (port of EULAG with GridTools): while this is at an early stage, progress is good and there are currently no major obstacles to porting the Fortran code to C++.
=> Divergent opinions!

Impact on support and installation of the COSMO code
NMA is involved in the support activity and will provide the second-level support.
Impact of using a DSL for the dynamical core:
- Faster and easier to maintain code when targeting multiple architectures.
- Comparison with an OpenACC GPU implementation of some dycore components shows that the STELLA implementation is 1.4x to 1.8x faster on GPU (see the sketch below for what a directive-based port looks like).
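For illustration only, a minimal sketch of a directive-based (OpenACC) port of a single dycore-like stencil; the kernel, field names and sizes are hypothetical and not taken from COSMO. The contrast with the DSL approach is that loops and data movement stay explicit in user code, whereas STELLA generates them from a stencil definition:

```cpp
// Hypothetical directive-based GPU port of a simple 5-point smoothing stencil,
// standing in for a dycore component. Loop structure and data transfers are
// written out by hand and annotated with OpenACC directives.
void smooth(const double* in, double* out, int ni, int nj) {
    #pragma acc parallel loop collapse(2) copyin(in[0:ni*nj]) copyout(out[0:ni*nj])
    for (int j = 1; j < nj - 1; ++j) {
        for (int i = 1; i < ni - 1; ++i) {
            out[j*ni + i] = 0.2 * (in[j*ni + i]
                                   + in[j*ni + i - 1] + in[j*ni + i + 1]
                                   + in[(j-1)*ni + i] + in[(j+1)*ni + i]);
        }
    }
}
```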

Possible development workflow
Working in C++ only (no Fortran dynamics):
- A reference implementation, either directly using STELLA or as plain C++ code (with explicit loops), can be written directly inside the existing C++ dycore (see the sketch below).
- The dycore maintainer then integrates and optimizes the reference code in STELLA.
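A minimal sketch of the first step of this workflow, assuming a hypothetical plain-C++ reference kernel with explicit loops; the field type and the kernel are illustrative only, not the actual dycore API. The dycore maintainer would later re-express such a kernel as a STELLA stencil:

```cpp
// Plain C++ reference kernel with explicit loops, as a developer might add it
// next to the existing C++ dycore code before it is ported to STELLA.
#include <vector>

struct Field3D {
    int ni, nj, nk;
    std::vector<double> data;  // contiguous (i fastest) storage
    double& operator()(int i, int j, int k)       { return data[(k*nj + j)*ni + i]; }
    double  operator()(int i, int j, int k) const { return data[(k*nj + j)*ni + i]; }
};

// Reference version of a simple horizontal Laplacian, written with explicit loops.
void laplacian_reference(const Field3D& in, Field3D& out) {
    for (int k = 0; k < in.nk; ++k)
        for (int j = 1; j < in.nj - 1; ++j)
            for (int i = 1; i < in.ni - 1; ++i)
                out(i, j, k) = in(i-1, j, k) + in(i+1, j, k)
                             + in(i, j-1, k) + in(i, j+1, k)
                             - 4.0 * in(i, j, k);
}
```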

Consequence of discontinuing the Fortran dynamics
- Training and change of workflow for the main developers (several weeks of investment).
- Additional training for developers at other universities would be required.
- New training also needed for the user community: compiling the combined C++ and Fortran code requires some additional know-how.
Consequence of discontinuing the C++ dynamics
- Issue for members and universities using the C++ dycore for production (MCH, ETH, EMPA).
- Would lose the ability to run on GPU architectures, which is a strength of the COSMO model.

Recommendation
- Extend the parallel phase of maintaining both the C++ and Fortran implementations.
- Evaluate again after a period of at least two years.
- Cost of having two dynamics: 0.5 FTE/year.
=> Extension accepted by the STC