PP POMPA (WG6) Overview Talk
COSMO GM12, Lugano
Oliver Fuhrer (MeteoSwiss) and the whole POMPA project team


Task Overview
Task 1: Performance analysis and documentation
Task 2: Redesign memory layout and data structures
Task 3: Improve current parallelization
Task 4: Parallel I/O
Task 5: Redesign implementation of dynamical core
Task 6: Explore GPU acceleration
Task 7: Implementation documentation
See also the talks of Tobias and Carlos.

Data Structures (Task 2)
First step: physical parametrizations with blocking, f(nproma,k,nblock).
The new version of organize_physics() is structured as follows (see also the sketch below):

  ! start block loop
  do ib = 1, nblock
     call copy_to_block        ! f(i,j,k) -> f_b(nproma,k)
     call organize_gscp
     call organize_radiation
     call organize_turbulence
     call organize_soil
     call copy_back            ! f_b(nproma,k) -> f(i,j,k)
  end do

Unified COSMO / ICON physics library.
Straightforward OpenMP parallelization over blocks; inside a physics scheme the data is in block form f_b(nproma,ke).
Xavier Lapillonne
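To make the blocking concrete, here is a minimal, self-contained Fortran sketch (hypothetical routine name and index mapping; the real copy_to_block and organize_* routines differ): it copies (i,j,k) fields into (nproma,ke) blocks, applies a placeholder in place of the parametrizations, copies the result back, and parallelizes over blocks with OpenMP.

  subroutine physics_blocked(f, ie, je, ke, nproma)
    implicit none
    integer, intent(in)    :: ie, je, ke, nproma
    real(8), intent(inout) :: f(ie, je, ke)
    real(8) :: f_b(nproma, ke)                       ! per-block working copy
    integer :: nblock, ib, ip, i, j, k, ij0, idx

    nblock = (ie*je + nproma - 1) / nproma           ! blocks covering the ie*je columns

    !$omp parallel do private(ib, ip, i, j, k, ij0, idx, f_b)
    do ib = 1, nblock
       ij0 = (ib - 1) * nproma
       do k = 1, ke                                  ! copy_to_block: f(i,j,k) -> f_b(ip,k)
          do ip = 1, min(nproma, ie*je - ij0)
             idx = ij0 + ip - 1
             i = mod(idx, ie) + 1
             j = idx / ie + 1
             f_b(ip, k) = f(i, j, k)
          end do
       end do

       ! here the parametrizations work on the block, e.g.
       ! call organize_gscp(f_b, ...) ; call organize_radiation(f_b, ...) ; ...
       f_b(:, :) = f_b(:, :) + 1.0d0                 ! placeholder for the physics increments

       do k = 1, ke                                  ! copy_back: f_b(ip,k) -> f(i,j,k)
          do ip = 1, min(nproma, ie*je - ij0)
             idx = ij0 + ip - 1
             i = mod(idx, ie) + 1
             j = idx / ie + 1
             f(i, j, k) = f_b(ip, k)
          end do
       end do
    end do
    !$omp end parallel do
  end subroutine physics_blocked

Since each block is independent, the single "!$omp parallel do" over ib already gives the OpenMP parallelization, and the (nproma,ke) block form is what the unified COSMO / ICON physics library operates on.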

Asynchronous & Parallel I/O (Task 4)
IPCC benchmark of ETH (400 PEs + 3 I/O PEs); only for NetCDF output; will be available in COSMO v5.0. A minimal sketch of the dedicated I/O PE idea follows below.

  Section                   Runtime [s] ORIGINAL   Runtime [s] NEW
  Dynamics
  Physics                   86                     85
  Additional Computations   11                     10
  Input                     5                      5
  Output
  TOTAL

Carlos Osuna
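As an illustration of the dedicated I/O PE idea (a hedged sketch with hypothetical names and a toy field; this is not the actual COSMO implementation): the MPI communicator is split into compute and I/O groups, each compute PE hands a field to its I/O server with a non-blocking send, and keeps time stepping while the I/O server writes it.

  program async_io_sketch
    use mpi
    implicit none
    integer, parameter :: n_io = 3, nfld = 1000      ! 3 dedicated I/O PEs, toy field size
    integer :: ierr, rank, nprocs, color, comm_group, req, src
    integer :: status(MPI_STATUS_SIZE)
    real(8) :: field(nfld)

    call MPI_Init(ierr)                              ! run with more than n_io ranks
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    ! colour 0 = compute PEs, colour 1 = I/O PEs (the last n_io ranks)
    color = merge(1, 0, rank >= nprocs - n_io)
    call MPI_Comm_split(MPI_COMM_WORLD, color, rank, comm_group, ierr)

    if (color == 0) then
       field = real(rank, 8)                         ! stands in for an output field
       ! hand the field to "my" I/O server and continue with the next time step
       call MPI_Isend(field, nfld, MPI_DOUBLE_PRECISION, &
                      nprocs - n_io + mod(rank, n_io), 0, MPI_COMM_WORLD, req, ierr)
       ! ... dynamics / physics of the next time step would run here,
       !     using comm_group for halo exchanges among compute PEs ...
       call MPI_Wait(req, status, ierr)              ! only needed before reusing 'field'
    else
       ! I/O server: receive one field from each compute PE assigned to it and write it
       do src = 0, nprocs - n_io - 1
          if (mod(src, n_io) == rank - (nprocs - n_io)) then
             call MPI_Recv(field, nfld, MPI_DOUBLE_PRECISION, src, 0, &
                           MPI_COMM_WORLD, status, ierr)
             ! ... NetCDF write of 'field' would happen here ...
          end if
       end do
    end if

    call MPI_Finalize(ierr)
  end program async_io_sketch

On the compute side the send overlaps with the following time step, which is how output time disappears from the compute PEs' critical path.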

Explore GPU Acceleration (Task 6)
Goal: investigate whether and how GPUs can be leveraged for numerical weather prediction with COSMO.
Background (pioneering work):
WRF (Michalakes)
ASUCA (JMA): full rewrite to CUDA
HIRLAM: dynamical core in C and CUDA
GEOS-5 (NASA): CUDA and PGI directives
CAM (CESM, NCAR): CUDA implementation
NIM (NOAA): custom compiler F2C-ACC or OpenACC
…and many others

You are not alone!
Source: Aoki et al., Tokyo Institute of Technology.
Schalkwijk, J., E. Griffith, F. Post, and H. J. J. Jonker: High performance simulations of turbulent clouds on a desktop PC: exploiting the GPU. Bulletin of the American Meteorological Society, March 2012.

Approach(es) in POMPA
How do we achieve portable performance while retaining a single source code?
Dynamics: ~60% of runtime, few core developers, many stencils, very memory intensive → DSEL approach (see Tobias' talk).
Physics + Assimilation: ~20% of runtime, more developers, plug-in / shared code, "easy" to parallelize → accelerator directives.

Why Assimilation?
Assimilation is only 2% of a typical COSMO-2 forecast, but it is a lot of code, and GPU and CPU have different memories (currently)!
(Figure: dynamics on the CPU vs. dynamics on the GPU.)

Full GPU Implementation
Low FLOP count per load/store (stencils!), so transferring the data on each timestep is too expensive: (at least) all code which touches the prognostic variables every timestep has to be ported. See the sketch below.

  Part                                   Time / Δt
  Dynamics                               172 ms
  Physics                                36 ms
  Total                                  253 ms vs. 118 ms (transfer of ten prognostic variables)
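A minimal OpenACC sketch of this point (hypothetical field names and a placeholder kernel; not the actual COSMO time loop): with a data region around the whole time loop the prognostic fields are copied once, whereas a partial port would need the per-step update transfers shown in the comments.

  subroutine timeloop(u, t, pp, nx, ny, nz, ntstep)
    implicit none
    integer, intent(in)    :: nx, ny, nz, ntstep
    real(8), intent(inout) :: u(nx,ny,nz), t(nx,ny,nz), pp(nx,ny,nz)
    integer :: step, i, j, k

    ! one host->device copy before the time loop, one device->host copy after it
    !$acc data copy(u, t, pp)
    do step = 1, ntstep
       ! dynamics/physics kernels work on device-resident data
       !$acc parallel loop collapse(3)
       do k = 1, nz
          do j = 1, ny
             do i = 1, nx
                u(i,j,k) = u(i,j,k) + 0.1d0*t(i,j,k) + 0.01d0*pp(i,j,k)   ! placeholder kernel
             end do
          end do
       end do
       !$acc end parallel loop

       ! If part of the time step still ran on the CPU, every step would need
       !   !$acc update host(u, t, pp)   ... CPU code ...   !$acc update device(u, t, pp)
       ! and it is this per-step transfer of the prognostic variables that the
       ! table above shows to be too expensive.
    end do
    !$acc end data
  end subroutine timeloop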

Physics on GPU
Convection scheme in preparation by CASPUR. A meaningful subset of the physics is ready, ported using OpenACC (the accelerator directives standard). The sketch below illustrates the approach.

  Scheme               CPU runtime [s]   GPU runtime [s]   Speedup
  Microphysics
  Radiation
  Turbulence
  Soil
  Copy (ijk) ↔ block   -                 2.3               -
  Total physics

Xavier Lapillonne
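The directive approach for a single scheme, as a hedged sketch (hypothetical routine with placeholder "physics"; the real COSMO microphysics is far more involved): parallelize over the independent nproma columns of a block and keep the vertical loop sequential, since quantities such as falling precipitation carry a k-dependence. The fields are assumed to be already present on the GPU from an enclosing data region.

  subroutine microphysics_block(t_b, qv_b, nproma, ke, dt)
    implicit none
    integer, intent(in)    :: nproma, ke
    real(8), intent(in)    :: dt
    real(8), intent(inout) :: t_b(nproma, ke), qv_b(nproma, ke)
    integer :: ip, k

    ! parallelize across the independent columns of the block
    !$acc parallel loop gang vector present(t_b, qv_b)
    do ip = 1, nproma
       ! the vertical loop stays sequential (k-1 dependence)
       !$acc loop seq
       do k = 2, ke
          qv_b(ip, k) = qv_b(ip, k) + dt * 1.0d-6 * qv_b(ip, k-1)   ! placeholder physics
          t_b(ip, k)  = t_b(ip, k)  + dt * 1.0d-4 * qv_b(ip, k)
       end do
    end do
    !$acc end parallel loop
  end subroutine microphysics_block

The same pattern, directives around the existing Fortran loops with data kept resident on the device, is what allows the physics to keep a single source code for CPU and GPU.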

Outlook
Functional CPU version with dycore: integration, GCL tuning & storage order.
GPU version with dycore & GCL: GPU variant of GCL, boundary-condition kernels, first experience with directives/CUDA interoperability.
GPU version with timeloop: additional kernels (e.g. relaxation), non-periodic boundary conditions, GCL Fortran interfaces, directives/CUDA interoperability, meaningful subset of physics, communication using the communication wrapper, nudging.
GPU version for real cases: performance optimization, remaining parts with directives, GCL tuning, dycore tuning.
Timeline: July 2012 through 2013, with milestones in July, August, October and December.

Integration
Making the advection scheme run fast on a GPU is one thing; integration into the operational code is another…
Challenges: Amdahl's law, usability, sharing of data structures, clash of programming models, maintainability.
Significant effort in 2012 to integrate all POMPA GPU developments into the production code (HP2C OPCODE), based on COSMO v4.22 plus assimilation.

Feedback of Code?
Asynchronous, parallel NetCDF I/O
Alternative gather strategy in I/O
Consistent computation of tendencies
Loop-level hybrid parallelization (OpenMP / MPI)
Blocked data structure in physics
More flexible halo-exchange interface
Consistent implementation of lateral BC
Switchable single/double precision
…

Knowhow Transfer
"Transfer of knowhow is critical."
The dynamical core rewrite and the GPU implementation amount to 5-7 FTEs in 2012, of which only 0.5 FTE comes from fixed COSMO staff; the rest is mostly temporary project staff. The HP2C Initiative, CSCS and CASPUR are providing the bulk of the resources. Knowhow transfer is only possible with a stronger engagement of COSMO staff!

Papers, Conferences and Workshops
Lapillonne, X., and O. Fuhrer (submitted 2012): Using compiler directives to port large scientific applications to GPUs: an example from atmospheric science. Submitted to Parallel Processing Letters.
Diamanti, T., X. Lapillonne, O. Fuhrer, and D. Leuenberger, 2012: Porting the data assimilation of the atmospheric model COSMO to GPUs using directives. 41st SPEEDUP Workshop, Zurich, Switzerland.
Fuhrer, O., A. Walser, X. Lapillonne, D. Leuenberger, and T. Schönemeyer, 2011: Bringing the "new COSMO" to production: Opportunities and Challenges. SOS 15 Workshop, Engelberg, Switzerland.
Fuhrer, O., T. Gysi, X. Lapillonne, and T. Schulthess, 2012: Considerations for implementing NWP dynamical cores on next generation computers. EGU General Assembly, Vienna, Austria.
Gysi, T., and D. Müller, 2011: COSMO Dynamical Core Redesign - Motivation, Approach, Design & Expectations. SOS 15 Workshop, Engelberg, Switzerland.
Gysi, T., and D. Müller, 2011: COSMO Dynamical Core Redesign - Project, Approach by Example & Implementation. Programming weather, climate, and earth-system models on heterogeneous multi-core platforms, Boulder CO, USA.
Gysi, T., P. Messmer, T. Schröder, O. Fuhrer, C. Osuna, X. Lapillonne, W. Sawyer, M. Bianco, U. Varetto, D. Müller, and T. Schulthess, 2012: Rewrite of the COSMO Dynamical Core. GPU Technology Conference, San Jose, USA.
Lapillonne, X., and O. Fuhrer, 2012: Porting the physical parametrizations of the atmospheric model COSMO to GPUs using directives. Speedup Conference, Basel, Switzerland.
Lapillonne, X., and O. Fuhrer, 2011: Porting the physical parametrizations on GPUs using directives. European Meteorological Society Conference, Berlin, Germany.
Lapillonne, X., and O. Fuhrer, 2011: Adapting COSMO to next generation computers: The microphysics parametrization on GPUs using directives. Atelier de modélisation de l'atmosphère, Toulouse, France.
Osuna, C., O. Fuhrer, and N. Stringfellow, 2012: Recent developments on NetCDF I/O: Towards hiding I/O computations in COSMO. Scalable I/O in climate models workshop, Hamburg, Germany.
Schulthess, T., 2010: GPU Considerations for Next Generation Weather Simulations. Supercomputing 2010, New Orleans, USA.
Steiner, P., M. de Lorenzi, and A. Mangili, 2010: Operational numerical weather prediction in Switzerland and evolution towards new supercomputer architectures. 14th ECMWF Workshop on High Performance Computing in Meteorology, Reading, United Kingdom.

Thank you! …and thanks to the POMPA project team for their work in 2012!