1
© Crown copyright Met Office
Weather prediction and climate modelling at Exascale: Introducing the Gung-Ho project
R. Ford, M. J. Glover, D. Ham, C. M. Maynard, S. Pickles, G. Riley and N. Wood
2
© Crown copyright Met Office
… and the weather for the conference is
3
The primitive equations
Rather complicated: equations of motion for density, humidity, pressure, temperature and wind, plus mass conservation and thermodynamics
Partial differential equations – no general solution
Approximate, discrete, numerical methods
"When a problem in pure or in applied mathematics is 'solved' by numerical computation, errors, that is, deviations of the numerical 'solution' obtained from the true, rigorous one, are unavoidable. Such a 'solution' is therefore meaningless, unless there is an estimate of the total error in the above sense." – J. von Neumann and H. H. Goldstine, Bull. Amer. Math. Soc. 53 (1947) 1021–99
4
© Crown copyright Met Office
Physics parameterisations
(Diagram: vegetation model, short-wave radiation, long-wave radiation, clouds, convection, precipitation, surface processes)
5
© Crown copyright Met Office
Challenge: demands on computer power
(Diagram: computing resources shared between resolution, complexity, and duration and/or ensemble size)
6
Parallel programming just got harder! (June 21, 2012)
Moore's Law: more cores, not faster cores
Some cores are more equal than others: NUMA (AMD Interlagos)
Heterogeneous architectures: accelerators (NVIDIA Fermi)
Data parallel: cores, MPI tasks – scale to 2^30 heterogeneous cores?
Main memory is receding from view
7
The Unified Model – software
© Crown copyright Met Office
Obs, Var, UM (+), IO server, ensembles and verification – more than 2 million lines of code
+ coupled models, excluding ocean and sea ice
UM used for both NWP and climate models
Now ~25 years old
Fortran 90 (some F77 features remain)
Parallelism expressed via MPI
Some lower-level OpenMP (retro-fitted)
IO server: MPI tasks dedicated to IO – dramatic improvement in IO performance
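As a generic illustration (not actual UM code) of the kind of lower-level, retro-fitted OpenMP mentioned above, a directive is wrapped around an existing loop nest beneath the MPI decomposition; the routine and array names here are invented for the sketch.

  ! Illustrative only: the sort of loop nest that OpenMP directives
  ! are retro-fitted around in an MPI-parallel code.
  subroutine add_increment(nlev, ncol, field, incr)
    implicit none
    integer, intent(in)    :: nlev, ncol
    real,    intent(inout) :: field(nlev, ncol)
    real,    intent(in)    :: incr(nlev, ncol)
    integer :: i, k

    !$omp parallel do private(i, k)
    do i = 1, ncol
      do k = 1, nlev
        field(k, i) = field(k, i) + incr(k, i)
      end do
    end do
    !$omp end parallel do
  end subroutine add_increment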
8
© Crown copyright Met Office
Problems with a long-lat grid
At 25 km resolution, grid spacing near the poles = 75 m
At 10 km, this reduces to 12 m!
3rd-generation dynamical core (ENDGame) improved scaling
Weak CFL: Δt↓ as Δx↓ (implicit scheme)
Data parallel in 2-D
(Scaling plot: T/T24)
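A minimal sketch of the arithmetic behind these numbers (my own, not from the slide): on a regular long-lat grid, the zonal spacing of the grid row nearest the pole shrinks by roughly a factor of cos(latitude). Exact figures depend on the grid staggering, so this gives the order of magnitude rather than the quoted values.

  program polar_spacing
    implicit none
    real, parameter :: r_earth = 6371.0e3     ! Earth radius (m)
    real, parameter :: pi      = 3.14159265
    real :: dx_eq, dlambda, phi, dx_pole

    dx_eq   = 25.0e3                          ! equatorial grid spacing (m)
    dlambda = dx_eq / r_earth                 ! angular spacing (radians)
    phi     = 0.5*pi - dlambda                ! latitude of the row next to the pole
    dx_pole = r_earth * cos(phi) * dlambda    ! zonal spacing at that row

    print *, 'zonal spacing near the pole (m):', dx_pole   ! order of 100 m
  end program polar_spacing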
9
Globally Uniform Next Generation Highly Optimized GungHo! - Working Together Harmoniously
10
5 Year Project
"To research, design and develop a new dynamical core suitable for operational, global and regional, weather and climate simulation on massively parallel computers of the size envisaged over the coming 20 years."
To address (inter alia):
What should replace the lat-lon grid?
How to transport material on that grid?
Is an implicit time scheme viable/desirable on such computers?
Split into two phases: 2 years "research", 3 years "development"
Partners: Bath, Exeter, Imperial, Leeds, Manchester, Reading (NERC), STFC Daresbury and the Met Office
11
Choice of mesh
© Crown copyright Met Office
New dynamical core: scalable to a very large number of elements; choice of elements and mesh not fixed; support for irregular elements in the horizontal
Structured mesh: neighbours known by construction (stencil), direct memory access
Unstructured mesh: neighbours unknown, look-up table, indirect memory access
Derivative operators
12
Consequences for memory access
© Crown copyright Met Office

  a(i) = c*b(nb_list(i))          ! indirect access via a neighbour list

  do k = 1, nlevel                ! structured in the vertical:
    a(k,i) = c*b(k,nb_list(i))    ! one indirect look-up serves a whole
  end do                          ! column of nlevel contiguous values

Indirect memory access destroys data locality: poor cache utilisation, poor performance
Mesh is likely to be structured in the vertical: a horizontally unstructured, columnar mesh with the vertical index (k) innermost (contiguous in memory)
13
Cache versus oversubscribed concurrency
© Crown copyright Met Office
Conventional CPU: cache-based memory model – will node-level cache coherency continue?
GPU: thread teams (warps) with fast switching; naively, each thread owns an individual element; vector (coalesced) memory access wants the horizontal index (i) contiguous in memory
14
ILP – vectorisation
© Crown copyright Met Office
Vectorisation not limited to GPU-type machines: SIMD units on CPUs – SSE (2 × 64-bit words), AVX (4), SIMD on Intel MIC (8)
A complex issue – Pickles and Porter 2012 (NEMO, ocean code) compared two data layouts for 3-D arrays and found different operations favour different orderings; possible to vectorise some operations with either layout
Vector friendly – layer contiguous
Cache friendly – column contiguous
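To make the two orderings concrete, here is a small illustrative sketch (not taken from Pickles and Porter 2012); the array and loop names are invented, and which layout wins depends on the operation, as that study found.

  program layouts
    implicit none
    integer, parameter :: nlev = 70, ncol = 1024
    real :: a_col(nlev, ncol)    ! column contiguous: a whole column sits together
    real :: a_lay(ncol, nlev)    ! layer contiguous: a whole level sits together
    integer :: i, k

    a_col = 1.0
    a_lay = 1.0

    ! Cache friendly: innermost index k is contiguous in memory,
    ! so operators that work up a column reuse cache lines.
    do i = 1, ncol
      do k = 1, nlev
        a_col(k, i) = 2.0 * a_col(k, i)
      end do
    end do

    ! Vector friendly: innermost index i is contiguous, so the loop
    ! over columns maps onto SIMD lanes (or coalesced GPU accesses).
    do k = 1, nlev
      do i = 1, ncol
        a_lay(i, k) = 2.0 * a_lay(i, k)
      end do
    end do
  end program layouts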
15
Overview
Model input data: grid data
Infrastructure (e.g. mctutils): halo_exchange(), put(), get() – in-place coupling (MPI)
Program: parallelism mgmt – MPI, threads
  Read_(partitioned)_grid()
  call infrastructure_init() – data and comms init, including halo exchange init and coupling exchange init
  call model_init() – e.g. allocation, non in-place coupling
  model_timestep_control
  call model_run()
  call model_finalise()
Model science code: init(), run(), finalise()
Model data set-up: e.g. field descriptions, coupling requirements ('tag'-based access)
There will be several models and programs
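A minimal sketch of the program-level control flow listed above, using the call names from the slide; the stub bodies and the fixed step count are placeholders, not the project's actual code.

  program model_driver
    implicit none
    integer :: step
    integer, parameter :: nsteps = 10   ! stands in for model_timestep_control

    call read_partitioned_grid()        ! grid data, decomposed per MPI task
    call infrastructure_init()          ! data and comms init, incl. halo-exchange
                                        ! and coupling-exchange init
    call model_init()                   ! e.g. allocation, non in-place coupling

    do step = 1, nsteps
      call model_run()                  ! model science code for one timestep
    end do

    call model_finalise()

  contains
    ! Empty stubs so the sketch compiles on its own.
    subroutine read_partitioned_grid()
    end subroutine read_partitioned_grid
    subroutine infrastructure_init()
    end subroutine infrastructure_init
    subroutine model_init()
    end subroutine model_init
    subroutine model_run()
    end subroutine model_run
    subroutine model_finalise()
    end subroutine model_finalise
  end program model_driver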
16
© Crown copyright Met Office
Computational Science (CS) work package – proposal: a Met Office software development project
Separation of concerns:
  Science model – scientific and CS, performance code
  Computational science – performance code
Fortran 2003 + MPI + directives (OpenMP)
Do not exclude PGAS models (CAF) or single-sided comms
17
Kernel API
© Crown copyright Met Office
Algorithm layer calls kernels (parallel or serial), implemented in the PSy layer
PSy layer calls compute for generic kernels – defined interface
Hand-coded versus auto-generated
Misses opportunity for data re-use between kernels
Special kernels, e.g. Helmholtz, consist of smaller kernels which share halo exchange
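A hypothetical sketch of the layering described above; the module, kernel and invoke names are invented for illustration and do not represent the project's defined interface.

  module example_kernel_mod
    implicit none
  contains
    ! Kernel: column-local compute behind a defined interface; it knows
    ! nothing about parallelism, halos or the mesh.
    subroutine example_kernel(nlev, x, y)
      integer, intent(in)  :: nlev
      real,    intent(in)  :: x(nlev)
      real,    intent(out) :: y(nlev)
      y = 2.0 * x
    end subroutine example_kernel
  end module example_kernel_mod

  module psy_mod
    use example_kernel_mod, only: example_kernel
    implicit none
  contains
    ! PSy layer: owns the loop over columns (and, in a real code, the
    ! parallelisation and halo exchanges), calling the kernel per column.
    subroutine invoke_example(nlev, ncol, x, y)
      integer, intent(in)  :: nlev, ncol
      real,    intent(in)  :: x(nlev, ncol)
      real,    intent(out) :: y(nlev, ncol)
      integer :: i
      do i = 1, ncol
        call example_kernel(nlev, x(:, i), y(:, i))
      end do
    end subroutine invoke_example
  end module psy_mod

The algorithm layer would then call invoke_example without ever seeing how the loop is parallelised.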
18
Infrastructure Model
© Crown copyright Met Office
Define an infrastructure API to be used by models – implementation neutral
Use existing infrastructure software; hide these implementations behind the API
e.g. ESMF for halo exchange, MCT-OASIS for coupling to the ocean model (NEMO)
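A hypothetical sketch of what an implementation-neutral infrastructure API could look like; the module name and empty body are assumptions, with the real work delegated to e.g. ESMF behind this interface.

  module infrastructure_api_mod
    implicit none
  contains
    ! The model only ever calls this routine; which library (ESMF,
    ! raw MPI, ...) actually performs the exchange is hidden here.
    subroutine halo_exchange(field, halo_depth)
      real,    intent(inout) :: field(:,:)
      integer, intent(in)    :: halo_depth
      ! Delegate to the chosen implementation; left empty in this sketch.
    end subroutine halo_exchange
  end module infrastructure_api_mod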
19
Data Model
© Crown copyright Met Office
Model has a local data view + halos
Data belongs to objects – fields
Data objects contain function-space information – DoFs of the field on a topological entity
Algorithm layer cannot access raw DoF arrays
Enables mesh / topological entity / function space to be changed without large code changes
Unpacked as arrays before passing to a kernel (variable or fixed data size for the kernel?)
State object contains internal GH data
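A hypothetical sketch of a field object along the lines described above; the type and component names are invented, and a real implementation would carry mesh and function-space metadata rather than a bare integer.

  module field_mod
    implicit none
    private
    public :: field_type

    type :: field_type
      private
      real, allocatable :: dof(:)           ! degrees of freedom of the field
      integer           :: function_space   ! which function space the DoFs live on
    contains
      procedure :: get_data                 ! the only route to the raw values
    end type field_type

  contains
    ! Hand a kernel a plain array copy of the DoFs; the algorithm layer
    ! never touches the array inside the object directly.
    function get_data(this) result(d)
      class(field_type), intent(in) :: this
      real, allocatable :: d(:)
      d = this%dof
    end function get_data
  end module field_mod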
20
Summary
© Crown copyright Met Office
NWP and climate models are complex problems – a key scientific driver for Exascale systems
Gung-Ho: a complete redesign for the UK Met Office – mathematical formulation, algorithm, numerical analysis, software
Personal view: what about the hardware? Is there scope for co-design (a wider project)? Software and hardware working together harmoniously