
1 WWOSC 2014, Montreal, Canada, 2014-08. Running operational Canadian NWP models on next-generation supercomputers. Michel Desgagné, Abdessamad Qaddouri, Janusz Pudykiewicz, Stéphane Gaudreault, Michel Valin, Martin Charron. Recherche en Prévisions Numériques (RPN), Dorval, Canada

2 Plan
– Global Environmental Multiscale (GEM) NWP model
– Current large GEM configurations at CMC
– Scaling-up scenarios
– From global lat-lon to Yin-Yang
– Investigating new numerics
– Investigating new horizontal geometry
– Conclusions

3 Current NWP systems (1)
Global Deterministic Prediction System (GDPS):
– EnVar data assimilation
– 25 km horizontal resolution, 80 levels
– 00Z and 12Z runs, up to 10 days
– 1280 IBM P7 cores
Regional Deterministic Prediction System (RDPS):
– EnVar regional data assimilation
– 10 km horizontal resolution, 80 levels
– 48-hour forecast, 4x per day
– 1024 IBM P7 cores

4 Current NWP systems (2)
High-resolution Deterministic Prediction System (HRDPS):
– Downscaling from the 10 km RDPS
– 2.5 km horizontal resolution, 65 levels
– 48-hour forecast, 4x per day
– 2976 IBM P7 cores

5 Scaling up those NWP systems (2016-17 → 2020 → 2030)
GDPS:
– 2016-17: Yin-Yang 10 km, 120 levels; 12K "IBM P7" cores (4.5x current cluster size at 33%)
– 2020: Yin-Yang ~7 km ??; 36K "IBM P7" cores (3-fold)
– 2030: Yin-Yang ~2.5 km; 800K "IBM P7" cores (22-fold)
RDPS:
– 2016-17: 2.5 km, 80 levels, with EnVar & larger domain; 5-9K "IBM P7" cores
– later: ~1.0 km, 120 levels; 200K "IBM P7" cores
HRDPS:
– Experimental system at urban scale (250 m); strategic interests; special contracts

6 The GEM model
1. Grid-point lat-lon model
2. Finite differences on an Arakawa C grid
3. Semi-Lagrangian advection (poles are an issue)
4. Implicit time discretization
   – Direct solver (Nk 2D horizontal elliptic problems)
   – Full 3D iterative solver based on FGMRES
5. Global lat-lon, Yin-Yang and LAM configurations
6. Hybrid MPI/OpenMP (a minimal halo-exchange sketch follows this slide)
   – Halo exchanges
   – Array transposes for elliptic problems
7. PE block partitioning for I/O
Global lat-lon grid:
1) a challenge for DM implementation
2) many more elliptic problems to solve due to implicit horizontal diffusion (transposes)
3) semi-Lagrangian near the poles
4) current DM implementation will not scale
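Below is a minimal sketch of the "halo exchanges" item in point 6, assuming a simple 1D row decomposition with mpi4py; the tile size, halo width and neighbour layout are illustrative assumptions, not GEM's actual partitioning.

```python
# Hedged toy example (not GEM code): exchange halo rows between neighbouring
# MPI ranks in a 1D row decomposition of a 2D field.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nloc, nj, halo = 64, 128, 2                      # local rows, columns, halo width (assumed)
f = np.zeros((nloc + 2 * halo, nj))              # local tile padded with halo rows
f[halo:-halo, :] = rank                          # recognizable interior values

north = rank + 1 if rank + 1 < size else MPI.PROC_NULL
south = rank - 1 if rank - 1 >= 0 else MPI.PROC_NULL

# send top interior rows north, receive the south halo from the south neighbour
comm.Sendrecv(np.ascontiguousarray(f[-2 * halo:-halo, :]), dest=north,
              recvbuf=f[:halo, :], source=south)
# send bottom interior rows south, receive the north halo from the north neighbour
comm.Sendrecv(np.ascontiguousarray(f[halo:2 * halo, :]), dest=south,
              recvbuf=f[-halo:, :], source=north)
```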

7 Yin-Yang grid configuration
– Implemented as 2 LAMs communicating at their boundaries
– Optimized Schwarz iterative method for solving the elliptic problem (a toy Schwarz sketch follows this slide)
– Scales much better than the global lat-lon grid
– Operational implementation due in spring 2015
– Communications are an issue: the global lat-lon scalability problem (poles) is traded for another scalability problem
Abdessamad Qaddouri, Vivian Lee (2011), The Canadian Global Environmental Multiscale model on the Yin-Yang grid system, Q. J. R. Meteorol. Soc. 137: 1913–1926
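To illustrate the Schwarz idea mentioned above, here is a hedged toy sketch: a classical alternating Schwarz iteration for a 1D Poisson problem with two overlapping subdomains. The operational Yin-Yang solver uses an optimized Schwarz method on two overlapping spherical panels; the grid size, overlap and iteration count below are arbitrary assumptions.

```python
# Toy sketch (not the GEM solver): classical alternating Schwarz for -u'' = f
# on [0, 1], u(0) = u(1) = 0, with two overlapping subdomains coupled through
# their interface values, mimicking the two Yin-Yang panels at a cartoon level.
import numpy as np

n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)          # exact solution is u = sin(pi x)

def solve_dirichlet(rhs_interior, ua, ub):
    """Direct solve of -u'' = f on one subdomain with Dirichlet values ua, ub."""
    m = rhs_interior.size
    A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    rhs = rhs_interior.copy()
    rhs[0] += ua / h**2                   # couple to the left boundary value
    rhs[-1] += ub / h**2                  # couple to the right boundary value
    return np.linalg.solve(A, rhs)

u = np.zeros(n)
iL, iR = 60, 40                           # subdomain 1: [0, x[iL]], subdomain 2: [x[iR], 1]
for _ in range(20):                       # Schwarz iterations: exchange interface values
    u[1:iL] = solve_dirichlet(f[1:iL], 0.0, u[iL])
    u[iR + 1:-1] = solve_dirichlet(f[iR + 1:-1], u[iR], 0.0)

print("max error:", np.abs(u - np.sin(np.pi * x)).max())
```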

8 Yin-Yang 10 km scalability (Ni=3160, Nj=1078, Nk=158)
[Figure: strong-scaling results on H = CMC/hadar (IBM Power7) and Y = NCAR/yellowstone (IBM iDataPlex), for runs from 960 to 61936 cores, each labelled with its processor topology and per-PE grid dimensions.]

9 Yin-Yang 10 km scalability
[Figure: timing breakdown of the dynamics components: semi-Lagrangian, transposes, Yin-Yang exchanges, FFT solver.]

10 The future of GEM
– Yin-Yang 2 km on order 100K cores is already feasible on P7 processors or similar
– Solver and Yin-Yang exchanges will need work
– Using GPU capabilities is on the table
– Improve OpenMP scalability
– Reshape main memory for better cache usage
– Export SL interpolations to reduce halo size
– Processor mapping to reduce the need to communicate through the switch
– Partition Nk
– MIMD approach for I/O: I/O server

11 Revisiting time discretization: exponential integration methods for meteorological models
[Equations shown on the slide: the semi-discretized model equations split about the Jacobian (linear) operator; introducing the integrating factor yields the form integrated on the next slide.]
Clancy C., Pudykiewicz J. (2013), On the use of exponential time integration methods in atmospheric models, Tellus A, vol. 65
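The equations on this slide are not in the transcript. The LaTeX below is a hedged reconstruction of the standard formulation behind the cited Clancy & Pudykiewicz (2013) approach, using generic symbols (w for the state vector, L for the Jacobian/linear operator, N for the nonlinear remainder) that may differ from the slide's own notation.

```latex
% Semi-discretized model equations, linearized about a reference state:
% L is the Jacobian (linear) operator, N the nonlinear remainder.
\frac{d\mathbf{w}}{dt} = \mathbf{L}\,\mathbf{w} + \mathbf{N}(\mathbf{w})
% Introducing the integrating factor e^{-\mathbf{L}t} yields
\frac{d}{dt}\left( e^{-\mathbf{L}t}\,\mathbf{w} \right) = e^{-\mathbf{L}t}\,\mathbf{N}(\mathbf{w})
```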

12 Integrating over the time step and multiplying by the matrix exponential gives the update formula. Depending on the quadrature used, we obtain different versions of the exponential integration schemes. Their common property is that the desired solution can be expressed as a weighted sum of "phi functions".
Tokman M. (2006), Efficient integration of large stiff systems of ODEs with exponential propagation iterative methods, J. Comp. Phys. 213: 748–776
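Again a hedged reconstruction with the same generic symbols: integrating the relation above over one time step and multiplying by the matrix exponential gives the variation-of-constants formula; approximating its integral by a quadrature gives the family of schemes built from the φ functions.

```latex
% Exact update over one step of length \Delta t:
\mathbf{w}_{n+1} = e^{\Delta t\,\mathbf{L}}\,\mathbf{w}_n
  + \int_{0}^{\Delta t} e^{(\Delta t - s)\,\mathbf{L}}\,
      \mathbf{N}\bigl(\mathbf{w}(t_n + s)\bigr)\, ds
% Quadrature of the integral yields schemes of the form
\mathbf{w}_{n+1} = \mathbf{w}_n + \sum_{k \ge 1} \varphi_k(\Delta t\,\mathbf{L})\,\mathbf{b}_k,
\qquad
\varphi_0(z) = e^{z}, \qquad
\varphi_{k+1}(z) = \frac{\varphi_k(z) - 1/k!}{z}
```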

13 Exploring exponential integrators with Eulerian finite-volume schemes in GEM
The main difficulty in implementing exponential integrators is the evaluation of the φ functions. Recent advances in computational linear algebra now allow efficient computation:
– Fast, matrix-free Krylov methods
– Only the action of the matrix on a vector (matvec product) is needed
– Removes the need to evaluate the φ functions of the full system matrix explicitly
– Reduced-size problem: the large system matrix is reduced to a small Hessenberg matrix on which computing the exponential is easy (a minimal sketch follows this slide)
– Inherent local and global conservation
– Non-oscillatory advection (monotonic, positivity-preserving)
– Geometric flexibility (any type of grid system)
– Stable even with large Courant numbers
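As a concrete, hedged sketch of the matrix-free Krylov idea listed above (an illustrative NumPy/SciPy toy, not GEM code; the operator, Krylov dimension and time step are assumptions): Arnoldi builds a small Hessenberg matrix from matvec products only, and φ1 is then evaluated on that small matrix via a single small matrix exponential.

```python
# Toy sketch: approximate phi_1(dt*A) b with an Arnoldi (Krylov) projection,
# using only matvec products with A -- the building block of exponential integrators.
import numpy as np
from scipy.linalg import expm

def phi1_krylov(matvec, b, dt, m=30):
    """phi_1(z) = (exp(z) - 1)/z applied to b, via an m-dimensional Krylov space."""
    n = b.size
    beta = np.linalg.norm(b)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / beta
    for j in range(m):                            # Arnoldi: orthonormal Krylov basis
        w = matvec(V[:, j])
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                   # happy breakdown
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    Hm = dt * H[:m, :m]                           # small Hessenberg matrix
    aug = np.zeros((m + 1, m + 1))                # augmented-matrix trick:
    aug[:m, :m] = Hm                              #   expm([[Hm, e1], [0, 0]]) holds
    aug[0, m] = 1.0                               #   phi_1(Hm) e1 in its last column
    phi1_e1 = expm(aug)[:m, m]
    return beta * V[:, :m] @ phi1_e1

# Usage on a stand-in stiff operator given only as a matvec closure.
n = 200
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # 1D diffusion-like matrix
b = np.random.default_rng(0).standard_normal(n)
approx = phi1_krylov(lambda v: A @ v, b, dt=0.5)
exact = np.linalg.solve(0.5 * A, expm(0.5 * A) @ b - b)   # dense phi_1(dt A) b reference
print("error:", np.linalg.norm(approx - exact))
```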

14 Major findings
– Exponential methods can resolve high frequencies to the required level of tolerance without the severe time-step restriction of explicit numerical schemes
– In contrast to the usual implicit integration with large time steps, which either damps high frequencies or maps them to one and the same frequency (or nearly so), the discretization scheme is free of noise even when a low-order spatial discretization is used
– Computational efficiency is very good on linear problems and looks very promising for the full model
– Expected to scale very well because of data locality

15 Investigating scalability and accuracy on an icosahedral geodesic grid
– Spatial discretization: finite-volume method on an icosahedral geodesic grid (a toy grid-construction sketch follows this slide)
– Time discretization: exponential integration methods, which resolve high frequencies to the required level of tolerance without severe time-step restriction
– A shallow-water implementation already shows great scalability
– Vertical coordinate: generalized quasi-Lagrangian with conservative monotonic remapping
Pudykiewicz J. (2011), J. Comp. Phys. 230: 1956–1991
Qaddouri A. et al. (2012), Q. J. R. Meteorol. Soc. 138: 989–1003
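As an illustrative, hedged sketch of the geometry named above (not RPN's grid generator; the refinement depth is an arbitrary assumption): an icosahedral geodesic grid can be built by recursively splitting the icosahedron's triangles and projecting the new vertices onto the unit sphere.

```python
# Toy sketch: construct an icosahedral geodesic grid by triangle subdivision.
import numpy as np
from scipy.spatial import ConvexHull

def icosahedron():
    """Unit-sphere vertices and the 20 triangular faces of a regular icosahedron."""
    p = (1.0 + np.sqrt(5.0)) / 2.0                            # golden ratio
    v = np.array([(0.0, s1, s2 * p) for s1 in (-1, 1) for s2 in (-1, 1)])
    verts = np.vstack([v, v[:, [2, 0, 1]], v[:, [1, 2, 0]]])  # cyclic permutations
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    faces = ConvexHull(verts).simplices                       # avoids a hardcoded face table
    return verts, faces

def subdivide(verts, faces):
    """Split each triangle into 4, pushing new edge midpoints onto the sphere."""
    verts = [tuple(v) for v in verts]
    cache = {}
    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            m = (np.asarray(verts[i]) + np.asarray(verts[j])) / 2.0
            verts.append(tuple(m / np.linalg.norm(m)))
            cache[key] = len(verts) - 1
        return cache[key]
    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return np.asarray(verts), np.asarray(new_faces)

verts, faces = icosahedron()
for _ in range(3):                                            # 3 refinement levels
    verts, faces = subdivide(verts, faces)
print(len(verts), "vertices,", len(faces), "triangles")       # 642 vertices, 1280 triangles
```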

16 Conclusions
– Yin-Yang 2 km on order 100K cores is already feasible on P7 processors or similar
– No real worries for the next 4-6 years
– Ready to address architecture changes:
  – GPUs
  – Larger vectors
  – Larger number of cores per node
– Many development items on the agenda
– Keep investigating new numerics:
  – that will enhance data locality and limit communications
  – that will be better suited for upcoming architectures

17 END

