Porting the MIT Global Circulation Model on the CellBE Processor
Marco POLLINI (1), Paolo PALAZZARI (1,2), Vittorio ROSATO (1,2)
(1) Ylichron Srl, Roma
(2) ENEA CRESCO Project, Casaccia Research Centre, Roma
February 11, 2009
Workshop Progetti GRID PON Ricerca, Catania 2009

Outline
- The CellBE Processor
- The MITgcm (General Circulation Model)
- Parallelization of the 2D Conjugate Gradient routine on the CellBE
- (Very) preliminary performance evaluations

The CRESCO platform

The CellBE processor
204.8 GB/s
25.6 GFlop/s

MIT General Circulation Model
The MITgcm (MIT General Circulation Model) is a numerical model designed for the study of the atmosphere, ocean, and climate. Its non-hydrostatic formulation enables it to simulate fluid phenomena over a wide range of scales.

Domain Decomposition
The physical simulation domain is partitioned into 3D tiles. In the example, the domain is split along the x and y dimensions.
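
The decomposition described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names `Range`, `Tile`, `split_axis` and `make_tile` are hypothetical, the block distribution with the remainder spread over the first tiles is one common choice, and none of it is taken from the MITgcm source. Each tile covers a rectangle in x and y and keeps the full vertical extent.

```c
#include <assert.h>

/* Half-open index range [begin, end) owned by one tile along one axis. */
typedef struct { int begin; int end; } Range;

/* A 3D tile: a rectangle in x and y, with the full vertical extent nz. */
typedef struct { Range x; Range y; int nz; } Tile;

/* Split n grid points into `parts` contiguous chunks and return chunk idx.
   The first (n % parts) chunks receive one extra point. */
Range split_axis(int n, int parts, int idx) {
    assert(n >= 0 && parts > 0 && idx >= 0 && idx < parts);
    int base = n / parts;
    int extra = n % parts;
    Range r;
    r.begin = idx * base + (idx < extra ? idx : extra);
    r.end = r.begin + base + (idx < extra ? 1 : 0);
    return r;
}

/* Tile (px, py) of an npx-by-npy tiling of an nx-by-ny-by-nz domain.
   Only x and y are split; z is left whole, as in the example slide. */
Tile make_tile(int nx, int ny, int nz, int npx, int npy, int px, int py) {
    Tile t;
    t.x = split_axis(nx, npx, px);
    t.y = split_axis(ny, npy, py);
    t.nz = nz;
    return t;
}
```

For instance, `make_tile(64, 32, 10, 4, 2, 3, 1)` describes the tile covering x in [48, 64), y in [16, 32) and all 10 vertical levels.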

CG2D Routine
- Most demanding kernel of the code
- Solves a linear system of equations Ax = b
- Parallelized and offloaded to the 8 SPEs
- Rewritten in C
- Each tile is split into 8 sub-tiles, each managed by one SPE
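
To make the Ax = b step concrete, here is a minimal, unpreconditioned conjugate gradient sketch in C. It is not the CG2D routine from the talk: the operator `apply_A` is a stand-in (a 5-point Laplacian with zero values outside the grid), the grid size is arbitrary, and the names `apply_A`, `dot` and `cg_solve` are hypothetical. It runs serially, with no SPE offload.

```c
#include <assert.h>

#define NX 8            /* grid columns (illustrative size) */
#define NY 8            /* grid rows (illustrative size) */
#define N  (NX * NY)

/* y = A x for a 5-point Laplacian with zero values outside the grid.
   This operator is a stand-in chosen for the sketch. */
void apply_A(const double *x, double *y) {
    for (int j = 0; j < NY; j++) {
        for (int i = 0; i < NX; i++) {
            int k = j * NX + i;
            double v = 4.0 * x[k];
            if (i > 0)      v -= x[k - 1];
            if (i < NX - 1) v -= x[k + 1];
            if (j > 0)      v -= x[k - NX];
            if (j < NY - 1) v -= x[k + NX];
            y[k] = v;
        }
    }
}

/* Dot product of two length-N vectors. */
double dot(const double *a, const double *b) {
    double s = 0.0;
    for (int k = 0; k < N; k++) s += a[k] * b[k];
    return s;
}

/* Solve A x = b. On entry x holds the initial guess, on exit the approximate
   solution. Stops when the residual 2-norm is at most tol or after max_iter
   iterations; returns the number of iterations performed. */
int cg_solve(const double *b, double *x, int max_iter, double tol) {
    assert(max_iter >= 0 && tol > 0.0);
    double r[N], p[N], Ap[N];
    apply_A(x, Ap);
    for (int k = 0; k < N; k++) { r[k] = b[k] - Ap[k]; p[k] = r[k]; }
    double rr = dot(r, r);
    int it = 0;
    while (it < max_iter && rr > tol * tol) {
        apply_A(p, Ap);
        double alpha = rr / dot(p, Ap);          /* step length */
        for (int k = 0; k < N; k++) {
            x[k] += alpha * p[k];
            r[k] -= alpha * Ap[k];
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;               /* direction update weight */
        for (int k = 0; k < N; k++) p[k] = r[k] + beta * p[k];
        rr = rr_new;
        it++;
    }
    return it;
}
```

Each iteration costs one operator application, two dot products and three vector updates. The dot products are global reductions, so a parallel version has to combine partial sums across the workers at every iteration.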

Add the borders within the tile …
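
One possible way to picture sub-tiles with borders is sketched below. The slides do not say how the tile is cut or how wide the borders are, so everything here is an assumption: the tile is cut into 8 horizontal strips, each strip carries a one-row border on each side (clamped at the tile edge), and a plain `memcpy` stands in for the transfer into an SPE local store. `SubTile`, `sub_tile` and `load_sub_tile` are hypothetical names, and no Cell SDK call is used.

```c
#include <assert.h>
#include <string.h>

#define N_SPE 8   /* sub-tiles per tile, one per SPE (from the slide) */
#define HALO  1   /* border width in rows (an assumption) */

/* Rows [lo, hi) are owned by the sub-tile; rows [lo_h, hi_h) also include
   the border rows it needs from its neighbours, clamped to the tile. */
typedef struct { int lo, hi, lo_h, hi_h; } SubTile;

/* Row strip s of a tile with ny rows, assuming ny is a multiple of N_SPE. */
SubTile sub_tile(int ny, int s) {
    assert(ny > 0 && ny % N_SPE == 0 && s >= 0 && s < N_SPE);
    int rows = ny / N_SPE;
    SubTile t;
    t.lo = s * rows;
    t.hi = t.lo + rows;
    t.lo_h = (t.lo - HALO < 0) ? 0 : t.lo - HALO;
    t.hi_h = (t.hi + HALO > ny) ? ny : t.hi + HALO;
    return t;
}

/* Copy the padded strip out of a row-major tile with nx columns into a
   local buffer. memcpy stands in for the transfer to the local store.
   Returns the number of rows copied. */
int load_sub_tile(const double *tile, int nx, SubTile t, double *local) {
    int rows = t.hi_h - t.lo_h;
    memcpy(local, tile + (size_t)t.lo_h * nx,
           (size_t)rows * nx * sizeof(double));
    return rows;
}
```

With a 64-row tile, `sub_tile(64, 3)` owns rows 24 to 31 and loads rows 23 to 32, while the first and last strips have a border on one side only.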

Performance evaluation
- No specific effort has been devoted so far to optimizing the code; the focus has been on guaranteeing correct results.
- The reported results compare, on a cluster of 8 CellBE processors:
  - an MPI implementation on a cluster of PPE (PowerPC) nodes
  - an MPI implementation on the same PPE cluster plus 8 SPE threads per PPE node

Performance evaluation
Execution time (200 iterations) of the CG2D code (>90% of the whole computational effort of the MITgcm code):

  Implementation   Time
  MPI              1.51 s
  MPI+SPE          0.443 s

Speed-up S0 = 3.46
Expected performance after code optimization and migration to the new Cell architecture (native double precision): ≥ 10 × S0
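
The arithmetic behind these figures can be checked with a short C sketch. The two helpers are hypothetical names, and the use of Amdahl's law with an accelerated fraction of exactly 0.90 is an assumption for illustration: the slide only states that CG2D accounts for more than 90% of the effort.

```c
#include <assert.h>

/* Ratio of the baseline time to the accelerated time. */
double speedup(double t_base, double t_accel) {
    assert(t_base > 0.0 && t_accel > 0.0);
    return t_base / t_accel;
}

/* Amdahl's law: overall speed-up when a fraction f of the run time is
   accelerated by a factor s and the rest is left unchanged. */
double amdahl(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

With the times quoted above, `speedup(1.51, 0.443)` is about 3.41, slightly below the 3.46 on the slide, possibly because the displayed times are rounded. Under the 0.90 assumption, `amdahl(0.90, 3.46)` is about 2.78 for the whole code, and `amdahl(0.90, 34.6)` is about 7.94 for the projected 10 × S0 kernel speed-up, which suggests that the part of the code outside CG2D would then dominate the run time.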