One-day Meeting, INI, September 26th, 2008
Role of spectral turbulence simulations in developing HPC systems
YOKOKAWA, Mitsuo
Next-Generation Supercomputer R&D Center, RIKEN
Background
Experience of developing the Earth Simulator, a 40 Tflops vector-type distributed-memory supercomputer system.
A simulation code for box turbulence flow was used in the final adjustment of the system, and a large simulation of box turbulence flow was carried out.
A peta-flops supercomputer project.
Contents
Simulations on the Earth Simulator
A Japanese peta-scale supercomputer project
Trends of HPC systems
Summary
Simulations on the Earth Simulator
The Earth Simulator
It was completed in 2002. A sustained performance of 35.86 Tflops in the LINPACK benchmark was achieved. It was chosen as one of the 2002 best inventions by “TIME.”
Why I did it
It was important to evaluate the performance of the Earth Simulator in its final adjustment phase. Suitable codes had to be chosen:
to evaluate the performance of the vector processors,
to measure the performance of all-to-all communication among compute nodes through the crossbar switch,
to make operation of the Earth Simulator stable.
Candidates: the LINPACK benchmark? An atmospheric general circulation model (AGCM)? Any other code?
Why I did it (cont’d)
A spectral turbulence simulation code:
an intensive computational kernel and a lot of data communication,
a simple code,
significance to computational science: one of the grand challenges in computational science and high-performance computing.
A new spectral code for the Earth Simulator:
Fourier spectral method for spatial discretization,
mode truncation and phase-shift techniques to remove aliasing errors when calculating the nonlinear terms,
fourth-order Runge-Kutta method for time integration.
(A minimal sketch of these ingredients is shown below.)
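A minimal sketch of the ingredients listed above, using the 1D viscous Burgers equation as a stand-in for the full 3D Navier-Stokes code. The grid size, viscosity, and time step are illustrative assumptions, and only the 2/3-rule mode truncation is shown (the phase-shift technique is omitted):

```python
import numpy as np

# Toy 1D pseudo-spectral solver for u_t + u u_x = nu u_xx, illustrating the
# same ingredients as the Earth Simulator code: FFT-based evaluation of the
# nonlinear term, 2/3-rule mode truncation for dealiasing, and RK4 in time.
N = 256                               # grid points (assumption)
nu = 1.0e-3                           # viscosity (assumption)
x = 2.0 * np.pi * np.arange(N) / N
k = np.fft.fftfreq(N, d=1.0 / N)      # integer wavenumbers on [0, 2*pi)
dealias = np.abs(k) < N // 3          # 2/3-rule truncation mask

def rhs(u_hat):
    """Right-hand side in Fourier space; nonlinear term formed in physical space."""
    u = np.fft.ifft(u_hat).real
    ux = np.fft.ifft(1j * k * u_hat).real
    nonlinear = np.fft.fft(u * ux) * dealias   # truncate aliased modes
    return -nonlinear - nu * k**2 * u_hat

def rk4_step(u_hat, dt):
    """Classical fourth-order Runge-Kutta step."""
    k1 = rhs(u_hat)
    k2 = rhs(u_hat + 0.5 * dt * k1)
    k3 = rhs(u_hat + 0.5 * dt * k2)
    k4 = rhs(u_hat + dt * k3)
    return u_hat + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

u_hat = np.fft.fft(np.sin(x))         # smooth initial condition
dt = 1.0e-3
for _ in range(1000):
    u_hat = rk4_step(u_hat, dt)
```

In the real 3D code, each evaluation of the nonlinear term requires several forward and inverse 3D FFTs, which is where the all-to-all communication discussed later comes from.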
Points of coding
Optimization for the Earth Simulator:
coordinated assignment of the calculation to three levels of parallelism (vector processing, micro-tasking, and MPI parallelization),
a higher-radix FFT to improve B/F (the ratio of data transfer rate between CPU and memory to arithmetic performance),
removal of redundant processing and variables.
(A rough B/F comparison of radix-2 and radix-4 FFT passes is sketched below.)
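A rough sketch of why a higher-radix FFT helps the B/F balance: if each butterfly pass is assumed to stream the whole array through memory once, radix-4 needs half as many passes as radix-2 for the same ~5 N log2 N flops. The transform length, element size, and traffic model below are assumptions, not Earth Simulator figures:

```python
import math

N = 4096                                  # transform length (assumption)
bytes_per_elem = 8                        # single-precision complex (assumption)
flops = 5.0 * N * math.log2(N)            # standard FFT flop-count estimate

for radix in (2, 4):
    passes = round(math.log(N, radix))    # log_radix(N) passes over the data
    traffic = 2 * passes * N * bytes_per_elem   # one read + one write per pass
    print(f"radix-{radix}: {passes} passes, memory traffic / flops = {traffic / flops:.2f} B/F")
```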
Calculation for one time step
[Figure: wall-clock time per time step vs. number of nodes — 30.7 sec per step; days of computation with 512 PNs]
Performance
[Figure: sustained performance in Tflops and % of the peak vs. number of PNs — 16.4 Tflops achieved (single precision, analytical FLOP count)]
Achievement of box turbulence flow simulations
[Figure: number of grid points vs. year — Orszag (1969, IBM), Siggia (1981, Cray-1, NCAR), Kerr (1985, Cray-1S, NCAR), Jimenez et al. (1993, Caltech Delta machine), Yamamoto (1994, Numerical Wind Tunnel), Gotoh & Fukayama (2001, VPP5000/56, NUCC), K & I & Y (2002, Earth Simulator)]
A Japanese Peta-Scale Supercomputer Project
Next-Generation Supercomputer Project
Objectives, as one of Japan's Key Technologies of National Importance:
to develop the world's most advanced, high-performance supercomputer,
to develop and deploy its usage technologies as well as application software.
Period & budget: FY2006–FY2012, ~1 billion US$ (expected).
RIKEN (The Institute of Physical and Chemical Research) plays the central role in the project, developing the supercomputer under the law.
Goals of the project
Development and installation of the most advanced high-performance supercomputer system, with a LINPACK performance of 10 petaflops.
Development and deployment of application software in various science and engineering fields, written to attain the system's maximum capability.
Establishment of an “Advanced Computational Science and Technology Center (tentative)” as a center of excellence for research, personnel development, and training, built around the supercomputer.
Major applications for the system: Grand Challenges
Configuration of the system
The Next-Generation Supercomputer will be a hybrid general-purpose supercomputer that provides an optimum computing environment for a wide range of simulations. Calculations will be performed on the processing units best suited to the particular simulation. Parallel processing in a hybrid configuration of scalar and vector units will make larger and more complex simulations possible.
Roadmap of the project
[Figure: project roadmap timeline with a “We are here” marker]
Location of the supercomputer site: Kobe City, 450 km (280 miles) west of Tokyo
[Figure: map showing Tokyo and Kobe]
Artist’s image of the building
Photos of the site (under construction), taken from the south side: June 10, 2008; July 17, 2008; Aug. 20, 2008
Trends of HPC systems
Trends of HPC systems
A system will have a large number of processors, around 1 million or more.
Each chip will be a multi-core (8, 16, or 32) or many-core (more than 64) processor:
low performance for each core,
small main-memory capacity for each core,
fine-grain parallelism.
Each processor will consume little energy (low-power processors).
Narrow bandwidth between CPU and main memory: the number of signal pins is the bottleneck.
Bi-sectional bandwidth among compute nodes will be narrow: one-to-one connection is very expensive and power-consuming.
Impact on spectral simulations
High performance in the LINPACK benchmark: the more processors, the higher the LINPACK performance. But LINPACK performance does not necessarily reflect real-world application performance, especially for spectral simulations.
Small memory capacity for each processor: fine-grain decomposition of space, increasing the communication cost among parallel compute nodes.
Narrow memory bandwidth and narrow inter-node bi-sectional bandwidth: the memory-wall problem and low all-to-all communication performance; necessity of a low-B/F algorithm in place of the FFT.
(A rough estimate of the all-to-all cost of one 3D-FFT transpose is sketched below.)
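A back-of-envelope sketch of the all-to-all traffic of one global transpose in a distributed 3D FFT; the grid size, element size, and bi-sectional bandwidth below are illustrative assumptions, not measurements of any particular machine:

```python
# One global transpose moves essentially the whole field across the network,
# so its time is bounded below by (field size) / (bi-sectional bandwidth).
N = 4096                       # grid points per dimension (assumption)
bytes_per_point = 8            # single-precision complex value (assumption)
bisection_bw = 1.0e12          # bi-sectional bandwidth, bytes/s (assumption)

volume = N**3 * bytes_per_point
print(f"data per transpose: {volume / 1e12:.2f} TB")
print(f"lower bound per transpose: {volume / bisection_bw:.2f} s")
```

Several such transposes are needed for every time step, so if bi-sectional bandwidth grows more slowly than peak flops, the transposes rather than the arithmetic come to dominate the step time.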
Impact on spectral simulations (cont’d)
The trend does not fit 3D FFTs well, i.e. box turbulence simulations are becoming difficult to perform.
We will be able to use more and more computational resources in the near future, …
But a finer-resolution simulation by spectral methods needs a long calculation time because communication among parallel compute nodes is extremely slow, and we might not be able to obtain the final results in a reasonable time.
Estimates for larger simulations
If 500 Tflops of sustained performance can be used, the smaller simulation needs:
7 seconds for one time step,
100 TB of total memory,
8 days for 100,000 steps, and 1 PB for a complete simulation.
The larger simulation needs:
1 min for one time step,
800 TB of total memory,
3 months for 125,000 steps, and 10 PB in total for a complete simulation.
(See the arithmetic check below.)
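A quick check that the wall-time figures above follow from the per-step times and step counts on the slide (the sustained-performance and memory figures are taken as given):

```python
# 7 s/step x 100,000 steps ~ 8 days; 60 s/step x 125,000 steps ~ 3 months.
for label, sec_per_step, steps in [("smaller case", 7.0, 100_000),
                                   ("larger case", 60.0, 125_000)]:
    days = sec_per_step * steps / 86400.0
    print(f"{label}: {days:.0f} days")
```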
Summary
Spectral methods are a very useful algorithm for evaluating an HPC system.
In this sense, the trend of HPC system architecture is getting worse. Even if the peak performance of a system is very high…
we cannot expect high sustained performance, and
it may take a long time to finish a simulation because of very slow data transfer between nodes.
Can we discard spectral methods and change the algorithm? Or should we put strong pressure on the computer-architecture community, and think about an international collaboration to develop a supercomputer system that fits turbulence studies? I think of such an HPC system as something like a particle accelerator at CERN.