Programming on IBM Cell Triblade
Jagan Jayaraj, Pei-Hung Lin, Mike Knox and Paul Woodward
University of Minnesota
April 1, 2009
Rayleigh–Taylor instability
An instability of the interface between two fluids of different densities, which occurs when the lighter fluid pushes against the heavier fluid.
The R-T instability simulation is implemented with the multifluid Piecewise-Parabolic Method (PPM).
The program is written in Fortran.
TriBlade
▫ Two QS22 blades, each with 2 PowerXCell 8i CPUs
▫ LS21 blade with two dual-core AMD Opterons
▫ 16GB memory for LS21 and 8GB memory for QS22
LCSE Cell Cluster
6 Triblades
4 QS22 Cell blades
2 QS20 Cell blades
4 AMD Quadcore Systems
Login instructions
Account credentials should be in your .
Guest account: lcse / lcse$ncsa!
Login steps:
▫ SSH to frodo.lcse.umn.edu
▫ Once logged in to frodo, SSH to an assigned Cell Processor host
  AMD – rra001a ~ rra006a
  Cell – rra001b / rra001c ~ rra006b / rra006c
Software available
Cell SDK 3.1
OpenMPI 1.3
DaCS Fortran bindings
Compilers
▫ AMD: gfortran, gcc
▫ PPU: ppuxlf, ppu-gcc
▫ SPU: spuxlf, spu-gcc
Example code is available on /mnt/scratch/NCSA_Example
Compilation and Execution
On AMD node:
▫ make ppm4f-x86
On Cell node:
▫ make ppm4f-ppu
On AMD node:
▫ ./ppm4f-x86
Triblade programming paradigm
Three levels of parallelism:
▫ within-Cell
▫ within-node
▫ node-to-node
Compute-communication overlap:
▫ DMA
▫ DaCS
▫ MPI
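Compute-communication overlap is commonly achieved with double buffering: while one buffer is being computed on, the next chunk of data is fetched into a second buffer. The following is a minimal sketch of that pattern in portable C, not the code used in the talk. The names `fetch_chunk`, `sum_squares_double_buffered` and `CHUNK` are hypothetical, and a plain `memcpy` stands in for an asynchronous transfer so that the example runs on any machine.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per transfer; illustrative size only */

/* Stand-in for an asynchronous "get". Here it is a blocking memcpy;
   on real hardware it would start a transfer that is waited on later. */
static void fetch_chunk(double *local, const double *remote, size_t n)
{
    memcpy(local, remote, n * sizeof(double));
}

/* Sum of squares over n_chunks * CHUNK elements using two buffers:
   buffer `cur` is computed on while buffer `nxt` is being filled. */
double sum_squares_double_buffered(const double *remote, size_t n_chunks)
{
    double buf[2][CHUNK];
    double total = 0.0;
    size_t c, i;

    if (n_chunks == 0)
        return 0.0;

    fetch_chunk(buf[0], remote, CHUNK);           /* prime the pipeline */

    for (c = 0; c < n_chunks; ++c) {
        size_t cur = c & 1;                       /* buffer holding chunk c */
        size_t nxt = cur ^ 1;                     /* buffer for chunk c + 1 */

        if (c + 1 < n_chunks)                     /* request the next chunk */
            fetch_chunk(buf[nxt], remote + (c + 1) * CHUNK, CHUNK);

        for (i = 0; i < CHUNK; ++i)               /* compute on current chunk */
            total += buf[cur][i] * buf[cur][i];
    }
    return total;
}
```

In this sketch the fetch completes before the compute loop starts, so nothing actually overlaps. The buffer rotation is the part that carries over: with a non-blocking transfer, the wait for chunk c + 1 would move to just before that chunk is used.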
Programming for IBM Cell Tri-blade
Single code for Roadrunner and non-RR systems
◦ Using lots of #ifdef, #if, #endif…
◦ Using a preprocessor to generate three codes
Minimize the manual translation for SPU code
◦ Using a Fortran to Cell C translator, tedious portions of the SPU code can be translated automatically
Fortran codes for PPU and AMD
◦ Fortran binding programs for C intrinsic libraries
Keep memory footprint small
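The single-source idea can be illustrated with conditional compilation: one file, compiled several times with different macro definitions, yields one variant per target. This is a minimal sketch in C rather than the authors' Fortran source. The macro names `TARGET_SPU` and `TARGET_PPU` and the helper functions are assumptions made for illustration.

```c
/* Hypothetical target macros; the real build may use different names.
   With neither macro defined, the AMD (host) variant is selected. */
#if defined(TARGET_SPU)
#  define TARGET_NAME "SPU"
#  define DOES_IO 0      /* SPU variant: compute only */
#elif defined(TARGET_PPU)
#  define TARGET_NAME "PPU"
#  define DOES_IO 0      /* PPU variant: data transfer, SPU management */
#else
#  define TARGET_NAME "AMD"
#  define DOES_IO 1      /* host variant: I/O and MPI */
#endif

/* Name of the target this translation unit was built for. */
const char *target_name(void)
{
    return TARGET_NAME;
}

/* Nonzero if this variant is the one responsible for I/O. */
int target_does_io(void)
{
    return DOES_IO;
}
```

Compiling the same file with `-DTARGET_SPU`, with `-DTARGET_PPU`, and with neither flag produces the three variants, so shared logic is written once and only the target-specific sections are guarded.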
[Diagram: build flow, top to bottom]
Single Source Code
Preprocessor
PPU Fortran code | SPU Fortran code | AMD Fortran code
Translation | SPU C code | Fortran Binding Programs
SPU C Compiler | PPU Fortran Compiler | GNU Fortran Compiler
AMD Executable | PPU Executable | SPU Executable (embedded)
Division of labor
▫ Define jobs for AMD, PPU and SPU clearly
  AMD: I/O, MPI, relay data to Cell…
  PPU: Transfer data, manage SPUs
  SPU: Just compute
Items to take care of
▫ Three codes for three different ISAs
▫ Different endianness between PPU and AMD
  Need to do byte-swapping
▫ 64-bit/32-bit conversion
  SPU supports 32-bit addresses only, but DaCS requires 64-bit address mode
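The PPU is big-endian and the AMD Opteron is little-endian, so multi-byte values exchanged between them need their byte order reversed on one side. A minimal sketch of such byte-swapping in portable C follows. The function names are hypothetical, and this is not necessarily how the original code does it.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Reverse the byte order of one 64-bit word (e.g. the bit pattern of a
   double): swap each half, then exchange the halves. */
uint64_t swap64(uint64_t x)
{
    uint32_t lo = (uint32_t)(x & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(x >> 32);
    return ((uint64_t)swap32(lo) << 32) | (uint64_t)swap32(hi);
}

/* Swap a whole buffer of 32-bit words in place, e.g. after receiving it
   from a processor with the opposite byte order. */
void swap32_buffer(uint32_t *words, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        words[i] = swap32(words[i]);
}
```

Swapping twice restores the original value, so the same routine serves both directions of the transfer.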
Translator
Fortran to C with Cell extensions
Needs directives
Built with ANTLR
Handles:
▫ Vector and scalar loops
▫ DMAs (including list DMAs)
▫ Variable declarations
▫ Conditional vector moves
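A conditional vector move replaces a branch inside a loop with a mask and a bitwise select, which is the form SIMD hardware handles well. The sketch below shows the idea in portable scalar C for a loop like `if (x(i) > threshold) y(i) = x(i)`. It is not actual translator output, and the function names are hypothetical. `select_bits` follows the same bitwise rule as the SPU intrinsic `spu_sel`: bits come from the second operand where the mask is 1 and from the first where it is 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise select on one 32-bit lane: take bits of b where mask is 1,
   bits of a where mask is 0. */
static uint32_t select_bits(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Branch-free form of:  if (x[i] > threshold) y[i] = x[i];
   The comparison yields an all-ones or all-zeros mask, and the select
   either overwrites y[i] with x[i] or keeps its old value. */
void conditional_move(uint32_t *y, const uint32_t *x,
                      uint32_t threshold, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        uint32_t mask = (x[i] > threshold) ? 0xFFFFFFFFu : 0u;
        y[i] = select_bits(y[i], x[i], mask);
    }
}
```

On the SPU the same pattern would operate on four 32-bit lanes at once, with a vector compare producing the mask.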