Programming on IBM Cell Triblade. Jagan Jayaraj, Pei-Hung Lin, Mike Knox and Paul Woodward, University of Minnesota, April 1, 2009.


1 Programming on IBM Cell Triblade
Jagan Jayaraj, Pei-Hung Lin, Mike Knox and Paul Woodward
University of Minnesota
April 1, 2009

2 Rayleigh–Taylor instability
▫An instability of the interface between two fluids of different densities, which occurs when the lighter fluid pushes on the heavier fluid.
▫The Rayleigh–Taylor (R-T) instability simulation is implemented with the multifluid Piecewise-Parabolic Method (PPM).
▫The program is written in Fortran.

3 TriBlade
▫Two QS22 blades, each with 2 PowerXCell 8i CPUs
▫LS21 blade with two dual-core AMD Opterons
▫16 GB memory for LS21 and 8 GB memory for QS22

4

5 LCSE Cell Cluster
▫6 Triblades
▫4 QS22 Cell blades
▫2 QS20 Cell blades
▫4 AMD quad-core systems

6

7 Login instructions
Account credentials should be in your email.
Guest account: lcse / lcse$ncsa!
Login steps:
▫SSH to frodo.lcse.umn.edu
▫Once logged in to frodo, SSH to an assigned Cell Processor host
  ▪AMD: rra001a through rra006a
  ▪Cell: rra001b / rra001c through rra006b / rra006c

8 Software available
Cell SDK 3.1
OpenMPI 1.3
DaCS Fortran bindings
Compilers
▫AMD: gfortran, gcc 4.1.2
▫PPU: ppuxlf, ppu-gcc
▫SPU: spuxlf, spu-gcc
Example code is available on /mnt/scratch/NCSA_Example

9 Compilation and Execution
On AMD node:
▫make ppm4f-x86
On Cell node:
▫make ppm4f-ppu
On AMD node:
▫./ppm4f-x86

10 Triblade programming paradigm
▪Three levels of parallelism:
  ▪within-Cell
  ▪within-node
  ▪node-to-node
▪Compute-communication overlap
  ▪DMA
  ▪DaCS
  ▪MPI
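Compute-communication overlap is commonly achieved with double buffering: while one local buffer is being computed on, the next chunk is fetched into the other. The following is a minimal portable C sketch of that pattern, not the presenters' code. The names `fetch_chunk`, `sum_squares_double_buffered` and `CHUNK` are hypothetical, and a synchronous `memcpy` stands in for the asynchronous DMA, DaCS or MPI transfer that a real Triblade code would issue.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* hypothetical chunk size, kept tiny for illustration */

/* Stand-in for an asynchronous transfer (assumption: on real hardware this
 * would be a non-blocking DMA get; here it is a plain synchronous copy). */
static void fetch_chunk(float *local, const float *remote, size_t n) {
    memcpy(local, remote, n * sizeof(float));
}

/* Sum of squares of src[0..n-1], processed chunk by chunk with two buffers. */
float sum_squares_double_buffered(const float *src, size_t n) {
    float buf[2][CHUNK];
    float total = 0.0f;
    size_t nchunks = (n + CHUNK - 1) / CHUNK;
    if (nchunks == 0) {
        return 0.0f;
    }

    /* Prime the pipeline: fetch the first chunk into buffer 0. */
    fetch_chunk(buf[0], src, n < CHUNK ? n : CHUNK);

    for (size_t c = 0; c < nchunks; c++) {
        int cur = (int)(c & 1);
        size_t cur_len = (c == nchunks - 1) ? n - c * CHUNK : CHUNK;

        /* Start fetching the next chunk into the other buffer. */
        if (c + 1 < nchunks) {
            size_t next_off = (c + 1) * CHUNK;
            size_t next_len = (n - next_off < CHUNK) ? n - next_off : CHUNK;
            fetch_chunk(buf[cur ^ 1], src + next_off, next_len);
        }

        /* With a real asynchronous transfer, the code would wait here for
         * the fetch of buf[cur] to complete before computing on it. */
        for (size_t i = 0; i < cur_len; i++) {
            total += buf[cur][i] * buf[cur][i];
        }
    }
    return total;
}
```

Because the copy here is synchronous, the sketch shows only the buffer rotation; the actual overlap appears once `fetch_chunk` is replaced by a non-blocking transfer and a matching wait.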

11 Programming for IBM Cell Tri-blade
▪Single code for Roadrunner and non-RR systems
  ◦Using lots of #ifdef, #if, #endif…
  ◦Using the preprocessor to generate three codes
▪Minimize the manual translation for SPU code
  ◦Using a Fortran to Cell C translator
    ▪Tedious portions of the SPU code can be translated.
▪Fortran codes for PPU and AMD
  ◦Fortran binding programs for C intrinsic libraries
▪Keep memory footprint small
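The single-source approach above relies on conditional compilation to produce one variant per processor type. The sketch below illustrates the idea in C rather than in the Fortran source the slides describe. The macros `TARGET_SPU` and `TARGET_PPU` and both function names are hypothetical; they are not taken from the presenters' code.

```c
/* Hypothetical target macros: exactly one would be passed on the compiler
 * command line (for example -DTARGET_SPU). With none defined, the AMD
 * variant is built. */
#if defined(TARGET_SPU)
#define TARGET_NAME "spu"
#define TARGET_DOES_IO 0   /* SPU variant: compute only */
#elif defined(TARGET_PPU)
#define TARGET_NAME "ppu"
#define TARGET_DOES_IO 0   /* PPU variant: transfers data, manages SPUs */
#else
#define TARGET_NAME "amd"
#define TARGET_DOES_IO 1   /* AMD variant: owns I/O and MPI */
#endif

/* Name of the variant this translation unit was compiled as. */
const char *target_name(void) {
    return TARGET_NAME;
}

/* Whether this variant is responsible for file I/O. */
int target_does_io(void) {
    return TARGET_DOES_IO;
}
```

Compiling the same file three times with different `-D` flags yields three variants from one source, which is the effect the slide attributes to the preprocessor.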

12 (Build-flow diagram; labels only)
▫Single source code → Preprocessor → PPU Fortran code, SPU Fortran code, AMD Fortran code
▫Translation: SPU C code, Fortran binding programs
▫Compilers: SPU C compiler, PPU Fortran compiler, GNU Fortran compiler
▫Outputs: AMD executable, PPU executable, SPU executable (embedded)

13 Division of labor
▫Define jobs for AMD, PPU and SPU clearly
  ▪AMD: I/O, MPI, relay data to Cell…
  ▪PPU: Transfer data, manage SPUs
  ▪SPU: Just compute

14 Items requiring care
▫Three codes for three different ISAs
▫Different endianness between PPU and AMD
  ▪Need to do byte-swapping
▫64-bit/32-bit conversion
  ▪SPU supports 32-bit addresses only, but DaCS requires 64-bit address mode
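The byte-swapping and address-width items can be illustrated with a few small helpers. This is a generic C sketch, not the presenters' routines. All function names are hypothetical, and `widen_addr32` shows only zero-extension of a 32-bit value, not how DaCS itself represents addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word (big-endian <-> little-endian). */
uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Reverse the byte order of a 64-bit word by swapping each half and
 * exchanging the halves. */
uint64_t swap64(uint64_t v) {
    uint32_t lo = (uint32_t)(v & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(v >> 32);
    return ((uint64_t)swap32(lo) << 32) | (uint64_t)swap32(hi);
}

/* Byte-swap a buffer of 32-bit words in place, e.g. after receiving data
 * from a node with the opposite endianness. */
void swap_buffer32(uint32_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        buf[i] = swap32(buf[i]);
    }
}

/* Zero-extend a 32-bit address value to 64 bits (illustrative only). */
uint64_t widen_addr32(uint32_t addr) {
    return (uint64_t)addr;
}
```

Swapping is its own inverse, so applying `swap_buffer32` twice restores the original data; only one side of a transfer should perform the swap.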

15 Translator
Fortran to C with Cell extensions
Needs directives
Built with ANTLR
Handles:
▫Vector and scalar loops
▫DMAs (including list DMAs)
▫Variable declarations
▫Conditional vector moves
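A conditional vector move replaces a branch with a comparison that produces a mask, followed by a bitwise select. The scalar C sketch below emulates that pattern one element at a time; it is not output from the translator. On the SPU the select would typically be done with the `spu_sel` intrinsic on whole vectors. The names `select_bits` and `clamp_max` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise select: take bits from b where mask is 1, from a where mask is 0. */
static uint32_t select_bits(uint32_t a, uint32_t b, uint32_t mask) {
    return (a & ~mask) | (b & mask);
}

/* out[i] = (x[i] > limit) ? limit : x[i], written without an if/else in the
 * loop body. The comparison yields 0 or 1, which is expanded into an
 * all-zeros or all-ones mask. */
void clamp_max(uint32_t *out, const uint32_t *x, uint32_t limit, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = 0u - (uint32_t)(x[i] > limit);
        out[i] = select_bits(x[i], limit, mask);
    }
}
```

Both candidate values are always computed and the mask picks one, which is why this form maps well onto SIMD hardware where all lanes execute the same instruction.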

16 References
▫Woodward, P. R., J. Jayaraj, P.-H. Lin, and P.-C. Yew, “Moving Scientific Codes to Multicore Microprocessor CPUs,” Computing in Science & Engineering, special issue on novel architectures, Nov. 2008, p. 16-25. Also available at www.lcse.umn.edu/CiSE.
▫Woodward, P. R., J. Jayaraj, P.-H. Lin, and D. Porter, “Programming Techniques for Moving Scientific Simulation Codes to Roadrunner,” tutorial given 3/12/08 at Los Alamos, link available at www.lanl.gov/roadrunner/rrtechnicalseminars2008.
▫Woodward, P. R., J. Jayaraj, P.-H. Lin, and W. Dai, “First Experience of Compressible Gas Dynamics Simulation on the Los Alamos Roadrunner Machine,” submitted to Concurrency and Computation: Practice and Experience, preprint available at www.lcse.umn.edu/RR-docs.
▫http://www.lcse.umn.edu/NCSA_Workshop/

