1 HPC and the ROMS BENCHMARK Program Kate Hedstrom August 2003.

1 HPC and the ROMS BENCHMARK Program
  Kate Hedstrom
  August 2003

2 Outline
  - New ARSC systems
  - Experience with ROMS benchmark problem
  - Other computer news

3 New ARSC Systems
  - Cray X1
    - 128 MSP (1.5 TFLOPS)
    - 4 GB/MSP
    - Water cooled
  - IBM p690+ and p655+
    - 5 TFLOPS total
    - At least 2 GB/cpu
    - Air cooled
    - Arriving in September, switch later

4 Cray X1 (klondike)

6 Cray X1 Node
  - Node is a 4-way SMP
  - 16 GB/node
  - Each MSP has four vector/scalar processors
  - Processors in MSP share cache
  - Node usable as 4 MSPs or 16 SSPs
  - IEEE floating point hardware
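The node figures on this slide can be cross-checked against the system totals quoted on slide 3. The following is a minimal sketch of that arithmetic; the constants are copied from the slides, while the function name `x1_summary` and the dictionary keys are hypothetical and chosen only for illustration. The per-MSP peak is simply the quoted 1.5 TFLOPS divided by 128, not a vendor specification.

```python
# Figures quoted on slides 3 and 6; everything derived below is plain arithmetic.
TOTAL_MSPS = 128      # "128 MSP"
MSPS_PER_NODE = 4     # "Node is a 4-way SMP"
SSPS_PER_MSP = 4      # "Each MSP has four vector/scalar processors"
GB_PER_NODE = 16      # "16 GB/node"
PEAK_TFLOPS = 1.5     # "1.5 TFLOPS", as quoted on the slide


def x1_summary():
    """Derive node count, SSP count, and memory totals (hypothetical helper)."""
    nodes = TOTAL_MSPS // MSPS_PER_NODE
    return {
        "nodes": nodes,
        "ssps": TOTAL_MSPS * SSPS_PER_MSP,
        "gb_per_msp": GB_PER_NODE / MSPS_PER_NODE,
        "total_gb": nodes * GB_PER_NODE,
        "gflops_per_msp": PEAK_TFLOPS * 1000 / TOTAL_MSPS,
    }


if __name__ == "__main__":
    for key, value in x1_summary().items():
        print(f"{key}: {value}")
```

The derived 4 GB per MSP agrees with the "4 GB/MSP" figure on slide 3.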

7 Cray Programming Environment
  - Fortran, C, C++
  - Support for
    - MPI
    - SHMEM
    - Co-Array Fortran
    - UPC
    - OpenMP (Fall 2003)
  - Compiling executes on CPES (a Sun V480), happens invisibly to user

9 IBM
  - Two p690+
    - Like our Regatta, but faster, more memory (8 GB/cpu)
    - Shared memory between 32 cpu
    - For big OpenMP jobs
  - Six p655+ towers
    - Like our SP, but faster, more memory (2 GB/cpu)
    - Shared memory on each 8 cpu node, 92 nodes in all
    - For big MPI jobs and small OpenMP jobs

11 Benchmark Problem
  - No external files to read
  - Three different resolutions
  - Periodic channel representing the Antarctic Circumpolar Current (ACC)
  - Steep bathymetry
  - Idealized winds, clouds, etc., but full computation of atmospheric boundary layer
  - KPP vertical mixing

14 IBM and SX6 Notes
  - SX6 is 8 GFLOPS, Power4 is 5.2 GFLOPS peak
  - Both less than 10% of peak
  - IBM scales better, Cray person says SX6 is even worse for more than one node
  - SX6 best for 1xN tiling, IBM better closer to MxM even though this problem is 512x64
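To make the tiling remark concrete, the sketch below tabulates the tile shape and a rough halo-to-interior ratio for every way of factoring a given process count over the 512x64 grid. Several things here are assumptions rather than facts from the slides: the helper names `tilings` and `tile_stats`, the reading of "1xN" as (tiles in I) x (tiles in J), the halo width of 2 points, and the simplification that every tile exchanges halos on all four sides with corners ignored. It only describes the geometry and does not reproduce or explain the measured timings.

```python
def tilings(nprocs):
    """All (ntile_i, ntile_j) pairs whose product is nprocs (hypothetical helper)."""
    return [(ni, nprocs // ni) for ni in range(1, nprocs + 1) if nprocs % ni == 0]


def tile_stats(ntile_i, ntile_j, lm=512, mm=64, halo=2):
    """Tile shape and approximate halo cost for one tile.

    lm, mm : grid size from the slide (512x64).
    halo   : ghost-point width; 2 is an assumption, not a figure from the slides.
    """
    tile_i = -(-lm // ntile_i)  # ceiling division
    tile_j = -(-mm // ntile_j)
    interior = tile_i * tile_j
    # Four edge strips of width `halo`; corner points are ignored.
    halo_points = 2 * halo * (tile_i + tile_j)
    return {
        "tile": (tile_i, tile_j),
        "interior": interior,
        "halo_points": halo_points,
        "halo_ratio": halo_points / interior,
    }


if __name__ == "__main__":
    for ni, nj in tilings(16):
        s = tile_stats(ni, nj)
        print(f"{ni:2d} x {nj:<2d}  tile {s['tile'][0]:3d} x {s['tile'][1]:<3d}"
              f"  halo/interior = {s['halo_ratio']:.2f}")
```

Under these assumptions a 1xN split keeps the full 512-point rows in each tile, which gives long inner loops of the kind vector hardware tends to favour, while more balanced splits shrink the halo relative to the interior.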

15 Cray X1 Notes
  - Have choice of MSP or SSP mode
  - Four SSPs faster than one MSP
  - Sixteen MSPs much faster than 64 SSPs
  - On one MSP, vanilla ROMS spends:
    - 66% in bulk_flux
    - 28% in LMD
    - 2% in 2-D engine
  - Slower than either Power4 or SX6
  - Can inline lmd_wscale and vastly speed up LMD with compiler option
  - John Levesque has offered to rewrite bulk_flux - aim for 6-8 times faster than Power4 for CCSM
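The profile on this slide shows why bulk_flux and LMD are the obvious targets. A minimal Amdahl's-law sketch follows: the three fractions are the ones quoted on the slide, the remaining 4% is lumped into "everything else", and the speedup factors of 10 are purely illustrative assumptions, not measurements. The function name `overall_speedup` is hypothetical.

```python
# Fractions of run time on one MSP, as quoted on the slide.
PROFILE = {"bulk_flux": 0.66, "LMD": 0.28, "2-D engine": 0.02}


def overall_speedup(profile, factors):
    """Amdahl's law: whole-program speedup when some routines run faster.

    profile : routine name -> fraction of original run time
    factors : routine name -> speedup factor for that routine (default 1.0)
    """
    other = 1.0 - sum(profile.values())  # time outside the listed routines
    new_time = other
    for name, fraction in profile.items():
        new_time += fraction / factors.get(name, 1.0)
    return 1.0 / new_time


if __name__ == "__main__":
    # Illustrative factors only; the slides do not give achieved speedups.
    print(overall_speedup(PROFILE, {"bulk_flux": 10.0}))
    print(overall_speedup(PROFILE, {"bulk_flux": 10.0, "LMD": 10.0}))
```

With these illustrative factors, speeding up bulk_flux alone is capped by the untouched 28% in LMD, so both routines need attention before the rest of the code matters.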

16 Clusters
  - Can buy rack mounted turnkey systems running Linux
  - Need to spend money on:
    - Memory
    - Processors - single cpu nodes may be best
    - Switch - low latency, high bandwidth
    - Disk storage

17 Don Morton’s Experience
  - No such thing as turnkey Beowulf
  - Need someone to take care of it:
    - Configure queuing system to make it useful for more than one user
    - Security updates
    - Backups

18 DARPA Petaflops award
  - Sun, IBM, Cray each awarded ~$50 million for phase-two development
  - Two will be awarded phase 3 in 2006
  - Goal is to achieve petaflops by about 2010, also easier to program, more robust operating environment
  - Sun - new switch between cpus, memory
  - IBM - huge cache on chip
  - Cray - heavyweight, lightweight cpus

19 Conclusions
  - Things are still exciting in the computer industry
  - The only thing you can count on is change

