Jacquard: Architecture and Application Performance Overview. NERSC Users’ Group, October 2005
Outline: An engineering-level overview of the HW and SW that make up Jacquard: 1) CPUs 2) Memory 3) OS 4) Interconnect. Will use seaborg as a point of reference.
seaborg.nersc.gov (review?)

Resource        Speed    Bytes
Registers       3 ns     256 B
L1 Cache        5 ns     32 KB
L2 Cache        45 ns    8 MB
Main Memory     300 ns   16 GB
Remote Memory   19 us    7 TB
GPFS            10 ms    50 TB
HPSS            5 s      9 PB

380 x 16-way SMP NHII nodes; dedicated CPUs plus 96 shared login CPUs.
Hierarchy of caching, speeds. Bottleneck determined by first depleted resource.
Diagram labels: Colony switch (CSS0, CSS), GPFS, HPSS. Seaborg node: crossbar, main memory, GPFS, MPI.
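To make "bottleneck determined by first depleted resource" concrete, here is a minimal Python sketch that finds the fastest level of the seaborg hierarchy large enough to hold a given working set. The latencies and capacities are copied from the seaborg table. The function name `first_level_that_fits`, the 1024-based unit sizes, and the simplification that each level is one flat capacity are illustrative assumptions, not something stated in the talk.

```python
# Seaborg hierarchy from the slide: (resource, latency in seconds, capacity in bytes).
# Assumption: KB/MB/GB/TB/PB are taken as powers of 1024.
KB, MB, GB, TB, PB = (1024 ** i for i in range(1, 6))

SEABORG = [
    ("Registers", 3e-9, 256),
    ("L1 Cache", 5e-9, 32 * KB),
    ("L2 Cache", 45e-9, 8 * MB),
    ("Main Memory", 300e-9, 16 * GB),
    ("Remote Memory", 19e-6, 7 * TB),
    ("GPFS", 10e-3, 50 * TB),
    ("HPSS", 5.0, 9 * PB),
]


def first_level_that_fits(working_set_bytes, hierarchy=SEABORG):
    """Return (resource, latency) for the fastest level that can hold the working set."""
    for name, latency, capacity in hierarchy:
        if working_set_bytes <= capacity:
            return name, latency
    raise ValueError("working set exceeds every level in the table")


if __name__ == "__main__":
    for size in (128, 20 * KB, 1 * MB, 1 * GB, 1 * TB):
        name, latency = first_level_that_fits(size)
        print(f"{size:>15d} bytes -> {name} (~{latency:g} s per access)")
```

Once a working set outgrows one level, every access pays the latency of the next level down, which is why the first depleted resource sets the pace.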
jacquard.nersc.gov basics

Resource        Speed    Bytes
Registers       0.5 ns   2 KB
L1 Cache        1.5 ns   64 KB
L2 Cache        45 ns    1 MB
Main Memory     ns       6 GB
Remote Memory   5 us     2 TB
GPFS            10 ms    15 TB
HPSS            5 s      9 PB

320 x 2-way Opteron nodes; 640 dedicated CPUs, 8 shared login CPUs.
Smaller caches, HT, Really Fast. SMP? NUMA? SUMO.
Diagram labels: InfiniBand (IB) switch, GPFS, HPSS. Jacquard node: main memory, GPFS, MPI, HT.
Opteron Block Diagram: Not strictly SMP. 1 TLB per CPU: 1K entries, 4K pages, 4 MB coverage. Diagram labels: SDRAM, Switch, I/O.
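The TLB coverage figure is just entries times page size. A small Python sketch of that arithmetic follows; the helper names `tlb_coverage_bytes` and `fits_in_tlb` are hypothetical, and the model ignores details such as multi-level TLBs or large pages.

```python
def tlb_coverage_bytes(entries, page_bytes):
    """Memory reachable without a TLB miss: one page per TLB entry."""
    return entries * page_bytes


def fits_in_tlb(array_bytes, entries=1024, page_bytes=4 * 1024):
    """True if a contiguous array of this size is covered by the TLB (simplified model)."""
    return array_bytes <= tlb_coverage_bytes(entries, page_bytes)


if __name__ == "__main__":
    entries = 1024          # "1K entries" from the slide
    page_bytes = 4 * 1024   # "4K pages" from the slide
    coverage = tlb_coverage_bytes(entries, page_bytes)
    print(coverage // (1024 * 1024), "MB of coverage")
    # A 1000 x 1000 array of 8-byte doubles is 8,000,000 bytes, more than the coverage.
    print("1000x1000 doubles covered:", fits_in_tlb(1000 * 1000 * 8))
```

Under this simplified model, a working set larger than 4 MB that is touched with poor locality can be expected to take TLB misses on top of cache misses.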
HyperTransport: Good Stuff. Little conflict between data movement and computation.
SMP size and memory contention. Jacquard’s numbers: 1 task: 100%, 2 tasks: 98%. Why is Jacquard 2-way SMP?
2.2 GHz
Peak Theoretical Flops
- Double (64 bit) floats: 1 add + 1 mult = 2.2 GFlop/s
- Single (32 bit) floats: 2 add + 2 mult = 4.4 GFlop/s
Peak Realized Flops
- Double (64 bit) floats: 1.9 GFlop/s
- Single (32 bit) floats: 3.4 GFlop/s
Your Flops?
- Walltime is more important than flops
- For a known algorithm, flops are a sanity check
Memory BW: 4 GB/sec per CPU
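As a rough illustration of using flops as a sanity check, the Python sketch below takes the peak, realized, and memory bandwidth figures exactly as quoted on the slide and derives fraction of peak, bytes of memory bandwidth per peak flop, and an achieved rate from a known operation count and a measured walltime. The function names are hypothetical, and the 1.25 s timing in the usage section is an invented value for illustration, not a measurement from Jacquard.

```python
# Figures quoted on the slide, per CPU.
PEAK_GFLOPS = {"double": 2.2, "single": 4.4}
REALIZED_GFLOPS = {"double": 1.9, "single": 3.4}
MEMORY_BW_GB_PER_S = 4.0


def fraction_of_peak(precision):
    """Realized rate divided by the quoted theoretical peak."""
    return REALIZED_GFLOPS[precision] / PEAK_GFLOPS[precision]


def bytes_per_flop(precision):
    """Memory bytes deliverable per peak floating-point operation."""
    return MEMORY_BW_GB_PER_S / PEAK_GFLOPS[precision]


def achieved_gflops(flop_count, walltime_seconds):
    """Sanity-check rate for a run with a known operation count."""
    return flop_count / walltime_seconds / 1e9


if __name__ == "__main__":
    for p in ("double", "single"):
        print(f"{p}: {fraction_of_peak(p):.0%} of peak, "
              f"{bytes_per_flop(p):.2f} bytes/flop")
    # Hypothetical run: a dense N x N matrix multiply does about 2*N**3 flops.
    n = 1000
    rate = achieved_gflops(2 * n ** 3, walltime_seconds=1.25)  # made-up timing
    print(f"achieved {rate:.2f} GFlop/s")
```

If the achieved rate for a known algorithm is far below the realized peak, that is a hint to look at memory traffic or the algorithm before anything else; walltime remains the figure that matters.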
MPI Bandwidth: seaborg
MPI Bandwidth: Jacquard
Linux for AIX Users
- Linux and AIX are more similar than different.
- Linux is not as good as AIX at keeping processes scheduled on the same CPU; processor affinity is being worked on.
- Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc.
- AIX MPI is in /usr/{bin,lib}; Linux MPI is in modules.
- Linux doesn’t need -bmaxdata!
- Little vs. Big Endian
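A short Python sketch of the Linux-side points: it reads /proc/cpuinfo, reports which CPUs the current process is allowed to run on, and shows how the same integer is laid out in little-endian and big-endian byte order. Opteron (x86-64) is little-endian, while POWER under AIX is big-endian, so unformatted binary files written on one system need byte swapping on the other. The helper names `parse_cpuinfo` and `byte_layout` are hypothetical, and the "processor" and "model name" keys are the ones x86 Linux kernels normally emit; other architectures may differ.

```python
import os
import struct
import sys


def parse_cpuinfo(text):
    """Count logical CPUs and collect model names from /proc/cpuinfo-style text."""
    count = 0
    models = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU records
        key = key.strip()
        if key == "processor":
            count += 1
        elif key == "model name":
            models.add(value.strip())
    return count, sorted(models)


def byte_layout(value):
    """Return the 4-byte little-endian and big-endian encodings of an unsigned int."""
    return struct.pack("<I", value), struct.pack(">I", value)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(parse_cpuinfo(f.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")

    # CPUs this process may be scheduled on (Linux-only call, so guard it).
    if hasattr(os, "sched_getaffinity"):
        print("allowed CPUs:", sorted(os.sched_getaffinity(0)))

    little, big = byte_layout(1)
    print("host byte order:", sys.byteorder)
    print("little-endian:", little.hex(), "big-endian:", big.hex())
```

On Linux, `os.sched_setaffinity` is the matching call for pinning a process to specific CPUs; whether pinning helps a given code on Jacquard is something to measure rather than assume.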
Conclusions: The underlying HW technologies (HT, IB, etc.) are quite promising. Opteron systems are delivering great price/performance. Still working through some SDRAM, OS, and SW issues. What’s useful to you? Let us know.