Published by Alisha Daniel. Modified over 8 years ago.
1
Jacquard: Architecture and Application Performance Overview
NERSC Users’ Group, October 2005
2
Outline
An engineering-level overview of the HW and SW that make up jacquard.
1) CPUs
2) Memory
3) OS
4) Interconnect
Will use seaborg as a point of reference.
3
seaborg.nersc.gov (review?)

Resource         Speed     Bytes
Registers        3 ns      256 B
L1 Cache         5 ns      32 KB
L2 Cache         45 ns     8 MB
Main Memory      300 ns    16 GB
Remote Memory    19 us     7 TB
GPFS             10 ms     50 TB
HPSS             5 s       9 PB

6080 dedicated CPUs, 96 shared login CPUs
Hierarchy of caching, speeds
Bottleneck determined by first depleted resource

[Diagram labels: Seaborg: 380 x 16 way SMP NHII Node (crossbar, main memory); Colony Switch (CSS0, CSS1); PGFS; HPSS; GPFS; MPI]
4
jacquard.nersc.gov basics

Resource         Speed        Bytes
Registers        0.5 ns       2 KB
L1 Cache         1.5 ns       64 KB
L2 Cache         45 ns        1 MB
Main Memory      70-117 ns    6 GB
Remote Memory    5 us         2 TB
GPFS             10 ms        15 TB
HPSS             5 s          9 PB

640 dedicated CPUs, 8 shared login CPUs
Smaller caches, HT, Really Fast
SMP? NUMA? SUMO.

[Diagram labels: Jacquard: 320 x 2 way Opteron node (Main Memory, HT); InfiniBand Switch (IB); PGFS; HPSS; GPFS; MPI]
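To make the comparison between the two tables concrete, here is a minimal Python sketch that divides each seaborg latency by the corresponding jacquard latency. The numbers are copied from the slides; the dictionary and function names are made up for this illustration, and jacquard's main memory is kept as a range because the slide gives 70-117 ns.

```python
# Latencies in nanoseconds, copied from the seaborg and jacquard tables.
# Each entry is (low, high); only jacquard's main memory is a real range.
SEABORG_NS = {
    "Registers": (3.0, 3.0),
    "L1 Cache": (5.0, 5.0),
    "L2 Cache": (45.0, 45.0),
    "Main Memory": (300.0, 300.0),
    "Remote Memory": (19_000.0, 19_000.0),  # 19 us
}
JACQUARD_NS = {
    "Registers": (0.5, 0.5),
    "L1 Cache": (1.5, 1.5),
    "L2 Cache": (45.0, 45.0),
    "Main Memory": (70.0, 117.0),
    "Remote Memory": (5_000.0, 5_000.0),  # 5 us
}


def latency_ratios(slow, fast):
    """Return {level: (min_ratio, max_ratio)} of slow latency over fast latency."""
    ratios = {}
    for level, (slow_lo, slow_hi) in slow.items():
        fast_lo, fast_hi = fast[level]
        # Smallest ratio pairs the lowest slow value with the highest fast value.
        ratios[level] = (slow_lo / fast_hi, slow_hi / fast_lo)
    return ratios


if __name__ == "__main__":
    for level, (lo, hi) in latency_ratios(SEABORG_NS, JACQUARD_NS).items():
        if lo == hi:
            print(f"{level:14s} seaborg/jacquard = {lo:.2f}x")
        else:
            print(f"{level:14s} seaborg/jacquard = {lo:.2f}x to {hi:.2f}x")
```

By these figures the L2 latency is the same on both machines, while every other level listed is faster on jacquard.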
5
Opteron Block Diagram: Not strictly SMP
1 TLB per CPU: 1K entries, 4 KB pages, 4 MB coverage

[Diagram labels: SDRAM; Switch, I/O]
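The TLB coverage figure follows directly from the entry count and page size. A small sketch of that arithmetic, with hypothetical helper names, assuming "1K" means 1024 entries and "4K" means 4096-byte pages:

```python
TLB_ENTRIES = 1024            # "1K entries" from the slide (assumed 1024)
PAGE_SIZE_BYTES = 4 * 1024    # "4K pages" from the slide (assumed 4096 bytes)


def tlb_coverage_bytes(entries, page_size):
    """Memory reachable without a TLB miss: one page per TLB entry."""
    return entries * page_size


def fits_in_tlb(working_set_bytes, entries=TLB_ENTRIES, page_size=PAGE_SIZE_BYTES):
    """True if a contiguous, page-aligned working set needs no more pages than entries."""
    pages_needed = -(-working_set_bytes // page_size)  # ceiling division
    return pages_needed <= entries


if __name__ == "__main__":
    coverage = tlb_coverage_bytes(TLB_ENTRIES, PAGE_SIZE_BYTES)
    print(f"TLB coverage: {coverage // (1024 * 1024)} MB")
```

A working set larger than this coverage touches more pages than the TLB can hold, so some accesses will pay for a page-table lookup as well as the memory access itself.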
6
HyperTransport: Good Stuff
Little conflict between data movement and computation
7
SMP size and memory contention
Jacquard’s numbers:
- 1 task: 100%
- 2 tasks: 98%
Why is Jacquard 2 way SMP?
8
Flops @ 2.2 GHz
Peak Theoretical Flops
- Double (64 bit) floats: 1 add + 1 mult = 2.2 GFlop/s
- Single (32 bit) floats: 2 add + 2 mult = 4.4 GFlop/s
Peak Realized Flops
- Double (64 bit) floats: 1.9 GFlop/s
- Single (32 bit) floats: 3.4 GFlop/s
Your Flops?
- Walltime is more important than flops
- For a known algorithm flops are a sanity check
Memory BW: 4 GB/sec per CPU
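As one way to use flops as a sanity check, the sketch below compares a measured rate against the peak figures quoted on this slide. The function names are hypothetical, the dense matrix multiply (counted as the standard 2n^3 floating point operations) is only an example of a "known algorithm", and the timing in the usage section is invented for illustration, not a measurement.

```python
# Figures copied from the slide, in GFlop/s.
PEAK_GFLOPS = {"double": 2.2, "single": 4.4}
REALIZED_GFLOPS = {"double": 1.9, "single": 3.4}


def fraction_of_peak(achieved_gflops, peak_gflops):
    """Achieved rate as a fraction of the quoted theoretical peak."""
    return achieved_gflops / peak_gflops


def matmul_gflops(n, walltime_s):
    """GFlop/s for a dense n x n matrix multiply, counted as 2*n**3 flops."""
    return 2.0 * n**3 / walltime_s / 1e9


if __name__ == "__main__":
    for precision in ("double", "single"):
        frac = fraction_of_peak(REALIZED_GFLOPS[precision], PEAK_GFLOPS[precision])
        print(f"{precision}: realized peak is {frac:.0%} of theoretical peak")

    # Hypothetical timing: a 1000 x 1000 double precision multiply taking 2 s.
    rate = matmul_gflops(1000, 2.0)
    print(f"example run: {rate:.2f} GFlop/s, "
          f"{fraction_of_peak(rate, PEAK_GFLOPS['double']):.0%} of double peak")
```

A result far below the realized peak suggests the code is limited by something other than arithmetic, such as memory bandwidth; a result above the theoretical peak suggests the operation count or the timing is wrong.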
9
MPI Bandwidth: seaborg
10
MPI Bandwidth: Jacquard
11
Linux for AIX Users
- Linux and AIX are more similar than different
- Linux is not as good as AIX at keeping processes scheduled on the same CPU (processor affinity work)
- Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc.
- AIX MPI is in /usr/{bin,lib}, Linux MPI is in modules
- Linux doesn’t need -bmaxdata!
- Little vs. Big Endian
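Two of the points above lend themselves to a short Python sketch: reading the `key : value` text that /proc/cpuinfo exposes, and reading binary doubles written on a big-endian machine (such as seaborg's POWER nodes) on a little-endian Opteron. The function names and the sample text are made up for this illustration; the parser takes text as an argument so it can be tried without a /proc filesystem.

```python
import struct
import sys


def parse_cpuinfo(text):
    """Parse /proc/cpuinfo-style text into a list of dicts, one per logical CPU.

    Records are separated by blank lines; each line is 'key : value'.
    """
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def read_big_endian_doubles(raw):
    """Decode raw bytes as 64-bit floats stored most significant byte first."""
    count = len(raw) // 8
    return list(struct.unpack(f">{count}d", raw[: count * 8]))


# Made-up sample in the same layout as /proc/cpuinfo.
SAMPLE = (
    "processor\t: 0\nmodel name\t: Example CPU\n\n"
    "processor\t: 1\nmodel name\t: Example CPU\n"
)

if __name__ == "__main__":
    print(f"this host is {sys.byteorder}-endian")
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when /proc is unavailable
    print(f"{len(parse_cpuinfo(text))} logical CPUs found")

    # Bytes as a big-endian writer would have produced them.
    raw = struct.pack(">2d", 1.0, 2.5)
    print(read_big_endian_doubles(raw))
```

Stating the byte order explicitly in the format string (`>` or `<`) keeps the reader correct on either machine, whereas the native default would silently misread files moved between the two.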
12
Conclusions
- The underlying HW technologies (HT, IB, etc.) are quite promising.
- Opteron systems are delivering great price/performance.
- Still working on some SDRAM, OS, and SW issues.
- What’s useful to you? Let us know.