Jacquard: Architecture and Application Performance Overview NERSC Users’ Group October 2005.

Outline
An engineering-level overview of the HW and SW that make up Jacquard:
1) CPUs
2) Memory
3) OS
4) Interconnect
Seaborg is used as a point of reference.

seaborg.nersc.gov (review?)

Resource        Speed     Bytes
Registers       3 ns      256 B
L1 Cache        5 ns      32 KB
L2 Cache        45 ns     8 MB
Main Memory     300 ns    16 GB
Remote Memory   19 us     7 TB
GPFS            10 ms     50 TB
HPSS            5 s       9 PB

- 6080 dedicated CPUs, 96 shared login CPUs
- Hierarchy of caching, speeds
- Bottleneck determined by first depleted resource

[Diagram labels: Seaborg, 380 x 16-way SMP NHII node, crossbar, main memory, Colony Switch (CSS0, CSS1), PGFS, GPFS, HPSS, MPI]

jacquard.nersc.gov basics

Resource        Speed       Bytes
Registers       0.5 ns      2 KB
L1 Cache        1.5 ns      64 KB
L2 Cache        45 ns       1 MB
Main Memory     70-117 ns   6 GB
Remote Memory   5 us        2 TB
GPFS            10 ms       15 TB
HPSS            5 s         9 PB

- 640 dedicated CPUs, 8 shared login CPUs
- Smaller caches, HT, Really Fast
- SMP? NUMA? SUMO.

[Diagram labels: Jacquard, 320 x 2-way Opteron node, HT, Main Memory, InfiniBand Switch (IB), PGFS, GPFS, HPSS, MPI]
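To make "bottleneck determined by first depleted resource" concrete, here is a minimal sketch that looks up the fastest level of each machine's hierarchy that can hold a given working set. The capacities and latencies are copied from the two tables. The function name `first_level_that_fits` and the simple "fits or spills" model are assumptions for illustration; real caches also depend on associativity, line size, and sharing, which this ignores.

```python
# Hierarchy levels as (name, latency in seconds, capacity in bytes),
# taken from the seaborg and jacquard slides.
KB, MB, GB, TB, PB = 2**10, 2**20, 2**30, 2**40, 2**50

SEABORG = [
    ("Registers", 3e-9, 256),
    ("L1 Cache", 5e-9, 32 * KB),
    ("L2 Cache", 45e-9, 8 * MB),
    ("Main Memory", 300e-9, 16 * GB),
    ("Remote Memory", 19e-6, 7 * TB),
    ("GPFS", 10e-3, 50 * TB),
    ("HPSS", 5.0, 9 * PB),
]

JACQUARD = [
    ("Registers", 0.5e-9, 2 * KB),
    ("L1 Cache", 1.5e-9, 64 * KB),
    ("L2 Cache", 45e-9, 1 * MB),
    ("Main Memory", 70e-9, 6 * GB),  # slide gives 70-117 ns; lower bound used
    ("Remote Memory", 5e-6, 2 * TB),
    ("GPFS", 10e-3, 15 * TB),
    ("HPSS", 5.0, 9 * PB),
]


def first_level_that_fits(hierarchy, working_set_bytes):
    """Return (name, latency) of the fastest level that holds the working set.

    Simplified model (an assumption): a working set either fits entirely
    in a level or spills to the next one.
    """
    for name, latency, capacity in hierarchy:
        if working_set_bytes <= capacity:
            return name, latency
    raise ValueError("working set exceeds every level in the table")
```

Under this model, a 4 MB working set still fits in Seaborg's 8 MB L2 cache, while on Jacquard it spills past the 1 MB L2 into main memory: `first_level_that_fits(SEABORG, 4 * MB)` gives the L2 entry and `first_level_that_fits(JACQUARD, 4 * MB)` gives the main memory entry.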

Opteron Block Diagram: Not strictly SMP
- 1 TLB per CPU: 1K entries x 4K pages -> 4 MB coverage

[Diagram labels: SDRAM, Switch, I/O]
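The TLB coverage figure is just entries times page size. A small sketch of that arithmetic follows, using the 1K entries and 4K pages from the slide. The helper names and the 8 MB example array are hypothetical, chosen only to show when a working set exceeds what the TLB can map at once.

```python
import math

TLB_ENTRIES = 1024      # "1K entries" from the slide
PAGE_BYTES = 4 * 1024   # "4K pages" from the slide


def tlb_coverage_bytes(entries=TLB_ENTRIES, page_bytes=PAGE_BYTES):
    """Memory that can be mapped by the TLB at one time."""
    return entries * page_bytes


def pages_needed(working_set_bytes, page_bytes=PAGE_BYTES):
    """Number of pages a contiguous working set spans (rounded up)."""
    return math.ceil(working_set_bytes / page_bytes)


def exceeds_tlb(working_set_bytes):
    """True if the working set needs more pages than the TLB has entries."""
    return pages_needed(working_set_bytes) > TLB_ENTRIES
```

With these values `tlb_coverage_bytes()` is 4194304 bytes (4 MB). A hypothetical 8 MB array spans 2048 pages, twice the number of TLB entries, so sweeping it repeatedly would be expected to incur TLB misses in addition to cache misses.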

HyperTransport: Good Stuff
- Little conflict between data movement and computation

SMP size and memory contention
Jacquard’s numbers:
- 1 task: 100%
- 2 tasks: 98%
Why is Jacquard 2-way SMP?

Flops @ 2.2 GHz
Peak Theoretical Flops
- Double (64-bit) floats: 1 add + 1 mult = 2.2 GFlop/s
- Single (32-bit) floats: 2 add + 2 mult = 4.4 GFlop/s
Peak Realized Flops
- Double (64-bit) floats: 1.9 GFlop/s
- Single (32-bit) floats: 3.4 GFlop/s
Your Flops?
- Walltime is more important than flops
- For a known algorithm, flops are a sanity check
Memory BW: 4 GB/sec per CPU
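As a rough illustration of using flops as a sanity check, the sketch below computes the realized fraction of peak and the memory bytes available per peak flop from the figures on this slide, plus an achieved rate from a known operation count and a measured walltime. The function names are assumptions, and the matrix-multiply timing in the usage note is a made-up example, not a Jacquard measurement.

```python
# Figures copied from the slide (GFlop/s and GB/s per CPU).
PEAK_GFLOPS = {"double": 2.2, "single": 4.4}
REALIZED_GFLOPS = {"double": 1.9, "single": 3.4}
MEMORY_BW_GB_PER_S = 4.0


def fraction_of_peak(precision):
    """Realized peak divided by theoretical peak for the given precision."""
    return REALIZED_GFLOPS[precision] / PEAK_GFLOPS[precision]


def bytes_per_peak_flop(precision):
    """Memory bandwidth available per theoretical peak flop."""
    return MEMORY_BW_GB_PER_S / PEAK_GFLOPS[precision]


def achieved_gflops(flop_count, walltime_seconds):
    """Flop rate for an algorithm whose operation count is known."""
    return flop_count / walltime_seconds / 1e9
```

For example, a dense n x n matrix multiply performs about 2n^3 floating-point operations. If a hypothetical run with n = 1000 took 1.25 s, `achieved_gflops(2 * 1000**3, 1.25)` would report about 1.6 GFlop/s, which can be compared against the realized double-precision peak above. Walltime remains the quantity that matters; the flop rate only indicates whether the run is in a plausible range.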

MPI Bandwidth: seaborg

MPI Bandwidth: Jacquard

Linux for AIX Users
- Linux and AIX are more similar than different
- Linux is not as good as AIX at keeping processes scheduled on the same CPU -> processor affinity work
- Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc.
- AIX MPI is in /usr/{bin,lib}; Linux MPI is in modules
- Linux doesn’t need -bmaxdata!
- Little vs. big endian
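Two of the points above lend themselves to a short sketch: decoding binary data written with a different byte order, and reading the key/value text that /proc/cpuinfo exposes. This is a minimal illustration using Python's standard `struct` module; the function names `read_doubles` and `parse_cpuinfo` are hypothetical, and the parser assumes the usual "key : value" lines with a blank line between processors.

```python
import struct


def read_doubles(raw, byte_order=">"):
    """Decode packed 64-bit floats from raw bytes.

    byte_order is a struct prefix: ">" for big-endian data,
    "<" for little-endian data. Naming the order explicitly makes
    the result independent of the host the code runs on.
    """
    count, remainder = divmod(len(raw), 8)
    if remainder:
        raise ValueError("length is not a multiple of 8 bytes")
    return struct.unpack(f"{byte_order}{count}d", raw)


def parse_cpuinfo(text):
    """Parse /proc/cpuinfo-style text into one dict per processor block."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


# On a Linux host one might call (not executed here):
#   with open("/proc/cpuinfo") as f:
#       cpus = parse_cpuinfo(f.read())
```

Data packed as big-endian with `struct.pack(">2d", 1.0, 2.5)` decodes back to `(1.0, 2.5)` when read with `byte_order=">"`, and to different values when read as little-endian, which is the kind of silent mismatch to watch for when moving unformatted binary files between machines with different byte orders.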

Conclusions
- The underlying HW technologies (HT, IB, etc.) are quite promising.
- Opteron systems are delivering great price/performance.
- Still working on some SDRAM, OS, and SW issues.
- What’s useful to you? Let us know.
