SGI’2000 Parallel Programming Tutorial: Supercomputers 2
With the acknowledgement of Igor Zacharov and Wolfgang Mertz, SGI European Headquarters.


1 SGI’2000 Parallel Programming Tutorial: Supercomputers 2. With the acknowledgement of Igor Zacharov and Wolfgang Mertz, SGI European Headquarters

2 Classification of Computers
MIMD
  Multiprocessors (single address space, shared memory)
    UMA (central memory)
      PVP (Cray T90)
      SMP (Intel SHV, SUN E10000, DEC 8400, SGI Power Challenge, IBM R60, etc.)
    NUMA (distributed memory)
      COMA (KSR-1, DDM)
      CC-NUMA (SGI Origin2000, SN1 (SGI3000), Cray T3E, HP Exemplar, Sequent NUMA-Q, Data General)
      NCC-NUMA (Cray T3D, IBM SP3)
  Multicomputers (multiple address spaces)
    NORMA (no-remote memory access)
      Cluster (IBM SP2, DEC TruCluster, Microsoft Wolfpack, “Beowulf”, etc.): loosely coupled, multiple OS
      “MPP” (Intel TFLOPS, TM-5): tightly coupled, single OS
Abbreviations:
  MIMD      Multiple Instructions, Multiple Data
  PVP       Parallel Vector Processor
  UMA       Uniform Memory Access
  SMP       Symmetric Multi-Processor
  NUMA      Non-Uniform Memory Access
  COMA      Cache Only Memory Architecture
  NORMA     No-Remote Memory Access
  CC-NUMA   Cache-Coherent NUMA
  MPP       Massively Parallel Processor
  NCC-NUMA  Non-Cache Coherent NUMA

3 Design Space of Competing Computer Architecture

4 Structure of an SMP System (1)
Diagram: processors (each with its own cache), I/O, and four main memory banks, all attached to a central bus.
o Does NOT scale due to bus saturation
o Bus is a very complex component
o High memory latency due to the complexity
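
The bus-saturation point on this slide can be illustrated with a very rough model: every processor shares one bus, so once the combined demand exceeds the bus bandwidth, each processor only gets an equal share. This is a sketch, not a description of any real SGI bus. The function name and both bandwidth figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def per_cpu_bandwidth(bus_bw_mb_s, cpu_demand_mb_s, n_cpus):
    """Bandwidth each CPU actually gets on a shared bus (idealized model).

    Below saturation every CPU gets what it asks for; above it the bus
    bandwidth is split evenly among the CPUs.
    """
    return min(cpu_demand_mb_s, bus_bw_mb_s / n_cpus)


# Hypothetical figures, not taken from the slides.
BUS_BW_MB_S = 1200      # total bandwidth of the shared bus
CPU_DEMAND_MB_S = 300   # memory traffic one CPU would like to generate

for n in (1, 2, 4, 8, 16):
    share = per_cpu_bandwidth(BUS_BW_MB_S, CPU_DEMAND_MB_S, n)
    state = "saturated" if n * CPU_DEMAND_MB_S > BUS_BW_MB_S else "ok"
    print(f"{n:2d} CPUs: {share:6.1f} MB/s per CPU ({state})")
```

With these assumed numbers the bus saturates beyond 4 CPUs, and every further doubling of the CPU count halves the bandwidth available to each one.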

5 Structure of an SMP System (2)
Diagram: processors (each with its own cache), I/O, and four main memory banks, all attached to a central crossbar.
o Scales very well
o Crossbar is a very complex component
o High memory latency due to the complexity

6 Structure of an SMP System (3)
Origin SGI NUMA Architecture
o SGI NUMA hypercube
o Global Switch Interconnect
Diagram labels: N, R, Nodeboard, I/O.
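
The hypercube mentioned on this slide has a simple addressing property that is easy to show in code: in an idealized binary hypercube, two nodes are directly linked when their addresses differ in exactly one bit, and the minimum hop count between any two nodes is the number of differing bits. The sketch below assumes such an idealized hypercube. It does not model SGI's actual router wiring, and the function names and the choice of 3 dimensions are illustrative only.

```python
def hypercube_neighbors(node, dim):
    """Nodes directly linked to `node` in a dim-dimensional binary hypercube."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hypercube_hops(a, b):
    """Minimum number of links between a and b: count of differing address bits."""
    return bin(a ^ b).count("1")


DIM = 3                      # hypothetical example: 2**3 = 8 nodes
nodes = range(2 ** DIM)

# The diameter (worst-case hop count) of a hypercube equals its dimension.
diameter = max(hypercube_hops(a, b) for a in nodes for b in nodes)

print(hypercube_neighbors(0, DIM))   # [1, 2, 4]
print(hypercube_hops(0, 7))          # 3
print(diameter)                      # 3
```

Because the worst-case distance grows with the logarithm of the node count, doubling the number of nodes in this model adds only one hop to the longest path.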

7 Systems are built from Modules
o Deskside (Module): 2-8 CPUs
o Rack (2 Modules): 16 CPUs
o Multi-rack (4 Modules): 32 CPUs
o Etc...: ..128 CPUs

8 New High-End Products
Origin 3000 Servers – Onyx 3 Systems
o SGI Origin 3200, SGI Onyx 3200
o SGI Origin 3400, SGI Onyx 3400
o SGI Origin 3800, SGI Onyx 3800
o IRIX 6.5

9 SGI 3800 System (16-512p)
Figure: rack layouts for the minimum (16p) system and the 128p system, plus the 128p system topology.
o 128p system topology: four racks (Rack 1 to Rack 4), each drawn with two R and eight C elements
o Brick types in the rack layouts: Power Bay, I-Brick, C-Brick, R-Brick (8-port router), and P-, I-, or X-Brick

10 ASCI Blue Mountain
Los Alamos National Laboratories
o Origin 2000 with 3+ Tflops peak
o 1+ Tflop Application Performance
o 48 Systems with 128 CPUs each = 6144 CPUs
o 1536 Gbyte Memory
o 76 Tbyte Diskspace
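
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the slide, but it treats "3+ Tflops" as exactly 3.0 Tflops, so the per-CPU peak it derives is a lower bound under that assumption. The variable names are illustrative.

```python
systems = 48
cpus_per_system = 128
memory_gbyte = 1536
peak_tflops = 3.0   # the slide says "3+", so this is a lower bound

total_cpus = systems * cpus_per_system
mem_per_cpu_gbyte = memory_gbyte / total_cpus
mem_per_system_gbyte = memory_gbyte / systems
peak_per_cpu_gflops = peak_tflops * 1000 / total_cpus

print(total_cpus)              # 6144, matching the slide
print(mem_per_cpu_gbyte)       # 0.25
print(mem_per_system_gbyte)    # 32.0
print(round(peak_per_cpu_gflops, 2))
```

The CPU count matches the slide, and the memory works out to 0.25 Gbyte per CPU, or 32 Gbyte per 128-CPU system.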

11 Memory hierarchy
Figure: speed of access (relative scale 1, 0.1, 0.01) plotted against device capacity (size).
o Registers: 64 registers, 1/clock
o L1 cache: 32KB, ~2-3 cy
o L2 cache: 8MB, ~10 cy
o Memory: ~1 - 100s GB, ~100 - 300 cy (NUMA)
o Disk: ~4000 cy
Chart: Remote Latency (ns) for system sizes 2p, 4p, 8p, 16p, 32p, 64p, 128p, 256p, 512p (axis 0 to 1400), with two series, SN-MIPS Latency and Origin2000 Latency. Data labels as transcribed: 175, 235, 285, 335, 435, 485, 585 and 343, 554, 759, 836, 1067, 1169.
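
One way to see why the memory hierarchy matters is to compute an average access cost from the cycle counts on this slide. The sketch below assumes the slide's latencies pair up as L1 ~2-3 cy, L2 ~10 cy and NUMA memory ~100-300 cy, and uses the midpoint of each range. The fractions of accesses served by each level are purely hypothetical and do not come from the slides, and the function name is illustrative.

```python
def average_access_cycles(levels):
    """Weighted average latency over (name, latency_cycles, fraction) entries."""
    total_fraction = sum(fraction for _, _, fraction in levels)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(latency * fraction for _, latency, fraction in levels)


# Latencies: midpoints of the ranges on the slide (assumed pairing).
# Fractions: hypothetical share of accesses served by each level.
levels = [
    ("L1 cache", 2.5, 0.90),
    ("L2 cache", 10.0, 0.08),
    ("memory (NUMA)", 200.0, 0.02),
]

average = average_access_cycles(levels)
print(f"average access cost: {average:.2f} cycles")
for name, latency, fraction in levels:
    share = latency * fraction / average
    print(f"  {name}: {share:.0%} of the total cost")
```

Under these assumed fractions, the 2% of accesses that go to main memory account for more than half of the average cost, which is why data placement and cache reuse dominate tuning on a NUMA system.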

12 NUMAflex™ Flexible Configuration
Scale in Any and All Dimensions
o Dimensions: CPU, I/O, Storage
o Workloads shown: Web serving, Weather simulation, Repository / archive, Signal processing, Media streaming, Traditional big supercomputer

