Published by Norman Warner. Modified over 8 years ago.
1
Parallel Computers Today
LANL / IBM Roadrunner: > 1 PFLOPS
Two Nvidia 8800 GPUs: > 1 TFLOPS
Intel 80-core chip: > 1 TFLOPS
TFLOPS = 10^12 floating point ops/sec
PFLOPS = 10^15 floating point ops/sec (1,000,000,000,000,000 / sec)
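To give the units above some scale, here is a minimal sketch that estimates how long a dense LU factorization would take at a sustained 1 TFLOPS versus 1 PFLOPS. It assumes the standard leading-order flop count of (2/3) n^3 for an n x n matrix and assumes the machine sustains its full peak rate, which real runs do not. The function names and the matrix size are illustrative, not from the slides.

```python
# Unit definitions as given on the slide.
TFLOPS = 10**12  # floating point ops per second
PFLOPS = 10**15  # floating point ops per second

def lu_flops(n):
    """Leading-order flop count for LU factorization of a dense n x n matrix."""
    return 2 * n**3 / 3

def seconds_to_factor(n, rate):
    """Idealized run time, assuming the machine sustains `rate` flops/sec."""
    return lu_flops(n) / rate

n = 10**5  # hypothetical matrix dimension, chosen for illustration
t_tflops = seconds_to_factor(n, TFLOPS)
t_pflops = seconds_to_factor(n, PFLOPS)
print(f"n = {n}: {t_tflops:.1f} s at 1 TFLOPS, {t_pflops:.3f} s at 1 PFLOPS")
```

The ratio between the two times is exactly 1000, since a PFLOPS is a thousand TFLOPS.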
2
Columbia (10240-processor SGI Altix, 50 Teraflops, NASA Ames Research Center)
3
Beowulf (18-processor cluster, lab machine)
4
AMD Opteron quad-core die
5
The nVidia G80 GPU
128 streaming floating point processors @ 1.5 GHz
1.5 GB shared RAM with 86 GB/s bandwidth
500 Gflops on one chip (single precision)
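The peak rate and memory bandwidth quoted for the G80 imply a machine balance: how many floating point operations the chip can perform per byte it fetches from its RAM. The sketch below computes that ratio from the slide's figures, reading the bandwidth as 86 gigabytes per second (an assumption about the slide's units) and assuming 4-byte single-precision operands. The helper name is hypothetical.

```python
def flops_per_byte(peak_flops, bandwidth_bytes_per_sec):
    """Machine balance: peak flops available per byte of memory traffic."""
    return peak_flops / bandwidth_bytes_per_sec

G80_PEAK = 500e9        # 500 Gflops single precision, from the slide
G80_BANDWIDTH = 86e9    # assumed to mean 86 gigabytes per second
BYTES_PER_FLOAT = 4     # single-precision operand size

balance = flops_per_byte(G80_PEAK, G80_BANDWIDTH)
per_operand = balance * BYTES_PER_FLOAT

print(f"{balance:.2f} flops per byte, {per_operand:.1f} flops per float loaded")
```

Under these assumptions a kernel needs roughly two dozen operations per operand loaded from RAM to keep the arithmetic units busy; anything with less data reuse is limited by bandwidth rather than by the peak flop rate.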
6
The Computer Architecture Challenge
Most high-performance computer designs allocate resources to optimize Gaussian elimination on large, dense matrices.
Originally, because linear algebra is the middleware of scientific computing.
Nowadays, mostly for bragging rights.
PA = LU
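The factorization PA = LU on this slide is what Gaussian elimination with partial pivoting produces: P permutes rows, L is unit lower triangular, and U is upper triangular. Below is a minimal, unoptimized sketch in pure Python to make the computation concrete. It is not how benchmark codes implement it (those use blocked, cache-aware kernels), and the function names and test matrix are illustrative.

```python
def lu_partial_pivot(A):
    """Return (perm, L, U) with row perm[i] of A equal to row i of L*U."""
    n = len(A)
    U = [[float(x) for x in row] for row in A]   # working copy, becomes U
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest entry in column k at or below row k.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]   # only multipliers in columns < k are set so far
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    for i in range(n):
        L[i][i] = 1.0
    return perm, L, U

def matmul(X, Y):
    """Plain triple-loop matrix product, used here only to check the result."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
perm, L, U = lu_partial_pivot(A)
PA = [A[i] for i in perm]   # apply the row permutation P to A
LU = matmul(L, U)
max_err = max(abs(PA[i][j] - LU[i][j]) for i in range(3) for j in range(3))
print("row order:", perm, "max |PA - LU| =", max_err)
```

The innermost update loop is where the roughly (2/3) n^3 floating point operations are spent, which is why dense elimination is so often used to measure peak arithmetic throughput.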
7
Top 500 List http://www.top500.org/list/2008/11/100
8
Generic Parallel Machine Architecture
Key architecture question: Where is the interconnect, and how fast?
Key algorithm question: Where is the data?
[Diagram: three nodes, each with its own storage hierarchy (Proc, Cache, L2 Cache, L3 Cache, Memory), with potential interconnects between the nodes]
9
Multicore SMP Systems
[Diagram: block diagrams of three multicore systems]
Intel Clovertown: Core2 cores with 4MB shared L2 caches; FSBs at 10.6 GB/s each; chipset (4x64b controllers); Fully Buffered DRAM at 21.3 GB/s (read), 10.6 GB/s (write)
Sun Niagara2: eight MT UltraSparc cores, each with an FPU and 8K D$; crossbar switch at 179 GB/s (fill), 90 GB/s (writethru); 4MB shared L2 (16 way); 4x128b FBDIMM memory controllers; Fully Buffered DRAM at 42.7 GB/s (read), 21.3 GB/s (write)
AMD Opteron: Opteron cores, each with a 1MB victim cache; Memory Controller / HT; DDR2 DRAM at 10.6 GB/s; HT links at 4 GB/s (each direction)
10
More Detail on GPU Architecture
11
Michael Perrone (IBM): Proper Care and Feeding of Multicore Beasts
http://www.csm.ornl.gov/workshops/HPA/documents/1-arch/feeding_the_beast_perrone.pdf
12
Cray XMT (highly multithreaded shared memory)