Published by ἸωσαΦάτ Ελευθερίου
1
Advanced Computer Architecture 5MD00 / 5Z033 TOP 500 supercomputers
Henk Corporaal TUEindhoven 2009
2
Topics
How to cross the Petaflop boundary
Ranking
  Examples Nov 2008
  Nov 2009: what has changed
Examples
  Roadrunner (IBM)
  Jaguar Cray
  SGI Altix
  BlueGene
1/1/2019 ACA H.Corporaal
3
How to build a Petaflop supercomputer?
Opteron cluster (e.g. ~2X Ranger/TACC)
  32,000 quad-core Opterons (130K cores)
Cray XT3/4 (e.g. Baker/ORNL sooner)
IBM BlueGene/P (bigger sooner)
  80,000 BG/P PPC processors (320K cores)
IBM Cell-accelerated Roadrunner cluster
  10,000 Cells (80K Cell SPUs)
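As a rough sanity check on the routes above, the sketch below divides one Petaflop by each core count to see what every core would have to deliver at peak. The names `options` and `required_gflops_per_core` are illustrative only, and the core counts are the exact products of the slide's figures (so 128,000 rather than the rounded 130K).

```python
# (processors, cores per processor), taken from the slide.
options = {
    "Opteron cluster": (32_000, 4),
    "IBM BlueGene/P": (80_000, 4),
    "Roadrunner (Cell SPUs)": (10_000, 8),
}

PETAFLOP = 1e15  # floating-point operations per second


def required_gflops_per_core(processors, cores_per_processor):
    """Peak GFlops each core must deliver for the machine to reach 1 PFlop."""
    cores = processors * cores_per_processor
    return PETAFLOP / cores / 1e9


for name, (procs, cores_per_proc) in options.items():
    need = required_gflops_per_core(procs, cores_per_proc)
    print(f"{name}: {procs * cores_per_proc} cores, {need:.2f} GFlops per core")
```

The BlueGene/P and Cell results can be compared with the per-core peak figures quoted on the later slides for those machines.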
4
Supercomputer Ranking
Started in 1993
Jack Dongarra, University of Tennessee
Based on LINPACK benchmark
  linear algebra (LU factorization)
  Superseded by LAPACK
    based on BLAS (Basic Lin. Alg. Subprograms)
    exploits caches
  Measures Floating Point performance
  Fortran code
see
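To make the kind of work the benchmark measures concrete, here is a minimal pure-Python sketch of solving a dense linear system by Gaussian elimination with partial pivoting, which is the computation behind LU factorization. It is not the LINPACK or LAPACK code; the function names are hypothetical, and the operation count uses only the leading-order term 2/3 n^3.

```python
def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies, leave the inputs untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def approx_gflops(n, seconds):
    """Leading-order flop rate for an n x n solve: (2/3 n^3) / time, in GFlops."""
    return (2.0 * n**3 / 3.0) / seconds / 1e9


x = lu_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print(x)
```

A ranking run does the same thing on a very large matrix, times it, and reports the resulting flop rate as Rmax.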
5
Performance Ranking Nov. 2008
#    Name                      N_PE    Rmax (Tflop)  Rpeak (Tflop)  P (kW)
1    Roadrunner IBM            129600  1105          1456           2483
2    Cray XT5                  150152  1059          1381           6950
3    SGI Altix ICE             51200   487           608            2090
4    BlueGene IBM              212992  478           596            2329
100  Cluster Platform (Xeons)  5120    27            51             -
52   Power 575 SARA (Amst)     3328    49            63             532
75   BlueGene Astron           12288   35            42             95
496  Cluster in Gent Univ.     1568    13            16
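Two derived figures help when reading this table: the fraction of peak that LINPACK actually reaches (Rmax / Rpeak) and the power efficiency in MFlops per Watt. The sketch below computes both for the top four entries; the data are copied from the table, while the helper names are illustrative.

```python
# (name, Rmax in TFlops, Rpeak in TFlops, power in kW), copied from the table.
systems = [
    ("Roadrunner IBM", 1105, 1456, 2483),
    ("Cray XT5", 1059, 1381, 6950),
    ("SGI Altix ICE", 487, 608, 2090),
    ("BlueGene IBM", 478, 596, 2329),
]


def efficiency(rmax, rpeak):
    """Fraction of the theoretical peak achieved on LINPACK."""
    return rmax / rpeak


def mflops_per_watt(rmax_tflops, power_kw):
    """Power efficiency; 1 TFlops = 1e6 MFlops and 1 kW = 1e3 W."""
    return rmax_tflops * 1e6 / (power_kw * 1e3)


for name, rmax, rpeak, power in systems:
    print(f"{name}: {efficiency(rmax, rpeak):.1%} of peak, "
          f"{mflops_per_watt(rmax, power):.0f} MFlops/W")
```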
6
Performance Ranking 2008: we crossed the Petaflop boundary
7
Update November 2009
#  Name                                N_PE    Rmax (Tflop)  Rpeak (Tflop)  P (kW)
1  Jaguar-Cray XT5-HE, Oak Ridge, USA  224162  1759          2331           6951
2  Roadrunner IBM, DOE, USA            122400  1042          1376           2346
3  Kraken Cray XT5-HE, Tennessee, USA  98928   832           1029           -
4  BlueGene IBM, Juelich, Germany      294912  826           1003           2268
5  Tianhe Xeon / ATI cluster, China    71680   563           1206
8
Alternative ranking: Green500
Most power-efficient supercomputers
2008: best result = 536 MFlops/Watt => 1.87 nJ / FloatingPt_operation
2009: best result = 723 MFlops/Watt => 1.38 nJ / FloatingPt_operation
  Cell cluster, ranking 110 in top500
See
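The step from MFlops/Watt to energy per operation is just a reciprocal, since one Watt is one Joule per second. A minimal sketch of that conversion, with an illustrative function name:

```python
def nanojoule_per_flop(mflops_per_watt):
    """Energy per floating-point operation in nJ.

    x MFlops/W means x * 1e6 operations per Joule, so one operation
    costs 1 / (x * 1e6) J, which is 1000 / x nJ.
    """
    return 1e9 / (mflops_per_watt * 1e6)


for year, best in [(2008, 536), (2009, 723)]:
    print(f"{year}: {best} MFlops/W -> {nanojoule_per_flop(best):.2f} nJ per operation")
```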
9
Nr1 (2008): Roadrunner IBM cluster
6480 nodes with
  Dual core Opteron 1.8 GHz
  2 * PowerXCell 8i 3.2 GHz (12.8 GFlops)
Infiniband connection fabric (16 Gbit/s per link)
Fat tree interconnect
100 Tbyte DRAM memory
216 I/O nodes
MPI programming
2.35 MW power !!
Size: 296 racks, 5500 ft
This is huge !!
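The node description can be checked against the N_PE column of the 2008 ranking. The sketch below counts cores per node, assuming each PowerXCell 8i contributes 1 PPE + 8 SPEs (the Cell/B.E. layout), and estimates the SPE-only peak under the assumption that the quoted 12.8 GFlops is a per-SPE figure. Both assumptions are mine, not statements from the slide.

```python
NODES = 6480
OPTERON_CORES_PER_NODE = 2   # one dual-core Opteron per node
CELLS_PER_NODE = 2           # 2 * PowerXCell 8i per node
SPES_PER_CELL = 8
CORES_PER_CELL = 1 + SPES_PER_CELL  # assumption: 1 PPE + 8 SPEs per Cell
SPE_GFLOPS = 12.8            # assumption: the quoted 12.8 GFlops is per SPE

total_cores = NODES * (OPTERON_CORES_PER_NODE + CELLS_PER_NODE * CORES_PER_CELL)
spe_peak_tflops = NODES * CELLS_PER_NODE * SPES_PER_CELL * SPE_GFLOPS / 1000.0

print("total cores:", total_cores)
print(f"SPE-only peak: {spe_peak_tflops:.1f} TFlops")
```

The core count reproduces the 129600 processing elements listed for Roadrunner, and the SPE-only estimate accounts for most of its listed Rpeak.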
10
Cell/B.E. – the architecture
1 x PPE 64-bit PowerPC
  L1: 32 KB I$ + 32 KB D$
  L2: 512 KB
8 x SPE cores:
  Local store: 256 KB
  128 x 128 bit vector registers
Hybrid memory model:
  PPE: Rd/Wr
  SPEs: Asynchronous DMA
EIB: 205 GB/s sustained aggregate bandwidth
Processor-to-memory bandwidth: 25.6 GB/s
Processor-to-processor: 20 GB/s in each direction
12
Roadrunner: TriBlade = 2 nodes
For more details: Presentation slides of Ken Koch, March 2008
13
Nr2 (2008): Jaguar Cray XT5 QC
7832 quad-core 2.1 GHz AMD Opteron
62 TB memory (= 2 GB / core)
600 TB file system
250 TFlop
I guess 5 times this: in total 150152 cores
SeaStar2+ interconnect (from Cray)
Note 2009: quad-cores replaced by six-cores
  now nr 1
  224,162 cores
  Rmax 1.76 PetaFlop (peak 2.33 PetaFlop)
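The "5 times" guess and the memory-per-core figure on this slide can be checked with a few lines of arithmetic. The sketch uses only numbers from the slide; the variable names are illustrative and decimal units (1 TB = 1000 GB) are assumed.

```python
PROCESSORS = 7832
CORES_PER_PROCESSOR = 4
MEMORY_TB = 62
TOTAL_CORES_2008 = 150152  # from the ranking

partition_cores = PROCESSORS * CORES_PER_PROCESSOR
scale_factor = TOTAL_CORES_2008 / partition_cores
gb_per_core = MEMORY_TB * 1000 / partition_cores  # assuming 1 TB = 1000 GB

print("cores in the described partition:", partition_cores)
print(f"full system is about {scale_factor:.1f} times larger")
print(f"memory per core: {gb_per_core:.2f} GB")
```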
14
Jaguar
15
Nr3 (2008): SGI Altix ICE8200
92 racks of Altix ICE 8200EX with 3.0 GHz Intel Xeon quad-core processors, or 47,104 cores
8 racks of Altix ICE 8200 with 2.66 GHz Intel quad-core processors, or 4096 cores
51 TB main memory
DDR InfiniBand
16
Nr4 (2008): BlueGene/L IBM
Based on ASIC with PowerPC 440, 700 MHz, each 2.8 GFlops
106,496 nodes
3D Torus interconnect for p2p communication + Collective network
Figures: 3D torus, complete system, rack
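In a 3D torus every node has six direct neighbours and the links wrap around at the edges, which keeps the worst-case hop count low. The sketch below illustrates the addressing for a small, hypothetical 8 x 8 x 8 torus; the dimensions and function names are assumptions for illustration and not the actual BlueGene/L configuration.

```python
def torus_neighbours(node, dims):
    """Return the six direct neighbours of node (x, y, z) in a 3D torus."""
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a, b, dims):
    """Minimum number of hops between a and b, using wrap-around links."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))


DIMS = (8, 8, 8)  # hypothetical size, chosen only for the example
print(torus_neighbours((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (7, 4, 1), DIMS))
```

Thanks to the wrap-around links, a node at coordinate 7 is one hop from coordinate 0, so the longest path along an axis of length 8 is 4 hops rather than 7.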
17
BlueGene/L ASIC node
18
BlueGene/L Node board
16 cards with 2 ASICs each
8 GB
180 Gflop
19
2009: BlueGene/P
System: 256 racks, up to 1 PB, 3.56 PFlops
Rack: 32 node cards, 13.9 TF/s, 2-4 TB
Node card: 32 processor cards, 64-128 GB, 435 GFlops
Processor card: one 4-processor chip, 13.6 GFlops, 2-4 GB
ASIC: 13.6 GFlops, 8 MB EDRAM
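The performance figures at each packaging level follow from multiplying up the hierarchy. A minimal sketch that reproduces them from the per-chip number, using illustrative constant names:

```python
CHIP_GFLOPS = 13.6              # one 4-processor chip per processor card
PROCESSOR_CARDS_PER_NODE_CARD = 32
NODE_CARDS_PER_RACK = 32
RACKS = 256

node_card_gflops = CHIP_GFLOPS * PROCESSOR_CARDS_PER_NODE_CARD
rack_tflops = node_card_gflops * NODE_CARDS_PER_RACK / 1000.0
system_pflops = rack_tflops * RACKS / 1000.0

print(f"node card: {node_card_gflops:.1f} GFlops")
print(f"rack:      {rack_tflops:.2f} TFlops")
print(f"system:    {system_pflops:.3f} PFlops")
```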
20
BlueGene/P ASIC
21
PPC450: Exploiting SIMD
Two FPUs
SIMD
  2 x bit registers
  SIMD datapath width = 16 bytes
  Feeds two FPUs with 8 bytes each every cycle
Two FP multiply-add operations per cycle
3.4 GFLOP/s peak performance
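The 3.4 GFLOP/s peak follows from the clock rate and the two multiply-add units, counting each multiply-add as two floating-point operations. The sketch below spells this out, taking the 850 MHz clock from the BlueGene/P ASIC slide; the constant names are illustrative.

```python
CLOCK_GHZ = 0.85           # 850 MHz, from the BlueGene/P ASIC slide
FMA_UNITS = 2              # two FP multiply-add operations per cycle
FLOPS_PER_FMA = 2          # one multiply plus one add
CORES_PER_CHIP = 4

core_peak_gflops = CLOCK_GHZ * FMA_UNITS * FLOPS_PER_FMA
chip_peak_gflops = core_peak_gflops * CORES_PER_CHIP

print(f"per core: {core_peak_gflops:.1f} GFLOP/s")
print(f"per chip: {chip_peak_gflops:.1f} GFLOP/s")
```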
22
BlueGene/P ASIC
208M transistors
850 MHz
16 W
90 nm
23
BlueGene/P node card
24
Next: BlueGene/Q
10 PFlops in 2011-2012
see
25
Can we match the human brain ???
Performance = 100 Billion (10^11) Neurons
  * 1000 (10^3) Connections/Neuron
  * 200 (2 * 10^2) Calculations Per Second Per Connection
  = 2 * 10^16 Calculations Per Second
Memory = 100 Billion (10^11) Neurons
  * 1000 (10^3) Connections/Neuron
  * 10 bytes (information about connection strength and address of output neuron, type of synapse)
  = 10^15 bytes = 1 PB = 1000 TB
How far off are we?
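One way to approach the closing question is to redo the estimate and compare it with the fastest machine of November 2009. The sketch below does that under a loud assumption of mine: that one "calculation" costs about one floating-point operation. The Jaguar Rmax value is taken from the November 2009 ranking.

```python
NEURONS = 10**11
CONNECTIONS_PER_NEURON = 10**3
CALCS_PER_SECOND_PER_CONNECTION = 200
BYTES_PER_CONNECTION = 10

brain_ops_per_second = NEURONS * CONNECTIONS_PER_NEURON * CALCS_PER_SECOND_PER_CONNECTION
brain_memory_bytes = NEURONS * CONNECTIONS_PER_NEURON * BYTES_PER_CONNECTION

# Assumption: one "calculation" is treated as one floating-point operation.
JAGUAR_RMAX_FLOPS = 1759 * 10**12  # nr 1 in Nov 2009, Rmax in flops
speed_gap = brain_ops_per_second / JAGUAR_RMAX_FLOPS

print(f"brain estimate: {brain_ops_per_second:.1e} calculations per second")
print(f"brain memory:   {brain_memory_bytes / 10**15:.0f} PB")
print(f"gap to Jaguar:  about {speed_gap:.0f}x")
```

Under that assumption the raw compute gap is roughly one order of magnitude, and the 1 PB memory estimate is the upper end of the range quoted for a full BlueGene/P system.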
26
Blue brain research
Software replica of one column of the neocortex
  cortex: 85% of the brain's total mass
  required for language, learning, memory and complex thought
  the essential first step to simulating the whole brain
Next: include circuitry from other brain regions and eventually the whole brain.
27
Latest news: factorization of RSA768
RSA is used to encipher text using a public and a private key
EPFL, CWI and others have broken RSA768
This means: factorize a 768-bit number into 2 primes
Using 1700 AMD 2.2 GHz cores for 1 year => 15 Mh (single core) compute time
Current RSA standard uses 1024 bits: still safe for some years
News of 11 Jan 2010
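The 15 Mh figure is simply the number of cores multiplied by the hours in a year. A short sketch of that arithmetic, with illustrative names and a 365-day year assumed:

```python
CORES = 1700
HOURS_PER_YEAR = 365 * 24  # assuming a 365-day year

core_hours = CORES * HOURS_PER_YEAR
single_core_years = core_hours / HOURS_PER_YEAR

print(f"{core_hours / 1e6:.1f} million core-hours")
print(f"equivalent to {single_core_years:.0f} years on a single core")
```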
28
RSA (Rivest, Shamir, Adleman)
choose 2 (large) primes p and q
n = p*q
choose e such that e and (p-1)(q-1) are coprime (i.e. do not share prime factors)
choose d such that d*e = 1 mod ((p-1)(q-1))
public key = (n,e)
private key = (n,d)
Encryption of message m: c = m^e mod n
Decryption of cypher c: m = c^d mod n
see wikipedia for details and working example
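The steps above can be sketched with deliberately tiny primes. This is a toy illustration only: the values p = 61, q = 53 and e = 17 are chosen for the example, real keys use primes hundreds of digits long plus padding, and the modular inverse via `pow(e, -1, phi)` needs Python 3.8 or later.

```python
from math import gcd

# Toy parameters, far too small for real use.
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)

e = 17
assert gcd(e, phi) == 1   # e and (p-1)(q-1) must be coprime

d = pow(e, -1, phi)       # d*e = 1 mod ((p-1)(q-1)); Python 3.8+


def encrypt(m):
    """c = m^e mod n, using the public key (n, e)."""
    return pow(m, e, n)


def decrypt(c):
    """m = c^d mod n, using the private key (n, d)."""
    return pow(c, d, n)


message = 65
cipher = encrypt(message)
print("n =", n, "e =", e, "d =", d)
print("cipher =", cipher, "decrypted =", decrypt(cipher))
```

Breaking the scheme amounts to recovering p and q from n, which is trivial here and is exactly what took about 15 million core-hours for the 768-bit challenge number.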
29
RSA factorization result
factorization of RSA768, the following 768-bit, 232-digit number from RSA's challenge list: = *