1
Multicore Processors (5)
Dezső Sima, Spring 2008 (Ver. 2.1)
2
10.3 IBM’s MC processors
POWER line
Cell BE
3
10.3 IBM’s MC processors
10.3.1 POWER line
POWER4    10/2001   180 nm
POWER4+   11/2002   130 nm
POWER5    5/2004    130 nm
POWER5+   10/2005   90 nm
POWER6    5/2007    65 nm
4
10.3.1 Evolution of IBM’s major RISC lines
Figure: The evolution of IBM’s major RISC lines
5
Figure: POWER4 chip logical view [3.6]
Figure labels: Service Processor; Core Interface Unit (crossbar); Power On Reset; Built-In Self-Test; Non-Cacheable Unit; Multi-Chip Module
6
POWER4 (2) Figure: Logical view of the L3 controller [3.5]
7
POWER4 (3) Figure: The memory controller of the POWER4 [3.5]
8
10.3.1 POWER4 (4) Figure: I/O controller of the POWER4 [3.5] Fabric
9
POWER4 (5) Figure: POWER4 chip [3.11]
10
10.3.1 POWER4 (6) Table: Main features of IBM’s dual-core POWER line
                      POWER4
Dual/Quad-Core        DC
Introduced            10/2001
Technology            180 nm
Die size              412 mm²
Nr. of transistors    174 mtrs
fc [GHz]              1.3
L2 size/allocation    1.44 MB/shared
L2 implementation     On-chip
L3 size               32 MB
L3 implementation     Tags on-chip, data off-chip
Mem. contr.           Off-chip
TDP [W]               115/125
Packaging             SCM(1)/MCM(2)
Dual threaded
Power management

(1) SCM: Single Chip Module
(2) MCM: Multi Chip Module
(3) DCM: Dual Chip Module
(4) DCM: Dual Core Module
(5) QCM: Quad Core Module
(6) DPM: Dynamic Power Management
11
Figure: New features of the POWER4+ [3.3]
12
10.3.1 POWER4+ (2) Table: Main features of IBM’s dual-core POWER line
                      POWER4                        POWER4+
Dual/Quad-Core        DC                            DC
Introduced            10/2001                       11/2002
Technology            180 nm                        130 nm
Die size              412 mm²                       380 mm²
Nr. of transistors    174 mtrs                      184 mtrs
fc [GHz]              1.3                           1.7
L2 size/allocation    1.44 MB/shared                1.5 MB/shared
L2 implementation     On-chip                       On-chip
L3 size               32 MB                         32 MB
L3 implementation     Tags on-chip, data off-chip   Tags on-chip
Mem. contr.           Off-chip                      On-chip
TDP [W]               115/125                       70
Packaging             SCM(1)/MCM(2)                 SCM(1)/MCM(2)
Dual threaded
Power management

(1) SCM: Single Chip Module
(2) MCM: Multi Chip Module
(3) DCM: Dual Chip Module
(4) DCM: Dual Core Module
(5) QCM: Quad Core Module
(6) DPM: Dynamic Power Management
13
Figure 5.14: Contrasting POWER4 and POWER5 system structures [3.1]
Figure label: Exclusive L3
14
Figure: Block diagram of the POWER5 (1) [3.1]
15
Figure: Block diagram of the POWER5 (2) [3.12]
16
Figure: Floorplan of the POWER5 [3.13]
17
POWER5 (6)
POWER4: 180 nm, 412 mm²
POWER5: 130 nm, 389 mm² (~3 % enlarged)
Figure: Contrasting the floor plans of the POWER4 and POWER5 dies [3.11], [3.13]
18
POWER5 (7)
Figure labels: POWER5+; Dual-Core Module
Figure: Packaging alternatives of the POWER4/5 processors
Source: Partridge R. and Ghatpande S., "IBM Introduces POWER5+ and Quad-Core Modules in System p5," Tech Trends Monthly, Nov./Dec. 2005
19
POWER5 (8)
Figure labels: POWER4 MCM photo; 32-way system showing 4 MCMs and L3 cache
Figure: Quad-chip POWER4 module (MCM) and a 32-way POWER4 system [3.7]
20
POWER5 (10) Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7]
21
Figure: The Multi-chip module of the POWER5 [3.10]
22
10.3.1 POWER5 (12) Table: Main features of IBM’s dual-core POWER line
                      POWER4                        POWER4+          POWER5
Dual/Quad-Core        DC                            DC               DC
Introduced            10/2001                       11/2002          5/2004
Technology            180 nm                        130 nm           130 nm
Die size              412 mm²                       380 mm²          389 mm²
Nr. of transistors    174 mtrs                      184 mtrs         276 mtrs
fc [GHz]              1.3                           1.7              1.65/1.9
L2 size/allocation    1.44 MB/shared                1.5 MB/shared    1.9 MB/shared
L2 implementation     On-chip                       On-chip          On-chip
L3 size               32 MB                         32 MB            36 MB
L3 implementation     Tags on-chip, data off-chip   Tags on-chip     Tags on-chip
Mem. contr.           Off-chip                      On-chip          On-chip
TDP [W]               115/125                       70               80 (est.)
Packaging             SCM(1)/MCM(2)                 SCM(1)/MCM(2)    DCM(3)/MCM(2)
Dual threaded
Power management                                                     DPM(6)

(1) SCM: Single Chip Module
(2) MCM: Multi Chip Module
(3) DCM: Dual Chip Module
(4) DCM: Dual Core Module
(5) QCM: Quad Core Module
(6) DPM: Dynamic Power Management
23
Figure: Block diagram of the POWER5+
Source: Vetter S. et al., "IBM System p5 Quad-Core Module Based on POWER5+ Technology," Redbooks paper, IBM Corp., 2006
24
POWER5 (9) Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCMs) of the POWER5 [3.7]
25
POWER5+ (2) Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCM) of the POWER5+ [3.14]
26
POWER5+ (3)
Table: Main features of IBM’s dual-core POWER line

                      POWER4                        POWER4+          POWER5           POWER5+
Dual/Quad-Core        DC                            DC               DC
Introduced            10/2001                       11/2002          5/2004           10/2005
Technology            180 nm                        130 nm           130 nm           90 nm
Die size              412 mm²                       380 mm²          389 mm²          230 mm²
Nr. of transistors    174 mtrs                      184 mtrs         276 mtrs
fc [GHz]              1.3                           1.7              1.65/1.9         1.92
L2 size/allocation    1.44 MB/shared                1.5 MB/shared    1.9 MB/shared
L2 implementation     On-chip                       On-chip          On-chip
L3 size               32 MB                         32 MB            36 MB
L3 implementation     Tags on-chip, data off-chip   Tags on-chip     Tags on-chip
Mem. contr.           Off-chip                      On-chip          On-chip
TDP [W]               115/125                       70               80 (est.)
Packaging             SCM(1)/MCM(2)                 SCM(1)/MCM(2)    DCM(3)/MCM(2)    DCM(4)/QCM(5)
Dual threaded
Power management                                                     DPM(6)

(1) SCM: Single Chip Module
(2) MCM: Multi Chip Module
(3) DCM: Dual Chip Module
(4) DCM: Dual Core Module
(5) QCM: Quad Core Module
(6) DPM: Dynamic Power Management
27
10.3.1 POWER6 (1) POWER6’s main features [3.15b]
- ultra-high frequency (4.7 GHz)
- dual core, dual threaded (SMT)
- 13 FO4 design
- private 4 MB L2 caches
- partially integrated 32 MB L3 victim cache
- minimization of excessive circuitry to reduce dissipation (modest speculation and out-of-order execution, no renaming)
- many functions of decoding and instruction grouping are pushed into predecoding (4 stages); each added stage of L2 latency costs about 0.5 % in performance, whereas each added stage after the I-cache access costs about 1 %
- increased dispatch and completion bandwidth (to 7 instructions per thread)
- the L2 cache, the SMP interconnect, and parts of the memory and I/O subsystem operate at 0.5 fc, the L3 operates at one-quarter fc, and the memory controller at up to 3.2 GHz (in the POWER5 the L2 operates at fc, the remaining components at 0.5 fc)
- since the L2 operates at 0.5 fc, the width of the load and store interfaces was doubled
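The clock ratios and per-stage loss figures listed above lend themselves to a quick back-of-the-envelope check. The following is only a sketch: the ratios (0.5 fc, one-quarter fc) and the 0.5 % / 1 % loss figures are taken from the slide, while the function names and the simple additive loss model are assumptions made for illustration.

```python
# Illustrative arithmetic only. The ratios (0.5 fc, 0.25 fc) and the loss
# figures (0.5 % and 1 % per stage) come from the slide; the function and
# variable names are hypothetical, and the additive loss model is an assumption.

CORE_GHZ = 4.7  # POWER6 core frequency fc


def clock_domains(fc_ghz):
    """Return the clock of each POWER6 domain, derived from the core clock."""
    return {
        "core": fc_ghz,
        "L2 / SMP interconnect": 0.5 * fc_ghz,
        "L3": 0.25 * fc_ghz,
    }


def stage_loss_percent(stages, loss_per_stage):
    """Additive estimate: every extra pipeline stage costs a fixed percentage."""
    return stages * loss_per_stage


domains = clock_domains(CORE_GHZ)

# Four stages of work placed in predecode (before the I-cache) versus the
# same four stages placed after the I-cache access.
predecode_loss = stage_loss_percent(4, 0.5)
post_icache_loss = stage_loss_percent(4, 1.0)

# Halving the L2 clock while doubling the interface width keeps the
# load/store bandwidth per core cycle unchanged.
relative_bandwidth = 2 * 0.5

for name, ghz in domains.items():
    print(f"{name}: {ghz:.3f} GHz")
print(f"predecode: ~{predecode_loss} % loss, post-I-cache: ~{post_icache_loss} % loss")
print(f"relative load/store bandwidth: {relative_bandwidth}")
```

Under these assumptions, doing the work in predecode costs roughly half as much performance as adding the same number of stages after the I-cache access, which is consistent with the design choice described on the slide.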
28
POWER6 (2)
At its introduction, the POWER6 (in the IBM System p570) had the highest figures for SPECint2006, SPECfp2006, SPECjbb2005 (Java performance) and TPC-C (transaction performance).
29
POWER6 (3)
Figure labels: POWER5+; POWER6; Hardware support of decimal arithmetic
Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors [3.15a]
30
POWER6 (4) Figure: Comparing the POWER5 and POWER6 processors [3.15b]
31
POWER6 (5) Table: Throughput comparison POWER6 vs POWER5 [3.15b]
32
POWER6 (6) [3.15b]
33
POWER6 (7) Figure: The internal pipelines of the POWER6 and the POWER5 [3.15b]
34
POWER6 (8) Figure: First level nodal topology of the POWER6 vs POWER5 [3.15b]
35
POWER6 (9) Figure: Second level topology of the POWER5 vs POWER6 [3.15b]
36
POWER6 (10) Table: POWER6 processor functional signal I/O-pin comparison for various system types [3.15b]
37
POWER6 (11) Figure: Micrograph of the POWER6 [3.15b]
38
10.3.1 POWER6 (12) Table: Main features of IBM’s dual-core POWER line
                      POWER4                        POWER4+          POWER5           POWER5+          POWER6
Dual/Quad-Core        DC                            DC               DC                                DC
Introduced            10/2001                       11/2002          5/2004           10/2005          5/2007
Technology            180 nm                        130 nm           130 nm           90 nm            65 nm
Die size              412 mm²                       380 mm²          389 mm²          230 mm²          341 mm²
Nr. of transistors    174 mtrs                      184 mtrs         276 mtrs                          790 mtrs
fc [GHz]              1.3                           1.7              1.65/1.9         1.92             4.7
L2 size/allocation    1.44 MB/shared                1.5 MB/shared    1.9 MB/shared                     2*4 MB/private
L2 implementation     On-chip                       On-chip          On-chip                           On-chip
L3 size               32 MB                         32 MB            36 MB                             32 MB
L3 implementation     Tags on-chip, data off-chip   Tags on-chip     Tags on-chip
Mem. contr.           Off-chip                      On-chip          On-chip                           On-chip
TDP [W]               115/125                       70               80 (est.)                         ~100
Packaging             SCM(1)/MCM(2)                 SCM(1)/MCM(2)    DCM(3)/MCM(2)    DCM(4)/QCM(5)
Dual threaded
Power management                                                     DPM(6)                            n.a.

(1) SCM: Single Chip Module
(2) MCM: Multi Chip Module
(3) DCM: Dual Chip Module
(4) DCM: Dual Core Module
(5) QCM: Quad Core Module
(6) DPM: Dynamic Power Management
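To make the scaling across process generations in the table above easier to see, the transistor counts and die sizes can be turned into a density figure. This is a minimal sketch using only the values quoted in the table; the dictionary layout and variable names are illustrative, and POWER5+ is left out because the table gives no transistor count for it.

```python
# Values copied from the table: (million transistors, die size in mm^2).
# POWER5+ is omitted because the table lists no transistor count for it.
chips = {
    "POWER4": (174, 412),
    "POWER4+": (184, 380),
    "POWER5": (276, 389),
    "POWER6": (790, 341),
}

# Density in million transistors per mm^2.
density = {name: mtrs / area for name, (mtrs, area) in chips.items()}

for name, value in density.items():
    print(f"{name}: {value:.2f} Mtrs/mm^2")
```

The jump between the POWER5 (130 nm) and the POWER6 (65 nm) is the largest step in this series, as expected from a two-node shrink.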
39
10.3 IBM’s MC processors
10.3.2 Cell BE
Cell BE   2/2006   90 nm
40
Cell BE (1) Figure: The history and development cost of the Cell BE [3.17], [3.22]
41
10.3.2 Cell BE (2)
AUC: Atomic Update Cache
BIC: Bus Interface Controller
EIB: Element Interface Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Controller
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM
Figure: Block diagram of the Cell BE [3.19]
42
Design parameters of the Cell BE:
- PPE: dual-threaded
- > 200 GFLOPS (SP)
- > 20 GFLOPS (DP)
- > 25 GB/s memory BW
- > 75 GB/s I/O BW
- > 300 GB/s EIB BW
- fc > 4 GHz (lab)
Figure: Main design parameters of the Cell BE [3.28]
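The "> 200 GFLOPS (SP)" design parameter can be reproduced with simple arithmetic over the eight SPEs. The sketch below rests on assumptions that are not stated on the slide: each SPE processes four single-precision lanes per 128-bit SIMD instruction and can issue one fused multiply-add per lane per cycle, counted as two floating-point operations. The 3.2 GHz clock is the product frequency given in this section's Cell BE feature table; all names are illustrative.

```python
# Assumptions (not stated on the slide): each SPE has a 128-bit SIMD unit
# holding four single-precision lanes and can issue one fused multiply-add
# (counted as 2 floating-point operations) per lane per cycle.
SPES = 8               # number of SPEs in the Cell BE
SP_LANES = 4           # 128-bit register / 32-bit single precision
FLOPS_PER_FMA = 2      # multiply + add


def peak_sp_gflops(fc_ghz):
    """Peak single-precision GFLOPS of the SPE array at core clock fc_ghz."""
    return SPES * SP_LANES * FLOPS_PER_FMA * fc_ghz


shipping = peak_sp_gflops(3.2)   # product clock
lab = peak_sp_gflops(4.0)        # lab clock quoted as fc > 4 GHz

print(f"3.2 GHz: {shipping:.1f} GFLOPS, 4.0 GHz: {lab:.1f} GFLOPS")
```

Under these assumptions the SPE array alone exceeds the 200 GFLOPS target at 3.2 GHz; the PPE's contribution is ignored here.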
43
Cell BE (4)
Figure: Cell SPE architecture [3.16]
44
Cell BE (5) Figure: Block diagram of the SPE [3.19]
45
Cell BE (6) Figure: Pipeline stages of the Cell BE [3.19]
46
Cell BE (7) Figure: Floor plan of a single SPE [3.19]
47
Cell BE (8) Principle of operation of the Element Interface Bus (EIB) [3.23]
48
Cell BE (9) Figure: The Element Interface Bus (EIB) [3.19]
49
Cell BE (10) Figure: The Synergistic Memory Flow unit (SMF) [3.19]
50
Figure: PPE block diagram [3.28]
51
Cell BE (11)
Die size: 235 mm²; transistors: 241 mtrs
Figure: Floor plan of the Cell BE processor [3.19]
52
10.3.2 Cell BE (12)
Table: Main features of IBM’s Cell BE

Series                    Cell BE
Implementation            Heterogeneous, 1 x PPE, 8 x SPE
Architecture              PowerPC 2.02
Cores                     PPE: 64-bit RISC
                          SPE: dual-issue 32-bit SIMD with 128-bit capability
Introduction              9/2006 (in the QS20 BladeCenter)
Technology                90 nm
Die size                  221 mm²
Nr. of transistors        234 mtrs
fc [GHz]                  3.0/3.2
L2                        PPE: 512 KB
                          SPE: 256 KB Local Store (128*128 bit)
Memory bandwidth          25 GB/s
TDP [W]                   95 (at 3 GHz)
Multithreading            PPE: 2-way
                          SPE: -
I/O bandwidth             Up to 75 GB/s
Interconnection network   Ring based
Memory controller         On-chip
53
10.3.2 Cell BE (13) Figure: Cell BE Blade Roadmap
Source: Brochard L., "A Cell History," Cell Workshop, April 2006
54
10.3.2 Cell BE (14) Figure: Roadmap of the Cell BE
Source: Hofstee H. P., "Real-time Supercomputing and Technology for Games and Entertainment," 2006
55
10.3 Literature (1)
POWER4, POWER4+
[3.1] Barney B., "IBM POWER Systems Overview," Livermore Computing, 2006
[3.2] DeMone P., "Sizing Up the Super Heavyweights," Real World Technologies, Sept. 2004
[3.3] Grassl C., "New IBM Components for HPCx," Dec. 2003
[3.4] Krewell K., "IBM’s POWER4 Unveiling Continues," Microprocessor Report, Nov., pp. 1-4
[3.5] Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., "Power4 System Microarchitecture," IBM Server, Technical White Paper, October 2001
[3.6] Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., "Power4 System Microarchitecture," IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25
POWER5, POWER5+
[3.7] Barney B., "IBM POWER Systems Overview," Livermore Computing, 2006
[3.8] DeMone P., "Sizing Up the Super Heavyweights," Real World Technologies, Sept. 2004
[3.9] Grassl C., "New IBM Components for HPCx," Dec. 2003
[3.10] Kalla R., "IBM’s POWER5 Microprocessor Design and Methodology," 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs spring/lectures/Lecture22-RonKallaIBM.pdf
56
10.3 Literature (2)
POWER5, POWER5+ (cont.)
[3.11] Kalla R., Sinharoy B., Tendler J., "Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor," 2003
[3.12] Krewell K., "POWER5 Tops on Bandwidth," Microprocessor Report, Dec
[3.13] Sinharoy B., Kalla R.N., Tendler J.M., Eickemeyer R.J., Joyner J.B., "POWER5 system microarchitecture," IBM J. R&D, Vol. 49, No. 4/5, 2005, pp
[3.14] Vetter S. et al., "IBM System p5 Quad-Core Module Based on POWER5+ Technology," Redbooks paper, IBM Corp., 2006
POWER6
[3.15a] Kanter D., "IBM Previews the Power6," Oct. 2006
[3.15b] Le H. Q. et al., "IBM POWER6 microarchitecture," IBM J. R&D, Vol. 51, No. 6, pp
Cell BE
[3.16] Blachford N., "Cell Architecture Explained Version 2"
[3.17] Brochard L., "A Cell History," Cell Workshop, April 2006
[3.18] Day M. and Hofstee P., "Hardware and Software Architectures for the Cell Broadband Engine processor," CODES, Sept. 2006
[3.19] Gschwind M., "Chip Multiprocessing and the Cell BE," ACM Computing Frontiers, 2006
57
10.3 Literature (3)
Cell BE (cont.)
[3.20] Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T., "Synergistic Processing in Cell's Multicore Architecture," IEEE Micro, Vol. 26, No. 2, 2006, pp
[3.21] Hofstee H. P., "Real-time Supercomputing and Technology for Games and Entertainment," 2006
[3.22] Hofstee H. P., "Cell today and tomorrow," 2005
[3.23] Keable C., "And we also have hardware...," 17th Machine Evaluation Workshop, Dec. 2006
[3.24] Krolak D., "Unleashing the Cell Broadband Engine Processor," MPR Fall Proc. Forum, Nov. 2005
[3.25] Krewell K., "Cell Moves Into The Limelight," Microprocessor Report, Feb., pp. 1-9
[3.26] Solie D., "Technology Trends Presentation," Power Symposium, Aug. 2006, file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/ 14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf
[3.27] "Cell Broadband Engine processor – based systems," White Paper, IBM Corp., 2006
[3.28] "Cell Architecture," Course Code L1T1H1-10, 2006, CellArchitecture.pdf