Multicore Processors (5)
Dezső Sima, Fall 2007 (Ver. 2.1)
10.3 IBM’s MC processors
POWER line
Cell BE
POWER line
POWER4: 180 nm, 10/2001
POWER4+: 11/2002
POWER5: 130 nm, 5/2004
POWER5+: 90 nm, 10/2005
POWER6: 65 nm, 5/2007
Figure: The evolution of IBM’s major RISC lines
POWER4 (1)
Figure: POWER4 chip logical view [3.6]
Figure labels: Built-In Self-Test, Service Processor, Power-On Reset, Core Interface Unit (crossbar), Non-Cacheable Unit, Multi-Chip Module
Figure: Logical view of the L3 controller [3.5] POWER4 (2)
Figure: The memory controller of the POWER4 [3.5] POWER4 (3)
Figure: I/O controller of the POWER4 [3.5] Fabric Controller POWER4 (4)
Figure: POWER4 chip [3.11] POWER4 (5)
POWER4 (6)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line: DC POWER4
Introduced: 10/2001
Technology: 180 nm
Die size: 412 mm²
Nr. of transistors: [...] mtrs
fc [GHz]: [...]
TDP [W]: 115/125
Packaging: SCM¹/MCM²
L2 size/allocation: 1.44 MB/shared
L2 implementation: On-chip
L3 size: 32 MB
L3 implementation: Tags on-chip, data off-chip
Mem. contr.: Off-chip
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
POWER4+ (1) Figure: New features of the POWER4+ [3.3]
POWER4+ (2)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line: DC POWER4 | DC POWER4+
Introduced: 10/2001 | 11/2002
Technology: 180 nm | [...]
Die size: 412 mm² | 380 mm²
Nr. of transistors: [...] mtrs | [...] mtrs
TDP [W]: 115/125 | [...]
Packaging: SCM¹/MCM² | SCM¹/MCM²
L2 size/allocation: 1.44 MB/shared | 1.5 MB/shared
L2 implementation: On-chip | On-chip
L3 size: 32 MB | 32 MB
L3 implementation: Tags on-chip, data off-chip | [...]
Mem. contr.: Off-chip | Off-chip
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
Figure 5.14: Contrasting POWER4 and POWER5 system structures [3.1] POWER5 (1) (Exclusive L3)
Figure: Block diagram of the POWER5 (1) [3.1] POWER5 (2)
Figure: Block diagram of the POWER5 (2) [3.12] POWER5 (3)
POWER5 (4) Figure: Floorplan of the POWER5 [3.13]
POWER5 (6)
POWER4: 180 nm, 412 mm²
POWER5: 130 nm, 389 mm² (~3 % enlarged)
Figure: Contrasting the floor plans of the POWER4 and POWER5 dies [3.11], [3.13]
POWER5 (7)
Figure: Packaging alternatives of the POWER4/5 processors
Figure label: POWER5+ Dual-Core Module
Source: Partridge R. and Ghatpande S., „IBM Introduces POWER5+ and Quad-Core Modules in System p5,” Tech Trends Monthly, Nov./Dec. 2005
POWER5 (8)
Figure: Quad-Chip POWER4 module (MCM) and a 32-way POWER4 system [3.7]
Panel titles: POWER4 MCM photo; 32-way system showing 4 MCMs and L3 cache
Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7] POWER5 (10)
Figure: The Multi-chip module of the POWER5 [3.10] POWER5 (11)
POWER5 (12)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line: DC POWER4 | DC POWER4+ | DC POWER5
Introduced: 10/2001 | 11/2002 | 5/2004
Technology: 180 nm | [...] | 130 nm
Die size: 412 mm² | 380 mm² | 389 mm²
Nr. of transistors: [...] mtrs | [...] mtrs | [...] mtrs
fc [GHz]: [...] | [...] | 1.65/[...]
TDP [W]: 115/125 | [...] | 80 (est)
Packaging: SCM¹/MCM² | SCM¹/MCM² | DCM³/MCM²
Power management: [...] | [...] | DPM⁶
L2 size/allocation: 1.44 MB/shared | 1.5 MB/shared | 1.9 MB/shared
L2 implementation: On-chip | On-chip | On-chip
L3 size: 32 MB | 32 MB | 36 MB
L3 implementation: Tags on-chip, data off-chip | [...] | Tags on-chip
Mem. contr.: Off-chip | Off-chip | On-chip
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
POWER5+ (1)
Figure: Block diagram of the POWER5+
Source: Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp. 2006
Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7] POWER5 (9)
Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCM) of the POWER5+ [3.14] POWER5+ (2)
POWER5+ (3)
Table: Main features of IBM’s dual-core POWER line
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
POWER6 (1)
POWER6’s main features [3.15b]
ultra-high frequency (4.7 GHz)
dual core, dual threaded (SMT)
13 FO4 design
private 4 MB L2 caches
partially integrated 32 MB L3 victim cache
minimization of excessive circuitry to reduce dissipation (modest speculation and out-of-order execution, no renaming)
many functions of decoding and instruction grouping are pushed into predecoding (4 stages); added L2 latency causes about 0.5 % loss for each stage, whereas each stage added after the I-cache access results in about 1 % loss per stage
increased dispatch and completion bandwidth (to 7 instructions per thread)
the L2 cache, the SMP interconnect, and parts of the memory and I/O subsystem operate at 0.5 fc, the L3 operates at one-quarter of fc, and the memory controller at up to 3.2 GHz (in the POWER5 the L2 operates at fc and the remaining components at 0.5 fc)
since the L2 operates at 0.5 fc, the width of the load and store interfaces was doubled
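The clock-domain ratios and the per-stage losses listed above can be checked with a little arithmetic. The following is a minimal sketch, assuming the 4.7 GHz core clock quoted on the slide. The function names and the 32-byte baseline interface width are hypothetical and are not taken from [3.15b], and adding the per-stage losses linearly is a simplifying assumption.

```python
# Illustrative arithmetic only; function names and the 32-byte baseline
# interface width are assumptions, not values from [3.15b].

def power6_clock_domains(fc_ghz):
    """Clock frequencies (GHz) of the domains named on the slide."""
    return {
        "core": fc_ghz,
        "l2": fc_ghz / 2,   # L2 and SMP interconnect run at 0.5 fc
        "l3": fc_ghz / 4,   # L3 runs at one-quarter of fc
    }

def peak_bandwidth_gbs(width_bytes, clock_ghz):
    """Peak bandwidth in GB/s for an interface moving width_bytes per cycle."""
    return width_bytes * clock_ghz

def added_stage_loss_percent(stages, loss_per_stage_percent):
    """Linear estimate of the performance loss from extra pipeline stages."""
    return stages * loss_per_stage_percent

fc = 4.7                      # GHz, core clock quoted on the slide
domains = power6_clock_domains(fc)

base_width = 32               # bytes per cycle, hypothetical baseline
at_full_clock = peak_bandwidth_gbs(base_width, domains["core"])
at_half_clock_doubled = peak_bandwidth_gbs(2 * base_width, domains["l2"])

predecode_loss = added_stage_loss_percent(4, 0.5)    # stages before the I-cache
post_icache_loss = added_stage_loss_percent(4, 1.0)  # same stages after it

print(domains)
print(at_full_clock, at_half_clock_doubled)
print(predecode_loss, post_icache_loss)
```

Under these assumptions, a doubled interface at half the clock has the same peak bandwidth as the narrower interface at full clock, and four predecode stages cost roughly 2 % where the same four stages after the I-cache access would cost roughly 4 %.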
POWER6 (2) At its introduction, the POWER6 (in the IBM System p570) had the highest figures for SPECint2006, SPECfp2006, SPECjbb2005 (Java performance) and TPC-C (transaction performance).
POWER6 (3)
Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors [3.15a]
Figure labels: POWER6, POWER5+, hardware support of decimal arithmetic
Figure: Comparing the POWER5 and POWER6 processors [3.15b] POWER6 (4)
Table: Throughput comparison POWER6 vs POWER5 [3.15b] POWER6 (5)
POWER6 (6) [3.15b]
Figure: The internal pipelines of the POWER6 and the POWER5 [3.15b] POWER6 (7)
Figure: First level nodal topology of the POWER6 vs POWER5 [3.15b] POWER6 (8)
Figure: Second level topology of the POWER5 vs POWER6 [3.15b] POWER6 (9)
Table: POWER6 processor functional signal I/O-pin comparison for various system types [3.15b] POWER6 (10)
POWER6 (11) Figure: Micrograph of the POWER6 [3.15b]
POWER6 (12)
Table: Main features of IBM’s dual-core POWER line
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
On-chip
10.3 IBM’s MC processors
Cell BE
Cell BE: 90 nm, 2/
Figure: The history and development cost of the Cell BE [3.17], [3.22] Cell BE (1)
Cell BE (2)
Figure: Block diagram of the Cell BE [3.19]
AUC: Atomic Update Cache
BIC: Bus Interface Controller
EIB: Element Interface Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Controller
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM
Cell BE (3)
Design parameters of the Cell BE:
PPE: dual-threaded
> 200 GFLOPS (SP)
> 20 GFLOPS (DP)
> 25 GB/s memory BW
> 75 GB/s I/O BW
> 300 GB/s EIB BW
fc > 4 GHz (lab)
Figure: Main design parameters of the Cell BE [3.28]
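The "> 200 GFLOPS (SP)" figure can be reproduced approximately from parameters quoted elsewhere in this section: 8 SPEs, 128-bit SIMD over 32-bit elements, and a 3.2 GHz clock. The sketch below is a rough estimate only. It assumes that each SIMD lane completes one fused multiply-add per cycle, counted as 2 FLOPs, which is not stated on the slide, and it ignores the PPE's contribution. The function name is hypothetical.

```python
def cell_peak_sp_gflops(n_spe=8, simd_bits=128, element_bits=32,
                        flops_per_lane_per_cycle=2, fc_ghz=3.2):
    """Rough peak single-precision GFLOPS of the SPEs alone.

    flops_per_lane_per_cycle=2 assumes one fused multiply-add per lane
    per cycle; this is an assumption, not a value given on the slide.
    """
    lanes = simd_bits // element_bits          # 128-bit SIMD -> 4 SP lanes
    return n_spe * lanes * flops_per_lane_per_cycle * fc_ghz

peak_32 = cell_peak_sp_gflops(fc_ghz=3.2)      # product clock from the table
peak_40 = cell_peak_sp_gflops(fc_ghz=4.0)      # lab clock quoted as "> 4 GHz"
print(peak_32, peak_40)
```

With these assumptions the estimate is about 204.8 GFLOPS at 3.2 GHz, which is consistent with the "> 200 GFLOPS (SP)" design parameter.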
Figure : Cell SPE architecture [3.16] Cell BE (4)
Figure: Block diagram of the SPE [3.19] Cell BE (5)
Figure: Pipeline stages of the Cell BE [3.19] Cell BE (6)
Figure: Floor plan of a single SPE [3.19] Cell BE (7)
Principle of operation of the Element Interface Bus (EIB) [3.23] Cell BE (8)
Figure: The Element Interface Bus (EIB) [3.19] Cell BE (9)
Figure: The Synergistic Memory Flow unit (SMF) [3.19] Cell BE (10)
Figure: PPE block diagram [3.28]
Figure: Floor plan of the Cell BE processor (235 mm²) [3.19] Cell BE (11)
Cell BE (12)
Table: Main features of IBM’s Cell BE
Series: Cell BE
Implementation: Heterogeneous, 1×PPE, 8×SPE
Architecture: PowerPC 2.02
Cores: PPE: 64-bit RISC; SPE: dual-issue 32-bit SIMD with 128-bit capability
Introduction: 9/2006 (in the QS20 BladeCenter)
Technology: 90 nm
Die size: 221 mm²
Nr. of transistors: 234 mtrs
fc [GHz]: 3.0/3.2
L2: PPE: 512 KB; SPE: 256 KB Local Store (128*128 bit)
L3: [...]
Memory bandwidth: 25 GB/s
TDP [W]: 95 (at 3 GHz)
Multithreading: PPE: 2-way; SPE: [...]
I/O bandwidth: Up to 75 GB/s
Interconnection network: Ring based
Memory controller: On-chip
Cell BE (13)
Figure: Cell BE Blade Roadmap
Source: Brochard L., „A Cell History,” Cell Workshop, April,
Cell BE (14)
Figure: Roadmap of the Cell BE
Source: Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006
10.3 Literature (1)
POWER4, POWER4+
[3.1] Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
[3.2] DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004
[3.3] Grassl C., „New IBM Components for HPCx,” Dec. 2003
[3.4] Krewell K., „IBM’s POWER4 Unveiling Continues,” Microprocessor Report, Nov , pp. 1-4
[3.5] Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM Server, Technical White Paper, October 2001
[3.6] Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25
POWER5, POWER5+
[3.7] Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
[3.8] DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004
[3.9] Grassl C., „New IBM Components for HPCx,” Dec. 2003
[3.10] Kalla R., „IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs spring/lectures/Lecture22-RonKallaIBM.pdf
Literature (2)
POWER5, POWER5+ (cont.)
[3.11] Kalla R., Sinharoy B., Tendler J., „Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor”
[3.12] Krewell K., „POWER5 Tops on Bandwidth,” Microprocessor Report, Dec
[3.13] Sinharoy B., Kalla R.N., Tendler J.M., Eickemeyer R.J., Joyner J.B., „POWER5 system microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp
[3.14] Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp. 2006
POWER6
[3.15a] Kanter D., „IBM Previews the Power6,” Oct. 2006
[3.15b] Le H. Q. et al., „IBM POWER6 microarchitecture,” IBM J. R&D, Vol. 51, No. 6, pp
Cell BE
[3.16] Blachford N., „Cell Architecture Explained Version 2”
[3.17] Brochard L., „A Cell History,” Cell Workshop, April,
[3.18] Day M. and Hofstee P., „Hardware and Software Architectures for the Cell Broadband Engine processor,” CODES, Sept. 2006
[3.19] Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
10.3 Literature (3)
Cell BE (cont.)
[3.20] Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T., „Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, Vol. 26, No. 2, 2006, pp
[3.21] Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006
[3.22] Hofstee H. P., „Cell today and tomorrow,” 2005
[3.23] Keable C., „And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006
[3.24] Krolak D., „Unleashing the Cell Broadband Engine Processor,” MPR Fall Proc. Forum, Nov. 2005
[3.25] Krewell K., „Cell Moves Into The Limelight,” Microprocessor Report, Febr , pp. 1-9
[3.26] Solie D., „Technology Trends Presentation,” Power Symposium, Aug. 2006, file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/ 14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf
[3.27] - „Cell Broadband Engine processor – based systems,” White Paper, IBM Corp., 2006
[3.28] - „Cell Architecture,” Course Code L1T1H1-10, 2006, CellArchitecture.pdf