III. Multicore Processors (5)
Dezső Sima
Spring 2007 (Ver. 2.1)
10.3 IBM’s MC processors
POWER line, Cell BE
POWER line: POWER4 (180 nm, 10/2001), POWER4+ (130 nm, 11/2002), POWER5 (130 nm, 5/2004), POWER5+ (90 nm, 10/2005), POWER6 (65 nm, 2007)
Figure: The evolution of IBM’s major RISC lines
POWER4 (1)
Figure: POWER4 chip logical view [3.6]
Figure labels: Built-In Self-Test, Service Processor, Power On Reset, Core Interface Unit (crossbar), Non-Cacheable Unit, Multi-Chip Module
POWER4 (2)
Figure: Logical view of the L3 controller [3.5]
POWER4 (3)
Figure: The memory controller of the POWER4 [3.5]
POWER4 (4)
Figure: I/O controller of the POWER4 [3.5] (figure label: Fabric Controller)
POWER4 (5)
Figure: POWER4 chip [3.11]
POWER4 (6)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line | POWER4
Implementation | DC
Introduced | 10/2001
Technology | 180 nm
Die size | 412 mm²
Nr. of transistors | …
f_c [GHz] | …
TDP [W] | 115/125
Packaging | SCM¹/MCM²
Dual threaded | no
Power management | –
L2 size/allocation | 1.44 MB/shared
L2 implementation | On-chip
L3 size | 32 MB
L3 implementation | Tags on-chip, data off-chip
Memory controller | Off-chip
¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management
POWER4+ (1)
Figure: New features of the POWER4+ [3.3]
POWER4+ (2)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line | POWER4 | POWER4+
Implementation | DC | DC
Introduced | 10/2001 | 11/2002
Technology | 180 nm | 130 nm
Die size | 412 mm² | 380 mm²
Nr. of transistors | … | …
f_c [GHz] | … | …
TDP [W] | 115/125 | …
Packaging | SCM¹/MCM² | SCM¹/MCM²
Dual threaded | no | no
Power management | – | –
L2 size/allocation | 1.44 MB/shared | 1.5 MB/shared
L2 implementation | On-chip | On-chip
L3 size | 32 MB | 32 MB
L3 implementation | Tags on-chip, data off-chip | Tags on-chip, data off-chip
Memory controller | Off-chip | Off-chip
¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management
POWER5 (1)
Figure 5.14: Contrasting POWER4 and POWER5 system structures [3.1]
POWER5 (2)
Figure: Block diagram of the POWER5 (1) [3.1]
POWER5 (3)
Figure: Block diagram of the POWER5 (2) [3.12]
POWER5 (4)
Figure: Floor plan of the POWER5 [3.13]
POWER5 (6)
Figure: Contrasting the floor plans of the POWER4 and POWER5 dies [3.11], [3.13]
POWER4: 180 nm, 412 mm²; POWER5: 130 nm, 389 mm² (~3 % enlarged)
POWER5 (7)
Figure: Packaging alternatives of the POWER4/5 processors (figure label: POWER5+ Dual-Core Module)
Source: Partridge R. and Ghatpande S., “IBM Introduces POWER5+ and Quad-Core Modules in System p5,” Tech Trends Monthly, Nov./Dec. 2005
POWER5 (8)
Figure: Quad-Chip POWER4 module (MCM) and a 32-way POWER4 system [3.7]
Panels: POWER4 MCM photo; 32-way system showing 4 MCMs and L3 cache
POWER5 (9)
Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCMs) of the POWER5 [3.7]
POWER5 (10)
Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCMs) of the POWER5 [3.7]
POWER5 (11)
Figure: The Multi-Chip Module of the POWER5 [3.10]
POWER5 (12)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line | POWER4 | POWER4+ | POWER5
Implementation | DC | DC | DC
Introduced | 10/2001 | 11/2002 | 5/2004
Technology | 180 nm | 130 nm | 130 nm
Die size | 412 mm² | 380 mm² | 389 mm²
Nr. of transistors | … | … | …
f_c [GHz] | … | … | 1.65/…
TDP [W] | 115/125 | … | 80 (est.)
Packaging | SCM¹/MCM² | SCM¹/MCM² | DCM³/MCM²
Dual threaded | no | no | yes
Power management | – | – | DPM⁶
L2 size/allocation | 1.44 MB/shared | 1.5 MB/shared | 1.9 MB/shared
L2 implementation | On-chip | On-chip | On-chip
L3 size | 32 MB | 32 MB | 36 MB
L3 implementation | Tags on-chip, data off-chip | Tags on-chip, data off-chip | Tags on-chip
Memory controller | Off-chip | Off-chip | On-chip
¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management
POWER5+ (1)
Figure: Block diagram of the POWER5+
Source: Vetter S. et al., “IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006
POWER5+ (2)
Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCMs) of the POWER5+ [3.14]
POWER5+ (3)
Table: Main features of IBM’s dual-core POWER line
¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management
POWER6 (1)
Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors [3.15]
Panels: POWER5+, POWER6. Highlighted in the POWER6: hardware support of decimal arithmetic.
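The decimal-arithmetic support highlighted for the POWER6 addresses a basic limitation of binary floating point: decimal fractions such as 0.1 have no exact binary representation. The Python sketch below shows the effect using the standard `decimal` module as a software stand-in; it says nothing about how the POWER6 unit is implemented. The 16-digit precision mirrors the IEEE 754 decimal64 format and is an illustrative choice, not a parameter taken from the slides.

```python
from decimal import Decimal, getcontext

# Binary floating point: 0.1 and 0.2 are stored as nearby binary fractions,
# so their sum is not exactly 0.3.
binary_sum = 0.1 + 0.2

# Decimal floating point (software model only, not the POWER6 hardware):
# digits are kept in base 10, so 0.1 + 0.2 is exactly 0.3.
getcontext().prec = 16  # 16 significant digits, as in IEEE 754 decimal64
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(binary_sum == 0.3)              # False
print(decimal_sum == Decimal("0.3"))  # True
print(binary_sum, decimal_sum)        # 0.30000000000000004 0.3
```

Commercial and financial code often needs results that round exactly as decimal hand calculation does; doing this in software is slow, which is the case for a hardware decimal unit.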
POWER6 (2)
Table: Main features of IBM’s dual-core POWER line
¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management
10.3 IBM’s MC processors: Cell BE
Cell BE: 90 nm, 2/…
Cell BE (1)
Figure: The history and development cost of the Cell BE [3.17], [3.22]
Cell BE (2)
Figure: Block diagram of the Cell BE [3.19]
Abbreviations:
AUC: Atomic Update Cache
BIC: Bus Interface Controller
EIB: Element Interconnect Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Controller
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM
Cell BE (3)
Figure: Main design parameters of the Cell BE [3.28]
Design parameters of the Cell BE: dual-threaded PPE; > 200 GFLOPS single-precision (SP) and > 20 GFLOPS double-precision (DP) peak performance; > 25 GB/s memory bandwidth; > 75 GB/s I/O bandwidth; > 300 GB/s EIB bandwidth; f_c > 4 GHz (in the lab).
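The SP and EIB figures above can be cross-checked with simple peak-rate arithmetic. The Python sketch below takes the 3.2 GHz clock, the 8 SPEs and the 128-bit SIMD width from the Cell BE feature table. The remaining parameters are assumptions not stated on these slides: each SPE completes one 4-wide single-precision fused multiply-add per cycle, and the EIB consists of 4 rings, each 16 bytes wide and carrying up to 3 concurrent transfers, clocked at half the core frequency.

```python
# Back-of-the-envelope peak figures for the Cell BE (theoretical, not sustained).
# ASSUMPTIONS (not from the slides): one 4-wide SP fused multiply-add per SPE
# per cycle; EIB = 4 rings x 3 concurrent transfers x 16 bytes at half clock.

F_CORE_GHZ = 3.2             # core clock (feature table: 3.0/3.2 GHz)
N_SPE = 8                    # 1 PPE + 8 SPEs
SIMD_LANES_SP = 128 // 32    # a 128-bit register holds four 32-bit floats
FLOPS_PER_FMA = 2            # a fused multiply-add counts as two operations


def spe_peak_sp_gflops(n_spe: int = N_SPE, f_ghz: float = F_CORE_GHZ) -> float:
    """Peak single-precision GFLOPS of all SPEs combined."""
    return n_spe * SIMD_LANES_SP * FLOPS_PER_FMA * f_ghz


def eib_peak_gbytes_per_s(rings: int = 4, transfers_per_ring: int = 3,
                          bytes_per_transfer: int = 16,
                          f_ghz: float = F_CORE_GHZ) -> float:
    """Peak EIB data rate in GB/s, with the bus running at half the core clock."""
    bus_ghz = f_ghz / 2
    return rings * transfers_per_ring * bytes_per_transfer * bus_ghz


print(f"SPE peak (SP): {spe_peak_sp_gflops():.1f} GFLOPS")  # prints 204.8 GFLOPS
print(f"EIB peak:      {eib_peak_gbytes_per_s():.1f} GB/s")  # prints 307.2 GB/s
```

Under these assumptions the results, 204.8 GFLOPS and 307.2 GB/s, agree with the "> 200 GFLOPS (SP)" and "> 300 GB/s EIB BW" parameters. Both are upper bounds that real workloads do not sustain.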
Cell BE (4)
Figure: Cell SPE architecture [3.16]
Cell BE (5)
Figure: Block diagram of the SPE [3.19]
Cell BE (6)
Figure: Pipeline stages of the Cell BE [3.19]
Cell BE (7)
Figure: Floor plan of a single SPE [3.19]
Cell BE (8)
Figure: Principle of operation of the Element Interconnect Bus (EIB) [3.23]
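Because the EIB connects the bus elements through rings, a transfer can in principle travel in either direction. The Python sketch below illustrates only this routing idea, under assumptions not taken from the slides: 12 bus elements (PPE, 8 SPEs, the memory interface controller and two I/O interfaces) at hypothetical positions 0..11, and each transfer taking the direction that needs fewer hops. The function `route` and the numbering are illustrative, not IBM's layout or arbitration logic.

```python
# Sketch: picking the shorter direction on a ring interconnect such as the EIB.
# ASSUMPTIONS: 12 elements at hypothetical positions 0..11 around the ring;
# a transfer goes whichever way needs fewer hops, so no transfer travels
# more than halfway around the ring.

N_ELEMENTS = 12  # assumed: PPE, 8 SPEs, MIC and two I/O interfaces


def route(src: int, dst: int, n: int = N_ELEMENTS) -> tuple[str, int]:
    """Return (direction, hop count) of the shorter path from src to dst."""
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("element index out of range")
    cw = (dst - src) % n   # hops when going clockwise
    ccw = (src - dst) % n  # hops when going counter-clockwise
    return ("clockwise", cw) if cw <= ccw else ("counter-clockwise", ccw)


# No pair of elements is more than n // 2 hops apart.
assert max(route(s, d)[1]
           for s in range(N_ELEMENTS)
           for d in range(N_ELEMENTS)) == N_ELEMENTS // 2

print(route(0, 3))   # ('clockwise', 3)
print(route(0, 10))  # ('counter-clockwise', 2)
```

Halving the worst-case distance is the main benefit of rings running in both directions: with one direction only, element 0 would need 10 hops to reach element 10.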
Cell BE (9)
Figure: The Element Interconnect Bus (EIB) [3.19]
Cell BE (10)
Figure: The Synergistic Memory Flow unit (SMF) [3.19]
Figure: PPE block diagram [3.28]
Cell BE (11)
Figure: Floor plan of the Cell BE processor [3.19] (235 mm², … mtrs)
Cell BE (12)
Table: Main features of IBM’s Cell BE
Series | Cell BE
Implementation | Heterogeneous, 1× PPE, 8× SPE
Architecture | PowerPC 2.02
Cores | PPE: 64-bit RISC; SPE: dual-issue 32-bit SIMD with 128-bit capability
Introduction | 9/2006 (in the QS20 BladeCenter)
Technology | 90 nm
Die size | 221 mm²
Nr. of transistors | 234 mtrs
f_c [GHz] | 3.0/3.2
TDP [W] | 95 (at 3 GHz)
Multithreading | PPE: 2-way; SPE: –
L2 | PPE: 512 KB; SPE: 256 KB Local Store (128 × 128 bit)
L3 | –
Memory bandwidth | 25 GB/s
Memory controller | On-chip
Interconnection network | Ring based
I/O bandwidth | Up to 75 GB/s
Cell BE (13)
Figure: Cell BE Blade Roadmap
Source: Brochard L., “A Cell History,” Cell Workshop, April
Cell BE (14)
Figure: Roadmap of the Cell BE
Source: Hofstee H. P., “Real-time Supercomputing and Technology for Games and Entertainment,” 2006
10.3 Literature (1)
POWER4, POWER4+
[3.1] Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006.
[3.2] DeMone P., “Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004.
[3.3] Grassl C., “New IBM Components for HPCx,” Dec. 2003.
[3.4] Krewell K., “IBM’s POWER4 Unveiling Continues,” Microprocessor Report, Nov., pp. 1-4.
[3.5] Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., “POWER4 System Microarchitecture,” IBM Server, Technical White Paper, Oct. 2001.
[3.6] Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., “POWER4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25.
POWER5, POWER5+
[3.7] Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006.
[3.8] DeMone P., “Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004.
[3.9] Grassl C., “New IBM Components for HPCx,” Dec. 2003.
[3.10] Kalla R., “IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs spring/lectures/Lecture22-RonKallaIBM.pdf
10.3 Literature (2)
POWER5, POWER5+ (cont.)
[3.11] Kalla R., Sinharoy B., Tendler J., “Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor.”
[3.12] Krewell K., “POWER5 Tops on Bandwidth,” Microprocessor Report, Dec.
[3.13] Sinharoy B., Kalla R.N., Tendler J.M., Eickemeyer R.J., Joyner J.B., “POWER5 System Microarchitecture,” IBM J. Res. & Dev., Vol. 49, No. 4/5, 2005.
[3.14] Vetter S. et al., “IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006.
POWER6
[3.15] Kanter D., “IBM Previews the Power6,” Oct. 2006.
Cell BE
[3.16] Blachford N., “Cell Architecture Explained Version 2.”
[3.17] Brochard L., “A Cell History,” Cell Workshop, April.
[3.18] Day M. and Hofstee P., “Hardware and Software Architectures for the Cell Broadband Engine Processor,” CODES, Sept. 2006.
[3.19] Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006.
10.3 Literature (3)
Cell BE (cont.)
[3.20] Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T., “Synergistic Processing in Cell’s Multicore Architecture,” IEEE Micro, Vol. 26, No. 2, 2006.
[3.21] Hofstee H. P., “Real-time Supercomputing and Technology for Games and Entertainment,” 2006.
[3.22] Hofstee H. P., “Cell today and tomorrow,” 2005.
[3.23] Keable C., “And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006.
[3.24] Krolak D., “Unleashing the Cell Broadband Engine Processor,” MPR Fall Processor Forum, Nov. 2005.
[3.25] Krewell K., “Cell Moves Into The Limelight,” Microprocessor Report, Feb., pp. 1-9.
[3.26] Solie D., “Technology Trends Presentation,” Power Symposium, Aug. 2006, file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf
[3.27] “Cell Broadband Engine processor-based systems,” White Paper, IBM Corp., 2006.
[3.28] “Cell Architecture,” Course Code L1T1H1-10, 2006, CellArchitecture.pdf