III. Multicore Processors (5)
Dezső Sima
Spring 2007 (Ver. 2.0)
Dezső Sima, 2007
10.3 IBM’s MC processors
POWER line
Cell BE
Figure: The evolution of IBM’s major RISC lines (POWER line)
POWER4: 180 nm, 10/2001
POWER4+: 130 nm, 11/2002
POWER5: 130 nm, 5/2004
POWER5+: 90 nm, 10/2005
POWER6: 65 nm, 2007
POWER4 (1)
Figure: POWER4 chip logical view (labels: Built-In Self-Test, Service Processor, Power On Reset, Core Interface Unit (crossbar), Non-Cacheable Unit, Multi-Chip Module)
Source: Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25
POWER4 (2)
Figure: Logical view of the L3 controller
Source: „Power4 System Microarchitecture,” Technical White Paper, IBM Corp., 2001
POWER4 (3)
Figure: The memory controller of the POWER4
Source: „Power4 System Microarchitecture,” Technical White Paper, IBM Corp., 2001
POWER4 (4)
Figure: I/O controller of the POWER4 (label: Fabric Controller)
Source: „Power4 System Microarchitecture,” Technical White Paper, IBM Corp., 2001
POWER4 (5)
Figure: POWER4 chip
Source: Kalla R., Sinharoy B., Tendler J., „Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor”
POWER4 (6)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line | POWER4
Introduced | 10/2001
Technology | 180 nm
Die size | 412 mm²
Nr. of transistors | 
f_c [GHz] | 
Implementation | DC
TDP [W] | 115/125
Packaging | SCM¹/MCM²
Dual threaded | 
Power management | 
L2 size/allocation | 1.44 MB/shared
L2 implementation | On-chip
L3 size | 32 MB
L3 implementation | Tags on-chip, data off-chip
Memory controller | Off-chip
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
POWER4+ (1)
Figure: New features of the POWER4+
Source: Grassl C., „New IBM Components for HPCx,” Dec. 2003
POWER4+ (2)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line | POWER4 | POWER4+
Introduced | 10/2001 | 11/2002
Technology | 180 nm | 130 nm
Die size | 412 mm² | 380 mm²
Nr. of transistors | | 
f_c [GHz] | | 
Implementation | DC | DC
TDP [W] | 115/125 | 
Packaging | SCM¹/MCM² | SCM¹/MCM²
Dual threaded | | 
Power management | | 
L2 size/allocation | 1.44 MB/shared | 1.5 MB/shared
L2 implementation | On-chip | On-chip
L3 size | 32 MB | 32 MB
L3 implementation | Tags on-chip, data off-chip | Tags on-chip
Memory controller | Off-chip | Off-chip
POWER5 (1)
Figure 5.14: Contrasting POWER4 and POWER5 system structures
Source: Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
POWER5 (2)
Figure: Block diagram of the POWER5 (part 1)
Source: Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
POWER5 (3)
Figure: Block diagram of the POWER5 (part 2)
POWER5 (4)
Figure: Floor plan of the POWER5
Source: Sinharoy B., Kalla R.N., Tendler J.M., Eickemeyer R.J., Joyner J.B., „POWER5 System Microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp.
POWER5 (6)
Figure: Contrasting the floor plans of the POWER4 and POWER5 dies
POWER4: 180 nm, 412 mm²; POWER5: 130 nm, 389 mm² (enlarged)
Sources: Sinharoy B., Kalla R.N., Tendler J.M., Eickemeyer R.J., Joyner J.B., „POWER5 System Microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp.; Kalla R., Sinharoy B., Tendler J., „Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor,” 2003
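A rough worked estimate puts the two die sizes in proportion. It assumes ideal linear scaling from 180 nm to 130 nm, so area shrinks by the square of the feature-size ratio; this is an idealization, and real process shrinks rarely achieve it:

```latex
A_{\text{POWER4 at 130 nm}} \approx 412\,\text{mm}^2 \cdot \left(\frac{130}{180}\right)^2 \approx 412\,\text{mm}^2 \cdot 0.52 \approx 215\,\text{mm}^2
\qquad
\frac{389\,\text{mm}^2}{215\,\text{mm}^2} \approx 1.8
```

Under this assumption, the 389 mm² POWER5 die has roughly 1.8 times the area of an ideally shrunk POWER4. That leaves room for additions listed in the feature table, such as the larger L2 and the on-chip memory controller.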
POWER5 (7)
Figure: Packaging alternatives of the POWER4/5 processors (label: POWER5+ Dual-Core Module)
Source: Partridge R. and Ghatpande S., „IBM Introduces POWER5+ and Quad-Core Modules in System p5,” Tech Trends Monthly, Nov./Dec. 2005
POWER5 (8)
Figure: Quad-chip POWER4 module (MCM) and a 32-way POWER4 system (panels: POWER4 MCM photo; 32-way system showing 4 MCMs and L3 cache)
Source: Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
POWER5 (9)
Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCMs) of the POWER5
Source: Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
POWER5 (10)
Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCMs) of the POWER5
Source: Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
POWER5 (11)
Figure: The multi-chip module of the POWER5
Source: Kalla R., „IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs spring/lectures/Lecture22-RonKallaIBM.pdf
POWER5 (12)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line | POWER4 | POWER4+ | POWER5
Introduced | 10/2001 | 11/2002 | 5/2004
Technology | 180 nm | 130 nm | 130 nm
Die size | 412 mm² | 380 mm² | 389 mm²
Nr. of transistors | | | 
f_c [GHz] | | | 1.65/…
Implementation | DC | DC | DC
TDP [W] | 115/125 | | 80 (est.)
Packaging | SCM¹/MCM² | SCM¹/MCM² | DCM³/MCM²
Dual threaded | | | yes
Power management | | | DPM⁶
L2 size/allocation | 1.44 MB/shared | 1.5 MB/shared | 1.9 MB/shared
L2 implementation | On-chip | On-chip | On-chip
L3 size | 32 MB | 32 MB | 36 MB
L3 implementation | Tags on-chip, data off-chip | Tags on-chip | Tags on-chip
Memory controller | Off-chip | Off-chip | On-chip
POWER5+ (1)
Figure: Block diagram of the POWER5+
Source: Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006
POWER5+ (2)
Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCMs) of the POWER5+
Source: Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006
POWER5+ (3)
Table: Main features of IBM’s dual-core POWER line
POWER6 (1)
Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors (panels: POWER5+, POWER6; annotation: hardware support of decimal arithmetic)
Source: Kanter D., „IBM Previews the Power6,” Oct. 2006
POWER6 (2)
Table: Main features of IBM’s dual-core POWER line
10.3 IBM’s MC processors
Cell BE: 90 nm, 2/…
Cell BE (1)
Figure: The history and development cost of the Cell BE
Sources: Brochard L., „A Cell History,” Cell Workshop, April; Hofstee H. P., „Cell today and tomorrow,” 2005
Cell BE (2)
Figure: Block diagram of the Cell BE
AUC: Atomic Update Cache
BIC: Bus Interface Controller
EIB: Element Interconnect Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Controller
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
Cell BE (3)
Design parameters of the Cell BE:
- PPE: dual-threaded
- > 200 GFLOPS (SP)
- > 20 GFLOPS (DP)
- > 25 GB/s memory bandwidth
- > 75 GB/s I/O bandwidth
- > 300 GB/s EIB bandwidth
- f_c > 4 GHz (lab)
Figure: Main design parameters of the Cell BE
Source: IBM, „Cell Broadband Engine Overview,” Course Code L1T1H1-02, May 2006, publib.boulder.ibm.com/.../stgv1r0/topic/com.ibm.iea.cbe/cbe/1.0/Overview/L1T1H1_02_CellOverview.pdf
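The > 200 GFLOPS (SP) figure can be checked with a back-of-the-envelope calculation. The sketch assumes a clock of 3.2 GHz (the upper f_c value in the Cell BE feature table), counts only the eight SPEs, and assumes each SPE completes one single-precision fused multiply-add per cycle on a 128-bit vector, i.e. 128/32 = 4 lanes with 2 FLOPs per lane:

```latex
P_{\text{peak,SP}} = N_{\text{SPE}} \cdot n_{\text{lanes}} \cdot 2\,\frac{\text{FLOP}}{\text{FMA}} \cdot f_c
= 8 \cdot 4 \cdot 2 \cdot 3.2\,\text{GHz} = 204.8\ \text{GFLOPS}
```

Counting the PPE’s own vector unit as well would push the total above this SPE-only value.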
Cell BE (4)
Figure 5.16: Cell SPE architecture
Source: Blachford N., „Cell Architecture Explained Version 2”
Cell BE (5)
Figure: Block diagram of the SPE
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
Cell BE (6)
Figure: Pipeline stages of the Cell BE
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
Cell BE (7)
Figure: Floor plan of a single SPE
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
Cell BE (8)
Figure: Principle of operation of the Element Interconnect Bus (EIB)
Source: Keable C., „And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006
Cell BE (9)
Figure: The Element Interconnect Bus (EIB)
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
Cell BE (10)
Figure: The Synergistic Memory Flow unit (SMF)
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
Cell BE (11)
Figure: Floor plan of the Cell BE processor (annotation: 235 mm²)
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
Cell BE (12)
Table: Main features of IBM’s Cell BE
Series | Cell BE
Implementation | Heterogeneous: 1×PPE, 8×SPE
Architecture | PowerPC 2.02
Cores | PPE: 64-bit RISC; SPE: dual-issue 32-bit SIMD with 128-bit capability
Introduction | 9/2006 (in the QS20 BladeCenter)
Technology | 90 nm
Die size | 221 mm²
Nr. of transistors | 234 mtrs
f_c [GHz] | 3.0/3.2
L2 | PPE: 512 KB; SPE: 256 KB Local Store (128×128 bit)
Memory bandwidth | 25 GB/s
TDP [W] | 95 (at 3 GHz)
Multithreading | PPE: 2-way; SPE: –
I/O bandwidth | up to 75 GB/s
Interconnection network | Ring based
Memory controller | On-chip
L3 | –
Cell BE (13)
Figure: Cell BE Blade Roadmap
Source: Brochard L., „A Cell History,” Cell Workshop, April
Cell BE (14)
Figure: Roadmap of the Cell BE
Source: Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006
10.3 Literature (1)
POWER4, POWER4+
Grassl C., „New IBM Components for HPCx,” Dec. 2003
Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004
Krewell K., „IBM’s POWER4 Unveiling Continues,” Microprocessor Report, Nov., pp. 1-4
Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM Server, Technical White Paper, Oct. 2001
POWER5, POWER5+
Grassl C., „New IBM Components for HPCx,” Dec. 2003
Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004
Kalla R., „IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs spring/lectures/Lecture22-RonKallaIBM.pdf
Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25
10.3 Literature (2)
POWER5, POWER5+ (cont.)
Kalla R., Sinharoy B., Tendler J., „Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor”
Krewell K., „POWER5 Tops on Bandwidth,” Microprocessor Report, Dec.
Sinharoy B., Kalla R.N., Tendler J.M., Eickemeyer R.J., Joyner J.B., „POWER5 System Microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp.
Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006
POWER6
Kanter D., „IBM Previews the Power6,” Oct. 2006
Cell BE
Brochard L., „A Cell History,” Cell Workshop, April
Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006
Blachford N., „Cell Architecture Explained Version 2”
Day M. and Hofstee P., „Hardware and Software Architectures for the Cell Broadband Engine processor,” CODES, Sept. 2006
10.3 Literature (3)
Cell BE (cont.)
Keable C., „And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006
Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006
Solie D., „Technology Trends Presentation,” Power Symposium, Aug. 2006, file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/ 14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf
„Cell Broadband Engine processor-based systems,” White Paper, IBM Corp., 2006
Krewell K., „Cell Moves Into The Limelight,” Microprocessor Report, Feb., pp. 1-9
Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T., „Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, Vol. 26, No. 2, 2006, pp.
Krolak D., „Unleashing the Cell Broadband Engine Processor,” MPR Fall Processor Forum, Nov. 2005