Dezső Sima Fall 2007 (Ver. 2.1) © Dezső Sima, 2007 Multicore Processors (5)

1 Dezső Sima Fall 2007 (Ver. 2.1) © Dezső Sima, 2007 Multicore Processors (5)

2 10.3 IBM’s MC processors
10.3.1 POWER line
10.3.2 Cell BE

3 10.3.1 POWER line
POWER4: 180 nm, 10/2001
POWER4+: 130 nm, 11/2002
POWER5: 130 nm, 5/2004
POWER5+: 90 nm, 10/2005
POWER6: 65 nm, 5/2007

4 Figure: The evolution of IBM’s major RISC lines 10.3.1 Evolution of IBM’s major RISC lines

5 10.3.1 POWER4 (1) Figure: POWER4 chip logical view [3.6]
Legend: Built-In Self-Test, Service Processor, Power-On Reset, Core Interface Unit (crossbar), Non-Cacheable Unit, Multi-Chip Module

6 Figure: Logical view of the L3 controller [3.5] 10.3.1 POWER4 (2)

7 Figure: The memory controller of the POWER4 [3.5] 10.3.1 POWER4 (3)

8 Figure: I/O controller of the POWER4 [3.5] Fabric Controller 10.3.1 POWER4 (4)

9 Figure: POWER4 chip [3.11] 10.3.1 POWER4 (5)

10 10.3.1 POWER4 (6)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line   DC POWER4
Introduced                  10/2001
Technology                  180 nm
Die size                    412 mm²
Nr. of transistors          174 mtrs
fc [GHz]                    1.3
TDP [W]                     115/125
Packaging                   SCM¹/MCM²
Dual threaded
Power management
L2 size/allocation          1.44 MB/shared
L2 implementation           On-chip
L3 size                     32 MB
L3 implementation           Tags on-chip, data off-chip
Mem. contr.                 Off-chip
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management

11 10.3.1 POWER4+ (1) Figure: New features of the POWER4+ [3.3]

12 10.3.1 POWER4+ (2)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line   DC POWER4                    DC POWER4+
Introduced                  10/2001                      11/2002
Technology                  180 nm                       130 nm
Die size                    412 mm²                      380 mm²
Nr. of transistors          174 mtrs                     184 mtrs
fc [GHz]                    1.3                          1.7
TDP [W]                     115/125                      70
Packaging                   SCM¹/MCM²                    SCM¹/MCM²
Dual threaded
Power management
L2 size/allocation          1.44 MB/shared               1.5 MB/shared
L2 implementation           On-chip                      On-chip
L3 size                     32 MB                        32 MB
L3 implementation           Tags on-chip, data off-chip  Tags on-chip, data off-chip
Mem. contr.                 Off-chip                     Off-chip
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management

13 Figure 5.14: Contrasting POWER4 and POWER5 system structures [3.1] 10.3.1 POWER5 (1) (Exclusive L3)

14 Figure: Block diagram of the POWER5 (1) [3.1] 10.3.1 POWER5 (2)

15 Figure: Block diagram of the POWER5 (2) [3.12] 10.3.1 POWER5 (3)

16 10.3.1 POWER5 (4) Figure: Floorplan of the POWER5 [3.13]

17 10.3.1 POWER5 (6) Figure: Contrasting the floor plans of the POWER4 and POWER5 dies [3.11], [3.13]
POWER4: 180 nm, 412 mm²
POWER5: 130 nm, 389 mm² (~3 % enlarged)
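As a rough cross-check of the two die sizes on this slide, the sketch below estimates how large the POWER4 die would be after an ideal shrink from 180 nm to 130 nm. It assumes that die area scales with the square of the feature size; that scaling rule and the helper name `ideal_shrink_area` are illustrative assumptions, not something stated in the slides.

```python
def ideal_shrink_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Die area after an ideal shrink, assuming area scales with (feature size)^2."""
    return area_mm2 * (new_nm / old_nm) ** 2


POWER4_AREA_MM2 = 412.0  # 180 nm, from the slide
POWER5_AREA_MM2 = 389.0  # 130 nm, from the slide

# Hypothetical POWER4 die if it were simply shrunk to 130 nm.
shrunk_power4 = ideal_shrink_area(POWER4_AREA_MM2, 180.0, 130.0)

# How much larger the real POWER5 die is than that idealized shrink.
growth = POWER5_AREA_MM2 / shrunk_power4

print(f"ideally shrunk POWER4: {shrunk_power4:.1f} mm^2")
print(f"POWER5 die vs. shrunk POWER4: {growth:.2f}x")
```

Under this assumption the shrunk POWER4 would occupy about 215 mm², so the POWER5 die is roughly 1.8 times larger than a pure shrink. That is in line with the higher transistor count the feature table of this presentation lists for the POWER5.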

18 10.3.1 POWER5 (7) Figure: Packaging alternatives of the POWER4/5 processors (figure label: POWER5+ Dual-Core Module)
Source: Partridge R. and Ghatpande S., “IBM Introduces POWER5+ and Quad-Core Modules in System p5,” Tech Trends Monthly, Nov./Dec. 2005

19 10.3.1 POWER5 (8) Figure: Quad-Chip POWER4 module (MCM) and a 32-way POWER4 system [3.7]
Panel titles: POWER4 MCM Photo; 32-way System Showing 4 MCMs and L3 Cache

20 Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7] 10.3.1 POWER5 (10)

21 Figure: The Multi-chip module of the POWER5 [3.10] 10.3.1 POWER5 (11)

22 10.3.1 POWER5 (12)
Table: Main features of IBM’s dual-core POWER line
Dual/Quad-Core POWER line   DC POWER4                    DC POWER4+                   DC POWER5
Introduced                  10/2001                      11/2002                      5/2004
Technology                  180 nm                       130 nm                       130 nm
Die size                    412 mm²                      380 mm²                      389 mm²
Nr. of transistors          174 mtrs                     184 mtrs                     276 mtrs
fc [GHz]                    1.3                          1.7                          1.65/1.9
TDP [W]                     115/125                      70                           80 (est)
Packaging                   SCM¹/MCM²                    SCM¹/MCM²                    DCM³/MCM²
Dual threaded
Power management                                                                      DPM⁶
L2 size/allocation          1.44 MB/shared               1.5 MB/shared                1.9 MB/shared
L2 implementation           On-chip                      On-chip                      On-chip
L3 size                     32 MB                        32 MB                        36 MB
L3 implementation           Tags on-chip, data off-chip  Tags on-chip, data off-chip  Tags on-chip
Mem. contr.                 Off-chip                     Off-chip                     On-chip
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
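The transistor counts and die sizes in this table can be combined into a transistor density per generation. The short sketch below does that arithmetic with the table values; the density metric and the names `CHIPS` and `density` are chosen here for illustration and do not appear in the slides.

```python
# (name, transistors in millions, die size in mm^2), values as listed in the table
CHIPS = [
    ("POWER4", 174.0, 412.0),
    ("POWER4+", 184.0, 380.0),
    ("POWER5", 276.0, 389.0),
]


def density(mtrs: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return mtrs / area_mm2


densities = {name: density(mtrs, area) for name, mtrs, area in CHIPS}

for name, value in densities.items():
    print(f"{name}: {value:.3f} million transistors per mm^2")
```

With these inputs the density rises from about 0.42 (POWER4) to about 0.48 (POWER4+) and about 0.71 (POWER5) million transistors per mm². Since the POWER4+ and the POWER5 are both listed as 130 nm parts, the difference between them is not explained by the process node alone.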

23 10.3.1 POWER5+ (1) Figure: Block diagram of the POWER5+
Source: Vetter S. et al., “IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp. 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf

24 Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7] 10.3.1 POWER5 (9)

25 Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCM) of the POWER5+ [3.14] 10.3.1 POWER5+ (2)

26 10.3.1 POWER5+ (3)
Table: Main features of IBM’s dual-core POWER line
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management

27 10.3.1 POWER6 (1) POWER6’s main features [3.15b]
- ultra-high frequency (4.7 GHz)
- dual core
- dual-threaded SMT
- 13 FO4 design
- private 4 MB L2 caches
- partially integrated 32 MB L3 victim cache
- minimization of excessive circuitry to reduce dissipation (modest speculation and out-of-order execution, no renaming)
- many functions of decoding and instruction grouping are pushed into predecoding (4 stages); the added L2 latency causes about 0.5 % loss per stage, whereas each stage added after the I-cache access results in about 1 % loss per stage
- increased dispatch and completion bandwidth (to 7 instructions per thread)
- the L2 cache, the SMP interconnect and parts of the memory and I/O subsystem operate at 0.5 fc, the L3 operates at one quarter of fc, and the memory controller at up to 3.2 GHz (in the POWER5 the L2 operates at fc, the remaining components at 0.5 fc)
- since the L2 operates at 0.5 fc, the width of the load and store interfaces was doubled
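The clock-domain and predecode figures on this slide lend themselves to a small worked calculation. The sketch below derives the L2 and L3 clocks from fc = 4.7 GHz, shows why doubling the interface width compensates for the halved L2 clock, and compares the cost of four extra stages in predecode with four stages after the I-cache. It assumes that per-stage losses simply add up and treats the POWER5 interface width as the 1x baseline; both are simplifications made here, and all function and variable names are hypothetical.

```python
from fractions import Fraction

CORE_GHZ = 4.7  # core frequency fc quoted on the slide

# Clock ratios relative to fc, as stated on the slide.
POWER6_RATIO = {"L2": Fraction(1, 2), "L3": Fraction(1, 4)}
POWER5_RATIO = {"L2": Fraction(1, 1)}


def domain_clock_ghz(fc_ghz: float, ratio: Fraction) -> float:
    """Clock of a domain that runs at a fixed fraction of the core clock."""
    return fc_ghz * float(ratio)


def relative_bandwidth(clock_ratio: Fraction, width_factor: int) -> Fraction:
    """Interface bandwidth relative to a 1x-wide interface clocked at fc."""
    return clock_ratio * width_factor


def added_stage_loss_percent(stages: int, loss_per_stage_percent: float) -> float:
    """Total loss, assuming per-stage losses add linearly (an approximation)."""
    return stages * loss_per_stage_percent


l2_ghz = domain_clock_ghz(CORE_GHZ, POWER6_RATIO["L2"])
l3_ghz = domain_clock_ghz(CORE_GHZ, POWER6_RATIO["L3"])

power6_l2_bw = relative_bandwidth(POWER6_RATIO["L2"], 2)  # half clock, doubled width
power5_l2_bw = relative_bandwidth(POWER5_RATIO["L2"], 1)  # full clock, baseline width

predecode_loss = added_stage_loss_percent(4, 0.5)    # 4 stages placed in predecode
post_icache_loss = added_stage_loss_percent(4, 1.0)  # same 4 stages after the I-cache

print(f"L2 clock: {l2_ghz:.3f} GHz, L3 clock: {l3_ghz:.3f} GHz")
print(f"relative L2 bandwidth: POWER6 {power6_l2_bw}, POWER5 {power5_l2_bw}")
print(f"loss: predecode {predecode_loss} %, after I-cache {post_icache_loss} %")
```

Under these assumptions the L2 runs at 2.35 GHz and the L3 at 1.175 GHz, the doubled interface restores the same bandwidth per core cycle as a full-speed interface, and moving the four stages into predecode costs about 2 % instead of about 4 %.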

28 10.3.1 POWER6 (2) At its introduction, the POWER6 (in the IBM System p570) achieved the highest figures for SPECint2006, SPECfp2006, SPECjbb2005 (Java performance) and TPC-C (transaction performance).

29 10.3.1 POWER6 (3) Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors [3.15a]
Panel titles: POWER6; POWER5+
Annotation: Hardware support of decimal arithmetic

30 Figure: Comparing the POWER5 and POWER6 processors [3.15b] 10.3.1 POWER6 (4)

31 Table: Throughput comparison POWER6 vs POWER5 [3.15b] 10.3.1 POWER6 (5)

32 10.3.1 POWER6 (6) [3.15b]

33 Figure: The internal pipelines of the POWER6 and the POWER5 [3.15b] 10.3.1 POWER6 (7)

34 Figure: First level nodal topology of the POWER6 vs POWER5 [3.15b] 10.3.1 POWER6 (8)

35 Figure: Second level topology of the POWER5 vs POWER6 [3.15b] 10.3.1 POWER6 (9)

36 Table: POWER6 processor functional signal I/O-pin comparison for various system types [3.15b] 10.3.1 POWER6 (10)

37 10.3.1 POWER6 (11) Figure: Micrograph of the POWER6 [3.15b]

38 10.3.1 POWER6 (12)
Table: Main features of IBM’s dual-core POWER line
¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management

39 10.3 IBM’s MC processors
10.3.2 Cell BE
Cell BE: 90 nm, 2/2006

40 Figure: The history and development cost of the Cell BE [3.17], [3.22] 10.3.2 Cell BE (1)

41 10.3.2 Cell BE (2) Figure: Block diagram of the Cell BE [3.19]
AUC: Atomic Update Cache
BIC: Bus Interface Contr.
EIB: Element Interface Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Contr.
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM

42 10.3.2 Cell BE (3) Design parameters of the Cell BE:
- PPE: dual-threaded
- > 200 GFLOPS (SP)
- > 20 GFLOPS (DP)
- > 25 GB/s memory BW
- > 75 GB/s I/O BW
- > 300 GB/s EIB BW
- fc > 4 GHz (lab)
Figure: Main design parameters of the Cell BE [3.28]
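The "> 200 GFLOPS (SP)" figure can be reproduced from numbers given in the Cell BE feature table of this presentation (8 SPEs, 32-bit SIMD with 128-bit capability, fc = 3.2 GHz). The sketch below does the arithmetic. It assumes that each SIMD lane completes one fused multiply-add per cycle and counts that as two floating-point operations, and it ignores the PPE's contribution; those assumptions and the function name `peak_sp_gflops` are illustrative, not taken from the slides.

```python
def peak_sp_gflops(n_spe: int, lanes: int, flops_per_lane_per_cycle: int,
                   fc_ghz: float) -> float:
    """Peak single-precision GFLOPS of the SPEs alone (PPE not counted)."""
    return n_spe * lanes * flops_per_lane_per_cycle * fc_ghz


N_SPE = 8            # from the feature table: 1x PPE, 8x SPE
LANES = 128 // 32    # four 32-bit lanes in a 128-bit SIMD register
FLOPS_PER_FMA = 2    # assumption: one multiply-add per lane per cycle = 2 flops

peak_at_3_2 = peak_sp_gflops(N_SPE, LANES, FLOPS_PER_FMA, 3.2)
peak_at_4_0 = peak_sp_gflops(N_SPE, LANES, FLOPS_PER_FMA, 4.0)

print(f"peak SP at 3.2 GHz: {peak_at_3_2:.1f} GFLOPS")
print(f"peak SP at 4.0 GHz: {peak_at_4_0:.1f} GFLOPS")
```

Under these assumptions the eight SPEs alone reach about 204.8 GFLOPS at 3.2 GHz, which is consistent with the "> 200 GFLOPS (SP)" design parameter; at the 4 GHz lab frequency the same formula gives 256 GFLOPS.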

43 Figure : Cell SPE architecture [3.16] 10.3.2 Cell BE (4)

44 Figure: Block diagram of the SPE [3.19] 10.3.2 Cell BE (5)

45 Figure: Pipeline stages of the Cell BE [3.19] 10.3.2 Cell BE (6)

46 Figure: Floor plan of a single SPE [3.19] 10.3.2 Cell BE (7)

47 Principle of operation of the Element Interface Bus (EIB) [3.23] 10.3.2 Cell BE (8)

48 Figure: The Element Interface Bus (EIB) [3.19] 10.3.2 Cell BE (9)

49 Figure: The Synergistic Memory Flow unit (SMF) [3.19] 10.3.2 Cell BE (10)

50 Figure: PPE block diagram [3.28]

51 10.3.2 Cell BE (11) Figure: Floor plan of the Cell BE processor [3.19]
235 mm², 241 mtrs

52 10.3.2 Cell BE (12)
Table: Main features of IBM’s Cell BE
Series                      Cell BE
Implementation              Heterogeneous, 1xPPE, 8*SPE
Architecture                PowerPC 2.02
Cores                       PPE: 64-bit RISC; SPE: Dual-issue 32-bit SIMD with 128 bit capability
Introduction                9/2006 (in the QS20 BladeCenter)
Technology                  90 nm
Die size                    221 mm²
Nr. of transistors          234 mtrs
fc [GHz]                    3.0/3.2
L2                          PPE: 512 KB; SPE: 256 KB Local Store (128*128 bit)
L3                          -
Memory controller           On-chip
Memory bandwidth            25 GB/s
I/O bandwidth               Up to 75 GB/s
Interconnection network     Ring based
Multithreading              PPE: 2-way; SPE: -
TDP [W]                     95 W @ 3 GHz

53 10.3.2 Cell BE (13) Figure: Cell BE Blade Roadmap
Source: Brochard L., “A Cell History,” Cell Workshop, April 2006, http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf

54 10.3.2 Cell BE (14) Figure: Roadmap of the Cell BE
Source: Hofstee H. P., “Real-time Supercomputing and Technology for Games and Entertainment,” 2006, http://www.cercs.gatech.edu/docs/SC06_Cell_111606.pdf

55 10.3 Literature (1)
POWER4, POWER4+
[3.1] Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/
[3.2] DeMone P., “Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004, http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/sizingsuperheavys.pdf
[3.3] Grassl C., “New IBM Components for HPCx,” Dec. 2003, http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf
[3.4] Krewell K., “IBM’s POWER4 Unveiling Continues,” Microprocessor Report, Nov. 20, 2000, pp. 1-4
[3.5] Tendler J. M., Dodson S., Fields S., Le H., Sinharoy B., “Power4 System Microarchitecture,” IBM Server, Technical White Paper, Oct. 2001, http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf
POWER5, POWER5+
[3.6] Tendler J. M., Dodson S., Fields S., Le H., Sinharoy B., “Power4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25, http://www.research.ibm.com/journal/rd/461/tendler.pdf
[3.7] Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/
[3.8] DeMone P., “Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004, http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/sizingsuperheavys.pdf
[3.9] Grassl C., “New IBM Components for HPCx,” Dec. 2003, http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf
[3.10] Kalla R., “IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs352-05-spring/lectures/Lecture22-RonKallaIBM.pdf

56 10.3 Literature (2)
POWER5, POWER5+ (cont.)
[3.11] Kalla R., Sinharoy B., Tendler J., “Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor,” 2003, http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf
[3.12] Krewell K., “POWER5 Tops on Bandwidth,” Microprocessor Report, Dec. 2003, http://studies.ac.upc.edu/ETSETB/SEGPAR/microprocessors/power5%20(2)%20(mpr).pdf
[3.13] Sinharoy B., Kalla R. N., Tendler J. M., Eickemeyer R. J., Joyner J. B., “POWER5 system microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp. 505-521
[3.14] Vetter S. et al., “IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp. 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf
POWER6
[3.15a] Kanter D., “IBM Previews the Power6,” Oct. 2006, dkanter@realwordtech.com
[3.15b] Le H. Q. et al., “IBM POWER6 microarchitecture,” IBM J. R&D, Vol. 51, No. 6, 2007, pp. 639-662
Cell BE
[3.16] Blachford N., “Cell Architecture Explained Version 2,” http://www.blachford.info/computer/Cell/Cell1_v2.html
[3.17] Brochard L., “A Cell History,” Cell Workshop, April 2006, http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf
[3.18] Day M. and Hofstee P., “Hardware and Software Architectures for the Cell Broadband Engine processor,” CODES, Sept. 2006, http://www.casesconference.org/cases2005/pdf/Cell-tutorial.pdf
[3.19] Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

57 10.3 Literature (3)
Cell BE (cont.)
[3.20] Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T., “Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, Vol. 26, No. 2, 2006, pp. 10-24
[3.21] Hofstee H. P., “Real-time Supercomputing and Technology for Games and Entertainment,” 2006, http://www.cercs.gatech.edu/docs/SC06_Cell_111606.pdf
[3.22] Hofstee H. P., “Cell today and tomorrow,” 2005, http://www.stanford.edu/class/ee380/Abstracts/Cell_060222.pdf
[3.23] Keable C., “And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006, http://www.cse.clrc.ac.uk/disco/mew17/talks/Keable_IBM_MEW17.pdf
[3.24] Krolak D., “Unleashing the Cell Broadband Engine Processor,” MPR Fall Proc. Forum, Nov. 2005, http://www-128.ibm.com/developerworks/power/library/pa-fpfeib/?ca=dgr-lnxwCellConnects
[3.25] Krewell K., “Cell Moves Into The Limelight,” Microprocessor Report, Feb. 14, 2005, pp. 1-9
[3.26] Solie D., “Technology Trends Presentation,” Power Symposium, Aug. 2006, http://www-03.ibm.com/procurement/proweb.nsf/objectdocswebview/file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf
[3.27] - “Cell Broadband Engine processor-based systems,” White Paper, IBM Corp., 2006
[3.28] - “Cell Architecture,” Course Code L1T1H1-10, 2006, http://www.power.org/resources/devcorner/cellcorner/CellTraining_Track1/CourseCode_L1T1H1-10_CellArchitecture.pdf

