III. Multicore Processors (5) Dezső Sima Spring 2007 (Ver. 2.1) © Dezső Sima, 2007

10.3 IBM’s MC processors: POWER line, Cell BE

10.3.1 POWER line
POWER4: 180 nm, 10/2001
POWER4+: 11/2002
POWER5: 130 nm, 5/2004
POWER5+: 90 nm, 10/2005
POWER6: 65 nm, 2007

Figure: The evolution of IBM’s major RISC lines Evolution of IBM’s major RISC lines

Figure: POWER4 chip logical view [3.6] (labels: Built-In Self-Test, Service Processor, Power-On Reset, Core Interface Unit (crossbar), Non-Cacheable Unit, Multi-Chip Module) POWER4 (1)

Figure: Logical view of the L3 controller [3.5] POWER4 (2)

Figure: The memory controller of the POWER4 [3.5] POWER4 (3)

Figure: I/O controller of the POWER4 [3.5] Fabric Controller POWER4 (4)

Figure: POWER4 chip [3.11] POWER4 (5)

POWER4 (6)
Table: Main features of IBM’s dual-core POWER line
Features listed: Introduced, Technology, Die size, Nr. of transistors, f c [GHz], Implementation, TDP [W], Packaging, Dual threaded, Power management, L2 (size/allocation, implementation), L3 (size, implementation), Mem. contr.
DC POWER4: introduced 10/2001; 180 nm; die size 412 mm²; TDP 115/125 W; packaging SCM¹/MCM²; L2 1.44 MB, shared, on-chip; L3 32 MB, tags on-chip, data off-chip; memory controller off-chip
Notes: ¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management

POWER4+ (1) Figure: New features of the POWER4+ [3.3]

POWER4+ (2)
Table: Main features of IBM’s dual-core POWER line
Features listed: Introduced, Technology, Die size, Nr. of transistors, f c [GHz], Implementation, TDP [W], Packaging, Dual threaded, Power management, L2 (size/allocation, implementation), L3 (size, implementation), Mem. contr.
DC POWER4: introduced 10/2001; 180 nm; die size 412 mm²; TDP 115/125 W; packaging SCM¹/MCM²; L2 1.44 MB, shared, on-chip; L3 32 MB, tags on-chip, data off-chip; memory controller off-chip
DC POWER4+: introduced 11/2002; die size 380 mm²; packaging SCM¹/MCM²; L2 1.5 MB, shared, on-chip; L3 32 MB, tags on-chip, data off-chip; memory controller off-chip
Notes: ¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management

Figure 5.14: Contrasting POWER4 and POWER5 system structures [3.1] POWER5 (1)

Figure: Block diagram of the POWER5 (1) [3.1] POWER5 (2)

Figure: Block diagram of the POWER5 (2) [3.12] POWER5 (3)

POWER5 (4) Figure: Floorplan of the POWER5 [3.13]

POWER5 (6) Figure: Contrasting the floor plans of the POWER4 and POWER5 dies [3.11], [3.13] (POWER4: 180 nm, 412 mm²; POWER5: 130 nm, 389 mm², ~3 % enlarged)

Figure: Packaging alternatives of the POWER4/5 processors (label: POWER5+ Dual-Core Module). Source: Partridge R. and Ghatpande S., „IBM Introduces POWER5+ and Quad-Core Modules in System p5,” Tech Trends Monthly, Nov./Dec. 2005. POWER5 (7)

Figure: Quad-Chip POWER4 module (MCM) and a 32-way POWER4 system [3.7] (panels: POWER4 MCM photo; 32-way system showing 4 MCMs and L3 cache) POWER5 (8)

Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7] POWER5 (9)

Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7] POWER5 (10)

Figure: The Multi-chip module of the POWER5 [3.10] POWER5 (11)

POWER5 (12)
Table: Main features of IBM’s dual-core POWER line
Features listed: Introduced, Technology, Die size, Nr. of transistors, f c [GHz], Implementation, TDP [W], Packaging, Dual threaded, Power management, L2 (size/allocation, implementation), L3 (size, implementation), Mem. contr.
DC POWER4: introduced 10/2001; 180 nm; die size 412 mm²; TDP 115/125 W; packaging SCM¹/MCM²; L2 1.44 MB, shared, on-chip; L3 32 MB, tags on-chip, data off-chip; memory controller off-chip
DC POWER4+: introduced 11/2002; die size 380 mm²; packaging SCM¹/MCM²; L2 1.5 MB, shared, on-chip; L3 32 MB, tags on-chip, data off-chip; memory controller off-chip
DC POWER5: introduced 5/2004; 130 nm; die size 389 mm²; f c [GHz] 1.65/…; TDP 80 W (est.); packaging DCM³/MCM²; power management DPM⁶; L2 1.9 MB, shared, on-chip; L3 36 MB, tags on-chip; memory controller on-chip
Notes: ¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management

Figure: Block diagram of the POWER. Source: Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006. POWER5+ (1)

Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCM) of the POWER5+ [3.14] POWER5+ (2)

POWER5+ (3) Table: Main features of IBM’s dual-core POWER line
Notes: ¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management

Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors [3.15] (panels: POWER5+, POWER6; note: hardware support of decimal arithmetic) POWER6 (1)

POWER6 (2) Table: Main features of IBM’s dual-core POWER line
Notes: ¹ SCM: Single Chip Module; ² MCM: Multi Chip Module; ³ DCM: Dual Chip Module; ⁴ DCM: Dual Core Module; ⁵ QCM: Quad Core Module; ⁶ DPM: Dynamic Power Management

10.3 IBM’s MC processors: Cell BE (90 nm, 2/)

Figure: The history and development cost of the Cell BE [3.17], [3.22] Cell BE (1)

Figure: Block diagram of the Cell BE [3.19] Cell BE (2)
Abbreviations:
AUC: Atomic Update Cache
BIC: Bus Interface Contr.
EIB: Element Interface Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Contr.
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM

Cell BE (3)
Design parameters of the Cell BE:
PPE: dual-threaded
> 200 GFLOPS (SP)
> 20 GFLOPS (DP)
> 25 GB/s memory BW
> 75 GB/s I/O BW
> 300 GB/s EIB BW
fc > 4 GHz (lab)
Figure: Main design parameters of the Cell BE [3.28]
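The single-precision and EIB figures above can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: the SPE count and the 3.2 GHz clock come from the slides, while the per-cycle throughput values (4 single-precision lanes per 128-bit SIMD operation, a fused multiply-add counted as 2 FLOPs, and 96 bytes per processor cycle on the EIB) are assumptions based on commonly quoted Cell BE numbers, not on this deck. The function names are hypothetical.

```python
# Back-of-the-envelope check of two Cell BE peak figures.
# Values marked "assumption" are NOT taken from the slides.

SPE_COUNT = 8             # from the slides: 1 x PPE, 8 x SPE
CLOCK_GHZ = 3.2           # from the slides: f_c = 3.0/3.2 GHz

SP_LANES = 4              # assumption: 128-bit SIMD = 4 x 32-bit SP lanes
FLOPS_PER_FMA = 2         # assumption: one fused multiply-add = 2 FLOPs
EIB_BYTES_PER_CYCLE = 96  # assumption: commonly quoted EIB peak per CPU cycle


def peak_sp_gflops(spes=SPE_COUNT, clock_ghz=CLOCK_GHZ):
    """Peak single-precision GFLOPS of the SPEs alone (PPE not counted)."""
    return spes * SP_LANES * FLOPS_PER_FMA * clock_ghz


def peak_eib_gb_per_s(clock_ghz=CLOCK_GHZ):
    """Theoretical peak EIB bandwidth in GB/s."""
    return EIB_BYTES_PER_CYCLE * clock_ghz


if __name__ == "__main__":
    print(f"SPE peak (SP): {peak_sp_gflops():.1f} GFLOPS")
    print(f"EIB peak:      {peak_eib_gb_per_s():.1f} GB/s")
```

Under these assumptions both results land just above the slide's thresholds (> 200 GFLOPS SP, > 300 GB/s EIB BW), which suggests the quoted design parameters are theoretical peaks at 3.2 GHz rather than measured values.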

Figure: Cell SPE architecture [3.16] Cell BE (4)

Figure: Block diagram of the SPE [3.19] Cell BE (5)

Figure: Pipeline stages of the Cell BE [3.19] Cell BE (6)

Figure: Floor plan of a single SPE [3.19] Cell BE (7)

Principle of operation of the Element Interface Bus (EIB) [3.23] Cell BE (8)

Figure: The Element Interface Bus (EIB) [3.19] Cell BE (9)

Figure: The Synergistic Memory Flow unit (SMF) [3.19] Cell BE (10)

Figure: PPE block diagram [3.28]

Figure: Floor plan of the Cell BE processor [3.19] (235 mm²) Cell BE (11)

Cell BE (12)
Table: Main features of IBM’s Cell BE
Series: Cell BE
Implementation: Heterogeneous, 1 x PPE, 8 x SPE
Architecture: PowerPC 2.02
Cores: PPE: 64-bit RISC; SPE: dual-issue 32-bit SIMD with 128-bit capability
Introduction: 9/2006 (in the QS20 BladeCenter)
Technology: 90 nm
Die size: 221 mm²
Nr. of transistors: 234 mtrs
f c [GHz]: 3.0/3.2
L2: PPE: 512 KB; SPE: 256 KB Local Store (128*128 bit)
L3: none listed
Memory controller: On-chip
Memory bandwidth: 25 GB/s
I/O bandwidth: Up to 75 GB/s
Interconnection network: Ring based
Multithreading: PPE: 2-way; SPE: none listed
TDP [W]: 95 (at 3 GHz)

Figure: Cell BE Blade Roadmap. Source: Brochard L., „A Cell History,” Cell Workshop, April. Cell BE (13)

Figure: Roadmap of the Cell BE. Source: Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006. Cell BE (14)

10.3 Literature (1)
POWER4, POWER4+
[3.1] Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
[3.2] DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004
[3.3] Grassl C., „New IBM Components for HPCx,” Dec. 2003
[3.4] Krewell K., „IBM’s POWER4 Unveiling Continues,” Microprocessor Report, Nov., pp. 1-4
[3.5] Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM Server, Technical White Paper, Oct. 2001
[3.6] Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25
POWER5, POWER5+
[3.7] Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006
[3.8] DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004
[3.9] Grassl C., „New IBM Components for HPCx,” Dec. 2003
[3.10] Kalla R., „IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs spring/lectures/Lecture22-RonKallaIBM.pdf

10.3 Literature (2)
POWER5, POWER5+ (cont.)
[3.11] Kalla R., Sinharoy B., Tendler J., „Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor”
[3.12] Krewell K., „POWER5 Tops on Bandwidth,” Microprocessor Report, Dec.
[3.13] Sinharoy B., Kalla R.N., Tendler J.M., Eickemeyer R.J., Joyner J.B., „POWER5 system microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005
[3.14] Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006
POWER6
[3.15] Kanter D., „IBM Previews the Power6,” Oct. 2006
Cell BE
[3.16] Blachford N., „Cell Architecture Explained Version 2”
[3.17] Brochard L., „A Cell History,” Cell Workshop, April
[3.18] Day M. and Hofstee P., „Hardware and Software Architectures for the Cell Broadband Engine processor,” CODES, Sept. 2006
[3.19] Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006

10.3 Literature (3)
Cell BE (cont.)
[3.20] Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T., „Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, Vol. 26, No. 2, 2006
[3.21] Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006
[3.22] Hofstee H. P., „Cell today and tomorrow,” 2005
[3.23] Keable C., „And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006
[3.24] Krolak D., „Unleashing the Cell Broadband Engine Processor,” MPR Fall Proc. Forum, Nov. 2005
[3.25] Krewell K., „Cell Moves Into The Limelight,” Microprocessor Report, Febr., pp. 1-9
[3.26] Solie D., „Technology Trends Presentation,” Power Symposium, Aug. 2006, file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/ 14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf
[3.27] - „Cell Broadband Engine processor – based systems,” White Paper, IBM Corp., 2006
[3.28] - „Cell Architecture,” Course Code L1T1H1-10, 2006, CellArchitecture.pdf