11. Multicore Processors Dezső Sima Fall 2006  D. Sima, 2006.

Overview
1 Overview of MCPs
2 Attaching L2 caches
3 Attaching L3 caches
4 Connecting memory and I/O
5 Case examples

1. Overview of MCPs (1)
Figure 1.1: Processor power density trends
Source: D. Yen: Chip Multithreading Processors Enable Reliable High Throughput Computing

1. Overview of MCPs (2)
Figure 1.2: Single-stream performance vs. cost
Source: Marr T.T. et al.: „Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, Vol. 06, Issue 01, Feb. 14, 2002, pp. 4-16

1. Overview of MCPs (2)
Figure 1.2: Dual/multi-core processors (1)

1. Overview of MCPs (3)
Figure 1.3: Dual/multi-core processors (2)

1. Overview of MCPs (4)
Macro architecture of dual/multi-core processors (MCPs)
  Layout of the cores
  Attaching of L2 caches
  Attaching of L3 caches (if available)
  Layout of the I/O and memory architecture

2.1 Main aspects of attaching L2 caches to MCPs (1)
Allocation to the cores
Use by instructions/data
Integration of L2 caches to the proc. chip
Inclusion policy
Banking policy

Allocation of L2 caches to the cores
Shared L2 cache for all cores: UltraSPARC T1 (2005), Yonah (2006), Core Duo (2006), POWER4 (2001), POWER5 (2005)
Private L2 cache for each core: UltraSPARC IV (2004), Smithfield (2005), Athlon 64 X2 (2005), Montecito (2006?)
Expected trend

Inclusion policy of L2 caches
Two options: exclusive L2 or inclusive L2.
With an exclusive L2:
  Lines replaced (victimized) in the L1 are written into the L2.
  References to data in the L2 initiate reloading of that cache line into the L1.
  The L2 usually operates as a write-back cache (only modified data that is replaced in the L2 is written back to the memory).
  Unmodified data that is replaced in the L2 is deleted.
[Figure: L1, L2 and memory arrangement for the exclusive and the inclusive case]
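The exclusive policy described above can be sketched as a small simulation. This is only a toy model under stated assumptions, not the design of any processor named in the slides: both caches are fully associative with LRU replacement, and the class `TwoLevelCache` with its methods is a hypothetical name introduced for illustration. The inclusive variant is added purely for contrast and assumes that evicting a line from the L2 also invalidates the L1 copy.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy fully associative L1/L2 pair with LRU replacement (assumed model)."""

    def __init__(self, l1_lines, l2_lines, exclusive):
        self.l1 = OrderedDict()  # line -> dirty flag, least recently used first
        self.l2 = OrderedDict()
        self.l1_lines, self.l2_lines = l1_lines, l2_lines
        self.exclusive = exclusive
        self.memory_writes = []  # lines written back to memory

    def _put_l2(self, line, dirty):
        """Install a line in the L2, evicting the LRU line if the L2 is full."""
        if line not in self.l2 and len(self.l2) >= self.l2_lines:
            victim, victim_dirty = self.l2.popitem(last=False)
            if not self.exclusive and victim in self.l1:
                # Inclusive L2 (assumption): the L1 copy is invalidated as well.
                victim_dirty = self.l1.pop(victim) or victim_dirty
            if victim_dirty:
                self.memory_writes.append(victim)  # write back modified data
            # Unmodified victims are simply dropped.
        self.l2[line] = dirty or self.l2.get(line, False)
        self.l2.move_to_end(line)

    def access(self, line, write=False):
        """Touch one cache line and return where it was found."""
        if line in self.l1:
            self.l1[line] = self.l1[line] or write
            self.l1.move_to_end(line)
            return "L1"
        if line in self.l2:
            found = "L2"
            if self.exclusive:
                dirty = self.l2.pop(line)  # the line moves up, no copy stays
            else:
                dirty = False  # the L2 keeps its own copy and dirty state
                self.l2.move_to_end(line)
        else:
            found, dirty = "memory", False
            if not self.exclusive:
                self._put_l2(line, False)  # inclusive: fill the L2 as well
        if len(self.l1) >= self.l1_lines:
            victim, victim_dirty = self.l1.popitem(last=False)
            if self.exclusive:
                self._put_l2(victim, victim_dirty)  # victim is written into L2
            elif victim_dirty:
                self.l2[victim] = True  # inclusive: mark the L2 copy modified
        self.l1[line] = dirty or write
        return found

    def distinct_lines(self):
        """Number of different lines held by L1 and L2 together."""
        return len(set(self.l1) | set(self.l2))
```

With two lines per level, touching lines 0 to 3 leaves four distinct lines cached in the exclusive configuration but only two in the inclusive one, which is the capacity argument usually made for exclusive hierarchies.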

Figure 1.1: Implementation of exclusive L2 caches
Source: Zheng, Y., Davis, B.T., Jordan, M.: “Performance evaluation of exclusive cache hierarchies”, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004, pp

Inclusion policy of L2 caches
Exclusive L2: Athlon 64 X2 (2005)
Inclusive L2: most implementations
Expected trend

Use by instructions/data
Unified instr./data cache(s): UltraSPARC IV (2004), UltraSPARC T1 (2005), POWER4 (2001), POWER5 (2005), Smithfield (2005), Yonah (2006), Core Duo (2006), Athlon 64 X2 (2005)
Split instr./data caches: Montecito (2006?)
Expected trend

Banking policy
  Single-banked implementation
  Multi-banked implementation

Integration to the processor chip
Options: on-chip L2 tags/contr. with off-chip data; entire L2 on chip
Processors classified: UltraSPARC IV (2004), UltraSPARC V (2005), POWER4 (2001), POWER5 (2005), Smithfield (2005), Presler (2005), Athlon 64 X2 (2005)
Expected trend

2.2 Examples of attaching L2 caches to MCPs (1)
Private L2 caches for each core
Classification: unified or split instruction/data caches; in each case either on-chip L2 tags/contr. with off-chip data, or the entire L2 on-chip
Examples: Montecito (2006?), UltraSPARC IV (2004), Smithfield (2005), Presler (2005), Athlon 64 X2 (2005) (exclusive L2)
[Figure: block diagrams of the example processors]

Shared L2 caches for all cores
Dual core/single banked L2: Yonah Duo (2006), Core (2006)
Dual core/multi banked L2: POWER4 (2001), POWER5 (2005)
Multi core/multi banked L2: UltraSPARC T1 (2005) (Niagara) (8 cores/4 L2 banks)
[Figure: block diagrams of the example processors]
Mapping of addresses to the banks (POWER4/POWER5): the 128-byte long L2 cache lines are hashed across the 3 modules. Hashing is performed by modulo 3 arithmetic applied on a large number of real address bits.
Mapping of addresses to the banks (UltraSPARC T1): the four L2 modules are interleaved at 64-byte blocks.
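The two address-to-bank mappings mentioned above can be sketched as follows. This is a simplified illustration, not the actual hardware hash: the slide only says that modulo 3 arithmetic is applied to "a large number of real address bits", so using the whole line number is an assumption, and both function names are hypothetical.

```python
def hashed_bank(real_address, line_size=128, banks=3):
    """Map a 128-byte cache line to one of 3 L2 modules by modulo-3 arithmetic.

    Assumption: the hash is taken over the full line number; the slide does
    not say which real address bits take part.
    """
    line_number = real_address // line_size
    return line_number % banks


def interleaved_bank(real_address, block_size=64, banks=4):
    """Map an address to one of 4 L2 banks, interleaved at 64-byte blocks."""
    block_number = real_address // block_size
    return block_number % banks


if __name__ == "__main__":
    for addr in (0, 64, 128, 192, 256, 384):
        print(addr, hashed_bank(addr), interleaved_bank(addr))
```

With four banks the interleaved mapping reduces to selecting two address bits directly above the 64-byte block offset, whereas three modules are not a power of two, which is why the three-module case needs modulo arithmetic.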

3. Attaching L3 caches
Macro architecture of dual/multi-core processors (MCPs)
  Layout of the cores
  Attaching of L2 caches
  Attaching of L3 caches (if available)
  Layout of the I/O and memory architecture

Allocation to the L2 cache(s)
Use by instructions/data
Integration of L3 caches to the proc. chip
Inclusion policy
Banking policy

Allocation of L3 caches to the L2 caches
Options: shared L3 cache for all L2s; private L3 cache for each L2
Processors classified: POWER5 (2005), POWER4 (2001), UltraSPARC IV+ (2004), Montecito (2006?)

Inclusion policy of L3 caches
Two options: exclusive L3 or inclusive L3.
With an exclusive L3:
  Lines replaced (victimized) in the L2 are written into the L3.
  References to data in the L3 initiate reloading of that cache line into the L2.
  The L3 usually operates as a write-back cache (only modified data that is replaced in the L3 is written back to the memory).
  Unmodified data that is replaced in the L3 is deleted.
[Figure: L2, L3 and memory arrangement for the exclusive and the inclusive case]

Inclusion policy of L3 caches
Exclusive L3: POWER5 (2005), UltraSPARC IV+ (2004)
Inclusive L3: POWER4 (2001), Montecito (2006?)
Expected trend

Use by instructions/data
  Unified instr./data cache(s)
  Split instr./data caches
In all multicore processors unveiled so far, the L3 holds both instructions and data.

Banking policy
  Single-banked implementation
  Multi-banked implementation

Integration to the processor chip
On-chip L3 tags/contr., off-chip data: UltraSPARC IV+ (2005), POWER4 (2001), POWER5 (2005)
Entire L3 on chip: Montecito (2006?)
Expected trend

Inclusive L3 cache
Classification: private L3 caches for each L2 cache bank or shared L3 cache for all cache banks; in each case either on-chip L3 tags/contr. with off-chip data, or the entire L3 on-chip
Examples: POWER4 (2001), Montecito (2006?)
[Figure: block diagrams of the example processors]

Exclusive L3 cache
Classification: private L3 caches for each L2 cache bank or shared L3 cache for all cache banks; in each case either on-chip L3 tags/contr. with off-chip data, or the entire L3 on-chip
Examples: POWER5 (2005), UltraSPARC IV+ (2005)
[Figure: block diagrams of the example processors]

4. Connecting memory and I/O
Macro architecture of dual/multi-core processors (MCPs)
  Layout of the cores
  Attaching of L2 caches
  Attaching of L3 caches (if available)
  Layout of the I/O and memory architecture

4.1 Overview
Layout of the I/O and memory architecture in dual/multi-core processors
  Connection policy of I/O and memory
  Integration of the memory controller to the processor chip

4.2 Connection policy (1)
Connection policy of I/O and memory
Connecting both I/O and memory via the system bus: PA-8800 (2004), PA-8900 (2005), Smithfield (2005), Presler (2005), Yonah Duo (2006), Core (2006), Montecito (2006?)
Dedicated connection of I/O and memory
  Asymmetric connection of I/O and memory: POWER4 (2001), UltraSPARC T1 (2005)
  Symmetric connection of I/O and memory: POWER5 (2005), UltraSPARC IV (2004), UltraSPARC IV+ (2005), Athlon64 X2 (2005)

4.2 Connection policy (2)
Connecting both I/O and memory via the system bus
Examples: Smithfield/Presler (2005/2005), Yonah Duo/Core (2006/2006), Montecito (2006), PA-8800 (2004), PA-8900 (2005)
[Figure: block diagrams of the examples; the caches attach through a system bus interface to the FSB]

Connection policy of I/O and memory
Connecting both I/O and memory via the system bus: PA-8800 (2004), PA-8900 (2005), Smithfield (2005), Presler (2005), Yonah Duo (2006), Core (2006), Montecito (2006?)
Dedicated connection of I/O and memory
  Asymmetric connection of I/O and memory (connecting I/O via the internal interconnection network, and memory via the L2/L3 cache): POWER4 (2001), UltraSPARC T1 (2005)
  Symmetric connection of I/O and memory (connecting both I/O and memory via the internal interconnection network): POWER5 (2005), UltraSPARC IV (2004), UltraSPARC IV+ (2005), Athlon64 X2 (2005)

4.2 Connection policy (4)
Asymmetric connection of I/O and memory
Examples: UltraSPARC T1 (2005), POWER4 (2001)
[Figure: block diagrams of the example processors]

4.2 Connection policy (6)
Symmetric connection of I/O and memory (1)
Examples: POWER5 (2005), UltraSPARC IV (2004)
[Figure: block diagrams of the example processors]

4.2 Connection policy (7)
Symmetric connection of I/O and memory (2)
Examples: Athlon 64 X2 (2005), UltraSPARC IV+ (2005)
[Figure: block diagrams of the example processors]

4.3 Integration of the memory controller to the processor chip
Options: off-chip memory controller; on-chip memory controller
Processors classified: POWER4 (2001), POWER5 (2005), PA-8800 (2004), UltraSPARC IV (2004), PA-8900 (2005), UltraSPARC IV+ (2005), UltraSPARC T1 (2005), Smithfield (2005), Presler (2005), Athlon 64 X2 (2005), Yonah Duo (2006), Core (2006), Montecito (2006?)
Expected trend

5. Case examples
5.1 Intel MCPs (1)
Figure 5.1: The move to Intel multi-core
Source: A. Loktu: Itanium 2 for Enterprise Computing

5.1 Intel MCPs (2)
Figure 5.2: Processor specifications of Intel’s Pentium D family (90 nm)
Source: http://www.intel.com/products/processor/index.htm

5.1 Intel MCPs (3)

ED: Execute Disable Bit
Malicious buffer overflow attacks pose a significant security threat. In a typical attack, a malicious worm creates a flood of code that overwhelms the processor, allowing the worm to propagate itself to the network and to other computers. The Execute Disable Bit allows the processor to classify areas in memory by where application code can execute and where it cannot. When a malicious worm attempts to insert code in the buffer, the processor disables code execution, preventing damage and worm propagation. Combined with a supporting operating system, it can help prevent certain classes of malicious buffer overflow attacks.

VT: Virtualization Technology
A set of hardware enhancements to Intel’s server and client platforms that can improve the performance and robustness of traditional software-based virtualization solutions. Virtualization solutions allow a platform to run multiple operating systems and applications in independent partitions. Using virtualization capabilities, one computer system can function as multiple "virtual" systems.

EIST: Enhanced Intel SpeedStep Technology
First delivered in Intel’s mobile and server platforms, it allows the system to dynamically adjust processor voltage and core frequency, which can result in decreased average power consumption and decreased average heat production.
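The idea behind the Execute Disable Bit described above can be illustrated with a toy model. This is a conceptual sketch only and not how the processor or an operating system implements the feature: `ToyMemory`, `map_page`, `fetch_instruction` and `ExecuteDisabledFault` are hypothetical names, and memory is reduced to a table of pages, each flagged as executable or not.

```python
class ExecuteDisabledFault(Exception):
    """Raised when an instruction fetch targets a non-executable page."""


class ToyMemory:
    """Toy page table: each page is classified as executable or not."""

    def __init__(self):
        self.executable = {}  # page number -> True if code may run there

    def map_page(self, page, executable):
        """Classify a page, as a supporting operating system would."""
        self.executable[page] = executable

    def fetch_instruction(self, page):
        """Model an instruction fetch; refuse pages marked non-executable."""
        if not self.executable[page]:
            raise ExecuteDisabledFault(f"page {page} is not executable")
        return page


if __name__ == "__main__":
    memory = ToyMemory()
    memory.map_page(0, executable=True)   # application code
    memory.map_page(1, executable=False)  # data buffer
    memory.fetch_instruction(0)           # allowed
    try:
        memory.fetch_instruction(1)       # code injected into the buffer
    except ExecuteDisabledFault as fault:
        print("blocked:", fault)
```

In this model, code that an attack places in the data buffer is never run, because the fetch from page 1 fails before any instruction executes.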

5.1 Intel MCPs (4) Figure 5.3: Processor specifications of Intel’s Pentium D family (65 nm) Source:

5.1 Intel MCPs (5) Figure 5.4: Specifications of Intel’s Pentium Processor Extreme Edition models 840/955/965 Source:

5.1 Intel MCPs (6) Figure 5.5: Processor specifications of Intel’s Yonah Duo (Core Duo) family Source:

5.1 Intel MCPs (7) Figure 5.6: Specifications of Intel’s Core Processors Source:

5.1 Intel MCPs (8)
Figure 5.7: Future 65 nm processors (overview)
Columns: Code name | Cores | Cache | Market

Desktop
  Kentsfield | Dual core, multi-die | 4 MB | Mid 2007
  Conroe | Dual core, single die | 4 MB shared | End 2006
  Allendale | | 2 MB shared |
  Cedar Mill (NetBurst/P4) | Single core | 512 kB, 1 MB, 2 MB | Early 2006
  Presler (NetBurst/P4) | Dual core, dual die | |
Desktop/Mobile
  Millville | | 1 MB | Early 2007
Mobile
  Yonah2 | Dual core, single die | 2 MB |
  Yonah1 | | 1/2 MB | Mid 2006
  Stealey | | 512 kB |
  Merom | | 2/4 MB shared |
Enterprise
  Sossaman | | |
  Woodcrest | | |
  Clovertown | Quad core, multi-die | |
  Dempsey (NetBurst/Xeon) | | |
  Tulsa | | 4/8/16 MB |
  Whitefield | Quad core, single die | 8 MB, 16 MB shared | Early 2008

Source: P. Schmid: Top Secret Intel Processor Plans Uncovered

5.1 Intel MCPs (9)
Figure 5.8: Future 45 nm processors (overview)
Columns: Code name | Cores | Cache | Market

Desktop
  Wolfdale | Dual core, single die | 3 MB shared | 2008
  Ridgefield | Dual core, single die | 6 MB shared |
  Yorkfield | 8 cores, multi-die | 12 MB shared | 2008+
  Bloomfield | Quad core, single die | - |
Desktop/Mobile
  Perryville | Single core | 2 MB |
Mobile
  Penryn | | 3 MB, 6 MB shared |
  Silverthorne | | |
Enterprise
  Harpertown | | |

Source: P. Schmid: Top Secret Intel Processor Plans Uncovered

Figure 5.9: AMD Athlon 64 X2 dual-core processor architecture
Source: AMD Athlon 64 X2 Dual-Core Processor for Desktop – Key Architecture Features,

5.3 Sun’s UltraSPARC IV/IV+ (1)
Figure 5.10: UltraSPARC IV (Jaguar)
ARB: Arbiter
Source: C. Boussard: Architecture des processeurs

5.3 Sun’s UltraSPARC IV/IV+ (2)
Figure 5.11: UltraSPARC IV+ (Panther)
Source: C. Boussard: Architecture des processeurs

5.4 POWER4/POWER5 (1)
Figure 5.12: POWER4 chip logical view
Legend: Core Interface Unit (crossbar), Service Processor, Power On Reset, Built-In Self-Test, Non-Cacheable Unit, MultiChip Module
Source: J.M. Tendler, S. Dodson, S. Fields, H. Le, B. Sinharoy: Power4 System Microarchitecture, IBM Server, Technical White Paper, October 2001

5.4 POWER4/POWER5 (2) Figure 5.13: POWER4 chip
Source: R. Kalla, B. Sinharoy, J. Tendler: Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor, 2003

5.4 POWER4/POWER5 (3)
Figure 5.14: POWER4 and POWER5 system structures
Diagram label: Fabric Controller
Source: R. Kalla, B. Sinharoy, J.M. Tendler: IBM Power5 chip: A Dual-core multithreaded Processor, IEEE Micro, Vol. 24, No. 2, March-April 2004, pp

5.5 Cell (1)
Figure 5.15: Cell (BE) microarchitecture
SPE: Synergistic Processing Element
EIB: Element Interconnect Bus
MFC: Memory Flow Controller
PPE: Power Processing Element
AUC: Atomic Update Cache
Source: IBM: „Cell Broadband Engine™ processor – based systems”, IBM corp. 2006

Figure 5.16: Cell SPE architecture
Source: Blachford N.: „Cell Architecture Explained Version 2”,

Figure 5.17: Cell floorplan
Source: Blachford N.: „Cell Architecture Explained Version 2”,
