Platforms II. Dezső Sima, December 2011 (Ver. 1.5), © Sima Dezső, 2011

3. Platform architectures

Contents

3. Platform architectures
3.1. Design space of the basic platform architecture
3.2. DT platforms
3.2.1. Design space of the basic architecture of DT platforms
3.2.2. Evolution of Intel's home user oriented multicore DT platforms
3.2.3. Evolution of Intel's business user oriented multicore DT platforms
3.3. DP server platforms
3.3.1. Design space of the basic architecture of DP server platforms
3.3.2. Evolution of Intel's low cost oriented multicore DP server platforms
3.3.3. Evolution of Intel's performance oriented multicore DP server platforms

Contents (continued)

3.4. MP server platforms
3.4.1. Design space of the basic architecture of MP server platforms
3.4.2. Evolution of Intel's multicore MP server platforms
3.4.3. Evolution of AMD's multicore MP server platforms

3.1. Design space of the basic platform architecture

3.1 Design space of the basic platform architecture (1)

Platform architecture covers three aspects:
- Architecture of the processor subsystem: interpreted only for DP/MP systems. In SMPs it specifies the interconnection of the processors and the chipset; in NUMAs it specifies the interconnections between the processors.
- Architecture of the memory subsystem: specifies the point and the layout of the interconnection of memory.
- Architecture of the I/O subsystem: specifies the structure of the I/O subsystem (will not be discussed).

Example: a Core 2/Penryn based MP platform. The processors are connected to the MCH by individual FSBs, memory is attached to the MCH over serial FB-DIMM channels, and the chipset consists of two parts, designated as the MCH and the ICH.
[Figure: block diagram of the example MP platform]

3.1 Design space of the basic platform architecture (2)

The notion of basic platform architecture: of the three aspects above, the architecture of the processor subsystem and the architecture of the memory subsystem together constitute the basic platform architecture; the architecture of the I/O subsystem is left out of it.
[Figure: tree relating the platform architecture to its three constituent subsystem architectures]


3.1 Design space of the basic platform architecture (3)

Architecture of the processor subsystem: interpreted only for DP and MP systems; the interpretation depends on whether the multiprocessor system is an SMP or a NUMA.
- SMP systems: the scheme of attaching the processors to the rest of the platform (e.g. processors attached to the MCH via an FSB).
- NUMA systems: the scheme of interconnecting the processors (e.g. processors linked directly to one another).
[Figure: example block diagrams of the two schemes]

3.1 Design space of the basic platform architecture (4)

a) Scheme of attaching the processors to the rest of the platform (in case of SMP systems):
- DP platforms: a single FSB shared by both processors, or dual FSBs (one per processor) to the MCH.
- MP platforms: a single FSB, dual FSBs or quad FSBs to the MCH.
[Figure: block diagrams of the FSB alternatives, with memory attached to the MCH]

3.1 Design space of the basic platform architecture (5)

b) Scheme of interconnecting the processors (in case of NUMA systems): either a fully connected mesh (every processor is linked directly to every other one) or a partially connected mesh.
[Figure: fully vs. partially connected meshes of four processors, each with its own memory]

3.1 Design space of the basic platform architecture (6)

The notion of basic platform architecture (overview figure repeated; the next aspect considered is the architecture of the memory subsystem).

3.1 Design space of the basic platform architecture (7)

Architecture of the memory subsystem (MSS): characterized by two aspects, the point of attaching the MSS and the layout of the interconnection.

3.1 Design space of the basic platform architecture (8)

a) Point of attaching the MSS (Memory Subsystem) (1)
[Figure: where in the platform is memory attached: to the MCH or to the processor?]

3.1 Design space of the basic platform architecture (9)

Point of attaching the MSS (2):
- Attaching memory to the MCH (Memory Control Hub): longer access time (by ~20-70 %); as the memory controller is on the MCH die, the memory type (e.g. DDR2 or DDR3) and speed grade are not bound to the processor chip design.
- Attaching memory to the processor(s): shorter access time (by ~20-70 %); as the memory controller is on the processor die, the memory type and speed grade are bound to the processor chip design.
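
A toy calculation, as a minimal sketch in Python, of what this latency difference means; the 60 ns base access time is an assumed figure of mine, only the 20-70 % range comes from the slide:

# Toy illustration: cost of an off-die (MCH-attached) memory controller.
# Assumption: 60 ns access time with an on-die controller (my figure);
# the 20-70 % overhead range is taken from the slide above.
on_die_ns = 60.0
for overhead in (0.20, 0.70):
    print(f"+{overhead:.0%}: {on_die_ns * (1 + overhead):.0f} ns")
# +20%: 72 ns   +70%: 102 ns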

3.1 Design space of the basic platform architecture (10)

Related terminology:
- Attaching memory to the MCH: in DT platforms these are DT systems with off-die memory controllers; in DP/MP platforms they are shared memory DP/MP systems, called SMP systems (Symmetrical Multiprocessors).
- Attaching memory to the processor(s): in DT platforms these are DT systems with on-die memory controllers; in DP/MP platforms they are distributed memory DP/MP systems, called NUMA systems (systems with non-uniform memory access).

3.1 Design space of the basic platform architecture (11)

Example 1: point of attaching the MSS in DT systems.
- Attaching memory to the MCH (Intel's processors before Nehalem): DT system with an off-die memory controller; the processor reaches memory through the FSB and the MCH.
- Attaching memory to the processor (Intel's Nehalem and subsequent processors): DT system with an on-die memory controller; memory is attached directly to the processor.
[Figure: block diagrams of the two DT systems]

3.1 Design space of the basic platform architecture (12)

Example 2: point of attaching the MSS in DP servers.
- Attaching memory to the MCH (Intel's processors before Nehalem): shared memory DP server, aka Symmetrical Multiprocessor (SMP); memory does not scale with the number of processors.
- Attaching memory to the processor (Intel's Nehalem and subsequent processors): distributed memory DP server, aka system with non-uniform memory access (NUMA); memory scales with the number of processors.
[Figure: block diagrams of the two DP servers]
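
Why memory scales in the NUMA case can be made concrete with a small sketch; the channel count and the DDR3-1333 speed grade are assumed example values of mine, not figures from this slide:

# Aggregate peak memory bandwidth: SMP (all channels sit on the one MCH)
# vs. NUMA (each processor brings its own on-die controller and channels).
CHANNEL_BW_GBPS = 1.333 * 8          # one DDR3-1333 channel, 8 bytes wide

def smp_bw(n_proc, mch_channels=3):
    return mch_channels * CHANNEL_BW_GBPS      # independent of n_proc

def numa_bw(n_proc, channels_per_proc=3):
    return n_proc * channels_per_proc * CHANNEL_BW_GBPS

for n in (1, 2, 4):
    print(n, round(smp_bw(n), 1), round(numa_bw(n), 1))
# 1: 32.0 vs 32.0   2: 32.0 vs 64.0   4: 32.0 vs 128.0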

3.1 Design space of the basic platform architecture (13)

Figure: Point of attaching the MSS (examples).
Attaching memory to the MCH:
- Core 2 Duo line (2C) (2006) and all preceding Intel lines, Core 2 Quad line (2x2C) (2006/2007), Penryn line (2x2C) (2008)
- AMD's K7 lines (1C)
- POWER4 (2C) (2001)
- Montecito (2C) (2006)
- PA-8800 (2004), PA-8900 (2005) and all previous PA lines
- UltraSPARC II (1C) (~1997)
Attaching memory to the processor(s):
- Nehalem lines (4C) (2008) and all subsequent Intel lines
- Tukwila (4C) (2010)
- Opteron server lines (2C) (2003) and all subsequent AMD lines
- POWER5 (2C) (2005) and subsequent POWER families
- UltraSPARC III (2001) and all subsequent Sun lines

3.1 Design space of the basic platform architecture (14)

b) Layout of the interconnection.
Figure: attaching memory via parallel channels or via serial links.
- Attaching memory via parallel channels: data are transferred over parallel buses, e.g. 64 data bits plus address, command and control as well as clock signals in each cycle.
- Attaching memory via serial links: data are transferred over point-to-point links in the form of packets, e.g. 16 cycles/packet on a 1-bit wide link or 4 cycles/packet on a 4-bit wide link.
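
The cycle counts in the slide's example follow directly from the link width; a minimal sketch (assuming, as the example implies, a 16-bit packet):

import math

def cycles_per_packet(packet_bits, link_width_bits):
    # A packet must be serialized over however many wires the link has.
    return math.ceil(packet_bits / link_width_bits)

assert cycles_per_packet(16, 1) == 16   # 1-bit link: 16 cycles/packet
assert cycles_per_packet(16, 4) == 4    # 4-bit link:  4 cycles/packet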

3.1 Design space of the basic platform architecture (15)

b1) Attaching memory via parallel channels. The memory controller and the DIMMs are connected by a single parallel memory channel or by a small number of memory channels; the channels attach synchronous DIMMs, such as SDRAM, DDR, DDR2 or DDR3 DIMMs.
Example 1: attaching DIMMs via a single parallel memory channel to the memory controller that is implemented on the chipset [45].
[Figure: single-channel DIMM attachment]

3.1 Design space of the basic platform architecture (16)

Example 2: attaching DIMMs via 3 parallel memory channels to memory controllers implemented on the processor die; this is Intel's Tylersburg DP platform, aimed at the Nehalem-EP processor and used for up to 6 cores [46].
[Figure: triple-channel DIMM attachment in the Tylersburg DP platform]
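
Three on-die channels translate into a substantial per-socket peak bandwidth; a rough sketch (DDR3-1333 is an assumed speed grade of mine, which this platform generation supports according to the later slides):

transfers_per_s = 1333e6      # DDR3-1333: 1333 MT/s
bytes_per_transfer = 8        # 64-bit data path per channel
channels = 3
peak_gbps = channels * transfers_per_s * bytes_per_transfer / 1e9
print(f"{peak_gbps:.0f} GB/s per socket")   # ~32 GB/s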

3.1 Design space of the basic platform architecture (17)

The number of lines of the parallel channels: the number of lines needed depends on the kind of memory module, as indicated below:
- SDRAM DIMMs: 168 pins
- DDR DIMMs: 184 pins
- DDR2/DDR3 DIMMs: 240 pins
All these DIMM modules provide an 8-byte wide datapath and optionally ECC and registering.
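
Since every type listed has the same 8-byte datapath, the per-channel peak rate is simply the transfer rate times eight bytes; a small sketch with representative speed grades (my choices, not from the slide):

dimm_speeds_mt_s = {
    "SDRAM (PC133)": 133e6,   # SDR: one transfer per clock
    "DDR-333":       333e6,
    "DDR2-800":      800e6,
    "DDR3-1333":    1333e6,
}
for name, rate in dimm_speeds_mt_s.items():
    print(f"{name}: {rate * 8 / 1e9:.1f} GB/s per channel")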

3.1 Design space of the basic platform architecture (18)

b2) Attaching memory via serial links. Serial memory links are point-to-point interconnects that use differential signaling. Two alternatives exist:
- Serial links attach FB-DIMMs: the FB-DIMMs themselves provide buffering and serial/parallel (S/P) conversion.
- Serial links attach S/P converters with parallel channels: stand-alone S/P converters translate the serial links into parallel memory channels.
[Figure: both serial attachment schemes, from the processor/MCH to the DIMMs]

3.1 Design space of the basic platform architecture (19)

Example 1: FB-DIMM links in Intel's Bensley DP platform, aimed at Core 2 processors.
Two 65 nm Pentium 4 Prescott DP (2x1C) or Core 2 (2C/2x2C) processors connect over dual FSBs to the E5000 MCH, which attaches FB-DIMMs with DDR2-533 over serial FB-DIMM links; the 631xESB/632xESB IOH is attached over the ESI.
Supported processors: Xeon 5000 (Dempsey) 2x1C, Xeon 5100 (Woodcrest) 2C, Xeon 5300 (Clovertown) 2x2C, Xeon 5200 (Wolfdale-DP) 2C or Xeon 5400 (Harpertown) 2x2C.
ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface, providing 1 GB/s transfer rate in each direction).
[Figure: block diagram of the Bensley DP platform]

3.1 Design space of the basic platform architecture (20)

Example 2: SMI links in Intel's Boxboro-EX platform, aimed at the Nehalem-EX processors.
In the Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores), two Nehalem-EX (8C) or Westmere-EX (10C) processors (Xeon 6500, Nehalem-EX, Becton, or Xeon E7, Westmere-EX) are interconnected by QPI; each processor attaches DDR3 memory through Scalable Memory Buffers (SMBs) reached over SMI links; the 7500 IOH connects to the processors over QPI and to the ICH10 over the ESI; ME: Management Engine.
SMI: serial link between the processor and the SMB. SMB: Scalable Memory Buffer with parallel/serial conversion.
[Figure: block diagram of the Boxboro-EX DP platform]

3.1 Design space of the basic platform architecture (21)

Example 2 (continued): the SMI link of Intel's Boxboro-EX platform, aimed at the Nehalem-EX processors [26].
The SMI interface builds on the Fully Buffered DIMM architecture with a few protocol changes, such as those intended to support DDR3 memory devices. It has the same layout as FB-DIMM links (14 outbound and 10 inbound differential lanes as well as a few clock and control lanes). It needs altogether about 50 PCB traces.
[Figure: the SMB between the SMI link and the DDR3 channels]
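
The roughly 50 traces follow from the lane counts, since each differential lane needs two wires; a quick sanity check:

outbound_lanes = 14
inbound_lanes = 10
data_traces = 2 * (outbound_lanes + inbound_lanes)   # differential pairs
print(data_traces)   # 48, plus a few clock/control traces -> about 50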

3.1 Design space of the basic platform architecture (22)

Design space of the architecture of the MSS: the combinations of the point of attaching memory (to the MCH or to the processors) with the layout of the interconnection (parallel channels attach DIMMs, serial links attach FB-DIMMs, or serial links attach S/P converters with parallel channels).
[Figure: the 2x3 design space, with a block diagram for each combination]

3.1 Design space of the basic platform architecture (23)

Max. number of memory channels that can be implemented while using particular design options of the MSS: moving through the fields of the design space of the MSS architecture from left to right and from top to bottom allows an increasing number of memory channels (n_M) to be implemented, as indicated in the next figure.

3.1 Design space of the basic platform architecture (24)

[Figure: the design space of the architecture of the MSS, annotated with the achievable number of memory channels (n_M) for each design option]

3.1 Design space of the basic platform architecture (25)

The design space of the basic platform architecture (1): the basic platform architecture combines the architecture of the processor subsystem with the architecture of the memory subsystem (the I/O subsystem is not considered).
[Figure: overview tree repeated]

3.1 Design space of the basic platform architecture (26)

The design space of the basic platform architecture (2): obtained as the combinations of the options available for the main aspects discussed:
- Architecture of the processor subsystem: the scheme of attaching the processors (in case of SMP systems) or the scheme of interconnecting the processors (in case of NUMA systems).
- Architecture of the memory subsystem (MSS): the point of attaching the MSS and the layout of the interconnection.

3.1 Design space of the basic platform architecture (27)

Design space of the basic architecture of particular platforms: the design spaces of the basic architecture of DT, DP and MP platforms will be discussed subsequently in Sections 3.2.1, 3.3.1 and 3.4.1.

3.2. DT platforms

3.2.1. Design space of the basic architecture of DT platforms
3.2.2. Evolution of Intel's home user oriented multicore DT platforms
3.2.3. Evolution of Intel's business user oriented multicore DT platforms

3.2 DT platforms

3.2.1 Design space of the basic architecture of DT platforms

3.2.1 Design space of the basic architecture of DT platforms (1)

DT platforms (single processor):
- Attaching memory to the MCH, parallel channels attach DIMMs: Pentium D/EE to Penryn (up to 4C); the processor connects over the FSB to the MCH, which also attaches the ICH.
- Attaching memory to the processor, parallel channels attach DIMMs: 1st generation Nehalem to Sandy Bridge (up to 6C).
The serial layouts (FB-DIMMs, or S/P converters with parallel channels) were not used in DT platforms.
[Figure: design space of the basic architecture of DT platforms, with the number of memory channels]

3.2.1 Design space of the basic architecture of DT platforms (2)

Evolution of Intel's DT platforms (overview):
- Attaching memory to the MCH, parallel channels attach DIMMs: Pentium D/EE 2x1C (2005/6), Core 2 2C (2006), Core 2 Quad 2x2C (2007), Penryn 2C/2x2C (2008).
- Attaching memory to the processor, parallel channels attach DIMMs: 1st gen. Nehalem 4C (2008), Westmere-EP 6C (2010), 2nd gen. Nehalem 4C (2009), Westmere-EP 2C+G (2010), Sandy Bridge 2C/4C+G (2011), Sandy Bridge-E 6C (2011).
Serial memory interconnects were not employed: there was no need for the higher memory bandwidth attainable through serial memory interconnection.
[Figure: the evolution mapped onto the design space, with the number of memory channels]

3.2.2 Evolution of Intel's home user oriented multicore DT platforms (1)

- Anchor Creek (2005): Pentium D/Pentium EE (2x1C), 945/955X/975X MCH + ICH7 (FSB, DMI); up to 4 DDR2 DIMMs with up to 4 ranks.
- Bridge Creek (2006, Core 2 aimed), Salt Creek (2007, Core 2 Quad aimed), Boulder Creek (2008, Penryn aimed): Core 2 (2C)/Core 2 Quad (2x2C)/Penryn (2C/2x2C), 965/3-/4-series MCH + ICH8/9/10 (FSB, DMI); up to DDR2-800, later up to DDR3.
- Tylersburg (2008): 1st gen. Nehalem (4C)/Westmere-EP (6C) with on-die memory controllers (up to DDR3), X58 IOH + ICH10 (QPI, DMI).
[Figure: block diagrams of the successive home user oriented DT platforms]

3.2.2 Evolution of Intel's home user oriented multicore DT platforms (2)

- Tylersburg (2008): 1st gen. Nehalem (4C)/Westmere-EP (6C), X58 IOH + ICH10 (QPI, DMI); up to DDR3.
- Kings Creek (2009): 2nd gen. Nehalem (4C)/Westmere-EP (2C+G), 5-series PCH (DMI, FDI); up to DDR3.
- Sugar Bay (2011): Sandy Bridge (4C+G), 6-series PCH (DMI2, FDI); up to DDR3.
[Figure: block diagrams of the successive home user oriented DT platforms]

3.2.2 Evolution of Intel's home user oriented multicore DT platforms (3)

- Tylersburg (2008): 1st gen. Nehalem (4C)/Westmere-EP (6C), X58 IOH + ICH10 (QPI, DMI); up to DDR3.
- Waimea Bay (2011): Sandy Bridge-E (4C/6C), X79 PCH (DMI2); DDR3-1600 with up to 1 DIMM per channel, DDR3-1333 with up to 2 DIMMs per channel.
[Figure: block diagrams of both platforms]

3.2.3 Evolution of Intel's business user oriented multicore DT platforms (1)

- Lyndon (2005): Pentium D/Pentium EE (2x1C), 945/955X/975X MCH + ICH7 (FSB, DMI); up to 4 DDR2 DIMMs with up to 4 ranks; Gigabit Ethernet LAN connection via the 82573E GbE controller (Tekoe) attached over the LCI.
- Averill Creek (2006, Core 2 aimed), Weybridge (2007, Core 2 Quad aimed), McCreary (2008, Penryn aimed): Core 2 (2C)/Core 2 Quad (2x2C)/Penryn (2C/2x2C), Q965/Q35/Q45 MCH + ICH8/9/10 (FSB, DMI); up to DDR2-800, later up to DDR3; Gigabit Ethernet LAN connection via the 82566/82567 LAN PHY attached over the LCI/GLCI.
- Piketon (2009): 2nd gen. Nehalem (4C)/Westmere-EP (2C+G), Q57 PCH (DMI, FDI, C-link) with ME; up to DDR3; Gigabit Ethernet LAN connection via a GbE LAN PHY attached over PCIe 2.0/SMBus 2.0.
[Figure: block diagrams of the successive business user oriented DT platforms]

3.2.3 Evolution of Intel's business user oriented multicore DT platforms (2)

- Piketon (2009): 2nd gen. Nehalem (4C)/Westmere-EP (2C+G), Q57 PCH (DMI, FDI) with ME; up to DDR3; Gigabit Ethernet LAN connection via a GbE LAN PHY over PCIe 2.0/SMBus 2.0.
- Sugar Bay (2011): Sandy Bridge (4C+G), Q67 PCH (DMI2, FDI) with ME; up to DDR3; Gigabit Ethernet LAN connection via a GbE LAN over PCIe 2.0/SMBus 2.0.
[Figure: block diagrams of both platforms]

3.3. DP server platforms

3.3.1. Design space of the basic architecture of DP server platforms
3.3.2. Evolution of Intel's low cost oriented multicore DP server platforms
3.3.3. Evolution of Intel's performance oriented multicore DP server platforms

3.3 DP server platforms

3.3.1 Design space of the basic architecture of DP server platforms

3.3.1 Design space of the basic architecture of DP server platforms (1)

DP platforms:
- SMP, single FSB, memory attached to the MCH via parallel channels: 90 nm Pentium 4 DP 2x1C (2005), Core 2/Penryn 2C/2x2C (2006/7).
- SMP, dual FSBs, memory attached to the MCH via serial FB-DIMM links: 65 nm Pentium 4 DP 2x1C, Core 2/Penryn 2C/2x2C (2006/7).
- NUMA, memory attached to the processors via parallel channels: Nehalem-EP to Sandy Bridge-EP/EN, up to 8C (2009/11).
- NUMA, memory attached to the processors via serial links to S/P converters: Nehalem-EX/Westmere-EX 8C/10C (2010/11).
[Figure: design space of the basic architecture of DP server platforms, with the number of memory channels (n_M)]

3.3.1 Design space of the basic architecture of DP server platforms (2)

Evolution of Intel's DP platforms (overview):
- SMP, single FSB, parallel channels to the MCH: 90 nm Pentium 4 DP 2x1C (2006) (Paxville DP); Core 2 2C/Core 2 Quad 2x2C/Penryn 2C/2x2C (2006/2007) (Cranberry Lake).
- SMP, dual FSBs, FB-DIMM links to the MCH: 65 nm Pentium 4 DP 2x1C, Core 2 2C, Core 2 Quad 2x2C, Penryn 2C/2x2C (2006/2007) (Bensley).
- NUMA, parallel channels to the processors: Nehalem-EP 4C (2009), Westmere-EP 6C (2010) (Tylersburg-EP); Sandy Bridge-EN 8C (2011) (Romley-EN); Sandy Bridge-EP 8C (2011) (Romley-EP).
- NUMA, serial links to S/P converters: Nehalem-EX/Westmere-EX 8C/10C (2010/2011) (Boxboro-EX).
The figure also marks the cost-effective vs. high performance (HP) alternatives and the attainable number of memory channels (n_M).
[Figure: the evolution mapped onto the design space]

3.3.2 Evolution of Intel's low cost oriented multicore DP server platforms

3.3.2 Evolution of Intel's low cost oriented multicore DP server platforms (2)

Evolution from the Pentium 4 Prescott DP aimed DP platform (up to 2 cores) to the Penryn aimed Cranberry Lake DP platform (up to 4 cores):
- 90 nm Pentium 4 Prescott DP aimed DP server platform (for up to 2C): two 90 nm Pentium 4 Prescott DP processors on a single FSB to the E7520 MCH (DDR-266/333 or DDR2-400 memory); ICH5R/6300ESB IOH attached over HI 1.5. Supported processor: Xeon DP 2.8 (Paxville DP).
  HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate.
- Penryn aimed Cranberry Lake DP server platform (for up to 4C): two Core 2 (2C)/Core 2 Quad (2x2C)/Penryn (2C/2x2C) processors on a single FSB to the E5100 MCH (DDR2-533/667 memory); ICH9R attached over the ESI. Supported processors: Xeon 5300 (Clovertown) 2x2C, Xeon 5400 (Harpertown) 4C or Xeon 5200 (Wolfdale-DP) 2C.
  ESI (Enterprise System Interface): 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface, providing 1 GB/s transfer rate in each direction).
[Figure: block diagrams of both platforms]
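
The HI 1.5 peak rate quoted above can be checked from its parameters; a one-line calculation (a 66.6 MHz nominal clock is assumed behind the rounded "66 MHz" figure):

width_bytes = 1            # HI 1.5 is 8 bits wide
clock_hz = 66.6e6          # nominal "66 MHz" clock
transfers_per_clock = 4    # QDR: four transfers per clock
peak = width_bytes * clock_hz * transfers_per_clock
print(f"{peak / 1e6:.0f} MB/s")   # ~266 MB/s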

3.3.2 Evolution of Intel's low cost oriented multicore DP server platforms (3)

Evolution from the Penryn aimed Cranberry Lake DP platform (up to 4 cores) to the Sandy Bridge-EN aimed Romley-EN DP platform (up to 8 cores):
- Penryn aimed Cranberry Lake DP platform (for up to 4C): two Core 2 (2C/2x2C)/Penryn (2C/4C) processors on a single FSB to the E5100 MCH (DDR2-533/667); ICH9R attached over the ESI. Supported processors: Xeon 5300 (Clovertown) 2x2C, Xeon 5400 (Harpertown) 4C or Xeon 5200 (Wolfdale-DP) 2C.
- Sandy Bridge-EN (Socket B2) aimed Romley-EN DP server platform (for up to 8 cores): two Sandy Bridge-EN (8C, Socket B2) processors with on-die DDR3 controllers, interconnected by QPI; C600 PCH attached over DMI2. Supported processor: Xeon E5 (Sandy Bridge-EN) 8C.
[Figure: block diagrams of both platforms]

3.3.3 Evolution of Intel's performance oriented multicore DP server platforms

3.3.3 Evolution of Intel's performance oriented multicore DP server platforms (2)

Evolution from the 90 nm Pentium 4 Prescott DP aimed DP platform (up to 2 cores) to the Core 2 aimed Bensley DP platform (up to 4 cores):
- 90 nm Pentium 4 Prescott DP aimed DP server platform (for up to 2C): two 90 nm Pentium 4 Prescott DP (2x1C) processors on a single FSB to the E7520 MCH (DDR-266/333 or DDR2-400); ICH5R/6300ESB IOH attached over HI 1.5 (Hub Interface 1.5: 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate). Supported processor: Xeon DP 2.8 (Paxville DP).
- Core 2 aimed Bensley DP server platform (for up to 4C): two 65 nm Pentium 4 Prescott DP (2x1C) or Core 2 (2C/2x2C) processors on dual FSBs to the E5000 MCH (FB-DIMMs with DDR2-533); 631xESB/632xESB IOH attached over the ESI (Enterprise System Interface: 4 PCIe lanes, 0.25 GB/s per lane, like the DMI interface, providing 1 GB/s transfer rate in each direction). Supported processors: Xeon 5000 (Dempsey) 2x1C, Xeon 5100 (Woodcrest) 2C, Xeon 5300 (Clovertown) 2x2C, Xeon 5200 (Wolfdale-DP) 2C or Xeon 5400 (Harpertown) 2x2C.
[Figure: block diagrams of both platforms]

3.3.3 Evolution of Intel's performance oriented multicore DP server platforms (3)

Evolution from the Core 2 aimed Bensley DP platform (up to 4 cores) to the Nehalem-EP aimed Tylersburg-EP DP platform (up to 6 cores):
- 65 nm Core 2 aimed high performance Bensley DP server platform (for up to 4C): dual FSBs to the 5000 MCH (FB-DIMMs with DDR2); 631xESB/632xESB IOH attached over the ESI.
- Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6C), in variants with a single 55xx IOH (1) or with dual 55xx IOHs (1): two Nehalem-EP (4C)/Westmere-EP (6C) processors with on-die DDR3 controllers, interconnected by QPI; the IOH(s) connect to the processors over QPI, to the ICH9/ICH10 over the ESI and to the ME over CLink.
(1) First chipset with PCIe 2.0. ME: Management Engine.
[Figure: block diagrams of the Bensley platform and of the single- and dual-IOH Tylersburg-EP platforms]

3.3.3 Evolution of Intel's performance oriented multicore DP server platforms (4)

Basic system architecture of the Sandy Bridge-EP aimed Romley-EP DP server platform, contrasted with its predecessor:
- Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6 cores): two Nehalem-EP (4C)/Westmere-EP (6C) processors (Xeon 55xx, Gainestown, or Xeon 56xx, Gulftown) with on-die DDR3 controllers, interconnected by QPI; 34xx PCH attached over DMI; ME.
- Sandy Bridge-EP (Socket R, LGA 2011) aimed Romley-EP DP server platform (for up to 8 cores): two Sandy Bridge-EP (8C, Socket R) processors (Xeon E5, Sandy Bridge-EP, 8C) with on-die DDR3 controllers, interconnected by QPI 1.1 links; C600 PCH attached over DMI2.
[Figure: block diagrams of both platforms]

3.3.3 Evolution of Intel's performance oriented multicore DP server platforms (5)

Contrasting the Nehalem-EP aimed Tylersburg-EP DP platform (up to 6 cores) with the Nehalem-EX aimed very high performance scalable Boxboro-EX DP platform (up to 10 cores):
- Tylersburg-EP: two Nehalem-EP (4C)/Westmere-EP (6C) processors (Xeon 5500, Gainestown, or Xeon 5600, Gulftown) with on-die DDR3 controllers, interconnected by QPI; 34xx PCH attached over the ESI; ME.
- Boxboro-EX: two Nehalem-EX (8C)/Westmere-EX (10C) processors (Xeon 6500, Nehalem-EX, Becton, or Xeon E7, Westmere-EX); each processor attaches DDR3 memory through SMBs reached over SMI links; QPI links interconnect the processors and the 7500 IOH; ICH10 attached over the ESI; ME.
SMI: serial link between the processor and the SMB. SMB: Scalable Memory Buffer with parallel/serial conversion.
[Figure: block diagrams of both platforms]

3.4. MP server platforms

3.4.1. Design space of the basic architecture of MP server platforms
3.4.2. Evolution of Intel's multicore MP server platforms
3.4.3. Evolution of AMD's multicore MP server platforms

3.4 MP server platforms

3.4.1 Design space of the basic architecture of MP server platforms

3.4.1 Design space of the basic architecture of MP server platforms (1)

MP SMP platforms:
- Single FSB, parallel channels attach DIMMs to the MCH: Pentium 4 MP 1C (2004).
- Dual FSBs, serial links attach S/P converters with parallel channels to the MCH: 90 nm Pentium 4 MP 2x1C.
- Quad FSBs, serial links attach FB-DIMMs to the MCH: Core 2/Penryn up to 6C.
[Figure: design space of the basic architecture of MP SMP platforms]

3.4.1 Design space of the basic architecture of MP server platforms (2)

MP NUMA platforms:
- Partially connected mesh, parallel channels attach DIMMs: AMD Direct Connect Architecture 1.0 (2003).
- Fully connected mesh, parallel channels attach DIMMs: AMD Direct Connect Architecture 2.0 (2010).
- Fully connected mesh, serial links attach S/P converters with parallel channels: Nehalem-EX/Westmere-EX up to 10C (2010/11).
Moving from a partially to a fully connected mesh raises the inter-processor bandwidth; the layout options differ in the attainable memory bandwidth.
[Figure: design space of the basic architecture of MP NUMA platforms]

3.4.1 Design space of the basic architecture of MP server platforms (3)

Evolution of Intel's (and AMD's) MP platforms (overview):
- SMP, single FSB, parallel channels to the MCH: Pentium 4 MP 1C (2004) (platform not named).
- SMP, dual FSBs, serial links to S/P converters: 90 nm Pentium 4 MP 2x1C (2006) (Truland).
- SMP, quad FSBs, FB-DIMM links: Core 2/Penryn up to 6C (2006/2007) (Caneland).
- NUMA, partially connected mesh, parallel channels: AMD DCA 1.0 (2003).
- NUMA, fully connected mesh, parallel channels: AMD DCA 2.0 (2010).
- NUMA, fully connected mesh, serial links to S/P converters: Nehalem-EX/Westmere-EX up to 10C (2010/11) (Boxboro-EX).
[Figure: the evolution mapped onto the design space, annotated with the number of memory channels and the inter-processor bandwidth]

3.4.2 Evolution of Intel's multicore MP server platforms

3.4.2 Evolution of Intel's multicore MP server platforms (2)

Evolution from the first generation MP servers supporting single-core processors to the 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (supporting up to 2 cores):
- Previous Pentium 4 MP aimed MP server platform (for single core processors): four Xeon MP (1C) processors share a single FSB to the preceding north bridges, which attach memory (e.g. DDR-200) over parallel channels and the preceding ICHs over e.g. HI 1.5 (266 MB/s).
- 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (for up to 2C): four Pentium 4 Xeon MP (1C/2x1C) processors on dual FSBs to the E8500/8501 MCH, which attaches DDR-266/333 or DDR2-400 memory through XMBs over serial links and the ICH5 over HI 1.5. Supported processors: Xeon 7000 (Paxville MP) 2x1C, Xeon 7100 (Tulsa) 2C or Xeon MP (Potomac) 1C.
[Figure: block diagrams of both platforms]

3.4.2 Evolution of Intel's multicore MP server platforms (3)

Evolution from the 90 nm Pentium 4 Prescott MP aimed Truland MP platform (up to 2 cores) to the Core 2 aimed Caneland MP platform (up to 6 cores):
- Truland (for up to 2C): four Pentium 4 Xeon MP (1C/2x1C) processors on dual FSBs to the E8500/8501 (1) MCH; DDR-266/333 or DDR2-400 memory behind XMBs; ICH5 attached over HI 1.5 (Hub Interface 1.5: 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate). Supported processors: Xeon 7000 (Paxville MP) 2x1C, Xeon 7100 (Tulsa) 2C or Xeon MP (Potomac) 1C.
  (1) The E8500 MCH supports an FSB of 667 MT/s and consequently only the SC Xeon MP (Potomac).
- Caneland (for up to 6C): four Core 2 (2C/2x2C)/Penryn (6C) processors, each on its own FSB (quad FSBs), to the MCH; FB-DIMM DDR2-533/667 memory on 4 channels with up to 8 DIMMs per channel; 631xESB/632xESB attached over the ESI (Enterprise System Interface: 4 PCIe lanes, 0.25 GB/s per lane, like the DMI interface, providing 1 GB/s transfer rate in each direction). Supported processors: Xeon 7200 (Tigerton DC) 1x2C, Xeon 7300 (Tigerton QC) 2x2C or Xeon 7400 (Dunnington) 6C.
[Figure: block diagrams of both platforms]

3.4.2 Evolution of Intel's multicore MP server platforms (4)

Evolution to the Nehalem-EX aimed Boxboro-EX MP platform (supporting up to 10 cores); the basic system architecture is shown with the single-IOH alternative:
Four Nehalem-EX (8C)/Westmere-EX (10C) processors (Xeon 7500, Nehalem-EX, Becton, 8C, or Xeon E7, Westmere-EX, 10C) are interconnected by QPI links; each processor attaches DDR3 memory through SMBs reached over 2x4 SMI channels; the 7500 IOH connects to the processors over QPI and to the ICH10 over the ESI.
SMI: serial link between the processor and the SMBs. SMB: Scalable Memory Buffer (parallel/serial converter). ME: Management Engine.
[Figure: block diagram of the Boxboro-EX MP platform]

3.4.3 Evolution of AMD's multicore MP server platforms (1)

Evolution of AMD's multicore MP server platforms [47]:
Direct Connect Architecture 1.0: introduced with the single core K8-based Opteron DP/MP servers (AMD 24x/84x) (6/2003). Memory: 2 channels of DDR-200/333 per processor, 4 DIMMs per channel.
[Figure: DCA 1.0 topology]

3.4.3 Evolution of AMD's multicore MP server platforms (2)

Direct Connect Architecture 2.0 [47]: introduced with the 2x6 core K10-based Magny-Cours (AMD 6100) (3/2010). Memory: 2x2 channels of DDR3 per processor, 3 DIMMs per channel.
[Figure: DCA 2.0 topology]
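
The per-socket DIMM capacity of the two generations can be compared directly from the figures above; a quick sketch of my own:

configs = {
    "DCA 1.0 (K8 Opteron)":  {"channels": 2,     "dimms_per_channel": 4},
    "DCA 2.0 (Magny-Cours)": {"channels": 2 * 2, "dimms_per_channel": 3},
}
for name, c in configs.items():
    print(name, c["channels"] * c["dimms_per_channel"], "DIMM slots per socket")
# DCA 1.0: 8 slots, DCA 2.0: 12 slots (plus the move from DDR to DDR3)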

9. References

[1]: Wikipedia: Centrino.
[2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr., +Architecture%3B+Platform...-a
[3]: Intel Core 2 Duo Processor.
[4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000.
[5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec.
[6]: Perich D., Intel Volume Platforms Technology Leadership, Presentation at HP World 2004.
[7]: Powerful New Intel Server Platforms Feature Array of Enterprise-Class Innovations, Intel press release, Aug. 2, 2004.
[8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March.
[9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf
[10]: Davis L., PCI Express Bus.
[11]: Ng P. K., High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor, IDF Taipei, TDPS001, 2008, Taipei_TDPS001_100.pdf
[12]: Computing DRAM, Samsung.com, /products/dram/Products_ComputingDRAM.html
[13]: Samsung's Green DDR3 - Solution 3, 20 nm class 1.35 V, Sept. 2011, Documents/downloads/green_ddr3_2011.pdf
[14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Jan. 2002.
[15]: Datasheet, SD9C16_32x72.pdf
[16]: Solanki V., Design Guide Lines for Registered DDR DIMM Module, Application Note AN37, Pericom, Nov. 2001.
[17]: Detecting Memory Bandwidth Saturation in Threaded Applications, Intel, March, threaded-applications/
[18]: McCalpin J. D., STREAM Memory Bandwidth, July.
[19]: Rogers B., Krishna A., Bell G., Vu K., Jiang X., Solihin Y., Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling, ISCA 2009, Vol. 37, Issue 1.
[20]: Vogt P., Fully Buffered DIMM (FB-DIMM) Server Memory Architecture: Capacity, Performance, Reliability, and Longevity, Febr.
[21]: Wikipedia: Intel X58, 2011.
[22]: Sharma D. D., Intel 5520 Chipset: An I/O Hub Chipset for Server, Workstation, and High End Desktop, Hot Chips 2009, HC I-O-Epub/HC DasSharma-Intel-5520-Chipset.pdf
[23]: DDR2 SDRAM FBDIMM, Micron Technology, 2005.
[24]: Wikipedia: Fully Buffered DIMM, 2011.
[25]: Intel E8500 Chipset eXternal Memory Bridge (XMB) Datasheet, March 2005, bridge-datasheet.pdf
[26]: Intel 7500/7510/7512 Scalable Memory Buffer Datasheet, April 2011, buffer-datasheet.pdf
[27]: AMD Unveils Forward-Looking Technology Innovation To Extend Memory Footprint for Server Computing, July.
[28]: Chiappetta M., More AMD G3MX Details Emerge, Hot Hardware, Aug.
[29]: Goto S. H., The Following Server Platforms of AMD, PC Watch, May.
[30]: Wikipedia: Socket G3 Memory Extender, 2011.
[31]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Xilinx Inc., Febr. 2003.
[32]: Ahn J.-H., Memory Design Overview, Hynix, March 2007.
[33]: Ebeling C., Koontz T., Krueger R., System Clock Management Simplified with Virtex-II Pro FPGAs, WP190, Xilinx, Febr.
[34]: Kirstein B., Practical Timing Analysis for 100-MHz Digital Design, EDN, Aug. 8, 2002.
[35]: Jacob B., Ng S. W., Wang D. T., Memory Systems, Elsevier, 2008.
[36]: Allan G., The Outlook for DRAMs in Consumer Electronics, EETimes Europe Online, 01/12/2007.
[37]: Ebeling C., Koontz T., Krueger R., System Clock Management Simplified with Virtex-II Pro FPGAs, WP190, Xilinx, Febr.
[38]: Introducing FB-DIMM Memory: Birth of Serial RAM?, PCStats, Dec. 23, 2005.
[45]: Memory Technology Evolution: an Overview of System Memory Technologies, Technology brief, 9th edition, HP, Dec. 2010.
[46]: Kane L., Nguyen H., Take the Lead with Jasper Forest, the Future Intel Xeon Processor for Embedded and Storage, IDF 2009, July, ftp://download.intel.com/embedded/processor/prez/SF09_EMBS001_100.pdf
[47]: The AMD Opteron 6000 Series Platform: More Cores, More Memory, Better Value, March 2010, -platform-press-presentation-final