
1 Dezső Sima 2011 December (Ver. 1.5) © Sima Dezső, 2011 Platforms II.

2 3. Platform architectures

3 Contents
3. Platform architectures
3.1. Design space of the basic platform architecture
3.2. DT platforms
3.2.1. Design space of the basic architecture of DT platforms
3.2.2. Evolution of Intel’s home user oriented multicore DT platforms
3.2.3. Evolution of Intel’s business user oriented multicore DT platforms
3.3. DP server platforms
3.3.1. Design space of the basic architecture of DP server platforms
3.3.2. Evolution of Intel’s low cost oriented multicore DP server platforms
3.3.3. Evolution of Intel’s performance oriented multicore DP server platforms

4 Contents
3.4. MP server platforms
3.4.1. Design space of the basic architecture of MP server platforms
3.4.2. Evolution of Intel’s multicore MP server platforms
3.4.3. Evolution of AMD’s multicore MP server platforms

5 3.1. Design space of the basic platform architecture

6 3.1 Design space of the basic platform architecture (1)
Platform architecture:
- Architecture of the processor subsystem: interpreted only for DP/MP systems; in SMPs it specifies the interconnection of the processors and the chipset, in NUMAs it specifies the interconnections between the processors.
- Architecture of the memory subsystem: specifies the point and the layout of the interconnection.
- Architecture of the I/O subsystem: specifies the structure of the I/O subsystem (will not be discussed).
Example: Core 2/Penryn based MP platform: the chipset consists of two parts, designated as the MCH and the ICH; the processors are connected to the MCH by individual FSBs; memory is attached to the MCH over serial FB-DIMM channels.

7 The notion of Basic platform architecture Platform architecture Architecture of the processor subsystem Architecture of the I/O subsystem Architecture of the memory subsystem Basic platform architecture 3.1 Design space of the basic platform architecture (2)


9 Architecture of the processor subsystem
Interpreted only for DP and MP systems; the interpretation depends on whether the multiprocessor system is an SMP or a NUMA.
- SMP systems: scheme of attaching the processors to the rest of the platform (example: processors attached over an FSB to the MCH/ICH).
- NUMA systems: scheme of interconnecting the processors (example: directly interconnected processors).
3.1 Design space of the basic platform architecture (3)

10 a) Scheme of attaching the processors to the rest of the platform (in case of SMP systems)
- DP platforms: single FSB (both processors share one FSB to the MCH) or dual FSBs (each processor has its own FSB to the MCH); memory is attached to the MCH.
- MP platforms: single FSB, dual FSBs or quad FSBs; memory is attached to the MCH.
3.1 Design space of the basic platform architecture (4)
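A simple way to see why the FSB count matters in these attachment schemes is to compare the per-processor share of front-side-bus bandwidth. The sketch below is only an illustration; the 1333 MT/s, 8-byte wide FSB is an assumed example value, not a figure taken from the slides.

# Minimal sketch (Python): per-processor share of FSB bandwidth for the SMP
# attachment schemes above. The FSB parameters are assumed example values.
def fsb_bandwidth_gbs(mt_per_s=1333, width_bytes=8):
    return mt_per_s * width_bytes / 1000.0          # GB/s per FSB

def per_processor_share(n_processors, n_fsbs, fsb_gbs):
    # Processors sharing an FSB split its bandwidth; more FSBs mean fewer
    # processors per bus and therefore a larger share per processor.
    return fsb_gbs * n_fsbs / n_processors

fsb = fsb_bandwidth_gbs()
for label, n_proc, n_fsb in [("DP, single FSB", 2, 1), ("DP, dual FSBs", 2, 2),
                             ("MP, single FSB", 4, 1), ("MP, dual FSBs", 4, 2),
                             ("MP, quad FSBs", 4, 4)]:
    print(f"{label}: {per_processor_share(n_proc, n_fsb, fsb):.1f} GB/s per processor")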

11 b) Scheme of interconnecting the processors (in case of NUMA systems)
- Fully connected mesh: every processor (with its own memory) is directly linked to every other processor.
- Partially connected mesh: only some processor pairs are directly linked; the remaining pairs communicate via intermediate processors.
3.1 Design space of the basic platform architecture (5)

12 The notion of Basic platform architecture Platform architecture Architecture of the processor subsystem Architecture of the I/O subsystem Architecture of the memory subsystem Basic platform architecture 3.1 Design space of the basic platform architecture (6)

13 Architecture of the memory subsystem (MSS)
a) Point of attaching the MSS
b) Layout of the interconnection
3.1 Design space of the basic platform architecture (7)

14 a) Point of attaching the MSS (Memory Subsystem) (1)
Within the platform, memory may be attached either to the MCH or to the processor.
3.1 Design space of the basic platform architecture (8)

15 Point of attaching the MSS (2)
- Attaching memory to the MCH (Memory Control Hub): longer access time (~ 20 – 70 %); as the memory controller is on the MCH die, the memory type (e.g. DDR2 or DDR3) and speed grade is not bound to the processor chip design.
- Attaching memory to the processor(s): shorter access time (~ 20 – 70 %); as the memory controller is on the processor die, the memory type (e.g. DDR2 or DDR3) and speed grade is bound to the processor chip design.
3.1 Design space of the basic platform architecture (9)
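As a rough illustration of the access-time difference quoted above, the sketch below applies the ~20 – 70 % range to an assumed baseline latency; the 60 ns on-die figure is a made-up example, not a measured value.

# Rough illustration (Python) of the ~20-70 % longer access time of an
# off-die (MCH-based) memory controller versus an on-die one.
# The 60 ns on-die baseline is an assumed example value.
on_die_latency_ns = 60.0
for extra in (0.20, 0.70):                      # 20 % and 70 % longer
    off_die = on_die_latency_ns * (1.0 + extra)
    print(f"+{extra:.0%}: {off_die:.0f} ns via the MCH vs {on_die_latency_ns:.0f} ns on-die")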

16 Point of attaching the MSS: related terminology
- Attaching memory to the MCH (Memory Control Hub): DT platforms: DT systems with off-die memory controllers; DP/MP platforms: shared memory DP/MP systems, i.e. SMP systems (Symmetrical Multiprocessors).
- Attaching memory to the processor(s): DT platforms: DT systems with on-die memory controllers; DP/MP platforms: distributed memory DP/MP systems, i.e. NUMA systems (systems with non-uniform memory access).
3.1 Design space of the basic platform architecture (10)

17 Example 1: Point of attaching the MSS in DT systems
- Attaching memory to the MCH: Intel’s processors before Nehalem (DT system with off-die memory controller; the processor is connected to the MCH over the FSB, memory is attached to the MCH, the ICH is attached to the MCH).
- Attaching memory to the processor(s): Intel’s Nehalem and subsequent processors (DT system with on-die memory controller; memory is attached to the processor).
3.1 Design space of the basic platform architecture (11)

18 Example 2: Point of attaching the MSS in SMP-based DP servers
- Attaching memory to the MCH: Intel’s processors before Nehalem; shared memory DP server, aka Symmetrical Multiprocessor (SMP); memory does not scale with the number of processors.
- Attaching memory to the processor(s): Intel’s Nehalem and subsequent processors; distributed memory DP server, aka system with non-uniform memory access (NUMA); memory scales with the number of processors (see the sketch below).
3.1 Design space of the basic platform architecture (12)
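The scaling remark above can be made concrete with a small sketch: in the NUMA case each added processor brings its own memory channels, while in the SMP case all processors share the channels behind the MCH. The per-processor channel count and per-channel bandwidth below are assumed example values.

# Sketch (Python) of why memory bandwidth scales with the processor count in
# a NUMA (on-die controller) system but not in an SMP (MCH-based) system.
# 3 channels per processor and 8.5 GB/s per channel are assumed example values.
channels_per_processor = 3
channel_gbs = 8.5

def numa_bandwidth(n_processors):
    # Each processor contributes its own memory channels.
    return n_processors * channels_per_processor * channel_gbs

def smp_bandwidth(n_processors):
    # The MCH owns a fixed set of channels, shared by all processors.
    return channels_per_processor * channel_gbs

for n in (1, 2, 4):
    print(f"{n} processor(s): NUMA {numa_bandwidth(n):.1f} GB/s, SMP {smp_bandwidth(n):.1f} GB/s")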

19 Figure: Point of attaching the MSS (examples)
- Attaching memory to the MCH: UltraSPARC II (1C) (~1997); AMD’s K7 lines (1C) (1999-2003); POWER4 (2C) (2001); PA-8800 (2004), PA-8900 (2005) and all previous PA lines; Montecito (2C) (2006); Core 2 Duo line (2C) (2006) and all preceding Intel lines; Core 2 Quad line (2x2C) (2006/2007); Penryn line (2x2C) (2008).
- Attaching memory to the processor(s): UltraSPARC III (2001) and all subsequent Sun lines; Opteron server lines (2C) (2003) and all subsequent AMD lines; POWER5 (2C) (2005) and subsequent POWER families; Nehalem lines (4C) (2008) and all subsequent Intel lines; Tukwila (4C) (2010??).
3.1 Design space of the basic platform architecture (13)

20 b) Layout of the interconnection
Figure: Attaching memory via parallel channels or serial links
- Attaching memory via parallel channels: data are transferred over parallel buses, e.g. 64 bits of data plus address, command and control as well as clock signals in each cycle.
- Attaching memory via serial links: data are transferred over point-to-point links in form of packets, e.g. 16 cycles/packet on a 1-bit wide link or 4 cycles/packet on a 4-bit wide link.
3.1 Design space of the basic platform architecture (14)
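The cycle counts quoted for the serial case follow directly from dividing the packet length by the link width; the sketch below reproduces that arithmetic for the 16-bit example packet.

# Cycles needed to move one packet over a serial link of a given width,
# reproducing the example above (a 16-bit packet: 16 cycles on a 1-bit link,
# 4 cycles on a 4-bit link).
import math

def cycles_per_packet(packet_bits, link_width_bits):
    return math.ceil(packet_bits / link_width_bits)

for width in (1, 4):
    print(f"{width}-bit link: {cycles_per_packet(16, width)} cycles per 16-bit packet")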

21 b1) Attaching memory via parallel channels
The memory controller and the DIMMs are connected by a single parallel memory channel or a few memory channels; the channels attach synchronous DIMMs, such as SDRAM, DDR, DDR2 or DDR3 DIMMs.
Example 1: Attaching DIMMs via a single parallel memory channel to the memory controller that is implemented on the chipset [45]
3.1 Design space of the basic platform architecture (15)
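Since a parallel memory channel transfers 8 bytes of data per transfer (see the DIMM datapath width two slides below), its peak bandwidth is simply the transfer rate times eight bytes; the sketch below applies this to a few of the speed grades mentioned in this section as an illustration.

# Peak bandwidth of a single parallel memory channel (Python): transfers per
# second times the 8-byte DIMM datapath. The speed grades are examples.
def channel_peak_gbs(mt_per_s, width_bytes=8):
    return mt_per_s * width_bytes / 1000.0

for name, rate in [("DDR2-667", 667), ("DDR2-800", 800),
                   ("DDR3-1067", 1067), ("DDR3-1333", 1333)]:
    print(f"{name}: {channel_peak_gbs(rate):.1f} GB/s per channel")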

22 Example 2: Attaching DIMMs via 3 parallel memory channels to memory controllers implemented on the processor die (this is actually Intel’s Tylersburg DP platform, aimed at the Nehalem-EP processor, supporting up to 6 cores) [46]
3.1 Design space of the basic platform architecture (16)
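For this example the per-socket peak memory bandwidth is simply the per-channel figure multiplied by the three channels. A minimal sketch, assuming DDR3-1067 as quoted for the Nehalem-based platforms later in this section:

# Aggregate peak memory bandwidth for three parallel DDR3 channels per
# processor, as in Example 2 above (DDR3-1067 assumed, 8-byte datapath).
channels = 3
mt_per_s = 1067
peak_gbs = channels * mt_per_s * 8 / 1000.0
print(f"{channels} x DDR3-{mt_per_s}: {peak_gbs:.1f} GB/s per processor")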

23 The number of lines of the parallel channels
The number of lines needed depends on the kind of the memory modules, as indicated below: SDRAM: 168-pin, DDR: 184-pin, DDR2: 240-pin, DDR3: 240-pin.
All these DIMM modules provide an 8-byte wide datapath and optionally ECC and registering.
3.1 Design space of the basic platform architecture (17)

24 b2) Attaching memory via serial links
Serial memory links are point-to-point interconnects that use differential signaling. Two options:
- Serial links attach FB-DIMMs: the processor or MCH drives FB-DIMMs over serial links; the FB-DIMMs provide buffering and S/P conversion.
- Serial links attach S/P converters with parallel channels: the processor or MCH drives S/P converters over serial links; the converters attach DIMMs over parallel channels.
3.1 Design space of the basic platform architecture (18)

25 Example 1: FB-DIMM links in Intel’s Bensley DP platform aimed at Core 2 processors-1
65 nm Pentium 4 Prescott DP (2x1C) / Core 2 (2C/2x2C) processors (Xeon 5000 (Dempsey) 2x1C / Xeon 5100 (Woodcrest) 2C / Xeon 5300 (Clovertown) 2x2C / Xeon 5200 (Harpertown) 2C / Xeon 5400 (Harpertown) 2x2C), attached over FSBs to the E5000 MCH; FB-DIMM memory with DDR2-533; 631xESB/632xESB IOH attached via ESI.
ESI: Enterprise System Interface; 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface, providing 1 GB/s transfer rate in each direction).
3.1 Design space of the basic platform architecture (19)
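The ESI bandwidth quoted above follows from the lane count: four PCIe lanes at 0.25 GB/s each give 1 GB/s per direction. A one-line check:

# ESI/DMI bandwidth check: 4 PCIe lanes at 0.25 GB/s per lane.
lanes, gbs_per_lane = 4, 0.25
print(f"ESI: {lanes * gbs_per_lane:.2f} GB/s in each direction")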

26 Example 2: SMI links in Intel’s Boxboro-EX platform aimed at the Nehalem-EX processors-1
Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores): two Nehalem-EX (8C) / Westmere-EX (10C) processors (Xeon 6500 (Nehalem-EX) (Becton) or Xeon E7-2800 (Westmere-EX)), interconnected by QPI and connected by QPI to the 7500 IOH; DDR3-1067 memory behind SMBs, attached to the processors over SMI links; ICH10 attached via ESI; ME.
SMI: serial link between the processor and the SMB. SMB: Scalable Memory Buffer with parallel/serial conversion.
3.1 Design space of the basic platform architecture (20)

27 Example 2: The SMI link of Intel’s Boxboro-EX platform aimed at the Nehalem-EX processors-2 [26]
The SMI interface builds on the Fully Buffered DIMM architecture with a few protocol changes, such as those intended to support DDR3 memory devices. It has the same layout as FB-DIMM links (14 outbound and 10 inbound differential lanes as well as a few clock and control lanes). Altogether it needs about 50 PCB traces.
3.1 Design space of the basic platform architecture (21)
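The "about 50" figure can be reconstructed from the lane counts above: each differential lane needs two traces, so the 14 outbound plus 10 inbound lanes already account for 48 traces before clock and control are added. A minimal sketch of that arithmetic:

# Rough trace-count estimate for one SMI link: 14 outbound + 10 inbound
# differential lanes (2 traces each) plus a few clock/control traces.
outbound_lanes, inbound_lanes = 14, 10
data_traces = (outbound_lanes + inbound_lanes) * 2
print(f"data traces: {data_traces}, total with clock/control: about 50")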

28 Design space of the architecture of the MSS
- Point of attaching memory: attaching memory to the MCH, or attaching memory to the processor(s).
- Layout of the interconnection: parallel channels attach DIMMs, serial links attach FB-DIMMs, or serial links attach S/P converters with parallel channels.
(The figure shows the six combinations resulting from these two aspects.)
3.1 Design space of the basic platform architecture (22)

29 Max. number of memory channels that can be implemented while using particular design options of the MSS
Subsequent fields of the design space of the architecture of the MSS, from left to right and from top to bottom, allow implementing an increasing number of memory channels (n_M), as discussed in Section 4.2.5 and indicated in the next figure.
3.1 Design space of the basic platform architecture (23)

30 ... Attaching memory via parallel channels Layout of the interconnection Attaching memory via serial links Serial links attach S/P- converters w/ par. channels Attaching memory to the processor(s) Point of attaching memory Attaching memory to the MCH P S/P.. S/P.. S/P.. S/P... P MCH.. S/P.. S/P... MCH PP... Serial links attach FB-DIMMs MCH PP......... nCnC Parallel channels attach DIMMs Design space of the architecture of the MSS 3.1 Design space of the basic platform architecture (24)

31 The design space of the basic platform architecture-1 Platform architecture Architecture of the processor subsystem Architecture of the I/O subsystem Architecture of the memory subsystem Basic platform architecture 3.1 Design space of the basic platform architecture (25)

32 The design space of the basic platform architectures-2 Obtained as the combinations of the options available for the main aspects discussed. Basic platform architecture Architecture of the processor subsystem Scheme of attaching the processors (In case of SMP systems) Scheme of interconnecting the processors (In case of NUMA systems) Architecture of the memory subsystem (MSS) Layout of the interconnection Point of attaching the MSS 3.1 Design space of the basic platform architecture (26)

33 Design space of the basic architecture of particular platforms
Design space of the basic architecture of DT platforms / DP server platforms / MP server platforms: these will be discussed subsequently in Sections 3.2.1, 3.3.1 and 3.4.1.
3.1 Design space of the basic platform architecture (27)

34 3.2. DT platforms 3.2.1. Design space of the basic architecture of DT platforms 3.2.2. Evolution of Intel’s home user oriented multicore DT platforms 3.2.3. Evolution of Intel’s business user oriented multicore DT platforms

35 3.2 DT platforms 3.2.1 Design space of the basic architecture of DT platforms

36 3.2.1 Design space of the basic architecture of DT platforms (1)
DT platforms, placed in the design space (point of attaching the MSS x layout of the interconnection):
- Attaching memory to the MCH, parallel channels attach DIMMs: Pentium D/EE to Penryn (up to 4C).
- Attaching memory to the processor, parallel channels attach DIMMs: 1st gen. Nehalem to Sandy Bridge (up to 6C).
(The serial options, i.e. serial links attaching FB-DIMMs or S/P converters with parallel channels, remain unused for DT platforms; the number of memory channels increases across the design options.)

37 Evolution of Intel’s DT platforms (Overview)
- Attaching memory to the MCH, parallel channels attach DIMMs: Pentium D/EE 2x1C (2005/6), Core 2 2C (2006), Core 2 Quad 2x2C (2007), Penryn 2C/2x2C (2008).
- Attaching memory to the processor, parallel channels attach DIMMs: 1st gen. Nehalem 4C (2008), Westmere-EP 6C (2010), 2nd gen. Nehalem 4C (2009), Westmere-EP 2C+G (2010), Sandy Bridge 2C/4C+G (2011), Sandy Bridge-E 6C (2011).
No need for higher memory bandwidth through serial memory interconnection; the number of memory channels increases along the evolution.
3.2.1 Design space of the basic architecture of DT platforms (2)

38 3.2.2 Evolution of Intel’s home user oriented multicore DT platforms (1)
- Anchor Creek (2005): Pentium D / Pentium EE (2x1C), 945/955X/975X MCH (FSB), ICH7 (DMI); up to DDR2-667, 2/4 DDR2 DIMMs, up to 4 ranks.
- Bridge Creek (2006, Core 2 aimed) / Salt Creek (2007, Core 2 Quad aimed) / Boulder Creek (2008, Penryn aimed): Core 2 2C / Core 2 Quad (2x2C) / Penryn (2C/2x2C), 965/3-/4-Series MCH (FSB), ICH8/9/10 (DMI); up to DDR2-800 or up to DDR3-1067.
- Tylersburg (2008): 1st gen. Nehalem (4C) / Westmere-EP (6C), X58 IOH (QPI), ICH10 (DMI); up to DDR3-1067.

39 3.2.2 Evolution of Intel’s home user oriented multicore DT platforms (2)
- Tylersburg (2008): 1st gen. Nehalem (4C) / Westmere-EP (6C), X58 IOH (QPI), ICH10 (DMI); up to DDR3-1067.
- Kings Creek (2009): 2nd gen. Nehalem (4C) / Westmere-EP (2C+G), 5-Series PCH (DMI, FDI); up to DDR3-1333.
- Sugar Bay (2011): Sandy Bridge (4C+G), 6-Series PCH (DMI2, FDI); up to DDR3-1333.

40 3.2.2 Evolution of Intel’s home user oriented multicore DT platforms (3)
- Tylersburg (2008): 1st gen. Nehalem (4C) / Westmere-EP (6C), X58 IOH (QPI), ICH10 (DMI); up to DDR3-1067.
- Waimea Bay (2011): Sandy Bridge-E (4C/6C), X79 PCH (DMI2); up to DDR3-1600 (DDR3-1600: up to 1 DIMM per channel, DDR3-1333: up to 2 DIMMs per channel).
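The two population rules of the Waimea Bay platform trade capacity against speed; per channel, DDR3-1600 offers about 20 % more peak bandwidth than DDR3-1333. A small per-channel comparison (per-channel figures only, since the channel count is not stated on the slide):

# Per-channel peak bandwidth for the two DDR3 speed grades supported by the
# Waimea Bay platform (8-byte datapath per channel).
for rate in (1333, 1600):
    print(f"DDR3-{rate}: {rate * 8 / 1000.0:.1f} GB/s per channel")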

41 3.2.3 Evolution of Intel’s business user oriented multicore DT platforms (1)
- Lyndon (2005): Pentium D / Pentium EE (2x1C), 945/955X/975X MCH (FSB), ICH7 (DMI); up to DDR2-667, 2/4 DDR2 DIMMs, up to 4 ranks; 82573E GbE (Tekoe) attached via LCI for the Gigabit Ethernet LAN connection.
- Averill Creek (2006, Core 2 aimed) / Weybridge (2007, Core 2 Quad aimed) / McCreary (2008, Penryn aimed): Core 2 (2C) / Core 2 Quad (2x2C) / Penryn (2C/2x2C), Q965/Q35/Q45 MCH (FSB), ICH8/9/10 (DMI); up to DDR2-800 or up to DDR3-1067; 82566/82567 LAN PHY attached via LCI/GLCI for the Gigabit Ethernet LAN connection; ME, C-link.
- Piketon (2009): 2nd gen. Nehalem (4C) / Westmere-EP (2C+G), Q57 PCH (DMI, FDI), ME; up to DDR3-1333; 82578 GbE LAN PHY attached via PCIe 2.0/SMBus 2.0 for the Gigabit Ethernet LAN connection.

42 3.2.3 Evolution of Intel’s business user oriented multicore DT platforms (2)
- Piketon (2009): 2nd gen. Nehalem (4C) / Westmere-EP (2C+G), Q57 PCH (DMI, FDI), ME; up to DDR3-1333; 82578 GbE LAN PHY attached via PCIe 2.0/SMBus 2.0 for the Gigabit Ethernet LAN connection.
- Sugar Bay (2011): Sandy Bridge (4C+G), Q67 PCH (DMI2, FDI), ME; up to DDR3-1333; GbE LAN attached via PCIe 2.0/SMBus 2.0 for the Gigabit Ethernet LAN connection.

43 3.3. DP server platforms 3.3.1. Design space of the basic architecture of DP server platforms 3.3.2. Evolution of Intel’s low cost oriented multicore DP server platforms 3.3.3. Evolution of Intel’s performance oriented multicore DP server platforms

44 3.3 DP server platforms 3.3.1 Design space of the basic architecture of DP server platforms

45 MCH.. P P MCH.. P P ICH MCH... ICH P P MCH... ICH P P 3.3.1 Design space of the basic architecture of DP server platforms (1) Single FSB Dual FSBs 90 nm Pentium 4 DP 2x1C (2005) Core 2/Penryn 2C/2x2C (2006/7) 65 nm Pentium 4 DP 2x1C Core 2/Penryn 2C/2x2C (2006/7) PP.. S/P.. S/P... MCH P P ICH.. S/P.. S/P... MCH ICH P P.. P S/P.. S/P.. S/P.. S/P... P PP...... NUMA Nehalem-EX/Westmere-EX 8C/10C (2010/11) Nehalem-EP to Sandy Bridge -EP/EN Up to 8 C (2009/11) Layout of the interconnection Attaching memory via serial links Serial links attach FB-DiMMs Attaching memory via parallel channels Serial links attach. S/P conv. w/ par. chan. Parallel channels attach DIMMs nMnM DP platforms

46 Evolution of Intel’s DP platforms (Overview)
- SMP, single FSB, parallel channels attach DIMMs: 90 nm Pentium 4 DP 2x1C (2006) (Paxville DP).
- SMP, dual FSBs, parallel channels attach DIMMs: Core 2 2C / Core 2 Quad 2x2C / Penryn 2C/2x2C (2006/2007) (Cranberry Lake).
- SMP, dual FSBs, serial links attach FB-DIMMs: 65 nm Pentium 4 DP 2x1C, Core 2 2C, Core 2 Quad 2x2C, Penryn 2C/2x2C (2006/2007) (Bensley).
- NUMA, parallel channels attach DIMMs: Nehalem-EP 4C (2009), Westmere-EP 6C (2010) (Tylersburg-EP); Sandy Bridge-EN 8C (2011) (Romley-EN); Sandy Bridge-EP 8C (2011) (Romley-EP).
- NUMA, serial links attach S/P converters with parallel channels: Nehalem-EX/Westmere-EX 8C/10C (2010/2011) (Boxboro-EX).
The number of memory channels (n_M) increases across these options; Eff. marks the efficiency (low cost) oriented branch and HP the high performance oriented branch.
3.3.1 Design space of the basic architecture of DP server platforms (2)

47 3.3.2 Evolution of Intel’s low cost oriented multicore DP server platforms

48 Evolution from the Pentium 4 Prescott DP aimed DP platform (up to 2 cores) to the Penryn aimed Cranberry Lake DP platform (up to 4 cores)
- 90 nm Pentium 4 Prescott DP aimed DP server platform (for up to 2C): two 90 nm Pentium 4 Prescott DP (2C) processors (Xeon DP 2.8 / Paxville DP) on an FSB to the E7520 MCH; DDR-266/333 or DDR2-400 memory; ICH5R/6300ESB IOH attached via HI 1.5.
HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate.
- Penryn aimed Cranberry Lake DP server platform (for up to 4C): Core 2 (2C) / Core 2 Quad (2x2C) / Penryn (2C/2x2C) processors (Xeon 5300 (Clovertown) 2x2C, Xeon 5200 (Harpertown) 2C or Xeon 5400 (Harpertown) 4C) on FSBs to the E5100 MCH; DDR2-533/667 memory; ICH9R attached via ESI.
ESI: Enterprise System Interface; 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface, providing 1 GB/s transfer rate in each direction).
3.3.2 Evolution of Intel’s low cost oriented multicore DP server platforms (2)
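The HI 1.5 peak rate follows from its parameters: a 1-byte (8-bit) datapath clocked at a nominal 66 MHz (66.6 MHz) with quad data rate gives roughly 266 MB/s. A quick check:

# HI 1.5 peak transfer rate: 8-bit datapath, nominal 66 (66.6) MHz clock, QDR.
width_bytes, clock_mhz, transfers_per_clock = 1, 66.6, 4
print(f"HI 1.5: about {width_bytes * clock_mhz * transfers_per_clock:.0f} MB/s peak")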

49 Evolution from the Penryn aimed Cranberry Lake DP platform (up to 4 cores) to the Sandy Bridge-EN aimed Romley-EN DP platform (up to 8 cores)
- Penryn aimed Cranberry Lake DP platform (for up to 4C): Core 2 (2C/2x2C) / Penryn (2C/4C) processors (Xeon 5300 (Clovertown) 2x2C, Xeon 5200 (Harpertown) 2C or Xeon 5400 (Harpertown) 4C) on FSBs to the E5100 MCH; DDR2-533/667 memory; ICH9R attached via ESI.
- Sandy Bridge-EN (Socket B2) aimed Romley-EN DP server platform (for up to 8 cores): two Sandy Bridge-EN (8C) processors (E5-2400, Sandy Bridge-EN, 8C) in Socket B2, interconnected by QPI, each with DDR3-1600 memory; C600 PCH attached via DMI2.
3.3.2 Evolution of Intel’s low cost oriented multicore DP server platforms (3)

50 3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms

51 Evolution from the Pentium 4 Prescott DP aimed DP platform (up to 2 cores) to the Core 2 aimed Bensley DP platform (up to 4 cores)
- 90 nm Pentium 4 Prescott DP aimed DP server platform (for up to 2C): two 90 nm Pentium 4 Prescott DP (2x1C) processors (Xeon DP 2.8 / Paxville DP) on an FSB to the E7520 MCH; DDR-266/333 or DDR2-400 memory; ICH5R/6300ESB IOH attached via HI 1.5.
HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate.
- Core 2 aimed Bensley DP server platform (for up to 4C): 65 nm Pentium 4 Prescott DP (2x1C) / Core 2 (2C/2x2C) processors (Xeon 5000 (Dempsey) 2x1C / Xeon 5100 (Woodcrest) 2C / Xeon 5300 (Clovertown) 2x2C / Xeon 5200 (Harpertown) 2C / Xeon 5400 (Harpertown) 2x2C) on FSBs to the E5000 MCH; FB-DIMM memory with DDR2-533; 631xESB/632xESB IOH attached via ESI.
ESI: Enterprise System Interface; 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface, providing 1 GB/s transfer rate in each direction).
3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (2)

52 Evolution from the Core 2 aimed Bensley DP platform (up to 4 cores) to the Nehalem-EP aimed Tylersburg-EP DP platform (up to 6 cores)
- 65 nm Core 2 aimed high performance Bensley DP server platform (for up to 4C): 65 nm Pentium 4 Prescott DP (2C) / Core 2 (2C/2x2C) processors on FSBs to the 5000 MCH; FB-DIMM memory with DDR2-533; 631xESB/632xESB IOH attached via ESI.
- Nehalem-EP aimed Tylersburg-EP DP server platform with a single IOH (for up to 6C): two Nehalem-EP (4C) / Westmere-EP (6C) processors, each with DDR3-1333 memory, interconnected by QPI and connected by QPI to a 55xx IOH (1); ICH9/ICH10 attached via ESI; ME attached via CLink.
- Nehalem-EP aimed Tylersburg-EP DP server platform with dual IOHs (for up to 6C): as above, but with two 55xx IOHs (1) interconnected by QPI.
(1) First chipset with PCIe 2.0. ME: Management Engine.
3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (3)

53 Basic system architecture of the Sandy Bridge-EN and -EP aimed Romley-EN and -EP DP server platforms
- Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6 cores): two Nehalem-EP (4C) / Westmere-EP (6C) processors (Xeon 55xx (Gainestown) / Xeon 56xx (Gulftown)), each with DDR3-1333 memory, interconnected by QPI; 34xx PCH attached via DMI; ME.
- Sandy Bridge-EP (Socket R) aimed Romley-EP DP server platform (for up to 8 cores): two Sandy Bridge-EP (8C) processors (E5-2600, Sandy Bridge-EP, 8C) in Socket R (LGA 2011), each with DDR3-1600 memory, interconnected by QPI 1.1; C600 PCH attached via DMI2.
3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (4)

54 Contrasting the Nehalem-EP aimed Tylersburg-EP DP platform (up to 6 cores) to the Nehalem-EX aimed very high performance scalable Boxboro-EX DP platform (up to 10 cores)
- Nehalem-EP aimed Tylersburg-EP DP server platform (for up to 6 cores): two Nehalem-EP (4C) / Westmere-EP (6C) processors (Xeon 5500 (Gainestown) or Xeon 5600 (Gulftown)), each with DDR3-1333 memory, interconnected by QPI; 34xx PCH attached via ESI; ME.
- Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores): two Nehalem-EX (8C) / Westmere-EX (10C) processors (Xeon 6500 (Nehalem-EX) (Becton) or Xeon E7-2800 (Westmere-EX)), interconnected by QPI and connected by QPI to the 7500 IOH; DDR3-1067 memory behind SMBs, attached to the processors over SMI links; ICH10 attached via ESI; ME.
SMI: serial link between the processor and the SMB. SMB: Scalable Memory Buffer with parallel/serial conversion.
3.3.3 Evolution of Intel’s performance oriented multicore DP server platforms (5)

55 3.4. MP server platforms 3.4.2. Evolution of Intel’s multicore MP server platforms 3.4.3. Evolution of AMD’s multicore MP server platforms 3.4.1. Design space of the basic architecture of MP server platforms

56 3.4 MP server platforms 3.4.1 Design space of the basic architecture of MP server platforms

57 .. S/P.. S/P... MCH.. PP PP MCH.. PP PP ICH MCH.. PP PP ICH PP PP.. S/P.. S/P... MCH ICH PP PP MCH..... S/P.. S/P... MCH ICH PP PP PP PP MCH... ICH PP PP MCH... ICH PP PP 3.4.1 Design space of the basic architecture of MP server platforms (1) MP SMP platforms Single FSB Dual FSBsQuad FSBs Pentium 4 MP 1C (2004) 90 nm Pentium 4 MP 2x1C Core 2/Penryn up to 6C Layout of the interconnection Attaching memory via serial links Serial links attach FB-DiMMs Attaching memory via parallel channels Serial links attach. S/P conv. w/ par. chan. Parallel channels attach DIMMs

58 PP.. PP PP PP PP.... PP...... PP...... PP........ P S/P.. S/P.. S/P.. S/P... P.. P. P.. S/P.. S/P.. S/P.. S/P.. P S/P.. S/P.. S/P.. S/P... P.. P. P.. S/P.. S/P.. S/P.. S/P... MP NUMA platforms Partially connected meshFully connected mesh AMD Direct Connect Architecture 1.0 (2003)AMD Direct Connect Architecture 2.0 (2010) Nehalem-EX/Westmere up to 10C (2010/11) Inter proc. BW Mem. BW Layout of the interconnection Attaching memory via serial links Serial links attach FB-DiMMs Attaching memory via parallel channels Serial links attach. S/P conv. w/ par. chan. Parallel channels attach DIMMs 3.4.1 Design space of the basic architecture of MP server platforms (2)

59 Single FSBDual FSBs Quad FSBs Pentium 4 MP 1C (2004) (Not named) 90 nm Pentium 4 MP 2x1C (2006) (Truland) Core 2/Penryn up to 6C (2006/2007) Caneland Part. conn. mesh Fully conn. mesh SMP NUMA Scheme of attaching and interconnecting MP processors Nehalem-EX/ Westmere up to 10C (2010/11) (Boxboro-EX) AMD DCA 1.0 (2003) AMD DCA 2.0 (2010) No. of memory channels Layout of the interconnection Attaching memory via serial links Serial links attach FB-DiMMs Attaching memory via parallel channels Serial links attach. S/P converters w/ par. chan. Parallel channels attach DIMMs Interproc. bandwidth Evolution of Intel’s MP platforms (Overview) 3.4.1 Design space of the basic architecture of MP server platforms (3)

60 3.4.2 Evolution of Intel’s multicore MP server platforms

61 Evolution from the first generation MP servers supporting SC processors to the 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (supporting up to 2 cores)
- Previous Pentium 4 MP aimed MP server platform (for single core processors): four Xeon MP (1C, SC) processors on a shared FSB to the preceding NBs; e.g. DDR-200/266 memory; preceding ICH attached via e.g. HI 1.5 (266 MB/s).
- 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (for up to 2C): four Pentium 4 Xeon MP (1C/2x1C) processors (Xeon MP (Potomac) 1C / Xeon 7000 (Paxville MP) 2x1C / Xeon 7100 (Tulsa) 2C) on dual FSBs to the 8500/8501 MCH; DDR-266/333 or DDR2-400 memory behind XMBs; ICH5 attached via HI 1.5.
3.4.2 Evolution of Intel’s multicore MP server platforms (2)

62 Evolution from the 90 nm Pentium 4 Prescott MP aimed Truland MP platform (up to 2 cores) to the Core 2 aimed Caneland MP platform (up to 6 cores)
- 90 nm Pentium 4 Prescott MP aimed Truland MP server platform (for up to 2C): four Pentium 4 Xeon MP (1C/2x1C) processors (Xeon MP (Potomac) 1C / Xeon 7000 (Paxville MP) 2x1C / Xeon 7100 (Tulsa) 2C) on dual FSBs to the 8500 (1)/8501 MCH; DDR-266/333 or DDR2-400 memory behind XMBs; ICH5 attached via HI 1.5.
HI 1.5 (Hub Interface 1.5): 8 bit wide, 66 MHz clock, QDR, 266 MB/s peak transfer rate.
- Core 2 aimed Caneland MP server platform (for up to 6C): four Core 2 (2C/2x2C) / Penryn (6C) processors (Xeon 7200 (Tigerton DC) 1x2C / Xeon 7300 (Tigerton QC) 2x2C / Xeon 7400 (Dunnington 6C)) on quad FSBs to the 7300 MCH; FB-DIMM DDR2-533/667 memory, 4 channels, up to 8 DIMMs per channel; 631xESB/632xESB attached via ESI.
ESI: Enterprise System Interface; 4 PCIe lanes, 0.25 GB/s per lane (like the DMI interface, providing 1 GB/s transfer rate in each direction).
(1) The E8500 MCH supports an FSB of 667 MT/s and consequently only the SC Xeon MP (Potomac).
3.4.2 Evolution of Intel’s multicore MP server platforms (3)

63 Evolution to the Nehalem-EX aimed Boxboro-EX MP platform (that supports up to 10 cores); in the basic system architecture we show the single IOH alternative
Nehalem-EX aimed Boxboro-EX MP server platform (for up to 10C): four Nehalem-EX (8C) / Westmere-EX (10C) processors (Xeon 7500 (Nehalem-EX) (Becton) 8C / Xeon E7-4800 (Westmere-EX) 10C), fully interconnected by QPI and connected by QPI to the 7500 IOH; the processors drive SMBs over SMI links (2x4 SMI channels shown), with DDR3-1067 memory behind the SMBs; ICH10 attached via ESI; ME (Management Engine).
SMI: serial link between the processor and the SMBs. SMB: Scalable Memory Buffer (parallel/serial converter).
3.4.2 Evolution of Intel’s multicore MP server platforms (4)

64 3.4.3 Evolution of AMD’s multicore MP server platforms [47] (1)
Introduced in the single core K8-based Opteron DP/MP servers (AMD 24x/84x) (6/2003). Memory: 2 channels DDR-200/333 per processor, 4 DIMMs per channel.

65 3.4.3 Evolution of AMD’s multicore MP server platforms [47] (2)
Introduced in the 2x6 core K10-based Magny-Cours (AMD 6100) (3/2010). Memory: 2x2 channels DDR3-1333 per processor, 3 DIMMs per channel.
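With the 2x2 DDR3-1333 channels per processor quoted above, the peak memory bandwidth per socket works out as follows (8-byte datapath per channel assumed):

# Peak memory bandwidth per Magny-Cours socket: 2x2 DDR3-1333 channels,
# 8-byte datapath per channel.
channels_per_socket = 2 * 2
rate_mt_s = 1333
peak_gbs = channels_per_socket * rate_mt_s * 8 / 1000.0
print(f"{channels_per_socket} x DDR3-{rate_mt_s}: {peak_gbs:.1f} GB/s per socket")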

66 9. References

67 9. References (1) [1]: Wikipedia: Centrino, http://en.wikipedia.org/wiki/Centrino [2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr. 24 1999, http://www.thefreelibrary.com/Industry+Uniting+Around+Intel+Server +Architecture%3B+Platform...-a053949226 [3]: Intel Core 2 Duo Processor, http://www.intel.com/pressroom/kits/core2duo/ [4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp. 1-29. [5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec. 07 2004, http://pcworld.about.net/news/Dec072004id118866.htm [6]: Perich D., Intel Volume platforms Technology Leadership, Presentation at HP World 2004, http://98.190.245.141:8080/Proceed/HPW04CD/papers/4194.pdf [7] Powerful New Intel Server Platforms Feature Array Of Enterprise-Class Innovations. Intel’s Press release, Aug. 2, 2004, http://www.intel.com/pressroom/archive/releases/2004/20040802comp.htm [8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March 1 2005, http://www.silentpcreview.com/article224-page2 [9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, http://www.intel.com/ content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf [10]: Davis L. PCI Express Bus, http://www.interfacebus.com/PCI-Express-Bus-PCIe-Description.html

68 9. References (2) [11]: Ng P. K., “High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor,” IDF Taipei, TDPS001, 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TDPS001/FA08%20IDF- Taipei_TDPS001_100.pdf [12]: Computing DRAM, Samsung.com, http://www.samsung.com/global/business/semiconductor /products/dram/Products_ComputingDRAM.html [13]: Samsung’s Green DDR3 – Solution 3, 20nm class 1.35V, Sept. 2011, http://www.samsung.com/global/business/semiconductor/Greenmemory/Downloads/ Documents/downloads/green_ddr3_2011.pdf [14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org [15]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/ SD9C16_32x72.pdf [16]: Solanki V., „Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf [17]: Detecting Memory Bandwidth Saturation in Threaded Applications, Intel, March 2 2010, http://software.intel.com/en-us/articles/detecting-memory-bandwidth-saturation-in- threaded-applications/ [18]: McCalpin J. D., STREAM Memory Bandwidth, July 21 2011, http://www.cs.virginia.edu/stream/by_date/Bandwidth.html [19]: Rogers B., Krishna A., Bell G., Vu K., Jiang X., Solihin Y., Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling, ISCA 2009, Vol. 37, Issue 1, pp. 371-382

69 9. References (3) [20]: Vogt P., Fully Buffered DIMM (FB-DIMM) Server Memory Architecture: Capacity, Performance, Reliability, and Longevity, Febr. 18 2004, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf [21]: Wikipedia: Intel X58, 2011, http://en.wikipedia.org/wiki/Intel_X58 [22]: Sharma D. D., Intel 5520 Chipset: An I/O Hub Chipset for Server, Workstation, and High End Desktop, Hotchips 2009, http://www.hotchips.org/archives/hc21/2_mon/ HC21.24.200.I-O-Epub/HC21.24.230.DasSharma-Intel-5520-Chipset.pdf [23]: DDR2 SDRAM FBDIMM, Micron Technology, 2005, http://download.micron.com/pdf/datasheets/modules/ddr2/HTF18C64_128_256x72F.pdf [24]: Wikipedia: Fully Buffered DIMM, 2011, http://en.wikipedia.org/wiki/Fully_Buffered_DIMM [25]: Intel E8500 Chipset eXternal Memory Bridge (XMB) Datasheet, March 2005, http://www.intel.com/content/dam/doc/datasheet/e8500-chipset-external-memory- bridge-datasheet.pdf [26]: Intel 7500/7510/7512 Scalable Memory Buffer Datasheet, April 2011, http://www.intel.com/content/dam/doc/datasheet/7500-7510-7512-scalable-memory- buffer-datasheet.pdf [27]: AMD Unveils Forward-Looking Technology Innovation To Extend Memory Footprint for Server Computing, July 25 2007, http://www.amd.com/us/press-releases/Pages/Press_Release_118446.aspx [28]: Chiappetta M., More AMD G3MX Details Emerge, Aug. 22 2007, Hot Hardware, http://hothardware.com/News/More-AMD-G3MX-Details-Emerge/

70 9. References (4) [29]: Goto S. H., The following server platforms AMD, May 20 2008, PC Watch, http://pc.watch.impress.co.jp/docs/2008/0520/kaigai440.htm [30]: Wikipedia: Socket G3 Memory Extender, 2011, http://en.wikipedia.org/wiki/Socket_G3_Memory_Extender [31]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Febr. 2003, XILINC inc. [32]: Ahn J.-H., „Memory Design Overview,” March 2007, Hynix, http://netro.ajou.ac.kr/~jungyol/memory2.pdf [33]: Ebeling C., Koontz T., Krueger R., „System Clock Management Simplified with Virtex-II Pro FPGAs”, WP190, Febr. 25 2003, Xilinx, http://www.xilinx.com/support/documentation/white_papers/wp190.pdf [34]: Kirstein B., „Practical timing analysis for 100-MHz digital design,”, EDN, Aug. 8, 2002, www.edn.com [35]: Jacob B., Ng S. W., Wang D. T., Memory Systems, Elsevier, 2008 [36]: Allan G., „The outlook for DRAMs in consumer electronics”, EETIMES Europe Online, 01/12/2007, http://eetimes.eu/showArticle.jhtml?articleID=196901366&queryText =calibrated [37]: Ebeling C., Koontz T., Krueger R., „System Clock Management Simplified with Virtex-II Pro FPGAs”, WP190, Febr. 25 2003, Xilinx, http://www.xilinx.com/support/documentation/white_papers/wp190.pdf

71 9. References (5) [38]: „Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1 [45]: Memory technology evolution: an overview of system memory technologies, Technology brief, 9 th edition, HP, Dec. 2010, http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00256987/c00256987.pdf [46]: Kane L., Nguyen H., Take the Lead with Jasper Forest, the Future Intel Xeon Processor for Embedded and Storage, IDF 2009, July 27 2009, ftp://download.intel.com/embedded/processor/prez/SF09_EMBS001_100.pdf [47]: The AMD Opteron™ 6000 Series Platform: More Cores, More Memory, Better Value, March 29 2010, http://www.slideshare.net/AMDUnprocessed/amd-opteron-6000-series -platform-press-presentation-final-3564470

