
1 Dezső Sima, Olivér Asztalos, November 2014 (Ver. 1.7) © Sima Dezső, Olivér Asztalos 2012-2014 Platforms I.

2 Contents 1. Introduction to platforms 2. Basic components of platforms 3. Platform architectures 4. Memory subsystem design considerations 5. References

3 1. Introduction to platforms 1.1. The notion of platform 1.2. Description of particular platforms 1.3. Representation forms of platforms 1.4. Compatibility of platform components

4 1.1. The notion of platform

5 The notion of platform is widely used in different segments of the IT industry, e.g. by IC manufacturers, system providers or even by software suppliers, with different interpretations. Here we are focusing on the platform concept as used typically by system providers. 1.1 The notion of platform (1)

6 Modular (unified) system design and the notion of platform
Modular system design means that the system architecture is partitioned into a few standard components (modules), such as the processor, the memory control hub (MCH) and the I/O control hub (ICH), that are interconnected by specified (standard) interconnections.
Figure: Intel's Core 2 Duo (and Core 2 Extreme, the highest speed model) aimed DT platform (the Bridge Creek platform): Core 2 Duo / Core 2 Extreme (2C) connected by an FSB (1066/800/533 MT/s) to the 965 Series MCH (with ME), DMI to the ICH8, C-link; two memory channels, DDR2-800/667/533, two DIMMs per channel.
1.1 The notion of platform (2)

7 Remark
Modular system design became part of scientific research at the end of the 1990s, see e.g. [4]. The need for a modular system design, called platform design, arose in the PC industry when PCI-based system designs were superseded by port-based system designs, about 1998-1999.
1.1 The notion of platform (3)

8 Late PCI-based system architecture (~1998), used typically with Pentium II/III and built around Intel's 440xx chipset: the system controller connects the processors (via the processor bus), the main memory (EDO/SDRAM) and AGP; the PCI bus attaches PCI device adapters and the peripheral controller (2x IDE/ATA33/66, 2x USB); an ISA bus serves legacy and/or slow devices.
Early port-based system architecture (~1999), used first with the Pentium III and built around Intel's 810 chipset: the system controller (processor bus, SDRAM main memory, AGP) connects via a hub interface to the peripheral controller (PCI bus with PCI device adapters, 2x IDE/ATA 33/66/100, 2x/4x USB, AC'97, LPC Super I/O for KBD, MS, etc.), with a PCI-to-ISA bridge for legacy devices.
1.1 The notion of platform (4)

9 Main goals of modular system level design
- to reduce the complexity of designing complex systems by partitioning them into modules,
- to have stable interfaces (at least for a few years) interconnecting the modules,
- in this way to minimize design rework while upgrading a given system design, like moving from one processor generation to the next, and thus to shorten the time to market.
Co-design of platform components: platform components are typically co-designed, announced and delivered as a set.
1.1 The notion of platform (5)

10 The notion of platforms
System providers, however, may use the notion platform either in a more general or a more specific sense.
Interpretation in a more general sense: a modular system design targeting a given application area, used in terms like DT or MP platforms.
Interpretation in a more specific sense: a particular modular system architecture developed for a given application area, such as a given DT or MP platform, like Intel's Sandy Bridge based Sugar Bay DT platform (2011) or AMD's Phenom II X4 based Dragon platform for gamers (2009).
1.1 The notion of platform (6)

11 Benefits of the platform concept for computer manufacturers
With the platform concept in mind manufacturers like Intel or AMD will plan, design and market all key components of a platform, such as the processor or processors and the related chipset, as an integrated entity [5]. This is beneficial for the manufacturers since it motivates OEMs, as system providers, to buy all key parts of a computer system from the same manufacturer.
1.1 The notion of platform (7)

12 Benefits of the platform concept for customers
The platform concept is beneficial for the customers as well since an integrated "backbone" of a system architecture promises a more reliable and more cost-effective system.
1.1 The notion of platform (8)

13 Interpretation of the notion platform in a more specific sense
In a more specific sense the notion platform refers to a particular modular system architecture that is developed for a given application area, such as a DT, DP or MP platform. In this sense the notion platform is interpreted as a standardized backbone of a system architecture, developed for a given application area, that is built up typically of
- the processor or processors,
- the chipset,
- the memory subsystem (MSS), attached by a specific memory interface,
- in some cases, such as in mobile or business oriented DT platforms, also the networking component (LAN controller) [7], as well as
- the buses interconnecting the above components of the platform.
Subsequently, we will focus on the interpretation of the notion platform in this latter sense.
1.1 The notion of platform (9)

14 Example 1: Intel's Core 2 aimed home user DT platform (Bridge Creek) [3]. (Block diagram; labels: 2 DIMMs/channel, display card, C-link, 1066 MT/s; the frame marks the platform.)
1.1 The notion of platform (10)

15 Example 2: Intel's Nehalem-EX aimed Boxboro-EX MP server platform, assuming 1 IOH.
Four processors, Xeon 7500 (Nehalem-EX, Beckton) 8C or Xeon E7-4800 (Westmere-EX) 10C, i.e. Nehalem-EX (8C)/Westmere-EX (10C), are interconnected by QPI links and connected by QPI to the 7500 IOH, which attaches the ICH10 via ESI; SMBs on SMI channels (2x4 SMI channels) carry DDR3-1067 memory; ME: Management Engine.
SMI: serial link between the processors and the SMBs. SMB: Scalable Memory Buffer (parallel/serial conversion).
The figure marks the platform and the interfaces connecting the platform components.
1.1 The notion of platform (11)

16 The structure of a platform is termed its architecture (or topology). It describes the basic components and their interconnections and will be discussed in Section 3.
1.1 The notion of platform (12)

17 Historical remarks
System providers began using the notion "platform" about 2000, like Philips' Nexperia digital video platform (1999), Texas Instruments' (TI) OMAP platform for SOCs (2002) and Intel's first generation mobile oriented Centrino platform for laptops, designated as the Carmel platform (3/2003). Intel contributed significantly to spreading the notion platform when, based on the success of their Centrino platform, they introduced this concept also for their desktops [5] and servers [6], [7] in 2004.
1.1 The notion of platform (13)

18 Intel's early server and workstation roadmap from Aug. 2004 [6]
Note: a) This roadmap already makes use of the notion platform without revealing platform names. b) In 2004 Intel made a transition from 32-bit systems to 64-bit systems.
1.1 The notion of platform (14)

19 Intel's multicore platform roadmap announced at the IDF Spring 2005 [8]
Note: This roadmap also includes the particular platform designations for desktops, UP servers etc.
1.1 The notion of platform (15)

20 1.2. Description of a particular platform

21 Description of a particular platform
First step: detailing the platform architecture.
Example: The Tylersburg DT platform (2008), consisting of processor, MCH and ICH.
1.2 Description of a particular platform (1)

22 Detailing the platform architecture includes the specification of the architecture (topology) of the processor, memory and I/O subsystems (to be discussed in Section 3). It is concerned with issues such as whether the processors of an MP server are connected to the MCH via an FSB or otherwise, or whether the memory is attached to the system architecture through the MCH or through the processors, etc.
Example: The Tylersburg DT platform (2008): processor, MCH, ICH.
1.2 Description of a particular platform (2)

23 Second step: identification of the platform components.
Example: The Tylersburg DT platform (2008): 1. gen. Nehalem (4C)/Westmere-EP (6C) processor, X58 IOH as MCH, ICH10 as ICH.
1.2 Description of a particular platform (3)

24 Third step: specification of the interfaces interconnecting the platform components.
Example: The Tylersburg DT platform (2008): the 1. gen. Nehalem (4C)/Westmere-EP (6C) processor is connected to the X58 IOH by QPI, and the X58 IOH to the ICH10 by DMI.
1.2 Description of a particular platform (4)

25 Remark
The specification of a platform will be completed by the datasheets of the related platform components.
1.2 Description of a particular platform (5)

26 Dependence of the platform architecture on the platform category
Platforms may be classified according to the target area of application, such as
- Desktop (DT) platforms,
- Mobile platforms,
- Dual processor (DP) platforms,
- Quad processor (MP) platforms.
Of course, beyond the above categories further processor categories and related platforms exist, such as embedded processors and related platforms. In conformity with the different platform categories also different platform architectures arise (architectures of DT, mobile, DP and MP platforms). In these slides platform architectures will be discussed in Section 3, nevertheless restricted only to DT, DP and MP platforms.
1.2 Description of a particular platform (6)

27 1.3. Representation forms of platforms

28 1.3 Representation forms of platforms
a) Thumbnail representation
b) Extended representation (an arbitrarily chosen representation form in these slides)
c) Block diagram of a platform.
1.3 Representation forms of platforms (1)

29 a) Thumbnail representation
It is a concise representation of a particular platform. In particular, the thumbnail representation reveals the platform architecture, identifies the basic components of a platform, such as the processor or processors, the chipset and, in some cases (e.g. in mobile platforms), also the Gigabit Ethernet controller, and specifies the interconnection links (buses) between the platform components.
Example: Intel's Core 2 Duo aimed home user oriented platform (the Bridge Creek platform): Core 2 Duo / Core 2 Extreme (2C), FSB (1066/800/533 MT/s), 965 Series MCH (with ME), DMI, ICH8, C-link; two DDR2 channels, DDR2-800/667/533, two DIMMs per channel.
1.3 Representation forms of platforms (3)

30 b) Extended representation
This kind of representation indicates a few additional data of the processor and the chipset (like data of the die, the cache system or the memory), reveals the dates of the introduction of platform components, and identifies compatibility ranges of processors or chipsets in platforms by encircling compatible components, but lacks the graphical representation of the platform.
Example: the Bridge Creek DT platform:
- DT cores: Core 2 Duo 2C (E6xxx/E4xxx) and Core 2 Extreme 2C (X6800), Core 2 aimed (65 nm), 7/2006; Conroe: E6xxx/X6800(1), Allendale: E4xxx(1); Conroe: 291 mtrs/143 mm2, Allendale: 167 mtrs/111 mm2; Conroe: 4 MB / Allendale: 2 MB L2; X6800/E6xxx: 1066 MT/s, E4xxx: 800 MT/s; LGA775.
- MCH: 965 Series (Broadwater), 6/2006; FSB 1066/800/533 MT/s; 2 DDR2 channels, DDR2-800/667/533, 4 ranks/channel, 8 GB max.
- ICH: ICH8, 6/2006.
(1) The Allendale is a later stepping (steppings L2/M0) of the Core 2 (steppings B2/G0), that provided typically only 2 MB L2 and appeared 1/2007.
1.3 Representation forms of platforms (4)

31 Example for stating the compatibility range of a platform
The Core 2 Duo aimed DT platform that targets home users (designated as the Bridge Creek platform). Beyond the target processor this platform may be used also with the previous Pentium D/EE and Pentium 4 6x0/6x1/EE lines and the subsequent Core 2 Quad line of processors, as shown in the next slides.
(Thumbnail and extended representations as on the previous slides: Core 2 Duo / Core 2 Extreme (2C), FSB 1066/800/533 MT/s, 965 Series MCH, DMI, ICH8, C-link, ME; two DDR2 channels, DDR2-800/667/533, two DIMMs per channel; 965 Series (Broadwater) 6/2006, ICH8 6/2006, LGA775.)
(1) The Allendale is a later stepping (steppings L2/M0) of the Core 2 (steppings B2/G0), that provided typically only 2 MB L2 and appeared 1/2007.
1.3 Representation forms of platforms (5)

32 Support of Pentium 4/D/EE processors
The 965 Series based Bridge Creek platform supports also the Pentium 4 6x0/6x1/EE (90 nm) and Pentium D/EE (90/65 nm) processors:
- Pentium 4 6x0/6x1/EE (Prescott-2M): 1C, 90 nm, 169 mtrs, 135 mm2, 2 MB L2, 800 MT/s, two-way multithreading, LGA775, 2/2005.
- Pentium D/EE 8xx(1) (Smithfield): 2x1C, 90 nm, 2x115 mtrs, 2x103 mm2, 2x1 MB L2, 800/533 MT/s, no multithreading, LGA775, 5/2005.
- Pentium D/EE 9xx(2,3) (Presler): 2x1C, 65 nm, 2x188 mtrs, 2x81 mm2, 2x2 MB L2, 1066/800 MT/s, no multithreading, LGA775, 1/2006.
(1) The Pentium EE 840 supports only 800 MT/s. (2) Pentium D 9xx support only 800 MT/s. (3) Pentium EE 955/965 support only 1066 MT/s.
1.3 Representation forms of platforms (6)

33 Support of Core 2 Quad processors
The platform supports also the Core 2 Quad processors (65 nm):
- Core 2 Quad (2x2C): Q6xxx (Kentsfield), 65 nm, 2x291 mtrs/2x143 mm2, 2x4 MB L2, 1066 MT/s, LGA775, 11/2006.
1.3 Representation forms of platforms (7)

34 c) Block diagram of a platform
Example: The Core 2 aimed home user DT platform (Bridge Creek), without an integrated display controller [3]. (Block diagram; labels: 2 DIMMs/channel, display card, C-link, 1066 MT/s.)
1.3 Representation forms of platforms (8)

35 1.4. Compatibility of platform components

36 1.4 Compatibility of platform components
One of the goals of platform based designs is to use stabilized interfaces (at least for a while) to minimize or eliminate design rework while moving from one processor generation to the next [2]. Consequently, assuming platform based designs, platform components, such as processors or chipsets of a given line, are typically compatible with their previous or subsequent generations as long as the same interfaces are used and interface parameters (such as FSB speed) or other implementation requirements (either from the side of the components to be substituted or of the substituting components) do not restrict this.
1.4 Compatibility of platform components (1)

37 Limits of compatibility
In the discussed DT platform the target processor is the Core 2, which is connected to the MCH by an FSB with 1066/800/533 MT/s. The target processor of the platform, however, can be substituted either by processors of three previous generations or by processors of the subsequent generation (Core 2 Quad), since all these processors have FSBs of 533/800/1066 MT/s, as shown before. Nevertheless, the highest performance level Core 2 Quad, termed the Core 2 Extreme Quad, already provided an increased FSB speed of 1333 MT/s and therefore was no longer supported by the Core 2 aimed platform considered.
(Thumbnail: Core 2 Duo / Core 2 Extreme (2C), FSB 1066/800/533 MT/s, 965 Series MCH, DMI, ICH8, C-link, ME; two memory channels, DDR2-800/667/533, two DIMMs per channel.)
1.4 Compatibility of platform components (2)

38 2. Basic components of platforms 2.1. Processors 2.2. The memory subsystem 2.3. Buses interconnecting platform components

39 Basic components of platforms - Overview
As already discussed in Section 1, the notion platform is interpreted as a standardized backbone of a system architecture, developed for a given application area, that is built up typically of
- the processor or processors,
- the chipset,
- the memory subsystem (MSS), attached by a specific memory interface,
- in some cases, such as in mobile or business oriented DT platforms, also the networking component (LAN controller) [7], as well as
- the buses interconnecting the above components.
Subsequently, we will discuss the following three basic components of platforms:
- Processors (Section 2.1),
- The memory subsystem (Section 2.2), and
- Buses interconnecting platform components (excluding memory buses) (Section 2.3).

40 2.1. Processors

41 Intel's Tick-Tock model
Figure 2.1: Overview of Intel's Tick-Tock model and key microarchitectural features (based on [17]). A TOCK introduces a new microarchitecture and the subsequent TICK shrinks it to the next process technology, with about 2 years between the steps:
- 180 nm, TOCK: Pentium 4 / Willamette (11/2000): new microarchitecture.
- 130 nm, TICK: Pentium 4 / Northwood (01/2002): advanced microarchitecture, hyperthreading.
- 90 nm, TOCK: Pentium 4 / Prescott (02/2004): advanced microarchitecture, hyperthreading, 64-bit.
- 65 nm, TICK: Pentium 4 / Cedar Mill (01/2006).
- 65 nm, TOCK: Core 2 (07/2006): new microarchitecture, 4-wide core, 128-bit SIMD, no hyperthreading.
- 45 nm, TICK: Penryn family (11/2007).
- 45 nm, TOCK: Nehalem (11/2008): new microarchitecture, hyperthreading, (inclusive) L3, integrated MC, QPI.
- 32 nm, TICK: Westmere (01/2010).
- 32 nm, TOCK: Sandy Bridge (01/2011): new microarchitecture, hyperthreading, 256-bit AVX, integrated GPU, ring bus.
- 22 nm, TICK: Ivy Bridge (04/2012).
- 22 nm, TOCK: Haswell.
2.1 Processors (1)

42 Basic architectures and their related shrinks (considered from the Pentium 4 Prescott, the third core of the Pentium 4, on):
- Pentium 4 (Prescott), 90 nm, 2005 -> Pentium 4, 65 nm, 2006
- Core 2, 65 nm, 2006 -> Penryn, 45 nm, 2007
- Nehalem, 45 nm, 2008 -> Westmere, 32 nm, 2010
- Sandy Bridge, 32 nm, 2011 -> Ivy Bridge, 22 nm, 2012
- Haswell, 22 nm, 2013
2.1 Processors (2)

43 Table 2.1: Intel's Core 2 based and subsequent multicore DT processor lines
Core 2 (65 nm, FSB):
- X6800 (Conroe), 2C, 7/2006, 4 MB L2/2C
- E6xxx (Conroe: 7/2006, Allendale: 7/2007), 2C, 2/4 MB L2/2C
- E4xxx (Allendale), 2C, 1/2007, 2 MB L2/2C
- QX67xx (Kentsfield), 2x2C, 11/2006, 4 MB L2/2C
- Q6xxx (Kentsfield), 2x2C, 1/2007, 4 MB L2/2C
Penryn (45 nm, FSB):
- E8xxx (Wolfdale), 2C, 1/2008, 6 MB L2/2C
- E7xxx (Wolfdale-3M), 2C, 4/2008, 3 MB L2/2C
- QX9xxx (Yorkfield XE), 2x2C, 11/2007, 6 MB L2/2C
- Q9xxx (Yorkfield), 2x2C, 1/2008, 6 MB L2/2C
- Q9xxx (Yorkfield-6M), 2x2C, 8/2008, 3 MB L2/2C
- Q8xxx (Yorkfield-4M), 2x2C, 8/2008, 2 MB L2/2C
1. G. Nehalem-EP (45 nm, QPI):
- i7-920...965 (Bloomfield), 4C, 11/2008, ¼ MB L2/C, 8 MB L3
2. G. Nehalem-EP (45 nm, DMI):
- i7-8xx/i5-7xx (Lynnfield), 4C, 9/2009, ¼ MB L2/C, 8 MB L3
Westmere-EP (32 nm):
- i7-9xxX (Gulftown), 6C, 3/2010, ¼ MB L2/C, 12 MB L3, QPI
- i7-9xx (Gulftown), 6C, 7/2010, ¼ MB L2/C, 12 MB L3, QPI
- i5-6xx/i3-5xx (Clarkdale), 2C+G, 1/2010, ¼ MB L2/C, max. 4 MB L3, DMI
Sandy Bridge (32 nm, DMI 2.0, PCIe 2.0):
- i7-39/38xx (Sandy Bridge), 6C, 11/2011, ¼ MB L2/C, 15 MB L3
- i7-26/27xx, 2/4C+G, 1/2011, ¼ MB L2/C, 4/8 MB L3
- i5-23/24/25xx, 2/4C+G, 1/2011, ¼ MB L2/C, 3/6 MB L3
- i3-21xx, 2C+G, 1/2011, ¼ MB L2/C, 3 MB L3
Ivy Bridge (22 nm, DMI 2.0, PCIe 3.0):
- i7-3770, 4C+G, 4/2012, ¼ MB L2/C, 8 MB L3
- i5-33/34/35xx (Ivy Bridge), 2/4C+G, 4/2012, ¼ MB L2/C, 6 MB L3
- i3-32xx, 2C, 9/2012, ¼ MB L2/C, 3 MB L3
2.1 Processors (5)

44 Table 2.2: Overview of Intel's multicore DP server processors
Pentium 4 (Prescott) based:
- Pentium 4, 90 nm, 10/2005: Paxville DP (2.8 GHz), 2x1C, 2 MB L2/C
- Pentium 4, 65 nm, 5/2006: 5000 (Dempsey), 2x1C, 2 MB L2/C
Core 2 based:
- Core 2, 65 nm, 6/2006: 5100 (Woodcrest), 1x2C, 4 MB L2/2C; 11/2006: 5300 (Clovertown), 2x2C, 4 MB L2/2C
- Penryn, 45 nm, 11/2007: 5400 (Harpertown), 2x2C, 6 MB L2/2C
Nehalem based:
- Nehalem-EP, 45 nm, 3/2009: 5500 (Gainestown), 1x4C, ¼ MB L2/C, 8 MB L3
- Westmere-EP, 32 nm, 3/2010: 56xx (Gulftown), 1x6C, ¼ MB L2/C, 12 MB L3
- Nehalem-EX, 45 nm, 3/2010: 6500 (Beckton), 1x8C, ¼ MB L2/C, 24 MB L3
- Westmere-EX, 32 nm, 4/2011: E7-28xx (Westmere-EX), 1x10C, ¼ MB L2/C, 30 MB L3
Sandy Bridge based:
- Sandy Bridge-EN, 32 nm, 5/2012: E5-2xxx, 1x8C, ¼ MB L2/C, 20 MB L3
- Ivy Bridge, 22 nm: -
2.1 Processors (6)

45 Table 2.3: Overview of Intel's multicore MP server processors
Pentium 4 (Prescott) based:
- Pentium 4, 90 nm, 11/2005: Paxville MP, 2x1C, 2 MB L2/C
- Pentium 4, 65 nm, 8/2006: 7100 (Tulsa), 2x1C, 1 MB L2/C, 16 MB L3
Core 2 based:
- Core 2, 65 nm, 9/2007: 7200 (Tigerton DC), 1x2C, 4 MB L2/2C; 7300 (Tigerton QC), 2x2C, 4 MB L2/2C
- Penryn, 45 nm, 9/2008: 7400 (Dunnington), 1x6C, 3 MB L2/2C, 16 MB L3
Nehalem based:
- Nehalem-EP, 45 nm / Westmere-EP, 32 nm: -
- Nehalem-EX, 45 nm, 3/2010: 7500 (Beckton), 1x8C, ¼ MB L2/C, 24 MB L3
- Westmere-EX, 32 nm, 4/2011: E7-48xx (Westmere-EX), 1x10C, ¼ MB L2/C, 30 MB L3
Sandy Bridge based:
- Sandy Bridge-EP, 32 nm, 5/2012: E5-4xxx, 1x8C, ¼ MB L2/C, 20 MB L3
- Ivy Bridge, 22 nm: -
2.1 Processors (7)

46 2.2. The memory subsystem 2.2.1. Key parameters of the memory subsystem 2.2.2. Main attributes of the memory technology used 2.2.2.1. Overview: Main attributes of the memory technology used 2.2.2.2. Memory type 2.2.2.3. Speed grades 2.2.2.4. DIMM density 2.2.2.5. Use of ECC support 2.2.2.6. Use of registering

47 2.2.1 Key performance parameters of the memory subsystem
This issue will be discussed in Section 4.
2.2.1 Key performance parameters of the memory subsystem (1)

48 2.2.2.1 Overview: Main attributes of the memory technology used
Main attributes of the memory technology used:
- Memory type (Section 2.2.2.2)
- Speed grade (Section 2.2.2.3)
- DIMM density (Section 2.2.2.4)
- Use of ECC support (Section 2.2.2.5)
- Use of registering (Section 2.2.2.6)
2.2.2 Main attributes of the memory technology used

49 2.2.2.2 Memory type
a) Overview: Main DRAM types
- Asynchronous DRAMs (commodity DRAMs): DRAM (1970), FP (~1974), FPM (1983), EDO (1995).
- Synchronous DRAMs with parallel bus connection, for general use (mainstream DRAM types): SDRAM (1996), DDR (2000), DDR2 (2004), DDR3 (2007), DDR4 (2014).
- DRAMs with serial bus connection (challenging DRAM types): DRDRAM (1999), XDR (2006)(1), FB-DIMM (2006).
(1) Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers.
2.2.2.2 Memory type (1)

50 b) Synchronous DRAMs (SDRAM, DDR, DDR2, DDR3, DDR4) 2.2.2.2 Memory type (2)

51 SDRAM to DDR4 DIMMs
All these DIMM modules are 64 bits (8 bytes) wide: SDRAM (SDR): 168-pin, DDR: 184-pin, DDR2: 240-pin, DDR3: 240-pin, DDR4: 288-pin.
2.2.2.2 Memory type (3)

52 Principle of operation of synchronous DRAMs (SDRAM to DDR4 memory chips)
In a DRAM device the memory cell array sources/sinks data to/from the I/O buffers at a rate of f_Cell and at a width of FW (Fetch Width). The I/O buffers receive/transmit data from/to the memory controller (MC) at a rate of f_CK (SDRAM) or 2 x f_CK (DDR to DDR4): data transmission occurs on the rising edge of the clock (CK) for SDRAMs, or on both edges of the strobe (DQS) for DDR/DDR2/DDR3/DDR4.
2.2.2.2 Memory type (4)

53 Sourcing/sinking data by the memory cell array
The memory cell array sources/sinks data to/from the I/O buffers at a rate of f_Cell, where f_Cell is the clock frequency of the memory cell array, at a data width of FW, where FW is the fetch width of the memory cell array.
The core clock frequency of the memory cell array (f_Cell): f_Cell is 100 to 200 MHz. When a new memory technology (e.g. DDR2 or DDR3) appears, f_Cell is initially 100 MHz; this sets the initial speed grade of f_CK accordingly (e.g. to 400 MT/s for DDR2 or to 800 MT/s for DDR3). As memory technology evolves, f_Cell is raised from 100 MHz to 133, 167 and 200 MHz; along with f_Cell, f_CK and thus the final speed grade are also raised. Raising f_Cell from 100 MHz to 200 MHz characterizes the evolution of each memory technology.
f_CK stands in a given ratio to f_Cell as follows: SDRAM: f_CK = f_Cell; DDR: f_CK = f_Cell; DDR2: f_CK = 2 x f_Cell; DDR3: f_CK = 4 x f_Cell; DDR4: f_CK = 8 x f_Cell.
2.2.2.2 Memory type (5)

54 The fetch width (FW) of the memory cell array
It specifies how many times more bits the cell array fetches per column cycle than the data width of the device (xn). E.g. a 4-bit wide DRAM device (x4 DRAM chip) with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 x 4, that is 16 bits, from the memory cell array in every f_Cell cycle.
The fetch width of the memory cell array of synchronous DRAMs is as follows: SDRAM: 1, DDR: 2, DDR2: 4, DDR3: 8, DDR4: 8.
The DDR4 architecture is an 8n prefetch with two or four selectable bank groups. This design permits DDR4 memory devices to have separate activation, read, write or refresh operations underway in each unique bank group. (A short sketch of these rate relations follows below.)
2.2.2.2 Memory type (6)
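To make the f_Cell/f_CK/data-rate relations above concrete, here is a minimal Python sketch (an illustration written for these slides, not vendor code) that derives the data rate on the DQ lines from the core clock:

```python
# Relations taken from the slides above:
#   f_CK = ratio * f_Cell, and data moves once per CK (SDRAM)
#   or on both CK/DQS edges (DDR..DDR4).
CK_RATIO = {"SDRAM": 1, "DDR": 1, "DDR2": 2, "DDR3": 4, "DDR4": 8}

def data_rate_mt_s(mem_type: str, f_cell_mhz: float) -> float:
    """Data rate per DQ line in MT/s for a given core (cell array) clock."""
    f_ck = CK_RATIO[mem_type] * f_cell_mhz
    transfers_per_ck = 1 if mem_type == "SDRAM" else 2
    return f_ck * transfers_per_ck

# The initial speed grades quoted above, all with a 100 MHz core clock:
assert data_rate_mt_s("SDRAM", 100) == 100   # SDRAM-100
assert data_rate_mt_s("DDR",   100) == 200   # DDR-200
assert data_rate_mt_s("DDR2",  100) == 400   # DDR2-400
assert data_rate_mt_s("DDR3",  100) == 800   # DDR3-800
assert data_rate_mt_s("DDR4",  100) == 1600  # DDR4-1600 (bank groups interleaved)
```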

55 DDR4 devices have 16 banks organized in four bank groups, compared to DDR3's 8 independent banks. (Figure: bank organization of DDR3 vs. DDR4 SDRAM.) Source: http://www.chip.de/artikel/DDR4-RAM-So-funktioniert-der-neue-Arbeitsspeicher_68928617.html

56 DDR4 uses a point-to-point topology (one DIMM per channel) compared to its predecessors' multi-drop bus (multiple DIMMs per channel). (Source: http://www.bit-tech.net/hardware/memory/2010/08/26/ddr4-what-we-can-expect/2)
Figure 2.3: RAS feature comparison of DDR3 and DDR4 SDRAM. Source: http://www.samsung.com/global/business/semiconductor/file/media/DDR4_Brochure-0.pdf

57 Examples (DRAM core frequency f_Cell = 100 MHz in each case):
- SDRAM-100: clock (CK) 100 MHz; n bits fetched per core cycle; data transfer on the rising edges of CK over the data lines (DQ0-DQn-1): 100 MT/s.
- DDR-200: clock (CK/CK#) 100 MHz, data strobe (DQS) 100 MHz; 2xn bits fetched per core cycle; data transfer on both edges of DQS over the data lines (DQ0-DQn-1): 200 MT/s.
- DDR2-400: clock (CK/CK#) 200 MHz, data strobe (DQS) 200 MHz; 4xn bits fetched per core cycle; data transfer on both edges of DQS over the data lines (DQ0-DQn-1): 400 MT/s.

58 Examples (DRAM core clock 100 MHz, continued):
- DDR3-800: clock (CK/CK#) 400 MHz, data strobe (DQS) 400 MHz; 8xn bits fetched per core cycle; data transfer on both edges of DQS over the data lines (DQ0-DQn-1): 800 MT/s.
- DDR4-1600: clock (CK/CK#) 800 MHz, data strobe (DQS) 800 MHz; 8xn bits fetched per core cycle from two interleaved bank groups (bank group 0 / bank group 1); data transfer on both edges of DQS over the data lines (DQ0-DQn-1): 1600 MT/s.

59 The main technique to increase memory speed: smaller voltage swings
Smaller voltage swings result in shorter signal rise/fall times and thus higher speed grades, but the lower voltage budget raises the requirements for signal integrity.
Relation between voltage swings and rise/fall times of signals: Q = C_in x V = I x t, thus t_R ~ C_in x V / I, where Q: charge on the input capacitance of the line (C_in), C_in: input capacitance of the line, V: voltage (swing), I: current strength of the driver, t_R: rise time.
Voltage/voltage swing per memory type: SDRAM: 3.3 V, DDR: 2.5 V, DDR2: 1.8 V, DDR3: 1.5 V, DDR4: 1.2 V. (A numeric sketch follows below.)
2.2.2.2 Memory type (9)
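A minimal numeric sketch of the t_R ~ C_in x V / I relation above; the line capacitance and driver current used here are illustrative assumptions, not values from any datasheet:

```python
# t_R ~ C_in * V / I  (the charge Q = C_in * V must be supplied by driver current I)
def rise_time_ns(c_in_pf: float, v_swing: float, i_drive_ma: float) -> float:
    return (c_in_pf * 1e-12) * v_swing / (i_drive_ma * 1e-3) * 1e9

# Assumed C_in = 5 pF and I = 10 mA; only the supply/swing voltage varies:
for mem, v in [("SDRAM", 3.3), ("DDR", 2.5), ("DDR2", 1.8), ("DDR3", 1.5), ("DDR4", 1.2)]:
    print(f"{mem:5s} ({v} V): t_R ~ {rise_time_ns(5.0, v, 10.0):.2f} ns")
# Shrinking the swing from 3.3 V to 1.2 V cuts the rise time (and thus the
# attainable cycle time) by nearly a factor of three, as argued above.
```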

60 Signaling used in buses
Figure 2.4: Signal types used in MMs for control, address and data signals. Three signaling classes with decreasing typical voltage swings:
- Single ended (3.3-5 V swings): TTL (5 V): FPM/EDO; LVTTL (3.3 V): FPM/EDO, SDRAM, HI1.5.
- Voltage referenced (600-800 mV, switching around V_REF): SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3), SSTL1.2 (DDR4), RSL (RDRAM), FSB.
- Differential (200-300 mV, S+/S- around V_CM): LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs; DRSL: XDR (data).
LVDS: Low Voltage Differential Signaling; LVTTL: Low Voltage TTL; (D)RSL: (Differential) Rambus Signaling Level; SSTL: Stub Series Terminated Logic; V_CM: Common Mode Voltage; V_REF: Reference Voltage.
2.2.2.2 Memory type (9b)

61 Figure 2.7: Signaling alternatives of buses used with memories. Both the data lines and the command, control and address lines of FPM, EDO, SDRAM, DDR-DDR4, RDRAM, FB-DIMM, XDR and XDR2 memories use one of the three signaling classes: single ended (TTL, LVTTL), voltage referenced (RSL, SSTL) or differential (DRSL, LVDS).
2.2.2.2 Memory type (10)

62 Key features of synchronous DRAM devices (SDRAM to DDR3)
Table 2.4: Key features of synchronous DRAM devices.
2.2.2.2 Memory type (11)

63 Approximate appearance dates and speed grades of DDR DRAMs, as well as the bandwidth(1) provided by a dual channel memory subsystem. Source: http://www.samsung.com/global/business/semiconductor/file/media/DDR4_Brochure-0.pdf
(1) Bandwidth of a dual channel memory subsystem [12].
2.2.2.2 Memory type (12)

64 Green and ultra-low power memories
Green memories (lower dissipation memories) represent the latest achievements of DRAM memory technology:
- Low voltage DDR3L memories: use of a 1.35 V supply voltage instead of 1.50 V to reduce dissipation.
- Ultra low voltage DDR3U memories: use of a 1.25 V supply voltage instead of 1.50 V to reduce dissipation.
2.2.2.2 Memory type (13)

65 Green and ultra-low power memories - Examples [13] 2.2.2.2 Memory type (14)

66 c) FB-DIMMs
(DRAM taxonomy as in the overview: among the DRAMs with serial bus connection, i.e. DRDRAM (1999), XDR (2006)(1) and FB-DIMM (2006), the FB-DIMM is the challenging type discussed next.)
(1) Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers.
2.2.2.2 Memory type (15)

67 Principle of operation
- Introduces packet-based serial transmission (like the PCIe, SATA and SAS buses),
- introduces full buffering (registered DIMMs buffer only addresses),
- CRC (cyclic redundancy check) error checking.
2.2.2.2 Memory type (16)

68 The architecture of FB-DIMM memories [19] 2.2.2.2 Memory type (17)

69 Figure 2.8: Maximum supported FB-DIMM configuration [20] (6 channels/8 DIMMs) 2.2.2.2 Memory type (18)

70 Implementation details (1)
Serial (differential) transmission between the North Bridge and the DIMMs (each bit needs a pair of wires).
Read packets (frames, bursts): 168 bits (12 x 14 bits): 144 data bits (equal to the number of data bits produced by a 72-bit wide DDR2 module (64 data bits + 8 ECC bits) in two memory cycles) and 24 CRC bits. Every 12 cycles (that is, every two memory cycles) constitute a packet.
Write packets (frames, bursts): 120 bits (12 x 10 bits): 98 payload bits and 22 CRC bits.
Clocked at 6 x the data rate of the DDR2, e.g. for a DDR2-667 DRAM the clock rate is 6 x 667 MHz = 4 GHz.
Number of serial links: 14 read lanes (2 wires each), 10 write lanes (2 wires each). (A sketch of the framing arithmetic follows below.)
2.2.2.2 Memory type (19)
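A brief Python sketch of the framing arithmetic just described (an illustration based on the figures quoted above, not on the FB-DIMM specification text itself); it reproduces the 168-bit read frame and checks that the read channel exactly carries the data stream of one DDR2 module:

```python
LINK_CLOCK_MULT = 6      # link clock = 6 x the DDR2 data rate
READ_LANES = 14          # serial read lanes (2 wires each)
CYCLES_PER_FRAME = 12    # 12 link cycles = 1 frame = 2 memory cycles

def read_channel(ddr2_rate_mt_s: float):
    frame_bits = CYCLES_PER_FRAME * READ_LANES          # 168 bits per frame
    data_bits = 2 * (64 + 8)                            # two 72-bit DDR2 beats
    crc_bits = frame_bits - data_bits                   # 24 CRC bits
    frames_per_s = LINK_CLOCK_MULT * ddr2_rate_mt_s * 1e6 / CYCLES_PER_FRAME
    payload_gb_s = frames_per_s * 2 * 64 / 8 / 1e9      # data bits only, no ECC/CRC
    return frame_bits, crc_bits, payload_gb_s

print(read_channel(667))
# -> (168, 24, ~5.3): the read channel delivers ~5.3 GB/s of data,
#    i.e. exactly the throughput of one DDR2-667 module (FB-DIMM-5300).
```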

71 Implementation details (2)
The 98 payload bits consist of: 2 frame type bits, 24 command bits, and 72 bits for data and commands, according to the frame type (e.g. 72 bits of data; 36 bits of data + one command; or two commands).
Commands: all commands include a 3-bit FB-DIMM module address to select one of 8 modules.
2.2.2.2 Memory type (20)

72 Figure 2.9: Different implementations of FB-DIMMs. The FB-DIMM data buffer (Advanced Memory Buffer, AMB) manages the read/write operations of the module.
- FB-DIMM-4300 (DDR2-533 SDRAM): clock speed 133 MHz, data rate 533 MT/s, throughput 4300 MB/s.
- FB-DIMM-5300 (DDR2-667 SDRAM): clock speed 167 MHz, data rate 667 MT/s, throughput 5300 MB/s.
- FB-DIMM-6400 (DDR2-800 SDRAM): clock speed 200 MHz, data rate 800 MT/s, throughput 6400 MB/s.
Source: PCstats.
2.2.2.2 Memory type (22)

73 Figure 2.10: Block diagram of the AMB [21]. (There are two Command/Address (C/A) buses to limit the load, as a module may carry 9 to 36 DRAMs.)
2.2.2.2 Memory type (23)

74 Necessary routing to connect the north bridge to the DIMM socket (Figure 2.11: PCB routing [19]):
a) In case of a DDR2 DIMM (240 pins) a 3-layer PCB is needed.
b) In case of an FB-DIMM (69 pins) a 2-layer PCB is needed (but a 3rd layer is used for power lines).
2.2.2.2 Memory type (24)

75 Assessing benefits and drawbacks of FB-DIMM memories (as compared to DDR2/3 memories)
Benefits of FB-DIMMs: higher memory size and bandwidth:
- more DIMM modules (up to 8) per channel,
- more memory channels (up to 6),
- higher memory size (6x8 = 48 DIMMs; assuming 8 GB/DIMM up to 384 GB),
- the same per-channel bandwidth figures as the underlying (DDR2) parts.
Drawbacks of FB-DIMMs:
- higher latency,
- higher cost,
- higher dissipation (typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, FB-DIMM with DDR2: about 10 W).
2.2.2.2 Memory type (25)

76 Latency [22]: Due to their additional serialization tasks and daisy-chained nature, FB-DIMMs have about 15% higher overall average latency than DDR2 memories.
Production: The production of FB-DIMMs stopped with DDR2-800 modules; no DDR3 modules came to the market due to the drawbacks of the technology.
2.2.2.2 Memory type (26)

77 2.2.2.3 Speed grades
Overview of the speed grades of DDR DRAMs and the bandwidth(1) they provide. Source: http://www.samsung.com/global/business/semiconductor/file/media/DDR4_Brochure-0.pdf
(1) Bandwidth of a dual channel memory subsystem [12].
2.2.2.3 Speed grades (1)

78 Remark
Speed grades of FSBs and DRAMs were defined at the time when the base clock frequency of the FSBs was 133 MHz (around 2000). The subsequent speed grades of FSBs, and also those of the memories, were then chosen as integral multiples of 133 MHz, such as
266 = 2 x 133
400 ~= 3 x 133
533 ~= 4 x 133
667 ~= 5 x 133
800 ~= 6 x 133
1067 ~= 8 x 133
1333 ~= 10 x 133
1600 ~= 12 x 133
etc. (see the sketch below).
2.2.2.3 Speed grades (2)
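A short Python sketch of this numbering scheme (an illustration for these slides; the exact base clock is 133.33 MHz, which the marketed grade names round in slightly varying ways), together with the peak bandwidth of a dual channel subsystem built of 8-byte wide DIMMs:

```python
BASE_MHZ = 400 / 3   # the ~133.33 MHz FSB base clock of around 2000

def grade(multiple: int) -> int:
    """Speed grade in MT/s as an integral multiple of the base clock."""
    return round(multiple * BASE_MHZ)

def dual_channel_gb_s(mt_s: float) -> float:
    """Peak bandwidth of two 8-byte wide channels."""
    return mt_s * 8 * 2 / 1000

for n in (2, 3, 4, 5, 6, 8, 10, 12):
    mt = grade(n)
    print(f"{n:2d} x 133 MHz -> {mt:4d} MT/s, dual channel: {dual_channel_gb_s(mt):5.1f} GB/s")
# Note: 2 x 133.33 gives 267 here; the marketed name rounds it down to 266.
# E.g. 6 x 133 -> 800 MT/s (DDR2-800): 12.8 GB/s for a dual channel subsystem.
```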

79 Rate of increasing the transfer rates in synchronous DRAMs
Figure 2.12: The evolution of peak transfer rates of parallel connected synchronous DRAMs, as manifested in Intel's chipsets, from the mid-1990s on: SDRAM 66/100/133, DDR 266/333/400, DDR2 533/667/800, DDR3 1333/1600, DDR4 2133 MT/s; roughly a 10x increase per 10 years.
2.2.2.3 Speed grades (3)

80 Memory speed grades used in Intel's multicore systems
Kind of attaching memory (in Intel's MC systems, typically):
- Attaching memory by parallel channels: memory attached to the MCH: up to DDR2-667; memory attached to the processor(s): up to DDR3-1600.
- Attaching memory by serial channels: using FB-DIMMs: up to DDR2-667; using serial channels with S/P converters: up to DDR3-1600/2133.
2.2.2.3 Speed grades (4)

81 2.2.2.4 DIMM density
a) Device density
Figure 2.13: Evolution of DRAM densities (from 16 Kbit to 1 Gbit devices) and the number of units shipped per year (based on [23]). Device density grows roughly 4x every 4 years.
2.2.2.4 DIMM density (1)

82 b) DIMM (module) density
Based on device densities of 1 to 8 Gb and with typical widths of x4 to x16 (bits), DDR3 or DDR4 modules provide typical densities of up to 8 or 16 GB.
The DDR4 DIMM's theoretical maximum capacity is 512 GB, compared to DDR3's 128 GB: 2 ranks x 8 high x 18 devices x 16 Gb (0.5 KB page size) vs. 2 ranks x 4 high x 18 devices x 8 Gb (2 KB page size). (A worked version of this arithmetic follows below.)
2.2.2.4 DIMM density (2)
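A worked Python version of the capacity arithmetic above. This is a sketch; it assumes that of the 18 devices per rank 2 are ECC devices that do not count toward the stated data capacity, which is what makes the quoted figures come out:

```python
def dimm_capacity_gb(ranks: int, stack_high: int, devices: int,
                     device_gbit: int, ecc_devices: int = 2) -> float:
    """Module data capacity in GB; ECC devices are excluded (assumption)."""
    data_devices = devices - ecc_devices
    return ranks * stack_high * data_devices * device_gbit / 8

# DDR4: 2 ranks x 8-high stacks x 18 devices x 16 Gb -> 512 GB
print(dimm_capacity_gb(2, 8, 18, 16))   # 512.0
# DDR3: 2 ranks x 4-high stacks x 18 devices x 8 Gb  -> 128 GB
print(dimm_capacity_gb(2, 4, 18, 8))    # 128.0
```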

83 2.2.2.5 Use of ECC support
ECC basics (as used in DIMMs)
ECC is implemented as SEC-DED (Single Error Correction, Double Error Detection).
Single bit error correction: for D data bits, P check bits are added; the code word consists of the data bits and the check bits. What is the minimum number of check bits (P) for single bit error correction? Requirement: 2^P ≥ the minimum number of states to be distinguished.
2.2.2.5 Use of ECC support (1)

84 It is needed to specify the bit position of a possible single bit error in the code word, consisting of both data and check bits; this requires D + P states, plus one additional state to specify the "no error" state. Accordingly, the minimum number of states to be distinguished is D + P + 1, and to implement single bit error correction the minimum number of check bits (P) needs to satisfy the requirement 2^P ≥ D + P + 1.
2.2.2.5 Use of ECC support (2)

85 Double bit error detection: an additional parity bit is needed to check for an additional error. Then the minimum number of check bits (CB) needed for SEC-DED is CB = P + 1, and since 2^P ≥ D + P + 1 with P = CB - 1, we get 2^(CB-1) ≥ D + CB - 1 + 1, that is 2^(CB-1) ≥ D + CB.
Table 2.5: The number of check bits (CB) needed for D data bits:
D = 1: CB = 2
D = 2-3: CB = 3
D = 4-7: CB = 4
D = 8-15: CB = 5
D = 16-31: CB = 6
D = 32-63: CB = 7
D = 64-127: CB = 8
D = 128-255: CB = 9
D = 256-511: CB = 10
(A small sketch computing CB follows below.)
2.2.2.5 Use of ECC support (3)
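A minimal Python sketch of the SEC-DED bound just derived (an illustration; for the practically important DIMM case of 64 data bits it yields the familiar 72-bit ECC word):

```python
def secded_check_bits(d: int) -> int:
    """Smallest CB satisfying 2**(CB-1) >= d + CB (SEC-DED over d data bits)."""
    cb = 2
    while 2 ** (cb - 1) < d + cb:
        cb += 1
    return cb

assert secded_check_bits(8) == 5    # matches Table 2.5
assert secded_check_bits(64) == 8   # 64 data + 8 check bits = 72-bit ECC DIMM word
# Note: at the very top of each range the rounded power-of-two table can differ
# by one from this exact bound (e.g. d = 127 yields 9 here, not 8).
```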

86 Supported memory features of DT and DP/MP platforms
DT memories typically do not support ECC or registered (buffered) DIMMs; servers typically make use of registered DIMMs with ECC protection.
2.2.2.5 Use of ECC support (4)

87 Typical implementation of ECC protected registered DIMMs (used typically in servers)
Main components: two register chips for buffering the address and command lines, and a PLL (Phase Locked Loop) unit for deskewing the clock distribution, beside the ECC devices.
Figure 2.14: Typical layout of a registered memory module with ECC [14].
2.2.2.5 Use of ECC support (5)

88 2.2.2.6 Use of registering
Problems arising while implementing higher memory capacities: higher memory capacities need more modules; more modules mean higher loading of the lines, which leads to signal integrity problems. Remedies: buffering the address and command lines, and phase locked clocking of the modules.
2.2.2.6 Use of registering (1)

89 Registering
Principle: buffering the address and control lines to reduce signal loading in a memory channel, in order to increase the number of supported DIMM slots (max. memory capacity); needed first of all in servers.
2.2.2.6 Use of registering (2)

90 Example: Block diagram of a registered DDR DIMM
Figure 2.17: Block diagram of a registered DDR DIMM [16]. The SDRAM devices receive the address and control signals from the motherboard through two PI74SSTV16857 register chips; a PI6CV857 PLL, driven by the input clock from the motherboard, distributes the clock; data moves from/to the motherboard unbuffered.
2.2.2.6 Use of registering (3)

91 Implementation of registering
By means of a register chip that buffers the address and control lines.
Figure 2.15: Registered signals in case of an SDRAM memory module [15]. REGE: register enable signal. Note: Data (DQ) and data strobe (DQS) signals are not registered, as only address and control signals are common for all memory chips.
2.2.2.6 Use of registering (4)

92 Number of register chips required
Synchronous memory modules (SDRAM to DDR3 DIMMs) have about 20-30 address and control lines; register chips usually buffer 14 lines; thus typically two register chips are needed per memory module [16].
2.2.2.6 Use of registering (5)

93 Typical layout of registered DIMMs
Main components: two register chips for buffering the address and command lines, and a PLL (Phase Locked Loop) unit for deskewing the clock distribution, beside the ECC devices.
Figure 2.16: Typical layout of a registered memory module with ECC [14].
2.2.2.6 Use of registering (6)

94 Registered DIMM module with ECC
Figure 2.18: Registered DIMM module with ECC [14].
2.2.2.6 Use of registering (7)

95 Typical use of registered DIMMs (RDIMMs): in servers (memory capacities: a few tens of GB to a few hundreds of GB).
Typical use of unregistered DIMMs (UDIMMs): in desktops/laptops (memory capacities: up to a few GB).
2.2.2.6 Use of registering (8)

96 2.3. Buses interconnecting platform components

97 Use of buses in Intel's DT/DP and MP platforms
- Buses interconnecting processors (in NUMA topologies),
- buses interconnecting processors to chipsets,
- buses interconnecting MCHs to ICHs (in 2-part chipsets).
Remark: Buses connecting the memory subsystem with the main body of the platforms are memory specific interfaces and will be discussed in Section 4.
Example figure: the Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores): Xeon 6500 (Nehalem-EX, Beckton) or Xeon E7-2800 (Westmere-EX) processors, i.e. Nehalem-EX (8C)/Westmere-EX (10C), interconnected by QPI and connected by QPI to the 7500 IOH, which attaches the ICH10 via ESI; SMBs on SMI links carry DDR3-1067 memory; ME. SMI: serial link between the processor and the SMB; SMB: Scalable Memory Buffer with parallel/serial conversion.
2.3 Buses interconnecting platform components (1)

98 Implementation of buses used in Intel's DT/DP and MP platforms
- Parallel buses: FSB (Front Side Bus), 64-bit wide, used to interconnect processors to chipsets in previous platforms; HI1.5, 8-bit wide, used to interconnect MCHs to ICHs in previous platforms.
- Parallel/serial buses (point-to-point interconnections): QPI (Quick Path Interconnect) and QPI1.1 (Quick Path Interconnect v.1.1), 16-bit wide, used to interconnect processors to processors and processors to chipsets.
- Serial buses (point-to-point interconnections): DMI (Direct Media Interface), ESI (Enterprise System Interface) and DMI2 (Direct Media Interface 2.G.), 4-bit wide (4 PCIe lanes), used to interconnect processors to chipsets or MCHs to ICHs.
2.3 Buses interconnecting platform components (2)

99 Buses used in Intel's DT/DP/MP platforms
Buses interconnecting MCHs to ICHs (in 2-part chipsets):
- Parallel: HI 1.5 (1999): 8-bit wide, 16 lines, 266 MB/s total in both directions (low-cost systems).
- Serial: DMI/ESI (2004(1)): 4 PCIe lanes, 18 lines, 1 GB/s/direction; DMI2 (2011): 4 PCIe lanes, 18 lines, 2 GB/s/direction.
Buses interconnecting processors to chipsets:
- Parallel: FSB (64-bit: 1993): ~150 lines, 3.2-12.8 GB/s total in both directions.
- Serial (low-cost systems): DMI/ESI (2008(2)): 4 PCIe lanes, 18 lines, 1 GB/s/direction; DMI2 (2011): 4 PCIe lanes, 18 lines, 2 GB/s/direction.
- Parallel/serial (high-performance systems): QPI (2008): 20 lanes, 84 lines, 9.6/11.72/12.8 GB/s in each direction.
Buses interconnecting processors (in NUMA topologies):
- Parallel/serial: QPI (2008): 20 lanes, 84 lines, 9.6/11.72/12.8 GB/s in each direction; QPI1.1 (2012?): specification n.a.
2.3 Buses interconnecting platform components (3)

100 Remarks
(1) DMI: Introduced as an interface between the MCH and the ICH first along with the ICH6, supporting Pentium 4 Prescott processors, in 2004.
(2) DMI: Introduced as an interface between the processors and the chipset first between the Nehalem-EP and the 34xx PCH, in 2008, after the memory controllers had been placed onto the processor die.
2.3 Buses interconnecting platform components (4)

101 Signaling used in buses
Figure 2.4: Signal types used in MMs for control, address and data signals (repeated). Three signaling classes with decreasing typical voltage swings: single ended (TTL (5 V): FPM/EDO; LVTTL (3.3 V): FPM/EDO, SDRAM, HI1.5; 3.3-5 V swings), voltage referenced (SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3), RSL (RDRAM), FSB; 600-800 mV around V_REF), differential (LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs; DRSL: XDR (data); 200-300 mV around V_CM).
LVDS: Low Voltage Differential Signaling; LVTTL: Low Voltage TTL; (D)RSL: (Differential) Rambus Signaling Level; SSTL: Stub Series Terminated Logic; V_CM: Common Mode Voltage; V_REF: Reference Voltage.
2.3 Buses interconnecting platform components (5)

102 FSB/HI 1.5: bus type interconnects
Main features of parallel buses used in Intel's multicore platforms:
- Typical use: FSB: connecting the processors and the chipset; HI 1.5: connecting MCH and ICH.
- Introduced: FSB: with the Pentium (1993); HI 1.5: with the Pentium III (1999).
- Width: FSB: 64 bit; HI 1.5: 8 bit.
- Clock: FSB: 100-400 MHz; HI 1.5: 66 MHz.
- DDR/QDR: FSB: QDR since the Pentium 4 (2000); HI 1.5: QDR.
- Transfer rate: FSB: 400-1600 MT/s; HI 1.5: 266 MT/s.
- Bandwidth: FSB: 3.2-12.8 GB/s in both directions altogether; HI 1.5: 266 MB/s in both directions altogether.
- Signaling: FSB: voltage referenced data signals; HI 1.5: single-ended data signals.
- Number of lines: FSB: ~150 lines; HI 1.5: ~16 lines.
2.3 Buses interconnecting platform components (6)

103 DMI/QPI: point-to-point interconnections
Main features of serial buses used in Intel's platforms:
- Typical use: DMI/ESI, DMI2: to interconnect MCHs and ICHs or processors to chipsets; QPI, QPI 1.1: to interconnect processors in NUMA topologies or processors to chipsets.
- Introduced: DMI/ESI: in connection with the 2. gen. Nehalem in 2008; DMI2: in connection with Sandy Bridge in 2011; QPI: in connection with Nehalem-EP in 2008; QPI 1.1: in connection with Sandy Bridge in 2012 (?).
- Width: DMI/ESI: 4 PCIe lanes; DMI2: 4 PCIe 2.0 lanes; QPI: 20 lanes; QPI 1.1: no specification available yet.
- Clock: DMI/ESI: 2.5 GHz; DMI2: 5 GHz; QPI: 2.4/2.93/3.2 GHz, DDR.
- Encoding: DMI/ESI, DMI2: 8b/10b; QPI: none.
- Bandwidth/direction: DMI/ESI: 1 GB/s; DMI2: 2 GB/s; QPI: 9.6/11.72/12.8 GB/s.
- Signaling: LVDS. Number of lines: DMI/ESI, DMI2: 18 lines; QPI: 84 lines.
(The bandwidth arithmetic behind these figures is sketched below.)
2.3 Buses interconnecting platform components (7)
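A compact Python sketch reproducing the bandwidth figures of the two tables above from first principles (an illustration; the lane counts, clocks and the 8b/10b overhead are taken from the tables):

```python
def fsb_gb_s(mt_s: float) -> float:
    """64-bit parallel FSB: transfer rate x 8 bytes, total in both directions."""
    return mt_s * 8 / 1000

def dmi_gb_s(gt_s: float, lanes: int = 4) -> float:
    """PCIe-style serial link with 8b/10b encoding, per direction."""
    return lanes * gt_s * (8 / 10) / 8

def qpi_gb_s(clock_ghz: float) -> float:
    """QPI: 16 data bits per transfer, double data rate, per direction."""
    return clock_ghz * 2 * 16 / 8

print(fsb_gb_s(400), fsb_gb_s(1600))   # 3.2 ... 12.8 GB/s (FSB range)
print(dmi_gb_s(2.5), dmi_gb_s(5.0))    # 1.0 (DMI), 2.0 (DMI2) GB/s per direction
print(qpi_gb_s(2.4), qpi_gb_s(2.93), qpi_gb_s(3.2))  # 9.6 / 11.72 / 12.8 GB/s
```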

104 Comparing main features of Intel's FSB and QPI [9]
GTL+: a kind of voltage referenced signaling.
2.3 Buses interconnecting platform components (8)

105 Principle of LVDS signal transmission used in serial buses
Figure 2.5: LVDS Single Link Interface Circuit [10].
2.3 Buses interconnecting platform components (9)

106 PCIe packet format (data frames)
PCI Express data frame [10]. The related fields are:
- Frame: 1-byte Start-of-Frame/End-of-Frame
- Seq#: 2-byte sequence number
- Header: 16- or 20-byte header
- Data: 0-4096-byte data field
- CRC: 4-byte ECRC (End-to-End CRC) + 4-byte LCRC (Link CRC) (CRC: Cyclic Redundancy Check)
2.3 Buses interconnecting platform components (10)
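A small Python sketch of the per-packet overhead implied by these fields (an illustration; it assumes the 16-byte header variant, both CRCs present, and one framing byte at each end of the packet):

```python
FRAME = 2 * 1   # assumed: 1 start-of-frame byte + 1 end-of-frame byte
SEQ = 2         # 2-byte sequence number
HEADER = 16     # 16- or 20-byte header; 16 assumed here
CRC = 4 + 4     # 4-byte ECRC + 4-byte LCRC

def pcie_efficiency(payload_bytes: int) -> float:
    """Fraction of the frame bytes that is payload (before 8b/10b coding)."""
    return payload_bytes / (payload_bytes + FRAME + SEQ + HEADER + CRC)

print(f"{pcie_efficiency(64):.1%}")    # small packet: ~70% efficient
print(f"{pcie_efficiency(4096):.1%}")  # maximum packet: ~99% efficient
```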

107 Principle of the QuickPath Interconnect bus (QPI bus)
Each direction is a separate unidirectional link (TX and RX) of 20 lanes: 16 data, 2 protocol and 2 CRC lanes.
Figure 2.6: Signals of the QuickPath Interconnect bus (QPI bus) [11].
2.3 Buses interconnecting platform components (11)

108 5. References

109 5. References (1) [1]: Wikipedia: Centrino, http://en.wikipedia.org/wiki/Centrino [2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr. 24 1999, http://www.thefreelibrary.com/Industry+Uniting+Around+Intel+Server +Architecture%3B+Platform...-a053949226 [3]: Intel Core 2 Duo Processor, http://www.intel.com/pressroom/kits/core2duo/ [4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp. 1-29. [5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec. 07 2004, http://pcworld.about.net/news/Dec072004id118866.htm [6]: Perich D., Intel Volume platforms Technology Leadership, Presentation at HP World 2004, http://98.190.245.141:8080/Proceed/HPW04CD/papers/4194.pdf [7] Powerful New Intel Server Platforms Feature Array Of Enterprise-Class Innovations. Intel’s Press release, Aug. 2, 2004, http://www.intel.com/pressroom/archive/releases/2004/20040802comp.htm [8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March 1 2005, http://www.silentpcreview.com/article224-page2 [9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, http://www.intel.com/ content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf [10]: Davis L. PCI Express Bus, http://www.interfacebus.com/PCI-Express-Bus-PCIe-Description.html

110 5. References (2) [11]: Ng P. K., “High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor,” IDF Taipei, TDPS001, 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TDPS001/FA08%20IDF- Taipei_TDPS001_100.pdf [12]: Computing DRAM, Samsung.com, http://www.samsung.com/global/business/semiconductor /products/dram/Products_ComputingDRAM.html [13]: Samsung’s Green DDR3 – Solution 3, 20nm class 1.35V, Sept. 2011, http://www.samsung.com/global/business/semiconductor/Greenmemory/Downloads/ Documents/downloads/green_ddr3_2011.pdf [14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org [15]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/ SD9C16_32x72.pdf [16]: Solanki V., „Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf [17]: Fisher S., “Technical Overview of the 45 nm Next Generation Intel Core Microarchitecture (Penryn),” IDF 2007, ITPS001, http://isdlibrary.intel-dispatch.com/isd/89/45nm.pdf [18]: Razin A., Core, Nehalem, Gesher. Intel: New Architecture Every Two Years, Xbit Laboratories, 04/28/2006, http://www.xbitlabs.com/news/cpu/display/20060428162855.html [19]: Haas J. & Vogt P., “Fully Buffered DIMM Technology Moves Enterprise Platforms to the Next Level,” Technology@Intel Magazine, March 2005, pp. 1-7

111 5. References (3) [20]: „Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1 [21]: McTague M. & David H., „Fully Buffered DIMM (FB-DIMM) Design Considerations,” Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA-S009.pdf [22]: Ganesh B., Jaleel A., Wang D., Jacob B., Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, 2007 [23]: DRAM Pricing - A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf

