Published byChester McCarthy Modified over 6 years ago
1
Platforms I. Dezső Sima 2012 December (Ver. 1.5) Sima Dezső, 2011
2
Contents 1. Introduction to platforms 2. Main components of platforms
3. Platform architectures 4. Memory subsystem design considerations 5. References
3
1. Introduction to platforms
1.1. The notion of platform 1.2. Description of particular platforms 1.3. Representation forms of platforms 1.4. Compatibility of platform components
4
1.1. The notion of platform
5
1.1 The notion of platform (1)
The notion of platform is widely used in different segments of the IT industry, e.g. by IC manufacturers, system providers and even software suppliers, each with a different interpretation. Here we focus on the platform concept as typically used by system providers.
6
1.1 The notion of platform (2)
Modular (unified) system design and the notion of platform
Modular system design means that the system architecture is partitioned into a few standard components (modules), such as the processor, the memory control hub (MCH) and the I/O control hub (ICH), which are interconnected by specified (standard) interconnections.
Figure: Intel’s Core 2 Duo and Core 2 Extreme (the highest speed model) aimed DT platform (the Bridge Creek platform). Components shown: Core 2 Duo/Core 2 Extreme (2C) processor; FSB at 1066/800/533 MT/s; 965 Series MCH with two memory channels (DDR2-800/667/533, two DIMMs per channel); ME; DMI and C-link to the ICH8.
7
1.1 The notion of platform (3)
Modular system design became a subject of scientific research at the end of the 1990s, see e.g. [4].
Remark: The need for a modular system design, called platform design, arose in the PC industry when PCI-based system designs were replaced by port-based system designs, about
8
1.1 The notion of platform (4)
Figure: Two block diagrams side by side.
Late PCI-based system architecture (~1998), used typically with Pentium II/III and built around Intel’s 440xx chipset: processor bus, system controller with main memory (EDO/SDRAM) and AGP, PCI bus, peripheral controller (2xIDE/ATA33/66, 2xUSB), PCI-to-ISA bridge, ISA bus with legacy devices.
Early port-based system architecture (~1999), used first with Pentium III and built around Intel’s 810 chipset: processor bus, system controller with main memory (SDRAM) and AGP, hub interface, peripheral controller (2xIDE/ATA 33/66/100, AC'97, 2x/4x USB, PCI bus), LPC with Super I/O (KBD, MS, etc.) for legacy and/or slow devices.
9
1.1 The notion of platform (5)
Main goals of modular system level design
to reduce the complexity of designing complex systems by partitioning them into modules,
to have stable interfaces (at least for a number of years) interconnecting the modules,
in this way to minimize design rework while upgrading a given system design, e.g. when moving from one processor generation to the next, and
thus to shorten the time to market.
Co-design of platform components
Platform components are typically co-designed, announced and delivered as a set.
10
1.1 The notion of platform (6)
Interpretation of the notion platform
System providers, however, may use the notion platform either in a more general or in a more specific sense.
Interpretation in a more general sense: a modular system design targeting a given application area, as used in terms like DT or MP platforms.
Interpretation in a more specific sense: a particular modular system architecture developed for a given application area, such as a given DT or MP platform, like Intel’s Sandy Bridge based Sugar Bay DT platform or AMD’s Phenom II X! based Dragon platform (2008) for gamers (2009)
11
1.1 The notion of platform (7)
Benefits of the platform concept for computer manufacturers
With the platform concept in mind, manufacturers like Intel or AMD plan, design and market all key components of a platform, such as the processor or processors and the related chipset, as an integrated entity [5]. This is beneficial for the manufacturers since it motivates OEMs, as system providers, to buy all key parts of a computer system from the same manufacturer.
12
1.1 The notion of platform (8)
Benefits of the platform concept for customers The platform concept is beneficial for the customers as well since an integrated “backbone” of a system architecture promises a more reliable and more cost effective system.
13
1.1 The notion of platform (9)
Interpretation of the notion platform in a more specific sense
In a more specific sense the notion platform refers to a particular modular system architecture that is developed for a given application area, such as a DT, DP or MP platform. In this sense the notion platform is interpreted as a standardized backbone of a system architecture developed for a given application area. It is typically built up of the processor or processors, the chipset, the memory subsystem (MSS) that is attached by a specific memory interface, in some cases (such as in mobile or business oriented DT platforms) also the networking component [7], as well as the buses interconnecting the above components of the platform.
Basic components of a platform (the backbone of the system architecture)
Processor or processors
Chipset
The memory subsystem
(LAN controller)
Buses interconnecting the preceding basic components
Subsequently, we will focus on the interpretation of the notion platform in this latter sense.
14
1.1 The notion of platform (10)
Example 1: Intel’s Core 2 aimed home user DT platform (Bridge Creek) [3]
Figure: Block diagram of the platform (FSB at 1066 MT/s, display card, two memory channels with 2 DIMMs/channel, C-link).
15
1.1 The notion of platform (11)
Example 2: Intel’s Nehalem-EX aimed Boxboro-EX MP server platform, assuming 1 IOH
Figure: Four Xeon 7500 (Nehalem-EX, Beckton, 8C) or Xeon (Westmere-EX, 10C) processors interconnected by QPI links; SMBs with DDR3-1067 memory attached over 2x4 SMI channels; 7500 IOH attached by QPI; ICH10 attached by ESI; ME.
Interfaces connecting platform components
SMI: Serial link between the processors and SMBs
SMB: Scalable Memory Buffer (parallel/serial conversion)
ME: Management Engine
16
1.1 The notion of platform (12)
The structure of a platform is termed as its architecture (or topology). It describes the basic components and their interconnections and will be discussed in Section 3.
17
1.1 The notion of platform (13)
Historical remarks
System providers began using the notion “platform” about 2000. Examples are Philips’ Nexperia digital video platform (1999), Texas Instruments’ (TI) OMAP platform for SOCs (2002) and Intel’s first generation mobile oriented Centrino platform for laptops, designated as the Carmel platform (3/2003). Intel contributed significantly to spreading the notion platform when, based on the success of their Centrino platform, they introduced this concept also for their desktops [5] and servers [6], [7] in 2004.
18
1.1 The notion of platform (14)
Intel’s early server and workstation roadmap from Aug [6]
Notes
a) This roadmap already makes use of the notion platform without revealing platform names.
b) In 2004 Intel made a transition from 32-bit systems to 64-bit systems.
19
1.1 The notion of platform (15)
Intel’s multicore platform roadmap announced at the IDF Spring 2005 [8] Note This roadmap includes also the particular platform designations for desktops, UP servers etc.
20
1.2. Description of a particular platform
21
1.2 Description of a particular platform (1)
Detailing the platform architecture Example: The Tylersburg DT platform (2008) Processor MCH ICH
22
1.2 Description of a particular platform (2)
Detailing the platform architecture includes the specification of the architecture (topology) of the processor, the memory and the I/O subsystems (to be discussed in Section 3). It is concerned with issues such as whether the processors of an MP server are connected to the MCH via an FSB or otherwise, or whether the memory is attached to the system architecture through the MCH or through the processors, etc.
Example: The Tylersburg DT platform (2008): Processor, MCH, ICH
23
1.2 Description of a particular platform (3)
Detailing the platform architecture
Identification of the platform components
Example: The Tylersburg DT platform (2008)
Processor: 1. gen. Nehalem (4C)/Westmere-EP (6C)
MCH: X58 IOH
ICH: ICH10
24
1.2 Description of a particular platform (4)
Detailing the platform architecture
Identification of the platform components
Specification of the interfaces interconnecting the platform components
Example: The Tylersburg DT platform (2008)
Processor: 1. gen. Nehalem (4C)/Westmere-EP (6C)
QPI between the processor and the X58 IOH
MCH: X58 IOH
DMI between the X58 IOH and the ICH10
ICH: ICH10
25
1.2 Description of a particular platform (5)
Remark The specification of a platform will be completed by the datasheets of the related platform components.
26
1.2 Description of a particular platform (6)
Dependence of the platform architecture on the platform category
Platforms may be classified according to the target area of application, such as
Mobile platforms
Desktop (DT) platforms
Dual processor (DP) platforms
Quad processor (MP) platforms
Of course, beyond the above categories further processor categories and related platforms also exist, such as embedded processors and related platforms.
Different platform categories give rise to different platform architectures:
Architecture of mobile platforms
Architecture of DT platforms
Architecture of DP platforms
Architecture of MP platforms
In these slides platform architectures will be discussed in Section 3, restricted however to DT, DP and MP platforms.
27
1.3. Representation forms of platforms
28
1.3 Representation forms of platforms (1)
Thumbnail representation
Roadmap like representation (an arbitrarily chosen representation form in these slides)
Block diagram of a platform
29
1.3 Representation forms of platforms (3)
a) Thumbnail representation
It is a concise representation of a particular platform. In particular, the thumbnail representation
reveals the platform architecture,
identifies the basic components of a platform, such as the processor or processors, the chipset and in some cases (e.g. in mobile platforms) also the Gigabit Ethernet controller, and
specifies the interconnection links (buses) between the platform components.
Example: Intel’s Core 2 Duo aimed home user oriented platform (the Bridge Creek platform)
Core 2 Duo/Core 2 Extreme (2C) processor; FSB at 1066/800/533 MT/s; 965 Series MCH with two DDR2 channels (DDR2-800/667/533, two DIMMs per channel); ME; DMI and C-link to the ICH8.
30
1.3 Representation forms of platforms (4)
b) Roadmap like representation
This kind of representation
indicates a few additional data of the processor and the chipset (like data of the die, the cache system or the memory),
reveals the dates of the introduction of platform components, and
identifies compatibility ranges of processors or chipsets in platforms by encircling compatible components,
but lacks the graphical representation of the platform.
Example: the Core 2-aimed (65 nm) Bridge Creek DT platform (6/2006)
Cores (7/2006): Core 2 Duo (2C): E6xxx/E4xxx; Core 2 Extreme (2C): X6800. E6xxx/X6800: Conroe (1); E4xxx: Allendale (1). 65 nm; Conroe: 291 million transistors/143 mm2; Allendale: 167 million transistors/111 mm2; L2: Conroe 4 MB, Allendale 2 MB; FSB: X6800/E6xxx 1066 MT/s, E4xxx 800 MT/s; LGA775.
MCH (6/2006): 965 Series (Broadwater); FSB 1066/800/533 MT/s; 2 DDR2 channels; DDR2-800/667/533; 4 ranks/channel; 8 GB max.
ICH (6/2006): ICH8.
(1) The Allendale is a later stepping (steppings L2/M0) of the Core 2 (steppings B2/G0) that typically provided only 2 MB L2 and appeared 1/2007.
31
1.3 Representation forms of platforms (5)
Example for stating the compatibility range of a platform
The Core 2 Duo aimed DT platform that targets home users (designated as the Bridge Creek platform).
Thumbnail representation: Core 2 Duo/Core 2 Extreme (2C) processor; FSB at 1066/800/533 MT/s; 965 Series MCH with two DDR2 channels (DDR2-800/667/533, two DIMMs per channel); ME; DMI and C-link to the ICH8.
Roadmap like representation: Core 2-aimed (65 nm) Bridge Creek DT platform
Cores (7/2006): Core 2 Duo (2C): E6xxx/E4xxx; Core 2 Extreme (2C): X6800. E6xxx/X6800: Conroe (1); E4xxx: Allendale (1). 65 nm; Conroe: 291 million transistors/143 mm2; Allendale: 167 million transistors/111 mm2; L2: Conroe 4 MB, Allendale 2 MB; FSB: X6800/E6xxx 1066 MT/s, E4xxx 800 MT/s; LGA775.
MCH (6/2006): 965 Series (Broadwater); FSB 1066/800/533 MT/s; 2 DDR2 channels; DDR2-800/667/533; 4 ranks/channel; 8 GB max.
ICH: ICH8.
Beyond the target processor, this platform may also be used with the previous Pentium D/EE and Pentium 4 6x0/6x1/EE lines and with the subsequent Core 2 Quad line of processors, as shown in the next slides.
(1) The Allendale is a later stepping (steppings L2/M0) of the Core 2 (steppings B2/G0) that typically provided only 2 MB L2 and appeared 1/2007.
33
1.3 Representation forms of platforms (6)
Support of Pentium 4/D/EE processors
Core 2-aimed (65 nm) Bridge Creek DT platform (6/2006)
DT cores
Pentium 4 6x0/6x1/EE (2/2005), Prescott-2M, 1C: 90 nm; 169 million transistors; 135 mm2; 2 MB L2; 800 MT/s; two-way multithreading; LGA775.
Pentium D/EE 8xx (1) (5/2005), Smithfield, 2x1C: 90 nm; 2x115 million transistors; 2x103 mm2; 2x1 MB L2; 800/533 MT/s; no multithreading; LGA775.
Pentium D/EE 9xx (2,3) (1/2006), Presler, 2x1C: 65 nm; 2x188 million transistors; 2x81 mm2; 2x2 MB L2; 1066/800 MT/s; no multithreading; LGA775.
Core 2 Duo (2C): E6xxx/E4xxx and Core 2 Extreme (2C): X6800 (7/2006); E6xxx/X6800: Conroe; E4xxx: Allendale: 65 nm; Conroe: 291 million transistors/143 mm2; Allendale: 167 million transistors/111 mm2; L2: Conroe 4 MB, Allendale 2 MB; FSB: X6800/E6xxx 1066 MT/s, E4xxx 800 MT/s; LGA775.
MCH (6/2006): 965 Series (Broadwater); FSB 1066/800/533 MT/s; 2 DDR2 channels; DDR2-800/667/533; 4 ranks/channel; 8 GB max. Supports also Pentium 4 6x0/6x1/EE processors (90 nm) and Pentium D/EE processors (90/65 nm).
ICH (6/2006): ICH8.
(1) Pentium EE 840 supports only 800 MT/s
(2) Pentium D 9xx support only 800 MT/s
(3) Pentium EE 955/965 supports only 1066 MT/s
34
1.3 Representation forms of platforms (7)
Support of Core 2 Quad processors
Core 2-aimed (65 nm) Bridge Creek DT platform (6/2006)
DT cores
Core 2 Duo (2C): E6xxx/E4xxx and Core 2 Extreme (2C): X6800 (7/2006); E6xxx/X6800: Conroe; E4xxx: Allendale: 65 nm; Conroe: 291 million transistors/143 mm2; Allendale: 167 million transistors/111 mm2; L2: Conroe 4 MB, Allendale 2 MB; FSB: X6800/E6xxx 1066 MT/s, E4xxx 800 MT/s; LGA775.
Core 2 Quad (2x2C): Q6xxx (11/2006), Kentsfield: 65 nm; 2x291 million transistors/2x143 mm2; 2x4 MB L2; 1066 MT/s; LGA775.
MCH (6/2006): 965 Series (Broadwater); FSB 1066/800/533 MT/s; 2 DDR2 channels; DDR2-800/667/533; 4 ranks/channel; 8 GB max. Supports also Core 2 Quad processors (65 nm).
ICH (6/2006): ICH8.
35
1.3 Representation forms of platforms (8)
c) Block diagram of a platform
Example: The Core 2 aimed home user DT platform (Bridge Creek) (without an integrated display controller) [3]
Figure: Block diagram of the platform (FSB at 1066 MT/s, display card, two memory channels with 2 DIMMs/channel, C-link).
36
1.4. Compatibility of platform components
37
1.4 Compatibility of platform components (1)
One of the goals of platform based designs is to use stabilized interfaces (at least for a while) in order to minimize or eliminate design rework while moving from one processor generation to the next [2]. Consequently, in platform based designs the platform components, such as processors or chipsets of a given line, are typically compatible with their previous or subsequent generations, as long as the same interfaces are used and neither interface parameters (such as the FSB speed) nor other implementation requirements (on the side of either the component to be substituted or the substituting component) restrict this.
38
1.4 Compatibility of platform components (2)
Limits of compatibility
In the discussed DT platform the target processor is the Core 2, which is connected to the MCH by an FSB running at 1066/800/533 MT/s. The target processor of the platform can, however, be substituted either by processors of three previous generations or by processors of the subsequent generation (Core 2 Quad), since all these processors have FSBs of 533/800/1066 MT/s, as shown before.
Platform: Core 2 Duo/Core 2 Extreme (2C) processor; FSB at 1066/800/533 MT/s; 965 Series MCH with two memory channels (DDR2-800/667/533, two DIMMs per channel); ME; DMI and C-link to the ICH8.
Nevertheless, the highest performance level Core 2 Quad, termed the Core 2 Extreme Quad, already provided an increased FSB speed of 1333 MT/s and was therefore no longer supported by the Core 2 aimed platform considered.
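The FSB-speed criterion above can be illustrated with a minimal Python sketch. The helper name `fsb_compatible` and the constant `PLATFORM_FSB_MT_S` are hypothetical; only the FSB speeds are taken from the slide. The sketch deliberately ignores the other implementation requirements mentioned earlier, so it models one necessary condition only.

```python
# Hypothetical illustration: FSB speeds (MT/s) supported by the 965 Series MCH,
# as listed on the slide.
PLATFORM_FSB_MT_S = {533, 800, 1066}

def fsb_compatible(processor_fsb_mt_s, platform_fsb=PLATFORM_FSB_MT_S):
    """Return True if the processor's FSB speed is one the platform supports.

    This checks the FSB speed only; other implementation requirements
    are not modelled.
    """
    return processor_fsb_mt_s in platform_fsb

print(fsb_compatible(1066))  # e.g. Core 2 Duo E6xxx
print(fsb_compatible(1333))  # e.g. Core 2 Extreme Quad
```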
39
2. Basic components of platforms
2.1. Processors 2.2. The memory subsystem 2.3. Buses interconnecting platform components
40
1.1 The notion of platform (6)
Basic components of platforms - Overview
As already discussed in Section 1, the notion platform is interpreted as a standardized backbone of a system architecture developed for a given application area. It is typically built up of the processor or processors, the chipset, in some cases (such as in mobile or business oriented DT platforms) also the networking component [7], the buses interconnecting the above components of the platform, as well as the memory subsystem (MSS) that is attached by a specific memory interface.
Basic components of a platform (the backbone of the system architecture)
Processor or processors
Chipset
The memory subsystem
(LAN controller)
Buses interconnecting the preceding basic components
Subsequently, we will discuss the following three basic components of platforms:
Processors (Section 2.1),
The memory subsystem (Section 2.2) and
Buses interconnecting platform components (excluding memory buses) (Section 2.3).
41
2.1. Processors
42
2.1 Processors (1) Intel’s Tick-Tock model
Figure 2.1: Overview of Intel’s Tick-Tock model (based on [17])
Each row lists the model, the technology, the date of introduction and the key microarchitectural features; every TICK/TOCK pair spans 2 years.
Pentium 4/Willamette, 180 nm, 11/2000: New microarch.
Pentium 4/Northwood, 130 nm, 01/2002: Adv. microarch., hyperthreading
Pentium 4/Prescott, 90 nm, 02/2004: Adv. microarch., hyperthreading, 64-bit
TICK: Pentium 4/Cedar Mill, 65 nm, 01/2006
TOCK: Core 2, 65 nm, 07/2006: New microarch., 4-wide core, 128-bit SIMD, no hyperthreading
TICK: Penryn family, 45 nm, 11/2007
TOCK: Nehalem, 45 nm, 11/2008: New microarch., hyperthreading, (inclusive) L3, integrated MC, QPI
TICK: Westmere, 32 nm, 01/2010
TOCK: Sandy Bridge, 32 nm, 01/2011: New microarch., hyperthreading, 256-bit AVX, integr. GPU, ring bus
TICK: Ivy Bridge, 22 nm, 04/2012
TOCK: Haswell, 22 nm
43
2.1 Processors (2)
Basic architectures and their related shrinks
Considered from the Pentium 4 Prescott (the third core of the Pentium 4) on
Basic architecture: basic architecture and its shrink
Pentium 4 (Prescott): Pentium 4, Pentium 4 (shrink)
Core 2: Core 2, Penryn
Nehalem: Nehalem, Westmere
Sandy Bridge: Sandy Bridge, Ivy Bridge
Haswell: Haswell
44
2.1 Processors (4)
In 2003 Intel shifted the focus of their processor development from the performance goal to the aspect of performance per watt, as stated in a slide from 4/2006, see below.
Figure 2.2: Intel’s plan to develop their manufacturing technology and processor lines, revealed at a shareholders’ meeting back in 4/2006 [18]
45
2.1 Processors (5)
Table 2.1: Intel’s Core 2 based and subsequent multicore DT processor lines
Columns: Basic Arch. | Techn. | Core/technology | Cores | Intro. | Cache arch. | Interf.
Core2, 65 nm: X Conroe, E6xxx Conroe, E4xxx Allendale, E6xxx Allendale, QX67xx Kentsfield, Q6xxx Kentsfield; 2C, 2x2C, 2x2C; 7/2006, 7/2006, 1/2007, 7/2007, 11/2006; 4 MB L2/2C, 2/4 MB L2/2C, 4 MB L2/2C, 4 MB L2/2C, 4 MB L2/2C; FSB
Penryn, 45 nm: E8xxx Wolfdale, E7xxx Wolfdale-3M, QX9xxx Yorkfield XE, Q9xxx Yorkfield, Q9xxx Yorkfield-6M, Q8xxx Yorkfield-4M; 1/2008, 4/2008, 11/2007, 8/2008; 6 MB L2/2C, 3 MB L2/2C, 2 MB L2/2C
1. G. Nehalem-EP: i Bloomfield; 4C; 11/2008; ¼ MB L2/C, 8 MB L3; QPI
2. G. Nehalem-EP: i7-8xxx/i5-7xx Lynnfield; 9/2009; DMI
Westmere-EP, 32 nm: i7-9xxX Gulftown, i7-9xx Gulftown, i5-6xx/i3-5xx Clarkdale; 6C, 6C, 2C+G; 3/2010, 7/2010, 1/2010; ¼ MB L2/C, 12 MB L3; ¼ MB L2/C, max. 4 MB L2
Sandy Bridge: i7-39/38xx, i7-26/27xx, i5-23/24/25xx Sandy Bridge, i3-21xx; 2/4C+G, 2C+G; 11/2011, 1/2011; ¼ MB L2/C, 15 MB L3; ¼ MB L2/C, 4/8 MB L3; ¼ MB L2/C, 3/6 MB L3; ¼ MB L2/C, 3 MB L3; DMI 2.0, PCIe 2.0
Ivy Bridge, 22 nm: i7-3770, i5-33/34/35xx Ivy Bridge, i3-32xx; 4C+G; 4/2012, 9/2012; ¼ MB L2/C, 6 MB L3; PCIe 3.0 (PCIe 3.0)
46
2.1 Processors (6)
Table 2.2: Overview of Intel’s multicore DP server processors
Basic Arch. | Core/technology | Intro. | DP server processors | Cores and caches
Pentium 4 (Prescott) | Pentium 4 90 nm | 10/2005 | Paxville DP 2.8 | 2x1 C, 2 MB L2/C
Pentium 4 (Prescott) | Pentium 4 65 nm | 5/2006 | 5000 (Dempsey) | 2x1 C, 2 MB L2/C
Core 2 | Core2 65 nm | 6/2006 | 5100 (Woodcrest) | 1x2 C, 4 MB L2/C
Core 2 | Core2 65 nm | 11/2006 | 5300 (Clovertown) | 2x2 C, 4 MB L2/C
Core 2 | Penryn 45 nm | 11/2007 | 5400 (Harpertown) | 2x2 C, 6 MB L2/2C
Nehalem | Nehalem-EP 45 nm | 3/2009 | 5500 (Gainestown) | 1x4 C, ¼ MB L2/C, 8 MB L3
Nehalem | Westmere-EP 32 nm | 3/2010 | 56xx (Gulftown) | 1x6 C, ¼ MB L2/C, 12 MB L3
Nehalem | Nehalem-EX 45 nm | | 6500 (Beckton) | 1x8 C, ¼ MB L2/C, 24 MB L3
Nehalem | Westmere-EX 32 nm | 4/2011 | E7-28xx (Westmere-EX) | 1x10 C, ¼ MB L2/C, 30 MB L3
Sandy Bridge | Sandy Bridge-EN 32 nm | 5/2012 | E5-2xxx | 1x8 C, ¼ MB L2/C, 20 MB L3
Sandy Bridge | Ivy Bridge 22 nm | | |
47
Table 2.2: Overview of Intel’s multicore MP server processors
Basic Arch. | Core/technology | Intro. | MP server processors | Cores and caches
Pentium 4 (Prescott) | Pentium 4 90 nm | 11/2005 | Paxville MP | 2x1 C, 2 MB L2/C
Pentium 4 (Prescott) | Pentium 4 65 nm | 8/2006 | 7100 (Tulsa) | 2x1 C, 1 MB L2/C, 16 MB L3
Core 2 | Core2 65 nm | 9/2007 | 7200 (Tigerton DC) | 1x2 C, 4 MB L2/C
Core 2 | Core2 65 nm | 9/2007 | 7300 (Tigerton QC) | 2x2 C, 4 MB L2/C
Core 2 | Penryn 45 nm | 9/2008 | 7400 (Dunnington) | 1x6 C, 3 MB L2/2C, 16 MB L3
Nehalem | Nehalem-EP | | |
Nehalem | Westmere-EP | | |
Nehalem | Nehalem-EX | 3/2010 | 7500 (Beckton) | 1x8 C, ¼ MB L2/C, 24 MB L3
Nehalem | Westmere-EX | 4/2011 | E7-48xx (Westmere-EX) | 1x10 C, ¼ MB L2/C, 30 MB L3
Sandy Bridge | Sandy Bridge-EP 32 nm | 5/2012 | E5-4xxx | 1x8 C, ¼ MB L2/C, 20 MB L3
Sandy Bridge | Ivy Bridge 22 nm | | |
48
2.2. The memory subsystem
Key parameters of the memory subsystem
Main attributes of the memory technology used
Overview: Main attributes of the memory technology used: memory type, speed grades, DIMM density, use of ECC support, use of registering
49
2.2.1 Key performance parameters of the memory subsystem (1)
This issue will be discussed in Section 4.
50
2.2.2 Main attributes of the memory technology used
Overview: Main attributes of the memory technology used
Memory type
Speed grade
DIMM density
Use of ECC support
Use of registering
51
Memory type (1)
a) Overview: Main DRAM types (with year of introduction)
DRAMs for general use
Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
Synchronous DRAMs: SDRAM (1996), DDR (2000), DDR2 (2004), DDR3 (2007)
Main stream DRAM types (commodity DRAMs): FP, FPM, EDO, SDRAM, DDR, DDR2, DDR3
Challenging DRAM types: DRDRAM (1999), XDR (2006) (1), FB-DIMM (2006)
The figure further distinguishes DRAMs with parallel bus connection from DRAMs with serial bus connection (FB-DIMM).
(1) Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers
52
Memory type (2) b) Synchronous DRAMs (SDRAM, DDR, DDR2, DDR3)
53
2.2.2.2 Memory type (3) SDRAM to DDR3 DIMMs SDRAM 168-pin DDR 184-pin
All these DIMM modules are 8-byte wide 53
54
Memory type (4) Principle of operation of synchronous DRAMs (SDRAM to DDR3 memory chips)
Figure: A DRAM device, consisting of the memory cell array and the I/O buffers, connected to the memory controller (MC).
Memory cell array: sources/sinks data to/from the I/O buffers at a rate of fCell and at a width of FW.
I/O buffers: receive/transmit data from/to the MC at a rate of fCK (SDRAM) or 2 x fCK (DDR to DDR3).
Data transmission occurs on the rising edge of the clock (CK) for SDRAMs or on both edges of the strobe (DQS) for DDR/DDR2/DDR3.
55
2.2.2.2 Memory type (5) Sourcing/sinking data by the memory cell array
The memory cell array sources/sinks data to/from the I/O buffers
at a rate of fCell, where fCell is the clock frequency of the memory cell array,
at a data width of FW, where FW is the fetch width of the memory cell array.
The core clock frequency of the memory cell array (fCell)
fCell is 100 to 200 MHz. It stands in a given ratio to the clock frequency of the memory device (fCK), as follows:
Memory type | fCK
SDRAM | fCell
DDR | fCell
DDR2 | 2 x fCell
DDR3 | 4 x fCell
Raising fCell from 100 MHz to 200 MHz characterizes the evolution of each memory technology.
When a new memory technology (e.g. DDR2 or DDR3) appears, fCell is initially 100 MHz; this sets the initial speed grade accordingly (e.g. to 400 MT/s for DDR2 or to 800 MT/s for DDR3). As the memory technology evolves, fCell is raised from 100 MHz to 133, 167 and finally 200 MHz. Along with fCell, fCK and the final speed grade are raised as well.
56
2.2.2.2 Memory type (6) The fetch width (FW) of the memory cell array
It specifies how many times more bits the cell array fetches per column cycle than the data width of the device (xn). E.g. a 4-bit wide DRAM device (x4 DRAM chip) with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 x 4, that is 16 bits, from the memory cell array in every fCell cycle.
The fetch width (FW) of the memory cell array of synchronous DRAMs is as follows:
DRAM type | FW
SDRAM | 1
DDR | 2
DDR2 | 4
DDR3 | 8
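The fetch width arithmetic above can be reproduced with a short Python sketch. The names `FETCH_WIDTH` and `bits_per_cell_cycle` are chosen here for illustration only; the FW values are those of the table.

```python
# Fetch width (FW) per DRAM type, as given in the table above.
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}

def bits_per_cell_cycle(dram_type, device_width):
    """Bits fetched from the memory cell array in one fCell cycle.

    device_width is the data width of the device (the n in xn),
    e.g. 4 for an x4 DRAM chip.
    """
    return FETCH_WIDTH[dram_type] * device_width

# The slide's example: an x4 DDR2 device fetches 4 x 4 bits per fCell cycle.
print(bits_per_cell_cycle("DDR2", 4))
```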
57
Memory type (7) Transferring data between the I/O buffers and the memory controller
Data transmission between the I/O buffers and the memory controller is clocked at a frequency of fCK.
Data transmission occurs
for SDRAMs at the rising edge of the clock signal (CK),
for DDR/DDR2/DDR3 at both edges of the strobe signal (DQS), designated as Double Data Rate transfer.
The final transfer rate (speed grade) is
fCK for SDRAMs,
2 x fCK for DDR/DDR2/DDR3.
Accordingly, typical speed grade ranges cover
100 to 200 MT/s for SDRAM devices,
200 to 400 MT/s for DDR devices,
400 to 800 MT/s for DDR2 devices and
800 to 1600 MT/s for DDR3 devices.
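As a worked check of these ranges, the following Python sketch derives the speed grade from the core clock fCell, using the fCK to fCell ratios stated earlier (1 for SDRAM and DDR, 2 for DDR2, 4 for DDR3) and the number of transfers per clock given above. The function and table names are illustrative assumptions, not part of any standard.

```python
# Ratio fCK / fCell per memory type, as stated in the slides.
CLOCK_RATIO = {"SDRAM": 1, "DDR": 1, "DDR2": 2, "DDR3": 4}
# Transfers per fCK cycle: one edge for SDRAM, both edges for DDR to DDR3.
TRANSFERS_PER_CLOCK = {"SDRAM": 1, "DDR": 2, "DDR2": 2, "DDR3": 2}

def speed_grade_mt_s(dram_type, f_cell_mhz):
    """Return the transfer rate in MT/s for a given core clock in MHz."""
    f_ck_mhz = CLOCK_RATIO[dram_type] * f_cell_mhz
    return TRANSFERS_PER_CLOCK[dram_type] * f_ck_mhz

for dram_type in ("SDRAM", "DDR", "DDR2", "DDR3"):
    # Lowest and highest speed grade for fCell = 100 and 200 MHz.
    print(dram_type, speed_grade_mt_s(dram_type, 100), speed_grade_mt_s(dram_type, 200))
```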
58
Memory type (8) Examples of data transfer in synchronous DRAMs
Each example shows the memory cell array (clocked at fCell) and the I/O buffers (clocked at fCK); data are transferred over the data lines (DQ0 - DQn-1).
Example | DRAM core clock | Clock | Data strobe (DQS) | Cell array to I/O buffers | Data transfer | Transfer rate
SDRAM-100 | 100 MHz | CK: 100 MHz | - | n bits | on the rising edges of CK | 100 MT/s
DDR-200 | 100 MHz | CK/CK#: 100 MHz | 100 MHz | 2xn bits | on both edges of DQS | 200 MT/s
DDR2-400 | 100 MHz | CK/CK#: 200 MHz | 200 MHz | 4xn bits | on both edges of DQS | 400 MT/s
DDR3-800 | 100 MHz | CK/CK#: 400 MHz | 400 MHz | 8xn bits | on both edges of DQS | 800 MT/s
59
2.2.2.2 Memory type (9) The main technique to increase memory speed
Relation between voltage swings and rise/fall times of signals
Q = Cin x V = I x tR, hence tR ~ Cin x V/I
Q: Charge on the input capacitance of the line (Cin)
Cin: Input capacitance of the line
V: Voltage
I: Current strength of the driver
tR: Rise time
Smaller voltage swings result in shorter signal rise/fall times and thus in higher speed grades, but they also leave a lower voltage budget and raise the requirements for signal integrity.
Memory type | Voltage/Voltage swing
SDRAM | 3.3 V
DDR | 2.5 V
DDR2 | 1.8 V
DDR3 | 1.5 V
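The proportionality tR ~ Cin x V/I can be made concrete with a small numeric sketch in Python. The input capacitance (5 pF) and driver current (10 mA) are assumed values chosen only for illustration, and the voltage swing is assumed to equal the supply voltage; real devices will differ.

```python
def rise_time_ns(c_in_pf, v_swing, i_drive_ma):
    """Approximate rise time tR = Cin x V / I.

    With Cin in pF, V in volts and I in mA, the result is in ns
    (1e-12 F x V / 1e-3 A = 1e-9 s).
    """
    return c_in_pf * v_swing / i_drive_ma

# Assumed, illustrative line parameters (not from the slides).
C_IN_PF = 5.0
I_DRIVE_MA = 10.0

for name, volts in (("SDRAM", 3.3), ("DDR", 2.5), ("DDR2", 1.8), ("DDR3", 1.5)):
    print(name, round(rise_time_ns(C_IN_PF, volts, I_DRIVE_MA), 3), "ns")
```

Under these assumptions the rise time shrinks in proportion to the voltage, which is the effect the slide relies on.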
60
Memory type (10)
Figure 2.7: Signaling alternatives of buses used with memories
The figure classifies the memory types FPM, EDO, SDRAM, DDR, DDR2, DDR3, RDRAM, XDR, XDR2 and FBDIMM by the signaling of their command, control and address lines and by the signaling of their data lines. The alternatives in both cases are: single ended (TTL, LVTTL), voltage referenced (RSL, SSTL) and differential (DRSL, LVDS).
61
Memory type (11) Key features of synchronous DRAM devices (SDRAM to DDR3)
Table 2.4: Key features of synchronous DRAM devices
Feature | SDRAM | DDR SDRAM | DDR2 SDRAM | DDR3 SDRAM
JEDEC standard | JESD 21-C Release 4 | JESD 79 | JESD 79-2 | JESD 79-3
Key features | Synchronous, pipelined, burst oriented | Double data rate, 2n prefetch architecture | 4n prefetch architecture | 8n prefetch architecture
Device speed (MT/s) | 66/100/133 | 200/266/333/400 | 400/533/667/800 | 800/1066/1333/1600
Voltage | 3.3 V | 2.5 V | 1.8 V | 1.5 V
No. of pins on the module | 168 | 184 | 240 | 240
Standard, first/last release (as listed): JESD 21-C Release 4 11/1993; JESD 79 6/2000; JESD 79E 5/2005; JESD /2003; JESD 79-2C 5/2006; JESD /2007
Device density (as listed): 64 Mb; 128 Mb - 1 Gb; 256 Mb - 4 Gb; 256 Mb – 4 Gb; 512 Mb – 8 Gb
Organization (as listed): x4/8/16; 4/16 Mb Mb x8/16 Mb x8/16 Mb x8/16 256 Mb – 1 Gb x8/16 256 Mb - 1 Gb x8/16 512 Mb – 16 Gb
Typ. processors (as listed): Pentium (3V), Pentium III, P4 (Willamette), P4 (Northwood), P4 (Prescott), P4 (Prescott), P4 (Presler), Pentium D, Core2 Duo, Core2 Duo to Sandy Bridge
62
Memory type (12) Approximate appearance dates and speed grades of DDR DRAMs as well as the bandwidth provided by a dual channel memory subsystem [12]
Figure: Appearance dates, speed grades and bandwidth (1) of DDR DRAMs.
(1) Bandwidth of a dual channel memory subsystem
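Since the figure itself is not reproduced here, the peak bandwidth of a dual channel memory subsystem can be estimated with a minimal Python sketch. It assumes 8-byte wide DIMMs, as stated earlier for SDRAM to DDR3 modules, and it yields theoretical peak values only; the function name is an illustrative choice.

```python
def peak_bandwidth_gb_s(speed_grade_mt_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB).

    speed_grade_mt_s: transfer rate of the DIMMs in MT/s
    channels: number of memory channels (2 for a dual channel subsystem)
    bytes_per_transfer: data width of a DIMM in bytes (8 for SDRAM to DDR3)
    """
    return speed_grade_mt_s * bytes_per_transfer * channels / 1000

print(peak_bandwidth_gb_s(800))   # dual channel DDR2-800
print(peak_bandwidth_gb_s(1600))  # dual channel DDR3-1600
```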
63
2.2.2.2 Memory type (13) Green and ultra-low power memories
They represent the latest achievements of DRAM memory technology.
Green memories: lower dissipation memories
Ultra-low-power DDR3 memories: use a supply voltage of 1.35 V instead of 1.50 V to reduce dissipation
64
Memory type (14) Green and ultra-low power memories- Examples [13]
65
Memory type (15)
c) FB-DIMMs
Overview of the main DRAM types (with year of introduction)
DRAMs for general use
Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995)
Synchronous DRAMs: SDRAM (1996), DDR (2000), DDR2 (2004), DDR3 (2007)
Main stream DRAM types: FP, FPM, EDO, SDRAM, DDR, DDR2, DDR3
Challenging DRAM types: DRDRAM (1999), XDR (2006) (1), FB-DIMM (2006)
The figure further distinguishes DRAMs with parallel bus connection from DRAMs with serial bus connection (FB-DIMM).
(1) Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers
66
2.2.2.2 Memory type (16) Principle of operation
FB-DIMMs introduce
packet based serial transmission (like in the PCI-E, SATA and SAS buses),
full buffering (registered DIMMs buffer only addresses),
CRC error checking (cyclic redundancy check).
67
Memory type (17) The architecture of FB-DIMM memories [19] 67
68
Memory type (18)
Figure 2.8: Maximum supported FB-DIMM configuration [20] (6 channels/8 DIMMs)
69
2.2.2.2 Memory type (19) Implementation details (1)
Serial (differential) transmission between the North Bridge and the DIMMs (each bit needs a pair of wires)
Number of serial links
14 read lanes (2 wires each)
10 write lanes (2 wires each)
Clocked at 6 x the data rate of the DDR2 devices, e.g. for DDR2-667 DRAMs the clock rate is 6 x 667 MHz, about 4 GHz.
Every 12 cycles (that is, every two memory cycles) constitute a packet.
Read packets (frames, bursts): 168 bits (12 x 14 bits)
144 data bits (equals the number of data bits produced by a 72-bit wide DDR2 module (64 data bits + 8 ECC bits) in two memory cycles)
24 CRC bits
Write packets (frames, bursts): 120 bits (12 x 10 bits)
98 payload bits
22 CRC bits
70
2.2.2.2 Memory type (20) Implementation details (2)
The 98 payload bits of a write packet consist of:
  2 frame type bits,
  24 bits of command,
  72 bits for data and commands, according to the frame type, e.g. 72 bits of data, 36 bits of data + one command, or two commands.
Commands: all commands include a 3-bit FB-DIMM module address to select one of 8 modules.
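The frame arithmetic of the two "Implementation details" slides can be re-derived with a few lines of Python. This is only a sketch: the constant and function names are illustrative assumptions, while the lane counts, the 12-cycle frame length and the field widths are the figures quoted on the slides.

```python
# Sketch: FB-DIMM frame sizes recomputed from the figures quoted on the slides.
# All identifiers are illustrative; only the numbers come from the text.

CYCLES_PER_FRAME = 12    # link clock cycles per packet (two memory cycles)
READ_LANES = 14          # differential read lanes
WRITE_LANES = 10         # differential write lanes
LINK_CLOCK_FACTOR = 6    # link clock = 6 x DDR2 data rate


def frame_bits(lanes, cycles=CYCLES_PER_FRAME):
    """Each lane carries one bit per link clock cycle."""
    return lanes * cycles


def link_clock_mhz(ddr2_data_rate):
    """Link clock for a given DDR2 data rate (e.g. 667 for DDR2-667)."""
    return LINK_CLOCK_FACTOR * ddr2_data_rate


read_frame = frame_bits(READ_LANES)        # 12 x 14 bits
write_frame = frame_bits(WRITE_LANES)      # 12 x 10 bits

# Read frame: two transfers of a 72-bit module (64 data + 8 ECC bits), rest is CRC.
read_data = 2 * (64 + 8)
read_crc = read_frame - read_data

# Write frame: 2 frame type bits + 24 command bits + 72 data/command bits, rest is CRC.
write_payload = 2 + 24 + 72
write_crc = write_frame - write_payload

print("read :", read_frame, "=", read_data, "data +", read_crc, "CRC")
print("write:", write_frame, "=", write_payload, "payload +", write_crc, "CRC")
print("link clock for DDR2-667:", link_clock_mhz(667), "MHz")
```

The CRC widths fall out as the remainder of each frame, which matches the 24 and 22 CRC bits stated on the slides.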
71
Memory type (22) FB-DIMM data buffer (Advanced Memory Buffer, AMB)
Manages the read/write operations of the module
Source: PCStats
FB-DIMM-4300 (DDR2-533 SDRAM): Clock speed: 133 MHz, Data rate: 532 MHz, Throughput: 4300 MB/s
FB-DIMM-5300 (DDR2-667 SDRAM): Clock speed: 167 MHz, Data rate: 667 MHz, Throughput: 5300 MB/s
FB-DIMM-6400 (DDR2-800 SDRAM): Clock speed: 200 MHz, Data rate: 800 MHz, Throughput: 6400 MB/s
Figure 2.9: Different implementations of FB-DIMMs
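The throughput figures in the module names follow from the data rate of the underlying DDR2 devices. The sketch below assumes a 64-bit (8-byte) data path per module with ECC bits not counted, and assumes that the module name is the peak throughput rounded to the nearest 100 MB/s; both assumptions and the function name are illustrative and not stated on the slide.

```python
# Sketch: peak throughput of a DDR2-based module from its data rate.
# Assumption: 64 data bits per transfer (ECC bits excluded).

def peak_throughput_mb_s(data_rate_mt_s, data_width_bits=64):
    """Peak throughput in MB/s = transfers per second x bytes per transfer."""
    return data_rate_mt_s * data_width_bits // 8


for data_rate in (533, 667, 800):
    exact = peak_throughput_mb_s(data_rate)
    # round(x, -2) rounds an integer to the nearest hundred
    print(f"DDR2-{data_rate}: {exact} MB/s, marketed as FB-DIMM-{round(exact, -2)}")
```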
72
Memory type (23) Figure 2.10: Block diagram of the AMB [21] (There are two Command/Address buses (C/A) to limit loads of 9 to 36 DRAMs)
73
2.2.2.2 Memory type (24)
Necessary routing to connect the north bridge to the DIMM socket
a) In case of a DDR2 DIMM (240 pins): a 3-layer PCB is needed
b) In case of an FB-DIMM (69 pins): a 2-layer PCB is needed (but a third layer is used for power lines)
Figure 2.11: PCB routing [19]
74
Memory type (25) Assessing benefits and drawbacks of FB-DIMM memories (as compared to DDR2/3 memories)
Benefits of FB-DIMMs
  more memory channels (up to 6): higher memory size and bandwidth
  more DIMM modules (up to 8) per channel: higher memory size (6 x 8 = 48 DIMMs; assuming 8 GB/DIMM, up to 384 GB)
  the same bandwidth figures as the DDR2 parts they are based on
Drawbacks of FB-DIMMs
  higher latency
  higher dissipation (typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, FB-DIMM with DDR2: about 10 W)
  higher cost
75
2.2.2.2 Memory type (26) Latency [22]
Due to their additional serialization tasks and daisy-chained nature, FB-DIMMs have about 15 % higher overall average latency than DDR2 memories.
Production
The production of FB-DIMMs stopped with DDR2-800 modules; no DDR3-based modules came to the market due to the drawbacks of the technology.
76
2.2.2.3 Speed grades (1)
Overview of the speed grades of DDR DRAMs [12] (table not reproduced)
1 Bandwidth: bandwidth of a dual channel memory subsystem
77
Speed grades (2) Remark
Speed grades of FSBs and DRAMs were defined at the time when the base clock frequency of the FSBs was 133 MHz (around 2000). Subsequent speed grades of FSBs and memories were then chosen as integral multiples of 133 MHz, such as
266 = 2 x 133
400 ~= 3 x 133
533 ~= 4 x 133
667 ~= 5 x 133
800 ~= 6 x 133
1067 ~= 8 x 133
1333 ~= 10 x 133
1600 ~= 12 x 133
etc.
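Which multiple of the base clock each speed grade corresponds to can be checked numerically. The sketch below assumes that the nominal 133 MHz base clock is exactly 400/3 MHz (133.33 MHz); the names are illustrative.

```python
# Sketch: express common speed grades as multiples of the 133 MHz base clock.
# Assumption: the nominal "133 MHz" is exactly 400/3 MHz.

BASE_CLOCK_MHZ = 400 / 3

SPEED_GRADES = (266, 400, 533, 667, 800, 1067, 1333, 1600)


def multiple_of_base(grade):
    """Nearest integral multiple of the base clock for a speed grade."""
    return round(grade / BASE_CLOCK_MHZ)


for grade in SPEED_GRADES:
    print(f"{grade:>5} ~= {multiple_of_base(grade):>2} x 133")
```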
78
Speed grades (3) Rate of increasing the transfer rates in synchronous DRAMs
[Figure 2.12: The evolution of peak transfer rates of parallel connected synchronous DRAMs as manifested in Intel’s chipsets. Axes: transfer rate (MT/s) vs. year (1996 to 2008). Labelled points include SDRAM 66, DDR 266, DDR2 533, DDR2 800 and DDR3 1333. Trend: about 10x per 10 years.]
79
Speed grades (4) Memory speed grades used in Intel’s multicore systems
Kind of attaching memory (in Intel’s MC systems, typically)
  Attaching memory by parallel channels
    Memory is attached to the MCH: up to DDR2-667
    Memory is attached to the processor(s): up to DDR3-1600
  Attaching memory by serial channels
    Using FB-DIMMs: up to DDR2-667
    Using serial channels with S/P converters: up to DDR3-1067
80
2.2.2.4 DIMM density (1)
a) Device density
[Figure 2.13: Evolution of DRAM densities (Mbit) and no. of units shipped/year (based on [23]). Axes: units (10^6) vs. year (1980 to 2015). Device densities shown: 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G. Density growth: ~4×/4Y.]
81
2.2.2.4 DIMM density (2) b) DIMM (module) density
Based on device densities of 1 to 4 or 8 Gb and typical device widths of x4 to x16 (bits), DDR2 or DDR3 modules provide typical densities of up to 8 or 16 GB.
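How device density and device width combine into a module capacity can be sketched as follows. The 64-bit module data width, the rank count parameter and the function name are assumptions made for illustration and are not taken from the slide; ECC devices are ignored.

```python
# Sketch: module capacity from device density and device width.
# Assumptions: 64-bit module data path, ECC devices not counted,
# and the number of ranks as a free parameter.

def module_capacity_gb(device_density_gbit, device_width_bits, ranks=1,
                       module_data_width_bits=64):
    """Capacity in GB of a module built from identical DRAM devices."""
    devices_per_rank = module_data_width_bits // device_width_bits
    total_gbit = devices_per_rank * device_density_gbit * ranks
    return total_gbit / 8


print(module_capacity_gb(2, 8, ranks=2))   # 2 Gb x8 devices, two ranks
print(module_capacity_gb(4, 8, ranks=2))   # 4 Gb x8 devices, two ranks
print(module_capacity_gb(4, 4, ranks=2))   # 4 Gb x4 devices, two ranks
```

Under these assumptions, narrow (x4) devices give the largest modules because more of them are needed to fill the 64-bit data path.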
82
2.2.2.5 Use of ECC support (1)
ECC basics (as used in DIMMs)
Implemented as SEC-DED (Single Error Correction, Double Error Detection)
Single bit error correction
For D data bits, P check bits are added.
Figure: The code word (D data bits followed by P check bits)
What is the minimum number of check bits (P) for single bit error correction?
Requirement: $2^P \ge$ the minimum number of states to be distinguished.
83
Use of ECC support (2)
The minimum number of states to be distinguished:
It is necessary to specify the bit position of a possible single bit error in the code word, which consists of both data and check bits. This requires D + P states,
plus one additional state to specify the “no error” state.
Thus the minimum number of states to be distinguished is D + P + 1.
Accordingly, to implement single bit error correction the minimum number of check bits (P) needs to satisfy the requirement
$2^P \ge D + P + 1$
84
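2.2.2.5 Use of ECC support (3)
Double bit error detection
An additional parity bit is needed to check for an additional error. The minimum number of check bits (CB) needed for SEC-DED is then
$CB = P + 1$, i.e. $P = CB - 1$,
and since $2^P \ge D + P + 1$,
$2^{CB-1} \ge D + CB$
Data bits (D) | Check bits (CB)
1 | 2
2 to 3 | 3
4 to 7 | 4
8 to 15 | 5
16 to 31 | 6
32 to 63 | 7
64 to 127 | 8
128 to 255 | 9
256 to 511 | 10
Table 2.5: The number of check-bits (CB) needed for D data bits
Note: the ranges are approximate. The inequality is met at the lower end of each range from D = 4 upward (e.g. D = 64 needs CB = 8), but not across the whole range (e.g. D = 127 would already need CB = 9).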
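The two inequalities can be evaluated directly to obtain the number of check bits for a given data width. The sketch below is a straightforward search for the smallest P; the function names are illustrative assumptions.

```python
# Sketch: minimum number of check bits from the inequalities on the slides.

def sec_check_bits(data_bits):
    """Smallest P with 2**P >= D + P + 1 (single error correction)."""
    p = 1
    while 2 ** p < data_bits + p + 1:
        p += 1
    return p


def sec_ded_check_bits(data_bits):
    """SEC-DED adds one overall parity bit: CB = P + 1."""
    return sec_check_bits(data_bits) + 1


for d in (8, 16, 32, 64, 128, 256):
    cb = sec_ded_check_bits(d)
    print(f"D = {d:>3}: CB = {cb:>2}, code word = {d + cb} bits")
```

For D = 64 this gives CB = 8, i.e. the 72-bit width of ECC DIMMs (64 data bits + 8 ECC bits) mentioned earlier.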
85
Use of ECC support (4) Supported memory features of DT and DP/MP platforms
DT memories typically do not support ECC or registered (buffered) DIMMs.
Servers typically use registered DIMMs with ECC protection.
86
Use of ECC support (5) Typical implementation of ECC protected registered DIMMs (used typically in servers)
Main components
  Two register chips, for buffering the address and command lines
  A PLL (Phase Locked Loop) unit for deskewing clock distribution
Figure 2.14: Typical layout of a registered memory module with ECC [14] (labelled parts: ECC, Register, PLL)
87
2.2.2.6 Use of registering (1)
Problems arising while implementing higher memory capacities
Higher memory capacities need more modules, which means higher loading of the lines and hence signal integrity problems.
Remedies: buffering address and command lines, and phase locked clocking of the modules.
88
2.2.2.6 Use of registering (2) Registering Principle
Buffering address and control lines to reduce signal loading in a memory channel, in order to increase the number of supported DIMM slots (max. memory capacity). This is needed first of all in servers.
89
Use of registering (3) Example: Block diagram of a registered DDR DIMM
[Figure 2.17: Example. Block diagram of a registered DDR DIMM [16]. Blocks: SDRAM devices; PI74SSTV16857 register (address/control from motherboard); PI6CV857 PLL (input clock from motherboard); data from/to motherboard.]
90
2.2.2.6 Use of registering (4) Implementation of registering
By means of a register chip that buffers address and control lines
REGE: Register enable signal
Figure 2.15: Registered signals in case of an SDRAM memory module [15]
Note: Data (DQ) and data strobe (DQS) signals are not registered, as only address and control signals are common to all memory chips.
91
2.2.2.6 Use of registering (5) Number of register chips required
Synchronous memory modules (SDRAM to DDR3 DIMMs) have about 20 to 30 address and control lines.
Register chips usually buffer 14 lines.
Typically, two register chips are needed per memory module [16].
92
2.2.2.6 Use of registering (6) Typical layout of registered DIMMs
Two register chips, for buffering the address and command lines
A PLL (Phase Locked Loop) unit for deskewing clock distribution
Figure 2.16: Typical layout of a registered memory module with ECC [14] (labelled parts: ECC, Register, PLL)
94
2.2.2.6 Use of registering (8) Registered DIMM module with ECC
Figure 2.18: Registered DIMM module with ECC [14]
95
Use of registering (9) Typical use of unregistered DIMMs (UDIMMs) in desktops/laptops (Memory capacities: up to a few GB) Typical use of registered DIMM (RDIMM) in servers (Memory capacities: a few tens of GB to a few hundreds of GB)
96
2.3. Buses interconnecting platform components
97
2.3 Buses interconnecting platform components (1)
Use of buses in Intel’s DT/DP and MP platforms
  Buses interconnecting processors (in NUMA topologies)
  Buses interconnecting processors to chipsets
  Buses interconnecting MCHs to ICHs (in 2-part chipsets)
[Figure: Nehalem-EX aimed Boxboro-EX scalable DP server platform (for up to 10 cores). Two Xeon 6500 (Nehalem-EX, Becton, 8C) or Xeon E7-2800 (Westmere-EX, 10C) processors are interconnected by QPI; each processor attaches DDR3-1067 memory through SMBs over SMI links; the processors connect via QPI to the 7500 IOH (with ME), which connects via ESI to the ICH10.]
SMI: Serial link between the processor and the SMB
SMB: Scalable Memory Buffer with parallel/serial conversion
Remark
Buses connecting the memory subsystem with the main body of the platforms are memory specific interfaces and will be discussed in Section 4.
98
2.3 Buses interconnecting platform components (2)
Implementation of buses used in Intel’s DT/DP and MP platforms
Parallel bus
  64-bit wide: FSB (Front Side Bus). Used to interconnect processors to chipsets in previous platforms.
  8-bit wide: HI1.5. Used to interconnect MCHs to ICHs in previous platforms.
Serial bus (point-to-point interconnection)
  16-bit wide: QPI (Quick Path Interconnect), QPI1.1 (Quick Path Interconnect v.1.1). Used to interconnect processors to processors and processors to chipsets.
  4-bit wide (4 PCIe lanes): DMI (Direct Media Interface), ESI (Enterprise System Interface), DMI2 (Direct Media Interface 2.G.). Used to interconnect processors to chipsets or MCHs to ICHs.
99
2.3 Buses interconnecting platform components (3)
Buses used in Intel’s DT/DP/MP platforms
Parallel buses
  Interconnecting processors to chipsets: FSB (64-bit: 1993)
  Interconnecting MCHs to ICHs (in 2-part chipsets): HI 1.5 (1999)
Serial buses
  Interconnecting processors (in NUMA topologies): QPI (2008)
  Interconnecting processors to chipsets, low-cost systems: DMI/ESI (2008) 2
  Interconnecting processors to chipsets, high-performance systems: QPI (2008)
  Interconnecting MCHs to ICHs (in 2-part chipsets): DMI/ESI (2004) 1
  Successors: DMI2 (2011), QPI1.1 (2012?)
Bus parameters
  FSB: 64-bit wide, ~150 lines, [value missing] GB/s total in both directions
  HI 1.5: 8-bit wide, 16 lines, 266 MB/s total in both directions
  QPI: 20 lanes, 84 lines, 9.6/11.72/12.8 GB/s in each direction
  DMI/ESI: 4 PCIe lanes, 18 lines, 1 GB/s/direction
  DMI2: 4 PCIe lanes, 18 lines, 2 GB/s/direction
  QPI1.1: specification n.a.
100
2.3 Buses interconnecting platform components (4)
Remarks
1 DMI: Introduced as an interface between the MCH and the ICH, first along with the ICH6, supporting Pentium 4 Prescott processors, in 2004.
2 DMI: Introduced as an interface between the processors and the chipset, first between Nehalem-EP and the 34xx PCH, in 2008, after the memory controllers were moved onto the processor die.
101
2.3 Buses interconnecting platform components (5)
Signaling used in buses
Signal type | Typ. voltage swing | Signaling systems used
Single ended | 3.3-5 V | TTL (5 V): FPM/EDO; LVTTL (3.3 V): FPM/EDO, SDRAM, HI1.5
Voltage referenced (VREF) | [value missing] mV | SSTL: SSTL2 (DDR), SSTL1.8 (DDR2), SSTL1.5 (DDR3); RSL (RDRAM); FSB
Differential (S+, S-, VCM) | [value missing] mV | LVDS: PCIe, QPI, DMI, ESI, FB-DIMMs; DRSL: XDR (data)
Trend: smaller voltage swings
LVDS: Low Voltage Differential Signaling
LVTTL: Low Voltage TTL
(D)RSL: (Differential) Rambus Signaling Level
SSTL: Stub Series Terminated Logic
VCM: Common Mode Voltage
VREF: Reference Voltage
Figure 2.4: Signal types used in MMs for control, address and data signals
102
2.3 Buses interconnecting platform components (6)
Main features of parallel buses used in Intel’s MC platforms
Feature | FSB | HI 1.5
Typical use | Connecting the processors and the chipset | Connecting MCH and ICH
Introduced | With the Pentium (1993) | With the Pentium III (1999)
Width | 64 bit | 8 bit
Clock | [value missing] MHz | 66 MHz
DDR/QDR | QDR since Pentium 4 (2000) | QDR
Transfer rate | [value missing] MT/s | 266 MT/s
Bandwidth | [value missing] GB/s in both directions altogether | 266 MB/s
Signaling | Voltage referenced data signals | Single-ended data signals
No. of lines | ~ 150 lines | ~ 16 lines
FSB/HI 1.5: Bus type interconnects
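The bandwidth of a parallel bus follows from its width and transfer rate. The sketch below reproduces the HI 1.5 figure and, as a second example, uses the 1066 MT/s FSB speed quoted for the Core 2 Duo platform earlier in the deck. Taking the nominal 66 MHz clock as 66.67 MHz is an assumption, and the function names are illustrative.

```python
# Sketch: peak bandwidth of a parallel bus.
# Assumption: the nominal 66 MHz HI 1.5 clock is taken as 66.67 MHz.

def transfer_rate_mt_s(clock_mhz, transfers_per_clock):
    """Transfers per second (in millions): 2 for DDR, 4 for QDR."""
    return clock_mhz * transfers_per_clock


def parallel_bus_bandwidth_mb_s(width_bits, rate_mt_s):
    """Peak bandwidth in MB/s = transfer rate x bus width in bytes."""
    return rate_mt_s * width_bits / 8


hi15_rate = transfer_rate_mt_s(66.67, 4)                 # QDR
hi15_bw = parallel_bus_bandwidth_mb_s(8, hi15_rate)      # 8-bit wide bus
fsb_bw = parallel_bus_bandwidth_mb_s(64, 1066)           # 64-bit FSB at 1066 MT/s

print(f"HI 1.5: {hi15_rate:.0f} MT/s, {hi15_bw:.0f} MB/s")
print(f"FSB-1066: {fsb_bw / 1000:.1f} GB/s")
```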
103
2.3 Buses interconnecting platform components (7)
Main features of serial buses used in Intel’s MC platforms
Feature | DMI/ESI | DMI2 | QPI | QPI 1.1
Typical use | To interconnect MCHs and ICHs or processors to chipsets (DMI/ESI and DMI2) | To interconnect processors in NUMA topologies or processors to chipsets (QPI and QPI 1.1)
Introduced | In connection with 2. gen. Nehalem in 2008 | With Sandy Bridge in 2011 | In connection with Nehalem-EP in 2008 | Sandy Bridge in 2012 (?)
Width | 4 PCIe lanes | 4 PCIe 2.0 lanes | 20 lanes | No specification available yet
Clock | 2.5 GHz | 5 GHz | 2.4/2.93/3.2 GHz, DDR | –
Encoding | 10bit/8bit | 10bit/8bit | no |
Bandwidth/direction | 1 GB/s | 2 GB/s | 9.6/11.72/12.8 GB/s |
Signaling | LVDS (all)
No. of lines | 18 lines | 18 lines | 84 lines |
DMI/QPI: Point-to-point interconnection
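The per-direction bandwidth figures of the serial buses can be recomputed from the lane count, the clock and the encoding. The sketch assumes that DMI-type links transfer one bit per lane per clock with 8 payload bits in every 10 transmitted bits, and that only 16 of the 20 QPI lanes carry data, with two transfers per clock (DDR). The function names are illustrative.

```python
# Sketch: per-direction bandwidth of the serial buses.
# Assumptions: DMI/DMI2 use 8b/10b line coding, one bit per lane per clock;
# QPI uses 16 data lanes (of 20), DDR signaling and no line coding overhead.

def dmi_bandwidth_gb_s(lanes, clock_ghz, payload_bits=8, coded_bits=10):
    """Bandwidth in GB/s of a PCIe-style link in one direction."""
    raw_gbit_s = lanes * clock_ghz
    return raw_gbit_s * payload_bits / coded_bits / 8


def qpi_bandwidth_gb_s(clock_ghz, data_lanes=16):
    """Bandwidth in GB/s of a QPI link in one direction."""
    transfers_gt_s = clock_ghz * 2          # DDR: two transfers per clock
    return transfers_gt_s * data_lanes / 8


print("DMI/ESI:", dmi_bandwidth_gb_s(4, 2.5), "GB/s")
print("DMI2   :", dmi_bandwidth_gb_s(4, 5.0), "GB/s")
for clock in (2.4, 2.93, 3.2):
    print(f"QPI at {clock} GHz: {qpi_bandwidth_gb_s(clock):.2f} GB/s")
```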
104
2.3 Buses interconnecting platform components (8)
Comparing main features of Intel’s FSB and QPI [9]
GTL+: A kind of voltage referenced signaling
105
2.3 Buses interconnecting platform components (9)
Principle of LVDS signal transmission used in serial buses Figure 2.5: LVDS Single Link Interface Circuit [10]
106
2.3 Buses interconnecting platform components (10)
PCIe packet format (data frames)
PCI Express Data Frame [10]
The related fields are:
Field | Interpretation
Frame | 1-byte Start-of-Frame/End-of-Frame
Seq# | 2-byte Sequence Number
Header | 16- or 20-byte Header
Data | [size missing]-byte Data field
CRC | 4-byte ECRC (End-to-End CRC) + 4-byte LCRC (Link CRC)
(CRC: Cyclic Redundancy Check)
107
2.3 Buses interconnecting platform components (11)
Principle of the QuickPath Interconnect bus (QPI bus)
Each unidirectional link (TX, RX) carries 16 data, 2 protocol and 2 CRC signals
Figure 2.6: Signals of the QuickPath Interconnect bus (QPI-bus) [11]
108
5. References
109
5. References (1)
[1]: Wikipedia: Centrino,
[2]: Industry Uniting Around Intel Server Architecture; Platform Initiatives Complement Strong Intel IA-32 and IA-64 Targeted Processor Roadmap for 1999, Business Wire, Febr , +Architecture%3B+Platform...-a
[3]: Intel Core 2 Duo Processor,
[4]: Keutzer K., Malik S., Newton R., Rabaey J., Sangiovanni-Vincentelli A., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, Dec. 2000, pp
[5]: Krazit T., Intel Sheds Light on 2005 Desktop Strategy, IDG News Service, Dec ,
[6]: Perich D., Intel Volume platforms Technology Leadership, Presentation at HP World 2004,
[7]: Powerful New Intel Server Platforms Feature Array Of Enterprise-Class Innovations. Intel’s Press release, Aug. 2, 2004 ,
[8]: Smith S., Multi-Core Briefing, IDF Spring 2005, San Francisco, Press presentation, March ,
[9]: An Introduction to the Intel QuickPath Interconnect, Jan. 2009, content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf
[10]: Davis L. PCI Express Bus,
110
5. References (2)
[11]: Ng P. K., “High End Desktop Platform Design Overview for the Next Generation Intel Microarchitecture (Nehalem) Processor,” IDF Taipei, TDPS001, 2008, Taipei_TDPS001_100.pdf
[12]: Computing DRAM, Samsung.com, /products/dram/Products_ComputingDRAM.html
[13]: Samsung’s Green DDR3 – Solution 3, 20nm class 1.35V, Sept. 2011, Documents/downloads/green_ddr3_2011.pdf
[14]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page , Jan. 2002,
[15]: Datasheet, SD9C16_32x72.pdf
[16]: Solanki V., “Design Guide Lines for Registered DDR DIMM Module,” Application Note AN37, Pericom, Nov. 2001,
[17]: Fisher S., “Technical Overview of the 45 nm Next Generation Intel Core Microarchitecture (Penryn),” IDF 2007, ITPS001,
[18]: Razin A., Core, Nehalem, Gesher. Intel: New Architecture Every Two Years, Xbit Laboratories, 04/28/2006,
[19]: Haas, J. & Vogt P., “Fully buffered DIMM Technology Moves Enterprise Platforms to the Next Level,” Technology Intel Magazine, March 2005, pp. 1-7
111
5. References (3)
[20]: “Introducing FB-DIMM Memory: Birth of Serial RAM?,” PCStats, Dec. 23, 2005,
[21]: McTague M. & David H., “Fully Buffered DIMM (FB-DIMM) Design Considerations,” Febr. 18, 2004, Intel Developer Forum,
[22]: Ganesh B., Jaleel A., Wang D., Jacob B., Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, 2007,
[23]: DRAM Pricing – A White Paper, Tachyon Semiconductors,