Dezső Sima, September 2008 (Ver. 1.0) © Sima Dezső, 2008. 1. Macroarchitecture and performance parameters of MMs.


1 Dezső Sima, September 2008 (Ver. 1.0) © Sima Dezső, 2008. 1. Macroarchitecture and performance parameters of MMs

2 Overview 1. Introduction 2. Macroarchitecture of main memories 3. Key performance parameters of main memories 4. References

3 1. Introduction (1) Scope: General purpose main memories, i.e. main memories used in desktops, servers and laptops

4 1. Introduction (2) Figure: Main memories on motherboards: server [77] and desktop [32]

5 1. Introduction (3) Figure: Different kinds of memory modules

6 1. Introduction (4) Layout of main memories: macroarchitecture of the main memory; layout of the memory modules. Figure: Main dimensions of the layout of main memories

7 2. Macroarchitecture of main memories

8 2.1 Introduction 2.2 Attachment policy 2.3 Point of attachment 2.4 Number of memory controllers 2.5 Number of memory channels 2.6 Attributes of memory channels

9 2.1. Introduction (1) Macroarchitecture of main memories. Example 1. Figure: Single channel main memory attached via the FSB and the north bridge (blocks shown: processor with core, L2, L2 contr. and FSB c.; FSB; north bridge; mem. channel; mem. modules)

10 2.1. Introduction (2) Example 2. Figure: Dual channel main memory attached via the FSB and the north bridge (blocks shown: processor with two cores, L2, L2 contr. and FSB c.; FSB; north bridge; mem. channels; mem. modules)

11 2.1. Introduction (3) Example 3. Figure: Single channel main memory attached via a dedicated memory controller (blocks shown: processor with core, L2, IN (Xbar), B. c. with IO-bus and M. c.; mem. channel; mem. modules)

12 2.1. Introduction (4) Example 4. Figure: Dual channel main memory attached via a dedicated memory controller (blocks shown: processor with core, L2, Syst. Req. Queue, IN (Xbar), B. c. with IO-bus and M. c.; mem. channels; mem. modules)

13 2.1. Introduction (5) Macroarchitecture of main memories: attachment policy; point of attachment; no. of mem. contr.s (in case of direct attachment); no. of mem. channels; attributes of mem. channels. Figure: Main dimensions of the macroarchitecture of main memories

14 2.2. Attachment policy (1)
Attachment policy:
Indirect attachment: attachment via the FSB and north bridge (mem. control hub). Longer access times (~20-30%), independence of memory technology and speed. E.g. PA-8800 (2004), PA-8900 (2005), Montecito (2006), Core Duo line (2006).
Direct attachment: attachment via mem. controller(s). Shorter access times (~20-30%), dependency on memory technology and speed. E.g. POWER4 (2001), Opteron line (2003), UltraSPARC IV (2004), UltraSPARC IV+ (2005), POWER5 (2005), UltraSPARC T1 (2005), Athlon 64 X2 line (2005), Cell BE (2006), POWER6 (2007), Barcelona (2007).
Figure: Attachment policy

15 2.2. Attachment policy (2) Figure: Indirect attachment of the main memory to the syst. architecture, e.g. Core Duo (2006), Core 2 Duo (2006) (blocks shown: cores, L2, L2 contr., FSB c., FSB, north bridge, memory). Figure: Direct attachment of the main memory to the syst. architecture, e.g. Athlon 64 X2 (2005) (blocks shown: cores, L2, System Request Queue, IN (Xbar), B. c. with HT-bus, M. c., memory)

16 2.3. Point of attachment (1)
The point of attachment:
a) The highest cache level (via an IN). 2-level caches: the IN connecting the L2 cache. 3-level caches: the IN connecting the L3 cache. The M. c. is usually connected in this way if the highest cache level is inclusive.
b) Between the two highest cache levels (via the IN connecting these levels). 2-level caches: the IN connecting the L1 and L2 caches. 3-level caches: the IN connecting the L2 and L3 caches. The M. c. is usually connected in this way if the highest cache level is exclusive.
Figure: Possible points of attachment of main memory to the system architecture

17 2.3. Point of attachment (2) Interrelationship between the inclusion policy of L3 caches and the point of attachment.
Inclusive L3 (memory attached via the L3): data missing in L2/L3 (high traffic); replaced, modified data (low traffic). Examples: Montecito (2006), POWER4 (2001).
Exclusive L3 (M. c. attached to the IN between L2 and L3): lines missing in L2 are reloaded and deleted from L3; replaced lines. Examples: UltraSPARC IV+ (2004), POWER5 (2004).

18 2.3. Point of attachment (3) Figure: Examples for attaching memory via the highest cache level. In case of a two-level cache hierarchy: Core 2 Duo (2006) (cores, L2, L2 contr., FSB c., FSB) and Athlon 64 X2 (2005) (cores, L2, System Request Queue, IN (Xbar), B. c. with HT-bus, M. c., memory). In case of a three-level cache hierarchy: Montecito (2006) (cores with L2 I and L2 D, L3, FSB c., FSB)

19 2.3. Point of attachment (4) Figure: Examples for attaching memory via the interconnection network connecting the two highest cache levels. In case of a two-level cache hierarchy: UltraSPARC T1 (2005) (Core 0 to Core 7, Xbar, L2, M. c., memory, B. c. with JBus). In case of a three-level cache hierarchy (exclusive L3): UltraSPARC IV+ (2005) (cores, L2, interconn. network, L3 tags/contr., L3 data, M. c., memory, B. c. with Fire Plane bus)

20 2.4. Number of memory controllers (1)
Number of memory controllers (in case of direct attachment):
Single memory controller. Typ. use: usual implementations. E.g. POWER5 (2004), K8-based processors (2006).
Dual memory controllers. Typ. use: a few recent designs. E.g. POWER6 (2007), Barcelona (2007).
Quad memory controllers. Typ. use: exceptional designs. E.g. UltraSPARC T1 (2005), UltraSPARC T2 (2007).
Figure: Number of memory controllers (in case of direct attachment)

21 Figure: Block diagrams of the POWER5 and POWER6 processors [57] 2.4. Number of memory controllers (2)

22 Figure: Block diagrams of AMD’s K8 and Barcelona processors [58] 2.4. Number of memory controllers (3)

23 Figure: Block diagram of the UltraSPARC T2 (Niagara-2) [59] 2.4. Number of memory controllers (4)

24 2.5. Number of memory channels (1)
Number of memory channels (per north bridge/memory controller):
Single memory channel. E.g. Intel’s 845/848 chipset families for P4 desktops and earlier desktop chipsets. Typ. use: early desktops.
Dual memory channels. E.g. Intel’s 865 and higher chipset families for P4 desktops, Intel’s P4 based DP server chipsets, Cell BE. Typ. use: recent desktops, single core DP/MP servers.
Quad memory channels. E.g. Intel’s 5000 (Bensley) and 7000 (Caneland) platforms for Core Duo DC and MC processors. Typ. use: recent DC and QC DP/MP servers with FB DIMM memory.
Figure: Number of memory channels supported per north bridge/memory controller

25 Figure: Block diagram of an early P4 desktop having a single memory channel (Intel 845 chipset) [49] 2.5. Number of memory channels (2) Example 1

26 Figure: Block diagram of a more advanced P4 desktop including dual memory channels (Intel’s 975 chipset) [50] 2.5. Number of memory channels (3) Example 2

27 Figure: Block diagram of an early P4-based DP server including dual memory channels (Supermicro’s E7520 chipset based X6DH8-G2/X6DHE-G2 motherboard) [51] Example 3 2.5. Number of memory channels (4)

28 2.5. Number of memory channels (5) Memory Interface Controller (MIC) of the Cell BE: dual XDR(TM) memory channels; dual 36 bits wide XDR channels; ECC support (32 + 4 bits); interleaved addressing in the channels; the MIC can be configured to support only a single channel. Memory bandwidth at 3.2 Gb/s transfer rate: 3.2 Gb/s x 2 x 4 B = 25.6 GB/s. Figure: Basic blocks of the Cell BE processor [60]

29 2.5. Number of memory channels (6)
Remark: In dual channel configurations (or in general, in case of multiple memory channels) a scheme is needed to define the allocation of memory addresses to the individual channels.
Allocation of addresses to the individual channels:
Interleaved mode: Addresses are allocated alternately to the channels at 64 B boundaries, assuming 64 B long cache lines. Two consecutive cache lines can be retrieved simultaneously. Both memory channels must be populated with modules having the same size (e.g. 1 GB). Provides maximum performance in real applications.
Asymmetric mode: Addresses start in the first channel and are allocated to this channel up to its highest rank. Then addresses continue in the second channel. There is no need to populate both channels, or to populate them with the same size. In real applications, performance is limited to single channel performance.
Figure: Address allocation alternatives to the individual channels
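The two allocation modes can be illustrated with a small address-to-channel mapping. This is only a sketch under stated assumptions: two channels, 64 B cache lines as in the remark, and a hypothetical helper `channel_of` with a hypothetical parameter `ch0_capacity_bytes`; real memory controllers may select the channel from different address bits.

```python
CACHE_LINE_BYTES = 64  # cache line size assumed in the remark


def channel_of(addr, mode, ch0_capacity_bytes=0):
    """Return the channel (0 or 1) that serves byte address `addr`.

    "interleaved": consecutive 64 B lines alternate between the channels.
    "asymmetric":  channel 0 is filled first, then addresses continue in channel 1.
    `ch0_capacity_bytes` (hypothetical) is only used in asymmetric mode.
    """
    if mode == "interleaved":
        return (addr // CACHE_LINE_BYTES) % 2
    if mode == "asymmetric":
        return 0 if addr < ch0_capacity_bytes else 1
    raise ValueError("unknown mode: " + mode)


GIB = 1 << 30  # 1 GB of memory assumed in channel 0 for the asymmetric case

# Consecutive cache lines alternate between channels when interleaved ...
interleaved = [channel_of(a, "interleaved") for a in (0, 64, 128, 192)]
# ... but stay in the same channel in asymmetric mode until it is full.
asymmetric = [channel_of(a, "asymmetric", GIB) for a in (0, 64, GIB, GIB + 64)]
print(interleaved, asymmetric)
```

In the interleaved case two neighbouring cache lines map to different channels, which is why they can be fetched simultaneously; in the asymmetric case they map to the same channel, which limits performance to that of a single channel.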

30 2.5. Number of memory channels (7) Example 4. Figure: Block diagram of Intel’s 5000 (Bensley) DP platform for DC/QC Core 2 Duo processors including quad memory channels [52]. Blocks shown: Xeon 5000 (Dempsey, Netburst), DC; 5100 (Woodcrest, Core 2), DC; 5300 (Clovertown, Core 2), QC; 5000 (Blackford) north bridge; FB-DIMM up to 64 GB. In workstations the snoop filter eliminates snoop traffic to the graphics port.

31 2.5. Number of memory channels (8) Example 5. Figure: Block diagram of Intel’s 7300 (Caneland) MP platform for DC/QC Core 2 Duo processors including quad memory channels [53]. Blocks shown: Xeon 7200 (Tigerton DC, Core 2), DC; 7300 (Tigerton QC, Core 2), QC; FB-DIMM up to 512 GB.

32 2.5. Number of memory channels (9) Remark: The FB-DIMM technology supports even 6 memory channels with 8 DIMMs each [54]; nevertheless, actual implementations typically support only four DIMMs. Figure: Maximum supported FB-DIMM configuration [54] (6 channels/8 DIMMs)

33 2.6. Attributes of memory channels (1) Attributes of memory channels: supported type of mem. modules; supported no. of mem. modules; supported no. of ranks per mem. module; supported attributes of DRAM devices. Figure: Attributes of memory channels

34 2.6. Attributes of memory channels (2) Supported type of memory modules:
Memory modules of the same DRAM type: usual implementation.
Memory modules of different DRAM types (DRAM type A or DRAM type B, e.g. DDR or DDR2): in order to provide a choice and an evolution path in times of memory technology transitions (e.g. while DDR2 technology replaces DDR technology).
Figure: Type of memory modules supported on the memory channel(s)

35 2.6. Attributes of memory channels (3) Example: Intel’s 915P/G chipsets support dual memory channels with either DDR or DDR2 technology. Per channel a single memory module is supported (with one or two memory ranks on each). Accordingly, a mainboard based on the 915G chipset, such as MSI’s 915G Combo mainboard, is designated as a combo mainboard. Note: Motherboards that allow a choice between two different DRAM types are termed Combo boards.

36 2.6. Attributes of memory channels (4) Figure: MSI’s 915G Combo motherboard (based on Intel’s 915G chipset) [61]. Labels: north bridge of the 915G chipset; 4 DIMM slots

37 2.6. Attributes of memory channels (5) Figure: DIMM slots of MSI’s 915G Combo motherboard [61]. Labels: DDR2 slots; DDR slots; two DDR or DDR2 channels with a single DIMM slot on each channel

38 2.6. Attributes of memory channels (6) Supported number of memory modules. It depends on: the DRAM connection technology; the DRAM speed; the number of ranks mounted onto the memory module(s).

39 2.6. Attributes of memory channels (7) Dependency on the memory connection technology. The maximum number of supported memory modules depends heavily on the memory connection technology, that is, whether the modules are connected via a parallel bus (as in the case of SDRAM, DDR, DDR2, DDR3 modules) or via a serial bus (as in the case of FBDIMM modules).
Number of memory modules supported per memory channel:
Modules connected via a parallel bus: 1-4 memory modules. E.g. SDRAM, DDR, DDR2, DDR3 modules.
Modules connected via a serial bus: 6-8 memory modules. E.g. FBDIMM modules.
Figure: Number of memory modules vs memory connection technology in synchronous DRAMs

40 2.6. Attributes of memory channels (8) Remarks: 1. Early chipsets supporting low speed 1 or 4 Byte wide asynchronous DRAMs often allowed 4 to 8 memory modules to be attached. 2. The Pentium processor provided a 64-bit wide datapath, so early (430 family) chipsets typically supported two pairs of 32-bit wide FPM/EDO modules.

41 2.6. Attributes of memory channels (9) Dependency on the memory speed. Higher transfer rates limit the number of memory modules that can be supported on a memory channel. Obviously, the more memory modules are present on a channel, the more serious the signal integrity problems that arise. At higher transfer rates skews, jitter and reflections (caused by impedance mismatch while terminating transmission lines) impair signal integrity more and more.

42 Figure: Scaling down the number of supported DIMMs per channel with increasing data rates (assuming two ranks per DIMM) [62] 2.6. Attributes of memory channels (10)

43 Figure: Scaling down the number of PCI-X slots with increasing PCI-X bus speed [55] 2.6. Attributes of memory channels (11)

44 2.6. Attributes of memory channels (12) Levelling off channel capacity for synchronous DRAMs. With increasing device densities but a decreasing number of modules supported by memory channels at higher transfer rates, the maximum memory capacity per memory channel remains roughly the same for synchronous DRAM devices [66]. But increasing server performance doubles memory capacity demand about every two years [66]. Figure: Channel capacity of synchronous DRAMs vs memory capacity demand [66]

45 2.6. Attributes of memory channels (13) Increasing server capacity demand calls for memory technologies with higher capacity potential, such as DRAM technologies with serial bus connection, like FB-DIMM.

46 2.6. Attributes of memory channels (14) Dependency on the number of ranks mounted onto the memory modules. Dual memory ranks mounted on the memory modules result in higher bus loading, and may reduce the maximum number of supported memory slots. E.g. the north bridge of Intel’s 815 chipset supports, at 133 MHz memory speed, up to three SDRAM DIMMs with just a single rank or up to two SDRAM DIMMs with dual ranks.

47 2.6. Attributes of memory channels (15) Number of memory modules supported per memory channel:
1-2 memory modules. Typical use: desktops/entry level servers.
6-8 memory modules. Typical use: DP/MP servers with FBDIMM mem. modules.
Figure: Number of memory modules supported per memory channel by Intel’s P4/Core 2 Duo north bridges

48 2.6. Attributes of memory channels (16) Figure: Example 1. P4 based desktop motherboard (MSI’s 915G Combo motherboard with Intel’s 915G chipset) [61] 4 DIMM slots Two DDR or DDR2 channels with a single DIMM slot on each channel DDR2 DDR

49 2.6. Attributes of memory channels (17) Figure: Example 2. P4-based entry-level DP server motherboard (Supermicro’s P8SCT with Intel’s E7221 chipset) [63]. Labels: CPU; MCH (E7221); Ch. A; Ch. B; 4 DIMM slots; two DDR2 channels with two DIMM slots on each channel

50 Figure: Example 3. Block diagram of a Core 2 based four-processor MP server (Supermicro’s X7QC3 with Intel’s 7300 North bridge) [64] 2.6. Attributes of memory channels (18) 4 DDR2 FB-DIMM channels 6 DIMM slots on each channel

51 2.6. Attributes of memory channels (19) Figure: Example 3. Core 2 based four-processor MP server motherboard (Supermicro’s X7QC3 with Intel’s 7300 North bridge) [64]. Labels: Xeon 7200 DC / 7300 QC (Tigerton); 7300 NB; SBE2 SB; ATI ES1000 Graphics with 32MB video memory; 4 DDR2 FB-DIMM channels; 6 DIMM slots on each channel; 192 GB

52 Figure: Example 4. Block diagram of Intel’s Core 2 based 7300 (Caneland) MP platform with the 7300 (Clarksboro) chipset (9/2007) [65] up to 512 GB 7200 (Tigerton DC, Core2), DC Xeon 7300 (Tigerton QC, Core2), QC 2.6. Attributes of memory channels (20) Four DDR2 FB-DIMM channels with 8 DIMM slots on each channel

53 2.6. Attributes of memory channels (21) Supported number of ranks per memory module.
Memory module: physical unit. Rank: logical unit.
A rank consists of a set of DRAM devices (of a given width) that are needed to achieve the expected data width of the memory module. E.g. a 64-bit wide rank consists of 8 8-bit wide or 4 16-bit wide DRAM devices. Optionally, a rank may include an additional DRAM device to hold ECC bits.
DRAM devices constituting a rank are mounted side by side onto a memory module. A rank usually covers one side of the memory module (using x8 or x16 devices), but 64-bit wide ranks built up of x4 devices (16 devices) typically cover both sides.
All devices of a rank share the address and the command bus. All devices of a rank are selected by the same CS (Chip Select) signal, whereas different ranks have different CS signals.
A memory rank is sometimes also designated as a row.
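The relation between rank width and device width can be checked with a few lines of code. This is a minimal sketch; the helper name `devices_per_rank` is hypothetical, and the optional ECC device mentioned in the text is deliberately left out.

```python
def devices_per_rank(rank_width_bits, device_width_bits):
    """Number of DRAM devices of the given width needed to build one rank."""
    if rank_width_bits % device_width_bits != 0:
        raise ValueError("rank width must be a multiple of the device width")
    return rank_width_bits // device_width_bits


# A 64-bit wide rank built from x4, x8 or x16 devices
counts = {width: devices_per_rank(64, width) for width in (4, 8, 16)}
print(counts)
```

The result reproduces the figures given in the text: 16 x4 devices, 8 x8 devices or 4 x16 devices per 64-bit rank.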

54 2.6. Attributes of memory channels (22) Figure: Connecting ranks to the memory controller [68]

55 2.6. Attributes of memory channels (23) Memory module: physical unit. A memory module is basically a small printed circuit card that carries one or more ranks and fits into a memory slot of the motherboard. Memory modules may be populated either on one side or on both sides. A memory module may contain: a single rank on one of its sides; a single rank spread over both of its sides; two ranks, one on each of its sides.

56 Figure: Example 1: One 64-bit wide DDR3 SO-DIMM rank consisting of 4 16-bit DRAM devices, that are mounted on one side of the module [67] 2.6. Attributes of memory channels (24)

57 Figure: Example 2: One 64-bit wide DDR3 SO-DIMM rank consisting of 8 8-bit DRAM devices, that are mounted on both sides of the module [67] 2.6. Attributes of memory channels (25)

58 Figure: Example 3. Two 64-bit wide DDR3 SO-DIMM ranks, each consisting of 4 16-bit DRAM devices, that are mounted on both sides of the module [67] 2.6. Attributes of memory channels (26)

59 2.6. Attributes of memory channels (27) Supported number of ranks per memory module:
Dual ranks are supported per mem. module: typical implementation.
A single rank is supported per mem. module: in few cases, usually as a restriction for higher DRAM speeds.
Figure: Supported number of ranks (rows) per memory module
Examples:
a) The north bridge of Intel’s 815 chipset supports up to three SDRAM-100 DIMMs with dual ranks, or up to three SDRAM-133 DIMMs with just a single rank, or up to two SDRAM-133 DIMMs with dual ranks.
b) The north bridge of Intel’s P35 chipset for Core 2 Duo processors supports up to two DDR2-800/667 or DDR3-1066/800 DIMMs with dual ranks.

60 2.6. Attributes of memory channels (28) Supported attributes of DRAM devices: DRAM type; DRAM width; DRAM density; DRAM speed. Figure: Supported attributes of DRAM devices

61 2.6. Attributes of memory channels (29) DRAM types (for general use), with year of introduction (described in Sections 4, 5, 6 of the Chapter DRAM devices):
Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995).
Synchronous DRAMs: SDRAM (1996), DRDRAM (1999), DDR (2000), DDR2 (2004), XDR (2006) [1], FBDIMM (2006), DDR3 (2007).
The figure further distinguishes DRAMs with parallel bus connection from DRAMs with serial bus connection, and main stream DRAM types from challenging DRAM types.
[1] Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers.
Figure: DRAM types for general use

62 2.6. Attributes of memory channels (30)
DRAM width: North bridges/memory controllers specify the width of supported DRAM devices. Most recent north bridges/memory controllers support x8 and x16 DRAM devices.
DRAM density: North bridges/memory controllers specify supported DRAM densities. Example 1: The north bridge of Intel’s 815 chipsets for Pentium III processors supports SDRAM devices with 16Mb/64Mb/128Mb/256Mb densities. Example 2: The north bridge of Intel’s Series 3 chipset family for Core 2 Duo and Core 2 Quad processors supports DDR2 and DDR3 devices with 512Mb and 1Gb densities.
DRAM speed: North bridges/memory controllers also specify supported DRAM speeds.

63 2.6. Attributes of memory channels (31) Example: Supported DRAM speeds of the north bridges (MCH/GMCH) of Intel’s 845xx family of chipsets (Brookdale; single channel SDR/DDR SDRAM; max. memory 2 GB). The figure shows the family members 845, 845GL, 845G, 845E, 845GV, 845GE and 845PE, introduced between 9/01 and 10/02, with their FSB speed (400 MHz or 533/400 MHz), HT support (supported or not supported) and supported DRAM speeds (PC133; DDR 266/200; PC133 and DDR 266/200 (unbuffered); DDR 333/266).
Another example: The north bridge of Intel’s Series 3 chipsets for Core 2 Duo and Core 2 Quad processors supports DDR2 devices with 667/800 MT/s or DDR3 devices with 800/1066 MT/s transfer rate.

64 3. Key performance parameters of main memories

65 3.1 Memory capacity 3.2 Memory bandwidth 3.3 Memory latency

66 3.1. Memory capacity (1) Memory capacity ($C_M$): $C_M = n_{CU} \times n_{CH} \times n_M \times n_R \times C_R$, with
$n_{CU}$: No. of north bridges/memory control units
$n_{CH}$: No. of memory channels per north bridge/control unit
$n_M$: No. of memory modules per channel
$n_R$: No. of ranks per memory module
$C_R$: Rank capacity (device density x no. of DRAM devices)
E.g. The Core 2 based P35 chipset supports up to two memory channels with up to two dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density per rank. The resulting maximum memory capacity is: $C_{Mmax} = 1 \times 2 \times 2 \times 2 \times (8 \times 1\,\mathrm{Gb}) = 8\,\mathrm{GB}$
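The capacity formula can be evaluated directly. The following is a minimal sketch with hypothetical helper names (`rank_capacity_bits`, `memory_capacity_bytes`); the first call uses the P35 figures from this slide, the second the server figures used later in this section (4 channels, 6 dual-ranked modules per channel, 8 devices of 4 Gb per rank).

```python
GBIT = 1 << 30  # bits in 1 Gb
GIB = 1 << 30   # bytes in 1 GB (binary)


def rank_capacity_bits(devices_per_rank, device_density_bits):
    """C_R: number of DRAM devices per rank times device density."""
    return devices_per_rank * device_density_bits


def memory_capacity_bytes(n_cu, n_ch, n_m, n_r, c_r_bits):
    """C_M = n_CU * n_CH * n_M * n_R * C_R, converted from bits to bytes."""
    return n_cu * n_ch * n_m * n_r * c_r_bits // 8


# P35 desktop example: 1 controller, 2 channels, 2 modules, 2 ranks, 8 x 1 Gb
p35_bytes = memory_capacity_bytes(1, 2, 2, 2, rank_capacity_bits(8, 1 * GBIT))

# FB-DIMM server example: 1 controller, 4 channels, 6 modules, 2 ranks, 8 x 4 Gb
server_bytes = memory_capacity_bytes(1, 4, 6, 2, rank_capacity_bits(8, 4 * GBIT))

print(p35_bytes // GIB, "GB;", server_bytes // GIB, "GB")
```

Keeping the rank capacity in bits and converting to bytes only at the end avoids mixing up Gb (device density) and GB (memory capacity), which is the easiest mistake to make with this formula.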

67 3.1. Memory capacity (2) Crucial factors limiting the maximum capacity of main memories n M : No. of memory modules supported per memory channel C R : Rank capacity (device density x no. of DRAM devices/rank).

68 3.1. Memory capacity (3) Number of memory modules supported per memory channel:
Modules connected via a parallel bus: 1-4 memory modules. E.g. SDRAM, DDR, DDR2, DDR3 modules. Higher transfer rates typically limit the number of mem. modules to one or two.
Modules connected via a serial bus: 6-8 memory modules. E.g. FBDIMM modules.
Figure: Number of memory modules supported per memory channel

69 3.1. Memory capacity (4) Rank capacity ($C_R$): $C_R = n_D \times D$, with $n_D$: number of DRAM devices/rank; $D$: device density. Number of DRAM devices/rank: typically up to 8. E.g. a one-sided (single rank) DDR3 memory module built up of 8 devices.

70 3.1. Memory capacity (5) Device density. Figure: Evolution of DRAM densities (Mbit) and no. of units shipped/year (based on [35]). Axes: year (1980 to 2015); units shipped (10^6, 500 to 2000). Densities shown: 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G. Density growth: ~4×/4 years.

71 3.1. Memory capacity (6) Ranks typically include up to 8 DRAM devices.
Typical maximum main memory sizes ($C_{Mmax}$) of recent Core 2 based desktops, assuming 2 memory channels, 2 modules per channel, and dual ranked modules populated with 8 x8 DDR2 or DDR3 devices of 1 Gb density (1 GB per rank): $C_{Mmax} = 1 \times 2 \times 2 \times 2 \times 1\,\mathrm{GB} = 8\,\mathrm{GB}$
Typical maximum main memory sizes of recent Core 2 based servers, assuming 4 memory channels, 6 modules per channel, and dual ranked modules populated with 8 x8 FB-DIMM DDR2 devices of 4 Gb density (4 GB per rank): $C_{Mmax} = 1 \times 4 \times 6 \times 2 \times 4\,\mathrm{GB} = 192\,\mathrm{GB}$

72 3.1. Memory capacity (7) The rate of increasing DRAM densities. In accordance with Moore’s law (saying that the transistor count per chip doubles about every 24 months), DRAM densities evolve at about 4x/4 years. For the same number of control units/modules/ranks, the maximum size of main memories therefore also increases at about 4x/4 years.

73 3.2. Memory bandwidth (8) Bandwidth of memory systems. Total bandwidth (BW) provided by a memory system: $BW = n_{CU} \times n_{CH} \times T \times W_M$, with
$n_{CU}$: No. of north bridges/memory control units
$n_{CH}$: No. of memory channels per north bridge/control unit
$T$: Transfer rate of the module (no. of data transfers/sec)
$W_M$: Data width of the memory modules
E.g. A memory system with a single, dual channel controller and 8 Byte wide DDR2 800 modules provides a total bandwidth of: $BW = 1 \times 2 \times 800\,\mathrm{MT/s} \times 8\,\mathrm{B} = 12.8\,\mathrm{GB/s}$
Processors with an increasing number of cores obviously require increasingly higher memory bandwidth.
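The bandwidth formula is simple enough to verify in code. This is a minimal sketch with a hypothetical helper name (`memory_bandwidth_mb_s`); it reproduces the DDR2 800 example above and, as a second check, the Cell BE figure quoted earlier in this presentation (two XDR channels, 3.2 GT/s, 4 B data width per channel).

```python
def memory_bandwidth_mb_s(n_cu, n_ch, transfer_rate_mt_s, module_width_bytes):
    """BW = n_CU * n_CH * T * W_M, with T in MT/s and W_M in bytes -> MB/s."""
    return n_cu * n_ch * transfer_rate_mt_s * module_width_bytes


# Single controller, dual channel, DDR2 800 modules, 8 B wide
ddr2_800_mb_s = memory_bandwidth_mb_s(1, 2, 800, 8)

# Cell BE: one MIC, two XDR channels at 3200 MT/s, 4 B of data per channel
cell_be_mb_s = memory_bandwidth_mb_s(1, 2, 3200, 4)

print(ddr2_800_mb_s / 1000, "GB/s;", cell_be_mb_s / 1000, "GB/s")
```

Both values are peak figures; sustained bandwidth in real applications is lower.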

74 3.2. Memory bandwidth (10) The min. column cycle time ($t_{CCD}$) of the memory cell array. $t_{CCD}$ (Core column delay) is the min. time interval between consecutive Reads or Writes. Remark: $t_{CCD}$ is also designated as the Read/Write command to Read/Write command delay. Figure: The interpretation of $t_{CCD}$ [36]

75 3.2. Memory bandwidth (11) Figure: The evolution of the column cycle time ($t_{CCD}$) in different SDRAM types (ns) [37]. Note: The min. column cycle time ($t_{CCD}$) of synchronous DRAMs is: SDRAM: 7.5 ns; DDR/DDR2/DDR3: 5 ns.

76 3.2. Memory bandwidth (9) The crucial factor limiting the memory bandwidth of the main memory: the transfer rate of the memory module (no. of data transfers/sec). The transfer rate of the memory module ($T$) equals the transfer rate of the DRAM devices used. The peak transfer rate ($T_{max}$) of synchronous DRAM devices: $T_{max} = 1/t_{CCD} \times FW$, with $t_{CCD}$: min. column cycle time of the memory cell array; $FW$: fetch width of the memory cell array.

77 3.2. Memory bandwidth (12) The fetch width (FW) of the memory cell array specifies how many times more bits the cell array fetches per column cycle than the data width of the device. E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4, that is 16 bits, from the memory cell array per column cycle. The fetch width (FW) of the memory cell array of synchronous DRAMs by DRAM type is: SDRAM: 1; DDR: 2; DDR2: 4; DDR3: 8.

78 3.2. Memory bandwidth (13) The peak transfer rates of the different DRAM technologies ($T_{max} = 1/t_{CCD} \times FW$) are:
SDRAM: 1/7.5 ns x 1 = 133 MT/s
DDR: 1/5 ns x 2 = 400 MT/s
DDR2: 1/5 ns x 4 = 800 MT/s
DDR3: 1/5 ns x 8 = 1600 MT/s
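The peak transfer rates listed here follow mechanically from the column cycle times and fetch widths given on the preceding slides. A minimal sketch, with hypothetical names for the tables and the helper function:

```python
# Min. column cycle time in ns and fetch width per DRAM type, as given in the text
T_CCD_NS = {"SDRAM": 7.5, "DDR": 5.0, "DDR2": 5.0, "DDR3": 5.0}
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def peak_transfer_rate_mt_s(dram_type):
    """T_max = 1/t_CCD * FW; 1/(t_CCD in ns) equals 1000/t_CCD million cycles/s."""
    return 1000.0 / T_CCD_NS[dram_type] * FETCH_WIDTH[dram_type]


rates = {t: round(peak_transfer_rate_mt_s(t)) for t in FETCH_WIDTH}
print(rates)
```

Since the column cycle time stays at 5 ns from DDR to DDR3, the whole increase in peak transfer rate across these generations comes from the doubling of the fetch width.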

79 3.2. Memory bandwidth (14) Figure: The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets. Axes: transfer rate (MT/s, logarithmic, 10 to 5000) vs year (1996 to 2008). Data points: SDRAM 66, SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800, DDR3 1067. Trend: ~10×/10 years.

80 3.2. Memory bandwidth (15) The evolution of peak transfer rates of synchronous DRAMs. Peak transfer rates evolve by ≈ 10x/10 years, which means doubling in 3-4 years. Sources of the evolution: the introduction of new synchronous DRAM technologies (SDRAM/DDR/DDR2/DDR3), more specifically the more and more advanced approaches to improve, first of all, signaling (by using SSTL_2/1.8/1.5, differential CK/DQS), synchronisation (by using source synchronisation, DLLs to align CK with DQs etc.) and line terminations (by using ODT, dynamic ODT, ZQ calibration etc.).
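The doubling period implied by a growth rate of 10x per 10 years can be worked out with a logarithm. A minimal sketch, with a hypothetical helper name:

```python
import math


def doubling_time_years(factor, period_years):
    """Years needed to double, given growth by `factor` every `period_years`."""
    return period_years * math.log(2) / math.log(factor)


transfer_rate_doubling = doubling_time_years(10, 10)
print(round(transfer_rate_doubling, 2), "years")
```

The result is roughly three years, i.e. the lower end of the range stated above.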

81 The evolution of processor clock frequencies vs transfer rates of main memories in mainstream processors 3.2. Memory bandwidth (16)

82 Figure: Evolution of clock frequencies in Intel’s desktop processors The evolution of processor clock frequencies (f C ) in desktops 3.2. Memory bandwidth (17)

83 3.2. Memory bandwidth (21) The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets. Figure: The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets. Axes: transfer rate (MT/s, logarithmic, 10 to 5000) vs year (1996 to 2008). Data points: SDRAM 66, SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800, DDR3 1067. Trend: ~10×/10 years.

84 Figure: Evolution of clock frequencies in Intel’s desktop processors The evolution of processor clock frequencies (f C ) in desktops between 1995-2003 3.2. Memory bandwidth (19)

85 3.2. Memory bandwidth (20) In the time period of about 1995 - 2003 clock frequencies rose at a rate of ≈ 100x/10 years, whereas transfer rates of main memories rose only at a rate of ≈ 10x/10 years.

86 3.2. Memory bandwidth (22) In the time period of about 1995 - 2003 clock frequencies rose at a rate of ≈ 100x/10 years, whereas transfer rates of main memories rose only at a rate of ≈ 10x/10 years. In this time period the gap between clock frequencies and memory transfer rates became continuously wider. Higher clock rates were the main source of higher processor performance, but higher processor performance entails higher memory traffic. Thus a strong motivation arose to increase the bandwidth of main memories by increasing the width of the datapath to the main memory, first of all by introducing dual memory channels. Dual memory channels became commonplace even in desktops.
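How far the two growth rates drift apart over the period named here can be estimated as follows. This is a rough sketch that assumes steady exponential growth at the stated rates over the eight years from 1995 to 2003; the helper name `growth_over` is hypothetical.

```python
def growth_over(factor_per_decade, years):
    """Total growth after `years`, assuming steady exponential growth."""
    return factor_per_decade ** (years / 10.0)


YEARS = 2003 - 1995

clock_growth = growth_over(100, YEARS)    # processor clock frequency
transfer_growth = growth_over(10, YEARS)  # memory transfer rate
gap = clock_growth / transfer_growth      # widening of the gap

print(round(clock_growth, 1), round(transfer_growth, 1), round(gap, 1))
```

Under these assumptions clock frequencies grow by a factor of about 40 while transfer rates grow by a factor of about 6, so the gap widens roughly sixfold, which illustrates the pressure toward wider datapaths such as dual memory channels.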

87 Figure: Evolution of clock frequencies in Intel’s desktop processors after about 2003 The evolution of processor clock frequencies (f C ) in desktops after about 2003 3.2. Memory bandwidth (24)

88 3.2. Memory bandwidth (23) In the time period of about 2003 - 2005: After about 2003, however, clock frequencies became saturated (due to hitting the thermal wall), and single core processors represented the mainline until about 2005. The gap between clock frequencies and memory transfer rates became narrower. Beginning with about 2005: Nevertheless, beginning with ~2005 the era of multicores emerged, with the core count doubling about every two years. A new scenario becomes dominant with steadily increasing bandwidth/transfer rate requirements.

89 The status quo in increasing bandwidth/transfer rates 3.2. Memory bandwidth (25)

90 3.2. Memory bandwidth (26) Figure: Evolution of the bandwidth of dual-channel synchronous DRAM memory systems [56] (figure annotation: double data rate SDRAM migration)

91 Figure: Evolution of transfer rates (per pin bandwidth figures) of different DRAM types [40] 3.2. Memory bandwidth (27)

92 3.3. Memory latency (1) Memory latency: device level memory latency; system level memory latency

93 3.3. Memory latency (2) Figure: Estimated maximum and minimum read latencies of DRAM devices (ns). Axes: read latency [1] (ns, 20 to 200) vs year (1981 to 2007), annotated with processor (PC AT, 386 DX, 486 DX, P, PII, PIII, P4, Core2), chipset, typ. DRAM chips (bits) and desktop DRAM type (FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).
[1] Read latency of DRAM, FPM, EDO and BEDO parts = t RAC (Row access time (time from row address until data valid)). Read latency of SDRAM parts = CL + t RCD (CAS Latency + Row to Column delay).
[2] The 815 chipset supports SDRAMs while the 820 supports RDRAMs.
[3] A new revision of the 845 supports DDRs instead of SDRAMs.

94 3.3. Memory latency (3) Figure: Estimated typical system-level memory latency in x86-based PCs (in ns). Axes: memory latency (ns, up to 300) vs year (1981 to 2008), annotated with processor ((8088), PC AT (286), 386 DX, 486 DX, P, PPro, PII, PIII, P4, Core2), chipset, typ. DRAM parts (bits) and desktop DRAM type (DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).

95 3.3. Memory latency (4) Figure 5.1c: System-level memory latencies in x86-based PCs (in proc. clock cycles). Axes: memory latency in proc. cycles (logarithmic, 1 to 1000) vs year (1981 to 2008), annotated with processor ((8088), PC AT (286), 386 DX, 486 DX, P, PPro, PII, PIII, P4, Core2), chipset, typ. DRAM parts (bits) and desktop DRAM type (DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3).

96 [1]: 64MB Apple G3 Beige 168p SDRAM DIMM, http://www.memoryx.net/apl168s64.html [2]: 4, 8 MEG x 32 DRAM SIMMs, Micron, http://www.pjrc.com/mp3/simm/datasheet.html [3]: 168 Pin, PC133 SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.2 [4]: 184 Pin Unbuffered DDR SDRAM DIMM Family, JEDEC Standard No. 21-C, Page 4.5.10 [5]: Direct Rambus DRAMM RIMM Module, 512 MB, MC-4R512FKE6D, Elpida, http://pdf1.alldatasheet.com/datasheet-pdf/view/60081/ELPIDA/MC-4R512FKE6D.html [6]: DDR2 SDRAM UDIMM Features, Micron, http://www.micron.com/products/modules/udimm/partlist [7]: DDR3 SDRAM UDIMM Features, Micron, http://www.micron.com/products/modules/udimm/partlist [8]: DDR2 SDRAM FBDIMM Features, Micron, http://www.micron.com/products/modules/fbdimm/partlist [9]: Torres G., „Memory Tutorial”, July 19, 2005, Hardwaresecrets, http://www.hardwaresecrets.com/article/167/1 [10]: Besedin D., „First look at DDR3”, Digit-life, June 29, 2007, http://www.digit-life.com/articles2/mainboard/ddr3-rmma.html 4. References (1)

97 4. References (2)
[11]: http://www.hardwaresecrets.com/fullimage.php?image=2862
[12]: http://cgi.ebay.com/Vintage-Microsoft-8-Bit-ISA-PC-RAM-Card-W-Gold-5150_W0QQitemZ310017171151QQcmdZViewItem
[13]: http://www.hardwaresecrets.com/fullimage.php?image=2856
[14]: http://www.memex.com.au/images/72psimm.jpg
[15]: Ahn J.-H., "DRAM Operation & Architecture," Sept. 10, 2007, Hynix, http://netro.ajou.ac.kr/~jungyol/memory2.pdf
[16]: http://www.twinmos.com/dram/dram_p_dt_ddr.htm#s
[17]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg
[18]: http://www.twinmos.com/dram/dram_p_dt_ddr3_1333.htm#s
[19]: http://item.express.ebay.com/16mb-EDO-3-3V-72-Pin-SODIMM-LAPTOP-RAM-LAPTOP-16mb-EDO_W0QQitemZ230060958674QQihZ013QQcmdZExpressItem
[20]: http://www.twinmos.com/dram/dram_p_nb_sdr_sodimm.htm
[21]: http://www.cdw.com/shop/products/default.aspx?EDC=915882
[22]: http://laptoping.com/category/laptop-memory
[23]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg
[24]: http://www.twinmos.com/dram/images/photo_dt_ddr2.jpg

98 4. References (3)
[25]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/sdram/SD18C32_64_128x72D.pdf
[25]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/sdram/sd18c32_64_128x72.pdf
[26]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr/DDF18C64_128x72D.pdf
[27]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr2/HTF18C64_128_256x72D.pdf
[28]: Datasheet, Micron, http://download.micron.com/pdf/datasheets/modules/ddr3/JSF18C256x72PD.pdf
[29]: PLL Clock Driver for 2.5V DDR-SDRAM Memory, Datasheet, Pericom, Feb. 2003, http://www.pericom.com/pdf/datasheets/PI6CV857.pdf
[30]: PC2100 and PC1600 DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Revision 1.3, Jan. 2002, http://www.jedec.org/download/search/4_20_04R13.PDF
[31]: Supermicro Motherboards, http://www.supermicro.com/products/motherboard/
[32]: http://www.pricegrabber.com/search_getprod.php/masterid=3191326
[33]: Definition of CDCV857 PLL Clock Driver for Registered DDR DIMM Applications, JESD82, JEDEC, July 2000

99 4. References (4)
[34]: http://www.tranzistoare.ro/datasheets2/32/327037_1.pdf
[35]: DRAM Pricing – A White Paper, Tachyon Semiconductors, http://www.tachyonsemi.com/about/papers/DRAM%Pricing.pdf
[36]: 16 Mb Synchronous DRAM, MT48LC4M4A1/A2, MT48LC2M8A1/A2, Micron, http://datasheet.digchip.com/297/297-04447-0-MT48LC2M8A1.pdf
[37]: Rhoden D., "The Evolution of DDR," VIA Technology Forum, 2005, http://www.via.com.tw/en/downloads/presentations/events/vtf2005/vtf05hd_inphi.pdf
[38]: Haskill, "The Love/Hate relationship with DDR SDRAM Controllers," Mosaid, Oct. 2006, http://www.mosaid.com/corporate/products-services/ip/SDRAM_Controller_whitepaper_Oct_2006.pdf
[39]: Van Roon T., "What exactly is a PLL?," April 2006, http://www.uoguelph.ca/~antoon/gadgets/pll/pll.html
[40]: Choi J. H., "High Speed DRAM," Memory Division, Samsung, 2004, http://asic.postech.ac.kr/1.Nrl/2.NRL%20Seminar/invitation/041208ChoiJH.pdf
[41]: Introduction to Xilinx, Xilinx FPGA Design Workshop, http://www.eas.asu.edu/~kchatha/cse320_f07/xilinx_intro.ppt
[42]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Feb. 2003, Xilinx Inc.

100 4. References (5)
[43]: 64-bit Flow-Thru Error Detection and Correction Unit, IDT49C466, Integrated Device Technology Inc., 1999, http://www.digchip.com/datasheets/parts/datasheet/222/IDT49C466.php
[44]: Tam S., "Single Error Correction and Double Error Detection," Xilinx Application Note XAPP645 (v2.2), Aug. 2006, http://www.xilinx.com/support/documentation/application_notes/xapp645.pdf
[45]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.4-1, Jan. 2002, http://www.jedec.org
[46]: Understanding DDR3 Serial Presence Detect (SPD) Table, July 17, 2007, Simmtester, http://www.simmtester.com/PAGE/news/showpubnews.asp?num=153
[47]: DDR2 DIMM SPD Definition, August 25, 2006, http://docmemory.com/page/news/showpubnews.asp?num=141
[48]: Memory Module Serial Presence-Detect, TN-04-42, Micron, 2002, http://download.micron.com/pdf/technotes/TN_04_42_C.pdf
[49]: Intel 845 Chipset: 82845 Memory Controller Hub (MCH) for DDR, Datasheet, Jan. 2002, Intel, No. 298604-001
[50]: Intel 975X Express Chipset: 82975X Memory Controller Hub (MCH), Datasheet, Nov. 2005, Intel, No. 310158-001
[51]: Supermicro X6DH8-G2, X6DHE-G2 Mainboards User's Manual, Rev. 1.1b, June 2007, Super Micro Computer Inc.

101 4. References (6)
[52]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2006, http://www.intel.com/design/chipsets/datashts/313071.htm
[53]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007, http://www.intel.com/design/chipsets/datashts/313082.htm
[54]: "Introducing FB-DIMM Memory: Birth of Serial RAM?," PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1
[55]: PCI Technology overview, Feb. 2003, http://www.digi.com/pdf/prd_msc_pcitech.pdf
[56]: DDR3 SDRAM, Samsung, http://www.samsung.com/global/business/semiconductor/products/dram/Products_DDR3SDRAM.html
[57]: Le H. Q. et al., "IBM POWER6 microarchitecture," IBM J. R&D, Vol. 51, No. 6, 2007, pp. 639-662
[58]: Kanter D., "Inside Barcelona: AMD's Next Generation," May 2007, http://www.realworldtech.com/includes/templates/articles.cfm?ArticleID=RWT051607033728&mode=print
[59]: Golla R., "Niagara2: A Highly Threaded Server-on-a-Chip," Oct. 2006, http://www.opensparc.net/pubs/preszo//06/04-Sun-Golla.pdf
[60]: Hofstee P., "Tutorial: Hardware and Software Architectures for the CELL BROADBAND ENGINE processor," IBM Corp., September 2005, http://www.crest.gatech.edu/conferences/cases2005/pdf/Cell-tutorial.pdf

102 4. References (7)
[61]: 915 P/G Combo Mainboard (MS-7058) Manual, May 2004, MSI
[62]: Haas J. & Vogt P., "Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology@Intel Magazine, http://www.intel.com/technology/magazine/computing/fully-buffered-dimm-0305.htm
[63]: http://www.supermicro.com/manuals/motherboard/E7221/MNL-0776.pdf
[64]: http://www.supermicro.com/manuals/motherboard/7300/MNL-0955.pdf
[65]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007, http://www.intel.com/design/chipsets/datashts/313082.htm
[66]: Vogt P., "Fully Buffered DIMM (FB-DIMM) Server Memory Architecture," Feb. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf
[67]: 204-Pin DDR3 SDRAM Unbuffered SO-DIMM Design Specification, JEDEC Standard No. 21-C, Page 4.20.18-1
[68]: Jacob B. & Wang D., "Memory Systems: Circuits, Architecture and Performance Analysis," Lecture notes, University of Maryland, ENEE759H, Spring 2005
[69]: Datasheet, http://download.micron.com/pdf/datasheets/modules/sdram/SD9C16_32x72.pdf
[70]: Solanki V., "Design Guide Lines for Registered DDR DIMM Module," Application Note AN37, Pericom, Nov. 2001, http://www.pericom.com/pdf/applications/AN037.pdf

