Dezső Sima September 2008 (Ver. 1.0)  Sima Dezső, 2008 1. Macroarchitecture and performance parameters of MMs.



Overview
1. Introduction
2. Macroarchitecture of main memories
3. Key performance parameters of main memories
4. References

1. Introduction (1) Scope: general purpose main memories, i.e. main memories used in desktops, servers and laptops.

Figure: Main memories on motherboards Server [77] Desktop [32] 1. Introduction (2)

1. Introduction (3) Figure: Different kinds of memory modules

1. Introduction (4) Figure: Main dimensions of the layout of main memories: the macroarchitecture of the main memory and the layout of the memory modules.

2. Macroarchitecture of main memories

2.1 Introduction 2.2 Attachment policy 2.3 Point of attachment 2.4 Number of memory controllers 2.5 Number of memory channels 2.6 Attributes of memory channels

2.1. Introduction (1) Example 1. Figure: Single channel main memory attached via the FSB and the north bridge

2.1. Introduction (2) Example 2. Figure: Dual channel main memory attached via the FSB and the north bridge

2.1. Introduction (3) Example 3. Figure: Single channel main memory attached via a dedicated memory controller

2.1. Introduction (4) Example 4. Figure: Dual channel main memory attached via a dedicated memory controller

2.1. Introduction (5) Figure: Main dimensions of the macroarchitecture of main memories: attachment policy, point of attachment, number of memory controllers (in case of direct attachment), number of memory channels, attributes of memory channels.

2.2. Attachment policy (1) Figure: Attachment policy
Indirect attachment: attachment via the FSB and north bridge (memory control hub). Longer access times (~20-30%); independent of memory technology and speed.
Direct attachment: attachment via memory controller(s). Shorter access times (~20-30%); dependent on memory technology and speed.
Processors shown in the figure: POWER4 (2001), Opteron line (2003), UltraSPARC IV (2004), PA-8800 (2004), POWER5 (2004), PA-8900 (2005), UltraSPARC IV+ (2005), UltraSPARC T1 (2005), Athlon 64 X2 line (2005), Core Duo line (2006), Montecito (2006), Cell BE (2006), POWER6 (2007), Barcelona (2007).

2.2. Attachment policy (2) Figure: Indirect attachment of the main memory to the system architecture (Core Duo (2006), Core 2 Duo (2006): via the FSB and the north bridge). Figure: Direct attachment of the main memory to the system architecture (Athlon 64 X2 (2005): via the on-chip memory controller).

2.3. Point of attachment (1) Figure: Possible points of attachment of main memory to the system architecture
The point of attachment is either
the highest cache level (via an IN): for 2-level caches the IN connecting the L2 cache, for 3-level caches the IN connecting the L3 cache. The M. c. is usually connected in this way if the highest cache level is inclusive;
or between the two highest cache levels (via the IN connecting these levels): for 2-level caches the IN connecting the L1 and L2 caches, for 3-level caches the IN connecting the L2 and L3 caches. The M. c. is usually connected in this way if the highest cache level is exclusive.

2.3. Point of attachment (2) Interrelationship between the inclusion policy of L3 caches and the point of attachment. Figure labels: inclusive L3 vs. exclusive L3; data missing in L2/L3 (high traffic); replaced, modified data (low traffic); lines missing in L2 are reloaded and deleted from L3. Examples shown: Montecito (2006), POWER4 (2001), UltraSPARC IV+ (2005), POWER5 (2004).

2.3. Point of attachment (3) Figure: Examples for attaching memory via the highest cache level. In case of a two-level cache hierarchy: Core 2 Duo (2006), Athlon 64 X2 (2005). In case of a three-level cache hierarchy: Montecito (2006).

2.3. Point of attachment (4) Figure: Examples for attaching memory via the interconnection network connecting the two highest cache levels. In case of a two-level cache hierarchy: UltraSPARC T1 (2005). In case of a three-level cache hierarchy (exclusive L3): UltraSPARC IV+ (2005).

2.4. Number of memory controllers (1) Figure: Number of memory controllers (in case of direct attachment)
Single memory controller: usual implementations, e.g. POWER5 (2004), K8-based processors (2006).
Dual memory controllers: a few recent designs, e.g. POWER6 (2007), Barcelona (2007).
Quad memory controllers: exceptional designs, e.g. UltraSPARC T1 (2005), UltraSPARC T2 (2007).

Figure: Block diagrams of the POWER5 and POWER6 processors [57] 2.4. Number of memory controllers (2)

Figure: Block diagrams of AMD’s K8 and Barcelona processors [58] 2.4. Number of memory controllers (3)

Figure: Block diagram of the UltraSPARC T2 (Niagara-2) [59] 2.4. Number of memory controllers (4)

2.5. Number of memory channels (1) Figure: Number of memory channels supported per north bridge/memory controller
Single memory channel. Typical use: early desktops. E.g. Intel’s 845/848 chipset families for P4 desktops and earlier desktop chipsets.
Dual memory channels. Typical use: recent desktops, single core DP/MP servers. E.g. Intel’s 865 and higher chipset families for P4 desktops, Intel’s P4 based DP server chipsets, Cell BE.
Quad memory channels. Typical use: recent DC and QC DP/MP servers with FB-DIMM memory. E.g. Intel’s 5000 (Bensley) and 7000 (Caneland) platforms for Core Duo DC and MC processors.

Figure: Block diagram of an early P4 desktop having a single memory channel (Intel 845 chipset) [49] 2.5. Number of memory channels (2) Example 1

Figure: Block diagram of a more advanced P4 desktop including dual memory channels (Intel’s 975 chipset) [50] 2.5. Number of memory channels (3) Example 2

2.5. Number of memory channels (4) Example 3. Figure: Block diagram of an early P4-based DP server including dual memory channels (Supermicro’s X6DH8-G2/X6DHE-G2 motherboard based on the E7520 chipset) [51]

2.5. Number of memory channels (5) Figure: Basic blocks of the Cell BE processor [60]
Memory Interface Controller (MIC): dual XDR memory channels, each 36 bits wide; interleaved addressing in the channels; the MIC can be configured to support only a single channel; ECC support ( bits).
Memory bandwidth at 3.2 Gb/s transfer rate: 3.2 Gb/s x 2 x 4 B = 25.6 GB/s

2.5. Number of memory channels (6) Remark: In dual channel configurations (or, in general, in case of multiple memory channels) a scheme is needed that defines the allocation of memory addresses to the individual channels. Figure: Address allocation alternatives to the individual channels
Interleaved mode: Addresses are allocated alternately to the channels at 64 B boundaries, assuming 64 B long cache lines. Two consecutive cache lines can be retrieved simultaneously. Both memory channels must be populated with modules of the same size (e.g. 1 GB). Provides maximum performance in real applications.
Asymmetric mode: Addresses start in the first channel and are allocated to this channel up to its highest rank; then addresses continue in the second channel. There is no need to populate both channels, or to populate them with the same size. In real applications, performance is limited to single channel performance.
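The two allocation schemes described in the remark can be sketched as address-to-channel mappings. This is only an illustration under the slide's assumption of 64 B cache lines; the function names and the example channel sizes are hypothetical and do not come from any chipset documentation.

```python
CACHE_LINE = 64  # bytes; the slide assumes 64 B long cache lines


def interleaved_channel(addr: int, n_channels: int = 2) -> int:
    """Interleaved mode: consecutive cache lines alternate between channels."""
    return (addr // CACHE_LINE) % n_channels


def asymmetric_channel(addr: int, channel_sizes: list) -> int:
    """Asymmetric mode: fill the first channel completely, then continue in the next."""
    base = 0
    for channel, size in enumerate(channel_sizes):
        if addr < base + size:
            return channel
        base += size
    raise ValueError("address beyond installed memory")


if __name__ == "__main__":
    # Two consecutive cache lines land in different channels in interleaved mode.
    print(interleaved_channel(0), interleaved_channel(64))
    # Hypothetical population: 1 GB in channel 0, 512 MB in channel 1.
    sizes = [1 << 30, 1 << 29]
    print(asymmetric_channel(0, sizes), asymmetric_channel(1 << 30, sizes))
```

In interleaved mode a sequential access stream keeps both channels busy, which is why the slide credits it with maximum performance; in asymmetric mode a sequential stream stays within one channel until that channel is exhausted.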

2.5. Number of memory channels (7) Example 4. Figure: Block diagram of Intel’s 5000 (Bensley) DP platform for DC/QC Core 2 Duo processors including quad memory channels [52]. Processors: Xeon 5000 (Dempsey, Netburst), DC; 5100 (Woodcrest, Core 2), DC; 5300 (Clovertown, Core 2), QC. North bridge: 5000 (Blackford). Memory: FB-DIMM, up to 64 GB. In workstations the snoop filter eliminates snoop traffic to the graphics port.

2.5. Number of memory channels (8) Example 5. Figure: Block diagram of Intel’s 7300 (Caneland) MP platform for DC/QC Core 2 Duo processors including quad memory channels [53]. Processors: Xeon 7200 (Tigerton DC, Core 2), DC; 7300 (Tigerton QC, Core 2), QC. Memory: FB-DIMM, up to 512 GB.

2.5. Number of memory channels (9) Figure: Maximum supported FB-DIMM configuration [54] (6 channels/8 DIMMs). Remark: The FB-DIMM technology supports even 6 memory channels with 8 DIMMs each [54]; nevertheless, actual implementations typically support only four DIMMs.

Attributes of memory channels Supported type of mem. modules Supported no. of mem. modules Supported no. of ranks per mem. module Supported attributes of DRAM devices Figures: Attributes of memory channels 2.6. Attributes of memory channels (1)

2.6. Attributes of memory channels (2) Supported type of memory modules. Figure: Type of memory modules supported on the memory channel(s)
Memory modules of the same DRAM type: usual implementation.
Memory modules of different DRAM types (DRAM type A and DRAM type B, e.g. DDR and DDR2): in order to provide a choice and an evolution path in times of memory technology transitions (e.g. while DDR2 technology replaces DDR technology).

2.6. Attributes of memory channels (3) Example: Intel’s 915P/G chipsets support dual memory channels with either DDR or DDR2 technology. A single memory module is supported per channel (with one or two memory ranks on it). Accordingly, a mainboard based on the 915G chipset, such as MSI’s 915G Combo mainboard, is designated as a combo mainboard. Note: Motherboards that allow a choice between two different DRAM types are termed Combo boards.

Figure: MSI’s 915G Combo motherboard (based on Intel’s 915G chipset) [61] North bridge of the 915G chipset 4 DIMM slots 2.6. Attributes of memory channels (4)

Figure: DIMM slots of the MSI’s 915G Combo motherboard [61] DDR2 DDR Two DDR or DDR2 channels with a single DIMM slot on each channel 2.6. Attributes of memory channels (5)

2.6. Attributes of memory channels (6) Supported number of memory modules. It depends on the DRAM connection technology, the DRAM speed, and the number of ranks mounted onto the memory module(s).

2.6. Attributes of memory channels (7) Dependency on the memory connection technology. The maximum number of supported memory modules depends heavily on the memory connection technology, that is, on whether the modules are connected via a parallel bus (as in case of SDRAM, DDR, DDR2, DDR3 modules) or via a serial bus (as in case of FBDIMM modules). Figure: Number of memory modules vs memory connection technology in synchronous DRAMs
Modules connected via a parallel bus (e.g. SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules supported per memory channel.
Modules connected via a serial bus (e.g. FBDIMM modules): 6-8 memory modules supported per memory channel.

2.6. Attributes of memory channels (8) Remarks: 1. Early chipsets supporting low speed 1 or 4 Byte wide asynchronous DRAMs often allowed 4 – 8 memory modules to be attached. 2. The Pentium processor provided a 64-bit wide datapath, so early (430 family) chipsets typically supported two pairs of 32-bit wide FPM/EDO modules.

2.6. Attributes of memory channels (9) Dependency on the memory speed. Higher transfer rates limit the number of memory modules that can be supported on a memory channel. For higher transfer rates, skews, jitter and reflections (caused by impedance mismatch while terminating transmission lines) impede signal integrity more and more. Obviously, the more memory modules are present on a channel, the more serious signal integrity problems arise.

Figure: Scaling down the number of supported DIMMs per channel with increasing data rates (assuming two ranks per DIMM) [62] 2.6. Attributes of memory channels (10)

Figure: Scaling down the number of PCI-X slots with increasing PCI-X bus speed [55] 2.6. Attributes of memory channels (11)

2.6. Attributes of memory channels (12) Levelling off channel capacity for synchronous DRAMs. With increasing device densities but a decreasing number of modules supported by memory channels at higher transfer rates, the maximum memory capacity per memory channel remains roughly the same for synchronous DRAM devices [66]. But increasing server performance doubles memory capacity demand about every two years [66]. Figure: Channel capacity of synchronous DRAMs vs memory capacity demand [66]

2.6. Attributes of memory channels (13) Increasing server capacity demand calls for memory technologies with higher capacity potential, such as DRAM technologies with serial bus connection, like FB-DIMM.

2.6. Attributes of memory channels (14) Dependency on the number of ranks mounted onto the memory modules. Dual memory ranks mounted on the memory modules result in higher bus loading and may reduce the maximum number of supported memory slots. E.g. at 133 MHz memory speed the north bridge of Intel’s 815 chipset supports up to three SDRAM DIMMs with just a single rank or up to two SDRAM DIMMs with dual ranks.

2.6. Attributes of memory channels (15) Figure: Number of memory modules supported per memory channel by Intel’s P4/Core 2 Duo north bridges
1-2 memory modules. Typical use: desktops/entry level servers.
6-8 memory modules. Typical use: DP/MP servers with FBDIMM memory modules.

2.6. Attributes of memory channels (16) Figure: Example 1. P4 based desktop motherboard (MSI’s 915G Combo motherboard with Intel’s 915G chipset) [61] 4 DIMM slots Two DDR or DDR2 channels with a single DIMM slot on each channel DDR2 DDR

Figure: Example 2. P4-based entry-level DP server motherboard (Supermicro’s P8SCT with Intel’s E7221 chipset) [63] CPU MCH (E7221) 2.6. Attributes of memory channels (17) Two DDR2 channels with two DIMM slots on each channel Ch. ACh. B4 DIMM slots

Figure: Example 3. Block diagram of a Core 2 based four-processor MP server (Supermicro’s X7QC3 with Intel’s 7300 North bridge) [64] 2.6. Attributes of memory channels (18) 4 DDR2 FB-DIMM channels 6 DIMM slots on each channel

2.6. Attributes of memory channels (19) Figure: Example 3. Core 2 based four-processor MP server motherboard (Supermicro’s X7QC3 with Intel’s 7300 North bridge) [64]. Labels: Xeon 7200 DC/7300 QC (Tigerton) processors; 7300 NB; SBE2 SB; 4 DDR2 FB-DIMM channels with 6 DIMM slots on each channel (192 GB); ATI ES1000 graphics with 32 MB video memory.

Figure: Example 4. Block diagram of Intel’s Core 2 based 7300 (Caneland) MP platform with the 7300 (Clarksboro) chipset (9/2007) [65] up to 512 GB 7200 (Tigerton DC, Core2), DC Xeon 7300 (Tigerton QC, Core2), QC 2.6. Attributes of memory channels (20) Four DDR2 FB-DIMM channels with 8 DIMM slots on each channel

2.6. Attributes of memory channels (21) Supported number of ranks per memory module
Memory module: physical unit. Rank: logical unit.
A rank consists of a set of DRAM devices (of a given width) that are needed to achieve the expected data width of the memory module. E.g. a 64-bit wide rank consists of 8 8-bit wide or 4 16-bit wide DRAM devices.
DRAM devices constituting a rank are mounted side by side onto a memory module. A rank usually covers one side of the memory module (when using x8 or x16 devices), but 64-bit wide ranks built up of x4 devices (16 devices) typically cover both sides.
Optionally, a rank may include an additional DRAM device to hold ECC bits.
All devices of a rank share the address and the command bus. All devices of a rank are selected by the same CS (Chip Select) signal, whereas different ranks have different CS signals.
A memory rank is sometimes also designated as a row.
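The rank composition rule above (rank width divided by device width) can be checked with a few lines of Python. This is a minimal sketch; the function name and the `ecc_devices` parameter are hypothetical, and the number of extra ECC devices is left to the caller because the slide only says that a rank may optionally include one.

```python
def devices_per_rank(rank_width_bits: int, device_width_bits: int,
                     ecc_devices: int = 0) -> int:
    """Number of DRAM devices needed to build one rank of the given data width."""
    if rank_width_bits % device_width_bits != 0:
        raise ValueError("rank width must be a multiple of the device width")
    return rank_width_bits // device_width_bits + ecc_devices


if __name__ == "__main__":
    for width in (4, 8, 16):
        print(f"64-bit rank of x{width} devices: {devices_per_rank(64, width)} devices")
```

For a 64-bit rank this yields 16, 8 and 4 devices for x4, x8 and x16 parts respectively, matching the device counts quoted on the slide.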

2.6. Attributes of memory channels (22) Figure: Connecting ranks to the memory controller [68]

2.6. Attributes of memory channels (23) Memory module: physical unit. A memory module is basically a printed circuit (PC) card that carries one or more ranks and fits into a memory slot of the motherboard. Memory modules may be populated either on one side or on both sides. A memory module may contain a single rank on one of its sides, a single rank spread over both of its sides, or two ranks, one on each of its sides.

Figure: Example 1: One 64-bit wide DDR3 SO-DIMM rank consisting of 4 16-bit DRAM devices, that are mounted on one side of the module [67] 2.6. Attributes of memory channels (24)

Figure: Example 2: One 64-bit wide DDR3 SO-DIMM rank consisting of 8 8-bit DRAM devices, that are mounted on both sides of the module [67] 2.6. Attributes of memory channels (25)

Figure: Example 3. Two 64-bit wide DDR3 SO-DIMM ranks, each consisting of 4 16-bit DRAM devices, that are mounted on both sides of the module [67] 2.6. Attributes of memory channels (26)

2.6. Attributes of memory channels (27) Supported number of ranks per memory module. Figure: Supported number of ranks (rows) per memory module
Dual ranks are supported per memory module: typical implementation.
A single rank is supported per memory module: in few cases, usually as a restriction for higher DRAM speeds.
Examples
a) The north bridge of Intel’s 815 chipset supports up to three SDRAM-100 DIMMs with dual ranks, or up to three SDRAM-133 DIMMs with just a single rank, or up to two SDRAM-133 DIMMs with dual ranks.
b) The north bridge of Intel’s P35 chipset for Core 2 Duo processors supports up to two DDR2-800/667 or DDR3-1066/800 DIMMs with dual ranks.
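The 815 example can be read as a small lookup table from (memory speed, ranks per DIMM) to the maximum number of DIMMs. The sketch below encodes only the three combinations quoted on the slide; the table and function names are hypothetical, and any combination the slide does not mention returns None rather than a guessed value.

```python
from typing import Optional

# Limits quoted on the slide for the north bridge of Intel's 815 chipset.
# Key: (SDRAM speed in MHz, ranks per DIMM) -> max. number of DIMMs.
MAX_DIMMS_815 = {
    (100, 2): 3,
    (133, 1): 3,
    (133, 2): 2,
}


def max_dimms(speed_mhz: int, ranks_per_dimm: int) -> Optional[int]:
    """Return the quoted DIMM limit, or None if the slide gives no figure."""
    return MAX_DIMMS_815.get((speed_mhz, ranks_per_dimm))


def max_ranks_on_channel(speed_mhz: int, ranks_per_dimm: int) -> Optional[int]:
    """Total number of ranks loading the channel at the quoted DIMM limit."""
    dimms = max_dimms(speed_mhz, ranks_per_dimm)
    return None if dimms is None else dimms * ranks_per_dimm


if __name__ == "__main__":
    for key in sorted(MAX_DIMMS_815):
        print(key, max_dimms(*key), max_ranks_on_channel(*key))
```

Read this way, the channel carries up to six ranks at 100 MHz but at most four at 133 MHz, which illustrates how higher speed tightens the loading limit.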

Supported attributes of DRAM devices DRAM width DRAM density DRAM speed Figure: Supported attributes of DRAM devices 2.6. Attributes of memory channels (28) DRAM type

2.6. Attributes of memory channels (29) DRAM type. Figure: DRAM types for general use, with year of introduction (described in Sections 4, 5, 6 of the Chapter DRAM devices)
Asynchronous DRAMs: DRAM (1970), FP (~1974), FPM (1983), EDO (1995).
Synchronous DRAMs: SDRAM (1996), DRDRAM (1999), DDR (2000), DDR2 (2004), FBDIMM (2006), XDR (2006) (1), DDR3 (2007).
The figure further distinguishes DRAMs with parallel bus connection from DRAMs with serial bus connection, and main stream DRAM types from challenging DRAM types.
(1) Used in the Cell BE and the PlayStation 3, but not yet in desktops or servers.

2.6. Attributes of memory channels (30)
DRAM width: North bridges/memory controllers specify the width of supported DRAM devices. Most recent north bridges/memory controllers support x8 and x16 DRAM devices.
DRAM density: North bridges/memory controllers specify supported DRAM densities. Example 1: The north bridge of Intel’s 815 chipsets for Pentium III processors supports SDRAM devices with 16Mb/64Mb/128Mb/256Mb densities. Example 2: The north bridge of Intel’s Series 3 chipset family for Core 2 Duo and Core 2 Quad processors supports DDR2 and DDR3 devices with 512Mb and 1Gb densities.
DRAM speed: North bridges/memory controllers also specify supported DRAM speeds.

2.6. Attributes of memory channels (31) DRAM speed. Example: Supported DRAM speeds of the north bridges of Intel’s 845xx family of chipsets (Brookdale; single channel SDR/DDR SDRAM; max. memory 2 GB). Table: MCH/GMCH variants 845, 845GL, 845GV, 845G, 845E, 845GE, 845PE (introduced between 9/01 and 10/02) with their FSB speed (400 MHz or 533/400 MHz), HT support and supported memory (PC133, DDR 266/200 or DDR 333/266, unbuffered). Another example: The north bridge of Intel’s Series 3 chipsets for Core 2 Duo and Core 2 Quad processors supports DDR2 devices with 667/800 MT/s or DDR3 devices with 800/1066 MT/s transfer rate.

3. Key performance parameters of main memories

3.1 Memory capacity 3.3 Memory latency 3.2 Memory bandwidth

3.1. Memory capacity (1) Memory capacity (C_M)
C_M = n_CU x n_CH x n_M x n_R x C_R
with
n_CU: No. of north bridges/memory control units
n_CH: No. of memory channels per north bridge/control unit
n_M: No. of memory modules per channel
n_R: No. of ranks per memory module
C_R: Rank capacity (device density x no. of DRAM devices)
E.g. The Core 2 based P35 chipset supports up to two memory channels with up to two dual-ranked memory modules per channel, with 8 x8 devices of 512 Mb or 1 Gb density per rank. The resulting maximum memory capacity is: C_Mmax = 1 x 2 x 2 x 2 x (8 x 1 Gb) = 1 x 2 x 2 x 2 x 1 GB = 8 GB
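The capacity formula can be evaluated directly. The following is a minimal Python sketch with hypothetical function and parameter names; it takes the rank capacity as devices per rank times device density, as defined on the slide, and reproduces the P35 example.

```python
GBIT = 2**30          # bits in one Gb
GBYTE = 8 * 2**30     # bits in one GB


def memory_capacity_bits(n_cu: int, n_ch: int, n_m: int, n_r: int,
                         devices_per_rank: int, density_bits: int) -> int:
    """Total main memory capacity in bits: n_CU x n_CH x n_M x n_R x C_R."""
    rank_capacity = devices_per_rank * density_bits  # C_R
    return n_cu * n_ch * n_m * n_r * rank_capacity


if __name__ == "__main__":
    # P35 example: 1 north bridge, 2 channels, 2 modules per channel,
    # 2 ranks per module, 8 devices of 1 Gb per rank.
    p35 = memory_capacity_bits(1, 2, 2, 2, 8, 1 * GBIT)
    print(p35 / GBYTE, "GB")
```

Keeping the result in bits until the final division avoids mixing Gb (device density) and GB (module capacity), which is the usual source of factor-of-eight slips in this kind of estimate.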

3.1. Memory capacity (2) Crucial factors limiting the maximum capacity of main memories n M : No. of memory modules supported per memory channel C R : Rank capacity (device density x no. of DRAM devices/rank).

3.1. Memory capacity (3) Figure: Number of memory modules supported per memory channel
Modules connected via a parallel bus (e.g. SDRAM, DDR, DDR2, DDR3 modules): 1-4 memory modules. Higher transfer rates limit the number of memory modules typically to one or two.
Modules connected via a serial bus (e.g. FBDIMM modules): 6-8 memory modules.

Rank capacity (C R ) C R = n D x D with n D : Number of DRAM devices/rank D: Device density Number of DRAM devices/ rank E.g. A one-sided (single rank) DDR3 memory module built up of 8 devices Typically: up to Memory capacity (4)

3.1. Memory capacity (5) Device density. Figure: Evolution of DRAM densities (Mbit) and no. of units shipped/year (Based on [35]). Densities shown range from 64K to 1G; density grows by ~4×/4Y.

3.1. Memory capacity (6) Ranks typically include up to 8 DRAM devices.
Typical maximum main memory sizes (C_Mmax) of recent Core 2 based desktops, assuming 2 memory channels, 1 module per channel and dual ranked modules populated with 8 x8 DDR2 or DDR3 devices of 1 Gb density (1 GB per rank): C_Mmax = 1 x 2 x 1 x 2 x 1 GB = 4 GB
Typical maximum main memory sizes of recent Core 2 based servers, assuming 4 memory channels, 6 modules per channel and dual ranked modules populated with 8 x8 FB-DIMM DDR2 devices of 4 Gb density (4 GB per rank): C_Mmax = 1 x 4 x 6 x 2 x 4 GB = 192 GB

3.1. Memory capacity (7) The rate of increasing DRAM densities. In accordance with Moore’s law (saying that the transistor count per chip doubles about every 24 months), DRAM densities evolve at about 4x/4 years. For the same number of control units/modules/ranks, the maximum size of main memories therefore also increases at about 4x/4 years.

3.2. Memory bandwidth (8) Bandwidth of memory systems. Total bandwidth (BW) provided by a memory system:
BW = n_CU x n_CH x T x W_M
with
n_CU: No. of north bridges/memory control units
n_CH: No. of memory channels per north bridge/control unit
T: Transfer rate of the module (no. of data transfers/sec)
W_M: Data width of the memory modules
E.g. A memory system with a single, dual channel controller and 8 Byte wide DDR2 800 modules provides a total bandwidth of: BW = 1 x 2 x 800 x 8 MB/s = 12.8 GB/s
Processors with an increasing number of cores obviously require increasingly higher memory bandwidth.
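The bandwidth formula is a plain product and can be sketched in Python as follows. The function and parameter names are hypothetical; the transfer rate is taken in MT/s and the module width in bytes, so the result comes out in MB/s as in the slide's example.

```python
def total_bandwidth_mb_s(n_cu: int, n_ch: int,
                         transfer_rate_mt_s: float,
                         module_width_bytes: int) -> float:
    """Peak bandwidth in MB/s: BW = n_CU x n_CH x T x W_M."""
    return n_cu * n_ch * transfer_rate_mt_s * module_width_bytes


if __name__ == "__main__":
    # Single dual channel controller with 8 Byte wide DDR2 800 modules.
    bw = total_bandwidth_mb_s(1, 2, 800, 8)
    print(bw / 1000, "GB/s")
```

This is a peak figure only: it assumes every channel transfers data on every cycle, which real access patterns do not achieve.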

3.2. Memory bandwidth (10) The min. column cycle time (t_CCD) of the memory cell array. t_CCD (Core column delay) is the min. time interval between consecutive Reads or Writes. Remark: t_CCD is also designated as the Read/Write command to Read/Write command delay. Figure: The interpretation of t_CCD [36]

3.2. Memory bandwidth (11) Figure: The evolution of the column cycle time (t_CCD) in different SDRAM types (ns) [37]. Note: The min. column cycle time (t_CCD) of synchronous DRAMs is 7.5 ns for SDRAM and 5 ns for DDR/DDR2/DDR3.

3.2. Memory bandwidth (9) The crucial factor limiting the memory bandwidth of the main memory is the transfer rate of the memory module (no. of data transfers/sec). The transfer rate of the memory module (T) equals the transfer rate of the DRAM devices used. The peak transfer rate (T_max) of synchronous DRAM devices:
T_max = 1/t_CCD x FW
with
t_CCD: Min. column cycle time of the memory cell array
FW: Fetch width of the memory cell array

3.2. Memory bandwidth (12) The fetch width (FW) of the memory cell array specifies how many times more bits the cell array fetches per column cycle than the data width of the device. E.g. an x4 DRAM chip with a fetch width of 4 (actually a DDR2 DRAM) fetches 4 × 4, that is 16 bits, from the memory cell array per column cycle. The fetch width (FW) of the memory cell array of synchronous DRAMs by DRAM type is: SDRAM: 1, DDR: 2, DDR2: 4, DDR3: 8.

3.2. Memory bandwidth (13) The peak transfer rates of the different DRAM technologies (T_max = 1/t_CCD x FW) are:
SDRAM: 1/7.5 ns x 1 = 133 MT/s
DDR: 1/5 ns x 2 = 400 MT/s
DDR2: 1/5 ns x 4 = 800 MT/s
DDR3: 1/5 ns x 8 = 1600 MT/s
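The peak transfer rates listed above follow from the column cycle times and fetch widths given on the preceding slides. The sketch below recomputes them; the dictionary and function names are hypothetical, while the numeric values are the ones quoted in the slides.

```python
# Min. column cycle time in ns and fetch width per DRAM type, as quoted on the slides.
T_CCD_NS = {"SDRAM": 7.5, "DDR": 5.0, "DDR2": 5.0, "DDR3": 5.0}
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def peak_transfer_rate_mt_s(dram_type: str) -> float:
    """T_max = 1/t_CCD x FW, in mega-transfers per second."""
    column_cycles_per_us = 1000.0 / T_CCD_NS[dram_type]  # 1/ns expressed in MHz
    return column_cycles_per_us * FETCH_WIDTH[dram_type]


if __name__ == "__main__":
    for dram_type in T_CCD_NS:
        print(f"{dram_type}: {peak_transfer_rate_mt_s(dram_type):.0f} MT/s")
```

The calculation makes the underlying point visible: from DDR to DDR3 the cell array cycle time stays at 5 ns, and the whole gain in transfer rate comes from doubling the fetch width each generation.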

3.2. Memory bandwidth (14) Figure: The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets (transfer rate in MT/s vs. year). Data points shown: SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800. Trend: ~10x/10 years.

3.2. Memory bandwidth (15) The evolution of peak transfer rates of synchronous DRAMs. Peak transfer rates evolve by ≈ 10x/10 years, which means doubling in 3-4 years. Sources of the evolution: the introduction of new synchronous DRAM technologies (SDRAM/DDR/DDR2/DDR3); more specifically, the more and more advanced approaches to improve first of all signaling (by using SSTL_2/1.8/1.5, differential CK/DQS), synchronisation (by using source synchronisation, DLLs to align CK with DQs etc.) and line terminations (by using ODT, dynamic ODT, ZQ calibration etc.).
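The step from a growth rate per decade to a doubling time can be made explicit. The sketch below assumes steady exponential growth, which is a simplification; the function name is hypothetical.

```python
import math


def doubling_time_years(factor: float, period_years: float) -> float:
    """Years needed to double, given growth by `factor` over `period_years`."""
    return period_years * math.log(2) / math.log(factor)


if __name__ == "__main__":
    # Peak transfer rates: about 10x per 10 years.
    print(f"{doubling_time_years(10, 10):.2f} years")
```

For 10x per 10 years this gives a doubling time of about 3 years, at the lower end of the 3-4 years quoted on the slide.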

The evolution of processor clock frequencies vs transfer rates of main memories in mainstream processors 3.2. Memory bandwidth (16)

Figure: Evolution of clock frequencies in Intel’s desktop processors The evolution of processor clock frequencies (f C ) in desktops 3.2. Memory bandwidth (17)

3.2. Memory bandwidth (21)
Figure: The evolution of peak transfer rates of synchronous DRAMs in Intel’s chipsets
(Plot of transfer rate (MT/s) vs. year; labelled points: SDRAM 100, SDRAM 133, DDR 266, DDR 333, DDR 400, DDR2 533, DDR2 667, DDR2 800; trend: ~10x/10 years.)

Figure: Evolution of clock frequencies in Intel’s desktop processors The evolution of processor clock frequencies (f C ) in desktops between Memory bandwidth (19)


3.2. Memory bandwidth (22)
In this time period
clock frequencies rose at a rate of ≈ 100x/10 years, whereas the transfer rates of main memories rose only at a rate of ≈ 10x/10 years, so the gap between clock frequencies and memory transfer rates became continuously wider.
In this time period higher clock rates were the main source of higher processor performance, but higher processor performance invokes higher memory traffic.
Thus a strong motivation arose to increase the bandwidth of main memories by increasing the width of the datapath to the main memory, first of all by introducing dual memory channels.
Dual memory channels became commonplace even in desktops.
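How a wider datapath raises bandwidth at an unchanged transfer rate can be illustrated with a short sketch. It assumes a 64-bit (8-byte) data path per memory channel, which is the usual width of non-ECC DIMMs but is not stated in this section; the names are illustrative.

```python
# Assumption (not from the slides): each memory channel is 64 bits = 8 bytes wide.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_mb_s(transfer_rate_mts: float, channels: int = 1) -> float:
    """Peak bandwidth in MB/s = transfers/s x bytes per transfer x channels."""
    return transfer_rate_mts * BYTES_PER_TRANSFER * channels


if __name__ == "__main__":
    # DDR 400 on one channel versus two channels.
    print(peak_bandwidth_mb_s(400, channels=1))
    print(peak_bandwidth_mb_s(400, channels=2))
```

Doubling the channel count doubles the peak bandwidth without requiring faster DRAM devices, which is why dual channels were an attractive first step.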

Figure: Evolution of clock frequencies in Intel’s desktop processors after about 2003 The evolution of processor clock frequencies (f C ) in desktops after about Memory bandwidth (24)

3.2. Memory bandwidth (23)
After about 2003, however, clock frequencies became saturated (on hitting the thermal wall), while single-core processors still represented the mainline.
In this time period the gap between clock frequencies and memory transfer rates became narrower.
Beginning with about 2005, nevertheless, the era of multicores emerged, with the core count doubling about every two years.
A new scenario becomes dominant, with steadily increasing bandwidth/transfer rate requirements.

The status quo in increasing bandwidth/transfer rates 3.2. Memory bandwidth (25)

3.2. Memory bandwidth (26)
Figure: Evolution of the bandwidth of dual-channel synchronous DRAM memory systems [56]
(Figure label: double data rate SDRAM migration.)

Figure: Evolution of transfer rates (per pin bandwidth figures) of different DRAM types [40] 3.2. Memory bandwidth (27)

3.3. Memory latency (1)
Memory latency:
Device level memory latency
System level memory latency

3.3. Memory latency (2)
Figure: Estimated maximum and minimum read latencies of DRAM devices (ns)
(Plot of read latency (ns) vs. year, annotated with processor, chipset, typical DRAM chip capacity (bits) and desktop DRAM type. Processors shown: PC AT, 386 DX, 486 DX, P, PII, PIII, P4, Core2. DRAM types shown: FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3.)
Notes:
1 Read latency of DRAM, FPM, EDO and BEDO parts = t_RAC (row access time, i.e. the time from row address until data valid). Read latency of SDRAM parts = CL + t_RCD (CAS latency + row-to-column delay).
2 The 815 chipset supports SDRAMs while the 820 supports RDRAMs.
3 A new revision of the 845 supports DDRs instead of SDRAMs.
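The SDRAM read latency defined in note 1 (CL + t_RCD) is specified in clock cycles, so it has to be multiplied by the clock period to obtain nanoseconds. The sketch below shows that conversion; the timing values in the usage example are illustrative assumptions and are not read from the figure.

```python
def sdram_read_latency_ns(cas_latency_cycles: int,
                          t_rcd_cycles: int,
                          clock_period_ns: float) -> float:
    """Read latency = (CL + t_RCD) x clock period, in ns."""
    return (cas_latency_cycles + t_rcd_cycles) * clock_period_ns


if __name__ == "__main__":
    # Hypothetical example: CL = 3, t_RCD = 3 at a 7.5 ns clock period.
    print(sdram_read_latency_ns(3, 3, 7.5))
    # Hypothetical example: CL = 5, t_RCD = 5 at a 2.5 ns clock period.
    print(sdram_read_latency_ns(5, 5, 2.5))
```

A shorter clock period can offset a larger cycle count, which is why latency in nanoseconds is the comparable figure across DRAM generations.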

3.3. Memory latency (3)
Figure: Estimated typical system-level memory latency in x86-based PCs (in ns)
(Plot of memory latency (ns) vs. year, annotated with processor, chipset, typical DRAM part capacity (bits) and desktop DRAM type. Processors shown: PC (8088), PC AT (286), 386 DX, 486 DX, P, PPro, PII, PIII, P4, Core2. DRAM types shown: DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3.)

3.3. Memory latency (4)
Figure 5.1c: System-level memory latencies in x86-based PCs (in proc. clock cycles)
(Plot of memory latency in processor cycles vs. year, annotated with processor, chipset, typical DRAM part capacity (bits) and desktop DRAM type. Processors shown: PC (8088), PC AT (286), 386 DX, 486 DX, P, PPro, PII, PIII, P4, Core2. DRAM types shown: DRAM, FPM, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3.)
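Expressing a latency in processor clock cycles rather than nanoseconds only requires multiplying by the clock frequency. The sketch below shows the conversion; the latencies and frequencies in the usage example are hypothetical and are not taken from the figures.

```python
def latency_in_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Convert a latency in ns to processor clock cycles at the given frequency."""
    return latency_ns * clock_mhz / 1000.0


if __name__ == "__main__":
    # Hypothetical example: 200 ns seen by a 33 MHz processor.
    print(latency_in_cycles(200, 33))
    # Hypothetical example: 100 ns seen by a 3000 MHz processor.
    print(latency_in_cycles(100, 3000))
```

Even when the latency in nanoseconds falls, the same latency counted in cycles can rise sharply as the clock frequency grows.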

4. References (1)
[1]: 64MB Apple G3 Beige 168p SDRAM DIMM
[2]: 4, 8 MEG x 32 DRAM SIMMs, Micron
[3]: 168 Pin, PC133 SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page
[4]: 184 Pin Unbuffered DDR SDRAM DIMM Family, JEDEC Standard No. 21-C, Page
[5]: Direct Rambus DRAM RIMM Module, 512 MB, MC-4R512FKE6D, Elpida
[6]: DDR2 SDRAM UDIMM Features, Micron
[7]: DDR3 SDRAM UDIMM Features, Micron
[8]: DDR2 SDRAM FBDIMM Features, Micron
[9]: Torres G., "Memory Tutorial," July 19, 2005, Hardwaresecrets
[10]: Besedin D., "First look at DDR3," Digit-life, June 29, 2007

4. References (2)
[11]:
[12]: W0QQitemZ QQcmdZViewItem
[13]:
[14]:
[15]: Ahn J.-H., "DRAM Operation & Architecture," Hynix
[16]:
[17]:
[18]:
[19]: item.express.ebay.com/16mb-EDO-3-3V-72-Pin-SODIMM-LAPTOP-RAM- LAPTOP-16mb-EDO_W0QQitemZ QQihZ013QQcmdZExpressItem
[20]:
[21]:
[22]: laptoping.com/category/laptop-memory
[23]:
[24]:

4. References (3)
[25]: Datasheet, Micron, SD18C32_64_128x72D.pdf
[25]: Datasheet, Micron, sd18c32_64_128x72.pdf
[26]: Datasheet, Micron, DDF18C64_128x72D.pdf
[27]: Datasheet, Micron, HTF18C64_128_256x72D.pdf
[28]: Datasheet, Micron, JSF18C256x72PD.pdf
[29]: PLL Clock Driver for 2.5V DDR-SDRAM Memory, Datasheet, Pericom, Feb. 2003
[30]: PC2100 and PC1600 DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page , Revision 1.3, Jan. 2002
[31]: Supermicro Motherboards
[32]:
[33]: Definition of CDCV857 PLL Clock Driver for Registered DDR DIMM Applications, JESD82, JEDEC, July

4. References (4)
[34]:
[35]: DRAM Pricing – A White Paper, Tachyon Semiconductors
[36]: 16 Mb Synchronous DRAM, MT48LC4M4A1/A2, MT48LC2M8A1/A2, Micron
[37]: Rhoden D., "The Evolution of DDR," Via Technology Forum, 2005
[38]: Haskill, "The Love/Hate relationship with DDR SDRAM Controllers," Mosaid, Oct. 2006, SDRAM_Controller_whitepaper_Oct_2006.pdf
[39]: Van Roon T., "What exactly is a PLL?," April 2006
[40]: Choi J. H., "High Speed DRAM," Memory Division, Samsung, 2004
[41]: Introduction to Xilinx, Xilinx FPGA Design Workshop, cse320_f07/xilinx_intro.ppt
[42]: Interfacing to DDR SDRAM with CoolRunner-II CPLDs, Application Note XAPP384, Feb. 2003, Xilinx Inc.

4. References (5)
[43]: 64-bit Flow-Thru Error Detection and Correction Unit, IDT49C466, Integrated Device Technology Inc., 1999, datasheet/222/IDT49C466.php
[44]: Tam S., "Single Error Correction and Double Error Detection," Xilinx Application Note XAPP645 (v.2.2), Aug. 2006, application_notes/xapp645.pdf
[45]: DDR SDRAM Registered DIMM Design Specification, JEDEC Standard No. 21-C, Page , Jan. 2002
[46]: Understanding DDR3 Serial Presence Detect (SPD) Table, July 17, 2007, Simmtester
[47]: DDR2 DIMM SPD Definition, August 25, 2006
[48]: Memory Module Serial Presence-Detect, TN-04-42, Micron
[49]: Intel 845 Chipset: 82845 Memory Controller Hub (MCH) for DDR, Datasheet, Jan. 2002, Intel, No
[50]: Intel 975X Express Chipset: 82975X Memory Controller Hub (MCH), Datasheet, Nov. 2005, Intel, No
[51]: Supermicro X6DH8-G2, X6DHE-G2 Mainboards User’s Manual, Rev. 1.1b, June 2007, Super Micro Computer Inc.

4. References (6)
[52]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet, Sept
[53]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007
[54]: "Introducing FB-DIMM Memory: Birth of Serial RAM?," PCStats, Dec. 23, 2005
[55]: PCI Technology overview, Feb. 2003
[56]: DDR3 SDRAM, Samsung, products/dram/Products_DDR3SDRAM.html
[57]: Le H. Q. et al., "IBM POWER6 microarchitecture," IBM J. R&D, Vol. 51, No. 6, pp
[58]: Kanter D., "Inside Barcelona: AMD's Next Generation," May 2007, ArticleID=RWT &mode=print
[59]: Golla R., "Niagara2: A Highly Threaded Server-on-a-Chip," Oct
[60]: Hofstee P., "Tutorial: Hardware and Software Architectures for the CELL BROADBAND ENGINE processor," IBM Corp., September

4. References (7)
[61]: 915 P/G Combo Mainboard (MS-7058) Manual, May 2004, MSI
[62]: Haas J. & Vogt P., "Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology Intel Magazine, computing/fully-buffered-dimm-0305.htm
[63]:
[64]:
[65]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007
[66]: Vogt P., "Fully Buffered DIMM (FB-DIMM) Server Memory Architecture," Feb. 18, 2004, Intel Developer Forum
[67]: 204-Pin DDR3 SDRAM Unbuffered SO-DIMM Design Specification, JEDEC Standard No. 21-C, Page
[68]: Jacob B. & Wang D., "Memory Systems: Circuits, Architecture and Performance Analysis," Lecture notes, University of Maryland, ENEE759H, Spring 2005
[69]: Datasheet, SD9C16_32x72.pdf
[70]: Solanki V., "Design Guide Lines for Registered DDR DIMM Module," Application Note AN37, Pericom, Nov. 2001