FB-DIMM technology. Dezső Sima, Spring 2008 (Ver. 1.0). © Sima Dezső, 2008.


Motivations to introduce FB-DIMMs in servers/workstations

Shortcomings of the stub-bus topology used with conventional DRAM architectures [2]

Stub-bus topology:
Data lines of the memory controller are electrically connected to the data lines of every DRAM device on the bus (memory channel).
Memory channels may have 8 DIMMs with 9 DRAM devices per DIMM (8 data devices plus one ECC device, i.e. 72 devices per channel).
Impedance discontinuities affect signal integrity [2].
Heavy signal loading due to the large number of devices, together with impedance discontinuities on the bus, limits the number of DRAM devices that can be connected to the channel. The higher the data rate, the stricter this limit becomes.

Figure: Scaling number of channels with memory hubs [7]. Two ranks of DRAM devices per DIMM is assumed. In the case of single rank per DIMM, while the number of DIMMs per channel may be doubled, the declining trend shown in the figure remains the same.

For higher DRAM speeds, fewer DRAM devices can be connected per memory channel [2].
Stub-bus channel capacity (device density x number of devices) has hit its ceiling [2], but increasing server performance doubles the memory capacity demand about every two years [2].

(Figure from Jacob, mem systems, 2007)

Alternative: increasing the number of memory channels.
However, each DDR2 memory channel requires 240 pins.

FB-DIMM technology (1)
Principle of operation

Introduce packet-based serial transmission (as in the PCI-E, SATA and SAS buses).
Introduce full buffering (registered DIMMs buffer only the addresses).
Introduce CRC error checking (cyclic redundancy check).

FB-DIMM technology (2)
Figure: FB-DIMM memory architecture [4]

Figure: Maximum supported FB-DIMM configuration [6] (6 channels/8 DIMMs)

FB-DIMM technology (3)
Implementation details (1)

Serial transmission between the North Bridge and the DIMMs (each bit lane needs a pair of wires).

Number of serial links:
  14 read lanes (2 wires each)
  10 write lanes (2 wires each)

Read packets (frames, bursts): 168 bits (12 x 14 bits)
  144 data bits (the number of data bits a 72-bit wide DDR2 module (64 data bits + 8 ECC bits) produces in two memory cycles)
  24 CRC bits
Every 12 link cycles (that is, every two memory cycles) constitute a packet.

Write packets (frames, bursts): 120 bits (12 x 10 bits)
  98 payload bits
  22 CRC bits

The links are clocked at 6 x the double-pumped data rate, e.g. for DDR2-667 DRAM the link clock rate is 6 x 667 MHz ≈ 4 GHz.
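The frame arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative and not taken from any FB-DIMM specification; the numbers are the ones quoted on the slide.

```python
# Frame geometry as quoted on the slide; all names here are illustrative.
CYCLES_PER_FRAME = 12   # link cycles per packet (frame)
READ_LANES = 14         # lanes towards the memory controller
WRITE_LANES = 10        # lanes towards the DIMMs
WIRES_PER_LANE = 2      # each bit lane needs a pair of wires


def frame_bits(lanes, cycles=CYCLES_PER_FRAME):
    """Bits carried by one frame: one bit per lane per link cycle."""
    return lanes * cycles


def link_clock_mhz(data_rate_mhz, multiplier=6):
    """Link rate = 6 x the double-pumped DRAM data rate."""
    return multiplier * data_rate_mhz


read_bits = frame_bits(READ_LANES)
write_bits = frame_bits(WRITE_LANES)
signal_wires = (READ_LANES + WRITE_LANES) * WIRES_PER_LANE

# Payload plus CRC must fill each frame exactly.
assert read_bits == 144 + 24
assert write_bits == 98 + 22

print("read frame bits: ", read_bits)
print("write frame bits:", write_bits)
print("signal wires:    ", signal_wires)
print("link clock (MHz):", link_clock_mhz(667))
```

The two assertions confirm that the payload and CRC bit counts add up to the stated frame sizes.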

FB-DIMM technology (4)
Implementation details (2)

The 98 payload bits of a write packet:
  2 frame type bits
  24 bits of command
  72 bits for data and commands, according to the frame type, e.g. 72 bits of data, 36 bits of data + one command, or two commands

Commands: row select, precharge, refresh, read, write etc.
All commands include a 3-bit FB-DIMM module address to select one of 8 modules.
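As a quick consistency check of the payload layout, the following sketch adds up the three fields and derives the number of addressable modules from the 3-bit address. The class and field names are hypothetical; the slide does not say where the address sits inside a command, so no bit positions are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritePayload:
    """Field widths of the write-frame payload as listed on the slide."""
    frame_type_bits: int = 2
    command_bits: int = 24
    data_or_command_bits: int = 72

    @property
    def total_bits(self):
        return (self.frame_type_bits
                + self.command_bits
                + self.data_or_command_bits)


MODULE_ADDRESS_BITS = 3  # every command carries a 3-bit module address


def addressable_modules(address_bits=MODULE_ADDRESS_BITS):
    """Number of distinct modules an n-bit address can select."""
    return 2 ** address_bits


payload = WritePayload()
print("payload bits:", payload.total_bits)
print("modules per channel:", addressable_modules())
```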

FB-DIMM technology (5)
Implementation details (3)

Read bandwidth:
One FB-DIMM channel transfers in one frame (that is, in 12 link cycles): 128 data bits + 16 ECC bits.
One frame lasts 2 memory cycles.
One DDR2 DIMM channel transfers in 2 memory cycles: 2 x 72 bits (2 x 64-bit data + 2 x 8-bit ECC).
The read bandwidth of an FB-DIMM channel therefore equals the bandwidth of a DDR2 channel.

Write bandwidth:
The write bandwidth of an FB-DIMM channel is up to 0.5 x the read bandwidth.
However, FB-DIMMs allow simultaneous read and write operation.
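The bandwidth ratios can be reproduced from the per-frame bit counts. A minimal sketch, assuming that a write frame carries at most 72 data bits (the figure given in the payload description); the variable names are illustrative.

```python
from fractions import Fraction

# Bits moved per frame (= 2 memory cycles), as quoted on the slides.
READ_BITS_PER_FRAME = 128 + 16            # data + ECC on the read lanes
DDR2_BITS_PER_TWO_CYCLES = 2 * (64 + 8)   # 72-bit DDR2 channel, 2 cycles
WRITE_DATA_BITS_PER_FRAME = 72            # assumption: max data in a write frame

# Exact ratios, kept as fractions to avoid rounding noise.
read_vs_ddr2 = Fraction(READ_BITS_PER_FRAME, DDR2_BITS_PER_TWO_CYCLES)
write_vs_read = Fraction(WRITE_DATA_BITS_PER_FRAME, READ_BITS_PER_FRAME)

# With simultaneous read and write, the peak aggregate is the sum of both.
combined_vs_ddr2 = read_vs_ddr2 * (1 + write_vs_read)

print("read  / DDR2:", read_vs_ddr2)
print("write / read:", write_vs_read)
print("read + write / DDR2:", combined_vs_ddr2)
```

Under these assumptions the peak aggregate traffic of one FB-DIMM channel is 1.5 x that of a DDR2 channel, which can only read or write at any one time.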

FB-DIMM technology (6)

Module (DRAM type)               Clock speed   Data rate   Throughput
FB-DIMM-4300 (DDR2-533 SDRAM)    133 MHz       532 MHz     4300 MB/s
PC (DDR2-667 SDRAM)              167 MHz       667 MHz     5300 MB/s
PC (DDR2-800 SDRAM)              200 MHz       800 MHz     6400 MB/s
Source: PC stats

Figure: Different implementations of FB-DIMMs

FB-DIMM data buffer (Advanced Memory Buffer, AMB): manages the read/write operations of the module.
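The throughput column follows from the data rate and the 64-bit (8-byte) data width of the module. The sketch below recomputes it; the function name is illustrative, and the small differences from the table come from the rounded marketing figures.

```python
def peak_throughput_mb_s(data_rate_mhz, data_width_bits=64):
    """Peak throughput = transfers per second x bytes per transfer."""
    return data_rate_mhz * data_width_bits // 8


# (data rate in MHz, throughput quoted in the table in MB/s)
TABLE = [(532, 4300), (667, 5300), (800, 6400)]

for data_rate, quoted in TABLE:
    computed = peak_throughput_mb_s(data_rate)
    print(f"{data_rate} MHz: computed {computed} MB/s, quoted {quoted} MB/s")
```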

(There are two Command/Address buses (C/A) to limit loads of 9 to 36 DRAMs) Figure: Block diagram of the AMB [3]

FB-DIMM technology (7)

Necessary routing to connect the north bridge to the DIMM socket:
a) In case of a DDR2 DIMM (240 pins): a 3-layer PCB is needed.
b) In case of an FB-DIMM (69 pins): a 2-layer PCB is needed (but a third layer is used for power lines).

Figure: PCB routing [4]
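To see why the lower pin count matters when scaling the number of channels, the following sketch multiplies the per-channel pin counts from the slide by a channel count. The six-channel case corresponds to the maximum FB-DIMM configuration mentioned in the deck; the names are illustrative.

```python
PINS_PER_CHANNEL = {"DDR2": 240, "FB-DIMM": 69}  # figures from the slide


def controller_pins(technology, channels):
    """Pins the memory controller must route for the given channel count."""
    return PINS_PER_CHANNEL[technology] * channels


def channels_within_budget(technology, pin_budget):
    """How many channels of a technology fit into a fixed pin budget."""
    return pin_budget // PINS_PER_CHANNEL[technology]


for tech in PINS_PER_CHANNEL:
    print(tech, "x 6 channels:", controller_pins(tech, 6), "pins")

# FB-DIMM channels that fit into the pins of a single DDR2 channel.
print(channels_within_budget("FB-DIMM", PINS_PER_CHANNEL["DDR2"]))
```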

FB-DIMM technology (8)
Figure: Latency and bandwidth figures of different DRAM technologies for a mix of SPEC applications [5]

FB-DIMM technology (9)
Pros and cons of FB-DIMMs

Advantages of FB-DIMMs vs DDR2 and DDR3 DIMMs:
  fewer wires, hence simplified PCB routing
  more memory channels (up to 6), hence higher total bandwidth
  more DIMM modules (up to 8) per channel, hence higher memory capacity (up to 192 GB)
  simultaneous read/write operation in a channel

Disadvantages of FB-DIMMs vs DDR2 and DDR3 DIMMs:
  higher latency and lower bandwidth figures for 4 to 8 DIMM modules
  higher cost
  higher dissipation (typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, DDR2 FB-DIMM: about 10 W)
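The capacity and dissipation figures can be combined into a rough back-of-the-envelope estimate for a fully populated system. This is a sketch built only on the numbers quoted above; the per-DIMM capacity is derived from them rather than stated in the slides, and the names are illustrative.

```python
MAX_CHANNELS = 6
MAX_DIMMS_PER_CHANNEL = 8
MAX_CAPACITY_GB = 192

DRAM_WATTS_PER_DIMM = 5   # "DDR2: about 5 W"
AMB_WATTS_PER_DIMM = 5    # "AMB: about 5 W"

max_dimms = MAX_CHANNELS * MAX_DIMMS_PER_CHANNEL

# Per-DIMM capacity implied by the 192 GB maximum (derived, not quoted).
gb_per_dimm = MAX_CAPACITY_GB // max_dimms

# Typical dissipation of a fully populated system, and the AMB share of it.
total_watts = max_dimms * (DRAM_WATTS_PER_DIMM + AMB_WATTS_PER_DIMM)
amb_overhead_watts = max_dimms * AMB_WATTS_PER_DIMM

print("DIMMs:", max_dimms)
print("GB per DIMM:", gb_per_dimm)
print("total W:", total_watts, "of which AMB:", amb_overhead_watts)
```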

Latency
The other issue is potentially more troubling. Intel addressed this by not having the signals stored and then retransmitted. Instead, the data travels along a special fast pass-through channel in the buffer itself. This avoids much of the latency that a store-and-forward architecture would induce.

Figure: FB-DIMM heat sinks (heat spreaders)

FB-DIMM technology (10)
Market penetration of the FB-DIMM technology

5/2006: Intel adopts it in its Bensley platform (5000) for DPs.
8/2007: Sun introduces it in the Niagara II.
9/2006: AMD takes it off its roadmap.
9/2007: Intel uses it in the Caneland platform (7000) for MPs.
2007: Major memory manufacturers intend to develop DDR3 DIMMs instead of DDR3-based FB-DIMMs.

Standardisation

3/2007: JESD205 DDR2 SDRAM Fully Buffered DIMM (FBDIMM) Design Specification
  DDR2-533, DDR2-667, DDR2-800
  x72 ECC, 240 pin
  256 Mb, 512 Mb, 1 Gb, 2 Gb, 4 Gb devices
1/2007: JESD206 FBDIMM Architecture and Protocol

FB-DIMM technology (11)
DDR2 vs (SDRAM) DDR

The key difference between DDR and DDR2 is that the DDR2 data bus is clocked at twice the speed of the memory cells, so four data words can be transferred in each memory cell cycle without speeding up the memory cells themselves.

Figure: Clocking schemes of the SDR, DDR and DDR2 SDRAM technologies [1]
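The clocking schemes can be summarised as two multipliers per technology: how fast the bus clock runs relative to the memory cells, and how many transfers happen per bus clock. The sketch below encodes that; the DDR2 values follow the text above, while the SDR and DDR values reflect the usual definitions of those technologies rather than anything stated on this slide. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockingScheme:
    name: str
    bus_clock_multiplier: int     # bus clock relative to memory cell clock
    transfers_per_bus_clock: int  # 1 = single data rate, 2 = double pumped

    def words_per_cell_cycle(self):
        """Data words transferred during one memory cell cycle."""
        return self.bus_clock_multiplier * self.transfers_per_bus_clock

    def data_rate_mhz(self, cell_clock_mhz):
        """Effective data rate for a given memory cell clock."""
        return cell_clock_mhz * self.words_per_cell_cycle()


SDR = ClockingScheme("SDR", 1, 1)
DDR = ClockingScheme("DDR", 1, 2)
DDR2 = ClockingScheme("DDR2", 2, 2)

for scheme in (SDR, DDR, DDR2):
    print(scheme.name,
          scheme.words_per_cell_cycle(), "words per cell cycle,",
          scheme.data_rate_mhz(133), "MHz data rate at 133 MHz cells")
```

With 133 MHz memory cells, the DDR2 scheme yields the 532 MHz data rate of DDR2-533.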

Although introduced in Q at 200/266 MHz, initially DDR2 could not be competitive due to too high latency figures. As lower latency parts became available by the end of 2004, DDR2 became widespread.

DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers. However, latency is greatly increased as a trade-off. The DDR2 prefetch buffer is 4 bits deep, whereas it is 2 bits deep for DDR (and 8 bits deep for DDR3). While DDR SDRAM has typical read latencies of between 2 and 3 bus cycles, early DDR2 may have read latencies between 4 and 6 cycles.

Memory            Timings    Latency    Bandwidth in dual-channel mode
DDR400 SDRAM      2.5–3–3    12.5 ns    6.4 GB/sec
DDR400 SDRAM      2–3–2      10 ns      6.4 GB/sec
DDR533 SDRAM      3–4–4      11.2 ns    8.5 GB/sec
DDR533 SDRAM      2.5–3–3    9.4 ns     8.5 GB/sec
DDR2-533 SDRAM    5–5–5      18.8 ns    8.5 GB/sec
DDR2-533 SDRAM    4–4–4      15 ns      8.5 GB/sec
DDR2-533 SDRAM    3–3–3      11.2 ns    8.5 GB/sec
DDR2-600 SDRAM    5–5–5      16.6 ns    9.6 GB/sec
DDR2-600 SDRAM    4–4–4      13.3 ns    9.6 GB/sec

Table: Burst timing, latency and bandwidth figures of DDR and DDR2 DRAM technologies [1]
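The latency and bandwidth columns of the table can be reproduced from the CAS latency (the first timing value) and the data rate. A minimal sketch, assuming a bus clock of half the data rate and a 64-bit (8-byte) channel; the function names are illustrative, and the table values appear to be rounded or truncated to one decimal.

```python
def cas_latency_ns(cl_cycles, data_rate_mhz):
    """CAS latency in ns: CL bus cycles at a bus clock of data_rate / 2."""
    bus_clock_mhz = data_rate_mhz / 2
    return cl_cycles / bus_clock_mhz * 1000


def dual_channel_bandwidth_gb_s(data_rate_mhz, bytes_per_transfer=8):
    """Peak bandwidth of two 64-bit channels in GB/s."""
    return data_rate_mhz * bytes_per_transfer * 2 / 1000


# (memory type, data rate in MHz, CL) for a few rows of the table
ROWS = [
    ("DDR400", 400, 2.5),
    ("DDR533", 533, 3),
    ("DDR2-533", 533, 5),
    ("DDR2-600", 600, 4),
]

for name, rate, cl in ROWS:
    print(f"{name} CL{cl}: "
          f"{cas_latency_ns(cl, rate):.2f} ns, "
          f"{dual_channel_bandwidth_gb_s(rate):.2f} GB/s")
```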

Early DDR2-533 SDRAM modules available at the time of the announcement of the i925 and i915 chipsets (6/2004) had timings (CAS Latency - RAS to CAS Delay - RAS Precharge Time).

CAS latency (Column Address Strobe latency, CL): the time delay (in number of clock cycles) between the moment a memory chip is accessed for data and the moment the first data bit becomes available.
For instance, after accessing a 400 MHz CL3 device, the first bit arrives in 3 x 2.5 ns = 7.5 ns.

FB-DIMM technology ()
DDR2 DIMMs have 240 pins instead of the 184 pins used by DDR DIMMs.
Power savings are achieved primarily through a drop in operating voltage (1.8 V compared to DDR's 2.5 V).

Official JEDEC Specifications

                  DDR2               DDR3
Rated Speed       Mbps               Mbps
Vdd/Vddq          1.8V +/- 0.1V      1.5V +/ V
Internal Banks    4                  8
Termination       Limited            All DQ signals
Topology          Conventional T     Fly-by
Driver Control    OCD Calibration    Self Calibration with ZQ
Thermal Sensor    No                 Yes (Optional)
Source: Anandtech

DDR3 appeared in mid 2007, e.g. in Intel's P35 Bearlake.
Source: Wiki

5.2. Speed gap between processor and memory (1a) Figure 5.1a: DRAM types

5.2. Speed gap between processor and memory (1b) Figure 5.1b: Latency of DRAM chips

5.2. Speed gap between processor and memory (1c) Figure 5.1c: System-level memory latency in x86-based PCs

5.2. Speed gap between processor and memory (1d) Figure 5.1d: Latency of DRAM chips (in clock cycles)

5.2. Speed gap between processor and memory (2) Figure 5.2: Relative transfer rate of memories (D: dual channel)

References
[1]: Gavrichenkov I., "DDR2 vs. DDR: Revenge Gained," Xbit Laboratories, 12/17/2004
[2]: Vogt P., "Fully Buffered DIMM (FB-DIMM) Server Memory Architecture," Intel Developer Forum, Feb. 18, 2004
[3]: McTague M. & David H., "Fully Buffered DIMM (FB-DIMM) Design Considerations," Intel Developer Forum, Feb. 18, 2004
[4]: Haas J. & Vogt P., "Fully buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology Intel Magazine, March 2005, pp. 1-7
[5]: Ganesh B., Jaleel A., Wang D., Jacob B., "Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling," Proc. HPCA 2007
[6]: "Introducing FB-DIMM Memory: Birth of Serial RAM?," PCStats, Dec. 23, 2005
[7]: Haas J. & Vogt P., "Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology Intel Magazine, technology/magazine/computing/fully-buffered-dimm-0305.htm