An Analysis of Virtual Channel Memory and Enhanced Memories Technologies
Bill Gervasi, Technology Analyst, Chairman, JEDEC Memory Parametrics Committee
San Jose, January 23-24, 2001 / Taipei, February 14-15, 2001

Agenda
– Next Generation PC Controllers
– Concerns With Standard SDRAM
– Cached SDRAMs: Enhanced & Virtual Channel
– Controller Complexity vs DRAM Type
– Pros and Cons of Cached DRAMs
– Conclusions & Call to Action

Next Gen PC Controllers
Block diagram: requestors R1, R2, R3 feed a request arbiter and a path arbiter; command FIFOs and data FIFOs sit between the arbiters and the DRAM I/O.

Requestors
CPU port: Cache fills dominate
– DRAM frequency = 1/3 to 1/5 of CPU frequency
– Big L2 caches randomize memory accesses
Graphics port: Lots of random accesses
– Especially 3D rendering
South Bus port: Mix of short and long packets

SDRAM Roadmap
SDRAM 66, PC-100, PC-133, DDR-266, DDR-333, DDR II
The good news: ever faster cores, power manageable, simple evolutionary changes, at about the same price
The bad news: random access time has not changed appreciably, and power is higher than it needs to be

Concerns With SDRAM
1. Power
2. Latency
3. Refresh Overhead

SDRAM Power vs Latency
Active power is very high
– Active-on power is 500X inactive-off power
– Encourages controllers to close pages
But access time to a closed page is long
– Row activation time + column read time

SDRAM Power Profile

Power State*    Relative Power    CPU Clock Latency**
Active on       100%              0 x 5 = 0
Inactive on     12%               3 x 5 = 15
Active off      4%                1 x 5 = 5
Inactive off    0.2%              4 x 5 = 20
Sleep           0.4%              200 x 5 = 1000

(Active states = open page; Inactive states = closed page)
* Not industry standard terms – simplified for brevity
** Assuming memory clock frequency = 1/5 CPU frequency
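To see the latency side of this trade-off, here is a minimal sketch (my own model, not from the presentation) of average read latency under open-page vs closed-page policies. Only the 5:1 CPU-to-memory clock ratio comes from the slide; the tRP/tRCD/CAS values and hit rates are illustrative assumptions.

```python
# Minimal sketch of the open-page vs closed-page latency trade-off.
CPU_CLOCKS_PER_MEM_CLOCK = 5          # from the slide: memory clock = 1/5 CPU clock
T_RP, T_RCD, CAS = 3, 3, 3            # precharge, activate-to-read, column read (assumed)

def avg_read_latency_cpu_clocks(policy: str, page_hit_rate: float) -> float:
    """Average read latency in CPU clocks for a single-bank toy model."""
    if policy == "open":
        hit = CAS                       # row already open: column access only
        miss = T_RP + T_RCD + CAS       # wrong row open: precharge, activate, read
        mem_clocks = page_hit_rate * hit + (1.0 - page_hit_rate) * miss
    else:                               # closed-page policy pays the activation every time
        mem_clocks = T_RCD + CAS
    return mem_clocks * CPU_CLOCKS_PER_MEM_CLOCK

for rate in (0.3, 0.6, 0.9):
    print(f"hit rate {rate:.0%}: open-page {avg_read_latency_cpu_clocks('open', rate):.0f} "
          f"vs closed-page {avg_read_latency_cpu_clocks('closed', rate):.0f} CPU clocks")
```

Keeping pages open costs substantially more idle power, while closing them adds activation latency to every access; a cache that can be read while the bank is closed removes that trade-off, which is the argument developed below.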

Refresh
Gigabit generation refresh overhead
– 256Mb generation: 75ns refresh every 15.6us
– 1Gb generation: 120ns refresh every 7.8us
This is roughly a 3X increase in the refresh performance penalty
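As a quick check on the 3X figure, a minimal sketch using only the numbers above (the helper function is mine):

```python
# Refresh overhead = time spent refreshing / refresh interval.
def refresh_overhead(t_refresh_ns: float, interval_us: float) -> float:
    return t_refresh_ns / (interval_us * 1000.0)

overhead_256mb = refresh_overhead(75.0, 15.6)    # 256Mb generation: 75ns every 15.6us
overhead_1gb   = refresh_overhead(120.0, 7.8)    # 1Gb generation: 120ns every 7.8us
print(f"256Mb: {overhead_256mb:.2%}  1Gb: {overhead_1gb:.2%}  "
      f"increase: {overhead_1gb / overhead_256mb:.1f}x")     # ~0.48%, ~1.54%, ~3.2x
```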

An Argument for Cached SDRAM Architectures

What Are Cached SDRAMs?
Block diagram: the DRAM array connects to an on-chip SRAM array over a wide internal channel (x256 to x1024); the SRAM array connects to the external narrow I/O channel (x4 to x32).

Cached DRAM architectures can address SDRAM's key limitations. However, only commodity memories are affordable for mass market use. I hope to encourage the adoption of cache for all standard future SDRAMs.

Cached SDRAM Solutions
1. Power: Encourage closed page use
2. Latency: Fast access to closed pages
3. Refresh: Background operation

Cached DRAM
SRAM cache in front of the DRAM array
– Much like CPU on-chip caches
– Exploit wide internal buses for fast block transfer of DRAM to SRAM and back
– Allow the DRAM core to be deactivated while I/O is performed on the SRAM

Two Leading Cached DRAMs
Enhanced SDRAM
– SRAM in sense amps
– Direct mapped into array
Virtual Channel SDRAM
– SRAM in periphery
– DRAM associativity maintained by controller
Note: Other cached DRAM architectures exist; however, none has been proposed as a commodity DRAM standard.

Enhanced & Virtual Channel
Block diagrams: Enhanced puts the SRAM in the sense amplifiers (DRAM array, then sense amps & SRAM at 4Kb x 4, then I/O). Virtual Channel puts the SRAM in the periphery (DRAM array, sense amps, then a segment transferred to channel SRAM at 1Kb x 17, then I/O).

Basic Functions
Enhanced: Activate, Read (& cache fill), Write (array & cache update), Precharge, Auto Refresh, Self Refresh
Virtual Channel: Activate, Prefetch (cache fill), Read (from cache), Write (cache update), Restore (array update), Precharge, Auto Refresh, Self Refresh
Note: Identical to standard SDRAM
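To make the difference concrete, a rough sketch of the commands a controller might issue to read data while leaving the bank closed, based on my reading of the command sets above (not a vendor-specified sequence):

```python
# Rough command sequences for "read while keeping the bank closed", per the lists above.
READ_WITH_CLOSED_BANK = {
    "Standard SDRAM":  ["ACT", "READ", "PRE"],                  # the row must be open to read
    "Enhanced SDRAM":  ["ACT", "READ (fills cache)", "PRE",
                        "READ (cache hit, bank closed)"],        # later hits need no new ACT
    "Virtual Channel": ["ACT", "PREFETCH (fills channel)", "PRE",
                        "READ (channel hit, bank closed)"],      # writes can also hit the channel
}

for device, commands in READ_WITH_CLOSED_BANK.items():
    print(f"{device:16s} {' -> '.join(commands)}")
```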

Power Factors
Enhanced
– Page open to read to cache
– Closed while reading
– Open during writes
Virtual Channel
– Page open to read to cache
– Closed while reading
– Closed during writes
– Reopen for restore

Latency Factors
Enhanced
– Read automatically fills cache
– Reads on closed page permitted
– No write latencies
– Masking increases write recovery time (normal for SDRAM architectures)
Virtual Channel
– Prefetch for cache fill
– Reads on closed page permitted
– Writes on closed page permitted
– Reactivation to restore
– Inherently RMW – no penalty for masking

Refresh Factors
Enhanced
– Activation not allowed
– Permits reads during auto refresh
Virtual Channel
– Activation not allowed
– No reads during auto refresh

Let's Compare the Power and Activity Profiles of SDRAM, Enhanced, and Virtual Channel

Closed Page SDRAM Profile
Command/power timing diagram. Read sequence: NOP, ACT, R, R, PRE, NOP. Write sequence: ACT, W, W, PRE, NOP. The power profile tracks the command activity between lower power and higher power.

Enhanced Profile
Command/power timing diagram. Read sequence: NOP, ACT, R-PRE, R, R, NOP. Write sequence: ACT, W, W, PRE, NOP. The power profile tracks the command activity between lower power and higher power.

Virtual Channel Profile
Command/power timing diagram. Sequence: NOP, ACT, F-PRE (prefetch with precharge), R, W, NOP, followed by R, W, ACT, RST (restore), NOP. The power profile tracks the command activity between lower power and higher power.

Power Profile Factors
Standard L3 cache hit analysis
Profile of memory operations affected by randomness of accesses
– Balance of activations & precharges, reads & writes
– Requestor channel profile depends on application: games, video, or office app?

Cache Entry Size
– Max words burst per hit before reload needed
– Determined by entry size, bus width, burst length
– Miss impacts performance & power
– Affects controller association overhead
– Die size impacted by entry size

Burst length = 4    x4    x8    x16
Enhanced
Virtual             64    32    16
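A minimal sketch of the bursts-per-hit arithmetic behind the Virtual row of the table; the 1Kb segment size comes from the earlier block-diagram slide, and the function name is mine:

```python
# Bursts served from one cache entry before a reload (miss) is needed.
def bursts_per_hit(entry_bits: int, bus_width: int, burst_length: int) -> int:
    words_per_entry = entry_bits // bus_width
    return words_per_entry // burst_length

VC_SEGMENT_BITS = 1024   # 1Kb Virtual Channel segment
for width in (4, 8, 16):
    print(f"x{width}: {bursts_per_hit(VC_SEGMENT_BITS, width, burst_length=4)} bursts per hit")
# -> x4: 64, x8: 32, x16: 16, matching the Virtual row above
```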

Randomness
Enhanced controllers replace any entry
– Physical memory locality determines which entry is replaced
Virtual controllers can lock channels, e.g.
– Screen refresh channel never replaced
– Priorities can be assigned to channels
– Weighted replacement algorithms possible
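As an illustration of the flexibility described above, here is a hypothetical sketch of a controller-side channel table with locked channels and priority-weighted replacement. The class names, the scoring rule, and the 16-channel count (taken from the controller slide below) are my own choices, not NEC's design:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Channel:
    tag: Optional[int] = None   # DRAM segment currently cached (None = empty)
    locked: bool = False        # e.g. a screen-refresh channel that is never replaced
    priority: int = 0           # higher priority = survive longer
    last_use: int = 0           # timestamp for recency

@dataclass
class ChannelTable:
    channels: list = field(default_factory=lambda: [Channel() for _ in range(16)])
    clock: int = 0

    def access(self, tag: int) -> bool:
        """Return True on a channel hit; otherwise evict a victim and load the tag."""
        self.clock += 1
        for c in self.channels:
            if c.tag == tag:
                c.last_use = self.clock
                return True
        # Weighted replacement: never touch locked channels, then prefer
        # low-priority and least-recently-used channels.
        victim = min((c for c in self.channels if not c.locked),
                     key=lambda c: (c.priority, c.last_use))
        victim.tag, victim.last_use = tag, self.clock
        return False
```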

Enhanced Controller Issues
– Direct mapping restricts complexity
– 4 physical banks x 4 cache lines
– Comparator for cache hit

Virtual Controller Issues
– Tags required to do closed page writes: keep track of where to restore
– 4 physical banks x 16 channels
– CAM needed to exploit large number of channels
– CAM routing challenging: metal over array?
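A small sketch (my illustration, not controller RTL) of why the two lookup structures differ in cost: Enhanced needs one tag comparator because the address selects the line, while Virtual Channel must compare against all 16 channel tags of a bank at once, which is what the CAM provides in hardware. The index function and tag layout are assumptions:

```python
NUM_BANKS = 4
EMS_LINES_PER_BANK = 4        # Enhanced: 4 cache lines per physical bank (direct mapped)
VCM_CHANNELS_PER_BANK = 16    # Virtual Channel: 16 channels per physical bank

def ems_hit(tags, bank: int, row: int) -> bool:
    """Direct mapped: the row address picks the line, a single comparator decides."""
    line = row % EMS_LINES_PER_BANK          # index function is an assumption
    return tags[bank][line] == row

def vcm_hit(tags, bank: int, segment: int) -> bool:
    """Fully associative per bank: all 16 tags compared in parallel (a CAM in hardware)."""
    return any(tag == segment for tag in tags[bank])

ems_tags = [[None] * EMS_LINES_PER_BANK for _ in range(NUM_BANKS)]
vcm_tags = [[None] * VCM_CHANNELS_PER_BANK for _ in range(NUM_BANKS)]
```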

Complexity vs Throughput
Controller complexity rises with throughput; SDRAM, EMS, and VCM sit at increasing points along the curve:
– 10-20K gates: one entry hit logic, closed bank, visible refresh, 2 cmd FIFO
– 35-50K gates: 1 entry per phys bank, write flush, direct map, 3 cmd FIFO
– …K gates: 4 entries per phys bank, scheduled writeback, hidden refresh, 6-8 cmd FIFO
– 200+K gates: full CAM, 16 channels per phys bank x 4 phys banks, scheduled writeback with reorder

Brian, University of Michigan analysis (performance comparison chart)

Virtual Channel
PROS
– Best performance headroom from cache associativity
– Flexible channel maps
– Saves power on reads and writes
– More hits in random access environment
– Pin compatible in SDR & DDR I configurations
CONS
– Die penalty of 7-13%
– Incompatible command set: cannot have one die for standard and virtual
– Refresh blocks access
– Simple controller performance < standard
– Small cache entry sizes cause more replacement
– 2 banks = higher power

Enhanced Memory
PROS
– Die penalty of 2-5%
– Compatible command set: same die supports standard or enhanced
– Optional features
– Simple to implement
– Pin compatible
– No performance drop even for simple controller
– Refresh does not block reads
– 4 banks = lower power
CONS
– Royalty to DRAM suppliers
– Performance boost lower than VCM max
– Less flexible maps
– No power savings on writes

User Perspective
Users want a standard cached DRAM architecture – it solves real system problems
– Lower power for closed bank policy
– Lower latency
– Hidden refresh
Costs cannot exceed 5% over non-cached

Conclusions
Cached DRAM is highly desirable
– Both Enhanced and Virtual offer significant benefits
VCM requires more controller complexity
– Low end not well served
Next commodity must fit all markets
Enhanced preferred over Virtual Channel
– Better chance at becoming standard
– Compatibility and a simple transition lower adoption risk
– Just as L2 caches moved from direct mapped to set associative, VCM may be a good fit for DDR III in 2006/2007

Call to Action
Act now!
– Demand vendors provide cached DRAM
– Express support for Enhanced style
– Commit controller designs
Industry likely to decide in March 2001

Thank You