A brief introduction to Memory system proportionality and resilience Mattan Erez The University of Texas at Austin.

Slides:



Advertisements
Similar presentations
Explicit HW and SW Hierarchies High-Level Abstractions for giving the system what it wants Mattan Erez The University of Texas at Austin Salishan 2011.
Advertisements

Main MemoryCS510 Computer ArchitecturesLecture Lecture 15 Main Memory.
Outline Memory characteristics SRAM Content-addressable memory details DRAM © Derek Chiou & Mattan Erez 1.
Computer Organization and Architecture
Computer Organization and Architecture
CP1610: Introduction to Computer Components Primary Memory.
Multi-Level Caches Vittorio Zaccaria. Preview What you have seen: Data organization, Associativity, Cache size Policies -- how to manage the data once.
+ CS 325: CS Hardware and Software Organization and Architecture Internal Memory.
1 Lecture 6: Chipkill, PCM Topics: error correction, PCM basics, PCM writes and errors.
Cache Memory Locality of reference: It is observed that when a program refers to memory, the access to memory for data as well as code are confined to.
CMPE 421 Parallel Computer Architecture MEMORY SYSTEM.
Lecture 12: DRAM Basics Today: DRAM terminology and basics, energy innovations.
1 Lecture 15: DRAM Design Today: DRAM basics, DRAM innovations (Section 5.3)
1 Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
Memory Hierarchy.1 Review: Major Components of a Computer Processor Control Datapath Memory Devices Input Output.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Nov. 13, 2002 Topic: Main Memory (DRAM) Organization.
Registers  Flip-flops are available in a variety of configurations. A simple one with two independent D flip-flops with clear and preset signals is illustrated.
L18 – Memory Hierarchy 1 Comp 411 – Fall /30/2009 Memory Hierarchy Memory Flavors Principle of Locality Program Traces Memory Hierarchies Associativity.
Chapter 5 Internal Memory
Computer Organization and Architecture
1 Lecture 14: DRAM, PCM Today: DRAM scheduling, reliability, PCM Class projects.
1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah.
Module IV Memory Organization.
Virtualized and Flexible ECC for Main Memory
CSIE30300 Computer Architecture Unit 07: Main Memory Hsin-Chou Chi [Adapted from material by and
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
Microprocessor-based systems Curse 7 Memory hierarchies.
CS1104: Computer Organisation School of Computing National University of Singapore.
Chapter 5 Internal Memory. Semiconductor Memory Types.
P AY -A S -Y OU -G O S TORAGE -E FFICIENT H ARD E RROR C ORRECTION Moinuddin K. Qureshi ECE, Georgia Tech Research done while at: IBM T. J. Watson Research.
L/O/G/O Computer Memory Chapter 3 (a) CS.216 Computer Architecture and Organization.
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.
Computer Architecture Lecture 24 Fasih ur Rehman.
Physical Memory and Physical Addressing By Alex Ames.
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
1 Lecture: Memory Technology Innovations Topics: memory schedulers, refresh, state-of-the-art and upcoming changes: buffer chips, 3D stacking, non-volatile.
1 Memory Hierarchy (I). 2 Outline Random-Access Memory (RAM) Nonvolatile Memory Disk Storage Suggested Reading: 6.1.
1 Lecture: DRAM Main Memory Topics: DRAM intro and basics (Section 2.3)
1 Lecture: Memory Basics and Innovations Topics: memory organization basics, schedulers, refresh,
Cache Advanced Higher.
William Stallings Computer Organization and Architecture 7th Edition
ESE532: System-on-a-Chip Architecture
Memory COMPUTER ARCHITECTURE
Yu-Lun Kuo Computer Sciences and Information Engineering
Toward Exascale Resilience Part 3: Fault and error modes and models
Morgan Kaufmann Publishers Memory & Cache
The Memory Hierarchy Chapter 5
William Stallings Computer Organization and Architecture 7th Edition
Lecture 15: DRAM Main Memory Systems
William Stallings Computer Organization and Architecture 8th Edition
Influence of Cheap and Fast NVRAM on Linux Kernel Architecture
Lecture: DRAM Main Memory
William Stallings Computer Organization and Architecture 7th Edition
Lecture: DRAM Main Memory
Lecture: DRAM Main Memory
William Stallings Computer Organization and Architecture 8th Edition
BIC 10503: COMPUTER ARCHITECTURE
Lecture: Memory Technology Innovations
Lecture 6: Reliability, PCM
Memory Organization.
EE108B Review Session #6 Daxia Ge Friday February 23rd, 2007
2.C Memory GCSE Computing Langley Park School for Boys.
William Stallings Computer Organization and Architecture 8th Edition
Cache Memory and Performance
Memory Principles.
Presentation transcript:

A brief introduction to Memory system proportionality and resilience Mattan Erez The University of Texas at Austin

With support from –NSF Award # (c) Mattan Erez

The constraints : –Power/energy –Time –Money –Correctness (c) Mattan Erez

Can’t work harder – have to work smarter (c) Mattan Erez

Can’t work harder – have to work smarter Efficient & Proportional (c) Mattan Erez

Multiple constraints and needs  Heterogeneity asymmetry specialization (c) Mattan Erez

big.LITTLE and per core DvFS –Static and dynamic asymmetry Accelerators – Programmable and fixed-function (c) Mattan Erez

Throughput cores Latency cores Specialized cores  proportionality of compute (c) Mattan Erez

Great, we solved the easy part: proportional compute (c) Mattan Erez

Programming is hard Memory is hard (c) Mattan Erez

The memory challenge (outline) –Why do memory systems look the way they do? Efficiency and reliability Abstractions for architects –Role of heterogeneity and programs –Some interesting mechanisms –Where do we go from here? (c) Mattan Erez

Storage device options abound –SRAM –DRAM –FLASH –STT-/M- RAM –PCM, Memristor, RRAM, … (c) Mattan Erez

Architects should think in abstractions –Research should be general (c) Mattan Erez

The fundamentals Cheap Energy efficient High capacity Low latency High throughput Reliable (c) Mattan Erez

Memory must be cheap –Memory is already too expensive 15 – 50% of system cost (or more) –But, margins razor thin (commodity) (c) Mattan Erez

Memory must be energy efficient –Memory accounts for 15 – 60+% of “compute” energy and getting worse More capacity Proportional compute (c) Mattan Erez

Memory must be high capacity –Memory footprints keep growing More interesting data More data More specialized data structures (c) Mattan Erez

Memory must be high capacity Many chips (c) Mattan Erez

Memory must be high capacity Many chips Denser mem, Fewer chips (c) Mattan Erez

Memory must be high capacity Many chips Denser mem, Fewer chips Many dice, Fewer chips © Micron (c) Mattan Erez

Memory composed of –Multiple modules –Each module with N >=1 components –Lots of cells per component –Each cell >= 1 bit (c) Mattan Erez

Memory should be low latency –Latency == performance –Except when sufficient concurrency (c) Mattan Erez

Low latency conflicts with –Cheap –Capacity –Sometimes with Throughput Energy Reliability (c) Mattan Erez

Latency components: –Memory cell access –Data transfer –Queuing (c) Mattan Erez

Latency impacted by cell access denser/efficient  higher latency –Small cells  sensitive circuits  slow reads –Writes even more complicated + latency – density – cost – energy – latency + density + cost + energy (c) Mattan Erez

Caching to the rescue –Store some data somewhere fast (c) Mattan Erez

Short distances improve latency –It’s all about the wires (c) Mattan Erez

Short distances improve latency –It’s all about the wires + latency + energy – latency – energy (c) Mattan Erez

Short distances improve latency –It’s all about the wires + latency + energy – capacity – latency – energy + capacity (c) Mattan Erez

Hierarchy to the rescue (c) Mattan Erez

Hierarchy to the rescue Processor (c) Mattan Erez

Hierarchy to the rescue Processor (c) Mattan Erez

Memory system –Modules, packages, components –Arranged in a hierarchy Each memory component –Has hierarchy –Has caching/buffering (c) Mattan Erez

Latency impacted by queuing (c) Mattan Erez

Deep queues improve locality –Higher throughput –Higher efficiency –Often hurts latency (c) Mattan Erez

Memory must be high throughput –More and more latency hiding (c) Mattan Erez

Memory must be high throughput Many chips Fewer chips High capacity per pin requires efficient access (c) Mattan Erez

Packaging/interfaces improve throughput –3D integration –Limited capacity TSV Silicon Interposer (c) Mattan Erez

Packaging/interfaces improve throughput –3D integration –Limited capacity –More hierarchy levels TSV Silicon Interposer Heterogeneity! (c) Mattan Erez

Latency + throughput + efficiency  parallelism (c) Mattan Erez

Latency + throughput + efficiency  parallelism Access cell (c) Mattan Erez

Latency + throughput + efficiency  parallelism Access many cells in parallel Many pins in parallel (c) Mattan Erez

Latency + throughput + efficiency  parallelism Access even more cells in parallel Many pins in parallel over multiple cycles (c) Mattan Erez

Latency + throughput + efficiency  parallelism Access even more cells in parallel Many pins in parallel over multiple cycles (c) Mattan Erez

To recap –Locality –Hierarchy –Parallelism (c) Mattan Erez

Memory must be reliable –Written data must be recalled “correctly” (c) Mattan Erez

Reliability challenges –“Natural” change in cell value –Induced change in cell value –Read errors –Write errors –Wearout and defects (c) Mattan Erez

Natural causes of value change Probability correct Time Decay –Refresh (c) Mattan Erez

Natural causes of value change Probability correct Time Probability correct Time Decay –Refresh Flip –ECC + scrub (c) Mattan Erez

Natural causes of value change Probability correct Time Probability correct Time Decay –Refresh / scrub Flip –ECC + scrub (c) Mattan Erez

Natural causes of value change Probability correct Time Probability correct Time – Retention control Less dense Writes slower/higher-energy Can use different devices Decay –Refresh / scrub Flip –ECC + scrub (c) Mattan Erez

Induced change in value –Writing or reading disturbs nearby cells –Worse for writes –Technology and design dependent (c) Mattan Erez

Read errors –Because of small margins for efficiency (c) Mattan Erez

Write errors –Writing implies changing a state or value –Stochastic Probability correct Time (c) Mattan Erez

Write errors –Writing implies changing a state or value –Stochastic –Waiting inefficient and slow –Write error control conflicts w/ other errors Probability correct Time (c) Mattan Erez

Wearout and defects Probability correct Time Probability correct Time Probability correct Time Probability correct Time (c) Mattan Erez

Wearout and defects –Sparing –Extra margins –Retirement –Compensation (ECC) (c) Mattan Erez

Summary: no “good” memory –It’s all about tradeoffs (c) Mattan Erez

Example: DRAM –Efficient and reliable reads and writes –Leaky –Density problematic –Fairly fast reads and writes –Parallelism determined by cost –Fast decay (short retention) High variability –Rare, but important flips –Very rare disturbs –Slow wearout (c) Mattan Erez

Example: FLASH –High-energy writes –Very dense –Slow reads –Very slow writes –Parallelism determined by technology –Very slow decay (good retention) –Flips only on periphery –Write disturbs problematic –Fast wearout (c) Mattan Erez

Example: PCM –High-energy writes –Dense –Fast reads –Fast-ish writes –Parallelism determined by cost –Slow decay (OK retention) –Flips only on periphery –Write disturbs problematic –Fast wearout (c) Mattan Erez

Summary of heterogeneity: interrelated tradeoffs –Efficiency –Capacity –Latency –Bandwidth –Parallelism (granularity) –Reliability (c) Mattan Erez

The role of heterogeneous processors heterogeneous applications (c) Mattan Erez

Proportional compute –Latency oriented –Throughput oriented –Specialized (c) Mattan Erez

Application characteristics –Latency sensitive –Bandwidth sensitive –Few tasks or many tasks (c) Mattan Erez

Application compute styles –Batch –Realtime –Response time (c) Mattan Erez

Application data usage –Capacity –Locality –Precision –Persistence (c) Mattan Erez

Memory must be heterogeneous but illusion of homogeneity is nice (c) Mattan Erez

The solution: Adaptive proportional memory systems (c) Mattan Erez

The solution: Adaptive proportional memory systems (c) Mattan Erez

Adaptive reliability (c) Mattan Erez

Adaptive reliability –Adapt the mechanism –Adapt the error rate precision (stochastic precision) (c) Mattan Erez

Adaptive reliability –By type Application guided –By location (83) Machine-state guided (c) Mattan Erez

Example: Virtualized ECC –Adapt the mechanism (c) Mattan Erez

Physical Memory DataECC Virtual Address space Low Virtual Page to Physical Frame mapping Page frame – x Page frame – y Page frame – z Virtual page – x Virtual page – y Virtual page – z Protection Level Medium High (c) Mattan Erez

Physical Memory Data Virtual Address space Low Virtual Page to Physical Frame mapping Physical Frame to ECC Page mapping ECC Stronger ECC Page frame – x Page frame – y Page frame – z ECC page – y ECC page – z Virtual page – x Virtual page – y Virtual page – z Protection Level Medium High (c) Mattan Erez

Physical Memory DataT1EC Virtual Address space Low Virtual Page to Physical Frame mapping Physical Frame to ECC Page mapping T2EC for Chipkill T2EC for Double Chipkill Page frame – x Page frame – y Page frame – z ECC page – y ECC page – z Virtual page – x Virtual page – y Virtual page – z Protection Level Medium High (c) Mattan Erez

Two-Tiered ECC (c) Mattan Erez

Caching work great Normalized Execution Time (c) Mattan Erez

Bonus: flexibility of ECC Normalized Energy-Delay Product (c) Mattan Erez

Example: FREE-p –Adapt to wearout state (c) Mattan Erez

FREE-p insights: –Wearout is gradual –Adapt ECC strength to wearout degree –Limit ECC need with adaptive repair Retire and remap (c) Mattan Erez

Adapt ECC strength – Multi-tiered ECC (c) Mattan Erez

Different cells need different protection (c) Mattan Erez

Fine-grain retirement is key (c) Mattan Erez

Adaptive FG repair Page for remapping D/P flag ptr data (c) Mattan Erez

Impact of repair and remap is gradual (c) Mattan Erez

Adaptive granularity (parallelism) (c) Mattan Erez

Applications have diverse locality Low spatial localityHigh spatial locality~50% is referenced 1~23~67~8 (c) Mattan Erez

Coarse- or fine- grained memory access? CG-Only: Wide DRAM channel FG-Only: Many narrow channels (c) Mattan Erez

Coarse- or fine- grained memory access? D0 E0 Data C C C0C0 C0C0 C1C1 C1C1 D1 E1 D2 E2 D3 E3 C2C2 C2C2 C3C3 C3C3 C4C4 C4C4 D4 E4 CG FG C5C5 C5C5 C6C6 C6C6 C7C7 C7C7 D5 E5 D6 E6 D7 E7 ECC D0 E0 Data C C C0C0 C0C0 CG FG ECC (c) Mattan Erez

Dynamically adapt granularity  best of both worlds Processor Core Processor Core $I $D L2 Last Level Cache MC DRAM Core 0 Core 0 Core 1 Core 1 Core N-1 Core N-1 … SLP Co-scheduling CG/FG w/ split CG Sector cache (8 8B sectors per line) Sector cache (8 8B sectors per line) Sub-Ranked DRAM w/ 2X ABUS Support both CG and FG Sub-Ranked DRAM w/ 2X ABUS Support both CG and FG Locality Predictor (c) Mattan Erez

Subranked memory –Adapt parallelism (c) Mattan Erez DBUS x8 ABUS SR0SR1SR2SR3SR4SR5SR6SR7SR8 Reg/Demux

Sectored caches –Track granularity (c) Mattan Erez tag sector 0 D D V V sector 1 D D V V sector 2 D D V V sector 3 D D V V tag data D D V V Long cache line Sector cache tag data D D V V Short cache line

Granularity information? –Application provided (when allocated) –System learned (when accessed) (c) Mattan Erez

CG+ECC FG+ECC AG+ECC Dynamically adapt granularity PROPORTIONALITY (c) Mattan Erez

Memory is heterogeneous Applications are heterogeneous this is a good thing (c) Mattan Erez

Adaptivity is key (c) Mattan Erez

Sometimes need application help – Programming models and analyses crucial (c) Mattan Erez

Think abstractions rather than examples (c) Mattan Erez

Memory is heterogeneous Applications are heterogeneous this is a good thing Adaptivity is key Sometimes need application help – Programming models and analyses crucial Think abstractions rather than examples (c) Mattan Erez