LEVERAGING ACCESS LOCALITY FOR THE EFFICIENT USE OF MULTIBIT ERROR-CORRECTING CODES IN L2 CACHE
Paper by Hongbin Sun, Nanning Zheng, and Tong Zhang
Presented by Joseph Schneider, March 23, 2010

The Problem
- As CMOS technology shrinks, random defects increase
- Traditionally, these defects are handled with redundant rows, columns, and words that replace the defective ones
- As random defects increase, this traditional defect-tolerance strategy may no longer be sufficient

The Solution
- Extend the role of error-correcting codes (ECC) to compensate for defects
- ECC is also used to compensate for transient soft errors
- Find a method that allows ECC to be used for both defects and soft errors

Multi-bit ECC
- Multi-bit ECC: an ECC that can correct multiple errors in one codeword
- Suffers larger decoding latency and higher coding redundancy than single-error correction
- Therefore unusable in L1 cache without major performance loss
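The slides do not quantify the redundancy gap; as a back-of-the-envelope sketch, the standard Hamming/BCH check-bit counts for a 64-bit subblock (the subblock size used later in the evaluation) already show why multi-bit ECC costs more storage:

```python
from math import ceil, log2

def sec_ded_check_bits(k: int) -> int:
    """Check bits for a SEC-DED (extended Hamming) code over k data bits."""
    r = 1
    while 2**r < k + r + 1:   # Hamming bound for single-error correction
        r += 1
    return r + 1              # +1 overall parity bit for double-error detection

def dec_ted_check_bits(k: int) -> int:
    """Approximate check bits for a binary BCH DEC-TED code over k data bits."""
    m = ceil(log2(k + 1))                 # initial guess for the field size GF(2**m)
    while k + 2 * m + 1 > 2**m - 1:       # shortened codeword must fit in 2**m - 1 bits
        m += 1
    return 2 * m + 1                      # ~m bits per corrected error, +1 for triple-error detection

if __name__ == "__main__":
    k = 64
    print(f"SEC-DED over {k} bits: {sec_ded_check_bits(k)} check bits")   # -> 8
    print(f"DEC-TED over {k} bits: {dec_ted_check_bits(k)} check bits")   # -> 15
```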

Overall Goal
- Implement multi-bit ECC in the L2 cache design to correct L2 cache defects without significant IPC degradation, area overhead, or energy cost

Steps to Success
1. Apply multi-bit ECC only to the cache blocks that require it
2. Implement buffers to limit repeated use of multi-bit ECC
3. Ensure data integrity against soft errors where ECC alone can no longer compensate

Limited Multi-bit ECC
- Cache blocks with defective cells are identified during memory testing; multi-bit ECC is then selectively applied to the blocks that need it (more than one defective cell per block)
- A content-addressable memory (CAM) is then used at run time to identify the blocks requiring multi-bit ECC (referred to as m-blocks)
- ISSUE: searching the CAM requires large energy consumption
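A minimal sketch of the run-time m-block check, modeling the CAM as a lookup over block addresses recorded at test time (class and method names are illustrative, not the paper's design):

```python
class MEccCam:
    """Toy model of the CAM that marks which cache blocks are m-blocks.

    Entries are block addresses found to have multiple defective cells during
    memory testing; a real CAM performs this match in parallel hardware.
    """

    def __init__(self, defective_block_addrs):
        self.entries = set(defective_block_addrs)

    def is_m_block(self, block_addr: int) -> bool:
        # Associative search: a hit means the block needs multi-bit ECC.
        return block_addr in self.entries


# Example: two blocks flagged during manufacturing test (addresses hypothetical).
cam = MEccCam({0x1A40, 0x2C80})
assert cam.is_m_block(0x1A40)
assert not cam.is_m_block(0x0400)
```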

Proposed Architecture
- Standard L2 cache core protecting all subblocks with single-error-correction, double-error-detection (SEC-DED) codes
- Multi-bit ECC core consisting of a fully associative multi-bit ECC cache (M-ECC cache), an ECC encoder/decoder, and two buffers; the M-ECC cache holds location tags and the corresponding check bits
- Dirty Replication cache to ensure soft-error tolerance

Proposed Architecture (figure)

Multi-bit ECC Core
- On a write, the subblock data is encoded and the check bits are stored
- On a read, the check bits are fetched and the data is decoded
- ISSUE: constant use of multi-bit ECC increases latency and energy consumption at higher defect densities
- Solution: two additional buffers (described on the next slides)

Multi-bit ECC Core Buffers
- Pre-decoding buffer: small cache that keeps copies of the most recently accessed m-blocks; searched before accessing the M-ECC cache
- Uses a least-recently-used (LRU) replacement policy when full; effective because of the temporal locality of cache accesses
- Eliminates a large amount of ECC decoding and some M-ECC cache accesses
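A minimal sketch of the pre-decoding buffer as an LRU-managed map from m-block address to already-corrected data (the capacity and interface are illustrative assumptions):

```python
from collections import OrderedDict

class PreDecodingBuffer:
    """LRU buffer holding corrected data of recently accessed m-blocks,
    so repeat accesses skip the slow multi-bit ECC decode."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.lines = OrderedDict()            # block_addr -> corrected subblock data

    def lookup(self, block_addr):
        if block_addr not in self.lines:
            return None                       # miss: caller must decode via the M-ECC core
        self.lines.move_to_end(block_addr)    # refresh LRU position on a hit
        return self.lines[block_addr]

    def insert(self, block_addr, corrected_data):
        self.lines[block_addr] = corrected_data
        self.lines.move_to_end(block_addr)
        if len(self.lines) > self.capacity:   # evict the least recently used entry
            self.lines.popitem(last=False)
```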

Multi-bit ECC Core Buffers (continued)
- FLU buffer: small CAM that keeps the addresses of recently accessed cache blocks that are NOT m-blocks
- Also employs an LRU replacement policy
- Further reduces M-ECC cache accesses

M-ECC Core Flow Chart (figure)
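The flow chart itself is not reproduced in the transcript; the sketch below reconstructs the read path described on the preceding slides, with the two small buffers filtering accesses before the energy-hungry CAM search and BCH decode. The ordering of checks and the helper objects (l2, flu_buf, mecc_cache, decoder; predec_buf and mecc_cam follow the earlier sketches) are illustrative assumptions:

```python
def read_subblock(addr, l2, predec_buf, flu_buf, mecc_cam, mecc_cache, decoder):
    """Sketch of an L2 read in the proposed scheme (helper objects are duck-typed stand-ins)."""
    data = predec_buf.lookup(addr)
    if data is not None:            # recently decoded m-block: no further ECC work needed
        return data

    raw = l2.read(addr)             # raw subblock, still covered by SEC-DED

    if flu_buf.contains(addr):      # recently seen non-m-block: skip the CAM search
        return raw

    if mecc_cam.is_m_block(addr):   # CAM hit: this block has multiple defective cells
        check_bits = mecc_cache.fetch(addr)
        data = decoder.decode(raw, check_bits)   # DEC-TED BCH decode (slow path)
        predec_buf.insert(addr, data)
        return data

    flu_buf.insert(addr)            # remember that this block needs no multi-bit ECC
    return raw
```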

Soft Error Tolerance
- ISSUE: when the ECC is devoted to defect tolerance, a defective subblock becomes vulnerable to soft errors
- Extra protection is only necessary for blocks containing defects (including blocks with a single defect, which are protected by SEC-DED rather than multi-bit ECC)
- Further, it is only necessary when the cache block is dirty; a clean block can simply be re-fetched from memory when a soft error is detected
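The decision logic on this slide condenses to two small predicates; the function and assertion names below are illustrative:

```python
def needs_dr_backup(has_defective_cell: bool, is_dirty: bool) -> bool:
    """Extra soft-error protection is needed only for defective blocks, and only while dirty."""
    return has_defective_cell and is_dirty

def recover_from_soft_error(is_dirty: bool) -> str:
    """Where to obtain a good copy once a soft error is detected in a defective block."""
    return "Dirty Replication cache" if is_dirty else "main memory (clean copy)"

assert needs_dr_backup(True, True)
assert not needs_dr_backup(True, False)   # clean blocks can just be re-fetched
assert not needs_dr_backup(False, True)   # defect-free blocks are still covered by SEC-DED
```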

Dirty Replication Cache
- A Dirty Replication (DR) cache is added
- When a cache block becomes dirty, a copy of its data is also kept in the DR cache
- When data is evicted from the DR cache, a write is performed to main memory
- This ensures a backup of dirty data is always available
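A minimal sketch of the DR cache policy, assuming a simple LRU structure and an illustrative write-back interface (the 64-entry default matches the DR configuration evaluated later in the slides):

```python
from collections import OrderedDict

class DirtyReplicationCache:
    """Keeps a replica of dirty L2 data so a soft error in a defective, dirty
    block can still be recovered; evictions are written back to main memory."""

    def __init__(self, memory, capacity: int = 64):
        self.memory = memory                 # assumed to expose write_back(addr, data)
        self.capacity = capacity
        self.replicas = OrderedDict()        # block_addr -> replicated dirty data

    def on_block_made_dirty(self, addr, data):
        self.replicas[addr] = data
        self.replicas.move_to_end(addr)
        if len(self.replicas) > self.capacity:
            old_addr, old_data = self.replicas.popitem(last=False)
            self.memory.write_back(old_addr, old_data)   # keep a valid copy somewhere

    def recover(self, addr):
        # Used when an uncorrectable soft error is detected in a dirty block.
        return self.replicas.get(addr)
```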

Evaluation
- Cache defect density set at 0.5%
- Multi-bit ECC: BCH-based DEC-TED code (double error correction, triple error detection); subblocks with more than two defects are repaired by redundancy
- Cache subblocks contain 64 bits
- The BCH DEC-TED decoder has a parallelism of 2 and uses the PGZ (Peterson-Gorenstein-Zierler) decoding algorithm, resulting in a latency of 82 cycles
- CACTI 5 is used to model the caches; Verilog synthesis shows the extra logic is 0.2% of the L2 cache core area

Evaluation (continued)
- Four configurations are compared:
  - Base: defect-free L2 cache with no defect-tolerance functions
  - M-ECC: multi-bit ECC only, no buffers
  - M-ECC-pbuf: adds the pre-decoding buffer
  - M-ECC-pfbuf: adds both the pre-decoding and FLU buffers
- First, the best buffer sizes are determined; then IPC and power consumption are compared

Size of the Pre-decoding Buffer (figure)

Size of the FLU Buffer (figure)

Normalized IPC Comparison (figure)

Normalized Power Consumption (figure)

Results
- IPC performance is similar across configurations
- The M-ECC core's power consumption is about 30% of the L2 cache core's, which is itself only about 10% of the entire system's cache power

DR Write-back Hit Rates (figure)
- L2 cache fixed at 1 MB, 8-way set associative; DR cache size varies

DR Write-back Hit Rates (figure)
- DR cache fully associative with 64 blocks; L2 cache size varies (1 MB is the baseline)

Conclusions
- The goal was to use multi-bit ECC effectively for L2 cache defect tolerance at minimal performance and implementation cost
- Multi-bit ECC is implemented only where more than one defect is found per block
- Two small buffers reduce the performance impact of multi-bit ECC
- A Dirty Replication cache ensures soft-error tolerance

Conclusions (continued)
- IPC performance is nearly the same as with a defect-free cache
- The M-ECC cache adds less than 2.5% area overhead and 36% energy-consumption overhead
- The Dirty Replication cache adds only 0.3% area overhead while storing 96.4% of the write-back data from the L1 cache