LEVERAGING ACCESS LOCALITY FOR THE EFFICIENT USE OF MULTIBIT ERROR-CORRECTING CODES IN L2 CACHE By Hongbin Sun, Nanning Zheng, and Tong Zhang Joseph Schneider.


1 LEVERAGING ACCESS LOCALITY FOR THE EFFICIENT USE OF MULTIBIT ERROR-CORRECTING CODES IN L2 CACHE By Hongbin Sun, Nanning Zheng, and Tong Zhang Joseph Schneider March 23, 2010

2 The Problem
• As CMOS technology scales down, random defects become more frequent
• Traditionally, these defects are handled by replacing defective rows, columns, and words with redundant ones
• As random defects increase, this traditional strategy may no longer be sufficient

3

4 The Solution
• Extend the role of error-correcting codes (ECC) to compensate for defects
• ECC is also used to compensate for transient soft errors
• Find a method that allows ECC to cover both defects and soft errors

5 Multi-bit ECC
• Multi-bit ECC: an ECC that can correct multiple errors in one codeword
• Incurs higher latency and higher coding redundancy than single error correction
• Therefore unusable in L1 cache without major performance penalties

6 Overall Goal
• Implement multi-bit ECC in the L2 cache design to correct L2 cache defects without significant IPC degradation, area overhead, or energy cost

7 Steps to Success
1. Apply multi-bit ECC only to cache blocks that require it
2. Implement buffers to limit repeated use of multi-bit ECC
3. Ensure data integrity against soft errors where ECC alone can no longer compensate for them

8 Limited multi-bit ECC
• Cache blocks with one or more defective cells are identified during memory testing; multi-bit ECC is then applied selectively
• Content-addressable memory (CAM) is then used to identify the blocks requiring multi-bit ECC (referred to as m-blocks)
• ISSUE: CAM consumes a large amount of energy
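
The selective application described above can be sketched as a simple classification by defect count. This is only an illustration: the thresholds assume the DEC-TED code and the redundancy fallback mentioned on the evaluation slide, and the names `Protection`, `classify_subblock`, and `build_m_block_set` are hypothetical, not taken from the slides.

```python
from enum import Enum


class Protection(Enum):
    SEC_DED_ONLY = "SEC-DED only"
    MULTI_BIT_ECC = "multi-bit ECC (m-block)"
    REDUNDANCY = "repaired by redundancy"


def classify_subblock(defect_count: int, correctable: int = 2) -> Protection:
    """Pick a protection scheme for one subblock from its defect count.

    Assumption: `correctable` is the number of errors the multi-bit code
    can correct (2 for a DEC-TED code).
    """
    if defect_count < 0:
        raise ValueError("defect_count must be non-negative")
    if defect_count <= 1:
        # Zero or one defect: the baseline SEC-DED code is enough.
        return Protection.SEC_DED_ONLY
    if defect_count <= correctable:
        # More than one defect, still within the multi-bit code's reach.
        return Protection.MULTI_BIT_ECC
    # Too many defects for the code: fall back to redundant repair.
    return Protection.REDUNDANCY


def build_m_block_set(defect_map: dict[int, int]) -> set[int]:
    """Collect the addresses that need multi-bit ECC after memory testing.

    `defect_map` maps a subblock address to its defect count (hypothetical
    test output format).
    """
    return {
        addr
        for addr, count in defect_map.items()
        if classify_subblock(count) is Protection.MULTI_BIT_ECC
    }
```

For example, `build_m_block_set({0x00: 0, 0x40: 1, 0x80: 2, 0xC0: 3})` keeps only address `0x80`.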

9 Proposed Architecture
• Standard L2 cache core protecting all subblocks with single error correction, double error detection (SEC-DED) codes
• Multi-bit ECC core consisting of a fully associative multi-bit ECC cache (M-ECC cache), an ECC encoder/decoder, and two buffers; the M-ECC cache contains location tags and the corresponding check bits
• Dirty Replication (DR) cache to ensure soft error tolerance

10 Proposed Architecture

11 Multi-bit ECC Core
• On a write, the subblock data is encoded and the check bits are stored
• On a read, the check bits are fetched and used for decoding
• ISSUE: Constant use of multi-bit ECC increases latency and energy consumption at higher defect densities
• Solution: Two additional buffers

12 Multi-bit ECC Core Buffers
• Pre-decoding buffer: a small cache that keeps copies of the most recently accessed m-blocks; searched before accessing the M-ECC cache
• Uses a least recently used (LRU) replacement policy when full; effective because of the temporal locality of cache accesses
• Eliminates a large share of ECC decoding and some M-ECC cache accesses

13 Multi-bit ECC Core Buffers
• FLU buffer: a small CAM that keeps the addresses of recently accessed cache blocks that are NOT m-blocks
• Also uses an LRU policy
• Further reduces M-ECC cache accesses

14 M-ECC core Flow Chart
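
A minimal sketch of the read path through the M-ECC core, as described on the two buffer slides. It is not a reproduction of the flow chart: the probe order (pre-decoding buffer, then FLU buffer, then M-ECC cache), the buffer sizes, the class names `LRUBuffer` and `MEccCore`, and the placeholder decoder are all assumptions made for illustration. Writes are omitted.

```python
from collections import OrderedDict


class LRUBuffer:
    """Small fully associative buffer with least-recently-used replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()

    def __contains__(self, key) -> bool:
        return key in self._entries

    def get(self, key):
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


class MEccCore:
    def __init__(self, m_ecc_cache: dict, pbuf_size: int = 4, flu_size: int = 4):
        self.m_ecc_cache = m_ecc_cache      # address -> check bits (m-blocks only)
        self.pbuf = LRUBuffer(pbuf_size)    # address -> already decoded data
        self.flu = LRUBuffer(flu_size)      # addresses known NOT to be m-blocks
        self.m_ecc_lookups = 0
        self.decodes = 0

    def read(self, addr: int, raw_data: int) -> int:
        if addr in self.pbuf:
            # Recently decoded m-block: skip M-ECC cache access and decoding.
            return self.pbuf.get(addr)
        if addr in self.flu:
            # Recently seen non-m-block: skip the M-ECC cache access.
            self.flu.get(addr)
            return raw_data
        self.m_ecc_lookups += 1
        check_bits = self.m_ecc_cache.get(addr)
        if check_bits is None:
            self.flu.put(addr, True)        # remember: not an m-block
            return raw_data
        self.decodes += 1
        data = self._decode(raw_data, check_bits)
        self.pbuf.put(addr, data)
        return data

    @staticmethod
    def _decode(raw_data: int, check_bits: int) -> int:
        # Placeholder: a real core would run the multi-bit ECC decoder here.
        return raw_data
```

Repeated reads of the same address hit one of the two buffers, so the `m_ecc_lookups` and `decodes` counters stop growing. That is the access-locality effect the buffers are meant to exploit.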

15 Soft Error Tolerance
• ISSUE: When ECC is devoted to defect tolerance, a defective subblock is vulnerable to soft errors
• Extra protection is only necessary for blocks containing defects (including blocks with a single defect protected by SEC-DED rather than multi-bit ECC)
• Further, it is only necessary when the cache block is dirty; clean blocks can be refetched from memory when a soft error is detected

16 Dirty Replication Cache
• A Dirty Replication (DR) cache is used
• When a cache block is made dirty, its data is also kept in the DR cache
• When data leaves the DR cache, a write to main memory is performed
• Ensures a backup is always available
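
The DR cache behavior listed above can be modeled in a few lines. This is a sketch under assumptions: the slides do not state the DR replacement policy, so LRU is assumed here, and the class and method names (`DirtyReplicationCache`, `on_dirty_write`, `recover`) and the dictionary standing in for main memory are hypothetical.

```python
from collections import OrderedDict


class DirtyReplicationCache:
    """Keeps a second copy of dirty data so a backup always exists."""

    def __init__(self, capacity: int, main_memory: dict):
        self.capacity = capacity
        self.main_memory = main_memory   # address -> data, stands in for DRAM
        self._entries = OrderedDict()    # address -> replicated dirty data

    def on_dirty_write(self, addr: int, data: int) -> None:
        """Called whenever an L2 block is made dirty."""
        self._entries[addr] = data
        self._entries.move_to_end(addr)
        if len(self._entries) > self.capacity:
            # Data leaving the DR cache is written to main memory first,
            # so the backup copy moves rather than disappears.
            old_addr, old_data = self._entries.popitem(last=False)
            self.main_memory[old_addr] = old_data

    def recover(self, addr: int):
        """Return the backup copy after an uncorrectable soft error."""
        if addr in self._entries:
            return self._entries[addr]
        return self.main_memory.get(addr)
```

With a capacity of two, writing three dirty blocks pushes the oldest one out to `main_memory`, and `recover` still finds all three.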

17

18 Evaluation
• Cache defect density set at 0.5%
• Multi-bit ECC: BCH-based DEC-TED code (double error correction, triple error detection); subblocks with more than two errors are repaired by redundancy
• Cache subblocks contain 64 bits
• BCH DEC-TED decoder has a parallelism of 2 and uses the PGZ decoding algorithm, resulting in a latency of 82 cycles
• CACTI 5 used to model the caches; a Verilog implementation showed that the extra logic occupies 0.2% of the area of the L2 cache core
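
To make the "higher coding redundancy" of multi-bit ECC concrete for 64-bit subblocks, the check-bit counts can be estimated from standard coding bounds. The figures below are computed here and are not quoted from the slides. They assume an extended Hamming code for SEC-DED and a shortened binary BCH code with t = 2 plus one overall parity bit for DEC-TED; the function names are hypothetical.

```python
def sec_ded_check_bits(data_bits: int) -> int:
    """Check bits for an extended Hamming (SEC-DED) code."""
    r = 1
    # Hamming bound for single error correction: 2^r >= k + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit gives double error detection


def bch_dec_ted_check_bits(data_bits: int) -> int:
    """Check bits for a shortened BCH DEC code extended to DEC-TED."""
    m = 3
    # A t=2 binary BCH code over GF(2^m) has length 2^m - 1 and 2m check bits,
    # so the data must fit in the remaining 2^m - 1 - 2m positions.
    while 2 ** m - 1 < data_bits + 2 * m:
        m += 1
    return 2 * m + 1  # one extra overall parity bit gives triple error detection


k = 64
print(sec_ded_check_bits(k))      # 8
print(bch_dec_ted_check_bits(k))  # 15
```

Under these assumptions a 64-bit subblock needs 8 check bits for SEC-DED and 15 for DEC-TED, which is why storing the longer code only for m-blocks saves area.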

19 Evaluation
• Four configurations compared:
  • Base: defect-free L2 cache with no defect-tolerance functions
  • M-ECC only: no buffers
  • M-ECC-pbuf: with the pre-decoding buffer
  • M-ECC-pfbuf: with the pre-decoding and FLU buffers
• First, determine the best buffer sizes; then compare IPC and power consumption

20 Size of Pre-decoding Buffer

21 Size of FLU buffer

22 Normalized IPC Comparison

23 Normalized Power Consumption

24 Results
• IPC is similar across configurations; the M-ECC core consumes about 30% as much power as the L2 cache core, which itself accounts for about 10% of the entire system cache

25 DR Write-back hit rates
• L2 cache fixed at 1 MB, 8-way associative; DR varies

26 DR Write-back hit rates
• DR fully associative with 64 blocks; 1 MB L2 cache varies

27 Conclusions
• The goal was to use multi-bit ECC effectively for L2 cache defect tolerance at minimal performance and implementation cost
• Multi-bit ECC is applied only where more than one defect is found
• Two small buffers are included to reduce the performance impact of multi-bit ECC
• A Dirty Replication cache is included to ensure soft error tolerance

28 Conclusions
• IPC is nearly the same as with a defect-free cache
• The M-ECC cache has less than 2.5% area overhead and 36% energy consumption overhead
• The Dirty Replication cache has an area overhead of only 0.3% and stores 96.4% of the write-back data from the L1 cache

