Self-* Systems CSE 598B Paper title: Dynamic ECC tuning for caches Presented by: Niranjan Soundararajan.

Presentation transcript:


2 Abstract
• On-chip caches are increasing in size.
  – Protection is needed to keep the stored data correct.
  – ECC serves as an efficient means of protecting the data.
• ECC has its own overhead.
  – Area: extra space for its logic.
  – Latency: ECC computations take time.
• This work deals with reducing the latency involved in ECC computation.
  – Track the frequently accessed cache lines.
  – Dynamically turn ECC computation on and off for specific cache lines.

3 Background
• Information redundancy
  – Data in the processing core is protected by schemes such as RMT (Redundant Multi-threading) [1][2].
  – ECC protection is easy to implement for on-chip caches. In addition, the size of the caches prevents them from being replicated [3].
• Current evaluation shows raw FIT (Failures In Time) rates for latches and SRAM cells varying between – 0.01 FIT/bit. This value increases with elevation: at 1.5 km the FIT/bit is 3.5x larger, while at 10 km (airplanes) it is 100x larger [4][5][6][7].

4 Background
• As processor power dissipation becomes more and more important, supply voltages are reduced. This will greatly increase the FIT/bit [8][9].
• "As an example, consider a 32 MB data cache. This cache has 2^22 quad-words. Let us assume that an SRAM cell has an average FIT rate of 0.001. The single-bit FIT rate for the entire cache is 0.001 * 2^22 * 72 = 3.02 * 10^5, i.e. the MTTF is 10^9 / (3.02 * 10^5) = 3311 hours" [3].
• Consider the case of large multiprocessor systems with tens of megabytes of cache. Protection becomes an important issue if the systems are involved in critical computations such as space research and flight control.
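The arithmetic in the quoted example can be checked with a short sketch. It assumes a per-bit rate of 0.001 FIT (the value that makes the quoted product come out to 3.02 * 10^5) and reads the factor 72 as 64 data bits plus 8 check bits per quad-word; the function names are illustrative, and the altitude factors are the 3.5x and 100x multipliers from the earlier Background slide.

```python
HOURS_PER_BILLION = 10**9  # 1 FIT = 1 failure per 10^9 hours


def cache_fit(cache_bytes, fit_per_bit, data_bits=64, check_bits=8):
    """Total single-bit FIT rate of a cache built from ECC-protected words.

    Assumption: each quad-word stores data_bits of data plus check_bits
    of ECC, which gives the 72 bits per word used in the quoted example.
    """
    words = cache_bytes * 8 // data_bits
    return fit_per_bit * words * (data_bits + check_bits)


def mttf_hours(total_fit):
    """Mean time to failure in hours for a given total FIT rate."""
    return HOURS_PER_BILLION / total_fit


if __name__ == "__main__":
    base_fit = cache_fit(32 * 2**20, fit_per_bit=0.001)
    # Altitude multipliers taken from the Background slide.
    for label, factor in [("sea level", 1.0), ("1.5 km", 3.5), ("10 km", 100.0)]:
        fit = base_fit * factor
        print(f"{label}: FIT = {fit:.3g}, MTTF = {mttf_hours(fit):.1f} hours")
```

Under these assumptions the same cache that fails roughly every 3311 hours at the base rate would fail roughly every 33 hours at the 100x rate quoted for airplane altitudes.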

5 Background
• All of this data indicates that cache data must be protected. ECC is the best way to protect SRAM.
• This work addresses some of the problems of applying ECC to caches that must operate at low latency, such as L1 caches.

6 Motivation
• ECC overhead [10]
  – Increase in area due to circuitry: 11% (approx. 15 mm²)
  – Increase in latency: 10% (approx. 5 ns)
• Applications show temporal locality in accessing cache lines. Dynamically turning ECC on and off for cache lines reduces cache access latency. Because the frequency of operations is high, the time between individual accesses is short.
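A rough way to see the potential latency benefit is the sketch below. It is only a back-of-the-envelope model, not something taken from the slides: it assumes the 10% latency overhead applies uniformly to every access that goes through ECC and that accesses to lines with ECC turned off pay no penalty. The function name and parameters are hypothetical.

```python
def average_access_latency(base_latency, ecc_penalty, fraction_ecc_off):
    """Average cache access latency when some accesses skip the ECC check.

    base_latency:     latency of an access without ECC (any time unit)
    ecc_penalty:      relative latency increase caused by ECC (0.10 = 10%)
    fraction_ecc_off: fraction of accesses that hit lines with ECC disabled

    Assumption: the penalty is the same for every ECC-checked access.
    """
    if not 0.0 <= fraction_ecc_off <= 1.0:
        raise ValueError("fraction_ecc_off must be between 0 and 1")
    return base_latency * (1.0 + ecc_penalty * (1.0 - fraction_ecc_off))


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.9):
        latency = average_access_latency(1.0, 0.10, fraction)
        print(f"{fraction:.0%} of accesses without ECC: {latency:.3f}x base latency")
```

Because temporal locality concentrates accesses on a small number of lines, disabling ECC on only those lines could remove most of the average penalty in this simplified model.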

7 Motivation
• The chance of an error affecting data is low
  – because of the frequency of operations, and
  – because the cache lines with high temporal locality are few compared to the total number of cache lines.

8-10 Benchmarks
[Figures: benchmark graphs, not reproduced in this transcript]

11 Self-Tuning Implementation
• Keep track of cache line accesses. Every 5000 cycles, tune the ECC of the cache lines, turning it on or off.
• Overhead:
  – Keeping track of cache line accesses: simple, fast counters make implementation easy.
  – Tuning ECC for lines: a simple average computation, turning ECC off for lines with more activity than the average.

12 Implementation
• Implementation is simplified if
  – counters are maintained for a set of cache lines, and
  – ECC tuning is done at this granularity.
• Granularity can be 10, 20, … 100 lines.
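The tuning loop described on these two slides can be sketched in software as follows. This is a minimal illustration, not the presenter's hardware design: the class name, method names and the software cycle counter are hypothetical. It follows the results slide in turning ECC off for groups whose access count is above the average, and it uses the 5000-cycle interval and per-group counters mentioned above.

```python
class EccTuner:
    """Software model of per-group access counters with periodic ECC tuning."""

    def __init__(self, num_lines, lines_per_group=10, interval=5000):
        self.lines_per_group = lines_per_group
        self.interval = interval
        num_groups = -(-num_lines // lines_per_group)  # ceiling division
        self.counters = [0] * num_groups
        self.ecc_on = [True] * num_groups  # every group starts protected
        self.cycles = 0

    def access(self, line):
        """Record one access to a cache line in its group's counter."""
        self.counters[line // self.lines_per_group] += 1

    def tick(self):
        """Advance one cycle and retune when the interval elapses."""
        self.cycles += 1
        if self.cycles % self.interval == 0:
            self.retune()

    def retune(self):
        """Turn ECC off for groups accessed more often than the average."""
        average = sum(self.counters) / len(self.counters)
        self.ecc_on = [count <= average for count in self.counters]
        self.counters = [0] * len(self.counters)  # start a new window


if __name__ == "__main__":
    tuner = EccTuner(num_lines=40, lines_per_group=10)
    for _ in range(100):
        tuner.access(3)   # a hot line in group 0
    tuner.access(25)      # a rarely used line in group 2
    for _ in range(5000):
        tuner.tick()
    print(tuner.ecc_on)
```

A larger `lines_per_group` needs fewer counters but makes the decision coarser, since one hot line switches ECC off for every line in its group.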

13 Self-tuning Results
• From the graphs we see the temporal locality. Based on these results, ECC was turned off for the lines with high locality.

BENCHMARK    RELATIVE ACCESS FREQUENCY    OVERHEAD
WUPWISE      8.3                          0.5%
VPR-ROUTE    5.9                          2.5%
PARSER                                    %
PERLBMK      6.3                          2.5%
GCC          2.7                          1.9%
GZIP         4.7                          3.2%

14 Conclusion
• ECC is indispensable as chip reliability declines and maintaining correct data becomes an issue.
• The processor-memory bottleneck is a perennial issue. Increasing cache latency through ECC protection creates further problems.
• This work tries to reduce the latency of ECC-protected caches using a scheme that dynamically turns ECC on and off.

15 References
[1] S. S. Mukherjee, M. Kontz, and S. K. Reinhardt, "Detailed Design and Evaluation of Redundant Multithreading Alternatives," ISCA,
[2] S. S. Mukherjee, C. T. Weaver, J. Emer, S. K. Reinhardt, and T. Austin, "A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor," MICRO, December
[3] S. S. Mukherjee, T. Fossum, J. Emer, and S. K. Reinhardt, "Cache Scrubbing in Microprocessors: Myth or Necessity?" 10th International Symposium on Pacific Rim Dependable Computing (PRDC), Papeete, Tahiti, March
[4] J. F. Ziegler, "Terrestrial cosmic rays," IBM J. of Research and Development, pp. 19–39, Vol. 40, No. 1, Jan
[5] Y. Tosaka, S. Satoh, K. Suzuki, T. Suguii, H. Ehara, G. A. Woffinden, and S. A. Wender, "Impact of Cosmic Ray Neutron Induced Soft Errors on Advanced Submicron CMOS Circuits," VLSI Symposium on VLSI Technology Digest of Technical Papers, 1996.

16 References
[6] T. Karnik, B. Bloechel, K. Soumyanath, V. De, and S. Borkar, "Scaling trends of Cosmic Rays induced Soft Errors in static latches beyond 0.18µ," Symposium on VLSI Circuits Digest of Technical Papers,
[7] S. Hareland, J. Maiz, M. Alavi, K. Mistry, S. Walstra, and C. Dai, "Impact of CMOS Scaling and SOI on soft error rates of logic processes," Symposium on VLSI Technology Digest of Technical Papers,
[8] R. Baumann, "Soft Errors in Commercial Semiconductor Technology: Overview and Scaling Trends," IEEE 2002 Reliability Physics Tutorial Notes, Reliability Fundamentals, pp. 121_01.1 – 121_01.14, April 7,
[9] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, "Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic," Dependable Systems and Networks,
[10] H. L. Kalter et al., "A 50-ns 16-Mb DRAM with a 10 ns data rate and on-chip ECC," IEEE J. Solid-State Circuits, vol. 25, pp. 1118–1128, Oct